
About collections

Learn about the definition of personal and sensitive data in Piiano Vault

Vault stores data in a hierarchy, where each vault contains collections, each collection contains objects, and objects hold values.

A vault can have multiple collections, which may hold independent or associated data. For example, an e-commerce application may use two collections in one vault: one for its customers and another for the associated financial transactions those customers make.
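The hierarchy can be pictured as nested containers. The following Python sketch models it with plain data classes. The class names and the `customer_id` property linking the two collections are hypothetical illustrations, not Vault's data model or API.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class VaultObject:
    # An object holds values, keyed by property name.
    values: dict[str, Any]


@dataclass
class Collection:
    # A collection contains objects.
    name: str
    objects: list[VaultObject] = field(default_factory=list)


@dataclass
class Vault:
    # A vault contains any number of named collections.
    collections: dict[str, Collection] = field(default_factory=dict)

    def add_collection(self, name: str) -> Collection:
        collection = Collection(name)
        self.collections[name] = collection
        return collection


vault = Vault()
customers = vault.add_collection("customers")
transactions = vault.add_collection("transactions")

customers.objects.append(VaultObject({"id": "c1", "name": "Jane Doe"}))
# The transaction is associated with its customer through a hypothetical
# "customer_id" property that holds the customer's "id".
transactions.objects.append(
    VaultObject({"id": "t1", "customer_id": "c1", "amount": 42.50})
)
```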

Collections and schemas

The data values held in a collection's objects are defined by a schema, similar to a database table schema. The schema contains properties that define the data types of all values in the collection's objects, similar to the columns of a table. All objects in a collection conform to the same schema. For example, a collection storing data about people may have a schema that specifies properties for first name, last name, SSN, etc. All schemas are based on schema prototypes. These prototypes define properties that have specific meanings or purposes within Vault, such as the item ID, owner ID, and creation time.
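A schema can be thought of as a mapping from property names to types, extended with the properties its prototype contributes. This Python sketch illustrates that idea only. The `Schema` class, the `conforms` check, and the prototype property names (`id`, `owner_id`, `creation_time`) are assumptions, and the sketch uses plain Python types instead of Vault's semantic data types.

```python
from dataclasses import dataclass

# Hypothetical prototype: built-in properties that every schema inherits.
# The names echo the examples above but are not Vault's actual names.
PROTOTYPE_PROPERTIES: dict[str, type] = {
    "id": str,
    "owner_id": str,
    "creation_time": str,
}


@dataclass
class Schema:
    properties: dict[str, type]

    def __post_init__(self) -> None:
        # Every schema is based on the prototype, so merge its properties in.
        self.properties = {**PROTOTYPE_PROPERTIES, **self.properties}

    def conforms(self, obj: dict) -> bool:
        # Reject properties the schema does not declare (like an unknown column).
        if not set(obj) <= set(self.properties):
            return False
        # Every value that is present must match its property's declared type.
        return all(isinstance(v, self.properties[k]) for k, v in obj.items())


people = Schema({"first_name": str, "last_name": str, "ssn": str})
print(people.conforms({"id": "p1", "first_name": "Jane", "ssn": "123-45-6789"}))  # True
print(people.conforms({"first_name": "Jane", "age": 30}))  # False: "age" is undeclared
```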

Semantic data types

Vault's unique ability to enable privacy by design for personally identifiable information (PII) comes from its support for semantic PII data types, such as name, social security number (SSN), credit card number, phone number, and address. Using these types, Vault understands the semantic meaning of the data and provides specialized, built-in capabilities for each data type. For example, Vault can:

  • Verify the format and values for SSN, phone number, addresses, etc.
  • Mask credit card numbers to display only the last 4 digits, as most uses don't require complete credit card numbers. See What is a transformation for more details.
  • Limit read access to sensitive information, such as SSNs, and allow access only with the user's explicit approval, for example, by confirming a 2FA code sent to their phone. This mechanism is useful when you want to prevent support systems or call center operators from gaining full access to people's data. This control is achieved using the policy feature of identity and access management.
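The approval-gated read access described in the last item can be sketched as a simple policy check. This Python example is a hypothetical model, not Vault's policy syntax or IAM API: the role names, the `ALLOWED` and `REQUIRES_APPROVAL` tables, and the `approved_by_owner` flag (standing in for a verified 2FA code) are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadRequest:
    role: str
    prop: str
    # Hypothetical stand-in for "the user confirmed a 2FA code".
    approved_by_owner: bool


# Hypothetical policy: which roles may read which properties.
ALLOWED: dict[str, set[str]] = {
    "support": {"first_name", "last_name", "ssn"},
    "billing": {"first_name", "last_name"},
}
# Properties that additionally need the user's explicit approval.
REQUIRES_APPROVAL: set[str] = {"ssn"}


def can_read(req: ReadRequest) -> bool:
    # Deny anything the role is not granted at all.
    if req.prop not in ALLOWED.get(req.role, set()):
        return False
    # Sensitive properties are readable only after explicit approval.
    if req.prop in REQUIRES_APPROVAL:
        return req.approved_by_owner
    return True


print(can_read(ReadRequest("support", "ssn", approved_by_owner=False)))  # False
print(can_read(ReadRequest("support", "ssn", approved_by_owner=True)))   # True
```

Keeping the approval requirement separate from the role grant means a call center role can be allowed to request an SSN without ever reading one unattended.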

Each semantic data type has a validator that ensures values are correctly formatted, and a data type can also support transformations.
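To illustrate how a data type can bundle a validator with transformations, the following Python sketch registers two simplified types. The `SemanticType` class, the `SSN` and `CREDIT_CARD` names, the regular expressions, and the `mask` transformation are assumptions for illustration; they do not reproduce Vault's built-in validation rules (for example, no checksum is applied to card numbers).

```python
import re
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SemanticType:
    # A validator plus any named transformations the type supports.
    validate: Callable[[str], bool]
    transformations: dict[str, Callable[[str], str]] = field(default_factory=dict)


def is_ssn(value: str) -> bool:
    # Simplified US SSN format check: NNN-NN-NNNN.
    return re.fullmatch(r"\d{3}-\d{2}-\d{4}", value) is not None


def is_card_number(value: str) -> bool:
    # Simplified check: 13 to 19 digits once spaces and dashes are removed.
    return re.fullmatch(r"\d{13,19}", re.sub(r"[ -]", "", value)) is not None


def mask_card(value: str) -> str:
    # Keep only the last 4 digits and replace the others with "*".
    digits = re.sub(r"\D", "", value)
    return "*" * max(len(digits) - 4, 0) + digits[-4:]


# Hypothetical registry of semantic types, keyed by illustrative names.
TYPES = {
    "SSN": SemanticType(validate=is_ssn),
    "CREDIT_CARD": SemanticType(
        validate=is_card_number,
        transformations={"mask": mask_card},
    ),
}

print(TYPES["SSN"].validate("123-45-6789"))  # True
print(TYPES["SSN"].validate("123456789"))    # False
# 12 asterisks followed by the last 4 digits, 1111.
print(TYPES["CREDIT_CARD"].transformations["mask"]("4111 1111 1111 1111"))
```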

See Data types for a list of the supported data types.

Schema cache

Vault maintains an in-memory cache of the schemas for your collections. For performance reasons, this cache is eventually consistent. When Vault runs as multiple instances, a schema change made through one instance may not be reflected immediately in the others, which can affect how objects are read and written until the caches converge. See the metadata cache page for more information.
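A minimal sketch of this effect, assuming each instance refreshes its cache on a fixed time-to-live (TTL) and invalidates only its own cache when it applies a schema change. These refresh mechanics and the `SchemaStore`, `SchemaCache`, and `ttl` names are hypothetical; see the metadata cache page for Vault's actual behavior. Time is passed in explicitly so the example is deterministic.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaStore:
    # Shared source of truth seen by all instances (hypothetical).
    schemas: dict[str, dict[str, str]] = field(default_factory=dict)


@dataclass
class SchemaCache:
    # Per-instance in-memory copy, reloaded at most once per `ttl` seconds.
    store: SchemaStore
    ttl: float
    _cached: dict[str, dict[str, str]] = field(default_factory=dict, init=False)
    _loaded_at: float = field(default=float("-inf"), init=False)

    def get(self, collection: str, now: float) -> dict[str, str]:
        if now - self._loaded_at >= self.ttl:
            # Copy so later store changes stay invisible until the next reload.
            self._cached = {k: dict(v) for k, v in self.store.schemas.items()}
            self._loaded_at = now
        return self._cached[collection]

    def invalidate(self) -> None:
        # Force the next get() on this instance to reload from the store.
        self._loaded_at = float("-inf")


store = SchemaStore({"customers": {"name": "NAME"}})
instance_a = SchemaCache(store, ttl=30.0)
instance_b = SchemaCache(store, ttl=30.0)
instance_a.get("customers", now=0.0)
instance_b.get("customers", now=0.0)

# Instance A applies a schema change and invalidates only its own cache.
store.schemas["customers"]["email"] = "EMAIL"
instance_a.invalidate()

print("email" in instance_a.get("customers", now=5.0))   # True
print("email" in instance_b.get("customers", now=5.0))   # False: stale copy
print("email" in instance_b.get("customers", now=30.0))  # True after reload
```

In this model, an object written through instance B during the stale window would be checked against the old schema, which is the kind of impact described above.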