Work with files and blobs

Learn how to secure files and unstructured data with Piiano Vault

Vault provides two mechanisms for protecting the content of files and unstructured data:

  • Store the data in a collection using an attribute with the BLOB data type.
  • Use the cryptographic API to encrypt the data and store the resulting ciphertext in another system.
Note: Vault doesn't process files directly, only their contents, even when using the Vault SDK. You must read a file's content and pass it to Vault to store or encrypt. When you read or decrypt content, you must save it in the appropriate file format.

Work with blobs in objects

This option is ideal when you store a small number of data items per user, such as an identity photograph, avatar, or biometric data.

Specify a blob in a collection schema

You specify a blob as part of a collection in the same way as any other collection attribute. For example, for personal data, including an ID photo, you might specify a collection using a PVSchema like this:

employees PERSONS (
full_name NAME NULL,
id_photo BLOB
);

Or in JSON format:

{
"type": "PERSONS",
"name": "employees",
"properties": [
{
"name": "full_name",
"data_type_name": "NAME",
"is_nullable": true
},
{
"name": "id_photo",
"data_type_name": "BLOB"
}
]
}

Save and retrieve blob data

You can send blobs to and retrieve them from Vault:

  1. Using JSON messages by encoding the file's contents in base64. With this method, you can pass multiple blobs in one call.
  2. Using raw file contents by setting the request's Content-Type header (when storing) or Accept header (when retrieving) to application/octet-stream. With this method, you can pass one blob per call.

The raw file contents option is more performant as it requires the transmission of fewer bytes and avoids the conversion to and from base64.
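To make the size difference concrete: base64 represents every 3 bytes of input as 4 output characters, padding the final group, so a blob sent in a JSON message is roughly a third larger than the same blob sent raw. The following is a minimal sketch of that arithmetic. The function names `base64_len` and `base64_overhead` are illustrative and not part of Vault.

```shell
#!/usr/bin/env bash
# Illustrative helpers (not part of Vault).

# Print the length, in characters, of the padded base64 encoding of N raw
# bytes. Every 3 input bytes become 4 output characters, and a final
# partial group is padded to 4.
base64_len() {
  local n=$1
  echo $(( (n + 2) / 3 * 4 ))
}

# Compare the raw size of a file with its base64-encoded size.
# Usage: base64_overhead path-to-file.bin
base64_overhead() {
  local raw
  # tr removes the leading spaces that some wc implementations print.
  raw=$(wc -c < "$1" | tr -d ' ')
  echo "raw=${raw} base64=$(base64_len "$raw")"
}
```

For example, `base64_len 5242880` prints `6990508`: a blob of 5 × 1024 × 1024 bytes grows to about 6.99 million characters once encoded, before counting the rest of the JSON message.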

Using JSON messages

To store the employee's ID photo, you encode the image data in base64 and then write it to the collection. For example, using the REST API Add object operation, like this:

curl -s -X POST \
--url 'http://localhost:8123/api/pvlt/1.0/data/collections/employees/objects' \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{
"full_name":"Jane Doe",
"id_photo":"JVBERi0xLjMNJeLjz9MNCjcgMCBvYmoNPDwvTGluZWFyaXplZCAxL0wgNzk0NS9PIDkvRSAzNT…VFT0YNCg=="
}'

You can then retrieve the blob as you would any other data from a collection, and recover the image by decoding the returned base64 string.
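The encoding and decoding steps around the request can be sketched in shell as follows. This is a minimal sketch, not Vault tooling: the helper names `encode_blob`, `build_employee_json`, and `decode_blob` are hypothetical, and the JSON builder assumes the name contains no characters that need JSON escaping.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of Vault.

# Encode a file as a single-line base64 string.
# tr strips the line wraps that some base64 implementations insert.
encode_blob() {
  base64 < "$1" | tr -d '\n'
}

# Build the JSON body for the Add object request shown above.
# Assumes full_name contains no characters that need JSON escaping.
build_employee_json() {
  local full_name=$1 photo_path=$2
  printf '{"full_name":"%s","id_photo":"%s"}' \
    "$full_name" "$(encode_blob "$photo_path")"
}

# Decode a base64 string (for example, the id_photo value read from a
# response) back into a binary file.
decode_blob() {
  local b64=$1 out_path=$2
  printf '%s' "$b64" | base64 --decode > "$out_path"
}

# Example (requires a running Vault, so it is left commented out):
# build_employee_json 'Jane Doe' path-to-file.bin |
#   curl -s -X POST \
#     --url 'http://localhost:8123/api/pvlt/1.0/data/collections/employees/objects' \
#     -H 'Authorization: Bearer pvaultauth' \
#     -H 'Content-Type: application/json' \
#     -d @-
```

Piping the body to curl with `-d @-` avoids placing a large base64 string on the command line.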

Using raw file contents

To store the employee's ID photo, you write it to the collection specifying the property in the query string. For example, using the REST API Add object operation, like this:

curl -s -X POST \
--url 'http://localhost:8123/api/pvlt/1.0/data/collections/employees/objects?prop=id_photo' \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/octet-stream' \
--data-binary @path-to-file.bin

You can then retrieve the blob in base64, as you would any other data from a collection. Alternatively, you can get your blob in raw form like this:

OBJ_ID=32077c80-3792-4a45-a957-e365bb1c9533
curl -s -X GET \
--url "http://localhost:8123/api/pvlt/1.0/data/collections/employees/objects/${OBJ_ID}?props=id_photo" \
-H "Authorization: Bearer pvaultauth" \
-H "Accept: application/octet-stream" \
-o path-to-file.bin

Or using the CLI like this:

pvault object get-blob --id ${OBJ_ID} --prop id_photo -c employees -o path-to-file.bin

Limitations

By default, Vault limits the size of a blob to 5MB. The PVAULT_DB_MAX_BLOB_LENGTH environment variable controls this limit.
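A client can check a file's size before sending it, rather than waiting for Vault to reject the request. The sketch below is one way to do that. `MAX_BLOB_BYTES` is a hypothetical local variable, not a Vault setting, and its default assumes that 5MB means 5 × 1024 × 1024 bytes and that the limit applies to the raw content; confirm both against your Vault configuration.

```shell
#!/usr/bin/env bash
# Sketch only: a client-side size check before sending a blob.
# MAX_BLOB_BYTES is a hypothetical local variable, not a Vault setting;
# its default assumes that "5MB" means 5 * 1024 * 1024 bytes.
MAX_BLOB_BYTES=${MAX_BLOB_BYTES:-5242880}

# Succeeds (exit status 0) if the file is within the limit.
# Usage: blob_fits path-to-file.bin
blob_fits() {
  local size
  # tr removes the leading spaces that some wc implementations print.
  size=$(wc -c < "$1" | tr -d ' ')
  [ "$size" -le "$MAX_BLOB_BYTES" ]
}
```

A typical use is `blob_fits path-to-file.bin || echo "file exceeds the blob limit" >&2` before calling the Add object operation.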

Use encryption to work with blobs

This option is best when you have a large number of files or large files, such as medical images. In this case, you store the encrypted copies of the blobs in your own storage solution (for example, a database or S3) and use Vault as an encryption and decryption service.

Specify collection schema

The APIs that encrypt and decrypt blobs build on the APIs that encrypt and decrypt structured object data, so they require a collection to work against. A basic collection needs only a blob attribute, defined like this using a PVSchema:

files DATA (
file BLOB
);

Or in JSON format:

{
"type": "DATA",
"name": "files",
"properties": [
{
"name": "file",
"data_type_name": "BLOB"
}
]
}

Encrypt your file

Now you pass the binary content of the file to the REST API encrypt blob operation like this:

curl --request POST \
--url 'http://localhost:8123/api/pvlt/1.0/data/collections/files/encrypt/blob?reason=Maintenance&prop=file&type=randomized&scope=default&tags=tag1%2Ctag2' \
--header 'Authorization: Bearer pvaultauth' \
--header 'Content-Type: application/octet-stream' \
--data-binary @path-to-file.bin

The operation returns ciphertext, which you can store in a database optimized for large objects or on a file server.

Decrypt your blob

To retrieve the content of the file, you pass the ciphertext to the REST API decrypt blob operation like this:

curl --request POST \
--url 'http://localhost:8123/api/pvlt/1.0/data/collections/files/decrypt/blob?reason=Maintenance&prop=file&scope=default' \
--header 'Authorization: Bearer pvaultauth' \
--header 'Content-Type: application/octet-stream' \
--data-binary @path-to-encrypted-file.bin
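The two requests above can be wrapped in a single helper that reads a file, calls the operation, and writes the response to another file. This is a minimal sketch rather than Vault tooling: `vault_blob_op`, `VAULT_URL`, and `VAULT_TOKEN` are hypothetical names, the defaults are the values used in the examples on this page, and the optional tags parameter is omitted for brevity.

```shell
#!/usr/bin/env bash
# Sketch only: a wrapper around the two requests shown above.
# VAULT_URL and VAULT_TOKEN are illustrative variable names; the defaults
# are the values used in this page's examples.
VAULT_URL=${VAULT_URL:-http://localhost:8123}
VAULT_TOKEN=${VAULT_TOKEN:-pvaultauth}

# Send a file's bytes to the encrypt or decrypt blob operation and write
# the response body to out_path.
# Usage: vault_blob_op encrypt|decrypt in_path out_path
vault_blob_op() {
  local op=$1 in_path=$2 out_path=$3
  local query='reason=Maintenance&prop=file&scope=default'
  # The encrypt example above also passes the encryption type.
  if [ "$op" = encrypt ]; then query="${query}&type=randomized"; fi
  # --fail makes curl return a non-zero status on an HTTP error response.
  curl --fail --silent --show-error --request POST \
    --url "${VAULT_URL}/api/pvlt/1.0/data/collections/files/${op}/blob?${query}" \
    --header "Authorization: Bearer ${VAULT_TOKEN}" \
    --header 'Content-Type: application/octet-stream' \
    --data-binary "@${in_path}" \
    --output "$out_path"
}
```

With a running Vault, `vault_blob_op encrypt path-to-file.bin path-to-encrypted-file.bin` followed by `vault_blob_op decrypt path-to-encrypted-file.bin restored.bin` should give back the original bytes, which you can verify with `cmp path-to-file.bin restored.bin`.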