Protect transcript data
Learn how to encrypt, search, and control access to sensitive data extracted from call transcripts
Businesses are widely adopting AI for labor-intensive activities that are difficult to scale. One such role is customer support, where companies deploy AI-based conversational support agents to speed up response times and increase customer satisfaction.
In addition to conversing with customers during the call, these AI agents can summarize the call, transcribe it, and extract customer contact information such as emails and phone numbers.
Transcripts, summaries, and customer contact information are typically classified as either confidential data or personally identifiable information (PII). To preserve privacy, control access, and minimize the impact of potential data breaches, you shouldn’t store transcript data as plain text in a regular database. One way to protect this information is to use a dedicated data privacy vault.
What you learn
In this tutorial, you learn how to:
- Safely encrypt and decrypt transcript data using Vault.
- Enable searches of encrypted data using blind indexes.
- Configure encryption access permissions.
Tutorial overview
The tutorial covers creating a Vault sandbox and a Node.js application to implement and run the tutorial code. You then examine encryption and decryption. First, you create custom data types for sensitive data extracted by AI from the call transcript. Then, you use these custom and standard vault data types to create a collection to define the data returned from the transcript processing. After adding some transcript data, you look at examples of encrypting and decrypting that data.
Then, as the data isn't stored in Vault and cannot take advantage of its search capabilities, you add blind indexes for the encrypted transcript data and store the encrypted data and indexes in a database before looking at examples of searching for calls by phone or email.
Finally, the tutorial looks at how to set up a Vault identity and access management configuration to control which application functions and users can encrypt and decrypt data.
Create a Vault sandbox
If you don't have a Vault account, create one:
- Go to https://app.piiano.io/register.
- Sign up with your GitHub or Google account.
- When in your Vault account, open the Default Vault sandbox environment.
- From the top bar in your default vault, save the two configuration values, Endpoint and Admin API key. You need them later.
Create a Node.js application
The code in this tutorial is executed using a Node.js application. To create the application:
- In a terminal, to create and navigate to a directory to host your Node application, run this command:
mkdir node-protect-transcripts && cd node-protect-transcripts
- To generate the package.json file for the Node application, run this command:
npm init es6 -y
- To create index.js as the entry file for your Node application, run this command:
touch index.js
- To install Vault’s TypeScript SDK and all other packages that are used in this tutorial, run this command:
npm install @piiano/vault-client jose sqlite3
- Open index.js in a text editor.
To start, you add code that:
- Imports Vault's TypeScript SDK and two more packages that you use later.
- Creates a Vault client instance that connects to your Vault sandbox.
- Calls your Vault sandbox to check and display its health status.
Here is the code. Before you paste it into index.js, replace <your_piiano_vault_endpoint> and <your_piiano_admin_api_key> with the values you copied from your Vault account. (Note that in a production environment, you should fetch the endpoint and the API key from environment variables instead of hard-coding them.)
import { VaultClient } from "@piiano/vault-client";
import * as jose from "jose";
const PVAULT_ADDR = "<your_piiano_vault_endpoint>";
const PVAULT_AUTH_TOKEN = "<your_piiano_admin_api_key>";
const piianoVaultClient = new VaultClient({
vaultURL: PVAULT_ADDR,
apiKey: PVAULT_AUTH_TOKEN,
});
const status = await piianoVaultClient.system.controlHealth();
console.log(status);
Run the Node application by invoking node index.js in the terminal. If everything is working correctly, you see this in the terminal:
{ status: 'pass' }
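The production note about environment variables could look something like the following. This is a minimal sketch, not part of the original tutorial code: the helper name loadVaultConfig is hypothetical, and it assumes the environment variables are named PVAULT_ADDR and PVAULT_AUTH_TOKEN to match the constants used in index.js.

```javascript
// Hypothetical helper (not part of the Vault SDK): reads the Vault endpoint
// and API key from environment variables instead of hard-coding them.
// Assumption: the variables are named PVAULT_ADDR and PVAULT_AUTH_TOKEN.
function loadVaultConfig(env = process.env) {
  const vaultURL = env.PVAULT_ADDR;
  const apiKey = env.PVAULT_AUTH_TOKEN;

  // Collect every missing variable so the error names all of them at once.
  const missing = [];
  if (!vaultURL) missing.push("PVAULT_ADDR");
  if (!apiKey) missing.push("PVAULT_AUTH_TOKEN");
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  // Same option names that the tutorial passes to the VaultClient constructor.
  return { vaultURL, apiKey };
}
```

With a helper like this, the client could be created with new VaultClient(loadVaultConfig()), and the application fails early with a clear message if either value is not set.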
Add custom data types
Companies commonly use Vault to store their sensitive data. However, you can also use Vault as an encryption and decryption service: You encrypt your data with Vault, store the encrypted output in your storage, and call Vault when you need to decrypt it. You implement this use case in this tutorial.
In Vault, you need a collection to encrypt and decrypt support call data. The data includes the transcript, summary, agent information, and caller phone number. It can also include any additional information extracted from the transcript in a result field.
Before progressing further, you need to decide whether to use Vault's built-in data types only or create custom data types for your data. With custom data types, you can set up custom validation, normalization, and transformation rules and define data-type-specific access controls.
To store transcripts and summaries, you add two custom data types based on the built-in LONG_TEXT type. To do this, you use code that:
- Defines a list of the custom data types to create.
- Gets a list of the custom data types in your Vault sandbox.
- Checks whether your Vault sandbox already contains a data type with each proposed name. If it doesn't, the code creates the data type. Otherwise, it keeps the existing data type.
Here is the code to add at the end of index.js:
const customDataTypes = [
{ name: "SUMMARY", base_type_name: "LONG_TEXT", description: "A summary" },
{
name: "TRANSCRIPT",
base_type_name: "LONG_TEXT",
description: "A transcript",
},
];
const existingDatatypes = await piianoVaultClient.customDataTypes.listDataTypes(
{}
);
for (const dataType of customDataTypes) {
if (!existingDatatypes.find((x) => x.name === dataType.name)) {
await piianoVaultClient.customDataTypes.addDataType({
requestBody: dataType,
});
console.log(`Created a new data type ${dataType.name}`);
} else {
console.log(`Found existing data type: ${dataType.name}`);
}
}
Define a Vault collection
You can now define a Vault collection with a schema that includes the custom data types, using code that:
- Gets the list of collections from your Vault account.
- Creates a "calls" collection, if one doesn't already exist, using a schema with six properties, including transcript and summary properties that use the custom data types.
Here is the code. Insert it at the end of index.js:
const collectionName = "calls";
const existingCollections = await piianoVaultClient.collections.listCollections(
{ format: "json" }
);
if (!existingCollections.find((x) => x.name === collectionName)) {
await piianoVaultClient.collections.addCollection({
requestBody: {
name: collectionName,
type: "DATA",
properties: [
{
name: "transcript",
data_type_name: "TRANSCRIPT",
description: "Call transcript",
},
{
name: "result",
data_type_name: "JSON",
description:
"Additional extracted information from the call transcript",
},
{
name: "summary",
data_type_name: "SUMMARY",
description: "Call summary",
},
{
name: "agent_user_id",
data_type_name: "STRING",
description: "Agent user ID",
},
{
name: "agent_group_id",
data_type_name: "STRING",
description: "Agent group ID",
},
{
name: "blind_index",
data_type_name: "STRING",
description: "Blind indexes",
},
],
},
});
console.log(`Created collection "${collectionName}"`);
} else {
console.log(`Found existing collection "${collectionName}"`);
}
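Both the data type step and the collection step follow the same "create only if missing" pattern. As a sketch, the name comparison can be pulled into a small helper so the decision logic can be checked without calling Vault. The helper name findMissingByName is hypothetical and not part of the Vault SDK, and the item names in the example are illustrative only.

```javascript
// Hypothetical helper: returns the desired items whose `name` does not
// appear among the existing items returned by a Vault list call.
function findMissingByName(desired, existing) {
  const existingNames = new Set(existing.map((item) => item.name));
  return desired.filter((item) => !existingNames.has(item.name));
}

// Example: SUMMARY already exists, so only TRANSCRIPT needs creating.
const toCreate = findMissingByName(
  [{ name: "SUMMARY" }, { name: "TRANSCRIPT" }],
  [{ name: "SUMMARY" }, { name: "EMAIL" }]
);
console.log(toCreate.map((item) => item.name)); // [ 'TRANSCRIPT' ]
```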
Add example transcript data
You now create example transcript data to encrypt and store. This data includes the transcripts array of call transcripts and support agent metadata, and the results array. The results array contains a summary of the call and extracted customer contact information.
In the directory that contains index.js, create a file called sampledata.js and paste this code into it:
export const data = {
transcripts: [
{
transcript: "This is a sample transcript 1",
agent_user_id: "U1",
agent_group_id: "G1",
},
{
transcript: "This is a sample transcript 2",
agent_user_id: "U2",
agent_group_id: "G1",
},
],
results: [
{
result: {
summary: "Summary of transcript 1",
phone: "+1-234-23456",
email: "aa@gmail.com",
},
agent_user_id: "U1",
agent_group_id: "G1",
},
{
result: {
summary: "Summary of transcript 2",
phone: "+1-234-22222",
email: "bb@gmail.com",
},
agent_user_id: "U2",
agent_group_id: "G1",
},
],
};
At the start of the index.js file, add this statement to import the data:
import { data } from "./sampledata.js";
Encrypt and decrypt transcript data
The data now needs to be protected, and there are several ways to do that. Encryption is a good option because you're dealing with free text rather than strongly structured data.
To use Vault as an encryption service, you add code that:
- Encrypts all transcripts from the data file using the Vault client, then prints an array with the resulting ciphertexts.
- Decrypts the ciphertexts back to plain text and prints out the result.
Add this code at the end of index.js:
const encryptedTranscripts = await piianoVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: data.transcripts.map((transcript) => ({
object: { fields: transcript },
})),
});
console.log(
`Encrypted transcripts: ${JSON.stringify(encryptedTranscripts, null, 2)}`
);
const decryptedTranscripts = await piianoVaultClient.crypto.decrypt({
collection: collectionName,
requestBody: encryptedTranscripts.map((x) => ({ encrypted_object: x })),
});
console.log(
`Decrypted transcripts: ${JSON.stringify(decryptedTranscripts, null, 2)}`
);
Run node index.js in the terminal. You see something like this:
Encrypted transcripts: [
{
"ciphertext": "AQABSD16KOv3TuUb6dKzmPyDeW3kXOoblQuZDAZvKn3vFpqgm/h1dJAAK8ey1OLTBnsU7DcUNaGKi7TwSDk8t0sJBtarGYD8tMQSebHCRgjCAsVU7ixn8MahgqLO3D75Vtw2GqhE/KHEFSBaanQutQv7HJpSQaWCXewfz7xzg7b0wtdbSX0jXATZmV87WCiTxZPeK5pdbg=="
},
{
"ciphertext": "AQABSD16KEzVQ1QV+/xQIebuQ9epPc20Pqjo1yyhg8oaYJD7hySP2GnYsDr99pBK2CeEGceiuT7s02/cib757r0jFM378qdbz2IGUc2nE0Z1WTyZEo2U8uwPeHKLYjo+MGN9LaDRUaM/PuAP5y3AqeVu36KNfIsiPA4CgFJrgeutJ1ufObq4hua29asgXuS7KVcrw/WTkw=="
}
]
Decrypted transcripts: [
{
"fields": {
"agent_group_id": "G1",
"agent_user_id": "U1",
"transcript": "This is a sample transcript 1"
}
},
{
"fields": {
"agent_group_id": "G1",
"agent_user_id": "U2",
"transcript": "This is a sample transcript 2"
}
}
]
When you provide Vault with data that corresponds to the collection schema, it can encrypt the data. When you provide the ciphertext, Vault can decrypt the data back to its original form. Encryption key management and rotation are handled by Vault.
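Note that in the sample output the decrypted fields come back in a different key order from the input. If you want to confirm a round trip programmatically, an order-insensitive comparison works. The following is a minimal sketch: sameFields is a hypothetical helper, not a Vault API, and it only handles flat objects with primitive values.

```javascript
// Hypothetical helper: compares two flat objects regardless of key order.
function sameFields(a, b) {
  const keysA = Object.keys(a).sort();
  const keysB = Object.keys(b).sort();
  return (
    keysA.length === keysB.length &&
    keysA.every((key, i) => key === keysB[i] && a[key] === b[key])
  );
}

// Input order from sampledata.js versus the order shown in the decrypted output.
const original = {
  transcript: "This is a sample transcript 1",
  agent_user_id: "U1",
  agent_group_id: "G1",
};
const decrypted = {
  agent_group_id: "G1",
  agent_user_id: "U1",
  transcript: "This is a sample transcript 1",
};
console.log(sameFields(original, decrypted)); // true
```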
Now encrypt the results part of the data in a similar way, and save the encrypted versions of the transcripts and results back to the data object.
Add this code at the end of index.js:
const encryptedResults = await piianoVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: data.results.map((result) => ({ object: { fields: result } })),
});
data.transcripts = data.transcripts.map((item, index) => {
item.encrypted = encryptedTranscripts[index];
return item;
});
data.results = data.results.map((item, index) => {
item.encrypted = encryptedResults[index];
return item;
});
The admin API key lets you encrypt and decrypt data in the Vault sandbox. In a production environment, you may want to assign access controls so that one component of your application can only perform encryption and other components can only decrypt. The tutorial covers setting up access controls later.
Enable search by creating blind indexes for each call
Encrypting data involves a serious user experience trade-off: plain text search is no longer possible. Enabling searching on encrypted data requires the creation of blind indexes: secure hashes of the raw data that your code can compare to hashes of search requests to return a match. Adding blind indexes would, for example, enable support personnel to find transcripts of past calls by searching for a customer's email address.
To generate a blind index for each of the customer data properties stored in the results array (customer emails and phone numbers), you add code that:
- Defines which property names to generate blind indexes for: phone and email.
- Loops through the properties of each result and, if a property name matches one of those to index, uses the Vault client to generate a hash of the property value.
- Prepends the hash with the property name and saves it to an array of blind indexes. The resulting array looks like this:
['phone:3cf8f8c2-01c9-5071-8267-b8181c3472b6', 'email:99cfa14c-e7ce-a8ad-c5c1-9cbd717ebf2e']
This process works for any properties (such as a WhatsApp handle or a Facebook username) that may be present in results.
Add this code at the end of index.js:
const keys = ["phone", "email"];
for (const item of data.results) {
item.blind_indexes = [];
const resultProperties = Object.entries(item.result);
for (const property of resultProperties) {
if (keys.some((key) => property[0] === key)) {
const hashedPropertyValue = await piianoVaultClient.crypto.hashObjects({
collection: collectionName,
requestBody: [{ object: { fields: { blind_index: property[1] } } }],
});
const blindIndex = `${property[0]}:${hashedPropertyValue[0].token_id}`;
item.blind_indexes.push(blindIndex);
}
}
}
console.log(`After building blind indexes:\r\n${JSON.stringify(data, null, 2)}`);
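A blind index only matches when the value hashed at search time is identical to the value hashed at storage time, so "AA@gmail.com" and "aa@gmail.com" would be expected to produce different indexes. One option is to normalize values before hashing them. The sketch below is an assumption about how you might do that and is not something the tutorial code does: normalizeForIndex and its rules are hypothetical.

```javascript
// Hypothetical normalization rules applied to a value before it is hashed.
function normalizeForIndex(propertyName, value) {
  const text = String(value).trim();
  if (propertyName === "email") {
    // Treat email addresses as case-insensitive for lookup purposes.
    return text.toLowerCase();
  }
  if (propertyName === "phone") {
    // Keep a leading "+" and the digits only, dropping spaces and dashes.
    const digits = text.replace(/\D/g, "");
    return text.startsWith("+") ? `+${digits}` : digits;
  }
  return text;
}

console.log(normalizeForIndex("email", "  AA@Gmail.com ")); // aa@gmail.com
console.log(normalizeForIndex("phone", "+1-234-23456")); // +123423456
```

For lookups to succeed, the same function would need to be applied both when building the blind indexes and when hashing a search term.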
Store encrypted data in a database
Now, set up a database and store the data. Use the sqlite3 npm package to take advantage of an in-memory SQLite database.
First, create a JavaScript file called db.js:
touch db.js
Now add the code to create a database table to hold encrypted call data: the transcript (encrypted_transcript), the results of processing the transcript to extract summary and additional data (encrypted_result), and the blind indexes (blind_indexes) to enable search over encrypted data.
Open db.js in your text editor and insert this code:
import sqlite3 from "sqlite3";
const db = new sqlite3.Database(":memory:");
export const createDatabase = (initialData) => {
db.serialize(() => {
db.run(`CREATE TABLE IF NOT EXISTS call_data (
ID INTEGER PRIMARY KEY,
encrypted_transcript TEXT NOT NULL,
encrypted_result TEXT NOT NULL,
blind_indexes TEXT NOT NULL
);`);
for (const record of initialData) {
db.run(
`INSERT INTO call_data (encrypted_transcript, encrypted_result, blind_indexes)
VALUES (?, ?, ?);`,
[
record[0].encrypted.ciphertext,
record[1].encrypted.ciphertext,
JSON.stringify(record[1].blind_indexes),
]
);
}
db.each("SELECT * FROM call_data", (err, row) => {
if (err) throw err; // surface query errors instead of logging undefined
console.log(row);
});
});
};
In index.js, add this to the list of import statements:
import * as db from "./db.js";
Now, at the end of index.js, add code that merges the transcripts and results arrays into one, producing a data structure suitable for inserting into the database:
const transcriptsAndResults = data.transcripts.map((transcript, index) => {
return [data.transcripts[index], data.results[index]];
});
console.log(JSON.stringify(transcriptsAndResults, null, 2));
db.createDatabase(transcriptsAndResults);
Running the code inserts the data into the table, selects it, and outputs the returned rows to the console.
{
ID: 1,
encrypted_transcript: 'AQABtR+CvXYxTuPXJ8DgzbAuxbmroAqnkhwMobKKC8jefUAlTltvKeha0i+XJFH6938ha+ck83wco2bS8MKI7RgE/u2lbFBVvCjwkdhJG6AdIEcsjnhmS67FUwksjq2FCUG4TN+6Op0wHDwpQIhMTs7rheZKWY4xrf8sHnqtZoBdw5kVHar92qrXpYoC4Coj3h/pxGRLqA==',
encrypted_result: 'AQABtR+CvXFPj/KwDrbTP1QALfYnRqoNWB2M7KbOVTKUT3z90IuCNAJngGQVPQf9PungfVj7xyUJPypagoahwnVHlnVWb5g4nrBPi4FR2KB0ePGY5+6XJUlXOV2oVAzDlbK6UVHtpxcYMjpEF/pEqtoW4vfVE8VCTeJcuUfJf1mzb1+R20yl2HZ3JyLiNAVy/W06GHkMAT/bMJlQcuX/4ns/tJVmAZhvmIAzgAtvUoEasGJCS6VsKKg/gFfJSG3Lxsnu',
blind_indexes: '["phone:9108b723-a05d-df3f-64b8-7159ece218e9","email:5de0d0b4-29e7-3610-4560-67d16db168d6"]'
}
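The merge step in index.js pairs items by position, so if the two arrays ever differ in length, some pairs contain undefined and the insert fails later. A slightly more defensive variant, shown here as a sketch with a hypothetical pairCalls helper, checks the lengths first:

```javascript
// Hypothetical helper: pairs each transcript with the result at the same
// position, failing fast if the two arrays are out of step.
function pairCalls(transcripts, results) {
  if (transcripts.length !== results.length) {
    throw new Error(
      `Expected equal lengths, got ${transcripts.length} transcripts and ${results.length} results`
    );
  }
  return transcripts.map((transcript, index) => [transcript, results[index]]);
}

// Illustrative placeholder objects, not real call data.
const pairs = pairCalls(
  [{ id: "t1" }, { id: "t2" }],
  [{ id: "r1" }, { id: "r2" }]
);
console.log(pairs.length); // 2
```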
Look up a call by phone or email
Now that you’ve generated blind indexes and inserted them into a database, verify that they enable you to search calls by phone or email.
First, add the code that queries the database to get encrypted call data that matches a hash in its blind_indexes column. Add this code at the end of db.js:
export const searchByEmail = async (blindIndex) => {
const results = await new Promise((resolve, reject) => {
db.all(
`SELECT encrypted_transcript, encrypted_result
FROM call_data
WHERE blind_indexes LIKE ?`,
// Bind the pattern as a parameter rather than interpolating it into the SQL.
[`%${blindIndex}%`],
(err, rows) => {
if (err) reject(err);
else resolve(rows);
}
);
});
db.close();
return results;
};
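The query above matches the blind index as a substring of the stored JSON text. Because blind_indexes holds a JSON array, an alternative is to parse it and compare entries exactly. The following sketch shows that comparison on rows already loaded into memory. matchRowsByBlindIndex is a hypothetical helper, the row shape mirrors the call_data table, and the index values are shortened placeholders.

```javascript
// Hypothetical helper: keeps rows whose blind_indexes JSON array contains
// the exact blind index string (for example "email:<hash>").
function matchRowsByBlindIndex(rows, blindIndex) {
  return rows.filter((row) => {
    const indexes = JSON.parse(row.blind_indexes);
    return indexes.includes(blindIndex);
  });
}

const rows = [
  { ID: 1, blind_indexes: '["phone:aaa","email:bbb"]' },
  { ID: 2, blind_indexes: '["phone:ccc","email:ddd"]' },
];
console.log(matchRowsByBlindIndex(rows, "email:bbb").map((row) => row.ID)); // [ 1 ]
```

Filtering in application code like this is only practical for small tables. For larger data sets, storing each blind index in its own indexed column or table is likely to scale better.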
In index.js, you add code that defines the email address you want to search for. You then use the Vault client to hash the email and prepend the returned hash with email:, the same format you used earlier to store blind indexes. You use this value to search the database by calling searchByEmail. The search results are returned in encrypted form, so you use the Vault client to decrypt them. Then, you display the decrypted record, including the plain text email, in the console.
Open index.js and add this code at the end of the file:
const lookupEmail = "aa@gmail.com";
const hashedEmail = await piianoVaultClient.crypto.hashObjects({
collection: collectionName,
requestBody: [{ object: { fields: { blind_index: lookupEmail } } }],
});
const lookupEmailBlindIndex = `email:${hashedEmail[0].token_id}`;
const encryptedSearchResults = await db.searchByEmail(lookupEmailBlindIndex);
const decryptedSearchResults = await piianoVaultClient.crypto.decrypt({
collection: collectionName,
requestBody: Object.values(encryptedSearchResults[0]).map((value) => ({
encrypted_object: { ciphertext: value },
})),
});
console.log(
`Decrypted search results: ${JSON.stringify(decryptedSearchResults, null, 2)}`
);
Run node index.js in the terminal. You see decrypted search results in the console that look like this:
Decrypted search results: [
{
"fields": {
"agent_group_id": "G1",
"agent_user_id": "U1",
"transcript": "This is a sample transcript 1"
}
},
{
"fields": {
"agent_group_id": "G1",
"agent_user_id": "U1",
"result": {
"email": "aa@gmail.com",
"phone": "+1-234-23456",
"summary": "Summary of transcript 1"
}
}
}
]
If, rather than using Vault for encryption only, you had stored your data in Vault, you could achieve the same result by calling the search API. Storing the data in Vault also allows you to perform substring queries on encrypted data.
Configure encryption permissions for the server
In a real-world system, you may want to restrict the tasks that various parts of the system can perform. For example, you may want to allow the server to encrypt all call information but not be able to decrypt data.
You create an identity and access management (IAM) policy to do this. You define Vault IAM policies in .toml format. You can edit these details in your Vault sandbox by going to the Identity and access management section. Here you see how to generate a .toml formatted IAM configuration in JavaScript and apply it using the Vault client.
You now add code that uses the Vault client to set the IAM configuration using a string in .toml format. The configuration:
- Defines the CallsEncryptPolicy policy that allows encryption of call data in Vault's calls collection.
- Assigns that policy to the CallsEncryptRole role.
- Assigns CallsEncryptRole to the Server user, enabling the server to encrypt any call data.
Add this code at the end of index.js:
// Wait for the IAM configuration to be applied before requesting the API key.
await piianoVaultClient.iam.setIamConf({
requestBody: `
[policies]
# Allow encryption on the "calls" collection
[policies.CallsEncryptPolicy]
operations = ["encrypt"]
policy_type = "allow"
reasons = ["AppFunctionality"]
resources = ["calls/properties/*"]
[roles]
[roles.CallsEncryptRole]
capabilities = ["CapCryptoEncrypter"]
policies = ["CallsEncryptPolicy"]
[users]
[users.Server]
role = "CallsEncryptRole"
`,
});
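Because the configuration is just a string, you can also assemble it from values in JavaScript. The sketch below rebuilds the same policy, role, and user from parameters. buildEncryptOnlyIamConf is a hypothetical helper, not part of the Vault SDK, and it does no escaping, so it assumes the collection and user names contain only letters, digits, and underscores.

```javascript
// Hypothetical helper: builds the TOML IAM configuration shown above for a
// given collection and user name. No escaping is performed on the inputs.
function buildEncryptOnlyIamConf({ collection, user }) {
  // JSON.stringify yields double-quoted strings, which are valid TOML
  // basic strings for simple values like these.
  const q = (value) => JSON.stringify(value);
  const resource = `${collection}/properties/*`;
  return [
    "[policies]",
    "[policies.CallsEncryptPolicy]",
    `operations = [${q("encrypt")}]`,
    `policy_type = ${q("allow")}`,
    `reasons = [${q("AppFunctionality")}]`,
    `resources = [${q(resource)}]`,
    "[roles]",
    "[roles.CallsEncryptRole]",
    `capabilities = [${q("CapCryptoEncrypter")}]`,
    `policies = [${q("CallsEncryptPolicy")}]`,
    "[users]",
    `[users.${user}]`,
    `role = ${q("CallsEncryptRole")}`,
  ].join("\n");
}

console.log(buildEncryptOnlyIamConf({ collection: "calls", user: "Server" }));
```

The returned string could be passed as the requestBody of the setIamConf call in place of the inline template string.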
Now, check whether the IAM policy you've defined and applied works the way you expect: enabling the server to encrypt but preventing it from decrypting. To do this, you add code that:
- Requests Vault to generate an API key for the Server user defined in the IAM policy.
- Creates a Vault client instance using the API key to represent what the server can do, following the IAM policy.
- Creates a few sample transcripts and uses the server client to encrypt them.
- Tries to decrypt the first encrypted transcript, but fails due to lacking permissions.
Add this code at the end of index.js, replacing <your_piiano_vault_endpoint> with the Vault URL used in the first step:
const {api_key: apiKey} = await piianoVaultClient.iam.regenerateUserApiKey({
requestBody: {name: "Server"}
})
console.log(apiKey)
const serverVaultClient = new VaultClient({
vaultURL: "<your_piiano_vault_endpoint>",
apiKey: apiKey,
});
const sampleTranscripts = [
{"transcript": "hi bla1", "agent_user_id": "user-1", "agent_group_id": "group-1"},
{"transcript": "hi bla2", "agent_user_id": "user-1", "agent_group_id": "group-2"},
{"transcript": "hi bla3", "agent_user_id": "user-2", "agent_group_id": "group-1"},
{"transcript": "hi bla4", "agent_user_id": "user-2", "agent_group_id": "group-2"}
]
console.log("The server can encrypt:")
const encryptedUser1Group1 = await serverVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: [{object: {fields: sampleTranscripts[0]}}]
})
console.log(`Encrypted transcript: ${JSON.stringify(encryptedUser1Group1, null, 2)}`)
const encryptedUser1Group2 = await serverVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: [{object: {fields: sampleTranscripts[1]}}]
})
const encryptedUser2Group1 = await serverVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: [{object: {fields: sampleTranscripts[2]}}]
})
const encryptedUser2Group2 = await serverVaultClient.crypto.encrypt({
collection: collectionName,
requestBody: [{object: {fields: sampleTranscripts[3]}}]
})
console.log("But the server can't decrypt anything:")
try {
const decryptedTranscript = await serverVaultClient.crypto.decrypt({
collection: collectionName,
requestBody: [{encrypted_object: encryptedUser1Group1[0]}]
})
} catch (error) {
console.error(error.message)
}
Run node index.js in the terminal. You see that the decryption attempt returns an error:
The caller doesn't have the required access rights.
PV1007: The operation is forbidden due to missing capabilities.
Context: {
"username": "Server"
}
You can decrypt the data using the original Vault client, piianoVaultClient. That's because piianoVaultClient uses PVAULT_AUTH_TOKEN, which is set to the admin key that allows unrestricted access in the default Vault sandbox environment.
Summary
With the increasing use of AI agents to transcribe and summarize calls and extract actionable customer contact information comes the need to secure and protect customers' PII data.
In this tutorial, you've seen how to:
- Use Vault to secure sensitive information by encrypting it.
- Provide further protection for the data with identity and access management policies that ensure only the right users can encrypt and decrypt the data.
- Add a blind index so users can perform a full string match search on the encrypted data. You also learned that by storing the data in Vault, you can perform substring searches on the encrypted data, too.