Find PII in texts
Learn how to analyze strings to determine whether they contain PII data and find the locations and types of that PII data
You may need to determine whether unstructured data, such as log records or transcripts, contain PII data. The Vault analysis API operations and CLI commands enable you to do this. These features can identify a wide range of universal PII data and country-specific PII data for Canada, India, the UK, and the USA. For a complete list of detected PII types, see Analyze identified PII types. The API and CLI commands can detect PII in Arabic, English, French, German, Hindi, Italian, Japanese, Korean, Portuguese, Spanish, and simplified and traditional Chinese.
This feature:
- is rate-limited:
  - For sandbox users, the rate limit is 10 requests per minute and 1,000 requests per day, and use is subject to a fair use policy for evaluation purposes.
  - For production users, the contains operation can be called more frequently than the locate operation; your purchased plan determines the rate.
- uses AI to detect PII. As such, it's possible for PII to be missed or non-PII text to be classified as PII.
- is enabled on Vault SaaS only. If you're interested in running it with a self-hosted deployment, or want more information on rates, contact us.
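As a rough illustration of staying under the sandbox limit of 10 requests per minute, a client could throttle itself before each call. The sketch below is a minimal sliding-window limiter; the class name `SlidingWindowLimiter` and its parameters are hypothetical and not part of Vault, it does not track the 1,000 requests per day limit, and it says nothing about how the service itself enforces limits.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side throttle: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=10, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._calls = deque()    # timestamps of recent calls, oldest first

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        # Drop timestamps that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
```

Calling `limiter.wait()` immediately before each analysis request would then space requests out on the client side; a production plan with a different rate would need different values for `max_calls` and `period`.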
Analyze strings for PII content
To determine whether there is PII in text strings, use the contains PII data API operation or CLI command.
You do this using the CLI like this:
pvault \
--addr=https://<Vault SaaS Endpoint> \
--authtoken=<Vault SaaS API key> \
analysis contains --lang=en --text='My credit card number 1111-0000-1111-0000 has a minimum payment of $24.53.'
You get a response similar to this:
+--------------------------------+---------------------+-----------+
| text | type | score |
+--------------------------------+---------------------+-----------+
| My credit card number | CREDIT_DEBIT_NUMBER | 0.8740872 |
| 1111-0000-1111-0000 has a | | |
| minimum payment of $24.53. | | |
+--------------------------------+---------------------+-----------+
Or using the API operation like this:
curl -v --request POST \
--url 'https://<Vault SaaS Endpoint>/api/pvlt/1.0/data/analysis/contains' \
--header 'Authorization: Bearer <Vault SaaS API key>' \
--header 'Content-Type: application/json' \
--data '{
"language": "en",
"text": [
"My credit card number 1111-0000-1111-0000 has a minimum payment of $24.53."
]
}'
You get a response similar to this:
[
{
"labels": [
{
"type": "CREDIT_DEBIT_NUMBER",
"score": 0.8740872
}
]
}
]
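Because detection is AI-based and each label carries a score, a caller may want to act only on labels above some confidence level. The following is a minimal sketch of that idea, using the sample response shown above. The function name `flag_texts` and the 0.5 default threshold are illustrative assumptions, as is the reading that the response array holds one entry per input text, in request order.

```python
import json

# Sample response copied from the example above.
sample_response = json.loads("""
[
  {"labels": [{"type": "CREDIT_DEBIT_NUMBER", "score": 0.8740872}]}
]
""")


def flag_texts(response, threshold=0.5):
    """For each entry, list the PII types scoring at or above `threshold`."""
    flagged = []
    for entry in response:
        types = [
            label["type"]
            for label in entry.get("labels", [])
            if label["score"] >= threshold
        ]
        flagged.append(types)
    return flagged


print(flag_texts(sample_response))                 # [['CREDIT_DEBIT_NUMBER']]
print(flag_texts(sample_response, threshold=0.9))  # [[]]
```

A higher threshold reduces false positives at the cost of missing more PII, so the right value depends on how the result is used.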
Locate PII content in strings
To find the location and type of PII in text strings, use the locate PII data API operation or CLI command.
You do this using the CLI like this:
pvault \
--addr=https://<Vault SaaS Endpoint> \
--authtoken=<Vault SaaS API key> \
analysis locate --lang=en --text='My credit card number 1111-0000-1111-0000 has a minimum payment of $24.53.'
You get a response similar to this:
+--------------------------------+---------------------+------------+--------------+------------+
| text | type | score | begin_offset | end_offset |
+--------------------------------+---------------------+------------+--------------+------------+
| My credit card number | CREDIT_DEBIT_NUMBER | 0.99997807 | 22 | 41 |
| 1111-0000-1111-0000 has a | | | | |
| minimum payment of $24.53. | | | | |
+--------------------------------+---------------------+------------+--------------+------------+
Or using the API operation like this:
curl -v --request POST \
--url 'https://<Vault SaaS Endpoint>/api/pvlt/1.0/data/analysis/locate' \
--header 'Authorization: Bearer <Vault SaaS API key>' \
--header 'Content-Type: application/json' \
--data '{
"language": "en",
"text": [
"My credit card number 1111-0000-1111-0000 has a minimum payment of $24.53."
]
}'
You get a response similar to this:
[
{
"detections": [
{
"type": "CREDIT_DEBIT_NUMBER",
"score": 0.99997807,
"begin_offset": 22,
"end_offset": 41
}
]
}
]
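One way to use the offsets is to pull out or mask the detected span. The sketch below assumes `begin_offset` is a zero-based character index and `end_offset` is exclusive; this matches the sample (characters 22 to 41 are exactly the card number) but the source does not state the unit, so non-ASCII text may need different handling. The helper names `extract_spans` and `redact` are hypothetical, and `redact` assumes detections do not overlap.

```python
text = ("My credit card number 1111-0000-1111-0000 "
        "has a minimum payment of $24.53.")

# Detections for the first (and only) input text, from the sample response above.
detections = [
    {"type": "CREDIT_DEBIT_NUMBER", "score": 0.99997807,
     "begin_offset": 22, "end_offset": 41},
]


def extract_spans(text, detections):
    """Return (type, substring) pairs for each detection."""
    return [(d["type"], text[d["begin_offset"]:d["end_offset"]])
            for d in detections]


def redact(text, detections):
    """Replace each detected span with a [TYPE] placeholder."""
    # Work from the end of the string so earlier offsets stay valid.
    ordered = sorted(detections, key=lambda d: d["begin_offset"], reverse=True)
    for d in ordered:
        text = (text[:d["begin_offset"]]
                + "[" + d["type"] + "]"
                + text[d["end_offset"]:])
    return text


print(extract_spans(text, detections))
print(redact(text, detections))
```

Replacing spans from right to left avoids recomputing offsets after each substitution changes the string length.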