Search objects by substring
Learn how to search for objects in Vault using substring search
You can search for objects that Vault stores encrypted using full-match or substring search. To learn about full-match searches, see the Search objects guide. This guide explains how to search for objects using substring search.
You can return a subset of the found object’s properties from the search. You can also request transformations of values, where available.
You can use the unsafe option to get all the values of the objects (though it is not recommended). Using unsafe can be combined with the show_builtins option to include the built-in properties.
Whether your request for object values succeeds depends on the permissions you've been granted as part of the Vault’s identity and access management settings.
Overview
To search for objects in a collection using substring search, you use the CLI search objects or the REST API search objects operation passing the property or properties you want to get and the collection name.
To search for objects using substring search, the property you want to search must be indexed for substring queries. You can configure the substring index for a property when you create a collection.
Vault refreshes the substring index every few seconds (see the PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL environment variable). This means that if you add or update an object, you may not be able to find it with a substring search immediately.
Configure a property for substring search
When you create a collection, you can configure a property for substring search. To do this, set the is_substring_index field of the property to true.
You can define it in a PVSchema like this:
users PERSONS (
  name NAME SUBSTRING_INDEX,
)
or in JSON format, like this:
{
  "type": "PERSONS",
  "name": "users",
  "properties": [
    {
      "name": "name",
      "data_type_name": "NAME",
      "is_substring_index": true
    }
  ]
}
Note: Substring index is only supported for these data types:
- STRING
- LONG_TEXT
- EMAIL
- EMAIL_STRICT
- URL
- NAME
- GENDER
- BAN
- SSN
- ADDRESS
- CC_HOLDER_NAME
- US_BANK_ROUTING
- US_BANK_ACCOUNT_NUMBER
- Custom data types based on the STRING data type.
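As a minimal sketch, the JSON definition and the supported-type list above could be combined into a small helper that refuses to mark an unsupported property for substring indexing. The function names and the custom_base_type argument are hypothetical and are not part of Vault; only the field names (name, data_type_name, is_substring_index, type, properties) come from the JSON example above.

```python
# Data types that this guide lists as supporting a substring index.
SUBSTRING_INDEXABLE_TYPES = {
    "STRING", "LONG_TEXT", "EMAIL", "EMAIL_STRICT", "URL", "NAME", "GENDER",
    "BAN", "SSN", "ADDRESS", "CC_HOLDER_NAME", "US_BANK_ROUTING",
    "US_BANK_ACCOUNT_NUMBER",
}


def supports_substring_index(data_type_name, custom_base_type=None):
    """Return True if the guide lists this data type as indexable.

    custom_base_type is a hypothetical argument: pass the base type of a
    custom data type, which is indexable only when it is based on STRING.
    """
    if custom_base_type is not None:
        return custom_base_type == "STRING"
    return data_type_name in SUBSTRING_INDEXABLE_TYPES


def make_property(name, data_type_name, substring_index=False, custom_base_type=None):
    """Build one property definition in the JSON shape shown above."""
    if substring_index and not supports_substring_index(data_type_name, custom_base_type):
        raise ValueError(f"{data_type_name} does not support a substring index")
    prop = {"name": name, "data_type_name": data_type_name}
    if substring_index:
        prop["is_substring_index"] = True
    return prop


def make_collection(name, properties, collection_type="PERSONS"):
    """Build a collection definition in the JSON shape shown above."""
    return {"type": collection_type, "name": name, "properties": properties}


users = make_collection("users", [make_property("name", "NAME", substring_index=True)])
```

Serializing users with json.dumps produces the same structure as the JSON definition shown earlier in this section.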
Search patterns
You run a substring search by specifying a search pattern. Supported patterns are a subset of glob patterns. The supported wildcard characters are:
* - matches zero or more characters.
? - matches exactly one character.
For example:
*john* - matches any value that contains the substring "john".
john* - matches any value that starts with "john".
*john - matches any value that ends with "john".
j?hn - matches any value that has "j" as the first character, "h" as the third character, and "n" as the fourth character.
Searches are case-insensitive, so *john* matches any value that contains the strings "John", "john", "JOHN", etc.
A search pattern must contain at least two three-letter sequences (trigrams). For example, *jo* is too short for a search pattern, but *john* (which contains the three-letter combinations joh and ohn) and *jon*snow* are valid.
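The pattern rules above can be illustrated with a short local sketch. It is not Vault's implementation: it assumes that trigrams are counted as overlapping three-character windows inside each literal run of the pattern, and that both * and ? break a run. This guide does not state how Vault treats ? when counting trigrams, so that part in particular is an assumption. All function names are hypothetical.

```python
import re


def pattern_trigrams(pattern):
    """List the trigrams in the literal parts of a pattern (assumed counting rule)."""
    found = []
    # Assumption: a trigram never spans a wildcard, so split on * and ?.
    for literal in re.split(r"[*?]", pattern.lower()):
        found.extend(literal[i:i + 3] for i in range(len(literal) - 2))
    return found


def is_long_enough(pattern):
    """Apply the 'at least two trigrams' rule under the assumption above."""
    return len(pattern_trigrams(pattern)) >= 2


def matches(pattern, value):
    """Case-insensitive match where * is any run of characters and ? is one character."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None


print(pattern_trigrams("*john*"))       # the two trigrams named in the text
print(is_long_enough("*jo*"))           # too short
print(is_long_enough("*jon*snow*"))     # valid
print(matches("john*", "Johnny Cash"))  # starts with "john", ignoring case
```

The matcher translates the pattern by hand instead of using fnmatch, because fnmatch also interprets [...] character classes, which are not among the supported wildcard characters listed above.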
Querying objects
To search for objects in a collection using substring search, use the CLI search objects command or the REST API search objects operation, passing the property or properties you want to get and the collection name. The query operator for substring search is like.
In CLI, you can search for objects with a substring search like this:
pvault object query -c users --like name="*john*" --props name
or in API, like this:
curl -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "*john*"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/users/query/objects?props=name&reason=AppFunctionality"
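For callers that prefer a script over curl, the same request can be sketched with Python's standard library. The URL, token, headers, and body mirror the curl example above; the function names are hypothetical, and the optional match argument is included only because this guide also describes combining like with a full-match requirement.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:8123"  # host and port taken from the curl example
TOKEN = "pvaultauth"                # token taken from the curl example


def build_query_request(collection, like, props, match=None, reason="AppFunctionality"):
    """Build (but do not send) a POST request equivalent to the curl example."""
    body = {"like": like}
    if match:
        body["match"] = match
    # safe="," keeps the comma-separated props list readable in the URL.
    query = urlencode({"props": ",".join(props), "reason": reason}, safe=",")
    url = (
        f"{BASE_URL}/api/pvlt/1.0/data/collections/"
        f"{quote(collection)}/query/objects?{query}"
    )
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def run_query(request):
    """Send the request and decode the JSON response (needs a running Vault)."""
    with urlopen(request) as response:
        return json.load(response)


request = build_query_request("users", {"name": "*john*"}, ["name"])
# To execute against a local Vault:
# results = run_query(request)["results"]
```

Building the request separately from sending it makes it possible to inspect the URL and body without a running Vault.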
Database load considerations
Vault creates the substring match index from scratch during startup. Creating the index puts a load on the database for up to a few seconds. Once created, the index is updated every few seconds and the load is minimal (see PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL). If no properties are configured for substring indexing, this operation causes no load.
To reduce the load on the database, use a read replica of the database to refresh the substring index. To configure the read replica, set the PVAULT_DB_READ_REPLICA_HOSTNAME and PVAULT_DB_READ_REPLICA_PORT environment variables.
Performance considerations and pagination
The substring matching process may create false positives that Vault filters out before responding to the API request (Vault always returns the correct results). For that reason, a query that includes both a substring search (like requirement) and a full-match search (in or match requirements) is slightly slower than a query that includes only a full-match search.
The query response is paginated. Because of the potential internal false-positive results, the "remaining_count" parameter of the response may be higher than the actual number of remaining results.
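To see why trigram-based matching can produce false positives that need filtering, consider the following illustrative sketch. It is not Vault's implementation and makes several assumptions: values are indexed by their overlapping three-character windows, a pattern's candidates are the objects that contain all of the pattern's trigrams, and each candidate is then checked against the full pattern. All names and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    """Return the set of overlapping three-character windows in text."""
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def pattern_trigrams(pattern):
    """Collect trigrams from the literal runs of a pattern (wildcards split runs)."""
    grams = set()
    for literal in re.split(r"[*?]", pattern):
        grams |= trigrams(literal)
    return grams


def build_index(values):
    """Map each trigram to the IDs of the objects whose value contains it."""
    index = defaultdict(set)
    for object_id, value in values.items():
        for gram in trigrams(value):
            index[gram].add(object_id)
    return index


def candidates(index, pattern):
    """Objects containing every trigram of the pattern; may include false positives."""
    postings = [index.get(gram, set()) for gram in pattern_trigrams(pattern)]
    return set.intersection(*postings) if postings else set()


def glob_match(pattern, value):
    """Case-insensitive full match with * and ? as the only wildcards."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, re.IGNORECASE | re.DOTALL) is not None


def search(values, index, pattern):
    """Return (candidate IDs, confirmed IDs) for a pattern."""
    found = candidates(index, pattern)
    confirmed = {oid for oid in found if glob_match(pattern, values[oid])}
    return found, confirmed


# Hypothetical sample data: object 2 contains "joh" and "ohn" but never "john".
names = {1: "John Doe", 2: "Joh Cohn", 3: "Jane Roe"}
found, confirmed = search(names, build_index(names), "*john*")
print(sorted(found), sorted(confirmed))
```

In this sketch the candidate set is larger than the confirmed set, which is one plausible reading of why a count derived from index candidates can exceed the number of results that are actually returned.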
Step-by-step
Say you have a collection called ‘customers’ that you created using Create a collection. You want to retrieve ‘email’ for all the customers in this collection that meet a search requirement.
pvault collection add --collection-pvschema "customers PERSONS (
name NAME SUBSTRING_INDEX,
email EMAIL,
city STRING,
)"
Consider these search requirements:
- The email address for customers whose first name is "John".
- The email address for customers whose first name is "John" and who live in "New York".
Specifying a "like" requirement
To get the email of a customer whose first name is "John", you first need to determine the search pattern to use. In this case, the search pattern is John*, which matches names that start with "John", so the search query is:
name="John*"
You can search for objects with a "like" requirement using the CLI like this:
pvault object query -c customers --like name="John*" --props name,email,city
You get a response similar to:
Displaying 2 results.
+------+------------------------------+------------+
| city | email | name |
+------+------------------------------+------------+
| NY | johndoe@somemail.com | John Doe |
| SF | johnlemon@yetanothermain.com | John Lemon |
+------+------------------------------+------------+
or in API, like this:
curl -s -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "John*"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/customers/query/objects?props=name,email,city&reason=AppFunctionality"
You get a response similar to:
{
  "results": [
    {
      "email": "johndoe@somemail.com",
      "name": "John Doe",
      "city": "NY"
    },
    {
      "email": "johnlemon@yetanothermain.com",
      "name": "John Lemon",
      "city": "SF"
    }
  ],
  "paging": {
    "cursor": "",
    "size": 2,
    "remaining_count": 0
  }
}
Specifying a "like" requirement and a "match" requirement
You can add "match" or "in" requirements to the "like" requirement. To get the email of customers whose first name is "John" and who live in New York (stored as "NY" in the city property), run a query like this:
pvault object query -c customers --like name="John*" --match city="NY" --props name,email,city
or in API, like this:
curl -s -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "John*"},"match":{"city":"NY"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/customers/query/objects?props=name,email,city&reason=AppFunctionality"
You get a response similar to:
{
  "results": [
    {
      "city": "NY",
      "email": "johndoe@somemail.com",
      "name": "John Doe"
    }
  ],
  "paging": {
    "cursor": "",
    "size": 1,
    "remaining_count": 0
  }
}
Notice that the response contains only the customers who meet both search requirements, which in this case is a single customer.
The response is paginated. See CLI pagination for more information about working with paginated responses.
Response values can be transformed. See the Search objects guide for an example.