
Search objects by substring

You can search for objects that Vault stores encrypted using full-match or substring search. To learn about full-match searches, see the Search objects guide. This guide explains how to search for objects using substring search.

You can return a subset of the found object’s properties from the search. You can also request transformations of values, where available.

You can use the unsafe option to get all the values of the objects (though it is not recommended). Using unsafe can be combined with the show_builtins option to include the built-in properties.

Whether your request for object values succeeds depends on the permissions you've been granted as part of the Vault’s identity and access management settings.

Overview

To search for objects in a collection using substring search, use the CLI search objects command or the REST API search objects operation, passing the collection name and the property or properties you want to get.

To search for objects using substring search, the property you want to search must be indexed for substring queries. You can configure the substring index for a property when you create a collection.

Vault refreshes the substring index every few seconds (see the PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL environment variable). This means that after you add or update an object, a substring search may not find it immediately.
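Because of this refresh delay, a script that writes an object and searches for it right away may need to retry. The following is a minimal sketch, not part of Vault: `search` is a hypothetical caller-supplied function that wraps the substring query and returns a list of results, and the attempt count and delay are arbitrary defaults.

```python
import time


def search_with_retry(search, attempts=5, delay_seconds=1.0):
    """Call search() until it returns results or the attempts run out.

    `search` is a hypothetical zero-argument callable that wraps a
    substring query; it is an assumption, not part of Vault's API.
    """
    results = []
    for attempt in range(attempts):
        results = search()
        if results:
            break
        if attempt < attempts - 1:
            # Give the substring index time to refresh before retrying.
            time.sleep(delay_seconds)
    return results
```

Choose a delay that is in line with the configured refresh interval, so that the retries span at least one index refresh.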

To configure a property for substring search when you create a collection, set the is_substring_index field of the property to true.

You can define it in a PVSchema like this:

users PERSONS (
name NAME SUBSTRING_INDEX,
)

or in JSON format, like this:

{
  "type": "PERSONS",
  "name": "users",
  "properties": [
    {
      "name": "name",
      "data_type_name": "NAME",
      "is_substring_index": true
    }
  ]
}

Note: Substring index is only supported for these data types:

  • STRING
  • LONG_TEXT
  • EMAIL
  • EMAIL_STRICT
  • URL
  • NAME
  • GENDER
  • BAN
  • SSN
  • ADDRESS
  • CC_HOLDER_NAME
  • US_BANK_ROUTING
  • US_BANK_ACCOUNT_NUMBER
  • Custom data types based on the STRING data type.
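Before creating a collection, you could check a JSON definition against this list. The sketch below is an illustration only: the function name is hypothetical, and it cannot recognize custom data types based on STRING, so it reports those as unsupported even though Vault accepts them.

```python
# Built-in data types listed above as supporting a substring index.
SUBSTRING_INDEX_TYPES = {
    "STRING", "LONG_TEXT", "EMAIL", "EMAIL_STRICT", "URL", "NAME",
    "GENDER", "BAN", "SSN", "ADDRESS", "CC_HOLDER_NAME",
    "US_BANK_ROUTING", "US_BANK_ACCOUNT_NUMBER",
}


def unsupported_substring_properties(collection):
    """Return names of properties that set is_substring_index to true
    but whose data type is not in the built-in list above.

    Hypothetical helper: custom types based on STRING are not known
    here and are therefore reported too.
    """
    return [
        prop["name"]
        for prop in collection.get("properties", [])
        if prop.get("is_substring_index")
        and prop.get("data_type_name") not in SUBSTRING_INDEX_TYPES
    ]
```

An empty list means every substring-indexed property uses one of the built-in types listed above.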

Search patterns

To search for objects by substring, you specify a search pattern. Supported patterns are a subset of glob patterns and use these wildcard characters:

  • * - matches zero or more characters.
  • ? - matches exactly one character.

For example:

  • *john* - matches any value that contains the substring "john".
  • john* - matches any value that starts with "john".
  • *john - matches any value that ends with "john".
  • j?hn - matches any value that has "j" as the first character, "h" as the third character, and "n" as the fourth character.

Searches are case-insensitive, so *john* matches any value that contains the strings "John", "john", "JOHN", etc.
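To experiment with these rules locally, the pattern semantics can be approximated with a regular expression. This is a sketch of the matching rules as described above, not Vault's implementation, and it assumes the pattern is matched against the whole value. The function names are hypothetical.

```python
import re


def like_to_regex(pattern):
    """Translate a search pattern into a case-insensitive regex.

    Only '*' and '?' are treated as wildcards; every other character
    is matched literally.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def like_matches(pattern, value):
    """Return True if the whole value matches the pattern (assumption)."""
    return like_to_regex(pattern).fullmatch(value) is not None
```

For example, `like_matches("john*", "Johnny")` is true, while `like_matches("john*", "Big John")` is false because the value does not start with "john".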

A search pattern must contain at least two trigrams (three-character sequences). For example, *jo* is too short to be a search pattern, but *john* (which contains the trigrams joh and ohn) and *jon*snow* are valid.
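A client could pre-check a pattern against this rule before sending a query. The sketch below rests on assumptions this guide does not confirm: trigrams are counted only within runs of literal characters, both * and ? end a run, and repeated trigrams count separately. The function names are hypothetical, and Vault's own validation may differ.

```python
import re


def literal_trigrams(pattern):
    """List the three-character sequences found in the literal runs of
    a pattern (assumption: '*' and '?' both split the pattern)."""
    trigrams = []
    for run in re.split(r"[*?]", pattern.lower()):
        # A run shorter than three characters yields no trigrams.
        trigrams.extend(run[i:i + 3] for i in range(len(run) - 2))
    return trigrams


def has_enough_trigrams(pattern, minimum=2):
    """Return True if the pattern contains at least `minimum` trigrams."""
    return len(literal_trigrams(pattern)) >= minimum
```

With these assumptions, `*jo*` yields no trigrams and is rejected, while `*john*` yields joh and ohn and is accepted.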

Querying objects

To search for objects in a collection using substring search, use the CLI search objects command or the REST API search objects operation, passing the collection name and the property or properties you want to get. The query operator for substring search is like.

In CLI, you can search for objects with a substring search like this:

pvault object query -c users --like name="*john*" --props name

or using the REST API, like this:

curl -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "*john*"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/users/query/objects?props=name&reason=AppFunctionality"
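The same request can be assembled from application code. The sketch below builds it with Python's standard library and does not send it. The function name is hypothetical, and the base URL, token, and reason defaults are copied from the curl example above, so replace them with your own values.

```python
import json
import urllib.parse
import urllib.request


def build_like_request(collection, like, props,
                       base_url="http://localhost:8123",
                       token="pvaultauth",
                       reason="AppFunctionality"):
    """Build (but do not send) a substring search request.

    `like` maps property names to search patterns, for example
    {"name": "*john*"}. Defaults mirror the curl example.
    """
    query = urllib.parse.urlencode(
        {"props": ",".join(props), "reason": reason},
        safe=",",  # keep the commas between property names readable
    )
    url = (f"{base_url}/api/pvlt/1.0/data/collections/"
           f"{collection}/query/objects?{query}")
    body = json.dumps({"like": like}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

To run the query against a Vault instance, pass the returned object to `urllib.request.urlopen` and parse the response body as JSON.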

Database load considerations

Vault creates the substring match index from scratch during startup. Creating the index puts a load on the database for up to a few seconds. Once created, the index is updated every few seconds and the load is minimal (see PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL). If no properties are configured for substring indexing, no load is caused by this operation.

To reduce the load on the database, use a read replica of the database to refresh the substring index. To configure the read replica, set the PVAULT_DB_READ_REPLICA_HOSTNAME and PVAULT_DB_READ_REPLICA_PORT environment variables.

Performance considerations and pagination

The substring matching process may create false positives that Vault filters out before responding to the API request (Vault always returns the correct results). For that reason, a query that includes both a substring search (like requirement) and a full-match (in or match requirements) search is slightly slower than a query that includes only a full-match search.

The query response is paginated. Because of the potential internal false-positive results, the "remaining_count" parameter of the response can be higher than the actual number of remaining results, so treat it as an upper bound.
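Because "remaining_count" is not exact for substring queries, a paging loop is safer when it stops on the cursor instead of counting down. The sketch below assumes a hypothetical `fetch_page(cursor)` function that runs the query for a given cursor and returns the parsed response with "results" and "paging" fields. It also assumes that an empty cursor marks the last page, which is an inference rather than a documented guarantee.

```python
def iter_results(fetch_page):
    """Yield every result across pages, ignoring remaining_count.

    `fetch_page` is a hypothetical callable: it takes a cursor string
    ("" for the first page) and returns the parsed query response.
    """
    cursor = ""
    while True:
        page = fetch_page(cursor)
        yield from page["results"]
        cursor = page["paging"]["cursor"]
        if not cursor:
            # Assumption: an empty cursor means there are no more pages.
            break
```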

Step-by-step

Say you have a collection called 'customers' that you created using Create a collection. You want to retrieve 'email' for all the customers in this collection that meet a search requirement.

pvault collection add --collection-pvschema "customers PERSONS (
name NAME SUBSTRING_INDEX,
email EMAIL,
city STRING,
)"

Consider these search requirements:

  1. The email address for customers whose first name is "John".
  2. The email address for customers whose first name is "John" and who live in "New York".

Specifying a "like" requirement

To get the email of customers whose first name is "John", you first need to determine the search pattern to use. In this case, the search pattern is John*, which matches any name that starts with "John", so the search requirement is:

name="John*"

You can search for objects with a "like" requirement using the CLI like this:

pvault object query -c customers --like name="John*" --props name,email,city

You get a response similar to:

Displaying 2 results.
+------+------------------------------+------------+
| city | email                        | name       |
+------+------------------------------+------------+
| NY   | johndoe@somemail.com         | John Doe   |
| SF   | johnlemon@yetanothermain.com | John Lemon |
+------+------------------------------+------------+

or using the REST API, like this:

curl -s -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "John*"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/customers/query/objects?props=name,email,city&reason=AppFunctionality"

You get a response similar to:

{
  "results": [
    {
      "email": "johndoe@somemail.com",
      "name": "John Doe",
      "city": "NY"
    },
    {
      "email": "johnlemon@yetanothermain.com",
      "name": "John Lemon",
      "city": "SF"
    }
  ],
  "paging": {
    "cursor": "",
    "size": 2,
    "remaining_count": 0
  }
}

Specifying a "like" requirement and a "match" requirement

You can combine the "like" requirement with "match" or "in" requirements. To get the email of customers whose first name is "John" and who live in New York (stored as "NY" in this collection), use the CLI like this:

pvault object query -c customers --like name="John*" --match city="NY" --props name,email,city

or using the REST API, like this:

curl -s -X POST \
-H 'Authorization: Bearer pvaultauth' \
-H 'Content-Type: application/json' \
-d '{"like":{"name": "John*"},"match":{"city":"NY"}}' \
"http://localhost:8123/api/pvlt/1.0/data/collections/customers/query/objects?props=name,email,city&reason=AppFunctionality"

You get a response similar to:

{
  "results": [
    {
      "city": "NY",
      "email": "johndoe@somemail.com",
      "name": "John Doe"
    }
  ],
  "paging": {
    "cursor": "",
    "size": 1,
    "remaining_count": 0
  }
}

Notice that the response contains only the customers who meet both search requirements, which in this case is a single customer.

Tip: The response is paginated. See CLI pagination for more information about working with paginated responses.

Tip: Response values can be transformed. See search-objects for an example.