Search set up

Learn how to set up collection properties to enable searching

To search for objects using string or substring search, the property you want to search must be indexed. There are separate indexes for string and substring searches.

You can configure the indexes for a property when you create a collection. You can also update a property to turn string indexing on or off, or to turn substring indexing off.

String indexing

When you create a collection or add a property to a collection, you can configure some properties for string search. To do this, set the is_index field of the property to true. See indexing in the data type reference for a list of supported data types.

You can define indexing in a PVSchema like this:

users PERSONS (
  name NAME INDEX,
)

or in JSON format, like this:

{
  "type": "PERSONS",
  "name": "users",
  "properties": [
    {
      "name": "name",
      "data_type_name": "NAME",
      "is_index": true
    }
  ]
}

You can turn string indexing on or off when you update the collection or property.
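As a minimal sketch of preparing such an update, the Python helper below returns a copy of a collection definition with the is_index field of one property changed. The function name set_string_index is hypothetical, and how the updated definition is sent to Vault is not shown here.

```python
import copy


def set_string_index(collection: dict, property_name: str, enabled: bool) -> dict:
    """Return a copy of `collection` with `is_index` set on one property.

    Hypothetical helper: it only edits the JSON definition in memory.
    """
    updated = copy.deepcopy(collection)  # leave the caller's definition untouched
    for prop in updated.get("properties", []):
        if prop.get("name") == property_name:
            prop["is_index"] = enabled
            return updated
    raise KeyError(f"property {property_name!r} not found in collection")


users = {
    "type": "PERSONS",
    "name": "users",
    "properties": [
        {"name": "name", "data_type_name": "NAME", "is_index": True},
    ],
}

# Turn string indexing off for the `name` property.
users_without_index = set_string_index(users, "name", False)
```

Working on a copy keeps the original definition available in case the update needs to be reverted.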

Substring indexing

When you create a collection or add a property to a collection, you can configure some properties for substring search. To do this, set the is_substring_index field of the property to true. See indexing in the data type reference for a list of supported data types.

You can define substring indexing in a PVSchema like this:

users PERSONS (
  name NAME SUBSTRING_INDEX,
)

or in JSON format, like this:

{
  "type": "PERSONS",
  "name": "users",
  "properties": [
    {
      "name": "name",
      "data_type_name": "NAME",
      "is_substring_index": true
    }
  ]
}
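Because string and substring searches rely on separate indexes, it can help to check which kinds of search a collection definition enables. The sketch below reads the is_index and is_substring_index fields from a JSON definition. The function name search_modes is hypothetical, and treating a missing field as false is an assumption.

```python
from typing import Dict, Set


def search_modes(collection: dict) -> Dict[str, Set[str]]:
    """Map each property name to the kinds of search its indexes enable."""
    modes: Dict[str, Set[str]] = {}
    for prop in collection.get("properties", []):
        supported: Set[str] = set()
        # Assumption: a field that is absent is treated as false.
        if prop.get("is_index", False):
            supported.add("string")
        if prop.get("is_substring_index", False):
            supported.add("substring")
        modes[prop["name"]] = supported
    return modes


users = {
    "type": "PERSONS",
    "name": "users",
    "properties": [
        {"name": "name", "data_type_name": "NAME", "is_substring_index": True},
    ],
}

print(search_modes(users))  # {'name': {'substring'}}
```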

Database load considerations

Vault creates the substring match index during startup. Creating the index puts a load on the database for up to a few seconds. Once created, the index is updated every few seconds (see PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL). The load generated by this refresh is minimal. If no properties are configured for substring indexing, no load is created by this operation.

To reduce the load on the database, use a read replica of the database to refresh the substring index. To configure the read replica, set the PVAULT_DB_READ_REPLICA_HOSTNAME and PVAULT_DB_READ_REPLICA_PORT environment variables.
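The following sketch builds the two environment variables in Python and validates them before they are handed to whatever launches Vault. The helper name read_replica_env and the example hostname and port are hypothetical. How the variables reach the Vault process depends on your deployment.

```python
from typing import Dict


def read_replica_env(hostname: str, port: int) -> Dict[str, str]:
    """Build the environment variables that point Vault at a read replica."""
    if not hostname:
        raise ValueError("hostname must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return {
        "PVAULT_DB_READ_REPLICA_HOSTNAME": hostname,
        "PVAULT_DB_READ_REPLICA_PORT": str(port),  # environment values are strings
    }


# Hypothetical replica address, for illustration only.
env = read_replica_env("replica.db.internal", 5432)
```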

Performance considerations and pagination

Vault refreshes the substring index every few seconds (see the PVAULT_SERVICE_SUBSTRING_INDEX_REFRESH_INTERVAL environment variable). This means that after you add or update an object, a substring search may not find it until the next refresh completes.
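Client code that writes an object and then searches for it by substring can allow for this delay by polling. The sketch below is one way to do that, under stated assumptions: search and is_found are hypothetical callables that you supply (for example, a function that runs your substring query and a predicate that checks for the new object), and the timeout and interval values are placeholders to tune against your refresh interval.

```python
import time
from typing import Any, Callable


def wait_for_substring_match(
    search: Callable[[], Any],
    is_found: Callable[[Any], bool],
    timeout_s: float = 10.0,
    poll_interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Repeat `search` until `is_found` accepts its result or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        results = search()
        if is_found(results):
            return results
        if time.monotonic() >= deadline:
            raise TimeoutError("object did not appear in substring search results")
        sleep(poll_interval_s)  # wait for the next index refresh
```

The sleep parameter exists so that tests can replace the real delay with a no-op.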

The substring matching process may produce false positives. Vault filters them out before responding to the API request, so it always returns the correct results. Because of this extra filtering, a query that combines a substring search (a like requirement) with a full-match search (an in or match requirement) is slightly slower than a query that includes only a full-match search.

The query response is paginated. Because of the potential internal false-positive results, the "remaining_count" parameter of the response may be higher than the number of remaining results.
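In practice, this means a client should treat "remaining_count" as an upper bound and not as an exact loop counter. The sketch below iterates over all pages without relying on it. The fetch_page callable and its return shape (results, next cursor, remaining count) are hypothetical stand-ins for your own API call, and the assumption that a missing cursor marks the last page should be checked against the API reference.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# Hypothetical page shape: (results, next_cursor, remaining_count).
Page = Tuple[List[dict], Optional[str], int]


def iter_search_results(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every result across pages, stopping when no cursor is returned."""
    cursor: Optional[str] = None
    while True:
        results, cursor, _remaining_upper_bound = fetch_page(cursor)
        # remaining_count may overestimate, so it is not used to decide when to stop.
        yield from results
        if not cursor:
            break
```

A caller can still show the remaining count as a progress estimate, as long as it is labeled as approximate.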