Search capabilities in data models
Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries, search across multiple fields, and rank results by relevance.
The search endpoint supports:
- Full-text search across text fields.
- Configurable matching logic with AND/OR operators.
- Prefix-based word matching.
- Boosting exact phrases.
- Filtering on properties.
- Converting search results to a different unit.
See the Query features article for information about filtering.
Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
Example search query
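The search query below filters the results to Equipment instances with a temperature between 15 and 25 degrees Celsius and a name or description that contains both the word "temperature" and the word "sensor". Because the operator is set to AND, both terms must be present as exact matches and no prefix matching is applied.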
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "AND",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
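As a rough illustration, the request body above could be assembled programmatically along these lines. This is a minimal sketch: the helper name build_search_request and its parameters are hypothetical, the field names simply mirror the JSON example, and no HTTP call is made because the endpoint URL and authentication are outside the scope of this section.

```python
import json


def build_search_request(space, view_external_id, version, query, properties,
                         operator="OR", range_filter=None, limit=100):
    """Build a search request body shaped like the JSON example above.

    Hypothetical helper for illustration only; it is not part of any SDK.
    """
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    body = {
        "view": {
            "type": "view",
            "space": space,
            "externalId": view_external_id,
            "version": version,
        },
        "query": query,
        "operator": operator,
        "instanceType": "node",
        "properties": list(properties),
        "limit": limit,
    }
    if range_filter is not None:
        # Mirrors the "filter": {"range": {...}} shape used in the example.
        body["filter"] = {"range": range_filter}
    return body


body = build_search_request(
    "testSpace", "Equipment", "v1",
    "temperature sensor", ["name", "description"],
    operator="AND",
    range_filter={"property": "temperature", "lt": 25, "gt": 15},
)
print(json.dumps(body, indent=2))
```

Keeping the body construction in one place makes it easier to switch between the AND and OR operators while testing how results change.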
To perform the same query with the GraphQL endpoint:
searchEquipment(
  query: "temperature sensor",
  fields: ["name", "description"],
  filter: {
    range: {
      temperature: {
        lt: 25,
        gt: 15
      }
    }
  },
  sort: { externalId: ASC }
) {
  items {
    externalId
    name
    description
    # Include additional fields here
  }
}
Text analysis and tokenization
The data you ingest is automatically indexed asynchronously. This indexing process includes a text analysis step that breaks down the text into smaller components called tokens. The tokenization allows for efficient searching and matching of terms. The tokenization process includes:
- Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
- Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.
Tokenization example
Input | Generated tokens |
---|---|
Pump_123-ABC | pump_123, abc |
Temperature Sensor | temperature, sensor |
Temperature_Sensor | temperature_sensor (single token) |
John's pump | john's, pump |
example.com | example.com (period preserved) |
system32.exe | system32, exe (period splits) |
123.45 | 123.45 (period preserved) |
During tokenization, the following rules apply:
- Underscores (_) and apostrophes (') don't split tokens. They're preserved as part of the token.
- Most punctuation, including hyphens (-), commas (,), colons (:), and spaces, splits tokens.
- Periods (.) are a special case:
  - They don't split a token between two letters (example.com remains one token).
  - They don't split a token between two numbers (123.45 remains one token).
  - They split tokens in most other cases, for example between a number and a letter (system32.exe → system32, exe).
For example, searching for "sensor" finds "Temperature Sensor" (split into two tokens) but not "Temperature_Sensor" (remains a single token).
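The rules above can be approximated with a short Python sketch. This is only an illustration of the documented behavior, not the analyzer the service actually runs; the function name tokenize is hypothetical, and edge cases not covered by the rules above (for example typographic apostrophes) are handled arbitrarily.

```python
def tokenize(text: str) -> list[str]:
    """Approximate the documented tokenization rules (illustrative only)."""
    text = text.lower()  # searches are case-insensitive
    out = []
    for i, ch in enumerate(text):
        if ch == ".":
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            # A period is kept only between two letters or between two digits.
            keep = (prev.isalpha() and nxt.isalpha()) or (prev.isdigit() and nxt.isdigit())
            out.append("." if keep else " ")
        elif ch.isalnum() or ch in "_'":
            # Underscores and apostrophes stay inside the token.
            out.append(ch)
        else:
            # Hyphens, commas, colons, slashes, spaces, and so on split tokens.
            out.append(" ")
    return "".join(out).split()


for sample in ["Pump_123-ABC", "Temperature_Sensor", "example.com", "system32.exe", "123.45"]:
    print(sample, "->", tokenize(sample))
```

Running inputs through a sketch like this can help predict whether a search term will line up with a token boundary in the indexed data.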
Query matching logic: OR vs. AND
To control how the terms in your search query are matched against the searched properties, set the operator field to either AND or OR. The default is OR.
OR operator (default behavior)
When the operator is set to "OR" or is omitted, the search uses bool prefix matching. For a query to match a document, at least one of the query terms must match. This is ideal for broader, more inclusive searches.
The matching logic is as follows:
- Each word in the query except the last one is treated as an exact match (case-insensitive).
- The last word is treated as a prefix match, allowing for "search-as-you-type" functionality.
- Instances that match more terms will rank higher in the results.
For example, a search for "pressure valve ma" creates:
- An exact match for "pressure".
- An exact match for "valve".
- A prefix match for "ma" (matching "main", "manual", etc.).
An instance containing only "pressure" would match, but an instance containing all three terms would rank much higher.
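The OR matching logic described above can be sketched as follows. This is a simplified model, not the service's scoring function: it assumes plain whitespace tokenization and uses the count of matching terms as a stand-in for relevance. The function name or_score is hypothetical.

```python
def or_score(query: str, text: str) -> int:
    """Count matching query terms under the OR rules (0 means no match).

    Simplified model: every term except the last must equal a token,
    and the last term only needs to be a prefix of some token.
    """
    terms = query.lower().split()
    tokens = text.lower().split()
    if not terms:
        return 0
    *exact_terms, last_term = terms
    score = sum(1 for term in exact_terms if term in tokens)
    if any(token.startswith(last_term) for token in tokens):
        score += 1
    return score


docs = ["Pressure gauge", "Pressure valve main line", "Manual override"]
ranked = sorted(docs, key=lambda d: or_score("pressure valve ma", d), reverse=True)
print(ranked)
```

In this model an instance that matches more terms sorts first, which mirrors the ranking behavior described above without attempting to reproduce the real relevance scores.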
AND operator (strict behavior)
When the operator is set to AND, the search requires that all terms in the query string are present in the searched properties for an instance to be returned. This uses a cross_fields matching strategy, meaning the terms can be in any of the specified properties. This is useful for more precise searches where all criteria must be met.
The matching logic is as follows:
- All search terms in the query string must be present as exact, case-insensitive matches.
- Prefix matching is not applied when using the AND operator.
For example, a search for "pressure sensor" with operator: "AND" will only match documents that contain both the word "pressure" and the word "sensor". A document containing only "pressure" will not be returned.
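A minimal sketch of the AND behavior, assuming whitespace tokenization and treating the searched properties as one combined pool of tokens (a rough reading of the cross_fields strategy). The function name and_match and the sample instances are hypothetical.

```python
def and_match(query: str, instance: dict, properties: list[str]) -> bool:
    """Return True when every query term equals a token in any searched property."""
    terms = query.lower().split()
    tokens = set()
    for prop in properties:
        value = instance.get(prop) or ""  # treat missing or null properties as empty
        tokens.update(str(value).lower().split())
    # Exact, case-insensitive matches only; no prefix matching.
    return bool(terms) and all(term in tokens for term in terms)


spread = {"name": "Pressure transmitter", "description": "Backup sensor for line 2"}
partial = {"name": "Pressure transmitter", "description": None}

print(and_match("pressure sensor", spread, ["name", "description"]))   # terms split across properties
print(and_match("pressure sensor", partial, ["name", "description"]))  # "sensor" is missing
```

The first instance matches even though the two terms live in different properties, while the second does not because one term is missing everywhere.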
Phrase matching (exact sequences)
Exact phrase matches boost relevance significantly.
For example, searching for "heat exchanger":
- Ranks "Heat Exchanger" higher (exact phrase match).
- Ranks "Exchanger for heat" lower (individual term matches only).
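The effect of phrase boosting can be illustrated with a toy scoring function. The size of the boost is not documented here, so PHRASE_BONUS is an arbitrary assumption, as are the function names; only the relative ordering is meant to reflect the behavior described above.

```python
PHRASE_BONUS = 10  # assumption: the real boost factor is not documented here


def contains_phrase(terms: list[str], tokens: list[str]) -> bool:
    """Check whether the terms appear in the tokens as one consecutive run."""
    n = len(terms)
    return any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1))


def phrase_score(query: str, text: str) -> int:
    """Toy relevance: one point per matching term plus a bonus for the exact phrase."""
    terms = query.lower().split()
    tokens = text.lower().split()
    score = sum(1 for term in terms if term in tokens)
    if terms and contains_phrase(terms, tokens):
        score += PHRASE_BONUS
    return score


for text in ["Heat Exchanger", "Exchanger for heat"]:
    print(text, phrase_score("heat exchanger", text))
```

Both texts contain the two terms, but only the first contains them in sequence, so it receives the bonus and ranks higher.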
Limitations on matching
Matching has these limitations:
- No fuzzy or typo matching: queries require correct spelling and matching prefixes.
- No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.
Example query matches
valve
- Matches: "Valve control unit", "Safety valve unit"
- Matches: "Ball-valve" (tokenized as "ball" and "valve")
- Does not match: "Valvoline" (different token) or "Valve's" (tokenized as "valve's")
pressure sensor
With operator: "OR":
- Best matches: documents containing both "pressure" and "sensor"
- Lower relevance: documents with either "pressure" or "sensor" alone
- Example matches:
  - "High pressure sensor calibration" (matches both terms)
  - "Pressure transmitter" (matches only "pressure")
  - "Temperature sensor" (matches only "sensor")
- Does not match: "Pressured equipment" ("pressure" is not the last query term, so it is not treated as a prefix)
With operator: "AND":
- Requires that both "pressure" and "sensor" are present as exact matches.
- Matches: "High pressure sensor calibration".
- Does not match: "Pressure transmitter" (missing "sensor") or "Temperature sensor" (missing "pressure").
compressor fail
With operator: "OR":
- Matches:
  - "Compressor failure log" ("fail" is a prefix of "failure")
  - "Compressor failing to start" ("fail" is a prefix of "failing")
With operator: "AND":
- Requires an exact match for both "compressor" and "fail".
- Does not match: "Compressor failure log", because "fail" is not an exact match for the token "failure".
- Does not match: "Compressor failing to start", because "fail" is not an exact match for the token "failing".
oil temp
With operator: "OR":
- Matches:
  - "Oil temperature readings" ("temp" is a prefix of "temperature")
  - "Oil temporary storage" ("temp" is a prefix of "temporary")
flow meter calibra
With operator: "OR":
- Matches:
  - "Flow meter calibration procedure" (highest rank - all terms match)
  - "Flow meter maintenance" (medium rank - two exact terms match)
  - "Calibration of temperature meters" (lower rank - only "calibra" matches, as a prefix of "calibration"; the token "meters" is not an exact match for "meter")
server1.example.com/v2.0
With operator: "OR":
- Matches:
  - "Connect to server1.example.com using v2.0 protocol" (highest rank - all terms match)
  - "server2.example.com documentation" (matches "example.com")
  - "API v2.0 reference" (matches the version number)
  - "server1 is down after v2.0.1 upgrade" (matches "server1" and prefix on "v2.0")
- Does not match:
  - "example.net" ("example.net" is preserved as a single token)
  - "v2" ("v2.0" is preserved as a single token)
Filtering differences between the query and search endpoints
Filters work mostly the same for both the query and search endpoints, but the two endpoints differ in how the exists filter handles empty arrays and how the prefix filter handles arrays.
Exists filter with empty array
Endpoint | Behavior | Example |
---|---|---|
Query | Empty array counted as existing | exists([]) → true |
Search/Aggregate | Empty array counted as non-existing | exists([]) → false |
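The difference in the table above can be modelled with two small predicates. This is a sketch of the documented semantics only; the function names are hypothetical and a missing property is represented as None.

```python
def exists_query(value) -> bool:
    """Query endpoint semantics: an empty array still counts as existing."""
    return value is not None


def exists_search(value) -> bool:
    """Search/Aggregate semantics: an empty array counts as non-existing."""
    return value is not None and value != []


for value in (None, [], ["pump"]):
    print(repr(value), exists_query(value), exists_search(value))
```

If your data contains empty arrays, an exists filter may therefore return more instances from the query endpoint than from the search endpoint.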
Prefix filter on arrays
Endpoint | Behavior |
---|---|
Query | Checks array prefix sequence (ordered matching). Supports text[] and int[] arrays. |
Search/Aggregate | Checks each array item separately. The prefix value must be a single text value; array-valued prefixes aren't supported. |
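The two behaviors in the table above can be sketched in Python. The function names are hypothetical, and an unsupported array-valued prefix on the search side is modelled here simply as "no match"; how the real endpoint reports that case is not covered in this section.

```python
def prefix_query(stored: list, prefix) -> bool:
    """Query endpoint semantics: the prefix must equal the start of the array, in order."""
    wanted = prefix if isinstance(prefix, list) else [prefix]
    return stored[:len(wanted)] == wanted


def prefix_search(stored: list, prefix) -> bool:
    """Search/Aggregate semantics: any single element may start with the text prefix."""
    if isinstance(prefix, list):
        return False  # array-valued prefixes are not supported (modelled as no match)
    return any(item.startswith(prefix) for item in stored)


cases = [
    ("pump", ["pump", "valve"]),
    ("pump", ["pumping", "valve"]),
    (["pump", "valve"], ["pump", "valve", "sensor"]),
    ("pump", ["valve", "pump"]),
]
for prefix, stored in cases:
    print(prefix, stored, prefix_query(stored, prefix), prefix_search(stored, prefix))
```

The query-side check compares whole elements in order, while the search-side check looks inside each element, which is why the two endpoints can disagree on the same data.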
Examples of prefix filter behavior
Prefix condition | Query API | Search API | Note |
---|---|---|---|
"pump" prefix of ["pump", "valve"] | ✅ | ✅ | Both APIs match single elements. |
"pump" prefix of ["pumping", "valve"] | ❌ | ✅ | Only the Search API matches "pumping" (an element with that prefix exists). |
["pump","valve"] prefix of ["pump","valve","sensor"] | ✅ | ❌ | The Search API doesn't support array prefixes. |
"pump" prefix of ["valve","pump"] | ❌ | ✅ | The Query API checks the start of the array. The Search API matches any element. |
Nested filters are only supported for core data model assets
Nested filters aren't supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets.
If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.
Supported nested filters
The following types and properties support nested filtering in the search and aggregation endpoints:
Core data model type | Core data model property |
---|---|
CogniteActivity | assets |
CogniteFile | assets, category |
CogniteTimeSeries | assets, unit |
CogniteEquipment | asset |
CogniteMaintenance | asset |
CogniteNotification | asset |
CogniteOperation | asset |