Search capabilities in data models
Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries and multi-field search, and to rank results by relevance.
The search endpoint supports:
- Full-text search across text fields.
- Prefix-based word matching.
- Boosting exact phrases.
- Filtering on properties.
- Converting search results to a different unit.
See the Query features article for information about filtering.
Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
Example search query
The search query below filters the results to Equipment with a temperature between 15 and 25 degrees Celsius (exclusive), and a name or description that contains the word "temperature" or a word that starts with "sensor".
```json
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
```
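As a minimal sketch, the request above can be sent from Python using only the standard library. It assumes the REST path /api/v1/projects/{project}/models/instances/search with bearer-token authentication, and that the response lists matching instances under "items". The cluster, project, and token values are hypothetical placeholders. The sketch builds the request without sending it so you can inspect it first.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with your own cluster, project, and access token.
CLUSTER = "my-cluster"
PROJECT = "my-project"
TOKEN = "my-access-token"

# Same request body as the JSON example above.
search_body = {
    "view": {"type": "view", "space": "testSpace", "externalId": "Equipment", "version": "v1"},
    "query": "temperature sensor",
    "instanceType": "node",
    "properties": ["name", "description"],
    "targetUnits": [],
    "filter": {"range": {"property": "temperature", "lt": 25, "gt": 15}},
    "includeTyping": False,
    "sort": [{"property": ["externalId"], "direction": "ascending"}],
    "limit": 100,
}


def build_search_request(body: dict, cluster: str, project: str, token: str) -> urllib.request.Request:
    """Build (but don't send) a POST request to the assumed search endpoint path."""
    url = f"https://{cluster}.cognitedata.com/api/v1/projects/{project}/models/instances/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(search_body, CLUSTER, PROJECT, TOKEN)

# Sending requires network access and valid credentials (assumed response shape):
# with urllib.request.urlopen(request) as response:
#     for item in json.load(response)["items"]:
#         print(item["externalId"])
```

Building the request separately from sending it keeps the body easy to log or unit-test before any call reaches the service.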
To perform the same query with the GraphQL endpoint:
```graphql
query {
  searchEquipment(
    query: "temperature sensor",
    fields: ["name", "description"],
    filter: {
      range: {
        temperature: {
          lt: 25,
          gt: 15
        }
      }
    },
    sort: { externalId: ASC }
  ) {
    items {
      externalId
      name
      description
      # Include additional fields here
    }
  }
}
```
Text analysis and tokenization
The data you ingest is indexed automatically and asynchronously. This indexing process includes a text analysis step that breaks the text down into smaller components called tokens. Tokenization allows for efficient searching and matching of terms. The tokenization process includes:
- Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
- Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.
Tokenization example
Input | Generated tokens |
---|---|
Pump_123-ABC | pump_123, abc |
Temperature Sensor | temperature, sensor |
Temperature_Sensor | temperature_sensor (single token) |
John's pump | john's, pump |
example.com | example.com (period preserved) |
system32.exe | system32, exe (period splits) |
123.45 | 123.45 (period preserved) |
During tokenization, the following rules apply:
- Underscores (_) and apostrophes (') don't split tokens. They're preserved as part of the token.
- Spaces and most punctuation, including hyphens (-), commas (,), and colons (:), split tokens.
- Periods (.) are a special case:
  - They don't split a token between letters (example.com remains one token).
  - They don't split a token between numbers (123.45 remains one token).
  - They split tokens in most other cases, for example between a number and a letter (system32.exe → system32, exe).
For example, searching for sensor finds Temperature Sensor (split into the tokens temperature and sensor) but not Temperature_Sensor (which remains the single token temperature_sensor and doesn't start with sensor).
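The tokenization rules above can be approximated in a few lines of Python. This is a simplified model for illustration, not the service's actual analyzer: it assumes that letters, digits, underscores, and apostrophes form tokens, that a period is kept only between two letters or two digits, and that every other character splits tokens. The function name tokenize is hypothetical.

```python
def tokenize(text: str) -> list[str]:
    """Approximate the tokenization rules described above (simplified model)."""
    text = text.lower()  # lower-casing makes matching case-insensitive
    tokens: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        # Letters, digits, underscores, and apostrophes never split a token.
        if ch.isalnum() or ch in "_'":
            current.append(ch)
            continue
        # A period is kept only between two letters or between two digits.
        if ch == "." and 0 < i < len(text) - 1:
            prev, nxt = text[i - 1], text[i + 1]
            if (prev.isalpha() and nxt.isalpha()) or (prev.isdigit() and nxt.isdigit()):
                current.append(ch)
                continue
        # Any other character (space, hyphen, comma, colon, slash, ...) ends the token.
        if current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("Pump_123-ABC"))  # ['pump_123', 'abc']
print(tokenize("system32.exe"))  # ['system32', 'exe']
```

Running every row of the tokenization table through this function reproduces the generated tokens shown there, which makes it a convenient way to predict how a given identifier will be split.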
Bool prefix matching
The search endpoint uses this matching approach when you enter a multi-word search query:
- Convert each word except the last one to an exact match.
- Convert the last word to a prefix match, allowing for partial word matching.
This approach enables precise matching for complete words while supporting partial matching on the final word.
For example, if you search for "pressure valve ma", the system creates:
- An exact match for "pressure".
- An exact match for "valve".
- A prefix match for "ma" (which could match "main", "maintenance", "manual", etc.)
For a query to match a document, at least one of the conditions must be met. Documents matching several conditions will rank higher in the results.
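The bool prefix approach can be modeled as follows. This sketch makes simplifying assumptions: it splits text on whitespace instead of applying the full tokenization rules, and it scores a document by the number of conditions it satisfies, whereas the service's actual relevance scoring is more involved. The function names are hypothetical.

```python
def bool_prefix_conditions(query: str) -> list[tuple[str, str]]:
    """Every term but the last becomes an exact match; the last becomes a prefix match."""
    terms = query.lower().split()  # simplification: whitespace split only
    conditions = [("exact", term) for term in terms[:-1]]
    if terms:
        conditions.append(("prefix", terms[-1]))
    return conditions


def count_matches(conditions: list[tuple[str, str]], doc_tokens: list[str]) -> int:
    """Count how many conditions a document's tokens satisfy."""
    matched = 0
    for kind, term in conditions:
        if kind == "exact":
            hit = term in doc_tokens
        else:
            hit = any(token.startswith(term) for token in doc_tokens)
        if hit:
            matched += 1
    return matched


def rank(query: str, documents: list[str]) -> list[str]:
    """Keep documents that satisfy at least one condition (OR logic), best first."""
    conditions = bool_prefix_conditions(query)
    scored = [(count_matches(conditions, doc.lower().split()), doc) for doc in documents]
    hits = [pair for pair in scored if pair[0] > 0]
    # sorted() is stable, so documents with equal scores keep their input order.
    hits = sorted(hits, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


print(bool_prefix_conditions("pressure valve ma"))
# [('exact', 'pressure'), ('exact', 'valve'), ('prefix', 'ma')]
print(rank("pump fail", ["Failure detection", "Pump Station", "pump failure", "Valve control unit"]))
# ['pump failure', 'Failure detection', 'Pump Station']
```

Counting satisfied conditions is enough to show the two properties described above: any single match is sufficient to return a document, and documents that match more conditions come first.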
Matching details
- Exact term matching: complete words are matched exactly (case-insensitive).
- Prefix matching: the last term of the query matches the beginning of words in the document.
- OR logic by default: any term match contributes to the document's relevance score.
For example, searching for "pump fail" matches items such as:
- pump failure (exact match on "pump", prefix match on "fail")
- Pump Station (exact match on "pump" only)
- Failure detection (prefix match on "fail" only)

Items matching both terms rank higher in the results.
Phrase matching (exact sequences)
Exact phrase matches boost relevance significantly.
For example, searching for "heat exchanger":
- Ranks Heat Exchanger higher (exact phrase match).
- Ranks Exchanger for heat lower (individual term matches only).
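To illustrate why the exact phrase ranks higher, the following sketch adds a bonus when all query terms appear consecutively and in order. The PHRASE_BOOST weight and the function name phrase_score are hypothetical; this article doesn't specify the actual boost factor, and the sketch splits text on whitespace rather than applying the full tokenization rules.

```python
PHRASE_BOOST = 2.0  # hypothetical bonus for an exact phrase match


def phrase_score(query: str, document: str) -> float:
    """Score = matched bool-prefix conditions + a bonus if the query appears as a phrase."""
    q = query.lower().split()  # simplification: whitespace split only
    d = document.lower().split()
    if not q:
        return 0.0
    score = 0.0
    # Exact matches for every term except the last.
    score += sum(1 for term in q[:-1] if term in d)
    # Prefix match for the last term.
    if any(token.startswith(q[-1]) for token in d):
        score += 1
    # Phrase bonus: all query terms appear consecutively and in order.
    n = len(q)
    if any(d[i:i + n] == q for i in range(len(d) - n + 1)):
        score += PHRASE_BOOST
    return score


print(phrase_score("heat exchanger", "Heat Exchanger"))      # 4.0
print(phrase_score("heat exchanger", "Exchanger for heat"))  # 2.0
```

Both documents satisfy the same individual conditions, so in this model only the phrase bonus separates them.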
Limitations on matching
Matching has these limitations:
- No fuzzy or typo matching: queries require correct spelling and matching prefixes.
- No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.
Example query matches
valve
- Matches: Valve control unit, Safety valve unit
- Matches: Ball-valve (tokenized as ball and valve)
- Matches: Valve's (tokenized as valve's, which starts with the prefix "valve")
- Does not match: Valvoline (the token valvoline doesn't start with "valve")
pressure sensor
- Best matches: documents containing both "pressure" and "sensor"
- Lower relevance: documents containing only "pressure" or only "sensor"
- Example matches:
  - High pressure sensor calibration (matches both terms)
  - Pressure transmitter (matches only "pressure")
  - Temperature sensor (matches only "sensor")
- Does not match:
  - Pressured equipment ("pressure" isn't the last term, so it's an exact match and doesn't match "pressured")
compressor fail
- Matches:
  - Compressor failure log ("fail" is a prefix of "failure")
  - Compressor failing to start ("fail" is a prefix of "failing")
oil temp
- Matches:
  - Oil temperature readings ("temp" is a prefix of "temperature")
  - Oil temporary storage ("temp" is a prefix of "temporary")
flow meter calibra
- Matches:
  - Flow meter calibration procedure (highest rank - all terms match)
  - Flow meter maintenance (medium rank - two exact terms match)
  - Calibration of temperature meters (lower rank - only "calibra" matches; "meter" is an exact term and doesn't match "meters")
server1.example.com/v2.0
- Matches:
  - Connect to server1.example.com using v2.0 protocol (highest rank - all terms match)
  - server2.example.com documentation (matches example.com)
  - API v2.0 reference (matches the version number)
  - server1 is down after v2.0.1 upgrade (matches "server1" and prefix on "v2.0")
- Does not match:
  - example.net ("example.net" is preserved as a single token)
  - v2 (v2.0 is preserved as a single token)
Filtering differences between the query and search endpoints
Filters work mostly the same for both the query and search endpoints, but they differ in how they handle empty arrays and prefix filters on arrays.
Exists filter with empty array
Endpoint | Behavior | Example |
---|---|---|
Query | Empty array counted as existing | exists([]) → true |
Search/Aggregate | Empty array counted as non-existing | exists([]) → false |
Prefix filter on arrays
Endpoint | Behavior |
---|---|
Query | Checks array prefix sequence (ordered matching). Supports text[] and int[] arrays. |
Search/Aggregate | Checks each array item separately. Supports only a single text value as the prefix (not an array of values). |
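The filter behaviors in the two tables above can be modeled in Python as follows. This is an illustrative sketch of the documented semantics, not the endpoints' actual implementation. The function names are hypothetical, and the sketch models the unsupported array prefix on the search endpoint as "no match"; the real endpoint may reject such a filter instead.

```python
from typing import Sequence, Union

Prefix = Union[str, Sequence[str]]


def exists_query(value: object) -> bool:
    """Query endpoint: an empty array still counts as existing."""
    return value is not None


def exists_search(value: object) -> bool:
    """Search/aggregate endpoints: an empty array counts as non-existing."""
    return value is not None and value != []


def prefix_query(prefix: Prefix, array: Sequence[str]) -> bool:
    """Query endpoint: the array must start with the given sequence of elements."""
    expected = [prefix] if isinstance(prefix, str) else list(prefix)
    return list(array[: len(expected)]) == expected


def prefix_search(prefix: Prefix, array: Sequence[str]) -> bool:
    """Search endpoint: any single element may start with a single text prefix."""
    if not isinstance(prefix, str):
        # Array prefixes aren't supported; this sketch models that as "no match".
        return False
    return any(item.startswith(prefix) for item in array)


print(exists_query([]), exists_search([]))  # True False
print(prefix_query("pump", ["pumping", "valve"]), prefix_search("pump", ["pumping", "valve"]))  # False True
```

The key difference is that the query endpoint compares whole elements from the start of the array, while the search endpoint applies a string prefix to each element independently.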
Examples of prefix filter behavior
Prefix condition | Query API | Search API | Note |
---|---|---|---|
"pump" prefix of ["pump", "valve"] | ✅ | ✅ | Both APIs match single elements. |
"pump" prefix of ["pumping", "valve"] | ❌ | ✅ | Only Search matches "pumping" (element prefix exists). |
["pump","valve"] prefix of ["pump","valve","sensor"] | ✅ | ❌ | Search doesn't support array prefix. |
"pump" prefix of ["valve","pump"] | ❌ | ✅ | The Query API checks only the start of the array; Search matches any element. |
Nested filters are only supported for core data model assets
Nested filters aren't supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets.
If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.
Supported nested filters
The following types and properties support nested filtering in the search and aggregation endpoints:
Core data model type | Core data model property |
---|---|
CogniteActivity | assets |
CogniteFile | assets, category |
CogniteTimeSeries | assets, unit |
CogniteEquipment | asset |
CogniteMaintenance | asset |
CogniteNotification | asset |
CogniteOperation | asset |