Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries, search across multiple fields, and rank results by relevance.
The search endpoint supports:
- Full-text search across text fields.
- Configurable matching logic with AND/OR operators.
- Prefix-based word matching.
- Boosting exact phrases.
- Filtering on properties.
- Converting search results to a different unit.
Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
Search process overview
The search query is executed through three distinct steps:
- Text analysis and tokenization: the search query text is analyzed and split into tokens.
- Instance matching: the tokens are matched against pre-indexed instance data in your Cognite Data Fusion (CDF) project.
- Instance ranking: matched instances are sorted by their relevance to the query.
Text analysis and tokenization
The instances you ingest are automatically indexed asynchronously. This indexing process includes a text analysis step that breaks down the text into smaller components called tokens. The system compares the tokens extracted from your search query against the tokens indexed from your instances to determine matches and ranking.
The tokenization process includes:
- Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
- Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.
Tokenization rules
During tokenization, the following rules apply:
- Letters, numbers, and underscores (_) don’t split tokens. For example, user_name, File1, and AH12 each remain a single token.
- Periods (.) and apostrophes ('):
  - Don’t split tokens between letters (for example, e.g. and don't each remain one token).
  - Don’t split tokens between numbers (3.14 remains one token).
  - Split tokens in other cases (for example, in end., or between a letter and a number).
- Commas (,):
  - Don’t split tokens between two numbers (1,000 remains one token).
  - Split tokens between letters.
- Colons (:):
  - Don’t split tokens between two letters (Scale:Linear remains one token).
  - Split tokens between numbers.
- Other non-standard characters, such as double quotation marks ("), hyphens (-), and whitespace, split tokens.
Tokenization example
| Input | Generated tokens | Explanation |
|---|---|---|
| Pump_123-ABC | pump_123, abc | Hyphen (non-standard character) splits |
| Temperature Sensor | temperature, sensor | Whitespace (non-standard character) splits |
| Temperature_Sensor_1 | temperature_sensor_1 | Underscores (standard characters) don’t split |
| John's pump | john's, pump | Apostrophe doesn’t split the sequence of letters |
| example.com | example.com | Period doesn’t split the letter-to-letter sequence |
| system32.exe | system32, exe | Period splits the number-to-letter sequence |
| first.last 5.10 | first.last, 5.10 | Period doesn’t split the letter or number sequences |
| first,last 5,10 | first, last, 5,10 | Comma splits the letter sequence but not the number sequence |
| first:last 5:10 | first:last, 5, 10 | Colon splits the number sequence but not the letter sequence |
| John's 1st account has 1,000.5$ dollars | john's, 1st, account, has, 1,000.5, $, dollars | Combined rules |
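The rules above can be approximated with a short character-by-character tokenizer. The following is a minimal sketch, not the service's actual analyzer: the function name `tokenize` is hypothetical, only the ASCII apostrophe is handled, and separator characters are always discarded (so the sketch doesn't emit the `$` token shown in the last table row).

```python
from typing import List


def tokenize(text: str) -> List[str]:
    """Approximate the documented tokenization rules (illustrative only)."""
    tokens: List[str] = []
    current: List[str] = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalnum() or ch == "_":
            # Letters, numbers, and underscores never split a token.
            keep = True
        elif ch in ".'":
            # Kept only between two letters or between two numbers.
            keep = (prev.isalpha() and nxt.isalpha()) or (prev.isdigit() and nxt.isdigit())
        elif ch == ",":
            # Kept only between two numbers.
            keep = prev.isdigit() and nxt.isdigit()
        elif ch == ":":
            # Kept only between two letters.
            keep = prev.isalpha() and nxt.isalpha()
        else:
            # Assumption: every other character is a separator and is dropped.
            keep = False
        if keep:
            current.append(ch.lower())  # lower-casing makes matching case-insensitive
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("Pump_123-ABC"))
print(tokenize("first:last 5:10"))
```

Running a few of your own identifiers through a sketch like this can help you predict which query strings will line up with indexed tokens.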
Instance matching
After tokenization, the search compares your query tokens against indexed tokens from instances in your CDF project. The matching behavior depends on the token’s position in your query:
- Standard tokens (exact match): all tokens in the query except the last one require an exact match (case-insensitive) with a token in the instance data.
- Final token (prefix match): the last token is treated as a prefix, allowing for search-as-you-type functionality. It matches any word that starts with those characters.
Example scenario
If you search for pressure valve ma, the system matches instances based on the following criteria:
- pressure and valve: require an exact match.
- ma: requires a prefix match (matching main, manifold, manual, etc.).
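The position-dependent matching described in this section can be sketched as follows. This is an illustration under simplifying assumptions: the helper names are hypothetical, and text is tokenized by lower-casing and splitting on whitespace only, which is coarser than the tokenization rules described earlier.

```python
from typing import List


def simple_tokens(text: str) -> List[str]:
    # Simplification: lower-case and split on whitespace only.
    return text.lower().split()


def match_flags(query: str, instance_text: str) -> List[bool]:
    """Return one flag per query token: exact match, or prefix match for the last token."""
    query_tokens = simple_tokens(query)
    instance_tokens = simple_tokens(instance_text)
    flags: List[bool] = []
    for position, token in enumerate(query_tokens):
        if position == len(query_tokens) - 1:
            # Final token: matches any indexed token that starts with it.
            flags.append(any(t.startswith(token) for t in instance_tokens))
        else:
            # All other tokens: must equal an indexed token.
            flags.append(token in instance_tokens)
    return flags


print(match_flags("pressure valve ma", "Pressure valve manifold"))
print(match_flags("pressure valve ma", "Pressure valves main"))
```

In the second call, valve doesn't match valves because only the final query token is treated as a prefix.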
Search operators
The token matching rules above determine whether individual tokens match a given instance. The search operator determines which instances qualify as a match for the entire query.
| Operator | Behavior | Example (query: “pressure sensor”) |
|---|---|---|
| OR (default) | Returns instances matching at least one token. | Matches instances containing “pressure” only, “sensor” only, or both. |
| AND | Returns only instances matching all tokens. | Matches only instances containing both “pressure” AND “sensor”. |
Effective November 2026, the default search operator will change from OR to AND. To maintain your current search behavior, we recommend explicitly setting the operator in your search queries.
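As a rough sketch of how the operator turns per-token results into a match decision, and of how you might pin the operator in a request body before the default changes, consider the helpers below. Both function names are hypothetical; the `operator` field and its `"OR"`/`"AND"` values are the ones used in the examples in this article.

```python
from typing import Dict, List


def qualifies(token_flags: List[bool], operator: str = "OR") -> bool:
    """Decide whether an instance matches, given one match flag per query token."""
    if operator == "AND":
        return all(token_flags)  # every token must match
    if operator == "OR":
        return any(token_flags)  # at least one token must match
    raise ValueError(f"unsupported operator: {operator}")


def with_explicit_operator(body: Dict, operator: str = "OR") -> Dict:
    """Return a copy of a search request body with the operator set explicitly."""
    pinned = dict(body)
    pinned.setdefault("operator", operator)  # keep an operator that is already set
    return pinned


print(qualifies([True, False], "OR"))
print(qualifies([True, False], "AND"))
print(with_explicit_operator({"query": "pressure sensor"}))
```

Setting the operator in every request means your results don't depend on which default is in effect.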
Limitations on matching
Consider these limitations when designing your search:
- No fuzzy or typo matching: queries require correct spelling and matching prefixes.
- No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.
Instance ranking
When multiple instances match a query, they’re ordered by relevance. The following factors determine the ranking:
Number of matching tokens
Instances that match more tokens are ranked higher. This is mainly relevant when using the OR operator since the AND operator requires all tokens to match.
For example, searching for Heat Exchanger 243 alpha using the OR operator:
- Ranks Heat Exchanger 243 higher (matches three of four tokens).
- Ranks Heat Exchanger lower (matches two of four tokens).
- Ranks Heat lowest (matches one of four tokens).
Phrase matching (exact sequences)
Exact phrase matches boost relevance significantly.
For example, searching for heat exchanger:
- Ranks Heat Exchanger higher (exact phrase match).
- Ranks Heat For Exchanger lower (individual token matches only).
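To make the two ranking factors concrete, here is a toy scoring sketch. It is not the scoring function the service uses: the function names and the `PHRASE_BOOST` weight are invented for illustration, tokens are produced by lower-casing and splitting on whitespace, and all tokens are compared exactly (the prefix rule for the final token is left out).

```python
from typing import List

PHRASE_BOOST = 10  # arbitrary weight, chosen only so that a phrase match outranks token counts


def simple_tokens(text: str) -> List[str]:
    return text.lower().split()


def contains_phrase(query_tokens: List[str], instance_tokens: List[str]) -> bool:
    """True if the query tokens appear as one contiguous run in the instance tokens."""
    n = len(query_tokens)
    return n > 0 and any(
        instance_tokens[i:i + n] == query_tokens
        for i in range(len(instance_tokens) - n + 1)
    )


def toy_score(query: str, instance_text: str) -> int:
    query_tokens = simple_tokens(query)
    instance_tokens = simple_tokens(instance_text)
    matched = sum(1 for token in query_tokens if token in instance_tokens)
    boost = PHRASE_BOOST if contains_phrase(query_tokens, instance_tokens) else 0
    return matched + boost


def rank(query: str, instances: List[str]) -> List[str]:
    return sorted(instances, key=lambda text: toy_score(query, text), reverse=True)


print(rank("Heat Exchanger 243 alpha", ["Heat", "Heat Exchanger 243", "Heat Exchanger"]))
print(rank("heat exchanger", ["Heat For Exchanger", "Heat Exchanger"]))
```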
Example query matches
valve
With operator: "OR":
- Matches: Valve control unit, Safety valve unit
- Matches: Ball-valve (tokenized as ball and valve)
- Does not match: Valvoline (different token) or Valve's (tokenized as valve's)
pressure sensor
With operator: "OR":
- Best matches: documents containing both “pressure” and “sensor”
- Lower relevance: documents with either “pressure” or “sensor” alone
- Example matches:
  - High pressure sensor calibration (matches both tokens)
  - Pressure transmitter (matches only “pressure”)
  - Temperature sensor (matches only “sensor”)
- Does not match: Pressured equipment (“pressure” isn’t the final query token, so it isn’t treated as a prefix)
With operator: "AND":
- Requires that both “pressure” and “sensor” are present as exact matches.
- Matches: High pressure sensor calibration.
- Does not match: Pressure transmitter (missing “sensor”) or Temperature sensor (missing “pressure”).
compressor fail
With operator: "OR":
- Matches:
  - Compressor failure log (“fail” is a prefix of “failure”)
  - Compressor failing to start (“fail” is a prefix of “failing”)
With operator: "AND":
- Requires an exact match for both “compressor” and “fail”.
- Does not match: Compressor failure log, because “fail” is not an exact match for the token “failure”.
- Does not match: Compressor failing to start, because “fail” is not an exact match for the token “failing”.
oil temp
With operator: "OR":
- Matches:
  - Oil temperature readings (“temp” is a prefix of “temperature”)
  - Oil temporary storage (“temp” is a prefix of “temporary”)
flow meter calibra
With operator: "OR":
- Matches:
  - Flow meter calibration procedure (highest rank - all tokens match)
  - Flow meter maintenance (medium rank - two exact tokens match)
  - Calibration of temperature meters (lower rank - only “meter” and “calibra” match)
server1.example.com/v2.0
With operator: "OR":
- Matches:
  - Connect to server1.example.com using v2.0 protocol (highest rank - all tokens match)
  - server2.example.com documentation (matches example.com)
  - API v2.0 reference (matches version number)
  - server1 is down after v2.0.1 upgrade (matches “server1” and prefix on “v2.0”)
- Does not match:
  - example.net (“example.net” is preserved as a single token)
  - v2 (v2.0 is preserved as a single token)
Example search query
The search query below filters the results to Equipment with temperatures between 15 and 25 degrees Celsius, and a name or description that matches both “temperature” and “sensor”, because the operator is set to AND.
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "AND",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
To perform the same query with the GraphQL endpoint:
Example search query in GraphQL
searchEquipment(
  query: "temperature sensor",
  fields: ["name", "description"],
  filter: {
    range: {
      temperature: {
        lt: 25,
        gt: 15
      }
    }
  },
  sort: { externalId: ASC }
) {
  items {
    externalId
    name
    description
    # Include additional fields here
  }
}
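If you call the REST endpoint from a script, the JSON request body shown earlier can be built and wrapped in an HTTP request as sketched below. Treat the details as assumptions to verify against the API reference: the URL path, the bearer-token header, and the function name `build_search_request` are illustrative, and the base URL, project, and token in the usage lines are placeholders. The sketch only constructs the request and doesn't send it.

```python
import json
import urllib.request


def build_search_request(base_url: str, project: str, token: str) -> urllib.request.Request:
    """Build (but don't send) a POST request carrying the example search body."""
    body = {
        "view": {"type": "view", "space": "testSpace", "externalId": "Equipment", "version": "v1"},
        "query": "temperature sensor",
        "operator": "AND",  # set explicitly so the result doesn't depend on the default
        "instanceType": "node",
        "properties": ["name", "description"],
        "targetUnits": [],
        "filter": {"range": {"property": "temperature", "lt": 25, "gt": 15}},
        "includeTyping": False,
        "sort": [{"property": ["externalId"], "direction": "ascending"}],
        "limit": 100,
    }
    # Assumed path for the instance search endpoint; confirm it in the API reference.
    url = f"{base_url}/api/v1/projects/{project}/models/instances/search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed authentication scheme
        },
        method="POST",
    )


request = build_search_request("https://example.invalid", "my-project", "dummy-token")
print(request.get_method(), request.full_url)
```

To send the request, you could pass it to `urllib.request.urlopen` and parse the JSON response.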
Filtering differences between the query and search endpoints
Filters work mostly the same for both the query and search endpoints, but there are a few differences in how they handle empty arrays and prefix filters on arrays.
Exists filter with empty array
| Endpoint | Behavior | Example |
|---|---|---|
| Query | Empty array counted as existing | exists([]) → true |
| Search/Aggregate | Empty array counted as non-existing | exists([]) → false |
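The difference in the table can be modeled with two small predicates. This is only a sketch of the documented behavior for a single property value; the function names are hypothetical, and `None` stands in for a property that has no value.

```python
from typing import Any


def exists_query(value: Any) -> bool:
    # Query endpoint: an empty array still counts as an existing value.
    return value is not None


def exists_search(value: Any) -> bool:
    # Search/aggregate endpoints: an empty array counts as non-existing.
    return value is not None and value != []


print(exists_query([]), exists_search([]))
print(exists_query(["pump"]), exists_search(["pump"]))
```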
Prefix filter on arrays
| Endpoint | Behavior |
|---|---|
| Query | Checks array prefix sequence (ordered matching). Supports text[] and int[] arrays. |
| Search/Aggregate | Checks each array item separately. Supports only single-value text field prefix filters (no arrays). |
Examples of prefix filter behavior
| Prefix condition | Query API | Search API | Note |
|---|---|---|---|
| "pump" prefix of ["pump", "valve"] | ✅ | ✅ | Both APIs match single elements. |
| "pump" prefix of ["pumping", "valve"] | ❌ | ✅ | Only the Search API matches "pumping" (element prefix exists). |
| ["pump","valve"] prefix of ["pump","valve","sensor"] | ✅ | ❌ | The Search API doesn’t support array prefixes. |
| "pump" prefix of ["valve","pump"] | ❌ | ✅ | The Query API checks the start of the sequence. The Search API matches any element. |
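The four rows above can be reproduced with a small model of the two behaviors. This sketch only mirrors the table: the function names are hypothetical, and returning `False` for an array prefix in the search model is an assumption that represents the ❌ in the table, not a statement about how the endpoint responds to such a filter.

```python
from typing import List, Union

Prefix = Union[str, List[str]]


def query_prefix(value: List[str], prefix: Prefix) -> bool:
    """Query endpoint model: the array must start with the given element sequence."""
    sequence = prefix if isinstance(prefix, list) else [prefix]
    return value[:len(sequence)] == sequence


def search_prefix(value: List[str], prefix: Prefix) -> bool:
    """Search endpoint model: any single element may start with the given string."""
    if isinstance(prefix, list):
        return False  # assumption: array prefixes aren't supported, shown as no match
    return any(item.startswith(prefix) for item in value)


rows = [
    ("pump", ["pump", "valve"]),
    ("pump", ["pumping", "valve"]),
    (["pump", "valve"], ["pump", "valve", "sensor"]),
    ("pump", ["valve", "pump"]),
]
for prefix, value in rows:
    print(prefix, value, query_prefix(value, prefix), search_prefix(value, prefix))
```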
Nested filters are only supported for core data model assets
Nested filters aren’t supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets.
If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.
Supported nested filters
The following types and properties support nested filtering in the search and aggregation endpoints: