Search features - Cognite Docs

Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries and multi-field search, and to rank results by relevance. The search endpoint supports:

Full-text search across text fields.
Configurable matching logic with AND/OR operators.
Prefix-based word matching.
Boosting exact phrases.
Filtering on properties.
Converting search results to a different unit.

See the Query features article for information about filtering.

Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
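
Because of this delay, a client that searches immediately after writing data may want to retry for a short time. The following is a minimal sketch only: `wait_until_searchable` and the `search` callable are hypothetical names rather than part of any Cognite SDK, and the attempt count and delay are arbitrary defaults.

```python
import time
from typing import Callable, Sequence


def wait_until_searchable(
    search: Callable[[], Sequence],
    attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Sequence:
    """Call `search` until it returns a non-empty result or attempts run out."""
    results: Sequence = []
    for attempt in range(attempts):
        results = search()
        if results:
            return results
        # Sleep only between attempts, not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return results
```

Pass any zero-argument function that runs your search and returns the matched items; an empty result after the last attempt is returned as-is so the caller can decide how to handle it.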

Search process overview

The search query is executed through three distinct steps:

Text analysis and tokenization: the search query text is analyzed and split into tokens.
Instance matching: the tokens are matched against pre-indexed instance data in your Cognite Data Fusion (CDF) project.
Instance ranking: matched instances are sorted by their relevance to the query.

Text analysis and tokenization

The instances you ingest are automatically indexed asynchronously. This indexing process includes a text analysis step that breaks down the text into smaller components called tokens. The system compares the tokens extracted from your search query against the tokens indexed from your instances to determine matches and ranking. The tokenization process includes:

Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.

Tokenization rules

The smarter search results feature (beta) introduces additional tokenization for selected fields if enabled for your CDF project. For more information, see About smarter search results.

During tokenization, the following rules apply:

Letters, numbers, and underscores (_) don’t split tokens. For example, user_name, File1, and AH12 each remain a single token.
Periods (.) and apostrophes ('):
- Don’t split tokens between letters (for example, e.g. and don't each remain one token).
- Don’t split tokens between numbers (3.14 remains one token).
- Will split tokens in other cases (for example, in end., or between a letter and a number).
Commas (,):
- Don’t split tokens between two numbers (1,000 remains one token).
- Will split tokens between letters.
Colons (:):
- Don’t split tokens between two letters (Scale:Linear remains one token).
- Will split tokens between numbers.
Other non-standard characters, such as the double quotation mark ("), the hyphen (-), and whitespace, will split tokens.

Refer to the formal specification outlined in Unicode Standard Annex #29 for implementation details.

Tokenization example

Input	Generated tokens	Explanation
`Pump_123-ABC`	`pump_123`, `abc`	Hyphen (non-standard character) splits
`Temperature Sensor`	`temperature`, `sensor`	Whitespace (non-standard character) splits
`Temperature_Sensor_1`	`temperature_sensor_1`	Underscores (standard characters) don’t split
`John's pump`	`john's`, `pump`	Apostrophe doesn’t split the sequence of letters
`example.com`	`example.com`	Period doesn’t split the letter-to-letter sequence
`system32.exe`	`system32`, `exe`	Period splits the number-to-letter sequence
`first.last 5.10`	`first.last`, `5.10`	Period doesn’t split the letter or number sequences
`first,last 5,10`	`first`, `last`, `5,10`	Comma splits the letter sequence but not the number sequence
`first:last 5:10`	`first:last`, `5`, `10`	Colon splits the number sequence but not the letter sequence
`John's 1st account has 1,000.5$ dollars`	`john's`, `1st`, `account`, `has`, `1,000.5`, `$`, `dollars`	Combined rules
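
The rules and examples above can be approximated in a few lines of Python. This is a simplified sketch of the listed rules only, not the service's implementation or a full Unicode Standard Annex #29 word segmenter, and the function name `tokenize` is hypothetical. It discards every separator character, so unlike the last table row it does not emit `$` as a token.

```python
def tokenize(text: str) -> list[str]:
    """Approximate the tokenization rules: split on separators, then lowercase."""
    tokens: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch.isalnum() or ch == "_":
            keep = True  # letters, numbers, and underscores never split
        elif ch in ".'":
            # kept only between two letters or between two numbers
            keep = (prev.isalpha() and nxt.isalpha()) or (prev.isdigit() and nxt.isdigit())
        elif ch == ",":
            keep = prev.isdigit() and nxt.isdigit()  # kept only between numbers
        elif ch == ":":
            keep = prev.isalpha() and nxt.isalpha()  # kept only between letters
        else:
            keep = False  # hyphen, whitespace, quotes, slash, and so on
        if keep:
            current.append(ch.lower())
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens
```

Under this sketch, `tokenize("server1.example.com/v2.0")` returns `['server1', 'example.com', 'v2.0']`: the period after `server1` sits between a number and a letter and splits, while the periods in `example.com` and `v2.0` do not.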

Instance matching

After tokenization, the search compares your query tokens against indexed tokens from instances in your CDF project. The matching behavior depends on the token’s position in your query:

Standard tokens (exact match): all tokens in the query except the last one require an exact match (case-insensitive) with a token in the instance data.
Final token (prefix match): the last token is treated as a prefix, allowing for search-as-you-type functionality. It matches any word that starts with those characters.

Example scenario

If you search for pressure valve ma, the system matches instances based on the following criteria:

pressure and valve: require an exact match.
ma: requires a prefix match (matching main, manifold, manual, etc.).
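
The position-dependent matching described above can be sketched as follows. This is an illustration under the assumption that both the query and the instance text have already been tokenized and lowercased; `match_flags` is a hypothetical helper, not an API of the search endpoint.

```python
def match_flags(query_tokens: list[str], instance_tokens: list[str]) -> list[bool]:
    """Return one flag per query token: exact match, except prefix for the last."""
    flags: list[bool] = []
    last_index = len(query_tokens) - 1
    for i, token in enumerate(query_tokens):
        if i == last_index:
            # Final token: matches any indexed token that starts with it.
            flags.append(any(t.startswith(token) for t in instance_tokens))
        else:
            # Earlier tokens: must equal an indexed token exactly.
            flags.append(token in instance_tokens)
    return flags
```

For the query tokens `["pressure", "valve", "ma"]`, an instance tokenized as `["main", "pressure", "valve"]` yields a match for every token, while `["pressured", "valve", "manual"]` fails on `pressure` because only the final token is treated as a prefix.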

Search operators

The token matching rules above determine whether individual tokens match a given instance. The search operator determines which instances qualify as a match for the entire query.

Operator	Behavior	Example (query: “pressure sensor”)
OR (default)	Returns instances matching at least one token.	Matches instances containing “pressure” only, “sensor” only, or both.
AND	Returns only instances matching all tokens.	Matches only instances containing both “pressure” AND “sensor”.
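
As a rough sketch of how the operator turns per-token results into a decision for the whole instance, the hypothetical `qualifies` helper below takes one boolean per query token (whether that token matched the instance) and applies the OR or AND rule from the table. The helper name and its input shape are assumptions made for illustration.

```python
def qualifies(token_flags: list[bool], operator: str = "OR") -> bool:
    """Decide whether an instance matches, given one match flag per query token."""
    if operator == "OR":
        return any(token_flags)  # at least one token matched
    if operator == "AND":
        return all(token_flags) and bool(token_flags)  # every token matched
    raise ValueError(f"unsupported operator: {operator!r}")
```

For the query "pressure sensor", an instance named "Pressure transmitter" gives the flags `[True, False]`, so it qualifies with OR but not with AND.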

Effective November 2026, the default search operator will change from OR to AND. To maintain your current search behavior, we recommend explicitly setting the operator in your search queries.

Limitations on matching

Consider these limitations when designing your search:

No fuzzy or typo matching: queries require correct spelling and matching prefixes.
No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.

Instance ranking

When multiple instances match a query, they’re ordered by relevance. The following factors determine the ranking:

Number of matching tokens

Instances that match more tokens are ranked higher. This is mainly relevant when using the OR operator since the AND operator requires all tokens to match. For example, searching for Heat Exchanger 243 alpha using the OR operator:

Ranks Heat Exchanger 243 higher (matches three of four tokens).
Ranks Heat Exchanger lower (matches two of four tokens).
Ranks Heat lowest (matches one of four tokens).
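
The ordering in this example can be reproduced with a small sketch that counts matching tokens and sorts by that count. It assumes whitespace-only tokenization and models only this one ranking factor; the real relevance score is not specified here, and `count_matches` is a hypothetical helper.

```python
def count_matches(query: str, name: str) -> int:
    """Count query tokens found in `name` (exact match, prefix for the last token)."""
    query_tokens = query.lower().split()
    name_tokens = name.lower().split()
    if not query_tokens:
        return 0
    *exact_tokens, last_token = query_tokens
    count = sum(1 for token in exact_tokens if token in name_tokens)
    if any(t.startswith(last_token) for t in name_tokens):
        count += 1
    return count


query = "Heat Exchanger 243 alpha"
names = ["Heat", "Heat Exchanger 243", "Heat Exchanger"]
# Instances that match more tokens come first.
ranked = sorted(names, key=lambda n: count_matches(query, n), reverse=True)
```

Here `ranked` is `["Heat Exchanger 243", "Heat Exchanger", "Heat"]`, with three, two, and one matching tokens respectively.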

Phrase matching (exact sequences)

Exact phrase matches boost relevance significantly. For example, searching for heat exchanger:

Ranks Heat Exchanger higher (exact phrase match).
Ranks Heat For Exchanger lower (individual token matches only).
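
One way to picture the phrase check is as a search for the query tokens as a contiguous run inside the instance tokens. The sketch below only detects the phrase; how much a detected phrase raises the score is not described in this article, and `contains_phrase` is a hypothetical helper operating on already tokenized, lowercased input.

```python
def contains_phrase(query_tokens: list[str], instance_tokens: list[str]) -> bool:
    """Return True if the query tokens appear consecutively, in order."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        instance_tokens[start:start + n] == query_tokens
        for start in range(len(instance_tokens) - n + 1)
    )
```

With the query tokens `["heat", "exchanger"]`, the instance tokens `["heat", "exchanger"]` contain the phrase, while `["heat", "for", "exchanger"]` contain both tokens but not as an exact sequence.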

Example query matches

`valve`

With operator: "OR":

Matches: Valve control unit, Safety valve unit
Matches: Ball-valve (tokenized as ball and valve)
Does not match: Valvoline (different token) or Valve's (tokenized as valve's)

`pressure sensor`

With operator: "OR":

Best matches: instances containing both “pressure” and “sensor”
Lower relevance: instances with either “pressure” or “sensor” alone
Example matches:
- High pressure sensor calibration (matches both tokens)
- Pressure transmitter (matches only “pressure”)
- Temperature sensor (matches only “sensor”)
Does not match:
- Pressured equipment (“pressure” isn’t the last query token, so it isn’t matched as a prefix)

With operator: "AND":

Requires that both “pressure” and “sensor” are present as exact matches.
Matches: High pressure sensor calibration.
Does not match: Pressure transmitter (missing “sensor”) or Temperature sensor (missing “pressure”).

`compressor fail`

With operator: "OR":

Matches:
- Compressor failure log (“fail” is a prefix of “failure”)
- Compressor failing to start (“fail” is a prefix of “failing”)

With operator: "AND":

Requires an exact match for both “compressor” and “fail”.
Does not match: Compressor failure log, because fail is not an exact match for the token failure.
Does not match: Compressor failing to start, because fail is not an exact match for the token failing.

`oil temp`

With operator: "OR":

Matches:
- Oil temperature readings (“temp” is a prefix of “temperature”)
- Oil temporary storage (“temp” is a prefix of “temporary”)

`flow meter calibra`

With operator: "OR":

Matches:
- Flow meter calibration procedure (highest rank - all tokens match)
- Flow meter maintenance (medium rank - two exact tokens match)
- Calibration of temperature meters (lower rank - only “meter” and “calibra” match)

`server1.example.com/v2.0`

With operator: "OR":

Matches:
- Connect to server1.example.com using v2.0 protocol (highest rank - all tokens match)
- server2.example.com documentation (matches example.com)
- API v2.0 reference (matches version number)
- server1 is down after v2.0.1 upgrade (matches “server1” and prefix on “v2.0”)
Does not match:
- example.net (“example.net” is preserved as a single token)
- v2 (v2.0 is preserved as a single token)

Example search query

The search query below filters the results to Equipment with temperatures between 15 and 25 degrees Celsius, and a name or description that contains the word “temperature”, or a word that starts with “sensor”.

Example search query

{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "AND",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
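
The request body above could be sent from Python roughly as sketched below. The sketch only builds the HTTP request and does not send it. The `project_url` parameter, the `/models/instances/search` path suffix, and bearer-token authentication are assumptions to verify against the API reference; `build_search_request` is a hypothetical helper, and the body is a trimmed copy of the example above with the operator set explicitly.

```python
import json
import urllib.request


def build_search_request(project_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the example search body."""
    body = {
        "view": {
            "type": "view",
            "space": "testSpace",
            "externalId": "Equipment",
            "version": "v1",
        },
        "query": "temperature sensor",
        "operator": "AND",  # set explicitly rather than relying on the default
        "instanceType": "node",
        "properties": ["name", "description"],
        "filter": {"range": {"property": "temperature", "lt": 25, "gt": 15}},
        "limit": 100,
    }
    return urllib.request.Request(
        # Assumed path; check the API reference for your project.
        url=project_url.rstrip("/") + "/models/instances/search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )
```

To execute the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response.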

To perform the same query with the GraphQL endpoint:

Example search query in GraphQL

searchEquipment(
  query: "temperature sensor",
  fields: ["name", "description"],
  filter: {
    range: {
      temperature: {
        lt: 25,
        gt: 15
      }
    }
  },
  sort: { externalId: ASC }
) {
  items {
    externalId
    name
    description
    # Include additional fields here
  }
}

Filtering differences between the `query` and `search` endpoints

Filters work mostly the same for both the query and search endpoints, but there are a few differences in the handling of empty arrays and prefix arrays.

Exists filter with empty array

Endpoint	Behavior	Example
Query	Empty array counted as existing	`exists([])` → true
Search/Aggregate	Empty array counted as non-existing	`exists([])` → false
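
The difference in the table can be modeled with two small predicates. This is an illustrative sketch only: it assumes a missing property is represented as `None`, and the function names are hypothetical.

```python
def exists_in_query(value) -> bool:
    """Query endpoint: an empty array still counts as an existing value."""
    return value is not None


def exists_in_search(value) -> bool:
    """Search/Aggregate endpoints: an empty array counts as non-existing."""
    return value is not None and value != []
```

Both predicates agree for populated values and for missing ones; they differ only for `[]`, which is why the same `exists` filter can return different instances from the two endpoints.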

Prefix filter on arrays

Endpoint	Behavior
Query	Checks array prefix sequence (ordered matching). Supports `text[]` and `int[]` arrays.
Search/Aggregate	Checks each array item separately. Supports only single-value text field prefix filters (no arrays).

Examples of prefix filter behavior

Prefix Condition	Query API	Search API	Note
`"pump"` prefix of `["pump", "valve"]`	✅	✅	Both APIs match single elements.
`"pump"` prefix of `["pumping", "valve"]`	❌	✅	Only Search matches `"pumping"` (element prefix exists).
`["pump","valve"]` prefix of `["pump","valve","sensor"]`	✅	❌	Search doesn’t support array prefix.
`"pump"` prefix of `["valve","pump"]`	❌	✅	The Query API checks the start of the sequence; the Search API checks any element.
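
The four rows above can be reproduced with the following sketch of the two semantics. It is a model of the table, not of the actual implementation: the function names are hypothetical, and an array-valued prefix on the search side is modeled as "no match" to mirror the ❌ in the table, whereas the real endpoint may reject such a filter instead.

```python
def query_prefix(needle, values: list) -> bool:
    """Query endpoint: the array must start with the given element sequence."""
    sequence = needle if isinstance(needle, list) else [needle]
    return values[:len(sequence)] == sequence


def search_prefix(needle, values: list[str]) -> bool:
    """Search endpoint: any single element may start with the given text."""
    if isinstance(needle, list):
        return False  # array prefixes aren't supported by search
    return any(value.startswith(needle) for value in values)
```

For example, `query_prefix("pump", ["valve", "pump"])` is `False` because the array does not start with `"pump"`, while `search_prefix("pump", ["valve", "pump"])` is `True` because one element does.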

Nested filters are only supported for core data model assets

Nested filters aren’t supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets. If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.

Supported nested filters

The following types and properties support nested filtering in the search and aggregation endpoints:

Core data model type	Core data model property
CogniteActivity	assets
CogniteFile	assets, category
CogniteTimeSeries	assets, unit
CogniteEquipment	asset
CogniteMaintenance	asset
CogniteNotification	asset
CogniteOperation	asset
