Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries, search across multiple fields, and rank results by relevance. The search endpoint supports:
  • Full-text search across text fields.
  • Configurable matching logic with AND/OR operators.
  • Prefix-based word matching.
  • Boosting exact phrases.
  • Filtering on properties.
  • Converting search results to a different unit.
See the Query features article for information about filtering.
Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
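As a minimal sketch, a search request body has roughly the shape shown below. The testSpace space, Equipment view, and property names are illustrative placeholders rather than values from your project; the rest of this article explains how the query text is tokenized, matched, and ranked.
Minimal search request (illustrative)
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "pressure sensor",
  "properties": ["name", "description"],
  "limit": 10
}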

Search process overview

The search query is executed through three distinct steps:
  1. Text analysis and tokenization: the search query text is analyzed and split into tokens.
  2. Instance matching: the tokens are matched against pre-indexed instance data in your Cognite Data Fusion (CDF) project.
  3. Instance ranking: matched instances are sorted by their relevance to the query.

Text analysis and tokenization

The instances you ingest are automatically indexed asynchronously. This indexing process includes a text analysis step that breaks down the text into smaller components called tokens. The system compares the tokens extracted from your search query against the tokens indexed from your instances to determine matches and ranking. The tokenization process includes:
  • Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
  • Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.

Tokenization rules

During tokenization, the following rules apply:
  • Letters, numbers, and underscores (_) don’t split tokens. For example, user_name, File1, and AH12 each remain a single token.
  • Periods (.) and apostrophes ('):
    • Don’t split tokens between letters (for example, e.g. and don't each remain one token).
    • Don’t split tokens between numbers (3.14 remains one token).
    • Will split tokens in other cases (for example, at the end of end., or between a letter and a number).
  • Commas (,):
    • Don’t split tokens between two numbers (1,000 remains one token).
    • Will split tokens between letters.
  • Colons (:):
    • Don’t split tokens between two letters (Scale:Linear remains one token).
    • Will split tokens between numbers.
  • Other non-standard characters, such as the double quotation mark ("), the hyphen (-), and whitespace, will split tokens.
Refer to the formal specification outlined in Unicode Standard Annex #29 for implementation details.

Tokenization example

Input | Generated tokens | Explanation
Pump_123-ABC | pump_123, abc | The hyphen (a non-standard character) splits the token
Temperature Sensor | temperature, sensor | Whitespace (a non-standard character) splits the token
Temperature_Sensor_1 | temperature_sensor_1 | Underscores (standard characters) don’t split the token
John's pump | john's, pump | The apostrophe doesn’t split the sequence of letters
example.com | example.com | The period doesn’t split the letter-to-letter sequence
system32.exe | system32, exe | The period splits the number-to-letter sequence
first.last 5.10 | first.last, 5.10 | Periods don’t split the letter or number sequences
first,last 5,10 | first, last, 5,10 | The comma splits the letter sequence but not the number sequence
first:last 5:10 | first:last, 5, 10 | The colon splits the number sequence but not the letter sequence
John's 1st account has 1,000.5$ dollars | john's, 1st, account, has, 1,000.5, $, dollars | Combined rules

Instance matching

After tokenization, the search compares your query tokens against indexed tokens from instances in your CDF project. The matching behavior depends on the token’s position in your query:
  • Standard tokens (exact match): all tokens in the query except the last one require an exact match (case-insensitive) with a token in the instance data.
  • Final token (prefix match): the last token is treated as a prefix, allowing for search-as-you-type functionality. It matches any word that starts with those characters.

Example scenario

If you search for pressure valve ma, the system matches instances based on the following criteria:
  • pressure and valve: require an exact match.
  • ma: requires a prefix match (matching main, manifold, manual, etc.).
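A request for this scenario could look like the sketch below; the view reference and property list are illustrative placeholders. Only the trailing ma is treated as a prefix, because it is the final token.
Example prefix search request (illustrative)
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "pressure valve ma",
  "properties": ["name", "description"],
  "limit": 25
}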

Search operators

The token matching rules above determine whether individual tokens match a given instance. The search operator determines which instances qualify as a match for the entire query.
Operator | Behavior | Example (query: “pressure sensor”)
OR (default) | Returns instances matching at least one token. | Matches instances containing “pressure” only, “sensor” only, or both.
AND | Returns only instances matching all tokens. | Matches only instances containing both “pressure” and “sensor”.
Effective November 2026, the default search operator will change from OR to AND. To maintain your current search behavior, we recommend explicitly setting the operator in your search queries.
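Because the default is scheduled to change, a robust pattern is to set the operator explicitly in every search request instead of relying on the default. The sketch below pins the current OR behavior; the view reference is an illustrative placeholder.
Example request with an explicit operator (illustrative)
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "pressure sensor",
  "operator": "OR",
  "properties": ["name", "description"]
}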

Limitations on matching

Consider these limitations when designing your search:
  • No fuzzy or typo matching: queries require correct spelling and matching prefixes.
  • No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.

Instance ranking

When multiple instances match a query, they’re ordered by relevance. The following factors determine the ranking:

Number of matching tokens

Instances that match more tokens are ranked higher. This is mainly relevant when using the OR operator since the AND operator requires all tokens to match. For example, searching for Heat Exchanger 243 alpha using the OR operator:
  • Ranks Heat Exchanger 243 higher (matches three of four tokens).
  • Ranks Heat Exchanger lower (matches two of four tokens).
  • Ranks Heat lowest (matches one of four tokens).

Phrase matching (exact sequences)

Exact phrase matches boost relevance significantly. For example, searching for heat exchanger:
  • Ranks Heat Exchanger higher (exact phrase match).
  • Ranks Heat For Exchanger lower (individual token matches only).

Example query matches

valve

With operator: "OR":
  • Matches: Valve control unit, Safety valve unit
  • Matches: Ball-valve (tokenized as ball and valve)
  • Does not match: Valvoline (different token) or Valve's (tokenized as valve's)

pressure sensor

With operator: "OR":
  • Best matches: documents containing both “pressure” and “sensor”
  • Lower relevance: documents with either “pressure” or “sensor” alone
  • Example matches:
    • High pressure sensor calibration (matches both tokens)
    • Pressure transmitter (matches only “pressure”)
    • Temperature sensor (matches only “sensor”)
  • Does not match:
    • Pressured equipment (“pressure” requires an exact match, so “pressured” doesn’t qualify)
With operator: "AND":
  • Requires that both “pressure” and “sensor” are present as exact matches.
  • Matches: High pressure sensor calibration.
  • Does not match: Pressure transmitter (missing “sensor”) or Temperature sensor (missing “pressure”).

compressor fail

With operator: "OR":
  • Matches:
    • Compressor failure log (“fail” is a prefix of “failure”)
    • Compressor failing to start (“fail” is a prefix of “failing”)
With operator: "AND":
  • Requires an exact match for both “compressor” and “fail”.
  • Does not match: Compressor failure log, because fail is not an exact match for the token failure.
  • Does not match: Compressor failing to start, because fail is not an exact match for the token failing.

oil temp

With operator: "OR":
  • Matches:
    • Oil temperature readings (“temp” is a prefix of “temperature”)
    • Oil temporary storage (“temp” is a prefix of “temporary”)

flow meter calibra

With operator: "OR":
  • Matches:
    • Flow meter calibration procedure (highest rank - all tokens match)
    • Flow meter maintenance (medium rank - two exact tokens match)
    • Calibration of temperature meters (lower rank - only “calibra” matches, as a prefix of “calibration”)

server1.example.com/v2.0

With operator: "OR":
  • Matches:
    • Connect to server1.example.com using v2.0 protocol (highest rank - all tokens match)
    • server2.example.com documentation (matches example.com)
    • API v2.0 reference (matches version number)
    • server1 is down after v2.0.1 upgrade (matches “server1” and prefix on “v2.0”)
  • Does not match:
    • example.net (“example.net” is preserved as a single token)
    • v2 (v2.0 is preserved as a single token)

Example search query

The search query below filters the results to Equipment with temperatures between 15 and 25 degrees Celsius, and a name or description that contains the word “temperature” and a word that starts with “sensor”.
Example search query
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "AND",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
To perform the same query with the GraphQL endpoint:
Example search query in GraphQL
searchEquipment(
  query: "temperature sensor",
  fields: ["name", "description"],
  filter: {
    range: {
      temperature: {
        lt: 25,
        gt: 15
      }
    }
  },
  sort: { externalId: ASC }
) {
  items {
    externalId
    name
    description
    # Include additional fields here
  }
}

Filtering differences between the query and search endpoints

Filters work mostly the same for both the query and search endpoints, but there are a few differences in how they handle the exists filter on empty arrays and the prefix filter on arrays.

Exists filter with empty array

Endpoint | Behavior | Example
Query | An empty array is counted as existing | exists([]) → true
Search/Aggregate | An empty array is counted as non-existing | exists([]) → false
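As an illustration, an exists filter on a hypothetical array property tags (using the same property reference style as the range filter in the example earlier in this article) matches an instance whose tags value is [] on the query endpoint, but not on the search and aggregate endpoints:
Example exists filter (illustrative)
{
  "exists": {
    "property": "tags"
  }
}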

Prefix filter on arrays

Endpoint | Behavior
Query | Checks the array prefix sequence (ordered matching). Supports text[] and int[] arrays.
Search/Aggregate | Checks each array item separately. Supports only single-value text field prefix filters (no arrays).

Examples of prefix filter behavior

Prefix condition | Query API | Search API | Note
“pump” prefix of [“pump”, “valve”] | Match | Match | Both APIs match single elements.
“pump” prefix of [“pumping”, “valve”] | No match | Match | Only Search matches “pumping” (an element with the prefix exists).
[“pump”, “valve”] prefix of [“pump”, “valve”, “sensor”] | Match | Not supported | Search doesn’t support array prefix values.
“pump” prefix of [“valve”, “pump”] | No match | Match | The Query API checks the start of the sequence; Search checks each element.

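As a sketch, consider a prefix filter on a hypothetical text array property tags, written in the same property reference style as the range filter earlier in this article. For an instance whose tags value is ["pumping", "valve"], the search endpoint matches because one element starts with "pump", while the query endpoint does not, because it compares the value against the array as an ordered prefix sequence:
Example prefix filter (illustrative)
{
  "prefix": {
    "property": "tags",
    "value": "pump"
  }
}
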
Nested filters are only supported for core data model assets

Nested filters aren’t supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets. If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.

Supported nested filters

The following types and properties support nested filtering in the search and aggregation endpoints:
Core data model type | Core data model property
CogniteActivity | assets
CogniteFile | assets, category
CogniteTimeSeries | assets, unit
CogniteEquipment | asset
CogniteMaintenance | asset
CogniteNotification | asset
CogniteOperation | asset
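The exact nested filter syntax isn't shown in this article, so the sketch below is only an assumed shape (a scope identifying the assets direct relation property and an inner filter applied to the referenced asset); both the scope path and the overall structure are assumptions, so check the API reference for the precise syntax. The intent is to restrict CogniteTimeSeries instances to those whose assets relation points at assets with a name starting with "pump":
Example nested filter (illustrative, shape assumed)
{
  "nested": {
    "scope": ["cdf_cdm", "CogniteTimeSeries/v1", "assets"],
    "filter": {
      "prefix": {
        "property": "name",
        "value": "pump"
      }
    }
  }
}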