Search capabilities in data models

Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement full-text queries, search across multiple fields, and rank results by relevance.

The search endpoint supports:

  • Full-text search across text fields.
  • Prefix-based word matching.
  • Boosting exact phrases.
  • Filtering on properties.
  • Converting search results to a different unit.

Info

See the Query features article for information about filtering.

Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
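Because of this delay, a client that writes data and searches for it right away may get an empty result. A minimal sketch of a polling helper, assuming a hypothetical zero-argument `search` callable that wraps your request to the search endpoint and returns a list of items. The helper name, timeout, and interval are illustrative and not part of the API:

```python
import time
from typing import Callable, Sequence


def wait_until_searchable(
    search: Callable[[], Sequence[dict]],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
) -> Sequence[dict]:
    """Poll `search` until it returns at least one item or the timeout expires.

    `search` is a hypothetical callable wrapping a call to the search
    endpoint; it is not a function provided by any SDK.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        items = search()
        if items:
            return items
        if time.monotonic() >= deadline:
            raise TimeoutError("data did not become searchable in time")
        time.sleep(interval_s)
```

A bounded wait like this keeps tests and ingestion pipelines from failing on the first empty response while still surfacing data that never becomes searchable.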

Example search query

The search query below returns Equipment instances with a temperature greater than 15 and less than 25 degrees Celsius, and with a name or description that contains the word "temperature" or a word that starts with "sensor".

{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}

To perform the same query with the GraphQL endpoint:

Example search query in GraphQL
query {
  searchEquipment(
    query: "temperature sensor",
    fields: ["name", "description"],
    filter: {
      range: {
        temperature: {
          lt: 25,
          gt: 15
        }
      }
    },
    sort: { externalId: ASC }
  ) {
    items {
      externalId
      name
      description
      # Include additional fields here
    }
  }
}
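
If you build requests in code, the JSON body shown earlier can be assembled from a few parameters. The sketch below is illustrative: `build_search_body` and its parameters are hypothetical names, only the JSON field names are taken from the example, and it covers a subset of the fields (no sorting or target units). Sending the request (URL, authentication) is left out.

```python
import json


def build_search_body(space, view_external_id, version, query, properties,
                      range_filter=None, limit=100):
    """Assemble a search request body shaped like the JSON example (hypothetical helper)."""
    body = {
        "view": {
            "type": "view",
            "space": space,
            "externalId": view_external_id,
            "version": version,
        },
        "query": query,
        "instanceType": "node",
        "properties": list(properties),
        "limit": limit,
    }
    if range_filter is not None:
        # range_filter is (property, lower, upper); both bounds are exclusive,
        # mirroring the "gt" and "lt" keys in the example.
        prop, lower, upper = range_filter
        body["filter"] = {"range": {"property": prop, "gt": lower, "lt": upper}}
    return body


body = build_search_body(
    "testSpace", "Equipment", "v1",
    query="temperature sensor",
    properties=["name", "description"],
    range_filter=("temperature", 15, 25),
)
print(json.dumps(body, indent=2))
```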

Text analysis and tokenization

Ingested data is indexed automatically and asynchronously. Indexing includes a text analysis step that breaks the text into smaller components called tokens, which allows efficient searching and matching of terms. The tokenization process includes:

  • Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
  • Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.

Tokenization example

| Input | Generated tokens |
| --- | --- |
| Pump_123-ABC | pump_123, abc |
| Temperature Sensor | temperature, sensor |
| Temperature_Sensor | temperature_sensor (single token) |
| John's pump | john's, pump |
| example.com | example.com (period preserved) |
| system32.exe | system32, exe (period splits) |
| 123.45 | 123.45 (period preserved) |

During tokenization, the following rules apply:

  • Underscores (_) and apostrophes (') don't split tokens. They're preserved as part of the token.
  • Whitespace and most punctuation, including hyphens (-), commas (,), and colons (:), split tokens.
  • Periods (.) are a special case:
    • A period between two letters doesn't split the token (example.com remains one token).
    • A period between two numbers doesn't split the token (123.45 remains one token).
    • A period between a letter and a number splits the token (system32.exe becomes system32 and exe).
    • In most other cases, a period splits the token.

For example, searching for sensor finds Temperature Sensor (split into two tokens) but not Temperature_Sensor (remains a single token).
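
The rules above can be approximated in a few lines of code, which is handy for predicting how a value will be tokenized before you ingest it. This is a minimal sketch that follows only the rules and examples listed in this article. It is not the service's actual analyzer, and `tokenize` is a hypothetical helper name.

```python
def tokenize(text: str) -> list[str]:
    """Approximate the tokenization rules described in this article."""
    text = text.lower()  # searches are case-insensitive
    tokens: list[str] = []
    current: list[str] = []

    def flush() -> None:
        # Close the token being built, if any.
        if current:
            tokens.append("".join(current))
            current.clear()

    for i, ch in enumerate(text):
        if ch.isalpha() or ch.isdigit() or ch in "_'":
            current.append(ch)  # letters, digits, _ and ' stay in the token
        elif ch == ".":
            prev = text[i - 1] if i > 0 else ""
            nxt = text[i + 1] if i + 1 < len(text) else ""
            between_letters = prev.isalpha() and nxt.isalpha()
            between_digits = prev.isdigit() and nxt.isdigit()
            if between_letters or between_digits:
                current.append(ch)  # example.com, 123.45
            else:
                flush()  # system32.exe, trailing periods
        else:
            flush()  # whitespace and other punctuation split tokens
    flush()
    return tokens


for sample in ["Pump_123-ABC", "Temperature_Sensor", "John's pump", "system32.exe"]:
    print(sample, "->", tokenize(sample))
```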

Bool prefix matching

The search endpoint uses this matching approach when you enter a multi-word search query:

  • Convert each word except the last one to an exact match.
  • Convert the last word to a prefix match, allowing for partial word matching.

This approach enables precise matching for complete words while supporting partial matching on the final word.

For example, if you search for "pressure valve ma", the system creates:

  • An exact match for "pressure".
  • An exact match for "valve".
  • A prefix match for "ma", which could match "main", "maintenance", or "manual".

For a query to match a document, at least one of the conditions must be met. Documents matching several conditions will rank higher in the results.
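
The conversion described above can be sketched as follows. The function names are hypothetical, and the sketch assumes the query is split on whitespace only. It covers multi-word queries, which is the case this section describes.

```python
def build_clauses(query: str) -> list[tuple[str, str]]:
    """Return (kind, term) clauses: exact for every word except the last, prefix for the last."""
    terms = query.lower().split()
    if len(terms) < 2:
        # Single-word queries are outside the rule described above.
        raise ValueError("this sketch only covers multi-word queries")
    clauses = [("exact", term) for term in terms[:-1]]
    clauses.append(("prefix", terms[-1]))
    return clauses


def clause_matches(clause: tuple[str, str], tokens: list[str]) -> bool:
    """Check one clause against the tokens of a document."""
    kind, term = clause
    if kind == "exact":
        return term in tokens
    return any(token.startswith(term) for token in tokens)


clauses = build_clauses("pressure valve ma")
print(clauses)

# A document matches when at least one clause matches.
doc_tokens = ["main", "pump", "manual"]
print(any(clause_matches(c, doc_tokens) for c in clauses))
```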

Matching details

  • Exact term matching: complete words are matched exactly (case-insensitive).
  • Prefix matching: the last term of the query matches the beginning of words in the document.
  • OR logic by default: any term match contributes to the document's relevance score.

For example, searching for pump fail matches items such as:

  • pump failure (exact match on "pump", prefix match on "fail")
  • Pump Station (exact match on "pump" only)
  • Failure detection (prefix match on "fail" only)

Items matching both terms rank higher in the results.
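
To illustrate the OR logic, the sketch below ranks a few items for the query pump fail. It uses the number of matching terms as a stand-in for the relevance score, which this article doesn't specify, and it tokenizes on whitespace only. Both are simplifying assumptions, and the function names are hypothetical.

```python
def matched_terms(query: str, text: str) -> int:
    """Count matching query terms: exact for all but the last, prefix for the last."""
    *exact_terms, last = query.lower().split()
    doc_tokens = text.lower().split()
    hits = sum(1 for term in exact_terms if term in doc_tokens)
    hits += int(any(token.startswith(last) for token in doc_tokens))
    return hits


docs = ["Pump Station", "pump failure", "Failure detection", "Valve manual"]
query = "pump fail"

# Keep items with at least one matching term, best matches first.
ranked = sorted(
    (d for d in docs if matched_terms(query, d) > 0),
    key=lambda d: matched_terms(query, d),
    reverse=True,
)
print(ranked)
```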

Phrase matching (exact sequences)

Exact phrase matches boost relevance significantly.

For example, searching for heat exchanger:

  • Ranks Heat Exchanger higher (exact phrase match).
  • Ranks Exchanger for heat lower (individual term matches only).
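
The effect of the phrase boost can be sketched with a toy scoring function. The boost factor of 2.0 is an arbitrary assumption, the real scoring formula isn't described in this article, and prefix matching on the last term is left out for brevity.

```python
def contains_phrase(query_tokens: list[str], doc_tokens: list[str]) -> bool:
    """True if the query tokens appear consecutively and in order in the document."""
    n = len(query_tokens)
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


def score(query: str, text: str, phrase_boost: float = 2.0) -> float:
    """Toy score: one point per matching term, multiplied when the exact phrase occurs."""
    query_tokens = query.lower().split()
    doc_tokens = text.lower().split()
    base = float(sum(1 for term in query_tokens if term in doc_tokens))
    if base and contains_phrase(query_tokens, doc_tokens):
        return base * phrase_boost
    return base


for text in ["Heat Exchanger", "Exchanger for heat"]:
    print(text, score("heat exchanger", text))
```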

Limitations on matching

Matching has these limitations:

  • No fuzzy or typo matching: queries require correct spelling and matching prefixes.

  • No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.

Example query matches

valve

  • Matches: Valve control unit, Safety valve unit
  • Matches: Ball-valve (tokenized as ball and valve)
  • Does not match: Valvoline (different token) or Valve's (tokenized as valve's)

pressure sensor

  • Best matches: documents containing both "pressure" and "sensor"
  • Lower relevance: documents with either "pressure" or "sensor" alone
  • Example matches:
    • High pressure sensor calibration (matches both terms)
    • Pressure transmitter (matches only "pressure")
    • Temperature sensor (matches only "sensor")
  • Does not match:
    • Pressured equipment ("pressure" isn't the last term of the query, so it's matched exactly and not as a prefix)

compressor fail

  • Matches:
    • Compressor failure log ("fail" is a prefix of "failure")
    • Compressor failing to start ("fail" is a prefix of "failing")

oil temp

  • Matches:
    • Oil temperature readings ("temp" is a prefix of "temperature")
    • Oil temporary storage ("temp" is a prefix of "temporary")

flow meter calibra

  • Matches:
    • Flow meter calibration procedure (highest rank - all terms match)
    • Flow meter maintenance (medium rank - two exact terms match)
    • Calibration of temperature meters (lower rank - only "calibra" matches, as a prefix of "calibration"; "meters" is a different token from "meter")

server1.example.com/v2.0

  • Matches:
    • Connect to server1.example.com using v2.0 protocol (highest rank - all terms match)
    • server2.example.com documentation (matches example.com)
    • API v2.0 reference (matches version number)
    • server1 is down after v2.0.1 upgrade (matches "server1" and prefix on "v2.0")
  • Does not match:
    • example.net ("example.net" is preserved as a single token)
    • v2 (v2.0 is preserved as a single token)

Filtering differences between the query and search endpoints

Filters work mostly the same for both the query and search endpoints, but the endpoints differ in how they handle exists filters on empty arrays, prefix filters on arrays, and nested filters.

Exists filter with empty array

| Endpoint | Behavior | Example |
| --- | --- | --- |
| Query | Empty array counted as existing | exists([]) → true |
| Search/Aggregate | Empty array counted as non-existing | exists([]) → false |
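
The difference can be modeled with two small predicates. This is only an illustration of the semantics in the table: it assumes a missing property is represented as `None`, and the function names are hypothetical.

```python
def exists_query(value) -> bool:
    """Query endpoint: any stored value counts as existing, including an empty array."""
    return value is not None


def exists_search(value) -> bool:
    """Search/aggregate endpoints: an empty array counts as non-existing."""
    if value is None:
        return False
    if isinstance(value, list) and not value:
        return False
    return True


for value in ([], ["pump"], None):
    print(value, exists_query(value), exists_search(value))
```

If your application relies on exists filters, storing no value instead of an empty array makes both endpoints agree.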

Prefix filter on arrays

| Endpoint | Behavior |
| --- | --- |
| Query | Checks array prefix sequence (ordered matching). Supports text[] and int[] arrays. |
| Search/Aggregate | Checks each array item separately. Supports only single-value text field prefix filters (no arrays). |

Examples of prefix filter behavior

| Prefix condition | Query API | Search API | Note |
| --- | --- | --- | --- |
| "pump" prefix of ["pump", "valve"] | Match | Match | Both APIs match single elements. |
| "pump" prefix of ["pumping", "valve"] | No match | Match | Only Search matches "pumping" (element prefix exists). |
| ["pump","valve"] prefix of ["pump","valve","sensor"] | Match | Not supported | Search doesn't support array prefix. |
| "pump" prefix of ["valve","pump"] | No match | Match | Query API checks the start of the sequence. Search checks any element. |
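
The two behaviors can be modeled as follows. This sketch mirrors the notes in the table above. The per-endpoint outcomes are inferred from those notes, and the function names are hypothetical.

```python
def prefix_query(prefix, values: list) -> bool:
    """Query endpoint: ordered match against the start of the array."""
    wanted = prefix if isinstance(prefix, list) else [prefix]
    return values[:len(wanted)] == wanted


def prefix_search(prefix, values: list) -> bool:
    """Search/aggregate endpoints: any element may start with a single text prefix."""
    if isinstance(prefix, list):
        raise ValueError("array prefixes aren't supported by the search endpoint")
    return any(value.startswith(prefix) for value in values)


cases = [
    ("pump", ["pump", "valve"]),
    ("pump", ["pumping", "valve"]),
    ("pump", ["valve", "pump"]),
]
for prefix, values in cases:
    print(prefix, values, prefix_query(prefix, values), prefix_search(prefix, values))
```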

Nested filters are only supported for core data model assets

Nested filters aren't supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets.

If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.

Supported nested filters

The following types and properties support nested filtering in the search and aggregation endpoints:

| Core data model type | Core data model property |
| --- | --- |
| CogniteActivity | assets |
| CogniteFile | assets, category |
| CogniteTimeSeries | assets, unit |
| CogniteEquipment | asset |
| CogniteMaintenance | asset |
| CogniteNotification | asset |
| CogniteOperation | asset |