
Search capabilities in data models

Use the data modeling search endpoint to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and the differences between endpoints to help you build efficient search. For example, you can use the search endpoint to run full-text queries across multiple fields and to rank results by relevance.

The search endpoint supports:

  • Full-text search across text fields.
  • Configurable matching logic with AND/OR operators.
  • Prefix-based word matching.
  • Boosting exact phrases.
  • Filtering on properties.
  • Converting search results to a different unit.

Info

See the Query features article for information about filtering.

Queries are eventually consistent. Because of indexing delays, it may take a few seconds for new or updated data to become searchable.
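Because new data can take a few seconds to become searchable, a client that writes an instance and then immediately searches for it may need to retry. The sketch below polls until a given external ID appears in the results or a timeout expires. The `wait_until_searchable` helper and the `search_fn` callable are hypothetical: `search_fn` stands in for whatever client call you use to send the search request.

```python
import time
from typing import Callable, Iterable


def wait_until_searchable(
    search_fn: Callable[[], Iterable[str]],
    external_id: str,
    timeout: float = 10.0,
    interval: float = 1.0,
) -> bool:
    """Poll search_fn until external_id shows up in its results.

    search_fn is a hypothetical callable that runs a search request and
    returns the external IDs of the matching instances.
    Returns True if the instance became searchable before the timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if external_id in set(search_fn()):
            return True
        if time.monotonic() >= deadline:
            return False
        # Wait before retrying so indexing has time to catch up.
        time.sleep(interval)
```

A fixed polling interval keeps the sketch simple. In practice you might back off between attempts instead.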

Example search query

The search query below returns Equipment nodes with a temperature between 15 and 25 degrees Celsius (both bounds exclusive) whose name or description contains the word "temperature" or a word that starts with "sensor". Results are sorted by externalId, and at most 100 are returned.

Example search query
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "OR",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}

To perform the same query with the GraphQL endpoint:

Example search query in GraphQL
query {
  searchEquipment(
    query: "temperature sensor",
    fields: ["name", "description"],
    filter: {
      range: {
        temperature: {
          lt: 25,
          gt: 15
        }
      }
    },
    sort: { externalId: ASC }
  ) {
    items {
      externalId
      name
      description
      # Include additional fields here
    }
  }
}
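If you build the request body in code, a small helper keeps the view reference, text query, operator, and filter together. This is a minimal sketch: `build_search_body` is a hypothetical name, its field names simply mirror the JSON example, and sending the body to the search endpoint is left out.

```python
import json
from typing import Any, Optional


def build_search_body(
    space: str,
    view_external_id: str,
    version: str,
    query: str,
    properties: list[str],
    operator: str = "OR",
    filter_: Optional[dict[str, Any]] = None,
    limit: int = 100,
) -> dict[str, Any]:
    """Assemble a search request body shaped like the JSON example (hypothetical helper)."""
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    body: dict[str, Any] = {
        "view": {
            "type": "view",
            "space": space,
            "externalId": view_external_id,
            "version": version,
        },
        "query": query,
        "operator": operator,
        "instanceType": "node",
        "properties": properties,
        "limit": limit,
    }
    # Only include a filter when one is given.
    if filter_ is not None:
        body["filter"] = filter_
    return body


body = build_search_body(
    "testSpace", "Equipment", "v1", "temperature sensor", ["name", "description"],
    filter_={"range": {"property": "temperature", "lt": 25, "gt": 15}},
)
print(json.dumps(body, indent=2))
```

Validating the operator locally catches typos before a request is sent.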

Text analysis and tokenization

Data you ingest is indexed automatically and asynchronously. Indexing includes a text analysis step that breaks text into smaller units called tokens, which makes it efficient to search for and match terms. The tokenization process includes:

  • Splitting text into tokens: the text is broken into words based on whitespace and punctuation.
  • Lower-casing tokens: all text is converted to lowercase to make searches case-insensitive.

Tokenization example

| Input | Generated tokens |
| --- | --- |
| Pump_123-ABC | pump_123, abc |
| Temperature Sensor | temperature, sensor |
| Temperature_Sensor | temperature_sensor (single token) |
| John's pump | john's, pump |
| example.com | example.com (period preserved) |
| system32.exe | system32, exe (period splits) |
| 123.45 | 123.45 (period preserved) |

During tokenization, the following rules apply:

  • Underscores (_) and apostrophes (') don't split tokens. They're kept as part of the token.
  • Whitespace and most punctuation, including hyphens (-), commas (,), and colons (:), split tokens.
  • Periods (.) are a special case:
    • A period between two letters doesn't split the token (example.com stays one token).
    • A period between two numbers doesn't split the token (123.45 stays one token).
    • In most other cases, including between a number and a letter, a period splits the token (system32.exe becomes system32 and exe).

For example, searching for sensor finds Temperature Sensor (split into two tokens) but not Temperature_Sensor (remains a single token).
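The rules above can be approximated with a single regular expression, which is handy for predicting how a value will be tokenized. This is a sketch of the documented rules, not the service's actual analyzer, and `tokenize` is a hypothetical helper: word characters (letters, digits, underscore) form tokens, an apostrophe or period joins two letters, a period joins two digits, and everything else splits.

```python
import re

# \w covers letters, digits, and underscore. A period or apostrophe may join
# two letters; a period may also join two digits. Anything else splits.
_TOKEN = re.compile(
    r"\w+(?:(?:(?<=[^\W\d_])['.](?=[^\W\d_])|(?<=\d)\.(?=\d))\w+)*"
)


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into tokens using the rules above."""
    return _TOKEN.findall(text.lower())


for sample in ["Pump_123-ABC", "John's pump", "example.com", "system32.exe", "123.45"]:
    print(sample, "->", tokenize(sample))
```

Edge cases such as apostrophes at the end of a word may behave differently in the real index.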

Query matching logic: OR vs. AND

You control how the query terms are matched against the searched properties by setting the operator field to AND or OR. The default is OR.

OR operator (default behavior)

When the operator is set to "OR" or is omitted, the search uses bool prefix matching. For a query to match a document, at least one of the query terms must match. This is ideal for broader, more inclusive searches.

The matching logic is as follows:

  • Each word in the query except the last one is treated as an exact match (case-insensitive).
  • The last word is treated as a prefix match, allowing for "search-as-you-type" functionality.
  • Instances that match more terms will rank higher in the results.

For example, a search for "pressure valve ma" creates:

  • An exact match for "pressure".
  • An exact match for "valve".
  • A prefix match for "ma" (matching "main", "manual", etc.).

An instance containing only "pressure" matches, but an instance containing all three terms ranks higher.
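The OR matching logic can be sketched as follows. This is a simplified model, assuming a whitespace-and-punctuation tokenizer and using the number of matched terms as a stand-in for the engine's real relevance score; `or_prefix_score` is a hypothetical name. Every term except the last must equal a token exactly, and the last term only has to be a prefix of some token.

```python
import re


def tokens(text: str) -> list[str]:
    # Simplified tokenizer for this sketch: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def or_prefix_score(query: str, text: str) -> int:
    """Count matched query terms under the OR rules; 0 means no match.

    All terms except the last need an exact token match. The last term
    matches if it is a prefix of any token (search-as-you-type).
    """
    terms = tokens(query)
    doc = tokens(text)
    if not terms:
        return 0
    *exact_terms, last = terms
    score = sum(1 for term in exact_terms if term in doc)
    if any(token.startswith(last) for token in doc):
        score += 1
    return score


print(or_prefix_score("pressure valve ma", "Pressure valve, main line"))
print(or_prefix_score("pressure valve ma", "Pressure gauge"))
```

Any instance with a score above zero is returned, and higher scores rank first.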

AND operator (strict behavior)

When the operator is set to AND, the search requires that all terms in the query string are present in the searched properties for an instance to be returned. This uses a cross_fields matching strategy, meaning the terms can be in any of the specified properties. This is useful for more precise searches where all criteria must be met.

  • All search terms in the query string must be present as exact, case-insensitive matches.
  • Prefix matching is not applied when using the AND operator.

For example, a search for "pressure sensor" with operator: "AND" will only match documents that contain both the word "pressure" and the word "sensor". A document containing only "pressure" will not be returned.
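A rough model of the AND behavior pools the tokens from every searched property and then requires each query term to be one of them. Pooling mimics the cross-field strategy, where terms may be spread over different properties. The sketch assumes a simplified tokenizer, and `and_matches` is a hypothetical helper.

```python
import re


def tokens(text: str) -> list[str]:
    # Simplified tokenizer for this sketch: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def and_matches(query: str, fields: dict[str, str]) -> bool:
    """Return True if every query term appears exactly in at least one field.

    Tokens from all fields are pooled, so the terms can be spread over
    different properties. No prefix matching is applied.
    """
    pooled: set[str] = set()
    for value in fields.values():
        pooled.update(tokens(value))
    return all(term in pooled for term in tokens(query))


print(and_matches("pressure sensor", {"name": "Pressure", "description": "Main sensor"}))
```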

Phrase matching (exact sequences)

Exact phrase matches boost relevance significantly.

For example, searching for heat exchanger:

  • Ranks Heat Exchanger higher (exact phrase match).
  • Ranks Exchanger for heat lower (individual term matches only).
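The effect of phrase matching can be illustrated by adding a bonus when the query tokens appear as one contiguous, ordered run. The boost weight of 10 is an arbitrary illustrative value, not the engine's actual scoring, and `contains_phrase` and `score` are hypothetical helpers.

```python
import re


def tokens(text: str) -> list[str]:
    # Simplified tokenizer for this sketch: lowercase, split on non-word characters.
    return re.findall(r"\w+", text.lower())


def contains_phrase(query: str, text: str) -> bool:
    """True if the query tokens appear in text as one contiguous, ordered run."""
    q, doc = tokens(query), tokens(text)
    n = len(q)
    return n > 0 and any(doc[i:i + n] == q for i in range(len(doc) - n + 1))


def score(query: str, text: str, phrase_boost: int = 10) -> int:
    """Matched-term count plus a fixed bonus for an exact phrase (illustrative weights)."""
    doc = set(tokens(text))
    base = sum(1 for term in tokens(query) if term in doc)
    return base + (phrase_boost if contains_phrase(query, text) else 0)


print(score("heat exchanger", "Heat Exchanger"))
print(score("heat exchanger", "Exchanger for heat"))
```

Both texts contain the same two terms, so only the phrase bonus separates them.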

Limitations on matching

Matching has these limitations:

  • No fuzzy or typo matching: queries require correct spelling and matching prefixes.

  • No synonym expansion: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.

Example query matches

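valve

With operator: "AND":

  • Matches: Valve control unit, Safety valve unit
  • Matches: Ball-valve (tokenized as ball and valve)
  • Does not match: Valvoline (different token) or Valve's (tokenized as valve's)

With operator: "OR", valve is the last (and only) word in the query, so it's matched as a prefix: Valve's also matches, while Valvoline still does not.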

pressure sensor

With operator: "OR":

  • Best matches: documents containing both "pressure" and "sensor"
  • Lower relevance: documents with either "pressure" or "sensor" alone
  • Example matches:
    • High pressure sensor calibration (matches both terms)
    • Pressure transmitter (matches only "pressure")
    • Temperature sensor (matches only "sensor")
  • Does not match:
    • Pressured equipment ("pressure" is not the last query word, so it isn't matched as a prefix)

With operator: "AND":

  • Requires that both "pressure" and "sensor" are present as exact matches.
  • Matches: High pressure sensor calibration.
  • Does not match: Pressure transmitter (missing "sensor") or Temperature sensor (missing "pressure").

compressor fail

With operator: "OR":

  • Matches:
    • Compressor failure log ("fail" is a prefix of "failure")
    • Compressor failing to start ("fail" is a prefix of "failing")

With operator: "AND":

  • Requires an exact match for both "compressor" and "fail".
  • Does not match: Compressor failure log, because fail is not an exact match for the token failure.
  • Does not match: Compressor failing to start, because fail is not an exact match for the token failing.

oil temp

With operator: "OR":

  • Matches:
    • Oil temperature readings ("temp" is a prefix of "temperature")
    • Oil temporary storage ("temp" is a prefix of "temporary")

flow meter calibra

With operator: "OR":

  • Matches:
    • Flow meter calibration procedure (highest rank: all three terms match)
    • Flow meter maintenance (medium rank: the two exact terms match)
    • Calibration of temperature meters (lower rank: only the prefix "calibra" matches, because "meters" is not an exact match for "meter")

server1.example.com/v2.0

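The query is tokenized into server1, example.com, and v2.0: the slash splits tokens, and the period after server1 splits because it sits between a number and a letter.

With operator: "OR":

  • Matches:
    • Connect to server1.example.com using v2.0 protocol (highest rank: all terms match)
    • server2.example.com documentation (matches example.com)
    • API v2.0 reference (matches v2.0)
    • server1 is down after v2.0.1 upgrade (matches server1, and v2.0 is a prefix of v2.0.1)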
  • Does not match:
    • example.net ("example.net" is preserved as a single token)
    • v2 (v2.0 is preserved as a single token)
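The worked examples can be checked mechanically by combining the tokenization rules with the OR matching logic. The sketch below is an approximation under the rules described in this article, with the matched-term count standing in for real relevance scores; `tokenize`, `or_score`, and `rank` are hypothetical helpers.

```python
import re

# Tokenizer approximating the rules in "Text analysis and tokenization".
_TOKEN = re.compile(
    r"\w+(?:(?:(?<=[^\W\d_])['.](?=[^\W\d_])|(?<=\d)\.(?=\d))\w+)*"
)


def tokenize(text: str) -> list[str]:
    return _TOKEN.findall(text.lower())


def or_score(query: str, text: str) -> int:
    """Number of matched query terms under the OR rules (last term is a prefix)."""
    terms, doc = tokenize(query), tokenize(text)
    if not terms:
        return 0
    *exact_terms, last = terms
    hits = sum(1 for term in exact_terms if term in doc)
    return hits + int(any(token.startswith(last) for token in doc))


def rank(query: str, texts: list[str]) -> list[str]:
    """Return matching texts, best first (ties keep their input order)."""
    scored = [(or_score(query, t), t) for t in texts]
    return [t for s, t in sorted(scored, key=lambda p: -p[0]) if s > 0]


print(rank("flow meter calibra", [
    "Calibration of temperature meters",
    "Flow meter maintenance",
    "Flow meter calibration procedure",
]))
```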

Filtering differences between the query and search endpoints

Filters work mostly the same way in the query and search endpoints, but the endpoints handle empty arrays and prefix filters on arrays differently.

Exists filter with empty array

| Endpoint | Behavior | Example |
| --- | --- | --- |
| Query | Empty array counts as existing | exists([]) = true |
| Search/Aggregate | Empty array counts as non-existing | exists([]) = false |
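The difference can be modeled as a small predicate. This is a sketch only: `exists_matches` is a hypothetical helper, and treating `None` as a property that was never set is an assumption of the sketch. The empty-array handling follows the table.

```python
from typing import Any

ENDPOINTS = {"query", "search", "aggregate"}


def exists_matches(value: Any, endpoint: str) -> bool:
    """Return whether an exists filter matches a stored property value.

    None stands in for a property that was never set (an assumption of
    this sketch). Only empty arrays are treated differently per endpoint.
    """
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint}")
    if value is None:
        return False
    if isinstance(value, list) and not value:
        # Query counts [] as existing; search and aggregate don't.
        return endpoint == "query"
    return True
```

If your data can contain empty arrays, counts from the search or aggregate endpoints may therefore be lower than the equivalent query results.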

Prefix filter on arrays

| Endpoint | Behavior |
| --- | --- |
| Query | Checks whether the array starts with the prefix sequence (ordered matching). Supports text[] and int[] arrays. |
| Search/Aggregate | Checks each array element separately. Supports only a single text value as the prefix (no array prefixes). |

Examples of prefix filter behavior

| Prefix condition | Query API | Search API | Note |
| --- | --- | --- | --- |
| "pump" prefix of ["pump", "valve"] | Match | Match | Both APIs match single elements. |
| "pump" prefix of ["pumping", "valve"] | No match | Match | Only Search matches "pumping" (an element starts with the prefix). |
| ["pump","valve"] prefix of ["pump","valve","sensor"] | Match | Not supported | Search doesn't support array prefixes. |
| "pump" prefix of ["valve","pump"] | No match | Match | The Query API checks the start of the sequence; Search checks any element. |

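The two prefix semantics can be sketched side by side. Both functions are hypothetical models of the behavior described in the tables. Treating a single string passed to the Query API as a one-element sequence is an assumption of this sketch, as is raising `TypeError` for an array prefix in Search.

```python
from typing import Sequence, Union

PrefixValue = Union[str, Sequence[str]]


def query_api_prefix(prefix: PrefixValue, values: Sequence[str]) -> bool:
    """Query API model: the array must start with the prefix sequence (ordered)."""
    # Assumption: a single string is treated as a one-element prefix sequence.
    seq = [prefix] if isinstance(prefix, str) else list(prefix)
    return list(values[: len(seq)]) == seq


def search_api_prefix(prefix: PrefixValue, values: Sequence[str]) -> bool:
    """Search/Aggregate model: a single text prefix is tested against each element."""
    if not isinstance(prefix, str):
        raise TypeError("Search only supports a single text prefix, not an array")
    return any(v.startswith(prefix) for v in values)


print(query_api_prefix("pump", ["valve", "pump"]), search_api_prefix("pump", ["valve", "pump"]))
```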
Nested filters are only supported for core data model assets

Nested filters aren't supported in the search and aggregation endpoints of Cognite data models, except when filtering direct relations to core data model assets.

If you need to apply nested filters on properties that are not directly related to core data model assets, use the Query API.

Supported nested filters

The following types and properties support nested filtering in the search and aggregation endpoints:

| Core data model type | Core data model property |
| --- | --- |
| CogniteActivity | assets |
| CogniteFile | assets, category |
| CogniteTimeSeries | assets, unit |
| CogniteEquipment | asset |
| CogniteMaintenance | asset |
| CogniteNotification | asset |
| CogniteOperation | asset |
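Before sending a nested filter, a client can check the table to decide whether the search endpoint accepts it or whether the request should go to the Query API. The sketch below copies the table into a lookup. `NESTED_SEARCH_SUPPORT` and `endpoint_for_nested_filter` are hypothetical names, and "MyCustomView" in the usage is a made-up view.

```python
# Direct-relation properties that support nested filters in the search and
# aggregation endpoints, copied from the table above.
NESTED_SEARCH_SUPPORT: dict[str, set[str]] = {
    "CogniteActivity": {"assets"},
    "CogniteFile": {"assets", "category"},
    "CogniteTimeSeries": {"assets", "unit"},
    "CogniteEquipment": {"asset"},
    "CogniteMaintenance": {"asset"},
    "CogniteNotification": {"asset"},
    "CogniteOperation": {"asset"},
}


def endpoint_for_nested_filter(view: str, prop: str) -> str:
    """Return "search" if search/aggregate accept a nested filter through
    this property, otherwise "query" (use the Query API instead)."""
    return "search" if prop in NESTED_SEARCH_SUPPORT.get(view, set()) else "query"


print(endpoint_for_nested_filter("CogniteTimeSeries", "unit"))
print(endpoint_for_nested_filter("MyCustomView", "asset"))
```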