# Search features

> Learn how to search for text and property values in your knowledge graph using full-text queries, filters, and relevance ranking.

Use the data modeling [search endpoint](/api-reference/concepts/20230101/instances) to search for text and property values in the knowledge graph. This article explains tokenization, matching rules, and endpoint differences to help you build efficient search capabilities. For example, you can use the search endpoint to implement **full-text queries** and **multiple field search**, and to **rank results by relevance**.

The search endpoint supports:

* **Full-text search** across text fields.
* **Configurable matching logic** with `AND`/`OR` operators.
* **Prefix-based word matching**.
* **Boosting exact phrases**.
* **Filtering on properties**.
* **Converting search results** to a different unit.

<Info>
  See the [Query features](/cdf/dm/dm_concepts/dm_querying#filters) article for information about **filtering**.
</Info>

Search results are eventually consistent. Because instances are indexed asynchronously, it may take a few seconds for new or updated data to become searchable.

## Search process overview

The search query is executed through three distinct steps:

1. **Text analysis and tokenization**: the search query text is analyzed and split into tokens.
2. **Instance matching**: the tokens are matched against pre-indexed instance data in your Cognite Data Fusion (CDF) project.
3. **Instance ranking**: matched instances are sorted by their relevance to the query.

## Text analysis and tokenization

The instances you ingest are indexed automatically and asynchronously. Indexing includes a **text analysis** step that breaks text into smaller units called **tokens**. The search compares the tokens extracted from your query with the tokens indexed from your instances to determine matches and ranking.

The tokenization process includes:

* **Splitting text into tokens**: the text is broken into words based on whitespace and punctuation.
* **Lower-casing tokens**: all text is converted to lowercase to make searches case-insensitive.

### Tokenization rules

<Info>
  The *smarter search results* feature (*public preview*) introduces additional tokenization for selected fields if enabled for your CDF project.

  For more information, see [About smarter search results](/cdf/dm/dm_concepts/dm_smarter_search_results).
</Info>

During tokenization, the following rules apply:

* **Letters, numbers, and underscores (`_`)** don't split tokens. For example, `user_name`, `File1`, and `AH12` each remain a single token.
* **Periods (`.`) and apostrophes (`'`)**:
  * Don't split tokens between letters (for example, `e.g.` and `don't` each remain one token).
  * Don't split tokens between numbers (`3.14` remains one token).
  * Split tokens in all other cases (for example, at the end of `end.` or between a letter and a number).
* **Commas (`,`)**:
  * Don't split tokens between two numbers (`1,000` remains one token).
  * Split tokens between letters.
* **Colons (`:`)**:
  * Don't split tokens between two letters (`Scale:Linear` remains one token).
  * Split tokens between numbers.
* **Other non-standard characters**, such as double quotation marks (`"`), hyphens (`-`), and whitespace, split tokens.

<Info>
  Refer to the formal specification outlined in [Unicode Standard Annex #29](https://unicode.org/reports/tr29/#Word_Boundaries) for implementation details.
</Info>
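
The rules above can be approximated in a few lines of Python. The sketch below is a simplified model for experimenting with how a string might be split; it isn't the CDF implementation, which follows the full Unicode word-boundary algorithm. It lowercases tokens and discards every other character, so unlike the combined example in the table below it doesn't emit symbols such as `$` as tokens. The function names `tokenize` and `_joins` are hypothetical.

```python
# Simplified model of the documented tokenization rules; not the CDF implementation.


def _joins(ch: str, prev: str, nxt: str) -> bool:
    """Return True if punctuation `ch` between `prev` and `nxt` keeps the token together."""
    both_letters = prev.isalpha() and nxt.isalpha()
    both_digits = prev.isdigit() and nxt.isdigit()
    if ch in ".'":
        # Periods and apostrophes join letter-letter and number-number sequences.
        return both_letters or both_digits
    if ch == ",":
        # Commas join only number-number sequences, as in 1,000.
        return both_digits
    if ch == ":":
        # Colons join only letter-letter sequences, as in Scale:Linear.
        return both_letters
    # Hyphens, quotation marks, whitespace, and everything else split.
    return False


def tokenize(text: str) -> list[str]:
    """Split text into lowercase tokens using a simplified version of the rules."""
    tokens: list[str] = []
    current: list[str] = []
    for i, ch in enumerate(text):
        if ch.isalnum() or ch == "_":
            # Letters, numbers, and underscores never split a token.
            current.append(ch)
            continue
        prev = text[i - 1] if i > 0 else ""
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if current and _joins(ch, prev, nxt):
            current.append(ch)
            continue
        # Any other character ends the current token and is discarded.
        if current:
            tokens.append("".join(current).lower())
            current = []
    if current:
        tokens.append("".join(current).lower())
    return tokens


print(tokenize("Pump_123-ABC"))  # ['pump_123', 'abc']
print(tokenize("server1.example.com/v2.0"))  # ['server1', 'example.com', 'v2.0']
```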

### Tokenization examples

| Input                                     | Generated tokens                                             | Explanation                                                  |
| ----------------------------------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
| `Pump_123-ABC`                            | `pump_123`, `abc`                                            | Hyphen (non-standard character) splits                       |
| `Temperature Sensor`                      | `temperature`, `sensor`                                      | Whitespace (non-standard character) splits                   |
| `Temperature_Sensor_1`                    | `temperature_sensor_1`                                       | Underscores (standard characters) don't split                |
| `John's pump`                             | `john's`, `pump`                                             | Apostrophe doesn't split the sequence of letters             |
| `example.com`                             | `example.com`                                                | Period doesn't split the letter-to-letter sequence           |
| `system32.exe`                            | `system32`, `exe`                                            | Period splits the number-to-letter sequence                  |
| `first.last 5.10`                         | `first.last`, `5.10`                                         | Period doesn't split the letter or number sequences          |
| `first,last 5,10`                         | `first`, `last`, `5,10`                                      | Comma splits the letter sequence but not the number sequence |
| `first:last 5:10`                         | `first:last`, `5`, `10`                                      | Colon splits the number sequence but not the letter sequence |
| `John's 1st account has 1,000.5$ dollars` | `john's`, `1st`, `account`, `has`, `1,000.5`, `$`, `dollars` | Combined rules                                               |

## Instance matching

After tokenization, the search compares your query tokens against indexed tokens from instances in your CDF project. The matching behavior depends on the token's position in your query:

* **Standard tokens (exact match):** all tokens in the query **except the last one** require an exact match (case-insensitive) with a token in the instance data.
* **Final token (prefix match):** the **last token** is treated as a prefix, allowing for *search-as-you-type* functionality. It matches any word that starts with those characters.

### Example scenario

If you search for `pressure valve ma`, the system matches instances based on the following criteria:

* **`pressure`** and **`valve`**: require an exact match.
* **`ma`**: requires a prefix match (matching `main`, `manifold`, `manual`, etc.).

### Search operators

The token matching rules above determine whether individual tokens match a given instance. The **search operator** determines which instances qualify as a match for the entire query.

| Operator         | Behavior                                           | Example (query: "pressure sensor")                                    |
| :--------------- | :------------------------------------------------- | :-------------------------------------------------------------------- |
| **OR** (default) | Returns instances matching **at least one** token. | Matches instances containing "pressure" only, "sensor" only, or both. |
| **AND**          | Returns only instances matching **all** tokens.    | Matches only instances containing both "pressure" AND "sensor".       |

<Info>
  Effective November 2026, the default search operator will change from `OR` to `AND`. Cognite Flows and Cognite InField already use `AND` as the default operator. To maintain your current search behavior, we recommend explicitly setting the operator in your search queries.
</Info>
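
The sketch below shows how the operator combines per-token results. It applies the token matching rules described above in simplified form, with whitespace tokenization and the last token treated as a prefix. `matches_query` is a hypothetical helper. Following the recommendation above, it takes the operator as a required argument instead of relying on a default.

```python
from typing import Literal

# Hypothetical helper modeling the documented operator semantics; not the CDF implementation.


def matches_query(query: str, text: str, operator: Literal["AND", "OR"]) -> bool:
    """Return True if `text` qualifies as a match for `query` under `operator`."""
    q = query.lower().split()
    d = text.lower().split()
    if not q:
        return False
    hits = [
        # The last query token is a prefix; all others need an exact match.
        any(t.startswith(tok) for t in d) if i == len(q) - 1 else tok in d
        for i, tok in enumerate(q)
    ]
    # OR: at least one token must match. AND: every token must match.
    return all(hits) if operator == "AND" else any(hits)


for text in ["High pressure sensor calibration", "Pressure transmitter", "Temperature sensor"]:
    print(text, matches_query("pressure sensor", text, "OR"), matches_query("pressure sensor", text, "AND"))
```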

### Limitations on matching

Consider these limitations when designing your search:

* **No fuzzy or typo matching**: queries require correct spelling and matching prefixes.
* **No synonym expansion**: queries are matched literally. Synonyms or abbreviations must appear explicitly in the data.

## Instance ranking

When multiple instances match a query, they're ordered by relevance. The following factors determine the ranking:

### Number of matching tokens

Instances that match more tokens are ranked higher. This matters mainly with the `OR` operator, since the `AND` operator requires all tokens to match.

For example, searching for `Heat Exchanger 243 alpha` using the `OR` operator:

* Ranks `Heat Exchanger 243` higher (matches three of four tokens).
* Ranks `Heat Exchanger` lower (matches two of four tokens).
* Ranks `Heat` lowest (matches one of four tokens).

### Phrase matching (exact sequences)

Exact phrase matches boost relevance significantly.

For example, searching for `heat exchanger`:

* Ranks `Heat Exchanger` higher (exact phrase match).
* Ranks `Heat For Exchanger` lower (individual token matches only).
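
One way to model a phrase boost is to add a bonus when the query tokens appear consecutively in the instance. In the sketch below, `PHRASE_BOOST` is a hypothetical weight chosen only for illustration, and all tokens are compared exactly. It shows why `Heat Exchanger` outranks `Heat For Exchanger`, not how CDF actually computes relevance.

```python
# Illustrative phrase-boost scoring; the weight and formula are hypothetical.
PHRASE_BOOST = 10.0


def contains_phrase(query_tokens: list[str], tokens: list[str]) -> bool:
    """True if query_tokens occur as a consecutive run inside tokens."""
    n = len(query_tokens)
    return n > 0 and any(tokens[i:i + n] == query_tokens for i in range(len(tokens) - n + 1))


def score(query: str, text: str) -> float:
    q = query.lower().split()
    d = text.lower().split()
    matched = sum(1 for tok in q if tok in d)  # Simplified: exact matches only.
    return matched + (PHRASE_BOOST if contains_phrase(q, d) else 0.0)


candidates = ["Heat For Exchanger", "Heat Exchanger"]
print(sorted(candidates, key=lambda c: score("heat exchanger", c), reverse=True))
# ['Heat Exchanger', 'Heat For Exchanger']
```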

## Example query matches

### `valve`

With `operator: "OR"`:

* Matches: `Valve control unit`, `Safety valve unit`
* Matches: `Ball-valve` (tokenized as `ball` and `valve`)
* Does not match: `Valvoline` (different token) or `Valve's` (tokenized as `valve's`)

### `pressure sensor`

With `operator: "OR"`:

* Best matches: instances containing both "pressure" and "sensor"
* Lower relevance: instances containing only "pressure" or only "sensor"
* Example matches:
  * `High pressure sensor calibration` (matches both tokens)
  * `Pressure transmitter` (matches only "pressure")
  * `Temperature sensor` (matches only "sensor")
* Does not match:
  * `Pressured equipment` ("pressure" isn't the last query token, so it needs an exact match rather than a prefix match)

With `operator: "AND"`:

* Requires that **both** "pressure" and "sensor" are present as exact matches.
* **Matches**: `High pressure sensor calibration`.
* **Does not match**: `Pressure transmitter` (missing "sensor") or `Temperature sensor` (missing "pressure").

### `compressor fail`

With `operator: "OR"`:

* Matches:
  * `Compressor failure log` ("fail" is a prefix of "failure")
  * `Compressor failing to start` ("fail" is a prefix of "failing")

With `operator: "AND"`:

* Requires an **exact match** for both "compressor" and "fail".
* **Does not match**: `Compressor failure log`, because `fail` is not an exact match for the token `failure`.
* **Does not match**: `Compressor failing to start`, because `fail` is not an exact match for the token `failing`.

### `oil temp`

With `operator: "OR"`:

* Matches:
  * `Oil temperature readings` ("temp" is a prefix of "temperature")
  * `Oil temporary storage` ("temp" is a prefix of "temporary")

### `flow meter calibra`

With `operator: "OR"`:

* Matches:
  * `Flow meter calibration procedure` (highest rank: all tokens match)
  * `Flow meter maintenance` (medium rank: two exact tokens match)
  * `Calibration of temperature meters` (lowest rank: only "calibra" matches, as a prefix of "calibration"; "meters" isn't an exact match for "meter")

### `server1.example.com/v2.0`

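The query is tokenized as `server1`, `example.com`, and `v2.0`, and the last token is matched as a prefix.

With `operator: "OR"`: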

* Matches:
  * `Connect to server1.example.com using v2.0 protocol` (highest rank: all tokens match)
  * `server2.example.com documentation` (matches "example.com")
  * `API v2.0 reference` (matches "v2.0")
  * `server1 is down after v2.0.1 upgrade` (matches "server1", and "v2.0" as a prefix of "v2.0.1")
* Does not match:
  * `example.net` ("example.net" is preserved as a single token)
  * `v2` (the query token "v2.0" isn't a prefix of "v2")

## Example search query

The search query below returns `Equipment` nodes whose `temperature` is strictly between 15 and 25 degrees Celsius and whose `name` or `description` contains both the word "temperature" and a word that starts with "sensor", because the query uses the `AND` operator.

```json title="Example search query" theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
{
  "view": {
    "type": "view",
    "space": "testSpace",
    "externalId": "Equipment",
    "version": "v1"
  },
  "query": "temperature sensor",
  "operator": "AND",
  "instanceType": "node",
  "properties": ["name", "description"],
  "targetUnits": [],
  "filter": {
    "range": {
      "property": "temperature",
      "lt": 25,
      "gt": 15
    }
  },
  "includeTyping": false,
  "sort": [
    {
      "property": ["externalId"],
      "direction": "ascending"
    }
  ],
  "limit": 100
}
```
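
If you call the REST API directly, the JSON body above is sent in a POST request. The sketch below uses Python's standard library to build such a request without sending it. It assumes the data modeling search path `/api/v1/projects/{project}/models/instances/search` and bearer-token authentication. The cluster, project, and token values are placeholders. Check the search endpoint reference before relying on these details.

```python
import json
import urllib.request

# Placeholder values (hypothetical): replace with your own cluster, project, and access token.
CLUSTER = "my-cluster"
PROJECT = "my-project"
TOKEN = "<access-token>"

# Same body as the JSON example above.
body = {
    "view": {"type": "view", "space": "testSpace", "externalId": "Equipment", "version": "v1"},
    "query": "temperature sensor",
    "operator": "AND",
    "instanceType": "node",
    "properties": ["name", "description"],
    "targetUnits": [],
    "filter": {"range": {"property": "temperature", "lt": 25, "gt": 15}},
    "includeTyping": False,
    "sort": [{"property": ["externalId"], "direction": "ascending"}],
    "limit": 100,
}

# Assumed endpoint path for data modeling instance search.
url = f"https://{CLUSTER}.cognitedata.com/api/v1/projects/{PROJECT}/models/instances/search"
request = urllib.request.Request(
    url,
    data=json.dumps(body).encode("utf-8"),
    headers={"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"},
    method="POST",
)

# To send it, pass the request to urllib.request.urlopen(); the matching
# instances are expected in the "items" array of the JSON response.
```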

To run a similar search with the **GraphQL** endpoint:

```graphql title="Example search query in GraphQL" theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
query {
  searchEquipment(
    query: "temperature sensor",
    fields: ["name", "description"],
    filter: {
      range: {
        temperature: {
          lt: 25,
          gt: 15
        }
      }
    },
    sort: { externalId: ASC }
  ) {
    items {
      externalId
      name
      description
      # Include additional fields here
    }
  }
}
```

## Filtering differences between the `query` and `search` endpoints

Filters work mostly the same way in the `query` and `search` endpoints, but they differ in how they handle empty arrays, prefix filters on arrays, and nested filters.

### Exists filter with empty array

| Endpoint         | Behavior                            | Example                  |
| ---------------- | ----------------------------------- | ------------------------ |
| Query            | Empty array counted as existing     | `exists([])` → **true**  |
| Search/Aggregate | Empty array counted as non-existing | `exists([])` → **false** |
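
The two hypothetical functions below model the documented `exists` behavior side by side for a single property value, with `None` standing in for a missing value. They describe the endpoints' semantics, not how the service evaluates filters.

```python
from typing import Any

# Hypothetical models of the documented exists-filter semantics.


def exists_query_endpoint(value: Any) -> bool:
    """Query endpoint: any present value exists, including an empty array."""
    return value is not None


def exists_search_endpoint(value: Any) -> bool:
    """Search/aggregate endpoints: an empty array counts as non-existing."""
    if value is None:
        return False
    if isinstance(value, list):
        return len(value) > 0
    return True


for value in [None, [], ["pump"], "pump"]:
    print(repr(value), exists_query_endpoint(value), exists_search_endpoint(value))
```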

### Prefix filter on arrays

| Endpoint         | Behavior                                                                                             |
| ---------------- | ---------------------------------------------------------------------------------------------------- |
| Query            | Checks array prefix sequence (ordered matching). Supports `text[]` and `int[]` arrays.               |
| Search/Aggregate | Checks each array item separately. The prefix value must be a single text value; array prefix values aren't supported. |

#### Examples of prefix filter behavior

| Prefix condition                                         | Query API | Search API | Note                                                                                  |
| -------------------------------------------------------- | --------- | ---------- | ------------------------------------------------------------------------------------- |
| `"pump"` prefix of `["pump", "valve"]`                   | ✅         | ✅          | Both APIs match: `"pump"` is the first element.                                      |
| `"pump"` prefix of `["pumping", "valve"]`                | ❌         | ✅          | Only the Search API matches, because `"pump"` is a prefix of the element `"pumping"`. |
| `["pump","valve"]` prefix of `["pump","valve","sensor"]` | ✅         | ❌          | The Search API doesn't support array prefix values.                                  |
| `"pump"` prefix of `["valve","pump"]`                    | ❌         | ✅          | The Query API checks the start of the array; the Search API checks every element.    |
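
The two behaviors in the tables above can be modeled as follows for a text array property. Both functions are hypothetical illustrations of the documented semantics. The Search variant raises an error for an array prefix value to mirror the unsupported case.

```python
from typing import Union

# Hypothetical models of the documented prefix-filter semantics on text arrays.


def prefix_query_endpoint(prefix: Union[str, list[str]], values: list[str]) -> bool:
    """Query endpoint: the array must start with the prefix sequence (ordered, exact elements)."""
    sequence = prefix if isinstance(prefix, list) else [prefix]
    return values[: len(sequence)] == sequence


def prefix_search_endpoint(prefix: Union[str, list[str]], values: list[str]) -> bool:
    """Search/aggregate endpoints: any single element may start with the text prefix."""
    if isinstance(prefix, list):
        raise ValueError("array prefix values aren't supported by the search endpoint")
    return any(v.startswith(prefix) for v in values)


# The four rows from the examples table:
print(prefix_query_endpoint("pump", ["pump", "valve"]), prefix_search_endpoint("pump", ["pump", "valve"]))
print(prefix_query_endpoint("pump", ["pumping", "valve"]), prefix_search_endpoint("pump", ["pumping", "valve"]))
print(prefix_query_endpoint(["pump", "valve"], ["pump", "valve", "sensor"]))
print(prefix_query_endpoint("pump", ["valve", "pump"]), prefix_search_endpoint("pump", ["valve", "pump"]))
```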

### Nested filters are only supported for core data model assets

The search and aggregation endpoints don't support nested filters, except on the specific direct relations listed in the table below, most of which point to core data model assets.

To apply nested filters on any other properties, use the Query API.

#### Supported nested filters

The following types and properties support nested filtering in the search and aggregation endpoints:

| Core data model type                                                                              | Core data model property | Supported nested properties                       |
| ------------------------------------------------------------------------------------------------- | ------------------------ | ------------------------------------------------- |
| [CogniteActivity](/cdf/dm/dm_reference/dm_core_data_model#activity)                               | assets                   | `path`                                            |
| [CogniteFile](/cdf/dm/dm_reference/dm_core_data_model#file)                                       | assets                   | `path`                                            |
| [CogniteFile](/cdf/dm/dm_reference/dm_core_data_model#file)                                       | category                 | `code`, `standard`, `standardReference`           |
| [CogniteTimeSeries](/cdf/dm/dm_reference/dm_core_data_model#timeseries)                           | assets                   | `path`                                            |
| [CogniteTimeSeries](/cdf/dm/dm_reference/dm_core_data_model#timeseries)                           | unit                     | `symbol`, `source`, `sourceReference`, `quantity` |
| [CogniteEquipment](/cdf/dm/dm_reference/dm_core_data_model#equipment)                             | asset                    | `path`                                            |
| [CogniteMaintenanceOrder](/cdf/dm/dm_reference/dm_process_industry_data_model#CogniteMaintenanceOrder) | asset | `path` |
| [CogniteNotification](/cdf/dm/dm_reference/dm_process_industry_data_model#CogniteNotification)    | asset                    | `path`                                            |
| [CogniteOperation](/cdf/dm/dm_reference/dm_process_industry_data_model#CogniteOperation)          | asset                    | `path`                                            |
