Optimizing data models for AI search

This article outlines best practices for documenting data models to optimize the accuracy of AI search in Cognite Data Fusion (CDF).

With AI search, a large language model (LLM) analyzes user queries and converts them to GraphQL queries to retrieve data from CDF. The LLM has context about the underlying data model in CDF, including any documentation in the data model definition.

The GraphQL queries are always syntactically correct. The semantic correctness depends on how well the underlying data model is documented, and on the naming conventions used for types and properties.
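As a rough illustration, a question such as "Which work orders are open?" might be translated into a query along the lines of the sketch below. The type name WorkOrder, the status property, and the listWorkOrder, filter, and items names are assumptions made for this example; the actual query depends on your data model and on the query API that CDF exposes for it.

```graphql
# Hypothetical query for the question "Which work orders are open?"
# All names here are illustrative assumptions, not taken from a real data model.
query {
  listWorkOrder(filter: { status: { eq: "Open" } }) {
    items {
      id
      status
    }
  }
}
```

The query is only semantically correct if the LLM can tell that "work orders" maps to the right type and that "open" is a valid value of the right property, which is what the documentation practices in this article aim to support.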

Follow the best practices below to document the data model and increase the semantic accuracy of natural language AI search. This helps users find the data they're looking for more effectively.

Tip: To make sure that the data model is user-friendly and accessible, consider the types of questions users might ask about the data. Would a person, given the same context as the LLM, be able to answer them?

Use human-readable names

Ideally, use human-readable names for types and properties, and avoid abbreviations. Clear and concise names make the data model easier to understand, both for the LLM and for users.
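The contrast can be sketched with two versions of the same type. Both types and all of their property names are hypothetical and chosen only to illustrate the naming advice.

```graphql
# Harder to interpret: abbreviated names (hypothetical)
type WO {
  desc: String
  eqpTag: String
  prio: Int
}

# Easier to interpret: human-readable names (hypothetical)
type WorkOrder {
  description: String
  equipmentTag: String
  priority: Int
}
```

With the second version, a question that mentions "priority" or "equipment" shares its vocabulary with the property names, so there is less for the LLM to guess.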

Include short descriptions for types

Add a brief description above each type to give context. For instance, if the type Event is used for work orders and alerts, specify this in the type's docstring. This helps the LLM understand that queries about work orders and alerts refer to Event.

"""
An abstraction of an event that occurs at some point in time.
An Event can be a work order or an alert.
For example: Open Valve A, Place scaffolding
"""
type Event {..}

Document every property

Document every property within a type. Include a clear description and provide an example value. Follow this convention:

...
type Operation {
  ...
  """
  ID from the source system
  Example: 21003104
  """
  id: String
  ...
}

Specify enum values

If a property can only take specific values (that is, it behaves like an enum), clearly specify this in the docstring. For example:

"""
Active status of the work order. Can only be one of: [Open, Closed, Released]
Example: Open
"""
status: String
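Putting the practices together, a fully documented type might look like the sketch below. The WorkOrder type, the equipmentTag property, and the example values in their descriptions are hypothetical; the id and status properties reuse the docstrings shown above.

```graphql
"""
A maintenance task planned or carried out on a piece of equipment.
For example: Replace pump seal, Inspect valve
"""
type WorkOrder {
  """
  ID from the source system
  Example: 21003104
  """
  id: String

  """
  Active status of the work order. Can only be one of: [Open, Closed, Released]
  Example: Open
  """
  status: String

  """
  Tag of the equipment the work order applies to (hypothetical property)
  Example: 23-PV-1001
  """
  equipmentTag: String
}
```

Each description states what the element means and gives an example value, so the type, its properties, and the allowed status values can all be understood from the model definition alone.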