Defining a schema: Bringing your own properties into the graph

You can create your own schemas to enrich instances with custom properties. The schemas consist of three key elements:

Containers: define physical storage which contains properties.
Views: establish logical schemas which map properties.
Data models: collections of one or more views, used for graph data consumption and ingestion.

All three elements are scoped to a space, just like instances.

Containers

Containers are the physical storage for properties. They are defined within a space, and hold a set of properties that logically belong together. You must define a type for each property. Optionally, you can add constraints that the data must adhere to, and define indexes to optimize query performance.

Containers store properties for instances (nodes and edges). An instance can have properties in multiple containers.

You can populate the containers for an instance. The example below populates them for a node.

This data:

externalId: 'xyz42'
equipment:
  manufacturer: 'Acme Inc.'
pump:
  maxPressure: 1.2

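translates to one row in the Equipment container (holding manufacturer) and one row in the Pump container (holding maxPressure), both for node xyz42.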

Note

You can define containers in a different space than the space holding the instances. This is useful if you want to use the same schema for nodes in different spaces, which is often the case given the access control model.

As you add data to these containers for more nodes, the physical storage resembles a set of tables: a Node base container with core fields such as node.{space, externalId, type}, and one table per container with a row for each node that has data in that container.

This is similar to relational database schemas where (space, externalId) constitutes a foreign key to the core node table, and results in a snowflake schema. Importantly, this data lives on a different plane than the graph data discussed in the previous section. For example, nothing ensures that a node has data in Pump just because it has node.type set to [types, pump]. It is up to the client to validate data content, but views can make this more ergonomic.
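To make the relational analogy concrete, here is a minimal in-memory sketch in Python of the example node above. It is an illustration, not how CDF stores data: the `nodes` and `containers` tables, the `upsert_node` helper, the `my_space` instance space, and the second node `abc7` are all hypothetical. Each container is a dict keyed by the node's (space, externalId).

```python
from __future__ import annotations

from typing import Any, Optional

Key = tuple[str, str]  # (space, externalId), the "foreign key" into the node table

# Hypothetical tables: the Node base table and one table per container.
nodes: dict[Key, dict[str, Any]] = {}
containers: dict[str, dict[Key, dict[str, Any]]] = {"Equipment": {}, "Pump": {}}


def upsert_node(space: str, external_id: str, node_type: Optional[Key],
                data: dict[str, dict[str, Any]]) -> None:
    """Write the base node fields, then one row per container that has data."""
    key = (space, external_id)
    nodes[key] = {"space": space, "externalId": external_id, "type": node_type}
    for container, props in data.items():
        containers[container].setdefault(key, {}).update(props)


# The node from the example above; "my_space" is a hypothetical instance space.
upsert_node("my_space", "xyz42", ("types", "pump"),
            {"Equipment": {"manufacturer": "Acme Inc."}, "Pump": {"maxPressure": 1.2}})

# A hypothetical second node typed as a pump, but with no data in Pump.
upsert_node("my_space", "abc7", ("types", "pump"),
            {"Equipment": {"manufacturer": "Other Corp."}})
```

The second write shows the point made above: abc7 has node.type set to [types, pump] but no row in Pump, and nothing in this model objects.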

Which types of instances can you use a container for?

The usedFor field lets you define which types of instances the container can be used for. Specify one of these values:

node: the container can only be used to populate properties on a node.
edge: the container can only be used to populate properties on an edge.
all: the container can be used to populate properties on both nodes and edges.
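As a predicate, the rule reads as follows. `can_populate` is a hypothetical client-side helper, not part of any Cognite SDK; the service performs the real check.

```python
from typing import Literal

UsedFor = Literal["node", "edge", "all"]
InstanceKind = Literal["node", "edge"]


def can_populate(used_for: UsedFor, kind: InstanceKind) -> bool:
    """Return True if a container with this usedFor value may hold
    properties for the given kind of instance."""
    return used_for == "all" or used_for == kind
```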

Note

If you use all, ingesting data into the container is more expensive than if you use node or edge.

Properties

When you define a container, you must specify the properties it will contain. Data modeling supports the following basic data types for properties:

Property type	Description
`text`	A string of characters.
`int64`	A 64-bit integer.
`float64`	A 64-bit floating point number.
`float32`	A 32-bit floating point number.
`boolean`	A boolean value.
`timestamp`	A timestamp (with timezone).
`date`	A date (without timezone).
`json`	A JSON object.
`direct`	A direct relation to another instance.
`enum`	A fixed set of named values.

In addition to these property types, we support native reference types that point to resources in other Cognite APIs. This lets you reference data not suited for storage in a property graph. We support the following native resource reference types:

Native resource reference type	Description
`TimeSeries`	A reference to one specific time series. You can use GraphQL queries to expand data from the time series, including data points.
`File`	A reference to a file stored in CDF and uploaded through the files service.
`Sequence`	A reference to a sequence stored in CDF.

We support declaring all of these base and reference types as lists. For example, to store a list of file references: files: [File]

You can specify whether the property is nullable or immutable, and provide a default value. Marking a property as immutable means that its value cannot be changed after it is set, although you can still toggle the immutable setting itself.

The full specification of a required string property can look like this:

name: myStringProperty
description: A string property
nullable: false
immutable: true
defaultValue: foo
type:
  type: text
  list: false
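The following rough Python sketch shows how such a specification constrains a single value, assuming the default value is used when no value is supplied. `PropertySpec`, `resolve_value`, and the `PY_TYPES` mapping are hypothetical, only four base types are covered, and immutability is not modeled.

```python
from dataclasses import dataclass
from typing import Any, Optional

# Hypothetical mapping for a few of the basic property types.
PY_TYPES = {"text": str, "int64": int, "float64": float, "boolean": bool}


@dataclass
class PropertySpec:
    name: str
    type: str
    is_list: bool = False
    nullable: bool = True
    default_value: Optional[Any] = None


def resolve_value(spec: PropertySpec, value: Any = None) -> Any:
    """Apply the default value, then enforce nullability and the declared type."""
    if value is None:
        value = spec.default_value
    if value is None:
        if not spec.nullable:
            raise ValueError(f"{spec.name} is not nullable")
        return None
    if spec.is_list and not isinstance(value, list):
        raise TypeError(f"{spec.name} expects a list")
    expected = PY_TYPES[spec.type]
    for item in value if spec.is_list else [value]:
        # bool is a subclass of int in Python, so reject it for non-boolean types.
        if not isinstance(item, expected) or (isinstance(item, bool) and expected is not bool):
            raise TypeError(f"{spec.name} expects {spec.type}")
    return value
```

For the YAML above, `PropertySpec(name="myStringProperty", type="text", nullable=False, default_value="foo")` returns "foo" when no value is given.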

Tip

When creating your containers, consider how they will be queried. Identify the properties likely to be used for filtering and sorting, and create indexes or composite indexes to support those queries. Because indexes add overhead when data is mutated or ingested, use testing and performance validation to confirm that each index you create is useful and necessary.

It is possible to set nullable to false on an existing property. New null values will be rejected, but existing null values remain. To check for existing null values, fetch the container and check the constraintState.nullability field on the property. This field has four possible values:

null: The property is nullable.
"current": The nullability constraint is valid. No null values exist.
"failed": The nullability constraint is violated. Null values exist in the non-nullable property.
"pending": The validity has not been computed yet. Check back later.

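The mapping from stored data to these states can be sketched as a small function. This is a hypothetical client-side illustration (`nullability_state` is not a Cognite API); in practice you read the value the service computes from the container.

```python
from typing import Iterable, Optional


def nullability_state(nullable: bool, values: Iterable[object],
                      computed: bool = True) -> Optional[str]:
    """Mimic constraintState.nullability for one property's stored values."""
    if nullable:
        return None  # no nullability constraint
    if not computed:
        return "pending"  # the service has not checked existing data yet
    return "failed" if any(v is None for v in values) else "current"
```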
Indexes

Well-designed indexes speed up data access, and they support both constraints, such as uniqueness, and efficient cursoring with custom sort operations. Data modeling supports up to 10 indexes per container.

An index belongs to the container and is not a flag on a property. When you're laying out your physical schema, it's important to remember that you can only build indexes using properties from the same container. Indexes cannot be built from properties hosted in different containers.

We support two index types:

btree: Use a btree index on primitive base types for efficient lookups and range scans. You can set a btree index to be cursorable to enable efficient cursoring with custom sorts. You can also set a btree index to bySpace, so that it includes the node.space property as a prefix on the index.
inverted: Use an inverted index for list-type properties to enable efficient searching for values that appear within the list.
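The access patterns behind the two index types can be illustrated with in-memory stand-ins: a sorted list searched with Python's `bisect` module plays the role of a B-tree, and a dict of sets plays the role of an inverted index. These are assumptions for illustration only and say nothing about how CDF implements its indexes; the sample values are hypothetical.

```python
import bisect
from collections import defaultdict

# B-tree stand-in: a sorted list of (value, node_id) pairs.
rows = [("Acme Inc.", "n1"), ("Beta AS", "n2"), ("Acme Inc.", "n3"), ("Zeta Ltd", "n4")]
btree = sorted(rows)


def range_scan(index: list, low: str, high: str) -> list:
    """Node ids whose value v satisfies low <= v < high, found by binary search."""
    start = bisect.bisect_left(index, (low,))
    end = bisect.bisect_left(index, (high,))
    return [node_id for _, node_id in index[start:end]]


# Inverted-index stand-in: map each list element to the nodes whose list contains it.
tags = {"n1": ["hot", "pump"], "n2": ["valve"], "n3": ["pump"]}
inverted = defaultdict(set)
for node_id, values in tags.items():
    for value in values:
        inverted[value].add(node_id)
```

A range scan touches only the matching slice of the sorted list, while the inverted index answers "which lists contain X?" with one dict lookup.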

After an index is created through an API call, the service will start building the index in the background. To check the progress, fetch the container and check the state field of the index. This state field has three possible values:

"current": The index is successfully built, and will speed up data access.
"pending": The index is still building. Data accesses can't yet make use of it.
"failed": Building the index failed. See below for one common reason why B-tree indexes can fail.

Size bounds on btree indexes

B-tree indexes are less effective for very large property values. Large indexed values can exceed datastore limits and cause data ingestion to fail. Large pre-existing values can cause new indexes to fail to build.

To prevent ingestion failures and guard against broken B-tree indexes, you should limit the size of indexed properties. You can bound the property size in two ways:

maxListSize limits the number of elements in a list property. The limit can be set as high as 2000. To be included in a B-tree index, the limit must be at most 300 or 600, depending on the property type.
maxTextSize limits the number of UTF-8-encoded bytes in a text property. The limit can be set as high as 128k. To be included in a B-tree index, the limit mustn't exceed 2400 bytes.

If a btree index has several indexed properties, the combined limits on all the properties must not exceed 2400 bytes.

You must set maxListSize to include a list property in a new B-tree index. We currently allow text properties without maxTextSize to be included in B-tree indexes, but this will be disallowed in the future.
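A hypothetical pre-flight check for a B-tree index over text properties might look like this. It assumes each property's limit is its maxTextSize in bytes and that the combined limit is their sum. List properties and the per-type maxListSize caps are left out. It also treats a missing maxTextSize as a failure, which is stricter than the current rule. `utf8_size` and `btree_text_limits_ok` are illustrative names.

```python
from typing import Optional

BTREE_COMBINED_LIMIT = 2400  # bytes, combined across indexed properties


def utf8_size(value: str) -> int:
    """maxTextSize counts UTF-8-encoded bytes, not characters."""
    return len(value.encode("utf-8"))


def btree_text_limits_ok(max_text_sizes: dict[str, Optional[int]]) -> bool:
    """Check a proposed B-tree index over text properties: every property must
    declare maxTextSize, and the combined limits must not exceed 2400 bytes."""
    if any(limit is None for limit in max_text_sizes.values()):
        return False  # hypothetical stricter policy: unbounded text is rejected
    return sum(max_text_sizes.values()) <= BTREE_COMBINED_LIMIT
```

Note that `utf8_size("øøø")` is 6, not 3, so a limit expressed in characters would underestimate non-ASCII text.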

Existing data that exceeds size bounds

You can add size limits to existing properties that already contain oversized data. New data that exceeds the limits will be rejected, but existing oversized data remains. To see if oversized data exists in a property, fetch the container and check the property's constraintState field. It contains two subfields:

maxListSize indicates the state of any limit on the number of elements in a list property.
maxTextSize indicates the state of any limit on the size of a text property.

These state fields have four possible values:

null: There is no corresponding size bound on the property.
"current": The size limit is valid. No oversized data exists.
"failed": The size limit is invalid. The property contains oversized values.
"pending": The state is still computing. Check back later.

Clean up existing data that exceeds size bounds before creating B-tree indexes. Oversized data can break new indexes and stop them from improving query performance.
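The constraintState size values can be mimicked with one helper that takes a size function: element count for maxListSize, UTF-8 byte length for maxTextSize. `size_bound_state` and `utf8_size` are hypothetical names; this is a sketch of the semantics described above, not a Cognite API.

```python
from typing import Callable, Iterable, Optional


def utf8_size(value: str) -> int:
    """maxTextSize is measured in UTF-8-encoded bytes."""
    return len(value.encode("utf-8"))


def size_bound_state(limit: Optional[int], values: Iterable, size_of: Callable,
                     computed: bool = True) -> Optional[str]:
    """Mimic one constraintState size subfield (maxListSize or maxTextSize)."""
    if limit is None:
        return None  # no size bound on the property
    if not computed:
        return "pending"  # still computing
    oversized = any(v is not None and size_of(v) > limit for v in values)
    return "failed" if oversized else "current"
```

Pass `len` as `size_of` for a maxListSize check and `utf8_size` for a maxTextSize check.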

Constraints

Use constraints to restrict the values that can be stored by a property, or in a container. Constraints ensure that the data has integrity, and reflects the real world. We support up to 10 constraints per container.

Currently, we support two constraint types:

uniqueness: Ensures that the values of a property or a set of properties are unique within the container. You can set a uniqueness constraint to bySpace so that uniqueness applies per space.
requires: Points to another container, and requires that any instance with data in this container also has data in the other container.

You can add constraints to existing containers that already contain data that violates the constraint. New data that violates the constraint will be rejected, but existing data remains. To see if existing data violates the constraint, fetch the container and check the state field of the constraint. The state field has three possible values:

"current": The constraint is satisfied. No data violates the constraint.
"failed": Data exists that violates the constraint.
"pending": The validity of the constraint has not been computed yet. Check back later.

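A minimal sketch of what the two constraint types check, using hypothetical in-memory data keyed by (space, externalId). `violates_uniqueness` and `requires_violations` are illustrative names; the service evaluates constraints itself.

```python
from typing import Iterable

Key = tuple[str, str]  # (space, externalId)


def violates_uniqueness(rows: Iterable[tuple[Key, object]], by_space: bool) -> bool:
    """True if two instances share a value (within the same space when by_space)."""
    seen = set()
    for (space, _external_id), value in rows:
        scope = (space, value) if by_space else value
        if scope in seen:
            return True
        seen.add(scope)
    return False


def requires_violations(keys_here: set[Key], keys_required: set[Key]) -> set[Key]:
    """Instances with data in this container but none in the required container."""
    return keys_here - keys_required
```

With bySpace, the same serial number may appear once in each space without violating the constraint.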
Example container definition

This example defines two containers, Equipment and Pump:

It sets usedFor to node on both containers to allow them to be populated for nodes, not edges.
The btree index on Equipment.manufacturer enables efficient sorting/filtering on the manufacturer property. The index is cursorable and lets you efficiently cursor through equipment nodes when sorting on manufacturer.
The requires constraint on Pump ensures that any node with data in the Pump container also has data in the Equipment container.

Pump-Equipment definition
- space: equipment
  externalId: Equipment
  usedFor: node
  properties:
    manufacturer:
      nullable: false
      type:
        type: text
        list: false
  indexes:
    manufacturer:
      indexType: btree
      properties:
        - manufacturer
      cursorable: true
- space: equipment
  externalId: Pump
  usedFor: node
  properties:
    maxPressure:
      nullable: false
      type:
        type: float64
        list: false
  constraints:
    requireEquipment:
      constraintType: requires
      require:
        space: equipment
        externalId: Equipment

Views

Use views to create logical schemas to consume and populate a graph tailored for specific use cases. Like containers, views contain a group of properties. You define the views by either mapping container properties, or by creating connection properties to express the expected relationships in the graph.

Info

You query data through your defined views. Data is not queried directly from the containers.

Mapped properties

Views let you map properties from different containers into a "flat" object, and rename or alias properties.

For example, a view can create a flat object with the properties manufacturer and maxPressure from the Equipment and Pump containers, and rename the manufacturer property to producer.

You can use the view to populate a node with data from both the Equipment and Pump containers at the same time, and query for the properties when retrieving the nodes.
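Conceptually, a mapped view is a lookup table from view property to (container, container property). The following Python sketch, with a hypothetical `VIEW` mapping and helper functions, reads a flat object for a node and writes one back, including the producer alias for manufacturer. It illustrates the idea only and is not the DMS query engine.

```python
# Hypothetical view mapping: view property -> (container, container property).
VIEW = {
    "producer": ("Equipment", "manufacturer"),  # alias: producer reads manufacturer
    "maxPressure": ("Pump", "maxPressure"),
}

Key = tuple[str, str]  # (space, externalId)


def read_through_view(view: dict, containers: dict, key: Key) -> dict:
    """Assemble one flat object for a node from several containers."""
    return {
        prop: containers.get(container, {}).get(key, {}).get(source_prop)
        for prop, (container, source_prop) in view.items()
    }


def write_through_view(view: dict, containers: dict, key: Key, obj: dict) -> None:
    """Split a flat object into rows in the containers the view maps."""
    for prop, value in obj.items():
        container, source_prop = view[prop]
        containers.setdefault(container, {}).setdefault(key, {})[source_prop] = value
```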

Connection properties

Connection properties let you describe that you expect certain direct relations or edges to exist between nodes in the graph. When this metadata is persisted, you can retrieve related data when consuming instances through a particular view.

Reverse Direct relation

Defining a reverse direct relation property is useful for describing the connections that exist in the graph. In the example below, the Equipment view has a direct relation property, manufacturer, that points to a Manufacturer node. Because the direct relation is stored on Equipment, the Manufacturer view can describe the reverse direct relation: all equipment that points to a given manufacturer.

connectionType: Either single_reverse_direct_relation or multi_reverse_direct_relation, depending on whether you expect a single related instance or multiple.
name: The name of the property.
description: The description of the property.
through: The source and identifier of the direct relation property.
source: The view that describes the expected data at the other end of the relation.
targetsList (read-only): Whether the reverse direct relation targets a list of direct relations or not.

Caution

The single_reverse_direct_relation doesn't ensure that only a single instance is connected. To ensure that at most one instance can be connected, add a uniqueness constraint on the direct relation property.

Caution

Because resolving a reverse direct relation means finding all instances whose direct relation property points to a given node, we highly recommend a B-tree index on the direct relation property.

equipments:
  name: 'Equipments'
  description: 'All equipments made by the manufacturer'
  connectionType: multi_reverse_direct_relation
  through:
    identifier: manufacturer
    source:
      type: view
      space: vendor_schema
      externalId: Equipment
      version: 1
  source:
    type: view
    space: vendor_schema
    externalId: Equipment
    version: 1
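To see why an index on the direct relation property matters, compare resolving the reverse relation by scanning all Equipment nodes with looking it up in a prebuilt map. The data and helpers are hypothetical, and the map is only a stand-in for a B-tree index. Note that two equipment nodes point to acme, which a single_reverse_direct_relation would not prevent without a uniqueness constraint.

```python
from collections import defaultdict

Key = tuple[str, str]  # (space, externalId)

# Hypothetical Equipment nodes, each with a direct relation to a manufacturer node.
equipment: dict = {
    ("vendor", "pump-1"): {"manufacturer": ("vendor", "acme")},
    ("vendor", "valve-1"): {"manufacturer": ("vendor", "acme")},
    ("vendor", "pump-2"): {"manufacturer": ("vendor", "zeta")},
}


def reverse_scan(target: Key) -> list:
    """Without an index: inspect every Equipment node."""
    return sorted(k for k, props in equipment.items() if props.get("manufacturer") == target)


# Stand-in for an index on the direct relation property: target -> referring nodes.
by_manufacturer: dict = defaultdict(list)
for node_key, props in equipment.items():
    by_manufacturer[props["manufacturer"]].append(node_key)


def reverse_lookup(target: Key) -> list:
    """With the index: a single lookup instead of a full scan."""
    return sorted(by_manufacturer.get(target, []))
```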

Edges

You can also define a connection using an edge. For example, you can express that you expect nodes with data in BasicPump to have flows-to edges to Valve nodes.

This encodes that nodes with data in BasicPump can have flows-to edges to nodes with data in BasicValve. You can describe this with these fields in a connection property:

connectionType: either single_edge_connection or multi_edge_connection (the default).
type: the fully qualified external ID (space and externalId) of the node representing the edge type.
source: a reference to the view through which you see the node at the other end of the edge.
direction: the direction to traverse the edge (inwards or outwards).
edgeSource: an optional reference to a view that describes the properties stored on the edge itself.

A Pump/Valve connection property looks like this:

type:
  space: types
  externalId: flows-to
source:
  space: equipment
  externalId: BasicValve
  version: v1
direction: outwards
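Traversal in either direction can be sketched with edges stored as (start, type, end) triples. The node IDs and the `traverse` helper are hypothetical, and the sketch ignores the source view, so it does not check that end nodes have data in BasicValve.

```python
Key = tuple[str, str]  # (space, externalId)

FLOWS_TO: Key = ("types", "flows-to")

# Hypothetical edges stored as (start node, edge type, end node).
edges = [
    ("pump-1", FLOWS_TO, "valve-1"),
    ("pump-1", FLOWS_TO, "valve-2"),
    ("valve-1", FLOWS_TO, "pump-2"),
]


def traverse(node: str, edge_type: Key, direction: str = "outwards") -> list:
    """Return the nodes at the other end of matching edges."""
    if direction == "outwards":
        return [end for start, t, end in edges if start == node and t == edge_type]
    if direction == "inwards":
        return [start for start, t, end in edges if end == node and t == edge_type]
    raise ValueError(f"unknown direction: {direction}")
```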

Implementing other views

Views can implement other views and inherit their properties. This is useful when you want to create a view that combines the properties of multiple other views. For example:

- space: equipment
  externalId: Equipment
  version: v1
  properties:
    manufacturer:
      container:
        space: equipment
        externalId: Equipment
      containerPropertyIdentifier: manufacturer
- space: equipment
  externalId: Pump
  version: v1
  implements: # <-- Declares that the view implements the Equipment view
    - space: equipment
      externalId: Equipment
      version: v1
  properties:
    maxPressure:
      container:
        space: equipment
        externalId: Pump
      containerPropertyIdentifier: maxPressure

The effective properties of the Pump view are now manufacturer and maxPressure.

Danger

The effective properties of a view are resolved at query time. We do not allow breaking changes to views, but if a view implemented by another view is deleted, the inherited properties will be removed from the implementing view. This could break clients if it removes any required properties.

Implemented property conflicts and precedence

For example, suppose you have four views, A, B, C, and D, each with a single property, connected through an implements graph. Each view's effective properties combine its own property with those of the views it implements.

If you introduce conflicting property identifiers in this graph, they are resolved by sorting the implements graph topologically. The order beneath a node is determined by the order of the implements array, where later entries are preferred.

If B implements [C, D], the order of precedence is A, B, D, C.

If B implements [D, C], the order is A, B, C, D.

In these examples, B implements [C, D].
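The precedence rules above can be reproduced with a reverse post-order depth-first search, which yields a topological order in which each view precedes the views it implements. This sketch assumes A implements B, which is consistent with the orderings above but is an assumption here. It reproduces the stated orderings but is not confirmed to be the service's exact algorithm. `precedence` and `effective_properties` are hypothetical helpers.

```python
def precedence(view: str, implements: dict[str, list[str]]) -> list[str]:
    """Order a view and its ancestors from highest to lowest precedence."""
    order: list[str] = []
    visited: set[str] = set()

    def visit(v: str) -> None:
        if v in visited:
            return
        visited.add(v)
        # Visited in array order, so later entries end up with higher precedence.
        for parent in implements.get(v, []):
            visit(parent)
        order.append(v)  # post-order: appended after everything it implements

    visit(view)
    return order[::-1]  # reverse post-order is a topological order


def effective_properties(view: str, implements: dict[str, list[str]],
                         own: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge properties so that higher-precedence views win conflicts."""
    result: dict[str, str] = {}
    for v in reversed(precedence(view, implements)):  # lowest precedence first
        result.update(own.get(v, {}))
    return result
```

With B implementing [C, D], a property identifier defined in both C and D resolves to D's definition.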

View versioning

Views are versioned, and you cannot introduce breaking changes without changing the version.

See the View changes and Data model changes sections for the changes to views and data models that force a version update.

You can adapt the versioning scheme to your needs. If you don't have a preference, we recommend using whole numbers. Start your version scheme at 1, and increment the version number by one for each new version.

You can use decimals, such as 1.1, 1.2, and 2.0, if you want higher version granularity. In this scheme, the first digit typically increments each time you make a significant or breaking change. Breaking changes could require changes to your application's business logic. The digits after the decimal point increment when you make minor changes that don't break your API. These kinds of changes don't typically break the logic required to use the API (and models).

Semantic versioning is a widely used versioning scheme, but it was designed for software development, not data modeling. In CDF data modeling, non-breaking changes are allowed without changing the version, so every version increment effectively signals a breaking change. You can choose to implement semantic versioning, but doing so requires close attention to the reusability that CDF data modeling is designed for. In semantic versioning's a.b.c scheme, only a signals a breaking change, while b and c signal backward-compatible additions and fixes, which CDF doesn't require you to version at all.

In CDF, we allow certain changes to the data model without forcing you to increment the version number of the view or data model. For example, adding a new data type to an existing data model isn't a breaking change and doesn't need a new version. In semantic versioning, the same addition would still require a version increment (a minor, or b, bump).

View filters

All views have a filter field that lets you filter the nodes that are included when querying the view. For most higher-level query endpoints in DMS, the filters are applied automatically. For advanced endpoints, you have to apply the filters manually.

If no filter is specified, the default hasData filter is applied on the list of views specified when querying. Learn more about hasData filters in the querying article.

Equipment example view

This example illustrates a view definition for equipment:

- space: equipment
  externalId: BasicEquipment
  version: v1
  properties:
    producer:
      container:
        space: equipment
        externalId: Equipment
      containerPropertyIdentifier: manufacturer
- space: equipment
  externalId: BasicValve
  version: v1
  # This view only maps properties from the Equipment container (through BasicEquipment),
  # so hasData filtering can't tell valves apart from other equipment.
  # We add a custom filter to make sure we only include nodes of the correct type.
  filter:
    equals:
      property: ['node', 'type']
      value: { 'space': 'types', 'externalId': 'valve' }
  implements: # Inherit the properties from the BasicEquipment view
    - space: equipment
      externalId: BasicEquipment
      version: v1
- space: equipment
  externalId: BasicPump
  version: v1
  implements: # Inherit the properties from the BasicEquipment view
    - space: equipment
      externalId: BasicEquipment
      version: v1
  properties:
    maxPressure:
      container:
        space: equipment
        externalId: Pump
      containerPropertyIdentifier: maxPressure
    valves:
      type: # The edge type to traverse
        space: types
        externalId: flows-to
      source: # The view to view the other node in
        space: equipment
        externalId: BasicValve
        version: v1
      direction: outwards

Caveats

When a property mapped by a view is modified or deleted, the lastUpdatedTime of the view will not be updated.
It's important to inspect which views are affected before deleting a container. When a container is deleted, its properties are removed from any views that map the container. This can break clients if the properties are required.

Polymorphism in views

You can achieve polymorphism for views in two ways:

Using implements and the implicit view hasData filter.
Using explicit view filters on type.

In the sections below, BasicPump and BasicValve are subtypes of BasicEquipment.

Using `implements` and implicit view `hasData` filtering

When the BasicPump view implements the BasicEquipment view, the default filter on BasicPump is a hasData filter across the underlying containers: Pump and Equipment.

If you filter using the BasicEquipment view, you'll get anything with data in the Equipment container. If you filter using the BasicPump view, you'll get anything with data in both the Equipment and the Pump containers.

This approach to polymorphism resembles structural subtyping: "if it looks like a duck and quacks like a duck, it's a duck."

This breaks down in some cases. For example, if a view only has connection properties, there are no backing containers to apply the hasData filtering on. In this case, you can use the type property and view filters - see the next section.
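The structural check can be sketched as a subset test: a node matches a view's default hasData filter when it has data in every container the view maps, including containers mapped through implements. The tables and helpers below are hypothetical stand-ins.

```python
# Hypothetical: containers each view maps, directly or through implements.
VIEW_CONTAINERS = {
    "BasicEquipment": {"Equipment"},
    "BasicPump": {"Equipment", "Pump"},
}

# Hypothetical: containers each node has data in.
NODE_CONTAINERS = {
    "pump-1": {"Equipment", "Pump"},
    "valve-1": {"Equipment"},
}


def has_data(view: str, node: str) -> bool:
    """True if the node has data in every container the view maps."""
    return VIEW_CONTAINERS[view] <= NODE_CONTAINERS.get(node, set())


def query(view: str) -> list:
    return sorted(node for node in NODE_CONTAINERS if has_data(view, node))
```

In this sketch, a view that maps no containers would match every node, since the empty set is a subset of any set. That is one way to see why hasData cannot discriminate such views.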

Using explicit view filters on `type`

If, for example, you associate all pump nodes with the type [types, pump], you can use a view filter to only include nodes with that type:

filter:
  equals:
    property: ['node', 'type']
    value: { 'space': 'types', 'externalId': 'pump' }

To list all equipment nodes, you must explicitly include the subtypes of the view in your filter:

filter:
  in:
    property: ['node', 'type']
    values:
      [
        { 'space': 'types', 'externalId': 'pump' },
        { 'space': 'types', 'externalId': 'valve' },
      ]

This approach to polymorphism resembles nominal subtyping: "if it's a duck, it's a duck."
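The two filter shapes above can be evaluated with a few lines of Python. `matches` and `select` are hypothetical helpers that only understand single `equals` and `in` filters on ['node', ...] properties. They illustrate the nominal approach, not the DMS filter engine.

```python
PUMP = {"space": "types", "externalId": "pump"}
VALVE = {"space": "types", "externalId": "valve"}

# Hypothetical nodes, keyed by externalId, with only the node.type field.
nodes = {
    "pump-1": {"type": PUMP},
    "valve-1": {"type": VALVE},
    "sensor-1": {"type": None},
}


def matches(node: dict, flt: dict) -> bool:
    """Evaluate a single equals/in filter on a ['node', <field>] property."""
    op, body = next(iter(flt.items()))
    scope, field = body["property"]
    if scope != "node":
        raise ValueError("this sketch only handles node-level properties")
    value = node.get(field)
    if op == "equals":
        return value == body["value"]
    if op == "in":
        return value in body["values"]
    raise ValueError(f"unsupported filter: {op}")


def select(flt: dict) -> list:
    return sorted(ext_id for ext_id, node in nodes.items() if matches(node, flt))
```

Adding a new equipment subtype means adding its type to every `in` filter that should include it, which is the maintenance cost of nominal subtyping.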

Data models

Use data models to group views that belong together for a purpose. For example, you might define an EquipmentInspection data model containing a BasicValve and a BasicPump view.

space: equipment
externalId: EquipmentInspection
version: v1
views:
  - space: equipment
    externalId: BasicPump
    version: v1
  - space: equipment
    externalId: BasicValve
    version: v1

Impact of changes to views and data models

The tables in this section describe the impact of changes to a container, a view, or a data model. Changes that force a visible change for a consumer of data from a view or a data model are considered breaking changes, and require a version change to be submitted to the service.

Note

You can change the version of a data model or view even if a version change is not required. Treat this operation as a controlled switch for any consumer of data from that data model or view.

Container changes

Containers are not versioned, and changes to containers are not breaking in the same sense as for views and data models. To change a container, we recommend using a delete > recreate > re-ingest pattern. Alternatively, create a new parallel container and re-ingest the data into it.

Caution

After deleting, recreating, re-ingesting or creating a parallel container, you need to recreate any existing view-to-container mapping(s).

In the table below, the "Not allowed" description means that the operation can't be safely implemented in the underlying data store.

Operation	Breaking change	Allowed	Description
Change name.	No	Yes	Changes metadata.
Change description.	No	Yes	Changes metadata.
Change `usedFor`.	N/A	No	Not allowed.
Add property.	No	Yes	When the property identifier isn't in use.
Delete property.	N/A	No	Not allowed.
Add `requires` or `check` constraint.	No	Yes	Identifier must be unique and constraint will apply to any new values (not pre-existing ones).
Add `unique` constraint.	N/A	Yes	Can only be included during the initial creation of the container.
Change constraint.	N/A	No	Not allowed.
Delete constraint.	No	Yes	Allowed.
Add new index.	No	Yes	Allowed.
Delete index.	No	Yes	Allowed.
Change index.	N/A	No	Not allowed.
Change property: `nullable` > `non-nullable`.	Yes	Yes	Changing to non-nullable may break ingestion clients, but represents an acceptable breakage. This shouldn't be unexpected.
Change property: `non-nullable` > `nullable`.	N/A	No	Not allowed.
Change property: `autoIncrement`.	N/A	No	Not allowed.
Change property: `defaultValue`.	No	Yes	Changes are applied to new values added. Previously existing values remain the same.
Change property: `description`.	No	Yes	Changes metadata meant for human consumption.
Change property: `name`.	No	Yes	Changes metadata meant for human consumption.
Change property: `type`.	N/A	No	Not allowed.
Change (text) property: `list` state.	N/A	No	Not allowed.
Change (text) property: `collation`.	N/A	No	Not allowed.
Change (primitive) property: `list` state.	N/A	No	Not allowed.
Change (direct relation) property: target `container`.	N/A	No	Not allowed.
No change.	No	Yes	Allows idempotent updates.

View changes

Operations and the impact of the operation for data modeling views.

Operation	View	Property	Relation	Direct relation	Breaking change	Description
Change name.	Yes	Yes	Yes	Yes	No	Changes metadata.
Change description.	Yes	Yes	Yes	Yes	No	Changes metadata.
Change filter.	Yes	Yes	Yes	Yes	No	Changes the result of the view, but doesn't change the form of it.
Change implements.	N/A	N/A	N/A	N/A	Yes	May break clients.
Change version.	N/A	N/A	N/A	N/A	Yes	A version can be bumped even if there are no changes (allows alignment of version numbers for different combinations of views).
Add nullable property.	Yes	Yes	N/A	N/A	No	Safe change as long as there's no collision with other inherited properties, and it doesn't introduce a new mapped container. Still safe as long as the type is the same and the property doesn't change from `nullable` to `required`.
Add non-nullable property.	Yes	Yes	N/A	N/A	Yes	Breaking since a client may depend on its existence.
Delete property.	Yes	N/A	N/A	N/A	Yes
Change property type.	N/A	Yes	N/A	N/A	Yes	Requires a version change to avoid breaking consumers.
Change container reference for base property.	N/A	Yes	N/A	N/A	No	Permit remapping consumers to new data, from other containers, without forcing a version change for consumers.
Change source hint.	N/A	N/A	Yes	Yes	Yes	Equivalent of changing the property type.
Change type of relation.	N/A	N/A	Yes	Yes	Yes	As long as the source type remains the same (not changed).
Change direction of relation.	N/A	N/A	Yes	Yes	Yes	As long as the source type remains the same (not changed).
Change source of relation.	N/A	N/A	Yes	N/A	Yes	As long as the source type remains the same (not changed).
No change.	N/A	N/A	N/A	N/A	No	Supports idempotent updates.

Data model changes

Operations and the impact of the operation for data models in the data modeling service.

Operation	Breaking change	Allowed	Description
Change name.	No	Yes	Changes metadata.
Change description.	No	Yes	Changes metadata.
Add a view.	No	Yes	Non-breaking if the new view doesn't conflict with any existing views in the data model. A conflict is a view using the same external ID.
Remove a view.	Yes	Yes	Breaking since a client may depend on the existence of the view in the data model.
Replace a view.	Yes	Yes	When updating the view identified by the (`space`, `externalId`, `version`). Typically done to update the version of a view used by the data model.
Change version.	Yes	Yes	A version bump without other changes is still a version bump.
Change space.	N/A	No	Not supported.
