Data modeling performance considerations
This article outlines key data modeling factors that impact performance when ingesting and querying data.
To manage containers and govern schema components, you can use NEAT (open source) and the CDF Toolkit.
Indexes and constraints
To improve the speed of data retrieval and sorting, you can index any property defined in a container. However, indexes also come with a performance cost during certain operations.
You can configure a maximum of 10 indexes for each container. Each index can include a single property or a combination of properties (a composite index). In a composite index, the order of the properties matters.
The properties most frequently included in queries are the primary candidates for an index. Also consider the size of the data set and how frequently the data is updated.
You can specify two main index types for the data modeling service: BTree and inverted indexes.
The examples in this section assume the following container definitions:
# Example container definitions
space: site_assets
container:
  # A container for the "Asset" type with the properties "asset_id", "site", and "name"
  - externalId: Asset
    name: Asset
    usedFor: node
    description: The Asset container
    properties:
      asset_id:
        name: asset_id
        type:
          type: int64
      site:
        name: site
        type:
          type: text
      name:
        name: name
        type:
          type: text
      # [... more asset specific container properties could be included here]
  # A container for the "Description" type
  - externalId: Description
    usedFor: node
    properties:
      title:
        type:
          type: text
          list: false
          collation: ucs_basic
        nullable: true
      description_text:
        type:
          type: text
          list: false
          collation: ucs_basic
        nullable: true
      labels:
        type:
          type: text
          list: true
          collation: ucs_basic
        nullable: true
Why not index all properties?
Well-planned indexes that match the query patterns of your data consumers improve the performance of data retrieval operations. But indexes come at a cost, and an index on the wrong properties can be detrimental to both data retrieval and insert/update operations:
- Indexes built on properties containing a lot of unique data will often impede performance.
- BTree indexes for properties with large amounts of data can use significant amounts of memory and can cause intermittent error/timeout statuses for update or add operations.
- Composite indexes covering millions of data items can lead to poor performance during add and update operations for the indexed properties. This may affect the perceived stability of your solutions and your data objects.
Having an index for every property is unlikely to give the desired result even if the total property count in the container would allow it. Too many indexes, across too many properties, in too many related containers may lead to slow performance during ingestion, query, search, and filtering operations. Slow performance can result in a lot of push-back on the client in the form of retry statuses from the data modeling service.
BTree indexes
BTree indexes maintain data in sorted order and let you quickly look up instances based on the properties defined in the index. They are ideal for queries that use range and equality filters. Conceptually, a BTree is similar to a phone book, which is sorted by name, allowing you to find a person's phone number quickly.
BTree indexes are particularly useful for data stored as scalar data types. To see the scalar data types available in container properties, navigate to the type > PrimitiveProperty section of the Create or update containers entry in the API documentation.
The following example illustrates creating BTree indexes for the asset name and site:
indexes:
  siteIndex:
    properties:
      - site
    indexType: btree
    cursorable: true
  nameIndex:
    properties:
      - name
    indexType: btree
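Because the order of properties matters for composite indexes, it may help to see one written out. The fragment below is a minimal sketch, not a recommended configuration: the index name siteNameIndex is hypothetical, and it assumes the site and name properties of the Asset container defined earlier. It lists site first on the assumption that most queries filter on site and only sometimes narrow the result by name.

```yaml
indexes:
  # Hypothetical composite index; the name "siteNameIndex" is an assumption
  siteNameIndex:
    properties:
      # Order matters: list first the property that queries filter on most often
      - site
      - name
    indexType: btree
```

As a general property of B-tree style indexes, an index ordered this way tends to help queries that filter on site alone or on site and name together, and is typically less useful for queries that filter only on name. If both patterns are common, verify against your actual query workload before adding another index, since each index counts toward the limit of 10 per container.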