When you’re working with high-volume industrial data, storing each entry as a node in your knowledge graph becomes impractical: the graph becomes cluttered, query performance degrades, and you quickly hit instance budget limits. Records solve this problem by providing high-performance storage for bulk structured data that doesn’t need to be part of the graph structure.

Records are structured data objects stored in streams, separate from the industrial knowledge graph. Like data modeling instances, records use containers to define their schema and spaces for access control, but they don’t create nodes or edges in the graph. This design lets you store billions of records without impacting graph performance or consuming instance budgets.

After reading this article, you’ll understand when to use records versus data modeling instances, how streams organize and manage your data, and how records integrate with your existing data modeling infrastructure.

When to use records

Records are ideal for high-volume, structured data that doesn’t require complex graph relationships. Common use cases include:
  • High-volume immutable data: Logs, events, and notifications (OPC UA events, PI EventFrames, well logs, manufacturing batch logs)
  • Archived and historical data: Completed work orders, resolved alarms, concluded activities
  • Infrequently updated operational data: Active work orders or alarms with low update frequency over their lifecycle
To decide between records and data modeling instances, see the comparison table in Records vs. nodes below for detailed differences.

Core concepts

To work effectively with records, you need to understand these key concepts:
  • Streams define the lifecycle and performance characteristics of your data.
  • Records are the individual data objects you store.
  • Spaces provide access control and organization.
  • Containers define the schema structure.

Streams

Streams are logical containers that organize your records and define how they behave throughout their lifecycle. When you create a stream, you choose a template that sets policies for:
  • Retention periods: How long records are stored before automatic deletion.
  • Mutability: Whether records can be updated after ingestion.
  • Performance characteristics: Ingestion and query throughput limits.
Each stream is created from one of the available stream templates, and the template cannot be changed after the stream is created.

Streams are independent of spaces and containers. A single stream can contain records from multiple spaces with different container schemas, giving you flexibility in how you organize your data.

Before you can ingest records, you must create a stream to hold them. When you query records, you reference the specific stream where they’re stored.
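As a hedged sketch, creating a stream from one of the templates listed in the next section might look like the following. The endpoint path and payload shape are assumptions for illustration; see the Streams API documentation for the authoritative request format.

```python
import requests

# Illustrative values; replace with your cluster, project, and credentials.
BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}  # obtain a token via your usual OAuth flow

# Create a stream from a template. The payload shape is an assumption;
# check the Streams API documentation before relying on it.
response = requests.post(
    f"{BASE_URL}/streams",
    headers=HEADERS,
    json={
        "items": [
            {
                "externalId": "maintenance-logs",  # identifier you reference when ingesting and querying
                "template": "BasicArchive",        # immutable, long-term archival (see the table below)
            }
        ]
    },
)
response.raise_for_status()
```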

Stream templates

Stream templates are currently in beta, which means the available templates may change as we gather feedback and continue development. New templates may be added, and existing templates may be modified or removed if necessary. Any changes will not affect existing streams you’ve already created from these templates.
When you create a stream, select a stream template that defines the stream’s behavior, performance, and lifecycle policies. Keep in mind these important considerations:
  • The choice between mutable and immutable records has significant scale implications. Different stream templates support different maximum record counts and storage capacities.
  • You cannot change a stream’s template after creation, so choose your template carefully for production use.
  • Review the limits and specifications for each template in the Streams API documentation before creating your streams, and select based on your expected data volume and mutability requirements.
These stream templates are available for all CDF projects:
| Template | Mutability | Purpose |
| --- | --- | --- |
| ImmutableTestStream | Immutable | Sandbox testing of immutable record data |
| BasicArchive | Immutable | Long-term archival storage with unlimited retention |
| BasicLiveData | Mutable | Infrequently updated “live” data |
For enterprise CDF subscriptions, additional high-scale stream templates are available with increased capacity:
  • Streams: Up to 30 active streams per project
  • Records: Up to 5 billion (immutable) or 100 million (mutable) records per stream
  • Storage: Up to 5 TB (immutable) or 300 GB (mutable) per stream
  • Max write: Up to 170 MB per 10 minutes per stream
  • Max read: Up to 1.7 GB per 10 minutes per stream
Contact your Cognite representative to learn more about these high-scale options.
Once you create a stream with a specific template, you cannot change that template. If you need different template settings, you must delete all records from the stream, delete the stream, and then create a new stream with the correct template.

Each stream template defines specific limits for records, storage, and throughput. If you exceed these limits, the Records API returns HTTP 429 Too Many Requests responses. For detailed specifications, limits, and throughput rates for each template, see the Streams API documentation. For a summary of Records resource limits and API operation limits, see Limits and restrictions.

If you’re building applications or services that use Records, implement the recommended approaches for managing concurrency and rate limits to avoid hitting these limits, for example by backing off and retrying on 429 responses as sketched below.
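A minimal backoff helper, assuming nothing beyond the requests library (the retry policy here is illustrative, not an official recommendation):

```python
import time

import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """POST a request, retrying with exponential backoff on HTTP 429."""
    delay = 1.0  # initial wait in seconds
    for _ in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()
            return response
        time.sleep(delay)  # rate limited: wait, then retry with a longer delay
        delay *= 2
    raise RuntimeError(f"Still rate limited after {max_retries} attempts")
```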

Deleting streams

When you delete a stream, it enters a soft-deleted state to protect against accidental data loss. During this period:
  • The stream and its data are preserved but inaccessible (no ingestion or queries).
  • The stream doesn’t count toward the active stream limit.
  • You cannot create a new stream with the same identifier.
  • You can recover the stream by contacting Cognite Support.
The duration of the soft-deleted state depends on the stream template, ranging from 1 day for test streams to 6 weeks for production streams. After this period, the stream and its data are permanently deleted. For limits on active and soft-deleted streams, see Limits and restrictions.

Records

Records are individual data objects that represent events, logs, or historical entries. Whether a record is immutable or mutable depends on the stream template you choose when creating the stream.

An industrial knowledge graph describes relationships between entities using nodes and edges. Nodes can represent physical entities like equipment or logical concepts like activities and process stages. However, when you’re handling bulk data such as logs or historical records, storing each individual record as a node significantly degrades query and data retrieval performance because of the increased relational complexity. This is an anti-pattern you should avoid.

Use the Records service to avoid these performance penalties for high-volume data. Records, together with streams, let you store high-volume structured data in bulk, improving both the performance and scalability of your CDF-based solutions. Although records support mutability, updating records comes at a significant processing and ingestion cost compared to data modeling instances. Immutability is a key design feature for records: it guarantees that historical records cannot be altered, while also delivering cost-effective support for massive storage volumes.

Records vs. nodes

| | Data modeling nodes | Records |
| --- | --- | --- |
| Storage | Entities in the industrial knowledge graph | Data stored in streams as large batches |
| Identification | Instances must have a unique external ID per node in the space | See Identifiers for records |
| Mutability | Mutable by default, immutable by configuration | Depends on the stream template applied when creating the stream |
| Data volumes | Millions of nodes with low growth (once the initial graph is defined) | Billions of records per year, continuously |
| Structure | Structured using containers, with the option to use JSON for semi-structured data | Structured using containers, with the option to use JSON for semi-structured data |
| Relationships | Many-to-many in a mesh, defined by edges and direct relations between instances | Connected to data modeling instances via direct relations in the records |

Identifiers for records

In data modeling, you identify nodes using a combination of the space ID and the mandatory node external ID. The external ID must be unique within the space it’s scoped to, but you can reuse the same external ID across different spaces. Records also use external IDs: like a node’s, a record’s external ID belongs to a space, and the record itself is stored in a stream that can include records from multiple spaces. For records, the stream type determines the uniqueness constraints:
  • Mutable streams: the service enforces uniqueness for each combination of external ID, space ID, and stream. When you update a record with the same external ID/space/stream combination, it updates the existing record rather than creating a new one.
  • Immutable streams: the service does not enforce uniqueness. An immutable stream can contain multiple records with the same stream/space/external ID combination. This is useful for storing the full history of a record over time. You can use filtering capabilities to retrieve these records in bulk.
In a single write request to a mutable stream, all combinations of space + externalId must be unique. You cannot create and update a record with the same space and external ID combination in the same POST request to /streams/{streamId}/records.
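To make the upsert behavior concrete, here’s a hedged sketch of writing one record to a mutable stream. The endpoint is the one named above; the payload shape is an assumption modeled on the data modeling instance format, so verify it against the Records API reference.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}

# Writing the same space + externalId combination to a mutable stream twice
# updates the existing record rather than creating a second one. The payload
# shape is an assumption modeled on the data modeling instance format.
record = {
    "space": "maintenance-space",                # hypothetical space
    "externalId": "work-order-1001",
    "sources": [
        {
            "source": {
                "type": "container",
                "space": "maintenance-space",
                "externalId": "MaintenanceLog",  # hypothetical container (see Containers below)
            },
            "properties": {"severity": "low"},
        }
    ],
}

response = requests.post(
    f"{BASE_URL}/streams/maintenance-logs/records",  # endpoint named in this article
    headers=HEADERS,
    json={"items": [record]},
)
response.raise_for_status()
```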

Spaces

Records use data modeling spaces for access control and organization. You must define a space before you can ingest records into it. Records can share spaces with data modeling instances, be stored in dedicated spaces, or use multiple shared spaces, depending on your access control requirements. Three common organization patterns are: shared spaces where instances and records coexist, independent spaces that keep records separate from instances, and multiple shared spaces where records can belong to multiple spaces simultaneously.
Currently, you can delete a space that contains records. In an upcoming release, deleting a space with records will be prevented: you’ll need to delete all records from the space before you can delete the space itself, making this consistent with data modeling behavior. If you delete a space and recreate it with the same external ID, the records will still be associated with the original space ID and remain accessible in queries.
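Because a space must exist before you ingest records into it, a minimal sketch of creating one through the data modeling spaces endpoint might look like this (identifiers are illustrative):

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}

# Create a space through the data modeling spaces endpoint before ingesting
# records into it. Identifiers here are illustrative.
response = requests.post(
    f"{BASE_URL}/models/spaces",
    headers=HEADERS,
    json={
        "items": [
            {
                "space": "maintenance-space",
                "name": "Maintenance records",
                "description": "Dedicated space for maintenance log records",
            }
        ]
    },
)
response.raise_for_status()
```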

Containers

Records use data modeling containers to define their schema. You can only ingest records into containers with usedFor set to record. Containers designated for records (usedFor: record) support significantly more properties than standard containers used for nodes and edges. See Limits and restrictions for specific property limits.
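As an illustration, the container for the maintenance-log example used later in this article could be defined roughly like this. The payload follows the standard data modeling container format; the identifiers and property set are hypothetical:

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}

# A container with usedFor set to "record". The property type syntax follows
# the standard data modeling container format; identifiers are illustrative.
container = {
    "space": "maintenance-space",
    "externalId": "MaintenanceLog",
    "usedFor": "record",                               # required for record ingestion
    "properties": {
        "asset": {"type": {"type": "direct"}},         # direct relation to a node in the graph
        "timestamp": {"type": {"type": "timestamp"}},
        "severity": {"type": {"type": "text"}},
        "message": {"type": {"type": "text"}},
    },
}

response = requests.post(
    f"{BASE_URL}/models/containers",
    headers=HEADERS,
    json={"items": [container]},
)
response.raise_for_status()
```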
Records currently don’t support the enum property type. Support for enum properties in records is planned for a future release. If you try to ingest records into a container that includes an enum property:
  • The API returns a 4xx error if the ingested data includes a value for that enum property.
  • The ingestion succeeds if the ingested data omits the enum property.
Deleting a container makes all records using that container permanently inaccessible, even if you recreate a container with the same external ID and schema. Plan your container schema carefully before ingesting records, as you cannot recover access to records after deleting their container.

Linking records to nodes in the knowledge graph

You can link records to nodes in the knowledge graph by including a property in your record that stores a reference to the related node identifier. This lets you retrieve records linked to a specific node using filters. When designing your container schema, include properties that you’ll use for filtering and querying records. For example, if you need to retrieve all records associated with a specific asset, include a direct relation to the asset in your container. This allows the Records service to filter records efficiently without scanning the entire stream. A container for maintenance logs, say, might include properties like asset, timestamp, and severity to enable efficient filtering by asset, time range, or severity level.
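For instance, retrieving the records whose asset relation points at a particular pump might look like the following sketch. Both the filter endpoint and the filter syntax are assumptions modeled on the data modeling filter language; check the Records API reference for the real query format.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}

# Retrieve records whose "asset" direct relation points at one node. The
# endpoint path and filter syntax below are assumptions.
query = {
    "filter": {
        "equals": {
            "property": ["maintenance-space", "MaintenanceLog", "asset"],
            "value": {"space": "asset-space", "externalId": "pump-42"},  # hypothetical node
        }
    },
    "limit": 100,
}

response = requests.post(
    f"{BASE_URL}/streams/maintenance-logs/records/filter",  # assumed endpoint
    headers=HEADERS,
    json=query,
)
response.raise_for_status()
records = response.json().get("items", [])
```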

Capabilities

Records and streams have their own capabilities for access control. These capabilities are independent of each other and are not inherited from the data modeling service. However, because records rely on the data modeling container feature, you must have the dataModels:READ capability to read or write records in a stream.

Data ingestion

The Records service operates with near real-time consistency. When you ingest or update records, there is typically a brief delay, up to a few seconds, between when the API returns a successful response and when the changes become visible in search results, filters, and aggregations. This delay occurs because the service periodically makes newly ingested data searchable, balancing performance for high-volume data ingestion with quick data availability. In most cases, new or updated records become searchable within 1-2 seconds of ingestion. Keep this near real-time consistency in mind when designing your application:
  • Write-then-read scenarios: If you ingest a record and immediately query for it, the record may not appear in the results yet. Consider implementing a brief retry mechanism or delay if your workflow depends on immediate read-after-write consistency (see the sketch after this list).
  • Immediate updates: For use cases with low data volumes requiring immediate visibility of every update, consider using data modeling instances instead of records.
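A minimal sketch of such a retry loop, polling until the record becomes searchable (the endpoint and filter property paths are placeholders for illustration):

```python
import time

import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"  # hypothetical project
HEADERS = {"Authorization": "Bearer <token>"}

def wait_until_searchable(stream_id, space, external_id, timeout=10.0, interval=0.5):
    """Poll until a newly written record shows up in query results.

    The filter endpoint and property paths are placeholders, not the
    documented API surface.
    """
    query = {
        "filter": {
            "and": [
                {"equals": {"property": ["record", "space"], "value": space}},
                {"equals": {"property": ["record", "externalId"], "value": external_id}},
            ]
        },
        "limit": 1,
    }
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        response = requests.post(
            f"{BASE_URL}/streams/{stream_id}/records/filter",  # assumed endpoint
            headers=HEADERS,
            json=query,
        )
        response.raise_for_status()
        if response.json().get("items"):
            return True  # the record is now searchable
        time.sleep(interval)  # not visible yet; wait briefly and retry
    return False
```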

Getting started

To begin using records and streams effectively, start by identifying your high-volume data sources and the target structure you need. Then explore: