When to use records
Records are ideal for high-volume, structured data that doesn’t require complex graph relationships. Common use cases include:
- High-volume immutable data: Logs, events, and notifications (OPC UA events, PI EventFrames, well logs, manufacturing batch logs)
- Archived and historical data: Completed work orders, resolved alarms, concluded activities
- Infrequently updated operational data: Active work orders or alarms with low update frequency over their lifecycle
Core concepts
To work effectively with records, you need to understand these key concepts:
- Streams define the lifecycle and performance characteristics of your data.
- Records are the individual data objects you store.
- Spaces provide access control and organization.
- Containers define the schema structure.
Streams
Streams are logical containers that organize your records and define how they behave throughout their lifecycle. When you create a stream, you choose a template that sets policies for:
- Retention periods: How long records are stored before automatic deletion.
- Mutability: Whether records can be updated after ingestion.
- Performance characteristics: Ingestion and query throughput limits.
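As a concrete starting point, here’s a minimal sketch of creating a stream over the REST API with Python’s requests library. The endpoint path and request body shape are assumptions based on general CDF API conventions, so verify them against the Streams API documentation; the template name comes from the table in the next section.

```python
import os
import requests

# Assumed endpoint and body shape, following general CDF API conventions.
# Verify the exact format in the Streams API documentation.
PROJECT = os.environ["CDF_PROJECT"]
CLUSTER = os.environ.get("CDF_CLUSTER", "api")
BASE_URL = f"https://{CLUSTER}.cognitedata.com/api/v1/projects/{PROJECT}"

HEADERS = {
    "Authorization": f"Bearer {os.environ['CDF_TOKEN']}",
    "Content-Type": "application/json",
}

# Create a stream from the BasicArchive template (immutable, long-term storage).
# Remember: a stream's template cannot be changed after creation.
body = {"items": [{"externalId": "maintenance-logs", "template": "BasicArchive"}]}

resp = requests.post(f"{BASE_URL}/streams", headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())
```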
Stream templates
Stream templates are in beta
Stream templates are currently in beta, which means the available templates may change as we gather feedback and continue development. New templates may be added, and existing templates may be modified or removed if necessary. Any changes will not affect existing streams you’ve already created from these templates.
- The choice between mutable and immutable records has significant scale implications. Different stream templates support different maximum record counts and storage capacities.
- You cannot change a stream’s template after creation, so choose your template carefully for production use.
- Review the limits and specifications for each template in the Streams API documentation before creating your streams, and select based on your expected data volume and mutability requirements.
| Template | Mutability | Purpose |
|---|---|---|
| ImmutableTestStream | Immutable | Sandbox testing of immutable record data |
| BasicArchive | Immutable | Long-term archival storage with unlimited retention |
| BasicLiveData | Mutable | Infrequently updated “live” data |
For enterprise CDF subscriptions, additional high-scale stream templates are available with increased capacity:
- Streams: Up to 30 active streams per project
- Records: Up to 5 billion (immutable) or 100 million (mutable) records per stream
- Storage: Up to 5 TB (immutable) or 300 GB (mutable) per stream
- Max write: Up to 170 MB per 10 minutes per stream
- Max read: Up to 1.7 GB per 10 minutes per stream
If you exceed these limits, the API returns 429 Too Many Requests responses. For detailed specifications, limits, and throughput rates for each template, see the Streams API documentation. For a summary of Records resource limits and API operation limits, see Limits and restrictions.
If you’re building applications or services that use Records, implement the recommended approaches for managing concurrency and rate limits to avoid hitting these limits.
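One common pattern for respecting these limits is to retry on 429 responses with exponential backoff, honoring the Retry-After header when the server provides one. The sketch below is generic; the backoff pattern, not the specific endpoint, is the point.

```python
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """POST with exponential backoff on 429 Too Many Requests."""
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Prefer the server's Retry-After hint when present.
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2  # exponential backoff between attempts
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```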
Deleting streams
When you delete a stream, it enters a soft-deleted state to protect against accidental data loss. During this period:
- The stream and its data are preserved but inaccessible (no ingestion or queries).
- The stream doesn’t count toward the active stream limit.
- You cannot create a new stream with the same identifier.
- You can recover the stream by contacting Cognite Support.
Records
Records are individual data objects that represent events, logs, or historical entries. Whether a record is immutable or mutable depends on the stream template you choose when creating the stream.

An industrial knowledge graph describes relationships between entities using nodes and edges. Nodes can represent physical entities like equipment or logical concepts like activities and process stages. However, when you’re handling bulk data such as logs or historical records, storing each individual record as a node significantly degrades query and data retrieval performance because of increased relational complexity. This is an anti-pattern you should avoid.

Use the Records service to avoid these performance penalties for high-volume data. Records, together with streams, let you store high-volume structured data in bulk, improving both the performance and scalability of your CDF-based solutions. Although records support mutability, updating records comes at a significant processing and ingestion cost compared to data modeling instances. Immutability is a key design feature for records that guarantees historical records cannot be altered, while also delivering cost-effective support for massive storage volumes.

Records vs. nodes
| | Data modeling nodes | Records |
|---|---|---|
| Storage | Entities in the industrial knowledge graph | Data stored in streams as large batches |
| Identification | Instances must have a unique external ID per node in the space. | See Identifiers for records |
| Mutability | Mutable by default, immutable by configuration | Depends on the stream template applied when creating the stream |
| Data volumes | Millions of nodes with low growth (once the initial graph is defined) | Billions of records per year continuously |
| Structure | Structured using containers, option to use JSON for semi-structured data | Structured using containers, option to use JSON for semi-structured data |
| Relationships | Many-to-many in a mesh, defined by edges and direct relations between instances | Connected to Data Modeling instances via direct relations in the records |
Identifiers for records
In data modeling, you identify nodes using a combination of the space ID and the mandatory node external ID. The external ID must be unique within the space it’s scoped to, but you can reuse the same external ID across different spaces. Records also use external IDs. Like data modeling nodes, a record’s external ID belongs to a space and is stored in a stream that can include records from multiple spaces. For records, the stream type determines the uniqueness constraints:
- Mutable streams: the service enforces uniqueness for each combination of external ID, space ID, and stream. When you update a record with the same external ID/space/stream combination, it updates the existing record rather than creating a new one.
- Immutable streams: the service does not enforce uniqueness. An immutable stream can contain multiple records with the same stream/space/external ID combination. This is useful for storing the full history of a record over time. You can use filtering capabilities to retrieve these records in bulk.
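For example, retrieving the full history of one logical record from an immutable stream amounts to filtering on the space/externalId pair. This sketch assumes a filter endpoint and body modeled on CDF’s advanced filter syntax; check the Records API documentation for the exact format.

```python
import requests

# BASE_URL and HEADERS as set up earlier. The filter body is an assumption
# modeled on CDF's advanced filter syntax; verify it against the Records API.
body = {
    "filter": {"and": [
        {"equals": {"property": ["space"], "value": "maintenance-records"}},
        {"equals": {"property": ["externalId"], "value": "log-2024-0001"}},
    ]},
    "limit": 1000,
}

resp = requests.post(f"{BASE_URL}/streams/maintenance-logs/records/filter",
                     headers=HEADERS, json=body)
resp.raise_for_status()
# On an immutable stream this can return multiple items: the full history
# of log-2024-0001 over time.
for record in resp.json().get("items", []):
    print(record)
```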
In a single write request to a mutable stream, all combinations of space + externalId must be unique. You cannot create and update a record with the same space and external ID combination in the same POST request to /streams/{streamId}/records.
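To make that concrete, here’s a hedged sketch of a write request against the /streams/{streamId}/records endpoint named above. The item shape (space, externalId, sources referencing a container) is an assumption modeled on the data modeling ingestion format; the key point is that each space + externalId combination appears only once in the batch.

```python
import requests

# BASE_URL and HEADERS as set up earlier. The item structure is an assumption
# modeled on the data modeling ingestion format; verify it against the
# Records API documentation.
items = [
    {
        "space": "maintenance-records",
        "externalId": "log-2024-0001",  # each space+externalId is unique here
        "sources": [{
            "source": {"type": "container", "space": "schemas",
                       "externalId": "MaintenanceLog"},
            "properties": {"severity": "high",
                           "timestamp": "2024-05-01T08:30:00Z"},
        }],
    },
    {
        "space": "maintenance-records",
        "externalId": "log-2024-0002",  # a second, distinct record
        "sources": [{
            "source": {"type": "container", "space": "schemas",
                       "externalId": "MaintenanceLog"},
            "properties": {"severity": "low",
                           "timestamp": "2024-05-01T09:00:00Z"},
        }],
    },
]

resp = requests.post(f"{BASE_URL}/streams/maintenance-logs/records",
                     headers=HEADERS, json={"items": items})
resp.raise_for_status()
# On a mutable stream, re-sending log-2024-0001 in a *later* request updates
# the existing record; on an immutable stream it appends another record.
```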
Spaces
Records use data modeling spaces for access control and organization. You must define a space before you can ingest records into it. Records can share spaces with data modeling instances, be stored in dedicated spaces, or use multiple shared spaces depending on your access control requirements. The following diagram illustrates three common space organization patterns: shared spaces where instances and records coexist, independent spaces for separate organization, and multiple shared spaces where records can belong to multiple spaces simultaneously.
Containers
Records use data modeling containers to define their schema. You can only ingest records into containers with usedFor set to record.
Containers designated for records (usedFor: record) support significantly more properties than standard containers used for nodes and edges. See Limits and restrictions for specific property limits.
Records currently don’t support the enum property type. Support for enum properties in records is planned for a future release. If you try to ingest records into a container that includes an enum property:
- The API returns a 4xx error if the ingested data includes a value for that enum property.
- The ingestion succeeds if the ingested data omits the enum property.
Linking records to nodes in the knowledge graph
You can link records to nodes in the knowledge graph by including a property in your record that stores a reference to the related node identifier. This lets you retrieve records linked to a specific node using filters. When designing your container schema, include properties that you’ll use for filtering and querying records. For example, if you need to retrieve all records associated with a specific asset, include a direct relation to the asset in your container. This allows the Records service to efficiently filter records without scanning the entire stream.
For example, a container for maintenance logs might include properties like asset, timestamp, and severity to enable efficient filtering by asset, time range, or severity level.
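A container along those lines might be declared as sketched below, through the data modeling containers endpoint. The usedFor: record flag and property names follow this page; treat the rest of the payload as an assumption to verify against the API reference.

```python
import requests

# BASE_URL and HEADERS as before. Containers are managed through the data
# modeling API; note usedFor set to "record".
container = {
    "space": "schemas",
    "externalId": "MaintenanceLog",
    "usedFor": "record",
    "properties": {
        # A direct relation to the asset node enables efficient
        # per-asset filtering without scanning the whole stream.
        "asset": {"type": {"type": "direct"}},
        "timestamp": {"type": {"type": "timestamp"}},
        "severity": {"type": {"type": "text"}, "nullable": True},
    },
}

resp = requests.post(f"{BASE_URL}/models/containers",
                     headers=HEADERS, json={"items": [container]})
resp.raise_for_status()
```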
Capabilities
Records and streams have their own capabilities for access control. These capabilities are independent of each other and are not inherited from the data modeling service. However, because records rely on the data modeling container feature, you must have the dataModels:READ capability to read or write records in a stream.
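In a group definition, that combination might look roughly like the sketch below. The dataModelsAcl entry is a standard CDF capability; the records-related ACL name here is a placeholder, since this page doesn’t spell it out, so look up the actual names in the capabilities reference.

```python
# Sketch of group capabilities for working with records. dataModelsAcl is a
# standard CDF capability; "streamRecordsAcl" below is a placeholder name
# (assumption), not a confirmed ACL identifier.
capabilities = [
    {"dataModelsAcl": {"actions": ["READ"], "scope": {"all": {}}}},
    {"streamRecordsAcl": {"actions": ["READ", "WRITE"], "scope": {"all": {}}}},
]
```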
Data ingestion
The Records service operates with near real-time consistency. When you ingest or update records, there is typically a brief delay, up to a few seconds, between when the API returns a successful response and when the changes become visible in search results, filters, and aggregations. This delay occurs because the service periodically makes newly ingested data searchable, balancing performance for high-volume data ingestion with quick data availability. In most cases, new or updated records become searchable within 1-2 seconds of ingestion. Keep this near real-time consistency in mind when designing your application:
- Write-then-read scenarios: If you ingest a record and immediately query for it, the record may not appear in the results yet. Consider implementing a brief retry mechanism or delay if your workflow depends on immediate read-after-write consistency.
- Immediate updates: For use cases with low data volumes requiring immediate visibility of every update, consider using data modeling instances instead of records.
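If a workflow does depend on read-after-write behavior, a short polling loop is usually enough, since records typically become searchable within 1-2 seconds. This sketch reuses the assumed filter endpoint from earlier; the body shape remains unverified against the actual API.

```python
import time
import requests

def wait_until_searchable(base_url, headers, stream_id, space, external_id,
                          timeout_s=10.0, interval_s=0.5):
    """Poll the (assumed) records filter endpoint until a new record appears."""
    body = {"filter": {"and": [
        {"equals": {"property": ["space"], "value": space}},
        {"equals": {"property": ["externalId"], "value": external_id}},
    ]}}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.post(f"{base_url}/streams/{stream_id}/records/filter",
                             headers=headers, json=body)
        resp.raise_for_status()
        if resp.json().get("items"):
            return True
        time.sleep(interval_s)  # new records are typically visible in 1-2 s
    return False
```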
Getting started
To begin using records and streams effectively, start by identifying your high-volume data sources and the target structure you need. Then explore:
- Build a SCADA alarm management pipeline with Records - Complete tutorial that walks you through creating schemas, setting up streams, ingesting records, and querying high-volume OT event data