When you’re working with high-volume industrial data, storing each entry as a node in your knowledge graph becomes impractical. The graph becomes cluttered, query performance degrades, and you quickly hit instance budget limits. Records solve this problem by providing high-performance storage for bulk structured data that doesn’t need to be part of the graph structure.

Records are structured data objects stored in streams, separate from the industrial knowledge graph. Like data modeling instances, records use containers to define their schema and spaces for access control, but they don’t create nodes or edges in the graph. This design enables you to store billions of records without impacting graph performance or consuming instance budgets.

After reading this article, you’ll understand when to use records versus data modeling instances, how streams organize and manage your data, and how records integrate with your existing data modeling infrastructure.
When to use records
Records are designed primarily for immutable, high-volume data that doesn’t require complex graph relationships. The service is optimized for write-once, read-many scenarios where data volume and analytics are critical. Common use cases include:

- High-volume immutable data: Logs, events, and notifications (OPC UA events, PI EventFrames, well logs, manufacturing batch logs)
- Archived and historical data: Completed work orders, resolved alarms, concluded activities
- Data with a defined lifecycle: Active work orders or alarms that need updates during their lifecycle before being archived to immutable storage
Core concepts
To work effectively with records, you need to understand these key concepts:

- Streams define the lifecycle and performance characteristics of your data.
- Records are the individual data objects you store.
- Spaces provide access control and organization.
- Containers define the schema structure.
Streams
Streams are logical containers that organize your records and define how they behave throughout their lifecycle. When you create a stream, you choose a template that sets policies for the following (a creation sketch follows the list):

- Retention periods: How long records are stored before automatic deletion
- Mutability: Whether records can be updated after ingestion
- Performance characteristics: Ingestion and query throughput limits
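As a minimal illustration, creating a stream is a single API call that names the template to apply. In this sketch the `/streams` path, the `templateId` field name, and the placeholder URL, project, and token are all assumptions; check the Streams API documentation for the exact request shape.

```python
import requests

# Placeholders: substitute your cluster, project, and OAuth token.
BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

# Assumed payload shape: an external ID plus the template the stream is created from.
body = {
    "items": [
        {
            "externalId": "manufacturing-batch-logs",
            "templateId": "BasicArchive",  # hypothetical field name for the template
        }
    ]
}

resp = requests.post(f"{BASE_URL}/streams", headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())
```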
Stream templates
Stream templates are currently in beta, which means the available templates may change as we gather feedback and continue development. New templates may be added, and existing templates may be modified or removed if necessary. Any changes will not affect existing streams you’ve already created from these templates.
- The choice between mutable and immutable records has significant scale implications. Different stream templates support different maximum record counts and storage capacities.
- You cannot change a stream’s template after creation, so choose your template carefully for production use.
- Review the limits and specifications for each template in the Streams API documentation before creating your streams, and select based on your expected data volume and mutability requirements.
ImmutableTestStream
Use this template exclusively for experimentation. It’s configured for high throughput and total data volume but has short data retention. The short retention and brief soft-delete period mean you can quickly discard such streams when you no longer need them, or recreate them to remove experimental data.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 800,000 items
- Max ingestion throughput per 10 minutes: 1.5GB
- Max reading throughput per 10 minutes: 1.5GB
- Maximum total number of records: 50M (50,000,000)
- Maximum total data volume: 50GB
- Maximum range filter interval for the `lastUpdatedTime` property: 7 days
- Data retention: 7 days
- Stream stays in soft-deleted state before being hard-deleted: 1 day
- Maximum number of active streams per project: 3
BasicArchive
This template is intended for perpetual data storage. However, overall data volume is limited, so plan usage accordingly.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 170,000 items
- Max ingestion throughput per 10 minutes: 170MB
- Max reading throughput per 10 minutes: 1.7GB
- Maximum total number of records: 50M (50,000,000)
- Maximum total data volume: 50GB
- Maximum range filter interval for the `lastUpdatedTime` property: 365 days
- Data retention: Unlimited (data never gets deleted)
- Stream stays in soft-deleted state before being hard-deleted: 6 weeks
- Maximum number of active streams per project: 2
BasicLiveData
This template is intended for production usage and offers significant data volume and throughput.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 170,000 items
- Max number of records updated or deleted per 10 minutes: 85,000 items
- Max ingestion throughput per 10 minutes: 170MB
- Max reading throughput per 10 minutes: 500MB
- Maximum total number of records: 5M (5,000,000)
- Maximum total data volume: 15GB
- Stream stays in soft-deleted state before being hard-deleted: 6 weeks
- Maximum number of active streams per project: 2
For enterprise CDF subscriptions, additional high-scale stream templates may be available with increased capacity:
- Streams: Up to 30 active streams per project
- Records: Up to 5 billion (immutable) or 100 million (mutable) records per stream
- Storage: Up to 5 TB (immutable) or 300 GB (mutable) per stream
- Max write: Up to 170 MB per 10 minutes per stream
- Max read: Up to 1.7 GB per 10 minutes per stream
Stream naming rules and limits
Stream `externalId` values must start with a lowercase letter, contain only lowercase letters, digits, hyphens, and underscores, and be at most 100 characters long. The value must match this pattern: `^[a-z]([a-z0-9_-]{0,98}[a-z0-9])?$`.
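To catch invalid names before calling the API, you can check candidates against the documented pattern locally, for example:

```python
import re

# Pattern from the naming rules above.
STREAM_ID_PATTERN = re.compile(r"^[a-z]([a-z0-9_-]{0,98}[a-z0-9])?$")

for candidate in ("well-logs-2024", "WellLogs", "logs_"):
    ok = bool(STREAM_ID_PATTERN.match(candidate))
    print(f"{candidate!r}: {'valid' if ok else 'invalid'}")
```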
The number of active streams per project is limited. For current limits, see Limits and restrictions.
Each stream template defines specific limits for records, storage, and throughput. If you exceed these limits, the Records API returns HTTP 429 Too Many Requests responses. For detailed specifications, limits, and throughput rates for each template, see the Streams API documentation. For a summary of Records resource limits and API operation limits, see Limits and restrictions.
If you’re building applications or services that use Records, implement the recommended approaches for managing concurrency and rate limits to avoid hitting these limits.
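One common approach is exponential backoff on throttled requests. This is a minimal sketch using `requests`; the retry policy (attempt count, delays) is illustrative, not a recommendation from the API documentation.

```python
import time
import requests

def post_with_backoff(url, headers, body, max_retries=5):
    """POST with exponential backoff on HTTP 429 responses."""
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=body)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Throttled: wait, then retry with a doubled delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```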
Query time range limits
Immutable streams require a `lastUpdatedTime` range on every filter and aggregate query. The `maxFilteringInterval` setting on each stream template defines the maximum span between the `gt` (start) and `lt` (end) timestamps in a single request.
For example, the BasicArchive template has a `maxFilteringInterval` of 365 days. This means each request can cover at most a 365-day window, but this window can be anywhere in the stream’s history, not just relative to the current date. If the difference between `gt` and `lt` exceeds the interval, the API returns a validation error.
Since BasicArchive has unlimited data retention, all historical data remains accessible. To query data spanning more than 365 days, split your requests into adjacent time windows that each stay within the limit. The following table shows an example for a multi-year query; the sketch after it shows one way to generate such windows.
| Request | gte | lt |
|---|---|---|
| 1 | 2022-01-01T00:00:00Z | 2023-01-01T00:00:00Z |
| 2 | 2023-01-01T00:00:00Z | 2024-01-01T00:00:00Z |
| 3 | 2024-01-01T00:00:00Z | 2025-01-01T00:00:00Z |
| 4 | 2025-01-01T00:00:00Z | (now) |
Where a `lastUpdatedTime` range is not required, including one still improves query performance.
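The windowing logic can be generated programmatically. This sketch only computes the `(gte, lt)` pairs; the commented filter body shows an assumed shape for the `lastUpdatedTime` bounds.

```python
from datetime import datetime, timedelta, timezone

MAX_INTERVAL = timedelta(days=365)  # BasicArchive maxFilteringInterval

def time_windows(start: datetime, end: datetime, width: timedelta = MAX_INTERVAL):
    """Yield adjacent (gte, lt) pairs covering [start, end) without gaps."""
    lower = start
    while lower < end:
        upper = min(lower + width, end)
        yield lower, upper
        lower = upper

start = datetime(2022, 1, 1, tzinfo=timezone.utc)
for gte, lt in time_windows(start, datetime.now(timezone.utc)):
    print(gte.isoformat(), "->", lt.isoformat())
    # Each window becomes one request, e.g. a filter body like (assumed shape):
    # {"lastUpdatedTime": {"gte": gte.isoformat(), "lt": lt.isoformat()}, ...}
```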
Pagination and data retrieval
The Records API provides three endpoints for consuming data, each with different pagination behavior:

- `filter`: Returns up to 1,000 records in a single response. This endpoint does not support cursor-based pagination. It’s designed for interactive queries where you need custom sorting and expect a bounded result set. If you need to retrieve more than 1,000 matching records, use the `sync` endpoint instead.
- `sync`: The only endpoint that supports cursor-based pagination. It returns up to 1,000 records per page, and you iterate through results by passing the cursor from the previous response. Use this endpoint for batch processing, data exports, or any workflow that needs to process large volumes of records (see the sketch after this list). The `sync` endpoint provides the same filtering capabilities as `filter`, but does not support custom sorting.
- `aggregate`: Returns all results in a single response. Cursor-based pagination is not supported because aggregations compute over the entire matching dataset. The result size is inherently bounded by the aggregation structure, not by the number of individual records.
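A typical `sync` consumer loops until the cursor is exhausted. In this sketch the `/records/sync` path and the `cursor`, `items`, and `nextCursor` field names are assumptions modeled on common CDF API conventions; verify them against the Records API reference.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

def sync_all(stream_id, request_body):
    """Iterate all matching records via cursor-based pagination (assumed field names)."""
    url = f"{BASE_URL}/streams/{stream_id}/records/sync"
    cursor = None
    while True:
        body = dict(request_body)
        if cursor:
            body["cursor"] = cursor
        resp = requests.post(url, headers=HEADERS, json=body)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor or not page.get("items"):
            break
```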
Deleting streams
Streams are resource-intensive, long-lived entities designed to persist for the lifetime of your project. Plan your stream strategy carefully and avoid patterns that involve repeatedly creating and deleting streams.

When you delete a stream, it enters a soft-deleted state to protect against accidental data loss. Streams have no backup mechanism, so soft-delete is the only way to recover from accidental deletion. During this period:

- The stream and its data are preserved but inaccessible (no ingestion or queries).
- The stream doesn’t count toward the active stream limit.
- The stream’s `externalId` is reserved and cannot be reused for a new stream until the soft-delete period expires.
- You can recover the stream by contacting Cognite Support.

After the soft-delete period expires, the stream is permanently deleted and its `externalId` becomes available for reuse.
A single project can have a limited number of soft-deleted streams at any given time. To avoid hitting this limit, avoid creating and deleting streams frequently.
We expect streams to be long-lived. The exception is streams created with one of the test templates. Deleting a stream can take a long time, depending on the stream settings and the volume of data stored.

Data retention
Some stream templates define a `dataDeletedAfter` retention period that controls how long records are kept before they are automatically removed.
You can check a stream’s retention setting by retrieving the stream at `/api/v1/projects/{project}/streams/{streamId}` and inspecting `settings.lifecycle.dataDeletedAfter`. The value is an ISO 8601 duration (for example, `P7D` for seven days). If this field is absent, the stream has unlimited retention.
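For example, a quick check of a stream’s retention setting. Only the endpoint path and the `settings.lifecycle.dataDeletedAfter` field come from this page; the URL, token, and stream ID are placeholders.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

resp = requests.get(f"{BASE_URL}/streams/my-stream", headers=HEADERS)
resp.raise_for_status()
stream = resp.json()

retention = stream.get("settings", {}).get("lifecycle", {}).get("dataDeletedAfter")
if retention is None:
    print("Unlimited retention: records are never auto-deleted")
else:
    print(f"Records are deleted after {retention}")  # e.g. 'P7D' = seven days
```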
Records
Records are individual data objects that represent events, logs, or historical entries. Whether a record is immutable or mutable depends on the stream template you choose when creating the stream.

An industrial knowledge graph describes relationships between entities using nodes and edges. Nodes can represent physical entities, such as equipment, or logical concepts, such as activities and process stages. However, when handling bulk data such as logs or historical records, storing each individual record as a node increases relational complexity and degrades query and retrieval performance. This is an anti-pattern you should avoid. Use the Records service instead: records, together with streams, let you store high-volume structured data in bulk, improving both the performance and scalability of your CDF-based solutions.

Immutability is a key design feature for records: it guarantees historical records cannot be altered, while also delivering cost-effective support for massive storage volumes. Although records support mutability through mutable stream templates, updating records comes at a significant processing and ingestion cost compared to data modeling instances. Use mutable streams as a transitional stage for data that needs updates during its lifecycle, then archive finalized records to an immutable stream for permanent storage.

Updating records in mutable streams
Records do not support partial updates. When you upsert a record in a mutable stream, you must provide the complete state of the record, including all properties for the container. The upsert operation replaces the entire record; any properties you omit are not preserved from the previous version. This means that all non-nullable properties must be included in every upsert request, even if you only want to change one field. Omitting a non-nullable property causes a validation error. The recommended workflow for updating a record is:

1. Read the existing record using the `filter` or `sync` endpoint.
2. Merge your changes into the full property set.
3. Upsert the complete record back to the stream.
This differs from data modeling instances, which support partial updates where you only need to send the properties you want to change. For records, every upsert is a full replacement.
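A sketch of that read-merge-upsert cycle, using hypothetical `fetch_record` and `upsert_records` helpers and a simplified flat `properties` map:

```python
def update_record(stream_id, space, external_id, changes):
    """Full-replacement update: read, merge, then upsert the complete record."""
    # 1. Read the current record (fetch_record is a hypothetical helper
    #    wrapping the filter endpoint).
    record = fetch_record(stream_id, space, external_id)

    # 2. Merge changes into the full property set so no non-nullable
    #    property is dropped by the replacement.
    properties = dict(record["properties"])
    properties.update(changes)

    # 3. Upsert the complete record back (upsert_records is a hypothetical
    #    helper wrapping POST /streams/{streamId}/records).
    upsert_records(stream_id, [{
        "space": space,
        "externalId": external_id,
        "properties": properties,
    }])
```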
Records vs. nodes
| | Data modeling nodes | Records |
|---|---|---|
| Storage | Entities in the industrial knowledge graph | Data stored in streams as large batches |
| Identification | Instances must have a unique external ID per node in the space. | See Identifiers for records |
| Mutability | Mutable by default, immutable by configuration | Depends on the stream template applied when creating the stream |
| Data volumes | Millions of nodes with low growth (once the initial graph is defined) | Billions of records per year continuously |
| Structure | Structured using containers, option to use JSON for semi-structured data | Structured using containers, option to use JSON for semi-structured data |
| Relationships | Many-to-many in a mesh, defined by edges and direct relations between instances | Connected to data modeling instances via direct relations in the records |
Identifiers for records
In data modeling, you identify nodes using a combination of the space ID and the mandatory node external ID. The external ID must be unique within the space it’s scoped to, but you can reuse the same external ID across different spaces. Records also use external IDs. Like data modeling nodes, a record’s external ID belongs to a space and is stored in a stream that can include records from multiple spaces. For records, the stream type determines the uniqueness constraints:

- Mutable streams: the service enforces uniqueness for each combination of external ID, space ID, and stream. When you update a record with the same external ID/space/stream combination, it updates the existing record rather than creating a new one.
- Immutable streams: the service does not enforce uniqueness. An immutable stream can contain multiple records with the same stream/space/external ID combination. This is useful for storing the full history of a record over time. You can use filtering capabilities to retrieve these records in bulk.
In a single write request to a mutable stream, all combinations of `space` + `externalId` must be unique. You cannot create and update a record with the same space and external ID combination in the same POST request to `/streams/{streamId}/records`.

Spaces
Records use data modeling spaces for access control and organization. You must define a space before you can ingest records into it. Records can share spaces with data modeling instances, be stored in dedicated spaces, or use multiple shared spaces depending on your access control requirements. Three space organization patterns are common: shared spaces where instances and records coexist, independent spaces for separate organization, and multiple shared spaces where records can belong to multiple spaces simultaneously.

Containers
Records use data modeling containers to define their schema. You can only ingest records into containers with `usedFor` set to `record`.
Containers designated for records (`usedFor: record`) support significantly more properties than standard containers used for nodes and edges. See Limits and restrictions for specific property limits.
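As a sketch, a record container definition might look like the following. The property-type syntax follows the data modeling containers API; the space, container name, and properties are hypothetical.

```python
container = {
    "space": "well-logs",
    "externalId": "WellLogEntry",
    "usedFor": "record",  # required for containers that store records
    "properties": {
        "loggedAt": {"type": {"type": "timestamp"}, "nullable": False},
        "message": {"type": {"type": "text"}, "nullable": True},
        # Direct relation linking each log entry to a node in the graph.
        "well": {"type": {"type": "direct"}, "nullable": True},
    },
}
# Create it via the data modeling containers endpoint, e.g.:
# requests.post(f"{BASE_URL}/models/containers", headers=HEADERS,
#               json={"items": [container]})
```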
Not all container capabilities apply to records. When designing containers for records, review the following tables to understand which features are supported and which are not supported.
Constraints and indexes
Constraints and indexes are not available for record containers.

| Feature | Data modeling instances | Records |
|---|---|---|
| Constraints (uniqueness, requires) | Supported | Not supported |
| Indexes (BTree, inverted) | Supported | Not supported |
Property settings and types
Some individual property settings are not supported for record containers.

| Feature | Data modeling instances | Records |
|---|---|---|
| `immutable` | Enforced. Updates to the property are rejected after initial write. | Not supported. Immutability is a stream-level concept only. |
| `autoIncrement` | Supported. The system auto-generates incrementing values. | Not supported. Omitting a non-nullable auto-increment property causes ingestion failure. |
| `constraintState` | Supported. Tracks validity of constraints on a property. | Not applicable. Records don’t support constraints. |
| `enum` type | Supported | Supported |
| `timeseries`, `file`, `sequence` types | Supported | Not supported |
| `collation` (text sorting rules) | Supported. Controls sort order using ICU collation rules. | Not supported |
Direct relations
Direct relation properties are supported in record containers, but with reduced validation and no auto-creation behavior.

| Feature | Data modeling instances | Records |
|---|---|---|
| Target validation | Enforced. Validates both target space and instance existence. | Partially enforced. Validates target space existence only. |
| Auto-creation | Supported. Auto-creates target nodes when `autoCreateDirectRelations` is set. | Not supported |
| `container` constraint | Enforced. Validates the target matches the specified container type. | Not supported |
Linking records to the knowledge graph
You can link records to the knowledge graph by defining direct relation properties in your record container schema. These properties enable you to contextualize records by storing references to data modeling instances using their space and external ID. For instance, you can link sensor logs to specific wells or alarm records to particular assets. To link sensor logs to a `Well` in your data model:

1. Define a direct relation property in your record container schema, such as `well` of type direct relation.
2. Assign the relationship when ingesting records by providing the space and external ID of the target instance.
3. Query and filter efficiently using this property to retrieve all records associated with a specific instance.
Validation behavior for direct relations

When you ingest records with direct relation properties, the Records service validates that the target space exists but does not verify that the referenced instance exists. This differs from standard data modeling behavior, where both space and instances are validated. You can ingest a record with a direct relation pointing to a non-existent instance, as long as the target space is valid. Design your ingestion pipelines to ensure referenced instances exist before creating records that link to them.
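A hedged example of ingesting a record that links to a `Well` node. The `/streams/{streamId}/records` path appears earlier on this page; the record payload shape (a `sources` list grouping properties by container) is an assumption modeled on data modeling instance writes, and all names are placeholders.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

record = {
    "space": "well-logs",
    "externalId": "log-2024-06-01-0001",
    "sources": [{  # assumed shape: properties grouped by source container
        "source": {"type": "container", "space": "well-logs", "externalId": "WellLogEntry"},
        "properties": {
            "loggedAt": "2024-06-01T12:00:00Z",
            "message": "Pressure spike detected",
            # Direct relation: only the target space is validated at ingestion.
            "well": {"space": "assets", "externalId": "well-31-a"},
        },
    }],
}

resp = requests.post(f"{BASE_URL}/streams/well-log-stream/records",
                     headers=HEADERS, json={"items": [record]})
resp.raise_for_status()
```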
Capabilities
Records and streams have their own capabilities for access control. These capabilities are independent of each other and are not inherited from the data modeling service. However, because records rely on the data modeling container feature, you must have the `dataModels:READ` capability to read or write records in a stream.
Data ingestion
The Records service operates with near real-time consistency. When you ingest or update records, there is typically a brief delay, up to a few seconds, between when the API returns a successful response and when the changes become visible in search results, filters, and aggregations. This delay occurs because the service periodically makes newly ingested data searchable, balancing performance for high-volume data ingestion with quick data availability. In most cases, new or updated records become searchable within 1-2 seconds of ingestion. Keep this near real-time consistency in mind when designing your application:

- Write-then-read scenarios: If you ingest a record and immediately query for it, the record may not appear in the results yet. Consider implementing a brief retry mechanism or delay if your workflow depends on immediate read-after-write consistency (see the sketch after this list).
- Immediate updates: For use cases with low data volumes requiring immediate visibility of every update, consider using data modeling instances instead of records.
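Where read-after-write matters, a short polling loop is usually enough to bridge the indexing delay. A sketch, with `query_record` as a hypothetical helper wrapping the `filter` endpoint:

```python
import time

def wait_until_visible(stream_id, space, external_id,
                       attempts=5, delay_seconds=0.5):
    """Poll until a freshly written record shows up in query results."""
    for _ in range(attempts):
        # query_record is a hypothetical helper wrapping the filter endpoint.
        if query_record(stream_id, space, external_id):
            return True
        time.sleep(delay_seconds)  # indexing typically completes within 1-2 s
    return False
```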
Getting started
To begin using records and streams effectively, start by identifying your high-volume data sources and the target structure you need. Then explore:

- Get started with Records — Complete tutorial that walks you through creating schemas, setting up streams, ingesting records, querying, and building stream-to-stream pipelines.
- Aggregate records reference — Use aggregations to compute statistics and analyze trends across records without retrieving individual items.