When you’re working with high-volume industrial data, storing each entry as a node in your knowledge graph becomes impractical. The graph becomes cluttered, query performance degrades, and you quickly hit instance budget limits. Records solve this problem by providing high-performance storage for bulk structured data that doesn’t need to be part of the graph structure.

Records are structured data objects stored in streams, separate from the industrial knowledge graph. Like data modeling instances, records use containers to define their schema and spaces for access control, but they don’t create nodes or edges in the graph. This design enables you to store billions of records without impacting graph performance or consuming instance budgets.

After reading this article, you’ll understand when to use records versus data modeling instances, how streams organize and manage your data, and how records integrate with your existing data modeling infrastructure.
When to use records
Records are designed primarily for immutable, high-volume data that doesn’t require complex graph relationships. The service is optimized for write-once, read-many scenarios where data volume and analytics are critical. Common use cases include:

- High-volume immutable data: Logs, events, and notifications (OPC UA events, PI EventFrames, well logs, manufacturing batch logs)
- Archived and historical data: Completed work orders, resolved alarms, concluded activities
- Data with a defined lifecycle: Active work orders or alarms that need updates during their lifecycle before being archived to immutable storage
Core concepts
To work effectively with records, you need to understand these key concepts:

- Streams define the lifecycle and performance characteristics of your data.
- Records are the individual data objects you store.
- Spaces provide access control and organization.
- Containers define the schema structure.
Streams
Streams are logical containers that organize your records and define how they behave throughout their lifecycle. When you create a stream, you choose a template that sets policies for the following (a creation sketch follows the list):

- Retention periods: How long records are stored before automatic deletion
- Mutability: Whether records can be updated after ingestion
- Performance characteristics: Ingestion and query throughput limits
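As a minimal illustration, creating a stream is a single API call that names the template to apply. In this sketch the `/streams` path, the `templateId` field name, and the placeholder URL, project, and token are all assumptions; check the Streams API documentation for the exact request shape.

```python
import requests

# Placeholders: substitute your cluster, project, and OAuth token.
BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

# Assumed payload shape: an external ID plus the template the stream is created from.
body = {
    "items": [
        {
            "externalId": "manufacturing-batch-logs",
            "templateId": "BasicArchive",  # hypothetical field name for the template
        }
    ]
}

resp = requests.post(f"{BASE_URL}/streams", headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())
```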
Stream templates
Stream templates are currently in beta, which means the available templates may change as we gather feedback and continue development. New templates may be added, and existing templates may be modified or removed if necessary. Any changes will not affect existing streams you’ve already created from these templates.
- The choice between mutable and immutable records has significant scale implications. Different stream templates support different maximum record counts and storage capacities.
- You cannot change a stream’s template after creation, so choose your template carefully for production use.
- Review the limits and specifications for each template in the Streams API documentation before creating your streams, and select based on your expected data volume and mutability requirements.
ImmutableTestStream
Use this template exclusively for experimentation. It’s configured for high throughput and total data volume but has short data retention. The short retention and brief soft-delete period mean you can quickly discard such streams when you no longer need them, or recreate them to remove experimental data.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 800,000 items
- Max ingestion throughput per 10 minutes: 1.5GB
- Max reading throughput per 10 minutes: 1.5GB
- Maximum total number of records: 50M (50,000,000)
- Maximum total data volume: 50GB
- Maximum range filter interval for the `lastUpdatedTime` property: 7 days
- Data retention: 7 days
- Stream stays in soft-deleted state before being hard-deleted: 1 day
- Maximum number of active streams per project: 3
BasicArchive
This template is intended for perpetual data storage. However, overall data volume is limited, so plan usage accordingly.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 170,000 items
- Max ingestion throughput per 10 minutes: 170MB
- Max reading throughput per 10 minutes: 1.7GB
- Maximum total number of records: 50M (50,000,000)
- Maximum total data volume: 50GB
- Maximum range filter interval for the `lastUpdatedTime` property: 365 days
- Data retention: Unlimited (data never gets deleted)
- Stream stays in soft-deleted state before being hard-deleted: 6 weeks
- Maximum number of active streams per project: 2
BasicLiveData
This template is intended for production usage and offers significant data volume and throughput.
- Max number of unique properties with data across all records: 1000
- Max number of records ingested per 10 minutes: 170,000 items
- Max number of records updated or deleted per 10 minutes: 85,000 items
- Max ingestion throughput per 10 minutes: 170MB
- Max reading throughput per 10 minutes: 500MB
- Maximum total number of records: 5M (5,000,000)
- Maximum total data volume: 15GB
- Stream stays in soft-deleted state before being hard-deleted: 6 weeks
- Maximum number of active streams per project: 2
For enterprise CDF subscriptions, additional high-scale stream templates may be available with increased capacity:
- Streams: Up to 30 active streams per project
- Records: Up to 5 billion (immutable) or 100 million (mutable) records per stream
- Storage: Up to 5 TB (immutable) or 300 GB (mutable) per stream
- Max write: Up to 170 MB per 10 minutes per stream
- Max read: Up to 1.7 GB per 10 minutes per stream
Stream naming rules and limits
Stream `externalId` values must start with a lowercase letter, contain only lowercase letters, digits, hyphens, and underscores, and be at most 100 characters long. The value must match this pattern: `^[a-z]([a-z0-9_-]{0,98}[a-z0-9])?$`.
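To catch invalid names before calling the API, you can check candidates against the documented pattern locally, for example:

```python
import re

# Pattern from the naming rules above.
STREAM_ID_PATTERN = re.compile(r"^[a-z]([a-z0-9_-]{0,98}[a-z0-9])?$")

for candidate in ("well-logs-2024", "WellLogs", "logs_"):
    ok = bool(STREAM_ID_PATTERN.match(candidate))
    print(f"{candidate!r}: {'valid' if ok else 'invalid'}")
```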
The number of active streams per project is limited. For current limits, see Limits and restrictions.
Each stream template defines specific limits for records, storage, and throughput. If you exceed these limits, the Records API returns HTTP 429 Too Many Requests responses. For detailed specifications, limits, and throughput rates for each template, see the Streams API documentation. For a summary of Records resource limits and API operation limits, see Limits and restrictions.
If you’re building applications or services that use Records, implement the recommended approaches for managing concurrency and rate limits to avoid hitting these limits.
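One common approach is exponential backoff on throttled requests. This is a minimal sketch using `requests`; the retry policy (attempt count, delays) is illustrative, not a recommendation from the API documentation.

```python
import time
import requests

def post_with_backoff(url, headers, body, max_retries=5):
    """POST with exponential backoff on HTTP 429 responses."""
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=body)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp.json()
        # Throttled: wait, then retry with a doubled delay.
        time.sleep(delay)
        delay *= 2
    raise RuntimeError(f"Still throttled after {max_retries} retries")
```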
Query time range limits
Immutable streams require a `lastUpdatedTime` range on every filter and aggregate query. The `maxFilteringInterval` setting on each stream template defines the maximum span between the `gt` (start) and `lt` (end) timestamps in a single request.
For example, the BasicArchive template has a `maxFilteringInterval` of 365 days. This means each request can cover at most a 365-day window, but this window can be anywhere in the stream’s history, not just relative to the current date. If the difference between `gt` and `lt` exceeds the interval, the API returns a validation error.
Since BasicArchive has unlimited data retention, all historical data remains accessible. To query data spanning more than 365 days, split your requests into adjacent time windows that each stay within the limit. The following table shows an example for a multi-year query; the sketch after it shows one way to generate such windows.
| Request | gte | lt |
|---|---|---|
| 1 | 2022-01-01T00:00:00Z | 2023-01-01T00:00:00Z |
| 2 | 2023-01-01T00:00:00Z | 2024-01-01T00:00:00Z |
| 3 | 2024-01-01T00:00:00Z | 2025-01-01T00:00:00Z |
| 4 | 2025-01-01T00:00:00Z | (now) |
Where a `lastUpdatedTime` range is not required, including one still improves query performance.
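The windowing logic can be generated programmatically. This sketch only computes the `(gte, lt)` pairs; the commented filter body shows an assumed shape for the `lastUpdatedTime` bounds.

```python
from datetime import datetime, timedelta, timezone

MAX_INTERVAL = timedelta(days=365)  # BasicArchive maxFilteringInterval

def time_windows(start: datetime, end: datetime, width: timedelta = MAX_INTERVAL):
    """Yield adjacent (gte, lt) pairs covering [start, end) without gaps."""
    lower = start
    while lower < end:
        upper = min(lower + width, end)
        yield lower, upper
        lower = upper

start = datetime(2022, 1, 1, tzinfo=timezone.utc)
for gte, lt in time_windows(start, datetime.now(timezone.utc)):
    print(gte.isoformat(), "->", lt.isoformat())
    # Each window becomes one request, e.g. a filter body like (assumed shape):
    # {"lastUpdatedTime": {"gte": gte.isoformat(), "lt": lt.isoformat()}, ...}
```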
Pagination and data retrieval
The Records API provides three endpoints for consuming data, each with different pagination behavior:

- `filter`: Returns up to 1,000 records in a single response. This endpoint does not support cursor-based pagination. It’s designed for interactive queries where you need custom sorting and expect a bounded result set. If you need to retrieve more than 1,000 matching records, use the `sync` endpoint instead.
- `sync`: The only endpoint that supports cursor-based pagination. It returns up to 1,000 records per page, and you iterate through results by passing the cursor from the previous response. Use this endpoint for batch processing, data exports, or any workflow that needs to process large volumes of records (see the sketch after this list). The `sync` endpoint provides the same filtering capabilities as `filter`, but does not support custom sorting.
- `aggregate`: Returns all results in a single response. Cursor-based pagination is not supported because aggregations compute over the entire matching dataset. The result size is inherently bounded by the aggregation structure, not by the number of individual records.
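A typical `sync` consumer loops until the cursor is exhausted. In this sketch the `/records/sync` path and the `cursor`, `items`, and `nextCursor` field names are assumptions modeled on common CDF API conventions; verify them against the Records API reference.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

def sync_all(stream_id, request_body):
    """Iterate all matching records via cursor-based pagination (assumed field names)."""
    url = f"{BASE_URL}/streams/{stream_id}/records/sync"
    cursor = None
    while True:
        body = dict(request_body)
        if cursor:
            body["cursor"] = cursor
        resp = requests.post(url, headers=HEADERS, json=body)
        resp.raise_for_status()
        page = resp.json()
        yield from page.get("items", [])
        cursor = page.get("nextCursor")
        if not cursor or not page.get("items"):
            break
```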
Deleting streams
Streams are resource-intensive, long-lived entities designed to persist for the lifetime of your project. Plan your stream strategy carefully and avoid patterns that involve repeatedly creating and deleting streams.

When you delete a stream, it enters a soft-deleted state to protect against accidental data loss. Streams have no backup mechanism, so soft-delete is the only way to recover from accidental deletion. During this period:

- The stream and its data are preserved but inaccessible (no ingestion or queries).
- The stream doesn’t count toward the active stream limit.
- The stream’s `externalId` is reserved and cannot be reused for a new stream until the soft-delete period expires.
- You can recover the stream by contacting Cognite Support.

After the soft-delete period expires, the stream is permanently deleted and its `externalId` becomes available for reuse.
A single project can have a limited number of soft-deleted streams at any given time. To avoid hitting this limit, avoid creating and deleting streams frequently.
We expect streams to be long-lived. The exception is streams created with one of the test templates. Deleting a stream can take a long time, depending on the stream settings and the volume of data stored.

Data retention
Some stream templates define a `dataDeletedAfter` retention period that controls how long records are kept before they are automatically removed.
You can check a stream’s retention setting by retrieving the stream at `/api/v1/projects/{project}/streams/{streamId}` and inspecting `settings.lifecycle.dataDeletedAfter`. The value is an ISO 8601 duration (for example, `P7D` for seven days). If this field is absent, the stream has unlimited retention.
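For example, a quick check of a stream’s retention setting. Only the endpoint path and the `settings.lifecycle.dataDeletedAfter` field come from this page; the URL, token, and stream ID are placeholders.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

resp = requests.get(f"{BASE_URL}/streams/my-stream", headers=HEADERS)
resp.raise_for_status()
stream = resp.json()

retention = stream.get("settings", {}).get("lifecycle", {}).get("dataDeletedAfter")
if retention is None:
    print("Unlimited retention: records are never auto-deleted")
else:
    print(f"Records are deleted after {retention}")  # e.g. 'P7D' = seven days
```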
Records
Records are individual data objects that represent events, logs, or historical entries. Whether a record is immutable or mutable depends on the stream template you choose when creating the stream.

An industrial knowledge graph describes relationships between entities using nodes and edges. Nodes can represent physical entities, such as equipment, or logical concepts, such as activities and process stages. However, when handling bulk data such as logs or historical records, storing each individual record as a node increases relational complexity and degrades query and retrieval performance. This is an anti-pattern you should avoid. Use the Records service instead: records, together with streams, let you store high-volume structured data in bulk, improving both the performance and scalability of your CDF-based solutions.

Immutability is a key design feature for records: it guarantees historical records cannot be altered, while also delivering cost-effective support for massive storage volumes. Although records support mutability through mutable stream templates, updating records comes at a significant processing and ingestion cost compared to data modeling instances. Use mutable streams as a transitional stage for data that needs updates during its lifecycle, then archive finalized records to an immutable stream for permanent storage.

Updating records in mutable streams
Records do not support partial updates. When you upsert a record in a mutable stream, you must provide the complete state of the record, including all properties for the container. The upsert operation replaces the entire record; any properties you omit are not preserved from the previous version. This means that all non-nullable properties must be included in every upsert request, even if you only want to change one field. Omitting a non-nullable property causes a validation error. The recommended workflow for updating a record is:

1. Read the existing record using the `filter` or `sync` endpoint.
2. Merge your changes into the full property set.
3. Upsert the complete record back to the stream.
This differs from data modeling instances, which support partial updates where you only need to send the properties you want to change. For records, every upsert is a full replacement.
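A sketch of that read-merge-upsert cycle, using hypothetical `fetch_record` and `upsert_records` helpers and a simplified flat `properties` map:

```python
def update_record(stream_id, space, external_id, changes):
    """Full-replacement update: read, merge, then upsert the complete record."""
    # 1. Read the current record (fetch_record is a hypothetical helper
    #    wrapping the filter endpoint).
    record = fetch_record(stream_id, space, external_id)

    # 2. Merge changes into the full property set so no non-nullable
    #    property is dropped by the replacement.
    properties = dict(record["properties"])
    properties.update(changes)

    # 3. Upsert the complete record back (upsert_records is a hypothetical
    #    helper wrapping POST /streams/{streamId}/records).
    upsert_records(stream_id, [{
        "space": space,
        "externalId": external_id,
        "properties": properties,
    }])
```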
Records vs. nodes
| | Data modeling nodes | Records |
|---|---|---|
| Storage | Entities in the industrial knowledge graph | Data stored in streams as large batches |
| Identification | Instances must have a unique external ID per node in the space. | See Identifiers for records |
| Mutability | Mutable by default, immutable by configuration | Depends on the stream template applied when creating the stream |
| Data volumes | Millions of nodes with low growth (once the initial graph is defined) | Billions of records per year continuously |
| Structure | Structured using containers, option to use JSON for semi-structured data | Structured using containers, option to use JSON for semi-structured data |
| Relationships | Many-to-many in a mesh, defined by edges and direct relations between instances | Connected to data modeling instances via direct relations in the records |
Identifiers for records
In data modeling, you identify nodes using a combination of the space ID and the mandatory node external ID. The external ID must be unique within the space it’s scoped to, but you can reuse the same external ID across different spaces. Records also use external IDs. Like data modeling nodes, a record’s external ID belongs to a space and is stored in a stream that can include records from multiple spaces. For records, the stream type determines the uniqueness constraints:

- Mutable streams: the service enforces uniqueness for each combination of external ID, space ID, and stream. When you update a record with the same external ID/space/stream combination, it updates the existing record rather than creating a new one.
- Immutable streams: the service does not enforce uniqueness. An immutable stream can contain multiple records with the same stream/space/external ID combination. This is useful for storing the full history of a record over time. You can use filtering capabilities to retrieve these records in bulk.
In a single write request to a mutable stream, all combinations of `space` + `externalId` must be unique. You cannot create and update a record with the same space and external ID combination in the same POST request to `/streams/{streamId}/records`.

Spaces
Records use data modeling spaces for access control and organization. You must define a space before you can ingest records into it. Records can share spaces with data modeling instances, be stored in dedicated spaces, or use multiple shared spaces depending on your access control requirements. Three space organization patterns are common: shared spaces where instances and records coexist, independent spaces for separate organization, and multiple shared spaces where records can belong to multiple spaces simultaneously.

Containers
Records use data modeling containers to define their schema. You can only ingest records into containers with `usedFor` set to `record`.
Containers designated for records (`usedFor: record`) support significantly more properties than standard containers used for nodes and edges. See Limits and restrictions for specific property limits.
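As a sketch, a record container definition might look like the following. The property-type syntax follows the data modeling containers API; the space, container name, and properties are hypothetical.

```python
container = {
    "space": "well-logs",
    "externalId": "WellLogEntry",
    "usedFor": "record",  # required for containers that store records
    "properties": {
        "loggedAt": {"type": {"type": "timestamp"}, "nullable": False},
        "message": {"type": {"type": "text"}, "nullable": True},
        # Direct relation linking each log entry to a node in the graph.
        "well": {"type": {"type": "direct"}, "nullable": True},
    },
}
# Create it via the data modeling containers endpoint, e.g.:
# requests.post(f"{BASE_URL}/models/containers", headers=HEADERS,
#               json={"items": [container]})
```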
Not all container capabilities apply to records. When designing containers for records, review the following tables to understand which features are supported and which are not supported.
Constraints and indexes
Constraints and indexes are not available for record containers.

| Feature | Data modeling instances | Records |
|---|---|---|
| Constraints (uniqueness, requires) | Supported | Not supported |
| Indexes (BTree, inverted) | Supported | Not supported |
Property settings and types
Some individual property settings are not supported for record containers.

| Feature | Data modeling instances | Records |
|---|---|---|
| `immutable` | Enforced. Updates to the property are rejected after initial write. | Not supported. Immutability is a stream-level concept only. |
| `autoIncrement` | Supported. The system auto-generates incrementing values. | Not supported. Omitting a non-nullable auto-increment property causes ingestion failure. |
| `constraintState` | Supported. Tracks validity of constraints on a property. | Not applicable. Records don’t support constraints. |
| `enum` type | Supported | Supported |
| `timeseries`, `file`, `sequence` types | Supported | Not supported |
| `collation` (text sorting rules) | Supported. Controls sort order using ICU collation rules. | Not supported |
Direct relations
Direct relation properties are supported in record containers, but with reduced validation and no auto-creation behavior.

| Feature | Data modeling instances | Records |
|---|---|---|
| Target validation | Enforced. Validates both target space and instance existence. | Partially enforced. Validates target space existence only. |
| Auto-creation | Supported. Auto-creates target nodes when `autoCreateDirectRelations` is set. | Not supported |
| `container` constraint | Enforced. Validates the target matches the specified container type. | Not supported |
Linking records to the knowledge graph
You can link records to the knowledge graph by defining direct relation properties in your record container schema. These properties enable you to contextualize records by storing references to data modeling instances using their space and external ID. For instance, you can link sensor logs to specific wells or alarm records to particular assets. To link sensor logs to a `Well` in your data model:

1. Define a direct relation property in your record container schema, such as `well` of type direct relation.
2. Assign the relationship when ingesting records by providing the space and external ID of the target instance.
3. Query and filter efficiently using this property to retrieve all records associated with a specific instance.
Validation behavior for direct relations

When you ingest records with direct relation properties, the Records service validates that the target space exists but does not verify that the referenced instance exists. This differs from standard data modeling behavior, where both space and instances are validated. You can ingest a record with a direct relation pointing to a non-existent instance, as long as the target space is valid. Design your ingestion pipelines to ensure referenced instances exist before creating records that link to them.
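A hedged example of ingesting a record that links to a `Well` node. The `/streams/{streamId}/records` path appears earlier on this page; the record payload shape (a `sources` list grouping properties by container) is an assumption modeled on data modeling instance writes, and all names are placeholders.

```python
import requests

BASE_URL = "https://api.cognitedata.com/api/v1/projects/my-project"
HEADERS = {"Authorization": "Bearer <token>"}

record = {
    "space": "well-logs",
    "externalId": "log-2024-06-01-0001",
    "sources": [{  # assumed shape: properties grouped by source container
        "source": {"type": "container", "space": "well-logs", "externalId": "WellLogEntry"},
        "properties": {
            "loggedAt": "2024-06-01T12:00:00Z",
            "message": "Pressure spike detected",
            # Direct relation: only the target space is validated at ingestion.
            "well": {"space": "assets", "externalId": "well-31-a"},
        },
    }],
}

resp = requests.post(f"{BASE_URL}/streams/well-log-stream/records",
                     headers=HEADERS, json={"items": [record]})
resp.raise_for_status()
```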
Capabilities
Records and streams have their own capabilities for access control. These capabilities are independent of each other and are not inherited from the data modeling service. However, because records rely on the data modeling container feature, you must have the `dataModels:READ` capability to read or write records in a stream.
Data ingestion
The Records service operates with near real-time consistency. When you ingest or update records, there is typically a brief delay, up to a few seconds, between when the API returns a successful response and when the changes become visible in search results, filters, and aggregations. This delay occurs because the service periodically makes newly ingested data searchable, balancing performance for high-volume data ingestion with quick data availability. In most cases, new or updated records become searchable within 1-2 seconds of ingestion. Keep this near real-time consistency in mind when designing your application:

- Write-then-read scenarios: If you ingest a record and immediately query for it, the record may not appear in the results yet. Consider implementing a brief retry mechanism or delay if your workflow depends on immediate read-after-write consistency (see the sketch after this list).
- Immediate updates: For use cases with low data volumes requiring immediate visibility of every update, consider using data modeling instances instead of records.
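Where read-after-write matters, a short polling loop is usually enough to bridge the indexing delay. A sketch, with `query_record` as a hypothetical helper wrapping the `filter` endpoint:

```python
import time

def wait_until_visible(stream_id, space, external_id,
                       attempts=5, delay_seconds=0.5):
    """Poll until a freshly written record shows up in query results."""
    for _ in range(attempts):
        # query_record is a hypothetical helper wrapping the filter endpoint.
        if query_record(stream_id, space, external_id):
            return True
        time.sleep(delay_seconds)  # indexing typically completes within 1-2 s
    return False
```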
Getting started
To begin using records and streams effectively, start by identifying your high-volume data sources and the target structure you need. Then explore:

- Get started with Records — Complete tutorial that walks you through creating schemas, setting up streams, ingesting records, querying, and building stream-to-stream pipelines.
- Aggregate records reference — Use aggregations to compute statistics and analyze trends across records without retrieving individual items.