When to use records
Records are ideal for high-volume, structured data that doesn’t require complex graph relationships. Common use cases include:
- High-volume immutable data: Logs, events, and notifications (OPC UA events, PI EventFrames, well logs, manufacturing batch logs)
- Archived and historical data: Completed work orders, resolved alarms, concluded activities
- Infrequently updated operational data: Active work orders or alarms with low update frequency over their lifecycle
Core concepts
To work effectively with records, you need to understand these key concepts:
- Streams define the lifecycle and performance characteristics of your data.
- Records are the individual data objects you store.
- Spaces provide access control and organization.
- Containers define the schema structure.
Streams
Streams are logical containers that organize your records and define how they behave throughout their lifecycle. When you create a stream, you choose a template that sets policies for:
- Retention periods: How long records are stored before automatic deletion.
- Mutability: Whether records can be updated after ingestion.
- Performance characteristics: Ingestion and query throughput limits.
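As a concrete starting point, here’s a minimal sketch of creating a stream over the REST API with Python’s requests library. The endpoint path and request body shape are assumptions based on general CDF API conventions, so verify them against the Streams API documentation; the template name comes from the table in the next section.

```python
import os
import requests

# Assumed endpoint and body shape, following general CDF API conventions.
# Verify the exact format in the Streams API documentation.
PROJECT = os.environ["CDF_PROJECT"]
CLUSTER = os.environ.get("CDF_CLUSTER", "api")
BASE_URL = f"https://{CLUSTER}.cognitedata.com/api/v1/projects/{PROJECT}"

HEADERS = {
    "Authorization": f"Bearer {os.environ['CDF_TOKEN']}",
    "Content-Type": "application/json",
}

# Create a stream from the BasicArchive template (immutable, long-term storage).
# Remember: a stream's template cannot be changed after creation.
body = {"items": [{"externalId": "maintenance-logs", "template": "BasicArchive"}]}

resp = requests.post(f"{BASE_URL}/streams", headers=HEADERS, json=body)
resp.raise_for_status()
print(resp.json())
```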
Stream templates
Stream templates are in beta
Stream templates are currently in beta, which means the available templates may change as we gather feedback and continue development. New templates may be added, and existing templates may be modified or removed if necessary. Any changes will not affect existing streams you’ve already created from these templates.
- The choice between mutable and immutable records has significant scale implications. Different stream templates support different maximum record counts and storage capacities.
- You cannot change a stream’s template after creation, so choose your template carefully for production use.
- Review the limits and specifications for each template in the Streams API documentation before creating your streams, and select based on your expected data volume and mutability requirements.
| Template | Mutability | Purpose |
|---|---|---|
| ImmutableTestStream | Immutable | Sandbox testing of immutable record data |
| BasicArchive | Immutable | Long-term archival storage with unlimited retention |
| BasicLiveData | Mutable | Infrequently updated “live” data |
For enterprise CDF subscriptions, additional high-scale stream templates are available with increased capacity:
- Streams: Up to 30 active streams per project
- Records: Up to 5 billion (immutable) or 100 million (mutable) records per stream
- Storage: Up to 5 TB (immutable) or 300 GB (mutable) per stream
- Max write: Up to 170 MB per 10 minutes per stream
- Max read: Up to 1.7 GB per 10 minutes per stream
If you exceed these limits, the API returns 429 Too Many Requests responses. For detailed specifications, limits, and throughput rates for each template, see the Streams API documentation. For a summary of Records resource limits and API operation limits, see Limits and restrictions.
If you’re building applications or services that use Records, implement the recommended approaches for managing concurrency and rate limits to avoid hitting these limits.
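One common pattern for respecting these limits is to retry on 429 responses with exponential backoff, honoring the Retry-After header when the server provides one. The sketch below is generic; the backoff pattern, not the specific endpoint, is the point.

```python
import time
import requests

def post_with_backoff(url, headers, payload, max_retries=5):
    """POST with exponential backoff on 429 Too Many Requests."""
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.post(url, headers=headers, json=payload)
        if resp.status_code != 429:
            resp.raise_for_status()
            return resp
        # Prefer the server's Retry-After hint when present.
        retry_after = resp.headers.get("Retry-After")
        time.sleep(float(retry_after) if retry_after else delay)
        delay *= 2  # exponential backoff between attempts
    raise RuntimeError(f"Still rate-limited after {max_retries} retries")
```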
Deleting streams
When you delete a stream, it enters a soft-deleted state to protect against accidental data loss. During this period:
- The stream and its data are preserved but inaccessible (no ingestion or queries).
- The stream doesn’t count toward the active stream limit.
- You cannot create a new stream with the same identifier.
- You can recover the stream by contacting Cognite Support.
Records
Records are individual data objects that represent events, logs, or historical entries. Whether a record is immutable or mutable depends on the stream template you choose when creating the stream.

An industrial knowledge graph describes relationships between entities using nodes and edges. Nodes can represent physical entities like equipment or logical concepts like activities and process stages. However, when you’re handling bulk data such as logs or historical records, storing each individual record as a node significantly degrades query and data retrieval performance because of increased relational complexity. This is an anti-pattern you should avoid.

Use the Records service to avoid these performance penalties for high-volume data. Records, together with streams, let you store high-volume structured data in bulk, improving both the performance and scalability of your CDF-based solutions. Although records support mutability, updating records comes at a significant processing and ingestion cost compared to data modeling instances. Immutability is a key design feature for records that guarantees historical records cannot be altered, while also delivering cost-effective support for massive storage volumes.

Records vs. nodes
| | Data modeling nodes | Records |
|---|---|---|
| Storage | Entities in the industrial knowledge graph | Data stored in streams as large batches |
| Identification | Instances must have a unique external ID per node in the space. | See Identifiers for records |
| Mutability | Mutable by default, immutable by configuration | Depends on the stream template applied when creating the stream |
| Data volumes | Millions of nodes with low growth (once the initial graph is defined) | Billions of records per year continuously |
| Structure | Structured using containers, option to use JSON for semi-structured data | Structured using containers, option to use JSON for semi-structured data |
| Relationships | Many-to-many in a mesh, defined by edges and direct relations between instances | Connected to Data Modeling instances via direct relations in the records |
Identifiers for records
In data modeling, you identify nodes using a combination of the space ID and the mandatory node external ID. The external ID must be unique within the space it’s scoped to, but you can reuse the same external ID across different spaces. Records also use external IDs. Like data modeling nodes, a record’s external ID belongs to a space and is stored in a stream that can include records from multiple spaces. For records, the stream type determines the uniqueness constraints:
- Mutable streams: the service enforces uniqueness for each combination of external ID, space ID, and stream. When you update a record with the same external ID/space/stream combination, it updates the existing record rather than creating a new one.
- Immutable streams: the service does not enforce uniqueness. An immutable stream can contain multiple records with the same stream/space/external ID combination. This is useful for storing the full history of a record over time. You can use filtering capabilities to retrieve these records in bulk.
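For example, retrieving the full history of one logical record from an immutable stream amounts to filtering on the space/externalId pair. This sketch assumes a filter endpoint and body modeled on CDF’s advanced filter syntax; check the Records API documentation for the exact format.

```python
import requests

# BASE_URL and HEADERS as set up earlier. The filter body is an assumption
# modeled on CDF's advanced filter syntax; verify it against the Records API.
body = {
    "filter": {"and": [
        {"equals": {"property": ["space"], "value": "maintenance-records"}},
        {"equals": {"property": ["externalId"], "value": "log-2024-0001"}},
    ]},
    "limit": 1000,
}

resp = requests.post(f"{BASE_URL}/streams/maintenance-logs/records/filter",
                     headers=HEADERS, json=body)
resp.raise_for_status()
# On an immutable stream this can return multiple items: the full history
# of log-2024-0001 over time.
for record in resp.json().get("items", []):
    print(record)
```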
In a single write request to a mutable stream, all combinations of space + externalId must be unique. You cannot create and update a record with the same space and external ID combination in the same POST request to /streams/{streamId}/records.
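To make that concrete, here’s a hedged sketch of a write request against the /streams/{streamId}/records endpoint named above. The item shape (space, externalId, sources referencing a container) is an assumption modeled on the data modeling ingestion format; the key point is that each space + externalId combination appears only once in the batch.

```python
import requests

# BASE_URL and HEADERS as set up earlier. The item structure is an assumption
# modeled on the data modeling ingestion format; verify it against the
# Records API documentation.
items = [
    {
        "space": "maintenance-records",
        "externalId": "log-2024-0001",  # each space+externalId is unique here
        "sources": [{
            "source": {"type": "container", "space": "schemas",
                       "externalId": "MaintenanceLog"},
            "properties": {"severity": "high",
                           "timestamp": "2024-05-01T08:30:00Z"},
        }],
    },
    {
        "space": "maintenance-records",
        "externalId": "log-2024-0002",  # a second, distinct record
        "sources": [{
            "source": {"type": "container", "space": "schemas",
                       "externalId": "MaintenanceLog"},
            "properties": {"severity": "low",
                           "timestamp": "2024-05-01T09:00:00Z"},
        }],
    },
]

resp = requests.post(f"{BASE_URL}/streams/maintenance-logs/records",
                     headers=HEADERS, json={"items": items})
resp.raise_for_status()
# On a mutable stream, re-sending log-2024-0001 in a *later* request updates
# the existing record; on an immutable stream it appends another record.
```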
Spaces
Records use data modeling spaces for access control and organization. You must define a space before you can ingest records into it. Records can share spaces with data modeling instances, be stored in dedicated spaces, or use multiple shared spaces depending on your access control requirements. The following diagram illustrates three common space organization patterns: shared spaces where instances and records coexist, independent spaces for separate organization, and multiple shared spaces where records can belong to multiple spaces simultaneously.
Containers
Records use data modeling containers to define their schema. You can only ingest records into containers with usedFor set to record.
Containers designated for records (usedFor: record) support significantly more properties than standard containers used for nodes and edges. See Limits and restrictions for specific property limits.
Records currently don’t support the enum property type. Support for enum properties in records is planned for a future release. If you try to ingest records into a container that includes an enum property:
- The API returns a 4xx error if the ingested data includes a value for that enum property.
- The ingestion succeeds if the ingested data omits the enum property.
Linking records to nodes in the knowledge graph
You can link records to nodes in the knowledge graph by including a property in your record that stores a reference to the related node identifier. This lets you retrieve records linked to a specific node using filters. When designing your container schema, include properties that you’ll use for filtering and querying records. For example, if you need to retrieve all records associated with a specific asset, include a direct relation to the asset in your container. This allows the Records service to efficiently filter records without scanning the entire stream.
For example, a container for maintenance logs might include properties like asset, timestamp, and severity to enable efficient filtering by asset, time range, or severity level.
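A container along those lines might be declared as sketched below, through the data modeling containers endpoint. The usedFor: record flag and property names follow this page; treat the rest of the payload as an assumption to verify against the API reference.

```python
import requests

# BASE_URL and HEADERS as before. Containers are managed through the data
# modeling API; note usedFor set to "record".
container = {
    "space": "schemas",
    "externalId": "MaintenanceLog",
    "usedFor": "record",
    "properties": {
        # A direct relation to the asset node enables efficient
        # per-asset filtering without scanning the whole stream.
        "asset": {"type": {"type": "direct"}},
        "timestamp": {"type": {"type": "timestamp"}},
        "severity": {"type": {"type": "text"}, "nullable": True},
    },
}

resp = requests.post(f"{BASE_URL}/models/containers",
                     headers=HEADERS, json={"items": [container]})
resp.raise_for_status()
```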
Capabilities
Records and streams have their own capabilities for access control. These capabilities are independent of each other and are not inherited from the data modeling service. However, because records rely on the data modeling container feature, you must have the dataModels:READ capability to read or write records in a stream.
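In a group definition, that combination might look roughly like the sketch below. The dataModelsAcl entry is a standard CDF capability; the records-related ACL name here is a placeholder, since this page doesn’t spell it out, so look up the actual names in the capabilities reference.

```python
# Sketch of group capabilities for working with records. dataModelsAcl is a
# standard CDF capability; "streamRecordsAcl" below is a placeholder name
# (assumption), not a confirmed ACL identifier.
capabilities = [
    {"dataModelsAcl": {"actions": ["READ"], "scope": {"all": {}}}},
    {"streamRecordsAcl": {"actions": ["READ", "WRITE"], "scope": {"all": {}}}},
]
```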
Data ingestion
The Records service operates with near real-time consistency. When you ingest or update records, there is typically a brief delay, up to a few seconds, between when the API returns a successful response and when the changes become visible in search results, filters, and aggregations. This delay occurs because the service periodically makes newly ingested data searchable, balancing performance for high-volume data ingestion with quick data availability. In most cases, new or updated records become searchable within 1-2 seconds of ingestion. Keep this near real-time consistency in mind when designing your application:
- Write-then-read scenarios: If you ingest a record and immediately query for it, the record may not appear in the results yet. Consider implementing a brief retry mechanism or delay if your workflow depends on immediate read-after-write consistency.
- Immediate updates: For use cases with low data volumes requiring immediate visibility of every update, consider using data modeling instances instead of records.
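If a workflow does depend on read-after-write behavior, a short polling loop is usually enough, since records typically become searchable within 1-2 seconds. This sketch reuses the assumed filter endpoint from earlier; the body shape remains unverified against the actual API.

```python
import time
import requests

def wait_until_searchable(base_url, headers, stream_id, space, external_id,
                          timeout_s=10.0, interval_s=0.5):
    """Poll the (assumed) records filter endpoint until a new record appears."""
    body = {"filter": {"and": [
        {"equals": {"property": ["space"], "value": space}},
        {"equals": {"property": ["externalId"], "value": external_id}},
    ]}}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        resp = requests.post(f"{base_url}/streams/{stream_id}/records/filter",
                             headers=headers, json=body)
        resp.raise_for_status()
        if resp.json().get("items"):
            return True
        time.sleep(interval_s)  # new records are typically visible in 1-2 s
    return False
```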
Getting started
To begin using records and streams effectively, start by identifying your high-volume data sources and the target structure you need. Then explore:
- Build a SCADA alarm management pipeline with Records - Complete tutorial that walks you through creating schemas, setting up streams, ingesting records, and querying high-volume OT event data