Prerequisites: Familiarity with the data modeling service and core CDF concepts (spaces, containers, views, data models, and instances) will help you get the most from this guide.
Why it matters
A well-designed data model reduces integration complexity, improves query performance, and makes it easier to scale as your organization grows. Poor design leads to slow queries, maintenance overhead, and difficulty adding new data sources or applications. Following the principles in this guide helps you avoid common pitfalls and build models that support both operational and analytical use cases.
Layered architecture
A layered data model architecture separates responsibilities and optimizes for different use cases. Clear ownership at each layer improves maintainability: data producers focus on quality and freshness, data modelers on integration and business logic, and data consumers on application-specific views. This structure also scales better as your organization grows, because changes in one layer do not cascade unpredictably to others. The three layers map loosely to the medallion architecture (bronze, silver, gold). See medallion architecture for an overview.
| Layer | Owner | Purpose | Medallion analogy |
|---|---|---|---|
| Source | Data producers | Raw copy of data from a single source system. Ensures availability and quality. Responsibilities include building ELT pipelines into CDF, managing upstream lineage, and monitoring source data. | Silver |
| Enterprise | Data modelers | Integrates and contextualizes data from multiple sources. Serves as the single, validated source of truth. Optimizes query performance by grouping granular asset types into higher-level classes. | Gold |
| Solution | Data consumers | Tailored, read-only views for specific applications. Data consumers specify requirements and work with the data modeler to find data in the enterprise data model, helping define new solution schemas as needed. | Application-specific |
You can use a simplified layered architecture to get started faster (for example, with only two layers). The full three-layer architecture described in this section is the recommended approach for enterprise-scale deployments.
Core design principles
Beyond layered architecture, a few design principles keep your data models efficient and scalable. These apply whether you are building a source, enterprise, or solution model.
Avoid duplication via shared IDs
A single entity should be represented once in the knowledge graph. Data across views and layers is flexibly linked by using the same instance ID (space + external ID). Explicit direct relations are typically only needed to enable navigation in the UI/GraphQL or when linking instances that have different IDs.
Extend core data models (CDM)
Custom views must extend (implement) the relevant base core data model (CDM) types rather than using original CDM views directly. Relating CDM views to custom views that extend CDM is complicated and risks breaking the schema and losing relations. Include all relevant CDM types for the best GraphQL and CDF service compatibility, but omit those not utilized (for example, exclude Cognite3DModel if there is no 3D model). Extend CogniteAsset only once, with a single Asset type, and provide links to domain-specific dimensions in a direct relation. A top-level central type makes the model easy to navigate and query, allows implementation of specific hierarchies within related sub-types, and provides flexibility to link documents and time series to all levels. Avoid single-purpose wide and sparse views or multiple extensions of the CogniteAsset type.
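As a minimal sketch of the "extend CogniteAsset only once" rule, the following Python snippet represents view definitions as plain dictionaries and lists the views that implement a given CDM type. The dictionary keys, the space name sp_enterprise_schema, and the view names Asset and PumpSpecification are assumptions made for illustration. The snippet does not call any CDF API.

```python
from typing import Any

CDM_SPACE = "cdf_cdm"  # space holding the core data model views


def implements_cdm_type(view: dict[str, Any], cdm_external_id: str) -> bool:
    """Return True if the view directly implements the given CDM view."""
    return any(
        parent.get("space") == CDM_SPACE
        and parent.get("externalId") == cdm_external_id
        for parent in view.get("implements", [])
    )


def views_extending(views: list[dict[str, Any]], cdm_external_id: str) -> list[str]:
    """List the external IDs of all views that extend the given CDM view."""
    return [v["externalId"] for v in views if implements_cdm_type(v, cdm_external_id)]


# Hypothetical view definitions; names and keys are illustrative only.
views = [
    {
        "space": "sp_enterprise_schema",
        "externalId": "Asset",
        "version": "v1",
        "implements": [
            {"type": "view", "space": CDM_SPACE, "externalId": "CogniteAsset", "version": "v1"}
        ],
    },
    {
        # Domain-specific dimension, linked from Asset through a direct
        # relation (properties omitted in this sketch).
        "space": "sp_enterprise_schema",
        "externalId": "PumpSpecification",
        "version": "v1",
        "implements": [],
    },
]

asset_extensions = views_extending(views, "CogniteAsset")
if len(asset_extensions) != 1:
    raise ValueError(f"CogniteAsset should be extended once, found: {asset_extensions}")
```

A check like this could run before deployment to flag a second CogniteAsset extension early.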
Distribute properties contextually
Split properties for the same instance across multiple, context-specific containers and views. This ensures compliance with the 300-property view limit and improves AI accuracy by restricting the view to only context-relevant properties. For example, a property like pipe coating would be excluded from a view focused on pump equipment.
Model relationships
Choose the right relationship type for your use case. Direct relations (One:One, One:Many, Many:Many) are the default: they are simpler, faster, and less resource-intensive. Always include a reverse direct relation. Use edges only when you need relationship properties, when you must exceed the 2,000 linked instance limit, or when you must traverse the graph in queries and attach extra properties to the relationship. Edges add complexity and cost.
| Type | When to use | Trade-off |
|---|---|---|
| Direct relations | Default for most relationships. Include reverse direct relation. | Simpler, faster, less resource-intensive |
| Edges | Relationship properties, exceed 2,000 linked instances, graph traversal with extra properties | More complex, higher cost |
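The decision rule in the table above can be sketched as a small helper. This is an illustration of the guidance only, not part of any CDF SDK. The class RelationshipNeeds and the function choose_relationship_type are hypothetical names, and the 2,000 figure is the linked instance limit quoted in this section.

```python
from dataclasses import dataclass

DIRECT_RELATION_LIMIT = 2_000  # linked instance limit cited in this guide


@dataclass
class RelationshipNeeds:
    """Hypothetical description of what a relationship must support."""
    has_relationship_properties: bool
    max_linked_instances: int


def choose_relationship_type(needs: RelationshipNeeds) -> str:
    """Default to a direct relation; fall back to edges only when required."""
    if needs.has_relationship_properties:
        return "edge"
    if needs.max_linked_instances > DIRECT_RELATION_LIMIT:
        return "edge"
    return "direct_relation"


# A pump linked to a handful of time series: the default applies.
pump_to_timeseries = RelationshipNeeds(False, 12)
# A connection that carries its own properties, such as a flow direction.
pipe_connection = RelationshipNeeds(True, 4)

print(choose_relationship_type(pump_to_timeseries))
print(choose_relationship_type(pipe_connection))
```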
Query and performance optimization
Indexing and constraints are essential for efficiency and to prevent timeouts. Without them, queries on large graphs can become unusably slow. Add B-tree indexes to scalar properties and direct relations used for filtering or reverse traversal. Use inverted indexes for list-type properties. Note the limit of 10 indexes per container. See Performance considerations for indexing guidance.
Use human-readable names and descriptions for all properties to aid users and AI agents (Atlas AI). Denormalize properties that are frequently queried and give additional context, since this helps AI agents, applications, and human users alike. See Optimizing data models for AI for more guidance.
Security and governance
Separating schema from data instances enables granular access control. Use distinct schema spaces (for model definitions) and instance spaces (for data), and rely on tooling for controlled deployment. The permission model distinguishes schema from data: datamodels:read gives schema access, while datamodelinstances:read is required to read any data.
Time series and files are special cases. Conceptually, they are meta/container types: the data model instance (the node) points to the actual bulk storage for the instance’s data (the datapoints or file bytes). To control access, you need two distinct types of permissions:
- Schema access: You need datamodels:read on the cdf_cdm space. This is required for your application to understand the structure and definition of what a CogniteTimeSeries or CogniteFile is.
- Data access: You need datamodelinstances:read on the instance space (for example, my-asset-space). This grants access to read the specific instances and their associated time series datapoints or file bytes.
Using distinct instance spaces is the essential strategy for segregating data access.
Use location-specific instance spaces (for example, sp_location_a_instances) to control data access using CDF groups.
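The two-permission rule for time series and files can be sketched as a simple check. The Capability tuple below is a simplified, hypothetical stand-in for how a CDF group's capabilities are scoped and does not match the real group definition format. The space sp_location_b_instances is also an assumption, added only to show the negative case.

```python
from typing import NamedTuple

CDM_SCHEMA_SPACE = "cdf_cdm"


class Capability(NamedTuple):
    """Simplified, hypothetical model of one capability scoped to a space."""
    acl: str     # "datamodels" or "datamodelinstances"
    action: str  # for example, "read"
    space: str


def can_read_timeseries(capabilities: set[Capability], instance_space: str) -> bool:
    """Both schema access and data access are needed to read the data."""
    has_schema_access = Capability("datamodels", "read", CDM_SCHEMA_SPACE) in capabilities
    has_data_access = Capability("datamodelinstances", "read", instance_space) in capabilities
    return has_schema_access and has_data_access


# A group for location A: schema access plus one instance space.
location_a_group = {
    Capability("datamodels", "read", CDM_SCHEMA_SPACE),
    Capability("datamodelinstances", "read", "sp_location_a_instances"),
}

print(can_read_timeseries(location_a_group, "sp_location_a_instances"))
print(can_read_timeseries(location_a_group, "sp_location_b_instances"))
```

Because data access is granted per instance space, members of this group cannot read data stored in the space for location B.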
Consider limits on space count within the project when implementing a strategy that subdivides data by source system, location, or use case. Combining these could result in exponential growth in space usage for enterprise-scale deployments.
Source and enterprise layer instance strategy
Define a strategy for instances between the source and enterprise layers. In the solution layer, you generally map data within the enterprise layer. Keeping separate instances in each layer reduces the risk of accidental deletion and enables data isolation (for example, for third-party suppliers who may be competitors), but increases the instance count. Combining the layers into shared instances reduces the instance count and simplifies mapping, but risks unintended data loss: deleting an instance in a source model also deletes the same instance in the enterprise model, so take extra care when making changes in the source layer.
Deployment
Use the Cognite Toolkit as the primary tool for managing and deploying data models. Store models in YAML format in a source code management platform (Git) to enable versioning, schema validation, and deployment across environments (Dev, Staging, Prod) using a CI/CD pipeline. This workflow gives you change control, traceability, and standardized deployments (for example, Gitflow). Consider NEAT when domain subject matter experts (SMEs) need to be involved in both model design and implementation. NEAT simplifies extending CDM types, creating enterprise models that can be subdivided into solution models, and configuring performance optimizations such as indexes and constraints. It may be less useful when most work is done by experienced data modelers who may not need these simplifications.
In all cases (including when using NEAT), govern and version the model using the Cognite Toolkit and Git. NEAT models, often stored in Excel, can be converted to YAML for deployment with the Cognite Toolkit.
Naming and versioning
Consistent naming and versioning make long-term maintenance easier and simplify integration across teams. Follow the resource naming conventions. Key recommendations:
- Use PascalCase for external IDs of containers and views (for example, CentrifugalPump).
- Use camelCase for property identifiers (for example, ratedCurrent).
- Avoid company names in external IDs.
- Use a numerical versioning pattern (for example, v1, v1.0.0) and bump the view version with every schema change (recommended).
- Use Cognite Toolkit variables to configure names (for example, space names) and versions, avoiding the need to manually update all YAML configurations when changes occur.
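The naming recommendations above can be checked mechanically. The following is a minimal sketch using regular expressions. The patterns are a loose interpretation of PascalCase, camelCase, and the v1 / v1.0.0 version styles, and the function names are hypothetical. The resource naming conventions remain the authoritative reference.

```python
import re

# Loose patterns: they check the first character and the allowed alphabet,
# not whether the identifier is made of real words.
PASCAL_CASE = re.compile(r"[A-Z][A-Za-z0-9]*")
CAMEL_CASE = re.compile(r"[a-z][A-Za-z0-9]*")
VERSION = re.compile(r"v\d+(\.\d+\.\d+)?")  # accepts v1 and v1.0.0 styles


def is_pascal_case(external_id: str) -> bool:
    """Check an external ID for a container or view."""
    return PASCAL_CASE.fullmatch(external_id) is not None


def is_camel_case(property_id: str) -> bool:
    """Check a property identifier."""
    return CAMEL_CASE.fullmatch(property_id) is not None


def is_valid_version(version: str) -> bool:
    """Check a version string against the numerical pattern."""
    return VERSION.fullmatch(version) is not None


print(is_pascal_case("CentrifugalPump"))
print(is_camel_case("ratedCurrent"))
print(is_valid_version("v1.0.0"))
```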
Validation checklist
Before deploying or merging changes, run through this checklist to confirm your model follows the best practices in this guide. The following table lists verification items by category: layered architecture, core design, query and performance, governance, deployment, and naming.
| Category | Checklist item | Check |
|---|---|---|
| Layered architecture | Source data model(s) exist (raw copy of data from a single source). | ☐ |
| | Enterprise data model(s) exist (data aggregated from multiple sources). | ☐ |
| | Enterprise data model(s) provide data that is generic and understandable without source system knowledge. | ☐ |
| | Solution data model(s) exist (provided for specific use cases). | ☐ |
| | Solution models write data back to a separate source data model. | ☐ |
| Core design and scalability | Properties are distributed across contextual containers and views. | ☐ |
| | Custom views extend (implement) the necessary base CDM types. | ☐ |
| | CogniteAsset is extended only once and linked to domain-specific dimensions. | ☐ |
| | Unnecessary data duplication is avoided. | ☐ |
| | Models use shared instance IDs to enable flexible linking. | ☐ |
| | Direct relations are used as the default relationship. | ☐ |
| Query and performance | B-tree and inverted indexes are used on frequently filtered properties. | ☐ |
| | Requires container constraints are used on views that implement others. | ☐ |
| | Human-readable names and descriptions are used for all properties to aid users and AI agents. | ☐ |
| | Properties that are frequently queried and give additional context are denormalized. | ☐ |
| Governance and security | Schema and instance spaces are kept separate. | ☐ |
| | Access control is set up per facility, plant, or site. | ☐ |
| | A clear instance strategy for the source and enterprise layers is defined. The recommendation is to separate. | ☐ |
| Development and deployment | The Cognite Toolkit is used to manage resources. | ☐ |
| | A source code management platform like Git is used. | ☐ |
| | NEAT is considered as one option to develop data models, where customer subject matter experts (SMEs) are heavily involved. | ☐ |
| Naming and versioning | An explicitly defined naming convention is in place, referencing the defined best practices. | ☐ |
| | A numerical versioning pattern is used. | ☐ |
| | View versions are incremented for every schema change. | ☐ |
| | Cognite Toolkit variables are used to configure names (for example, space names) and versions. | ☐ |
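Some checklist items lend themselves to automation. As one possible sketch, the snippet below checks the indexing guidance from this guide (B-tree for scalar and direct-relation properties, inverted for list-type properties, at most 10 indexes per container) against a container described as a plain dictionary. The dictionary shape, including the list flag and the indexType values, is an assumption for illustration and may differ from the actual container definition format.

```python
from typing import Any

MAX_INDEXES_PER_CONTAINER = 10  # limit cited in this guide


def check_container_indexes(container: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; empty means no findings."""
    problems: list[str] = []
    properties = container.get("properties", {})
    indexes = container.get("indexes", {})

    if len(indexes) > MAX_INDEXES_PER_CONTAINER:
        problems.append(
            f"{len(indexes)} indexes defined, limit is {MAX_INDEXES_PER_CONTAINER}"
        )

    for index_name, index in indexes.items():
        for prop in index["properties"]:
            # List-type properties call for an inverted index, others for B-tree.
            is_list = properties.get(prop, {}).get("list", False)
            expected = "inverted" if is_list else "btree"
            if index["indexType"] != expected:
                problems.append(
                    f"index '{index_name}' on '{prop}' should be {expected}"
                )
    return problems


# Hypothetical container: 'tags' is a list property indexed with a B-tree.
pump_container = {
    "properties": {"name": {"list": False}, "tags": {"list": True}},
    "indexes": {
        "byName": {"properties": ["name"], "indexType": "btree"},
        "byTags": {"properties": ["tags"], "indexType": "btree"},
    },
}

for problem in check_container_indexes(pump_container):
    print(problem)
```

A check of this kind could run in the CI/CD pipeline alongside the manual review of the remaining checklist items.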
Tools and services
The following services and tools support data modeling in CDF. Choose the right tool based on your role and workflow.
| Service | Description |
|---|---|
| Data modeling service | The engine within CDF that builds and manages an Industrial Knowledge Graph. Uses a property graph structure (nodes, edges, and properties) with spaces, containers, and views to organize and contextualize industrial data. Provides APIs for creating, ingesting, and querying models. |
| NEAT | A solution for domain experts and developers that enables rapid development of data models. Handles ETL of data instances and ingests models and instances as knowledge graphs into CDF. Acts as an interface to the data modeling service outside of CDF. |
| Cognite Toolkit | Tooling to manage, configure, and deploy resources in CDF. |
Further reading
- Data modeling concepts – Core concepts: property graph, spaces, instances, containers, views, and data models.
- Data modeling guides – Examples and best practices for extending CDM, performance, and CI/CD.
- Data modeling principles – Organizational principles and best practices for data modeling with NEAT.
- Resource naming conventions – Complete reference for naming CDF resources.