About the CDF components

Cognite Data Fusion (CDF) is the backbone of the industrial applications that Cognite, our customers, and our partners build.

CDF has a modular design with components that make previously siloed data readily available and understandable to humans and machines.

CDF components

Data sources and extractors

Data flows from the source systems via extractors. The source systems range from industrial control systems supplying sensor data, through ERP systems, to massive 3D CAD models in engineering systems. The extractors require only read access to the sources. The extractors are custom-built for some of the industry-specific systems, and we use standard off-the-shelf ETL tools for more traditional tabular data in SQL-compatible databases.

Staging area

Data flows from the extractors into the CDF ingestion API. From here on, everything lives in the cloud. The first stop is the staging area, where tabular data is stored in its original format. This approach allows us to minimize logic in the extractors, and to run and re-run transformations on data in the cloud.
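The pattern above can be sketched in a few lines of Python. This is an illustration of the idea only, not the actual CDF staging (RAW) API: rows are kept exactly as they arrived, and transformations run (and re-run) over them later. The class and method names are assumptions made for this sketch.

```python
# Illustrative sketch of the "store raw, transform later" pattern.
# Not the real CDF RAW API; names here are invented for the example.

class StagingArea:
    """Holds ingested rows exactly as they arrived, keyed by table name."""

    def __init__(self):
        self.tables = {}

    def ingest(self, table, rows):
        # Store rows untouched: the extractor stays simple, all logic lives downstream.
        self.tables.setdefault(table, []).extend(rows)

    def transform(self, table, fn):
        # Transformations can run, and re-run, over the original data at any time.
        return [fn(row) for row in self.tables.get(table, [])]


staging = StagingArea()
staging.ingest("sensors", [{"tag": "23-PT-92531", "value": "4.2"}])
# Re-runnable cleanup step: parse the string reading into a number.
clean = staging.transform(
    "sensors", lambda r: {"tag": r["tag"], "value": float(r["value"])}
)
```

Because the original rows are never modified, a fixed or improved transformation can simply be run again over the same staged data.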

Transformation and data model

From the staging area, we transform the data into the CDF data model. We provide a data model out of the box so that users and application developers can make assumptions about the structure of the data and rapidly build applications on top of the data model. Cognite provides the tools and algorithms to perform the transformations so that industrial companies can apply their data to solve use cases at scale.
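As a minimal sketch of such a transformation, the function below maps a raw source row onto a generic asset-like shape. The field names (`externalId`, `name`, `metadata`) are chosen for illustration in the spirit of the CDF data model; the raw column names are assumptions about a hypothetical source system.

```python
# Illustration only: mapping a raw staged row onto a generic asset shape.
# Field and column names are assumptions, not the authoritative CDF schema.

def to_asset(raw_row, source):
    """Turn one raw source-system row into a normalized asset record."""
    return {
        # Prefix with the source so IDs from different systems cannot collide.
        "externalId": f"{source}:{raw_row['equipment_id']}",
        "name": raw_row["description"].strip(),
        "metadata": {"source": source},
    }


asset = to_asset({"equipment_id": "P-101", "description": " Feed pump "}, source="erp")
```

Keeping transformations as small, deterministic functions like this makes them easy to test and to re-run over the staging area when the mapping changes.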

Contextualization

Contextualization is key to making your data more accessible, to rapidly drawing new insights from it, and to making your data do more.

The interactive contextualization tools in Cognite Data Fusion (CDF) let you combine machine learning, a powerful rules engine, and domain expertise to map resources from different source systems to each other in the CDF data model.

This way of connecting information lets you build applications where you can, for example, click a component in a 3D model to see all the connected time series data, or ask for all the pressure readings along a flow line.

Our contextualization tools help you build interactive P&IDs (Piping and Instrumentation Diagrams) from static PDF source files, and match entities to set up, automate, and validate all your contextualization pipelines from your browser without writing any code.
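To make the matching idea concrete, here is a toy rule in the spirit of entity matching: normalize tags from two source systems and pair them on the normalized form. Real contextualization pipelines combine machine learning models with rules like this; the sketch below shows only the rule part, with invented example tags.

```python
# Toy entity-matching rule: two systems write the same tag with different
# separators and casing, so match on a normalized form.
# Real CDF contextualization also uses ML models; this shows only the rule idea.

import re


def normalize(tag):
    # Drop separators and unify case so "23-PT-92531" matches "23_pt_92531".
    return re.sub(r"[-_ ]", "", tag).upper()


def match(left_tags, right_tags):
    """Map each left-system tag to the right-system tag it resolves to (or None)."""
    index = {normalize(t): t for t in right_tags}
    return {t: index.get(normalize(t)) for t in left_tags}


mapping = match(["23-PT-92531"], ["23_pt_92531", "45_tt_00017"])
```

A matched pair like this is what lets an application jump from a component in a 3D model to the time series recorded under a differently formatted tag in another system.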

Data indexing, aggregations and model hosting

A single time series with raw data is often several gigabytes in size, and we store hundreds of thousands of such time series. It would be extremely inefficient to download all of that data and then subsample it to display and visualize the information. To ensure fast aggregations and queries, Cognite Data Fusion indexes the data in different ways and allows search, filtering, and computations on the data as part of the API. This way, you can build powerful and responsive applications with low effort.
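The benefit of server-side aggregation can be sketched as follows: bucket the raw points once, then answer min/max/average queries from the small aggregate table instead of scanning raw data. This is an illustration of the principle only; CDF computes such aggregates server-side, and the function below is not its implementation.

```python
# Sketch of why pre-aggregation helps: summarize raw points into fixed-size
# time buckets once, then answer queries from the summaries.
# Illustration only; not the CDF implementation.

def build_aggregates(points, bucket_seconds):
    """points: iterable of (timestamp_seconds, value) pairs."""
    buckets = {}
    for ts, value in points:
        b = ts - ts % bucket_seconds  # start of this point's bucket
        lo, hi, total, n = buckets.get(b, (value, value, 0.0, 0))
        buckets[b] = (min(lo, value), max(hi, value), total + value, n + 1)
    # A query for min/max/avg now touches one small record per bucket,
    # not every raw datapoint.
    return {
        b: {"min": lo, "max": hi, "avg": total / n}
        for b, (lo, hi, total, n) in buckets.items()
    }


aggs = build_aggregates([(0, 1.0), (30, 3.0), (60, 5.0)], bucket_seconds=60)
```

Charting a year of sensor data then means fetching a few thousand bucket summaries rather than gigabytes of raw points.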

Another way to run calculations near the data is to use the model hosting environment provided as part of Cognite Data Fusion. It can run custom Python code to make predictions from machine learning models or calculations from physics-based simulators.
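The shape of such hosted code can be as simple as a plain Python callable that takes input features and returns predictions. The contract below is a hypothetical stand-in, not CDF's actual hosting interface, and the "model" is a trivial linear relation used only to make the sketch runnable.

```python
# Hypothetical shape of a hosted model: a callable from inputs to predictions.
# The real CDF hosting contract may differ; this only illustrates the idea.

def predict(instances):
    # Stand-in "model": the linear relation y = 2x + 1.
    return [2 * x + 1 for x in instances]


predictions = predict([0.0, 1.5])
```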

3D models and visualization

A unique feature is the ability to ingest 3D models several gigabytes in size and process them so that you can view them in a regular browser, even on your phone. Combined with the Cognite 3D web viewer, you can visualize industrial data in its physical context in a user-friendly way.

Data governance

Cognite Data Fusion (CDF) provides tools and features to ensure high quality throughout the complete lifecycle of your data, and that your data conforms to user and organizational expectations:

  • Secure access management to control access for users, apps and services to the various types of resources (data sets, assets, files, events, time series, etc.) in CDF.

  • Data sets let you document and track data lineage, ensure data integrity, and allow 3rd parties to write their insights securely back to your CDF project. Data sets group and track data by its source.

  • Data quality monitoring to track the quality of time series data for apps and models running on data from CDF.

API, SDKs and Data Science toolkit

All the information liberated by Cognite is available through a modern REST-based API. The API does not expose any of the details of the technology used inside Cognite Data Fusion. This lets Cognite rapidly innovate and replace internal components as new ones become available, without breaking any of the applications that already use the API.

In addition to a well-documented API, Cognite provides connectors and SDKs for many common programming languages and analytics tools, such as Python, JavaScript, Spark, OData and Grafana. We also offer a library of low-code reusable components that make it simple to build web applications on top of Cognite Data Fusion.
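As a sketch of what calling a REST-based API like this looks like from code, the snippet below assembles a resource-listing URL. The base URL, project name, and path are placeholders chosen for this example, not the documented CDF endpoint; consult the API reference for the real paths and authentication scheme.

```python
# Sketch of assembling a REST API request URL for a CDF-style service.
# The host, project, and path here are placeholders, not documented endpoints.

from urllib.parse import urlencode


def asset_list_url(base_url, project, limit=100):
    """Build the URL for listing assets in a project (hypothetical endpoint)."""
    query = urlencode({"limit": limit})
    return f"{base_url}/api/v1/projects/{project}/assets?{query}"


url = asset_list_url("https://api.example.com", "demo")
```

In practice the SDKs hide this plumbing entirely; a Python or JavaScript client exposes the same resources as typed methods instead of raw URLs.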

See also
Video: The Cognite Architecture

Last Updated: 9/10/2020, 12:11:40 PM