About the CDF components

Cognite Data Fusion (CDF) is the backbone of all the industrial applications that we at Cognite, together with our customers and partners, are creating.

CDF has a modular design with components that make previously siloed data readily available and understandable to humans and machines.

CDF components

Data sources and extractors

Data flows from the source systems via extractors. The source systems range from industrial control systems supplying sensor data, through ERP systems, to massive 3D CAD models in engineering systems. The extractors require only read access to the sources. We custom-build extractors for some of the industry-specific systems, and we use standard off-the-shelf ETL tools for more traditional tabular data in SQL-compatible databases.
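
As a sketch of that flow, the snippet below shows a minimal extractor built with the Cognite Python SDK: it reads sensor rows from a local CSV export (standing in for a read-only source system) and pushes them, unmodified, into CDF. The project, credentials, database, table, and column names are all hypothetical, and the API-key client setup reflects older SDK versions; the exact calls vary by version.

```python
import csv

from cognite.client import CogniteClient

# Hypothetical project and credentials; real deployments read these from config.
client = CogniteClient(api_key="...", project="my-project", client_name="csv-extractor")

# Read-only access to the source: here, a CSV export of sensor readings.
with open("sensor_readings.csv", newline="") as f:
    rows = {
        # Key each row by a unique ID so re-running the extractor is idempotent.
        r["reading_id"]: {"tag": r["tag"], "timestamp": r["timestamp"], "value": r["value"]}
        for r in csv.DictReader(f)
    }

# Push the rows as-is to the staging area (CDF RAW); transformation happens later, in the cloud.
client.raw.rows.insert(db_name="src_sensors", table_name="readings", row=rows, ensure_parent=True)
```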

Staging area

Data flows from the extractors into the CDF ingestion API. From here on, everything lives in the cloud. The first stop is the staging area, where tabular data is stored in its original format. This approach allows us to minimize the logic in the extractors and to run and re-run transformations on the data in the cloud.
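
Because staged rows keep their original shape, a cloud-side job can read them back and re-run a transformation whenever the logic changes. A minimal read of a staged table might look like this (database and table names are hypothetical, as before):

```python
from cognite.client import CogniteClient

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="transform-job")

# Fetch staged rows in their original, untransformed format.
for row in client.raw.rows.list(db_name="src_sensors", table_name="readings", limit=100):
    print(row.key, row.columns)
```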

Transformation and data model

From the staging area, we transform the data into the CDF data model. We provide a data model out of the box so that users and application developers can make assumptions about the structure of the data and rapidly build applications on top of it. Cognite provides the tools and algorithms to perform the transformations so that industrial companies can apply their data to solve use cases at scale.
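
As a hedged illustration of such a transformation (Cognite's own transformation tooling is richer than this), the sketch below reshapes staged equipment rows into assets in the CDF data model. All source, table, and column names are hypothetical:

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import Asset

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="transform-job")

# Read staged equipment rows and reshape them into the data model's asset type.
staged = client.raw.rows.list(db_name="src_erp", table_name="equipment", limit=None)
assets = [
    Asset(
        external_id=row.columns["tag"],                    # stable ID from the source system
        name=row.columns["description"],
        parent_external_id=row.columns.get("parent_tag"),  # preserves the asset hierarchy
    )
    for row in staged
]
client.assets.create(assets)
```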

Contextualization

Our data model centers on assets, but different source systems often have different names for the same physical object. The first part of Cognite's contextualization process is to map all the asset names from the different systems to one globally unique identifier per asset. For instance, an object inside a 3D model may have an ID that we can map to an asset, while a time series from an instrument monitoring system can have another ID that we can map to the same asset. This way of connecting information from different systems allows us to build applications where you can click a component in a 3D model and see all the connected time series data.
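
The sketch below shows the idea in miniature: a deliberately naive normalization maps source-specific tags to one canonical external ID, and each matched time series is then linked to its asset. Production contextualization uses far more robust matching; the tag formats and names here are hypothetical.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import TimeSeriesUpdate

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="contextualizer")

def canonical_id(source_tag: str) -> str:
    # Deliberately naive: "pt_21-pi-1019" and "21PI1019" both become "21PI1019".
    return source_tag.upper().replace("PT_", "").replace("-", "")

for ts in client.time_series.list(limit=None):
    if not ts.name:
        continue
    asset = client.assets.retrieve(external_id=canonical_id(ts.name))
    if asset is not None:
        # Link the time series to the one globally unique asset it measures.
        client.time_series.update(TimeSeriesUpdate(id=ts.id).asset_id.set(asset.id))
```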

The second part of our contextualization process is to group assets together based on how they are connected in the real world, for instance, through physical flows. Fluid can flow from a pump through a pipe and a valve into a tank, and Cognite Data Fusion should know about those connections so that a user can, for example, ask for all the pressure readings along a flowline.
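
Once those connections exist, "all the pressure readings along a flowline" becomes a traversal plus a filter. A minimal sketch, assuming the flowline is modeled as an asset subtree and that its pressure series share a unit (all identifiers hypothetical):

```python
from cognite.client import CogniteClient

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="flowline-query")

# Everything connected under the (hypothetical) flowline asset: pump, pipe, valve, tank, ...
flowline = client.assets.retrieve_subtree(external_id="flowline-07")

# Pressure time series attached to any asset along the flowline.
pressure_series = client.time_series.list(
    asset_ids=[a.id for a in flowline], unit="barg", limit=None
)
for ts in pressure_series:
    print(ts.name, ts.unit)
```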

Data indexing, aggregations and model hosting

A single time series with raw data is often several gigabytes in size, and we store hundreds of thousands of such time series. It would be extremely inefficient to download all of that data and subsample it just to display and visualize the information. To ensure fast aggregations and queries, Cognite Data Fusion indexes the data in different ways and supports search, filtering, and computations on the data as part of the API. This way, you can build powerful and responsive applications with little effort.
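
For example, rather than downloading years of raw datapoints, a client can ask the API for server-side aggregates. A sketch using the Python SDK (the time series external ID is hypothetical, and the datapoints call shown is the older SDK form):

```python
from cognite.client import CogniteClient

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="dashboard")

# One hourly aggregate per point instead of gigabytes of raw data --
# the aggregation runs server-side on the indexed data.
dps = client.datapoints.retrieve(
    external_id="21PI1019",
    start="30d-ago",
    end="now",
    aggregates=["average", "min", "max"],
    granularity="1h",
)
print(len(dps.average), "hourly averages")
```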

Another way to run calculations near the data is to use the model hosting environment provided as part of Cognite Data Fusion. It can run custom Python code to make predictions from machine learning models or calculations from physics-based simulators.
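
The hosting environment's exact contract is outside the scope of this overview, but schematically it runs user-supplied Python code with a prediction entry point along these lines (purely illustrative; this is not the actual hosting interface):

```python
# Schematic of the kind of custom code the hosting environment could run;
# the real interface and lifecycle are defined by the hosting environment itself.
class PumpDegradationModel:
    def __init__(self, coefficients):
        self.coefficients = coefficients  # e.g. fitted offline from historical data

    def predict(self, pressure: float, flow_rate: float) -> float:
        # Toy linear model standing in for an ML model or physics-based simulator.
        a, b, c = self.coefficients
        return a * pressure + b * flow_rate + c

model = PumpDegradationModel(coefficients=(0.4, -0.2, 1.5))
print(model.predict(pressure=3.2, flow_rate=120.0))
```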

3D models and visualization

A unique feature is the ability to ingest large 3D models, several gigabytes in size, and process them so that you can view them in a regular browser, even on your phone. Combined with the Cognite 3D web viewer, you can visualize industrial data in its physical context in a user-friendly way.
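
The processed models are available through the same API as every other resource. A minimal listing with the Python SDK (client setup hedged as in the earlier sketches):

```python
from cognite.client import CogniteClient

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="3d-browser")

# List the processed 3D models and the revisions the web viewer can stream.
for model in client.three_d.models.list():
    for revision in client.three_d.revisions.list(model_id=model.id):
        print(model.name, revision.id, revision.status)
```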

Data governance

Cognite Data Fusion (CDF) provides tools and features to maintain high quality throughout the complete lifecycle of your data and to ensure that the data conforms to user and organizational expectations:

  • Secure access management to control which users, apps, and services have access to the different resource types (data sets, assets, files, events, time series, etc.) in CDF.

  • Data sets group and track data by its source. They let you document and track data lineage, ensure data integrity, and allow third parties to write their insights securely back to your CDF project (see the sketch after this list).

  • Data quality monitoring to track the quality of the time series data that apps and models running on data from CDF depend on.
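
As a sketch of the data set mechanics referenced above, the snippet below creates a data set for one source and then filters assets by it. The names are hypothetical, and the exact filter arguments vary between SDK versions:

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import DataSet

# Hypothetical credentials; client setup varies by SDK version.
client = CogniteClient(api_key="...", project="my-project", client_name="governance")

# Group data from one source under a data set so lineage can be documented and tracked.
client.data_sets.create(DataSet(external_id="src:erp", name="ERP equipment data"))

# Later: retrieve only the assets that belong to that source's data set.
erp_assets = client.assets.list(data_set_external_ids=["src:erp"], limit=None)
print(len(erp_assets), "assets from the ERP data set")
```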

API, SDKs and Data Science toolkit

All the information liberated by Cognite is available through a modern REST-based API. The API does not expose any of the details of the technology used inside Cognite Data Fusion. Cognite can rapidly innovate and replace internal components as new ones become available, without breaking any of the applications that already use the API.
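
For instance, listing assets is a plain HTTP request against a versioned endpoint. In the sketch below, the cluster URL, project name, and token are placeholders:

```python
import requests

# Hypothetical cluster, project, and OAuth token -- substitute your own.
BASE_URL = "https://api.cognitedata.com"
PROJECT = "my-project"
TOKEN = "..."

resp = requests.get(
    f"{BASE_URL}/api/v1/projects/{PROJECT}/assets",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"limit": 10},
)
resp.raise_for_status()
for asset in resp.json()["items"]:
    print(asset["id"], asset["name"])
```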

In addition to a well-documented API, Cognite provides connectors and SDKs for many common programming languages and analytics tools, such as Python, JavaScript, Spark, OData and Grafana. We also offer a library of low-code reusable components that make it simple to build web applications on top of Cognite Data Fusion.

See also
Video: The Cognite Architecture
