About data pipelines

To analyze and contextualize your data in Cognite Data Fusion (CDF), you need to establish efficient data pipelines between your existing data infrastructure and CDF. Data pipelines consist of three main components: data integration, data transformation, and data contextualization.

Data integration

CDF offers two approaches for data integration:

  • Extractors are services that transfer data from external systems into CDF. Extractors perform one-way communication, moving data from source systems into CDF's specialized storage or the staging area (CDF RAW).

  • Connectors are services that enable bi-directional data transfer between external systems and CDF. Connectors can both read from and write to external systems, providing more flexibility in data synchronization.
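The key difference between the two is the direction of data flow. As an illustration, here is a minimal sketch of the one-way extractor pattern, using a hypothetical in-memory source system and staging table in place of the real services (which you would reach through an SDK or API):

```python
from typing import Any

# Hypothetical stand-ins for a source system and the CDF RAW staging area.
source_system: list[dict[str, Any]] = [
    {"id": "pump-01", "pressure": 3.0},
    {"id": "pump-02", "pressure": 2.9},
]
cdf_raw: dict[str, list[dict[str, Any]]] = {}


def extract(table: str) -> int:
    """One-way extractor: read rows from the source and stage them in RAW.

    The source is only read, never written back to; a connector would
    additionally push changes in the other direction.
    """
    rows = [dict(row) for row in source_system]  # copy, read-only access
    cdf_raw.setdefault(table, []).extend(rows)
    return len(rows)


extract("pumps")
```

A connector would add a second function that writes changes from CDF back into `source_system`; the extractor sketch above deliberately has no such path.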

Depending on their implementation, extractors and connectors write data to different destinations in CDF. Some integrations write directly to specialized and optimized storage for specific resource types, such as time series, assets, and simulations. Other integrations write to CDF RAW, the staging area for ingesting data. Some integrations support multiple destination options.

Always refer to the specific extractor or connector documentation to understand the available data destinations and capabilities for your integration.

Data transformation

The data transformation component shapes data to fit a CDF data model. When working with staged data in CDF RAW, you can use Transformations to process and structure your data. The transformation component typically contains most of the data processing logic in your pipeline.
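As a sketch of what such a transformation step does, the function below reshapes staged rows into a target model: renaming fields and converting units. The row shape and the bar-to-kPa conversion are illustrative assumptions, not part of any specific CDF data model:

```python
from typing import Any


def transform(raw_rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Shape staged RAW rows into the target data model.

    Illustrative only: renames "id" to "external_id" and converts an
    assumed pressure reading from bar to kilopascals.
    """
    return [
        {
            "external_id": row["id"],
            "pressure_kpa": row["pressure"] * 100,  # bar -> kPa
        }
        for row in raw_rows
    ]


staged = [{"id": "pump-01", "pressure": 3.0}]
modeled = transform(staged)
```

In a real pipeline this logic would typically live in a Transformation (for example as SQL against CDF RAW tables) rather than in application code, but the shape of the work is the same: map staged fields onto the model your consumers expect.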

Data contextualization

The data contextualization component establishes relationships between data elements from different source systems within your data model. It combines machine learning capabilities, a rules engine, and domain expertise to create meaningful connections between your data resources.

Best practices

Regardless of the tools you use, we recommend a modular design for your data pipelines. This makes the pipelines easier to maintain and individual components simpler to update or replace.
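One way to picture modular design is a pipeline built from independent stages that share a common interface, so any stage can be swapped without touching the others. The stages below are hypothetical examples:

```python
from typing import Any, Callable

# Every stage takes rows and returns rows, so stages compose freely.
Stage = Callable[[list[dict[str, Any]]], list[dict[str, Any]]]


def run_pipeline(rows: list[dict[str, Any]], stages: list[Stage]) -> list[dict[str, Any]]:
    """Run each stage in order, feeding its output to the next."""
    for stage in stages:
        rows = stage(rows)
    return rows


def drop_nulls(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [r for r in rows if r.get("value") is not None]


def rescale(rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    return [{**r, "value": r["value"] * 10} for r in rows]


result = run_pipeline([{"value": 1}, {"value": None}], [drop_nulls, rescale])
```

Because each stage only depends on the shared row format, replacing `rescale` with a different cleaning step, or reordering stages, is a one-line change.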

If you're using extractors that write to CDF RAW, make sure that you lift the data out of the source systems and into the staging area before you start the data transformation. This reduces the load on your source systems.

When you select an integration method, consider your specific needs. Extractors are ideal when you only need to move data from external systems into CDF, while connectors are better suited when you need bi-directional data synchronization between CDF and external systems. Write directly to specialized storage when your data is ready for immediate use, and use CDF RAW when your data requires additional transformation before reaching its final form.