About data pipelines

To analyze and contextualize your data in Cognite Data Fusion (CDF), you need to establish efficient data pipelines between your existing data infrastructure and CDF.

Data pipelines are typically composed of an integration component, a transform component, and a contextualization component.

  • The integration component connects to the source system and pushes data to the staging area.

  • The transform component shapes and moves the data from the staging area into a data model. This is the component that typically hosts most of the data processing logic.

  • The contextualization component lets you combine machine learning, a rules engine, and domain expertise to map resources from different source systems to each other in a data model.
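The three components above can be sketched as three small functions chained together. This is only an illustrative sketch in plain Python and does not use the CDF API: the function names (`integrate`, `transform`, `contextualize`), the row layout, and the prefix-matching rule are all assumptions made for the example, and the staging area and data model are stood in for by in-memory lists and dictionaries.

```python
def integrate(source_rows):
    """Integration: copy rows from the source to staging without changing them."""
    return [dict(row) for row in source_rows]


def transform(staged_rows):
    """Transform: shape staged rows (normalize tags, parse values)."""
    return [
        {"tag": row["tag"].strip().upper(), "value": float(row["value"])}
        for row in staged_rows
    ]


def contextualize(sensors, asset_names):
    """Contextualization: a single hypothetical rule that links a sensor
    to the first asset whose name is a prefix of the sensor tag."""
    links = {}
    for sensor in sensors:
        for asset in asset_names:
            if sensor["tag"].startswith(asset + "-"):
                links[sensor["tag"]] = asset
                break
    return links


# Hypothetical data from two different source systems.
source_rows = [
    {"tag": " p-101-temp ", "value": "71.5"},
    {"tag": "V-200-PRES", "value": "3.2"},
]
asset_names = ["P-101", "V-200"]

staged = integrate(source_rows)
sensors = transform(staged)
links = contextualize(sensors, asset_names)
print(links)  # {'P-101-TEMP': 'P-101', 'V-200-PRES': 'V-200'}
```

In a real pipeline the contextualization step would typically combine several rules with machine learning suggestions and expert review; the single rule here only shows where that logic sits relative to the other two components.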

Regardless of the tools you use, we recommend a modular design for your data pipelines so that each component can be maintained independently. To reduce the load on your source systems, copy the data out of the source systems and into the staging area before you start transforming it.
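The effect of staging on source load can be illustrated with a minimal sketch. It assumes a hypothetical `CountingSource` class that counts how often it is read, and two hypothetical transforms; none of these names come from CDF. The point is that the source is read once, and every transform then works from the staged copy.

```python
class CountingSource:
    """Hypothetical source system that records how many times it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def read_all(self):
        self.reads += 1
        return [dict(row) for row in self._rows]


def extract_to_staging(source):
    """Read the source once and keep an untransformed copy as staging."""
    return source.read_all()


def to_celsius(staged_rows):
    """First transform: convert Fahrenheit readings to Celsius."""
    return [round((row["temp_f"] - 32) * 5 / 9, 1) for row in staged_rows]


def tag_names(staged_rows):
    """Second transform: list the distinct tags in sorted order."""
    return sorted({row["tag"] for row in staged_rows})


source = CountingSource([
    {"tag": "P-101", "temp_f": 212.0},
    {"tag": "V-200", "temp_f": 32.0},
])

staged = extract_to_staging(source)  # the only read against the source
celsius = to_celsius(staged)         # works from staging
tags = tag_names(staged)             # works from staging

print(source.reads)  # 1
```

Because each transform is a separate function that takes staged rows as input, a transform can be added or changed without triggering another read against the source system.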