About data transformation

Data transformation is the process of changing your data set from one state into another and is a core part of a data integration workflow.

Cognite Data Fusion (CDF) ships with a built-in data transformation tool, CDF Transformations, and integrates with many other data transformation technologies. Which tool to use depends on your transformation requirements and your technology preferences.

Use data transformations to:

  • Re-shape the data to fit a target data model. For example, read a data object from the CDF staging area (RAW) and shape it into an event.
  • Enrich the data with more information. For example, add data from other sources or run a feature generation algorithm.
  • Contextualize the data by matching or comparing it with other data objects in your collection.
  • Analyze the quality of the data. For example, check whether all the required information is present in the data object.
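
As a minimal sketch of the re-shaping and quality-check steps listed above, the following plain Python example turns a staging row into an event-like dictionary. It does not use the CDF API: the row layout, the column names, and the helper functions `missing_columns`, `to_epoch_ms`, and `to_event` are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

# Hypothetical staging row; the column names are illustrative, not a CDF schema.
raw_row = {
    "key": "WO-1001",
    "columns": {
        "description": "Replace pump seal",
        "start": "2023-05-01T08:00:00+00:00",
        "end": "2023-05-01T10:30:00+00:00",
        "asset_tag": "P-101",
    },
}

# Assumed quality rule: these columns must be present and non-empty.
REQUIRED_COLUMNS = ("description", "start", "end")


def missing_columns(row):
    """Return the required columns that are absent or empty in the row."""
    cols = row["columns"]
    return [name for name in REQUIRED_COLUMNS if not cols.get(name)]


def to_epoch_ms(iso_timestamp):
    """Convert an ISO 8601 timestamp with a UTC offset to epoch milliseconds."""
    return int(datetime.fromisoformat(iso_timestamp).timestamp() * 1000)


def to_event(row):
    """Shape a staging row into an event-like dictionary."""
    cols = row["columns"]
    return {
        "external_id": row["key"],
        "description": cols["description"],
        "start_time": to_epoch_ms(cols["start"]),
        "end_time": to_epoch_ms(cols["end"]),
        "metadata": {"asset_tag": cols.get("asset_tag", "")},
    }


# Check quality first, then re-shape only rows that pass the check.
problems = missing_columns(raw_row)
event = to_event(raw_row) if not problems else None
```

Running the quality check before the re-shaping step keeps incomplete rows out of the target model, so they can be reported or fixed at the source instead.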

The primary function of data transformation is to move the data from the CDF staging area, or a similar staging system, into the CDF data model, where the data can be further enriched with more relationships for in-depth analytics and real-time insight.

Transformation tools

Data transformation can be a resource-intensive job. The transformation logic may be complex, the data volumes may be large, and the transformation job may need to complete within a small time window. This requires the data transformation tool to be performant, scalable, and robust. We recommend that you use tools that offer high capacity and are fault-tolerant.

Transformation tool | Description
CDF Transformations | With CDF Transformations, you can use Spark SQL queries to transform data from the CDF staging area, RAW, into the CDF data model and continuously monitor the transformations to solve any issues before they reach the data consumer. CDF Transformations is an integrated part of CDF, and you can run it in your browser.
Databricks | Databricks is a collaborative, Jupyter-style notebook application that lets you analyze and transform data in CDF using distributed cloud computing, Spark, and the Cognite Spark Data Source.
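
To give a rough idea of what a SQL-based transformation from a staging table into event-shaped rows can look like, the sketch below runs a small query with Python's built-in `sqlite3` module. SQLite is used here only as a local stand-in for Spark SQL, so this is not how CDF Transformations or Databricks execute queries. The table `workorders`, its columns, and the output column names are hypothetical.

```python
import sqlite3

# Hypothetical staging table and column names, for illustration only.
TRANSFORM_QUERY = """
SELECT
    row_key                   AS external_id,
    description               AS description,
    CAST(start_ms AS INTEGER) AS start_time,
    CAST(end_ms AS INTEGER)   AS end_time
FROM workorders
WHERE description IS NOT NULL
"""

# An in-memory SQLite database stands in for the staging area.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workorders "
    "(row_key TEXT, description TEXT, start_ms TEXT, end_ms TEXT)"
)
conn.executemany(
    "INSERT INTO workorders VALUES (?, ?, ?, ?)",
    [
        ("WO-1001", "Replace pump seal", "1000", "2000"),
        ("WO-1002", None, "3000", "4000"),  # filtered out by the WHERE clause
    ],
)

# The query renames columns, casts text timestamps to integers,
# and drops rows that lack a description.
rows = conn.execute(TRANSFORM_QUERY).fetchall()
conn.close()
```

The query combines three common transformation steps in one statement: renaming columns to match the target model, casting staging values to the target types, and filtering out rows that fail a basic quality rule.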