With CDF Transformations, you can use Spark SQL queries to transform data from the CDF staging area, RAW, into the CDF data model. You can continuously monitor the transformations to solve any issues before they reach the data consumer.
Before you start
Make sure you have completed the steps in this article to register an app for the transformation in your identity provider (IdP) and to set up the necessary folders and capabilities to run or schedule transformations.
Create a transformation
Navigate to Cognite Data Fusion and select Integrate > Transform data.
Select Create transformation, then enter a unique name and a unique external ID.
Optionally, associate the transformation with an existing data set.
Tip: To make a copy of an existing transformation, select ... > Duplicate after you've created the transformation.
Under Destination, select the CDF resource type you want to ingest data into. The Destination schema section lists the required and optional properties for the destination you select. Required columns are marked with an exclamation point.
Note:
- If you're ingesting data into the Assets resource type, make sure a parent asset already exists in CDF.
- If you're ingesting data into RAW rows, specify the RAW database and table you want to write to.
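Before you write the query, it can help to check output rows against the destination's required columns. The following Python sketch does this for a hypothetical asset schema. The contents of ASSET_SCHEMA and the function missing_required are illustrative assumptions, not the actual CDF schema; the authoritative list is the one shown under Destination schema.

```python
from typing import Any, Dict, Set

# Hypothetical schema for illustration only. The real required and optional
# columns are the ones shown under Destination schema in CDF.
ASSET_SCHEMA: Dict[str, Set[str]] = {
    "required": {"name"},
    "optional": {"externalId", "parentExternalId", "description"},
}


def missing_required(row: Dict[str, Any], schema: Dict[str, Set[str]]) -> Set[str]:
    """Return the required columns that are absent from a row or set to null."""
    return {col for col in schema["required"] if row.get(col) is None}


row = {"externalId": "pump-01", "parentExternalId": "site-a"}
print(missing_required(row, ASSET_SCHEMA))  # {'name'}
```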
Under Action, select how you want to handle data already at the destination.
Under For incoming NULL values on updates, specify how the transformation should handle null values in the incoming data when it updates existing data at the destination.
Select Keep existing values to leave an existing value unchanged when the incoming value is null. This is the default setting.
Select Clear existing values to overwrite an existing value with null when the incoming value is null, for example, when a piece of equipment is removed for maintenance. Use this option to disassociate an asset from its parent in the asset hierarchy.
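The difference between the two null-handling options can be modeled as merging an incoming row into an existing item. The function apply_update and the field names below are hypothetical; this is a sketch of the behavior described above, not how CDF implements it.

```python
from typing import Any, Dict, Optional


def apply_update(
    existing: Dict[str, Any],
    incoming: Dict[str, Optional[Any]],
    clear_on_null: bool = False,
) -> Dict[str, Any]:
    """Merge an incoming row into an existing item.

    clear_on_null=False models "Keep existing values": a null incoming value
    leaves the existing value untouched.
    clear_on_null=True models "Clear existing values": a null incoming value
    sets the field to null.
    """
    result = dict(existing)
    for field, value in incoming.items():
        if value is None and not clear_on_null:
            continue  # keep the existing value
        result[field] = value  # overwrite, possibly with None
    return result


existing = {"name": "Pump 01", "parentExternalId": "site-a"}
incoming = {"name": "Pump 01", "parentExternalId": None}

print(apply_update(existing, incoming))
# {'name': 'Pump 01', 'parentExternalId': 'site-a'}
print(apply_update(existing, incoming, clear_on_null=True))
# {'name': 'Pump 01', 'parentExternalId': None}
```

With clear_on_null=True, the null parentExternalId detaches the asset from its parent, which matches the disassociation use case described above.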
In the SQL editor, specify a Spark SQL query to select and transform RAW data.
To preview the RAW tables, select RAW explorer in the sidebar.
Select Preview to verify that the transformation produces the expected output. To change the maximum number of rows to read from the data sources, select a source limit.
The table headings in the Query results preview window show the required data types for the selected destination resource type.
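To illustrate what a query in the SQL editor does, the following Python sketch applies a simple projection from RAW rows to asset-style columns and reads at most source_limit rows, similar in spirit to a preview with a source limit. The Spark SQL in the comment, the database and table names, the function preview, and the assumption that each RAW row's key is available as a column named key are all illustrative; check the RAW explorer and the Destination schema for the actual names.

```python
from itertools import islice
from typing import Any, Dict, Iterable, Iterator, Optional

# Illustrative Spark SQL for the SQL editor (all names are hypothetical):
#
#   SELECT key AS externalId, name, parent AS parentExternalId
#   FROM `plant_db`.`equipment`
#
# The function below applies the same projection to in-memory rows.


def preview(
    raw_rows: Iterable[Dict[str, Any]], source_limit: Optional[int] = None
) -> Iterator[Dict[str, Any]]:
    """Project RAW rows onto destination columns, reading at most source_limit rows."""
    rows = raw_rows if source_limit is None else islice(raw_rows, source_limit)
    for row in rows:
        yield {
            "externalId": row["key"],
            "name": row.get("name"),
            "parentExternalId": row.get("parent"),
        }


raw = [
    {"key": "pump-01", "name": "Pump 01", "parent": "site-a"},
    {"key": "pump-02", "name": "Pump 02", "parent": "site-a"},
    {"key": "valve-07", "name": "Valve 07", "parent": None},
]

for out in preview(raw, source_limit=2):
    print(out)
```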
The first time you run a transformation, you must specify the credentials the transformation should use to authenticate with CDF.
Select ... > Set credentials to specify the Client ID and the Client secret for the app you registered for the transformation in Azure AD. CDF automatically fills in the remaining fields.
If you don't know what values to enter in these fields, contact your internal help desk or the CDF admin for help.
Optionally, you can specify separate credentials for reading and writing data, for example, to transform data between different projects.
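For context, a client ID and client secret are commonly exchanged for an access token using the OAuth 2.0 client credentials flow. The Python sketch below only builds, and does not send, such a token request. It assumes Azure AD's v2.0 token endpoint and a CDF scope of the form https://<cluster>.cognitedata.com/.default; the function build_token_request, the tenant ID, the cluster name, and the credentials are placeholders, and the fields CDF fills in for your project may differ.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(
    tenant_id: str, client_id: str, client_secret: str, cluster: str
) -> Request:
    """Build (but do not send) an OAuth 2.0 client credentials token request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Assumed CDF scope format; use the scope CDF fills in for your project.
            "scope": f"https://{cluster}.cognitedata.com/.default",
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values only.
req = build_token_request("my-tenant-id", "my-client-id", "my-secret", "westeurope-1")
print(req.get_method(), req.full_url)
```

Keeping the secret out of source code, for example by reading it from an environment variable, is the usual practice when running anything like this outside a sketch.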
Select Run now to start the transformation manually, or schedule it to run at regular intervals:
Select Schedule to specify when and how often the transformation should run.
Select a predefined schedule or specify a cron expression.
For example, the cron expression 45 23 * * * runs the transformation at 23:45 (11:45 PM) every day.
Select Set schedule to activate the schedule. When you schedule a transformation, CDF sets it to read-only to prevent unintentional changes to future scheduled jobs.
Tip: To edit credentials, schedules, and notifications for the selected transformation, navigate to the Home section in the sidebar.
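To check when a daily cron expression such as 45 23 * * * fires next, the following Python sketch handles only the simple "minute hour * * *" form. The function names parse_daily_cron and next_daily_run are hypothetical, and the sketch uses naive datetimes, so it ignores time zones entirely.

```python
from datetime import datetime, timedelta
from typing import Tuple


def parse_daily_cron(expr: str) -> Tuple[int, int]:
    """Parse a cron expression of the form 'M H * * *' into (minute, hour)."""
    fields = expr.split()
    if len(fields) != 5 or fields[2:] != ["*", "*", "*"]:
        raise ValueError(f"only 'M H * * *' expressions are supported: {expr!r}")
    minute, hour = int(fields[0]), int(fields[1])  # raises ValueError on '*/5' etc.
    if not (0 <= minute <= 59 and 0 <= hour <= 23):
        raise ValueError(f"minute or hour out of range: {expr!r}")
    return minute, hour


def next_daily_run(expr: str, now: datetime) -> datetime:
    """Return the first time strictly after `now` at which the expression fires."""
    minute, hour = parse_daily_cron(expr)
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed; use tomorrow's
    return candidate


print(next_daily_run("45 23 * * *", datetime(2024, 1, 10, 12, 0)))  # 2024-01-10 23:45:00
print(next_daily_run("45 23 * * *", datetime(2024, 1, 10, 23, 50)))  # 2024-01-11 23:45:00
```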
To monitor your transformations and resolve issues before they reach data consumers, you can subscribe to email notifications that are sent when a transformation fails.
Navigate to the Home section in the sidebar and select Notifications > Edit.