Transform data

With CDF Transformations, you can use Spark SQL queries to transform data from the CDF staging area, RAW, into the CDF data model. You can continuously monitor the transformations to solve any issues before they reach the data consumer.

tip

You can also transform data using Databricks, the Cognite API, the Cognite Python SDK, and the Transformations CLI.

Before you start

Make sure you have completed the steps in this article to register an app for the transformation in your identity provider (IdP) and to set up the necessary folders and capabilities to run or schedule transformations.

Create a transformation

  1. Navigate to Cognite Data Fusion and select Integrate > Transform data.

  2. Select Create transformation, and enter a unique name and a unique external ID.

  3. Optionally, associate the transformation with an existing data set.

    tip

    To make a copy of an existing transformation, select ... > Duplicate when you've created the transformation.

  4. Under Destination, select the CDF resource type you want to ingest data into. The Destination schema section lists the required and optional properties for the destination you select. Required columns are marked with an exclamation point.

    Note
    • If you're ingesting data into the Assets resource type, make sure a parent asset already exists in CDF.
    • If you're ingesting data into RAW rows, specify the RAW database and table you want to write to.
  5. Under Action, select how you want to handle data already at the destination.

  6. Under For incoming NULL values on updates, specify how the transformation should set null values when you update existing data at the destination.

    • Select Keep existing values to leave existing data unchanged when the incoming value is null. This is the default setting.

    • Select Clear existing values to set existing values to null, for example, when a piece of equipment is removed for maintenance. Use this option to disassociate the asset from its parent in the asset hierarchy.

  7. In the SQL editor, specify a Spark SQL query to select and transform RAW data.

    To preview the RAW tables, select RAW explorer in the sidebar.

  8. Select Preview to verify that the transformation produces the expected output. To change the maximum number of rows to read from the data sources, select a source limit.

    The table headings in the Query results preview window show the required data types for the selected destination resource type.

  9. The first time you run a transformation, you must specify the credentials the transformation should use to authenticate with CDF.

    Select ... > Set credentials to specify the Client ID and the Client secret for the app you registered for the transformation in Azure AD. CDF automatically fills in the remaining fields.

    If you don't know what values to enter in these fields, contact your internal help desk or the CDF admin for help.

    Optionally, you can specify separate credentials for reading and writing data, for example, to transform data between different projects.

  10. Select Run now to start the transformation manually, or follow the steps under Schedule transformations to run it at regular intervals.
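The two options in step 6 differ only in how a null in the incoming row is treated when an existing item is updated. The sketch below models that difference with plain Python dictionaries. It is an illustration of the behavior described above, not how CDF implements updates: `apply_update`, the `clear_nulls` flag, and the sample asset fields are hypothetical names chosen for this example.

```python
from typing import Any, Dict, Optional


def apply_update(
    existing: Dict[str, Any],
    incoming: Dict[str, Optional[Any]],
    clear_nulls: bool,
) -> Dict[str, Any]:
    """Merge an incoming row into an existing record (hypothetical helper).

    clear_nulls=False models "Keep existing values".
    clear_nulls=True models "Clear existing values".
    """
    result = dict(existing)
    for key, value in incoming.items():
        if value is None and not clear_nulls:
            continue  # keep the value already at the destination
        result[key] = value
    return result


# Sample data, invented for illustration only.
existing = {
    "name": "Pump 23",
    "parentExternalId": "station-1",
    "description": "Main feed pump",
}
incoming = {"name": "Pump 23", "parentExternalId": None}

kept = apply_update(existing, incoming, clear_nulls=False)
cleared = apply_update(existing, incoming, clear_nulls=True)

print(kept["parentExternalId"])     # the parent link is preserved
print(cleared["parentExternalId"])  # the parent link is cleared
```

In both cases, fields that are absent from the incoming row (here, `description`) are left as they were; only fields that arrive as null are affected by the setting.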

Schedule transformations

  1. Select Schedule to specify when and how often the transformation should run.

  2. Select a predefined schedule or specify a cron expression.

    For example, 45 23 * * * will run the transformation at 23:45 (11:45 PM) every day.

  3. Select Set schedule to activate the schedule. When you schedule a transformation, CDF sets it to read-only to prevent unintentional changes to future scheduled jobs.

    tip

    To edit credentials, schedules, and notifications for the selected transformation, navigate to the Home section in the sidebar.
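To read a cron expression such as the one in step 2, it can help to label each field. The sketch below assumes the conventional five-field cron layout (minute, hour, day of month, month, day of week), which matches the example `45 23 * * *`. `split_cron` is a hypothetical helper written for this illustration; it only splits and labels the fields and does not validate ranges or any extended syntax.

```python
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_cron(expression: str) -> dict:
    """Label the five fields of a standard cron expression (hypothetical helper)."""
    parts = expression.split()
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}")
    return dict(zip(FIELD_NAMES, parts))


fields = split_cron("45 23 * * *")
for name, value in fields.items():
    print(f"{name}: {value}")
```

Here the minute field is `45` and the hour field is `23`, while the `*` in the remaining fields means "every", which gives 23:45 every day.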

Monitor transformations

To monitor the transformation process and solve any issues before they reach the data consumer, you can subscribe to email notifications that are sent when a transformation fails.

  1. Navigate to the Home section in the sidebar and select Notifications > Edit.