Transformation architecture and purpose
A transformation specifies how source data is mapped and written to a target structure in CDF. Transformations can enrich data with other sources or calculations, contextualize it by matching related objects, and check data quality before it reaches downstream users. A common pattern is to read staged data from CDF RAW and write it to a structured target such as a data model. This helps you deliver consistent, queryable data to apps and workflows, for example when you standardize equipment metadata before it is used in dashboards.

Transformations are developer-centric tools for data engineers and developers who define schemas, write SQL, and manage pipelines. CDF Transformations run on a managed Spark SQL engine: you express the logic in SQL (or map fields in the UI), and CDF handles scheduling, scaling, and access to CDF data sources.

- Start with SQL transformations when your logic is declarative and best expressed as set-based operations across tables.
- Use CDF Functions when you need Python logic, external libraries, or custom API calls.
- Use the Cognite Toolkit when you want to manage transformations as code. The Toolkit lets you define transformations, schedules, and notifications in YAML (with optional SQL files) and deploy them through CI/CD.
- You can also run transformations using the Cognite API or the Cognite Python SDK.
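To illustrate the SQL-first approach, a minimal transformation query might read staged rows from a RAW table and shape them for a structured target. This is a sketch: the database, table, and column names (`db_staging`, `equipment`, `serial_number`, `installed_at`) are hypothetical placeholders, not real resources.

```sql
-- Hypothetical example: read staged rows from a RAW table and shape them
-- for a structured target. All identifiers are placeholders.
select
  cast(`externalId` as string)     as externalId,
  upper(trim(`name`))              as name,           -- standardize metadata
  cast(`serial_number` as string)  as serialNumber,
  to_timestamp(`installed_at`)     as installedDate
from `db_staging`.`equipment`
where `externalId` is not null                        -- basic quality check
```

Because the logic is a single set-based SELECT over a table, it fits the declarative, SQL-first pattern above; procedural steps or external API calls would instead point you toward CDF Functions.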
Target selection and operational constraints
Decide early where data should land, and avoid defaulting to the CDF staging area (RAW) as the final destination. Use RAW for staging, then write to a target that matches how the data will be used.

Create a transformation
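When you manage transformations as code with the Cognite Toolkit, the target choice is stated explicitly in the destination of the YAML definition. The fragment below is a hedged sketch, assuming the Toolkit's transformation resource schema; the external ID, names, and query are placeholders.

```yaml
# Hypothetical Toolkit definition: the destination is a typed target
# (assets), not RAW. Field names assume the Toolkit transformation
# schema; all identifiers are placeholders.
externalId: tr_equipment_to_assets
name: Equipment to assets
destination:
  type: assets
conflictMode: upsert
query: >-
  select externalId, name from `db_staging`.`equipment`
```

Keeping the destination in version-controlled YAML makes the staging-versus-target decision reviewable in CI/CD rather than buried in UI settings.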
Define and run your first transformation in CDF.
SQL syntax and functions
Reference for SQL syntax and custom functions.