
YAML configuration reference

The YAML resource configuration files are core to the CDF Toolkit. Each file configures one of the resource types supported by the CDF Toolkit and the CDF API. This article describes how to configure the different resource types.

The CDF Toolkit bundles logically connected resource configuration files in modules, and each module stores the configuration files in directories corresponding to the resource types, called resource directories. The available resource directories are:

./<module_name>/
├── auth/
├── data_models/
├── data_sets/
├── extraction_pipelines/
├── files/
├── functions/
├── labels/
├── raw/
├── timeseries/
├── timeseries_datapoints/
├── transformations/
└── workflows/

Note that a resource directory can host more than one configuration type. For example, the data_models/ directory hosts the configuration files for spaces, containers, views, data models, and nodes, while the labels/ directory hosts only the configuration files for labels.

When you deploy, the CDF Toolkit uses the CDF API to implement the YAML configurations in the CDF project.

In general, the format of the YAML files matches the API specification for the resource types. We recommend that you use the externalId of the resources as (part of) the name of the YAML file. Use number prefixes (1.<filename.suffix>) to control the order of deployment within each resource type.
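
For example, to make sure one transformation is always deployed before another, you could name the files like this (hypothetical filenames):

transformations/
├── 1.tr_create_hierarchy.yaml
└── 2.tr_enrich_assets.yaml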

Groups

Resource directory: auth/

API documentation: Groups

The group configuration files are stored in the module's auth/ directory. Each group has a separate YAML file.

The name field is used as a unique identifier for the group. If you change the name of the group manually in CDF, it will be treated as a different group and will be ignored by the CDF Toolkit.

Example group configuration:

my_group.yaml
name: 'my_group'
sourceId: '{{mygroup_source_id}}'
metadata:
  origin: 'cognite-toolkit'
capabilities:
  - projectsAcl:
      actions:
        - LIST
        - READ
      scope:
        all: {}

We recommend using the metadata:origin property to indicate that the group is created by the CDF Toolkit. Populate the sourceId with the group ID from the CDF project's identity provider.

You can specify each ACL capability in CDF as in the projectsAcl example above. Scoping to dataset, space, RAW table, current user, or pipeline is also supported (see ACL scoping).

Groups and group deletion

If you delete groups with the cdf-tk clean command, the CDF Toolkit skips the groups that the running user or service account is a member of. This prevents the cleaning operation from removing access rights from the running user and potentially locking the user out from further operation.

ACL scoping

Dataset scope

Use to restrict access to data in a specific data set.

<fragment>
- threedAcl:
    actions:
      - READ
    scope:
      datasetScope: { ids: ['my_dataset'] }

Note

The groups API uses an internal ID for the data set, while the YAML configuration files reference the external ID. The CDF Toolkit resolves the external ID to the internal ID before sending the request to the CDF API.

Space scope

Use to restrict access to data in a data model space.

<fragment>
- dataModelInstancesAcl:
    actions:
      - READ
    scope:
      spaceIdScope: { spaceIds: ['my_space'] }

Table scope

Use to restrict access to a database or a table in a database.

<fragment>
- rawAcl:
    actions:
      - READ
      - WRITE
    scope:
      tableScope:
        dbsToTables:
          my_database:
            tables: []

Current user scope

Use to restrict actions to the groups the user is a member of.

<fragment>
- groupsAcl:
    actions:
      - LIST
      - READ
    scope:
      currentuserscope: {}
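
Pipeline scope

Use to restrict access to one or more extraction pipelines. A minimal sketch, assuming an extractionRunsAcl capability scoped with extractionPipelineScope to a pipeline with the external ID 'my_pipeline' — check the Groups API reference for which ACLs support this scope:

<fragment>
- extractionRunsAcl:
    actions:
      - READ
    scope:
      extractionPipelineScope: { ids: ['my_pipeline'] }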

Security categories

Requires CDF Toolkit v0.2.0 or later

Resource directory: auth/

API documentation: Security categories

The security categories are stored in the module's auth/ directory. You can have one or more security categories in a single YAML file.

We recommend that you start security category names with sc_ and use _ to separate words. You'll get a warning if you don't follow this naming convention. The file name is not significant, but we recommend that you name it after the security categories it creates.

project_categories.SecurityCategory.yaml
- name: sc_my_security_category
- name: sc_my_other_security_category

Data models

Resource directory: data_models/

API documentation: Data modeling

The data model configurations are stored in the module's data_models directory.

A data model consists of a set of data modeling entities: one or more spaces, containers, views, and data models. Each entity has its own file with a suffix to indicate the entity type: my.space.yaml, my.container.yaml, my.view.yaml, my.datamodel.yaml. We recommend you keep one instance per file.

You can also use the CDF Toolkit to create nodes to keep configuration for applications (for instance, InField) and to create node types that are part of the data model. Define nodes in files with the .node.yaml suffix.

The CDF Toolkit applies configurations in the order of dependencies between the entity types: first spaces, next containers, then views, and finally data models.

If there are dependencies between the entities of the same type, you can use prefix numbers in the filename to have the CDF Toolkit apply the files in the correct order.

The CDF Toolkit supports using subdirectories to organize the files, for example:

data_models/
┣ 📂 containers/
┣ 📂 views/
┣ 📂 nodes/
┣ 📜 my_data_model.datamodel.yaml
┗ 📜 data_model_space.space.yaml

Spaces

API documentation: Spaces

Spaces are the top-level entities and are the home of containers, views, and data models. You can create a space with a .space.yaml file in the data_models/ directory.

sp_cognite_app_data.space.yaml
space: sp_cognite_app_data
name: cognite:app:data
description: Space for InField app data

Note

To prevent externally defined spaces referenced in the view and data model configurations from being deleted, the cdf-tk clean command deletes data only in spaces that have been defined by a <space_name>.space.yaml file in the data_models/ directory of the module.

CDF doesn't allow a space to be deleted unless it's empty. If a space contains, for example, nodes that aren't governed by the CDF Toolkit, the CDF Toolkit will not delete the space.

Containers

API documentation: Containers

Containers are the home of properties and data. You can create a container with a .container.yaml file in the data_models/ directory. You can also create indexes and constraints according to the API specification.

MyActivity.container.yaml
externalId: MyActivity
usedFor: node
space: sp_activity_data
properties:
  id:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  title:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  description:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true

The example container definition creates a container with three properties: id, title, and description.

Note that sp_activity_data requires its own sp_activity_data.space.yaml file in the data_models/ directory.
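
A minimal sketch of that space definition (the name and description are illustrative):

sp_activity_data.space.yaml
space: sp_activity_data
name: Activity data
description: Space for the activity data containers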

Views

API documentation: Views

Use views to ingest, query, and structure the data into meaningful entities in your data model. You can create a view with a .view.yaml file in the data_models/ directory.

MyActivity.view.yaml
externalId: MyActivity
name: MyActivity
description: 'An activity represents a set of maintenance tasks with multiple operations for individual assets. The activity is considered incomplete until all its operations are finished.'
version: '3'
space: sp_activity_model
properties:
  id:
    description: 'Unique identifier from the source, for instance, an object ID in SAP.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: id
  title:
    description: 'A title or brief description of the maintenance activity or work order.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: title
  description:
    description: 'A detailed description of the maintenance activity or work order.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: description

This example view configuration creates a view with three properties: id, title, and description.

The view references the properties from the container MyActivity in the sp_activity_data space. The view exists in a space called sp_activity_model, while the container exists in the sp_activity_data space.

Data models

API documentation: Data models

Use data models to structure the data into knowledge graphs with relationships between views using edges. From an implementation perspective, a data model is a collection of views.

You can create a data model with a .datamodel.yaml file in the data_models/ directory.

ActivityDataModel.datamodel.yaml
externalId: ActivityDataModel
name: My activity data model
version: '1'
space: sp_activity_model
description: 'A data model for structuring and querying activity data.'
views:
  - type: view
    externalId: MyActivity
    space: sp_activity_model
    version: '3'
  - type: view
    externalId: MyTask
    space: sp_activity_model
    version: '2'

The example data model configuration creates a data model with two views: MyActivity and MyTask. The data model exists in a space called sp_activity_model together with the views.

Nodes

API documentation: Instances

Use nodes to populate a data model. You can create nodes with a .node.yaml file in the data_models/ directory.

myapp_config.node.yaml
autoCreateDirectRelations: True
skipOnVersionConflict: False
replace: True
nodes:
  - space: sp_config
    externalId: myapp_config
    sources:
      - source:
          space: sp_config
          externalId: MY_APP_Config
          version: '1'
          type: view
        properties:
          rootLocationConfigurations:
            - assetExternalId: 'my_root_asset_external_id'
              adminGroup:
                - gp_template_admins
              dataSpaceId: sp_activity_data
              modelSpaceId: sp_activity_model
              activityDataModelId: MyActivity
              activityDataModelVersion: '1'

This example node configuration creates a node instance in the sp_config space, writing its data through version '1' of the MY_APP_Config view. The instance holds data that MY_APP reads to configure the application.

The node instance is created in the sp_config space with myapp_config as the externalId. The example also configures a root location for the application and specifies how to find the application's data: in the sp_activity_data space, with version '1' of the MyActivity view.

The autoCreateDirectRelations, skipOnVersionConflict, and replace arguments control the behavior of the node creation. They are optional, and are not part of the node data, but are API arguments. See the API documentation for more information.

Another example is node types. Node types are part of a data model schema (the description of how data is structured) and define the kinds of nodes that can be created in the data model. This is an example of a YAML file with multiple node types defined:

myapp_config.node.yaml
- space: sp_my_model
  externalId: pump
- space: sp_my_model
  externalId: valve

Note: Requires CDF Toolkit v0.2.0 or later

You can define single or multiple nodes in a single YAML file, with or without the API arguments. The first example shows how to define multiple nodes with API arguments, while the second shows how to define multiple nodes without API arguments.

You can define a single node:

myapp_config.node.yaml
space: sp_my_model
externalId: heat_exchanger

Or a single node with API arguments:

myapp_config.node.yaml
autoCreateDirectRelations: True
skipOnVersionConflict: False
replace: True
node:
  space: sp_config
  externalId: myapp_config
  sources:
    - source:
        ...

Data sets

Resource directory: data_sets/

API documentation: Data sets

You cannot delete data sets in CDF, but you can use the CDF Toolkit to create new data sets or update existing ones. You can create multiple data sets in the same YAML file.

Note

The data sets API uses an internal ID for the data set, while the YAML configuration files reference the external ID (dataSetExternalId). The CDF Toolkit resolves the external ID to the internal ID before sending the request to the CDF API. For an example, see Files.

data_sets.yaml
- externalId: ds_asset_hamburg
  name: asset:hamburg
  description: This dataset contains asset data for the Hamburg location.
- externalId: ds_files_hamburg
  name: files:hamburg
  description: This dataset contains files for the Hamburg location.

This example configuration creates two data sets using the naming conventions for data sets.

Labels

Requires CDF Toolkit v0.2.0 or later

Resource directory: labels/

API documentation: Labels

Labels can be found in the module's labels/ directory. You can define one or more labels in a single YAML file. The filename must end with Label, for example, my_equipment.Label.yaml.

The CDF Toolkit creates labels before files and other resources that reference them.

Example label definition:

my_equipment.Label.yaml
- externalId: label_pump
  name: Pump
  description: A pump is a piece of equipment that moves fluids.
  dataSetExternalId: ds_labels_{{example_variable}}

Note

The CDF API doesn't support updating labels. When you update a label with the CDF Toolkit, it deletes the previous label and creates a new one.

Extraction pipelines

Resource directory: extraction_pipelines/

API documentation: Extraction pipelines

API documentation: Extraction pipeline config

Documentation: Extraction pipeline documentation

You can create and configure extraction pipelines by adding a separate YAML file for each pipeline and a <pipeline>.config.yaml configuration file per pipeline in the extraction_pipelines/ directory.

ep_src_asset_hamburg_sap.yaml
externalId: 'ep_src_asset_hamburg_sap'
name: 'src:asset:hamburg:sap'
dataSetExternalId: 'ds_asset_{{location_name}}'
description: 'Asset source extraction pipeline with configuration for a DB extractor reading data from Hamburg SAP'
rawTables:
  - dbName: 'asset_hamburg_sap'
    tableName: 'assets'
source: 'sap'
documentation: "The DB Extractor is a general database extractor that connects to a database, executes one or several queries and sends the result to CDF RAW.\n\nThe extractor connects to a database over ODBC, which means that you need an ODBC driver for your database. If you are running the Docker version of the extractor, ODBC drivers for MySQL, MS SQL, PostgreSql and Oracle DB are preinstalled in the image. See the example config for details on connection strings for these. If you are running the Windows exe version of the extractor, you must provide an ODBC driver yourself. These are typically provided by the database vendor.\n\nFurther documentation is available [here](./docs/documentation.md)\n\nFor information on development, consider the following guides:\n\n * [Development guide](guides/development.md)\n * [Release guide](guides/release.md)"

This example configuration creates an extraction pipeline with the external ID ep_src_asset_hamburg_sap and the name src:asset:hamburg:sap.

The configuration allows an extractor installed inside a closed network to connect to CDF and download the extractor's configuration file. The CDF Toolkit expects the configuration file to be in the same directory and have the same name as the extraction pipeline configuration file, but with the suffix .config.yaml. The configuration file is not strictly required, but the CDF Toolkit warns if the file is missing during the deploy process.

The extraction pipeline can be connected to a data set and to the RAW tables that the extractor will write to.

This is an example configuration file for the extraction pipeline above:

ep_src_asset_hamburg_sap.config.yaml
externalId: 'ep_src_asset_hamburg_sap'
description: 'DB extractor config reading data from Hamburg SAP'
config:
  logger:
    console:
      level: INFO
    file:
      level: INFO
      path: 'file.log'
  # List of databases
  databases:
    - type: odbc
      name: postgres
      connection-string: 'DSN={MyPostgresDsn}'
  # List of queries
  queries:
    - name: test-postgres
      database: postgres
      query: >
        SELECT

Note

The CDF Toolkit expects the config property to be valid YAML and will not validate the content of the config property beyond the syntax validation. The extractor that is configured to download the configuration file validates the content of the config property.

Files

Resource directory: files/

API documentation: Files

Caution

Use the CDF Toolkit only to upload example data, and not as a general solution to ingest files into CDF.

Store the file(s) you want to upload in the module's files/ directory and create one or more YAML files specifying the metadata for each file. The name property in the YAML file must match the filename of the file to upload.

Note: You can also use a template to upload multiple files without specifying the configuration for each file. See Uploading multiple files below.

Configuration for multiple files, my_file.pdf and my_other_file.pdf:

my_files.FileMetadata.yaml
- externalId: 'sharepointABC_my_file.pdf'
  name: 'my_file.pdf'
  source: 'sharepointABC'
  dataSetExternalId: 'ds_files_hamburg'
  directory: 'files'
  mimeType: 'application/pdf'
  metadata:
    origin: 'cdf-project-templates'
- externalId: 'sharepointABC_my_other_file.pdf'
  name: 'my_other_file.pdf'
  source: 'sharepointABC'
  dataSetExternalId: 'ds_files_hamburg'
  directory: 'files'
  mimeType: 'application/pdf'
  metadata:
    origin: 'cdf-project-templates'

Configuration for a single file, my_file.pdf:

my_file.FileMetadata.yaml
externalId: 'sharepointABC_my_file.pdf'
name: 'my_file.pdf'
source: 'sharepointABC'
dataSetExternalId: 'ds_files_hamburg'
directory: 'files'
mimeType: 'application/pdf'
metadata:
  origin: 'cdf-project-templates'

Note that the data set is referenced by the dataSetExternalId. The CDF Toolkit automatically resolves the external ID to the internal ID of the data set.

Uploading multiple files

To upload multiple files without specifying the configuration for each file individually, use this template format:

files.yaml
- externalId: sharepointABC_$FILENAME
  dataSetExternalId: ds_files_hamburg
  name: my_prefix_$FILENAME-my_suffix
  source: sharepointABC

The CDF Toolkit recognizes this as a template when:

  • It is a YAML file given in list/array format.
  • There is a single entry in the list.
  • The externalId contains the $FILENAME variable.

All files are uploaded with the same properties except for externalId and name. The $FILENAME variable is replaced with the filename of the file being uploaded, and you can add a prefix and/or suffix to it. In the example above, a file my_file.pdf is uploaded with the externalId sharepointABC_my_file.pdf and the name my_prefix_my_file.pdf-my_suffix.

Functions

Resource directory: functions/

API documentation: Functions

The functions configuration files are stored in the module's functions/ directory. You can define one or more functions in a single YAML file. The CDF Toolkit creates the functions in the order they are defined in the file.

Place the function code and files to deploy to CDF as a function in a sub-directory with the same name as the externalId of the function.

Example function configuration:

my_functions.yaml
# The directory with the function code should have the same name
# and externalId as the function itself as defined below.
- name: 'example:repeater'
  externalId: 'fn_example_repeater'
  owner: 'Anonymous'
  description: 'Returns the input data, secrets, and function info.'
  metadata:
    version: '{{version}}'
  secrets:
    mysecret: '{{example_secret}}'
  envVars:
    # The two environment variables below are set by the Toolkit
    ENV_TYPE: '${CDF_BUILD_TYPE}'
    CDF_ENV: '${CDF_ENVIRON}'
  runtime: 'py311'
  functionPath: './src/handler.py'
  # Data set id for the zip file with the code that is uploaded.
  externalDataSetId: 'ds_files_{{default_location}}'

The functionPath is the path to the handler.py in the function code directory. In this case, handler.py is expected to be in the fn_example_repeater/src/ directory.

Note that externalDataSetId is used to reference the data set that the function itself is assigned to. The CDF Toolkit automatically resolves the external ID to the internal ID of the data set.
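
For reference, the resulting module layout for this function could look like this (schedules.yaml is covered in the next section):

functions/
├── my_functions.yaml
├── schedules.yaml
└── fn_example_repeater/
    ├── requirements.txt
    └── src/
        └── handler.py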

Function schedules

Resource directory: functions/

API documentation: Schedules

Schedules for functions are also stored in the module's functions/ directory. The CDF Toolkit expects the YAML file to include "schedule" as part of its file name, for example, schedules.yaml. You can specify more than one schedule in a single file.

To ensure that the function exists before the schedule is created, schedules are deployed after functions. Schedules don't have externalIds, and the CDF Toolkit identifies a schedule by the combination of functionExternalId and cronExpression. Consequently, you can't deploy two schedules for the same function with identical cron expressions but different data. To work around this limitation, adjust the cronExpression of one of the schedules.

schedules.yaml
- name: 'daily-8am-utc'
  functionExternalId: 'fn_example_repeater'
  description: 'Run every day at 8am UTC'
  cronExpression: '0 8 * * *'
  data:
    breakfast: 'today: peanut butter sandwich and coffee'
    lunch: 'today: greek salad and water'
    dinner: 'today: steak and red wine'
  authentication:
    # Credentials to use to run the function in this schedule.
    # In this example, we just use the main deploy credentials, so the result
    # is the same, but use a different set of credentials (env variables) if
    # you want to run the function with different permissions.
    clientId: {{ myfunction_clientId }}
    clientSecret: {{ myfunction_clientSecret }}
- name: 'daily-8pm-utc'
  functionExternalId: 'fn_example_repeater'
  description: 'Run every day at 8pm UTC'
  cronExpression: '0 20 * * *'
  data:
    breakfast: 'tomorrow: peanut butter sandwich and coffee'
    lunch: 'tomorrow: greek salad and water'
    dinner: 'tomorrow: steak and red wine'

The functionExternalId must match an existing function or a function deployed by the tool.

The authentication property is optional. You can use it to specify different credentials for the schedule than the default credentials used by the CDF Toolkit. We recommend that you use credentials with the minimum required access rights.

Note

The functions YAML files must be located in the functions/ directory and not in sub-directories. This allows you to store YAML files that are not configuration files in sub-directories as part of the function's code.

Running functions locally

The CDF Toolkit supports running a function locally to test the function code before deploying it to CDF. The CDF Toolkit uses the same environment variables as the deployed function, and you can test the function with the same data and environment variables that will be used in CDF.

To run a function locally:

cdf-tk run function --local --payload='{"var1": "testdata"}' --external_id fn_example_repeater --env dev my_project/

For more information about the command options, run cdf-tk run function --help.

The function runs in a virtual Python environment using the version of Python you use to run the CDF Toolkit. Running a function locally automatically performs a local build and resolves any config.<env>.yaml and environment variables. The requirements.txt file in the function code directory is used to install the required packages in the function's execution environment.
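
A minimal sketch of such a requirements.txt, assuming the function only needs the Cognite Python SDK:

requirements.txt
cognite-sdk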

The environment variables configured for the function are injected into the virtual environment. To avoid potential security issues, secrets are not supported.

The CDF Toolkit expects the payload to be a string on the command line. The string must be valid JSON and is converted to a dictionary that is passed into the function as data. The input and output of the function are written to files in the temporary build directory.

RAW

Resource directory: raw/

API documentation: RAW

The RAW configuration files are stored in the module's raw/ directory.

You can have one or more RAW configurations in a single YAML file. For example, multiple tables can be defined in a single file.

raw_tables.yaml
- dbName: sap
  tableName: workorder_mdi2_sap
- dbName: sap
  tableName: workorder_mdi2_sap2

Or you can define one table per file.

sap_workorder_mdi2_sap.yaml
dbName: sap
tableName: workorder_mdi2_sap

Uploading data to RAW tables

Caution

Use the CDF Toolkit only to upload example data, and not as a general solution to ingest data into CDF.

You can upload data to RAW tables. Create one YAML file per table you want to upload data to. The data file must be a .csv or .parquet file with the same name as the YAML file.

This example configuration creates a RAW database called asset_hamburg_sap with a table called assets and populates it with data from the asset_hamburg_sap.csv file.

asset_hamburg_sap.yaml
dbName: asset_hamburg_sap
tableName: assets
asset_hamburg_sap.csv
"key","categoryId","sourceDb","parentExternalId","updatedDate","createdDate","externalId","isCriticalLine","description","tag","areaId","isActive"
"WMT:48-PAHH-96960","1152","workmate","WMT:48-PT-96960","2015-10-06 12:28:33","2013-05-16 11:50:16","WMT:48-PAHH-96960","false","VRD - PH STG1 COMP WTR MIST RELEASED : PRESSURE ALARM HIGH HIGH","48-PAHH-96960","1004","true"
"WMT:48-XV-96960-02","1113","workmate","WMT:48-XV-96960","2015-10-08 08:48:04","2009-06-26 15:36:40","WMT:48-XV-96960-02","false","VRD - PH STG1 COMP WTR MIST WTR RLS","48-XV-96960-02","1004","true"
"WMT:23-TAL-96183","1152","workmate","WMT:23-TT-96183","2015-10-06 12:28:32","2013-05-16 11:50:16","WMT:23-TAL-96183","false","VRD - PH 1STSTG COMP OIL TANK HEATER : TEMPERATURE ALARM LOW","23-TAL-96183","1004","true"

Tip

If the leftmost column in the CSV file is named key, the CDF Toolkit will use this column as the index column for the table.

Transformations

Resource directory: transformations/

API documentation: Transformations

The transformation configuration files are stored in the module's transformations/ directory. Each transformation has its own YAML file.

Each transformation can have a corresponding .sql file with the accompanying SQL code. The .sql file should have the same filename as the YAML file that defines the transformation (without the number prefix) or use the externalId of the transformation as the filename.

The transformation schedule is a separate resource type, tied to the transformation by external_id.

Example transformation configuration:

tr_asset_oid_workmate_asset_hierarchy.yaml
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
dataSetExternalId: 'ds_asset_{{location_name}}'
name: 'asset:{{location_name}}:{{source_name}}:asset_hierarchy'
destination:
  type: 'asset_hierarchy'
ignoreNullFields: true
isPublic: true
conflictMode: upsert
# Specify credentials separately like this:
# You can also use different credentials for running the transformations than
# the credentials you use to deploy.
authentication:
  clientId: {{ cicd_clientId }}
  clientSecret: {{ cicd_clientSecret }}
  tokenUri: {{ cicd_tokenUri }}
  # Optional: If the IdP requires providing the scopes
  cdfProjectName: {{ cdfProjectName }}
  scopes: {{ cicd_scopes }}
  # Optional: If the IdP requires providing the audience
  audience: {{ cicd_audience }}
tr_asset_oid_workmate_asset_hierarchy.schedule.yaml
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
interval: '{{scheduleHourly}}'
isPaused: {{ pause_transformations }}
tr_asset_oid_workmate_asset_hierarchy.sql
SELECT
  externalId as externalId,
  if(parentExternalId is null,
     '',
     parentExternalId) as parentExternalId,
  tag as name,
  sourceDb as source,
  description,
  dataset_id('{{asset_dataset}}') as dataSetId,
  to_metadata_except(
    array("sourceDb", "parentExternalId", "description"), *) as metadata
FROM
  `{{asset_raw_input_db}}`.`{{asset_raw_input_table}}`

The transformation can be configured with separate credentials for reading from the source (sourceOidcCredentials) and writing to the destination (destinationOidcCredentials). Use the authentication property to configure both with the same set of credentials, or use the sourceOidcCredentials and destinationOidcCredentials properties to configure them separately.

schedule is optional. If you do not specify a schedule, the transformation will be created, but not scheduled. You can then schedule it manually in the CDF UI or using the CDF API. Schedule is a separate API endpoint in CDF.

You can specify the SQL inline in the transformation YAML file, using the query property (str), but we recommend that you use a separate .sql file for readability.
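
A minimal sketch of a transformation with an inline query (the external ID, name, destination, and SELECT statement are illustrative):

tr_my_inline_example.yaml
externalId: 'tr_my_inline_example'
name: 'my:inline:example'
destination:
  type: 'assets'
ignoreNullFields: true
query: 'SELECT externalId as externalId, name as name FROM `my_db`.`my_table`'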

In the transformation above, the transformation reuses the globally defined credentials for the CDF Toolkit. For production use, we recommend that you instead use a service account with the minimum required access rights.

Configure two new variables in the config.yaml of the module:

abc_clientId: ${ABC_CLIENT_ID}
abc_clientSecret: ${ABC_CLIENT_SECRET}

In the environment (CI/CD pipeline), you need to set the ABC_CLIENT_ID and ABC_CLIENT_SECRET environment variables to the credentials of the application/service account configured in your identity provider for the transformation.
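
The transformation can then reference the new variables in its authentication property, for example (a sketch reusing the tokenUri variable from the example above):

authentication:
  clientId: {{ abc_clientId }}
  clientSecret: {{ abc_clientSecret }}
  tokenUri: {{ cicd_tokenUri }}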

Transformation notifications

Requires CDF Toolkit v0.2.0 or later

Resource directory: transformations/

API documentation: Transformation notifications

Transformation notifications are stored in the module's transformations/ directory. You can have one or multiple notifications in a single YAML file. The notification YAML file name must end with Notification, for example, my_transformation.Notification.yaml.

tr_notifications.Notification.yaml
- transformationExternalId: tr_first_transformation
  destination: john.smith@example.com
- transformationExternalId: tr_first_transformation
  destination: jane.smith@example.com

Note

CDF identifies notifications by their internal ID, while the CDF Toolkit uses the combination of the transformation external ID and the destination to identify each notification. When you run cdf-tk clean, all notifications for that transformation external ID and destination are deleted.

Time series

Resource directory: timeseries/

API documentation: Time-series

Use the CDF Toolkit only to upload example data, and not as a general solution to ingest time series into CDF. Normally, you'd create time series as part of ingesting data into CDF using data pipelines configured with corresponding data sets, databases, groups, and so on.

Store the file(s) you want to upload in the module's timeseries/ directory and create a single timeseries.yaml file to specify the time series to create.

timeseries.yaml
- externalId: 'pi_160696'
  name: 'VAL_23-PT-92504:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgSuctCool Gas Out
    contextMatchString: 23-PT-92504
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92504:X.Value'
  isStep: false
  description: PH 1stStgSuctCool Gas Out
- externalId: 'pi_160702'
  name: 'VAL_23-PT-92536:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgComp Discharge
    contextMatchString: 23-PT-92536
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92536:X.Value'

This configuration creates two timeseries in the ds_timeseries_hamburg data set with the external IDs pi_160696 and pi_160702.

Timeseries subscriptions

Requires CDF Toolkit v0.2.0 or later

Resource directory: timeseries/

API documentation: Timeseries subscriptions

Timeseries subscriptions are stored in the module's timeseries/ directory. We recommend a separate YAML file for each subscription. Use the DatapointSubscription suffix in the filename, for example, my_subscription.DatapointSubscription.yaml.

The CDF Toolkit creates the timeseries subscription after the timeseries.

Example timeseries subscription configuration:

my_subscription.DatapointSubscription.yaml
externalId: my_subscription
name: My Subscription
description: All timeseries with externalId starting with ts_value
partitionCount: 1
filter:
  prefix:
    property:
      - externalId
    value: ts_value

Time series datapoints

Resource directory: timeseries_datapoints/

API documentation: Time-series

Use the CDF Toolkit only to upload example data, and not as a general solution to ingest time series into CDF. Normally, you'd create time series as part of the process of ingesting data into CDF by configuring data pipelines with corresponding data sets, databases, groups, and so on.

The time series datapoints are stored in the module's timeseries_datapoints/ directory. The time series must be created separately using configurations in the timeseries directory. See time series.

datapoints.csv
timestamp,pi_160696,pi_160702
2013-01-01 00:00:00,0.9430412044195982,0.9212588490581821
2013-01-01 01:00:00,0.9411303320132799,0.9212528389403117
2013-01-01 02:00:00,0.9394743147709556,0.9212779911470234
2013-01-01 03:00:00,0.9375842300608798,
2013-01-01 04:00:00,0.9355836846172971,0.9153202184209938

This .csv file loads data into the time series created in the previous example. The first column is the timestamp, and the following columns are the values for the time series at that timestamp.

Tip

If you specify the column name of the timestamp as timeshift_timestamp instead of timestamp, the CDF Toolkit automatically timeshifts the entire timeseries to end with today's date. This is useful, for example, for data where you want to have a time series that is always up to date.
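
For example, a hypothetical datapoints file using this column name (same layout as datapoints.csv above):

timeshift_datapoints.csv
timeshift_timestamp,pi_160696,pi_160702
2013-01-01 00:00:00,0.9430412044195982,0.9212588490581821
2013-01-01 01:00:00,0.9411303320132799,0.9212528389403117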

Workflows

Requires CDF Toolkit v0.2.0 or later

Resource directory: workflows/

API documentation: Workflows

The workflows are stored in the module's workflows/ directory. A workflow has two types of resources: Workflow and WorkflowVersion. They are identified by the Workflow.yaml and WorkflowVersion.yaml suffixes. We recommend having one file per workflow and workflow version.

When creating, updating, and deleting workflows, the CDF Toolkit applies changes in the correct order based on the dependencies between the workflows and workflow versions. The CDF Toolkit creates transformations and functions before the workflow versions to ensure that the workflow versions can reference them.

Example workflow:

my_workflow.Workflow.yaml
externalId: wf_my_workflow
description: A workflow for processing data

Example workflow version:

my_version.WorkflowVersion.yaml
workflowExternalId: wf_my_workflow
version: '1'
workflowDefinition:
  description: 'Run tasks in sequence'
  tasks:
    - externalId: '{{ workflow_external_id }}_function_task'
      type: 'function'
      parameters:
        function:
          externalId: 'fn_first_function'
          data: {}
        isAsyncComplete: false
      name: 'Task One'
      description: First task
      retries: 3
      timeout: 3600
      onFailure: 'abortWorkflow'
    - externalId: '{{ workflow_external_id }}_transformation_task'
      type: 'transformation'
      parameters:
        transformation:
          externalId: 'tr_first_transformation'
          concurrencyPolicy: fail
      name: 'Task Two'
      description: Second task
      retries: 3
      timeout: 3600
      onFailure: 'skipTask'
      dependsOn:
        - externalId: '{{ workflow_external_id }}_function_task'