YAML reference library
The YAML resource configuration files are core to the Cognite Toolkit. Each of the files configures one of the resource types that are supported by the Cognite Toolkit and the CDF API. This article describes how to configure the different resource types.
The Cognite Toolkit bundles logically connected resource configuration files in modules, and each module stores the configuration files in directories corresponding to the resource types, called resource directories. The available resource directories are:
./<module_name>/
├── 3dmodels/
├── auth/
├── classic/
├── data_models/
├── data_sets/
├── extraction_pipelines/
├── files/
├── functions/
├── hosted_extractors/
├── locations/
├── raw/
├── robotics/
├── streamlit/
├── timeseries/
├── transformations/
└── workflows/
Note that a resource directory can host one or more configuration types. For example, the data_models/ directory hosts the configuration files for spaces, containers, views, data models, and nodes, while the classic/ directory hosts the configuration files for labels, assets, and sequences.
When you deploy, the Cognite Toolkit uses the CDF API to implement the YAML configurations in the CDF project.
In general, the format of the YAML files matches the API specification for the resource types. We recommend that you use the externalId of the resources as (part of) the name of the YAML file. This enables using the same resource configuration across multiple CDF projects, for example, development, staging, and production projects. Use number prefixes (1.<filename.suffix>) to control the order of deployment within each resource type.
3D Models
Resource directory: 3dmodels/
Requires Cognite Toolkit v0.3.0 or later
API documentation: 3D models
3D model configurations are stored in the module's 3dmodels/ directory. You can have one or more 3D models in a single YAML file.
The filename must end with 3DModel, for example, my_3d_model.3DModel.yaml.
Example 3D model configuration:
name: my_3d_model
dataSetExternalId: ds_3d_models
metadata:
  origin: cdf-toolkit
Assets
Resource directory: classic/
Requires Cognite Toolkit v0.3.0 or later
API documentation: Assets
Asset configurations are stored in the module's classic/ directory. You can have one or more assets in a single YAML file.
The filename must end with Asset, for example, my_asset.Asset.yaml.
Example Asset configuration:
externalId: my_root_asset
name: SAP hierarchy
description: The root asset in the SAP hierarchy
dataSetExternalId: ds_sap_assets
source: SAP
metadata:
  origin: cdf-toolkit
The asset API uses an internal ID for the data set, while the YAML configuration files reference the external ID, dataSetExternalId. The Cognite Toolkit resolves the external ID to the internal ID before sending the request to the CDF API.
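Because a single file can hold more than one asset, a small hierarchy can be described in one place. The following is a minimal sketch, assuming the parentExternalId field from the Assets API; my_child_asset and its name are hypothetical values added for illustration.

```yaml
# Two assets in one file: a root asset and a child that points to it.
- externalId: my_root_asset
  name: SAP hierarchy
  dataSetExternalId: ds_sap_assets
# Hypothetical child asset; parentExternalId refers to the root above.
- externalId: my_child_asset
  name: Pump station
  parentExternalId: my_root_asset
  dataSetExternalId: ds_sap_assets
```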
Table formats
In addition to yaml, the Cognite Toolkit also supports the csv and parquet formats for asset configurations. As with the yaml format, the filename must end with Asset, for example, my_asset.Asset.csv.
externalId,name,description,dataSetExternalId,source,metadata.origin
my_root_asset,SAP hierarchy,The root asset in the SAP hierarchy,ds_sap_assets,SAP,cdf-toolkit
Note that the column names must match the field names in the yaml configuration. The exception is the metadata field, which is a dictionary in the yaml configuration but a string in the csv configuration. To handle this, use dot notation for the column name in the csv configuration, for example, metadata.origin.
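To illustrate the dot notation, here is a sketch of how one csv row with two metadata columns would correspond to the yaml form. The metadata.owner column and its value are hypothetical and only added for illustration.

```yaml
# csv header: externalId,name,metadata.origin,metadata.owner
# csv row:    my_root_asset,SAP hierarchy,cdf-toolkit,maintenance-team
#
# Corresponding yaml configuration:
externalId: my_root_asset
name: SAP hierarchy
metadata:
  origin: cdf-toolkit
  owner: maintenance-team # hypothetical key, for illustration only
```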
Groups
Resource directory: auth/
API documentation: Groups
The group configuration files are stored in the module's auth/ directory. Each group has a separate YAML file.
The name field is used as a unique identifier for the group. If you change the name of the group manually in CDF, it will be treated as a different group and will be ignored by the Cognite Toolkit.
Example group configuration:
name: 'my_group'
sourceId: '{{mygroup_source_id}}'
metadata:
  origin: 'cognite-toolkit'
capabilities:
  - projectsAcl:
      actions:
        - LIST
        - READ
      scope:
        all: {}
We recommend using the metadata:origin property to indicate that the group is created by the Cognite Toolkit.
Populate the sourceId with the group ID from the CDF project's identity provider.
You can specify each ACL capability in CDF as in the projectsAcl example above. Scoping to dataset, space, RAW table, current user, or pipeline is also supported (see ACL scoping).
Groups and group deletion
If you delete groups with the cdf clean or cdf deploy --drop command, the Cognite Toolkit skips the groups that the running user or service account is a member of. This prevents the cleaning operation from removing access rights from the running user and potentially locking the user out from further operation.
ACL scoping
Dataset scope
Use to restrict access to data in a specific data set.
- threedAcl:
    actions:
      - READ
    scope:
      datasetScope: { ids: ['my_dataset'] }
The groups API uses an internal ID for the data set, while the YAML configuration files reference the external ID. The Cognite Toolkit resolves the external ID to the internal ID before sending the request to the CDF API.
Space scope
Use to restrict access to data in a data model space.
- dataModelInstancesAcl:
    actions:
      - READ
    scope:
      spaceIdScope: { spaceIds: ['my_space'] }
Table scope
Use to restrict access to a database or a table in a database.
- rawAcl:
    actions:
      - READ
      - WRITE
    scope:
      tableScope:
        dbsToTables:
          my_database:
            tables: []
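The example above leaves the tables list empty. As a sketch of the narrower case, assuming that an empty list covers the database as a whole and that listing table names restricts access to those tables, a single-table scope could look like this. The table name my_table is hypothetical.

```yaml
- rawAcl:
    actions:
      - READ
    scope:
      tableScope:
        dbsToTables:
          my_database:
            # Hypothetical table name; only this table is in scope.
            tables: ['my_table']
```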
Current user-scope
Use to restrict actions to the groups the user is a member of.
- groupsAcl:
    actions:
      - LIST
      - READ
    scope:
      currentuserscope: {}
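Pipeline scope is mentioned in the groups overview but has no example here. The following is a hedged sketch only: it assumes the extractionRunsAcl capability and the extractionPipelineScope scope from the CDF groups API, and it assumes that the Cognite Toolkit accepts the external ID of the extraction pipeline in the same way as it does for data sets. Verify both assumptions against the Groups API documentation before using it.

```yaml
- extractionRunsAcl:
    actions:
      - READ
      - WRITE
    scope:
      # Assumed scope name; the ID is the pipeline's external ID.
      extractionPipelineScope: { ids: ['ep_src_asset_hamburg_sap'] }
```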
Security categories
Requires Cognite Toolkit v0.2.0 or later
Resource directory: auth/
API documentation: Security categories
The security categories are stored in the module's auth/ directory. You can have one or more security categories in a single YAML file.
We recommend that you start security category names with sc_ and use _ to separate words. The file name is not significant, but we recommend that you name it after the security categories it creates.
- name: sc_my_security_category
- name: sc_my_other_security_category
Data models
Resource directory: data_models/
API documentation: Data modeling
The data model configurations are stored in the module's data_models directory.
A data model consists of a set of data modeling entities: one or more spaces, containers, views, and data models. Each entity has its own file with a suffix to indicate the entity type: my.space.yaml, my.container.yaml, my.view.yaml, my.datamodel.yaml.
You can also use the Cognite Toolkit to create nodes to keep configuration for applications (for instance, InField) and to create node types that are part of the data model. Define nodes in files with the .node.yaml suffix.
The Cognite Toolkit applies configurations in the order of dependencies between the entity types: first spaces, next containers, then views, and finally data models.
If there are dependencies between the entities of the same type, you can use prefix numbers in the filename to have the Cognite Toolkit apply the files in the correct order.
The Cognite Toolkit supports using subdirectories to organize the files, for example:
data_models/
┣ 📂 containers/
┣ 📂 views/
┣ 📂 nodes/
┣ 📜 my_data_model.datamodel.yaml
┗ 📜 data_model_space.space.yaml
Spaces
API documentation: Spaces
Spaces are the top-level entities and are the home of containers, views, and data models. You can create a space with a .space.yaml file in the data_models/ directory.
space: sp_cognite_app_data
name: cognite:app:data
description: Space for InField app data
CDF doesn't allow a space to be deleted unless it's empty. If a space contains, for example, nodes that aren't governed by the Cognite Toolkit, the Cognite Toolkit will not delete the space.
Containers
API documentation: Containers
Containers are the home of properties and data. You can create a container with a .container.yaml file in the data_models/ directory. You can also create indexes and constraints according to the API specification.
externalId: MyActivity
usedFor: node
space: sp_activity_data
properties:
  id:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  title:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
  description:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
The example container definition creates a container with three properties: id, title, and description.
Note that sp_activity_data requires its own activity_data.space.yaml file in the data_models/ directory.
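Constraints and indexes are mentioned above but not shown. The following is a minimal sketch, assuming the constraints and indexes fields as described in the Containers API specification. The names uniqueActivityId and titleIndex are hypothetical identifiers chosen for this example, and the description property is left out for brevity.

```yaml
externalId: MyActivity
usedFor: node
space: sp_activity_data
properties:
  id:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: false
  title:
    type:
      type: text
      list: false
      collation: ucs_basic
    nullable: true
constraints:
  # Hypothetical constraint name: values of id must be unique.
  uniqueActivityId:
    constraintType: uniqueness
    properties:
      - id
indexes:
  # Hypothetical index name: btree index on title.
  titleIndex:
    indexType: btree
    properties:
      - title
```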
Views
API documentation: Views
Use views to ingest, query, and structure the data into meaningful entities in your data model. You can create a view with a .view.yaml file in the data_models/ directory.
externalId: MyActivity
name: MyActivity
description: 'An activity represents a set of maintenance tasks with multiple operations for individual assets. The activity is considered incomplete until all its operations are finished.'
version: '3'
space: sp_activity_model
properties:
  id:
    description: 'Unique identifier from the source, for instance, an object ID in SAP.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: id
  title:
    description: 'A title or brief description of the maintenance activity or work order.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: title
  description:
    description: 'A detailed description of the maintenance activity or work order.'
    container:
      type: container
      space: sp_activity_data
      externalId: MyActivity
    containerPropertyIdentifier: description
This example view configuration creates a view with three properties: id, title, and description.
The view references the properties from the container MyActivity in the sp_activity_data space. The view exists in a space called sp_activity_model, while the container exists in the sp_activity_data space.
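Both spaces referenced here need their own space definitions. A minimal sketch for sp_activity_model, following the format of the space example earlier in this article, is shown below. The name and description values are placeholders, and a file name such as activity_model.space.yaml is only a suggestion.

```yaml
# Space that holds the views and the data model (placeholder name and description).
space: sp_activity_model
name: activity:model
description: Space for the activity views and data model
```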
Data models
API documentation: Data models
Use data models to structure the data into knowledge graphs with relationships between views using edges. From an implementation perspective, a data model is a collection of views.
You can create a data model with a .datamodel.yaml file in the data_models/ directory.
externalId: ActivityDataModel
name: My activity data model
version: '1'
space: sp_activity_model
description: 'A data model for structuring and querying activity data.'
views:
  - type: view
    externalId: MyActivity
    space: sp_activity_model
    version: '3'
  - type: view
    externalId: MyTask
    space: sp_activity_model
    version: '2'
The example data model configuration creates a data model with two views: MyActivity and MyTask.
The data model exists in a space called sp_activity_model together with the views.
Nodes
API documentation: Instances
Use nodes to populate a data model. You can create nodes with a .node.yaml file in the data_models/ directory.
- space: sp_config
  externalId: myapp_config
  sources:
    - source:
        space: sp_config
        externalId: MY_APP_Config
        version: '1'
        type: view
      properties:
        rootLocationConfigurations:
          - assetExternalId: 'my_root_asset_external_id'
            adminGroup:
              - gp_template_admins
        dataSpaceId: sp_activity_data
        modelSpaceId: sp_activity_model
        activityDataModelId: MyActivity
        activityDataModelVersion: '1'
This example node configuration creates a node instance with data that configures a node of the type MY_APP_Config with version '1' in the sp_config space. The instance has data that is read by MY_APP and used to configure the application.
The node instance is created in the sp_config space with myapp_config as the externalId. The example also configures a root location for the application and specifies how to find the application's data: in the sp_activity_data space with version 1 of the MyActivity view.
Another example is node types. They are part of a data model schema (the description of how data is structured) and define the types of nodes that can be created in the data model. This is an example of a YAML file with multiple node types defined.
- space: sp_my_model
  externalId: pump
- space: sp_my_model
  externalId: valve
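A node type defined this way can then be referenced from a node instance. The following is a minimal sketch, assuming the type field of the Instances API. The space sp_my_instances and the external ID pump_001 are hypothetical names.

```yaml
# A node instance whose type points to the pump node type defined above.
- space: sp_my_instances
  externalId: pump_001
  type:
    space: sp_my_model
    externalId: pump
```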
Edges
Requires Cognite Toolkit v0.4.0 or later
API documentation: Instances
Use edges to define connections between nodes in a data model. You can create edges with a .edge.yaml file in the data_models/ directory.
You can have one or more edges in a single YAML file. The filename must end with Edge, for example, my_edge.Edge.yaml.
space: sp_instance
externalId: 'MyEdge'
startNode:
  space: sp_instance
  externalId: 'startNode'
endNode:
  space: sp_instance
  externalId: 'endNode'
type:
  space: sp_schema
  externalId: 'AnEdgeType'
sources:
  - source:
      space: sp_schema
      externalId: 'MyView'
      version: v1
      type: 'view'
    properties:
      myProperty: 'myValue'
Data sets
Resource directory: data_sets/
API documentation: Data sets
You cannot delete data sets in CDF, but you can use the Cognite Toolkit to create new data sets or update existing ones. You can create multiple data sets in the same YAML file.
The data sets API uses an internal ID for the data set, while the YAML configuration files reference the external ID, dataSetExternalId. The Cognite Toolkit resolves the external ID to the internal ID before sending the request to the CDF API. For an example, see files.
- externalId: ds_asset_hamburg
  name: asset:hamburg
  description: This dataset contains asset data for the Hamburg location.
- externalId: ds_files_hamburg
  name: files:hamburg
  description: This dataset contains files for the Hamburg location.
This example configuration creates two data sets using the naming conventions for data sets.
Events
Requires Cognite Toolkit v0.4.0 or later
Resource directory: classic/
API documentation: Events
Events can be found in the module's classic/ directory. You can define one or more events in a single YAML file.
The filename must end with Event, for example, my_event.Event.yaml.
The Cognite Toolkit ensures that dependent resources are created before the events. For example, if you reference an asset or a data set in an event, the Cognite Toolkit creates the asset or data set before creating the event.
Example event definition:
externalId: MyEvent
dataSetExternalId: ds_complete_org
startTime: 1732959346052
endTime: 1732959346052
type: 'success'
subtype: 'info'
description: 'My event description'
metadata:
  key: value
assetExternalIds:
  - MyAsset
  - MyAsset2
source: 'my_source'
The data set is referenced by the dataSetExternalId. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
The assets are referenced by the assetExternalIds. The Cognite Toolkit automatically resolves the external IDs to the internal IDs of the assets.
Labels
Requires Cognite Toolkit v0.3.0 or later
Resource directory: classic/ (labels/ in v0.2.0)
API documentation: Labels
Labels can be found in the module's classic/ directory. You can define one or more labels in a single YAML file.
The filename must end with Label, for example, my_equipment.Label.yaml.
The Cognite Toolkit creates labels before files and other resources that reference them.
Example label definition:
- externalId: label_pump
  name: Pump
  description: A pump is an equipment that moves fluids.
  dataSetExternalId: ds_labels_{{example_variable}}
The CDF API doesn't support updating labels. When you update a label with the Cognite Toolkit, it deletes the previous label and creates a new one.
Extraction pipelines
Resource directory: extraction_pipelines/
API documentation: Extraction pipelines
API documentation, configuration: Extraction pipeline config
Documentation: Extraction pipeline documentation
Extraction pipelines and their configurations are stored in the module's extraction_pipelines/ directory. You can define one or more extraction pipelines in a single YAML file. It is, however, most common to have one pipeline per file. In addition, each extraction pipeline configuration must be stored in a separate file. Extraction pipeline configurations are detected by the .config suffix, while all other files are considered extraction pipelines.
externalId: 'ep_src_asset_hamburg_sap'
name: 'src:asset:hamburg:sap'
dataSetExternalId: 'ds_asset_{{location_name}}'
description: 'Asset source extraction pipeline with configuration for a DB extractor reading data from Hamburg SAP'
rawTables:
  - dbName: 'asset_hamburg_sap'
    tableName: 'assets'
source: 'sap'
documentation: "The DB Extractor is a general database extractor that connects to a database, runs one or several queries, and sends the result to CDF RAW.\n\nThe extractor connects to a database over ODBC, which means that you need an ODBC driver for your database. If you are running the Docker version of the extractor, ODBC drivers for MySQL, MS SQL, PostgreSql and Oracle DB are preinstalled in the image. See the example config for details on connection strings for these. If you are running the Windows exe version of the extractor, you must provide an ODBC driver yourself. These are typically provided by the database vendor.\n\nFurther documentation is available [here](./docs/documentation.md)\n\nFor information on development, consider the following guides:\n\n * [Development guide](guides/development.md)\n * [Release guide](guides/release.md)"
This example configuration creates an extraction pipeline with the external ID ep_src_asset_hamburg_sap and the name src:asset:hamburg:sap.
The configuration allows an extractor installed inside a closed network to connect to CDF and download the extractor's configuration file.
The Cognite Toolkit expects the configuration file to be in the same directory and to have the same name as the extraction pipeline configuration file, but with the suffix .config.yaml. The configuration file is not strictly required, but the Cognite Toolkit warns if the file is missing during the deployment process.
The extraction pipeline can be connected to a data set and to the RAW tables that the extractor will write to.
This is an example configuration file for the extraction pipeline above:
externalId: 'ep_src_asset_hamburg_sap'
description: 'DB extractor config reading data from Hamburg SAP'
config:
  logger:
    console:
      level: INFO
    file:
      level: INFO
      path: 'file.log'
  # List of databases
  databases:
    - type: odbc
      name: postgres
      connection-string: 'DSN={MyPostgresDsn}'
  # List of queries
  queries:
    - name: test-postgres
      database: postgres
      query: >
        SELECT
The Cognite Toolkit expects the config property to be valid YAML and will not validate the content of the config property beyond the syntax validation. The extractor that is configured to download the configuration file validates the content of the config property.
Files
Resource directory: files/
API documentation: Files
CogniteFile
Requires Cognite Toolkit v0.3.0 or later
Use the Cognite Toolkit only to upload example data, and not as a general solution to ingest files into CDF.
Files can be found in the module's files/ directory. You can define the metadata of one or more files in a single or multiple YAML file(s).
We support the classic /files endpoints and CogniteFile through data modeling endpoints (models/instances).
To use the CogniteFile, you must specify the CogniteFile suffix in the filename, for example, my_file.CogniteFile.yaml. All other YAML files are considered classic files, but we recommend that you use FileMetadata as the suffix for classic files, for example, my_file.FileMetadata.yaml.
To upload a file with the metadata, the name in the YAML file must match the filename of the file that should be uploaded.
Note: You can also use the template for uploading multiple files to upload multiple files without specifying the metadata for each file.
Below is an example of a classic file metadata configuration for multiple files, my_file.pdf and my_other_file.pdf:
- externalId: 'sharepointABC_my_file.pdf'
  name: 'my_file.pdf'
  source: 'sharepointABC'
  dataSetExternalId: 'ds_files_hamburg'
  directory: 'files'
  mimeType: 'application/pdf'
  metadata:
    origin: 'cdf-project-templates'
- externalId: 'sharepointABC_my_other_file.pdf'
  name: 'my_other_file.pdf'
  source: 'sharepointABC'
  dataSetExternalId: 'ds_files_hamburg'
  directory: 'files'
  mimeType: 'application/pdf'
  metadata:
    origin: 'cdf-project-templates'
Classic metadata configuration for a single file, my_file.pdf:
externalId: 'sharepointABC_my_file.pdf'
name: 'my_file.pdf'
source: 'sharepointABC'
dataSetExternalId: 'ds_files_hamburg'
directory: 'files'
mimeType: 'application/pdf'
metadata:
  origin: 'cdf-project-templates'
The data set is referenced by the dataSetExternalId. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
Below is an example of a CogniteFile metadata configuration for a single file, my_file.pdf:
externalId: 'sharepointABC_my_file.pdf'
space: 'sp_files_hamburg'
name: 'my_file.pdf'
description: 'This is a file uploaded from SharePoint.'
tags:
  - 'file'
  - 'sharepoint'
sourceId: 'sharepointABC'
sourceContext: 'sharepointABC'
source:
  space: 'sp_files_hamburg'
  externalId: 'sharepointABCSource'
sourceCreatedTime: '2022-01-01T00:00:00Z'
sourceUpdatedTime: '2022-01-01T00:00:00Z'
assets:
  - space: 'sp_assets'
    externalId: 'my_root_asset'
mimeType: 'application/pdf'
directory: 'files'
category:
  - space: 'sp_categories'
    externalId: 'sc_my_category'
Below is an example of a CogniteFile metadata configuration for multiple files, my_file.pdf and my_other_file.pdf:
- space: 'sp_files_hamburg'
  externalId: 'sharepointABC_my_file.pdf'
  name: 'my_file.pdf'
  description: 'This is a file uploaded from SharePoint.'
- space: 'sp_files_hamburg'
  externalId: 'sharepointABC_my_other_file.pdf'
  name: 'my_other_file.pdf'
  description: 'This is another file uploaded from SharePoint.'
Uploading multiple files
To upload multiple files without specifying the metadata configuration for each file individually, use this template format for the FileMetadata configuration file:
- externalId: sharepointABC_$FILENAME
  dataSetExternalId: ds_files_hamburg
  name: $FILENAME
  source: sharepointABC
or for the CogniteFile configuration file:
- space: 'sp_files_hamburg'
  externalId: sharepointABC_$FILENAME
  name: $FILENAME
  description: 'This is a file uploaded from SharePoint.'
The Cognite Toolkit recognizes this template by the following criteria:
- It is a YAML file given in list/array format.
- There is a single entry in the list.
- The externalId contains the $FILENAME variable.
All files will be uploaded with the same properties except for the externalId and name properties.
The $FILENAME variable will be replaced with the filename of the file being uploaded.
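As a sketch of what the substitution amounts to: assuming the directory holds two files named my_file.pdf and my_other_file.pdf (hypothetical file names), the FileMetadata template would be roughly equivalent to writing out one entry per file. The Cognite Toolkit performs this substitution itself, so you don't write the expanded form by hand.

```yaml
# Explicit equivalent of the template after $FILENAME is replaced (illustrative only).
- externalId: sharepointABC_my_file.pdf
  dataSetExternalId: ds_files_hamburg
  name: my_file.pdf
  source: sharepointABC
- externalId: sharepointABC_my_other_file.pdf
  dataSetExternalId: ds_files_hamburg
  name: my_other_file.pdf
  source: sharepointABC
```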
Functions
Resource directory: functions/
API documentation: Functions
The function configuration files are stored in the module's functions/ directory. You can define one or more functions in a single or multiple YAML file(s). The Cognite Toolkit creates the functions in the order they are defined in the file.
The functions YAML files must be located in the functions/ directory and not in subdirectories. This allows you to store YAML files that are not configuration files in subdirectories as part of the function's code.
Place the function code and files to deploy to CDF as a function in a subdirectory with the same name as the externalId of the function.
Example function configuration:
Folder structure, including a function schedule:
./functions/
├── my_function.yaml
├── schedules.yaml
└── fn_example_repeater/
Configuration file:
# The directory with the function code must have the same name
# and externalId as the function itself as defined below.
- name: 'example:repeater'
  externalId: 'fn_example_repeater'
  owner: 'Anonymous'
  description: 'Returns the input data, secrets, and function info.'
  metadata:
    version: '{{version}}'
  secrets:
    mysecret: '{{example_secret}}'
  envVars:
    # The two environment variables below are set by the Toolkit
    ENV_TYPE: '${CDF_BUILD_TYPE}'
    CDF_ENV: '${CDF_ENVIRON}'
  runtime: 'py311'
  functionPath: './src/handler.py'
  # Data set id for the zip file with the code that is uploaded.
  dataSetExternalId: 'ds_files_{{default_location}}'
The functionPath is the path to the handler.py in the function code directory. In this case, handler.py is expected to be in the fn_example_repeater/src/ directory.
Note that dataSetExternalId is used to reference the data set that the function itself is assigned to. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
Function schedules
Resource directory: functions/
API documentation: Schedules
Schedules for functions are also stored in the module's functions/ directory. The Cognite Toolkit expects the YAML file to include "schedule" as part of its file name, for example, schedules.yaml. You can specify more than one schedule in a single file.
To ensure that the function exists before the schedule is created, schedules are deployed after functions.
Schedules don't have externalIds, and the Cognite Toolkit identifies the schedule by a combination of the functionExternalId and the name. Consequently, you can't deploy two schedules for a function with the exact same name and with two different sets of data.
- name: 'daily-8am-utc'
  functionExternalId: 'fn_example_repeater'
  description: 'Run every day at 8am UTC'
  cronExpression: '0 8 * * *'
  data:
    breakfast: 'today: peanut butter sandwich and coffee'
    lunch: 'today: greek salad and water'
    dinner: 'today: steak and red wine'
  authentication:
    # Credentials to use to run the function in this schedule.
    # In this example, we just use the main deploy credentials, so the result is the same, but use a different set of
    # credentials (env variables) if you want to run the function with different permissions.
    clientId: '{{myfunction_clientId}}'
    clientSecret: '{{myfunction_clientSecret}}'
- name: 'daily-8pm-utc'
  functionExternalId: 'fn_example_repeater'
  description: 'Run every day at 8pm UTC'
  cronExpression: '0 20 * * *'
  data:
    breakfast: 'tomorrow: peanut butter sandwich and coffee'
    lunch: 'tomorrow: greek salad and water'
    dinner: 'tomorrow: steak and red wine'
The functionExternalId must match an existing function or a function deployed by the tool.
For schedules, the authentication property is optional but recommended. You can use it to specify credentials for the schedule that are different from the default credentials used by the Cognite Toolkit. We recommend using credentials with the minimum required access rights to run the function. If you don't specify the authentication property, the Cognite Toolkit uses its own credentials to run the function. This is not recommended for production environments, as the Cognite Toolkit service principal typically has full access to the CDF project.
Hosted Extractors
Requires Cognite Toolkit v0.3.0 or later
Resource directory: hosted_extractors/
Hosted extractor documentation: Hosted extractors
The hosted extractors are stored in the module's hosted_extractors/ directory. A hosted extractor has four types of resources: Source, Destination, Job, and Mapping. Each resource type has its suffix in the filename, for example, my_kafka.Source.yaml.
When creating, updating, and deleting hosted extractors, the Cognite Toolkit applies changes in the correct order based on the dependencies between the source, destination, job, and mapping versions.
Source
API documentation: Hosted extractor source
Below is an example of a source configuration file.
type: mqtt5
externalId: my_mqtt
host: mqtt.example.com
port: 1883
authentication:
  username: myuser
  password: ${my_mqtt_password}
Destination
API documentation: Hosted extractor destination
Below is an example of a destination configuration file.
externalId: my_cdf
credentials:
  clientId: ${my_cdf_clientId}
  clientSecret: ${my_cdf_clientSecret}
targetDataSetExternalId: ds_files_hamburg
The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
Job
API documentation: Hosted extractor job
Below is an example of a job configuration file.
externalId: my_mqtt_to_cdf
sourceId: my_mqtt
destinationId: my_cdf
format:
  type: value
  encoding: utf-16
  compression: gzip
Mapping
API documentation: Hosted extractor mapping
Below is an example of a mapping configuration file.
externalId: my_mqtt_to_cdf
mapping:
  expression: '[{
    "type": "datapoint",
    "timestamp": to_unix_timestamp(input.timestamp, "%Y-%m-%dT%H:%M:%S"),
    "value": try_float(input.value, null),
    "externalId": input.tag
    }].filter(datapoint => datapoint.value is not null)'
input:
  type: json
published: true
For more information about the mapping configuration, see the Hosted extractor documentation.
Locations
Requires Cognite Toolkit v0.3.0 or later
Resource directory: locations/
The location filters are stored in the module's locations/ directory. You can have one or multiple locations in a single YAML file. The location YAML file name must end with LocationFilter, for example, my.LocationFilter.yaml.
Location filters work with data modeling or with asset-centric resource types. The below example shows a location filter for data modeling.
externalId: unique-external-id-123
name: 'Example location name'
description: 'This is a description of the location.'
parentExternalId: 'The parent location external ID'
dataModels:
  - externalId: CogniteProcessIndustries
    space: cdf_idm
    version: v1
instanceSpaces:
  - instance-space-main
  - instance-space-secondary
Asset-centric location filters apply to assets, events, files, time series, and sequences. You can use a shared filter for all of these:
externalId: unique-external-id-123
name: 'Example location name'
parentId: 1
description: 'This is a description of the location.'
assetCentric:
  dataSetExternalIds:
    - ds_data_set_890
  assetSubtreeIds:
    - externalId: general-subtree-id-890
  externalIdPrefix: general-prefix
It's common to use only one of dataSetExternalIds, assetSubtreeIds, or externalIdPrefix in the filter. The example above illustrates all the options.
You can also set filters for specific resource types, such as assets, events, files, timeseries, and sequences:
externalId: unique-external-id-123
name: 'Example location name'
parentId: 1
description: 'This is a description of the location.'
assetCentric:
  assets:
    dataSetExternalIds:
      - ds_data_set_123
    assetSubtreeIds:
      - externalId: root-asset
    externalIdPrefix: asset-prefix
  events:
    dataSetExternalIds:
      - ds_data_set_456
    assetSubtreeIds:
      - externalId: event-subtree-id-678
    externalIdPrefix: event-prefix
  files:
    dataSetExternalIds:
      - ds_data_set_789
    assetSubtreeIds:
      - externalId: file-subtree-id-901
    externalIdPrefix: file-prefix
  timeseries:
    dataSetExternalIds:
      - ds_data_set_234
    assetSubtreeIds:
      - externalId: timeseries-subtree-id-234
    externalIdPrefix: timeseries-prefix
  sequences:
    dataSetExternalIds:
      - ds_data_set_567
    assetSubtreeIds:
      - externalId: sequence-subtree-id-567
    externalIdPrefix: sequence-prefix
The location filter API uses an internal ID for the parentId and dataSetId, while the YAML configuration files reference the external ID: parentExternalId and dataSetExternalId.
The Cognite Toolkit resolves the external ID to the internal ID before sending the request to the CDF API.
RAW
Resource directory: raw/
API documentation: RAW
The RAW configuration files are stored in the module's raw/ directory.
You can have one or more RAW configurations in a single YAML file. For example, multiple tables can be defined in a single file.
- dbName: sap
  tableName: workorder_mdi2_sap
- dbName: sap
  tableName: workorder_mdi2_sap2
Or you can define one table per file.
dbName: sap
tableName: workorder_mdi2_sap
Uploading data to RAW tables
Use the Cognite Toolkit only to upload example data, and not as a general solution to ingest data into CDF. However, there are use cases where uploading data to RAW tables can be useful; see Use case: Uploading data to RAW tables.
You can upload data to RAW tables. You need to create one YAML file per table you want to upload. The data file can be either a .csv or a .parquet file and must have the same name as the YAML file.
This example configuration creates a RAW database called asset_hamburg_sap with a table called assets and populates it with data from the asset_hamburg_sap.csv file.
dbName: asset_hamburg_sap
tableName: assets
"key","categoryId","sourceDb","parentExternalId","updatedDate","createdDate","externalId","isCriticalLine","description","tag","areaId","isActive"
"WMT:48-PAHH-96960","1152","workmate","WMT:48-PT-96960","2015-10-06 12:28:33","2013-05-16 11:50:16","WMT:48-PAHH-96960","false","VRD - PH STG1 COMP WTR MIST RELEASED : PRESSURE ALARM HIGH HIGH","48-PAHH-96960","1004","true"
"WMT:48-XV-96960-02","1113","workmate","WMT:48-XV-96960","2015-10-08 08:48:04","2009-06-26 15:36:40","WMT:48-XV-96960-02","false","VRD - PH STG1 COMP WTR MIST WTR RLS","48-XV-96960-02","1004","true"
"WMT:23-TAL-96183","1152","workmate","WMT:23-TT-96183","2015-10-06 12:28:32","2013-05-16 11:50:16","WMT:23-TAL-96183","false","VRD - PH 1STSTG COMP OIL TANK HEATER : TEMPERATURE ALARM LOW","23-TAL-96183","1004","true"
If the leftmost column in the CSV file is named key, the Cognite Toolkit will use this column as the index column for the table.
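The effect of the key column can be sketched with Python's standard csv module. This is only an illustration: rows_by_key is a hypothetical helper, and falling back to the row number when there is no key column is an assumption of the sketch, not documented Toolkit behavior.

```python
import csv
import io


def rows_by_key(csv_text):
    """Index CSV rows by the leftmost column when that column is named 'key'."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    use_key_column = header[0] == "key"
    rows = {}
    for row_number, values in enumerate(reader):
        record = dict(zip(header, values))
        # Assumption: without a 'key' column, fall back to the row number.
        row_key = record.pop("key") if use_key_column else str(row_number)
        rows[row_key] = record
    return rows


sample = '"key","tag","areaId"\n"WMT:48-PAHH-96960","48-PAHH-96960","1004"\n'
indexed = rows_by_key(sample)
```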
Use case: Uploading data to RAW tables
The Cognite Toolkit governs resource configurations, typically metadata rather than data. For example, a sensor's name, location, type, and the asset it's attached to are metadata, while the actual sensor readings are data.
Metadata is typically available from a source system. You can, for example, use an extraction pipeline to extract and ingest the metadata to CDF.
If the metadata isn't available for extraction from a source system, a potential option is to store the metadata as .csv files and have them version-controlled, for example, in a Git repository. Next, you can use the Cognite Toolkit to deploy the metadata to RAW tables in CDF. Then, you can use Transformations to write the metadata to the correct destination resources. This way, you can track changes to the metadata and use the Git repository as the single source of truth for the metadata.
Relationships
Requires Cognite Toolkit v0.4.0 or later
Resource directory: classic/
API documentation: Relationships
Relationships can be found in the module's classic/ directory. You can define one or more relationships in a single YAML file. The filename must end with Relationship, for example, my_relationship.Relationship.yaml.
The Cognite Toolkit ensures that dependent resources are created before the relationships. For example, if you reference an asset or a data set in a relationship, the Cognite Toolkit creates the asset or data set before creating the relationship.
Example relationship definition:
externalId: MyRelationship
sourceType: asset
sourceExternalId: MyAsset
targetType: event
targetExternalId: MyEvent
dataSetExternalId: ds_complete_org
confidence: 0.42
The data set is referenced by the dataSetExternalId. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
Robotics
Requires Cognite Toolkit v0.3.0 or later
Resource directory: robotics/
API documentation: The Robotics API is not yet publicly available.
The Robotics configuration files are stored in the module's robotics/ directory. There are multiple types of Robotics resources: RobotCapability, Map, Location, Frame, DataPostProcessing. You can have one or more resources in a single YAML file, but all resources in the file must be of the same type. Each resource type has its suffix in the filename, for example, my_robot_capability.RobotCapability.yaml.
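The filename convention above can be sketched as a small lookup. The helper robotics_kind is hypothetical, and treating the suffix as case-sensitive and accepting only the .yaml extension are assumptions of the sketch.

```python
ROBOTICS_KINDS = {"RobotCapability", "Map", "Location", "Frame", "DataPostProcessing"}


def robotics_kind(filename):
    """Return the Robotics resource type encoded in '<name>.<Kind>.yaml', or None."""
    parts = filename.split(".")
    if len(parts) >= 3 and parts[-1] == "yaml" and parts[-2] in ROBOTICS_KINDS:
        return parts[-2]
    return None


kind = robotics_kind("my_robot_capability.RobotCapability.yaml")
```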
Robot capabilities
Below is an example of a RobotCapability configuration file.
name: ptz
externalId: ptz
method: ptz
description: Description of the PTZ camera capability
inputSchema:
  $schema: http://json-schema.org/draft-07/schema#
  id: robotics/schemas/0.1.0/capabilities/ptz
  title: PTZ camera capability input
  type: object
  properties:
    method:
      type: string
    parameters:
      type: object
      properties:
        tilt:
          type: number
          minimum: -90
          maximum: 90
        pan:
          type: number
          minimum: -180
          maximum: 180
        zoom:
          type: number
          minimum: 0
          maximum: 100
      required:
        - tilt
        - pan
        - zoom
  required:
    - method
    - parameters
  additionalProperties: false
dataHandlingSchema:
  $schema: http://json-schema.org/draft-07/schema#
  id: robotics/schemas/0.1.0/data_handling/ptz
  type: object
  properties:
    uploadInstructions:
      type: object
      properties:
        image:
          type: object
          properties:
            method:
              const: uploadFile
            parameters:
              type: object
              properties:
                filenamePrefix:
                  type: string
              required:
                - filenamePrefix
          required:
            - method
            - parameters
          additionalProperties: false
      additionalProperties: false
  required:
    - uploadInstructions
In the above schema, we have:
- Required properties: name, externalId, and method.
- Optional properties: description, inputSchema, dataHandlingSchema.
- inputSchema and dataHandlingSchema are objects and are not verified by the Cognite Toolkit; they are passed as is to the Robotics API.
Map
Below is an example of a Map configuration file.
name: Robot navigation map
externalId: robotMap
mapType: POINTCLOUD
description: A map of the robot's navigation environment
frameExternalId: robotFrame
data:
  filename: map.ply
  mimeType: application/octet-stream
locationExternalId: robotLocation
scale: 1.0
In the above schema, we have:
- Required properties: name, externalId, and mapType.
- Optional properties: description, data, locationExternalId, scale.
- mapType has the allowed values WAYPOINTMAP, THREEDMODEL, TWODMAP, and POINTCLOUD.
- data is an object that is not verified by the Cognite Toolkit; it is passed as is to the Robotics API.
Location
Below is an example of a Location configuration file.
name: Water treatment plant
externalId: waterTreatmentPlant1_Windows_3_11_8
description: Original Description
In the above schema, we have:
- Required properties: name and externalId.
- Optional properties: description.
Frame
Below is an example of a Frame configuration file.
name: Some coordinate frame
externalId: someCoordinateFrame
transform:
  parentFrameExternalId: rootCoordinateFrame
  translation:
    x: 0
    y: 0
    z: 0
  orientation:
    x: 0
    y: 0
    z: 0
    w: 1
In the above schema, we have:
- Required properties: name and externalId.
- Optional properties: transform.
- In transform:
  - Required properties: parentFrameExternalId, translation, orientation.
  - Optional properties: None.
  - For translation and orientation, all properties are required.
Data post-processing
Below is an example of a DataPostProcessing configuration file.
name: Read dial gauge
externalId: read_dial_gauge
method: read_dial_gauge
description: Original Description
inputSchema:
  $schema: http://json-schema.org/draft-07/schema#
  id: robotics/schemas/0.1.0/capabilities/ptz
  title: PTZ camera capability input
  type: object
  properties:
    method:
      type: string
    parameters:
      type: object
      properties:
        tilt:
          type: number
          minimum: -90
          maximum: 90
        pan:
          type: number
          minimum: -180
          maximum: 180
        zoom:
          type: number
          minimum: 0
          maximum: 100
      required:
        - tilt
        - pan
        - zoom
  required:
    - method
    - parameters
  additionalProperties: false
In the above schema, we have:
- Required properties: name, externalId, and method.
- Optional properties: description, inputSchema.
- inputSchema is an object and is not verified by the Cognite Toolkit; it is passed as is to the Robotics API.
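The required properties listed for each Robotics resource type can be collected into one table and checked before deployment. The sketch below is an illustration only: missing_required is a hypothetical helper, and the Cognite Toolkit may validate these files differently.

```python
# Required top-level properties per Robotics resource type, as listed above.
REQUIRED_PROPERTIES = {
    "RobotCapability": ("name", "externalId", "method"),
    "Map": ("name", "externalId", "mapType"),
    "Location": ("name", "externalId"),
    "Frame": ("name", "externalId"),
    "DataPostProcessing": ("name", "externalId", "method"),
}


def missing_required(kind, resource):
    """Return the required properties that the resource dictionary lacks."""
    return [prop for prop in REQUIRED_PROPERTIES[kind] if prop not in resource]


missing = missing_required("Map", {"name": "Robot navigation map", "externalId": "robotMap"})
```

The check covers top-level properties only; nested requirements, such as those inside a Frame's transform, are left out of this sketch.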
Sequences
Requires Cognite Toolkit v0.3.0 or later
Resource directory: classic/
API documentation: Sequences
Sequences can be found in the module's classic/ directory. You can define one or more sequences in a single YAML file. The filename must end with Sequence, for example, my_sequence.Sequence.yaml.
Below is an example of a sequence configuration file.
externalId: windturbine_powercurve_xyz
name: Wind turbine power curve XYZ
description: A power curve for a wind turbine model XYZ
dataSetExternalId: ds_sequences
columns:
  - externalId: wind_speed
    type: DOUBLE
    description: Wind speed in m/s
  - externalId: power
    type: DOUBLE
    description: Power in kW
The data set is referenced by the dataSetExternalId. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
Streamlit applications
Requires Cognite Toolkit v0.4.0 or later
Resource directory: streamlit/
API documentation: This uses the files API: Files
Streamlit applications are stored in the module's streamlit/ directory. You can define one or more Streamlit applications in a single YAML file. The filename must end with Streamlit, for example, myapp.Streamlit.yaml.
Below is an example of a Streamlit application configuration file.
externalId: myapp
name: MySuperApp
creator: doctrino@github.com
description: This is a super app
published: true
theme: Light
thumbnail: 'data:image/webp;base64,....'
dataSetExternalId: ds_complete_org
entrypoint: main.py
The data set is referenced by the dataSetExternalId. The Cognite Toolkit automatically resolves the external ID to the internal ID of the data set.
The externalId of the application must be unique within the project and must match the name of a directory where the .py files are located, including the entrypoint file. In addition, there must be a requirements.txt file in the same directory. For the above example, the directory structure would look like this:
./<my_module>
└── streamlit/
    ├── myapp.Streamlit.yaml
    └── myapp/
        ├── main.py
        └── requirements.txt
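A layout like this can be checked before deploying. The sketch below is a hypothetical pre-flight check, not part of the Cognite Toolkit: missing_streamlit_files reports which of the entrypoint file and requirements.txt are absent from the directory named after the externalId.

```python
import tempfile
from pathlib import Path


def missing_streamlit_files(streamlit_dir, external_id, entrypoint):
    """Return the expected app files that are absent, relative to streamlit_dir."""
    root = Path(streamlit_dir)
    app_dir = root / external_id
    expected = [app_dir / entrypoint, app_dir / "requirements.txt"]
    return [path.relative_to(root).as_posix() for path in expected if not path.is_file()]


# Demo in a throwaway directory: main.py exists, requirements.txt does not.
with tempfile.TemporaryDirectory() as tmp:
    demo_app_dir = Path(tmp) / "myapp"
    demo_app_dir.mkdir()
    (demo_app_dir / "main.py").write_text("import streamlit as st\n")
    missing_files = missing_streamlit_files(tmp, "myapp", "main.py")
```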
Transformations
Resource directory: transformations/
Transformation Notifications requires Cognite Toolkit v0.3.0 or later
API documentation Transformations: Transformations
API documentation Schedules: Schedule
API documentation Notifications: Notifications
The transformation configuration files are stored in the module's transformations/ directory. You can have one or more transformations in a single YAML file, but typically you have one transformation per file.
Each transformation can have a corresponding .sql file with the accompanying SQL code. The .sql file should have the same filename as the YAML file that defines the transformation (without the number prefix) or use the externalId of the transformation as the filename.
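One reading of this naming rule is sketched below: given a transformation YAML filename and its externalId, list the .sql filenames that would match. The helper sql_candidates is hypothetical, and the assumption that a number prefix looks like `1.` at the start of the filename follows the prefix convention described earlier in this article.

```python
import re
from pathlib import Path


def sql_candidates(yaml_filename, external_id):
    """List the .sql filenames that would pair with a transformation YAML file."""
    stem = Path(yaml_filename).stem  # '1.tr_assets.yaml' -> '1.tr_assets'
    stem_without_prefix = re.sub(r"^\d+\.", "", stem)  # drop a leading '1.' if present
    return [f"{stem_without_prefix}.sql", f"{external_id}.sql"]


candidates = sql_candidates("1.tr_assets.yaml", "tr_asset_oid_workmate_asset_hierarchy")
```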
The transformation schedule is a separate resource type, tied to the transformation by externalId.
The Cognite Toolkit detects the transformation schedule YAML file by the schedule suffix in the filename, for example, my_transformation.schedule.yaml. The transformation notification YAML file is detected by the Notification suffix in the filename, for example, my_transformation.Notification.yaml. All other YAML files are considered transformation configurations.
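The detection rule can be sketched as a small classifier over filenames. The function transformation_file_kind is hypothetical, and matching the suffix case-sensitively is an assumption of the sketch.

```python
def transformation_file_kind(filename):
    """Classify a YAML file in transformations/ by the suffix before '.yaml'."""
    parts = filename.split(".")
    suffix = parts[-2] if len(parts) >= 3 else ""
    if suffix == "schedule":
        return "schedule"
    if suffix == "Notification":
        return "notification"
    # Everything else is treated as a transformation configuration.
    return "transformation"


kinds = [
    transformation_file_kind(name)
    for name in (
        "my_transformation.schedule.yaml",
        "my_transformation.Notification.yaml",
        "1.my_transformation.yaml",
    )
]
```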
Transformation configuration
Example transformation configuration:
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
dataSetExternalId: 'ds_asset_{{location_name}}'
name: 'asset:{{location_name}}:{{source_name}}:asset_hierarchy'
destination:
  type: 'asset_hierarchy'
ignoreNullFields: true
isPublic: true
conflictMode: upsert
# Specify credentials separately like this:
# You can also use different credentials for running the transformations than the credentials you use to deploy.
authentication:
  clientId: {{ cicd_clientId }}
  clientSecret: {{ cicd_clientSecret }}
  tokenUri: {{ cicd_tokenUri }}
  # Optional: If the IdP requires providing the cicd_scopes
  cdfProjectName: {{ cdfProjectName }}
  scopes: {{ cicd_scopes }}
  # Optional: If the IdP requires providing the cicd_audience
  audience: {{ cicd_audience }}
SELECT
  externalId as externalId,
  if(parentExternalId is null,
    '',
    parentExternalId) as parentExternalId,
  tag as name,
  sourceDb as source,
  description,
  dataset_id('{{asset_dataset}}') as dataSetId,
  to_metadata_except(
    array("sourceDb", "parentExternalId", "description"), *)
    as metadata
FROM
  `{{asset_raw_input_db}}`.`{{asset_raw_input_table}}`
You can configure the transformation with separate source and destination credentials (sourceOidcCredentials and destinationOidcCredentials). Use authentication: to configure both with the same set of credentials. If you want to configure different credentials for the source and destination, use the sourceOidcCredentials and destinationOidcCredentials properties instead.
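Conceptually, the shared authentication property stands in for both credential sets. The sketch below illustrates that expansion on a plain dictionary. The function expand_credentials is hypothetical, and letting explicitly configured source or destination credentials take precedence over authentication is an assumption, not documented behavior.

```python
def expand_credentials(config):
    """Copy a shared 'authentication' block into source and destination credentials."""
    expanded = dict(config)
    shared = expanded.pop("authentication", None)
    if shared is not None:
        # Assumption: explicitly configured credentials win over the shared block.
        expanded.setdefault("sourceOidcCredentials", dict(shared))
        expanded.setdefault("destinationOidcCredentials", dict(shared))
    return expanded


expanded = expand_credentials(
    {"externalId": "tr_example", "authentication": {"clientId": "abc", "clientSecret": "xyz"}}
)
```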
You can specify the SQL inline in the transformation YAML file, using the query property (str), but we recommend that you use a separate .sql file for readability.
The transformation above reuses the globally defined credentials for the Cognite Toolkit. For production projects, we recommend that you use a service account with the minimum required access rights instead.
Configure two new variables in the config.[env].yaml of the module (for example, config.prod.yaml):
abc_clientId: ${ABC_CLIENT_ID}
abc_clientSecret: ${ABC_CLIENT_SECRET}
In the environment (CI/CD pipeline), you need to set the ABC_CLIENT_ID and ABC_CLIENT_SECRET environment variables to the credentials of the application/service account configured in your identity provider for the transformation.
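The ${NAME} placeholders happen to match the syntax of Python's string.Template, which makes the substitution easy to illustrate. The sketch below only mirrors the idea; expand_env is a hypothetical helper, and the Cognite Toolkit's own substitution may differ in its details.

```python
from string import Template


def expand_env(value, env):
    """Replace ${NAME} placeholders with values from env; raise KeyError if one is unset."""
    return Template(value).substitute(env)


# A stand-in for os.environ in a CI/CD pipeline (made-up value).
fake_env = {"ABC_CLIENT_ID": "my-client-id"}
client_id = expand_env("${ABC_CLIENT_ID}", fake_env)
```

Raising on an unset variable, instead of leaving the placeholder in place, surfaces a missing pipeline secret early.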
Transformation Schedule
The transformation schedule is optional. If you do not specify a schedule, the transformation will be created, but not scheduled. You can then schedule it manually in the CDF UI or using the CDF API.
Schedule is a separate API endpoint in CDF.
Example transformation schedule configuration:
externalId: 'tr_asset_{{location_name}}_{{source_name}}_asset_hierarchy'
interval: '{{scheduleHourly}}'
isPaused: {{ pause_transformations }}
Transformation Notifications
The transformation notification is optional. Below is an example of a transformation notification configuration file.
- transformationExternalId: tr_first_transformation
  destination: john.smith@example.com
- transformationExternalId: tr_first_transformation
  destination: jane.smith@example.com
CDF identifies notifications by their internal ID, while the Cognite Toolkit uses a combination of the transformation external ID and the destination to identify each notification. Running cdf clean deletes all notifications for a transformation external ID and destination.
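The composite identifier can be pictured as a tuple of the two fields. The sketch below uses it to find which locally configured notifications do not yet exist in a project. Both helper functions and the sample internal ID are hypothetical.

```python
def notification_key(notification):
    """Identify a notification by (transformation external ID, destination)."""
    return (notification["transformationExternalId"], notification["destination"])


def notifications_to_create(local, existing):
    """Return local notifications whose composite key is not already present."""
    existing_keys = {notification_key(item) for item in existing}
    return [item for item in local if notification_key(item) not in existing_keys]


local = [
    {"transformationExternalId": "tr_first_transformation", "destination": "john.smith@example.com"},
    {"transformationExternalId": "tr_first_transformation", "destination": "jane.smith@example.com"},
]
# What the project already holds; the internal 'id' plays no part in the comparison.
existing = [
    {"id": 42, "transformationExternalId": "tr_first_transformation", "destination": "john.smith@example.com"},
]
new_notifications = notifications_to_create(local, existing)
```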
Time series
Resource directory: timeseries/
API documentation: Time-series
Use the Cognite Toolkit only to upload example data, and not as a general solution to ingest time series into CDF.
Time series are stored in the module's timeseries/ directory. You can define the metadata of one or more time series in one or more YAML files. All YAML files that do not have the DatapointSubscription suffix are considered time series configurations.
Typically, you create time series when ingesting data into CDF by configuring the data pipelines with the corresponding data sets, databases, groups, and so on.
Example time series configuration:
- externalId: 'pi_160696'
  name: 'VAL_23-PT-92504:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgSuctCool Gas Out
    contextMatchString: 23-PT-92504
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92504:X.Value'
  isStep: false
  description: PH 1stStgSuctCool Gas Out
- externalId: 'pi_160702'
  name: 'VAL_23-PT-92536:X.Value'
  dataSetExternalId: ds_timeseries_hamburg
  isString: false
  metadata:
    compdev: '0'
    location5: '2'
    pointtype: Float32
    convers: '1'
    descriptor: PH 1stStgComp Discharge
    contextMatchString: 23-PT-92536
    contextClass: VAL
    digitalset: ''
    zero: '0'
    filtercode: '0'
    compdevpercent: '0'
    compressing: '0'
    tag: 'VAL_23-PT-92536:X.Value'
This configuration creates two time series in the ds_timeseries_hamburg data set with the external IDs pi_160696 and pi_160702.
Uploading datapoints to time series
Use the Cognite Toolkit only to upload example data, and not as a general solution to ingest time series into CDF.
You can upload datapoints to time series using the Cognite Toolkit. The datapoints are stored in the module's timeseries/ directory as csv or parquet files. There are no requirements for the filename of the datapoints file.
Typically, you create time series when ingesting data into CDF by configuring the data pipelines with the corresponding data sets, databases, groups, and so on.
Example of datapoints:
timestamp,pi_160696,pi_160702
2013-01-01 00:00:00,0.9430412044195982,0.9212588490581821
2013-01-01 01:00:00,0.9411303320132799,0.9212528389403117
2013-01-01 02:00:00,0.9394743147709556,0.9212779911470234
2013-01-01 03:00:00,0.9375842300608798,
2013-01-01 04:00:00,0.9355836846172971,0.9153202184209938
This .csv file loads data into the time series created in the previous example. The first column is the timestamp. Each following column is named after the external ID of a time series and holds that time series' value at each timestamp.
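The column layout can be made concrete with a short parser built on the standard csv module. This is a sketch under stated assumptions: datapoints_by_series is a hypothetical helper, the timestamp format is taken from the example rows, and treating an empty cell as "no datapoint" is an assumption rather than documented behavior.

```python
import csv
import io
from datetime import datetime


def datapoints_by_series(csv_text):
    """Group (timestamp, value) pairs by the time series external ID in the header."""
    series = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        timestamp = datetime.strptime(row.pop("timestamp"), "%Y-%m-%d %H:%M:%S")
        for external_id, raw_value in row.items():
            if raw_value:  # assumption: an empty cell means no datapoint
                series.setdefault(external_id, []).append((timestamp, float(raw_value)))
    return series


sample_datapoints = (
    "timestamp,pi_160696,pi_160702\n"
    "2013-01-01 03:00:00,0.9375842300608798,\n"
    "2013-01-01 04:00:00,0.9355836846172971,0.9153202184209938\n"
)
points = datapoints_by_series(sample_datapoints)
```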
Timeseries subscriptions
Requires Cognite Toolkit v0.2.0 or later
Resource directory: timeseries/
API documentation: Timeseries subscriptions
Timeseries subscriptions are stored in the module's timeseries/ directory. We recommend having a separate YAML file for each subscription. Use the DatapointSubscription suffix in the filename, for example, my_subscription.DatapointSubscription.yaml.
The Cognite Toolkit creates the timeseries subscription after the timeseries.
Example timeseries subscription configuration:
externalId: my_subscription
name: My Subscription
description: All timeseries with externalId starting with ts_value
partitionCount: 1
filter:
  prefix:
    property:
      - externalId
    value: ts_value
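To make the prefix filter concrete, the sketch below applies the same filter structure to a list of in-memory time series. The filtering itself happens in CDF; matches_prefix_filter is a hypothetical local imitation that only handles a single-element property path.

```python
def matches_prefix_filter(time_series, filter_config):
    """Check whether the filtered property starts with the configured prefix."""
    prefix = filter_config["prefix"]
    # 'property' is a path given as a list; this sketch supports one element only.
    (property_name,) = prefix["property"]
    value = time_series.get(property_name) or ""
    return value.startswith(prefix["value"])


subscription_filter = {"prefix": {"property": ["externalId"], "value": "ts_value"}}
candidates = [{"externalId": "ts_value_1"}, {"externalId": "pi_160696"}]
selected = [ts["externalId"] for ts in candidates if matches_prefix_filter(ts, subscription_filter)]
```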
Workflows
Requires Cognite Toolkit v0.2.0 or later
WorkflowTrigger requires Cognite Toolkit v0.3.0 or later
Resource directory: workflows/
API documentation: Workflows
The workflows are stored in the module's workflows/ directory. A workflow has three types of resources: Workflow, WorkflowVersion, and WorkflowTrigger. They are identified by the Workflow.yaml, WorkflowVersion.yaml, and WorkflowTrigger.yaml suffixes.
We recommend having one file per workflow and workflow version.
When creating, updating, and deleting workflows, the Cognite Toolkit applies changes in the correct order based on the dependencies between the workflows and workflow versions. The Cognite Toolkit creates transformations and functions before the workflow versions to ensure that the workflow versions can reference them.
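Dependency-based ordering of this kind is commonly computed with a topological sort. The sketch below uses graphlib from the Python standard library (Python 3.9 or later). The dependency table is an illustration based on the paragraph above; placing WorkflowTrigger after WorkflowVersion is an assumption, and the Cognite Toolkit's internal ordering logic may be implemented differently.

```python
from graphlib import TopologicalSorter

# Each resource type maps to the types that must exist before it.
dependencies = {
    "Workflow": set(),
    "Transformation": set(),
    "Function": set(),
    "WorkflowVersion": {"Workflow", "Transformation", "Function"},
    "WorkflowTrigger": {"WorkflowVersion"},  # assumption: triggers reference a version
}

# Creation order: dependencies first. Deletion would walk the same list in reverse.
creation_order = list(TopologicalSorter(dependencies).static_order())
deletion_order = list(reversed(creation_order))
```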
Example workflow:
externalId: wf_my_workflow
description: A workflow for processing data
Example workflow version:
workflowExternalId: wf_my_workflow
version: '1'
workflowDefinition:
  description: 'Run tasks in sequence'
  tasks:
    - externalId: '{{ workflow_external_id }}_function_task'
      type: 'function'
      parameters:
        function:
          externalId: 'fn_first_function'
          data: {}
        isAsyncComplete: false
      name: 'Task One'
      description: First task
      retries: 3
      timeout: 3600
      onFailure: 'abortWorkflow'
    - externalId: '{{ workflow_external_id }}_transformation_task'
      type: 'transformation'
      parameters:
        transformation:
          externalId: 'tr_first_transformation'
          concurrencyPolicy: fail
      name: 'Task Two'
      description: Second task
      retries: 3
      timeout: 3600
      onFailure: 'skipTask'
      dependsOn:
        - externalId: '{{ workflow_external_id }}_function_task'
Example workflow trigger:
externalId: my_trigger
triggerRule:
  triggerType: schedule
  cronExpression: '0 0 * * *'
input:
  my_input: 'data'
workflowExternalId: wf_my_workflow
workflowVersion: '1'
authentication:
  clientId: {{ my_trigger_clientId }}
  clientSecret: ${IDP_WF_TRIGGER_SECRET}
You can specify the credentials for the workflow trigger by adding an authentication property to the WorkflowTrigger configuration.
externalId: my_trigger
---
authentication:
  clientId: {{ my_trigger_clientId }}
  clientSecret: ${IDP_WF_TRIGGER_SECRET}
The CDF API doesn't support updating workflow triggers. When you update a trigger with the Cognite Toolkit, it deletes the existing trigger and creates a new one with the updated configuration. This means that the run history of the trigger is lost.