Manage data workflows

You can create, run, and monitor data workflows using the Cognite API or the Cognite Python SDK. Use Data workflows in the CDF user interface to automate and manage workflow tasks and process runs.

Before you start

To add the necessary capabilities for data workflows, see assign capabilities.

Tip

You can assign a workflow to a data set when you create the workflow. All read and write operations on the workflow and its related resources, such as versions, runs, tasks, and triggers, require access to this data set.

Create data workflows

Use Data workflows in the CDF user interface to automate and manage tasks and process runs.

  1. Navigate to Data management > Data workflows.

  2. Select + Create workflow and follow the wizard to start building your workflow:

  • Add workflow tasks.

  • Create workflow triggers.

  • Switch between editing the workflow and viewing the workflow run history.

  • View or modify workflow versions.

  • Run the workflow, with or without input data.
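
You can also build the same thing programmatically with the Python SDK. The sketch below is a minimal example, assuming the workflow data classes are exported from cognite.client.data_classes as in recent SDK versions; my_workflow, my_transformation, and the data set ID are placeholders.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import (
    TransformationTaskParameters,
    WorkflowDefinitionUpsert,
    WorkflowTask,
    WorkflowUpsert,
    WorkflowVersionUpsert,
)

client = CogniteClient()

# Create (or update) the workflow. data_set_id is optional and, as noted above,
# scopes access to the workflow and its related resources.
client.workflows.upsert(
    WorkflowUpsert(
        external_id="my_workflow",  # placeholder identifier
        description="Example data workflow",
        data_set_id=123456789,  # replace with the internal ID of your data set
    )
)

# Add a version containing a single transformation task.
client.workflows.versions.upsert(
    WorkflowVersionUpsert(
        workflow_external_id="my_workflow",
        version="1",
        workflow_definition=WorkflowDefinitionUpsert(
            description="First version",
            tasks=[
                WorkflowTask(
                    external_id="transform_data",  # placeholder task ID
                    parameters=TransformationTaskParameters(external_id="my_transformation"),
                )
            ],
        ),
    )
)
```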

Run data workflows

To run a workflow, call the /run API endpoint and provide the workflowExternalId and version.
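
With the Python SDK, which wraps this endpoint, starting a run looks roughly like the sketch below; in older SDK versions the method is named trigger instead of run, and my_workflow is a placeholder.

```python
from cognite.client import CogniteClient

client = CogniteClient()

# Start a run of version "1" of the workflow. The SDK creates the session
# (and nonce) behind the scenes; see the authentication section below.
execution = client.workflows.executions.run(
    workflow_external_id="my_workflow",
    version="1",
)
print(execution.id, execution.status)
```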

Authenticate a workflow

To start a workflow run, the workflow requires a nonce: a temporary token used for authentication when the workflow triggers the processes defined by its tasks. You retrieve a nonce from the Sessions API when you create a session. When you trigger a workflow with the Python SDK, the session is created behind the scenes.

All tasks in a workflow execution use the same nonce for authentication, except transformation tasks configured with useTransformationCredentials set to true; for these, the transformation job uses the credentials specified in the transformation. When a trigger creates a workflow run, the nonce configured for the trigger is passed to the workflow run.

Note

Make sure that the nonce has the necessary capabilities to perform the tasks in the workflow.
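
If you call the API directly instead of using the SDK, you create the session yourself and pass the resulting nonce in the authentication object of the /run request. Below is a minimal sketch using the Python SDK's Sessions API; the client credentials are placeholders.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import ClientCredentials

client = CogniteClient()

# Create a session to obtain a nonce. The principal behind these credentials
# must hold the capabilities required by every task in the workflow.
session = client.iam.sessions.create(
    client_credentials=ClientCredentials(
        client_id="<client-id>", client_secret="<client-secret>"
    )
)

# Pass the nonce in the run request body, for example:
# {"authentication": {"nonce": session.nonce}}
print(session.nonce)
```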

Define workflow input data

You can define custom input data as part of the workflow run. This means workflows can behave differently based on the input (parametric workflows). The input data is accessible throughout the workflow using dynamic references.

Note

Don't use input data to pass large amounts of data to the workflow. For large data sets, we recommend using CDF's data stores and setting the tasks to directly read from and write to these. See Limits and restrictions for input size limits.

The workflow input can come from multiple sources:

  • Manual execution input: Data you provide when manually triggering a workflow.
  • Trigger input: Static data defined when you create triggers.
  • Trigger-generated input: Dynamic data collected by triggers, for example, data modeling triggers that add items to arrays.
  • System input: Metadata automatically added by the system, such as version.

Add metadata

A workflow run can also have custom, application-specific metadata, which is separate from the input data. Use metadata to track execution context, add operational information, and facilitate debugging and monitoring.
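
As an illustration, input and metadata can both be supplied when starting a run, assuming the run method accepts these parameters as in recent SDK versions. The field names below are examples only; inside the workflow, tasks can reference the input with dynamic references such as ${workflow.input.report_date}.

```python
from cognite.client import CogniteClient

client = CogniteClient()

# Start a run with custom input (parametric workflow) and application metadata.
# Keep the input small; see the limits note above.
execution = client.workflows.executions.run(
    workflow_external_id="my_workflow",
    version="1",
    input={"report_date": "2024-01-31", "site": "plant-a"},
    metadata={"triggered_by": "nightly-report-job"},
)
```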

Workflow execution statuses

During execution, each workflow progresses through different statuses that indicate its current state:

  • RUNNING: The workflow is currently running.
  • COMPLETED: All tasks have completed successfully.
  • FAILED: The workflow failed to complete due to an error and could not continue to run.
  • TIMED_OUT: The workflow exceeded its maximum execution time.
  • TERMINATED: The workflow was manually canceled or stopped by another workflow or process.

Tip

Monitor these statuses to understand workflow health and take any appropriate actions.
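
For example, with the Python SDK you can list recent runs and inspect their statuses; the method and attribute names below follow recent SDK versions and may vary slightly.

```python
from cognite.client import CogniteClient

client = CogniteClient()

# List the most recent runs of a workflow version and check their statuses.
executions = client.workflows.executions.list(
    workflow_version_ids=[("my_workflow", "1")], limit=5
)
for execution in executions:
    print(execution.id, execution.status)

# Fetch full details for one run, including the state of each task.
detailed = client.workflows.executions.retrieve_detailed(executions[0].id)
for task in detailed.executed_tasks:
    print(task.external_id, task.status)
```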

Task input and output

Each task receives a task definition as input during a workflow run. Different outputs are returned depending on the task type:

  • Transformation: jobId of the transformation job in CDF.
  • Function: callId, functionId, and response of the Cognite Functions call.
  • Simulation: runId, logId, and statusMessage of the simulation run in CDF.
  • CDF: statusCode and response (body) of the API call.
  • Dynamic: dynamicTasks, an array of dynamically generated tasks based on the input data. Each task in this array has its own externalId and type.

These outputs can be referenced by subsequent tasks using dynamic references, enabling data flow between tasks. For dynamic tasks, refer to the externalId of the dynamically generated tasks to access their outputs.
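
For example, in a version definition a later task can consume an earlier task's output through a dynamic reference. The ${...} reference pattern and the identifiers below are illustrative; see the task documentation for the exact syntax.

```python
from cognite.client.data_classes import (
    CDFTaskParameters,
    FunctionTaskParameters,
    WorkflowTask,
)

tasks = [
    # First task: call a Cognite Function. Its output includes callId,
    # functionId, and the function's response.
    WorkflowTask(
        external_id="prepare_payload",
        parameters=FunctionTaskParameters(external_id="my_function"),
    ),
    # Second task: call the CDF API, passing the previous task's response
    # as the request body via a dynamic reference to its output.
    WorkflowTask(
        external_id="write_events",
        parameters=CDFTaskParameters(
            resource_path="/events",
            method="POST",
            body="${prepare_payload.output.response}",
        ),
        depends_on=["prepare_payload"],
    ),
]
```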

Using client secrets as input to workflows and tasks

Some tasks in a workflow run may need access to client secrets, client credentials, or other confidential information. Data workflows currently don't support secure storage of client secrets as input to workflows or tasks.

We recommend using the secure secret storage available in Cognite Functions, combined with a Function task in the workflow.
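
A common pattern is to store the secret with the Cognite Function and read it inside the function's handler, so it never appears in workflow input. Here is a sketch of such a handler; the secret key my-api-key is a placeholder.

```python
# Handler deployed as a Cognite Function. Secrets stored with the function are
# passed in through the `secrets` argument and stay out of the workflow input.
def handle(client, data, secrets):
    api_key = secrets["my-api-key"]  # placeholder secret key
    # ...call the external system with api_key and return a small result...
    return {"status": "ok"}
```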

Automate workflows with triggers

Use triggers to automate your data workflow runs. You can run workflows at regular intervals for scheduled automation, for instance daily reports or maintenance tasks, or react to data changes in real time: triggers monitor specific conditions and automatically start workflow executions when the conditions are met.

Triggers also automatically handle groups of related data changes (batch processing). For details about trigger types, configuration, and how they pass data to workflows, see Triggers for data workflows and API workflow triggers.
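
For example, a scheduled trigger can be created with the Python SDK. The sketch below assumes the trigger classes available in recent SDK versions; the cron expression, identifiers, and credentials are placeholders.

```python
from cognite.client import CogniteClient
from cognite.client.data_classes import ClientCredentials
from cognite.client.data_classes.workflows import (
    WorkflowScheduledTriggerRule,
    WorkflowTriggerUpsert,
)

client = CogniteClient()

# Run the workflow every morning at 06:00 UTC. The client credentials are used
# to create the session (nonce) that authenticates each triggered run.
client.workflows.triggers.upsert(
    WorkflowTriggerUpsert(
        external_id="daily_report_trigger",
        trigger_rule=WorkflowScheduledTriggerRule(cron_expression="0 6 * * *"),
        workflow_external_id="my_workflow",
        workflow_version="1",
        input={"report_type": "daily"},
    ),
    client_credentials=ClientCredentials(
        client_id="<client-id>", client_secret="<client-secret>"
    ),
)
```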

Error handling and retries

Workflows provide built-in error-handling mechanisms:

  • Task-level retries: Each task can be configured with retry attempts.

  • Failure policies: Tasks can be marked as optional (skipTask) or required (abortWorkflow).

  • Compensation flows: Use subsequent tasks to handle failures or clean up.

  • Manual intervention: Use Function tasks with isAsyncComplete=true to create approval points where the workflow waits for human review and manual task completion.

For details about task configuration and error handling, see Tasks in data workflows.
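
To make the retry and failure-policy options in the list above concrete, here is a sketch of a task definition using the Python SDK's WorkflowTask; the values are examples only.

```python
from cognite.client.data_classes import TransformationTaskParameters, WorkflowTask

task = WorkflowTask(
    external_id="optional_cleanup",
    parameters=TransformationTaskParameters(external_id="cleanup_transformation"),
    retries=3,              # retry up to three times before the task is considered failed
    timeout=3600,           # per-task timeout in seconds
    on_failure="skipTask",  # treat the task as optional; "abortWorkflow" makes it required
)
```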