Use the cdf data upload command to upload data files from your local machine to Cognite Data Fusion (CDF). The command processes structured directories containing data files and their corresponding manifest files. The manifest files define how the data should be uploaded.

Prerequisites

Before uploading data, ensure you have:
  • Installed and configured the Cognite Toolkit.
  • Authenticated with CDF using appropriate credentials.
  • Prepared your data files in the required directory structure.
  • Created manifest files for each data file you want to upload.

Directory structure

The upload command requires a directory containing your data files. Organize the directory using this structure:
input_dir/
  resources/                  # Optional, used only with --deploy-resources
    raw/
      table1.Table.yaml
      table2.Table.yaml
  
  datafile1.<kind>.ndjson     # Data file of a specific kind
  datafile1.Manifest.yaml     # Manifest for datafile1
  
  datafile2-part-0001.<kind2>.ndjson   # Part 1 of another data file
  datafile2-part-0002.<kind2>.ndjson   # Part 2 of the same data file
  datafile2.Manifest.yaml     # Manifest file for datafile2
When you run the upload command, the Cognite Toolkit performs the following steps:
  1. Searches for all YAML files with the .Manifest.yaml suffix.
  2. For each manifest file, locates the data file (or its parts) with the same prefix.
  3. If the --deploy-resources flag is set, deploys all resources in the resources directory.
  4. Uploads the data files for each manifest.
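For example, assuming the input directory is passed as a positional argument (run cdf data upload --help to confirm the exact syntax for your version):

cdf data upload input_dir --deploy-resources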
The manifest file tells the Cognite Toolkit what kind of data is being uploaded and provides any additional information necessary for the upload. For example, if you are uploading a RAW table, the manifest file specifies which database and table the data should be uploaded to. The format of the manifest file and the supported data types depend on the data you are uploading. Below is a complete list of all supported data types with the expected format of the manifest file for each type.

The resources directory contains configurations for resources that must be created before uploading the data. For example, in the case of RAW data, this can be table definitions. If you set the --deploy-resources flag when running the upload command, these resources are created before the data is uploaded.
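For example, a RAW table definition in the resources directory can look like this (a minimal sketch; the database and table names are placeholders, and the exact schema for Table resources is documented with the Cognite Toolkit resource library):

resources/raw/table1.Table.yaml
dbName: my_database_name
tableName: my_table_name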

Assets

Kind: Assets
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /assets

You can upload assets in both record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file for assets can specify a data set or a hierarchy:
myDataSet.Manifest.yaml
kind: Assets
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: Assets
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myHierarchy-part-0001.Assets.ndjson
{"externalId":"root_asset_external_id","name":"Root Asset", "depth": 0, "metadata": {"department": "engineering"}}
{"externalId":"child_asset_1","name":"Child Asset 1","parentExternalId":"root_asset_external_id", "depth": 1}
{"externalId":"child_asset_2","name":"Child Asset 2","parentExternalId":"root_asset_external_id" , "depth": 1}
# ...
Or for tabular formats:
myDataSet.Assets.csv
externalId,name,parentExternalId,depth,metadata.department
root_asset_external_id,Root Asset,,0,engineering
child_asset_1,Child Asset 1,root_asset_external_id,1,
child_asset_2,Child Asset 2,root_asset_external_id,1,
# ...
When uploading an asset hierarchy, all assets in the data file must have the depth field defined. The Cognite Toolkit uses this value to ensure that parents are created before their children.
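If your source data does not include depth, you can derive it from the parent references before writing the NDJSON file. A minimal Python sketch (the file names are assumptions; only the externalId/parentExternalId handling matters here):

import json
from functools import lru_cache

# Enrich asset records with the depth field required for hierarchy uploads:
# 0 for root assets, parent's depth + 1 for everything else.
with open("assets_raw.ndjson", encoding="utf-8") as f:
    assets = [json.loads(line) for line in f if line.strip()]

by_id = {a["externalId"]: a for a in assets}

@lru_cache(maxsize=None)
def depth_of(external_id: str) -> int:
    parent_id = by_id[external_id].get("parentExternalId")
    return 0 if parent_id is None else depth_of(parent_id) + 1

for asset in assets:
    asset["depth"] = depth_of(asset["externalId"])

# Sort so that parents precede their children in the output file.
with open("myHierarchy-part-0001.Assets.ndjson", "w", encoding="utf-8") as f:
    for asset in sorted(assets, key=lambda a: a["depth"]):
        f.write(json.dumps(asset) + "\n")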

Charts

Kind: Charts
Supported data file format: .ndjson
API endpoint: No public API

You can upload charts using the record (.ndjson) format. The manifest file specifies the external IDs of the charts to upload:
myCharts.Manifest.yaml
kind: Charts
type: chartExternalId
externalIds:
  - chart_external_id_1
  - chart_external_id_2
The data file for records looks like this:
myCharts-part-0001.Charts.ndjson
{"externalId": "5400ab34-37f1-470d-b965-d81404ac92d8", "visibility": "PUBLIC", "data": ...}
Charts are typically downloaded using the cdf data download charts command.

Datapoints

Kind: Datapoints
Supported data file formats: .csv, .parquet
API endpoint: /timeseries/data

You can upload datapoints using tabular (.csv, .parquet) formats. The manifest file specifies the mapping between the columns in the data file and the time series identifiers. The identifiers can be internal IDs, external IDs, instance IDs, or a mix of these. In addition, you must specify the column containing the timestamps:
myDatapoints.Manifest.yaml
kind: Datapoints
type: datapointsFile
timestampColumn: timestamp
columns:
- columnType: instance
  column: ts_instance_id
  dType: numeric
  space: my_timeseries_instance_space
  externalId: my_cognite_timeseries_id
- columnType: externalId
  column: ts_external_id
  dType: string
  externalId: my_external_timeseries_id
- columnType: internalId
  column: ts_internal_id
  dType: numeric
  internalId: 1234567890123
The data file for tabular formats looks like this:
myDatapoints.Datapoints.csv
timestamp,ts_instance_id,ts_external_id,ts_internal_id
1633036800000,30.0,value1,1111111111111
1633036860000,34.0,value2,1111111111111
1633036920000,34.1,value3,1111111111111
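Each configured column holds the values for one time series, and every row shares the timestamp from the timestamp column (epoch milliseconds here). A minimal Python sketch that writes such a file (the file and column names follow the example above; the rest is an assumption):

import csv

# Datapoint rows: a shared timestamp (epoch ms) plus one value per
# time series column configured in the manifest.
rows = [
    (1633036800000, 30.0, "value1", 1111111111111),
    (1633036860000, 34.0, "value2", 1111111111111),
    (1633036920000, 34.1, "value3", 1111111111111),
]

with open("myDatapoints.Datapoints.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    # Header names must match the 'column' entries in the manifest.
    writer.writerow(["timestamp", "ts_instance_id", "ts_external_id", "ts_internal_id"])
    writer.writerows(rows)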

Events

Kind: Events
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /events

You can upload events using record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the target data set or the root asset for the event hierarchy:
myDataSet.Manifest.yaml
kind: Events
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: Events
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myEvents-part-0001.Events.ndjson
{"externalId":"event_1","type":"inspection","startTime":1633036800000,"dataSetExternalId":"my_data_set_external_id"}
{"externalId":"event_2","type":"maintenance","startTime":1633123200000,"dataSetExternalId":"my_data_set_external_id"}
# ...
Or for tabular formats:
myEvents.Events.csv
externalId,type,startTime,dataSetExternalId,metadata.location
event_1,inspection,1633036800000,my_data_set_external_id,Oslo
event_2,maintenance,1633123200000,my_data_set_external_id,Oslo
# ...

Instances

Kind: Instances
Supported data file format: .ndjson
API endpoint: /models/instances

You can upload instances using the record (.ndjson) format. The manifest file specifies the instance space for the instances:
myInstances.Manifest.yaml
kind: Instances
type: instanceSpace
instanceSpace: my_instance_space
instanceType: node # Either 'node' or 'edge'. Defaults to 'node'
view: # Optional
  space: my_view_space
  externalId: my_view_external_id
  version: v1
The data file for records:
myInstances-part-0001.Instances.ndjson
{"space":"my_instance_space", "externalId":"instance_1", "type":"node", "sources": [{"source": {"space":"my_view_space", "externalId":"my_view", "version": "v1", "type": "view"}}], "properties": {"prop1": "value1"}}
# ...
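A minimal Python sketch that produces such a record file (the space, view, and file names follow the example above; the helper itself is hypothetical):

import json

# The view that the instance properties are written against.
VIEW = {"space": "my_view_space", "externalId": "my_view", "version": "v1", "type": "view"}

def node(external_id: str, properties: dict) -> dict:
    # One record per node, matching the NDJSON layout shown above.
    return {
        "space": "my_instance_space",
        "externalId": external_id,
        "type": "node",
        "sources": [{"source": VIEW}],
        "properties": properties,
    }

with open("myInstances-part-0001.Instances.ndjson", "w", encoding="utf-8") as f:
    for record in [node("instance_1", {"prop1": "value1"})]:
        f.write(json.dumps(record) + "\n")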

Raw

Kind: RawRows
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /raw/dbs/{dbName}/tables/{tableName}/rows

You can upload raw rows using the record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the database and table the data should be uploaded to. In addition, for tabular formats, you can specify the column that should be used as the unique row key. If not specified, CDF automatically generates row keys.
myRawData.Manifest.yaml
kind: RawRows
type: rawTable
table:
  dbName: my_database_name
  tableName: my_table_name
key: my_unique_key_column   # Optional
The data file for records:
myRawData-part-0001.RawRows.ndjson
{"key": "row1", "columns": {"col1": "value1", "col2": 123}}
{"key": "row2", "columns": {"col1": "value2", "col2": 456, "col3": true}}
# ...
Or for tabular formats:
myRawData.RawRows.csv
my_unique_key_column,col1,col2,col3
row1,value1,123,
row2,value2,456,true
# ...
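The two formats carry the same information: in NDJSON each record names its key explicitly, while in tabular files the key comes from the column named in the manifest. A minimal Python sketch converting the CSV above into the NDJSON record format (a hypothetical helper, not part of the Toolkit):

import csv
import json

# Turn each CSV row into a RAW record, using the manifest's key column
# ("my_unique_key_column") as the row key. Note that csv reads every
# value as a string; type conversion is out of scope for this sketch.
with open("myRawData.RawRows.csv", newline="", encoding="utf-8") as src, \
        open("myRawData-part-0001.RawRows.ndjson", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        key = row.pop("my_unique_key_column")
        dst.write(json.dumps({"key": key, "columns": row}) + "\n")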

Time series

Kind: TimeSeries
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /timeseries

You can upload time series definitions using the record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the target data set or the root asset of the hierarchy the time series are connected to:
myDataSet.Manifest.yaml
kind: TimeSeries
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: TimeSeries
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myTimeSeries-part-0001.TimeSeries.ndjson
{"externalId":"ts_1","name":"Temperature","isString":false,"dataSetExternalId":"my_data_set_external_id"}
{"externalId":"ts_2","name":"Status","isString":true,"dataSetExternalId":"my_data_set_external_id"}
# ...
Or for tabular formats:
myTimeSeries.TimeSeries.csv
externalId,name,isString,dataSetExternalId
ts_1,Temperature,false,my_data_set_external_id
ts_2,Status,true,my_data_set_external_id
# ...

Further reading