Use the cdf data upload command to upload data files from your local machine to Cognite Data Fusion (CDF). The command processes structured directories containing data files and their corresponding manifest files. The manifest files define how the data should be uploaded.

Prerequisites

Before uploading data, ensure you have:
  • Installed and configured the Cognite Toolkit.
  • Authenticated with CDF using appropriate credentials.
  • Prepared your data files in the required directory structure.
  • Created manifest files for each data file you want to upload.

Directory structure

The upload command requires a directory containing your data files. Organize the directory using this structure:
input_dir/
  resources/                  # Optional, used only with --deploy-resources
    raw/
      table1.Table.yaml
      table2.Table.yaml
  
  datafile1.<kind>.ndjson     # Data file of a specific kind
  datafile1.Manifest.yaml     # Manifest for datafile1
  
  datafile2-part-0001.<kind2>.ndjson   # Part 1 of another data file
  datafile2-part-0002.<kind2>.ndjson   # Part 2 of the same data file
  datafile2.Manifest.yaml     # Manifest file for datafile2
When you run the upload command, the Cognite Toolkit performs the following steps:
  1. Searches for all YAML files with the .Manifest.yaml suffix.
  2. For each manifest file, locates the data file (or its parts) with the same prefix.
  3. If the --deploy-resources flag is set, deploys all resources in the resources directory.
  4. Uploads the data files for each manifest.
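For example, assuming the input directory is passed as a positional argument (run cdf data upload --help to confirm the exact syntax for your version):

cdf data upload input_dir --deploy-resources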
The manifest file tells the Cognite Toolkit what kind of data is being uploaded and provides any additional information necessary for the upload. For example, if you are uploading a RAW table, the manifest file specifies which database and table the data should be uploaded to. The format of the manifest file and the supported data types depend on the data you are uploading. Below is a complete list of all supported data types with the expected format of the manifest file for each type.

The resources directory contains configurations for resources that must be created before uploading the data. For example, in the case of RAW data, this can be table definitions. If you set the --deploy-resources flag when running the upload command, these resources are created before the data is uploaded.
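For example, a RAW table definition in the resources directory can look like this (a minimal sketch; the database and table names are placeholders, and the exact schema for Table resources is documented with the Cognite Toolkit resource library):

resources/raw/table1.Table.yaml
dbName: my_database_name
tableName: my_table_name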

Assets

Kind: Assets
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /assets

You can upload assets in both record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file for assets can specify a data set or a hierarchy:
myDataSet.Manifest.yaml
kind: Assets
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: Assets
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myHierarchy-part-0001.Assets.ndjson
{"externalId":"root_asset_external_id","name":"Root Asset", "depth": 0, "metadata": {"department": "engineering"}}
{"externalId":"child_asset_1","name":"Child Asset 1","parentExternalId":"root_asset_external_id", "depth": 1}
{"externalId":"child_asset_2","name":"Child Asset 2","parentExternalId":"root_asset_external_id" , "depth": 1}
# ...
Or for tabular formats:
myDataSet.Assets.csv
externalId,name,parentExternalId,depth,metadata.department
root_asset_external_id,Root Asset,,0,engineering
child_asset_1,Child Asset 1,root_asset_external_id,1,
child_asset_2,Child Asset 2,root_asset_external_id,1,
# ...
When uploading an asset hierarchy, all assets in the data file must have the depth field defined. The Cognite Toolkit uses this value to ensure that parents are created before their children.
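If your source data does not include depth, you can derive it from the parent references before writing the NDJSON file. A minimal Python sketch (the file names are assumptions; only the externalId/parentExternalId handling matters here):

import json
from functools import lru_cache

# Enrich asset records with the depth field required for hierarchy uploads:
# 0 for root assets, parent's depth + 1 for everything else.
with open("assets_raw.ndjson", encoding="utf-8") as f:
    assets = [json.loads(line) for line in f if line.strip()]

by_id = {a["externalId"]: a for a in assets}

@lru_cache(maxsize=None)
def depth_of(external_id: str) -> int:
    parent_id = by_id[external_id].get("parentExternalId")
    return 0 if parent_id is None else depth_of(parent_id) + 1

for asset in assets:
    asset["depth"] = depth_of(asset["externalId"])

# Sort so that parents precede their children in the output file.
with open("myHierarchy-part-0001.Assets.ndjson", "w", encoding="utf-8") as f:
    for asset in sorted(assets, key=lambda a: a["depth"]):
        f.write(json.dumps(asset) + "\n")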

Charts

Kind: Charts
Supported data file format: .ndjson
API endpoint: No public API

You can upload charts using the record (.ndjson) format. The manifest file specifies the external IDs of the charts to upload:
myCharts.Manifest.yaml
kind: Charts
type: chartExternalId
externalIds:
  - chart_external_id_1
  - chart_external_id_2
The data file for records looks like this:
myCharts-part-0001.Charts.ndjson
{"externalId": "5400ab34-37f1-470d-b965-d81404ac92d8", "visibility": "PUBLIC", "data": ...}
Charts are typically downloaded using the cdf data download charts command.

Datapoints

Kind: Datapoints
Supported data file formats: .csv, .parquet
API endpoint: /timeseries/data

You can upload datapoints using tabular (.csv, .parquet) formats. The manifest file specifies the mapping between the columns in the data file and the time series identifiers. The identifiers can be internal IDs, external IDs, instance IDs, or a mix of these. In addition, you must specify the column containing the timestamps:
myDatapoints.Manifest.yaml
kind: Datapoints
type: datapointsFile
timestampColumn: timestamp
columns:
- columnType: instance
  column: ts_instance_id
  dType: numeric
  space: my_timeseries_instance_space
  externalId: my_cognite_timeseries_id
- columnType: externalId
  column: ts_external_id
  dType: string
  externalId: my_external_timeseries_id
- columnType: internalId
  column: ts_internal_id
  dType: numeric
  internalId: 1234567890123
The data file for tabular formats looks like this:
myDatapoints.Datapoints.csv
timestamp,ts_instance_id,ts_external_id,ts_internal_id
1633036800000,30.0,value1,1111111111111
1633036860000,34.0,value2,1111111111111
1633036920000,34.1,value3,1111111111111
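Each configured column holds the values for one time series, and every row shares the timestamp from the timestamp column (epoch milliseconds here). A minimal Python sketch that writes such a file (the file and column names follow the example above; the rest is an assumption):

import csv

# Datapoint rows: a shared timestamp (epoch ms) plus one value per
# time series column configured in the manifest.
rows = [
    (1633036800000, 30.0, "value1", 1111111111111),
    (1633036860000, 34.0, "value2", 1111111111111),
    (1633036920000, 34.1, "value3", 1111111111111),
]

with open("myDatapoints.Datapoints.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    # Header names must match the 'column' entries in the manifest.
    writer.writerow(["timestamp", "ts_instance_id", "ts_external_id", "ts_internal_id"])
    writer.writerows(rows)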

Events

Kind: Events
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /events

You can upload events using record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the target data set or the root asset for the event hierarchy:
myDataSet.Manifest.yaml
kind: Events
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: Events
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myEvents-part-0001.Events.ndjson
{"externalId":"event_1","type":"inspection","startTime":1633036800000,"dataSetExternalId":"my_data_set_external_id"}
{"externalId":"event_2","type":"maintenance","startTime":1633123200000,"dataSetExternalId":"my_data_set_external_id"}
# ...
Or for tabular formats:
myEvents.Events.csv
externalId,type,startTime,dataSetExternalId,metadata.location
event_1,inspection,1633036800000,my_data_set_external_id,Oslo
event_2,maintenance,1633123200000,my_data_set_external_id,Oslo
# ...

Instances

Kind: Instances
Supported data file format: .ndjson
API endpoint: /models/instances

You can upload instances using the record (.ndjson) format. The manifest file specifies the instance space for the instances:
myInstances.Manifest.yaml
kind: Instances
type: instanceSpace
instanceSpace: my_instance_space
instanceType: node # Either 'node' or 'edge'. Defaults to 'node'
view: # Optional
  space: my_view_space
  externalId: my_view_external_id
  version: v1
The data file for records:
myInstances-part-0001.Instances.ndjson
{"space":"my_instance_space", "externalId":"instance_1", "type":"node", "sources": [{"source": {"space":"my_view_space", "externalId":"my_view", "version": "v1", "type": "view"}}], "properties": {"prop1": "value1"}}
# ...
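A minimal Python sketch that produces such a record file (the space, view, and file names follow the example above; the helper itself is hypothetical):

import json

# The view that the instance properties are written against.
VIEW = {"space": "my_view_space", "externalId": "my_view", "version": "v1", "type": "view"}

def node(external_id: str, properties: dict) -> dict:
    # One record per node, matching the NDJSON layout shown above.
    return {
        "space": "my_instance_space",
        "externalId": external_id,
        "type": "node",
        "sources": [{"source": VIEW}],
        "properties": properties,
    }

with open("myInstances-part-0001.Instances.ndjson", "w", encoding="utf-8") as f:
    for record in [node("instance_1", {"prop1": "value1"})]:
        f.write(json.dumps(record) + "\n")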

Raw

Kind: RawRows
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /raw/dbs/{dbName}/tables/{tableName}/rows

You can upload raw rows using the record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the database and table the data should be uploaded to. In addition, for tabular formats, you can specify the column that should be used as the unique row key. If not specified, CDF automatically generates row keys.
myRawData.Manifest.yaml
kind: RawRows
type: rawTable
table:
  dbName: my_database_name
  tableName: my_table_name
key: my_unique_key_column   # Optional
The data file for records:
myRawData-part-0001.RawRows.ndjson
{"key": "row1", "columns": {"col1": "value1", "col2": 123}}
{"key": "row2", "columns": {"col1": "value2", "col2": 456, "col3": true}}
# ...
Or for tabular formats:
myRawData.RawRows.csv
my_unique_key_column,col1,col2,col3
row1,value1,123,
row2,value2,456,true
# ...
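The two formats carry the same information: in NDJSON each record names its key explicitly, while in tabular files the key comes from the column named in the manifest. A minimal Python sketch converting the CSV above into the NDJSON record format (a hypothetical helper, not part of the Toolkit):

import csv
import json

# Turn each CSV row into a RAW record, using the manifest's key column
# ("my_unique_key_column") as the row key. Note that csv reads every
# value as a string; type conversion is out of scope for this sketch.
with open("myRawData.RawRows.csv", newline="", encoding="utf-8") as src, \
        open("myRawData-part-0001.RawRows.ndjson", "w", encoding="utf-8") as dst:
    for row in csv.DictReader(src):
        key = row.pop("my_unique_key_column")
        dst.write(json.dumps({"key": key, "columns": row}) + "\n")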

Time series

Kind: TimeSeries
Supported data file formats: .ndjson, .csv, .parquet
API endpoint: /timeseries

You can upload time series definitions using the record (.ndjson) and tabular (.csv, .parquet) formats. The manifest file specifies the target data set or the root asset of the hierarchy the time series are connected to:
myDataSet.Manifest.yaml
kind: TimeSeries
type: dataSet
dataSetExternalId: my_data_set_external_id
myHierarchy.Manifest.yaml
kind: TimeSeries
type: assetSubtree
hierarchy: root_asset_external_id
The data file for records looks like this:
myTimeSeries-part-0001.TimeSeries.ndjson
{"externalId":"ts_1","name":"Temperature","isString":false,"dataSetExternalId":"my_data_set_external_id"}
{"externalId":"ts_2","name":"Status","isString":true,"dataSetExternalId":"my_data_set_external_id"}
# ...
Or for tabular formats:
myTimeSeries.TimeSeries.csv
externalId,name,isString,dataSetExternalId
ts_1,Temperature,false,my_data_set_external_id
ts_2,Status,true,my_data_set_external_id
# ...

Further reading