You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Using values from environment variables

The configuration file allows substitutions with environment variables. For example:
cognite:
  secret: ${COGNITE_CLIENT_SECRET}
will load the value from the COGNITE_CLIENT_SECRET environment variable into the cognite/secret parameter. You can also do string interpolation with environment variables, for example:
url: http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}
Implicit substitutions only work for unquoted value strings. For quoted strings, use the !env tag to activate environment substitution:
url: !env 'http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}'

Using values from Azure Key Vault

The extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:
password: !keyvault my-secret-name
To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:
| Parameter | Description |
| --- | --- |
| keyvault-name | Name of the Key Vault to load secrets from. |
| authentication-method | How to authenticate to Azure. Either default or client-secret. With default, the extractor looks at the user running the extractor and picks up pre-configured Azure logins from tools like the Azure CLI. With client-secret, the extractor authenticates with a configured client ID/secret pair. |
| client-id | Required when using the client-secret authentication method. The client ID to use when authenticating to Azure. |
| secret | Required when using the client-secret authentication method. The client secret to use when authenticating to Azure. |
| tenant-id | Required when using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |
Example:
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd

Base configuration object

| Parameter | Type | Description |
| --- | --- | --- |
| version | either string or integer | Input the configuration file version. |
| type | either local or remote | Input the configuration file type. The local option loads the full config from this file, while the remote option loads only the cognite section and the rest from extraction pipelines. Default value is local. |
| cognite | object | Describes which CDF project the extractor will load data into and how to connect to the project. |
| logger | object | Sets up logging to a console and files. This is an optional value. |
| extractor | object | Contains the common extractor configuration. |
| source | object | Insert the source configuration for data mirrored from Fabric. |
| destination | object | Insert the destination configuration for time series data mirrored from Fabric. |
| subscriptions | list | Insert the time series subscriptions configuration for time series mirrored to Fabric. |
| data-modeling | list | Insert the data modeling configuration for syncing a data model to Fabric. |
| event | object | Enter the event configuration for mirroring events to Fabric. |
| raw-tables | list | Enter the raw tables configuration to mirror the raw tables to Fabric. |

cognite

Global parameter. Describes which CDF project the extractor will load data into and how to connect to the project.
| Parameter | Type | Description |
| --- | --- | --- |
| project | string | Insert the CDF project name into which you want to ingest data. |
| idp-authentication | object | Insert the credentials for authenticating to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). |
| data-set | object | Enter a data set into which the extractor should write data. |
| extraction-pipeline | object | Enter the extraction pipeline for remote config and reporting statuses. |
| host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com. |
| timeout | integer | Enter the timeout on requests to CDF in seconds. Default value is 30. |
| external-id-prefix | string | Enter the external ID prefix to identify the documents in CDF. Leave empty for no prefix. |
| connection | object | This parameter configures the network connection details. |
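
Putting these parameters together, a minimal cognite section might look like the following sketch; the project name, data set external ID, and prefix are placeholders:

```yaml
cognite:
  project: my-cdf-project
  host: https://api.cognitedata.com
  timeout: 30
  external-id-prefix: "fabric:"
  data-set:
    external-id: my-data-set
```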

idp-authentication

Part of cognite configuration. Insert the credentials for authenticating to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory).
| Parameter | Type | Description |
| --- | --- | --- |
| authority | string | Insert the authority, combined with tenant, to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/. |
| client-id | string | Required. Enter the service principal client ID from the IdP. |
| tenant | string | Enter the Entra ID tenant ID. Do not use in combination with the token-url parameter. |
| token-url | string | Insert the URL to fetch tokens from. Do not use in combination with the tenant parameter. |
| secret | string | Enter the service principal client secret from the IdP. |
| resource | string | Input the resource parameter to include in token requests. |
| audience | string | Input the audience parameter to include in token requests. |
| scopes | list | Enter the list of scopes requested for the token. |
| min-ttl | integer | Insert the minimum time in seconds for a token to be valid. If the cached token expires in less than min-ttl seconds, the system will refresh the token, even if it's still valid. Default value is 30. |
| certificate | object | Authenticate with a client certificate. |
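
As a sketch, client-credentials authentication against Microsoft Entra ID could be configured like this; the tenant ID, client ID, and scope are placeholders, and the secret is read from an environment variable using implicit substitution:

```yaml
cognite:
  project: my-cdf-project
  idp-authentication:
    tenant: 00000000-0000-0000-0000-000000000000
    client-id: my-client-id
    secret: ${IDP_CLIENT_SECRET}
    scopes:
      - https://api.cognitedata.com/.default
    min-ttl: 30
```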

scopes

Part of idp-authentication configuration. Enter the list of scopes requested for the token. Each element of this list should be a string.

certificate

Part of idp-authentication configuration. Authenticate with a client certificate.
| Parameter | Type | Description |
| --- | --- | --- |
| authority-url | string | Input the authentication authority URL. |
| path | string | Required. Enter the path to the .pem or .pfx certificate for authentication. |
| password | string | Enter the password for the key file if it is encrypted. |

data-set

Part of cognite configuration. Enter a data set into which the extractor should write data.
| Parameter | Type | Description |
| --- | --- | --- |
| id | integer | Input the resource internal ID. |
| external-id | string | Input the resource external ID. |

extraction-pipeline

Part of cognite configuration. Enter the extraction pipeline for remote config and reporting statuses.
| Parameter | Type | Description |
| --- | --- | --- |
| id | integer | Input the resource internal ID. |
| external-id | string | Input the resource external ID. |

connection

Part of cognite configuration. This parameter configures the network connection details.
| Parameter | Type | Description |
| --- | --- | --- |
| disable-gzip | boolean | Set to true to turn off gzipping of JSON bodies. |
| status-forcelist | string | Enter the HTTP status codes to retry. |
| max-retries | integer | Enter the maximum number of retries on a given request. Default value is 10. |
| max-retries-connect | integer | Enter the maximum number of retries on connection errors. Default value is 3. |
| max-retry-backoff | integer | Sets a maximum backoff after any request failure. The retry strategy employs exponential backoff. Default value is 30. |
| max-connection-pool-size | integer | Sets the maximum number of connections in the SDK's connection pool. Default value is 50. |
| disable-ssl | boolean | Set to true to turn off SSL verification. |
| proxies | object | Input the dictionary mapping from protocol to URL. |

proxies

Part of connection configuration. Input the dictionary mapping from protocol to URL.
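
For example, a connection section tuning retry behavior and routing HTTPS traffic through a proxy might look like this sketch; the proxy URL is a placeholder:

```yaml
cognite:
  connection:
    max-retries: 10
    max-retries-connect: 3
    max-retry-backoff: 30
    proxies:
      https: http://my-proxy.example.com:8080
```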

logger

Global parameter. Sets up logging to a console and files. This is an optional value.
| Parameter | Type | Description |
| --- | --- | --- |
| console | object | Include the console section to enable logging to standard output, such as a terminal window. |
| file | object | Include the file section to enable logging to a file. The files are rotated daily. |
| metrics | boolean | Enables metrics on the number of log messages recorded per logger and level. Configure metrics to retrieve the logs. |

console

Part of logger configuration. Include the console section to enable logging to standard output, such as a terminal window.
| Parameter | Type | Description |
| --- | --- | --- |
| level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for console logging. Valid options, from most to least verbose, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. Default value is INFO. |

file

Part of logger configuration. Include the file section to enable logging to a file. The files are rotated daily.
| Parameter | Type | Description |
| --- | --- | --- |
| level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for file logging. Valid options, from most to least verbose, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. Default value is INFO. |
| path | string | Required. Insert the path to the log file. |
| retention | integer | Specify the number of days to keep logs. Default value is 7. |
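
A logger section that prints informational messages to the console and keeps a week of debug-level log files might be sketched like this; the log file path is a placeholder:

```yaml
logger:
  console:
    level: INFO
  file:
    level: DEBUG
    path: logs/extractor.log
    retention: 7
```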

extractor

Global parameter. Contains the common extractor configuration.
| Parameter | Type | Description |
| --- | --- | --- |
| state-store | object | Include the state store section to save extraction states between runs. Use a state store if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time. |
| subscription-batch-size | integer | Input the batch size for time series subscriptions. Default value is 10000. |
| ingest-batch-size | integer | Input the batch size for time series ingestion. Default value is 100000. |
| fabric-ingest-batch-size | integer | Input the batch size for ingestion into Fabric. Default value is 1000. |
| poll-time | integer | Enter the time in seconds to wait between polling for new data. Default value is 3600. |

state-store

Part of extractor configuration. Include the state store section to save extraction states between runs. Use a state store if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time.
| Parameter | Type | Description |
| --- | --- | --- |
| raw | object | Stores the extraction state in a table in CDF RAW. |
| local | object | Stores the extraction state in a JSON file on the local machine. |

raw

Part of state-store configuration. Stores the extraction state in a table in CDF RAW.
| Parameter | Type | Description |
| --- | --- | --- |
| database | string | Required. Enter the database name in CDF RAW. |
| table | string | Required. Enter the table name in CDF RAW. |
| upload-interval | integer | Enter the interval in seconds between each upload to CDF RAW. Default value is 30. |

local

Part of state-store configuration. Stores the extraction state in a JSON file on the local machine.
| Parameter | Type | Description |
| --- | --- | --- |
| path | string | Required. Insert the file path to a JSON file. |
| save-interval | integer | Enter the interval in seconds between each save. Default value is 30. |
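
For example, an extractor section that polls hourly and persists state to a table in CDF RAW between runs might look like this sketch; the database and table names are placeholders:

```yaml
extractor:
  poll-time: 3600
  state-store:
    raw:
      database: extractor-state
      table: fabric-extractor
      upload-interval: 30
```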

source

Global parameter. Insert the source configuration for data mirrored from Fabric.
| Parameter | Type | Description |
| --- | --- | --- |
| abfss-prefix | string | Input the ABFSS prefix for the data lake. |
| data-set-id | string | Input the data set ID. |
| event-path | string | Enter the folder, combined with the ABFSS prefix, holding the event data. |
| event-path-incremental-field | string | Input the field for incremental loading. |
| raw-time-series-path | string | Enter the folder, combined with the ABFSS prefix, holding the raw time series data. |
| read-batch-size | integer | Input the batch size for reading data from Fabric. |
| file-path | string | Input the file path for the file data. |
| raw-tables | list | Input the list of raw tables to be ingested. |
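
As an illustration, a source section might be sketched as follows; the ABFSS prefix, data set ID, and folder names are placeholders for your own lakehouse layout:

```yaml
source:
  abfss-prefix: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse
  data-set-id: my-data-set
  event-path: Tables/events
  event-path-incremental-field: lastUpdatedTime
  raw-time-series-path: Tables/raw-timeseries
  read-batch-size: 1000
```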

raw-tables

Part of source configuration. Input the list of raw tables to be ingested. Each element of this list should be a CDF RAW table configuration.
| Parameter | Type | Description |
| --- | --- | --- |
| table-name | string | Enter the name of the RAW table in CDF to store rows. |
| db-name | string | Enter the database name in CDF to store the rows. |
| raw-path | string | Input the subpath in the lakehouse to read rows from. |
| incremental-field | string | Input the field for incremental loading. This value is normally a timestamp. |
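
For example, one raw table entry under source could be sketched like this; the table, database, path, and field names are placeholders:

```yaml
source:
  raw-tables:
    - table-name: assets
      db-name: my-database
      raw-path: Tables/assets
      incremental-field: lastUpdatedTime
```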

destination

Global parameter. Insert the destination configuration for time series data mirrored from Fabric.
| Parameter | Type | Description |
| --- | --- | --- |
| time-series-prefix | string | Enter the prefix to add to CDF time series external IDs created from Fabric. |

subscriptions

Global parameter. Insert the time series subscriptions configuration for time series mirrored to Fabric. Each element of this list should be a CDF time series subscription sync configuration.
| Parameter | Type | Description |
| --- | --- | --- |
| external-id | string | Input the external ID of the time series subscription. |
| partitions | list | Enter the list of partitions to be ingested. |
| lakehouse-abfss-path-dps | string | Input the ABFSS path to the data points. |
| lakehouse-abfss-path-ts | string | Input the ABFSS path to the time series. |
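
For example, a subscriptions entry might be sketched as follows; the subscription external ID and ABFSS paths are placeholders:

```yaml
subscriptions:
  - external-id: my-subscription
    partitions:
      - 0
      - 1
    lakehouse-abfss-path-dps: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/datapoints
    lakehouse-abfss-path-ts: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/timeseries
```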

partitions

Part of subscriptions configuration. Enter the list of partitions to be ingested. Each element of this list should be an integer.

data-modeling

Global parameter. Insert the data modeling configuration for syncing a data model to Fabric. Each element of this list should be a data modeling sync configuration.
| Parameter | Type | Description |
| --- | --- | --- |
| space | string | Enter the data modeling space name to synchronize to Fabric. |
| lakehouse-abfss-prefix | string | Enter the full ABFSS prefix for a folder in the lakehouse. |
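
For example, a data-modeling entry might look like this sketch; the space name and ABFSS prefix are placeholders:

```yaml
data-modeling:
  - space: my-space
    lakehouse-abfss-prefix: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables
```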

event

Global parameter. Enter the event configuration for mirroring events to Fabric.
| Parameter | Type | Description |
| --- | --- | --- |
| lakehouse-abfss-path-events | string | Input the path to the table in the lakehouse to store CDF events. |
| batch-size | integer | Input the number of events to read in a single batch from CDF. |
| dataset_external_id | string | Input the external ID of the data set from which to pull events in CDF. This parameter is optional. |
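
For example, an event section might be sketched like this; the ABFSS path and data set external ID are placeholders:

```yaml
event:
  lakehouse-abfss-path-events: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/events
  batch-size: 1000
  dataset_external_id: my-data-set
```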

raw-tables

Global parameter. Enter the raw tables configuration to mirror the raw tables to Fabric. Each element of this list should be a CDF RAW table configuration to be synced to Fabric.
| Parameter | Type | Description |
| --- | --- | --- |
| table-name | string | Enter the name of the RAW table in CDF to sync to Fabric. |
| db-name | string | Enter the database name in CDF to sync to Fabric. |
| lakehouse-abfss-path-raw | string | Enter the full ABFSS path of the table to store RAW rows into. |
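
For example, a top-level raw-tables entry might be sketched as follows; the table name, database name, and ABFSS path are placeholders:

```yaml
raw-tables:
  - table-name: assets
    db-name: my-database
    lakehouse-abfss-path-raw: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/assets
```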