Using values from environment variables
The configuration file allows substitutions with environment variables. For example, `${COGNITE_CLIENT_SECRET}` loads the value of the COGNITE_CLIENT_SECRET environment variable into the cognite/secret parameter. You can also do string interpolation with environment variables, embedding a variable inside a longer string.
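A sketch of both forms (the CDF_CLUSTER variable and host value are illustrative, not defaults of this extractor):

```yaml
cognite:
  # Plain substitution: the whole value comes from the environment
  secret: ${COGNITE_CLIENT_SECRET}
  # String interpolation: the variable is embedded in a longer string
  host: https://${CDF_CLUSTER}.cognitedata.com
```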
Implicit substitutions only work for unquoted value strings. For quoted strings, use the !env tag to activate environment substitution, for example `password: !env '${MY_SECRET}'` (the parameter and variable names are illustrative).

Using values from Azure Key Vault
The extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:
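A minimal sketch, assuming a section with a password parameter (the secret name my-secret-name is the one from the example above):

```yaml
password: !keyvault my-secret-name
```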
To authenticate to Azure Key Vault, include an azure-keyvault section in your configuration, with the following parameters:
| Parameter | Description |
|---|---|
keyvault-name | Name of Key Vault to load secrets from |
authentication-method | How to authenticate to Azure. Either default or client-secret. For default, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For client-secret, the extractor will authenticate with a configured client ID/secret pair. |
client-id | Required for using the client-secret authentication method. The client ID to use when authenticating to Azure. |
secret | Required for using the client-secret authentication method. The client secret to use when authenticating to Azure. |
tenant-id | Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |
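For the client-secret method, the section could look like this sketch (the Key Vault name and environment variable names are placeholders):

```yaml
azure-keyvault:
  keyvault-name: my-keyvault
  authentication-method: client-secret
  client-id: ${KEYVAULT_CLIENT_ID}
  secret: ${KEYVAULT_CLIENT_SECRET}
  tenant-id: ${KEYVAULT_TENANT_ID}
```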
Base configuration object
| Parameter | Type | Description |
|---|---|---|
version | either string or integer | Input the configuration file version. |
type | either local or remote | Input the configuration file type. The local option loads the full config from this file, while the remote option loads only the cognite section and the rest from extraction pipelines. Default value is local. |
cognite | object | Describes which CDF project the extractor will load data into and how to connect to the project. |
logger | object | Sets up logging to a console and files. This is an optional value. |
extractor | object | Contains the common extractor configuration. |
source | object | Insert the source configuration for data mirrored from Fabric. |
destination | object | Insert the destination configuration for time series data mirrored from Fabric. |
subscriptions | list | Insert the time series subscriptions configuration for time series mirrored to Fabric. |
data-modeling | list | Insert the data modeling configuration for syncing a data model to Fabric. |
event | object | Enter the event configuration for mirroring events to Fabric. |
raw-tables | list | Enter the raw tables configuration to mirror the raw tables to Fabric. |
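As a sketch, a local configuration file has this top-level shape (section bodies are elided here; each is described in its own section below):

```yaml
version: 1
type: local
cognite:
  # project and connection settings, see the cognite section
logger:
  # optional logging setup, see the logger section
extractor:
  # common extractor settings, see the extractor section
source:
  # Fabric source settings, see the source section
```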
cognite
Global parameter.
Describes which CDF project the extractor will load data into and how to connect to the project.
| Parameter | Type | Description |
|---|---|---|
project | string | Insert the CDF project name into which you want to ingest data. |
idp-authentication | object | Insert the credentials for authenticating to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). |
data-set | object | Enter a data set into which the extractor should write data. |
extraction-pipeline | object | Enter the extraction pipeline for remote config and reporting statuses. |
host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com. |
timeout | integer | Enter the timeout on requests to CDF in seconds. Default value is 30. |
external-id-prefix | string | Enter the external ID prefix to identify the documents in CDF. Leave empty for no prefix. |
connection | object | This parameter configures the network connection details. |
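A sketch of the cognite section (the project name and external ID prefix are placeholders):

```yaml
cognite:
  project: my-project
  host: https://api.cognitedata.com
  timeout: 30
  external-id-prefix: fabric-
  idp-authentication:
    # see the idp-authentication section
```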
idp-authentication
Part of cognite configuration.
Insert the credentials for authenticating to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory).
| Parameter | Type | Description |
|---|---|---|
authority | string | Insert the authority URL, combined with tenant, to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/. |
client-id | string | Required. Enter the service principal client ID from the IdP. |
tenant | string | Enter the Entra ID tenant ID. Do not use in combination with the token-url parameter. |
token-url | string | Insert the URL to fetch tokens. Do not use in combination with the tenant parameter. |
secret | string | Enter the service principal client secret from the IdP. |
resource | string | Input the resource parameter to pass along with token requests. |
audience | string | Input the audience parameter to pass along with token requests. |
scopes | list | Enter the list of scopes requested for the token. |
min-ttl | integer | Insert the minimum time in seconds for a token to be valid. If the cached token expires in less than min-ttl seconds, the system will refresh the token, even if it’s still valid. Default value is 30. |
certificate | object | Authenticate with a client certificate. |
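A sketch using a tenant ID (remember that tenant and token-url are mutually exclusive; all values shown are placeholders):

```yaml
cognite:
  idp-authentication:
    client-id: ${IDP_CLIENT_ID}
    secret: ${IDP_CLIENT_SECRET}
    tenant: ${IDP_TENANT_ID}
    scopes:
      - https://api.cognitedata.com/.default
    min-ttl: 30
```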
scopes
Part of idp-authentication configuration.
Enter the list of scopes requested for the token.
Each element of this list should be a string.
certificate
Part of idp-authentication configuration.
Authenticate with a client certificate.
| Parameter | Type | Description |
|---|---|---|
authority-url | string | Input the authentication authority URL. |
path | string | Required. Enter the path to the .pem or .pfx certificate for authentication. |
password | string | Enter the password for the key file if it is encrypted. |
data-set
Part of cognite configuration.
Enter a data set into which the extractor should write data.
| Parameter | Type | Description |
|---|---|---|
id | integer | Input the resource internal ID. |
external-id | string | Input the resource external ID. |
extraction-pipeline
Part of cognite configuration.
Enter the extraction pipeline for remote config and reporting statuses.
| Parameter | Type | Description |
|---|---|---|
id | integer | Input the resource internal ID. |
external-id | string | Input the resource external ID. |
connection
Part of cognite configuration.
This parameter configures the network connection details.
| Parameter | Type | Description |
|---|---|---|
disable-gzip | boolean | Set to true to turn off gzipping of JSON bodies. |
status-forcelist | string | Enter the HTTP status codes to retry. |
max-retries | integer | Enter the maximum number of retries on requests to CDF. Default value is 10. |
max-retries-connect | integer | Enter the maximum number of retries on connection errors. Default value is 3. |
max-retry-backoff | integer | Sets a maximum backoff after any request failure. The retry strategy employs exponential backoff. Default value is 30. |
max-connection-pool-size | integer | Sets the maximum number of connections in the SDK’s connection pool. Default value is 50. |
disable-ssl | boolean | Set to true to turn off SSL verification. |
proxies | object | Input the dictionary mapping from protocol to URL. |
proxies
Part of connection configuration.
Input the dictionary mapping from protocol to URL.
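For example, a sketch routing both HTTP and HTTPS traffic through a single proxy (the proxy URL is a placeholder):

```yaml
cognite:
  connection:
    proxies:
      http: http://proxy.example.com:8080
      https: http://proxy.example.com:8080
```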
logger
Global parameter.
Sets up logging to a console and files. This is an optional value.
| Parameter | Type | Description |
|---|---|---|
console | object | Include the console section to enable logging to standard output, such as a terminal window. |
file | object | Include the file section to enable logging to a file. The files are rotated daily. |
metrics | boolean | Enables metrics on the number of log messages recorded per logger and level. Configure metrics to retrieve the logs. |
console
Part of logger configuration.
Include the console section to enable logging to standard output, such as a terminal window.
| Parameter | Type | Description |
|---|---|---|
level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for console logging. DEBUG is the most verbose level and CRITICAL the least. Default value is INFO. |
file
Part of logger configuration.
Include the file section to enable logging to a file. The files are rotated daily.
| Parameter | Type | Description |
|---|---|---|
level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for file logging. DEBUG is the most verbose level and CRITICAL the least. Default value is INFO. |
path | string | Required. Insert the path to the log file. |
retention | integer | Specify the number of days to keep logs. Default value is 7. |
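A sketch combining both sinks (the log file path is a placeholder):

```yaml
logger:
  console:
    level: INFO
  file:
    level: DEBUG
    path: logs/extractor.log
    retention: 7
```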
extractor
Global parameter.
Contains the common extractor configuration.
| Parameter | Type | Description |
|---|---|---|
state-store | object | Include the state store section to save extraction states between runs. Use a state store if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time. |
subscription-batch-size | integer | Input the batch size for time series subscriptions. Default value is 10000. |
ingest-batch-size | integer | Input the batch size for time series ingestion. Default value is 100000. |
fabric-ingest-batch-size | integer | Input the batch size for ingestion into Fabric. Default value is 1000. |
poll-time | integer | Enter the time in seconds to wait between polling for new data. Default value is 3600. |
state-store
Part of extractor configuration.
Include the state store section to save extraction states between runs. Use a state store if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time.
| Parameter | Type | Description |
|---|---|---|
raw | object | Stores the extraction state in a table in CDF RAW. |
local | object | Stores the extraction state in a JSON file on the local machine. |
raw
Part of state-store configuration.
Stores the extraction state in a table in CDF RAW.
| Parameter | Type | Description |
|---|---|---|
database | string | Required. Enter the database name in CDF RAW. |
table | string | Required. Enter the table name in CDF RAW. |
upload-interval | integer | Enter the interval in seconds between each upload to CDF RAW. Default value is 30. |
local
Part of state-store configuration.
Stores the extraction state in a JSON file on the local machine.
| Parameter | Type | Description |
|---|---|---|
path | string | Required. Insert the file path to a JSON file. |
save-interval | integer | Enter the interval in seconds between each save. Default value is 30. |
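Since only one state store can be configured at a time, here is a sketch using the local store (the file path is a placeholder):

```yaml
extractor:
  state-store:
    local:
      path: states.json
      save-interval: 30
```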
source
Global parameter.
Insert the source configuration for data mirrored from Fabric.
| Parameter | Type | Description |
|---|---|---|
abfss-prefix | string | Input the ABFSS prefix for the data lake. |
data-set-id | string | Input the data set ID. |
event-path | string | Enter the folder, combined with the ABFSS prefix, containing the event data. |
event-path-incremental-field | string | Input the field used for incremental loading of event data. |
raw-time-series-path | string | Enter the folder, combined with the ABFSS prefix, containing the raw time series data. |
read-batch-size | integer | Input the batch size for reading data from Fabric. |
file-path | string | Input the file path for the file’s data. |
raw-tables | list | Input the list of raw tables to be ingested. |
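A sketch of a source section (the ABFSS prefix, folder names, and incremental field are placeholders for your own lakehouse layout):

```yaml
source:
  abfss-prefix: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse
  event-path: Files/events
  event-path-incremental-field: lastUpdatedTime
  raw-time-series-path: Files/timeseries
  read-batch-size: 1000
```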
raw-tables
Part of source configuration.
Input the list of raw tables to be ingested.
Each element of this list should be a CDF RAW configuration.
| Parameter | Type | Description |
|---|---|---|
table-name | string | Enter the name of the RAW table in CDF to store rows. |
db-name | string | Enter the database name in CDF to store the rows. |
raw-path | string | Input the subpath in the lakehouse to read rows. |
incremental-field | string | Input the field for incremental loading. This value is normally a timestamp. |
destination
Global parameter.
Insert the destination configuration for time series data mirrored from Fabric.
| Parameter | Type | Description |
|---|---|---|
time-series-prefix | string | Enter the prefix to add to CDF time series external IDs created from Fabric. |
subscriptions
Global parameter.
Insert the time series subscriptions configuration for time series mirrored to Fabric.
Each element of this list should be a CDF time series subscription configuration.
| Parameter | Type | Description |
|---|---|---|
external-id | string | Input the external ID of the time series subscription. |
partitions | list | Enter the list of partitions to be ingested. |
lakehouse-abfss-path-dps | string | Input the ABFSS path to the data points. |
lakehouse-abfss-path-ts | string | Input the ABFSS path to the time series. |
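A sketch of a single subscription entry (the external ID and ABFSS paths are placeholders):

```yaml
subscriptions:
  - external-id: my-subscription
    partitions:
      - 0
      - 1
    lakehouse-abfss-path-dps: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/datapoints
    lakehouse-abfss-path-ts: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/timeseries
```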
partitions
Part of subscriptions configuration.
Enter the list of partitions to be ingested.
Each element of this list should be an integer.
data-modeling
Global parameter.
Insert the data modeling configuration for syncing a data model to Fabric.
Each element of this list should be a data modeling sync configuration.
| Parameter | Type | Description |
|---|---|---|
space | string | Enter the data modeling space name to synchronize to Fabric. |
lakehouse-abfss-prefix | string | Enter the full ABFSS prefix for a folder in the lakehouse. |
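A sketch of a single data modeling entry (the space name and ABFSS prefix are placeholders):

```yaml
data-modeling:
  - space: my-space
    lakehouse-abfss-prefix: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables
```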
event
Global parameter.
Enter the event configuration for mirroring events to Fabric.
| Parameter | Type | Description |
|---|---|---|
lakehouse-abfss-path-events | string | Input the path to the table in the lakehouse to store CDF events. |
batch-size | integer | Input the number of events to read in a single batch from CDF. |
dataset_external_id | string | Input the external ID of the data set from which to pull events (optional). |
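A sketch of an event section (the ABFSS path and data set external ID are placeholders):

```yaml
event:
  lakehouse-abfss-path-events: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/events
  batch-size: 1000
  dataset_external_id: my-data-set
```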
raw-tables
Global parameter.
Enter the raw tables configuration to mirror the raw tables to Fabric.
Each element of this list should be a CDF RAW table configuration to be synced to Fabric.
| Parameter | Type | Description |
|---|---|---|
table-name | string | Enter the name of the RAW table in CDF to sync to Fabric. |
db-name | string | Enter the database name in CDF to sync to Fabric. |
lakehouse-abfss-path-raw | string | Enter the full ABFSS path of the table to store RAW rows in. |
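A sketch of a single entry (the table name, database name, and ABFSS path are placeholders):

```yaml
raw-tables:
  - table-name: my-table
    db-name: my-db
    lakehouse-abfss-path-raw: abfss://my-workspace@onelake.dfs.fabric.microsoft.com/my-lakehouse.Lakehouse/Tables/my-table
```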