
# Configuration settings

> Configuration reference for the Cognite File extractor with all available parameters and options.

To configure the File extractor, you must create a configuration file. The file must be in [YAML](https://yaml.org) format.

<Tip>
  You can set up [extraction pipelines](/cdf/integration/guides/interfaces/configure_integrations) to use versioned extractor configuration files stored in the cloud.
</Tip>

## Using values from environment variables

The configuration file allows substitutions with environment variables. For example:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
cognite:
  secret: ${COGNITE_CLIENT_SECRET}
```

will load the value from the `COGNITE_CLIENT_SECRET` environment variable into the `cognite/secret` parameter. You can also do string interpolation with environment variables, for example:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
url: http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}
```

<Info>
  Implicit substitutions only work for unquoted value strings. For quoted strings, use the `!env` tag to activate environment substitution:

  ```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
  url: !env 'http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}'
  ```
</Info>

## Using values from Azure Key Vault

The File extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the `!keyvault` tag followed by the name of the secret you want to load. For example, to load the value of the `my-secret-name` secret in Key Vault into a `password` parameter, configure your extractor like this:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
password: !keyvault my-secret-name
```

To use Key Vault, you also need to include the `azure-keyvault` section in your configuration, with the following parameters:

| Parameter               | Description                                                                                                                                                                                                                                                                                                                                                                         |
| ----------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `keyvault-name`         | Name of Key Vault to load secrets from                                                                                                                                                                                                                                                                                                                                              |
| `authentication-method` | How to authenticate to Azure. Either `default` or `client-secret`. For `default`, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the [Azure CLI](https://learn.microsoft.com/en-us/cli/azure/install-azure-cli). For `client-secret`, the extractor will authenticate with a configured client ID/secret pair. |
| `client-id`             | Required for using the `client-secret` authentication method. The client ID to use when authenticating to Azure.                                                                                                                                                                                                                                                                    |
| `secret`                | Required for using the `client-secret` authentication method. The client secret to use when authenticating to Azure.                                                                                                                                                                                                                                                                |
| `tenant-id`             | Required for using the `client-secret` authentication method. The tenant ID of the Key Vault in Azure.                                                                                                                                                                                                                                                                              |

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd
```
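The `azure-keyvault` section and the `!keyvault` tag work together: the section says how to reach the vault, and the tag says which secret to read. As a sketch, assuming the `default` authentication method and a hypothetical secret named `cdf-client-secret` in the vault, the CDF client secret could be loaded like this:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
azure-keyvault:
  keyvault-name: my-keyvault-name
  # `default` relies on Azure logins already available to the user running the extractor,
  # for example from the Azure CLI
  authentication-method: default

cognite:
  idp-authentication:
    # Hypothetical secret name; the value is loaded from the Key Vault configured above
    secret: !keyvault cdf-client-secret
    # Other idp-authentication parameters (client-id, tenant, scopes) omitted for brevity
```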

The base configuration object contains the following top-level parameters:

| Parameter   | Type                       | Description                                                                                                                                                                                                                                                                |
| ----------- | -------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `version`   | either string or integer   | Configuration file version                                                                                                                                                                                                                                                 |
| `type`      | either `local` or `remote` | Configuration file type. Either `local`, meaning the full config is loaded from this file, or `remote`, which means that only the `cognite` section is loaded from this file, and the rest is loaded from extraction pipelines. Default value is `local`.                  |
| `cognite`   | object                     | The cognite section describes which CDF project the extractor will load data into and how to connect to the project.                                                                                                                                                       |
| `logger`    | object                     | The optional `logger` section sets up logging to a console and files.                                                                                                                                                                                                      |
| `files`     | object                     | Configure files to be extracted to CDF.                                                                                                                                                                                                                                    |
| `extractor` | object                     | General configuration for the file extractor.                                                                                                                                                                                                                              |
| `metrics`   | object                     | The `metrics` section describes where to send metrics on extractor performance for remote monitoring of the extractor. We recommend sending metrics to a [Prometheus pushgateway](https://prometheus.io), but you can also send metrics as time series in the CDF project. |
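Putting the top-level sections together, a minimal local configuration might look like the following sketch. All values are placeholders read from hypothetical environment variables, and the `version` value in particular is an assumption; check which version your extractor release expects.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# Hypothetical version value
version: 1
# Load the full configuration from this file
type: local

cognite:
  project: my-project
  host: https://api.cognitedata.com
  idp-authentication:
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    tenant: ${COGNITE_TENANT_ID}
    scopes:
      - https://api.cognitedata.com/.default

logger:
  console:
    level: INFO

files:
  file-provider:
    type: local
    path: /some/local/path
```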

## `cognite`

Global parameter.

The cognite section describes which CDF project the extractor will load data into and how to connect to the project.

| Parameter             | Type    | Description                                                                                                                                                                            |
| --------------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `project`             | string  | Insert the CDF project name.                                                                                                                                                           |
| `idp-authentication`  | object  | The `idp-authentication` section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). |
| `data-set`            | object  | Enter the data set the extractor should write data into.                                                                                                                               |
| `extraction-pipeline` | object  | Enter the extraction pipeline used for remote configuration and status reporting.                                                                                                      |
| `host`                | string  | Insert the base URL of the CDF project. Default value is `https://api.cognitedata.com`.                                                                                                |
| `timeout`             | integer | Enter the timeout on requests to CDF, in seconds. Default value is `30`.                                                                                                               |
| `external-id-prefix`  | string  | Prefix added to the external IDs of CDF resources the extractor creates.                                                                                                              |
| `connection`          | object  | Configure network connection details.                                                                                                                                                  |

### `idp-authentication`

Part of [`cognite`](#cognite) configuration.

The `idp-authentication` section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory).

| Parameter     | Type    | Description                                                                                                                                                                                  |
| ------------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `authority`   | string  | Insert the authority together with `tenant` to authenticate against Azure tenants. Default value is `https://login.microsoftonline.com/`.                                                    |
| `client-id`   | string  | **Required.** Enter the service principal client ID from the IdP.                                                                                                                            |
| `tenant`      | string  | Enter the Azure tenant.                                                                                                                                                                      |
| `token-url`   | string  | Insert the URL to fetch tokens from.                                                                                                                                                         |
| `secret`      | string  | Enter the service principal client secret from the IdP.                                                                                                                                      |
| `resource`    | string  | Resource parameter passed along with token requests.                                                                                                                                         |
| `audience`    | string  | Audience parameter passed along with token requests.                                                                                                                                         |
| `scopes`      | list    | Enter a list of scopes requested for the token.                                                                                                                                              |
| `min-ttl`     | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than `min-ttl` seconds, it will be refreshed even if it is still valid. Default value is `30`. |
| `certificate` | object  | Authenticate with a client certificate.                                                                                                                                                      |
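With Microsoft Entra ID, `tenant` is combined with `authority` to find the token endpoint. For another IdP, a sketch using `token-url` directly could look like the following; the token URL and scope value are hypothetical and depend on what your IdP expects for CDF.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
cognite:
  project: my-project
  host: https://api.cognitedata.com
  idp-authentication:
    # Hypothetical token endpoint, used instead of `tenant`
    token-url: https://idp.example.com/oauth2/token
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      # Hypothetical scope; use the value your IdP requires for CDF access
      - my-cdf-scope
    # Refresh the cached token when it expires in less than 60 seconds
    min-ttl: 60
```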

#### `scopes`

Part of [`idp-authentication`](#cognite.idp-authentication) configuration.

Enter a list of scopes requested for the token.

Each element of this list should be a string.

#### `certificate`

Part of [`idp-authentication`](#cognite.idp-authentication) configuration.

Authenticate with a client certificate.

| Parameter       | Type   | Description                                                                                |
| --------------- | ------ | ------------------------------------------------------------------------------------------ |
| `authority-url` | string | Authentication authority URL.                                                              |
| `path`          | string | **Required.** Enter the path to the `.pem` or `.pfx` certificate to use for authentication. |
| `password`      | string | Enter the password for the key file, if it is encrypted.                                   |
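As a sketch, certificate authentication might replace `secret` with a `certificate` block as below. The certificate path and the `CERT_PASSWORD` environment variable are hypothetical, and this assumes `secret` can be left out when a certificate is configured.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
cognite:
  idp-authentication:
    client-id: ${COGNITE_CLIENT_ID}
    tenant: ${COGNITE_TENANT_ID}
    scopes:
      - https://api.cognitedata.com/.default
    certificate:
      # Hypothetical path to a .pem (or .pfx) certificate
      path: /etc/file-extractor/cdf-cert.pem
      # Only needed if the key file is encrypted
      password: ${CERT_PASSWORD}
```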

### `data-set`

Part of [`cognite`](#cognite) configuration.

Enter the data set the extractor should write data into.

| Parameter     | Type    | Description          |
| ------------- | ------- | -------------------- |
| `id`          | integer | Resource internal ID |
| `external-id` | string  | Resource external ID |

### `extraction-pipeline`

Part of [`cognite`](#cognite) configuration.

Enter the extraction pipeline used for remote configuration and status reporting.

| Parameter     | Type    | Description          |
| ------------- | ------- | -------------------- |
| `id`          | integer | Resource internal ID |
| `external-id` | string  | Resource external ID |
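With `type: remote`, only the `cognite` section is read from the local file, and the rest of the configuration is loaded from the extraction pipeline. A sketch of such a file, with hypothetical external IDs and the `idp-authentication` block left out for brevity:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
# Only the cognite section below is read from this file
type: remote

cognite:
  project: my-project
  host: https://api.cognitedata.com
  # idp-authentication omitted for brevity
  # Hypothetical pipeline that holds the rest of the configuration
  extraction-pipeline:
    external-id: my-file-extractor-pipeline
  # Hypothetical target data set, referenced by external ID instead of internal ID
  data-set:
    external-id: my-data-set
```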

### `connection`

Part of [`cognite`](#cognite) configuration.

Configure network connection details.

| Parameter                  | Type    | Description                                                                                                                                      |
| -------------------------- | ------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `disable-gzip`             | boolean | Whether to disable gzip compression of JSON request bodies.                                                                                      |
| `status-forcelist`         | string  | HTTP status codes to retry. Defaults to `429`, `502`, `503`, and `504`.                                                                          |
| `max-retries`              | integer | Maximum number of retries for a given HTTP request. Default value is `10`.                                                                       |
| `max-retries-connect`      | integer | Maximum number of retries on connection errors. Default value is `3`.                                                                            |
| `max-retry-backoff`        | integer | The retry strategy uses exponential backoff. This parameter caps the backoff after any request failure. Default value is `30`.                  |
| `max-connection-pool-size` | integer | The maximum number of connections kept in the SDK's connection pool. Default value is `50`.                                                      |
| `disable-ssl`              | boolean | Whether to disable SSL verification.                                                                                                             |
| `proxies`                  | object  | Dictionary mapping from protocol to URL.                                                                                                         |

#### `proxies`

Part of [`connection`](#cognite.connection) configuration.

Dictionary mapping from protocol to URL.
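A sketch of a `connection` section that tunes retries and routes traffic through a proxy. The proxy host is hypothetical, and the `http`/`https` protocol keys are an assumption based on the common convention for proxy mappings.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
cognite:
  connection:
    # Retry a failed HTTP request up to 5 times (default 10)
    max-retries: 5
    # Raise the cap on exponential backoff between retries (default 30)
    max-retry-backoff: 60
    # Keep at most 20 pooled connections (default 50)
    max-connection-pool-size: 20
    # Protocol -> proxy URL; hypothetical proxy host
    proxies:
      http: http://proxy.example.com:3128
      https: http://proxy.example.com:3128
```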

## `logger`

Global parameter.

The optional `logger` section sets up logging to a console and files.

| Parameter | Type    | Description                                                                                                                   |
| --------- | ------- | ----------------------------------------------------------------------------------------------------------------------------- |
| `console` | object  | Include the console section to enable logging to standard output, such as a terminal window.                                  |
| `file`    | object  | Include the file section to enable logging to a file. The files are rotated daily.                                            |
| `metrics` | boolean | Enables metrics on the number of log messages recorded per logger and level. This requires `metrics` to be configured as well. |

### `console`

Part of [`logger`](#logger) configuration.

Include the console section to enable logging to standard output, such as a terminal window.

| Parameter | Type                                                     | Description                                                                                                                                                                      |
| --------- | -------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `level`   | either `DEBUG`, `INFO`, `WARNING`, `ERROR` or `CRITICAL` | Select the verbosity level for console logging. Valid options, in decreasing verbosity levels, are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. Default value is `INFO`. |

### `file`

Part of [`logger`](#logger) configuration.

Include the file section to enable logging to a file. The files are rotated daily.

| Parameter   | Type                                                     | Description                                                                                                                                                                   |
| ----------- | -------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `level`     | either `DEBUG`, `INFO`, `WARNING`, `ERROR` or `CRITICAL` | Select the verbosity level for file logging. Valid options, in decreasing verbosity levels, are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. Default value is `INFO`. |
| `path`      | string                                                   | **Required.** Insert the path to the log file.                                                                                                                                |
| `retention` | integer                                                  | Specify the number of days to keep logs for. Default value is `7`.                                                                                                            |
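A sketch of a `logger` section that logs to the console at `INFO` and writes more detailed logs to a daily-rotated file. The log path is hypothetical.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
logger:
  console:
    level: INFO
  file:
    # Write DEBUG and above to the log file
    level: DEBUG
    # Hypothetical log file location
    path: /var/log/file-extractor/extractor.log
    # Keep 14 days of logs (default 7)
    retention: 14
```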

## `files`

Global parameter.

Configure files to be extracted to CDF.

| Parameter             | Type                                                                                                                                                                 | Description                                                                                                                                                                                                  |
| --------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `file-provider`       | configuration for either Local Files, SharePoint Online, FTP/FTPS, SFTP, GCP Cloud Storage, Azure Blob Storage, Azure Data Lake Storage, Samba, AWS S3 or Documentum | Configure a file provider for where the files are extracted from.                                                                                                                                            |
| `extensions`          | list                                                                                                                                                                 | List of file extensions to include. If left out, all file extensions will be allowed.                                                                                                                        |
| `labels`              | list                                                                                                                                                                 | List of label external IDs to add to extracted files.                                                                                                                                                        |
| `security-categories` | list                                                                                                                                                                 | List of security category IDs to add to extracted files.                                                                                                                                                     |
| `max-file-size`       | either string or number                                                                                                                                              | Maximum file size of files to include. Set to -1 to allow any file size. Syntax is `N(KB\|MB\|GB\|TB\|KiB\|MiB\|GiB\|TiB)`. Note that the extractor supports files up to 1000GiB. Default value is `100GiB`. |
| `with-metadata`       | boolean                                                                                                                                                              | Add metadata extracted from the file source to files in CDF.                                                                                                                                                 |
| `directory-prefix`    | string                                                                                                                                                               | Prefix to add to all extracted file directories.                                                                                                                                                             |
| `metadata-to-raw`     | object                                                                                                                                                               | If this is configured, write metadata to a table in CDF RAW instead of files.                                                                                                                                |
| `data_model`          | object                                                                                                                                                               | When this is provided, all file metadata is uploaded to data models, which makes `metadata-to-raw` redundant.                                                                                                |
| `destination_mode`    | either `cdm` or `classic`                                                                                                                                            | Mode of the file extractor. Can be 'cdm' for data models or 'classic' for Files.                                                                                                                             |
| `source`              | object                                                                                                                                                               | Sets the `Source` metadata field for the related files. When data modelling is set, it updates the underlying CogniteSourceSystem with the corresponding source. This is an optional parameter.              |
| `filter`              | configuration for either And, Or, Not, Equals or In                                                                                                                  |                                                                                                                                                                                                              |
| `delete-behavior`     | object                                                                                                                                                               | How to handle files that are no longer at the source: `soft` marks them with metadata, `hard` deletes them from CDF.                                                                                         |
| `missing-as-deleted`  | boolean                                                                                                                                                              | When set to `true`, treat files missing from the current source listing as logically deleted. This only affects files tracked in the [state store](#state-store).                                            |

<Note>
  The `missing-as-deleted` and `delete-behavior` parameters require a configured [state store](#state-store) to track files between runs. Without a state store, the extractor will not delete or mark files as deleted.
</Note>
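A sketch of a `files` section that reads from a local folder. The paths and label external ID are hypothetical, and writing extensions with a leading dot is an assumption. As the note above says, `missing-as-deleted` only has an effect when a state store is configured.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
files:
  file-provider:
    type: local
    # Hypothetical source folder
    path: /data/documents
  # Only extract these extensions (leading-dot format is an assumption)
  extensions:
    - .pdf
    - .docx
  # Skip files larger than 500 MiB, using the N(unit) syntax
  max-file-size: 500MiB
  # Copy source metadata onto the files in CDF
  with-metadata: true
  # Hypothetical prefix added to every extracted directory
  directory-prefix: /plant-a
  # Hypothetical label external ID
  labels:
    - my-label-external-id
  # Requires a configured state store
  missing-as-deleted: true
```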

### `file-provider`

Part of [`files`](#files) configuration.

Configure a file provider for where the files are extracted from.

Either one of the following options:

* Local Files
* SharePoint Online
* FTP/FTPS
* SFTP
* GCP Cloud Storage
* Azure Blob Storage
* Azure Data Lake Storage
* Samba
* AWS S3
* Documentum

#### `local_files`

Part of [`file-provider`](#file-provider) configuration.

Read files from a local folder. This file provider will recursively traverse the given path and extract all discovered files.

**Examples:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: local
path: /some/local/path
```

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: local
path:
- /some/local/path
- /another/path
```

| Parameter                | Type                                                  | Description                                                                                                                                              |
| ------------------------ | ----------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                   | string                                                | Select the type of file provider. Set to `local` for local files.                                                                                        |
| `path`                   | configuration for either string or list               | Path, or list of paths, to the local folders to extract files from.                                                                                      |
| `use-relative-directory` | boolean                                               | Set the CDF metadata 'Directory' to the file's relative path. When set to `false`, the CDF directory is set to the full folder path. Default value is `True`. |
| `ignore_patterns`        | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                                                                                 |

##### `ignore_patterns`

Part of [`local_files`](#local_files) configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of [`ignore_patterns`](#ignore_patterns) configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |
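A sketch of a local file provider that extracts from two folders, keeps full folder paths as directories, and skips temporary files. It assumes `pattern` is a regular expression matched against the file path and that the `i` flag makes matching case-insensitive; neither is stated in the tables above.

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: local
path:
  - /some/local/path
  - /another/path
# Use the full folder path as the CDF directory instead of the relative path
use-relative-directory: false
# Ignore files ending in .tmp (assumes regular-expression syntax)
ignore_patterns:
  pattern: '\.tmp$'
  # Assumed to mean case-insensitive matching
  flags: i
```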

#### `sharepoint_online`

Part of [`file-provider`](#file-provider) configuration.

Read files from one or more SharePoint Online sites.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: sharepoint_online
client-id: ${SP_CLIENT_ID}
client-secret: ${SP_CLIENT_SECRET}
tenant-id: ${SP_AZURE_TENANT_ID}
paths:
- url: ${SP_EXTRACT_URL}
```

| Parameter                     | Type                                                  | Description                                                                                                                                                                                                                                     |
| ----------------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                        | string                                                | **Required.** Select the type of file provider. Set to `sharpeoint_online` for sharepoint online.                                                                                                                                               |
| `client-id`                   | string                                                | **Required.** Enter the App registration client ID.                                                                                                                                                                                             |
| `client-secret`               | string                                                | Enter the App registration secret.                                                                                                                                                                                                              |
| `certificate-path`            | string                                                | Enter the path to a certificate used for authentication. Either this, `client-secret`, or `certificate-data` must be specified.                                                                                                                 |
| `certificate-data`            | string                                                | Provide authentication certificate data directly.                                                                                                                                                                                               |
| `tenant-id`                   | string                                                | **Required.** Enter the Azure tenant containing the App registration.                                                                                                                                                                           |
| `paths`                       | list                                                  | **Required.** Enter a list of sharepoint base URLs to extract from.                                                                                                                                                                             |
| `resync-on-expired-delta-url` | boolean                                               | If set to `true`, when SharePoint returns a 410 Gone error because the delta URL has expired, the extractor performs a full re-synchronization of the document library. Default value is `False`.                                               |
| `datetime-format`             | string                                                | Format string for timestamp metadata. Default value is `%Y-%m-%dT%H:%M:%SZ`.                                                                                                                                                                    |
| `extract-columns`             | object                                                | Extract SharePoint columns as metadata. This is a map from column names in SharePoint to the names you want the extracted columns to have in the file metadata in CDF.<br /><br />**Example:**<br />`{'columnNameInSharepoint': 'metadataNameInCdf'}` |
| `restrict-to`                 | object                                                | Restrict extraction to files visible to a given SharePoint group or SiteUser.<br />Important: To use this option, the extractor **must** authenticate to SharePoint Online using a certificate, **not** a client secret.                  |
| `performance`                 | object                                                | Configuration for tuning the parallel calls made to SharePoint through the Microsoft Graph API.                                                                                                                                                 |
| `ignore_patterns`             | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                                                                                                                                                                        |

##### `paths`

Part of [`sharepoint_online`](#sharepoint_online) configuration.

Enter a list of SharePoint base URLs to extract from.

Each element of this list is an object that describes one SharePoint location to extract from.

| Parameter   | Type    | Description                                                                                                                                                          |
| ----------- | ------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`       | string  | **Required.** URL of the SharePoint location you want to extract from. This can be the URL of a site, a document library, or a file or folder inside the library. |
| `recursive` | boolean | Whether to traverse the subfolders of this path. Default value is `True`.                                                                                           |
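As an illustration, a `paths` list can combine a whole site with a single folder that is read without descending into its subfolders. The site and folder URLs below are hypothetical placeholders; only the `url` and `recursive` keys come from the table above.

```yaml
paths:
  # Hypothetical site URL: extract everything, including subfolders (recursive defaults to True)
  - url: https://contoso.sharepoint.com/sites/Engineering
  # Hypothetical folder URL: read only the files directly inside this folder
  - url: https://contoso.sharepoint.com/sites/Operations/Shared%20Documents/Reports
    recursive: false
```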

##### `extract-columns`

Part of [`sharepoint_online`](#sharepoint_online) configuration.

Extract SharePoint columns as metadata. This is a map from column names in SharePoint to the names you want the extracted columns to have in the file metadata in CDF.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
columnNameInSharepoint: metadataNameInCdf
```

| Parameter  | Type   | Description                    |
| ---------- | ------ | ------------------------------ |
| Any string | string | Name of metadata field in CDF. |

##### `restrict-to`

Part of [`sharepoint_online`](#sharepoint_online) configuration.

Restrict extraction to files visible to a given SharePoint group or SiteUser.
Important: To use this option, the extractor **must** authenticate to SharePoint Online using a certificate, **not** a client secret.

| Parameter  | Type   | Description                                                                                                                             |
| ---------- | ------ | --------------------------------------------------------------------------------------------------------------------------------------- |
| `group-id` | string | The ID of a SharePoint group.                                                                                                           |
| `username` | string | The login name of a SiteUser. For example, use this to restrict extraction to the built-in "all users" SiteUser.                       |
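A minimal sketch of a `sharepoint_online` provider that restricts extraction to a single group might look like the following. It authenticates with `certificate-path` instead of `client-secret`, as `restrict-to` requires. The certificate path and the `SP_GROUP_ID` environment variable are hypothetical placeholders, and the sketch assumes you set only one of `group-id` or `username`.

```yaml
type: sharepoint_online
client-id: ${SP_CLIENT_ID}
tenant-id: ${SP_AZURE_TENANT_ID}
# restrict-to requires certificate authentication, not client-secret
certificate-path: /path/to/certificate   # hypothetical path
paths:
  - url: ${SP_EXTRACT_URL}
restrict-to:
  # Hypothetical environment variable holding the SharePoint group ID
  group-id: ${SP_GROUP_ID}
```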

##### `performance`

Part of [`sharepoint_online`](#sharepoint_online) configuration.

Configuration for tuning the parallel calls made to SharePoint through the Microsoft Graph API.

| Parameter         | Type    | Description                                                                                                       |
| ----------------- | ------- | ----------------------------------------------------------------------------------------------------------------- |
| `workers`         | integer | Number of parallel workers used to read from SharePoint. Default value is `10`.                                   |
| `document-buffer` | integer | Number of document metadata instances to buffer before uploading the file contents to CDF. Default value is `60`. |
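For example, to raise the number of parallel workers and the document buffer above their defaults, you might nest a `performance` block in the `sharepoint_online` configuration as shown below. The values are illustrative only: suitable settings depend on your tenant and on Microsoft Graph API throttling, so treat them as a starting point to test rather than a recommendation.

```yaml
performance:
  workers: 20            # illustrative value; default is 10
  document-buffer: 120   # illustrative value; default is 60
```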

##### `ignore_patterns`

Part of [`sharepoint_online`](#sharepoint_online) configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of [`ignore_patterns`](#ignore_patterns) configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |
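The following sketch shows the pattern-with-flags form, used here to skip temporary files regardless of the case of their extension. It assumes that `pattern` is a regular expression matched against the file path and that the `i` flag enables case-insensitive matching (with `a` restricting matching to ASCII). This page does not confirm either assumption, so verify the behavior before relying on it.

```yaml
ignore_patterns:
  # Assumption: regular expression, so this matches paths ending in .tmp, .TMP, and so on
  pattern: '.*\.tmp$'
  flags: i
```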

#### `ftp/ftps`

Part of [`file-provider`](#file-provider) configuration.

Read files from an FTP server, optionally over FTPS (FTP with SSL/TLS).

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: ftp
host: ftp.myserver.com
username: username
password: ${FTP_PASSWORD}
```

| Parameter               | Type                                                  | Description                                                                                                 |
| ----------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `type`                  | string                                                | **Required.** Select the type of file provider. Set to `ftp` for FTP.                                       |
| `host`                  | string                                                | **Required.** Host name for the FTP server.<br /><br />**Example:**<br />`ftp.myserver.com`                 |
| `port`                  | integer                                               | FTP server port. Default value is `21`.                                                                     |
| `username`              | string                                                | **Required.** Username for logging in to the FTP server.                                                    |
| `password`              | string                                                | **Required.** Password for logging in to the FTP server.                                                    |
| `root-directory`        | string                                                | Root folder for extraction. Default value is `/`.                                                           |
| `recursive`             | boolean                                               | Whether to recursively traverse subfolders for files. Default value is `True`.                              |
| `use-ssl`               | boolean                                               | Whether to connect using FTPS (FTP with SSL/TLS). Using SSL is strongly recommended.                        |
| `certificate-file-path` | string                                                | Path to the certificate authority (CA) certificate for the FTP server. Useful if the server uses a self-signed certificate. |
| `ignore_patterns`       | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                                    |
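A sketch of an FTPS configuration that trusts a private certificate authority and limits extraction to one folder could look like this. The certificate path and folder are hypothetical placeholders.

```yaml
type: ftp
host: ftp.myserver.com
port: 21
username: username
password: ${FTP_PASSWORD}
use-ssl: true                                        # connect over FTPS
certificate-file-path: /path/to/ca-certificate.pem   # hypothetical CA certificate path
root-directory: /exports                             # hypothetical folder
recursive: true
```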

##### `ignore_patterns`

Part of [`ftp/ftps`](#ftpftps) configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `sftp`

Part of [`file-provider`](#file-provider) configuration.

Read files from an SFTP (SSH File Transfer Protocol) server.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: sftp
host: ftp.myserver.com
username: username
password: ${FTP_PASSWORD}
```

| Parameter                                                      | Type                                                  | Description                                                                                     |
| -------------------------------------------------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------- |
| `type`                                                         | string                                                | **Required.** Select the type of file provider. Set to `sftp` for SFTP.                         |
| `host`                                                         | string                                                | **Required.** Host name for the SSH server.<br /><br />**Example:**<br />`ftp.myserver.com`     |
| `port`                                                         | integer                                               | SSH server port. Default value is `22`.                                                         |
| `username`                                                     | string                                                | **Required.** Username for logging in to the SSH server.                                        |
| `password`                                                     | string                                                | Password for logging in to the SSH server. Either `password` or `key-path` is required.         |
| `key-path`                                                     | string                                                | Path to SSH private key for the connection. Either `password` or `key-path` is required.        |
| `key-password`                                                 | string                                                | Password for the SSH private key, if the key is encrypted. Only used in combination with `key-path`. |
| `root-directory`                                               | string                                                | Root folder for extraction. Default value is `/`.                                               |
| `recursive`                                                    | boolean                                               | Whether to recursively traverse subfolders for files. Default value is `True`.                  |
| `ignore_patterns`                                              | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                        |
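To authenticate with an SSH key instead of a password, set `key-path`, plus `key-password` if the key is encrypted. The host name, key path, folder, and `SFTP_KEY_PASSWORD` environment variable below are hypothetical placeholders.

```yaml
type: sftp
host: sftp.myserver.com
port: 22
username: username
key-path: /path/to/id_ed25519        # hypothetical private key path
key-password: ${SFTP_KEY_PASSWORD}   # only needed if the key is encrypted
root-directory: /outgoing            # hypothetical folder
```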

##### `ignore_patterns`

Part of `sftp` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of [`ignore_patterns`](#ignore_patterns) configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `gcp_cloud_storage`

Part of `file-provider` configuration.

Read files from a GCP Cloud Storage bucket.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: gcp_cloud_storage
google-application-credentials: ${GOOGLE_APPLICATION_CREDENTIALS}
bucket: bucket_name
folders:
- list
- of
- folders
```

| Parameter                        | Type                                                  | Description                                                                         |
| -------------------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------- |
| `type`                           | string                                                | Select the type of file provider. Set to `gcp_cloud_storage` for GCP Cloud Storage. |
| `google-application-credentials` | string                                                | Base64-encoded GCP service account credentials.                                     |
| `bucket`                         | string                                                | Name of the GCP Cloud Storage bucket to fetch files from.                           |
| `folders`                        | list                                                  | List of folders in the bucket to fetch files from.                                  |
| `ignore_patterns`                | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                            |

##### `folders`

Part of `gcp_cloud_storage` configuration.

List of folders in the bucket to fetch files from.

Each element of this list should be a string.

##### `ignore_patterns`

Part of `gcp_cloud_storage` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags
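The string form takes the pattern directly. As a sketch, a GCP Cloud Storage provider that skips backup files might look like this. The folder names are hypothetical, and the sketch assumes the string is a regular expression matched against the file path, which this page does not confirm.

```yaml
type: gcp_cloud_storage
google-application-credentials: ${GOOGLE_APPLICATION_CREDENTIALS}
bucket: bucket_name
folders:
  - reports    # hypothetical folder names
  - drawings
# Assumption: regular expression; skips any path ending in .bak
ignore_patterns: '.*\.bak$'
```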

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `azure_blob_storage`

Part of `file-provider` configuration.

Read files from Azure Blob Storage.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: azure_blob_storage
connection-string: ${AZURE_BLOB_STORAGE_CONNECTION_STRING}
```

| Parameter           | Type                                                  | Description                                                                                         |
| ------------------- | ----------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
| `type`              | string                                                | **Required.** Select the type of file provider. Set to `azure_blob_storage` for Azure Blob Storage. |
| `connection-string` | string                                                | **Required.** Azure Blob Storage connection string.                                                 |
| `containers`        | list                                                  | Optional list of containers to extract from. If omitted or empty, files from all containers are read. |
| `ignore_patterns`   | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                            |

##### `containers`

Part of `azure_blob_storage` configuration.

Optional list of containers to extract from. If omitted or empty, files from all containers are read.

Each element of this list should be a string.
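For example, to read only two containers instead of every container in the storage account, list them under `containers`. The container names are hypothetical placeholders.

```yaml
type: azure_blob_storage
connection-string: ${AZURE_BLOB_STORAGE_CONNECTION_STRING}
containers:
  - engineering-documents   # hypothetical container names
  - archive
```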

##### `ignore_patterns`

Part of `azure_blob_storage` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `azure_data_lake_storage`

Part of `file-provider` configuration.

Read files from Azure Data Lake Storage.

**Example:**

To authenticate using a connection string:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: azure_data_lake
authentication:
  type: connection-string
  connection-string: ${AZURE_DATA_LAKE_CONNECTION_STRING}
```

To authenticate using client credentials:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: azure_data_lake
authentication:
  type: client-credentials
  account-url: ${AZURE_STORAGE_ACCOUNT_URL}
  tenant-id: ${AZURE_TENANT_ID}
  client-id: ${AZURE_CLIENT_ID}
  client-secret: ${AZURE_CLIENT_SECRET}
```

To authenticate using default credentials:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: azure_data_lake
authentication:
  type: default
  account-url: ${AZURE_STORAGE_ACCOUNT_URL}
  managed-identity-client-id: ${AZURE_MANAGED_IDENTITY_CLIENT_ID}
```

| Parameter         | Type             | Description                                                                                      |
| ----------------- | ---------------- | ------------------------------------------------------------------------------------------------ |
| `type`            | string           | **Required.** Select the type of file provider. Set to `azure_data_lake` for Azure Data Lake Storage. |
| `authentication`  | object           | **Required.** Configuration for authenticating to Azure Data Lake Storage.                       |
| `file-systems`    | list             | Optional list of file systems to extract from. If omitted or empty, files from all file systems are read. |
| `ignore_patterns` | string or object | File paths matching this pattern are ignored. Accepts a string or a pattern object with flags.   |

##### `authentication`

Part of `azure_data_lake_storage` configuration.

The `authentication` section defines how the extractor authenticates with Azure Data Lake Storage.

| Parameter                    | Type   | Description                                                                                                                               |
| ---------------------------- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                       | string | **Required.** Authentication method: `connection-string`, `client-credentials`, or `default`.                                             |
| `connection-string`          | string | **Required** when `type` is `connection-string`. Full Azure storage connection string.                                                    |
| `account-url`                | string | **Required** when `type` is `client-credentials` or `default`. The account DFS or Blob endpoint (for example, the storage account URL).   |
| `tenant-id`                  | string | **Required** when `type` is `client-credentials`. Azure AD tenant ID for the service principal.                                           |
| `client-id`                  | string | **Required** when `type` is `client-credentials`. Service principal (app) client ID.                                                      |
| `client-secret`              | string | **Required** when `type` is `client-credentials`. Service principal client secret. Load it from an environment variable or Azure Key Vault where possible. |
| `managed-identity-client-id` | string | Client ID of an Azure user-assigned managed identity. If not specified, the system-assigned managed identity is used.                     |

##### `file-systems`

Part of `azure_data_lake_storage` configuration.

Optional list of file systems to extract from. If omitted or empty, files from all file systems are read.

Each element of this list should be a string.
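As a sketch, the following combines `default` authentication with a `file-systems` filter. Because `managed-identity-client-id` is left out, the extractor should fall back to the default managed identity, as described for that parameter in the `authentication` table. The file system name is a hypothetical placeholder.

```yaml
type: azure_data_lake
authentication:
  type: default
  account-url: ${AZURE_STORAGE_ACCOUNT_URL}
  # managed-identity-client-id omitted: use the default managed identity
file-systems:
  - raw-documents   # hypothetical file system name
```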

##### `ignore_patterns`

Part of `azure_data_lake_storage` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `samba`

Part of `file-provider` configuration.

Read files from a Samba file share.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: smb
server: serverhost
share-path: \\serverhost\share_path
username: username
password: ${SMB_PASSWORD}
```

| Parameter                  | Type                                                  | Description                                                                                                                                                                                                            |
| -------------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                     | string                                                | **Required.** Select the type of file provider. Set to `smb` for Samba.                                                                                                                                                |
| `share-path`               | string                                                | **Required.** UNC path of the shared folder to read from.<br /><br />**Example:**<br />`\\server\share_path`                                                                                                           |
| `server`                   | string                                                | **Required.** IP address or hostname of the server to connect to.                                                                                                                                                      |
| `port`                     | integer                                               | Port to connect to on the Samba server. Default value is `445`.                                                                                                                                                        |
| `username`                 | string                                                | Username for authentication.                                                                                                                                                                                           |
| `password`                 | string                                                | Password for authentication.                                                                                                                                                                                           |
| `require-signing`          | boolean                                               | Whether signing is required on messages sent to and received from the Samba server. Default value is `True`.                                                                                                           |
| `domain-controller`        | string                                                | The domain controller hostname. When set, the file provider sends a DFS referral request to this hostname to populate the domain cache used for DFS connections or when connecting to `SYSVOL` or `NETLOGON`.         |
| `skip-dfs`                 | boolean                                               | Whether to skip all DFS referral checks and treat every path as a normal path. Only useful if there are problems with the DFS resolver, or to avoid the extra round trips the resolver requires.                       |
| `auth-protocol`            | either `negotiate`, `kerberos` or `ntlm`              | The protocol to use for authentication. Default value is `negotiate`.                                                                                                                                                  |
| `require-secure-negotiate` | boolean                                               | Whether to verify the negotiated dialects and capabilities on the connection to a share to protect against man-in-the-middle downgrade attacks. Default value is `True`.                                               |
| `ignore_patterns`          | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                                                                                                                                               |
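The sketch below connects to a hypothetical file server and selects NTLM instead of the default `negotiate` protocol, while keeping message signing on. The host, share, and account names are placeholders; whether NTLM or Kerberos is appropriate depends on your domain setup.

```yaml
type: smb
server: fileserver.example.com
port: 445
share-path: \\fileserver.example.com\engineering
username: svc-file-extractor
password: ${SMB_PASSWORD}
auth-protocol: ntlm     # default is negotiate
require-signing: true   # default is True
```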

##### `ignore_patterns`

Part of `samba` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `aws_s3`

Part of `file-provider` configuration.

Read files from an AWS S3 bucket.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: aws_s3
bucket: some_bucket
aws-access-key-id: ${AWS_ACCESS_KEY_ID}
aws-secret-access-key: ${AWS_SECRET_ACCESS_KEY}
```

| Parameter               | Type                                                  | Description                                                                                                                                     |
| ----------------------- | ----------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                  | string                                                | **Required.** Select the type of file provider. Set to `aws_s3` for AWS S3.                                                                     |
| `bucket`                | string                                                | **Required.** AWS S3 bucket to read files from. Valid formats are `bucket-name` or `bucket-name/subfolder1/subfolder2`.                         |
| `aws-access-key-id`     | string                                                | AWS access key ID to use for authentication. If omitted, the extractor uses the default authentication configured on the machine it runs on.     |
| `aws-secret-access-key` | string                                                | AWS secret access key to use for authentication. If omitted, the extractor uses the default authentication configured on the machine it runs on. |
| `ignore_patterns`       | configuration for either String or Pattern with flags | Any file path that matches this pattern will be ignored.                                                                                        |
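For example, to read only one subfolder of a bucket and rely on the authentication already configured on the machine, set `bucket` to a `bucket-name/subfolder` path and leave out both key parameters. The bucket and folder names are hypothetical placeholders.

```yaml
type: aws_s3
# aws-access-key-id and aws-secret-access-key omitted:
# the extractor uses the default authentication configured on the machine
bucket: my-company-bucket/engineering/drawings
```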

##### `ignore_patterns`

Part of `aws_s3` configuration.

Any file path that matches this pattern will be ignored.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string |
| `flags`   | either `a` or `i` |                |

#### `documentum`

Part of `file-provider` configuration.

Read files from an OpenText Documentum server.

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: documentum
base-url: https://my.documentum.server/dctm
username: username
password: ${DOCUMENTUM_PASSWORD}
field-map:
  name:
    - file_name
    - file_id
    - full_file_path
  external-id:
    - i_chronicle_id
    - r_object_id
repositories:
- name: some_repo
  query: SELECT * FROM tech_document WHERE object_name LIKE 'PREFIX%' ORDER BY r_object_id
```

| Parameter                         | Type                                                  | Description                                                                                                                                                                                                                                                                  |
| --------------------------------- | ----------------------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `type`                            | string                                                | **Required.** Select the type of file provider. Set to `documentum` to read from a Documentum server.                                                                                                                                                                        |
| `base-url`                        | string                                                | **Required.** URL of the documentum server to read from.                                                                                                                                                                                                                     |
| `username`                        | string                                                | **Required.** Documentum server username.                                                                                                                                                                                                                                    |
| `password`                        | string                                                | **Required.** Documentum server password.                                                                                                                                                                                                                                    |
| `repositories`                    | list                                                  | **Required.** List of Documentum repositories to read from, and the query to run against each repository. |
| `ssl-verify`                      | boolean                                               | Enable SSL certificate verification. Default value is `true`.<br /><br />Warning: Disabling SSL verification is a potential security risk. Only disable it on local networks that are secured by other means, never over the internet. |
| `get-all-renditions`              | boolean                                               | Get all renditions of each document, not just the primary rendition. |
| `external-id-separator`           | string                                                | Text to use as a separator between the different parts of the created external IDs. Default value is `.`.                                                                                                                                                                    |
| `include-extension-in-file-names` | boolean                                               | Include the file extension in the file name in CDF. For example, upload a file as `My File.pdf` instead of `My File`. |
| `timeout`                         | string                                                | Timeout for queries to Documentum, in the form `N(s\|m\|h\|d)`. Default value is `5m`. |
| `field-map`                       | object                                                | Optional. For each supported `field-map` parameter, list Documentum property names in priority order. The extractor uses the first property name in each list that exists on the document.<br /><br />Omit the whole object to use [built-in defaults](#field-map-defaults). |
| `performance`                     | object                                                | Configuration to tune the parallelism of the file extractor.                                                                                                                                                                                                                 |
| `ignore_patterns`                 | either string or pattern with flags                   | The extractor ignores any file path that matches this pattern. |

##### `repositories`

Part of `documentum` configuration.

List of Documentum repositories to read from, and the query to run against each repository.

Each element of this list is a Documentum repository for the file extractor to read from, together with the query to use.

| Parameter | Type   | Description                                                                                                                                                                                                                                                                                                    |
| --------- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `name`    | string | **Required.** Name of the Documentum repository. |
| `query`   | string | **Required.** Query to retrieve file info from the repository.<br /><br />Warning: The query *must* return a consistent ordering. To ensure this, add an `ORDER BY` clause on an ID field.<br /><br />**Example:**<br />`SELECT * FROM tech_document WHERE object_name LIKE 'PREFIX%' ORDER BY r_object_id` |
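To read from several repositories, add one element per repository, each with its own query. Below is a minimal sketch of the `repositories` part of a `documentum` configuration, written in the same form as the Documentum example. The repository names, the `dm_document` type, and the name prefixes are hypothetical placeholders:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
repositories:
  # Hypothetical repositories. Each query sorts on r_object_id
  # so that results come back in a consistent order.
  - name: engineering_repo
    query: SELECT * FROM tech_document WHERE object_name LIKE 'PUMP%' ORDER BY r_object_id
  - name: operations_repo
    query: SELECT * FROM dm_document WHERE object_name LIKE 'SOP%' ORDER BY r_object_id
```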

##### `field-map`

Part of `documentum` configuration.

For each supported `field-map` parameter, list Documentum **property names in priority order**.

The extractor reads the document's properties and uses the **first property name in each list that exists on the document**. If none of the listed properties exist, it falls back to the behavior described in the table below.

Each list item is a string (a property name). The [Documentum example](#documentum) shows `field-map` with prioritized property names for the `name` and `external-id` parameters.

<a id="field-map-defaults" />

**Default `field-map` when you omit the section:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
field-map:
  external-id:
    - i_chronicle_id
  name:
    - object_name
  file-extension:
    - dos_extension
    - format_name
  modify-date:
    - r_modify_date
  mime-type:
    - mime_type
```

| Parameter        | Type | Description                                                                                                                                                            |
| ---------------- | ---- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `external-id`    | list | Contributes to the file external ID in CDF, together with `external-id-separator` and the extension. If no listed property exists, extraction fails for that document. |
| `name`           | list | File name in CDF. If no listed property exists, the extractor logs a warning and continues.                                                                            |
| `file-extension` | list | File extension, for example `pdf`. If no listed property exists, the extractor uses `bin` and logs a warning.                                                          |
| `modify-date`    | list | Source modified time. If no listed property exists, the extractor logs a warning and continues.                                                                        |
| `mime-type`      | list | MIME type. If no listed property exists, the type is inferred from the file extension.                                                                                 |

<Note>
  The file external ID in CDF combines the Documentum value resolved for `external-id`, `external-id-separator`, and the resolved file extension.
</Note>
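As an example of overriding the defaults, the following sketch prefers a hypothetical custom property, `display_name`, for the file name and falls back to `object_name` on documents that lack it. It lists every parameter explicitly rather than assuming that omitted parameters keep their default values:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
field-map:
  external-id:
    - i_chronicle_id
  name:
    # Hypothetical custom property, tried first.
    - display_name
    # Used only when display_name doesn't exist on the document.
    - object_name
  file-extension:
    - dos_extension
    - format_name
  modify-date:
    - r_modify_date
  mime-type:
    - mime_type
```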

##### `performance`

Part of `documentum` configuration.

Configuration to tune the parallelism of the file extractor.

| Parameter         | Type    | Description                                                                                                       |
| ----------------- | ------- | ----------------------------------------------------------------------------------------------------------------- |
| `workers`         | integer | Number of parallel workers used to read from Documentum. Default value is `30`.                                   |
| `document-buffer` | integer | Number of document metadata instances to buffer before uploading the file contents to CDF. Default value is `60`. |
| `page-buffer`     | integer | Number of document metadata pages to buffer. Default value is `5`.                                                |
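For example, to reduce the load on a shared Documentum server, you might lower the parallelism below the defaults. The values in this sketch are illustrative, not tuned recommendations:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
performance:
  # Fewer parallel workers than the default of 30.
  workers: 10
  # Buffer fewer documents (default 60) and metadata pages (default 5).
  document-buffer: 20
  page-buffer: 2
```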

##### `ignore_patterns`

Part of `documentum` configuration.

The extractor ignores any file path that matches this pattern.

Either one of the following options:

* String
* Pattern with flags

###### `pattern_with_flags`

Part of `ignore_patterns` configuration.

| Parameter | Type              | Description    |
| --------- | ----------------- | -------------- |
| `pattern` | string            | Pattern string. |
| `flags`   | either `a` or `i` | Flags that modify how `pattern` is matched. |
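A sketch of the pattern-with-flags form. It assumes that `pattern` is a regular expression matched against the file path and that flag `i` makes the match case-insensitive, as the flag letters suggest; these semantics are an assumption. The pattern itself is hypothetical:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
ignore_patterns:
  # Hypothetical: skip temporary files (assumes a regular expression).
  pattern: '.*\.tmp$'
  # Assumed to mean case-insensitive, so ".TMP" is also skipped.
  flags: i
```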

### `extensions`

Part of `files` configuration.

List of file extensions to include. If omitted, the extractor allows all file extensions.

Each element of this list should be a string.

### `labels`

Part of `files` configuration.

List of label external IDs to add to extracted files.

Each element of this list should be a string.

### `security-categories`

Part of `files` configuration.

List of security category IDs to add to extracted files.

Each element of this list should be an integer.
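A sketch of these three lists in the `files` section, with the other `files` parameters omitted. The extensions, label external ID, and security category ID are hypothetical, and the labels and security categories must already exist in your CDF project. Writing extensions with a leading dot is an assumption:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
files:
  # Hypothetical: only extract PDF and Word files
  # (assumes extensions are written with a leading dot).
  extensions:
    - .pdf
    - .docx
  # Hypothetical label external ID.
  labels:
    - documentum-import
  # Hypothetical security category ID (an integer).
  security-categories:
    - 1234567890
```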

### `metadata-to-raw`

Part of `files` configuration.

If configured, the extractor writes file metadata to a table in CDF RAW instead of to the file objects in CDF.

| Parameter  | Type   | Description                                             |
| ---------- | ------ | ------------------------------------------------------- |
| `database` | string | **Required.** Write file metadata to this RAW database. |
| `table`    | string | **Required.** Write file metadata to this RAW table.    |
| `table`    | string | **Required.** Write file metadata to this Raw table.    |
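For example, to write file metadata to a RAW table (the database and table names are hypothetical):

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
files:
  metadata-to-raw:
    # Hypothetical RAW database and table names.
    database: file_extractor
    table: file_metadata
```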

### `data_model`

Part of `files` configuration.

When this is configured, the extractor uploads all file metadata to data models, which makes `metadata-to-raw` redundant.

| Parameter | Type   | Description |
| --------- | ------ | ----------- |
| `space`   | string | Data modeling space to upload the file metadata to. |

### `source`

Part of `files` configuration.

Optional. Sets the `Source` metadata field for the extracted files. When `data_model` is configured, the extractor also updates the underlying [CogniteSourceSystem](/cdf/dm/dm_reference/dm_core_data_model#cognitesourcesystem) with the corresponding source.

| Parameter     | Type   | Description |
| ------------- | ------ | ----------- |
| `name`        | string | Name of the source system. |
| `external_id` | string | External ID of the source system. |
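A sketch that uploads file metadata to data models and sets the source system. The space, and the source `name` and `external_id`, are hypothetical values:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
files:
  data_model:
    # Hypothetical data modeling space.
    space: my_files_space
  source:
    # Hypothetical source system name and external ID.
    name: Documentum
    external_id: documentum-prod
```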

### `filter`

Part of `files` configuration.

Either one of the following options:

* And
* Or
* Not
* Equals
* In

**Example:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
and:
- equals:
    property: some_metadata_field
    value: some_metadata_value
- not:
    in:
      property: some_metadata_field
      values:
      - metadata
      - values
```

#### `and`

Part of `filter` configuration.

Matches if all sub-filters match.

| Parameter | Type | Description                                                 |
| --------- | ---- | ----------------------------------------------------------- |
| `and`     | list | **Required.** List of sub-filters. All of them must match. |

##### `and`

Part of `and` configuration.

List of sub-filters. All of them must match.

Each element of this list is an `and`, `or`, `not`, `equals`, or `in` filter.

#### `or`

Part of `filter` configuration.

Matches if any of the sub-filters match.

| Parameter | Type | Description                                                          |
| --------- | ---- | -------------------------------------------------------------------- |
| `or`      | list | **Required.** List of sub-filters. At least one of them must match. |

##### `or`

Part of `or` configuration.

List of sub-filters. At least one of them must match.

Each element of this list is an `and`, `or`, `not`, `equals`, or `in` filter.
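As a hypothetical sketch in the same form as the `filter` example, the following matches files whose `document_type` metadata is `drawing`, or whose `status` metadata is either `approved` or `released`. The property names and values are placeholders:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
or:
# Matches drawings regardless of status...
- equals:
    property: document_type
    value: drawing
# ...or any file that is approved or released.
- in:
    property: status
    values:
    - approved
    - released
```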

#### `not`

Part of `filter` configuration.

Matches if the sub-filter does not match.

| Parameter | Type                                                | Description |
| --------- | --------------------------------------------------- | ----------- |
| `not`     | object                                              | Sub-filter to negate: an `and`, `or`, `not`, `equals`, or `in` filter. |

#### `equals`

Part of `filter` configuration.

Matches if the property on the file is equal to the given value.

| Parameter | Type   | Description                    |
| --------- | ------ | ------------------------------ |
| `equals`  | object | **Required.** Equality filter. |

##### `equals`

Part of `equals` configuration.

Equality filter.

| Parameter  | Type   | Description                            |
| ---------- | ------ | -------------------------------------- |
| `property` | string | **Required.** File property name.      |
| `value`    | string | **Required.** Property value to match. |

#### `in`

Part of `filter` configuration.

Matches if the property on the file is equal to one of the given values.

| Parameter | Type   | Description              |
| --------- | ------ | ------------------------ |
| `in`      | object | **Required.** In filter. |

##### `in`

Part of `in` configuration.

In filter.

| Parameter  | Type   | Description                               |
| ---------- | ------ | ----------------------------------------- |
| `property` | string | **Required.** File property name.         |
| `values`   | list   | Property values. One of these must match. |

###### `values`

Part of `in` configuration.

Property values. One of these must match.

Each element of this list should be a string.

### `delete-behavior`

Part of `files` configuration.

Configures how to handle files no longer present at the source when [missing-as-deleted](#missing-as-deleted) is enabled.

| Parameter | Type                    | Description                                                                                         |
| --------- | ----------------------- | --------------------------------------------------------------------------------------------------- |
| `mode`    | either `soft` or `hard` | `soft`: add a metadata field to the file in CDF (given by `key`). `hard`: delete the file from CDF. |
| `key`     | string                  | Metadata field to add to the deleted file in CDF when `mode` is `soft`. Default value is `deleted`. |

<Note>
  The `missing-as-deleted` and `delete-behavior` parameters require a configured [state store](#state-store) to track files between runs. Without a state store, the extractor doesn't delete files or mark them as deleted.
</Note>
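A sketch of soft deletion, assuming that `missing-as-deleted` is a boolean in the `files` section. The metadata key and state file path are hypothetical, and the local state store is one of the two options described under [`state-store`](#state-store):

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
files:
  # Assumed to be a boolean that enables deletion handling.
  missing-as-deleted: true
  delete-behavior:
    # Keep the file in CDF, but add a metadata field to it.
    mode: soft
    # Hypothetical metadata key (the default is "deleted").
    key: removed-from-source
extractor:
  # A state store is required to track files between runs.
  state-store:
    local:
      path: file-extractor-state.json
```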

## `extractor`

Global parameter.

General configuration for the file extractor.

| Parameter            | Type                                                 | Description                                                                                                                                                                                              |
| -------------------- | ---------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `state-store`        | object                                               | Include the state store section to save extraction states between runs. Use this if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time.            |
| `errors-threshold`   | integer                                              | Maximum number of retries for fallible operations in the extractor. Default value is `5`.                                                                                                                |
| `failed-report-path` | string                                               | Path to the failure report file. The report is a JSON Lines (`.jsonl`) file that lists the failed files and details about the errors raised during extraction. |
| `upload-queue-size`  | integer                                              | Maximum number of files in the upload queue at a time. Default value is `10`.                                                                                                                            |
| `parallelism`        | integer                                              | Maximum number of files to upload to CDF in parallel. Note that these files are streamed directly from the source, so this is also the number of parallel downloads. Default value is `4`.               |
| `dry-run`            | boolean                                              | Run the extractor in dry-run mode. If `true`, the extractor loads file metadata from the source but doesn't download any files, upload anything to CDF, or store any states. |
| `schedule`           | configuration for either Cron Expression or Interval | File extractor schedule.<br /><br />**Examples:**<br />`{'type': 'cron', 'expression': '*/30 * * * *'}`<br />`{'type': 'interval', 'expression': '10m'}`                                                 |
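A sketch of an `extractor` section that runs every hour, streams up to eight files in parallel, and writes a failure report. The report path and the numeric values are hypothetical:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
extractor:
  # Hypothetical path for the JSON Lines failure report.
  failed-report-path: ./failed-files.jsonl
  # Upload up to 8 files in parallel (also 8 parallel downloads),
  # with at most 20 files in the upload queue.
  parallelism: 8
  upload-queue-size: 20
  # Set to true to load file metadata without downloading or uploading files.
  dry-run: false
  # Run once every hour.
  schedule:
    type: interval
    expression: 1h
```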

### `state-store`

Part of `extractor` configuration.

Include the state store section to save extraction states between runs. Use this if data is loaded incrementally. The state store is also required for deletion ([`missing-as-deleted`](#missing-as-deleted) and [`delete-behavior`](#delete-behavior)) to take effect. We support multiple state stores, but you can only configure one at a time.

| Parameter | Type   | Description                                                                          |
| --------- | ------ | ------------------------------------------------------------------------------------ |
| `raw`     | object | A RAW state store stores the extraction state in a table in CDF RAW.                 |
| `local`   | object | A local state store stores the extraction state in a JSON file on the local machine. |

#### `raw`

Part of `state-store` configuration.

A RAW state store stores the extraction state in a table in CDF RAW.

| Parameter         | Type    | Description                                                                          |
| ----------------- | ------- | ------------------------------------------------------------------------------------ |
| `database`        | string  | **Required.** Enter the database name in CDF RAW.                                    |
| `table`           | string  | **Required.** Enter the table name in CDF RAW.                                       |
| `upload-interval` | integer | Enter the interval in seconds between each upload to CDF RAW. Default value is `30`. |

#### `local`

Part of `state-store` configuration.

A local state store stores the extraction state in a JSON file on the local machine.

| Parameter       | Type    | Description                                                             |
| --------------- | ------- | ----------------------------------------------------------------------- |
| `path`          | string  | **Required.** Enter the path to the JSON file.                          |
| `save-interval` | integer | Enter the interval in seconds between each save. Default value is `30`. |
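For example, a RAW state store that uploads states every 60 seconds. The database and table names are hypothetical. Configure either `raw` or `local`, not both:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
extractor:
  state-store:
    raw:
      # Hypothetical RAW database and table for extraction states.
      database: file_extractor
      table: states
      # Upload states every 60 seconds instead of the default 30.
      upload-interval: 60
```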

### `schedule`

Part of `extractor` configuration.

File extractor schedule.

Either one of the following options:

* Cron Expression
* Interval

**Examples:**

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: cron
expression: '*/30 * * * *'
```

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
type: interval
expression: 10m
```

#### `cron_expression`

Part of `schedule` configuration.

| Parameter    | Type   | Description                                                                         |
| ------------ | ------ | ----------------------------------------------------------------------------------- |
| `type`       | string | Set to `cron`.                                                                      |
| `expression` | string | **Required.** Cron expression schedule.<br /><br />**Example:**<br />`*/30 * * * *` |

#### `interval`

Part of `schedule` configuration.

| Parameter    | Type   | Description                                                                                                   |
| ------------ | ------ | ------------------------------------------------------------------------------------------------------------- |
| `type`       | string | Set to `interval`.                                                                                            |
| `expression` | string | **Required.** Fixed time interval, in the form `N(s\|m\|h\|d)`.<br /><br />**Examples:**<br />`10m`<br />`3h` |

## `metrics`

Global parameter.

The `metrics` section describes where to send metrics on extractor performance for remote monitoring of the extractor. We recommend sending metrics to a [Prometheus pushgateway](https://prometheus.io), but you can also send metrics as time series in the CDF project.

| Parameter       | Type   | Description                                                                                       |
| --------------- | ------ | ------------------------------------------------------------------------------------------------- |
| `push-gateways` | list   | List of Prometheus pushgateway configurations.                                  |
| `cognite`       | object | Push metrics to CDF time series. Requires configured CDF credentials.            |
| `server`        | object | Expose an HTTP server with Prometheus metrics for scraping.                      |

### `push-gateways`

Part of `metrics` configuration.

List of Prometheus pushgateway configurations.

Each element of this list is a metric destination with the following parameters.

| Parameter       | Type                   | Description                                                                                                                                                                                                                                                                                                                                                                                      |
| --------------- | ---------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `host`          | string                 | Enter the address of the host to push metrics to.                                                                                                                                                                                                                                                                                                                                                |
| `job-name`      | string                 | Enter the value of the `exported_job` label to associate metrics with. This separates several deployments on a single pushgateway, and should be unique.                                                                                                                                                                                                                                         |
| `username`      | string                 | Enter the username for the pushgateway. |
| `password`      | string                 | Enter the password for the pushgateway. |
| `clear-after`   | either null or integer | Enter the number of seconds to wait before clearing the pushgateway. When set, the extractor waits this long after the run completes, then deletes all its metrics from the pushgateway. We recommend a value of at least twice the scrape interval on the pushgateway, so that the last metrics are scraped before they are deleted. Disabled by default. |
| `push-interval` | integer                | Enter the interval in seconds between each push. Default value is `30`.                                                                                                                                                                                                                                                                                                                          |
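A sketch with a single pushgateway that reads its credentials from environment variables, as described under *Using values from environment variables*. The host, job name, and variable names are hypothetical, and including a scheme in `host` is an assumption. Port 9091 is the Prometheus Pushgateway's default port:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
metrics:
  push-gateways:
    - host: http://pushgateway.example.com:9091
      # Unique per deployment, so several extractors can share one pushgateway.
      job-name: file-extractor-site-a
      username: ${PUSHGATEWAY_USER}
      password: ${PUSHGATEWAY_PASSWORD}
      # Twice a hypothetical 30-second scrape interval.
      clear-after: 60
      push-interval: 30
```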

### `cognite`

Part of `metrics` configuration.

Push metrics to CDF time series. Requires configured CDF credentials.

| Parameter            | Type    | Description                                                                                      |
| -------------------- | ------- | ------------------------------------------------------------------------------------------------ |
| `external-id-prefix` | string  | **Required.** Prefix on external ID used when creating CDF time series to store metrics.         |
| `asset-name`         | string  | Enter the name for a CDF asset that will have all the metrics time series attached to it.        |
| `asset-external-id`  | string  | Enter the external ID for a CDF asset that will have all the metrics time series attached to it. |
| `push-interval`      | integer | Enter the interval in seconds between each push to CDF. Default value is `30`.                   |
| `data-set`           | object  | Data set to create the metrics time series in.                                                   |

#### `data-set`

Part of `cognite` configuration.

Data set to create the metrics time series in.

| Parameter     | Type    | Description          |
| ------------- | ------- | -------------------- |
| `id`          | integer | Resource internal ID |
| `external-id` | string  | Resource external ID |
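A sketch that pushes metrics to CDF time series under a hypothetical external ID prefix, attaches them to a hypothetical asset, and places them in a data set referenced by a hypothetical external ID:

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
metrics:
  cognite:
    # Hypothetical prefix for the metrics time series external IDs.
    external-id-prefix: file-extractor-site-a-
    # Hypothetical asset that the metrics time series are attached to.
    asset-name: File extractor site A
    asset-external-id: file-extractor-site-a
    push-interval: 30
    data-set:
      # Reference the data set by external ID (or use id for the internal ID).
      external-id: extractor-metrics
```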

### `server`

Part of `metrics` configuration.

The extractor can also expose an HTTP server with Prometheus metrics for scraping.

| Parameter | Type    | Description                                                             |
| --------- | ------- | ----------------------------------------------------------------------- |
| `host`    | string  | Host to run the Prometheus server on. Default value is `0.0.0.0`.       |
| `port`    | integer | Local port to expose the Prometheus server on. Default value is `9000`. |
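For example, to expose the Prometheus metrics only on the local machine rather than on all interfaces (the default `0.0.0.0`):

```yaml theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
metrics:
  server:
    # Listen on localhost only.
    host: 127.0.0.1
    port: 9000
```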
