# Configuration settings

> Configuration parameters and settings for the WITSML extractor to connect to WITSML servers and ingest data into CDF.

To configure the WITSML extractor, you must create a configuration file. The file must be in [YAML](https://yaml.org) format.
Below, you'll find the minimal configuration file needed to run the WITSML extractor in [simple](/cdf/integration/guides/extraction/witsml/witsml_setup#simple-deployment) mode.

```yaml
extractor:
  CONFIG_DATASET_ID: <The data set ID for temporary files created by the extractor>
  DATA_DATASET_ID: <The data set ID for the CDF resources created by the extractor>

cdf:
  COGNITE_PROJECT: <The CDF project name>
  TENANT_ID: <The ID of your tenant>
  TOKEN_CLIENT_ID: <Client ID for the WITSML extractor>
  TOKEN_CLIENT_SECRET: <Client secret for the WITSML extractor>
  CDF_CLUSTER: <The CDF cluster>

extract:
  'your-witsml-server-reference':
    gateway:
      host: <WITSML server host>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'
        object_type: WELL
```

More configuration examples are available in the [subsurface-extractor-config-samples](https://github.com/cognitedata/subsurface-extractor-config-samples/tree/main/witsml-extractor/configuration) repository on GitHub.

## Extraction rules

You can configure the extractor with two types of rules that define what, how, and when to ingest WITSML data into Cognite Data Fusion (CDF). Use the [`rule_type`](/cdf/integration/guides/extraction/witsml/witsml_configuration#rules) configuration parameter to select the rule type:

* **ChangedObjects** - Finds objects that have changed since the last time the request was sent to the WITSML server. This requires the WITSML server to set the `dTimeLastUpdated` flag correctly. This rule has been tested and verified with Petrolink PetroVault and Kongsberg SiteCom. This is the default rule type.

* **UpdateStatus** - Finds objects that are already ingested into CDF based on a status value that matches the rule. The data in CDF can become out of sync when the status changes and no rule captures the changed attribute. The *UpdateStatus* rule runs its query and compares the result with the data in CDF RAW. If there's a mismatch, the extractor updates CDF RAW by creating a `ScheduledObjectQuery` for the object. Use this rule to track the wellbore `isActive` status or the log `objectGrowing` flag.

<Info>
  - If you add new rules to the configuration file at runtime, the extractor sets all existing rules to inactive before it ingests the new rules into the *extractionrules* table based on the rule definitions.
  - If the `extract` section isn't included in the configuration file at runtime, the extractor uses the rules stored in the *extractionrules* table in the configuration database set by `RAW_DB_FOR_CONFIG` (default `witsml-config`).
</Info>

## Extractor

Include the `extractor` section to configure the extractor setup.

| Parameter                         | Description                                                                                                                                                                                                                              |
| --------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `APP_NAME`                        | Enter the name of the extractor deployment. This is used in the extraction pipeline run in CDF. The default value is `witsml-extractor`.                                                                                                 |
| `MODE`                            | Define the [execution mode](/cdf/integration/guides/extraction/witsml/witsml_setup#deploy-the-extractor). The default value is `SIMPLE`.                                                                                                 |
| `JSON_LOGGING`                    | Set to `true` to enable debug logging in JSON format. This is useful for troubleshooting. The default value is `false`.                                                                                                                  |
| `LOG_LEVEL`                       | Select the verbosity level for logging. Valid options, in decreasing verbosity levels, are `DEBUG`, `INFO`, `WARNING`, `ERROR`, and `CRITICAL`. The default value is `INFO`.                                                             |
| `CONFIG_DATASET_ID`               | Insert the data set ID for the WITSML configuration in CDF RAW. This is a required field.                                                                                                                                                |
| `DATA_DATASET_ID`                 | Insert the data set ID for the WITSML data in CDF RAW. This is a required field.                                                                                                                                                         |
| `RAW_DB_FOR_CONFIG`               | Enter the name of the CDF RAW database for the WITSML configuration. The default value is `witsml-config`.                                                                                                                               |
| `RAW_DB_FOR_DATA`                 | Enter the name of the CDF RAW database for the WITSML data. The default value is `witsml-data`.                                                                                                                                          |
| `DEPTH_TO_ROWNUMBER_SCALE_FACTOR` | Enter the factor used to multiply depth indexes when creating the `row-key` for sequences. The default value is 1000.                                                                                                                     |
| `EXTPIPELINE_EXT_ID_PREFIX`       | Enter a prefix to add to the external ID of the extraction pipeline in CDF.                                                                                                                                                              |
| `EXT_ID_PREFIX`                   | Enter an external ID prefix to identify the objects created directly in the CDF resource type.                                                                                                                                           |
| `EXT_ID_SUFFIX`                   | Enter an external ID suffix to identify the objects created directly in the CDF resource type.                                                                                                                                           |
| `FIND_UNAVAILABLE_IN_SOURCE`      | Set to `true` to run maintenance jobs that look for objects in CDF that are no longer available in the source system. The default value is `true`.                                                                                       |
| `ADD_LOG_INFO_TO_TIMESERIES`      | Set to `true` to add log header information to time series created by the extractor. The default value is `true`.                                                                                                                        |
| `ARCHIVE_DOWNLOADED_FILES`        | Set to `true` to archive and compress complete XML files downloaded from the WITSML server. Use this when you're reprocessing the same file in different environments or testing different configurations. The default value is `false`. |
| `ARCHIVE_RESPONSE_FILES`          | Set to `false` to remove the response XML files downloaded from the WITSML server after being processed by the extractor. The default value is `true`.                                                                                   |
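As an illustration, the sketch below writes out an `extractor` section with the documented default values stated explicitly. Only `CONFIG_DATASET_ID` and `DATA_DATASET_ID` are required, so every other line restates a default and can be left out. Replace the data set ID placeholders with your own values.

```yaml
extractor:
  APP_NAME: witsml-extractor
  MODE: SIMPLE
  LOG_LEVEL: INFO
  JSON_LOGGING: false
  # Required: data set IDs for the configuration and the ingested data
  CONFIG_DATASET_ID: <The data set ID for the WITSML configuration>
  DATA_DATASET_ID: <The data set ID for the WITSML data>
  # CDF RAW databases (default names shown)
  RAW_DB_FOR_CONFIG: witsml-config
  RAW_DB_FOR_DATA: witsml-data
  # Depth indexes are multiplied by this factor to build sequence row keys
  DEPTH_TO_ROWNUMBER_SCALE_FACTOR: 1000
  FIND_UNAVAILABLE_IN_SOURCE: true
  ADD_LOG_INFO_TO_TIMESERIES: true
  ARCHIVE_DOWNLOADED_FILES: false
  ARCHIVE_RESPONSE_FILES: true
```

Assuming the extractor simply multiplies each depth by `DEPTH_TO_ROWNUMBER_SCALE_FACTOR`, as the parameter description suggests, a depth index of `1234.5` becomes row key `1234500` with the default factor of 1000, so up to three decimal places survive in the integer row key.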

## CDF

Include the `cdf` section to configure which CDF project the extractor will load data into and how to connect to the project. This section is mandatory and should always contain the project and authentication configuration.

| Parameter             | Description                                                                         |
| --------------------- | ----------------------------------------------------------------------------------- |
| `COGNITE_PROJECT`     | Insert the CDF project name you want to ingest data into.                           |
| `TENANT_ID`           | Enter the Azure tenant ID.                                                          |
| `TOKEN_CLIENT_ID`     | Enter the CDF client ID. This is mandatory if you're using OIDC authentication.     |
| `TOKEN_CLIENT_SECRET` | Enter the CDF client secret. This is mandatory if you're using OIDC authentication. |
| `CDF_CLUSTER`         | Enter the name of the CDF cluster.                                                  |

## ETP

Include the `etp` section when you want to set up the ingestion of live data from WITSML ETP objects to CDF.

| Parameter               | Description                                                                                                                                                          |
| ----------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `keep_etp_msg`          | Set to `true` to keep ETP messages stored in the queue after processing. The default value is `false`.                                                               |
| `trigger_refresh_lag`   | Set to `true` to trigger a websocket reconnection if the latency exceeds 10 minutes. The default value is `false`.                                                   |
| `max_msg_rate`          | Maximum ETP message rate while streaming ETP records. The default value is `100`, defined by Energistics.                                                            |
| `max_data_items`        | Maximum number of data items per ETP message. The default value is `1100`, defined by Energistics.                                                                   |
| `index_padding_minutes` | Set the index padding time (in minutes) whenever the ETP client starts the websocket communication. The default value is `10`.                                       |
| `filter_uids`           | Filter the WITSML logs to listen to by log UID. The default value is `None`, which means no filter is applied and the websocket connection fetches all active logs.  |
| `max_queue_size`        | Maximum size of the upload queue. The extractor triggers an upload to CDF when the queue reaches this size. The default value is `1000`.                             |
| `message_timeout`       | ETP websocket communication timeout, in seconds. The default value is `30`.                                                                                          |

### ETP Gateway

Include the `gateway` subsection to configure how the extractor connects to the WITSML ETP provider.

| Parameter  | Description                                                                           |
| ---------- | ------------------------------------------------------------------------------------- |
| `host`     | Insert the base URL of the WITSML server. This is a required field.                   |
| `user`     | Enter the username for authenticating to the WITSML server. This is a required field. |
| `password` | Enter the password for authenticating to the WITSML server. This is a required field. |
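The following is a minimal sketch of an `etp` section, assuming it sits at the top level of the configuration file next to `extractor`, `cdf`, and `extract`, with `gateway` nested inside it. The numeric and boolean values are the documented defaults. The list format for `filter_uids` is an assumption, so that entry is commented out.

```yaml
etp:
  keep_etp_msg: false
  trigger_refresh_lag: false
  max_msg_rate: 100
  max_data_items: 1100
  index_padding_minutes: 10
  max_queue_size: 1000
  message_timeout: 30   # seconds
  # Assumed format: a list of log UIDs to listen to.
  # Leave unset to stream all active logs.
  # filter_uids:
  #   - <Log UID>
  gateway:
    host: <WITSML ETP server host>
    user: <WITSML user name>
    password: <WITSML password>
```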

## Extract

This section contains the parameters needed to connect to your WITSML servers and the related extraction rules. You can configure several WITSML servers. Each server needs its own server reference key, such as `your-witsml-server-reference` in the example above, with `gateway` and `rules` subsections. The extractor stores the server reference on all main object rows in CDF RAW to identify the source of each object.
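For example, a configuration for two WITSML servers could look like the sketch below. It follows the layout of the minimal configuration at the top of this page; the server reference keys (`primary-witsml-server`, `secondary-witsml-server`) and the rule names are hypothetical names chosen for illustration.

```yaml
extract:
  # Each key under `extract` is a server reference (hypothetical names).
  # The extractor stores this reference on the main object rows in CDF RAW.
  'primary-witsml-server':
    gateway:
      host: <First WITSML server host>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'
        object_type: WELL
  'secondary-witsml-server':
    gateway:
      host: <Second WITSML server host>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      wellbore:
        schedule: '*/30 * * * *'
        object_type: WELLBORE
```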

### Gateway

Include the `gateway` subsection to configure how the extractor connects to the WITSML server.

| Parameter  | Description                                                                           |
| ---------- | ------------------------------------------------------------------------------------- |
| `host`     | Insert the base URL of the WITSML server. This is a required field.                   |
| `user`     | Enter the username for authenticating to the WITSML server. This is a required field. |
| `password` | Enter the password for authenticating to the WITSML server. This is a required field. |

### Rules

Include the `rules` subsection to define what, how, and when to ingest WITSML data into CDF. See the [extraction rules](/cdf/integration/guides/extraction/witsml/witsml_configuration#extraction-rules) section for more details.

| Parameter     | Description                                                                                                                                                                                                                                                                                       |
| ------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `schedule`    | Set up a schedule for the given `rule_type`. Use a [cron expression](https://en.wikipedia.org/wiki/Cron) enclosed in quotes, such as `'*/10 * * * *'`, or `s:10`. This is a required field.                                                                                                     |
| `object_type` | Insert the WITSML object type. Valid options are `WELL`, `WELLBORE`, `TUBULAR`, `TRAJECTORY`, `LOG`. You can define logs as `TIMELOG`, `DEPTHLOG`, or `LOG`. The default value is the WITSML object type as defined in the standard. **Enter this value in uppercase**. This is a required field. |
| `rule_type`   | Select a rule type. Valid options are `CHANGEDOBJECT` and `UPDATESTATUS`. The default value is `CHANGEDOBJECT`.                                                                                                                                                                                  |
| `base_query`  | Optional. If you don't enter a `base_query`, the extractor uses the standard query for the given object type to look for all objects of that type.                                                                                                                                               |
| `config`      | Different values based on `rule_type`. See the sections below. This is a required field.                                                                                                                                                                                                          |

For `CHANGEDOBJECT` rules:

| Parameter                   | Description                                                                                                                                                                                                                                                                                                                                                                                        |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `load_deltas`               | Set to `true` to only process new items for growing objects. Set to `false` to process all items every time. This is a required field. The default value is `true`.                                                                                                                                                                                                                                |
| `ingest_to_clean`           | Set to `true` to create objects directly to the CDF resource type. This parameter only applies to *Attachment* and *Log objects*. The default value is `false`.                                                                                                                                                                                                                                    |
| `only_for_active_wellbores` | Set to `true` to add active wellbores to the query before the query is run. This significantly improves performance when the extractor looks for changes in growing objects. The default value is `false`.                                                                                                                                                                                         |
| `filter_on_last_modified`   | Set to `true` to look only for changes since the last received modification. Set to `false` to retrieve all rows every time the query runs; use this when objects change on the WITSML server without their `lastModified` timestamp being updated. Retrieving all rows affects performance and increases the load on the WITSML server. The default value is `true`. |
| `log_data_to_raw_variant`   | Select how to store the log data records in CDF RAW. Valid options are `NONE`, `ALL`, `ONLY_TIME`, and `ONLY_DEPTH`.                                                                                                                                                                                                                                                                              |
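The sketch below shows a `CHANGEDOBJECT` rule for time-based logs with every `config` parameter set. It belongs under a server reference's `rules` subsection. The rule name `time-logs` is hypothetical, and the nesting of these parameters under `config` is assumed from the `config` parameter description.

```yaml
rules:
  time-logs:                          # hypothetical rule name
    schedule: '*/5 * * * *'           # every 5 minutes
    object_type: TIMELOG
    rule_type: CHANGEDOBJECT
    config:
      load_deltas: true               # only process new items in growing logs
      ingest_to_clean: true           # create objects directly in the CDF resource type
      only_for_active_wellbores: true # add active wellbores to the query
      filter_on_last_modified: true   # only fetch changes since the last modification
      log_data_to_raw_variant: ONLY_TIME  # assumed: store only time-based log data in CDF RAW
```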

For `UPDATESTATUS` rules:

| Parameter                   | Description                                                                                                                                                                                              |
| --------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `attribute`                 | The name of the attribute in the WITSML object to compare. Typically, use `isActive` for wellbores and `objectGrowing` for logs. This is a required field.                                               |
| `look_for_value`            | The attribute value to compare with CDF RAW to find objects that are out of sync. Typically, use `true` when comparing the `isActive` attribute. This is a required field.                              |
| `ingest_rule_ref`           | Enter the rule name to use if the extractor finds out-of-sync objects. This is a required field.                                                                                                         |
| `only_for_active_wellbores` | Set to `true` to add active wellbores to the query before it runs. This significantly improves performance when looking for changes in growing objects. The default value is `false`.                   |
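To show how an `UPDATESTATUS` rule hands work to another rule, the following sketch pairs it with a `CHANGEDOBJECT` rule for wellbores. The `UPDATESTATUS` rule compares the wellbore `isActive` attribute with CDF RAW and, for any out-of-sync wellbore, uses the rule named in `ingest_rule_ref`. The rule names are hypothetical, and `ingest_rule_ref` is assumed to match the key of the rule it refers to.

```yaml
rules:
  wellbores:                      # hypothetical CHANGEDOBJECT rule name
    schedule: '*/10 * * * *'
    object_type: WELLBORE
    rule_type: CHANGEDOBJECT
    config:
      load_deltas: true
  wellbore-status:                # hypothetical UPDATESTATUS rule name
    schedule: '0 * * * *'         # once an hour
    object_type: WELLBORE
    rule_type: UPDATESTATUS
    config:
      attribute: isActive
      look_for_value: true
      ingest_rule_ref: wellbores  # assumed to refer to the rule key above
```

Running the status check less often than the ingestion rule keeps the extra query load on the WITSML server low, while still catching wellbores whose status changed without a rule capturing it.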
