
Configure the WITSML extractor

To configure the WITSML extractor, you must create a configuration file in YAML format. Below is the minimal configuration file needed to run the WITSML extractor in simple mode:

```yaml
extractor:
  CONFIG_DATASET_ID: <The data set ID for temporary files created by the extractor>
  DATA_DATASET_ID: <The data set ID for the CDF resources created by the extractor>

cdf:
  COGNITE_PROJECT: <The CDF project name>
  TENANT_ID: <The ID of your tenant>
  TOKEN_CLIENT_ID: <Client ID for the WITSML extractor>
  TOKEN_CLIENT_SECRET: <Client secret for the WITSML extractor>
  CDF_CLUSTER: <The CDF cluster>

extract:
  'your-witsml-server-reference':
    gateway:
      host: <WITSML server host>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'
        object_type: WELL
```

Other configuration examples are available here.

Extraction rules

You can configure the extractor with two types of rules that define what, how, and when to ingest WITSML data into Cognite Data Fusion (CDF). Use the rule_type configuration parameter to select the rule type:

  • ChangedObjects - Finds objects that have changed since the last request to the WITSML server. This requires the WITSML server to set the dTimeLastUpdated timestamp correctly. This rule has been tested and verified with Petrolink PetroVault and Kongsberg SiteCom. This is the default rule type.

  • UpdateStatus - Finds objects already ingested into CDF whose status matches the rule. Ingested objects can get out of sync with the WITSML server when their status changes and no rule captures the changed attribute. The UpdateStatus rule runs its query and compares the result with the data in CDF RAW. If there's a mismatch, CDF RAW is updated by creating a ScheduledObjectQuery for the object. This rule can handle, for example, the wellbore isActive status or the log objectGrowing flag.

info
  • If you add new rules to the configuration file at runtime, the extractor sets all existing rules to inactive before it ingests the new rules into the extractionrules table based on the rule definitions.

  • If the configuration file doesn't include an extract section at runtime, the extractor uses the rules stored in the extractionrules table in the witsml_config database.

Extractor

Include the extractor section to configure the extractor setup.

| Parameter | Description |
|---|---|
| APP_NAME | Enter the name of the extractor deployment. This name is used for the extraction pipeline runs in CDF. The default value is witsml-extractor. |
| MODE | Define the execution mode. The default value is SIMPLE. |
| JSON_LOGGING | Set to true to enable debug logging in JSON format. This is useful for troubleshooting. The default value is false. |
| LOG_LEVEL | Select the verbosity level for logging. Valid options, in decreasing verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. The default value is INFO. |
| CONFIG_DATASET_ID | Enter the data set ID for the WITSML configuration in CDF RAW. This is a required field. |
| DATA_DATASET_ID | Enter the data set ID for the WITSML data in CDF RAW. This is a required field. |
| RAW_DB_FOR_CONFIG | Enter the name of the CDF RAW database for the WITSML configuration. The default value is witsml-config. |
| RAW_DB_FOR_DATA | Enter the name of the CDF RAW database for the WITSML data. The default value is witsml-data. |
| DEPTH_TO_ROWNUMBER_SCALE_FACTOR | Enter the factor that depth indexes are multiplied by to create the row keys in CDF sequences. The default value is 1000. |
| EXTPIPELINE_EXT_ID_PREFIX | Enter a prefix to add to the external ID of the extraction pipeline in CDF. |
| EXT_ID_PREFIX | Enter a prefix for the external IDs of objects created directly in the CDF resource type. |
| EXT_ID_SUFFIX | Enter a suffix for the external IDs of objects created directly in the CDF resource type. |
| FIND_UNAVAILABLE_IN_SOURCE | Set to true to run maintenance jobs that look for objects in CDF that are no longer available in the source system. The default value is true. |
| ADD_LOG_INFO_TO_TIMESERIES | Set to true to add log header information to the time series created by the extractor. The default value is true. |
| ARCHIVE_DOWNLOADED_FILES | Set to true to archive and compress the complete XML files downloaded from the WITSML server. This is useful when you reprocess the same file in different environments or test different configurations. The default value is false. |
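The extractor section below is an illustrative sketch that combines the required data set IDs with some of the optional parameters. It assumes every parameter sits directly under extractor, as in the minimal configuration; the values are examples, not recommendations. CDF sequence row numbers are integers, so the scale factor presumably exists to turn fractional depths into whole row keys: with the default factor of 1000, a depth index of 1234.5 becomes row key 1234500.

```yaml
extractor:
  APP_NAME: witsml-extractor
  MODE: SIMPLE
  LOG_LEVEL: DEBUG        # More verbose than the INFO default, for troubleshooting
  JSON_LOGGING: true      # Emit logs in JSON format
  CONFIG_DATASET_ID: <The data set ID for the WITSML configuration>
  DATA_DATASET_ID: <The data set ID for the WITSML data>
  RAW_DB_FOR_CONFIG: witsml-config
  RAW_DB_FOR_DATA: witsml-data
  # Depth index 1234.5 x 1000 = row key 1234500 in the sequence
  DEPTH_TO_ROWNUMBER_SCALE_FACTOR: 1000
  ARCHIVE_DOWNLOADED_FILES: false
```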

CDF

Include the cdf section to configure which CDF project the extractor will load data into and how to connect to the project. This section is mandatory and should always contain the project and authentication configuration.

| Parameter | Description |
|---|---|
| COGNITE_PROJECT | Insert the CDF project name you want to ingest data into. |
| TENANT_ID | Enter the Azure tenant ID. |
| TOKEN_CLIENT_ID | Enter the CDF client ID. This is mandatory if you're using OIDC authentication. |
| TOKEN_CLIENT_SECRET | Enter the CDF client secret. This is mandatory if you're using OIDC authentication. |
| CDF_CLUSTER | Enter the name of the CDF cluster. |

ETP

Include the etp section when you want to set up the ingestion of live data from WITSML ETP objects into CDF.

| Parameter | Description |
|---|---|
| keep_etp_msg | Set to true to keep ETP messages stored in the queue after processing. The default value is false. |
| refresh_after | Enter the interval, in seconds, at which the ETP receiver worker looks for new active wells. The default value is 300. |

ETP Gateway

Include the gateway subsection to configure how the extractor connects to the WITSML ETP provider.

| Parameter | Description |
|---|---|
| host | Insert the base URL of the WITSML server. This is a required field. |
| user | Enter the username for authenticating to the WITSML server. This is a required field. |
| password | Enter the password for authenticating to the WITSML server. This is a required field. |
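A sketch of an etp section, assuming etp is a top-level section like extractor and cdf, and that the gateway subsection is nested inside it. The connection values are placeholders, and the other two values are the documented defaults.

```yaml
etp:
  keep_etp_msg: false    # Don't keep ETP messages in the queue after processing (default)
  refresh_after: 300     # Look for new active wells every 300 seconds (default)
  gateway:
    host: <Base URL of the WITSML ETP server>
    user: <WITSML user name>
    password: <WITSML password>
```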

Extract

This section contains the parameters needed to connect to your WITSML servers and the related extraction rules. You can configure several WITSML servers; each server needs its own server reference (such as your-witsml-server-reference in the minimal configuration) with gateway and rules subsections. The extractor stores the server reference on all main object rows in CDF RAW to identify the source of each object.
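For example, an extract section for two WITSML servers might look like the following sketch. The server reference names server-a and server-b and the chosen rules are hypothetical; each reference follows the same layout as your-witsml-server-reference in the minimal configuration.

```yaml
extract:
  'server-a':                  # Hypothetical server reference, stored on the RAW rows
    gateway:
      host: <Host of the first WITSML server>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'
        object_type: WELL
  'server-b':                  # Hypothetical second server with its own rules
    gateway:
      host: <Host of the second WITSML server>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      wellbore:
        schedule: '*/10 * * * *'
        object_type: WELLBORE
```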

Gateway

Include the gateway subsection to configure how the extractor connects to the WITSML server.

| Parameter | Description |
|---|---|
| host | Insert the base URL of the WITSML server. This is a required field. |
| user | Enter the username for authenticating to the WITSML server. This is a required field. |
| password | Enter the password for authenticating to the WITSML server. This is a required field. |

Rules

Include the rules subsection to define what, how, and when to ingest WITSML data into CDF. See the extraction rules section for more details.

| Parameter | Description |
|---|---|
| schedule | Set the schedule for the rule. Use a cron expression enclosed in quotes ("") or s:10. This is a required field. |
| object_type | Enter the WITSML object type in uppercase, as defined in the WITSML standard. Valid options are WELL, WELLBORE, TUBULAR, TRAJECTORY, and LOG. For logs, you can use TIMELOG, DEPTHLOG, or LOG. This is a required field. |
| rule_type | Select a rule type. Valid options are CHANGEDOBJECT and UPDATESTATUS. |
| base_query | Optionally, enter a base query. If you don't enter a base_query, the extractor uses the standard query for the given object type, which looks for all objects of that type. |
| config | Enter the rule configuration. The valid parameters depend on rule_type; see the sections below. This is a required field. |
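The following sketch shows a single rule using these fields, placed under the rules subsection of a server reference. The rule name logs is hypothetical. The cron expression is quoted because in YAML an unquoted value starting with * is parsed as an alias, which makes the file invalid; '*/15 * * * *' runs the rule every 15 minutes.

```yaml
rules:
  logs:                          # Hypothetical rule name
    schedule: '*/15 * * * *'     # Quoted cron expression: every 15 minutes
    object_type: LOG
    rule_type: CHANGEDOBJECT
    # base_query is omitted, so the standard query for LOG objects is used
    config:
      load_deltas: true
```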

For CHANGEDOBJECT rules:

| Parameter | Description |
|---|---|
| load_deltas | Set to true to only process new items for growing objects. Set to false to process all items every time. This is a required field. The default value is true. |
| ingest_to_clean | Set to true to create objects directly in the CDF resource type. This parameter only applies to Attachment and Log objects. The default value is false. |
| only_for_active_wellbores | Set to true to add active wellbores to the query before the query is executed. This significantly improves performance when the extractor looks for changes in growing objects. The default value is false. |
| filter_on_last_modified | Set to true to look for changes since the last received modification. If you set this to false, the extractor retrieves all rows every time the query is executed. Set it to false when objects change on the WITSML server without their lastModified timestamp being updated; this affects performance and increases the load on the WITSML server. The default value is true. |
| log_data_to_raw_variant | Select how to store the log data records in CDF RAW. Valid options are NONE, ALL, ONLY_TIME, and ONLY_DEPTH. |
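A sketch of a CHANGEDOBJECT rule for time-indexed logs that sets each config parameter from the table above. The rule name time-logs and the chosen values are illustrative assumptions, not recommended settings.

```yaml
rules:
  time-logs:                          # Hypothetical rule name
    schedule: '*/5 * * * *'           # Every 5 minutes
    object_type: TIMELOG
    rule_type: CHANGEDOBJECT
    config:
      load_deltas: true               # Only process new items in growing logs
      ingest_to_clean: true           # Create objects directly in the CDF resource type
      only_for_active_wellbores: true # Limit the query to active wellbores
      filter_on_last_modified: true   # Only fetch changes since the last modification
      log_data_to_raw_variant: ONLY_TIME
```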

For UPDATESTATUS rules:

| Parameter | Description |
|---|---|
| attribute | Enter the name of the attribute in the WITSML object to check. Typically, use isActive for wells and objectGrowing for logs. This is a required field. |
| look_for_value | Enter the attribute value to compare with CDF RAW to find objects that are out of sync. Typically, use true when comparing the isActive attribute. This is a required field. |
| ingest_rule_ref | Enter the name of the rule to use if the extractor finds out-of-sync objects. This is a required field. |
| only_for_active_wellbores | Set to true to add active wellbores to the query before the query is executed. This significantly improves performance when looking for changes in growing objects. The default value is false. |
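The sketch below pairs the two rule types through ingest_rule_ref: the UPDATESTATUS rule checks the isActive attribute and, for out-of-sync wellbores, names the CHANGEDOBJECT rule to use for them. The rule names, schedules, and object types are hypothetical choices.

```yaml
rules:
  active-wellbores:                       # Hypothetical CHANGEDOBJECT rule
    schedule: '*/10 * * * *'
    object_type: WELLBORE
    rule_type: CHANGEDOBJECT
    config:
      load_deltas: true
  wellbore-status:                        # Hypothetical UPDATESTATUS rule
    schedule: '0 * * * *'                 # Every hour, on the hour
    object_type: WELLBORE
    rule_type: UPDATESTATUS
    config:
      attribute: isActive                 # Attribute to compare with CDF RAW
      look_for_value: true
      ingest_rule_ref: active-wellbores   # Rule used for out-of-sync objects
```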