Configure the WITSML extractor
To configure the WITSML extractor, you must create a configuration file in YAML format. The following is the minimal configuration file needed to run the WITSML extractor in simple mode.
```yaml
extractor:
  CONFIG_DATASET_ID: <The data set ID for temporary files created by the extractor>
  DATA_DATASET_ID: <The data set ID for the CDF resources created by the extractor>
cdf:
  COGNITE_PROJECT: <The CDF project name>
  TENANT_ID: <The ID of your tenant>
  TOKEN_CLIENT_ID: <Client ID for the WITSML extractor>
  TOKEN_CLIENT_SECRET: <Client secret for the WITSML extractor>
  CDF_CLUSTER: <The CDF cluster>
extract:
  'your-witsml-server-reference':
    gateway:
      host: <WITSML server host>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'
        object_type: WELL
```
Other configuration examples are available here.
Extraction rules
You can configure the extractor with two types of rules that define what, how, and when to ingest WITSML data into Cognite Data Fusion (CDF). Use the rule_type configuration parameter to select the rule type:

- ChangedObjects (CHANGEDOBJECT) - Finds objects that have changed since the last time the request was sent to the WITSML server. This requires that the WITSML server sets the dTimeLastUpdated flag correctly. This rule type has been tested and verified for Petrolink PetroVault and Kongsberg SiteCom. This is the default setting.
- UpdateStatus (UPDATESTATUS) - Finds objects already ingested into CDF whose status matches the value given in the rule. The data in CDF can get out of sync when the status changes and no rule captures the changed attribute. The UpdateStatus rule runs its query and compares the result with the data in CDF RAW. If there's a mismatch, the extractor updates CDF RAW by creating a ScheduledObjectQuery for the object. For example, this rule can handle the wellbore isActive status or the log objectGrowing flag.
- If you add new rules to the configuration file at runtime, the extractor sets all existing rules to inactive before it ingests the new rules into the table based on the rule definitions.
- If the extractor section isn't added to the configuration file at runtime, the extractor uses the rules stored in the extractionrules table in the witsml_config database.
Extractor
Include the extractor section to configure the extractor setup.
Parameter | Description |
---|---|
APP_NAME | Enter the name of the extractor deployment. This name is used in the extraction pipeline runs in CDF. The default value is witsml-extractor. |
MODE | Define the execution mode. The default value is SIMPLE. |
JSON_LOGGING | Set to true to enable debug logging in JSON format. This is useful for troubleshooting. The default value is false. |
LOG_LEVEL | Select the verbosity level for logging. Valid options, in decreasing order of verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. The default value is INFO. |
CONFIG_DATASET_ID | Insert the data set ID for the WITSML configuration in CDF RAW. This is a required field. |
DATA_DATASET_ID | Insert the data set ID for the WITSML data in CDF RAW. This is a required field. |
RAW_DB_FOR_CONFIG | Enter the name of the CDF RAW database for the WITSML configuration. The default value is witsml-config. |
RAW_DB_FOR_DATA | Enter the name of the CDF RAW database for the WITSML data. The default value is witsml-data. |
DEPTH_TO_ROWNUMBER_SCALE_FACTOR | Enter the factor used to multiply depth index values when creating row keys in sequences. The default value is 1000. |
EXTPIPELINE_EXT_ID_PREFIX | Enter a prefix to add to the external ID of the extraction pipeline in CDF. |
EXT_ID_PREFIX | Enter an external ID prefix to identify the objects that the extractor creates directly in CDF resource types. |
EXT_ID_SUFFIX | Enter an external ID suffix to identify the objects that the extractor creates directly in CDF resource types. |
FIND_UNAVAILABLE_IN_SOURCE | Set to true to run maintenance jobs that look for objects in CDF that are no longer available in the source system. The default value is true. |
ADD_LOG_INFO_TO_TIMESERIES | Set to true to add log header information to the time series created by the extractor. The default value is true. |
ARCHIVE_DOWNLOADED_FILES | Set to true to archive and compress the complete XML files downloaded from the WITSML server. Use this when you reprocess the same file in different environments or test different configurations. The default value is false. |
ARCHIVE_RESPONSE_FILES | Set to false to delete the response XML files downloaded from the WITSML server after the extractor has processed them. The default value is true. |
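For illustration, an extractor section that overrides some of the defaults might look like the following sketch. Only the parameter names come from the table above; the deployment name is hypothetical, the data set IDs are placeholders, and the values are examples rather than recommendations.

```yaml
extractor:
  APP_NAME: witsml-extractor-test          # hypothetical deployment name
  MODE: SIMPLE
  LOG_LEVEL: DEBUG                         # most verbose level, for troubleshooting
  JSON_LOGGING: true
  CONFIG_DATASET_ID: <Data set ID for the WITSML configuration>
  DATA_DATASET_ID: <Data set ID for the WITSML data>
  RAW_DB_FOR_CONFIG: witsml-config         # same as the default value
  RAW_DB_FOR_DATA: witsml-data             # same as the default value
  DEPTH_TO_ROWNUMBER_SCALE_FACTOR: 1000
  ARCHIVE_DOWNLOADED_FILES: true           # keep compressed copies for reprocessing
  ARCHIVE_RESPONSE_FILES: false            # delete response files after processing
```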
CDF
Include the cdf section to configure which CDF project the extractor loads data into and how to connect to the project. This section is mandatory and must always contain the project and authentication configuration.
Parameter | Description |
---|---|
COGNITE_PROJECT | Insert the CDF project name you want to ingest data into. |
TENANT_ID | Enter the Azure tenant ID. |
TOKEN_CLIENT_ID | Enter the CDF client ID. This is mandatory if you're using OIDC authentication. |
TOKEN_CLIENT_SECRET | Enter the CDF client secret. This is mandatory if you're using OIDC authentication. |
CDF_CLUSTER | Enter the name of the CDF cluster. |
ETP
Include the etp section to set up the ingestion of live data from WITSML ETP objects into CDF.
Parameter | Description |
---|---|
keep_etp_msg | Set to true to keep the ETP message stored in the queue after processing. The default value is false. |
refresh_after | Enter how often, in seconds, the ETP receiver worker looks for new active wells. The default value is 300. |
ETP Gateway
Include the gateway subsection to configure how the extractor connects to the WITSML ETP provider.
Parameter | Description |
---|---|
host | Insert the base URL of the WITSML server. This is a required field. |
user | Enter the username for authenticating to the WITSML server. This is a required field. |
password | Enter the password for authenticating to the WITSML server. This is a required field. |
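A minimal sketch of the etp section with its gateway subsection follows. It assumes that etp is a top-level section next to cdf and extract, and that gateway is nested inside etp; verify both assumptions against your extractor version. The host and credentials are placeholders.

```yaml
etp:
  keep_etp_msg: false          # discard ETP messages after processing (default)
  refresh_after: 600           # look for new active wells every 10 minutes
  gateway:
    host: <WITSML ETP server base URL>
    user: <WITSML user name>
    password: <WITSML password>
```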
Extract
This section contains the parameters needed to connect to your WITSML server and the related extraction rules. You can configure several WITSML servers. Each server needs its own witsml-server-reference with gateway and rules sections. The server reference is stored on all main object rows in CDF RAW to reference the object source.
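As a sketch, a configuration with two WITSML servers could look like the following. The server references server-a and server-b, the rule names, and the schedules are hypothetical; the point is that each reference carries its own gateway and rules sections.

```yaml
extract:
  'server-a':                          # hypothetical server reference
    gateway:
      host: <Base URL of the first WITSML server>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '*/10 * * * *'       # every 10 minutes
        object_type: WELL
  'server-b':                          # hypothetical server reference
    gateway:
      host: <Base URL of the second WITSML server>
      user: <WITSML user name>
      password: <WITSML password>
    rules:
      well:
        schedule: '0 * * * *'          # at the start of every hour
        object_type: WELL
```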
Gateway
Include the gateway subsection to configure how the extractor connects to the WITSML server.
Parameter | Description |
---|---|
host | Insert the base URL of the WITSML server. This is a required field. |
user | Enter the username for authenticating to the WITSML server. This is a required field. |
password | Enter the password for authenticating to the WITSML server. This is a required field. |
Rules
Include the rules subsection to define what, how, and when to ingest WITSML data into CDF. See the extraction rules section for more details.
Parameter | Description |
---|---|
schedule | Set the schedule for the rule. Use a cron expression enclosed in quotes, such as '*/10 * * * *', or the s:10 format. This is a required field. |
object_type | Enter the WITSML object type in uppercase. Valid options are WELL, WELLBORE, TUBULAR, TRAJECTORY, and LOG. You can define logs as TIMELOG, DEPTHLOG, or LOG. The values correspond to the WITSML object types as defined in the standard. This is a required field. |
rule_type | Select a rule type. Valid options are CHANGEDOBJECT and UPDATESTATUS. The default value is CHANGEDOBJECT. |
base_query | Enter a custom query. If you don't enter a base_query, the extractor uses the standard query for the given object type, which looks for all occurrences of that type. |
config | Enter the rule-specific settings. The valid parameters depend on rule_type; see the following sections. This is a required field. |
For CHANGEDOBJECT rules:
Parameter | Description |
---|---|
load_deltas | Set to true to process only new items for growing objects. Set to false to process all items every time. This is a required field. The default value is true. |
ingest_to_clean | Set to true to create objects directly in the CDF resource types. This parameter only applies to Attachment and Log objects. The default value is false. |
only_for_active_wellbores | Set to true to add active wellbores to the query before it runs. This significantly improves performance when the extractor looks for changes in growing objects. The default value is false. |
filter_on_last_modified | Set to true to look for changes since the last received modification. If you set this to false, the extractor retrieves all rows every time the query runs. Set this to false when objects change on the WITSML server without the lastModified timestamp being updated. Retrieving all rows affects performance and increases the load on the WITSML server. The default value is true. |
log_data_to_raw_variant | Select how to store the log data records in CDF RAW. Valid options are NONE, ALL, ONLY_TIME, and ONLY_DEPTH. |
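The following sketch shows a CHANGEDOBJECT rule for time-indexed logs with every config parameter set. It assumes that config is a mapping nested under the rule and that the rule name is the key under rules, as with well in the minimal configuration. The rule name time-logs and the chosen values are hypothetical.

```yaml
extract:
  'your-witsml-server-reference':
    # gateway subsection omitted for brevity
    rules:
      time-logs:                          # hypothetical rule name
        schedule: '*/5 * * * *'           # every 5 minutes
        object_type: TIMELOG
        rule_type: CHANGEDOBJECT
        config:
          load_deltas: true               # only process new items in growing logs
          ingest_to_clean: true           # create CDF resources directly (Log objects)
          only_for_active_wellbores: true # narrow the query to active wellbores
          filter_on_last_modified: true   # only look at recently modified objects
          log_data_to_raw_variant: NONE   # don't also store log data records in CDF RAW
```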
For UPDATESTATUS rules:
Parameter | Description |
---|---|
attribute | Enter the name of the attribute in the WITSML object to check. Typically, use isActive for wells and objectGrowing for logs. This is a required field. |
look_for_value | Enter the attribute value to compare with CDF RAW to find objects that are out of sync. Typically, use true when comparing the isActive attribute. This is a required field. |
ingest_rule_ref | Enter the name of the rule to use if the extractor finds out-of-sync objects. This is a required field. |
only_for_active_wellbores | Set to true to add active wellbores to the query before it runs. This significantly improves performance when the extractor looks for changes in growing objects. The default value is false. |
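As a sketch, the following pairs an UPDATESTATUS rule with the CHANGEDOBJECT rule that it refers to through ingest_rule_ref. It assumes that ingest_rule_ref takes the key of another rule under rules. The rule names and schedules are hypothetical, and whether look_for_value expects a YAML boolean or a string is also an assumption.

```yaml
extract:
  'your-witsml-server-reference':
    # gateway subsection omitted for brevity
    rules:
      growing-logs:                       # hypothetical rule used for re-ingestion
        schedule: '*/5 * * * *'
        object_type: LOG
        rule_type: CHANGEDOBJECT
        config:
          load_deltas: true
      log-status:                         # hypothetical status-check rule
        schedule: '0 * * * *'             # check once an hour
        object_type: LOG
        rule_type: UPDATESTATUS
        config:
          attribute: objectGrowing        # attribute to compare with CDF RAW
          look_for_value: true            # value type is an assumption
          ingest_rule_ref: growing-logs   # rule to run for out-of-sync logs
          only_for_active_wellbores: true
```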