Set up the WITSML extractor

The Cognite WITSML extractor is distributed as a Docker container.

Before you start

Assign access capabilities for the extractor to write data to the respective CDF destination resources.
Check the server requirements for the extractor.

Tip
You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely. Read more.
Set up data sets to store the WITSML configuration data (RAW_DB_FOR_CONFIG) and the WITSML extracted data (RAW_DB_FOR_DATA).
Create a configuration file according to the configuration settings. The file must be in YAML format.

Run as a Docker container

Make sure you're signed in to the Cognite Docker Registry.

Enter this docker run statement:

docker run -v $(pwd)/config.yaml:/config.yaml \
-it eu.gcr.io/cognite-registry/witsml-connector:<version>

Verify that the extractor is running by checking that the log displays "Next wakeup is due":

---
2022-10-14 10:51:48,247 - INFO - root - Starting
2022-10-14 10:51:48,248 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2022-10-14 10:51:48,248 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2022-10-14 10:51:48,249 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "well.all" to job store "default"
2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "wellbore.all" to job store "default"
2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "find_and_post_scheduled_queries" to job store "default"
2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Scheduler started
2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Looking for jobs to run
2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Next wakeup is due at 2022-10-14 10:52:00+00:00 (in 11.748631 seconds)
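
The check above can also be scripted. The following is a minimal sketch, assuming the log lines keep the `timestamp - LEVEL - logger - message` layout shown in the sample output. The function name `scheduler_is_ready` and the regular expression are illustrative and not part of the extractor.

```python
import re

# Marker text taken from the sample log output above.
READY_MARKER = "Next wakeup is due"

# Assumed line layout: "<timestamp> - <LEVEL> - <logger> - <message>".
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - "
    r"(?P<level>[A-Z]+) - (?P<logger>\S+) - (?P<message>.*)$"
)


def scheduler_is_ready(log_text: str) -> bool:
    """Return True if any parsed log line reports the next scheduler wakeup."""
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and READY_MARKER in match.group("message"):
            return True
    return False


sample = """\
2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Scheduler started
2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Looking for jobs to run
2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Next wakeup is due at 2022-10-14 10:52:00+00:00 (in 11.748631 seconds)
"""

print(scheduler_is_ready(sample))
```

The input text could come from the container output, for example saved from `docker logs` for the running container.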

Run in Docker Compose

Make sure you're signed in to the Cognite Docker Registry.
Create your docker-compose file following the docker-compose examples available here.
Enter this docker compose statement in the folder that contains the docker-compose file:
```
   docker compose up -d
```

Deploy the extractor

The Cognite WITSML extractor supports different scenarios depending on the data load and type of WITSML objects to be extracted.

There are two deployment methods available for the WITSML extractor: Simple and Advanced.

Simple deployment

Set mode to SIMPLE for low/medium data volume integrations between the WITSML server and CDF. The SIMPLE deployment is recommended for the following scenarios:

1–5 active wellbores.
1–5 object types to ingest.
1–5 growing objects.

The SIMPLE deployment can run with a built-in queue engine or with an external queue, which improves the performance and stability of the extraction.

SIMPLE deployment with the built-in queue engine is the default method when running the extractor as a Docker container.
An example of the SIMPLE deployment with an external queue can be found here.

Tip

To ingest big historical data loads combined with growing objects such as WITSML log, use the advanced deployment methods for scalable integrations.

Advanced deployment

Use this deployment for high data volumes and change rates, typically complex extractions that need further scalability.

Scalable extractions using only SOAP

In this deployment, the extractor is divided into separate modules:

Scheduler: Manages the query schedule towards the WITSML server and the internal extraction queue.
Worker: Processes the data extraction from the WITSML server to CDF.

A Docker Compose file example is available here.

Use this mode for projects with:

More than 10 active wellbores.
More than 15 object types to ingest.
More than 30 growing objects.
Extraction workers can be scaled.
Extraction of historical records.

Scalable extractions using SOAP and ETP

This scenario supports both historical data extraction via SOAP and live data (WITSML log records) extraction using ETP.

In this deployment, the extractor is divided into separate modules:

Scheduler: Manages the query schedule towards the WITSML server and the internal queue of the extractor.
Worker: Processes the data extraction from the WITSML server to CDF.
ETP Receiver: Connects to the WITSML ETP subscriptions for live wells and listens for new WITSML log data points.
ETP Ingestor: Ingests the WITSML log data points into CDF time series (WITSML time log) or sequences (WITSML depth log).

Docker Compose file example is available here.

Use this mode for projects with:

More than 10 active wellbores.
More than 15 object types to ingest.
More than 30 growing objects.
Extraction workers can be scaled.
Extraction of historical records.
Extraction of live data using ETP protocol.

To extract data using ETP, you must connect two extra modules to the WITSML server:

ETP receiver creates the WebSocket session and listens for changes. It is a single worker that handles all ETP subscriptions.
ETP ingestor transforms WITSML data extracted from ETP to the respective CDF resources (sequences or time series). It is a scalable worker that handles the ingestion of multiple live data streams.

These modules require a special configuration. If you have a use case with ETP real-time logging, contact your Cognite representative.
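
The sizing guidance in this section can be summarized as a small decision helper. This is only a sketch under stated assumptions: the `Workload` class and `recommend_deployment` function are hypothetical names, the thresholds are copied from the lists above, and the documentation does not say what to choose for workloads that fall between the SIMPLE and advanced ranges, so the sketch reports those separately.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical description of a planned extraction."""
    active_wellbores: int
    object_types: int
    growing_objects: int
    large_historical_load: bool = False
    needs_live_etp: bool = False


def recommend_deployment(w: Workload) -> str:
    """Map a workload to one of the deployment scenarios described above."""
    # Live data via ETP is only covered by the SOAP and ETP scenario.
    if w.needs_live_etp:
        return "advanced (SOAP and ETP)"
    # Any "more than" threshold, or a big historical load, points to advanced.
    if (
        w.active_wellbores > 10
        or w.object_types > 15
        or w.growing_objects > 30
        or w.large_historical_load
    ):
        return "advanced (SOAP only)"
    # SIMPLE is recommended when every count is within the 1-5 range.
    if max(w.active_wellbores, w.object_types, w.growing_objects) <= 5:
        return "simple"
    # Assumption: the gap between the documented ranges needs a manual decision.
    return "between documented ranges: evaluate case by case"


print(recommend_deployment(Workload(3, 4, 2)))
print(recommend_deployment(Workload(12, 4, 2)))
print(recommend_deployment(Workload(3, 4, 2, needs_live_etp=True)))
```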

Monitor extractions

Set up Extraction pipelines to monitor the data extractions. The first time you run the extractor, it creates the extraction pipeline entries and then logs succeeding runs with a success or failed status.

You'll see the extraction runs as two different sections on the Run history tab on the Extraction pipeline page:

The extractor stores runs related to the query scheduler:

CDF Extraction Pipelines - Scheduler runs

The extractor stores runs related to the data ingestion to CDF:

CDF Extraction Pipelines - Ingestion runs

All issues related to the extractor execution are logged in the extraction pipelines, including query issues in the WITSML server. All SOAP queries and the corresponding responses will be available in the extraction pipeline related to your WITSML extractor deployment.

Scheduled SOAP queries

Scheduled queries come in two variants:

Scheduled list query: Looks for changes.
Scheduled object query: Downloads changes.

Tip

The Cognite WITSML extractor validates whether the source object is a valid WITSML 1.4.1.1 object. If your server doesn't comply with the standard schema, extracting the non-compliant objects will fail.

Capturing the changes is separated from the actual data download to support scaling.
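
To illustrate why this separation helps with scaling, here is a minimal sketch of the pattern: a cheap list step detects changed objects and puts them on a queue, and any number of workers download them. It is not the extractor's implementation. The in-memory `server` dictionary, the `changed` marker, and the functions `list_query`, `object_query`, and `run_once` are all hypothetical stand-ins.

```python
import queue
import threading


def list_query(server: dict, last_seen: dict) -> list:
    """Hypothetical list query: return uids whose change marker has moved."""
    return [uid for uid, obj in server.items() if last_seen.get(uid) != obj["changed"]]


def object_query(server: dict, uid: str) -> dict:
    """Hypothetical object query: download the full object for one uid."""
    return server[uid]


def run_once(server: dict, last_seen: dict, worker_count: int = 2) -> dict:
    """Detect changes once, then let several workers download them."""
    work = queue.Queue()
    downloaded = {}
    lock = threading.Lock()

    # Step 1: the list query only looks for changes and fills the queue.
    for uid in list_query(server, last_seen):
        work.put(uid)

    # Step 2: workers drain the queue and download the changed objects.
    def worker() -> None:
        while True:
            try:
                uid = work.get_nowait()
            except queue.Empty:
                return
            obj = object_query(server, uid)
            with lock:
                downloaded[uid] = obj
                last_seen[uid] = obj["changed"]

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return downloaded


server = {
    "w1": {"changed": "2022-10-14T10:00:00Z", "name": "Well 1"},
    "w2": {"changed": "2022-10-14T10:30:00Z", "name": "Well 2"},
}
last_seen = {"w1": "2022-10-14T10:00:00Z"}  # w1 is already up to date
print(sorted(run_once(server, last_seen)))
```

Because the download step only depends on the queue, the number of workers can change without touching the change detection.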

For SOAP queries, note these observations:

The WITSML server can have a maximum number of requests per second, depending on your WITSML server setup. You must evaluate this with your WITSML vendor to avoid overloading the server.
The XML returned from the server doesn't always comply with the WITSML standard. When this happens, the extraction fails since the data mapping performed by the extractor is based on the WITSML standard schema.
A request can fail if there's bad data on some returned data objects.
The amount of data for some objects can exceed 100 MB.
Not all data is returned if the size of the data object is above a threshold. New requests must be sent to the server to retrieve missing data.
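
Two of these observations, the request rate limit and the need for follow-up requests, can be sketched together as a client-side loop. This is an illustration under assumptions, not the extractor's logic: `fetch_page` is a hypothetical callable standing in for one SOAP request, and it is assumed to return the rows it received plus a flag saying whether the object is complete.

```python
import time
from typing import Callable, List, Tuple


def fetch_all_rows(
    fetch_page: Callable[[int], Tuple[List[int], bool]],
    min_interval_s: float = 0.0,
    max_requests: int = 1000,
) -> List[int]:
    """Request follow-up pages until the object is complete.

    min_interval_s spaces requests out so the server's request limit
    (agreed with the WITSML vendor) is not exceeded.
    """
    rows: List[int] = []
    start = 0
    last_request = None
    for _ in range(max_requests):
        # Wait if the previous request was sent too recently.
        if last_request is not None:
            wait = min_interval_s - (time.monotonic() - last_request)
            if wait > 0:
                time.sleep(wait)
        last_request = time.monotonic()

        page, complete = fetch_page(start)
        rows.extend(page)
        start += len(page)
        if complete:
            return rows
        if not page:
            raise RuntimeError("server returned no data before the object was complete")
    raise RuntimeError("gave up before the object was complete")


# Stand-in server: 10 rows, at most 4 returned per request.
DATA = list(range(10))


def fake_fetch_page(start: int) -> Tuple[List[int], bool]:
    page = DATA[start:start + 4]
    return page, start + 4 >= len(DATA)


print(fetch_all_rows(fake_fetch_page))
```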

Explore the extracted data

You can explore the extracted data by navigating in CDF RAW or the respective CDF resource types (for WITSML growing objects).

Data stored in CDF resource types

WITSML logs are stored in CDF as time series (for time-based logs) and sequences (depth-based logs). You can navigate and check the extracted data in CDF.
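
As a rough illustration of this mapping, the sketch below routes a log to a CDF resource type based on its index type. The function name `cdf_resource_for_log` is hypothetical, and the index type strings are assumed to follow the WITSML 1.4.1.1 log index type enumeration; verify them against your server's data. How the extractor treats other index types is not stated here, so the sketch rejects them.

```python
# Assumed WITSML 1.4.1.1 index type values; check against your own data.
TIME_INDEX_TYPES = {"date time"}
DEPTH_INDEX_TYPES = {"measured depth", "vertical depth"}


def cdf_resource_for_log(index_type: str) -> str:
    """Return the CDF resource type used for a log with this index type."""
    normalized = index_type.strip().lower()
    if normalized in TIME_INDEX_TYPES:
        return "time series"
    if normalized in DEPTH_INDEX_TYPES:
        return "sequence"
    # Not covered by the description above, so do not guess.
    raise ValueError(f"unsupported index type in this sketch: {index_type!r}")


print(cdf_resource_for_log("date time"))
print(cdf_resource_for_log("measured depth"))
```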

Data stored in CDF RAW

WITSML is defined by a set of XSD files with a deeply nested structure. The extractor parses the XML structure and stores it in CDF RAW, enabling flexible data navigation and consumption. If linked types have an unbounded relation, the linked type gets its own table in CDF RAW, and the table name has the new element appended to the name. The parent key represents the link to the table above in the XSD hierarchy.
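
The flattening described above can be sketched as follows. This is a simplified illustration, not the extractor's algorithm. The XML fragment is a reduced, namespace-free MudLog-like example, the set of repeating (unbounded) elements is passed in by hand instead of being read from the XSD, and the `key`/`parent_key` column names and the `_` separator in table names are assumptions.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def flatten(element, table, parent_key, repeating, tables):
    """Store one element as a row; repeating children go to child tables."""
    row = {"key": element.get("uid"), "parent_key": parent_key}
    _collect(element, table, row, repeating, tables, prefix="")
    tables[table].append(row)


def _collect(element, table, row, repeating, tables, prefix):
    for child in element:
        if child.tag in repeating:
            # Unbounded relation: new table named after the parent table
            # plus the element, linked back through the parent key.
            flatten(child, f"{table}_{child.tag}", row["key"], repeating, tables)
        elif len(child):
            # Nested single element: merge its fields into the same row.
            _collect(child, table, row, repeating, tables, prefix + child.tag + "_")
        else:
            row[prefix + child.tag] = (child.text or "").strip()


XML = """
<mudLog uid="ml1">
  <name>Mud log 1</name>
  <commonData><comments>demo</comments></commonData>
  <geologyInterval uid="gi1">
    <typeLithology>interpreted</typeLithology>
    <lithology uid="l1"><type>sandstone</type></lithology>
    <lithology uid="l2"><type>shale</type></lithology>
  </geologyInterval>
</mudLog>
"""

tables = defaultdict(list)
flatten(ET.fromstring(XML), "mudLog", None, {"geologyInterval", "lithology"}, tables)
for name in sorted(tables):
    print(name, tables[name])
```

Each lithology row carries the key of its geology interval as its parent key, so the hierarchy can be rebuilt by joining the tables.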

Example: WITSML MudLog structure

XSD structure:
All top-level types merged into one structure:

All levels merged into distinct tables:
