Set up the WITSML extractor
The Cognite WITSML extractor is distributed as a Docker container.
Before you start
- Assign access capabilities for the extractor to write data to the respective CDF destination resources.
- Check the server requirements for the extractor.
  Tip: You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.
- Set up data sets to store the WITSML configuration data (RAW_DB_FOR_CONFIG) and the WITSML extracted data (RAW_DB_FOR_DATA).
- Create a configuration file according to the configuration settings. The file must be in YAML format.
Run as a Docker container
- Make sure you're signed in to the Cognite Docker Registry.
- Enter this docker run statement:
  docker run -v $(pwd)/config.yaml:/config.yaml \
    -it eu.gcr.io/cognite-registry/witsml-connector:<version>
- Make sure the extractor runs by verifying that "Next wakeup is due" is displayed:
 ---
 2022-10-14 10:51:48,247 - INFO - root - Starting
 2022-10-14 10:51:48,248 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
 2022-10-14 10:51:48,248 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
 2022-10-14 10:51:48,249 - INFO - apscheduler.scheduler - Adding job tentatively -- it will be properly scheduled when the scheduler starts
 2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "well.all" to job store "default"
 2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "wellbore.all" to job store "default"
 2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Added job "find_and_post_scheduled_queries" to job store "default"
 2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Scheduler started
 2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Looking for jobs to run
 2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - Next wakeup is due at 2022-10-14 10:52:00+00:00 (in 11.748631 seconds)
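This check can also be scripted against captured container output (for example, text saved from docker logs). The following is a minimal sketch, not part of the extractor: the function name next_wakeup is hypothetical, and the pattern is derived only from the sample output above, so other extractor versions may format the message differently.

```python
import re
from datetime import datetime
from typing import Optional

# Pattern based on the sample log line shown above (an assumption about the format).
WAKEUP_RE = re.compile(r"Next wakeup is due at (\S+ \S+) \(in ([\d.]+) seconds\)")


def next_wakeup(log_text: str) -> Optional[datetime]:
    """Return the most recently reported scheduler wake-up time, or None."""
    matches = WAKEUP_RE.findall(log_text)
    if not matches:
        return None
    timestamp, _seconds = matches[-1]
    return datetime.fromisoformat(timestamp)


sample = (
    "2022-10-14 10:51:48,250 - INFO - apscheduler.scheduler - Scheduler started\n"
    "2022-10-14 10:51:48,251 - DEBUG - apscheduler.scheduler - "
    "Next wakeup is due at 2022-10-14 10:52:00+00:00 (in 11.748631 seconds)\n"
)

wakeup = next_wakeup(sample)
print("Scheduler is running" if wakeup else "No wake-up message found", wakeup)
```

A return value of None means the message has not appeared yet, which may simply indicate that the scheduler is still starting.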
Run in Docker Compose
- Make sure you're signed in to the Cognite Docker Registry.
- Create your docker-compose file following the docker-compose examples available here.
- Enter this docker compose statement in the same folder as the docker-compose file:
  docker compose up -d
Deploy the extractor
The Cognite WITSML extractor supports different scenarios depending on the data load and type of WITSML objects to be extracted.
There are two deployment methods available for the WITSML extractor: Simple and Advanced.
Simple deployment
Set mode to SIMPLE for low to medium data volume integrations between the WITSML server and CDF. The SIMPLE deployment is recommended for the following scenarios:
- 1–5 active wellbores.
- 1–5 object types to ingest.
- 1–5 growing objects.
The SIMPLE deployment can run with a built-in queue engine or with an external queue, which improves the performance and stability of the extraction.
- SIMPLE deployment with the built-in queue engine is the default method when running the extractor as a Docker container.
- A SIMPLE deployment example with an external queue can be found here.
To ingest big historical data loads combined with growing objects such as WITSML log, use the advanced deployment methods for scalable integrations.
Advanced deployment
Use this deployment for high data volumes and change rates, typically complex extractions that need further scalability.
Scalable extractions using only SOAP
In this deployment, the extractor is divided into separate modules:
- Scheduler: Manages the query schedule towards the WITSML server and the internal extraction queue.
- Worker: Processes the data extraction from the WITSML server to CDF.
A Docker Compose file example is available here.
Use this mode for projects with:
- More than 10 active wellbores.
- More than 15 object types to ingest.
- More than 30 growing objects.
- A need to scale extraction workers.
- Extraction of historical records.
Scalable extractions using SOAP and ETP
This scenario supports both historical data extraction via SOAP and live data (WITSML log records) extraction using ETP.
In this deployment, the extractor is divided into separate modules:
- Scheduler: Manages the query schedule towards the WITSML server and the internal queue of the extractor.
- Worker: Processes the data extraction from the WITSML server to CDF.
- ETP Receiver: Connects to the WITSML ETP subscriptions for live wells and listens for new WITSML log data points.
- ETP Ingestor: Ingests the WITSML log data points into CDF time series (WITSML time log) or sequences (WITSML depth log).
A Docker Compose file example is available here.
Use this mode for projects with:
- More than 10 active wellbores.
- More than 15 object types to ingest.
- More than 30 growing objects.
- A need to scale extraction workers.
- Extraction of historical records.
- Extraction of live data using ETP.
To extract data using ETP, you must connect two extra modules to the WITSML server:
- ETP receiver creates the web-socket session and listens for changes. This is a single worker that handles all ETP subscriptions.
- ETP ingestor transforms WITSML data extracted from ETP to the respective CDF resources (sequences or time series). This is a scalable worker that handles the ingestion of multiple live data streams.
These modules require a special configuration. If you have a use case with ETP real-time logging, contact your Cognite representative.
Monitor extractions
Set up Extraction pipelines to monitor the data extractions. The first time you run the extractor, it creates the extraction pipeline entries and then logs succeeding runs with a success or failed status.
You'll see the extraction runs as two different sections on the Run history tab on the Extraction pipeline page:
- Runs related to the query scheduler.
- Runs related to the data ingestion to CDF.
All issues related to the extractor execution are logged in the extraction pipelines, including query issues in the WITSML server. All SOAP queries and the corresponding responses will be available in the extraction pipeline related to your WITSML extractor deployment.
Scheduled SOAP queries
Scheduled queries come in two variants:
- Scheduled list query: Looks for changes.
- Scheduled object query: Downloads changes.
The Cognite WITSML extractor validates whether the source object is a valid WITSML 1.4.1.1 object. If your server doesn't comply with the standard schema, extracting the non-compliant objects will fail.
Capturing the changes is separated from the actual data download to support scaling.
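The split between detecting changes and downloading them can be illustrated with a small queue-based sketch. This is not the extractor's implementation: the function names, the (uid, last_changed) pairs returned by the list query, and the in-memory queue are all hypothetical stand-ins chosen for illustration.

```python
from queue import Queue
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-ins for the two query variants:
# a list query yields (uid, last_changed) pairs; an object query returns one full object.
ListQuery = Callable[[], Iterable[Tuple[str, str]]]
ObjectQuery = Callable[[str], dict]


def detect_changes(list_query: ListQuery, seen: Dict[str, str], queue: Queue) -> int:
    """Scheduler side: enqueue uids whose change marker differs from the last run."""
    enqueued = 0
    for uid, last_changed in list_query():
        if seen.get(uid) != last_changed:
            seen[uid] = last_changed
            queue.put(uid)
            enqueued += 1
    return enqueued


def download_changes(object_query: ObjectQuery, queue: Queue) -> List[dict]:
    """Worker side: drain the queue and download each changed object."""
    downloaded = []
    while not queue.empty():
        downloaded.append(object_query(queue.get()))
    return downloaded


def fake_object_query(uid: str) -> dict:
    # Stands in for a real object query against the WITSML server.
    return {"uid": uid, "payload": f"data for {uid}"}


seen: Dict[str, str] = {}
queue: Queue = Queue()
first = detect_changes(lambda: [("well-1", "t1"), ("well-2", "t1")], seen, queue)
objects = download_changes(fake_object_query, queue)
second = detect_changes(lambda: [("well-1", "t1"), ("well-2", "t2")], seen, queue)
print(first, len(objects), second)
```

Because the cheap list query only fills a queue, the more expensive downloads can be handled by any number of workers reading from that queue.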
For SOAP queries, note these observations:
- The WITSML server can have a maximum number of requests per second, depending on your WITSML server setup. You must evaluate this with your WITSML vendor to avoid overloading the server.
- The XML returned from the server doesn't always comply with the WITSML standard. When this happens, the extraction fails since the data mapping performed by the extractor is based on the WITSML standard schema.
- A request can fail if there's bad data on some returned data objects.
- The amount of data for some objects can exceed 100 MB.
- Not all data is returned if the size of the data object is above a threshold. New requests must be sent to the server to retrieve missing data.
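The last observation, combined with the request limit, suggests a throttled follow-up loop. The sketch below is illustrative only: the fetch signature, the `complete` flag, and the `min_interval` parameter are assumptions, since how a given WITSML server signals a truncated response is not described here.

```python
import time
from typing import Callable, List, Tuple

# Hypothetical signature: fetch(start_index) -> (rows, complete), where
# `complete` is False when the server truncated the response.
Fetch = Callable[[int], Tuple[List[str], bool]]


def fetch_all(fetch: Fetch, min_interval: float = 0.0, max_requests: int = 1000) -> List[str]:
    """Send follow-up requests until the server reports that all data was returned."""
    rows: List[str] = []
    for _ in range(max_requests):
        chunk, complete = fetch(len(rows))
        rows.extend(chunk)
        if complete:
            return rows
        if not chunk:
            raise RuntimeError("server returned no data but reported more to come")
        time.sleep(min_interval)  # pause between requests to respect the server's limit
    raise RuntimeError("gave up after max_requests requests")


DATA = [f"row-{i}" for i in range(7)]


def fake_fetch(start: int, limit: int = 3) -> Tuple[List[str], bool]:
    # Simulates a server that returns at most `limit` rows per request.
    chunk = DATA[start:start + limit]
    return chunk, start + len(chunk) >= len(DATA)


print(len(fetch_all(fake_fetch)))
```

The `max_requests` cap and the empty-chunk check keep the loop from running forever if the server keeps reporting incomplete data.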
Explore the extracted data
You can explore the extracted data by navigating in CDF RAW or the respective CDF resource types (for WITSML growing objects).
Data stored in CDF resource types
WITSML logs are stored in CDF as time series (for time-based logs) and sequences (for depth-based logs). You can navigate and check the extracted data in CDF.
Data stored in CDF RAW
WITSML is defined by a set of XSD files with a deeply nested structure. The extractor parses the XML structure and stores it in CDF RAW, enabling flexible data navigation and consumption. If a linked type has an unbounded relation, it gets its own table in CDF RAW, and the table name is the parent table name with the new element name added. The parent key in each row links it to the table above in the XSD hierarchy.
Example: WITSML MudLog structure
- XSD structure.
- All top-level types merged into one structure.
- All levels merged into distinct tables.
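The flattening described in this section can be sketched as follows. This is an illustration under stated assumptions, not the extractor's code: the underscore separator in table names, the key and parent_key column names, the dotted column prefix for merged types, and the simplified, namespace-free sample XML are all hypothetical. The set of unbounded element names is passed in by hand here, whereas the schema defines it in practice.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Optional, Set


def flatten(elem: ET.Element, table: str, unbounded: Set[str],
            tables: Dict[str, List[dict]], parent_key: Optional[str] = None) -> None:
    """Store `elem` as one row in `table`; unbounded children go to child tables."""
    key = elem.get("uid", f"{table}-{len(tables[table])}")
    row = {"key": key}
    if parent_key is not None:
        row["parent_key"] = parent_key  # link to the row one level up
    tables[table].append(row)
    _collect(elem, "", row, table, key, unbounded, tables)


def _collect(elem, prefix, row, table, key, unbounded, tables) -> None:
    for child in elem:
        if child.tag in unbounded:
            # Repeating element: new table named after the parent table plus the element.
            flatten(child, f"{table}_{child.tag}", unbounded, tables, parent_key=key)
        elif len(child):
            # Single nested type: merge its fields into the current row.
            _collect(child, f"{prefix}{child.tag}.", row, table, key, unbounded, tables)
        else:
            row[f"{prefix}{child.tag}"] = (child.text or "").strip()


SAMPLE = """
<mudLog uid="ml-1">
  <name>Mud log 1</name>
  <commonData><itemState>actual</itemState></commonData>
  <geologyInterval uid="gi-1">
    <mdTop>1000</mdTop>
    <lithology uid="li-1"><type>sandstone</type></lithology>
    <lithology uid="li-2"><type>shale</type></lithology>
  </geologyInterval>
</mudLog>
""".strip()

tables: Dict[str, List[dict]] = defaultdict(list)
flatten(ET.fromstring(SAMPLE), "mudLog", {"geologyInterval", "lithology"}, tables)
for name, rows in tables.items():
    print(name, rows)
```

In this sketch, each lithology row carries the key of its geologyInterval row as parent_key, which mirrors how the parent key links a table to the one above it in the hierarchy.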
