Set up the SAP extractor

Before you start

Assign access capabilities for the extractor to write data to the respective CDF destination resources.
Set up the SAP endpoints you want to extract data from.
Check the server requirements for the extractor.
Create a configuration file according to the configuration settings. The file must be in YAML format.

Connect to SAP

The extractor supports two different protocols for connecting to SAP: OData and SOAP.

OData

The extractor connects to OData V2 and OData V4 endpoints in the SAP NetWeaver Gateway. SAP OData is available in multiple SAP ERP versions, such as:

SAP version	Description
SAP ERP 6.0	We recommend the SAP Gateway Service Builder, which generates OData entities automatically from the SAP standard data schemas. There are multiple ways of mapping SAP entities to OData entities. To ensure that the schema is mapped correctly at the entity level, use the Import DDIC Structure function for the structures or database tables that you expose through the OData service. See this guide for creating an OData service in SAP ERP 6.0.
SAP S/4HANA OnPremise	We recommend using the standard OData endpoints delivered in the SAP S/4HANA installation. You’ll find the endpoints at the SAP Business Accelerator Hub. To extract data using SAP OData predefined schemas, activate the standard endpoints in the /IWFND/MAINT_SERVICES transaction. One example is the standard endpoint for SAP plant maintenance (PM) work orders.
SAP S/4HANA Cloud	The OData endpoints are available inside the cloud communication scenarios predefined by SAP. You’ll find the standard endpoints at the SAP Business Accelerator Hub. One example is the standard endpoint for SAP plant maintenance (PM) work orders.

See the SAP documentation for more information about SAP OData endpoints.

We recommend using OData to connect to and extract data from SAP.

Secure SAP connections with self-signed certificates

The extractor uses HTTPS to securely connect to SAP OData services. If your SAP system uses an internal or self-signed certificate, update the extractor’s certificate bundle to include it so connections between SAP, the VM environment, and the CDF service remain trusted.

Export the existing certificate bundle

Locate the public CA bundle in the certifi library:

import certifi

# Get the path to the CA bundle
ca_bundle_path = certifi.where()
print(f"CA bundle located at: {ca_bundle_path}")

Copy the file from the printed path and save it on your desktop as cacert.pem.

Add the SAP self-signed certificate

1. Obtain the SAP self-signed certificate in PEM format.
2. Open the certificate in a text editor.
3. Open the cacert.pem file that you exported earlier.
4. Prepend the SAP certificate contents to the top of cacert.pem.
5. Save the updated file.

Now, cacert.pem contains both public and SAP self-signed certificates.
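The manual steps above can also be scripted. The following is a minimal sketch, not part of the extractor: the function name prepend_certificate and both file paths are hypothetical, and it assumes the SAP certificate is a single PEM file and that cacert.pem is the copy you exported earlier.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def prepend_certificate(sap_cert_path: str, bundle_path: str) -> int:
    """Prepend the SAP certificate to the CA bundle.

    Returns the number of certificates in the bundle afterwards.
    Both paths are supplied by the caller (hypothetical locations).
    """
    sap_cert = Path(sap_cert_path).read_text(encoding="utf-8").strip()
    if PEM_HEADER not in sap_cert:
        raise ValueError(f"{sap_cert_path} does not look like a PEM certificate")

    bundle_file = Path(bundle_path)
    bundle = bundle_file.read_text(encoding="utf-8")

    # Skip the write if the certificate was already added in an earlier run
    if sap_cert not in bundle:
        bundle = sap_cert + "\n" + bundle
        bundle_file.write_text(bundle, encoding="utf-8")

    return bundle.count(PEM_HEADER)
```

Calling prepend_certificate("sap_cert.pem", "cacert.pem") a second time leaves the bundle unchanged, so the script is safe to rerun.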

Configure the extractor

Set the environment variable REQUESTS_CA_BUNDLE to the path of the updated cacert.pem:

Windows (PowerShell):

$env:REQUESTS_CA_BUNDLE="C:\Users\<username>\Desktop\cacert.pem"

Linux/macOS (Bash):

export REQUESTS_CA_BUNDLE=/home/<username>/Desktop/cacert.pem

Run the extractor in the same shell where the environment variable is defined.

The certificate must be manually added to the bundle on the VM to ensure both SAP and CDF connections work correctly.
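Before starting the extractor, it can help to confirm that the variable is visible in the current shell and points to a real bundle. The sketch below is a hypothetical helper (check_ca_bundle is not an extractor function) and only checks that the file exists and contains at least one PEM certificate; it does not validate the certificates themselves.

```python
import os
from pathlib import Path
from typing import Mapping


def check_ca_bundle(env: Mapping[str, str] = os.environ) -> Path:
    """Return the bundle path from REQUESTS_CA_BUNDLE, or raise if it is unusable."""
    value = env.get("REQUESTS_CA_BUNDLE")
    if not value:
        raise RuntimeError("REQUESTS_CA_BUNDLE is not set in this shell")

    path = Path(value)
    if not path.is_file():
        raise FileNotFoundError(f"CA bundle not found: {path}")

    # A usable bundle holds at least one PEM-encoded certificate
    if "-----BEGIN CERTIFICATE-----" not in path.read_text(encoding="utf-8"):
        raise ValueError(f"No PEM certificate found in {path}")
    return path
```

Run check_ca_bundle() from a Python session started in the same shell that will launch the extractor.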

Run as a Windows executable file

Download the SAP extractor package

Navigate to Data management > Integrate > Extractors and find the SAP extractor’s package for Windows executable. Download and decompress the zip file.

Run the executable file

Open a command line window and run the executable file with the configuration file as an argument. In this example, the configuration file is named config.yml and saved in the same folder as the executable file:

> .\sap_extractor_standalone<VERSION>-win32.exe .\config.yml

To stop the extractor, press Ctrl+C. The log file is stored in the path set in the configuration file.

Run as a Windows service

Download the Windows service package

Navigate to Data management > Integrate > Extractors and find the SAP extractor’s installation package for Windows service. Download the zip file and decompress it to the same directory as the configuration file.

You must name the configuration file config.yaml.

Install the service

As an administrator, open a command line window in the folder where you placed the executable file and the configuration file. Run the following command:

> .\sap_extractor_service<VERSION>-win32.exe install

Configure the service

Open Services in Windows and find the Cognite SAP extractor service. Right-click the service and select Properties. Configure the service according to your requirements.

Run as a Linux executable

Download the Linux executable package

Navigate to Data management > Integrate > Extractors and find the SAP extractor’s package for Linux executable. Download and decompress the zip file.

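Run the executable file

Open a terminal and run the executable file with the path to the configuration file as an argument: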

> ./sap_extractor-<VERSION>-linux path/to/the/folder/config.yaml

Pagination

When you extract data from SAP OData, you can use one of three pagination methods: no pagination, client-side pagination, or server-side pagination. Use the pagination-type configuration setting to specify which method the extractor uses when it runs full-load queries.

Pagination is available when you extract data from SAP OData endpoints.

No pagination

No pagination means that the extractor fetches all data available from the SAP OData endpoint without using chunking logic on the server or client. For large data sets, the SAP extractor may therefore time out and return an error while waiting for the response from SAP.

Client-side pagination

OData client-side pagination uses query parameters from the client to define a record offset and a page size. This limits the volume of data retrieved from the server in each request. A client-side pagination request is built with the following URI parameters:

$top: specifies the number of records to return in a single batch. The default and maximum value is 1,000 records.
$skip: specifies the number of records to bypass (skip) from the total data set before returning the desired subset.

For example, a query with $skip=2000&$top=500 returns the fifth page of data, assuming there’s data available and the page size is 500 records.
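The relation between page number, $skip, and $top can be sketched as follows. This is an illustration only, not the extractor's implementation: page_query, fetch_all, and the fetch_page callable are hypothetical names, and fetch_page stands in for whatever HTTP call returns the rows for a given query string.

```python
from typing import Callable, Iterator, List
from urllib.parse import urlencode


def page_query(page_number: int, page_size: int = 500) -> str:
    """Return the $skip/$top query string for a 1-based page number."""
    if page_number < 1 or page_size < 1:
        raise ValueError("page_number and page_size must be positive")
    skip = (page_number - 1) * page_size
    # safe="$" keeps the "$" in the OData query option names unescaped
    return urlencode({"$skip": skip, "$top": page_size}, safe="$")


def fetch_all(
    fetch_page: Callable[[str], List[dict]], page_size: int = 500
) -> Iterator[dict]:
    """Request page after page until a page comes back short."""
    page_number = 1
    while True:
        rows = fetch_page(page_query(page_number, page_size))
        yield from rows
        if len(rows) < page_size:
            break  # fewer rows than requested: this was the last page
        page_number += 1


print(page_query(5))  # $skip=2000&$top=500
```

The loop stops on the first short page, so the client never needs to know the total record count in advance.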

Server-side pagination

The SAP server controls server-side pagination. The server generates a cursor to control the next batch of requests. This cursor represents a pointer to the start of the next page in the full data set and is returned to the calling SAP extractor. The server uses a $skiptoken value to resume pagination from the position identified by the cursor.
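From the client's point of view, server-side pagination means following the link that the server returns with each page. The sketch below assumes JSON responses in the standard OData shapes: OData V2 places rows in d.results and the next-page link in d.__next, while OData V4 uses value and @odata.nextLink. The names follow_next_links and fetch_json are hypothetical, and fetch_json stands in for the HTTP call that returns the parsed response for a URL.

```python
from typing import Callable, Iterator, List, Optional


def next_link(payload: dict) -> Optional[str]:
    """Return the URL of the next page, or None on the last page."""
    if "@odata.nextLink" in payload:          # OData V4
        return payload["@odata.nextLink"]
    return payload.get("d", {}).get("__next")  # OData V2


def rows_of(payload: dict) -> List[dict]:
    """Return the rows of one page for either protocol version."""
    if "value" in payload:                     # OData V4
        return payload["value"]
    return payload.get("d", {}).get("results", [])  # OData V2


def follow_next_links(
    fetch_json: Callable[[str], dict], first_url: str
) -> Iterator[dict]:
    """Yield all rows by following the server-provided next links."""
    url: Optional[str] = first_url
    while url:
        payload = fetch_json(url)
        yield from rows_of(payload)
        # The next link already carries the $skiptoken chosen by the server
        url = next_link(payload)
```

The client never builds the $skiptoken itself; it only reuses the link the server hands back.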

The pagination type you use in your extractor configuration depends on the SAP OData endpoint implementation your extractor connects to.

Load data incrementally (OData only)

If the OData entities have an incremental field, you can set up the extractor to only process new or updated entities since the last extraction. For example, you can use the S/4HANA standard OData entity Maintenance Order and enter the field LastChangeDateTime in the incremental_field configuration parameter for incremental delta queries to the CDF staging area.

For the incremental load to work properly, the incremental field in SAP must be an Edm.DateTimeOffset field. An example is LastChangeDateTime in the SAP OData entity Maintenance Order.

The extractor depends on client-side pagination to do incremental load queries. Therefore, when configuring the SAP extractor to use a custom SAP OData endpoint, make sure your SAP implementation supports both client-side pagination and the $orderby operation. See Client-side pagination implementation for more information on the implementation provided by SAP.
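To show why both client-side pagination and $orderby are needed, here is a sketch of the kind of query an incremental load implies. The exact queries the extractor sends are not documented here, so treat this as an assumption: incremental_query is a hypothetical helper, and the literal syntax follows the OData V2 (datetimeoffset'...') and OData V4 (bare timestamp) conventions.

```python
from datetime import datetime, timezone


def incremental_query(
    field: str,
    last_seen: datetime,
    odata_version: int = 2,
    top: int = 1000,
    skip: int = 0,
) -> dict:
    """Build query options that fetch rows changed after last_seen."""
    if last_seen.tzinfo is None:
        raise ValueError("last_seen must be timezone-aware")

    stamp = last_seen.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # OData V2 wraps the value in datetimeoffset'...'; V4 uses the bare literal
    literal = f"datetimeoffset'{stamp}'" if odata_version == 2 else stamp

    return {
        "$filter": f"{field} gt {literal}",
        # A stable sort order keeps $skip/$top pages from overlapping
        "$orderby": f"{field} asc",
        "$top": top,
        "$skip": skip,
    }


last_run = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(incremental_query("LastChangeDateTime", last_run)["$filter"])
# LastChangeDateTime gt datetimeoffset'2024-01-01T12:00:00Z'
```

Sorting on the incremental field means the last row of the final page carries the newest timestamp, which can serve as the starting point for the next run.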

Schedule automatic runs

To schedule automatic runs on Windows, use Windows Task Scheduler. To schedule automatic runs on macOS and Linux, use cron expressions. To add a new cron job, run crontab -e to edit the cron table file with the default system text editor. Here’s the format for a job in the cron table:

<minute>  <hour>  <day of month (1-31)>  <month (1-12)>  <day of week (0-6 starting on Sunday)>  <command>
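As an illustration of the format, the sketch below assembles a cron table line from its five schedule fields and a command. The helper cron_entry and the /opt/sap paths are hypothetical; substitute the location of your own executable and configuration file.

```python
def cron_entry(
    command: str,
    minute: str = "0",
    hour: str = "*",
    day_of_month: str = "*",
    month: str = "*",
    day_of_week: str = "*",
) -> str:
    """Join the five schedule fields and the command into one cron line."""
    fields = [minute, hour, day_of_month, month, day_of_week]
    if any(not field or " " in field for field in fields):
        raise ValueError("each schedule field must be non-empty and contain no spaces")
    return " ".join(fields + [command])


# Hypothetical install location; run every day at 02:30
print(cron_entry(
    "/opt/sap/sap_extractor-<VERSION>-linux /opt/sap/config.yaml",
    minute="30",
    hour="2",
))
# 30 2 * * * /opt/sap/sap_extractor-<VERSION>-linux /opt/sap/config.yaml
```

Paste the printed line into the editor opened by crontab -e.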

Attachments

The extractor can extract attachments stored in SAP document frameworks, such as Generic Object Services (GOS), and ingest these into CDF Files. The extractor connects to the SAP OData endpoint API_CV_ATTACHMENT_SRV and fetches files that are linked to a standard SAP OData entity, such as maintenance orders. See the attachments configuration section.

You can extract attachments when you’ve connected to SAP S/4HANA servers.
