
Configure the SAP extractor

To configure the SAP extractor, you must create a configuration file in YAML format.

You can use the sample minimal configuration file included with the extractor packages as a starting point for your configuration settings.

The configuration file contains the global parameter version, which holds the version of the configuration schema. This article describes version 1.
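
For example, a configuration file for this schema version starts with the following line:

  version: 1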

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Logger

Use the optional logger section to set up logging to a console or files.

• console: Set up the console logger configuration. See the Console section.
• file: Set up the file logger configuration. See the File section.

Console

Use the console subsection to enable logging to standard output, such as a terminal window.

• level: Select the verbosity level for console logging. Valid options, in decreasing order of verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL.

File

Use the file subsection to enable logging to a file. The files are rotated daily.

• level: Select the verbosity level for file logging. Valid options, in decreasing order of verbosity, are DEBUG, INFO, WARNING, ERROR, and CRITICAL.
• path: Insert the path to the log file.
• retention: Specify the number of days to keep logs. The default value is 7 days.
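
For example, a minimal sketch of a logger section that writes INFO and higher to the console and DEBUG and higher to a daily rotated file (the log file path is a placeholder):

  logger:
    console:
      level: INFO
    file:
      level: DEBUG
      path: logs/sap-extractor.log
      retention: 7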

Cognite

Use the cognite section to describe which CDF project the extractor will load data into and how to connect to the project.

• project: Insert the CDF project name. This is a required parameter.
• host: Insert the base URL of the CDF project. The default value is https://api.cognitedata.com.
• api-key: API key authentication is deprecated. Use idp-authentication instead.
• idp-authentication: Insert the credentials for authenticating to CDF using an external identity provider. Because API key authentication is deprecated, use IdP authentication. See the Identity provider (IdP) authentication section.

Identity provider (IdP) authentication

Use the idp-authentication subsection to enable the extractor to authenticate to CDF using an external identity provider, such as Azure AD.

• client-id: Enter the client ID from the IdP. This is a required parameter.
• secret: Enter the client secret from the IdP. This is a required parameter.
• scopes: List the scopes. This is a required parameter.
• resource: Insert the resource to send along with the token requests. This is an optional parameter.
• token-url: Insert the URL to fetch tokens from. You must enter either a token URL or an Azure tenant.
• tenant: Enter the Azure tenant. You must enter either a token URL or an Azure tenant.
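
For example, a sketch of a cognite section authenticating through Azure AD; the project name, client ID, secret, scope, and tenant values are placeholders:

  cognite:
    project: my-cdf-project
    host: https://api.cognitedata.com
    idp-authentication:
      client-id: my-client-id
      secret: my-client-secret
      scopes:
        - https://api.cognitedata.com/.default
      tenant: my-azure-tenant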

Extractor

Use the optional extractor section to add tuning parameters.

• mode: Set the execution mode. Valid options are single and continuous. Use continuous to run the extractor in continuous mode, executing the OData queries defined in the endpoints section. The default value is single.
• upload-queue-size: Enter the size of the upload queue. The default value is 50 000 rows.
• parallelism: Insert the number of parallel queries to run. The default value is 4 queries.
• state-store: Set to true to configure a state store. By default, there is no state store and incremental load is deactivated. See the State store section.
• chunk_size: Enter the number of rows to extract from SAP OData on each run. The default value is 1000 rows, as recommended by SAP.
• delta_padding_minutes: Internal extractor parameter that controls the incremental load padding. Do not change.
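
For example, a sketch of an extractor section for continuous runs, shown with the default tuning values:

  extractor:
    mode: continuous
    upload-queue-size: 50000
    parallelism: 4
    chunk_size: 1000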

State store

Use the state store subsection to save extraction states between runs. Use this if data is loaded incrementally. We support multiple state stores, but you can only configure one at a time.

• local: Local state store configuration. See the Local section.
• raw: RAW state store configuration. See the RAW section.

Local

Use the local section to store the extraction state in a JSON file on a local machine.

• path: Insert the file path to a JSON file.
• save-interval: Enter the interval in seconds between each save. The default value is 30 seconds.
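
For example, a sketch of a local state store, assuming it nests under the extractor section's state-store parameter; the file path is a placeholder:

  extractor:
    state-store:
      local:
        path: states.json
        save-interval: 30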

RAW

Use the RAW section to store the extraction state in a table in the CDF staging area.

• database: Enter the database name in the CDF staging area.
• table: Enter the table name in the CDF staging area.
• upload-interval: Enter the interval in seconds between each save. The default value is 30 seconds.
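
Similarly, a sketch of a RAW state store, under the same assumption about nesting; the database and table names are placeholders:

  extractor:
    state-store:
      raw:
        database: sap_extractor_states
        table: states
        upload-interval: 30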

Metrics

Use the metrics section to describe where to send performance metrics for remote monitoring of the extractor. We recommend sending metrics to a Prometheus pushgateway, but you can also send metrics as time series in the CDF project.

• push-gateways: List the Pushgateway configurations. See the Pushgateways section.
• cognite: List the Cognite metrics configurations. See the Cognite section.

Pushgateways

Use the pushgateways subsection to define a list of metric destinations, each using the following schema:

• host: Enter the address of the host to push metrics to. This is a required parameter.
• job-name: Enter the value of the exported_job label to associate metrics with. This separates several deployments on a single pushgateway and should be unique. This is a required parameter.
• username: Enter the username for the pushgateway. This is a required parameter.
• password: Enter the password for the pushgateway. This is a required parameter.
• clear-after: Enter the number of seconds to wait before clearing the pushgateway. When this parameter is present, the extractor will stall after the run is complete before deleting all metrics from the pushgateway. The recommended value is at least twice the scrape interval of the pushgateway, to ensure that the last metrics are gathered before deletion.
• push-interval: Enter the interval in seconds between each push. The default value is 30 seconds.
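
For example, a sketch of a metrics section with one pushgateway destination; the host, job name, and credentials are placeholders, and clear-after is an illustrative value:

  metrics:
    push-gateways:
      - host: https://my-pushgateway:9091
        job-name: sap-extractor-prod
        username: my-username
        password: my-password
        clear-after: 60
        push-interval: 30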

Cognite

Use the cognite subsection to send metrics as time series to the CDF project configured in the cognite main section above. Only numeric metrics, such as Prometheus counters and gauges, are sent.

• external-id-prefix: Insert a prefix for all time series used to represent metrics for this deployment. This creates a scope for the set of time series created by this metrics exporter and should be unique per deployment across the entire project. This is a required parameter.
• asset-name: Enter the name of the asset to attach the time series to. The asset is created if it doesn't already exist.
• asset-external-id: Enter the external ID for the asset to create if the asset doesn't already exist.
• push-interval: Enter the interval in seconds between each push. The default value is 30 seconds.
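
For example, a sketch of a metrics section that writes to CDF time series instead, assuming a single configuration entry; the prefix and asset values are placeholders:

  metrics:
    cognite:
      external-id-prefix: sap-extractor-prod
      asset-name: SAP extractor
      asset-external-id: sap-extractor-metrics
      push-interval: 30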

SAP

Use the SAP section to enter the mandatory SAP parameters.

• gateway_url: Insert the SAP NetWeaver Gateway URL. This is a required parameter.
• client: Enter the SAP client number. This is a required parameter.
• username: Enter the SAP username to connect to the SAP NetWeaver Gateway. This is a required parameter.
• password: Enter the password to connect to the SAP NetWeaver Gateway. This is a required parameter.
• certificates: The certificates needed for authentication towards the SAP instance. This is an optional parameter. See the Certificates section.
• endpoints: List the SAP OData endpoints. Each endpoint corresponds to a valid SAP OData service. See the Endpoints section.

Certificates

Use the certificates subsection to configure the certificates used for authentication towards SAP instances.

Three certificates are needed to perform the authentication: a certificate authority (ca_cert), a public key (public_key), and a private key (private_key).

If needed, see this documentation for how to generate the three certificates from a .p12 certificate file.

When setting up certificate authentication, note that all three certificates must be placed in the same folder where the extractor runs.

• ca_cert: Enter the name of the CA certificate file.
• public_key: Enter the name of the public key file.
• private_key: Enter the name of the private key file.
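
For example, a sketch of the certificates subsection; the file names are placeholders, and the files must sit in the folder the extractor runs from:

  sap:
    certificates:
      ca_cert: ca.pem
      public_key: public_key.pem
      private_key: private_key.pem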

Endpoints

Use the endpoints subsection to specify the OData endpoints.

• name: Enter the name of the SAP OData endpoint. This is a required parameter.
• destination: Insert the CDF staging database and table to upload to. This is a required parameter. See the RAW destination section.
• sap_service: Enter the name of the SAP OData service. This is a required parameter.
• sap_entity: Enter the name of the SAP OData entity related to the SAP OData service. This is a required parameter.
• sap_key: Enter the name of the primary key field of the SAP OData entity. This is a required parameter.
• incremental_field: Enter the name of the field used as the reference for incremental runs. This is an optional parameter. If you leave this field empty, the extractor fetches a full data load on every run.
• schedule: Schedule the interval at which the OData queries are executed against the SAP OData service. See the Schedule section.
• extract_schema: Extract the SAP OData entity schema to the CDF staging area. It expects database and table parameters, the same as the RAW destination. This is an optional parameter.

Schedule

Use the schedule subsection to schedule runs when the extractor runs as a service.

• type: Insert the schedule type. Valid options are cron, which uses regular cron expressions, and interval, which expects an interval-based schedule.
• expression: Enter the cron or interval expression to trigger the query. For example, 1h repeats the query hourly, and 5m repeats the query every 5 minutes.
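
For example, a sketch of a schedule that runs the query every hour:

  schedule:
    type: interval
    expression: 1h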

RAW destination

The raw destination subsection enables writing data to the CDF staging area.

• database: Enter the CDF staging database to upload data into. This will be created if it doesn't exist. This is a required parameter.
• table: Enter the CDF staging table to upload data into. This will be created if it doesn't exist. This is a required parameter.
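
Putting it together, a sketch of a sap section with a single endpoint writing to the staging area; the gateway URL, credentials, service, entity, and field names are hypothetical placeholders:

  sap:
    gateway_url: https://my-sap-host:44300/sap/opu/odata
    client: "100"
    username: my-sap-user
    password: my-sap-password
    endpoints:
      - name: equipment
        destination:
          database: sap_staging
          table: equipment
        sap_service: ZEQUIPMENT_SRV
        sap_entity: EquipmentSet
        sap_key: EquipmentId
        incremental_field: ChangedOn
        schedule:
          type: interval
          expression: 1h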