Configuration settings
To configure the SAP extractor, you must create a configuration file. The file must be in YAML format.
You can use the sample minimal configuration file included with the extractor packages as a starting point for your configuration settings.
The configuration file contains the global parameter version, which holds the version of the configuration schema. This article describes version 1.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Using values from environment variables
The configuration file allows substitutions with environment variables. For example:
cognite:
  secret: ${COGNITE_CLIENT_SECRET}
This loads the value of the COGNITE_CLIENT_SECRET environment variable into the cognite/secret parameter. You can also do string interpolation with environment variables, for example:
url: http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}
Implicit substitutions only work for unquoted value strings. For quoted strings, use the !env tag to activate environment substitution:
url: !env 'http://my-host.com/api/endpoint?secret=${MY_SECRET_TOKEN}'
Using values from Azure Key Vault
The SAP extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:
password: !keyvault my-secret-name
To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:
Parameter | Description |
---|---|
keyvault-name | Name of Key Vault to load secrets from |
authentication-method | How to authenticate to Azure. Either default or client-secret. For default, the extractor looks at the user running the extractor and looks for pre-configured Azure logins from tools like the Azure CLI. For client-secret, the extractor authenticates with a configured client ID/secret pair. |
client-id | Required for using the client-secret authentication method. The client ID to use when authenticating to Azure. |
secret | Required for using the client-secret authentication method. The client secret to use when authenticating to Azure. |
tenant-id | Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |
Example:
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd
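For comparison, here is a minimal sketch of the same section using the default authentication method. According to the parameter table, client-id, secret, and tenant-id are only required for client-secret, so they are left out here. The vault name is a placeholder, and the sketch assumes the user running the extractor already has an Azure login available, for example from the Azure CLI.

```yaml
# Sketch only: relies on a pre-configured Azure login for the current user.
azure-keyvault:
  keyvault-name: my-keyvault-name      # placeholder vault name
  authentication-method: default
```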
Base configuration object
Parameter | Type | Description |
---|---|---|
version | either string or integer | Configuration file version |
type | either local or remote | Configuration file type. Either local, meaning the full config is loaded from this file, or remote, which means that only the cognite section is loaded from this file, and the rest is loaded from extraction pipelines. Default value is local. |
cognite | object | The cognite section describes which CDF project the extractor will load data into and how to connect to the project. |
logger | object | The optional logger section sets up logging to a console and files. |
metrics | object | The metrics section describes where to send metrics on extractor performance for remote monitoring of the extractor. We recommend sending metrics to a Prometheus pushgateway, but you can also send metrics as time series in the CDF project. |
sap | list | List of SAP instances to connect to |
endpoints | list | List of endpoints to query |
extractor | object | General extractor configuration |
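To show how the top-level sections fit together, the following is a rough skeleton of a local configuration file. It is a sketch, not the sample file shipped with the extractor: the project name, environment variable names, service and entity names, RAW database and table names, and key field are all hypothetical placeholders. The metrics and extractor sections are left out for brevity.

```yaml
version: 1
type: local

cognite:
  project: my-project                    # hypothetical project name
  idp-authentication:
    tenant: ${AZURE_TENANT_ID}           # hypothetical environment variables
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - https://api.cognitedata.com/.default

logger:
  console:
    level: INFO

sap:
  - type: odata
    source-name: mys4hana
    gateway-url: https://mys4hana.com/sap/opu/odata/sap/
    client: '100'
    username: ${SAP_USERNAME}
    password: ${SAP_PASSWORD}

endpoints:
  - name: my-endpoint                    # hypothetical endpoint
    source-name: mys4hana                # must match a source-name under sap
    sap-service: MY_ODATA_SERVICE        # hypothetical service name
    sap-entity: MyEntity                 # hypothetical entity name
    destination:
      type: raw
      database: sap_staging              # hypothetical RAW database
      table: my_entity                   # hypothetical RAW table
    sap-key:
      - MyKeyField                       # hypothetical key field
```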
cognite
Global parameter.
The cognite section describes which CDF project the extractor will load data into and how to connect to the project.
Parameter | Type | Description |
---|---|---|
project | string | Insert the CDF project name. |
idp-authentication | object | The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). |
data-set | object | Enter a data set the extractor should write data into |
extraction-pipeline | object | Enter the extraction pipeline used for remote config and reporting statuses |
host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com. |
timeout | integer | Enter the timeout on requests to CDF, in seconds. Default value is 30. |
external-id-prefix | string | Prefix on external ID used when creating CDF resources |
connection | object | Configure network connection details |
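As an illustration, a cognite section combining several of these parameters might look like the sketch below. The project name, external ID prefix, data set external ID, and environment variable names are assumptions made for the example. The host and timeout values simply restate the documented defaults.

```yaml
cognite:
  project: my-project                    # hypothetical project name
  host: https://api.cognitedata.com      # documented default
  timeout: 30                            # documented default, in seconds
  external-id-prefix: "sap:"             # hypothetical prefix
  idp-authentication:
    tenant: ${AZURE_TENANT_ID}           # hypothetical environment variables
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - https://api.cognitedata.com/.default
  data-set:
    external-id: sap-data-set            # hypothetical data set
```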
idp-authentication
Part of cognite configuration.
The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory).
Parameter | Type | Description |
---|---|---|
authority | string | Insert the authority together with tenant to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/. |
client-id | string | Required. Enter the service principal client id from the IdP. |
tenant | string | Enter the Azure tenant. |
token-url | string | Insert the URL to fetch tokens from. |
secret | string | Enter the service principal client secret from the IdP. |
resource | string | Resource parameter passed along with token requests. |
audience | string | Audience parameter passed along with token requests. |
scopes | list | Enter a list of scopes requested for the token |
min-ttl | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than min-ttl seconds, it will be refreshed even if it is still valid. Default value is 30. |
certificate | object | Authenticate with a client certificate |
scopes
Part of idp-authentication configuration.
Enter a list of scopes requested for the token.
Each element of this list should be a string.
certificate
Part of idp-authentication configuration.
Authenticate with a client certificate.
Parameter | Type | Description |
---|---|---|
authority-url | string | Authentication authority URL |
path | string | Required. Enter the path to the .pem or .pfx certificate to be used for authentication |
password | string | Enter the password for the key file, if it is encrypted. |
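A sketch of certificate-based authentication, assuming the certificate replaces the client secret in the idp-authentication section. The file path and environment variable names are hypothetical. The password line is only needed if the key file is encrypted.

```yaml
cognite:
  idp-authentication:
    tenant: ${AZURE_TENANT_ID}           # hypothetical environment variables
    client-id: ${COGNITE_CLIENT_ID}
    scopes:
      - https://api.cognitedata.com/.default
    certificate:
      path: /etc/sap-extractor/certs/client.pem   # hypothetical path to a .pem or .pfx file
      password: ${CERT_PASSWORD}                  # only if the key file is encrypted
```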
data-set
Part of cognite configuration.
Enter a data set the extractor should write data into.
Parameter | Type | Description |
---|---|---|
id | integer | Resource internal id |
external-id | string | Resource external id |
extraction-pipeline
Part of cognite configuration.
Enter the extraction pipeline used for remote config and reporting statuses.
Parameter | Type | Description |
---|---|---|
id | integer | Resource internal id |
external-id | string | Resource external id |
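When type is set to remote, only the cognite section is read from the local file, and the extraction pipeline identifies where the rest of the configuration comes from. A minimal sketch of such a file, with a hypothetical project name, pipeline external ID, and environment variable names:

```yaml
version: 1
type: remote

cognite:
  project: my-project                          # hypothetical project name
  idp-authentication:
    tenant: ${AZURE_TENANT_ID}                 # hypothetical environment variables
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - https://api.cognitedata.com/.default
  extraction-pipeline:
    external-id: sap-extractor-pipeline        # hypothetical pipeline external ID
```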
connection
Part of cognite configuration.
Configure network connection details.
Parameter | Type | Description |
---|---|---|
disable-gzip | boolean | Whether or not to disable gzipping of JSON bodies. |
status-forcelist | string | HTTP status codes to retry. Defaults to 429, 502, 503 and 504. |
max-retries | integer | Max number of retries on a given HTTP request. Default value is 10. |
max-retries-connect | integer | Max number of retries on connection errors. Default value is 3. |
max-retry-backoff | integer | Retry strategy employs exponential backoff. This parameter sets a max on the amount of backoff after any request failure. Default value is 30. |
max-connection-pool-size | integer | The maximum number of connections which will be kept in the SDK's connection pool. Default value is 50. |
disable-ssl | boolean | Whether or not to disable SSL verification. |
proxies | object | Dictionary mapping from protocol to URL. |
proxies
Part of connection configuration.
Dictionary mapping from protocol to URL.
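A sketch of a connection section that routes CDF traffic through a proxy. The proxy address is a placeholder, and the http and https keys are an assumption about which protocol names the mapping accepts. The retry values restate the documented defaults.

```yaml
cognite:
  connection:
    max-retries: 10                      # documented default
    max-retries-connect: 3               # documented default
    max-retry-backoff: 30                # documented default
    max-connection-pool-size: 50         # documented default
    proxies:
      http: http://proxy.example.com:8080    # placeholder proxy address
      https: http://proxy.example.com:8080   # placeholder proxy address
```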
logger
Global parameter.
The optional logger section sets up logging to a console and files.
Parameter | Type | Description |
---|---|---|
console | object | Include the console section to enable logging to a standard output, such as a terminal window. |
file | object | Include the file section to enable logging to a file. The files are rotated daily. |
metrics | boolean | Enables metrics on the number of log messages recorded per logger and level. This requires metrics to be configured as well |
console
Part of logger configuration.
Include the console section to enable logging to a standard output, such as a terminal window.
Parameter | Type | Description |
---|---|---|
level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for console logging. Valid options, in decreasing verbosity levels, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. Default value is INFO. |
file
Part of logger configuration.
Include the file section to enable logging to a file. The files are rotated daily.
Parameter | Type | Description |
---|---|---|
level | either DEBUG, INFO, WARNING, ERROR or CRITICAL | Select the verbosity level for file logging. Valid options, in decreasing verbosity levels, are DEBUG, INFO, WARNING, ERROR, and CRITICAL. Default value is INFO. |
path | string | Required. Insert the path to the log file. |
retention | integer | Specify the number of days to keep logs for. Default value is 7. |
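For example, a logger section that writes INFO messages to the console and more detailed DEBUG messages to a file could be sketched as follows. The log file path is a placeholder, and the retention value restates the documented default.

```yaml
logger:
  console:
    level: INFO
  file:
    level: DEBUG
    path: logs/sap-extractor.log         # placeholder path
    retention: 7                         # days to keep log files (documented default)
```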
metrics
Global parameter.
The metrics section describes where to send metrics on extractor performance for remote monitoring of the extractor. We recommend sending metrics to a Prometheus pushgateway, but you can also send metrics as time series in the CDF project.
Parameter | Type | Description |
---|---|---|
push-gateways | list | List of Prometheus pushgateway configurations. |
cognite | object | Push metrics to CDF time series. Requires CDF credentials to be configured. |
server | object | The extractor can also be configured to expose an HTTP server with Prometheus metrics for scraping. |
push-gateways
Part of metrics configuration.
List of Prometheus pushgateway configurations.
Each element of this list should be a metric destination.
Parameter | Type | Description |
---|---|---|
host | string | Enter the address of the host to push metrics to. |
job-name | string | Enter the value of the exported_job label to associate metrics with. This separates several deployments on a single pushgateway, and should be unique. |
username | string | Enter the credentials for the pushgateway. |
password | string | Enter the credentials for the pushgateway. |
clear-after | either null or integer | Enter the number of seconds to wait before clearing the pushgateway. When this parameter is present, the extractor will stall after the run is complete before deleting all metrics from the pushgateway. The recommended value is at least twice that of the scrape interval on the pushgateway. This is to ensure that the last metrics are gathered before the deletion. Default is disabled. |
push-interval | integer | Enter the interval in seconds between each push. Default value is 30. |
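A sketch of a single pushgateway destination. The host, job name, and environment variable names are hypothetical. The clear-after value assumes a scrape interval of 30 seconds on the pushgateway, so that it is twice the scrape interval as recommended above.

```yaml
metrics:
  push-gateways:
    - host: https://pushgateway.example.com    # placeholder address
      job-name: sap-extractor-prod             # hypothetical, should be unique per deployment
      username: ${PUSHGATEWAY_USERNAME}        # hypothetical environment variables
      password: ${PUSHGATEWAY_PASSWORD}
      push-interval: 30                        # documented default, in seconds
      clear-after: 60                          # assumes a 30-second scrape interval
```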
cognite
Part of metrics configuration.
Push metrics to CDF time series. Requires CDF credentials to be configured.
Parameter | Type | Description |
---|---|---|
external-id-prefix | string | Required. Prefix on external ID used when creating CDF time series to store metrics. |
asset-name | string | Enter the name for a CDF asset that will have all the metrics time series attached to it. |
asset-external-id | string | Enter the external ID for a CDF asset that will have all the metrics time series attached to it. |
push-interval | integer | Enter the interval in seconds between each push to CDF. Default value is 30. |
data-set | object | Data set the metrics will be created under |
data-set
Part of cognite configuration.
Data set the metrics will be created under.
Parameter | Type | Description |
---|---|---|
id | integer | Resource internal id |
external-id | string | Resource external id |
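Pushing metrics to CDF time series could be configured along these lines. The prefix, asset name, asset external ID, and data set external ID are hypothetical values chosen for the example.

```yaml
metrics:
  cognite:
    external-id-prefix: "sap-extractor:metrics:"   # hypothetical prefix (required parameter)
    asset-name: SAP extractor metrics              # hypothetical asset name
    asset-external-id: sap-extractor-metrics       # hypothetical asset external ID
    push-interval: 30                              # documented default, in seconds
    data-set:
      external-id: extractor-monitoring            # hypothetical data set
```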
server
Part of metrics configuration.
The extractor can also be configured to expose an HTTP server with Prometheus metrics for scraping.
Parameter | Type | Description |
---|---|---|
host | string | Host to run the Prometheus server on. Default value is 0.0.0.0. |
port | integer | Local port to expose the Prometheus server on. Default value is 9000. |
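If you prefer scraping over pushing, a server section like the sketch below exposes the metrics over HTTP. Both values restate the documented defaults and are spelled out only for clarity.

```yaml
metrics:
  server:
    host: 0.0.0.0                        # documented default
    port: 9000                           # documented default
```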
sap
Global parameter.
List of SAP instances to connect to.
Each element of this list should be the configuration of an SAP source, using one of the following options:
sap_netweaver_gateway
Part of sap configuration.
The SAP NetWeaver Gateway lets clients connect using the Open Data Protocol (OData).
Example:
type: odata
source-name: mys4hana
gateway-url: https://mys4hana.com/sap/opu/odata/sap/
client: '100'
username: ${SAP_USERNAME}
password: ${SAP_PASSWORD}
language: EN
Parameter | Type | Description |
---|---|---|
type | always odata | Required. Type of SAP source connection, set to odata for SAP OData sources. |
source-name | string | Required. Enter a name for the source that will be used throughout the endpoints section and for logging. The name must be unique for each source in the configuration file. |
gateway-url | string | Required. Insert the SAP NetWeaver Gateway URL |
client | string | Required. Insert the SAP client number |
username | string | Required. Enter the SAP username to connect to the SAP NetWeaver Gateway. |
password | string | Required. Enter the SAP password to connect to the SAP NetWeaver Gateway |
disable-ssl | boolean | Disable the SSL certificate verification towards the SAP destination. Default value is set to False. |
connection_check_timeout | integer | Enter the timeout (in seconds) period the extractor should wait when doing connectivity tests towards the SAP source. Default value is 60. |
language | string | Enter the sap-language URL parameter. The default value is EN. |
certificates | object | Certificates needed for authentication towards SAP instance. There are three certificates needed to perform the authentication: certificate authority (ca-cert), public key (public-key), and private key (private-key). |
proxy | object | HTTP and/or HTTPS proxies to be used when connecting to the SAP source system. |
timezone | either local or utc | Specify how the extractor should handle the source time zones. Default value is local. |
certificates
Part of sap_netweaver_gateway configuration.
Certificates needed for authentication towards SAP instance.
There are three certificates needed to perform the authentication: certificate authority (ca-cert), public key (public-key), and private key (private-key).
Parameter | Type | Description |
---|---|---|
ca-cert | string | Required. Enter the path to the CA certificate file. |
public-key | string | Required. Enter the path to the public key file. |
private-key | string | Required. Enter the path to the private key file. |
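The sketch below extends the OData source example above with a certificates section. The certificate file paths are placeholders. The source is written as an element of the sap list, which is how the section is described in this article.

```yaml
sap:
  - type: odata
    source-name: mys4hana
    gateway-url: https://mys4hana.com/sap/opu/odata/sap/
    client: '100'
    username: ${SAP_USERNAME}
    password: ${SAP_PASSWORD}
    certificates:
      ca-cert: /etc/sap-extractor/certs/ca.pem            # placeholder paths
      public-key: /etc/sap-extractor/certs/public.pem
      private-key: /etc/sap-extractor/certs/private.pem
```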
proxy
Part of sap_netweaver_gateway configuration.
HTTP and/or HTTPS proxies to be used when connecting to the SAP source system.
Parameter | Type | Description |
---|---|---|
http | string | Enter the address of the HTTP proxy |
https | string | Enter the address of the HTTPS proxy. |
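A sketch of an OData source that reaches SAP through a proxy and interprets source times as UTC. The proxy address is a placeholder, and the connection_check_timeout value restates the documented default.

```yaml
sap:
  - type: odata
    source-name: mys4hana
    gateway-url: https://mys4hana.com/sap/opu/odata/sap/
    client: '100'
    username: ${SAP_USERNAME}
    password: ${SAP_PASSWORD}
    connection_check_timeout: 60                 # documented default, in seconds
    timezone: utc
    proxy:
      http: http://proxy.example.com:8080        # placeholder proxy address
      https: http://proxy.example.com:8080       # placeholder proxy address
```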
sap_soap_source
Part of sap configuration.
The SAP extractor can connect to SAP instances using SOAP, such as for SAP ERP SOAMANAGER.
Example:
type: soap
source-name: soap-funcloc
wsdl-url: https://myerp.com/sap/bc/srt/wsdl/sap/bc/srt/rfc/sap/test/100/test_funcloc/test_funcloc?sap-client=100
client: '100'
username: ${SAP_USERNAME}
password: ${SAP_PASSWORD}
language: EN
Parameter | Type | Description |
---|---|---|
type | always soap | Required. Type of SAP source connection, set to soap for SAP SOAP sources. |
source-name | string | Required. Enter a name for the source that will be used throughout the endpoints section and for logging. The name must be unique for each source in the configuration file. |
wsdl-url | string | Required. Insert the SOAP WSDL URL related to the SAP ABAP webservice |
strict-parser | boolean | Flag to control how the extractor behaves when parsing SOAP responses. Default value is set to True. |
disable-ssl | boolean | Disable the SSL certificate verification towards the SAP destination. Default value is set to False. |
client | string | Required. Insert the SAP client number |
username | string | Required. Enter the username to connect to the SAP Webservice. |
password | string | Required. Enter the password to connect to the SAP Webservice |
language | string | Enter the sap-language URL parameter. The default value is EN. |
certificates | object | Certificates needed for authentication towards SAP instance. There are three certificates needed to perform the authentication: certificate authority (ca-cert), public key (public-key), and private key (private-key). |
connection_check_timeout | integer | Enter the timeout (in seconds) period the extractor should wait when doing connectivity tests towards the SAP source. Default value is 60. |
session_timeout | integer | Enter the timeout (in seconds) period the extractor should use when fetching WSDL definitions from the SOAP source. Default value is 300. |
timezone | either local or utc | Specify how the extractor should handle the source time zones. Default value is local. |
certificates
Part of sap_soap_source configuration.
Certificates needed for authentication towards SAP instance.
There are three certificates needed to perform the authentication: certificate authority (ca-cert), public key (public-key), and private key (private-key).
Parameter | Type | Description |
---|---|---|
ca-cert | string | Required. Enter the path to the CA certificate file. |
public-key | string | Required. Enter the path to the public key file. |
private-key | string | Required. Enter the path to the private key file. |
endpoints
Global parameter.
List of endpoints to query
Each element of this list should be a description of an endpoint to extract data from on one of the configured SAP sources.
Parameter | Type | Description |
---|---|---|
name | string | Required. Enter a name for this SAP endpoint that will be used for logging. The name must be unique for each query in the configuration file. |
source-name | string | Required. Enter the name of the SAP source related to this endpoint. This must be one of the SAP sources configured in the sap section. |
sap-service | string | Required. Enter the name of the related SAP service. For odata endpoints, it's the SAP OData service. For soap endpoints, it's the service defined in the WSDL document. |
sap-entity | string | Required. Enter the name of the related SAP entity. For odata endpoints, it's the name of the OData entity. For soap endpoints, it's the name of the SOAP operator as defined in the WSDL |
destination | configuration for either RAW, Events, Assets, Time Series or Files | Required. The destination of the data in CDF. |
pagination-type | either client, server or no-pagination | OData pagination type when running full load (non incremental) from SAP. Default value is no-pagination. |
sap-key | list | Enter a list of fields related to the SAP entity to be used as keys while ingesting data to CDF staging. This is a required parameter when using raw as the CDF destination. |
select | list | Enter a list of fields to be selected from the SAP entity, using OData $select operation. This parameter is available only for SAP OData endpoints. |
related-entities | object | Sub entities to be fetched from the main sap-entity. Feature is available for SOAP and OData sources. |
attachments | object | Extraction of SAP attachments to CDF files. Feature is available only for OData endpoints (restricted to S/4HANA installations) |
request | string | Enter the request to be sent to the SAP. This is a required parameter for soap endpoints. See the Requests section for more details. |
incremental-field | string | Enter the name of the field to be used as reference for the incremental runs. If this field is left out, the extractor will fetch full data loads every run. |
incremental-field-format | string | Enter the date format of the incremental field so the extractor can convert it properly when running incremental queries. Example: %H:%M:%S is the date format for the date value 12:00:00. This is an optional parameter. |
initial-start | string | Enter the initial value to be used when running incremental requests. Mandatory parameter when running incremental loads in SOAP endpoints. |
schedule | configuration for either Fixed interval or CRON expression | Enter the schedule for when this query should run. Make sure not to schedule runs too often, but leave some room for the previous execution to be done. Required when running in continuous mode, ignored otherwise. Examples: {'schedule': {'type': 'interval', 'expression': '1h'}} {'schedule': {'type': 'cron', 'expression': '0 7-17 * * 1-5'}} |
extract-schema | object | If included, the extractor will extract the SAP entity schema to CDF RAW |
filter | string | Enter the filter query string. The $filter system query option allows clients to filter a collection of resources from the target SAP OData endpoint. This is only relevant for odata sources. |
soap-operation | string | Enter the name of the SOAP operation defined in the WSDL configuration. For soap endpoints, this is a required parameter. |
soap-port | string | Enter the name of the SOAP service port defined in the WSDL configuration. This parameter is only available for soap endpoints. |
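Putting several of these parameters together, an OData endpoint running scheduled full loads into CDF RAW might be sketched as follows. The endpoint name, service name, entity name, field names, and RAW database and table names are hypothetical. The schedule uses the interval form shown in the table above.

```yaml
endpoints:
  - name: functional-locations             # hypothetical, must be unique
    source-name: mys4hana                  # must match a source configured under sap
    sap-service: MY_ODATA_SERVICE          # hypothetical OData service name
    sap-entity: MyFunctionalLocation       # hypothetical OData entity name
    destination:
      type: raw
      database: sap_staging                # hypothetical RAW database
      table: functional_locations          # hypothetical RAW table
    sap-key:
      - FunctionalLocation                 # hypothetical key field (required for raw)
    select:
      - FunctionalLocation                 # hypothetical fields for $select
      - Description
    pagination-type: server
    schedule:
      type: interval
      expression: 1h
```

Adding incremental-field to an endpoint like this would switch it from full loads to incremental runs. pagination-type is described only for full loads.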
Requests
The request parameter is part of the endpoints configuration, and is required for soap endpoints.
SOAP requests
SAP ABAP Webservices are SOAP-based, meaning the requests to the SAP server must be in a valid XML format.
The SAP extractor expects this XML to be added as a string in the request parameter. This is an example of a valid XML request to an SAP ABAP Webservice generated from an SAP Function Module:
request: |
  <FUNCLOC_LIST>
    <item>
      <FUNCTLOCATION>String 57</FUNCTLOCATION>
      <FUNCLOC>String 58</FUNCLOC>
      <LABEL_SYST>S</LABEL_SYST>
      <DESCRIPT>String 60</DESCRIPT>
      <STRIND>Strin</STRIND>
      <CATEGORY>S</CATEGORY>
      <SUPFLOC>String 63</SUPFLOC>
      <PLANPLANT>Stri</PLANPLANT>
      <MAINTPLANT>1010</MAINTPLANT>
      <PLANGROUP>Str</PLANGROUP>
      <SORTFIELD>String 67</SORTFIELD>
    </item>
  </FUNCLOC_LIST>
  <MAINTPLANT_RA>
    <item>
      <SIGN>I</SIGN>
      <OPTION>EQ</OPTION>
      <LOW>1010</LOW>
      <HIGH>1010</HIGH>
    </item>
  </MAINTPLANT_RA>
destination
Part of endpoints configuration.
The destination of the data in CDF.
Use one of the following options:
raw
Part of destination configuration.
The raw destination writes data to the CDF staging area (RAW). The raw destination requires the sap-key parameter in the endpoint configuration.
Parameter | Type | Description |
---|---|---|
type | always raw | Type of CDF destination, set to raw to write data to RAW. |
database | string | Enter the CDF RAW database to upload data into. This will be created if it doesn't exist. |
table | string | Enter the CDF RAW table to upload data into. This will be created if it doesn't exist. |