
Configuration settings

To configure the PI AF extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.

You can leave many fields empty to let the extractor use the default values. The configuration file separates the settings by component, and you can remove an entire component to disable it or use the default values.

Sample configuration files

In the extractor installation folder, the /config subfolder contains complete and minimal sample configuration files. Values wrapped in ${} are replaced with the value of the environment variable of that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.

Note that we don't recommend using config.example.yml as the basis for your configuration file. It contains every configuration option, which makes it hard to read and may cause issues. It is intended as a reference showing how each option is configured. Use config.minimal.yml as your starting point instead.

Naming the configuration file

You must name the configuration file config.yml.

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Before you start

  • Optionally, copy one of the sample files in the config directory and rename it to config.yml.
  • The config.minimal.yml file doesn't include a metrics section. Add this section, as described under metrics, if the extractor needs to send metrics to a Prometheus Pushgateway.
  • Set up an extraction pipeline and note the external ID.

Minimal YAML configuration file

version: 1

pi:
    # Required, host for the PI server
    host: "${PI_HOST}"
    # Windows username on the PI server
    username: "${PI_USERNAME}"
    # Windows password on the PI server
    password: "${PI_PASSWORD}"

destination:
    # Change these to make sure you get the data somewhere unique in Raw
    database: piaf
    elements-table: elements

cognite:
    # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
    project: "${COGNITE_PROJECT}"
    # This is for microsoft as IdP, to use a different provider,
    # set implementation: Basic, and use token-url instead of tenant.
    # See the example config for the full list of options.
    idp-authentication:
        # Directory tenant
        tenant: ${COGNITE_TENANT_ID}
        # Application Id
        client-id: ${COGNITE_CLIENT_ID}
        # Client secret
        secret: ${COGNITE_CLIENT_SECRET}
        # List of resource scopes, ex:
        # scopes:
        #     - scopeA
        #     - scopeB
        scopes:
            - ${COGNITE_SCOPE}

logger:
    console:
        level: "information"
    file:
        level: "debug"
        path: "logs/log.txt"

Intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use a cron expression in some places.
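As an illustration, here is a sketch of the extraction section with its interval parameters written in this syntax. The values are arbitrary examples chosen to show different time units, not recommendations.

```yaml
extraction:
    # Read update events from PI AF every 10 minutes
    update-period: 10m
    # Perform a full refresh every 12 hours
    refresh-period: 12h
    # Check system and database status every 30 seconds
    keep-alive: 30s
```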

Using values from Azure Key Vault

The PI AF extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:

password: !keyvault my-secret-name

To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:

| Parameter | Description |
| --- | --- |
| keyvault-name | Name of the Key Vault to load secrets from. |
| authentication-method | How to authenticate to Azure. Either default or client-secret. For default, the extractor looks at the user running the extractor and uses pre-configured Azure logins from tools like the Azure CLI. For client-secret, the extractor authenticates with a configured client ID/secret pair. |
| client-id | Required for the client-secret authentication method. The client ID to use when authenticating to Azure. |
| secret | Required for the client-secret authentication method. The client secret to use when authenticating to Azure. |
| tenant-id | Required for the client-secret authentication method. The tenant ID of the Key Vault in Azure. |

Example:

azure-keyvault:
    keyvault-name: my-keyvault-name
    authentication-method: client-secret
    tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
    client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
    secret: 1234abcd
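The !keyvault tag and the azure-keyvault section work together. The sketch below assumes a Key Vault named my-keyvault-name holding two secrets with the hypothetical names pi-password and cdf-client-secret, and assumes the account running the extractor is already logged in to Azure (for example through the Azure CLI), so the default authentication method applies.

```yaml
azure-keyvault:
    keyvault-name: my-keyvault-name
    # Use pre-configured Azure logins for the user running the extractor
    authentication-method: default

pi:
    host: "${PI_HOST}"
    username: "${PI_USERNAME}"
    # Loaded from the hypothetical Key Vault secret "pi-password"
    password: !keyvault pi-password

cognite:
    project: "${COGNITE_PROJECT}"
    idp-authentication:
        tenant: ${COGNITE_TENANT_ID}
        client-id: ${COGNITE_CLIENT_ID}
        # Loaded from the hypothetical Key Vault secret "cdf-client-secret"
        secret: !keyvault cdf-client-secret
        scopes:
            - ${COGNITE_SCOPE}
```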

Configure the PI AF extractor

PI AF Extractor configuration

| Parameter | Type | Description |
| --- | --- | --- |
| version | integer | Version of the config file. Each version of the extractor specifies which config file versions it accepts. |
| pi | object | Include the pi section to configure the connection to the PI AF system. |
| cognite | object | Configure the connection to Cognite Data Fusion (CDF). |
| extraction | object | Include the extraction section to configure how to extract data from PI AF. |
| destination | object | Include the destination section to configure the destination for the extracted data. Currently, the only destination is the CDF staging area (RAW). |
| metrics | object | Configuration for publishing metrics. |
| logger | object | Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor logs any messages at an equal or higher log level than the configured level for each sink. |

pi

Global parameter.

Include the pi section to configure the connection to the PI AF system.

This is how the PI AF extractor selects a system:

  • If you configure system-name, the extractor selects the system by name from the preconfigured list of PI systems on the machine the extractor runs on.
  • If you configure host, the extractor selects the PI system running on that host.
  • If you don't configure either of these parameters, the extractor selects the default system on the machine the extractor runs on. If there is no default system, the extractor selects the first system from the preconfigured PI system list on the machine the extractor runs on.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Insert the base URL of the PI server's host. If you don't enter any value, you must configure a PI system in the installed SDK on the machine the extractor runs on. |
| username | string | Required. Insert the Windows username on the PI server. |
| password | string | Required. Insert the Windows password on the PI server. |
| system-name | string | Enter the name of the PI system you want to use. This is used instead of host to select a PI system. |
| database-name | string | Enter the name of the PI database you want to use. The default value is the default database configured on the machine the extractor runs on or, if no default database is configured, the first database in the list. |
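For example, to select a preconfigured PI system by name instead of by host, and to read from a specific database, the pi section could look like the sketch below. MyPISystem and MyDatabase are placeholder names; replace them with the names configured on the machine the extractor runs on.

```yaml
pi:
    # Select the PI system by name from the preconfigured list on this machine
    system-name: "MyPISystem"
    # Read from this database instead of the default one
    database-name: "MyDatabase"
    username: "${PI_USERNAME}"
    password: "${PI_PASSWORD}"
```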

cognite

Global parameter.

Configure the connection to Cognite Data Fusion (CDF).

| Parameter | Type | Description |
| --- | --- | --- |
| project | string | CDF project to connect to. |
| idp-authentication | object | The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow. |
| host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com. |
| cdf-retries | object | Configure automatic retries on requests to CDF. |
| cdf-chunking | object | Configure chunking of data on requests to CDF. Note that increasing these values may cause requests to fail due to limits in the API itself. |
| cdf-throttling | object | Configure the maximum number of parallel requests for different CDF resources. |
| sdk-logging | object | Configure logging of requests from the SDK. |
| nan-replacement | either number or null | Replacement for NaN values when writing to CDF. If left out, NaN values are skipped. |
| extraction-pipeline | object | Configure an associated extraction pipeline. |
| certificates | object | Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems. |

idp-authentication

Part of cognite configuration.

The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow.

| Parameter | Type | Description |
| --- | --- | --- |
| authority | string | Insert the authority together with tenant to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/. |
| client-id | string | Required. Enter the service principal client ID from the IdP. |
| tenant | string | Enter the Azure tenant. |
| token-url | string | Insert the URL to fetch tokens from. |
| secret | string | Enter the service principal client secret from the IdP. |
| resource | string | Resource parameter passed along with token requests. |
| audience | string | Audience parameter passed along with token requests. |
| scopes | either list or string | List of resource scopes to request. |
| min-ttl | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than min-ttl seconds, it is refreshed even if it is still valid. Default value is 30. |
| certificate | object | Authenticate with a client certificate. |

certificate

Part of idp-authentication configuration.

Authenticate with a client certificate.

| Parameter | Type | Description |
| --- | --- | --- |
| authority-url | string | Authentication authority URL. |
| path | string | Required. Enter the path to the .pem or .pfx certificate to be used for authentication. |
| password | string | Enter the password for the key file, if it is encrypted. |
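A sketch of certificate-based authentication, assuming the certificate block is used in place of secret inside idp-authentication. The certificate path and the CERT_PASSWORD environment variable are hypothetical; check config.example.yml for the exact combination of fields your IdP requires.

```yaml
cognite:
    project: "${COGNITE_PROJECT}"
    idp-authentication:
        tenant: ${COGNITE_TENANT_ID}
        client-id: ${COGNITE_CLIENT_ID}
        scopes:
            - ${COGNITE_SCOPE}
        # Authenticate with a client certificate instead of a client secret
        certificate:
            # Hypothetical path to the certificate file
            path: "certs/extractor.pfx"
            # Only needed if the key file is encrypted (hypothetical variable)
            password: ${CERT_PASSWORD}
```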

cdf-retries

Part of cognite configuration.

Configure automatic retries on requests to CDF.

| Parameter | Type | Description |
| --- | --- | --- |
| timeout | integer | Timeout in milliseconds for each individual request to CDF. Default value is 80000. |
| max-retries | integer | Maximum number of retries on requests to CDF. If this is less than 0, retry forever. Default value is 5. |
| max-delay | integer | Max delay in milliseconds between each retry. Base delay is calculated according to 125*2^retry milliseconds. If less than 0, there is no maximum. Default value is 5000. |
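A sketch of a more tolerant retry policy. The values are illustrative, and the comment on the delay sequence assumes retry is counted from 0: the base delay 125*2^retry then runs 125, 250, 500, 1000, 2000, 4000, 8000 ms, so from the seventh retry onward max-delay caps it at 5000 ms.

```yaml
cognite:
    cdf-retries:
        # Allow up to 120 seconds for each individual request
        timeout: 120000
        # Retry up to 10 times before giving up
        max-retries: 10
        # Base delays: 125, 250, 500, 1000, 2000, 4000, then 8000 ms,
        # which is capped to 5000 ms for this and all later retries
        max-delay: 5000
```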

cdf-chunking

Part of cognite configuration.

Configure chunking of data on requests to CDF. Note that increasing these values may cause requests to fail due to limits in the API itself.

| Parameter | Type | Description |
| --- | --- | --- |
| time-series | integer | Maximum number of timeseries per get/create timeseries request. Default value is 1000. |
| assets | integer | Maximum number of assets per get/create assets request. Default value is 1000. |
| data-point-time-series | integer | Maximum number of timeseries per datapoint create request. Default value is 10000. |
| data-point-delete | integer | Maximum number of ranges per delete datapoints request. Default value is 10000. |
| data-point-list | integer | Maximum number of timeseries per datapoint read request. Used when getting the first point in a timeseries. Default value is 100. |
| data-points | integer | Maximum number of datapoints per datapoints create request. Default value is 100000. |
| data-points-gzip-limit | integer | Minimum number of datapoints in a request to switch to using gzip. Set to -1 to disable, and 0 to always enable (not recommended). The minimum HTTP packet size is generally 1500 bytes, so this should never be set below 100 for numeric datapoints. Even for somewhat larger requests, gzip is efficient enough that they are compressed below 1500 bytes. At 5000, it is always a performance gain. It can be set lower if bandwidth is a major issue. Default value is 5000. |
| raw-rows | integer | Maximum number of rows per request to CDF RAW. Default value is 10000. |
| raw-rows-delete | integer | Maximum number of row keys per delete request to CDF RAW. Default value is 1000. |
| data-point-latest | integer | Maximum number of timeseries per datapoint read latest request. Default value is 100. |
| events | integer | Maximum number of events per get/create events request. Default value is 1000. |
| sequences | integer | Maximum number of sequences per get/create sequences request. Default value is 1000. |
| sequence-row-sequences | integer | Maximum number of sequences per create sequence rows request. Default value is 1000. |
| sequence-rows | integer | Maximum number of sequence rows per sequence when creating rows. Default value is 10000. |
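Since raising these values risks hitting API limits, a more common adjustment is to lower them. The sketch below uses smaller chunks and a lower gzip threshold for a bandwidth-constrained link; the numbers are illustrative, not tuned recommendations, and the gzip limit stays well above the floor of 100 mentioned in the table.

```yaml
cognite:
    cdf-chunking:
        # Fewer datapoints per create request (default 100000)
        data-points: 50000
        # Switch to gzip at 1000 datapoints instead of 5000 to save bandwidth
        data-points-gzip-limit: 1000
        # Fewer rows per CDF RAW request (default 10000)
        raw-rows: 5000
```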

cdf-throttling

Part of cognite configuration.

Configure the maximum number of parallel requests for different CDF resources.

| Parameter | Type | Description |
| --- | --- | --- |
| time-series | integer | Maximum number of parallel requests per timeseries operation. Default value is 20. |
| assets | integer | Maximum number of parallel requests per assets operation. Default value is 20. |
| data-points | integer | Maximum number of parallel requests per datapoints operation. Default value is 10. |
| raw | integer | Maximum number of parallel requests per raw operation. Default value is 10. |
| ranges | integer | Maximum number of parallel requests per get first/last datapoint operation. Default value is 20. |
| events | integer | Maximum number of parallel requests per events operation. Default value is 20. |
| sequences | integer | Maximum number of parallel requests per sequences operation. Default value is 10. |
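Because the PI AF extractor writes its output to CDF RAW, the raw setting is likely the one that matters most for this extractor. A sketch that halves it to reduce load on CDF; the value is illustrative.

```yaml
cognite:
    cdf-throttling:
        # At most 5 parallel requests per RAW operation (default 10)
        raw: 5
```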

sdk-logging

Part of cognite configuration.

Configure logging of requests from the SDK.

| Parameter | Type | Description |
| --- | --- | --- |
| disable | boolean | True to disable logging from the SDK. Logging is enabled by default. |
| level | either trace, debug, information, warning, error, critical or none | Log level to log messages from the SDK at. Default value is debug. |
| format | string | Format of the log message. Default value is CDF ({Message}): {HttpMethod} {Url} {ResponseHeader[X-Request-ID]} - {Elapsed} ms. |
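The level parameter sets the level SDK messages are logged at, so whether they appear also depends on the levels configured for the logger sinks. The sketch below assumes you want SDK requests to show up in a console sink set to information; the shortened format reuses placeholders from the default format.

```yaml
cognite:
    sdk-logging:
        disable: false
        # Log SDK requests at information level instead of debug
        level: information
        format: "CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms"

logger:
    console:
        level: information
```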

extraction-pipeline

Part of cognite configuration.

Configure an associated extraction pipeline.

| Parameter | Type | Description |
| --- | --- | --- |
| external-id | string | External ID of the extraction pipeline. |
| frequency | integer | Interval in seconds between each time the extractor reports Seen to the extraction pipeline. If this is less than or equal to zero, the extractor does not report automatically. Default value is 600. |
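For example, using the external ID you noted when setting up the extraction pipeline, the section could look like this. pi-af-extractor-pipeline is a placeholder ID.

```yaml
cognite:
    extraction-pipeline:
        # External ID of the pipeline created in CDF (placeholder)
        external-id: "pi-af-extractor-pipeline"
        # Report Seen every 5 minutes instead of every 10
        frequency: 300
```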

certificates

Part of cognite configuration.

Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems.

| Parameter | Type | Description |
| --- | --- | --- |
| accept-all | boolean | Accept all remote SSL certificates. This introduces a severe risk of man-in-the-middle attacks. |
| allow-list | list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates. |

allow-list

Part of certificates configuration.

List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates.

Each element of this list should be a string.
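A sketch that keeps accept-all off and trusts a single certificate by thumbprint. The thumbprint value is a made-up placeholder; use the actual thumbprint of the certificate you need to accept.

```yaml
cognite:
    certificates:
        # Never accept arbitrary certificates
        accept-all: false
        # Accept only these certificates (placeholder thumbprint)
        allow-list:
            - "0123456789ABCDEF0123456789ABCDEF01234567"
```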

extraction

Global parameter.

Include the extraction section to configure how to extract data from PI AF.

| Parameter | Type | Description |
| --- | --- | --- |
| elements | object | Configuration for extracting PI AF elements. |
| update-period | string | Enter the interval at which the extractor reads update events from the PI AF server. This is used to partially refresh the PI AF elements to get newly created elements or any changes to attribute values. Format is as given in Intervals. For instance, 2h means incremental updates run every other hour, starting at extractor startup. The extractor won't read updates at all if you set this parameter to 0 or a negative value. If both this parameter and refresh-period are set to 0 or a negative value, the extractor quits after reading all elements or after hitting the limit set in elements.limit. Default value is 0s. |
| keep-alive | string | Interval at which the extractor checks for changes to system and database status. This serves as a keep-alive, which may be necessary when an external mechanism times out the connection. Format is as given in Intervals. If this is 0 or negative, the extractor doesn't make keep-alive requests. Default value is 5m. |
| refresh-period | string | Interval at which the extractor performs a full refresh, reading all data from the PI AF server. Format is as given in Intervals. If this is 0 or negative, the extractor only reads all data on startup. Default value is 0s. |
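For a one-off extraction that reads all elements once and then exits, both periods are set to zero. Since 0s is also the default for both, leaving them out should have the same effect; setting them explicitly just documents the intent.

```yaml
extraction:
    # No incremental updates and no periodic full refresh:
    # the extractor reads all elements once and then quits
    update-period: 0s
    refresh-period: 0s
```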

elements

Part of extraction configuration.

Configuration for extracting PI AF elements.

| Parameter | Type | Description |
| --- | --- | --- |
| chunk | integer | Insert the maximum number of PI AF elements to read per request to PI. These are immediately written to CDF RAW. Default value is 1000. |
| limit | integer | Insert the total maximum number of PI AF elements to read. Use this to get a reasonable subset of the server for testing. Note that this doesn't work if extraction.update-period is configured. |
| query | string | Insert the query string. See the AVEVA documentation for the query syntax. |
| flatten-attributes | boolean | True to flatten attributes into a separate RAW table. If false, all attributes belonging to an element are extracted as fields on that element in the elements table. |
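A sketch of a test run against a subset of the server. It leaves extraction.update-period unset, since limit doesn't work when it is configured, and omits query because its syntax is defined by AVEVA. The numbers are illustrative.

```yaml
extraction:
    elements:
        # Read 500 elements per request to PI
        chunk: 500
        # Stop after 2000 elements
        limit: 2000
        # Write attributes to their own RAW table
        flatten-attributes: true
```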

destination

Global parameter.

Include the destination section to configure the destination for the extracted data. Currently this is only the CDF staging area (RAW).

| Parameter | Type | Description |
| --- | --- | --- |
| database | string | Insert the CDF RAW database to extract data to. If no database exists, the extractor creates a database. Default value is piaf. |
| elements-table | string | Enter the table name for the PI AF elements in the CDF RAW database. If no table exists, the extractor creates a table. Default value is elements. |
| unit-of-measure-classes-table | string | Enter the table name for unit-of-measure classes in the CDF RAW database. If no table exists, the extractor creates a table. Default value is unit-of-measure-classes. |
| attributes-table | string | Enter the table name for the attributes in the CDF RAW database. This table is only used if extraction.elements.flatten-attributes is set to true. If no table exists, the extractor creates a table. Default value is attributes. |
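For example, to keep test data apart from production data, the destination can point at a separate database. piaf-test is a hypothetical database name; the table names shown are the defaults. The attributes table is only written when flatten-attributes is enabled.

```yaml
destination:
    # Hypothetical database for a test run
    database: piaf-test
    elements-table: elements
    unit-of-measure-classes-table: unit-of-measure-classes
    attributes-table: attributes

extraction:
    elements:
        # Needed for attributes-table to be used
        flatten-attributes: true
```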

metrics

Global parameter.

Configuration for publishing metrics.

| Parameter | Type | Description |
| --- | --- | --- |
| server | object | Configuration for having the extractor start a Prometheus scrape server on a local port. |
| push-gateways | list | A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these. |

server

Part of metrics configuration.

Configuration for having the extractor start a Prometheus scrape server on a local port.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Required. Host name for the local Prometheus server. It must be exposed to a Prometheus instance for scraping. Examples: localhost, 0.0.0.0 |
| port | integer | Required. The port used for the local Prometheus server. |

push-gateways

Part of metrics configuration.

A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Required. URI of the pushgateway host. Example: http://my.pushgateway:9091 |
| job | string | Required. Name of the Prometheus pushgateway job. |
| username | string | Username for basic authentication. |
| password | string | Password for basic authentication. |
| push-interval | integer | Interval in seconds between each push to the gateway. Default value is 1. |
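A sketch of a complete metrics section, which config.minimal.yml does not include. It configures both a local scrape server and one Pushgateway. The port, the job name, and the PUSHGATEWAY_USERNAME and PUSHGATEWAY_PASSWORD environment variables are hypothetical.

```yaml
metrics:
    # Start a local Prometheus scrape server
    server:
        host: "localhost"
        port: 9000
    # Push metrics to one Pushgateway every 10 seconds
    push-gateways:
        - host: "http://my.pushgateway:9091"
          job: "pi-af-extractor"
          username: "${PUSHGATEWAY_USERNAME}"
          password: "${PUSHGATEWAY_PASSWORD}"
          push-interval: 10
```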

logger

Global parameter.

Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.

| Parameter | Type | Description |
| --- | --- | --- |
| console | object | Configuration for logging to the console. |
| file | object | Configuration for logging to a rotating log file. |
| trace-listener | object | Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace. |

console

Part of logger configuration.

Configuration for logging to the console.

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Minimum level of log events to write to the console. If not present, or invalid, logging to console is disabled. |
| stderr-level | either verbose, debug, information, warning, error or fatal | Log events at this level or above are redirected to standard error. |

file

Part of logger configuration.

Configuration for logging to a rotating log file.

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Minimum level of log events to write to file. |
| path | string | Required. Path to the files to be logged. If this is set to logs/log.txt, log files named in the form logs/log[date].txt are created, depending on rolling-interval. |
| retention-limit | integer | Maximum number of log files that are kept in the log folder. Default value is 31. |
| rolling-interval | either day or hour | Rolling interval for log files. Default value is day. |

trace-listener

Part of logger configuration.

Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace.

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Level to output trace messages at. |
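A sketch of a logger section that uses all three sinks. It sends warnings and above from the console sink to standard error and rolls the log file hourly, keeping the last 48 files (about two days). The values are illustrative.

```yaml
logger:
    console:
        level: information
        # Warning, error and fatal events go to standard error
        stderr-level: warning
    file:
        level: debug
        path: "logs/log.txt"
        # New file every hour, keep the 48 most recent
        rolling-interval: hour
        retention-limit: 48
    trace-listener:
        # Output System.Diagnostics.Trace messages at debug level
        level: debug
```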