Configuration settings

To configure the PI AF extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.

You can leave many fields empty to let the extractor use the default values. The configuration file separates the settings by component, and you can remove an entire component to disable it or use the default values.

Sample configuration files

In the extractor installation folder, the /config subfolder contains sample complete and minimal configuration files. Values wrapped in ${} are replaced with the value of the environment variable of that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.

Note that using config.example.yml as a basis for configuration files is not recommended. This file contains all configuration options, which makes it hard to read and may cause issues. It is intended as a reference showing how each option is configured, not as a starting point. Use config.minimal.yml instead.

Naming the configuration file

You must name the configuration file config.yml.

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Before you start

Optionally, copy one of the sample files in the config directory and rename it to config.yml.
The config.minimal.yml file doesn't include a metrics section. Copy this section from the example below if the extractor is required to send metrics to a Prometheus Pushgateway.
Set up an extraction pipeline and note the external ID.

Minimal YAML configuration file

version: 1

pi:
    # Required, host for the PI server
    host: "${PI_HOST}"
    # Windows username on the PI server
    username: "${PI_USERNAME}"
    # Windows password on the PI server
    password: "${PI_PASSWORD}"

destination:
    # Change these to make sure you get the data somewhere unique in Raw
    database: piaf
    elements-table: elements
    
cognite:
    # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
    project: "${COGNITE_PROJECT}"
    # This is for microsoft as IdP, to use a different provider,
    # set implementation: Basic, and use token-url instead of tenant.
    # See the example config for the full list of options.
    idp-authentication:
        # Directory tenant
        tenant: ${COGNITE_TENANT_ID}
        # Application Id
        client-id: ${COGNITE_CLIENT_ID}
        # Client secret
        secret: ${COGNITE_CLIENT_SECRET}
        # List of resource scopes, ex:
        # scopes:
        #   - scopeA
        #   - scopeB
        scopes:
          - ${COGNITE_SCOPE}
          
logger:
    console:
        level: "information"
    file:
        level: "debug"
        path: "logs/log.txt"

Intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use a cron expression in some places.

Using values from Azure Key Vault

The PI AF extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:

password: !keyvault my-secret-name

To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:

Parameter	Description
`keyvault-name`	Name of Key Vault to load secrets from
`authentication-method`	How to authenticate to Azure. Either `default` or `client-secret`. For `default`, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For `client-secret`, the extractor will authenticate with a configured client ID/secret pair.
`client-id`	Required for using the `client-secret` authentication method. The client ID to use when authenticating to Azure.
`secret`	Required for using the `client-secret` authentication method. The client secret to use when authenticating to Azure.
`tenant-id`	Required for using the `client-secret` authentication method. The tenant ID of the Key Vault in Azure.

Example:

azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd

Configure the PI AF extractor

PI AF Extractor configuration

Parameter	Type	Description
`version`	integer	Version of the config file. Each version of the extractor specifies which config file versions it accepts.
`pi`	object	Include the `pi` section to configure the connection to the PI AF system.
`cognite`	object	Configure connection to Cognite Data Fusion (CDF)
`extraction`	object	Include the `extraction` section to configure how to extract data from PI AF.
`destination`	object	Include the `destination` section to configure the destination for the extracted data. Currently this is only the CDF staging area (RAW).
`metrics`	object	Configuration for publishing metrics.
`logger`	object	Configuration for logging to console or file. Log entries are either `Fatal`, `Error`, `Warning`, `Information`, `Debug`, or `Verbose`, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.

`pi`

Global parameter.

Include the pi section to configure the connection to the PI AF system.

This is how the PI AF extractor selects a system:

If you configure system-name, the extractor selects the system by name from the preconfigured list of PI systems on the machine the extractor runs on.
If you configure host, the extractor selects a PI system running on the PI server's host.
If you don't configure either of these parameters, the extractor selects the default system on the machine the extractor runs on. If there is no default system, the extractor selects the first system from the preconfigured PI system list on the machine the extractor runs on.

Parameter	Type	Description
`host`	string	Insert the base URL of the PI server's host. If you don't enter any value, you must configure a PI system in the installed SDK on the machine the extractor runs on.
`username`	string	Required. Insert the Windows username on the PI server.
`password`	string	Required. Insert the Windows password on the PI server.
`system-name`	string	Enter the name of the PI system you want to use. This is used instead of `host` to select a PI system.
`database-name`	string	Enter the name of the PI database you want to use. The default value is the default database configured on the machine the extractor runs on, or the first database in the list if no default database is configured.
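
As an illustration of the selection rules above, the following sketch selects a PI system by name instead of by host. The system name `MyPiSystem` and database name `MyAfDatabase` are hypothetical placeholders, not values from this documentation.

```yaml
pi:
    # Hypothetical name; must match an entry in the preconfigured
    # PI system list on the machine the extractor runs on
    system-name: "MyPiSystem"
    # Hypothetical name; omit to use the default database
    database-name: "MyAfDatabase"
    username: "${PI_USERNAME}"
    password: "${PI_PASSWORD}"
```

Because `system-name` is set, `host` is left out here.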

`cognite`

Global parameter.

Configure connection to Cognite Data Fusion (CDF)

Parameter	Type	Description
`project`	string	CDF project to connect to.
`idp-authentication`	object	The `idp-authentication` section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow.
`host`	string	Insert the base URL of the CDF project. Default value is `https://api.cognitedata.com`.
`cdf-retries`	object	Configure automatic retries on requests to CDF.
`cdf-chunking`	object	Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself
`cdf-throttling`	object	Configure the maximum number of parallel requests for different CDF resources.
`sdk-logging`	object	Configure logging of requests from the SDK
`nan-replacement`	either number or null	Replacement for NaN values when writing to CDF. If left out, NaN values are skipped.
`extraction-pipeline`	object	Configure an associated extraction pipeline
`certificates`	object	Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems

`idp-authentication`

Part of cognite configuration.

The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow.

Parameter	Type	Description
`authority`	string	Insert the authority together with `tenant` to authenticate against Azure tenants. Default value is `https://login.microsoftonline.com/`.
`client-id`	string	Required. Enter the service principal client id from the IdP.
`tenant`	string	Enter the Azure tenant.
`token-url`	string	Insert the URL to fetch tokens from.
`secret`	string	Enter the service principal client secret from the IdP.
`resource`	string	Resource parameter passed along with token requests.
`audience`	string	Audience parameter passed along with token requests.
`scopes`	configuration for either list or string	List of resource scopes.
`min-ttl`	integer	Insert the minimum time in seconds a token will be valid. If the cached token expires in less than `min-ttl` seconds, it will be refreshed even if it is still valid. Default value is `30`.
`certificate`	object	Authenticate with a client certificate
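
The minimal configuration file targets Microsoft as the IdP. For a different provider, its comments suggest setting `implementation: Basic` and using `token-url` instead of `tenant`. A minimal sketch under that assumption follows. The environment variable `COGNITE_TOKEN_URL` is a hypothetical name, and `implementation` appears only in the minimal file's comments, not in the table above.

```yaml
cognite:
    project: "${COGNITE_PROJECT}"
    idp-authentication:
        # Taken from the comment in the minimal configuration file
        implementation: Basic
        # Hypothetical environment variable holding the token endpoint
        token-url: "${COGNITE_TOKEN_URL}"
        client-id: "${COGNITE_CLIENT_ID}"
        secret: "${COGNITE_CLIENT_SECRET}"
        scopes:
          - "${COGNITE_SCOPE}"
```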

`certificate`

Part of idp-authentication configuration.

Authenticate with a client certificate

Parameter	Type	Description
`authority-url`	string	Authentication authority URL
`path`	string	Required. Enter the path to the .pem or .pfx certificate to be used for authentication
`password`	string	Enter the password for the key file, if it is encrypted.
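
A sketch of certificate-based authentication might look like the following. The certificate path and the `CERT_PASSWORD` environment variable are hypothetical, and it is an assumption that `tenant` and `client-id` are still supplied alongside the certificate.

```yaml
cognite:
    idp-authentication:
        tenant: "${COGNITE_TENANT_ID}"
        client-id: "${COGNITE_CLIENT_ID}"
        certificate:
            # Hypothetical path to a .pem or .pfx file
            path: "certs/extractor.pfx"
            # Only needed if the key file is encrypted
            password: "${CERT_PASSWORD}"
        scopes:
          - "${COGNITE_SCOPE}"
```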

`cdf-retries`

Part of cognite configuration.

Configure automatic retries on requests to CDF.

Parameter	Type	Description
`timeout`	integer	Timeout in milliseconds for each individual request to CDF. Default value is `80000`.
`max-retries`	integer	Maximum number of retries on requests to CDF. If this is less than 0, retry forever. Default value is `5`.
`max-delay`	integer	Max delay in milliseconds between each retry. Base delay is calculated according to 125*2^retry milliseconds. If less than 0, there is no maximum. Default value is `5000`.
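
To illustrate how these parameters interact, the sketch below raises the retry count and the delay cap. The values are illustrative only. With the stated formula of 125*2^retry milliseconds, 125*2^6 is 8000 and 125*2^7 is 16000, so a cap of 10000 starts to apply once the exponent reaches 7.

```yaml
cognite:
    cdf-retries:
        # Per-request timeout in milliseconds (the documented default)
        timeout: 80000
        # Illustrative value; a value below 0 retries forever
        max-retries: 10
        # Illustrative cap in milliseconds on the delay between retries
        max-delay: 10000
```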

`cdf-chunking`

Part of cognite configuration.

Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself

Parameter	Type	Description
`time-series`	integer	Maximum number of timeseries per get/create timeseries request. Default value is `1000`.
`assets`	integer	Maximum number of assets per get/create assets request. Default value is `1000`.
`data-point-time-series`	integer	Maximum number of timeseries per datapoint create request. Default value is `10000`.
`data-point-delete`	integer	Maximum number of ranges per delete datapoints request. Default value is `10000`.
`data-point-list`	integer	Maximum number of timeseries per datapoint read request. Used when getting the first point in a timeseries. Default value is `100`.
`data-points`	integer	Maximum number of datapoints per datapoints create request. Default value is `100000`.
`data-points-gzip-limit`	integer	Minimum number of datapoints in request to switch to using gzip. Set to -1 to disable, and 0 to always enable (not recommended). The minimum HTTP packet size is generally 1500 bytes, so this should never be set below 100 for numeric datapoints. Even for larger packages gzip is efficient enough that packages are compressed below 1500 bytes. At 5000 it is always a performance gain. It can be set lower if bandwidth is a major issue. Default value is `5000`.
`raw-rows`	integer	Maximum number of rows per request to cdf raw. Default value is `10000`.
`raw-rows-delete`	integer	Maximum number of row keys per delete request to raw. Default value is `1000`.
`data-point-latest`	integer	Maximum number of timeseries per datapoint read latest request. Default value is `100`.
`events`	integer	Maximum number of events per get/create events request. Default value is `1000`.
`sequences`	integer	Maximum number of sequences per get/create sequences request. Default value is `1000`.
`sequence-row-sequences`	integer	Maximum number of sequences per create sequence rows request. Default value is `1000`.
`sequence-rows`	integer	Maximum number of sequence rows per sequence when creating rows. Default value is `10000`.

`cdf-throttling`

Part of cognite configuration.

Configure the maximum number of parallel requests for different CDF resources.

Parameter	Type	Description
`time-series`	integer	Maximum number of parallel requests per timeseries operation. Default value is `20`.
`assets`	integer	Maximum number of parallel requests per assets operation. Default value is `20`.
`data-points`	integer	Maximum number of parallel requests per datapoints operation. Default value is `10`.
`raw`	integer	Maximum number of parallel requests per raw operation. Default value is `10`.
`ranges`	integer	Maximum number of parallel requests per get first/last datapoint operation. Default value is `20`.
`events`	integer	Maximum number of parallel requests per events operation. Default value is `20`.
`sequences`	integer	Maximum number of parallel requests per sequences operation. Default value is `10`.
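
Since this extractor writes to CDF RAW, the RAW-related chunking and throttling parameters are likely the most relevant ones. The following sketch lowers both below their defaults, which may help on a constrained network. The specific values are illustrative assumptions, not recommendations from this documentation.

```yaml
cognite:
    cdf-chunking:
        # Default is 10000 rows per request
        raw-rows: 5000
    cdf-throttling:
        # Default is 10 parallel requests
        raw: 5
```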

`sdk-logging`

Part of cognite configuration.

Configure logging of requests from the SDK

Parameter	Type	Description
`disable`	boolean	Set to `true` to disable logging from the SDK. Logging is enabled by default.
`level`	either `trace`, `debug`, `information`, `warning`, `error`, `critical` or `none`	Log level to log messages from the SDK at. Default value is `debug`.
`format`	string	Format of the log message. Default value is `CDF ({Message}): {HttpMethod} {Url} {ResponseHeader[X-Request-ID]} - {Elapsed} ms`.
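
As a sketch, SDK request logging could be kept enabled but emitted at a different level than the default `debug`. The choice of `information` here is only an example.

```yaml
cognite:
    sdk-logging:
        disable: false
        # One of trace, debug, information, warning, error, critical, none
        level: information
```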

`extraction-pipeline`

Part of cognite configuration.

Configure an associated extraction pipeline

Parameter	Type	Description
`external-id`	string	External ID of the extraction pipeline
`frequency`	integer	Frequency to report `Seen` to the extraction pipeline in seconds. Less than or equal to zero will not report automatically. Default value is `600`.
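
A minimal sketch of linking the extractor to an extraction pipeline is shown below. The external ID `piaf-extractor-pipeline` is a hypothetical placeholder for the external ID you noted when setting up the pipeline.

```yaml
cognite:
    extraction-pipeline:
        # Hypothetical external ID of an existing extraction pipeline
        external-id: "piaf-extractor-pipeline"
        # Report Seen every 300 seconds instead of the default 600
        frequency: 300
```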

`certificates`

Part of cognite configuration.

Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems

Parameter	Type	Description
`accept-all`	boolean	Accept all remote SSL certificates. This introduces a severe risk of man-in-the-middle attacks
`allow-list`	list	List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates

`allow-list`

Part of certificates configuration.

List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates

Each element of this list should be a string.
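
For illustration, an allow list with a single entry could be written as follows. The thumbprint is a placeholder to be replaced with the real thumbprint of the certificate you want to accept.

```yaml
cognite:
    certificates:
        # Keep this disabled; accepting all certificates is a severe risk
        accept-all: false
        allow-list:
          # Placeholder, not a real thumbprint
          - "<certificate-thumbprint>"
```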

`extraction`

Global parameter.

Include the extraction section to configure how to extract data from PI AF.

Parameter	Type	Description
`elements`	object	Configuration for extraction PI AF Elements
`update-period`	string	Enter the time between each time the extractor reads update events from the PI AF server. This is used to partially refresh the PI AF elements to get newly created elements, or any changes to attribute values. Format is as given in Intervals. For instance, `2h` means incremental updates run every other hour, starting at extractor startup. The extractor won't read updates at all if you set this parameter to `0` or a negative value. If both this parameter and `refresh-period` are set to `0` or a negative value, the extractor quits after reading all elements or after hitting the limit set in `elements.limit`. Default value is `0s`.
`keep-alive`	string	Time between each time the extractor checks for changes to system and database status. This serves as a kind of keep alive, which may be necessary in some cases where the connection is timed out by an external mechanism. Format is as given in Intervals. If this is 0 or negative, the extractor will not make keep alive requests. Default value is `5m`.
`refresh-period`	string	Time between each time the extractor performs a full refresh, reading all data from the PI AF server. Format is as given in Intervals. If this is 0 or negative, the extractor will only read all data on startup. Default value is `0s`.
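
The three periods above can be combined for a continuously running extractor. The following is a sketch with illustrative values, using the syntax described in Intervals.

```yaml
extraction:
    # Read update events every other hour
    update-period: 2h
    # Check system and database status every 5 minutes (the default)
    keep-alive: 5m
    # Perform a full refresh once a day
    refresh-period: 1d
```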

`elements`

Part of extraction configuration.

Configuration for extraction PI AF Elements

Parameter	Type	Description
`chunk`	integer	Insert the maximum number of PI AF elements to read per request to PI. These are immediately written to CDF RAW. Default value is `1000`.
`limit`	integer	Insert the total maximum number of PI AF elements to read. Use this to get a reasonable subset of the server for testing. Note that this doesn't work if `extraction.update-period` is configured.
`query`	string	Insert the string query, see Aveva documentation
`flatten-attributes`	boolean	True to flatten attributes into a separate Raw table. If `false`, all attributes belonging to an element will be extracted as fields on elements in the elements table
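
For a one-off test run against a subset of the server, a sketch like the following could be used. Because `limit` doesn't work when `extraction.update-period` is configured, both periods are left at `0s`, in which case the extractor quits after reading the elements. The limit and chunk values are illustrative.

```yaml
extraction:
    update-period: 0s
    refresh-period: 0s
    elements:
        # Read at most 500 elements in total, 100 per request to PI
        limit: 500
        chunk: 100
```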

`destination`

Global parameter.

Include the destination section to configure the destination for the extracted data. Currently this is only the CDF staging area (RAW).

Parameter	Type	Description
`database`	string	Insert the CDF RAW database to extract data to. If no database exists, the extractor creates a database. Default value is `piaf`.
`elements-table`	string	Enter the table name for the PI AF elements in the CDF RAW database. If no table exists, the extractor creates a table. Default value is `elements`.
`unit-of-measure-classes-table`	string	Enter the table name for unit-of-measure classes in the CDF RAW database. If no table exists, the extractor creates a table. Default value is `unit-of-measure-classes`.
`attributes-table`	string	Enter the table name for the attributes in the CDF RAW database to be used if `elements.flatten-attributes` is set to `true`. If no table exists, the extractor creates a table. Default value is `attributes`.
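
The sketch below shows how `attributes-table` relates to `elements.flatten-attributes`. The table names are the documented defaults. Change them if you need the data to land somewhere unique in RAW.

```yaml
extraction:
    elements:
        # Write attributes to their own RAW table
        flatten-attributes: true

destination:
    database: piaf
    elements-table: elements
    unit-of-measure-classes-table: unit-of-measure-classes
    # Only used when flatten-attributes is true
    attributes-table: attributes
```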

`metrics`

Global parameter.

Configuration for publishing metrics.

Parameter	Type	Description
`server`	object	Configuration for having the extractor start a Prometheus scrape server on a local port.
`push-gateways`	list	A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.

`server`

Part of metrics configuration.

Configuration for having the extractor start a Prometheus scrape server on a local port.

Parameter	Type	Description
`host`	string	Required. Host name for local Prometheus server, must be exposed to some Prometheus instance for scraping. Examples: `localhost`, `0.0.0.0`.
`port`	integer	Required. The port used for a local Prometheus server.

`push-gateways`

Part of metrics configuration.

A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.

Parameter	Type	Description
`host`	string	Required. URI of the pushgateway host. Example: `http://my.pushgateway:9091`.
`job`	string	Required. Name of the Prometheus pushgateway job.
`username`	string	Username for basic authentication
`password`	string	Password for basic authentication
`push-interval`	integer	Interval in seconds between each push to the gateway. Default value is `1`.
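
A sketch of a `metrics` section that uses both mechanisms is shown below. The port, the job name, and the `PUSHGATEWAY_USERNAME` and `PUSHGATEWAY_PASSWORD` environment variables are hypothetical. In practice you would likely configure only the mechanism you need.

```yaml
metrics:
    server:
        host: "localhost"
        # Hypothetical port for the local scrape server
        port: 9000
    push-gateways:
        - host: "http://my.pushgateway:9091"
          # Hypothetical job name
          job: "piaf-extractor"
          # Optional basic authentication, hypothetical variable names
          username: "${PUSHGATEWAY_USERNAME}"
          password: "${PUSHGATEWAY_PASSWORD}"
          # Push every 30 seconds instead of the default 1
          push-interval: 30
```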

`logger`

Global parameter.

Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.

Parameter	Type	Description
`console`	object	Configuration for logging to the console.
`file`	object	Configuration for logging to a rotating log file.
`trace-listener`	object	Adds a listener that uses the configured logger to output messages from `System.Diagnostics.Trace`

`console`

Part of logger configuration.

Configuration for logging to the console.

Parameter	Type	Description
`level`	either `verbose`, `debug`, `information`, `warning`, `error` or `fatal`	Required. Minimum level of log events to write to the console. If not present, or invalid, logging to console is disabled.
`stderr-level`	either `verbose`, `debug`, `information`, `warning`, `error` or `fatal`	Log events at this level or above are redirected to standard error.

`file`

Part of logger configuration.

Configuration for logging to a rotating log file.

Parameter	Type	Description
`level`	either `verbose`, `debug`, `information`, `warning`, `error` or `fatal`	Required. Minimum level of log events to write to file.
`path`	string	Required. Path to the files to be logged. If this is set to `logs/log.txt`, logs of the form `logs/log[date].txt` will be created, depending on `rolling-interval`.
`retention-limit`	integer	Maximum number of log files that are kept in the log folder. Default value is `31`.
`rolling-interval`	either `day` or `hour`	Rolling interval for log files. Default value is `day`.

`trace-listener`

Part of logger configuration.

Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace

Parameter	Type	Description
`level`	either `verbose`, `debug`, `information`, `warning`, `error` or `fatal`	Required. Level to output trace messages at
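
Putting the three logger sinks together, a fuller `logger` section than the one in the minimal configuration file might look like this sketch. All values are illustrative.

```yaml
logger:
    console:
        level: "information"
        # Send errors and fatal messages to standard error
        stderr-level: "error"
    file:
        level: "debug"
        path: "logs/log.txt"
        # Keep at most 7 log files, starting a new file every hour
        retention-limit: 7
        rolling-interval: "hour"
    trace-listener:
        level: "warning"
```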
