Configuration settings
To configure the PI AF extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.
You can leave many fields empty to let the extractor use the default values. The configuration file separates the settings by component, and you can remove an entire component to disable it or use the default values.
Sample configuration files
In the extractor installation folder, the /config subfolder contains sample complete and minimal configuration files. Values wrapped in ${} are replaced with the environment variable of that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.
The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.
Note that we don't recommend using config.example.yml as a basis for configuration files. This file contains all configuration options, which makes it hard to read and may cause issues. It is intended as a reference showing how each option is configured, not as a starting point. Use config.minimal.yml instead.
You must name the configuration file config.yml.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Before you start
- Optionally, copy one of the sample files in the config directory and rename it to config.yml.
- The config.minimal.yml file doesn't include a metrics section. Copy this section from the example below if the extractor is required to send metrics to a Prometheus Pushgateway.
- Set up an extraction pipeline and note the external ID.
Minimal YAML configuration file
version: 1
pi:
  # Required, host for the PI server
  host: "${PI_HOST}"
  # Windows username on the PI server
  username: "${PI_USERNAME}"
  # Windows password on the PI server
  password: "${PI_PASSWORD}"
destination:
  # Change these to make sure you get the data somewhere unique in Raw
  database: piaf
  elements-table: elements
cognite:
  # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
  project: "${COGNITE_PROJECT}"
  # This is for microsoft as IdP, to use a different provider,
  # set implementation: Basic, and use token-url instead of tenant.
  # See the example config for the full list of options.
  idp-authentication:
    # Directory tenant
    tenant: ${COGNITE_TENANT_ID}
    # Application Id
    client-id: ${COGNITE_CLIENT_ID}
    # Client secret
    secret: ${COGNITE_CLIENT_SECRET}
    # List of resource scopes, ex:
    # scopes:
    #   - scopeA
    #   - scopeB
    scopes:
      - ${COGNITE_SCOPE}
logger:
  console:
    level: "information"
  file:
    level: "debug"
    path: "logs/log.txt"
Intervals
In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use a cron expression in some places.
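As an illustration of the interval syntax, the fragment below sets the three interval-valued parameters of the extraction section described later in this document. The values are arbitrary examples chosen to show different time units, not recommendations.

```yaml
extraction:
  # Read update events from the PI AF server every 10 minutes
  update-period: 10m
  # Check system and database status every 5 minutes (the documented default)
  keep-alive: 5m
  # Perform a full refresh once a day
  refresh-period: 1d
```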
Using values from Azure Key Vault
The PI AF extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:
password: !keyvault my-secret-name
To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:
Parameter | Description |
---|---|
keyvault-name | Name of Key Vault to load secrets from |
authentication-method | How to authenticate to Azure. Either default or client-secret. For default, the extractor checks the user running the extractor and looks for preconfigured Azure logins from tools like the Azure CLI. For client-secret, the extractor authenticates with a configured client ID/secret pair. |
client-id | Required for using the client-secret authentication method. The client ID to use when authenticating to Azure. |
secret | Required for using the client-secret authentication method. The client secret to use when authenticating to Azure. |
tenant-id | Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |
Example:
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd
Configure the PI AF extractor
PI AF Extractor configuration
Parameter | Type | Description |
---|---|---|
version | integer | Version of the configuration file. Each version of the extractor specifies which configuration file versions it accepts. |
pi | object | Include the pi section to configure the connection to the PI AF system. |
cognite | object | Configure connection to Cognite Data Fusion (CDF) |
extraction | object | Include the extraction section to configure how to extract data from PI AF. |
destination | object | Include the destination section to configure the destination for the extracted data. Currently this is only the CDF staging area (RAW). |
metrics | object | Configuration for publishing metrics. |
logger | object | Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink. |
pi
Global parameter.
Include the pi section to configure the connection to the PI AF system.
This is how the PI AF extractor selects a system:
- If you configure system-name, the extractor selects the system by name from the preconfigured list of PI systems on the machine the extractor runs on.
- If you configure host, the extractor selects a PI system running on the PI server's host.
- If you don't configure either of these parameters, the extractor selects the default system on the machine the extractor runs on. If there is no default system, the extractor selects the first system from the preconfigured PI system list on the machine the extractor runs on.
Parameter | Type | Description |
---|---|---|
host | string | Insert the base URL of the PI server's host. If you don't enter any value, you must configure a PI system in the installed SDK on the machine the extractor runs on. |
username | string | Required. Insert the Windows username on the PI server. |
password | string | Required. Insert the Windows password on the PI server. |
system-name | string | Enter the name of the PI system you want to use. This is used instead of host to select a PI system. |
database-name | string | Enter the name of the PI database you want to use. The default value is the default database configured on the machine the extractor runs on, or the first database in the list if no default database is configured. |
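The selection rules above can be sketched as a pi section that picks a preconfigured system by name rather than by host. The system and database names are hypothetical placeholders, and the credentials reuse the environment variables from the minimal configuration file.

```yaml
pi:
  # Select a preconfigured PI system by name; used instead of host
  system-name: "my-pi-system"        # hypothetical name
  # Omit database-name to fall back to the default database on this machine
  database-name: "my-af-database"    # hypothetical name
  username: "${PI_USERNAME}"
  password: "${PI_PASSWORD}"
```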
cognite
Global parameter.
Configure connection to Cognite Data Fusion (CDF)
Parameter | Type | Description |
---|---|---|
project | string | CDF project to connect to. |
idp-authentication | object | The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow. |
host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com . |
cdf-retries | object | Configure automatic retries on requests to CDF. |
cdf-chunking | object | Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself |
cdf-throttling | object | Configure the maximum number of parallel requests for different CDF resources. |
sdk-logging | object | Configure logging of requests from the SDK |
nan-replacement | either number or null | Replacement for NaN values when writing to CDF. If left out, NaN values are skipped. |
extraction-pipeline | object | Configure an associated extraction pipeline |
certificates | object | Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems |
idp-authentication
Part of cognite configuration.
The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory).
See OAuth 2.0 client credentials flow
Parameter | Type | Description |
---|---|---|
authority | string | Insert the authority together with tenant to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/ . |
client-id | string | Required. Enter the service principal client id from the IdP. |
tenant | string | Enter the Azure tenant. |
token-url | string | Insert the URL to fetch tokens from. |
secret | string | Enter the service principal client secret from the IdP. |
resource | string | Resource parameter passed along with token requests. |
audience | string | Audience parameter passed along with token requests. |
scopes | either list or string | List of resource scopes to request, for example scopeA and scopeB. |
min-ttl | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than min-ttl seconds, it will be refreshed even if it is still valid. Default value is 30 . |
certificate | object | Authenticate with a client certificate |
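For an identity provider other than Microsoft, the table suggests using token-url in place of tenant. The following is a minimal sketch under that assumption. The token endpoint is a hypothetical URL, and the comment in the minimal configuration file also mentions an implementation setting that is not listed in this table, so check the example configuration file before relying on this fragment.

```yaml
cognite:
  project: "${COGNITE_PROJECT}"
  idp-authentication:
    # Hypothetical token endpoint of a non-Microsoft IdP
    token-url: "https://idp.example.com/oauth2/token"
    client-id: "${COGNITE_CLIENT_ID}"
    secret: "${COGNITE_CLIENT_SECRET}"
    scopes:
      - "${COGNITE_SCOPE}"
    # Refresh cached tokens that expire in less than 60 seconds (default 30)
    min-ttl: 60
```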
certificate
Part of idp-authentication configuration.
Authenticate with a client certificate
Parameter | Type | Description |
---|---|---|
authority-url | string | Authentication authority URL |
path | string | Required. Enter the path to the .pem or .pfx certificate to be used for authentication |
password | string | Enter the password for the key file, if it is encrypted. |
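A sketch of certificate-based authentication using only the parameters listed above. The authority URL, certificate path, and the CERT_PASSWORD environment variable are hypothetical, and which other idp-authentication parameters must accompany a certificate is an assumption here.

```yaml
cognite:
  project: "${COGNITE_PROJECT}"
  idp-authentication:
    client-id: "${COGNITE_CLIENT_ID}"
    scopes:
      - "${COGNITE_SCOPE}"
    certificate:
      # Hypothetical authority URL
      authority-url: "https://login.example.com/my-tenant"
      # Hypothetical path to a .pem or .pfx certificate
      path: "certs/extractor.pem"
      # Only needed if the key file is encrypted
      password: "${CERT_PASSWORD}"
```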
cdf-retries
Part of cognite configuration.
Configure automatic retries on requests to CDF.
Parameter | Type | Description |
---|---|---|
timeout | integer | Timeout in milliseconds for each individual request to CDF. Default value is 80000 . |
max-retries | integer | Maximum number of retries on requests to CDF. If this is less than 0, retry forever. Default value is 5 . |
max-delay | integer | Max delay in milliseconds between each retry. Base delay is calculated according to 125*2^retry milliseconds. If less than 0, there is no maximum. Default value is 5000 . |
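To make the delay formula concrete: with a base delay of 125*2^retry milliseconds, retry 3 gives 125*8 = 1000 ms and retry 7 gives 125*128 = 16000 ms, which max-delay would then cap. The fragment below is an illustrative sketch with arbitrary values. Whether the retry counter starts at 0 or 1 is not stated in this document.

```yaml
cognite:
  cdf-retries:
    # 30 seconds per request instead of the default 80000 ms
    timeout: 30000
    # Retry up to 10 times (default 5); a negative value retries forever
    max-retries: 10
    # Cap the delay between retries at 10 seconds (default 5000 ms)
    max-delay: 10000
```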
cdf-chunking
Part of cognite configuration.
Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself
Parameter | Type | Description |
---|---|---|
time-series | integer | Maximum number of timeseries per get/create timeseries request. Default value is 1000 . |
assets | integer | Maximum number of assets per get/create assets request. Default value is 1000 . |
data-point-time-series | integer | Maximum number of timeseries per datapoint create request. Default value is 10000 . |
data-point-delete | integer | Maximum number of ranges per delete datapoints request. Default value is 10000 . |
data-point-list | integer | Maximum number of timeseries per datapoint read request. Used when getting the first point in a timeseries. Default value is 100 . |
data-points | integer | Maximum number of datapoints per datapoints create request. Default value is 100000 . |
data-points-gzip-limit | integer | Minimum number of datapoints in request to switch to using gzip. Set to -1 to disable, and 0 to always enable (not recommended). The minimum HTTP packet size is generally 1500 bytes, so this should never be set below 100 for numeric datapoints. Even for larger packages gzip is efficient enough that packages are compressed below 1500 bytes. At 5000 it is always a performance gain. It can be set lower if bandwidth is a major issue. Default value is 5000 . |
raw-rows | integer | Maximum number of rows per request to cdf raw. Default value is 10000 . |
raw-rows-delete | integer | Maximum number of row keys per delete request to raw. Default value is 1000 . |
data-point-latest | integer | Maximum number of timeseries per datapoint read latest request. Default value is 100 . |
events | integer | Maximum number of events per get/create events request. Default value is 1000 . |
sequences | integer | Maximum number of sequences per get/create sequences request. Default value is 1000 . |
sequence-row-sequences | integer | Maximum number of sequences per create sequence rows request. Default value is 1000 . |
sequence-rows | integer | Maximum number of sequence rows per sequence when creating rows. Default value is 10000 . |
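Since this extractor writes to CDF RAW, the RAW chunk sizes are the most likely candidates for tuning. A sketch with arbitrary values that lower them below the documented defaults:

```yaml
cognite:
  cdf-chunking:
    # Write at most 5000 rows per request to CDF RAW (default 10000)
    raw-rows: 5000
    # Delete at most 500 row keys per request (default 1000)
    raw-rows-delete: 500
```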
cdf-throttling
Part of cognite configuration.
Configure the maximum number of parallel requests for different CDF resources.
Parameter | Type | Description |
---|---|---|
time-series | integer | Maximum number of parallel requests per timeseries operation. Default value is 20 . |
assets | integer | Maximum number of parallel requests per assets operation. Default value is 20 . |
data-points | integer | Maximum number of parallel requests per datapoints operation. Default value is 10 . |
raw | integer | Maximum number of parallel requests per raw operation. Default value is 10 . |
ranges | integer | Maximum number of parallel requests per get first/last datapoint operation. Default value is 20 . |
events | integer | Maximum number of parallel requests per events operation. Default value is 20 . |
sequences | integer | Maximum number of parallel requests per sequences operation. Default value is 10 . |
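A short sketch of limiting parallelism for RAW requests. The value is an arbitrary example.

```yaml
cognite:
  cdf-throttling:
    # Allow at most 5 parallel requests per RAW operation (default 10)
    raw: 5
```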
sdk-logging
Part of cognite configuration.
Configure logging of requests from the SDK
Parameter | Type | Description |
---|---|---|
disable | boolean | Set to true to disable logging from the SDK. Logging is enabled by default. |
level | either trace , debug , information , warning , error , critical or none | Log level to log messages from the SDK at. Default value is debug . |
format | string | Format of the log message. Default value is CDF ({Message}): {HttpMethod} {Url} {ResponseHeader[X-Request-ID]} - {Elapsed} ms . |
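The fragment below sketches an sdk-logging section that raises the SDK log level and shortens the message format. The format string reuses only placeholders that appear in the documented default.

```yaml
cognite:
  sdk-logging:
    # Keep SDK logging enabled (this is the default)
    disable: false
    # Log SDK requests at information level instead of the default debug
    level: information
    # Shortened variant of the default format, without the request ID
    format: "CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms"
```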
extraction-pipeline
Part of cognite configuration.
Configure an associated extraction pipeline
Parameter | Type | Description |
---|---|---|
external-id | string | External ID of the extraction pipeline |
frequency | integer | Frequency to report Seen to the extraction pipeline in seconds. Less than or equal to zero will not report automatically. Default value is 600 . |
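A sketch of linking the extractor to an extraction pipeline. The external ID is a hypothetical placeholder for the one you noted when setting up the pipeline.

```yaml
cognite:
  extraction-pipeline:
    # Hypothetical external ID of an existing extraction pipeline
    external-id: "piaf-extractor-pipeline"
    # Report Seen every 5 minutes (default 600 seconds)
    frequency: 300
```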
certificates
Part of cognite configuration.
Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems
Parameter | Type | Description |
---|---|---|
accept-all | boolean | Accept all remote SSL certificates. This introduces a severe risk of man-in-the-middle attacks |
allow-list | list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates |
allow-list
Part of certificates configuration.
List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates
Each element of this list should be a string.
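As a sketch, an allow-list with a single entry could look like this. The thumbprint is a made-up placeholder and must be replaced with the thumbprint of the certificate you actually want to accept.

```yaml
cognite:
  certificates:
    # Keep this false; accepting all certificates risks man-in-the-middle attacks
    accept-all: false
    allow-list:
      # Hypothetical certificate thumbprint
      - "0123456789ABCDEF0123456789ABCDEF01234567"
```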
extraction
Global parameter.
Include the extraction section to configure how to extract data from PI AF.
Parameter | Type | Description |
---|---|---|
elements | object | Configuration for extracting PI AF elements. |
update-period | string | Enter the interval at which the extractor reads update events from the PI AF server. This is used to partially refresh the PI AF elements to pick up newly created elements or changes to attribute values. Format is as given in Intervals. For instance, 2h means incremental updates run every two hours, starting at extractor startup. The extractor won't read updates at all if you set this parameter to 0 or a negative value. If both this parameter and refresh-period are set to 0 or a negative value, the extractor quits after reading all elements or after hitting the limit set in elements.limit. Default value is 0s. |
keep-alive | string | Time between each time the extractor checks for changes to system and database status. This serves as a kind of keep alive, which may be necessary in some cases where the connection is timed out by an external mechanism. Format is as given in Intervals. If this is 0 or negative, the extractor will not make keep alive requests. Default value is 5m . |
refresh-period | string | Time between each time the extractor performs a full refresh, reading all data from the PI AF server. Format is as given in Intervals. If this is 0 or negative, the extractor will only read all data on startup. Default value is 0s . |
elements
Part of extraction configuration.
Configuration for extracting PI AF elements.
Parameter | Type | Description |
---|---|---|
chunk | integer | Insert the maximum number of PI AF elements to read per request to PI. These are immediately written to CDF RAW. Default value is 1000 . |
limit | integer | Insert the total maximum number of PI AF elements to read. Use this to get a reasonable subset of the server for testing. Note that this doesn't work if extraction.update-period is configured. |
query | string | Insert the query string. See the AVEVA documentation. |
flatten-attributes | boolean | Set to true to flatten attributes into a separate RAW table. If false, all attributes belonging to an element are extracted as fields on that element in the elements table. |
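For a test run against a subset of the server, a configuration along these lines may be useful. It leaves update-period and refresh-period at their defaults, because the table notes that limit doesn't work when extraction.update-period is configured. With both periods at 0s the extractor quits after reading. The values are arbitrary.

```yaml
extraction:
  elements:
    # Read 500 elements per request to PI (default 1000)
    chunk: 500
    # Stop after 10000 elements in total
    limit: 10000
    # Write attributes to a separate RAW table
    flatten-attributes: true
```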
destination
Global parameter.
Include the destination section to configure the destination for the extracted data. Currently this is only the CDF staging area (RAW).
Parameter | Type | Description |
---|---|---|
database | string | Insert the CDF RAW database to extract data to. If no database exists, the extractor creates a database. Default value is piaf . |
elements-table | string | Enter the table name for the PI AF elements in the CDF RAW database. If no table exists, the extractor creates a table. Default value is elements . |
unit-of-measure-classes-table | string | Enter the table name for unit-of-measure classes in the CDF RAW database. If no table exists, the extractor creates a table. Default value is unit-of-measure-classes . |
attributes-table | string | Enter the table name for the attributes in the CDF RAW database to be used if elements.flatten-attributes is set to true . If no table exists, the extractor creates a table. Default value is attributes . |
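A sketch of a destination section that names every table explicitly. The database name is a hypothetical example and the table names are the documented defaults.

```yaml
destination:
  # Hypothetical database name; created by the extractor if it doesn't exist
  database: piaf-test
  elements-table: elements
  unit-of-measure-classes-table: unit-of-measure-classes
  # Only used when extraction.elements.flatten-attributes is true
  attributes-table: attributes
```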
metrics
Global parameter.
Configuration for publishing metrics.
Parameter | Type | Description |
---|---|---|
server | object | Configuration for having the extractor start a Prometheus scrape server on a local port. |
push-gateways | list | A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these. |
server
Part of metrics configuration.
Configuration for having the extractor start a Prometheus scrape server on a local port.
Parameter | Type | Description |
---|---|---|
host | string | Required. Host name for the local Prometheus server. It must be reachable by a Prometheus instance for scraping. Examples: localhost, 0.0.0.0 |
port | integer | Required. The port used for a local Prometheus server. |
push-gateways
Part of metrics configuration.
A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.
Parameter | Type | Description |
---|---|---|
host | string | Required. URI of the pushgateway host. Example: http://my.pushgateway:9091 |
job | string | Required. Name of the Prometheus pushgateway job. |
username | string | Username for basic authentication |
password | string | Password for basic authentication |
push-interval | integer | Interval in seconds between each push to the gateway. Default value is 1 . |
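A minimal sketch of a metrics section combining a local scrape server with one push gateway. The port, job name, and the PUSHGATEWAY_USERNAME and PUSHGATEWAY_PASSWORD environment variables are hypothetical. The gateway URL is the example from the table above.

```yaml
metrics:
  server:
    host: "localhost"
    # Hypothetical port for the local scrape server
    port: 9000
  push-gateways:
    - host: "http://my.pushgateway:9091"
      # Hypothetical job name
      job: "piaf-extractor"
      # Basic authentication is optional
      username: "${PUSHGATEWAY_USERNAME}"
      password: "${PUSHGATEWAY_PASSWORD}"
      # Push every 10 seconds (default 1)
      push-interval: 10
```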
logger
Global parameter.
Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.
Parameter | Type | Description |
---|---|---|
console | object | Configuration for logging to the console. |
file | object | Configuration for logging to a rotating log file. |
trace-listener | object | Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace |
console
Part of logger configuration.
Configuration for logging to the console.
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Minimum level of log events to write to the console. If not present, or invalid, logging to console is disabled. |
stderr-level | either verbose , debug , information , warning , error or fatal | Log events at this level or above are redirected to standard error. |
file
Part of logger configuration.
Configuration for logging to a rotating log file.
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Minimum level of log events to write to file. |
path | string | Required. Path to the log files. If this is set to logs/log.txt, log files of the form logs/log[date].txt are created, depending on rolling-interval. |
retention-limit | integer | Maximum number of log files that are kept in the log folder. Default value is 31 . |
rolling-interval | either day or hour | Rolling interval for log files. Default value is day . |
trace-listener
Part of logger configuration.
Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Level to output trace messages at |
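Putting the three logger sinks together, a sketch could look like the following. It extends the logger section of the minimal configuration file with arbitrary example values for the optional parameters.

```yaml
logger:
  console:
    level: "information"
    # Send error and fatal messages to standard error
    stderr-level: "error"
  file:
    level: "debug"
    path: "logs/log.txt"
    # Keep one week of daily log files (default 31 files)
    retention-limit: 7
    rolling-interval: "day"
  trace-listener:
    # Output System.Diagnostics.Trace messages at warning level
    level: "warning"
```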