# Configuration settings
To configure the PI extractor, you must create a configuration file in YAML format. The file is split into sections, each represented by a top-level entry, with subsections nested under their parent section.
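For example, in the minimal configuration shown later in this section, `cognite` is a section and `idp-authentication` is a subsection nested under it:

```yaml
cognite:               # section: a top-level entry
  idp-authentication:  # subsection: nested under its parent section
    tenant: ${COGNITE_TENANT_ID}
```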
You can use either of the sample configuration files included with the installer, complete or minimal, as a starting point for your configuration settings:

- `config.default.yml` - This file contains all configuration options and descriptions.
- `config.minimal.yml` - This file contains a minimum configuration and no descriptions.
You must name the configuration file `config.yml`.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
## Before you start
- Optionally, copy one of the sample files in the `config` directory and rename it to `config.yml`.
- The `config.minimal.yml` file doesn't include a `metrics` section. Copy this section from the example below if the extractor is required to send metrics to a Prometheus Pushgateway.
- Set up an extraction pipeline and note the external ID.
## Minimal YAML configuration file
The YAML settings below contain a valid PI extractor 2.1 configuration. Values wrapped in `${}` are replaced with the environment variable of that name. For example, `${COGNITE_PROJECT}` will be replaced with the value of the environment variable called `COGNITE_PROJECT`.
The configuration file has a global parameter, `version`, which holds the version of the configuration schema used in the configuration file. This document describes version 3 of the configuration schema.
```yaml
version: 3

cognite:
  project: '${COGNITE_PROJECT}'
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_SCOPE}

time-series:
  external-id-prefix: 'pi:'

pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: ${PI_PASSWORD}

state-store:
  database: LiteDb
  location: 'state.db'

logger:
  file:
    level: 'information'
    path: 'logs/log.log'

metrics:
  push-gateways:
    - host: 'https://prometheus-push.cognite.ai/'
      username: ${PROMETHEUS_USER}
      password: ${PROMETHEUS_PASSWORD}
      job: ${PROMETHEUS_JOB}
```
where:
- `version` is the version of the configuration schema. Use version 3 to be compatible with the Cognite PI extractor 2.1.
- `cognite` is how the extractor reads the authentication details for PI (`pi`) and CDF (`cognite`) from environment variables. Since no host is specified in the `cognite` section, the extractor uses the default value, https://api.cognitedata.com, and assumes that the PI server uses Windows authentication.
- `time-series` configures the extractor to create time series in CDF where the external IDs will be prefixed with `pi:`. You can also use a data set ID configuration to add all time series created by the extractor to a particular data set, as shown in the sketch after this list.
- `state-store` configures the extractor to save the extraction state locally using a LiteDB database file named `state.db`.
- `logger` configures the extractor to log at information level and output log messages to the file `logs/log.log`. By default, new files are created daily and retained for 31 days. The date is appended to the file name.
- `metrics` points to the Prometheus Pushgateway hosted by CDF. It assumes that a user has already been created. For the Pushgateway hosted by CDF, the job name (`${PROMETHEUS_JOB}`) must start with the username followed by `-`.
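As a sketch of the data set option, the snippet below extends the `time-series` section; the `data-set-id` parameter name is an assumption here, so check `config.default.yml` for the exact name in your extractor version:

```yaml
time-series:
  external-id-prefix: 'pi:'
  data-set-id: 1234567890123456  # assumed parameter name; adds all created time series to this data set
```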
## Using values from Azure Key Vault
The PI extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the `!keyvault` tag followed by the name of the secret you want to load. For example, to load the value of the `my-secret-name` secret in Key Vault into a `password` parameter, configure your extractor like this:
```yaml
password: !keyvault my-secret-name
```
To use Key Vault, you also need to include the `azure-keyvault` section in your configuration, with the following parameters:
| Parameter | Description |
|---|---|
| `keyvault-name` | Name of the Key Vault to load secrets from. |
| `authentication-method` | How to authenticate to Azure. Either `default` or `client-secret`. For `default`, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For `client-secret`, the extractor will authenticate with a configured client ID/secret pair. |
| `client-id` | Required for the `client-secret` authentication method. The client ID to use when authenticating to Azure. |
| `secret` | Required for the `client-secret` authentication method. The client secret to use when authenticating to Azure. |
| `tenant-id` | Required for the `client-secret` authentication method. The tenant ID of the Key Vault in Azure. |
Example:
```yaml
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd
```
## Intervals
In most places where time intervals are required, you can use a CDF-like syntax of `[N][timeunit]`, for example, `10m` for 10 minutes or `1h` for 1 hour. `timeunit` is one of `d`, `h`, `m`, `s`, or `ms`. You can also use a cron expression where that makes sense.
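As an illustration, the values below show the accepted interval forms; the parameter names are placeholders, not actual extractor settings:

```yaml
# Placeholder parameter names; only the values illustrate the syntax.
some-interval: 10m            # [N][timeunit]: 10 minutes
another-interval: 1h          # 1 hour
short-interval: 500ms         # 500 milliseconds
cron-schedule: '0 */2 * * *'  # cron expression: every 2 hours, where supported
```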
## Configure the PI extractor
Configuration for the PI extractor. Each section configures a different aspect of the extractor.
| Parameter | Type | Description |
|---|---|---|
| `version` | integer | Version of the config file. Each version of the extractor specifies which config file versions it accepts. |
| `logger` | object | Configuration for logging to console or file. Log entries are either `Fatal`, `Error`, `Warning`, `Information`, `Debug`, or `Verbose`, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink. |
| `metrics` | object | Configuration for publishing metrics. |
| `cognite` | object | Configure the connection to Cognite Data Fusion (CDF). |
| `state-store` | object | Use a local LiteDb database or a set of tables in CDF RAW to store persistent information between runs. This can be used to avoid loading large volumes of data from CDF on startup, which can greatly speed up the extractor. |
| `pi` | object | Configure the extractor to connect to a particular PI server or PI collective. If you configure the extractor with a PI collective, the extractor will transparently maintain a connection to one of the active servers in the collective. The default settings provide Active Directory authorization to the PI host when the server account for the Windows service is authorized. |
| `time-series` | object | Include the `time-series` section for configuration related to the time series ingested by the extractor. This section is optional. |
| `events` | object | Include the `events` section for configuration related to writing events on extractor incidents. This section is optional. If configured and `store-extractor-events-interval` is greater than zero, the PI extractor creates events for extractor reconnection and PI data pipe loss incidents. Reconnection and data loss may cause the extractor to miss some historical data point updates. Logging these as events in CDF provides a way of inspecting data quality. When combined with the PI replace utility, you can use these events to correct data inconsistencies in CDF. |
| `extractor` | object | The `extractor` section contains various configuration options for the operation of the extractor itself. The options here can be used to extract only a subset of the PI points in the server. The list is created as follows: (1) If `include-tags`, `include-prefixes`, `include-patterns`, or `include-attribute-values` are not empty, start with the union of these filters. Otherwise, start with all points. (2) Remove points as specified by `exclude-tags`, `exclude-prefixes`, `exclude-patterns`, and `exclude-attribute-values`. See the sketch after this table. |
| `backfill` | object | Include the `backfill` section to configure how the extractor fills in historical data back in time with respect to the first data point in CDF. The backfill process completes when all the data points in the PI Data Archive are sent to CDF, or, if the `to` parameter is set, when the extractor reaches the target timestamp for all time series. |
| `frontfill` | object | Include the `frontfill` section to configure how the extractor fills in historical data forward in time with respect to the last data point in CDF. At startup, the extractor fills in the gap between the last data point in CDF and the last data point in PI by querying the archived data in the PI Data Archive. After that, the extractor only receives data streamed through the PI Data Pipe. These are real-time changes made to the time series in PI before archiving. |
| `high-availability` | object | Configuration for a Redis-based high availability store. Requires Redis to be configured in `state-store`. |
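To illustrate the filtering steps described for the `extractor` section, here is a hedged sketch; the filter parameter names come from the description above, while the tag values are invented for illustration:

```yaml
extractor:
  # Step 1: start with the union of the include filters
  # (if all include filters are empty, start with all PI points).
  include-prefixes:
    - 'plant1.'            # invented tag prefix
  include-patterns:
    - '*.temperature'      # invented wildcard pattern
  # Step 2: remove points matched by the exclude filters.
  exclude-tags:
    - 'plant1.test-tag'    # invented exact tag name
```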