
Configure the PI AF extractor

To configure the PI AF extractor, you must create a configuration file in YAML format. You can use either of the sample configuration files included with the installer as a starting point:

  • config.default.yml - This file contains all configuration options and descriptions.

  • config.minimal.yml - This file contains a minimum configuration and no descriptions.

NAMING THE CONFIGURATION FILE

The configuration file must be named config.yml.

Tip

If the configuration file contains errors, check the Windows event viewer or the extractor logs if you've configured logging. You can also manually check the configuration by starting the extractor from a Windows shell.

Pi

Include the pi section to configure the connection to the PI AF system. This is how the PI AF extractor selects a system:

  • If you configure system-name, the extractor selects the system by name from the preconfigured list of PI systems on the machine the extractor runs on.

  • If you configure host, the extractor selects a PI system running on the PI server's host.

  • If you don't configure either of these parameters, the extractor selects the default system on the machine the extractor runs on. If there is no default system, the extractor selects the first system from the preconfigured PI system list on the machine the extractor runs on.

| Parameter | Description |
| --- | --- |
| host | Insert the base URL for the PI server's host. If you don't enter a value, you must configure a PI system in the installed SDK on the machine the extractor runs on. This is an optional setting. |
| username | Insert the Windows username on the PI server. This is a required value. |
| password | Insert the Windows password on the PI server. This is a required value. |
| system-name | Enter the name of the PI system you want to use. This is used instead of host to select a PI system. This is an optional setting. |
| database-name | Enter the name of the PI database you want to use. The default value is the default database configured on the machine the extractor runs on, or the first database in the list if no default database is configured. This is an optional setting. |
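
As an illustration of the parameters above, a pi section might look like the following sketch. All values are hypothetical placeholders, and the example assumes that you select the PI system by host rather than by system-name.

```yaml
# Sketch only: every value below is a placeholder, not a real server or credential.
pi:
  # Selects the PI system running on this host.
  # Alternatively, remove host and set system-name to pick a system
  # from the preconfigured list on the extractor machine.
  host: "pi-af.example.com"
  username: "pi-user"          # required
  password: "replace-me"       # required
  # Optional: omit to use the default (or first) database.
  database-name: "ExampleDatabase"
```

If you leave out both host and system-name, the extractor falls back to the default system on the machine, as described in the list above.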

Destination

Include the destination section to configure the destination for the extracted data. Currently, this is only the CDF staging area (RAW).

| Parameter | Description |
| --- | --- |
| database | Insert the CDF RAW database to extract data to. If no database exists, the extractor creates a database. |
| elements-table | Enter the table name for the PI AF elements in the CDF RAW database. If no table exists, the extractor creates a table. The default name is elements. |
| unit-of-measure-classes-table | Enter the table name for unit-of-measure classes in the CDF RAW database. If no table exists, the extractor creates a table. The default name is unit-of-measure-classes. |
| attributes-table | Enter the table name for the attributes in the CDF RAW database. This table is used if elements.flatten-attributes is set to true. If no table exists, the extractor creates a table. The default name is attributes. |
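
A destination section could be sketched as follows. The database name is a hypothetical placeholder. The three table names simply repeat the documented defaults, so you could omit them.

```yaml
# Sketch only: "pi-af-raw" is a placeholder database name.
destination:
  database: "pi-af-raw"
  # The table names below are the documented defaults, shown for clarity.
  elements-table: "elements"
  unit-of-measure-classes-table: "unit-of-measure-classes"
  # Only used when elements.flatten-attributes is true in the extraction section.
  attributes-table: "attributes"
```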

Extraction

Include the extraction section to configure how to extract data from PI AF.

| Parameter | Description |
| --- | --- |
| elements.chunk | Insert the maximum number of PI AF elements to read per request to PI. These are immediately written to CDF RAW. |
| elements.query | Insert the query string. See the OSIsoft documentation. |
| elements.limit | Insert the total maximum number of PI AF elements to read. Use this to get a reasonable subset of the server for testing. Note that this doesn't work if update-period is configured. |
| elements.flatten-attributes | Set to true to create a row in a separate table for each attribute instead of creating an attribute hierarchy for each PI AF element in CDF RAW. The default value is false. See also attributes-table. |
| update-period | Enter the time between each time the extractor reads update events from the PI AF server. This is used to partially refresh the PI AF elements, to get newly created elements or any changes to attribute values. The syntax is N[time unit], where the time unit is d (days), h (hours), m (minutes), s (seconds), or ms (milliseconds). For instance, 2h means that the extractor reads updates every two hours, starting at extractor startup. The default unit is seconds. The extractor will not read updates if you set this parameter to 0 or a negative value. If you set both this parameter and refresh-period to 0 or a negative value, the extractor quits after reading all elements or after the limit set in elements.limit. |
| refresh-period | Enter the time between each time the extractor performs a full refresh, reading all data from the PI AF server again. The syntax is N[time unit], where the time unit is d (days), h (hours), m (minutes), s (seconds), or ms (milliseconds). For instance, 2h means that the extractor performs a full refresh every two hours, starting at extractor startup. The default unit is seconds. If you set 0 or a negative value, the extractor only reads PI AF elements at startup. If you set both this parameter and update-period to 0 or a negative value, the extractor quits after reading all elements or after the limit set in elements.limit. |
| keep-alive | Enter the time between each time the extractor looks for updates in the PI system and database. This is a cheap operation that serves as a keep-alive. The syntax is N[time unit], where the time unit is d (days), h (hours), m (minutes), s (seconds), or ms (milliseconds). For instance, 2h means that the extractor looks for updates every two hours, starting at extractor startup. The default unit is seconds. If you set 0 or a negative value, the extractor will not look for updates. The default value is 5 minutes. |
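
The following sketch shows one way to combine these parameters for a one-off test run. It assumes that the dotted names (such as elements.chunk) map to keys nested under elements. The numeric values are illustrative, not recommended or default settings.

```yaml
# Sketch only: a one-off test extraction of a limited subset.
extraction:
  elements:              # assumed nesting for the elements.* parameters
    chunk: 1000          # illustrative value: elements read per request to PI
    limit: 5000          # illustrative value: stop after this many elements
    flatten-attributes: true   # one row per attribute, written to attributes-table
  # With both periods at 0, the extractor quits after reading
  # all elements or after reaching elements.limit.
  update-period: 0
  refresh-period: 0
  keep-alive: 5m         # N[time unit]; 5 minutes is the documented default
```

For continuous operation you would instead remove elements.limit, because it doesn't work together with update-period, and set the periods to durations such as 10m for update-period and 1d for refresh-period.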

Cognite

Include the cognite section to configure which CDF project the extractor will load data into and how to connect to the project. This section is mandatory and should always contain the project and authentication configuration.

| Parameter | Description |
| --- | --- |
| project | Insert the CDF project name you want to ingest data into. This is a required value. |
| api-key | Enter the API key for the CDF project. You must enter either an API key or use IdP authentication. |
| host | Insert the base URL for the CDF project. The default value is https://api.cognitedata.com. |
| idp-authentication | Insert the client credentials for authenticating to CDF using an external identity provider. You must enter either an API key or use IdP authentication. |
| idp-authentication.token-url | Insert the URL to fetch tokens from. You must enter either a token URL or an Azure tenant. |
| idp-authentication.client-id | Enter the client ID from the IdP. This is a required value. |
| idp-authentication.tenant | Enter the Azure tenant. You must enter either a token URL or an Azure tenant. |
| idp-authentication.secret | Enter the client secret from the IdP. This is a required value. |
| idp-authentication.scopes | List the scopes. This is a required value. |
| idp-authentication.min-ttl | Enter the minimum time in seconds a token will be valid. The cached token is refreshed if it expires in less than min-ttl seconds. The default value is 30. This is an optional value. |
| idp-authentication.authority | Insert the base URL for the authority. The default value is https://login.microsoftonline.com/. |
| cdf-retries | Configure the automatic retry policy used for requests to CDF. You don't need to change these values unless the connection to CDF is poor. Lowering the maximum number of retries also shortens the time until failure buffering starts, which may be necessary if there is a lot of data. |
| cdf-retries.timeout | Specify the timeout in milliseconds for each retry. The default value is 80000. |
| cdf-retries.max-retries | Enter the maximum number of retries. If you enter a negative value, the extractor keeps retrying. The default value is 5. |
| cdf-retries.max-delay | Enter the maximum delay in milliseconds between each try. The base delay is calculated according to 125*2^retry ms. If you enter a negative value, there is no maximum delay. 0 means that there is never any delay. The default value is 5000. |
| cdf-chunking | Configure how much data the extractor sends per request to CDF endpoints. This parameter is optional. If you don't enter any values, the extractor uses the default values based on CDF's current limits. |
| cdf-chunking.raw-rows | Enter the maximum number of rows per request to CDF RAW. This is used with raw state-store and for RAW asset and time series metadata. The default value is 10000. |
| cdf-throttling | Configure how the extractor throttles requests to CDF. Each entry is the maximum allowed number of parallel requests to CDF. The only relevant field here is raw. |
| sdk-logging | Enable or disable output log messages from the .NET SDK. This additional debug information about requests shows the failed requests and how long they take. |
| sdk-logging.disable | Set to true to disable logging from the SDK. The default value is false. |
| sdk-logging.level | Enter the minimum level of logging, one of trace, debug, information, warning, error, critical, or none. The default value is debug. |
| sdk-logging.format | Select the format of the log message. |
| extraction-pipeline | Configure the extraction pipeline in CDF that the extractor reports to. You should create the extraction pipeline before you configure this parameter. |
| extraction-pipeline.pipeline-id | Enter the external ID of the extraction pipeline in CDF. |
| extraction-pipeline.frequency | Enter the frequency in seconds to report Seen. If you enter 0 or a negative value, no reports are generated. |
| certificates | Configure this parameter for special handling of SSL certificates. This should never be considered a permanent solution to certificate problems. |
| certificates.accept-all | Set to true to accept all certificates. This poses a severe security risk. |
| certificates.allow-list | List the thumbprints of allowed certificates. This is a smaller risk compared to accepting all certificates. |
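
A cognite section using IdP authentication might be sketched as below. The project name, tenant, client ID, secret, scope, and pipeline ID are hypothetical placeholders. The key name client-id and the nesting of sub-parameters under their parent keys are assumptions based on the parameter descriptions above. The retry and chunking values repeat the documented defaults.

```yaml
# Sketch only: identifiers and secrets are placeholders.
cognite:
  project: "my-project"                    # required
  host: "https://api.cognitedata.com"      # documented default
  idp-authentication:
    # Set either tenant or token-url.
    tenant: "00000000-0000-0000-0000-000000000000"
    client-id: "my-client-id"              # key name is an assumption
    secret: "my-client-secret"
    scopes:
      - "https://api.cognitedata.com/.default"   # placeholder scope
    min-ttl: 30                            # seconds, documented default
  cdf-retries:
    timeout: 80000                         # milliseconds
    max-retries: 5
    max-delay: 5000                        # milliseconds
  cdf-chunking:
    raw-rows: 10000
  extraction-pipeline:
    pipeline-id: "pi-af-extractor"         # external ID of an existing pipeline
```

To get a feel for the retry policy: assuming the retry counter starts at 0, the base delay 125*2^retry ms gives 125, 250, 500, 1000, and 2000 ms for the five default retries, all of which stay below the default max-delay of 5000 ms.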

Logger

| Parameter | Description |
| --- | --- |
| console/level | Select the verbosity level for console logging. If this parameter is not set or invalid, logging to a console is disabled. |
| file/level | Select the verbosity level for file logging. If this parameter is not set or invalid, logging to a file is disabled. |
| file/path | Insert the path to the file logs. Logs are rotated according to file/rolling-interval. |
| file/retention-limit | Insert the maximum number of logs to keep in the log folder. The oldest logs will be deleted according to file/rolling-interval. |
| file/rolling-interval | Insert the rolling interval for log files as either day or hour. The default value is day. |
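
As a rough sketch, a logger section could look like this. It assumes that the slash-separated names (such as file/level) map to nested keys, and that the level names information and debug are accepted here as they are for sdk-logging. The path and retention limit are illustrative placeholders.

```yaml
# Sketch only: nesting and level names are assumptions, values are placeholders.
logger:
  console:
    level: "information"
  file:
    level: "debug"
    path: "logs/log.txt"       # placeholder path
    retention-limit: 31        # illustrative: number of log files to keep
    rolling-interval: "day"    # day (default) or hour
```

Leaving out console or file level disables logging to that target, as described in the table above.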