Configuration settings

To configure the PI extractor, you must create a configuration file in YAML format. The file is split into sections, each represented by a top-level entry in the YAML file. Subsections are nested under their parent section.

You can use either the complete or the minimal sample configuration file included with the installer as a starting point for your configuration settings:

  • config.default.yml - This file contains all configuration options and descriptions.

  • config.minimal.yml - This file contains a minimum configuration and no descriptions.

Naming the configuration file

You must name the configuration file config.yml.

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Before you start

  • Optionally, copy one of the sample files in the config directory and rename it to config.yml.
  • The config.minimal.yml file doesn't include a metrics section. Copy this section from the example below if the extractor is required to send metrics to a Prometheus Pushgateway.
  • Set up an extraction pipeline and note the external ID.

Minimal YAML configuration file

The YAML settings below contain valid PI extractor 2.1 configurations. The values wrapped in ${} are replaced with environment variables with that name. For example, ${COGNITE_PROJECT} will be replaced with the value of the environment variable called COGNITE_PROJECT.

The configuration file has a global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 3 of the configuration schema.

version: 3

cognite:
  project: '${COGNITE_PROJECT}'
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_SCOPE}

time-series:
  external-id-prefix: 'pi:'

pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: ${PI_PASSWORD}

state-store:
  database: LiteDb
  location: 'state.db'

logger:
  file:
    level: 'information'
    path: 'logs/log.log'

metrics:
  push-gateways:
    - host: 'https://prometheus-push.cognite.ai/'
      username: ${PROMETHEUS_USER}
      password: ${PROMETHEUS_PASSWORD}
      job: ${PROMETHEUS_JOB}

where:

  • version is the version of the configuration schema. Use version 3 to be compatible with the Cognite PI extractor 2.1.

  • cognite and pi configure how the extractor reads the authentication details for CDF (cognite) and PI (pi) from environment variables. Since no host is specified in the cognite section, the extractor uses the default value, https://api.cognitedata.com. Since native-authentication is not set in the pi section, the extractor assumes that the PI server uses Windows authentication.

  • time-series configures the extractor to create time series in CDF where the external IDs will be prefixed with pi:. You can also use a data set ID configuration to add all time series created by the extractor to a particular data set.

  • state-store configures the extractor to save the extraction state locally using a LiteDB database file named state.db.

  • logger configures the extractor to log at information level and write log messages to files in the logs directory, using logs/log.log as the base path. By default, a new file is created daily and files are retained for 31 days. The date is appended to the file name.

  • metrics points to the Prometheus Pushgateway hosted by CDF. It assumes that a user has already been created. For the Pushgateway hosted by CDF, the job name (${PROMETHEUS_JOB}) must start with the username followed by '-'.

Using values from Azure Key Vault

The PI extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the !keyvault tag followed by the name of the secret you want to load. For example, to load the value of the my-secret-name secret in Key Vault into a password parameter, configure your extractor like this:

password: !keyvault my-secret-name

To use Key Vault, you also need to include the azure-keyvault section in your configuration, with the following parameters:

| Parameter | Description |
| --- | --- |
| keyvault-name | Name of Key Vault to load secrets from |
| authentication-method | How to authenticate to Azure. Either default or client-secret. For default, the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For client-secret, the extractor will authenticate with a configured client ID/secret pair. |
| client-id | Required for using the client-secret authentication method. The client ID to use when authenticating to Azure. |
| secret | Required for using the client-secret authentication method. The client secret to use when authenticating to Azure. |
| tenant-id | Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |

Example:

azure-keyvault:
keyvault-name: my-keyvault-name
authentication-method: client-secret
tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
secret: 1234abcd
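
For comparison, the default authentication method needs no client ID, secret, or tenant ID in the configuration file. The fragment below is a minimal sketch under that reading of the parameter table above; the vault name and the secret name pi-password are hypothetical placeholders, not values from the extractor documentation.

```yaml
# Sketch only: relies on a pre-configured Azure login (for example, from the Azure CLI)
# for the user that runs the extractor.
azure-keyvault:
  keyvault-name: my-keyvault-name      # hypothetical vault name
  authentication-method: default

pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: !keyvault pi-password      # hypothetical secret name in the vault
```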

Intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example, 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use a cron expression when this makes sense.
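
As an illustration, the fragment below uses the interval syntax in three parameters that are documented later on this page as accepting it. The values are arbitrary, and the cron expression assumes that the standard five-field cron syntax is accepted, which this page does not confirm.

```yaml
state-store:
  database: LiteDb
  location: 'state.db'
  interval: 30s                               # write state every 30 seconds

events:
  store-extractor-events-interval: 5m         # store events every 5 minutes

extractor:
  # Assumed five-field cron syntax: check for new time series daily at 03:00.
  time-series-update-interval: '0 3 * * *'
```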

Configure the PI extractor

Configuration for the PI extractor. Each section configures a different aspect of the extractor.

| Parameter | Type | Description |
| --- | --- | --- |
| version | integer | Version of the config file. Each version of the extractor specifies which config file versions it accepts. |
| logger | object | Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink. |
| metrics | object | Configuration for publishing metrics. |
| cognite | object | Configure connection to Cognite Data Fusion (CDF). |
| state-store | object | Use a local LiteDb database or a set of tables in CDF RAW to store persistent information between runs. This can be used to avoid loading large volumes of data from CDF on startup, which can greatly speed up the extractor. |
| pi | object | Configure the extractor to connect to a particular PI server or PI collective. If you configure the extractor with a PI collective, the extractor will transparently maintain a connection to one of the active servers in the collective. The default settings provide Active Directory authorization to the PI host when the server account for the Windows service is authorized. |
| time-series | object | Include the time-series section for configuration related to the time series ingested by the extractor. This section is optional. |
| events | object | Include the events section for configuration related to writing events on extractor incidents. This section is optional. If configured and store-extractor-events-interval is greater than zero, the PI extractor creates events for extractor reconnection and PI data pipe loss incidents. Reconnection and data loss may cause the extractor to miss some historical data point updates. Logging these as events in CDF provides a way of inspecting data quality. When combined with the PI replace utility, you can use these events to correct data inconsistencies in CDF. |
| extractor | object | The extractor section contains various configuration options for the operation of the extractor itself. The options here can be used to extract only a subset of the PI points in the server. This is how the list is created: (1) If include-tags, include-prefixes, include-patterns, or include-attribute-values are not empty, start with the union of these four. Otherwise, start with all points. (2) Remove points as specified by exclude-tags, exclude-prefixes, exclude-patterns, and exclude-attribute-values. |
| backfill | object | Include the backfill section to configure how the extractor fills in historical data back in time with respect to the first data point in CDF. The backfill process completes when all the data points in the PI Data Archive are sent to CDF or when the extractor reaches the target timestamp for all time series if the to parameter is set. |
| frontfill | object | Include the frontfill section to configure how the extractor fills in historical data forward in time with respect to the last data point in CDF. At startup, the extractor fills in the gap between the last data point in CDF and the last data point in PI by querying the archived data in the PI Data Archive. After that, the extractor only receives data streamed through the PI Data Pipe. These are real-time changes made to the time series in PI before archiving. |
| high-availability | object | Configuration for a Redis-based high availability store. Requires Redis to be configured in state-store. |

logger

Global parameter.

Configuration for logging to console or file. Log entries are either Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.

| Parameter | Type | Description |
| --- | --- | --- |
| console | object | Configuration for logging to the console. |
| file | object | Configuration for logging to a rotating log file. |
| trace-listener | object | Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace. |

console

Part of logger configuration.

Configuration for logging to the console.

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Minimum level of log events to write to the console. If not present, or invalid, logging to console is disabled. |
| stderr-level | either verbose, debug, information, warning, error or fatal | Log events at this level or above are redirected to standard error. |

file

Part of logger configuration.

Configuration for logging to a rotating log file.

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Minimum level of log events to write to file. |
| path | string | Required. Path to the log files. If this is set to logs/log.txt, log files of the form logs/log[date].txt will be created, depending on rolling-interval. |
| retention-limit | integer | Maximum number of log files that are kept in the log folder. Default value is 31. |
| rolling-interval | either day or hour | Rolling interval for log files. Default value is day. |
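
The console and file sinks can be combined, each with its own level. The following is a minimal sketch using only the parameters listed above; the chosen levels, path, and retention are illustrative values.

```yaml
logger:
  console:
    level: 'warning'          # only warnings and above on the console
  file:
    level: 'debug'            # more detail in the log files
    path: 'logs/log.txt'      # produces files of the form logs/log[date].txt
    rolling-interval: hour    # start a new file every hour instead of daily
    retention-limit: 48       # keep at most 48 files, roughly two days of hourly logs
```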

trace-listener

Part of logger configuration.

Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace

| Parameter | Type | Description |
| --- | --- | --- |
| level | either verbose, debug, information, warning, error or fatal | Required. Level to output trace messages at. |

metrics

Global parameter.

Configuration for publishing metrics.

| Parameter | Type | Description |
| --- | --- | --- |
| server | object | Configuration for having the extractor start a Prometheus scrape server on a local port. |
| push-gateways | list | A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these. |

server

Part of metrics configuration.

Configuration for having the extractor start a Prometheus scrape server on a local port.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Required. Host name for the local Prometheus server. It must be exposed to a Prometheus instance for scraping. Examples: localhost, 0.0.0.0 |
| port | integer | Required. The port used for a local Prometheus server. |

push-gateways

Part of metrics configuration.

A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Required. URI of the pushgateway host. Example: http://my.pushgateway:9091 |
| job | string | Required. Name of the Prometheus pushgateway job. |
| username | string | Username for basic authentication. |
| password | string | Password for basic authentication. |
| push-interval | integer | Interval in seconds between each push to the gateway. Default value is 1. |
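
Both metrics options can be configured side by side. The fragment below is a sketch built from the two parameter tables above; the port number and push interval are arbitrary example values.

```yaml
metrics:
  # Local scrape endpoint; a Prometheus instance must be able to reach it.
  server:
    host: 'localhost'
    port: 9000                           # arbitrary example port
  # Push metrics to one or more gateways.
  push-gateways:
    - host: 'http://my.pushgateway:9091'
      job: ${PROMETHEUS_JOB}
      username: ${PROMETHEUS_USER}
      password: ${PROMETHEUS_PASSWORD}
      push-interval: 10                  # push every 10 seconds instead of every second
```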

cognite

Global parameter.

Configure connection to Cognite Data Fusion (CDF)

| Parameter | Type | Description |
| --- | --- | --- |
| project | string | CDF project to connect to. |
| idp-authentication | object | The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow. |
| host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com. |
| cdf-retries | object | Configure automatic retries on requests to CDF. |
| cdf-chunking | object | Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself. |
| cdf-throttling | object | Configure the maximum number of parallel requests for different CDF resources. |
| sdk-logging | object | Configure logging of requests from the SDK. |
| nan-replacement | either number or null | Replacement for NaN values when writing to CDF. If left out, NaN values are skipped. |
| extraction-pipeline | object | Configure an associated extraction pipeline. |
| certificates | object | Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems. |
| metadata-targets | object | Configuration for targets for time series metadata. |

idp-authentication

Part of cognite configuration.

The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow

| Parameter | Type | Description |
| --- | --- | --- |
| authority | string | Insert the authority together with tenant to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/. |
| client-id | string | Required. Enter the service principal client ID from the IdP. |
| tenant | string | Enter the Azure tenant. |
| token-url | string | Insert the URL to fetch tokens from. |
| secret | string | Enter the service principal client secret from the IdP. |
| resource | string | Resource parameter passed along with token requests. |
| audience | string | Audience parameter passed along with token requests. |
| scopes | either list or string | |
| min-ttl | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than min-ttl seconds, it will be refreshed even if it is still valid. Default value is 30. |
| certificate | object | Authenticate with a client certificate. |

certificate

Part of idp-authentication configuration.

Authenticate with a client certificate

| Parameter | Type | Description |
| --- | --- | --- |
| authority-url | string | Authentication authority URL. |
| path | string | Required. Enter the path to the .pem or .pfx certificate to be used for authentication. |
| password | string | Enter the password for the key file, if it is encrypted. |
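
A certificate-based setup might look like the sketch below. It assumes that the certificate subsection takes the place of secret, which the tables above suggest but do not state outright. The certificate path and the COGNITE_CERT_PASSWORD environment variable are hypothetical.

```yaml
cognite:
  project: '${COGNITE_PROJECT}'
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    scopes:
      - ${COGNITE_SCOPE}
    # Assumption: used instead of the secret parameter.
    certificate:
      path: 'certs/extractor.pfx'           # hypothetical path to a .pem or .pfx file
      password: ${COGNITE_CERT_PASSWORD}    # hypothetical variable; only needed for an encrypted key file
```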

cdf-retries

Part of cognite configuration.

Configure automatic retries on requests to CDF.

| Parameter | Type | Description |
| --- | --- | --- |
| timeout | integer | Timeout in milliseconds for each individual request to CDF. Default value is 80000. |
| max-retries | integer | Maximum number of retries on requests to CDF. If this is less than 0, retry forever. Default value is 5. |
| max-delay | integer | Max delay in milliseconds between each retry. Base delay is calculated according to 125*2^retry milliseconds. If less than 0, there is no maximum. Default value is 5000. |
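
To make the backoff formula concrete, the sketch below raises max-retries and annotates the resulting base delays. The arithmetic follows directly from 125*2^retry; how the extractor numbers the first retry is not stated on this page, so the comments refer only to the value of retry.

```yaml
cognite:
  cdf-retries:
    timeout: 80000      # 80 seconds per request (the default)
    max-retries: 8      # example value; the default is 5
    max-delay: 5000     # cap on the delay between retries, in milliseconds
    # Base delay for a given retry value, before the cap is applied:
    #   retry = 3 -> 125 * 2^3 = 1000 ms
    #   retry = 5 -> 125 * 2^5 = 4000 ms
    #   retry = 6 -> 125 * 2^6 = 8000 ms, limited to 5000 ms by max-delay
```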

cdf-chunking

Part of cognite configuration.

Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself

| Parameter | Type | Description |
| --- | --- | --- |
| time-series | integer | Maximum number of timeseries per get/create timeseries request. Default value is 1000. |
| assets | integer | Maximum number of assets per get/create assets request. Default value is 1000. |
| data-point-time-series | integer | Maximum number of timeseries per datapoint create request. Default value is 10000. |
| data-point-delete | integer | Maximum number of ranges per delete datapoints request. Default value is 10000. |
| data-point-list | integer | Maximum number of timeseries per datapoint read request. Used when getting the first point in a timeseries. Default value is 100. |
| data-points | integer | Maximum number of datapoints per datapoints create request. Default value is 100000. |
| data-points-gzip-limit | integer | Minimum number of datapoints in request to switch to using gzip. Set to -1 to disable, and 0 to always enable (not recommended). The minimum HTTP packet size is generally 1500 bytes, so this should never be set below 100 for numeric datapoints. Even for larger packages gzip is efficient enough that packages are compressed below 1500 bytes. At 5000 it is always a performance gain. It can be set lower if bandwidth is a major issue. Default value is 5000. |
| raw-rows | integer | Maximum number of rows per request to CDF RAW. Default value is 10000. |
| raw-rows-delete | integer | Maximum number of row keys per delete request to RAW. Default value is 1000. |
| data-point-latest | integer | Maximum number of timeseries per datapoint read latest request. Default value is 100. |
| events | integer | Maximum number of events per get/create events request. Default value is 1000. |
| sequences | integer | Maximum number of sequences per get/create sequences request. Default value is 1000. |
| sequence-row-sequences | integer | Maximum number of sequences per create sequence rows request. Default value is 1000. |
| sequence-rows | integer | Maximum number of sequence rows per sequence when creating rows. Default value is 10000. |

cdf-throttling

Part of cognite configuration.

Configure the maximum number of parallel requests for different CDF resources.

| Parameter | Type | Description |
| --- | --- | --- |
| time-series | integer | Maximum number of parallel requests per timeseries operation. Default value is 20. |
| assets | integer | Maximum number of parallel requests per assets operation. Default value is 20. |
| data-points | integer | Maximum number of parallel requests per datapoints operation. Default value is 10. |
| raw | integer | Maximum number of parallel requests per raw operation. Default value is 10. |
| ranges | integer | Maximum number of parallel requests per get first/last datapoint operation. Default value is 20. |
| events | integer | Maximum number of parallel requests per events operation. Default value is 20. |
| sequences | integer | Maximum number of parallel requests per sequences operation. Default value is 10. |
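
Chunking and throttling are typically tuned together. The sketch below lowers a few values below their defaults, for example to reduce load from a single extractor; the specific numbers are illustrative, not recommendations from the extractor documentation.

```yaml
cognite:
  cdf-chunking:
    data-points: 50000              # default is 100000 datapoints per create request
    data-point-time-series: 5000    # default is 10000 timeseries per create request
  cdf-throttling:
    data-points: 5                  # default is 10 parallel datapoint requests
    time-series: 10                 # default is 20 parallel timeseries requests
```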

sdk-logging

Part of cognite configuration.

Configure logging of requests from the SDK

| Parameter | Type | Description |
| --- | --- | --- |
| disable | boolean | Set to true to disable logging from the SDK. Logging is enabled by default. |
| level | either trace, debug, information, warning, error, critical or none | Log level to log messages from the SDK at. Default value is debug. |
| format | string | Format of the log message. Default value is CDF ({Message}): {HttpMethod} {Url} {ResponseHeader[X-Request-ID]} - {Elapsed} ms. |

extraction-pipeline

Part of cognite configuration.

Configure an associated extraction pipeline

| Parameter | Type | Description |
| --- | --- | --- |
| external-id | string | External ID of the extraction pipeline. |
| frequency | integer | Frequency to report Seen to the extraction pipeline in seconds. Less than or equal to zero will not report automatically. Default value is 600. |
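
The external ID noted under "Before you start" goes into this subsection. A minimal sketch, where the COGNITE_EXTRACTION_PIPELINE environment variable name is a hypothetical choice:

```yaml
cognite:
  extraction-pipeline:
    external-id: ${COGNITE_EXTRACTION_PIPELINE}   # hypothetical variable holding the pipeline's external ID
    frequency: 300                                # report Seen every 5 minutes instead of every 10
```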

certificates

Part of cognite configuration.

Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems

| Parameter | Type | Description |
| --- | --- | --- |
| accept-all | boolean | Accept all remote SSL certificates. This introduces a severe risk of man-in-the-middle attacks. |
| allow-list | list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates. |

allow-list

Part of certificates configuration.

List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates

Each element of this list should be a string.
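
As a sketch of the lower-risk option, the fragment below accepts a single certificate by thumbprint. The thumbprint is a made-up placeholder and must be replaced with the thumbprint of the certificate you actually intend to trust.

```yaml
cognite:
  certificates:
    accept-all: false
    allow-list:
      - '0123456789ABCDEF0123456789ABCDEF01234567'   # placeholder thumbprint, not a real certificate
```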

metadata-targets

Part of cognite configuration.

Configuration for targets for time series metadata.

| Parameter | Type | Description |
| --- | --- | --- |
| raw | object | Configuration for writing metadata to CDF Raw. |
| clean | object | Configuration for enabling writing metadata to CDF Clean. |

raw

Part of metadata-targets configuration.

Configuration for writing metadata to CDF Raw.

| Parameter | Type | Description |
| --- | --- | --- |
| database | string | Required. The Raw database to write to. |
| timeseries-table | string | Name of the Raw table to write timeseries metadata to. Setting this enables writing metadata to Raw. Metadata in this case includes name, description, and unit. |

clean

Part of metadata-targets configuration.

Configuration for enabling writing metadata to CDF Clean.

| Parameter | Type | Description |
| --- | --- | --- |
| timeseries | boolean | Set to false to disable writing metadata to time series. Default value is True. |
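
For example, to write time series metadata to Raw only, the two targets could be combined as in the sketch below. The database and table names are hypothetical.

```yaml
cognite:
  metadata-targets:
    raw:
      database: 'pi-metadata'          # hypothetical Raw database name
      timeseries-table: 'timeseries'   # hypothetical table name; setting it enables writing to Raw
    clean:
      timeseries: false                # turn off metadata writes to the time series themselves
```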

state-store

Global parameter.

Use a local LiteDb database or a set of tables in CDF RAW to store persistent information between runs. This can be used to avoid loading large volumes of data from CDF on startup, which can greatly speed up the extractor.

| Parameter | Type | Description |
| --- | --- | --- |
| location | string | Required. Path to .db file used for storage, or name of a CDF RAW database. |
| database | either None, LiteDb or Raw | Which type of database to use. Default value is None. |
| interval | string | Enter the time between each write to the state store. 0 or less disables the state store. Format is as given in Intervals. Default value is 10s. |
| time-series-table-name | string | Table name in Raw/Redis/LiteDB for storing time series state. Default value is ranges. |
| extractor-table-name | string | Table name in Raw/Redis/LiteDB for storing general extractor state. Default value is extractor. |
| redis | boolean | Use a Redis state store. Overrides the database setting. |
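
The minimal configuration earlier on this page stores state in a local LiteDb file. A sketch of the CDF RAW alternative follows; the database name is hypothetical, and the table names are the documented defaults written out for clarity.

```yaml
state-store:
  database: Raw
  location: 'pi-extractor-state'      # hypothetical name of a CDF RAW database
  interval: 1m                        # write state once a minute instead of every 10 seconds
  time-series-table-name: ranges      # default
  extractor-table-name: extractor     # default
```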

pi

Global parameter.

Configure the extractor to connect to a particular PI server or PI collective. If you configure the extractor with a PI collective, the extractor will transparently maintain a connection to one of the active servers in the collective. The default settings provide Active Directory authorization to the PI host when the server account for the Windows service is authorized.

| Parameter | Type | Description |
| --- | --- | --- |
| host | string | Required. Enter the hostname or IP address of the PI server or the PI collective name. If the host is a collective member name, connect to that member. |
| username | string | Enter the username to use for authentication. Leave this empty if the Windows service account under which the extractor runs is authorized in Active Directory to access the PI host. |
| password | string | Enter the password for the given username, if any. |
| native-authentication | boolean | Determines whether the extractor will use native PI authentication or Windows authentication. The default value is false, which indicates Windows authentication. |
| parallelism | integer | Insert the number of parallel requests to the PI server. If backfill-parallelism is set, this excludes backfill requests. Default value is 1. |
| backfill-parallelism | integer | Insert the number of parallel requests to the PI server for backfills. This allows the separate throttling of backfills. The default value is 0, which means that the value in parallelism is used. |
| use-member-priority | boolean | When connecting to a PI Collective, attempt to connect to the member with highest priority. If set to false, attempt to connect to the member with the same name as host. |
| max-connection-retries | integer | The maximum number of times to attempt to connect to the PI server before failing fatally. If this is 0, retry forever. |
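
A sketch of a pi section that uses native PI authentication and throttles backfills separately from other requests. The parallelism and retry values are illustrative only.

```yaml
pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: ${PI_PASSWORD}
  native-authentication: true    # native PI authentication instead of Windows authentication
  parallelism: 4                 # parallel requests, excluding backfill because backfill-parallelism is set
  backfill-parallelism: 2        # separate limit for backfill requests
  use-member-priority: true      # prefer the collective member with the highest priority
  max-connection-retries: 10     # fail fatally after 10 attempts; 0 would retry forever
```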

time-series

Global parameter.

Include the time-series section for configuration related to the time series ingested by the extractor. This section is optional.

Example:

external-id-prefix: 'pi:'
external-id-source: SourceId
data-set-id: 1234567890123456

This example creates time series of the following form in CDF:

{
  "externalId": "pi:12345",
  "name": "PI-Point-Name",
  "isString": false,
  "isStep": false,
  "dataSetId": 1234567890123456
}

| Parameter | Type | Description |
| --- | --- | --- |
| external-id-prefix | string | Enter the external ID prefix to identify the time series in CDF. Leave empty for no prefix. The external ID in CDF will be this prefix followed by either the PI Point name or PI Point ID. |
| external-id-source | either Name or SourceId | Enter the source of the external ID. Name means that the PI Point name is used, while SourceId means that the PI Point ID is used. Default value is Name. |
| data-set-id | integer | Specify the data set to assign to all time series controlled by the extractor, both new and current. If you don't configure this, the extractor will not change the current time series' data set. |
| data-set-external-id | string | Specify the external ID of the data set to use, see data-set-id. Using this requires the dataSets:READ ACL in CDF. |
| sanitation | either Remove, Clean or None | Specify what to do when the time series fields exceed CDF limits. Remove will skip any time series that fail sanitation. Clean will truncate and remove values to conform to limits. None does nothing (requests may fail as a result). External IDs are never truncated. Any time series whose external ID exceeds CDF limits will be skipped to avoid external ID collisions, regardless of this configuration. Default value is Clean. |
| update-metadata | boolean | Enable updating time series if extractor configuration or PI metadata changes. Default value is True. |

events

Global parameter.

Include the events section for configuration related to writing events on extractor incidents. This section is optional. If configured and store-extractor-events-interval is greater than zero, the PI extractor creates events for extractor reconnection and PI data pipe loss incidents. Reconnection and data loss may cause the extractor to miss some historical data point updates. Logging these as events in CDF provides a way of inspecting data quality. When combined with the PI replace utility, you can use these events to correct data inconsistencies in CDF.

Example:

source: PiExtractor
external-id-prefix: 'pi-events:'
data-set-id: 1234567890123456
store-extractor-events-interval: 5m

This example configuration produces events of the following form:

{
  "externalId": "pi-events:PiExtractor-2020-08-04 01:01:21.395(0)",
  "startTime": 1596502850668,
  "endTime": 1596502880393,
  "type": "DataLoss",
  "subtype": "DataPipeOverflow",
  "source": "PiExtractor",
  "dataSetId": 1234567890123456
}

| Parameter | Type | Description |
| --- | --- | --- |
| source | string | Events have this value as the source in CDF. |
| external-id-prefix | string | Set an external ID prefix to identify events in CDF. Leave empty for no prefix. |
| data-set-id | integer | Select a data set to assign to all events created by the extractor. We recommend using the same data set ID used in the time-series section. |
| data-set-external-id | string | Data set external ID added to events written to CDF, see data-set-id. Using this requires the dataSets:READ ACL in CDF. |
| store-extractor-events-interval | string | Store events in CDF with this interval. Format is as given in Intervals. If this is not set, events are not created in CDF. Examples: 10s, 5m |

extractor

Global parameter.

The extractor section contains various configuration options for the operation of the extractor itself. The options here can be used to extract only a subset of the PI points in the server. This is how the list is created:

  1. If include-tags, include-prefixes, include-patterns, or include-attribute-values are not empty, start with the union of these four. Otherwise, start with all points.
  2. Remove points as specified by exclude-tags, exclude-prefixes, exclude-patterns and exclude-attribute-values.

| Parameter | Type | Description |
| --- | --- | --- |
| include-tags | list | List of tag names to include. |
| include-prefixes | list | List of tag name prefixes to include. |
| include-patterns | list | List of substrings of tag names to include. |
| include-attribute-values | list | List of attribute values to include. |
| exclude-tags | list | List of tag names to exclude. |
| exclude-prefixes | list | List of tag name prefixes to exclude. |
| exclude-patterns | list | List of substrings of tag names to exclude. |
| exclude-attribute-values | list | List of attribute values to exclude. |
| time-series-update-interval | either string or integer | Interval between checks for new time series in PI. Format is as given in Intervals. This option accepts cron expressions. Default value is 24h. |
| deleted-time-series | object | Include the deleted-time-series subsection to configure how the extractor handles time series that exist in CDF but not in PI. This subsection is optional, and the default behavior is none (do nothing). |
| end-of-stream-interval | string | Interval for fetching the end-of-stream timestamps from PI. Format is as given in Intervals. |
| end-of-stream-chunking | integer | Maximum number of time series per end-of-stream request. Default value is 10000. |
| end-of-stream-throttle | integer | Maximum number of parallel end-of-stream requests. Default value is 10. |
| time-series-update-throttle | integer | Maximum number of parallel time series updates. Default value is 10. |
| dry-run | boolean | Set this to true to run the extractor in dry-run mode, where it does not push any data to CDF. Useful for testing the connection to the PI Server. |
| read-extracted-ranges | boolean | Set this to false to disable reading from CDF on extractor startup. If this is set to false, it is strongly recommended to have a state-store configured, or the extractor will read all history from PI on every run. Default value is True. |
| status-codes | object | ALPHA: Configuration for ingesting status codes to CDF timeseries. |
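
The two-step selection described above can be sketched as follows. All tag names, prefixes, and the attribute key and value are hypothetical. With this configuration the extractor would start from the union of the points matching either include rule, then drop anything matching an exclude rule.

```yaml
extractor:
  # Step 1: start with the union of all include rules.
  include-prefixes:
    - 'Unit1.'                  # hypothetical tag name prefix
    - 'Unit2.'
  include-attribute-values:
    - key: 'pointsource'        # hypothetical attribute name
      value: 'OPC'              # hypothetical attribute value
  # Step 2: remove points matching any exclude rule.
  exclude-patterns:
    - 'TEST'                    # drops tags whose name contains this substring
  exclude-tags:
    - 'Unit1.Scratch'           # hypothetical exact tag name
```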

include-tags

Part of extractor configuration.

List of tag names to include.

Each element of this list should be a string.

include-prefixes

Part of extractor configuration.

List of tag name prefixes to include.

Each element of this list should be a string.

include-patterns

Part of extractor configuration.

List of substrings of tag names to include.

Each element of this list should be a string.

include-attribute-values

Part of extractor configuration.

List of attribute values to include.

| Parameter | Type | Description |
| --- | --- | --- |
| key | string | Attribute name |
| value | string | Attribute value |

exclude-tags

Part of extractor configuration.

List of tag names to exclude.

Each element of this list should be a string.

exclude-prefixes

Part of extractor configuration.

List of tag name prefixes to exclude.

Each element of this list should be a string.

exclude-patterns

Part of extractor configuration.

List of substrings of tag names to exclude.

Each element of this list should be a string.

exclude-attribute-values

Part of extractor configuration.

List of attribute values to exclude.

| Parameter | Type | Description |
| --- | --- | --- |
| key | string | Attribute name |
| value | string | Attribute value |

deleted-time-series

Part of extractor configuration.

Include the deleted-time-series subsection to configure how the extractor handles time series that exist in CDF but not in PI. This subsection is optional, and the default behavior is none (do nothing).

Note

This only affects time series with the same data set ID and external ID prefix as the time series configured in the extractor.

To find time series that exist in CDF but not in PI, the extractor:

  • Lists all time series in CDF that have the configured external ID prefix and data set ID.
  • Filters the time series using the include/exclude rules defined in the extractor section.
  • Matches the result against the time series obtained from the PI Server after filtering these using the include/exclude rules.

| Parameter | Type | Description |
| --- | --- | --- |
| behavior | either None, Flag or Delete | Select the action taken by the extractor. Setting this to Flag will perform soft deletion: flag the time series as deleted but don't delete them from CDF. Setting it to Delete will delete the time series from CDF. If you set this to Delete, the time series in CDF that cannot be found in PI will be permanently deleted from CDF. Default value is None. |
| flag-name | string | If you've set behavior to Flag, use this string to mark the time series as deleted. Metadata is added to the time series with this as the key, and the current time as the value. Default value is deletedByExtractor. |
| time-series-delete-throttle | integer | Maximum number of parallel delete operations. Default value is 10. |
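
A sketch of the soft-deletion option, which keeps the time series in CDF and only marks them. The flag name shown is the documented default, written out for clarity; the throttle value is illustrative.

```yaml
extractor:
  deleted-time-series:
    behavior: Flag                     # soft deletion: mark the time series, don't remove them from CDF
    flag-name: deletedByExtractor      # metadata key; the value is the time of flagging
    time-series-delete-throttle: 5     # illustrative; the default is 10
```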

status-codes

Part of extractor configuration.

ALPHA: Configuration for ingesting status codes to CDF timeseries.

| Parameter | Type | Description |
| --- | --- | --- |
| status-codes-to-ingest | either GoodOnly, Uncertain or All | Which data points to ingest to CDF. All ingests all data points, including bad ones. Uncertain ingests good and uncertain data points. GoodOnly ingests only good data points. Default value is GoodOnly. |
| ingest-status-codes | boolean | Whether to ingest status codes into the time series API. |
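
A sketch that ingests good and uncertain data points together with their status codes. Because this feature is marked ALPHA, treat the fragment as an illustration of the two parameters above rather than a stable configuration.

```yaml
extractor:
  status-codes:
    status-codes-to-ingest: Uncertain   # good and uncertain data points, but not bad ones
    ingest-status-codes: true           # also write the status codes to the time series API
```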

backfill

Global parameter.

Include the backfill section to configure how the extractor fills in historical data back in time with respect to the first data point in CDF. The backfill process completes when all the data points in the PI Data Archive are sent to CDF or when the extractor reaches the target timestamp for all time series if the to parameter is set.

| Parameter | Type | Description |
| --- | --- | --- |
| skip | boolean | Set to true to disable backfill. |
| step-size-hours | integer | Step, in whole number of hours. Set to 0 to freely backfill all time series. Each iteration of backfill will backfill all time series to the next step before stepping further backward. This helps even out the progression of long backfill processes. Default value is 168. |
| to | integer | The target CDF timestamp in milliseconds at which to stop the backfill. The backfill may overshoot this timestamp. A smaller data point chunk size reduces the overshoot. |
| time-series-delay | integer | Delay in milliseconds between each time series backfill request within a step. |
| step-delay | integer | Delay in milliseconds between each step. |
| retry-bucket-size | integer | Maximum size of the retry bucket. Zero or less for no limit. Default value is 100. |
| retry-bucket-cleanup-time | integer | Delay in seconds between each cleanup of the retry bucket. Default value is 3600. |
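The fragment below is a sketch of a bounded backfill using the parameters above. The target timestamp and the two delay values are illustrative assumptions. The timestamp is assumed to be milliseconds since the Unix epoch, which is how CDF represents timestamps.

```yaml
backfill:
  skip: false
  # Backfill all time series one week at a time (documented default)
  step-size-hours: 168
  # Stop at 2023-01-01T00:00:00Z, in milliseconds since the Unix epoch;
  # the backfill may overshoot this target
  to: 1672531200000
  # Illustrative pacing values, both in milliseconds
  time-series-delay: 100
  step-delay: 1000
```

Omitting the to parameter lets the backfill continue until all data points in the PI Data Archive have been sent to CDF.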

frontfill

Global parameter.

Include the frontfill section to configure how the extractor fills in historical data forward in time with respect to the last data point in CDF. At startup, the extractor fills in the gap between the last data point in CDF and the last data point in PI by querying the archived data in the PI Data Archive. After that, the extractor only receives data streamed through the PI Data Pipe. These are real-time changes made to the time series in PI before archiving.

Note

When data points are archived in PI, they may be subject to compression, reducing the total amount of data points in a time series. Therefore, the backfill and frontfill tasks will receive data points after compression, while the streaming task will receive data points before compression. Learn more about compression in this video.

| Parameter | Type | Description |
| --- | --- | --- |
| skip | boolean | Set this to true to disable frontfill and streaming. |
| streaming-interval | string | Interval between each call to the PI Data Pipe to fetch new data. Format is as given in Intervals. If you set this parameter to a high value, there is a higher chance of having a client outbound queue overflow in the PI server. Overflow may cause loss of data points in CDF. Default value is 1s. |
| delete-data-points | boolean | If you set this to true, the corresponding data points are deleted in CDF when data point deletion events are received in the PI Data Pipe. Enabling this parameter may increase the streaming latency of the extractor since the extractor verifies the data point deletion before proceeding with the insertions. |
| use-data-pipe | boolean | Older PI servers may not support data pipes. If that's the case, set this value to false to disable data pipe streaming. The frontfiller task will run constantly and will frequently query the PI Data Archive for new data points. Default value is True. |
| time-series-chunk | integer | The maximum number of time series in each frontfill query to PI. The chunks will be adapted according to the density of data points per time series. Default value is 1000. |
| data-points-chunk | integer | The maximum number of requested data points in each frontfill query to PI. Default value is 10000. |
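For reference, a sketch of a frontfill section that spells out the documented defaults and leaves data point deletion disabled. The value for delete-data-points is an assumption, since the table does not state its default.

```yaml
frontfill:
  skip: false
  # Poll the PI Data Pipe every second (documented default)
  streaming-interval: 1s
  # Assumed value: do not mirror PI deletion events to CDF
  delete-data-points: false
  # Set to false for older PI servers without data pipe support
  use-data-pipe: true
  # Documented default chunk sizes for frontfill queries to PI
  time-series-chunk: 1000
  data-points-chunk: 10000
```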

high-availability

Global parameter.

Configuration for a Redis-based high availability store. Requires Redis to be configured in state-store.

| Parameter | Type | Description |
| --- | --- | --- |
| interval | string | Interval to update the high availability state in Redis. Format is as given in Intervals. |
| timeout | integer | Timeout in seconds before taking over as active extractor. Must be set greater than 0 to enable high availability. |
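A hedged sketch of this section follows. Both values are illustrative assumptions, and the interval is written in the same style as the 1s default shown for streaming-interval. The Redis connection itself must be configured separately in the state-store section, which is not shown here.

```yaml
high-availability:
  # Illustrative: update the high availability state in Redis every 10 seconds
  interval: 10s
  # Illustrative: take over as the active extractor after 30 seconds;
  # must be greater than 0 to enable high availability
  timeout: 30
```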