Configuration settings
To configure the PI extractor, you must create a configuration file in YAML format. The file is split into sections, each represented by a top-level entry, with subsections nested under their parent section.
You can use either the complete or the minimal sample configuration file included with the installer as a starting point for your configuration settings:
- `config.default.yml` - This file contains all configuration options and descriptions.
- `config.minimal.yml` - This file contains a minimal configuration and no descriptions.
You must name the configuration file config.yml.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Before you start
- Optionally, copy one of the sample files in the `config` directory and rename it to `config.yml`.
- The `config.minimal.yml` file doesn't include a `metrics` section. Copy this section from the example below if the extractor is required to send metrics to a Prometheus Pushgateway.
- Set up an extraction pipeline and note the external ID.
Minimal YAML configuration file
The YAML settings below contain valid PI extractor 2.1 configurations. The values wrapped in `${}` are replaced with environment variables with that name. For example, `${COGNITE_PROJECT}` will be replaced with the value of the environment variable called `COGNITE_PROJECT`.

The configuration file has a global parameter `version`, which holds the version of the configuration schema used in the configuration file. This document describes version 3 of the configuration schema.
```yaml
version: 3

cognite:
  project: '${COGNITE_PROJECT}'
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_SCOPE}

time-series:
  external-id-prefix: 'pi:'

pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: ${PI_PASSWORD}

state-store:
  database: LiteDb
  location: 'state.db'

logger:
  file:
    level: 'information'
    path: 'logs/log.log'

metrics:
  push-gateways:
    - host: 'https://prometheus-push.cognite.ai/'
      username: ${PROMETHEUS_USER}
      password: ${PROMETHEUS_PASSWORD}
      job: ${PROMETHEUS_JOB}
```
where:

- `version` is the version of the configuration schema. Use version 3 to be compatible with the Cognite PI extractor 2.1.
- `cognite` and `pi` show how the extractor reads the authentication details for CDF and PI from environment variables. Since no host is specified in the `cognite` section, the extractor uses the default value, <https://api.cognitedata.com>, and assumes that the PI server uses Windows authentication.
- `time-series` configures the extractor to create time series in CDF where the external IDs will be prefixed with `pi:`. You can also use a data set ID configuration to add all time series created by the extractor to a particular data set.
- `state-store` configures the extractor to save the extraction state locally using a LiteDB database file named `state.db`.
- `logger` configures the extractor to log at information level and write log messages to the file `logs/log.log`. By default, new files are created daily and retained for 31 days. The date is appended to the file name.
- `metrics` points to the Prometheus Pushgateway hosted by CDF. It assumes that a user has already been created. For the Pushgateway hosted by CDF, the job name (`${PROMETHEUS_JOB}`) must start with the username followed by `-`.
Using values from Azure Key Vault
The PI extractor also supports loading values from Azure Key Vault. To load a configuration value from Azure Key Vault, use the `!keyvault` tag followed by the name of the secret you want to load. For example, to load the value of the `my-secret-name` secret in Key Vault into a `password` parameter, configure your extractor like this:

```yaml
password: !keyvault my-secret-name
```
To use Key Vault, you also need to include the `azure-keyvault` section in your configuration, with the following parameters:
Parameter | Description |
---|---|
keyvault-name | Name of Key Vault to load secrets from |
authentication-method | How to authenticate to Azure. Either default or client-secret . For default , the extractor will look at the user running the extractor, and look for pre-configured Azure logins from tools like the Azure CLI. For client-secret , the extractor will authenticate with a configured client ID/secret pair. |
client-id | Required for using the client-secret authentication method. The client ID to use when authenticating to Azure. |
secret | Required for using the client-secret authentication method. The client secret to use when authenticating to Azure. |
tenant-id | Required for using the client-secret authentication method. The tenant ID of the Key Vault in Azure. |
Example:
```yaml
azure-keyvault:
  keyvault-name: my-keyvault-name
  authentication-method: client-secret
  tenant-id: 6f3f324e-5bfc-4f12-9abe-22ac56e2e648
  client-id: 6b4cc73e-ee58-4b61-ba43-83c4ba639be6
  secret: 1234abcd
```
Timestamps and intervals
In most places where time intervals are required, you can use a CDF-like syntax of `[N][timeunit]`, for example `10m` for 10 minutes or `1h` for 1 hour. `timeunit` is one of `d`, `h`, `m`, `s`, `ms`. You can also use a cron expression in some places.
For configuring the earliest point to backfill to, you can use a similar syntax: `[N][timeunit]` and `[N][timeunit]-ago`. `1d-ago` means 1 day in the past from the time history starts, and `1h` means 1 hour in the future. For instance, you can use this syntax to configure the backfill to cover only recent history.
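For example, a minimal sketch using both forms, an interval and a relative timestamp (the `frontfill` and `backfill` sections are described later in this document; the values are illustrative):

```yaml
frontfill:
  streaming-interval: 10s   # an interval: fetch new data every 10 seconds
backfill:
  to: 1d-ago                # a relative timestamp: stop backfilling 1 day in the past
```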
Configure the PI extractor
Configuration for the PI extractor. Each section configures a different aspect of the extractor.
Parameter | Type | Description |
---|---|---|
version | integer | Version of the config file. Each version of the extractor specifies which config file versions it accepts. |
logger | object | Configuration for logging to console or file. Log entries are either Fatal , Error , Warning , Information , Debug , or Verbose , in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink. |
metrics | object | Configuration for publishing metrics. |
cognite | object | Configure connection to Cognite Data Fusion (CDF) |
state-store | object | Include the state-store section to configure the extractor to save the extraction state periodically. This makes the extraction resume faster in the next run. This section is optional. If not present, or if database is set to none , the extraction state is restored by querying the timestamps of the first and last data points of each time series. If CDF Raw is used as a state store, you can see the extracted ranges under Manage staged data in CDF. |
pi | object | Configure the extractor to connect to a particular PI server or PI collective. If you configure the extractor with a PI collective, the extractor will transparently maintain a connection to one of the active servers in the collective. The default settings provide Active Directory authorization to the PI host when the server account for the Windows service is authorized. |
time-series | object | Include the time-series section for configuration related to the time series ingested by the extractor. This section is optional. |
events | object | Configuration for writing events on reconnect and data loss incidents |
extractor | object | The `extractor` section contains various configuration options for the operation of the extractor itself. The options here can be used to extract only a subset of the PI points in the server. This is how the list is created: 1. If `include-tags`, `include-prefixes`, `include-patterns` or `include-attribute-values` are not empty, start with the union of these. Otherwise, start with all points. 2. Remove points as specified by `exclude-tags`, `exclude-prefixes`, `exclude-patterns` and `exclude-attribute-values`. |
backfill | object | Include the backfill section to configure how the extractor fills in historical data back in time with respect to the first data point in CDF. The backfill process completes when all the data points in the PI Data Archive are sent to CDF or when the extractor reaches the target timestamp for all time series if the to parameter is set. |
frontfill | object | Include the frontfill section to configure how the extractor fills in historical data forward in time with respect to the last data point in CDF. At startup, the extractor fills in the gap between the last data point in CDF and the last data point in PI by querying the archived data in the PI Data Archive. After that, the extractor only receives data streamed through the PI Data Pipe. These are real-time changes made to the time series in PI before archiving. |
high-availability | object | Configuration for a Redis based high availability store. Requires Redis to be configured in state-store . |
logger
Global parameter.
Configuration for logging to console or file. Log entries are either `Fatal`, `Error`, `Warning`, `Information`, `Debug`, or `Verbose`, in order of decreasing priority. The extractor will log any messages at an equal or higher log level than the configured level for each sink.
Parameter | Type | Description |
---|---|---|
console | object | Configuration for logging to the console. |
file | object | Configuration for logging to a rotating log file. |
trace-listener | object | Adds a listener that uses the configured logger to output messages from System.Diagnostics.Trace |
console
Part of the `logger` configuration.
Configuration for logging to the console.
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Minimum level of log events to write to the console. If not present, or invalid, logging to console is disabled. |
stderr-level | either verbose , debug , information , warning , error or fatal | Log events at this level or above are redirected to standard error. |
file
Part of the `logger` configuration.
Configuration for logging to a rotating log file.
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Minimum level of log events to write to file. |
path | string | Required. Path to the log files. If this is set to `logs/log.txt`, log files of the form `logs/log[date].txt` will be created, depending on `rolling-interval`. |
retention-limit | integer | Maximum number of log files that are kept in the log folder. Default value is 31 . |
rolling-interval | either day or hour | Rolling interval for log files. Default value is day . |
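As an illustration, a `logger` section combining console and rotating file output might look like this (the levels and paths are examples, not recommendations):

```yaml
logger:
  console:
    level: information
  file:
    level: debug
    path: logs/log.txt
    retention-limit: 7      # keep at most 7 log files
    rolling-interval: day   # create a new file daily
```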
trace-listener
Part of the `logger` configuration.

Adds a listener that uses the configured logger to output messages from `System.Diagnostics.Trace`.
Parameter | Type | Description |
---|---|---|
level | either verbose , debug , information , warning , error or fatal | Required. Level to output trace messages at |
metrics
Global parameter.
Configuration for publishing metrics.
Parameter | Type | Description |
---|---|---|
server | object | Configuration for having the extractor start a Prometheus scrape server on a local port. |
push-gateways | list | A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these. |
server
Part of the `metrics` configuration.
Configuration for having the extractor start a Prometheus scrape server on a local port.
Parameter | Type | Description |
---|---|---|
host | string | Required. Host name for the local Prometheus server. Must be exposed to a Prometheus instance for scraping. Examples: `localhost`, `0.0.0.0` |
port | integer | Required. The port used for a local Prometheus server. |
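For instance, a sketch that starts a local scrape server (the port is arbitrary):

```yaml
metrics:
  server:
    host: localhost
    port: 9000   # expose metrics for scraping on port 9000
```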
push-gateways
Part of the `metrics` configuration.
A list of pushgateway destinations to push metrics to. The extractor will automatically push metrics to each of these.
Parameter | Type | Description |
---|---|---|
host | string | Required. URI of the pushgateway host Example: http://my.pushgateway:9091 |
job | string | Required. Name of the Prometheus pushgateway job. |
username | string | Username for basic authentication |
password | string | Password for basic authentication |
push-interval | integer | Interval in seconds between each push to the gateway. Default value is 1 . |
cognite
Global parameter.
Configure connection to Cognite Data Fusion (CDF)
Parameter | Type | Description |
---|---|---|
project | string | CDF project to connect to. |
idp-authentication | object | The idp-authentication section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow. |
host | string | Insert the base URL of the CDF project. Default value is https://api.cognitedata.com . |
cdf-retries | object | Configure automatic retries on requests to CDF. |
cdf-chunking | object | Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself |
cdf-throttling | object | Configure the maximum number of parallel requests for different CDF resources. |
sdk-logging | object | Configure logging of requests from the SDK |
nan-replacement | either number or null | Replacement for NaN values when writing to CDF. If left out, NaN values are skipped. |
extraction-pipeline | object | Configure an associated extraction pipeline |
certificates | object | Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems |
metadata-targets | object | Configuration for targets for time series metadata. |
idp-authentication
Part of the `cognite` configuration.

The `idp-authentication` section enables the extractor to authenticate to CDF using an external identity provider (IdP), such as Microsoft Entra ID (formerly Azure Active Directory). See OAuth 2.0 client credentials flow.
Parameter | Type | Description |
---|---|---|
authority | string | Insert the authority together with tenant to authenticate against Azure tenants. Default value is https://login.microsoftonline.com/ . |
client-id | string | Required. Enter the service principal client id from the IdP. |
tenant | string | Enter the Azure tenant. |
token-url | string | Insert the URL to fetch tokens from. |
secret | string | Enter the service principal client secret from the IdP. |
resource | string | Resource parameter passed along with token requests. |
audience | string | Audience parameter passed along with token requests. |
scopes | configuration for either list or string | The scope, or list of scopes, to request for the token. |
min-ttl | integer | Insert the minimum time in seconds a token will be valid. If the cached token expires in less than min-ttl seconds, it will be refreshed even if it is still valid. Default value is 30 . |
certificate | object | Authenticate with a client certificate |
certificate
Part of the `idp-authentication` configuration.
Authenticate with a client certificate
Parameter | Type | Description |
---|---|---|
authority-url | string | Authentication authority URL |
path | string | Required. Enter the path to the .pem or .pfx certificate to be used for authentication |
password | string | Enter the password for the key file, if it is encrypted. |
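A sketch of certificate-based authentication under `cognite`, assuming the same environment-variable convention as the minimal configuration above (the certificate path is a placeholder):

```yaml
cognite:
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    certificate:
      path: /path/to/certificate.pfx    # .pem or .pfx certificate
      password: ${CERTIFICATE_PASSWORD} # only needed if the key file is encrypted
    scopes:
      - ${COGNITE_SCOPE}
```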
cdf-retries
Part of the `cognite` configuration.
Configure automatic retries on requests to CDF.
Parameter | Type | Description |
---|---|---|
timeout | integer | Timeout in milliseconds for each individual request to CDF. Default value is 80000 . |
max-retries | integer | Maximum number of retries on requests to CDF. If this is less than 0, retry forever. Default value is 5 . |
max-delay | integer | Max delay in milliseconds between each retry. Base delay is calculated according to 125*2^retry milliseconds. If less than 0, there is no maximum. Default value is 5000 . |
cdf-chunking
Part of the `cognite` configuration.
Configure chunking of data on requests to CDF. Note that increasing these may cause requests to fail due to limits in the API itself
Parameter | Type | Description |
---|---|---|
time-series | integer | Maximum number of timeseries per get/create timeseries request. Default value is 1000 . |
assets | integer | Maximum number of assets per get/create assets request. Default value is 1000 . |
data-point-time-series | integer | Maximum number of timeseries per datapoint create request. Default value is 10000 . |
data-point-delete | integer | Maximum number of ranges per delete datapoints request. Default value is 10000 . |
data-point-list | integer | Maximum number of timeseries per datapoint read request. Used when getting the first point in a timeseries. Default value is 100 . |
data-points | integer | Maximum number of datapoints per datapoints create request. Default value is 100000 . |
data-points-gzip-limit | integer | Minimum number of datapoints in request to switch to using gzip. Set to -1 to disable, and 0 to always enable (not recommended). The minimum HTTP packet size is generally 1500 bytes, so this should never be set below 100 for numeric datapoints. Even for larger packages gzip is efficient enough that packages are compressed below 1500 bytes. At 5000 it is always a performance gain. It can be set lower if bandwidth is a major issue. Default value is 5000 . |
raw-rows | integer | Maximum number of rows per request to cdf raw. Default value is 10000 . |
raw-rows-delete | integer | Maximum number of row keys per delete request to raw. Default value is 1000 . |
data-point-latest | integer | Maximum number of timeseries per datapoint read latest request. Default value is 100 . |
events | integer | Maximum number of events per get/create events request. Default value is 1000 . |
sequences | integer | Maximum number of sequences per get/create sequences request. Default value is 1000 . |
sequence-row-sequences | integer | Maximum number of sequences per create sequence rows request. Default value is 1000 . |
sequence-rows | integer | Maximum number of sequence rows per sequence when creating rows. Default value is 10000 . |
instances | integer | Maximum number of data modeling instances per get/create instance request. Default value is 1000 . |
cdf-throttling
Part of the `cognite` configuration.
Configure the maximum number of parallel requests for different CDF resources.
Parameter | Type | Description |
---|---|---|
time-series | integer | Maximum number of parallel requests per timeseries operation. Default value is 20 . |
assets | integer | Maximum number of parallel requests per assets operation. Default value is 20 . |
data-points | integer | Maximum number of parallel requests per datapoints operation. Default value is 10 . |
raw | integer | Maximum number of parallel requests per raw operation. Default value is 10 . |
ranges | integer | Maximum number of parallel requests per get first/last datapoint operation. Default value is 20 . |
events | integer | Maximum number of parallel requests per events operation. Default value is 20 . |
sequences | integer | Maximum number of parallel requests per sequences operation. Default value is 10 . |
instances | integer | Maximum number of parallel requests per data modeling instances operation. Default value is 4 . |
sdk-logging
Part of the `cognite` configuration.
Configure logging of requests from the SDK
Parameter | Type | Description |
---|---|---|
disable | boolean | True to disable logging from the SDK. It is enabled by default. |
level | either trace , debug , information , warning , error , critical or none | Log level to log messages from the SDK at. Default value is debug . |
format | string | Format of the log message. Default value is CDF ({Message}): {HttpMethod} {Url} {ResponseHeader[X-Request-ID]} - {Elapsed} ms . |
extraction-pipeline
Part of the `cognite` configuration.
Configure an associated extraction pipeline
Parameter | Type | Description |
---|---|---|
external-id | string | External ID of the extraction pipeline |
frequency | integer | Frequency to report Seen to the extraction pipeline in seconds. Less than or equal to zero will not report automatically. Default value is 600 . |
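For example, a hypothetical pipeline association (the external ID must match the extraction pipeline you created in CDF; see Before you start):

```yaml
cognite:
  extraction-pipeline:
    external-id: pi-extractor-pipeline   # placeholder external ID
    frequency: 600                       # report Seen every 10 minutes
```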
certificates
Part of the `cognite` configuration.
Configure special handling of SSL certificates. This should never be considered a permanent solution to certificate problems
Parameter | Type | Description |
---|---|---|
accept-all | boolean | Accept all remote SSL certificates. This introduces a severe risk of man-in-the-middle attacks |
allow-list | list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates |
allow-list
Part of the `certificates` configuration.
List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates
Each element of this list should be a string.
metadata-targets
Part of the `cognite` configuration.
Configuration for targets for time series metadata.
Parameter | Type | Description |
---|---|---|
raw | object | Configuration for writing metadata to CDF Raw. |
clean | object | Configuration for enabling writing metadata to CDF Clean. |
raw
Part of the `metadata-targets` configuration.
Configuration for writing metadata to CDF Raw.
Parameter | Type | Description |
---|---|---|
database | string | Required. The Raw database to write to. |
timeseries-table | string | Name of the Raw table to write timeseries metadata to, enables writing metadata to Raw. Metadata in this case includes name, description, and unit. |
clean
Part of the `metadata-targets` configuration.
Configuration for enabling writing metadata to CDF Clean.
Parameter | Type | Description |
---|---|---|
timeseries | boolean | Set to false to disable writing metadata to time series. Default value is True . |
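A sketch that enables both metadata targets (the Raw database and table names are placeholders):

```yaml
cognite:
  metadata-targets:
    raw:
      database: pi-metadata           # CDF Raw database to write to
      timeseries-table: time-series   # enables writing metadata to Raw
    clean:
      timeseries: true                # also write metadata to CDF Clean time series
```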
state-store
Global parameter.
Include the `state-store` section to configure the extractor to save the extraction state periodically. This makes the extraction resume faster in the next run. This section is optional. If not present, or if `database` is set to `none`, the extraction state is restored by querying the timestamps of the first and last data points of each time series. If CDF Raw is used as a state store, you can see the extracted ranges under Manage staged data in CDF.
Parameter | Type | Description |
---|---|---|
location | string | Required. Path to .db file used for storage, or name of a CDF RAW database. |
database | either None , LiteDb or Raw | Which type of database to use. Default value is None . |
interval | string | Enter the time between each write to the state store. 0 or less disables the state store. Format is as given in Intervals. Default value is 10s . |
time-series-table-name | string | Table name in Raw/Redis/LiteDB for storing time series state. Default value is ranges . |
extractor-table-name | string | Table name in Raw/Redis/LiteDB for storing general extractor state. Default value is extractor . |
redis | boolean | Use a redis state store. Overrides database . |
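For example, a sketch that stores state in CDF Raw instead of a local LiteDB file (the database name is a placeholder):

```yaml
state-store:
  database: Raw
  location: pi-extractor-state   # name of the CDF RAW database
  interval: 10s                  # write state every 10 seconds
```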
pi
Global parameter.
Configure the extractor to connect to a particular PI server or PI collective. If you configure the extractor with a PI collective, the extractor will transparently maintain a connection to one of the active servers in the collective. The default settings provide Active Directory authorization to the PI host when the server account for the Windows service is authorized.
Parameter | Type | Description |
---|---|---|
host | string | Required. Enter the hostname or IP address of the PI server or the PI collective name. If the host is a collective member name, connect to that member. |
username | string | Enter the username to use for authentication. Leave this empty if the Windows service account under which the extractor runs is authorized in Active Directory to access the PI host. |
password | string | Enter the password for the given username, if any. |
native-authentication | boolean | Determines whether the extractor will use native PI authentication or Windows authentication. The default value is false , which indicates Windows authentication. |
parallelism | integer | Insert the number of parallel requests to the PI server. If backfill-parallelism is set, this excludes backfill requests. Default value is 1 . |
backfill-parallelism | integer | Insert the number of parallel requests to the PI server for backfills. This allows the separate throttling of backfills. The default value is 0, which means that the value in parallelism is used. |
use-member-priority | boolean | When connecting to a PI Collective, attempt to connect to the member with highest priority. If set to false , attempt to connect to the member with the same name as host . |
max-connection-retries | integer | The maximum number of times to attempt to connect to the PI server before failing fatally. If this is 0 , retry forever. |
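A sketch using native PI authentication instead of the default Windows authentication (host and credentials follow the same environment-variable convention as above; the parallelism value is illustrative):

```yaml
pi:
  host: ${PI_HOST}
  username: ${PI_USER}
  password: ${PI_PASSWORD}
  native-authentication: true   # use native PI authentication
  parallelism: 4                # up to 4 parallel requests to the PI server
```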
time-series
Global parameter.
Include the `time-series` section for configuration related to the time series ingested by the extractor. This section is optional.

The example would create time series of the following form in CDF:
```json
{
  "externalId": "pi:12345",
  "name": "PI-Point-Name",
  "isString": false,
  "isStep": false,
  "dataSetId": 1234567890123456
}
```
Example:
```yaml
external-id-prefix: 'pi:'
external-id-source: SourceId
data-set-id: 1234567890123456
```
Parameter | Type | Description |
---|---|---|
external-id-prefix | string | Enter the external ID prefix to identify the time series in CDF. Leave empty for no prefix. The external ID in CDF will be this prefix followed by either the PI Point name or PI Point ID. |
external-id-source | either Name or SourceId | Enter the source of the external ID. Name means that the PI Point name is used, while SourceId means that the PI Point ID is used. Default value is Name . |
data-set-id | integer | Specify the data set to assign to all time series controlled by the extractor, both new and current. If you don't configure this, the extractor will not change the current time series' data set. |
data-set-external-id | string | Specify the external ID of the data set to use, see data-set-id . Using this requires the dataSets:READ ACL in CDF |
sanitation | either Remove , Clean or None | Specify what to do when the time series fields exceed CDF limits. Remove will skip any time series that fail sanitation. Clean will truncate and remove values to conform to limits. None does nothing (requests may fail as a result). External IDs are never truncated, any time series exceeding CDF limits will be skipped to avoid external ID collisions, regardless of this configuration. Default value is Clean . |
update-metadata | boolean | Enable updating time series if extractor configuration or PI metadata changes. Default value is True . |
space-id | string | Data modeling space ID. This parameter will override data-set-id and switch the extractor destination from the asset-centric TimeSeries model to the Core Data Model TimeSeries with the ExtractorTimeSeries extension. |
source-id | string | Data modeling source ID. This parameter overrides the external ID of the source node in data modeling, which defaults to the host parameter from the PI server configuration. |
events
Global parameter.
Configuration for writing events on reconnect and data loss incidents
The example configuration produces events of the following form:
```json
{
  "externalId": "pi-events:PiExtractor-2020-08-04 01:01:21.395(0)",
  "startTime": 1596502850668,
  "endTime": 1596502880393,
  "type": "DataLoss",
  "subtype": "DataPipeOverflow",
  "source": "PiExtractor",
  "dataSetId": 1234567890123456
}
```
Example:
```yaml
source: PiExtractor
external-id-prefix: 'pi-events:'
data-set-id: 1234567890123456
store-extractor-events-interval: 5m
```
Parameter | Type | Description |
---|---|---|
source | string | Events have this value as the source in CDF. |
external-id-prefix | string | Set an external ID prefix to identify events in CDF. Leave empty for no prefix. |
data-set-id | integer | Select a data set to assign to all events created by the extractor. We recommend using the same data set ID used in the Time series section. |
data-set-external-id | string | Data set external ID added to events written to CDF, see data-set-id . Using this requires the dataSets:READ ACL in CDF |
store-extractor-events-interval | string | Store events in CDF with this interval. Format is as given in Intervals. If this is not set, events are not created in CDF. Examples: 10s 5m |
extractor
Global parameter.
The `extractor` section contains various configuration options for the operation of the extractor itself. The options here can be used to extract only a subset of the PI points in the server. This is how the list is created:

1. If `include-tags`, `include-prefixes`, `include-patterns` or `include-attribute-values` are not empty, start with the union of these. Otherwise, start with all points.
2. Remove points as specified by `exclude-tags`, `exclude-prefixes`, `exclude-patterns` and `exclude-attribute-values`.
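For illustration, a sketch of a filter combining include and exclude rules (the tag names are hypothetical):

```yaml
extractor:
  include-prefixes:
    - 'SINUSOID'   # start with all points whose names begin with SINUSOID
  exclude-patterns:
    - 'TEST'       # then remove any point whose name contains TEST
```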
Parameter | Type | Description |
---|---|---|
include-tags | list | List of tag names to include. |
include-prefixes | list | List of tag name prefixes to include. |
include-patterns | list | List of substrings of tag names to include. |
include-attribute-values | list | List of attribute values to include. |
exclude-tags | list | List of tag names to exclude. |
exclude-prefixes | list | List of tag name prefixes to exclude. |
exclude-patterns | list | List of substrings of tag names to exclude. |
exclude-attribute-values | list | List of attribute values to exclude. |
time-series-update-interval | either string or integer | Interval between checks for new time series in PI. Format is as given in Intervals; this option also accepts cron expressions. Default value is 24h . |
deleted-time-series | object | Include the deleted-time-series subsection to configure how the extractor handles time series that exist in CDF but not in PI. This subsection is optional, and the default behavior is none (do nothing). |
end-of-stream-interval | string | Interval for fetching the end-of-stream timestamps from PI. Format is as given in Intervals. |
end-of-stream-chunking | integer | Maximum number of time series per end-of-stream request. Default value is 10000 . |
end-of-stream-throttle | integer | Maximum number of parallel end-of-stream requests. Default value is 10 . |
time-series-update-throttle | integer | Maximum number of parallel time series updates. Default value is 10 . |
dry-run | boolean | Set this to true to run the extractor in dry-run mode, where it does not push any data to CDF. Useful for testing the connection to the PI Server |
read-extracted-ranges | boolean | Set this to false to disable reading the extracted ranges from CDF on extractor startup. If this is set to false, it is strongly recommended to have a state-store configured, or the extractor will read all history from PI on every run. Default value is True . |
status-codes | object | Configuration for ingesting status codes to CDF timeseries. |
include-tags
Part of the `extractor` configuration.
List of tag names to include.
Each element of this list should be a string.
include-prefixes
Part of the `extractor` configuration.
List of tag name prefixes to include.
Each element of this list should be a string.
include-patterns
Part of the `extractor` configuration.
List of substrings of tag names to include.
Each element of this list should be a string.
include-attribute-values
Part of the `extractor` configuration.
List of attribute values to include.
Parameter | Type | Description |
---|---|---|
key | string | Attribute name |
value | string | Attribute value |
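Since each element of this list is a key/value object rather than a string, a sketch might look like this (the attribute name and value are hypothetical):

```yaml
extractor:
  include-attribute-values:
    - key: pointsource   # attribute name
      value: LAB         # attribute value
```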
exclude-tags
Part of the `extractor` configuration.
List of tag names to exclude.
Each element of this list should be a string.
exclude-prefixes
Part of the `extractor` configuration.
List of tag name prefixes to exclude.
Each element of this list should be a string.
exclude-patterns
Part of the `extractor` configuration.
List of substrings of tag names to exclude.
Each element of this list should be a string.
exclude-attribute-values
Part of the `extractor` configuration.
List of attribute values to exclude.
Parameter | Type | Description |
---|---|---|
key | string | Attribute name |
value | string | Attribute value |
deleted-time-series
Part of the `extractor` configuration.
Include the `deleted-time-series` subsection to configure how the extractor handles time series that exist in CDF but not in PI. This subsection is optional, and the default behavior is `none` (do nothing).

This only affects time series with the same data set ID and external ID prefix as the time series configured in the extractor.

To find time series that exist in CDF but not in PI, the extractor:

- Lists all time series in CDF that have the configured external ID prefix and data set ID.
- Filters the time series using the include/exclude rules defined in the extractor section.
- Matches the result against the time series obtained from the PI Server after filtering these using the include/exclude rules.
Parameter | Type | Description |
---|---|---|
behavior | either None , Flag or Delete | Select the action taken by the extractor. Setting this to Flag will perform soft deletion: flag the time series as deleted but don't delete them from CDF. Setting it to Delete will delete the time series from CDF. If you set this to Delete, the time series in CDF that cannot be found in PI will be permanently deleted from CDF. Default value is None . |
flag-name | string | If you've set behavior to Flag , use this string to mark the time series as deleted. Metadata is added to the time series with this as the key, and the current time as the value. Default value is deletedByExtractor . |
time-series-delete-throttle | integer | Maximum number of parallel delete operations. Default value is 10 . |
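For example, a sketch that soft-deletes rather than permanently deletes missing time series:

```yaml
extractor:
  deleted-time-series:
    behavior: Flag                 # flag as deleted instead of deleting from CDF
    flag-name: deletedByExtractor  # metadata key used for the flag
```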
status-codes
Part of the `extractor` configuration.
Configuration for ingesting status codes to CDF timeseries.
Parameter | Type | Description |
---|---|---|
status-codes-to-ingest | either GoodOnly , Uncertain or All | Which data points to ingest to CDF. All ingests all data points, including bad. Uncertain ingests good and uncertain data points. GoodOnly ingests only good data points. Default value is GoodOnly . |
backfill
Global parameter.
Include the `backfill` section to configure how the extractor fills in historical data back in time with respect to the first data point in CDF. The backfill process completes when all the data points in the PI Data Archive are sent to CDF, or, if the `to` parameter is set, when the extractor reaches the target timestamp for all time series.
Parameter | Type | Description |
---|---|---|
skip | boolean | Set to true to disable backfill. |
step-size-hours | integer | Step, in whole number of hours. Set to 0 to freely backfill all time series. Each iteration of backfill will backfill all time series to the next step before stepping further backward. This helps even out the progression of long backfill processes. Default value is 168 . |
to | string | The target CDF timestamp in milliseconds at which to stop the backfill. Format is as given in Timestamps and intervals; -ago can be added to make a relative timestamp in the past. The backfill may overshoot this timestamp. The overshoot will be smaller with a smaller data point chunk size. Default value is 0 . Example: 3d-ago |
time-series-delay | integer | Delay in milliseconds between each time series backfill request within a step. |
step-delay | integer | Delay in milliseconds between each step. |
retry-bucket-size | integer | Maximum size of the retry bucket. Zero or less for no limit. Default value is 100 . |
retry-bucket-cleanup-time | integer | Delay in seconds between each cleanup of the retry bucket. Default value is 3600 . |
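A sketch of a throttled backfill limited to recent history (all values are illustrative):

```yaml
backfill:
  to: 90d-ago          # stop backfilling 90 days in the past
  step-size-hours: 168 # backfill one week per step
  step-delay: 1000     # wait 1 second between steps
```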
frontfill
Global parameter.
Include the `frontfill` section to configure how the extractor fills in historical data forward in time with respect to the last data point in CDF. At startup, the extractor fills in the gap between the last data point in CDF and the last data point in PI by querying the archived data in the PI Data Archive. After that, the extractor only receives data streamed through the PI Data Pipe. These are real-time changes made to the time series in PI before archiving.
When data points are archived in PI, they may be subject to compression, reducing the total number of data points in a time series. Therefore, the backfill and frontfill tasks will receive data points after compression, while the streaming task will receive data points before compression.
Parameter | Type | Description |
---|---|---|
skip | boolean | Set this to true to disable frontfill and streaming. |
streaming-interval | string | Interval between each call to the PI Data Pipe to fetch new data. Format is as given in Intervals. If you set this parameter to a high value, there is a higher chance of having a client outbound queue overflow in the PI server. Overflow may cause loss of data points in CDF.. Default value is 1s . |
delete-data-points | boolean | If you set this to true , the corresponding data points are deleted in CDF when data point deletion events are received in the PI Data Pipe.Enabling this parameter may increase the streaming latency of the extractor since the extractor verifies the data point deletion before proceeding with the insertions. |
use-data-pipe | boolean | Older PI servers may not support data pipes. If that's the case, set this value to false to disable data pipe streaming. The frontfiller task will run constantly and will frequently query the PI Data Archive for new data points. Default value is True . |
time-series-chunk | integer | The maximum number of time series in each frontfill query to PI. The chunks will be adapted according to the density of data points per time series. Default value is 1000 . |
data-points-chunk | integer | The maximum number of requested data points in each frontfill query to PI. Default value is 10000 . |
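For example, a sketch for an older PI server without data pipe support (the interval is illustrative):

```yaml
frontfill:
  use-data-pipe: false     # query the PI Data Archive instead of the PI Data Pipe
  streaming-interval: 5s   # check for new data every 5 seconds
```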
high-availability
Global parameter.
Configuration for a Redis-based high availability store. Requires Redis to be configured in `state-store`.
Parameter | Type | Description |
---|---|---|
interval | string | Interval to update the high availability state in Redis. Format is as given in Intervals. |
timeout | integer | Timeout in seconds before taking over as active extractor. Must be set greater than 0 to enable high availability. |
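A sketch that enables high availability, assuming Redis is already configured as the state store (the values are illustrative):

```yaml
high-availability:
  interval: 15s   # update the high availability state in Redis every 15 seconds
  timeout: 60     # take over as active extractor after 60 seconds without updates
```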