Configure the OPC Classic extractor
To configure the OPC Classic extractor, you must edit the configuration file. The file is in YAML format and the sample configuration file contains all valid options with default values.
When setting up an extractor, don't base your configuration on config.example.yml. Instead, use config.minimal.yml as your base and copy the parts you need from config.example.yml.
You can exclude fields entirely to let the extractor use default values. The configuration file separates settings by component, and you can remove an entire component to disable it or use default values.
Environment variable substitution
In the config file, values wrapped in ${} are replaced with environment variables with that name. For example, ${COGNITE_PROJECT} will be replaced with the value of the environment variable called COGNITE_PROJECT.
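
As a brief illustration, the fragment below reads the project name and the client secret from the environment. Both variable names are taken from the minimal configuration in this document; any other name should work the same way, provided the variable is defined in the environment the extractor runs in.

```yaml
cognite:
  # Replaced with the value of the COGNITE_PROJECT environment variable.
  project: "${COGNITE_PROJECT}"
  idp-authentication:
    # Keeps the secret itself out of the configuration file.
    secret: "${COGNITE_CLIENT_SECRET}"
```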
The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.
You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.
Minimal YAML configuration file
version: 1
source:
  # Windows username for authentication
  username:
  # Windows password for authentication
  password:
  # List of servers to connect to.
  servers:
    - # Server host name or IP address
      host:
      # Version of DA to use, one of V2 or V3
      # This can be left out to disable live data.
      da-version:
      # Version of HDA to connect to on this host.
      # This can be left out to disable history.
      # Must be V1
      hda-version:
      # Prefix on externalIds for nodes generated by this server.
      id-prefix:
      # Name of state store used to store states if history is enabled for this server. Required if hda-version is set.
      state-store-name:
  endpoint-url: "opc.tcp://localhost:4840"
cognite:
  # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
  project: "${COGNITE_PROJECT}"
  # If this is set to true, credentials can be left out, and the extractor
  # will read data without pushing it to CDF.
  debug: false
  # This is for Microsoft as IdP, to use a different provider,
  # set implementation: Basic, and use token-url instead of tenant.
  # See the example config for the full list of options.
  idp-authentication:
    # Directory tenant
    tenant: ${COGNITE_TENANT_ID}
    # Application Id
    client-id: ${COGNITE_CLIENT_ID}
    # Client secret
    secret: ${COGNITE_CLIENT_SECRET}
    # List of resource scopes, ex:
    # scopes:
    #   - scopeA
    #   - scopeB
    scopes:
      - ${COGNITE_SCOPE}
Timestamps and intervals
In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit]. For example, 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, ms. You can also use cron expressions.
For history start and end times, you can use a similar syntax: [N][timeunit] and [N][timeunit]-ago. 1d-ago means 1 day in the past from the time history starts, and 1h means 1 hour in the future. For instance, you can use this syntax to configure the extractor to read only recent history.
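
The fragment below is a minimal sketch of both syntaxes in use. The parameter names come from the tables in this document; the top-level key history is an assumption based on the section name, so check config.example.yml for the exact spelling.

```yaml
source:
  # Read the server status every 30 seconds as a keep alive.
  keep-alive-interval: 30s
cognite:
  # Upload cached datapoints at least every 5 seconds.
  max-upload-interval: 5s
history:   # assumed top-level key, named after the History section
  # Only read history from the last day, relative to when history starts.
  start-time: 1d-ago
```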
Source
This section contains parameters for connecting to the OPC Classic servers. The extractor can connect to multiple servers, where each server has its own ID prefix. Technically, each server may have multiple connections because DA and HDA are effectively separate servers. In practice, many servers support both DA and HDA interfaces but share information between the two.
All servers share the same authentication information. This is under the assumption that the extractor signs in to the servers using a shared domain or similar. It is considered best practice to give the extractor a separate network user.
Run multiple extractors if the extractor needs multiple sets of credentials.
Parameter | Description |
---|---|
username | Windows username to use for authentication to the servers. |
password | Windows password to use for authentication to the servers. |
domain | Domain for the user used for authentication. |
parallelism | Maximum number of requests made in parallel to each server. The default value is 10. |
use-async | Use async mode when making requests to HDA servers. This can be more efficient both for the extractor and the server, but not all servers support it, so it's disabled by default. |
servers | A list of servers to extract from. |
servers[].host | Host or IP address to connect to. |
servers[].da-version | Version of DA to connect to on this host. Leave this out to skip connecting to DA. Valid options are V2 or V3. |
servers[].hda-version | Version of HDA to connect to. Leave this out to skip connecting to HDA. Must be set to V1 if enabled. |
servers[].name | Name of the server to connect to on the given host. This is used to pick which server to connect to if multiple are available. If left out, pick the first server found. |
servers[].proxy-address | Proxy requests to the server through this address. |
servers[].id-prefix | Prefix for external IDs of assets and time series created in CDF by this server. |
servers[].state-store-name | Name of state store used to store states if history is enabled for this server. Required if hda-version is set. |
servers[].cache.path | Path to a local JSON file caching the node hierarchy. If the file doesn't exist, the extractor will browse the DA and HDA servers and generate it. The file may be manually edited to limit which nodes the extractor should read. |
attribute-chunking | Configuration for chunking of attributes read from HDA servers. |
attribute-chunking.chunk-size | Maximum number of items per attribute read request. The default value is 1000. |
attribute-chunking.parallelism | Maximum number of parallel requests for attributes. The default value is 10. |
keep-alive-interval | Interval between each read of server status, used as a keep alive. The syntax is described in Timestamps and intervals. |
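
A hedged sketch of a source section with two servers, one used for live data (DA) and one for history (HDA), is shown below. The host addresses, domain, prefixes, file path, and the OPC_USERNAME and OPC_PASSWORD environment variables are hypothetical placeholders; only the parameter names come from the table above.

```yaml
source:
  # Shared Windows credentials for all servers (hypothetical variable names).
  username: ${OPC_USERNAME}
  password: ${OPC_PASSWORD}
  domain: EXAMPLE
  parallelism: 10
  # Disabled by default because not all servers support async mode.
  use-async: false
  servers:
    # Live data only: no hda-version, so no state store is needed.
    - host: 192.0.2.10
      da-version: V3
      id-prefix: "plant-a:"
    # History only: hda-version is set, so state-store-name is required.
    - host: 192.0.2.11
      hda-version: V1
      id-prefix: "plant-b:"
      state-store-name: plant-b-states
      cache:
        # Generated by browsing the servers if the file doesn't exist.
        path: cache/plant-b-nodes.json
  attribute-chunking:
    chunk-size: 1000
    parallelism: 10
  keep-alive-interval: 1m
```

Giving each server its own id-prefix keeps the external IDs created in CDF from colliding when two servers expose tags with the same name.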
Subscriptions
If you connect the extractor to a Data Access (DA) server, it establishes subscriptions on the tags it discovers. Subscriptions in OPC DA are callbacks, meaning that the server will call a function in the extractor whenever it sees any changes.
Parameter | Description |
---|---|
chunking | Configuration for how the extractor will chunk requests for subscriptions. |
chunking.chunk-size | Maximum number of items per subscription request. The default value is 1000. |
chunking.parallelism | Maximum number of parallel requests to create subscriptions. The default value is 10. |
keep-alive | Keep alive rate in milliseconds for subscriptions. The default value is 10000. |
update-rate | Requested update rate, meaning how often the server should check for updates from its underlying systems. The server is not required to obey this and may return a revised update rate or use a notification-based approach. The default value is 1000. |
deadband | Minimum difference in value required for an update to be registered. The default value is 0.0. |
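
As an illustration, the fragment below requests a slower update rate than the default and filters out small changes with a deadband. The top-level key subscriptions is an assumption based on the section name, and the values are arbitrary examples rather than recommendations.

```yaml
subscriptions:   # assumed top-level key, named after the Subscriptions section
  chunking:
    # Smaller and fewer subscription requests than the defaults (1000 and 10).
    chunk-size: 500
    parallelism: 5
  # Keep alive rate in milliseconds.
  keep-alive: 10000
  # Requested update rate; the server may revise or ignore it.
  update-rate: 5000
  # Ignore changes smaller than 0.5.
  deadband: 0.5
```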
Logger
Log levels are Fatal, Error, Warning, Information, Debug, and Verbose, in order of decreasing importance. Each level includes all levels of higher importance.
Parameter | Description |
---|---|
console | Configuration for logging to the console. |
console.level | Minimum level of log events to write to the console. Set this to enable console logging. |
console.stderr-level | Log events at this level or above are redirected to standard error. |
file | Configuration for logging to a rotating log file. |
file.level | Minimum level of log events to write to file. |
file.path | Path to the log files. For example, if this is set to logs/log.txt, log files of the form logs/log[date].txt are created, depending on rolling-interval. |
file.retention-limit | Maximum number of log files that are kept in the log folder. |
file.rolling-interval | A rolling interval for log files. Either day or hour. The default value is day. |
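
A minimal sketch of a logger section that writes Information and above to the console and a more detailed daily log to file follows. The top-level key logger and the lowercase spelling of the level names are assumptions; the retention limit is an arbitrary example.

```yaml
logger:   # assumed top-level key, named after the Logger section
  console:
    # Setting a level is what enables console logging.
    level: information
  file:
    level: debug
    # With a daily rolling interval, files are named logs/log[date].txt.
    path: logs/log.txt
    # Keep at most 31 log files in the log folder.
    retention-limit: 31
    rolling-interval: day
```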
Metrics
The OPC Classic extractor can push general usage metrics to a Prometheus pushgateway server, or expose a Prometheus server for scraping.
Parameter | Description |
---|---|
server | Configuration for a Prometheus scrape server. |
server.host | Host for a locally hosted Prometheus server, used for scraping. |
server.port | The port used by the local Prometheus server. |
push-gateways | A list of pushgateway destinations the extractor will push metrics to. |
push-gateways[].host | URI of the pushgateway host. |
push-gateways[].job | Name of the metrics job on this pushgateway. |
push-gateways[].username | Username for basic authentication. |
push-gateways[].password | Password for basic authentication. |
push-gateways[].push-interval | Interval in seconds between each push of metrics. |
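
The fragment below sketches both options side by side: a local scrape server and one pushgateway destination. The top-level key metrics is an assumption based on the section name, and the host, port, job name, and environment variable names are hypothetical placeholders.

```yaml
metrics:   # assumed top-level key, named after the Metrics section
  server:
    # Local server that Prometheus can scrape.
    host: localhost
    port: 9000
  push-gateways:
    - host: https://pushgateway.example.com
      job: opc-classic-extractor
      # Basic authentication credentials (hypothetical variable names).
      username: ${PUSHGATEWAY_USERNAME}
      password: ${PUSHGATEWAY_PASSWORD}
      # Interval in seconds between each push.
      push-interval: 30
```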
Cognite
Configuration for the connection to Cognite Data Fusion (CDF).
Parameter | Description |
---|---|
host | The CDF service URL. Defaults to https://api.cognitedata.com. |
project | The CDF project. Required. |
idp-authentication | Configuration for authenticating to CDF. |
idp-authentication.authority | Authority used with tenant to authenticate to Azure tenants. Use token-url if connecting using a non-Azure IdP. Defaults to https://login.microsoftonline.com. |
idp-authentication.tenant | Azure tenant used with authority. |
idp-authentication.token-url | URL used to obtain service tokens, used for non-Azure IdPs. |
idp-authentication.client-id | Service principal client ID. |
idp-authentication.secret | Service principal client secret. |
idp-authentication.resource | Optional resource parameter to pass along with token request. |
idp-authentication.scopes | A list of scopes to pass along with the request. This will typically need to contain [host]/.default. |
idp-authentication.audience | Optional audience parameter to pass along with token request. |
idp-authentication.min-ttl | Requested minimum time-to-live in seconds for the token. |
idp-authentication.certificate | Configuration for authenticating using a client certificate. |
idp-authentication.certificate.authority-url | Certificate authority URL. |
idp-authentication.certificate.path | Path to the .pem or .pfx certificate to be used for authentication. |
idp-authentication.certificate.password | Certificate password. |
data-set | Configuration for data set to assign newly created assets and time series to. |
data-set.id | Internal ID of the data set. Specify either this or external-id. |
data-set.external-id | External ID of the data set. Specify either this or id. |
update | Set this to true to enable updating assets and time series that have changed in the source. |
metadata-targets | Targets for writing "metadata", meaning the name, description, and metadata of assets and time series. By default, the extractor creates time series with just an external ID and nothing else. |
metadata-targets.clean | Configuration for writing metadata to CDF Clean. |
metadata-targets.clean.assets | Set to true to enable creating assets in CDF. |
metadata-targets.clean.time-series | Set to true to enable writing time series metadata. |
max-upload-interval | Maximum time to cache datapoints before they are uploaded to CDF. The syntax is described in Timestamps and intervals. Defaults to 1s. |
max-data-points-upload-queue-size | Maximum number of cached datapoints before they are uploaded to CDF. Defaults to 1000000. |
cdf-retries | Configuration for automatic retries on requests to CDF. |
cdf-retries.timeout | Timeout in milliseconds for each individual try. Defaults to 80000. |
cdf-retries.max-retries | The maximum number of retries. A value less than 0 retries forever. |
cdf-retries.max-delay | Maximum delay between each try in milliseconds. Base delay is calculated according to 125 * 2 ^ retry milliseconds. If this is less than 0, there is no upper limit. Defaults to 5000. |
cdf-chunking | Configuration for chunking on requests to CDF. Note that increasing these may cause requests to fail, due to limits in the API. Read the API documentation before making these higher than their current value. |
cdf-chunking.time-series | Maximum number of time series per get/create time series request. |
cdf-chunking.assets | Maximum number of assets per get/create assets request. |
cdf-chunking.data-point-time-series | Maximum number of time series per datapoint create request. |
cdf-chunking.data-points | Maximum number of datapoints per datapoint create request. |
cdf-throttling | Configuration for how requests to CDF should be throttled. |
cdf-throttling.time-series | Maximum number of parallel requests per time series operation. Defaults to 20. |
cdf-throttling.assets | Maximum number of parallel requests per assets operation. Defaults to 20. |
cdf-throttling.data-points | Maximum number of parallel requests per datapoints operation. Defaults to 10. |
sdk-logging | Configuration for logging of requests from the SDK. |
sdk-logging.disable | Set this to true to disable logging of requests from the SDK. Logging is enabled by default. |
sdk-logging.level | Log level for messages from the SDK. Defaults to debug. |
sdk-logging.format | Format of the log message. Defaults to CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms |
nan-replacement | Replacement for NaN values when writing to CDF. Defaults to none, meaning these are just removed. |
extraction-pipeline.external-id | Configuration for associating this extractor with an extraction pipeline. Used for monitoring and remote configuration. |
certificates | Configuration for special handling of SSL certificates. This shouldn't be considered a permanent solution to certificate problems. |
certificates.accept-all | Accept all remote SSL certificates even if verification fails. This introduces a risk of man-in-the-middle attacks. |
certificates.allow-list | List of certificate thumbprints to automatically accept. This is a much smaller risk than accepting all certificates. |
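
A hedged sketch of a cognite section that goes beyond the minimal configuration is shown below. It assigns created resources to a data set, enables asset and time series metadata, and sets the retry parameters explicitly. The data set and extraction pipeline external IDs and the max-retries value are hypothetical; the nesting is inferred from the dotted parameter names in the table above.

```yaml
cognite:
  project: "${COGNITE_PROJECT}"
  host: https://api.cognitedata.com
  data-set:
    # Specify either id or external-id, not both.
    external-id: opc-classic-data        # hypothetical data set
  # Update assets and time series that have changed in the source.
  update: true
  metadata-targets:
    clean:
      assets: true
      time-series: true
  cdf-retries:
    timeout: 80000
    max-retries: 5
    # Base delay is 125 * 2^retry ms: 125 * 2^5 = 4000 stays under this cap,
    # while 125 * 2^6 = 8000 would be limited to 5000.
    max-delay: 5000
  extraction-pipeline:
    external-id: opc-classic-pipeline    # hypothetical pipeline
```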
State Store
Configuration for storing state in a local database or in CDF RAW. This is required if reading from an HDA server.
Parameter | Description |
---|---|
location | Path to database file, or name of raw database containing state store. |
database | Which type of database to use. Valid options are LiteDb, Raw, or None. The default value is None. |
interval | Interval between each push of local states to the state store. The syntax is described in Timestamps and intervals. The default value is 1m. |
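
For illustration, the fragment below stores state in a local LiteDb file. The top-level key state-store is an assumption based on the section name, and the file name is a placeholder.

```yaml
state-store:   # assumed top-level key, named after the State Store section
  # LiteDb stores state in a local file; Raw would use a CDF RAW database.
  database: LiteDb
  # Path to the database file (or the RAW database name when using Raw).
  location: state.db
  # Push local states to the store once a minute (the default).
  interval: 1m
```

Each server that sets hda-version also needs a state-store-name in the source section, which names the store used for that server's states.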
History
Configuration for reading historical data from an HDA server.
Parameter | Description |
---|---|
backfill | Set this to true to enable backfill, meaning that the extractor will read backwards from the earliest known timestamp as well as forwards from the latest known timestamp on startup. This is only useful if there is enough data in the server that reading it all will take a very long time. |
start-time | The earliest timestamp history will be read from, in milliseconds since 01/01/1970. Alternatively, use the syntax N[timeunit](-ago), where timeunit is one of w, d, h, m, s, or ms. -ago indicates a time in the past; without it, the time is in the future. |
end-time | The latest timestamp history will be read from, in milliseconds since 01/01/1970. Alternatively, use the syntax N[timeunit](-ago), where timeunit is one of w, d, h, m, s, or ms. -ago indicates a time in the past; without it, the time is in the future. |
chunking | Chunking for history reads. |
chunking.chunk-size | Maximum number of items per history read request. Defaults to 1000. |
chunking.parallelism | Maximum number of parallel history read requests. Defaults to 10. |
chunking.max-per-minute | Maximum number of history read requests per minute. |
chunking.max-read-per-tag | Maximum number of values returned per tag per request. Defaults to 1000. |
granularity | Granularity to use when doing history read. Nodes whose latest or earliest known timestamps are within this range of each other are read together. This shouldn't be smaller than the usual average update rate. Leave at 0 to always read a single node each time. The syntax is described in Timestamps and intervals. Defaults to 15s. |
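
The fragment below is a minimal sketch of a history section that limits reads to the last 30 days, enables backfill, and caps the request rate. The top-level key history is an assumption based on the section name, and the start-time and max-per-minute values are arbitrary examples.

```yaml
history:   # assumed top-level key, named after the History section
  # Read backwards from the earliest known timestamp as well as forwards.
  backfill: true
  # Don't read history older than 30 days.
  start-time: 30d-ago
  chunking:
    chunk-size: 1000
    parallelism: 10
    # Cap the load on the HDA server.
    max-per-minute: 600
    max-read-per-tag: 1000
  # Read nodes together when their known timestamps are within 15 seconds.
  granularity: 15s
```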