
Configure the OPC Classic extractor

To configure the OPC Classic extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.

When setting up an extractor, don't base your configuration on config.example.yml. Instead, use config.minimal.yml as your base and copy the parts you need from config.example.yml.

You can exclude fields entirely to let the extractor use default values. The configuration file separates settings by component, and you can remove an entire component to disable it or use default values.

Environment variable substitution

In the config file, values wrapped in ${} are replaced with the value of the environment variable with that name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.
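As a minimal sketch, the fragment below pins the configuration schema version and reads the CDF project name from an environment variable. The variable name COGNITE_PROJECT matches the one used in the minimal configuration file; any other variable names are up to you.

```yaml
# Version of the configuration schema described in this document.
version: 1

cognite:
  # Replaced at startup with the value of the COGNITE_PROJECT environment variable.
  project: "${COGNITE_PROJECT}"
```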

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Minimal YAML configuration file

version: 1

source:
  # Windows username for authentication
  username:
  # Windows password for authentication
  password:
  # List of servers to connect to.
  servers:
    - # Server host name or IP address
      host:
      # Version of DA to use, one of V2 or V3
      # This can be left out to disable live data.
      da-version:
      # Version of HDA to connect to on this host.
      # This can be left out to disable history.
      # Must be V1
      hda-version:
      # Prefix on externalIds for nodes generated by this server.
      id-prefix:
      # Name of state store used to store states if history is enabled for this server. Required if hda-version is set.
      state-store-name:
  endpoint-url: "opc.tcp://localhost:4840"

cognite:
  # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
  project: "${COGNITE_PROJECT}"

  # If this is set to true, credentials can be left out, and the extractor
  # will read data without pushing it to CDF.
  debug: false

  # This is for Microsoft as IdP. To use a different provider,
  # set implementation: Basic, and use token-url instead of tenant.
  # See the example config for the full list of options.
  idp-authentication:
    # Directory tenant
    tenant: ${COGNITE_TENANT_ID}
    # Application Id
    client-id: ${COGNITE_CLIENT_ID}
    # Client secret
    secret: ${COGNITE_CLIENT_SECRET}
    # List of resource scopes, ex:
    # scopes:
    #   - scopeA
    #   - scopeB
    scopes:
      - ${COGNITE_SCOPE}

Timestamps and intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit]. For example, 10m means 10 minutes and 1h means 1 hour. The timeunit is one of d, h, m, s, or ms. You can also use cron expressions.
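For illustration, the fragment below sets a few of the interval parameters described later on this page using the [N][timeunit] syntax. The values are arbitrary examples, and the top-level key state-store is an assumption based on the State Store section title; check config.example.yml for the exact key names.

```yaml
source:
  # Read server status every 10 seconds as a keep-alive.
  keep-alive-interval: 10s

cognite:
  # Upload cached datapoints at least every 5 seconds.
  max-upload-interval: 5s

# Assumed top-level key name; see config.example.yml.
state-store:
  # Push local states to the state store every 2 minutes.
  interval: 2m
```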

For history start and end times, you can use a similar syntax: [N][timeunit] and [N][timeunit]-ago. For example, 1d-ago means 1 day in the past from the time history reading starts, and 1h means 1 hour in the future. You can use this syntax to, for instance, configure the extractor to read only recent history.
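As a sketch, assuming the history settings live under a top-level history key (inferred from the History section title, not confirmed on this page), the following limits history reads to the last 2 days. The commented-out line shows the equivalent absolute form for a fixed start time.

```yaml
# Assumed top-level key name; see config.example.yml.
history:
  # Start 2 days before the time history reading starts.
  start-time: 2d-ago
  # Alternative absolute start time, 2023-01-01T00:00:00Z,
  # in milliseconds since 1970-01-01:
  # start-time: 1672531200000
```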

Source

This section contains parameters for connecting to the OPC Classic servers. The extractor can connect to multiple servers, where each server has its own ID prefix. Technically, each server may have multiple connections, because DA and HDA are effectively separate servers. In practice, many servers support both the DA and HDA interfaces and share information between them.

All servers share the same authentication information. This assumes that the extractor signs in to the servers through a shared domain or similar. As a best practice, give the extractor its own network user.

If the extractor needs multiple sets of credentials, run multiple extractors.

Parameter | Description
--- | ---
username | Windows username to use for authentication to the servers.
password | Windows password to use for authentication to the servers.
domain | Domain of the user used for authentication.
parallelism | Maximum number of requests made in parallel to each server. The default value is 10.
use-async | Use async mode when making requests to HDA servers. This can be more efficient for both the extractor and the server, but not all servers support it, so it's disabled by default.
servers | A list of servers to extract from.
servers[].host | Host name or IP address to connect to.
servers[].da-version | Version of DA to connect to on this host. Valid options are V2 or V3. Leave this out to not connect to DA at all.
servers[].hda-version | Version of HDA to connect to on this host. If set, this must be V1. Leave this out to not connect to HDA at all.
servers[].name | Name of the server to connect to on the given host. If multiple servers are available, this selects which one to connect to. If left out, the extractor connects to the first server it finds.
servers[].proxy-address | Proxy requests to the server through this address.
servers[].id-prefix | Prefix for external IDs of assets and time series created in CDF by this server.
servers[].state-store-name | Name of the state store used to store states if history is enabled for this server. Required if hda-version is set.
servers[].cache.path | Path to a local JSON file caching the node hierarchy. If the file doesn't exist, the extractor browses the DA and HDA servers and generates it. You can edit the file manually to limit which nodes the extractor reads.
attribute-chunking | Configuration for chunking of attributes read from HDA servers.
attribute-chunking.chunk-size | Maximum number of items per attribute read request. The default value is 1000.
attribute-chunking.parallelism | Maximum number of parallel requests for attributes. The default value is 10.
keep-alive-interval | Interval between each read of server status, used as a keep-alive. The syntax is described in Timestamps and intervals.
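The fragment below sketches a source section with one server that provides both live data (DA) and history (HDA), plus some of the optional tuning parameters. The domain, host name, ID prefix, state store name, and cache path are hypothetical placeholders, and the environment variable names OPC_USERNAME and OPC_PASSWORD are assumptions.

```yaml
source:
  # Hypothetical environment variable names.
  username: ${OPC_USERNAME}
  password: ${OPC_PASSWORD}
  # Hypothetical Windows domain of the extractor's network user.
  domain: PLANT
  # At most 10 parallel requests to each server (the default).
  parallelism: 10
  # Only enable this if the HDA server supports async mode.
  use-async: false
  servers:
    - host: opc-host-01          # hypothetical host name
      da-version: V3             # live data through DA
      hda-version: V1            # history through HDA
      id-prefix: "plant1:"       # hypothetical prefix, quoted because it ends in a colon
      # Required because hda-version is set; hypothetical name.
      state-store-name: plant1-states
      cache:
        # Hypothetical path; generated by browsing the servers if missing.
        path: cache/plant1-nodes.json
  attribute-chunking:
    chunk-size: 1000
    parallelism: 10
  keep-alive-interval: 10s
```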

Subscriptions

If you connect the extractor to a Data Access (DA) server, it establishes subscriptions on the tags it discovers. Subscriptions in OPC DA are callbacks, meaning that the server will call a function in the extractor whenever it sees any changes.

Parameter | Description
--- | ---
chunking | Configuration for how the extractor chunks requests for subscriptions.
chunking.chunk-size | Maximum number of items per subscription request. The default value is 1000.
chunking.parallelism | Maximum number of parallel requests to create subscriptions. The default value is 10.
keep-alive | Keep-alive rate in milliseconds for subscriptions. The default value is 10000.
update-rate | Requested update rate, meaning how often the server should check its underlying systems for updates. The server is not required to obey this, and may return a revised update rate or use a notification-based approach instead. The default value is 1000.
deadband | Minimum difference in value required for an update to be registered. The default value is 0.0.
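A sketch of the subscription settings, assuming they live under a top-level subscriptions key (inferred from the section title; confirm against config.example.yml). All values are the documented defaults except deadband, which is an arbitrary example.

```yaml
# Assumed top-level key name; see config.example.yml.
subscriptions:
  chunking:
    chunk-size: 1000
    parallelism: 10
  # Keep-alive rate in milliseconds.
  keep-alive: 10000
  # Requested update rate; the server may revise it.
  update-rate: 1000
  # Minimum value change required to register an update (arbitrary example).
  deadband: 0.5
```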

Logger

Log levels are, in order of decreasing importance: Fatal, Error, Warning, Information, Debug, and Verbose. Each level includes all levels of higher importance. For example, setting a level to Warning also logs Error and Fatal events.

Parameter | Description
--- | ---
console | Configuration for logging to the console.
console.level | Minimum level of log events to write to the console. Set this to enable console logging.
console.stderr-level | Log events at this level or above are redirected to standard error.
file | Configuration for logging to a rotating log file.
file.level | Minimum level of log events to write to file.
file.path | Path to the log files. If this is set to, for example, logs/log.txt, the extractor creates files of the form logs/log[date].txt, depending on rolling-interval.
file.retention-limit | Maximum number of log files kept in the log folder.
file.rolling-interval | Rolling interval for log files, either day or hour. The default value is day.
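For example, the following sketch writes Information and more important events to the console and Debug and more important events to daily log files. It assumes a top-level logger key and lowercase level names; neither is confirmed on this page, so check config.example.yml.

```yaml
# Assumed top-level key name and level spelling; see config.example.yml.
logger:
  console:
    level: information
    # Warning, Error, and Fatal events go to standard error.
    stderr-level: warning
  file:
    level: debug
    # Produces files of the form logs/log[date].txt, one per day.
    path: logs/log.txt
    # Keep roughly one month of daily files.
    retention-limit: 31
    rolling-interval: day
```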

Metrics

The OPC Classic extractor can push general usage metrics to a Prometheus pushgateway server, or expose a Prometheus server for scraping.

Parameter | Description
--- | ---
server | Configuration for a Prometheus scrape server.
server.host | Host for a locally hosted Prometheus server, used for scraping.
server.port | Port used by the local Prometheus server.
push-gateways | A list of pushgateway destinations the extractor pushes metrics to.
push-gateways[].host | URI of the pushgateway host.
push-gateways[].job | Name of the metrics job on this pushgateway.
push-gateways[].username | Username for basic authentication.
push-gateways[].password | Password for basic authentication.
push-gateways[].push-interval | Interval in seconds between each push of metrics.
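A sketch of a metrics section that both exposes a local scrape endpoint and pushes to one pushgateway. The top-level key metrics, the port, the pushgateway URI, the job name, and the environment variable names are assumptions for illustration.

```yaml
# Assumed top-level key name; see config.example.yml.
metrics:
  server:
    host: localhost
    # Hypothetical port for the local scrape endpoint.
    port: 9000
  push-gateways:
    - host: http://pushgateway.example.com:9091   # hypothetical URI
      job: opc-classic-extractor                  # hypothetical job name
      username: ${PUSHGATEWAY_USER}
      password: ${PUSHGATEWAY_PASSWORD}
      # Push metrics every 10 seconds.
      push-interval: 10
```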

Cognite

Configuration for the connection to Cognite Data Fusion (CDF).

Parameter | Description
--- | ---
host | The CDF service URL. Defaults to https://api.cognitedata.com.
project | The CDF project. Required.
idp-authentication | Configuration for authenticating to CDF.
idp-authentication.authority | Authority used with tenant to authenticate to Azure tenants. Use token-url if connecting using a non-Azure IdP. Defaults to https://login.microsoftonline.com.
idp-authentication.tenant | Azure tenant used with authority.
idp-authentication.token-url | URL used to obtain service tokens, used for non-Azure IdPs.
idp-authentication.client-id | Service principal client ID.
idp-authentication.secret | Service principal client secret.
idp-authentication.resource | Optional resource parameter to pass along with the token request.
idp-authentication.scopes | A list of scopes to pass along with the request. This typically needs to contain [host]/.default.
idp-authentication.audience | Optional audience parameter to pass along with the token request.
idp-authentication.min-ttl | Requested minimum time-to-live in seconds for the token.
idp-authentication.certificate | Configuration for authenticating using a client certificate.
idp-authentication.certificate.authority-url | Certificate authority URL.
idp-authentication.certificate.path | Path to the .pem or .pfx certificate to be used for authentication.
idp-authentication.certificate.password | Certificate password.
data-set | Configuration for the data set to assign newly created assets and time series to.
data-set.id | Internal ID of the data set. Specify either this or external-id.
data-set.external-id | External ID of the data set. Specify either this or id.
update | Set this to true to enable updating assets and time series that have changed in the source.
metadata-targets | Targets for writing metadata, meaning the name, description, and metadata of assets and time series. By default, the extractor creates time series with only an external ID and nothing else.
metadata-targets.clean | Configuration for writing metadata to CDF Clean.
metadata-targets.clean.assets | Set this to true to enable creating assets in CDF.
metadata-targets.clean.time-series | Set this to true to enable writing time series metadata.
max-upload-interval | Maximum time to cache datapoints before they are uploaded to CDF. The syntax is described in Timestamps and intervals. Defaults to 1s.
max-data-points-upload-queue-size | Maximum number of cached datapoints before they are uploaded to CDF. Defaults to 1000000.
cdf-retries | Configuration for automatic retries on requests to CDF.
cdf-retries.timeout | Timeout in milliseconds for each individual try. Defaults to 80000.
cdf-retries.max-retries | The maximum number of retries. A value less than 0 retries forever.
cdf-retries.max-delay | Maximum delay between each try in milliseconds. The base delay is 125 * 2 ^ retry milliseconds. If this is less than 0, there is no upper limit. Defaults to 5000.
cdf-chunking | Configuration for chunking requests to CDF. Increasing these values may cause requests to fail due to limits in the API. Read the API documentation before raising them above their default values.
cdf-chunking.time-series | Maximum number of time series per get/create time series request.
cdf-chunking.assets | Maximum number of assets per get/create assets request.
cdf-chunking.data-point-time-series | Maximum number of time series per datapoint create request.
cdf-chunking.data-points | Maximum number of datapoints per datapoint create request.
cdf-throttling | Configuration for how requests to CDF should be throttled.
cdf-throttling.time-series | Maximum number of parallel requests per time series operation. Defaults to 20.
cdf-throttling.assets | Maximum number of parallel requests per assets operation. Defaults to 20.
cdf-throttling.data-points | Maximum number of parallel requests per datapoints operation. Defaults to 10.
sdk-logging | Configuration for logging of requests from the SDK.
sdk-logging.disable | Set this to true to disable logging of requests from the SDK. Logging is enabled by default.
sdk-logging.level | Log level for messages from the SDK. Defaults to debug.
sdk-logging.format | Format of the log message. Defaults to CDF ({Message}): {HttpMethod} {Url} - {Elapsed} ms
nan-replacement | Replacement for NaN values when writing to CDF. Defaults to none, meaning NaN values are removed.
extraction-pipeline.external-id | External ID of the extraction pipeline to associate this extractor with. Used for monitoring and remote configuration.
certificates | Configuration for special handling of SSL certificates. This shouldn't be considered a permanent solution to certificate problems.
certificates.accept-all | Accept all remote SSL certificates, even if verification fails. This introduces a risk of man-in-the-middle attacks.
certificates.allow-list | List of certificate thumbprints to accept automatically. This is much less risky than accepting all certificates.
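The fragment below sketches some of the optional Cognite parameters. The data set and extraction pipeline external IDs are hypothetical, and authentication is left out (see the minimal configuration file). The comments on cdf-retries work through the documented backoff formula, 125 * 2 ^ retry milliseconds, assuming the retry counter starts at 0; the extractor's actual counting may differ.

```yaml
cognite:
  project: "${COGNITE_PROJECT}"
  host: https://api.cognitedata.com
  data-set:
    # Hypothetical data set; specify either id or external-id.
    external-id: opc-classic-data
  # Create assets and write time series metadata instead of bare time series.
  metadata-targets:
    clean:
      assets: true
      time-series: true
  update: true
  max-upload-interval: 1s
  cdf-retries:
    timeout: 80000
    max-retries: 8
    # Assuming retry counts from 0, retries 0-5 wait
    # 125, 250, 500, 1000, 2000, and 4000 ms. Retries 6 and 7 would
    # wait 8000 and 16000 ms, but max-delay caps them at 5000 ms.
    max-delay: 5000
  extraction-pipeline:
    # Hypothetical extraction pipeline external ID.
    external-id: opc-classic-pipeline
```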

State Store

Configuration for storing state in a local database or in CDF RAW. This is required if reading from an HDA server.

Parameter | Description
--- | ---
location | Path to the database file, or name of the CDF RAW database containing the state store.
database | Type of database to use. Valid options are LiteDb, Raw, or None. The default value is None.
interval | Interval between each push of local states to the state store. The syntax is described in Timestamps and intervals. The default value is 1m.
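As a sketch, the following stores extraction state in a local LiteDB file and pushes states every minute. The top-level key state-store and the file path are assumptions. To store state in CDF RAW instead, set database to Raw and location to the name of a RAW database.

```yaml
# Assumed top-level key name; see config.example.yml.
state-store:
  # Hypothetical path to a local LiteDB file.
  location: state.db
  database: LiteDb
  interval: 1m
```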

History

Configuration for reading historical data from an HDA server.

Parameter | Description
--- | ---
backfill | Set this to true to enable backfill, meaning that on startup the extractor reads backwards from the earliest known timestamp as well as forwards from the latest known timestamp. This is only useful if the server holds so much data that reading it all takes a very long time.
start-time | The earliest timestamp to read history from, in milliseconds since 01/01/1970. Alternatively, use the syntax N[timeunit](-ago), where timeunit is one of w, d, h, m, s, or ms. The -ago suffix means the time is in the past; without it, the time is in the future.
end-time | The latest timestamp to read history to, in milliseconds since 01/01/1970. Alternatively, use the syntax N[timeunit](-ago), where timeunit is one of w, d, h, m, s, or ms. The -ago suffix means the time is in the past; without it, the time is in the future.
chunking | Chunking for history reads.
chunking.chunk-size | Maximum number of items per history read request. Defaults to 1000.
chunking.parallelism | Maximum number of parallel history read requests. Defaults to 10.
chunking.max-per-minute | Maximum number of history read requests per minute.
chunking.max-read-per-tag | Maximum number of values returned per tag per request. Defaults to 1000.
granularity | Granularity to use when reading history. Nodes whose last or earliest known timestamps are within this range of each other are read together. This shouldn't be smaller than the typical update rate. Set this to 0 to always read a single node at a time. The syntax is described in Timestamps and intervals. Defaults to 15s.
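A sketch of a history section, assuming the top-level key history (inferred from the section title). It enables backfill, reads up to 30 days back, and caps the request rate. The values for max-per-minute and granularity are arbitrary examples rather than recommendations.

```yaml
# Assumed top-level key name; see config.example.yml.
history:
  backfill: true
  start-time: 30d-ago
  chunking:
    chunk-size: 1000
    parallelism: 10
    # Arbitrary example cap on history read requests per minute.
    max-per-minute: 600
    max-read-per-tag: 1000
  # Read nodes together if their known timestamps are within 30 seconds.
  granularity: 30s
```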