Configure the OPC UA extractor

To configure the OPC UA extractor, you must edit the configuration file. The file is in YAML format, and the sample configuration file contains all valid options with default values.

You can leave many fields empty to let the extractor use the default values. The configuration file separates the settings by component, and you can remove an entire component to disable it or use the default values.

Sample configuration files

In the extractor installation folder, the /config subfolder contains complete and minimal sample configuration files. Values wrapped in ${} are replaced with the environment variable of the same name. For example, ${COGNITE_PROJECT} is replaced with the value of the environment variable called COGNITE_PROJECT.

The configuration file also contains the global parameter version, which holds the version of the configuration schema used in the configuration file. This document describes version 1 of the configuration schema.

Tip

You can set up extraction pipelines to use versioned extractor configuration files stored in the cloud.

Minimal YAML configuration file

version: 1

source:
  # The URL of the OPC-UA server to connect to
  endpoint-url: 'opc.tcp://localhost:4840'

cognite:
  # The project to connect to in the API, uses the environment variable COGNITE_PROJECT.
  project: '${COGNITE_PROJECT}'
  # Cognite authentication
  # This is for Microsoft as IdP. To use a different provider,
  # set implementation: Basic, and use token-url instead of tenant.
  # See the example config for the full list of options.
  idp-authentication:
    # Directory tenant
    tenant: ${COGNITE_TENANT_ID}
    # Application ID
    client-id: ${COGNITE_CLIENT_ID}
    # Client secret
    secret: ${COGNITE_CLIENT_SECRET}
    # List of resource scopes, ex:
    # scopes:
    #   - scopeA
    #   - scopeB
    scopes:
      - ${COGNITE_SCOPE}

extraction:
  # Global prefix for externalId in destinations. Should be unique to prevent name conflicts.
  id-prefix: 'gp:'
  # Map OPC-UA namespaces to prefixes in CDF. If not mapped, the full namespace URI is used.
  # Saves space compared to using the full URI. Using the ns index is not safe, as the order can change on the server.
  # It is recommended to set this before extracting the node hierarchy.
  # For example:
  # namespace-map:
  #   "urn:cognite:net:server": cns
  #   "urn:freeopcua:python:server": fps
  #   "http://examples.freeopcua.github.io": efg

ProtoNodeId

You can provide an OPC UA NodeId in several places in the configuration file, as a YAML object with the following structure:

node:
  # Identifier of the node
  node-id: i=123
  # Full namespace URI
  namespace-uri: opc.tcp://test.test/

To find the node IDs, we recommend using the UaExpert tool.

Locate the data type/event type/node in the hierarchy, then find the node ID on the right side under Attribute > NodeId. Find the namespace URI by matching the NamespaceIndex on the right to the namespace list on the left.

If either part is left empty, it's converted to a default node ID depending on context. This happens automatically for events if you use the configuration tool released with version 1.1. If a mapping is specified in namespace-map, you can use the mapped value in place of namespace-uri.
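For example, the root node for browsing (see extraction/root-node below) is given as a ProtoNodeId. A minimal sketch, where the identifier and namespace URI are placeholders for values from your own server:

extraction:
  root-node:
    # String identifier, on the form i=..., s=..., g=..., etc.
    node-id: s=MyFolder
    # Full namespace URI of the node
    namespace-uri: urn:mycompany:myserver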

Timestamps and intervals

In most places where time intervals are required, you can use a CDF-like syntax of [N][timeunit], for example, 10m for 10 minutes or 1h for 1 hour. timeunit is one of d, h, m, s, or ms. Where it makes sense, you can also use a cron expression.

For history start and end times, you can use a similar syntax: [N][timeunit] and [N][timeunit]-ago. 1d-ago means one day in the past from the time history starts, and 1h means one hour in the future. For instance, you can use this syntax to configure the extractor to read only recent history.
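As an illustration, the same syntax appears in several sections. A sketch with purely illustrative values:

history:
  # Read at most the last three days of history
  start-time: 3d-ago
  # Restart history at 02:00 every night, using a cron expression
  restart-period: '0 2 * * *'
extraction:
  # Rebrowse the node hierarchy every six hours
  auto-rebrowse-period: 6h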

Source

This part of the configuration file concerns the extraction from the OPC UA server.

Parameters:

  • endpoint-url - The URL of the OPC UA server to connect to. In practice, this is the URL of the discovery server, where multiple levels of security may be provided. The OPC UA extractor attempts to use the highest security possible based on the configuration. Required.
  • alt-endpoint-urls - A list of alternative endpoint URLs the extractor can attempt when connecting to the server. Use this for non-transparent redundancy. See the OPC UA standard part 4, section 6.6.2. We recommend setting force-restart to true; otherwise, the extractor will reconnect to the same server each time.
  • endpoint-details - Details to override default endpoint behavior. This is used to make the client connect directly to an OPC UA endpoint, for example if the server is behind NAT (Network Address Translation), circumventing server discovery. This parameter contains one field: override-endpoint-url, which overrides the URL of the selected endpoint.
  • redundancy - Additional configuration options related to redundant servers. The OPC UA extractor supports Cold redundancy, as described in the OPC UA standard part 4, section 6.6.2. Options:
      • service-level-threshold - Servers above this level are considered live. If the server drops below this level, the extractor will switch, provided monitor-service-level is set to true. The default value is 200.
      • reconnect-interval - The extractor will look through the available servers at this interval if the service level of the current server is below service-level-threshold. The default value is 10m. The syntax is described in Timestamps and intervals.
      • monitor-service-level - If true, the extractor will subscribe to changes in the ServiceLevel of the server, and attempt to change server once it drops below service-level-threshold. This also makes the extractor stop updating extraction states while the service level is below the threshold, letting servers indicate to the extractor that they aren't receiving all data from their sources.
  • reverse-connect-url - The local URL used for reverse connect. This is the URL the server should connect to. You should also specify an endpoint-url. With reverse connect, the server is responsible for initiating connections, so it can be placed behind a firewall. Leave empty to use direct connections.
  • auto-accept - Set to true to automatically accept server certificates. If you set this to false and try to connect to a server with higher security than None, the connection fails, and the certificate is placed in the rejected certificates folder (by default application_dir/pki/rejected/). You can manually move it to the accepted certificates folder (application_dir/pki/accepted). A simple solution is to set this to true once on the first connection, then change it to false.
  • username/password - Used for server sign-in. Leave username empty to use no authentication.
  • x509-certificate - Specifies the configuration for using a signed x509 certificate to connect to the server. Options:
      • file-name - The path to the x509 certificate.
      • password - The password for the x509 certificate file.
      • store - The local store to use: None (to use file), Local (for LocalMachine), or User.
      • cert-name - The name of the certificate in the store.
  • secure - Try to connect to an endpoint with security above None.
  • ignore-certificate-issues - Ignore all suppressible certificate errors on the server certificate. You can use this setting if you receive errors such as Certificate use not allowed.

    CAUTION: This is potentially a security risk. Bad certificates can open the extractor to man-in-the-middle attacks from the server or similar. If the server security is handled elsewhere (it's running locally, over a secure VPN, or similar), it's most likely fairly safe.

    Some errors aren't suppressible and must be remedied on the server.
  • publishing-interval - Sets the interval in milliseconds between publishing requests to the server. This limits the maximum frequency of points pushed to CDF, but not the maximum frequency of points on the server. In most cases, this can be set to the same as extraction.data-push-delay. If you set it to 0, the server chooses the interval according to the specification.
  • force-restart - If true, the OPC UA extractor won't attempt to reconnect using the OPC UA reconnect protocol on a disconnect from the server, but will restart completely instead. Use this option for servers that don't support reconnecting.
  • exit-on-failure - If true, the OPC UA extractor won't automatically restart after a crash, but defers to some external mechanism.
  • restart-on-reconnect - If true, the OPC UA extractor is restarted on reconnect. This may not be required if the server is expected to be static and handles reconnects well; in that case, leaving it disabled lowers reconnect times.
  • keep-alive-interval - Specifies the interval in milliseconds between each keep-alive request to the server. The connection times out if a keep-alive request fails twice (2 * interval + 100 ms). This typically happens if the server hangs on a heavy operation and doesn't manage to respond to keep-alive requests, or if the server goes down. In the first case, waiting can be a good option; in the second case, it's better to time out quickly.
  • node-set-source - Read from NodeSet2 files instead of browsing the OPC UA node hierarchy. This is useful for smaller servers, where the full node hierarchy is defined in such files. In general, it can be used to lower the load on the server if parts of the hierarchy are known beforehand. Options:
      • node-sets - A list of objects with either file-name or url, pointing to a NodeSet2.xml file.
      • instance - Boolean. If true, the instance hierarchy isn't browsed from the server, but obtained from the NodeSet2 files instead.
      • types - Boolean. If true, event types, reference types, and object types are obtained from the NodeSet2 files.
  • limit-to-server-config - The default value true uses the Server_ServerCapabilities object to limit chunk sizes. Set this to false only if you want to set the limits higher and are certain that the server is reporting the wrong limits. If the real server limits are exceeded, the extractor will typically crash.
  • alt-source-background-browse - If true, browses the OPC UA node hierarchy in the background when reading nodes from NodeSet2 files or from CDF RAW. This doesn't reduce the load on the server, but can speed up startup.
  • browse-chunk - Sets the maximum number of desired results from each call to the Browse service in OPC UA. Most servers have some limit, but the default of 1000 is usually reasonable. The server should also usually limit this on its own.
  • browse-nodes-chunk - Sets the maximum number of nodes to browse per Browse service call. If set too high, the browse operation may fail. Most servers have an upper limit on the number of operations per service call, and this value may also affect speed. We don't recommend setting this to 1, but it may be necessary for some servers.
  • attributes-chunk - Specifies the maximum number of attributes to fetch per operation. If the server fails with a TooManyOperations exception during attribute read, it may help to lower this value. 1000 should be fine for most servers, and may even be set higher for higher-spec servers. For very large servers, 1000 will take a long time, and this should be set as high as possible, even if that requires increasing keep-alive-interval.
  • subscription-chunk - Sets the maximum number of new MonitoredItems to create per operation. If the server fails with TooManyOperations, try lowering this value. Unless there is a very large number of nodes on the server, 1000 per chunk is generally fine.
  • browse-throttling - Configuration object for throttling browse operations. Options:
      • max-per-minute - The maximum number of browse requests per minute.
      • max-parallelism - The maximum number of parallel browse requests, if supported by the server.
      • max-node-parallelism - The maximum number of nodes to read in parallel. This can be used to limit the number of continuation points used by the extractor.
  • certificate-expiry - Specifies the extractor certificate expiration in months. You can also replace the certificate with your own by modifying the .xml configuration file. Defaults to 5 years as of v2.5.3.
  • retries - Specifies the retry policy for requests to the OPC UA server. Options:
      • timeout - Total timeout. After this much time has elapsed, no more retries are attempted.
      • max-tries - The maximum number of retries.
      • max-delay - The maximum delay between retry attempts.
      • initial-delay - The initial delay between retry attempts. This is used as the basis for exponential backoff.
      • retry-status-codes - A list of numerical OPC UA status codes to retry, in addition to those retried by default by the extractor.
    The syntax for delays is described in Timestamps and intervals.
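To make this concrete, here is a hedged sketch of a source section; the endpoint URL and all numeric values are placeholders, not recommendations:

source:
  endpoint-url: 'opc.tcp://server.example.com:4840'
  # Accept the server certificate automatically
  auto-accept: true
  # Require security above None
  secure: true
  # Publishing interval in milliseconds
  publishing-interval: 500
  # Keep-alive interval in milliseconds
  keep-alive-interval: 5000
  retries:
    timeout: 1m
    max-tries: 5
    initial-delay: 500ms
    max-delay: 10s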

History

The OPC UA extractor supports reading from data and event history in OPC UA. For data, the Historizing attribute must be set on the nodes to be read. For events, you must explicitly specify the node IDs of the emitters in the configuration.

Parameters:

  • enabled - Set to false to disable history read. This overrides all other history configuration, and disables history entirely for both events and data points.
  • data - Set to false to disable history for data points. The default value is true. Use this to enable history for events only.
  • backfill - Enable backfill, meaning that data is read backward and forward through history. With backfill, the extractor can start reading live values without completing the history read first, which is useful if there is a lot of history. If set to false (default), the behavior matches versions before 1.1: data is read from the beginning of history to the end before any live streaming begins.
  • require-historizing - Set to true to require the Historizing attribute to be set on time series in order to read history.
  • restart-period - Time in seconds to wait between each restart of history. Setting this too low may impact performance. Leave at 0 to disable periodic restarts. The syntax is described in Timestamps and intervals; this option allows cron expressions.
  • data-chunk - The maximum number of results to request per HistoryRead call when reading variables. Generally, this is limited by the server, so it can safely be set to 0.
  • data-nodes-chunk - The maximum number of nodes to query per HistoryRead call when reading variables. If granularity is set, this is applied afterward.
  • event-chunk - The maximum number of results to request per HistoryRead call when reading events. Generally, this is limited by the server, so it can safely be set to 0.
  • event-nodes-chunk - The maximum number of nodes to query per HistoryRead call when reading events.
  • granularity - Granularity in seconds for chunking history read operations. Variables with the latest timestamp within the same chunk have their history read together. Reading more variables per operation is more efficient, but if the granularity is set too high, a large number of duplicates are fetched, which can be inefficient. The best choice for this value is a few times the expected update frequency of your variables. The syntax is described in Timestamps and intervals.
  • start-time - The earliest timestamp to read from, in milliseconds since January 1, 1970. The syntax is described in Timestamps and intervals; -ago can be added to make a timestamp in the past.
  • end-time - The timestamp to be considered the end of forward history, in milliseconds since January 1, 1970. Only relevant if max-history-length is set. If this is 0, the default is the current time. The syntax is described in Timestamps and intervals; -ago can be added to make a timestamp in the past.
  • ignore-continuation-points - Set to true to attempt to read history without using ContinuationPoints, instead using the Time of events and the SourceTimestamp of data points to incrementally move the start time of the request forward until no points are returned.
  • max-history-length - The maximum length of each history read, in seconds. If this is set greater than zero, history is read in chunks of at most this size until the end. This can potentially take a very long time if end-time is much later than start-time. The syntax is described in Timestamps and intervals.
  • throttling - Configuration object for throttling history reads. Options:
      • max-per-minute - The maximum number of history requests per minute.
      • max-parallelism - The maximum number of parallel history requests, if supported by the server.
      • max-node-parallelism - The maximum number of nodes to read in parallel. This can be used to limit the number of continuation points used by the extractor.
  • log-bad-values - The default value is true. Log bad history data points: the count per read is logged at debug level, and each data point at verbose level.
  • error-threshold - The threshold in percent for a history run to be considered failed. For example, if this is set to 10.0, the history read is considered failed if more than 10% of nodes fail to read. Retries still apply; this only counts nodes that fail even after retries. Failing is safe in terms of data loss: a node that has failed during history read won't receive state updates from streaming either.
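For instance, a history section enabling backfill with chunked, throttled reads might look like this sketch (all values are illustrative placeholders):

history:
  enabled: true
  # Read backward and forward through history simultaneously
  backfill: true
  # Read variables with close timestamps in the same request
  granularity: 30s
  # Only read history more recent than one week
  start-time: 7d-ago
  throttling:
    max-per-minute: 100
    max-node-parallelism: 1000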

Dry run

The dry-run option is at the top level of the configuration. If it's set to true, the extractor reads from OPC UA but doesn't push anything to CDF. This is useful for debugging the extractor setup.

Cognite - CDF API

Configuration for pushing directly to the CDF API.

Parameters:

  • project - The CDF project. Required. Can be left out if the OPC UA extractor is set to debug mode.
  • host - The CDF service URL.
  • read-extracted-ranges - Specifies whether to read start/end points from CDF on startup, where possible. At least one pusher should be able to do this; otherwise, the backfill/frontfill will run for the entire history on every restart. The CDF pusher can't read start/end points for events, so if reading historical events is enabled, another pusher able to do this should be enabled as well. If the server has a lot of variables, this can be extremely slow, and we recommend using the state store instead.
  • data-set-id - The internal ID of the CDF data set used for all new time series, assets, and events. Already created items won't be affected.
  • data-set-external-id - The external ID of the data set to use for new objects. Overridden by data-set-id. Requires the capability datasets:read for the given data set.
  • nan-replacement - Replacement value for values that are non-finite, for instance NaN, +Infinity, and -Infinity. If this is left empty, these points are ignored.
  • metadata-targets - Configuration of targets for metadata, meaning assets, time series metadata, and relationships.
  • metadata-targets/clean - Configuration for writing to clean. Options:
      • assets - Set to true to enable writing to CDF assets. The default value is false.
      • timeseries - Set to true to enable writing to CDF time series. The default value is false.
      • relationships - Set to true to enable writing to CDF relationships. The default value is false.
  • metadata-targets/raw - Configuration for writing to CDF RAW. Options:
      • database - The RAW database to write to. Required.
      • assets-table - The name of the RAW table to write assets to. Enables writing objects and types to RAW.
      • timeseries-table - The name of the RAW table to write time series to. Enables writing variables to RAW.
      • relationships-table - The name of the RAW table to write relationships to. Enables writing references to RAW.
  • raw-metadata - Configuration for using CDF RAW to store asset and time series metadata. This is deprecated in favor of cognite.metadata-targets.
  • raw-node-buffer - Read from CDF instead of OPC UA when starting the extractor, to speed up startup on slow servers. This requires extraction.data-types.expand-node-ids and extraction.data-types.append-internal-values to be set to true. Generally, this is enabled along with skip-metadata or raw-metadata. Reading from CDF RAW into clean using this is generally not supported.

    If browse-on-empty is set to true, and raw-metadata is configured with the same database and tables, the extractor will read from the server on the first startup only, then use CDF RAW for all further reads.

    With this enabled, rebrowse/updates are generally pointless. Options:
      • enabled - Set to true to enable this feature.
      • database - The CDF RAW database to read from.
      • assets-table - The CDF RAW table to read assets from, for events.
      • timeseries-table - The CDF RAW table to read time series from, for events and data points.
      • browse-on-empty - Run a normal browse if nothing is found when reading from CDF. Note that even if nodes are present in the CDF RAW tables, browse will still run if none of them are variables and none have a valid EventNotifier.
  • metadata-mapping - Contains two string/string maps named assets and timeseries, letting you define mappings between properties in OPC UA and attributes in CDF. For example, it's quite common for variables in OPC UA to have an EngineeringUnits property, which ideally should be mapped to unit in CDF. This can be done with:

        timeseries:
          EngineeringUnits: unit

    Valid attributes are name, description, and parentId, plus unit for time series. parentId must be the external ID of the parent of the time series, and it must be an asset mapped by the OPC UA extractor. It may be a string ID directly, or a node ID.
  • skip-metadata - If true, assets won't be written to CDF, and only basic time series will be created. This is the same as when raw-metadata is enabled, except that nothing is pushed to CDF RAW either. This is deprecated in favor of cognite.metadata-targets.
  • idp-authentication - Configuration for authentication using a bearer access token. See OAuth 2.0 client credentials flow.

    Required fields are client-id, tenant, secret, and scopes.

    min-ttl is an optional minimum time-to-live in seconds for the token. The default value is 30.

    The authentication flow is inferred from whether you enter a tenant or a token-url; you can only set one of the two. If you set tenant, MSAL is used for authentication. If you set token-url, basic authentication is used.

    authority is the identity provider endpoint. The default is https://login.microsoftonline.com/.
  • cdf-retries - Configure automatic retries on requests to CDF. Fields:
      • timeout - The maximum timeout for each individual try.
      • max-retries - The maximum number of retries. Less than 0 retries forever.
      • max-delay - The maximum delay in milliseconds between each try. The base delay is calculated as 125 * 2^retry ms. If less than 0, there is no maximum (0 would mean no delay).
    If the connection to CDF is very poor, you may need to change this setting. Lowering the maximum number of retries can also lower the time until failure buffering starts, which may be necessary if there is a lot of data.
  • cdf-chunking - Configure chunking of data on requests to CDF. Note that some of these reflect actual limits in the API, and increasing them may cause requests to fail. See https://docs.cognite.com/api/v1/. Fields:
      • time-series - The maximum number of time series per get/create time series request.
      • assets - The maximum number of assets per get/create asset request.
      • data-point-time-series - The maximum number of time series per data point create request.
      • data-points - The maximum number of data points per data point create request.
      • data-point-list - The maximum number of time series per data point read request. Used when getting the first point in a time series.
      • data-point-latest - The maximum number of time series per read latest data point request.
      • raw-rows - The maximum number of rows per request to CDF RAW. Used with the RAW state store and for RAW asset/time series metadata.
      • events - The maximum number of events per get/create events request.
  • cdf-throttling - Configure how requests to CDF should be throttled. Each entry is the maximum allowed number of parallel requests to CDF. Fields: time series, assets, data points, raw, ranges (first/last data point), and events.
  • sdk-logging - Configuration for logging from the .NET SDK. This provides additional debug information about requests, showing in detail which requests fail and how long they take. Fields:
      • disable - Set to true to disable logging from the SDK. The default value is false.
      • level - The log level: one of trace, debug, information, warning, error, critical, or none.
      • format - The format of the log messages.
  • extraction-pipeline - Configure reporting to an extraction pipeline. The pipeline must be created beforehand. Fields:
      • external-id - The external ID of the extraction pipeline in CDF.
      • frequency - The frequency to report Seen, in seconds. Less than or equal to zero won't report automatically.
  • browse-callback - Call a Cognite Function with the number of assets, time series, and relationships created and updated after each browse and rebrowse operation. The function is called with a JSON object containing the following fields:
      • idPrefix - The configured extraction.id-prefix.
      • assetsCreated - The number of new assets, or rows created in the RAW assets table.
      • assetsUpdated - The number of assets updated, or rows modified in the RAW assets table.
      • timeSeriesCreated - The number of new time series, or rows created in the RAW time series table.
      • timeSeriesUpdated - The number of time series updated, or rows modified in the RAW time series table.
      • minimalTimeSeriesCreated - The number of time series created with no metadata. Only used if time series are written to CDF RAW.
      • relationshipsCreated - The number of new relationships, or rows created in the RAW relationships table.
      • rawDatabase - The name of the configured CDF RAW database.
      • assetsTable - The name of the configured CDF RAW table for assets.
      • timeSeriesTable - The name of the configured CDF RAW table for time series.
      • relationshipsTable - The name of the configured CDF RAW table for relationships.
    Minimal time series are time series created with no metadata when time series are written to CDF RAW. This option requires functions:WRITE scoped to the function given by external ID or ID, and functions:READ if external ID is used. It's a YAML object with the following fields:
      • external-id - The function external ID. If this is used, functions:READ is required.
      • id - The function internal ID.
      • report-on-empty - Default false. Set to true to always report, even if nothing was modified in CDF.
  • delete-relationships - If this is set to true, relationships deleted from the source will be hard-deleted in CDF.
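Combining a few of these options, a cognite section writing metadata both to clean and to CDF RAW might look like the following sketch; the project, database, table, and pipeline names are placeholders:

cognite:
  project: '${COGNITE_PROJECT}'
  host: 'https://api.cognitedata.com'
  idp-authentication:
    tenant: ${COGNITE_TENANT_ID}
    client-id: ${COGNITE_CLIENT_ID}
    secret: ${COGNITE_CLIENT_SECRET}
    scopes:
      - ${COGNITE_SCOPE}
  metadata-targets:
    clean:
      assets: true
      timeseries: true
    raw:
      database: 'opcua-metadata'
      assets-table: 'assets'
      timeseries-table: 'timeseries'
  extraction-pipeline:
    external-id: 'opcua-pipeline'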

Influx

Configuration for pushing to an InfluxDB database. Data points and events will be pushed, but no context or metadata.

Parameters:

  • host - The URL of the InfluxDB server.
  • username - The username for connecting to the database.
  • password - The password for connecting to the database.
  • database - The database to connect to on the server. The database won't be created automatically.
  • read-extracted-ranges - Whether to read start/end points on startup, where possible.
  • read-extracted-event-ranges - Whether to read start/end points for events on startup, where possible.
  • point-chunk-size - The maximum number of points per push. Try increasing this if pushing seems slow.
  • non-finite-replacement - Replacement value for values that are non-finite, e.g. NaN, +Infinity, and -Infinity. Leave empty to ignore these points.
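A minimal influx section could look like this sketch; the host, credentials, and database name are placeholders:

influx:
  host: 'http://localhost:8086'
  username: 'extractor'
  password: '${INFLUX_PASSWORD}'
  # The database must already exist
  database: 'opcua'
  point-chunk-size: 100000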

MQTT

The MQTT pusher pushes to CDF one-way over MQTT. It requires that the MQTTCDFBridge application is running somewhere with access to CDF.

Parameters:

  • host - The address of the TCP MQTT broker. The broker needs to be running for the pusher to function.
  • port - The port of the TCP MQTT broker.
  • username - The MQTT broker username. Leave empty to connect without authentication.
  • password - The MQTT broker password. Leave empty to connect without authentication.
  • client-id - The MQTT client ID. This needs to be unique per broker.
  • data-set-id - The internal ID of the CDF data set used for all new time series, assets, and events. Already created items won't be affected.
  • asset-topic - The topic to use for assets. Needs to match the configuration of MQTTCDFBridge (it does by default).
  • ts-topic - The topic to use for time series.
  • event-topic - The topic to use for events.
  • datapoint-topic - The topic to use for data points.
  • raw-topic - The topic to use for RAW rows.
  • local-state - Set to enable storing a list of created assets/time series in a local database. Requires the StateStorage location property to be set. The value of this option is the table name. The default value is empty. Using this with the RAW state store doesn't make sense.
  • invalidate-before - Timestamp in milliseconds since epoch used to invalidate stored states. Any objects created before this timestamp will be replaced the next time the OPC UA extractor is restarted.
  • non-finite-replacement - The replacement value for values that are non-finite, e.g. NaN, +Infinity, and -Infinity, or not between -10^100 and 10^100. If this is left empty, these points are ignored.
  • raw-metadata - Configuration for using CDF RAW to store asset and time series metadata. Options:
      • database - The CDF RAW database to store metadata in. Required for this feature to be enabled.
      • assets-table - The CDF RAW table to store assets in. If this is set along with database, assets aren't pushed to the asset hierarchy, but written to RAW instead. Time series won't be contextualized in this case, but if timeseries-table is set, the asset external ID will be stored there. The assets are pushed as full asset JSON objects with all the data available from extraction.
      • timeseries-table - The CDF RAW table to store time series in. If this is set along with database, time series are pushed with minimal information (isStep, isString, externalId). Everything else is stored in CDF RAW as full time series JSON objects.
  • metadata-mapping - Contains two string/string maps named assets and timeseries, letting you define mappings between properties in OPC UA and attributes in CDF. For example, it's quite common for variables in OPC UA to have an EngineeringUnits property, which ideally should be mapped to unit in CDF. This can be done with:

        timeseries:
          EngineeringUnits: unit

    Valid attributes are name, description, and parentId, plus unit for time series. parentId must be the external ID of the parent of the time series, and it must be an asset mapped by the OPC UA extractor. It may be a string ID directly, or a node ID.
  • skip-metadata - If true, assets won't be written to CDF, and only basic time series will be created. This is the same as when raw-metadata is enabled, except that nothing is pushed to CDF RAW either.
  • allow-untrusted-certificates - If true, allow untrusted certificates when connecting to the MQTT broker. This is a security risk; we recommend using custom-certificate-authority instead.
  • custom-certificate-authority - Path to a certificate file for a custom certificate authority, which the broker SSL certificate will be verified against.
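As a sketch, an mqtt section connecting to a local broker and leaving the topics at their defaults might look like this; the host, port, client ID, and data set ID are placeholders:

mqtt:
  host: 'localhost'
  port: 1883
  # Must be unique per broker
  client-id: 'opcua-extractor-1'
  data-set-id: 123456789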

Logger

Log entries have a level of Fatal, Error, Warning, Information, Debug, or Verbose, in order of decreasing importance. Each level includes all levels of higher importance.

Parameters:

  • console/level - The level of messages to write to the console. If not present or invalid, logging to the console is disabled. One of fatal, error, warning, information, debug, or verbose.
  • file/level - The level of messages to write to file. If not present or invalid, logging to file is disabled. One of fatal, error, warning, information, debug, or verbose.
  • file/path - The path to the log file. Logs are rotated.
  • file/retention-limit - The maximum number of log files to keep in the log folder. The oldest are deleted.
  • file/rolling-interval - The rolling interval for log files: either day or hour. The default value is day.
  • ua-trace-level - Capture OPC UA tracing at this level or above. One of fatal, error, warning, information, debug, or verbose. This parameter is optional.
  • ua-session-tracing - Log data sent to and received from the OPC UA server.
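For example, a logger section writing information-level messages to the console and debug-level messages to daily rotated files could look like this sketch; the path and retention limit are placeholders:

logger:
  console:
    level: information
  file:
    level: debug
    path: 'logs/log.txt'
    retention-limit: 31
    rolling-interval: day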

StateStorage

A local LiteDB database or a table in CDF RAW that stores various persistent information between runs. This can replace reading first/last data points from CDF, and also allows storing first/last times for events.

Parameters:

  • location - The path to the .db file used for storage, or the name of the CDF RAW database.
  • interval - The interval between state store updates. Use the syntax described in Timestamps and intervals. The default value is 10s.
  • database - The type of database to use. Valid options are None, Raw, and LiteDb.
  • variable-store - The name of the table or LiteDB collection storing information about extracted OPC UA variables.
  • event-store - The name of the table or LiteDB collection storing information about extracted events.
  • influx-variable-store - The name of the table or LiteDB collection storing information about variable ranges in the InfluxDB failure buffer.
  • influx-event-store - The name of the table or LiteDB collection storing information about event ranges in the InfluxDB failure buffer.
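A sketch of a state store using a local LiteDB file; the top-level key is assumed to be state-storage, and the file path and store names are placeholders:

state-storage:
  location: 'buffer/state.db'
  database: LiteDb
  interval: 10s
  variable-store: 'variable_states'
  event-store: 'event_states'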

FailureBuffer

If the connection to a destination goes down, the OPC UA extractor supports buffering data points and events in InfluxDB or a local file. This is helpful if the connection is unstable.

Parameters:

  • datapoint-path - The path to the binary file where data points are buffered. Leave empty to disable buffering data points to file. Buffering to file is very fast, and generally hardware bound.
  • enabled - Set to true to enable the failure buffer for all pushers.
  • event-path - The path to the binary file where events are buffered. Leave empty to disable buffering events to file.
  • influx - Set to true to enable buffering in InfluxDB. This requires InfluxDB to be running. This serves as an alternative to a local file, but should only be used if pushing to InfluxDB is required for other reasons.
  • influx-state-store - Set to true to store the state of the InfluxDB buffer in a local database. This makes the InfluxDB buffer persistent even if the OPC UA extractor stops before it's emptied. Requires the StateStorage location option to be set.
  • max-buffer-size - The maximum size in bytes of each buffer file. If a file exceeds this size, no new data points or events are written to it, and any further ephemeral data is lost. Note that if both the data point and event buffers are enabled, the potential disk usage is twice this number.
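A failure-buffer section buffering both data points and events to local files might look like this sketch; the paths and size limit are placeholders:

failure-buffer:
  enabled: true
  datapoint-path: 'buffer/datapoints.bin'
  event-path: 'buffer/events.bin'
  # Stop writing once a buffer file reaches roughly 50 MB
  max-buffer-size: 50000000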

Metrics

The OPC UA extractor can push some metrics about usage to a Prometheus pushgateway server.

Parameters:

  • server/host - The hostname for a locally hosted Prometheus server, used for scraping.
  • server/port - The port used for the locally hosted Prometheus server.
  • push-gateways - A list of pushgateway configurations. The OPC UA extractor will periodically push to each of these in turn.
  • push-gateways/host - The pushgateway URL root. For example, host my.prometheus.server and job myjob gives the final endpoint my.prometheus.server/metrics/jobs/myjob.
  • push-gateways/job - The job to use in the destination.
  • push-gateways/username - The username for the Prometheus target.
  • push-gateways/password - The password for the Prometheus target.
  • nodes - Use to treat certain OPC UA nodes as metrics. Options:
      • server-metrics - If true, a few relevant diagnostics from ServerDiagnosticsSummary are mapped to metrics.
      • other-metrics - A list of ProtoNodeIds describing nodes that should be treated as metrics.
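As a sketch, a metrics section exposing a local scrape endpoint, pushing to a single pushgateway, and mapping one node to a metric might look like the following; the hosts, job name, credentials, and node ID are placeholders:

metrics:
  server:
    host: 'localhost'
    port: 9000
  push-gateways:
    - host: 'https://my.prometheus.server'
      job: 'opcua-extractor'
      username: 'user'
      password: '${PROMETHEUS_PASSWORD}'
  nodes:
    server-metrics: true
    other-metrics:
      - node-id: 'i=2256'
        namespace-uri: 'http://opcfoundation.org/UA/'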

Extraction

Contains configuration settings for most extraction options, such as mapping, datatypes, and filters.

External ID generation

IDs used in OPC UA are special NodeId objects, consisting of an identifier and a namespace, which need to be converted to strings for destination systems. However, a direct conversion has several problems:

  • It would use the namespace index, which isn't necessarily preserved between server restarts.
  • The namespace table may be modified, in which case all old NodeIds are invalidated.
  • NodeIds aren't unique between OPC UA servers, and frequently just count up from 1, which makes reading from multiple OPC UA servers impossible.
  • Node identifiers can be duplicated across different namespaces.

The solution is an external ID on the following form:

  [id-prefix][namespace][identifier type (i, s, g, etc.)]=[identifier value as string]
  (+ [index in array, if applicable])

For example, the node with NodeId (SomeId, http://my.namespace.url), using the ID prefix gp:, is mapped to gp:http://my.namespace.url:s=SomeId. You can specify a namespace mapping in extraction/namespace-map to, for example, turn this into gp:mnu:s=SomeId.

If the variable is an array, it's turned into an object with the above ID, plus one time series per entry, with IDs like gp:mnu:s=SomeId[1].

Alternatively, you can manually override the external ID of each node using node-map.

Parameters:

  • id-prefix - The prefix used for generated external IDs.
  • ignore-name-prefix - DEPRECATED, use transformations instead. A list of strings used to filter out nodes by prefix on DisplayName during browsing. Children of filtered nodes are also filtered out.
  • ignore-name - DEPRECATED, use transformations instead. A list of full DisplayNames to ignore, instead of just prefixes.
  • data-push-delay - The time between each push to destinations, in milliseconds. The syntax is described in Timestamps and intervals.
  • root-node - A single ProtoNodeId (as described above) used as the origin of the browse. An empty ProtoNodeId (no identifier or no namespace) is treated as the Objects folder. Combined with root-nodes, if specified. If neither root-node nor root-nodes is specified, this defaults to the Objects folder.
  • root-nodes - A list of ProtoNodeIds to use as root nodes when browsing. These will generally be created as root assets in CDF. If a node set as a root node is discovered as a descendant of another root node, it's ignored, but it's best to avoid this situation entirely.
  • node-map - A map from strings representing external IDs to ProtoNodeIds. This can be used to override external IDs, for example to place the hierarchy as children of an asset in CDF.

    For example, if UaRoot is mapped to the same value as the root node, all the nodes in the tree will be placed as children of the node with external ID UaRoot.
  • namespace-map - Used as described above to map namespaces to shortened identifiers.
  • data-types - Sub-object containing configuration for how the OPC UA extractor handles data types and arrays.
  • data-types/custom-numeric-types - Used to manually declare types in OPC UA as numeric, so that custom types can be treated as numbers, etc. The conversion is done with the C# Convert functionality. If no valid conversion exists, it will fail.
  • data-types/ignore-data-types - A list of ProtoNodeIds (as described above) describing data types; variables with these data types are filtered out.
  • data-types/unknown-as-scalar - Assume variables with non-specific ValueRanks in OPC UA (ScalarOrOneDimension and Any) are scalar if they don't have ArrayDimensions set. If such a variable produces an array, only the first element is mapped to CDF. To properly extract arrays to CDF, ArrayDimensions must be set.
  • data-types/max-array-size - The maximum length of arrays mapped to destinations. If this is set to 0, only scalar values are mapped. Each array variable in the source system is converted to an object in the destination system, and each entry in the array is added as a child variable of that object. (In CDF, this means you get an asset with the external ID corresponding to the original variable, with a time series for each entry in the array.)

    This requires the ArrayDimensions property to be set and have length 1.
  • data-types/allow-string-variables - Set to true to map variables of non-numeric types to strings in destination systems.
  • data-types/auto-identify-types - Map out the data type hierarchy before starting. This is useful if there are custom or enum types, and it's necessary for enum metadata and enums-as-strings to work. If set to false, any custom numeric types must be added manually.

    This causes some extra work on startup.
  • data-types/enums-as-strings - If set to false while auto-identify-types is true, or there are manually added enums in custom-numeric-types, enums are mapped to numeric time series, and labels are added as metadata fields. If set to true, labels aren't mapped to metadata, and enums are mapped to string time series with values equal to the mapped label values.
  • data-types/data-type-metadata - Add a metadata property dataType containing the name or ID of the OPC UA data type. Built-in types can always be mapped to names; custom types require auto-identify-types to be set to true.
  • data-types/null-as-numeric - Treat null data types as numeric. This can be useful on servers without string variables that report faulty data types.
  • data-types/expand-node-ids - Add attributes such as NodeId, ParentNodeId, and TypeDefinitionId to nodes in CDF RAW, as full NodeIds encoded reversibly.
  • data-types/append-internal-values - Add internal attributes like ValueRank, ArrayDimensions, AccessLevel, and Historizing to nodes in CDF RAW.
  • data-types/estimate-array-sizes - If max-array-size is set, look for the MaxArraySize property on each node with a one-dimensional ValueRank. If this isn't found, the extractor also tries to read the value and look at its current size. ArrayDimensions is still the preferred way to identify array sizes; this option isn't guaranteed to generate reasonable or useful values.
  • auto-rebrowse-period - The time between each automatic rebrowse of the node hierarchy. Since only new nodes are pushed to destinations, this is usually quite fast. The syntax is described in Timestamps and intervals; this option accepts cron expressions.
  • enable-audit-discovery - The OPC UA extractor listens to AuditAddNodes and AuditAddReferences events on the server node and uses the information in these to browse the hierarchy. This is more efficient than browsing periodically, but requires server support for auditing.
  • map-variable-children - By default, children of variables are treated as properties. If this is set to true, they can be treated as objects or variables instead. This causes some variables to be mapped to both time series and assets, to allow time series to have time series children.
  • update - Update data in destinations on rebrowse or restart. Set auto-rebrowse-period to some value to do this periodically. Consists of two objects, objects and variables, controlling updates of assets and time series, respectively. For each, name, description, context, and metadata can be configured separately.

    context refers to the structure of the node graph in OPC UA (assetId and parentId in CDF). metadata refers to any information obtained from OPC UA properties (metadata in CDF).

    Enabling any of these will increase the startup and rebrowse time of the OPC UA extractor. Enabling metadata increases it the most.
  • relationships - Map OPC UA non-hierarchical references to relationships in CDF. The generated relationships have external IDs on the form [prefix][reference type name (or inverse name)];[namespace source][id source];[namespace target][id target].

    Only relationships between mapped nodes are added. This may be relevant if the server contains functional relationships, like connected components, a non-hierarchical reference-based system for location, etc. Options:
      • enabled - Enable mapping non-hierarchical references to relationships in CDF. This is required for any kind of relationship mapping to occur at all.
      • hierarchical - Map hierarchical references to relationships in CDF.
      • inverse-hierarchical - Create inverse relationships for each hierarchical reference. For efficiency, these are inferred, not read.
  • node-types - Configuration related to mapping object and variable types to destinations. Options:
      • metadata - Add the TypeDefinition as a metadata field to all nodes.
      • as-nodes - Allow discovered types to be treated as nodes and mapped to CDF assets. This requires the types to be inside the browsed hierarchy; a solution may be to specify the Types folder as an additional root node.
  • transformations - A list of transformations applied to the source nodes before pushing; see the sketch after this list for an example. The possible transformations are:
      • Ignore - Ignore the node. This also ignores all descendants of the node. If the filter doesn't use is-array, description, or parent, this is done while reading, so children won't be read at all. Otherwise, the filtering happens later.
      • Property - Turn the node into a property, which is treated as metadata. This also applies to descendants. Nested metadata is given a name like grandparent_parent_variable, for each variable in the tree. There is some overhead associated with these filters.
      • DropSubscriptions - Don't subscribe to this node for events or data points.
      • TimeSeries - Make the variable not a property, so that it's treated as a time series instead. This requires its parents to be non-properties as well.
    Transformations are applied sequentially, so it can help performance to put Ignore filters first, and TimeSeries transformations can undo earlier Property transformations.

    It's possible to have multiple transformations of each type. Each transformation consists of a type field and a filter field. The type is one of the transformations listed above, and the filter has the following fields:
      • name - Regex filter on the node DisplayName.
      • description - Regex filter on the node Description.
      • id - Regex filter on the string representation of the node ID, on the form i=123, s=string, etc.
      • is-array - true/false on whether the node is an array. If this is set, the filter only matches variables.
      • namespace - Regex filter on the full namespace URI of the node ID.
      • type-definition - Regex filter on the string representation of the TypeDefinition NodeId, on the form i=123, s=string, etc.
      • node-class - Filter on the NodeClass of the node: one of Object, Variable, ObjectType, or VariableType.
      • historizing - true/false on the Historizing attribute of variables. If this is set, the filter only matches variables.
      • parent - Another instance of this filter, applied to the parent node if it exists. For nodes without a registered parent, this never matches.
  • rebrowse-triggers - Configure the extractor to trigger a rebrowse of the server when specific namespace metadata nodes change. Options:
      • targets - Which node types to listen for changes on. Currently, the only valid option is namespace-publication-date: true.
      • namespaces - A list of namespace URIs used to filter the selected nodes. Leave empty to use all namespaces.
  • deletes - Configuration for soft deletes. When this is enabled, all read nodes are written to a state store after browsing. Nodes that are missing on subsequent browses are marked as deleted in CDF with a configurable marker. A notable exception is relationships in CDF, which have no metadata; these are hard-deleted if cognite.delete-relationships is enabled. Options:
      • enabled - Enable deletes. This requires a state store to be configured.
      • delete-marker - The name of the marker indicating that a node is deleted. Added to metadata, or as a column in RAW. The default value is deleted.
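As referenced under transformations above, here is a hedged sketch of a transformation list; the regexes and type definition ID are placeholders:

extraction:
  transformations:
    # Ignore a diagnostics folder and all its descendants
    - type: Ignore
      filter:
        name: '^Diagnostics'
    # Treat matching scalar variables as properties (metadata)
    - type: Property
      filter:
        type-definition: 'i=68'
        is-array: false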

Subscriptions

This section contains options for subscriptions to events and data points. Subscriptions in OPC UA consist of Subscription objects on the server, which contain a list of MonitoredItems. By default, the extractor produces a maximum of four subscriptions:

  • DataChangeListener - handles data point subscriptions.
  • EventListener - handles event subscriptions.
  • AuditListener - handles audit events.
  • NodeMetrics - handles subscriptions for nodes used as metrics.

Each of these can contain a number of MonitoredItems.

Parameters:

  • data-points - The default value is true. Enables subscriptions on data points.
  • events - The default value is true. Enables subscriptions for events.
  • data-change-filter - Modify the DataChangeFilter used for data point subscriptions. See the OPC UA reference, part 4, section 7.17.2 for details. These are passed to the server in the DataChangeListener. Options:
      • trigger - One of Status, StatusValue, or StatusValueTimestamp. The default value is StatusValue.
      • deadband-type - One of None, Absolute, or Percent. The default value is None.
      • deadband-value - Default 0. Its meaning depends on deadband-type.
  • ignore-access-level - Ignore the AccessLevel attribute and subscribe to all variables, reading history from all nodes with Historizing set to true. This is the pre-2.3 behavior.
  • log-bad-values - Log bad subscription data points.
  • sampling-interval - Sets the sampling rate requested from the server, in milliseconds. The server usually defines a set of permitted sampling rates and picks the one closest to what you specify here. Many servers don't support more than a single sampling rate. Set the interval to 0 to use the server default.

    This setting generally sets the maximum rate of points from the server. On many servers, sampling is an internal operation, but on some, it may access external systems. Setting this very low can increase the load on the server significantly. It typically limits the density of points from the server, but not always.
  • queue-length - Specifies the length of the internal server queue for points and events. Normally, this can be set to the same as publishing-interval/sampling-interval. Higher numbers increase the strain on the server. Many servers have a limited maximum queue size, or ignore this parameter entirely and use a fixed size for everything.
  • keep-alive-count - The number of publish requests without a response before the server sends a keep-alive message. The default value is 10.
  • lifetime-count - The number of publish requests without a response before the server closes the subscription. Must be at least 3 * keep-alive-count. The default value is 1000.
  • alternative-configs - A list of alternative subscription configurations. The first entry with a matching filter is used for each node. Each entry contains data-change-filter, sampling-interval, and queue-length, as well as filter, which has the following fields:
      • id - Regex on the node external ID.
      • data-type - Regex on the node data type, if it's a variable.
      • is-event-state - Match on whether this subscription is for data points or events.
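A subscriptions section tuning the data change filter and adding an alternative configuration for a group of slow nodes might look like this sketch; the values and the external ID regex are placeholders:

subscriptions:
  sampling-interval: 100
  queue-length: 10
  data-change-filter:
    trigger: StatusValue
    deadband-type: Absolute
    deadband-value: 0.1
  alternative-configs:
    # Sample matching nodes once every ten seconds
    - filter:
        id: '^gp:slow:'
      sampling-interval: 10000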

Events

Events in OPC UA are usually custom to each server, and servers that support events often have a large number of active event types. In OPC UA, any node may have the EventNotifier attribute set, which indicates that it emits events and, optionally, stores historical events.

By default, all events are read. If all-events is set to false, only events that don't belong to the base namespace are read.

The properties of each event are automatically mapped out, and a few general properties are filtered out. Others may be used as metadata in CDF or other destination systems, or in some cases mapped directly to event properties.

If the event has a SourceNode that refers to a node in the mapped hierarchy, the source node is used to set the assetId property of the event in CDF.

The old options event-ids, emitter-ids, and historizing-emitter-ids are deprecated, but still work and may be used as a workaround for servers that aren't fully compliant with the OPC UA standard.

Parameters:

  • enabled - Set to true to enable reading events from the server. If this is false, no events are read.
  • history - Set to true to enable reading historical events.
  • all-events - Set to true to read all events, not just custom events. The default value is true.
  • read-server - Set to true to also check the server node when looking for event emitters. The default value is true.
  • exclude-event-filter - Regex filter on event type DisplayName; matching event types won't be extracted.
  • exclude-properties - A list of BrowseNames of event properties to exclude from metadata and other consideration. By default, only Time and Severity are used from the BaseEventType; all properties of subtypes are included.
  • destination-name-map - Map source browse names to other values in the destination. For CDF, internal properties may be overwritten: by default, Message is mapped to description, SourceNode is used for context, and EventType is used for type. These may also be excluded or replaced by overrides in destination-name-map. If multiple properties are mapped to the same value, the first non-null one is used.

    If StartTime, EndTime, or SubType are specified, either directly or through the map, they're used as event properties instead of metadata. StartTime and EndTime should be either DateTime, or a number corresponding to the number of milliseconds since January 1, 1970. If no StartTime or EndTime is specified, both are set to the Time property of BaseEventType. Type may be overridden on a case-by-case basis using node-map in the extraction configuration, or dynamically here. If no Type is specified, it's generated from the event NodeId in the same way external IDs are generated for normal nodes.
  • event-ids (deprecated) - A list of ProtoNodeIds (as described above) for event types to be mapped to destinations. Event types must be ObjectTypes and subtypes of BaseEventType in the OPC UA hierarchy. An empty ProtoNodeId defaults to the BaseEventType. This serves as an allowlist; if not specified, all events are extracted.
  • emitter-ids (deprecated) - A list of ProtoNodeIds used as emitters. An empty ProtoNodeId defaults to the server node. This allows specifying additional event emitters, used to add emitters that aren't in the extracted node hierarchy, or that don't correctly specify the EventNotifier attribute.
  • historizing-emitter-ids (deprecated) - A list of ProtoNodeIds that must be a subset of emitter-ids. These emitters will have their event history read. The server must support this, and the events.history option must be set for it to work. This supplements the EventNotifier attribute, so that emitters without EventNotifier set may still have their events read. Note that attempting to read historical events from non-historizing emitters may cause issues.
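For example, an events section reading historical custom events, skipping certain event types, and mapping a property to the event description could look like this sketch; the regex and the Comment property are placeholders:

events:
  enabled: true
  history: true
  # Skip event types with DisplayName starting with "Internal"
  exclude-event-filter: '^Internal'
  destination-name-map:
    # Use the "Comment" property as the CDF event description
    Comment: 'description'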

Pub-Sub

This is an experimental feature that lets the extractor subscribe to OPC UA PubSub instead of using OPC UA subscriptions, for data points only. It requires the OPC UA server to be available and to expose its full PubSub configuration, as described in part 14 of the OPC UA standard. Currently, only MQTT is supported.

Note that this doesn't disable subscriptions; you may want to set subscriptions.data-points to false to avoid receiving data points twice.

Time series aren't created from the OPC UA PubSub configuration; they must be discovered in the OPC UA node hierarchy.

Parameters:

  • enabled - The default value is false. Enables PubSub discovery.
  • prefer-uadp - The default value is true. If the same dataset is exposed through multiple DataSetWriters, the extractor prefers one using UADP encoding.
  • file-name - Save or read the PubSub configuration from a file. If the file doesn't exist, it's created from the server configuration. If the file is created manually beforehand, the server doesn't need to expose its PubSub configuration.
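A minimal pub-sub section, also disabling data point subscriptions to avoid duplicate points, might look like this sketch; the file name is a placeholder:

pub-sub:
  enabled: true
  # Cache the discovered PubSub configuration locally
  file-name: 'config/pubsub-config.xml'
subscriptions:
  # Avoid receiving the same data points twice
  data-points: false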

High availability

The extractor can run with a rudimentary form of redundancy. Multiple extractors on different machines are on standby, with one actively extracting from the OPC UA server. Each extractor must have a unique index.

Parameters:

  • index - A unique index for this extractor. Indexes must be unique across extractors, or high availability won't work correctly.
  • raw - Use CDF RAW (staging) as a shared store for extractor state. This configuration must be the same for each redundant extractor. Options:
      • database-name - The name of the database in CDF.
      • table-name - The name of the table in CDF.
  • redis - Use a Redis store as shared state for the extractor. This configuration must be the same for each redundant extractor. Options:
      • connection-string - The Redis connection string.
      • table-name - The name of the Redis table to use.
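As a sketch, two redundant extractors sharing state through CDF RAW would each use a high-availability section like the following, differing only in index; the database and table names are placeholders:

high-availability:
  # Must be unique for each extractor instance
  index: 0
  raw:
    database-name: 'extractor-ha'
    table-name: 'state'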