Set up on-premises data integration

This guide contains best practices for setting up on-premises data integration services for Cognite Data Fusion (CDF).

In CDF, there are two options for integrating with external systems: extractors for one-way data transfer into CDF, and connectors for two-way communication between external systems and CDF.

User and service configuration

Most data integration services deployed on Windows run as Windows services, which provides reliable and stable operation.

Ideally, each deployment has a dedicated Windows user with only the permissions it needs. The service user must be granted:

Read access to the service executable
Write access to the working directory

Configure access in Windows:

Right-click the file or folder you want to configure.
Select Properties.
Select Security.
Edit the groups to add the service user.
Select the permissions for the user.

On Linux, run man chmod for instructions on how to change permissions on specific files and folders.
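On Linux, the same two grants can be expressed as file modes. The following is a minimal sketch, assuming the dedicated service user already owns both paths. The function name and paths are hypothetical, and the execute bit on the binary is an assumption added here because Linux needs it to start a program.

```python
import os
import stat


def restrict_service_paths(executable: str, working_dir: str) -> None:
    """Apply owner-only modes, assuming the service user owns both paths."""
    # Owner may read and execute the binary; the write bit is cleared
    # for everyone (execute is an assumption, see the note above).
    os.chmod(executable, stat.S_IRUSR | stat.S_IXUSR)
    # Owner may read, write, and traverse the working directory;
    # group and others get no access.
    os.chmod(working_dir, stat.S_IRWXU)
```

For example, restrict_service_paths("/opt/my-extractor/extractor", "/var/lib/my-extractor") would leave the binary at mode 500 and the working directory at mode 700. Both paths are placeholders.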

Configuration files and secrets

Configuration files should generally be kept free of secrets. Use environment variable substitution or Azure Key Vault instead. This isn't more secure in itself, but it makes it harder to accidentally leak secrets when sharing configuration files.

Environment variables should be assigned to the service user. Do not store secrets in system environment variables.
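To illustrate what environment variable substitution does, here is a minimal sketch of the idea. The ${NAME} placeholder syntax, the function name, and the variable name in the usage note are assumptions for this example; check the documentation of the specific service for the syntax it actually supports.

```python
import os
import re

# Hypothetical placeholder syntax for this sketch: ${NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text, env=None):
    """Replace ${NAME} placeholders with values from the environment."""
    env = os.environ if env is None else env

    def lookup(match):
        name = match.group(1)
        if name not in env:
            # Fail loudly rather than start a service with a missing secret.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(lookup, text)
```

With this approach, a shared configuration file only contains a line such as client-secret: ${MY_CLIENT_SECRET}, and the value is resolved from the service user's environment when the service starts.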

If you're using Azure Key Vault, the client credentials for the user should only have access to the secrets necessary for that service. In general, using Azure Key Vault is less secure than storing secrets on-premises. Anyone with access to the on-premises machine will also have access to the Azure Key Vault secrets, but now the secrets are also accessible from other locations.

The primary advantage of Azure Key Vault is that it allows you to rotate secrets easily.

Network configuration

Firewall configuration

Cognite's data integration services don't require any open ports, unless noted otherwise in the setup documentation. If you do open ports, for example, for Prometheus metric scraping, make sure that only the specified service is accessible on that port. Set Program in the firewall rule to the service executable. If the external application accessing that port has a static IP address, you can also set the Remote address option in the firewall rule.

Transport Layer Security

The data integration services use Transport Layer Security (TLS) when connecting to CDF. Since traffic to CDF always goes over the internet, proper end-to-end encryption is essential. TLS is required and can't be disabled.

When connecting to CDF, the service verifies that a valid certificate authority issued the CDF certificate. This is to ensure you're connecting to CDF and not to something pretending to be CDF.
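Python's standard ssl module can illustrate this kind of client-side check. This is a general sketch of certificate validation, not a description of how the Cognite services implement it; the function name is hypothetical.

```python
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that trusts the system's certificate authorities."""
    # create_default_context() loads the system CA certificates and
    # enables both certificate and hostname verification.
    context = ssl.create_default_context()
    # Stated explicitly here for clarity; these are already the defaults.
    context.check_hostname = True
    context.verify_mode = ssl.CERT_REQUIRED
    return context
```

A connection made with such a context fails during the TLS handshake if the server's certificate wasn't issued by a trusted authority or doesn't match the requested host name.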

Certificate validation may fail for various reasons, each with separate solutions:

There's a proxy or firewall between the systems with an unsigned or invalid certificate. Add the certificate for the proxy to the certificate store. In Windows, open the certificate and select Install Certificate.
The machine lacks up-to-date certificates and is likely far behind on Windows updates. This is a severe security vulnerability. Always keep your Windows updates current.

Source system security

Security varies between different integration services when you're connecting CDF to source systems. Some systems use TLS, in which case the recommendations above apply. Other services have more specialized security features for the specific source. Read the service documentation to determine if any action is necessary to ensure a secure connection.
