General security guidelines for extractors

This guide contains best practices for setting up on-premise extractors. The focus is on Windows, but most of the guidelines apply to Linux as well.

User and service configuration

Most extractors deployed on Windows run as Windows services. Running an extractor as a service is generally good practice because it is reliable and stable.

Each extractor deployment should ideally have a dedicated Windows user, with no permissions beyond what is absolutely necessary. The extractor user must then be granted:

  • Read access to the extractor executable.
  • Write access to the working directory of the extractor.

To configure access in Windows:

  1. Right-click the file or folder you want to configure.
  2. Select Properties.
  3. Select the Security tab.
  4. Select Edit and add the extractor user to the list of groups and users.
  5. Select the desired permissions for the user.

On Linux, run man chmod for instructions on how to change permissions on specific files and folders.
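As an illustration of the Linux case, the following is a minimal Python sketch that applies owner-only permissions to an executable and a working directory. It assumes both are already owned by the dedicated extractor user, and it adds execute permission to the executable because Linux needs it to run a binary. The function names and the temporary paths are hypothetical and only serve the example.

```python
import os
import stat
import tempfile
from pathlib import Path


def restrict_to_owner(executable: Path, workdir: Path) -> None:
    """Limit access to the owning user (assumed to be the extractor user)."""
    # Owner may read and execute the binary; nobody may modify it.
    os.chmod(executable, stat.S_IRUSR | stat.S_IXUSR)  # equivalent to chmod 500
    # Owner may read, write, and traverse the working directory.
    os.chmod(workdir, stat.S_IRWXU)  # equivalent to chmod 700


def mode_of(path: Path) -> str:
    """Return the permission bits of a path as an octal string."""
    return oct(stat.S_IMODE(path.stat().st_mode))


if __name__ == "__main__":
    # Demonstrate on throwaway paths instead of a real installation.
    with tempfile.TemporaryDirectory() as tmp:
        exe = Path(tmp) / "extractor"  # hypothetical executable
        exe.touch()
        work = Path(tmp) / "work"  # hypothetical working directory
        work.mkdir()
        restrict_to_owner(exe, work)
        print(mode_of(exe), mode_of(work))
```

Group and other users get no access at all in this sketch. Loosen the modes only if another account genuinely needs to read the files.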

Configuration files and secrets

Configuration files should generally be kept free of secrets. Use environment variable substitution or Azure Key Vault instead. Note that this is not more secure in itself, but it makes it harder to accidentally leak secrets when sharing configuration files.

Environment variables should be assigned to the extractor user. Do not store secrets in system environment variables.
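To show why substitution keeps secrets out of shared files, here is a minimal Python sketch of the idea. It assumes a ${NAME} placeholder syntax and a hypothetical variable called EXTRACTOR_CLIENT_SECRET. Both are illustrative assumptions, so check the documentation of your extractor for the syntax it actually supports.

```python
import os
import re
from typing import Mapping, Optional

# Assumed placeholder syntax for this sketch: ${VARIABLE_NAME}
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(text: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in text with the value of NAME from env."""
    values = os.environ if env is None else env

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            # Fail loudly rather than silently writing an empty secret.
            raise KeyError(f"environment variable {name} is not set")
        return values[name]

    return _PLACEHOLDER.sub(replace, text)


if __name__ == "__main__":
    # The configuration text holds only a placeholder, never the secret.
    config_text = "client-secret: ${EXTRACTOR_CLIENT_SECRET}"
    fake_env = {"EXTRACTOR_CLIENT_SECRET": "example-value"}
    print(substitute_env(config_text, fake_env))
```

The configuration file can be shared or committed as-is, because the secret value only exists in the environment of the extractor user.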

When using Azure Key Vault, the client credentials for the user should only have access to the secrets that extractor needs. Note that, in general, using Azure Key Vault is less secure than storing secrets on-premise: anyone with access to the on-premise machine also has access to the Azure Key Vault secrets, and the secrets are now accessible from elsewhere as well.

The primary advantage of Azure Key Vault is that it allows you to rotate secrets easily.

Network configuration

Firewall configuration

Our productized extractors, unless noted otherwise in setup documentation, do not require any open ports. If you do open ports, for example for Prometheus metric scraping, make sure that only the specific extractor is accessible on that port, by setting the Program option in the firewall rule to the extractor executable. If the external application accessing that port has a static IP address, you can also set the Remote address option in the firewall rule.

Transport Layer Security

The extractors use Transport Layer Security (TLS) when connecting to Cognite Data Fusion (CDF). Because traffic to CDF always goes over the internet, proper end-to-end encryption is essential. TLS is required and cannot be disabled.

When connecting to CDF, the extractor will verify that the CDF certificate was issued by a valid certificate authority. This is necessary in order to know that you are actually connecting to CDF, and not something pretending to be CDF.
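The same kind of check can be illustrated with the Python standard library. This is a sketch of certificate validation in general, not the extractors' own implementation, and the helper names are hypothetical. The default context from the ssl module requires a certificate that chains to a trusted certificate authority and that matches the host name.

```python
import socket
import ssl


def strict_tls_context() -> ssl.SSLContext:
    """Build a client context that verifies the server certificate."""
    # The default context sets verify_mode to CERT_REQUIRED and enables
    # host name checking, using the certificate store of the machine.
    ctx = ssl.create_default_context()
    # Refuse protocol versions older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def peer_certificate_subject(host: str, port: int = 443) -> dict:
    """Connect to host and return the subject of its verified certificate.

    Raises ssl.SSLCertVerificationError if validation fails.
    """
    ctx = strict_tls_context()
    with socket.create_connection((host, port), timeout=10) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            # Each subject entry is a tuple holding one (key, value) pair.
            return dict(entry[0] for entry in tls.getpeercert()["subject"])
```

Calling peer_certificate_subject with the host name of your CDF cluster from the extractor machine can help confirm whether that machine trusts the certificate it is presented with.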

Certificate validation sometimes fails. There are a few possible causes, each with its own fix:

  • A proxy or firewall between the extractor and CDF presents an unsigned or invalid certificate. In this case, add the certificate for the proxy to the certificate store. In Windows, you can do this by opening the certificate and selecting Install Certificate.
  • The extractor machine lacks up-to-date certificates. In this case, the machine is likely far behind on Windows updates, which is a severe security vulnerability. Always keep Windows up to date.

Source system security

Security when connecting to source systems varies a great deal between extractors. Some extractors use TLS, in which case the recommendations above hold. Other extractors have more specialized security features for the specific source. Read the extractor documentation carefully to determine whether any action is necessary to ensure a secure connection.