Reliability
We've designed Cognite Data Fusion (CDF) to remain available to end users during incidents and to recover quickly from failures.
High availability
We have designed each CDF component for high availability to avoid single points of failure and to reduce the impact of infrastructure maintenance. The goal is to contain and resolve incidents quickly and automatically, so that CDF continues to process requests even while an incident is ongoing.
We use native cloud platform features as much as possible to ensure high availability and resilience. CDF runs on elastic cloud infrastructure that scales up and down with the system load: it autoscales both compute capacity (throughput) and storage capacity (volume).
For deployments, we use Kubernetes' native self-healing and scaling functionality to perform rolling updates with no downtime.
On-premises components
If the data is critical for production solutions, or if there is a risk of data loss, on-premises components such as extractors should run in a high-availability hosting environment. To configure extractors for high availability, we recommend installing them in mutually redundant environments and setting up the necessary failover (see the sketch after this list). For example, if you run extractors in three different environments, you can:
- Configure each extractor with an active instance in its primary environment and a passive instance in one of the other environments. Allow failover if one of the environments becomes unavailable.
- Configure each extractor with an active instance in its primary environment and a passive instance in the other environments. Allow failover if two of the environments become unavailable.
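A common way to implement this active/passive pattern is a shared lease (a distributed lock with a time-to-live): only the instance that currently holds the lease extracts data, and a passive instance takes over when the lease expires. The Python sketch below is a minimal illustration of that idea; `LeaseStore`, `run_extractor`, and `extract_and_push` are hypothetical names, not part of any Cognite SDK, and a real deployment would back the lease with storage that all environments can reach.

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class Lease:
    holder: str         # instance currently allowed to extract
    expires_at: float   # monotonic deadline for renewal

class LeaseStore:
    """In-memory stand-in for a shared lease store (illustration only).

    In production, the lease must live in storage reachable from every
    redundant environment, e.g. a database row updated atomically.
    """

    def __init__(self) -> None:
        self._lease: Optional[Lease] = None

    def try_acquire(self, instance_id: str, ttl: float) -> bool:
        now = time.monotonic()
        # Acquire if there is no lease, the lease has expired,
        # or this instance already holds it (renewal).
        if (
            self._lease is None
            or self._lease.expires_at < now
            or self._lease.holder == instance_id
        ):
            self._lease = Lease(holder=instance_id, expires_at=now + ttl)
            return True
        return False

def extract_and_push() -> None:
    """Placeholder for the real work: read from the source, push to CDF."""
    ...

def run_extractor(instance_id: str, store: LeaseStore, ttl: float = 30.0) -> None:
    """Extract only while holding the lease; otherwise stand by.

    If the active instance's environment goes down, it stops renewing,
    the lease expires, and a passive instance acquires it on its next poll.
    """
    while True:
        if store.try_acquire(instance_id, ttl):
            extract_and_push()
        time.sleep(ttl / 3)  # renew well before the lease expires
```

Polling at a third of the lease's time-to-live keeps the active instance renewing comfortably within the deadline, while a failed environment is replaced within roughly one time-to-live.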
CDF automatically resolves conflicts if it receives duplicate data from multiple streaming extractors. Batch extractors automatically backfill any missing data when they're brought online after a failure or migration.
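As a rough illustration of how batch backfill can work, the sketch below persists a high-water mark (the end of the last successfully extracted window) and resumes from it on startup; `StateStore`, `fetch_from_source`, and `push_to_cdf` are hypothetical placeholders, not Cognite SDK calls.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

class StateStore:
    """Persists the high-water mark: the end of the last extracted window.

    Illustrative in-memory version; a real extractor would persist this
    to disk or a remote store so it survives restarts and migrations.
    """

    def __init__(self) -> None:
        self._last_extracted: Optional[datetime] = None

    def get(self) -> Optional[datetime]:
        return self._last_extracted

    def set(self, ts: datetime) -> None:
        self._last_extracted = ts

def fetch_from_source(start: datetime, end: datetime) -> list:
    """Placeholder: query the source system for rows in [start, end)."""
    return []

def push_to_cdf(rows: list) -> None:
    """Placeholder: upload the extracted rows to CDF."""
    ...

def run_batch_extraction(state: StateStore) -> None:
    now = datetime.now(timezone.utc)
    # Resume from the stored mark; if the extractor was offline, the
    # window [start, now) automatically covers the gap (backfill).
    start = state.get() or now - timedelta(days=1)
    rows = fetch_from_source(start, now)
    push_to_cdf(rows)
    # Advance the mark only after a successful push: a crash in between
    # re-extracts the same window on the next run, and any resulting
    # duplicates are resolved by CDF on ingestion.
    state.set(now)
```

Because the mark only advances after a successful push, restarts never skip a window; at worst they re-extract one, which is safe given CDF's duplicate handling.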