Establish deployment workflows
The main goal of establishing deployment workflows is to promote changes to production frequently. Frequent, small promotions give faster feedback, reduce risk, make development processes more efficient, and encourage a culture of continuous improvement.
To deploy Cognite Data Fusion (CDF), we recommend that you use the Agile methodology and establish DevOps and DataOps workflows to manage the implementation.
-
DevOps is the combination of people, processes, and tools that enable you to rapidly deliver high-quality solutions and value from software development (Dev) and IT operations (Ops).
-
DataOps (Data Operations) is a set of tools and practices to manage your data's lifecycle through collaboration and automation.
DevOps
To automate how you build and test your code, we recommend that you run continuous integration (CI) every time a team member commits changes to the version control system.
Continuous delivery (CD) is the preferred method to test, configure, and deploy from one environment to the next.
Continuous integration (CI)
The continuous integration (CI) process automates the building and testing of code. For each small task they complete, developers merge their code and configuration changes into a shared version control repository. This triggers the integration process to build, test, and validate the full branch that the code is a part of.
CI helps you identify defects and bugs early, making them less expensive to fix. Another benefit is that CI enables you to shorten the release cycle and improve your release pipeline by deploying more frequently.
Teams typically use modern version control systems such as Git to organize their code and isolate their work. Depending on the type of component they're working on and the development phase they're in, each development team establishes policies, strategies, documentation, and automated testing to ensure that they keep a consistent build quality.
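As an illustration of the automated testing that runs in a CI job, the sketch below validates a (hypothetical) transformation configuration on every commit. The required keys and the `validate_transformation_config` helper are assumptions for the example, not part of CDF or any specific CI tool.

```python
"""Minimal sketch of a CI validation step (all names are illustrative).

A CI job could run a script like this on every commit so that broken
configurations are caught before the change is merged."""

REQUIRED_KEYS = {"name", "source", "destination"}

def validate_transformation_config(config: dict) -> list[str]:
    """Return a list of problems found in a transformation config."""
    problems = []
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if not str(config.get("name", "")).strip():
        problems.append("name must be non-empty")
    return problems

# A broken config fails the check; a complete one passes.
broken = {"name": "asset-hierarchy"}
ok = {"name": "asset-hierarchy", "source": "raw_db.assets", "destination": "assets"}

assert validate_transformation_config(broken)       # problems found
assert not validate_transformation_config(ok)       # clean config
```

A real pipeline would run a script like this (or a test suite) as one stage of the build, failing the build when the returned list is non-empty.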
Continuous delivery (CD)
Continuous delivery (CD) is the process to test, configure, and promote from a development to a production environment. By establishing successive testing and staging areas, you create a release pipeline with automatic setup of infrastructure and deployment of new builds. All environments use the same type and version of infrastructure components and allow you to test solutions in production-like environments early in the development cycle.
The CD process starts where continuous integration (CI) ends, and the release pipeline promotes code from one area to the next as it meets the necessary quality requirements. Promoting code from one area to the next can also depend on approval and sign-off from a decision-maker.
For a production-grade Cognite Data Fusion (CDF) implementation that contains solutions with high availability and reliability requirements, we recommend that you set up three separate areas in your release pipeline:
-
Development
This is the area where the development team writes the code and configures the solution. In this area, the data characteristics are the most critical factor, and the volume of data is less critical. All code and configuration go through mandatory testing. When a single change or a whole module has been validated and passes the requisite test criteria, the CI/CD pipelines promote it to the testing area.
-
Testing
This area is where the subject matter experts and other stakeholders do rigorous acceptance testing of new code and configurations. Ideally, they test a single change or module at a time, but depending on their bandwidth, they may test several changes or modules concurrently. In this area, the data should include all the production data sets. Key representatives from the development team should have access to this area to validate (or debug) modules with production data.
-
Production
This is the area where end-users of the solution work with live data. If necessary, representatives from the development team can have access to this area to monitor the solution.
DataOps
When you rely on data to make operational decisions, you must know that the data is reliable, and end-users must understand when they can depend on the data to make decisions.
Tools like extractors, transformations, data sets, contextualization, and machine learning models allow people from different disciplines across your organization to work together to establish, automate, and continuously optimize your data management and decision making practices.
When deploying Cognite Data Fusion (CDF), we recommend that you use CDF's robust data operations features combined with your preferred DevOps tools to ensure that the implementation teams have access to source data representing the final production environment in all project stages. The feature set from CDF will, for example, provide improved lineage, observability, and alerting mechanisms if there are failures in your data pipeline.
For some parts of your data operations, you will want to complement the CDF tools with your preferred toolset. This may be for version handling of contextualization, deployment pipelines for transformations, or monitoring of a custom data cleansing step. CDF integrates well with many such tools using our API or connectors.
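As one example of a custom monitoring step you might build alongside CDF's alerting, the sketch below scans recent pipeline run statuses and formats an alert for each failure. The `PipelineRun` record, the pipeline names, and the `alert` helper are all assumptions for illustration, not CDF API objects.

```python
"""Sketch of a custom monitoring step for data pipelines
(all names are illustrative, not a CDF API)."""

from dataclasses import dataclass

@dataclass
class PipelineRun:
    pipeline: str
    status: str          # "success" | "failure" | "running"
    message: str = ""

def failed_runs(runs: list[PipelineRun]) -> list[PipelineRun]:
    """Pick out the runs that ended in failure."""
    return [r for r in runs if r.status == "failure"]

def alert(run: PipelineRun) -> str:
    # In practice this would post to e-mail or chat; here we only
    # format the alert text.
    return f"ALERT: {run.pipeline} failed: {run.message}"

runs = [
    PipelineRun("asset-extractor", "success"),
    PipelineRun("timeseries-cleansing", "failure", "schema mismatch"),
]
alerts = [alert(r) for r in failed_runs(runs)]
```

A step like this could run on a schedule, pulling run statuses through the CDF API or your connector of choice and forwarding the alerts to your monitoring toolset.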