
Set up extraction pipelines

This article explains how to create extraction pipelines for all types of extractors you want to monitor on the Extraction pipelines page. On the Create extraction pipeline page, you must specify a data set, a name, and an external ID. You can add or edit additional information later on the Extraction pipelines overview page.

Before you start

  • A data set must exist for the data you want to add to an extraction pipeline.

  • Navigate to Manage & Configure > Manage access and set any of these capabilities for users and extractors:

    Capability | Action | Description | Best practice
    extractionpipelines | read | Gives access to list and view metadata of the pipeline. | Align access with users that have read access to the data set used by the extraction pipeline.
    extractionpipelines | write | Gives access to create and edit individual pipelines and edit notification settings. | Align access with users that have write access to the data set used by the extraction pipeline. Write access is intended for data engineers and data set owners, and you can scope this to a specific data set.
    extractionruns | read | Gives access to view extractor run history reported by the individual extraction pipeline runs. | Align access with users that have read access to the data set which is populated by this extraction pipeline.
    extractionruns | write | Give this capability to extractors to allow for state and heartbeat reporting back to Cognite Data Fusion. | Scope access to the specific extraction pipeline ID which represents a particular extractor, ensuring that statuses and errors can be reported only by that specific extractor.
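As a rough illustration of the scoping advice in the table, the sketch below builds two capability lists as plain Python dictionaries: one for a data engineer and one for an extractor. The ACL names (`extractionPipelinesAcl`, `extractionRunsAcl`), the scope keys (`datasetScope`, `extractionPipelineScope`), and the numeric IDs are assumptions made for this example and are not taken from this article. Verify them against the CDF API reference before use.

```python
import json


def engineer_capabilities(data_set_id: int) -> list:
    """Read and write access to pipelines, scoped to one data set (assumed ACL shape)."""
    return [
        {
            "extractionPipelinesAcl": {
                "actions": ["READ", "WRITE"],
                "scope": {"datasetScope": {"ids": [data_set_id]}},
            }
        }
    ]


def extractor_capabilities(pipeline_id: int) -> list:
    """Write access to runs, scoped to a single pipeline so that only
    this extractor can report statuses for it (assumed ACL shape)."""
    return [
        {
            "extractionRunsAcl": {
                "actions": ["WRITE"],
                "scope": {"extractionPipelineScope": {"ids": [pipeline_id]}},
            }
        }
    ]


# Hypothetical IDs, for illustration only.
print(json.dumps(engineer_capabilities(data_set_id=123), indent=2))
print(json.dumps(extractor_capabilities(pipeline_id=456), indent=2))
```

Keeping the extractor's capability separate from the engineer's reflects the best practice in the table: the extractor receives only what it needs to report state for its own pipeline.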

Create extraction pipelines

  1. Sign in to Cognite Data Fusion.

  2. In the top menu, navigate to Integrate > Monitor extraction pipelines, or to Manage and Configure > Create, view, and manage data sets.

    Then select a data set and open the Lineage tab to add a pipeline to the selected data set.

  3. Select Create extraction pipeline and fill in the mandatory fields: data set, name, and external ID.

  4. Select Create to open the Extraction pipeline overview. On this page, you can add more information to give context and insight about the pipeline.
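The steps above use the user interface. If you prefer to script the same operation, the sketch below shows how the three mandatory fields could be validated and assembled into a request body. The camelCase field names (`dataSetId`, `name`, `externalId`) and the `items` wrapper are assumptions about the API payload, and `build_create_request` is a hypothetical helper. Nothing is sent to CDF.

```python
import json

# The three fields the Create extraction pipeline page requires.
MANDATORY_FIELDS = ("dataSetId", "name", "externalId")


def build_create_request(data_set_id, name, external_id, **optional):
    """Assemble a request body after checking the mandatory fields.

    Optional keyword arguments are copied into the item unchanged, which
    mirrors adding more information after the pipeline is created.
    """
    item = {"dataSetId": data_set_id, "name": name, "externalId": external_id}
    missing = [field for field in MANDATORY_FIELDS if item[field] in (None, "")]
    if missing:
        raise ValueError(f"Missing mandatory fields: {', '.join(missing)}")
    item.update(optional)
    return {"items": [item]}


# Hypothetical values, for illustration only.
body = build_create_request(123, "Example pipeline", "example-pipeline")
print(json.dumps(body, indent=2))
```

Validating before sending gives a clear error message locally instead of a rejected request.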

Enable email notifications

Owners and other stakeholders can receive email notifications about the extraction pipelines. The notifications are triggered when:

  • An extraction pipeline reports a failed run to CDF.
  • An extraction pipeline with continuous data flow stops communicating with CDF. The notification is sent when a predefined time condition is reached.

In the Contacts section of the Extraction pipeline page, add the Owner and other contacts for the extraction pipeline, and enter the email address of each contact who should be notified.

Note: Email notifications are sent only when the status of an extraction pipeline changes, or when CDF has not registered any communication with the pipeline and a predefined time condition is reached. This prevents multiple emails for an ongoing incident.

For new incidents, emails are only sent for the first reported failed run and when the incident is resolved. Multiple reported failures in succession are ignored.
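The rule above can be read as a small state machine. The sketch below is one interpretation of it, not CDF's actual implementation: it walks through a sequence of reported run statuses and records an email only when an incident opens or is resolved. The status strings `"failure"` and `"success"` and the function name are assumptions for this example, and the lost-communication time condition is not modeled.

```python
from typing import Iterable, List


def notifications(statuses: Iterable[str]) -> List[str]:
    """Return the emails that would be sent for a sequence of run statuses."""
    emails = []
    incident_open = False
    for status in statuses:
        if status == "failure" and not incident_open:
            # First failed run of a new incident: notify once.
            incident_open = True
            emails.append("incident opened")
        elif status == "success" and incident_open:
            # The incident is resolved: notify once.
            incident_open = False
            emails.append("incident resolved")
        # Repeated failures (or repeated successes) send nothing.
    return emails


runs = ["success", "failure", "failure", "failure", "success", "failure"]
print(notifications(runs))
# ['incident opened', 'incident resolved', 'incident opened']
```

Three consecutive failures produce a single email, which matches the statement that multiple reported failures in succession are ignored.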

Best practice for documenting extraction pipelines

It is good practice to enter comprehensive information about a pipeline to simplify troubleshooting and administration of pipelines. The minimum information you need to record is Data set, Name, and External ID.

Monitor the pipeline status by setting up email notifications for failed and interrupted pipelines and by adding contact details for the pipeline owner and other stakeholders. The switch for activating email notifications appears when you add a contact.

Select the schedule set up in the extractor to document how often the extractor is expected to update the data in CDF, and check the Last connected field to make sure CDF and the extractor are communicating. It's also useful to record the source system name and the RAW database tables to keep track of where your data is extracted from and ingested into.

Information specific to your organization can be added using the metadata fields with key/value pairs.

To speed up troubleshooting, use the Documentation field to enter context and other insights about the pipeline as free text. You can format the text using Markdown. The text is displayed as a ReadMe section on the Extraction pipeline overview page, so make sure to keep it up to date.
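As a minimal sketch of how a team might check that a pipeline is documented according to this best practice, the example below lists which recommended fields are still empty. The record layout, the field names, and all values are hypothetical and chosen only for this illustration. They are not the CDF data model.

```python
# Recommended information beyond the mandatory data set, name, and external ID.
RECOMMENDED_FIELDS = (
    "contacts", "schedule", "source", "raw_tables", "metadata", "documentation",
)


def missing_recommended(record: dict) -> list:
    """Return the recommended fields that are absent or empty in the record."""
    return [field for field in RECOMMENDED_FIELDS if not record.get(field)]


# Hypothetical pipeline record, for illustration only.
record = {
    "data_set": "example-data-set",
    "name": "Example pipeline",
    "external_id": "example-pipeline",
    "contacts": [
        {"name": "Pipeline owner", "email": "owner@example.com",
         "role": "Owner", "notify": True},
    ],
    "schedule": "Continuous",
    "source": "Example source system",
    "raw_tables": [{"database": "example_db", "table": "example_table"}],
    "metadata": {"site": "example-site"},
    "documentation": "",  # Not written yet.
}

print(missing_recommended(record))
# ['documentation']
```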