# Create extraction pipelines

This article explains how to create extraction pipelines on the Extraction pipelines page for all types of extractors you want to monitor. On the Create extraction pipeline page, you must enter a Data set, Name, and External ID. You can add or edit other information later on the Extraction pipeline overview page.

# Before you start

  • A data set must exist for the data you want to add to an extraction pipeline.

  • Navigate to Manage & Configure > Manage access and set any of these capabilities for users and extractors:

| Capability | Action | Description | Best practice |
|---|---|---|---|
| extractionpipelines | read | Gives access to list and view metadata of the pipeline. | Align access with users that have read access to the data set used by the extraction pipeline. |
| extractionpipelines | write | Gives access to create and edit individual pipelines and edit notification settings. | Align access with users that have write access to the data set used by the extraction pipeline. Write access is intended for data engineers and data set owners, and you can scope this to a specific data set. |
| extractionruns | read | Gives access to view extractor run history reported by the individual extraction pipeline runs. | Align access with users that have read access to the data set that is populated by this extraction pipeline. |
| extractionruns | write | Give this capability to extractors to allow state and heartbeat reporting back to Cognite Data Fusion. | Scope access to the specific extraction pipeline ID that represents a particular extractor, ensuring that statuses and errors can be reported only by that specific extractor. |
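To illustrate the best practice for extractionruns write, here is a minimal Python sketch that models grants as plain objects and checks whether an extractor may report runs for a given pipeline. The `Capability` class, the function names, and pipeline ID 42 are hypothetical; this is not the payload format used by Manage access or the CDF API.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Capability:
    # Hypothetical model for illustration; not the CDF access-management payload.
    name: str                                      # e.g. "extractionruns"
    action: str                                    # "read" or "write"
    pipeline_ids: Optional[FrozenSet[int]] = None  # None means "all pipelines"


def can_report_runs(caps, pipeline_id):
    """True if some grant lets this identity write runs for pipeline_id."""
    return any(
        c.name == "extractionruns"
        and c.action == "write"
        and (c.pipeline_ids is None or pipeline_id in c.pipeline_ids)
        for c in caps
    )


def follows_best_practice(caps):
    """True if every extractionruns write grant is scoped to specific pipeline IDs."""
    return all(
        c.pipeline_ids is not None
        for c in caps
        if c.name == "extractionruns" and c.action == "write"
    )


# An extractor scoped to the (hypothetical) pipeline with ID 42 only.
extractor_caps = [Capability("extractionruns", "write", frozenset({42}))]
print(can_report_runs(extractor_caps, 42))    # True
print(can_report_runs(extractor_caps, 7))     # False
print(follows_best_practice(extractor_caps))  # True
```

An unscoped grant would still let the extractor report runs, but for every pipeline, which is exactly what the best practice above tries to avoid.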

# Create extraction pipelines

  1. Sign in to Cognite Data Fusion.

  2. In the top menu,

    • navigate to Integrate > Monitor extraction pipelines, or

    • navigate to Manage & Configure > Create, view, and manage data sets. Then select a data set and open the Lineage tab to add a pipeline to the selected data set.

  3. Click Create extraction pipeline and fill in the mandatory fields: Data set, Name, and External ID.

  4. Click Create to open the Extraction pipeline overview page. On this page, you can add more information to give context and insights about the pipeline.
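As a sketch of what the Create extraction pipeline page requires, the following Python snippet checks a draft for the three mandatory fields. `PipelineDraft` and `missing_mandatory_fields` are hypothetical names for illustration, not part of any Cognite SDK.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PipelineDraft:
    # Hypothetical form model: the three mandatory fields on the Create page.
    data_set_id: Optional[int] = None
    name: str = ""
    external_id: str = ""


def missing_mandatory_fields(draft: PipelineDraft) -> List[str]:
    """Return the labels of mandatory fields that are still empty."""
    missing = []
    if draft.data_set_id is None:
        missing.append("Data set")
    if not draft.name.strip():
        missing.append("Name")
    if not draft.external_id.strip():
        missing.append("External ID")
    return missing


draft = PipelineDraft(name="SAP asset extractor")
print(missing_mandatory_fields(draft))  # ['Data set', 'External ID']
```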

# Enable email notifications

Extraction pipelines enable owners and other stakeholders to receive email notifications. The notifications are triggered when the extraction pipeline reports a failed run to CDF.

Notifications are sent to the email addresses you enter when you add the Owner and other contacts in the Contacts section of the Extraction pipeline page.
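A minimal sketch, assuming each contact carries its own notification switch: the Python snippet below selects the addresses that would receive failure emails. The `Contact` class, its field names, and the example addresses are hypothetical and do not represent the CDF data model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Contact:
    # Hypothetical model of an entry in the Contacts section.
    name: str
    email: str
    role: str                # e.g. "Owner"
    send_notification: bool  # the email-notification switch for this contact


def notification_recipients(contacts: List[Contact]) -> List[str]:
    """Return the emails that should receive failure notifications, in entry order."""
    return [c.email for c in contacts if c.send_notification]


contacts = [
    Contact("Ada", "ada@example.com", "Owner", True),
    Contact("Ben", "ben@example.com", "Data engineer", False),
]
print(notification_recipients(contacts))  # ['ada@example.com']
```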

Email notifications are only triggered when the extraction pipeline status changes. This prevents repeated emails for an ongoing incident: an email is sent for the first failed run of a new incident and again when the incident is resolved. Subsequent failed runs in the same incident don't trigger additional emails.
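The notification rule above behaves like a small state machine. This Python sketch, assuming each run reports either `"success"` or `"failure"` in the order received, returns the runs that would trigger an email. The function name and status strings are assumptions for illustration, not CDF's implementation; any other status value is treated as no change.

```python
from typing import Iterable, List, Tuple


def notification_events(statuses: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (run_index, event) pairs for runs that would trigger an email.

    Email on the first failure of an incident and when the incident resolves;
    repeated failures during the same incident are ignored.
    """
    events = []
    in_incident = False
    for i, status in enumerate(statuses):
        if status == "failure" and not in_incident:
            events.append((i, "failed"))
            in_incident = True
        elif status == "success" and in_incident:
            events.append((i, "resolved"))
            in_incident = False
        # Any other case is not a state change, so no email is sent.
    return events


runs = ["success", "failure", "failure", "failure", "success", "success", "failure"]
print(notification_events(runs))
# [(1, 'failed'), (4, 'resolved'), (6, 'failed')]
```

Tracking a single `in_incident` flag is enough because only transitions matter: three consecutive failures produce one email, and the next success produces the "resolved" email.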

# Best practice for documenting extraction pipelines

It is good practice to enter comprehensive information about a pipeline to simplify troubleshooting and administration. The minimum information you must record is Data set, Name, and External ID.

Monitor the pipeline status by setting up email notifications for failed runs and adding contact details for the pipeline owner and other stakeholders who need to be alerted when the pipeline fails. You'll find the switch for activating email notifications when you add a contact.

Select the schedule that is configured in the extractor to document how often the extractor is expected to update data in CDF. Also record the source system name and the Raw database tables to keep track of where your data is extracted from and ingested into.

Information specific to your organization can be added using the metadata fields with key/value pairs.

It's also useful to record context and other insights about the pipeline to speed up troubleshooting. Use the free-text Documentation field for this purpose. Its content is displayed as a ReadMe section on the Extraction pipeline overview page, so keep it up to date. You can format the text using Markdown.
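To make the checklist above concrete, this Python sketch lists the recommended documentation fields that are still empty for a pipeline record. `PipelineDoc`, `documentation_gaps`, and the example values are hypothetical. Metadata is left out of the check because its keys are specific to your organization.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PipelineDoc:
    # Hypothetical record of what the overview page lets you document.
    data_set_id: Optional[int]
    name: str
    external_id: str
    owner_email: str = ""
    schedule: str = ""  # as configured in the extractor
    source: str = ""    # source system name
    raw_tables: List[str] = field(default_factory=list)  # "database.table" strings
    metadata: Dict[str, str] = field(default_factory=dict)
    documentation: str = ""  # Markdown shown as the ReadMe section


def documentation_gaps(doc: PipelineDoc) -> List[str]:
    """List recommended fields that are still empty (mandatory fields excluded)."""
    checks = {
        "owner contact": bool(doc.owner_email),
        "schedule": bool(doc.schedule),
        "source system": bool(doc.source),
        "Raw tables": bool(doc.raw_tables),
        "documentation": bool(doc.documentation.strip()),
    }
    return [label for label, ok in checks.items() if not ok]


doc = PipelineDoc(
    data_set_id=123,
    name="SAP asset extractor",
    external_id="sap-assets",
    owner_email="ada@example.com",
    source="SAP",
    metadata={"team": "Operations"},
)
print(documentation_gaps(doc))  # ['schedule', 'Raw tables', 'documentation']
```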

Last Updated: 10/5/2021, 2:24:37 PM