# Use Azure Data Factory with PostgreSQL gateway
Cognite supports ingestion using Microsoft Azure Data Factory (ADF).
This article shows how to create a simple data pipeline in Azure Data Factory to copy data from a file in Azure Blob Storage and ingest it into CDF. Depending on your data source and data processing requirements, you may need to create more advanced data pipelines in Azure Data Factory.
# Before you start
Make sure you have an Azure subscription and have created an Azure resource group.
Make sure you have created an Azure storage account.
Make sure you have the file containing the data to be ingested to CDF in Azure blob storage.
# Step 1: Create a data factory
Sign in to the Azure portal as an admin.
If you have access to multiple tenants, use the Directory + subscription filter in the top menu to select the tenant in which you want to register a data factory.
Search for and select Data Factory.
Click Create to open the Data Factory page.
On the Basics tab:
- Subscription: Select the Azure subscription in which you want to create the data factory.
- Resource group: Create or select the Azure resource group you want to use.
- Region: Select a location where your ADF metadata will be stored. This list only shows locations that ADF supports.
- Name: Enter a globally unique name for the data factory.
- Version: Select V2.
Click Next: Git configuration and fill in the required fields or select Configure Git later.
Click Review + Create > Create.
After the creation is complete, open the data factory and select the Author & Monitor tile to start the Azure Data Factory application in a separate tab.
# Step 2: Create an ADF pipeline and set the source system
Use the Copy data tool to create a pipeline that reads data from a file in your data storage and writes to CDF.
On the Let's get started page, select the Create pipeline tile.
Open Activities > Move and transform.
Drag and drop the Copy data activity onto the canvas and enter a unique name in the Name field.
Open the Source tab > New.
Select the data store that references the source system you're extracting from. In this example, we'll extract data from a .csv file in Azure Blob Storage.
Create a new linked service that links your data store to the data factory.
Enter a unique name for the service.
Select your Azure subscription and storage account.
Click Test connection to make sure you are connected to the selected storage account.
Under Set properties, set the file you want to read data from.
Enter a unique name for the source data set.
Browse to the file you want to use.
Select First row as header to reuse the source file's header columns.
# Step 3: Set PostgreSQL gateway as sink
In this step, add the PostgreSQL gateway as the destination using a sink in ADF.
Open Copy data > Sink tab.
In the Sink dataset field, select New.
Select or search for Azure Database for PostgreSQL.
Click Open to open the sink data set.
On the Connection tab, add a linked service for the sink data set.
Click New to add a linked service.
Under Account selection method, select Enter manually.
In the Fully qualified domain name field, enter skywarp.<cluster>.cognite.ai, where <cluster> is the cluster where your CDF instance is installed. If you don't know the cluster name, contact Cognite support.
In the Port field, enter 5432, which is the default port for the PostgreSQL gateway.
For Database name and User name, enter the username returned from the PostgreSQL gateway.
In the Password field, enter the password returned from the PostgreSQL gateway.
In the Encryption method field, select SSL.
Optionally, select Validate server certificate.
Click Test connection to make sure you are connected to the Azure Database for PostgreSQL.
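The connection settings above can be sketched as a small helper that assembles the gateway parameters. This is a minimal sketch: the cluster name and credentials are placeholders, and the resulting dictionary matches the keyword arguments most PostgreSQL clients (for example psycopg2) accept.

```python
# Sketch: assemble the PostgreSQL gateway connection settings from this step.
# The cluster name and credentials below are placeholders -- use your own.

def gateway_connection(cluster: str, username: str, password: str) -> dict:
    """Return connection parameters for the CDF PostgreSQL gateway.

    The gateway host follows the pattern skywarp.<cluster>.cognite.ai,
    and the database name equals the username returned by the gateway.
    """
    return {
        "host": f"skywarp.{cluster}.cognite.ai",
        "port": 5432,            # default PostgreSQL gateway port
        "dbname": username,      # database name is the same as the user name
        "user": username,
        "password": password,
        "sslmode": "require",    # matches the SSL encryption method above
    }

params = gateway_connection("westeurope-1", "pguser_abc123", "secret")
print(params["host"])  # skywarp.westeurope-1.cognite.ai
```

Passing this dictionary to a PostgreSQL client lets you verify the same connection details outside ADF before you run the pipeline.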
# Step 4: Set the destination database table
Back on the Connection tab, select the destination table from the tables available in your PostgreSQL gateway database. You can choose between several resource types, such as assets, events, and data points.
- In the Table field, select the table matching the destination RAW table or the CDF resource type public.<cdf_resource_type>, for instance, public.assets or public.timeseries.
Navigate back to the Sink tab on your pipeline.
- Under Write method, select Bulk insert.
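The table naming convention from this step can be captured in a short helper. This is an illustrative sketch: the set of resource types below covers only the types mentioned in this article and is not exhaustive.

```python
# Sketch: the sink table naming convention from step 4.
# Only the resource types named in this article are listed; the real
# gateway exposes more.
CDF_RESOURCE_TYPES = {"assets", "events", "timeseries"}

def sink_table(resource_type: str) -> str:
    """Return the PostgreSQL gateway table for a CDF resource type,
    e.g. 'assets' -> 'public.assets'."""
    if resource_type not in CDF_RESOURCE_TYPES:
        raise ValueError(f"unknown CDF resource type: {resource_type}")
    return f"public.{resource_type}"

print(sink_table("assets"))      # public.assets
print(sink_table("timeseries"))  # public.timeseries
```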
# Step 5: Map the source and destination tables
Open the Mapping tab on your pipeline.
Click Import schemas to import the table columns and remove the lines you won't use.
Map the columns according to your requirements.
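The column mapping you configure on the Mapping tab amounts to renaming source CSV headers to sink table columns and dropping unused ones. A minimal sketch, with hypothetical column names (TagName, TagDescription, SerialNo are examples, not columns from your file):

```python
import csv
import io

# Hypothetical mapping from source CSV headers to sink table columns,
# mirroring what you configure on the ADF Mapping tab.
COLUMN_MAP = {
    "TagName": "name",
    "TagDescription": "description",
    "SerialNo": "externalId",
}

def map_row(row: dict) -> dict:
    """Keep only mapped columns and rename them to the sink schema."""
    return {dst: row[src] for src, dst in COLUMN_MAP.items() if src in row}

# Example: the "Unused" column is dropped, the rest are renamed.
sample = io.StringIO("TagName,TagDescription,SerialNo,Unused\npump-01,Main pump,SN-1,x\n")
first_row = next(csv.DictReader(sample))
print(map_row(first_row))
# {'name': 'pump-01', 'description': 'Main pump', 'externalId': 'SN-1'}
```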
Click Debug in the top bar to run the pipeline and send the data through the PostgreSQL gateway.
# Step 6: Verify your data
Verify that the configuration is successful.
- Sign in to Cognite Data Fusion.
- In the top menu, select Explore & Build or Browse staged data and check that the data has been successfully ingested.