Set up extraction pipelines with remote configuration files

You can set up Cognite Data Fusion (CDF) extraction pipelines to use versioned extractor configuration files stored in the cloud. To deploy an extractor, supply a minimal local configuration file containing the sign-in credentials; the extractor pulls the remaining configuration from the cloud. You can create the remote configuration files with a continuous integration system such as GitHub Actions or directly with the CDF API.

Before you start

Make sure the extractor has the extractionconfigs:WRITE capability to access CDF.

Configure extractors with GitHub Actions

  1. Configure your extractor with a minimal configuration file. Refer to the extractor documentation for details. All the Cognite extractors have a similar configuration. This is an example for the Cognite DB extractor:

    type: remote
    cognite:
      host: ${BASE_URL}
      project: ${PROJECT}
      idp-authentication:
        client-id: ${CLIENT_ID}
        secret: ${CLIENT_SECRET}
        token-url: ${TOKEN_URL}
        scopes:
          - ${BASE_URL}/.default
      extraction-pipeline:
        external-id: db-extractor-pipeline

    At startup, the extractor attempts to read the configuration from the extraction pipeline whose external ID is set in extraction-pipeline.external-id (db-extractor-pipeline in this example). The extractor then continues to check for updates every few minutes.

  2. Create configuration files directly with the CDF API or set up a continuous integration pipeline in GitHub Actions.

    To use the cognitedata/upload-config-action GitHub action, create a GitHub workflow:

    name: Update extractor configuration
    on:
      push

    jobs:
      deploy-configs:
        runs-on: ubuntu-latest
        name: Deploy Configs
        steps:
          - uses: actions/checkout@v2

          - name: get commit message
            id: commitmsg
            run: 'echo ::set-output name=commitmessage::$(git log --format=%B -n 1 ${{ github.event.after }})'

          - name: Deploy
            uses: cognitedata/upload-config-action@v1
            with:
              base-url: ${{ secrets.BASE_URL }}
              token-url: ${{ secrets.TOKEN_URL }}
              cdf-project-name: ${{ secrets.PROJECT }}
              client-id: ${{ secrets.CLIENT_ID }}
              client-secret: ${{ secrets.CLIENT_SECRET }}
              root-folder: 'root_dir/'
              deploy: 'true'
              revision-message: '${{ steps.commitmsg.outputs.commitmessage }}'

  3. Place the configuration files in the folder set by root-folder (root_dir/ in the workflow above). Each configuration file name must be identical to the external ID of its extraction pipeline, for instance db-extractor-pipeline.yml.

    At startup, the extractor retrieves this configuration from the extraction pipeline and runs with it.
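Step 2 mentions creating configuration files directly with the CDF API as an alternative to GitHub Actions, but does not show what that looks like. The following is a minimal sketch, not the official client. It assumes that a configuration revision is created with a POST to /api/v1/projects/{project}/extpipes/config and a JSON body containing externalId, config, and description; verify both against the CDF API reference before relying on them. The function name build_config_request and its parameters are hypothetical. The sketch follows the same naming rule as step 3: the file name without its extension is used as the extraction pipeline external ID.

```python
import json
import urllib.request
from pathlib import Path


def build_config_request(config_path, base_url, project, token, description=""):
    """Build (but do not send) a request that uploads one configuration revision.

    Hypothetical helper: the endpoint path and body fields are assumptions,
    not taken from this page.
    """
    path = Path(config_path)
    body = {
        # File name without extension doubles as the pipeline external ID,
        # e.g. db-extractor-pipeline.yml -> db-extractor-pipeline.
        "externalId": path.stem,
        # The remote configuration is sent as plain text.
        "config": path.read_text(encoding="utf-8"),
        # Plays the same role as revision-message in the workflow.
        "description": description,
    }
    # Assumed endpoint; check the CDF API reference.
    url = f"{base_url.rstrip('/')}/api/v1/projects/{project}/extpipes/config"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen. The token is an OAuth access token obtained from the same TOKEN_URL, CLIENT_ID, and CLIENT_SECRET used in the minimal configuration file; fetching it is left out of the sketch. Keeping request construction separate from sending makes it possible to inspect the URL and payload before anything is uploaded.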