Cognite REST extractor

Beta

The features described in this section are currently in beta testing and are subject to change.

The Cognite REST extractor is a generic extractor for reading data from REST APIs. The extractor runs on a fixed schedule, each time making one or more requests to a REST API and passing the data to hosted extractor mappings.

The REST extractor relies heavily on the Cognite mapping language to implement custom pagination and incremental load, and to handle responses. See custom mappings for details on writing your mappings.

To set up a new REST extractor:

  1. You need hostedextractors:READ and hostedextractors:WRITE capabilities to create a REST extractor.

  2. Make sure the extractor has the following access capabilities in a Cognite Data Fusion (CDF) project:

  • timeseries:READ, timeseries:WRITE for writing datapoints and time series.

  • events:READ, events:WRITE, assets:READ for writing events.

  • raw:READ, raw:WRITE for writing raw rows, or

  • datamodelinstances:WRITE for nodes and edges.

    tip

    You can use OpenID Connect and your existing identity provider (IdP) framework to securely manage access to CDF data. For more information, see setup and administration for extractors.

  3. Navigate to Data management > Integrate > Extractors.

  4. Find the Cognite REST extractor and select Set up extractor.

About REST

REST is a generic system for creating APIs over HTTP. It imposes few rules, so creating a generic REST extractor is difficult. The Cognite REST extractor is a tool for you to create a simple extractor for a REST API with some expected limitations:

  • The REST extractor will only make requests to a single host. The exception is that the host specified under authentication may be different.

  • The REST extractor currently supports sending payloads as a valid JSON blob.

  • The extractor stores the last_run field as the singular state value between executions. This value indicates the most recent execution of the REST extractor in your project.

Generally, you should use the Cognite Extractor Utils library and develop a custom REST extractor for your source system when you need to make requests to multiple hosts, combine source data in complex ways, or read data from CDF.

If you just want to list out information from a REST API and ingest this into CDF resources, the hosted REST extractor is recommended.

Configuration

The extractor configuration consists of several parts required to successfully extract from a REST source.

Create the source

  1. host is the hostname or IP address the REST extractor will connect to. Enter only the hostname, not the full URL. For example, for https://api.cognite.com/v1/projects/test/assets, the host portion is api.cognite.com.
  2. scheme is the type of connection to use, either http or https. Most public APIs use https.
  3. port is the port on the source server. Note that http and https typically use standard ports 80 and 443. In most cases, you can leave the port specification out unless your source system indicates a different port.
  4. CA Certificate lets you use a self-signed certificate by giving the public certificate of the certificate authority to the extractor.
  5. authentication lets you choose from one of several options used to authenticate requests to the server. Unlike manually setting headers, using the authentication option ensures that passwords and secrets are not visible when the source is read, and that they are stored with additional encryption.
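
To illustrate how a full URL breaks down into the source fields above, here is a minimal Python sketch using the standard library. The function name source_fields and the DEFAULT_PORTS table are illustrative assumptions and are not part of the extractor; the sketch only shows which part of a URL belongs in scheme, host, and port.

```python
from urllib.parse import urlsplit

# Standard ports implied by each scheme when the URL does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def source_fields(url: str) -> dict:
    """Split a full URL into scheme, host, and port (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,  # hostname only, without port or path
        "port": parts.port or DEFAULT_PORTS[parts.scheme],
    }


fields = source_fields("https://api.cognite.com/v1/projects/test/assets")
print(fields)
# {'scheme': 'https', 'host': 'api.cognite.com', 'port': 443}
```

A URL that names a non-standard port, such as http://localhost:8080/data, yields port 8080, which is the case where you would fill in the port field explicitly.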

Create the job

  1. interval represents how often the job should execute. You can pick any of the options from the fixed list of intervals.
  2. path is the path to the API endpoint portion of the URL. It does not include the host information or the query parameters. For https://api.cognite.com/v1/projects/test/assets?externalId=test, the path portion is /v1/projects/test/assets.
  3. method is the HTTP method for the request. The default is get. Sending a JSON body is only permitted if the method is post.
  4. body is the JSON body to send with post requests. When this contains information, the Content-Type header will also be added to the request.
  5. query is a list of key/value pairs for the initial query portion of the URL. For example, when using /v1/projects/test/assets?externalId=test&name=something, the query is { "externalId": "test", "name": "something" }.
  6. headers is the list of valid header key/value pairs representing the initial headers sent with the request.
  7. incremental load is used to build the initial request based on previous runs of the extractor, which avoids reading the same data every time the extractor runs. The incremental load mapping is applied at extractor startup. Incremental load describes the configuration details.
  8. pagination is used during a single extractor run to paginate through data received from the source. Pagination describes the configuration details. The pagination mapping is applied once for each request after the first and is used to determine whether to execute more queries.
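
As a rough illustration of the path and query fields, the following Python sketch splits a full URL with the standard library. The helper name job_fields is an assumption made for this example; the extractor itself takes these values directly from the job configuration.

```python
from urllib.parse import parse_qsl, urlsplit


def job_fields(url: str) -> dict:
    """Return the path and query portions of a URL (hypothetical helper)."""
    parts = urlsplit(url)
    return {
        "path": parts.path,  # no host, no query string
        "query": dict(parse_qsl(parts.query)),  # key/value pairs
    }


job = job_fields(
    "https://api.cognite.com/v1/projects/test/assets?externalId=test&name=something"
)
print(job["path"])   # /v1/projects/test/assets
print(job["query"])  # {'externalId': 'test', 'name': 'something'}
```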

Mapping context

As with other hosted extractors, every mapping used by the REST extractor is passed a context object. The context uses the following format:

{
  "last_run": 12345678, // Timestamp (time since 01/01/1970) when this extractor last ran.
  "response": {
    "headers": { // List of headers in the last response.
      "key": "value"
    },
    "status": 201 // Last response status code, will always be 2xx.
  },
  "request": {
    "path": "/last/path", // Last request path.
    "host": "myhost.com", // Last request host.
    "headers": { // List of headers in the last request.
      "key": "value"
    },
    "query": { // List of query parameters in the last request.
      "key": "value"
    },
    "method": "get", // Last request method.
    "body": {} // Last request body.
  }
}

note

response will be empty for incremental load mappings, since there is no prior response when those are executed.
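
The shape of the context can be sketched in Python as a plain dictionary. This is only a model for reasoning about mappings: the function build_context is hypothetical, and representing the empty response as an empty dictionary is an assumption about how "empty" looks from inside a mapping.

```python
from typing import Any, Optional


def build_context(
    last_run: Optional[int],
    request: dict,
    response: Optional[dict] = None,
) -> dict:
    """Model the context object passed to a mapping (hypothetical helper)."""
    return {
        "last_run": last_run,
        # Assumption: no prior response is modelled as an empty dictionary.
        "response": response if response is not None else {},
        "request": request,
    }


request: dict[str, Any] = {
    "path": "/last/path",
    "host": "myhost.com",
    "headers": {"key": "value"},
    "query": {"key": "value"},
    "method": "get",
    "body": {},
}

# Incremental load: there is no response yet.
startup_context = build_context(12345678, request)

# Pagination: the previous response is available.
paging_context = build_context(
    12345678, request, {"headers": {"key": "value"}, "status": 201}
)
print(startup_context["response"], paging_context["response"]["status"])
# {} 201
```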

Incremental load

Incremental load is applied at extractor startup. It may apply to query parameters, headers, or the body. The incremental load mapping is given only the context described in the mapping context, with nothing in the response section.

If the incremental load is set to use body, the result of the mapping will set the next message body. If you only want to modify the body given in the extractor configuration, you can use a mapping of the following form:

{
  ...context.request.body,
  ...{
    "set-this-field": "to-value"
  }
}

If set to use queryParam or header, the result will overwrite only the configured query parameter or header value.
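
The difference between the three targets can be sketched in Python as follows. This is a model of the behavior described above, not the extractor's implementation: the function apply_mapping_result and its key parameter (standing for the configured query parameter or header name) are hypothetical.

```python
import copy
from typing import Any, Optional


def apply_mapping_result(
    request: dict, target: str, result: Any, key: Optional[str] = None
) -> dict:
    """Return a copy of the request with the mapping result applied."""
    updated = copy.deepcopy(request)
    if target == "body":
        # The mapping result replaces the whole body.
        updated["body"] = result
    elif target == "queryParam":
        # Only the configured query parameter is overwritten.
        updated["query"][key] = result
    elif target == "header":
        # Only the configured header is overwritten.
        updated["headers"][key] = result
    else:
        raise ValueError(f"unknown target: {target}")
    return updated


base = {
    "body": {"limit": 100},
    "query": {"externalId": "test", "name": "something"},
    "headers": {"Accept": "application/json"},
}

with_query = apply_mapping_result(base, "queryParam", "other", key="name")
print(with_query["query"])  # {'externalId': 'test', 'name': 'other'}

with_body = apply_mapping_result(base, "body", {"set-this-field": "to-value"})
print(with_body["body"])  # {'set-this-field': 'to-value'}
```

Because a body result replaces the whole body, the spread of context.request.body shown earlier is what preserves the configured fields.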

Example, for the CDF Assets API:

type: body

{
  "filter": {
    ...context.request.body.filter,
    "lastUpdatedTime": {
      "min": try_int(context.last_run, 0)
    }
  }
}
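
To make the effect of this mapping concrete, here is a rough Python equivalent. It is a sketch under assumptions: the Python try_int below imitates what the mapping function appears to do (convert to an integer, fall back to the default on failure), and incremental_body is a hypothetical name.

```python
from typing import Any


def try_int(value: Any, default: int) -> int:
    """Assumed semantics: integer conversion with a fallback value."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def incremental_body(context: dict) -> dict:
    """Python rendering of the incremental load mapping above."""
    body = context["request"].get("body") or {}
    current_filter = body.get("filter") or {}
    return {
        "filter": {
            **current_filter,  # keep the filter from the job configuration
            "lastUpdatedTime": {"min": try_int(context.get("last_run"), 0)},
        }
    }


# First run: no last_run yet, so the lower bound falls back to 0.
first = incremental_body({"last_run": None, "request": {"body": {}}})
print(first)  # {'filter': {'lastUpdatedTime': {'min': 0}}}

# Later run: only data updated since the previous run is requested.
later = incremental_body(
    {"last_run": 12345678, "request": {"body": {"filter": {"source": "x"}}}}
)
print(later)
# {'filter': {'source': 'x', 'lastUpdatedTime': {'min': 12345678}}}
```

Note that, like the mapping it mirrors, the result contains only filter, so any other fields in the configured body are not carried over.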

Pagination

Pagination is applied after each request and is used to determine whether the extractor should make more requests to the source. The pagination mapping is given the context as described in the mapping context, and the response body, converted to JSON, in body.

If the pagination mapping returns null or the same value as it did for the previous request, the extractor will stop paginating. This usually means you need to make sure the mapping returns null if a cursor or similar is null.

If pagination is set to use body, the result of the mapping will set the next message body. If you want to just modify the body given in extractor configuration or in incremental load, you can create a mapping using the following format:

{
  ...context.request.body,
  ...{
    "set-this-field": "to-value"
  }
}

If set to use queryParam or header, the result will overwrite only the configured query parameter or header value.

Example, for the CDF Assets API:

type: body

if(body.nextCursor, {
  ...context.request.body,
  "cursor": body.nextCursor
}, null)

This will set cursor in the body if nextCursor is set in the response and return null to end pagination otherwise.
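
The pagination loop can be sketched in Python to show how the stop conditions interact with a cursor mapping. Everything here is illustrative: fake_fetch stands in for the HTTP request, PAGES is made-up data, and run_job is an assumed model of the loop, including the reading that "same value" means the mapping result equals the previous mapping result.

```python
from typing import Callable, Optional

# Made-up source data, keyed by the cursor that retrieves each page.
PAGES = {
    None: {"items": [1, 2], "nextCursor": "a"},
    "a": {"items": [3, 4], "nextCursor": "b"},
    "b": {"items": [5], "nextCursor": None},
}


def fake_fetch(request_body: dict) -> dict:
    """Stand-in for a POST request to the source."""
    return PAGES[request_body.get("cursor")]


def next_body(context: dict, body: dict) -> Optional[dict]:
    """Python rendering of the pagination mapping above."""
    if body.get("nextCursor"):
        return {**context["request"]["body"], "cursor": body["nextCursor"]}
    return None


def run_job(initial_body: dict, fetch: Callable[[dict], dict] = fake_fetch) -> list:
    items = []
    request_body = initial_body
    previous_result = None
    while True:
        response_body = fetch(request_body)
        items.extend(response_body["items"])
        result = next_body({"request": {"body": request_body}}, response_body)
        # Stop on null, or when the mapping repeats its previous value.
        if result is None or result == previous_result:
            break
        previous_result = result
        request_body = result
    return items


print(run_job({"limit": 2}))  # [1, 2, 3, 4, 5]
```

If the source kept returning the same nextCursor, the repeated-value check would end the loop after the second request instead of looping forever.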