Cognite REST extractor
The Cognite REST extractor is a generic extractor for reading data from REST APIs. The extractor runs on a fixed schedule, each time making one or more requests to a REST API and passing the data to hosted extractor mappings.
The REST extractor makes heavy use of the Cognite mapping language to implement custom pagination, incremental load, and to handle responses. See custom mappings for details on writing your mappings.
To set up a new REST extractor:

- Make sure you have the `hostedextractors:READ` and `hostedextractors:WRITE` capabilities to create a REST extractor.
- Make sure the extractor has the following access capabilities in a Cognite Data Fusion (CDF) project:
  - `timeseries:READ`, `timeseries:WRITE` for writing datapoints and time series.
  - `events:READ`, `events:WRITE`, `assets:READ` for writing events.
  - `raw:READ`, `raw:WRITE` for writing raw rows, or
  - `datamodelinstances:WRITE` for nodes and edges.

  You can use OpenID Connect and your existing identity provider (IdP) framework to securely manage access to CDF data. For more information, see setup and administration for extractors.
- Navigate to Data management > Integrate > Extractors.
- Find the Cognite REST extractor and select Set up extractor.
About REST
REST is a generic system for creating APIs over HTTP. It imposes few rules, so creating a generic REST extractor is difficult. The Cognite REST extractor is a tool for you to create a simple extractor for a REST API with some expected limitations:
- The REST extractor only makes requests to a single host. The exception is that the host specified under `authentication` may be different.
- The REST extractor currently only supports sending payloads as a valid JSON blob.
- The extractor stores the `last_run` field as the single state value between executions. This value indicates the most recent execution of the REST extractor in your project.
Generally, you should use the Cognite Extractor Utils library to develop a custom REST extractor for your source system when you need to make requests to multiple hosts, combine source data in complex ways, or read data from CDF. If you just want to list information from a REST API and ingest it into CDF resources, the hosted REST extractor is recommended.
Configuration
The extractor configuration consists of several parts required to successfully extract from a REST source.
Create the source
- host is the hostname or IP address the REST extractor will connect to. Enter only the hostname, not the full URL. For example, for `https://api.cognite.com/v1/project/test/assets`, the host portion is `api.cognite.com`.
- scheme is the type of connection to use, either `http` or `https`. Most public APIs use `https`.
- port is the port on the source server. Note that `http` and `https` typically use the standard ports `80` and `443`. In most cases, you can leave the port out unless your source system indicates a different port.
- CA Certificate lets you use a self-signed certificate by giving the public certificate of the certificate authority to the extractor.
- authentication lets you choose from one of several options used to authenticate requests to the server. Unlike manually setting headers, using the authentication option ensures that passwords and secrets are not visible when the source is read, and that they are stored with additional encryption.
Create the job
- interval represents how often the job should run. You can pick any of the options from the fixed list of intervals.
- path is the API endpoint portion of the URL. It does not include the host information or the query parameters. For `https://api.cognite.com/v1/projects/test/assets?externalId=test`, the path portion is `/v1/projects/test/assets`.
- method is the HTTP method for the request. The default is `get`. Sending a JSON body is only permitted if the method is `post`.
- body is the JSON body to send with `post` requests. When this contains information, the `Content-Type` header is also added to the request.
- query is a list of key/value pairs for the initial query portion of the URL. For example, when using `/v1/projects/test/assets?externalId=test&name=something`, the query is `{ "externalId": "test", "name": "something" }`.
- headers is the list of header key/value pairs sent with the initial request.
- incremental load builds the initial request based on previous runs of the extractor. The incremental load mapping is applied at extractor startup, which avoids reading the same data every time the extractor runs. Incremental load describes the configuration details.
- pagination is used during a single extractor run to paginate through data received from the source. The pagination mapping is applied once for each request after the first and is used to determine whether to run more queries. Pagination describes the configuration details.
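Taken together, the source fields (scheme, host, port) and the job fields (path, query) determine the URL of each request. The following is an illustrative Python sketch of how the pieces combine, not part of the extractor itself:

```python
from typing import Optional
from urllib.parse import urlencode

def build_url(scheme: str, host: str, path: str, query: dict,
              port: Optional[int] = None) -> str:
    """Assemble a request URL from the source fields (scheme, host, port)
    and the job fields (path, query)."""
    netloc = host if port is None else f"{host}:{port}"
    qs = f"?{urlencode(query)}" if query else ""
    return f"{scheme}://{netloc}{path}{qs}"

# The example from the text: host plus path plus query.
url = build_url("https", "api.cognite.com", "/v1/projects/test/assets",
                {"externalId": "test"})
print(url)  # https://api.cognite.com/v1/projects/test/assets?externalId=test
```

Leaving `port` unset corresponds to using the standard port for the chosen scheme.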
Mapping context
Like other hosted extractors, all mappings for the REST extractor are passed a `context` object. This uses the following format:
```
{
  "last_run": 12345678, // Timestamp since 01/01/1970 of when this extractor last ran.
  "response": {
    "headers": { // Headers in the last response.
      "key": "value"
    },
    "status": 201 // Status code of the last response; always 2xx.
  },
  "request": {
    "path": "/last/path", // Last request path.
    "host": "myhost.com", // Last request host.
    "headers": { // Headers in the last request.
      "key": "value"
    },
    "query": { // Query parameters in the last request.
      "key": "value"
    },
    "method": "get", // Last request method.
    "body": {} // Last request body.
  }
}
```
`response` will be empty for incremental load mappings, since there is no prior response when those are run.
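To make the two variants concrete, here is a hypothetical Python sketch of the context a mapping can receive: at startup (incremental load, no response section) and after a request (pagination). Field names mirror the format above; the values are made up:

```python
def make_context(last_run, request, response=None):
    """Build a mapping context. `response` is None at extractor startup,
    so incremental load mappings see no response section at all."""
    ctx = {"last_run": last_run, "request": request}
    if response is not None:
        ctx["response"] = response
    return ctx

startup_ctx = make_context(
    last_run=12345678,
    request={"path": "/last/path", "host": "myhost.com",
             "headers": {}, "query": {}, "method": "get", "body": {}},
)
assert "response" not in startup_ctx  # incremental load: no prior response
```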
Incremental load
Incremental load is applied at extractor startup. It may apply to query parameters, headers, or the body. The incremental load mapping is given only `context` as described in the mapping context, without anything in the `response` section.
If the incremental load is set to use `body`, the result of the mapping will set the next message body. If you only want to modify the body given in the extractor configuration, you can use a mapping of the following form:
```
{
  ...context.request.body,
  ...{
    "set-this-field": "to-value"
  }
}
```
If set to use `queryParam` or `header`, the result will overwrite only the configured query parameter or header value.
Example for the CDF Assets API, with type `body`:

```
{
  "filter": {
    ...context.request.body.filter,
    "lastUpdatedTime": {
      "min": try_int(context.last_run, 0)
    }
  }
}
```
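The intent of this mapping can be mirrored in plain Python to show what the extractor ends up requesting. This is an illustrative sketch only; the real merge is performed by the mapping engine, and `try_int` here is a rough analogue of the mapping function of the same name:

```python
def try_int(value, default):
    """Rough analogue of the mapping language's try_int: fall back to
    `default` when the value cannot be interpreted as an integer."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default

def incremental_body(configured_body, last_run):
    """Keep the configured filter, but add a lastUpdatedTime.min bound so
    only data changed since the last run is requested."""
    filt = dict(configured_body.get("filter", {}))
    filt["lastUpdatedTime"] = {"min": try_int(last_run, 0)}
    return {**configured_body, "filter": filt}

# On the very first run last_run is unset, so the bound falls back to 0
# and the full history is requested once.
print(incremental_body({"filter": {"dataSetIds": [{"id": 1}]}}, None))
```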
Pagination
Pagination is applied after each request and used to determine whether the extractor should make more requests to the source. The pagination mapping is given the `context` as described in the mapping context, and the response body, converted to JSON, in `body`.
If a pagination mapping returns the same value as the previous request or `null`, the extractor will stop paginating. This usually means you need to make sure the mapping returns `null` if a cursor or similar is `null`.
If pagination is set to use `body`, the result of the mapping will set the next message body. If you just want to modify the body given in the extractor configuration or in incremental load, you can create a mapping using the following format:
```
{
  ...context.request.body,
  ...{
    "set-this-field": "to-value"
  }
}
```
If set to use `queryParam` or `header`, the result will overwrite only the configured query parameter or header value.
Example for the CDF Assets API, with type `body`:

```
if(body.nextCursor, {
  ...context.request.body,
  "cursor": body.nextCursor
}, null)
```
This will set `cursor` in the body if `nextCursor` is set in the response, and return `null` to end pagination otherwise.
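The stop condition can be illustrated with a small Python loop. This is a sketch of the behavior described above, not the extractor's actual implementation: the loop keeps requesting as long as the mapping produces a new body, and stops when it yields `None` or repeats the previous request:

```python
def paginate(fetch, first_body):
    """Drive cursor pagination the way the text describes: apply the
    pagination step after each response and stop when it yields None
    (or the same body as the previous request)."""
    results, body, prev = [], first_body, None
    while body is not None and body != prev:
        response = fetch(body)            # one request to the source
        results.extend(response["items"])
        prev = body
        # Mirror of the mapping: keep the body and set the cursor, or stop.
        cursor = response.get("nextCursor")
        body = {**body, "cursor": cursor} if cursor else None
    return results

# Fake source returning two pages; the second has no nextCursor.
pages = iter([{"items": [1, 2], "nextCursor": "abc"}, {"items": [3]}])
print(paginate(lambda body: next(pages), {"limit": 2}))  # [1, 2, 3]
```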