# REST extractor

> Learn how to use the Cognite REST extractor to read data from REST APIs on a fixed schedule with custom mappings and pagination.

The REST extractor uses the Cognite mapping language to implement custom pagination and incremental load, and to handle responses. See [custom mappings](/cdf/integration/guides/extraction/hosted_extractors/hosted_extractors_custom_mappings) for details on writing your mappings.

## Before you start

* Assign [access capabilities](/cdf/access/guides/capabilities#cognite-rest-extractor) to *create* a hosted REST extractor and for the extractor to *write* data points, time series, events, RAW rows, and instances in data models in the target CDF project.

<Tip>
  You can use OpenID Connect and your existing identity provider (IdP) framework to securely manage access to CDF data. For more information, see [setup and administration for extractors](/cdf/integration/guides/extraction/admin_oidc).
</Tip>

## Deploy the extractor

<Steps>
  <Step title="Navigate to extractors">
    Navigate to <span class="ui-element">Data fusion</span> > <span class="ui-element">Integrate</span> > <span class="ui-element">Extractors</span>.
  </Step>

  <Step title="Set up the extractor">
    Find the **Cognite REST extractor** and select **Set up extractor**.
  </Step>
</Steps>

## Limitations

REST is a generic system for creating APIs over HTTP. REST imposes few rules, and the Cognite REST extractor is a *tool* for you to create a simple extractor for a REST API with some expected limitations:

* The REST extractor will only make requests to a single host. The exception is that the host specified under `authentication` may be different.

* The REST extractor currently supports sending request payloads only as valid JSON.

* The extractor stores the `last_run` field as its only state value between executions. This value indicates the most recent execution of the REST extractor in your project.

Generally, you should use the [Cognite Extractor Utils](https://cognite-extractor-utils.readthedocs-hosted.com/en/latest) library and develop a custom REST extractor for your source system when you need to make requests to multiple hosts, combine source data/information in complex ways, or read data from CDF.

If you just want to list out information from a REST API and ingest this into CDF resources, the hosted REST extractor is recommended.

## Configure the extractor

The extractor configuration consists of several parts required to extract from a REST source.

<Steps>
  <Step title="Create the source">
    * **host** is the hostname or IP address the REST extractor will connect to. Enter only the hostname, not the full URL. For example, for `https://api.cognite.com/v1/projects/test/assets`, the **host** portion is `api.cognite.com`.
    * **scheme** is the type of connection to use, either `http` or `https`. Most public APIs use `https`.
    * **port** is the port on the source server. Note that `http` and `https` typically use standard ports `80` and `443`. In most cases, you can leave the port specification out unless your source system indicates a different port.
    * **CA Certificate** lets you use a self-signed certificate by giving the public certificate of the certificate authority to the extractor.
    * **authentication** lets you choose from one of several options used to authenticate requests to the server. Unlike manually setting **headers**, using the authentication option ensures that passwords and secrets are not visible when the source is read, and that they are stored with additional encryption.
  </Step>

  <Step title="Create the job">
    * **interval** represents how often the job should run. You can pick any of the options from the fixed list of intervals.
    * **path** is the *path* to the API endpoint portion of the URL. It does not include the host information or the query parameters. For `https://api.cognite.com/v1/projects/test/assets?externalId=test`, the **path** portion is `/v1/projects/test/assets`.
    * **method** is the HTTP *method* for the request. The default is `get`. Sending a JSON body is only permitted if the method is `post`.
    * **body** is the JSON body to send with `post` requests. When this contains information, the `Content-Type` header will also be added to the request.
    * **query** is a list of key/value pairs for the initial *query* portion of the URL. For example, when using `/v1/projects/test/assets?externalId=test&name=something`, the query is `{ "externalId": "test", "name": "something" }`.
    * **headers** is the list of valid header key/value pairs representing the initial headers sent with the request.
    * **incremental load** is used to build the initial request based on *previous* runs of the extractor, so that the extractor avoids reading the same data every time it runs. The incremental load mapping is applied when the extractor starts. [Incremental load](/cdf/integration/guides/extraction/rest#incremental-load) describes the configuration details.
    * **pagination** is used during a single extractor run to paginate through data received from the source. [Pagination](/cdf/integration/guides/extraction/rest#pagination) describes the configuration details. The pagination mapping is applied once for each request after the first and is used to determine whether to run more queries.
  </Step>
</Steps>
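
As a rough illustration of how a full URL divides into the source and job fields above, the following Python sketch splits a URL with the standard library. The `split_url` helper is hypothetical and is not part of the extractor; it only shows which portion of the URL belongs in which field.

```python
from urllib.parse import parse_qsl, urlsplit


def split_url(url: str) -> dict:
    """Split a full URL into the source and job fields described above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,                 # source: http or https
        "host": parts.hostname,                 # source: hostname only
        "port": parts.port,                     # source: None when the URL has no explicit port
        "path": parts.path,                     # job: no host, no query parameters
        "query": dict(parse_qsl(parts.query)),  # job: key/value pairs
    }


fields = split_url(
    "https://api.cognite.com/v1/projects/test/assets?externalId=test&name=something"
)
print(fields)
```

Here `port` comes back as `None`, which corresponds to leaving the port out of the source configuration and relying on the standard port for the scheme.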

## Mapping context

As with other hosted extractors, every mapping in the REST extractor is passed a `context` object. It uses the following format:

```json theme={"languages":{"custom":["/_languages/kuiper.json","../_languages/kuiper.json"]}}
{
    "last_run": 12345678, // Timestamp of the extractor's last run, counted from 01/01/1970.
    "response": {
        "headers": { // List of headers in the last response.
            "key": "value"
        },
        "status": 201 // Last response status code, will always be 2xx.
    },
    "request": {
        "path": "/last/path", // Last request path.
        "host": "myhost.com", // Last request host.
        "headers": { // List of headers in last request.
            "key": "value"
        },
        "query": { // List of query parameters in last request.
            "key": "value"
        },
        "method": "get", // Last request method.
        "body": {} // Last request body
    }
}
```

<Note>
  `response` will be empty for `incremental load` mappings, since there is no prior response when those are run.
</Note>
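
For readers who prefer typed structures, the shape of `context` can be sketched in Python as follows. This is only a reading aid, not an API of the extractor: the class names and the `has_response` helper are hypothetical, and modelling an "empty" `response` as an empty dictionary is an assumption.

```python
from typing import Any, Dict, TypedDict


class Response(TypedDict, total=False):
    headers: Dict[str, str]  # headers in the last response
    status: int              # last response status code


class Request(TypedDict):
    path: str
    host: str
    headers: Dict[str, str]
    query: Dict[str, str]
    method: str
    body: Any


class Context(TypedDict):
    last_run: int
    response: Response
    request: Request


def has_response(context: Context) -> bool:
    """True when a prior response is present (pagination), False otherwise."""
    # Assumption: an "empty" response is represented as an empty dict.
    return bool(context["response"])


incremental_context: Context = {
    "last_run": 12345678,
    "response": {},
    "request": {
        "path": "/last/path",
        "host": "myhost.com",
        "headers": {},
        "query": {},
        "method": "get",
        "body": {},
    },
}
print(has_response(incremental_context))
```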

## Incremental load

Incremental load is applied at extractor *startup*. It may apply to query parameters, headers, or the body. The incremental load mapping is given only `context`, as described [in the mapping context](#mapping-context), without anything in the `response` section.

If the incremental load is set to use `body`, the result of the mapping will *set* the next message body. If you only want to modify the body given in the extractor configuration, you can use a mapping of the following form:

```
{
    ...context.request.body,
    ...{
        "set-this-field": "to-value"
    }
}
```

If set to use `queryParam` or `header`, the result will overwrite only the configured query parameter or header value.

For example, for the CDF Assets API:

`type: body`

```
{
    "filter": {
        ...context.request.body.filter,
        "lastUpdatedTime": {
            "min": try_int(context.last_run, 0)
        }
    }
}
```
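
To make the effect of this mapping concrete, here is a minimal Python emulation of it. It is a sketch under stated assumptions, not the extractor's implementation: the `try_int` and `incremental_body` functions are hypothetical stand-ins, and `try_int` is assumed to fall back to the default whenever the value cannot be converted to an integer.

```python
def try_int(value, default):
    """Stand-in for the mapping's try_int: convert to int or use the default."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return default


def incremental_body(context: dict) -> dict:
    """Emulate the incremental load mapping above for `type: body`."""
    configured = context["request"].get("body") or {}
    return {
        "filter": {
            # Keep any filter fields from the configured body ...
            **(configured.get("filter") or {}),
            # ... and add a lower bound taken from the previous run.
            "lastUpdatedTime": {"min": try_int(context.get("last_run"), 0)},
        }
    }


context = {
    "last_run": 1700000000000,
    "request": {"body": {"filter": {"name": "pump"}}},
}
print(incremental_body(context))
```

Because the result *sets* the whole body, this mapping as written returns only the `filter` key. Any other top-level fields in the configured body would need to be spread in explicitly with `...context.request.body`.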

## Pagination

Pagination is applied after each request and is used to determine whether the extractor should make more requests to the source. The pagination mapping is given the `context`, as described [in the mapping context](#mapping-context), and the response body, parsed as JSON, in `body`.

If the pagination mapping returns `null`, or the *same* value as for the previous request, the extractor will stop paginating. In practice, this means the mapping should return `null` whenever a cursor or similar value in the response is `null`.

If pagination is set to use `body`, the result of the mapping will *set* the next message body. If you want to just modify the body given in extractor configuration or in `incremental load`, you can create a mapping using the following format:

```
{
    ...context.request.body,
    ...{
        "set-this-field": "to-value"
    }
}
```

If set to use `queryParam` or `header`, the result will overwrite only the configured query parameter or header value.

For example, for the CDF Assets API:

`type: body`

```
if(body.nextCursor, {
    ...context.request.body,
    "cursor": body.nextCursor
}, null)
```

This will set `cursor` in the body if `nextCursor` is set in the response and return `null` to end pagination otherwise.
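
The stop conditions described in this section can be illustrated with a small Python simulation. This is a sketch, not the extractor's code: `next_body`, `paginate`, `fake_fetch`, and the in-memory `PAGES` source are all hypothetical, the `items` field is an assumed response shape, and comparing the new body with the body just sent is one interpretation of "the same value as the previous request".

```python
def next_body(context: dict, body: dict):
    """Python stand-in for the pagination mapping above."""
    cursor = body.get("nextCursor")
    if cursor:
        return {**context["request"]["body"], "cursor": cursor}
    return None  # no cursor: end pagination


def paginate(fetch, first_body: dict) -> list:
    """Call fetch(body) until the mapping returns None or repeats itself."""
    items = []
    request_body = first_body
    while True:
        response = fetch(request_body)
        items.extend(response.get("items", []))
        context = {"request": {"body": request_body}}
        candidate = next_body(context, response)
        if candidate is None or candidate == request_body:
            break
        request_body = candidate
    return items


# Hypothetical in-memory source with three pages, keyed by cursor.
PAGES = {
    None: {"items": [1, 2], "nextCursor": "a"},
    "a": {"items": [3, 4], "nextCursor": "b"},
    "b": {"items": [5]},
}


def fake_fetch(body: dict) -> dict:
    return PAGES[body.get("cursor")]


result = paginate(fake_fetch, {"limit": 2})
print(result)
```

The check for an unchanged body guards against a source that keeps returning the same cursor, which would otherwise cause an endless loop.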
