
Hosted extractors

Hosted extractors run inside Cognite Data Fusion (CDF) and are intended for live data streams with low latency.

A hosted extractor job reads from a source, transforms the data using a built-in format or a mapping, and writes to a destination.
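The read, transform, write flow described above can be pictured with a small in-memory sketch. Everything here is hypothetical and for illustration only: `Destination`, `run_job`, `parse_line`, and the comma-separated message format are assumptions, not part of any CDF API or SDK.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Destination:
    """Hypothetical stand-in for a destination; keeps written values in memory."""
    written: list = field(default_factory=list)

    def write(self, values: Iterable[dict]) -> None:
        self.written.extend(values)


def run_job(messages: Iterable[str],
            transform: Callable[[str], list],
            destination: Destination) -> None:
    """Read each source message, transform it, and write the result."""
    for message in messages:
        destination.write(transform(message))


def parse_line(message: str) -> list:
    # Assumed message format: "<externalId>,<timestamp in ms>,<value>"
    external_id, timestamp, value = message.split(",")
    return [{"externalId": external_id,
             "timestamp": int(timestamp),
             "value": float(value)}]


dest = Destination()
run_job(["pump-1,1700000000000,3.5"], parse_line, dest)
print(dest.written)
```

In a real hosted extractor, CDF runs this loop for you; you only configure the source, the format or mapping, and the destination.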

To create a hosted extractor, you need the hostedextractors:READ and hostedextractors:WRITE capabilities.
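As a quick illustration of the requirement above, the following sketch checks a list of granted capabilities against the two required ones. The function name `missing_capabilities` and the plain-string representation of capabilities are assumptions made for this example; they do not reflect how CDF stores or returns capabilities.

```python
# The two capabilities named in this section, written as plain strings.
REQUIRED_CAPABILITIES = {"hostedextractors:READ", "hostedextractors:WRITE"}


def missing_capabilities(granted):
    """Return the required capabilities that are absent from `granted`."""
    return REQUIRED_CAPABILITIES - set(granted)


# Hypothetical user that can read but not write hosted extractors.
print(missing_capabilities(["hostedextractors:READ"]))
```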

Tip: You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.

Sources

A hosted extractor source represents an external source system on the internet. The source resource in CDF contains all the information the extractor needs to connect to the external source system.

A source can have many jobs, each streaming different data from the source system.

Jobs

A hosted extractor job represents the running extractor. Jobs produce logs and metrics that describe the state of the job. A job can be in one of nine states:

| State | Description |
| --- | --- |
| Paused | The job is temporarily stopped. |
| Waiting to start | The job is temporarily stopped and pending start. This state typically only lasts a few seconds. |
| Stopping | The job is running, but is supposed to stop. This state should only last for a few seconds at most. |
| Startup error | The job failed to start and will not attempt to restart. Check the configuration settings for the job to resolve this state. |
| Connection error | The job failed to connect to the source system and is currently retrying. |
| Connected | The job is connected to the source system, but has not yet received any data. |
| Transform error | The job is connected to the source system, received data but failed to transform and ingest the data into a CDF resource type. |
| Destination error | The job successfully transformed data, but failed to ingest data into a CDF resource type. |
| Running | The job is streaming data into CDF. |
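If you monitor jobs from your own tooling, it can help to model the nine states explicitly. The sketch below is one possible way to do that. The enum values mirror the labels in the table above and are not necessarily the identifiers an API would return; `JobState`, `is_error`, and `needs_config_fix` are hypothetical names.

```python
from enum import Enum


class JobState(Enum):
    # Values are the human-readable labels from the state table.
    PAUSED = "Paused"
    WAITING_TO_START = "Waiting to start"
    STOPPING = "Stopping"
    STARTUP_ERROR = "Startup error"
    CONNECTION_ERROR = "Connection error"
    CONNECTED = "Connected"
    TRANSFORM_ERROR = "Transform error"
    DESTINATION_ERROR = "Destination error"
    RUNNING = "Running"


ERROR_STATES = frozenset({
    JobState.STARTUP_ERROR,
    JobState.CONNECTION_ERROR,
    JobState.TRANSFORM_ERROR,
    JobState.DESTINATION_ERROR,
})


def is_error(state: JobState) -> bool:
    """True for any of the four error states."""
    return state in ERROR_STATES


def needs_config_fix(state: JobState) -> bool:
    """Startup errors are not retried, so the job configuration must be checked."""
    return state is JobState.STARTUP_ERROR


print(is_error(JobState("Connection error")), needs_config_fix(JobState.CONNECTION_ERROR))
```

Separating `needs_config_fix` from `is_error` reflects the table: a connection error is retried automatically, while a startup error is not.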

Job metrics

Jobs report metrics on their current execution. The following metrics are currently reported:

| Metric | Description |
| --- | --- |
| Source messages | The number of input messages received from the source system. |
| Transform failures | The number of input messages that failed to transform. |
| Destination input values | The number of messages that successfully transformed and were given to destinations for uploading to CDF. |
| Destination requests | The number of requests made to CDF for this job. |
| Destination write failures | The number of requests to CDF that failed for this job. |
| Destination skipped values | The number of values that were invalid and were skipped before ingestion into CDF. |
| Destination failed values | The number of values that were not written to CDF due to failed requests. |
| Destination uploaded values | The number of values that were successfully ingested into CDF. |
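The raw counters are easier to interpret as ratios. The following sketch derives two of them: the share of source messages that failed to transform, and the share of requests to CDF that failed. The dictionary keys (`source_messages`, `transform_failures`, `destination_requests`, `destination_write_failures`) are hypothetical names chosen for this example, and the sample numbers are made up.

```python
def _ratio(numerator: int, denominator: int) -> float:
    """Divide safely; an idle job with no traffic yields 0.0."""
    return numerator / denominator if denominator else 0.0


def failure_rates(metrics: dict) -> dict:
    """Compute failure ratios from a dictionary of job counters."""
    return {
        "transform_failure_rate": _ratio(
            metrics.get("transform_failures", 0),
            metrics.get("source_messages", 0),
        ),
        "request_failure_rate": _ratio(
            metrics.get("destination_write_failures", 0),
            metrics.get("destination_requests", 0),
        ),
    }


# Made-up sample counters for illustration.
sample = {
    "source_messages": 200,
    "transform_failures": 10,
    "destination_requests": 40,
    "destination_write_failures": 4,
}
print(failure_rates(sample))
```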

Destinations

A hosted extractor writes to a destination. The destination contains only the credentials for CDF.

Multiple jobs can share a single destination, in which case they will make requests together, reducing the number of requests made to CDF APIs. Metrics will still be reported individually.
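To make the effect of sharing concrete, here is a toy model of a destination that batches values from several jobs into one request while still counting uploads per job. It is a sketch under stated assumptions, not a description of how CDF implements batching: `SharedDestination`, `submit`, and `flush` are hypothetical names, and a "request" here is only a counter increment.

```python
from collections import defaultdict


class SharedDestination:
    """Toy destination shared by several jobs."""

    def __init__(self) -> None:
        self.pending = []                          # (job_id, value) pairs awaiting upload
        self.requests_made = 0                     # requests shared across all jobs
        self.uploaded_per_job = defaultdict(int)   # metrics stay per job

    def submit(self, job_id: str, values: list) -> None:
        """Queue values from one job for the next combined request."""
        self.pending.extend((job_id, value) for value in values)

    def flush(self) -> None:
        """Send everything queued as a single request."""
        if not self.pending:
            return
        self.requests_made += 1
        for job_id, _value in self.pending:
            self.uploaded_per_job[job_id] += 1
        self.pending.clear()


dest = SharedDestination()
dest.submit("job-a", [1.0, 2.0])
dest.submit("job-b", [3.0])
dest.flush()
print(dest.requests_made, dict(dest.uploaded_per_job))
```

With separate destinations, the same traffic would take at least two requests, one per job.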

Mapping

A mapping is a custom transformation that translates the source format into a format that can be ingested into CDF. Read more in Custom data formats for hosted extractors.
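Conceptually, a mapping is a function from one source message to zero or more items that CDF can ingest. The Python sketch below shows that idea only. It is not the mapping language used by hosted extractors, and both the input JSON shape (`device`, `ts`, `readings`) and the output fields are assumptions made for this example.

```python
import json


def map_message(raw: str) -> list:
    """Turn one JSON source message into a list of datapoint-like dicts."""
    payload = json.loads(raw)
    return [
        {
            # Hypothetical naming scheme: "<device>:<reading name>"
            "externalId": f"{payload['device']}:{name}",
            "timestamp": payload["ts"],
            "value": value,
        }
        for name, value in payload["readings"].items()
    ]


message = '{"device": "pump-1", "ts": 1700000000000, "readings": {"temp": 21.5, "rpm": 900}}'
for item in map_message(message):
    print(item)
```

One input message produces two output items here, which is the typical reason a custom mapping is needed: the source format rarely matches the destination one-to-one.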