Hosted extractors

Hosted extractors run inside Cognite Data Fusion (CDF) and are intended for live, low-latency data streams.

A hosted extractor job reads from a source, transforms the data using a built-in format or a mapping, and writes to a destination.

To create a hosted extractor, you need the hostedextractors:READ and hostedextractors:WRITE capabilities.

Tip

You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely. Read more.

Sources

A hosted extractor source represents an external source system on the internet. The source resource in CDF contains all the information the extractor needs to connect to the external source system.

A source can have many jobs, each streaming different data from the source system.

Jobs

A hosted extractor job represents the running extractor. Jobs produce logs and metrics that describe the state of the job. A job can be in one of nine states:

State	Description
Paused	The job is temporarily stopped.
Waiting to start	The job is temporarily stopped and pending start. This state typically only lasts a few seconds.
Stopping	The job is running, but is supposed to stop. This state should only last for a few seconds at most.
Startup error	The job failed to start and will not attempt to restart. Check the configuration settings for the job to resolve this state.
Connection error	The job failed to connect to the source system and is currently retrying.
Connected	The job is connected to the source system, but has not yet received any data.
Transform error	The job is connected to the source system and received data, but failed to transform and ingest the data into a CDF resource type.
Destination error	The job successfully transformed data, but failed to ingest data into a CDF resource type.
Running	The job is streaming data into CDF.
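
The state table can be turned into a small triage helper for monitoring scripts. The following is a minimal sketch, not part of any Cognite SDK: the `JobState` enum, the `triage` function, and the four action labels are hypothetical names chosen for illustration, and the policy (for example, treating Paused as needing no action) is an assumption layered on top of the descriptions in the table.

```python
from enum import Enum


class JobState(Enum):
    """The nine job states listed in the table, keyed by their display label."""
    PAUSED = "Paused"
    WAITING_TO_START = "Waiting to start"
    STOPPING = "Stopping"
    STARTUP_ERROR = "Startup error"
    CONNECTION_ERROR = "Connection error"
    CONNECTED = "Connected"
    TRANSFORM_ERROR = "Transform error"
    DESTINATION_ERROR = "Destination error"
    RUNNING = "Running"


# States the table names as errors.
ERROR_STATES = {
    JobState.STARTUP_ERROR,
    JobState.CONNECTION_ERROR,
    JobState.TRANSFORM_ERROR,
    JobState.DESTINATION_ERROR,
}

# States the table describes as lasting only a few seconds.
TRANSIENT_STATES = {JobState.WAITING_TO_START, JobState.STOPPING}


def triage(state_label: str) -> str:
    """Suggest an action for a state label (hypothetical policy)."""
    state = JobState(state_label)  # raises ValueError for unknown labels
    if state is JobState.STARTUP_ERROR:
        # The only state the table says will not retry on its own.
        return "fix configuration"
    if state in ERROR_STATES:
        return "investigate"
    if state in TRANSIENT_STATES:
        return "wait"
    return "no action"
```

For example, `triage("Startup error")` suggests fixing the job configuration, while `triage("Stopping")` suggests waiting, since that state should pass within seconds.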

Job metrics

Jobs report metrics on their current execution. The following metrics are currently reported:

Metric	Description
Source messages	The number of input messages received from the source system.
Transform failures	The number of input messages that failed to transform.
Destination input values	The number of messages that successfully transformed and were given to destinations for uploading to CDF.
Destination requests	The number of requests made to CDF for this job.
Destination write failures	The number of requests to CDF that failed for this job.
Destination skipped values	The number of values that were invalid and were skipped before ingestion into CDF.
Destination failed values	The number of values that were not written to CDF due to failed requests.
Destination uploaded values	The number of values that were successfully ingested into CDF.
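
These counters are easier to act on as ratios. The sketch below assumes the metrics have already been fetched into a plain dictionary. The snake_case key names and the `summarize` function are hypothetical, and the choice of denominators (for example, treating uploaded, skipped, and failed values as the full set of destination outcomes) is an assumption, not something the metric definitions guarantee.

```python
def summarize(metrics: dict) -> dict:
    """Derive failure and success ratios from raw job counters.

    Key names are illustrative; map them to however you retrieve the metrics.
    """

    def ratio(numerator: int, denominator: int) -> float:
        # Avoid division by zero for jobs that have not processed anything yet.
        return numerator / denominator if denominator else 0.0

    uploaded = metrics["destination_uploaded_values"]
    skipped = metrics["destination_skipped_values"]
    failed = metrics["destination_failed_values"]

    return {
        # Share of source messages that could not be transformed.
        "transform_failure_rate": ratio(
            metrics["transform_failures"], metrics["source_messages"]
        ),
        # Share of requests to CDF that failed.
        "request_failure_rate": ratio(
            metrics["destination_write_failures"], metrics["destination_requests"]
        ),
        # Share of values that ended up in CDF (assumed denominator).
        "upload_success_rate": ratio(uploaded, uploaded + skipped + failed),
    }


sample = {
    "source_messages": 1000,
    "transform_failures": 50,
    "destination_requests": 20,
    "destination_write_failures": 1,
    "destination_uploaded_values": 900,
    "destination_skipped_values": 30,
    "destination_failed_values": 70,
}
summary = summarize(sample)
```

A rising transform failure rate usually points at the mapping or the source format, while a rising request failure rate points at the destination or its credentials.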

Destinations

A hosted extractor writes to a destination. The destination only contains credentials for CDF.

Multiple jobs can share a single destination, in which case they will make requests together, reducing the number of requests made to the Cognite APIs. Metrics will still be reported individually.

For permanent deployments, give the destination a dedicated set of credentials. These are the credentials of the extractor, so they should only grant access to the resources that the extractor needs. The required credentials depend on the type of data the extractor produces, which is determined by the configured mapping. The Credentials section lists the ACLs required for each output type. See Data formats for details on mappings.

Credentials

The CDF user creating an instance of a hosted extractor needs both the hostedextractors:READ and hostedextractors:WRITE capabilities.

Output type	Required ACLs
Datapoints	timeseries:READ and timeseries:WRITE. datamodelinstances:WRITE if you write to a data model extended from the `CogniteTimeSeries` data model type.
Timeseries	timeseries:READ and timeseries:WRITE. datamodelinstances:WRITE if you write to a data model extended from the `CogniteTimeSeries` data model type.
Events	events:READ and events:WRITE. assets:READ if you produce events linked to assets.
Raw	raw:WRITE
Data Models	datamodelinstances:WRITE
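
When a mapping produces several output types, the destination needs the union of the ACLs in the table. The following sketch encodes the table as a lookup. The `required_acls` function and its two flag parameters are hypothetical names used for illustration only.

```python
# ACLs per output type, copied from the table above.
REQUIRED_ACLS = {
    "Datapoints": {"timeseries:READ", "timeseries:WRITE"},
    "Timeseries": {"timeseries:READ", "timeseries:WRITE"},
    "Events": {"events:READ", "events:WRITE"},
    "Raw": {"raw:WRITE"},
    "Data Models": {"datamodelinstances:WRITE"},
}


def required_acls(
    output_types,
    *,
    links_events_to_assets: bool = False,
    writes_cognite_timeseries_model: bool = False,
) -> set:
    """Return the union of ACLs needed for the given output types."""
    types = set(output_types)
    acls = set()
    for output_type in types:
        acls |= REQUIRED_ACLS[output_type]  # KeyError for unknown types

    # Conditional ACLs described in the table's notes.
    if links_events_to_assets and "Events" in types:
        acls.add("assets:READ")
    if writes_cognite_timeseries_model and types & {"Datapoints", "Timeseries"}:
        acls.add("datamodelinstances:WRITE")
    return acls
```

For instance, a job that writes datapoints and asset-linked events needs `timeseries:READ`, `timeseries:WRITE`, `events:READ`, `events:WRITE`, and `assets:READ`.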

For extra security, you can scope the access of the extractor to the data sets or spaces it writes to.

Warning

Granting hostedextractors:WRITE to a user may cause a privilege escalation. Any user who can create hosted extractors also gains access to all the resources that any configured destination can access.

Mapping

A mapping is a custom transformation that translates the source format into a format that can be ingested into CDF. Read more in Custom data formats for hosted extractors.
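
To illustrate the idea only, the sketch below expresses a mapping as a Python function. Real mappings are written in the format described in Custom data formats for hosted extractors, not in Python. The input fields (`sensor`, `ts`, `temp`), the shape of the output record, and the `map_message` function are all assumptions made for this example.

```python
from typing import Any, Optional


def map_message(message: dict) -> Optional[dict]:
    """Translate one hypothetical source message into a datapoint-like record.

    Returns None when the message cannot be transformed, which corresponds
    conceptually to a transform failure.
    """
    try:
        return {
            "type": "datapoint",                   # assumed output type label
            "externalId": str(message["sensor"]),  # which time series to write to
            "timestamp": int(message["ts"]),       # milliseconds since epoch (assumed)
            "value": float(message["temp"]),       # coerce numeric strings
        }
    except (KeyError, TypeError, ValueError):
        return None


record = map_message({"sensor": "pump-1", "ts": 1700000000000, "temp": "21.5"})
```

The point of the sketch is the shape of the work: pick fields out of the source payload, coerce them to the types CDF expects, and reject messages that do not fit.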
