
Hosted extractors

Hosted extractors run inside Cognite Data Fusion (CDF) and are intended for live data streams with low latency.

A hosted extractor job reads from a source, transforms the data using a built-in format or a mapping, and writes to a destination.

To create a hosted extractor, you need the hostedextractors:READ and hostedextractors:WRITE capabilities.

Tip

You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely. Read more.

Sources

A hosted extractor source represents an external source system on the internet. The source resource in CDF contains all the information the extractor needs to connect to the external source system.

A source can have many jobs, each streaming different data from the source system.

Jobs

A hosted extractor job represents the running extractor. Jobs produce logs and metrics that describe the state of the job. A job can be in one of nine states:

State | Description
Paused | The job is temporarily stopped.
Waiting to start | The job is temporarily stopped and pending start. This state typically lasts only a few seconds.
Stopping | The job is still running, but has been instructed to stop. This state should last a few seconds at most.
Startup error | The job failed to start and will not attempt to restart. Check the configuration settings for the job to resolve this state.
Connection error | The job failed to connect to the source system and is currently retrying.
Connected | The job is connected to the source system, but has not yet received any data.
Transform error | The job is connected to the source system and has received data, but failed to transform the data into a CDF resource type for ingestion.
Destination error | The job successfully transformed data, but failed to ingest the data into a CDF resource type.
Running | The job is streaming data into CDF.
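When triaging a job, it can help to map each error state to the stage of the source, transform, destination pipeline that failed. The following is a minimal sketch that encodes the table above; the enum, the stage labels, and the `failing_stage` helper are hypothetical names for illustration and are not part of any Cognite SDK.

```python
from enum import Enum
from typing import Optional


class JobState(Enum):
    # Hypothetical Python names for the nine job states in the table above.
    PAUSED = "Paused"
    WAITING_TO_START = "Waiting to start"
    STOPPING = "Stopping"
    STARTUP_ERROR = "Startup error"
    CONNECTION_ERROR = "Connection error"
    CONNECTED = "Connected"
    TRANSFORM_ERROR = "Transform error"
    DESTINATION_ERROR = "Destination error"
    RUNNING = "Running"


# States the table describes as lasting a few seconds at most.
TRANSIENT_STATES = {JobState.WAITING_TO_START, JobState.STOPPING}

# Hypothetical stage labels derived from the state descriptions.
_FAILING_STAGE = {
    JobState.STARTUP_ERROR: "configuration",   # no automatic restart; fix the job settings
    JobState.CONNECTION_ERROR: "source",        # the job keeps retrying the connection
    JobState.TRANSFORM_ERROR: "transform",      # data arrived but could not be transformed
    JobState.DESTINATION_ERROR: "destination",  # transformed, but ingestion into CDF failed
}


def failing_stage(state: JobState) -> Optional[str]:
    """Return the pipeline stage to investigate, or None for non-error states."""
    return _FAILING_STAGE.get(state)
```

For example, `failing_stage(JobState.TRANSFORM_ERROR)` returns `"transform"`, which points you at the mapping rather than at the source connection or the destination credentials.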

Job metrics

Jobs report metrics on their current execution. The following metrics are currently reported:

Metric | Description
Source messages | The number of input messages received from the source system.
Transform failures | The number of input messages that failed to transform.
Destination input values | The number of messages that were successfully transformed and passed to destinations for upload to CDF.
Destination requests | The number of requests made to CDF for this job.
Destination write failures | The number of requests to CDF that failed for this job.
Destination skipped values | The number of values that were invalid and were skipped before ingestion into CDF.
Destination failed values | The number of values that were not written to CDF due to failed requests.
Destination uploaded values | The number of values that were successfully ingested into CDF.
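The raw counters are most useful when compared with each other. The sketch below groups the eight metrics in a dataclass and derives a few ratios; the field names and the derived ratios are illustrative assumptions, not metrics that CDF reports.

```python
from dataclasses import dataclass


@dataclass
class JobMetrics:
    # Hypothetical field names mirroring the metrics in the table above.
    source_messages: int
    transform_failures: int
    destination_input_values: int
    destination_requests: int
    destination_write_failures: int
    destination_skipped_values: int
    destination_failed_values: int
    destination_uploaded_values: int

    def transform_failure_rate(self) -> float:
        """Fraction of source messages that failed to transform."""
        if self.source_messages == 0:
            return 0.0
        return self.transform_failures / self.source_messages

    def request_failure_rate(self) -> float:
        """Fraction of requests to CDF that failed."""
        if self.destination_requests == 0:
            return 0.0
        return self.destination_write_failures / self.destination_requests

    def values_not_uploaded(self) -> int:
        """Values that reached the destination but never landed in CDF."""
        return self.destination_skipped_values + self.destination_failed_values
```

A rising transform failure rate suggests a problem with the mapping, while a rising request failure rate points at the destination, for example missing ACLs.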

Destinations

A hosted extractor writes to a destination. The destination only contains credentials for CDF.

Multiple jobs can share a single destination. In that case, their requests are combined, which reduces the number of requests made to CDF APIs. Metrics are still reported individually for each job.
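To illustrate how sharing a destination can reduce request counts while keeping per-job metrics, here is a toy model. The `SharedDestination` class, its `submit`/`flush` methods, the `send` callable, and the batch size are all hypothetical; this is not how the hosted extractor service is implemented, and the sketch assumes every request succeeds.

```python
from collections import defaultdict


class SharedDestination:
    """Toy model: several jobs hand values to one destination, which sends
    them to CDF in combined requests while counting uploads per job."""

    def __init__(self, send, max_batch: int = 1000):
        self._send = send                 # hypothetical callable: one request to CDF
        self._max_batch = max_batch       # assumed upper bound on values per request
        self._pending = []                # queued (job_id, value) pairs
        self.requests = 0                 # total requests made, shared by all jobs
        self.uploaded = defaultdict(int)  # uploaded value count per job

    def submit(self, job_id: str, values) -> None:
        """Queue values from one job; flush once a full batch is available."""
        self._pending.extend((job_id, v) for v in values)
        if len(self._pending) >= self._max_batch:
            self.flush()

    def flush(self) -> None:
        """Send all queued values, mixing values from different jobs per request."""
        while self._pending:
            batch = self._pending[: self._max_batch]
            self._pending = self._pending[self._max_batch :]
            self._send([v for _, v in batch])
            self.requests += 1
            for job_id, _ in batch:
                self.uploaded[job_id] += 1
```

With two jobs submitting three and two values and a batch size of 10, a single `flush()` makes one request, while `uploaded` still records three values for the first job and two for the second.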

For permanent deployments, give the destination a dedicated set of credentials. These are the credentials of the extractor, so they should only grant access to the resources the extractor needs. The required ACLs depend on the type of data the extractor produces. Depending on the configured mapping, this is one or more of the following types (see Data formats for details on mappings):

Output type | Required ACLs
Datapoints | timeseries:READ and timeseries:WRITE. datamodelinstances:WRITE if you write to core data models.
Timeseries | timeseries:READ and timeseries:WRITE
Events | events:READ and events:WRITE. assets:READ if you produce events linked to assets.
Raw | raw:WRITE
Data Models | datamodelinstances:WRITE
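When a mapping produces several output types, the destination credentials need the union of the ACLs in the table. The following sketch computes that union; the output type keys and the two boolean flags are hypothetical names, and the ACL strings are copied as written in the table, not as API capability objects.

```python
# Base ACLs per output type, copied from the table above.
# Output type keys are hypothetical labels.
BASE_ACLS = {
    "datapoints": {"timeseries:READ", "timeseries:WRITE"},
    "timeseries": {"timeseries:READ", "timeseries:WRITE"},
    "events": {"events:READ", "events:WRITE"},
    "raw": {"raw:WRITE"},
    "data_models": {"datamodelinstances:WRITE"},
}


def required_acls(output_types, writes_core_data_models=False,
                  events_linked_to_assets=False):
    """Return the set of ACLs the destination credentials need."""
    acls = set()
    for output in output_types:
        acls |= BASE_ACLS[output]
    # Conditional ACLs from the table.
    if "datapoints" in output_types and writes_core_data_models:
        acls.add("datamodelinstances:WRITE")
    if "events" in output_types and events_linked_to_assets:
        acls.add("assets:READ")
    return acls
```

For example, a mapping that produces datapoints and asset-linked events would need `timeseries:READ`, `timeseries:WRITE`, `events:READ`, `events:WRITE`, and `assets:READ`.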

For extra security, you will typically want to scope the extractor's access to the data sets and spaces it writes to.

Warning

Granting hostedextractors:WRITE may cause a privilege escalation. Any user that has access to create hosted extractors implicitly has access to all the resources any configured destinations have access to.

Mapping

A mapping is a custom transformation that translates the source format into a format that can be ingested into CDF. Read more in Custom data formats for hosted extractors.
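Mappings are written in the mapping language described in Custom data formats for hosted extractors, not in Python. The Python function below only illustrates the idea of a mapping: it takes one source message and emits records shaped for CDF. The input message shape and the output keys (`type`, `externalId`, `timestamp`, `value`) are assumptions for illustration.

```python
import json


def map_message(payload: bytes) -> list:
    """Hypothetical mapping: turn a JSON message such as
    {"sensor": "pump-1", "readings": [{"t": 1700000000000, "v": 3.2}]}
    into one datapoint-like record per reading."""
    message = json.loads(payload)
    return [
        {
            "type": "datapoint",
            "externalId": message["sensor"],  # assumed: sensor id names the time series
            "timestamp": reading["t"],        # assumed: epoch milliseconds
            "value": reading["v"],
        }
        for reading in message["readings"]
    ]
```

If a message does not match the expected shape, this function raises an exception; in a hosted extractor, a comparable failure would show up in the Transform failures metric.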