Hosted extractors
Hosted extractors run inside Cognite Data Fusion (CDF) and are intended for live data streams with low latency.
A hosted extractor job reads from a source, transforms the data using a built-in format or a mapping, and writes to a destination.
To create a hosted extractor, you need the hostedextractors:READ and hostedextractors:WRITE capabilities.
You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.
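As a sketch of what these capabilities look like in practice, the snippet below builds a capability list of the kind you would attach to a CDF group and checks that it grants both actions. The ACL key name (hostedExtractorsAcl) and the payload shape are assumptions for illustration; check the API reference for the exact schema.

```python
# Hypothetical capability block granting hosted extractor access on a CDF group.
# The hostedExtractorsAcl key and scope shape are assumptions for illustration.
hosted_extractor_capabilities = [
    {"hostedExtractorsAcl": {"actions": ["READ", "WRITE"], "scope": {"all": {}}}},
]

def can_manage_hosted_extractors(capabilities) -> bool:
    """True if some ACL in the list grants both READ and WRITE on hosted extractors."""
    for cap in capabilities:
        acl = cap.get("hostedExtractorsAcl")
        if acl and {"READ", "WRITE"} <= set(acl["actions"]):
            return True
    return False

print(can_manage_hosted_extractors(hosted_extractor_capabilities))
```

A principal missing either action cannot create hosted extractors, so a check like this is a quick way to validate group definitions before rollout.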
Sources
A hosted extractor source represents an external source system on the internet. The source resource in CDF contains all the information the extractor needs to connect to the external source system.
A source can have many jobs, each streaming different data from the source system.
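To make the source/job relationship concrete, here is an illustrative source definition for an MQTT broker with two jobs streaming different data from it. The field names (externalId, type, host, port, topicFilter) are assumptions loosely modeled on the hosted extractor resources; consult the API reference for the actual schema.

```python
# Illustrative MQTT source definition; field names are assumptions, not the
# authoritative schema. The source holds everything needed to connect.
mqtt_source = {
    "externalId": "my-mqtt-broker",  # hypothetical identifier
    "type": "mqtt5",
    "host": "mqtt.example.com",      # placeholder host
    "port": 8883,
    "useTls": True,
}

# Several jobs can stream different data from the same source, each with its
# own filter. topicFilter is an assumed configuration field for illustration.
jobs = [
    {"externalId": "temperature-job", "sourceId": mqtt_source["externalId"],
     "config": {"topicFilter": "plant/+/temperature"}},
    {"externalId": "pressure-job", "sourceId": mqtt_source["externalId"],
     "config": {"topicFilter": "plant/+/pressure"}},
]

print(len(jobs), "jobs share source", mqtt_source["externalId"])
```

The key design point is that connection details live once on the source, while each job only carries the job-specific configuration.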
Jobs
A hosted extractor job represents the running extractor. Jobs produce logs and metrics that report the state of the job. A job can be in one of nine states:
State | Description |
---|---|
Paused | The job is temporarily stopped. |
Waiting to start | The job is temporarily stopped and pending start. This state typically only lasts a few seconds. |
Stopping | The job is running but has been told to stop. This state should last a few seconds at most. |
Startup error | The job failed to start and will not attempt to restart. Check the configuration settings for the job to resolve this state. |
Connection error | The job failed to connect to the source system and is currently retrying. |
Connected | The job is connected to the source system, but has not yet received any data. |
Transform error | The job is connected to the source system and has received data, but failed to transform the data into a CDF resource type. |
Destination error | The job successfully transformed data, but failed to ingest data into a CDF resource type. |
Running | The job is streaming data into CDF. |
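When monitoring jobs, the nine states group naturally by whether operator action is likely needed. The grouping below is an editorial reading of the table above, not part of the API:

```python
# Editorial grouping of the nine job states from the table above. "needs attention"
# marks states where checking configuration or the destination is warranted;
# "transient" states normally resolve within seconds.
STATES = {
    "Paused": "idle",
    "Waiting to start": "transient",
    "Stopping": "transient",
    "Startup error": "needs attention",
    "Connection error": "needs attention",
    "Connected": "healthy",
    "Transform error": "needs attention",
    "Destination error": "needs attention",
    "Running": "healthy",
}

def needs_attention(state: str) -> bool:
    return STATES[state] == "needs attention"

print(sorted(s for s in STATES if needs_attention(s)))
```

Connection error is listed under "needs attention" even though the job retries on its own, since persistent retries usually point to a source-side problem worth investigating.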
Job metrics
Jobs report metrics on their current execution. The following metrics are currently reported:
Metric | Description |
---|---|
Source messages | The number of input messages received from the source system. |
Transform failures | The number of input messages that failed to transform. |
Destination input values | The number of messages that were successfully transformed and handed to destinations for upload to CDF. |
Destination requests | The number of requests made to CDF for this job. |
Destination write failures | The number of requests to CDF that failed for this job. |
Destination skipped values | The number of values that were invalid and were skipped before ingestion into CDF. |
Destination failed values | The number of values that were not written to CDF due to failed requests. |
Destination uploaded values | The number of values that were successfully ingested into CDF. |
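These metrics combine into useful health indicators. The sketch below derives a transform failure rate and checks that every value handed to a destination is accounted for as uploaded, skipped, or failed; the sample numbers are invented, and the one-message-one-value accounting is a simplifying assumption.

```python
# Invented sample metrics following the names in the table above.
metrics = {
    "source_messages": 1000,
    "transform_failures": 20,
    "destination_input_values": 980,
    "destination_uploaded_values": 950,
    "destination_skipped_values": 20,
    "destination_failed_values": 10,
}

# Every value handed to a destination ends up uploaded, skipped, or failed
# (assuming, for simplicity, one value per transformed message).
accounted = (metrics["destination_uploaded_values"]
             + metrics["destination_skipped_values"]
             + metrics["destination_failed_values"])

# Fraction of source messages the mapping could not transform.
transform_failure_rate = metrics["transform_failures"] / metrics["source_messages"]

print(accounted, round(transform_failure_rate, 3))
```

A rising transform failure rate points at the mapping or the source payload format, while growing failed or skipped counts point at the destination side.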
Destinations
A hosted extractor writes to a destination. The destination only contains credentials for CDF.
Multiple jobs can share a single destination. In that case, they batch their requests together, reducing the number of requests made to the CDF APIs. Metrics are still reported per job.
For permanent deployments, give the destination a dedicated set of credentials. These are the extractor's credentials, so they should only grant access to the resources the extractor needs. Which capabilities are required depends on the type of data the extractor produces; depending on the configured mapping, this is one or more of the following (see Data formats for details on mappings):
Output type | Required ACLs |
---|---|
Datapoints | timeseries:READ and timeseries:WRITE. datamodelinstances:WRITE if you write to core data models. |
Timeseries | timeseries:READ and timeseries:WRITE |
Events | events:READ and events:WRITE. assets:READ if you produce events linked to assets. |
Raw | raw:WRITE |
Data Models | datamodelinstances:WRITE |
For extra security, you will typically want to scope the extractor's access to the data sets or spaces it writes to.
Granting hostedextractors:WRITE can cause privilege escalation: any user with access to create hosted extractors implicitly gains access to all resources that any configured destination has access to.
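Following the ACL table and the scoping advice above, here is a hypothetical scoped capability set for a destination that writes datapoints into a single data set. The ACL and scope key names (timeSeriesAcl, datasetScope) and the data set id 123 are assumptions to illustrate the shape; verify them against the capabilities documentation.

```python
# Hypothetical scoped capabilities for a datapoints destination: READ and WRITE
# on time series, restricted to one data set. Key names and id are placeholders.
datapoints_capabilities = [
    {"timeSeriesAcl": {"actions": ["READ", "WRITE"],
                       "scope": {"datasetScope": {"ids": [123]}}}},
]

def is_scoped(capabilities) -> bool:
    """True if no ACL in the list uses an unrestricted all-scope."""
    return all("all" not in next(iter(cap.values()))["scope"]
               for cap in capabilities)

print(is_scoped(datapoints_capabilities))
```

A check like this catches the common mistake of shipping an extractor with all-scope credentials when a narrow scope would do.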
Mapping
A mapping is a custom transformation that translates the source format into a format that can be ingested into CDF. Read more in Custom data formats for hosted extractors.
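As a conceptual illustration of what a mapping does, the plain-Python function below translates a JSON source message into a datapoint-like row a destination could ingest. This is not the hosted extractor mapping language itself (see Custom data formats for that), and the message fields and time series id are invented.

```python
import json

# A made-up source message: sensor id, value, and an epoch-millisecond timestamp.
raw_message = '{"sensor": "pump-01/temperature", "value": 73.2, "ts": 1700000000000}'

def map_to_datapoint(message: str) -> dict:
    """Conceptual stand-in for a mapping: parse a source message and reshape it
    into a datapoint-like row. Field names are illustrative assumptions."""
    payload = json.loads(message)
    return {
        "externalId": payload["sensor"],  # target time series (hypothetical id)
        "timestamp": payload["ts"],
        "value": payload["value"],
    }

print(map_to_datapoint(raw_message))
```

The real mapping runs inside the hosted extractor job, so logic like this lives in CDF rather than in code you deploy and operate yourself.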