Hosted extractors
Hosted extractors run inside Cognite Data Fusion (CDF) and are intended for live data streams with low latency.
A hosted extractor job reads from a source, transforms the data using a built-in format or a mapping, and writes to a destination.
To create a hosted extractor, you need the hostedextractors:READ and hostedextractors:WRITE capabilities.
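For illustration, here is a minimal sketch of granting these capabilities through a CDF group using plain HTTP calls in Python. The ACL name (`hostedExtractorsAcl`), the scope shape, and the cluster, project, and token values are assumptions; check the groups API reference for the exact schema.

```python
import os
import requests

# Hypothetical values; substitute your own cluster, project, and token.
CLUSTER = "api.cognitedata.com"
PROJECT = "my-project"
TOKEN = os.environ["CDF_TOKEN"]

# Sketch of a group granting the hosted extractor capabilities.
# The ACL name and scope shape are assumptions; verify against the
# CDF groups API reference.
group = {
    "name": "hosted-extractor-operators",
    "capabilities": [
        {"hostedExtractorsAcl": {"actions": ["READ", "WRITE"], "scope": {"all": {}}}},
    ],
}

resp = requests.post(
    f"https://{CLUSTER}/api/v1/projects/{PROJECT}/groups",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"items": [group]},
)
resp.raise_for_status()
```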
You can use OpenID Connect and your existing identity provider (IdP) framework to manage access to CDF data securely.
Sources
A hosted extractor source represents an external source system on the internet. The source resource in CDF contains all the information the extractor needs to connect to the external source system.
A source can have many jobs, each streaming different data from the source system.
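As an illustration, the following sketch registers an MQTT source with the hosted extractors API. The endpoint path, the source type, and the field names (`host`, `port`, `authentication`, and so on) are assumptions; consult the API reference for the exact schema of each source type.

```python
import os
import requests

CLUSTER = "api.cognitedata.com"   # hypothetical cluster
PROJECT = "my-project"            # hypothetical project
TOKEN = os.environ["CDF_TOKEN"]

# Sketch of creating an MQTT source. Field names below are assumptions;
# the connection details for your broker will differ.
source = {
    "externalId": "plant-mqtt-broker",
    "type": "mqtt5",
    "host": "mqtt.example.com",
    "port": 8883,
    "useTls": True,
    "authentication": {
        "username": "extractor",
        "password": os.environ["MQTT_PASSWORD"],
    },
}

resp = requests.post(
    f"https://{CLUSTER}/api/v1/projects/{PROJECT}/hostedextractors/sources",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"items": [source]},
)
resp.raise_for_status()
```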
Jobs
A hosted extractor job represents the running extractor. Jobs produce logs and metrics that report the state of the job. A job can be in one of nine states, listed in the table below (a sketch for checking a job's state follows the table):
State | Description |
---|---|
Paused | The job is temporarily stopped. |
Waiting to start | The job is temporarily stopped and pending start. This state typically only lasts a few seconds. |
Stopping | The job is running but has been requested to stop. This state should last a few seconds at most. |
Startup error | The job failed to start and will not attempt to restart. Check the configuration settings for the job to resolve this state. |
Connection error | The job failed to connect to the source system and is currently retrying. |
Connected | The job is connected to the source system, but has not yet received any data. |
Transform error | The job is connected to the source system and has received data, but failed to transform the data into a format that can be ingested into a CDF resource type. |
Destination error | The job successfully transformed data, but failed to ingest data into a CDF resource type. |
Running | The job is streaming data into CDF. |
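To inspect a job's state programmatically, a minimal sketch might retrieve the job by external ID and read its reported status. The `/jobs/byids` path, the `status` field name, and the example external ID are assumptions modeled on other CDF resources; verify them against the API reference.

```python
import os
import requests

CLUSTER = "api.cognitedata.com"   # hypothetical cluster
PROJECT = "my-project"            # hypothetical project
TOKEN = os.environ["CDF_TOKEN"]

# Retrieve a job by external ID and print its reported state.
resp = requests.post(
    f"https://{CLUSTER}/api/v1/projects/{PROJECT}/hostedextractors/jobs/byids",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"items": [{"externalId": "plant-mqtt-to-timeseries"}]},
)
resp.raise_for_status()

job = resp.json()["items"][0]
print(job["externalId"], job.get("status"))
```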
Job metrics
Jobs report the following metrics on their current execution (a sketch for retrieving them follows the table):
Metric | Description |
---|---|
Source messages | The number of input messages received from the source system. |
Transform failures | The number of input messages that failed to transform. |
Destination input values | The number of messages that successfully transformed and were given to destinations for uploading to CDF. |
Destination requests | The number of requests made to CDF for this job. |
Destination write failures | The number of requests to CDF that failed for this job. |
Destination skipped values | The number of values that were invalid and were skipped before ingestion into CDF. |
Destination failed values | The number of values that were not written to CDF due to failed requests. |
Destination uploaded values | The number of values that were successfully ingested into CDF. |
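As a rough illustration of how these metrics could be read over the API, the sketch below queries a metrics endpoint for one job. The `/jobs/metrics` path, the `job` query parameter, and the response field names are assumptions that mirror the table above; the actual response shape may differ, so check the API reference.

```python
import os
import requests

CLUSTER = "api.cognitedata.com"   # hypothetical cluster
PROJECT = "my-project"            # hypothetical project
TOKEN = os.environ["CDF_TOKEN"]

# Fetch recent metric buckets for a job and print a few counters.
resp = requests.get(
    f"https://{CLUSTER}/api/v1/projects/{PROJECT}/hostedextractors/jobs/metrics",
    headers={"Authorization": f"Bearer {TOKEN}"},
    params={"job": "plant-mqtt-to-timeseries"},
)
resp.raise_for_status()

for bucket in resp.json().get("items", []):
    print(
        bucket.get("timestamp"),
        bucket.get("sourceMessages"),
        bucket.get("destinationUploadedValues"),
    )
```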
Destinations
A hosted extractor writes to a destination. The destination contains only the credentials used to write to CDF.
Multiple jobs can share a single destination; in that case their data is batched into shared requests, reducing the number of requests made to the CDF APIs. Metrics are still reported per job.
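A minimal sketch of creating a destination might first create a CDF session from client credentials and then register the destination with the resulting nonce. The nonce-based credential flow, field names, and identifiers below are assumptions; see the sessions and hosted extractors API references for the exact schemas.

```python
import os
import requests

CLUSTER = "api.cognitedata.com"   # hypothetical cluster
PROJECT = "my-project"            # hypothetical project
TOKEN = os.environ["CDF_TOKEN"]
BASE = f"https://{CLUSTER}/api/v1/projects/{PROJECT}"
HEADERS = {"Authorization": f"Bearer {TOKEN}"}

# Step 1: create a session so the destination can write to CDF on behalf
# of a set of client credentials (flow and field names are assumptions).
session = requests.post(
    f"{BASE}/sessions",
    headers=HEADERS,
    json={"items": [{
        "clientId": os.environ["IDP_CLIENT_ID"],
        "clientSecret": os.environ["IDP_CLIENT_SECRET"],
    }]},
)
session.raise_for_status()
nonce = session.json()["items"][0]["nonce"]

# Step 2: create the destination, which holds only the CDF credentials.
destination = {"externalId": "timeseries-writer", "credentials": {"nonce": nonce}}
resp = requests.post(
    f"{BASE}/hostedextractors/destinations",
    headers=HEADERS,
    json={"items": [destination]},
)
resp.raise_for_status()
```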
Mapping
A mapping is a custom transformation, translating the source format to a format that can be ingested into CDF. Read more in Custom data formats for hosted extractors.
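As a sketch, registering a mapping could look like the following. The field names (`input`, `mapping`, `expression`, `published`) and the expression itself are illustrative assumptions, not the documented mapping language; refer to Custom data formats for hosted extractors for the actual syntax.

```python
import os
import requests

CLUSTER = "api.cognitedata.com"   # hypothetical cluster
PROJECT = "my-project"            # hypothetical project
TOKEN = os.environ["CDF_TOKEN"]

# Sketch of registering a custom mapping that turns JSON input into
# a datapoint-like structure. The expression is purely illustrative.
mapping = {
    "externalId": "temperature-json-to-datapoints",
    "input": "json",
    "mapping": {
        "expression": '{"externalId": input.sensorId, "value": float(input.temp), "timestamp": now()}',
    },
    "published": True,
}

resp = requests.post(
    f"https://{CLUSTER}/api/v1/projects/{PROJECT}/hostedextractors/mappings",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"items": [mapping]},
)
resp.raise_for_status()
```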