About the Cognite PI extractor

The PI points in the PI Data Archive correspond to the time series in CDF.

How the extractor processes data

The Cognite PI extractor extracts time series data from the PI Data Archive according to your filters, creates the corresponding time series in CDF, and then feeds the data points from the selected PI points into those time series. The extractor supports CDF data sets, and each extractor can push data into one data set. In CDF, each time series gets an external ID composed of a configurable prefix and the PI point name. If the PI points don’t have unique names, you can configure the extractor to use the PI point ID instead.
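The external ID rule can be sketched in a few lines of Python. This is only an illustration: the function name, its parameters, and the `use_point_id` switch are hypothetical and are not the extractor's configuration keys, and it assumes that the PI point ID replaces the name when that option is enabled.

```python
def build_external_id(prefix: str, point_name: str, point_id: int,
                      use_point_id: bool = False) -> str:
    """Compose a CDF external ID from a prefix and a PI point identifier.

    Assumption: when names are not unique, the PI point ID is used in
    place of the point name.
    """
    suffix = str(point_id) if use_point_id else point_name
    return f"{prefix}{suffix}"


# Two PI points that share a name only get distinct external IDs
# when the point ID is used.
by_name = build_external_id("pi:", "SINUSOID", 42)
by_id = build_external_id("pi:", "SINUSOID", 42, use_point_id=True)
```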

Extraction modes

The Cognite PI extractor runs one or more of these tasks concurrently:

Backfill - Fills in historical data backward from the first data point in CDF.
Frontfill - Fills in historical data forward from the last data point in CDF.
Streaming - Adds data points to CDF in real-time.

The first time you run the extractor, it creates all the necessary time series in CDF, streams data points from the current time, and backfills the time series with historical data. If the extractor reruns after a period of downtime, it resumes the backfill task and starts a frontfill task to fill in the gap between when the extractor stopped and the current time. When the frontfill task has caught up, the extractor returns to streaming live data points.

The extractor maintains an extraction state for the time range between the first and last data point inserted into CDF. Only the streaming task can insert data points in CDF within this range. Changes to historical values that already exist in CDF are therefore updated in CDF only while the extractor is streaming data.
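The startup behavior described above can be modeled roughly as follows. This is a simplified sketch, not the extractor's implementation: `ExtractionState`, `plan_tasks`, and the task tuples are hypothetical names, and the sketch assumes the backfill has not yet reached the start of the history.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# A task is (name, range_start, range_end); None means the range is open-ended.
Task = Tuple[str, Optional[datetime], Optional[datetime]]


@dataclass
class ExtractionState:
    """Hypothetical record of the time range already inserted into CDF."""
    first: Optional[datetime] = None  # timestamp of the first data point in CDF
    last: Optional[datetime] = None   # timestamp of the last data point in CDF


def plan_tasks(state: ExtractionState, now: datetime) -> List[Task]:
    """Decide which tasks to start, given the stored extraction state."""
    if state.first is None or state.last is None:
        # First run: stream from the current time and backfill history before it.
        return [("streaming", now, None), ("backfill", None, now)]
    # Rerun after downtime: resume the backfill before the known range,
    # frontfill the gap up to the current time, then return to streaming.
    return [
        ("backfill", None, state.first),
        ("frontfill", state.last, now),
        ("streaming", now, None),
    ]


now = datetime(2024, 1, 2, tzinfo=timezone.utc)
state = ExtractionState(
    first=datetime(2023, 1, 1, tzinfo=timezone.utc),
    last=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
tasks = plan_tasks(state, now)
```

Note that none of the planned ranges overlap the interval between `state.first` and `state.last`, which matches the rule that only streaming writes inside the already extracted range.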

The PI extractor can miss data updates on the PI server, for instance, if the extractor restarts while the PI server gets historical updates. Changes in PI within a time range that has already been extracted are picked up by streaming if the PI extractor is running while the change is made. If the PI extractor isn’t running while the change is made, the change isn’t replicated to CDF.

The backfill and frontfill tasks extract compressed data points from the PI Data Archive, while the streaming task extracts all data points. The compression settings of your PI system may impact the data quality for your data models.

The streaming task can send data points to CDF that PI later removes through compression. For this reason, there may be more data points in CDF than in PI.

Time series processing

The extractor runs an update task on a schedule, by default every 24 hours, and whenever it reconnects to the PI server after downtime. The update task discovers new PI points and adds the corresponding time series in CDF, including the metadata. If a PI point has been deleted on the PI server, the extractor doesn’t remove the associated time series in CDF. Instead, the extractor logs the deletion and keeps track of the number of deleted PI points. The extractor resets this number when it restarts or reconnects to the PI server.

The table below shows how PI point attributes and fields map to time series attributes and metadata in CDF:

Source (PI Point)	Destination (CDF)
`engunits`	`Unit`
`descriptor`	`Description`
`PointType`	`IsString`. The `PIPointType` maps to numeric (`IsString` = `false`) for `Digital`, `Float16/32/64`, `Int16/32`, and `Timestamp`. All other types (including `String`) map to `IsString` = `true`.
`Name`	`Name`
Attributes (except those excluded below)	`MetaData`, every value converted to string
`<prefix>` + `Name`	`ExternalId`, `LegacyName`
`step` attribute equal to 1	`IsStep`

Internal bookkeeping attributes in the PI system aren’t included in the CDF time series metadata: creationdate, creator, changedate, changer, ptowner, ptgroup, ptaccess, ptsecurity, dataowner, datagroup, dataaccess, and datasecurity.
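As a rough illustration of the mapping table and the excluded attributes, the following sketch converts a dictionary of PI point attributes into a dictionary of CDF time series fields. The function `map_point` and the dictionary layout are hypothetical. The sketch also assumes that every attribute not on the exclusion list is copied into the metadata, which the table doesn't state explicitly.

```python
from typing import Any, Dict

# Point types that map to numeric time series (IsString = false).
NUMERIC_POINT_TYPES = {
    "Digital", "Float16", "Float32", "Float64", "Int16", "Int32", "Timestamp",
}

# Internal bookkeeping attributes that are left out of the CDF metadata.
BOOKKEEPING_ATTRIBUTES = {
    "creationdate", "creator", "changedate", "changer",
    "ptowner", "ptgroup", "ptaccess", "ptsecurity",
    "dataowner", "datagroup", "dataaccess", "datasecurity",
}


def map_point(attributes: Dict[str, Any], prefix: str) -> Dict[str, Any]:
    """Map PI point attributes to CDF time series fields (illustrative only)."""
    name = attributes["Name"]
    external_id = f"{prefix}{name}"
    # Assumption: all remaining attributes go to metadata, as strings.
    metadata = {
        key: str(value)
        for key, value in attributes.items()
        if key.lower() not in BOOKKEEPING_ATTRIBUTES
    }
    return {
        "ExternalId": external_id,
        "LegacyName": external_id,
        "Name": name,
        "Unit": attributes.get("engunits"),
        "Description": attributes.get("descriptor"),
        "IsString": attributes.get("PointType") not in NUMERIC_POINT_TYPES,
        "IsStep": attributes.get("step") in (1, "1"),
        "MetaData": metadata,
    }


time_series = map_point(
    {
        "Name": "SINUSOID",
        "engunits": "degC",
        "descriptor": "12 hour sine wave",
        "PointType": "Float32",
        "step": 0,
        "creator": "piadmin",
        "span": 100,
    },
    prefix="pi:",
)
```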

Data points processing

Each data point in the source has a source timestamp, a generic value, and an IsGood flag. The extractor sets the source timestamp in PI as the data point timestamp in CDF. Because the timestamp resolution in PI is higher than in CDF (milliseconds), multiple data points in PI can map to the same timestamp in CDF. In the unlikely event that this happens, the PI extractor can’t guarantee which data point it extracts to CDF.

An IsGood flag set to false in PI indicates that there is an issue reading the value from the sensor or control system. In these cases, the PI value may be a string description even though the PI point has a numeric type. The extractor ignores data points with IsGood = false. For IsGood = true, it tries to convert the value to the CDF time series type. If the conversion fails, which typically happens only for numeric time series in CDF, the data point is ignored. The extractor logs the number of ignored data points. The extractor ignores PI’s unit of measure (UOM) property per data point.
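The filtering and conversion rules can be approximated as below. This is a sketch under stated assumptions: `convert_point` and `convert_points` are hypothetical helpers, timestamps are timezone-aware `datetime` objects, and the sub-millisecond part is truncated (the actual rounding behavior isn't specified here).

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Iterable, List, Optional, Tuple, Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
CdfValue = Union[float, str]
CdfPoint = Tuple[int, CdfValue]  # (milliseconds since epoch, value)


def convert_point(timestamp: datetime, value: Any, is_good: bool,
                  is_string: bool) -> Optional[CdfPoint]:
    """Return a CDF data point, or None if the PI data point is ignored."""
    if not is_good:
        return None  # bad-quality values are skipped
    if is_string:
        converted: CdfValue = str(value)
    else:
        try:
            converted = float(value)
        except (TypeError, ValueError):
            return None  # not convertible to the numeric time series type
    # Truncate to millisecond resolution (assumption: no rounding).
    timestamp_ms = (timestamp - EPOCH) // timedelta(milliseconds=1)
    return timestamp_ms, converted


def convert_points(points: Iterable[Tuple[datetime, Any, bool]],
                   is_string: bool) -> Tuple[List[CdfPoint], int]:
    """Convert PI data points and count the ignored ones."""
    kept: List[CdfPoint] = []
    ignored = 0
    for timestamp, value, is_good in points:
        point = convert_point(timestamp, value, is_good, is_string)
        if point is None:
            ignored += 1
        else:
            kept.append(point)
    return kept, ignored


t1 = datetime(2024, 1, 1, 0, 0, 0, 1200, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 0, 0, 0, 1900, tzinfo=timezone.utc)
kept, ignored = convert_points(
    [(t1, 1.5, True), (t2, "2", True), (t2, "Bad Input", False), (t2, "n/a", True)],
    is_string=False,
)
```

In this example, `t1` and `t2` differ only below millisecond resolution, so both kept data points end up with the same CDF timestamp. This is the collision case in which the extractor can’t guarantee which data point reaches CDF.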
