# PI replace utility metrics

You can configure the PI replace utility to check the data point quality in Cognite Data Fusion (CDF) against the data points in the PI Data Archive. If you include the `data-quality` section, the differences between PI and CDF are logged to the configured logger, sent as metrics to a Prometheus pushgateway, and written to a CSV file on exit. The `report-file-name` configuration parameter sets the name of the CSV file.

## Logger output

PI replace logs the differences between the PI Data Archive and CDF at the warning (`WRN`) level:

[2020-07-30 08:15:01.938 INF] Pushing metrics to http://localhost:9091 with job name replace-pi-job

[2020-07-30 08:15:02.445 INF] Connecting to PI server

[2020-07-30 08:15:19.484 INF] Found 16549 time series in PI. The replace range is ("2020-07-28T00:00:00.0000000Z", "2020-07-29T00:00:00.0000000Z")

[2020-07-30 08:15:19.532 INF] 16549/16549 time series have not been processed.

[2020-07-30 08:15:19.551 INF] Progress will be stored with an interval of 10 secs.

[2020-07-30 08:15:20.061 WRN] TimeSeries1 (Numeric) - datapoint number difference: -2/9

[2020-07-30 08:15:20.074 WRN] TimeSeries2 (Numeric) - datapoint number difference: -11/31

[2020-07-30 08:15:22.960 WRN] TimeSeries3 - datapoint number difference: -6065/12042

[2020-07-30 08:18:02.830 INF] Completed replacing 1000/16549 time series in range (2020-07-28 00:00:00.000, 2020-07-29 00:00:00.000) (163.2766988 secs)

[2020-07-30 08:24:53.899 WRN] TimeSeries4 (Numeric) - datapoints with different values 3/7

[2020-07-30 08:26:28.255 WRN] TimeSeries5 has no datapoints in CDF. Should have 6

[2020-07-30 08:27:05.487 WRN] TimeSeries6 has no datapoints in CDF. Should have 6

[2020-07-30 08:41:00.468 INF] Completed replacing 2000/16549 time series in range (2020-07-28 00:00:00.000, 2020-07-29 00:00:00.000) (177.6383223 secs)

[2020-07-30 09:06:44.673 WRN] TimeSeries7 (Numeric) - datapoints with different values 3/8

where:

- `TimeSeries1 (Numeric)` shows a difference in the number of data points. `-2/9` indicates that the PI Data Archive has 9 data points, and 2 of them are missing from CDF.
- `TimeSeries4 (Numeric)` shows a difference in data point values. `3/7` indicates that both CDF and PI have 7 data points, but 3 of them have different values in CDF than in the PI Data Archive.
- `TimeSeries5` has no data points in CDF. `Should have 6` indicates that the time series exists in CDF but has no data points, while the same time series has 6 data points in the PI Data Archive.
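
If you want to summarize these warnings outside the utility, the three message shapes can be picked apart with regular expressions. The following is a minimal sketch, not part of PI replace: the function name `parse_warning` and the returned keys are hypothetical, the patterns are modeled only on the sample lines above, and time series names are assumed to contain no spaces.

```python
import re
from typing import Optional

# Patterns modeled on the sample warning lines above; the wording in
# other versions of the utility may differ (assumption).
COUNT_DIFF = re.compile(
    r"\] (?P<name>\S+)(?: \((?P<kind>\w+)\))? - datapoint number difference: "
    r"(?P<diff>-?\d+)/(?P<total>\d+)"
)
VALUE_DIFF = re.compile(
    r"\] (?P<name>\S+)(?: \((?P<kind>\w+)\))? - datapoints with different values "
    r"(?P<diff>\d+)/(?P<total>\d+)"
)
NO_DATA = re.compile(
    r"\] (?P<name>\S+) has no datapoints in CDF\. Should have (?P<total>\d+)"
)


def parse_warning(line: str) -> Optional[dict]:
    """Describe one WRN line, or return None if it is not a difference warning."""
    if " WRN] " not in line:
        return None
    patterns = (("count", COUNT_DIFF), ("value", VALUE_DIFF), ("empty", NO_DATA))
    for issue, pattern in patterns:
        match = pattern.search(line)
        if match:
            fields = match.groupdict()
            # For "empty" series there is no diff group: every PI point is affected.
            affected = abs(int(fields.get("diff") or fields["total"]))
            return {
                "series": fields["name"],
                "issue": issue,
                "affected": affected,
                "total_in_pi": int(fields["total"]),
            }
    return None
```

Calling `parse_warning` on each line of a log file and discarding the `None` results gives one record per affected time series.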

## CSV report

The PI replace utility writes a report with the differences between the PI Data Archive and CDF to a CSV file on exit. The report contains the following fields:

- `TimeSeries: Name` is the name of the time series. This is the PI Point name.
- `RangeStart` indicates that data points with timestamps later than or equal to this date and time were replaced.
- `RangeEnd` indicates that data points with timestamps earlier than or equal to this date and time were replaced.
- `TotalDataPoints` is the total number of data points in the PI Data Archive.
- `CountDifference` is the number of data points in CDF subtracted from the number of data points in the PI Data Archive.
- `ValueDifference` is the number of data points in CDF that have different values than the data points in the PI Data Archive.
- `MaxDelta` is the maximum difference of the numeric data point values. The delta percentage of an individual data point is calculated as the fraction `(Value in PI - Value in CDF) / Value in PI`.
- `AvgDelta` is the average data point value difference.
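
To make the field definitions concrete, the sketch below recomputes them for a single numeric time series. It is an illustration under stated assumptions, not the utility's implementation: the function name `compare_series` is hypothetical, data points are matched by identical timestamps, the delta uses the absolute value of the fraction above, points with a PI value of 0 are skipped, and `AvgDelta` is averaged over all matched points.

```python
def compare_series(pi_points: dict, cdf_points: dict) -> dict:
    """Compare one numeric time series; both arguments map timestamp -> value."""
    # Only timestamps present on both sides can be compared by value.
    shared = pi_points.keys() & cdf_points.keys()
    # Relative delta per data point: |value in PI - value in CDF| / |value in PI|.
    # Points where the PI value is 0 are skipped to avoid dividing by zero (assumption).
    deltas = [
        abs(pi_points[t] - cdf_points[t]) / abs(pi_points[t])
        for t in shared
        if pi_points[t] != 0
    ]
    return {
        "TotalDataPoints": len(pi_points),
        # Number of points in PI minus number of points in CDF.
        "CountDifference": len(pi_points) - len(cdf_points),
        "ValueDifference": sum(1 for t in shared if pi_points[t] != cdf_points[t]),
        "MaxDelta": max(deltas, default=0.0),
        "AvgDelta": sum(deltas) / len(deltas) if deltas else 0.0,
    }


# Example: PI has 4 points, CDF has 3, and one shared point differs by 25%.
report_row = compare_series(
    {1: 10.0, 2: 20.0, 3: 30.0, 4: 40.0},
    {1: 10.0, 2: 15.0, 3: 30.0},
)
```

In this example `CountDifference` is 1, `ValueDifference` is 1, and `MaxDelta` is 0.25, which corresponds to 25% when shown as a percentage.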

## Metrics

PI replace can send metrics to a Prometheus pushgateway. The metrics can be displayed in a Grafana dashboard and monitored by CDF.

| Name | Description |
|---|---|
| Heartbeat | The time since the last metrics push from the PI replace process. |
| Estimated time left | The estimated time left until all time series are processed, based on the average iteration duration. |
| # Time series | The total number of time series being processed. |
| # Data points | The total number of data points processed until now. |
| Time range left to cover | The time range left to query PI for data points. When it reaches 0, all time series have been replaced. |
| Avg duration per 1000 | The average time it takes to replace 1000 time series in one iteration. The `step-hours` configuration parameter defines the time span of an iteration. |
| Data points with different values | The percentage of the processed data points with different values in CDF compared to the PI Data Archive. |
| Data point count difference | The percentage of the data points in the PI Data Archive that are missing from CDF. |
| # Extractor incidents | The number and type (data loss or reconnection) of extractor incidents being handled. |
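
The exact formula behind `Estimated time left` is not documented here. As a rough illustration of an estimate based on average duration, the sketch below scales the average time per batch of 1000 time series by the work that remains. The function name `estimate_seconds_left`, its parameters, and the assumption of a single remaining iteration are all hypothetical.

```python
def estimate_seconds_left(batch_secs, series_total, series_done, batch_size=1000):
    """Average the observed per-batch durations and scale by the remaining work."""
    if not batch_secs:
        return None  # nothing measured yet, so no estimate is possible
    avg_per_batch = sum(batch_secs) / len(batch_secs)
    remaining_batches = (series_total - series_done) / batch_size
    return avg_per_batch * remaining_batches


# Durations taken from the two "Completed replacing" lines in the sample log.
seconds_left = estimate_seconds_left(
    [163.2766988, 177.6383223], series_total=16549, series_done=2000
)
```

The estimate becomes more stable as more batches complete, because each new duration is folded into the average.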

## Replace progress

| Name | Description |
|---|---|
| Replace progress | The overall progress. `Replace to` is the goal timestamp (`start-time` configuration parameter) and `Replace progress` is the current iteration timestamp (step). The replace process is completed when the yellow line meets the green line. |
| Iteration - # of time series replaced | The number of time series replaced in the current iteration (`step-hours`) and the replacing rate (time series/sec). |
| Iteration - Duration per 1000 | The duration of replacing 1000 time series in one iteration (step). The dashed line is the estimated time left until all time series are replaced. |

## Performance

| Name | Description |
|---|---|
| Process | The CPU usage and memory consumption of the PI replace process on the host machine. |
| CDF Requests | The total number and latency of requests made to CDF. |
| CDF Uploaded data points | The total number of data points uploaded to CDF and the upload rate (data points/sec). |

## Data quality

| Name | Description |
|---|---|
| Data points with different values | The total number of data points with different values and their rate (points/sec). |
| Data point count difference | The total number of data points in the PI Data Archive that are missing from CDF and their rate (points/sec). |
| Numeric data point delta | The average and maximum difference between numeric data points in the PI Data Archive and CDF, as a percentage. A `MaxDelta` of 100% means that one time series had at least one data point where the absolute difference between the value in PI and the value in CDF was as large as the value in PI, that is, where `ABS(Value in PI - Value in CDF) / Value in PI` equals 1. |

## PI server

| Name | Description |
|---|---|
| # PI connections | The number of times PI replace establishes a connection with the PI server. If this value is greater than 1, an error caused PI replace to reconnect to PI. |
| PI query rate | The rate of queries (queries/sec) to the PI server and the query latency. |
| PI AF Values | The total number of good and bad AF Values obtained from PI and their rate (values/sec). Typically, the PI extractor converts 'good' AF Values to data points and uploads these to CDF. |