Data quality monitoring

Monitor time series data quality

When you rely on data to make operational decisions, you need to know when that data is reliable, and end users need to know when they can trust it.

Follow the steps below to monitor time series data quality continuously.

Step 1. Create the data kit and sub-kits

Create a data kit containing the time series that you want to monitor.

To specify different data quality requirements for different time series, you can group time series with similar requirements into sub-kits.

We recommend that you choose a representative sample of the time series that your app relies on.

Note: Each data kit can contain a maximum of 500 time series and each sub-kit a maximum of 150 time series.
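The limits in the note above can be checked before you build a kit. The following is a minimal sketch, not part of any Cognite SDK: the function name `split_into_sub_kits` and the two constants are hypothetical and only encode the limits stated above. In practice you would first group time series by their quality requirements and then split any group that is too large.

```python
from typing import List, Sequence

MAX_SERIES_PER_KIT = 500      # limit stated in the note above
MAX_SERIES_PER_SUB_KIT = 150  # limit stated in the note above


def split_into_sub_kits(series_ids: Sequence[str],
                        group_size: int = MAX_SERIES_PER_SUB_KIT) -> List[List[str]]:
    """Split time series IDs into sub-kit-sized groups (hypothetical helper)."""
    if len(series_ids) > MAX_SERIES_PER_KIT:
        raise ValueError(f"A data kit can hold at most {MAX_SERIES_PER_KIT} time series")
    if not 1 <= group_size <= MAX_SERIES_PER_SUB_KIT:
        raise ValueError(f"A sub-kit can hold at most {MAX_SERIES_PER_SUB_KIT} time series")
    # Slice the list into consecutive chunks of at most group_size items.
    return [list(series_ids[i:i + group_size])
            for i in range(0, len(series_ids), group_size)]
```

For example, 320 time series would be split into three groups of 150, 150, and 20.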

Step 2. Add data quality monitoring to sub-kits

Add data quality monitoring to a sub-kit to specify the data quality requirements for a group of time series.

  1. Open an existing data kit and then a sub-kit.

  2. Select + Add rules to specify the quality requirements for the time series in the sub-kit.

    Add rules to data kit

  3. Define the time window that matches the requirements of your data science model or application.

    The time window specifies how far back the data quality monitor should check that the selected time series meet the data quality requirements.

    Note: The time window has to be between 1 minute and 3 hours. We recommend that you specify a time window between 1 minute and 30 minutes.

    Select time window and data quality rules

  4. Select the data quality rules.

    Different models and apps have different data quality requirements and will need to be monitored for different aspects of data quality. Select which data quality rules to apply to the time series in each sub-kit:

    • Max age of the last data point - checks that the latency is acceptable for each of the time series in the sub-kit.
    • Max distance between data points - checks that the gap between any two consecutive data points is acceptable for each of the time series in the sub-kit.
    • Min number of data points - checks that the number of data points in the defined time window is high enough for each of the time series in the sub-kit.
    • Max value - checks that the data points are below a max value for each of the time series in the sub-kit.
    • Min value - checks that the data points are above a minimum value for each of the time series in the sub-kit.
  5. Set up notifications.

    Specify email addresses or a webhook URL to notify if the data quality does not meet the requirements, and when data quality returns to normal.

    A webhook lets an app provide other applications with real-time information. You can use a tool like Opsgenie to receive the notification and pass it on to the relevant recipients through email, Slack, or other channels.

    Select rules, time window and webhooks

  6. Select Apply.
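To make the meaning of the five rules and the time window concrete, the sketch below evaluates them against the data points of a single time series. It illustrates what each rule checks; it is not how the data quality monitor is implemented. The names `Point`, `Rules`, and `broken_rules` are hypothetical, the returned labels reuse the event subtype names listed in Step 3, and treating a value exactly at a limit as acceptable is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (timestamp in seconds, value)


@dataclass
class Rules:
    """Hypothetical container for the five rules; None disables a rule."""
    window_s: float
    max_age_s: Optional[float] = None
    max_gap_s: Optional[float] = None
    min_count: Optional[int] = None
    max_value: Optional[float] = None
    min_value: Optional[float] = None


def broken_rules(points: List[Point], rules: Rules, now: float) -> List[str]:
    """Return the names of the rules broken by the points inside the time window."""
    if not 60 <= rules.window_s <= 3 * 3600:
        raise ValueError("time window must be between 1 minute and 3 hours")
    # Keep only the points inside the window, ordered by timestamp.
    window = sorted(p for p in points if now - rules.window_s <= p[0] <= now)
    times = [t for t, _ in window]
    values = [v for _, v in window]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    broken = []
    if rules.min_count is not None and len(window) < rules.min_count:
        broken.append("min_count")
    # An empty window also counts as a stale last data point.
    if rules.max_age_s is not None and (not times or now - times[-1] > rules.max_age_s):
        broken.append("max_age_of_last_data_point")
    if rules.max_value is not None and any(v > rules.max_value for v in values):
        broken.append("max_value")
    if rules.min_value is not None and any(v < rules.min_value for v in values):
        broken.append("min_value")
    if rules.max_gap_s is not None and any(g > rules.max_gap_s for g in gaps):
        broken.append("max_distance_between_points")
    return broken
```

With a 10-minute window, a rule set that requires at least five points would be broken by a series that delivered only three, regardless of whether the other rules pass.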

Step 3. Report quality status in apps and models

If the data quality does not meet the requirements, the data quality monitor writes an event to Cognite Data Fusion (CDF). The event has the type Data Quality Monitoring Alert and a subtype that corresponds to the rule that was broken:

  • min_count
  • max_age_of_last_data_point
  • max_value
  • min_value
  • max_distance_between_points

The metadata fields contain information about which data kit and sub-kit the event belongs to and which time series initially broke the rule. This allows you to report the data quality status in other apps and models.

To get the latest data quality status for a particular data kit, you can query events from CDF and filter on the type of events. Learn more about events.

For example, the request below returns all currently broken rules for a data kit with 322 as its dataKitId. You can see the IDs for a data kit and its sub-kits by opening the data kit in the Console.

URL: https://api.cognitedata.com/api/v1/projects/project-name/events/list

Body:

{
    "filter": {
        "type": "Data Quality Monitoring Alert",
        "metadata": {
            "dataKitId": "322",
            "isOpen": "true"
        }
    }
}
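As a rough illustration, the same request can be assembled in Python with only the standard library. This is a sketch under assumptions: the function name `build_open_alerts_request` is hypothetical, bearer-token authentication is assumed and may differ in your project, and metadata values are sent as strings to match the response example. The function only builds the request; it does not send it.

```python
import json
import urllib.request


def build_open_alerts_request(project: str, data_kit_id: int,
                              token: str) -> urllib.request.Request:
    """Build (but do not send) the events/list request for one data kit."""
    url = f"https://api.cognitedata.com/api/v1/projects/{project}/events/list"
    body = {
        "filter": {
            "type": "Data Quality Monitoring Alert",
            # Metadata values are sent as strings, as in the response example.
            "metadata": {"dataKitId": str(data_kit_id), "isOpen": "true"},
        }
    }
    headers = {
        "Content-Type": "application/json",
        # Assumption: bearer-token authentication; adjust to your setup.
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, data=json.dumps(body).encode("utf-8"),
                                  headers=headers, method="POST")
```

To send it, pass the returned object to `urllib.request.urlopen` and decode the JSON response.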

Response example:

{
   "startTime": 1575900039980,
   "type": "Data Quality Monitoring Alert",
   "subtype": "min_count",
   "description": "",
   "metadata": {
     "dataKitId": "322",
     "dimension": "min_count",
     "isOpen": "true",
     "subKitId": "312",
     "subKitName": "Untitled Sub-Kit"
   },
   "source": "Data Quality Monitoring in the Cognite Console",
   "id": 6740001776635791,
   "lastUpdatedTime": 1575900044311,
   "createdTime": 1575900044311
}

When the data quality is restored, the event is updated with an end time, and the “isOpen” metadata field is set to “false”.

The response also includes the subKitId and subKitName, so you can easily see which group of time series has broken rules.
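One way to report the status in an app is to group the open alerts by sub-kit. The sketch below assumes that each event is a dictionary shaped like the response example above; the function name `open_rules_by_sub_kit` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def open_rules_by_sub_kit(events: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each sub-kit name to the subtypes of its currently open alerts."""
    result: Dict[str, List[str]] = defaultdict(list)
    for event in events:
        metadata = event.get("metadata", {})
        # Skip alerts that have been resolved ("isOpen" set to "false").
        if metadata.get("isOpen") != "true":
            continue
        # Fall back to the sub-kit ID if the name is missing.
        name = metadata.get("subKitName", metadata.get("subKitId", "unknown"))
        result[name].append(event.get("subtype", "unknown"))
    return dict(result)
```

An empty result means that no rules are currently broken in the events you passed in.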

Last Updated: 2/2/2020, 5:26:31 PM