
Introduction to time series

In Cognite Data Fusion (CDF), a time series is the resource type for indexing a series of data points in time order. Examples of time series are the temperature of a water pump asset, the monthly precipitation in a location, and the daily average number of manufacturing defects.

About time series

An asset can have several time series connected to it. For example, a water pump asset can have time series that measure the pump temperature, the pressure within the pump, rpm, flow volume, power consumption, and more.

Time series can be analyzed and visualized to draw inferences from the data, for example, to identify trends, seasonal movements, and random fluctuations. Other common uses of time series analysis include forecasting future values, for example, to schedule maintenance, and controlling the process by adjusting parameters to optimize equipment performance.

A data point is a piece of information associated with a specific time, stored as a numerical or string value. Data points are identified by timestamps, defined in milliseconds since the Unix epoch. We don't support fractional milliseconds and don't count leap seconds.
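For example, to produce a timestamp in this format in Python, convert a timezone-aware datetime to whole milliseconds since the epoch (a minimal sketch; the helper name is ours):

from datetime import datetime, timezone

def to_cdf_timestamp(dt: datetime) -> int:
    """Milliseconds since the Unix epoch, as CDF timestamps expect."""
    return round(dt.timestamp() * 1000)

# 2012-10-08 21:37:12.902 UTC -> 1349732232902
print(to_cdf_timestamp(datetime(2012, 10, 8, 21, 37, 12, 902000, tzinfo=timezone.utc)))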

Use the isString flag on the time series object to decide whether to store data points in a time series as numerical values or as string values.

  • Numerical data points can be aggregated to reduce the amount of data transferred in query responses and improve performance. You can specify one or more aggregates, for example, average, minimum, and maximum, and also the time granularity for the aggregates, for example, 1h for one hour.

    See Aggregating time series data to learn more about how Cognite Data Fusion aggregates and interpolates time series data, and see the details about the available aggregation functions.

  • String data points can store arbitrary information like states, for example, open or closed, or more complex information in JSON format. Cognite Data Fusion cannot aggregate string data points.

Cognite Data Fusion stores discrete data points, but the underlying process measured by the data points can vary continuously. The isStep flag on the time series object controls how values are interpolated between data points: each value stays the same until the next measurement (isStep), or changes linearly between the two measurements (not isStep).
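As an illustration, the sketch below creates a numerical step series and a string series with these flags set, posting to the time series create endpoint through the Python SDK's raw client (a hedged example; the externalIds and names are ours, and we assume authentication is configured in the environment):

from cognite.client import CogniteClient

client = CogniteClient()  # assumes credentials are configured in the environment

client.post(
    f"/api/v1/projects/{client.config.project}/timeseries",
    json={
        "items": [
            # A reading that jumps between setpoints: hold each value (isStep).
            {"externalId": "pump-rpm-setpoint", "name": "Pump rpm setpoint", "isString": False, "isStep": True},
            # A string state series; string data points can't be aggregated or interpolated.
            {"externalId": "valve-state", "name": "Valve state", "isString": True},
        ]
    },
)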

tip

See the time series API documentation for more information about how to work with time series.

Get the data points from a time series

You can get data points from a time series by using the externalId or the id of the time series.

  1. To get data points from a time series by using the externalId, in this case outside-temperature, enter:

    POST /api/v1/projects/publicdata/timeseries/data/list
    Host: api.cognitedata.com
    api-key: <api-key>
    Content-Type: application/json
    Content-Length: 99

    {
      "items": [
        {
          "limit": 5,
          "externalId": "outside-temperature"
        }
      ]
    }

    The response will look similar to this:

    {
      "items": [
        {
          "isString": false,
          "id": 44435358976768,
          "externalId": "outside-temperature",
          "datapoints": [
            { "timestamp": 1349732232902, "value": 31.62889862060547 },
            { "timestamp": 1349732244888, "value": 31.59380340576172 },
            { "timestamp": 1349732245888, "value": 31.62889862060547 },
            { "timestamp": 1349732258888, "value": 31.59380340576172 },
            { "timestamp": 1349732259888, "value": 31.769287109375 }
          ],
          "nextCursor": "wpnaLqNvdkOrsPd"
        }
      ]
    }
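    The nextCursor value in the response lets you page through longer series. A minimal sketch of cursor-based paging with the raw client from the earlier example (we assume each request item accepts a cursor field, matching the nextCursor returned above):

    body = {"items": [{"externalId": "outside-temperature", "limit": 5}]}
    datapoints = []
    while True:
        r = client.post("/api/v1/projects/publicdata/timeseries/data/list", json=body)
        item = r.json()["items"][0]
        datapoints.extend(item["datapoints"])
        cursor = item.get("nextCursor")
        if not cursor:
            break  # no more pages
        body["items"][0]["cursor"] = cursor  # resume where the last page ended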

Get aggregate values between two points in time

To visualize or analyze a longer period, you can extract the aggregate values between two points in time. See Retrieve data points for valid aggregate functions and granularities.

  1. For example, to return five hourly average aggregates (granularity 1h) for the outside-temperature time series, enter:

    POST /api/v1/projects/publicdata/timeseries/data/list
    Host: api.cognitedata.com
    api-key: <api-key>
    Content-Type: application/json

    {
      "items": [
        {
          "limit": 5,
          "externalId": "outside-temperature",
          "aggregates": ["average"],
          "granularity": "1h",
          "start": 1541424400000,
          "end": "now"
        }
      ]
    }

    The response will look similar to this:

    {
      "items": [
        {
          "id": 44435358976768,
          "externalId": "outside-temperature",
          "datapoints": [
            { "timestamp": 1541422800000, "average": 26.3535328292538 },
            { "timestamp": 1541426400000, "average": 26.34716274449083 },
            { "timestamp": 1541430000000, "average": 26.35558703492914 },
            { "timestamp": 1541433600000, "average": 26.36287845690146 },
            { "timestamp": 1541437200000, "average": 26.36948613080317 }
          ],
          "nextCursor": "wpnaLqNvdkOrsPd"
        }
      ]
    }
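    To work with the aggregates programmatically, you can load the response into a pandas DataFrame (a sketch, reusing the authenticated client from the earlier example):

    import pandas as pd

    body = {
        "items": [
            {
                "externalId": "outside-temperature",
                "aggregates": ["average"],
                "granularity": "1h",
                "start": 1541424400000,
                "end": "now",
                "limit": 5,
            }
        ]
    }
    r = client.post("/api/v1/projects/publicdata/timeseries/data/list", json=body)
    df = pd.DataFrame(r.json()["items"][0]["datapoints"])
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")  # epoch ms -> datetime
    print(df)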

Count the number of time series matching filtering criteria

Count the number of time series that match selected filtering criteria, such as belonging to a specific data set and following a naming convention for externalId.

POST /api/v1/projects/daitya/timeseries/aggregate HTTP/1.1
Host: api.cognitedata.com
api-key: <api-key>
Content-Type: application/json

{
  "filter": {
    "dataSetIds": [
      { "externalId": "Cognite data quality monitoring alerts and metrics" }
    ],
    "externalIdPrefix": "dq_monitor"
  }
}

The response will look similar to this:

{
  "items": [
    { "count": 273 }
  ]
}
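The same count through the raw Python client (a sketch; the data set and prefix come from the request above):

body = {
    "filter": {
        "dataSetIds": [{"externalId": "Cognite data quality monitoring alerts and metrics"}],
        "externalIdPrefix": "dq_monitor",
    }
}
r = client.post(f"/api/v1/projects/{client.config.project}/timeseries/aggregate", json=body)
print(r.json()["items"][0]["count"])  # e.g. 273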

Data point quality status codes

Beta

The features described in this section are currently in beta testing and are subject to change.

Time series data points can carry data quality status codes to help you determine how to treat uncertain data points and visualize data. The implementation follows the OPC UA standard, and both the Cognite OPC UA and PI extractors support this feature. Including status codes for data points can impact data retrieval and aggregation functions.

By default, the time series API only includes data points where the status code is Good and disregards other data points. To include data points with other status codes, you must pass additional parameters to the API.

This example has five data points. Three of the data points are good, one is uncertain, and one is bad:

{
  "items": [
    {
      "externalId": "outside-temperature",
      "datapoints": [
        { "timestamp": 1620000000000, "value": 1 },
        { "timestamp": 1620000000001, "value": 2, "status": { "code": 0 } },
        { "timestamp": 1620000000002, "value": 3, "status": { "symbol": "GoodClamped" } },
        { "timestamp": 1620000000003, "value": 4, "status": { "symbol": "Uncertain" } },
        { "timestamp": 1620000000004, "value": 5, "status": { "code": 2153809152 } }
      ]
    }
  ]
}

The API supports two formats of status codes as defined in the OPC UA standard: the 32-bit binary encoding and a text-based symbolic representation.

The full list of status codes is available here.
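In the 32-bit encoding, the two most significant bits carry the severity: 00 is good, 01 is uncertain, and 10 is bad. A small sketch that triages the codes from the example above using this rule (based on the OPC UA encoding, not a CDF API call):

def severity(code: int) -> str:
    """Map the top two bits of an OPC UA status code to a severity."""
    return {0: "Good", 1: "Uncertain", 2: "Bad"}[code >> 30]

print(severity(0))           # Good
print(severity(3145728))     # Good (GoodClamped)
print(severity(1073741824))  # Uncertain
print(severity(2153809152))  # Bad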

List data points using status codes

Include uncertain data points

This example passes additional parameters to include the uncertain data point.

json = {
    "items": [
        {
            "externalId": "outside-temperature",
            "includeStatus": True,
            "treatUncertainAsBad": False,
        }
    ]
}
r = client.post(
    f"/api/v1/projects/{client.config.project}/timeseries/data/list",
    json=json,
    headers={"cdf-version": "20230101-beta"},
)
assert r.status_code == 200
datapoints = r.json()["items"][0]["datapoints"]
datapoints

Where:

  • includeStatus: True denotes that the status of the data point should be included in the response.

  • treatUncertainAsBad: False returns data points marked with the uncertain status code. By default, the API treats them the same as bad data points and doesn't return them.

Using the example data above, the response looks like this:

[
  { "timestamp": 1620000000000, "value": 1.0 },
  { "timestamp": 1620000000001, "value": 2.0 },
  {
    "timestamp": 1620000000002,
    "value": 3.0,
    "status": { "code": 3145728, "symbol": "GoodClamped" }
  },
  {
    "timestamp": 1620000000003,
    "value": 4.0,
    "status": { "code": 1073741824, "symbol": "Uncertain" }
  }
]

List all data points

This example passes additional parameters to include both the uncertain and the bad data points.

json = {
    "items": [
        {
            "externalId": "outside-temperature",
            "includeStatus": True,
            "ignoreBadDataPoints": False,
        }
    ]
}
r = client.post(
    f"/api/v1/projects/{client.config.project}/timeseries/data/list",
    json=json,
    headers={"cdf-version": "20230101-beta"},
)
assert r.status_code == 200
datapoints = r.json()["items"][0]["datapoints"]
datapoints

Where:

  • ignoreBadDataPoints: False denotes that the API should also return bad data points. Because the API treats uncertain data points as bad by default, this parameter includes both uncertain and bad data points.

Using the example data from above, the response looks like this:

[
  { "timestamp": 1620000000000, "value": 1.0 },
  { "timestamp": 1620000000001, "value": 2.0 },
  {
    "timestamp": 1620000000002,
    "value": 3.0,
    "status": { "code": 3145728, "symbol": "GoodClamped" }
  },
  {
    "timestamp": 1620000000003,
    "value": 4.0,
    "status": { "code": 1073741824, "symbol": "Uncertain" }
  },
  {
    "timestamp": 1620000000004,
    "value": 5.0,
    "status": { "code": 2153809152, "symbol": "BadBrowseNameInvalid, StructureChanged, Low" }
  }
]

Get aggregate values using status codes

Get multiple aggregates from good data points

The example below requests six aggregates. The first three are simple counts of the data points with each status category present; the average, min, and max are calculated from the good data points only.

import pandas as pd

json = {
    "items": [
        {
            "externalId": "outside-temperature",
            "granularity": "1s",
            "aggregates": ["count", "countBad", "countUncertain", "average", "min", "max"],
        }
    ]
}
r = client.post(
    f"/api/v1/projects/{client.config.project}/timeseries/data/list",
    json=json,
    headers={"cdf-version": "20230101-beta"},
)
assert r.status_code == 200
datapoints = r.json()["items"][0]["datapoints"]
pd.DataFrame(datapoints)

Using the example data from above, the response looks like this:

[Image: multiple aggregates from good data points]
note

The example doesn't include parameters to return the uncertain or bad status codes. Therefore, the average calculations are only performed against values with a good status code.

Get multiple aggregates treating uncertain data points as good

json = {
    "items": [
        {
            "externalId": "outside-temperature",
            "granularity": "1s",
            "aggregates": ["count", "countBad", "countUncertain", "average", "min", "max"],
            "treatUncertainAsBad": False,
        }
    ]
}
r = client.post(
    f"/api/v1/projects/{client.config.project}/timeseries/data/list",
    json=json,
    headers={"cdf-version": "20230101-beta"},
)
assert r.status_code == 200
datapoints = r.json()["items"][0]["datapoints"]
pd.DataFrame(datapoints)

Using the example data from above, the response looks like this:

[Image: multiple aggregates treating uncertain data points as good]

Treating the uncertain data point as good increases the total count to four. Notice that the average and max values have also changed because of the additional data point in the calculation.

tip

For more information, see the time series API documentation and the OPC-UA extractor documentation.

Best practices

Use the tips and best practices below to increase throughput and query performance.

Message size/batching

Send many data points from the same time series in the same request. If you have room for more data points in the request, you can add more time series.

We recommend batch sizes of up to 100,000 numeric data points, or 10,000-100,000 string data points depending on the string length (around 1 MB per request is a good target).

For each time series, group the data points in time. Ideally, different requests shouldn't cover overlapping time ranges for the same time series. If data changes, update the existing ranges.
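A sketch of batched insertion along these lines, posting to the data points insert endpoint with the raw Python client (the batch size and helper are our choice):

BATCH_SIZE = 100_000  # up to ~100k numeric data points per request

def insert_in_batches(external_id: str, datapoints: list) -> None:
    """Insert time-ordered data points in batches, one time series per request."""
    for i in range(0, len(datapoints), BATCH_SIZE):
        batch = datapoints[i : i + BATCH_SIZE]
        client.post(
            f"/api/v1/projects/{client.config.project}/timeseries/data",
            json={"items": [{"externalId": external_id, "datapoints": batch}]},
        )

# Data points are {"timestamp": ms, "value": v} objects, already sorted by time.
points = [{"timestamp": 1620000000000 + i * 1000, "value": float(i)} for i in range(250_000)]
insert_in_batches("outside-temperature", points)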

Error handling

There are three error categories.

429 - Too many requests

To ensure that the API isn't overloaded and has enough capacity for other users on the server, you receive a 429 error code when you have too many concurrent requests to the API.

If you receive this error, reduce the number of concurrent threads or retry with capped exponential backoff. Try again after, for instance, 1 second. If you still get the error, try again after 2 seconds, followed by 4, 8, 10, 10 seconds (cap = 10 seconds).

You can try to scale up when you're no longer rate-limited with a 429.
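A minimal sketch of this retry strategy around the raw client (the retry limit is our choice, and we assume client.post returns a response without raising, as in the snippets above):

import time

def post_with_backoff(path: str, body: dict, max_retries: int = 8, cap: float = 10.0):
    """Retry 429 and 5xx responses with capped exponential backoff."""
    delay = 1.0
    for _ in range(max_retries):
        r = client.post(path, json=body)
        if r.status_code != 429 and r.status_code < 500:
            return r  # success, or a 4xx error that a retry won't fix
        time.sleep(delay)            # 1, 2, 4, 8, 10, 10, ... seconds
        delay = min(delay * 2, cap)  # cap the delay at 10 seconds
    raise RuntimeError("request still failing after retries")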

NOTE

There's no fixed limit of requests before you receive 429. The limit depends on the complexity of the requests, other users on the server, and service performance.

5xx - Server errors

If you receive these errors, reduce the number of concurrent threads or retry with capped exponential backoff as described for 429 errors.

You may have found a bug if you receive repeated 500 - Internal Server Error. Notify support@cognite.com and include the request ID in the error message. You can scale up to regular operations when you no longer get the 5xx errors.

4xx - User errors

Usually, you don't want to retry 4xx errors. Instead, ensure that you're using the API correctly, that you've signed in with the correct user, and that the resources you try to access haven't been modified in a separate request.

Retries and idempotency

Most of the API endpoints, including all datapoint requests, are idempotent. You can send the same request several times without any effect beyond the first request.

When you modify or delete time series, subsequent requests can fail harmlessly if the references have become invalid.

The only endpoint that's not idempotent is creating a time series without an externalId: each successful request generates a new time series. The recommendation is to always use externalId on your time series.
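For instance, a create request with an externalId is safe to retry, because externalId is unique per project (a sketch; the externalId and name are from the examples above):

body = {"items": [{"externalId": "outside-temperature", "name": "Outside temperature"}]}
client.post(f"/api/v1/projects/{client.config.project}/timeseries", json=body)
# Retrying this exact request can't create a duplicate series; the API
# rejects a second attempt with a duplicate error for the existing externalId.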

NOTE

We always apply the complete request or nothing at all. A 200 response indicates that we applied the complete request.