Time series

In Cognite Data Fusion, a time series is the resource type for indexing a series of data points in time order. Examples of time series are the temperature of a water pump asset, the monthly precipitation at a location, and the daily average number of manufacturing defects.

About time series

An asset can have several time series connected to it. For example, a water pump asset can have time series that measure the pump temperature, the pressure within the pump, rpm, flow volume, power consumption, and more.

Time series can be analyzed and visualized to draw inferences from the data, for example to identify trends, seasonal movements, and random fluctuations. Other common uses of time series analysis include forecasting future values, for example to schedule maintenance, and controlling the series by adjusting parameters, for example to optimize the performance of equipment.

A data point is a piece of information associated with a specific time, stored as a numerical or string value. Data points are identified by their timestamps, defined in milliseconds since the Unix epoch. We do not support fractional milliseconds and do not count leap seconds.

Use the isString flag on the time series object to specify whether the data points in a time series are stored as numerical values or as string values.

  • Numerical data points can be aggregated to reduce the amount of data transferred in query responses and improve performance. You can specify one or more aggregates (for example average, minimum and maximum) and also the time granularity for the aggregates (for example 1h for one hour).

    See Aggregating time series data to learn more about how Cognite Data Fusion aggregates and interpolates time series data, and see the details about the available aggregation functions.

  • String data points can store arbitrary information, such as states (for example open or closed) or more complex information in JSON format. String data points cannot be aggregated by Cognite Data Fusion.

Cognite Data Fusion stores discrete data points, but the underlying process measured by the data points can vary continuously. The isStep flag on the time series object controls how to interpolate between data points: each value stays the same until the next measurement (isStep), or the value changes linearly between the two measurements (not isStep).
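
For illustration, here's a minimal sketch of creating a numerical, stepped time series with these flags set, using the create time series endpoint (the project name, externalId, and name below are placeholders):

    POST /api/v1/projects/<project>/timeseries
    Host: api.cognitedata.com
    api-key: <key>
    Content-Type: application/json
    {
      "items": [
        {
          "externalId": "pump-pressure",
          "name": "Pump pressure",
          "isString": false,
          "isStep": true
        }
      ]
    }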

TIP

See the time series API documentation for more information about how to work with time series.

Get the data points from a time series

You can get data points from a time series by using either the externalId or the id of the time series.

  1. To get data points from a time series by using the externalId, in this case outside-temperature, enter:

      POST /api/v1/projects/publicdata/timeseries/data/list
      Host: api.cognitedata.com
      api-key: <key>
      Content-Type: application/json
      content-length: 99
      {
        "items": [
          {
            "limit": 5,
            "externalId": "outside-temperature"
          }
        ]
      }
    

    The response will look similar to this:

    {
      "items": [
        {
          "isString": false,
          "id": 44435358976768,
          "externalId": "outside-temperature",
          "datapoints": [
            {
              "timestamp": 1349732232902,
              "value": 31.62889862060547
            },
            {
              "timestamp": 1349732244888,
              "value": 31.59380340576172
            },
            {
              "timestamp": 1349732245888,
              "value": 31.62889862060547
            },
            {
              "timestamp": 1349732258888,
              "value": 31.59380340576172
            },
            {
              "timestamp": 1349732259888,
              "value": 31.769287109375
            }
          ]
        }
      ]
    }
    
Get aggregate values between two points in time

To visualize or analyze a longer time period, you can extract the aggregate values between two points in time. See Retrieve data points for valid aggregate functions and granularities.

  1. For example, to return the average aggregate with a granularity of 1 hour, limited to 5 aggregate data points, for the outside-temperature time series, enter:

      POST /api/v1/projects/publicdata/timeseries/data/list
      Host: api.cognitedata.com
      api-key: <api-key>
      Content-Type: application/json
      {
        "items": [
          {
            "limit": 5,
            "externalId": "outside-temperature",
            "aggregates": ["average"],
            "granularity": "1h",
            "start": 1541424400000,
            "end":"now"
          }
        ]
      }
    

    The response will look similar to this:

    {
      "items": [
        {
          "id": 44435358976768,
          "externalId": "outside-temperature",
          "datapoints": [
            {
              "timestamp": 1541422800000,
              "average": 26.3535328292538
            },
            {
              "timestamp": 1541426400000,
              "average": 26.34716274449083
            },
            {
              "timestamp": 1541430000000,
              "average": 26.35558703492914
            },
            {
              "timestamp": 1541433600000,
              "average": 26.36287845690146
            },
            {
              "timestamp": 1541437200000,
              "average": 26.36948613080317
            }
          ]
        }
      ]
    }
    

Synthetic time series

You can combine input time series, constants, and operators to create new synthetic time series.

You can, for example, use the expression 24 * TS{externalId='production/hour'} to convert from hourly to daily production rates.

You can also combine time series, for example TS{id=123} + TS{externalId='my_external_id'}, apply functions to time series, for example sin(pow(TS{id=123}, 2)), and aggregate time series, for example TS{id=123, aggregate='average', granularity='1h'} + TS{id=456}. See below for a list of supported functions and aggregates.
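
For example, here's a minimal sketch of evaluating an expression with the synthetic time series query endpoint, reusing the production series from above (the time range and limit are illustrative):

    POST /api/v1/projects/publicdata/timeseries/synthetic/query
    Host: api.cognitedata.com
    api-key: <key>
    Content-Type: application/json
    {
      "items": [
        {
          "expression": "24 * TS{externalId='production/hour'}",
          "start": 1541424400000,
          "end": "now",
          "limit": 100
        }
      ]
    }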

TIP

See the synthetic time series API documentation for more information about how to work with synthetic time series.

Supported functions

Synthetic time series expressions support:

  • Inputs with internal or external ID. The external ID must be surrounded by quotes (single or double).
  • Mathematical operators +, -, *, /.
  • Grouping with brackets ().
  • Trigonometric functions sin(x) and cos(x), and the constant pi().
  • ln(x), pow(base, exponent), sqrt(x), exp(x), abs(x).
  • Variable-length functions: max(x1, x2, ...), min(...), avg(...).
  • round(x, decimals), where -10 < decimals < 10.
  • on_error(expression, default), for handling errors like overflow or division by zero.
  • map(expression, [list of strings to map from], [list of values to map to], default).

Supported aggregates

You define aggregates similarly to time series inputs, but aggregates take two extra parameters: aggregate and granularity. Aggregate must be one of interpolation, stepinterpolation, or average. Granularity must be of the form NG, where N is a number and G is s, m, h, or d (second, minute, hour, or day).

You must enclose the value of aggregate and granularity in quotes.
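
For example, the following expression adds the hourly averages of the two time series from the example above; note the quotes around both parameter values:

    TS{id=123, aggregate='average', granularity='1h'} + TS{id=456, aggregate='average', granularity='1h'}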

See also: Aggregating time series data.

Output granularity

We return data points at any point in time where:

  • Any input time series has a data point.
  • All input time series are defined (between their first and last data points, inclusive).

For example, if time series A has data at times 15, 30, 45, and 60, and time series B has data at times 30, 40, and 50, then A+B has data at 30, 40, 45, and 50.

Aggregates have data at every granularity step, with timestamps rounded to multiples of the granularity since epoch. For example, 60m aggregates have data points at 00:00, 01:00, and so on, even if the start time is 00:15. This differs from retrieving aggregate data points from the non-synthetic endpoint, where we round to multiples of the granularity unit and use an arbitrary offset.

Interpolation

If we don't have input data at a timestamp, we interpolate. As a general rule, we use linear interpolation: we find the previous and next data points and draw a straight line between them. The other case is step interpolation, where we use the value of the previous data point.

We interpolate across intervals of any length. If we have data points in 1971 and 2050, we define the time series for all timestamps in between.

Most aggregates are constant for the whole duration of a granularity window. The only exception is the interpolation aggregate on non-step time series.

String inputs

The map() function can handle time series of type string and convert strings to doubles. If, for example, a time series for a valve can have the values "OPEN" or "CLOSED", you can convert it to a number with:

map(TS{externalId='stringstate'}, ['OPEN', 'CLOSED'], [1, 0], -1)

"OPEN" is mapped to 1, "CLOSED" to 0, and everything else to -1.

Aggregates on string time series are currently not supported. All string time series are considered to be stepped time series.

Error handling

There are three possible errors:

  • TYPE_ERROR: You're using a bad type as input. For example, using a string time series with the division operator.
  • BAD_DOMAIN: You're using bad input ranges. For example, division by zero, or sqrt of a negative number.
  • OVERFLOW: The result is more than 10^100 in absolute value.

Instead of returning a value for these cases, CDF returns an error field with an error message. To avoid these errors, you can wrap the (sub)expression in the on_error() function, for example on_error(1/TS{externalId='canBeZero'}, 0). Note that, because of interpolation, such errors can occur even if none of the raw data points are zero.

Limits

In a single request, you can ask for:

  • 10 expressions (10 synthetic time series).
  • 10,000 data points, summed over all expressions.
  • 100 input time series references, summed over all expressions.
  • 2,000 tokens in each expression. This is similar to 2,000 characters, except that each word and number counts as a single token.

If you use time series aggregates as inputs and you get slow responses or a 503 error code, try again or reduce the limit. This is expected behavior when you have recently updated the data points and we need to recalculate aggregates.
