
Data sampling and validation for time series input

The data sampling configuration controls how a routine fetches and validates its time series input data. The system uses several time-related concepts to manage this process:

  • Run time: User-defined. Defaults to the time when the simulation is triggered or when the schedule is executed.
  • Validation window: User-defined and only relevant if data validation is enabled. Defines the time range before Run Time in which the system searches for valid time ranges to sample process data.
  • Validation end: Calculated and equal to the Run Time.
  • Validation start: Calculated as Validation End minus Validation Window.
  • Sampling window: User-defined and only relevant if data sampling is enabled. Defines the length of the time range from which process data is sampled.
  • Sampling end: Equal to Run Time if data validation is not enabled. Otherwise, it is determined from the valid time ranges found during data validation.
  • Sampling start: Calculated as Sampling End minus Sampling Window.
  • Simulation time: The midpoint of the Sampling Window if data sampling is enabled. Otherwise, equal to Run Time.
  • Start time: The time when the connector started running the simulation.
  • End time: The time when the connector finished running the simulation.
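The relationships between these times can be sketched in Python. This is a minimal illustration under stated assumptions, not the connector's implementation: `TimeConfig`, `validation_range`, `sampling_range`, and `simulation_time` are hypothetical names, and a disabled feature is assumed to be represented by a window of `None`.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class TimeConfig:
    """Hypothetical container for the user-defined time settings."""
    run_time: datetime
    sampling_window: Optional[timedelta] = None    # None: data sampling disabled (assumption)
    validation_window: Optional[timedelta] = None  # None: data validation disabled (assumption)


def validation_range(cfg: TimeConfig) -> Optional[tuple[datetime, datetime]]:
    """Validation End = Run Time; Validation Start = Validation End - Validation Window."""
    if cfg.validation_window is None:
        return None
    end = cfg.run_time
    return end - cfg.validation_window, end


def sampling_range(cfg: TimeConfig,
                   sampling_end: Optional[datetime] = None) -> Optional[tuple[datetime, datetime]]:
    """Sampling End = Run Time unless data validation supplied another end;
    Sampling Start = Sampling End - Sampling Window."""
    if cfg.sampling_window is None:
        return None
    end = cfg.run_time if sampling_end is None else sampling_end
    return end - cfg.sampling_window, end


def simulation_time(cfg: TimeConfig, sampling_end: Optional[datetime] = None) -> datetime:
    """Midpoint of the sampling window if sampling is enabled, otherwise Run Time."""
    window = sampling_range(cfg, sampling_end)
    if window is None:
        return cfg.run_time
    start, end = window
    return start + (end - start) / 2


cfg = TimeConfig(run_time=datetime(2024, 1, 1, 12), sampling_window=timedelta(hours=1))
print(simulation_time(cfg))  # 2024-01-01 11:30:00
```

When data validation is enabled, the `sampling_end` argument stands for the end of the valid range that validation selected; otherwise it is left out and Run Time is used.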

There are several options for data sampling:

  1. If data sampling is disabled, no data validation is performed, Simulation Time is equal to Run Time, and the latest datapoint before Run Time is used for each input time series.

[Image: Data sampling disabled]

  2. If data sampling is enabled without validation, the system samples data between Sampling Start and Run Time (which is also the Sampling End) and uses the average value inside the Sampling Window. Simulation Time is the midpoint of the Sampling Window.

[Image: Data sampling without validation]

  3. If both data sampling and validation are enabled, the system searches the Validation Window for a continuous valid time range at least as long as the Sampling Window. The most recent stretch of that length within the valid ranges becomes the Sampling Window, and the input time series are sampled from it as in option 2.

[Image: Data sampling with validation]
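The three options can be sketched for a single input time series as follows. The sketch assumes datapoints are `(timestamp, value)` pairs sorted by time, treats the window as inclusive on both ends, and uses a plain arithmetic mean as the "average value"; the actual connector may use a different aggregate (for example, a time-weighted average). All function names are hypothetical, and `sampling_end` stands for the end of the range chosen by data validation in option 3.

```python
from datetime import datetime, timedelta
from statistics import mean


def latest_before(points, run_time):
    """Option 1: value of the latest datapoint at or before Run Time.
    `points` is a list of (timestamp, value) pairs sorted by timestamp."""
    values = [value for ts, value in points if ts <= run_time]
    if not values:
        raise ValueError("no datapoints at or before run time")
    return values[-1]


def window_average(points, start, end):
    """Options 2 and 3: plain arithmetic mean of datapoints in [start, end] (assumption)."""
    values = [value for ts, value in points if start <= ts <= end]
    if not values:
        raise ValueError("no datapoints inside the sampling window")
    return mean(values)


def sample_input(points, run_time, sampling_window=None, sampling_end=None):
    """Return (input value, simulation time) for one time series.

    sampling_end is only given when data validation picked a valid range;
    otherwise Sampling End equals Run Time."""
    if sampling_window is None:
        return latest_before(points, run_time), run_time
    end = run_time if sampling_end is None else sampling_end
    start = end - sampling_window
    return window_average(points, start, end), start + sampling_window / 2


t0 = datetime(2024, 1, 1, 10)
points = [(t0 + timedelta(minutes=15 * i), float(i)) for i in range(9)]  # 10:00 to 12:00
print(sample_input(points, datetime(2024, 1, 1, 12), timedelta(hours=1)))
# (6.0, datetime.datetime(2024, 1, 1, 11, 30))
```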

Data validation methods

Two data validation methods can be enabled to check for valid process conditions before selecting data for simulator runs: Logic Validation and Steady State Detection.

Logic Validation checks if a logical condition is met for a given time series. The user specifies the time series, logical operator, and value to check against. It returns a binary time series, with 1 for timestamps when the condition is met and 0 otherwise.

The image below provides an example of the Logic Validation applied to some sensor data. The validation window is 6h, the logical operator is Greater and the value is 45.0. The yellow hatched areas identify the time regions where the logical condition was met.

[Image: Logical check]
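A minimal sketch of Logic Validation, assuming the same sorted `(timestamp, value)` pairs. Only the `Greater` operator is named in the text above; the other operator names in `OPERATORS` and the function name `logic_validation` are hypothetical.

```python
import operator
from datetime import datetime

# Only "Greater" appears in the documentation; the other names are hypothetical.
OPERATORS = {
    "Greater": operator.gt,
    "GreaterOrEqual": operator.ge,
    "Less": operator.lt,
    "LessOrEqual": operator.le,
    "Equal": operator.eq,
    "NotEqual": operator.ne,
}


def logic_validation(points, operator_name, value):
    """Return (timestamp, 1) where `series value <operator> value` holds, else (timestamp, 0)."""
    compare = OPERATORS[operator_name]
    return [(ts, 1 if compare(v, value) else 0) for ts, v in points]


readings = [(datetime(2024, 1, 1, hour), v) for hour, v in enumerate([40.0, 46.0, 45.0, 50.0])]
print([flag for _, flag in logic_validation(readings, "Greater", 45.0)])  # [0, 1, 0, 1]
```

Note that a reading exactly equal to the threshold (45.0) is not flagged, because `Greater` is a strict comparison.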

Steady State Detection checks for steady state regions in a given time series. The user specifies the time series and three parameters: min section size, var threshold, and slope threshold. It returns a binary time series, with 1 for timestamps where the steady state criteria are met and 0 otherwise.

The image below provides an example of the Steady State Detection applied to some sensor data. The validation window is 6h, the min section size is 15, the var threshold is 3.5 and the slope threshold is -3.0. The yellow hatched areas identify the time regions where the steady state criteria were met.

[Image: Steady state detection]
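The text does not describe the detection algorithm itself, so the following is only an illustrative stand-in, not the product's method. It splits the series into fixed, non-overlapping sections of `min_section_size` samples and flags a section as steady when its variance is below `var_threshold` and its absolute least-squares slope is below `10 ** slope_threshold`. Treating the slope threshold as a base-10 exponent is an assumption suggested by the negative value in the example (-3.0); the sketch also assumes evenly spaced samples, and the function names are hypothetical.

```python
from statistics import mean, pvariance


def section_slope(values):
    """Least-squares slope of the values against their sample index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def steady_state(values, min_section_size, var_threshold, slope_threshold):
    """Return one 0/1 flag per sample; 1 where the sample's section looks steady.

    Simplified sketch: fixed, non-overlapping sections; trailing samples that do
    not fill a whole section stay 0. The slope threshold is used as a base-10
    exponent (assumption)."""
    if min_section_size < 2:
        raise ValueError("min_section_size must be at least 2")
    flags = [0] * len(values)
    for start in range(0, len(values) - min_section_size + 1, min_section_size):
        section = values[start:start + min_section_size]
        steady = (pvariance(section) < var_threshold
                  and abs(section_slope(section)) < 10 ** slope_threshold)
        if steady:
            flags[start:start + min_section_size] = [1] * min_section_size
    return flags


signal = [50.0] * 15 + [float(i) for i in range(15)]  # flat section, then a ramp
flags = steady_state(signal, min_section_size=15, var_threshold=3.5, slope_threshold=-3.0)
print(flags[:3], flags[-3:])  # [1, 1, 1] [0, 0, 0]
```

A production detector would more likely locate section boundaries adaptively (for example, with change point detection) rather than use a fixed grid.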

With the results of the data validation methods, the sampling time can be calculated. First, the valid timestamps are identified as the intersection of all enabled data validation routines, that is, the timestamps where every enabled routine returns 1. Then the routine checks whether the valid timestamps contain a continuous time range greater than or equal to the sampling window size. The most recent time range equal in size to the sampling window is selected to sample the input variables required for the simulation.

The image below provides an example of the data validation process and the identification of the sampling time. Logic Validation and Steady State Detection are both enabled and are used to identify the valid timestamps (blue hatched areas in the third plot), which are then used to find the sampling time (green hatched area in the fourth plot).

[Image: Sampling time calculation]
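The intersection and window selection described above can be sketched as follows, assuming every enabled validation result is a 0/1 list aligned on a common, evenly spaced timestamp grid. The function names are hypothetical, the length of a valid range is measured from its first to its last valid timestamp (an assumption), and returning `None` when no range is long enough is also an assumption, since the text does not say what happens in that case.

```python
from datetime import datetime, timedelta


def combine_masks(*masks):
    """Intersection of all enabled validation results: 1 only where every mask is 1."""
    return [1 if all(flag == 1 for flag in flags) else 0 for flags in zip(*masks)]


def valid_runs(timestamps, mask):
    """List (first, last) timestamps of each continuous run of valid samples."""
    runs = []
    first = last = None
    for ts, ok in zip(timestamps, mask):
        if ok:
            if first is None:
                first = ts
            last = ts
        elif first is not None:
            runs.append((first, last))
            first = None
    if first is not None:
        runs.append((first, last))
    return runs


def select_sampling_range(timestamps, mask, sampling_window):
    """Most recent (start, end) of length sampling_window inside a valid run, or None."""
    for first, last in reversed(valid_runs(timestamps, mask)):
        if last - first >= sampling_window:
            return last - sampling_window, last
    return None


grid = [datetime(2024, 1, 1) + timedelta(minutes=10 * i) for i in range(12)]
logic = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1]  # hypothetical Logic Validation output
ssd = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1]    # hypothetical Steady State Detection output
valid = combine_masks(logic, ssd)
print(select_sampling_range(grid, valid, timedelta(minutes=30)))
# (datetime.datetime(2024, 1, 1, 0, 20), datetime.datetime(2024, 1, 1, 0, 50))
```

In this example the two most recent valid runs are too short for a 30-minute window, so the earlier, longer run is used. The end of the returned range plays the role of Sampling End, and the Simulation Time is the midpoint of that range.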