
Objective

Time series resampling involves interpolating new values for time steps not available in the original time series. This operation is useful to:

  • Transform irregular time series into regular ones.

  • Align time series with different temporal resolutions.

  • Increase (upsampling) or decrease (downsampling) the temporal resolution of a time series.
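The first use case, turning an irregular series into a regular one, can be sketched with base R alone. This sketch uses `stats::approx()` for linear interpolation on numeric time and a made-up four-point series. It only illustrates the idea and is not how `zoo_resample()` is implemented.

```r
# Hypothetical irregular series: dates with uneven gaps and their values
time_irregular <- as.Date(c("2020-01-01", "2020-01-04", "2020-01-05", "2020-01-11"))
values <- c(1, 4, 5, 11)

# Regular daily time covering the same range (no extrapolation)
time_regular <- seq.Date(from = min(time_irregular), to = max(time_irregular), by = "day")

# Linear interpolation on numeric time (days since 1970-01-01)
values_regular <- stats::approx(
  x = as.numeric(time_irregular),
  y = values,
  xout = as.numeric(time_regular)
)$y

data.frame(time = time_regular, value = values_regular)
```

Passing the time of a second series as `xout` instead of `time_regular` would, under the same assumptions, align the first series to the second.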

However, time series resampling should not be used to extrapolate new values outside the original time range of the time series, or to increase its resolution by a factor of two or more. These operations are known to produce nonsensical results.
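Extrapolation is easy to avoid with interpolation tools that return `NA` outside the observed range. In this small sketch, `stats::approx()` with `rule = 1` (its default) declines to extrapolate. The data are invented for illustration, and the sketch makes no claim about how `zoo_resample()` handles such times internally.

```r
# Original observations span 2020-01-01 to 2020-01-10
time_old <- as.Date(c("2020-01-01", "2020-01-05", "2020-01-10"))
values_old <- c(10, 14, 12)

# Requested times: one before, one inside, one after the original range
time_new <- as.Date(c("2019-12-25", "2020-01-03", "2020-01-20"))

# rule = 1 (the default) returns NA instead of extrapolating
values_new <- stats::approx(
  x = as.numeric(time_old),
  y = values_old,
  xout = as.numeric(time_new),
  rule = 1
)$y
values_new
#> [1] NA 12 NA
```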

Methods

This function offers three methods for time series interpolation:

  • "linear" (default): interpolation via piecewise linear regression as implemented in zoo::na.approx().

  • "spline": cubic smoothing spline regression as implemented in stats::smooth.spline().

  • "loess": local polynomial regression fitting as implemented in stats::loess().

These methods are used to fit models y ~ x, where y represents the values of a univariate time series and x represents a numeric version of its time.
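The three model types can be illustrated by fitting y ~ x on numeric time and predicting over a new regular time. This is a minimal sketch under these assumptions: the data are simulated, `stats::approx()` stands in for `zoo::na.approx()`, and the spline and loess models use their built-in defaults rather than the complexity optimization described below.

```r
set.seed(1)

# Hypothetical irregular series: numeric time (days) and noisy values
x_time <- sort(sample(1:200, 40))
y <- sin(x_time / 20) + rnorm(length(x_time), sd = 0.1)

# Regular new time inside the original range
new_x <- seq(from = min(x_time), to = max(x_time), by = 5)

# "linear": piecewise linear interpolation (stats::approx stands in for zoo::na.approx)
y_linear <- stats::approx(x = x_time, y = y, xout = new_x)$y

# "spline": cubic smoothing spline; df chosen here by smooth.spline's own GCV
m_spline <- stats::smooth.spline(x = x_time, y = y)
y_spline <- stats::predict(m_spline, x = new_x)$y

# "loess": local polynomial regression with the default span (0.75)
m_loess <- stats::loess(y ~ x, data = data.frame(x = x_time, y = y))
y_loess <- stats::predict(m_loess, newdata = data.frame(x = new_x))
```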

The functions utils_optimize_spline() and utils_optimize_loess() are used under the hood to optimize the complexity of the methods "spline" and "loess" by finding the configuration that minimizes the root mean squared error (RMSE) between observed and predicted y. However, when the argument max_complexity = TRUE, the complexity optimization is ignored, and a maximum complexity model is used instead.
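The idea of choosing model complexity by RMSE can be sketched for the spline case. The helper `optimize_spline_sketch()` below is hypothetical and is not the package's `utils_optimize_spline()`. In-sample RMSE tends to shrink as `df` grows, so the sketch assumes the error is measured on held-out interior points. The df grid and the hold-out scheme are also assumptions, and the package's actual criterion may differ.

```r
# Hypothetical helper, NOT distantia's utils_optimize_spline():
# picks the smooth.spline() df with the lowest RMSE on held-out points
optimize_spline_sketch <- function(x, y, df_grid = 3:15, holdout_every = 4) {
  # every holdout_every-th interior observation is held out (ends stay in training)
  n <- length(x)
  test <- seq(from = 2, to = n - 1, by = holdout_every)
  train <- setdiff(seq_len(n), test)

  # RMSE between held-out observations and predictions for each df
  rmse <- vapply(df_grid, function(df) {
    m <- stats::smooth.spline(x = x[train], y = y[train], df = df)
    pred <- stats::predict(m, x = x[test])$y
    sqrt(mean((y[test] - pred)^2))
  }, numeric(1))

  best_df <- df_grid[which.min(rmse)]

  # refit on all observations with the selected df
  list(
    df = best_df,
    rmse = rmse,
    model = stats::smooth.spline(x = x, y = y, df = best_df)
  )
}

set.seed(2)
x <- sort(runif(60, 0, 100))
y <- sin(x / 10) + rnorm(60, sd = 0.2)
fit <- optimize_spline_sketch(x, y)
fit$df
```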

New time

The argument new_time offers several alternatives to help define the new time of the resulting time series:

  • NULL: the target time series (x) is resampled to a regular time vector spanning its original time range, with the same number of observations.

  • zoo object: a zoo object used as a template for resampling. Useful to equalize the frequency of two separate zoo objects.

  • time vector: a time vector of a class compatible with the time in x.

  • keyword: a character string defining a resampling keyword, obtained via zoo_time(x, keywords = "resample")$keywords.

  • numeric: a single number representing the desired interval between consecutive samples in the units of x (relevant units can be obtained via zoo_time(x)$units).
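Three of these alternatives (NULL, a single number, and a time vector) can be mapped to a concrete time vector as follows. This is a sketch that assumes a series indexed by `Date` (units: days). `resolve_new_time_sketch()` is a hypothetical helper, and the zoo-template and keyword options are left out because they depend on zoo and distantia.

```r
# Hypothetical helper, NOT part of distantia: turns a `new_time` argument into
# a Date vector for a series whose time is of class Date (units: days)
resolve_new_time_sketch <- function(time, new_time = NULL) {
  if (is.null(new_time)) {
    # NULL: regular time over the original range, same number of observations
    return(seq.Date(from = min(time), to = max(time), length.out = length(time)))
  }
  if (is.numeric(new_time) && length(new_time) == 1) {
    # single number: interval between consecutive samples, in days
    return(seq.Date(from = min(time), to = max(time), by = new_time))
  }
  # time vector: kept as-is here (trimming to the original range happens later)
  as.Date(new_time)
}

time <- as.Date(c("2020-01-01", "2020-01-03", "2020-01-09"))
resolve_new_time_sketch(time)
#> [1] "2020-01-01" "2020-01-05" "2020-01-09"
resolve_new_time_sketch(time, new_time = 3)
#> [1] "2020-01-01" "2020-01-04" "2020-01-07"
```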

Step by Step

The steps to resample a zoo object are:

  1. The time interpolation range is taken from the index of the zoo object. This step ensures that no extrapolation occurs during resampling.

  2. If new_time is provided, any values of new_time outside of the minimum and maximum interpolation times are removed to avoid extrapolation. If new_time is not provided, a regular time within the interpolation time range of the zoo object is generated.

  3. For each univariate time series, a model y ~ x is fitted, where y is the time series and x is its own time coerced to numeric.

    • If max_complexity = FALSE and method is "spline" or "loess", the model whose complexity minimizes the root mean squared error between the observed and predicted y is returned.

    • If max_complexity = TRUE and method is "spline" or "loess", the first valid model closest to maximum complexity is returned.

  4. The fitted model is predicted over new_time to generate the resampled time series.
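The four steps above can be sketched for the "linear" method on a plain numeric time vector. The function `resample_steps_sketch()` is hypothetical. It uses `stats::approxfun()` as the fitted y ~ x model in place of `zoo::na.approx()` and works without zoo objects, so treat it as an outline of the logic rather than the package's code.

```r
# Hypothetical sketch of the resampling steps for the "linear" method,
# using a plain numeric time vector instead of a zoo object
resample_steps_sketch <- function(time, values, new_time = NULL) {
  # 1. interpolation range from the original time
  time_range <- range(time)

  # 2. build a regular time if NULL, otherwise drop times outside the range
  if (is.null(new_time)) {
    new_time <- seq(from = time_range[1], to = time_range[2], length.out = length(time))
  } else {
    new_time <- new_time[new_time >= time_range[1] & new_time <= time_range[2]]
  }

  # 3. "fit" y ~ x on numeric time (linear interpolation function)
  model <- stats::approxfun(x = as.numeric(time), y = values)

  # 4. predict over the new time
  data.frame(time = new_time, value = model(as.numeric(new_time)))
}

resample_steps_sketch(
  time = c(0, 1, 4, 10),
  values = c(0, 2, 8, 20),
  new_time = c(-2, 0, 2.5, 5, 12)
)
```

Here the requested times -2 and 12 fall outside the original range and are dropped in step 2, so only three rows are returned.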

Other Details

Please use this operation with care, as there are limits to how much resampling can be done without distorting the data. The safest option is to keep the distance between new time points within the same order of magnitude as the distance between the old time points.
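One way to follow this advice is to compare the typical old and new sampling intervals before resampling. `check_resampling_ratio()` below is a hypothetical check, not part of distantia, and the threshold of 10 (roughly one order of magnitude) is an assumption.

```r
# Hypothetical check, not part of distantia: compares the new sampling interval
# with the typical original interval and warns if they differ by more than
# a factor of `max_ratio` (10 here, i.e. roughly one order of magnitude)
check_resampling_ratio <- function(old_time, new_time, max_ratio = 10) {
  old_step <- stats::median(diff(as.numeric(old_time)))
  new_step <- stats::median(diff(as.numeric(new_time)))
  ratio <- old_step / new_step
  if (ratio > max_ratio || ratio < 1 / max_ratio) {
    warning("New interval differs from the original by more than a factor of ", max_ratio)
  }
  ratio
}

# Monthly observations resampled to weekly: ratio of about 4.4, no warning
old_time <- as.Date(c("2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"))
check_resampling_ratio(old_time, seq.Date(min(old_time), max(old_time), by = "week"))
#> [1] 4.428571
```

Resampling the same monthly series to daily steps gives a ratio of 31 and triggers the warning.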

Usage

zoo_resample(
  x = NULL,
  new_time = NULL,
  method = "linear",
  max_complexity = FALSE
)

Arguments

x

(required, zoo object) Time series to resample. Default: NULL

new_time

(optional, zoo object, time vector, keyword, or numeric) New time to resample x to. The available options are:

  • NULL: a regular version of the time in x is generated and used for resampling.

  • zoo object: the index of the given zoo object is used as template to resample x.

  • time vector: a vector with new times to resample x to. If the time in x is of class "numeric", this vector must be numeric as well. Otherwise, vectors of classes "Date" and "POSIXct" can be used interchangeably.

  • keyword: a valid keyword returned by zoo_time(x)$keywords, used to generate a time vector with the relevant units.

  • numeric of length 1: interpreted as new time interval, in the highest resolution units returned by zoo_time(x)$units.
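The reason "Date" and "POSIXct" vectors can stand in for each other is that both map onto a numeric time scale. The following base R snippet shows that a `Date` counts days and a `POSIXct` counts seconds, and that converting between them places both on the same scale. How `zoo_resample()` reconciles the two classes internally is not shown here.

```r
# A Date and a POSIXct (UTC) for the same instant
d <- as.Date("2010-01-01")
p <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")

# Date counts days since 1970-01-01; POSIXct counts seconds
as.numeric(d)
#> [1] 14610
as.numeric(p)
#> [1] 1262304000

# Converting the Date to POSIXct puts both on the same scale
identical(as.numeric(as.POSIXct(d)), as.numeric(p))
#> [1] TRUE
```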

method

(optional, character string) Name of the method to resample the time series. One of "linear", "spline" or "loess". Default: "linear".

max_complexity

(required, logical) Only relevant for methods "spline" and "loess". If TRUE, model optimization is ignored, and a model of maximum complexity (an overfitted model) is used for resampling. Default: FALSE
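One reading of "the first valid model closest to maximum complexity" can be sketched for loess. The sketch tries spans from small (most flexible) to large (smoothest) and keeps the first model that fits without errors or warnings. `max_complexity_loess_sketch()` and its span grid are hypothetical, and the package's internal search may work differently.

```r
# Hypothetical helper, NOT distantia's internal code: returns the first loess
# model that fits without errors or warnings, trying spans from small
# (most complex) to large (smoothest)
max_complexity_loess_sketch <- function(x, y, spans = seq(0.05, 1, by = 0.05)) {
  data <- data.frame(x = x, y = y)
  for (span in spans) {
    # any warning or error marks the model as invalid and moves to the next span
    model <- tryCatch(
      stats::loess(y ~ x, data = data, span = span),
      error = function(e) NULL,
      warning = function(w) NULL
    )
    if (!is.null(model)) return(model)
  }
  NULL
}

set.seed(3)
x <- sort(runif(50, 0, 10))
y <- cos(x) + rnorm(50, sd = 0.1)
m <- max_complexity_loess_sketch(x, y)
m$pars$span
```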

Value

zoo object

Examples

#simulate irregular time series
x <- zoo_simulate(
  cols = 2,
  rows = 50,
  time_range = c("2010-01-01", "2020-01-01"),
  irregular = TRUE
)

#plot time series
if(interactive()){
  zoo_plot(x)
}

#intervals between samples
x_intervals <- diff(zoo::index(x))
x_intervals
#> Time differences in days
#>  [1] 184.44444  36.88889 147.55556  36.88889  73.77778  36.88889 147.55556
#>  [8]  36.88889  36.88889 184.44444  36.88889  36.88889  36.88889  36.88889
#> [15]  36.88889 110.66667  36.88889  36.88889  73.77778  73.77778  36.88889
#> [22]  36.88889 110.66667 221.33333  36.88889  36.88889  36.88889 110.66667
#> [29]  36.88889  36.88889  36.88889  36.88889  73.77778 147.55556 184.44444
#> [36]  36.88889  36.88889  73.77778  36.88889  36.88889  73.77778  36.88889
#> [43] 110.66667  73.77778  36.88889 147.55556  73.77778  36.88889 147.55556

#create regular time from the minimum of the observed intervals
new_time <- seq.Date(
  from = min(zoo::index(x)),
  to = max(zoo::index(x)),
  by = floor(min(x_intervals))
)

new_time
#>   [1] "2010-03-15" "2010-04-20" "2010-05-26" "2010-07-01" "2010-08-06"
#>   [6] "2010-09-11" "2010-10-17" "2010-11-22" "2010-12-28" "2011-02-02"
#>  [11] "2011-03-10" "2011-04-15" "2011-05-21" "2011-06-26" "2011-08-01"
#>  [16] "2011-09-06" "2011-10-12" "2011-11-17" "2011-12-23" "2012-01-28"
#>  [21] "2012-03-04" "2012-04-09" "2012-05-15" "2012-06-20" "2012-07-26"
#>  [26] "2012-08-31" "2012-10-06" "2012-11-11" "2012-12-17" "2013-01-22"
#>  [31] "2013-02-27" "2013-04-04" "2013-05-10" "2013-06-15" "2013-07-21"
#>  [36] "2013-08-26" "2013-10-01" "2013-11-06" "2013-12-12" "2014-01-17"
#>  [41] "2014-02-22" "2014-03-30" "2014-05-05" "2014-06-10" "2014-07-16"
#>  [46] "2014-08-21" "2014-09-26" "2014-11-01" "2014-12-07" "2015-01-12"
#>  [51] "2015-02-17" "2015-03-25" "2015-04-30" "2015-06-05" "2015-07-11"
#>  [56] "2015-08-16" "2015-09-21" "2015-10-27" "2015-12-02" "2016-01-07"
#>  [61] "2016-02-12" "2016-03-19" "2016-04-24" "2016-05-30" "2016-07-05"
#>  [66] "2016-08-10" "2016-09-15" "2016-10-21" "2016-11-26" "2017-01-01"
#>  [71] "2017-02-06" "2017-03-14" "2017-04-19" "2017-05-25" "2017-06-30"
#>  [76] "2017-08-05" "2017-09-10" "2017-10-16" "2017-11-21" "2017-12-27"
#>  [81] "2018-02-01" "2018-03-09" "2018-04-14" "2018-05-20" "2018-06-25"
#>  [86] "2018-07-31" "2018-09-05" "2018-10-11" "2018-11-16" "2018-12-22"
#>  [91] "2019-01-27" "2019-03-04" "2019-04-09" "2019-05-15" "2019-06-20"
#>  [96] "2019-07-26" "2019-08-31" "2019-10-06" "2019-11-11" "2019-12-17"
diff(new_time)
#> Time differences in days
#>  [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36

#resample using piecewise linear regression
x_linear <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "linear"
)

#resample using max complexity splines
x_spline <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "spline",
  max_complexity = TRUE
)
#> Warning: Class of argument 'x' must be one of 'POSIXct', 'Date', 'numeric', 'integer'.

#resample using max complexity loess
x_loess <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "loess",
  max_complexity = TRUE
)
#> Warning: Class of argument 'x' must be one of 'POSIXct', 'Date', 'numeric', 'integer'.


#intervals between new samples
diff(zoo::index(x_linear))
#> Time differences in days
#>  [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
diff(zoo::index(x_spline))
#> Time differences in days
#>  [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
diff(zoo::index(x_loess))
#> Time differences in days
#>  [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36

#plotting results
if(interactive()){

  par(mfrow = c(4, 1), mar = c(3,3,2,2))

  zoo_plot(
    x,
    guide = FALSE,
    title = "Original"
  )

  zoo_plot(
    x_linear,
    guide = FALSE,
    title = "Method: linear"
  )

  zoo_plot(
    x_spline,
    guide = FALSE,
    title = "Method: spline"
  )

  zoo_plot(
    x_loess,
    guide = FALSE,
    title = "Method: loess"
  )

}