Objective
Time series resampling involves interpolating new values for time steps not available in the original time series. This operation is useful to:
- Transform irregular time series into regular ones.
- Align time series with different temporal resolutions.
- Increase (upsampling) or decrease (downsampling) the temporal resolution of a time series.
On the other hand, time series resampling should not be used to extrapolate new values outside of the original time range of the time series, or to increase the resolution of a time series by a factor of two or more. These operations are known to produce nonsensical results.
Methods
This function offers three methods for time series interpolation:
- "linear" (default): interpolation via piecewise linear regression as implemented in zoo::na.approx().
- "spline": cubic smoothing spline regression as implemented in stats::smooth.spline().
- "loess": local polynomial regression fitting as implemented in stats::loess().
These methods are used to fit models y ~ x, where y represents the values of a univariate time series and x represents a numeric version of its time.
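As a rough illustration of what fitting y ~ x with each method looks like, the sketch below uses base R only on a simulated series. It is not the package's internal code: stats::approx() stands in for zoo::na.approx(), the data are made up, and the models use default settings rather than any optimized configuration.

```r
# Toy irregular series (simulated): y observed at irregular numeric times x
set.seed(1)
x <- sort(runif(40, min = 0, max = 100))
y <- sin(x / 10) + rnorm(40, sd = 0.1)

# New regular time, kept inside the observed range (no extrapolation)
x_new <- seq(min(x), max(x), length.out = 40)

# "linear": piecewise linear interpolation
# (stats::approx() is used here as a stand-in for zoo::na.approx())
y_linear <- stats::approx(x = x, y = y, xout = x_new)$y

# "spline": cubic smoothing spline with default smoothing
m_spline <- stats::smooth.spline(x = x, y = y)
y_spline <- predict(m_spline, x = x_new)$y

# "loess": local polynomial regression with default span
m_loess <- stats::loess(y ~ x, data = data.frame(x = x, y = y))
y_loess <- predict(m_loess, newdata = data.frame(x = x_new))
```

All three vectors hold one interpolated value per element of x_new, so they can be compared side by side.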
The functions utils_optimize_spline() and utils_optimize_loess() are used under the hood to optimize the complexity of the methods "spline" and "loess" by finding the configuration that minimizes the root mean squared error (RMSE) between observed and predicted y. However, when the argument max_complexity = TRUE, the complexity optimization is ignored, and a maximum complexity model is used instead.
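The idea behind the complexity search can be sketched with a simple grid over the loess span. This is only an assumption about the general approach: the actual search performed by utils_optimize_loess() may use a different grid, different parameters, or a different validity check. The helper rmse() and the candidate spans are hypothetical.

```r
# Toy irregular series (simulated)
set.seed(1)
x <- sort(runif(40, min = 0, max = 100))
y <- sin(x / 10) + rnorm(40, sd = 0.1)
obs <- data.frame(x = x, y = y)

# Hypothetical helper: RMSE between observed and predicted y
rmse <- function(observed, predicted) {
  sqrt(mean((observed - predicted)^2))
}

# Candidate complexities: a smaller span means a more complex loess model
spans <- seq(0.2, 1, by = 0.1)

# Fit one model per span; failed or non-finite fits get RMSE = NA
errors <- vapply(spans, function(s) {
  fit <- tryCatch(
    suppressWarnings(stats::loess(y ~ x, data = obs, span = s)),
    error = function(e) NULL
  )
  if (is.null(fit)) return(NA_real_)
  pred <- stats::predict(fit)
  if (!all(is.finite(pred))) return(NA_real_)
  rmse(obs$y, pred)
}, numeric(1))

# Optimized model: span with the lowest RMSE
best_span <- spans[which.min(errors)]

# Maximum complexity: first valid model, starting from the smallest span
max_complexity_span <- spans[which(!is.na(errors))[1]]
```

Because the RMSE here is computed on the same observations used for fitting, the lowest error tends to favor the more complex candidates.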
New time
The argument new_time offers several alternatives to help define the new time of the resulting time series:
- NULL: the target time series (x) is resampled to a regular time within its original time range and number of observations.
- zoo object: a zoo object to be used as template for resampling. Useful when the objective is equalizing the frequency of two separate zoo objects.
- time vector: a time vector of a class compatible with the time in x.
- keyword: character string defining a resampling keyword, obtained via zoo_time(x, keywords = "resample")$keywords.
- numeric: a single number representing the desired interval between consecutive samples in the units of x (relevant units can be obtained via zoo_time(x)$units).
Step by Step
The steps to resample a zoo time series are:
- The time interpolation range is taken from the index of the zoo object. This step ensures that no extrapolation occurs during resampling.
- If new_time is provided, any values of new_time outside of the minimum and maximum interpolation times are removed to avoid extrapolation. If new_time is not provided, a regular time within the interpolation time range of the zoo object is generated.
- For each univariate time series, a model y ~ x, where y is the time series and x is its own time coerced to numeric, is fitted.
  - If max_complexity == FALSE and method = "spline" or method = "loess", the model with the complexity that minimizes the root mean squared error between the observed and predicted y is returned.
  - If max_complexity == TRUE and method = "spline" or method = "loess", the first valid model closest to a maximum complexity is returned.
- The fitted model is predicted over new_time to generate the resampled time series.
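The steps above can be approximated in base R for a single univariate series. This is a minimal sketch under stated assumptions, not the function's implementation: the series is simulated, plain vectors replace the zoo object, and a default loess model is used without any complexity optimization.

```r
# Hypothetical irregular series with a Date index (simulated)
set.seed(1)
time <- sort(as.Date("2010-01-01") + sample(0:3650, size = 50))
y <- cumsum(rnorm(50))

# Step 1: interpolation range taken from the time index
time_range <- range(time)

# Step 2: candidate new time; values outside the range are dropped
new_time <- seq(as.Date("2009-06-01"), as.Date("2021-01-01"), by = "3 months")
new_time <- new_time[new_time >= time_range[1] & new_time <= time_range[2]]

# Step 3: fit y ~ x, with time coerced to numeric
model_data <- data.frame(x = as.numeric(time), y = y)
model <- stats::loess(y ~ x, data = model_data)

# Step 4: predict the fitted model over the new time
y_new <- stats::predict(model, newdata = data.frame(x = as.numeric(new_time)))
resampled <- data.frame(time = new_time, y = y_new)
```

Dropping the out-of-range values in step 2 is what keeps the prediction in step 4 from extrapolating.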
Other Details
Please use this operation with care, as there are limits to the amount of resampling that can be done without distorting the data. The safest option is to keep the distance between new time points within the same order of magnitude as the distance between the old time points.
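One way to check this before resampling is to compare the typical interval of the original and the new time. The helper below, resampling_factor(), is hypothetical and not part of the package; it assumes that the median interval is a reasonable summary of each time vector.

```r
# Hypothetical helper (not part of the package): ratio between the
# median interval of the original time and that of the new time.
# Values above 1 indicate upsampling; values below 1, downsampling.
resampling_factor <- function(old_time, new_time) {
  old_step <- stats::median(diff(as.numeric(old_time)))
  new_step <- stats::median(diff(as.numeric(new_time)))
  old_step / new_step
}

# Irregular original time with a median interval of 30 days
old_time <- as.Date("2020-01-01") + c(0, 30, 65, 90, 125, 150)

# Regular new time every 30 days within the original range
new_time <- seq(min(old_time), max(old_time), by = 30)

resampling_factor(old_time, new_time)
#> [1] 1
```

A factor close to 1 means the new time keeps roughly the original resolution; a factor of 2 or more means the upsampling is beyond what the recommendations above consider safe.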
Arguments
- x
(required, zoo object) Time series to resample. Default: NULL
- new_time
(optional, zoo object, keyword, or time vector) New time to resample x to. The available options are:
  - NULL: a regular version of the time in x is generated and used for resampling.
  - zoo object: the index of the given zoo object is used as template to resample x.
  - time vector: a vector with new times to resample x to. If time in x is of class "numeric", this vector must be numeric as well. Otherwise, vectors of classes "Date" and "POSIXct" can be used interchangeably.
  - keyword: a valid keyword returned by zoo_time(x)$keywords, used to generate a time vector with the relevant units.
  - numeric of length 1: interpreted as new time interval, in the highest resolution units returned by zoo_time(x)$units.
- method
(optional, character string) Name of the method to resample the time series. One of "linear", "spline" or "loess". Default: "linear".
- max_complexity
(required, logical) Only relevant for methods "spline" and "loess". If TRUE, model optimization is ignored, and a model of maximum complexity (an overfitted model) is used for resampling. Default: FALSE
See also
Other zoo_functions: zoo_aggregate(), zoo_name_clean(), zoo_name_get(), zoo_name_set(), zoo_permute(), zoo_plot(), zoo_time(), zoo_to_tsl(), zoo_vector_to_matrix()
Examples
#simulate irregular time series
x <- zoo_simulate(
  cols = 2,
  rows = 50,
  time_range = c("2010-01-01", "2020-01-01"),
  irregular = TRUE
)
#plot time series
if(interactive()){
  zoo_plot(x)
}
#intervals between samples
x_intervals <- diff(zoo::index(x))
x_intervals
#> Time differences in days
#> [1] 184.44444 36.88889 147.55556 36.88889 73.77778 36.88889 147.55556
#> [8] 36.88889 36.88889 184.44444 36.88889 36.88889 36.88889 36.88889
#> [15] 36.88889 110.66667 36.88889 36.88889 73.77778 73.77778 36.88889
#> [22] 36.88889 110.66667 221.33333 36.88889 36.88889 36.88889 110.66667
#> [29] 36.88889 36.88889 36.88889 36.88889 73.77778 147.55556 184.44444
#> [36] 36.88889 36.88889 73.77778 36.88889 36.88889 73.77778 36.88889
#> [43] 110.66667 73.77778 36.88889 147.55556 73.77778 36.88889 147.55556
#create regular time from the minimum of the observed intervals
new_time <- seq.Date(
  from = min(zoo::index(x)),
  to = max(zoo::index(x)),
  by = floor(min(x_intervals))
)
new_time
#> [1] "2010-03-15" "2010-04-20" "2010-05-26" "2010-07-01" "2010-08-06"
#> [6] "2010-09-11" "2010-10-17" "2010-11-22" "2010-12-28" "2011-02-02"
#> [11] "2011-03-10" "2011-04-15" "2011-05-21" "2011-06-26" "2011-08-01"
#> [16] "2011-09-06" "2011-10-12" "2011-11-17" "2011-12-23" "2012-01-28"
#> [21] "2012-03-04" "2012-04-09" "2012-05-15" "2012-06-20" "2012-07-26"
#> [26] "2012-08-31" "2012-10-06" "2012-11-11" "2012-12-17" "2013-01-22"
#> [31] "2013-02-27" "2013-04-04" "2013-05-10" "2013-06-15" "2013-07-21"
#> [36] "2013-08-26" "2013-10-01" "2013-11-06" "2013-12-12" "2014-01-17"
#> [41] "2014-02-22" "2014-03-30" "2014-05-05" "2014-06-10" "2014-07-16"
#> [46] "2014-08-21" "2014-09-26" "2014-11-01" "2014-12-07" "2015-01-12"
#> [51] "2015-02-17" "2015-03-25" "2015-04-30" "2015-06-05" "2015-07-11"
#> [56] "2015-08-16" "2015-09-21" "2015-10-27" "2015-12-02" "2016-01-07"
#> [61] "2016-02-12" "2016-03-19" "2016-04-24" "2016-05-30" "2016-07-05"
#> [66] "2016-08-10" "2016-09-15" "2016-10-21" "2016-11-26" "2017-01-01"
#> [71] "2017-02-06" "2017-03-14" "2017-04-19" "2017-05-25" "2017-06-30"
#> [76] "2017-08-05" "2017-09-10" "2017-10-16" "2017-11-21" "2017-12-27"
#> [81] "2018-02-01" "2018-03-09" "2018-04-14" "2018-05-20" "2018-06-25"
#> [86] "2018-07-31" "2018-09-05" "2018-10-11" "2018-11-16" "2018-12-22"
#> [91] "2019-01-27" "2019-03-04" "2019-04-09" "2019-05-15" "2019-06-20"
#> [96] "2019-07-26" "2019-08-31" "2019-10-06" "2019-11-11" "2019-12-17"
diff(new_time)
#> Time differences in days
#> [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#resample using piecewise linear regression
x_linear <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "linear"
)
#resample using max complexity splines
x_spline <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "spline",
  max_complexity = TRUE
)
#> Warning: Class of argument 'x' must be one of 'POSIXct', 'Date', 'numeric', 'integer'.
#resample using max complexity loess
x_loess <- zoo_resample(
  x = x,
  new_time = new_time,
  method = "loess",
  max_complexity = TRUE
)
#> Warning: Class of argument 'x' must be one of 'POSIXct', 'Date', 'numeric', 'integer'.
#intervals between new samples
diff(zoo::index(x_linear))
#> Time differences in days
#> [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
diff(zoo::index(x_spline))
#> Time differences in days
#> [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
diff(zoo::index(x_loess))
#> Time differences in days
#> [1] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [26] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [51] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#> [76] 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36 36
#plotting results
if(interactive()){
  par(mfrow = c(4, 1), mar = c(3,3,2,2))
  zoo_plot(
    x,
    guide = FALSE,
    title = "Original"
  )
  zoo_plot(
    x_linear,
    guide = FALSE,
    title = "Method: linear"
  )
  zoo_plot(
    x_spline,
    guide = FALSE,
    title = "Method: spline"
  )
  zoo_plot(
    x_loess,
    guide = FALSE,
    title = "Method: loess"
  )
}