Function to aggregate zoo objects within a time series list.
This function supports progress bars generated by the progressr
package. See examples.
This function also accepts a parallelization setup via future::plan(), but it might only be worth it for large time series lists.
Objective
Time series aggregation involves grouping observations and summarizing group values with a statistical function. This operation is useful to:
- Decrease (downsample) the temporal resolution of a time series.
- Highlight particular states of a time series over time. For example, a daily temperature series can be aggregated by month using max to represent the highest temperatures each month.
- Transform irregular time series into regular ones.
This function aggregates time series lists with overlapping times. Please check such overlap by assessing the columns "begin" and "end" of the data frame resulting from df <- tsl_time(tsl = tsl). Aggregation will be limited by the shortest time series in your time series list. To aggregate non-overlapping time series, please subset the individual components of tsl one by one, either using tsl_subset() or the syntax tsl = my_tsl[[i]].
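The grouping-and-summarizing idea described above can be illustrated with a minimal base R sketch. It does not use this package: the simulated daily temperatures and the object names (days, temperature, monthly_max) are assumptions made only for illustration.

```r
# simulate one year of daily temperatures (hypothetical data)
set.seed(1)
days <- seq(as.Date("2023-01-01"), as.Date("2023-12-31"), by = "day")
temperature <- 15 +
  10 * sin(2 * pi * (seq_along(days) - 100) / 365) +
  rnorm(length(days))

# group observations by month and summarize each group with max()
monthly_max <- tapply(
  X = temperature,
  INDEX = format(days, "%Y-%m"),
  FUN = max
)

# one value per month
length(monthly_max) # 12
```

Replacing max with mean or median in the sketch changes which state of the series each month represents, but not the grouping.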
Methods
Any function returning a single number from a numeric vector can be used to aggregate a time series list. Quoted and unquoted function names can be used. Additional arguments to these functions can be passed via the ... argument. Typical examples are:
- mean or "mean": see mean().
- median or "median": see stats::median().
- quantile or "quantile": see stats::quantile().
- min or "min": see min().
- max or "max": see max().
- sd or "sd": to compute standard deviation, see stats::sd().
- var or "var": to compute the group variance, see stats::var().
- length or "length": to compute group length.
- sum or "sum": see sum().
- f_slope or "f_slope": to compute the group slope, see f_slope().
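As a rough sketch of what "a function returning a single number from a numeric vector" can look like, the block below defines a custom summary in base R. The function f_range is hypothetical and not part of the package; it only illustrates the required shape of an aggregation function.

```r
# hypothetical custom aggregation function: range width of the group values
f_range <- function(x, na.rm = TRUE) {
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}

# it takes a numeric vector and returns a single number
f_range(c(2, 5, 11)) # 9
```

Under the rule stated above, a function like this could presumably be passed as method = f_range, with extra arguments such as na.rm supplied through the ... argument.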
Arguments
- tsl
(required, list) Time series list. Default: NULL
- new_time
(required, numeric, numeric vector, Date vector, POSIXct vector, or keyword) Definition of the aggregation pattern. The available options are:
  - numeric vector: only for the "numeric" time class, defines the breakpoints for time series aggregation.
  - "Date" or "POSIXct" vector: as above, but for the time classes "Date" and "POSIXct". In any case, the input vector is coerced to the time class of the tsl argument.
  - numeric: defines fixed time intervals in the units of tsl for time series aggregation. Used as is when the time class is "numeric", and coerced to integer and interpreted as days for the time classes "Date" and "POSIXct".
  - keyword (see utils_time_units()): the common options for the time classes "Date" and "POSIXct" are: "millennia", "centuries", "decades", "years", "quarters", "months", and "weeks". Exclusive keywords for the "POSIXct" time class are: "days", "hours", "minutes", and "seconds". The time class "numeric" accepts keywords coded as scientific numbers, from "1e8" to "1e-8".
- method
(required, function name) Name of a standard or custom function to aggregate numeric vectors. Typical examples are mean, max, min, median, and quantile. Default: mean.
- ...
(optional) Further arguments for method.
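For the vector option of new_time, a breakpoint vector can be built with base R. The sketch below is only an illustration: the dates and the five-year step are arbitrary assumptions, not values taken from the package.

```r
# hypothetical breakpoints: every five years between 2000 and 2020
new_time <- seq(
  from = as.Date("2000-01-01"),
  to = as.Date("2020-01-01"),
  by = "5 years"
)

class(new_time)  # "Date"
length(new_time) # 5
```

A vector like this would be suitable as new_time for a time series list of time class "Date", assuming the breakpoints fall within the time range shared by its time series.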
See also
Other tsl_processing: tsl_resample(), tsl_stats(), tsl_transform()
Examples
#parallelization setup (not worth it for this data size)
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableWorkers() - 1
)
# progress bar (does not work in examples)
# progressr::handlers(global = TRUE)
# yearly aggregation
#----------------------------------
#long-term monthly temperature of 20 cities
tsl <- tsl_initialize(
  x = cities_temperature,
  name_column = "name",
  time_column = "time"
)
#plot time series
if(interactive()){
  tsl_plot(
    tsl = tsl[1:4],
    guide_columns = 4
  )
}
#check time features
tsl_time(tsl)[, c("name", "resolution", "units")]
#>                name resolution units
#>  1          Bangkok    30.4381  days
#>  2           Bogotá    30.4381  days
#>  3            Cairo    30.4381  days
#>  4            Dhaka    30.4381  days
#>  5 Ho_Chi_Minh_City    30.4381  days
#>  6         Istanbul    30.4381  days
#>  7          Jakarta    30.4381  days
#>  8          Karachi    30.4381  days
#>  9         Kinshasa    30.4381  days
#> 10            Lagos    30.4381  days
#> 11             Lima    30.4381  days
#> 12           London    30.4381  days
#> 13      Los_Angeles    30.4381  days
#> 14           Manila    30.4381  days
#> 15           Moscow    30.4381  days
#> 16            Paris    30.4381  days
#> 17   Rio_De_Janeiro    30.4381  days
#> 18         Shanghai    30.4381  days
#> 19        São_Paulo    30.4381  days
#> 20            Tokyo    30.4381  days
#aggregation: mean yearly values
tsl_year <- tsl_aggregate(
  tsl = tsl,
  new_time = "year",
  method = mean
)
#check time features
tsl_time(tsl_year)[, c("name", "resolution", "units")]
#>                name resolution units
#>  1          Bangkok   365.2571  days
#>  2           Bogotá   365.2571  days
#>  3            Cairo   365.2571  days
#>  4            Dhaka   365.2571  days
#>  5 Ho_Chi_Minh_City   365.2571  days
#>  6         Istanbul   365.2571  days
#>  7          Jakarta   365.2571  days
#>  8          Karachi   365.2571  days
#>  9         Kinshasa   365.2571  days
#> 10            Lagos   365.2571  days
#> 11             Lima   365.2571  days
#> 12           London   365.2571  days
#> 13      Los_Angeles   365.2571  days
#> 14           Manila   365.2571  days
#> 15           Moscow   365.2571  days
#> 16            Paris   365.2571  days
#> 17   Rio_De_Janeiro   365.2571  days
#> 18         Shanghai   365.2571  days
#> 19        São_Paulo   365.2571  days
#> 20            Tokyo   365.2571  days
if(interactive()){
  tsl_plot(
    tsl = tsl_year[1:4],
    guide_columns = 4
  )
}
# other supported keywords
#----------------------------------
#simulate full range of calendar dates
tsl <- tsl_simulate(
  n = 2,
  rows = 1000,
  time_range = c(
    "0000-01-01",
    as.character(Sys.Date())
  )
)
#mean value by millennia (extreme case!!!)
tsl_millennia <- tsl_aggregate(
  tsl = tsl,
  new_time = "millennia",
  method = mean
)
if(interactive()){
  tsl_plot(tsl_millennia)
}
#max value by centuries
tsl_century <- tsl_aggregate(
  tsl = tsl,
  new_time = "century",
  method = max
)
if(interactive()){
  tsl_plot(tsl_century)
}
#quantile 0.75 value by centuries
tsl_centuries <- tsl_aggregate(
  tsl = tsl,
  new_time = "centuries",
  method = stats::quantile,
  probs = 0.75 #argument of stats::quantile()
)
#disable parallelization
future::plan(
  future::sequential
)