
Function to aggregate zoo objects within a time series list. This function supports progress bars generated by the progressr package. See examples.

This function also accepts a parallelization setup via future::plan(), but parallelization is likely to pay off only for large time series lists.

Objective

Time series aggregation involves grouping observations and summarizing group values with a statistical function. This operation is useful to:

  • Decrease the temporal resolution of a time series (downsampling).

  • Highlight particular states of a time series over time. For example, a daily temperature series can be aggregated by month using max to represent the highest temperatures each month.

  • Transform irregular time series into regular ones.
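
As a rough illustration of the grouping-and-summarizing idea described above, the daily-to-monthly example can be sketched in base R. This is not how tsl_aggregate() is implemented; the simulated data and the object names (days, temperature, monthly_max) are assumptions made up for the example.

```r
# simulated daily temperatures for 2023 (hypothetical data)
set.seed(1)
days <- seq(
  from = as.Date("2023-01-01"),
  to = as.Date("2023-12-31"),
  by = "day"
)
temperature <- 15 + 10 * sin(2 * pi * seq_along(days) / 365) + rnorm(length(days))

# group observations by calendar month
month <- format(days, "%Y-%m")

# summarize each group with max(): highest temperature of each month
monthly_max <- tapply(X = temperature, INDEX = month, FUN = max)

# 365 daily values become 12 monthly values
length(monthly_max)
```

Replacing max with mean or median in the last step gives a different summary of the same monthly groups.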

This function aggregates time series lists with overlapping times. To check the overlap, inspect the columns "begin" and "end" of the data frame returned by df <- tsl_time(tsl = tsl). Aggregation is limited by the shortest time series in the list. To aggregate non-overlapping time series, subset the individual components of tsl one by one, using either tsl_subset() or the syntax tsl = my_tsl[[i]].
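
The overlap check described above amounts to comparing the latest "begin" with the earliest "end". A minimal base R sketch, assuming two hypothetical monthly series (a and b) represented only by their time vectors, could look like this:

```r
# two hypothetical monthly series with different time ranges
times <- list(
  a = seq(from = as.Date("2000-01-01"), to = as.Date("2010-12-01"), by = "month"),
  b = seq(from = as.Date("2005-01-01"), to = as.Date("2015-12-01"), by = "month")
)

# first and last time of each series
begin <- do.call(what = c, args = lapply(X = times, FUN = min))
end <- do.call(what = c, args = lapply(X = times, FUN = max))

# the shared period runs from the latest begin to the earliest end
overlap_begin <- max(begin)
overlap_end <- min(end)

# TRUE when all series share at least one point in time
overlap_begin <= overlap_end
```

With a real time series list, the same comparison can be made on the "begin" and "end" columns returned by tsl_time().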

Methods

Any function returning a single number from a numeric vector can be used to aggregate a time series list. Quoted and unquoted function names can be used. Additional arguments to these functions can be passed via the argument .... Typical examples are mean, max, min, median, and quantile.
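
To make the contract on aggregation functions concrete, here is a small sketch of how quoted names, unquoted names, and extra arguments passed via ... can be handled in plain R. The helper aggregate_values() is hypothetical and written only for this illustration; it is not part of the package and may differ from what tsl_aggregate() does internally.

```r
# hypothetical helper: applies an aggregation function to a numeric vector
aggregate_values <- function(x, method = mean, ...) {
  # match.fun() accepts a function or its name as a character string
  f <- match.fun(method)
  out <- f(x, ...)
  # a valid aggregation function returns a single number
  if (!is.numeric(out) || length(out) != 1) {
    stop("'method' must return a single number.")
  }
  out
}

x <- c(1, 2, 3, 4, NA)

# quoted function name, with na.rm passed via ...
aggregate_values(x, method = "mean", na.rm = TRUE)

# unquoted function name
aggregate_values(x, method = max, na.rm = TRUE)

# function with its own arguments, as in quantile(probs = 0.75)
aggregate_values(x, method = stats::quantile, probs = 0.75, na.rm = TRUE, names = FALSE)
```

A function such as range(), which returns two numbers, would be rejected by this check.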

Usage

tsl_aggregate(tsl = NULL, new_time = NULL, method = mean, ...)

Arguments

tsl

(required, list) Time series list. Default: NULL

new_time

(required, numeric, numeric vector, Date vector, POSIXct vector, or keyword) Definition of the aggregation pattern. The available options are:

  • numeric vector: only for the "numeric" time class, defines the breakpoints for time series aggregation.

  • "Date" or "POSIXct" vector: as above, but for the time classes "Date" and "POSIXct". In any case, the input vector is coerced to the time class of the tsl argument.

  • numeric: defines fixed time intervals in the units of tsl for time series aggregation. Used as is when the time class is "numeric", and coerced to integer and interpreted as days for the time classes "Date" and "POSIXct".

  • keyword (see utils_time_units()): the common options for the time classes "Date" and "POSIXct" are: "millennia", "centuries", "decades", "years", "quarters", "months", and "weeks". Exclusive keywords for the "POSIXct" time class are: "days", "hours", "minutes", and "seconds". The time class "numeric" accepts keywords coded as scientific numbers, from "1e8" to "1e-8".
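
The three kinds of aggregation pattern listed above (breakpoints, fixed intervals, and calendar keywords) can be imitated in base R with cut() and tapply(). The sketch below only illustrates the concepts; the data and object names are hypothetical, and the grouping rules used internally by tsl_aggregate() may differ, for example in how interval boundaries are closed.

```r
# hypothetical numeric time series
time <- 1:20
value <- time * 2

# breakpoints: groups are the intervals (0,5], (5,10], and (10,20]
by_breaks <- tapply(X = value, INDEX = cut(time, breaks = c(0, 5, 10, 20)), FUN = mean)

# fixed interval: groups of width 5 in the units of time
width <- 5
by_interval <- tapply(X = value, INDEX = ceiling(time / width), FUN = mean)

# keyword: calendar years for a hypothetical monthly Date series
dates <- seq(from = as.Date("2020-01-01"), by = "month", length.out = 24)
monthly_value <- seq_along(dates)
by_year <- tapply(X = monthly_value, INDEX = cut(dates, breaks = "year"), FUN = mean)

by_breaks
by_interval
by_year
```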

method

(required, function name) Name of a standard or custom function to aggregate numeric vectors. Typical examples are mean, max, min, median, and quantile. Default: mean.

...

(optional) further arguments for method.

Value

time series list

See also

zoo_aggregate()

Other tsl_processing: tsl_resample(), tsl_stats(), tsl_transform()

Examples

#parallelization setup (not worth it for this data size)
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableWorkers() - 1
)

# progress bar (does not work in examples)
# progressr::handlers(global = TRUE)

# yearly aggregation
#----------------------------------
#long-term monthly temperature of 20 cities
tsl <- tsl_initialize(
  x = cities_temperature,
  name_column = "name",
  time_column = "time"
)

#plot time series
if(interactive()){
  tsl_plot(
    tsl = tsl[1:4],
    guide_columns = 4
  )
}

#check time features
tsl_time(tsl)[, c("name", "resolution", "units")]
#>                name resolution units
#> 1           Bangkok    30.4381  days
#> 2            Bogotá    30.4381  days
#> 3             Cairo    30.4381  days
#> 4             Dhaka    30.4381  days
#> 5  Ho_Chi_Minh_City    30.4381  days
#> 6          Istanbul    30.4381  days
#> 7           Jakarta    30.4381  days
#> 8           Karachi    30.4381  days
#> 9          Kinshasa    30.4381  days
#> 10            Lagos    30.4381  days
#> 11             Lima    30.4381  days
#> 12           London    30.4381  days
#> 13      Los_Angeles    30.4381  days
#> 14           Manila    30.4381  days
#> 15           Moscow    30.4381  days
#> 16            Paris    30.4381  days
#> 17   Rio_De_Janeiro    30.4381  days
#> 18         Shanghai    30.4381  days
#> 19        São_Paulo    30.4381  days
#> 20            Tokyo    30.4381  days

#aggregation: mean yearly values
tsl_year <- tsl_aggregate(
  tsl = tsl,
  new_time = "year",
  method = mean
)

#check time features
tsl_time(tsl_year)[, c("name", "resolution", "units")]
#>                name resolution units
#> 1           Bangkok   365.2571  days
#> 2            Bogotá   365.2571  days
#> 3             Cairo   365.2571  days
#> 4             Dhaka   365.2571  days
#> 5  Ho_Chi_Minh_City   365.2571  days
#> 6          Istanbul   365.2571  days
#> 7           Jakarta   365.2571  days
#> 8           Karachi   365.2571  days
#> 9          Kinshasa   365.2571  days
#> 10            Lagos   365.2571  days
#> 11             Lima   365.2571  days
#> 12           London   365.2571  days
#> 13      Los_Angeles   365.2571  days
#> 14           Manila   365.2571  days
#> 15           Moscow   365.2571  days
#> 16            Paris   365.2571  days
#> 17   Rio_De_Janeiro   365.2571  days
#> 18         Shanghai   365.2571  days
#> 19        São_Paulo   365.2571  days
#> 20            Tokyo   365.2571  days

if(interactive()){
  tsl_plot(
    tsl = tsl_year[1:4],
    guide_columns = 4
  )
}


# other supported keywords
#----------------------------------

#simulate full range of calendar dates
tsl <- tsl_simulate(
  n = 2,
  rows = 1000,
  time_range = c(
    "0000-01-01",
    as.character(Sys.Date())
  )
)

#mean value by millennia (extreme case!!!)
tsl_millennia <- tsl_aggregate(
  tsl = tsl,
  new_time = "millennia",
  method = mean
)

if(interactive()){
  tsl_plot(tsl_millennia)
}

#max value by centuries
tsl_century <- tsl_aggregate(
  tsl = tsl,
  new_time = "century",
  method = max
)

if(interactive()){
  tsl_plot(tsl_century)
}

#quantile 0.75 value by centuries
tsl_centuries <- tsl_aggregate(
  tsl = tsl,
  new_time = "centuries",
  method = stats::quantile,
  probs = 0.75 #argument of stats::quantile()
)

#disable parallelization
future::plan(
  future::sequential
)