Contribution of Individual Variables to Dissimilarity in Time Series Lists
Source: R/distantia_importance.R
distantia_importance.Rd
This function measures the contribution of individual variables to the dissimilarity between pairs of time series, to help answer the question: what makes two time series more or less similar?
Three key values are required to assess individual variable contributions:
psi: dissimilarity when all variables are considered.
psi_only_with: dissimilarity when using only the target variable.
psi_without: dissimilarity when removing the target variable.
The values psi_only_with and psi_without can be computed in two different ways, defined by the argument robust:
robust = FALSE: This method replicates the importance algorithm released with the first version of the package, and it is only recommended when the goal is to compare new results with previous studies. It normalizes psi_only_with and psi_without using the least cost path obtained from the individual variable. As different variables may have different least cost paths for the same time series, normalization values may change from variable to variable, making individual importance scores harder to compare.
robust = TRUE (default, recommended): This is a novel version of the importance algorithm that yields more stable and comparable solutions. It uses the least cost path of the complete time series to normalize psi_only_with and psi_without, making importance scores of separate variables fully comparable.
The individual importance score of each variable (column "importance" in the output data frame) is based on different expressions depending on the robust argument, even when lock_step = TRUE:
robust = FALSE: Importance is computed as ((psi - psi_without) * 100)/psi and interpreted as "change in similarity when a variable is removed".
robust = TRUE: Importance is computed as ((psi_only_with - psi_without) * 100)/psi and interpreted as "relative dissimilarity induced by the variable expressed as a percentage".
In either case, positive values indicate that the variable contributes to dissimilarity, while negative values indicate a net contribution to similarity.
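As a rough worked example of the two expressions above, the base R sketch below plugs hypothetical psi scores into both formulas. The numbers are invented for illustration and are not output of the package. The zero threshold used to label the effect is an assumption based on the sign rule stated above.

```r
# Hypothetical psi scores for one variable and one pair of time series
# (illustrative numbers only, not produced by distantia_importance())
psi <- 2.0            # dissimilarity with all variables
psi_only_with <- 2.5  # dissimilarity using only the target variable
psi_without <- 1.5    # dissimilarity after removing the target variable

# robust = FALSE: change in psi when the variable is removed
importance_legacy <- ((psi - psi_without) * 100) / psi

# robust = TRUE: relative dissimilarity induced by the variable (percentage)
importance_robust <- ((psi_only_with - psi_without) * 100) / psi

# Sign interpretation (assumed threshold at zero):
# positive scores contribute to dissimilarity, negative ones to similarity
effect <- ifelse(
  importance_robust > 0,
  "decreases similarity",
  "increases similarity"
)

importance_legacy  # 25
importance_robust  # 50
effect             # "decreases similarity"
```

With these values, both expressions flag the variable as a source of dissimilarity, but the scores are on different scales and should not be mixed in the same analysis.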
This function allows computing dissimilarity between pairs of time series using different combinations of arguments at once. For example, when the argument distance is set to c("euclidean", "manhattan"), the output data frame will show two dissimilarity scores for each pair of time series, one based on euclidean distances, and another based on manhattan distances. The same happens for most other parameters.
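To get a feel for how vectorized arguments multiply the size of the output, the base R sketch below builds a grid of argument combinations with expand.grid(). It is only an illustration and not the package's internal code. The series and variable names are taken from the example on this page, and the row count is an upper bound that assumes no combination is dropped as redundant.

```r
# Hypothetical setup: three time series and three variables
# (names borrowed from the example data on this page)
series <- c("Germany", "Spain", "Sweden")
variables <- c("evi", "rainfall", "temperature")

# Unique unordered pairs of time series
pairs <- t(combn(series, 2))

# Every combination of the vectorized arguments
args_grid <- expand.grid(
  distance = c("euclidean", "manhattan"),
  lock_step = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Upper bound on output rows: pairs x variables x argument combinations
n_rows <- nrow(pairs) * length(variables) * nrow(args_grid)

args_grid
n_rows
```

With a single vectorized argument of length two, the same arithmetic gives 3 x 3 x 2 = 18 rows, which matches the lock_step = c(TRUE, FALSE) example further down this page.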
This function supports progress bars generated by the progressr package. See examples.
This function also accepts a parallelization setup via future::plan(), but it might only be worth it for very long time series.
Usage
distantia_importance(
  tsl = NULL,
  distance = "euclidean",
  diagonal = TRUE,
  weighted = TRUE,
  ignore_blocks = FALSE,
  lock_step = FALSE,
  robust = TRUE
)
Arguments
- tsl
(required, time series list) list of zoo time series. Default: NULL
- distance
(optional, character vector) name or abbreviation of the distance method. Valid values are in the columns "names" and "abbreviation" of the dataset distances. Default: "euclidean".
- diagonal
(optional, logical vector). If TRUE, diagonals are included in the dynamic time warping computation. Default: TRUE
- weighted
(optional, logical vector) If TRUE, diagonal is set to TRUE, and diagonal cost is weighted by a factor of 1.414214. Default: TRUE
- ignore_blocks
(optional, logical vector). If TRUE, blocks of consecutive least cost path coordinates are trimmed to avoid inflating the psi dissimilarity. Irrelevant if diagonal = TRUE. Default: FALSE.
- lock_step
(optional, logical vector) If TRUE, time series captured at the same times are compared sample-wise (with no dynamic time warping). Requires time series in argument tsl to be fully aligned, or it will return an error. Default: FALSE.
- robust
(optional, logical). If TRUE (default), importance scores are computed using the least cost path of the complete time series as reference. Setting it to FALSE allows replicating importance scores of the previous versions of this package. This option is irrelevant when lock_step = TRUE. Default: TRUE.
Value
data frame:
x: name of the time series x.
y: name of the time series y.
psi: psi score of x and y.
variable: name of the individual variable.
importance: importance score of the variable.
effect: interpretation of the sign of the importance score, either "decreases similarity" (positive importance) or "increases similarity" (negative importance), as shown in the examples.
psi_only_with: psi score computed using only the variable.
psi_without: psi score without the variable.
psi_difference: difference between psi_only_with and psi_without.
distance: name of the distance metric.
diagonal: value of the argument diagonal.
weighted: value of the argument weighted.
ignore_blocks: value of the argument ignore_blocks.
lock_step: value of the argument lock_step.
robust: value of the argument robust.
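The output data frame can be queried with base R to find, for each pair of time series, the variables with the lowest and highest importance scores. The sketch below is one possible way to do it. The data frame is rebuilt by hand from the dynamic time warping output shown in the Examples section, and the names by_pair, summary_df, most_similarity, and most_dissimilarity are hypothetical helpers, not part of the package.

```r
# Values copied from the dynamic time warping example output on this page
importance_dtw <- data.frame(
  x = rep(c("Germany", "Germany", "Spain"), each = 3),
  y = rep(c("Spain", "Sweden", "Sweden"), each = 3),
  variable = rep(c("evi", "rainfall", "temperature"), times = 3),
  importance = c(
    6.512321, 12.505764, -26.509115,
    29.026504, -4.209397, -26.661768,
    -6.625437, -4.416941, 13.668290
  ),
  stringsAsFactors = FALSE
)

# Split the rows by pair of time series
by_pair <- split(
  importance_dtw,
  paste(importance_dtw$x, importance_dtw$y, sep = " - ")
)

# For each pair: variable with the lowest score (contributes most to
# similarity) and with the highest score (contributes most to dissimilarity)
summary_df <- do.call(rbind, lapply(by_pair, function(df) {
  data.frame(
    x = df$x[1],
    y = df$y[1],
    most_similarity = df$variable[which.min(df$importance)],
    most_dissimilarity = df$variable[which.max(df$importance)],
    stringsAsFactors = FALSE
  )
}))

summary_df
```

For Germany and Sweden the lowest score belongs to temperature, and for Spain and Sweden the highest score also belongs to temperature, in line with the interpretation notes in the Examples section.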
See also
Other dissimilarity_analysis_main:
distantia()
Examples
#parallelization setup (not worth it for this data size)
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)
#progress bar
# progressr::handlers(global = TRUE)
#three time series
#climate and ndvi in Fagus sylvatica stands in Spain, Germany, and Sweden
data("fagus_dynamics")
#load as tsl
#center and scale with same parameters
tsl <- tsl_initialize(
  x = fagus_dynamics,
  name_column = "name",
  time_column = "time"
) |>
  tsl_transform(
    f = f_scale
  )
if(interactive()){
  tsl_plot(
    tsl = tsl,
    guide_columns = 3
  )
}
#importance computed with dynamic time warping
#less sensitive to latitudinal or altitudinal differences
importance_dtw <- distantia_importance(
  tsl = tsl
)
#focus on important columns
importance_dtw[, c(
  "x",
  "y",
  "psi",
  "variable",
  "importance",
  "effect"
)]
#>         x      y       psi    variable importance               effect
#> 1 Germany  Spain 1.3429956         evi   6.512321 decreases similarity
#> 2 Germany  Spain 1.3429956    rainfall  12.505764 decreases similarity
#> 3 Germany  Spain 1.3429956 temperature -26.509115 increases similarity
#> 4 Germany Sweden 0.8571217         evi  29.026504 decreases similarity
#> 5 Germany Sweden 0.8571217    rainfall  -4.209397 increases similarity
#> 6 Germany Sweden 0.8571217 temperature -26.661768 increases similarity
#> 7   Spain Sweden 1.4803954         evi  -6.625437 increases similarity
#> 8   Spain Sweden 1.4803954    rainfall  -4.416941 increases similarity
#> 9   Spain Sweden 1.4803954 temperature  13.668290 decreases similarity
#Interpretation example:
#variable contributing the most to similarity between Germany and Sweden: temperature
#variable contributing the most to dissimilarity between Spain and Sweden: temperature
#importance computed with lock-step method
#more sensitive to latitudinal or altitudinal differences
importance_lock_step <- distantia_importance(
  tsl = tsl,
  lock_step = TRUE
)
importance_lock_step[, c(
  "x",
  "y",
  "psi",
  "variable",
  "importance",
  "effect"
)]
#>         x      y      psi    variable importance               effect
#> 1 Germany  Spain 1.306133         evi   0.241829 decreases similarity
#> 2 Germany  Spain 1.306133    rainfall  19.051052 decreases similarity
#> 3 Germany  Spain 1.306133 temperature -30.814944 increases similarity
#> 4 Germany Sweden 0.857670         evi  28.539736 decreases similarity
#> 5 Germany Sweden 0.857670    rainfall  -4.845232 increases similarity
#> 6 Germany Sweden 0.857670 temperature -25.011608 increases similarity
#> 7   Spain Sweden 1.470850         evi -22.912397 increases similarity
#> 8   Spain Sweden 1.470850    rainfall   9.732110 decreases similarity
#> 9   Spain Sweden 1.470850 temperature  12.501949 decreases similarity
#combinations of parameters
#---------------------------------
#most arguments accept vectors, and the results contain all argument combinations
importance_df <- distantia_importance(
  tsl = tsl,
  lock_step = c(TRUE, FALSE)
)
importance_df[, c(
  "x",
  "y",
  "psi",
  "variable",
  "importance",
  "effect",
  "lock_step"
)]
#>          x      y       psi    variable importance               effect
#> 1  Germany  Spain 1.3061327         evi   0.241829 decreases similarity
#> 2  Germany  Spain 1.3061327    rainfall  19.051052 decreases similarity
#> 3  Germany  Spain 1.3061327 temperature -30.814944 increases similarity
#> 4  Germany Sweden 0.8576700         evi  28.539736 decreases similarity
#> 5  Germany Sweden 0.8576700    rainfall  -4.845232 increases similarity
#> 6  Germany Sweden 0.8576700 temperature -25.011608 increases similarity
#> 7    Spain Sweden 1.4708497         evi -22.912397 increases similarity
#> 8    Spain Sweden 1.4708497    rainfall   9.732110 decreases similarity
#> 9    Spain Sweden 1.4708497 temperature  12.501949 decreases similarity
#> 10 Germany  Spain 1.3429956         evi   6.512321 decreases similarity
#> 11 Germany  Spain 1.3429956    rainfall  12.505764 decreases similarity
#> 12 Germany  Spain 1.3429956 temperature -26.509115 increases similarity
#> 13 Germany Sweden 0.8571217         evi  29.026504 decreases similarity
#> 14 Germany Sweden 0.8571217    rainfall  -4.209397 increases similarity
#> 15 Germany Sweden 0.8571217 temperature -26.661768 increases similarity
#> 16   Spain Sweden 1.4803954         evi  -6.625437 increases similarity
#> 17   Spain Sweden 1.4803954    rainfall  -4.416941 increases similarity
#> 18   Spain Sweden 1.4803954 temperature  13.668290 decreases similarity
#>    lock_step
#> 1       TRUE
#> 2       TRUE
#> 3       TRUE
#> 4       TRUE
#> 5       TRUE
#> 6       TRUE
#> 7       TRUE
#> 8       TRUE
#> 9       TRUE
#> 10     FALSE
#> 11     FALSE
#> 12     FALSE
#> 13     FALSE
#> 14     FALSE
#> 15     FALSE
#> 16     FALSE
#> 17     FALSE
#> 18     FALSE
#disable parallelization
future::plan(
  future::sequential
)