
Computes the contribution of individual variables to the similarity/dissimilarity between two irregular multivariate time series. Unlike the robust version, this function computes the least-cost path for each combination of variables independently, which makes the results for individual variables harder to compare. Use it only to replicate importance scores generated with previous versions of the package distantia. The function returns a data frame with the following columns:

  • variable: name of the individual variable for which the importance is being computed, from the column names of the arguments x and y.

  • psi: global dissimilarity score psi of the two time series.

  • psi_only_with: dissimilarity between x and y computed from the given variable alone.

  • psi_without: dissimilarity between x and y computed from all other variables.

  • psi_difference: difference between psi_only_with and psi_without, computed as psi_only_with - psi_without.

  • importance: contribution of the variable to the similarity/dissimilarity between x and y, computed as ((psi_all - psi_without) * 100) / psi_all, where psi_all is the global score reported in the column psi. Positive scores represent contribution to dissimilarity, while negative scores represent contribution to similarity.
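The two derived columns can be reproduced by hand. The sketch below uses base R only and copies the psi, psi_only_with, and psi_without values from the example output at the end of this page. Because those inputs are rounded as printed, the recomputed scores match the printed ones only approximately.

```r
# Values copied from the example output on this page (rounded as printed).
df <- data.frame(
  variable = c("a", "b", "c", "d", "e"),
  psi = 5.90895,
  psi_only_with = c(3.707647, 3.655116, 2.716768, 3.272493, 1.352853),
  psi_without = c(6.235127, 5.511635, 5.741011, 5.840707, 6.137938)
)

# psi_difference: psi_only_with minus psi_without
df$psi_difference <- df$psi_only_with - df$psi_without

# importance: percentage change of the global psi when the variable is removed
df$importance <- (df$psi - df$psi_without) * 100 / df$psi

df[, c("variable", "psi_difference", "importance")]
```

Removing variable a raises psi from 5.90895 to 6.235127, so its importance is negative: a makes the two series more similar.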

Usage

importance_dtw_legacy_cpp(
  y,
  x,
  distance = "euclidean",
  diagonal = FALSE,
  weighted = TRUE,
  ignore_blocks = FALSE,
  bandwidth = 1
)

Arguments

y

(required, numeric matrix) multivariate time series with the same number of columns as 'x'.

x

(required, numeric matrix) multivariate time series.

distance

(optional, character string) distance name from the "name" column of the dataset distances (see distances$name). Default: "euclidean".

diagonal

(optional, logical). If TRUE, diagonals are included in the computation of the cost matrix. Default: FALSE.

weighted

(optional, logical). If TRUE, diagonal is set to TRUE, and the diagonal cost is weighted by a factor of 1.414214 (square root of 2). Default: TRUE.

ignore_blocks

(optional, logical). If TRUE, blocks of consecutive path coordinates are trimmed to avoid inflating the psi distance. Default: FALSE.

bandwidth

(optional, numeric) Size of the Sakoe-Chiba band at both sides of the diagonal used to constrain the least-cost path, expressed as a fraction of the number of matrix rows and columns. A value of 1 leaves the path unrestricted. Default: 1.
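The effect of diagonal and weighted on the cost matrix can be sketched in base R. This is an illustration, not the package's C++ implementation. The helper name cost_matrix_sketch is hypothetical, and the recurrence it uses is an assumption drawn from the argument descriptions above: each cell adds its distance to the cheapest neighbouring cost, and a diagonal step scales that distance by the square root of 2. The Sakoe-Chiba band and ignore_blocks are not modelled.

```r
# Hypothetical helper, not part of distantia: accumulates a cost matrix
# from a distance matrix d, optionally allowing (weighted) diagonal moves.
cost_matrix_sketch <- function(d, diagonal = FALSE, weighted = FALSE) {
  if (weighted) diagonal <- TRUE          # weighted implies diagonal, as documented
  w <- if (weighted) sqrt(2) else 1       # assumed weight of a diagonal step
  n <- nrow(d)
  m <- ncol(d)
  cost <- matrix(0, nrow = n, ncol = m)
  cost[1, 1] <- d[1, 1]
  # first column and first row can only be reached orthogonally
  if (n > 1) for (i in 2:n) cost[i, 1] <- cost[i - 1, 1] + d[i, 1]
  if (m > 1) for (j in 2:m) cost[1, j] <- cost[1, j - 1] + d[1, j]
  if (n > 1 && m > 1) {
    for (i in 2:n) {
      for (j in 2:m) {
        # cheapest orthogonal move into cell (i, j)
        best <- min(cost[i - 1, j], cost[i, j - 1]) + d[i, j]
        # optional diagonal move, with its distance scaled by w
        if (diagonal) best <- min(best, cost[i - 1, j - 1] + w * d[i, j])
        cost[i, j] <- best
      }
    }
  }
  cost
}

# Toy 3 x 3 distance matrix of ones: compare the accumulated cost
# at the last cell under the three settings.
d <- matrix(1, nrow = 3, ncol = 3)
orthogonal <- cost_matrix_sketch(d)[3, 3]
with_diagonal <- cost_matrix_sketch(d, diagonal = TRUE)[3, 3]
with_weighted <- cost_matrix_sketch(d, weighted = TRUE)[3, 3]
```

Under these assumptions, weighting makes a diagonal step cheaper than two orthogonal steps but more expensive than one, so the accumulated cost falls between the orthogonal-only and unweighted-diagonal cases.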

Value

data frame with the columns variable, psi, psi_only_with, psi_without, psi_difference, and importance.

See also

Other Rcpp_importance: importance_dtw_cpp(), importance_ls_cpp()

Examples

#simulate two multivariate time series
x <- zoo_simulate(
  seed = 1,
  rows = 100
  )

y <- zoo_simulate(
  seed = 2,
  rows = 150
  )

#the two series have different numbers of rows
#equal length is not a requirement
nrow(x) == nrow(y)
#> [1] FALSE

#compute importance
df <- importance_dtw_legacy_cpp(
  x = x,
  y = y,
  distance = "euclidean"
)

df
#>   variable     psi psi_only_with psi_without psi_difference importance
#> 1        a 5.90895      3.707647    6.235127      -2.527480  -5.520046
#> 2        b 5.90895      3.655116    5.511635      -1.856518   6.723961
#> 3        c 5.90895      2.716768    5.741011      -3.024243   2.842119
#> 4        d 5.90895      3.272493    5.840707      -2.568214   1.154905
#> 5        e 5.90895      1.352853    6.137938      -4.785084  -3.875268
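
One possible way to read this output is to rank the variables by importance and label the sign of each score. The sketch below uses base R only and copies the importance values from the output above. The object names (scores, ranked) and the effect column are illustrative and are not produced by the package.

```r
# Importance scores copied from the printed output above.
scores <- data.frame(
  variable = c("a", "b", "c", "d", "e"),
  importance = c(-5.520046, 6.723961, 2.842119, 1.154905, -3.875268)
)

# Positive scores contribute to dissimilarity, negative scores to similarity.
scores$effect <- ifelse(scores$importance > 0, "dissimilarity", "similarity")

# Sort from the strongest contributor to dissimilarity
# to the strongest contributor to similarity.
ranked <- scores[order(scores$importance, decreasing = TRUE), ]
ranked
```

In this example, b contributes most to the dissimilarity between x and y, while a contributes most to their similarity.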