
The functions distantia() and distantia_importance() allow dissimilarity assessments based on different combinations of arguments at once. For example, when the argument distance is set to c("euclidean", "manhattan"), the output data frame will show two dissimilarity scores for each pair of compared time series, one based on euclidean distances, and another based on manhattan distances.

When df is the result of distantia(), the input data is grouped by pairs of time series, and the function f is applied to the column "psi" by group.
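
The grouping described above can be approximated in base R. This is a minimal sketch, not the package's implementation: the data frame df_demo is a hypothetical stand-in for the output of distantia(), with psi values copied from the example further down this page, and stats::aggregate() is assumed here as a substitute for the group-wise application of f.

```r
# hypothetical stand-in for the output of distantia()
df_demo <- data.frame(
  x = c("Germany", "Germany", "Spain", "Germany", "Germany", "Spain"),
  y = c("Spain", "Sweden", "Sweden", "Spain", "Sweden", "Sweden"),
  lock_step = c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE),
  psi = c(1.3061327, 0.8576700, 1.4708497, 1.3429956, 0.8571217, 1.4803954)
)

# group by pair of time series (x, y) and apply f to "psi";
# the parameter column "lock_step" is dropped in the process
f <- mean
df_agg <- stats::aggregate(
  psi ~ x + y,
  data = df_demo,
  FUN = f
)

df_agg
```

Each pair of time series ends up with a single psi value, here the mean across the two lock_step settings.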

When df is the result of distantia_importance(), the input data is grouped by pairs of time series and variables, and the function f is applied to the columns "importance", "psi_only_with" and "psi_without" by group. However, if both TRUE and FALSE appear in the column "robust" (a combination that distantia_importance() does not allow by default), the aggregation is cancelled with an error, because the results of the two methods should not be aggregated together.
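
The guard on the column "robust" can be pictured with a short base R sketch. Both the helper check_robust() and the data frame df_demo are hypothetical and written only for illustration; the actual check and error message in the package may differ.

```r
# hypothetical stand-in for the output of distantia_importance()
df_demo <- data.frame(
  x = c("a", "a"),
  y = c("b", "b"),
  variable = c("v1", "v1"),
  robust = c(TRUE, FALSE),
  importance = c(10, 12)
)

# hypothetical helper: stop when both methods are mixed in "robust"
check_robust <- function(df) {
  if ("robust" %in% colnames(df) && length(unique(df$robust)) > 1) {
    stop("Column 'robust' contains both TRUE and FALSE: aggregation cancelled.")
  }
  invisible(df)
}

# the mixed input triggers the error, captured here as a message
result <- tryCatch(
  check_robust(df_demo),
  error = function(e) conditionMessage(e)
)

result
```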

If psi scores smaller than zero occur in the aggregated output, then the absolute value of the smallest psi score is added to the column psi so that dissimilarity scores start at zero.
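
The sketch below shows one reading of this shift in base R, assuming that the correction amounts to subtracting the (negative) minimum from every score. The psi values are made up for illustration.

```r
# hypothetical aggregated psi scores, one of them negative
psi <- c(-0.2, 0.5, 1.1)

# shift all scores so the smallest one becomes zero
if (min(psi) < 0) {
  psi <- psi - min(psi)
}

psi
```

The differences between scores are preserved; only their origin changes.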

If the input data frame contains a single combination of arguments, no aggregation happens, but all parameter columns are removed.

Usage

distantia_aggregate(df = NULL, f = mean, ...)

Arguments

df

(required, data frame) Output of distantia() or distantia_importance(). Default: NULL

f

(optional, function) Function to summarize psi scores (for example, mean) when there are several combinations of parameters in df. Ignored when there is a single combination of arguments in the input. Default: mean

...

(optional, arguments of f) Further arguments to pass to the function f.
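
As an illustration of how further arguments reach f, here is a minimal base R sketch. The wrapper apply_f() is hypothetical and only mimics the forwarding of `...`; it is not part of the package, and the psi values are made up.

```r
# hypothetical wrapper mimicking how `...` is forwarded to f
apply_f <- function(x, f = mean, ...) {
  f(x, ...)
}

# hypothetical psi scores for one pair of time series, with one extreme value
psi <- c(1.0, 1.2, 1.4, 5.0)

# plain mean, no extra arguments
plain <- apply_f(psi)

# trimmed mean: `trim` travels through `...` to mean()
trimmed <- apply_f(psi, f = mean, trim = 0.25)

c(plain = plain, trimmed = trimmed)
```

With the same pattern, a call such as distantia_aggregate(df = df, f = mean, trim = 0.25) would be expected to pass trim on to mean().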

Value

data frame

Examples

#three time series
#climate and ndvi in Fagus sylvatica stands in Spain, Germany, and Sweden
tsl <- tsl_initialize(
  x = fagus_dynamics,
  name_column = "name",
  time_column = "time"
) |>
  tsl_transform(
    f = f_scale
  )

if(interactive()){
  tsl_plot(
    tsl = tsl,
    guide_columns = 3
  )
}

#distantia with multiple parameter combinations
#-------------------------------------
df_multiple <- distantia(
  tsl = tsl,
  distance = "euclidean",
  lock_step = c(TRUE, FALSE)
)
#> Loading required package: foreach
#> Loading required package: future

df_multiple[, c(
  "x",
  "y",
  "distance",
  "lock_step",
  "psi"
)]
#>         x      y  distance lock_step       psi
#> 1 Germany  Spain euclidean      TRUE 1.3061327
#> 2 Germany Sweden euclidean      TRUE 0.8576700
#> 3   Spain Sweden euclidean      TRUE 1.4708497
#> 4 Germany  Spain euclidean     FALSE 1.3429956
#> 5 Germany Sweden euclidean     FALSE 0.8571217
#> 6   Spain Sweden euclidean     FALSE 1.4803954

#aggregation using means
df <- distantia_aggregate(
  df = df_multiple,
  f = mean
)

df
#>         x      y       psi
#> 1 Germany  Spain 1.3245642
#> 2 Germany Sweden 0.8573959
#> 3   Spain Sweden 1.4756226