Aggregate Dissimilarity Analysis Data Frames Across Parameter Combinations
Source: R/distantia_aggregate.R
The functions distantia() and distantia_importance() allow dissimilarity assessments based on several combinations of arguments at once. For example, when the argument distance is set to c("euclidean", "manhattan"), the output data frame will show two dissimilarity scores for each pair of compared time series, one based on Euclidean distances and another based on Manhattan distances.
When df is the result of distantia(), the input data is grouped by pairs of time series, and the function f is applied to the column "psi" by group.
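The grouping described above can be pictured with a minimal base R sketch. This is not the package's internal code: the data frame below is a made-up stand-in for the output of distantia(), and stats::aggregate() is used here only as an assumed equivalent of "group by pair, apply f to psi".

```r
# Toy stand-in for the output of distantia(); all values are invented.
df_toy <- data.frame(
  x = c("A", "A", "B", "A", "A", "B"),
  y = c("B", "C", "C", "B", "C", "C"),
  distance = "euclidean",
  lock_step = rep(c(TRUE, FALSE), each = 3),
  psi = c(1.0, 2.0, 3.0, 2.0, 4.0, 5.0)
)

# Summary function, playing the role of the argument f.
f <- mean

# Group by pair of time series (x, y) and apply f to the column psi.
df_aggregated <- aggregate(psi ~ x + y, data = df_toy, FUN = f)
df_aggregated
```

Each pair appears once in the result, and the parameter columns distance and lock_step are gone because they are not part of the grouping formula.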
When df is the result of distantia_importance(), the input data is grouped by pairs of time series and variables, and the function f is applied to the columns "importance", "psi_only_with" and "psi_without" by group. However, if both TRUE and FALSE appear in the column "robust" (which distantia_importance() does not allow by default), the aggregation is cancelled with an error, because the results of the two methods should not be aggregated together.
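As a rough illustration of this case, the sketch below groups a made-up importance table by pair and variable and refuses to aggregate when the column robust mixes TRUE and FALSE. The helper aggregate_importance() is hypothetical, the data are invented, and the error message is not the one the package emits.

```r
# Toy stand-in for the output of distantia_importance(); values are invented.
df_importance <- data.frame(
  x = "A",
  y = "B",
  variable = rep(c("temperature", "rainfall"), times = 2),
  distance = rep(c("euclidean", "manhattan"), each = 2),
  robust = TRUE,
  importance = c(10, -5, 20, -15),
  psi_only_with = c(1.0, 2.0, 3.0, 4.0),
  psi_without = c(2.0, 1.0, 4.0, 3.0)
)

# Hypothetical helper, not part of the package.
aggregate_importance <- function(df, f = mean) {
  # Results of robust and non-robust methods must not be mixed.
  if (length(unique(df$robust)) > 1) {
    stop("Column 'robust' contains both TRUE and FALSE; aggregation cancelled.")
  }
  # Group by pair and variable, then apply f to the three score columns.
  aggregate(
    cbind(importance, psi_only_with, psi_without) ~ x + y + variable,
    data = df,
    FUN = f
  )
}

df_aggregated <- aggregate_importance(df_importance)
df_aggregated
```

Setting a single value of robust to FALSE in df_importance makes the helper stop with an error instead of returning a data frame.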
If psi scores smaller than zero occur in the aggregated output, the column psi is shifted by the magnitude of its smallest value so that dissimilarity scores start at zero.
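A small numeric sketch of this adjustment, under the assumption that the shift adds the absolute value of the minimum psi to every score (the values are invented):

```r
# Aggregated psi scores, one of them below zero (invented values).
psi <- c(-0.2, 0.3, 1.1)

# Assumed adjustment: shift all scores so the smallest one becomes zero.
psi_shifted <- psi
if (min(psi_shifted) < 0) {
  psi_shifted <- psi_shifted + abs(min(psi_shifted))
}
psi_shifted
```

The differences between scores are unchanged by the shift; only the origin moves.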
If there are no different combinations of arguments in the input data frame, no aggregation happens, but all parameter columns are removed.
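The no-aggregation case might look like the following base R sketch. The vector param_columns is an assumption made for the illustration: it lists only the two parameter columns of this toy data frame, whereas the real output carries more of them.

```r
# Toy result with a single combination of arguments (invented values).
df_single <- data.frame(
  x = c("A", "A", "B"),
  y = c("B", "C", "C"),
  distance = "euclidean",
  lock_step = TRUE,
  psi = c(1.2, 0.8, 1.5)
)

# Assumed parameter columns for this toy example only.
param_columns <- c("distance", "lock_step")

# Count the distinct combinations of parameters.
n_combinations <- nrow(unique(df_single[, param_columns]))

# With one combination there is nothing to summarize:
# scores are kept as they are and parameter columns are dropped.
df_out <- df_single
if (n_combinations == 1) {
  df_out <- df_single[, setdiff(names(df_single), param_columns)]
}
df_out
```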
Arguments
- df
(required, data frame) Output of distantia() or distantia_importance(). Default: NULL
- f
(optional, function) Function to summarize psi scores (for example, mean) when there are several combinations of parameters in df. Ignored when there is a single combination of arguments in the input. Default: mean
- ...
(optional, arguments of f) Further arguments to pass to the function f.
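To show how f and ... fit together, here is a hedged base R sketch in which extra arguments are forwarded to the summary function. The wrapper summarize_psi() is hypothetical and only mimics the argument layout described above; the data are invented, and trim is a standard argument of base::mean().

```r
# Toy scores for one pair across four parameter combinations (invented).
df_toy <- data.frame(
  x = "A",
  y = "B",
  psi = c(1, 2, 3, 10)
)

# Hypothetical wrapper: arguments in ... are forwarded to f by aggregate().
summarize_psi <- function(df, f = mean, ...) {
  aggregate(psi ~ x + y, data = df, FUN = f, ...)
}

# Plain mean of the four scores.
plain <- summarize_psi(df_toy, f = mean)

# Trimmed mean: trim = 0.25 drops the lowest and highest score before averaging.
trimmed <- summarize_psi(df_toy, f = mean, trim = 0.25)

plain$psi
trimmed$psi
```

A trimmed mean or a median can be a reasonable choice of f when one parameter combination produces an outlying psi score.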
See also
Other dissimilarity_analysis: distantia_boxplot(), distantia_cluster_hclust(), distantia_cluster_kmeans(), distantia_matrix(), distantia_plot(), distantia_to_sf()
Examples
#three time series
#climate and NDVI in Fagus sylvatica stands in Spain, Germany, and Sweden
tsl <- tsl_initialize(
  x = fagus_dynamics,
  name_column = "name",
  time_column = "time"
) |>
  tsl_transform(
    f = f_scale
  )
if(interactive()){
  tsl_plot(
    tsl = tsl,
    guide_columns = 3
  )
}
#distantia with multiple parameter combinations
#-------------------------------------
df_multiple <- distantia(
  tsl = tsl,
  distance = "euclidean",
  lock_step = c(TRUE, FALSE)
)
#> Loading required package: foreach
#> Loading required package: future
df_multiple[, c(
  "x",
  "y",
  "distance",
  "lock_step",
  "psi"
)]
#>         x      y  distance lock_step       psi
#> 1 Germany  Spain euclidean      TRUE 1.3061327
#> 2 Germany Sweden euclidean      TRUE 0.8576700
#> 3   Spain Sweden euclidean      TRUE 1.4708497
#> 4 Germany  Spain euclidean     FALSE 1.3429956
#> 5 Germany Sweden euclidean     FALSE 0.8571217
#> 6   Spain Sweden euclidean     FALSE 1.4803954
#aggregation using means
df <- distantia_aggregate(
  df = df_multiple,
  f = mean
)
df
#>         x      y       psi
#> 1 Germany  Spain 1.3245642
#> 2 Germany Sweden 0.8573959
#> 3   Spain Sweden 1.4756226