
Optimize the Silhouette Width of Hierarchical Clustering Solutions
Source: R/utils_cluster_hclust_optimizer.R
Performs a parallelized grid search to find the number of clusters that maximizes the overall silhouette width of the clustering solution (see utils_cluster_silhouette()). When method = NULL, the grid search also covers all agglomeration methods available in stats::hclust(). This function supports parallelization via future::plan() and a progress bar generated by the progressr package (see Examples).
Arguments
- d
(required, matrix) distance matrix typically resulting from
distantia_matrix(), but any other square matrix should work. Default: NULL
- method
(optional, character string) Argument of
stats::hclust() defining the agglomerative method. One of: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Unambiguous abbreviations are accepted as well.

This function supports a parallelization setup via
future::plan(), and progress bars provided by the package progressr.
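
The grid search described above can be sketched in base R. This is a minimal, sequential illustration of the idea and not the package's implementation: the helpers silhouette_mean() and hclust_grid_search() are hypothetical names introduced here, the silhouette width is computed by hand from the distance matrix, and the toy data are simulated.

```r
# Mean silhouette width of a partition, computed from a distance matrix.
# Hypothetical helper for illustration; not part of distantia.
silhouette_mean <- function(d, groups) {
  d <- as.matrix(d)
  n <- nrow(d)
  widths <- numeric(n)
  for (i in seq_len(n)) {
    own <- groups == groups[i]
    # singletons keep a silhouette width of 0 by convention
    if (sum(own) == 1) next
    # a: mean distance to the other members of the same cluster
    # (d[i, i] is 0, so it does not contribute to the sum)
    a <- sum(d[i, own]) / (sum(own) - 1)
    # b: smallest mean distance to the members of any other cluster
    others <- setdiff(unique(groups), groups[i])
    b <- min(vapply(others, function(g) mean(d[i, groups == g]), numeric(1)))
    widths[i] <- (b - a) / max(a, b)
  }
  mean(widths)
}

# Grid search over number of clusters and agglomeration method.
# Hypothetical helper for illustration; assumes at least 3 observations.
hclust_grid_search <- function(
    d,
    methods = c("ward.D", "ward.D2", "single", "complete",
                "average", "mcquitty", "median", "centroid")
) {
  d_dist <- stats::as.dist(d)
  n <- attr(d_dist, "Size")
  # every combination of cluster count (2 to n - 1) and method
  grid <- expand.grid(
    clusters = 2:(n - 1),
    method = methods,
    stringsAsFactors = FALSE
  )
  grid$silhouette_mean <- vapply(
    seq_len(nrow(grid)),
    function(i) {
      tree <- stats::hclust(d_dist, method = grid$method[i])
      groups <- stats::cutree(tree, k = grid$clusters[i])
      silhouette_mean(d, groups)
    },
    numeric(1)
  )
  # best solution in first row
  grid[order(grid$silhouette_mean, decreasing = TRUE), ]
}

# toy data: two tight, well-separated groups of five points each
set.seed(1)
x <- rbind(
  matrix(rnorm(10, mean = 0, sd = 0.1), ncol = 2),
  matrix(rnorm(10, mean = 5, sd = 0.1), ncol = 2)
)
d <- as.matrix(dist(x))

result <- hclust_grid_search(d)
head(result)
```

Because each row of the grid is evaluated independently, the loop is a natural candidate for parallel execution, which is presumably what the future::plan() setup mentioned above exploits.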
See also
Other distantia_support:
distantia_aggregate(),
distantia_boxplot(),
distantia_cluster_hclust(),
distantia_cluster_kmeans(),
distantia_matrix(),
distantia_model_frame(),
distantia_spatial(),
distantia_stats(),
distantia_time_delay(),
utils_block_size(),
utils_cluster_kmeans_optimizer(),
utils_cluster_silhouette()
Examples
#weekly covid prevalence
#in 10 California counties
#aggregated by month
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
) |>
  tsl_subset(
    names = 1:10
  ) |>
  tsl_aggregate(
    new_time = "months",
    fun = max
  )
if(interactive()){
  #plotting first three time series
  tsl_plot(
    tsl = tsl_subset(
      tsl = tsl,
      names = 1:3
    ),
    guide_columns = 3
  )
}
#compute dissimilarity matrix
psi_matrix <- distantia(
  tsl = tsl,
  lock_step = TRUE
) |>
  distantia_matrix()
#optimize hierarchical clustering
hclust_optimization <- utils_cluster_hclust_optimizer(
  d = psi_matrix
)
#best solution in first row
head(hclust_optimization)
#>   clusters   method silhouette_mean
#> 1        5   ward.D       0.3127728
#> 2        5  ward.D2       0.3127728
#> 3        5   single       0.3127728
#> 4        5 complete       0.3127728
#> 5        5  average       0.3127728
#> 6        5 mcquitty       0.3127728