Optimize the Silhouette Width of K-Means Clustering Solutions
Source: R/utils_cluster_kmeans_optimizer.R
Generates k-means solutions for every number of clusters from 2 to nrow(d) - 1 and ranks them by silhouette width, so that the number of clusters with the highest silhouette width comes first. See utils_cluster_silhouette() for more details.
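The search described above can be sketched with base R and the cluster package. This is an illustrative approximation, not the package's implementation: the toy data, the helper name silhouette_for_k(), and the choice of treating the rows of d as feature vectors for stats::kmeans() are all assumptions made for the example.

```r
# Illustrative sketch only: not the distantia implementation.
# Requires the 'cluster' package (a recommended package shipped with R).

# toy symmetric distance matrix from random 2D points (hypothetical data)
set.seed(1)
xy <- matrix(stats::rnorm(20), ncol = 2)
d <- as.matrix(stats::dist(xy))

# hypothetical helper: mean silhouette width of a k-means solution with k clusters
silhouette_for_k <- function(d, k, seed = 1) {
  set.seed(seed)
  # assumption: rows of d are treated as feature vectors
  solution <- stats::kmeans(x = d, centers = k)
  sil <- cluster::silhouette(x = solution$cluster, dist = stats::as.dist(d))
  mean(sil[, "sil_width"])
}

# candidate numbers of clusters: 2 to nrow(d) - 1
k_values <- seq(from = 2, to = nrow(d) - 1)

optimization <- data.frame(
  clusters = k_values,
  silhouette_mean = vapply(
    k_values,
    function(k) silhouette_for_k(d = d, k = k),
    numeric(1)
  )
)

# rank solutions so that the best one is in the first row
optimization <- optimization[
  order(optimization$silhouette_mean, decreasing = TRUE),
]
rownames(optimization) <- NULL

head(optimization)
```

Sorting in decreasing order mirrors the layout of the example output on this page, where the best solution sits in the first row.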
Arguments
- d
(required, matrix) distance matrix typically resulting from distantia_matrix(), but any other square matrix should work. Default: NULL
- seed
(optional, integer) Random seed to be used during the k-means computation. Default: 1
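A seed matters here because stats::kmeans() picks its initial centers at random, so two runs on the same data can produce different partitions. The sketch below shows the general idea with a hypothetical wrapper, kmeans_seeded(), and toy data; how the package applies the seed internally is not shown on this page.

```r
# toy data (hypothetical), unrelated to the package datasets
set.seed(42)
x <- matrix(stats::rnorm(40), ncol = 2)

# hypothetical wrapper: fix the seed right before calling kmeans()
kmeans_seeded <- function(x, centers, seed = 1) {
  set.seed(seed)
  stats::kmeans(x = x, centers = centers)
}

a <- kmeans_seeded(x = x, centers = 3, seed = 1)
b <- kmeans_seeded(x = x, centers = 3, seed = 1)

# same seed, same initial centers, same partition
identical(a$cluster, b$cluster)
```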
See also
Other internal_dissimilarity_analysis: utils_block_size(), utils_cluster_hclust_optimizer(), utils_cluster_silhouette(), utils_importance_df_to_wide()
Examples
#parallelization and progress bar
#for large datasets, parallelization accelerates cluster optimization
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)
#progress bar
# progressr::handlers(global = TRUE)
#weekly covid prevalence
#in 10 California counties
#aggregated by month
tsl <- tsl_initialize(
  x = covid_prevalence,
  name_column = "name",
  time_column = "time"
) |>
  tsl_subset(
    names = 1:10
  ) |>
  tsl_aggregate(
    new_time = "months",
    fun = max
  )
if(interactive()){
  #plotting first three time series
  tsl_plot(
    tsl = tsl_subset(
      tsl = tsl,
      names = 1:3
    ),
    guide_columns = 3
  )
}
#compute dissimilarity matrix
psi_matrix <- distantia(
  tsl = tsl,
  lock_step = TRUE
) |>
  distantia_matrix()
#optimize k-means clustering
kmeans_optimization <- utils_cluster_kmeans_optimizer(
  d = psi_matrix
)
#best solution in first row
head(kmeans_optimization)
#>   clusters silhouette_mean
#> 1        2       0.3175009
#> 2        5       0.3080265
#> 3        4       0.2970758
#> 4        3       0.2618570
#> 5        6       0.2460827
#> 6        7       0.2312105
#disable parallelization
future::plan(
  future::sequential
)