Hierarchical clustering of predictors from their pairwise correlation matrix. Computes the correlation matrix with cor_df() and cor_matrix(), transforms it to a dist object, computes a clustering solution with stats::hclust(), and applies stats::cutree() to separate groups based on the value of the argument max_cor.
Returns a data frame with predictor names and their clusters and, optionally, plots a dendrogram of the clustering solution.
Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).
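The steps described above can be sketched with base R alone. This is a minimal illustration, not the function's actual implementation: the simulated data, the `1 - abs(r)` dissimilarity, the cut height `1 - max_cor`, and the output column names `predictor` and `cluster` are all assumptions made for the example.

```r
# Illustrative sketch only: base R, simulated data, no package internals.
set.seed(1)
n <- 200
a <- rnorm(n)
b <- rnorm(n)
df <- data.frame(
  a1 = a,
  a2 = a + rnorm(n, sd = 0.1), # strongly correlated with a1
  b1 = b,
  b2 = b + rnorm(n, sd = 0.1), # strongly correlated with b1
  c1 = rnorm(n)                # independent of the others
)

max_cor <- 0.75

# 1. pairwise correlation matrix (Pearson, for simplicity)
m <- stats::cor(df)

# 2. convert correlations to dissimilarities (assumed transform: 1 - |r|)
d <- stats::as.dist(1 - abs(m))

# 3. hierarchical clustering of the predictors
hc <- stats::hclust(d, method = "complete")

# 4. cut the tree so that, with complete linkage, every pair within a
#    cluster has |r| >= max_cor (assumed cut height: 1 - max_cor)
clusters <- stats::cutree(hc, h = 1 - max_cor)

# data frame with predictor names and cluster ids (hypothetical column names)
df_clusters <- data.frame(
  predictor = names(clusters),
  cluster = unname(clusters)
)
```

With complete linkage, the merge height of a cluster equals the largest dissimilarity between any two of its members, which is why cutting at `1 - max_cor` groups only predictors whose absolute pairwise correlations all reach `max_cor`. Other linkage methods do not offer this guarantee.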
Usage
cor_clusters(
  df = NULL,
  predictors = NULL,
  max_cor = 0.75,
  method = "complete",
  plot = FALSE
)
Arguments
- df
(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.
- predictors
(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL.
- max_cor
(optional; numeric) Maximum correlation allowed between any pair of variables in predictors. Recommended values are between 0.5 and 0.9. Higher values return a larger number of predictors with higher multicollinearity. If NULL, the pairwise correlation analysis is disabled. Default: 0.75.
- method
(optional; character string) Argument of stats::hclust() defining the agglomerative method. One of: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Unambiguous abbreviations are accepted as well. Default: "complete".
- plot
(optional; logical) If TRUE, the clustering is plotted. Default: FALSE.
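To make the role of max_cor concrete, the following sketch cuts the same tree at several thresholds. It uses a hand-made correlation matrix for four hypothetical predictors (p1 to p4) and assumes, for illustration only, that correlations are converted to dissimilarities as `1 - abs(r)` and that the tree is cut at height `1 - max_cor`; the function itself may proceed differently.

```r
# Hand-made correlation matrix for four hypothetical predictors.
predictor_names <- c("p1", "p2", "p3", "p4")
m <- matrix(
  c(1.0, 0.9, 0.6, 0.1,
    0.9, 1.0, 0.6, 0.1,
    0.6, 0.6, 1.0, 0.1,
    0.1, 0.1, 0.1, 1.0),
  nrow = 4,
  dimnames = list(predictor_names, predictor_names)
)

# Assumed transform: dissimilarity = 1 - |r|
hc <- stats::hclust(stats::as.dist(1 - abs(m)), method = "complete")

# Number of clusters obtained for increasingly strict thresholds
thresholds <- c(0.5, 0.75, 0.95)
n_clusters <- sapply(
  thresholds,
  function(max_cor) {
    length(unique(stats::cutree(hc, h = 1 - max_cor)))
  }
)
```

In this toy case a larger max_cor produces more, smaller clusters, because fewer pairs of predictors are correlated strongly enough to be grouped together.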
See also
Other pairwise_correlation: cor_cramer_v(), cor_df(), cor_matrix(), cor_select()
Examples
# parallelization setup
future::plan(
  future::multisession,
  workers = 2 # set to parallelly::availableCores() - 1
)

# progress bar
# progressr::handlers(global = TRUE)

df_clusters <- cor_clusters(
  df = vi[1:1000, ],
  predictors = vi_predictors[1:15]
)

# disable parallelization
future::plan(future::sequential)