Hierarchical clustering of predictors from their pairwise correlation matrix. Computes the correlation matrix with cor_df() and cor_matrix(), transforms it to a dist object, computes a clustering solution with stats::hclust(), and applies stats::cutree() to separate groups based on the value of the argument max_cor.
Returns a data frame with predictor names and their clusters and, optionally, plots a dendrogram of the clustering solution.
Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).
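The steps described above can be sketched with base R alone. This is only an illustration, not the package's internal code: the toy data, the 1 - |r| transform from correlation to distance, the cut height of 1 - max_cor, and the output column names predictor and cluster are all assumptions made for the example.

```r
# Base-R sketch of the described workflow (assumptions flagged in comments)
set.seed(1)

# Hypothetical toy data: a and b are strongly correlated, c is independent
n <- 200
a <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.1),
  c = rnorm(n)
)

max_cor <- 0.75

# 1. pairwise correlation matrix, in absolute values
cor_abs <- abs(stats::cor(toy))

# 2. correlation to distance (assumed transform: 1 - |r|)
cor_dist <- stats::as.dist(1 - cor_abs)

# 3. hierarchical clustering of the distance object
hc <- stats::hclust(cor_dist, method = "complete")

# 4. cut the tree at the height matching max_cor (assumed: 1 - max_cor)
clusters <- stats::cutree(hc, h = 1 - max_cor)

# data frame of predictor names and cluster ids (hypothetical column names)
result <- data.frame(
  predictor = names(clusters),
  cluster = unname(clusters)
)
```

Under these assumptions, a and b fall in the same cluster because their correlation exceeds max_cor, while c is placed in a cluster of its own.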
Usage
cor_clusters(
df = NULL,
predictors = NULL,
max_cor = 0.75,
method = "complete",
plot = FALSE
)
Arguments
- df
(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.
- predictors
(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL
- max_cor
(optional; numeric) Maximum correlation allowed between any pair of variables in predictors. Recommended values are between 0.5 and 0.9. Higher values return a larger number of predictors with higher multicollinearity. If NULL, the pairwise correlation analysis is disabled. Default: 0.75
- method
(optional; character string) Argument of stats::hclust() defining the agglomerative method. One of: "ward.D", "ward.D2", "single", "complete", "average" (= UPGMA), "mcquitty" (= WPGMA), "median" (= WPGMC) or "centroid" (= UPGMC). Unambiguous abbreviations are accepted as well. Default: "complete".
- plot
(optional; logical) If TRUE, the clustering is plotted. Default: FALSE
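To illustrate how method and max_cor interact, the sketch below compares "complete" and "single" linkage on a small hand-made correlation matrix. It uses only stats::hclust() and stats::cutree(), and assumes, as an illustration rather than a statement about the package internals, that distances are 1 - |r| and that the tree is cut at 1 - max_cor. The matrix values and the helper min_within_cor() are hypothetical.

```r
# Hypothetical absolute correlations: x1-x2 and x2-x3 are high, x1-x3 is not
vars <- c("x1", "x2", "x3")
cor_abs <- matrix(
  c(1.00, 0.85, 0.50,
    0.85, 1.00, 0.80,
    0.50, 0.80, 1.00),
  nrow = 3,
  byrow = TRUE,
  dimnames = list(vars, vars)
)

max_cor <- 0.75

# assumed transform from correlation to distance
cor_dist <- stats::as.dist(1 - cor_abs)

# same distances, two agglomerative methods, same cut height
cl_complete <- stats::cutree(
  stats::hclust(cor_dist, method = "complete"),
  h = 1 - max_cor
)
cl_single <- stats::cutree(
  stats::hclust(cor_dist, method = "single"),
  h = 1 - max_cor
)

# hypothetical helper: lowest correlation between variables sharing a cluster
min_within_cor <- function(cor_abs, clusters) {
  same <- outer(clusters, clusters, "==")
  min(cor_abs[same])
}

min_within_cor(cor_abs, cl_complete) # stays at or above max_cor
min_within_cor(cor_abs, cl_single)   # drops below max_cor through chaining
```

With complete linkage, a merge height is the largest distance between members of the merged groups, so cutting at 1 - max_cor keeps every within-cluster pair at or above max_cor. Single linkage merges on the smallest distance, so x1 and x3 end up in one cluster here even though their correlation is only 0.5.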
See also
Other pairwise_correlation:
cor_cramer_v(),
cor_df(),
cor_matrix(),
cor_select()
Examples
#parallelization setup
future::plan(
future::multisession,
workers = 2 #set to parallelly::availableCores() - 1
)
#progress bar
# progressr::handlers(global = TRUE)
df_clusters <- cor_clusters(
df = vi[1:1000, ],
predictors = vi_predictors[1:15]
)
#disable parallelization
future::plan(future::sequential)
