Computes a square matrix of pairwise correlations for a set of numeric and/or categorical predictors.
If df is already a correlation dataframe generated by cor_df(), the function transforms it into a correlation matrix. Otherwise, cor_df() is used internally to compute pairwise correlations before generating the matrix.
Supports parallel computation via future::plan() and optional progress reporting via progressr::handlers().
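The conversion from a correlation dataframe to a square matrix can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy data and the column names x, y, and correlation are assumptions made for illustration only.

```r
# Hypothetical long-format correlation table (one row per variable pair).
# The column names x, y, and correlation are assumptions for this sketch.
cor_long <- data.frame(
  x = c("a", "a", "b"),
  y = c("b", "c", "c"),
  correlation = c(0.8, -0.3, 0.5),
  stringsAsFactors = FALSE
)

# Sorted names of all variables appearing in either column
vars <- sort(unique(c(cor_long$x, cor_long$y)))

# Start from an identity matrix: each variable correlates perfectly with itself
m <- diag(1, nrow = length(vars))
dimnames(m) <- list(vars, vars)

# Fill both triangles by name so the result is symmetric
m[cbind(cor_long$x, cor_long$y)] <- cor_long$correlation
m[cbind(cor_long$y, cor_long$x)] <- cor_long$correlation

m
```

Indexing with a two-column character matrix addresses cells by row and column name, so the order of rows in the long table does not matter.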
Arguments
- df
(required; dataframe, tibble, or sf) A dataframe with predictors or the output of cor_df(). Default: NULL.
- predictors
(optional; character vector or NULL) Names of the predictors in df. If NULL, all columns except responses and constant/near-zero-variance columns are used. Default: NULL.
- quiet
(optional; logical) If FALSE, messages are printed. Default: FALSE.
- ...
(optional) Internal args (e.g. function_name for validate_arg_function_name, a precomputed correlation matrix m, or cross-validation args for preference_order).
See also
Other multicollinearity_assessment:
collinear_stats(),
cor_clusters(),
cor_cramer(),
cor_df(),
cor_stats(),
vif(),
vif_df(),
vif_stats()
Examples
data(vi_smol)
## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
# future::multisession,
# workers = future::availableCores() - 1
# )
## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)
predictors <- c(
"koppen_zone", #character
"soil_type", #factor
"topo_elevation", #numeric
"soil_temperature_mean" #numeric
)
#from dataframe with predictors
x <- cor_matrix(
df = vi_smol,
predictors = predictors
)
#>
#> collinear::cor_matrix()
#> └── collinear::cor_df()
#> └── collinear::validate_arg_df(): converted the following character columns to factor:
#> - koppen_zone
#>
#> collinear::cor_matrix()
#> └── collinear::cor_df(): 2 categorical predictors have cardinality > 2 and may bias the multicollinearity analysis. Applying target encoding to convert them to numeric will solve this issue.
x
#> koppen_zone soil_temperature_mean soil_type
#> koppen_zone 1.0000000 0.9195774 0.3146128
#> soil_temperature_mean 0.9195774 1.0000000 0.6306982
#> soil_type 0.3146128 0.6306982 1.0000000
#> topo_elevation 0.5413656 -0.2837184 0.3458931
#> topo_elevation
#> koppen_zone 0.5413656
#> soil_temperature_mean -0.2837184
#> soil_type 0.3458931
#> topo_elevation 1.0000000
#> attr(,"class")
#> [1] "collinear_cor_matrix" "matrix" "array"
#from correlation dataframe
x <- cor_df(
df = vi,
predictors = predictors
) |>
cor_matrix()
#>
#> collinear::cor_df()
#> └── collinear::validate_arg_df(): converted the following character columns to factor:
#> - koppen_zone
#>
#> collinear::cor_df(): 2 categorical predictors have cardinality > 2 and may bias the multicollinearity analysis. Applying target encoding to convert them to numeric will solve this issue.
x
#> koppen_zone soil_temperature_mean soil_type
#> koppen_zone 1.0000000 0.9197237 0.3074300
#> soil_temperature_mean 0.9197237 1.0000000 0.6825247
#> soil_type 0.3074300 0.6825247 1.0000000
#> topo_elevation 0.5774720 -0.2613602 0.3929814
#> topo_elevation
#> koppen_zone 0.5774720
#> soil_temperature_mean -0.2613602
#> soil_type 0.3929814
#> topo_elevation 1.0000000
#> attr(,"class")
#> [1] "collinear_cor_matrix" "matrix" "array"
## OPTIONAL: disable parallelization
#future::plan(future::sequential)
