
Computes a square matrix of pairwise correlations for a set of numeric and/or categorical predictors.

If df is already a correlation dataframe generated by cor_df(), the function transforms it into a correlation matrix. Otherwise, cor_df() is used internally to compute pairwise correlations before generating the matrix.
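The long-to-square transformation described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the toy table and its column names (x, y, correlation) are assumptions made for this sketch and may not match the actual output of cor_df().

```r
# Hypothetical long-format correlation table, one row per variable pair.
# The column names x, y, and correlation are assumptions for this sketch.
cor_long <- data.frame(
  x = c("a", "a", "b"),
  y = c("b", "c", "c"),
  correlation = c(0.8, -0.3, 0.5),
  stringsAsFactors = FALSE
)

# All variable names found in either column
vars <- sort(unique(c(cor_long$x, cor_long$y)))

# Start from an identity matrix: each variable correlates 1 with itself
m <- diag(1, nrow = length(vars))
dimnames(m) <- list(vars, vars)

# Fill both triangles by name so the matrix is symmetric
idx <- cbind(cor_long$x, cor_long$y)
m[idx] <- cor_long$correlation
m[idx[, 2:1, drop = FALSE]] <- cor_long$correlation

m
```

Indexing with a two-column character matrix addresses cells by row and column name, so the result does not depend on the order of the rows in the long table.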

Supports parallel computation via future::plan() and optional progress reporting via progressr::handlers().

Usage

cor_matrix(df = NULL, predictors = NULL, quiet = FALSE, ...)

Arguments

df

(required; dataframe, tibble, or sf) A dataframe with predictors or the output of cor_df(). Default: NULL.

predictors

(optional; character vector or NULL) Names of the predictors in df. If NULL, all columns except responses and constant/near-zero-variance columns are used. Default: NULL.

quiet

(optional; logical) If FALSE, messages are printed. Default: FALSE.

...

(optional) Internal args (e.g. function_name for validate_arg_function_name, a precomputed correlation matrix m, or cross-validation args for preference_order).

Value

A square correlation matrix of class collinear_cor_matrix, with predictor names as row and column names.

See also

Author

Blas M. Benito, PhD

Examples

data(vi_smol)

## OPTIONAL: parallelization setup
## irrelevant when all predictors are numeric
## only worth it for large data with many categoricals
# future::plan(
#   future::multisession,
#   workers = future::availableCores() - 1
# )

## OPTIONAL: progress bar
# progressr::handlers(global = TRUE)

predictors <- c(
  "koppen_zone", #character
  "soil_type", #factor
  "topo_elevation", #numeric
  "soil_temperature_mean" #numeric
)

#from dataframe with predictors
x <- cor_matrix(
  df = vi_smol,
  predictors = predictors
)
#> 
#> collinear::cor_matrix()
#> └── collinear::cor_df()
#>     └── collinear::validate_arg_df(): converted the following character columns to factor:
#>  - koppen_zone
#> 
#> collinear::cor_matrix()
#> └── collinear::cor_df(): 2 categorical predictors have cardinality > 2 and may bias the multicollinearity analysis. Applying target encoding to convert them to numeric will solve this issue.

x
#>                       koppen_zone soil_temperature_mean soil_type
#> koppen_zone             1.0000000             0.9195774 0.3146128
#> soil_temperature_mean   0.9195774             1.0000000 0.6306982
#> soil_type               0.3146128             0.6306982 1.0000000
#> topo_elevation          0.5413656            -0.2837184 0.3458931
#>                       topo_elevation
#> koppen_zone                0.5413656
#> soil_temperature_mean     -0.2837184
#> soil_type                  0.3458931
#> topo_elevation             1.0000000
#> attr(,"class")
#> [1] "collinear_cor_matrix" "matrix"               "array"               

#from correlation dataframe
x <- cor_df(
  df = vi,
  predictors = predictors
) |>
  cor_matrix()
#> 
#> collinear::cor_df()
#> └── collinear::validate_arg_df(): converted the following character columns to factor:
#>  - koppen_zone
#> 
#> collinear::cor_df(): 2 categorical predictors have cardinality > 2 and may bias the multicollinearity analysis. Applying target encoding to convert them to numeric will solve this issue.

x
#>                       koppen_zone soil_temperature_mean soil_type
#> koppen_zone             1.0000000             0.9197237 0.3074300
#> soil_temperature_mean   0.9197237             1.0000000 0.6825247
#> soil_type               0.3074300             0.6825247 1.0000000
#> topo_elevation          0.5774720            -0.2613602 0.3929814
#>                       topo_elevation
#> koppen_zone                0.5774720
#> soil_temperature_mean     -0.2613602
#> soil_type                  0.3929814
#> topo_elevation             1.0000000
#> attr(,"class")
#> [1] "collinear_cor_matrix" "matrix"               "array"               

## OPTIONAL: disable parallelization
#future::plan(future::sequential)
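
One possible way to use the returned matrix is to list the predictor pairs whose absolute correlation exceeds a cutoff. The sketch below uses base R only. The matrix values are copied from the first example output above, while the 0.75 threshold and the object names (m, hits, high_pairs) are arbitrary choices for illustration and are not part of the package.

```r
# Rebuild the matrix from the first example output above
vars <- c(
  "koppen_zone",
  "soil_temperature_mean",
  "soil_type",
  "topo_elevation"
)

m <- matrix(
  c(
    1.0000000,  0.9195774, 0.3146128,  0.5413656,
    0.9195774,  1.0000000, 0.6306982, -0.2837184,
    0.3146128,  0.6306982, 1.0000000,  0.3458931,
    0.5413656, -0.2837184, 0.3458931,  1.0000000
  ),
  nrow = 4,
  byrow = TRUE,
  dimnames = list(vars, vars)
)

# Hypothetical cutoff, chosen only for this illustration
threshold <- 0.75

# Restrict to the upper triangle so each pair is reported once
# and the diagonal is ignored
hits <- which(abs(m) > threshold & upper.tri(m), arr.ind = TRUE)

high_pairs <- data.frame(
  x = rownames(m)[hits[, "row"]],
  y = colnames(m)[hits[, "col"]],
  correlation = m[hits],
  stringsAsFactors = FALSE
)

high_pairs
```

With these values, only the pair koppen_zone and soil_temperature_mean passes the cutoff.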