Computes a pairwise correlation data frame. Implements methods to compare different types of predictors:
- numeric vs. numeric: computed with stats::cor() using the methods "pearson" or "spearman", via cor_numeric_vs_numeric().
- numeric vs. categorical: the function cor_numeric_vs_categorical() target-encodes the categorical variable using the numeric variable as reference with target_encoding_lab() and the method "loo" (leave-one-out), and then computes their correlation with stats::cor().
- categorical vs. categorical: the function cor_categorical_vs_categorical() computes Cramer's V (see cor_cramer_v()) as an indicator of the association between character or factor variables. Keep in mind, however, that Cramer's V is not directly comparable with R-squared, even though both range from zero to one. It is always recommended to target-encode categorical variables with target_encoding_lab() before the pairwise correlation analysis.
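The two less familiar cases above can be illustrated with a minimal base R sketch. The helpers loo_encode() and cramer_v() are hypothetical names written for this illustration and are not part of the package; target_encoding_lab() and cor_cramer_v() may differ in details such as how single-row categories are handled or whether a bias correction is applied. The sketch assumes the textbook definitions: leave-one-out encoding replaces each category value with the mean of the numeric variable over the other rows of the same category, and Cramer's V is sqrt(chi-squared / (n * (min(rows, columns) - 1))).

```r
# Hypothetical helper: leave-one-out target encoding.
# Each row gets the mean of `x_num` over the OTHER rows of its category.
loo_encode <- function(x_num, x_cat) {
  group_sum <- ave(x_num, x_cat, FUN = sum)
  group_n <- ave(x_num, x_cat, FUN = length)
  encoded <- (group_sum - x_num) / (group_n - 1)
  # Assumption: categories with a single row fall back to the global mean
  encoded[group_n == 1] <- mean(x_num)
  encoded
}

# Hypothetical helper: uncorrected Cramer's V from a contingency table.
cramer_v <- function(a, b) {
  tab <- table(a, b)
  chi2 <- suppressWarnings(
    stats::chisq.test(tab, correct = FALSE)$statistic
  )
  k <- min(nrow(tab), ncol(tab))
  sqrt(as.numeric(chi2) / (sum(tab) * (k - 1)))
}

# Toy data: three categories with clearly separated numeric values
toy <- data.frame(
  zone = rep(c("a", "b", "c"), each = 4),
  value = c(1, 2, 3, 4, 11, 12, 13, 14, 21, 22, 23, 24)
)

# numeric vs. categorical: encode the category, then correlate
zone_encoded <- loo_encode(toy$value, toy$zone)
cor_num_cat <- stats::cor(toy$value, zone_encoded, method = "pearson")

# categorical vs. categorical: a variable against itself is a perfect association
v_same <- cramer_v(toy$zone, toy$zone)
```

Because the encoded variable lives on the scale of the numeric reference, the numeric vs. categorical result is an ordinary correlation coefficient, whereas Cramer's V is a chi-squared-based measure that cannot be negative.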
Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).
Usage
cor_df(df = NULL, predictors = NULL, quiet = FALSE)
cor_numeric_vs_numeric(df = NULL, predictors = NULL, quiet = FALSE)
cor_numeric_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)
cor_categorical_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)
Arguments
- df
(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.
- predictors
(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL.
- quiet
(optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE.
See also
Other pairwise_correlation: cor_clusters(), cor_cramer_v(), cor_matrix(), cor_select()
Examples
data(
  vi,
  vi_predictors
)

#reduce size of vi to speed up example execution
vi <- vi[1:1000, ]

#mixed predictors
vi_predictors <- vi_predictors[1:10]

#parallelization setup
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)

#progress bar
# progressr::handlers(global = TRUE)

#correlation data frame
df <- cor_df(
  df = vi,
  predictors = vi_predictors
)

df
#> x y correlation
#> 1 koppen_description koppen_zone 0.99745611
#> 2 koppen_group koppen_zone 0.99142913
#> 3 swi_mean koppen_description 0.90497221
#> 4 swi_mean koppen_zone 0.90454452
#> 5 swi_max swi_mean 0.89615880
#> 6 swi_max koppen_zone 0.89539332
#> 7 swi_max koppen_description 0.89444755
#> 8 koppen_description koppen_group 0.87130629
#> 9 swi_min koppen_zone 0.86281575
#> 10 swi_mean koppen_group 0.86243147
#> 11 swi_min koppen_description 0.82466707
#> 12 swi_max koppen_group 0.81826035
#> 13 swi_min koppen_group 0.81719574
#> 14 swi_mean soil_type 0.75221964
#> 15 swi_max soil_type 0.73723126
#> 16 swi_min swi_mean 0.67767585
#> 17 swi_min swi_max 0.64216484
#> 18 swi_min soil_type 0.61277268
#> 19 koppen_group soil_type 0.57728499
#> 20 topo_diversity topo_slope 0.53815381
#> 21 topo_elevation koppen_zone 0.53549999
#> 22 topo_elevation koppen_description 0.51881683
#> 23 topo_elevation topo_slope 0.37851557
#> 24 koppen_description soil_type 0.36467487
#> 25 topo_diversity koppen_description 0.35558428
#> 26 topo_diversity koppen_zone 0.35536647
#> 27 topo_diversity soil_type 0.34146783
#> 28 koppen_zone soil_type 0.33542742
#> 29 topo_elevation soil_type 0.32824302
#> 30 topo_diversity koppen_group 0.31970029
#> 31 topo_slope koppen_zone 0.31392482
#> 32 topo_slope koppen_description 0.29823947
#> 33 topo_slope soil_type 0.29353313
#> 34 topo_slope koppen_group 0.27219308
#> 35 swi_mean topo_diversity 0.24291289
#> 36 swi_min topo_diversity 0.23685867
#> 37 topo_elevation topo_diversity 0.22066271
#> 38 swi_min topo_slope 0.21877803
#> 39 topo_elevation koppen_group 0.20367646
#> 40 swi_max topo_diversity 0.18539344
#> 41 swi_mean topo_elevation -0.17649343
#> 42 swi_mean topo_slope 0.12893647
#> 43 swi_max topo_slope 0.10922059
#> 44 swi_max topo_elevation -0.08993101
#> 45 swi_min topo_elevation -0.04417289
#disable parallelization
future::plan(future::sequential)
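
The result is a regular data frame with the columns x, y, and correlation, so it can be filtered with base R. The sketch below is only an illustration: cor_result is a hand-copied subset of four rows from the output above (so that it runs without the package), and the 0.75 threshold is an arbitrary assumption, not a package default.

```r
# Four rows copied by hand from the printed output above
cor_result <- data.frame(
  x = c("koppen_description", "swi_max", "swi_min", "swi_min"),
  y = c("koppen_zone", "swi_mean", "swi_mean", "topo_elevation"),
  correlation = c(0.99745611, 0.89615880, 0.67767585, -0.04417289)
)

# Illustrative threshold (assumption); compare on absolute values because
# numeric vs. numeric correlations can be negative
threshold <- 0.75
high_pairs <- cor_result[abs(cor_result$correlation) >= threshold, ]

high_pairs
```

With the real output, the same subsetting applies directly to the data frame returned by cor_df().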