Pairwise Correlation Data Frame

Computes a pairwise correlation data frame. Implements methods to compare different types of predictors:

numeric vs. numeric: the function cor_numeric_vs_numeric() computes the correlation with stats::cor() using the method "pearson" or "spearman".
numeric vs. categorical: the function cor_numeric_vs_categorical() target-encodes the categorical variable with target_encoding_lab() and the method "loo" (leave-one-out), using the numeric variable as reference, and then computes the correlation between the numeric variable and the encoded variable with stats::cor().
categorical vs. categorical: the function cor_categorical_vs_categorical() computes Cramer's V (see cor_cramer_v()) as an indicator of the association between character or factor variables. Keep in mind that Cramer's V is not directly comparable with R-squared, even though both range from zero to one. It is recommended to target-encode categorical variables with target_encoding_lab() before the pairwise correlation analysis.
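The two non-trivial comparisons above can be illustrated with a minimal base R sketch. It is not the package implementation: the toy data and the helpers loo_encode() and cramer_v() are hypothetical, the leave-one-out encoding is the plain "group mean of the other rows" variant, and Cramer's V is computed without any bias correction, which may differ from what target_encoding_lab() and cor_cramer_v() actually do.

```r
# Toy data (hypothetical, for illustration only)
toy <- data.frame(
  num = c(1, 2, 3, 10, 11, 12),
  cat_a = c("a", "a", "a", "b", "b", "b"),
  cat_b = c("u", "u", "v", "v", "w", "w"),
  stringsAsFactors = FALSE
)

# Leave-one-out target encoding: each row receives the mean of the numeric
# reference over the OTHER rows of its group. Groups with one row yield NaN.
loo_encode <- function(x_cat, y_num) {
  group_sum <- ave(y_num, x_cat, FUN = sum)
  group_n <- ave(y_num, x_cat, FUN = length)
  (group_sum - y_num) / (group_n - 1)
}

toy$cat_a_encoded <- loo_encode(toy$cat_a, toy$num)

# numeric vs. categorical: correlate the numeric variable with the encoding
cor_num_cat <- stats::cor(toy$num, toy$cat_a_encoded, method = "pearson")

# categorical vs. categorical: uncorrected Cramer's V from the chi-squared
# statistic of the contingency table
cramer_v <- function(x, y) {
  tab <- table(x, y)
  chi2 <- suppressWarnings(
    stats::chisq.test(tab, correct = FALSE)$statistic
  )
  n <- sum(tab)
  k <- min(nrow(tab), ncol(tab)) - 1
  unname(sqrt(chi2 / (n * k)))
}

v_ab <- cramer_v(toy$cat_a, toy$cat_b)
```

Here cor_num_cat is a signed correlation coefficient, while v_ab is an unsigned association measure, which is one reason the two are not directly comparable.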

Accepts a parallelization setup via future::plan() and a progress bar via progressr::handlers() (see examples).

Usage

cor_df(df = NULL, predictors = NULL, quiet = FALSE)

cor_numeric_vs_numeric(df = NULL, predictors = NULL, quiet = FALSE)

cor_numeric_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)

cor_categorical_vs_categorical(df = NULL, predictors = NULL, quiet = FALSE)

Arguments

df: (required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.
predictors: (optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL.
quiet: (optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE.

Value

data frame; pairwise correlations, with one row per pair of predictors and the columns x, y, and correlation (see examples).

Examples

data(
  vi,
  vi_predictors
)

#reduce size of vi to speed up example execution
vi <- vi[1:1000, ]

#mixed predictors
vi_predictors <- vi_predictors[1:10]

#parallelization setup
future::plan(
  future::multisession,
  workers = 2 #set to parallelly::availableCores() - 1
)

#progress bar
# progressr::handlers(global = TRUE)

#correlation data frame
df <- cor_df(
  df = vi,
  predictors = vi_predictors
)

df
#>                     x                  y correlation
#> 1  koppen_description        koppen_zone  0.99745611
#> 2        koppen_group        koppen_zone  0.99142913
#> 3            swi_mean koppen_description  0.90497221
#> 4            swi_mean        koppen_zone  0.90454452
#> 5             swi_max           swi_mean  0.89615880
#> 6             swi_max        koppen_zone  0.89539332
#> 7             swi_max koppen_description  0.89444755
#> 8  koppen_description       koppen_group  0.87130629
#> 9             swi_min        koppen_zone  0.86281575
#> 10           swi_mean       koppen_group  0.86243147
#> 11            swi_min koppen_description  0.82466707
#> 12            swi_max       koppen_group  0.81826035
#> 13            swi_min       koppen_group  0.81719574
#> 14           swi_mean          soil_type  0.75221964
#> 15            swi_max          soil_type  0.73723126
#> 16            swi_min           swi_mean  0.67767585
#> 17            swi_min            swi_max  0.64216484
#> 18            swi_min          soil_type  0.61277268
#> 19       koppen_group          soil_type  0.57728499
#> 20     topo_diversity         topo_slope  0.53815381
#> 21     topo_elevation        koppen_zone  0.53549999
#> 22     topo_elevation koppen_description  0.51881683
#> 23     topo_elevation         topo_slope  0.37851557
#> 24 koppen_description          soil_type  0.36467487
#> 25     topo_diversity koppen_description  0.35558428
#> 26     topo_diversity        koppen_zone  0.35536647
#> 27     topo_diversity          soil_type  0.34146783
#> 28        koppen_zone          soil_type  0.33542742
#> 29     topo_elevation          soil_type  0.32824302
#> 30     topo_diversity       koppen_group  0.31970029
#> 31         topo_slope        koppen_zone  0.31392482
#> 32         topo_slope koppen_description  0.29823947
#> 33         topo_slope          soil_type  0.29353313
#> 34         topo_slope       koppen_group  0.27219308
#> 35           swi_mean     topo_diversity  0.24291289
#> 36            swi_min     topo_diversity  0.23685867
#> 37     topo_elevation     topo_diversity  0.22066271
#> 38            swi_min         topo_slope  0.21877803
#> 39     topo_elevation       koppen_group  0.20367646
#> 40            swi_max     topo_diversity  0.18539344
#> 41           swi_mean     topo_elevation -0.17649343
#> 42           swi_mean         topo_slope  0.12893647
#> 43            swi_max         topo_slope  0.10922059
#> 44            swi_max     topo_elevation -0.08993101
#> 45            swi_min     topo_elevation -0.04417289

#disable parallelization
future::plan(future::sequential)
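As a follow-up sketch, a result with the columns x, y, and correlation can be filtered to keep only the strongly correlated pairs. The data frame below copies four rows from the example output so that the snippet runs without the package; the object names and the threshold max_cor are hypothetical and chosen only for illustration.

```r
# A few rows copied from the example output above
cor_pairs <- data.frame(
  x = c("koppen_description", "swi_max", "swi_mean", "swi_min"),
  y = c("koppen_zone", "swi_mean", "topo_elevation", "topo_elevation"),
  correlation = c(0.99745611, 0.89615880, -0.17649343, -0.04417289)
)

# Hypothetical threshold, not a package default
max_cor <- 0.75

# Keep pairs whose absolute correlation reaches the threshold, so that
# strong negative correlations would be retained as well
high_pairs <- cor_pairs[abs(cor_pairs$correlation) >= max_cor, ]

high_pairs
```

Filtering on the absolute value matters because the correlation column is signed for numeric pairs.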
