Computes the correlation matrix among a set of predictors, orders it according to a user-defined preference order, and removes variables one by one, respecting that order, until the pairwise Pearson correlations among the remaining variables are below a given threshold.

Warning: variables in preference.order that are not in colnames(x), and non-numeric columns, are silently removed from x and preference.order. Rows with NA values are silently removed as well (na.omit() is applied). The function issues a warning if zero-variance columns are found.
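The filtering described above can be sketched in base R. This is an illustration only, not the package's implementation of auto_cor(): greedy_cor_filter is a hypothetical helper, and the use of absolute correlations, the forward greedy pass, and the dropping of zero-variance columns are assumptions made for the sketch.

```r
# Illustrative sketch only: greedy_cor_filter is a hypothetical helper,
# not the code behind auto_cor().
greedy_cor_filter <- function(x, preference.order = NULL, cor.threshold = 0.5) {

  # keep numeric columns and complete rows, as the description states
  x <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  x <- stats::na.omit(x)

  # warn about zero-variance columns
  # (dropping them afterwards is an assumption of this sketch)
  zero.var <- vapply(x, function(v) stats::var(v) == 0, logical(1))
  if (any(zero.var)) {
    warning("Zero-variance columns: ", paste(names(x)[zero.var], collapse = ", "))
    x <- x[, !zero.var, drop = FALSE]
  }

  # preferred variables first, remaining ones in column order;
  # names not found in x are dropped from preference.order
  preference.order <- intersect(preference.order, colnames(x))
  candidates <- c(preference.order, setdiff(colnames(x), preference.order))

  # absolute Pearson correlations (taking the absolute value is an assumption)
  x.cor <- abs(stats::cor(x[, candidates, drop = FALSE], method = "pearson"))

  # keep a candidate only if it is not too correlated with any variable already kept
  selected <- character(0)
  for (variable in candidates) {
    if (length(selected) == 0 || all(x.cor[variable, selected] <= cor.threshold)) {
      selected <- c(selected, variable)
    }
  }

  list(
    cor = x.cor[selected, selected, drop = FALSE],
    selected.variables = selected,
    selected.variables.df = x[, selected, drop = FALSE]
  )
}

# toy data: b is strongly correlated with a, c is not, g is non-numeric
toy <- data.frame(a = 1:10, b = (1:10)^2, c = rep(c(1, -1), 5), g = rep("x", 10))

greedy_cor_filter(toy)$selected.variables                          # "a" "c"
greedy_cor_filter(toy, preference.order = "b")$selected.variables  # "b" "c"
```

In the toy data, a and b cannot both be kept at a threshold of 0.5, so whichever comes first in the preference order survives.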

auto_cor(
  x = NULL,
  preference.order = NULL,
  cor.threshold = 0.5,
  verbose = TRUE
)

Arguments

x

A data frame with predictors, or the result of auto_vif(). Default: NULL.

preference.order

Character vector with the names of the variables in x, in the user's order of preference for keeping them. Doesn't need to contain all the variables in x. If not provided, variables in x are prioritised by their column order. Default: NULL.

cor.threshold

Maximum Pearson correlation allowed between any pair of the selected variables. Numeric between 0 and 1, with recommended values between 0.5 and 0.9. Default: 0.5.

verbose

Logical. If TRUE, describes the function operations to the user. Default: TRUE.

Value

List with three slots:

  • cor: correlation matrix of the selected variables.

  • selected.variables: character vector with the names of the selected variables.

  • selected.variables.df: data frame with the selected variables.
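As a quick sanity check on a cor slot, the largest off-diagonal value of the matrix can be compared against cor.threshold. The sketch below uses only base R; max_offdiag_cor is a hypothetical helper (not part of the package), example.cor is a stand-in for out$cor built from toy data, and comparing absolute values is an assumption.

```r
# Hypothetical helper: largest absolute off-diagonal value of a correlation matrix.
max_offdiag_cor <- function(cor.matrix) {
  # a single variable has no pairs to compare
  if (ncol(cor.matrix) < 2) return(0)
  max(abs(cor.matrix[upper.tri(cor.matrix)]))
}

# stand-in for out$cor, built with base R for illustration
example.cor <- stats::cor(data.frame(a = 1:10, c = rep(c(1, -1), 5)))

# TRUE when every pair of variables respects the threshold
max_offdiag_cor(example.cor) <= 0.5
```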

Details

Can be chained with auto_vif() through pipes; see the examples below.

Examples

if(interactive()){

  #load data
  data(plant_richness_df)

  #on a data frame
  out <- auto_cor(x = plant_richness_df[, 5:21])

  #getting the correlation matrix
  out$cor

  #getting the names of the selected variables
  out$selected.variables

  #getting the data frame of selected variables
  out$selected.variables.df

  #on the result of auto_vif
  out <- auto_vif(x = plant_richness_df[, 5:21])
  out <- auto_cor(x = out)

  #with pipes
  out <- plant_richness_df[, 5:21] %>%
    auto_vif() %>%
    auto_cor()

}