Computes the Pearson correlation matrix among a set of predictors, orders it according to a user-defined preference order, and removes variables one by one, respecting that order, until the pairwise correlations among the remaining variables are below a given threshold. Warning: variables in preference.order that are not in colnames(x), and non-numeric columns, are silently removed from x and preference.order. Rows with NA values are also removed (na.omit() is applied). The function issues a warning if zero-variance columns are found.

auto_cor(
  x = NULL,
  preference.order = NULL,
  cor.threshold = 0.5,
  verbose = TRUE
)

Arguments

x

A data frame with predictors, or the result of auto_vif(). Default: NULL.

preference.order

Character vector indicating the user's order of preference to keep variables. Does not need to contain every variable in x. If not provided, variables in x are prioritised by their column order. Default: NULL.

cor.threshold

Numeric between 0 and 1, with recommended values between 0.5 and 0.9. Maximum Pearson correlation between any pair of the selected variables. Default: 0.5.

verbose

Logical. If TRUE, describes the function operations to the user. Default: TRUE.

Value

List with three slots:

  • cor: correlation matrix of the selected variables.

  • selected.variables: character vector with the names of the selected variables.

  • selected.variables.df: data frame with the selected variables.

Details

Can be chained together with auto_vif() through pipes, see the examples below.
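
The selection procedure described at the top of this page can be illustrated with a minimal base R sketch. This is not the package's implementation: greedy_cor_filter() and the toy data are hypothetical names introduced here, the sketch assumes that absolute correlations are compared against the threshold, and it assumes x has no zero-variance columns.

```r
# Hypothetical helper, not part of the package: a greedy correlation filter
# that mimics the behaviour described for auto_cor().
greedy_cor_filter <- function(x, preference.order = NULL, cor.threshold = 0.5) {
  # Keep numeric columns only, then drop rows with missing values.
  x <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  x <- stats::na.omit(x)

  # Ignore preferences that are not columns of x, then append the remaining
  # columns in their original order.
  preference.order <- intersect(preference.order, colnames(x))
  candidates <- c(preference.order, setdiff(colnames(x), preference.order))

  # Absolute Pearson correlations (assumption: the sign is ignored).
  x.cor <- abs(stats::cor(x[, candidates, drop = FALSE], method = "pearson"))

  # Walk the candidates from most to least preferred and keep a variable
  # only if it does not exceed the threshold with any variable already kept.
  selected <- character(0)
  for (v in candidates) {
    if (all(x.cor[v, selected] <= cor.threshold)) {
      selected <- c(selected, v)
    }
  }

  list(
    cor = x.cor[selected, selected, drop = FALSE],
    selected.variables = selected,
    selected.variables.df = x[, selected, drop = FALSE]
  )
}

# Toy data: b is nearly collinear with a, c is only weakly related to both.
a <- 1:8
toy <- data.frame(
  a = a,
  b = a + 0.1 * c(1, 1, -1, -1, 1, 1, -1, -1),
  c = c(1, -1, 1, -1, 1, -1, 1, -1),
  label = "site" # non-numeric column, dropped by the sketch
)

# "zzz" is not a column of toy, so it is ignored; b is preferred over a.
out <- greedy_cor_filter(toy, preference.order = c("b", "zzz"))
out$selected.variables
```

Placing b first in preference.order makes the sketch keep b and drop a; without a preference order the column order decides, so a would be kept instead.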

See also

auto_vif()

Examples

if(interactive()){

 #load data
 data(plant_richness_df)

 #on a data frame
 out <- auto_cor(x = plant_richness_df[, 5:21])

 #getting the correlation matrix
 out$cor

 #getting the names of the selected variables
 out$selected.variables

 #getting the data frame of selected variables
 out$selected.variables.df

 #on the result of auto_vif
 out <- auto_vif(x = plant_richness_df[, 5:21])
 out <- auto_cor(x = out)

 #with pipes
 out <- plant_richness_df[, 5:21] %>%
   auto_vif() %>%
   auto_cor()

}