Computes the correlation matrix among a set of predictors, orders it according to a user-defined preference order, and removes variables one by one, respecting that order, until all pairwise Pearson correlations among the remaining variables are below the given threshold. Warning: variables in preference.order that are not in colnames(x), as well as non-numeric columns, are silently removed from x and preference.order. Rows with NA values are also removed (na.omit() is applied). The function issues a warning if zero-variance columns are found.
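The procedure described above can be sketched in base R. This is a minimal illustration of one possible reading of the description, not the package's actual implementation: the helper name greedy_cor_filter() and the toy data are hypothetical, and the sketch assumes there are no zero-variance columns.

```r
# Hypothetical helper, NOT part of any package: a greedy correlation filter
# written from the description above.
greedy_cor_filter <- function(x, preference.order = colnames(x), cor.threshold = 0.5) {
  # keep numeric columns and complete rows only
  x <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  x <- na.omit(x)
  # preferred variables first, then the remaining ones in column order
  preferred <- intersect(preference.order, colnames(x))
  ordered <- c(preferred, setdiff(colnames(x), preferred))
  # absolute Pearson correlations, arranged by preference
  cor.abs <- abs(cor(x[, ordered, drop = FALSE], method = "pearson"))
  diag(cor.abs) <- 0
  # walk from most to least preferred; keep a variable only if its
  # correlation with every variable kept so far does not exceed the threshold
  selected <- character(0)
  for (v in ordered) {
    if (length(selected) == 0 || max(cor.abs[v, selected]) <= cor.threshold) {
      selected <- c(selected, v)
    }
  }
  list(
    cor = cor(x[, selected, drop = FALSE]),
    selected.variables = selected,
    selected.variables.df = x[, selected, drop = FALSE]
  )
}

# toy data (assumption): b is nearly a copy of a, c is independent of both
set.seed(1)
a <- rnorm(100)
toy <- data.frame(a = a, b = a + rnorm(100, sd = 0.1), c = rnorm(100))
res <- greedy_cor_filter(toy, preference.order = c("b", "a"), cor.threshold = 0.5)
res$selected.variables
```

Because b is ranked ahead of a in the preference order, the sketch keeps b and drops a when the two are highly correlated. Swapping the order would keep a instead.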

auto_cor(
  x = NULL,
  preference.order = NULL,
  cor.threshold = 0.5,
  verbose = TRUE
)

## Arguments

• x: A data frame with predictors, or the result of auto_vif(). Default: NULL.

• preference.order: Character vector indicating the user's order of preference to keep variables. Doesn't need to contain If not provided, variables in x are prioritised by their column order. Default: NULL.

• cor.threshold: Numeric between 0 and 1, with recommended values between 0.5 and 0.9. Maximum Pearson correlation between any pair of the selected variables. Default: 0.5.

• verbose: Logical. If TRUE, describes the function operations to the user. Default: TRUE.
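Since the function removes unusable input silently, it can help to preview what would be dropped before calling it. The following base R sketch is only an illustration under assumptions: the toy data frame, its column names, and the preference vector are made up, and the order in which the function itself applies these filters is not specified here.

```r
# toy data (assumption): one non-numeric column, one NA, one constant column
x <- data.frame(
  temp = c(10.2, 11.5, NA, 9.8, 12.1),
  rain = c(100, 120, 90, 110, 95),
  soil = c("clay", "sand", "clay", "loam", "sand"),
  flat = rep(1, 5)
)
preference.order <- c("rain", "elevation", "soil", "temp")

# which columns are numeric
is.num <- vapply(x, is.numeric, logical(1))
x.num <- x[, is.num, drop = FALSE]

# non-numeric columns, which would be removed
non.numeric <- names(x)[!is.num]

# entries of preference.order that are not columns of x
unknown <- setdiff(preference.order, colnames(x))

# number of rows that na.omit() would discard
rows.with.na <- sum(!complete.cases(x.num))

# zero-variance columns, which would trigger a warning
col.var <- vapply(na.omit(x.num), var, numeric(1))
zero.variance <- names(col.var)[col.var == 0]

# the part of preference.order that can actually be used
usable.order <- intersect(preference.order, names(x.num))
```

Inspecting non.numeric, unknown, rows.with.na, and zero.variance before the call makes the silent removals explicit.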

## Value

List with three slots:

• cor: correlation matrix of the selected variables.

• selected.variables: character vector with the names of the selected variables.

• selected.variables.df: data frame with the selected variables.
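As a rough sanity check on a returned list, one can confirm that the largest off-diagonal absolute correlation respects the threshold and that the three slots describe the same variables. The sketch below builds a mock object by hand with the same slot names, so it runs without the package. The data and the 0.5 threshold are assumptions for illustration.

```r
# mock object with the same three slots, built by hand (assumption)
set.seed(42)
df <- data.frame(u = rnorm(200), v = rnorm(200))
out <- list(
  cor = cor(df),
  selected.variables = colnames(df),
  selected.variables.df = df
)

# largest absolute pairwise correlation, ignoring the diagonal
max.cor <- max(abs(out$cor[upper.tri(out$cor)]))
below.threshold <- max.cor <= 0.5

# the three slots should refer to the same variables, in the same order
same.names <- identical(colnames(out$cor), out$selected.variables) &&
  identical(colnames(out$selected.variables.df), out$selected.variables)
```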

## Details

Can be chained with auto_vif() through pipes; see the examples below.

## See also

auto_vif()

## Examples

if(interactive()){

data(plant_richness_df)

#on a data frame
out <- auto_cor(x = plant_richness_df[, 5:21])

#getting the correlation matrix
out$cor

#getting the names of the selected variables
out$selected.variables

#getting the data frame of selected variables
out$selected.variables.df

#on the result of auto_vif
out <- auto_vif(x = plant_richness_df[, 5:21])
out <- auto_cor(x = out)

#with pipes (%>% is the magrittr pipe operator)
out <- plant_richness_df[, 5:21] %>%
  auto_vif() %>%
  auto_cor()

}