Computes the correlation matrix among a set of predictors, orders it according to a user-defined preference order, and removes variables one by one, following that preference order, until all pairwise Pearson correlations among the remaining variables are below a given threshold. Warning: variables in preference.order that are not in colnames(x), as well as non-numeric columns, are removed silently from x and preference.order. The same happens with rows containing NA values (na.omit() is applied). The function issues a warning if zero-variance columns are found.
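The filtering procedure described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the function name cor_filter_sketch is hypothetical, x is assumed to be a data frame, correlations are compared in absolute value, names in preference.order that are not columns of x are dropped, unlisted columns are appended in their column order, and a variable is kept only if its correlation with every already-kept, higher-priority variable does not exceed cor.threshold. auto_cor() may use a different removal rule and return different selections.

```r
# Hypothetical sketch (not the auto_cor() source): greedy, preference-ordered
# removal of correlated predictors using base R only.
cor_filter_sketch <- function(x, preference.order = NULL, cor.threshold = 0.5) {
  # keep numeric columns and complete rows only
  x <- x[, vapply(x, is.numeric, logical(1)), drop = FALSE]
  x <- na.omit(x)
  # drop zero-variance columns, whose correlations are undefined
  zero.var <- vapply(x, function(column) isTRUE(var(column) == 0), logical(1))
  if (any(zero.var)) {
    warning("zero-variance columns removed: ",
            paste(colnames(x)[zero.var], collapse = ", "))
    x <- x[, !zero.var, drop = FALSE]
  }
  # assumption: unknown names are dropped silently and columns missing
  # from preference.order are appended in their original column order
  preference.order <- preference.order[preference.order %in% colnames(x)]
  preference.order <- c(preference.order, setdiff(colnames(x), preference.order))
  # absolute Pearson correlations (assumption: the sign is ignored)
  abs.cor <- abs(cor(x))
  selected <- character(0)
  for (variable in preference.order) {
    # keep a variable only if it is weakly correlated with all kept ones
    if (length(selected) == 0 ||
        max(abs.cor[variable, selected]) <= cor.threshold) {
      selected <- c(selected, variable)
    }
  }
  list(
    cor = cor(x[, selected, drop = FALSE]),
    selected.variables = selected,
    selected.variables.df = x[, selected, drop = FALSE]
  )
}
```

Walking from the most to the least preferred variable means that whenever a variable is dropped, it is because it conflicts with a kept variable ranked higher in preference.order.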
auto_cor(
x = NULL,
preference.order = NULL,
cor.threshold = 0.5,
verbose = TRUE
)
x: A data frame with predictors, or the result of auto_vif(). Default: NULL.
preference.order: Character vector indicating the user's order of preference to keep variables. Doesn't need to contain all the variables in x. If not provided, variables in x are prioritised by their column order. Default: NULL.
cor.threshold: Numeric between 0 and 1, with recommended values between 0.5 and 0.9. Maximum Pearson correlation allowed between any pair of the selected variables. Default: 0.5.
verbose: Logical. If TRUE, describes the function operations to the user. Default: TRUE.
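Since cor.threshold bounds the correlation between any pair of selected variables, it acts as a condition on the selected set as a whole. The helper below computes the quantity that the threshold bounds, so a selection can be checked, e.g. max_pairwise_cor(out$selected.variables.df) <= 0.5 with out as in the examples. It is a hypothetical helper, not part of the package, and it assumes correlations are compared in absolute value.

```r
# Hypothetical helper: largest absolute Pearson correlation between any two
# distinct columns of a numeric data frame (0 when fewer than two columns).
max_pairwise_cor <- function(df) {
  if (ncol(df) < 2) {
    return(0)
  }
  abs.cor <- abs(cor(df))
  # the upper triangle excludes the diagonal of ones and duplicate pairs
  max(abs.cor[upper.tri(abs.cor)])
}
```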
List with three slots:
cor: correlation matrix of the selected variables.
selected.variables: character vector with the names of the selected variables.
selected.variables.df: data frame with the selected variables.
Can be chained with auto_vif() through pipes; see the examples below.
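Chaining works because x may be either a plain data frame or the output of a selection function, so the input has to be normalised first. A minimal sketch of that step, assuming (as for the value of auto_cor() described above) that the auto_vif() result stores its predictors in a slot named selected.variables.df; the helper name as_predictor_df is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): accept either a data frame
# of predictors or a list carrying one in a selected.variables.df slot.
as_predictor_df <- function(x) {
  if (is.data.frame(x)) {
    return(x)
  }
  # assumption: selection results expose their data in selected.variables.df
  if (is.list(x) && is.data.frame(x[["selected.variables.df"]])) {
    return(x[["selected.variables.df"]])
  }
  stop("x must be a data frame or a list with a selected.variables.df slot")
}
```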
if(interactive()){
  library(spatialRF)
  library(magrittr) # provides the %>% pipe
  # load data
  data(plant_richness_df)
  # on a data frame
  out <- auto_cor(x = plant_richness_df[, 5:21])
  # getting the correlation matrix
  out$cor
  # getting the names of the selected variables
  out$selected.variables
  # getting the data frame of selected variables
  out$selected.variables.df
  # on the result of auto_vif
  out <- auto_vif(x = plant_richness_df[, 5:21])
  out <- auto_cor(x = out)
  # with pipes
  out <- plant_richness_df[, 5:21] %>%
    auto_vif() %>%
    auto_cor()
}