Multicollinearity Filtering

Remove redundant predictors from modelling datasets. These functions filter variables by pairwise correlation, variance inflation factors, or both, while respecting user-defined predictor priorities.

collinear()
Smart multicollinearity management
collinear_select()
Dual multicollinearity filtering algorithm
cor_select()
Multicollinearity filtering by pairwise correlation threshold
step_collinear() prep(<step_collinear>) bake(<step_collinear>)
Tidymodels recipe step for multicollinearity filtering
vif_select()
Multicollinearity filtering by variance inflation factor threshold
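
The idea of filtering by pairwise correlation while respecting predictor priorities can be illustrated with a minimal, language-agnostic sketch. This is not the package's implementation: the function names `pearson` and `filter_by_correlation`, the `max_cor` default, and the toy data are assumptions made for illustration. Predictors are visited in order of preference, and each one is kept only if its absolute correlation with every predictor already kept stays at or below the threshold.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def filter_by_correlation(data, preference_order, max_cor=0.7):
    """Greedy filter: keep a predictor only if it is not too correlated
    with any higher-priority predictor that was already kept.

    data: dict mapping predictor name to a list of numeric values.
    preference_order: predictor names, highest priority first.
    max_cor: hypothetical threshold on absolute pairwise correlation.
    """
    selected = []
    for name in preference_order:
        if all(abs(pearson(data[name], data[kept])) <= max_cor
               for kept in selected):
            selected.append(name)
    return selected


# Toy data: "b" is almost a linear function of "a"; "c" is weakly related.
data = {
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.1, 3.9, 6.2, 8.0, 9.8],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
}

selected = filter_by_correlation(data, ["a", "b", "c"], max_cor=0.7)
```

Changing the preference order changes which member of a redundant pair survives: with `["b", "a", "c"]` the sketch keeps `b` and drops `a` instead.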

Multicollinearity Assessment

Quantify redundancy among predictors. Compute pairwise correlations, variance inflation factors, and summary statistics for datasets with numeric and categorical variables.

collinear_stats()
Compute summary statistics for correlation and VIF
cor_clusters()
Group predictors by hierarchical correlation clustering
cor_cramer()
Quantify association between categorical variables
cor_df()
Compute signed pairwise correlations dataframe
cor_matrix()
Signed pairwise correlation matrix
cor_stats()
Compute summary statistics for absolute pairwise correlations
vif()
Compute variance inflation factors from a correlation matrix
vif_df()
Compute variance inflation factors dataframe
vif_stats()
Compute summary statistics for variance inflation factors
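
Variance inflation factors can be read off a correlation matrix because the VIF of predictor j equals the j-th diagonal element of the inverse correlation matrix. The sketch below shows that identity with a small Gauss-Jordan inversion in plain Python. It is an illustration under stated assumptions, not the package's code: `invert` and `vif_from_correlation` are hypothetical names, and the input is assumed to be a square, non-singular correlation matrix.

```python
def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular: predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def vif_from_correlation(cor):
    """Return one VIF per predictor: the diagonal of the inverse correlation matrix."""
    inverse = invert(cor)
    return [inverse[i][i] for i in range(len(cor))]


# Two predictors with correlation 0.8.
vifs = vif_from_correlation([[1.0, 0.8], [0.8, 1.0]])
```

For two predictors the result reduces to 1 / (1 - r^2), which gives a quick sanity check on any implementation.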

Predictor Ranking

Prioritize predictors for multicollinearity filtering. Rank variables by their association with a response or by their redundancy with other predictors. Supports cross-validation and multiple response types.

f_binomial_gam()
Area Under the Curve of Binomial GAM predictions vs. observations
f_binomial_glm()
Area Under the Curve of Binomial GLM predictions vs. observations
f_binomial_rf()
Area Under the Curve of Binomial Random Forest predictions vs. observations
f_categorical_rf()
Cramer's V of Categorical Random Forest predictions vs. observations
f_count_gam()
R-squared of Poisson GAM predictions vs. observations
f_count_glm()
R-squared of Poisson GLM predictions vs. observations
f_count_rf()
R-squared of Random Forest predictions vs. observations
f_numeric_gam()
R-squared of Gaussian GAM predictions vs. observations
f_numeric_glm()
R-squared of Gaussian GLM predictions vs. observations
f_numeric_rf()
R-squared of Random Forest predictions vs. observations
preference_order()
Rank predictors by importance or multicollinearity
f_auto()
Automatic selection of predictor scoring method
f_auto_rules()
Decision rules for f_auto()
f_functions()
List predictor scoring functions

Target Encoding

Convert categorical predictors to numeric using response values. Implements mean, leave-one-out, and rank encoding methods for seamless integration of categorical variables in correlation and VIF analyses.

target_encoding_lab()
Convert categorical predictors to numeric via target encoding
target_encoding_loo() target_encoding_mean() target_encoding_rank()
Encode categories as response means
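
The three encoding ideas named above (mean, leave-one-out, and rank) can be sketched in a few lines. This is a generic illustration rather than the package's implementation: the function names are hypothetical, and two details are assumptions of the sketch, namely that leave-one-out falls back to the global response mean for categories with a single observation, and that ranks start at 1 for the category with the lowest mean response.

```python
from collections import defaultdict


def _group_stats(categories, response):
    """Return per-category sums and counts of the response."""
    sums, counts = defaultdict(float), defaultdict(int)
    for c, y in zip(categories, response):
        sums[c] += y
        counts[c] += 1
    return sums, counts


def encode_mean(categories, response):
    """Replace each category with the mean response of that category."""
    sums, counts = _group_stats(categories, response)
    return [sums[c] / counts[c] for c in categories]


def encode_loo(categories, response):
    """Replace each category with the mean response of the other rows
    in the same category (leave-one-out)."""
    sums, counts = _group_stats(categories, response)
    global_mean = sum(response) / len(response)
    encoded = []
    for c, y in zip(categories, response):
        if counts[c] > 1:
            encoded.append((sums[c] - y) / (counts[c] - 1))
        else:
            # Assumption of this sketch: singletons get the global mean.
            encoded.append(global_mean)
    return encoded


def encode_rank(categories, response):
    """Replace each category with its rank when categories are sorted
    by mean response (1 = lowest mean)."""
    sums, counts = _group_stats(categories, response)
    ordered = sorted(sums, key=lambda c: sums[c] / counts[c])
    rank = {c: i + 1 for i, c in enumerate(ordered)}
    return [rank[c] for c in categories]


categories = ["x", "x", "y", "y", "y"]
response = [1.0, 3.0, 10.0, 20.0, 30.0]

mean_encoded = encode_mean(categories, response)
loo_encoded = encode_loo(categories, response)
rank_encoded = encode_rank(categories, response)
```

Leave-one-out encoding excludes each row's own response from its encoded value, which limits the leakage of the response into the predictor that plain mean encoding introduces.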

Example Data

Sample datasets for exploring package functionality. Includes dataframes with numeric, categorical, and mixed predictor types, plus multiple response encodings.

toy
Toy dataframe with varying levels of multicollinearity
vi
Large example dataframe
vi_predictors
Vector of all predictor names in vi and vi_smol
vi_predictors_categorical
Vector of categorical predictor names in vi and vi_smol
vi_predictors_numeric
Vector of numeric predictor names in vi and vi_smol
vi_responses
Vector of response names in vi and vi_smol
vi_smol
Small example dataframe

Validation Experiments

Results from simulation studies used to calibrate adaptive thresholds and validate the equivalence between correlation and VIF filtering.

experiment_adaptive_thresholds
Dataframe with results of an experiment testing the automatic selection of multicollinearity thresholds
experiment_cor_vs_vif
Dataframe with results of experiment comparing correlation and VIF thresholds
gam_cor_to_vif
GAM describing the relationship between correlation and VIF thresholds
prediction_cor_to_vif
Prediction of the model gam_cor_to_vif across correlation values

S3 Methods

S3 methods for displaying and summarizing results from collinear() and related functions.

print(<collinear_output>)
Print all selection results of collinear()
print(<collinear_selection>)
Print selection results for a single response from collinear()
summary(<collinear_output>)
Summarize all results of collinear()
summary(<collinear_selection>)
Summarize selection results for a single response from collinear()

Variable Type Detection

Identify and classify variables by type. Detect numeric, categorical, logical, and near-zero variance columns in modelling datasets.

identify_categorical_variables()
Find valid categorical variables in a dataframe
identify_logical_variables()
Find logical variables in a dataframe
identify_numeric_variables()
Find valid numeric variables in a dataframe
identify_response_type()
Detect response variable type for model selection
identify_valid_variables()
Find valid numeric, categorical, and logical variables in a dataframe
identify_zero_variance_variables()
Find near-zero variance variables in a dataframe

Modelling Utilities

Helper functions for model fitting and evaluation. Generate formulas, compute performance metrics, and create class-balancing weights.

case_weights()
Generate sample weights for imbalanced responses
model_formula()
Build model formulas from response and predictors
score_auc()
Compute area under the ROC curve between binomial observations and probabilistic predictions
score_cramer()
Compute Cramer's V between categorical observations and predictions
score_r2()
Compute R-squared between numeric observations and predictions

Input Validation

Internal functions for checking and preparing function arguments. Ensure data frames, variable names, and parameters meet requirements.

drop_geometry_column()
Remove geometry column from sf dataframes
validate_arg_df()
Check and prepare argument df
validate_arg_df_not_null()
Ensure that argument df is not NULL
validate_arg_encoding_method()
Check and validate argument encoding_method
validate_arg_f()
Check and validate argument f
validate_arg_function_name()
Build hierarchical function names for messages
validate_arg_max_cor()
Check and constrain argument max_cor
validate_arg_max_vif()
Check and constrain argument max_vif
validate_arg_predictors()
Check and validate argument predictors
validate_arg_preference_order()
Check and complete argument preference_order
validate_arg_quiet()
Check and validate argument quiet
validate_arg_responses()
Check and validate arguments response and responses