Automated Multicollinearity Management

Tools to automatically select sets of variables with low multicollinearity.

collinear()
Automated multicollinearity management

Variance Inflation Factors

Functions implementing VIF-based methods for multicollinearity filtering.

vif_df()
Variance Inflation Factor
vif_select()
Automated Multicollinearity Filtering with Variance Inflation Factors
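
As a rough illustration of the idea behind VIF-based filtering, the sketch below computes variance inflation factors as the diagonal of the inverse of the predictors' correlation matrix, then repeatedly drops the predictor with the highest VIF until all remaining values are at or below a threshold. It is a minimal Python sketch and not the package's R implementation: the names `pearson`, `invert`, `vif`, and `vif_filter`, and the `max_vif` default of 5, are hypothetical, and the sketch ignores any preference order.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("zero-variance column")
    return sxy / sqrt(sxx * syy)


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity; the right half becomes the inverse.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: perfect collinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Return {name: VIF} for a dict mapping predictor names to numeric columns."""
    names = list(columns)
    corr = [[1.0 if a == b else pearson(columns[a], columns[b]) for b in names]
            for a in names]
    inverse = invert(corr)
    return {name: inverse[i][i] for i, name in enumerate(names)}


def vif_filter(columns, max_vif=5.0):
    """Drop the highest-VIF predictor until every VIF is <= max_vif (assumed rule)."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= max_vif:
            break
        del kept[worst]
    return list(kept)
```

With only two predictors the VIF reduces to 1 / (1 - r²), where r is their Pearson correlation, which gives a quick sanity check for the matrix-based computation.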

Pairwise Correlation

Functions implementing pairwise correlation-based methods for multicollinearity filtering.

cor_clusters()
Hierarchical Clustering from a Pairwise Correlation Matrix
cor_cramer_v()
Bias-Corrected Cramer's V
cor_df() cor_numeric_vs_numeric() cor_numeric_vs_categorical() cor_categorical_vs_categorical()
Pairwise Correlation Data Frame
cor_matrix()
Pairwise Correlation Matrix
cor_select()
Automated Multicollinearity Filtering with Pairwise Correlations
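
For readers unfamiliar with the bias-corrected Cramer's V used for categorical pairs, here is a minimal Python sketch of one widely used correction (the one usually attributed to Bergsma, 2013). Whether cor_cramer_v() applies exactly this formula is an assumption, and the function name `cramers_v_corrected` is hypothetical.

```python
from collections import Counter
from math import sqrt


def cramers_v_corrected(x, y):
    """Bias-corrected Cramer's V between two equal-length categorical sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 cases")

    joint = Counter(zip(x, y))   # observed counts per (level of x, level of y)
    row_totals = Counter(x)
    col_totals = Counter(y)

    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Subtract the expected value of phi2 under independence, floored at zero.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    # Shrink the table dimensions by the same small-sample correction.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    # A variable with a single level carries no association: return 0.
    return sqrt(phi2_corr / denom) if denom > 0 else 0.0
```

The result lies between 0 (no association) and 1 (perfect association), so it can be compared against the same kind of threshold as an absolute Pearson correlation.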

Target Encoding

Tools to transform categorical variables into numeric ones.

target_encoding_lab()
Target Encoding Lab: Transform Categorical Variables to Numeric
target_encoding_mean() target_encoding_rank() target_encoding_loo()
Target Encoding Methods
add_white_noise()
Add White Noise to Encoded Predictor
encoded_predictor_name()
Name of Target-Encoded Predictor
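
To make the difference between mean and leave-one-out target encoding concrete, the following Python sketch encodes a categorical column against a numeric response. It is an illustration under stated assumptions rather than the package's code: the names `encode_mean` and `encode_loo` are hypothetical, and the fallback to the global mean for single-case groups is an assumption of this sketch.

```python
from collections import defaultdict


def group_sums(categories, response):
    """Return per-category sums and counts of the response."""
    sums, counts = defaultdict(float), defaultdict(int)
    for cat, value in zip(categories, response):
        sums[cat] += value
        counts[cat] += 1
    return sums, counts


def encode_mean(categories, response):
    """Replace each category with the mean response of its group."""
    sums, counts = group_sums(categories, response)
    return [sums[cat] / counts[cat] for cat in categories]


def encode_loo(categories, response):
    """Replace each category with the group mean computed without the current case."""
    sums, counts = group_sums(categories, response)
    global_mean = sum(response) / len(response)
    encoded = []
    for cat, value in zip(categories, response):
        if counts[cat] > 1:
            encoded.append((sums[cat] - value) / (counts[cat] - 1))
        else:
            # Assumption: a group with a single case falls back to the global mean.
            encoded.append(global_mean)
    return encoded
```

Leaving the current case out means a row's own response never contributes to its encoded value, which reduces leakage of the response into the predictor compared with the plain group mean.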

Preference Order

Rank predictors by their association with a response to preserve important ones during multicollinearity filtering.

preference_order()
Quantitative Variable Prioritization for Multicollinearity Filtering
f_auto()
Select Function to Compute Preference Order
f_auto_rules()
Rules to Select Default f Argument to Compute Preference Order
f_functions()
Data Frame of Preference Functions
preference_order_collinear()
Preference Order Argument in collinear()
f_auc_glm_binomial() f_auc_glm_binomial_poly2() f_auc_gam_binomial() f_auc_rpart() f_auc_rf()
Association Between a Binomial Response and a Continuous Predictor
f_r2_pearson() f_r2_spearman() f_r2_glm_gaussian() f_r2_glm_gaussian_poly2() f_r2_gam_gaussian() f_r2_rpart() f_r2_rf()
Association Between a Continuous Response and a Continuous Predictor
f_r2_glm_poisson() f_r2_glm_poisson_poly2() f_r2_gam_poisson()
Association Between a Count Response and a Continuous Predictor
f_v()
Association Between a Categorical Response and a Categorical Predictor
f_v_rf_categorical()
Association Between a Categorical Response and a Categorical or Numeric Predictor

Modelling Tools

Tools to evaluate models, weight cases, and generate model formulas.

case_weights()
Case Weights for Unbalanced Binomial or Categorical Responses
model_formula()
Generate Model Formulas
performance_score_auc()
Area Under the Curve of Binomial Observations vs Probabilistic Model Predictions
performance_score_r2()
Pearson's R-squared of Observations vs Predictions
performance_score_v()
Cramer's V of Observations vs Predictions

Example Data

Real and synthetic datasets used throughout the package examples.

toy
One Response and Four Predictors With Varying Levels of Multicollinearity
vi
Example Data With Different Response and Predictor Types
vi_predictors
All Predictor Names in Example Data Frame vi
vi_predictors_categorical
All Categorical and Factor Predictor Names in Example Data Frame vi
vi_predictors_numeric
All Numeric Predictor Names in Example Data Frame vi

Data Preparation

Internal functions for data preparation and validation.

validate_data_cor()
Validate Data for Correlation Analysis
validate_data_vif()
Validate Data for VIF Analysis
validate_df()
Validate Argument df
validate_encoding_arguments()
Validate Arguments of target_encoding_lab()
validate_predictors()
Validate Argument predictors
validate_preference_order()
Validate Argument preference_order
validate_response()
Validate Argument response
identify_predictors()
Identify Numeric and Categorical Predictors
identify_predictors_categorical()
Identify Valid Categorical Predictors
identify_predictors_numeric()
Identify Valid Numeric Predictors
identify_predictors_type()
Identify Predictor Types
identify_predictors_zero_variance()
Identify Zero and Near-Zero Variance Predictors
identify_response_type()
Identify Response Type
drop_geometry_column()
Remove Geometry Column From sf Data Frames