
Package index
Multicollinearity Filtering
Remove redundant predictors from modelling datasets. These functions filter variables by pairwise correlation, variance inflation factors, or both, while respecting user-defined predictor priorities.
-
collinear() - Smart multicollinearity management
-
collinear_select() - Dual multicollinearity filtering algorithm
-
cor_select() - Multicollinearity filtering by pairwise correlation threshold
-
step_collinear() prep(<step_collinear>) bake(<step_collinear>) - Tidymodels recipe step for multicollinearity filtering
-
vif_select() - Multicollinearity filtering by variance inflation factor threshold
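As an illustration of filtering by pairwise correlation while respecting predictor priorities, here is a minimal base R sketch. The helper `greedy_cor_filter()` is hypothetical and is not part of the package. Its argument names only echo `max_cor` and `preference_order` from this index, and the default threshold of 0.7 is an arbitrary choice for the example. The package's own algorithm may differ.

```r
# Hypothetical helper for illustration only; not the package implementation.
# Walks the predictors in priority order and keeps a candidate only if its
# absolute correlation with every already-selected predictor is <= max_cor.
greedy_cor_filter <- function(df, preference_order, max_cor = 0.7) {
  r <- abs(stats::cor(df[, preference_order, drop = FALSE]))
  selected <- preference_order[1]
  for (candidate in preference_order[-1]) {
    if (all(r[candidate, selected] <= max_cor)) {
      selected <- c(selected, candidate)
    }
  }
  selected
}

# Simulated data: x2 is nearly a copy of x1; x3 is independent of both.
set.seed(1)
n <- 200
x1 <- rnorm(n)
sim <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)

# The higher-priority member of the correlated pair is the one retained.
keep_x1_first <- greedy_cor_filter(sim, c("x1", "x2", "x3"))
keep_x2_first <- greedy_cor_filter(sim, c("x2", "x1", "x3"))
```

Changing the priority order changes which member of a correlated pair survives, which is why a ranking step usually precedes filtering.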
Multicollinearity Assessment
Quantify redundancy among predictors. Compute pairwise correlations, variance inflation factors, and summary statistics for datasets with numeric and categorical variables.
-
collinear_stats() - Compute summary statistics for correlation and VIF
-
cor_clusters() - Group predictors by hierarchical correlation clustering
-
cor_cramer() - Quantify association between categorical variables
-
cor_df() - Compute signed pairwise correlations dataframe
-
cor_matrix() - Signed pairwise correlation matrix
-
cor_stats() - Compute summary statistics for absolute pairwise correlations
-
vif() - Compute variance inflation factors from a correlation matrix
-
vif_df() - Compute variance inflation factors dataframe
-
vif_stats() - VIF Statistics
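To make the link between correlation matrices and variance inflation factors concrete, the following base R sketch computes VIFs as the diagonal of the inverse correlation matrix and cross-checks one value against the definition 1 / (1 - R²). It uses simulated data and base R only. It does not call any function from this package, whose internals may differ.

```r
# Simulated predictors: x2 is strongly related to x1; x3 is independent.
set.seed(1)
n <- 200
x1 <- rnorm(n)
x <- data.frame(
  x1 = x1,
  x2 = x1 + rnorm(n, sd = 0.3),
  x3 = rnorm(n)
)

# Pairwise Pearson correlation matrix of the predictors.
r <- stats::cor(x)

# The VIF of each predictor is the matching diagonal element
# of the inverse correlation matrix.
vif_from_cor <- diag(solve(r))

# Cross-check for x1: regress it on the other predictors
# and apply VIF = 1 / (1 - R^2).
r2_x1 <- summary(stats::lm(x1 ~ x2 + x3, data = x))$r.squared
vif_x1 <- 1 / (1 - r2_x1)
```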
Predictor Ranking
Prioritize predictors for multicollinearity filtering. Rank variables by their association with a response or by their redundancy with other predictors. Supports cross-validation and multiple response types.
-
f_binomial_gam() - Area under the curve of binomial GAM predictions vs. observations
-
f_binomial_glm() - Area under the curve of binomial GLM predictions vs. observations
-
f_binomial_rf() - Area under the curve of binomial Random Forest predictions vs. observations
-
f_categorical_rf() - Cramer's V of Categorical Random Forest predictions vs. observations
-
f_count_gam() - R-squared of Poisson GAM predictions vs. observations
-
f_count_glm() - R-squared of Poisson GLM predictions vs. observations
-
f_count_rf() - R-squared of Random Forest predictions vs. observations
-
f_numeric_gam() - R-squared of Gaussian GAM predictions vs. observations
-
f_numeric_glm() - R-squared of Gaussian GLM predictions vs. observations
-
f_numeric_rf() - R-squared of Random Forest predictions vs. observations
-
preference_order() - Rank predictors by importance or multicollinearity
-
f_auto() - Automatic selection of predictor scoring method
-
f_auto_rules() - Decision rules for f_auto()
-
f_functions() - List predictor scoring functions
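The idea of ranking predictors by their association with a response can be sketched in base R as follows. This example scores each predictor with the R-squared of a univariate Gaussian linear model on simulated data. It is an assumption-laden illustration, not the package's scoring code, and the variable names are invented for the example.

```r
# Simulated response driven strongly by x1, weakly by x2, and not by x3.
set.seed(1)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 3 * sim$x1 + 1.5 * sim$x2 + rnorm(n)

predictors <- c("x1", "x2", "x3")

# Score each predictor by the R-squared of a univariate linear model.
scores <- vapply(
  predictors,
  function(p) {
    m <- stats::lm(stats::reformulate(p, response = "y"), data = sim)
    summary(m)$r.squared
  },
  numeric(1)
)

# Predictors ordered from highest to lowest score.
ranking <- names(sort(scores, decreasing = TRUE))
```

A ranking like this could then serve as the priority order for a filtering step, so that the predictors most associated with the response are the last to be dropped.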
Target Encoding
Convert categorical predictors to numeric using response values. Implements mean, leave-one-out, and rank encoding methods for seamless integration of categorical variables in correlation and VIF analyses.
-
target_encoding_lab() - Convert categorical predictors to numeric via target encoding
-
target_encoding_loo() target_encoding_mean() target_encoding_rank() - Encode categories as response means
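For readers new to target encoding, here is a small base R sketch of the mean and leave-one-out variants on a toy dataframe. The data and variable names are made up for the example, and the code shows the general technique rather than how this package implements it.

```r
# Toy data: numeric response y and categorical predictor g.
toy_df <- data.frame(
  y = c(1, 2, 3, 10, 20),
  g = c("a", "a", "a", "b", "b")
)

# Mean encoding: each category is replaced by the mean response of its group.
g_mean <- stats::ave(toy_df$y, toy_df$g, FUN = mean)

# Leave-one-out encoding: the group mean computed without the current row,
# which reduces leakage of each row's own response into its encoding.
g_sum <- stats::ave(toy_df$y, toy_df$g, FUN = sum)
g_n <- stats::ave(toy_df$y, toy_df$g, FUN = length)
g_loo <- (g_sum - toy_df$y) / (g_n - 1)
```

Note that this naive leave-one-out formula divides by zero for categories with a single row, so such groups need separate handling in practice.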
Example Data
Sample datasets for exploring package functionality. Includes dataframes with numeric, categorical, and mixed predictor types, plus multiple response encodings.
-
toy - Toy dataframe with varying levels of multicollinearity
-
vi - Large example dataframe
-
vi_predictors - Vector of all predictor names in vi and vi_smol
-
vi_predictors_categorical - Vector of categorical predictors in vi and vi_smol
-
vi_predictors_numeric - Vector of numeric predictor names in vi and vi_smol
-
vi_responses - Vector of response names in vi and vi_smol
-
vi_smol - Small example dataframe
Validation Experiments
Results from simulation studies used to calibrate adaptive thresholds and validate the equivalence between correlation and VIF filtering.
-
experiment_adaptive_thresholds - Dataframe resulting from experiment to test the automatic selection of multicollinearity thresholds
-
experiment_cor_vs_vif - Dataframe with results of experiment comparing correlation and VIF thresholds
-
gam_cor_to_vif - GAM describing the relationship between correlation and VIF thresholds
-
prediction_cor_to_vif - Prediction of the model gam_cor_to_vif across correlation values
Print and Summary Methods
S3 methods for displaying and summarizing results from collinear() and related functions.
-
print(<collinear_output>) - Print all collinear selection results of collinear()
-
print(<collinear_selection>) - Print single selection results from collinear
-
summary(<collinear_output>) - Summarize all results of collinear()
-
summary(<collinear_selection>) - Summarize single response selection results of collinear
Variable Type Detection
Identify and classify variables by type. Detect numeric, categorical, logical, and near-zero variance columns in modelling datasets.
-
identify_categorical_variables() - Find valid categorical variables in a dataframe
-
identify_logical_variables() - Find logical variables in a dataframe
-
identify_numeric_variables() - Find valid numeric variables in a dataframe
-
identify_response_type() - Detect response variable type for model selection
-
identify_valid_variables() - Find valid numeric, categorical, and logical variables in a dataframe
-
identify_zero_variance_variables() - Find near-zero variance variables in a dataframe
Modelling Utilities
Helper functions for model fitting and evaluation. Generate formulas, compute performance metrics, and create class-balancing weights.
-
case_weights() - Generate sample weights for imbalanced responses
-
model_formula() - Build model formulas from response and predictors
-
score_auc() - Compute area under the ROC curve between binomial observations and probabilistic predictions
-
score_cramer() - Compute Cramer's V between categorical observations and predictions
-
score_r2() - Compute R-squared between numeric observations and predictions
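The performance metrics listed above can be illustrated with base R alone. The sketch below computes the area under the ROC curve through the rank-sum (Mann-Whitney) identity and an R-squared as the squared Pearson correlation between observations and predictions. Both are common definitions chosen here as assumptions. The package functions may compute these quantities differently.

```r
# Binomial observations and probabilistic predictions.
obs_bin <- c(0, 0, 1, 1)
pred_prob <- c(0.1, 0.4, 0.35, 0.8)

# AUC via the rank-sum identity: the share of (positive, negative) pairs
# in which the positive case receives the higher prediction.
ranks <- rank(pred_prob)
n_pos <- sum(obs_bin == 1)
n_neg <- sum(obs_bin == 0)
auc <- (sum(ranks[obs_bin == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# R-squared as the squared Pearson correlation between numeric
# observations and predictions (one of several definitions in use).
obs_num <- c(1, 2, 3, 4, 5)
pred_num <- c(1.1, 1.9, 3.2, 3.8, 5.1)
r2 <- stats::cor(obs_num, pred_num)^2
```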
Input Validation
Internal functions for checking and preparing function arguments. Ensure data frames, variable names, and parameters meet requirements.
-
drop_geometry_column() - Removes geometry column from sf dataframes
-
validate_arg_df() - Check and prepare argument df
-
validate_arg_df_not_null() - Ensure that argument df is not NULL
-
validate_arg_encoding_method() - Check and validate argument encoding_method
-
validate_arg_f() - Check and validate argument f
-
validate_arg_function_name() - Build hierarchical function names for messages
-
validate_arg_max_cor() - Check and constrain argument max_cor
-
validate_arg_max_vif() - Check and constrain argument max_vif
-
validate_arg_predictors() - Check and validate argument predictors
-
validate_arg_preference_order() - Check and complete argument preference_order
-
validate_arg_quiet() - Check and validate argument quiet
-
validate_arg_responses() - Check and validate arguments response and responses