
Dataframe with results of experiment comparing correlation and VIF thresholds
Source: R/data.R
experiment_cor_vs_vif.Rd
A dataframe summarizing 10,000 experiments comparing the output of cor_select() and vif_select(). Each row records the input sampling parameters and the resulting feature-selection metrics.
Usage
data(experiment_cor_vs_vif)
Format
A dataframe with 10,000 rows and 6 variables:
- input_rows
Number of rows in the input data subset.
- input_predictors
Number of predictors in the input data subset.
- output_predictors
Number of predictors selected by vif_select() at the best-matching max_vif.
- max_cor
Maximum allowed pairwise correlation supplied to cor_select().
- max_vif
VIF threshold at which vif_select() produced the highest Jaccard similarity with cor_select() for the given max_cor.
- out_selection_jaccard
Jaccard similarity between the predictors selected by cor_select() and vif_select().
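The out_selection_jaccard column is a Jaccard similarity between two sets of predictor names: the size of their intersection divided by the size of their union. A minimal base R sketch of that measure follows; the helper name jaccard() and the predictor names are hypothetical and used only for illustration, and the package may compute the value differently.

```r
# Hypothetical helper (not part of the package): Jaccard similarity
# between two sets of predictor names.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  # Two empty selections have no defined similarity.
  if (union_size == 0) {
    return(NA_real_)
  }
  length(intersect(a, b)) / union_size
}

# Made-up selections, for illustration only.
selected_by_cor <- c("a", "b", "c", "d")
selected_by_vif <- c("a", "e")

# One shared predictor out of five distinct ones.
jaccard(selected_by_cor, selected_by_vif)
#> [1] 0.2
```

A value of 1 means both functions selected exactly the same predictors; a value of 0 means the selections share none.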
Details
The source data is a synthetic dataframe with 500 columns and 10,000 rows generated using distantia::zoo_simulate() with correlated time series (independent = FALSE).
Each iteration randomly subsets 10 to 50 predictors and 30 to 100 rows per predictor, applies cor_select() with a random max_cor threshold, then finds the max_vif value at which the vif_select() selection has the highest Jaccard similarity with the cor_select() selection.
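The threshold search described above can be sketched as a grid search over candidate max_vif values. This is an illustration under stated assumptions, not the code that produced the dataset: toy_vif_select() is a hypothetical stand-in that simply thresholds made-up VIF scores (the real vif_select() is not called here), and the candidate grid and the tie-breaking rule are assumptions.

```r
# Jaccard similarity between two sets of names (hypothetical helper).
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Made-up VIF scores and a toy stand-in selector: keep predictors whose
# score does not exceed max_vif. This is NOT the package's vif_select().
toy_vif_scores <- c(a = 1.0, b = 1.4, c = 2.5, d = 6.0, e = 9.0)
toy_vif_select <- function(max_vif) {
  names(toy_vif_scores)[toy_vif_scores <= max_vif]
}

# Selection assumed to come from a correlation-based filter.
selected_by_cor <- c("a", "b", "c")

# Assumed grid of candidate thresholds.
candidates <- seq(1, 10, by = 0.5)

# Similarity between the two selections at every candidate threshold.
similarity <- vapply(
  candidates,
  function(v) jaccard(selected_by_cor, toy_vif_select(v)),
  numeric(1)
)

# which.max() returns the first maximum, so ties resolve to the
# smallest threshold (an assumption of this sketch).
best <- which.max(similarity)
best_max_vif <- candidates[best]
best_jaccard <- similarity[best]
```

With these toy inputs, best_max_vif and best_jaccard play the roles of the max_vif and out_selection_jaccard columns for a single iteration.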
See also
Other experiments:
experiment_adaptive_thresholds,
gam_cor_to_vif,
prediction_cor_to_vif
Examples
data(experiment_cor_vs_vif)
str(experiment_cor_vs_vif)
#> 'data.frame': 10000 obs. of 6 variables:
#> $ input_rows : num 1764 2250 1960 2392 1488 ...
#> $ input_predictors : num 42 45 40 26 48 47 49 40 43 40 ...
#> $ output_predictors : int 2 2 2 2 5 6 6 5 5 6 ...
#> $ max_cor : num 0.1 0.1 0.12 0.14 0.14 0.12 0.1 0.13 0.14 0.13 ...
#> $ max_vif : num 1 1 1 1 1.2 1.2 1.3 1.1 1.1 1.2 ...
#> $ out_selection_jaccard: num 0.25 0.25 0.25 0.25 0.286 ...
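As a quick sanity check on the sampling design, the ratio of input_rows to input_predictors should fall within the 30 to 100 rows per predictor stated in Details. The sketch below uses only the first five values of each column as printed in the str() output, so it runs without the package installed; the commented line shows how the same check might be applied to the full dataset.

```r
# First five values of each column, copied from the str() output.
input_rows <- c(1764, 2250, 1960, 2392, 1488)
input_predictors <- c(42, 45, 40, 26, 48)

# Rows sampled per predictor in each experiment.
rows_per_predictor <- input_rows / input_predictors
rows_per_predictor
#> [1] 42 50 49 92 31

# All values fall within the documented 30 to 100 range.
all(rows_per_predictor >= 30 & rows_per_predictor <= 100)
#> [1] TRUE

# With the dataset loaded, the same check over all rows might look like:
# with(experiment_cor_vs_vif, range(input_rows / input_predictors))
```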