Computes the Variance Inflation Factor of numeric variables in a data frame.
This function computes the VIF (see section Variance Inflation Factors below) in two steps:
- Applies base::solve() to obtain the precision matrix, which is the inverse of the correlation matrix between all variables in predictors.
- Uses base::diag() to extract the diagonal of the precision matrix. Its j-th element equals \(1/(1-R_j^2)\), where \(R_j^2\) is the R-squared of the regression of predictor j on all other predictors, and is therefore the VIF of predictor j.
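The two steps can be sketched in base R on simulated data. The diagonal of the inverse equals the VIF when the matrix being inverted is the correlation matrix (equivalently, the covariance matrix of the standardized predictors), so the sketch assumes that. The predictor names pred_a, pred_b, and pred_c and the simulated values are hypothetical and not part of the package.

```r
# Sketch of the two-step VIF computation on simulated data (base R only).
# Assumption: the matrix inverted in step 1 is the correlation matrix.
set.seed(1)
n <- 200

# Hypothetical predictors: pred_a is built from pred_b and pred_c, so it is collinear with them
pred_b <- rnorm(n)
pred_c <- rnorm(n)
pred_a <- 0.8 * pred_b + 0.5 * pred_c + rnorm(n, sd = 0.5)
predictors_df <- data.frame(pred_a, pred_b, pred_c)

# Step 1: precision matrix = inverse of the correlation matrix
precision <- solve(stats::cor(predictors_df))

# Step 2: its diagonal holds one VIF per predictor, named after the columns
vif <- diag(precision)
print(round(vif, 3))
```

Inverting the correlation matrix once yields the VIF of every predictor, instead of fitting one auxiliary regression per predictor.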
Arguments
- df
(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.
- predictors
(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. If argument response is not provided, non-numeric variables are ignored. Default: NULL
- quiet
(optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE
Variance Inflation Factors
The Variance Inflation Factor of a given variable \(a\) is computed as \(1/(1-R^2)\), where \(R^2\) is the multiple R-squared of a linear model fitted with \(a\) as response and all other predictors in the input data frame as predictors, as in \(a = b + c + ...\).
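The regression-based definition can be applied directly and compared against the matrix shortcut. In this sketch, vif_by_regression() is a hypothetical helper written for illustration (it is not a function of the package), and the data are simulated with hypothetical names.

```r
# Hypothetical helper (not part of the package): applies 1 / (1 - R^2)
# one variable at a time, regressing it on all the other columns of df.
vif_by_regression <- function(df) {
  vapply(names(df), function(response) {
    model_formula <- stats::reformulate(setdiff(names(df), response), response = response)
    r2 <- summary(stats::lm(model_formula, data = df))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Simulated, hypothetical predictors
set.seed(1)
n <- 200
pred_b <- rnorm(n)
pred_c <- rnorm(n)
pred_a <- 0.8 * pred_b + 0.5 * pred_c + rnorm(n, sd = 0.5)
predictors_df <- data.frame(pred_a, pred_b, pred_c)

# Definition-based VIF vs. diagonal of the inverse correlation matrix
vif_reg <- vif_by_regression(predictors_df)
vif_mat <- diag(solve(stats::cor(predictors_df)))
print(round(cbind(regression = vif_reg, matrix = vif_mat), 3))
```

Both columns agree up to floating-point error, because the j-th diagonal element of the inverse correlation matrix is exactly \(1/(1-R_j^2)\) for a least-squares fit with an intercept.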
The square root of the VIF of \(a\) is the factor by which multicollinearity inflates the standard error (and therefore the width of the confidence interval) of the coefficient of \(a\) in the linear model \(y = a + b + c + ...\), relative to a model in which \(a\) is uncorrelated with the other predictors.
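This can be checked numerically with the ordinary least squares identity \(SE(\hat\beta_a) = \hat\sigma / \sqrt{SST_a (1 - R_a^2)}\), where \(SST_a\) is the sum of squared deviations of \(a\) from its mean. The response y, the predictor names, and the data below are simulated and hypothetical; the sketch compares the observed standard error with the one \(a\) would have if it were uncorrelated with the other predictors (same residual standard error, same variance of \(a\)).

```r
# Simulated, hypothetical data: pred_a is collinear with pred_b and pred_c
set.seed(1)
n <- 200
pred_b <- rnorm(n)
pred_c <- rnorm(n)
pred_a <- 0.8 * pred_b + 0.5 * pred_c + rnorm(n, sd = 0.5)
y <- 1 + 2 * pred_a - pred_b + 0.5 * pred_c + rnorm(n)

# Standard error of the coefficient of pred_a in the full model
fit <- stats::lm(y ~ pred_a + pred_b + pred_c)
se_a <- summary(fit)$coefficients["pred_a", "Std. Error"]

# VIF of pred_a from the auxiliary regression pred_a ~ pred_b + pred_c
r2_a <- summary(stats::lm(pred_a ~ pred_b + pred_c))$r.squared
vif_a <- 1 / (1 - r2_a)

# Standard error pred_a would have with no correlation to the other predictors
sigma_hat <- summary(fit)$sigma
se_a_uncorrelated <- sigma_hat / sqrt(sum((pred_a - mean(pred_a))^2))

# The inflation factor of the standard error equals sqrt(VIF)
inflation <- se_a / se_a_uncorrelated
print(c(inflation = inflation, sqrt_vif = sqrt(vif_a)))
```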
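VIF values are always greater than or equal to 1: a VIF of 1 means that the variable is uncorrelated with all other predictors (\(R^2 = 0\)), and the VIF grows without bound as \(R^2\) approaches 1. The recommended thresholds for the maximum VIF vary depending on the source consulted; the most common values are 2.5, 5, and 10.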
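As a minimal illustration of how such thresholds are applied, the sketch below flags the predictors whose VIF exceeds each of the common cut-offs. It only flags variables on simulated, hypothetical data; it does not reproduce the selection procedure of vif_select().

```r
# Simulated, hypothetical predictors with strong collinearity in pred_a
set.seed(1)
n <- 200
pred_b <- rnorm(n)
pred_c <- rnorm(n)
pred_a <- 0.9 * pred_b + 0.3 * pred_c + rnorm(n, sd = 0.3)
predictors_df <- data.frame(pred_a, pred_b, pred_c)

# VIF of every predictor: diagonal of the inverse correlation matrix
vif <- diag(solve(stats::cor(predictors_df)))

# Names of the predictors above each commonly cited threshold
thresholds <- c(2.5, 5, 10)
flagged <- lapply(thresholds, function(max_vif) names(vif)[vif > max_vif])
names(flagged) <- paste0("vif > ", thresholds)
print(flagged)
```

A stricter threshold (2.5) flags at least as many predictors as a looser one (10), so the choice of cut-off directly controls how aggressively collinear variables are pruned.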
References
Belsley, D.A., Kuh, E., Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons. DOI: 10.1002/0471725153.
See also
Other vif:
vif_select()
Examples
library(collinear)

data(
  vi,
  vi_predictors_numeric
)

# subset to limit run time
df <- vi[1:1000, ]

# apply pairwise correlation filtering first
selection <- cor_select(
  df = df,
  predictors = vi_predictors_numeric,
  quiet = TRUE
)

# VIF data frame
df <- vif_df(
  df = df,
  predictors = selection
)
df
#> predictor vif
#> 21 swi_mean 15.1740
#> 16 swi_min 10.4749
#> 22 solar_rad_mean 10.4663
#> 19 cloud_cover_min 7.9517
#> 17 temperature_max 7.9092
#> 18 temperature_seasonality 7.3571
#> 20 solar_rad_max 6.6851
#> 11 swi_range 6.3444
#> 13 growing_season_temperature 4.6673
#> 15 soil_nitrogen 4.1777
#> 9 cloud_cover_range 4.0856
#> 8 soil_sand 3.4906
#> 14 rainfall_range 3.0034
#> 6 soil_clay 2.8938
#> 1 topo_elevation 2.8349
#> 12 rainfall_min 2.7243
#> 3 humidity_range 2.7023
#> 10 soil_soc 2.6656
#> 2 country_population 2.0143
#> 5 topo_slope 1.8512
#> 4 country_gdp 1.8382
#> 7 topo_diversity 1.5827