
Computes the Variance Inflation Factor of numeric variables in a data frame.

This function computes the VIF (see section Variance Inflation Factors below) in two steps:

  • Applies base::solve() to obtain the precision matrix, which is the inverse of the correlation matrix between all variables in predictors.

  • Uses base::diag() to extract the diagonal of the precision matrix. Each diagonal element is the VIF of the corresponding predictor, and grows as that predictor is better explained by all other predictors.
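The two steps above can be sketched in base R as follows. This is a minimal illustration and not the package's actual source: it assumes the matrix being inverted is the Pearson correlation matrix (the form whose inverse carries VIFs on its diagonal), and it uses the built-in mtcars data with an arbitrary choice of columns. The names x, precision, vif, and vif_sketch are hypothetical.

```r
# Illustration only: four numeric columns from the built-in mtcars data
x <- mtcars[, c("disp", "hp", "wt", "drat")]

# Step 1: invert the correlation matrix to get the precision matrix
precision <- solve(cor(x))

# Step 2: the diagonal holds one VIF per predictor
vif <- diag(precision)

# Arrange as a data frame sorted by decreasing VIF
vif_sketch <- data.frame(
  predictor = colnames(x),
  vif = round(unname(vif), 4)
)
vif_sketch <- vif_sketch[order(vif_sketch$vif, decreasing = TRUE), ]

vif_sketch
```

Note that solve() fails with an error when the correlation matrix is singular, which happens when one predictor is an exact linear combination of others.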

Usage

vif_df(df = NULL, predictors = NULL, quiet = FALSE)

Arguments

df

(required; data frame, tibble, or sf) A data frame with responses and predictors. Default: NULL.

predictors

(optional; character vector) Names of the predictors to select from df. If omitted, all numeric columns in df are used instead. Non-numeric variables are ignored. Default: NULL.

quiet

(optional; logical) If FALSE, messages generated during the execution of the function are printed to the console. Default: FALSE.

Value

data frame; predictor names and their VIFs.

Variance Inflation Factors

The Variance Inflation Factor for a given variable \(a\) is computed as \(1/(1-R^2)\), where \(R^2\) is the multiple R-squared of a multiple regression model fitted using \(a\) as response and all other predictors in the input data frame as predictors, as in \(a = b + c + ...\).
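As a hedged check of this definition, the sketch below computes the VIF of one variable from an explicit regression and compares it with the diagonal of the inverse correlation matrix. It uses the built-in mtcars data and an arbitrary set of columns purely for illustration; the object names are hypothetical and not part of the package.

```r
# Illustration only: four numeric columns from the built-in mtcars data
x <- mtcars[, c("disp", "hp", "wt", "drat")]

# VIF of "wt" from the regression definition: wt against all other predictors
r2 <- summary(lm(wt ~ disp + hp + drat, data = x))$r.squared
vif_regression <- 1 / (1 - r2)

# VIF of "wt" from the diagonal of the inverse correlation matrix
vif_matrix <- unname(diag(solve(cor(x)))[colnames(x) == "wt"])

# Both routes should agree up to numerical tolerance
all.equal(vif_regression, vif_matrix)
```

The matrix route gives the VIFs of all predictors in a single inversion, whereas the regression route requires fitting one model per predictor.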

The square root of the VIF of \(a\) is the factor by which the confidence interval of the estimate for \(a\) in the linear model \(y = a + b + c + ...\) is widened by multicollinearity in the model predictors.

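The range of VIF values is [1, Inf): a VIF of 1 means the variable is uncorrelated with all other predictors, and the value grows without bound as \(R^2\) approaches 1. Recommended thresholds for the maximum VIF vary depending on the source consulted; the most common values are 2.5, 5, and 10.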

References

  • Belsley, D.A., Kuh, E., Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. John Wiley & Sons. DOI: 10.1002/0471725153.

See also

Other vif: vif_select()

Examples


data(
  vi,
  vi_predictors_numeric
)

#subset to limit run time
df <- vi[1:1000, ]

#apply pairwise correlation first
selection <- cor_select(
  df = df,
  predictors = vi_predictors_numeric,
  quiet = TRUE
)

#VIF data frame
df <- vif_df(
  df = df,
  predictors = selection
)

df
#>                     predictor     vif
#> 21                   swi_mean 15.1740
#> 16                    swi_min 10.4749
#> 22             solar_rad_mean 10.4663
#> 19            cloud_cover_min  7.9517
#> 17            temperature_max  7.9092
#> 18    temperature_seasonality  7.3571
#> 20              solar_rad_max  6.6851
#> 11                  swi_range  6.3444
#> 13 growing_season_temperature  4.6673
#> 15              soil_nitrogen  4.1777
#> 9           cloud_cover_range  4.0856
#> 8                   soil_sand  3.4906
#> 14             rainfall_range  3.0034
#> 6                   soil_clay  2.8938
#> 1              topo_elevation  2.8349
#> 12               rainfall_min  2.7243
#> 3              humidity_range  2.7023
#> 10                   soil_soc  2.6656
#> 2          country_population  2.0143
#> 5                  topo_slope  1.8512
#> 4                 country_gdp  1.8382
#> 7              topo_diversity  1.5827