
Find valid categorical variables in a dataframe
Source: R/identify_categorical_variables.R
Identifies valid and invalid character or factor variables. Invalid categorical predictors are those with a single category, or with as many categories as cases (full cardinality).
Usage
identify_categorical_variables(
  df = NULL,
  responses = NULL,
  predictors = NULL,
  quiet = FALSE,
  ...
)
Arguments
- df
(required; dataframe, tibble, or sf) A dataframe with responses (optional) and predictors. Must have at least 10 rows for pairwise correlation analysis, and 10 * (length(predictors) - 1) for VIF. Default: NULL.
- responses
(optional; character, character vector, or NULL) Name of one or several response variables in df. Default: NULL.
- predictors
(required; character vector) Names of the predictors to identify. Default: NULL.
- quiet
(optional; logical) If FALSE, messages are printed. Default: FALSE.
- ...
(optional) Internal args (e.g. function_name for validate_arg_function_name, a precomputed correlation matrix m, or cross-validation args for preference_order).
Value
list:
- valid: character vector with valid categorical predictor names.
- invalid: character vector with invalid categorical predictor names due to degenerate cardinality (1 or nrow(df) categories).
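The cardinality rule described above can be sketched in base R. This is a minimal illustration, not the package's implementation: the helper sketch_categorical_validity and the toy dataframe are hypothetical, it handles plain dataframes only, and it assumes that missing values are not counted as a category (so an all-NA column is treated as degenerate).

```r
# Hypothetical helper, NOT part of the package: classifies character and
# factor columns as valid or invalid based on their cardinality.
sketch_categorical_validity <- function(df, predictors) {
  # keep only character or factor columns among the predictors
  is_categorical <- vapply(
    df[predictors],
    function(x) is.character(x) || is.factor(x),
    logical(1)
  )
  categorical <- predictors[is_categorical]

  # number of distinct non-missing categories per column
  # (assumption: NA is not counted as a category)
  n_categories <- vapply(
    df[categorical],
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )

  # degenerate: at most one category, or as many categories as rows
  degenerate <- n_categories <= 1L | n_categories == nrow(df)

  list(
    valid = categorical[!degenerate],
    invalid = categorical[degenerate]
  )
}

# toy data: one constant column, one full-cardinality column,
# one well-behaved categorical, and one numeric column (ignored)
toy <- data.frame(
  constant = rep("a", 6),
  id = letters[1:6],
  group = rep(c("x", "y"), 3),
  numeric_col = 1:6,
  stringsAsFactors = FALSE
)

result <- sketch_categorical_validity(toy, names(toy))
result$valid
#> [1] "group"
result$invalid
#> [1] "constant" "id"
```

Numeric columns are dropped before the cardinality check, so they appear in neither element of the returned list.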
Examples
data(vi_smol, vi_predictors)
# create an invalid categorical (single category)
vi_smol$invalid_categorical <- "a"
x <- identify_categorical_variables(
  df = vi_smol,
  responses = "vi_categorical",
  predictors = vi_predictors
)
x$valid
#> [1] "vi_categorical" "koppen_zone" "koppen_group"
#> [4] "koppen_description" "soil_type" "biogeo_ecoregion"
#> [7] "biogeo_biome" "biogeo_realm" "country_name"
#> [10] "continent" "region" "subregion"
x$invalid
#> NULL