
Detect response variable type for model selection
Source: R/identify_response_type.R
Used by f_auto() to identify the type of a response variable and select a proper modelling method to compute preference order.
Supported types are:
"continuous-binary": decimal numbers and two unique values; results in a warning, as this type is difficult to model.
"continuous-low": decimal numbers and 3 to 5 unique values; results in a message, as this type is difficult to model.
"continuous-high": decimal numbers and more than 5 unique values.
"integer-binomial": integer with 0s and 1s, suitable for binomial models.
"integer-binary": integer with 2 unique values other than 0 and 1; returns a warning, as this type is difficult to model.
"integer-low": integer with 3 to 5 unique values or meets specified thresholds.
"integer-high": integer with more than 5 unique values suitable for count modelling.
"categorical": character or factor with 2 or more levels.
"unknown": when the response type cannot be determined.
Arguments
- df
(required; dataframe, tibble, or sf) A dataframe with responses (optional) and predictors. Must have at least 10 rows for pairwise correlation analysis, and 10 * (length(predictors) - 1) for VIF. Default: NULL.
- response
(optional, character string) Name of a response variable in df. Default: NULL.
- quiet
(optional; logical) If FALSE, messages are printed. Default: FALSE.
- ...
(optional) Internal args (e.g. function_name for validate_arg_function_name, a precomputed correlation matrix m, or cross-validation args for preference_order).
Examples
data(vi_smol)
identify_response_type(
df = vi_smol,
response = "vi_numeric"
)
#> [1] "continuous-high"
identify_response_type(
df = vi_smol,
response = "vi_counts"
)
#> [1] "integer-high"
identify_response_type(
df = vi_smol,
response = "vi_binomial"
)
#> [1] "integer-binomial"
identify_response_type(
df = vi_smol,
response = "vi_categorical"
)
#>
#> collinear::identify_response_type()
#> └── collinear::validate_arg_df(): converted the following character columns to factor:
#> - vi_categorical
#> [1] "categorical"
identify_response_type(
df = vi_smol,
response = "vi_factor"
)
#> [1] "categorical"