Finds the optimal set of random forest hyperparameters num.trees, mtry, and min.node.size via grid search. It maximizes the model's R squared (or AUC, if the response variable is binomial), estimated by spatial cross-validation performed with rf_evaluate().
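The grid search can be pictured with the base-R sketch below. It is not spatialRF's internal code. toy_cv_score() is a hypothetical stand-in for the cross-validated R squared that rf_evaluate() would return for each combination, and n.predictors = 17 assumes the 17 predictors used in the example at the end of this page. With the default grids, mtry becomes 1, 6, 11, 17, so the grid has 3 x 4 x 4 = 48 combinations. In the real function, each combination is cross-validated over repetitions spatial folds.

```r
# Minimal grid-search sketch in base R (not spatialRF's internal code)

n.predictors <- 17  # assumption: colnames(plant_richness_df)[5:21]

# Default grids described in the argument list
num.trees <- c(500, 1000, 2000)
mtry <- floor(seq(1, n.predictors, length.out = 4))
min.node.size <- c(5, 10, 20, 40)

# One row per hyperparameter combination
grid <- expand.grid(
  num.trees = num.trees,
  mtry = mtry,
  min.node.size = min.node.size
)

# Hypothetical score: peaks at mtry = 6 and min.node.size = 10,
# and rises slightly with num.trees
toy_cv_score <- function(num.trees, mtry, min.node.size) {
  0.8 - 0.01 * abs(mtry - 6) - 0.001 * abs(min.node.size - 10) +
    0.00001 * num.trees
}

# Score every combination and keep the one with the highest score
grid$score <- mapply(toy_cv_score, grid$num.trees, grid$mtry, grid$min.node.size)
best <- grid[which.max(grid$score), ]
print(best)
```

In rf_tuning() the expensive step is the scoring call, which is why the grids are kept short and the work is distributed across cores.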
rf_tuning(
  model = NULL,
  num.trees = NULL,
  mtry = NULL,
  min.node.size = NULL,
  xy = NULL,
  repetitions = 30,
  training.fraction = 0.75,
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)

model: A model fitted with rf(). If provided, the training data is taken directly from the model definition (stored in model$ranger.arguments). Default: NULL.
num.trees: Integer vector with the number of trees to fit on each model repetition. Default: NULL, in which case c(500, 1000, 2000) is used.
mtry: Integer vector with the number of predictors to randomly select from the complete pool of predictors at each tree split. Default: NULL, in which case floor(seq(1, length(predictor.variable.names), length.out = 4)) is used.
min.node.size: Integer vector with the minimal number of cases in a terminal node. Default: NULL, in which case c(5, 10, 20, 40) is used.
xy: Data frame or matrix with two columns, named "x" and "y", containing the case coordinates. If NULL, the function throws an error. Default: NULL.
repetitions: Integer, number of independent spatial folds to use during the cross-validation. Default: 30.
training.fraction: Proportion between 0.2 and 0.9 indicating the fraction of records used for model training. Default: 0.75.
seed: Integer, random seed to facilitate reproducibility. If set to a given number, the function always returns the same results. Default: 1.
verbose: Logical. If TRUE, messages and plots generated during the execution of the function are displayed. Default: TRUE.
n.cores: Integer, number of cores to use for parallel execution. The function creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1.
cluster: A cluster definition generated with parallel::makeCluster(). If provided, it overrides n.cores. When cluster = NULL (the default) and model is provided, the cluster stored in model, if any, is used instead; if that is also NULL, the function uses n.cores. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster", so it can be passed to other functions via the model argument or through the %>% pipe. Default: NULL.
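The cluster rules above follow a common create/use/stop pattern, sketched here in base R. score_combination() and run_grid() are hypothetical helpers, not spatialRF functions, and parallel::parLapply() is swapped in for the foreach/%dopar% loop that rf_tuning() uses, to avoid extra package dependencies. A cluster created inside run_grid() from n.cores is stopped on exit, while a cluster supplied by the caller is left running and must be stopped explicitly.

```r
# Hypothetical per-combination task: a toy score standing in for one
# cross-validation run (not spatialRF code)
score_combination <- function(i, grid) {
  row <- grid[i, ]
  row$num.trees / 1000 + row$mtry - row$min.node.size / 10
}

# Hypothetical runner mirroring the documented n.cores / cluster rules
run_grid <- function(grid, n.cores = 1, cluster = NULL) {
  if (is.null(cluster)) {
    # No cluster supplied: create a socket cluster and stop it on exit
    cluster <- parallel::makeCluster(n.cores, type = "PSOCK")
    on.exit(parallel::stopCluster(cluster), add = TRUE)
  }
  # A supplied cluster is used as-is and left running for the caller
  unlist(parallel::parLapply(
    cluster, seq_len(nrow(grid)), score_combination, grid = grid
  ))
}

grid <- expand.grid(
  num.trees = c(100, 500),
  mtry = c(2, 8),
  min.node.size = c(5, 10)
)

# Caller-managed cluster: create once, reuse, then stop explicitly
cl <- parallel::makeCluster(2, type = "PSOCK")
scores_a <- run_grid(grid, cluster = cl)
scores_b <- run_grid(grid, cluster = cl)  # same cluster, still running
parallel::stopCluster(cl)

# Function-managed cluster created from n.cores
scores_c <- run_grid(grid, n.cores = 2)

print(grid[which.max(scores_a), ])
```

Reusing one cluster across several calls avoids paying the worker start-up cost each time, which is the motivation for storing the cluster in the output under "cluster".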
Returns a model with a new slot named tuning containing a data frame with the results of the tuning analysis.
if(interactive()){

  library(spatialRF)

  # loading example data
  data(plant_richness_df)
  data(distance_matrix)

  # fitting the model to tune
  out <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  # tuning the model
  tuning <- rf_tuning(
    model = out,
    num.trees = c(100, 500),
    mtry = c(2, 8),
    min.node.size = c(5, 10),
    xy = plant_richness_df[, c("x", "y")],
    n.cores = 1
  )

}