R/rf_tuning.R
rf_tuning.Rd
Finds the optimal combination of the random forest hyperparameters num.trees, mtry, and min.node.size via grid search. It maximizes the model's R squared (or the AUC, if the response variable is binomial), as estimated by spatial cross-validation with rf_evaluate().
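To give a rough idea of the size of the search, the sketch below assumes that the grid is the full factorial combination of the candidate values. It builds that grid with base R's expand.grid(), using the candidate values from the example call at the end of this page. Under this assumption, the number of cross-validation fits grows multiplicatively with each added candidate value and with repetitions.

```r
# Candidate values taken from the example call at the end of this page
num.trees <- c(100, 500)
mtry <- c(2, 8)
min.node.size <- c(5, 10)

# Assumption: the grid is the full factorial combination of the candidates,
# with one row per hyperparameter combination to evaluate
combinations <- expand.grid(
  num.trees = num.trees,
  mtry = mtry,
  min.node.size = min.node.size
)

nrow(combinations)  # 2 * 2 * 2 = 8 combinations

# Assumption: each combination is cross-validated on `repetitions` spatial
# folds, so the default repetitions = 30 implies roughly 8 * 30 = 240 fits
repetitions <- 30
nrow(combinations) * repetitions
```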
rf_tuning(
model = NULL,
num.trees = NULL,
mtry = NULL,
min.node.size = NULL,
xy = NULL,
repetitions = 30,
training.fraction = 0.75,
seed = 1,
verbose = TRUE,
n.cores = parallel::detectCores() - 1,
cluster = NULL
)
model: A model fitted with rf(). If provided, the training data is taken directly from the model definition (stored in model$ranger.arguments). Default: NULL
num.trees: Integer vector with the candidate numbers of trees to fit on each model repetition. Default: c(500, 1000, 2000).
mtry: Integer vector with the candidate numbers of predictors to randomly select from the complete pool of predictors at each tree split. Default: floor(seq(1, length(predictor.variable.names), length.out = 4)).
min.node.size: Integer vector with the candidate minimal numbers of cases in a terminal node. Default: c(5, 10, 20, 40).
xy: Data frame or matrix with two columns, named "x" and "y", containing the coordinates. If NULL, the function throws an error. Default: NULL
repetitions: Integer, number of independent spatial folds to use during the cross-validation. Default: 30.
training.fraction: Numeric, proportion of records (between 0.2 and 0.9) used for model training. Default: 0.75
seed: Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: 1.
verbose: Logical. If TRUE, messages and plots generated during the execution of the function are displayed. Default: TRUE
n.cores: Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1
cluster: A cluster definition generated with parallel::makeCluster(). If provided, it overrides n.cores. When cluster = NULL (default value) and model is provided, the cluster stored in model, if any, is used instead. If that cluster is also NULL, the function uses n.cores. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument or through the %>% pipe. Default: NULL
A model with a new slot named tuning containing a data frame with the results of the tuning analysis.
if(interactive()){

  # loading example data
  data(plant_richness_df)
  data(distance_matrix)

  # fitting the model to tune
  out <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  # model tuning; the results are stored in the "tuning" slot of the output
  tuning <- rf_tuning(
    model = out,
    num.trees = c(100, 500),
    mtry = c(2, 8),
    min.node.size = c(5, 10),
    xy = plant_richness_df[, c("x", "y")],
    n.cores = 1
  )

}