Tuning of random forest hyperparameters via spatial cross-validation
Source: R/rf_tuning.R
Finds the optimal set of random forest hyperparameters num.trees, mtry, and min.node.size via grid search. The search maximizes the model's R squared, or the AUC if the response variable is binomial, as estimated by spatial cross-validation performed with rf_evaluate().
Usage
rf_tuning(
  model = NULL,
  num.trees = NULL,
  mtry = NULL,
  min.node.size = NULL,
  xy = NULL,
  repetitions = 30,
  training.fraction = 0.75,
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
Arguments
- model
A model fitted with rf(). If provided, the training data is taken directly from the model definition (stored in model$ranger.arguments). Default: NULL
- num.trees
Numeric integer vector with the number of trees to fit on each model repetition. Default: c(500, 1000, 2000).
- mtry
Numeric integer vector, number of predictors to randomly select from the complete pool of predictors on each tree split. Default: floor(seq(1, length(predictor.variable.names), length.out = 4))
- min.node.size
Numeric integer vector, minimal number of cases in a terminal node. Default: c(5, 10, 20, 40)
- xy
Data frame or matrix with two columns containing coordinates and named "x" and "y". If NULL, the function will throw an error. Default: NULL
- repetitions
Integer, number of independent spatial folds to use during the cross-validation. Default: 30.
- training.fraction
Proportion between 0.2 and 0.9 indicating the fraction of records to be used in model training. Default: 0.75
- seed
Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: 1.
- verbose
Logical. If TRUE, messages and plots generated during the execution of the function are displayed. Default: TRUE
- n.cores
Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1
- cluster
A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value) and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument, or using the %>% pipe. Default: NULL
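To get a feel for the size of the search, the documented default values can be expanded into a grid with base R. This is only a sketch: the 17 predictor names are hypothetical placeholders, and it assumes the grid search covers every combination of the three vectors, which the description suggests but does not state explicitly.

```r
# Hypothetical predictor names standing in for predictor.variable.names
predictor.variable.names <- paste0("predictor_", 1:17)

# Default candidate values as documented in the argument list
num.trees <- c(500, 1000, 2000)
mtry <- floor(seq(1, length(predictor.variable.names), length.out = 4))
min.node.size <- c(5, 10, 20, 40)

# Assumption: every combination of the three vectors is a candidate
grid <- expand.grid(
  num.trees = num.trees,
  mtry = mtry,
  min.node.size = min.node.size,
  KEEP.OUT.ATTRS = FALSE
)

# Number of candidate hyperparameter combinations
nrow(grid)
```

Under that assumption, each row of grid would be evaluated over all spatial folds set by repetitions, so shortening the candidate vectors is the most direct way to reduce run time.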
Value
A model with a new slot named tuning that contains a data frame with the results of the tuning analysis.
Examples
if(interactive()){

  # loading example data
  data(plant_richness_df)
  data(distance_matrix)

  # fitting model to tune
  out <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  # model tuning
  tuning <- rf_tuning(
    model = out,
    num.trees = c(100, 500),
    mtry = c(2, 8),
    min.node.size = c(5, 10),
    xy = plant_richness_df[, c("x", "y")],
    n.cores = 1
  )

}
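The cluster argument described above can be used in place of n.cores. The following is a minimal sketch, not part of the package examples: it assumes the package providing rf() and rf_tuning() is spatialRF and is installed, and the worker count (2) and repetitions (5) are arbitrary values chosen to keep the run short. Because rf_tuning() does not stop a user-provided cluster, the sketch stops it explicitly, even if tuning fails.

```r
# Assumption: rf() and rf_tuning() come from the spatialRF package.
# The whole block is skipped if the package is not installed.
if (requireNamespace("spatialRF", quietly = TRUE)) {

  library(spatialRF)
  data(plant_richness_df)
  data(distance_matrix)

  # user-managed socket cluster; 2 workers is an arbitrary choice
  cl <- parallel::makeCluster(2)

  tuned <- tryCatch(
    {
      # model to tune, fitted as in the example above
      model <- rf(
        data = plant_richness_df,
        dependent.variable.name = "richness_species_vascular",
        predictor.variable.names = colnames(plant_richness_df)[5:21],
        distance.matrix = distance_matrix,
        distance.thresholds = 0,
        n.cores = 1
      )

      # a provided cluster overrides n.cores
      rf_tuning(
        model = model,
        num.trees = c(100, 500),
        mtry = c(2, 8),
        min.node.size = c(5, 10),
        xy = plant_richness_df[, c("x", "y")],
        repetitions = 5,
        verbose = FALSE,
        cluster = cl
      )
    },
    # the function does not stop a provided cluster, so stop it here
    finally = parallel::stopCluster(cl)
  )

  # tuning results are stored in the new "tuning" slot
  str(tuned$tuning)
}
```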