`R/rf_tuning.R`

`rf_tuning.Rd`

Finds the optimal set of random forest hyperparameters `num.trees`

, `mtry`

, and `min.node.size`

via grid search by maximizing the model's R squared, or AUC, if the response variable is binomial, via spatial cross-validation performed with `rf_evaluate()`

.

```
rf_tuning(
model = NULL,
num.trees = NULL,
mtry = NULL,
min.node.size = NULL,
xy = NULL,
repetitions = 30,
training.fraction = 0.75,
seed = 1,
verbose = TRUE,
n.cores = parallel::detectCores() - 1,
cluster = NULL
)
```

- model
A model fitted with

`rf()`

. If provided, the training data is taken directly from the model definition (stored in`model$ranger.arguments`

). Default:`NULL`

- num.trees
Numeric integer vector with the number of trees to fit on each model repetition. Default:

`c(500, 1000, 2000)`

.- mtry
Numeric integer vector, number of predictors to randomly select from the complete pool of predictors on each tree split. Default:

`floor(seq(1, length(predictor.variable.names), length.out = 4))`

- min.node.size
Numeric integer, minimal number of cases in a terminal node. Default:

`c(5, 10, 20, 40)`

- xy
Data frame or matrix with two columns containing coordinates and named "x" and "y". If

`NULL`

, the function will throw an error. Default:`NULL`

- repetitions
Integer, number of independent spatial folds to use during the cross-validation. Default:

`30`

.- training.fraction
Proportion between 0.2 and 0.9 indicating the number of records to be used in model training. Default:

`0.75`

- seed
Integer, random seed to facilitate reproduciblity. If set to a given number, the results of the function are always the same. Default:

`1`

.- verbose
Logical. If TRUE, messages and plots generated during the execution of the function are displayed, Default:

`TRUE`

- n.cores
Integer, number of cores to use for parallel execution. Creates a socket cluster with

`parallel::makeCluster()`

, runs operations in parallel with`foreach`

and`%dopar%`

, and stops the cluster with`parallel::clusterStop()`

when the job is done. Default:`parallel::detectCores() - 1`

- cluster
A cluster definition generated with

`parallel::makeCluster()`

. If provided, overrides`n.cores`

. When`cluster = NULL`

(default value), and`model`

is provided, the cluster in`model`

, if any, is used instead. If this cluster is`NULL`

, then the function uses`n.cores`

instead. The function does not stop a provided cluster, so it should be stopped with`parallel::stopCluster()`

afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the`model`

argument, or using the`%>%`

pipe. Default:`NULL`

A model with a new slot named `tuning`

, with a data frame with the results of the tuning analysis.

```
if(interactive()){
#loading example data
data(plant_richness_df)
data(distance_matrix)
#fitting model to tune
out <- rf(
data = plant_richness_df,
dependent.variable.name = "richness_species_vascular",
predictor.variable.names = colnames(plant_richness_df)[5:21],
distance.matrix = distance_matrix,
distance.thresholds = 0,
n.cores = 1
)
#model tuning
tuning <- rf_tuning(
model = out,
num.trees = c(100, 500),
mtry = c(2, 8),
min.node.size = c(5, 10),
xy = plant_richness_df[, c("x", "y")],
n.cores = 1
)
}
```