Uses rf_evaluate() to compare the performance of several models on independent spatial folds via spatial cross-validation.
rf_compare(
  models = NULL,
  xy = NULL,
  repetitions = 30,
  training.fraction = 0.75,
  metrics = c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc"),
  distance.step = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  fill.color = viridis::viridis(100, option = "F", direction = -1, alpha = 0.8),
  line.color = "gray30",
  seed = 1,
  verbose = TRUE,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
models: Named list with models resulting from rf(), rf_spatial(), rf_tuning(), or rf_evaluate(). Example: models = list(a = model.a, b = model.b). Default: NULL
xy: Data frame or matrix with two columns containing coordinates and named "x" and "y". Default: NULL
repetitions: Integer, number of spatial folds to use during cross-validation. Must be lower than the total number of rows available in the model's data. Default: 30
training.fraction: Numeric between 0.5 and 0.9, proportion of records to be used as the training set during spatial cross-validation. Default: 0.75
metrics: Character vector, names of the performance metrics selected. The possible values are: "r.squared" (cor(obs, pred) ^ 2), "pseudo.r.squared" (cor(obs, pred)), "rmse" (sqrt(sum((obs - pred)^2)/length(obs))), "nrmse" (rmse/(quantile(obs, 0.75) - quantile(obs, 0.25))), and "auc" (area under the ROC curve, only meaningful for binary responses). Default: c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc")
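The four regression metrics above can be checked by hand on a pair of vectors. The sketch below is a minimal base R illustration, assuming made-up obs and pred values; it reproduces the formulas listed for metrics and is not the internal code of rf_evaluate().

```r
# Hypothetical observed and predicted values, for illustration only
obs  <- c(10, 12, 15, 20, 25, 30)
pred <- c(11, 13, 14, 19, 27, 28)

# Metrics written exactly as in the description of the metrics argument
r.squared        <- cor(obs, pred)^2
pseudo.r.squared <- cor(obs, pred)
rmse             <- sqrt(sum((obs - pred)^2) / length(obs))

# quantile() returns named values; unname() keeps nrmse a plain number
nrmse <- unname(rmse / (quantile(obs, 0.75) - quantile(obs, 0.25)))
```

With these toy values the squared errors sum to 12, so rmse is sqrt(2), and the interquartile range of obs (default quantile() type 7) is 23.75 - 12.75 = 11, so nrmse is sqrt(2)/11.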
distance.step: Numeric, argument distance.step of thinning_til_n(). Distance step used during the selection of the centers of the training folds. These fold centers are selected by thinning the data until a number of folds equal to or lower than repetitions is reached. Default: NULL, which sets it to 1/1000th of the maximum distance between records in xy. Reduce it if the number of training folds is lower than expected.
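The default rule for distance.step fits in one line of base R. A minimal sketch, assuming a hypothetical four-point xy data frame; the value follows the description above rather than the package source.

```r
# Hypothetical coordinates, for illustration only
xy <- data.frame(x = c(0, 3, 6, 10), y = c(0, 4, 8, 0))

# Default distance.step: 1/1000th of the maximum distance between records
distance.step <- max(dist(xy)) / 1000
```

Here the largest pairwise Euclidean distance is 10, so the default step would be 0.01. If thinning with this step leaves fewer fold centers than expected, a smaller value can be passed explicitly.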
distance.step.x: Numeric, argument distance.step.x of make_spatial_folds(). Distance step used during the growth in the x axis of the buffers defining the training folds. Default: NULL (1/1000th of the range of the x coordinates).
distance.step.y: Numeric, argument distance.step.y of make_spatial_folds(). Distance step used during the growth in the y axis of the buffers defining the training folds. Default: NULL (1/1000th of the range of the y coordinates).
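To illustrate how distance.step.x and distance.step.y can drive fold growth, the sketch below expands a rectangular buffer around a fold center by one step per iteration in each axis until it holds at least training.fraction of the records. The helper grow_fold() is hypothetical and only approximates the behaviour described for make_spatial_folds(); the coordinates are random toy data.

```r
# Hypothetical coordinates, for illustration only
set.seed(1)
xy <- data.frame(x = runif(100, 0, 50), y = runif(100, 0, 20))

# Default steps: 1/1000th of the range of each coordinate
distance.step.x <- diff(range(xy$x)) / 1000
distance.step.y <- diff(range(xy$y)) / 1000

# Hypothetical helper (not the spatialRF implementation): grow a rectangular
# buffer around `center` until it contains at least training.fraction of records
grow_fold <- function(xy, center, training.fraction, step.x, step.y) {
  target <- ceiling(nrow(xy) * training.fraction)
  half.x <- step.x
  half.y <- step.y
  repeat {
    inside <- which(
      abs(xy$x - center$x) <= half.x & abs(xy$y - center$y) <= half.y
    )
    if (length(inside) >= target) break
    # Widen the buffer by one step in each axis and try again
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }
  # Records inside the buffer train the model; the rest are used for testing
  list(training = inside, testing = setdiff(seq_len(nrow(xy)), inside))
}

fold <- grow_fold(xy, center = xy[1, ], training.fraction = 0.75,
                  step.x = distance.step.x, step.y = distance.step.y)
```

Smaller steps yield buffers whose size lands closer to the target fraction, at the cost of more iterations.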
fill.color: Character vector with hexadecimal codes (e.g. "#440154FF", "#21908CFF", "#FDE725FF"), or function generating a palette (e.g. viridis::viridis(100)). Default: viridis::viridis(100, option = "F", direction = -1, alpha = 0.8)
line.color: Character string, color of the line produced by ggplot2::geom_smooth(). Default: "gray30"
seed: Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: 1
verbose: Logical. If TRUE, messages and plots generated during the execution of the function are displayed. Default: TRUE
n.cores: Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1
cluster: A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value) and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument, or using the %>% pipe. Default: NULL
A list with three slots:
comparison.df: Data frame with one performance value per spatial fold, metric, and model.
spatial.folds: List with the indices of the training and testing records for each evaluation repetition.
plot: Violin plot of comparison.df.
if(interactive()){

  #loading example data
  data(distance_matrix)
  data(plant_richness_df)

  #fitting random forest model
  rf.model <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  #fitting a spatial model with Moran's Eigenvector Maps
  rf.spatial <- rf_spatial(
    model = rf.model,
    n.cores = 1
  )

  #comparing the spatial and non spatial models
  comparison <- rf_compare(
    models = list(
      `Non spatial` = rf.model,
      Spatial = rf.spatial
    ),
    xy = plant_richness_df[, c("x", "y")],
    metrics = c("r.squared", "rmse"),
    n.cores = 1
  )

}