Uses rf_evaluate() to compare the performance of several models on independent spatial folds via spatial cross-validation.
Usage
rf_compare(
models = NULL,
xy = NULL,
repetitions = 30,
training.fraction = 0.75,
metrics = c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc"),
distance.step = NULL,
distance.step.x = NULL,
distance.step.y = NULL,
fill.color = viridis::viridis(100, option = "F", direction = -1, alpha = 0.8),
line.color = "gray30",
seed = 1,
verbose = TRUE,
n.cores = parallel::detectCores() - 1,
cluster = NULL
)
Arguments
- models
Named list with models resulting from rf(), rf_spatial(), rf_tuning(), or rf_evaluate(). Example: models = list(a = model.a, b = model.b). Default: NULL
- xy
Data frame or matrix with two columns named "x" and "y" containing the coordinates. Default: NULL
- repetitions
Integer, number of spatial folds to use during cross-validation. Must be lower than the total number of rows available in the model's data. Default: 30
- training.fraction
Numeric between 0.5 and 0.9, proportion of records to be used as training set during spatial cross-validation. Default: 0.75
- metrics
Character vector, names of the performance metrics selected. The possible values are: "r.squared" (cor(obs, pred) ^ 2), "pseudo.r.squared" (cor(obs, pred)), "rmse" (sqrt(sum((obs - pred)^2)/length(obs))), "nrmse" (rmse/(quantile(obs, 0.75) - quantile(obs, 0.25))), and "auc", which is part of the default shown under Usage. Default: c("r.squared", "pseudo.r.squared", "rmse", "nrmse", "auc")
- distance.step
Numeric, argument distance.step of thinning_til_n(). Distance step used during the selection of the centers of the training folds. These fold centers are selected by thinning the data until a number of folds equal to or lower than repetitions is reached. Its default value is 1/1000th of the maximum distance between records in xy. Reduce it if the number of training folds is lower than expected.
- distance.step.x
Numeric, argument distance.step.x of make_spatial_folds(). Distance step used during the growth in the x axis of the buffers defining the training folds. Default: NULL (1/1000th of the range of the x coordinates).
- distance.step.y
Numeric, argument distance.step.y of make_spatial_folds(). Distance step used during the growth in the y axis of the buffers defining the training folds. Default: NULL (1/1000th of the range of the y coordinates).
- fill.color
Character vector with hexadecimal codes (e.g. "#440154FF" "#21908CFF" "#FDE725FF"), or function generating a palette (e.g. viridis::viridis(100)). Default: viridis::viridis(100, option = "F", direction = -1, alpha = 0.8)
- line.color
Character string, color of the line produced by ggplot2::geom_smooth(). Default: "gray30"
- seed
Integer, random seed to facilitate reproducibility. If set to a given number, the results of the function are always the same. Default: 1
- verbose
Logical. If TRUE, messages and plots generated during the execution of the function are displayed. Default: TRUE
- n.cores
Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1
- cluster
A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value), and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument, or using the %>% pipe. Default: NULL
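The formulas listed for the metrics argument can be tried out in base R. The sketch below is only an illustration: the helper performance_metrics() and the toy obs and pred vectors are hypothetical and not part of the package, and "auc" is left out because no formula is given for it here.

```r
# Hypothetical helper (not part of the package): applies the formulas
# listed for the `metrics` argument to observed and predicted values.
performance_metrics <- function(obs, pred) {
  rmse <- sqrt(sum((obs - pred)^2) / length(obs))
  iqr <- unname(quantile(obs, 0.75) - quantile(obs, 0.25))
  list(
    r.squared = cor(obs, pred)^2,
    pseudo.r.squared = cor(obs, pred),
    rmse = rmse,
    nrmse = rmse / iqr # rmse normalized by the interquartile range of obs
  )
}

# Toy values, made up for illustration
obs <- c(10, 12, 15, 20, 22, 30)
pred <- c(11, 13, 14, 18, 25, 27)

toy.metrics <- performance_metrics(obs, pred)
print(toy.metrics)
```

Because "nrmse" divides by the interquartile range of the observations, it is unit-free, which makes it easier to compare across responses measured on different scales than "rmse".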
Value
A list with three slots:
- comparison.df: Data frame with one performance value per spatial fold, metric, and model.
- spatial.folds: List with the indices of the training and testing records for each evaluation repetition.
- plot: Violin plot of comparison.df.
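Since comparison.df holds one value per spatial fold, metric, and model, a per-model summary is a natural next step. The sketch below uses a mock data frame rather than real output: the column names model, metric, and value are an assumption about the layout, and the numbers are invented, so check names(comparison$comparison.df) before adapting it.

```r
# Mock stand-in for comparison.df. Column names are assumed and the
# values are invented for illustration, not results from any model.
comparison.df <- data.frame(
  model = rep(c("Non spatial", "Spatial"), each = 6),
  metric = rep(rep(c("r.squared", "rmse"), each = 3), times = 2),
  value = c(
    0.50, 0.55, 0.60, 900, 950, 1000, # "Non spatial" folds
    0.40, 0.45, 0.50, 1000, 1050, 1100 # "Spatial" folds
  )
)

# Median performance across spatial folds, per model and metric
summary.df <- aggregate(
  value ~ model + metric,
  data = comparison.df,
  FUN = median
)
print(summary.df)
```

The median is used here because performance values across spatial folds can be skewed; mean() or quantile() can be swapped in through the FUN argument.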
Examples
if(interactive()){

  #loading the package and the example data
  library(spatialRF)
  data(distance_matrix)
  data(plant_richness_df)

  #fitting random forest model
  rf.model <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )

  #fitting a spatial model with Moran's Eigenvector Maps
  rf.spatial <- rf_spatial(
    model = rf.model,
    n.cores = 1
  )

  #comparing the spatial and non spatial models
  comparison <- rf_compare(
    models = list(
      `Non spatial` = rf.model,
      Spatial = rf.spatial
    ),
    xy = plant_richness_df[, c("x", "y")],
    metrics = c("r.squared", "rmse"),
    n.cores = 1
  )

}
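The cluster argument is described as not stopping a user-provided cluster, so the caller is responsible for cleanup. One possible pattern, sketched below, wraps cluster creation and shutdown in a small helper. with_cluster() is hypothetical and not part of the package, the worker counts are arbitrary, and the rf_compare() call assumes that passing cluster alone is enough for parallel execution, as the argument description suggests.

```r
# Hypothetical helper (not part of the package): creates a socket cluster,
# hands it to `fun`, and always stops it, even if `fun` fails.
with_cluster <- function(n.workers, fun) {
  cluster <- parallel::makeCluster(n.workers, type = "PSOCK")
  on.exit(parallel::stopCluster(cluster), add = TRUE)
  fun(cluster)
}

# Base R check of the helper: squares computed on a single worker
squares <- with_cluster(1, function(cl) {
  parallel::parLapply(cl, 1:3, function(i) i^2)
})

if (interactive() && requireNamespace("spatialRF", quietly = TRUE)) {
  library(spatialRF)
  data(distance_matrix)
  data(plant_richness_df)

  rf.model <- rf(
    data = plant_richness_df,
    dependent.variable.name = "richness_species_vascular",
    predictor.variable.names = colnames(plant_richness_df)[5:21],
    distance.matrix = distance_matrix,
    distance.thresholds = 0,
    n.cores = 1
  )
  rf.spatial <- rf_spatial(model = rf.model, n.cores = 1)

  # The cluster lives only for the duration of this call
  comparison <- with_cluster(2, function(cl) {
    rf_compare(
      models = list(`Non spatial` = rf.model, Spatial = rf.spatial),
      xy = plant_richness_df[, c("x", "y")],
      metrics = c("r.squared", "rmse"),
      cluster = cl
    )
  })
}
```

With this pattern, any cluster definition stored in the output refers to a cluster that has already been stopped, so it should not be passed on to other functions. To reuse one cluster across several calls, create it once with parallel::makeCluster() and stop it with parallel::stopCluster() at the end instead.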