Applies make_spatial_fold() to every record in the data frame xy.selected, generating as many spatially independent folds over the dataset xy as there are rows in xy.selected.
Usage
make_spatial_folds(
  data = NULL,
  dependent.variable.name = NULL,
  xy.selected = NULL,
  xy = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.75,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)
Arguments
- data
Data frame with a response variable and a set of predictors. Default: NULL.
- dependent.variable.name
Character string with the name of the response variable. Must be in the column names of data. Default: NULL.
- xy.selected
Data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, id of the record). Usually a subset of xy, typically the result of applying thinning() or thinning_til_n() to xy. Default: NULL.
- xy
Data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, index of the record). Default: NULL.
- distance.step.x
Numeric, distance step used during the growth in the x axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the x coordinates).
- distance.step.y
Numeric, distance step used during the growth in the y axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the y coordinates).
- training.fraction
Numeric, fraction of the data to be included in the growing buffer as training data. Default: 0.75.
- n.cores
Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1.
- cluster
A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value), and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. Default: NULL.
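The growing-buffer idea behind distance.step.x, distance.step.y, and training.fraction can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_spatial_fold() is a hypothetical helper written for this page, the rectangular buffer and the stopping rule are assumptions, and the toy coordinates are random.

```r
# Hypothetical helper, NOT part of spatialRF: grows a rectangular buffer
# around one focal record until it holds the requested fraction of records.
sketch_spatial_fold <- function(
  xy.i,
  xy,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.75
) {
  stopifnot(training.fraction > 0, training.fraction <= 1)

  # default steps: 1/1000th of the coordinate range, as documented above
  if (is.null(distance.step.x)) {
    distance.step.x <- diff(range(xy$x)) / 1000
  }
  if (is.null(distance.step.y)) {
    distance.step.y <- diff(range(xy$y)) / 1000
  }

  # number of records the buffer must contain (assumed stopping rule)
  n.training <- floor(nrow(xy) * training.fraction)

  # the buffer starts as a point on the focal record
  x.min <- x.max <- xy.i$x
  y.min <- y.max <- xy.i$y
  inside <- xy$x >= x.min & xy$x <= x.max & xy$y >= y.min & xy$y <= y.max

  # grow the buffer by one step in every direction per iteration
  while (sum(inside) < n.training) {
    x.min <- x.min - distance.step.x
    x.max <- x.max + distance.step.x
    y.min <- y.min - distance.step.y
    y.max <- y.max + distance.step.y
    inside <- xy$x >= x.min & xy$x <= x.max & xy$y >= y.min & xy$y <= y.max
  }

  # records inside the buffer train the model, the rest test it
  list(
    training = xy$id[inside],
    testing = xy$id[!inside]
  )
}

# toy data: 100 random records, fold centered on the first one
set.seed(1)
xy.toy <- data.frame(id = 1:100, x = runif(100), y = runif(100))
fold <- sketch_spatial_fold(
  xy.i = xy.toy[1, ],
  xy = xy.toy,
  training.fraction = 0.6
)
```

Smaller steps track training.fraction more closely at the cost of more iterations, because each step can only add whole records to the buffer.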
Value
A list with as many slots as there are rows in xy.selected. Each slot is itself a list with two elements named training and testing: the former holds the indices of the training records selected from xy, and the latter holds the indices of the testing records.
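A short base R sketch of how a result with this structure might be inspected. The object out.mock and its values are made up for illustration and only mimic the documented shape (one slot per fold, each with training and testing).

```r
# mock object mimicking the documented structure (hypothetical values)
out.mock <- list(
  list(training = c(1, 2, 3, 4), testing = c(5, 6)),
  list(training = c(3, 4, 5, 6), testing = c(1, 2))
)

# share of records assigned to training in each fold
training.share <- vapply(
  out.mock,
  function(fold) {
    length(fold$training) / (length(fold$training) + length(fold$testing))
  },
  numeric(1)
)

# TRUE for every fold in which no record is both training and testing
no.overlap <- vapply(
  out.mock,
  function(fold) length(intersect(fold$training, fold$testing)) == 0,
  logical(1)
)
```

Comparing training.share against the requested training.fraction is a quick way to check whether the distance steps were fine enough.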
Examples
#loading example data
data(plant_richness_df)
#getting case coordinates
xy <- plant_richness_df[, 1:3]
colnames(xy) <- c("id", "x", "y")
#thinning until 20 cases remain
xy.selected <- thinning_til_n(
  xy = xy,
  n = 20
)
#making spatial folds centered on these 20 cases
out <- make_spatial_folds(
  xy.selected = xy.selected,
  xy = xy,
  distance.step.x = 0.05, #degrees
  training.fraction = 0.6,
  n.cores = 1
)
if(interactive()){
  #plotting training and testing folds
  plot(xy[, c("x", "y")], type = "n", xlab = "", ylab = "")
  #plots training points
  points(xy[out[[10]]$training, c("x", "y")], col = "red4", pch = 15)
  #plots testing points
  points(xy[out[[10]]$testing, c("x", "y")], col = "blue4", pch = 15)
  #plots the focal record of fold 10 (row 10 of xy.selected)
  points(xy.selected[10, c("x", "y")], col = "black", pch = 15, cex = 2)
}
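#A sketch of the cluster argument, assuming spatialRF is installed (the
#block is skipped otherwise). The two workers are an arbitrary choice.
#Because the function does not stop a provided cluster, the cluster is
#stopped here even if the call fails.

```r
if (requireNamespace("spatialRF", quietly = TRUE)) {

  library(spatialRF)

  #same data preparation as in the example above
  data(plant_richness_df)
  xy <- plant_richness_df[, 1:3]
  colnames(xy) <- c("id", "x", "y")
  xy.selected <- thinning_til_n(
    xy = xy,
    n = 20
  )

  #user-managed socket cluster
  cluster <- parallel::makeCluster(2, type = "PSOCK")

  #the finally clause stops the cluster whether or not the call succeeds
  out <- tryCatch(
    make_spatial_folds(
      xy.selected = xy.selected,
      xy = xy,
      distance.step.x = 0.05, #degrees
      training.fraction = 0.6,
      cluster = cluster
    ),
    finally = parallel::stopCluster(cluster)
  )

}
```

#Reusing one cluster across several calls avoids paying the start-up
#cost of the workers each time; in that case, stop it after the last call.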