Applies make_spatial_fold() to every record in a data frame xy.selected to generate as many spatially independent folds over the dataset xy as rows are in xy.selected.

make_spatial_folds(
  data = NULL,
  dependent.variable.name = NULL,
  xy.selected = NULL,
  xy = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.75,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)

Arguments

data

Data frame with a response variable and a set of predictors. Default: NULL

dependent.variable.name

Character string with the name of the response variable. Must be in the column names of data. Default: NULL

xy.selected

Data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, id of the record). Usually a subset of xy. Usually the result of applying thinning() or thinning_til_n() to 'xy' Default: NULL.

xy

data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, index of the record). Default: NULL.

distance.step.x

Numeric, distance step used during the growth in the x axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the x coordinates).

distance.step.y

Numeric, distance step used during the growth in the y axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the y coordinates).

training.fraction

numeric, fraction of the data to be included in the growing buffer as training data, Default: 0.75

n.cores

Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::clusterStop() when the job is done. Default: parallel::detectCores() - 1

cluster

A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value), and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument, or using the %>% pipe. Default: NULL

Value

A list with as many slots as rows are in xy.selected. Each slot has two slots named training and testing, with the former having the indices of the training records selected from xy, and the latter having the indices of the testing records.

Examples

if(interactive()){

 #loading example data
 data(plant_richness_df)

 #getting case coordinates
 xy <- plant_richness_df[, 1:3]
 colnames(xy) <- c("id", "x", "y")

 #thining til 20 cases
 xy.selected <- thinning_til_n(
   xy = xy,
   n = 20
   )

 #making spatial folds centered on these 20 cases
 out <- make_spatial_folds(
   xy.selected = xy.selected,
   xy = xy,
   distance.step = 0.05, #degrees
   training.fraction = 0.6,
   n.cores = 1
 )

 #plotting training and testing folds
 plot(xy[ c("x", "y")], type = "n", xlab = "", ylab = "")
 #plots training points
 points(xy[out[[10]]$training, c("x", "y")], col = "red4", pch = 15)
 #plots testing points
 points(xy[out[[10]]$testing, c("x", "y")], col = "blue4", pch = 15)
 #plots xy.i
 points(xy[10, c("x", "y")], col = "black", pch = 15, cex = 2)

}