Applies make_spatial_fold() to every record in the data frame xy.selected, generating as many spatially independent folds over the dataset xy as there are rows in xy.selected.

make_spatial_folds(
  data = NULL,
  dependent.variable.name = NULL,
  xy.selected = NULL,
  xy = NULL,
  distance.step.x = NULL,
  distance.step.y = NULL,
  training.fraction = 0.75,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)

Arguments

data

Data frame with a response variable and a set of predictors. Default: NULL

dependent.variable.name

Character string with the name of the response variable. Must be in the column names of data. Default: NULL

xy.selected

Data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, id of the record). Usually a subset of xy, typically the result of applying thinning() or thinning_til_n() to xy. Default: NULL

xy

Data frame with at least three columns: "x" (longitude), "y" (latitude), and "id" (integer, index of the record). Default: NULL

distance.step.x

Numeric, distance step used during the growth in the x axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the x coordinates).

distance.step.y

Numeric, distance step used during the growth in the y axis of the buffers defining the training folds. Default: NULL (1/1000th the range of the y coordinates).
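The default steps described above can be reproduced by hand. This is a minimal sketch in base R, assuming a toy xy data frame invented for illustration; it only shows the arithmetic behind "1/1000th of the range" and is not taken from the package source.

```r
# Toy coordinates (hypothetical data, for illustration only)
xy <- data.frame(
  id = 1:5,
  x = c(-10, -5, 0, 5, 10),
  y = c(40, 42, 44, 46, 50)
)

# 1/1000th of the coordinate range along each axis
distance.step.x <- diff(range(xy$x)) / 1000
distance.step.y <- diff(range(xy$y)) / 1000

distance.step.x
distance.step.y
```

Larger steps make the buffers grow faster at the cost of a coarser match to the requested training.fraction.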

training.fraction

Numeric, fraction of the data to be included in the growing buffer as training data. Default: 0.75
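To make the roles of the distance steps and training.fraction concrete, here is a rough sketch of a growing rectangular buffer. The function grow_buffer_sketch() is hypothetical and is not the package's implementation; it assumes the buffer is a rectangle centered on a focal record that expands by one step per axis until it contains the requested fraction of records.

```r
# Hypothetical helper, NOT the package's implementation: grows a rectangular
# buffer around a focal record until it holds at least
# floor(nrow(xy) * training.fraction) records.
grow_buffer_sketch <- function(xy, focal.id, step.x, step.y,
                               training.fraction = 0.75) {
  focal <- xy[xy$id == focal.id, ]
  target <- floor(nrow(xy) * training.fraction)
  half.x <- 0
  half.y <- 0
  repeat {
    # records currently inside the buffer
    inside <- abs(xy$x - focal$x) <= half.x & abs(xy$y - focal$y) <= half.y
    if (sum(inside) >= target) break
    # grow the buffer by one step along each axis
    half.x <- half.x + step.x
    half.y <- half.y + step.y
  }
  list(training = xy$id[inside], testing = xy$id[!inside])
}

# Simulated coordinates (assumption: uniform points in the unit square)
set.seed(1)
xy <- data.frame(id = 1:100, x = runif(100), y = runif(100))

fold <- grow_buffer_sketch(
  xy = xy,
  focal.id = 1,
  step.x = 0.001,
  step.y = 0.001,
  training.fraction = 0.6
)

length(fold$training)
length(fold$testing)
```

Records inside the final buffer become training data and everything outside becomes testing data, so the two sets are spatially separated by construction.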

n.cores

Integer, number of cores to use for parallel execution. Creates a socket cluster with parallel::makeCluster(), runs operations in parallel with foreach and %dopar%, and stops the cluster with parallel::stopCluster() when the job is done. Default: parallel::detectCores() - 1

cluster

A cluster definition generated with parallel::makeCluster(). If provided, overrides n.cores. When cluster = NULL (default value), and model is provided, the cluster in model, if any, is used instead. If this cluster is NULL, then the function uses n.cores instead. The function does not stop a provided cluster, so it should be stopped with parallel::stopCluster() afterwards. The cluster definition is stored in the output list under the name "cluster" so it can be passed to other functions via the model argument, or using the %>% pipe. Default: NULL
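The life cycle of a user-provided cluster can be sketched as follows. The workload here is a stand-in (parallel::parLapply() squaring a few numbers) so the snippet runs without the package; in practice the cluster object would be passed through the cluster argument instead. The cluster size of 2 is an arbitrary choice for illustration.

```r
library(parallel)

# Create the cluster once (2 workers is an arbitrary choice)
cl <- makeCluster(2)

# Stand-in workload; in real use, pass the cluster to the function instead,
# e.g. make_spatial_folds(..., cluster = cl)
res <- parLapply(cl, 1:4, function(i) i^2)

# A provided cluster is not stopped by the function, so stop it manually
stopCluster(cl)

unlist(res)
```

Creating the cluster once and reusing it across several calls avoids paying the worker start-up cost repeatedly.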

Value

A list with as many elements as there are rows in xy.selected. Each element is a list with two slots named training and testing: the former holds the indices of the training records selected from xy, and the latter holds the indices of the testing records.
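The returned structure can be inspected with base R. The sketch below uses a small mock list with the documented shape (the values are invented) to count training and testing records per fold and to check that the two sets do not overlap.

```r
# Mock output with the documented structure (hypothetical values)
folds <- list(
  list(training = c(1, 2, 3, 4), testing = c(5, 6)),
  list(training = c(3, 4, 5, 6), testing = c(1, 2))
)

# Number of training and testing records in each fold
fold.sizes <- data.frame(
  fold = seq_along(folds),
  n.training = vapply(folds, function(f) length(f$training), integer(1)),
  n.testing = vapply(folds, function(f) length(f$testing), integer(1))
)

# TRUE for folds whose training and testing sets share no record
no.overlap <- vapply(
  folds,
  function(f) length(intersect(f$training, f$testing)) == 0,
  logical(1)
)

fold.sizes
no.overlap
```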

Examples

if(interactive()){

  #loading example data
  data(plant_richness_df)

  #getting case coordinates
  xy <- plant_richness_df[, 1:3]
  colnames(xy) <- c("id", "x", "y")

  #thinning until 20 cases remain
  xy.selected <- thinning_til_n(
    xy = xy,
    n = 20
  )

  #making spatial folds centered on these 20 cases
  out <- make_spatial_folds(
    xy.selected = xy.selected,
    xy = xy,
    distance.step.x = 0.05, #degrees
    distance.step.y = 0.05, #degrees
    training.fraction = 0.6,
    n.cores = 1
  )

  #plotting training and testing folds
  plot(xy[ c("x", "y")], type = "n", xlab = "", ylab = "")

  #plots training points of fold 10
  points(xy[out[[10]]$training, c("x", "y")], col = "red4", pch = 15)

  #plots testing points of fold 10
  points(xy[out[[10]]$testing, c("x", "y")], col = "blue4", pch = 15)

  #plots the focal record of fold 10 (row 10 of xy.selected)
  points(xy.selected[10, c("x", "y")], col = "black", pch = 15, cex = 2)

}