Selects spatial predictors following these steps:

  1. Gets the spatial predictors ranked by rank_spatial_predictors() and fits a model of the form y ~ predictors + best_spatial_predictor_1. The Moran's I of the residuals of this model is used as reference value for the next step.

  2. The remaining spatial predictors are introduced again into rank_spatial_predictors(), and the spatial predictor with the highest ranking is introduced in a new model of the form y ~ predictors + best_spatial_predictor_1 + best_spatial_predictor_2.

  3. Steps 1 and 2 are repeated until the Moran's I does not improve for a number of iterations equal to 20 percent of the total number of spatial predictors passed to the function.
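
The stopping rule in step 3 can be sketched in base R as follows. This is only an illustration: the helper stop_iteration() is hypothetical and not part of the package, and it assumes that "improve" means a lower Moran's I than the best value seen so far, that the non-improving iterations are counted consecutively, and that the 20 percent is rounded down with a minimum of 1.

```r
# Hypothetical helper (not a package function): returns the iteration at
# which the search would stop under the assumptions stated above.
# moran.i: Moran's I of the residuals after each added spatial predictor.
# n.predictors: total number of candidate spatial predictors.
stop_iteration <- function(moran.i, n.predictors) {
  # Assumed rounding of the 20 percent rule
  patience <- max(1, floor(0.2 * n.predictors))
  best <- Inf
  since.improvement <- 0
  for (i in seq_along(moran.i)) {
    if (moran.i[i] < best) {
      # New best value: reset the counter
      best <- moran.i[i]
      since.improvement <- 0
    } else {
      since.improvement <- since.improvement + 1
    }
    if (since.improvement >= patience) {
      return(i)
    }
  }
  # The rule was never triggered: all iterations were used
  length(moran.i)
}

# Made-up trace of Moran's I values, for illustration only
moran.trace <- c(0.30, 0.22, 0.15, 0.16, 0.17, 0.14, 0.15, 0.15)
stop_iteration(moran.trace, n.predictors = 10)
```

With 10 candidate predictors the assumed patience is 2 iterations, so the search on this made-up trace stops at the second consecutive iteration without improvement.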

This method selects the smallest set of spatial predictors that has the largest joint effect in reducing the spatial correlation of the model residuals, while keeping the model's R-squared as high as possible. Because rank_spatial_predictors() is run on each iteration, the final model includes fewer spatial predictors than the sequential method implemented in select_spatial_predictors_sequential() would, while still minimizing the spatial correlation of the residuals and maximizing the R-squared of the model.

select_spatial_predictors_recursive(
  data = NULL,
  dependent.variable.name = NULL,
  predictor.variable.names = NULL,
  distance.matrix = NULL,
  distance.thresholds = NULL,
  ranger.arguments = NULL,
  spatial.predictors.df = NULL,
  spatial.predictors.ranking = NULL,
  weight.r.squared = 0.25,
  weight.penalization.n.predictors = 0,
  n.cores = parallel::detectCores() - 1,
  cluster = NULL
)

Arguments

data

Data frame with a response variable and a set of predictors. Default: NULL

dependent.variable.name

Character string with the name of the response variable. Must be in the column names of data. Default: NULL

predictor.variable.names

Character vector with the names of the predictive variables. Every element of this vector must be in the column names of data. Default: NULL

distance.matrix

Square matrix with the distances among the records in data. The number of rows of distance.matrix and data must be the same. If not provided, the computation of the Moran's I of the residuals is omitted. Default: NULL

distance.thresholds

Numeric vector with neighborhood distances. All distances in the distance matrix below each value in distance.thresholds are set to 0 for the computation of Moran's I. If NULL, it defaults to seq(0, max(distance.matrix), length.out = 4). Default: NULL
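
As a small illustration of the default described above, the following base R sketch builds a made-up distance matrix, computes the default thresholds, and applies one of them. The matrix values and the object names threshold and thresholded are assumptions for this example; the package's internal handling may differ in detail.

```r
# Toy symmetric distance matrix among four records (made-up values)
distance.matrix <- matrix(
  c( 0, 10, 20, 40,
    10,  0, 15, 30,
    20, 15,  0, 25,
    40, 30, 25,  0),
  nrow = 4,
  byrow = TRUE
)

# Thresholds used when distance.thresholds = NULL
distance.thresholds <- seq(0, max(distance.matrix), length.out = 4)
distance.thresholds

# Distances below a given threshold are set to 0, as described above
threshold <- distance.thresholds[2]
thresholded <- distance.matrix
thresholded[thresholded < threshold] <- 0
thresholded
```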

ranger.arguments

Named list with ranger arguments (other arguments of this function can also go here). All ranger arguments are set to their default values except for 'importance', which is set to 'permutation' rather than 'none'. Consult the help file of ranger if you are not familiar with its arguments. Default: NULL

spatial.predictors.df

Data frame of spatial predictors. Default: NULL

spatial.predictors.ranking

Ranking of predictors returned by rank_spatial_predictors(). Default: NULL

weight.r.squared

Numeric between 0 and 1, weight of R-squared in the optimization index. Default: 0.25

weight.penalization.n.predictors

Numeric between 0 and 1, weight of the penalization for the number of spatial predictors added in the optimization index. Default: 0

n.cores

Integer, number of cores to use. Default: parallel::detectCores() - 1

cluster

A cluster definition generated by parallel::makeCluster(). Default: NULL

Value

A list with two slots: optimization, a data frame with the index of the spatial predictor added on each iteration, the spatial correlation of the model residuals, and the R-squared of the model; and best.spatial.predictors, a character vector with the names of the spatial predictors that minimize the Moran's I of the residuals and maximize the R-squared of the model.

Details

The algorithm works as follows. If the function rank_spatial_predictors() returns 10 ranked spatial predictors (sp1 to sp10, with sp7 being the best one), select_spatial_predictors_recursive() first fits the model y ~ predictors + sp7. Then, the remaining spatial predictors (sp1 to sp10, except sp7) are ranked again with rank_spatial_predictors() using the model y ~ predictors + sp7 as reference (at this stage, some of the spatial predictors might be dropped due to lack of effect). When the new ranking of spatial predictors is ready (let's say they are sp5, sp3, and sp4), the best one (sp5) is included in the model y ~ predictors + sp7 + sp5, and the remaining ones go again to rank_spatial_predictors() to repeat the process until the spatial predictors are depleted.
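
The re-ranking loop described above can be sketched with base R alone. This is a toy illustration under several assumptions, not the package implementation: lm() stands in for the random forest model, the helper moran_i() is a hypothetical inverse-distance Moran's I, the candidate spatial predictors are made-up sine and cosine waves rather than the output of mem_multithreshold(), candidates are ranked only by the Moran's I of the residuals, and no candidate is dropped for lack of effect.

```r
set.seed(1)

# Toy data: 30 records on a line, one predictor, spatially structured response
n <- 30
xy <- seq_len(n)
x <- rnorm(n)
y <- x + sin(xy / 4) + rnorm(n, sd = 0.2)
distance.matrix <- as.matrix(dist(xy))

# Hypothetical helper: Moran's I with inverse-distance weights
moran_i <- function(z, d) {
  w <- 1 / d
  diag(w) <- 0
  z <- z - mean(z)
  (length(z) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Made-up candidate spatial predictors (stand-ins for real ones)
candidates <- data.frame(
  sp1 = sin(xy / 2), sp2 = cos(xy / 2),
  sp3 = sin(xy / 4), sp4 = cos(xy / 4),
  sp5 = sin(xy / 8), sp6 = cos(xy / 8)
)

selected <- character(0)
remaining <- colnames(candidates)
moran.trace <- numeric(0)

while (length(remaining) > 0) {
  # Re-rank every remaining candidate against the current model
  scores <- sapply(remaining, function(sp) {
    df <- data.frame(y = y, x = x, candidates[, c(selected, sp), drop = FALSE])
    moran_i(residuals(lm(y ~ ., data = df)), distance.matrix)
  })
  # The candidate yielding the lowest Moran's I joins the model
  best <- names(which.min(scores))
  selected <- c(selected, best)
  remaining <- setdiff(remaining, best)
  moran.trace <- c(moran.trace, min(scores))
}

# Order in which candidates entered the model and the resulting Moran's I
data.frame(spatial.predictor = selected, moran.i = moran.trace)
```

Because the ranking is recomputed after every addition, the order of entry can differ from the order of the initial ranking, which is the property that distinguishes the recursive method from the sequential one.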

Examples

if(interactive()){

#loading example data
data(distance_matrix)
data(plant_richness_df)

#response and predictor names
dependent.variable.name <- "richness_species_vascular"
predictor.variable.names <- colnames(plant_richness_df)[5:21]

#non-spatial model
model <- rf(
  data = plant_richness_df,
  dependent.variable.name = dependent.variable.name,
  predictor.variable.names = predictor.variable.names,
  distance.matrix = distance_matrix,
  distance.thresholds = 0,
  n.cores = 1
)

#preparing spatial predictors
spatial.predictors <- mem_multithreshold(
  distance.matrix = distance_matrix,
  distance.thresholds = 0
)

#ranking spatial predictors
spatial.predictors.ranking <- rank_spatial_predictors(
  data = plant_richness_df,
  dependent.variable.name = dependent.variable.name,
  predictor.variable.names = predictor.variable.names,
  spatial.predictors.df = spatial.predictors,
  ranking.method = "moran",
  reference.moran.i = model$spatial.correlation.residuals$max.moran,
  distance.matrix = distance_matrix,
  distance.thresholds = 0,
  n.cores = 1
)

#selecting the best subset of predictors
selection <- select_spatial_predictors_recursive(
  data = plant_richness_df,
  dependent.variable.name = dependent.variable.name,
  predictor.variable.names = predictor.variable.names,
  distance.matrix = distance_matrix,
  distance.thresholds = 0,
  spatial.predictors.df = spatial.predictors,
  spatial.predictors.ranking = spatial.predictors.ranking,
  n.cores = 1
)

selection$optimization
selection$best.spatial.predictors
plot_optimization(selection$optimization)

}