European oak distribution: multi-species niche modelling with recursive partition trees
Source:vignettes/articles/quercus.Rmd
quercus.RmdOverview
The quercus dataset contains presence and absence
records of eight European Quercus (oak) species combined with 31
environmental predictor variables covering bioclimatic, topographic,
vegetation, and human impact dimensions.
Its main features are:
- Complete case study - No missing values, clean data ready for analysis.
- Multi-class classification - Eight oak species plus absence points (9 levels).
- European scope - Records spanning western, central, and southern Europe.
- Ecological predictors - Climate, topography, vegetation indices, land cover, and human footprint.
We built this dataset to support a range of analytical approaches, such as binary and multi-class classification, niche modelling, and niche overlap analysis.
Setup
The following R libraries are required to run this tutorial:
The code below loads the example data.
data(
quercus,
quercus_response,
quercus_predictors,
package = "spatialData"
)Data Structure
The dataset is an sf data frame with 6728 rows and 33
columns, and no missing data. The first 10 records and all columns but
geometry are shown below.
quercus |>
sf::st_drop_geometry() |>
head() |>
dplyr::glimpse()
#> Rows: 6
#> Columns: 32
#> $ species <chr> "Quercus robur", "Quercus robur", "Quercus petra…
#> $ bio1 <int> 75, 75, 110, 58, 82, 157
#> $ bio10 <int> 163, 149, 179, 158, 164, 250
#> $ bio11 <int> -20, 5, 41, -36, 2, 73
#> $ bio12 <int> 740, 818, 1006, 587, 582, 510
#> $ bio13 <int> 106, 96, 105, 73, 67, 69
#> $ bio14 <int> 34, 43, 61, 28, 30, 4
#> $ bio15 <int> 40, 27, 13, 33, 23, 51
#> $ bio16 <int> 294, 279, 296, 210, 194, 178
#> $ bio17 <int> 107, 138, 212, 92, 103, 29
#> $ bio18 <int> 294, 205, 212, 160, 194, 29
#> $ bio19 <int> 115, 162, 285, 118, 125, 172
#> $ bio2 <int> 90, 65, 103, 64, 68, 124
#> $ bio3 <int> 31, 28, 41, 22, 27, 37
#> $ bio4 <int> 7116, 5759, 5368, 7617, 6366, 6904
#> $ bio5 <int> 227, 193, 249, 206, 213, 353
#> $ bio6 <int> -62, -35, 0, -74, -31, 22
#> $ bio7 <int> 289, 228, 249, 280, 244, 331
#> $ topographic_diversity <int> 93, 12, 43, 8, 11, 55
#> $ human_footprint <dbl> 42.33, 52.65, 32.92, 26.68, 40.08, 36.03
#> $ landcover_veg_bare <dbl> 0.36, 0.25, 0.00, 0.08, 0.30, 2.40
#> $ landcover_veg_herb <dbl> 58.85, 91.73, 71.67, 44.06, 53.94, 87.12
#> $ landcover_veg_tree <dbl> 40.69, 8.02, 28.25, 55.78, 35.20, 10.47
#> $ ndvi_average <dbl> 0.63, 0.63, 0.73, 0.50, 0.38, 0.48
#> $ ndvi_maximum <dbl> 0.79, 0.77, 0.80, 0.83, 0.53, 0.70
#> $ ndvi_minimum <dbl> 0.44, 0.45, 0.63, 0.06, 0.22, 0.27
#> $ ndvi_range <dbl> 0.36, 0.33, 0.17, 0.77, 0.31, 0.42
#> $ sun_rad_average <dbl> 4164.84, 3552.85, 4480.35, 3376.14, 3787.15, 511…
#> $ sun_rad_maximum <dbl> 7515.76, 7189.52, 7601.16, 7099.44, 7301.63, 776…
#> $ sun_rad_minimum <dbl> 897.09, 356.39, 1271.03, 230.87, 552.49, 2118.26
#> $ sun_rad_range <dbl> 6618.67, 6833.12, 6330.14, 6868.57, 6749.14, 564…
#> $ topo_slope <dbl> 3.10, 0.39, 1.25, 0.28, 0.43, 1.88Response Variable
The response variable is species, a categorical variable
with 9 levels (8 species + absence):
quercus |>
sf::st_drop_geometry() |>
dplyr::group_by(species) |>
dplyr::summarise(
n = dplyr::n()
) |>
dplyr::arrange(
dplyr::desc(n)
)
#> # A tibble: 9 × 2
#> species n
#> <chr> <int>
#> 1 absence 1899
#> 2 Quercus robur 1660
#> 3 Quercus petraea 1445
#> 4 Quercus ilex 627
#> 5 Quercus cerris 394
#> 6 Quercus faginea 287
#> 7 Quercus pubescens 225
#> 8 Quercus pyrenaica 133
#> 9 Quercus suber 58Predictor Variables
The dataset includes 31 predictor variables organized into six categories:
| Category | Variables | Source |
|---|---|---|
| Bioclimatic (17) |
bio1, bio2, …, bio19
(excluding bio8, bio9) |
WorldClim |
| NDVI (4) |
ndvi_average, ndvi_maximum,
ndvi_minimum, ndvi_range
|
MODIS |
| Solar radiation (4) |
sun_rad_average, sun_rad_maximum,
sun_rad_minimum, sun_rad_range
|
WorldClim |
| Land cover (3) |
landcover_veg_bare, landcover_veg_herb,
landcover_veg_tree
|
MODIS MOD44B |
| Topographic (2) |
topo_slope, topographic_diversity
|
SRTM |
| Human impact (1) | human_footprint |
Venter et al. 2016 |
Interactive Map
The map below shows the spatial distribution of oak species and absence points across Europe:
mapview::mapview(
quercus,
zcol = "species",
layer.name = "Species"
)