Overview
The quercus dataset contains presence and absence
records of eight European Quercus (oak) species combined with
31 environmental predictor variables covering bioclimatic, topographic,
vegetation, and human impact dimensions.
Its main features are:
- Complete case study - No missing values, clean data ready for analysis.
- Multi-class classification - Eight oak species plus absence points (9 levels).
- European scope - Records spanning western, central, and southern Europe.
- Ecological predictors - Climate, topography, vegetation indices, land cover, and human footprint.
This dataset was built to support a range of analytical approaches, such as binary and multi-class classification, niche modelling, and niche overlap analysis.
Setup
The following R packages are required to run this tutorial:
library(dplyr)
library(sf)
library(mapview)
library(terra)
library(rpart)
library(rpart.plot)
library(collinear)
library(spatialData)
data(
quercus,
quercus_response,
quercus_predictors,
package = "spatialData"
)
# convert `species` to a factor
quercus <- dplyr::mutate(
quercus,
species = as.factor(species)
)
# one color per species level (Okabe-Ito palette, reversed, 60% opacity)
quercus_colors <- grDevices::palette.colors(
n = length(levels(quercus$species)),
palette = "Okabe-Ito",
alpha = 0.6
) |>
rev() |>
stats::setNames(levels(quercus$species))
# 100-color "Zissou 1" ramp
env_colors <- grDevices::hcl.colors(
n = 100,
palette = "Zissou 1"
)
Data Structure
The dataset is an sf POINT data frame in EPSG:4326 with
6728 rows and 33 columns, and no missing data. The first six records and
all columns except geometry are shown below.
quercus |>
sf::st_drop_geometry() |>
head() |>
dplyr::glimpse()
#> Rows: 6
#> Columns: 32
#> $ species <fct> Quercus robur, Quercus robur, Quercus petraea, Q…
#> $ bio1 <int> 75, 75, 110, 58, 82, 157
#> $ bio10 <int> 163, 149, 179, 158, 164, 250
#> $ bio11 <int> -20, 5, 41, -36, 2, 73
#> $ bio12 <int> 740, 818, 1006, 587, 582, 510
#> $ bio13 <int> 106, 96, 105, 73, 67, 69
#> $ bio14 <int> 34, 43, 61, 28, 30, 4
#> $ bio15 <int> 40, 27, 13, 33, 23, 51
#> $ bio16 <int> 294, 279, 296, 210, 194, 178
#> $ bio17 <int> 107, 138, 212, 92, 103, 29
#> $ bio18 <int> 294, 205, 212, 160, 194, 29
#> $ bio19 <int> 115, 162, 285, 118, 125, 172
#> $ bio2 <int> 90, 65, 103, 64, 68, 124
#> $ bio3 <int> 31, 28, 41, 22, 27, 37
#> $ bio4 <int> 7116, 5759, 5368, 7617, 6366, 6904
#> $ bio5 <int> 227, 193, 249, 206, 213, 353
#> $ bio6 <int> -62, -35, 0, -74, -31, 22
#> $ bio7 <int> 289, 228, 249, 280, 244, 331
#> $ topographic_diversity <int> 93, 12, 43, 8, 11, 55
#> $ human_footprint <dbl> 42.33, 52.65, 32.92, 26.68, 40.08, 36.03
#> $ landcover_veg_bare <dbl> 0.36, 0.25, 0.00, 0.08, 0.30, 2.40
#> $ landcover_veg_herb <dbl> 58.85, 91.73, 71.67, 44.06, 53.94, 87.12
#> $ landcover_veg_tree <dbl> 40.69, 8.02, 28.25, 55.78, 35.20, 10.47
#> $ ndvi_average <dbl> 0.63, 0.63, 0.73, 0.50, 0.38, 0.48
#> $ ndvi_maximum <dbl> 0.79, 0.77, 0.80, 0.83, 0.53, 0.70
#> $ ndvi_minimum <dbl> 0.44, 0.45, 0.63, 0.06, 0.22, 0.27
#> $ ndvi_range <dbl> 0.36, 0.33, 0.17, 0.77, 0.31, 0.42
#> $ sun_rad_average <dbl> 4164.84, 3552.85, 4480.35, 3376.14, 3787.15, 511…
#> $ sun_rad_maximum <dbl> 7515.76, 7189.52, 7601.16, 7099.44, 7301.63, 776…
#> $ sun_rad_minimum <dbl> 897.09, 356.39, 1271.03, 230.87, 552.49, 2118.26
#> $ sun_rad_range <dbl> 6618.67, 6833.12, 6330.14, 6868.57, 6749.14, 564…
#> $ topo_slope <dbl> 3.10, 0.39, 1.25, 0.28, 0.43, 1.88
The response variable is species, a categorical variable
with 9 levels (8 species + absence):
quercus |>
sf::st_drop_geometry() |>
dplyr::group_by(species) |>
dplyr::summarise(
n = dplyr::n()
) |>
dplyr::arrange(
dplyr::desc(n)
)
#> # A tibble: 9 × 2
#> species n
#> <fct> <int>
#> 1 absence 1899
#> 2 Quercus robur 1660
#> 3 Quercus petraea 1445
#> 4 Quercus ilex 627
#> 5 Quercus cerris 394
#> 6 Quercus faginea 287
#> 7 Quercus pubescens 225
#> 8 Quercus pyrenaica 133
#> 9 Quercus suber 58
The dataset includes 31 predictor variables organized into six categories:
| Category | Variables | Source |
|---|---|---|
| Bioclimatic (17) | bio1, bio2, …, bio19 (excluding bio8, bio9) | WorldClim |
| NDVI (4) | ndvi_average, ndvi_maximum, ndvi_minimum, ndvi_range | MODIS |
| Solar radiation (4) | sun_rad_average, sun_rad_maximum, sun_rad_minimum, sun_rad_range | WorldClim |
| Land cover (3) | landcover_veg_bare, landcover_veg_herb, landcover_veg_tree | MODIS MOD44B |
| Topographic (2) | topo_slope, topographic_diversity | SRTM |
| Human impact (1) | human_footprint | Venter et al. 2016 |
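The categories in the table can also be written down as a named list, which is convenient for selecting predictor subsets later on. This is a minimal sketch: `predictor_groups` and `group_sizes` are illustrative names, not objects shipped with spatialData, and the variable names are taken from the table and the column listing above.

```r
# predictor names grouped by the categories in the table
# (`predictor_groups` is an illustrative name, not a spatialData object)
predictor_groups <- list(
  bioclimatic     = paste0("bio", setdiff(1:19, c(8, 9))),
  ndvi            = paste0("ndvi_", c("average", "maximum", "minimum", "range")),
  solar_radiation = paste0("sun_rad_", c("average", "maximum", "minimum", "range")),
  land_cover      = paste0("landcover_veg_", c("bare", "herb", "tree")),
  topographic     = c("topo_slope", "topographic_diversity"),
  human_impact    = "human_footprint"
)

# number of predictors per category, and in total
group_sizes <- lengths(predictor_groups)
group_sizes
sum(group_sizes)

# with the dataset loaded, list any grouped names missing from its columns
if (exists("quercus")) {
  setdiff(unlist(predictor_groups), colnames(quercus))
}
```

If the list matches the dataset, the total is 31 and the final check returns an empty character vector. A single group can then be pulled out with, for example, `predictor_groups$ndvi`.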
The map below shows the spatial distribution of oak species and absence points across Europe:
mapview::mapview(
quercus,
zcol = "species",
layer.name = "Species",
col.regions = quercus_colors,
color = NULL
)
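The class counts reported earlier are strongly unbalanced, which matters for the classification tasks this dataset is meant to support. The sketch below re-enters those counts by hand and computes class shares, the presence/absence split a binary model would see, and inverse-frequency class weights. The names `species_counts`, `class_share`, `presence_absence`, and `class_weights` are illustrative, and inverse-frequency weighting is only one common option, not something the dataset prescribes.

```r
# class counts copied from the species summary shown earlier
species_counts <- c(
  "absence"           = 1899,
  "Quercus robur"     = 1660,
  "Quercus petraea"   = 1445,
  "Quercus ilex"      = 627,
  "Quercus cerris"    = 394,
  "Quercus faginea"   = 287,
  "Quercus pubescens" = 225,
  "Quercus pyrenaica" = 133,
  "Quercus suber"     = 58
)

# share of each class in the full dataset
class_share <- species_counts / sum(species_counts)
round(class_share, 3)

# presence/absence split for a binary response
presence_absence <- c(
  presence = sum(species_counts[names(species_counts) != "absence"]),
  absence  = species_counts[["absence"]]
)
presence_absence

# inverse-frequency weights (an assumption, one option among several),
# scaled so that per-record weights sum to the total number of records
class_weights <- sum(species_counts) / (length(species_counts) * species_counts)
round(class_weights, 2)
```

With the dataset loaded, `class_weights[as.character(quercus$species)]` would give one weight per record, which could then be passed to a model's case-weight argument, such as `weights` in `rpart::rpart()`.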