Global plant diversity: spatial random forests across 662 ecoregions
Source:vignettes/articles/plantae.Rmd
plantae.RmdOverview
Plant species richness is not randomly distributed — it peaks in
tropical regions, tracks climate gradients, and is shaped by
biogeographic history and human impact. Understanding those drivers
requires datasets with both broad geographic scope and many predictors,
which is exactly what plantae provides.
This dataset was created in the context of research on the biogeography of global drylands (Maestre et al. 2021, New Phytologist, 231(2), 540–558, doi:10.1111/nph.17398).
The plantae dataset contains plant diversity metrics for
662 global ecoregions, derived from 244M+ GBIF
plant presence records and a large set of environmental
predictors (see plantae_predictors).
Its main features are:
- Complete case study: No missing values, clean data ready for analysis.
- Global scope: All major biomes and biogeographic realms.
-
Multiple responses: 53 response variables (see
plantae_responses) covering richness, rarity, and beta diversity. -
Large number of predictors: The 84 predictors (see
plantae_predictors) span 12 categories (temperature, precipitation, atmospheric, soil, fragmentation, human impact, and more).
It is designed to support a wide range of ecological modelling approaches — from richness regressions to beta diversity analyses.
Description
The dataset is an sf data frame with 662 rows and 143
columns, and no missing data. The first 10 records and all columns but
geometry are shown below.
plantae |>
head(n = 10) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 143
#> $ ecoregion_id <dbl> 473, 193, 298, 613, 99, 805, …
#> $ ecoregion_name <chr> "Japurá-Solimões-Negro moist …
#> $ ecoregion_biome <chr> "Tropical & Subtropical Moist…
#> $ ecoregion_realm <chr> "Neotropic", "Australasia", "…
#> $ ecoregion_continent <chr> "Americas", "Oceania", "Euras…
#> $ richness_species <int> 4835, 2197, 1336, 4360, 2271,…
#> $ richness_genera <int> 1231, 650, 769, 1452, 795, 14…
#> $ richness_families <int> 203, 132, 180, 234, 164, 201,…
#> $ richness_classes <int> 6, 5, 7, 6, 4, 8, 5, 6, 5, 6
#> $ richness_species_trees <int> 2448, 166, 436, 1364, 729, 48…
#> $ richness_genera_trees <int> 577, 55, 268, 552, 303, 227, …
#> $ richness_families_trees <int> 112, 32, 72, 120, 89, 74, 10,…
#> $ richness_species_grasses <int> 112, 221, 43, 286, 141, 401, …
#> $ richness_genera_grasses <int> 50, 67, 35, 91, 55, 123, 42, …
#> $ rarity_weighted_richness_species <dbl> 426.27763, 269.42868, 97.6067…
#> $ rarity_weighted_richness_genera <dbl> 30.43368, 15.10900, 10.17875,…
#> $ rarity_weighted_richness_species_trees <dbl> 217.008423, 17.151178, 23.930…
#> $ rarity_weighted_richness_genera_trees <dbl> 17.6373842, 1.3127927, 4.0111…
#> $ rarity_weighted_richness_species_grasses <dbl> 5.648815, 22.025660, 3.102633…
#> $ rarity_weighted_richness_genera_grasses <dbl> 0.8883361, 1.0822623, 0.27355…
#> $ mean_rarity_species <dbl> 0.08816497, 0.12263481, 0.073…
#> $ mean_rarity_genera <dbl> 0.024722726, 0.023244614, 0.0…
#> $ mean_rarity_species_trees <dbl> 0.08864723, 0.10332035, 0.054…
#> $ mean_rarity_genera_trees <dbl> 0.030567390, 0.023868958, 0.0…
#> $ mean_rarity_species_grasses <dbl> 0.05043585, 0.09966362, 0.072…
#> $ mean_rarity_genera_grasses <dbl> 0.017766722, 0.016153169, 0.0…
#> $ betadiversity_R_species <int> 9498, 11004, 1997, 19412, 854…
#> $ betadiversity_R_percent_species <dbl> 64.86819, 83.04906, 55.41065,…
#> $ betadiversity_R_genera <int> 826, 1925, 641, 1884, 1005, 4…
#> $ betadiversity_R_percent_genera <dbl> 39.57834, 74.58349, 43.39878,…
#> $ betadiversity_R_families <int> 51, 166, 51, 102, 98, 35, 58,…
#> $ betadiversity_R_percent_families <dbl> 20.07874, 55.70470, 21.61017,…
#> $ betadiversity_R_species_trees <int> 3591, 2389, 467, 4341, 2382, …
#> $ betadiversity_R_percent_species_trees <dbl> 58.49487, 93.46635, 47.84836,…
#> $ betadiversity_R_genera_trees <int> 328, 555, 184, 581, 298, 49, …
#> $ betadiversity_R_percent_genera_trees <dbl> 35.96491, 90.98361, 38.41336,…
#> $ betadiversity_R_families_trees <int> 25, 94, 34, 57, 39, 16, 27, 2…
#> $ betadiversity_R_percent_families_trees <dbl> 18.11594, 74.60317, 32.07547,…
#> $ betadiversity_R_species_grasses <int> 336, 744, 145, 737, 363, 331,…
#> $ betadiversity_R_percent_species_grasses <dbl> 73.84615, 76.85950, 73.60406,…
#> $ betadiversity_R_genera_grasses <int> 48, 133, 55, 99, 88, 38, 47, …
#> $ betadiversity_R_percent_genera_grasses <dbl> 48.48485, 66.50000, 59.78261,…
#> $ betadiversity_sorensen_species <dbl> 0.5277546, 0.7210027, 0.54379…
#> $ betadiversity_sorensen_genera <dbl> 0.2694647, 0.6006202, 0.35566…
#> $ betadiversity_sorensen_families <dbl> 0.11159737, 0.38604651, 0.148…
#> $ betadiversity_sorensen_species_trees <dbl> 0.4470242, 0.8787211, 0.45998…
#> $ betadiversity_sorensen_genera_trees <dbl> 0.2307692, 0.8345865, 0.33055…
#> $ betadiversity_sorensen_families_trees <dbl> 0.10843373, 0.59493671, 0.191…
#> $ betadiversity_sorensen_species_grasses <dbl> 0.6250000, 0.6323777, 0.70562…
#> $ betadiversity_sorensen_genera_grasses <dbl> 0.3378378, 0.4981273, 0.47200…
#> $ betadiversity_simpson_species <dbl> 0.06390900, 0.02230314, 0.202…
#> $ betadiversity_simpson_genera <dbl> 0.024370431, 0.009230769, 0.0…
#> $ betadiversity_simpson_families <dbl> 0.000000000, 0.000000000, 0.0…
#> $ betadiversity_simpson_species_trees <dbl> 0.041275031, 0.006024096, 0.1…
#> $ betadiversity_simpson_genera_trees <dbl> 0.012131716, 0.000000000, 0.1…
#> $ betadiversity_simpson_families_trees <dbl> 0.008928571, 0.000000000, 0.0…
#> $ betadiversity_simpson_species_grasses <dbl> 0.06250000, 0.01357466, 0.209…
#> $ betadiversity_simpson_genera_grasses <dbl> 0.02000000, 0.00000000, 0.057…
#> $ bias_log_records <dbl> 9.881395, 12.170088, 8.815815…
#> $ geo_neighbors_count <int> 10, 3, 6, 13, 3, 4, 8, 4, 5, 4
#> $ geo_neighbors_area_km2 <int> 2005112, 761733, 499488, 7202…
#> $ geo_neighbors_aridity_mean <dbl> -0.9563798, 0.5657846, -0.273…
#> $ geo_area_km2 <int> 268347, 11940, 81922, 24929, …
#> $ geo_polygons_count <int> 2, 7, 1, 43, 1, 1, 1, 6, 1, 1
#> $ geo_perimeter_km <int> 8567, 3563, 1987, 8303, 1623,…
#> $ geo_shared_perimeter_km <int> 8459, 3528, 1987, 4979, 904, …
#> $ geo_shared_perimeter_fraction <dbl> 0.99, 0.99, 1.00, 0.60, 0.56,…
#> $ geo_distance_to_ocean <int> 9367, 1174, 1409, 85, 358, 50…
#> $ geo_elevation_mean <dbl> 86.016377, 1300.268070, 477.2…
#> $ human_population <int> 1897238, 2450, 46472620, 1135…
#> $ human_population_density <dbl> 7.0675297, 0.1987217, 567.254…
#> $ human_footprint_mean <dbl> 0.4305056, 2.5179189, 19.3168…
#> $ climate_velocity_lgm_mean <dbl> 7.6806232, 0.8897423, 2.43905…
#> $ climate_hypervolume <dbl> 0.022396502, 0.012212734, 0.0…
#> $ landcover_bare_percent_mean <dbl> 0.10334276, 0.28396785, 6.818…
#> $ landcover_herbs_percent_mean <dbl> 16.22373, 28.57396, 81.62750,…
#> $ landcover_trees_percent_mean <dbl> 80.99299, 70.83879, 11.30655,…
#> $ fragmentation_ai <dbl> 98.23654, 74.69482, 99.02252,…
#> $ fragmentation_area_mn <dbl> 3388500.00, 21289.66, 8200800…
#> $ fragmentation_ca <dbl> 27108000, 1234800, 8200800, 2…
#> $ fragmentation_clumpy <dbl> 0.9753594, 0.6897991, 0.98618…
#> $ fragmentation_cohesion <dbl> 99.81169, 96.55357, 99.67810,…
#> $ fragmentation_contig_mn <dbl> 0.3938724, 0.1382791, 0.98054…
#> $ fragmentation_core_mn <dbl> 3185850.000, 10000.000, 77924…
#> $ fragmentation_cpland <dbl> 26.7325362, 8.6536166, 27.793…
#> $ fragmentation_dcore_mn <dbl> 1.8750000, 0.7586207, 1.00000…
#> $ fragmentation_division <dbl> 0.9195281, 0.9882589, 0.91444…
#> $ fragmentation_ed <dbl> 0.12164884, 0.98143948, 0.096…
#> $ fragmentation_lsi <dbl> 5.573765, 14.812548, 2.392006…
#> $ fragmentation_mesh <dbl> 7672193.128, 78693.459, 23987…
#> $ fragmentation_ndca <int> 15, 44, 1, 122, 3, 7, 2, 6, 2…
#> $ fragmentation_nlsi <dbl> 42.98268, 543.03662, 33.63983…
#> $ fragmentation_np <int> 8, 58, 1, 126, 3, 1, 2, 6, 1,…
#> $ fragmentation_shape_mn <dbl> 1.801587, 1.678439, 2.392006,…
#> $ fragmentation_tca <dbl> 25486800, 580000, 7792400, 11…
#> $ fragmentation_te <dbl> 11598000, 6578000, 2692000, 1…
#> $ air_humidity_max <dbl> 73.55356, 66.91849, 63.06462,…
#> $ air_humidity_mean <dbl> 71.62155, 60.41771, 57.23757,…
#> $ air_humidity_min <dbl> 69.86486, 54.90084, 50.29069,…
#> $ air_humidity_range <dbl> 3.215996, 11.534526, 12.25825…
#> $ aridity_mean <dbl> 2.2031174, 1.2921102, 0.51042…
#> $ cloud_cover_max <dbl> 71.63208, 53.53228, 70.77399,…
#> $ cloud_cover_mean <dbl> 57.23377, 39.98402, 42.42303,…
#> $ cloud_cover_min <dbl> 40.902202, 29.684742, 14.4396…
#> $ cloud_cover_range <dbl> 30.21841, 23.36135, 55.84739,…
#> $ evapotranspiration_max <dbl> 135.4485, 155.6613, 185.5662,…
#> $ evapotranspiration_mean <dbl> 114.07073, 83.18569, 151.0150…
#> $ evapotranspiration_min <dbl> 91.826448, 25.519873, 116.057…
#> $ evapotranspiration_range <dbl> 43.12295, 129.64771, 69.01481…
#> $ precipitation_seasonality <dbl> 27.33326, 22.15575, 77.16738,…
#> $ precipitation_total <dbl> 2953.3259, 1131.5263, 854.425…
#> $ precipitation_coldest_quarter <dbl> 836.69665, 358.70920, 167.813…
#> $ precipitation_driest_month <dbl> 158.361408, 61.908393, 7.3857…
#> $ precipitation_driest_quarter <dbl> 507.18217, 200.21909, 36.5709…
#> $ precipitation_warmest_quarter <dbl> 562.74768, 230.81961, 165.048…
#> $ precipitation_wettest_month <dbl> 363.2224, 129.9431, 182.6952,…
#> $ precipitation_wettest_quarter <dbl> 1015.6067, 368.3444, 438.6092…
#> $ temperature_isothermality <dbl> 76.99648, 40.23496, 55.37760,…
#> $ temperature_mean_daily_range <dbl> 5.816152, 8.847707, 9.680353,…
#> $ temperature_mean <dbl> 25.0341507, 7.4928180, 25.648…
#> $ temperature_range <dbl> 7.676869, 22.356158, 17.68530…
#> $ temperature_seasonality <dbl> 42.41023, 476.58344, 201.1696…
#> $ temperature_coldest_month <dbl> 21.438079, -1.379348, 16.7388…
#> $ temperature_coldest_quarter <dbl> 24.483276, 1.468820, 22.97271…
#> $ temperature_driest_quarter <dbl> 25.293351, 10.979521, 24.8094…
#> $ temperature_warmest_month <dbl> 29.66041, 20.53516, 34.85455,…
#> $ temperature_warmest_quarter <dbl> 25.57681, 13.64903, 28.34105,…
#> $ temperature_wettest_quarter <dbl> 24.788659, 3.620998, 24.89750…
#> $ soil_clay <dbl> 27.04603, 20.25723, 32.27310,…
#> $ soil_nitrogen <dbl> 1.5610697, 2.8912605, 1.19533…
#> $ soil_organic_carbon <dbl> 22.83940, 66.46767, 14.38392,…
#> $ soil_ph <dbl> 4.328191, 5.281891, 6.858474,…
#> $ soil_sand <dbl> 38.61876, 55.87067, 39.68722,…
#> $ soil_silt <dbl> 32.98165, 22.52010, 26.68605,…
#> $ soil_temperature_max <dbl> 28.72042, 22.68232, 39.01643,…
#> $ soil_temperature_mean <dbl> 24.7056713, 9.2987020, 27.748…
#> $ soil_temperature_min <dbl> 20.84371617, -0.07568503, 18.…
#> $ soil_temperature_range <dbl> 7.340125, 22.637208, 20.05041…
#> $ ndvi_max <dbl> 0.7685034, 0.6779593, 0.58250…
#> $ ndvi_mean <dbl> 0.6823738, 0.6013160, 0.42989…
#> $ ndvi_min <dbl> 0.5827225, 0.4948931, 0.32135…
#> $ ndvi_range <dbl> 0.1857809, 0.1830662, 0.26115…
#> $ geometry <POINT [°]> POINT (-65.45348 -0.9548852),…The spatial distribution of ecoregion centroids covers all continents:
mapview::mapview(
plantae,
zcol = "richness_species",
layer.name = "Species richness",
col.regions = viridis::turbo(
n = length(unique(plantae$richness_species)),
direction = -1
),
cex = sqrt(plantae$richness_species)/10
)A version with the original ecoregion geometries instead of point centroids can be downloaded with [plantae_extra()].
plantae_polygons <- spatialData::plantae_extra()
mapview::mapview(
plantae_polygons,
zcol = "richness_species",
layer.name = "Species richness",
col.regions = viridis::turbo(
n = length(unique(plantae$richness_species)),
direction = -1
)
)