Source Code
You can download the R Markdown notebook used to render this article
here.
To download the file, use the Download raw file button in the
upper-right corner of the Code panel.
Overview
The trees dataset describes how the tree species present in
Mesoamerica are distributed across North and South America. The
presence of tree species in Mesoamerica was obtained from the Tree
Biodiversity Network (BIOTREE-NET) dataset (project now
defunct), and the distribution of these species across the Americas was
derived from GBIF. The data represent the richness of
Mesoamerican trees in 3,373 hexagonal grid cells
across the Americas, combined with 50 environmental predictor variables
from different sources. The dataset was originally compiled for Benito
et al. (2013).
Its main features are:
- Americas scope: Hexagonal cells covering longitudes -125.3° to -34.3° and latitudes -34.4° to 49.9°.
- Single response: trees, an integer count of tree species richness per hexagonal cell.
- Rich environmental predictors: 50 predictors spanning 10 categories (climate, soil, vegetation, geography, and more).
We designed it to support regression modelling, multicollinearity filtering, and spatial analysis of tree diversity patterns.
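As an illustration of the multicollinearity filtering mentioned above, here is a minimal sketch in base R. It does not use the real dataset: the toy data frame is simulated, only the column names are borrowed from the predictors, and the 0.75 threshold is an arbitrary assumption rather than a recommended value.

```r
# Hypothetical toy data: three simulated predictors, two of them strongly related.
# Column names are borrowed from the dataset; the values are NOT real.
set.seed(1)
n <- 200
temperature <- rnorm(n, mean = 15, sd = 5)
toy <- data.frame(
  temperature = temperature,
  temperature_coldest_quarter = temperature - 5 + rnorm(n, sd = 0.5),
  rainfall = rnorm(n, mean = 1500, sd = 400)
)

# Absolute pairwise Pearson correlations between predictors
cor_matrix <- abs(cor(toy))

# Flag each predictor pair once (upper triangle) when it exceeds
# an illustrative threshold; 0.75 is an assumption
threshold <- 0.75
high <- which(cor_matrix > threshold & upper.tri(cor_matrix), arr.ind = TRUE)
flagged <- data.frame(
  a = rownames(cor_matrix)[high[, "row"]],
  b = colnames(cor_matrix)[high[, "col"]],
  correlation = cor_matrix[high]
)
flagged
```

With the real data, the same idea could be applied to the numeric predictor columns after dropping the geometry column. One predictor from each flagged pair would then be a candidate for removal.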
Description
The dataset is an sf data frame with 3,373 rows and 53
columns. In 989 cells the trees response variable is NA:
these are cells with valid environmental data where no tree
species were found in the relevant databases. The first 10 records of
all columns are shown below.
trees |>
head(n = 10) |>
dplyr::glimpse()
#> Rows: 10
#> Columns: 53
#> $ cellid <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
#> $ trees <int> 1, NA, 1, 5, 1, NA, NA, NA, NA, NA
#> $ air_humidity_max <dbl> 66.72578, 68.97865, 65.92500, 63.75984, …
#> $ air_humidity <dbl> 64.19672, 65.28275, 62.97762, 60.19885, …
#> $ air_humidity_min <dbl> 62.62891, 62.51946, 59.47713, 55.22597, …
#> $ air_humidity_range <dbl> 3.675112, 5.961699, 5.949539, 8.043704, …
#> $ aridity <dbl> 3.034283, 7.000583, 2.228907, 3.352518, …
#> $ cloud_cover_max <dbl> 51.80775, 59.59544, 49.02710, 53.67749, …
#> $ cloud_cover <dbl> 39.38748, 49.43344, 36.08522, 37.64137, …
#> $ cloud_cover_min <dbl> 23.980626, 37.773336, 18.689770, 14.2178…
#> $ cloud_cover_range <dbl> 27.21311, 21.36061, 29.83153, 38.95784, …
#> $ evapotranspiration_max <dbl> 145.8778, 121.9521, 162.0084, 165.8928, …
#> $ evapotranspiration <dbl> 81.39344, 62.25680, 86.53583, 83.38519, …
#> $ evapotranspiration_min <dbl> 31.34277, 18.14818, 26.22737, 24.54848, …
#> $ evapotranspiration_range <dbl> 114.0328, 103.3089, 135.2890, 140.8501, …
#> $ rainfall_seasonality <dbl> 69.05514, 58.10799, 83.43844, 73.72836, …
#> $ rainfall <dbl> 1984.803, 3788.573, 1797.700, 2674.796, …
#> $ rainfall_coldest_quarter <dbl> 899.4158, 1485.1863, 935.7548, 1274.9368…
#> $ rainfall_driest_month <dbl> 13.207154, 72.568857, 4.360102, 12.90247…
#> $ rainfall_driest_quarter <dbl> 74.86140, 287.73483, 34.07716, 87.48721,…
#> $ rainfall_warmest_quarter <dbl> 74.86140, 314.23001, 34.07716, 87.48721,…
#> $ rainfall_wettest_month <dbl> 354.6468, 601.3340, 376.6572, 511.7660, …
#> $ rainfall_wettest_quarter <dbl> 952.3741, 1699.2585, 954.7104, 1328.0056…
#> $ temperature_isothermality <dbl> 42.31744, 31.10151, 46.19262, 40.70953, …
#> $ temperature_mean_daily_range <dbl> 4.970194, 4.531185, 7.362851, 7.845559, …
#> $ temperature <dbl> 10.965723, 8.226245, 10.837620, 10.12197…
#> $ temperature_range <dbl> 12.12370, 15.21725, 16.33477, 19.58319, …
#> $ temperature_seasonality <dbl> 255.8554, 356.1905, 308.2309, 410.3759, …
#> $ temperature_coldest_month_min <dbl> 5.6348733, 1.8606111, 4.1480463, 2.53274…
#> $ temperature_coldest_quarter <dbl> 8.016393, 4.015278, 7.217357, 5.473721, …
#> $ temperature_driest_quarter <dbl> 14.54247, 12.75220, 15.08227, 15.84711, …
#> $ temperature_warmest_month_max <dbl> 18.26975, 17.30557, 20.92225, 22.42327, …
#> $ temperature_warmest_quarter <dbl> 14.54247, 13.03118, 15.08227, 15.84711, …
#> $ temperature_wettest_quarter <dbl> 8.542474, 4.589159, 7.622423, 5.954469, …
#> $ distance_to_ocean <dbl> 47.70194, 95.32629, 160.80169, 236.81731…
#> $ elevation <dbl> 82.64232, 279.17706, 327.39466, 516.4347…
#> $ latitude <dbl> 43.07518, 48.57816, 40.58208, 42.43964, …
#> $ longitude <dbl> -124.3935, -124.6602, -124.0775, -124.14…
#> $ soil_clay <dbl> 18.97297, 19.41023, 26.68421, 28.00797, …
#> $ soil_nitrogen <dbl> 4.081081, 5.990067, 3.774358, 4.297677, …
#> $ soil_organic_carbon <dbl> 69.37015, 86.66117, 72.22415, 75.67207, …
#> $ soil_ph <dbl> 5.242456, 4.839451, 5.537970, 5.388273, …
#> $ soil_sand <dbl> 49.89072, 32.86094, 28.39756, 31.97907, …
#> $ soil_silt <dbl> 29.78496, 46.36619, 43.56146, 38.65843, …
#> $ soil_temperature_max <dbl> 20.68563, 19.32484, 23.68939, 24.77120, …
#> $ soil_temperature <dbl> 10.056886, 7.044211, 11.533294, 10.98734…
#> $ soil_temperature_min <dbl> 1.9985030, -0.7671579, 2.9644339, 1.4102…
#> $ soil_temperature_range <dbl> 18.21407, 20.42632, 20.22940, 22.93742, …
#> $ ndvi_max <dbl> 0.7350104, 0.8117852, 0.7702507, 0.77888…
#> $ ndvi <dbl> 0.6665069, 0.7123055, 0.7063288, 0.71021…
#> $ ndvi_min <dbl> 0.5839046, 0.6050872, 0.6405807, 0.62898…
#> $ ndvi_range <dbl> 0.1511058, 0.2066980, 0.1296699, 0.14990…
#> $ geometry <POLYGON [°]> POLYGON ((-124.8289 42.6689..., POLYGON …
The hexagonal grid cells cover the Americas between approximately 50°N and 35°S. The map below shows the grid coloured by tree richness.
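Because cells without tree records carry NA in trees, it can help to count and separate them before mapping or modelling richness. The following is a minimal sketch in base R. It uses a toy stand-in built from the cellid and trees values of the 10 records printed above; the object names trees_head and trees_observed are illustrative and not part of the dataset.

```r
# Toy stand-in built from the first 10 records printed above;
# with the real data, the `trees` sf data frame would take its place.
trees_head <- data.frame(
  cellid = 1:10,
  trees = c(1L, NA, 1L, 5L, 1L, NA, NA, NA, NA, NA)
)

# Count cells without a richness value
n_missing <- sum(is.na(trees_head$trees))

# Keep only cells with an observed richness value, e.g. for model fitting
trees_observed <- trees_head[!is.na(trees_head$trees), ]

n_missing
nrow(trees_observed)
```

The same logical filter should carry over to the full sf data frame, where row subsetting keeps the geometry column attached.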
