class: center, middle, inverse, title-slide

# Autocorrelation<br>in spatial regression<br>with Random Forest
### Blas M. Benito
### University of Alicante (Spain)
### 2021-09-09

---

class: center, middle

<style type="text/css">
@import url(https://fonts.googleapis.com/css?family=IBM+Plex+Mono);
.blockquote {
  padding: 10px 20px;
  margin: 0 0 20px;
  font-size: 150%;
  border-left: 5px solid #f1605d;
}
.large { font-size: 250%; }
.medium { font-size: 200%; }
.small { font-size: 150%; }
.tiny { font-size: 100%; }
.content-box {
  box-sizing: content-box;
  border-radius: 15px;
  margin: 0 0 25px;
  overflow: hidden;
  padding: 20px;
  width: 100%;
  background-color: #f1605d;
  font-size: 200%;
  color: white;
}
ul {
  list-style: none; /* Remove default bullets */
}
ul li {
  font-size: 150%;
}
ul li::before {
  content: "\02192"; /* Unicode right arrow (U+2192) used as the bullet */
  color: #f1605d;
  font-weight: bold;
  display: inline-block; /* Needed to add space between the arrow and the text */
  width: 1em;
  margin-left: -1em;
  font-size: 150%;
}
.remark-slide-content.full-slide-fig {
  padding: 0px 0px 0px 0px;
  width: 100%;
}
.remark-code {
  font-size: 26px;
}
.large .remark-code {
  font-size: 65% !important;
}
.medium .remark-code {
  font-size: 60% !important;
}
.small .remark-code {
  font-size: 42% !important;
}
</style>

# Availability

.left[
.small[
Live slideshow: [https://blasbenito.github.io/spatialRF_talk/talk.html](https://blasbenito.github.io/spatialRF_talk/talk.html)

GitHub repo: [BlasBenito/spatialRF_talk](https://github.com/BlasBenito/spatialRF_talk)
]
]

<img src="figures/github.png" width="70%" height="40%" />

---

class: inverse, center, middle

# WHAT IS <br> SPATIAL AUTOCORRELATION <br> AND <br> WHY DO WE CARE?

---

class: center, middle

## Tobler’s First Law of Geography

.blockquote[
Everything is related to everything else, but near things are more related than distant things.

.right[-- <cite>Waldo Tobler, 1970</cite>]
]

<br>

.content-box[Similarity depends on distance]

<br>

.medium[We call it **spatial autocorrelation** (SAC)]

---

class: center

<img src="talk_files/figure-html/unnamed-chunk-2-1.png" width="1224" />

---

class: center middle

<img src="talk_files/figure-html/unnamed-chunk-3-1.png" width="936" />

---

class: center, middle

.content-box[What does SAC *really* represent?]

## The *footprint* <br> of the process generating <br> the variable of interest!

.medium[(mixed with the observation scale and sampling structure)]

---

class: center

## Cholera map (John Snow<sup>1</sup>, 1854)

<img src="figures/cholera.jpg" width="60%" height="60%" />

<br>

.small[
.footnote[
[1] Not *that* John Snow
]
]

---

class: center

## Cholera map (John Snow, 1854)

<img src="figures/cholera2.jpg" width="60%" height="60%" />

---

class: center

## Colony of Imperial cormorants <br> (*Leucocarbo atriceps*)

<img src="figures/cormorants.jpeg" width="90%" height="60%" />

.small[
Source: [www.dailymail.co.uk](https://www.dailymail.co.uk/news/article-2982830/Amazing-photographs-thousands-nesting-Cormorants-gather-crowded-beach-annual-nesting-season.html)
]

---

class: center

## Colony of Imperial cormorants <br> (*Leucocarbo atriceps*)

<img src="figures/cormorants2.jpeg" width="90%" height="60%" />

.small[
Source: [www.dailymail.co.uk](https://www.dailymail.co.uk/news/article-2982830/Amazing-photographs-thousands-nesting-Cormorants-gather-crowded-beach-annual-nesting-season.html)
]

---

class: inverse middle center

# SPATIAL PREDICTORS

---

class: middle center

## WHAT ARE <br> SPATIAL PREDICTORS?
.medium[
**Variables representing <br> the spatial structure of the data** <br><br>
Proxies of the process <br> originating spatial autocorrelation <br><br>
How? **Eigenvectors of a neighborhood matrix**
]

---

class: middle center

### A GOOD PAPER TO START

<img src="figures/paper.png" width="100%" height="60%" />

---

class: middle center

## MAIN IDEA

.medium[
**Linear combinations of the eigenvectors of a neighborhood matrix represent all the possible spatial configurations of a given set of spatial records**
]

.small[Let's see how that works!]

---

class: middle center

### HYPOTHETICAL SPATIAL RECORDS

<img src="talk_files/figure-html/unnamed-chunk-10-1.png" width="60%" height="50%" />

---

class: middle center

### DISTANCE MATRIX

<br>

<img src="talk_files/figure-html/unnamed-chunk-11-1.png" width="504" height="90%" />

---

class: middle center

### MATRIX OF WEIGHTS

.small[
Computed as 1/distance
]

<img src="talk_files/figure-html/unnamed-chunk-12-1.png" width="504" height="90%" />

---

class: middle center

### NORMALIZED AND DOUBLE-CENTERED

.small[
Column and row means are 0
]

<img src="talk_files/figure-html/unnamed-chunk-13-1.png" width="504" height="100%" />

---

class: middle center

### EIGENVECTORS IN SPACE

<img src="talk_files/figure-html/unnamed-chunk-14-1.png" width="120%" height="60%" />

---

class: middle center

### EIGENVECTORS WITH MORAN'S I > 0

<img src="talk_files/figure-html/unnamed-chunk-15-1.png" width="120%" height="60%" />

---

class: middle center

### MODEL TRAINING

<img src="figures/eigenvectors.png" width="90%" height="60%" />

---

class: inverse middle center

# Example with the R package *spatialRF*

---

class: middle center

## The R package *spatialRF*

.left[
- GitHub repo: [https://github.com/BlasBenito/spatialRF](https://github.com/BlasBenito/spatialRF)
- Website: [https://blasbenito.github.io/spatialRF/](https://blasbenito.github.io/spatialRF/)
- Not on CRAN yet.

.small[Install:]

.medium[
```r
remotes::install_github(
  repo = "blasbenito/spatialRF",
  ref = "development"
)
library(spatialRF)
```
]
]

---

class: middle left

## EXAMPLE DATA

.pull-left[
.tiny[
- **Response variable**: plant richness of the American ecoregions
- **14 predictors** (climate, fragmentation, human impact, etc.)
- **Distance matrix** among the ecoregion polygons (centroids shown in the figure)
]
]

.pull-right[
<img src="talk_files/figure-html/unnamed-chunk-19-1.png" width="100%" height="100%" />
]

---

class: middle left

### MODELLING SETUP

.left[
.small[
```r
#loading training data and distance matrix
data(plant_richness_df)
data(distance_matrix)

#names of the response and predictors
response.name <- "richness_species_vascular"
predictor.names <- c(
  "human_population", #human
  "human_population_density",
  "human_footprint_average",
  "climate_hypervolume", #climate
  "climate_bio1_average",
  "climate_bio15_minimum",
  "climate_aridity_index_average",
  "climate_velocity_lgm_average",
  "neighbors_area", #neighbours
  "neighbors_count",
  "neighbors_percent_shared_edge",
  "bias_area_km2", #size and shape
  "fragmentation_cohesion",
  "fragmentation_division"
)

#distance thresholds in km
distance.thresholds <- c(50, 500, 5000)
```
]
]

---

class: middle left

### FITTING THE MODELS

.medium[
```r
#non-spatial model, predictors only
model.non.spatial <- spatialRF::rf(
  data = plant_richness_df,
  dependent.variable.name = response.name,
  predictor.variable.names = predictor.names,
  distance.matrix = distance_matrix,
  distance.thresholds = distance.thresholds
)
```

```r
#spatial model, predictors and spatial predictors
model.spatial <- spatialRF::rf_spatial(
  model = model.non.spatial,
  method = "mem.moran.sequential"
)
```
]

---

class: middle left

## COMPARING THE MODELS WITH SPATIAL CROSS-VALIDATION

.pull-left[
.small[
<br>

```r
model.comparison <- spatialRF::rf_compare(
  models = list(
    model.non.spatial = model.non.spatial,
    model.spatial = model.spatial
  ),
  xy = plant_richness_df[, c("x", "y")],
  repetitions = 100,
  metrics = "rmse"
)
```
]
]

.pull-right[
<img src="talk_files/figure-html/unnamed-chunk-24-1.png" width="100%" />
]

---

class: middle center

## MODEL COMPARISON

<br>

<table>
 <thead>
  <tr>
   <th style="text-align:left;"> Model </th>
   <th style="text-align:right;"> Spatial predictors </th>
   <th style="text-align:right;"> Moran's I of residuals </th>
   <th style="text-align:right;"> RMSE (spatial CV) </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> Non-spatial </td>
   <td style="text-align:right;"> 0 </td>
   <td style="text-align:right;"> 0.15 </td>
   <td style="text-align:right;"> 3104 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> Spatial </td>
   <td style="text-align:right;"> 18 </td>
   <td style="text-align:right;"> 0.03 </td>
   <td style="text-align:right;"> 3514 </td>
  </tr>
</tbody>
</table>

---

class: middle center

## VARIABLE IMPORTANCE

<img src="talk_files/figure-html/unnamed-chunk-26-1.png" width="120%" height="60%" />

---

class: middle center

## IMPORTANCE OF SPATIAL PREDICTORS

<img src="talk_files/figure-html/unnamed-chunk-27-1.png" width="120%" height="60%" />

---

class: middle center

## RESPONSE CURVES

<img src="talk_files/figure-html/unnamed-chunk-28-1.png" width="120%" height="60%" />

---

class: middle left

## A FEW IDEAS

- **Spatial predictors** reduce SAC in model residuals
- **Spatial predictors** hinder model transferability
- **Eigenvectors**: results similar to the non-spatial model, plus information about the importance of spatial processes
- Computational demands limit sample size (1000 to 5000 records, depending on the available RAM)

---

class: middle center

## FINAL MESSAGE

.medium[
Incorporating spatial predictors into machine learning models might help unveil underlying spatial processes not represented by the covariates.
]

---

class: middle
background-image: url("figures/end.png")
background-size: contain
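
---

class: middle left

### BACKUP: SPATIAL PREDICTORS IN BASE R

.tiny[
A minimal base-R sketch of the steps in the spatial predictors slides (distance matrix, 1/distance weights, double-centering, eigenvectors, Moran's I > 0). It is an illustration under stated assumptions, not the *spatialRF* implementation: the points are hypothetical random coordinates, `moran_i()` is a hypothetical helper, and the normalization step and distance thresholds are skipped.
]

.small[
```r
# Hypothetical toy data: 30 random points (not the ecoregions)
set.seed(1)
xy <- cbind(x = runif(30), y = runif(30))
n <- nrow(xy)

# Distance matrix
d <- as.matrix(dist(xy))

# Weights: 1/distance, zero on the diagonal
w <- 1 / d
diag(w) <- 0

# Double-centering: row and column means become 0
h <- diag(n) - matrix(1 / n, n, n)
wc <- h %*% w %*% h

# Eigenvectors of the symmetric double-centered matrix
eig <- eigen(wc, symmetric = TRUE)

# Hypothetical helper: Moran's I of vector v under weights w
moran_i <- function(v, w) {
  z <- v - mean(v)
  (length(v) / sum(w)) * sum(w * outer(z, z)) / sum(z^2)
}

# Drop the ~0 eigenvalue (constant vector), keep Moran's I > 0
nonzero <- abs(eig$values) > 1e-8
vectors <- eig$vectors[, nonzero, drop = FALSE]
mi <- apply(vectors, 2, moran_i, w = w)
spatial.predictors <- vectors[, mi > 0, drop = FALSE]
```
]

.tiny[
With a symmetric weight matrix, the Moran's I of each retained eigenvector equals `n / sum(w)` times its eigenvalue. Keeping eigenvectors with Moran's I > 0 is therefore the same as keeping those with positive eigenvalues.
]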