In support of our manuscript, we developed an R (R Core Team 2023) package to help construct integrated species distribution models (ISDMs) from disparate datasets in a simple and reproducible framework. This R Markdown document illustrates the package by creating an ISDM for red-listed plant species recorded by the Vascular Plant Field Notes survey program in Norway, combined with citizen-science data obtained from the Global Biodiversity Information Facility (GBIF). The first step in exploring this document is to download the package using the following script:
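A minimal installation sketch is given here. It assumes the development version is hosted on GitHub under PhilipMostert/intSDM and that the remotes package is available; both are assumptions rather than details stated in this document, and a CRAN release may be an alternative.

```r
# Install intSDM only if it is not already available, then load it.
# Assumption: the development version lives at 'PhilipMostert/intSDM' on GitHub.
if (!requireNamespace("intSDM", quietly = TRUE)) {
  # install.packages("remotes")  # uncomment if 'remotes' is not installed
  remotes::install_github("PhilipMostert/intSDM")
}
library(intSDM)
```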
intSDM has two main functions: startWorkflow() and
sdmWorkflow(). The first is designed to set up
and specify all the individual components of the workflow using
different slot functions. The slot functions of the resulting object
are listed in the table below.
|Function name|Function use|
|---|---|
|.$plot()|Plot data and other objects required for the model.|
||Add data not available on GBIF.|
|.$addMesh()|Create an inla.mesh object.|
|.$addGBIF()|Add data from GBIF.|
||Specify sampling domain.|
|.$addCovariates()|Add spatial covariates.|
||Specify the cross-validation method.|
||Add R-INLA (Martins et al. 2013), inlabru (Bachl et al. 2019) and PointedSDMs (Mostert and O’Hara 2023) options.|
||Add penalizing complexity priors to the spatial effects.|
|.$biasFields()|Specify an additional spatial effect for a dataset.|
||Specify the output of the workflow.|
|.$obtainMeta()|Obtain metadata for the occurrence records.|
sdmWorkflow() implements the workflow based on the
objects added in
startWorkflow(). The output of this
function is a list of the objects requested when specifying the workflow output.
To start the workflow, we need to specify the coordinate reference system (CRS) used for the analysis as well as the species to model. The three species selected for this analysis from the Vascular Plant Field Notes (Arnica montana, Fraxinus excelsior and Ulmus glabra) have records spread predominantly across the southern and eastern parts of Norway. However, Ulmus glabra, which has the widest spread of the three, also has some records approaching the central and upper-central parts of Norway.
The other arguments (saveOptions and Save) should be used if the user wants the objects to be saved in a folder created by the function.
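A minimal sketch of this first step might look as follows. Only saveOptions and Save are named in the text above; the argument names Projection and Species, the PROJ string, the underscore-separated species names and the project name are assumptions made for illustration. The call is guarded so that the snippet does nothing when intSDM is not installed.

```r
# Species named in the text; the underscore-separated format is an assumption.
species <- c("Fraxinus_excelsior", "Ulmus_glabra", "Arnica_montana")

# A longitude/latitude CRS written as a PROJ string (illustrative choice).
proj <- "+proj=longlat +ellps=WGS84"

if (requireNamespace("intSDM", quietly = TRUE)) {
  workflow <- intSDM::startWorkflow(
    Projection  = proj,                            # assumed argument name
    Species     = species,                         # assumed argument name
    saveOptions = list(projectName = "Vascular"),  # hypothetical project name
    Save        = FALSE                            # keep objects in memory only
  )
}
```

Setting Save = FALSE keeps everything in the R session, which is convenient when exploring; saving to a project folder is more useful for long runs.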
Next we specify the study domain: in this case, Norway.
This can be achieved using the countryName argument, which
accesses the boundary from the giscoR (Hernangómez 2023) R package. The …
argument is used to pass any additional arguments to the
giscoR::gisco_get_countries() function (in this case,
resolution). If you would like to specify your own layer
for the workflow (for example, a non-country layer), it may be
added using the Object argument.
We then use
.$plot() to see what the boundary looks like.
Species occurrence data are arguably the most important component of
an SDM, and intSDM has two slot functions to help you add data
into the workflow: one for data hosted on GBIF and one for data that
are not available on GBIF.
The former, .$addGBIF(), uses the rgbif package to download data
directly from GBIF. For this function, we need to specify the name of
the dataset (datasetName) and the type of the dataset
(datasetType), which can be one of PO, PA or
Counts. The … argument is used to pass any
additional arguments to
rgbif::occ_data() (Chamberlain, Oldoni, and Waller 2022) (in this
case, limit, coordinateUncertaintyInMeters and datasetKey). If
datasetType = 'PA', absences may be generated by setting
generateAbsences = TRUE. This treats the
obtained data as checklist survey data: all the sampling
locations for the species in the dataset are combined, and an absence
is created wherever a given species was not recorded.
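The idea behind generating absences can be illustrated with a small base R sketch. This is not the package's internal code: the toy data frame, the column names and the site labels are all hypothetical, and the only assumption carried over from the text is that every sampling location is treated as having been surveyed for every species.

```r
# Toy checklist data: each row is one species detected at one sampling site.
detections <- data.frame(
  site    = c("A", "A", "B", "C"),
  species = c("Ulmus glabra", "Fraxinus excelsior",
              "Ulmus glabra", "Arnica montana"),
  stringsAsFactors = FALSE
)

# Combine all sampling sites with all species in the dataset.
pa <- expand.grid(
  site    = unique(detections$site),
  species = unique(detections$species),
  stringsAsFactors = FALSE
)

# A site/species pair is a presence (1) if it was detected, otherwise an absence (0).
detected_pairs <- paste(detections$site, detections$species)
pa$present <- as.integer(paste(pa$site, pa$species) %in% detected_pairs)

pa
```

Each site contributes one row per species, so detections become presences and the remaining site and species combinations become absences.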
For this example we consider three sources of data. The Vascular Plant Field Notes is a collection of observations provided by the Norwegian University of Science and Technology’s (NTNU) (Norwegian University of Science and Technology 2023) University Museum and the University of Oslo (UiO) (University of Oslo 2023), containing records of standardized cross-lists of most vascular plants found in Norway. We treat these two datasets as detection/non-detection data, generating absences in sampling locations where the species does not occur.
The other source of data considered comes from the Norwegian Species Observation service (published by Artsdatabanken) (The Norwegian Biodiversity Information Centre 2023). This data is a collection of citizen science records – and as a result we treat it as presence-only data.
    workflow$addGBIF(datasetName = 'NTNU', datasetType = 'PA', limit = 10000,
                     coordinateUncertaintyInMeters = '0,50',
                     generateAbsences = TRUE,
                     datasetKey = 'd29d79fd-2dc4-4ef5-89b8-cdf66994de0d')

    workflow$addGBIF(datasetName = 'UiO', datasetType = 'PA', limit = 10000,
                     coordinateUncertaintyInMeters = '0,50',
                     generateAbsences = TRUE,
                     datasetKey = 'e45c7d91-81c6-4455-86e3-2965a5739b1f')

    workflow$addGBIF(datasetName = 'CZ', datasetType = 'PO',
                     coordinateUncertaintyInMeters = '0,50', limit = 10000,
                     datasetKey = 'b124e1e0-4755-430f-9eab-894f25a9b59c')

    workflow$plot(Species = TRUE)
Covariate data may be added to the model using
.$addCovariates(). Layers from WorldClim (Fick and Hijmans 2017) may be accessed
using the worldClim argument. This in turn uses the geodata
(Hijmans et al. 2023) R package to
obtain SpatRaster objects of the covariates cropped around the
study domain. Other covariate layers may be supplied through a separate
argument of the same function.
We can then view the metadata for the obtained occurrence records
using the .$obtainMeta() function, which gives us the
citations for the datasets used in this workflow.
One of the objects required for our model is an inla.mesh
object, which we use to approximate the spatial random
fields. The .$addMesh() function passes its … argument to
INLA::inla.mesh.2d() to create this object.
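As a rough illustration of the kind of arguments that INLA::inla.mesh.2d() accepts, the sketch below builds a mesh over a hypothetical square domain. The coordinates and the values of max.edge, offset and cutoff are placeholders, not the settings used for the analysis, and suitable values depend on the units of the chosen CRS. The mesh is only built when INLA is installed.

```r
# Hypothetical corner points of a square study area (arbitrary projected units).
domain_pts <- cbind(x = c(0, 100, 100, 0), y = c(0, 0, 100, 100))

# Placeholder mesh settings.
mesh_args <- list(
  loc.domain = domain_pts,  # points the mesh must cover
  max.edge   = c(10, 30),   # largest triangle edge: inside, then outer extension
  offset     = c(5, 30),    # width of the inner and outer extensions
  cutoff     = 2            # minimum allowed distance between mesh nodes
)

if (requireNamespace("INLA", quietly = TRUE)) {
  mesh <- do.call(INLA::inla.mesh.2d, mesh_args)
  print(mesh$n)  # number of mesh nodes
}
```

Smaller max.edge values give a finer mesh and a better approximation of the spatial field, at the cost of longer computation.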
We also use penalizing complexity (PC) priors in our model, which are designed to control the spatial range and standard deviation of the Gaussian random field's (GRF) Matérn covariance function in order to reduce over-fitting (Simpson et al. 2017).
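PC priors for a Matérn field are usually stated as two tail probabilities: the probability that the range falls below a chosen value, and the probability that the standard deviation exceeds a chosen value. The base R sketch below checks these two statements numerically, assuming the standard form of the prior for a two-dimensional field. All numbers are hypothetical and are not the values used in the manuscript.

```r
# Hypothetical prior statements (values chosen only for illustration):
#   P(range < range0) = alpha_range   and   P(sigma > sigma0) = alpha_sigma
range0 <- 50; alpha_range <- 0.05
sigma0 <- 1;  alpha_sigma <- 0.05

# Standard deviation: exponential prior with rate -log(alpha_sigma) / sigma0.
rate_sigma    <- -log(alpha_sigma) / sigma0
p_sigma_above <- pexp(sigma0, rate = rate_sigma, lower.tail = FALSE)

# Range (two-dimensional field): P(range < r) = exp(-lambda_range / r),
# with lambda_range = -log(alpha_range) * range0.
lambda_range  <- -log(alpha_range) * range0
p_range_below <- exp(-lambda_range / range0)

# Both probabilities recover the chosen alpha values.
c(p_sigma_above = p_sigma_above, p_range_below = p_range_below)
```

Shrinking towards a large range and a small standard deviation favours a smoother, flatter field, which is how these priors guard against over-fitting.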
We specify an additional spatial effect for the citizen science data
using .$biasFields() to account for biases in the collection
process (Simmonds et al. 2020).
Finally we specify some options used in fitting the model, as
well as the output from our workflow: in this case we want plots of the
log-intensity for each species.
The workflow is then implemented using the sdmWorkflow()
function. Because the model takes a long time to fit and map,
inference is not run in this vignette. However, the script is available
below for users to run the model themselves.
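A sketch of the final call is given here. It assumes that a fully specified workflow object already exists in the session and that sdmWorkflow() receives it through an argument named Workflow; the run_model switch and the name Maps are hypothetical and only serve to keep the model from being fitted by accident.

```r
# Not run by default: fitting the model can take a long time.
run_model <- FALSE  # hypothetical switch; set to TRUE to fit the model

if (run_model && exists("workflow") &&
    requireNamespace("intSDM", quietly = TRUE)) {
  # Assumption: the workflow object is passed via the 'Workflow' argument.
  Maps <- intSDM::sdmWorkflow(Workflow = workflow)
}
```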