title: “MRFs and CRFs for Bird.parasites data” output: rmarkdown::html_vignette vignette: > % % % —

We can explore the model’s primary functions using a test dataset
that is available with the package. Load the `Bird.parasites`

dataset, which contains binary occurrences of four avian blood parasites
in New Caledonian *Zosterops* species (available in its original
form at Dryad; Clark *et al* 2016). A single continuous
covariate is also included (`scale.prop.zos`

), which reflects
the relative abundance of *Zosterops* species (compared to other
avian species) among different sample sites

```
library(MRFcov)
data("Bird.parasites")
```

The `Bird.parasites`

dataset is already in the appropriate
structure for running `MRFcov`

models; however, it is useful
to understand how this was done. Using the raw dataset (downloaded from
`Dryad`

at the above link), we created a scaled continuous
covariate to represent *Zosterops* spp. proportional captures in
each sample site (as an estimate of relative abundance)

```
#Not run
#install.packages(dplyr)
data.paras = data.frame(data.paras) %>%
dplyr::group_by(Capturesession,Genus) %>%
dplyr::summarise(count = dlyr::n()) %>%
dplyr::mutate(prop.zos = count / sum(count)) %>%
dplyr::left_join(data.paras) %>%
dplyr::ungroup() %>% dplyr::filter(Genus == 'Zosterops') %>%
dplyr::mutate(scale.prop.zos = as.vector(scale(prop.zos)))
data.paras <- data.paras[, c(12:15, 23)]
```

You can visualise the dataset to see how analysis data needs to be
structured. In short, for (`family = "binomial"`

) models,
node variable (i.e. species) occurrences should be included as binary
variables (`1`

s and `0`

s) as the
`n_nodes`

left-most variables in `data`

. Note that
Gaussian continuous variables (`family = "gaussian"`

) and
Poisson non-negative counts (`family = "poisson"`

) are also
supported in `MRFcov`

. Any covariates can be included as the
right-most variables. It is recommended that these covariates all be on
a similar scale, ideally using the `scale`

function for
continuous covariates (or similar) so that covariates have roughly
`mean = 0`

and `sd = 1`

, as this makes LASSO
variable selection and comparisons of effect sizes very
straightforward

```
help("Bird.parasites")
View(Bird.parasites)
```

We first test for interactions using a Markov Random Fields (MRF)
model without covariates. We set `n_nodes`

as the number of
species to be represented in the graphical model (`4`

). We
also need to specify the family for the model (`binomial`

in
this case). We do not specify the level of penalisation used by the
LASSO algorithm. Instead, this is optimised separately for each species
through cross validation (using function `cv.glmnet`

in the
`glmnet`

package). This ensures the log-likelihood of each
species is properly maximised before unifying them into an undirected
graph. Finally, it is important to note that for MRF model functions, we
can specify the number of processing cores to split the job across
(i.e. `n_cores`

). In this case we will only use 1 core, but
higher numbers *may* increase speed (if more cores are
available). Check available cores using the `detectCores()`

function in the `parallel`

package

`MRF_fit <- MRFcov(data = Bird.parasites[, c(1:4)], n_nodes = 4, family = 'binomial')`

```
## Leave-one-out cv used for the following low-occurrence (rare) nodes:
## Microfilaria ...
```

```
## Warning: `funs()` was deprecated in dplyr 0.8.0.
## ℹ Please use a list of either functions or lambdas:
##
## # Simple named list: list(mean = mean, median = median)
##
## # Auto named with `tibble::lst()`: tibble::lst(mean, median)
##
## # Using lambdas list(~ mean(., trim = .2), ~ median(., na.rm = TRUE))
## ℹ The deprecated feature was likely used in the MRFcov package.
## Please report the issue to the authors.
```

`## Fitting MRF models in sequence using 1 core ...`

The message above can be useful for identifying nodes (i.e. species)
that are very rare or very common, as these *can* be difficult to
properly model. In this case, the `Microfilaria`

node is
fairly rare, and so the function automatically reverts to using
leave-one-out cross-validation to optimise parameters. This can be slow,
so be aware when attempting to use large datasets that contain many very
rare or very common species. Now that the model has converged, we can
plot the estimated interaction coefficients as a heatmap using
`plotMRF_hm`

. Note that we can specify the names of nodes
should we want to change them

```
plotMRF_hm(MRF_mod = MRF_fit, main = 'MRF (no covariates)',
node_names = c('H. zosteropis', 'H. killangoi',
'Plasmodium', 'Microfilaria'))
```

Note that other plotting methods an be used as desired. For instance,
to plot these as a network instead, we can simply extract the adjacency
matrix and plot it using standard `igraph`

functions

```
net <- igraph::graph.adjacency(MRF_fit$graph, weighted = T, mode = "undirected")
igraph::plot.igraph(net, layout = igraph::layout.circle,
edge.width = abs(igraph::E(net)$weight),
edge.color = ifelse(igraph::E(net)$weight < 0,
'blue',
'red'))
```

We can now run a Conditional Random Fields (CRF) model using the
provided continuous covariate (`scale.prop.zos`

). Again, each
species’ regression is optimised separately using LASSO regularization.
Note that any columns in `data`

to the right of column
`n_nodes`

will be presumed to represent covariates if we
don’t specify an `n_covariates`

argument

`MRF_mod <- MRFcov(data = Bird.parasites, n_nodes = 4, family = 'binomial')`

```
## Leave-one-out cv used for the following low-occurrence (rare) nodes:
## Microfilaria ...
## Fitting MRF models in sequence using 1 core ...
```

Visualise the estimated species interaction coefficients as a
heatmap. These represent predicted interactions when the covariate is
set to its mean (i.e. in this case, when
`scale.prop.zos = 0`

). If we had included other covariates,
then this graph would represent interactions predicted when *all*
covariates were set at their means

`plotMRF_hm(MRF_mod = MRF_mod)`

Regression coefficients and their relative importances can be
accessed as well. This call returns a matrix of the raw coefficient, as
well as standardised coefficients (standardised by the `sd`

of the covariate). Standardisation in this way helps to compare the
relative influences of each parameter on the target species’ occurrence
probability, but in general the two coefficients will be identical
(unless users have not pre-scaled their covariates). The list also
contains contains each variable’s relative importance (calculated using
the formula `B^2 / sum(B^2)`

, where the vector of
`B`

s represents regression coefficients for predictor
variables). Variables with an underscore (`_`

) indicate an
interaction between a covariate and another node, suggesting that
conditional dependencies of the two nodes vary across environmental
gradients. Because of this, it is recommended to avoid using column
names with `_`

in them

`MRF_mod$key_coefs$Hzosteropis`

```
## Variable Rel_importance Standardised_coef Raw_coef
## 1 Hkillangoi 0.66231856 -2.3358266 -2.3358266
## 5 scale.prop.zos_Microfilaria 0.12393514 -1.0104256 -1.0104256
## 3 Microfilaria 0.10206554 0.9169522 0.9169522
## 4 scale.prop.zos 0.09265021 -0.8736356 -0.8736356
## 2 Plas 0.01458260 -0.3465968 -0.3465968
```

Finally, a useful capability is to generate some fake data and test predictions. For instance, say we want to know how frequently malaria parasite infections are predicted to occur in sites with high occurrence of microfilaria

```
fake.dat <- Bird.parasites
fake.dat$Microfilaria <- rbinom(nrow(Bird.parasites), 1, 0.8)
fake.preds <- predict_MRF(data = fake.dat, MRF_mod = MRF_mod)
```

The returned object from `predict_MRF`

depends on the
family of the model. For `family = "binomial"`

, we get a list
including both linear prediction probabilities and binary predictions
(where a linear prediction probability `> 0.5`

equates to
a binary prediction of `1`

). These binary predictions can be
used to estimate parasite prevalence

```
H.zos.pred.prev <- sum(fake.preds$Binary_predictions[, 'Hzosteropis']) / nrow(fake.preds$Binary_predictions)
Plas.pred.prev <- sum(fake.preds$Binary_predictions[, 'Plas']) / nrow(fake.preds$Binary_predictions)
Plas.pred.prev
```

`## [1] 0.4253898`

A key step in the process of model exploration is to determine
whether inclusion of covariates actually improves model fit and is
warranted when estimating interaction parameters. This is
straightforward for binomial models, as we can compare classification
accuracy quickly and easily. The `cv_diag`

functions will fit
models and then determine their predictive performances against the
supplied data. We can also compare MRF and CRF models directly to
determine whether covariates are warranted. For this dataset,
considering that parasite infections are quite rare, we are primarily
interested in maximising model Sensitivity (capacity to successfully
predict positive infections)

```
mod_fits <- cv_MRF_diag_rep(data = Bird.parasites, n_nodes = 4,
n_cores = 1, family = 'binomial', plot = F,
compare_null = T,
n_folds = 10)
```

```
## Generating node-optimised Conditional Random Fields model
##
## Generating Markov Random Fields model (no covariates)
##
## Calculating model predictions of the supplied data
## Generating CRF predictions ...
## Generating null MRF predictions ...
##
## Calculating predictive performance across test folds
## Processing cross-validation run 1 of 10 ...
## Processing cross-validation run 2 of 10 ...
## Processing cross-validation run 3 of 10 ...
## Processing cross-validation run 4 of 10 ...
## Processing cross-validation run 5 of 10 ...
## Processing cross-validation run 6 of 10 ...
## Processing cross-validation run 7 of 10 ...
## Processing cross-validation run 8 of 10 ...
## Processing cross-validation run 9 of 10 ...
## Processing cross-validation run 10 of 10 ...
```

```
# CRF (with covariates) model sensitivity
quantile(mod_fits$mean_sensitivity[mod_fits$model == 'CRF'], probs = c(0.05, 0.95))
```

```
## 5% 95%
## 0.2343402 0.5454545
```

```
# MRF (no covariates) model sensitivity
quantile(mod_fits$mean_sensitivity[mod_fits$model != 'CRF'], probs = c(0.05, 0.95))
```

```
## 5% 95%
## 0.1481481 0.4093260
```

We now may want to fit models to bootstrapped subsets of the data to
account for parameter uncertainty. Users can change the proportion of
observations to include in each bootstrap run with the
`sample_prop`

option.

`booted_MRF <- bootstrap_MRF(data = Bird.parasites, n_nodes = 4, family = 'binomial', n_bootstraps = 10, n_cores = 1, sample_prop = 0.9)`

`## Fitting bootstrap_MRF models in sequence using 1 core...`

Finally, we can explore regression coefficients to get a better
understanding of just how important interactions are for predicting
species’ occurrence probabilities (in comparison to other covariates).
This is perhaps the strongest property of CRFs, as comparing the
relative importances of interactions and fixed covariates using
competing methods (such as Joint Species Distribution Models) is
difficult. The `bootstrap_MRF`

function conveniently returns
a matrix of important coefficients for each node in the graph, as well
as their relative importances

`booted_MRF$mean_key_coefs$Hzosteropis`

```
## Variable Rel_importance Mean_coef
## 2 scale.prop.zos_Hkillangoi 0.80140395 9.640955
## 1 Hkillangoi 0.16985458 -4.438467
## 3 scale.prop.zos_Microfilaria 0.01364083 -1.257809
```

`booted_MRF$mean_key_coefs$Hkillangoi`

```
## Variable Rel_importance Mean_coef
## 3 scale.prop.zos_Hzosteropis 0.76213791 9.640955
## 1 Hzosteropis 0.16153229 -4.438467
## 2 Microfilaria 0.06792907 -2.878267
```

`booted_MRF$mean_key_coefs$Plas`

```
## Variable Rel_importance Mean_coef
## 2 Microfilaria 0.63127189 2.0893991
## 3 scale.prop.zos 0.16101622 -1.0552321
## 5 scale.prop.zos_Microfilaria 0.09262250 0.8003343
## 4 scale.prop.zos_Hkillangoi 0.07574510 -0.7237532
## 1 Hzosteropis 0.03171738 -0.4683406
```

`booted_MRF$mean_key_coefs$Microfilaria`

```
## Variable Rel_importance Mean_coef
## 2 Hkillangoi 0.49414460 -2.8782673
## 3 Plas 0.26039620 2.0893991
## 5 scale.prop.zos_Hzosteropis 0.09436723 -1.2578090
## 4 scale.prop.zos 0.07227446 -1.1007699
## 1 Hzosteropis 0.03995655 0.8184612
## 6 scale.prop.zos_Plas 0.03820627 0.8003343
```

Users can also use the `predict_MRFnetworks`

function to
calculate network statistics for each node in each observation or to
generate adjacency matrices for each observation. By default, this
function generates a list of `igraph`

adjacency matrices, one
for each row in `data`

, which can be used to make network
plots using a host of other packages. Note, both this function and the
`predict_MRF`

function rely on `data`

that has a
structure exactly matching to the data that was used to fit the model.
In other words, the column names and column order need to be identical.
The `cutoff`

argument is important for binary problems, as
this specifies the probability threshold for stating whether or not a
species should be considered present at a site (and thus, whether their
interactions will be present). Here, we state that a predicted
occurrence above `0.33`

is sufficient

```
adj_mats <- predict_MRFnetworks(data = Bird.parasites,
MRF_mod = booted_MRF,
metric = 'eigencentrality',
cutoff = 0.33)
colnames(adj_mats) <- colnames(Bird.parasites[, 1:4])
apply(adj_mats, 2, summary)
```

```
## Hzosteropis Hkillangoi Plas Microfilaria
## Min. 0.0000000 0.00000000 0.0000000 0.00000000
## 1st Qu. 0.0000000 0.00000000 0.0000000 0.00000000
## Median 0.0000000 0.00000000 0.0000000 0.00000000
## Mean 0.1751534 0.08908686 0.1818131 0.08685969
## 3rd Qu. 0.3660254 0.00000000 0.0675390 0.00000000
## Max. 1.0000000 1.00000000 1.0000000 1.00000000
```

Lastly, `MRFcov`

has the capability to account for
possible spatial autocorrelation when estimating interaction parameters.
To do this, we incorporate functions from the `mgcv`

package
to include smoothed Gaussian process spatial regression splines in each
node-wise regression. The user must supply a two-column
`data.frame`

called `coords`

, which will ideally
contain Latitude and Longitude values for each row in `data`

.
We don’t have these coordinates for the `Bird.parasites`

dataset, so we will instead create some fake coordinates to showcase the
model. Note, these commands were not run here, but feel free to move
through them as you did for the above examples

```
Latitude <- sample(seq(120, 140, length.out = 100), nrow(Bird.parasites), TRUE)
Longitude <- sample(seq(-19, -22, length.out = 100), nrow(Bird.parasites), TRUE)
coords <- data.frame(Latitude = Latitude, Longitude = Longitude)
```

The syntax for the `MRFcov_spatial`

function is nearly
identical to `MRFcov`

, with the exception that
`coords`

must be supplied

```
CRFmod_spatial <- MRFcov_spatial(data = Bird.parasites, n_nodes = 4,
family = 'binomial', coords = coords)
```

Interpretation is also identical to `MRFcov`

objects.
Here, key coefficients are those that are retained *after*
accounting for spatial influences

`CRFmod_spatial$key_coefs$Hzosteropis`

Finally, we can compare fits of spatial and non-spatial models just as we did for MRFs and CRFs above

```
cv_MRF_diag_rep_spatial(data = Bird.parasites, n_nodes = 4,
n_cores = 3, family = 'binomial', plot = T, compare_null = T,
coords = coords)
```

title: “MRFs and CRFs for Bird.parasites data” output: rmarkdown::html_vignette vignette: > % % % —

We can explore the model’s primary functions using a test dataset
that is available with the package. Load the `Bird.parasites`

dataset, which contains binary occurrences of four avian blood parasites
in New Caledonian *Zosterops* species (available in its original
form at Dryad; Clark *et al* 2016). A single continuous
covariate is also included (`scale.prop.zos`

), which reflects
the relative abundance of *Zosterops* species (compared to other
avian species) among different sample sites

```
library(MRFcov)
data("Bird.parasites")
```

The `Bird.parasites`

dataset is already in the appropriate
structure for running `MRFcov`

models; however, it is useful
to understand how this was done. Using the raw dataset (downloaded from
`Dryad`

at the above link), we created a scaled continuous
covariate to represent *Zosterops* spp. proportional captures in
each sample site (as an estimate of relative abundance)

```
#Not run
#install.packages(dplyr)
data.paras = data.frame(data.paras) %>%
dplyr::group_by(Capturesession,Genus) %>%
dplyr::summarise(count = dlyr::n()) %>%
dplyr::mutate(prop.zos = count / sum(count)) %>%
dplyr::left_join(data.paras) %>%
dplyr::ungroup() %>% dplyr::filter(Genus == 'Zosterops') %>%
dplyr::mutate(scale.prop.zos = as.vector(scale(prop.zos)))
data.paras <- data.paras[, c(12:15, 23)]
```

You can visualise the dataset to see how analysis data needs to be
structured. In short, for (`family = "binomial"`

) models,
node variable (i.e. species) occurrences should be included as binary
variables (`1`

s and `0`

s) as the
`n_nodes`

left-most variables in `data`

. Note that
Gaussian continuous variables (`family = "gaussian"`

) and
Poisson non-negative counts (`family = "poisson"`

) are also
supported in `MRFcov`

. Any covariates can be included as the
right-most variables. It is recommended that these covariates all be on
a similar scale, ideally using the `scale`

function for
continuous covariates (or similar) so that covariates have roughly
`mean = 0`

and `sd = 1`

, as this makes LASSO
variable selection and comparisons of effect sizes very
straightforward

```
help("Bird.parasites")
View(Bird.parasites)
```

We first test for interactions using a Markov Random Fields (MRF)
model without covariates. We set `n_nodes`

as the number of
species to be represented in the graphical model (`4`

). We
also need to specify the family for the model (`binomial`

in
this case). We do not specify the level of penalisation used by the
LASSO algorithm. Instead, this is optimised separately for each species
through cross validation (using function `cv.glmnet`

in the
`glmnet`

package). This ensures the log-likelihood of each
species is properly maximised before unifying them into an undirected
graph. Finally, it is important to note that for MRF model functions, we
can specify the number of processing cores to split the job across
(i.e. `n_cores`

). In this case we will only use 1 core, but
higher numbers *may* increase speed (if more cores are
available). Check available cores using the `detectCores()`

function in the `parallel`

package

`MRF_fit <- MRFcov(data = Bird.parasites[, c(1:4)], n_nodes = 4, family = 'binomial')`

```
## Leave-one-out cv used for the following low-occurrence (rare) nodes:
## Microfilaria ...
## Fitting MRF models in sequence using 1 core ...
```

The message above can be useful for identifying nodes (i.e. species)
that are very rare or very common, as these *can* be difficult to
properly model. In this case, the `Microfilaria`

node is
fairly rare, and so the function automatically reverts to using
leave-one-out cross-validation to optimise parameters. This can be slow,
so be aware when attempting to use large datasets that contain many very
rare or very common species. Now that the model has converged, we can
plot the estimated interaction coefficients as a heatmap using
`plotMRF_hm`

. Note that we can specify the names of nodes
should we want to change them

```
plotMRF_hm(MRF_mod = MRF_fit, main = 'MRF (no covariates)',
node_names = c('H. zosteropis', 'H. killangoi',
'Plasmodium', 'Microfilaria'))
```

Note that other plotting methods an be used as desired. For instance,
to plot these as a network instead, we can simply extract the adjacency
matrix and plot it using standard `igraph`

functions

```
net <- igraph::graph.adjacency(MRF_fit$graph, weighted = T, mode = "undirected")
igraph::plot.igraph(net, layout = igraph::layout.circle,
edge.width = abs(igraph::E(net)$weight),
edge.color = ifelse(igraph::E(net)$weight < 0,
'blue',
'red'))
```

We can now run a Conditional Random Fields (CRF) model using the
provided continuous covariate (`scale.prop.zos`

). Again, each
species’ regression is optimised separately using LASSO regularization.
Note that any columns in `data`

to the right of column
`n_nodes`

will be presumed to represent covariates if we
don’t specify an `n_covariates`

argument

`MRF_mod <- MRFcov(data = Bird.parasites, n_nodes = 4, family = 'binomial')`

```
## Leave-one-out cv used for the following low-occurrence (rare) nodes:
## Microfilaria ...
## Fitting MRF models in sequence using 1 core ...
```

Visualise the estimated species interaction coefficients as a
heatmap. These represent predicted interactions when the covariate is
set to its mean (i.e. in this case, when
`scale.prop.zos = 0`

). If we had included other covariates,
then this graph would represent interactions predicted when *all*
covariates were set at their means

`plotMRF_hm(MRF_mod = MRF_mod)`

Regression coefficients and their relative importances can be
accessed as well. This call returns a matrix of the raw coefficient, as
well as standardised coefficients (standardised by the `sd`

of the covariate). Standardisation in this way helps to compare the
relative influences of each parameter on the target species’ occurrence
probability, but in general the two coefficients will be identical
(unless users have not pre-scaled their covariates). The list also
contains contains each variable’s relative importance (calculated using
the formula `B^2 / sum(B^2)`

, where the vector of
`B`

s represents regression coefficients for predictor
variables). Variables with an underscore (`_`

) indicate an
interaction between a covariate and another node, suggesting that
conditional dependencies of the two nodes vary across environmental
gradients. Because of this, it is recommended to avoid using column
names with `_`

in them

`MRF_mod$key_coefs$Hzosteropis`

```
## Variable Rel_importance Standardised_coef Raw_coef
## 1 Hkillangoi 0.67489383 -2.4496398 -2.4496398
## 5 scale.prop.zos_Microfilaria 0.12365075 -1.0485349 -1.0485349
## 3 Microfilaria 0.09374486 0.9129736 0.9129736
## 4 scale.prop.zos 0.09351624 -0.9118597 -0.9118597
## 2 Plas 0.01147868 -0.3194702 -0.3194702
```

Finally, a useful capability is to generate some fake data and test predictions. For instance, say we want to know how frequently malaria parasite infections are predicted to occur in sites with high occurrence of microfilaria

```
fake.dat <- Bird.parasites
fake.dat$Microfilaria <- rbinom(nrow(Bird.parasites), 1, 0.8)
fake.preds <- predict_MRF(data = fake.dat, MRF_mod = MRF_mod)
```

The returned object from `predict_MRF`

depends on the
family of the model. For `family = "binomial"`

, we get a list
including both linear prediction probabilities and binary predictions
(where a linear prediction probability `> 0.5`

equates to
a binary prediction of `1`

). These binary predictions can be
used to estimate parasite prevalence

```
H.zos.pred.prev <- sum(fake.preds$Binary_predictions[, 'Hzosteropis']) / nrow(fake.preds$Binary_predictions)
Plas.pred.prev <- sum(fake.preds$Binary_predictions[, 'Plas']) / nrow(fake.preds$Binary_predictions)
Plas.pred.prev
```

`## [1] 0.403118`

A key step in the process of model exploration is to determine
whether inclusion of covariates actually improves model fit and is
warranted when estimating interaction parameters. This is
straightforward for binomial models, as we can compare classification
accuracy quickly and easily. The `cv_diag`

functions will fit
models and then determine their predictive performances against the
supplied data. We can also compare MRF and CRF models directly to
determine whether covariates are warranted. For this dataset,
considering that parasite infections are quite rare, we are primarily
interested in maximising model Sensitivity (capacity to successfully
predict positive infections)

```
mod_fits <- cv_MRF_diag_rep(data = Bird.parasites, n_nodes = 4,
n_cores = 1, family = 'binomial', plot = F,
compare_null = T,
n_folds = 10)
```

```
## Generating node-optimised Conditional Random Fields model
##
## Generating Markov Random Fields model (no covariates)
##
## Calculating model predictions of the supplied data
## Generating CRF predictions ...
## Generating null MRF predictions ...
##
## Calculating predictive performance across test folds
## Processing cross-validation run 1 of 10 ...
## Processing cross-validation run 2 of 10 ...
## Processing cross-validation run 3 of 10 ...
## Processing cross-validation run 4 of 10 ...
## Processing cross-validation run 5 of 10 ...
## Processing cross-validation run 6 of 10 ...
## Processing cross-validation run 7 of 10 ...
## Processing cross-validation run 8 of 10 ...
## Processing cross-validation run 9 of 10 ...
## Processing cross-validation run 10 of 10 ...
```

```
# CRF (with covariates) model sensitivity
quantile(mod_fits$mean_sensitivity[mod_fits$model == 'CRF'], probs = c(0.05, 0.95))
```

```
## 5% 95%
## 0.2410751 0.5297269
```

```
# MRF (no covariates) model sensitivity
quantile(mod_fits$mean_sensitivity[mod_fits$model != 'CRF'], probs = c(0.05, 0.95))
```

```
## 5% 95%
## 0.1426108 0.3703704
```

We now may want to fit models to bootstrapped subsets of the data to
account for parameter uncertainty. Users can change the proportion of
observations to include in each bootstrap run with the
`sample_prop`

option.

`booted_MRF <- bootstrap_MRF(data = Bird.parasites, n_nodes = 4, family = 'binomial', n_bootstraps = 10, n_cores = 1, sample_prop = 0.9)`

`## Fitting bootstrap_MRF models in sequence using 1 core...`

```
## Warning: from glmnet C++ code (error code -91); Convergence for 91th lambda
## value not reached after maxit=25000 iterations; solutions for larger lambdas
## returned
```

```
## Warning: from glmnet C++ code (error code -90); Convergence for 90th lambda
## value not reached after maxit=25000 iterations; solutions for larger lambdas
## returned
```

```
## Warning: from glmnet C++ code (error code -88); Convergence for 88th lambda
## value not reached after maxit=25000 iterations; solutions for larger lambdas
## returned
```

```
## Warning: from glmnet C++ code (error code -89); Convergence for 89th lambda
## value not reached after maxit=25000 iterations; solutions for larger lambdas
## returned
```

Finally, we can explore regression coefficients to get a better
understanding of just how important interactions are for predicting
species’ occurrence probabilities (in comparison to other covariates).
This is perhaps the strongest property of CRFs, as comparing the
relative importances of interactions and fixed covariates using
competing methods (such as Joint Species Distribution Models) is
difficult. The `bootstrap_MRF`

function conveniently returns
a matrix of important coefficients for each node in the graph, as well
as their relative importances

`booted_MRF$mean_key_coefs$Hzosteropis`

```
## Variable Rel_importance Mean_coef
## 3 scale.prop.zos_Hkillangoi 0.82793968 8.908619
## 1 Hkillangoi 0.13618891 -3.613116
## 2 Microfilaria 0.01320045 1.124878
## 4 scale.prop.zos_Microfilaria 0.01064800 -1.010288
```

`booted_MRF$mean_key_coefs$Hkillangoi`

```
## Variable Rel_importance Mean_coef
## 3 scale.prop.zos_Hzosteropis 0.81243353 8.908619
## 1 Hzosteropis 0.13363829 -3.613116
## 2 Microfilaria 0.03686441 -1.897667
```

`booted_MRF$mean_key_coefs$Plas`

```
## Variable Rel_importance Mean_coef
## 3 Microfilaria 0.61653283 1.7503420
## 4 scale.prop.zos 0.21443166 -1.0322604
## 1 Hzosteropis 0.06704136 -0.5771866
## 5 scale.prop.zos_Microfilaria 0.06246871 0.5571550
## 2 Hkillangoi 0.03371410 -0.4093083
```

`booted_MRF$mean_key_coefs$Microfilaria`

```
## Variable Rel_importance Mean_coef
## 2 Hkillangoi 0.32113332 -1.8976674
## 3 Plas 0.27320648 1.7503420
## 1 Hzosteropis 0.11283812 1.1248778
## 4 scale.prop.zos 0.11143411 -1.1178576
## 5 scale.prop.zos_Hzosteropis 0.09101970 -1.0102878
## 6 scale.prop.zos_Hkillangoi 0.06268627 -0.8384239
## 7 scale.prop.zos_Plas 0.02768199 0.5571550
```

Users can also use the `predict_MRFnetworks`

function to
calculate network statistics for each node in each observation or to
generate adjacency matrices for each observation. By default, this
function generates a list of `igraph`

adjacency matrices, one
for each row in `data`

, which can be used to make network
plots using a host of other packages. Note, both this function and the
`predict_MRF`

function rely on `data`

that has a
structure exactly matching to the data that was used to fit the model.
In other words, the column names and column order need to be identical.
The `cutoff`

argument is important for binary problems, as
this specifies the probability threshold for stating whether or not a
species should be considered present at a site (and thus, whether their
interactions will be present). Here, we state that a predicted
occurrence above `0.33`

is sufficient

```
adj_mats <- predict_MRFnetworks(data = Bird.parasites,
MRF_mod = booted_MRF,
metric = 'eigencentrality',
cutoff = 0.33)
colnames(adj_mats) <- colnames(Bird.parasites[, 1:4])
apply(adj_mats, 2, summary)
```

```
## Hzosteropis Hkillangoi Plas Microfilaria
## Min. 0.0000000 0.0000000 0.0000000 0.00000000
## 1st Qu. 0.0000000 0.0000000 0.0000000 0.00000000
## Median 0.0000000 0.0000000 0.0000000 0.00000000
## Mean 0.1759286 0.1447661 0.1270602 0.08908686
## 3rd Qu. 0.3873937 0.0000000 0.1070651 0.00000000
## Max. 1.0000000 1.0000000 1.0000000 1.00000000
```

Lastly, `MRFcov`

has the capability to account for
possible spatial autocorrelation when estimating interaction parameters.
To do this, we incorporate functions from the `mgcv`

package
to include smoothed Gaussian process spatial regression splines in each
node-wise regression. The user must supply a two-column
`data.frame`

called `coords`

, which will ideally
contain Latitude and Longitude values for each row in `data`

.
We don’t have these coordinates for the `Bird.parasites`

dataset, so we will instead create some fake coordinates to showcase the
model. Note, these commands were not run here, but feel free to move
through them as you did for the above examples

```
Latitude <- sample(seq(120, 140, length.out = 100), nrow(Bird.parasites), TRUE)
Longitude <- sample(seq(-19, -22, length.out = 100), nrow(Bird.parasites), TRUE)
coords <- data.frame(Latitude = Latitude, Longitude = Longitude)
```

The syntax for the `MRFcov_spatial`

function is nearly
identical to `MRFcov`

, with the exception that
`coords`

must be supplied

```
CRFmod_spatial <- MRFcov_spatial(data = Bird.parasites, n_nodes = 4,
family = 'binomial', coords = coords)
```

Interpretation is also identical to `MRFcov`

objects.
Here, key coefficients are those that are retained *after*
accounting for spatial influences

`CRFmod_spatial$key_coefs$Hzosteropis`

Finally, we can compare fits of spatial and non-spatial models just as we did for MRFs and CRFs above

```
cv_MRF_diag_rep_spatial(data = Bird.parasites, n_nodes = 4,
n_cores = 3, family = 'binomial', plot = T, compare_null = T,
coords = coords)
```

d4590f253a4e9a4e75e82d27d6259f5c053d2a3c