The **disaggregation** package contains functions to run
Bayesian disaggregation models. Aggregated response data over large
heterogenous regions can be used alongside fine-scale covariate
information to predict fine-scale response across the region. The github
page for this package can be found here.

Install **disggregation** using:

`::install_github("aknandi/disaggregation") devtools`

The key functions are `prepare_data`

,
`fit_model`

and `predict`

. The
`prepare_data`

function takes the aggregated data and
covariate data to be used in the model and produces an object to be use
by `fit_model`

. This functions runs the disaggregation model
and the out can be passed to `predict`

to produce fine-scale
predicted maps of the response variable.

To use the disaggregation `prepare_data`

function, you
must have the aggregated data as a `SpatialPolygonDataFrame`

object and a `RasterStack`

of the covariate data to be used
in the model.

We will demonstrate an example of the **disaggregation**
package using areal data of leukemia incidence in New York, using data
from the package `SpatialEpi`

.

```
library(SpatialEpi, quietly = TRUE)
library(dplyr, quietly = TRUE)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(sp, quietly = TRUE)
library(raster, quietly = TRUE)
#>
#> Attaching package: 'raster'
#> The following object is masked from 'package:dplyr':
#>
#> select
library(disaggregation, quietly = TRUE)
<- NYleukemia$spatial.polygon
map <- NYleukemia$data
df
<- SpatialPolygonsDataFrame(map, df)
polygon_data
polygon_data#> class : SpatialPolygonsDataFrame
#> features : 281
#> extent : -76.73743, -75.2395, 41.9978, 43.41827 (xmin, xmax, ymin, ymax)
#> crs : +proj=longlat
#> variables : 3
#> names : censustract.FIPS, cases, population
#> min values : 36007000100, 0.00014, 9
#> max values : 36109992300, 9.28601, 13015
```

Now we simulate two covariate rasters for the area of interest and
make a `RasterStack`

. They are simulated at the resolution of
approximately 1km2.

```
<- 111*(polygon_data@bbox[, 2] - polygon_data@bbox[, 1])
extent_in_km <- floor(extent_in_km[[1]])
n_pixels_x <- floor(extent_in_km[[2]])
n_pixels_y <- raster::raster(ncol = n_pixels_x, nrow = n_pixels_y)
r <- raster::setExtent(r, raster::extent(polygon_data))
r <- sapply(1:raster::ncell(r), function(x) rnorm(1, ifelse(x %% n_pixels_x != 0, x %% n_pixels_x, n_pixels_x), 3))
r[] <- raster::raster(ncol = n_pixels_x, nrow = n_pixels_y)
r2 <- raster::setExtent(r2, raster::extent(polygon_data))
r2 <- sapply(1:raster::ncell(r), function(x) rnorm(1, ceiling(x/n_pixels_y), 3))
r2[] <- raster::stack(r, r2)
cov_stack <- raster::scale(cov_stack) cov_stack
```

We also create a population raster. This is to allow the model to correctly aggregated the pixel values to the polygon level. For this simple example we assume that the population within each polygon is uniformly distributed.

```
<- raster::extract(r, polygon_data)
extracted <- sapply(extracted, length)
n_cells @data$pop_per_cell <- polygon_data@data$population/n_cells
polygon_data<- rasterize(polygon_data, cov_stack, field = 'pop_per_cell') pop_raster
```

To correct small inconsistencies in the polygon geometry, we run the line below

```
<- rgeos::gBuffer(polygon_data, byid = TRUE, width = 0)
polygon_data #> Warning in wkt(obj): CRS object has no comment
#> Warning in wkt(obj): CRS object has no comment
#> Warning in rgeos::gBuffer(polygon_data, byid = TRUE, width = 0): Spatial object
#> is not projected; GEOS expects planar coordinates
```

Now we have setup the data we can use the `prepare_data`

function to create the objects needed to run the disaggregation model.
The name of the response variable and id variable in the
`SpatialPolygonsDataFrame`

should be specified.

The user can also control the parameters of the mesh that is used to
create the spatial field. The mesh is created by finding a tight
boundary around the polygon data, and creating a fine mesh within the
boundary and a coarser mesh outside. This speeds up computation time by
only having a very fine mesh within the area of interest and having a
small region outside with a coarser mesh to avoid edge effects. The mesh
parameters: `concave`

, `convex`

and
`resolution`

refer to the parameters used to create the mesh
boundary using the inla.noncovex.hull
function, while the mesh parameters `max.edge`

,
`cut`

and `offset`

refer to the parameters used to
create the mesh using the inla.mesh.2d
function.

```
<- prepare_data(polygon_data,
data_for_model
cov_stack,
pop_raster,response_var = 'cases',
id_var = 'censustract.FIPS',
mesh.args = list(cut = 0.01,
offset = c(0.1, 0.5),
max.edge = c(0.1, 0.2),
resolution = 250),
na.action = TRUE,
ncores = 1)
```

`plot(data_for_model)`

Now have our data object we are ready to run the model. Here we can specify the likelihood function as gaussian, binomial or poisson, and we can specify the link function as logit, log or identity. The disaggregation model makes predictions at the pixel level:

\(link(pred_i) = \beta_0 + \beta X + GP(s_i) + u_i\)

where \(X\) are the covariates, \(GP\) is the gaussian random field and \(u_i\) is the iid random effect. The pixel predictions are then aggregated to the polygon level using the weighted sum (via the aggregation raster, \(agg_i\)):

\(cases_j = \sum_{i \epsilon j} pred_i \times agg_i\)

\(rate_j = \frac{\sum_{i \epsilon j} pred_i \times agg_i}{\sum_{i \epsilon j} agg_i}\)

The different likelihood correspond to slightly different models (\(y_j\) is the response count data):

**Gaussian** (\(\sigma_j\) is the dispersion of the polygon
data),

\(dnorm(y_j/\sum agg_i, rate_j, \sigma_j)\)

Here \(\sigma_j = \sigma \sqrt{\sum agg_i^2} / \sum agg_i\), where \(\sigma\) is the dispersion of the pixel data, a parameter learnt by the model.

**Binomial** (For a survey in polygon j, \(y_j\) is the number positive and \(N_j\) is the number tested)

\(dbinom(y_j, N_j, rate_j)\)

**Poisson** (predicts incidence count)

\(dpois(y_j, cases_j)\)

The user can also specify the priors for the regression parameters. For the field, the user specifies the pc priors for the range, \(\rho_{min}\) and \(\rho_{prob}\), where \(P(\rho < \rho_{min}) = \rho_{prob}\), and the variation, \(\sigma_{min}\) and \(\sigma_{prob}\), where \(P(\sigma > \sigma_{min}) = \sigma_{prob}\), in the field. For the iid effect, the user also specifies pc priors.

By default the model contains a spatial field and a polygon iid
effect. These can be turned off in the `disag_model`

function, using `field = FALSE`

or
`iid = FALSE`

.

```
<- disag_model(data_for_model,
model_result iterations = 1000,
family = 'poisson',
link = 'log',
priors = list(priormean_intercept = 0,
priorsd_intercept = 2,
priormean_slope = 0.0,
priorsd_slope = 0.4,
prior_rho_min = 3,
prior_rho_prob = 0.01,
prior_sigma_max = 1,
prior_sigma_prob = 0.01,
prior_iideffect_sd_max = 0.05,
prior_iideffect_sd_prob = 0.01))
#> Fitting model. This may be slow.
#> Fitting model. This may be slow.
```

`plot(model_result)`

Now we have the results from the model of the fitted parameters, we
can predict Leukemia incidence rate at fine-scale (the scale of the
covariate data) across New York. The `predict`

function takes
the model result and predicts both the mean raster surface and predicts
and summarises `N`

parameter draws, where `N`

is
set by the user (default 100). The uncertainty is summarised via the
confidence interval set by the user (default 95% CI).

```
<- predict(model_result,
preds N = 100,
CI = 0.95)
plot(preds)
```