This is a short vignette illustrating the `bssm`

package.
For more detailed exposition, please see the corresponding R Journal
paper:

Jouni Helske and Matti Vihola (2021). “bssm: Bayesian Inference of Non-linear and Non-Gaussian State Space Models in R”. The R Journal (2021) 13:2, pages 578-589. Link to the paper.

State space models (SSM) are latent variable models which are
commonly applied in analysing time series data due to their flexible and
general framework (cf. Durbin and Koopman 2012). For
`R`

(R Core Team
2020), there is large number of packages available for state
space modelling, especially for the two special cases. First special
case is linear-Gaussian SSM (LGSSM) where both the observation and state
densities are Gaussian with linear relationships with the states.
Another special case is SSM with discrete state space, which are
sometimes called hidden Markov models (HMM). What is special about these
two classes of models is that the marginal likelihood function, and the
conditional state distributions (conditioned on the observations) of
these models are analytically tractable, making inference relatively
straightforward. See for example S. Helske and
Helske (2019) for review of some of the
`R`

packages dealing with these type of models. The
`R`

package `bssm`

is designed for Bayesian
inference of general state space models with non-Gaussian and/or
non-linear observational and state equations. The package aims to
provide easy-to-use and efficient functions for fully Bayesian inference
of common time series models such basic structural time series model
(BSM) (Harvey
1989) with exogenous covariates, simple stochastic volatility
models, and discretized diffusion models, making it straightforward and
efficient to make predictions and other inference in a Bayesian
setting.

The motivation behind the `bssm`

package is in (Vihola, Helske,
and Franks 2020) which suggests a new computationally
efficient, parallelisable approach for Bayesian inference of state space
models. The core idea is to use fast approximate Markov chain Monte
Carlo (MCMC) targeting the approximate marginal posterior of the
hyperparameters (i.e. unknown variables excluding latent state
variables), which is then used in importance sampling type weighting
phase which provides asymptotically exact samples from the joint
posterior of hyperparameters and the hidden states. In addition to this
the two-stage procedure, standard pseudo-marginal MCMC and so called
delayed acceptance pseudo-marginal MCMC are also supported. For more
details, see (J.
Helske and Vihola 2021). There is also separate vignette for
nonlinear models as well as for discretized diffusion models.

Denote a sequence of observations \((y_1,\ldots,y_T)\) as \(y_{1:T}\), and sequence of latent state
variables \((\alpha_1,\ldots,
\alpha_T)\) as \(\alpha_{1:T}\).
A general state space model consists of two parts: observation level
densities \(g_t(y_t | \alpha_t)\) and
latent state transition densities \(\mu_t(\alpha_{t+1} | \alpha_t)\). We first
focus on the case where the state transitions are linear-Gaussian: \[
\alpha_{t+1} = c_t + T_t \alpha_t + R_t \eta_t,
\] where \(c_t\) is known input
vector (often omitted), and \(T_t\) and
\(R_t\) are a system matrices which can
depend on unknown parameters. Also, \(\eta_t
\sim N(0, I_k)\) and \(\alpha_1 \sim
N(a_1, P_1)\) independently of each other. For observation level
density \(g_t\), the `bssm`

package currently supports basic stochastic volatility model and general
exponential family state space models.

For exponential family models, the observation equation has a general form

\[ g_t(y_t | d_t + Z_t \alpha_t, x'_t\beta, \phi, u_t), \] where \(d_t\) is a again known input, \(x_t\) contains the exogenous covariate values at time \(t\), with \(\beta\) corresponding to the regression coefficients. Parameter \(\phi\) and the known vector \(u_t\) are distribution specific and can be omitted in some cases. Currently, following observational level distributions are supported:

Gaussian distribution: \(g_t(y_t | Z_t \alpha_t, x'_t\beta) = x'_t \beta + Z_t \alpha_t + H_t \epsilon_t\) with \(\epsilon_t \sim N(0, 1)\).

Poisson distribution: \(g_t(y_t | Z_t \alpha_t, x'_t \beta, u_t) = \textrm{Poisson}(u_t \exp(x'_t \beta + Z_t \alpha_t))\), where \(u_t\) is the known exposure at time \(t\).

Binomial distribution: \(g_t(y_t | Z_t \alpha_t, x'_t \beta, u_t) = \textrm{binomial}(u_t, \exp(x'_t \beta + Z_t \alpha_t) / (1 + \exp(x'_t \beta + Z_t \alpha_t)))\), where \(u_t\) is the number of trials and \(\exp(x_t \beta + Z_t \alpha_t) / (1 + \exp(x'_t \beta + Z_t \alpha_t))\) is the probability of the success.

Negative binomial distribution: \(g_t(y_t | Z_t \alpha_t, x'_t \beta, \phi, u_t) = \textrm{negative binomial}(\exp(x'_t \beta + Z_t \alpha_t), \phi, u_t)\), where \(u_t \exp(x'_t \beta + Z_t \alpha_t)\) is the expected value and \(\phi\) is the dispersion parameter (\(u_t\) is again exposure term).

Gamma distribution: \(g_t(y_t | d_t + Z_t \alpha_t, \phi, u_t) = \textrm{Gamma}(\exp( d_t + Z_t \alpha_t), \phi, u_t)\), where \(u_t \exp( d_t + Z_t \alpha_t)\) is the expected value, \(\phi\) is the shape parameter, and \(u_t\) is a known offset term.

For stochastic volatility model, there are two possible parameterizations available. In general for we have \[ y_t = \sigma \exp(\alpha_t / 2)\epsilon_t, \quad \epsilon_t \sim N(0, 1), \] and \[ \alpha_{t+1} = \mu + \rho (\alpha_t - \mu) + \sigma_{\eta} \eta_t, \] with \(\alpha_1 \sim N(\mu, \sigma^2_{\eta} / (1-\rho^2))\). For identifiability purposes we must either choose \(\sigma=1\) or \(\mu=0\). Although analytically identical, the parameterization with \(\mu\) is often preferable in terms of computational efficiency.

Typically some of the model components such as \(\beta\), \(T_t\) or \(R_t\) depend on unknown parameter vector \(\theta\), so \(g_t(y_t | \alpha_t)\) and \(\mu_t(\alpha_{t+1} | \alpha_t)\) depend implicitly on \(\theta\). Our goal is to perform Bayesian inference of the joint posterior of \(\alpha_{1:T}\) and \(\theta\).

For multivariate models, these distributions can be combined arbitrarily, except the stochastic volatility model case which is currently handled separately. Also, for fully Gaussian model, the observational level errors can be correlated, i.e. \(Cov(\epsilon_t) = H_tH'_t\).

The general non-linear Gaussian model in the `bssm`

has
following form:

\[
y_t = Z(t, \alpha_t, \theta) + H(t, \alpha_t, \theta)\epsilon_t,\\
\alpha_{t+1} = T(t, \alpha_t, \theta) + R(t, \alpha_t, \theta)\eta_t,\\
\alpha_1 \sim N(a_1(\theta), P_1(\theta)),
\] with \(t=1,\ldots, n\), \(\epsilon_t \sim N(0,\textrm{I}_p)\), and
\(\eta \sim N(0,\textrm{I}_k)\). Here
vector \(\theta\) contains the unknown
model parameters. Due to their generality and the need to repeated calls
within MCMC, functions \(T(\cdot)\),
\(H(\cdot)\), \(T(\cdot)\), \(R(\cdot)\),\(a_1(\cdot)\), \(P_1(\cdot)\), as well as functions defining
the Jacobians of \(Z(\cdot)\) and \(T(\cdot)\) needed by the extended Kalman
filter and the prior distribution for \(\theta\) must be defined by user as a
external pointers to user-defined `C++`

functions. All of
these functions can also depend on some known parameters, defined as
`known_params`

(vector) and `known_tv_params`

(matrix with \(n\) columns) arguments
to `ssm_nlg`

function. See the growth model vignette^{1} for a
template for these functions.

The `bssm`

package also supports models where the state
equation is defined as a continuous time diffusion model of form \[
\textrm{d} \alpha_t =
\mu(\alpha_t,\theta) \textrm{d} t +
\sigma(\alpha_t, \theta) \textrm{d} B_t, \quad t\geq0,
\] where \(B_t\) is a Brownian
motion and where \(\mu\) and \(\sigma\) are scalar-valued functions, with
the univariate observation density \(g(y_k |
\alpha_k)\) defined at integer times \(k=1\ldots,n\). As these transition
densities are generally unavailable for non-linear diffusions, we use
Milstein time-discretisation scheme for approximate simulation with
bootstrap particle filter. Fine discretisation mesh gives less bias than
the coarser one, with increased computational complexity. These models
are also defined via `C++`

snippets, see the SDE vignette for
details.

Given the prior \(p(\theta)\), the joint posterior of \(\theta\) and \(\alpha_{1:T}\) is given as

\[ p(\alpha_{1:T}, \theta | y_{1:T}) \propto p(\theta) p(\alpha_{1:T}, y_{1:T} | \theta) = p(\theta) p(y_{1:T} | \theta) p( \alpha_{1:T} | y_{1:T}, \theta) \]

where \(p(y_{1:T} | \theta)\) is the marginal likelihood, and \(p(\alpha_{1:T} | y_{1:T}, \theta)\) is often referred as a smoothing distribution. However, instead of targeting this joint posterior, it is typically more efficient to target the marginal posterior \(p(\theta | y)\), and then given the sample \(\{\theta^{i}\}^n_{i=1}\) from this marginal posterior, simulate states \(\alpha^i_{1:T}\) from the smoothing distribution \(p(\alpha_{1:T} | y_{1:T}, \theta^i)\) for \(i=1\ldots,n\).

For Gaussian models given the parameters \(\theta\), the marginal likelihood \(p(y_{1:T} | \theta)\) can be computed using
the well known Kalman filter recursions, and there are several
algorithms for simulating the states \(\alpha_{1:T}\) from the smoothing
distribution \(p(\alpha_{1:T} |
y_{1:T})\) (see for example Durbin and
Koopman (2012)). Therefore we can
straightforwardly apply standard MCMC algorithms. In `bssm`

,
we use an adaptive random walk Metropolis algorithm based on RAM (Vihola 2012)
where we fix the target acceptance rate beforehand. There RAM algorithm
is provided by the `ramcmc`

package (J. Helske
2018).

For non-linear/non-Gaussian models, the marginal likelihood \(p(y_{1:T} | \theta)\) is typically not
available in closed form. Thus we need to resort to simulation methods,
which leads to pseudo-marginal MCMC algorithm Andrieu and Roberts (2009). `bssm`

also
supports more efficient inference algorithms based on (intermediate)
approximations, see J. Helske and Vihola (2021) and Vihola, Helske, and Franks (2020).

Main functions of `bssm`

is written in `C++`

,
with help of `Rcpp`

(Eddelbuettel and François 2011) and
`RcppArmadillo`

(Eddelbuettel and Sanderson 2014)
packages. On the `R`

side, package uses S3 methods in order to
provide relatively unified workflow independent of the type of the model
one is working with. The model building functions such as
`bsm_ng`

and `svm`

are used to construct the
actual state models which can be then passed to other methods, such as
`logLik`

and `run_mcmc`

which compute the
log-likelihood value and run MCMC algorithm respectively. We will now
briefly describe the main functions and methods of `bssm`

,
for more detailed descriptions of different function arguments and
return values, see the corresponding documentation in
`R`

.

For linear-Gaussian models, `bssm`

offers functions
`bsm_lg`

for basic univariate structural time series models
(BSM), `ar1`

for univariate, possibly noisy AR(1) process, as
well as general `ssm_ulg`

and `ssm_mlg`

for
arbitrary linear gaussian models. As an example, consider a Gaussian
local linear trend model of form

\[
\begin{aligned}
y_t &= \mu_t + \epsilon_t,\\
\mu_{t+1} &= \mu_t + \nu_t + \eta_t,\\
\nu_{t+1} &= \nu_t + \xi_t,
\end{aligned}
\] with zero-mean Gaussian noise terms \(\epsilon_t, \eta_t, \xi_t\) with unknown
standard deviations. This model can be built with `bsm_lg`

function as

`library("bssm")`

```
##
## Attaching package: 'bssm'
```

```
## The following object is masked from 'package:base':
##
## gamma
```

```
data("nhtemp", package = "datasets")
prior <- halfnormal(1, 10)
bsm_model <- bsm_lg(y = nhtemp, sd_y = prior, sd_level = prior,
sd_slope = prior)
```

Here we use helper function `halfnormal`

which defines
half-Normal prior distribution for the standard deviation parameters,
with first argument defining the initial value of the parameter, and
second defines the scale parameter of the half-Normal distribution.
Other prior options are `normal`

and
`uniform`

.

For non-Gaussian models, function `bsm_ng`

can be used for
constructing an BSM model where the observations are assumed to be
distributed according to Poisson, binomial, negative binomial, or Gamma
distribution. The syntax is nearly identical as in case of
`bsm_lg`

, but we now define also the distribution via
argument `distribution`

, and depending on the model, we can
also define parameters `u`

and `phi`

. For Poisson
and negative binomial models, the known parameter `u`

corresponds to the offset term, whereas in case of binomial model
`u`

defines the number of trials. For negative binomial
model, argument `phi`

defines the dispersion term, which can
be given as a fixed value, or as a prior function. For same
observational densities, a model where the state equation follows a
first order autoregressive process can be defined using the function
`ng_ar1`

. Finally, a stochastic volatility model can be
defined using a function `svm`

, and an arbitrary
linear-Gaussian state model with Poisson, binomial or negative binomial
distributed observations can be defined with `ssm_ung`

and
`ssm_mng`

for univariate and multivariate models
respectively.

For models where the state equation is no longer linear-Gaussian, we
can use our pointer-based C++ interface with the function
`ssm_nlg`

. Diffusion models can be defined with the function
`ssm_sde`

. For details regarding these types of models, see
the corresponding vignettes `growth_model`

and
`sde_model`

respectively.

Filtering refers to estimating the conditional densities of the
hidden states at time \(t\), given the
observations up to that point. For linear-Gaussian models, these
densities can be efficiently computed using the Kalman filter
recursions. The `bssm`

has a method `kfilter`

for
this task. For models defined with the `ssm_mng`

,`bsm_ng`

, `ar1_ng`

, and `svm`

functions, `kfilter`

will first construct an approximating
Gaussian model for which the Kalman filter is then used. For details of
this approximation, see Durbin and Koopman (1997) and Vihola, Helske, and Franks (2020). For non-linear models
defined by `ssm_nlg`

it is possible to perform filtering
using extended Kalman filter (EKF) with the function `ekf`

,
or unscented Kalman filter with the function `ukf`

. It is
also possible to use iterated EKF (IEKF) by changing the argument
`iekf_iter`

of the `ekf`

function. Compared to
EKF, in IEKF the observation equation is linearized iteratively within
each time step.

While Kalman filter solves the filtering problem exactly in case of
linear-Gaussian models, EKF, UKF, and the filtering based on the
approximating Gaussian models produce only approximate, possibly biased
filtering estimates for general models. This problem can be solved by
the use of particle filters (PF). These sequential Monte Carlo methods
are computationally more expensive, but can in principle deal with
almost arbitrary state space models. The `bssm`

supports
general bootstrap particle filter (BSF) for all model classes of the
`bssm`

. For `ssm_mng`

,`bsm_ng`

,
`ar1_ng`

, and `svm`

models we recommend the
particle filter called \(\psi\)-APF
(Vihola,
Helske, and Franks 2020) (see also another vignette on CRAN)
which makes use of the previously mentioned approximating Gaussian model
in order to produce more efficient filter. It is also available for
`ssm_nlg`

models but in case of severe non-linearities, it is
not necessarily best option.

Compared to filtering problem, in smoothing problems we are interested in the conditional densities of the hidden states at certain time point \(t\) given all the observations \(y_1,\ldots,y_t,\ldots,y_n\). Again for linear-Gaussian models we can use so called Kalman smoothing recursions, where as in case of more general models we can rely on approximating methods, or smoothing algorithms based on the output of particle filters. Currently only filter-smoother approach (Kitagawa 1996) for particle smoothing is supported.

The main purpose of the `bssm`

is to allow efficient
MCMC-based inference for various state space models. For this task, a
method `run_mcmc`

can be used. Here we define a random walk
model with a drift and stochastic seasonal component for UK gas
consumption dataset and use 40 000 MCMC iteration where first half is
discarded by default as a burn-in. Note that the number of iterations is
quite low and in practice we should run the chain longer. Here we use
less iterations to speed up the package checks on CRAN.

```
prior <- halfnormal(0.1, 1)
UKgas_model <- bsm_lg(log10(UKgas), sd_y = prior, sd_level = prior,
sd_slope = prior, sd_seasonal = prior)
mcmc_bsm <- run_mcmc(UKgas_model, iter = 4e4, seed = 1)
mcmc_bsm
```

```
##
## Call:
## run_mcmc.lineargaussian(model = UKgas_model, iter = 40000, seed = 1)
##
## Iterations = 20001:40000
## Thinning interval = 1
## Length of the final jump chain = 4716
##
## Acceptance rate after the burn-in period: 0.236
##
## Summary for theta:
##
## variable Mean SE SD 2.5% 97.5% ESS
## sd_level 0.005076792 1.969154e-04 0.003360055 0.0002233676 0.012164506 291
## sd_seasonal 0.026279113 1.331061e-04 0.003790511 0.0185637669 0.033864873 811
## sd_slope 0.001169760 2.945858e-05 0.000539790 0.0001344258 0.002394876 336
## sd_y 0.016280928 2.656162e-04 0.005588283 0.0041307403 0.026326567 443
##
## Summary for alpha_109:
##
## variable time Mean SE SD 2.5% 97.5%
## level 1987 2.844603981 3.576960e-04 0.016755373 2.811864975 2.87881401
## seasonal_1 1987 0.268232881 7.340927e-04 0.035012588 0.199319650 0.33673134
## seasonal_2 1987 0.062504590 3.753102e-04 0.017738847 0.029212802 0.09875666
## seasonal_3 1987 -0.295387310 3.287016e-04 0.015225214 -0.327559158 -0.26697840
## slope 1987 0.009664101 8.330244e-05 0.003840359 0.002693921 0.01804641
## ESS
## 2194
## 2275
## 2234
## 2145
## 2125
##
## Run time:
## user system elapsed
## 3.15 0.03 3.26
```

Note that all MCMC algorithms of `bssm`

output also state
forecasts for the timepoint \(n + 1\),
the summary statistics of this state is also shown in the output
above.

Here we use `ggplot2`

(Wickham 2016) package for the figures, so
we transform the MCMC samples to `data.frame`

:

`suppressMessages(library("ggplot2"))`

`## Warning: package 'ggplot2' was built under R version 4.3.1`

```
d <- as.data.frame(mcmc_bsm, variable = "theta")
ggplot(d, aes(x = value)) +
geom_density(adjust = 3, fill = "#92f0a8") +
facet_wrap(~ variable, scales = "free") +
theme_bw()
```

```
suppressMessages(library("dplyr"))
d <- as.data.frame(mcmc_bsm, variable = "states")
level_fit <- d |>
filter(variable == "level") |>
group_by(time) |>
summarise(consumption = mean(value),
lwr = quantile(value, 0.025),
upr = quantile(value, 0.975))
ggplot(level_fit, aes(x = time, y = consumption)) +
geom_ribbon(aes(ymin = lwr, ymax = upr),
fill = "#92f0a8", alpha = 0.25) +
geom_line(colour = "#92f0a8") +
geom_line(data = data.frame(
consumption = log10(UKgas),
time = time(UKgas)),
colour = "grey30", linetype = "dashed") +
theme_bw()
```

This work has been supported by the Academy of Finland research grants 284513, 312605, 311877, and 331817.

Andrieu, Christophe, and Gareth O. Roberts. 2009. “The
Pseudo-Marginal Approach for Efficient Monte
Carlo Computations.” *Annals of Statistics*
37 (2): 697–725.

Beaumont, Mark A. 2003. “Estimation of Population Growth or
Decline in Genetically Monitored Populations.” *Genetics*
164: 1139–60.

Durbin, James, and Siem Jan Koopman. 1997. “Monte
Carlo Maximum Likelihood Estimation for
Non-Gaussian State Space Models.”
*Biometrika* 84 (3): 669–84. https://doi.org/10.1093/biomet/84.3.669.

———. 2012. *Time Series Analysis by State Space Methods*. 2nd ed.
New York: Oxford University Press.

Eddelbuettel, Dirk, and Romain François. 2011. “Rcpp:
Seamless R and C++ Integration.”
*Journal of Statistical Software* 40 (8): 1–18. https://www.jstatsoft.org/v40/i08/.

Eddelbuettel, Dirk, and Conrad Sanderson. 2014. “RcppArmadillo:
Accelerating r with High-Performance c++ Linear Algebra.”
*Computational Statistics and Data Analysis* 71: 1054–63. https://dx.doi.org/10.1016/j.csda.2013.02.005.

Harvey, A. C. 1989. *Forecasting, Structural Time Series Models and
the Kalman Filter*. Cambridge University Press.

Helske, Jouni. 2017. “KFAS: Exponential Family State
Space Models in R.” *Journal of Statistical
Software* 78 (10): 1–39. https://doi.org/10.18637/jss.v078.i10.

———. 2018. *ramcmc: Robust Adaptive
Metropolis Algorithm*. https://github.com/helske/ramcmc.

Helske, Jouni, and Matti Vihola. 2021. “Bssm: Bayesian Inference
of Non-Linear and Non-Gaussian State Space Models in
R.” *R Journal*. https://arxiv.org/abs/2101.08492.

Helske, Satu, and Jouni Helske. 2019. “Mixture Hidden
Markov Models for Sequence Data: The seqHMM Package in R.”
*Journal of Statistical Software* 88 (3): 1–32. https://doi.org/10.18637/jss.v088.i03.

Kitagawa, Genshiro. 1996. “Monte Carlo
Filter and Smoother for Non-Gaussian Nonlinear State Space
Models.” *Journal of Computational and Graphical
Statistics* 5 (1): 1–25.

Lin, L., K. F. Liu, and J. Sloan. 2000. “A Noisy
Monte Carlo Algorithm.” *Physical
Review D* 61.

Petris, Giovanni, and Sonia Petrone. 2011. “State Space Models in
.” *Journal of Statistical Software* 41 (4): 1–25. https://doi.org/10.18637/jss.v041.i04.

R Core Team. 2020. *R: A Language and Environment for Statistical
Computing*. Vienna, Austria: R Foundation for Statistical Computing.
https://www.R-project.org/.

Tusell, Fernando. 2011. “Kalman Filtering in .” *Journal
of Statistical Software* 39 (2): 1–27. https://doi.org/10.18637/jss.v039.i02.

Vihola, Matti. 2012. “Robust Adaptive Metropolis
Algorithm with Coerced Acceptance Rate.” *Statistics and
Computing* 22 (5): 997–1008. https://doi.org/10.1007/s11222-011-9269-5.

Vihola, Matti, Jouni Helske, and Jordan Franks. 2020. “Importance
Sampling Type Estimators Based on Approximate Marginal
MCMC.” *Scandinavian Journal of Statistics*.
https://doi.org/10.1111/sjos.12492.

Wickham, Hadley. 2016. *Ggplot2: Elegant Graphics for Data
Analysis*. Springer-Verlag New York. https://ggplot2.tidyverse.org.