`kalmanfilter` is an Rcpp implementation of the multivariate Kalman filter for state space models that can handle missing values and exogenous data in the observation and state equations. The Kalman filter is a method to compute the optimal estimator of the unobserved state vector in a time series.
The state space model is written as \[ Y_t = A + H \beta_t + B^o X^o_t + e_t, \quad e_t \sim N(0, R) \\ \beta_t = D + F \beta_{t-1} + B^s X^s_t + u_t, \quad u_t \sim N(0, Q) \] where the first equation is the observation equation and the second is the state equation. \(A\) is the observation intercept matrix, \(H\) is the observation matrix, \(X^o_t\) is optional exogenous data in the observation equation, \(e_t \sim N(0, R)\) is the observation error, and \(R\) is the observation error-covariance matrix. \(\beta_t\) is the vector of the state components, \(D\) is the state intercept matrix, \(F\) is the transition matrix, \(X^s_t\) is optional exogenous data in the state equation, \(u_t \sim N(0, Q)\) is the state error, and \(Q\) is the state error-covariance matrix.
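For example, a univariate AR(2) process \(y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + u_t\) observed without error can be cast into this form by stacking the current and lagged values into the state vector \(\beta_t = (y_t, y_{t-1})^{\prime}\): \[ Y_t = \begin{bmatrix} 1 & 0 \end{bmatrix} \beta_t, \quad \beta_t = \begin{bmatrix} \phi_1 & \phi_2 \\ 1 & 0 \end{bmatrix} \beta_{t-1} + \begin{bmatrix} u_t \\ 0 \end{bmatrix} \] with \(A = D = 0\), \(R = 0\), no exogenous data, and \(Q\) equal to zero everywhere except \(\sigma^2_u\) in its upper-left entry.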
The Kalman filter is a forward two-stage procedure that is the optimal linear filter based on the multivariate normal distribution. The “forwardness” of the filter means that it uses only data up to time \(t\) to make an inference on the unobserved components at time \(t\) and does not peek into the future to make that inference.
The first stage is the prediction stage, which makes predictions of the state components based on information up to time \(t-1\). This stage is made up of four equations. The first is the prediction of the state components based on their time series properties
\[ \beta_{t|t-1} = D + F \beta_{t-1|t-1} + B^s X^s_t \]
Second is the prediction of the covariance matrix of the state components
\[ P_{t|t-1} = F P_{t-1|t-1} F^{\prime} + Q \]
Third is the prediction error of the time series of interest
\[ \eta_{t|t-1} = Y_t - (A + H \beta_{t|t-1} + B^o X^o_t) \]
And finally, we have the variance of the prediction error
\[ f_{t|t-1} = H P_{t|t-1} H^{\prime} + R \]
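These four equations translate directly into matrix operations. Below is a minimal illustrative sketch in R (not the package's internal Rcpp code), using the matrix names of the `ssm` list described below and omitting the exogenous terms; `predict_step` is a hypothetical helper name:

```r
#One prediction step of the Kalman filter (illustrative sketch)
#B_ll is beta_{t-1|t-1} and P_ll is P_{t-1|t-1}
predict_step = function(y_t, B_ll, P_ll, Dm, Fm, Am, Hm, Qm, Rm){
  B_tl = Dm + Fm %*% B_ll           #beta_{t|t-1}: state prediction
  P_tl = Fm %*% P_ll %*% t(Fm) + Qm #P_{t|t-1}: state covariance prediction
  eta = y_t - (Am + Hm %*% B_tl)    #eta_{t|t-1}: prediction error
  f = Hm %*% P_tl %*% t(Hm) + Rm    #f_{t|t-1}: prediction error variance
  return(list(B_tl = B_tl, P_tl = P_tl, eta = eta, f = f))
}
```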
The second stage is the updating stage, which makes predictions based on all information up to time \(t\). It consists of three equations. The first equation is the prediction of the state components based on the full information set
\[ \beta_{t|t} = \beta_{t|t-1} + K_t \eta_{t|t-1} \] where \(K_t\) is the Kalman gain, which determines the optimal weight to give new information in making predictions about \(\beta_t\), and is the second equation
\[ K_t = P_{t|t-1} H^{\prime} f_{t|t-1}^{-1} \] The last equation is the updating of the covariance matrix of the state components
\[ P_{t|t} = P_{t|t-1} - K_t H P_{t|t-1} \] The seven equations above make up the full Kalman filter routine. If \(Y_t\) is missing for any observation, then
\[ \beta_{t|t} = \beta_{t|t-1} \\ P_{t|t} = P_{t|t-1} \\ K_t = 0 \\ f_{t|t-1} = \infty \]
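A matching sketch of the updating stage, including the missing-value rule above (`update_step` is again a hypothetical helper, not the package's code):

```r
#One updating step of the Kalman filter (illustrative sketch)
update_step = function(B_tl, P_tl, eta, f, Hm){
  if(any(is.na(eta))){
    #Missing observation: carry the prediction forward with zero gain
    return(list(B_tt = B_tl, P_tt = P_tl, K = matrix(0, nrow(P_tl), nrow(Hm)), f = Inf))
  }
  K = P_tl %*% t(Hm) %*% solve(f)   #K_t: the Kalman gain
  B_tt = B_tl + K %*% eta           #beta_{t|t}: updated state prediction
  P_tt = P_tl - K %*% Hm %*% P_tl   #P_{t|t}: updated state covariance
  return(list(B_tt = B_tt, P_tt = P_tt, K = K, f = f))
}
```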
Once the Kalman filter is applied to the data, a smoothing procedure can be applied in the backward direction to make a better inference of the state components based on the entire data set. Unlike the filter, the smoother does peek into the future to make an inference of the state components at time \(t\). This procedure consists of only two equations.
The first equation updates the prediction of the state components based on all the available information
\[ \beta_{t|T} = \beta_{t|t} + P_{t|t} F^{\prime} P_{t+1|t}^{-1} (\beta_{t+1|T} - \beta_{t+1|t}) \]
The second equation updates the covariance matrix of the state components based on all the available information
\[ P_{t|T} = P_{t|t} + P_{t|t} F^{\prime} P_{t+1|t}^{-1} (P_{t+1|T} - P_{t+1|t}) \left(P_{t+1|t}^{-1}\right)^{\prime} F P_{t|t}^{\prime} \]
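In R, the backward pass can be sketched as follows (illustrative; it assumes the filtered and predicted states are stored as \(N_{\beta} \times T\) matrices and the covariance matrices as lists of length \(T\)):

```r
#Backward smoothing pass (illustrative sketch)
smooth_pass = function(B_tt, B_tl, P_tt, P_tl, Fm){
  n = ncol(B_tt)
  B_ts = B_tt #the smoothed values at t = T are the filtered values
  P_ts = P_tt
  for(t in (n - 1):1){
    J = P_tt[[t]] %*% t(Fm) %*% solve(P_tl[[t + 1]]) #smoother gain
    B_ts[, t] = B_tt[, t] + J %*% (B_ts[, t + 1] - B_tl[, t + 1])
    P_ts[[t]] = P_tt[[t]] + J %*% (P_ts[[t + 1]] - P_tl[[t + 1]]) %*% t(J)
  }
  return(list(B_ts = B_ts, P_ts = P_ts))
}
```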
To estimate a model with the Kalman filter via maximum likelihood, the log likelihood is given by
\[ \ln L(\theta) = -\frac{1}{2} \sum_{t=1}^T \ln |f_{t|t-1}| - \frac{1}{2} \sum_{t=1}^T \eta_{t|t-1}^{\prime} f_{t|t-1}^{-1} \eta_{t|t-1} \]
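`kalman_filter` returns this sum as `lnl`. Conceptually, it accumulates over the filter loop, with missing observations contributing nothing (a sketch using the hypothetical helpers above):

```r
#Accumulate the log likelihood over the filter loop (illustrative sketch)
lnl = 0
B_ll = B0 #initial guess for the state vector
P_ll = P0 #initial guess for the state covariance matrix
for(t in 1:ncol(yt)){
  pred = predict_step(yt[, t, drop = FALSE], B_ll, P_ll, Dm, Fm, Am, Hm, Qm, Rm)
  if(!any(is.na(pred$eta))){ #skip missing observations
    lnl = lnl - 0.5*log(det(pred$f)) -
      0.5*drop(t(pred$eta) %*% solve(pred$f) %*% pred$eta)
  }
  upd = update_step(pred$B_tl, pred$P_tl, pred$eta, pred$f, Hm)
  B_ll = upd$B_tt
  P_ll = upd$P_tt
}
```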
To use the package, call the function `kalman_filter` as

```r
kf = kalman_filter(ssm, yt, Xo, Xs, w, smooth)
```
where `ssm` is a list object with matrices that describe the state space model. `ssm` should contain matrices named `B0` for the initial guess of the unobserved components, `P0` for the initial guess of the unobserved components covariance matrix, `Dm` for the constant in the state equation, `Am` for the constant in the observation equation, `Fm` for the state equation transition matrix, `Hm` for the observation matrix in the observation equation, `Qm` for the covariance matrix of the errors in the state equation, `Rm` for the covariance matrix of the errors in the observation equation, `betaO` for the coefficients on the exogenous data in the observation equation, and `betaS` for the coefficients on the exogenous data in the state equation. `betaO` and `betaS` are optional.
`yt` is an \(N_y \times T\) matrix, where \(N_y\) is the number of variables and \(T\) is the number of time periods. `Xo` is an optional \(N_o \times T\) matrix of exogenous data in the observation equation, where \(N_o\) is the number of exogenous observation variables. `Xs` is an optional \(N_s \times T\) matrix of exogenous data in the state equation, where \(N_s\) is the number of exogenous state variables. `w` is an optional \(T \times 1\) matrix of weights applied to the likelihood.
ssm[["B0"]]
, ssm[["P0"]]
,
ssm[["Qm"]]
are \(N_{\beta} x
N_{\beta}\) matrix, where \(N_{\beta}\) is the number of unobserved
variables in the state equation.
ssm[["Dm"]]
is an \(N_{\beta}
x 1\) matrix.
ssm[["Am"]]
is an \(N_y x
1\) matrix.
ssm[["Fm"]]
is an \(N_{\beta}
x P\) matrix, where \(P\) is the
number of lags in the state equation.
ssm[["Hm"]]
is an \(N_y x
N_{\beta}\) matrix.
ssm[["Rm"]]
is an \(N_y x
N_y\) matrix.
ssm[["betaO"]]
is an optional \(N_y x N_o\) matrix.
ssm[["betaS"]]
is an optional \(N_{\beta} x N_s\) matrix.
To force exclusion of exogenous data, set the relevant matrices to have all 0 values. This is the default.
To use an unweighted likelihood, set the weights to be all 1’s. This is the default.
`smooth` is a boolean indicating whether to run the backwards smoother.
The output is a list with values `lnl` (the log likelihood), `y_tl` (the predicted values of `y` given information up to time \(t-1\)), `y_tt` (the predicted values of `y` using all the available information), `B_tl` (the predicted values of the unobserved components up to time \(t-1\)), `B_tt` (the predicted values of the unobserved components given all the available information), `P_tl` (the unobserved components covariance matrix up to time \(t-1\)), `P_tt` (the unobserved components covariance matrix using all the available information), `F_t` (the variance of the prediction error of `y` up to time \(t-1\)), `N_t` (the prediction error of `y` up to time \(t-1\)), and `K_t` (the Kalman gain).
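As a minimal end-to-end illustration before the full examples below, here is a local level model fit to simulated data (the simulated series and parameter values are assumptions for illustration; the `ssm` element names and outputs are as described above):

```r
library(kalmanfilter)

#Simulate a local level model: beta_t = beta_{t-1} + u_t, y_t = beta_t + e_t
set.seed(1007)
n = 200
level = cumsum(rnorm(n, sd = 0.1))
yt = matrix(level + rnorm(n, sd = 0.5), nrow = 1)

#State space matrices for the local level model
ssm = list(B0 = matrix(0), P0 = matrix(10),
           Dm = matrix(0), Am = matrix(0),
           Fm = matrix(1), Hm = matrix(1),
           Qm = matrix(0.1^2), Rm = matrix(0.5^2))

#Filter and smooth
kf = kalman_filter(ssm, yt, smooth = TRUE)
kf$lnl       #the log likelihood
dim(kf$B_tt) #1 x 200 matrix of state estimates
```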
The first example estimates a Nelson-Siegel dynamic factor model of the Treasury yield curve using the `treasuries` data set:

```r
library(kalmanfilter)
library(maxLik)
library(ggplot2)
library(data.table)
library(gridExtra)
data(treasuries)
tau = unique(treasuries$maturity)
```
```r
#State space model for the Nelson-Siegel dynamic factor yield curve
yc_ssm = function(par, tau){
  #Set factor names
  tau = unique(tau)
  factors = c("l", "s", "c")

  #Loading on the slope factor
  Ls = function(par, tau){
    return(-(1 - exp(-tau*par["lambda"]))/(tau*par["lambda"]))
  }
  #Loading on the curvature factor
  Lc = function(par, tau){
    return(-Ls(par["lambda"], tau) - exp(-tau*par["lambda"]))
  }

  ########## Define the state space model
  #Transition matrix
  Fm = matrix(par[grepl("phi_", names(par))], nrow = 3, ncol = 3,
              dimnames = list(paste0(factors, "_t0"),
                              paste0(factors, "_t1")))
  Fm[is.na(Fm)] = 0

  #Transition intercept matrix
  Dm = matrix(par[grepl("D_", names(par))], ncol = 1,
              dimnames = list(rownames(Fm), NULL))

  #Transition error covariance matrix
  Qm = matrix(par[unlist(lapply(factors, function(x){paste0("sig_", x, factors)}))], nrow = 3, ncol = 3)
  Qm[lower.tri(Qm)] = Qm[upper.tri(Qm)]
  dimnames(Qm) = list(rownames(Fm), rownames(Fm))

  #Observation equation matrix
  n = length(tau)
  Hm = matrix(NA, nrow = n, ncol = 3,
              dimnames = list(as.character(tau), rownames(Fm)))
  tau = as.numeric(tau)
  if("l" %in% factors){
    Hm[, "l_t0"] = 1
  }
  if("s" %in% factors){
    Hm[, "s_t0"] = Ls(par, tau)
  }
  if("c" %in% factors){
    Hm[, "c_t0"] = Lc(par, tau)
  }

  #Observation error variance covariance matrix
  Rm = diag(sum(grepl("sig_\\d+", names(par))))*par[grepl("sig_\\d+", names(par))]^2
  dimnames(Rm) = list(rownames(Hm), rownames(Hm))

  #Observation intercept matrix
  Am = matrix(0, nrow = nrow(Hm), ncol = 1,
              dimnames = list(rownames(Hm), NULL))

  #Initial guess for the unobserved vector
  B0 = matrix(par[names(par) %in% c("level", "slope", "curvature")], ncol = 1, nrow = nrow(Fm),
              dimnames = list(rownames(Fm), NULL))

  #Initial guess for the variance of the unobserved vector
  P0 = diag(par["P0"]^2, nrow = nrow(Qm), ncol = ncol(Qm))

  return(list(Fm = Fm, Dm = Dm, Qm = Qm,
              Hm = Hm, Am = Am, Rm = Rm,
              B0 = B0, P0 = P0))
}
```
```r
#Set the initial values
init = c(level = 5.9030, slope = -0.7090, curvature = 0.8690, lambda = 0.0423,
         D_l = 0.1234, D_s = -0.2285, D_c = 0.2020,
         phi_ll = 0.9720, phi_ls = 0.1009, phi_lc = -0.1226,
         phi_sl = -0.0209, phi_ss = 0.8189, phi_sc = 0.0192,
         phi_cl = -0.0061, phi_cs = -0.1446, phi_cc = 0.8808,
         sig_ll = 0.1017,
         sig_sl = 0.0937, sig_ss = 0.2267,
         sig_cl = 0.0303, sig_cs = 0.0351, sig_cc = 0.7964,
         sig_1 = 0.0934, sig_3 = 0.0001, sig_6 = 0.1206, sig_12 = 0.1525,
         sig_24 = 0.1328, sig_36 = 0.0855, sig_60 = 0.0001, sig_84 = 0.0397,
         sig_120 = 0.0595, sig_240 = 0.1438, sig_360 = 0.1450,
         P0 = 0.0001)

#Set up the constraints: lambda and all standard deviation parameters must be positive
ineqA = matrix(0, nrow = 15, ncol = length(init), dimnames = list(NULL, names(init)))
diag(ineqA[, c("lambda", "sig_ll", "sig_ss", "sig_cc", paste0("sig_", tau))]) = 1
ineqB = matrix(0, nrow = nrow(ineqA), ncol = 1)

#Convert the data to an N x T matrix
yt = dcast(treasuries, "date ~ maturity", value.var = "value")
yt = t(yt[, 2:ncol(yt)])
```
```r
#Set the objective function
objective = function(par){
  ssm = yc_ssm(par, tau)
  return(kalman_filter(ssm, yt)$lnl)
}
```
```r
#Solve the model
solve = maxLik(logLik = objective, start = init, method = "BFGS",
               finalHessian = FALSE, hess = NULL,
               control = list(printLevel = 2, iterlim = 10000),
               constraints = list(ineqA = ineqA, ineqB = ineqB))

#Get the estimated state space model
ssm = yc_ssm(solve$estimate, tau)

#Filter the data with the model
filter = kalman_filter(ssm, yt, smooth = TRUE)
```
```r
#Get the estimated unobserved factors
B_tt = data.table(t(filter[["B_tt"]]))[, "date" := unique(treasuries$date)]
colnames(B_tt)[1:3] = c("level", "slope", "curvature")

#Get the estimated yields
y_tt = data.table(t(filter[["y_tt"]]))[, "date" := unique(treasuries$date)]
colnames(y_tt)[1:(ncol(y_tt) - 1)] = tau
```
```r
#Plot the data
g1 = ggplot(treasuries) +
  geom_line(aes(x = date, y = value, group = factor(maturity), color = factor(maturity))) +
  theme_minimal() + theme(legend.position = "bottom") +
  labs(title = "Actual Treasury Yields", x = "", y = "Value") +
  guides(color = guide_legend(title = NULL))
g2 = ggplot(melt(y_tt, id.vars = "date")) +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") +
  labs(title = "Estimated Treasury Yields", x = "", y = "Value") +
  guides(color = guide_legend(title = NULL))
g3 = ggplot(melt(B_tt, id.vars = "date")) +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") +
  labs(title = "Estimated Factors", x = "", y = "Value") +
  guides(color = guide_legend(title = NULL))
grid.arrange(g1, g2, g3, layout_matrix = rbind(c(1, 2), c(3, 3)))
```
The second example estimates the Stock and Watson dynamic common factor model using the `sw_dcf` data set:

```r
library(kalmanfilter)
library(data.table)
library(maxLik)
library(ggplot2)
library(gridExtra)
data(sw_dcf)
data = sw_dcf[, colnames(sw_dcf) != "dcoinc", with = FALSE]
vars = colnames(data)[colnames(data) != "date"]
```
```r
#State space model for the Stock and Watson Dynamic Common Factor model
dcf_ssm = function(par, yt){
  #Get the parameters
  vars = dimnames(yt)[which(unlist(lapply(dimnames(yt), function(x){!is.null(x)})))][[1]]
  phi = par[grepl("phi", names(par))]
  names(phi) = gsub("phi", "", names(phi))
  gamma = par[grepl("gamma_", names(par))]
  names(gamma) = gsub("gamma_", "", names(gamma))
  psi = par[grepl("psi_", names(par))]
  names(psi) = gsub("psi_", "", names(psi))
  sig = par[grepl("sigma_", names(par))]
  names(sig) = gsub("sigma_", "", names(sig))

  #Build the transition matrix
  phi_dim = max(c(length(phi)), length(unique(sapply(strsplit(names(gamma), "\\."), function(x){x[2]}))))
  psi_dim = sapply(unique(sapply(strsplit(names(psi), "\\."), function(x){x[1]})), function(x){
    max(as.numeric(sapply(strsplit(names(psi)[grepl(paste0("^", x), names(psi))], "\\."), function(x){x[2]})))
  })
  Fm = matrix(0, nrow = phi_dim + length(psi), ncol = phi_dim + length(psi),
              dimnames = list(
                c(paste0("ct.", 0:(phi_dim - 1)),
                  unlist(lapply(names(psi_dim), function(x){paste0("e_", x, ".", 0:(psi_dim[x] - 1))}))),
                c(paste0("ct.", 1:phi_dim),
                  unlist(lapply(names(psi_dim), function(x){paste0("e_", x, ".", 1:psi_dim[x])})))
              ))
  Fm["ct.0", paste0("ct.", names(phi))] = phi
  for(i in 1:length(vars)){
    Fm[paste0("e_", i, ".0"),
       paste0("e_", names(psi)[grepl(paste0("^", i), names(psi))])] = psi[grepl(paste0("^", i), names(psi))]
  }
  diag(Fm[intersect(rownames(Fm), colnames(Fm)), intersect(rownames(Fm), colnames(Fm))]) = 1

  #Build the observation matrix
  Hm = matrix(0, nrow = nrow(yt), ncol = nrow(Fm), dimnames = list(rownames(yt), rownames(Fm)))
  for(i in 1:length(vars)){
    Hm[i, paste0("ct.", sapply(strsplit(names(gamma)[grepl(paste0("^", i), names(gamma))], "\\."), function(x){x[2]}))] =
      gamma[grepl(paste0("^", i), names(gamma))]
  }
  diag(Hm[, paste0("e_", 1:length(vars), ".0")]) = 1

  #Build the state covariance matrix
  #Set the dynamic common factor standard deviation to 1
  Qm = matrix(0, ncol = ncol(Fm), nrow = nrow(Fm), dimnames = list(rownames(Fm), rownames(Fm)))
  Qm["ct.0", "ct.0"] = 1
  for(i in 1:length(vars)){
    Qm[paste0("e_", i, ".0"), paste0("e_", i, ".0")] = sig[names(sig) == i]^2
  }

  #Build the observation equation covariance matrix
  Rm = matrix(0, ncol = nrow(Hm), nrow = nrow(Hm), dimnames = list(vars, vars))

  #Transition equation intercept matrix
  Dm = matrix(0, nrow = nrow(Fm), ncol = 1, dimnames = list(rownames(Fm), NULL))

  #Observation equation intercept matrix
  Am = matrix(0, nrow = nrow(Hm), ncol = 1)

  #Initialize the filter for each state
  B0 = matrix(0, nrow(Fm), 1)
  P0 = diag(nrow(Fm))
  dimnames(B0) = list(rownames(Fm), NULL)
  dimnames(P0) = list(rownames(Fm), rownames(Fm))
  #Unconditional mean and covariance of the stationary state vector
  B0 = solve(diag(ncol(Fm)) - Fm) %*% Dm
  VecP_ll = solve(diag(prod(dim(Fm))) - kronecker(Fm, Fm)) %*% matrix(as.vector(Qm), ncol = 1)
  P0 = matrix(VecP_ll[, 1], ncol = ncol(Fm))

  return(list(B0 = B0, P0 = P0, Am = Am, Dm = Dm, Hm = Hm, Fm = Fm, Qm = Qm, Rm = Rm))
}
```
```r
#Log the data
data.log = copy(data)
data.log[, c(vars) := lapply(.SD, log), .SDcols = c(vars)]

#Difference the data
data.logd = copy(data.log)
data.logd[, c(vars) := lapply(.SD, function(x){
  x - shift(x, type = "lag", n = 1)
}), .SDcols = c(vars)]

#Center (demean) the data
data.logds = copy(data.logd)
data.logds[, c(vars) := lapply(.SD, scale, scale = FALSE), .SDcols = c(vars)]

#Transpose the data
yt = t(data.logds[, c(vars), with = FALSE])
```
```r
#Set the initial values
init = c(phi1 = 0.8588, phi2 = -0.1526,
         psi_1.1 = -0.1079, psi_1.2 = -0.1415,
         psi_2.1 = -0.3166, psi_2.2 = -0.0756,
         psi_3.1 = -0.3994, psi_3.2 = -0.2028,
         psi_4.1 = -0.0370, psi_4.2 = 0.3646,
         gamma_1.0 = 0.0064, gamma_1.1 = -0.0020,
         gamma_2.0 = 0.0018,
         gamma_3.0 = 0.0059, gamma_3.1 = -0.0027,
         gamma_4.0 = 0.0014, gamma_4.1 = -0.0004, gamma_4.2 = 0.0001, gamma_4.3 = 0.0004,
         sigma_1 = 0.0049, sigma_2 = 0.0056, sigma_3 = 0.0077, sigma_4 = 0.0013)

#Set the constraints
ineqA = matrix(0, nrow = 14, ncol = length(init), dimnames = list(NULL, names(init)))
#Stationarity constraints
ineqA[c(1, 2), c("phi1", "phi2")] = rbind(c(1, 1), c(-1, -1))
ineqA[c(3, 4), grepl("psi_1", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(5, 6), grepl("psi_2", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(7, 8), grepl("psi_3", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
ineqA[c(9, 10), grepl("psi_4", colnames(ineqA))] = rbind(c(1, 1), c(-1, -1))
#Non-negativity constraints
diag(ineqA[c(11, 12, 13, 14), grepl("sigma_", colnames(ineqA))]) = 1
ineqB = matrix(c(rep(1, 10),
                 rep(0, 4)), nrow = nrow(ineqA), ncol = 1)
```
```r
#Define the objective function
objective = function(par, yt){
  ssm = dcf_ssm(par, yt)
  return(kalman_filter(ssm, yt, smooth = FALSE)$lnl)
}

#Solve the model
solve = maxLik(logLik = objective, start = init, method = "BFGS",
               finalHessian = FALSE, hess = NULL,
               control = list(printLevel = 2, iterlim = 10000),
               constraints = list(ineqA = ineqA, ineqB = ineqB),
               yt = yt)

#Get the estimated state space model
ssm = dcf_ssm(solve$estimate, yt)
```
```r
#Get the column means of the differenced data
M = matrix(unlist(data.logd[, lapply(.SD, mean, na.rm = TRUE), .SDcols = c(vars)]),
           ncol = 1, dimnames = list(vars, NULL))

#Get the coefficient matrices
Hm = ssm[["Hm"]]
Fm = ssm[["Fm"]]

#The final K_t approximates the steady state Kalman gain
filter = kalman_filter(ssm, yt, smooth = TRUE)
K = filter$K_t[,, dim(filter$K_t)[3]]
W = solve(diag(nrow(K)) - (diag(nrow(K)) - K %*% Hm) %*% Fm) %*% K
d = (W %*% M)[1, 1]

#Get the intercept terms
gamma = Hm[, grepl("ct", colnames(Hm))]
D = M - gamma %*% matrix(rep(d, ncol(gamma)))
```
```r
#Initialize the first element of the dynamic common factor
Y1 = t(data.log[, c(vars), with = FALSE][1, ])
initC = function(par){
  return(sum((Y1 - D - gamma %*% par)^2))
}
C10 = optim(par = Y1, fn = initC, method = "BFGS", control = list(trace = FALSE))$par[1]
Ctt = rep(C10, ncol(yt))

#Build the rest of the level of the dynamic common factor
ctt = filter$B_tt[which(rownames(Fm) == "ct.0"), ]
for(j in 2:length(Ctt)){
  Ctt[j] = ctt[j] + Ctt[j - 1] + c(d)
}
Ctt = data.table(date = data$date, dcf = Ctt, d.dcf = ctt)
```
```r
#Plot the outputs
g1 = ggplot(melt(data.log, id.vars = "date")[, "value" := scale(value), by = "variable"]) +
  ggtitle("Data", subtitle = "Log Levels (Rescaled)") +
  scale_y_continuous(name = "Value") +
  scale_x_date(name = "") +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") + guides(color = guide_legend(title = NULL))
g2 = ggplot(melt(data.logds, id.vars = "date")) +
  ggtitle("Data", subtitle = "Log Differenced & Demeaned") +
  scale_y_continuous(name = "Value") +
  scale_x_date(name = "") +
  geom_hline(yintercept = 0, color = "black") +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") + guides(color = guide_legend(title = NULL))
g3 = ggplot(melt(Ctt, id.vars = "date")[variable == "dcf", ]) +
  ggtitle("Dynamic Common Factor", subtitle = "Levels") +
  scale_x_date(name = "") +
  geom_hline(yintercept = 0, color = "grey") +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") + scale_color_manual(values = "black") +
  guides(color = guide_legend(title = NULL), fill = guide_legend(title = NULL)) +
  scale_y_continuous(name = "Value", limits = range(Ctt$dcf, na.rm = TRUE))
g4 = ggplot(melt(Ctt, id.vars = "date")[variable == "d.dcf", ]) +
  ggtitle("Dynamic Common Factor", subtitle = "Differenced") +
  scale_x_date(name = "") +
  geom_hline(yintercept = 0, color = "grey") +
  geom_line(aes(x = date, y = value, group = variable, color = variable)) +
  theme_minimal() + theme(legend.position = "bottom") + scale_color_manual(values = "black") +
  guides(color = guide_legend(title = NULL), fill = guide_legend(title = NULL)) +
  scale_y_continuous(name = "Value", limits = range(Ctt$d.dcf, na.rm = TRUE))
grid.arrange(g1, g2, g3, g4, layout_matrix = matrix(c(1, 3, 2, 4), nrow = 2))
```