State Space Analysis and Kalman Filter
series models we mean models in which the observations are made up of trend,
seasonal, cycle and regression components plus error. We go on to put Box–
Jenkins ARIMA models into state space form, thus demonstrating that these
models are special cases of state space models. Next we discuss the history of
exponential smoothing and show how it relates to simple forms of state space and
ARIMA models. We follow this by considering various aspects of regression with
or without time-varying coefficients or autocorrelated errors. We also present a
treatment of dynamic factor analysis. Further topics discussed are simultaneous
modelling of series from different sources, benchmarking, continuous time models
and spline smoothing in discrete and continuous time. These considerations apply
to minimum variance linear unbiased systems and to Bayesian treatments as well
as to classical models.
Chapter 4 begins with a set of four lemmas from elementary multivariate
regression which provides the essentials of the theory for the general linear state
space model from both a classical and a Bayesian standpoint. These have the
useful property that they produce the same results under the Gaussian assumptions
of the model and under linear minimum variance criteria when the Gaussian
assumptions are dropped. The implication of these results is that we only need to prove
formulae for classical models assuming normality and they remain valid for lin-
ear minimum variance and for Bayesian assumptions. The four lemmas lead to
derivations of the Kalman filter and smoothing recursions for the estimation of
the state vector and its conditional variance matrix given the data. We also derive
recursions for estimating the observation and state disturbances. We derive the
simulation smoother which is an important tool in the simulation methods we
employ later in the book. We show that missing observations and forecasting
are easily dealt with in the state space framework.
Computational algorithms in state space analyses are mainly based on recur-
sions, that is, formulae in which we calculate the value at time t + 1 from earlier
values for t, t − 1, . . . , 1. The question of how these recursions are started up at
the beginning of the series is called initialisation; it is dealt with in Chapter 5. We
give a general treatment in which some elements of the initial state vector have
known distributions while others are diffuse, that is, treated as random variables
with infinite variance, or are treated as unknown constants to be estimated by
maximum likelihood.
Chapter 6 discusses further computational aspects of filtering and smooth-
ing and begins by considering the estimation of a regression component of the
model and intervention components. It next considers the square root filter and
smoother which may be used when the Kalman filter and smoother show signs
of numerical instability. It goes on to discuss how multivariate time series can
be treated as univariate series by bringing elements of the observational vec-
tors into the system one at a time, with computational savings relative to the
multivariate treatment in some cases. Further modifications are discussed where
the observation vector is high-dimensional. The chapter concludes by discussing
computer algorithms.
1.5 Notation
Although a large number of mathematical symbols are required for the exposition
of the theory in this book, we decided to confine ourselves to the standard
English and Greek alphabets. The effect of this is that we occasionally need to
use the same symbol more than once; we have aimed however at ensuring that
the meaning of the symbol is always clear from the context. We present below a
list of the main conventions we have employed.
2.1 Introduction
The purpose of this chapter is to introduce the basic techniques of state space
analysis, such as filtering, smoothing, initialisation and forecasting, in terms of
a simple example of a state space model, the local level model. This is intended
to help beginners grasp the underlying ideas more quickly than they would if
we were to begin the book with a systematic treatment of the general case. We
shall present results from both the classical and Bayesian perspectives, assuming
normality, and also from the standpoint of minimum variance linear unbiased
estimation when the normality assumption is dropped.
A time series is a set of observations y1 , . . . , yn ordered in time. The basic
model for representing a time series is the additive model
yt = µt + γt + εt ,    t = 1, . . . , n.    (2.1)

In some applications a multiplicative decomposition is more natural,

yt = µt γt εt .    (2.2)
By taking logs however and working with logged values model (2.2) reduces to
model (2.1), so we can use model (2.1) for this case also.
To develop suitable models for µt and γt we need the concept of a random
walk . This is a scalar series αt determined by the relation αt+1 = αt + ηt where
the ηt ’s are independent and identically distributed random variables with zero
means and variances ση2 .
Consider a simple form of model (2.1) in which µt = αt where αt is a random
walk, no seasonal is present and all random variables are normally distributed.
We assume that εt has constant variance σε2 . This gives the model
yt = αt + εt ,    εt ∼ N(0, σε2 ),
αt+1 = αt + ηt ,    ηt ∼ N(0, ση2 ),    (2.3)
for t = 1, . . . , n where the εt ’s and ηt ’s are all mutually independent and are
independent of α1 . This model is called the local level model. Although it has a
simple form, this model is not an artificial special case and indeed it provides the
basis for the analysis of important real problems in practical time series analysis;
for example, the local level model provides the basis for our analysis of the Nile
data that we start in Subsection 2.2.5. It exhibits the characteristic structure
of state space models in which there is a series of unobserved values α1 , . . . , αn ,
called the states, which represents the development over time of the system
under study, together with a set of observations y1 , . . . , yn which are related to
the αt ’s by the state space model (2.3). The object of the methodology that
we shall develop is to infer relevant properties of the αt ’s from a knowledge of
the observations y1 , . . . , yn . The model (2.3) is suitable for both classical and
Bayesian analysis. Where the εt ’s and the ηt ’s are not normally distributed
we obtain equivalent results from the standpoint of minimum variance linear
unbiased estimation.
We assume initially that α1 ∼ N(a1 ,P1 ) where a1 and P1 are known and
that σε2 and ση2 are known. Since random walks are non-stationary the model
is non-stationary. By non-stationary here we mean that distributions of random
variables yt and αt depend on time t.
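To make the model concrete, here is a minimal simulation sketch of (2.3); it is our own illustration (the function and variable names are not from the text):

```python
import numpy as np

def simulate_local_level(n, a1, P1, s2_eps, s2_eta, seed=0):
    """Draw one realisation y_1, ..., y_n from the local level model (2.3)."""
    rng = np.random.default_rng(seed)
    alpha = np.empty(n)
    alpha[0] = rng.normal(a1, np.sqrt(P1))        # alpha_1 ~ N(a1, P1)
    eta = rng.normal(0.0, np.sqrt(s2_eta), n)     # state disturbances eta_t
    eps = rng.normal(0.0, np.sqrt(s2_eps), n)     # observation errors eps_t
    for t in range(n - 1):
        alpha[t + 1] = alpha[t] + eta[t]          # random walk: alpha_{t+1} = alpha_t + eta_t
    return alpha + eps, alpha                     # y_t = alpha_t + eps_t
```

Across replications, the non-stationarity shows up as a variance of yt that grows linearly in t, reflecting the accumulated ηt terms.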
For applications of model (2.3) to real series, we need to compute quantities
such as the mean of αt given y1 , . . . , yt−1 or the mean of αt given y1 , . . . , yn ,
together with their variances; we also need to fit the model to data by calculating
maximum likelihood estimates of the parameters σε2 and ση2 . In principle, this
could be done by using standard results from multivariate normal theory as
described in books such as Anderson (2003). In this approach the observations yt
generated by the local level model are represented as the n×1 vector Yn such that
Yn ∼ N(1 a1 , Ω),    with Yn = (y1 , . . . , yn )', 1 = (1, . . . , 1)' and Ω = 1 1' P1 + Σ,    (2.4)

and, by repeated substitution from the state equation in (2.3),

yt = α1 + Σ_{j=1}^{t−1} ηj + εt ,    t = 1, . . . , n.    (2.6)
2.2 Filtering
2.2.1 The Kalman filter
The object of filtering is to update our knowledge of the system each time a
new observation yt is brought in. We shall first develop the theory of filtering for
the local level model (2.3) where the εt ’s and ηt ’s are assumed normal from the
standpoint of classical analysis. Since in this case all distributions are normal,
conditional joint distributions of one set of observations given another set are
also normal. Let Yt−1 be the vector of observations (y1 , . . . , yt−1 ) for t = 2, 3, . . .
and assume that the conditional distribution of αt given Yt−1 is N(at ,Pt ) where
at and Pt are known. Assume also that the conditional distribution of αt given
Yt is N(at|t , Pt|t ). The distribution of αt+1 given Yt is N(at+1 , Pt+1 ). Our object
is to calculate at|t , Pt|t , at+1 and Pt+1 when yt is brought in. We refer to at|t as
the filtered estimator of the state αt and at+1 as the one-step ahead predictor of
αt+1 . Their respective associated variances are Pt|t and Pt+1 .
An important part is played by the one-step ahead prediction error vt = yt − at
of yt , for t = 1, . . . , n. Since at = E(αt |Yt−1 ), we have E(vt |Yt−1 ) =
E(αt + εt − at |Yt−1 ) = at − at = 0 for t = 2, . . . , n. When Yt is fixed, Yt−1 and yt
are fixed, so Yt−1 and vt are fixed, and vice versa. Consequently, p(αt |Yt ) =
p(αt |Yt−1 , vt ). Working out this conditional density by routine manipulation of
normal densities gives

p(αt |Yt ) = N( at + Pt vt /(Pt + σε2 ),  Pt σε2 /(Pt + σε2 ) ).    (2.10)
But at|t and Pt|t have been defined such that p(αt |Yt ) = N(at|t , Pt|t ). It follows
that

at|t = at + Pt vt /(Pt + σε2 ),    (2.11)
Pt|t = Pt σε2 /(Pt + σε2 ).    (2.12)
Since at+1 = E(αt+1 |Yt ) = E(αt + ηt |Yt ) and Pt+1 = Var(αt+1 |Yt ) = Var(αt +
ηt |Yt ) from (2.3), we have at+1 = at|t and Pt+1 = Pt|t + ση2 , giving

at+1 = at + Pt vt /(Pt + σε2 ),    (2.13)
Pt+1 = Pt σε2 /(Pt + σε2 ) + ση2 .    (2.14)
These results can be assembled into the recursions

vt = yt − at ,    Ft = Pt + σε2 ,
at|t = at + Kt vt ,    Pt|t = Pt (1 − Kt ),    (2.15)
at+1 = at + Kt vt ,    Pt+1 = Pt (1 − Kt ) + ση2 ,

for t = 1, . . . , n, where Kt = Pt /Ft .
We have assumed that a1 and P1 are known; however, more general initial
specifications for a1 and P1 will be dealt with in Section 2.9. Relations (2.15)
constitute the celebrated Kalman filter for the local level model. It should be
noted that Pt depends only on σε2 and ση2 and does not depend on Yt−1 . We
include the case t = n in (2.15) for convenience even though an+1 and Pn+1
are not normally needed for anything except forecasting. A set of relations such
as (2.15) which enables us to calculate quantities for t + 1 given those for t is
called a recursion.
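As an illustrative sketch (not from the original text), recursion (2.15) translates directly into code; the function and variable names here are our own:

```python
import numpy as np

def local_level_filter(y, a1, P1, s2_eps, s2_eta):
    """Kalman filter (2.15) for the local level model (2.3).

    Returns one-step predictions a_t and variances P_t for
    t = 1, ..., n+1, together with the prediction errors v_t and
    their variances F_t for t = 1, ..., n.
    """
    n = len(y)
    a = np.empty(n + 1)
    P = np.empty(n + 1)
    v = np.empty(n)
    F = np.empty(n)
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - a[t]           # one-step ahead prediction error v_t
        F[t] = P[t] + s2_eps         # its variance F_t
        K = P[t] / F[t]              # Kalman gain K_t = P_t / F_t
        a[t + 1] = a[t] + K * v[t]   # a_{t+1} = a_t + K_t v_t
        P[t + 1] = P[t] * (1 - K) + s2_eta
    return a, P, v, F
```

Note that the whole sequence P1 , . . . , Pn+1 could be computed before any data are seen, since, as remarked above, Pt depends only on P1 , σε2 and ση2 .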
Suppose that x and y are two jointly normally distributed scalar random variables
with means µx and µy , variances σx2 and σy2 , and covariance σxy . The joint
distribution is
p(x, y) = p(y) p(x|y),
by the definition of the conditional density p(x|y). But it can also be verified by
direct multiplication. We have
p(x, y) = (2π)−1 (σy2 A)−1/2 exp{ −(1/2) σy−2 (y − µy )2 − (1/2) A−1 (x − µx − σxy σy−2 (y − µy ))2 },

where A = σx2 − σy−2 σxy2 . It follows that the conditional distribution of x given y
is normal, with a variance that does not depend on y, and with mean and variance
given by

E(x|y) = µx + (σxy /σy2 )(y − µy ),    Var(x|y) = σx2 − σxy2 /σy2 .
To apply this lemma to the Kalman filter, let vt = yt − at and keep Yt−1 fixed.
Take x = αt so that µx = at and y = vt . It follows that µy = E(vt ) = 0. Then,
σx2 = Var(αt ) = Pt , σy2 = Var(vt ) = Var(αt − at + εt ) = Pt + σε2 and σxy = Pt .
We obtain the conditional distribution of αt given vt as

E(αt |vt ) = at|t = at + Pt (yt − at )/(Pt + σε2 ),    Var(αt |vt ) = Pt|t = Pt σε2 /(Pt + σε2 ).
In a similar way we can obtain the equations for at+1 and Pt+1 by application
of this regression lemma.
ᾱt = at + Pt (yt − at )/(Pt + σε2 ),    (2.21)
Var(ᾱt − αt |Yt−1 ) = Pt σε2 /(Pt + σε2 ).    (2.22)
Similarly, if we estimate αt+1 given Yt−1 by the linear function ᾱ∗t+1 = β ∗ + γ ∗ yt
and require this to have the unbiasedness property E(ᾱ∗t+1 − αt+1 |Yt−1 ) = 0, we
find that β ∗ = at (1 − γ ∗ ), so ᾱ∗t+1 = at + γ ∗ (yt − at ). By the same argument as
for ᾱt we find that Var(ᾱ∗t+1 − αt+1 |Yt−1 ) is minimised when γ ∗ = Pt /(Pt + σε2 ),
giving

ᾱ∗t+1 = at + Pt (yt − at )/(Pt + σε2 ),    (2.23)
Var(ᾱ∗t+1 − αt+1 |Yt−1 ) = Pt σε2 /(Pt + σε2 ) + ση2 .    (2.24)
We have therefore shown that the estimates ᾱt and ᾱ∗t+1 given by the MVLUE
approach, together with their variances, are exactly the same as the values at|t , at+1 , Pt|t
and Pt+1 in (2.11) to (2.14) that are obtained by assuming normality, both from
a classical and from a Bayesian standpoint. It follows that the values given by the
Kalman filter recursion (2.15) are MVLUE. We shall show in Subsection 4.3.1
that the same is true for the general linear Gaussian state space model (4.12).
2.2.5 Illustration
In this subsection we shall illustrate the output of the Kalman filter using obser-
vations from the river Nile. The data set consists of a series of readings of the
annual flow volume at Aswan from 1871 to 1970. The series has been analysed by
Cobb (1978) and Balke (1993). We analyse the data using the local level model
(2.3) with a1 = 0, P1 = 10^7 , σε2 = 15,099 and ση2 = 1,469.1. The values for a1
and P1 were chosen arbitrarily for illustrative purposes. The values for σε2 and
ση2 are the maximum likelihood estimates which we obtain in Subsection 2.10.3.
The values of at together with the raw data, Pt , vt and Ft , for t = 2, . . . , n, given
by the Kalman filter, are presented graphically in Fig. 2.1.
Fig. 2.1 Nile data and output of Kalman filter: (i) data (dots), filtered state at (solid
line) and its 90% confidence intervals (light solid lines); (ii) filtered state variance Pt ;
(iii) prediction errors vt ; (iv) prediction variance Ft .
The most obvious feature of the four graphs is that Pt and Ft converge rapidly
to constant values which confirms that the local level model has a steady state
solution; for discussion of the concept of a steady state see Section 2.11. However,
it was found that the fitted local level model converged numerically to a steady
state in around 25 updates of Pt although the graph of Pt seems to suggest that
the steady state was obtained after around 10 updates.
p(v1 , . . . , vn ) = ∏_{t=1}^{n} p(vt ),    (2.26)
since p(v1 ) = p(y1 ) and p(vt ) = p(yt |Yt−1 ) for t = 2, . . . , n. Consequently, the
vt ’s are independently distributed.
We next show that the forecast errors vt are effectively obtained from a
Cholesky decomposition of the observation vector Yn . The Kalman filter recur-
sions compute the forecast error vt as a linear function of the initial mean a1
and the observations y1 , . . . , yt since
v 1 = y1 − a1 ,
v2 = y2 − a1 − K1 (y1 − a1 ),
v3 = y3 − a1 − K2 (y2 − a1 ) − K1 (1 − K2 )(y1 − a1 ), and so on.
It should be noted that Kt does not depend on the initial mean a1 or on the
observations y1 , . . . , yn ; it depends only on the initial state variance P1 and the
disturbance variances σε2 and ση2 . Using the definitions in (2.4), we have
Substituting the covariance terms into this and taking into account the definition
(2.34) leads directly to
The consequence is that we can use the original state smoother (2.37) for all t
by taking Kt = 0, and hence Lt = 1, at the missing time points. This device
applies to any missing observation within the sample period. In the same way
the equations for the variance of the state error and the smoothed disturbances
can be obtained by putting Kt = 0 at missing time points.
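The device of setting Kt = 0 at missing time points can be sketched in code (our own illustration, with missing values coded as np.nan):

```python
import numpy as np

def local_level_filter_missing(y, a1, P1, s2_eps, s2_eta):
    """Kalman filter for the local level model with missing observations.

    At a missing time point we set K_t = 0, so the step reduces to
    a_{t+1} = a_t and P_{t+1} = P_t + s2_eta.
    """
    n = len(y)
    a = np.empty(n + 1)
    P = np.empty(n + 1)
    a[0], P[0] = a1, P1
    for t in range(n):
        if np.isnan(y[t]):                  # missing: prediction step only
            a[t + 1] = a[t]
            P[t + 1] = P[t] + s2_eta
        else:
            F = P[t] + s2_eps
            K = P[t] / F
            a[t + 1] = a[t] + K * (y[t] - a[t])
            P[t + 1] = P[t] * (1 - K) + s2_eta
    return a, P
```

At a missing point the estimate of the level is simply carried forward while its variance grows by ση2 , which is the extrapolation interpretation mentioned below.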
2.7.1 Illustration
Here we consider the Nile data and the same local level model as before; however,
we treat the observations at time points 21, . . . , 40 and 61, . . . , 80 as missing.
The Kalman filter is applied first and the output vt , Ft , at and Pt is stored for
t = 1, . . . , n. Then, the state smoothing recursions are applied. The first two
graphs in Fig. 2.5 are the Kalman filter values of at and Pt , respectively. The
last two graphs are the smoothing output α̂t and Vt , respectively.
Note that the application of the Kalman filter to missing observations can
be regarded as extrapolation of the series to the missing time points, while
smoothing at these points is effectively interpolation.
2.8 Forecasting
Let ȳn+j be the minimum mean square error forecast of yn+j given the time
series y1 , . . . , yn for j = 1, 2, . . . , J with J as some pre-defined positive integer.
By minimum mean square error forecast here we mean the function ȳn+j of
y1 , . . . , yn which minimises E[(yn+j − ȳn+j )2 |Yn ]. Then ȳn+j = E(yn+j |Yn ). This
follows immediately from the well-known result that if x is a random variable
with mean µ, the value of λ that minimises E[(x − λ)2 ] is λ = µ; see Exercise 4.14.3.
The variance of the forecast error is denoted by F̄n+j = Var(yn+j |Yn ). The theory
of forecasting for the local level model turns out to be surprisingly simple; we
merely regard forecasting as filtering the observations y1 , . . . , yn , yn+1 , . . . , yn+J
using the recursion (2.15) and treating the last J observations yn+1 , . . . , yn+J as
missing, that is, taking Kt = 0 in (2.15).
Letting ān+j = E(αn+j |Yn ) and P̄n+j = Var(αn+j |Yn ), it follows immediately
from equation (2.54) with τ = n + 1 and τ ∗ = n + J in §2.7 that

ān+j+1 = ān+j ,    P̄n+j+1 = P̄n+j + ση2 ,

for j = 1, . . . , J − 1, with ān+1 = an+1 and P̄n+1 = Pn+1 obtained from the Kalman filter (2.15).
Furthermore, we have

ȳn+j = ān+j ,    F̄n+j = P̄n+j + σε2 ,

for j = 1, . . . , J. The consequence is that the Kalman filter can be applied for
t = 1, . . . , n + J where we treat the observations at times n + 1, . . . , n + J as
missing. Thus we conclude that forecasts and their error variances are delivered
by applying the Kalman filter in a routine way with Kt = 0 for t = n+1, . . . , n+J.
The same property holds for the general linear Gaussian state space model as
we shall show in Section 4.11. For a Bayesian treatment a similar argument can
be used to show that the posterior mean and variance of the forecast of yn+j is
obtained by treating yn+1 , . . . , yn+j as missing values, for j = 1, . . . , J.
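Since Kt = 0 throughout the forecast period, the forecast recursions above have a simple closed form: ān+j = an+1 , P̄n+j = Pn+1 + (j − 1)ση2 and F̄n+j = P̄n+j + σε2 . A sketch (function names are ours):

```python
import numpy as np

def local_level_forecast(a_next, P_next, s2_eps, s2_eta, J):
    """Forecasts for the local level model obtained by treating
    y_{n+1}, ..., y_{n+J} as missing (K_t = 0 in (2.15)).

    a_next, P_next are a_{n+1}, P_{n+1} from the Kalman filter.
    """
    j = np.arange(1, J + 1)
    a_bar = np.full(J, float(a_next))     # state forecast stays flat
    P_bar = P_next + (j - 1) * s2_eta     # variance accumulates s2_eta per step
    F_bar = P_bar + s2_eps                # forecast error variance of y_{n+j}
    return a_bar, P_bar, F_bar
```

The flat state forecast with linearly growing variance is exactly what the fanning-out confidence intervals in Fig. 2.6 display.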
2.8.1 Illustration
The Nile data set is now extended by 30 missing observations, allowing the
computation of forecasts for the observations y101 , . . . , y130 . Only the Kalman filter
is required. The graphs in Fig. 2.6 contain ŷn+j|n = an+j|n , Pn+j|n , an+j|n
and Fn+j|n , respectively, for j = 1, . . . , J with J = 30. The confidence interval
for E(yn+j |Yn ) is ŷn+j|n ± k √Fn+j|n , where k is determined by the required
probability of inclusion; in Fig. 2.6 this probability is 50%.

Fig. 2.6 Nile data and output of forecasting: (i) data (dots), state forecast at and
50% confidence intervals; (ii) state variance Pt ; (iii) observation forecast E(yt |Yt−1 );
(iv) observation forecast variance Ft .
2.9 Initialisation
We assumed in our treatment of the linear Gaussian model in previous sections
that the distribution of the initial state α1 is N(a1 , P1 ) where a1 and P1 are
known. We now consider how to start up the filter (2.15) when nothing is known
about the distribution of α1 , which is the usual situation in practice. In this
situation it is reasonable to represent α1 as having a diffuse prior density, that
is, fix a1 at an arbitrary value and let P1 → ∞. From (2.15) we have
v 1 = y1 − a1 , F1 = P1 + σε2 ,
and, by substituting into the equations for a2 and P2 in (2.15), it follows that
a2 = a1 + P1 (y1 − a1 )/(P1 + σε2 ),    (2.56)

P2 = P1 ( 1 − P1 /(P1 + σε2 ) ) + ση2
   = P1 σε2 /(P1 + σε2 ) + ση2 .    (2.57)
ε̂1 = σε2 u1 ,    with u1 = v1 /(P1 + σε2 ) − ( P1 /(P1 + σε2 ) ) r1 ,
and η̂1 = ση2 r1 . Letting P1 → ∞, we obtain ε̂1 = −σε2 r1 . Note that r1 depends
on the Kalman filter output for t = 2, . . . , n. The smoothed variances of the
disturbances for t = 1 depend on D1 and N1 of which only D1 is affected by
P1 → ∞; using (2.47),
D1 = 1/(P1 + σε2 ) + ( P1 /(P1 + σε2 ) )2 N1 .
as before with r1 as defined in (2.34) for t = 1. The equations for the remaining
α̂t ’s are the same as previously. The same results may be obtained by Bayesian
arguments.
Use of a diffuse prior for initialisation is the approach preferred by most time
series analysts in the situation where nothing is known about the initial value
α1 . However, some workers find the diffuse approach uncongenial because they
regard the assumption of an infinite variance as unnatural since all observed
time series have finite values. From this point of view an alternative approach
is to assume that α1 is an unknown constant to be estimated from the data by
maximum likelihood. The simplest form of this idea is to estimate α1 by maxi-
mum likelihood from the first observation y1 . Denote this maximum likelihood
estimate by α̂1 and its variance by Var(α̂1 ). We then initialise the Kalman filter
by taking a1|1 = α̂1 and P1|1 = Var(α̂1 ). Since when α1 is fixed y1 ∼ N(α1 , σε2 ),
we have α̂1 = y1 and Var(α̂1 ) = σε2 . We therefore initialise the filter by taking
a1|1 = y1 and P1|1 = σε2 . But these are the same values as we obtain by assum-
ing that α1 is diffuse. It follows that we obtain the same initialisation of the
Kalman filter by representing α1 as a random variable with infinite variance as
by assuming that it is fixed and unknown and estimating it from y1 . We shall
show that a similar result holds for the general linear Gaussian state space model
in Subsection 5.7.3.
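A quick numerical check of this equivalence, using made-up numbers (our own sketch):

```python
import numpy as np

def a2_P2(y1, a1, P1, s2_eps, s2_eta):
    """First Kalman step (2.56)-(2.57): returns a2 and P2."""
    K = P1 / (P1 + s2_eps)
    return a1 + K * (y1 - a1), K * s2_eps + s2_eta

# Diffuse initialisation: let P1 be very large ...
a2_diffuse, P2_diffuse = a2_P2(y1=3.0, a1=0.0, P1=1e12, s2_eps=2.0, s2_eta=0.5)

# ... versus the fixed-constant approach: a_{1|1} = y1, P_{1|1} = s2_eps,
# so that one prediction step gives a2 = y1 and P2 = s2_eps + s2_eta.
a2_fixed, P2_fixed = 3.0, 2.0 + 0.5
```

The two pairs agree up to terms of order σε2 /P1 , which is the finite-P1 approximation error.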
p(Yn ) = ∏_{t=1}^{n} p(yt |Yt−1 ),
where p(y1 |Y0 ) = p(y1 ). Now p(yt |Yt−1 ) = N(at , Ft ) and vt = yt − at so on taking
logs and assuming that a1 and P1 are known the loglikelihood is given by
log L = log p(Yn ) = −(n/2) log(2π) − (1/2) Σ_{t=1}^{n} ( log Ft + vt2 /Ft ).    (2.58)
The exact loglikelihood can therefore be constructed easily from the Kalman
filter (2.15).
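As a sketch (our own code, not the book's), the prediction error decomposition (2.58) can be computed in a single pass of the filter:

```python
import numpy as np

def local_level_loglik(y, a1, P1, s2_eps, s2_eta):
    """Loglikelihood (2.58) for the local level model via the Kalman
    filter (2.15), assuming a1 and P1 are known."""
    a, P, loglik = a1, P1, 0.0
    for yt in y:
        v = yt - a                   # prediction error v_t
        F = P + s2_eps               # its variance F_t
        loglik -= 0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
        K = P / F
        a = a + K * v
        P = P * (1 - K) + s2_eta
    return loglik
```

The value agrees with the direct multivariate normal evaluation (2.59), which, in contrast, requires building and inverting the n × n matrix Ω.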
Alternatively, let us derive the loglikelihood for the local level model from
the representation (2.4). This gives
log L = −(n/2) log(2π) − (1/2) log|Ω| − (1/2) (Yn − a1 1)' Ω−1 (Yn − a1 1),    (2.59)
which follows from the multivariate normal distribution Yn ∼ N(a1 1, Ω). Using
results from §2.3.1, v = C(Yn − a1 1) with |C| = 1, so Ω = C −1 F (C −1 )' and
Ω−1 = C' F −1 C; it follows that

log|Ω| = log|F |,

and

(Yn − a1 1)' Ω−1 (Yn − a1 1) = v' F −1 v.

Substituting these and using the results log|F | = Σ_{t=1}^{n} log Ft and
v' F −1 v = Σ_{t=1}^{n} vt2 /Ft lead directly to (2.58).
The loglikelihood in the diffuse case is derived as follows. All terms in (2.58)
remain finite as P1 → ∞ with Yn fixed except the term for t = 1. It thus seems
reasonable to remove the influence of P1 as P1 → ∞ by defining the diffuse
loglikelihood as
log Ld = lim_{P1 →∞} ( log L + (1/2) log P1 )
       = −(1/2) lim_{P1 →∞} ( log(F1 /P1 ) + v12 /F1 ) − (n/2) log(2π) − (1/2) Σ_{t=2}^{n} ( log Ft + vt2 /Ft )
       = −(n/2) log(2π) − (1/2) Σ_{t=2}^{n} ( log Ft + vt2 /Ft ),    (2.60)
since F1 /P1 → 1 and v12 /F1 → 0 as P1 → ∞. Note that vt and Ft remain finite
as P1 → ∞ for t = 2, . . . , n.
Since P1 does not depend on σε2 and ση2 , the values of σε2 and ση2 that maximise
log L are identical to the values that maximise log L + (1/2) log P1 . As P1 → ∞,
these latter values converge to the values that maximise log Ld because first and
second derivatives with respect to σε2 and ση2 converge, and second derivatives are
finite and strictly negative. It follows that the maximum likelihood estimators
of σε2 and ση2 obtained by maximising (2.58) converge to the values obtained by
maximising (2.60) as P1 → ∞.
We estimate the unknown parameters σε2 and ση2 by maximising expres-
sion (2.58) or (2.60) numerically according to whether a1 and P1 are known
or unknown. In practice it is more convenient to maximise numerically with
respect to the quantities ψε = log σε2 and ψη = log ση2 . An efficient algorithm for
numerical maximisation is implemented in the STAMP 8.3 package of Koopman,
Harvey, Doornik and Shephard (2010). This optimisation procedure is based on
the quasi-Newton scheme BFGS for which details are given in Subsection 7.3.2.
By maximising (2.61) with respect to σε2 , for given F2∗ , . . . , Fn∗ , we obtain

σ̂ε2 = (1/(n − 1)) Σ_{t=2}^{n} vt2 /Ft∗ .    (2.62)
The value of log Ld obtained by substituting σ̂ε2 for σε2 in (2.61) is called the
concentrated diffuse loglikelihood and is denoted by log Ldc , giving

log Ldc = −(n/2) log(2π) − (n − 1)/2 − ((n − 1)/2) log σ̂ε2 − (1/2) Σ_{t=2}^{n} log Ft∗ .    (2.63)
Table 2.1 Iterations of the BFGS optimisation for the Nile data.

iteration      q         ψ       gradient    log Ldc
    0        1          0         −3.32      −495.68
    1        0.0360    −3.32       0.93      −492.53
    2        0.0745    −2.60       0.25      −492.10
    3        0.0974    −2.32      −0.001     −492.07
    4        0.0973    −2.33       0.0       −492.07
2.10.3 Illustration
The estimates of the variances σε2 and ση2 = qσε2 for the Nile data are obtained
by maximising the concentrated diffuse loglikelihood (2.63) with respect to ψ
where q = exp(ψ). In Table 2.1 the iterations of the BFGS procedure are
reported starting with ψ = 0. The relative percentage change of the loglikeli-
hood goes down very rapidly and convergence is achieved after 4 iterations. The
final estimate for ψ is −2.33 and hence the estimate of q is q̂ = 0.097. The
estimate of σε2 given by (2.62) is 15099 which implies that the estimate of ση2 is
σ̂η2 = q̂σ̂ε2 = 0.097 × 15099 = 1469.1.
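The estimation procedure of this subsection can be sketched as follows; the data here are simulated rather than the Nile series, and the helper names are our own. The filter is run with σε2 = 1 and ση2 = q, which yields Ft∗ = Ft /σε2 (vt is unaffected by the common scale); σ̂ε2 then follows from (2.62), the concentrated loglikelihood from (2.63), and ψ = log q is found numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def concentrated_diffuse_loglik(psi, y):
    """Concentrated diffuse loglikelihood (2.63) as a function of psi = log q."""
    q, n = np.exp(psi), len(y)
    a, P = y[0], 1.0 + q                   # diffuse start, in units of s2_eps
    ssq, sum_logF = 0.0, 0.0
    for yt in y[1:]:                       # t = 2, ..., n
        v, F = yt - a, P + 1.0             # F here is F_t* = F_t / s2_eps
        ssq += v * v / F
        sum_logF += np.log(F)
        K = P / F
        a, P = a + K * v, P * (1 - K) + q
    s2_eps_hat = ssq / (n - 1)             # (2.62)
    llc = (-0.5 * n * np.log(2 * np.pi) - 0.5 * (n - 1)
           - 0.5 * (n - 1) * np.log(s2_eps_hat) - 0.5 * sum_logF)
    return llc, s2_eps_hat

# Hypothetical simulated data with q = 0.1 (not the Nile series)
rng = np.random.default_rng(42)
alpha = np.cumsum(rng.normal(0.0, np.sqrt(0.1), 200))
y = alpha + rng.normal(0.0, 1.0, 200)

res = minimize_scalar(lambda p: -concentrated_diffuse_loglik(p, y)[0],
                      bounds=(-8.0, 2.0), method="bounded")
q_hat = np.exp(res.x)
s2_eps_hat = concentrated_diffuse_loglik(res.x, y)[1]
s2_eta_hat = q_hat * s2_eps_hat
```

The STAMP implementation mentioned above uses the quasi-Newton BFGS scheme; minimize_scalar here is merely a stand-in for the one-dimensional search over ψ.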
x2 − xq − q = 0,    (2.64)

with solution

x = ( q + √(q2 + 4q) ) / 2.
This is positive when q > 0 which holds for nontrivial models. The other solution
to (2.64) is inapplicable since it is negative for q > 0. Thus all non-trivial local
level models have a steady state solution.
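A numeric check of the steady state (our own sketch). Writing x = P̄ /σε2 , the fixed point of recursion (2.14) satisfies x = x/(x + 1) + q, which rearranges to (2.64); iterating the recursion therefore converges to P̄ = x σε2 :

```python
import numpy as np

def steady_state_iter(s2_eps, s2_eta, P0=1.0, iters=1000):
    """Iterate P_{t+1} = P_t*s2_eps/(P_t + s2_eps) + s2_eta from (2.14)."""
    P = P0
    for _ in range(iters):
        P = P * s2_eps / (P + s2_eps) + s2_eta
    return P

def steady_state_closed(s2_eps, s2_eta):
    """Closed form: P_bar = x*s2_eps, x the positive root of (2.64)."""
    q = s2_eta / s2_eps
    x = 0.5 * (q + np.sqrt(q * q + 4.0 * q))
    return x * s2_eps
```

For example, q = 0.5 gives x = 1 exactly, so the steady-state variance equals σε2 regardless of the starting value P1 .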
3 Linear state space models
3.1 Introduction
The general linear Gaussian state space model can be written in a variety of
ways; we shall use the form
yt = Zt αt + εt , εt ∼ N(0, Ht ),
(3.1)
αt+1 = Tt αt + Rt ηt , ηt ∼ N(0, Qt ), t = 1, . . . , n,
selection matrix since it selects the rows of the state equation which have nonzero
disturbance terms; however, much of the theory remains valid if Rt is a general
m × r matrix.
Model (3.1) provides a powerful tool for the analysis of a wide range of
problems. In this chapter we shall give substance to the general theory to be
presented in Chapter 4 by describing a number of important applications of the
model to problems in time series analysis and in spline smoothing analysis.
γt+1 = − Σ_{j=1}^{s−1} γt+1−j + ωt ,    ωt ∼ N(0, σω2 ),    (3.3)
γt = Σ_{j=1}^{[s/2]} (γ̃j cos λj t + γ̃j∗ sin λj t),    λj = 2πj/s,    j = 1, . . . , [s/2],    (3.5)
where [a] is the largest integer ≤ a and where the quantities γ̃j and γ̃j∗ are given
constants. For a time-varying seasonal this can be made stochastic by replacing
γ̃j and γ̃j∗ by the random walks
γ̃j,t+1 = γ̃jt + ω̃jt ,    γ̃∗j,t+1 = γ̃∗jt + ω̃∗jt ,    j = 1, . . . , [s/2],    t = 1, . . . , n,    (3.6)
where ω̃jt and ω̃∗jt are independent N(0, σω2 ) variables; for details see Young, Lane,
Ng and Palmer (1991). An alternative trigonometric form is the quasi-random
walk model
γt = Σ_{j=1}^{[s/2]} γjt ,    (3.7)
where
γj,t+1 = γjt cos λj + γ∗jt sin λj + ωjt ,
γ∗j,t+1 = −γjt sin λj + γ∗jt cos λj + ω∗jt ,    j = 1, . . . , [s/2],    (3.8)

in which the ωjt and ω∗jt terms are independent N(0, σω2 ) variables. We can show
that when the stochastic terms in (3.8) are zero, the values of γt defined by (3.7)
are periodic with period s by taking

γjt = γ̃j cos λj t + γ̃j∗ sin λj t ,    γ∗jt = −γ̃j sin λj t + γ̃j∗ cos λj t ,

which are easily shown to satisfy the deterministic part of (3.8). The required
result follows since γt defined by (3.5) is periodic with period s. In effect, the
deterministic part of (3.8) provides a recursion for (3.5).
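A small check of this periodicity (our own sketch): the deterministic part of (3.8) is a rotation through angle λj , and s applications rotate through 2πj, returning (γjt , γ∗jt ) to its starting value.

```python
import numpy as np

def seasonal_step(g, g_star, lam):
    """Deterministic part of (3.8): rotate (gamma_jt, gamma*_jt) by lambda_j."""
    return (g * np.cos(lam) + g_star * np.sin(lam),
            -g * np.sin(lam) + g_star * np.cos(lam))

s = 12
start = (1.3, -0.7)                 # arbitrary starting values
for j in range(1, s // 2 + 1):
    lam = 2.0 * np.pi * j / s       # lambda_j = 2*pi*j/s
    g, g_star = start
    for _ in range(s):              # s steps: total rotation angle 2*pi*j
        g, g_star = seasonal_step(g, g_star, lam)
    # (g, g_star) is back at `start`, so gamma_t has period s
```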
The advantage of (3.7) over (3.6) is that the contributions of the errors ωjt and
ω∗jt are not amplified in (3.7) by the trigonometric functions cos λj t and sin λj t.
We regard (3.3) as the main time domain model and (3.7) as the main frequency
domain model for the seasonal component in structural time series analysis. A
more detailed discussion of seasonal models is presented in Proietti (2000). In
particular, he shows that the seasonal model in trigonometric form with specific
variance restrictions for ωjt and ω∗jt is equivalent to the quasi-random walk
seasonal model (3.4).
y t = µ t + γ t + εt , t = 1, . . . , n. (3.9)
To represent the model in state space form, we take the state vector as
4.1 Introduction
In this chapter and the following three chapters we provide a general treatment
from both classical and Bayesian perspectives of the linear Gaussian state space
model (3.1). The observations yt will be treated as multivariate. For much of
the theory, the development is a straightforward extension to the general case
of the treatment of the simple local level model in Chapter 2. We also consider
linear unbiased estimates in the non-normal case.
In Section 4.2 we present some elementary results in multivariate regres-
sion theory which provide the foundation for our treatment of Kalman filtering
and smoothing later in the chapter. We begin by considering a pair of jointly
distributed random vectors x and y. Assuming that their joint distribution is
normal, we show in Lemma 1 that the conditional distribution of x given y is
normal and we derive its mean vector and variance matrix. We shall show in
Section 4.3 that these results lead directly to the Kalman filter. For workers who
do not wish to assume normality we derive in Lemma 2 the minimum variance
linear unbiased estimate of x given y. For those who prefer a Bayesian approach
we derive in Lemma 3, under the assumption of normality, the posterior den-
sity of x given an observed value of y. Finally in Lemma 4, while retaining the
Bayesian approach, we drop the assumption of normality and derive a quasi-
posterior density of x given y, with a mean vector which is linear in y and which
has minimum variance matrix.
All four lemmas can be regarded as representing in appropriate senses the
regression of x on y. For this reason in all cases the mean vectors and variance
matrices are the same. We shall use these lemmas to derive the Kalman filter and
smoother in Sections 4.3 and 4.4. Because the mean vectors and variance matrices
are the same, we need only use one of the four lemmas to derive the results that
we need; the results so obtained then remain valid under the conditions assumed
under the other three lemmas.
Denote the set of observations y1 , . . . , yt by Yt . In Section 4.3 we will
derive the Kalman filter, which is a recursion for calculating at|t = E(αt |Yt ),
at+1 = E(αt+1 |Yt ), Pt|t = Var(αt |Yt ) and Pt+1 = Var(αt+1 |Yt ) given at and
Pt . The derivation requires only elementary properties of multivariate regres-
sion theory derived in Lemmas 1 to 4. We also investigate some properties of
state estimation errors and one-step ahead forecast errors. In Section 4.4 we use
the output of the Kalman filter and the properties of forecast errors to obtain
recursions for smoothing the series, that is, calculating the conditional mean and
variance matrix of αt , for t = 1, . . . , n, n+1, given all the observations y1 , . . . , yn .
Estimates of the disturbance vectors εt and ηt and their error variance matrices
given all the data are derived in Section 4.5. Covariance matrices of smoothed
estimators are considered in Section 4.7. The weights associated with filtered
and smoothed estimates of functions of the state and disturbance vectors are
discussed in Section 4.8. Section 4.9 describes how to generate random samples
for purposes of simulation from the smoothed densities of the state and distur-
bance vectors given the observations. The problem of missing observations is
considered in Section 4.10 where we show that with the state space approach
the problem is easily dealt with by means of simple modifications of the Kalman
filter and the smoothing recursions. Section 4.11 shows that forecasts of observa-
tions and state can be obtained simply by treating future observations as missing
values; these results are of special significance in view of the importance of fore-
casting in much practical time series work. A comment on varying dimensions
of the observation vector is given in Section 4.12. Finally, in Section 4.13 we
consider a general matrix formulation of the state space model.
Proof. Let z = x − Σxy Σyy−1 (y − µy ). Since the transformation from (x, y) to (y, z)
is linear and (x, y) is normally distributed, the joint distribution of y and z is
normal. We have

E(z) = µx ,
Var(z) = E[(z − µx )(z − µx )'] = Σxx − Σxy Σyy−1 Σ'xy ,    (4.4)
Cov(y, z) = E[y(z − µx )'] = E[ y(x − µx )' − y(y − µy )' Σyy−1 Σ'xy ] = 0.    (4.5)
Using the result that if two vectors are normal and uncorrelated they are inde-
pendent, we infer from (4.5) that z is distributed independently of y. Since
the distribution of z does not depend on y its conditional distribution given
y is the same as its unconditional distribution, that is, it is normal with
mean vector µx and variance matrix (4.4), which is the same as (4.3). Since
z = x − Σxy Σyy⁻¹(y − µy), it follows that the conditional distribution of x given
y is normal with mean vector (4.2) and variance matrix (4.3).
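A numerical illustration (not part of the original text) may help: with an assumed partitioned covariance matrix for (x, y), the algebra behind (4.4) and (4.5) can be verified directly. All matrix values below are hypothetical.

```python
import numpy as np

# Hypothetical joint covariance of (x, y), partitioned as in (4.1)
Sxx = np.array([[4.0, 1.0], [1.0, 3.0]])
Sxy = np.array([[1.0, 0.5, 0.2], [0.3, 1.0, 0.4]])
Syy = np.array([[2.0, 0.3, 0.1], [0.3, 1.5, 0.2], [0.1, 0.2, 1.0]])

B = Sxy @ np.linalg.inv(Syy)          # regression matrix Σxy Σyy⁻¹

# For z = x − B(y − µy): Cov(y, z) = Σxy′ − Σyy B′ vanishes, as in (4.5)
cov_yz = Sxy.T - Syy @ B.T
assert np.allclose(cov_yz, 0.0)

# Var(z) = Σxx − B Σyy B′ equals the matrix in (4.4), which is also (4.3)
var_z = Sxx - B @ Syy @ B.T
assert np.allclose(var_z, Sxx - Sxy @ np.linalg.inv(Syy) @ Sxy.T)
print(var_z)
```

The check that Cov(y, z) = 0 is exact linear algebra, so no simulation is needed.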
Formulae (4.2) and (4.3) are well known in regression theory. An early proof
in a state space context is given in Åström (1970, Chapter 7, Theorem 3.2). The
proof given here is based on the treatment given by Rao (1973, §8a.2(v)). A partially similar proof is given by Anderson (2003, Theorem 2.5.1). A quite different
proof in a state space context is given by Anderson and Moore (1979, Example
3.2) which is repeated by Harvey (1989, Appendix to Chapter 3); some details
of this proof are given in Exercise 4.14.1.
We can regard Lemma 1 as representing the regression of x on y in a mul-
tivariate normal distribution. It should be noted that Lemma 1 remains valid
when Σyy is singular if the symbol Σyy⁻¹ is interpreted as a generalised inverse; see
the treatment in Rao (1973). Åström (1970) pointed out that if the distribution
of (x, y) is singular we can always derive a nonsingular distribution by making a
projection on the hyperplanes where the mass is concentrated. The fact that the
conditional variance Var(x|y) given by (4.3) does not depend on y is a property
special to the multivariate normal distribution and does not generally hold for
other distributions.
We now consider the estimation of x when x is unknown and y is known, as
for example when y is an observed vector. Under the assumptions of Lemma 1
we take as our estimate of x the conditional expectation x̂ = E(x|y), that is,
x̂ = µx + Σxy Σyy⁻¹(y − µy).   (4.6)
This has estimation error x̂ − x, so x̂ is conditionally unbiased in the sense that
E(x̂ − x|y) = x̂ − E(x|y) = 0. It is also obviously unconditionally unbiased in the
sense that E(x̂ − x) = 0. The unconditional error variance matrix of x̂ is
Var(x̂ − x) = Var[Σxy Σyy⁻¹(y − µy) − (x − µx)] = Σxx − Σxy Σyy⁻¹ Σxy′.   (4.7)
Expressions (4.6) and (4.7) are, of course, the same as (4.2) and (4.3) respectively.
We now consider the estimation of x given y when the assumption that (x, y)
is normally distributed is dropped. We assume that the other assumptions of
Lemma 1 are retained. Let us restrict our attention to estimates x̄ that are
linear in the elements of y, that is, we shall take
x̄ = β + γy,
where the vector β and the matrix γ do not depend on y. For x̄ to be unbiased we require
E(x̄ − x) = E(β + γy − x) = β + γµy − µx = 0,
from which β = µx − γµy and hence
x̄ = µx + γ(y − µy).   (4.8)
Let x̂ be the value of x̄ obtained by putting γ = Σxy Σyy⁻¹ in (4.8); by (4.6), x̂ is
the conditional mean under normality. Then
Var(x̄ − x) = Var(x̂ − x) + Var[(γ − Σxy Σyy⁻¹)y],   (4.10)
which holds for all LUEs x̄. Since Var[(γ − Σxy Σyy⁻¹)y] is non-negative definite
the lemma is proved.
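The variance decomposition used in this proof can be illustrated numerically; all moment values below are assumed for the example, with a scalar x and a two-dimensional y.

```python
import numpy as np

# Hypothetical moments for scalar x and 2-dimensional y
Sxx = np.array([[2.0]])
Sxy = np.array([[0.8, 0.4]])
Syy = np.array([[1.5, 0.3], [0.3, 1.0]])

B = Sxy @ np.linalg.inv(Syy)                    # optimal γ = Σxy Σyy⁻¹
var_opt = Sxx - B @ Syy @ B.T                   # error variance (4.7)

gamma = B + np.array([[0.2, -0.1]])             # some other linear unbiased choice
# error variance of x̄ = µx + γ(y − µy): γΣyyγ′ − γΣxy′ − Σxyγ′ + Σxx
var_other = gamma @ Syy @ gamma.T - gamma @ Sxy.T - Sxy @ gamma.T + Sxx

D = gamma - B
# the excess variance equals Var[(γ − ΣxyΣyy⁻¹)y] = D Σyy D′, which is ⪰ 0
assert np.allclose(var_other - var_opt, D @ Syy @ D.T)
assert (var_other - var_opt).item() >= 0
print(var_opt.item(), var_other.item())
</n```

Any other choice of γ can be substituted for the one above; the excess variance remains non-negative because Σyy is positive semi-definite.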
The MVLUE property of the vector estimate x̂ implies that arbitrary linear
functions of elements of x̂ are minimum variance linear unbiased estimates
of the corresponding linear functions of the elements of x. Lemma 2 can be
regarded as an analogue for multivariate distributions of the Gauss–Markov the-
orem for least squares regression of a dependent variable on fixed regressors. For
a treatment of the Gauss–Markov theorem, see, for example, Davidson and Mac-
Kinnon (1993, Chapter 3). Lemma 2 is proved in the special context of Kalman
filtering by Duncan and Horn (1972) and by Anderson and Moore (1979, §3.2).
However, their treatments lack the brevity and generality of Lemma 2 and its
proof.
Lemma 2 is highly significant for workers who prefer not to assume normal-
ity as the basis for the analysis of time series on the grounds that many real
time series have distributions that appear to be far from normal; however, the
MVLUE criterion is regarded as acceptable as a basis for analysis by many of
these workers. We will show later in the book that many important results in
state space analysis such as Kalman filtering and smoothing, missing observation
analysis and forecasting can be obtained by using Lemma 1; Lemma 2 shows that
these results also satisfy the MVLUE criterion. A variant of Lemma 2 is to formulate it in terms of minimum mean square error matrix rather than minimum
variance unbiasedness; this variant is dealt with in Exercise 4.14.4.
Other workers prefer to treat inference problems in state space time series
analysis from a Bayesian point of view instead of from the classical standpoint
for which Lemmas 1 and 2 are appropriate. We therefore consider basic results
in multivariate regression theory that will lead us to a Bayesian treatment of the
linear Gaussian state space model.
Suppose that x is a parameter vector with prior density p(x) and that y is an
observational vector with density p(y) and conditional density p(y|x). Suppose
further that the joint density of x and y is the multivariate normal density p(x, y).
Then the posterior density of x given y is
p(x|y) = p(x, y)/p(y) = p(x) p(y|x)/p(y).   (4.11)
We shall use the same notation as in (4.1) for the first and second moments of
x and y. Equation (4.11) is a form of Bayes' theorem.
If the matrix Var(x̄ − x) − Var(x∗ − x) is non-negative definite for all linear
posterior means x̄, we say that x∗ is a minimum variance linear posterior mean
estimate (MVLPME) of x given y.
Lemma 4 The linear posterior mean x̂ defined by (4.6) is a MVLPME and its
error variance matrix is given by (4.7).
from which it follows that β = µx − γµy and hence that (4.8) holds. Let x̂ be the
value of x̄ obtained by putting γ = Σxy Σyy⁻¹ in (4.8). It follows as in the proof of
Lemma 2 that (4.10) applies so the lemma is proved.
4.3 Filtering
4.3.1 Derivation of the Kalman filter
For convenience we restate the linear Gaussian state space model (3.1) here as
yt = Zt αt + εt , εt ∼ N(0, Ht ),
αt+1 = Tt αt + Rt ηt , ηt ∼ N(0, Qt ), t = 1, . . . , n, (4.12)
α1 ∼ N(a1 , P1 ),
where details are given below (3.1). At various points we shall drop the normality
assumptions in (4.12). Let Yt−1 denote the set of past observations y1 , . . . , yt−1
for t = 2, 3, . . . while Y0 indicates that there is no prior observation before t = 1.
In our treatments below, we will define Yt by the stacked vector (y1′, . . . , yt′)′. Starting
at t = 1 in (4.12) and building up the distributions of αt and yt recursively, it is
easy to show that p(yt |α1 , . . . , αt , Yt−1 ) = p(yt |αt ) and p(αt+1 |α1 , . . . , αt , Yt ) =
p(αt+1 |αt ). In Table 4.1 we give the dimensions of the vectors and matrices of
the state space model.
In this section we derive the Kalman filter for model (4.12) for the case where
the initial state α1 is N(a1 , P1 ) where a1 and P1 are known. We shall base the
derivation on classical inference using Lemma 1. It follows from Lemmas 2 to 4
that the basic results are also valid for minimum variance linear unbiased estima-
tion and for Bayesian-type inference with or without the normality assumption.
Returning to the assumption of normality, our object is to obtain the condi-
tional distributions of αt and αt+1 given Yt for t = 1, . . . , n. Let at|t = E(αt |Yt ),
at+1 = E(αt+1 |Yt ), Pt|t = Var(αt |Yt ) and Pt+1 = Var(αt+1 |Yt ). Since all distri-
butions are normal, it follows from Lemma 1 that conditional distributions of
subsets of variables given other subsets of variables are also normal; the distri-
butions of αt given Yt and αt+1 given Yt are therefore given by N(at|t , Pt|t ) and
N(at+1 , Pt+1 ). We proceed inductively; starting with N(at , Pt ), the distribution
of αt given Yt−1 , we show how to calculate at|t , at+1 , Pt|t and Pt+1 from at and
Pt recursively for t = 1, . . . , n.
Table 4.1  Dimensions of the vectors and matrices of the state space model (4.12).

Vector        Matrix
yt   p × 1    Zt   p × m
αt   m × 1    Tt   m × m
εt   p × 1    Ht   p × p
ηt   r × 1    Rt   m × r
              Qt   r × r
a1   m × 1    P1   m × m

Let
vt = yt − E(yt|Yt−1) = yt − Zt at,   t = 1, . . . , n.   (4.13)
Thus vt is the one-step ahead forecast error of yt given Yt−1 . When Yt−1 and
vt are fixed then Yt is fixed and vice versa. Thus E(αt |Yt ) = E(αt |Yt−1 , vt ). But
E(vt |Yt−1 ) = E(yt − Zt at |Yt−1 ) = E(Zt αt + εt − Zt at |Yt−1 ) = 0. Consequently,
E(vt) = 0 and Cov(yj, vt) = E[yj E(vt|Yt−1)′] = 0 for j = 1, . . . , t − 1. Also, by Lemma 1,
at|t = E(αt|Yt−1, vt) = E(αt|Yt−1) + Cov(αt, vt)[Var(vt)]⁻¹ vt,   (4.14)
where Cov and Var refer to covariance and variance in the conditional joint
distributions of αt and vt given Yt−1. Here, E(αt|Yt−1) = at by definition of at
and
Cov(αt, vt) = E[αt(Zt αt + εt − Zt at)′|Yt−1] = E[αt(αt − at)′Zt′|Yt−1] = Pt Zt′,   (4.15)
by definition of Pt. Let
Ft = Var(vt|Yt−1) = Zt Pt Zt′ + Ht.   (4.16)
Then
at|t = at + Pt Zt′ Ft⁻¹ vt.   (4.17)
By (4.3) of Lemma 1 in Section 4.2 we have
Pt|t = Pt − Pt Zt′ Ft⁻¹ Zt Pt,   (4.18)
for t = 1, . . . , n. From the state equation in (4.12) we have
at+1 = E(Tt αt + Rt ηt|Yt) = Tt at|t,   (4.19)
Pt+1 = Var(Tt αt + Rt ηt|Yt) = Tt Pt|t Tt′ + Rt Qt Rt′.   (4.20)
Substituting (4.17) into (4.19) gives
at+1 = Tt at|t = Tt at + Kt vt,   t = 1, . . . , n,   (4.21)
where
Kt = Tt Pt Zt′ Ft⁻¹.   (4.22)
The matrix Kt is referred to as the Kalman gain. We observe that at+1 has been
obtained as a linear function of the previous value at and the forecast error vt
of yt given Yt−1. Substituting from (4.18) and (4.22) in (4.20) gives
Pt+1 = Tt Pt (Tt − Kt Zt)′ + Rt Qt Rt′,   t = 1, . . . , n.   (4.23)
Relations (4.21) and (4.23) are sometimes called the prediction step of the
Kalman filter.
The recursions (4.17), (4.21), (4.18) and (4.23) constitute the celebrated
Kalman filter for model (4.12). They enable us to update our knowledge of
the system each time a new observation comes in. It is noteworthy that we
have derived these recursions by simple applications of the standard results of
multivariate normal regression theory contained in Lemma 1. The key advantage
of the recursions is that we do not have to invert a (pt × pt) matrix to fit the
model each time the tth observation comes in for t = 1, . . . , n; we only have to
invert the (p × p) matrix Ft and p is generally much smaller than n; indeed, in
the most important case in practice, p = 1. Although relations (4.17), (4.21),
(4.18) and (4.23) constitute the forms in which the multivariate Kalman filter
recursions are usually presented, we shall show in Section 6.4 that variants of
them in which elements of the observational vector yt are brought in one at a
time, rather than the entire vector yt , are in general computationally superior.
We infer from Lemma 2 that when the observations are not normally dis-
tributed and we restrict attention to estimates which are linear in yt and
unbiased, and also when matrices Zt and Tt do not depend on previous yt ’s,
then under appropriate assumptions the values of at|t and at+1 given by the
filter minimise the variance matrices of the estimates of αt and αt+1 given Yt .
These considerations emphasise the point that although our results are obtained
under the assumption of normality, they have a wider validity in the sense of
minimum variance linear unbiased estimation when the variables involved are
not normally distributed. It follows from the discussion just after the proof of
Lemma 2 that the estimates are also minimum error variance linear estimates.
From the standpoint of Bayesian inference we note that, on the assump-
tion of normality, Lemma 3 implies that the posterior densities of αt and αt+1
given Yt are normal with mean vectors (4.17) and (4.21) and variance matrices
(4.18) and (4.23), respectively. We therefore do not need to provide a separate
Bayesian derivation of the Kalman filter. If the assumption of normality is
dropped, it follows from Lemma 4 that the same quantities are the minimum
variance linear posterior mean estimates with the same error variance matrices.
Collecting the results, the Kalman filter is given by
vt = yt − Zt at,              Ft = Zt Pt Zt′ + Ht,
at|t = at + Pt Zt′ Ft⁻¹ vt,   Pt|t = Pt − Pt Zt′ Ft⁻¹ Zt Pt,   (4.24)
at+1 = Tt at + Kt vt,         Pt+1 = Tt Pt (Tt − Kt Zt)′ + Rt Qt Rt′,
for t = 1, . . . , n, where Kt = Tt Pt Zt′ Ft⁻¹, with a1 and P1 as the mean vector and
variance matrix of the initial state vector α1 . The recursion (4.24) is called the
Kalman filter. Once at|t and Pt|t have been computed, the last pair of relations
in (4.24) predicts the state vector αt+1 and its variance matrix at time t. In Table 4.2
we give the dimensions of the vectors and matrices of the Kalman filter equations.
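As a concrete illustration (not from the original text), the recursion (4.24) can be sketched in Python for the local level model, where Zt = Tt = Rt = 1 and Ht, Qt are the scalar variances; the data values and variances below are arbitrary.

```python
import numpy as np

def kalman_filter(y, sigma_eps2, sigma_eta2, a1, P1):
    """Kalman filter (4.24) for the local level model
    y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t,
    i.e. Z_t = T_t = R_t = 1, H_t = sigma_eps2, Q_t = sigma_eta2."""
    n = len(y)
    a = np.empty(n + 1); P = np.empty(n + 1)
    v = np.empty(n); F = np.empty(n)
    att = np.empty(n); Ptt = np.empty(n)
    a[0], P[0] = a1, P1
    for t in range(n):
        v[t] = y[t] - a[t]               # one-step ahead forecast error
        F[t] = P[t] + sigma_eps2         # F_t = Z P Z' + H
        K = P[t] / F[t]                  # Kalman gain (T_t = Z_t = 1)
        att[t] = a[t] + K * v[t]         # updating step (4.17)
        Ptt[t] = P[t] - K * P[t]         # updating step (4.18)
        a[t + 1] = att[t]                # prediction step (T_t = 1)
        P[t + 1] = Ptt[t] + sigma_eta2
    return a, P, v, F, att, Ptt

y = np.array([1.0, 0.5, 1.5, 1.2])       # arbitrary illustrative data
a, P, v, F, att, Ptt = kalman_filter(y, sigma_eps2=1.0, sigma_eta2=0.5,
                                     a1=0.0, P1=10.0)
print(att, Ptt)
```

For the first step one can verify by hand that F1 = P1 + σε² = 11 and a1|1 = 10/11 × y1, in agreement with (4.17) and (4.16).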
The state space model (4.12) is sometimes extended by the addition of known
mean adjustment terms, giving
yt = Zt αt + dt + εt,   εt ∼ N(0, Ht),
αt+1 = Tt αt + ct + Rt ηt,   ηt ∼ N(0, Qt),   (4.25)
α1 ∼ N(a1, P1),
where p × 1 vector dt and m × 1 vector ct are known and may change over
time. Indeed, Harvey (1989) employs (4.25) as the basis for the treatment of the
linear Gaussian state space model. While the simpler model (4.12) is adequate
for most purposes, it is worthwhile presenting the Kalman filter for model (4.25)
explicitly for occasional use.
Table 4.2  Dimensions of the vectors and matrices of the Kalman filter.

Vector          Matrix
vt     p × 1    Ft     p × p
                Kt     m × p
at     m × 1    Pt     m × m
at|t   m × 1    Pt|t   m × m
86 Filtering, smoothing and forecasting
Defining at = E(αt |Yt−1 ) and Pt = Var(αt |Yt−1 ) as before and assuming that
dt can depend on Yt−1 and ct can depend on Yt , the Kalman filter for (4.25)
takes the form
vt = yt − Zt at − dt,         Ft = Zt Pt Zt′ + Ht,
at|t = at + Pt Zt′ Ft⁻¹ vt,   Pt|t = Pt − Pt Zt′ Ft⁻¹ Zt Pt,   (4.26)
at+1 = Tt at|t + ct,          Pt+1 = Tt Pt|t Tt′ + Rt Qt Rt′,
for t = 1, . . . , n. The reader can easily verify this result by going through the
argument leading from (4.19) to (4.23) step by step for model (4.25) in place of
model (4.12).
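The reader may find it helpful to see (4.26) written out in matrix form; the sketch below (with hypothetical dimensions and parameter values, here m = 2 states and p = 1 observation) performs one step of the filter with mean adjustments dt and ct.

```python
import numpy as np

def kf_step(y, a, P, Z, T, R, H, Q, d, c):
    """One step of the Kalman filter (4.26) with mean adjustments d_t, c_t."""
    v = y - Z @ a - d                          # forecast error
    F = Z @ P @ Z.T + H
    Finv = np.linalg.inv(F)
    att = a + P @ Z.T @ Finv @ v               # updating step
    Ptt = P - P @ Z.T @ Finv @ Z @ P
    a_next = T @ att + c                       # prediction step
    P_next = T @ Ptt @ T.T + R @ Q @ R.T
    return v, F, att, Ptt, a_next, P_next

# Hypothetical example: local linear trend-like transition matrix
Z = np.array([[1.0, 0.0]])
T = np.array([[1.0, 1.0], [0.0, 1.0]])
R = np.eye(2)
H = np.array([[1.0]])
Q = np.diag([0.1, 0.01])
a, P = np.zeros(2), np.eye(2)
d, c = np.zeros(1), np.zeros(2)

v, F, att, Ptt, a1_, P1_ = kf_step(np.array([2.0]), a, P, Z, T, R, H, Q, d, c)
print(att, a1_)
```

With dt = ct = 0 this reduces exactly to one step of (4.24), which provides an easy hand check: here F = [[2]] and at|t = (1, 0)′.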
Defining xt = αt − at and Lt = Tt − Kt Zt, and subtracting the filter relations
vt = yt − Zt at and at+1 = Tt at + Kt vt from the corresponding equations of the
model
yt = Zt αt + εt,   αt+1 = Tt αt + Rt ηt,
we obtain the innovation analogue of the state space model, that is,
vt = Zt xt + εt,   xt+1 = Lt xt + Rt ηt − Kt εt,   (4.30)
Note that Cov(xt, ηt) = 0 and Cov(xt, εt) = 0 since xt depends only on
disturbances up to time t − 1. Relations (4.30) will be used for deriving the
smoothing recursions in the next section.
We finally show that the one-step ahead forecast errors are independent of
each other using the same arguments as in Subsection 2.3.1. The joint density
of the observational vectors y1 , . . . , yn is
p(y1, . . . , yn) = p(y1) ∏_{t=2}^{n} p(yt|Yt−1).
Transforming from y1, . . . , yn to the innovations v1, . . . , vn gives
p(v1, . . . , vn) = ∏_{t=1}^{n} p(vt),
since p(y1 ) = p(v1 ) and the Jacobian of the transformation is unity because each
vt is yt minus a linear function of y1 , . . . , yt−1 for t = 2, . . . , n. Consequently
v1 , . . . , vn are independent of each other, from which it also follows that vt , . . . , vn
are independent of Yt−1 .
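The independence of the innovations can be checked numerically. For a local level model with hypothetical variances, the lower-triangular transformation from y1, . . . , yn to v1, . . . , vn implied by the filter diagonalises the exact covariance matrix of the observations, with diagonal entries Ft; all parameter values below are assumed for illustration.

```python
import numpy as np

# Local level model: y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t,
# with alpha_1 ~ N(a1, P1); all parameter values assumed for illustration
n, se2, sn2, P1 = 5, 1.0, 0.5, 2.0

# Exact covariance matrix of (y_1, ..., y_n):
# Cov(y_t, y_s) = P1 + (min(t, s) - 1) sigma_eta^2 + delta_ts sigma_eps^2
t = np.arange(1, n + 1)
Sigma = P1 + sn2 * (np.minimum.outer(t, t) - 1) + se2 * np.eye(n)

# Kalman filter recursions for P_t, F_t, K_t (these do not involve the data)
P, Fs, Ks = P1, [], []
for _ in range(n):
    F = P + se2            # F_t = P_t + sigma_eps^2
    K = P / F              # Kalman gain with Z_t = T_t = 1
    Fs.append(F); Ks.append(K)
    P = P * (1 - K) + sn2  # P_{t+1} = P_t (1 - K_t) + sigma_eta^2

# Lower-triangular matrix B with v = B (y - E y):
# v_t = yc_t - a_t, with a_1 = 0 (centred) and a_{t+1} = a_t + K_t v_t
B = np.zeros((n, n))
a_coef = np.zeros(n)       # coefficients expressing a_t in terms of centred y
for i in range(n):
    B[i] = -a_coef
    B[i, i] = 1.0
    a_coef = a_coef + Ks[i] * B[i]

# The innovations are mutually uncorrelated with variances F_t
V = B @ Sigma @ B.T
assert np.allclose(V, np.diag(Fs))
print(np.round(V, 6))
```

The unit Jacobian of the transformation appears here as the unit diagonal of B, so B Σ B′ being diagonal is the covariance form of the independence result.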
7.1 Introduction
So far we have developed methods for estimating parameters which can be placed
in the state vector of model (4.12). In virtually all applications in practical work
the models depend on additional parameters which have to be estimated from
the data; for example, in the local level model (2.3) the variances σε² and ση² are
unknown and need to be estimated. In classical analyses, these additional param-
eters are assumed to be fixed but unknown whereas in Bayesian analyses they
are assumed to be random variables. Because of the differences in assumptions
the treatment of the two cases is not the same. In this chapter we deal with clas-
sical analyses in which the additional parameters are fixed and are estimated by
maximum likelihood. The Bayesian treatment for these parameters is discussed
as part of a general Bayesian discussion of state space models in Chapter 13 of
Part II.
For the linear Gaussian model we shall show that the likelihood can be cal-
culated by a routine application of the Kalman filter, even when the initial state
vector is fully or partially diffuse. We also give the details of the computation
of the likelihood when the univariate treatment of multivariate observations is
adopted as suggested in Section 6.4. We go on to consider how the loglikelihood
can be maximised by means of iterative numerical procedures. An important part
in this process is played by the score vector and we show how this is calculated,
both for the case where the initial state vector has a known distribution and for
the diffuse case. A useful device for maximisation of the loglikelihood in some
cases, particularly in the early stages of maximisation, is the EM algorithm; we
give details of this for the linear Gaussian model. We go on to consider biases
in estimates due to errors in parameter estimation. The chapter ends with a
discussion of some questions of goodness-of-fit and diagnostic checks.
The likelihood is
L(Yn) = p(y1, . . . , yn) = p(y1) ∏_{t=2}^{n} p(yt|Yt−1),
so that
log L(Yn) = ∑_{t=1}^{n} log p(yt|Yt−1),   (7.1)
where p(y1 |Y0 ) = p(y1 ). For model (3.1), E(yt |Yt−1 ) = Zt at . Putting vt = yt −
Zt at , Ft = Var(yt |Yt−1 ) and substituting N(Zt at , Ft ) for p(yt |Yt−1 ) in (7.1), we
obtain
log L(Yn) = −(np/2) log 2π − (1/2) ∑_{t=1}^{n} (log|Ft| + vt′ Ft⁻¹ vt).   (7.2)
The quantities vt and Ft are calculated routinely by the Kalman filter (4.24)
so log L(Yn ) is easily computed from the Kalman filter output. We assume that
Ft is nonsingular for t = 1, . . . , n. If this condition is not satisfied initially it is
usually possible to redefine the model so that it is satisfied. The representation
(7.2) of the loglikelihood was first given by Schweppe (1965). Harvey (1989, §3.4)
refers to it as the prediction error decomposition.
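As an illustrative check (with assumed parameter values, not from the text), the loglikelihood (7.2) computed from the Kalman filter output can be compared with a direct evaluation of the joint Gaussian density of y1, . . . , yn for a local level model; the two agree to machine precision.

```python
import numpy as np

# Local level model parameters (hypothetical values)
se2, sn2, a1, P1 = 1.0, 0.5, 0.0, 2.0
y = np.array([0.3, 1.1, 0.7, -0.2, 0.5])
n = len(y)

# Prediction error decomposition (7.2) via the Kalman filter (4.24)
a, P, logL = a1, P1, 0.0
for t in range(n):
    v = y[t] - a
    F = P + se2
    logL += -0.5 * (np.log(2 * np.pi) + np.log(F) + v * v / F)
    K = P / F
    a = a + K * v
    P = P * (1 - K) + sn2

# Direct evaluation of the joint Gaussian density of (y_1, ..., y_n)
tt = np.arange(1, n + 1)
Sigma = P1 + sn2 * (np.minimum.outer(tt, tt) - 1) + se2 * np.eye(n)
r = y - a1                                 # E(y_t) = a1 for all t
sign, logdet = np.linalg.slogdet(Sigma)
logL_direct = -0.5 * (n * np.log(2 * np.pi) + logdet
                      + r @ np.linalg.solve(Sigma, r))

assert np.isclose(logL, logL_direct)
print(logL)
```

The direct evaluation requires an n × n determinant and solve, whereas (7.2) needs only the scalar quantities vt and Ft from the filter, which is the computational point made in the text.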
and we work with log Ld (Yn ) in place of log L(Yn ) for estimation of unknown
parameters in the diffuse case. Similar definitions for the diffuse loglikelihood
function have been adopted by Harvey and Phillips (1979) and Ansley and Kohn
(1986). As in Section 5.2, and for the same reasons, we assume that F∞,t is
The state space framework accommodates both classical and Bayesian techniques by providing a unified structure in which inference results for the state are the same under both approaches. This is evident in computational methods such as the Kalman filter and smoothing recursions, which take the same form from the Bayesian and the classical perspectives. For instance, a Bayesian analysis starts from the prior distribution and updates it with each new observation, mirroring the recursive updating of estimates in the classical approach. Likewise, the filtered estimates and their variances are computed from formulae that are valid under both probabilistic interpretations.
In the context of a local level model, the Kalman filter plays a crucial role by updating estimates of the state vector each time a new observation is received. It computes the filtered estimator and the one-step ahead predictor of the state from known conditional distributions. Through its recursive equations the filter permits real-time updating of the state estimates and their variances, and it accommodates observation noise and missing observations effectively within the state space framework.
The four lemmas from elementary multivariate regression are foundational for the theory of the linear state space model. They ensure that formulae derived for classical models under normality remain valid under the linear minimum variance criterion and under Bayesian assumptions. These results establish the equivalence of the estimation methods across the different sets of assumptions, notably in deriving the Kalman filter and smoothing recursions. The derivations yield consistent estimates of the state vectors and their conditional variance matrices, which is crucial in practical applications where the distributions cannot strictly be assumed Gaussian.
Exponential smoothing fits into the state space and ARIMA frameworks by being expressible as simple forms of these models. The historical development of exponential smoothing showed that it could be cast in state space form, helping to unify various forecasting methods. State space models are flexible enough to represent the dynamics captured by exponential smoothing, linking it theoretically to the well-known ARIMA models of time series analysis. This demonstrates the state space framework's capacity to represent a wide array of time series models and hence its broad applicability.
Minimum variance linear unbiased estimation (MVLUE) is applied in state space models by seeking estimates of the states that are linear functions of the observations and unbiased. Its advantage lies in providing estimates with the smallest possible estimation error variance. This approach matters when assumptions such as normality are questionable: by restricting attention to linear unbiased estimates that minimise variance, it provides robustness and efficiency in estimation.
Simulation smoothing in state space models generates random samples from the smoothed densities of the state and disturbance vectors given the observations. This is crucial for simulation-based inference methods, aiding estimation of the states and disturbances even when the data are incomplete or noisy. By providing a basis for evaluating model performance and conducting diagnostic checks, simulation smoothing enhances the robustness and applicability of state space models.
A multivariate time series can be treated as a univariate series in the state space approach by bringing the elements of each observational vector into the filter one at a time. This can yield computational savings, since it replaces the inversion of the p × p matrix Ft by scalar operations without losing information. By treating high-dimensional observation vectors element by element, the state space method simplifies the calculations and allows efficient handling of the data.
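A minimal numerical sketch of this idea (with all values assumed for illustration): for a single time point with p = 2 and diagonal Ht, updating with the two elements of yt one at a time reproduces the batch update of (4.24) while requiring only scalar divisions.

```python
import numpy as np

# Hypothetical one-time-point example: m = 2 states, p = 2 observations, H diagonal
Z = np.array([[1.0, 0.0], [1.0, 1.0]])
H = np.diag([1.0, 0.5])
a = np.array([0.2, -0.1])
P = np.array([[1.0, 0.3], [0.3, 2.0]])
y = np.array([0.7, 1.4])

# Batch update as in (4.24): treat y_t as one vector, invert the 2 x 2 matrix F
F = Z @ P @ Z.T + H
att_batch = a + P @ Z.T @ np.linalg.solve(F, y - Z @ a)
Ptt_batch = P - P @ Z.T @ np.linalg.solve(F, Z @ P)

# Sequential update: bring in the elements of y_t one at a time
att, Ptt = a.copy(), P.copy()
for i in range(2):
    Zi = Z[i]                      # i-th row of Z_t
    Fi = Zi @ Ptt @ Zi + H[i, i]   # scalar; no matrix inversion needed
    Ki = Ptt @ Zi / Fi
    att = att + Ki * (y[i] - Zi @ att)
    Ptt = Ptt - np.outer(Ki, Zi @ Ptt)

assert np.allclose(att, att_batch)
assert np.allclose(Ptt, Ptt_batch)
print(att)
```

The agreement rests on the elements of yt being conditionally independent given the state when Ht is diagonal, so conditioning on them sequentially is equivalent to conditioning on the vector at once.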
Initialisation plays a vital role in the application of recursive algorithms within the state space approach by setting the starting conditions for the computations. It determines how the recursions begin, which matters because the accuracy of subsequent state estimates depends on the initial specification. Elements of the initial state may be given known distributions, treated as diffuse random variables with infinite variance, or estimated as unknown constants by methods such as maximum likelihood, ensuring that the analysis is soundly based from the outset.
The state space approach handles missing observations by integrating them seamlessly into the filtering and smoothing recursions. The same device allows forecasting: future observations are simply treated as if they were missing. Unlike approaches that require complex data-handling procedures, the recursive nature of the state space model allows simple modifications that accommodate missing data while preserving the continuity of the analysis.
Continuous-time state space modelling treats change as occurring continuously in time, unlike discrete-time models which work with fixed time intervals. It can capture the dynamics and temporal relationships in the data more fluidly, offering a potentially more accurate description in contexts where phenomena do not occur at regular intervals, and it aligns with theoretical frameworks in which time is a fundamental, unbroken parameter.