
Journal of Forecasting
J. Forecast. 30, 168–209 (2011)
Published online 30 July 2010 in Wiley Online Library ([Link]) DOI: 10.1002/for.1195

Particle Filters and Bayesian Inference in Financial Econometrics

HEDIBERT F. LOPES* AND RUEY S. TSAY
University of Chicago Booth School of Business, Chicago, IL, USA

ABSTRACT
In this paper we review sequential Monte Carlo (SMC) methods, or particle
filters (PF), with special emphasis on their potential applications in financial time
series analysis and econometrics. We start with the well-known normal dynamic
linear model, also known as the normal linear state space model, for which
sequential state learning is available in closed form via standard Kalman filter
and Kalman smoother recursions. Particle filters are then introduced as a set of
Monte Carlo schemes that enable Kalman-type recursions when normality or
linearity or both are abandoned. The seminal bootstrap filter (BF) of Gordon,
Salmond and Smith (1993) is used to introduce the SMC jargon, potentials and
limitations. We also review the literature on parameter learning, an area that
started to attract much attention from the particle filter community in recent years.
We give particular attention to the Liu–West filter (2001), Storvik filter (2002)
and particle learning (PL) of Carvalho, Johannes, Lopes and Polson (2010). We
argue that the BF and the auxiliary particle filter (APF) of Pitt and Shephard
(1999) define two fundamentally distinct directions within the particle filter lit-
erature. We also show that the distinction is more pronounced with parameter
learning and argue that PL, which follows the APF direction, is an attractive
extension. One of our contributions is to sort out the research from BF to APF
(during the 1990s), from APF to now (the 2000s) and from Liu–West filter to
Storvik filter to PL. To this end, we provide code in R for all the examples of
the paper. Readers are invited to find their own way into this dynamic and active
research arena. Copyright © 2010 John Wiley & Sons, Ltd.

key words particle learning; sequential Monte Carlo; Markov chain Monte
Carlo; stochastic volatility; realized volatility; Nelson–Siegel
model

INTRODUCTION

The Kalman filter (KF) and its many variants and generalizations have played a fundamental role
in modern time series analysis by allowing the study and estimation of complex dynamics and by
drawing the attention of researchers and practitioners to the rich class of state-space models, also
known as dynamic models (Harvey, 1989; West and Harrison, 1997). Well-known and widely used

* Correspondence to: Hedibert F. Lopes, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue,
Chicago, IL 60637, USA. E-mail: hlopes@[Link]

Copyright © 2010 John Wiley & Sons, Ltd.


Particle Filters and Bayesian Inference 169

variants of KF include (i) the extended KF (Jazwinski, 1970; West et al., 1985); (ii) the
Gaussian sum filter (Alspach and Sorenson, 1972); (iii) the unscented KF (Julier and Uhlmann, 1997;
Van der Merwe et al., 2000); and (iv) the Gaussian quadrature KF (Ito and Xiong, 2000).
Despite their wide applicability, approximations provided by these variants become less effective
when substantial nonlinearities and/or extreme non-Gaussianity are present in the data. To overcome
the difficulty, the last two decades have seen an increasing number of Monte Carlo (MC)-
based approximations for state-space models. These MC methods are basically divided into two major
categories: Markov chain Monte Carlo (MCMC) schemes for offline/batch sampling and sequential
Monte Carlo (SMC) schemes for online/sequential sampling. For example, Carlin et al. (1992), Carter
and Kohn (1994), Frühwirth-Schnatter (1994) and Shephard (1994) propose MCMC methods to esti-
mate general state-space models. However, MCMC-based algorithms are prohibitively costly when
performing online estimation of states and parameters; see Gamerman and Lopes (2006).
SMC methods, also known as particle filters, are MC schemes that, when used in the state-space
context, rebalance draws from the posterior distribution of the states and parameters at a given time
(the particles) based on the next observation via its likelihood. In their seminal paper, Gordon et al.
(1993) propose one of the most popular filters, the bootstrap filter (BF), which is based on a sampling
importance resampling (SIR) argument (Smith and Gelfand, 1992). Also influential from the start
are the works on sequential Bayesian imputation by Kong et al. (1994) and Liu and Chen (1995).
In this paper we review the bootstrap filter and its variants. We also introduce the auxiliary particle
filter (APF) of Pitt and Shephard (1999) (see also the discussion in Liu and Chen, 1998) and argue
that both filters define two directions within the SMC literature, namely sample–resample and resample–sample methods. This is done in the third section, which ends with a list of additional review
papers and books on SMC. The fourth section starts by showing how both BF and APF can be used
to approximate the likelihood function of fixed parameters. We then introduce the Liu and West
filter (Liu and West, 2001), which generalizes APF to sequentially update the posterior distributions
of parameters. The section also introduces the particle learning (PL) of Carvalho et al. (2010). Some
illustrative examples appear in the fifth section. Final remarks and current research directions are
presented in the sixth section.

NORMAL DYNAMIC LINEAR MODEL

To introduce the ideas of the particle filter, let us start with the well-known normal dynamic linear
model (NDLM):

y_t = F_t′ x_t + v_t    (1)

x_t = G_t x_{t−1} + w_t    (2)

where v_t and w_t are temporally and mutually independent Gaussian sequences with zero mean and variances σ_t² and τ_t², respectively. Equation (1) is referred to as the observation equation that relates the observed series y_t to the state vector x_t. Equation (2) is the state transition equation that governs the time evolution of the state, which might be latent. The local level and the local linear trend models are special cases of the NDLM. In the local level model, y_t = x_t + v_t and x_t = x_{t−1} + w_t, so that F_t = G_t = 1, σ_t² = σ² and τ_t² = τ² for all t. In the local linear trend model, y_t = x_{1t} + v_t, x_{1t} = x_{1,t−1} + x_{2,t−1} + w_{1t} and x_{2t} = x_{2,t−1} + w_{2t}, and we have x_t = (x_{1t}, x_{2t})′, F_t = (1, 0)′, G = (g_1, g_2), g_1 = (1, 0)′, g_2 = (1, 1)′, σ_t² = σ² and τ_t² = τ² for all t, where τ² is now a 2 × 2 positive definite matrix.

170 H. F. Lopes and R. S. Tsay

Conditionally on the quadruple {F_t, G_t, σ_t², τ_t²}, for t = 1, . . . , T, and on the initial distribution (x_0 | y^0) ~ N(m_0, C_0), it is straightforward to show that

x_t | y^{t−1} ~ N(a_t, R_t)    (3)

y_t | y^{t−1} ~ N(f_t, Q_t)    (4)

x_t | y^t ~ N(m_t, C_t)    (5)

for t = 1, . . . , T, where y^t = (y_1, . . . , y_t)′ and N(a, b) denotes the normal distribution with mean a and variance b. The three densities in equations (3)–(5) are referred to as the propagation density, predictive density and filtering density, respectively. In fact, the propagation and filtering densities are the prior density of x_t given y^{t−1} and the posterior density of x_t given y^t. The means and variances of the three densities are provided by the Kalman recursions:

a_t = G_t m_{t−1} and R_t = G_t C_{t−1} G_t′ + τ_t²    (6)

f_t = F_t′ a_t and Q_t = F_t′ R_t F_t + σ_t²    (7)

m_t = a_t + A_t e_t and C_t = R_t − A_t Q_t A_t′    (8)

where e_t = y_t − f_t is the prediction error and A_t = R_t F_t Q_t^{−1} is the Kalman gain. Two other useful densities are the conditional and marginal smoothed densities:

x_t | x_{t+1}, y^t ~ N(h_t, H_t)    (9)

x_t | y^T ~ N(m_t^T, C_t^T)    (10)

where

h_t = m_t + B_t(x_{t+1} − a_{t+1}) and H_t = C_t − B_t R_{t+1} B_t′    (11)

m_t^T = m_t + B_t(m_{t+1}^T − a_{t+1}) and C_t^T = C_t − B_t²(R_{t+1} − C_{t+1}^T)    (12)

and B_t = C_t G_{t+1}′ R_{t+1}^{−1}. See West and Harrison (1997, Ch. 4) for additional details.
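For concreteness, recursions (6)–(8) can be coded directly. The paper supplies R code for its examples; the sketch below is our own Python transcription for the local level special case (F_t = G_t = 1), with illustrative function and argument names:

```python
import numpy as np

def kalman_filter_local_level(y, sigma2, tau2, m0=0.0, C0=10.0):
    """Kalman recursions (6)-(8) for the local level model, where
    F_t = G_t = 1. Returns filtered means m_t and variances C_t."""
    m, C = m0, C0
    ms, Cs = [], []
    for yt in y:
        a = m               # a_t = G_t m_{t-1}
        R = C + tau2        # R_t = G_t C_{t-1} G_t' + tau^2
        f = a               # f_t = F_t' a_t
        Q = R + sigma2      # Q_t = F_t' R_t F_t + sigma^2
        A = R / Q           # Kalman gain A_t = R_t F_t Q_t^{-1}
        e = yt - f          # prediction error e_t = y_t - f_t
        m = a + A * e       # m_t = a_t + A_t e_t
        C = R - A * Q * A   # C_t = R_t - A_t Q_t A_t'
        ms.append(m)
        Cs.append(C)
    return np.array(ms), np.array(Cs)
```

Note that in this special case C_t = R_t σ²/Q_t = A_t σ², which is the form used later in Example 1.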
An important and rich subclass of the NDLM assumes that F_t and G_t are both known, while σ_t² = σ² and τ_t² = τ² are both unknown variances. In this case, the above Kalman recursions can be used to marginalize out the states based on equation (4), i.e.,
p(y^T | σ², τ²) = ∏_{t=1}^T f(y_t; f_t, Q_t)    (13)

where f(x; μ, σ²) is the density of a normal random variable with mean μ and variance σ² evaluated at x. Note that here f_t and Q_t are both nonlinear functions of (σ², τ²). In other words, should the main objective be sampling from p(x^T, σ², τ² | y^T), then draws can be obtained in two steps:

1. Draw (σ², τ²) from p(σ², τ² | y^T), which is proportional to the prior p(σ², τ²) times the likelihood from equation (13).


2. Draw x^T from p(x^T | σ², τ², y^T) by first computing forward moments via equations (6)–(8) and (11), and then sampling backward x_t conditional on x_{t+1} and y^t via equation (9).

Sampling (σ², τ²) in step 1 can be performed by SIR, acceptance–rejection or Metropolis–Hastings-type algorithms, or replaced by a Gibbs step that draws (σ², τ²) conditional on (y^T, x^T). Reis et al. (2006) compare the performance of these and other MCMC schemes in the context of the local level model. Step 2 is known as the forward filtering, backward sampling (FFBS) algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
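A compact FFBS sketch for the local level model (again our own Python illustration, not the paper's R code; the backward pass follows equations (9) and (11) with G_t = 1):

```python
import numpy as np

rng = np.random.default_rng(0)

def ffbs_local_level(y, sigma2, tau2, m0=0.0, C0=10.0):
    """Forward filtering, backward sampling (FFBS) for the local level
    model: a forward Kalman pass stores (a_t, R_t, m_t, C_t); the
    backward pass samples x_T, ..., x_1 via equation (9)."""
    T = len(y)
    m, C = m0, C0
    a_, R_, m_, C_ = [], [], [], []
    for yt in y:                                 # forward filtering
        a, R = m, C + tau2
        Q = R + sigma2
        A = R / Q
        m = a + A * (yt - a)
        C = R - A * A * Q
        a_.append(a); R_.append(R); m_.append(m); C_.append(C)
    x = np.empty(T)
    x[-1] = rng.normal(m_[-1], np.sqrt(C_[-1]))  # draw x_T | y^T
    for t in range(T - 2, -1, -1):               # backward sampling
        B = C_[t] / R_[t + 1]                    # B_t = C_t G' R_{t+1}^{-1}
        h = m_[t] + B * (x[t + 1] - a_[t + 1])
        H = C_[t] - B * B * R_[t + 1]
        x[t] = rng.normal(h, np.sqrt(H))
    return x
```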
For the NDLM, all densities needed for making inference are well known and posterior computations can easily be carried out in applications. On the other hand, difficulties arise when the model is nonlinear or non-Gaussian, because no closed-form densities are available. As we discuss below, particle filters provide an effective approach to overcoming the difficulty.

BASIC PARTICLE FILTERS

Let us consider a more general dynamic model where the assumptions of normality and/or linearity
are relaxed. The observation and state transition equations become

y_t | x_t ~ p(y_t | x_t),

x_t | x_{t−1} ~ p(x_t | x_{t−1}), t = 1, 2, . . .

Denote the initial probability density of the state by p(x_0). All static parameters, such as σ² and τ² from the previous section, are assumed to be known throughout this section. Batch and sequential parameter learning are deferred to the next section. The Kalman recursions from equations (3) and (5) are now replaced, respectively, by

p(x_t | y^{t−1}) = ∫ p(x_t | x_{t−1}) p(x_{t−1} | y^{t−1}) dx_{t−1}    (14)

and

p(x_t | y^t) = p(y_t | x_t) p(x_t | y^{t−1}) / p(y_t | y^{t−1})    (15)

In most real-world applications outside the realm of the NDLM, the integration with respect to x_{t−1} in (14) and the implementation of Bayes' theorem in (15) are both analytically intractable and/or
computationally costly. As mentioned in the Introduction, there exist approximations for sequential
state estimation and filtering based on Kalman-like filters, such as the extended Kalman filter and
the unscented Kalman filter. Also, as discussed in the Introduction, there exist several MCMC-type
samplers for batch estimation of the whole state vector and parameters similar to the FFBS intro-
duced in the previous section.
Particle filters, loosely speaking, combine the sequential estimation nature of Kalman-like filters
with the modeling flexibility of MCMC samplers, while avoiding some of their shortcomings.
On the one hand, like MCMC samplers and unlike Kalman-like filters, particle filters are designed
to allow for more flexible observational and evolutional dynamics and distributions. On the other


hand, like Kalman-like filters and unlike MCMC samplers, particle filters provide online filtering
and smoothing distributions of states and parameters.
The goal of most particle filters is to draw a set of i.i.d. particles {x_t^(i)}_{i=1}^N that approximates p(x_t | y^t) by starting with a set of i.i.d. particles {x_{t−1}^(i)}_{i=1}^N that approximates p(x_{t−1} | y^{t−1}).

The most popular filters are the bootstrap filter (BF), also known as the sequential importance sam-
pling with resampling (SISR) filter, proposed by Gordon et al. (1993), and the auxiliary particle filter
(APF), also known as the auxiliary SIR (ASIR) filter, proposed by Pitt and Shephard (1999). However,
it is worth mentioning that one of the earliest sequential Monte Carlo algorithms was proposed by West
(1992). For recent discussion regarding the similarities and differences between BF and APF see, for
instance, Carvalho et al. (2010), Doucet and Johansen (2008), and Douc et al. (2009b).

Propagate–resample filters
The BF of Gordon et al. (1993) is based on sequential SIR steps over time (Smith and Gelfand,
1992). The Kalman recursions from (14) and (15) are combined in

p(x_t, x_{t−1} | y_t, y^{t−1}) ∝ p(y_t | x_t) · p(x_t | x_{t−1}) p(x_{t−1} | y^{t−1})    (16)
                                [2. Resample]   [1. Propagate]

In other words, the BF first propagates particles from the posterior at time t − 1 in order to generate
particles from the prior at time t. Then it resamples the propagated particles with weights proportional
to their likelihoods. This is Algorithm 1 below, whose recursions are illustrated in Figure 1.

Resample–propagate filters
Similarly, the APF first resamples particles from the posterior at time t − 1 with weights taking into
account the next observed data point, yt. Then it propagates the resampled particles. The identity
from equation (16) is rewritten as

[Figure 1. Bootstrap filter. A schematic representation of the bootstrap filter over two time periods. The squares are y_{t+1} and y_{t+2}. From top to bottom, the first, second, fourth and fifth sets of dots represent particles, while the third and sixth sets represent particle weights.]


p(x_t, x_{t−1} | y_t, y^{t−1}) ∝ p(x_t | x_{t−1}, y_t) · p(y_t | x_{t−1}) p(x_{t−1} | y^{t−1})    (17)
                                [2. Propagate]      [1. Resample]

The main difficulty with the APF is that, in most applications, neither is p(y_t | x_{t−1}) available for pointwise evaluation (resampling) nor is p(x_t | x_{t−1}, y_t) available for sampling (propagation). The APF is fully adapted when these conditions are satisfied. The main suggestion in Pitt and Shephard (1999) for general state-space models is as follows:
(a) use p(y_t | g(x_{t−1})), i.e. the data density p(y_t | x_t) evaluated at g(x_{t−1}) (usually the expected value, median or mode of the state transition density p(x_t | x_{t−1})), as the proposal weight to resample the old particles x_{t−1}; and
(b) use q(x_t | x_{t−1}, y_t) ≡ p(x_t | x_{t−1}) as the proposal density to propagate resampled particles to the new set of particles {x_t^(i)}_{i=1}^N. Note that here q(·) is blind since it does not incorporate the current observation y_t. See below for further details on better ways of choosing q(·).
Since both resampled and propagated particles come from proposal densities, it follows directly from a simple SIR argument that these particles have weights given by

w_t ∝ [p(y_t | x_t) p(x_t | x_{t−1}) p(x_{t−1} | y^{t−1})] / [p(y_t | g(x_{t−1})) p(x_t | x_{t−1}) p(x_{t−1} | y^{t−1})] = p(y_t | x_t) / p(y_t | g(x_{t−1}))    (18)

This leads to Algorithm 2 below.

Algorithm 1: Bootstrap filter (BF)

1. Propagate {x_{t−1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via p(x_t | x_{t−1}).
2. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x̃_t^(i)).

Algorithm 2: Auxiliary particle filter (APF)

1. Resample {x̃_{t−1}^(i)}_{i=1}^N from {x_{t−1}^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | g(x_{t−1}^(i))).
2. Propagate {x̃_{t−1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via p(x_t | x̃_{t−1}^(i)).
3. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x̃_t^(i)) / p(y_t | g(x̃_{t−1}^(i))).

Algorithms 3 and 4 below are the optimal and fully adapted versions of BF and APF when p(y_t | x_{t−1}) is analytically tractable and p(x_t | x_{t−1}, y_t) is easy to sample from.

Algorithm 3: Optimal bootstrap filter (OBF)

1. Propagate {x_{t−1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via p(x_t | x_{t−1}, y_t).
2. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x_{t−1}^(i)).

Algorithm 4: Optimal auxiliary particle filter (OAPF)

1. Resample {x̃_{t−1}^(i)}_{i=1}^N from {x_{t−1}^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x_{t−1}^(i)).
2. Propagate {x̃_{t−1}^(i)}_{i=1}^N to {x_t^(i)}_{i=1}^N via p(x_t | x̃_{t−1}^(i), y_t).
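As a worked illustration of Algorithm 1 (a Python sketch with our own naming; the paper's code is in R), here is the bootstrap filter for the local level model of the previous section:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_filter(y, sigma2, tau2, N=1000, m0=0.0, C0=10.0):
    """Bootstrap filter (Algorithm 1) for the local level model:
    blind propagation via p(x_t | x_{t-1}) followed by resampling
    with weights proportional to p(y_t | x_t)."""
    x = rng.normal(m0, np.sqrt(C0), N)             # particles from p(x_0)
    means = []
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(tau2), N)  # 1. propagate
        logw = -0.5 * (yt - x) ** 2 / sigma2       # log p(y_t | x_t), up to a constant
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(np.sum(w * x))                # E(x_t | y^t), computed pre-resampling
        x = x[rng.choice(N, N, p=w)]               # 2. resample
    return np.array(means)
```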

Choosing the proposal


Improvements on the basic particle filter algorithm include the use of better proposal distributions
in the importance sampling stage (Pitt and Shephard, 1999), the use of Markov chain Monte Carlo


sampling (Gilks and Berzuini, 2001; Fearnhead, 2002; Fearnhead and Clifford, 2003) and the use
of resampling (Liu and Chen, 1995; Carpenter et al., 1999), which can be important to avoid having
only a small number of particles with non-negligible weight.
Pitt and Shephard (1999) suggest local linearization of the observation equation (via an extended
Kalman filter-type approximation) in order to construct a proposal propagation density, say q(xt|xt−1,
yt), for the OAPF propagation density p(xt|xt−1, yt), which takes into account the current observation
yt and, potentially, outperforms the naïve blind propagation proposal density p(xt|xt−1). See Liu and
Chen (1995), Carpenter et al. (1999), Gilks and Berzuini (2001), Doucet et al. (2000), Fearnhead
(2002) and Guo et al. (2005), amongst others, for additional discussion on the choice of q(xt|xt−1, yt).
See Chen and Lai (2007) for an interesting application of online identification and adaptive control
of autoregressive models with exogenous inputs (ARX models) with Markov parameter jumps. More
efficient proposal densities can be obtained in the presence of conditional linearity and/or normality.
In other words, when the split of the state vector xt into x1t and x2t leads to, say, x1t|x2t being an
NDLM, then part of the sequential learning algorithm can be performed exactly by analytically
integrating out x1t. Such filters are commonly referred to as the Rao–Blackwellized particle filter or
mixture Kalman filter (Chen and Liu, 2000; Andrieu and Doucet, 2002).

Resampling or not?
It has been argued that the resampling step in the BF and the second resampling step in the APF
should only be performed when particle degeneracy is signaled. For instance, Kong et al. (1994)
introduced the effective sample size, N_eff, which they estimate by

N̂_eff = 1 / Σ_{i=1}^N (w_t^(i))²    (19)

When resampling is skipped, the particle set that approximates p(x_t | y^t) is represented by the weighted particles {(x̃_t, w_t)^(i)}_{i=1}^N, using the notation from Algorithms 1 and 2 above.
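Computing N̂_eff from equation (19) is a one-liner; a Python sketch (our naming):

```python
import numpy as np

def effective_sample_size(w):
    """Effective sample size (19): one over the sum of squared
    normalized weights. Equals N for uniform weights and 1 when a
    single particle carries all the weight."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)
```

A common rule of thumb is to resample only when N̂_eff falls below some threshold such as N/2.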

Reducing MC error
Regardless of whether resampling is performed at each time period or not, when the goal is to produce summary statistics based on the posterior p(x_t | y^t) (for instance, mean, variance, quantiles), it is more efficient (an estimator with lower variance) to perform the computation prior to resampling. For instance, it is more efficient to estimate E(x_t | y^t) by Σ_{i=1}^N w_t^(i) x̃_t^(i) / Σ_{j=1}^N w_t^(j) than by Σ_{i=1}^N x_t^(i)/N.
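The point can be seen in a few lines of Python (a synthetic illustration with made-up weights): both estimators below target the same posterior mean, but the pre-resampling weighted average avoids the extra multinomial noise of the resampling step.

```python
import numpy as np

rng = np.random.default_rng(2)

N = 1000
x = rng.normal(size=N)                  # propagated particles x_tilde
w = np.exp(-0.5 * (1.0 - x) ** 2)       # unnormalized likelihood weights
w /= w.sum()

pre = np.sum(w * x)                     # weighted estimate, before resampling
post = x[rng.choice(N, N, p=w)].mean()  # estimate after resampling
# both approximate E(x_t | y^t); `pre` has smaller Monte Carlo variance
```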

Example 1: Local level model. In this example, we use the local level model to compare the performance of the four particle filter algorithms discussed above: the BF, APF, OBF and OAPF. As mentioned in the second section ('Normal dynamic linear model' above), the local level model is y_t | x_t ~ N(x_t, σ²) and x_t | x_{t−1} ~ N(x_{t−1}, τ²). For this simple linear model, the traditional Kalman filter is available to produce the 'optimal' estimate of the filtered state vector. We use this estimate in evaluating the performance of the particle filters.
Based on results of the second section, it is easy to see that (i) x_t | y^t ~ N(m_t, A_t σ²), where m_t = (1 − A_t)m_{t−1} + A_t y_t and A_t = (A_{t−1}σ² + τ²)/(A_{t−1}σ² + τ² + σ²); (ii) y_t | x_{t−1} ~ N(x_{t−1}, σ² + τ²); and (iii) x_t | x_{t−1}, y_t ~ N(ω²(x_{t−1}/τ² + y_t/σ²), ω²), where ω^{−2} = σ^{−2} + τ^{−2}. Thus the four particle filter algorithms are easy to implement. To compare the filters, we employ the criterion of mean square error (MSE), computed using R runs of each particle filter f in {BF, OBF, APF, OAPF}, across M time series of length T. Specifically, MSE_f = Σ_{t,m,r}(x̂_{ftmr} − x̃_{tm})²/(TMR), where x̃_{tm} is obtained via the


standard Kalman filter recursions (equations (3)–(8)) for the mth dataset up to time t, and x̂_{ftmr} = Σ_{i=1}^N x_{ftmr}^(i)/N is the particle approximation for x̃_{tm} based on N particles {x_{ftmr}^(i)}_{i=1}^N.
The relative MSE, with the bootstrap filter as benchmark, is defined as RMSE_f = MSE_f/MSE_BF for f in {OBF, APF, OAPF}. Results are summarized in Figure 2. From the plots, OAPF outperforms OBF for all four values of τ², and OBF fares better than BF for all four values of τ². Also, it seems that BF performs much better than APF when the signal-to-noise ratio, τ/σ, is greater than one.
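Using the closed-form quantities (ii) and (iii) above, the fully adapted OAPF of Algorithm 4 is straightforward to implement; a Python sketch (our naming; the paper's examples use R):

```python
import numpy as np

rng = np.random.default_rng(3)

def oapf_local_level(y, sigma2, tau2, N=1000, m0=0.0, C0=10.0):
    """Fully adapted APF (Algorithm 4) for the local level model:
    resample with weights p(y_t | x_{t-1}) = N(x_{t-1}, sigma^2 + tau^2),
    then propagate from p(x_t | x_{t-1}, y_t) = N(omega^2 (x_{t-1}/tau^2
    + y_t/sigma^2), omega^2)."""
    omega2 = 1.0 / (1.0 / sigma2 + 1.0 / tau2)
    x = rng.normal(m0, np.sqrt(C0), N)
    means = []
    for yt in y:
        logw = -0.5 * (yt - x) ** 2 / (sigma2 + tau2)  # 1. resample
        w = np.exp(logw - logw.max())
        w /= w.sum()
        x = x[rng.choice(N, N, p=w)]
        mu = omega2 * (x / tau2 + yt / sigma2)         # 2. propagate
        x = rng.normal(mu, np.sqrt(omega2))
        means.append(x.mean())
    return np.array(means)
```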

Review papers
Since Gordon et al. (1993), several review papers have helped organize the sub-area of sequential Monte Carlo. Here we list a small subset of these papers. The choice is rather subjective and based on our limited and biased views of the field. A few of the early reviews are the papers by Doucet et al. (2000) and Arulampalam et al. (2002), the books by Liu (2001), Doucet et al. (2001a) and Ristic et al. (2004), and the 2002 special issue of IEEE Transactions on Signal Processing on sequential Monte Carlo methods. See also the review by Chen (2003).
More recent studies, along with this paper, are Cappé et al. (2007), Doucet and Johansen (2010) and Prado and West (2010, Ch. 6). They carefully organize and highlight the fast development of the field over the last decade, such as parameter learning, more efficient particle smoothers, particle filters for high-dimensional dynamic systems and, perhaps most recently, the interconnections between MCMC and SMC methods.

PARAMETER LEARNING

Consider again the general dynamic model. We now address explicitly the unknown vector of static
parameters θ of the model:

y_t | x_t, θ ~ p(y_t | x_t, θ)    (20)

x_t | x_{t−1}, θ ~ p(x_t | x_{t−1}, θ)    (21)

for t = 1, . . . , T and initial probability density p(x0|θ) and prior p(θ). There are primarily two ways
to tackle the problem of learning θ: batch sampling and online sampling.

Batch sampling
The solution involves obtaining an approximation, say p_N(y^T | θ), to the joint likelihood p(y^T | θ). For the NDLM of the second section ('Normal dynamic linear model' above), the predictive density was obtained analytically from equation (13). The approximation p_N(y^T | θ) can be obtained by any of the previous filters as
p_N(y^T | θ) = ∏_{t=1}^T p_N(y_t | y^{t−1}, θ) = (1/N^T) ∏_{t=1}^T Σ_{i=1}^N p(y_t | x_t^(i), θ)    (22)

where x_t^(i) ~ p(x_t | x_{t−1}^(i), θ), for i = 1, . . . , N. Therefore, the components of θ can be sampled iteratively
via a standard MCMC sampler, such as a Metropolis–Hastings algorithm, or via an SIR step. Two of the major drawbacks of this solution are that (i) SMC loses its appealing sequential nature and (ii) the overall MCMC or SIR scheme can be highly sensitive to the approximation p_N(y^T | θ). See, for instance, Chopin (2002) and Del Moral et al. (2006) for more theoretical justifications and further


[Figure 2. Comparison between BF, OBF, APF and OAPF via relative mean square error (RMSE). Panels are labeled by the number of particles N. The local level model is used, where y_t | x_t ~ N(x_t, σ²) and x_t | x_{t−1} ~ N(x_{t−1}, τ²), for t = 1, . . . , T, x_0 ~ N(m_0, C_0), σ = 1, τ = 0.22, 0.71, 1.0 or 1.41, m_0 = 0 and C_0 = 10. The starting value is x_0 = 0. RMSE is based on M = 10 time series of length T and R = 10 runs of each particle filter per time series. Top row: sample size T = 100; bottom row: T = 1000.]


details, Doucet and Tadic (2003), Andrieu et al. (2004), Poyiadjis et al. (2005), Andrieu et al. (2005)
and Olsson et al. (2008) for expectation-maximization-like schemes, and Fernández-Villaverde and
Rubio-Ramírez (2005, 2007) and DeJong et al. (2009) for applications in dynamic stochastic general
equilibrium macroeconomic models.
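A sketch of the approximation in equation (22), accumulating the log-likelihood while running a bootstrap filter on the local level model (Python, our own transcription):

```python
import numpy as np

rng = np.random.default_rng(4)

def log_lik_bf(y, sigma2, tau2, N=1000, m0=0.0, C0=10.0):
    """Particle approximation (22) to log p(y^T | theta) for the local
    level model: at each t, average p(y_t | x_t^(i)) over particles
    propagated blindly from p(x_t | x_{t-1}, theta)."""
    x = rng.normal(m0, np.sqrt(C0), N)
    loglik = 0.0
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(tau2), N)        # blind propagation
        logp = -0.5 * np.log(2 * np.pi * sigma2) - 0.5 * (yt - x) ** 2 / sigma2
        c = logp.max()
        loglik += c + np.log(np.mean(np.exp(logp - c)))  # log (1/N) sum_i p(y_t | x_t^(i))
        w = np.exp(logp - c)
        w /= w.sum()
        x = x[rng.choice(N, N, p=w)]                     # resample before next step
    return loglik
```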

Particle filters and MCMC


Before introducing particle filters that learn about parameters in a sequential manner, we should
mention that hybrid schemes that combine particle methods and MCMC methods are abundant. Gilks
and Berzuini (2001) and Polson et al. (2008), for instance, use MCMC steps to sample and replenish
static parameters in dynamic systems. Andrieu et al. (2010) introduce particle MCMC methods, which use SMC to construct efficient proposal distributions in high dimensions.

Online sampling
The solution here is to produce sequential MC approximations to p(x_t, θ | y^t), and sometimes to p(x_{t−1}, x_t, θ | y^t) and/or other low-dimensional functions (small compared to t) of (x_t, θ) conditional on y^t. Simply resampling θ over time is bound to fail since, in general, after a few time steps the particle set will contain only one distinct particle. Gordon et al. (1993) suggest incorporating artificial evolution noise for θ when tackling the problem of sequentially learning the static parameters of a state-space model. Since parameters are not states, adding noise artificially will eventually distort and compromise the validity of the approximated posterior distributions. Their approach imposes a loss of information over time, as artificial uncertainties added to the parameters eventually result in a very diffuse posterior density for θ. In what follows, we introduce three well-established filters for sequentially learning both x_t and θ: (i) the Liu and West filter; (ii) the Storvik filter; and (iii) the particle learning (PL) filter of Carvalho et al. (2010) and Lopes et al. (2010).

Liu and West filter


Liu and West (2001) combine (i) the APF of Pitt and Shephard (1999), (ii) a kernel smoothing
approximation to p(θ|yt−1) via a mixture of multivariate normals, and (iii) a neat shrinkage idea to
incorporate artificial evolution for θ without the associated loss of information; see West (1993a,b).
More specifically, let the set of i.i.d. particles {x_{t−1}^(i), θ_{t−1}^(i)}_{i=1}^N approximate p(x_{t−1}, θ | y^{t−1}), such that p(θ | y^{t−1}) can be approximated by

p_N(θ | y^{t−1}) = (1/N) Σ_{j=1}^N f(θ; m^(j), V)    (23)

where m^(j) = a θ_{t−1}^(j) + (1 − a)θ̄, θ̄ = Σ_{j=1}^N θ_{t−1}^(j)/N, V = h² Σ_{j=1}^N (θ_{t−1}^(j) − θ̄)(θ_{t−1}^(j) − θ̄)′/N and h² = 1 − a². The subscript t of θ_t is used only to indicate that samples are from p(θ | y^t). The APF of Pitt and Shephard (1999) of equation (17) can now be written for the state vector (x_t, θ_t) as

p(x_t, x_{t−1}, θ_t, θ_{t−1} | y_t, y^{t−1}) = p(y_t | x_{t−1}, θ_{t−1}) p(x_{t−1} | θ_{t−1}, y^{t−1}) p(θ_{t−1} | y^{t−1})   [1. Resample]
                                           × p(x_t | x_{t−1}, θ_t, y_t) p(θ_t | θ_{t−1}, y^t)   [2. Propagate]    (24)

In general, and similar to the APF of the third section, p(y_t | x_{t−1}, θ) is not available for pointwise evaluation and/or p(x_t | x_{t−1}, θ_t, y_t) is not easy to sample from. Liu and West resample old particles with weights proportional to p(y_t | g(x_{t−1}), m(θ_{t−1})), where g(·) and m(·) are described above. Then they propagate θ_t from the proposal density p(θ_t | θ_{t−1}) and propagate x_t conditional on θ_t from the evolution density q(x_t | x_{t−1}, θ_t, y_t) ≡ p(x_t | x_{t−1}, θ_t). The propagated particles (x_t, θ_t) have associated weights

w_t ∝ p(y_t | x_t, θ_t) / p(y_t | g(x_{t−1}), m(θ_{t−1}))

which leads to Algorithm 5 below.
The performance of the LW filter depends on the choice of tuning parameter a, which drives both
the shrinkage and the smoothness of the normal approximation. It is common practice to set a around
0.98 or higher. The components of θ can either be transformed to accommodate the approximate local normality, or the multivariate normal approximation can be replaced by a composition of, say, conditionally normal densities for location parameters and inverse-gamma densities for scale/variance parameters. See, for example, Petris et al. (2009, pp. 222–228) for an example based on
the local level model and Carvalho and Lopes (2007) for an application to Markov switching
stochastic volatility models.

Example 2: Stochastic volatility model. In its simplest form, asset returns y_t are modeled as conditionally independent normal random variables with log-variance x_t following a first-order autoregressive model, i.e. y_t | x_t ~ N(0, exp{x_t}) and x_t | x_{t−1} ~ N(α + βx_{t−1}, τ²); see Jacquier et al. (1994) and Kim et al. (1998). One possible version of the LW filter assumes, for example, that θ = (α, β, log τ²) and g(x_{t−1}) = α + βx_{t−1}.

Algorithm 5: Liu and West filter (LWF)

1. Resample {(x̃_{t−1}, θ̃_{t−1})^(i)}_{i=1}^N from {(x_{t−1}, θ_{t−1})^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | g(x_{t−1}^(i)), m^(i)), where m^(i) is defined in equation (23).
2. Propagate
   (a) {θ̃_{t−1}^(i)}_{i=1}^N to {θ̂_t^(i)}_{i=1}^N via N(m̃^(i), V), then
   (b) {x̃_{t−1}^(i)}_{i=1}^N to {x̂_t^(i)}_{i=1}^N via p(x_t | x̃_{t−1}^(i), θ̂_t^(i)).
3. Resample {(x_t, θ_t)^(i)}_{i=1}^N from {(x̂_t, θ̂_t)^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x̂_t^(i), θ̂_t^(i)) / p(y_t | g(x̃_{t−1}^(i)), m̃^(i)).
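A Python sketch of Algorithm 5 for the stochastic volatility model of Example 2 (our own transcription with illustrative priors; the paper's code is in R):

```python
import numpy as np

rng = np.random.default_rng(5)

def liu_west_sv(y, N=2000, a=0.98):
    """Liu and West filter (Algorithm 5) for y_t ~ N(0, exp(x_t)),
    x_t ~ N(alpha + beta x_{t-1}, tau^2), theta = (alpha, beta, log tau^2).
    The priors below are illustrative, not the paper's."""
    h2 = 1.0 - a ** 2
    theta = np.column_stack([
        rng.normal(0.0, 0.3, N),                 # alpha
        rng.normal(0.9, 0.05, N),                # beta
        np.log(rng.uniform(0.01, 0.1, N)),       # log tau^2
    ])
    x = rng.normal(0.0, 1.0, N)

    def loglik(yt, logvar):                      # log N(y_t; 0, exp(logvar)), up to a constant
        return -0.5 * (logvar + yt ** 2 * np.exp(-logvar))

    for yt in y:
        tbar = theta.mean(axis=0)
        m = a * theta + (1.0 - a) * tbar         # shrunk kernel locations m^(j)
        V = h2 * np.cov(theta.T)
        g = m[:, 0] + m[:, 1] * x                # g(x_{t-1}) under m^(j)
        logw = loglik(yt, g)                     # 1. resample
        w = np.exp(logw - logw.max()); w /= w.sum()
        idx = rng.choice(N, N, p=w)
        x, m, g = x[idx], m[idx], g[idx]
        theta = m + rng.multivariate_normal(np.zeros(3), V, N)        # 2(a) propagate theta
        mu = theta[:, 0] + theta[:, 1] * x
        x = mu + np.exp(0.5 * theta[:, 2]) * rng.normal(0.0, 1.0, N)  # 2(b) propagate x
        logw = loglik(yt, x) - loglik(yt, g)     # 3. resample
        w = np.exp(logw - logw.max()); w /= w.sum()
        idx = rng.choice(N, N, p=w)
        x, theta = x[idx], theta[idx]
    return x, theta
```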
Storvik filter
For the class of state-space models where p(θ | x^t, y^t) can be rewritten as p(θ | s_t), where s_t is a low-dimensional set of conditionally sufficient statistics that can be recursively computed via s_t = S(s_{t−1}, x_t, y_t), Storvik (2002) (see also Fearnhead, 2002) proposed Algorithm 6 below. This algorithm can be thought of as an extension of the bootstrap filter with the additional steps of sequentially updating the sufficient statistics and sampling θ.

Algorithm 6: Storvik filter (SF)

1. Propagate {x_{t−1}^(i)}_{i=1}^N to {x̃_t^(i)}_{i=1}^N via q(x_t | x_{t−1}, θ, y_t).
2. Resample {(x_t, s_{t−1})^(i)}_{i=1}^N from {(x̃_t, s_{t−1})^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t | x̃_t^(i), θ) p(x̃_t^(i) | x_{t−1}^(i), θ) / q(x̃_t^(i) | x_{t−1}^(i), θ, y_t).
3. Compute sufficient statistics: s_t = S(s_{t−1}, x_t, y_t).
4. Sample θ from p(θ | s_t).

Example 2 (continued). The stochastic volatility model admits recursive sufficient statistics for θ = (α, β, τ²) when p(θ) is conditionally conjugate normal-inverse gamma. More precisely, when (α, β | τ²) ~ N(b_0, τ²B_0) and τ² ~ IG(c_0, d_0), it is easy to see that (α, β | τ², x^t) ~ N(b_t, τ²B_t) and (τ² | x^t) ~ IG(c_t, d_t), where B_t^{−1} = B_{t−1}^{−1} + z_t z_t′, B_t^{−1} b_t = B_{t−1}^{−1} b_{t−1} + x_t z_t, c_t = c_{t−1} + 1/2, d_t = d_{t−1} + (x_t − b_t′ z_t)x_t/2 + (b_{t−1} − b_t)′ B_{t−1}^{−1} b_{t−1}/2 and z_t′ = (1, x_{t−1}). The recursive sufficient statistics are functions of x_{t−1}, x_{t−1}², x_{t−1} x_t and x_t².
All simulated exercises in Storvik (2002) are based on a blind propagation rule, i.e. q(xt |xt−1, θ,
yt) above is equal to p(xt|xt−1, θ). In this case, resampling is performed with weights wt ∝ p(yt |xt, θ).
Like any other PF with blind propagation, such as the bootstrap filter, this filter is bound to suffer
from particle degeneracy, which in turn directly compromises sequential parameter estimation.
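For concreteness, Algorithm 6 with blind propagation can be sketched as follows for an even simpler model than Example 2, a local level model yt ~ N(xt, σ2), xt ~ N(xt−1, τ2) in which only τ2 is unknown, so that st = (ct, dt). Function names, settings and the simulated data are ours, not the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)

def storvik_filter(y, sigma2=1.0, c0=2.0, d0=1.0, N=2000):
    """Storvik filter with blind propagation q(x_t|x_{t-1}, theta, y_t) =
    p(x_t|x_{t-1}, theta), so the resampling weights reduce to p(y_t|x_t).
    Sufficient statistics: c_t = c_{t-1} + 1/2, d_t = d_{t-1} + (x_t - x_{t-1})^2/2."""
    x = rng.normal(0.0, 1.0, N)                # particles for x_0
    c = np.full(N, c0)
    d = np.full(N, d0)
    tau2 = d / rng.gamma(c)                    # tau2 ~ IG(c, d)
    means = []
    for yt in y:
        x_prev = x
        x_prop = x_prev + np.sqrt(tau2) * rng.normal(size=N)   # 1. blind propagation
        logw = -0.5 * (yt - x_prop) ** 2 / sigma2              # 2. weights p(y_t|x_t)
        w = np.exp(logw - logw.max())
        idx = rng.choice(N, N, p=w / w.sum())
        x, x_prev, c, d = x_prop[idx], x_prev[idx], c[idx], d[idx]
        c = c + 0.5                                            # 3. update suff. stats
        d = d + 0.5 * (x - x_prev) ** 2
        tau2 = d / rng.gamma(c)                                # 4. sample tau2 | s_t
        means.append(x.mean())
    return np.array(means), tau2

y = np.cumsum(rng.normal(0.0, 0.5, 100)) + rng.normal(0.0, 1.0, 100)
m, tau2 = storvik_filter(y)
```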

Particle learning
Carvalho et al. (2010) present methods for sequential filtering, particle learning (PL) and smoothing
for rather general state-space models. They extend Chen and Liu’s (2000) mixture Kalman filter
(MKF) methods by allowing parameter learning and utilize a resample–propagate algorithm together
with a particle set that includes state-sufficient statistics. Recall the simulated exercise from the third
section that empirically shows that resample–propagate filters tend to outperform propagate–
resample ones. They also show via several simulation studies that PL outperforms both the LW
and Storvik filters and is comparable to MCMC samplers, even when full adaptation is considered.
The advantage is even more pronounced for large values of T.
Let st and sxt denote the parameter- and state-sufficient statistics satisfying deterministic updating rules st = S(st−1, xt, yt), as in the Storvik filter from the previous subsection, and sxt = K(sxt−1, θ, yt), for K(·) mimicking the Kalman filter recursions (see Example 3 below). Then PL can be described as follows.

Algorithm 7: Particle learning (PL)


1. Resample (θ̃, s̃xt−1, s̃t−1) from (θ, sxt−1, st−1) with weights wt ∝ p(yt|sxt−1, θ).
2. Sample xt from p(xt|s̃xt−1, θ̃, yt).
3. Update parameter-sufficient statistics: st = S(s̃t−1, xt, yt).
4. Sample θ from p(θ|st).
5. Update state-sufficient statistics: sxt = K(s̃xt−1, θ, yt).

In many cases S will also be a function of xt−1 and possibly other lags of the state variable, such
as in the stochastic volatility model of Example 2. In these cases, the above algorithm is slightly
changed and particles for such lagged values are also carried over time. Therefore, step 2 is modified
to sample (xt−1, xt) from p(xt−1, xt|sxt−1, θ, yt), which implies sampling xt−1 from p(xt−1|sxt−1, θ, yt) and xt from p(xt|xt−1, θ, yt).
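A minimal Python sketch of Algorithm 7, with the lagged-state modification just described, for a local level model yt ~ N(xt, σ2), xt ~ N(xt−1, τ2) with σ2 known and τ2 ~ IG(c0, d0), might look as follows; here sxt = (mt, Ct), st = (ct, dt), and all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_learning(y, sigma2=1.0, c0=2.0, d0=1.0, N=2000):
    """PL for the local level model: each particle carries (m, C) = s^x,
    (c, d) = s for tau2 ~ IG, and a draw of tau2."""
    m = np.zeros(N)
    C = np.ones(N)
    c = np.full(N, c0)
    d = np.full(N, d0)
    tau2 = d / rng.gamma(c)
    means = []
    for yt in y:
        # 1. resample with predictive weights p(y_t | s^x_{t-1}, theta)
        Q = C + tau2 + sigma2
        logw = -0.5 * (yt - m) ** 2 / Q - 0.5 * np.log(Q)
        w = np.exp(logw - logw.max())
        idx = rng.choice(N, N, p=w / w.sum())
        m, C, c, d, tau2 = m[idx], C[idx], c[idx], d[idx], tau2[idx]
        # 2. sample x_{t-1} | y_t, then x_t | x_{t-1}, y_t (both Gaussian)
        V = 1.0 / (1.0 / C + 1.0 / (tau2 + sigma2))
        x_prev = rng.normal(V * (m / C + yt / (tau2 + sigma2)), np.sqrt(V))
        Cb = 1.0 / (1.0 / tau2 + 1.0 / sigma2)
        x = rng.normal(Cb * (x_prev / tau2 + yt / sigma2), np.sqrt(Cb))
        # 3-4. update parameter-sufficient statistics and sample tau2 | s_t
        c = c + 0.5
        d = d + 0.5 * (x - x_prev) ** 2
        tau2 = d / rng.gamma(c)
        # 5. update state-sufficient statistics (Kalman recursion)
        R = C + tau2
        C = R * sigma2 / (R + sigma2)
        m = C * (m / R + yt / sigma2)
        means.append(x.mean())
    return np.array(means), tau2

y = np.cumsum(rng.normal(0.0, 0.7, 100)) + rng.normal(0.0, 1.0, 100)
m, tau2 = particle_learning(y)
```

Note the resample–propagate order: particles are resampled with the predictive weights p(yt|sxt−1, θ) before any state is drawn, which is precisely what distinguishes PL from the Storvik filter.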

Example 3: Conditional NDLM. Carvalho et al. (2010) derive the PL scheme for the class of
conditional NDLM defined by the observation and evolution equations that assume the form of a
linear system (see the NDLM equations (1) and (2)) conditional on an auxiliary state λt:

yt = F′λt xt + vt,   vt ~ N(0, σ²λt)   (25)

xt = Gλt xt−1 + wt,   wt ~ N(0, τ²λt)   (26)

with the quadruple {Fλt, Gλt, σ²λt, τ²λt} being a function of the static parameter vector θ. The marginal
distributions of the observation error and state shock distributions are any combination of normal, scale
mixture of normals, or discrete mixture of normals depending on the specification of the


distribution of the auxiliary state variable p(λt+1|θ) (Chen and Liu, 2000). Extensions to hidden Markov
specifications where λt+1 evolves according to p(λt+1|λt, θ) are discussed in Carvalho et al. (2010). In
the case where the auxiliary state variable λt is discrete, such as in stochastic volatility with jumps
models (Markovian or not), the state xt−1 can be analytically integrated out, in addition to xt and λt, at
the initial resampling step, i.e. p(yt|(λt−1, sxt−1, θ)(i)) = Σλt p(yt|λt, (sxt−1, θ)(i)) p(λt|(λt−1, θ)(i)), where the conditional sufficient statistics for states (sxt) and parameters (st) satisfy the deterministic updating rules st = S(st−1, xt, λt, yt) and sxt = K(sxt−1, θ, λt, yt), where S(·) denotes, as defined previously, the recursive update of the parameter-sufficient statistics and K(·) denotes the Kalman filter recursions of the conditional NDLM given in equations (3)–(8). The algorithm can be summarized as follows:

1. resample (λ̃t−1, θ̃, s̃xt−1, s̃t−1) from (λt−1, θ, sxt−1, st−1) with weights wt ∝ p(yt|λt−1, sxt−1, θ);
2. sample λt from p(λt|λ̃t−1, θ̃, yt);
3. sample xt from p(xt|λt, s̃xt−1, θ̃, yt);
4. compute st = S(st−1, xt, λt, yt);
5. sample θ from p(θ|st); and
6. compute sxt = K(λt, sxt−1, θ, yt).

In the case where the auxiliary state variable λt is continuous, the authors extend the above scheme
by adding to the current particle set a propagated particle λt+1 ~ p(λt+1 |(λt, θ)(i)).

PL in time series models


Successful implementations of PL (and hybrid versions of PL) have appeared over the last couple
of years. Rios and Lopes (2009), for example, propose a hybrid LW–Storvik filter for the Markov
switching stochastic volatility model that outperforms the Carvalho and Lopes (2007) filter. Lund
and Lopes (2009) sequentially estimate a regime-switching macro-finance model for the postwar US
term structure of interest rates, while Prado and Lopes (2009) adapt PL to study state-space autore-
gressive models with structured priors. Chen et al. (2009) propose a hybrid PL–LW sequential MC
algorithm that fully estimates nonlinear, non-normal dynamic stochastic general equilibrium
models, with a particular application in a neoclassical growth model. Additionally, Dukić et al.
(2009) use PL to track influenza epidemics using Google trends data, while Lopes and Polson (2010)
use PL to estimate volatility and examine volatility dynamics for financial time series, such as the
S&P500 and the NDX100 indices, during the early part of the credit crisis.

Sequential Bayesian computation via PL


Lopes et al. (2010) develop a simulation-based approach to sequential Bayesian computation for
both dynamic and non-dynamic systems. They show through various important applications that PL
provides a simple yet powerful framework for efficient sequential posterior sampling strategies. For
example, Carvalho et al. (2009b) adapt PL to a rich and general class of mixture models that include
finite mixture models and Dirichlet process mixture models, as well as for the less common settings
of latent feature selection through an Indian Buffet process and dependent distribution tracking
through a probit stick-breaking model. Taddy et al. (2010) show that PL is the best alternative to
perform online posterior filtering of tree states in dynamic regression tree models, while Gramacy
and Polson (2010) use PL for online updating of Gaussian process models for regression and clas-
sification. Shi and Dunson (2009) adopt PL for stochastic variable selection and model search in
linear regression and probit models, while Mukherjee and West (2009) focus on model comparison
for applications in cellular dynamics in systems biology.


Example 4: Comparison between LW, Storvik and PL. We compare the performance of these
particle filters using the local level model of Example 1 with parameter learning, where the random
walk system equation is replaced by a first-order autoregression. More precisely, yt|xt, θ ~ N (xt, σ2)
and xt|xt−1, θ ~ N(α + βxt−1, τ2), where x0 ~ N(m0, C0) and θ = (α, β, τ2, σ2). The prior distribution of θ is p(θ) = p(σ2)p(τ2)p(α, β|τ2), where σ2 ~ IG(n0/2, n0σ²0/2), τ2 ~ IG(ν0/2, ν0τ²0/2) and (α, β) ~ N(b0, τ2B0). The recursive sufficient statistics for θ can easily be derived. It can be shown that (α, β)|(τ2, xt) ~ N(bt, τ2Bt) and τ2|xt ~ IG(ν1/2, ν1τ²1/2), where ν1 = ν0 + t, Bt⁻¹ = B0⁻¹ + Z′tZt, Bt⁻¹bt = B0⁻¹b0 + Z′tzt and ν1τ²1 = ν0τ²0 + (zt − Ztbt)′zt + (b0 − bt)′B0⁻¹b0, for zt = (x1, . . . , xt)′, Zt = (1t, Z2t), Z2t = (x0, . . . , xt−1)′ and 1t a t-dimensional vector of ones. The quantities (νt, νtτ²t, bt, Bt) can be rewritten recursively as functions of (νt−1, νt−1τ²t−1, bt−1, Bt−1), xt−1, xt and yt. A time series of length T = 200 was simulated using θ = (0.0, 0.9, 0.5, 1.0) and x0 = 0. The prior hyperparameters are m0 = 0, C0 = 10, b0 = (0.0, 0.9)′, B0 = I2, n0 = ν0 = 10, τ²0 = 0.5 and σ²0 = 1.0, leading to relatively uninformative prior information. The performance of the filters is assessed by running each algorithm R = 100 times with N = 1000 particles. A very long PL run (N = 100,000) serves as the benchmark for comparison. Let q(γ, α, t) be the 100αth percentile of p(γ|yt), where γ is an element of θ. We define the root mean squared error as the square root of MSE(γ, α, f, t) = Σ_{r=1}^{R}[q(γ, α, t) − qfr(γ, α, t)]²/R for filter f ∈ {LW, Storvik, PL} and replications r = 1, . . . , R. Finally, full adaptation is implemented for all three filters. In other words, LW differs from PL only through the sequential estimation of θ; Storvik differs from PL only to the extent that Storvik propagates first and then resamples, while PL resamples first and then propagates. Results are summarized in Figures 3 and 4. Both the Storvik filter and PL are significantly better than the LW filter, while PL is moderately better than Storvik, particularly when estimating the pair (τ2, σ2).

Smoothing
In addition to delivering sequential filtering for parameters and states, particle filters can also be
implemented effectively when the main goal is smoothing the states conditional on the whole vector
of observations yT. In this sense, particle smoothers are alternatives to MCMC in state-space models
(Kitagawa, 1996). Godsill et al. (2004) introduced an algorithm that relies on (i) forward particle
sampling and (ii) backward particle resampling. Carvalho et al. (2010) extend the algorithm to
accommodate sequential learning of the parameter vector θ. In this more general case, it can be
shown that

p(x1, . . . , xT, θ|yT) = { ∏_{t=1}^{T−1} p(xt|xt+1, θ, yt) } p(xT, θ|yT)   (27)

whose components, by Bayes’ rule and conditional independence, are

p(xt|xt+1, θ, yt) ∝ p(xt+1|xt, θ) p(xt|θ, yt)   (28)

This leads to a backward sampling algorithm that resamples forward particles xt from p(xt|θ, yt) with weights proportional to p(xt+1|xt, θ). More precisely, for each particle i, for i = 1, . . . , N, start with (x̃T, θ̃)(i) = (xT, θ)(i), i.e. a draw from p(xT, θ|yT). Then, for t = T − 1, . . . , 1, sample x̃t(i) from {xt(j)}Nj=1 with weights πt(j) ∝ p(x̃t+1(i)|xt(j), θ̃(i)). In the end, (x̃1, . . . , x̃T)(i), for i = 1, . . . , N, are draws from p(x1, . . . , xT|yT). Note that the algorithm is O(TN2), so the computational time to obtain draws from p(x1, . . . , xT|yT) is expected to be much larger than that to obtain draws from p(xt|yt) via standard SMC filters for t = 1, . . . , T. See Briers et al. (2010) for an alternative O(TN2) SMC smoother for


Figure 3. Comparison between LWF, SF and PL. Percentiles of p(θ|yt) (2.5th, 50th and 97.5th) based on 100
replications of each particle filter with N = 1000 particles (gray lines). Black lines are based on PL and N =
100,000. Liu and West filter (left column), Storvik’s filter (center column) and particle learning (right column).
The row (from top to bottom) represents the components of θ = (α, β, τ2, σ2)


Figure 4. Comparison between LWF, SF and PL. Root mean squared error of R = 100 replications for each filter. All filters are based on
N = 1000 particles and the root mean squared error is computed against a long PL run (N = 100,000)

the case where θ is known. An O(TN) smoothing algorithm has recently been introduced by
Fearnhead et al. (2008b) also for the case where θ is known. See also Douc et al. (2009a) for
additional FFBS particle approximations.
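For the case where θ is known, the backward pass can be sketched as follows; `particles[t]` is a stand-in for stored draws from p(xt|yt) produced by any forward filter, and the transition p(xt+1|xt, θ) is taken to be N(xt, τ2) purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

def backward_smoother(particles, tau2):
    """Draw one smoothed trajectory from p(x_1, ..., x_T | y^T) given the
    T x N array of filtered particles, for x_t ~ N(x_{t-1}, tau2)."""
    T, N = particles.shape
    traj = np.empty(T)
    traj[-1] = rng.choice(particles[-1])            # x_T ~ p(x_T | y^T)
    for t in range(T - 2, -1, -1):
        # weights pi_t(j) proportional to p(x~_{t+1} | x_t^(j), theta)
        logw = -0.5 * (traj[t + 1] - particles[t]) ** 2 / tau2
        w = np.exp(logw - logw.max())
        traj[t] = rng.choice(particles[t], p=w / w.sum())
    return traj

particles = rng.normal(size=(50, 500))   # toy stand-in for stored filtered particles
xs = backward_smoother(particles, tau2=0.5)
```

Repeating the backward pass once per forward particle yields the N smoothed trajectories at the O(TN2) cost mentioned above.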

Example 4 (continued): comparison between PL and MCMC. In the case of pure filtering, i.e.
when the parameter vector θ = (α, β, τ2, σ2) is known and fixed, it is easy to see that both filtered
and smoothed distributions, p(xt |yt, θ) and p(xt |yT, θ), are available in closed form (see equations (5)
and (10), respectively). For example, Figure 5 shows that the results of particle filtering and a smoothing


Figure 5. Particle smoothing. (a) Comparing the true filtered and smoothed distributions, p(xt |yt) and p(xt |yT),
respectively, with approximations based on N = 2000 particles from the OAPF. (b) Comparing the MCMC and
PL approximations to the filtered and smoothed distributions, p(xt |yt) and p(xt |yT). MCMC is based on
M = 2000 draws, after M0 = 10,000 as burn-in, while PL is based on N = 2000 particles


approximation based on the OAPF (N = 2000 particles) virtually match the true distributions. Figure
6 shows that both MCMC approximation (M = 2000 draws, after M0 = 10,000 as burn-in) and PL
approximation (N = 2000 particles) to p(α|yT), p(β|yT), p(τ2|yT) and p(σ2|yT) are virtually identical.
Computational cost is measured here in CPU seconds with N = 1000 and M0 = M = 1000 points.
FFBS is about one order of magnitude faster than PL for smoothing (34 s versus 233 s), but PL is
approximately three orders of magnitude faster than FFBS for filtering (2 s versus 3500 s).

Sequential model assessment


One of the direct benefits of particle filters is the simple approximation of marginal likelihoods and
Bayes factors. These tasks are usually rather involved under MCMC approximations, where comput-
ing marginal likelihoods is essentially an independent task in the MCMC paraphernalia. See Kass
and Raftery (1995), Han and Carlin (2000), Lopes and West (2004), Gamerman and Lopes (2006,
Ch. 7) and references therein for additional details on several MCMC-based algorithms for Bayesian
model assessment.
Even in the simple case of the NDLM in the second section above and equation (13), computing
p(y1, . . . , yT) is a non-trivial task. In this case, the simplest Monte Carlo solution is

pN(yT) = (1/N) Σ_{i=1}^{N} p(yT|(σ2, τ2)(i)) ≈ p(yT) = ∫ p(yT|σ2, τ2) p(σ2, τ2) dσ2 dτ2   (29)

where {(σ2, τ2)(i)}Ni=1 is a random sample from the prior p(σ2, τ2). Despite its simplicity, this approximation is very unstable when the prior and the likelihood for (σ2, τ2) are moderately separated.
Moreover, the MC approximation deteriorates quickly for more general state-space models where
the dimension of the parameter space is likely to be greater than two. The sequential Monte Carlo
solution to this problem is rather straightforward, with equation (13) being approximated by

pN(yT) = ∏_{t=1}^{T} { (1/N) Σ_{i=1}^{N} p(yt|(xt−1, σ2, τ2)(i)) }   (30)

where {(xt−1, σ2, τ2)(i)}Ni=1 is the particle approximation to p(xt−1, σ2, τ2|yt−1) obtained from the LW filter, Storvik's filter or PL. See the stochastic volatility model with Student-t errors below for the sequential computation of posterior model probabilities.
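The recursion in equation (30) amounts to accumulating the one-step-ahead log predictive densities log p(yt|yt−1). A minimal sketch, with (σ2, τ2) held fixed for each of two competing local level models rather than carried in the particle set (an assumption made only to keep the code short), is:

```python
import numpy as np

rng = np.random.default_rng(3)

def seq_log_marglik(y, sigma2, tau2, N=5000):
    """Bootstrap-filter estimate of the running log p(y^t) for the local
    level model y_t ~ N(x_t, sigma2), x_t ~ N(x_{t-1}, tau2)."""
    x = rng.normal(0.0, 1.0, N)
    path = []
    total = 0.0
    for yt in y:
        x = x + np.sqrt(tau2) * rng.normal(size=N)                # propagate
        like = np.exp(-0.5 * (yt - x) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        total += np.log(like.mean())                              # log p(y_t | y^{t-1})
        path.append(total)
        x = x[rng.choice(N, N, p=like / like.sum())]              # resample
    return np.array(path)

y = np.cumsum(rng.normal(0.0, 0.5, 80)) + rng.normal(0.0, 1.0, 80)
lp1 = seq_log_marglik(y, sigma2=1.0, tau2=0.25)
lp2 = seq_log_marglik(y, sigma2=1.0, tau2=4.0)
log_bayes_factor = lp1 - lp2   # sequential log Bayes factor, model 1 vs model 2
```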
There are contributions that explicitly deal with parameter, state and model uncertainties all
together via SMC methods. Fearnhead (2004), MacEachern et al. (1999) and Carvalho et al. (2009b)
use particle methods for general mixtures.

APPLICATIONS

In this section we apply particle filters to four time series problems that are of common interest in
many scientific areas. The first application revisits the stochastic volatility model of Example 2. The
second application is concerned with the Markov switching stochastic volatility model of Carvalho
and Lopes (2007). The last two applications illustrate the use of particle filters in modeling realized
volatilities and in estimating unemployment rates via a dynamic generalized linear model.


Figure 6. Parameter learning. MCMC (left column) and PL (right column) approximations to p(θ|yT). Rows
are the components of θ, i.e. α, β, τ2 and σ2. MCMC is based on M = 2000 draws, after M0 = 10,000 as burn-in,
while PL is based on N = 2000 particles


Dynamic beta regression


Da Silva et al. (2009) use a dynamic beta regression to analyze (via MCMC) the Brazilian monthly
unemployment rate from March 2002 to December 2009. More precisely, they model the unemploy-
ment rate at time t, namely yt, by

yt|μt, φ ~ beta(φμt, φ(1 − μt))   (beta model)

μt⁻¹ = 1 + exp{−βt}   (link function)

βt|βt−1, W ~ N(βt−1, W)   (transition)
for t = 1, . . . , T, β0 ~ N(m0, C0), φ ~ IG(a0, b0) and W ~ IG(c0, d0). The dynamic beta regression is
a special case of the dynamic generalized linear model (DGLM) of West et al. (1985), where the
observational distribution belongs to the exponential family. The data were downloaded from the
Brazilian Institute for Geography and Statistics (IBGE) site.1
We illustrate the computation of sequential Bayes factors via particle filters by comparing the beta
regression model to a simple local level model, i.e. yt |μt, σ2 ~ N(μt, σ2) and μt|μt−1, τ2 ~ N(μt−1, τ2),
where μ0 ~ N(m0, C0), σ2 ~ IG(a0, b0) and τ2 ~ IG(c0, d0) with given hyperparameters. The prior
hyperparameters were set at m0 = 0.1, C0 = 100, a0 = 2.1, b0 = (a0 + 1) × 0.00001, c0 = 2.1 and d0 = (c0 + 1) × 0.00001 for the local level model, and at m0 = log(y1/(1 − y1)), C0 = 0.1, a0 = 2.1, b0 = (a0 + 1) × 15,000, c0 = 2.1 and d0 = (c0 + 1) × 0.05 for the dynamic beta model. More general dynamics, such
as seasonality, could easily be included in both models with only slight modifications to the models
and particle filters. We ignore the seasonality here for simplicity.
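A minimal bootstrap-filter sketch of the dynamic beta regression, with (φ, W) held fixed rather than learned sequentially (so it is far simpler than the LWF actually used for this model), may help fix ideas; the parameter values and simulated data below are illustrative:

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(4)

def beta_logpdf(y, a, b):
    """log beta(a, b) density at scalar y, vectorized over arrays a, b."""
    lB = np.array([lgamma(ai) + lgamma(bi) - lgamma(ai + bi) for ai, bi in zip(a, b)])
    return (a - 1) * np.log(y) + (b - 1) * np.log(1 - y) - lB

def bootstrap_beta(y, phi=200.0, W=0.01, N=2000):
    """Bootstrap filter: beta_t ~ N(beta_{t-1}, W), mu_t = 1/(1+exp(-beta_t)),
    y_t ~ beta(phi*mu_t, phi*(1-mu_t)).  Returns the filtered medians of mu_t."""
    b = rng.normal(np.log(y[0] / (1 - y[0])), 0.3, N)    # particles for beta_0
    mu_med = []
    for yt in y:
        b = b + np.sqrt(W) * rng.normal(size=N)           # transition
        mu = 1.0 / (1.0 + np.exp(-b))                     # link function
        logw = beta_logpdf(yt, phi * mu, phi * (1 - mu))  # beta observation density
        w = np.exp(logw - logw.max())
        b = b[rng.choice(N, N, p=w / w.sum())]            # resample
        mu_med.append(np.median(1.0 / (1.0 + np.exp(-b))))
    return np.array(mu_med)

y = np.clip(0.10 + 0.01 * np.sin(np.arange(60) / 6) + rng.normal(0.0, 0.004, 60), 0.01, 0.99)
mu_hat = bootstrap_beta(y)
```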
Sequential inference for the local level model was performed by PL whereas that for the dynamic beta
regression was performed by the LWF. Results appear in Figure 7, while a Monte Carlo study is presented
in Figures 8 and 9. The estimations of μt under both models are relatively similar, with the sequential
Bayes factor slightly favoring the dynamic beta regression model. The Monte Carlo error associated with
the estimation of parameters and Bayes factors is relatively small. See Carvalho et al. (2009a) for more
details about PL for dynamic generalized linear models and dynamic discrete-choice models.

Stochastic volatility model with Student-t innovations


We revisit the simple SV model with normal innovations of Example 2 and compute sequential
Bayes factors against the alternative SV model with Student-t innovations.2 We use monthly log
returns of GE stock from January 1926 to December 1999 for 888 observations. This series was
analyzed in Example 12.6 of Tsay (2005, Ch. 12).3 The competing models are

Observation equation: yt|(xt, θ) ~ tη(0, exp{xt})

System equation: xt|(xt−1, θ) ~ N(α + βxt−1, τ2)

where tη(μ, σ2) denotes the Student-t distribution with η degrees of freedom, location μ and scale σ2. The number of degrees of freedom η is treated as known. Sequential posterior inference is based on the Liu and West filter with N = 100,000 particles. The shrinkage constant is set at a = 0.95, whereas the prior hyperparameters are m0 = 0, C0 = 10, ν0 = 3, τ²0 = 0.01, b0 = (0, 1)′ and B0 = 10I2.
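Sequential posterior model probabilities over a grid of degrees of freedom can be approximated by accumulating, for each candidate η, the log predictive densities from a particle filter. The sketch below uses a plain bootstrap filter with (α, β, τ2) held fixed, and illustrative values rather than estimates from the GE data:

```python
import numpy as np
from math import lgamma, log, pi

rng = np.random.default_rng(5)

def t_logpdf(y, df, scale2):
    """log density of t_df(0, scale2), vectorized over the scale2 array."""
    c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return c - 0.5 * np.log(scale2) - (df + 1) / 2 * np.log1p(y ** 2 / (df * scale2))

def sv_logmarglik(y, df, alpha=0.0, beta=0.95, tau2=0.05, N=2000):
    """Bootstrap-filter estimate of log p(y^T | eta = df) for the SV model
    y_t ~ t_df(0, exp(x_t)), x_t ~ N(alpha + beta*x_{t-1}, tau2)."""
    x = rng.normal(0.0, 1.0, N)
    total = 0.0
    for yt in y:
        x = alpha + beta * x + np.sqrt(tau2) * rng.normal(size=N)   # propagate x_t
        logw = t_logpdf(yt, df, np.exp(x))                          # p(y_t | x_t, eta)
        mx = logw.max()
        w = np.exp(logw - mx)
        total += mx + np.log(w.mean())                              # log p(y_t | y^{t-1})
        x = x[rng.choice(N, N, p=w / w.sum())]
    return total

y = rng.standard_t(5, 300) * 0.5
lls = np.array([sv_logmarglik(y, df) for df in (2.0, 5.0, 30.0)])
pmp = np.exp(lls - lls.max())
pmp = pmp / pmp.sum()    # posterior model probabilities under a uniform prior on eta
```

Normalizing the exponentiated running log marginal likelihoods at each t in the same way produces sequential posterior model probabilities of the kind shown in Figure 10.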
1. [Link]
2. For additional early use of APF in SV models see Chib et al. (2002, 2006) and Omori et al. (2007).
3. The data are available at [Link]


Figure 7. Dynamic beta regression. (a) Sequential Monte Carlo (SMC) approximations for the median
and 95% credibility interval based on N = 10,000 particles for both models. (b) Sequential Bayes factor.
(c, d) Sequential parameter learning for σ and τ from the local level model. (e, f) Sequential parameter learning
for (1 + ø)−1/2 and W1/2 from the dynamic beta model

Figure 8. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each one based on
N = 10,000 particles. Top row: σ and τ from the local level model. Bottom row: (1 + ø)−1/2 and W1/2 from the
dynamic beta model

Particle approximation to the sequential posterior model probabilities, assuming a uniform prior for
η over models {t∞, t2, . . . , t20}, appears in Figure 10, where t∞ denotes the normal distribution.
Figure 11(d) shows percentiles of p(σt|yt) when integrating out over all competing models in {t∞, t2, . . . , t20}. One can argue that the data slowly move over time from a more t-like, heavy-tail model


Figure 9. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each one based on
N = 10,000 particles

towards a more Gaussian, thin-tail model. Figures 11 and 12 present posterior summaries for the
volatilities and parameters of a few competing models.

Markov switching stochastic volatility model


Jumps have been intensively studied in financial data analysis; see, for example, Eraker et al. (2003).
So et al. (1998) suggest a model that allows for occasional discrete shifts in the parameter determin-
ing the level of the log-volatility through a Markovian process. They claim that this model not only
is a better way to explain volatility persistence but is also a tool to capture changes in economic
forces, as well as abrupt changes due to unusual market forces. Carvalho and Lopes (2007) adopt
the LWF when sequentially estimating univariate financial time series with an MSSV structure. Let
us call this filter the CL filter. For illustration, we consider an MSSV model with two regimes, i.e.
yt|xt, θ ~ N(0, exp(xt)),

xt|xt−1, st, θ ~ N(αst + βxt−1, τ2)
where Pr(st = j|st−1 = i) = pij, for i, j = 1, 2, and parameters θ = (α1, α2, β, τ2, p11, p22). Rios and Lopes
(2009) propose an extension of the CL filter, which they named the extended LW (ELW) filter,
which combines features of the LW filter and PL. The simulation exercise from Figure 13 shows
that the CL filter degenerates after 500 observations, whereas the ELW filter never collapses.

Realized volatility
We consider the intradaily realized volatilities of Alcoa stock from 2 January 2003 to 7 May 2004
for 340 observations. The daily realized volatilities used are the sums of squares of intraday 5 min,
10 min and 20 min log returns measured in percentages; see Tsay (2005, Ch. 11). In what follows,


Figure 10. Stochastic volatility model. Sequential posterior model probability for the number of degrees of
freedom η

we use the logarithms of the daily realized volatilities. Figure 14 presents the time series of log
realized volatilities. As expected, all three series behave similarly over time and are highly positively
correlated, with the 5 and 10 min and 10 and 20 min realized volatilities more correlated than the 5
and 20 min ones. Table I shows summary statistics of the time series.

Two competing models


We entertain two models: (i) the three RV time series are modeled by independent univariate local
level models; and (ii) the trivariate vector of RV time series is modeled by a multivariate local level


Figure 11. Stochastic volatility model. (a) GE returns. (b, c) 2.5th, 50th and 97.5th percentiles of p(σt|yt, M), where σ²t = exp{xt}, for M = t12 and M = t18, respectively. (d) 2.5th, 50th and 97.5th percentiles of p(σt|yt) by integrating out over all competing models in {normal, t2, . . . , t20}

model. In the first model, say model M1, the i-minute log realized volatility yit, for i = 5, 10, 20, is initially modeled by a local level model: (yit|xit, σ²i) ~ N(xit, σ²i) and (xit|xi,t−1, τ²i) ~ N(xi,t−1, τ²i). In the second model, say model M2, the univariate local level model is extended to jointly model the p = 3 time series of realized volatilities: (yt|xt, Σ) ~ N(1pxt, Σ) and (xt|xt−1, τ2) ~ N(xt−1, τ2), where 1p is a p-dimensional vector of ones. The diagonal elements of the covariance matrix Σ are σ²i, for i = 1, . . . , p, and the off-diagonal elements are σij, for i < j = 1, . . . , p.

Parameter learning
The variances σ²i and τ²i under M1 are, a priori, independent with σ²i ~ IG(a0, b0), τ²i ~ IG(c0, d0) and hyperparameters a0 = c0 = 10, b0 = 1.1 and d0 = 0.55 common across i = 5, 10, 20. In this case the prior mean and mode of σ²i are 0.122 and 0.1, respectively, while its prior 95% credibility interval is (0.064, 0.229). Similarly, the prior mean and mode of τ²i are 0.061 and 0.05, respectively, while

Figure 12. Stochastic volatility model. Column 1: marginal prior distributions for α, β and τ2. Columns 2–4: sequential 2.5th, 50th and 97.5th percentiles of p(γ|yt, M), for γ ∈ {α, β, τ2} and model M ∈ {normal, t12, t18}

Figure 13. Markov switching stochastic volatility. Carvalho and Lopes’ (2007) filter (last two rows) and Rios and Lopes’ (2009) ELW filter (first
two rows). 2.5th, 50th and 97.5th percentiles of the marginal distribution of each parameter based on N = 5000 particles. The dotted lines are the
true values α1 = −2.5, α2 = −1.0, β = 0.5, τ2 = 1.0, p11 = 0.99 and p22 = 0.985


Figure 14. Realized volatility. Log realized volatility of Alcoa stock based on the sum of squares of intraday
5 min, 10 min and 20 min log returns measured as a percentage


Table I. Summary statistics. Correlations (below main diagonal) and covariances (main diagonal and above)

RV        Mean    Median  Skewness  Kurtosis   Correlations/covariances
                                               5 min    10 min   20 min
5 min     0.992   0.977   1.091     5.479      0.314    0.270    0.258
10 min    0.913   0.871   0.153     0.769      0.857    0.317    0.307
20 min    0.850   0.847   0.034     0.843      0.732    0.865    0.396

its 95% credibility interval is (0.032, 0.115). Under model M2, τ2 ~ IG(c0, d0) and Σ ~ IW(ν0, S0).4 When p = 1, σ2 ~ IG(ν0/2, S0/2), so we set ν0 = 2a0 = 20 and S0 = 2b0Ip = 2.2Ip for comparison with the univariate models. The prior mean and mode of Σ are 0.138Ip and 0.092Ip, respectively. The parameter θ = (τ2, Σ) can be sampled from p(θ|st) = pIG(τ2; ct, dt) pIW(Σ; νt, St), where the recursive sufficient statistics are ct = ct−1 + 1/2, dt = dt−1 + (xt − xt−1)²/2, νt = νt−1 + 1 and St = St−1 + (yt − 1pxt)(yt − 1pxt)′.
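These two conjugate updates can be written down directly. The sketch below uses the sum-of-outer-products construction of the Wishart (valid for integer ν) and illustrative inputs; names are ours:

```python
import numpy as np

rng = np.random.default_rng(8)

def sample_theta(c, d, nu, S):
    """Draw theta = (tau2, Sigma) from IG(c, d) x IW(nu, S), where
    Sigma ~ IW(nu, S) means Sigma^{-1} ~ W(nu, S) with mean nu*S^{-1}."""
    tau2 = d / rng.gamma(c)                            # tau2 ~ IG(c, d)
    L = np.linalg.cholesky(np.linalg.inv(S))
    Z = L @ rng.normal(size=(S.shape[0], int(nu)))     # columns ~ N(0, S^{-1})
    Sigma = np.linalg.inv(Z @ Z.T)                     # Sigma^{-1} ~ W(nu, S^{-1} scale)
    return tau2, Sigma

def update_suffstats(c, d, nu, S, x_prev, x, y, one):
    # c_t = c_{t-1} + 1/2, d_t = d_{t-1} + (x_t - x_{t-1})^2/2,
    # nu_t = nu_{t-1} + 1, S_t = S_{t-1} + (y_t - 1_p x_t)(y_t - 1_p x_t)'
    e = y - one * x
    return c + 0.5, d + 0.5 * (x - x_prev) ** 2, nu + 1, S + np.outer(e, e)

p = 3
one = np.ones(p)
c, d, nu, S = 10.0, 0.55, 20, 2.2 * np.eye(p)
tau2, Sigma = sample_theta(c, d, nu, S)
c, d, nu, S = update_suffstats(c, d, nu, S, 0.9, 1.0, np.array([1.1, 0.9, 1.0]), one)
```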

State learning
Let us start by assuming that (xt−1|yt−1, θ) ~ N(mt−1, Ct−1), with x0 ~ N(m0, C0), θ = (Σ, τ2) and sxt−1 = (mt−1, Ct−1). PL starts by resampling the particles {(θ, sxt−1, st−1)(i)}Ni=1 with weights p(yt|sxt−1, θ) = pN(yt; 1pmt−1, Qt), where Qt = RtDp + Σ, Dp = 1p1′p and Rt = Ct−1 + τ2. The state-sufficient statistics sxt−1 are then propagated to sxt = (mt, Ct), where mt = (1 − At1p)mt−1 + Atyt, Ct = Rt − AtQtA′t and At = Rt1′pQt⁻¹. Since both xt−1 and xt are used in dt when sampling τ2 from IG(ct, dt), (xt−1, xt) need to be sampled from p(xt−1, xt|yt, θ) = pN(xt−1; g(yt), Vt) pN(xt; h(yt, xt−1), C̄t), where Vt⁻¹ = Ct−1⁻¹ + 1′pW⁻¹1p, g(yt) = Vt(Ct−1⁻¹mt−1 + 1′pW⁻¹yt), C̄t⁻¹ = τ⁻² + 1′pΣ⁻¹1p and h(yt, xt−1) = C̄t(τ⁻²xt−1 + 1′pΣ⁻¹yt), for W = τ2Dp + Σ. Finally, marginal posterior inference for xt given (yt, θ) is more efficient if drawn from p(xt|yt, sxt−1, θ) = pN(xt; m̃t, C̃t), where C̃t⁻¹ = Rt⁻¹ + 1′pΣ⁻¹1p and m̃t = C̃t(Rt⁻¹mt−1 + 1′pΣ⁻¹yt).

Results
Figures 15–17 summarize the sequential learning of parameters and states for the univariate local
level model based on N = 10,000 particles, which is fairly large considering the sample size T =
340. Figures 18–21 summarize the results for the multivariate local model, also based on N = 10,000
particles. Figure 22 compares the sequential posterior medians for the latent states from the three
individual fits of model M1 against the multivariate fit of model M2. Note the shrinkage effect of
model M2, which provides a smoother point estimate for the latent state.

CONCLUDING REMARKS

In this paper we review particle filters, which are also known as sequential Monte Carlo (SMC)
methods. We argue that, after almost two decades since the seminal paper of Gordon et al. (1993),
SMC methods now belong in the toolbox of researchers and practitioners in many areas of modern

4 Σ is inverse-Wishart with parameters ν_0 and S_0 and density p(Σ; ν_0, S_0) ∝ |Σ|^{−(ν_0+p+1)/2} exp{−0.5 tr(S_0 Σ^{−1})},
for ν_0 > p − 1, S_0 > 0 (positive definite) and tr Σ = σ²_1 + . . . + σ²_p. The mean and the mode of Σ are
S_0/(ν_0 − p − 1) and S_0/(ν_0 + p + 1), respectively. Its inverse Σ^{−1} is Wishart with the same parameters,
denoted Σ^{−1} ~ W(ν_0, S_0). The mean and the mode of Σ^{−1} are ν_0 S_0^{−1} and (ν_0 − p − 1) S_0^{−1} (ν_0 ≥ p + 1), respectively.
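The quoted moments are easy to verify numerically. The check below uses `scipy.stats.invwishart` (an assumed dependency, not part of the paper) with the prior settings ν_0 = 20 and S_0 = 2.2 I_3 from the text.

```python
import numpy as np
from scipy.stats import invwishart

p, nu0 = 3, 20
S0 = 2.2 * np.eye(p)

# Analytical prior mean: S0/(nu0 - p - 1) = 2.2/16 I = 0.1375 I (0.138 in the text);
# the mode is S0/(nu0 + p + 1) = 2.2/24 I, approximately 0.0917 I (0.092 in the text).
mean_formula = S0 / (nu0 - p - 1)
mode_formula = S0 / (nu0 + p + 1)

# Monte Carlo confirmation of the mean
draws = invwishart.rvs(df=nu0, scale=S0, size=4000, random_state=0)
mc_mean = draws.mean(axis=0)   # close to 0.1375 * I_3
```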

Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011)
DOI: 10.1002/for
Particle Filters and Bayesian Inference 197

Figure 15. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(σ²_i|y_i^t), p(τ²_i|y_i^t) and p(x_it|y_i^t), where y_i is
the i-minute log realized volatility, for i = 5, 10, 20


Figure 16. Realized volatility. Histogram approximations to p(σ²_i|y^T) and p(τ²_i|y^T), for i = 5, 10, 20


Figure 17. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(x_it|y_i^t), where y_i is the i-minute log realized
volatility, for i = 5, 10, 20


Figure 18. Realized volatility. Sequential 2.5th, 50th and 97.5th percentiles for the unique components of Σ
and for the correlations

science, ranging from signal processing and target tracking to robotics, bioinformatics and financial
econometrics. This paper focuses on the latter, with five demonstrations.
Besides the references on PF in financial econometrics cited above, additional ones include
Johannes et al. (2008) on predictive regressions and optimal portfolio allocation; Raggi and
Bordignon (2006), Jasra et al. (2008), Li et al. (2008), Creal (2008) and Li (2009) on Lévy-type
SV models; and Johannes et al. (2009) on extracting latent jump diffusions from asset prices.
See also Fearnhead and Meligkotsidou (2004) and Fearnhead et al. (2008a) for particle filters in


Figure 19. Realized volatility. (a) Sequential 2.5th, 50th and 97.5th percentiles of p(τ²|y^t). (b) Histogram
approximation to p(τ²|y^T). (c) Sequential 2.5th, 50th and 97.5th percentiles of p(x_t|y^t)


Figure 20. Realized volatility. Approximated posterior distributions of τ²_i and σ²_i. Univariate local level models
(grey lines) and multivariate local level model (black lines), for i = 5, 10, 20 (solid, dashed and dotted lines)

partially observed continuous-time models and diffusions. PFs for jump Markov systems are studied
in Doucet et al. (2001b) and Andrieu et al. (2003).

ACKNOWLEDGEMENTS

We would like to express our sincere appreciation of the achievements of Professor Rudolf E.
Kalman. His studies have led to many new developments in scientific computing, statistical infer-
ence, and applications. Particle filters are one more example that will have a long-lasting impact on
our profession.


Figure 21. Realized volatility. Sequential learning of correlations. Sequential 2.5th, 50th and 97.5th percentiles
of p(ρ_ij|y^t) (left column). Histogram approximation to p(ρ_ij|y^T) (right column)


Figure 22. Realized volatility. Sequential 50th percentiles of p(x_it|y^t, M1) for i = 5, 10, 20 and p(x_t|y^t, M2),
where M1 is the univariate local level model and M2 its multivariate version. The top and bottom panels show
the first and second halves of the sample


REFERENCES

Alspach DL, Sorenson HW. 1972. Nonlinear Bayesian estimation using Gaussian sum approximation. IEEE
Transactions on Automatic Control 17: 439–448.
Andrieu C, Doucet A. 2002. Particle filtering for partially observed Gaussian state space models. Journal of the
Royal Statistical Society, Series B 64: 827–836.
Andrieu C, Davy M, Doucet A. 2003. Efficient particle filtering for jump Markov systems: application to time-
varying autoregressions. IEEE Transactions on Signal Processing 51: 1762–1770.
Andrieu C, Doucet A, Singh SS, Tadić VB. 2004. Particle methods for change detection, system identification,
and control. Proceedings of the IEEE 92: 423–438.
Andrieu C, Doucet A, Tadić VB. 2005. On-line parameter estimation in general state-space models. In Proceed-
ings of the 44th Conference on Decision and Control; 332–337.
Andrieu C, Doucet A, Holenstein R. 2010. Particle Markov chain Monte Carlo (with discussion). Journal of the
Royal Statistical Society, Series B 72: 269–342.
Arulampalam MS, Maskell S, Gordon N, Clapp T. 2002. A tutorial on particle filters for on-line nonlinear/non-
Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50: 174–188.
Briers M, Doucet A, Maskell S. 2010. Smoothing algorithms for state-space models. Annals of the Institute of
Statistical Mathematics 62: 61–89.
Cappé O, Godsill S, Moulines E. 2007. An overview of existing methods and recent advances in sequential Monte
Carlo. Proceedings of the IEEE 95: 899–924.
Carlin BP, Polson NG, Stoffer DS. 1992. A Monte Carlo approach to nonnormal and nonlinear state-space model-
ing. Journal of the American Statistical Association 87: 493–500.
Carpenter J, Clifford P, Fearnhead P. 1999. An improved particle filter for non-linear problems. IEE Proceedings
on Radar, Sonar and Navigation 146: 2–7.
Carter CK, Kohn R. 1994. On Gibbs sampling for state space models. Biometrika 81: 541–553.
Carvalho CM, Lopes HF. 2007. Simulation-based sequential analysis of Markov switching stochastic volatility
models. Computational Statistics and Data Analysis 51: 4526–4542.
Carvalho CM, Lopes HF, Polson N. 2009a. Particle learning for generalized dynamic conditionally linear models.
Working paper, University of Chicago Booth School of Business.
Carvalho CM, Lopes HF, Polson NG, Taddy M. 2009b. Particle learning for general mixtures. Working paper,
University of Chicago Booth School of Business.
Carvalho CM, Johannes M, Lopes HF, Polson N. 2010. Particle learning and smoothing. Statistical Science (to
appear).
Chen H, Petralia F, Lopes HF. 2009. Sequential Monte Carlo estimation of DSGE models. Working paper,
University of Chicago Booth School of Business.
Chen R, Liu JS. 2000. Mixture Kalman filter. Journal of the Royal Statistical Society, Series B 62: 493–
508.
Chen Y, Lai TL. 2007. Identification and adaptive control of change-point ARX models via Rao–Blackwellized
particle filters. IEEE Transactions on Automatic Control 52: 67–72.
Chen Z. 2003. Bayesian filtering: from Kalman filters to particle filters, and beyond. Working paper, McMaster
University, Canada.
Chib S, Nardari F, Shephard N. 2002. Markov chain Monte Carlo methods for stochastic volatility models.
Journal of Econometrics 108: 281–316.
Chib S, Nardari F, Shephard N. 2006. Analysis of high dimensional multivariate stochastic volatility models.
Journal of Econometrics 134: 341–371.
Chopin N. 2002. A sequential particle filter method for static models. Biometrika 89: 539–552.
Creal D. 2008. Analysis of filtering and smoothing algorithms for Lévy-driven stochastic volatility models.
Computational Statistics and Data Analysis 52: 2863–2876.
Da Silva CQ, Migon HS, Correira LT. 2009. Bayesian beta dynamic model and applications. Working paper,
Department of Statistics, Federal University of Rio de Janeiro.
DeJong DN, Dharmarajan H, Liesenfeld R, Moura GV, Richard J-F. 2009. Efficient likelihood evaluation of state-
space representations. Working paper, Department of Economics, University of Pittsburgh.
Del Moral P, Doucet A, Jasra A. 2006. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society,
Series B 68: 411–436.


Douc R, Garivier E, Moulines E, Olsson J. 2009a. On the forward filtering backward smoothing particle
approximations of the smoothing distribution in general state space models. Working paper, Institut Télécom,
Paris.
Douc R, Moulines E, Olsson J. 2009b. Optimality of the auxiliary particle filter. Probability and Mathematical
Statistics 29: 1–28.
Doucet A, Johansen A. 2008. A note on auxiliary particle filters. Statistics and Probability Letters 78:
1498–1504.
Doucet A, Johansen A. 2010. A tutorial on particle filtering and smoothing: fifteen years later. In Handbook of
Nonlinear Filtering, Crisan D, Rozovsky B (eds). Oxford University Press: Oxford.
Doucet A, Tadić VB. 2003. Parameter estimation in general state-space models using particle methods. Annals of
the Institute of Statistical Mathematics 55: 409–422.
Doucet A, Godsill S, Andrieu C. 2000. On sequential Monte-Carlo sampling methods for Bayesian filtering.
Statistics and Computing 10: 197–208.
Doucet A, De Freitas N, Gordon N. 2001a. Sequential Monte Carlo Methods in Practice. Springer: New
York.
Doucet A, Gordon NJ, Krishnamurthy V. 2001b. Particle filters for state estimation of jump Markov linear systems.
IEEE Transactions on Signal Processing 49: 613–624.
Dukić V, Lopes HF, Polson NG. 2009. Tracking flu epidemics using Google trends and particle learning. Working
paper, University of Chicago Booth School of Business.
Eraker B, Johannes MS, Polson NG. 2003. The impact of jumps in volatility and returns. Journal of Finance 58:
1269–1300.
Fearnhead P. 2002. Markov chain Monte Carlo, sufficient statistics and particle filter. Journal of Computational
and Graphical Statistics 11: 848–862.
Fearnhead P. 2004. Particle filters for mixture models with an unknown number of components. Statistics and
Computing 14: 11–21.
Fearnhead P, Clifford P. 2003. Online inference for hidden Markov models via particle filters. Journal of the Royal
Statistical Society, Series B 65: 887–899.
Fearnhead P, Meligkotsidou L. 2004. Exact filtering for partially-observed continuous-time models. Journal of the
Royal Statistical Society, Series B 66: 771–789.
Fearnhead P, Papaspiliopoulos O, Roberts GO. 2008a. Particle filters for partially observed diffusions. Journal of
the Royal Statistical Society, Series B 70: 755–777.
Fearnhead P, Wyncoll D, Tawn J. 2008b. A sequential smoothing algorithm with linear computational cost.
Working paper, Department of Mathematics and Statistics, Lancaster University.
Fernández-Villaverde J, Rubio-Ramírez JF. 2005. Estimating dynamic equilibrium economies: linear versus non-
linear likelihood. Journal of Applied Econometrics 20: 891–910.
Fernández-Villaverde J, Rubio-Ramírez JF. 2007. Estimating macroeconomic models: a likelihood approach.
Review of Economic Studies 74: 1059–1087.
Frühwirth-Schnatter S. 1994. Data augmentation and dynamic linear models. Journal of Time Series Analysis 15:
183–202.
Gamerman D, Lopes HF. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference.
Chapman & Hall/CRC: Boca Raton, FL.
Gilks WR, Berzuini C. 2001. Following a moving target: Monte Carlo inference for dynamic Bayesian models.
Journal of the Royal Statistical Society, Series B 63: 127–146.
Gordon N, Salmond D, Smith AFM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation.
IEE Proceedings F: Radar and Signal Processing 140: 107–113.
Godsill SJ, Doucet A, West M. 2004. Monte Carlo smoothing for non-linear time series. Journal of the American
Statistical Association 99: 156–168.
Gramacy R, Polson NG. 2010. Particle learning of Gaussian process models for sequential design and optimiza-
tion. Working paper, University of Chicago Booth School of Business.
Guo D, Wang X, Chen R. 2005. New sequential Monte Carlo methods for nonlinear dynamic systems. Statistics
and Computing 15: 135–147.
Han C, Carlin BP. 2000. MCMC Methods for computing Bayes factors: a comparative review. Journal of the
American Statistical Association 96: 1122–1132.
Harvey AC. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University
Press: Cambridge.


Ito K, Xiong K. 2000. Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control
45: 910–927.
Jacquier E, Polson NG, Rossi PE. 1994. Bayesian analysis of stochastic volatility models. Journal of Business
and Economic Statistics 20: 69–87.
Jasra A, Stephens DA, Doucet A, Tsagaris T. 2008. Inference for Lévy-driven stochastic volatility models via
adaptive sequential Monte Carlo. Working paper, Institute for Statistical Mathematics, Tokyo, Japan.
Jazwinski A. 1970. Stochastic Processes and Filtering Theory. Academic Press: New York.
Johannes M, Polson P. 2009. Particle filtering. In Handbook of Financial Time Series, Andersen TG, Davis RA,
Kreiss J-P, Mikosch T (eds). Springer: Berlin; 1115–1130.
Johannes M, Korteweg A, Polson NG. 2008. Sequential learning, predictive regressions, and optimal portfolio
returns. Working paper, Graduate School of Business, Stanford University.
Johannes MS, Polson NG, Stroud JR. 2009. Optimal filtering of jump diffusions: extracting latent states from asset
prices. Review of Financial Studies 22: 2559–2599.
Julier SJ, Uhlmann JK. 1997. A new extension of the Kalman filter to nonlinear systems. In Proceedings of Aero-
Sense: 11th International Symposium on Aerospace, Defense Sensing, Simulation and Controls, no. 3068;
182–193.
Kass RE, Raftery A. 1995. Bayes factors. Journal of the American Statistical Association 90: 773–795.
Kim S, Shephard N, Chib S. 1998. Stochastic volatility: likelihood inference and comparison with ARCH models.
Review of Economic Studies 65: 361–393.
Kitagawa G. 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of
Computational and Graphical Statistics 5: 1–25.
Kong A, Liu JS, Wong W. 1994. Sequential imputation and Bayesian missing data problems. Journal of the
American Statistical Association 89: 590–599.
Li H. 2009. Sequential Bayesian analysis of time-changed infinite activity derivatives pricing models. Working
paper, ESSEC Business School, Paris/Singapore.
Li H, Wells MT, Yu CL. 2008. A Bayesian analysis of return dynamics with Lévy jumps. Review of Financial
Studies 21: 2345–2378.
Liu JS. 2001. Monte Carlo Strategies in Scientific Computing. Springer: New York.
Liu J, Chen R. 1995. Blind deconvolution via sequential imputations. Journal of the American Statistical Associa-
tion 90: 567–576.
Liu J, Chen R. 1998. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical
Association 93: 1032–1044.
Liu J, West M. 2001. Combined parameter and state estimation in simulation-based filtering. In Sequential
Monte Carlo Methods in Practice, Doucet A, de Freitas N, Gordon N (eds). Springer: New York; 197–
223.
Lopes HF, Polson NG. 2010. Extracting SP500 and NASDAQ volatility: the credit crisis of 2007–2008. In The
Oxford Handbook of Applied Bayesian Analysis, O’Hagan A, West M (eds). Oxford University Press: Oxford;
319–342.
Lopes HF, West M. 2004. Bayesian model assessment in factor analysis. Statistica Sinica 14: 41–67.
Lopes HF, Carvalho CM, Johannes M, Polson NG. 2010. Particle learning for sequential Bayesian computation.
In Bayesian Statistics 9, Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West
M (eds). Oxford University Press: Oxford (to appear).
Lund B, Lopes HF. 2009. Learning in a regime switching macro-finance model for the term structure. Working
paper, University of Chicago Booth School of Business.
MacEachern SN, Clyde MA, Liu JS. 1999. Sequential importance sampling for nonparametric Bayes models: the
next generation. Canadian Journal of Statistics 27: 251–267.
Mukherjee C, West M. 2009. Sequential Monte Carlo in model comparison: example in cellular dynamics in
systems biology. Working paper, Department of Statistical Science, Duke University.
Olsson J, Cappé O, Douc R, Moulines E. 2008. Sequential Monte Carlo smoothing with application to parameter
estimation in non-linear state space models. Bernoulli 14: 155–179.
Omori Y, Chib S, Shephard N, Nakajima J. 2007. Stochastic volatility with leverage: fast and efficient likelihood
inference. Journal of Econometrics 140: 425–449.
Petris G, Petrone S, Campagnoli P. 2009. Dynamic Linear Models with R. Springer: New York.
Pitt MK, Shephard N. 1999. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical
Association 94: 590–599.

Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011)
DOI: 10.1002/for
208 H. F. Lopes and R. S. Tsay

Polson NG, Stroud JR, Müller P. 2008. Practical filtering with sequential parameter learning. Journal of the Royal
Statistical Society, Series B 70: 413–428.
Poyiadjis G, Doucet A, Singh SS. 2005. Particle methods for optimal filter derivative: application to parameter
estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing,
Vol. 5, v/925–v/928.
Prado R, Lopes HF. 2009. Sequential parameter learning and filtering in structured AR models. Working paper,
University of Chicago Booth School of Business.
Prado R, West M. 2010. Time Series: Modelling, Computation and Inference. Chapman & Hall/CRC Press: Boca
Raton, FL.
Raggi D, Bordignon S. 2006. Sequential Monte Carlo methods for stochastic volatility models with jumps. Working
paper, Department of Economics, University of Bologna.
Reis EA, Salazar E, Gamerman D. 2006. Comparison of sampling schemes for dynamic linear models. Interna-
tional Statistical Review 74: 203–214.
Rios MP, Lopes HF. 2009. Sequential parameter estimation in stochastic volatility models. Working paper, Uni-
versity of Chicago Booth School of Business.
Ristic B, Arulampalam S, Gordon N. 2004. Beyond the Kalman Filter: Particle Filters for Tracking Applications.
Artech House Radar Library: Norwood, MA.
Shephard N. 1994. Partial non-Gaussian state space. Biometrika 81: 115–131.
Shi M, Dunson DB. 2009. Bayesian variable selection via particle stochastic search. Working paper, Department
of Statistical Science, Duke University.
Smith AFM, Gelfand AE. 1992. Bayesian statistics without tears: a sampling–resampling perspective. American
Statistician 46: 84–88.
So MKP, Lam K, Li WK. 1998. A stochastic volatility model with Markov switching. Journal of Business and
Economic Statistics 16: 244–253.
Storvik G. 2002. Particle filters for state-space models with the presence of unknown static parameters. IEEE
Transactions on Signal Processing 50: 281–289.
Taddy M, Gramacy R, Polson NG. 2010. Dynamic trees for learning and design. Working paper, University of
Chicago Booth School of Business.
Tsay RS. 2005. Analysis of Financial Time Series (2nd edn). Wiley: New York.
Van der Merwe R, Doucet A, De Freitas N, Wan E. 2000. The unscented particle filter. In Advances in Neural
Information Processing Systems, Vol. 13. Leen TK, Dietterich TG, Tresp V (eds). MIT Press: Cambridge, MA;
584–590.
West M. 1992. Modelling with mixtures. In Bayesian Statistics 4, Bernardo JM, Berger JO, Dawid AP, Smith
AFM (eds). Oxford University Press: Oxford; 503–524.
West M. 1993a. Approximating posterior distributions by mixtures. Journal of the Royal Statistical Society, Series
B 54: 553–568.
West M. 1993b. Mixture models, Monte Carlo, Bayesian updating and dynamic models. Computing Science and
Statistics 24: 325–333.
West M, Harrison J. 1997. Bayesian Forecasting and Dynamic Models (2nd edn). Springer: New York.
West M, Harrison J, Migon H. 1985. Dynamic generalized linear models and Bayesian forecasting. Journal of the
American Statistical Association 80: 73–83.

Authors’ biographies:
Hedibert F. Lopes is Associate Professor of Econometrics and Statistics, Booth School of Business, University
of Chicago. Recent publications include work on Bayesian inference; dynamic models; Markov Chain Monte
Carlo and sequential Monte Carlo methods; modeling time-varying covariance through latent factor analysis,
Cholesky decompositions and other factorizations. Areas of application include economics, finance, biology and
natural and social sciences.

Ruey S. Tsay is the H. G. B. Alexander Professor of Econometrics and Statistics of Chicago Booth. He earned
his Ph.D. in statistics from the University of Wisconsin-Madison in 1982 and joined Chicago Booth in 1989. He
is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, the Royal Statistical
Society, and Academia Sinica. His research focuses on business and economic forecasting, high-dimensional data
analysis, risk management, and statistical inference. He has published extensively in leading statistical, economet-
ric, and finance journals.

Authors’ addresses:
Hedibert F. Lopes and Ruey S. Tsay, University of Chicago Booth School of Business, 5807 South Woodlawn
Avenue, Chicago, IL 60637, USA.
