Lopes 2010
ABSTRACT
In this paper we review sequential Monte Carlo (SMC) methods, or particle
filters (PF), with special emphasis on their potential applications in financial time
series analysis and econometrics. We start with the well-known normal dynamic
linear model, also known as the normal linear state space model, for which
sequential state learning is available in closed form via standard Kalman filter
and Kalman smoother recursions. Particle filters are then introduced as a set of
Monte Carlo schemes that enable Kalman-type recursions when normality or
linearity or both are abandoned. The seminal bootstrap filter (BF) of Gordon,
Salmond and Smith (1993) is used to introduce the SMC jargon, potentials and
limitations. We also review the literature on parameter learning, an area that
started to attract much attention from the particle filter community in recent years.
We give particular attention to the Liu–West filter (2001), Storvik filter (2002)
and particle learning (PL) of Carvalho, Johannes, Lopes and Polson (2010). We
argue that the BF and the auxiliary particle filter (APF) of Pitt and Shephard
(1999) define two fundamentally distinct directions within the particle filter
literature. We also show that the distinction is more pronounced with parameter
learning and argue that PL, which follows the APF direction, is an attractive
extension. One of our contributions is to sort out the research from BF to APF
(during the 1990s), from APF to now (the 2000s) and from Liu–West filter to
Storvik filter to PL. To this end, we provide code in R for all the examples of
the paper. Readers are invited to find their own way into this dynamic and active
research arena. Copyright © 2010 John Wiley & Sons, Ltd.
KEY WORDS particle learning; sequential Monte Carlo; Markov chain Monte
Carlo; stochastic volatility; realized volatility; Nelson–Siegel model
INTRODUCTION
The Kalman filter (KF) and its many variants and generalizations have played a fundamental role
in modern time series analysis by allowing the study and estimation of complex dynamics and by
drawing the attention of researchers and practitioners to the rich class of state-space models, also
known as dynamic models (Harvey, 1989; West and Harrison, 1997). Well-known and widely used
* Correspondence to: Hedibert F. Lopes, University of Chicago Booth School of Business, 5807 South Woodlawn Avenue,
Chicago, IL 60637, USA. E-mail: hlopes@[Link]
variants of KF include (i) the extended KF (Jazwinski, 1970; West et al., 1985); (ii) the
Gaussian sum filter (Alspach and Sorenson, 1972); (iii) the unscented KF (Julier and Uhlmann, 1997;
Van der Merwe et al., 2000); and (iv) the Gaussian quadrature KF (Ito and Xiong, 2000).
Despite their wide applicability, approximations provided by these variants become less effective
when substantial nonlinearities and/or extreme non-Gaussianity are present in the data. To overcome
this difficulty, the last two decades have seen an increasing number of Monte Carlo (MC)-based
approximations for state-space models. These MC methods are basically divided into two major
categories: Markov chain Monte Carlo (MCMC) schemes for offline/batch sampling and sequential
Monte Carlo (SMC) schemes for online/sequential sampling. For example, Carlin et al. (1992), Carter
and Kohn (1994), Frühwirth-Schnatter (1994) and Shephard (1994) propose MCMC methods to
estimate general state-space models. However, MCMC-based algorithms are prohibitively costly when
performing online estimation of states and parameters; see Gamerman and Lopes (2006).
SMC methods, also known as particle filters, are MC schemes that, when used in the state-space
context, rebalance draws from the posterior distribution of the states and parameters at a given time
(the particles) based on the next observation via its likelihood. In their seminal paper, Gordon et al.
(1993) propose one of the most popular filters, the bootstrap filter (BF), which is based on a sampling
importance resampling (SIR) argument (Smith and Gelfand, 1992). Also influential from the start
are the works on sequential Bayesian imputation by Kong et al. (1994) and Liu and Chen (1995).
In this paper we review the bootstrap filter and its variants. We also introduce the auxiliary particle
filter (APF) of Pitt and Shephard (1999) (see also the discussion in Liu and Chen, 1998) and argue
that both filters define two directions within the SMC literature, namely sample–resample and
resample–sample methods. This is done in the third section, which ends with a list of additional review
papers and books on SMC. The fourth section starts by showing how both BF and APF can be used
to approximate the likelihood function of fixed parameters. We then introduce the Liu and West
filter (Liu and West, 2001), which generalizes APF to sequentially update the posterior distributions
of parameters. The section also introduces the particle learning (PL) of Carvalho et al. (2010). Some
illustrative examples appear in the fifth section. Final remarks and current research directions are
presented in the sixth section.
To introduce the ideas of the particle filter, let us start with the well-known normal dynamic linear
model (NDLM):
y_t = F_t' x_t + v_t   (1)
x_t = G_t x_{t-1} + w_t   (2)

where v_t and w_t are temporally and mutually independent Gaussian sequences with zero mean and
variances σ_t^2 and τ_t^2, respectively. Equation (1) is referred to as the observation equation that relates
the observed series y_t to the state vector x_t. Equation (2) is the state transition equation that governs
the time evolution of the state, which might be latent. The local level and the local linear trend
models are special cases of the NDLM. In the local level model, y_t = x_t + v_t and x_t = x_{t-1} + w_t, so that
F_t = G_t = 1, σ_t^2 = σ^2 and τ_t^2 = τ^2 for all t. In the local linear trend model, y_t = x_{1t} + v_t, x_{1t} = x_{1,t-1} +
x_{2,t-1} + w_{1t} and x_{2t} = x_{2,t-1} + w_{2t}, and we have x_t = (x_{1t}, x_{2t})', F_t = (1, 0)', G = (g_1, g_2) with columns
g_1 = (1, 0)' and g_2 = (1, 1)', σ_t^2 = σ^2 and τ_t^2 = τ^2 for all t, where τ^2 is a 2 × 2 positive definite matrix.
Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011)
DOI: 10.1002/for
170 H. F. Lopes and R. S. Tsay
Conditionally on the quadruple {F_t, G_t, σ_t^2, τ_t^2}, for t = 1, . . . , T, and on the initial distribution
(x_0|y^0) ~ N(m_0, C_0), it is straightforward to show that

x_t|y^{t-1} ~ N(a_t, R_t)   (3)
y_t|y^{t-1} ~ N(f_t, Q_t)   (4)
x_t|y^t ~ N(m_t, C_t)   (5)

for t = 1, . . . , T, where y^t = (y_1, . . . , y_t)' and N(a, b) denotes the normal distribution with mean a
and variance b. The three densities in equations (3)-(5) are referred to as the propagation density,
predictive density and filtering density, respectively. In fact, the propagation and filtering densities
are the prior density of x_t given y^{t-1} and the posterior density of x_t given y^t. The means and variances
of the three densities are provided by the Kalman recursions:

a_t = G_t m_{t-1}, R_t = G_t C_{t-1} G_t' + τ_t^2   (6)
f_t = F_t' a_t, Q_t = F_t' R_t F_t + σ_t^2   (7)
m_t = a_t + A_t e_t, C_t = R_t − A_t Q_t A_t'   (8)

where e_t = y_t − f_t is the prediction error and A_t = R_t F_t Q_t^{-1} is the Kalman gain. Two other useful densi-
ties are the conditional and marginal smoothed densities:

x_t|x_{t+1}, y^t ~ N(h_t, H_t)   (9)
x_t|y^T ~ N(m_t^T, C_t^T)   (10)

where

h_t = m_t + B_t(x_{t+1} − a_{t+1}), H_t = C_t − B_t R_{t+1} B_t'   (11)
m_t^T = m_t + B_t(m_{t+1}^T − a_{t+1}), C_t^T = C_t + B_t(C_{t+1}^T − R_{t+1}) B_t'   (12)

and B_t = C_t G_{t+1}' R_{t+1}^{-1}. See West and Harrison (1997, Ch. 4) for additional details.
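For concreteness, the forward recursions in equations (6)-(8) can be coded directly for the local level model (F_t = G_t = 1). The sketch below is ours, not the paper's R code; the function name and prior defaults are illustrative.

```python
import numpy as np

def kalman_local_level(y, sigma2, tau2, m0=0.0, C0=10.0):
    """Kalman recursions for the local level model y_t = x_t + v_t, x_t = x_{t-1} + w_t."""
    m, C = m0, C0
    ms, Cs = [], []
    for yt in y:
        a, R = m, C + tau2          # propagation moments: x_t | y^{t-1} ~ N(a_t, R_t)
        f, Q = a, R + sigma2        # predictive moments: y_t | y^{t-1} ~ N(f_t, Q_t)
        A = R / Q                   # Kalman gain A_t
        m = a + A * (yt - f)        # filtering mean m_t
        C = R - A * A * Q           # filtering variance C_t
        ms.append(m)
        Cs.append(C)
    return np.array(ms), np.array(Cs)
```

For long series the filtering variance settles at the steady-state solution of C = (C + τ^2)σ^2/(C + τ^2 + σ^2); with σ^2 = τ^2 = 1 this is (√5 − 1)/2 ≈ 0.618.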
An important and rich subclass of the NDLM assumes that F_t and G_t are both known, while σ_t^2 =
σ^2 and τ_t^2 = τ^2 are unknown variances. In this case, the above Kalman recursions can be used
to marginalize out the states based on equation (4), i.e.,
p(y^T|σ^2, τ^2) = ∏_{t=1}^T f(y_t; f_t, Q_t)   (13)
where f(x; μ, σ2) is the density of a normal random variable with mean μ and variance σ2 evaluated
at x. Note that here f_t and Q_t are both nonlinear functions of (σ^2, τ^2). Thus, should the main
objective be sampling from p(x^T, σ^2, τ^2|y^T), draws can be obtained in two steps:
1. Draw (σ^2, τ^2) from p(σ^2, τ^2|y^T), which is proportional to the prior p(σ^2, τ^2) times the likelihood
from equation (13).
Particle Filters and Bayesian Inference 171
2. Draw x^T from p(x^T|σ^2, τ^2, y^T) by first computing the forward moments via equations (6)-(8),
and then sampling backward x_t conditional on x_{t+1} and y^t via equations (9) and (11).
Sampling (σ^2, τ^2) in step 1 can be performed by SIR, acceptance–rejection or Metropolis–
Hastings-type algorithms, or replaced by a Gibbs step that draws (σ^2, τ^2) conditional on (y^T, x^T). Reis
et al. (2006) compare the performance of these and other MCMC sampling schemes in the context
of the local level model. Step 2 is known as the forward filtering, backward sampling (FFBS)
algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994).
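The FFBS step can be sketched for the local level model, drawing one full state path via the forward Kalman moments and the backward density (9). The code is ours, assuming σ^2 and τ^2 known.

```python
import numpy as np

rng = np.random.default_rng(0)

def ffbs_local_level(y, sigma2, tau2, m0=0.0, C0=10.0):
    """Forward filtering, backward sampling: one draw of x_1,...,x_T from p(x^T | sigma^2, tau^2, y^T)."""
    T = len(y)
    a = np.empty(T); R = np.empty(T); m = np.empty(T); C = np.empty(T)
    mp, Cp = m0, C0
    for t in range(T):                          # forward pass: Kalman moments
        a[t], R[t] = mp, Cp + tau2
        Q = R[t] + sigma2
        A = R[t] / Q
        m[t] = a[t] + A * (y[t] - a[t])
        C[t] = R[t] - A * A * Q
        mp, Cp = m[t], C[t]
    x = np.empty(T)                             # backward pass: x_T, then x_t | x_{t+1}, y^t
    x[T - 1] = rng.normal(m[T - 1], np.sqrt(C[T - 1]))
    for t in range(T - 2, -1, -1):
        B = C[t] / R[t + 1]
        h = m[t] + B * (x[t + 1] - a[t + 1])    # conditional smoothed mean h_t
        H = C[t] - B * B * R[t + 1]             # conditional smoothed variance H_t
        x[t] = rng.normal(h, np.sqrt(H))
    return x
```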
For the NDLM, all densities needed for inference are available in closed form, so inference can easily be
carried out in applications. On the other hand, difficulties arise when the model is nonlinear or
non-Gaussian, because no closed-form densities are available. As we discuss below, particle filters
provide an effective approach to overcoming these difficulties.
Let us consider a more general dynamic model where the assumptions of normality and/or linearity
are relaxed. The observation and state transition equations become
y_t|x_t ~ p(y_t|x_t)
x_t|x_{t-1} ~ p(x_t|x_{t-1}),   t = 1, 2, . . .
Denote the initial probability density of the state by p(x_0). All static parameters, such as σ^2 and
τ^2 from the previous section, are assumed to be known throughout this section. Batch and sequential
parameter learning are deferred to the next section. The Kalman recursions from equations (3) and
(5) are now replaced, respectively, by
p(x_t|y^{t-1}) = ∫ p(x_t|x_{t-1}) p(x_{t-1}|y^{t-1}) dx_{t-1}   (14)

and

p(x_t|y^t) = p(y_t|x_t) p(x_t|y^{t-1}) / p(y_t|y^{t-1})   (15)
In most real-world applications, outside the realm of NDLM, the integration with respect to xt−1
in (14) and the implementation of Bayes’ theorem in (15) are both analytically intractable and/or
computationally costly. As mentioned in the Introduction, there exist approximations for sequential
state estimation and filtering based on Kalman-like filters, such as the extended Kalman filter and
the unscented Kalman filter. Also, as discussed in the Introduction, there exist several MCMC-type
samplers for batch estimation of the whole state vector and parameters, similar to the FFBS
introduced in the previous section.
Particle filters, loosely speaking, combine the sequential estimation nature of Kalman-like filters
with the modeling flexibility of MCMC samplers, while avoiding some of their shortcomings.
On the one hand, like MCMC samplers and unlike Kalman-like filters, particle filters are designed
to allow for more flexible observational and evolutional dynamics and distributions. On the other
hand, like Kalman-like filters and unlike MCMC samplers, particle filters provide online filtering
and smoothing distributions of states and parameters.
The goal of most particle filters is to draw a set of i.i.d. particles {x_t^(i)}_{i=1}^N that approximates p(x_t|y^t)
by starting with a set of i.i.d. particles {x_{t-1}^(i)}_{i=1}^N that approximates p(x_{t-1}|y^{t-1}).
The most popular filters are the bootstrap filter (BF), also known as the sequential importance sam-
pling with resampling (SISR) filter, proposed by Gordon et al. (1993), and the auxiliary particle filter
(APF), also known as the auxiliary SIR (ASIR) filter, proposed by Pitt and Shephard (1999). However,
it is worth mentioning that one of the earliest sequential Monte Carlo algorithms was proposed by West
(1992). For recent discussion regarding the similarities and differences between BF and APF see, for
instance, Carvalho et al. (2010), Doucet and Johansen (2008), and Douc et al. (2009b).
Propagate–resample filters
The BF of Gordon et al. (1993) is based on sequential SIR steps over time (Smith and Gelfand,
1992). The Kalman recursions from (14) and (15) are combined in

p(x_t, x_{t-1}|y_t, y^{t-1}) ∝ p(y_t|x_t) p(x_t|x_{t-1}) p(x_{t-1}|y^{t-1})   (16)

where the product p(x_t|x_{t-1}) p(x_{t-1}|y^{t-1}) is used in step 1 (propagate) and p(y_t|x_t) in step 2 (resample).
In other words, the BF first propagates particles from the posterior at time t − 1 in order to generate
particles from the prior at time t. Then it resamples the propagated particles with weights proportional
to their likelihoods. This is Algorithm 1 below, whose recursions are illustrated in Figure 1.
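A minimal sketch of the propagate-resample bootstrap filter for the local level model follows; all implementation choices (particle count, prior, seed) are ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_filter(y, sigma2, tau2, N=2000, m0=0.0, C0=10.0):
    """Bootstrap filter for the local level model: blind propagation, then likelihood resampling."""
    x = rng.normal(m0, np.sqrt(C0), N)                    # particles approximating p(x_0)
    means = []
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(tau2), N)         # 1. propagate from p(x_t | x_{t-1})
        w = np.exp(-0.5 * (yt - x) ** 2 / sigma2)         # weights proportional to p(y_t | x_t)
        w /= w.sum()
        means.append(np.sum(w * x))                       # estimate E(x_t | y^t) before resampling
        x = rng.choice(x, size=N, replace=True, p=w)      # 2. resample
    return np.array(means)
```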
Resample–propagate filters
Similarly, the APF first resamples particles from the posterior at time t − 1 with weights taking into
account the next observed data point, yt. Then it propagates the resampled particles. The identity
from equation (16) is rewritten as
Figure 1. Bootstrap filter. A schematic representation of the bootstrap filter over two time periods. The squares
are y_{t+1} and y_{t+2}. From top to bottom, the first, second, fourth and fifth sets of dots represent particles, while the
third and sixth sets of dots represent particle weights
p(x_t, x_{t-1}|y_t, y^{t-1}) ∝ p(x_t|x_{t-1}, y_t) p(y_t|x_{t-1}) p(x_{t-1}|y^{t-1})   (17)

where the product p(y_t|x_{t-1}) p(x_{t-1}|y^{t-1}) is used in step 1 (resample) and p(x_t|x_{t-1}, y_t) in step 2 (propagate).
The main difficulty with the APF is that, in most applications, neither is p(yt|xt−1) available for
pointwise evaluation (resampling) nor is p(xt|xt−1, yt) available for sampling (propagation). The APF
is fully adapted when these conditions are satisfied. The main suggestion in Pitt and Shephard (1999)
for general state-space models is as follows:
(a) use p(y_t|g(x_{t-1})), i.e. the data density p(y_t|x_t) evaluated at g(x_{t-1}) (usually the expected value,
median or mode of the state transition density p(x_t|x_{t-1})), as the proposal weight to resample the
old particle x_{t-1}; and
(b) use q(x_t|x_{t-1}, y_t) ≡ p(x_t|x_{t-1}) as the proposal density to propagate resampled particles to the new
set of particles {x_t^(i)}_{i=1}^N. Note that here q(·) is blind since it does not incorporate the current
observation y_t. See below for further details on better ways of choosing q(·).
Since both resampled and propagated particles come from proposal densities, it follows directly
from a simple SIR argument that these particles have weights given by

w_t ∝ [p(y_t|x_t) p(x_t|x_{t-1}) p(x_{t-1}|y^{t-1})] / [p(y_t|g(x_{t-1})) p(x_t|x_{t-1}) p(x_{t-1}|y^{t-1})]
    = p(y_t|x_t) / p(y_t|g(x_{t-1}))   (18)

This leads to Algorithm 2 below.
3. Resample {x_t^(i)}_{i=1}^N from {x̃_t^(i)}_{i=1}^N with weights w_t^(i) ∝ p(y_t|x̃_t^(i)) / p(y_t|g(x̃_{t-1}^(i))).
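The resample-propagate APF with blind propagation and weights (18) can be sketched as follows for the local level model, taking g(x_{t-1}) = x_{t-1}, the mean of the transition density; the details are ours.

```python
import numpy as np

rng = np.random.default_rng(2)

def apf(y, sigma2, tau2, N=2000, m0=0.0, C0=10.0):
    """Auxiliary particle filter: look-ahead resampling, blind propagation, second-stage weights (18)."""
    x = rng.normal(m0, np.sqrt(C0), N)
    means = []
    for yt in y:
        g = x                                             # g(x_{t-1}): mean of p(x_t | x_{t-1})
        u = np.exp(-0.5 * (yt - g) ** 2 / sigma2)         # 1. resample with p(y_t | g(x_{t-1}))
        idx = rng.choice(N, size=N, replace=True, p=u / u.sum())
        xt = x[idx] + rng.normal(0.0, np.sqrt(tau2), N)   # 2. propagate from p(x_t | x_{t-1})
        w = np.exp(-0.5 * ((yt - xt) ** 2 - (yt - g[idx]) ** 2) / sigma2)
        w /= w.sum()                                      # 3. weights p(y_t|x_t) / p(y_t|g(x_{t-1}))
        means.append(np.sum(w * xt))
        x = rng.choice(xt, size=N, replace=True, p=w)
    return np.array(means)
```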
Algorithms 3 and 4 below are the optimal and fully adapted versions of BF and APF when p(yt|xt−1)
is analytically tractable and p(xt|xt−1, yt) easy to sample from.
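In the local level model both conditions hold, so the fully adapted resample-propagate step can be written exactly, using the densities derived in Example 1 below; this sketch is ours.

```python
import numpy as np

rng = np.random.default_rng(3)

def adapted_step(x, yt, sigma2, tau2):
    """One fully adapted step: resample with p(y_t | x_{t-1}), then draw from p(x_t | x_{t-1}, y_t)."""
    N = len(x)
    w = np.exp(-0.5 * (yt - x) ** 2 / (sigma2 + tau2))    # p(y_t | x_{t-1}) = N(x_{t-1}, sigma2 + tau2)
    x = x[rng.choice(N, size=N, p=w / w.sum())]           # 1. resample
    om2 = 1.0 / (1.0 / tau2 + 1.0 / sigma2)               # omega^2 of p(x_t | x_{t-1}, y_t)
    mu = om2 * (x / tau2 + yt / sigma2)
    return mu + rng.normal(0.0, np.sqrt(om2), N)          # 2. exact propagation, no reweighting
```

Because both the resampling density and the propagation density are exact, the particles carry equal weights after each step.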
Further implementation issues include alternative sampling schemes (Gilks and Berzuini, 2001; Fearnhead, 2002; Fearnhead and Clifford, 2003) and the use
of resampling (Liu and Chen, 1995; Carpenter et al., 1999), which can be important to avoid having
only a small number of particles with non-negligible weight.
Pitt and Shephard (1999) suggest local linearization of the observation equation (via an extended
Kalman filter-type approximation) in order to construct a proposal propagation density, say q(xt|xt−1,
yt), for the OAPF propagation density p(xt|xt−1, yt), which takes into account the current observation
yt and, potentially, outperforms the naïve blind propagation proposal density p(xt|xt−1). See Liu and
Chen (1995), Carpenter et al. (1999), Gilks and Berzuini (2001), Doucet et al. (2000), Fearnhead
(2002) and Guo et al. (2005), amongst others, for additional discussion on the choice of q(xt|xt−1, yt).
See Chen and Lai (2007) for an interesting application of online identification and adaptive control
of autoregressive models with exogenous inputs (ARX models) with Markov parameter jumps. More
efficient proposal densities can be obtained in the presence of conditional linearity and/or normality.
In other words, when the split of the state vector xt into x1t and x2t leads to, say, x1t|x2t being an
NDLM, then part of the sequential learning algorithm can be performed exactly by analytically
integrating out x1t. Such filters are commonly referred to as the Rao–Blackwellized particle filter or
mixture Kalman filter (Chen and Liu, 2000; Andrieu and Doucet, 2002).
Resampling or not?
It has been argued that the resampling step in the BF and the second resampling step in the APF
should only be performed when particle degeneracy is signaled. For instance, Kong et al. (1994)
introduced the effective sample size, N_eff, which they estimate by

N̂_eff = 1 / Σ_{i=1}^N (w_t^(i))^2   (19)

The particle set that approximates p(x_t|y^t) is then represented by the weighted set {(x̃_t, w_t)^(i)}_{i=1}^N,
using the notation from Algorithms 1 and 2 above.
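Equation (19) in code, together with the common adaptive-resampling rule of resampling only when N̂_eff falls below N/2; the threshold is a widespread convention, not one prescribed by the paper.

```python
import numpy as np

def ess(w):
    """Effective sample size, equation (19), of a vector of importance weights."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()                 # normalize the weights
    return 1.0 / np.sum(w ** 2)
```

In a filter loop one would then resample only if `ess(w) < N / 2`, carrying the weights forward otherwise.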
Reducing MC error
Regardless of whether resampling is performed at each time period or not, when the goal is to
produce summary statistics based on the posterior p(xt|yt) (for instance, mean, variance, quantiles),
it is more efficient (estimator with lower variance) to perform the computation prior to resampling.
For instance, it is more efficient to estimate E(x_t|y^t) by Σ_{i=1}^N w_t^(i) x̃_t^(i) / Σ_{j=1}^N w_t^(j) than by
Σ_{i=1}^N x_t^(i)/N, the plain average of the resampled particles.
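A small simulation, ours, illustrating the point: both estimators target the same posterior mean, but the post-resampling average carries extra Monte Carlo noise. Here the proposal is N(0, 1) and the weights come from an N(x, 1) likelihood evaluated at y = 1, so the target posterior is N(0.5, 0.5).

```python
import numpy as np

rng = np.random.default_rng(4)

x = rng.normal(0.0, 1.0, 5000)                        # proposal draws
w = np.exp(-0.5 * (1.0 - x) ** 2)                     # weights proportional to N(1; x, 1)
w /= w.sum()

est_weighted = np.sum(w * x)                          # weighted average, before resampling
est_resampled = rng.choice(x, size=5000, p=w).mean()  # plain average, after resampling

# Both are consistent for E(x | y) = 0.5; the second adds resampling noise.
```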
Example 1: Local level model. In this example, we use the local level model to compare the
performance of the four particle filter algorithms discussed above. They are the BF, APF, OBF and
OAPF. As mentioned in the second section (‘Normal dynamic linear model’ above), the local level
model is yt|xt ~ N(xt, σ2) and xt|xt−1 ~ N(xt−1, τ2). For this simple linear model, the traditional Kalman
filter is available to produce the ‘optimal’ estimate of the filtered state vector. We use this estimate
in evaluating the performance of particle filters.
Based on results of the second section, it is easy to see that (i) x_t|y^t ~ N(m_t, A_tσ^2), where m_t = (1
− A_t)m_{t-1} + A_t y_t and A_t = (A_{t-1}σ^2 + τ^2)/(A_{t-1}σ^2 + τ^2 + σ^2); (ii) y_t|x_{t-1} ~ N(x_{t-1}, σ^2 + τ^2); and (iii) x_t|x_{t-1}, y_t
~ N(ω^2(x_{t-1}/τ^2 + y_t/σ^2), ω^2), where ω^{-2} = σ^{-2} + τ^{-2}. Thus the four particle filter algorithms are easy to
implement. To compare the filters, we employ the criterion of mean squared error, computed using
R runs of each particle filter f in {BF, OBF, APF, OAPF} across M time series of length T.
Specifically, the MSE is given by MSE_f = Σ_{t,m,r} (x̂_{ftmr} − x̃_{tm})^2/(TMR), where x̃_{tm} is obtained via the
standard Kalman filter recursions (equations (3)-(8)) for the mth dataset up to time t, and x̂_{ftmr} =
Σ_{i=1}^N x_{ftmr}^(i)/N is the particle approximation to x̃_{tm} based on N particles {x_{ftmr}^(i)}_{i=1}^N.
The relative MSE, relative to the bootstrap filter, is defined as RMSEf = MSEf /MSEBF for f in
{OBF, APF, OAPF}. Results are summarized in Figure 2. From the plots, OAPF outperforms OBF
for all four values of τ2 and OBF fares better than BF for all four values of τ2. Also, it seems that
BF performs much better than APF when the signal-to-noise ratio, τ/σ, is greater than one.
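A miniature, single-run version of this comparison for the BF against the exact Kalman mean can be sketched as follows; the paper averages over M series and R runs per filter, and all settings below are ours.

```python
import numpy as np

rng = np.random.default_rng(6)

sigma2, tau2, T, N = 1.0, 1.0, 100, 1000
states = np.cumsum(rng.normal(0.0, np.sqrt(tau2), T))        # simulated x_t (random walk)
y = states + rng.normal(0.0, np.sqrt(sigma2), T)             # simulated y_t

m, C, kalman_means = 0.0, 10.0, []                           # exact filtered means
for yt in y:
    R = C + tau2
    Q = R + sigma2
    A = R / Q
    m = m + A * (yt - m)
    C = R - A * A * Q
    kalman_means.append(m)

x = rng.normal(0.0, np.sqrt(10.0), N)                        # bootstrap filter means
bf_means = []
for yt in y:
    x = x + rng.normal(0.0, np.sqrt(tau2), N)
    w = np.exp(-0.5 * (yt - x) ** 2 / sigma2)
    w /= w.sum()
    bf_means.append(np.sum(w * x))
    x = rng.choice(x, size=N, replace=True, p=w)

mse = np.mean((np.array(bf_means) - np.array(kalman_means)) ** 2)
```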
Review papers
Since Gordon et al. (1993), several review papers have helped organize the subarea
of sequential Monte Carlo. Here we list a small subset of these papers; the choice is rather subjective
and based on our limited and biased views of the field. Early contributions include the reviews by Doucet
et al. (2000) and Arulampalam et al. (2002), the books by Liu (2001), Doucet et al. (2001a) and Ristic
et al. (2004), and the 2002 special issue of IEEE Transactions on Signal Processing on sequential
Monte Carlo methods. See also the review by Chen (2003).
More recent studies, along with this paper, are Cappé et al. (2007), Doucet and Johansen (2010)
and Prado and West (2010, Ch. 6). They carefully organize and highlight the fast development of
the field over the last decade, such as parameter learning, more efficient particle smoothers, particle
filters for high-dimensional dynamic systems and, perhaps most recently, the interconnections
between MCMC and SMC methods.
PARAMETER LEARNING
Consider again the general dynamic model. We now address explicitly the unknown vector of static
parameters θ of the model:
y_t|x_t, θ ~ p(y_t|x_t, θ)   (20)
x_t|x_{t-1}, θ ~ p(x_t|x_{t-1}, θ)   (21)

for t = 1, . . . , T, with initial probability density p(x_0|θ) and prior p(θ). There are primarily two ways
to tackle the problem of learning θ: batch sampling and online sampling.
to tackle the problem of learning θ: batch sampling and online sampling.
Batch sampling
The solution involves obtaining an approximation, say pN(yT|θ), to the joint likelihood p(yT|θ). For
the NDLM of the second section (‘Normal dynamic linear model’ above), the predictive density was
obtained analytically from equation (13). The approximation pN(yT|θ) can be obtained by any of the
previous filters as
p_N(y^T|θ) = ∏_{t=1}^T p_N(y_t|y^{t-1}, θ) = ∏_{t=1}^T [ (1/N) Σ_{i=1}^N p(y_t|x_t^(i), θ) ]   (22)

where x_t^(i) ~ p(x_t|x_{t-1}^(i), θ), for i = 1, . . . , N. Therefore, the components of θ can be sampled iteratively
via a standard MCMC sampler, such as a Metropolis–Hastings algorithm, or via an SIR step. Two
of the major drawbacks of this solution are that (i) SMC loses its appealing sequential nature and
(ii) the overall MCMC or SIR scheme can be highly sensitive to the approximation pN(yT|θ). See,
for instance, Chopin (2002) and Del Moral et al. (2006) for more theoretical justifications and further
[Figure 2 here: four panels plotting RMSE against τ, with curves for APF, OBF and OAPF; panels correspond to N = 50 and N = 100 (top row) and N = 100 and N = 1000 (bottom row).]
Figure 2. Comparison between BF, OBF, APF and OAPF via relative mean square error (RMSE). Local level
model is used, where yt|xt ~ N(xt, σ2) and xt|xt−1 ~ N(xt−1, τ2), for t = 1, . . . , T, x0 ~ N(m0, C0), σ = 1, τ = 0.22,
0.71, 1.0 or 1.41, m0 = 0 and C0 = 10. The starting value is x0 = 0 and N denotes the number of particles. RMSE
is based on M = 10 time series of length T and R = 10 runs of each particle filter per time series. Top row:
sample size T = 100; bottom row: T = 1000
details, Doucet and Tadic (2003), Andrieu et al. (2004), Poyiadjis et al. (2005), Andrieu et al. (2005)
and Olsson et al. (2008) for expectation-maximization-like schemes, and Fernández-Villaverde and
Rubio-Ramírez (2005, 2007) and DeJong et al. (2009) for applications in dynamic stochastic general
equilibrium macroeconomic models.
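Equation (22) can be sketched in code for the local level model, returning the log of p_N(y^T|θ) computed alongside a bootstrap filter; all names and defaults are ours.

```python
import numpy as np

rng = np.random.default_rng(5)

def loglik_hat(y, sigma2, tau2, N=5000, m0=0.0, C0=10.0):
    """Particle (bootstrap filter) approximation of log p(y^T | theta) for the local level model."""
    x = rng.normal(m0, np.sqrt(C0), N)
    ll = 0.0
    for yt in y:
        x = x + rng.normal(0.0, np.sqrt(tau2), N)             # propagate from p(x_t | x_{t-1})
        inc = np.exp(-0.5 * (yt - x) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
        ll += np.log(inc.mean())                              # log p_N(y_t | y^{t-1}, theta)
        w = inc / inc.sum()
        x = rng.choice(x, size=N, replace=True, p=w)          # resample
    return ll
```

As a sanity check, with a single observation the exact value is available in closed form, since y_1|θ ~ N(m_0, C_0 + τ^2 + σ^2).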
Online sampling
The solution here is to produce sequential MC approximations to p(xt, θ|yt), sometimes p(xt−1, xt, θ|yt)
and/or other small dimensional functions (small compared to t) of (xt, θ) conditional on yt. Simply
resampling θ over time is bound to fail since, in general, after a few time steps the particle set will
contain only one particle. Gordon et al. (1993) suggest incorporating artificial evolution noise for θ
when tackling the problem of sequentially learning the static parameters of a state-space model.
Since parameters are not states, adding noise artificially will eventually distort and compromise the
validity of the approximated posterior distributions. Their approach imposes a loss of information
in time as artificial uncertainties added to the parameters eventually result in a very diffuse posterior
density for θ. In what follows, we introduce three well-established filters for sequentially learning
both xt and θ: (i) the Liu and West filter; (ii) the Storvik filter; and (iii) the particle learning (PL)
filter of Carvalho et al. (2010) and Lopes et al. (2010).
Liu and West filter

Liu and West (2001) approximate p(θ|y^{t-1}) by a smooth kernel mixture of multivariate normals,

p_N(θ|y^{t-1}) = (1/N) Σ_{j=1}^N f(θ; m^(j), V)   (23)

where m^(j) = a θ_{t-1}^(j) + (1 − a)θ̄, θ̄ = Σ_{j=1}^N θ_{t-1}^(j)/N, V = h^2 Σ_{j=1}^N (θ_{t-1}^(j) − θ̄)(θ_{t-1}^(j) − θ̄)'/N and h^2 = 1 − a^2. The
subscript t of θ_t is used only to indicate that samples are from p(θ|y^t). The APF of Pitt and Shephard
(1999) of equation (17) can now be written for the state vector (x_t, θ_t).
In general, and similar to the APF of the third section, p(y_t|x_{t-1}, θ) is not available for pointwise
evaluation and/or p(x_t|x_{t-1}, θ_t, y_t) is not easy to sample from. Liu and West resample old particles with
weights proportional to p(y_t|g(x_{t-1}), m(θ_{t-1})), where g(·) and m(·) are described above. Then they
propagate θ_t from the proposal propagation density p(θ_t|θ_{t-1}), the normal kernel in equation (23), and
propagate x_t conditional on θ_t from the evolution density q(x_t|x_{t-1}, θ_t, y_t) ≡ p(x_t|x_{t-1}, θ_t). The
propagated particles (x_t, θ_t) have associated weights

ω_t ∝ p(y_t|x_t, θ_t) / p(y_t|g(x_{t-1}), m(θ_{t-1}))

which leads to Algorithm 5 below.
The performance of the LW filter depends on the choice of the tuning parameter a, which drives both
the shrinkage and the smoothness of the normal approximation. It is common practice to set a around
0.98 or higher. The components of θ can either be transformed to accommodate the approximate
local normality, or the multivariate normal approximation can be replaced by a composition
of, say, conditionally normal densities for location parameters and inverse-gamma densities for scale/
variance parameters. See, for example, Petris et al. (2009, pp. 222–228) for an example based on
the local level model and Carvalho and Lopes (2007) for an application to Markov switching
stochastic volatility models.
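The shrinkage construction in equation (23) can be sketched directly; the function below is ours. A useful check is that the kernel mixture preserves the first two moments of the particle set, because a^2 + h^2 = 1.

```python
import numpy as np

def lw_kernel_moments(theta, a=0.98):
    """Per-particle kernel means m^(j) and common variance V for the Liu-West mixture (23)."""
    h2 = 1.0 - a * a
    tbar = theta.mean(axis=0)                    # theta is an N x p matrix of parameter particles
    m = a * theta + (1.0 - a) * tbar             # shrunk kernel locations m^(j)
    V = h2 * np.cov(theta, rowvar=False, bias=True)
    return m, V
```

The mixture's covariance is a^2 S + h^2 S = S, where S is the particle covariance, so the approximation avoids the inflation caused by naive artificial evolution noise.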
Example 2: Stochastic volatility model. In its simplest form, asset returns y_t are modeled as
conditionally independent normal random variables with log-variance x_t following a first-order
autoregressive model, i.e. y_t|x_t ~ N(0, exp{x_t}) and x_t|x_{t-1} ~ N(α + βx_{t-1}, τ^2); see Jacquier et al. (1994) and
Kim et al. (1998). One possible version of the LW filter assumes, for example, that θ = (α, β, log τ^2)
and g(x_{t-1}) = α + βx_{t-1}.
Storvik filter

Example 2 (continued). The stochastic volatility model admits recursive sufficient statistics for
θ = (α, β, τ^2) when p(θ) is conditionally conjugate normal-inverse gamma. More precisely, when
(α, β|τ^2) ~ N(b_0, τ^2 B_0) and τ^2 ~ IG(c_0, d_0), it is easy to see that (α, β|τ^2, x^t) ~ N(b_t, τ^2 B_t) and (τ^2|x^t) ~
IG(c_t, d_t), where B_t^{-1} = B_{t-1}^{-1} + z_t z_t', B_t^{-1} b_t = B_{t-1}^{-1} b_{t-1} + x_t z_t, c_t = c_{t-1} + 1/2, d_t = d_{t-1} + (x_t − b_t' z_t)x_t/2 + (b_{t-1} − b_t)'
B_{t-1}^{-1} b_{t-1}/2 and z_t' = (1, x_{t-1}). The recursive sufficient statistics are functions of x_{t-1}, x_{t-1}^2, x_{t-1}x_t and x_t^2.
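The updating rules above can be sketched as one function (ours); a useful check is that the recursive d_t agrees with the standard closed-form normal-inverse-gamma posterior.

```python
import numpy as np

def update_suffstats(Binv, Binv_b, c, d, x_prev, x_t):
    """One-step normal-inverse-gamma update for the regression of x_t on z_t = (1, x_{t-1})'."""
    z = np.array([1.0, x_prev])
    b_old = np.linalg.solve(Binv, Binv_b)
    Binv_new = Binv + np.outer(z, z)             # B_t^{-1} = B_{t-1}^{-1} + z_t z_t'
    Binv_b_new = Binv_b + x_t * z                # B_t^{-1} b_t = B_{t-1}^{-1} b_{t-1} + x_t z_t
    b_new = np.linalg.solve(Binv_new, Binv_b_new)
    c_new = c + 0.5                              # c_t = c_{t-1} + 1/2
    d_new = d + (x_t - b_new @ z) * x_t / 2.0 + (b_old - b_new) @ Binv_b / 2.0
    return Binv_new, Binv_b_new, c_new, d_new
```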
All simulated exercises in Storvik (2002) are based on a blind propagation rule, i.e. q(x_t|x_{t-1}, θ,
y_t) above is equal to p(x_t|x_{t-1}, θ). In this case, resampling is performed with weights w_t ∝ p(y_t|x_t, θ).
Like any other PF with blind propagation, such as the bootstrap filter, this filter is bound to suffer
from particle degeneracy, which in turn directly compromises sequential parameter estimation.
Particle learning
Carvalho et al. (2010) present methods for sequential filtering, particle learning (PL) and smoothing
for rather general state-space models. They extend Chen and Liu’s (2000) mixture Kalman filter
(MKF) methods by allowing parameter learning and utilize a resample–propagate algorithm together
with a particle set that includes state-sufficient statistics. Recall the simulated exercise from the third
section that empirically shows that resample–propagate filters tend to outperform propagate–
resample ones. They also show via several simulation studies that PL outperforms both the LW
and Storvik filters and is comparable to MCMC samplers, even when full adaptation is considered.
The advantage is even more pronounced for large values of T.
Let s_t and s_t^x denote the parameter and state sufficient statistics, satisfying the deterministic updating
rules s_t = S(s_{t-1}, x_t, y_t), as in the Storvik filter from the previous subsection, and s_t^x = K(s_{t-1}^x, θ, y_t), for
K(·) mimicking the Kalman filter recursions (see Example 3 below). Then PL can be described as
follows.
In many cases S will also be a function of x_{t-1} and possibly other lags of the state variable, such
as in the stochastic volatility model of Example 2. In these cases, the above algorithm is slightly
changed and particles for such lagged values are also carried over time. Therefore, step 2 is modified
to sample (x_{t-1}, x_t) from p(x_{t-1}, x_t|s_{t-1}^x, θ, y_t), which implies sampling x_{t-1} from p(x_{t-1}|s_{t-1}^x, θ, y_t) and x_t
from p(x_t|x_{t-1}, θ, y_t).
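PL's resample-propagate order can be illustrated on the local level model with unknown observation variance σ^2 ~ IG(c, d) per particle and known τ^2; this reduced example is ours, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(8)

def pl_step(x, c, d, yt, tau2=1.0):
    """One PL-style step for the local level model: resample, propagate exactly, update suffstats.
    x, c, d are per-particle arrays; sigma^2 ~ IG(c, d) is replenished from its sufficient statistics."""
    N = len(x)
    sig2 = d / rng.gamma(c)                                  # draw sigma^2 ~ IG(c, d)
    w = np.exp(-0.5 * (yt - x) ** 2 / (sig2 + tau2)) / np.sqrt(sig2 + tau2)
    idx = rng.choice(N, size=N, p=w / w.sum())               # 1. resample with p(y_t | x_{t-1}, sigma^2)
    x, c, d, sig2 = x[idx], c[idx], d[idx], sig2[idx]
    om2 = 1.0 / (1.0 / tau2 + 1.0 / sig2)
    x = om2 * (x / tau2 + yt / sig2) + rng.normal(0.0, np.sqrt(om2), N)  # 2. exact propagation
    return x, c + 0.5, d + 0.5 * (yt - x) ** 2               # 3. update sufficient statistics
```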
Example 3: Conditional NDLM. Carvalho et al. (2010) derive the PL scheme for the class of
conditional NDLMs defined by observation and evolution equations that assume the form of a
linear system (see the NDLM equations (1) and (2)) conditional on an auxiliary state λ_t:

y_t = F_{λ_t}' x_t + v_t
x_t = G_{λ_t} x_{t-1} + w_t

with v_t ~ N(0, σ_{λ_t}^2) and w_t ~ N(0, τ_{λ_t}^2), and with the quadruple {F_{λ_t}, G_{λ_t}, σ_{λ_t}^2, τ_{λ_t}^2} being a function
of the static parameter vector θ. The marginal distributions of the observation error and state shock
are any combination of normal, scale mixture of normals, or discrete mixture of normals, depending on the specification of the
distribution of the auxiliary state variable p(λt+1|θ) (Chen and Liu, 2000). Extensions to hidden Markov
specifications where λ_{t+1} evolves according to p(λ_{t+1}|λ_t, θ) are discussed in Carvalho et al. (2010). In
the case where the auxiliary state variable λ_t is discrete, such as in stochastic volatility with jumps
models (Markovian or not), the state x_{t-1} can be analytically integrated out, in addition to x_t and λ_t, at
the initial resampling step, i.e. p(y_t|(λ_{t-1}, s_{t-1}^x, θ)^(i)) = Σ_{λ_t} p(y_t|λ_t, (s_{t-1}^x, θ)^(i)) p(λ_t|(λ_{t-1}, θ)^(i)), where the
conditional sufficient statistics for states (s_t^x) and parameters (s_t) satisfy the deterministic updating rules
s_t = S(s_{t-1}, x_t, λ_t, y_t) and s_t^x = K(s_{t-1}^x, θ, λ_t, y_t), where S(·) denotes, as defined previously, the recursive
update of the parameter sufficient statistics and K(·) denotes the Kalman filter recursions of the
conditional NDLM given in equations (3)-(8). The algorithm can be summarized as follows:
In the case where the auxiliary state variable λt is continuous, the authors extend the above scheme
by adding to the current particle set a propagated particle λt+1 ~ p(λt+1 |(λt, θ)(i)).
Example 4: Comparison between LW, Storvik and PL. We compare the performance of these
particle filters using the local level model of Example 1 with parameter learning, where the random
walk system equation is replaced by a first-order autoregression. More precisely, y_t|x_t, θ ~ N(x_t, σ^2)
and x_t|x_{t-1}, θ ~ N(α + βx_{t-1}, τ^2), where x_0 ~ N(m_0, C_0) and θ = (α, β, τ^2, σ^2). The prior distribution of
θ is p(θ) = p(σ^2)p(τ^2)p(α, β|τ^2), where σ^2 ~ IG(n_0/2, n_0σ_0^2/2), τ^2 ~ IG(ν_0/2, ν_0τ_0^2/2) and (α, β|τ^2) ~ N(b_0,
τ^2 B_0). The recursive sufficient statistics for θ can easily be derived. It can be shown that (α, β)|(τ^2, x^t) ~
N(b_t, τ^2 B_t) and τ^2|x^t ~ IG(ν_t/2, ν_tτ_t^2/2), where ν_t = ν_0 + t, B_t^{-1} = B_0^{-1} + Z_t' Z_t, B_t^{-1} b_t = B_0^{-1} b_0 + Z_t' z_t, and
ν_tτ_t^2 = ν_0τ_0^2 + (z_t − Z_t b_t)'z_t + (b_0 − b_t)' B_0^{-1} b_0, for z_t = (x_1, . . . , x_t)', Z_t = (1_t, Z_{2t}), Z_{2t} = (x_0, . . . , x_{t-1})'
and 1_t a t-dimensional vector of ones. The quantities (ν_t, ν_tτ_t^2, b_t, B_t) can be rewritten recursively as
functions of (ν_{t-1}, ν_{t-1}τ_{t-1}^2, b_{t-1}, B_{t-1}), x_{t-1}, x_t and y_t. A time series of length T = 200 was simulated
using θ = (0.0, 0.9, 0.5, 1.0) and x_0 = 0. The prior hyperparameters are m_0 = 0, C_0 = 10, b_0 = (0.0,
0.9)', B_0 = I_2, n_0 = ν_0 = 10, τ_0^2 = 0.5 and σ_0^2 = 1.0, leading to relatively uninformative prior informa-
tion.
tion. The performance of the filters is assessed by running each algorithm for R = 100 times based
on N = 1000 particles. A very long PL (N = 100,000) is run to serve as a benchmark for comparison.
Let q(γ, α, t) be the 100αth percentile of p(γ|yt), where γ is an element of θ. We define the root mean
squared error as the square root of MSE(γ, α, f, t) = Σt,r[q(γ, α, t) − qfr(γ, α, t)]2/R for filter f in {LW,
STORVIK, PL} and replication r = 1, . . . , R. Finally, a full adaptation is implemented for the three
filters. In other words, LW differs from PL only through the sequential estimation of θ, Storvik
differs from PL only to the extent that Storvik propagates first and then resamples, while PL resa-
mples first and then propagates. Results are summarized in Figures 3 and 4. Both the Storvik filter
and PL are significantly better than the LW filter, while PL is moderately better than Storvik, par-
ticularly when estimating the pair (τ2, σ2).
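A data set like the one in Example 4 can be generated with a short script (Python here, rather than the paper's R; the seed and the stationary-variance check are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
T, alpha, beta, tau2, sigma2 = 200, 0.0, 0.9, 0.5, 1.0   # theta = (0.0, 0.9, 0.5, 1.0)

x = np.empty(T)
y = np.empty(T)
x_prev = 0.0                                             # x_0 = 0
for t in range(T):
    x[t] = rng.normal(alpha + beta * x_prev, np.sqrt(tau2))  # state equation
    y[t] = rng.normal(x[t], np.sqrt(sigma2))                 # observation equation
    x_prev = x[t]

# stationary variance of the AR(1) state: tau2 / (1 - beta^2)
print(round(tau2 / (1 - beta**2), 3))  # → 2.632
```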
Smoothing
In addition to delivering sequential filtering for parameters and states, particle filters can also be
implemented effectively when the main goal is smoothing the states conditional on the whole vector
of observations yT. In this sense, particle smoothers are alternatives to MCMC in state-space models
(Kitagawa, 1996). Godsill et al. (2004) introduced an algorithm that relies on (i) forward particle
sampling and (ii) backward particle resampling. Carvalho et al. (2010) extend the algorithm to
accommodate sequential learning of the parameter vector θ. In this more general case, it can be
shown that
p(x1, . . . , xT, θ|yT) = [∏t=1,…,T−1 p(xt|xt+1, θ, yt)] p(xT, θ|yT)    (27)

p(xt|xt+1, θ, yt) ∝ p(xt+1|xt, θ) p(xt|θ, yt)    (28)
This leads to a backward sampling algorithm that resamples the forward particles xt from p(xt|θ, yt)
with weights proportional to p(xt+1|xt, θ). More precisely, for each particle i = 1, . . . , N, start
with (x̃T, θ̃)(i) = (xT, θ)(i), i.e. a draw from p(xT, θ|yT). Then, for t = T − 1, . . . , 1, sample x̃t(i) from
{xt(j)}Nj=1 with weights πt(j) ∝ p(x̃t+1(i)|xt(j), θ̃(i)). In the end, the trajectories (x̃1, . . . , x̃T)(i), for
i = 1, . . . , N, are draws from p(x1, . . . , xT|yT). Note that the algorithm is O(TN2), so the computational
time to obtain draws from p(x1, . . . , xT|yT) is expected to be much larger than that needed to obtain
draws from p(xt|yt) via standard SMC filters for t = 1, . . . , T. See Briers et al. (2010) for an alternative
O(TN2) SMC smoother for the case where θ is known. An O(TN) smoothing algorithm has recently
been introduced by Fearnhead et al. (2008b), also for the case where θ is known. See also Douc et al.
(2009a) for additional FFBS particle approximations.

182 H. F. Lopes and R. S. Tsay

Figure 3. Comparison between LWF, SF and PL. Percentiles of p(θ|yt) (2.5th, 50th and 97.5th) based on 100
replications of each particle filter with N = 1000 particles (gray lines). Black lines are based on PL and N =
100,000. Liu and West filter (left column), Storvik's filter (center column) and particle learning (right column).
The rows (from top to bottom) represent the components of θ = (α, β, τ2, σ2)

Figure 4. Comparison between LWF, SF and PL. Root mean squared error of R = 100 replications for each
filter. All filters are based on N = 1000 particles and the root mean squared error is computed against a long
PL run (N = 100,000)
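The backward resampling pass described above can be sketched in a few lines. This Python fragment (our own sketch; `backward_smooth` and the toy particle clouds are illustrative assumptions) uses the AR(1) state equation of Example 4 with θ fixed:

```python
import numpy as np

rng = np.random.default_rng(1)

def backward_smooth(filtered, alpha, beta, tau2):
    """Backward pass in the style of Godsill et al. (2004): filtered[t]
    holds N particles from p(x_t | y^t); x_t is resampled with weights
    proportional to p(x_{t+1} | x_t, theta). Returns one smoothed path."""
    T = len(filtered)
    traj = np.empty(T)
    traj[-1] = rng.choice(filtered[-1])            # draw x_T from p(x_T | y^T)
    for t in range(T - 2, -1, -1):
        xt = filtered[t]
        w = np.exp(-0.5 * (traj[t + 1] - alpha - beta * xt) ** 2 / tau2)
        traj[t] = rng.choice(xt, p=w / w.sum())    # backward resampling
    return traj

# toy filtered particle clouds standing in for real SMC output
filtered = [rng.normal(0.0, 1.0, size=200) for _ in range(10)]
traj = backward_smooth(filtered, alpha=0.0, beta=0.9, tau2=0.5)
```

Repeating the backward pass N times yields N smoothed trajectories, which is the source of the O(TN2) cost mentioned above.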
Example 4 (continued): comparison between PL and MCMC. In the case of pure filtering, i.e.
when the parameter vector θ = (α, β, τ2, σ2) is known and fixed, both the filtered and the smoothed
distributions, p(xt|yt, θ) and p(xt|yT, θ), are available in closed form (see equations (5) and (10),
respectively). For example, Figure 5 shows that the results of particle filtering and of a smoothing
approximation based on the OAPF (N = 2000 particles) virtually match the true distributions. Figure
6 shows that both the MCMC approximation (M = 2000 draws, after M0 = 10,000 as burn-in) and the PL
approximation (N = 2000 particles) to p(α|yT), p(β|yT), p(τ2|yT) and p(σ2|yT) are virtually identical.
Computational cost is measured here in CPU seconds with N = 1000 and M0 = M = 1000 points.
FFBS is about one order of magnitude faster than PL for smoothing (34 s versus 233 s), but PL is
approximately three orders of magnitude faster than FFBS for filtering (2 s versus 3500 s).

Figure 5. Particle smoothing. (a) Comparing the true filtered and smoothed distributions, p(xt|yt) and p(xt|yT),
respectively, with approximations based on N = 2000 particles from the OAPF. (b) Comparing the MCMC and
PL approximations to the filtered and smoothed distributions, p(xt|yt) and p(xt|yT). MCMC is based on
M = 2000 draws, after M0 = 10,000 as burn-in, while PL is based on N = 2000 particles
pN(yT) = (1/N) Σi=1,…,N p(yT|(σ2, τ2)(i)) ≈ p(yT) = ∫ p(yT|σ2, τ2)p(σ2, τ2)dσ2dτ2    (29)

pN(yT) = ∏t=1,…,T [(1/N) Σi=1,…,N p(yt|xt−1(i), σ2, τ2)]    (30)
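In practice the estimator in equation (30) is accumulated on the log scale to avoid underflow. The sketch below (ours, in Python; the function name and seed are assumptions) does this for the local level model with a fully adapted filter, where the predictive p(yt|xt−1, σ2, τ2) is N(xt−1, σ2 + τ2):

```python
import numpy as np

rng = np.random.default_rng(7)

def log_marginal_likelihood(y, sigma2, tau2, n_particles=1000):
    """Fully adapted filter estimate of log p(y^T | sigma2, tau2) for
    y_t ~ N(x_t, sigma2), x_t ~ N(x_{t-1}, tau2), accumulating the log of
    the per-period factors (1/N) sum_i p(y_t | x_{t-1}^(i), sigma2, tau2)."""
    x = np.zeros(n_particles)              # particles start at x_0 = 0
    s2 = sigma2 + tau2                     # predictive variance of y_t given x_{t-1}
    logml = 0.0
    for yt in y:
        pred = np.exp(-0.5 * (yt - x) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        logml += np.log(pred.mean())       # log of the equation (30) factor
        x = x[rng.choice(n_particles, n_particles, p=pred / pred.sum())]  # resample
        post_var = 1.0 / (1.0 / tau2 + 1.0 / sigma2)   # fully adapted propagation
        post_mean = post_var * (x / tau2 + yt / sigma2)
        x = rng.normal(post_mean, np.sqrt(post_var))
    return logml

y = rng.normal(0.0, 1.0, size=50)
logml = log_marginal_likelihood(y, sigma2=1.0, tau2=0.1)
```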
APPLICATIONS
In this section we apply particle filters to four time series problems that are of common interest in
many scientific areas. The first application revisits the stochastic volatility model of Example 2. The
second application is concerned with the Markov switching stochastic volatility model of Carvalho
and Lopes (2007). The last two applications illustrate the use of particle filters in modeling realized
volatilities and in estimating unemployment rates via a dynamic generalized linear model.
Figure 6. Parameter learning. MCMC (left column) and PL (right column) approximations to p(θ|yT). Rows
are the components of θ, i.e. α, β, τ2 and σ2. MCMC is based on M = 2000 draws, after M0 = 10,000 as burn-in,
while PL is based on N = 2000 particles
for t = 1, . . . , T, β0 ~ N(m0, C0), φ ~ IG(a0, b0) and W ~ IG(c0, d0). The dynamic beta regression is
a special case of the dynamic generalized linear model (DGLM) of West et al. (1985), where the
observational distribution belongs to the exponential family. The data were downloaded from the
Brazilian Institute for Geography and Statistics (IBGE) site.1
We illustrate the computation of sequential Bayes factors via particle filters by comparing the beta
regression model to a simple local level model, i.e. yt|μt, σ2 ~ N(μt, σ2) and μt|μt−1, τ2 ~ N(μt−1, τ2),
where μ0 ~ N(m0, C0), σ2 ~ IG(a0, b0) and τ2 ~ IG(c0, d0) with given hyperparameters. The prior
hyperparameters were set at m0 = 0.1, C0 = 100, a0 = 2.1, b0 = (a0 + 1) × 0.00001, c0 = 2.1 and
d0 = (c0 + 1) × 0.00001 for the local level model, and at m0 = log(y1/(1 − y1)), C0 = 0.1, a0 = 2.1,
b0 = (a0 + 1) × 15,000, c0 = 2.1 and d0 = (c0 + 1) × 0.05 for the dynamic beta model. More general
dynamics, such as seasonality, could easily be included in both models with only slight modifications
to the models and particle filters. We ignore seasonality here for simplicity.
Sequential inference for the local level model was performed by PL, whereas that for the dynamic beta
regression was performed by the LWF. Results appear in Figure 7, while a Monte Carlo study is presented
in Figures 8 and 9. The estimates of μt under both models are relatively similar, with the sequential
Bayes factor slightly favoring the dynamic beta regression model. The Monte Carlo error associated with
the estimation of parameters and Bayes factors is relatively small. See Carvalho et al. (2009a) for more
details about PL for dynamic generalized linear models and dynamic discrete-choice models.
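Given each model's one-step-ahead predictive likelihoods (the per-period particle averages delivered by any of the filters above), the sequential Bayes factor is just a running product of their ratios; a minimal sketch (ours; the input values are made up for illustration):

```python
import numpy as np

def sequential_bayes_factor(loglike1, loglike2):
    """Sequential Bayes factor of model 1 versus model 2 built from the
    models' one-step-ahead log predictive likelihoods
    log p(y_t | y^{t-1}, M_k): BF_t = exp(cumulative sum of differences)."""
    return np.exp(np.cumsum(np.asarray(loglike1) - np.asarray(loglike2)))

# hypothetical per-period log predictive likelihoods for two models
bf = sequential_bayes_factor([-1.0, -0.9, -1.1], [-1.1, -1.0, -1.0])
print(bf.round(3))  # → [1.105 1.221 1.105]
```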
where tη(μ, σ2) denotes the Student-t distribution with η degrees of freedom, location μ and scale
σ2. The number of degrees of freedom η is treated as known. Sequential posterior inference is based
on the Liu and West filter with N = 100,000 particles. The shrinkage constant a is set at a = 0.95,
while the prior hyperparameters are m0 = 0, C0 = 10, ν0 = 3, τ02 = 0.01, b0 = (0, 1)′ and B0 = 10I2.
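The Liu–West shrinkage step with a = 0.95 replaces each parameter particle θ(i) by a draw from N(aθ(i) + (1 − a)θ̄, (1 − a2)V), where θ̄ and V are the mean and covariance of the particle cloud, so that the cloud's first two moments are preserved. A Python sketch (`lw_jitter` is our name, not the authors'):

```python
import numpy as np

rng = np.random.default_rng(3)

def lw_jitter(theta, a=0.95):
    """Liu-West kernel smoothing: shrink each particle toward the cloud
    mean and add noise with covariance (1 - a^2) * V, preserving the
    mean and covariance of the particle approximation."""
    theta = np.atleast_2d(theta)
    mean = theta.mean(axis=0)
    cov = np.atleast_2d(np.cov(theta, rowvar=False))
    shrunk = a * theta + (1 - a) * mean          # shrinkage toward the mean
    noise = rng.multivariate_normal(np.zeros(theta.shape[1]),
                                    (1 - a**2) * cov, size=theta.shape[0])
    return shrunk + noise

theta = rng.normal([0.0, 1.0], [0.5, 0.2], size=(5000, 2))  # toy parameter cloud
jittered = lw_jitter(theta)
```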
1 [Link]
2 For additional early use of the APF in SV models see Chib et al. (2002, 2006) and Omori et al. (2007).
3 The data are available at [Link]
Figure 7. Dynamic beta regression. (a) Sequential Monte Carlo (SMC) approximations for the median
and 95% credibility interval based on N = 10,000 particles for both models. (b) Sequential Bayes factor.
(c, d) Sequential parameter learning for σ and τ from the local level model. (e, f) Sequential parameter learning
for (1 + φ)−1/2 and W1/2 from the dynamic beta model
Figure 8. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each based on
N = 10,000 particles. Top row: σ and τ from the local level model. Bottom row: (1 + φ)−1/2 and W1/2 from the
dynamic beta model
Particle approximations to the sequential posterior model probabilities, assuming a uniform prior for
η over the models {t∞, t2, . . . , t20}, appear in Figure 10, where t∞ denotes the normal distribution.
Figure 10(d) shows percentiles of p(σt|yt) when integrating out over all competing models in {t∞,
t2, . . . , t20}. One can argue that the data slowly move over time from a more t-like, heavy-tailed model
towards a more Gaussian, thin-tailed model. Figures 11 and 12 present posterior summaries for the
volatilities and parameters of a few competing models.

Figure 9. Dynamic beta regression. A total of 20 replications of the SMC algorithm, each based on
N = 10,000 particles
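Posterior model probabilities such as those in Figure 10 follow directly from the per-model (log) marginal likelihood estimates; a numerically stable sketch (ours; the log marginal likelihood values are made up for illustration):

```python
import numpy as np

def posterior_model_probs(log_marglik, log_prior=None):
    """Posterior model probabilities from per-model log marginal
    likelihoods log p(y^t | M_k), e.g. the particle estimates produced
    by each filter, under a uniform prior unless log_prior is given."""
    lml = np.asarray(log_marglik, dtype=float)
    if log_prior is not None:
        lml = lml + np.asarray(log_prior)
    lml -= lml.max()                 # stabilise before exponentiating
    p = np.exp(lml)
    return p / p.sum()

pmp = posterior_model_probs([-100.0, -101.0, -103.0])
print(pmp.round(3))  # → [0.705 0.259 0.035]
```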
Markov switching stochastic volatility

λt|λt−1, st, θ ~ N(αst + βλt−1, τ2)

where Pr(st = j|st−1 = i) = pij, for i, j = 1, 2, and parameters θ = (α1, α2, β, τ2, p11, p22). Rios and Lopes
(2009) propose an extension of the CL filter, named the extended LW (ELW) filter,
which combines features of the LW filter and PL. The simulation exercise of Figure 13 shows
that the CL filter degenerates after 500 observations, whereas the ELW filter never collapses.
Realized volatility
We consider the intradaily realized volatilities of Alcoa stock from 2 January 2003 to 7 May 2004,
a total of 340 observations. The daily realized volatilities used are the sums of squares of intraday
5 min, 10 min and 20 min log returns measured in percentages; see Tsay (2005, Ch. 11). In what follows,
we use the logarithms of the daily realized volatilities. Figure 14 presents the time series of log
realized volatilities. As expected, all three series behave similarly over time and are highly positively
correlated, with the 5 and 10 min and the 10 and 20 min realized volatilities more correlated than the
5 and 20 min ones. Table I shows summary statistics of the time series.

Figure 10. Stochastic volatility model. Sequential posterior model probability for the number of degrees of
freedom η
Figure 11. Stochastic volatility model. (a) GE returns. (b, c) 2.5th, 50th and 97.5th percentiles of p(σt|yt, M),
where σt2 = exp{xt}, for M = t12 and M = t18, respectively. (d) 2.5th, 50th and 97.5th percentiles of p(σt|yt) by
integrating out over all competing models in {normal, t2, . . . , t20}
In the first model, say M1, the i-minute log realized volatility yit, for i = 5, 10, 20, is
initially modeled by a local level model: (yit|xit, σi2) ~ N(xit, σi2) and (xit|xi,t−1, τi2) ~ N(xi,t−1, τi2). In
the second model, say M2, the univariate local level model is extended to jointly model the
p = 3 time series of realized volatilities: (yt|xt, Σ) ~ N(1pxt, Σ) and (xt|xt−1, τ2) ~ N(xt−1, τ2), where
1p is a p-dimensional vector of ones. The diagonal elements of the covariance matrix Σ are σi2, for
i = 1, . . . , p, and the off-diagonal elements are σij, for i < j = 1, . . . , p.
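Model M2 can be simulated directly; in the Python sketch below the covariance matrix is a hypothetical positive definite example, not an estimate from the Alcoa data:

```python
import numpy as np

rng = np.random.default_rng(11)
T, p = 340, 3
tau2 = 0.05                                  # state evolution variance
Sigma = np.array([[0.20, 0.15, 0.12],        # hypothetical observation covariance
                  [0.15, 0.22, 0.14],
                  [0.12, 0.14, 0.25]])

x = np.empty(T)
y = np.empty((T, p))
x_prev = 0.0
for t in range(T):
    x[t] = rng.normal(x_prev, np.sqrt(tau2))                  # common level x_t
    y[t] = rng.multivariate_normal(np.full(p, x[t]), Sigma)   # y_t = 1_p x_t + e_t
    x_prev = x[t]
```

Because all three series share the common level xt, they are strongly positively correlated, mirroring the behavior of the realized volatility series.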
Parameter learning
The variances σi2 and τi2 under M1 are, a priori, independent with σi2 ~ IG(a0, b0), τi2 ~ IG(c0, d0)
and hyperparameters a0 = c0 = 10, b0 = 1.1 and d0 = 0.55 common across i = 5, 10, 20. In this case
the prior mean and mode of σi2 are 0.122 and 0.1, respectively, while its prior 95% credibility interval
is (0.064, 0.229). Similarly, the prior mean and mode of τi2 are 0.061 and 0.05, respectively, while
its 95% credibility interval is (0.032, 0.115). Under model M2, τ2 ~ IG(c0, d0) and Σ ~ IW(ν0, S0)4.
When p = 1 the inverse-Wishart reduces to σ2 ~ IG(ν0/2, S0/2), so we set ν0 = 2a0 = 20 and
S0 = 2b0Ip = 2.2Ip for comparison with the univariate models. The prior mean and mode of Σ are
0.138Ip and 0.092Ip, respectively. The parameter θ = (τ2, Σ) can be sampled from p(θ|st) =
pIG(τ2; ct, dt)pIW(Σ; νt, St), where the recursive sufficient statistics are ct = ct−1 + 1/2,
dt = dt−1 + (xt − xt−1)2/2, νt = νt−1 + 1 and St = St−1 + (yt − 1pxt)(yt − 1pxt)′.

Figure 12. Stochastic volatility model. Column 1: marginal prior distributions for α, β and τ2. Columns 2–4:
sequential 2.5th, 50th and 97.5th percentiles of p(γ|yt, M), for γ ∈ {α, β, τ2} and model M ∈ {normal, t12, t18}

Figure 14. Realized volatility. Log realized volatility of Alcoa stock based on the sum of squares of intraday
5 min, 10 min and 20 min log returns measured as a percentage

Table I. Summary statistics. Correlations (below main diagonal) and covariances (main diagonal and above)
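The recursions for (ct, dt, νt, St) are one-liners; a Python sketch using the paper's hyperparameters a0 = c0 = 10, d0 = 0.55, ν0 = 20, S0 = 2.2Ip (the state and observation values passed in are arbitrary):

```python
import numpy as np

def update_theta_suffstats(c, d, nu, S, x_prev, x, y):
    """One-step update of the sufficient statistics of theta = (tau2, Sigma):
    c_t = c_{t-1} + 1/2, d_t = d_{t-1} + (x_t - x_{t-1})^2 / 2,
    nu_t = nu_{t-1} + 1, S_t = S_{t-1} + (y_t - 1_p x_t)(y_t - 1_p x_t)'."""
    e = y - x * np.ones_like(y)          # residual y_t - 1_p x_t
    return c + 0.5, d + 0.5 * (x - x_prev) ** 2, nu + 1, S + np.outer(e, e)

c, d, nu, S = update_theta_suffstats(10.0, 0.55, 20, 2.2 * np.eye(3),
                                     x_prev=0.0, x=1.0,
                                     y=np.array([1.5, 0.5, 1.0]))
print(c, nu)  # → 10.5 21
```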
State learning
Let us start by assuming that (xt−1|yt−1, θ) ~ N(mt−1, Ct−1), with x0 ~ N(m0, C0), θ = (Σ, τ2) and
sxt−1 = (mt−1, Ct−1). PL starts by resampling the particles {(θ, sxt−1, st−1)(i)}Ni=1 with weights p(yt|sxt−1, θ)
= pN(yt; 1pmt−1, Qt), where Qt = RtDp + Σ, Dp = 1p1′p and Rt = Ct−1 + τ2. The state sufficient statistics
sxt−1 are then propagated to sxt = (mt, Ct), where mt = (1 − At1p)mt−1 + Atyt, Ct = Rt − AtQtA′t and
At = Rt1′pQt−1. Since both xt−1 and xt appear in dt when sampling τ2 from IG(ct, dt), the pair (xt−1, xt)
needs to be sampled from p(xt−1, xt|yt, θ) = pN(xt−1; g(yt), Vt)pN(xt; h(yt, xt−1), C̄t), where
Vt−1 = Ct−1−1 + 1′pW−11p, g(yt) = Vt(Ct−1−1mt−1 + 1′pW−1yt), C̄t−1 = τ−2 + 1′pΣ−11p and
h(yt, xt−1) = C̄t(τ−2xt−1 + 1′pΣ−1yt), for W = τ2Dp + Σ. Finally, marginal posterior inference for xt
given (yt, θ) is more efficient if xt is drawn from p(xt|yt, sxt−1, θ) = pN(xt; m̃t, C̃t), where
C̃t−1 = Rt−1 + 1′pΣ−11p and m̃t = C̃t(Rt−1mt−1 + 1′pΣ−1yt).
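The deterministic propagation of (mt−1, Ct−1) to (mt, Ct) can be checked numerically: for the common-level model the updated variance must equal (Rt−1 + 1′pΣ−11p)−1. A Python sketch (ours; the numerical inputs are arbitrary):

```python
import numpy as np

def propagate_state_stats(m, C, y, Sigma, tau2):
    """Kalman-type propagation of the state sufficient statistics
    (m_{t-1}, C_{t-1}) -> (m_t, C_t) for y_t = 1_p x_t + e_t:
    R_t = C_{t-1} + tau2, Q_t = R_t 1_p 1_p' + Sigma, A_t = R_t 1_p' Q_t^{-1},
    m_t = (1 - A_t 1_p) m_{t-1} + A_t y_t, C_t = R_t - A_t Q_t A_t'."""
    p = len(y)
    one = np.ones(p)
    R = C + tau2
    Q = R * np.outer(one, one) + Sigma
    A = R * one @ np.linalg.inv(Q)               # 1 x p adaptive gain
    m_new = (1 - A @ one) * m + A @ y
    C_new = R - A @ Q @ A
    return m_new, C_new

m, C = propagate_state_stats(0.0, 1.0, np.array([0.5, 0.4, 0.6]),
                             Sigma=0.2 * np.eye(3), tau2=0.05)
```

Here R = 1.05 and 1′Σ−11 = 15, so C must equal 1/(1/1.05 + 15) ≈ 0.0627, matching the closed-form posterior precision.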
Results
Figures 15–17 summarize the sequential learning of parameters and states for the univariate local
level models based on N = 10,000 particles, which is fairly large considering the sample size T =
340. Figures 18–21 summarize the results for the multivariate local level model, also based on
N = 10,000 particles. Figure 22 compares the sequential posterior medians of the latent states from
the three individual fits of model M1 against the multivariate fit of model M2. Note the shrinkage
effect of model M2, which yields a smoother point estimate of the latent state.
4 Σ is inverse-Wishart with parameters ν0 and S0 and density p(Σ; ν0, S0) ∝ |Σ|−(ν0+p+1)/2 exp{−0.5tr(S0Σ−1)}, for
ν0 > p − 1, S0 > 0 (positive definite) and trΣ = σ12 + . . . + σp2. The mean and the mode of Σ are S0/(ν0 − p − 1) and
S0/(ν0 + p + 1), respectively. Its inverse Σ−1 is Wishart with the same parameters, denoted by Σ−1 ~ W(ν0, S0). The mean
and the mode of Σ−1 are ν0S0−1 and (ν0 − p − 1)S0−1 (ν0 ≥ p + 1), respectively.

Figure 15. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(σi2|yit), p(τi2|yit) and p(xit|yit), where yi
is the i-minute log realized volatility, for i = 5, 10, 20

Figure 16. Realized volatility. Histogram approximations to p(σi2|yT) and p(τi2|yT), for i = 5, 10, 20

Figure 17. Realized volatility. 2.5th, 50th and 97.5th percentiles of p(xit|yit), where yi is the i-minute log
realized volatility, for i = 5, 10, 20

Figure 18. Realized volatility. Sequential 2.5th, 50th and 97.5th percentiles for the unique components of Σ
and for the correlations

Figure 19. Realized volatility. (a) Sequential 2.5th, 50th and 97.5th percentiles of p(τ2|yt). (b) Histogram
approximation to p(τ2|yT). (c) Sequential 2.5th, 50th and 97.5th percentiles of p(xt|yt)

Figure 20. Realized volatility. Approximated posterior distributions of τi2 and σi2. Univariate local level models
(grey lines) and multivariate local level model (black lines), for i = 5, 10, 20 (solid, dashed and dotted lines)

Figure 21. Realized volatility. Sequential learning of correlations. Sequential 2.5th, 50th and 97.5th percentiles
of p(ρij|yt) (left column). Histogram approximation to p(ρij|yT) (right column)

Figure 22. Realized volatility. Sequential 50th percentiles of p(xit|yit, M1), for i = 5, 10, 20, and of p(xt|yt, M2),
where M1 is the univariate local level model and M2 its multivariate version

CONCLUDING REMARKS
In this paper we review particle filters, which are also known as sequential Monte Carlo (SMC)
methods. We argue that, almost two decades after the seminal paper of Gordon et al. (1993),
SMC methods now belong in the toolbox of researchers and practitioners in many areas of modern
science, ranging from signal processing and target tracking to robotics, bioinformatics and financial
econometrics. This paper focuses on the latter, with five demonstrations.
Besides the references on PF in financial econometrics cited above, some additional ones
are Johannes et al. (2008) on predictive regressions and optimal portfolio allocation; Raggi and
Bordignon (2006), Jasra et al. (2008), Li et al. (2008), Creal (2008) and Li (2009) on Lévy-type
SV models; and Johannes et al. (2009) on extracting latent jump diffusions from asset prices.
See also Fearnhead and Meligkotsidou (2004) and Fearnhead et al. (2008a) for particle filters in
partially observed continuous-time models and diffusions. PF for jump Markov systems are studied
by Doucet et al. (2001b) and Andrieu et al. (2003).

ACKNOWLEDGEMENTS
We would like to express our sincere appreciation of the achievements of Professor Rudolf E.
Kalman. His studies have led to many new developments in scientific computing, statistical infer-
ence and applications. Particle filters are one more example that will have a long-lasting impact on
our profession.
REFERENCES
Alspach DL, Sorenson HW. 1972. Nonlinear Bayesian estimation using Gaussian sum approximation. IEEE
Transactions on Automatic Control 17: 439–448.
Andrieu C, Doucet A. 2002. Particle filtering for partially observed Gaussian state space models. Journal of the
Royal Statistical Society, Series B 64: 827–836.
Andrieu C, Davy M, Doucet A. 2003. Efficient particle filtering for jump Markov systems: application to time-
varying autoregressions. IEEE Transactions on Signal Processing 51: 1762–1770.
Andrieu C, Doucet A, Singh SS, Tadić VB. 2004. Particle methods for change detection, system identification,
and control. Proceedings of the IEEE 92: 423–438.
Andrieu C, Doucet A, Tadić VB. 2005. On-line parameter estimation in general state-space models. In Proceed-
ings of the 44th Conference on Decision and Control; 332–337.
Andrieu C, Doucet A, Holenstein R. 2010. Particle Markov chain Monte Carlo (with discussion). Journal of the
Royal Statistical Society, Series B 72: 269–342.
Arulampalam MS, Maskell S, Gordon N, Clapp T. 2002. A tutorial on particle filters for on-line nonlinear/non-
Gaussian Bayesian tracking. IEEE Transactions on Signal Processing 50: 174–188.
Briers M, Doucet A, Maskell S. 2010. Smoothing algorithms for state-space models. Annals of the Institute of
Statistical Mathematics 62: 61–89.
Cappé O, Godsill S, Moulines E. 2007. An overview of existing methods and recent advances in sequential Monte
Carlo. Proceedings of the IEEE 95: 899–924.
Carlin BP, Polson NG, Stoffer DS. 1992. A Monte Carlo approach to nonnormal and nonlinear state-space model-
ing. Journal of the American Statistical Association 87: 493–500.
Carpenter J, Clifford P, Fearnhead P. 1999. An improved particle filter for non-linear problems. IEE Proceedings
on Radar, Sonar and Navigation 146: 2–7.
Carter CK, Kohn R. 1994. On Gibbs sampling for state space models. Biometrika 81: 541–553.
Carvalho CM, Lopes HF. 2007. Simulation-based sequential analysis of Markov switching stochastic volatility
models. Computational Statistics and Data Analysis 51: 4526–4542.
Carvalho CM, Lopes HF, Polson N. 2009a. Particle learning for generalized dynamic conditionally linear models.
Working paper, University of Chicago Booth School of Business.
Carvalho CM, Lopes HF, Polson NG, Taddy M. 2009b. Particle learning for general mixtures. Working paper,
University of Chicago Booth School of Business.
Carvalho CM, Johannes M, Lopes HF, Polson N. 2010. Particle learning and smoothing. Statistical Science (to
appear).
Chen H, Petralia F, Lopes HF. 2009. Sequential Monte Carlo estimation of DSGE models. Working paper,
University of Chicago Booth School of Business.
Chen R, Liu JS. 2000. Mixture Kalman filter. Journal of the Royal Statistical Society, Series B 62: 493–
508.
Chen Y, Lai TL. 2007. Identification and adaptive control of change-point ARX models via Rao–Blackwellized
particle filters. IEEE Transactions on Automatic Control 52: 67–72.
Chen Z. 2003. Bayesian filtering: from Kalman filters to particle filters, and beyond. Working paper, McMaster
University, Canada.
Chib S, Nardari F, Shephard N. 2002. Markov chain Monte Carlo methods for stochastic volatility models.
Journal of Econometrics 108: 281–316.
Chib S, Nardari F, Shephard N. 2006. Analysis of high dimensional multivariate stochastic volatility models.
Journal of Econometrics 134: 341–371.
Chopin N. 2002. A sequential particle filter method for static models. Biometrika 89: 539–552.
Creal D. 2008. Analysis of filtering and smoothing algorithms for Lévy-driven stochastic volatility models.
Computational Statistics and Data Analysis 52: 2863–2876.
Da Silva CQ, Migon HS, Correira LT. 2009. Bayesian beta dynamic model and applications. Working paper,
Department of Statistics, Federal University of Rio de Janeiro.
DeJong DN, Dharmarajan H, Liesenfeld R, Moura GV, Richard J-F. 2009. Efficient likelihood evaluation of state-
space representations. Working paper, Department of Economics, University of Pittsburgh.
Del Moral P, Doucet A, Jasra A. 2006. Sequential Monte Carlo samplers. Journal of the Royal Statistical Society,
Series B 68: 411–436.
Douc R, Garivier E, Moulines E, Olsson J. 2009a. On the forward filtering backward smoothing particle
approximations of the smoothing distribution in general state space models. Working paper, Institut Télécom,
Paris.
Douc R, Moulines E, Olsson J. 2009b. Optimality of the auxiliary particle filter. Probability and Mathematical
Statistics 29: 1–28.
Doucet A, Johansen A. 2008. A note on auxiliary particle filters. Statistics and Probability Letters 78:
1498–1504.
Doucet A, Johansen A. 2010. A tutorial on particle filtering and smoothing: fifteen years later. In Handbook of
Nonlinear Filtering, Crisan D, Rozovsky B (eds). Oxford University Press: Oxford.
Doucet A, Tadić VB. 2003. Parameter estimation in general state-space models using particle methods. Annals of
the Institute of Statistical Mathematics 55: 409–422.
Doucet A, Godsill S, Andrieu C. 2000. On sequential Monte-Carlo sampling methods for Bayesian filtering.
Statistics and Computing 10: 197–208.
Doucet A, De Freitas N, Gordon N. 2001a. Sequential Monte Carlo Methods in Practice. Springer: New
York.
Doucet A, Gordon NJ, Krishnamurthy V. 2001b. Particle filters for state estimation of jump Markov linear systems.
IEEE Transactions on Signal Processing 49: 613–624.
Dukić V, Lopes HF, Polson NG. 2009. Tracking flu epidemics using Google trends and particle learning. Working
paper, University of Chicago Booth School of Business.
Eraker B, Johannes MS, Polson NG. 2003. The impact of jumps in volatility and returns. Journal of Finance 58:
1269–1300.
Fearnhead P. 2002. Markov chain Monte Carlo, sufficient statistics and particle filter. Journal of Computational
and Graphical Statistics 11: 848–862.
Fearnhead P. 2004. Particle filters for mixture models with an unknown number of components. Statistics and
Computing 14: 11–21.
Fearnhead P, Clifford P. 2003. Online inference for hidden Markov models via particle filters. Journal of the Royal
Statistical Society, Series B 65: 887–899.
Fearnhead P, Meligkotsidou L. 2004. Exact filtering for partially observed continuous-time models. Journal of the
Royal Statistical Society, Series B 66: 771–789.
Fearnhead P, Papaspiliopoulos O, Roberts GO. 2008a. Particle filters for partially observed diffusions. Journal of
the Royal Statistical Society, Series B 70: 755–777.
Fearnhead P, Wyncoll D, Tawn J. 2008b. A sequential smoothing algorithm with linear computational cost.
Working paper, Department of Mathematics and Statistics, Lancaster University.
Fernández-Villaverde J, Rubio-Ramírez JF. 2005. Estimating dynamic equilibrium economies: linear versus non-
linear likelihood. Journal of Applied Econometrics 20: 891–910.
Fernández-Villaverde J, Rubio-Ramírez JF. 2007. Estimating macroeconomic models: a likelihood approach.
Review of Economic Studies 74: 1059–1087.
Frühwirth-Schnatter S. 1994. Data augmentation and dynamic linear models. Journal of Time Series Analysis 15:
183–202.
Gamerman D, Lopes HF. 2006. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference.
Chapman & Hall/CRC: Boca Raton, FL.
Gilks WR, Berzuini C. 2001. Following a moving target: Monte Carlo inference for dynamic Bayesian models.
Journal of the Royal Statistical Society, Series B 63: 127–146.
Gordon N, Salmond D, Smith AFM. 1993. Novel approach to nonlinear/non-Gaussian Bayesian state estimation.
IEE Proceedings F: Radar and Signal Processing 140: 107–113.
Godsill SJ, Doucet A, West M. 2004. Monte Carlo smoothing for non-linear time series. Journal of the American
Statistical Association 99: 156–168.
Gramacy R, Polson NG. 2010. Particle learning of Gaussian process models for sequential design and
optimization. Working paper, University of Chicago Booth School of Business.
Guo D, Wang X, Chen R. 2005. New sequential Monte Carlo methods for nonlinear dynamic systems. Statistics
and Computing 15: 135–147.
Han C, Carlin BP. 2000. MCMC methods for computing Bayes factors: a comparative review. Journal of the
American Statistical Association 96: 1122–1132.
Harvey AC. 1989. Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University
Press: Cambridge, UK.
Copyright © 2010 John Wiley & Sons, Ltd. J. Forecast. 30, 168–209 (2011)
DOI: 10.1002/for
Particle Filters and Bayesian Inference 207
Ito K, Xiong K. 2000. Gaussian filters for nonlinear filtering problems. IEEE Transactions on Automatic Control
45: 910–927.
Jacquier E, Polson NG, Rossi PE. 1994. Bayesian analysis of stochastic volatility models. Journal of Business
and Economic Statistics 12: 371–389.
Jasra A, Stephens DA, Doucet A, Tsagaris T. 2008. Inference for Lévy-driven stochastic volatility models via
adaptive sequential Monte Carlo. Working paper, Institute for Statistical Mathematics, Tokyo, Japan.
Jazwinski A. 1970. Stochastic Processes and Filtering Theory. Academic Press: New York.
Johannes M, Polson NG. 2009. Particle filtering. In Handbook of Financial Time Series, Andersen TG, Davis RA,
Kreiss J-P, Mikosch T (eds). Springer: Berlin; 1115–1130.
Johannes M, Korteweg A, Polson NG. 2008. Sequential learning, predictive regressions, and optimal portfolio
returns. Working paper, Graduate School of Business, Stanford University.
Johannes MS, Polson NG, Stroud JR. 2009. Optimal filtering of jump diffusions: extracting latent states from asset
prices. Review of Financial Studies 22: 2559–2599.
Julier SJ, Uhlmann JK. 1997. A new extension of the Kalman filter to nonlinear systems. In Proceedings of
AeroSense: 11th International Symposium on Aerospace/Defense Sensing, Simulation and Controls, no. 3068;
182–193.
Kass RE, Raftery A. 1995. Bayes factors. Journal of the American Statistical Association 90: 773–795.
Kim S, Shephard N, Chib S. 1998. Stochastic volatility: likelihood inference and comparison with ARCH models.
Review of Economic Studies 65: 361–393.
Kitagawa G. 1996. Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. Journal of
Computational and Graphical Statistics 5: 1–25.
Kong A, Liu JS, Wong W. 1994. Sequential imputations and Bayesian missing data problems. Journal of the
American Statistical Association 89: 278–288.
Li H. 2009. Sequential Bayesian analysis of time-changed infinite activity derivatives pricing models. Working
paper, ESSEC Business School, Paris/Singapore.
Li H, Wells MT, Yu CL. 2008. A Bayesian analysis of return dynamics with Lévy jumps. Review of Financial
Studies 21: 2345–2378.
Liu JS. 2001. Monte Carlo Strategies in Scientific Computing. Springer: New York.
Liu J, Chen R. 1995. Blind deconvolution via sequential imputations. Journal of the American Statistical
Association 90: 567–576.
Liu J, Chen R. 1998. Sequential Monte Carlo methods for dynamic systems. Journal of the American Statistical
Association 93: 1032–1044.
Liu J, West M. 2001. Combined parameter and state estimation in simulation-based filtering. In Sequential
Monte Carlo Methods in Practice, Doucet A, de Freitas N, Gordon N (eds). Springer: New York; 197–
223.
Lopes HF, Polson NG. 2010. Extracting S&P500 and NASDAQ volatility: the credit crisis of 2007–2008. In The
Oxford Handbook of Applied Bayesian Analysis, O’Hagan A, West M (eds). Oxford University Press: Oxford;
319–342.
Lopes HF, West M. 2004. Bayesian model assessment in factor analysis. Statistica Sinica 14: 41–67.
Lopes HF, Carvalho CM, Johannes M, Polson NG. 2010. Particle learning for sequential Bayesian computation.
In Bayesian Statistics 9, Bernardo JM, Bayarri MJ, Berger JO, Dawid AP, Heckerman D, Smith AFM, West
M (eds). Oxford University Press: Oxford (to appear).
Lund B, Lopes HF. 2009. Learning in a regime switching macro-finance model for the term structure. Working
paper, University of Chicago Booth School of Business.
MacEachern SN, Clyde MA, Liu JS. 1999. Sequential importance sampling for nonparametric Bayes models: the
next generation. Canadian Journal of Statistics 27: 251–267.
Mukherjee C, West M. 2009. Sequential Monte Carlo in model comparison: example in cellular dynamics in
systems biology. Working paper, Department of Statistical Science, Duke University.
Olsson J, Cappé O, Douc R, Moulines E. 2008. Sequential Monte Carlo smoothing with application to parameter
estimation in non-linear state space models. Bernoulli 14: 155–179.
Omori Y, Chib S, Shephard N, Nakajima J. 2007. Stochastic volatility with leverage: fast and efficient
likelihood inference. Journal of Econometrics 140: 425–449.
Petris G, Petrone S, Campagnoli P. 2009. Dynamic Linear Models with R. Springer: New York.
Pitt MK, Shephard N. 1999. Filtering via simulation: auxiliary particle filters. Journal of the American Statistical
Association 94: 590–599.
Polson NG, Stroud JR, Müller P. 2008. Practical filtering with sequential parameter learning. Journal of the Royal
Statistical Society, Series B 70: 413–428.
Poyiadjis G, Doucet A, Singh SS. 2005. Particle methods for optimal filter derivative: application to parameter
estimation. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing,
Vol. 5, v/925–v/928.
Prado R, Lopes HF. 2009. Sequential parameter learning and filtering in structured AR models. Working paper,
University of Chicago Booth School of Business.
Prado R, West M. 2010. Time Series: Modelling, Computation and Inference. Chapman & Hall/CRC Press: Boca
Raton, FL.
Raggi D, Bordignon S. 2006. Sequential Monte Carlo methods for stochastic volatility models with jumps. Working
paper, Department of Economics, University of Bologna.
Reis EA, Salazar E, Gamerman D. 2006. Comparison of sampling schemes for dynamic linear models.
International Statistical Review 74: 203–214.
Rios MP, Lopes HF. 2009. Sequential parameter estimation in stochastic volatility models. Working paper,
University of Chicago Booth School of Business.
Ristic B, Arulampalam S, Gordon N. 2004. Beyond the Kalman Filter: Particle Filters for Tracking Applications.
Artech House Radar Library: Norwood, MA.
Shephard N. 1994. Partial non-Gaussian state space. Biometrika 81: 115–131.
Shi M, Dunson DB. 2009. Bayesian variable selection via particle stochastic search. Working paper, Department
of Statistical Science, Duke University.
Smith AFM, Gelfand AE. 1992. Bayesian statistics without tears: a sampling–resampling perspective. American
Statistician 46: 84–88.
So MKP, Lam K, Li WK. 1998. A stochastic volatility model with Markov switching. Journal of Business and
Economic Statistics 16: 244–253.
Storvik G. 2002. Particle filters for state-space models with the presence of unknown static parameters. IEEE
Transactions on Signal Processing 50: 281–289.
Taddy M, Gramacy R, Polson NG. 2010. Dynamic trees for learning and design. Working paper, University of
Chicago Booth School of Business.
Tsay RS. 2005. Analysis of Financial Time Series (2nd edn). Wiley: New York.
Van der Merwe R, Doucet A, De Freitas N, Wan E. 2000. The unscented particle filter. In Advances in Neural
Information Processing Systems, Vol. 13. Leen TK, Dietterich TG, Tresp V (eds). MIT Press: Cambridge, MA;
584–590.
West M. 1992. Modelling with mixtures. In Bayesian Statistics 4, Bernardo JM, Berger JO, Dawid AP, Smith
AFM (eds). Oxford University Press: Oxford; 503–524.
West M. 1993a. Approximating posterior distributions by mixtures. Journal of the Royal Statistical Society, Series
B 54: 553–568.
West M. 1993b. Mixture models, Monte Carlo, Bayesian updating and dynamic models. Computing Science and
Statistics 24: 325–333.
West M, Harrison J. 1997. Bayesian Forecasting and Dynamic Models (2nd edn). Springer: New York.
West M, Harrison J, Migon H. 1985. Dynamic generalized linear models and Bayesian forecasting. Journal of the
American Statistical Association 80: 73–83.
Authors’ biographies:
Hedibert F. Lopes is Associate Professor of Econometrics and Statistics, Booth School of Business, University
of Chicago. Recent publications include work on Bayesian inference; dynamic models; Markov Chain Monte
Carlo and sequential Monte Carlo methods; modeling time-varying covariance through latent factor analysis,
Cholesky decompositions and other factorizations. Areas of application include economics, finance, biology and
natural and social sciences.
Ruey S. Tsay is the H. G. B. Alexander Professor of Econometrics and Statistics at Chicago Booth. He earned
his Ph.D. in statistics from the University of Wisconsin-Madison in 1982 and joined Chicago Booth in 1989. He
is a fellow of the American Statistical Association, the Institute of Mathematical Statistics, the Royal Statistical
Society, and Academia Sinica. His research focuses on business and economic forecasting, high-dimensional data
analysis, risk management, and statistical inference. He has published extensively in leading statistical,
econometric, and finance journals.
Authors’ addresses:
Hedibert F. Lopes and Ruey S. Tsay, University of Chicago Booth School of Business, 5807 South Woodlawn
Avenue, Chicago, IL 60637, USA.