Linear Models with Structural Changes
The Econometric Society is collaborating with JSTOR to digitize, preserve and extend access to
Econometrica
This paper considers issues related to multiple structural changes, occurring at un-
known dates, in the linear regression model estimated by least squares. The main aspects
are the properties of the estimators, including the estimates of the break dates, and the
construction of tests that allow inference to be made about the presence of structural
change and the number of breaks. We consider the general case of a partial structural
change model where not all parameters are subject to shifts. We study both fixed and
shrinking magnitudes of shifts and obtain the rates of convergence for the estimated
break fractions. We also propose a procedure that allows one to test the null hypothesis
of, say, $l$ changes, versus the alternative hypothesis of $l + 1$ changes. This is particularly
useful in that it allows a specific to general modeling strategy to consistently determine
the appropriate number of changes present. An estimation strategy for which the location
of the breaks need not be simultaneously determined is discussed. Instead, our method
successively estimates each break point.
1. INTRODUCTION
THIS PAPER CONSIDERS ISSUES related to multiple structural changes in the linear
regression model estimated by minimizing the sum of squared residuals.
Throughout, we treat the dates of the breaks as unknown variables to be
estimated. The main aspects considered are the properties of the estimators,
including the estimates of the break dates, and the construction of tests that
allow inference to be made about the presence of structural change and the
number of breaks.
Both the statistics and econometrics literatures contain a vast amount of work
on issues related to structural change, most of it specifically designed for the
case of a single change.2 The econometric literature has witnessed recently an
1 This paper has benefited from the comments of seminar participants at Harvard/MIT,
Northwestern University, McGill University, the University of Copenhagen, the University of Illinois at
Urbana-Champaign, the Université de Montréal, the University of São Paulo, PUC at Rio de
Janeiro, Hitotsubashi University, the Université de Lausanne, the CREST-INSEE, the European
University Institute, the University of York, the University of Pennsylvania, the 1995 Winter
Meeting of the Econometric Society, the 1995 Joint Meeting of the Institute of Mathematical
Statistics and the Canadian Statistical Association, and the Symposium on Nonlinear Dynamics and
Econometrics, Boston. Financial support is acknowledged from the National Science Foundation
under Grant SBR-9414083, the Social Sciences and Humanities Research Council of Canada, the
Natural Sciences and Engineering Research Council of Canada, and the Fonds pour la Formation
de Chercheurs et l'Aide à la Recherche du Québec. Finally, we thank three anonymous referees for
their valuable comments.
2 See the surveys of Zacks (1983), Krishnaiah and Miao (1988), and Bhattacharya (1994).
the model). Section 4 proposes test statistics, derives their asymptotic distribu-
tions, and presents critical values. Section 5 discusses sequential methods to
estimate the model without treating all break points simultaneously. All proofs
are collected in an appendix.
residuals $\sum_{i=1}^{m+1}\sum_{t=T_{i-1}+1}^{T_i}[y_t - x_t'\beta - z_t'\delta_i]^2$. Let $\hat\beta(\{T_j\})$ and $\hat\delta(\{T_j\})$ denote the
resulting estimates. Substituting them in the objective function and denoting the
resulting sum of squared residuals as $S_T(T_1, \ldots, T_m)$, the estimated break points
$(\hat T_1, \ldots, \hat T_m)$ are such that
$(\hat T_1, \ldots, \hat T_m) = \arg\min_{T_1, \ldots, T_m} S_T(T_1, \ldots, T_m)$.
"$\to_p$" denote convergence in probability, "$\to_d$" convergence in distribution,
and "$\Rightarrow$" weak convergence in the space $D[0, 1]$ under the Skorohod metric (e.g.,
Pollard (1984)).
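As a rough illustration of the global minimization of the sum of squared residuals described above, the following Python sketch treats only the simplest special case: a pure structural change model whose single regressor is a constant, so that each regime has its own mean. The function names, the minimum segment length `h`, and the dynamic-programming recursion are illustrative assumptions and are not taken from the paper.

```python
def segment_ssr_table(y):
    """Return ssr(i, j): sum of squared residuals of y[i:j] around its own mean."""
    s1, s2 = [0.0], [0.0]
    for v in y:
        s1.append(s1[-1] + v)        # running sum
        s2.append(s2[-1] + v * v)    # running sum of squares

    def ssr(i, j):
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    return ssr


def estimate_breaks(y, m, h):
    """Break dates minimizing the total SSR over all partitions with m breaks.

    Assumption of this sketch: regimes differ only in their mean, and every
    segment must contain at least h observations. A break date is the number
    of observations in the regimes that precede the break.
    """
    T = len(y)
    ssr = segment_ssr_table(y)
    inf = float("inf")
    # best[k][t]: minimal SSR of y[0:t] split into k + 1 segments
    best = [[inf] * (T + 1) for _ in range(m + 1)]
    arg = [[0] * (T + 1) for _ in range(m + 1)]
    for t in range(h, T + 1):
        best[0][t] = ssr(0, t)
    for k in range(1, m + 1):
        for t in range((k + 1) * h, T + 1):
            for s in range(k * h, t - h + 1):
                cost = best[k - 1][s] + ssr(s, t)
                if cost < best[k][t]:
                    best[k][t] = cost
                    arg[k][t] = s
    # Backtrack from the full sample to recover the optimal partition.
    breaks, t = [], T
    for k in range(m, 0, -1):
        t = arg[k][t]
        breaks.append(t)
    return breaks[::-1], best[m][T]


# Noise-free toy series with regime means 0, 5, -3 and breaks after t = 40 and t = 70.
y = [0.0] * 40 + [5.0] * 30 + [-3.0] * 50
breaks, total_ssr = estimate_breaks(y, m=2, h=5)
```

On this noise-free toy series the minimizer recovers the two break dates exactly and the minimized sum of squared residuals is zero; with noisy data the same recursion returns the least-squares partition.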
ASSUMPTION A1: Let $w_t = (x_t', z_t')'$, $W = (w_1, \ldots, w_T)'$, and $\bar W^0$ be the diagonal
partition of $W$ at $(T_1^0, \ldots, T_m^0)$ such that $\bar W^0 = \mathrm{diag}(W_1^0, \ldots, W_{m+1}^0)$. We assume for
each $i = 1, \ldots, m + 1$, with $T_0^0 = 1$ and $T_{m+1}^0 = T$, that $W_i^{0\prime}W_i^0/(T_i^0 - T_{i-1}^0)$
converges in probability to some nonrandom positive definite matrix not necessarily the
same for all $i$.
ASSUMPTION A2: There exists an $l_0 > 0$ such that for all $l > l_0$, the minimum
eigenvalues of $A_{il} = (1/l)\sum_{t=T_i^0+1}^{T_i^0+l} w_t w_t'$ and of $A_{il}^* = (1/l)\sum_{t=T_i^0-l}^{T_i^0} w_t w_t'$ are bounded
away from zero ($i = 1, \ldots, m + 1$).
The sequence of errors $\{u_t\}$ satisfies one of the following two sets of conditions:
Or:
3 Note that, for the proof of consistency, A3 could be dispensed with using generalized inverses.
3.1. Consistency
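We outline the main steps of the proof using a few lemmas that are proved in
the appendix. Denote by $\hat u_t$ the estimated residuals and by $d_t$ the difference
between the fitted and true values. That is, $\hat u_t = y_t - x_t'\hat\beta - z_t'\hat\delta_k$, for $t \in [\hat T_{k-1} + 1, \hat T_k]$,
and $d_t = x_t'(\hat\beta - \beta^0) + z_t'(\hat\delta_k - \delta_j^0)$, for $t \in [\hat T_{k-1} + 1, \hat T_k] \cap [T_{j-1}^0 + 1, T_j^0]$
($k, j = 1, \ldots, m + 1$). Note that, in general, $d_t$ is defined over $(m + 1)^2$ different
segments for each of the possible m-partitions $\{\hat T_i\}$ and $\{T_i^0\}$. Using properties of
projections,
(4) $\frac{1}{T}\sum_{t=1}^T \hat u_t^2 \le \frac{1}{T}\sum_{t=1}^T u_t^2$
and using $\hat u_t = u_t - d_t$,
(5) $\frac{1}{T}\sum_{t=1}^T \hat u_t^2 = \frac{1}{T}\sum_{t=1}^T u_t^2 + \frac{1}{T}\sum_{t=1}^T d_t^2 - \frac{2}{T}\sum_{t=1}^T u_t d_t$.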
The proof of Proposition 1 simply uses relations (4) and (5) and the associated
limit of $T^{-1}\sum_{t=1}^T u_t d_t$. We start with the latter.
Lemma 1 together with (4) and (5) implies that $T^{-1}\sum_{t=1}^T d_t^2 \to_p 0$. The proof
follows by showing that this implies $\hat\lambda \to_p \lambda^0$. More specifically, $T^{-1}\sum_{t=1}^T d_t^2 \to_p 0$
cannot hold if $\hat\lambda_j \not\to_p \lambda_j^0$ for some $j$. This is stated in the following lemma.
We are now in a position to prove Proposition 1. Using (5) and Lemmas 1
and 2, and under the supposition that some break date is not consistently
estimated, we have the inequality
T T
PROPOSITION 2: Under A1-A5, for every $\eta > 0$, there exists a $C < \infty$, such that
for all large $T$, $P(|T(\hat\lambda_k - \lambda_k^0)| > C) < \eta$ ($k = 1, \ldots, m$).
Note that when the errors are serially uncorrelated and homoskedastic we
have $\Phi = \sigma^2 V$ and the asymptotic covariance matrix reduces to $\sigma^2 V^{-1}$, which
can be consistently estimated using a consistent estimate of $\sigma^2$. When serial
correlation and/or heteroskedasticity is present, a consistent estimate of $\Phi$ can
be constructed along the lines of Andrews (1991), assuming identical distributions
across segments or allowing the distributions of both the regressors and
the errors to differ.
Note first that, as in the single break case, the usual limiting distribution of
the break dates obtained specifying fixed magnitude of changes depends on the
exact distribution of the pair {zt, ut}. On the other hand, a strategy that permits
obtaining pivotal statistics is to consider an asymptotic framework where the
magnitudes of the shifts converge to zero as the sample size increases. Even
though the setup is particularly well suited to provide an adequate approxima-
tion to the exact distribution when the shifts are small, it remains adequate even
for moderate shifts. The required conditions are stated in the next assumptions
defined for i = 1, ... , m.
Note that for a smaller magnitude of shift (small $v_T$), which corresponds to a
smaller $\vartheta$, A6 requires the existence of a higher moment of $u_t$. When $v_T$ is
a fixed constant, we can choose $\vartheta$ arbitrarily close to 1/2. In this case, the
requirement of $E|u_t|^{2/\vartheta} < M$ reduces to the existence of $4 + \delta$ moments
stated in A4.
Bai (1994a, 1994b) for the case of a single break. Proposition 4 allows us to study
the limiting distribution of the estimated break dates. It asserts that we can
restrict the analysis to a "neighborhood" of length $C/v_T^2$ around the true break
dates $T_i^0$, which makes possible the application of a central limit theorem since
this "neighborhood" increases when $v_T$ decreases. With the mixing assumptions
on the errors, each segment is asymptotically distinct and the analysis of the
limiting distribution of the break dates is similar to that in the single break case
as analyzed in Bai (1994a, 1994b). We provide, in the rest of this section, a
description of the results when the data are not trending and under the
assumption that the following conditions are satisfied.
and
Ti0 ?+[s ATi''] Tj 1?+[SATi ]
where $B_i(s)$ is a multivariate Gaussian process on $[0, 1]$ with mean zero and
covariance $E[B_i(s)B_i(u)'] = \min\{s, u\}\Omega_i$.
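Now, define for $i = 1, \ldots, m$: $\xi_i = \Delta_i'Q_{i+1}\Delta_i/\Delta_i'Q_i\Delta_i$, $\phi_{i,1}^2 = \Delta_i'\Omega_i\Delta_i/\Delta_i'Q_i\Delta_i$,
$\phi_{i,2}^2 = \Delta_i'\Omega_{i+1}\Delta_i/\Delta_i'Q_{i+1}\Delta_i$, and let $W_1^{(i)}(s)$ and $W_2^{(i)}(s)$ be independent Wiener
processes defined on $[0, \infty)$, starting at 0 when $s = 0$. These processes are also
independent across $i$. Also, define $Z^{(i)}(s) = \phi_{i,1}W_1^{(i)}(-s) - |s|/2$, for $s \le 0$, and
$Z^{(i)}(s) = \sqrt{\xi_i}\,\phi_{i,2}W_2^{(i)}(s) - \xi_i|s|/2$, for $s > 0$. We can state the following result.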
The limiting distribution is the same as that occurring in a single break model.
The density function of $\arg\max_s Z^{(i)}(s)$ is derived in Bai (1994b) and is nonsymmetric.
When the limits $Q_i$, $\Omega_i$, and $\sigma_i^2$ are the same for adjacent $i$'s, $\xi_i = 1$ and
$\phi_{i,1} = \phi_{i,2} \equiv \phi_i$, in which case the limiting distribution reduces to:
which is symmetric about the origin and has distribution function (see Yao
(1987)):
$H(x) = 1 + (2\pi)^{-1/2}\sqrt{x}\,e^{-x/8} - \tfrac{1}{2}(x + 5)\Phi(-\sqrt{x}/2) + \tfrac{3}{2}e^{x}\Phi(-3\sqrt{x}/2)$,
for $x > 0$ and $H(x) = 1 - H(-x)$, with $\Phi(x)$ the distribution function of a
standard normal variable. For instance the 95% and 97.5% quantiles are 7.7 and
11.0.
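The distribution function above is easy to evaluate numerically. The sketch below is only an illustration: it uses the closed form attributed to Yao (1987) for $x > 0$, the symmetry $H(x) = 1 - H(-x)$, and a plain bisection to invert $H$. The function names and the bisection bracket are assumptions of this sketch, not part of the paper.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function, via the complementary error function."""
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def H(x):
    """Distribution function of the symmetric limit case (closed form for x >= 0)."""
    if x < 0:
        return 1.0 - H(-x)
    r = math.sqrt(x)
    return (1.0
            + r * math.exp(-x / 8.0) / math.sqrt(2.0 * math.pi)
            - 0.5 * (x + 5.0) * std_normal_cdf(-r / 2.0)
            + 1.5 * math.exp(x) * std_normal_cdf(-1.5 * r))


def H_quantile(p, lo=0.0, hi=100.0):
    """Invert H by bisection; meant for upper-tail levels 0.5 <= p < H(hi)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if H(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q95 = H_quantile(0.95)
q975 = H_quantile(0.975)
```

Evaluating `H` at 7.7 and 11.0 gives values close to 0.95 and 0.975, in line with the quantiles quoted in the text. The upper bracket of 100 is deliberate: the factor `math.exp(x)` would overflow for very large arguments.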
The results discussed above allow easy construction of confidence intervals
for the break dates. All that is needed is to construct consistent estimates of the
various parameters: $T^{-1}\sum_{t=1}^T z_t z_t'$ for $Q$, $T^{-1}\sum_{t=1}^T \hat u_t^2$ for $\sigma^2$, and $\hat\delta_{i+1} - \hat\delta_i$ for
$v_T\Delta_i$. When serial correlation is present, $\Omega$ can be estimated using a kernel-based
method as discussed in Andrews (1991). Note that when the segments are
not homogeneous, obtaining consistent estimates is still possible using data over
the relevant subsamples only.
The limiting distribution in the case of trending regressors is discussed in Bai
(1994b, 1995) for a single structural change model. His results remain valid for
multiple breaks. We omit the details and refer the reader to those papers.
general case where $\mathrm{plim}\, T^{-1}\sum_{t=1}^{[Ts]} w_t w_t' = Q(s)$, which allows trending regressors,
are beyond the scope of the present paper.
ASSUMPTION A9: The errors $\{u_t\}$ form an array of martingale differences relative
to $\{\mathcal{F}_t\} = \sigma$-field $\{\ldots, w_{t-1}, w_t, \ldots, u_{t-2}, u_{t-1}\}$. Also, $E[u_t^2] = \sigma^2$ for all $t$ and
$T^{-1/2}\sum_{t=1}^{[Tr]} w_t u_t \Rightarrow \sigma Q^{1/2}W^*(r)$, with $W^*(r)$ a $(p + q)$ vector of independent
Wiener processes.
The case where {ut} satisfies the general conditions stated in Assumption A4
is discussed in Section 4.4 below. We show how the results remain valid provided
appropriate modifications are made to account for the effect of serial correla-
tion on the asymptotic distributions. The following proposition is proved in the
appendix.
Note that the asymptotic distribution of the test statistic depends on the value
of $\epsilon$ in $\Lambda_\epsilon$. As $\epsilon$ converges to zero, the critical values of the limiting random
variable of $\sup F_T(k; q)$ diverge to infinity. Because the computed test statistic
for a given sample is finite, a small positive value of $\epsilon$ can improve the power
significantly; see Andrews (1993) for further details. In what follows, we have
adopted $\epsilon = 0.05$. No critical values for $k \ge 2$ are available except those of
Garcia and Perron (1996) for $k = 2$ and $q = 1$.
Asymptotic critical values are obtained via simulations. The Wiener process
$W_q(\lambda)$ is approximated by the partial sums $n^{-1/2}\sum_{i=1}^{[n\lambda]} e_i$ with $e_i$ i.i.d. $N(0, I_q)$
and $n = 1,000$. The number of replications is 10,000. For each replication, the
supremum of $F(\lambda_1, \ldots, \lambda_k; q)$ with respect to $(\lambda_1, \ldots, \lambda_k)$ over the set $\Lambda_\epsilon$ is
obtained via a dynamic programming algorithm. We present, in Table I, critical
values covering cases with up to 9 breaks ($k = 1, \ldots, 9$) and up to 10 regressors
($q = 1, \ldots, 10$) whose coefficients are the object of the test. The values reported
are scaled up by $q$ for comparison purposes. The column corresponding to $k = 1$
can also be found in Andrews (1993). Because $\sup F_T(1; q) \le 2\sup F_T(2; q) \le \cdots \le
k\sup F_T(k; q)$, the consistency of the $\sup F_T(k; q)$ ($k \ge 2$) follows from Andrews
(1993) who proved the consistency of $\sup F_T(1; q)$ for various alternatives including
multiple breaks.
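The simulation design just described can be sketched for the simplest case, $k = 1$ and $q = 1$. The functional used below, the supremum over $\lambda \in [\epsilon, 1 - \epsilon]$ of $(\lambda W(1) - W(\lambda))^2/(\lambda(1 - \lambda))$, is the familiar single-break form associated with Andrews (1993); taking it as the limit of $\sup F_T(1; 1)$ is an assumption of this sketch. The sample size and number of replications are kept much smaller than in the text so the code runs quickly, and all names are illustrative.

```python
import random


def sup_f_draw(n, eps, rng):
    """One draw of sup over lambda of (lambda*W(1) - W(lambda))^2 / (lambda*(1 - lambda)).

    W is approximated by scaled partial sums of n i.i.d. N(0, 1) variables and
    lambda runs over the grid j/n restricted to [eps, 1 - eps].
    """
    steps = [rng.gauss(0.0, 1.0) for _ in range(n)]
    total = sum(steps)
    lo = max(1, int(round(eps * n)))   # first admissible grid index
    hi = n - lo                        # last admissible grid index
    partial = 0.0
    best = 0.0
    for j in range(1, hi + 1):
        partial += steps[j - 1]
        if j < lo:
            continue
        lam = j / n
        bridge = (lam * total - partial) / n ** 0.5
        best = max(best, bridge * bridge / (lam * (1.0 - lam)))
    return best


def simulated_quantiles(levels, n=200, reps=1000, eps=0.05, seed=0):
    """Empirical quantiles of the simulated limit variable."""
    rng = random.Random(seed)
    draws = sorted(sup_f_draw(n, eps, rng) for _ in range(reps))
    return [draws[min(reps - 1, int(p * reps))] for p in levels]


q90, q95 = simulated_quantiles([0.90, 0.95])
```

With such a coarse grid and so few replications the quantiles are only rough approximations to the $k = 1$, $q = 1$ entries of Table I; raising `n` and `reps` toward the values used in the text should bring them closer, at a higher computational cost.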
TABLE I
Number of Breaks, k
q α 1 2 3 4 5 6 7 8 9 UDmax WDmax
1 .90 8.02 7.87 7.07 6.61 6.14 5.74 5.40 5.09 4.81 8.78 9.14
.95 9.63 8.78 7.85 7.21 6.69 6.23 5.86 5.51 5.20 10.17 10.91
.975 11.17 9.81 8.52 7.79 7.22 6.70 6.27 5.92 5.56 11.52 12.53
.99 13.58 10.95 9.37 8.50 7.85 7.21 6.75 6.33 5.98 13.74 15.02
2 .90 11.02 10.48 9.61 8.99 8.50 8.06 7.66 7.32 7.01 11.69 12.33
.95 12.89 11.60 10.46 9.71 9.12 8.65 8.19 7.79 7.46 13.27 14.19
.975 14.53 12.64 11.20 10.29 9.69 9.10 8.64 8.18 7.80 14.69 16.04
.99 16.64 13.78 12.06 11.00 10.28 9.65 9.11 8.66 8.22 16.79 18.11
3 .90 13.43 12.73 11.76 11.04 10.49 10.02 9.59 9.21 8.86 14.05 14.76
.95 15.37 13.84 12.64 11.83 11.15 10.61 10.14 9.71 9.32 15.80 16.82
.975 17.17 14.91 13.44 12.49 11.75 11.13 10.62 10.14 9.72 17.36 18.79
.99 19.25 16.27 14.48 13.40 12.56 11.80 11.22 10.67 10.19 19.38 20.81
4 .90 15.53 14.65 13.63 12.91 12.33 11.79 11.34 10.93 10.55 16.17 16.95
.95 17.60 15.84 14.63 13.71 12.99 12.42 11.91 11.49 11.04 17.88 19.07
.975 19.35 16.85 15.44 14.43 13.64 13.01 12.46 11.94 11.49 19.51 20.89
.99 21.20 18.21 16.43 15.21 14.45 13.70 13.04 12.48 12.02 21.25 22.81
5 .90 17.42 16.45 15.44 14.69 14.05 13.51 13.02 12.59 12.18 17.94 18.85
.95 19.50 17.60 16.40 15.52 14.79 14.19 13.63 13.16 12.70 19.74 20.95
.975 21.47 18.75 17.26 16.13 15.40 14.75 14.19 13.66 13.17 21.57 23.04
.99 23.99 20.18 18.19 17.09 16.14 15.34 14.81 14.26 13.72 24.00 25.46
6 .90 19.38 18.15 17.17 16.39 15.74 15.18 14.63 14.18 13.74 19.92 20.89
.95 21.59 19.61 18.23 17.27 16.50 15.86 15.29 14.77 14.30 21.90 23.27
.975 23.73 20.80 19.15 18.07 17.21 16.49 15.84 15.29 14.78 23.83 25.22
.99 25.95 22.18 20.29 18.93 17.97 17.20 16.54 15.94 15.35 26.07 27.63
7 .90 21.23 19.93 18.75 17.98 17.28 16.69 16.16 15.69 15.24 21.79 22.81
.95 23.50 21.30 19.83 18.91 18.10 17.43 16.83 16.28 15.79 23.77 25.02
.975 25.23 22.54 20.85 19.68 18.79 18.03 17.38 16.79 16.31 25.46 26.92
.99 28.01 24.07 21.89 20.68 19.68 18.81 18.10 17.49 16.96 28.02 29.57
8 .90 22.92 21.56 20.43 19.58 18.84 18.21 17.69 17.19 16.70 23.53 24.55
.95 25.22 23.03 21.48 20.46 19.66 18.97 18.37 17.80 17.30 25.51 26.83
.975 27.21 24.20 22.41 21.29 20.39 19.63 18.98 18.34 17.78 27.32 28.98
.99 29.60 25.66 23.44 22.22 21.22 20.40 19.66 19.03 18.46 29.60 31.32
9 .90 24.75 23.15 21.98 21.12 20.37 19.72 19.13 18.58 18.09 25.19 26.40
.95 27.08 24.55 23.16 22.08 21.22 20.49 19.90 19.29 18.79 27.28 28.78
.975 29.13 25.92 24.14 22.97 21.98 21.28 20.59 19.98 19.39 29.20 30.82
.99 31.66 27.42 25.13 24.01 23.06 22.18 21.35 20.63 19.94 31.72 33.32
10 .90 26.13 24.70 23.48 22.57 21.83 21.16 20.57 20.03 19.55 26.66 27.79
.95 28.49 26.17 24.59 23.59 22.71 21.93 21.34 20.74 20.17 28.75 30.16
.975 30.67 27.52 25.69 24.47 23.45 22.71 21.95 21.34 20.79 30.84 32.46
.99 33.62 29.14 26.90 25.58 24.44 23.49 22.75 22.09 21.47 33.86 35.47
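Notes: 1. The test UDmax is defined as $\max_{1 \le k \le 5}\sup_{(\lambda_1, \ldots, \lambda_k) \in \Lambda_\epsilon} F(\lambda_1, \ldots, \lambda_k; q)$ multiplied by $q$.
2. The test WDmax is given in (9) multiplied by $q$, and $M$ is chosen to be 5.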
The test discussed above requires the specification of the number of breaks,
m, under the alternative hypothesis. It is of interest to consider tests of no
structural break against an unknown number of breaks given some upper bound
M. Consider the following new class of tests, called the double maximum tests:
defined for some fixed weights $\{a_1, \ldots, a_M\}$. Note that the asymptotic distribution
of this class of tests is easily obtained from Proposition 6. Indeed, we have
The weights may reflect the imposition of some priors on the likelihood of
various numbers of breaks. Apart from such considerations, precise theoretical
guidelines about their choice remain an open question. An obvious candidate is
to set all weights equal to unity and we label this version of the test as
$UD\max F_T(M, q) = \max_{1 \le m \le M}\sup_{(\lambda_1, \ldots, \lambda_m) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_m; q)$. For a fixed $m$,
$F(\lambda_1, \ldots, \lambda_m; q)$ is the sum of $m$ dependent chi-square random variables with $q$
degrees of freedom, each one divided by $m$. This scaling by $m$ can be viewed, in
some sense, as a prior imposed to account for the fact that as $m$ increases a
fixed sample of data becomes less informative about the hypotheses being
confronted. Since for any fixed $q$ the critical values of the individual tests
$\sup_{(\lambda_1, \ldots, \lambda_m) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_m; q)$ decrease as $m$ increases, this implies that the
marginal p-values decrease with $m$ and may lead to a test with low power if the
number of breaks is large. One way to alleviate this problem is to consider a set
of weights such that the marginal p-values are equal across values of $m$. This
implies weights that depend on $q$ and the significance level of the test, say $\alpha$. To
be more precise, let $c(q, \alpha, m)$ be the asymptotic critical value of the test
$\sup_{(\lambda_1, \ldots, \lambda_m) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_m; q)$ for a significance level $\alpha$. With weights
$a_m = c(q, \alpha, 1)/c(q, \alpha, m)$, the test is
(9) $WD\max F_T(M, q) = \max_{1 \le m \le M} \frac{c(q, \alpha, 1)}{c(q, \alpha, m)} \times \sup_{(\lambda_1, \ldots, \lambda_m) \in \Lambda_\epsilon} F_T(\lambda_1, \ldots, \lambda_m; q)$.
The last two columns of Table I report the asymptotic critical values of both
tests for $M = 5$ and $\epsilon = 0.05$. This should be sufficient for most empirical
applications. In any event, the critical values vary little for choices of the upper
bound M larger than 5. The consistency of the tests follows directly from the
consistency of supFT(k; q).
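To make the two double maximum statistics concrete, the sketch below combines a list of $\sup F_T(m; q)$ values for $m = 1, \ldots, 5$ into the equal-weight and critical-value-weighted versions. The critical values are the $q = 1$, .95 entries of Table I; the $\sup F$ values are made-up numbers for illustration only, and the function names are assumptions of this sketch.

```python
# c(q, alpha, m) for q = 1, alpha = .95, m = 1..5, read from Table I.
CRIT_Q1_95 = [9.63, 8.78, 7.85, 7.21, 6.69]


def ud_max(sup_f):
    """Equal-weight double maximum: the largest sup F over m = 1..M."""
    return max(sup_f)


def wd_max(sup_f, crit):
    """Weighted double maximum with weights c(q, alpha, 1) / c(q, alpha, m)."""
    c1 = crit[0]
    return max(c1 / c * f for f, c in zip(sup_f, crit))


# Hypothetical sup F_T(m; 1) values for m = 1..5 (not from any data set).
sup_f = [4.0, 6.0, 7.0, 6.5, 6.0]
ud = ud_max(sup_f)
wd = wd_max(sup_f, CRIT_Q1_95)

# Compare with the q = 1, .95 entries in the last two columns of Table I.
reject_ud = ud > 10.17
reject_wd = wd > 10.91
```

In this made-up example the weighting moves the maximizing number of breaks from $m = 3$ to $m = 4$, which shows how the weights offset the decline of the individual critical values as $m$ grows.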
This section considers a test of the null hypothesis of $l$ breaks against the
alternative that an additional break exists. Ideally, one would base the test on
the difference between the sum of squared residuals obtained with $l$ breaks and
that obtained with $l + 1$ breaks. The limiting distribution of this test statistic is,
however, difficult to obtain. Here, we pursue a different strategy. For the model
with $l$ breaks, the estimated break points, denoted by $\hat T_1, \ldots, \hat T_l$, are obtained by
a global minimization of the sum of squared residuals. Our strategy proceeds by
testing each of the $(l + 1)$ segments (obtained using the estimated partition $\hat T_1, \ldots, \hat T_l$)
for the presence of an additional break. We assume the magnitude of shifts is
fixed (nonshrinking) in this section.
The test amounts to the application of $(l + 1)$ tests of the null hypothesis of
no structural change versus the alternative hypothesis of a single change. It is
applied to each segment containing the observations $\hat T_{i-1} + 1$ to $\hat T_i$ ($i = 1, \ldots,
l + 1$) using again the convention that $\hat T_0 = 0$ and $\hat T_{l+1} = T$. We conclude for a
rejection in favor of a model with $(l + 1)$ breaks if the overall minimal value of
the sum of squared residuals (over all segments where an additional break is
included) is sufficiently smaller than the sum of squared residuals from the $l$
break model. The break date thus selected is the one associated with this overall
minimum. More precisely, the test is defined by
where
and for $i = l + 1$ as $S_T(\hat T_1, \ldots, \hat T_l, \tau)$. We have the following result, proved in the Appendix:
The critical values of this test for different values of $l$ can be obtained from
the distribution function $G_{q,\eta}(x)$. A partial tabulation of some percentage points
can be found in DeLong (1981) and Andrews (1993) (see also the first column of
our Table I). However, the grid presented is not fine enough to allow obtaining
the relevant percentage points of $G_{q,\eta}(x)^{l+1}$. Accordingly, we provide a full set
of critical values in Table II calculated with $\eta = .05$. These were obtained using
a simulation method similar to that used for Table I.
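The reason a finer grid is needed can be seen from a one-line calculation. If the limiting distribution function of the test is $G_{q,\eta}(x)^{l+1}$, then its level-$\alpha$ percentage point is the point at which $G_{q,\eta}$ itself equals $\alpha^{1/(l+1)}$. The short sketch below tabulates these levels; the function name is an assumption of this sketch.

```python
def single_break_level(alpha, l):
    """Level p with G(x) = p whenever G(x) ** (l + 1) = alpha."""
    return alpha ** (1.0 / (l + 1))


# Levels of G needed for a .95 percentage point of the test, l = 0..9.
levels_95 = [single_break_level(0.95, l) for l in range(10)]
```

For $l = 1$ the required level is about .9747, just under .975. This agrees with the tables: the $q = 1$, .975 entry in the first column of Table I is 11.17, slightly above the $q = 1$, .95, $l = 1$ entry of 11.14 in Table II. For larger $l$ the required levels move further into the tail, where a coarse tabulation of $G_{q,\eta}$ is of little help.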
Note that $\hat\sigma^2$ is only required to be consistent under the null hypothesis for
the validity of the stated asymptotic distribution. The test may, however, have
better power if $\hat\sigma^2$ is also consistent under the alternative hypothesis. Also, it is
important to note that the results carry through allowing different distributions
across segments for the regressors and the errors. That is, Proposition 7 remains
TABLE II
q α 0 1 2 3 4 5 6 7 8 9
1 .90 8.02 9.56 10.45 11.07 11.65 12.07 12.47 12.70 13.07 13.34
.95 9.63 11.14 12.16 12.83 13.45 14.05 14.29 14.50 14.69 14.88
.975 11.17 12.88 14.05 14.50 15.03 15.37 15.56 15.73 16.02 16.39
.99 13.58 15.03 15.62 16.39 16.60 16.90 17.04 17.27 17.32 17.61
2 .90 11.02 12.79 13.72 14.45 14.90 15.35 15.81 16.12 16.44 16.58
.95 12.89 14.50 15.42 16.16 16.61 17.02 17.27 17.55 17.76 17.97
.975 14.53 16.19 17.02 17.55 17.98 18.15 18.46 18.74 18.98 19.22
.99 16.64 17.98 18.66 19.22 20.03 20.87 20.97 21.19 21.43 21.74
3 .90 13.43 15.26 16.38 17.07 17.52 17.91 18.35 18.61 18.92 19.19
.95 15.37 17.15 17.97 18.72 19.23 19.59 19.94 20.31 21.05 21.20
.975 17.17 18.75 19.61 20.31 21.33 21.59 21.78 22.07 22.41 22.73
.99 19.25 21.33 22.01 22.73 23.13 23.48 23.70 23.79 23.84 24.59
4 .90 15.53 17.54 18.55 19.30 19.80 20.15 20.48 20.73 20.94 21.10
.95 17.60 19.33 20.22 20.75 21.15 21.55 21.90 22.27 22.63 22.83
.975 19.35 20.76 21.60 22.27 22.84 23.44 23.74 24.14 24.36 24.54
.99 21.20 22.84 24.04 24.54 24.96 25.36 25.51 25.58 25.63 25.88
5 .90 17.42 19.38 20.46 21.37 21.96 22.47 22.77 23.23 23.56 23.81
.95 19.50 21.43 22.57 23.33 23.90 24.34 24.62 25.14 25.34 25.51
.975 21.47 23.34 24.37 25.14 25.58 25.79 25.96 26.39 26.60 26.84
.99 23.99 25.58 26.32 26.84 27.39 27.86 27.90 28.32 28.38 28.39
6 .90 19.38 21.51 22.81 23.64 24.19 24.59 24.86 25.27 25.53 25.87
.95 21.59 23.72 24.66 25.29 25.89 26.36 26.84 27.10 27.26 27.40
.975 23.73 25.41 26.37 27.10 27.42 28.02 28.39 28.75 29.13 29.44
.99 25.95 27.42 28.60 29.44 30.18 30.52 30.64 30.99 31.25 31.33
7 .90 21.23 23.41 24.51 25.07 25.75 26.30 26.74 27.06 27.46 27.70
.95 23.50 25.17 26.34 27.19 27.96 28.25 28.64 28.84 28.97 29.14
.975 25.23 27.24 28.25 28.84 29.14 29.72 30.41 30.76 31.09 31.43
.99 28.01 29.14 30.61 31.43 32.56 32.75 32.90 33.25 33.25 33.85
8 .90 22.92 25.15 26.38 27.09 27.77 28.15 28.61 28.90 29.19 29.49
.95 25.22 27.18 28.21 28.99 29.54 30.05 30.45 30.79 31.29 31.75
.975 27.21 29.01 30.09 30.79 31.80 32.50 32.81 32.86 33.20 33.60
.99 29.60 31.80 32.84 33.60 34.23 34.57 34.75 35.01 35.50 35.65
9 .90 24.75 26.99 28.11 29.03 29.69 30.18 30.61 30.93 31.14 31.46
.95 27.08 29.10 30.24 30.99 31.48 32.46 32.71 32.89 33.15 33.43
.975 29.13 31.04 32.48 32.89 33.47 33.98 34.25 34.74 34.88 35.07
.99 31.66 33.47 34.60 35.07 35.49 37.08 37.12 37.23 37.47 37.68
10 .90 26.13 28.40 29.68 30.62 31.25 31.81 32.37 32.78 33.09 33.53
.95 28.49 30.65 31.90 32.83 33.57 34.27 34.53 35.01 35.33 35.65
.975 30.67 32.87 34.27 35.01 35.86 36.32 36.65 36.90 37.15 37.41
.99 33.62 35.86 36.68 37.41 38.20 38.70 38.91 39.09 39.11 39.12
a nontrivial break point in the sense that both boundaries of each segment are
separated from the true break point by a positive fraction of the total number of
observations. For this segment, the $\sup F_T(1; q)$ test statistic diverges to infinity
as the sample size increases since it is consistent. Accordingly, the statistic
$F_T(l + 1|l)$ (computed for $l + 1$ segments) also diverges to infinity. This shows
consistency.
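The segment-by-segment construction of the test of $l$ versus $l + 1$ breaks can be illustrated in the simplest setting, a model in which only the mean changes. The sketch below computes the largest reduction in the sum of squared residuals from adding one break inside any segment of a given partition, scaled by a variance estimate. The trimming inside each segment, the variance estimate (the $l$-break sum of squared residuals divided by $T$), and all names are assumptions of this sketch rather than the paper's exact definitions.

```python
import math


def ssr(seg):
    """Sum of squared residuals of a segment around its own mean."""
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_split(seg, eta):
    """Smallest SSR from one extra mean break, kept eta*len(seg) away from both ends."""
    n = len(seg)
    lo = max(1, math.ceil(eta * n))
    best_ssr, best_s = float("inf"), None
    for s in range(lo, n - lo + 1):
        c = ssr(seg[:s]) + ssr(seg[s:])
        if c < best_ssr:
            best_ssr, best_s = c, s
    return best_ssr, best_s


def f_extra_break(y, breaks, eta=0.05):
    """Scaled SSR reduction from the best additional break, and its date."""
    T = len(y)
    edges = [0] + list(breaks) + [T]
    pairs = list(zip(edges[:-1], edges[1:]))
    base = [ssr(y[a:b]) for a, b in pairs]
    sigma2 = sum(base) / T          # variance estimate under the l-break model
    gain, date = 0.0, None
    for (a, b), s0 in zip(pairs, base):
        c, s = best_split(y[a:b], eta)
        if s is not None and s0 - c > gain:
            gain, date = s0 - c, a + s
    return gain / sigma2, date


# Toy series: regime means 0, 5, -3 with breaks after t = 40 and t = 70,
# plus a deterministic +1/-1 disturbance so the variance estimate is not zero.
means = [0.0] * 40 + [5.0] * 30 + [-3.0] * 50
y = [m + (1.0 if t % 2 == 0 else -1.0) for t, m in enumerate(means)]
stat, date = f_extra_break(y, breaks=[40])
```

Starting from the one-break partition at 40, the second segment still contains the break at 70, and the statistic is far above the $q = 1$, .95, $l = 1$ entry of 11.14 in Table II. The variance estimate here is inflated by the omitted break, which is the situation the text has in mind when it notes that power may improve if $\hat\sigma^2$ is also consistent under the alternative.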
The tests discussed above can be applied without the imposition of serially
uncorrelated errors as specified in Assumption A9. A simple modification is to
use the following version of the F test instead of that specified in (7):
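$F_T^*(\lambda_1, \ldots, \lambda_k; q) = (\hat\sigma^2/\hat h_u(0))F_T(\lambda_1, \ldots, \lambda_k; q)$,
with $\hat\sigma^2 = T^{-1}\sum_{t=1}^T \hat u_t^2$ and $\hat h_u(0)$ a consistent estimate of $h_u(0)$. Hence, the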
robust version of the test is simply a scaled version of the original statistic. This
is the case, for instance, when testing for a change in mean as in Garcia and
Perron (1996).
The computation of the robust version of the F test (12) can be involved
especially if a data dependent method is used to construct the robust asymptotic
covariance matrix of $\hat\delta$. Since the break fractions are T-consistent even with
correlated errors, an asymptotically equivalent version is to first take the
supremum of the original F test to obtain the break points, i.e. imposing
$\Omega = \sigma^2 I$. The robust version of the test is obtained by evaluating (12) and (13) at
these estimated break dates.
5. SEQUENTIAL METHODS
In this section, we show that the estimate of the break fraction in a single
structural change regression applied to data that contain two breaks converges
to one of the two true break fractions. In independent work, Chong (1994)
obtains a similar result (see also Bai (1994c) for an earlier exposition). To
present our arguments, we consider a simple three-regime model:
Without loss of generality we consider the case where $S(\lambda_1) < S(\lambda_2)$; our result
is stated in the following lemma.
LEMMA 3: Suppose that the data are generated by (14) and that $S(\lambda_1) < S(\lambda_2)$;
the estimated single break point $\hat T_a/T$ is consistent for $\lambda_1$.
The assumption that $S(\lambda_1) < S(\lambda_2)$ implies that the first break point is
dominating in terms of the relative magnitudes of shifts and the regime spells.
The above lemma shows that the sum of squared residuals is reduced the most
when the dominating break is identified. Given that $\hat T_a/T$ is consistent for $\lambda_1$,
one can use the subsample $[\hat T_a, T]$ to estimate another break point associated
with a minimized sum of squared residuals for this subsample. The resulting
estimate is then consistent for $\lambda_2$. This follows from the same type of argument
because only $\lambda_2$ can be the dominating break in the sample $[\hat T_a, T]$, even if
$\hat T_a < [T\lambda_1]$.
The arguments in Section 5.1 showed that $\hat T_a/T$ is consistent for one of the
true break points, the one that allows the greatest reduction in the sum of
squared residuals. Suppose, as above, that this break point is $\lambda_1$, which, in
general, may not be known. In that case, we choose one break point either in
the intervals $[1, \hat T_a]$ or $[\hat T_a, T]$, such that the sum of squared residuals for all
observations $[1, T]$ is minimized. Let $\hat\tau$ be this estimator. With probability
tending to 1 as $T$ increases, it is easy to show that the estimated break point $\hat\tau$
will be in the interval $[\hat T_a, T]$. Similarly, if $\hat T_a$ is actually consistent for $\lambda_2$ (this
will be true if $S(\lambda_1) > S(\lambda_2)$), the second estimated break point will be in $[1, \hat T_a]$.
Generally, let $(\hat N_1, \hat N_2)$ be the ordered version of $(\hat T_a, \hat\tau)$ such that $\hat N_1 < \hat N_2$. Then
$(\hat N_1/T, \hat N_2/T)$ is consistent for $(\lambda_1, \lambda_2)$. The preceding argument implies that
we can obtain consistent estimates of $\lambda_1$ and $\lambda_2$ in a sequential way.
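The sequential idea can be sketched for a mean-shift model with two breaks: estimate one break on the full sample, then search each of the two resulting subsamples for a second break and keep the candidate that gives the smallest overall sum of squared residuals. The code below is an illustration under these simplifying assumptions; the minimum segment length `h` and the function names are not taken from the paper.

```python
def ssr(seg):
    """Sum of squared residuals of a segment around its own mean."""
    mean = sum(seg) / len(seg)
    return sum((v - mean) ** 2 for v in seg)


def best_single_break(y, start, end, h):
    """Break date s in y[start:end] minimizing SSR(y[start:s]) + SSR(y[s:end]).

    Returns (inf, None) when the subsample is too short to hold two
    segments of at least h observations.
    """
    best_ssr, best_s = float("inf"), None
    for s in range(start + h, end - h + 1):
        c = ssr(y[start:s]) + ssr(y[s:end])
        if c < best_ssr:
            best_ssr, best_s = c, s
    return best_ssr, best_s


def two_breaks_sequential(y, h=5):
    """Estimate two break dates one at a time and return them in increasing order."""
    T = len(y)
    _, first = best_single_break(y, 0, T, h)
    left_ssr, left_s = best_single_break(y, 0, first, h)
    right_ssr, right_s = best_single_break(y, first, T, h)
    # Overall SSR when the second break is placed left or right of the first.
    total_left = left_ssr + ssr(y[first:])
    total_right = ssr(y[:first]) + right_ssr
    second = left_s if total_left <= total_right else right_s
    return tuple(sorted((first, second)))


# Toy series: regime means 0, 5, -3 with breaks after t = 40 and t = 70,
# plus a deterministic +1/-1 disturbance.
means = [0.0] * 40 + [5.0] * 30 + [-3.0] * 50
y = [m + (1.0 if t % 2 == 0 else -1.0) for t, m in enumerate(means)]
n1, n2 = two_breaks_sequential(y)
```

In this example the first pass picks the break at 70, the one giving the larger reduction in the sum of squared residuals, and the second pass finds the break at 40 in the left subsample. Each pass is a one-dimensional search, which is the computational appeal of the sequential approach relative to simultaneous estimation.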
PROPOSITION 8: Let $\hat m$ be the number of breaks obtained using the sequential
method based on the statistic $F_T(l + 1|l)$ applied with some size $\alpha_T$, and let $m^0$ be
the true number of breaks. If $\alpha_T$ converges to 0 slowly enough (for the test based
on $F_T(l + 1|l)$ to remain consistent), then, under Assumptions A1-A5, $P(\hat m = m^0)
\to 1$, as $T \to \infty$.
6. CONCLUSIONS
need to evaluate the quality of the approximations and the power of the tests in
finite samples via simulations. We present such a simulation study in a compan-
ion paper, Bai and Perron (1996). Among the topics to be investigated, an
important one appears to be the relative merits of different methods to select
the number of structural changes. There are, of course, many other issues on
the agenda: for instance, extensions of the test procedures to include tests that
are optimal with respect to some criteria and extensions to nonlinear models. In
addition, while the consistency and rate of convergence for the estimated break
points apply to trending regressors, the limiting distributions of the various tests
for structural change remain to be studied in the presence of trending regres-
sors.
MATHEMATICAL APPENDIX
LEMMA A.1: Let $S$ and $V$ be two matrices having the same number of rows. Then the matrix $S'M_VS$
is nondecreasing as more rows are added to the matrix $(S, V)$.
PROOF: Write $S = (S_1', S_2')'$ and $V = (V_1', V_2')'$. We need to show that for an arbitrary vector $a$
(having the same dimension as the number of columns of $S$) $a'S'M_VSa \ge a'S_1'M_{V_1}S_1a$. Note
that $a'S'M_VSa$ ($a'S_1'M_{V_1}S_1a$) is the sum of squares of the residuals from a projection of $Sa$ ($S_1a$)
on the space spanned by $V$ ($V_1$). The inequality is verified using the fact that the sum of squared
residuals is nondecreasing as the number of observations increases (here the number of rows of $S_1$
and $S$). See, e.g., Brown, Durbin, and Evans (1975). Q.E.D.
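A quick numerical check of the lemma is possible in the scalar case, where $S$ and $V$ each have a single column and $S'M_VS = S'S - (S'V)^2/(V'V)$ is the sum of squared residuals from regressing $S$ on $V$. The sketch below adds rows one at a time and records the value; the simulated data and the names are illustrative assumptions.

```python
import random


def projected_ss(s, v):
    """s' M_v s for single-column s and v, with M_v = I - v (v'v)^{-1} v'."""
    ss = sum(a * a for a in s)
    sv = sum(a * b for a, b in zip(s, v))
    vv = sum(b * b for b in v)
    return ss - sv * sv / vv


rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(50)]
v = [rng.gauss(0.0, 1.0) for _ in range(50)]

# Value of s' M_v s using the first n rows only, for n = 1..50.
values = [projected_ss(s[:n], v[:n]) for n in range(1, 51)]
nondecreasing = all(b >= a - 1e-9 for a, b in zip(values, values[1:]))
```

The small tolerance in the comparison only guards against floating-point rounding; the lemma itself guarantees that the sequence cannot decrease.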
PROOF: Consider first the case where A4(i) is assumed to hold. Because of the independence
between $z_s$ and $u_t$, we can treat the $z_t$'s as nonstochastic; otherwise a conditional argument can be
used. We shall prove that $|U'P_{\bar Z}U| = O_p(T^{2\alpha})$ uniformly in $T_1, \ldots, T_m$. Note that $U'P_{\bar Z}U$ is a
summation of the $m + 1$ terms
H ztz f ' t
Ti+ 1 Ti+ 1 Tj+ 1
T T 1 2s
2s2s 11 s 2-
(19) E <, E i-[ =
t =k ( =-w[ t= k
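LEMMA A.5: Under A1-A4, for some $\alpha < 1/2$, (a) $\sup_{T_1, \ldots, T_m} \|X'P_{\bar Z}U\| = O_p(T^{\alpha + 1/2})$; (b)
$\sup_{T_1, \ldots, T_m} \|\bar Z^{0\prime}P_{\bar Z}U\| = O_p(T^{\alpha + 1/2})$.
PROOF: This follows from Lemma A.4, $\|X\| = O_p(T^{1/2})$, and $\|X'P_{\bar Z}U\| \le \|X\|\,\|P_{\bar Z}U\|$. Similar
arguments apply for part (b). Q.E.D.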
1 -T
The first term is $O_p(1)$ uniformly over all partitions by Lemmas A.2 and A.3. The second term is
$O_p(T^{\alpha - 1/2}) = o_p(1)$ by Lemmas A.2 and A.4 (note that $X'M_{\bar Z}U = O_p(T^{\alpha + 1/2})$). Thus $\hat\beta(\{T_j\}) - \beta^0
= O_p(1)$ uniformly over all partitions. This implies (22) since $U'X = O_p(T^{1/2})$. Next, from $\hat\delta(\{T_j\}) =
(\bar Z'M_X\bar Z)^{-1}\bar Z'M_XY$, $M_XX = 0$, and (2), we obtain
T,; < T(AOjO- ) anid T(A" + n) < T,;. Then ( -I1 =-Xtt ) + .z(8k t- for to [T(A(j1-),Tj1]
and c, d x( ) - pt + Z:(8k - 31 ) for [TAp? + 1, T(A( + -q)]. We have
T
(;iii:2) L-z,x, Lz I I A
where E, extencls over the set T(A - 0 ) < t ? TAj. ancl E, extends ov
+ -q). Let YT and yT be the smallest eigenvalue of the first and seco
? (12~~~~~
2)m-in{yT~ Tyi)}IIQ
~~~ ( /)iil7 () 1- F'12
T}l6 4- 1
for an arbitrary positive definite matrix $A$ and for all $x$. Now the first matrix in (27) can be written as
$(T\eta)(1/(T\eta))\sum_1 z_t z_t' = (T\eta)A_T$, say. By A2, the smallest eigenvalue of $A_T$ is bounded away
from zero. Thus the smallest eigenvalue of $(T\eta)A_T$, $\gamma_T$, is of the order $T\eta$. The same can be said for
$\gamma_T^*$. Therefore, $\sum_{t=1}^T d_t^2 \ge TC\|\delta_j^0 - \delta_{j+1}^0\|^2$ for some $C > 0$ with probability no less than $\epsilon_0 > 0$. Q.E.D.
PROOF OF PROPOSITION 2: Without loss of generality, we assume there are only three breaks
($m = 3$) and provide an explicit proof of T-consistency for $\hat\lambda_2$ only. The analysis for $\hat\lambda_1$ and $\hat\lambda_3$ is
virtually the same (and actually simpler) and is thus omitted. For each $\epsilon > 0$, let $V_\epsilon = \{(T_1, T_2, T_3);
|T_i - T_i^0| \le \epsilon T, 1 \le i \le 3\}$. From Proposition 1, $P(\{\hat T_1, \hat T_2, \hat T_3\} \in V_\epsilon) \to 1$. Therefore we only need to examine the
behavior of the sum of squared residuals, $S_T(T_1, T_2, T_3)$, for those $T_i$ such that $|T_i - T_i^0| < \epsilon T$ for all
$i$. Also using an argument of symmetry, we can, without loss of generality, consider the case $T_2 < T_2^0$.
For $C > 0$, define
$V_\epsilon(C) = \{(T_1, T_2, T_3); |T_i - T_i^0| < \epsilon T, 1 \le i \le 3, T_2 - T_2^0 < -C\}$.
Thus, $V_\epsilon(C) \subset V_\epsilon$. Because $S_T(\hat T_1, \hat T_2, \hat T_3) \le S_T(\hat T_1, T_2^0, \hat T_3)$ with probability 1, it is enough to show that
for each $\eta > 0$, there exist $C > 0$ and $\epsilon > 0$ such that for large $T$, $P(\min\{S_T(T_1, T_2, T_3) -
S_T(T_1, T_2^0, T_3)\} \le 0) < \eta$, or equivalently,
easy to derive exact expressions for (29) in terms of estimated coefficients. Let (81*2, 8A, 83
denote the estimator of (8 , 8, 8 2, ?,6 84) based on the partition (T1, T2, T2?, T3) (note 82? is
repeated once). In particular, 82* is an estimate of 82 associated with the regressor (0, ..., 0, zT+
ZT2,0...0)', 8A is an estimate of 82 associated with the regressor Z=(0. 0, ZT2+1.
ZT?,0.--0)', and 8* is an estimate of 82 associated with the regressor (O,....O, zT2+1.ZT3 0,
0)'. Now consider SSR1 - SSR3; we have (e.g., Amemiya (1985, p. 31)),
The inequality is due to ZMWZA < ZAZ. From the definition of MW, we have
- )- (II){--(III).
Consider term (I). Note first that $\delta_i^*$ is close to $\delta_i^0$ given that, on the set $V_\varepsilon(C)$, the distance
between $T_i$ and $T_i^0$ can be controlled and made small by choosing a small $\varepsilon$. Because $\delta_\Delta$ is
estimated using observations from the second true regime only, $\delta_\Delta$ is close to $\delta_2^0$ for a large enough
C, on $V_\varepsilon(C)$. Hence, for large C, large T and small $\varepsilon$, (I) is no less than
$(1/2)(\delta_3^0 - \delta_2^0)'[Z_\Delta' Z_\Delta/(T_2^0 - T_2)](\delta_3^0 - \delta_2^0)$
with large probability. Next consider term (II). It is easy to show that on $V_\varepsilon(C)$, $\delta_3^*$
and $\delta_\Delta$ are $O_p(1)$ uniformly. Also on $V_\varepsilon(C)$, $(W'W/T)^{-1} = O_p(1)$ and $Z_\Delta' W/(T_2^0 - T_2) = O_p(1)$
(because $Z_\Delta' W$ involves no more than $T_2^0 - T_2$ observations). Furthermore,
Thus (II) is no larger than $\varepsilon O_p(1)$. Consider finally (III). Because both $\delta_2^*$ and $\delta_\Delta$ are close to $\delta_2^0$,
$\|\delta_2^* - \delta_\Delta\| \le \rho$ with large probability for every $\rho > 0$ (this is true for large T, large C, and small $\varepsilon$).
Also, because $\|Z_\Delta' Z_\Delta/(T_2^0 - T_2)\| = O_p(1)$ uniformly on $V_\varepsilon(C)$, term (III) is no larger than $\rho O_p(1)$.
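Hence, the inequality can be written as
(32) $(SSR_1 - SSR_2)/(T_2^0 - T_2) \ge (1/2)(\delta_3^0 - \delta_2^0)'[Z_\Delta' Z_\Delta/(T_2^0 - T_2)](\delta_3^0 - \delta_2^0) - \varepsilon O_p(1) - \rho O_p(1)$.
The matrix
$Z_\Delta' Z_\Delta/(T_2^0 - T_2) = (T_2^0 - T_2)^{-1} \sum_{t=T_2+1}^{T_2^0} z_t z_t'$
has its minimum eigenvalue bounded away from zero on $V_\varepsilon(C)$. Thus the first term on the right-hand
side of (32) is positive and dominates the other two terms. It follows that with large probability,
$(SSR_1 - SSR_2)/(T_2^0 - T_2) > 0$. This proves (28) and the proposition. Q.E.D.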
PROOF OF PROPOSITION 4(i): The structure of the proof is similar to that of Proposition 1 but
modifications are necessary in view of the fact that $T^{-1}\sum_{t=1}^{T} d_t^2 \to 0$ even supposing a break is not
consistently estimated when the shifts are shrinking. Using (4) and (5) (without dividing by T on both
sides), we can arrive at the desired contradiction if we can show that $\sum_{t=1}^{T} d_t^2 > 2\sum_{t=1}^{T} u_t d_t$ in the
limit as $T \to \infty$. To do this we show that $\sum_{t=1}^{T} d_t^2$ diverges at a faster rate than $\sum_{t=1}^{T} u_t d_t$.
We will make use of $v_T$ being small to strengthen the result of (22) and (23). We shall drop the
subscript T in $\delta_{T,i}$. From $\delta_i^0 - \delta_{i+1}^0 = O(v_T)$ under A6, by adding and subtracting terms, we
have $\delta_i^0 - \delta_j^0 = O(v_T)$ for all i and j. Now consider (24). The first term on the right-hand side can be
rewritten as $(X'M_Z X)^{-1}X'M_Z(Z^0 - Z)\delta^0$ because $M_Z Z = 0$. A key to the proof lies in the fact
that $(Z^0 - Z)\delta^0$ depends on changes in the parameters (i.e. $\delta_i^0 - \delta_j^0$). In the case of a single change
point, for example, assume $T_1 < T_1^0$; then
over all partitions. Next, we combine the first and the third terms of (25) and rewrite them as
for some C' > 0. Thus $\sum_{t=1}^{T} d_t^2 > 2\sum_{t=1}^{T} u_t d_t$ if $T v_T^2/(T^{a+1/2} v_T + T^{2a}) \to \infty$. This is the case if
$T^{(1/2)-a} v_T \to \infty$. Under $E|u_t|^{2/\alpha} < \infty$ of A6, we can choose a such that $a < \alpha$ in Lemma A.4. Thus
$T^{(1/2)-a} v_T \to \infty$ by A6. Q.E.D.
To prove Proposition 4(ii), we first prove a lemma, which generalizes the Hajek and Renyi
inequality to mixingales.
LEMMA A.6: Let $\{\xi_i, \mathcal{F}_i\}$ be a q x 1 $L^2$ mixingale satisfying (a)-(e) of A4(i) with $u_i$ replaced by $\xi_i$
and $r$ replaced by 2. Then there exists an $L < \infty$ such that, for every c > 0 and m > 0,
$P\left(\sup_{k \ge m} \frac{1}{k}\left\|\sum_{i=1}^{k} \xi_i\right\| > c\right) \le \frac{L}{c^2 m}.$
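The flavor of the bound can be checked numerically in the simplest special case. The sketch below is only an illustration under assumptions that are not in the text: the $\xi_i$ are taken to be i.i.d. standard normal scalars (a martingale difference sequence, so the classical Hajek and Renyi inequality applies directly), and the supremum is truncated at a finite N. The function name `hajek_renyi_check` and its parameters are hypothetical.

```python
import numpy as np

def hajek_renyi_check(m=50, N=1000, c=0.3, reps=1000, seed=0):
    """Empirical frequency of max_{m<=k<=N} |S_k|/k > c versus the classical
    Hajek-Renyi bound (1/c^2) * (1/m + sum_{i=m+1}^N i^-2).

    Assumption (not from the paper): xi_i are i.i.d. N(0, 1) scalars.
    """
    rng = np.random.default_rng(seed)
    xi = rng.standard_normal((reps, N))
    k = np.arange(1, N + 1)
    # |S_k| / k for every replication and every k = 1, ..., N
    scaled = np.abs(np.cumsum(xi, axis=1)) / k
    # Restrict the maximum to k >= m (column index m - 1 corresponds to k = m)
    freq = float(np.mean(scaled[:, m - 1:].max(axis=1) > c))
    # Unit variances: c_m^2 * m + sum_{i>m} c_i^2 with weights c_k = 1/k
    bound = float((1.0 / m + np.sum(1.0 / k[m:] ** 2)) / c ** 2)
    return freq, bound

freq, bound = hajek_renyi_check()
print(f"empirical frequency = {freq:.4f}, Hajek-Renyi bound = {bound:.4f}")
```

Because $\sum_{i>m} i^{-2} < 1/m$, the bound in this special case is at most $2/(c^2 m)$, which has the same $L/(c^2 m)$ form as the lemma.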
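$\sum_{j=-\infty}^{\infty} P\left(\sup_{m \le k \le N} \frac{1}{k}\left\|\sum_{i=1}^{k} \xi_{ji}\right\| > a_j c\right)$
$\le \sum_{j=-\infty}^{\infty} \frac{1}{a_j^2 c^2}\left(\frac{1}{m^2}\sum_{i=1}^{m} E\|\xi_{ji}\|^2 + \sum_{i=m+1}^{N} i^{-2} E\|\xi_{ji}\|^2\right);$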
the latter bound is due to Hajek and Renyi's inequality for martingale differences. From the
definition of a mixingale and A4(i)(c), $E\|\xi_{ji}\|^2 \le 4c_i^2\psi_{|j|}^2 \le 4K^2\psi_{|j|}^2$. Thus the above is bounded
by $4K^2 c^{-2}(\sum_{j=-\infty}^{\infty} a_j^{-2}\psi_{|j|}^2)(m^{-1} + \sum_{i=m+1}^{N} i^{-2})$. Since $\sum_{i=m+1}^{N} i^{-2} \le 2m^{-1}$, if we let
$L = 12K^2(\sum_{j=-\infty}^{\infty} a_j^{-2}\psi_{|j|}^2)$ then the desired upper bound is obtained for a fixed N. Since the bound does
not depend on N, the lemma is obtained by letting $N \to \infty$. It remains to choose appropriate $a_j$
such that $\sum_j a_j^{-2}\psi_{|j|}^2$ is bounded. Let $\nu_0 = 1$ and $\nu_j = j^{-1-\kappa}$ $(j \ge 1)$, where $\kappa > 0$ is as given in A4(i)(d).
Let $a_j = \nu_j/(1 + 2\sum_{i \ge 1}\nu_i)$ and $a_{-j} = a_j$ for $j \ge 0$. Then $\sum_j a_j = 1$. By Assumption A4(i)(d),
PROOF OF PROPOSITION 4(ii): We shall maintain all the notation in the proof of Proposition 2.
Define a new set
$V_\varepsilon^*(C) = \{(T_1, T_2, T_3);\ |T_i - T_i^0| < \varepsilon T,\ 1 \le i \le 3,\ T_2 - T_2^0 < -C v_T^{-2}\}$,
which is a subset of $V_\varepsilon$. We only need to show that (28) holds when the minimum is taken over
$V_\varepsilon^*(C)$. We can prove that, uniformly on the set $V_\varepsilon^*(C)$,
controlled by choosing a small $\varepsilon$. The dependence on $v_T$ is explained in part (i). The term
$O_p(T^{-1/2})$ is related to disturbances and this specific rate is due to the fact that each $\delta_i$ is
estimated with a positive fraction of the sample for partitions in $V_\varepsilon^*(C)$. The last two terms of (37)
are due to spillover from the misspecification via partial structural changes.
Using (36) and (37), expression (I) in (31) is no smaller than
From $[Z_\Delta' Z_\Delta/(T_2^0 - T_2)]^{-1} = O_p(1)$ on $V_\varepsilon^*(C)$ for large C, (III) is further bounded by
SSR_I-_SSR, ZAu 2
(38)~~~~~~~~~~~~~~~~2
For every $\eta > 0$, we can choose a small $\varepsilon > 0$ such that
choose $B < \infty$ such that $P(|O_p(1)| > B) < \eta$. Thus
By Lemma A.6 with $\xi_t = z_t u_t$, $c = [A/(2B)]^{1/2} v_T$, and $m = C v_T^{-2}$ (applied with the data order reversed,
i.e. treating $T_2^0$ as the first observation), the above probability is bounded by
$(T - (k+1)q - p)^{-1}\, SSR_k \to_p \sigma^2$.
Hence, we concentrate on the limit of $F_T^* = SSR_0 - SSR_k$. Now, let $D^U(i, j)$ ($D^R(i, j)$, resp.) be the
sum of squared residuals from the unrestricted (restricted, resp.) model using data from segments i
to j (inclusively), i.e. from observation $T_{i-1} + 1$ to $T_j$. We can write $F_T^* = D^R(1, k+1) - \sum_{i=1}^{k+1} D^U(i, i)$,
or
(39) $F_T^* = \sum_{i=1}^{k} [D^R(1, i+1) - D^R(1, i) - D^U(i+1, i+1)] + D^R(1, 1) - D^U(1, 1)$.
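The telescoping decomposition in (39) can be checked numerically. The sketch below is an illustration only, and it assumes the pure structural change case (no common regressors x), so that $D^R(1, j)$ is the OLS sum of squared residuals on observations 1 to $T_j$ and $D^U(i, i)$ is the OLS sum of squared residuals on segment i alone. The helper names `ssr` and `decomposition` are hypothetical.

```python
import numpy as np

def ssr(y, Z):
    """Sum of squared OLS residuals from regressing y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def decomposition(y, Z, breaks):
    """Return F*_T computed directly and via the telescoping sum in (39).

    breaks = [T_1, ..., T_k]; segment i covers observations T_{i-1}+1..T_i
    (0-based slices starts[i-1]:ends[i-1]). Pure structural change assumed.
    """
    T = len(y)
    k = len(breaks)
    starts = [0] + list(breaks)
    ends = list(breaks) + [T]

    def DR(j):
        # Restricted fit (one coefficient vector) on segments 1..j
        return ssr(y[:ends[j - 1]], Z[:ends[j - 1]])

    def DU(i):
        # Unrestricted fit on segment i alone
        return ssr(y[starts[i - 1]:ends[i - 1]], Z[starts[i - 1]:ends[i - 1]])

    direct = DR(k + 1) - sum(DU(i) for i in range(1, k + 2))
    telescoped = sum(DR(i + 1) - DR(i) - DU(i + 1) for i in range(1, k + 1))
    telescoped += DR(1) - DU(1)
    return direct, telescoped

rng = np.random.default_rng(1)
T = 120
Z = np.column_stack([np.ones(T), rng.standard_normal(T)])
y = Z @ np.array([1.0, 0.5]) + rng.standard_normal(T)
print(decomposition(y, Z, [30, 70, 95]))
```

In this pure-change setting $D^R(1, 1) = D^U(1, 1)$ exactly; in the partial structural change model the difference is nonzero in finite samples, which is why the text treats that term separately.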
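$Y = X\beta + Z\delta + U = X\beta + \bar Z\bar\delta + U$ and
$Y_{1,j} = X_{1,j}\beta + Z_{1,j}\delta + U_{1,j}$
(with $\bar\delta = (\delta', \ldots, \delta')'$ a q(k + 1) vector with $\delta$ defined by $\delta_1 = \delta_2 = \cdots = \delta_{k+1} \equiv \delta$), straightforward
algebra yields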
and
we deduce that
(41) $F_{T,i} = -S_{i+1}' H_{i+1}^{-1} S_{i+1} + S_i' H_i^{-1} S_i + (S_{i+1} - S_i)'[H_{i+1} - H_i]^{-1}(S_{i+1} - S_i)$
where B(r) is a (q +p) dimensional vector Brownian motion with covariance matrix
Q [Q ll Q 12
([i) T-1(X1 j,j1 1(XI .,Zl.j) > ( 2A Q
From these two limits, we deduce easily the following results:
a (k + 1) vector;
We then obtain
The second equality follows since e'Ae = 1 and (e' 2 Q12Ql)B* = Q1Q-B2(1). Using the re
stated above we easily deduce that (Mi+ - M)'(AT -AT) =? 0, (AT -AT)A(Li+ - L)(AT
0, and
$F_{T,i} = -S_{i+1}' H_{i+1}^{-1} S_{i+1} + S_i' H_i^{-1} S_i + (S_{i+1} - S_i)'[H_{i+1} - H_i]^{-1}(S_{i+1} - S_i) + o_p(1)$.
Finally, it is easy to verify that $D^R(1, 1) - D^U(1, 1) \to_p 0$. Note that this convergence result holds
jointly for $i = 1, \ldots, k$; hence
PROOF OF PROPOSITION 7: For simplicity, we present the arguments in the case of a pure
structural change. Let SSR(i, j) be the minimized sum of squared residuals for the segment
containing observations from (i + 1) to j; then we can write
IWq( ) - /W?(1)112
sup
where $\Lambda_{i,\eta}^0$ is as defined in (11) with $\hat T_i$ replaced by $T_i^0$. Under the null hypothesis, Proposition 2
asserts that $\hat T_i = T_i^0 + O_p(1)$. Using this result, we can show that (45) also holds with $T_{i-1}^0$ and $T_i^0$
replaced by $\hat T_{i-1}$ and $\hat T_i$, respectively. In addition, because over different regimes SSR(·,·) are
computed using nonoverlapping observations, the weak limits in (45) for different i's are independent.
Thus the limit of (44) is the maximum of l + 1 independent random variables in the form of
(45).
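The segment-by-segment structure used in this proof can be sketched in code. The example below is a hedged illustration, not the authors' implementation: it assumes a pure structural change model, takes the l break dates as given, and for each of the l + 1 segments computes the largest reduction in the sum of squared residuals obtained by inserting one more break inside the segment, with a trimming fraction `eta` at each end. It returns the unnormalized maximum; dividing by a consistent variance estimate is left to the caller. The names `ssr`, `max_ssr_reduction`, and `eta` are hypothetical.

```python
import numpy as np

def ssr(y, Z):
    """Sum of squared OLS residuals from regressing y on Z."""
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    return float(resid @ resid)

def max_ssr_reduction(y, Z, breaks, eta=0.15):
    """Max over segments of SSR(segment) - min_tau [SSR(left) + SSR(right)].

    breaks: break dates of the l-break model (0-based slice boundaries).
    eta: assumed trimming fraction keeping tau away from segment ends.
    """
    T = len(y)
    edges = [0] + list(breaks) + [T]
    best = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        # Minimum number of observations on each side of the candidate break
        h = max(int(np.ceil(eta * (hi - lo))), Z.shape[1])
        full = ssr(y[lo:hi], Z[lo:hi])
        for tau in range(lo + h, hi - h + 1):
            split = ssr(y[lo:tau], Z[lo:tau]) + ssr(y[tau:hi], Z[tau:hi])
            best = max(best, full - split)
    return best

rng = np.random.default_rng(2)
T = 200
Z = np.ones((T, 1))
y = np.where(np.arange(T) < 100, 0.0, 3.0) + rng.standard_normal(T)
print(max_ssr_reduction(y, Z, []), max_ssr_reduction(y, Z, [100]))
```

With a single true mean shift at observation 100, the reduction is large when no break is imposed and small once the true break is included, which is the contrast the test of l versus l + 1 changes exploits.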
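PROOF OF LEMMA 3: We show that $S(\tau)$ for $\tau \in [0, 1]$ has a unique minimum at $\lambda_1$. The function
$S(\tau)$ has different expressions over [0, 1]. Some algebra reveals that
$S(\tau) - S(\lambda_1) = \frac{\lambda_1 - \tau}{(1 - \tau)(1 - \lambda_1)}\left[(1 - \lambda_1)(\mu_1 - \mu_2) + (1 - \lambda_2)(\mu_2 - \mu_3)\right]^2, \quad \tau \le \lambda_1,$
which is nonnegative. Under the assumption that $S(\lambda_1) < S(\lambda_2)$, the expression in brackets is
nonzero, so $S(\tau) - S(\lambda_1)$ is strictly positive for $\tau < \lambda_1$. By symmetry (regarded as reversing the data
order), $S(\tau) - S(\lambda_2)$ is nonnegative for $\tau \ge \lambda_2$. Thus for $\tau \in [\lambda_2, 1]$,
$S(\tau) - S(\lambda_1) = S(\tau) - S(\lambda_2) + S(\lambda_2) - S(\lambda_1) \ge S(\lambda_2) - S(\lambda_1) > 0.$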
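For $\tau \in [\lambda_1, \lambda_2]$,
$S(\tau) - S(\lambda_1) = (\tau - \lambda_1)\left[\frac{\lambda_1}{\tau}(\mu_1 - \mu_2)^2 - \frac{(1 - \lambda_2)^2}{(1 - \lambda_1)(1 - \tau)}(\mu_2 - \mu_3)^2\right]$
$= (\tau - \lambda_1)\frac{\lambda_2}{\tau}\left[\frac{\lambda_1}{\lambda_2}(\mu_1 - \mu_2)^2 - \frac{\tau(1 - \lambda_2)}{\lambda_2(1 - \tau)}\cdot\frac{1 - \lambda_2}{1 - \lambda_1}(\mu_2 - \mu_3)^2\right]$
$\ge (\tau - \lambda_1)\frac{\lambda_2}{\tau}\left[\frac{\lambda_1}{\lambda_2}(\mu_1 - \mu_2)^2 - \frac{1 - \lambda_2}{1 - \lambda_1}(\mu_2 - \mu_3)^2\right]$
$\ge \frac{\tau - \lambda_1}{\lambda_2 - \lambda_1}\left[S(\lambda_2) - S(\lambda_1)\right]$
where the first inequality follows from $[\tau(1 - \lambda_2)]/[\lambda_2(1 - \tau)] \le 1$ and the second inequality
from $\lambda_2/\tau \ge 1$, using that the bracketed term in the third line equals $[S(\lambda_2) - S(\lambda_1)]/(\lambda_2 - \lambda_1) > 0$.
Thus $S(\tau) - S(\lambda_1)$ is strictly positive for $\tau \in (\lambda_1, \lambda_2)$ and we have shown that $S(\tau)$
has a unique global minimum at $\lambda_1$ when $S(\lambda_1) < S(\lambda_2)$. Because $S_T(\hat T_a) \le S_T([T\lambda_1])$, it follows that
$\hat T_a/T$ is consistent for $\lambda_1$. Q.E.D.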
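The shape of $S(\tau)$ in this proof can be examined numerically. The sketch below assumes, as the displayed expressions suggest, a mean-shift model with three regimes $(\mu_1, \mu_2, \mu_3)$ and break fractions $\lambda_1 < \lambda_2$, and takes $S(\tau)$ to be the limiting normalized sum of squared residuals from fitting a single break at $\tau$, up to the additive error variance, which does not depend on $\tau$. The function names and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def cum_mean(tau, lam1, lam2, mu):
    """Integral over [0, tau] of the three-regime step function of means."""
    mu1, mu2, mu3 = mu
    a = np.minimum(tau, lam1)
    b = np.minimum(tau, lam2)
    return mu1 * a + mu2 * (b - a) + mu3 * (tau - b)

def S(tau, lam1, lam2, mu):
    """Limit criterion for a one-break fit at tau (error variance omitted)."""
    mu1, mu2, mu3 = mu
    second_moment = mu1**2 * lam1 + mu2**2 * (lam2 - lam1) + mu3**2 * (1 - lam2)
    left = cum_mean(tau, lam1, lam2, mu)
    total = cum_mean(1.0, lam1, lam2, mu)
    # Subtract the part explained by the fitted means on [0, tau] and [tau, 1]
    return second_moment - left**2 / tau - (total - left)**2 / (1 - tau)

lam1, lam2, mu = 0.3, 0.7, (0.0, 2.0, 1.0)
grid = np.linspace(0.01, 0.99, 99)
values = S(grid, lam1, lam2, mu)
print("S(lam1) < S(lam2):", S(lam1, lam1, lam2, mu) < S(lam2, lam1, lam2, mu))
print("grid minimizer:", grid[np.argmin(values)])
```

For these illustrative values $S(\lambda_1) < S(\lambda_2)$, and the grid minimizer coincides with $\lambda_1$, in line with the lemma; swapping the magnitudes of the two shifts moves the minimizer to $\lambda_2$.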
REFERENCES
HALL, P., AND C. C. HEYDE (1980): Martingale Limit Theory and its Applications. New York:
Academic Press.
HANSEN, B. E. (1991): "Strong Laws for Dependent Heterogeneous Processes," Econometric Theory,
7, 213-221.
HOSOYA, Y. (1989): "Hierarchical Statistical Models and a Generalized Likelihood Ratio Test,"
Journal of the Royal Statistical Society, Series B, 51, 435-447.
KRISHNAIAH, P. R., AND B. Q. MIAO (1988): "Review about Estimation of Change Points," in
Handbook of Statistics, Vol. 7, ed. by P. R. Krishnaiah and C. R. Rao. New York: Elsevier.
LIU, J., S. WU, AND J. V. ZIDEK (1997): "On Segmented Multivariate Regressions," Statistica Sinica,
7, 497-525.
McLEISH, D. L. (1975): "A Maximal Inequality and Dependent Strong Laws," The Annals of
Probability, 5, 829-839.
PERRON, P. (1989): "The Great Crash, the Oil Price Shock and the Unit Root Hypothesis,"
Econometrica, 57, 1361-1401.
POLLARD, D. (1984): Convergence of Stochastic Processes. New York: Springer-Verlag.
YAO, Y.-C. (1987): "Approximating the Distribution of the ML Estimate of the Change-Point in a
Sequence of Independent r.v.'s," Annals of Statistics, 3, 1321-1328.
YAO, Y.-C. (1988): "Estimating the Number of Change-Points via Schwarz' Criterion," Statistics and
Probability Letters, 6, 181-189.
YAO, Y.-C., AND S. T. Au (1989): "Least Squares Estimation of a Step Function," Sankhya, 51, Ser.
A, 370-381.
ZACKS, S. (1983): "Survey of Classical and Bayesian Approaches to the Change-Point Problem: Fixed
and Sequential Procedures of Testing and Estimation," in Recent Advances in Statistics, ed. by M.
H. Rizvi, J. S. Rustagi, and D. Siegmund. New York: Academic Press, 245-269.