Discrete-Time Markov Chains Overview
3.1 Introduction
Before introducing Markov chains, we first discuss stochastic processes. A stochastic process is a family
of RVs Xn indexed by n, where n ∈ T . Note that sometimes people write Xt with t ∈ T . The set T
is called the index set. There are two common types of stochastic processes:
• T is continuous. For instance T = [0, ∞). Then {Xt} is called a continuous-time stochastic process.
• T is discrete. For instance T = {0, 1, 2, ...}. Then {Xn} is called a discrete-time stochastic process.
Each random variable Xn can have a discrete, continuous, or mixed distribution. For example, in a queue
Xn could represent the time that the n-th customer waits after arrival before receiving service, with a
distribution that has an atom at zero but is otherwise continuous. Usually, each Xn takes its values in the
same set, which is called the state space and denoted S. Therefore, Xn ∈ S.
We will focus on discrete time stochastic processes with a discrete state space in this course.
Let {Xn : n = 0, 1, 2, ...} be a discrete time stochastic process and the state space S is discrete. The
probability model of this process is determined by the joint distribution function:
P (X0 = x0 , X1 = x1 , · · · , Xn = xn )
3-2 Lecture 3: Discrete-Time Markov Chain – Part I
Example 2: Precipitation level. Let the precipitation level of day n be Xn where Xn = 0 (dry) or 1 (wet).
Assume that the precipitation level of a day given all the past history only depends on the precipitation level
of the last two days. Namely,
P (x0 , · · · , xn ) = P (xn |xn−1 , xn−2 ) × P (xn−1 |xn−2 , xn−3 ) × · · · × P (x2 |x1 , x0 ) × P (x1 |x0 ) × P (x0 ).
Again, it is clear that the RVs X0 , X1 , · · · , Xn are not IID. Although this model shows that the process has
a two-step memory, we can reformulate it as a one-step memory process by defining a sequence of random
vectors Y0 , Y1 , · · · , Yn , · · · such that Yn = (Xn , Xn+1 ). Then P (Yn |Yn−1 , · · · , Y0 ) = P (Yn |Yn−1 ), i.e., {Yn } has a one-step memory.
A discrete time stochastic process {Xn : n = 0, 1, 2, · · · } is called a Markov chain if for every x0 , x1 , · · · , xn−2 , i, j ∈
S and n ≥ 1,
P (Xn = i|Xn−1 = j, Xn−2 = xn−2 , · · · , X0 = x0 ) = P (Xn = i|Xn−1 = j)
whenever both sides are well-defined. The Markov chain has a one-step memory.
If the transition probability P (Xn = j|Xn−1 = i) = pij is independent of n, we call {Xn } a ho-
mogeneous Markov chain. Otherwise we call it an inhomogeneous Markov chain. For a homogeneous
Markov chain,
∑_{j∈S} pij = 1,   pij ≥ 0
for every i, j. Note that sometimes people write pij = pi→j , where pi→j stands for the probability of
moving from state i to state j.
Because S is a discrete set, we often label it as S = {1, 2, 3, · · · , s} and the elements {pij : i, j = 1, · · · , s}
forms an s × s matrix P = {pij }. P is called the transition (probability) matrix (t.p.m). The property
of homogeneous Markov chain implies that
P ≥ 0, P1s = 1s , (3.1)
where 1s = (1, 1, 1, 1, · · · , 1)T is the vector of 1’s. Note that any matrix satisfying equation (3.1) is called a
stochastic matrix.
Example 3: SIS (Susceptible-Infected-Susceptible) model.
Suppose we observe an individual over a sequence of days n = 1, 2, . . . and classify this individual each day
as
Xn = I if infected,   Xn = S if susceptible.
We would like to construct a stochastic model for the sequence {Xn : n = 1, 2, ...}. One possibility is to
assume that the Xn ’s are independent with P (Xn = I) = 1 − P (Xn = S) = α. However, this model is not
very realistic since we know from experience that the individual is more likely to stay infected if he or she is
already infected.
Since Markov chains are the simplest models that allow us to relax independence, we proceed by defining a
transition probability matrix (rows and columns indexed by I, S):
         I      S
P =  I  1−α    α
     S   β    1−β
It can be helpful to visualize the transitions that are possible (have positive probability) by a transition
diagram:
[Transition diagram: arrows I → S and S → I, with a self-loop at each state.]
The probability of transfer depends on the number of particles in each compartment. For N = 2 we have
states 0, 1, 2 and t.p.m.
     ⎡  0    1    0  ⎤
P =  ⎢ 1/2   0   1/2 ⎥
     ⎣  0    1    0  ⎦
and the transition diagram:
[Transition diagram: states 0, 1, 2 with the transitions above.]
There are data on the precipitation (in inches), recorded by the UW Weather Service, at Snoqualmie Falls
in the years 1948–1983. We examine the data for January only and consider dry (=0) and wet (=1) only. If
we condition on the state on January 1st we obtain the frequencies of the four different transitions as:
          0      1    Total
  0     186    123     309
  1     128    643     771
Total   314    766    1080
For example, there were 123 occasions on which a wet day followed a dry day.
From the table of frequencies we can compute the relative frequencies of transitions:
P̂ = ⎡ 0.602  0.398 ⎤
     ⎣ 0.166  0.834 ⎦
This is an estimate (hence the hat!) of the t.p.m.
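The estimate P̂ above is just the row-normalized count table; a minimal sketch reproducing it (the array below transcribes the counts from the table):

```python
import numpy as np

# Transition counts from the Snoqualmie Falls January data
# (rows: today's state, columns: tomorrow's state; 0 = dry, 1 = wet).
counts = np.array([[186, 123],
                   [128, 643]], dtype=float)

# Row-normalize: p-hat_ij = n_ij / n_i, with row totals 309 and 771.
P_hat = counts / counts.sum(axis=1, keepdims=True)

print(P_hat.round(3))  # rows approx (0.602, 0.398) and (0.166, 0.834)
```

Each row of P̂ sums to one by construction, so P̂ is a stochastic matrix in the sense of equation (3.1).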
To give an intuition about how we obtain the Markov property, consider a simple case where n = 1 and
m = 2:
P (X3 = i3 |X1 = i1 , X0 = i0 ) = ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 , X0 = i0 )
= ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) P (X2 = i2 |X1 = i1 , X0 = i0 )
= ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 ) P (X2 = i2 |X1 = i1 )   (!)
= ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 )
= P (X3 = i3 |X1 = i1 ).
To argue the equality (!), observe that by the Markov property
P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) = P (X3 = i3 |X2 = i2 ) = P (X3 = i3 |X2 = i2 , X1 = i1 ).
Property: conditional independence. We can represent the Markov chain using a simple graphical
model:
X0 → X1 → X2 → · · · → Xn−1 → Xn
The claim of the Markov property is now obvious from the theorem on conditional independence and graphical
factorization. Indeed, the latest available state serves as a separating set.
Using the graph representation, we obtain an interesting property about a Markov chain: the past and the
future are independent given the present.
To see this, again we consider a simple case where n = 2 and we have X0 , X1 , X2 . Here X0 denotes the past,
X1 denotes the present, and X2 denotes the future. Then
P (X0 = i0 , X2 = i2 |X1 = i1 ) = P (X0 = i0 , X1 = i1 , X2 = i2 ) / P (X1 = i1 )
= P (X2 = i2 |X1 = i1 , X0 = i0 ) P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
= P (X2 = i2 |X1 = i1 ) · P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
= P (X2 = i2 |X1 = i1 ) P (X0 = i0 |X1 = i1 ).
Lemma 3.1 Let {Xn } be a homogeneous Markov chain and let pij^(n) be the n-step transition probability.
Then for any k = 0, 1, 2, · · · ,
P (Xn+k = j|Xk = i) = pij^(n) .
Proof:
P (Xn+k = j|Xk = i) = ∑_{ik+1 ,··· ,in+k−1} P (Xn+k = j|Xn+k−1 = in+k−1 ) × · · · × P (Xk+1 = ik+1 |Xk = i)
= ∑_{ik+1 ,··· ,in+k−1} P (Xn = j|Xn−1 = in+k−1 ) × · · · × P (X1 = ik+1 |X0 = i)
= P (Xn = j|X0 = i) = pij^(n) .
The n-step transition probabilities are related to each other via the famous Chapman-Kolmogorov Equation.
Lemma 3.2 Let {Xn } be a homogeneous Markov chain and let pij^(n) be the n-step transition probability.
Then for any n, m = 0, 1, 2, · · ·
pij^(n+m) = ∑_{k∈S} pik^(n) pkj^(m) . (3.3)
Proof:
pij^(n+m) = P (Xn+m = j|X0 = i)
= ∑_{k∈S} P (Xn+m = j, Xm = k|X0 = i)
= ∑_{k∈S} P (Xn+m = j|Xm = k, X0 = i) P (Xm = k|X0 = i)
= ∑_{k∈S} P (Xn+m = j|Xm = k) P (Xm = k|X0 = i)   (Markov property)
= ∑_{k∈S} P (Xn = j|X0 = k) P (Xm = k|X0 = i)   (time-invariant property)
= ∑_{k∈S} pik^(n) pkj^(m) .
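In matrix form, the Chapman-Kolmogorov equation says that n-step transition matrices multiply like matrix powers. A small numerical check (the two-state t.p.m. below is illustrative, not from the text):

```python
import numpy as np

# Illustrative two-state t.p.m. (alpha = 0.4, beta = 0.2).
P = np.array([[0.6, 0.4],
              [0.2, 0.8]])

def n_step(P, n):
    # n-step transition matrix: P(n) = P^n.
    return np.linalg.matrix_power(P, n)

# Chapman-Kolmogorov: P(n+m) = P(n) P(m).
n, m = 3, 5
print(np.allclose(n_step(P, n + m), n_step(P, n) @ n_step(P, m)))  # True
```

The check holds for any stochastic matrix, since matrix powers satisfy P^(n+m) = P^n P^m by associativity.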
The forward equation singles out the final step and has the initial state i fixed. The equation is most useful
when interest centers on the pij^(n) ’s for a particular i but all values of j. Conversely, the backward equation
singles out the change from the initial state i and has the final state j fixed. This equation is useful when
interest is in the pij^(n) ’s for a particular j but all values of i. The backward equation can be interesting, in
particular, when there is an absorbing state j from which there is no escape (pjj = 1).
If we collect the n-step transition probabilities into the matrix P(n) = {pij^(n) }, then Kolmogorov’s forward
and backward equations can be rewritten in matrix form as
P(n) = P(n−1) P   (forward),   P(n) = P P(n−1)   (backward),
so that P(n) = P^n .
Example: SIS model revisited. The t.p.m. is
         0      1
P =  0  1−α    α
     1   β    1−β
Note that we use {0, 1} to denote the states I and S in the SIS model.
Assume that the initial distribution is pT0 = (1 − α, α), i.e., P (X0 = 0) = 1 − α. Moreover, assume that β = 1 − α,
so the distribution of X1 will be
pT1 = pT0 P = (1 − α, α) ⎡ 1−α  α ⎤
                         ⎣ 1−α  α ⎦
= ((1 − α)^2 + α(1 − α), α(1 − α) + α^2 ) = (1 − α, α) = pT0 .
What will be the distribution of Xn ? Using the matrix form, we know that pTn = pT0 P^n = (1 − α, α) for
every n, and
P (X0 = x0 , · · · , Xn = xn ) = ∏_{k=0}^{n} α^{xk} (1 − α)^{1−xk} ,
which is the joint PMF of IID Bernoulli random variables with parameter α. Therefore, under this special
case, the Markov chain reduces to IID Bernoulli RVs.
Note that in general, when the rows of a t.p.m. are the same, the corresponding Markov chain is a sequence of
IID RVs whose distribution is given by the first/any row of the t.p.m.
When we consider the n-step transition probability with n large (in this case, n ≥ 10), it turns out that the
n-step transition probability matrix becomes
P̂^n = ⎡ 0.294  0.706 ⎤
       ⎣ 0.294  0.706 ⎦ .
This implies that the initial distribution is uninformative – whether it is dry or wet on Jan 18th tells us little
about Jan 27th.
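This limit is easy to check numerically by raising the estimated t.p.m. to the 10th power:

```python
import numpy as np

# Estimated Snoqualmie Falls t.p.m. from the data above.
P_hat = np.array([[0.602, 0.398],
                  [0.166, 0.834]])

P10 = np.linalg.matrix_power(P_hat, 10)
print(P10.round(3))  # both rows are approximately (0.294, 0.706)
```

After ten steps the two rows agree to three decimals: the conditional distribution of the state no longer depends on where the chain started.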
Recall that in the previous example, we saw that when the rows of a t.p.m. are the same, the correspond-
ing random variables are IID. Therefore, if we consider another sequence of RVs {Y0 (n) = Xk , Y1 (n) =
Xk+n , Y2 (n) = Xk+2n , · · · }, then Y0 (n), Y1 (n), Y2 (n), · · · are approximately IID when n is large.
After seeing this example, one may conjecture that the limit P∞ = limn→∞ P^n always exists and has equal rows.
However, this is not always true. A counterexample is
P = ⎡ 0  1 ⎤
    ⎣ 1  0 ⎦ .
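For this periodic chain the powers of P never settle down: even powers give the identity and odd powers give P back, so limn→∞ P^n does not exist.

```python
import numpy as np

P = np.array([[0., 1.],
              [1., 0.]])

# P^2 = I, so the sequence of powers alternates between I and P forever.
print(np.linalg.matrix_power(P, 10))  # identity matrix
print(np.linalg.matrix_power(P, 11))  # equal to P
```

The obstruction is periodicity: both states have period 2, a notion made precise later in this lecture.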
First-step analysis is a general strategy for solving many Markov chain problems by conditioning on the
first step of the Markov chain. We demonstrate this technique on a simple example – the Gambler’s ruin
problem.
Gambler’s ruin problem: Two players bet one dollar in each round. Player 1 wins with probability α
and loses with probability β = 1 − α. We assume that player 1 starts with a dollars and player 2 starts with
b dollars. Let Xn be the fortune of player 1 after n rounds. Xn can take values from 0 to a + b, and for
0 < i < a + b,
pij = P (Xn+1 = j|Xn = i) = α if j = i + 1,  β if j = i − 1,  0 otherwise.
Let T be the number of rounds played until one of the players loses all his/her money. Because of the randomness of
this model, T is also a random variable. We are interested in the probability that player 1 wins the game,
which occurs when XT = a + b. Apparently, this probability depends on the initial amount of money that
player 1 has, so we will denote it as
u(a) = P (XT = a + b | X0 = a).
Note that u(0) = 0 and u(a + b) = 1. We may view u(a) as the probability that the chain is absorbed into
the state a + b at the hitting time T when the chain starts at X0 = a.
First-step analysis proceeds by conditioning on the outcome of the first round: if player 1 wins the first round
(probability α), his fortune becomes a + 1 and he wins the game with probability u(a + 1); if he loses
(probability β), the winning probability becomes u(a − 1).
Therefore, we have u(a) = u(a + 1)α + u(a − 1)β with two boundary conditions: u(0) = 0, u(a + b) = 1.
Because α + β = 1,
(α + β)u(a) = u(a) = u(a + 1)α + u(a − 1)β,
which implies
α(u(a + 1) − u(a)) = β(u(a) − u(a − 1)).
Defining v(a) = u(a) − u(a − 1), this reads
αv(a + 1) = βv(a) ⇒ v(a + 1) = (β/α) v(a).
By telescoping,
v(a + 1) = (β/α) v(a) = (β/α)^2 v(a − 1) = · · · = (β/α)^a v(1).
Using the boundary condition u(0) = 0, we have u(a) = ∑_{k=1}^{a} v(k), and the condition u(a + b) =
∑_{k=1}^{a+b} v(k) = 1. Therefore,
v(1) = 1/(a + b)   if α = β,
v(1) = (1 − β/α) / (1 − (β/α)^{a+b})   if α ≠ β,
and
u(a) = a/(a + b)   if α = β,
u(a) = (1 − (β/α)^a ) / (1 − (β/α)^{a+b})   if α ≠ β.
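The closed form for u(a) can be cross-checked by solving the first-step equations u(i) = αu(i+1) + βu(i−1) with u(0) = 0, u(a+b) = 1 as a linear system (a sketch; the stakes and α below are arbitrary):

```python
import numpy as np

def u_closed_form(a, b, alpha):
    # Winning probability from the telescoping argument above.
    beta = 1 - alpha
    if abs(alpha - beta) < 1e-12:
        return a / (a + b)
    r = beta / alpha
    return (1 - r ** a) / (1 - r ** (a + b))

def u_first_step(a, b, alpha):
    # Solve u(i) = alpha*u(i+1) + beta*u(i-1), u(0)=0, u(a+b)=1.
    N, beta = a + b, 1 - alpha
    A = np.zeros((N + 1, N + 1))
    rhs = np.zeros(N + 1)
    A[0, 0], A[N, N], rhs[N] = 1.0, 1.0, 1.0   # boundary conditions
    for i in range(1, N):
        A[i, i - 1], A[i, i], A[i, i + 1] = -beta, 1.0, -alpha
    return np.linalg.solve(A, rhs)[a]

print(u_closed_form(3, 5, 0.6), u_first_step(3, 5, 0.6))  # equal values
```

The two computations agree to machine precision, and for α = β both reduce to a/(a + b).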
We now turn to a classification of the states of a Markov chain that is crucial to understanding the behavior
of Markov chains.
An equivalence relation “∼” is a binary relation between elements of a set satisfying
1. Reflexivity: i ∼ i
2. Symmetry: i ∼ j ⇒ j ∼ i
3. Transitivity: i ∼ j, j ∼ k ⇒ i ∼ k.
For a set S and a ∈ S, {s ∈ S : s ∼ a} is called an equivalence class. Equivalence relations will allow us to
split Markov chain state spaces into equivalence classes.
State j is accessible from state i (i → j) if there exists m ≥ 0 such that pij^(m) > 0. We say that i communicates
with j (i ↔ j) if j is accessible from i and i is accessible from j. A set of states C is a communicating
class if every pair of states in C communicates with each other, and no state in C communicates with any
state not in C.
The relation “↔” is an equivalence relation.
Proof: Reflexivity and symmetry are clear. To prove transitivity, let i ↔ j and j ↔ k. We then want to
show that i ↔ k.
Note that i → j if and only if the transition diagram contains a path from i to j. So there is a path from
i to j and from j to k. Concatenate the two to obtain a path from i to k, which testifies to the fact that
i → k.
Analogously, we have k → i and conclude that i ↔ k.
A set of states C is closed if ∑_{j∈C} pij = 1 for all i ∈ C.
Example 6. Consider a Markov chain with the following transition diagram:
[Transition diagram on states {1, 2, 3, 4, 5, 6}.]
Then {1, 2, 3}, {4, 5}, and {6} are the communication classes.
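Communication classes can be computed mechanically from the zero pattern of the t.p.m.: take the transitive closure of the accessibility relation "pij > 0", then intersect it with its transpose. The exact probabilities in Example 6 are not shown, so the matrix below is an illustrative chain whose classes are {1, 2, 3}, {4, 5}, {6} (states relabeled 0–5):

```python
import numpy as np

# Illustrative t.p.m. on six states (0..5 standing for 1..6);
# only the zero/non-zero pattern matters for the classes.
P = np.array([
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0, 0.0, 0.0],
    [0.5, 0.0, 0.0, 0.5, 0.0, 0.0],   # {0,1,2} can leak into 3
    [0.0, 0.0, 0.0, 0.5, 0.3, 0.2],
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],   # {3,4} can leak into 5
    [0.0, 0.0, 0.0, 0.0, 0.0, 1.0],   # 5 is absorbing
])

# Transitive closure of accessibility (Floyd-Warshall on booleans);
# the identity accounts for m = 0 steps, so i -> i always holds.
reach = (P > 0) | np.eye(len(P), dtype=bool)
for k in range(len(P)):
    reach |= reach[:, [k]] & reach[[k], :]

comm = reach & reach.T                  # i <-> j
classes = {frozenset(map(int, np.where(comm[i])[0])) for i in range(len(P))}
print(sorted(sorted(c) for c in classes))  # [[0, 1, 2], [3, 4], [5]]
```

Note that {0, 1, 2} and {3, 4} are communication classes without being closed: probability leaks out of them, which is exactly how transient classes arise later.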
A Markov chain {Xn } is called irreducible if it has only one communication class, i.e., for all i and j,
i ↔ j. For state i, di = gcd{n ≥ 1 : pii^(n) > 0} is called its period, where gcd = greatest common divisor
and di = +∞ if pii^(n) = 0 for all n ≥ 1.
Example 7. Consider the example with state space S = {0, 1, 2, ...} and Xn such that
P (Xn+1 = i|Xn = 0) = p if i = 1,  1 − p if i = 0,  0 otherwise,
and for j ≠ 0,
P (Xn+1 = i|Xn = j) = p if i = j + 1,  1 − p if i = j − 1,  0 otherwise.
Then, d2 = gcd{2, 4, 5, 6, ...} = 1 even though 1 is not in the list (think about why p22^(5) > 0).
Example 8: Simple (1-D) Random Walk on the Integers. Consider another example with state space
Z. Let Xn be the position at time n. Then
P (Xn+1 = i + 1|Xn = i) = p,   P (Xn+1 = i − 1|Xn = i) = q,
with p = 1 − q. Suppose we start at 0; then it is clear that we cannot return to 0 after an odd number of
steps, so p00^(2n+1) = 0 for all n, i.e.,
d0 = gcd{n ≥ 1 : p00^(n) > 0} = gcd{2, 4, 6, . . . } = 2.
Proposition: If i ↔ j, then di = dj .
Proof: i ↔ j ⇒ there exist n1 , n2 such that pij^(n1) > 0 and pji^(n2) > 0. Then, by Chapman-Kolmogorov:
pii^(n1+n2) = ∑_k pik^(n1) pki^(n2) ≥ pij^(n1) pji^(n2) > 0,
so di | n1 + n2 . Moreover, for any n such that pjj^(n) > 0,
pii^(n1+n+n2) ≥ pij^(n1) pjj^(n) pji^(n2) > 0.
Hence, di | n1 + n2 + n.
Together, n1 + n2 = c1 di and n1 + n2 + n = c2 di imply that n = (c2 − c1 )di and as a result, di | n for all n
such that pjj^(n) > 0.
Since di is a common divisor of the set {n : pjj^(n) > 0} and dj is the greatest common divisor of the same set (by
definition of period), di ≤ dj .
By symmetry, dj ≤ di ⇒ di = dj .
In a communication class, all states have the same period. Since all states communicate in an irreducible
Markov chain, it makes sense to define the period of such a Markov chain. If di = 1, state i is called
aperiodic. An irreducible Markov chain with period 1 is also called aperiodic.
Theorem 3.5 (Lattice Theorem (Brémaud p.75)) Suppose d is the period of an irreducible Markov
chain. Then for all states i, j there exist m ∈ {0, . . . , d − 1} and k ≥ 0 such that
pij^(m+nd) > 0, ∀n ≥ k.
Theorem 3.6 (Cyclic Classes) For any irreducible Markov chain one can find a unique partition of S
into d classes C0 , C1 , ..., Cd−1 such that for all k, and for i ∈ Ck ,
∑_{j∈Ck+1} pij = 1,
where, by convention, Cd = C0 and where d is maximal (that is, there is no other such partition C′0 , C′1 , ..., C′d′−1
with d′ > d).
Proof: Fix a state i and classify states j by the value of m in Lattice Theorem.
The number d ≥ 1 is the period of the chain. The classes C0 , C1 , ..., Cd−1 are called the cyclic classes.
Example 8: Simple (1-D) Random Walk on the Integers (revisited). Random walk on S = Z =
C0 ∪ C1 , where C0 and C1 are the sets of even and odd integers.
The Markov property states that the random variable at time n + m conditional on its behavior at time n
is independent of the states at times prior to n. However, what if the time n is random?
Say we are interested in the behavior of XT +m given XT , where T is the first time that the Markov chain
hits the state 0. Do we still have the Markov property?
Not every random time preserves the Markov property. Recall that the Markov property states that for
any m < n < k, Xm ⊥ Xk |Xn for non-random m, n, k. To see why the Markov property can fail for random
times, consider m = T − 1, n = T, k = T + 1 for a suitable random time T : knowing the value of XT may
carry information about the path both before and after time T .
• Counterexample – non-stopping time: The random time τ = inf{n ≥ 1 : Xn+1 = i} is not a stopping time because,
when the time arrives at m, the event {τ = m} = {X1 ≠ i, · · · , Xm ≠ i, Xm+1 = i} depends on Xm+1 .
Stopping times are a very important class of random variables in statistics. Many statistical procedures involve
a stopping time. For instance, suppose we perform a sequence of experiments and stop when we
observe a certain behavior, such as a strong signal or enough anomalies. Then the stopping index (related to the number
of samples) is a stopping time. If we want to use data from this sequence of experiments, we need to
use theorems about stopping times (such as the optional sampling theorem).
Theorem 3.7 (Strong Markov Property) Let {Xn } be a homogeneous Markov chain with a transition
probability matrix P = {pij } and let τ be a stopping time with respect to {Xn }. Then for any integer k ≥ 0,
P (Xτ +k = j|Xτ = i, Xℓ = iℓ , 0 ≤ ℓ < τ ) = P (Xk = j|X0 = i) = pij^(k)
and
P (Xτ +k = j|Xτ = i) = P (Xk = j|X0 = i) = pij^(k) .
It is often of great interest to study the limiting behavior of a Markov chain Xn when n → ∞. Here, for
simplicity, we assume that our Markov chain is homogeneous. A property of limiting behavior is that Xn and
Xn+1 should have the same distribution when n is large. So we are interested in understanding if a Markov
chain will eventually converge to a ‘stable’ distribution (formally, we will call it a stationary distribution).
In particular, given a Markov chain, we would like to know (i) when a stationary distribution exists, (ii) how
to find it, and (iii) when it is unique.
It turns out that to answer these questions, we will use concepts related to return times. Thus, we start with
understanding properties of return times.
Let Ni = ∑_{n=1}^{∞} I(Xn = i) be the number of visits to state i. Note that the quantity Ni may equal ∞. It is a finite number with non-zero probability if there are
some states such that when the chain enters one of them, the chain never goes back to state i. Later we will
describe this phenomenon using the concepts of transient states and recurrent states.
Example 6 (revisited). Consider again the Markov chain with the transition diagram from Example 6.
As can be seen easily, when the Markov chain enters states {4, 5, 6}, it never comes back to any of {1, 2, 3}.
Thus, N1 is a finite number with positive probability.
Let Ti = inf{n ≥ 1 : Xn = i} be the return time. Then the following events can be defined using either Ti
or Ni :
{Ti = ∞} = {Ni = 0}, {Ti < ∞} = {Ni > 0}.
These are useful later.
We then define fji = Pj (Ti < ∞) = Pj (Ni > 0) to be the probability of reaching state i in finite
time when the chain starts at state j. Note that because Pj (Ti = ∞) + Pj (Ti < ∞) = 1, we have
fii = Pi (Ti < ∞) and Pi (Ti = ∞) = 1 − fii .
Proposition 3.8
Pj (Ni = r) = fji fii^{r−1} (1 − fii )  if r ≥ 1,   Pj (Ni = r) = 1 − fji  if r = 0.
Proof: The case r = 0 is very simple because {Ni = 0} = {Ti = ∞}. Thus, Pj (Ni = 0) = Pj (Ti = ∞) =
1 − Pj (Ti < ∞) = 1 − fji .
For the remaining cases, we will do a proof by induction. Before doing that, we first investigate Pj (Ni ≥ r)
for r > 0. Let τr be the r-th return time to state i. Note that the event {Xτr = i} = {Ni ≥ r}.
Then, by the strong Markov property, the chain restarts from i at time τr , so
Pj (Ni ≥ r + 1) = Pj (Ni ≥ r) fii .
Therefore, we conclude
Pj (Ni = r) = Pj (Ni ≥ r) − Pj (Ni ≥ r + 1) = (1 − fii ) Pj (Ni ≥ r).
To start with the proof by induction, consider r = 1. Pj (Ni ≥ 1) = 1 − Pj (Ni = 0) = fji so Pj (Ni = 1) =
(1 − fii )fji , which agrees with what we need for r = 1.
Assume that it works for r ≤ k. Now we show that it works for r = k + 1. Note that this means that we
have
Pj (Ni = r) = fji fii^{r−1} (1 − fii )  for r = 1, · · · , k,   and   Pj (Ni = 0) = 1 − fji .
For the case of r = k + 1, we use the fact that
Pj (Ni ≥ k + 1) = 1 − Pj (Ni ≤ k) = 1 − (1 − fji ) − ∑_{r=1}^{k} fji fii^{r−1} (1 − fii ) = fji − fji (1 − fii^k ) = fji fii^k .
Thus,
Pj (Ni = k + 1) = (1 − fii ) Pj (Ni ≥ k + 1) = fji fii^k (1 − fii ),
which is the formula for r = k + 1. Thus, by induction, the result holds.
The above formula also gives an interesting result on the case of ‘starting from state i, returning to state i’
when we set j = i:
Pi (Ni = r) = fii^r (1 − fii ),   Pi (Ni > r) = fii^{r+1} ,
where fii = Pi (Ti < ∞).
We have seen many situations in which Ti and Ni are closely related. Here is another result about their rela-
tionship.
Corollary 3.9
Pi (Ni = ∞) = 1 ⇔ Pi (Ti < ∞) = 1
and
Pi (Ti < ∞) < 1 ⇔ Pi (Ni = ∞) = 0 ⇔ Ei (Ni ) < ∞.
Corollary 3.9 links the finiteness of Ti and Ni and also relates it to the expectation. Corollary 3.9 will be
very useful together with the following formula: for a non-negative integer-valued random variable X,
E(X) = ∑_{t=1}^{∞} P (X ≥ t). (3.5)
Note that either Pi (Ni = ∞) = 0 or Pi (Ni = ∞) = 1, with nothing in between (if fii < 1, then Pi (Ni =
∞) = 0; if fii = 1, then Pi (Ni = ∞) = 1). This, together with Corollary 3.9, implies that Ei (Ni ) = ∞ ⇐⇒
Pi (Ni = ∞) = 1.
Note that:
fii = Pi (Ti < ∞) = 1 ⇐⇒ Pi (Ni = ∞) = 1.
In other words, if a Markov chain returns to state i in finite time with probability one, then the chain visits
this state infinitely often. State i is called recurrent if fii = 1 and transient if fii < 1.
Proposition 3.10 State i is recurrent ⇐⇒ ∑_{n=1}^{∞} pii^(n) = ∞.
Proposition 3.11 Recurrence is a communication class property, i.e. if i ↔ j and i is recurrent, then j is
recurrent.
Proof: Homework.
Example: Gambler’s Ruin+. Recall that in Gambler’s ruin, the game ends when Xn hits 0 or
a + b. Now we extend the problem in the sense that the game does not end when a player loses/takes
all the money, but the value of Xn stays the same once it hits 0 or a + b. Namely, Xn = 0 ⇒ Xn+1 = 0
and Xn = a + b ⇒ Xn+1 = a + b. In this case p00^(k) = p_{a+b,a+b}^(k) = 1 for all k = 1, 2, · · · . Therefore,
∑_{n=1}^{∞} p00^(n) = ∑_{n=1}^{∞} p_{a+b,a+b}^(n) = ∑_{n=1}^{∞} 1 = ∞. Hence, 0 and a + b are recurrent states. Once they are reached,
we stay there forever. Let q be the probability that player 1 loses a single round (q = β). Consider state 1. If
player 1 loses the next round, the chain is stuck at 0 forever. Namely, T1 = ∞ because we can never come back. So
P1 (T1 = ∞) ≥ q, which implies f11 = P1 (T1 < ∞) ≤ 1 − q < 1.
Note that the inequality in P1 (T1 = ∞) ≥ q is due to the fact that even if player 1 wins, the game may end
at a + b, so the return time to state 1 may still be infinite. Therefore, by definition, 1 is a transient state.
Since states {1, . . . , a + b − 1} form a communication class, all states in this class are also transient. These
states are transient because they occur a finite number of times before absorption into states 0 or a + b.
Example 8: 1-D Random Walk (revisited). Let Xn be a random walk on the set of all integers Z such
that
pij = p if j = i + 1,   pij = q := 1 − p if j = i − 1.
Let’s study recurrence of state 0. We know that p00^(2n+1) = 0 for all n ≥ 0 and that, conditional on X0 = 0,
X2n =d ξ1 + · · · + ξ2n , where ξ1 , . . . , ξ2n are i.i.d. with P (ξi = 1) = 1 − P (ξi = −1) = p. Hence,
p00^(2n) = P (X2n = 0|X0 = 0) = (2n choose n) p^n q^n .
Recall that Stirling’s formula says that n! ∼ n^{n+1/2} e^{−n} √(2π), meaning that
lim_{n→∞} n! / (n^{n+1/2} e^{−n} √(2π)) = 1.
Therefore,
p00^(2n) = ((2n)! / (n! n!)) p^n q^n
∼ ((2n)^{2n+1/2} e^{−2n} √(2π) / (n^{2n+1} e^{−2n} 2π)) (pq)^n
= (2^{2n+1/2} n^{2n+1/2} / (n^{2n+1} 2^{1/2} √π)) (pq)^n = 2^{2n} (pq)^n / √(πn) = (4pq)^n / √(πn).
We deduce that
∑_{n=1}^{∞} p00^(n) = ∑_{n=1}^{∞} p00^(2n) = ∞ ⇔ 4pq = 1 ⇔ p = q = 1/2.
(Ratio Test: Let ∑_{n=1}^{∞} an be a series which satisfies limn→∞ |an+1 /an | = k. If k > 1 the series diverges; if k < 1
the series converges.) Conclusion: Only the symmetric random walk is recurrent on Z. Interestingly, the
symmetric random walk on Z^2 is also recurrent, but it is transient on Z^n for n ≥ 3; see Brémaud (1999,
p. 98).
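The Stirling approximation used above is easy to check numerically: for the symmetric walk (4pq = 1), the ratio of the exact p00^(2n) = (2n choose n)/4^n to 1/√(πn) tends to 1 as n grows.

```python
import math

# Symmetric walk: p00^(2n) = C(2n, n) / 4^n, since p^n q^n = 4^{-n}.
# (Integer division C(2n,n) / 4**n avoids float underflow of 0.25**n.)
for n in (10, 100, 1000):
    exact = math.comb(2 * n, n) / 4 ** n
    approx = 1 / math.sqrt(math.pi * n)   # (4pq)^n / sqrt(pi n) with 4pq = 1
    print(n, exact / approx)              # ratio tends to 1
```

Since the terms behave like 1/√(πn), their partial sums grow without bound, which is exactly the divergence that makes the symmetric walk recurrent.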
With the knowledge about recurrence, we are able to talk about the invariant measures and stationary
distribution of a stochastic matrix.
A vector x ≠ 0 with non-negative entries is called an invariant measure of a stochastic matrix P if xT P = xT .
A probability vector π on the state space of a Markov chain is called a stationary distribution of a stochastic
matrix P if π T P = π T , i.e., πi = ∑_j πj pji for each i.
The equation xT P = xT or π T P = π T is also called the global balance equations – the probability flow in
equals the flow out. Note that for an invariant measure x such that c = ∑_i xi < ∞, c^{−1} x is a stationary
distribution. But it may happen that c = ∞ for some invariant measure, so one cannot always normalize it.
Example 9: Two-State Markov Chain. Consider a Markov chain with two states and a transition
probability matrix
P = ⎡ 1−p   p  ⎤
    ⎣  q   1−q ⎦ ,  0 < p < 1, 0 < q < 1.
The global balance equations:
(π0 , π1 ) ⎡ 1−p   p  ⎤ = (π0 , π1 ),  or
           ⎣  q   1−q ⎦
(1 − p)π0 + qπ1 = π0 ,  pπ0 + (1 − q)π1 = π1  ⇒  pπ0 = qπ1  ⇒  π0 = (q/p) π1 .
Using that π0 + π1 = 1, we obtain
(q/p) π1 + π1 = 1 ⇒ π1 = p/(p + q)
and deduce that the global balance equations have the unique solution
π T = ( q/(p + q), p/(p + q) ).
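This solution can be verified numerically as the left eigenvector of P for eigenvalue 1 (p and q below are set to the Snoqualmie estimates, purely for concreteness):

```python
import numpy as np

p, q = 0.398, 0.166
P = np.array([[1 - p, p],
              [q, 1 - q]])

# Left eigenvector of P for eigenvalue 1, normalized to sum to 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmin(np.abs(w - 1))])
pi = pi / pi.sum()

print(pi)                            # approx (0.294, 0.706)
print([q / (p + q), p / (p + q)])    # closed form q/(p+q), p/(p+q)
```

With these values the stationary distribution is (0.294, 0.706), matching the common row of P̂^n computed earlier for the precipitation chain.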
Example: Gambler’s ruin (with absorbing states 0 and a + b). By inspection, the vectors παT = [α, 0, · · · , 0, 1 − α] satisfy
the global balance equations παT P = παT for any α ∈ (0, 1). So the Gambler’s ruin chain has an uncountable
number of stationary distributions.
Here we see a case where a Markov chain has an infinite number of stationary distributions. And in
some cases it may not even have a stationary distribution! So, returning to our original questions, we would
like to know (i) when will a Markov chain have a stationary distribution? (ii) how do we find a stationary
distribution? and (iii) when will the stationary distribution be unique?
The following proposition partially answers the first question. Note that a Markov chain is recurrent if all its
states are recurrent.
Proposition 3.12 Let {Xn } be an irreducible, recurrent, homogeneous Markov chain with transition prob-
ability matrix P. For each i ∈ S define
yi = E0 [ ∑_{n=1}^{∞} I(Xn = i) I(n ≤ T0 ) ],
where 0 is an arbitrary reference state and T0 = inf{n ≥ 1 : Xn = 0} is the first return time to 0. Then
yi ∈ (0, ∞) for all i ∈ S, and yT = [y0 , y1 , . . . ] is an invariant measure of P.
(P1) When i = 0,
y0 = E0 [ ∑_{n=1}^{∞} I(Xn = 0) I(n ≤ T0 ) ] = 1,
because within the excursion, Xn = 0 occurs only at n = T0 .
(P2) ∑_{i∈S} yi = E0 (T0 ), because ∑_{i∈S} ∑_{n=1}^{∞} I(Xn = i) I(n ≤ T0 ) = ∑_{n=1}^{T0} 1 = T0 .
(P3) Define
q0i^(n) = P0 (X1 ≠ 0, · · · , Xn−1 ≠ 0, Xn = i) = E0 (I(Xn = i)I(n ≤ T0 ))
to be the probability of visiting state i at time point n before returning to state 0. Thus,
yi = ∑_{n=1}^{∞} E0 (I(Xn = i)I(n ≤ T0 )) = ∑_{n=1}^{∞} q0i^(n) ,
and q0i^(1) = E0 (I(X1 = i)I(1 ≤ T0 )) = p0i .
Proof: This proof consists of two parts. In the first part, we prove that each yi satisfies yi = ∑_{j∈S} yj pji .
In the second part, we will show that 0 < yi < ∞ for every i ∈ S.
Part 1. To show that yi = ∑_{j∈S} yj pji , we analyze q0i^(n) defined in property (P3):
q0i^(n) = P0 (X1 ≠ 0, X2 ≠ 0, · · · , Xn−1 ≠ 0, Xn = i)
= ∑_{j≠0} P0 (X1 ≠ 0, X2 ≠ 0, · · · , Xn−1 = j, Xn = i)
= ∑_{j≠0} P0 (Xn = i | X1 ≠ 0, X2 ≠ 0, · · · , Xn−1 = j) P0 (X1 ≠ 0, X2 ≠ 0, · · · , Xn−1 = j),
where the last factor equals q0j^(n−1) . Therefore,
q0i^(n) = ∑_{j≠0} P (Xn = i | Xn−1 = j) q0j^(n−1)   (Markov property)
= ∑_{j≠0} q0j^(n−1) pji .
Thus,
yi = ∑_{n=1}^{∞} q0i^(n)
= p0i + ∑_{n=2}^{∞} q0i^(n)
= p0i + ∑_{n=2}^{∞} ∑_{j≠0} q0j^(n−1) pji
= p0i + ∑_{n=1}^{∞} ∑_{j≠0} q0j^(n) pji
= p0i + ∑_{j≠0} ( ∑_{n=1}^{∞} q0j^(n) ) pji   (the inner sum equals yj )
= y0 p0i + ∑_{j≠0} yj pji   (using y0 = 1)
= ∑_{j∈S} yj pji .
Part 2. Now we show that 0 < yi < ∞. First note that y0 = 1, so we only need to consider i ≠ 0.
Because the Markov chain is irreducible, for each state i there exists a number n(i) ≥ 1 such that p0i^(n(i)) > 0.
Then, using the fact that yT = yT P implies yT = yT P^{n(i)} ,
yi = ∑_{j∈S} yj pji^(n(i)) = y0 p0i^(n(i)) + ∑_{j≠0} yj pji^(n(i)) > 0.
To show that yi < ∞, we prove by contradiction. Assume that yi = ∞. Because the Markov chain is
irreducible, there exists a constant k(i) such that pi0^(k(i)) > 0. Then
y0 = ∑_{j∈S} yj pj0^(k(i)) = yi pi0^(k(i)) + ∑_{j≠i} yj pj0^(k(i)) = ∞,
contradicting y0 = 1.
Proposition 3.13 The invariant measure of an irreducible and recurrent chain is unique up to a multiplica-
tive factor.
Proposition 3.14 An irreducible, recurrent and homogeneous Markov chain is positive recurrent ⇔ all of
its invariant measures y satisfy ∑_{i∈S} yi < ∞. (Recall that yi = E0 [ ∑_{n=1}^{∞} I(Xn = i)I(n ≤ T0 ) ].)
To see why positive recurrence is important, consider the 1-D random walk on all the integers Z, which is
transient if p ≠ q and recurrent if p = q = 0.5. This Markov chain has an invariant measure
yT = [. . . , 1, 1, 1, . . . ] for any p and q, since P is the doubly infinite tridiagonal matrix with pi,i+1 = p,
pi,i−1 = q, and all other entries 0:
     ⎡ ⋱  ⋱  ⋱          ⎤
P =  ⎢ ⋯  q  0  p  ⋯     ⎥
     ⎢    ⋯  q  0  p  ⋯  ⎥
     ⎣       ⋱  ⋱  ⋱     ⎦
so each column sums to q + p = 1.
Since this measure is not normalizable (the state space is Z), the 1-D random walk cannot be positive
recurrent. Thus, we see that an irreducible homogeneous Markov chain can have an invariant measure and
still be transient or null recurrent.
Lemma 3.15 Let {Xn } be a homogeneous Markov chain with state space S and n-step transition probability
matrix P^n = {pij^(n) }. If i ∈ S is a transient state, then limn→∞ pji^(n) = 0 for all j ∈ S.
Proof: This proof relies on a simple fact – if ∑_{n=1}^{∞} pji^(n) < ∞, then limn→∞ pji^(n) = 0. Thus, we only need to show
that ∑_{n=1}^{∞} pji^(n) < ∞ when i ∈ S is a transient state.
By definition,
∑_{n=1}^{∞} pji^(n) = ∑_{n=1}^{∞} Pj (Xn = i) = ∑_{n=1}^{∞} Ej (I(Xn = i)) = Ej ( ∑_{n=1}^{∞} I(Xn = i) ) = Ej (Ni ).
So we only need to compute Ej (Ni ). Recall from the proof of Proposition 3.8 that Pj (Ni ≥ k + 1) = fji fii^k .
Using formula (3.5), we obtain
Ej (Ni ) = ∑_{k=0}^{∞} Pj (Ni ≥ k + 1) = ∑_{k=0}^{∞} fji fii^k .
Because the state i is transient, fii < 1, so the above summation becomes
Ej (Ni ) = ∑_{k=0}^{∞} fji fii^k = fji / (1 − fii ) < ∞,
Theorem 3.16 (Stationary Distribution Criterion) An irreducible homogeneous Markov chain is pos-
itive recurrent if and only if it has a stationary distribution. Moreover, if the stationary distribution
π T = [π1 , π2 , . . . ] exists, it is unique and πi > 0 for all i ∈ S.
Proof: ⇒:
By Propositions 3.12 and 3.14, the vector y defined in Proposition 3.12 is an invariant measure with ∑_{i∈S} yi <
∞. Thus, the probability vector π = y / ∑_{i∈S} yi is a stationary distribution.
The uniqueness follows from Proposition 3.13.
⇐:
To prove this direction, we use proof by contradiction. Because recurrence is a communication class property
(Proposition 3.11) and the Markov chain is irreducible, the fact that one state is transient implies every state
is transient. Let π be a stationary distribution and assume that the Markov chain is transient.
By Lemma 3.15, limn→∞ pji^(n) = 0 for any state j ∈ S. Since π is a stationary distribution, π T = π T P^n .
Using the dominated convergence theorem (we can exchange summation and limit)[1],
πi = limn→∞ πi = limn→∞ ∑_{j∈S} πj pji^(n) = ∑_{j∈S} πj limn→∞ pji^(n) = ∑_{j∈S} πj × 0 = 0,
which contradicts the fact that π is a probability vector. Hence the chain is recurrent, and positive recurrence
then follows from Propositions 3.13 and 3.14.
[1] This works because πj pji^(n) ≤ πj for every n and ∑_{j∈S} πj = 1.
In the above case, we are working with a state space S that may contain an infinite number of states.
In many realistic scenarios, the number of states is finite. Does the finiteness of the state space give us any
benefits? The answer is yes – and it gives us a huge benefit.
Theorem 3.17 An irreducible homogeneous Markov chain on a finite state space is positive recurrent.
Therefore, it always has a stationary distribution.
Proof:
We first prove that the chain is recurrent, by contradiction. Assume that the chain is
transient. In the proof of Lemma 3.15, we have shown that if a state i is transient, then
∑_{n=1}^{∞} pji^(n) = Ej (Ni ) = ∑_{k=0}^{∞} fji fii^k = fji / (1 − fii ) < ∞.
Summing over the finitely many states i ∈ S gives ∑_{i∈S} ∑_{n=1}^{∞} pji^(n) < ∞; but exchanging the sums yields
∑_{n=1}^{∞} ∑_{i∈S} pji^(n) = ∑_{n=1}^{∞} 1 = ∞, a contradiction. Hence the chain is recurrent. Moreover, the invariant
measure y of Proposition 3.12 has finitely many (finite) entries, so ∑_{i∈S} yi < ∞ and the chain is positive
recurrent by Proposition 3.14.
Finally, we end this lecture with the relation between the return time and the stationary distribution.
Theorem 3.18 Let {Xn } be an irreducible homogeneous positive recurrent Markov chain. Then
πi = 1 / Ei (Ti ),
where π = (π1 , · · · , πs ) is the stationary distribution of {Xn } and Ti = inf{n ≥ 1 : Xn = i} is the return
time to state i.
Proof: Define a vector y such that yi = E0 ( ∑_{n=1}^{∞} I(Xn = i)I(n ≤ T0 ) ). We already know that such a vector
is an invariant measure and πi = yi / ∑_{j∈S} yj .
Now we consider the case i = 0. Then y0 = 1 by property (P1). Moreover, ∑_{i∈S} yi = E0 (T0 ) due to property (P2).
Thus, π0 = y0 / ∑_{i∈S} yi = 1 / E0 (T0 ).
Because state 0 is just a reference state, we can apply the same argument to any other state. Thus, we
conclude that πi = 1 / Ei (Ti ) for each i ∈ S.
Here is a summary of the classification of states. Recall that fii = Pi (Ti < ∞) is the probability of returning to
i given that we start at i, and Ei (Ti ) is the expected return time. State i is called:
1. Recurrent if fii = 1.
2. Transient if fii < 1.
3. Positive Recurrent if fii = 1 and Ei (Ti ) < ∞.
4. Null Recurrent if fii = 1 and Ei (Ti ) = ∞.
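Theorem 3.18 is easy to check by simulation: the average return time to a state should match 1/πi. A minimal sketch for a two-state chain (the values of p and q below are illustrative):

```python
import random

random.seed(0)
p, q = 0.3, 0.2                  # illustrative: P(0 -> 1) = p, P(1 -> 0) = q
pi0 = q / (p + q)                # stationary probability of state 0

def step(i):
    # One transition of the two-state chain.
    if i == 0:
        return 1 if random.random() < p else 0
    return 0 if random.random() < q else 1

# Average return time to state 0 over many excursions started at 0.
trials, total = 20000, 0
for _ in range(trials):
    i, t = step(0), 1
    while i != 0:
        i, t = step(i), t + 1
    total += t

print(total / trials, 1 / pi0)   # both close to (p + q)/q = 2.5
```

The Monte Carlo average of T0 agrees with 1/π0 = (p + q)/q, illustrating that the stationary weight of a state is the reciprocal of its mean return time.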