
STAT 516: Stochastic Modeling of Scientific Data Autumn 2024

Lecture 3: Discrete-Time Markov Chain – Part I


Instructor: Yen-Chi Chen

These notes are partially based on those of Mathias Drton.

3.1 Introduction

Before introducing Markov chains, we first discuss stochastic processes. A stochastic process is a family
of RVs Xn indexed by n ∈ T . Note that sometimes people write Xt with t ∈ T . The set T
is called the index set. There are two common types of stochastic processes:

• T is discrete. For instance, T = {0, 1, 2, 3, · · · }. Then Xn is called a discrete-time stochastic process.

• T is continuous. For instance, T = [0, ∞). Then the process is called a continuous-time stochastic process.

Each random variable Xn can have a discrete, continuous, or mixed distribution. For example, in a queue
Xn could represent the time that the n-th customer waits after arrival before receiving service, with a
distribution that has an atom at zero but is otherwise continuous. Usually, each Xn takes its values in the
same set, which is called the state space and denoted S. Therefore, Xn ∈ S.
We will focus on discrete time stochastic processes with a discrete state space in this course.
Let {Xn : n = 0, 1, 2, ...} be a discrete time stochastic process and the state space S is discrete. The
probability model of this process is determined by the joint distribution function:

P (X0 = x0 , X1 = x1 , · · · , Xn = xn )

for all n = 0, 1, ... and x0 , x1 , ... ∈ S.


In general, this joint distribution function can be arbitrary, so it is very complex. We need some additional
modeling assumptions on the joint distribution to make it simple enough to analyze. One particular
example is the IID assumption – X0 , X1 , · · · are IID. In this case, the joint distribution function
factorizes into a product of identical marginal PMFs.
However, the IID assumption is often too strong. For instance, when modeling genetic
drift, assuming the generations are IID is not a good model. Here are two examples where the IID
assumption does not work.
Example 1: Excess number of heads over tails in tossing a coin. Assume we toss a coin n times
and let Xn denote the excess number of heads over tails. Clearly, Xn ∈ {· · · , −2, −1, 0, 1, 2, · · · }. Assume
that each coin is tossed independently, then the process {Xn } has a one-step memory such that

P (x0 , · · · , xn ) = P (xn |xn−1 ) × P (xn−1 |xn−2 ) × · · · × P (x1 |x0 ) × P (x0 ),

where P (x0 , · · · , xn ) = P (X0 = x0 , X1 = x1 , · · · , Xn = xn ). Clearly, P (x0 , x1 ) ≠ P (x0 )P (x1 ), so the process
is not independent, although P (x0 , x2 |x1 ) = P (x0 |x1 )P (x2 |x1 ).


Example 2: Precipitation level. Let the precipitation level of day n be Xn where Xn = 0 (dry) or 1 (wet).
Assume that the precipitation level of a day given all the past history only depends on the precipitation level
of the last two days. Namely,

P (xn |xn−1 , xn−2 , · · · , x0 ) = P (xn |xn−1 , xn−2 ).

Then the joint distribution function is

P (x0 , · · · , xn ) = P (xn |xn−1 , xn−2 ) × P (xn−1 |xn−2 , xn−3 ) × · · · × P (x2 |x1 , x0 ) × P (x1 |x0 ) × P (x0 ).

Again, it is clear that the RVs X0 , X1 , · · · , Xn are not IID. Although this model shows that the process has
a two-step memory, we can reformulate it to a one-step memory process by defining a sequence of random
vectors Y0 , Y1 , · · · , Yn , · · · such that Yn = (Xn , Xn+1 ). Then

P (yn |yn−1 , yn−2 , · · · , y0 ) = P (yn |yn−1 )

so the process {Yn } has a one-step memory.


The above two examples motivate us to study processes with a one-step memory. Such a stochastic
process is known as a Markov chain.

3.2 Markov Chain

A discrete time stochastic process {Xn : n = 0, 1, 2, · · · } is called a Markov chain if for every x0 , x1 , · · · , xn−2 , i, j ∈
S and n ≥ 0,
P (Xn = i|Xn−1 = j, · · · , X0 = x0 ) = P (Xn = i|Xn−1 = j)
whenever both sides are well-defined. The Markov chain has a one-step memory.
If the transition probability P (Xn = j|Xn−1 = i) = pij is independent of n, we call {Xn } a ho-
mogeneous Markov chain. Otherwise we call it an inhomogeneous Markov chain. For a homogeneous
Markov chain,

∑_{j∈S} pij = 1,  pij ≥ 0

for every i, j. Note that sometimes people write pij = pi→j , where pi→j stands for the probability of
moving from state i to state j.
Because S is a discrete set, we often label it as S = {1, 2, 3, · · · , s}, and the elements {pij : i, j = 1, · · · , s}
form an s × s matrix P = {pij }. P is called the transition (probability) matrix (t.p.m.). The properties
of a homogeneous Markov chain imply that

P ≥ 0, P1s = 1s , (3.1)

where 1s = (1, 1, 1, 1, · · · , 1)T is the vector of 1’s. Note that any matrix satisfying equation (3.1) is called a
stochastic matrix.
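The two conditions in equation (3.1) are easy to verify numerically. Below is a minimal Python sketch (the function name is our own, not from the notes) that checks whether a given matrix is a stochastic matrix:

```python
import numpy as np

def is_stochastic(P, tol=1e-10):
    """Check the two conditions in (3.1): P >= 0 entrywise and P 1_s = 1_s."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= -tol) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.2, 0.8],
              [0.5, 0.5]])
print(is_stochastic(P))   # entries are nonnegative and every row sums to 1
```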
Example 3: SIS (Susceptible-Infected-Susceptible) model.
Suppose we observe an individual over a sequence of days n = 1, 2, . . . and classify this individual each day
as Xn = I if infected and Xn = S if susceptible.

We would like to construct a stochastic model for the sequence {Xn : n = 1, 2, ...}. One possibility is to
assume that the Xn ’s are independent with P (Xn = I) = 1 − P (Xn = S) = α. However, this model is not
very realistic since we know from experience that the individual is more likely to stay infected if he or she is
already infected.
Since Markov chains are the simplest models that allow us to relax independence, we proceed by defining a
transition probability matrix:
        I      S
P =  I  1−α    α
     S   β    1−β
It can be helpful to visualize the transitions that are possible (have positive probability) by a transition
diagram:
[Transition diagram: I → S with probability α and S → I with probability β; the chain stays at I with probability 1−α and at S with probability 1−β.]

Example 4: Ehrenfest Model of Diffusion.


We start with N particles in a closed box, divided into two compartments that are in contact with each
other so that particles may move between compartments. At each time epoch, one particle is chosen uni-
formly at random and moved from its current compartment to the other compartment. Let Xn be the
number of particles in compartment 1 (say) at step n. This stochastic process is Markov by construction.

[Figure: a box with two compartments; the depicted configuration has Xn = 3 particles in compartment 1.]

Transition probabilities of the Markov chain are:



pij = i/N        for j = i − 1,
      1 − i/N    for j = i + 1,
      0          otherwise.

The probability of transfer depends on the number of particles in each compartment. For N = 2 we have
states 0, 1, 2 and t.p.m.

P = [  0    1    0
      1/2   0   1/2
       0    1    0  ]

and the transition diagram

[Transition diagram: 0 ↔ 1 ↔ 2, where 0 → 1 and 2 → 1 have probability 1 and 1 → 0, 1 → 2 have probability 1/2 each.]
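The general Ehrenfest t.p.m. can be built directly from the formula for pij above. A small Python sketch (the function name is ours), which for N = 2 reproduces the matrix shown:

```python
import numpy as np

def ehrenfest_tpm(N):
    """t.p.m. of the Ehrenfest model on states 0, 1, ..., N:
    p_{i,i-1} = i/N and p_{i,i+1} = 1 - i/N, all other entries zero."""
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i > 0:
            P[i, i - 1] = i / N        # a particle leaves compartment 1
        if i < N:
            P[i, i + 1] = 1 - i / N    # a particle enters compartment 1
    return P

print(ehrenfest_tpm(2))
```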

Example 5: Snoqualmie Falls Precipitation.



There are data on the precipitation (in inches), recorded by the UW Weather Service, at Snoqualmie Falls
in the years 1948–1983. We examine the data for January only and consider dry (= 0) and wet (= 1) only. If
we condition on the state on January 1st we obtain the frequencies of the four different transitions as:
            0       1     Total
    0      186     123     309
    1      128     643     771
 Total    (314)   (766)   1080
For example, there were 123 occasions on which a wet day followed a dry day.
From the table of frequencies we can compute the relative frequencies of transitions:

P̂ = [ 0.602  0.398
      0.166  0.834 ]
This is an estimate (hence the hat!) of the t.p.m.
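The estimate P̂ is simply the table of relative frequencies of one-step transitions. As a sketch of the computation (using a short made-up 0/1 path, not the actual Snoqualmie data):

```python
import numpy as np

def estimate_tpm(x, n_states=2):
    """Relative-frequency estimate of the t.p.m. from one observed path x."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(x[:-1], x[1:]):   # count each observed transition a -> b
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# a tiny illustrative dry/wet path (hypothetical data)
x = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(estimate_tpm(x))
```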

3.3 Properties of Markov chains

Here are some important properties of a Markov chain.


Property: joint probability. Suppose we observe a finite realization of the discrete Markov chain and
want to compute the probability of this random event:
P (Xn = in , Xn−1 = in−1 , · · · , X1 = i1 , X0 = i0 )
= P (Xn = in |Xn−1 = in−1 , · · · , X0 = i0 ) P (Xn−1 = in−1 , · · · , X0 = i0 )
= pin−1 ,in × P (Xn−1 = in−1 |Xn−2 = in−2 , . . . , X0 = i0 ) × P (Xn−2 = in−2 , . . . , X0 = i0 )
= ···
= p0,i0 pi0 ,i1 pi1 ,i2 · · · pin−2 ,in−1 pin−1 ,in ,
where p0,i = P (X0 = i) and p0 = (p0,1 , p0,2 , . . . )T is the initial distribution of {Xn }. Thus, every
Markov chain is fully specified by its transition probability matrix P and its initial distribution p0 .
Property: Markov property. The Markov chain has a powerful property called the Markov property –
the distribution of Xm+n given a set of previous states depends only on the latest available state. Suppose
we observe the chain at times 0, 1, · · · , n and are interested in the distribution of Xm+n . Then
P (Xm+n = j|Xn = i, Xn−1 = in−1 , ..., X0 = i0 ) = P (Xm+n = j|Xn = i). (3.2)

To give an intuition about how we obtain the Markov property, consider a simple case where n = 1 and
m = 2.
P (X3 = i3 |X1 = i1 , X0 = i0 ) = ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 , X0 = i0 )
  = ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) P (X2 = i2 |X1 = i1 , X0 = i0 )
  (!) = ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 ) P (X2 = i2 |X1 = i1 )
  = ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 )
  = P (X3 = i3 |X1 = i1 ).

To argue the equality marked (!), observe that

P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) = P (X3 = i3 |X2 = i2 ).

But we also have that


P (X3 = i3 |X2 = i2 , X1 = i1 ) = ∑_{i0} P (X3 = i3 , X0 = i0 |X2 = i2 , X1 = i1 )
  = ∑_{i0} P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) P (X0 = i0 |X2 = i2 , X1 = i1 )
  = P (X3 = i3 |X2 = i2 ) ∑_{i0} P (X0 = i0 |X2 = i2 , X1 = i1 )
  = P (X3 = i3 |X2 = i2 ).

Property: conditional independence. We can represent the Markov chain using a simple graphical
model:

[Graphical model: a chain X0 — X1 — X2 — · · · — Xn−1 — Xn .]

The claim of the Markov property is now obvious from the theorem on conditional independence and graphical
factorization. Indeed, the latest available state serves as a separating set.
Using the graph representation, we obtain an interesting property about a Markov chain: the past and the
future are independent given the present.
To see this, again we consider a simple case where n = 2 and we have X0 , X1 , X2 . Here X0 denotes the past,
X1 denotes the present, and X2 denotes the future. Then

P (X0 = i0 , X2 = i2 |X1 = i1 ) = P (X0 = i0 , X1 = i1 , X2 = i2 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 , X0 = i0 ) P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 ) P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 ) P (X0 = i0 |X1 = i1 )

for any i0 , i1 , i2 ∈ S. Namely, X0 and X2 are conditionally independent given X1 .
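This factorization can also be checked numerically. A small Python sketch (the chain and initial distribution below are arbitrary choices, not from the notes) verifies that the conditional table of (X0 , X2 ) given X1 equals the product of its margins:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p0 = np.array([0.5, 0.5])

# joint[i0, i1, i2] = p0[i0] * P[i0, i1] * P[i1, i2]
joint = p0[:, None, None] * P[:, :, None] * P[None, :, :]

for i1 in range(2):
    slice_ = joint[:, i1, :]        # P(X0 = i0, X1 = i1, X2 = i2)
    cond = slice_ / slice_.sum()    # P(X0, X2 | X1 = i1)
    marg0 = cond.sum(axis=1)        # P(X0 | X1 = i1)
    marg2 = cond.sum(axis=0)        # P(X2 | X1 = i1)
    assert np.allclose(cond, np.outer(marg0, marg2))
print("past and future are independent given the present")
```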

3.4 n-step Transition Probability and Chapman-Kolmogorov Equation

For a Markov chain, we define the n-step transition probability as


p^(n)_ij = P (Xn = j|X0 = i).

The n-step transition probability is time invariant.



Lemma 3.1 Let {Xn } be a homogeneous Markov chain and let p^(n)_ij be the n-step transition probability.
Then for any k = 0, 1, 2, · · · ,

P (Xn+k = j|Xk = i) = p^(n)_ij .

Proof:

P (Xn+k = j|Xk = i) = ∑_{i_{k+1} ,··· ,i_{n+k−1}} P (Xn+k = j|Xn+k−1 = i_{n+k−1} ) × · · · × P (Xk+1 = i_{k+1} |Xk = i)
  = ∑_{i_{k+1} ,··· ,i_{n+k−1}} P (Xn = j|Xn−1 = i_{n+k−1} ) × · · · × P (X1 = i_{k+1} |X0 = i)
  = P (Xn = j|X0 = i) = p^(n)_ij .

The n-step transition probabilities are related to each other via the famous Chapman-Kolmogorov Equation.

Lemma 3.2 Let {Xn } be a homogeneous Markov chain and let p^(n)_ij be the n-step transition probability.
Then for any n, m = 0, 1, 2, · · ·

p^(n+m)_ij = ∑_{k∈S} p^(n)_ik p^(m)_kj .    (3.3)

Proof:

p^(n+m)_ij = P (Xn+m = j|X0 = i)
  = ∑_{k∈S} P (Xn+m = j, Xn = k|X0 = i)
  = ∑_{k∈S} P (Xn+m = j|Xn = k, X0 = i) P (Xn = k|X0 = i)
  = ∑_{k∈S} P (Xn+m = j|Xn = k) P (Xn = k|X0 = i)    (Markov property)
  = ∑_{k∈S} P (Xm = j|X0 = k) P (Xn = k|X0 = i)    (time-invariant property)
  = ∑_{k∈S} p^(n)_ik p^(m)_kj .

The Chapman-Kolmogorov Equation (equation (3.3)) also implies

Forward equation:  p^(n+1)_ij = ∑_k p^(n)_ik pkj , for n = 1, 2, . . . , and
Backward equation: p^(n+1)_ij = ∑_k pik p^(n)_kj , for n = 1, 2, . . . .

The forward equation singles out the final step and has the initial state i fixed. The equation is most useful
when interest centers on the p^(n)_ij ’s for a particular i but all values of j. Conversely, the backward equation
singles out the change from the initial state i and has the final state j fixed. This equation is useful when
interest is in the p^(n)_ij ’s for a particular j but all values of i. The backward equation can be interesting, in
particular, when there is an absorbing state j from which there is no escape (pjj = 1).

If we collect the n-step transition probabilities into the matrix P(n) = {p^(n)_ij }, then Kolmogorov’s forward
and backward equations can be rewritten in matrix form as

P(n+1) = P(n) P = PP(n) ,

where P(1) = P. Therefore, P(n) = Pn .
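The identity P(n) = Pn turns the Chapman-Kolmogorov equation into a one-line numerical check. A sketch with an arbitrary 2 × 2 stochastic matrix (our own choice):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)
P8 = np.linalg.matrix_power(P, 8)

# Chapman-Kolmogorov in matrix form: P^(3+5) = P^(3) P^(5)
assert np.allclose(P8, P3 @ P5)
print(P8)
```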


This matrix form also implies a useful property of the marginal distribution of Xn . Assume that X0 has
distribution p0 = (p0,1 , p0,2 , · · · , p0,s )T . Let pn = (pn,1 , · · · , pn,s )T be the marginal distribution of Xn , i.e.,
pn,j = P (Xn = j). Then

pn,j = P (Xn = j) = ∑_i P (Xn = j, X0 = i) = ∑_i P (Xn = j|X0 = i) P (X0 = i) = ∑_i p0,i p^(n)_ij .

In matrix form,

pn^T = p0^T Pn .

Example 3: SIS model (revisited).

Recall that the SIS model has the transition probability matrix

        0      1
P =  0  1−α    α
     1   β    1−β

Note that we use {0, 1} to denote the states I and S of the SIS model.
Assume that the initial distribution is p0 = (1 − α, α)T , i.e., P (X0 = 0) = 1 − α. Moreover, assume that β = 1 − α,
so that both rows of P equal (1 − α, α). The distribution of X1 is then

p1^T = p0^T P = ((1 − α)^2 + α(1 − α), α(1 − α) + α^2 ) = (1 − α, α) = p0^T .

What will be the distribution of Xn ? Using the matrix form, we know that

pTn = pT0 Pn = pT1 Pn−1 = pT0 Pn−1 = · · · = pT0 .

Therefore, P (Xn = 0) = 1 − α and P (Xn = 1) = α for all n = 1, 2, 3, · · · .
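This invariance is easy to confirm numerically. A sketch with an arbitrary value of α (here α = 0.3, our own choice):

```python
import numpy as np

alpha = 0.3
beta = 1 - alpha
# with beta = 1 - alpha, both rows of P equal (1 - alpha, alpha)
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
p0 = np.array([1 - alpha, alpha])

pn = p0.copy()
for _ in range(10):
    pn = pn @ P          # one step of p_n^T = p_{n-1}^T P
assert np.allclose(pn, p0)   # the distribution never changes
print(pn)
```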


A more interesting fact is the joint distribution of X0 , X1 , · · · , Xn :

P (X0 = i0 , X1 = i1 , · · · Xn = in ) = p0i0 pi0 i1 pi1 i2 · · · pin−1 in


= αi0 (1 − α)1−i0 αi1 (1 − α)1−i1 αi2 (1 − α)1−i2 · · · αin (1 − α)1−in ,

which is the joint PMF of IID Bernoulli random variables with parameter α. Therefore, in this special
case, the Markov chain reduces to IID Bernoulli RVs.
Note that in general, when all rows of a t.p.m. are the same, the corresponding Markov chain is a sequence of
IID RVs whose distribution is given by any (hence the first) row of the t.p.m.

Example 5: Snoqualmie Falls Precipitation (revisited).


In the Snoqualmie Falls precipitation problem, we have the t.p.m.

P̂ = [ 0.602  0.398
      0.166  0.834 ]

If we consider the 2-step transition probability,

P̂^2 = [ 0.428  0.572
        0.238  0.762 ].

When we consider the n-step transition probability with n large (in this case, n ≥ 10), the
n-step transition probability matrix becomes (to three decimals)

P̂^n ≈ [ 0.294  0.706
        0.294  0.706 ].

This implies that the initial distribution is uninformative – whether it is dry or wet on Jan 18th tells us little
about Jan 27th.
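The convergence of the rows can be reproduced with matrix powers; a short Python sketch:

```python
import numpy as np

P_hat = np.array([[0.602, 0.398],
                  [0.166, 0.834]])

# the rows of P_hat^n approach a common limit as n grows
for n in (2, 10):
    print(n, np.linalg.matrix_power(P_hat, n).round(3))
```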
Recall from the previous example that when the rows of a t.p.m. are the same, the corresponding random
variables are IID. Therefore, if we consider the sequence of RVs Y0 (n) = Xk , Y1 (n) = Xk+n ,
Y2 (n) = Xk+2n , · · · , then Y0 (n), Y1 (n), Y2 (n), · · · become IID as n → ∞.
After seeing this example, one may conjecture that the limit P∞ = limn→∞ Pn always exists and has equal rows.
However, this is not always true. A counterexample is

P = [ 0  1
      1  0 ].

3.5 First Step Analysis of Markov Chain

First-step analysis is a general strategy for solving many Markov chain problems by conditioning on the
first step of the Markov chain. We demonstrate this technique on a simple example – the Gambler’s ruin
problem.
Gambler’s ruin problem: Two players bet one dollar in each round. Player 1 wins with probability α
and loses with probability β = 1 − α. We assume that player 1 starts with a dollars and player 2 starts with
b dollars. Let Xn be the fortune of player 1 after n rounds. Xn can take values from 0 to a + b:

pij = P (Xn+1 = j|Xn = i) = α if j = i + 1,  β if j = i − 1,  0 otherwise,

for 0 < i < a + b; the states 0 and a + b are absorbing (the game stops there).

Let T be the number of rounds until one of the players loses all of his/her money. Because of the randomness of
this model, T is also a random variable. We are interested in the probability that player 1 wins the game,
which occurs when XT = a + b. Clearly, this probability depends on the initial amount of money that
player 1 has, so we denote it

u(a) = P (XT = a + b|X0 = a).



Note that u(0) = 0 and u(a + b) = 1. We may view u(a) as the probability that the chain is absorbed into
the state a + b at the hitting time T when the chain starts at X0 = a.
First step analysis proceeds as follows:

u(a) = P (XT = a + b|X0 = a)
  = ∑_{j} P (XT = a + b, X1 = j|X0 = a)
  = ∑_{j} P (XT = a + b|X1 = j, X0 = a) P (X1 = j|X0 = a)
  = ∑_{j} P (XT = a + b|X1 = j) P (X1 = j|X0 = a)    (Markov property)
  = P (XT = a + b|X1 = a + 1) P (X1 = a + 1|X0 = a) + P (XT = a + b|X1 = a − 1) P (X1 = a − 1|X0 = a)
  = u(a + 1)α + u(a − 1)β.

Therefore, we have u(a) = u(a + 1)α + u(a − 1)β with two boundary conditions: u(0) = 0, u(a + b) = 1.
Because α + β = 1,
(α + β)u(a) = u(a) = u(a + 1)α + u(a − 1)β

which implies
α(u(a + 1) − u(a)) = β(u(a) − u(a − 1)).

Define v(a) = u(a) − u(a − 1). Then the above leads to

αv(a + 1) = βv(a)  ⇒  v(a + 1) = (β/α) v(a).
By telescoping,

v(a + 1) = (β/α) v(a) = (β/α)^2 v(a − 1) = · · · = (β/α)^a v(1).
Using the boundary condition u(0) = 0,

u(a) = u(a) − u(0) = ∑_{j=1}^{a} [u(j) − u(j − 1)] = ∑_{j=1}^{a} v(j) = ∑_{j=0}^{a−1} (β/α)^j v(1)

     = v(1) × a                               if α = β,
     = v(1) × (1 − (β/α)^a ) / (1 − β/α)      if α ≠ β.

To find v(1), we use the other boundary condition:

1 = u(a + b) = u(a + b) − u(0) = ∑_{j=1}^{a+b} [u(j) − u(j − 1)] = ∑_{j=1}^{a+b} v(j)

  = v(1) × (a + b)                              if α = β,
  = v(1) × (1 − (β/α)^{a+b} ) / (1 − β/α)       if α ≠ β.

Therefore,

v(1) = 1/(a + b)                           if α = β,
v(1) = (1 − β/α) / (1 − (β/α)^{a+b})       if α ≠ β,

and

u(a) = a/(a + b)                                 if α = β,
u(a) = (1 − (β/α)^a ) / (1 − (β/α)^{a+b})        if α ≠ β.
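The closed-form answer can be cross-checked against a direct numerical solution of the first-step recursion u(i) = αu(i + 1) + βu(i − 1) with the two boundary conditions. A sketch (the function names are ours):

```python
import numpy as np

def ruin_formula(a, b, alpha):
    """Closed-form u(a) = P(player 1 wins) from the first-step analysis."""
    beta = 1 - alpha
    if abs(alpha - beta) < 1e-12:
        return a / (a + b)
    r = beta / alpha
    return (1 - r**a) / (1 - r**(a + b))

def ruin_solve(a, b, alpha):
    """Solve u(i) = alpha*u(i+1) + (1-alpha)*u(i-1), u(0) = 0, u(a+b) = 1."""
    n = a + b
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = 1.0              # boundary: u(0) = 0
    A[n, n] = 1.0              # boundary: u(a+b) = 1
    rhs[n] = 1.0
    for i in range(1, n):      # interior recursion
        A[i, i] = 1.0
        A[i, i + 1] = -alpha
        A[i, i - 1] = -(1 - alpha)
    return np.linalg.solve(A, rhs)[a]

print(ruin_formula(3, 7, 0.55), ruin_solve(3, 7, 0.55))
```

The two numbers agree, which is a useful sanity check on the telescoping argument above.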

3.6 Classification of States

We now turn to a classification of the states of a Markov chain that is crucial to understanding the behavior
of Markov chains.
An equivalence relation “∼” is a binary relation between elements of a set satisfying

1. Reflexivity: i ∼ i for all i

2. Symmetry: i ∼ j ⇒ j ∼ i

3. Transitivity: i ∼ j, j ∼ k ⇒ i ∼ k.

For a set S and a ∈ S, {s ∈ S : s ∼ a} is called an equivalence class. Equivalence relations will allow us to
split Markov chain state spaces into equivalence classes.
State j is accessible from state i (i → j) if there exists m ≥ 0 such that p^(m)_ij > 0. We say that i communicates
with j (i ↔ j) if j is accessible from i and i is accessible from j. A set of states C is a communicating
class if every pair of states in C communicates with each other, and no state in C communicates with any
state not in C.

Proposition 3.3 Communication of states is an equivalence relation.

Proof: Reflexivity and symmetry are clear. To prove transitivity, let i ↔ j and j ↔ k. We then want to
show that i ↔ k.

Note that i → j if and only if the transition diagram contains a path from i to j. So there is a path from
i to j and from j to k. Concatenate the two to obtain a path from i to k, which testifies to the fact that
i → k.
Analogously, we have k → i and conclude that i ↔ k.

A set of states C is closed if ∑_{j∈C} pij = 1 for all i ∈ C.
Example 6. Consider a Markov chain with the following transition diagram:

[Transition diagram on states 1, . . . , 6.]

Then {1, 2, 3}, {4, 5}, and {6} are the communication classes.
A Markov chain {Xn } is called irreducible if it has only one communication class, i.e., i ↔ j for all i and j.
For state i, di = gcd{n ≥ 1 : p^(n)_ii > 0} is called its period, where gcd = greatest common divisor
and di = +∞ if p^(n)_ii = 0 for all n ≥ 1.
Example 7. Consider the example with state space S = {0, 1, 2, ...} and Xn such that

P (Xn+1 = i|Xn = 0) = p if i = 1,  1 − p if i = 0,  0 otherwise,

and for j ≠ 0,

P (Xn+1 = i|Xn = j) = p if i = j + 1,  1 − p if i = j − 1,  0 otherwise.

Then d2 = gcd{2, 4, 5, 6, ...} = 1, even though 1 is not in the list (think about why p^(5)_22 > 0).
Example 8: Simple (1-D) Random Walk on the Integers. Consider another example with state space
Z. Let Xn be the position at time n. Then

P (Xn+1 = i − 1|Xn = i) = q and P (Xn+1 = i + 1|Xn = i) = p

with p = 1 − q. Suppose we start at 0. It is clear that we cannot return to 0 after an odd number of
steps, so p^(2n+1)_00 = 0 for all n, i.e.,

d0 = gcd{n ≥ 1 : p^(n)_00 > 0} = gcd{2, 4, 6, . . . } = 2.

Proposition 3.4 Period is a communication class property. Namely, i ↔ j ⇒ di = dj .

Proof: i ↔ j ⇒ there exist n1 , n2 such that p^(n1)_ij > 0 and p^(n2)_ji > 0. Then, by Chapman-Kolmogorov:

p^(n1+n2)_ii = ∑_k p^(n1)_ik p^(n2)_ki ≥ p^(n1)_ij p^(n2)_ji > 0.
k

Consequently, di | n1 + n2 . For example, suppose n1 = 3 and n2 = 5; then n1 + n2 = 8, so di divides 8,
i.e. di could be 8, 4, 2, or 1: we know the chain can return after 8 time steps, but the period could be smaller.

Note: a|b means a divides b, i.e. there is an integer c s.t. b = ac.


Now, take any n such that p^(n)_jj > 0. Then

p^(n1+n2+n)_ii = ∑_k p^(n+n1)_ik p^(n2)_ki ≥ p^(n+n1)_ij p^(n2)_ji
  = [ ∑_k p^(n1)_ik p^(n)_kj ] p^(n2)_ji ≥ p^(n1)_ij p^(n)_jj p^(n2)_ji > 0.

Hence, di | n1 + n2 + n.

Together, n1 + n2 = c1 di and n1 + n2 + n = c2 di imply that n = (c2 − c1 )di ; as a result, di | n for all n
such that p^(n)_jj > 0.

Since di is a common divisor of the set {n : p^(n)_jj > 0} and dj is the greatest common divisor of the same set (by
definition of the period), di ≤ dj .

By symmetry, dj ≤ di , so di = dj .

In a communication class, all states have the same period. Since all states communicate in an irreducible
Markov chain, it makes sense to define the period of such a Markov chain. If di = 1, state i is called
aperiodic. An irreducible Markov chain with period 1 is also called aperiodic.

Theorem 3.5 (Lattice Theorem (Brémaud p.75)) Suppose d is the period of an irreducible Markov
chain. Then for all states i, j there exists m ∈ {0, . . . , d − 1} and k ≥ 0 such that

p^(m+nd)_ij > 0,  ∀n ≥ k.

Theorem 3.6 (Cyclic Classes) For any irreducible Markov chain one can find a unique partition of S
into d classes C0 , C1 , ..., Cd−1 such that for all k and for all i ∈ Ck ,

∑_{j∈Ck+1} pij = 1,

where, by convention, Cd = C0 , and where d is maximal (that is, there is no other such partition C′0 , C′1 , ..., C′d′−1
with d′ > d).

Proof: Fix a state i and classify states j by the value of m in Lattice Theorem.
The number d ≥ 1 is the period of the chain. The classes C0 , C1 , ..., Cd−1 are called the cyclic classes.
Example 8: Simple (1-D) Random Walk on the Integers (revisited). The random walk on S = Z has
cyclic classes C0 and C1 , the sets of even and odd integers.

3.7 Strong Markov Property

The Markov property states that the random variable at time n + m, conditional on the state at time n,
is independent of the states at times prior to n. However, what if the time n is random?
Say we are interested in the behavior of XT +m given XT , where T is the first time that the Markov chain
hits state 0. Do we still have the Markov property?
Not every random time preserves the Markov property. Recall that the Markov property states that for
any non-random m < n < k, Xm ⊥ Xk |Xn . Let {Xn } be a Markov chain with state space
S = {1, 2, 3} and consider a random time

T = inf{n ≥ 1 : (Xn−1 , Xn , Xn+1 ) = (2, 1, 3) or (3, 1, 2)}.

To see why the Markov property can fail for random m, n, k, consider m = T − 1, n = T , k = T + 1. By the
definition of T ,

P (Xk = 3|Xn = 1, Xm = 2) = P (XT +1 = 3|XT = 1, XT −1 = 2) = 1,

whereas when Xm = XT −1 = 3 we must have XT +1 = 2, so P (Xk = 3|Xn = 1, Xm = 3) = 0. Thus, the
conditional distribution of Xk given Xn depends on Xm , which violates the Markov property.
Therefore, it is crucial to identify a class of random time such that the Markov property holds. It turns out
that there is a simple class of random times that has the Markov property. This class is called the stopping
time.
A random variable τ ∈ {1, 2, 3, · · · } ∪ {∞} is called a stopping time if the event {τ = m} can be expressed
in terms of X0 , X1 , · · · , Xm . Intuitively, a stopping time is a random time that we can recognize at the
moment it occurs.
Examples 9: Stopping times.

• Return time. Ti = inf{n ≥ 1 : Xn = i} is a stopping time because {Ti = m} = {X1 ≠ i, · · · , Xm−1 ≠ i, Xm = i}. Ti is interpreted as the first time the chain returns to state i.

• Successive returns. Let τk be the time of the k-th return to state i (note that τ1 = Ti ). Then τk is a stopping time because

{τk = m} = { ∑_{n=1}^{m} I(Xn = i) = k, Xm = i }.

• Counterexample – a non-stopping time: τ = inf{n ≥ 1 : Xn+1 = i} is not a stopping time because
{τ = m} = {X2 ≠ i, · · · , Xm ≠ i, Xm+1 = i} depends on Xm+1 .
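Return times are also easy to simulate. A sketch that estimates E0 (T0 ) by Monte Carlo for an arbitrary two-state chain (the chain and the seed below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.5],
              [0.3, 0.7]])

def return_time(P, i, rng, n_max=10_000):
    """Sample T_i = inf{n >= 1 : X_n = i} for a chain started at X_0 = i."""
    x = i
    for n in range(1, n_max + 1):
        x = rng.choice(len(P), p=P[x])   # one step of the chain
        if x == i:
            return n
    return np.inf

samples = [return_time(P, 0, rng) for _ in range(2000)]
print(np.mean(samples))   # Monte Carlo estimate of E_0(T_0)
```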

Stopping times are a very important class of random variables in statistics. Many statistical procedures
involve a stopping time. For instance, suppose we perform a sequence of experiments and stop when we
observe a certain behavior, such as a strong signal or enough anomalies. Then the stopping point (which
determines the sample size) is a stopping time. If we want to use data from such a sequence of experiments,
we need theorems about stopping times (such as the optional sampling theorem).

Theorem 3.7 (Strong Markov Property) Let {Xn } be a homogeneous Markov chain with transition
probability matrix P = {pij } and let τ be a stopping time with respect to {Xn }. Then for any integer k,

P (Xτ +k = j|Xτ = i, Xℓ = iℓ , 0 ≤ ℓ < τ ) = P (Xk = j|X0 = i) = p^(k)_ij

and

P (Xτ +k = j|Xτ = i) = P (Xk = j|X0 = i) = p^(k)_ij .

Proof: We first prove the first equality.

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = P (Xτ +k = j, Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ ) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = [ ∑_{r=1}^{∞} P (Xτ +k = j, Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) ] / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ ).    (3.4)

Now, because τ is a stopping time, the event {τ = r} can be expressed as a function of X0 , · · · , Xr , so the
Markov property implies

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) = P (Xr+k = j|Xr = i) = p^(k)_ij .

Therefore, equation (3.4) becomes

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = ∑_{r=1}^{∞} P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = p^(k)_ij ∑_{r=1}^{∞} P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = p^(k)_ij .

The second equality follows simply from the first:

P (Xτ +k = j|Xτ = i) = ∑_{r=1}^{∞} P (Xτ +k = j|Xr = i, τ = r) P (Xr = i, τ = r) / P (Xτ = i)
  = p^(k)_ij ∑_{r=1}^{∞} P (Xr = i, τ = r) / P (Xτ = i)
  = p^(k)_ij .

3.8 Stationary distribution

It is often of great interest to study the limiting behavior of a Markov chain Xn as n → ∞. Here, for
simplicity, we assume that our Markov chain is homogeneous. One feature of the limiting behavior is that Xn and
Xn+1 should have the same distribution when n is large. So we are interested in whether a Markov
chain will eventually converge to a ‘stable’ distribution (formally, we will call it a stationary distribution).
In particular, given a Markov chain, we would like to know:

• does this chain have a stationary distribution?



• if so, what is the stationary distribution?


• and is this stationary distribution unique?

It turns out that to answer these questions, we will use concepts related to the return time. Thus, we start
by understanding properties of the return time.

3.8.1 Return Times


Let Ni = ∑_{n=1}^{∞} I(Xn = i) denote the number of visits of {Xn } to state i, not counting the initial state. We
also adopt the following notation:

P (·|X0 = i) = Pi (·),  E(·|X0 = i) = Ei (·).

Note that the quantity Ni may equal ∞. It is finite with positive probability if there are
states such that, once the chain enters one of them, it never goes back to state i. Later we will
describe this phenomenon using the concepts of transient states and recurrent states.
Example 6 (revisited). Consider a Markov chain with the following transition diagram:

[Transition diagram on states 1, . . . , 6, as in Example 6.]

As can be seen easily, once the Markov chain enters states {4, 5, 6}, it never comes back to any of {1, 2, 3}.
Thus, N1 is finite with positive probability.
Let Ti = inf{n ≥ 1 : Xn = i} be the return time. Then the following events can be defined using either Ti
or Ni :

{Ti = ∞} = {Ni = 0},  {Ti < ∞} = {Ni > 0}.

These will be useful later.
We then define fji = Pj (Ti < ∞) = Pj (Ni > 0) to be the probability of reaching state i in finite
time when the chain starts at state j. Note that because Pj (Ti = ∞) + Pj (Ti < ∞) = 1, we have
fii = Pi (Ti < ∞) and Pi (Ti = ∞) = 1 − fii .

Proposition 3.8

Pj (Ni = r) = fji f_ii^{r−1} (1 − fii ) if r ≥ 1,   and   Pj (Ni = 0) = 1 − fji .

Proof: The case r = 0 is simple because {Ni = 0} = {Ti = ∞}. Thus, Pj (Ni = 0) = Pj (Ti = ∞) =
1 − Pj (Ti < ∞) = 1 − fji .
For the remaining cases, we give a proof by induction. Before doing that, we first investigate
Pj (Ni = r) for r > 0. Let τr be the r-th return time, and note that {Xτr = i} = {Ni ≥ r}.

Then

Pj (Ni = r) = Pj (Ni = r, Xτr = i)
  = Pj (Ni = r|Xτr = i) Pj (Xτr = i)
  = Pj ( ∑_{t=τr +1}^{∞} I(Xt = i) = 0 | Xτr = i ) Pj (Xτr = i)
  = Pi ( ∑_{t=1}^{∞} I(Xt = i) = 0 | X0 = i ) Pj (Xτr = i)    (strong Markov property)
  = Pi (Ni = 0) Pj (Ni ≥ r)
  = Pi (Ti = ∞) Pj (Ni ≥ r).

Therefore, we conclude

Pj (Ni = r) = Pi (Ti = ∞) Pj (Ni ≥ r) = (1 − fii ) Pj (Ni ≥ r).

To start the induction, consider r = 1. Pj (Ni ≥ 1) = 1 − Pj (Ni = 0) = fji , so Pj (Ni = 1) =
(1 − fii )fji , which agrees with the claimed formula for r = 1.
Assume the formula holds for r ≤ k; we show that it holds for r = k + 1. The induction hypothesis means

Pj (Ni = r) = fji f_ii^{r−1} (1 − fii ) for r = 1, · · · , k,   and   Pj (Ni = 0) = 1 − fji .
For the case r = k + 1, we use the fact that

Pj (Ni = k + 1) = (1 − fii ) Pj (Ni ≥ k + 1),

so all we need is the probability Pj (Ni ≥ k + 1). This quantity can be calculated via

Pj (Ni ≥ k + 1) = 1 − Pj (Ni ≤ k)
  = 1 − (1 − fji ) − ∑_{r=1}^{k} fji f_ii^{r−1} (1 − fii )
  = fji − fji (1 − fii )(1 + fii + f_ii^2 + · · · + f_ii^{k−1} )
  = fji − fji (1 − fii ) (1 − f_ii^k )/(1 − fii )
  = fji f_ii^k .

Thus,

Pj (Ni = k + 1) = (1 − fii ) Pj (Ni ≥ k + 1) = fji f_ii^k (1 − fii ),

which is the formula for r = k + 1. Thus, by induction, the result holds.

The above formula also gives an interesting result for the case of 'starting from state i, returning to state i' when we set j = i:

    P_i(N_i = r) = f_ii^r (1 − f_ii),   P_i(N_i > r) = f_ii^{r+1},

where f_ii = P_i(T_i < ∞).
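As a numerical sanity check of the geometric formula (my illustration, not part of the original notes), consider a hypothetical chain in which state 1 returns to itself with probability f and otherwise falls into an absorbing state 0, so that f_11 = f and Proposition 3.8 with j = i = 1 predicts P_1(N_1 = r) = f^r (1 − f):

```python
import random
from collections import Counter

# Illustration (not from the notes): state 1 returns to itself w.p. f and
# otherwise falls into the absorbing state 0, so f_11 = f and Proposition 3.8
# with j = i = 1 predicts P_1(N_1 = r) = f^r (1 - f).

def simulate_returns(f, rng):
    """Run the chain from X_0 = 1 and count visits to state 1 at times n >= 1."""
    state, visits = 1, 0
    while state == 1:
        state = 1 if rng.random() < f else 0
        if state == 1:
            visits += 1
    return visits

rng = random.Random(0)
f, trials = 0.6, 200_000
counts = Counter(simulate_returns(f, rng) for _ in range(trials))
for r in range(4):
    print(f"P_1(N_1={r}): empirical {counts[r] / trials:.4f}, "
          f"formula {f**r * (1 - f):.4f}")
```

The empirical frequencies match f^r (1 − f) up to Monte Carlo error.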
We have seen in many situations that Ti and Ni are closely related. Here is another result about their relationship.

Corollary 3.9
Pi (Ni = ∞) = 1 ⇔ Pi (Ti < ∞) = 1
and
Pi (Ti < ∞) < 1 ⇔ Pi (Ni = ∞) = 0 ⇔ Ei (Ni ) < ∞.

Corollary 3.9 links the finiteness of Ti and Ni and also relates it to the expectation. Corollary 3.9 becomes very useful when combined with the following formula for the expectation of an integer-valued random variable X:

    E(X) = ∑_{t=1}^∞ P(X ≥ t).        (3.5)
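As a quick illustration of (3.5) (mine, not from the notes), take X geometric on {1, 2, . . . } with success probability p, so that P(X ≥ t) = (1 − p)^{t−1} and E(X) = 1/p:

```python
# Numerical check of (3.5) for a geometric X on {1, 2, ...} with
# P(X = k) = (1 - p)^{k-1} p: here P(X >= t) = (1 - p)^{t-1} and E(X) = 1/p.
p = 0.3
tail_sum = sum((1 - p) ** (t - 1) for t in range(1, 500))
print(tail_sum)  # ~ 1/p = 3.3333...
```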

3.8.2 Recurrence and Transience

Based on the return time property, we classify a state i as

• recurrent (persistent), if P_i(T_i < ∞) = f_ii = 1;
• transient, otherwise.

Furthermore, a recurrent state is called

• positive recurrent, if E_i(T_i) < ∞;
• null recurrent, otherwise.

Note that: either Pi (Ni = ∞) = 0 or Pi (Ni = ∞) = 1, with nothing in between (if fii < 1, then Pi (Ni =
∞) = 0; if fii = 1, then Pi (Ni = ∞) = 1). This, together with Corollary 3.9, implies that Ei (Ni ) = ∞ ⇐⇒
Pi (Ni = ∞) = 1.
Note that:
fii = Pi (Ti < ∞) = 1 ⇐⇒ Pi (Ni = ∞) = 1.
In other words, if a Markov chain returns to state i in finite time, then the chain visits this state infinitely
often.

Proposition 3.10 State i is recurrent ⇐⇒ ∑_{n=1}^∞ p_ii^{(n)} = ∞.

Proof: State i is recurrent ⇐⇒ P_i(T_i < ∞) = f_ii = 1 ⇐⇒ P_i(N_i = ∞) = 1 by Corollary 3.9.

It is easy to see that P_i(N_i = ∞) = 1 ⇐⇒ E_i(N_i) = ∞.

Using equation (3.5), E_i(N_i) = ∑_{n=1}^∞ p_ii^{(n)}, and the result follows.

Proposition 3.11 Recurrence is a communication class property, i.e. if i ↔ j and i is recurrent, then j is
recurrent.
Proof: Homework.
Example: Gambler's Ruin+. Recall that in Gambler's ruin, the game ends when Xn hits 0 or a + b. Now we extend the problem so that the game does not end when a player loses/takes all the money; instead, the value of Xn stays the same once it hits 0 or a + b. Namely, Xn = 0 ⇒ Xn+1 = 0 and Xn = a + b ⇒ Xn+1 = a + b. In this case p_00^{(k)} = p_{a+b,a+b}^{(k)} = 1 for all k = 1, 2, · · · . Therefore,

    ∑_{n=1}^∞ p_00^{(n)} = ∑_{n=1}^∞ p_{a+b,a+b}^{(n)} = ∑_{n=1}^∞ 1 = ∞.

Hence, 0 and a + b are recurrent states. Once they are reached, we stay there forever. Let q be the probability that player 1 loses a single round. Consider state 1. If player 1 loses the next round, the chain is stuck at 0 forever. Namely, T_1 = ∞ because we can never come back. So P_1(T_1 = ∞) ≥ q, which implies

    P_1(T_1 < ∞) = 1 − P_1(T_1 = ∞) ≤ 1 − q < 1   if q ∈ (0, 1).

Note that the inequality P_1(T_1 = ∞) ≥ q is due to the fact that even if player 1 wins, the game may end at a + b, so the return time to state 1 may still be infinite. Therefore, by definition, 1 is a transient state. Since states {1, . . . , a + b − 1} form a communication class, all states in this class are also transient. These states are visited a finite number of times before absorption into state 0 or a + b.
Example 8: 1-D Random Walk (revisited). Let Xn be a random walk on the set of all integers Z such that

    p_ij = p if j = i + 1,   p_ij = q := 1 − p if j = i − 1.

Let's study recurrence of state 0. We know that p_00^{(2n+1)} = 0 for all n ≥ 0 and that, conditional on X_0 = 0,

    X_{2n} =^d ξ_1 + · · · + ξ_{2n},

where ξ_1, . . . , ξ_{2n} are i.i.d. with P(ξ_i = 1) = 1 − P(ξ_i = −1) = p. Hence,

    p_00^{(2n)} = P(X_{2n} = 0 | X_0 = 0) = C(2n, n) p^n q^n,

where C(2n, n) = (2n)!/(n!)^2 is the binomial coefficient. Recall that Stirling's formula says that n! ∼ n^{n+1/2} e^{−n} √(2π), meaning that

    lim_{n→∞} n! / ( n^{n+1/2} e^{−n} √(2π) ) = 1.

Therefore,

    p_00^{(2n)} = (2n)!/(n! n!) p^n q^n
               ∼ (2n)^{2n+1/2} e^{−2n} √(2π) / ( n^{2n+1} e^{−2n} 2π ) (pq)^n
               = 2^{2n+1/2} n^{2n+1/2} / ( n^{2n+1} 2^{1/2} √π ) (pq)^n
               = 2^{2n} (pq)^n / √(πn)
               = (4pq)^n / √(πn).

We deduce that

    ∑_{n=1}^∞ p_00^{(n)} = ∑_{n=1}^∞ p_00^{(2n)} = ∞  ⇔  4pq = 1  ⇔  p = q = 1/2,

where we note that 4pq ≤ 1 always, with equality if and only if p = q. (Ratio Test: Let ∑_{n=1}^∞ a_n be a series which satisfies lim_{n→∞} |a_{n+1}/a_n| = k. If k > 1 the series diverges; if k < 1 it converges. When 4pq < 1 the ratio test gives convergence; when 4pq = 1 the series is ∑ 1/√(πn), which diverges.) Conclusion: only the symmetric random walk is recurrent on Z. Interestingly, the symmetric random walk on Z^2 is also recurrent, but it is transient on Z^n for n ≥ 3; see Brémaud (1999, p. 98).
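This dichotomy can be seen numerically. The sketch below (my illustration) computes partial sums of ∑_n p_00^{(2n)} using the term ratio a_n / a_{n−1} = 2(2n − 1)/n · pq, which avoids huge factorials; for p = 0.5 the partial sums keep growing, while for p = 0.6 they settle near 1/√(1 − 4pq) − 1 = 4:

```python
# Partial sums of sum_n p_00^{(2n)} = sum_n C(2n, n) (pq)^n, computed via the
# term ratio a_n / a_{n-1} = 2(2n - 1)/n * p * q to avoid huge factorials.
def partial_sum_p00(p, n_max):
    q = 1 - p
    term, total = 1.0, 0.0
    for n in range(1, n_max + 1):
        term *= 2 * (2 * n - 1) / n * p * q
        total += term
    return total

print(partial_sum_p00(0.5, 2000))  # keeps growing with n_max (recurrent)
print(partial_sum_p00(0.6, 2000))  # ~ 1/sqrt(1 - 4pq) - 1 = 4 (transient)
```

The closed form for the transient case uses the generating function ∑_{n≥0} C(2n, n) x^n = (1 − 4x)^{−1/2}.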
3.8.3 Invariant Measures

With the knowledge about recurrence, we are able to talk about invariant measures and stationary distributions of a stochastic matrix.

A vector x ≠ 0 is called an invariant measure of a stochastic matrix P if

• ∞ > x_i ≥ 0 for each i, and
• x^T P = x^T, i.e., x_i = ∑_j x_j p_ji for each i.

A probability vector π on a Markov chain state space is called a stationary distribution of a stochastic matrix P if π^T P = π^T, i.e., π_i = ∑_j π_j p_ji for each i.

The equation x^T P = x^T or π^T P = π^T is also called the global balance equations – the probability flow in equals the flow out. Note that for an invariant measure x such that c = ∑_i x_i < ∞, c^{-1} x is a stationary distribution. But it may happen that c = ∞ for some invariant measure, so one cannot always normalize it.
Example 9: Two-State Markov Chain. Consider a Markov chain with two states and a transition probability matrix

    P = [ 1−p   p  ]
        [  q   1−q ],   0 < p < 1, 0 < q < 1.

The global balance equations are

    [π_0, π_1] P = [π_0, π_1],

or

    (1 − p)π_0 + q π_1 = π_0
    p π_0 + (1 − q)π_1 = π_1
    ⇒ p π_0 = q π_1 ⇒ π_0 = (q/p) π_1.

Using that π_0 + π_1 = 1, we obtain

    (q/p) π_1 + π_1 = 1 ⇒ π_1 = p/(p + q)

and deduce that the global balance equations have the unique solution

    π^T = [ q/(p+q), p/(p+q) ],

which is the stationary distribution.
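The closed form can be confirmed by power iteration. This is a small sketch of mine (the values p = 0.3, q = 0.8 are arbitrary); since 0 < p, q < 1 the chain is aperiodic and π_n^T = π_0^T P^n converges to the stationary distribution:

```python
# Power iteration pi <- pi P from an arbitrary starting distribution
# converges to the stationary distribution (q/(p+q), p/(p+q)).
p, q = 0.3, 0.8  # arbitrary values in (0, 1)
P = [[1 - p, p], [q, 1 - q]]
pi = [1.0, 0.0]
for _ in range(200):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]
print(pi)  # ~ [q/(p+q), p/(p+q)] = [0.7272..., 0.2727...]
```

The convergence is geometric at rate |1 − p − q|, the second eigenvalue of P.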


Example: Gambler's Ruin+ (simple version). Let the total fortune of both players be a + b = 4. Then

    P = [ 1  0  0  0  0 ]
        [ q  0  p  0  0 ]
        [ 0  q  0  p  0 ]
        [ 0  0  q  0  p ]
        [ 0  0  0  0  1 ].

By inspection, the vectors π_α^T = [α, 0, 0, 0, 1 − α] satisfy the global balance equations π_α^T P = π_α^T for any α ∈ (0, 1). So the Gambler's ruin chain has an uncountable number of stationary distributions.
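The claim π_α^T P = π_α^T is easy to check directly; here is a quick verification (illustrative, with p = 0.6 chosen arbitrarily):

```python
# Direct check that pi_alpha = [alpha, 0, 0, 0, 1 - alpha] is stationary for
# the absorbing Gambler's Ruin+ chain with a + b = 4 (here p = 0.6, q = 0.4).
p, q = 0.6, 0.4
P = [[1, 0, 0, 0, 0],
     [q, 0, p, 0, 0],
     [0, q, 0, p, 0],
     [0, 0, q, 0, p],
     [0, 0, 0, 0, 1]]

def left_mult(pi, P):
    """Compute the row vector pi P."""
    return [sum(pi[j] * P[j][i] for j in range(len(P))) for i in range(len(P))]

for alpha in (0.1, 0.5, 0.9):
    pi = [alpha, 0, 0, 0, 1 - alpha]
    out = left_mult(pi, P)
    assert all(abs(a - b) < 1e-12 for a, b in zip(out, pi))
print("pi_alpha P = pi_alpha for all alpha tested")
```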
Here, we see a case where a Markov chain has an infinite number of stationary distributions. In some cases it may not even have a stationary distribution! So, returning to our original questions, we would like to know: (i) when does a Markov chain have a stationary distribution? (ii) how do we find a stationary distribution? and (iii) when is the stationary distribution unique?

The following proposition partially answers the first question. Note that a Markov chain is recurrent if all its states are recurrent.

Proposition 3.12 Let {Xn} be an irreducible, recurrent, homogeneous Markov chain with transition probability matrix P. For each i ∈ S define

    y_i = E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ],

where 0 is an arbitrary reference state and T_0 = inf{n ≥ 1 : X_n = 0} is the first return time to 0. Then y_i ∈ (0, ∞) for all i ∈ S, and y^T = [y_0, y_1, . . . ] is an invariant measure of P.

Note: For i ≠ 0, y_i is the expected number of visits to state i before returning to 0.
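Before the proof, Proposition 3.12 can be sanity-checked by simulation. The sketch below (mine; the 3-state transition matrix is made up) estimates y_i by averaging visit counts over excursions from state 0 and then checks the invariance y^T P ≈ y^T:

```python
import random

# Monte Carlo sketch of Proposition 3.12 on a made-up 3-state irreducible
# chain: estimate y_i = E_0[# visits to i during {1, ..., T_0}] by averaging
# over excursions from state 0, then check invariance y P ~ y.
P = [[0.2, 0.5, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
rng = random.Random(1)

def step(i):
    return rng.choices(range(3), weights=P[i])[0]

trials = 200_000
visits = [0.0, 0.0, 0.0]
for _ in range(trials):
    state = step(0)            # X_1
    while True:
        visits[state] += 1     # one visit at some n in {1, ..., T_0}
        if state == 0:         # returned to 0: excursion over
            break
        state = step(state)
y = [v / trials for v in visits]
yP = [sum(y[j] * P[j][i] for j in range(3)) for i in range(3)]
print(y)   # y[0] = 1 exactly, by property (P1) below
print(yP)  # close to y, as invariance requires
```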


Before starting the proof, we note the following three properties.

(P1) When i = 0,

    y_0 = E_0[ ∑_{n=1}^∞ I(X_n = 0) I(n ≤ T_0) ] = 1

because for n ≥ 1, X_n = 0 if and only if n = T_0.

(P2)

    ∑_{i∈S} y_i = ∑_{i∈S} E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ]
                = E_0[ ∑_{n=1}^∞ ∑_{i∈S} I(X_n = i) I(n ≤ T_0) ]
                = E_0[ ∑_{n=1}^∞ I(n ≤ T_0) ]
                = E_0(T_0).

(P3) For any i ∈ S, we define

    q_{0i}^{(n)} ≡ E_0( I(X_n = i) I(n ≤ T_0) ) = P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} ≠ 0, X_n = i)

to be the probability of visiting state i at time point n before returning to state 0. Thus,

    y_i = ∑_{n=1}^∞ E_0( I(X_n = i) I(n ≤ T_0) ) = ∑_{n=1}^∞ q_{0i}^{(n)}

and q_{0i}^{(1)} = E_0( I(X_1 = i) I(1 ≤ T_0) ) = p_0i.
P
Proof: This proof consists of two parts. In the first part, we prove that each y_i satisfies y_i = ∑_{j∈S} y_j p_ji. In the second part, we show that 0 < y_i < ∞ for every i ∈ S.
Part 1. To show that y_i = ∑_{j∈S} y_j p_ji, we analyze q_{0i}^{(n)} defined in property (P3):

    q_{0i}^{(n)} = P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} ≠ 0, X_n = i)
                = ∑_{j≠0} P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j, X_n = i)
                = ∑_{j≠0} P_0(X_n = i | X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j) · P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j)
                  [the second factor is q_{0j}^{(n−1)}]
                = ∑_{j≠0} P(X_n = i | X_{n−1} = j) q_{0j}^{(n−1)}   (Markov property)
                = ∑_{j≠0} q_{0j}^{(n−1)} p_ji.

Thus,

    y_i = ∑_{n=1}^∞ q_{0i}^{(n)}
        = p_0i + ∑_{n=2}^∞ q_{0i}^{(n)}
        = p_0i + ∑_{n=2}^∞ ∑_{j≠0} q_{0j}^{(n−1)} p_ji
        = p_0i + ∑_{n=1}^∞ ∑_{j≠0} q_{0j}^{(n)} p_ji
        = p_0i + ∑_{j≠0} ( ∑_{n=1}^∞ q_{0j}^{(n)} ) p_ji   [the inner sum is y_j]
        = y_0 p_0i + ∑_{j≠0} y_j p_ji   (using y_0 = 1)
        = ∑_{j∈S} y_j p_ji.

Part 2. Now we show that 0 < y_i < ∞. First note that y_0 = 1, so we only need to consider i ≠ 0. Because the Markov chain is irreducible, for each state i there exists a number n(i) ≥ 1 such that p_0i^{(n(i))} > 0. Then, using the fact that y^T = y^T P implies y^T = y^T P^{n(i)},

    y_i = ∑_{j∈S} y_j p_ji^{(n(i))} = y_0 p_0i^{(n(i))} + ∑_{j≠0} y_j p_ji^{(n(i))} ≥ p_0i^{(n(i))} > 0.

To show that y_i < ∞, we argue by contradiction. Assume that y_i = ∞. Because the Markov chain is irreducible, there exists a constant k(i) such that p_i0^{(k(i))} > 0. Then

    y_0 = ∑_{j∈S} y_j p_j0^{(k(i))} = y_i p_i0^{(k(i))} + ∑_{j≠i} y_j p_j0^{(k(i))} = ∞,

a contradiction to y_0 = 1. Thus, y_i < ∞.

Proposition 3.13 The invariant measure of an irreducible and recurrent chain is unique up to a multiplicative factor.

Proof: See Brémaud (1999, p. 102).

Proposition 3.14 An irreducible, recurrent and homogeneous Markov chain is positive recurrent ⇔ all of its invariant measures y satisfy ∑_{i∈S} y_i < ∞.

Proof: By Proposition 3.12, there is an invariant measure y with

    y_i = E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ].

Moreover, by Proposition 3.13, this invariant measure is unique up to a multiplicative factor. So what remains is to examine when ∑_{i∈S} y_i < ∞.

Using property (P2),

    ∑_{i∈S} y_i = E_0(T_0).

Therefore, positive recurrent ⇐⇒ E_0(T_0) < ∞ ⇐⇒ ∑_{i∈S} y_i < ∞.

To see why positive recurrence is important, consider the 1-D random walk on all integers Z, which is transient when p ≠ q and recurrent when p = q = 0.5. For any p and q, this Markov chain has an invariant measure y^T = [. . . , 1, 1, 1, . . . ]: here P is the doubly infinite tridiagonal matrix with q on the subdiagonal and p on the superdiagonal, so

    (y^T P)_i = ∑_{j∈Z} y_j p_ji = p_{i−1,i} + p_{i+1,i} = p + q = 1 = y_i.

Since this measure is not normalizable (the state space is Z), the 1-D random walk cannot be positive recurrent. Thus, we see that an irreducible homogeneous Markov chain can have an invariant measure and still be transient or null recurrent.

Lemma 3.15 Let {Xn} be a homogeneous Markov chain with state space S and n-step transition probability matrix P^n = {p_ij^{(n)}}. If i ∈ S is a transient state, then lim_{n→∞} p_ji^{(n)} = 0 for all j ∈ S.

Proof: This proof relies on a simple trick – if ∑_{n=1}^∞ p_ji^{(n)} < ∞, then lim_{n→∞} p_ji^{(n)} = 0. Thus, we only need to show that ∑_{n=1}^∞ p_ji^{(n)} < ∞ when i ∈ S is a transient state.

By definition,

    ∑_{n=1}^∞ p_ji^{(n)} = ∑_{n=1}^∞ P_j(X_n = i) = ∑_{n=1}^∞ E_j( I(X_n = i) ) = E_j( ∑_{n=1}^∞ I(X_n = i) ) = E_j(N_i).
So we can switch our goal to bounding E_j(N_i).

Because N_i is a RV taking integer values, we can rewrite its expectation as

    E_j(N_i) = ∑_{k=1}^∞ P_j(N_i ≥ k) = ∑_{k=0}^∞ P_j(N_i ≥ k + 1).

So we only need to compute each P_j(N_i ≥ k + 1). Recall from the proof of Proposition 3.8 that

    P_j(N_i ≥ k + 1) = f_ji f_ii^k.

We obtain

    E_j(N_i) = ∑_{k=0}^∞ P_j(N_i ≥ k + 1) = ∑_{k=0}^∞ f_ji f_ii^k.

Because the state i is transient, f_ii < 1, so the above summation becomes

    E_j(N_i) = ∑_{k=0}^∞ f_ji f_ii^k = f_ji / (1 − f_ii) < ∞,

which is the desired result.
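Lemma 3.15 can be illustrated numerically with the Gambler's Ruin+ chain from earlier (a + b = 4), whose interior states 1, 2, 3 are transient. The sketch below (mine; p = 0.5 is an arbitrary choice) raises P to a large power and checks that the columns of the transient states vanish:

```python
# Illustration of Lemma 3.15 with the Gambler's Ruin+ chain (a + b = 4,
# p = q = 0.5): states 1, 2, 3 are transient, so p_{ji}^{(n)} -> 0 for
# i in {1, 2, 3}.  We raise P to a large power by repeated multiplication.
p, q = 0.5, 0.5
P = [[1, 0, 0, 0, 0],
     [q, 0, p, 0, 0],
     [0, q, 0, p, 0],
     [0, 0, q, 0, p],
     [0, 0, 0, 0, 1]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pn = P
for _ in range(199):  # Pn = P^200
    Pn = matmul(Pn, P)
print(Pn[1])  # mass only on the absorbing states 0 and 4
```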

Finally, we obtain the criterion for stationary distribution.

Theorem 3.16 (Stationary Distribution Criterion) An irreducible homogeneous Markov chain is pos-
itive recurrent if and only if it has a stationary distribution. Moreover, if the stationary distribution
π T = [π1 , π2 , . . . ] exists, it is unique and πi > 0 for all i ∈ S.

Proof: ⇒:
By Propositions 3.12 and 3.14, the vector y defined in Proposition 3.12 is an invariant measure with ∑_{i∈S} y_i < ∞. Thus, the probability vector π = y / ∑_{i∈S} y_i is a stationary distribution. Uniqueness follows from Proposition 3.13.

⇐:
To prove this direction, we argue by contradiction. Because recurrence is a communication class property (Proposition 3.11) and the Markov chain is irreducible, if one state is transient then every state is transient. Let π be a stationary distribution and assume that the Markov chain is transient.

By Lemma 3.15, lim_{n→∞} p_ji^{(n)} = 0 for any state j ∈ S. Since π is a stationary distribution, π^T = π^T P^n. Using the dominated convergence theorem (we can exchange summation and limit)^1,

    π_i = lim_{n→∞} π_i = lim_{n→∞} ∑_{j∈S} π_j p_ji^{(n)} = ∑_{j∈S} π_j lim_{n→∞} p_ji^{(n)} = ∑_{j∈S} π_j × 0 = 0

for every state i ∈ S. Then we conclude ∑_{i∈S} π_i = 0 ≠ 1, a contradiction to the definition of a stationary distribution. Thus, the Markov chain is recurrent, and then by Proposition 3.14, it is positive recurrent.

^1 This works because π_j p_ji^{(n)} ≤ π_j for every n and ∑_{j∈S} π_j = 1.
In the above results, we worked on a state space S that may contain infinitely many states. In many realistic scenarios the number of states is finite. Does finiteness of the state space give us any benefit? The answer is yes – and the benefit is huge.

Theorem 3.17 An irreducible homogeneous Markov chain on a finite state space is positive recurrent.
Therefore, it always has a stationary distribution.

Proof:
We first prove that the chain is recurrent, arguing by contradiction. Assume that the chain is transient. In the proof of Lemma 3.15, we showed that if a state i is transient, then

    ∑_{n=1}^∞ p_ji^{(n)} = E_j(N_i) = ∑_{k=0}^∞ f_ji f_ii^k = f_ji / (1 − f_ii) < ∞.

Because the state space is finite,

    ∑_{i∈S} ∑_{n=1}^∞ p_ji^{(n)} < ∞.

However, if we exchange the summations,

    ∑_{i∈S} ∑_{n=1}^∞ p_ji^{(n)} = ∑_{n=1}^∞ ∑_{i∈S} p_ji^{(n)} = ∑_{n=1}^∞ 1 = ∞,

which is a contradiction. So we conclude that the chain is recurrent.

To see that the chain is positive recurrent, note that Proposition 3.12 shows that there exists an invariant measure y. Because the state space is finite, ∑_{i∈S} y_i < ∞, so by Proposition 3.14, the chain is positive recurrent.

Finally, we end this lecture with the relation between the return time and the stationary distribution.

Theorem 3.18 Let {Xn} be an irreducible homogeneous positive recurrent Markov chain. Then

    π_i = 1 / E_i(T_i),

where π = (π_1, · · · , π_s) is the stationary distribution of {Xn} and T_i = inf{n ≥ 1 : X_n = i} is the return time to state i.
Proof: Define a vector y such that y_i = E_0( ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ). We already know that such a vector is an invariant measure and π_i = y_i / ∑_{j∈S} y_j.

Now we consider the case i = 0. Then y_0 = E_0( ∑_{n=1}^∞ I(X_n = 0) I(n ≤ T_0) ) = 1 by property (P1). Moreover, ∑_{i∈S} y_i = E_0(T_0) due to property (P2). Thus,

    π_0 = y_0 / ∑_{i∈S} y_i = 1 / E_0(T_0).

Because state 0 is just a reference state, we can apply the same argument to any other state. Thus, we conclude that π_i = 1 / E_i(T_i) for each i ∈ S.
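Theorem 3.18 is easy to check by simulation on the two-state chain of Example 9, whose stationary distribution is π = (q/(p+q), p/(p+q)). This is an illustrative sketch (the values p = 0.3, q = 0.8 are arbitrary):

```python
import random

# Simulation check of pi_i = 1/E_i(T_i) for the two-state chain of Example 9
# (values p = 0.3, q = 0.8 picked arbitrarily), where pi = (q/(p+q), p/(p+q)).
p, q = 0.3, 0.8
P = [[1 - p, p], [q, 1 - q]]
rng = random.Random(2)

def return_time(i):
    """Sample T_i = inf{n >= 1 : X_n = i} starting from X_0 = i."""
    state, n = i, 0
    while True:
        state = rng.choices((0, 1), weights=P[state])[0]
        n += 1
        if state == i:
            return n

trials = 100_000
est = {}
for i in (0, 1):
    mean_T = sum(return_time(i) for _ in range(trials)) / trials
    est[i] = 1 / mean_T
print(est)  # ~ {0: q/(p+q) = 0.727..., 1: p/(p+q) = 0.272...}
```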

Here is a short summary about what we have learned so far:


1. Irreducibility + recurrence ⇒ There exists an invariant measure that is unique up to a proportionality constant.
2. Irreducibility + positive recurrence ⇔ Irreducibility + there exists a stationary distribution π and it is unique. Moreover, when π exists, π_i > 0 and π_i = 1/E_i[T_i].
3. Irreducibility + finite state space ⇒ Irreducibility + positive recurrence.

Here is a summary about the classification of states. Recall fii = Pi (Ti < ∞) is the probability of return to
i given we start at i and Ei (Ti ) is the expected return time. State i is called:

1. Recurrent if fii = 1.
2. Transient if fii < 1.
3. Positive Recurrent if fii = 1 and Ei (Ti ) < ∞.

4. Null Recurrent if fii = 1 and Ei (Ti ) = ∞.


5. Periodic with period d_i if p_ii^{(n)} = 0 for all n not divisible by d_i, and d_i (> 1) is the greatest such integer.
6. Aperiodic if d_i = 1.

7. Ergodic if 3. and 6. apply.


8. Absorbing if pii = 1.
