
STAT 516: Stochastic Modeling of Scientific Data Autumn 2024

Lecture 3: Discrete-Time Markov Chain – Part I


Instructor: Yen-Chi Chen

These notes are partially based on those of Mathias Drton.

3.1 Introduction

Before introducing Markov chains, we first discuss stochastic processes. A stochastic process is a family
of RVs Xn indexed by n ∈ T . Note that sometimes people write Xt with t ∈ T . The set T
is called the index set. There are two common types of stochastic processes:

• T is discrete. For instance, T = {0, 1, 2, 3, · · · }. Then Xn is called a discrete-time stochastic process.

• T is continuous. For instance, T = [0, ∞). Then the process is called a continuous-time stochastic process.

Each random variable Xn can have a discrete, continuous, or mixed distribution. For example, in a queue
Xn could represent the time that the n-th customer waits after arrival before receiving service, with a
distribution that has an atom at zero but is otherwise continuous. Usually, each Xn takes its values in the
same set, which is called the state space and denoted S. Therefore, Xn ∈ S.
We will focus on discrete time stochastic processes with a discrete state space in this course.
Let {Xn : n = 0, 1, 2, ...} be a discrete time stochastic process and the state space S is discrete. The
probability model of this process is determined by the joint distribution function:

P (X0 = x0 , X1 = x1 , · · · , Xn = xn )

for all n = 0, 1, ... and x0 , x1 , ... ∈ S.


In general, this joint distribution function can be arbitrary, so it is very complex. We need some additional
modeling assumptions on the joint distribution to make it simple enough to analyze. One particular
example is the IID assumption – X0 , X1 , · · · are IID. In this case, the joint distribution function
factorizes into a product of identical marginal PMFs.
However, the IID assumption is often too strong. For instance, when modeling genetic
drift, assuming the generations are IID is not a good model. Here are two examples where the IID
assumption does not work.
Example 1: Excess number of heads over tails in tossing a coin. Assume we toss a coin n times
and let Xn denote the excess number of heads over tails. Clearly, Xn ∈ {· · · , −2, −1, 0, 1, 2, · · · }. Assume
that each coin is tossed independently, then the process {Xn } has a one-step memory such that

P (x0 , · · · , xn ) = P (xn |xn−1 ) × P (xn−1 |xn−2 ) × · · · × P (x1 |x0 ) × P (x0 ),

where P (x0 , · · · , xn ) = P (X0 = x0 , X1 = x1 , · · · , Xn = xn ). Clearly, P (x0 , x1 ) ≠ P (x0 )P (x1 ), so the process
is not independent, although P (x0 , x2 |x1 ) = P (x0 |x1 )P (x2 |x1 ).


Example 2: Precipitation level. Let the precipitation level of day n be Xn where Xn = 0 (dry) or 1 (wet).
Assume that the precipitation level of a day given all the past history only depends on the precipitation level
of the last two days. Namely,

P (xn |xn−1 , xn−2 , · · · , x0 ) = P (xn |xn−1 , xn−2 ).

Then the joint distribution function is

P (x0 , · · · , xn ) = P (xn |xn−1 , xn−2 ) × P (xn−1 |xn−2 , xn−3 ) × · · · × P (x2 |x1 , x0 ) × P (x1 |x0 ) × P (x0 ).

Again, it is clear that the RVs X0 , X1 , · · · , Xn are not IID. Although this model shows that the process has
a two-step memory, we can reformulate it to a one-step memory process by defining a sequence of random
vectors Y0 , Y1 , · · · , Yn , · · · such that Yn = (Xn , Xn+1 ). Then

P (yn |yn−1 , yn−2 , · · · , y0 ) = P (yn |yn−1 )

so the process {Yn } has a one-step memory.


The above two examples motivate us to study processes with a one-step memory. Such a stochastic
process is known as a Markov chain.

3.2 Markov Chain

A discrete time stochastic process {Xn : n = 0, 1, 2, · · · } is called a Markov chain if for every x0 , x1 , · · · , xn−2 , i, j ∈
S and n ≥ 0,
P (Xn = i|Xn−1 = j, · · · , X0 = x0 ) = P (Xn = i|Xn−1 = j)
whenever both sides are well-defined. The Markov chain has a one-step memory.
If the transition probability P (Xn = j|Xn−1 = i) = pij is independent of n, we call {Xn } a ho-
mogeneous Markov chain. Otherwise we call it an inhomogeneous Markov chain. For a homogeneous
Markov chain,

∑_{j∈S} pij = 1,  pij ≥ 0

for every i, j. Note that sometimes people write pij = pi→j , where pi→j stands for the probability of
moving from state i to state j.
Because S is a discrete set, we often label it as S = {1, 2, 3, · · · , s}, and the elements {pij : i, j = 1, · · · , s}
form an s × s matrix P = {pij }. P is called the transition (probability) matrix (t.p.m.). The properties
of a homogeneous Markov chain imply that

P ≥ 0, P1s = 1s , (3.1)

where 1s = (1, 1, 1, 1, · · · , 1)T is the vector of 1’s. Note that any matrix satisfying equation (3.1) is called a
stochastic matrix.
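The two conditions in equation (3.1) are easy to verify numerically. Below is a minimal Python sketch (the function name is our own, not from the notes) that checks whether a given matrix is a stochastic matrix:

```python
import numpy as np

def is_stochastic(P, tol=1e-10):
    """Check the two conditions in (3.1): P >= 0 entrywise and P 1_s = 1_s."""
    P = np.asarray(P, dtype=float)
    return bool(np.all(P >= -tol) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

P = np.array([[0.2, 0.8],
              [0.5, 0.5]])
print(is_stochastic(P))   # entries are nonnegative and every row sums to 1
```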
Example 3: SIS (Susceptible-Infected-Susceptible) model.
Suppose we observe an individual over a sequence of days n = 1, 2, . . . and classify this individual each day
as Xn = I if infected and Xn = S if susceptible.

We would like to construct a stochastic model for the sequence {Xn : n = 1, 2, ...}. One possibility is to
assume that the Xn ’s are independent with P (Xn = I) = 1 − P (Xn = S) = α. However, this model is not
very realistic since we know from experience that the individual is more likely to stay infected if he or she is
already infected.
Since Markov chains are the simplest models that allow us to relax independence, we proceed by defining a
transition probability matrix:
        I      S
P =  I  1−α    α
     S   β    1−β
It can be helpful to visualize the transitions that are possible (have positive probability) by a transition
diagram:
[Transition diagram: I → S with probability α and S → I with probability β; the chain stays at I with probability 1−α and at S with probability 1−β.]

Example 4: Ehrenfest Model of Diffusion.


We start with N particles in a closed box, divided into two compartments that are in contact with each
other so that particles may move between compartments. At each time epoch, one particle is chosen uni-
formly at random and moved from its current compartment to the other compartment. Let Xn be the
number of particles in compartment 1 (say) at step n. This stochastic process is Markov by construction.

[Figure: a box with two compartments; the depicted configuration has Xn = 3 particles in compartment 1.]

Transition probabilities of the Markov chain are:



pij = i/N        for j = i − 1,
      1 − i/N    for j = i + 1,
      0          otherwise.

The probability of transfer depends on the number of particles in each compartment. For N = 2 we have
states 0, 1, 2 and t.p.m.

P = [  0    1    0
      1/2   0   1/2
       0    1    0  ]

and the transition diagram

[Transition diagram: 0 ↔ 1 ↔ 2, where 0 → 1 and 2 → 1 have probability 1 and 1 → 0, 1 → 2 have probability 1/2 each.]
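The general Ehrenfest t.p.m. can be built directly from the formula for pij above. A small Python sketch (the function name is ours), which for N = 2 reproduces the matrix shown:

```python
import numpy as np

def ehrenfest_tpm(N):
    """t.p.m. of the Ehrenfest model on states 0, 1, ..., N:
    p_{i,i-1} = i/N and p_{i,i+1} = 1 - i/N, all other entries zero."""
    P = np.zeros((N + 1, N + 1))
    for i in range(N + 1):
        if i > 0:
            P[i, i - 1] = i / N        # a particle leaves compartment 1
        if i < N:
            P[i, i + 1] = 1 - i / N    # a particle enters compartment 1
    return P

print(ehrenfest_tpm(2))
```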

Example 5: Snoqualmie Falls Precipitation.



There are data on the precipitation (in inches), recorded by the UW Weather Service, at Snoqualmie Falls
in the years 1948–1983. We examine the data for January only and consider dry (= 0) and wet (= 1) only. If
we condition on the state on January 1st we obtain the frequencies of the four different transitions as:
            0       1     Total
    0      186     123     309
    1      128     643     771
 Total    (314)   (766)   1080
For example, there were 123 occasions on which a wet day followed a dry day.
From the table of frequencies we can compute the relative frequencies of transitions:

P̂ = [ 0.602  0.398
      0.166  0.834 ]
This is an estimate (hence the hat!) of the t.p.m.
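The estimate P̂ is simply the table of relative frequencies of one-step transitions. As a sketch of the computation (using a short made-up 0/1 path, not the actual Snoqualmie data):

```python
import numpy as np

def estimate_tpm(x, n_states=2):
    """Relative-frequency estimate of the t.p.m. from one observed path x."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(x[:-1], x[1:]):   # count each observed transition a -> b
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# a tiny illustrative dry/wet path (hypothetical data)
x = [0, 0, 1, 1, 1, 0, 1, 1, 0, 0]
print(estimate_tpm(x))
```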

3.3 Properties of Markov chains

Here are some important properties of a Markov chain.


Property: joint probability. Suppose we observe a finite realization of the discrete Markov chain and
want to compute the probability of this random event:
P (Xn = in , Xn−1 = in−1 , · · · , X1 = i1 , X0 = i0 )
= P (Xn = in |Xn−1 = in−1 , · · · , X0 = i0 ) P (Xn−1 = in−1 , · · · , X0 = i0 )
= pin−1 ,in × P (Xn−1 = in−1 |Xn−2 = in−2 , . . . , X0 = i0 ) × P (Xn−2 = in−2 , . . . , X0 = i0 )
= ···
= p0,i0 pi0 ,i1 pi1 ,i2 · · · pin−2 ,in−1 pin−1 ,in ,
where p0,i = P (X0 = i) and p0 = (p0,1 , p0,2 , . . . )T is the initial distribution of {Xn }. Thus, every
Markov chain is fully specified by its transition probability matrix P and its initial distribution p0 .
Property: Markov property. The Markov chain has a powerful property called the Markov property –
the distribution of Xm+n given a set of previous states depends only on the latest available state. Suppose
we observe the chain at times 0, 1, · · · , n and are interested in the distribution of Xm+n . Then
P (Xm+n = j|Xn = i, Xn−1 = in−1 , ..., X0 = i0 ) = P (Xm+n = j|Xn = i). (3.2)

To give an intuition about how we obtain the Markov property, consider a simple case where n = 1 and
m = 2.
P (X3 = i3 |X1 = i1 , X0 = i0 ) = ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 , X0 = i0 )
  = ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) P (X2 = i2 |X1 = i1 , X0 = i0 )
  (!) = ∑_{i2} P (X3 = i3 |X2 = i2 , X1 = i1 ) P (X2 = i2 |X1 = i1 )
  = ∑_{i2} P (X3 = i3 , X2 = i2 |X1 = i1 )
  = P (X3 = i3 |X1 = i1 ).

To argue the equality marked (!), observe that

P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) = P (X3 = i3 |X2 = i2 ).

But we also have that


P (X3 = i3 |X2 = i2 , X1 = i1 ) = ∑_{i0} P (X3 = i3 , X0 = i0 |X2 = i2 , X1 = i1 )
  = ∑_{i0} P (X3 = i3 |X2 = i2 , X1 = i1 , X0 = i0 ) P (X0 = i0 |X2 = i2 , X1 = i1 )
  = P (X3 = i3 |X2 = i2 ) ∑_{i0} P (X0 = i0 |X2 = i2 , X1 = i1 )
  = P (X3 = i3 |X2 = i2 ).

Property: conditional independence. We can represent the Markov chain using a simple graphical
model:

[Graphical model: a chain X0 — X1 — X2 — · · · — Xn−1 — Xn .]

The claim of the Markov property is now obvious from the theorem on conditional independence and graphical
factorization. Indeed, the latest available state serves as a separating set.
Using the graph representation, we obtain an interesting property about a Markov chain: the past and the
future are independent given the present.
To see this, again we consider a simple case where n = 2 and we have X0 , X1 , X2 . Here X0 denotes the past,
X1 denotes the present, and X2 denotes the future. Then

P (X0 = i0 , X2 = i2 |X1 = i1 ) = P (X0 = i0 , X1 = i1 , X2 = i2 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 , X0 = i0 ) P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 ) P (X1 = i1 , X0 = i0 ) / P (X1 = i1 )
  = P (X2 = i2 |X1 = i1 ) P (X0 = i0 |X1 = i1 )

for any i0 , i1 , i2 ∈ S. Namely, X0 and X2 are conditionally independent given X1 .
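This factorization can also be checked numerically. A small Python sketch (the chain and initial distribution below are arbitrary choices, not from the notes) verifies that the conditional table of (X0 , X2 ) given X1 equals the product of its margins:

```python
import numpy as np

P = np.array([[0.7, 0.3],
              [0.4, 0.6]])
p0 = np.array([0.5, 0.5])

# joint[i0, i1, i2] = p0[i0] * P[i0, i1] * P[i1, i2]
joint = p0[:, None, None] * P[:, :, None] * P[None, :, :]

for i1 in range(2):
    slice_ = joint[:, i1, :]        # P(X0 = i0, X1 = i1, X2 = i2)
    cond = slice_ / slice_.sum()    # P(X0, X2 | X1 = i1)
    marg0 = cond.sum(axis=1)        # P(X0 | X1 = i1)
    marg2 = cond.sum(axis=0)        # P(X2 | X1 = i1)
    assert np.allclose(cond, np.outer(marg0, marg2))
print("past and future are independent given the present")
```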

3.4 n-step Transition Probability and Chapman-Kolmogorov Equation

For a Markov chain, we define the n-step transition probability as


p^(n)_ij = P (Xn = j|X0 = i).

The n-step transition probability is time invariant.



Lemma 3.1 Let {Xn } be a homogeneous Markov chain and let p^(n)_ij be the n-step transition probability.
Then for any k = 0, 1, 2, · · · ,

P (Xn+k = j|Xk = i) = p^(n)_ij .

Proof:

P (Xn+k = j|Xk = i) = ∑_{i_{k+1} ,··· ,i_{n+k−1}} P (Xn+k = j|Xn+k−1 = i_{n+k−1} ) × · · · × P (Xk+1 = i_{k+1} |Xk = i)
  = ∑_{i_{k+1} ,··· ,i_{n+k−1}} P (Xn = j|Xn−1 = i_{n+k−1} ) × · · · × P (X1 = i_{k+1} |X0 = i)
  = P (Xn = j|X0 = i) = p^(n)_ij .

The n-step transition probabilities are related to each other via the famous Chapman-Kolmogorov Equation.

Lemma 3.2 Let {Xn } be a homogeneous Markov chain and let p^(n)_ij be the n-step transition probability.
Then for any n, m = 0, 1, 2, · · ·

p^(n+m)_ij = ∑_{k∈S} p^(n)_ik p^(m)_kj .    (3.3)

Proof:

p^(n+m)_ij = P (Xn+m = j|X0 = i)
  = ∑_{k∈S} P (Xn+m = j, Xn = k|X0 = i)
  = ∑_{k∈S} P (Xn+m = j|Xn = k, X0 = i) P (Xn = k|X0 = i)
  = ∑_{k∈S} P (Xn+m = j|Xn = k) P (Xn = k|X0 = i)    (Markov property)
  = ∑_{k∈S} P (Xm = j|X0 = k) P (Xn = k|X0 = i)    (time-invariant property)
  = ∑_{k∈S} p^(n)_ik p^(m)_kj .

The Chapman-Kolmogorov Equation (equation (3.3)) also implies

Forward equation:  p^(n+1)_ij = ∑_k p^(n)_ik pkj , for n = 1, 2, . . . , and
Backward equation: p^(n+1)_ij = ∑_k pik p^(n)_kj , for n = 1, 2, . . . .

The forward equation singles out the final step and has the initial state i fixed. The equation is most useful
when interest centers on the p^(n)_ij ’s for a particular i but all values of j. Conversely, the backward equation
singles out the change from the initial state i and has the final state j fixed. This equation is useful when
interest is in the p^(n)_ij ’s for a particular j but all values of i. The backward equation can be interesting, in
particular, when there is an absorbing state j from which there is no escape (pjj = 1).

If we collect the n-step transition probabilities into the matrix P(n) = {p^(n)_ij }, then Kolmogorov’s forward
and backward equations can be rewritten in matrix form as

P(n+1) = P(n) P = PP(n) ,

where P(1) = P. Therefore, P(n) = Pn .
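The identity P(n) = Pn turns the Chapman-Kolmogorov equation into a one-line numerical check. A sketch with an arbitrary 2 × 2 stochastic matrix (our own choice):

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

P3 = np.linalg.matrix_power(P, 3)
P5 = np.linalg.matrix_power(P, 5)
P8 = np.linalg.matrix_power(P, 8)

# Chapman-Kolmogorov in matrix form: P^(3+5) = P^(3) P^(5)
assert np.allclose(P8, P3 @ P5)
print(P8)
```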


This matrix form also implies a useful property of the marginal distribution of Xn . Assume that X0 has
distribution p0 = (p0,1 , p0,2 , · · · , p0,s )T . Let pn = (pn,1 , · · · , pn,s )T be the marginal distribution of Xn , i.e.,
pn,j = P (Xn = j). Then

pn,j = P (Xn = j) = ∑_i P (Xn = j, X0 = i) = ∑_i P (Xn = j|X0 = i) P (X0 = i) = ∑_i p0,i p^(n)_ij .

In matrix form,

pn^T = p0^T Pn .

Example 3: SIS model (revisited).

Recall that the SIS model has the transition probability matrix

        0      1
P =  0  1−α    α
     1   β    1−β

Note that we use {0, 1} to denote the states I and S of the SIS model.
Assume that the initial distribution is p0 = (1 − α, α)T , i.e., P (X0 = 0) = 1 − α. Moreover, assume that β = 1 − α,
so that both rows of P equal (1 − α, α). The distribution of X1 is then

p1^T = p0^T P = ((1 − α)^2 + α(1 − α), α(1 − α) + α^2 ) = (1 − α, α) = p0^T .

What will be the distribution of Xn ? Using the matrix form, we know that

pTn = pT0 Pn = pT1 Pn−1 = pT0 Pn−1 = · · · = pT0 .

Therefore, P (Xn = 0) = 1 − α and P (Xn = 1) = α for all n = 1, 2, 3, · · · .
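This invariance is easy to confirm numerically. A sketch with an arbitrary value of α (here α = 0.3, our own choice):

```python
import numpy as np

alpha = 0.3
beta = 1 - alpha
# with beta = 1 - alpha, both rows of P equal (1 - alpha, alpha)
P = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])
p0 = np.array([1 - alpha, alpha])

pn = p0.copy()
for _ in range(10):
    pn = pn @ P          # one step of p_n^T = p_{n-1}^T P
assert np.allclose(pn, p0)   # the distribution never changes
print(pn)
```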


A more interesting fact is the joint distribution of X0 , X1 , · · · , Xn :

P (X0 = i0 , X1 = i1 , · · · Xn = in ) = p0i0 pi0 i1 pi1 i2 · · · pin−1 in


= αi0 (1 − α)1−i0 αi1 (1 − α)1−i1 αi2 (1 − α)1−i2 · · · αin (1 − α)1−in ,

which is the joint PMF of IID Bernoulli random variables with parameter α. Therefore, in this special
case, the Markov chain reduces to IID Bernoulli RVs.
Note that in general, when all rows of a t.p.m. are the same, the corresponding Markov chain is a sequence of
IID RVs whose distribution is given by any (hence the first) row of the t.p.m.

Example 5: Snoqualmie Falls Precipitation (revisited).


In the Snoqualmie Falls precipitation problem, we have the t.p.m.

P̂ = [ 0.602  0.398
      0.166  0.834 ]

If we consider the 2-step transition probability,

P̂^2 = [ 0.428  0.572
        0.238  0.762 ].

When we consider the n-step transition probability with n large (in this case, n ≥ 10), the
n-step transition probability matrix becomes (to three decimals)

P̂^n ≈ [ 0.294  0.706
        0.294  0.706 ].

This implies that the initial distribution is uninformative – whether it is dry or wet on Jan 18th tells us little
about Jan 27th.
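The convergence of the rows can be reproduced with matrix powers; a short Python sketch:

```python
import numpy as np

P_hat = np.array([[0.602, 0.398],
                  [0.166, 0.834]])

# the rows of P_hat^n approach a common limit as n grows
for n in (2, 10):
    print(n, np.linalg.matrix_power(P_hat, n).round(3))
```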
Recall from the previous example that when the rows of a t.p.m. are the same, the corresponding random
variables are IID. Therefore, if we consider the sequence of RVs Y0 (n) = Xk , Y1 (n) = Xk+n ,
Y2 (n) = Xk+2n , · · · , then Y0 (n), Y1 (n), Y2 (n), · · · become IID as n → ∞.
After seeing this example, one may conjecture that the limit P∞ = limn→∞ Pn always exists and has equal rows.
However, this is not always true. A counterexample is

P = [ 0  1
      1  0 ].

3.5 First Step Analysis of Markov Chain

First-step analysis is a general strategy for solving many Markov chain problems by conditioning on the
first step of the Markov chain. We demonstrate this technique on a simple example – the Gambler’s ruin
problem.
Gambler’s ruin problem: Two players bet one dollar in each round. Player 1 wins with probability α
and loses with probability β = 1 − α. We assume that player 1 starts with a dollars and player 2 starts with
b dollars. Let Xn be the fortune of player 1 after n rounds. Xn can take values from 0 to a + b:

pij = P (Xn+1 = j|Xn = i) = α if j = i + 1,  β if j = i − 1,  0 otherwise,

for 0 < i < a + b; the states 0 and a + b are absorbing (the game stops there).

Let T be the number of rounds until one of the players loses all of his/her money. Because of the randomness of
this model, T is also a random variable. We are interested in the probability that player 1 wins the game,
which occurs when XT = a + b. Clearly, this probability depends on the initial amount of money that
player 1 has, so we denote it

u(a) = P (XT = a + b|X0 = a).



Note that u(0) = 0 and u(a + b) = 1. We may view u(a) as the probability that the chain is absorbed into
the state a + b at the hitting time T when the chain starts at X0 = a.
First step analysis proceeds as follows:

u(a) = P (XT = a + b|X0 = a)
  = ∑_{j} P (XT = a + b, X1 = j|X0 = a)
  = ∑_{j} P (XT = a + b|X1 = j, X0 = a) P (X1 = j|X0 = a)
  = ∑_{j} P (XT = a + b|X1 = j) P (X1 = j|X0 = a)    (Markov property)
  = P (XT = a + b|X1 = a + 1) P (X1 = a + 1|X0 = a) + P (XT = a + b|X1 = a − 1) P (X1 = a − 1|X0 = a)
  = u(a + 1)α + u(a − 1)β.

Therefore, we have u(a) = u(a + 1)α + u(a − 1)β with two boundary conditions: u(0) = 0, u(a + b) = 1.
Because α + β = 1,
(α + β)u(a) = u(a) = u(a + 1)α + u(a − 1)β

which implies
α(u(a + 1) − u(a)) = β(u(a) − u(a − 1)).

Define v(a) = u(a) − u(a − 1). Then the above leads to

αv(a + 1) = βv(a)  ⇒  v(a + 1) = (β/α) v(a).
By telescoping,

v(a + 1) = (β/α) v(a) = (β/α)^2 v(a − 1) = · · · = (β/α)^a v(1).
Using the boundary condition u(0) = 0,

u(a) = u(a) − u(0) = ∑_{j=1}^{a} [u(j) − u(j − 1)] = ∑_{j=1}^{a} v(j) = ∑_{j=0}^{a−1} (β/α)^j v(1)

     = v(1) × a                               if α = β,
     = v(1) × (1 − (β/α)^a ) / (1 − β/α)      if α ≠ β.

To find v(1), we use the other boundary condition:

1 = u(a + b) = u(a + b) − u(0) = ∑_{j=1}^{a+b} [u(j) − u(j − 1)] = ∑_{j=1}^{a+b} v(j)

  = v(1) × (a + b)                              if α = β,
  = v(1) × (1 − (β/α)^{a+b} ) / (1 − β/α)       if α ≠ β.

Therefore,

v(1) = 1/(a + b)                           if α = β,
v(1) = (1 − β/α) / (1 − (β/α)^{a+b})       if α ≠ β,

and

u(a) = a/(a + b)                                 if α = β,
u(a) = (1 − (β/α)^a ) / (1 − (β/α)^{a+b})        if α ≠ β.
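The closed-form answer can be cross-checked against a direct numerical solution of the first-step recursion u(i) = αu(i + 1) + βu(i − 1) with the two boundary conditions. A sketch (the function names are ours):

```python
import numpy as np

def ruin_formula(a, b, alpha):
    """Closed-form u(a) = P(player 1 wins) from the first-step analysis."""
    beta = 1 - alpha
    if abs(alpha - beta) < 1e-12:
        return a / (a + b)
    r = beta / alpha
    return (1 - r**a) / (1 - r**(a + b))

def ruin_solve(a, b, alpha):
    """Solve u(i) = alpha*u(i+1) + (1-alpha)*u(i-1), u(0) = 0, u(a+b) = 1."""
    n = a + b
    A = np.zeros((n + 1, n + 1))
    rhs = np.zeros(n + 1)
    A[0, 0] = 1.0              # boundary: u(0) = 0
    A[n, n] = 1.0              # boundary: u(a+b) = 1
    rhs[n] = 1.0
    for i in range(1, n):      # interior recursion
        A[i, i] = 1.0
        A[i, i + 1] = -alpha
        A[i, i - 1] = -(1 - alpha)
    return np.linalg.solve(A, rhs)[a]

print(ruin_formula(3, 7, 0.55), ruin_solve(3, 7, 0.55))
```

The two numbers agree, which is a useful sanity check on the telescoping argument above.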

3.6 Classification of States

We now turn to a classification of the states of a Markov chain that is crucial to understanding the behavior
of Markov chains.
An equivalence relation “∼” is a binary relation between elements of a set satisfying

1. Reflexivity: i ∼ i for all i

2. Symmetry: i ∼ j ⇒ j ∼ i

3. Transitivity: i ∼ j, j ∼ k ⇒ i ∼ k.

For a set S and a ∈ S, {s ∈ S : s ∼ a} is called an equivalence class. Equivalence relations will allow us to
split Markov chain state spaces into equivalence classes.
State j is accessible from state i (i → j) if there exists m ≥ 0 such that p^(m)_ij > 0. We say that i communicates
with j (i ↔ j) if j is accessible from i and i is accessible from j. A set of states C is a communicating
class if every pair of states in C communicates with each other, and no state in C communicates with any
state not in C.

Proposition 3.3 Communication of states is an equivalence relation.

Proof: Reflexivity and symmetry are clear. To prove transitivity, let i ↔ j and j ↔ k. We then want to
show that i ↔ k.

Note that i → j if and only if the transition diagram contains a path from i to j. So there is a path from
i to j and from j to k. Concatenate the two to obtain a path from i to k, which testifies to the fact that
i → k.
Analogously, we have k → i and conclude that i ↔ k.

A set of states C is closed if ∑_{j∈C} pij = 1 for all i ∈ C.
Example 6. Consider a Markov chain with the following transition diagram:

[Transition diagram on states 1, . . . , 6.]

Then {1, 2, 3}, {4, 5}, and {6} are the communication classes.
A Markov chain {Xn } is called irreducible if it has only one communication class, i.e., i ↔ j for all i and j.
For state i, di = gcd{n ≥ 1 : p^(n)_ii > 0} is called its period, where gcd = greatest common divisor
and di = +∞ if p^(n)_ii = 0 for all n ≥ 1.
Example 7. Consider the example with state space S = {0, 1, 2, ...} and Xn such that

P (Xn+1 = i|Xn = 0) = p if i = 1,  1 − p if i = 0,  0 otherwise,

and for j ≠ 0,

P (Xn+1 = i|Xn = j) = p if i = j + 1,  1 − p if i = j − 1,  0 otherwise.

Then d2 = gcd{2, 4, 5, 6, ...} = 1, even though 1 is not in the list (think about why p^(5)_22 > 0).
Example 8: Simple (1-D) Random Walk on the Integers. Consider another example with state space
Z. Let Xn be the position at time n. Then

P (Xn+1 = i − 1|Xn = i) = q and P (Xn+1 = i + 1|Xn = i) = p

with p = 1 − q. Suppose we start at 0. It is clear that we cannot return to 0 after an odd number of
steps, so p^(2n+1)_00 = 0 for all n, i.e.,

d0 = gcd{n ≥ 1 : p^(n)_00 > 0} = gcd{2, 4, 6, . . . } = 2.

Proposition 3.4 Period is a communication class property. Namely, i ↔ j ⇒ di = dj .

Proof: i ↔ j ⇒ there exist n1 , n2 such that p^(n1)_ij > 0 and p^(n2)_ji > 0. Then, by Chapman-Kolmogorov:

p^(n1+n2)_ii = ∑_k p^(n1)_ik p^(n2)_ki ≥ p^(n1)_ij p^(n2)_ji > 0.
k

Consequently, di | n1 + n2 . For example, suppose n1 = 3 and n2 = 5; then n1 + n2 = 8, so di divides 8,
i.e. di could be 8, 4, 2, or 1: we know the chain can return after 8 time steps, but the period could be smaller.

Note: a|b means a divides b, i.e. there is an integer c s.t. b = ac.


Now, take any n such that p^(n)_jj > 0. Then

p^(n1+n2+n)_ii = ∑_k p^(n+n1)_ik p^(n2)_ki ≥ p^(n+n1)_ij p^(n2)_ji
  = [ ∑_k p^(n1)_ik p^(n)_kj ] p^(n2)_ji ≥ p^(n1)_ij p^(n)_jj p^(n2)_ji > 0.

Hence, di | n1 + n2 + n.

Together, n1 + n2 = c1 di and n1 + n2 + n = c2 di imply that n = (c2 − c1 )di ; as a result, di | n for all n
such that p^(n)_jj > 0.

Since di is a common divisor of the set {n : p^(n)_jj > 0} and dj is the greatest common divisor of the same set (by
definition of the period), di ≤ dj .

By symmetry, dj ≤ di , so di = dj .

In a communication class, all states have the same period. Since all states communicate in an irreducible
Markov chain, it makes sense to define the period of such a Markov chain. If di = 1, state i is called
aperiodic. An irreducible Markov chain with period 1 is also called aperiodic.

Theorem 3.5 (Lattice Theorem (Brémaud p.75)) Suppose d is the period of an irreducible Markov
chain. Then for all states i, j there exists m ∈ {0, . . . , d − 1} and k ≥ 0 such that

p^(m+nd)_ij > 0,  ∀n ≥ k.

Theorem 3.6 (Cyclic Classes) For any irreducible Markov chain one can find a unique partition of S
into d classes C0 , C1 , ..., Cd−1 such that for all k and for all i ∈ Ck ,

∑_{j∈Ck+1} pij = 1,

where, by convention, Cd = C0 , and where d is maximal (that is, there is no other such partition C′0 , C′1 , ..., C′d′−1
with d′ > d).

Proof: Fix a state i and classify states j by the value of m in Lattice Theorem.
The number d ≥ 1 is the period of the chain. The classes C0 , C1 , ..., Cd−1 are called the cyclic classes.
Example 8: Simple (1-D) Random Walk on the Integers (revisited). The random walk on S = Z has
cyclic classes C0 and C1 , the sets of even and odd integers.

3.7 Strong Markov Property

The Markov property states that the random variable at time n + m, conditional on the state at time n,
is independent of the states at times prior to n. However, what if the time n is random?
Say we are interested in the behavior of XT +m given XT , where T is the first time that the Markov chain
hits state 0. Do we still have the Markov property?
Not every random time preserves the Markov property. Recall that the Markov property states that for
any non-random m < n < k, Xm ⊥ Xk |Xn . Let {Xn } be a Markov chain with state space
S = {1, 2, 3} and consider a random time

T = inf{n ≥ 1 : (Xn−1 , Xn , Xn+1 ) = (2, 1, 3) or (3, 1, 2)}.

To see why the Markov property can fail for random m, n, k, consider m = T − 1, n = T , k = T + 1. By the
definition of T ,

P (Xk = 3|Xn = 1, Xm = 2) = P (XT +1 = 3|XT = 1, XT −1 = 2) = 1,

whereas when Xm = XT −1 = 3 we must have XT +1 = 2, so P (Xk = 3|Xn = 1, Xm = 3) = 0. Thus, the
conditional distribution of Xk given Xn depends on Xm , which violates the Markov property.
Therefore, it is crucial to identify a class of random time such that the Markov property holds. It turns out
that there is a simple class of random times that has the Markov property. This class is called the stopping
time.
A random variable τ ∈ {1, 2, 3, · · · } ∪ {∞} is called a stopping time if the event {τ = m} can be expressed
in terms of X0 , X1 , · · · , Xm . Intuitively, a stopping time is a random time that we can recognize at the
moment it occurs.
Examples 9: Stopping times.

• Return time. Ti = inf{n ≥ 1 : Xn = i} is a stopping time because {Ti = m} = {X1 ≠ i, · · · , Xm−1 ≠ i, Xm = i}. Ti is interpreted as the first time the chain returns to state i.

• Successive returns. Let τk be the time of the k-th return to state i (note that τ1 = Ti ). Then τk is a stopping time because

{τk = m} = { ∑_{n=1}^{m} I(Xn = i) = k, Xm = i }.

• Counterexample – a non-stopping time: τ = inf{n ≥ 1 : Xn+1 = i} is not a stopping time because
{τ = m} = {X2 ≠ i, · · · , Xm ≠ i, Xm+1 = i} depends on Xm+1 .
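Return times are also easy to simulate. A sketch that estimates E0 (T0 ) by Monte Carlo for an arbitrary two-state chain (the chain and the seed below are our own choices):

```python
import numpy as np

rng = np.random.default_rng(0)

P = np.array([[0.5, 0.5],
              [0.3, 0.7]])

def return_time(P, i, rng, n_max=10_000):
    """Sample T_i = inf{n >= 1 : X_n = i} for a chain started at X_0 = i."""
    x = i
    for n in range(1, n_max + 1):
        x = rng.choice(len(P), p=P[x])   # one step of the chain
        if x == i:
            return n
    return np.inf

samples = [return_time(P, 0, rng) for _ in range(2000)]
print(np.mean(samples))   # Monte Carlo estimate of E_0(T_0)
```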

Stopping times are a very important class of random variables in statistics. Many statistical procedures
involve a stopping time. For instance, suppose we perform a sequence of experiments and stop when we
observe a certain behavior, such as a strong signal or enough anomalies. Then the stopping point (which
determines the sample size) is a stopping time. If we want to use data from such a sequence of experiments,
we need theorems about stopping times (such as the optional sampling theorem).

Theorem 3.7 (Strong Markov Property) Let {Xn } be a homogeneous Markov chain with transition
probability matrix P = {pij } and let τ be a stopping time with respect to {Xn }. Then for any integer k,

P (Xτ +k = j|Xτ = i, Xℓ = iℓ , 0 ≤ ℓ < τ ) = P (Xk = j|X0 = i) = p^(k)_ij

and

P (Xτ +k = j|Xτ = i) = P (Xk = j|X0 = i) = p^(k)_ij .

Proof: We first prove the first equality.

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = P (Xτ +k = j, Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ ) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = [ ∑_{r=1}^{∞} P (Xτ +k = j, Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) ] / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ ).    (3.4)

Now, because τ is a stopping time, the event {τ = r} can be expressed as a function of X0 , · · · , Xr , so the
Markov property implies

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) = P (Xr+k = j|Xr = i) = p^(k)_ij .

Therefore, equation (3.4) becomes

P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = ∑_{r=1}^{∞} P (Xτ +k = j|Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = p^(k)_ij ∑_{r=1}^{∞} P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ , τ = r) / P (Xτ = i, 0 ≤ ℓ < τ, Xℓ = iℓ )
  = p^(k)_ij .

The second equality follows simply from the first:

P (Xτ +k = j|Xτ = i) = ∑_{r=1}^{∞} P (Xτ +k = j|Xr = i, τ = r) P (Xr = i, τ = r) / P (Xτ = i)
  = p^(k)_ij ∑_{r=1}^{∞} P (Xr = i, τ = r) / P (Xτ = i)
  = p^(k)_ij .

3.8 Stationary distribution

It is often of great interest to study the limiting behavior of a Markov chain Xn as n → ∞. Here, for
simplicity, we assume that our Markov chain is homogeneous. One feature of the limiting behavior is that Xn and
Xn+1 should have the same distribution when n is large. So we are interested in whether a Markov
chain will eventually converge to a ‘stable’ distribution (formally, we will call it a stationary distribution).
In particular, given a Markov chain, we would like to know:

• does this chain have a stationary distribution?



• if so, what is the stationary distribution?


• and is this stationary distribution unique?

It turns out that to answer these questions, we will use concepts related to the return time. Thus, we start
by understanding properties of the return time.

3.8.1 Return Times


Let Ni = ∑_{n=1}^{∞} I(Xn = i) denote the number of visits of {Xn } to state i, not counting the initial state. We
also adopt the following notation:

P (·|X0 = i) = Pi (·),  E(·|X0 = i) = Ei (·).

Note that the quantity Ni may equal ∞. It is finite with positive probability if there are
states such that, once the chain enters one of them, it never goes back to state i. Later we will
describe this phenomenon using the concepts of transient states and recurrent states.
Example 6 (revisited). Consider a Markov chain with the following transition diagram:

[Transition diagram on states 1, . . . , 6, as in Example 6.]

As can be seen easily, once the Markov chain enters states {4, 5, 6}, it never comes back to any of {1, 2, 3}.
Thus, N1 is finite with positive probability.
Let Ti = inf{n ≥ 1 : Xn = i} be the return time. Then the following events can be defined using either Ti
or Ni :

{Ti = ∞} = {Ni = 0},  {Ti < ∞} = {Ni > 0}.

These will be useful later.
We then define fji = Pj (Ti < ∞) = Pj (Ni > 0) to be the probability of reaching state i in finite
time when the chain starts at state j. Note that because Pj (Ti = ∞) + Pj (Ti < ∞) = 1, we have
fii = Pi (Ti < ∞) and Pi (Ti = ∞) = 1 − fii .

Proposition 3.8

Pj (Ni = r) = fji f_ii^{r−1} (1 − fii ) if r ≥ 1,   and   Pj (Ni = 0) = 1 − fji .

Proof: The case r = 0 is simple because {Ni = 0} = {Ti = ∞}. Thus, Pj (Ni = 0) = Pj (Ti = ∞) =
1 − Pj (Ti < ∞) = 1 − fji .
For the remaining cases, we give a proof by induction. Before doing that, we first investigate
Pj (Ni = r) for r > 0. Let τr be the r-th return time, and note that {Xτr = i} = {Ni ≥ r}.

Then

Pj (Ni = r) = Pj (Ni = r, Xτr = i)
  = Pj (Ni = r|Xτr = i) Pj (Xτr = i)
  = Pj ( ∑_{t=τr +1}^{∞} I(Xt = i) = 0 | Xτr = i ) Pj (Xτr = i)
  = Pi ( ∑_{t=1}^{∞} I(Xt = i) = 0 | X0 = i ) Pj (Xτr = i)    (strong Markov property)
  = Pi (Ni = 0) Pj (Ni ≥ r)
  = Pi (Ti = ∞) Pj (Ni ≥ r).

Therefore, we conclude

Pj (Ni = r) = Pi (Ti = ∞) Pj (Ni ≥ r) = (1 − fii ) Pj (Ni ≥ r).

To start the induction, consider r = 1. Pj (Ni ≥ 1) = 1 − Pj (Ni = 0) = fji , so Pj (Ni = 1) =
(1 − fii )fji , which agrees with the claimed formula for r = 1.
Assume the formula holds for r ≤ k; we show that it holds for r = k + 1. The induction hypothesis means

Pj (Ni = r) = fji f_ii^{r−1} (1 − fii ) for r = 1, · · · , k,   and   Pj (Ni = 0) = 1 − fji .
For the case r = k + 1, we use the fact that

Pj (Ni = k + 1) = (1 − fii ) Pj (Ni ≥ k + 1),

so all we need is the probability Pj (Ni ≥ k + 1). This quantity can be calculated via

Pj (Ni ≥ k + 1) = 1 − Pj (Ni ≤ k)
  = 1 − (1 − fji ) − ∑_{r=1}^{k} fji f_ii^{r−1} (1 − fii )
  = fji − fji (1 − fii )(1 + fii + f_ii^2 + · · · + f_ii^{k−1} )
  = fji − fji (1 − fii ) (1 − f_ii^k )/(1 − fii )
  = fji f_ii^k .

Thus,

Pj (Ni = k + 1) = (1 − fii ) Pj (Ni ≥ k + 1) = fji f_ii^k (1 − fii ),

which is the formula for r = k + 1. Thus, by induction, the result holds.

The above formula also gives an interesting result for the case of 'starting from state i, returning to state i' when we set j = i:

    P_i(N_i = r) = f_ii^r (1 − f_ii),   P_i(N_i > r) = f_ii^{r+1},

where f_ii = P_i(T_i < ∞).
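As a numerical sanity check of the geometric formula (my illustration, not part of the original notes), consider a hypothetical chain in which state 1 returns to itself with probability f and otherwise falls into an absorbing state 0, so that f_11 = f and Proposition 3.8 with j = i = 1 predicts P_1(N_1 = r) = f^r (1 − f):

```python
import random
from collections import Counter

# Illustration (not from the notes): state 1 returns to itself w.p. f and
# otherwise falls into the absorbing state 0, so f_11 = f and Proposition 3.8
# with j = i = 1 predicts P_1(N_1 = r) = f^r (1 - f).

def simulate_returns(f, rng):
    """Run the chain from X_0 = 1 and count visits to state 1 at times n >= 1."""
    state, visits = 1, 0
    while state == 1:
        state = 1 if rng.random() < f else 0
        if state == 1:
            visits += 1
    return visits

rng = random.Random(0)
f, trials = 0.6, 200_000
counts = Counter(simulate_returns(f, rng) for _ in range(trials))
for r in range(4):
    print(f"P_1(N_1={r}): empirical {counts[r] / trials:.4f}, "
          f"formula {f**r * (1 - f):.4f}")
```

The empirical frequencies match f^r (1 − f) up to Monte Carlo error.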
We have seen in many situations that Ti and Ni are closely related. Here is another result about their relationship.

Corollary 3.9
Pi (Ni = ∞) = 1 ⇔ Pi (Ti < ∞) = 1
and
Pi (Ti < ∞) < 1 ⇔ Pi (Ni = ∞) = 0 ⇔ Ei (Ni ) < ∞.

Corollary 3.9 links the finiteness of Ti and Ni and also relates it to the expectation. Corollary 3.9 becomes very useful when combined with the following formula for the expectation of an integer-valued random variable X:

    E(X) = ∑_{t=1}^∞ P(X ≥ t).        (3.5)
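As a quick illustration of (3.5) (mine, not from the notes), take X geometric on {1, 2, . . . } with success probability p, so that P(X ≥ t) = (1 − p)^{t−1} and E(X) = 1/p:

```python
# Numerical check of (3.5) for a geometric X on {1, 2, ...} with
# P(X = k) = (1 - p)^{k-1} p: here P(X >= t) = (1 - p)^{t-1} and E(X) = 1/p.
p = 0.3
tail_sum = sum((1 - p) ** (t - 1) for t in range(1, 500))
print(tail_sum)  # ~ 1/p = 3.3333...
```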

3.8.2 Recurrence and Transience

Based on the return time property, we classify a state i as

• recurrent (persistent), if P_i(T_i < ∞) = f_ii = 1;
• transient, otherwise.

Furthermore, a recurrent state is called

• positive recurrent, if E_i(T_i) < ∞;
• null recurrent, otherwise.

Note that: either Pi (Ni = ∞) = 0 or Pi (Ni = ∞) = 1, with nothing in between (if fii < 1, then Pi (Ni =
∞) = 0; if fii = 1, then Pi (Ni = ∞) = 1). This, together with Corollary 3.9, implies that Ei (Ni ) = ∞ ⇐⇒
Pi (Ni = ∞) = 1.
Note that:
fii = Pi (Ti < ∞) = 1 ⇐⇒ Pi (Ni = ∞) = 1.
In other words, if a Markov chain returns to state i in finite time, then the chain visits this state infinitely
often.

Proposition 3.10 State i is recurrent ⇐⇒ ∑_{n=1}^∞ p_ii^{(n)} = ∞.

Proof: State i is recurrent ⇐⇒ P_i(T_i < ∞) = f_ii = 1 ⇐⇒ P_i(N_i = ∞) = 1 by Corollary 3.9.

It is easy to see that P_i(N_i = ∞) = 1 ⇐⇒ E_i(N_i) = ∞.

Using equation (3.5), E_i(N_i) = ∑_{n=1}^∞ p_ii^{(n)}, and the result follows.

Proposition 3.11 Recurrence is a communication class property, i.e. if i ↔ j and i is recurrent, then j is
recurrent.
Proof: Homework.
Example: Gambler's Ruin+. Recall that in Gambler's ruin, the game ends when Xn hits 0 or a + b. Now we extend the problem so that the game does not end when a player loses/takes all the money; instead, the value of Xn stays the same once it hits 0 or a + b. Namely, Xn = 0 ⇒ Xn+1 = 0 and Xn = a + b ⇒ Xn+1 = a + b. In this case p_00^{(k)} = p_{a+b,a+b}^{(k)} = 1 for all k = 1, 2, · · · . Therefore,

    ∑_{n=1}^∞ p_00^{(n)} = ∑_{n=1}^∞ p_{a+b,a+b}^{(n)} = ∑_{n=1}^∞ 1 = ∞.

Hence, 0 and a + b are recurrent states. Once they are reached, we stay there forever. Let q be the probability that player 1 loses a single round. Consider state 1. If player 1 loses the next round, the chain is stuck at 0 forever. Namely, T_1 = ∞ because we can never come back. So P_1(T_1 = ∞) ≥ q, which implies

    P_1(T_1 < ∞) = 1 − P_1(T_1 = ∞) ≤ 1 − q < 1   if q ∈ (0, 1).

Note that the inequality P_1(T_1 = ∞) ≥ q is due to the fact that even if player 1 wins, the game may end at a + b, so the return time to state 1 may still be infinite. Therefore, by definition, 1 is a transient state. Since states {1, . . . , a + b − 1} form a communication class, all states in this class are also transient. These states are visited a finite number of times before absorption into state 0 or a + b.
Example 8: 1-D Random Walk (revisited). Let Xn be a random walk on the set of all integers Z such that

    p_ij = p if j = i + 1,   p_ij = q := 1 − p if j = i − 1.

Let's study recurrence of state 0. We know that p_00^{(2n+1)} = 0 for all n ≥ 0 and that, conditional on X_0 = 0,

    X_{2n} =^d ξ_1 + · · · + ξ_{2n},

where ξ_1, . . . , ξ_{2n} are i.i.d. with P(ξ_i = 1) = 1 − P(ξ_i = −1) = p. Hence,

    p_00^{(2n)} = P(X_{2n} = 0 | X_0 = 0) = C(2n, n) p^n q^n,

where C(2n, n) = (2n)!/(n!)^2 is the binomial coefficient. Recall that Stirling's formula says that n! ∼ n^{n+1/2} e^{−n} √(2π), meaning that

    lim_{n→∞} n! / ( n^{n+1/2} e^{−n} √(2π) ) = 1.

Therefore,

    p_00^{(2n)} = (2n)!/(n! n!) p^n q^n
               ∼ (2n)^{2n+1/2} e^{−2n} √(2π) / ( n^{2n+1} e^{−2n} 2π ) (pq)^n
               = 2^{2n+1/2} n^{2n+1/2} / ( n^{2n+1} 2^{1/2} √π ) (pq)^n
               = 2^{2n} (pq)^n / √(πn)
               = (4pq)^n / √(πn).

We deduce that

    ∑_{n=1}^∞ p_00^{(n)} = ∑_{n=1}^∞ p_00^{(2n)} = ∞  ⇔  4pq = 1  ⇔  p = q = 1/2,

where we note that 4pq ≤ 1 always, with equality if and only if p = q. (Ratio Test: Let ∑_{n=1}^∞ a_n be a series which satisfies lim_{n→∞} |a_{n+1}/a_n| = k. If k > 1 the series diverges; if k < 1 it converges. When 4pq < 1 the ratio test gives convergence; when 4pq = 1 the series is ∑ 1/√(πn), which diverges.) Conclusion: only the symmetric random walk is recurrent on Z. Interestingly, the symmetric random walk on Z^2 is also recurrent, but it is transient on Z^n for n ≥ 3; see Brémaud (1999, p. 98).
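This dichotomy can be seen numerically. The sketch below (my illustration) computes partial sums of ∑_n p_00^{(2n)} using the term ratio a_n / a_{n−1} = 2(2n − 1)/n · pq, which avoids huge factorials; for p = 0.5 the partial sums keep growing, while for p = 0.6 they settle near 1/√(1 − 4pq) − 1 = 4:

```python
# Partial sums of sum_n p_00^{(2n)} = sum_n C(2n, n) (pq)^n, computed via the
# term ratio a_n / a_{n-1} = 2(2n - 1)/n * p * q to avoid huge factorials.
def partial_sum_p00(p, n_max):
    q = 1 - p
    term, total = 1.0, 0.0
    for n in range(1, n_max + 1):
        term *= 2 * (2 * n - 1) / n * p * q
        total += term
    return total

print(partial_sum_p00(0.5, 2000))  # keeps growing with n_max (recurrent)
print(partial_sum_p00(0.6, 2000))  # ~ 1/sqrt(1 - 4pq) - 1 = 4 (transient)
```

The closed form for the transient case uses the generating function ∑_{n≥0} C(2n, n) x^n = (1 − 4x)^{−1/2}.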
3.8.3 Invariant Measures

With the knowledge about recurrence, we are able to talk about invariant measures and stationary distributions of a stochastic matrix.

A vector x ≠ 0 is called an invariant measure of a stochastic matrix P if

• ∞ > x_i ≥ 0 for each i, and
• x^T P = x^T, i.e., x_i = ∑_j x_j p_ji for each i.

A probability vector π on a Markov chain state space is called a stationary distribution of a stochastic matrix P if π^T P = π^T, i.e., π_i = ∑_j π_j p_ji for each i.

The equation x^T P = x^T or π^T P = π^T is also called the global balance equations – the probability flow in equals the flow out. Note that for an invariant measure x such that c = ∑_i x_i < ∞, c^{-1} x is a stationary distribution. But it may happen that c = ∞ for some invariant measure, so one cannot always normalize it.
Example 9: Two-State Markov Chain. Consider a Markov chain with two states and a transition probability matrix

    P = [ 1−p   p  ]
        [  q   1−q ],   0 < p < 1, 0 < q < 1.

The global balance equations are

    [π_0, π_1] P = [π_0, π_1],

or

    (1 − p)π_0 + q π_1 = π_0
    p π_0 + (1 − q)π_1 = π_1
    ⇒ p π_0 = q π_1 ⇒ π_0 = (q/p) π_1.

Using that π_0 + π_1 = 1, we obtain

    (q/p) π_1 + π_1 = 1 ⇒ π_1 = p/(p + q)

and deduce that the global balance equations have the unique solution

    π^T = [ q/(p+q), p/(p+q) ],

which is the stationary distribution.
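The closed form can be confirmed by power iteration. This is a small sketch of mine (the values p = 0.3, q = 0.8 are arbitrary); since 0 < p, q < 1 the chain is aperiodic and π_n^T = π_0^T P^n converges to the stationary distribution:

```python
# Power iteration pi <- pi P from an arbitrary starting distribution
# converges to the stationary distribution (q/(p+q), p/(p+q)).
p, q = 0.3, 0.8  # arbitrary values in (0, 1)
P = [[1 - p, p], [q, 1 - q]]
pi = [1.0, 0.0]
for _ in range(200):
    pi = [pi[0] * P[0][0] + pi[1] * P[1][0],
          pi[0] * P[0][1] + pi[1] * P[1][1]]
print(pi)  # ~ [q/(p+q), p/(p+q)] = [0.7272..., 0.2727...]
```

The convergence is geometric at rate |1 − p − q|, the second eigenvalue of P.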


Example: Gambler's Ruin+ (simple version). Let the total fortune of both players be a + b = 4. Then

    P = [ 1  0  0  0  0 ]
        [ q  0  p  0  0 ]
        [ 0  q  0  p  0 ]
        [ 0  0  q  0  p ]
        [ 0  0  0  0  1 ].

By inspection, the vectors π_α^T = [α, 0, 0, 0, 1 − α] satisfy the global balance equations π_α^T P = π_α^T for any α ∈ (0, 1). So the Gambler's ruin chain has an uncountable number of stationary distributions.
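The claim π_α^T P = π_α^T is easy to check directly; here is a quick verification (illustrative, with p = 0.6 chosen arbitrarily):

```python
# Direct check that pi_alpha = [alpha, 0, 0, 0, 1 - alpha] is stationary for
# the absorbing Gambler's Ruin+ chain with a + b = 4 (here p = 0.6, q = 0.4).
p, q = 0.6, 0.4
P = [[1, 0, 0, 0, 0],
     [q, 0, p, 0, 0],
     [0, q, 0, p, 0],
     [0, 0, q, 0, p],
     [0, 0, 0, 0, 1]]

def left_mult(pi, P):
    """Compute the row vector pi P."""
    return [sum(pi[j] * P[j][i] for j in range(len(P))) for i in range(len(P))]

for alpha in (0.1, 0.5, 0.9):
    pi = [alpha, 0, 0, 0, 1 - alpha]
    out = left_mult(pi, P)
    assert all(abs(a - b) < 1e-12 for a, b in zip(out, pi))
print("pi_alpha P = pi_alpha for all alpha tested")
```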
Here, we see a case where a Markov chain has an infinite number of stationary distributions. In some cases it may not even have a stationary distribution! So, returning to our original questions, we would like to know: (i) when does a Markov chain have a stationary distribution? (ii) how do we find a stationary distribution? and (iii) when is the stationary distribution unique?

The following proposition partially answers the first question. Note that a Markov chain is recurrent if all its states are recurrent.

Proposition 3.12 Let {Xn} be an irreducible, recurrent, homogeneous Markov chain with transition probability matrix P. For each i ∈ S define

    y_i = E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ],

where 0 is an arbitrary reference state and T_0 = inf{n ≥ 1 : X_n = 0} is the first return time to 0. Then y_i ∈ (0, ∞) for all i ∈ S, and y^T = [y_0, y_1, . . . ] is an invariant measure of P.

Note: For i ≠ 0, y_i is the expected number of visits to state i before returning to 0.
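Before the proof, Proposition 3.12 can be sanity-checked by simulation. The sketch below (mine; the 3-state transition matrix is made up) estimates y_i by averaging visit counts over excursions from state 0 and then checks the invariance y^T P ≈ y^T:

```python
import random

# Monte Carlo sketch of Proposition 3.12 on a made-up 3-state irreducible
# chain: estimate y_i = E_0[# visits to i during {1, ..., T_0}] by averaging
# over excursions from state 0, then check invariance y P ~ y.
P = [[0.2, 0.5, 0.3],
     [0.4, 0.2, 0.4],
     [0.5, 0.3, 0.2]]
rng = random.Random(1)

def step(i):
    return rng.choices(range(3), weights=P[i])[0]

trials = 200_000
visits = [0.0, 0.0, 0.0]
for _ in range(trials):
    state = step(0)            # X_1
    while True:
        visits[state] += 1     # one visit at some n in {1, ..., T_0}
        if state == 0:         # returned to 0: excursion over
            break
        state = step(state)
y = [v / trials for v in visits]
yP = [sum(y[j] * P[j][i] for j in range(3)) for i in range(3)]
print(y)   # y[0] = 1 exactly, by property (P1) below
print(yP)  # close to y, as invariance requires
```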


Before starting the proof, we note the following three properties.

(P1) When i = 0,

    y_0 = E_0[ ∑_{n=1}^∞ I(X_n = 0) I(n ≤ T_0) ] = 1

because for n ≥ 1, X_n = 0 if and only if n = T_0.

(P2)

    ∑_{i∈S} y_i = ∑_{i∈S} E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ]
                = E_0[ ∑_{n=1}^∞ ∑_{i∈S} I(X_n = i) I(n ≤ T_0) ]
                = E_0[ ∑_{n=1}^∞ I(n ≤ T_0) ]
                = E_0(T_0).

(P3) For any i ∈ S, we define

    q_{0i}^{(n)} ≡ E_0( I(X_n = i) I(n ≤ T_0) ) = P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} ≠ 0, X_n = i)

to be the probability of visiting state i at time point n before returning to state 0. Thus,

    y_i = ∑_{n=1}^∞ E_0( I(X_n = i) I(n ≤ T_0) ) = ∑_{n=1}^∞ q_{0i}^{(n)}

and q_{0i}^{(1)} = E_0( I(X_1 = i) I(1 ≤ T_0) ) = p_0i.
P
Proof: This proof consists of two parts. In the first part, we prove that each y_i satisfies y_i = ∑_{j∈S} y_j p_ji. In the second part, we show that 0 < y_i < ∞ for every i ∈ S.
Part 1. To show that y_i = ∑_{j∈S} y_j p_ji, we analyze q_{0i}^{(n)} defined in property (P3):

    q_{0i}^{(n)} = P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} ≠ 0, X_n = i)
                = ∑_{j≠0} P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j, X_n = i)
                = ∑_{j≠0} P_0(X_n = i | X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j) · P_0(X_1 ≠ 0, X_2 ≠ 0, · · · , X_{n−1} = j)
                  [the second factor is q_{0j}^{(n−1)}]
                = ∑_{j≠0} P(X_n = i | X_{n−1} = j) q_{0j}^{(n−1)}   (Markov property)
                = ∑_{j≠0} q_{0j}^{(n−1)} p_ji.

Thus,

    y_i = ∑_{n=1}^∞ q_{0i}^{(n)}
        = p_0i + ∑_{n=2}^∞ q_{0i}^{(n)}
        = p_0i + ∑_{n=2}^∞ ∑_{j≠0} q_{0j}^{(n−1)} p_ji
        = p_0i + ∑_{n=1}^∞ ∑_{j≠0} q_{0j}^{(n)} p_ji
        = p_0i + ∑_{j≠0} ( ∑_{n=1}^∞ q_{0j}^{(n)} ) p_ji   [the inner sum is y_j]
        = y_0 p_0i + ∑_{j≠0} y_j p_ji   (using y_0 = 1)
        = ∑_{j∈S} y_j p_ji.

Part 2. Now we show that 0 < y_i < ∞. First note that y_0 = 1, so we only need to consider i ≠ 0. Because the Markov chain is irreducible, for each state i there exists a number n(i) ≥ 1 such that p_0i^{(n(i))} > 0. Then, using the fact that y^T = y^T P implies y^T = y^T P^{n(i)},

    y_i = ∑_{j∈S} y_j p_ji^{(n(i))} = y_0 p_0i^{(n(i))} + ∑_{j≠0} y_j p_ji^{(n(i))} ≥ p_0i^{(n(i))} > 0.

To show that y_i < ∞, we argue by contradiction. Assume that y_i = ∞. Because the Markov chain is irreducible, there exists a constant k(i) such that p_i0^{(k(i))} > 0. Then

    y_0 = ∑_{j∈S} y_j p_j0^{(k(i))} = y_i p_i0^{(k(i))} + ∑_{j≠i} y_j p_j0^{(k(i))} = ∞,

a contradiction to y_0 = 1. Thus, y_i < ∞.

Proposition 3.13 The invariant measure of an irreducible and recurrent chain is unique up to a multiplicative factor.

Proof: See Brémaud (1999, p. 102).

Proposition 3.14 An irreducible, recurrent and homogeneous Markov chain is positive recurrent ⇔ all of its invariant measures y satisfy ∑_{i∈S} y_i < ∞.

Proof: By Proposition 3.12, there is an invariant measure y with

    y_i = E_0[ ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ].

Moreover, by Proposition 3.13, this invariant measure is unique up to a multiplicative factor. So what remains is to examine when ∑_{i∈S} y_i < ∞.

Using property (P2),

    ∑_{i∈S} y_i = E_0(T_0).

Therefore, positive recurrent ⇐⇒ E_0(T_0) < ∞ ⇐⇒ ∑_{i∈S} y_i < ∞.

To see why positive recurrence is important, consider the 1-D random walk on all integers Z, which is transient when p ≠ q and recurrent when p = q = 0.5. For any p and q, this Markov chain has an invariant measure y^T = [. . . , 1, 1, 1, . . . ]: here P is the doubly infinite tridiagonal matrix with q on the subdiagonal and p on the superdiagonal, so

    (y^T P)_i = ∑_{j∈Z} y_j p_ji = p_{i−1,i} + p_{i+1,i} = p + q = 1 = y_i.

Since this measure is not normalizable (the state space is Z), the 1-D random walk cannot be positive recurrent. Thus, we see that an irreducible homogeneous Markov chain can have an invariant measure and still be transient or null recurrent.

Lemma 3.15 Let {Xn} be a homogeneous Markov chain with state space S and n-step transition probability matrix P^n = {p_ij^{(n)}}. If i ∈ S is a transient state, then lim_{n→∞} p_ji^{(n)} = 0 for all j ∈ S.

Proof: This proof relies on a simple trick – if ∑_{n=1}^∞ p_ji^{(n)} < ∞, then lim_{n→∞} p_ji^{(n)} = 0. Thus, we only need to show that ∑_{n=1}^∞ p_ji^{(n)} < ∞ when i ∈ S is a transient state.

By definition,

    ∑_{n=1}^∞ p_ji^{(n)} = ∑_{n=1}^∞ P_j(X_n = i) = ∑_{n=1}^∞ E_j( I(X_n = i) ) = E_j( ∑_{n=1}^∞ I(X_n = i) ) = E_j(N_i).
So we can switch our goal to bounding E_j(N_i).

Because N_i is a RV taking integer values, we can rewrite its expectation as

    E_j(N_i) = ∑_{k=1}^∞ P_j(N_i ≥ k) = ∑_{k=0}^∞ P_j(N_i ≥ k + 1).

So we only need to compute each P_j(N_i ≥ k + 1). Recall from the proof of Proposition 3.8 that

    P_j(N_i ≥ k + 1) = f_ji f_ii^k.

We obtain

    E_j(N_i) = ∑_{k=0}^∞ P_j(N_i ≥ k + 1) = ∑_{k=0}^∞ f_ji f_ii^k.

Because the state i is transient, f_ii < 1, so the above summation becomes

    E_j(N_i) = ∑_{k=0}^∞ f_ji f_ii^k = f_ji / (1 − f_ii) < ∞,

which is the desired result.
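Lemma 3.15 can be illustrated numerically with the Gambler's Ruin+ chain from earlier (a + b = 4), whose interior states 1, 2, 3 are transient. The sketch below (mine; p = 0.5 is an arbitrary choice) raises P to a large power and checks that the columns of the transient states vanish:

```python
# Illustration of Lemma 3.15 with the Gambler's Ruin+ chain (a + b = 4,
# p = q = 0.5): states 1, 2, 3 are transient, so p_{ji}^{(n)} -> 0 for
# i in {1, 2, 3}.  We raise P to a large power by repeated multiplication.
p, q = 0.5, 0.5
P = [[1, 0, 0, 0, 0],
     [q, 0, p, 0, 0],
     [0, q, 0, p, 0],
     [0, 0, q, 0, p],
     [0, 0, 0, 0, 1]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pn = P
for _ in range(199):  # Pn = P^200
    Pn = matmul(Pn, P)
print(Pn[1])  # mass only on the absorbing states 0 and 4
```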

Finally, we obtain the criterion for stationary distribution.

Theorem 3.16 (Stationary Distribution Criterion) An irreducible homogeneous Markov chain is pos-
itive recurrent if and only if it has a stationary distribution. Moreover, if the stationary distribution
π T = [π1 , π2 , . . . ] exists, it is unique and πi > 0 for all i ∈ S.

Proof: ⇒:
By Propositions 3.12 and 3.14, the vector y defined in Proposition 3.12 is an invariant measure with ∑_{i∈S} y_i < ∞. Thus, the probability vector π = y / ∑_{i∈S} y_i is a stationary distribution. Uniqueness follows from Proposition 3.13.

⇐:
To prove this direction, we argue by contradiction. Because recurrence is a communication class property (Proposition 3.11) and the Markov chain is irreducible, if one state is transient then every state is transient. Let π be a stationary distribution and assume that the Markov chain is transient.

By Lemma 3.15, lim_{n→∞} p_ji^{(n)} = 0 for any state j ∈ S. Since π is a stationary distribution, π^T = π^T P^n. Using the dominated convergence theorem (we can exchange summation and limit)^1,

    π_i = lim_{n→∞} π_i = lim_{n→∞} ∑_{j∈S} π_j p_ji^{(n)} = ∑_{j∈S} π_j lim_{n→∞} p_ji^{(n)} = ∑_{j∈S} π_j × 0 = 0

for every state i ∈ S. Then we conclude ∑_{i∈S} π_i = 0 ≠ 1, a contradiction to the definition of a stationary distribution. Thus, the Markov chain is recurrent, and then by Proposition 3.14, it is positive recurrent.

^1 This works because π_j p_ji^{(n)} ≤ π_j for every n and ∑_{j∈S} π_j = 1.
In the above results, we worked on a state space S that may contain infinitely many states. In many realistic scenarios the number of states is finite. Does finiteness of the state space give us any benefit? The answer is yes – and the benefit is huge.

Theorem 3.17 An irreducible homogeneous Markov chain on a finite state space is positive recurrent.
Therefore, it always has a stationary distribution.

Proof:
We first prove that the chain is recurrent, arguing by contradiction. Assume that the chain is transient. In the proof of Lemma 3.15, we showed that if a state i is transient, then

    ∑_{n=1}^∞ p_ji^{(n)} = E_j(N_i) = ∑_{k=0}^∞ f_ji f_ii^k = f_ji / (1 − f_ii) < ∞.

Because the state space is finite,

    ∑_{i∈S} ∑_{n=1}^∞ p_ji^{(n)} < ∞.

However, if we exchange the summations,

    ∑_{i∈S} ∑_{n=1}^∞ p_ji^{(n)} = ∑_{n=1}^∞ ∑_{i∈S} p_ji^{(n)} = ∑_{n=1}^∞ 1 = ∞,

which is a contradiction. So we conclude that the chain is recurrent.

To see that the chain is positive recurrent, note that Proposition 3.12 shows that there exists an invariant measure y. Because the state space is finite, ∑_{i∈S} y_i < ∞, so by Proposition 3.14, the chain is positive recurrent.

Finally, we end this lecture with the relation between the return time and the stationary distribution.

Theorem 3.18 Let {Xn} be an irreducible homogeneous positive recurrent Markov chain. Then

    π_i = 1 / E_i(T_i),

where π = (π_1, · · · , π_s) is the stationary distribution of {Xn} and T_i = inf{n ≥ 1 : X_n = i} is the return time to state i.
Proof: Define a vector y such that y_i = E_0( ∑_{n=1}^∞ I(X_n = i) I(n ≤ T_0) ). We already know that such a vector is an invariant measure and π_i = y_i / ∑_{j∈S} y_j.

Now we consider the case i = 0. Then y_0 = E_0( ∑_{n=1}^∞ I(X_n = 0) I(n ≤ T_0) ) = 1 by property (P1). Moreover, ∑_{i∈S} y_i = E_0(T_0) due to property (P2). Thus,

    π_0 = y_0 / ∑_{i∈S} y_i = 1 / E_0(T_0).

Because state 0 is just a reference state, we can apply the same argument to any other state. Thus, we conclude that π_i = 1 / E_i(T_i) for each i ∈ S.
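Theorem 3.18 is easy to check by simulation on the two-state chain of Example 9, whose stationary distribution is π = (q/(p+q), p/(p+q)). This is an illustrative sketch (the values p = 0.3, q = 0.8 are arbitrary):

```python
import random

# Simulation check of pi_i = 1/E_i(T_i) for the two-state chain of Example 9
# (values p = 0.3, q = 0.8 picked arbitrarily), where pi = (q/(p+q), p/(p+q)).
p, q = 0.3, 0.8
P = [[1 - p, p], [q, 1 - q]]
rng = random.Random(2)

def return_time(i):
    """Sample T_i = inf{n >= 1 : X_n = i} starting from X_0 = i."""
    state, n = i, 0
    while True:
        state = rng.choices((0, 1), weights=P[state])[0]
        n += 1
        if state == i:
            return n

trials = 100_000
est = {}
for i in (0, 1):
    mean_T = sum(return_time(i) for _ in range(trials)) / trials
    est[i] = 1 / mean_T
print(est)  # ~ {0: q/(p+q) = 0.727..., 1: p/(p+q) = 0.272...}
```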

Here is a short summary about what we have learned so far:


1. Irreducibility + recurrence ⇒ There exists an invariant measure that is unique up to a proportionality constant.
2. Irreducibility + positive recurrence ⇔ Irreducibility + there exists a stationary distribution π and it is unique. Moreover, when π exists, π_i > 0 and π_i = 1/E_i[T_i].
3. Irreducibility + finite state space ⇒ Irreducibility + positive recurrence.

Here is a summary about the classification of states. Recall fii = Pi (Ti < ∞) is the probability of return to
i given we start at i and Ei (Ti ) is the expected return time. State i is called:

1. Recurrent if fii = 1.
2. Transient if fii < 1.
3. Positive Recurrent if fii = 1 and Ei (Ti ) < ∞.

4. Null Recurrent if fii = 1 and Ei (Ti ) = ∞.


5. Periodic with period d_i if p_ii^{(n)} = 0 for all n not divisible by d_i, and d_i (> 1) is the greatest such integer.
6. Aperiodic if d_i = 1.

7. Ergodic if 3. and 6. apply.


8. Absorbing if pii = 1.
