Stochastic Processes Lecture Notes

Jana Kurrek

Lectures: Paquette, Elliot


Textbook: Dobrow, Robert

Stochastic Processes

Figure 1. TASEP on Z/1000Z, with the initial distribution as the Bernoulli product measure.

Contents. Conditional expectation; Generating functions; Branching processes and random walk; Markov
chains, transition matrices, classification of states, ergodic theorem; Birth and death processes; Queueing.
w2022-math447: stochastic processes 2

Contents

Axioms of Probability
Conditional Probability
Conditional Distributions
Conditional Expectation
Time-Homogeneous Markov Chains
Finite State, Time-Homogeneous Chains
Transition Probabilities
Stationary Distributions
Regular Chains
Classification of States
First Step Analysis
Recurrence and Transience
Periodicity
Absorbing Chains
Positive and Null Recurrence
Reversibility
Markov Chain Monte Carlo
Markov Chain Coupling
Metropolis-Hastings Algorithm
Classical Markov Chains
Electrical Networks
Pólya's Theorem
Pólya's Urn
Branching Processes
Mean Generation Size
Generating Functions
Poisson Processes
Definition 1
Definition 2
Definition 3
Applications of Poisson Processes
Thinning and Superposition
Poissonization and Depoissonization
Order Statistics
Spatial Poisson Processes
Continuous-Time Markov Chains
Holding Times
Infinitesimal Generator
Classification of States
Stationary Distributions
Poisson Subordination
Birth-and-Death Chains
Martingales
Examples of Martingales
Limit Theorems

Axioms of Probability

Conditional Probability
Definition 1 (Conditional Probability). The conditional probability of
A given B, defined for $P(B) > 0$, is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Definition 2 (Independence). Events A and B are independent if
$P(A \mid B) = P(A)$. Equivalently, $P(A \cap B) = P(A)\,P(B)$.

Definition 3 (Law of Total Probability). Let $B_1, \dots, B_k$ be a sequence of
events that partition $\Omega$. Then, for any event A,

$$P(A) = P\Big(\bigcup_{i \in [k]} (A \cap B_i)\Big) = \sum_{i} P(A \cap B_i) = \sum_{i} P(A \mid B_i)\,P(B_i)$$

More generally, $P(A) = E[P(A \mid X)]$.

Definition 4 (Bayes' Rule). For events A and B,

$$P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}$$

With the Law of Total Probability, for a partition $B_1, \dots, B_k$,

$$P(B_i \mid A) = \frac{P(A \mid B_i)\,P(B_i)}{\sum_j P(A \mid B_j)\,P(B_j)}$$

Conditional Distributions
Definition 5 (Conditional Probability Mass Function). The conditional
probability mass function of Y given X = x is

$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}$$

Definition 6 (Conditional Probability Density Function). The conditional
probability density function of Y given X = x is

$$f_{Y \mid X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$$

Conditional Expectation

Definition 7 (Conditional Expectation). The conditional expectation
of Y given X = x, written $E[Y \mid X = x]$, is a function of x,

$$E[Y \mid X = x] = \begin{cases} \sum_y y \cdot P(Y = y \mid X = x) & \Omega \text{ is discrete} \\ \int_{-\infty}^{\infty} y \cdot f_{Y \mid X}(y \mid x)\,dy & \Omega \text{ is continuous} \end{cases}$$

Summary of Conditional Probability:

• Total Probability: $P(A) = E[P(A \mid X)]$

• Total Expectation: $E[Y] = E[E[Y \mid X]]$

• Total Conditional Probability: $P(Y = y \mid A) = E[P(Y = y \mid X, A) \mid A]$

• Total Conditional Expectation: $E[Y \mid A] = E[E[Y \mid X, A] \mid A]$

Definition 8. The conditional expectation of Y given A is, for the discrete case,

$$E(Y \mid A) = \frac{1}{P(A)} \sum_y y\,P(\{Y = y\} \cap A) = \sum_y y\,P(Y = y \mid A)$$


Definition 9 (Law of Total Expectation). If $A_1, \dots, A_k$ partitions $\Omega$ and
Y is a random variable, then the law of total expectation states that

$$E[Y] = \sum_{i=1}^{k} E[Y \mid A_i]\,P(A_i)$$

More generally, $E[Y] = E[E[Y \mid X]]$.

Proof. For the discrete case,

$$\begin{aligned}
E[E[Y \mid X]] &= \sum_x E[Y \mid X = x] \cdot P(X = x) \\
&= \sum_x \Big( \sum_y y\,P(Y = y \mid X = x) \Big) P(X = x) \\
&= \sum_y y \sum_x P(Y = y \mid X = x) \cdot P(X = x) \\
&= \sum_y y \sum_x P(X = x, Y = y) \\
&= \sum_y y \cdot P(Y = y) \\
&= E[Y]
\end{aligned}$$
Time-Homogeneous Markov Chains

Finite State, Time-Homogeneous Chains


Definition 10 (Finite State Stochastic Process). A finite state stochas-
tic process ( Xn )n≥0 has time steps in N and values in S = [ N − 1].
Definition 11 (Markov Property). The Markov property states that,
for every $n \in \mathbb{N}$ and every sequence of states $(i_0, i_1, \dots)$ with $i_j \in S$, the
behavior of the system depends only on the previous state,

$$P(X_n = i_n \mid X_0 = i_0, \dots, X_{n-1} = i_{n-1}) = P(X_n = i_n \mid X_{n-1} = i_{n-1})$$

Definition 12 (Time Homogeneity). A Markov chain is time-homogeneous
if the probabilities in Definition 11 do not depend on n,

$$P(X_n = i \mid X_{n-1} = j) = P(X_1 = i \mid X_0 = j) \quad (n \in \mathbb{N})$$

Definition 13 (Transition Matrix). The transition matrix P for a time-
homogeneous Markov chain is the $N \times N$ matrix whose (i, j)th entry $P_{ij}$ is
the one-step transition probability $p(i, j) = P(X_1 = j \mid X_0 = i)$.

Remark 14. The transition matrix P is stochastic, that is,

• (Non-Negative Entries) $0 \le P_{ij} \le 1$ for $1 \le i, j \le N$

• (Rows Sum to 1) $\sum_{j=1}^{N} P_{ij} = 1$ for $1 \le i \le N$

Example 1: Biased Coin Flips

Let $(X_n)_{n \ge 0}$ denote a sequence of coin flips where

$$P(X_{n+1} = H \mid X_n) = \begin{cases} 0.51 & \text{if } X_n = H \\ 0.49 & \text{if } X_n = T \end{cases}$$

and

$$P(X_{n+1} = T \mid X_n) = \begin{cases} 0.51 & \text{if } X_n = T \\ 0.49 & \text{if } X_n = H \end{cases}$$

Then,

$$P = \begin{pmatrix} 0.51 & 0.49 \\ 0.49 & 0.51 \end{pmatrix} = \begin{pmatrix} P_{HH} & P_{HT} \\ P_{TH} & P_{TT} \end{pmatrix}$$
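As a quick numerical check, the stochastic-matrix properties in Remark 14 and the multi-step probabilities for this chain can be verified directly (a sketch using numpy; state 0 is H, state 1 is T):

```python
import numpy as np

# Transition matrix for the biased coin chain (states: 0 = H, 1 = T).
P = np.array([[0.51, 0.49],
              [0.49, 0.51]])

# Stochastic matrix checks: non-negative entries, rows summing to 1.
assert (P >= 0).all()
assert np.allclose(P.sum(axis=1), 1.0)

# Two-step transition probabilities are given by the matrix power P^2.
P2 = np.linalg.matrix_power(P, 2)
print(P2[0, 0])  # P(X_2 = H | X_0 = H) = 0.51^2 + 0.49^2 = 0.5002
```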

Transition Probabilities
Definition 15 (Probability Distribution Vector). The distribution of a
discrete random variable X is the vector ⃗ϕ if,

ϕj = P( X = j) ∀j ∈ N

Definition 16 (Initial Distribution Vector). The initial probability


distribution of a Markov chain ( Xn )n≥0 is the distribution ⃗ϕ of X0 .
Theorem 17 (Transition Probabilities). The n-step transition probability
$p_n(i, j) = P(X_n = j \mid X_0 = i)$ is the (i, j)th entry of the matrix $P^n$.

Proof. The base case n = 1 holds by definition. Assume the statement
holds for a given n. Using the properties of matrix multiplication and
the Law of Total Probability,

$$P(X_{n+1} = j \mid X_0 = i) = \sum_{k \in S} P(X_n = k \mid X_0 = i) \cdot P(X_{n+1} = j \mid X_n = k) = \sum_{k \in S} p_n(i, k) \cdot p(k, j) = (P^n P)_{ij} = (P^{n+1})_{ij}$$

Theorem 18 (Chapman-Kolmogorov). Let $x, y \in S$ and $m, n \ge 1$. Then,

$$p_{m+n}(x, y) = P(X_{m+n} = y \mid X_0 = x) = \sum_{z \in S} P(X_{m+n} = y, X_m = z \mid X_0 = x) = \sum_{z \in S} p_m(x, z) \cdot p_n(z, y)$$

The probabilistic interpretation of Chapman-Kolmogorov is that transitioning
from x to y in m + n steps is equivalent to transitioning from x to some state z
in m steps and then moving from z to y in the remaining n steps.

Definition 19 (Distribution of $X_n$). The distribution of $X_n$ is

$$\vec{\varphi} \cdot P^n, \quad \text{i.e.,} \quad P(X_n = j) = (\vec{\varphi} \cdot P^n)_j \quad \forall j \in S$$

where P is the transition matrix of $(X_n)$ and $\vec{\varphi}$ is the initial distribution.
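Both results translate directly into matrix arithmetic. A sketch using numpy (the 3-state matrix is a hypothetical example, not one from the notes):

```python
import numpy as np

# A small hypothetical 3-state chain for illustration.
P = np.array([[0.9, 0.1, 0.0],
              [0.2, 0.7, 0.1],
              [0.0, 0.3, 0.7]])
phi = np.array([1.0, 0.0, 0.0])  # initial distribution: start in state 0

# Chapman-Kolmogorov: P^(m+n) = P^m P^n.
m, n = 2, 3
assert np.allclose(np.linalg.matrix_power(P, m + n),
                   np.linalg.matrix_power(P, m) @ np.linalg.matrix_power(P, n))

# Definition 19: the distribution of X_n is the row vector phi . P^n.
dist = phi @ np.linalg.matrix_power(P, n)
assert np.isclose(dist.sum(), 1.0)  # still a probability vector
```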

Stationary Distributions

Definition 20 (Limiting Distribution Vector). A limiting distribution
for a time-homogeneous Markov chain $(X_n)_{n \ge 0}$ is a distribution $\vec{\pi}$ such that

$$\lim_{n \to \infty} p_n(i, j) = \pi_j$$

The definition of a limiting distribution is equivalent to,

• $\lim_{n \to \infty} P(X_n = j) = \pi_j$

• $\lim_{n \to \infty} \vec{\varphi} \cdot P^n = \vec{\pi}$, where $\vec{\varphi}$ is the initial distribution

• $\lim_{n \to \infty} P^n = V$, where V is a stochastic matrix whose rows are $\vec{\pi}$


Definition 21 (Occupation Time). The occupation time of state j for a time-
homogeneous Markov chain $(X_n)_{n \ge 0}$ started at state i is

$$\lim_{n \to \infty} E\Big[\frac{1}{n} \sum_{k=0}^{n-1} 1_{\{X_k = j\}} \,\Big|\, X_0 = i\Big]$$

which represents the long-term expected proportion of time spent visiting j.

Remark 22. The limiting distribution gives the occupation time of $(X_n)$,

$$\begin{aligned}
\lim_{n \to \infty} E\Big[\frac{1}{n} \sum_{k=0}^{n-1} 1_{\{X_k = j\}} \,\Big|\, X_0 = i\Big]
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} E[1_{\{X_k = j\}} \mid X_0 = i] \\
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} P(X_k = j \mid X_0 = i) \\
&= \lim_{n \to \infty} \frac{1}{n} \sum_{k=0}^{n-1} p_k(i, j) \\
&= \lim_{n \to \infty} P^n_{ij} \\
&= \pi_j \quad \text{by Cesàro averaging}
\end{aligned}$$

Definition 23 (Stationary Distribution Vector). A probability vector
$\vec{\pi}$ is a stationary distribution for P if

$$\vec{\pi} = \vec{\pi} \cdot P$$

Lemma 24. The limiting distribution $\vec{\pi}$ of $(X_n)$, if it exists, is stationary.

Proof. Assume that $\vec{\pi}$ is the limiting distribution. We need to show that
$\vec{\pi} = \vec{\pi} \cdot P$. For any initial distribution $\vec{\varphi}$,

$$\vec{\pi} = \lim_{n \to \infty} \vec{\varphi} \cdot P^n = \lim_{n \to \infty} \vec{\varphi} \cdot (P^{n-1} \cdot P) = \Big(\lim_{n \to \infty} \vec{\varphi} \cdot P^{n-1}\Big) \cdot P = \vec{\pi} \cdot P$$

Note on Stationary Distributions: the converse of Lemma 24 is not true.
Stationary distributions are not necessarily limiting distributions. For
example, the chain with

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

has every probability vector as a stationary distribution, since
$\vec{\pi} \cdot P = \vec{\pi}$ for all $\vec{\pi}$. Compare:

$$P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{unique stationary distribution, no limiting distribution}$$

$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad \text{multiple stationary distributions}$$

$$p(x, x + 1) = 1 \ (S = \mathbb{N}) \quad \text{no stationary distribution}$$

Example 2: Random Walk on a Weighted Graph

Let G be a weighted graph with edge weight function w(i, j). For a random
walk on G, the stationary distribution $\vec{\pi}$ is proportional to the sum of
the edge weights incident to each vertex,

$$\pi_v = \frac{w(v)}{\sum_z w(z)} \quad \forall v \in V(G), \quad \text{where } w(v) = \sum_{z \sim v} w(v, z)$$

is the sum of the edge weights on all edges incident to v.

Consider the random walk with transition matrix,

$$P = \begin{pmatrix} 0 & 5/8 & 1/8 & 2/8 \\ 5/7 & 0 & 0 & 2/7 \\ 1 & 0 & 0 & 0 \\ 2/4 & 2/4 & 0 & 0 \end{pmatrix}$$

Let $w(i) = \sum_{j=1}^{4} w(i, j)$. Then,

$$\vec{\pi} = \Big(\frac{w(1)}{\sum_i w(i)}, \frac{w(2)}{\sum_i w(i)}, \frac{w(3)}{\sum_i w(i)}, \frac{w(4)}{\sum_i w(i)}\Big) = \Big(\frac{8}{20}, \frac{7}{20}, \frac{1}{20}, \frac{4}{20}\Big)$$

satisfies $\vec{\pi} \cdot P = \vec{\pi}$.
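The stationarity claim for the weighted-graph walk in Example 2 can be verified numerically (a sketch using numpy, with vertex weights w = (8, 7, 1, 4)):

```python
import numpy as np

# Random walk on the weighted graph from Example 2.
P = np.array([[0,   5/8, 1/8, 2/8],
              [5/7, 0,   0,   2/7],
              [1,   0,   0,   0  ],
              [2/4, 2/4, 0,   0  ]])

# Stationary distribution proportional to the vertex weights w(v).
pi = np.array([8, 7, 1, 4]) / 20

# pi is stationary: pi . P = pi.
assert np.allclose(pi @ P, pi)

# The chain is irreducible and aperiodic, so P^n also converges to the
# matrix whose rows are all pi (the limiting distribution).
V = np.linalg.matrix_power(P, 200)
assert np.allclose(V, np.tile(pi, (4, 1)), atol=1e-6)
```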

Example 3: Simple Random Walk on a Graph

For a simple random walk on a non-weighted graph,

$$w(i, j) = 1 \quad \forall i, j \in V(G) \quad \text{and} \quad w(v) = \deg(v)$$

If |E(G)| is the number of edges in the graph, this gives

$$\pi_v = \frac{\deg(v)}{\sum_z \deg(z)} = \frac{\deg(v)}{2|E(G)|}$$

Regular Chains

Definition 25 (Regularity). A transition matrix P is regular if and only
if there exists $n \in \mathbb{N}$ so that every entry of $P^n$ is positive.

Theorem 26. If every entry of $P^n$ is positive, then every entry of $P^m$ is
positive for all $m > n$.

Proof. Linear combinations of positive numbers are positive, so the
result follows by the definition of matrix multiplication,

$$P^{n+1}_{ij} = \sum_{\ell \in S} P^n_{i\ell} \cdot P_{\ell j} > 0$$

since $P^n_{i\ell} > 0$ and at least one $P_{\ell j}$ is positive (otherwise column j
of P, and hence of $P^n$, would be zero).
Theorem 27. A stochastic matrix P has an eigenvalue $\lambda^* = 1$. All other
eigenvalues $\lambda$ of P satisfy $|\lambda| \le 1$, with strict inequality if P is regular.

Proof. Let P be a $k \times k$ stochastic matrix. The rows of P sum to 1 by
definition, so $P \cdot \vec{1} = \vec{1}$ and $\lambda^* = 1$ is a right eigenvalue of P. Suppose
that $\vec{z}$ is an eigenvector corresponding to any other eigenvalue $\lambda$ of P.
Let $|z_m| = \max_{1 \le i \le k} |z_i|$ be the component of $\vec{z}$ of maximum absolute
value. Then,

$$|\lambda| \cdot |z_m| = |\lambda z_m| = |(P \cdot \vec{z})_m| = \Big|\sum_{i=1}^{k} P_{mi} z_i\Big| \le |z_m| \sum_{i=1}^{k} P_{mi} = |z_m|$$

and consequently $|\lambda| \le 1$.

Assume that P is regular. Then $P^n > 0$ for some $n > 0$. Since P is a
stochastic matrix, it was shown above that P has an eigenvalue $\lambda^* = 1$
and that all other eigenvalues $\lambda$ satisfy $|\lambda| \le 1$. We want to show that
the inequality is strict. If $\lambda$ is an eigenvalue of P, then $\lambda^n$ is an
eigenvalue of $P^n$. Let $\vec{x}$ be a corresponding eigenvector, with
$|x_m| = \max_{1 \le i \le k} |x_i|$. Then,

$$|\lambda|^n \cdot |x_m| = |(P^n \cdot \vec{x})_m| = \Big|\sum_{i=1}^{k} P^n_{mi} x_i\Big| \le |x_m| \sum_{i=1}^{k} P^n_{mi} = |x_m|$$

Since the entries of $P^n$ are positive, the last inequality is an equality only
if $|x_1| = \cdots = |x_k|$, and the first only if $x_1 = \cdots = x_k$. But the constant
vector is an eigenvector associated with the eigenvalue 1. Hence, if $\lambda \ne 1$,
one of the inequalities must be strict, and $|\lambda|^n < 1$.

Note: the two-state Markov chain with

$$P = \begin{pmatrix} 1 - p & p \\ q & 1 - q \end{pmatrix}$$

has stationary distribution

$$\pi = \Big(\frac{q}{p + q}, \frac{p}{p + q}\Big)$$

Example of Regularity: the following matrix is regular,

$$P = \begin{pmatrix} 0 & 1/2 & 1/2 \\ 1 & 0 & 0 \\ 1/2 & 1/2 & 0 \end{pmatrix}$$

because $P^4$ is positive,

$$P^4 = \begin{pmatrix} 9/16 & 5/16 & 1/8 \\ 1/4 & 3/8 & 3/8 \\ 1/2 & 5/16 & 3/16 \end{pmatrix}$$

Theorem 28. Every finite state time-homogeneous Markov chain $(X_n)_{n \ge 0}$
has a stationary distribution $\vec{\pi}$. Moreover, the stationary distribution(s) of
$(X_n)$ are in bijective correspondence with the left 1-eigenvectors of P that
are non-negative with components summing to 1.

Proof. If P is a stochastic matrix, then P has a right eigenvalue $\lambda^* = 1$
(see Theorem 27). Since $\det(P^T - \lambda I) = \det(P - \lambda I)$, the left and right
eigenvalues of P coincide. Hence, P has at least one left eigenvector $\vec{\pi}$ for
the eigenvalue 1, which can be taken non-negative. Normalizing this
eigenvector to sum to 1 gives

$$\vec{\pi} \cdot P = 1 \cdot \vec{\pi}, \quad \text{i.e.,} \quad \vec{\pi} \cdot P = \vec{\pi}$$

The proof that the stationary distribution(s) of $(X_n)$ are in bijective
correspondence with the left 1-eigenvectors of P is similar.

Corollary 29. If $(X_n)$ has a unique stationary distribution, then the
distribution is a left eigenvector of P corresponding to $\lambda^* = 1$.

Theorem 30 (Perron-Frobenius). Let M be a $k \times k$ positive matrix. Then,

• There exists $\lambda^* \in \mathbb{R}_+$ (called the Perron-Frobenius eigenvalue) which
is an eigenvalue of M, and $|\lambda| < \lambda^*$ for all other eigenvalues $\lambda$ of M

• The eigenspace of eigenvectors associated with $\lambda^*$ is one-dimensional

Proof. The proof of the Perron-Frobenius theorem can be found in
many linear algebra textbooks, including Horn and Johnson (1990).

Theorem 31 (Limit Theorem for Finite Regular Chains). If $(X_n)_{n \ge 0}$ is
a finite state time-homogeneous Markov chain and P is regular, then there is
a unique, positive, stationary distribution $\vec{\pi} > 0$ such that

$$\lim_{n \to \infty} P^n = V$$

where V is a matrix with all rows equal to $\vec{\pi}$.

Example of Communication Classes (margin figure): a graph G with 3
communication classes.
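Theorems 28 and 31 can be checked numerically for the regular 3-state chain used as a margin example in this section (a sketch using numpy, computing the stationary distribution as a normalized left 1-eigenvector):

```python
import numpy as np

# The 3-state regular chain from the margin example in this section.
P = np.array([[0.0, 0.5, 0.5],
              [1.0, 0.0, 0.0],
              [0.5, 0.5, 0.0]])

# Stationary distribution as the normalized left 1-eigenvector of P
# (Theorem 28): solve pi P = pi via the eigenvectors of P^T.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
pi = pi / pi.sum()
assert np.allclose(pi @ P, pi)

# Theorem 31: P^n converges to a matrix V with all rows equal to pi.
V = np.linalg.matrix_power(P, 100)
assert np.allclose(V, np.tile(pi, (3, 1)), atol=1e-8)
```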

Classification of States

Definition 32 (Communication). Two states $i, j \in S$ of a Markov chain
communicate, written $i \leftrightarrow j$, if there exist $m, n \in \mathbb{N}$ such that

$$p_m(i, j) > 0 \quad \text{and} \quad p_n(j, i) > 0$$

Equivalently, two states communicate if and only if each state has a positive
probability of eventually being reached by a chain starting in the other state.

Theorem 33. The relation $\leftrightarrow$ is an equivalence relation on the state space.

Proof. The relation $\leftrightarrow$ is reflexive, symmetric, and transitive.

• (Reflexivity) $i \leftrightarrow i$ since $p_0(i, i) = 1 > 0$

• (Symmetry) $i \leftrightarrow j \implies j \leftrightarrow i$ by definition

• (Transitivity) $i \leftrightarrow j$ and $j \leftrightarrow k \implies i \leftrightarrow k$, since for $m_1, m_2$
with $p_{m_1}(i, j) > 0$ and $p_{m_2}(j, k) > 0$,

$$\begin{aligned}
p_{m_1 + m_2}(i, k) &= P(X_{m_1 + m_2} = k \mid X_0 = i) \\
&\ge P(X_{m_1 + m_2} = k, X_{m_1} = j \mid X_0 = i) \\
&= P(X_{m_1} = j \mid X_0 = i) \cdot P(X_{m_1 + m_2} = k \mid X_{m_1} = j) \\
&= p_{m_1}(i, j)\,p_{m_2}(j, k) > 0
\end{aligned}$$

Definition 34 (Irreducibility). The relation $\leftrightarrow$ partitions the state space
into disjoint sets called communication classes. If there is only one
communication class, then the chain is called irreducible.

Definition 35 (Hitting Time). Let $(X_n)_{n \ge 0}$ be a Markov chain with state
space S. The hitting time ("first passage time") of $A \subseteq S$ when $X_0 = x$ is

$$\tau_A = \inf\{n \ge 0 \mid X_n \in A\}$$

Definition 36 (First Return Time). Let $(X_n)_{n \ge 0}$ be a Markov chain with
state space S. The first return time to x is a variant of the hitting time,

$$\tau_x^+ = \inf\{n \ge 1 \mid X_n = x\} \quad \text{assuming } X_0 = x$$

Definition 37 (Expected Number of Visits). Let $(X_n)_{n \ge 0}$ be a Markov
chain with state space S. The expected number of visits to $i \in S$ is

$$\sum_{n=0}^{\infty} p_n(i, i) \quad \text{assuming } X_0 = i$$

Proof. Let $N_i$ be the random variable giving the total number of visits
to $i \in S$, including the initial visit. We can write

$$N_i = \sum_{n=0}^{\infty} 1_{\{X_n = i\}}$$

where the expectation of $N_i$ when $X_0 = i$ is

$$E[N_i] = E\Big[\sum_{n=0}^{\infty} 1_{\{X_n = i\}}\Big] = \sum_{n=0}^{\infty} P(X_n = i) = \sum_{n=0}^{\infty} p_n(i, i)$$

Theorem 38 (Limit Theorem for Finite Irreducible Chains). If $(X_n)_{n \ge 0}$
is a finite state, time-homogeneous, irreducible Markov chain, then

$$\mu_j = E[\tau_j^+ \mid X_0 = j] < \infty \quad \text{for all } j \in S$$

and there exists a unique, positive, stationary distribution $\vec{\pi} > 0$ such that

$$\pi_j = \frac{1}{\mu_j} \quad \text{for all } j \in S$$

Furthermore,

$$\pi_j = \lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} P^m_{ij} \quad \text{for all } j \in S$$

First Step Analysis


Definition 39 (First-Step Analysis). First-step analysis is the pro-
cess of conditioning on the first step of the chain and using the law of total
expectation to find the expected return time, E[τj+ | X0 = j].

Remark 40. If $(X_n)_{n \ge 0}$ is irreducible, then the expected return time can
also be found by taking the reciprocal of the stationary probability, $\mu_j = 1/\pi_j$.

Example 4: First-Step Analysis

Consider the Markov chain on states {a, b, c} with transition matrix,

$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$

Define $e_a := E[\tau_a^+ \mid X_0 = a]$ and $e_x := E[\tau_a \mid X_0 = x]$ for
$x \in \{b, c\}$. Thus, $e_a$ is the desired expected return time, and $e_b$, $e_c$
are the expected first hitting times of a for the chain started in b and c.
Conditioning on the first step,

$$e_a = 1 + e_b$$
$$e_b = \frac{1}{2} + \frac{1}{2}(1 + e_c)$$
$$e_c = \frac{1}{3} + \frac{1}{3}(1 + e_b) + \frac{1}{3}(1 + e_c)$$

Solving these equations gives,

$$e_b = \frac{7}{3}, \quad e_c = \frac{8}{3}, \quad e_a = \frac{10}{3}$$
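The first-step equations above form a small linear system, which can be solved mechanically (a sketch using numpy; the equations are rearranged into the form A x = b):

```python
import numpy as np

# First-step equations from Example 4, rearranged as A x = 1 for
# x = (e_a, e_b, e_c):
#   e_a - e_b              = 1
#   e_b - (1/2) e_c        = 1
#   -(1/3) e_b + (2/3) e_c = 1
A = np.array([[1, -1,    0   ],
              [0,  1,   -1/2 ],
              [0, -1/3,  2/3 ]])
b = np.ones(3)
e_a, e_b, e_c = np.linalg.solve(A, b)
print(e_a, e_b, e_c)  # 10/3, 7/3, 8/3 as in the worked example
```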

Recurrence and Transience


Definition 41 (Recurrent). A state i ∈ S is recurrent if a Markov chain
starting at i will return to i infinitely often, with probability 1.

Definition 42 (Transient). A state i ∈ S is transient if a Markov chain


starting at i will return to i only finitely often, with probability 1.

Theorem 43. Recurrence and transience are class properties:

• If $i \in S$ is recurrent and $j \leftrightarrow i$, then j is recurrent

• If $i \in S$ is transient and $j \leftrightarrow i$, then j is transient

Proof. It suffices to show that if $i \in S$ is transient and $j \leftrightarrow i$, then j is
transient. Since $j \leftrightarrow i$, there exist $s, r \ge 0$ such that $p_s(i, j) > 0$ and
$p_r(j, i) > 0$. For all $n \in \mathbb{N}$, it holds that

$$p_{n+r+s}(i, i) \ge p_s(i, j) \cdot p_n(j, j) \cdot p_r(j, i)$$

by Chapman-Kolmogorov. Therefore,

$$\sum_{n=1}^{\infty} p_n(j, j) \le \frac{1}{p_s(i, j) \cdot p_r(j, i)} \sum_{n=1}^{\infty} p_{n+r+s}(i, i) \le \frac{1}{p_s(i, j) \cdot p_r(j, i)} \sum_{n=1}^{\infty} p_n(i, i) < \infty$$

since i is transient. It follows that j is transient. Hence, if one state of a
communication class is transient, all states in that class are transient.

Conversely, if one state is recurrent, then the others must be recurrent:
by contradiction, if the communication class contained a transient state,
then by what was just proven all the states would be transient.

Theorem 44. Every state in a finite, irreducible Markov chain is recurrent.

Proof. Every pair $i, j \in S$ belongs to the same communication class,
and that class has finitely many elements. By definition, there is positive
probability of reaching j from i since $i \leftrightarrow j$. Some state i must be
visited infinitely often, and each visit gives a fresh positive chance of
reaching j. If an event has a positive probability of occurring, and there
are infinitely many independent trials, then it occurs infinitely often.

Theorem 45. Let $(X_n)_{n \ge 0}$ be an irreducible Markov chain. Then,

$$E[N_i] = \sum_{n=0}^{\infty} p_n(i, i) = \frac{1}{1 - P(\tau_i^+ < \infty)} = \frac{1}{P(\tau_i^+ = \infty)}$$

Moreover,

$$E[N_i] \text{ is } \begin{cases} \text{finite} & \iff i \text{ is transient} \\ \text{infinite} & \iff i \text{ is recurrent} \end{cases}$$

Recall: let A be a square matrix with the property that $A^n \to 0$ as
$n \to \infty$. Then $\sum_{n=0}^{\infty} A^n = (I - A)^{-1}$. This gives the matrix
analog of the sum of a geometric series of real numbers.

Proof. Let $R_i$ be the number of returns to state i. Define the sequence
$(\tau_i^{(n)})_{n \ge 1}$ of successive return times by

$$\tau_i^{(n)} = \begin{cases} \inf\{m \ge 1 \mid X^*_m = i\} & \text{if } R_i \ge n \\ \infty & \text{otherwise} \end{cases}$$

where $(X^*_m)$ is the process $(X_n)$ restarted at time $\tau_i^{(n-1)}$. The count
$N_i = \sum_{n=0}^{\infty} 1_{\{X_n = i\}}$ is 1 more than the number of returns $R_i$, so

$$N_i = 1 + R_i = 1 + \sum_{n=1}^{\infty} 1_{\{\tau_i^{(n)} < \infty\}}$$

and $\tau_i^{(n)} = \infty$ if and only if $(X_n)$ visits i fewer than n + 1 times. Now,
$P(\tau_i^{(n)} < \infty) = [P(\tau_i^{(1)} < \infty)]^n$ by time homogeneity ($\star$). Therefore,

$$\begin{aligned}
E[N_i] &= E\Big[1 + \sum_{n=1}^{\infty} 1_{\{\tau_i^{(n)} < \infty\}}\Big] \\
&= \sum_{n=0}^{\infty} E\big[1_{\{\tau_i^{(n)} < \infty\}}\big] \quad \text{by linearity of expectation, with } \tau_i^{(0)} := 0 \\
&= \sum_{n=0}^{\infty} P(\tau_i^{(n)} < \infty) \\
&= \sum_{n=0}^{\infty} [P(\tau_i^{(1)} < \infty)]^n \quad (\star) \\
&= \frac{1}{1 - P(\tau_i^+ < \infty)} \quad \text{by the geometric series}
\end{aligned}$$

Thus, $E[N_i]$ is finite if and only if i is transient, and infinite if and
only if i is recurrent.

Stirling's Formula states that as $n \to \infty$,

$$n! \sim \sqrt{2\pi n}\,\Big(\frac{n}{e}\Big)^n$$
Example 5: Simple Symmetric Random Walk on $\mathbb{Z}$

The simple symmetric random walk on $\mathbb{Z}$ is recurrent:

$$E[N_0] = \sum_{n \ge 0} p_n(0, 0) = \sum_{n \ge 0} p_{2n}(0, 0) = \sum_{n \ge 0} \binom{2n}{n} \cdot \frac{1}{2^{2n}} \ge \sum_{n \ge 1} \frac{1}{\sqrt{4n}} = \infty$$

where the inequality follows from Stirling's Formula.
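The divergence of the return-probability series can be seen numerically by computing exact partial sums of $\sum_n \binom{2n}{n}/4^n$ (a short sketch; the function name is ours):

```python
from math import comb

# Partial sums of the return-probability series for the simple symmetric
# random walk on Z: E[N_0] = sum_n C(2n, n) / 4^n, which diverges.
def partial_sum(N):
    return sum(comb(2 * n, n) / 4**n for n in range(N + 1))

# The terms behave like 1 / sqrt(pi * n) by Stirling, so the partial sums
# grow without bound, roughly like 2 * sqrt(N / pi).
assert partial_sum(1) < partial_sum(10) < partial_sum(100)
print(partial_sum(1000))
```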

Definition 46 (Canonical Decomposition). The canonical decomposition
of the state space S of a finite Markov chain is a separation of S,

$$S = T \cup R_1 \cup \dots \cup R_m$$

where $R_1, \dots, R_m$ are the communication classes of recurrent states and T
is the set of all transient states. With the blocks ordered $T, R_1, \dots, R_m$,
P has the block matrix form,

$$P = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & P_1 & 0 & \cdots & 0 \\ 0 & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & P_m \end{pmatrix}$$

where each square stochastic matrix $P_i$ corresponds to a closed recurrent
communication class, which is irreducible with a restricted state space.
(A communication class is closed if the chain cannot leave it once entered;
here every closed class consists of recurrent states.)

Remark 47. The block matrix form facilitates taking matrix powers,

$$\lim_{n \to \infty} P^n = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & \lim_n P_1^n & 0 & \cdots & 0 \\ 0 & 0 & \lim_n P_2^n & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lim_n P_m^n \end{pmatrix}$$

Corollary 48. For every recurrent class $R_i$, there is a stationary
distribution $\vec{\pi}$ such that $\pi_j > 0$ if and only if $j \in R_i$.

Corollary 49. The dimension of the eigenspace of P for the eigenvalue 1 is
the number of recurrent classes in the Markov chain.

Periodicity

Definition 50 (Period). The period of a state i is $d = d(i)$,

$$d(i) = \gcd(J_i) \quad \text{where } J_i = \{n \ge 1 \mid p_n(i, i) > 0\}$$

If $d(i) = 1$, state i is called aperiodic.

Theorem 51. The states of a communication class all have the same period.

Proof. Let i and j be states with $i \leftrightarrow j$. Since i and j communicate,
there exist $m, n \in \mathbb{N}$ such that

$$p_m(i, j) > 0 \quad \text{and} \quad p_n(j, i) > 0$$

Then m + n is a possible return time for i,

$$p_{m+n}(i, i) = \sum_{k \in S} p_m(i, k) \cdot p_n(k, i) \ge p_m(i, j) \cdot p_n(j, i) > 0$$

and $d(i)$ is a divisor of m + n. Assume that $p_r(j, j) > 0$ for some
integer r. Then $p_{r+m+n}(i, i) \ge p_m(i, j) \cdot p_r(j, j) \cdot p_n(j, i) > 0$ and
$d(i)$ is a divisor of r + m + n. Since $d(i)$ divides both m + n and
r + m + n, it must also divide r. Thus, $d(i)$ is a common divisor of
the set $\{r > 0 \mid p_r(j, j) > 0\}$. Since $d(j)$ is the greatest such common
divisor, $d(i) \le d(j)$. A symmetric argument gives $d(j) \le d(i)$.

Example 6: Periodicity

A random walk on the n-cycle has no limiting distribution
when n is even. The graph is regular, and the unique stationary
distribution is uniform. However, the chain alternates
between even and odd states, and its position after n steps
depends on the parity of the initial state.

Definition 52 (Periodic Chain). A Markov chain is periodic if it is
irreducible and all states have period greater than 1.

Definition 53 (Aperiodic Chain). A Markov chain is aperiodic if it is
irreducible and all states have period equal to 1.

Note on Periodicity: any state i satisfying $P_{ii} > 0$ is necessarily
aperiodic. Thus, a sufficient condition for an irreducible Markov chain to
be aperiodic is that $P_{ii} > 0$ for some i, i.e., at least one diagonal entry
of the transition matrix is non-zero.

Theorem 54. P is irreducible and aperiodic if and only if P is regular.

Proof. Suppose that P is irreducible and aperiodic. Consider any states
i, j. Since P is irreducible, there exists $m(i, j)$ so that $p_{m(i,j)}(i, j) > 0$.
Since P is aperiodic, there exists $M(i)$ so that $p_n(i, i) > 0$ for all
$n \ge M(i)$. Taken together, this means that for $n \ge M(i)$,

$$p_{n + m(i,j)}(i, j) \ge p_n(i, i)\,p_{m(i,j)}(i, j) > 0$$

Hence $p_n(i, j) > 0$ for all pairs i, j once $n \ge \max\{M(i) + m(i, j) \mid (i, j) \in S \times S\}$,
so P is regular.

Conversely, suppose that P is regular. By definition, there exists $M > 0$
such that $P^n$ has all entries strictly positive for all $n \ge M$. This means
that $p_n(i, j) > 0$ for all states i, j, and consequently that P is
irreducible. If $P^n$ has strictly positive entries, so does $P^{n+1}$. Thus,
$P(X_n = i \mid X_0 = i) > 0$ and $P(X_{n+1} = i \mid X_0 = i) > 0$. Since
$\gcd(n, n + 1) = 1$, P is aperiodic.

Theorem 55 (Limit Theorem for Aperiodic Irreducible Chains). If
$(X_n)_{n \ge 0}$ is a finite state, time-homogeneous, aperiodic, irreducible Markov
chain and P is its transition matrix, then there is a unique, positive,
stationary distribution $\vec{\pi} > 0$ such that

$$\lim_{n \to \infty} P^n = V$$

where V is a matrix with all rows equal to $\vec{\pi}$.

Absorbing Chains

Definition 56 (Absorption). A state $i \in S$ is called absorbing if

$$p(i, i) = 1$$

An absorbing chain has at least one absorbing state. (Intuitively, an
absorbing state i is a state that the chain never leaves once it first visits i.)

Definition 57 (Absorption Probability). Let $(X_n)_{n \ge 0}$ be a Markov chain
with all states transient or absorbing. The absorption probability is the
probability that the chain is absorbed in state j from transient state i.

Definition 58 (Absorption Time). Let $(X_n)_{n \ge 0}$ be a Markov chain with
all states transient or absorbing. The absorption time is the expected
number of steps from transient state i to absorption in some absorbing state.

Note on Absorption Time: the problem of computing hitting times reduces
to the problem of computing the time to absorption, because a state can be
modified to be absorbing.

Theorem 59. Let $(X_n)_{n \ge 0}$ be finite-state and irreducible with transition
matrix P. To find the expected hitting time $E[\tau_i]$,

• Consider a new chain in which i is an absorbing state, i.e., replace the
ith row of P by the indicator vector of i

• Define Q by deleting the ith row and ith column of P

Assume that the chain starts in state a. Then, with $(I - Q)^{-1}$ indexed by
the states other than i, the expected time for the chain to first hit i is

$$E[\tau_i] = \begin{cases} \sum_{b \ne i} (I - Q)^{-1}_{a,b} & \text{if } a \ne i \\ 1 + \sum_{j \ne i} P_{ij} \sum_{b \ne i} (I - Q)^{-1}_{j,b} & \text{if } a = i \end{cases}$$

Proof. We need to find

$$E[\tau_a] = E[\inf\{n \ge 0 \mid X_n = a\}] = \sum_{n=1}^{\infty} n \cdot P(\tau_a = n) = \sum_{n=1}^{\infty} P(\tau_a \ge n)$$

For an absorbing state a,

$$P(\tau_a \ge n) = P(X_{n-1} \ne a) \quad (n \ge 1)$$

Considering every possible path that avoids a, with $\varphi$ the initial
distribution restricted to the states $b \ne a$,

$$E[\tau_a] = \sum_{n=1}^{\infty} \sum_{b \ne a} (\varphi\,Q^{n-1})_b = \sum_{b \ne a} \big(\varphi\,(I - Q)^{-1}\big)_b$$

since $Q^n \to 0$ for the substochastic matrix Q, so that
$\sum_{n \ge 0} Q^n = (I - Q)^{-1}$.
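The recipe in Theorem 59 can be sketched numerically. Assuming the transition matrix implied by the first-step equations of Example 4 (a reconstruction on states a, b, c), the fundamental matrix $(I - Q)^{-1}$ recovers the same hitting and return times:

```python
import numpy as np

# Chain from Example 4 on states (a, b, c); hitting times of state a.
P = np.array([[0,   1,   0  ],
              [1/2, 0,   1/2],
              [1/3, 1/3, 1/3]])
Q = P[1:, 1:]                      # delete row and column of state a
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix (I - Q)^{-1}

# Row sums of N give expected steps until absorption in a.
e_b, e_c = N.sum(axis=1)
assert np.isclose(e_b, 7/3) and np.isclose(e_c, 8/3)

# Expected return time to a: one step, then hit a from wherever we land.
e_a = 1 + P[0, 1] * e_b + P[0, 2] * e_c
assert np.isclose(e_a, 10/3)
```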

Positive and Null Recurrence

Definition 60 (Positive Recurrent). A recurrent state j is positive
recurrent if the expected return time $E[\tau_j^+ \mid X_0 = j]$ is finite.
(Positive recurrence is the infinite-state analog of recurrence in finite chains.)

Theorem 61 (Limit Theorem for Irreducible, Positive Recurrent Chains).
Let $(X_n)_{n \ge 0}$ be an infinite, irreducible, and positive recurrent Markov
chain. There exists a unique, positive, stationary distribution $\pi$, which is
the limiting distribution of the chain. That is,

$$\pi_j = \lim_{n \to \infty} P^n_{ij} \quad \text{for all } i, j$$

(For infinite irreducible chains that are null recurrent, no stationary
distribution exists.) Moreover,

$$\pi_j = \frac{1}{E[\tau_j^+]} \quad \text{for all } j$$

Definition 62 (Null Recurrent). A recurrent state j is null recurrent if
the expected return time $E[\tau_j^+ \mid X_0 = j]$ is infinite. (Null recurrent
chains do not have stationary distributions.)
Theorem 63. Positive and null recurrence are class properties. (In
particular, the states in a recurrent communication class are either all
positive recurrent or all null recurrent.)

Proof. Assume that i is a positive recurrent state, and write
$\mu_i = E[\tau_i^+ \mid X_0 = i]$. Let j be another state in the same
communication class as i. Since $i \leftrightarrow j$, there exist $s, r \ge 0$ such
that $p_r(j, i) > 0$ and $p_s(i, j) > 0$. Thus,

$$\begin{aligned}
\frac{1}{\mu_j} &= \lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} p_m(j, j) \\
&\ge \lim_{n \to \infty} \frac{1}{n} \sum_{m=r+s}^{n-1} p_r(j, i) \cdot p_{m-r-s}(i, i) \cdot p_s(i, j) \\
&= \lim_{n \to \infty} p_r(j, i) \Big( \frac{n - r - s}{n} \cdot \frac{1}{n - r - s} \sum_{m=r+s}^{n-1} p_{m-r-s}(i, i) \Big) p_s(i, j) \\
&= p_r(j, i) \Big( \frac{1}{\mu_i} \Big) p_s(i, j) > 0
\end{aligned}$$

Hence, $\mu_j < \infty$ and j is positive recurrent.

Definition 64 (Ergodic). A Markov chain is called ergodic if it is
irreducible, aperiodic, and all states have finite expected return times.
(Every finite irreducible, aperiodic chain is ergodic, and the condition
that all states have finite expected return times is equivalent to all
states being positive recurrent.)

Theorem 65 (Limit Theorem for Ergodic Chains). Let $(X_n)_{n \ge 0}$ be an
ergodic Markov chain. There exists a unique, positive, stationary
distribution $\pi$, which is the limiting distribution of the chain. That is,

$$\pi_j = \lim_{n \to \infty} P^n_{ij} \quad \text{for all } i, j$$

Moreover,

$$\pi_j = \frac{1}{E[\tau_j^+]} \quad \text{for all } j$$

Reversibility

Definition 66 (Detailed Balance Equations).

$$\pi_i P_{ij} = \pi_j P_{ji} \quad \text{for all } i, j \in S$$

More generally, for a stationary chain started from $\vec{\pi}$, we can write

$$P(X_0 = i_0, X_1 = i_1, \dots, X_n = i_n) = P(X_0 = i_n, X_1 = i_{n-1}, \dots, X_n = i_0)$$

Theorem 67. Let $(X_n)_{n \ge 0}$ be a Markov chain with transition matrix P.
If $\vec{\pi}$ satisfies the detailed balance equations, then $\vec{\pi}$ is stationary
for P. (Checking detailed balance is often the simplest way to verify that a
particular distribution is stationary.)

Proof. $\sum_{i \in S} \pi_i P_{ij} = \sum_{i \in S} \pi_j P_{ji} = \pi_j$ since the rows of P sum to 1.

Definition 68. An irreducible Markov chain $(X_n)_{n \ge 0}$ with transition
matrix P and stationary distribution $\vec{\pi}$ is time reversible if it satisfies
the detailed balance equations. The time reversal of $(X_n)$ is the chain with

$$\hat{P}_{ij} := \frac{\pi_j P_{ji}}{\pi_i}$$

as its transition matrix.

Remark 69. If a chain with transition matrix P is reversible, then $\hat{P} = P$.

Proof. By the detailed balance equations, $\pi_j P_{ji} = \pi_i P_{ij}$, so

$$\hat{P}_{ij} = \frac{\pi_j P_{ji}}{\pi_i} = \frac{\pi_i P_{ij}}{\pi_i} = P_{ij}$$

Example 7: Reversibility

A simple random walk on a graph is time reversible. For neighbors i, j,

$$\pi_i P_{ij} = \Big(\frac{\deg(i)}{2|E(G)|}\Big)\Big(\frac{1}{\deg(i)}\Big) = \frac{1}{2|E(G)|} = \Big(\frac{\deg(j)}{2|E(G)|}\Big)\Big(\frac{1}{\deg(j)}\Big) = \pi_j P_{ji}$$
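Detailed balance for the simple random walk can be checked on a concrete graph (a sketch using a small hypothetical example, the triangle on three vertices):

```python
import numpy as np

# Simple random walk on a hypothetical small graph: the triangle 0-1-2.
A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)   # adjacency matrix
deg = A.sum(axis=1)
P = A / deg[:, None]          # P_ij = 1/deg(i) for neighbors i, j
pi = deg / deg.sum()          # pi_v = deg(v) / (2 |E(G)|)

# Detailed balance: pi_i P_ij = pi_j P_ji for all i, j (symmetric matrix).
F = pi[:, None] * P
assert np.allclose(F, F.T)

# Detailed balance implies stationarity (Theorem 67).
assert np.allclose(pi @ P, pi)
```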

Markov Chain Monte Carlo

Markov Chain Coupling

Definition 70 (Total Variation Distance). The total variation distance
between two distributions $\mu$ and $\nu$ on a state space S is defined by

$$\|\mu - \nu\|_{TV} = \sup_{A \subseteq S} |\mu(A) - \nu(A)|$$

This definition is explicitly probabilistic: the distance between $\mu$ and
$\nu$ is the maximum difference between the probabilities assigned to a
single event by the two distributions.
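For finite state spaces, the supremum over events is achieved by the set where $\mu$ exceeds $\nu$, which gives the identity $\|\mu - \nu\|_{TV} = \frac{1}{2}\sum_x |\mu(x) - \nu(x)|$. A minimal sketch (the distributions are hypothetical):

```python
import numpy as np

# Total variation distance between two discrete distributions. The sup over
# events A is achieved by A = {x : mu(x) > nu(x)}, yielding the half-L1 form.
def tv_distance(mu, nu):
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

mu = np.array([0.5, 0.3, 0.2])
nu = np.array([0.25, 0.25, 0.5])
d = tv_distance(mu, nu)
print(d)  # ≈ 0.3: the event A = {0, 1} has mu(A) - nu(A) = 0.8 - 0.5 = 0.3
```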

Example 8: Coupling Distance

Suppose, for illustration, that the total variation distance

$$\|\pi_7 - \pi\|_{TV} = 0.17$$

This tells us that the probability of any event, for example the probability
of winning any specified card game using a deck shuffled 7 times, differs by
at most 0.17 from the probability of the same event using a perfectly
shuffled deck.

Note: a coupling of two probability distributions $\mu$ and $\nu$ is a pair of
random variables (X, Y) defined on a single probability space such that the
marginal distribution of X is $\mu$ and the marginal distribution of Y is
$\nu$. That is, a coupling (X, Y) satisfies

$$P(X = x) = \mu(x) \quad \text{and} \quad P(Y = y) = \nu(y)$$

Definition 71 (Coupling of Markov Chains). A coupling of Markov chains
is a process $(X_n, Y_n)_{n \ge 0}$ with the property that both $(X_n)$ and
$(Y_n)$ are Markov chains with transition matrix P, although the chains may
have different starting distributions.

Definition 72 (Coupling Time). The coupling time of a process ( Xn , Yn )n≥0


is defined to be the first time T in which Xn equals Yn ,

T = inf{n | Xn = Yn }

Lemma 73. Consider a coupling ( Xn , Yn )n≥0 of Markov chains ( Xn ), (Yn ).

⃗0 · Pn − π
∥π ⃗ ∥ TV ≤ P( T > n) for all n > 0 The coupling inequality reduces the
problem of showing that,
⃗ is the stationary distribution of (Yn ).
where π
⃗0 · Pn − π
∥π ⃗∥ → 0
to that of showing,
Proof. Define the process (Yn∗ ) by,
P( T > n) → 0 ⇐⇒ P( T < ∞) = 1

Y
n if n < T
Yn∗ =
 Xn if n ≥ T

(Yn∗ ) is a Markov chain with the same probability transition matrix P


as ( Xn ). This is because Yn and Xn share P and Zn and Xn share π ⃗0 .

Since we also have that (Yn ) ∼ π for all n,

⃗ = P( Xn ∈ A) − P(Yn∗ ∈ A)
⃗0 · Pn ( A) − π
π
= P( Xn ∈ A, T ≤ n) + P( Xn ∈ A, T > n)
− P(Yn∗ ∈ A, T ≤ n) − P(Yn∗ ∈ A, T > n)

However, on the event {T ≤ n}, Yn* = Xn, so that P(Xn ∈ A, T ≤ n) = P(Yn* ∈ A, T ≤ n). Simplifying gives that,

π⃗0 · P^n(A) − π⃗(A) = P(Xn ∈ A, T > n) − P(Yn* ∈ A, T > n) ≤ P(T > n)

Taking the supremum over A gives the claim.

Remark 74 (Doeblin Coupling Argument). Proving the Markov Conver-


gence Theorem can be done by showing that P( T < ∞) = 1.

Proof. The bivariate chain Z = {(Xn, Yn) | n ≥ 0} is a Markov chain on the state space S × S. Its transition matrix P^Z can be written,

P^Z_{(i1,i2),(j1,j2)} = P_{i1,j1} · P_{i2,j2}

and the stationary distribution is,

π^Z(i, j) = π_i · π_j

P(T < ∞) = 1 holds if the Z chain almost surely hits the diagonal {(j, j) | j ∈ S} ⊆ S × S. Since Z has a stationary distribution, it suffices to show that Z is irreducible14. This can be done using a number-theoretic proof.

14: If an irreducible Markov chain has a stationary distribution, then the chain is recurrent.
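The coupling inequality can be checked numerically. The sketch below is an illustration only (the two-state chain and its stationary distribution π = (4/7, 3/7) are assumptions chosen for the example, not taken from the notes): two independent copies of the chain are run until they first meet, and the exact total variation distance is compared against the estimated P(T > n).

```python
import random

# A two-state chain chosen for illustration; its stationary distribution,
# solved by hand from pi = pi * P, is pi = (4/7, 3/7).
P = [[0.7, 0.3],
     [0.4, 0.6]]
pi = [4 / 7, 3 / 7]

def step(state):
    """One transition of the chain."""
    return 0 if random.random() < P[state][0] else 1

def coupling_time(x0, y0):
    """Run two independent copies until they first meet (Doeblin coupling)."""
    x, y, n = x0, y0, 0
    while x != y:
        x, y, n = step(x), step(y), n + 1
    return n

def tv_from_delta0(n):
    """Exact ||delta_0 * P^n - pi||_TV, computed by iterating the row vector."""
    p = [1.0, 0.0]
    for _ in range(n):
        p = [p[0] * P[0][0] + p[1] * P[1][0],
             p[0] * P[0][1] + p[1] * P[1][1]]
    return 0.5 * (abs(p[0] - pi[0]) + abs(p[1] - pi[1]))

random.seed(0)
trials = [coupling_time(0, 1) for _ in range(5000)]
for n in (1, 3, 5):
    frac = sum(t > n for t in trials) / len(trials)
    # Coupling inequality: TV distance at time n <= P(T > n) (+ Monte Carlo slack)
    assert tv_from_delta0(n) <= frac + 0.02
```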
Metropolis-Hastings Algorithm
Definition 75 (Markov Chain Monte Carlo). Given a discrete or continuous probability distribution π⃗, the goal of Markov Chain Monte Carlo is to simulate a random variable X ∼ π⃗.

Remark 76 (Strong Law of Large Numbers for Markov Chains). If (Xn)n≥0 is a finite state, time-homogeneous, aperiodic, irreducible Markov chain and r is a bounded, real-valued function, then15,

lim_{n→∞} (r(X1) + · · · + r(Xn)) / n = E[r(X)] a.s.

where E[r(X)] = ∑_j r(j)π_j.

15: The chain is not i.i.d, but successive excursions between visits to the same state are independent.
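The ergodic averaging in Remark 76 can be illustrated by a small simulation; the two-state chain and the test function r below are hypothetical choices, not from the notes.

```python
import random

# A two-state chain chosen for illustration; pi = (4/7, 3/7) solves pi = pi * P.
P = {0: 0.7, 1: 0.4}          # P[i] = probability of moving to state 0 from i
pi = (4 / 7, 3 / 7)

def r(j):
    """A bounded, real-valued test function."""
    return 10.0 if j == 1 else 0.0

random.seed(1)
x, total, n = 0, 0.0, 200_000
for _ in range(n):
    x = 0 if random.random() < P[x] else 1
    total += r(x)

time_avg = total / n                             # (r(X1) + ... + r(Xn)) / n
space_avg = sum(r(j) * pi[j] for j in (0, 1))    # E[r(X)] = sum_j r(j) pi_j
assert abs(time_avg - space_avg) < 0.1
```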

Definition 77 (Metropolis-Hastings Algorithm). Let π⃗ be a discrete probability distribution. The Metropolis-Hastings Algorithm constructs a reversible Markov chain (Xn)n≥0 whose stationary distribution is π⃗,

1. Let T, the proposal chain, be a transition matrix16 for any irreducible Markov chain with the same state space as π⃗

2. Assume that at time n, the chain is at state i

3. Choose a new state j, the proposal state, according to T_ij

4. Let U ∼ Unif(0, 1). If Xn = i, define an acceptance function,

a(i, j) = (π_j T_ji) / (π_i T_ij), and let X_{n+1} := j if U ≤ a(i, j), and X_{n+1} := i otherwise

16: It is assumed that the user knows how to sample from T.

Proof. The sequence (Xn)n≥1 constructed by the Metropolis-Hastings Algorithm is a Markov chain, as each X_{n+1} only depends on Xn. Let P be its transition matrix. We need to show that (Xn) is reversible with stationary distribution π⃗. Given Xn = i, then,

P(U ≤ a(i, j)) = a(i, j) if a(i, j) ≤ 1, and 1 otherwise
               = a(i, j) if π_j T_ji ≤ π_i T_ij, and 1 otherwise

and for i ̸= j,

P_ij = T_ij · a(i, j) if π_j T_ji ≤ π_i T_ij, and T_ij otherwise

The diagonal entries of P are determined by the fact that the rows of
P sum to 1. There are two cases,

• If π_j T_ji ≤ π_i T_ij,

π_i P_ij = π_i T_ij a(i, j) = π_i T_ij · (π_j T_ji / (π_i T_ij)) = π_j T_ji = π_j P_ji

• If π_j T_ji > π_i T_ij,

π_i P_ij = π_i T_ij = π_j T_ji · (π_i T_ij / (π_j T_ji)) = π_j T_ji a(j, i) = π_j P_ji

Hence, the detailed balance equations are satisfied.
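A minimal sketch of the algorithm, assuming a small finite state space and a symmetric random-walk proposal (so the ratio T_ji/T_ij cancels in the acceptance function); the target distribution below is invented for illustration.

```python
import random

# Hypothetical target distribution on {0,...,4}.
target = [0.1, 0.2, 0.4, 0.2, 0.1]
S = len(target)

def mh_step(i):
    """One Metropolis-Hastings transition with a symmetric random-walk
    proposal on a cycle (T_ij = T_ji, so the ratio T_ji / T_ij cancels)."""
    j = (i + random.choice((-1, 1))) % S
    a = target[j] / target[i]            # acceptance function a(i, j)
    return j if random.random() <= a else i

random.seed(2)
counts = [0] * S
x = 0
for _ in range(200_000):
    x = mh_step(x)
    counts[x] += 1

freqs = [c / 200_000 for c in counts]
# Empirical occupation frequencies approach the target distribution.
assert all(abs(f - p) < 0.02 for f, p in zip(freqs, target))
```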

Classical Markov Chains

Electrical Networks
We can represent Markov chains as electrical networks, which we typically depict as undirected weighted graphs G = (V, E).

Definition 78 (Electrical Network). A network is a pair (G, c), where G = (V, E) is a countable graph and c : E → (0, ∞) is a function assigning a positive conductance to each edge such that,

c_x := ∑_{y∼x} c_xy < ∞ ∀x ∈ V

and (Xn)n≥0 is a random walk with transition matrix,

P_xy := c_xy / c_x

Definition 79 (Resistance). The resistance R_xy of an edge e = (x, y) is,

R_xy = 1 / c_xy

That is, it is the reciprocal of its conductance.

Definition 80 (Effective Conductance). The effective conductance between two disjoint, non-empty finite sets of vertices A and B is defined,

C^eff_{A,B} := ∑_{v∈A} c_v · P(τ_B < τ_A^+ | X0 = v)

where τ_B is the first hitting time of B and τ_A^+ is the first return time to A17. When A = {a} and B = {b}, the effective conductance is,

C^eff_{a,b} := c_a · P(τ_b < τ_a^+ | X0 = a)

When A and B are not disjoint, we define C^eff_{A,B} = ∞.

17: Recall that we write τ_z and τ_a^+ for the first time that the random walk visits z and the first positive time that the random walk visits a respectively.

Definition 81 (Effective Resistance). The effective resistance between two disjoint, non-empty finite sets of vertices A and B is defined,

R^eff_{A,B} = 1 / C^eff_{A,B}

Remark 82 (Series Additivity). Edges connected in series can be replaced by one edge whose resistance is the sum of the resistances.

Remark 83 (Parallel Additivity). Edges connected in parallel can be replaced by one edge whose conductance is the sum of the conductances.
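Series and parallel additivity translate directly into two helper functions; the three-resistor network below is a hypothetical example for illustration.

```python
def series(*resistances):
    """Resistances in series add."""
    return sum(resistances)

def parallel(*resistances):
    """Conductances (reciprocals of resistances) in parallel add."""
    return 1.0 / sum(1.0 / r for r in resistances)

# Hypothetical network: a 1-ohm and a 2-ohm edge in series (total 3),
# placed in parallel with a single 3-ohm edge: R_eff = (3*3)/(3+3) = 1.5.
r_eff = parallel(series(1.0, 2.0), 3.0)
assert abs(r_eff - 1.5) < 1e-12
```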
Remark 84. Given a network G and two disjoint, finite, non-empty sets of vertices A and B, we can form a network G′ by contracting ("shorting") each of the sets A and B into single vertices [A] and [B]. Then,

C^eff_{A,B}(G) = C^eff_{[A],[B]}(G′)

Shorting two vertices adds infinite conductance between them (or, merges them while preserving edges). Thus, shorting two sets of nodes can only decrease the effective resistance of the network between two nodes.

Remark 85 (Rayleigh's Monotonicity Principle). Let G = (V, E) be a finite graph, and let A and B be disjoint, non-empty subsets of V. Then,

C^eff_{A,B}(G)

is a monotone increasing function of c ∈ (0, ∞)^E. Cutting an edge can only increase the effective resistance between the two vertices that it is adjacent to.

Remark 86. An irreducible Markov chain (Xn) on a network is recurrent if and only if the effective conductance to infinity vanishes,

P(τ_0^+ < ∞ | X0 = 0) = 1 ⟺ lim_{n→∞} C^eff_{0,[n]} = 0

where [n] denotes the set of vertices at distance at least n from 0, shorted to a single vertex.

Pólya’s Theorem
Theorem 87 (Pólya's Theorem). The simple random walk on the d-dimensional hypercubic lattice Z^d is recurrent if d ≤ 2 and transient if d ≥ 3.

Corollary 88. A simple random walk on any subgraph of Z2 is recurrent18.

18: Adding a finite number of edges to a transient graph preserves transience.

Pólya’s Urn
Definition 89 (Pólya Urn Model). Pólya's Urn is the process,

• An urn contains two balls, one black and one white

• Proceed by choosing a ball at random from those already in the urn

• Return the chosen ball to the urn and add another ball of the same color

The sequence of ordered pairs listing the numbers of black and white balls is a Markov chain. A configuration (a, b) with a black balls and b white balls evolves according to,

(a, b) → (a + 1, b) with probability a/(a + b), and (a, b) → (a, b + 1) with probability b/(a + b)

Example 9: Pólya’s Urn

We want to find the likelihood that the system reaches,

(B, W) = (m, n) starting from (B, W) = (b, w)

Consider the transition,

(1, 1) → (3, 3)

where one possible path is,

(1, 1) → (1, 2) → (1, 3) → (2, 3) → (3, 3)

Its likelihood is,

(1/2) × (2/3) × (1/4) × (2/5) = ((1 · 2) · (1 · 2)) / (2 · 3 · 4 · 5)

There are (4 choose 2) = 6 distinct routes, each with this same probability. In general, each path (b, w) → (m, n) has probability,

[b(b + 1) · · · (m − 1)] · [w(w + 1) · · · (n − 1)] / [(b + w)(b + w + 1) · · · (m + n − 1)]

Rewriting this probability using factorials,

((m − 1)!/(b − 1)!) × ((n − 1)!/(w − 1)!) × ((b + w − 1)!/(m + n − 1)!)

Figure 1: The urn process as a trajectory on a two-dimensional lattice, where bullets indicate intermediate stages.

The total number of distinct paths from (b, w) to (m, n) is,

(m + n − b − w choose m − b)

so the transition probability P that, starting from configuration (b, w), the system reaches (m, n) is:

P = (m − 1 choose b − 1) · (n − 1 choose w − 1) · (m + n − 1 choose b + w − 1)^{−1}

In particular, for (b, w) = (1, 1),

P = 1/(m + n − 1), that is, P = 1/(U − 1) when the urn holds U = m + n balls
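The computation above says that, started from (1, 1), every configuration with the same total number of balls is equally likely. A quick simulation (a sketch; the trial count is an arbitrary choice) checks this after four draws, where the total goes from 2 to 6 and each of the five reachable configurations should appear with probability 1/5.

```python
import random

def polya_step(a, b):
    """Draw a ball from an urn with a black and b white balls, replace it,
    and add one more of the same color."""
    if random.random() < a / (a + b):
        return a + 1, b
    return a, b + 1

random.seed(3)
trials = 100_000
counts = {}
for _ in range(trials):
    a, b = 1, 1
    for _ in range(4):              # four draws take the total from 2 to 6
        a, b = polya_step(a, b)
    counts[(a, b)] = counts.get((a, b), 0) + 1

# Each of the five configurations (1,5),...,(5,1) should appear w.p. 1/5.
for c in counts.values():
    assert abs(c / trials - 0.2) < 0.01
```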

Branching Processes

Mean Generation Size


Definition 90 (Offspring Distribution). The offspring distribution X gives the probability x_k that an individual gives birth to k children,

X = (x0, x1, x2, · · ·)

independently of other individuals.

Figure 2: Family tree.

Definition 91 (Branching Process). Let Zn be a random variable that denotes the size of a population. The Markov chain (Zn)n≥0 with values in N0 is a branching process if,

Zn+1 = ∑_{j=1}^{Zn} Xj

where Xj denotes the number of children born to the jth individual in the nth generation. (Xj)j≥1 is an i.i.d sequence with common distribution X. Furthermore, Zn is independent of the (Xj) used to form Zn+1.

Definition 92 (Extinction Time). The extinction time T0 of a branching


process is the hitting time to zero, that is, T0 := τ0 .

Definition 93 (Mean Generation Size). The mean generation size µn is the mean size of the nth generation, that is, µn := E[Zn].

Definition 94 (Mean Offspring Size). The mean offspring size µ is the mean of the offspring distribution, that is, µ := E[X].

Theorem 95. Let µ = ∑_{k=0}^∞ k · x_k be the mean of the offspring distribution. If Z0 = 1, then,

E[Zn] = µ^n

Proof. By the Total Law of Expectation,

E[Zn] = ∑_{k=0}^∞ E[Zn | Zn−1 = k] · P(Zn−1 = k)
      = ∑_{k=0}^∞ E[∑_{i=1}^{Zn−1} Xi | Zn−1 = k] · P(Zn−1 = k)
      = ∑_{k=0}^∞ E[∑_{i=1}^k Xi | Zn−1 = k] · P(Zn−1 = k)
      = ∑_{k=0}^∞ E[∑_{i=1}^k Xi] · P(Zn−1 = k)      since the Xi and Zn−1 are independent
      = ∑_{k=0}^∞ µ · k · P(Zn−1 = k)
      = µ · E[Zn−1]

Iterating the recurrence for n ≥ 0 gives that,

E[Zn] = µE[Zn−1] = µ²E[Zn−2] = · · · = µ^n E[Z0] = µ^n

since Z0 = 1.

Theorem 96. If µ < 1, then P(T0 > n) ≤ E[Zn] = µ^n. In particular, the branching process goes extinct with probability 1, P(T0 < ∞) = 1.

Proof. By Markov's Inequality,

P(T0 > n) = P(Zn ≥ 1) ≤ E[Zn] = µ^n

Definition 97 (Criticality). A branching process is subcritical if µ < 1, critical if µ = 1, and supercritical if µ > 1. Moreover,

lim_{n→∞} E[Zn] = lim_{n→∞} µ^n = 0 if µ < 1; 1 if µ = 1; ∞ if µ > 1

Note on Criticality: For a subcritical process, mean generation size declines exponentially to zero. For a supercritical process, it exhibits exponential growth.
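Theorem 95 can be checked by simulation; the subcritical offspring pmf below (mean µ = 0.6) is a hypothetical choice for illustration.

```python
import random

def next_generation(z, offspring):
    """Sum of z i.i.d. draws from the offspring pmf (offspring[k] = P(X = k))."""
    return sum(random.choices(range(len(offspring)), weights=offspring, k=z))

def simulate_Zn(offspring, n):
    z = 1                            # Z_0 = 1
    for _ in range(n):
        z = next_generation(z, offspring)
    return z

# Hypothetical subcritical pmf: mu = 0*0.5 + 1*0.4 + 2*0.1 = 0.6.
random.seed(4)
pmf, mu, n, trials = [0.5, 0.4, 0.1], 0.6, 5, 50_000
mean_Zn = sum(simulate_Zn(pmf, n) for _ in range(trials)) / trials
assert abs(mean_Zn - mu**n) < 0.02       # Theorem 95: E[Z_n] = mu^n
```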
Generating Functions
Definition 98 (Generating Function). Let X be a discrete random variable with values in N0. The probability generating function of X is,

G(s) = E[s^X] = ∑_{k=0}^∞ s^k · P(X = k) = ∑_{k=0}^∞ s^k · π_k
     = P(X = 0) + sP(X = 1) + s²P(X = 2) + · · ·

where π⃗ = (π_k)k≥0 is the law of X.

Example 10: Generating Functions

Let X ∼ Unif({0, 1, 2}). Then,

G(s) = 1/3 + (1/3)s + (1/3)s² = (1/3)(1 + s + s²)

Let X ∼ Geom(p). For |s| < 1,

G(s) = ∑_{k=1}^∞ s^k p(1 − p)^{k−1} = sp ∑_{k=1}^∞ (s(1 − p))^{k−1} = sp / (1 − s(1 − p))

Let X ∼ Po(µ). For µ > 0,

G(s) = ∑_{k=0}^∞ (e^{−µ}µ^k / k!) · s^k = e^{−µ} · ∑_{k=0}^∞ (µs)^k / k! = e^{−µ}e^{µs} = e^{µ(s−1)}

Properties of Generating Functions:

1. If X and Y satisfy GX(s) = GY(s) for all s, then X and Y are equal in law.

2. If X and Y are independent, then GX+Y(s) = GX(s) · GY(s).

Remark 99. The series G(s) converges absolutely for |s| ≤ 1.

Proof. Let π⃗ be the law of X. Then,

|G(s)| = |∑_{k=0}^∞ s^k · π_k| ≤ ∑_{k=0}^∞ |s|^k · π_k ≤ 1

so G(s) exists and is well-defined for |s| ≤ 1.

Theorem 100. Let (Xn) be an i.i.d sequence of random variables. Define Z := ∑_{i=1}^n Xi. The probability generating function of Z is [GX(s)]^n.

Proof. Expanding the definition,

GZ(s) = E[s^Z] = E[s^{∑_{i=1}^n Xi}] = E[∏_{k=1}^n s^{Xk}] = ∏_{k=1}^n E[s^{Xk}] by independence
      = GX1(s) · · · GXn(s)

Since the Xi are identically distributed,

GZ(s) = GX1(s) · · · GXn(s) = [GX(s)]^n

where X has the same distribution as each Xi.

Theorem 101. Probabilities for X can be obtained from the generating function by successive differentiation. If G^(j) is the jth derivative of G,

G^(j)(s) = ∑_{k=j}^∞ k(k − 1) · · · (k − j + 1) s^{k−j} P(X = k)

Proof. Observe that,

G(0) = P(X = 0)

G′(0) = ∑_{k=1}^∞ ks^{k−1}P(X = k) |_{s=0} = P(X = 1)

G′′(0) = ∑_{k=2}^∞ k(k − 1)s^{k−2}P(X = k) |_{s=0} = 2P(X = 2)

and so on. In general,

G^(j)(0) = ∑_{k=j}^∞ k(k − 1) · · · (k − j + 1)s^{k−j}P(X = k) |_{s=0} = j!P(X = j)

and thus

P(X = j) = G^(j)(0) / j!, for j = 0, 1, . . .

Theorem 102. The generating function of the nth generation size Zn is the n-fold composition of the offspring distribution generating function,

Gn(s) = E[s^{Zn}] = G ∘ G ∘ · · · ∘ G(E[s^{Z0}])   (n times)

Proof. The generating function of the nth generation size Zn is,

Gn(s) = E[s^{Zn}] = E[s^{∑_{k=1}^{Zn−1} Xk}] = E[E[s^{∑_{k=1}^{Zn−1} Xk} | Zn−1]]

where the last equality is by the Total Law of Expectation. Conditioning on Zn−1 = z,

E[s^{∑_{k=1}^{Zn−1} Xk} | Zn−1 = z] = E[s^{∑_{k=1}^z Xk} | Zn−1 = z]   by conditioning
                                    = E[s^{∑_{k=1}^z Xk}]              by independence
                                    = E[∏_{k=1}^z s^{Xk}]
                                    = ∏_{k=1}^z E[s^{Xk}]              by independence
                                    = [G(s)]^z                          for all z

G(s) is the generating function of the offspring distribution,

G(s) = ∑_{k=0}^∞ s^k π_k

so this gives that,

E[s^{∑_{k=1}^{Zn−1} Xk} | Zn−1] = [G(s)]^{Zn−1}

Taking expectations,

Gn(s) = E[G(s)^{Zn−1}] = Gn−1(G(s))

The result follows by induction on n.

Corollary 103. The extinction probability of a branching process is,

P(T0 < ∞) = lim_{k→∞} lim_{s→0} G ∘ G ∘ · · · ∘ G(E[s^{Z0}])   (k times)

Proof. The generating function for the nth generation size Zn is,

Gn(s) = ∑_{k=0}^∞ s^k P(Zn = k)

Since P(T0 ≤ k) = P(Zk = 0),

P(Zk = 0) = lim_{s→0} E[s^{Zk}] = lim_{s→0} Gk(s) = lim_{s→0} G ∘ G ∘ · · · ∘ G(E[s^{Z0}])   (k times)

Now, P(T0 < ∞) = P(∪_{k=1}^∞ {T0 ≤ k}) = P(∪_{k=1}^∞ {Zk = 0}). Therefore,

P(T0 < ∞) = P(∪_{k=1}^∞ {Zk = 0})
          = lim_{k→∞} P({Zk = 0})                                  since {Zk = 0} ⊆ {Zk+1 = 0}
          = lim_{k→∞} lim_{s→0} G ∘ G ∘ · · · ∘ G(E[s^{Z0}])       (k times)

by the continuity of measure over increasing unions.

Theorem 104. The probability generating function G(s) of a discrete random variable X is convex and non-decreasing on [0, 1] with G(1) = 1.

Proof. We have that,

1. G(1) = 1 since,

E[1^X] = ∑_{k=0}^∞ P(X = k) = 1

2. G(s) is increasing, and strictly so when π_k ̸= 0 for some k ≥ 1, since for s > 0,

G′(s) = ∑_{k=1}^∞ ks^{k−1}π_k > 0

3. G(s) is convex, and strictly so when π_k ̸= 0 for some k ≥ 2, since for s > 0,

G′′(s) = ∑_{k=2}^∞ k(k − 1)s^{k−2}π_k > 0

Theorem 105. G′(1) can be used to find the mean of X.

Proof. G′(1) = ∑_{k=1}^∞ k · π_k = ∑_{k=1}^∞ k · P(X = k) = E[X].

Theorem 106. Let G(s) be the probability generating function of the offspring distribution of a branching process. The smallest positive root of the equation G(s) = s is the probability of eventual extinction, that is, P(T0 < ∞).

Proof. The extinction probability P(T0 < ∞) of a branching process is a root of the equation s = G(s). To see this,

P(T0 ≤ k) = P(Zk = 0) = Gk(0) = G(Gk−1(0)) = G(P(Zk−1 = 0)) = G(P(T0 ≤ k − 1))

Taking the limits on both sides and using the continuity of G,

P(T0 < ∞) = G(P(T0 < ∞))

Let x be a positive solution of s = G(s). We need to show that,

P(T0 < ∞) = lim_{k→∞} P(Zk = 0) ≤ x

The proof is by induction on k. Since G(s) is increasing on (0, 1] and x > 0, P(Z1 = 0) = G1(0) = G(0) ≤ G(x) = x. Assume that P(Zk = 0) ≤ x for k < n. Then, P(Zn = 0) = Gn(0) = G(Gn−1(0)) = G(P(Zn−1 = 0)) ≤ G(x) = x. Taking limits as n → ∞ gives,

P(T0 < ∞) ≤ x

Theorem 107. Let G(s) be the probability generating function of a discrete random variable X. Exactly one of the following holds,

1. G(s) = s for infinitely many s ∈ [0, 1] and,

lim_{k→∞} G ∘ G ∘ · · · ∘ G(s) = s   (k times)   ∀s ∈ [0, 1]

2. G(s) = s for two points s1, s2 ∈ [0, 1] with s2 ̸= 1 and,

lim_{k→∞} G ∘ G ∘ · · · ∘ G(s) = s2   (k times)   ∀s ∈ [0, 1)

3. G(s) = s for a unique point s ∈ [0, 1] and,

lim_{k→∞} G ∘ G ∘ · · · ∘ G(s) = 1   (k times)   ∀s ∈ [0, 1]

Proof. If π_k = δ_{1k} for all k ≥ 0, then G(s) = s for all s ∈ [0, 1] and µ = 1. This implies that G(s) has infinitely many fixed points in the interval [0, 1]. Assume that π_k ̸= δ_{1k}. Since G is convex, the two curves y = G(s) and y = s can intersect at either one or two points. The derivative of G(s) at s = 1 distinguishes these two cases.

1. If µ ≤ 1, then G′(1) ≤ 1. Since G′ is strictly increasing in s, G′(s) < G′(1) ≤ 1 for 0 < s < 1. Let h(s) = s − G(s). Then h′(s) = 1 − G′(s) > 0 for 0 < s < 1, so h is increasing. Since h(1) = 0, h(s) < 0 and s < G(s) for 0 < s < 1. Then, y = G(s) lies above y = s for 0 < s < 1, and s = 1 is the only point of intersection. Thus, P(T0 < ∞) = 1.

2. If µ > 1, then G′(1) > 1, so h′(1) = 1 − G′(1) = 1 − µ < 0 and h(s) is decreasing at s = 1. Since h(1) = 0, there exists 0 < t < 1 such that h(t) > 0. Also, h(0) = 0 − G(0) = −P(X = 0) ≤ 0. It follows by continuity that there exists a fixed point s2 = G(s2) satisfying 0 ≤ s2 < 1.

Example 11: Computing Extinction Probabilities

Consider a branching process with,

Z0 = 1 and π⃗ = (1/3, 1/3, 1/3)

where π⃗ is the offspring distribution. The curves,

y = s and y = G(s) = (1/3)(1 + s + s²)

intersect only at s = 1. Therefore, µ ≤ 1, and the extinction probability is P(T0 < ∞) = 1. We can also compute µ explicitly,

µ = G′(1) = (1/3)(1 + 2s) |_{s=1} = 1

Example 12: Computing Extinction Probabilities

Consider a branching process with,

Z0 = 1 and π⃗ ∼ Po(µ)

where π⃗ is the offspring distribution. Recall that,

G(s) = e^{µ(s−1)}

Taking µ = 2 (so that G1(0) = e^{−2} ≈ 0.135335) and solving s = e^{µ(s−1)} numerically by iteration,

G1(0)      G2(0)      G3(0)      G4(0)      G10(0)     G15(0)
0.135335   0.177403   0.192975   0.199079   0.203169   0.203187
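The iteration in the table can be reproduced with a short fixed-point loop. The value µ = 2 is assumed here, since it is consistent with G1(0) = e^{−2} ≈ 0.135335.

```python
import math

def extinction_probability(G, tol=1e-12, max_iter=10_000):
    """Iterate s -> G(s) starting from s = 0; the iterates G_k(0) increase
    to the smallest non-negative fixed point of G (Theorem 106)."""
    s = 0.0
    for _ in range(max_iter):
        s_next = G(s)
        if abs(s_next - s) < tol:
            return s_next
        s = s_next
    return s

q = extinction_probability(lambda s: math.exp(2.0 * (s - 1.0)))  # mu = 2
assert abs(q - math.exp(2.0 * (q - 1.0))) < 1e-9   # q solves s = G(s)
assert 0.2031 < q < 0.2033                         # matches the iterates
```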

Poisson Processes

Definition 1

Figure 3: A Poisson probability generating function with various means µ.

Definition 108 (Counting Function). A counting process (Nt)t≥0 is a collection of random variables with values in N0 such that,

Nt ≥ Ns ∀t ≥ s ≥ 0

and lim_{t→s+} Nt = Ns for all s ∈ R (right-continuity19).

19: If 0 ≤ s < t, then Nt − Ns is the number of events in the interval (s, t]

There are three equivalent definitions of a Poisson process, each of which gives special insights into the stochastic model.

Definition 109 (Poisson Process – 1a). A Poisson process with parameter λ ≥ 0 is a counting process (Nt)t≥0 satisfying,

1. N0 = 0

2. Nt − Ns ∼ Po(λ · (t − s)) for all t > s > 0

3. Nt − Ns is independent of Nr for all t > s > 0 and 0 ≤ r ≤ s

where Nt − Ns is the number of events that have occurred in (s, t]. The parameter λ is called the rate because E[Nt] = E[Nt − N0] = λ · t.

Figure 4: Counting process.
Example 13: Poisson Process

Jana sends Ioan 10 text messages per hour after 10am. We want to find the probability that Ioan has exactly 18 texts by noon and 70 texts at 5pm. This problem can be modelled by a Poisson process with rate 10, where the desired probability is,

P(N2 = 18, N7 = 70)

By definition,

{N2 = 18, N7 = 70} = {N2 = 18, N7 − N2 = 52}

Since [0, 2] and (2, 7] are disjoint,

P(N2 = 18, N7 = 70) = P(N2 = 18, N7 − N2 = 52)
                    = P(N2 = 18) · P(N7 − N2 = 52)
                    = P(N2 = 18) · P(N5 = 52)
                    = (e^{−20} · 20^18 / 18!) · (e^{−50} · 50^52 / 52!)
                    ≈ 0.0045

Definition 110 (Poisson Process – 1b). A Poisson process with parameter λ ≥ 0 is a counting process (Nt)t≥0 with,

N0 = 0 and Nt − Ns ∼ Po(λ · (t − s))

for all t > s > 0, conditionally on Nr for 0 ≤ r ≤ s.

Lemma 111. Nt − Ns has the same distribution as Nt−s.20

20: Be careful when thinking about this conditionally.

Proof. Nt−s − N0 ∼ Po(λ · (t − s − 0)) = Po(λ · (t − s)).

Theorem 112. Let (Nt)t≥0 be a Poisson process with parameter λ. Then, for s > 0,

(Nt+s − Ns)t≥0

is itself a Poisson process with parameter λ.

Proof. We need to show that the translated process is probabilistically equivalent to the original process. It suffices to show that,

Yt := Nt+s − Ns

is a Poisson Process (Definition 1b). Clearly, Y0 = 0. Now,

Yt − Yr = Nt+s − Ns − (Nr+s − Ns) = Nt+s − Nr+s

so Yt − Yr ∼ Po(λ · (t − r)) conditionally on (Nq) for 0 ≤ q ≤ r + s. Furthermore, (Nq)q≤r+s contains all the information in (Yq)q≤r = (Nq+s − Ns)q≤r, so Yt − Yr ∼ Po(λ · (t − r)) conditionally on (Yq) for 0 ≤ q ≤ r, as required.

Definition 2

Definition 113 (Interarrival Time). The interarrival times (Xk)k≥1 are the times between consecutive jumps of a counting process.

Definition 114 (Arrival Time). The arrival times (Sk)k≥0 are the times at which (Nt)t≥0 increases,

Sk − Sk−1 = Xk (where S0 = 0)

Note on Arrival Time: at an arrival time s, lim_{t→s+} Nt ̸= lim_{t→s−} Nt.

Definition 115 (Poisson Process – 2). Let (Xn)n≥1 be a sequence of i.i.d exponential random variables with parameter λ. For t > 0,

Nt = max{n | X1 + · · · + Xn ≤ t}

with N0 = 0. Then, (Nt)t≥0 is a Poisson process with parameter λ. Here,

Sn = ∑_{i=1}^n Xi, n ∈ N

defines a sequence (Sn) of arrival times of the process, where Sk is the time of the kth arrival. The interarrival time between arrivals k − 1 and k is,

Xk = Sk − Sk−1, k ∈ N with S0 = 0

Figure 5: Arrival times S1, S2, · · · and interarrival times X1, X2, · · · .

Note on the Exponential Distribution. Let X ∼ Exp(λ) and let Xi ∼ Exp(λi) be independent. Then,

P(X ≥ t) = e^{−λ·t}

E[f(X)] = ∫_0^∞ λe^{−λt} f(t)dt   (⋆)

E[X] = 1/λ

G(r) = E[r^X] = ∫_0^∞ λe^{−λt} r^t dt = λ/(λ − log r)

P(min{X1, · · · , Xn} > t) = e^{−(λ1+...+λn)t}

P(M = Xi) = λi/(λ1 + · · · + λn) where M = min_i{Xi}

P(X > s + t | X > s) = P(X > t) for s, t > 0

Lemma 116. If (Nt)t≥0 is a rate λ Poisson process, as in the interarrival definition, then Nt ∼ Po(λt) for all t ≥ 0.

Proof. Let (Nt) be a rate λ Poisson process. For k ∈ N,

P(Nt = k) = P({Sk ≤ t} ∩ {Sk+1 > t}) =: P(A)

By the Total Probability Rule,

P(Nt = k) = E[P(A | Sk)]
          = E[1{Sk≤t} · P(A | Sk) + 1{Sk>t} · P(A | Sk)]
          = E[1{Sk≤t} · P({Sk ≤ t} ∩ {Sk+1 > t} | Sk)]
          = E[1{Sk≤t} · P(Sk+1 > t | Sk)]
          = E[1{Sk≤t} · P(Sk+1 − Sk > t − Sk | Sk)]
          = E[1{Sk≤t} · P(Xk+1 > t − Sk | Sk)]

Applying Property (⋆) of the exponential distribution,

E[1{Sk≤t} · P(Xk+1 > t − Sk | Sk)] = ∫_0^t e^{−λ(t−x)} f(x)dx
                                   = ∫_0^t e^{−λ(t−x)} · (λ^k x^{k−1} / (k − 1)!) e^{−λx} dx
                                   = λ^k · t^k · e^{−λt} / k!

where the density f(x) of Sk can be found using the PGF21.

21: Sk is Gamma(k, λ) distributed, so f(x) = (λ^k x^{k−1} / (k − 1)!) e^{−λx}

Definition 117 (Memoryless). A random variable X is memoryless if,

P(X > s + t | X > s) = P(X > t)

Lemma 118. The exponential distribution is memoryless22.

22: In fact, it is the only continuous distribution that is memoryless.

Proof. Let X ∼ Exp(λ). Then for all s, t > 0,

P(X > t + s | X > s) = e^{−λ(t+s)} / e^{−λs} = e^{−λt}

Therefore, X − s ∼ Exp(λ) conditionally on X > s.

Lemma 119. The minimum of independent exponentials is exponential.

Proof. Let Xi ∼ Exp(λi) be independent. Then,

P(min(X1, . . . , Xn) > t) = P(X1 > t, . . . , Xn > t)
                          = P(X1 > t) . . . P(Xn > t)
                          = e^{−λ1 t} . . . e^{−λn t}
                          = e^{−(λ1+...+λn)t}

so min(X1, . . . , Xn) ∼ Exp(λ1 + · · · + λn).

Advantages of Definition 1: independence of increments; explicit law for the statistics of Nt. Advantages of Definition 2: ease of construction and calculation; explicit law for the interarrival times.

Example 14: Poisson Process
Buses arrive at a bus stop according to a Poisson process with parameter λ = 6 per hour. Suppose that you arrive at 1pm. Then,

1. Probability of waiting at least 15 minutes, with S1 = X1,

P(S1 > 1/4) = e^{−6/4} = e^{−3/2}

2. Probability that exactly 3 buses arrive in the next hour,

P(N1 = 3) = 6³ / (e⁶ · 3!)

3. Expected time to wait for the bus,

E[S1] = 1/6 hr = 10 min

4. If 18 buses arrived between 12:50pm and 1:00pm, the expected waiting time is unchanged: by memorylessness it does not depend on the past,

E[S1] = 1/6 hr = 10 min

Definition 3

Definition 120 (Poisson Process – Definition 3). A Poisson process with parameter λ is a counting process (Nt)t≥0 satisfying23,

1. N0 = 0

2. (Nt) has stationary and independent increments

3. P(Nh = 0) = 1 − λh + o(h)

4. P(Nh = 1) = λh + o(h)

5. P(Nh > 1) = o(h)

where f(h) = o(g(h)) means that lim_{h→0} f(h)/g(h) = 0.

23: There cannot be infinitely many arrivals in a finite interval, and in an infinitesimal interval there may occur at most one event.

Applications of Poisson Processes

Thinning and Superposition

Note on Thinning: Let Z ∼ Po(λ). Suppose that (Xi)i≥1 is a sequence of i.i.d Bernoulli trials with success parameter p. If (Xi) is independent of Z, then Y = ∑_{i=1}^Z Xi ∼ Po(λp). To see this, compute the probability generating function of Y.

Definition 121 (Thinning a Poisson Process). Let (Nt)t≥0 be a Poisson process with parameter λ. Assume that each arrival, independent of other arrivals, is marked as a "Type k" (k ∈ [n]) event with probability pk (where ∑ pk = 1). Let Nt^(k) be the number of "Type k" events in [0, t]. Then,

(Nt^(k))t≥0 is a Poisson process with rate λpk

(Nt^(i))t≥0 and (Nt^(j))t≥0 are independent (0 ≤ i ̸= j ≤ n)

Each process is called a thinned Poisson process.
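Thinning is easy to check by simulation: generate arrivals from exponential gaps (Definition 2), keep each arrival independently with probability p, and compare the empirical rate of the kept stream with λp. The rates and time horizon below are hypothetical choices.

```python
import random

def poisson_arrival_times(rate, t_max):
    """Arrival times in [0, t_max]: cumulative sums of Exp(rate) gaps."""
    times, t = [], random.expovariate(rate)
    while t <= t_max:
        times.append(t)
        t += random.expovariate(rate)
    return times

random.seed(5)
rate, p, t_max = 4.0, 0.25, 2000.0
arrivals = poisson_arrival_times(rate, t_max)
type1 = [s for s in arrivals if random.random() < p]   # mark each w.p. p

# The thinned stream has empirical rate close to lambda * p = 1.
assert abs(len(type1) / t_max - rate * p) < 0.1
```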

Definition 122 (Superposition of Poisson Processes). Assume that,

(Nt^(1))t≥0, · · · , (Nt^(n))t≥0

are n independent Poisson processes with parameters λ1, · · · , λn. Let,

Nt = Nt^(1) + · · · + Nt^(n)

for t ≥ 0. Then, (Nt)t≥0 is a Poisson process with parameter λ1 + · · · + λn, so Nt ∼ Po(t · (λ1 + · · · + λn)).

Example 15: Birthday Problem

The Birthday Problem asks: "If people enter a room one by one, how many people are in the room the first time two people share a birthday, ignoring year and leap days?"

This problem can be embedded in a superposition of Poisson processes. People enter a room according to a Poisson process (Nt)t≥0 with rate λ = 1. Each person is independently and uniformly marked with one of 365 birthdays.

Let (Xi)i≥1 be the interarrival sequence for the process of people entering the room. The Xi are i.i.d exponential with mean 1. Let T be the first time when two people in the room share the same birthday. If K people are in the room at that time,

T = ∑_{i=1}^K Xi

(Xi) is independent of K. Taking the expectation,

E[T] = E[K] · E[X1] = E[K] since E[X1] = 1

Let Zk be the time when the second person marked with birthday k enters the room. Then, the first time that two people in the room have the same birthday is,

T = min_{1≤k≤365} Zk

Equivalently, Zk is the arrival time of the second event of a thinned Poisson process. Moreover, Zk has a gamma distribution with parameters and density,

n = 2, λ = 1/365, f(t) = t e^{−t/365} / 365² (t > 0)

The cumulative distribution function is,

P(Z1 ≤ t) = ∫_0^t (s e^{−s/365} / 365²) ds = 1 − e^{−t/365}(365 + t)/365

This gives,

P(T > t) = P(min_{1≤k≤365} Zk > t)
         = P(Z1 > t, · · · , Z365 > t)
         = P(Z1 > t)^365
         = (1 + t/365)^365 e^{−t} (t > 0)

Therefore, the desired birthday expectation is,

E(K) = E(T) = ∫_0^∞ P(T > t)dt = ∫_0^∞ (1 + t/365)^365 e^{−t} dt

Poissonization and Depoissonization


Poisson processes can be used to prove theorems for discrete time
processes. Given ( Xk )k≥0 , do the following:

1. (Poissonize) Let Xt∗ = X Nt for a Poisson process ( Nt )t≥0 to embed


the discrete time process ( Xk ) in continuous time.

2. (Analyze) Show that Xt∗ has the desired property.

3. (Depoissonize) Transfer the chain to the discrete time process.

Theorem 123 (Recurrence via Poissonization). Let ( Xk )k≥0 be a time-


homogeneous Markov chain. If ( Nt )t≥0 is a Poisson process with λ = 1,
then a state x is recurrent if and only if,
Z ∞
P ( X Nt = x ) dt = ∞
0

Proof. It suffices to show the following,

∫_0^∞ P(X_{Nt} = x) dt = ∑_{k=0}^∞ p_k(x, x)

Since P(Xk = x) = P(X_{Nt} = x | Nt = k) does not depend on t,

∫_0^∞ P(X_{Nt} = x) dt = ∫_0^∞ ∑_{k=0}^∞ P(Nt = k) · P(X_{Nt} = x | Nt = k) dt
                       = ∑_{k=0}^∞ P(Xk = x) · ∫_0^∞ P(Nt = k) dt
                       = ∑_{k=0}^∞ p_k(x, x) · ∫_0^∞ (e^{−t}t^k / k!) dt
                       = ∑_{k=0}^∞ p_k(x, x)

since ∫_0^∞ e^{−t}t^k / k! dt = 1.

Example 16: Poissonized Simple Random Walk on Z

Let (Xk)k≥0 be a simple random walk on Z.

1. (Poissonize) Let (Nt) be a Poisson process with λ = 1. Assume that each arrival is marked "Lt" if the chain moves to the left, and "Rt" if the chain moves to the right. Then,

(Nt^(L))t≥0 ∼ Po(λ · t/2) = Po(t/2)

(Nt^(R))t≥0 ∼ Po(λ · t/2) = Po(t/2)

Moreover, (Nt^(L)) and (Nt^(R)) are independent. Thus,

Xt* := X_{Nt} = Nt^(R) − Nt^(L)

the difference of two independent Po(t/2) random variables (right steps count +1, left steps −1).

2. (Analyze) We want to determine if Xt* is recurrent,

P(X_{Nt} = 0) = P(Lt = Rt)
             = ∑_k P(Lt = k) · P(Rt = k) by independence
             = ∑_{k=0}^∞ (e^{−t/2}(t/2)^k / k!)²
             = e^{−t} · I0(t) where I0(t) is the modified Bessel function
Therefore, P(X_{Nt} = 0) · √(2πt) → 1 as t → ∞ using the Stirling Formula or Bessel function properties. Consequently,

∫_0^∞ P(X_{Nt} = 0) dt = ∞ because ∫_1^∞ dt/√(2πt) = ∞

3. (Depoissonize) Apply "Recurrence via Poissonization" (Theorem 123).

Example 17: Poissonized Simple Random Walk on Zd

Let (Xk)k≥0 be a simple random walk on Zd.

1. (Poissonize) Let (Nt) be a Poisson process with λ = 1. Applying Thinning as in the example on Z,

Xt* := X_{Nt} = (X_{Nt}^(1), · · · , X_{Nt}^(d))

where (X_{Nt}^(i)) and (X_{Nt}^(j)) are independent if i ̸= j, and each coordinate i ∈ [d] is a Poissonized one-dimensional walk whose jumps arrive at rate 1/d, a Po(t/d) number of jumps by time t.

2. (Analyze) We want to determine if Xt* is recurrent,

P(X_{Nt} = ⃗0) = P(X_{Nt}^(1) = 0)^d by independence
             = (∑_{k=0}^∞ (e^{−t/2d}(t/2d)^k / k!)²)^d

Since (√(2πt/d))^d · P(X_{Nt} = ⃗0) → 1, we have,

∫_1^∞ (1/√(2πt/d))^d dt < ∞ if and only if d ≥ 3

3. (Depoissonize) Apply "Recurrence via Poissonization" (Theorem 123).

Order Statistics
If a Poisson process contains exactly n events in [0, t], then the un-
ordered times of those events are uniformly distributed on [0, t].

Remark 124 (Conditional on 1 Event). P(S1 ≤ s | Nt = 1) = s/t for 0 ≤ s ≤ t.

Proof. Using the definition of Conditional Probability,

P(S1 ≤ s | Nt = 1) = P(S1 ≤ s, Nt = 1) / P(Nt = 1)
                   = P(Ns = 1, Nt = 1) / P(Nt = 1)
                   = P(Ns = 1, Nt − Ns = 0) / P(Nt = 1)
                   = P(Ns = 1) · P(Nt−s = 0) / P(Nt = 1)
                   = (λse^{−λs} · e^{−λ(t−s)}) / (λte^{−λt})
                   = s/t

since Nt−s ∼ Po(λ · (t − s)).

Definition 125 (Order Statistic). Let U1 , · · · , Un be an i.i.d sequence of


Unif([0, t]) random variables. Their joint density function is,

f_{U1,...,Un}(u1, . . . , un) = 1/t^n
for 0 ≤ u1 , . . . , un ≤ t. Arrange Ui in increasing order,

U(1) ≤ U(2) ≤ · · · ≤ U(n)

U( k ) is the kth smallest of the Ui . The ordered sequence,


 
U(1) , . . . , U(n)

is the order statistics of the original sequence. Its joint density is,

f_{U(1),...,U(n)}(u1, . . . , un) = n!/t^n

for 0 ≤ u1 < · · · < un ≤ t.

Theorem 126 (Order Statistics via Poissonization). Let S1 , S2 , · · · be


the arrival times of a Poisson process with parameter λ. Conditional on
Nt = n, the joint distribution of (S1 , · · · , Sn ) is the distribution of the order
statistics of n i.i.d uniform random variables on [0, t].

f(s1, . . . , sn) = n!/t^n
for 0 < s1 < · · · < sn < t. Equivalently, let U1 , · · · , Un be an i.i.d
sequence of Unif([0, t]) random variables. Then, conditional on Nt = n,
 
(S1 , . . . , Sn ) and U(1) , . . . , U(n)

have the same distribution.



Proof. See Dobrow 6.5 (p.245).

Corollary 127. Results for arrival times offer a new method for simulating
a Poisson process with parameter λ on an interval [0, t]:

1. Simulate the number of arrivals N in [0, t] from Po(λ · t)

2. Generate N i.i.d random variables uniformly distributed on (0, t)

3. Sort the variables in increasing order to give the Poisson arrival times
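The recipe of Corollary 127 can be implemented directly. Since the standard library has no Poisson sampler, `sample_poisson` below draws Po(mean) by CDF inversion (a helper written for this sketch).

```python
import math
import random

def sample_poisson(mean):
    """Po(mean) by CDF inversion (the standard library has no Poisson sampler)."""
    u, k, p = random.random(), 0, math.exp(-mean)
    cdf = p
    while u > cdf and k < 10 * mean + 100:   # cap guards against rounding
        k += 1
        p *= mean / k
        cdf += p
    return k

def poisson_arrivals(t, rate):
    """Corollary's recipe: draw N ~ Po(rate * t), then sort N uniforms on (0, t)."""
    n = sample_poisson(rate * t)
    return sorted(random.uniform(0.0, t) for _ in range(n))

random.seed(6)
sample_counts = [len(poisson_arrivals(10.0, 2.0)) for _ in range(20_000)]
mean_count = sum(sample_counts) / len(sample_counts)
assert abs(mean_count - 20.0) < 0.5      # E[N_t] = rate * t = 20
```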

Spatial Poisson Processes


The spatial Poisson process is a model for the distribution of events
in a two- or higher-dimensional space24 . For d ≥ 1 and A ⊆ Rd , let 24
The uniform distribution arises for
NA denote the number of points in the set A. We write | A| for the the spatial process in a similar way to
how it does for the one-dimensional
size of A (i.e., area in R2 and volume in R3 ). Poisson process. Given a bounded set
A ⊆ Rd , conditional on there being n
Definition 128 (Spatial Poisson Process). A collection of random vari- points in A, the location of the points
ables ( NA ) A⊆Rd is a spatial Poisson process with parameter λ if, are uniformly distributed on A.

1. NA ∼ Po(λ · | A|) for each bounded set A ⊆ Rd

2. NA and NB are independent random variables if A and B are disjoint

The definition of a spatial Poisson process can be generalized to a


Poisson random measure as follows,

Definition 129 (Poisson Random Measure). The Poisson random


measure is the unique function N : B → N0 so that for any A ∈ B ,

1. If NA is the number of points in A, then,


    NA ∼ Po( ∫_A f(x) dx )

2. If A1 , · · · , Ak ∈ B are disjoint, compact sets,

( N ( A1 ), N ( A2 ), · · · , N ( Ak ))

are independent, Poisson distributed random variables with parameters,


    ∫_{Aj} f(x) dx,    1 ≤ j ≤ k

where f : Rd → R is called the "intensity" of the process.

Remark 130. Let f : Rd → R be a continuous, non-negative function.

1. (Number of Points in I) For any closed rectangle I ⊂ Rd,

    NI ∼ Po( ∫_I f(x) dx )

Figure 6: Samples of a spatial Poisson process with parameter λ = 100 on the square
[0, 1] × [0, 1].

2. (Location of Points in I) Define an i.i.d collection X_I of points,

    (Xj)_{j∈[1,NI]}    with    P(Xj ∈ B) = ∫_B f(x)dx / ∫_I f(x)dx

3. (Spread of Points in Rd ) Repeat over a partition into rectangles,

X = ∪I XI and NA := |{ X ∩ A}| for each A ∈ B

N is a Poisson random measure with intensity f 25.

25 Notes on Spatial Processes:
1. If d = 1 and f is constant at λ > 0, then N_{[0,t]} ↔ Nt is the Poisson process
with rate λ.
2. If d > 1 and f is constant at λ > 0, then this is the homogeneous spatial
Poisson process.

Definition 131 (Non-Homogeneous Poisson Process). A counting process (Nt)_{t≥0} is
non-homogeneous Poisson with intensity λ(t) if,

1. N0 = 0

2. For all t > 0, Nt has a Poisson distribution with mean,

    E(Nt) = ∫_0^t λ(x) dx

3. For 0 ≤ q < r ≤ s < t, Nr − Nq and Nt − Ns are independent

Continuous-Time Markov Chains

Holding Times
Definition 132 (Continuous-Time Markov Property). The Markov
property for continuous-time chains ( Xt )t≥0 with discrete state space S is,

    P( Xt = j | (Xu)_{0≤u<s} and Xs = i ) = P( Xt = j | Xs = i )

for all t ≥ s and states i, j ∈ S.

Definition 133 (Time-Homogeneous). A continuous-time Markov chain


( Xt )t≥0 with discrete state space S is time-homogeneous if,

P ( X t + s = j | X s = i ) = P ( X t = j | X0 = i )

for s ≥ 0. The transition probabilities can be arranged in a function,

Pij (t) = P ( Xt = j | X0 = i )

which is the analog of pt(i, j) in the discrete setting26.

26 P(t) is not the transition matrix P̃ of the embedded chain, since t ∈ R.
Theorem 134 (Chapman-Kolmogorov). A continuous-time Markov chain (Xt)_{t≥0} with
transition function P(t) satisfies that,

P(s + t) = P(s) · P(t) for s, t ≥ 0.



Proof. By conditioning on Xs ,

Pij (s + t) = P ( Xs+t = j | X0 = i )
= ∑ P ( Xs+t = j | Xs = k, X0 = i ) · P ( Xs = k | X0 = i )
k
= ∑ P ( X s + t = j | X s = k ) · P ( X s = k | X0 = i )
k
= ∑ P ( X t = j | X0 = k ) · P ( X s = k | X0 = i )
k
= ∑ Pik (s) · Pkj (t)
k
= [P(s) · P(t)]ij

Definition 135 (Holding Time). The holding time Ti at a state i is the


length of time that a continuous-time Markov chain started in i stays in i
before transitioning to a new state.

Theorem 136. Let Ti be the holding time at state i. Then Ti ∼ Exp(λi ).

Proof. The exponential distribution is the only continuous distribu-


tion that is memoryless, so it suffices to prove that Ti is memoryless.
Let s, t ≥ 0. Suppose that the chain starts in i. Then,

{ Ti > s} = { Xu = i for u ∈ [0, s]}

Moreover, { Ti > s + t} implies that { Ti > s}, so,

P( Ti > s + t | X0 = i ) = P({ Ti > s + t} ∩ { Ti > s} | X0 = i )


| {z } | {z }
A B

Applying the multiplication rule with the conditional probability measure,

P( A ∩ B | X0 = i ) = P( A | B ∩ { X0 = i }) · P( B | X0 = i )

and using homogeneity and the Markov property,

= P( Ti > s + t | { Ti > s} ∩ { X0 = i }) · P( Ti > s | X0 = i )


= P ( Ti > s + t | Xu = i for u ∈ [0, s]) · P ( Ti > s | X0 = i )
= P ( Ti > s + t | Xs = i ) · P ( Ti > s | X0 = i )
= P ( Ti > t | X0 = i ) · P ( Ti > s | X0 = i )

shows that Ti satisfies the definition of memoryless,

P( Ti > s + t | X0 = i ) = P ( Ti > t | X0 = i ) · P ( Ti > s | X0 = i )



Definition 137 (Absorbing State). A state i is absorbing if the parameter


of the exponential distribution for the holding time Ti is zero,
1
E[ Ti ] = =∞
0
Definition 138. A continuous-time Markov chain is explosive at a time t ∈ R+ if
there are infinitely-many transitions in all neighborhoods27,

    (t − ϵ, t) for ϵ > 0 arbitrary

27 In Dobrow, a state i ∈ S is called explosive if the holding time parameter λi of
Ti is infinite, i.e., E[Ti] = 1/∞ = 0.

The evolution of a continuous-time Markov chain which is neither absorbing nor
explosive can be described as follows,

1. Starting from i, the process stays in i for an exponentially dis-


tributed length of time, which is 1/qi on average

2. The chain hits a new state j ̸= i, with probability pij

3. The process stays in j for an exponentially distributed length of


time, which is 1/q j on average. It then hits a new state l ̸= j

Example 18: Continuous-Time Weather Chain

Define the state space S = {rain, snow, clear}. Assume that,

1. Rain lasts, on average, 3 hours

2. Snow lasts, on average, 6 hours

3. Clear weather lasts, on average, 12 hours



Changes in weather states are described by the matrix,

               rain  snow  clear
    P̃ = rain  [  0    1/2   1/2 ]
        snow  [ 3/4    0    1/4 ]
        clear [ 1/4   3/4    0  ]

Let Xt be the weather at time t. Then, ( Xt )t≥0 is a continuous-


time Markov chain. Moreover, P̃, as well as the exponential
parameters (λr , λs , λc ) = (1/3, 1/6, 1/12), specify P(t), i.e.,
P ( Xt1 = i1 , . . . , Xtn = in ) for n ≥ 1, states si and times ti ≥ 0.
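The description above (hold an Exp(λi) time in state i, then jump by row i of P̃) can be turned into a small simulation of the weather chain. A sketch, not from the notes; the function name and the encoding 0 = rain, 1 = snow, 2 = clear are our choices:

```python
import random

RATES = [1/3, 1/6, 1/12]        # holding-time parameters (lambda_r, lambda_s, lambda_c)
P_EMB = [[0, 1/2, 1/2],         # embedded-chain transition matrix P~
         [3/4, 0, 1/4],
         [1/4, 3/4, 0]]

def simulate_weather(start, horizon, rng):
    """Run the continuous-time weather chain up to time `horizon`.
    Returns a list of (time, state-index) jump records."""
    t, i, path = 0.0, start, [(0.0, start)]
    while True:
        t += rng.expovariate(RATES[i])      # Exp(lambda_i) holding time in state i
        if t >= horizon:
            return path
        u, acc = rng.random(), 0.0          # jump via row i of the embedded chain
        for j, pij in enumerate(P_EMB[i]):
            acc += pij
            if u < acc:
                i = j
                break
        path.append((t, i))
```

Averaging the time spent in each state over a long horizon estimates the long-run proportion of time in each weather state.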

Definition 139 (Embedded Chain). A sequence (Yk )k≥0 is the embedded


chain of a continuous-time process ( Xk ) if Yk is the kth state visited.

Remark 140. The transition matrix P̃ of the embedded chain of a continuous-time
process is a stochastic matrix whose diagonal entries are zero28.

28 The process can never transition from state i to state i. It remains in i for an
exponential amount of time, after which it transitions to another state j (j ≠ i).

A continuous-time Markov chain can also be described by specifying transition rates
between pairs of states. Suppose that for each state i, there are independent alarm
clocks associated with each of the states that the process can visit after i. Then,

1. If j can be hit from i, the alarm Cij associated with (i, j) will ring
after an exponentially distributed length of time with parameter qij

2. The minimum time for one Cij (i ̸= j) to finish ringing is the


minimum of independent exponentials, which was proven to be
exponential with parameter qi = ∑k qik

3. When the process first hits i, the clocks start simultaneously,


where the first alarm that rings determines the next state

4. If the (i, j) clock rings first and the process moves to j, a new set of
exponential alarm clocks are started with rates q j1 , q j2 , · · ·

Infinitesimal Generator
Definition 141 (Rate Matrix). The rate matrix Q for a continuous-time Markov chain
is defined as follows,

    (Q)ij := { qi · P̃ij   if i ≠ j        = { qij   if i ≠ j
             { 0          if i = j          { 0     if i = j

Note on Generator Matrix: A continuous-time Markov chain (Xt)_{t≥0} with transition
function P(t) and generator Q satisfies that,
1. P′(t) = P(t) · Q
2. P′(t) = Q · P(t)
These are called the "Kolmogorov Forward, Backward Equations".

Definition 142 (Generator Matrix). The generator matrix Q for a continuous-time
Markov chain is defined as follows29,

    Qij := { qij                 if i ≠ j
           { −qi = −∑_k qik     if i = j

29 If the smallest signed value of a generator matrix Q is finite, then we say the
chain is "bounded rate". Moreover, any finite chain is bounded rate.

Corollary 143. The generator is not a stochastic matrix. Diagonal entries


are negative, entries can be greater than 1, and rows sum to 0.

Corollary 144. A continuous-time Markov chain (Xt)_{t≥0} satisfies that,

    P̃ij := { P(Cij = Ĉ) = qij / qi = qij / ∑_k qik    if i ≠ j
            { 0                                          if i = j

where P̃ is the embedded chain, and {Cij = Ĉ} is the event that Cij is the first
alarm that rings and determines the next state30.

30 The holding time parameters λi are equal to qi.
equal to qi .
Corollary 145. A continuous-time Markov chain ( Xt )t≥0 with transition
function P(t) and generator Q satisfies that,

    P(t) = e^{tQ} = ∑_{n=0}^∞ (1/n!)(tQ)^n = I + tQ + (t²/2)Q² + (t³/6)Q³ + · · ·

Note on Matrix Exponential: Let A be a k × k matrix. Then,

    e^A = ∑_{n=0}^∞ (1/n!)A^n = I + A + (1/2)A² + · · ·
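The matrix-exponential series can be evaluated numerically. A minimal sketch in plain Python (helper names ours; the 2-state generator in the test is a made-up example) that truncates the sum and can be checked against the Chapman-Kolmogorov property P(s) · P(t) = P(s + t):

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(q, t, terms=60):
    """Truncated series for P(t) = e^{tQ} = sum_n (tQ)^n / n!."""
    n = len(q)
    result = [[float(i == j) for j in range(n)] for i in range(n)]   # I
    term = [row[:] for row in result]                                # (tQ)^0 / 0!
    for k in range(1, terms):
        scaled = [[t * x / k for x in row] for row in q]             # tQ / k
        term = mat_mul(term, scaled)                                 # now (tQ)^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result
```

For a generator (rows summing to zero, nonnegative off-diagonal entries), each row of the resulting P(t) is a probability vector.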
Classification of States

For characterizing the states of a continuous-time Markov chain, accessibility,
communication, and irreducibility are defined as in the discrete case. For example,

Definition 146 (Irreducible). A continuous-time Markov chain with


transition function P(t) is irreducible if for all i, j ∈ S,

Pij (t) > 0 for some t > 0

Lemma 147. If Pij (t) > 0 for some t > 0, then Pij (t) > 0 for all t > 0.

Proof. Suppose that Pij (t) > 0 for some t > 0. Then there exists a
path from i to j in the embedded chain, and, since the exponential
distribution is continuous, for any time s there is positive probability
of reaching j from i in s time units31.

31 Formally, for s ≥ 0, this means that,
1. Pij(t + s) > 0
2. Pij(t − s) > 0

Corollary 148. All states of a continuous-time Markov chain are aperiodic.

Stationary Distributions

Definition 149 (Stationary Distribution). A probability distribution π⃗ is a
stationary probability distribution for a continuous-time chain if32,

    π⃗ = π⃗ · P(t)

32 As in the discrete case, the limiting distribution, if it exists, is a stationary
distribution. However, the converse is not necessarily true and depends on the
class structure of the chain.

Corollary 150. The stationary distribution π⃗ is not the same as the stationary
distribution of the embedded chain, ψ⃗33. However, for all j,

    ψ(j) = πj qj / ∑_k π(k) qk        πj = (ψ(j)/qj) / ∑_k (ψ(k)/qk)

33 πi is the long-term proportion of time that the process spends in state i.
Conversely, ψ(i) is the long-term proportion of transitions that the process makes
into state i.
process makes into state i.
Theorem 151 (Fundamental Limit Theorem). Let (Xt)_{t≥0} be a finite, irreducible,
continuous-time Markov chain with transition function P(t). Then, there exists a
unique stationary distribution π⃗, which is the limiting distribution of the chain.
That is, for all j ∈ S,

    lim_{t→∞} Pij(t) = πj for all i

Equivalently,

    lim_{t→∞} P(t) = Π

where Π is a matrix all of whose rows are equal to π⃗.
Proof. The proof of the Fundamental Limit Theorem is omitted.

Theorem 152. A probability distribution π⃗ is a stationary distribution of a
continuous-time Markov chain with generator Q if and only if,

    π⃗ Q = 0⃗   ⟺   ∑_i πi Qij = 0 for all j ∈ S

Proof. Assume that π⃗ = π⃗ P(t) for all t ≥ 0. Differentiating at t = 0,

    0⃗ = π⃗ P′(0) = π⃗ Q

Conversely, assume that π⃗ Q = 0⃗. Right-multiplying by P(t),

    0⃗ = π⃗ Q P(t) = π⃗ P′(t) for t ≥ 0

by the Kolmogorov backward equation. This implies that π⃗ P(t) is constant, that
is, π⃗ P(t) = π⃗ P(0) for all t. But P(0) = I 34, so,

    π⃗ P(t) = π⃗ P(0) = π⃗ I = π⃗

34 Observe that, Pij(0) = P(X0 = j | X0 = i) = 1 if i = j, and 0 if i ≠ j.
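Theorem 152 turns finding π⃗ into linear algebra: solve π⃗ Q = 0⃗ together with the normalization ∑_i πi = 1. A sketch with NumPy (function and variable names ours; the 2-state generator is a made-up example):

```python
import numpy as np

def stationary_from_generator(Q):
    """Solve pi @ Q = 0 with sum(pi) = 1 via least squares."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])   # Q^T pi = 0, plus the normalization row
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# Two-state generator: rate 1 from state 0 to 1, rate 2 from state 1 to 0.
Q = np.array([[-1.0, 1.0], [2.0, -2.0]])
pi = stationary_from_generator(Q)      # should give pi = (2/3, 1/3)
```

Checking π⃗ P(t) = π⃗ for a few values of t (with P(t) = e^{tQ}) confirms stationarity.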

Poisson Subordination
Definition 153 (Subordination). Let ( Nt )t≥0 be a Poisson process with
parameter λ. Let (Yt )t≥0 be a finite-state, irreducible, discrete-time process
with transition matrix R. Define a continuous-time process ( Xt )t≥0 by
Xt := Y_{Nt} 35. The process (Xt) is subordinated to a Poisson process.

35 Transitions for the Xt process occur at the arrival times of the Poisson
process. From state i, the process holds an exponentially distributed amount of
time with parameter λ, and then transitions to j with probability Rij.

Remark 154. Let P(t) be the transition function of (Xt). Then,

    Pij(t) = P(Xt = j | X0 = i)
           = ∑_{k=0}^∞ P(Xt = j | {Nt = k} ∩ {X0 = i}) · P(Nt = k | X0 = i)
           = ∑_{k=0}^∞ P(Yk = j | {Nt = k} ∩ {X0 = i}) · P(Nt = k)
           = ∑_{k=0}^∞ P(Yk = j | Y0 = i) · P(Nt = k)
           = ∑_{k=0}^∞ (R^k)ij · e^{−λt}(λt)^k / k!

We have just seen how to construct a continuous-time Markov


chain from a discrete-time chain and a Poisson process. We will now
see that many continuous-time chains can be represented as chains
subordinated to a Poisson process.

Remark 155. Let Q be the generator of a continuous-time Markov chain with holding
time parameters {qi}. If qi ≤ λ for all i, then we can define,

    R = (1/λ) Q + I    where λ = max_i qi

which is a stochastic matrix. The transition function is,

    P(t) = e^{tQ} = e^{−λt} e^{t(Q+λI)}
         = e^{−λt} ∑_{k=0}^∞ (1/k!) t^k (Q + λI)^k
         = ∑_{k=0}^∞ ((1/λ)Q + I)^k · e^{−λt}(λt)^k / k!
         = ∑_{k=0}^∞ R^k · e^{−λt}(λt)^k / k!

R is not the matrix of the embedded Markov chain:

    P̃ij = { qij/qi,   for i ≠ j        Rij = { qij/λ,       for i ≠ j
           { 0,        for i = j              { 1 − qi/λ,   for i = j

Note on Poisson Subordination:
1. From state i, wait an exponential length of time with rate λ
2. Flip a coin with P(H) = qi/λ
3. If heads, transition according to R. Otherwise, stay at i and repeat

Theorem 156. For a Markov chain subordinated to a Poisson process, the discrete
R-chain has the same stationary distribution as the original chain.

Proof. Since Q = λ(R − I), we have π⃗ Q = λπ⃗ R − λπ⃗. Hence π⃗ Q = 0⃗ if and only
if π⃗ R = π⃗.

Corollary 157. The following are equivalent,

1. π⃗ P(t) = π⃗ for all t ≥ 0

2. π⃗ Q = 0⃗

3. π⃗ R = π⃗, where R = (1/λ)Q + I and λ = max_i qi
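The subordination series can be checked numerically: build R = (1/λ)Q + I from a small generator and evaluate P(t) = ∑_k R^k e^{−λt}(λt)^k/k! by truncation. A NumPy sketch (function name ours; the 3-state generator in the test is a made-up example):

```python
import math
import numpy as np
from numpy.linalg import matrix_power

def subordinated_transition(Q, t, terms=80):
    """P(t) via Poisson subordination: sum_k R^k e^{-lam t}(lam t)^k / k!."""
    lam = max(-Q[i, i] for i in range(Q.shape[0]))   # lam = max_i q_i
    R = Q / lam + np.eye(Q.shape[0])                 # stochastic matrix
    P = np.zeros_like(Q)
    weight = math.exp(-lam * t)                      # Po(lam t) pmf at k = 0
    for k in range(terms):
        P = P + weight * matrix_power(R, k)
        weight *= lam * t / (k + 1)                  # pmf recursion for k + 1
    return P
```

For moderate t, the Po(λt) tail beyond the truncation is negligible, so the result agrees with e^{tQ} and satisfies Chapman-Kolmogorov.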

Example 19: Totally Asymmetric Exclusion Process

The dynamics of the totally asymmetric simple exclusion


process on Z are as follows,

1. The alarm clock for a particle rings as a Poisson process of


rate 1 at each site independently

2. When the bell at site i rings, if there is a particle at site i


and a hole at site i − 1, they exchange

Equivalently, each particle in the system tries to jump to the


left at rate 1, and the jump succeeds whenever the site to the
left is unoccupied. Since it is possible for infinitely many par-
ticles to move instantaneously, and the minimum of infinitely-
many exponential clocks is zero, there is no embedded chain
and holding representation for this chain.

Birth-and-Death Chains

Note on Yule Process: The Yule process is a birth-and-death process in which the
birth rate is x · β and the death rate is 0.

Definition 158 (Yule Process). Let Yt be the population size at time t. The Yule
process (Yt)_{t≥0} is a continuous-time branching process where36,

1. Y0 = 1, and the rate of the process is q_{x,x+1} = x · β

2. Each individual gives birth to an offspring at a constant rate β

3. Each individual gives birth independently of other individuals

4. The distribution of each division time is Exp(β)

36 The time until the next division is the minimum of x independent Exp(β) clocks.
Thus, P(min{C1, · · · , Cx} > t) = e^{−xβt}.

Remark 159. A Yule process (Yt)_{t≥0} satisfies the following identity,

    Y_{t+τ} ∼ Y_t^{(1)} + Y_t^{(2)}    and    τ ∼ Exp(β)

where τ is the time of the first division, and Y_t^{(1)}, Y_t^{(2)} are Yule
processes.

(Yt) cannot be written as a Poisson subordinated discrete-time walk. However, it
exists for all time without exploding. Conversely, if q_{x,x+1} = β · x², then the
process explodes in finite time.

Remark 160. Define the following,

    Arrival i is the birth of member i + 1 of the population
    Ĉi is the minimum time for one Ci to finish ringing
    Zi is the time of arrival i
    Tn := min{t | Yt = n}

We need to compute P(Tn ≤ t),

    P(Tn ≤ t) = P( ∑_{i=1}^{n−1} Ĉi ≤ t )        where Ĉi ∼ Exp(i · β)
              = P( ∩_{i=1}^{n−1} {Zi ≤ t} )      where Zi ∼ Exp(β)
              = ∏_{i=1}^{n−1} P(Zi ≤ t)          since (Zi)_{i∈[n−1]} i.i.d
              = (1 − e^{−tβ})^{n−1}              by induction

Remark 161. We can write Yt = ∑_{i=1}^∞ 1{Ti ≤ t}. Then,

    E[Yt] = E[ ∑_{i=1}^∞ 1{Ti ≤ t} ] = ∑_{i=1}^∞ P(Ti ≤ t) = ∑_{i=1}^∞ (1 − e^{−tβ})^{i−1} = e^{tβ}

using the sum of a geometric series.
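Since the Yule process holds in state x for an Exp(x · β) time before jumping to x + 1, it is simple to simulate. A sketch (function name ours) whose sample mean can be compared against E[Yt] = e^{tβ}:

```python
import math
import random

def simulate_yule(beta, horizon, rng):
    """Population size of a Yule process at time `horizon`, with Y_0 = 1.
    From size x, the holding time until the next birth is Exp(x * beta)."""
    t, x = 0.0, 1
    while True:
        t += rng.expovariate(x * beta)
        if t > horizon:
            return x
        x += 1

beta, t = 1.0, 2.0
rng = random.Random(0)
mean = sum(simulate_yule(beta, t, rng) for _ in range(5000)) / 5000
# By Remark 161, mean should land near e^{t * beta} = e^2 ≈ 7.39.
```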

Definition 162 (Birth-and-Death). Birth-and-death processes are a


class of time-reversible, continuous-time Markov chains. They satisfy,

1. Births occur from i to i + 1 with rate β

2. Deaths occur from i to i − 1 with rate 1

making q_{x,x+1} = β · x and q_{x,x−1} = x37.

37 This represents an exponential 1 clock and a population of size x.

Theorem 163. If β > 1, then the probability of extinction is less than 1.

The following questions are equivalent,
1. How many times is the Exp(β) birth clock C of an individual the minimum before
the Exp(1) death clock rings?
2. How many offspring does the individual produce?

Proof. The proof is by a Poisson Process argument, where,

    β/(1 + β)    and    1/(1 + β)

are the parameters of each thinned process. Computing the number of divisions
before death gives that E[Y] = β.
Corollary 164. Let (Yt) be a birth-and-death process with birth rate β and death
rate 1. If β ≤ 1, the underlying branching process goes extinct,

    Yt → 0 a.s. as t → ∞, where 0 is an absorbing state

Conversely, when β > 1, there exists a random variable Y∞,

    Yt → Y∞ a.s. as t → ∞

and P(Y∞ = 0) is the extinction probability of the branching process,

    P(Y∞ = ∞) = 1 − P(Y∞ = 0)

Definition 165. Let (Yt) be an irreducible continuous-time Markov chain. (Yt) is
null-recurrent if and only if E[τ0+] = ∞38.

38 Otherwise, (Yt) is positive-recurrent. This definition is the same as the
discrete case.

Corollary 166. The underlying discrete-time behavior does not determine positive or
null recurrence for the chain. The holding time parameters can be made sufficiently
large to see this.

Theorem 167. An irreducible, continuous-time Markov chain is positive recurrent if
and only if there exists a stationary probability vector π⃗. That is, ∑_{i∈S} πi = 1
and π⃗ Q = 0⃗, where Q is the generator of the chain.

Proof. The proof uses Martingales, and it was not seen in class.

Example 20: Reflected Birth-and-Death Chain

We can modify the birth-and-death chain to make the boundary condition at zero
reflect. Equivalently, we define,

    Q′_{x,y} = { q_{x,y}   x ≠ 0
               { 1         x = 0 and y ≠ 0
               { 0         x = 0 and y = 0

Q. Is the reflected birth-and-death chain recurrent?


The modified chain is recurrent if and only if β ≤ 1 since the
process goes extinct, i.e., reaches zero, with probability 1.

Q. Is the reflected birth-and-death chain positive-recurrent?


The modified chain is positive recurrent if and only if β < 1.

Martingales

Examples of Martingales
Definition 168 (Martingale). A martingale ( Mt )t≥0 is a stochastic
process that satisfies, for all t ≥ 0,

1. E[ Mt | ( Mr )r∈[0,s] ] = Ms for all 0 ≤ s ≤ t

2. E[| Mt |] < ∞

where F_s = σ((Mr)_{r∈[0,s]}) is the history of the chain39.

39 The condition that E[|Mt|] < ∞ is omitted for the examples in this section.

Corollary 169. A discrete-time martingale (Mt) satisfies,

1. E[Mt | M0, · · · , Mt−1] = Mt−1 for all t ≥ 1

2. E[|Mt|] < ∞
Remark 170. If (Mn)n≥0 is a martingale, then E[Mn] is constant.

Proof. By the Law of Total Expectation, for all 0 ≤ s ≤ t,

    E[Mt] = E[ E[Mt | M0, · · · , Ms] ] = E[Ms]

That is, E[Mt] = E[M0] for all t.

Example 21: Simple Random Walk

The simple symmetric random walk Sn := ∑in=1 Xi has,



+1 with probability 1/2
Xi =
−1 with probability 1/2

for n ≥ 1 with S0 = 0. Then,

    E[Sn+1 | S0, · · · , Sn] = E[Xn+1 + Sn | S0, · · · , Sn]
                             = E[Xn+1 | S0, · · · , Sn] + E[Sn | S0, · · · , Sn]    (E[g(X) | X] = g(X))
                             = E[Xn+1] + Sn                                        (E[Xn+1] = 0)
                             = Sn

since Xn+1 is independent of X1 , · · · , Xn and consequently


Xn+1 is independent of S0 , · · · , Sn . Next, we prove that,
    E[|Sn|] = E[ |∑_{i=1}^n Xi| ] ≤ E[ ∑_{i=1}^n |Xi| ] = ∑_{i=1}^n E[|Xi|] = n < ∞

Example 22: Biased Random Walk

Let p + q = 1. The biased random walk Sn := ∑in=1 Xi has,



+1 with probability p
Xi =
−1 with probability q

for n ≥ 1 with S0 = 0. While it is not a martingale,

    E[Sn+1 | S0, · · · , Sn] = E[Xn+1] + Sn = (p − q) + Sn

the modified process (Ŝn )n≥1 is a martingale,

Ŝn = Sn − ( p − q) · n

Example 23: Biased Random Walk on N0 with Absorption

Let (Sn)n≥0 be the biased random walk on N0 with an absorbing boundary at zero.
(Ŝn) is not a martingale,

    E[Ŝn+1 | Ŝ0, · · · , Ŝn] = E[Sn+1 − (p − q)(n + 1) | Ŝ0, · · · , Ŝn]
                             = Sn + (p − q)1{Sn ≠ 0} − (p − q)(n + 1)

Define (S′n)n≥0 by S′n := Sn − (p − q) · min{n, T} for T := inf{n | Sn = 0}. Then
(S′n) is a martingale.

Example 24: Absorption Time for Biased Random Walk

If p < q, then (Sn) is absorbed with probability 1,

    Sk → 0 as k → ∞

Therefore,

    E[S′0] = E[S′k] = E[Sk] − (p − q) · E[min{k, T}]

Taking k → ∞ gives the absorption time E[T] = E[S0]/(q − p).
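The absorption-time identity E[T] = E[S0]/(q − p) is easy to sanity-check by simulation (a sketch; function name and parameter values are ours):

```python
import random

def absorption_time(s0, p, rng):
    """Steps until the biased walk started at s0 (up with prob p, down with
    prob q = 1 - p, assuming p < q) first hits 0."""
    s, steps = s0, 0
    while s > 0:
        s += 1 if rng.random() < p else -1
        steps += 1
    return steps

p, s0 = 0.3, 5                          # q - p = 0.4, so E[T] = 5 / 0.4 = 12.5
rng = random.Random(42)
est = sum(absorption_time(s0, p, rng) for _ in range(4000)) / 4000
```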

Example 25: Branching Process

Let (Zk)k≥0 be a branching process with offspring distribution X. Put E[X] = µ.
Then,

    Mk := Zk / µ^k

is a martingale. Since Zk+1 ∼ ∑_{j=1}^{Zk} Xj,

    E[Mk+1 | M1, · · · , Mk] = E[ Zk+1/µ^{k+1} | M1, · · · , Mk ]
                             = E[ ∑_{j=1}^{Zk} Xj / (µ^k · µ) | Zk ]
                             = (1/(µ^k · µ)) ∑_{j=1}^{Zk} E[Xj]    since µ, k fixed
                             = (1/(µ^k · µ)) ∑_{j=1}^{Zk} µ
                             = Zk / µ^k
                             = Mk

Example 26: Pólya's Urn

If (Rk, Bk) are the numbers of red and blue balls at time k, then

    Mk := Rk / (Rk + Bk)

is a martingale. Recall that,

    (Rk, Bk) → (Rk, Bk + 1)      with probability Bk / (R0 + B0 + k)
    (Rk, Bk) → (Rk + 1, Bk)      with probability Rk / (R0 + B0 + k)
    (Bk, k) → (Bk, k + 1)        with probability (R0 + B0 + k − Bk) / (R0 + B0 + k)
    (Bk, k) → (Bk + 1, k + 1)    with probability Bk / (R0 + B0 + k)

Since (Mk) is a Markov Chain,

    E[Mk+1 | M1, · · · , Mk] = E[ Rk+1/(Rk+1 + Bk+1) | (Rk, Bk) ]
                             = E[ Rk+1/(B0 + R0 + k + 1) | (Rk, Bk) ]
                             = (1/(B0 + R0 + k + 1)) · E[Rk+1 | (Rk, Bk)]

Let Ak+1 be the event that the (k + 1)th ball is red, given that the numbers of red
and blue balls at time k are Rk and Bk. Then,

    P(Rk+1 = Rk + 1) · (Rk + 1)/(Rk + Bk + 1) + P(Rk+1 = Rk) · Rk/(Rk + Bk + 1)

where P(Rk+1 = Rk + 1) = P(Ak+1 | (Rk, Bk)) = E[1_{Ak+1}] and
P(Rk+1 = Rk) = P(A^c_{k+1} | (Rk, Bk)) = E[1_{A^c_{k+1}}], is the desired
expectation. Simplifying,

    = (Rk² + Rk + Bk Rk) / ((R0 + B0 + k) · (R0 + B0 + k + 1))
    = (Rk · (Rk + Bk + 1)) / ((Rk + Bk) · (Rk + Bk + 1))
    = Rk / (Rk + Bk)
    = Mk
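The constant-mean property E[Mn] = M0 is easy to check by simulating the urn (a sketch; the function name is ours):

```python
import random

def polya_fraction(r0, b0, steps, rng):
    """Run Pólya's urn for `steps` draws and return M_n = R_n / (R_n + B_n).
    Each draw adds one more ball of the colour drawn."""
    r, b = r0, b0
    for _ in range(steps):
        if rng.random() < r / (r + b):
            r += 1      # drew red: add a red ball
        else:
            b += 1      # drew blue: add a blue ball
    return r / (r + b)
```

With r0 = b0 = 1 the individual runs scatter (the limit M∞ is Unif([0, 1])), but the average of Mn over many runs stays near M0 = 1/2.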

Example 27: Absorption Probabilities

Let (Yk ) be a Markov chain with an absorbing state i. Then,

Mk := P( Ai | Yk )

is a martingale, where Ai is the event that the chain is ab-


sorbed at i. By the Total Conditional Law of Expectation,

E[ Mk+1 | M0 , · · · , Mk ] = E[ P( Ai | Yk+1 ) | M0 , · · · , Mk ]
= E[ P( Ai | Yk+1 ) | Yk ]
= E[ P( Ai | Yk+1 , Yk ) | Yk ]
= P( Ai | Yk )
= Mk

where the third equality follows by the Markov Property.

Limit Theorems
Definition 171 (Bracket). The bracket of a martingale (Mk)k≥0 is40,

    ⟨Mk⟩ := ∑_{n=0}^{k−1} E[ (Mn+1 − Mn)² | Fn ]

40 This is effectively the accumulated variance of the process (Mk).

Example 28: Simple Random Walk

We saw that the simple random walk on Z is a martingale. Since,

    (Mn+1 − Mn)² = 1 for all n ∈ N0

we get ⟨Mk⟩ = k for all k ∈ N.

Remark 172. Let (Mk)k≥0 be a martingale. Then,

    E[⟨Mk⟩] = E[(Mk − M0)²]

Proof. By definition,

    E[(Mn+1 − Mn)² | Fn] = E[M²_{n+1} + M²_n − 2 · Mn+1 Mn | Fn]
                         = E[M²_{n+1} | Fn] + M²_n − 2 · M²_n
                         = E[M²_{n+1} | Fn] − M²_n

Therefore,

    E[⟨Mk⟩] = E[ ∑_{n=0}^{k−1} E[(Mn+1 − Mn)² | Fn] ]
            = ∑_{n=0}^{k−1} E[ E[(Mn+1 − Mn)² | Fn] ]
            = ∑_{n=0}^{k−1} ( E[M²_{n+1}] − E[M²_n] )
            = E[M²_k] − E[M²_0]
            = E[(Mk − M0)²]

where the last equality uses E[Mk M0] = E[M0 · E[Mk | F0]] = E[M²_0].

Example 29: Pólya's Urn (Bracket Process)

Since Rn/(n + n0) ≥ Rn/(n + n0 + 1), we have that,

    E[ (Rn+1/(n + 1 + n0) − Rn/(n + n0))² | Fn ] ≤ 1/(n + 1 + n0)²

Implying that, for all n ∈ N,

    ⟨Mn⟩ ≤ ∑_{k=1}^∞ 1/k² = π²/6 < ∞
k =1

Theorem 173. Let (Mk)k≥0 be a martingale. Suppose that,

    P( sup_n ⟨Mn⟩ < ∞ ) = 1

Then, lim_{n→∞} Mn exists. Denote it by M∞. Then,

1. P( lim_{n→∞} Mn = M∞ ) = 1

2. sup_n E[⟨Mn⟩] < ∞

3. Mn = E[M∞ | Fn]
Proof. The proof was not given in class.

Theorem 174. Let (Mk)k≥0 be a martingale. If Mn ≥ 0 for all n ∈ N, then there
exists a random variable M∞ < ∞ such that,

    P( lim_{n→∞} Mn = M∞ ) = 1
Proof. The proof was not given in class.

Figure 7: Ten runs of the Pólya's urn process. It was proven in class that
Mn := Rn/n → M∞, where M∞ ∼ Unif([0, 1]).

Example 30: Branching Processes (Bracket Process)

Let (Mk)k≥0 be the martingale for a branching process (Zk)k≥0, as defined above.
Then,

    ⟨Mn⟩ = ∑_{k=1}^n E[ (Mk+1 − Mk)² | Fk ]
         = ∑_{k=1}^n E[ (Zk+1/µ^{k+1} − Zk/µ^k)² | Fk ]
         = ∑_{k=1}^n E[ ( (1/µ^{k+1}) ∑_{i=1}^{Zk} Xi − Zk/µ^k )² | Fk ]
         = ∑_{k=1}^n E[ (1/µ^{2k}) ( ∑_{i=1}^{Zk} (Xi/µ − 1) )² | Fk ]
         = ∑_{k=1}^n (1/µ^{2k}) · E[ ( ∑_{i=1}^{Zk} (Xi − µ)/µ )² | Fk ]
         = ∑_{k=1}^n (Zk/µ^{2k}) · Var(X1)/µ²

using the definition of the population variance. Now, we can bound sup ⟨Mn⟩ since
⟨Mn⟩ is an increasing sequence,

    sup ⟨Mn⟩ ≤ ∑_{k=0}^∞ (Zk/µ^{2k}) · Var(X1)/µ²

and use the mean exponential growth rate E[Zk] = µ^k to conclude that,

    E[sup ⟨Mn⟩] ≤ ∑_{k=0}^∞ E[Zk/µ^{2k}] · Var(X1)/µ² = ∑_{k=0}^∞ (1/µ^k) · Var(X1)/µ² < ∞

With µ > 1 and Var(X1) < ∞, the ratio Mk converges to a non-degenerate limit random
variable.
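The convergence of Mk = Zk/µ^k can be watched empirically; a sketch with Poisson(µ) offspring (the offspring law, its sampling via Exp(1) interarrivals, and the function name are our choices):

```python
import random

def branching_martingale(mu, generations, rng):
    """Track M_k = Z_k / mu^k for a branching process with Poisson(mu)
    offspring and Z_0 = 1. Returns [M_0, M_1, ..., M_generations]."""
    z, path = 1, [1.0]
    for k in range(1, generations + 1):
        total = 0
        for _ in range(z):
            # Sample Poisson(mu): count Exp(1) interarrivals landing in [0, mu).
            t, n = rng.expovariate(1.0), 0
            while t < mu:
                n += 1
                t += rng.expovariate(1.0)
            total += n
        z = total
        path.append(z / mu ** k)
        if z == 0:
            break       # extinct: M stays at 0 from here on
    return path
```

Averaging the final value of Mk over many runs stays near E[Mk] = 1, while individual paths settle to different random limits.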
