Stochastic Processes Lecture Notes
Figure 1. TASEP on Z/1000Z, with the initial distribution as the Bernoulli product measure.
Contents. Conditional expectation; Generating functions; Branching processes and random walk; Markov
chains, transition matrices, classification of states, ergodic theorem; Birth and death processes; Queueing.
w2022-math447: stochastic processes 2
Contents

Axioms of Probability
Conditional Probability
Conditional Distributions
Conditional Expectation
Time-Homogeneous Markov Chains
Finite State, Time-Homogeneous Chains
Transition Probabilities
Stationary Distributions
Regular Chains
Classification of States
First Step Analysis
Recurrence and Transience
Periodicity
Absorbing Chains
Positive and Null Recurrence
Reversibility
Markov Chain Monte Carlo
Markov Chain Coupling
Metropolis-Hastings Algorithm
Classical Markov Chains
Electrical Networks
Pólya's Theorem
Pólya's Urn
Branching Processes
Mean Generation Size
Generating Functions
Poisson Processes
Definition 1
Definition 2
Definition 3
Axioms of Probability
Conditional Probability
Definition 1 (Conditional Probability). The conditional probability of A given B, defined for P(B) > 0, is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
Conditional Distributions
Definition 5 (Conditional Probability Mass Function). The conditional probability mass function of Y given X = x is
$$P(Y = y \mid X = x) = \frac{P(X = x, Y = y)}{P(X = x)}, \qquad f_{Y|X}(y \mid x) = \frac{f(x, y)}{f_X(x)}$$
Conditional Expectation
Definition 7 (Conditional Expectation). The conditional expectation of Y given X = x, written E[Y | X = x], is a function of x,
$$E[Y \mid X = x] = \begin{cases} \sum_y y \cdot P(Y = y \mid X = x) & \Omega \text{ is discrete} \\ \int_{-\infty}^{\infty} y \cdot f_{Y|X}(y \mid x)\,dy & \Omega \text{ is continuous} \end{cases}$$
Summary of Conditional Probability:
• Total Probability: P(A) = E[P(A | X)]
• Total Expectation: E[Y] = E[E[Y | X]]
• Conditioning on an event A: E[Y | A] = ∑_y y · P(Y = y | A)
Expanding the outer expectation verifies the law of total expectation:
$$\begin{aligned}
E[E[Y \mid X]] &= \sum_x E[Y \mid X = x] \cdot P(X = x) \\
&= \sum_x \Big( \sum_y y \, P(Y = y \mid X = x) \Big) P(X = x) \\
&= \sum_y y \sum_x P(Y = y \mid X = x) \cdot P(X = x) \\
&= \sum_y y \sum_x P(X = x, Y = y) \\
&= \sum_y y \cdot P(Y = y) = E[Y]
\end{aligned}$$
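The derivation above can be checked numerically. A minimal sketch with a small hypothetical joint pmf (the numbers are illustrative, not an example from the notes):

```python
# Check E[Y] = E[E[Y | X]] for a small discrete joint distribution.
joint = {  # (x, y) -> P(X = x, Y = y); a hypothetical pmf
    (0, 1): 0.1, (0, 2): 0.3,
    (1, 1): 0.4, (1, 2): 0.2,
}

# E[Y] directly from the joint pmf.
ey = sum(y * p for (x, y), p in joint.items())

# E[E[Y | X]]: average the conditional expectations over the law of X.
px = {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0.0) + p

ee = 0.0
for x, pxv in px.items():
    cond_exp = sum(y * p for (xx, y), p in joint.items() if xx == x) / pxv
    ee += cond_exp * pxv

assert abs(ey - ee) < 1e-12
```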
$$P(X_n = i_n \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}) = P(X_n = i_n \mid X_{n-1} = i_{n-1})$$
and, for a time-homogeneous chain,
$$P(X_n = i_n \mid X_{n-1} = i_{n-1}) = P(X_1 = i_1 \mid X_0 = i_0) \quad (n \in \mathbb{N})$$
and,
$$P(X_{n+1} = T \mid X_n) = \begin{cases} 0.51 & \text{if } X_n = T \\ 0.49 & \text{if } X_n = H \end{cases}$$
Then,
$$P = \begin{pmatrix} 0.51 & 0.49 \\ 0.49 & 0.51 \end{pmatrix} = \begin{pmatrix} P_{HH} & P_{HT} \\ P_{TH} & P_{TT} \end{pmatrix}$$
Transition Probabilities
Definition 15 (Probability Distribution Vector). The distribution of a discrete random variable X is the vector φ⃗ if
$$\phi_j = P(X = j) \quad \forall j \in \mathbb{N}$$
$$P(X_{n+1} = j \mid X_0 = i) = \sum_{k \in S} P(X_n = k \mid X_0 = i) \cdot P(X_{n+1} = j \mid X_n = k) = \sum_{k \in S} p_n(i, k) \cdot p(k, j)$$
In matrix form, $P^n P = P^{n+1}$.
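A quick numerical check of P^{n+1} = P^n P, using the two-state coin-flip chain from the example above (states ordered H, T):

```python
import numpy as np

# n-step transition probabilities are powers of the one-step matrix P.
P = np.array([[0.51, 0.49],
              [0.49, 0.51]])

P2 = P @ P                           # p_2(i, j)
P3 = np.linalg.matrix_power(P, 3)    # p_3(i, j)

assert np.allclose(P3, P2 @ P)           # P^{n+1} = P^n * P
assert np.allclose(P3.sum(axis=1), 1.0)  # rows of P^n remain stochastic
```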
$$p_{m+n}(x, y) = P(X_{m+n} = y \mid X_0 = x) = \sum_{z \in S} P(X_{m+n} = y, X_m = z \mid X_0 = x) = \sum_{z \in S} p_m(x, z) \cdot p_n(z, y)$$
The probabilistic interpretation of Chapman-Kolmogorov is that transitioning from x to y in m + n steps is equivalent to transitioning from x to some z in m steps and then moving from z to y in the remaining n steps.
Definition 19 (Distribution of X_n). The distribution of (X_n)_{n≥0} is
$$\vec{\phi} \cdot P^n, \quad \text{i.e., } P(X_n = j) = (\vec{\phi} \cdot P^n)_j \quad \forall j \in \mathbb{N}$$
Stationary Distributions
Definition 20 (Limiting Distribution Vector). A limiting distribution for a time-homogeneous Markov chain (X_n)_{n≥0} is a distribution π⃗ so that
$$\lim_{n \to \infty} p_n(i, j) = \pi_j$$
• lim_{n→∞} P(X_n = j) = π_j
• lim_{n→∞} φ⃗ · P^n = π⃗, where φ⃗ is the initial distribution
$$P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \ \text{(multiple stationary distributions)} \qquad P = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \ \text{(no limiting distribution)}$$

Example 2: Random Walk on a Weighted Graph

Let G be a weighted graph with edge weight function w(i, j).
$$\pi_v = \frac{w(v)}{\sum_z w(z)} \quad \forall v \in V(G), \quad \text{where } w(v) = \sum_{z \sim v} w(v, z)$$
$$P = \begin{pmatrix} 0 & 5/8 & 1/8 & 2/8 \\ 5/7 & 0 & 0 & 2/7 \\ 1 & 0 & 0 & 0 \\ 2/4 & 2/4 & 0 & 0 \end{pmatrix}$$
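The formula π_v = w(v)/∑_z w(z) can be verified numerically; the 4-vertex edge weights below are a hypothetical example, not necessarily the graph behind the matrix above:

```python
import numpy as np

# Random walk on a weighted graph: P_xy = w(x, y) / w(x).
w = {(0, 1): 5.0, (0, 2): 1.0, (0, 3): 2.0, (1, 3): 2.0}  # hypothetical
n = 4
W = np.zeros((n, n))
for (i, j), c in w.items():
    W[i, j] = W[j, i] = c          # undirected edge weights

deg = W.sum(axis=1)                # w(v) = sum of incident edge weights
P = W / deg[:, None]
pi = deg / deg.sum()               # claimed stationary distribution

assert np.allclose(pi @ P, pi)     # pi is stationary
assert abs(pi.sum() - 1.0) < 1e-12
```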
Regular Chains
Definition 25 (Regularity). A transition matrix P is regular if and only
if there exists n ∈ N so that every entry of Pn is positive.
$$|\lambda|^n \cdot |x_m| = |(P^n \vec{x})_m| = \Big| \sum_{i=1}^{k} P^n_{mi} x_i \Big| \le \sum_{i=1}^{k} P^n_{mi} |x_i| \le |x_m| \sum_{i=1}^{k} P^n_{mi} = |x_m|$$
where m is the index of a maximal component, |x_m| = max_i |x_i|. Since the entries of P^n are positive, the last inequality is an equality only if |x_1| = · · · = |x_k|. Similarly, the first inequality is an equality only if x_1 = · · · = x_k. But the constant vector whose components are the same is an eigenvector associated with the eigenvalue 1. Hence, if λ ≠ 1, one of the inequalities must be strict. Thus, |λ|^n < 1.
$$\lim_{n \to \infty} P^n = V$$
where V is a matrix with all rows equal to π⃗.

Example of Communication Classes: G has 3 communication classes.
Classification of States
Definition 32 (Communication). Two states i, j ∈ S of a Markov chain communicate, written i ↔ j, if there exist m, n ∈ N such that
$$p_m(i, j) > 0 \quad \text{and} \quad p_n(j, i) > 0$$
Equivalently, two states communicate if and only if each state has a positive probability of eventually being reached by a chain starting in the other state.
• (Symmetry) i ↔ j =⇒ j ↔ i by definition
Remark 40. If (X_n)_{n≥0} is irreducible, then the expected return time to a state j can also be found by taking the reciprocal of the stationary probability: E[τ_j^+] = 1/π_j.
$$P = \begin{pmatrix} 0 & 1 & 0 \\ 1/2 & 0 & 1/2 \\ 1/3 & 1/3 & 1/3 \end{pmatrix}$$
with rows and columns ordered a, b, c. Writing e_x for the expected time for the chain started at x to first reach a,
$$e_a = 1 + e_b, \qquad e_b = \frac{1}{2} + \frac{1}{2}(1 + e_c), \qquad e_c = \frac{1}{3} + \frac{1}{3}(1 + e_b) + \frac{1}{3}(1 + e_c)$$
Solving these equations gives,
$$e_c = \frac{8}{3}, \qquad e_b = \frac{7}{3}, \qquad e_a = \frac{10}{3}$$
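First-step analysis is a linear system e = 1 + Q e over the non-target states; a sketch solving for (e_b, e_c) numerically, with a as the target state:

```python
import numpy as np

# Q = P restricted to the states {b, c} that have not yet reached a.
Q = np.array([[0.0, 1/2],    # from b: to c with prob 1/2 (to a otherwise)
              [1/3, 1/3]])   # from c: to b or back to c
e = np.linalg.solve(np.eye(2) - Q, np.ones(2))  # e = (I - Q)^{-1} 1

assert np.allclose(e, [7/3, 8/3])     # (e_b, e_c)
assert abs(1 + e[0] - 10/3) < 1e-12   # e_a = 1 + e_b
```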
by Chapman-Kolmogorov. Therefore,
$$\sum_{n=1}^{\infty} p_n(j, j) \le \frac{1}{p_s(i, j) \cdot p_r(j, i)} \sum_{n=1}^{\infty} p_{n+r+s}(i, i) \le \frac{1}{p_s(i, j) \cdot p_r(j, i)} \sum_{n=1}^{\infty} p_n(i, i) < \infty$$
since i is transient.
Moreover,
$$E[N_i] = \begin{cases} \text{finite} & \iff i \text{ is transient} \\ \text{infinite} & \iff i \text{ is recurrent} \end{cases}$$
Recall: Let A be a square matrix with the property that A^n → 0 as n → ∞. Then ∑_{n=0}^∞ A^n = (I − A)^{−1}. This gives the matrix analog of the sum of a geometric series of real numbers.
Proof. Assume that E[N_i] < ∞. Let R_i be the number of returns to a state i. Define a sequence (τ_i^{(n)})_{n≥0} by
$$\tau_i^{(n)} = \begin{cases} \inf\{m \ge 1 \mid X_m^* = i\} & R_i \ge n \\ \infty & \text{otherwise} \end{cases}$$
where (X_m^*) is the process (X_m) started at τ_i^{(n−1)}. N_i = ∑_{n=1}^∞ 1_{\{X_n = i\}} is 1 more than the number of returns R_i, so
$$N_i = 1 + R_i = 1 + \sum_{n=1}^{\infty} 1_{\{\tau_i^{(n)} < \infty\}}$$
and τ_i^{(n)} = ∞ if and only if (X_n) visits i fewer than n times. Now, P(τ_i^{(n)} < ∞) = [P(τ_i^{(1)} < ∞)]^n by time homogeneity (⋆). Therefore,
$$\begin{aligned}
E[N_i] &= E\Big[ 1 + \sum_{n=1}^{\infty} 1_{\{\tau_i^{(n)} < \infty\}} \Big] \\
&= E\Big[ \sum_{n=0}^{\infty} 1_{\{\tau_i^{(n)} < \infty\}} \Big] \\
&= \sum_{n=0}^{\infty} E\big[ 1_{\{\tau_i^{(n)} < \infty\}} \big] \quad \text{by Linearity of Expectation} \\
&= \sum_{n=0}^{\infty} P(\tau_i^{(n)} < \infty) \\
&= \sum_{n=0}^{\infty} \big[ P(\tau_i^{(1)} < \infty) \big]^n \quad (\star) \\
&= \frac{1}{1 - P(\tau_i^+ < \infty)} \quad \text{by definition of a geometric series}
\end{aligned}$$
Thus, $E[N_i] = \frac{1}{1 - P(\tau_i^+ < \infty)}$ is finite ⟺ i is transient, and infinite ⟺ i is recurrent.
Stirling's Formula states that as n → ∞,
$$n! \sim \sqrt{2\pi n} \left( \frac{n}{e} \right)^n$$
Example 5: Simple Symmetric Random Walk on Z
$$E[N_0] = \sum_{n \ge 0} p_n(0, 0) = \sum_{n \ge 0} p_{2n}(0, 0) = \sum_{n \ge 0} \binom{2n}{n} \cdot \frac{1}{2^{2n}} \ge \sum_{n \ge 1} \frac{1}{\sqrt{4n}} = \infty$$
by Stirling's Formula, so the walk is recurrent.
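The exact return probabilities can be compared against the Stirling asymptotics used above:

```python
from math import comb, pi, sqrt

# p_{2n}(0, 0) = C(2n, n) / 4^n for the simple symmetric walk on Z.
for n in (10, 50, 100):
    exact = comb(2 * n, n) / 4 ** n
    assert exact >= 1 / sqrt(4 * n)               # the lower bound used above
    assert abs(exact * sqrt(pi * n) - 1) < 1 / n  # exact ~ 1/sqrt(pi n)
```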
$$S = T \cup R_1 \cup \cdots \cup R_m$$
$$P = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & P_1 & 0 & \cdots & 0 \\ 0 & 0 & P_2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & P_m \end{pmatrix}
\qquad
\lim_{n \to \infty} P^n = \begin{pmatrix} * & * & * & \cdots & * \\ 0 & \lim_n P_1^n & 0 & \cdots & 0 \\ 0 & 0 & \lim_n P_2^n & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \lim_n P_m^n \end{pmatrix}$$
with block rows and columns indexed by T, R_1, R_2, ..., R_m.
Periodicity
Definition 50 (Period). The period of a state i is
$$d = d(i) = \gcd\{ n \ge 1 \mid p_n(i, i) > 0 \}$$
Theorem 51. The states of a communication class all have the same period.
Proof. Suppose that there exist states i, j such that i ↔ j and d(i) ≠ d(j). Since i and j communicate, there exist r, s ∈ N such that p_r(i, j) > 0 and p_s(j, i) > 0.

Example 6: Periodicity

Now, p_n(i, j) > 0 for all n ≥ max{M(i) + m(i, j) | (i, j) ∈ S × S}.
Suppose that P is regular. By definition, there exists an M > 0 such that for all n ≥ M, P^n has all entries strictly positive. This means that p_n(i, j) > 0 for all states i, j, and consequently that P is irreducible. If P^n has strictly positive entries, so too does P^{n+1}. Thus, P(X_n = i | X_0 = i) > 0 and P(X_{n+1} = i | X_0 = i) > 0. Since gcd(n, n + 1) = 1, P is aperiodic.
$$\lim_{n \to \infty} P^n = V$$
where V is a matrix with all rows equal to π⃗.
Absorbing Chains
Definition 56 (Absorption). A state i ∈ S is called absorbing if,
p(i, i ) = 1
Assume that the chain starts in state a. The expected first time that the chain hits i is
$$\begin{cases} \sum_{b \ne i} (I - \tilde{P})^{-1}_{a,b} & \text{if } i \ne a \\[4pt] 1 + \sum_{j \ne i} P_{ij} \cdot \sum_{b \ne i} (I - \tilde{P})^{-1}_{j,b} & \text{if } i = a \end{cases}$$
where P̃ is P with the row and column of i removed.
P(τa ≥ n) = P( Xn−1 ̸= a) (n ≥ 1)
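A sketch of the fundamental-matrix computation (I − P̃)^{−1} for a hypothetical absorbing chain, gambler's ruin on {0, 1, 2, 3} with p = 1/2 (not an example worked in the notes):

```python
import numpy as np

# Q = P restricted to the transient states {1, 2}; 0 and 3 are absorbing.
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix sum_n Q^n
t = N.sum(axis=1)                  # expected steps until absorption

assert np.allclose(t, [2.0, 2.0])  # matches the known value k * (3 - k)
```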
Moreover,
$$\pi_j = \frac{1}{E[\tau_j^+]} \quad \text{for all } j$$
$$\begin{aligned}
\frac{1}{\mu_j} &= \lim_{n \to \infty} \frac{1}{n} \sum_{m=0}^{n-1} p_m(j, j) \\
&\ge \lim_{n \to \infty} \frac{1}{n} \sum_{m=r+s}^{n-1} p_r(j, i) \cdot p_{m-r-s}(i, i) \cdot p_s(i, j) \\
&= \lim_{n \to \infty} \frac{n - r - s}{n} \cdot p_r(j, i) \left( \frac{1}{n - r - s} \sum_{m=r+s}^{n-1} p_{m-r-s}(i, i) \right) p_s(i, j) \\
&= \frac{1}{\mu_i} \, p_r(j, i) \, p_s(i, j) > 0.
\end{aligned}$$
Reversibility
Definition 66 (Detailed Balance Equations). A stationary chain (X_n)_{n≥0} is reversible if
$$P(X_0 = i_0, X_1 = i_1, \ldots, X_n = i_n) = P(X_0 = i_n, X_1 = i_{n-1}, \ldots, X_n = i_0)$$
which holds if and only if the stationary distribution satisfies the detailed balance equations π_i p(i, j) = π_j p(j, i) for all i, j ∈ S.
Example 7: Reversibility
$$\| \pi_7 - \pi \|_{TV} = 0.17$$
This tells us that the probability of any event, for example, the probability of winning any specified card game using a deck shuffled 7 times, differs by at most 0.17 from the probability of the same event using a perfectly shuffled deck.

A coupling of two probability distributions μ and ν is a pair of random variables (X, Y) defined on a single probability space such that the marginal distribution of X is μ and the marginal distribution of Y is ν. That is, a coupling (X, Y) satisfies
$$P(X = x) = \mu(x) \quad \text{and} \quad P(Y = y) = \nu(y)$$

Definition 71 (Coupling of Markov Chains). A coupling of Markov chains is a process (X_n, Y_n)_{n≥0} with the property that both (X_n), (Y_n) are Markov chains with transition matrix P, although the chains may have different starting distributions.
$$T = \inf\{ n \mid X_n = Y_n \}$$
$$\| \vec{\pi}_0 \cdot P^n - \vec{\pi} \|_{TV} \le P(T > n) \quad \text{for all } n > 0$$
where π⃗ is the stationary distribution of (Y_n). The coupling inequality reduces the problem of showing that ∥π⃗_0 · P^n − π⃗∥ → 0 to that of showing
$$P(T > n) \to 0 \iff P(T < \infty) = 1$$
Proof. Define the process (Y_n^*) by
$$Y_n^* = \begin{cases} Y_n & \text{if } n < T \\ X_n & \text{if } n \ge T \end{cases}$$
$$\begin{aligned}
\vec{\pi}_0 \cdot P^n(A) - \vec{\pi}(A) &= P(X_n \in A) - P(Y_n^* \in A) \\
&= P(X_n \in A, T \le n) + P(X_n \in A, T > n) \\
&\quad - P(Y_n^* \in A, T \le n) - P(Y_n^* \in A, T > n)
\end{aligned}$$
For the product chain Z_n = (X_n, Y_n), the stationary distribution factorizes: π_Z(i, j) = π_i π_j.
Metropolis-Hastings Algorithm

Given a target distribution π⃗ and a proposal transition matrix T, define, for i ≠ j,
$$P_{ij} = \begin{cases} T_{ij} \cdot a(i, j) & \text{if } \pi_j T_{ji} \le \pi_i T_{ij} \\ T_{ij} & \text{otherwise} \end{cases} \qquad \text{where } a(i, j) = \frac{\pi_j T_{ji}}{\pi_i T_{ij}}$$
The diagonal entries of P are determined by the fact that the rows of P sum to 1. There are two cases,
• If π_j T_{ji} ≤ π_i T_{ij},
$$\pi_i P_{ij} = \pi_i T_{ij} \, a(i, j) = \pi_i T_{ij} \frac{\pi_j T_{ji}}{\pi_i T_{ij}} = \pi_j T_{ji} = \pi_j P_{ji}$$
• If π_j T_{ji} > π_i T_{ij},
$$\pi_i P_{ij} = \pi_i T_{ij} = \pi_j T_{ji} \frac{\pi_i T_{ij}}{\pi_j T_{ji}} = \pi_j T_{ji} \, a(j, i) = \pi_j P_{ji}$$
so π⃗ satisfies detailed balance for P.
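A minimal Metropolis-Hastings sketch with a symmetric uniform proposal (T_ij = T_ji), so the acceptance probability reduces to min(1, π_j/π_i); the target π below is a hypothetical example:

```python
import random

random.seed(0)
pi = [0.1, 0.2, 0.3, 0.4]          # hypothetical target distribution
n = len(pi)

def step(i):
    j = random.randrange(n)                      # propose j with T_ij = 1/n
    if random.random() < min(1.0, pi[j] / pi[i]):
        return j                                 # accept the move
    return i                                     # reject: remain at i

counts = [0] * n
state, steps = 0, 200_000
for _ in range(steps):
    state = step(state)
    counts[state] += 1

freqs = [c / steps for c in counts]
# Empirical visit frequencies approximate the target distribution.
assert all(abs(f - p) < 0.02 for f, p in zip(freqs, pi))
```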
Electrical Networks
We can represent Markov chains as electrical networks, which we
typically denote by undirected weighted graphs G = (V, E).
$$C_x := \sum_{y \sim x} C_{xy} < \infty \quad \forall x \in V, \qquad P_{xy} := \frac{C_{xy}}{C_x}, \qquad R_{xy} = \frac{1}{C_{xy}}$$
$$C^{\text{eff}}_{A,B} := \sum_{v \in A} C_v \cdot P(\tau_B < \tau_A^+ \mid X_0 = v)$$
where τ_B is the first hitting time of B and τ_A^+ is the first return time to A^17. When A = {a} and B = {b}, the effective conductance is
$$C^{\text{eff}}_{a,b} := C_a \cdot P(\tau_b < \tau_a^+ \mid X_0 = a)$$
^17 Recall that we write τ_z and τ_a^+ for the first time that the random walk visits z and the first positive time that the random walk visits a, respectively.
When A and B are not disjoint, we define C^eff_{A,B} = ∞.
$$R^{\text{eff}}_{A,B} = \frac{1}{C^{\text{eff}}_{A,B}}$$
$$c \mapsto C^{\text{eff}}_{A,B}(G)$$
is a monotone increasing function of the edge conductances c ∈ (0, ∞)^E. (Cutting an edge can only increase the effective resistance between the two vertices that it is adjacent to.)
Pólya’s Theorem
Theorem 87 (Pólya's Theorem). The simple random walk on the d-dimensional hypercubic lattice Z^d is recurrent if d ≤ 2 and transient if d ≥ 3.
Corollary 88. A simple random walk on any subgraph of Z^2 is recurrent^18.
^18 Adding a finite number of edges to a transient graph preserves transience.
Pólya’s Urn
Definition 89 (Pólya Urn Model). Pólya's Urn is the process:
• An urn contains two balls, one black and one white
• At each step, a ball is drawn uniformly at random
• Return the chosen ball to the urn and add another ball of the same color
The sequence of ordered pairs listing the numbers of black and white balls is a Markov chain. A configuration (a, b) with a black balls and b white balls evolves according to
$$(a, b) \to \begin{cases} (a + 1, b) & \text{with probability } \frac{a}{a + b} \\[4pt] (a, b + 1) & \text{with probability } \frac{b}{a + b} \end{cases}$$
For example, consider the transition (1, 1) → (3, 3). There are $\binom{m+n-b-w}{m-b}$ orderings of the draws that take the urn from (b, w) to (m, n), and started from (1, 1) every composition with U = m + n balls is equally likely,
$$P = \frac{1}{m + n - 1} = \frac{1}{U - 1}$$
so the probability of reaching (3, 3) is 1/5.
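The uniformity of the composition can be checked by simulation: started from (1, 1), the number of black balls after n draws should be uniform on {1, ..., n + 1}:

```python
import random

random.seed(1)
n_draws, trials = 4, 200_000
counts = {}
for _ in range(trials):
    a, b = 1, 1                          # black, white
    for _ in range(n_draws):
        if random.random() < a / (a + b):
            a += 1                       # drew black: add a black ball
        else:
            b += 1                       # drew white: add a white ball
    counts[a] = counts.get(a, 0) + 1

# Each of the 5 possible black-ball counts has frequency ~ 1/5.
for a in range(1, n_draws + 2):
    assert abs(counts.get(a, 0) / trials - 1 / (n_draws + 1)) < 0.01
```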
Branching Processes
$$Z_{n+1} = \sum_{j=1}^{Z_n} X_j$$
where X_j denotes the number of children born to the jth person in the nth generation. (X_j)_{j≥1} is an i.i.d. sequence with common distribution X. Furthermore, Z_n is independent of (X_j).
$$E[Z_n] = \mu^n$$
$$P(T_0 > n) = P(Z_n \ge 1) \le E[Z_n] = \mu^n$$
by Markov's inequality, since Z_0 = 1.
$$G_Z(s) = E[s^Z] = E\big[ s^{\sum_{k=1}^{n} X_k} \big] = E\Big[ \prod_{k=1}^{n} s^{X_k} \Big] = \prod_{k=1}^{n} E[s^{X_k}] = G_{X_1}(s) \cdots G_{X_n}(s)$$
by independence. If the X_k are i.i.d., then G_Z(s) = [G_X(s)]^n.
$$G(0) = P(X = 0)$$
$$G'(0) = \sum_{k=1}^{\infty} k s^{k-1} P(X = k) \Big|_{s=0} = P(X = 1)$$
$$G''(0) = \sum_{k=2}^{\infty} k(k-1) s^{k-2} P(X = k) \Big|_{s=0} = 2 P(X = 2)$$
and thus
$$P(X = j) = \frac{G^{(j)}(0)}{j!}, \quad \text{for } j = 0, 1, \ldots$$
Theorem 102. The generating function of the nth generation size Z_n is the n-fold composition of the offspring distribution generating function,
$$G_n(s) = \underbrace{G \circ \cdots \circ G}_{n}(s)$$
Proof. The generating function for the nth generation size Z_n is
$$G_n(s) = \sum_{k=0}^{\infty} s^k P(Z_n = k) = E[s^{Z_n}]$$
Conditioning on Z_n = z,
$$E[s^{Z_{n+1}} \mid Z_n = z] = E\big[ s^{\sum_{k=1}^{z} X_k} \big] = E\Big[ \prod_{k=1}^{z} s^{X_k} \Big] = \prod_{k=1}^{z} E[s^{X_k}] = [G(s)]^z \quad \text{for all } z$$
by independence. Taking expectations,
$$G_{n+1}(s) = E\big[ [G(s)]^{Z_n} \big] = G_n(G(s))$$
Since P( T0 ≤ k) = P( Zk = 0),
1. G(1) = 1 since
$$E[1^X] = \sum_{k=0}^{\infty} P(X = k) = 1$$
2. G(s) is strictly increasing since
$$G'(s) = \sum_{k=1}^{\infty} k s^{k-1} \pi_k > 0 \qquad (s > 0 \text{ and } \pi_k \ne 0 \text{ for some } k \ge 1)$$
3. G(s) is strictly convex since
$$G''(s) = \sum_{k=2}^{\infty} k(k-1) s^{k-2} \pi_k > 0 \qquad (s > 0 \text{ and } \pi_k \ne 0 \text{ for some } k \ge 2)$$
Proof. $G'(1) = \sum_{k=1}^{\infty} k \cdot \pi_k = \sum_{k=1}^{\infty} k \cdot P(X = k) = E[X]$.
$$P(T_0 \le k) = P(Z_k = 0) = G_k(0) = G(G_{k-1}(0)) = G(P(Z_{k-1} = 0)) = G(P(T_0 \le k - 1))$$
If x is any fixed point of G in [0, 1], then
$$P(T_0 < \infty) = \lim_{k \to \infty} P(Z_k = 0) \le x$$
Proof. If π_k = δ_{1k} for all k ≥ 0, then G(s) = s for all s ∈ [0, 1] and μ = 1. This implies that G(s) has infinitely many fixed points in the interval [0, 1]. Assume now that π_k ≠ δ_{1k}. Since G is convex, the two curves y = G(s) and y = s can intersect at either one or two points. The derivative of G(s) at s = 1 distinguishes these two cases.
For example,
$$y = s \quad \text{and} \quad y = G(s) = \frac{1}{3}(1 + s + s^2)$$
intersect only at s = 1. Therefore, μ ≤ 1, and the extinction probability is P(T_0 < ∞) = 1. We can also compute μ explicitly,
$$\mu = G'(1) = \frac{1}{3}(0 + 1 + 2s) \Big|_{s=1} = 1$$
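The extinction probability can be found numerically by iterating q ← G(q) from q = 0, since G_k(0) = P(T_0 ≤ k). A sketch for a hypothetical supercritical offspring distribution P(X=0) = 1/4, P(X=1) = 1/4, P(X=2) = 1/2 (so μ = 5/4 > 1):

```python
def G(s):
    # Offspring pgf of the hypothetical distribution above.
    return 0.25 + 0.25 * s + 0.5 * s * s

q = 0.0                      # q_k = G_k(0) = P(T_0 <= k)
for _ in range(200):
    q = G(q)

# G(s) = s has roots s = 1/2 and s = 1; iteration from 0 reaches the
# smallest fixed point, which is the extinction probability.
assert abs(q - 0.5) < 1e-9
assert abs(G(q) - q) < 1e-12
```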
$$Z_0 = 1 \quad \text{with offspring distribution} \quad \vec{\pi} \sim \text{Po}(\mu), \qquad G(s) = e^{\mu(s - 1)}$$
Poisson Processes
Definition 1
Figure 3: A Poisson probability generat-
ing function with various means µ.
Definition 108 (Counting Function). A counting process (N_t)_{t≥0} is a collection of random variables with values in N_0 such that
$$N_t \ge N_s \quad \forall t \ge s \ge 0$$
where N_t − N_s is the number of events that have occurred in (s, t]. The parameter λ is called the rate because E[N_t] = λ · t:
$$E[N_t] = E[N_t - N_0] = \lambda \cdot t$$
Example 13: Poisson Process
$$P(N_2 = 18, N_7 = 70) = P(N_2 = 18) \cdot P(N_7 - N_2 = 52)$$
by independent increments, where N_7 − N_2 ∼ Po(5λ) by stationarity.
By definition, Y_t := N_{t+s} − N_s, so
$$Y_t - Y_r = N_{t+s} - N_s - (N_{r+s} - N_s) = N_{t+s} - N_{r+s}$$
Definition 2
Definition 113 (Interarrival Time). The interarrival times (X_k)_{k≥1} are the times between consecutive jumps of a counting process.

Definition 114 (Arrival Time). The arrival times (S_k)_{k≥0} are the times at which the process jumps,
$$S_k - S_{k-1} = X_k \quad (\text{where } S_0 = 0)$$
Note on Arrival Time: the arrival times are exactly the discontinuity points s with
$$\lim_{t \to s^+} N_t \ne \lim_{t \to s^-} N_t$$
Let (X_k)_{k≥1} be i.i.d. Exp(λ) and set N_t := max{n ≥ 0 | S_n ≤ t}, with N_0 = 0. Then, (N_t)_{t≥0} is a Poisson process with parameter λ. This defines a sequence (S_n) of arrival times of the process, where S_k is the time of the kth arrival, and the interarrival time between k − 1 and k is
$$X_k = S_k - S_{k-1}, \quad k \in \mathbb{N}, \text{ with } S_0 = 0$$
Note on Exponential Distribution:
$$E[f(X)] = \int_0^{\infty} \lambda e^{-\lambda t} f(t)\,dt \quad (\star), \qquad E[X] = \frac{1}{\lambda}, \qquad P(\min\{X_1, \ldots, X_n\} > t) = e^{-(\lambda_1 + \cdots + \lambda_n)t}$$
Proof. Let (N_t) be a rate λ Poisson process and let A = {N_t = k}. Then
$$P(N_t = k) = E[P(A \mid S_k)] = E\big[ 1_{\{S_k \le t\}} \cdot P(A \mid S_k) + 1_{\{S_k > t\}} \cdot P(A \mid S_k) \big]$$
where S_k, as a sum of k i.i.d. Exp(λ) variables, has the Gamma(k, λ) density
$$f(x) = \frac{\lambda^k x^{k-1}}{(k-1)!} e^{-\lambda x}$$
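Definition 2 is also a simulation recipe: sum i.i.d. Exp(λ) interarrival times and count arrivals up to time t; the mean of N_t should be λt:

```python
import random

random.seed(2)
lam, t, trials = 3.0, 2.0, 20_000
total = 0
for _ in range(trials):
    s, n = 0.0, 0
    while True:
        s += random.expovariate(lam)  # next interarrival X_k ~ Exp(lam)
        if s > t:
            break
        n += 1
    total += n

mean = total / trials
assert abs(mean - lam * t) < 0.1      # E[N_t] = lam * t = 6
```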
$$P(X > t + s \mid X > s) = \frac{e^{-\lambda(t+s)}}{e^{-\lambda s}} = e^{-\lambda t}$$
Therefore, conditional on {X > s}, X − s ∼ Exp(λ).
$$P(N_1 = 3) = \frac{6^3}{e^6 \cdot 3!}$$
Definition 3

Definition 120 (Poisson Process – Definition 3). A Poisson process with parameter λ is a counting process (N_t)_{t≥0} satisfying^23:
1. N_0 = 0
2. (N_t) has stationary and independent increments
3. P(N_h = 0) = 1 − λh + o(h)
4. P(N_h = 1) = λh + o(h)
5. P(N_h > 1) = o(h)
^23 There cannot be infinitely many arrivals in a finite interval, and in an infinitesimal interval there may occur at most one event.
Here f = o(g) means
$$\lim_{h \to 0} \frac{f(h)}{g(h)} = 0$$
$$T = \sum_{i=1}^{K} X_i, \qquad E[T] = E[K] \cdot \underbrace{E[X_1]}_{=1} = E[K]$$
Let Zk be the time when the second person marked with birth-
day k enters the room. Then, the first time that two people in
the room have the same birthday is,
T = min Zk
1≤k ≤365
$$n = 2, \qquad \lambda = \frac{1}{365}, \qquad f(t) = \frac{t e^{-t/365}}{365^2} \quad (t > 0)$$
The cumulative distribution function is
$$P(Z_1 \le t) = \int_0^t \frac{s e^{-s/365}}{365^2}\,ds = 1 - \frac{e^{-t/365}(365 + t)}{365}$$
This gives,
$$P(T > t) = P\Big( \min_{1 \le k \le 365} Z_k > t \Big)$$
$$\int_0^{\infty} P(X_{N_t} = x)\,dt = \sum_{k=0}^{\infty} p_k(x, x)$$
Since P(X_k = x) = P(X_{N_t} = x | N_t = k),
$$\begin{aligned}
\int_0^{\infty} P(X_{N_t} = x)\,dt &= \int_0^{\infty} \sum_{k=0}^{\infty} P(N_t = k) \cdot P(X_{N_t} = x \mid N_t = k)\,dt \\
&= \sum_{k=0}^{\infty} P(X_{N_t} = x \mid N_t = k) \cdot \int_0^{\infty} P(N_t = k)\,dt \\
&= \sum_{k=0}^{\infty} P(X_k = x) \cdot \int_0^{\infty} P(N_t = k)\,dt \\
&= \sum_{k=0}^{\infty} p_k(x, x) \cdot \underbrace{\int_0^{\infty} \frac{e^{-t} t^k}{k!}\,dt}_{=1} \\
&= \sum_{k=0}^{\infty} p_k(x, x)
\end{aligned}$$
$$(N_t^{(L_t)})_{t \ge 0} \sim \text{Po}\Big( \frac{\lambda t}{2} \Big) = \text{Po}\Big( \frac{t}{2} \Big), \qquad (N_t^{(R_t)})_{t \ge 0} \sim \text{Po}\Big( \frac{\lambda t}{2} \Big) = \text{Po}\Big( \frac{t}{2} \Big)$$
Moreover, (N_t^{(L_t)}) and (N_t^{(R_t)}) are independent. Thus,
$$X_t^* := X_{N_t} = \underbrace{N_t^{(R_t)}}_{+1\ (\text{"}R_t\text{"})} - \underbrace{N_t^{(L_t)}}_{-1\ (\text{"}L_t\text{"})} \sim \text{Po}\Big( \frac{t}{2} \Big) - \text{Po}\Big( \frac{t}{2} \Big)$$
$$P(X_{N_t} = 0) = P(L_t = R_t) = \sum_{k=0}^{\infty} P(L_t = k) \cdot P(R_t = k) = \sum_{k=0}^{\infty} \left( \frac{e^{-t/2} (t/2)^k}{k!} \right)^2 = e^{-t} \cdot I_0(t)$$
by independence, where I_0(t) is the modified Bessel function.
Therefore, P(X_{N_t} = 0) · √(2πt) → 1 as t → ∞, using the Stirling Formula or Bessel function properties. Consequently,
$$\int_0^{\infty} P(X_{N_t} = 0)\,dt = \infty \quad \text{because} \quad \int_1^{\infty} \frac{dt}{\sqrt{2\pi t}} = \infty$$
In d dimensions, where the coordinates (X_{N_t}^{(i)}) and (X_{N_t}^{(j)}) are independent if i ≠ j, each coordinate is a rate-1/d one-dimensional walk, and
$$\left( \sqrt{\frac{2\pi t}{d}} \right)^{d} \cdot P\big( X_{N_t} = \vec{0} \big) \to 1$$
Therefore, $\int_1^{\infty} \big( \sqrt{2\pi t / d} \big)^{-d}\,dt < \infty$ if and only if d ≥ 3.
Order Statistics
If a Poisson process contains exactly n events in [0, t], then the un-
ordered times of those events are uniformly distributed on [0, t].
$$\begin{aligned}
P(S_1 \le s \mid N_t = 1) &= \frac{P(S_1 \le s, N_t = 1)}{P(N_t = 1)} \\
&= \frac{P(N_s = 1, N_t = 1)}{P(N_t = 1)} \\
&= \frac{P(N_s = 1, N_t - N_s = 0)}{P(N_t = 1)} \\
&= \frac{P(N_s = 1) \cdot P(N_{t-s} = 0)}{P(N_t = 1)} \\
&= \frac{e^{-\lambda s} \lambda s \cdot e^{-\lambda(t-s)}}{e^{-\lambda t} \lambda t} = \frac{s}{t}
\end{aligned}$$
since N_t − N_s ∼ Po(λ · (t − s)).
$$f_{U_1, \ldots, U_n}(u_1, \ldots, u_n) = \frac{1}{t^n}$$
for 0 ≤ u_1, ..., u_n ≤ t. Arranging the U_i in increasing order, U_(1) ≤ ⋯ ≤ U_(n) is the order statistics of the original sequence. Its joint density is
$$f_{U_{(1)}, \ldots, U_{(n)}}(u_1, \ldots, u_n) = \frac{n!}{t^n}, \qquad 0 \le u_1 < \cdots < u_n \le t$$
Conditional on N_t = n, the arrival times have joint density
$$f(s_1, \ldots, s_n) = \frac{n!}{t^n}$$
for 0 < s_1 < ⋯ < s_n < t. Equivalently, let U_1, · · · , U_n be an i.i.d. sequence of Unif([0, t]) random variables. Then, conditional on N_t = n,
$$(S_1, \ldots, S_n) \quad \text{and} \quad (U_{(1)}, \ldots, U_{(n)})$$
have the same distribution.
Corollary 127. Results for arrival times offer a new method for simulating a Poisson process with parameter λ on an interval [0, t]:
1. Generate N ∼ Po(λt).
2. Conditional on N = n, generate n i.i.d. Unif([0, t]) random variables.
3. Sort the variables in increasing order to give the Poisson arrival times.
For disjoint sets A_1, ..., A_k, the counts (N(A_1), N(A_2), · · · , N(A_k)) are independent.
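The recipe in Corollary 127 can be sketched directly; here we check that the simulated count in [0, t/2] has mean λt/2:

```python
import math
import random

random.seed(3)
lam, t, trials = 2.0, 5.0, 20_000

def poisson(mu):
    # Knuth's method: multiply uniforms until the product drops below e^-mu.
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p < L:
            return k
        k += 1

count_half = 0
for _ in range(trials):
    n = poisson(lam * t)                      # step 1: N ~ Po(lam * t)
    pts = sorted(random.uniform(0, t) for _ in range(n))  # steps 2-3
    count_half += sum(1 for s in pts if s <= t / 2)

assert abs(count_half / trials - lam * t / 2) < 0.1  # mean is lam * t / 2
```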
Holding Times
Definition 132 (Continuous-Time Markov Property). The Markov property for continuous-time chains (X_t)_{t≥0} with discrete state space S is
$$P\big( X_t = j \mid (X_u)_{0 \le u < s} \text{ and } X_s = i \big) = P(X_t = j \mid X_s = i)$$
Time-homogeneity reads
$$P(X_{t+s} = j \mid X_s = i) = P(X_t = j \mid X_0 = i), \qquad P_{ij}(t) = P(X_t = j \mid X_0 = i)$$
Proof. By conditioning on X_s,
$$\begin{aligned}
P_{ij}(s + t) &= P(X_{s+t} = j \mid X_0 = i) \\
&= \sum_k P(X_{s+t} = j \mid X_s = k, X_0 = i) \cdot P(X_s = k \mid X_0 = i) \\
&= \sum_k P(X_{s+t} = j \mid X_s = k) \cdot P(X_s = k \mid X_0 = i) \\
&= \sum_k P(X_t = j \mid X_0 = k) \cdot P(X_s = k \mid X_0 = i) \\
&= \sum_k P_{ik}(s) \cdot P_{kj}(t) = [P(s) \cdot P(t)]_{ij}
\end{aligned}$$
P( A ∩ B | X0 = i ) = P( A | B ∩ { X0 = i }) · P( B | X0 = i )
1. If j can be hit from i, the alarm Cij associated with (i, j) will ring
after an exponentially distributed length of time with parameter qij
4. If the (i, j) clock rings first and the process moves to j, a new set of
exponential alarm clocks are started with rates q j1 , q j2 , · · ·
Infinitesimal Generator

Definition 141 (Rate Matrix). The rate matrix Q for a continuous-time Markov chain is defined as follows,
$$(Q)_{ij} := \begin{cases} q_i \cdot \tilde{P}_{ij} & i \ne j \\ 0 & i = j \end{cases} = \begin{cases} q_{ij} & i \ne j \\ 0 & i = j \end{cases}$$
Definition 142 (Generator Matrix). The generator matrix Q for a continuous-time Markov chain is defined as follows^29,
$$Q_{ij} := \begin{cases} q_{ij} & i \ne j \\ -q_i & i = j \end{cases} \qquad \text{where } q_i = \sum_k q_{ik}$$
where P̃ is the embedded chain, and {C_ij = Ĉ} is the event that C_ij is the first alarm that rings and determines the next state^30.
Note on Generator Matrix: A continuous-time Markov chain (X_t)_{t≥0} with transition function P(t) and generator Q satisfies
1. P′(t) = P(t) · Q
2. P′(t) = Q · P(t)
These are called the "Kolmogorov Forward, Backward Equations".
^29 If the smallest signed value of a generator matrix Q is finite, then we say the chain is "bounded rate". Moreover, any finite chain is bounded rate.
^30 The holding time parameters λ_i are equal to q_i.
Corollary 145. A continuous-time Markov chain (X_t)_{t≥0} with transition function P(t) and generator Q satisfies
$$P(t) = e^{tQ} = \sum_{n=0}^{\infty} \frac{1}{n!} (tQ)^n = I + tQ + \frac{t^2}{2} Q^2 + \frac{t^3}{6} Q^3 + \cdots$$
Note on Matrix Exponential: Let A be a k × k matrix. Then,
$$e^A = \sum_{n=0}^{\infty} \frac{1}{n!} A^n = I + A + \frac{1}{2} A^2 + \cdots$$
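The truncated series from Corollary 145 is easy to evaluate numerically; a sketch with a hypothetical two-state generator (rates q_01 = 1, q_10 = 2):

```python
import numpy as np

Q = np.array([[-1.0,  1.0],
              [ 2.0, -2.0]])

def expm_series(A, terms=60):
    # Truncated power series sum_n A^n / n!.
    out, term = np.eye(len(A)), np.eye(len(A))
    for n in range(1, terms):
        term = term @ A / n
        out = out + term
    return out

Pt = expm_series(0.5 * Q)                 # P(t) at t = 0.5

assert np.allclose(Pt.sum(axis=1), 1.0)   # rows of P(t) sum to 1
pi = np.array([2/3, 1/3])                 # solves pi Q = 0
assert np.allclose(pi @ Pt, pi)           # pi is stationary under P(t)
```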
Classification of States
Lemma 147. If Pij (t) > 0 for some t > 0, then Pij (t) > 0 for all t > 0.
Proof. Suppose that Pij (t) > 0 for some t > 0. Then there exists a
path from i to j in the embedded chain, and, since the exponential
distribution is continuous, for any time s there is positive probability
of reaching j from i in s time units31 . 31
Formally, for s ≥ 0, this means that,
1. Pij (t + s) > 0
Corollary 148. All states of a continuous-time Markov chain are aperiodic.
2. Pij (t − s) > 0
Stationary Distributions

Definition 149 (Stationary Distribution). A probability distribution π⃗ is a stationary probability distribution for a continuous-time chain if^32
$$\vec{\pi} = \vec{\pi} \cdot P(t) \quad \forall t \ge 0$$
^32 As in the discrete case, the limiting distribution, if it exists, is a stationary distribution. However, the converse is not necessarily true and depends on the class structure of the chain.
Corollary 150. The stationary distribution π⃗ is not the same as the stationary distribution ψ⃗ of the embedded chain^33. However, for all j,
$$\psi(j) = \frac{\pi_j q_j}{\sum_k \pi_k q_k}, \qquad \pi_j = \frac{\psi(j)/q_j}{\sum_k \psi(k)/q_k}$$
^33 π_i is the long-term proportion of time that the process spends in state i. Conversely, ψ(i) is the long-term proportion of transitions that the process makes into state i.
Theorem 151 (Fundamental Limit Theorem). Let (X_t)_{t≥0} be a finite, irreducible, continuous-time Markov chain with transition function P(t). Then, there exists a unique stationary distribution π⃗, which is the limiting distribution of the chain. That is, for all j ∈ S,
$$\lim_{t \to \infty} P_{ij}(t) = \pi_j$$
Equivalently,
$$\lim_{t \to \infty} P(t) = \Pi$$
where Π is a matrix all of whose rows are equal to π⃗.
Proof. The proof of the Fundamental Limit Theorem is omitted.
Theorem 152. A probability distribution π⃗ is a stationary distribution of a continuous-time Markov chain with generator Q if and only if
$$\vec{\pi} Q = \vec{0} \iff \sum_i \pi_i Q_{ij} = 0 \text{ for all } j \in S$$
Differentiating π⃗ = π⃗ P(t) at t = 0 gives
$$\vec{0} = \vec{\pi} P'(0) = \vec{\pi} Q$$
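Solving π⃗Q = 0⃗ with the normalization ∑_i π_i = 1 is a small linear-algebra problem; the 3-state generator below is a hypothetical example:

```python
import numpy as np

Q = np.array([[-3.0,  2.0,  1.0],
              [ 1.0, -4.0,  3.0],
              [ 2.0,  2.0, -4.0]])

# Append the normalization pi . 1 = 1 as an extra equation and solve the
# (consistent) overdetermined system in least squares.
A = np.hstack([Q, np.ones((3, 1))])   # columns encode pi Q = 0 and pi 1 = 1
b = np.array([0.0, 0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A.T, b, rcond=None)

assert np.allclose(pi @ Q, 0.0, atol=1e-10)  # stationarity
assert abs(pi.sum() - 1.0) < 1e-10           # normalization
assert (pi > 0).all()                        # irreducible => positive
```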
Poisson Subordination
Definition 153 (Subordination). Let (N_t)_{t≥0} be a Poisson process with parameter λ. Let (Y_n)_{n≥0} be a finite-state, irreducible, discrete-time process with transition matrix R. Define a continuous-time process (X_t)_{t≥0} by X_t := Y_{N_t}^{35}. The process (X_t) is subordinated to a Poisson process.
^35 Transitions for the X_t process occur at the arrival times of the Poisson process. From state i, the process holds an exponentially distributed amount of time with parameter λ, and then transitions to j with probability R_ij.
Remark 154. Let P(t) be the transition function of (X_t). Then,
$$\begin{aligned}
P_{ij}(t) &= P(X_t = j \mid X_0 = i) \\
&= \sum_{k=0}^{\infty} P(X_t = j \mid \{N_t = k\} \cap \{X_0 = i\}) \cdot P(N_t = k \mid X_0 = i) \\
&= \sum_{k=0}^{\infty} P(Y_k = j \mid \{N_t = k\} \cap \{X_0 = i\}) \cdot P(N_t = k) \\
&= \sum_{k=0}^{\infty} P(Y_k = j \mid Y_0 = i) \cdot P(N_t = k) \\
&= \sum_{k=0}^{\infty} R^k_{ij} \cdot \frac{e^{-\lambda t}(\lambda t)^k}{k!}
\end{aligned}$$
Proof. Since Q = λ(R − I),
$$\vec{\pi} Q = \vec{\pi} \lambda (R - I) = \lambda \vec{\pi} R - \lambda \vec{\pi}$$
which is 0⃗ if and only if π⃗R = π⃗.
The following are equivalent:
1. π⃗ P(t) = π⃗ for all t ≥ 0
2. π⃗ Q = 0⃗
3. π⃗ R̃ = π⃗ for the uniformized chain R̃ built with λ = max_i q_i
Remark 159. A Yule process (Y_t)_{t≥0} satisfies the following identity,
$$Y_{t+\tau} \sim Y_t^{(1)} + Y_t^{(2)}, \qquad \tau \sim \text{Exp}(\beta)$$
where τ is the time of the first division, and Y_t^{(1)}, Y_t^{(2)} are independent Yule processes.
(Y_t) cannot be written as a Poisson-subordinated discrete-time walk. However, it exists for all time without exploding. Conversely, if q_{x,x+1} = β · x², then the process explodes in finite time.
$$E[Y_t] = E\Big[ \sum_{i=1}^{\infty} 1_{\{T_i \le t\}} \Big] = \sum_{i=1}^{\infty} P(T_i \le t) = \sum_{i=1}^{\infty} \big( 1 - e^{-t \beta} \big)^{i-1} = e^{t \beta}$$
Theorem 163. If β > 1, then the probability of extinction is less than 1.
The following questions are equivalent:
1. How many times is the Exp(β) birth clock C of an individual the minimum before the Exp(1) death clock rings?
2. How many offspring does the individual produce?
Proof. The proof is by a Poisson process (thinning) argument, where
$$\frac{\beta}{1 + \beta} \quad \text{and} \quad \frac{1}{1 + \beta}$$
are the parameters of each thinned process. Computing the expected number of divisions before death gives E[Y] = β > 1, so the embedded branching process is supercritical.
Corollary 164. Let (Y_t) be a birth-and-death process with birth rate β and death rate 1. If β ≤ 1, the underlying branching process goes extinct,
$$Y_t \xrightarrow[t \to \infty]{\text{a.s.}} 0 \quad \text{where } 0 \text{ is an absorbing state}$$
P(Y∞ = ∞) = 1 − P(Y∞ = 0)
Proof. The proof uses Martingales, and it was not seen in class.
Martingales
Examples of Martingales
Definition 168 (Martingale). A martingale (M_t)_{t≥0} is a stochastic process that satisfies, for all t ≥ 0,
1. E[M_t | (M_u)_{0≤u≤s}] = M_s for all 0 ≤ s ≤ t
2. E[|M_t|] < ∞
Remark 170. If (M_n)_{n≥0} is a martingale, then E[M_n] is constant.
Proof. By the Law of Total Expectation, for all 0 ≤ s ≤ t,
$$E[M_t] = E\big[ E[M_t \mid M_0, \ldots, M_s] \big] = E[M_s]$$
$$\begin{aligned}
E[S_{n+1} \mid S_0, \ldots, S_n] &= E[X_{n+1} + S_n \mid S_0, \ldots, S_n] \\
&= E[X_{n+1} \mid S_0, \ldots, S_n] + \underbrace{E[S_n \mid S_0, \ldots, S_n]}_{E[g(X) \mid X] = g(X)} \\
&= \underbrace{E[X_{n+1}]}_{=0} + S_n = S_n
\end{aligned}$$
For the biased walk,
$$E[S_{n+1} \mid S_0, \ldots, S_n] = \underbrace{E[X_{n+1}]}_{p - q} + S_n = (p - q) + S_n$$
so Ŝ_n = S_n − (p − q) · n is a martingale.
As k → ∞, Y_k → 0. Therefore,
$$E[S'_0] = E[S'_k] = E[S_n] - (p - q) \cdot E[\min\{n, T\}]$$
Taking k → ∞ gives the absorption time
$$E[T] = \frac{E[X_0]}{(q - p)}$$
$$M_k := \frac{Z_k}{\mu^k}$$
is a martingale. Since $Z_{k+1} \sim \sum_{j=1}^{Z_k} X_j$,
$$\begin{aligned}
E[M_{k+1} \mid M_1, \ldots, M_k] &= E\Big[ \frac{Z_{k+1}}{\mu^{k+1}} \,\Big|\, M_1, \ldots, M_k \Big] \\
&= \frac{1}{\mu^k \cdot \mu} E\Big[ \sum_{j=1}^{Z_k} X_j \,\Big|\, Z_k \Big] \\
&= \frac{1}{\mu^k \cdot \mu} \sum_{j=1}^{Z_k} E[X_j] \quad \text{since } \mu, k \text{ are fixed} \\
&= \frac{1}{\mu^k \cdot \mu} \sum_{j=1}^{Z_k} \mu \\
&= \frac{Z_k}{\mu^k} = M_k
\end{aligned}$$
With M_k := R_k/(R_k + B_k), where R_k and B_k count the balls of the two colors after k draws,
$$\begin{aligned}
E[M_{k+1} \mid R_k, B_k] &= P(R_{k+1} = R_k + 1) \cdot \frac{R_k + 1}{R_k + B_k + 1} + \underbrace{P(R_{k+1} = R_k)}_{P(A^c_{k+1} \mid (R_k, B_k)) = E[1_{A^c_{k+1}}]} \cdot \frac{R_k}{R_k + B_k + 1} \\
&= \frac{R_k^2 + R_k + B_k R_k}{(R_0 + B_0 + k) \cdot (R_0 + B_0 + k + 1)} \\
&= \frac{R_k \cdot (R_k + B_k + 1)}{(R_k + B_k) \cdot (R_k + B_k + 1)} \\
&= \frac{R_k}{R_k + B_k} = M_k
\end{aligned}$$
Mk := P( Ai | Yk )
E[ Mk+1 | M0 , · · · , Mk ] = E[ P( Ai | Yk+1 ) | M0 , · · · , Mk ]
= E[ P( Ai | Yk+1 ) | Yk ]
= E[ P( Ai | Yk+1 , Yk ) | Yk ]
= P( Ai | Yk )
= Mk
Limit Theorems
Definition 171 (Bracket). The bracket of a martingale (M_k)_{k≥0} is^40
$$\langle M_k \rangle := \sum_{n=0}^{k} E\big[ (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \big]$$
^40 This is effectively the accumulated variance of the process.
For the simple symmetric random walk,
$$(M_{n+1} - M_n)^2 = 1 \quad \text{for all } n \in \mathbb{N}_0$$
so ⟨M_k⟩ = k for all k ∈ N.
$$E[\langle M_k \rangle] = E[(M_{k+1} - M_0)^2]$$
Proof. By definition,
$$E\big[ (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \big] = E\big[ M_{n+1}^2 + M_n^2 - 2 M_{n+1} M_n \mid \mathcal{F}_n \big] = E\big[ M_{n+1}^2 \mid \mathcal{F}_n \big] + M_n^2 - 2 M_n^2$$
since E[M_{n+1} M_n | F_n] = M_n E[M_{n+1} | F_n] = M_n². Therefore,
$$\begin{aligned}
E[\langle M_k \rangle] &= E\Big[ \sum_{n=0}^{k} E\big[ (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \big] \Big] \\
&= \sum_{n=0}^{k} E\Big[ E\big[ (M_{n+1} - M_n)^2 \mid \mathcal{F}_n \big] \Big] \\
&= \sum_{n=0}^{k} E\Big[ E\big[ M_{n+1}^2 \mid \mathcal{F}_n \big] + M_n^2 - 2 M_n^2 \Big] \\
&= \sum_{n=0}^{k} \big( E[M_{n+1}^2] - E[M_n^2] \big) \\
&= E[M_{k+1}^2] - E[M_0^2] = E[(M_{k+1} - M_0)^2]
\end{aligned}$$
Since $\frac{R_n}{n + n_0} \ge \frac{R_n}{n + n_0 + 1}$, we have that
$$E\bigg[ \Big( \frac{R_{n+1}}{n + 1 + n_0} - \frac{R_n}{n + n_0} \Big)^2 \,\Big|\, \mathcal{F}_n \bigg] \le \frac{1}{(n + 1 + n_0)^2}$$
1. P (limn→∞ Mn = M∞ ) = 1
3. Mn = E[ M∞ | Fn ]
10 Runs of the Pólya’s Urn Process:
Proof. The proof was not given in class.
$$\begin{aligned}
\langle M_n \rangle &= \sum_{k=1}^{n} E\Bigg[ \frac{1}{\mu^{2k}} \Bigg( \sum_{i=1}^{Z_k} \Big( \frac{X_i}{\mu} - 1 \Big) \Bigg)^2 \,\Bigg|\, \mathcal{F}_k \Bigg] \\
&= \sum_{k=1}^{n} E\Bigg[ \frac{1}{\mu^{2k}} \Bigg( \sum_{i=1}^{Z_k} \Big( \frac{X_i}{\mu} - \frac{\mu}{\mu} \Big) \Bigg)^2 \,\Bigg|\, \mathcal{F}_k \Bigg] \\
&= \sum_{k=1}^{n} \frac{1}{\mu^{2k}} E\Bigg[ \Bigg( \sum_{i=1}^{Z_k} \Big( \frac{X_i}{\mu} - \frac{\mu}{\mu} \Big) \Bigg)^2 \,\Bigg|\, \mathcal{F}_k \Bigg]
\end{aligned}$$
with each term equal to
$$\frac{Z_k}{\mu^{2k}} \cdot \frac{\text{Var}(X_1)}{\mu^2}$$
using the definition of the population variance. Now, we can bound sup⟨M_n⟩ since ⟨M_n⟩ is an increasing sequence,
$$\sup_n \langle M_n \rangle \le \sum_{k=0}^{\infty} \frac{Z_k}{\mu^{2k}} \frac{\text{Var}(X_1)}{\mu^2}$$
and use the mean exponential growth rate E[Z_k] = μ^k to conclude that
$$E\Big[ \sup_n \langle M_n \rangle \Big] \le \sum_{k=0}^{\infty} E\Big[ \frac{Z_k}{\mu^{2k}} \Big] \frac{\text{Var}(X_1)}{\mu^2} = \sum_{k=0}^{\infty} \frac{1}{\mu^k} \cdot \frac{\text{Var}(X_1)}{\mu^2} < \infty$$
With μ > 1 and Var(X_1) < ∞, the martingale M_k converges to a non-degenerate limit random variable.