0% found this document useful (0 votes)

253 views130 pages

Random Walks and Markov Chains Overview

The document is a set of lecture notes on random walks. It introduces the concept of a random walk as a random process where the next step is chosen randomly from the possible neighbors. As an example, it describes a random walk on the integers where at each step the walker moves +1 or -1 with equal probability. It presents two key properties: 1) the expected number of visits to 0 by time t grows as the square root of t, and 2) the probability of eventually returning to 0 is 1.

Uploaded by

Omonda Nii

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

253 views130 pages

Random Walks and Markov Chains Overview

Uploaded by

Omonda Nii

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

RANDOM WALKS

ARIEL YADIN

Course: 201.1.8031 Spring 2016

Lecture notes updated: May 2, 2016

Contents

Lecture 1. Introduction 3

Lecture 2. Markov Chains 8

Lecture 3. Recurrence and Transience 18

Lecture 4. Stationary Distributions 26

Lecture 5. Positive Recurrent Chains 33

Lecture 6. Convergence to Equilibrium 37

Lecture 7. Conditional Expectation 42

Lecture 8. Martingales 50

Lecture 9. Reversible Chains 55

Lecture 10. Discrete Analysis 60

Lecture 11. Networks 67

Lecture 12. Network Reduction 73

Lecture 13. Thompsons Principle 80

Lecture 14. Nash-Williams 84

Lecture 15. Flows 89

Lecture 16. Resistance in Euclidean Lattices 93

Lecture 17. Spectral Analysis 98

1
2

Lecture 18. Kestens Amenability Criterion 103

Lecture 19. 107

Lecture 20. 112

Lecture 21. 118

Lecture 22. 126

Number of exercises in lecture: 0

Total number of exercises until here: 0
3

Random Walks

Ariel Yadin

Lecture 1: Introduction

1.1. Overview

In this course we will study the behavior of random processes; that is, processes that evolve
in time with some randomness, or probability measure, governing the evolution.
Let us give some examples:

A gambler playing the roulette.

A drunk man walking in some city.
A drunk bird flying in the sky.
The evolution of a certain family name.

Some questions which we will be able to (hopefully) answer by the end of the course:

Suppose a gambler starts with N Shekel. What is the probability that the gambler will
earn another N Shekel before losing all of the money?
How long will it take for a drunk man walking to reach either his house or the city limits?
Suppose a chess knight moves randomly on a chess board. Will the knight eventually
return to the starting point? What is the expected number of steps until the knight
returns?
Suppose that men of the Rothschild family have three children on average. What is the
probability that the Rothschild name will still be alive in another 100 years? Is there
positive probability for the Rothschild name to survive forever?

1.2. Random Walks on Z

We will start with some soft example, and then go into the more deep and precise theory.
What is a random walk? A (simple) random walk on a graph is a process, or a sequence of
vertices, such that at every step the next vertex is chosen uniformly among the neighbors of the
current vertex, each step of the walk independently.

X Story about Polya meeting a couple in the woods.

George Polya (1887-1985)

Figure 1. Path of a drunk man walking in the streets.

Figure 2. Path of a drunk bird flying around.

Now, suppose we want to perform a random walk on Z. If the walker is at a vertex z, then
a uniformly chosen neighbor is choosing z + 1 or z 1 with probability 1/2 each.
5

That is, we can model a random walk on Z by considering an i.i.d. sequence (Xk )
k=1 , where
Pt
Xk is uniform on {1, 1}, and the walk will be St = k=1 Xk . So Xk is the k-th step of the
walk, and St is the position after t steps.
Let us consider a few properties of the random walk on Z:
First let us calculate the expected number of visits to 0 by time t:

Proposition 1.1. Let (St )t be a random walk on Z. Denote by Vt the number of visits to 0
up to time t; that is,

Vt = # {1 k t : Sk = 0} .

Then, there exists a constant c > 0 such that for all t,

E[Vt ] > c t.

Proof. An inequality we will use is Stirlings approximation of n!:

1 1
2n(n/e)n e 12n+1 < n! < 2n(n/e)n e 12n .

This leads by a bit of careful computation to:

1 2n 1 2n 1 2n 1
2 exp < < 2 exp .
n 12n + 1 n n 12n
Specifically,

2n
1
2 < n 22n < 2.
n

Now, what is the probability P[Sk = 0]? Note that there are k steps, so for Sk = 0 we need James Stirling (1692-1770)

that the number of +s equals the number of s. Rigorously, if

Rt = # {1 k t : Xk = 1} and Lt = # {1 k t : Xk = 1} ,

then Rt + Lt = t. Moreover, the distribution of Rt is Bin(t, 1/2). Also, St = Rt Lt , so for

St = 0 we need that Rt = Lt = t/2. This is only possible for even t, and we get

2k 2k
P[S2k = 0] = P[R2k = k] = 2 and P[S2k+1 = 0] = 0.
k
Pt
Now, note that Vt = k=1 1{Sk =0} . So

t bt/2c
X X 2k 2k
E[Vt ] = P[Sk = 0] = 2
k
k=1 k=1
6

Since
m m+1
1 1
X Z
dx = 2 1/2 ( m + 1 1),
k 1 x
k=1

we get that E[Vt ] c t for some c > 0. t
u

Let us now consider the probability that the random walker will return to the origin.

Proposition 1.2. P[ t 1 : St = 0] = 1.

Proof. Let p = P[ t 1 : St = 0]. Assume for a contradiction that p < 1. (p > 0 since
Pk
p > P[S2 = 0] = 21 .) Suppose that St = 0 for some t > 0. Then, since St+k = St + j=1 Xt+j ,

(St+k )k has the same distribution as a random walk on Z, and is independent of St .

So P[ k 1 : St+k = 0 | St = 0] = p. Thus, every time we are at 0 there is probability

0 < 1 p < 1 to never return.
Now we consider the different excursions. That is, let T0 = 0 and define inductively

Tk = inf {t Tk1 + 1 : St = 0} ,

where inf = . Now let K be the first k such that Tk = . The analysis above gives that for
k 1,

P[K = k] = P[T1 < , . . . , Tk1 < , Tk = ] = P[T1 T0 < , . . . , Tk1 Tk2 < , Tk Tk1 = ].

The main observation now is that the different Tk Tk1 are independent, so P[K = k] =
1
pk1 (1 p). That is, K Geo(1 p). Thus, E[K] = 1p . But note that K is exactly the
number of visits to 0 in the infinite time walk. That is, Vt % K. However, in the previous

proposition we have shown that E[Vt ] c t a contradiction!
So it must be that p = 1. t
u

It is not a coincidence that the expected number of visits to 0 is infinite, and that the
probability to return to 0 is 1. This will also be the case in 2-dimensions, but not in 3-dimensions.
In the upcoming classes we will rigorously prove the following theorem by Polya.

Theorem 1.3. Fix d 1. Let (Xk )k be i.i.d. d-dimension random variables uniformly dis-
Pt
tributed on {e1 , . . . , ed } (where {e1 , . . . , ed } is the standard basis for Rd ). Let St = k=1 Xk .
Let p(d) = P[ t 1 : St = 0]. Then, p(d) = 1 for d 2 and p(d) < 1 for d 3.
7

Remark 1.4. The proof for d 3 is mainly that P[St = 0] Ctd/2 . Thus, for d 3,

X
P[St = 0] < .
t=1

So by the Borel-Cantelli Lemma P[St = 0 i.o. ] = 0. In other words,

P[ T : t > T St 6= 0] = P[lim inf {St 6= 0}] = 1.

Thus, a.s. the number of visits to 0 is finite. If the probability to return to 0 was 1, then the
number of visits to 0 must be infinite a.s. All this will be done rigorously in the upcoming
classes.

Number of exercises in lecture: 0

Total number of exercises until here: 0
8

Random Walks

Ariel Yadin

Lecture 2: Markov Chains

2.1. Preliminaries

2.1.1. Graphs. We will make use of the structure known as a graph:

S

X Notation: For a set S we use k to denote the set of all subsets of size k in S; e.g.

S
= {{x, y} : x, y S, x 6= y} .
2

Definition 2.1. A graph G is a pair G = (V (G), E(G)), where V (G) is a countable set, and
E(G) V (G)

2 .
The elements of V (G) are called vertices. The elements of E(G) are called edges. The
G
notation x y (sometimes just x y when G is clear from the context) is used for {x, y} E(G).
If x y, we say that x is a neighbor of y, or that x is adjacent to y. If x e E(G) then
the edge e is said to be incident to x, and x is incident to e.
The degree of a vertex x, denoted deg(x) = degG (x) is the number of edges incident to x in
G.

X Notation: Many times we will use x G instead of x V (G).

Example 2.2. The complete graph.

Empty graph on n vertices.
Cycles.
Z, Z2 , Zd .
Regular trees.
Cayley graphs of finitely generated groups: Let G =< S > be a finitely generated group,
with a finite generating set S such that S is symmetric (S = S 1 ). Then, we can equip G
with a graph structure C = CG,S by letting V (C) = G and {g, h} E(C) iff g 1 h S.
S being symmetric implies that this is a graph.
9

CG,S is called the Cayley graph of G with respect to S.

Examples: Zd , regular trees, cycles, complete graphs.
454

Definition 2.3. Let G be a graph. A path in G is a sequence = (0 , 1 , . . . , n ) (with the

possibility of n = ) such that for all j, j j+1 . 0 is the start vertex and n is the end
vertex (when n < ).
The length of is || = n.
If is a path in G such that starts at x and ends at y we write : x y.

The notion of a path on a graph gives rise to two important notions: connectivity and graph
distance.

Definition 2.4. Let G be a graph. For two vertices x, y G define

dist(x, y) = distG (x, y) := inf {|| : : x y} ,

where inf = .

Exercise 2.1. Show that distG defines a metric on G.

(Recall that a metric is a function that satisfies:

(x, y) 0 and (x, y) = 0 iff x = y.

(x, y) = (y, x).
(x, y) (x, z) + (z, y). )

Definition 2.5. Let G be a graph. We say that vertices x and y are connected if there exists
a path : x y of finite length. That is, if distG (x, y) < . We denote x connected to y by
x y.
The relation is an equivalence relation, so we can speak of equivalence classes. The equiv-
alence class of a vertex x under this relation is called the connected component of x.
If a graph G has only one connected component it is called connected. That is, G is connected
if for every x, y G we have that x y.
10

Exercise 2.2. Prove that is an equivalence relation in any graph.

X In this course we will focus on connected graphs.

X Notation: For a path in a graph G, or more generally, a sequence of elements from a set

S, we use the following time notation: If s = (s0 , s1 , . . . , sn , . . .) is a sequence in S (finite of

infinite), then s[t1 , t2 ] = (st1 , st1 +1 , . . . , st2 ) for all integers t2 t1 0.

2.1.2. S-valued random variables. Given a countable set S, we can define a discrete topology
on S. Thus, the Borel -algebra on S is just the complete -algebra 2S . This gives rise to
the notion of S-valued random variables, which is just a fancy name for functions X from a
probability space into S such that for every s S the pull-back X 1 (s) is an event.
That is,

Definition 2.6. Let (, F, P) be a probability space, and let S be a countable set. A S-valued
random variable is a function X : S such that for any s S, X 1 (s) F.

2.1.3. Sequences - infinite dimensional vectors. At some point, we will want to consider
sequences of random variables. If X = (Xn )n is a sequence of S-valued random variables, we
can think of X as an infinite dimensional vector.
What is the appropriate measurable space for such vectors?
Well, we can consider = S N , the space of all sequences in S. Next, we have a -system
of cylinder sets: Given a finite sequence s0 , s1 , . . . , sm in S, the cylinder induced by these is

C = C(s1 , . . . , sm ) = S N : 0 = s0 , . . . , m = sm . The collection of all cylinder sets
forms a -system. We let F be the -algebra generated by this -system.

2.1.4. Caratheodory and Kolmogorov extension. Now suppose we have a probability mea-
sure P on (, F) as above. For every n, we can consider the restriction of P to the first n
Constantin Caratheodory coordinates; that is, we can consider n = S n and the full -algebra on n , and then
(1873-1950)

Pn [{s0 , s1 , . . . , sn1 }] := P[C(s0 , s1 , . . . , sn1 )]

defines a probability measure on n . Note that these measures are consistent, in the sense that
for any n > m,

Pm [{s0 , . . . , sm }] = Pn [{ S n : 0 = s0 , . . . , m = sm }].

Theorems by Caratheodory and Kolmogorov tell us that if we started with a consistent family
of probability measure on Sn , n = 1, 2, . . ., we could find a unique extension of these whose
restriction would give these measures.
In other words, the finite-dimensional marginals determine the probability measure of the Andrey Kolmogorov

sequence. (1903-1987)

2.1.5. Matrices. Recall that if A, B are n n matrix and v is an n-dimensional vector, then
Av, vA are vectors defined by
n
X n
X
(Av)k = Ak,j vj and (vA)k = vj Aj,k .
j=1 j=1

Also, AB is the matrix defined by

n
X
(AB)m,k = Am,j Bj,k .
j=1

These definitions can be generalized to infinite dimensions.

Also, we will view vectors also as functions, and matrices as operators. For example, if
C0 (N) = RN = {f : N R}. Then, any infinite matrix A is an operator on C0 (N) by defining
X X
(Af )(k) := A(k, n)f (n) and (f A)(k) := f (n)A(n, k).
n n

2.2. Markov Chains

A stochastic process is just a sequence of random variables. If (Xn )n is a stochastic

process, or a sequences of random variables, then we can think of the sequence (Xn )n as a
infinite dimensional random variable; consider the function f : N R defined by f (n) = Xn .
This is a different function for each . We can view this as a random function.
Up till now we have not restricted our processes - so anything can be a stochastic process.
However, in the discussion regarding random walks, we wanted the current step to be dependent
only on the position, regardless of the history and time. This gives rise to the following definition:

Definition 2.7. Let S be a countable set. A Markov chain on S is a sequence (Xn )n0 of
S-valued random variables (i.e. measurable functions Xn : S), that satisfies the following
Markovian property:
12

For any n 0, and any s0 , s1 , . . . , sn , sn+1 S,

P[Xn+1 = sn+1 |X0 = s0 , . . . , Xn = sn ] = P[Xn+1 = sn+1 |Xn = sn ] = P[X1 = sn+1 |X0 = sn ].

That is, the probability to go from s to s0 does not depend on n or on the history, but only
on the current position being at s and on s0 . This property is known as the Markov property.
Andrey Markov (1871-1897)
X A set S as above is called the state space.

Remark 2.8. Any Markov chain is characterized by its transition matrix.

Let (Xn )n be a Markov chain on S. For x, y S define P (x, y) = P[Xn+1 = y|Xn = x]
(which is independent of n). Then, P is a |S| |S| matrix indexed by the elements of S. One
immediately notices that for all x,
X
P (x, y) = 1,
yS

and that all the entries of P are in [0, 1]. Such a matrix is called stochastic. [ Each row of the
matrix is a probability measure on S. ]
On the other hand, suppose that P is a stochastic matrix indexed by a countable set S. Then,
one can define the sequence of S-valued random variables as follows. Let X0 = x for some fixed
starting point x X. For all n 0, conditioned on X0 = s0 , . . . , Xn = sn , define Xn+1 as
the random variable with distribution P[Xn+1 = y|Xn = sn , . . . , X0 = s0 ] = P (sn , y). One can
verify that this defines a Markov chain.
We will identify a stochastic matrix P with the Markov chain it defines.

X Notation: We say that (Xt )t is Markov-(, P ) if (Xt )t is a Markov chain with transition

matrix P and starting distribution X0 . If we wish to stress the state space, we say that
(Xt )t is Markov-(, P, S). Sometimes we omit the starting distribution; i.e. (Xt )t is Markov-P
means that (Xt )t is a Markov chain with transition matrix P .

Example 2.9. Consider the following state space and matrix: S = Z. P (x, y) = 0 if |x y| =
6 1
and P (x, y) = 1/2 if |x y| = 1.
What if we change this to P (x, y) = 1/4 for |x y| = 1 and P (x, x) = 1/2?
What about P (x, x + 1) = 3/4 and P (x, x 1) = 1/4? 454

Example 2.10. Consider the set Zn := Z/nZ = {0, 1, . . . , n 1}. Let P (x, y) = 1/2 for
x y {1, 1} (mod n). 454
13

1
Example 2.11. Let G be a graph. For x, y G define P (x, y) = deg(x) if x y and P (x, y) = 0
if x 6 y.
This Markov chain is called the simple random walk on G.
1
If we take 0 < < 1 and set Q(x, x) = and Q(x, y) = (1 ) deg(x) for x y, and
Q(x, y) = 0 for x 6 y, then Q is also a stochastic matrix, and defines what is sometimes called
the lazy random walk on G (with holding probability ). Note that Q = I + (1 )P . 4 5 4

X Notation: We will usually use (Xn )n to denote the realization of Markov chains. We will

also use Px to denote the probability measure Px = P[|X0 = x]. Note that the Markov property
is just the statement that

P[Xn = x|X0 = s0 , . . . , Xn = sn ] = P[Xn+1 = x|Xn = sn ] = Psn [X1 = x].

More generally, if is a probability measure on S, we write

X
P = P[|X0 ] = (s) Ps .
s

Exercise 2.3. Let (Xn )n be a Markov chain on state space S, with transition matrix
P . Show that for any event A (X0 , . . . , Xk )

P [Xn+k = y|A, Xk = x] = P n (x, y)

(provided P [A, Xk = x] > 0).

Remark 2.12. For those uncomfortable with -algebras,

Example 2.13. Consider a bored programmer. She has a (possibly biased) coin, and two chairs,
say a and b. Every minute, out of boredom, she tosses the coin. If it comes out heads, she moves
to the other chair. Otherwise, she does nothing.
This can be modeled by a Markov chain on the state space {a, b}. At each time, with some
probability 1 p the programmer does not move, and withprobability p she jumps to the other
1p p
state. The corresponding transition matrix would be P = .
p 1p
What is the probability Pa [Xn = b] =? For this we need to calculate P n .
A complicated way would be to analyze the eigenvalues of P ...
14

An easier way: Let n = P n (a, ). So n+1 = n P . Consider the vector = (1/2, 1/2).
Then P = P . Now, consider an = (n )(a). Since n is a probability measure, we get that
n (b) = 1 n (a), so

an = (n1 )P (a) = (1 p)n1 (a) + pn1 (b) 1/2

= (1 2p)(n1 )(a) + p (a) + (1 2p)(a) = (1 2p)an1 .

1 1+(12p)n
So an = (1 2p)n a0 = (1 2p)n 2 and P n (a, a) = n (a) = 2 . (This also implies that
n n 1(12p)n
P (a, b) = 1 P (a, a) = 2 .)
We see that
n
1 1
P 12 = .
1 1
454

The following proposition relates starting distributions, and steps of the Markov chain, to
matrix and vector multiplication.

Proposition 2.14. Let (Xn )n be a Markov chain with transition matrix P on some state
P
space S. Let be some distribution on S; i.e. is an S-indexed vector with s (s) = 1. Then,
P [Xn = y] = (P n )(y). Specifically, taking = x we get that Px [Xn = y] = P n (x, y).
Moreover, if f : S R is any function, which can be viewed as a S-indexed vector, then

P n f = E [f (Xn )] and (P n f )(x) = Ex [f (Xn )].

Proof. This is shown by induction: It is the definition for n = 0 (P 0 = I the identity matrix).
The Markov property gives for n > 0, using induction,
X
P [Xn = y] = P [Xn = y|Xn1 = s] P [Xn1 = s]
sS
X
= P (s, y)(P n1 )(s) = ((P n1 )P )(y) = (P n )(y).
s

The second assertion also follows by conditional expectation,

X X X
E [f (Xn )] = (s) E[f (Xn )|X0 = s] = (s) P[Xn = x|X0 = s]f (x)
s s x
X
= (s)P n (s, x)f (x) = P n f.
s,x

n
(P f )(x) = Ex [f (Xn )] is just for = x . t
u
15

2.3. Classification of Markov chains

When we spoke about graphs, we have the notion of connectivity. We are now interested to
generalize this notion to Markov chains. We want to say that a state x is connected to a state y
if there is a way to get from x to y; note that for general Markov chains this does not necessarily
imply that one can get from y to x.

Definition 2.15. Let P be the transition matrix of a Markov chain on S. P is called

irreducible if for every pair of states x, y S there exists t > 0 such that P t (x, y) > 0.

This means that for every pair, there is a large enough time such that with positive probability
the chain can go from one of the pair to the other in that time.

Example 2.16. Consider the cycle Z/nZ, for n even. This is an irreducible chain since for any
x, y, we have for t = dist(x, y), if is a path of length t from x to y,

P t (x, y) Px [(X0 , . . . , Xt ) = ] = 2t > 0.

Note that at each step, the Markov chain moves from the current position +1 or 1 (mod n).
Thus, since n is even, at even times the chain must be at even vertices, and at odd times the
chain must be at odd vertices.
Thus, it is not true that there exists t > 0 such that for all x, y, P t (x, y) > 0.
The main reason for this is that the chain has a period: at even times it is on some set, and
at odd times on a different set. Similarly, the chain cannot be back at its starting point at odd
times, only at even times. 454

Definition 2.17. Let P be a Markov chain on S.

A state x is called periodic if gcd {t 1 : P t (x, x) > 0} > 1, and this gcd is called the
period of x.
If gcd {t 1 : P t (x, x) > 0} = 1 the x is called aperiodic.
P is called aperiodic if all x S are aperiodic. Otherwise P is called periodic.

X Note that in the even-length cycle example, gcd {t 1 : P t (x, x) > 0} = gcd {2, 4, 6, . . .} =

Remark 2.18. If P is periodic, then there is an easy way to fix P to become aperiodic: namely,
let Q = I +(1)P be a lazy version of P . Then, Q(x, x) for all x, and thus Q is aperiodic.
16

Proposition 2.19. Let P be a Markov chain on state space S.

x is aperiodic if and only if there exists t(x) such that for all t > t(x), P t (x, x) > 0.
If P is irreducible, then P is aperiodic if and only if there exists an aperiodic state x.
Consequently, if P is irreducible and aperiodic, and if S is finite, then there exists t0
such that for all t > t0 all x, y admit P t (x, y) > 0.

Proof. We start with the first assertion. Assume that x is aperiodic. Let R = {t 1 : P t (x, x) > 0}.
Since P t+s (x, x) P t (x, x)P s (x, x) we get that t, s R implies t + s R; i.e. R is closed under
addition. A number theoretic result tells us that since gcd R = 1 it must be that Rc is finite.
The other direction is simpler. If Rc is finite, then R contains primes p 6= q, so gcd R =
gcd(p, q) = 1.
For the second assertion, if P is irreducible and x is aperiodic, then let t(x) be such that for
all t > t(x), P t (x, x) > 0. For any z, y let t(z, y) be such that P t(z,y) (z, y) > 0 (which exists by
irreducibility). Then, for any t > t(y, x) + t(x) + t(x, y) we get that

P t (y, y) P t(y,x) (y, x)P tt(y,x)t(x,y) (x, x)P t(x,y) (x, y) > 0.

So for all large enough t, P t (y, y) > 0, which implies that y is aperiodic. This holds for all y, so
P is aperiodic.
The other direction is trivial from the definition.
For the third assertion, for any z, y let t(z, y) be such that P t(z,y) (z, y) > 0. Let T =
maxz,y {t(z, y)}. Let x be an aperiodic state and let t(x) be such that for all t > t(x), P t (x, x) >
0. We get that for any t > 2T + t(x) we have that t t(z, x) t(x, z) t 2T > t(x), so

P t (z, y) P t(z,x) (z, x)P tt(z,x)t(x,z) (x, x)P t(x,z) (x, z) > 0.

t
u

Exercise 2.4. Let G be a finite connected graph, and let Q be the lazy random walk
1
on G with holding probability ; i.e. Q = I + (1 )P where P (x, y) = deg(x) if x y and
P (x, y) = 0 if x 6 y.
Show that Q is aperiodic. Show that for diam(G) = max {dist(x, y) : x, y G} we have that
for all t > diam(G), all x, y G admit Qt (x, y) > 0.
17

Number of exercises in lecture: 4

Total number of exercises until here: 4
18

Random Walks

Ariel Yadin

Lecture 3: Recurrence and Transience

3.1. Recurrence and Transience

X Notation: If (Xt )t is Markov-P on state space S, we can define the following: For A S,

TA = inf {t 0 : Xt A} and TA+ = inf {t 1 : Xt A} .

These are the hitting time of A and return time to A. (We use the convention that inf = .)
+
If A = {x} we write Tx = T{x} and similarly Tx+ = T{x} .
Recall that we saw that the simple random walk on Z a.s. returns to the origin. We also
stated that on Z3 this is not true, and the simple random walk will never return to the origin
with positive probability.
Let us classify Markov chain according to these properties.

Definition 3.1. Let P be a Markov chain on S. Consider a state x S.

If Px [Tx+ = ] > 0, we say that x is a transient state.

If Px [Tx+ < ] = 1, we say that x is recurrent .
For a recurrent state x, there are two options:
If Ex [Tx+ ] < we say that x is positive recurrent.
If Ex [Tx+ ] = we say that x is null recurrent.

Our first goal will be to prove the following theorem.

Theorem 3.2. Let (Xt )t be a Markov chain on S with transition matrix P . If P is irre-
ducible, then for any x, y S, x is (positive, null) recurrent if and only if y is (positive, null)
recurrent.

That is, for irreducible chains, all the states have the same classification.
19

3.2. Stopping Times

A word about -algebras:

Recall that the canonical -algebra we take on the space S N is the -algebra generated by

the cylinder sets. A cylinder set is a set of the form S N : 0 = x0 , . . . , t = xt for some
t 0. A S N is called a t-cylinder set if there exist x0 , . . . , xt S such that for every A
we have j = xj for all j = 0, . . . , t.
Recall the -algebra

(X0 , . . . , Xt ) = Xj1 (x) : x S , j = 0, . . . , t = A : A is a j-cylinder set for some j t .

Exercise 3.1. Define an equivalence relation on S N by t 0 if j = j0 for all

j = 0, 1, . . . , t.
Show that this is indeed an equivalence relation.
We say that en event A respects t if for any equivalent t 0 we have that A if and
only if 0 A.
Show that (X0 , X1 , . . . , Xt ) = {A : A respects t }.

The hitting and return times above have the property, that their value can be determined by
the history of the chain; that is the event {TA t} is determined by (X0 , X1 , . . . , Xt ).

Definition 3.3 (Stopping Time). Consider a Markov chain on S. Recall that the probability
space is (S N , F, P) where F is the -algebra generated by the cylinder sets.
A random variable T : S N N {} is called a stopping time if for all t 0, the event
{T t} (X0 , . . . , Xt ).

Example 3.4. Any hitting time and return time is a stopping time. Indeed,
t
[
{TA t} = {Xj A} .
j=0

Similarly for TA+ . 454

Example 3.5. Consider the simple random walk on Z3 . Let T = sup {t : Xt = 0}. This is the
last time the walk is at 0. One can show that T is a.s. finite. However, T is not a stopping time,
20

since for example

\
{T = 0} = { t > 0 Xt 6= 0} = {Xt 6= 0} 6 (X0 ).
t=1

454

Example 3.6. Let (Xt )t be a Markov chain and let T = inf {t TA : Xt A0 }, where A, A0
S. Then T is a stopping time, since
t [
[ k
{T t} = {Xm A, Xk A0 } .
k=0 m=0

454

Proposition 3.7. Let T, T 0 be stopping times. The following holds:

Any constant t N is a stopping time.

T T 0 and T T 0 are stopping times.
T + T 0 is a stopping time.

Proof. Since {t k} {, }, the trivial -algebra, we get that {t k} (X0 , . . . , Xk ) for

any k. So constants are stopping times.
For the minimum:
[
{T T 0 t} = {T t} {T 0 t} (X0 , . . . , Xt ).

The maximum is similar:

\
{T T 0 t} = {T t} {T 0 t} (X0 , . . . , Xt ).

For the addition,

t
[
{T + T 0 t} = {T = k, T 0 t k} .
k=0

Since {T = k} = {T k} \ {T k 1} (X0 , . . . , Xk ), we get that T + T 0 is a stopping

time. t
u

3.2.1. Conditioning on a stopping time. Stopping times are extremely important in the
theory of martingales, a subject we will come back to in the future.
For the moment, the important property we want is the Strong Markov Property.
For a fixed time t, we saw that the process (Xt+n )n is a Markov chain with starting distribution
Xt , independent of (X0 , . . . , Xt ). We want to do the same thing for stopping times.
21

Let T be a stopping time. The information captured by X0 , . . . , XT , is the -algebra (X0 , . . . , XT ).

This is defined to be the collection of all events A such that for all t, A{T t} (X0 , . . . , Xt ).
That is,
(X0 , . . . , XT ) = {A : A {T t} (X0 , . . . , Xt ) for all t} .

One can check that this is indeed a -algebra.

Exercise 3.2. Show that (X0 , . . . , XT ) is a -algebra.

Important examples are:

For any t, {T t} (X0 , . . . , XT ).

Thus, T is measurable with respect to (X0 , . . . , XT ).
XT is measurable with respect to (X0 , . . . , XT ) (indeed {XT = x, T t} (X0 , . . . , Xt )
for all t and x).

Proposition 3.8 (Strong Markov Property). Let (Xt )t be a Markov-P on S, and let T be
a stopping time. For all t 0, define Yt = XT +t . Then, conditioned on T < and XT , the
sequence (Yt )t is independent of (X0 , . . . , XT ) and is Markov-(XT , P ).

Proof. The (regular) Markov property tells us that for any m > k, and any event A (X0 , . . . , Xk ),

P[Xm = y, A, Xk = x] = P mk (x, y) P[A, Xk = x].

We need to show that for all t, and any A (X0 , . . . , XT ),

P[XT +t+1 = y|XT +t = x, 1A , T < ] = P (x, y)

(provided of course that P[XT +t = x, A, T < ] > 0). Indeed this follows from the fact that
A {T = k} (X0 , . . . , Xk ) (X0 , . . . , Xk+t ) for all k, so

X
P[XT +t+1 = y,A, XT +t = x, T < ] = P[Xk+t+1 = y, Xk+t = x, A, T = k]
k=0

X
= P (x, y) P[Xk+t = x, A, T = k] = P (x, y) P[XT +t = x, A, T < ].
k=0

t
u

Another way to state the above proposition is that for a stopping time T , conditional on
T < we can restart the Markov chain from XT .
22

3.3. Excursion Decomposition

We now use the strong Markov property to prove the following:

Example 3.9. Let P be an irreducible Markov chain on S. Fix x S.

(0)
Define inductively the following stopping times: Tx = 0, and
n o
Tx(k) = inf t Tx(k1) + 1 : Xt = x .
(k)
So Tx is the time of the k-th return to x.
Pt
Let Vt (x) be the number of visits to x up to time t; i.e. Vt (x) = k=1 1{Xk =x} .
(k)
It is immediate that Vt (x) k if and only if Tx t.
Now let us look at the excursions to x: The k-th excursion is

X[Tx(k1) , Tx(k) ] = (XT (k1) , XT (k1) +1 , . . . , XT (k) ).

x x x

These excursions are paths of the Markov chain ending at x and starting at x (except, possibly,
the first excursion which starts at X0 ).
For k > 0 define
x(k) = Tx(k) Tx(k1) ,
(k) (k)
if Tx < and 0 otherwise. For Tx < , this is the length of the k-th excursion.
(k1) (k1) (k)
We claim that conditioned on Tx < , the excursion X[Tx , Tx ], is independent
of (X0 , . . . , XT (k1) ), and has the distribution of the first excursion X[0, Tx+ ] conditioned on
x

X0 = x.
Indeed, let Yt = XT (k1) +t . For any A (X0 , . . . , XT (k1) ), and for any path : x x,
x x

since XT (k1) = x,
x

P[Y [0, x(k) ] = |A, Tx(k1) ] = P[X[Tx(k1) , Tx(k) ] = |A, Tx(k1) < ] = Px [X[0, Tx+ ] = ],

where we have used the strong Markov property. 454

This gives rise to the following relation:

Lemma 3.10. Let P be an irreducible Markov chain on S. Then,

(Px [Tx+ < ])k = Px [V (x) k] = Px [Tx(k) < ].

Consequently,
1
1 + Ex [V (x)] = ,
Px [Tx+ = ]
where 1/0 = .
23

Proof. The event {V (x) k} is the event that x is visited at least k times, which is exactly
the event that the k-th excursion ends at some finite time. From the example above we have
that for any m,

P[Tx(m) < |Tx(m1) < ] = P[t 1 : XT (m1) +t = x|Tx(m1) < ] = Px [Tx+ < ].
x

n o n o
(m) (m) (m1)
Since Tx < = Tx < , Tx < , we can inductively conclude that

Px [Tx(k) < ] = Px [Tx(k) < |Tx(k1) < ] P[Tx(k1) < ]

= = (Px [Tx+ < ])k

The second assertion follows from the fact that

X 1
1 + Ex [V (x)] = Px [V (x) k] = ,
k=0
1 Px [Tx+ < ]

where this holds even if Px [Tx+ < ] = 1. t

Similarly, one can prove:

Exercise 3.3. Let (Xt )t be Markov-(S, P ) for some irreducible P . Let Z S. Show
that under Px , the number of visits to x until hitting Z (i.e. the random variable V = VTZ (x) +
1{X0 =x} ) is distributed geometric-p, for p = Px [TZ < Tx+ ].

We now get the following important characterization of recurrence in Markov chains:

Corollary 3.11. Let P be an irreducible Markov chain on S. Then the following are equivalent:

(1) x is recurrent.
(2) Px [V (x) = ] = 1.
(3) For any state y, Px [Ty+ < ] = 1.
(4) Ex [V (x)] = .

Proof. If x is recurrent, then Px [Tx+ < ] = 1. So for any k, Px [V (x) k] = 1. Taking k to

infinity, we get that Px [V (x) = ] = 1. This is the first implication.
For the second implication: Let y S.
(k1) (k) (k)
Let Ek = X[Tx , Tx ] be the k-th excursion from x. We assumed that Px [ k Tx < ] =
1. So under Px , all (Ek ) are independent and identically distributed.
24

Since P is irreducible, there exists t > 0 such that Px [Xt = y , t < Tx+ ] > 0 (this is an
exercise). Thus, we have that p := Px [Ty < Tx+ ] Px [Xt = y , t < Tx+ ] > 0. This implies by
the strong Markov property that

Px [Ty < Tx(k+1) | Ty > Tx(k) , Tx(k) < ] p > 0.

(k)
So, using the fact that Px [ k Tx < ] = 1,

Px [Ty Tx(k) ] = Px [Ty Tx(k) | Ty > Tx(k1) , Tx(k1) < ] Px [Ty > Tx(k1) ]

(1 p) Px [Ty Tx(k1) ] (1 p)k .

Thus,

Px [Ty+ = ] Px [ k , Ty Tx(k1) ] = lim (1 p)k = 0.

This proves the second implication.

Finally, if for any y we have Px [Ty+ < ] = 1, then taking y = x shows that x is recurrent.
This shows that (1),(2),(3) are equivalent.
1
It is obvious that (2) implies (4). Since Px [Tx+ = ] = Ex [V (x)]+1 , we get that (4) implies
(1). t
u

Exercise 3.4. Show that if P is irreducible, there exists t > 0 such that Px [Xt = y , t <
Tx+ ] > 0.

Solution to ex:3.4. :(
There exists n such that P n (x, y) > 0 (because P is irreducible). Thus, there is a sequence
x = x0 , x1 , . . . , xn = y such that P (xj , xj+1 ) > 0 for all 0 j < n. Let m = max{0 j <
n : xj = x}, and let t = n m and yj := xm+j for 0 j t. Then, we have the sequence
x = y0 , . . . , yt = y so that yj 6= x for all 0 < j t, and we know that P (yj , yj+1 ) > 0 for all
0 j < t. Thus,

Px [Xt = y , t < Tx+ ] Px [ 0 j t , Xj = yj ] = P (y0 , y1 ) P (yt1 , yt ) > 0.

:) X
25

Example 3.12. A gambler plays a fair game. Each round she wins a dollar with probability
1/2, and loses a dollar with probability 1/2, all rounds independent. What is the probability
that she never goes bankrupt, if she starts with N dollars?

We have already seen that this defines a simple random walk on Z, and that E0 [Vt (0)] c t.
Thus, taking t we get that E0 [V (0)] = , and so 0 is recurrent.
Note that 0 here was not special, since all vertices look the same. This symmetry implies that
Px [Tx+ < ] = 1 for all x Z. Thus, for any N , PN [T0+ = ] = 0. That is, no matter how
much money the gambler starts with, she will always go bankrupt eventually. 454

We now have part of Theorem 3.2.

Corollary 3.13. Let P be an irreducible Markov chain on S. Then, for any x, y S, x is

transient if and only if y is transient.

Proof. As usual, by irreducibility, for any pair of states z, w we can find t(z, w) > 0 such that
P t(z,w) (z, w) > 0.
Fix x, y S and suppose that x is transient. For any t > 0,

P t+t(x,y)+t(y,x) (x, x) P t(x,y) (x, y)P t (y, y)P t(y,x) (y, x).

Thus,

X
t 1 X
Ey [V (y)] = P (y, y) t(x,y) P t+t(x,y)+t(y,x) (x, x) < .
t=1
P (x, y)P t(y,x) (y, x) t=1
So y is transient as well. t
u

Number of exercises in lecture: 4

Total number of exercises until here: 8
26

Random Walks

Ariel Yadin

Lecture 4: Stationary Distributions

4.1. Stationary Distributions

Suppose that P is a Markov chain on state space S such that for some starting distribution
, we have that P [Xn = x] (x) where is some limiting distribution. One immediately
checks that in this case we must have
X
P (x) = lim P n (y, s)P (s, x) = lim P n+1 (y, x) = (x),
n n
s

or P = . (That is, is a left eigenvector for P with eigenvalue 1.)

Definition 4.1. Let P be a Markov chain. If is a distribution satisfying P = then is

called a stationary distribution.

1p p 1 1
Example 4.2. Recall the two-state chain P = . We saw that P 1 .
2
p 1p 1 1
Indeed, it is simple to check that = (1/2, 1/2) is a stationary distribution in this case. 4 5 4

Example 4.3. Consider a finite graph G. Let P be the transition matrix of a simple random
1
walk on G. So P (x, y) = deg(x) 1{xy} . Or: deg(x)P (x, y) = 1{xy} . Thus,
X
deg(x)P (x, y) = deg(y).
x

So deg is a left eigenvector for P with eigenvalue 1. Since

X XX X
deg(x) = 1{xy} = 2 = 2|E(G)|,
x x y eE(G)

deg(x)
we normalize (x) = 2|E(G)| to get a stationary distribution for P . 454

The above stationary distribution has a special property, known as the detailed balance equa-
tion:
A distribution is said to satisfy the detailed balance equation with respect to a transition
matrix P if for all states x, y
(x)P (x, y) = (y)P (y, x).
27

Exercise 4.1. If satisfies the detailed balance equations, then is a stationary

distribution.

We will come back to such distributions in the future.

4.2. Stationary Distributions and Hitting Times

There is a deep connection between stationary distributions and return times. The main
result here is:

Theorem 4.4. Let P be an irreducible Markov chain on state space S. Then, the following
are equivalent:

P has a stationary distribution .

Every x is positive recurrent.
Some x is positive recurrent.
1
P has a unique stationary distribution, (x) = Ex [Tx+ ]
.

The proof of this theorem goes through a few lemmas.

X In the next lemma we will consider a function (vector) v : S [0, ]. Although it

may take the value , since we are only dealing with non-negative numbers we can write
P
vP (x) = y v(y)P (y, x) without confusion (with the convention that 0 = 0).

Lemma 4.5. Let P be an irreducible Markov chain on state space S. Let v : S [0, ] be
such that vP = v. Then:

If there exists a state x such that v(x) < then v(y) < for all states y.
If v is not the zero vector, then v(y) > 0 for all states y.

X Note that this implies that if is a stationary distribution then all the entries of are

strictly positive.

Proof. For any t, using the fact that v 0,

X
v(x) = v(z)P t (z, x) v(y)P t (y, x).
z
28

Thus, for a suitable choice of t, since P is irreducible, we know that P t (y, x) > 0, and so
v(x)
v(y) P t (y,x) < .
For the second assertion, if v is not the zero vector, since it is non-negative, there exists a
state x such that v(x) > 0. Thus, for any state y and for t such that P t (x, y) > 0 we get
X
v(y) = v(z)P t (z, y) v(x)P t (x, y) > 0.
z

t
u

Pt
X Notation: Recall that for a Markov chain (Xt )t we denote by Vt (x) = k=1 1{Xk =x} the

number of visits to x.

Lemma 4.6. Let (Xt )t be Markov-(P, ) for irreducible P . Assume T is a stopping time such
that
P [XT = x] = (x) for all x.

Assume further that 1 T < P -a.s. Let v(x) = E [VT (x)].

v(x)
Then, vP = v. Moreover, if E [T ] < then P has a stationary distribution (x) = E [T ] .

Proof. The assumptions on T give that for any j,

X
(x) = P [XT = x] = P [Xj = x, T = j].
j=1

X
X
P [Xj = y, T > j] = P [X0 = y] + P [Xj = y, T > j]
j=0 j=1

X
= P [Xj = y, T = j] + P [Xj = y, T > j]
j=1

X
= P [Xj = y, T j] = v(y).
j=1

Thus we have that

X
X
v(x) = P [Xj = x, T j] = P [Xj+1 = x, T > j]
j=1 j=0
X
X
= P [Xj+1 = x, Xj = y, T > j]
j=0 y

XX
= P [Xj = y, T > j]P (y, x) = (vP )(x).
y j=0
29

That is, vP = v.
Since
X X
v(x) = E [ VT (x)] = E [T ],
x x

v(x)
if E [T ] < , then (x) = E [T ] defines a stationary distribution. t
u

Example 4.7. Consider (Xt )t that is Markov-P for an irreducible P , and let v(y) = Ex [VTx+ (y)].
If x is recurrent, then Px -a.s. we have 1 Tx+ < , and Px [XTx+ = y] = 1{y=x} = Px [X0 = y].
So we conclude that vP = v. Since Px -a.s. VTx+ (x) = 1, we have that 0 < v(x) = 1 < , so
0 < v(y) < for all y.
Note that although it may be that Ex [Tx+ ] = , i.e. x is null-recurrent, we still have that for
any y, Ex [VTx+ (y)] < , i.e. the expected number of visits to y until returning to x is finite.
Ex [VT + (y)]
If x is positive recurrent, then (y) = x
Ex [Tx+ ]
is a stationary distribution for P . 454

This vector plays a special role, as in the next Lemma.

Lemma 4.8. Let P be an irreducible Markov chain. Let u(y) = Ex [VTx+ (y)]. Let v 0 be a
non-negative vector such that vP = v, and v(x) = 1. Then, v u. Moreover, if x is recurrent,
then v = u.

Proof. If y = x then v(x) = 1 u(x), so we can assume that y 6= x.

We will prove by induction that for all t, for any y 6= x,

t
X
(4.1) Px [Xk = y, Tx+ k] v(y).
k=1

Indeed, for t = 1 this is just

X
Px [X1 = y, Tx+ 1] = P (x, y) v(z)P (z, y) = v(y),
z

since v 0, v(x) = 1 and y 6= x.

For general t > 0, we rely on the fact that by the Markov property, for any y 6= x,

X X
Px [Xk+1 = y, Tx+ k+1] = Px [Xk+1 = y, Xk = z, Tx+ k] = Px [Xk = z, Tx+ k]P (z, y).
z6=x z6=x
30

So by induction,
t+1
X t
X
Px [Xk = y, Tx+ k] = P (x, y) + Px [Xk+1 = y, Tx+ k + 1]
k=1 k=1

X t
X
= P (x, y) + P (z, y) Px [Xk = z, Tx+ k]
z6=x k=1
X X
P (x, y) + P (z, y)v(z) = v(z)P (z, y) = v(y).
z6=x z

This completes a proof of (4.1) by induction.

Now, one notes that the left-hand side of (4.1) is just the expected number of visits to y
started at x, up to time Tx+ t. Taking t , using monotone convergence,
t
X
v(y) Px [Xk = y, Tx+ k] = Ex [VTx+ t (y)] % u(y).
k=1

This proves that v u.

Since x is recurrent, we have uP = u, and u(x) = 1 = v(x). We have seen that v u 0,
and of course (v u)P = v u. Until now we have not actually used irreducibility; we will
use this to show that v u = 0. Indeed, let y be any state. If v(y) > u(y) then v u is a
non-zero non-negative left eigenvector for P , so must be positive everywhere. This contradicts
v(x) u(x) = 0. So it must be that v u 0. t
u

We are now in good shape to prove Theorem 4.4.

Proof of Theorem 4.4. Assume that is a stationary distribution for P . Fix any state x. Recall
(z)
that (x) > 0. Define the vector v(z) = (x) . We have that v 0, vP = v and v(x) = 1. Hence,
v(z) Ex [VTx+ (z)] for all z. That is,
X X X (y) 1
Ex [Tx+ ] = Ex [VTx+ (y)] v(y) = = < .
y y y
(x) (x)

So x is positive recurrent. This holds for a generic x.

The second bullet of course implies the third.
Now assume some state x is positive recurrent. Let v(y) = Ex [VTx+ (y)]. Since x is recurrent,
v
we know that vP = v, and y v(y) = Ex [Tx+ ] < . So = E [T
P
+
]
is a stationary distribution
x x

for P .
Since P has a stationary distribution, by the first implication all states are positive recurrent.

Thus, for any state z, if v = (z) then vP = v and v(z) = 1. So z being recurrent we get that
31

v(y) = Ez [VTz+ (y)] for all y. Specifically,

X 1
Ez [Tz+ ] = v(y) = ,
y
(z)

which holds for all states z.

For the final implication, if P has a specific stationary distribution, then of course it has a
stationary distribution. t
u

Corollary 4.9 (Stationary distributions are unique). If an irreducible Markov chain P has
two stationary distributions and 0 , then = 0 .

Exercise 4.2. Let P be an irreducible Markov chain. Show that for positive recurrent
states x, y,
Ex [VTx+ (y)] Ey [VTy+ (x)] = 1.

4.3. Transience, positive or null recurrence are properties of the chain

We also have now shown that

Theorem* (3.2). [restatement] Let P be an irreducible Markov chain. For any two states
x, y: x is transient / null recurrent / positive recurrent if and only if y is transient / null
recurrent / positive recurrent.

Proof. We have seen that

1
Px [Tx+ = ] =
1 + Ex [V (x)]
implies that x is transient if and only if y is transient.
Now, if x is positive recurrent, then P has a stationary distribution, so all states, including y
are positive recurrent. t
u

In light of this:

Definition 4.10. Let P be an irreducible Markov chain. We say that

P is transient, if there exists a transient state.

P is null recurrent if there exists a null recurrent state.
P is positive recurrent if there exists a positive recurrent state.
32

Number of exercises in lecture: 2

Total number of exercises until here: 10
33

Random Walks

Ariel Yadin

Lecture 5: Positive Recurrent Chains

5.1. Simple Random Walks

Last lecture we proved that an irreducible Markov chain P has a stationary distribution if and
only if P is positive recurrent, and the stationary distribution is the reciprocal of the expected
return time.
Lets investigate what this means in the setting of a simple random walk on a graph.

Example 5.1. Let G be a graph, and let P be the simple random walk on G; that is, P (x, y) =
1
deg(x) 1{xy} .

First, it is immediate that P is irreducible. This was shown in the exercises.

Consider the vector v(x) = deg(x). We have that
X X 1
deg(x)P (x, y) = deg(x) 1{xy} = deg(y).
x x
deg(x)

That is, vP = v.
If we take u(y) = v(y)/v(x) for some x, then uP = u and u(x) = 1. Thus, if P is recurrent,
deg(y)
then Ex [VTx+ (y)] = u(y) = for all x, y. This does not depend on dist(x, y)!
deg(x)
P
Another observation is that x v(x) = 2|E(G)|. That is, P is positive recurrent if and only
deg(x)
if G is finite. Moreover, in this case, the stationary distribution for P is (x) = 2|E(G)| .

Note that if G is a finite regular graph then the stationary distribution on G is the uniform
distribution. 454

Example 5.2. Recall the simple random walk on Z. We already have seen that this is a
recurrent Markov chain. Thus, if vP = v, then v(y) = Ex [VTx+ (y)]v(x) for all x, y. Since the
constant vector ~1 satisfies ~1P = ~1, we get that Ex [VTx+ (y)] = 1 for all x, y. Thus, any v such
that vP = v must admit v c.
So there is no stationary distribution on Z; that is, Z is null-recurrent. (We could have also
deduced this from the previous example.) 454
34

Example 5.3. Consider a different Markov chain on Z: Let P (x, x+1) = p and P (x, x1) = 1p
for all x.
1
Suppose vP = v. Then, v(x) = v(x 1)p + v(x + 1)(1 p), or v(x + 1) = 1p (v(x) pv(x 1))
h i
1
Solving such recursions is simple: Set ux = v(x + 1) v(x) . So ux+1 = 1p Aux where

1 p
A= .
1p 0

Since the characteristic polynomial of A is 2 + p(1 p) = ( p)( (1 p)), we get that

the eigenvalues of A are p and 1 p. One can easily check that A is diagonalizable, and so

x
p
v(x) = ux (2) = (1 p)x (Ax u0 )(2) = (1 p)x [0 1]M DM 1 u0 = a 1p + b,

where D is diagonal with p, 1 p on the diagonal, and a, b are constants that depend on the
matrix M and on u0 (but independent of x).
P
Thus, x v(x) will only converge for a = 0, b = 0 which gives v = 0. That is, there is no
stationary distribution, and P is not positive recurrent.
In the future we will in fact see that P is transient for p 6= 1/2, and for p = 1/2 we have
already seen that P is null-recurrent. 454

Example 5.4. A chess knight moves on a chess board, each step it chooses uniformly among
the possible moves. Suppose the knight starts at the corner. What is the expected time it takes
the knight to return to its starting point?
At first, this looks difficult...
However, let G be the graph whose vertices are the squares of the chess board, V (G) =
2
{1, 2, . . . , 8} . Let x = (1, 1) be the starting point of the knight. For edges, we will connect two
vertices if the knight can jump from one to the other in a legal move.
35

Thus, for example, a vertex in the center of the board has 8 adjacent vertices. A corner,
on the other hand has 2 adjacent vertices. In fact, we can determine the degree of all vertices.

2 3 4 4 4 4 3 2
3 4 6 6 6 6 4 3
o 4 6 8 8 8 8 6 4
4 6 8 8 8 8 6 4
legal moves:
4 6 8 8 8 8 6 4
4 6 8 8 8 8 6 4
3 4 6 6 6 6 4 3
2 3 4 4 4 4 3 2

Summing all the degrees, one sees that 2|E(G)| = 4 (4 8 + 4 6 + 5 4 + 2 3 + 2) = 4 84 = 336.

Thus, the stationary distribution is (i, j) = deg(i, j)/336. Specifically, (x) = 2/336 and so
Ex [Tx+ ] = 168. 454

5.2. Summary so far

Let us sum up what we know so far about irreducible chains. If P is an irreducible Markov
chain, then:
1
Ex [V (x)] + 1 = Px [Tx+ =]
.
For all states x, y, x is transient if and only if y is transient.
If P is recurrent, the vector v(z) = Ex [VTx+ (z)] is a positive left eigenvector for P , and
any non-negative left eigenvector for P is proportional to v.
P has a stationary distribution if and only if P is positive recurrent.
If P is positive recurrent, then (x) Ex [Tx+ ] = 1.

5.3. Positive Recurrent Chains

Recall that Lemma 4.6 connects the expected number of visits to x up to an appropriate
stopping time, to the stationary distribution and the expected value of the stopping time:

Lemma* (4.6). [restatement] Let (Xt )t be Markov-(P, ) for irreducible P . Assume T is a

stopping time such that
P [XT = x] = (x) for all x.

Assume further that 1 T < P -a.s. Let v(x) = E [VT (x)].

v(x)
Then, vP = v. Moreover, if E [T ] < then P has a stationary distribution (x) = E [T ] .
36

Good choices of the stopping time T for positive recurrent chains will give some nice identities.

Proposition 5.5. Let P be a positive recurrent chain with stationary distribution . Then,
1
Ex [Tx+ ] = (x) .
(y)
Ex [VTx+ (y)] = (x) .
For x 6= y,
1 + Ex [VTy+ (x)] = (x) (Ey [Tx+ ] + Ex [Ty+ ]).

For x 6= y,
(x) Px [Ty+ < Tx+ ] (Ey [Tx+ ] + Ex [Ty+ ]) = 1.

(This is sometimes called the edge commute inequality. It will be important in the
future.) For x y,
1
Ex [Ty ] + Ey [Tx ] .
(x)P (x, y)

Proof. We have:

Follows by choosing T = Tx+ in Lemma 4.6.

We have already seen this. It also follows by choosing T = Tx+ in Lemma 4.6.
Let T = inf {t Tx + 1 : Xt = y}. So Ey [T ] = Ey [Tx+ ] + Ex [Ty+ ]. Since Py [T = z] =
1{z=y} , we can apply Lemma 4.6. The strong Markov property at time Tx gives that
X
Ey [VT (x)] = Ey [ 1{Xk =x} ] = Ex [VTy+ (x)] + 1.
Tx kT

So by Lemma 4.6,

Ex [VTy+ (x)] = Ey [VT (x)] 1 = (x) Ey [T ] 1 = (x) (Ey [Tx+ ] + Ex [Ty+ ]) 1.

This follows from the previous bullet since Px -a.s. VTy+ (x) + 1 Geo(p) for p = Px [Ty+ <
Tx+ ].
Since for x y we have that Px [Ty+ < Tx+ ] Px [X1 = y] = P (x, y), we get the assertion
from the previous bullet.
t
u

Number of exercises in lecture: 0

Total number of exercises until here: 10
37

Random Walks

Ariel Yadin

Lecture 6: Convergence to Equilibrium

6.1. Convergence to Equilibrium

Recall that we saw that if P t (y, x) (x) for all x, then must be a stationary distribution.
We will start to work our way to prove the opposite, at least for irreducible and aperiodic chains.
Our goal:

Theorem* (6.5). [restatement] Let (Xt )t be an irreducible and aperiodic Markov chain.
Suppose that is a stationary distribution for this chain. Then, for any starting distribution ,
and any state x,

P [Xt = x] (x).

6.2. Couplings

Example 6.1. Two gamblers walk into a casino in Las Vegas.

The first one plays a fair game - every round she wins a dollar with probability 1/2, and loses
a dollar with probability 1/2. All rounds are independent.
The second gambler plays an unfair game - every round he wins a dollar with probability
p < 1/2, and loses a dollar with probability 1 p, again all rounds independent.
It is extremely intuitive that the second gambler is worse off than the first one. It should be
the case that the probability of the second gambler to go bankrupt is at least the probability of
the first one. Also, it seems that any reasonable measure of success should be larger for the first
gambler than for the second.
How can we mathematically prove this?
For example, we would like to show that for all starting positions N and any M > N , we
have that P1N [T0 < TM ] P2N [T0 < TM ]. How can we show this?
The idea is to use couplings. 454
38

Definition 6.2. A coupling of Markov chains P, Q on a state space S, is a stochastic process

(Xt , Yt )t such that (Xt )t is Markov-P and (Yt )t is Markov-Q.
Note that (Xt , Yt )t need not be a Markov chain on S 2 . If a coupling (Xt , Yt )t is in addition a
Markov chain on S 2 , then we say that (Xt , Yt )t is a Markovian coupling. If R is the transition
matrix for the Markovian coupling (Xt , Yt )t , we say that R is a coupling of P, Q.

Example 6.3. Let us use a Markovian coupling to show that lowering the winning probability
for a gambler, lowers their chances of winning.
Let p < q, and let P be the transition matrix on N for the gambler that wins with probability
p, and let Q be the transition matrix for the gambler that wins with probability q. That is,
P (n, n + 1) = p and P (n, n 1) = 1 p for all n > 0, and P (0, 0) = 1. Similarly for Q.
The corresponding Markov chains are (Xt )t for P and (Yt )t for Q. We can could the chains
as follows: Given (Xt , Yt ), since Y moves up with higher probability than X, we can organize a
coupling such that Yt+1 Xt+1 in any case. That is, given (Xt , Yt ), if Xt > 0 let

(1, 1) with probability p

(Xt+1 , Yt+1 ) = (Xt , Yt ) + (1, 1) with probability q p

(1, 1) with probability 1 q.

If Xt = 0, Yt > 0 let

(0, 1) with probability q
(Xt+1 , Yt+1 ) = (Xt , Yt ) +
(0, 1) with probability 1 q.

If Xt = Yt = 0 let Xt+1 = Yt+1 = 0

It is immediate to check that this is indeed a coupling of P and Q, and that Yt Xt for all t
provided that Y0 X0 .
One can check that the resulting transition matrix is:

p

i = 1, j = 1n, m > 0

qp i = 1, j = 1, n, m > 0

1q

i = 1, j = 1, n, m > 0
R((n, m), (n + i, m + j)) =

q i = 0, j = 1, n = 0, m > 0

1 q i = 0, j = 1, n = 0, m > 0

1

i = 0, j = 0, n = m = 0.

So this is a Markovian coupling.

Thus, for any M > N ,

PQ R
N [T0 < TM ] = P(N,N ) [ t : Yt = 0 and n < t Yn < M ]

PR P
(N,N ) [ t : Xt = 0 and n < t Xn < M ] = PN [T0 < TM ],

where PP , PQ , PR denote the probability measures for P , Q, and R respectively, and we have
used the fact that under PR
(N,N ) , a.s. Xt Yt for all t. 454

6.2.1. Coupling Time.

Lemma 6.4. Let (Xt , Yt )t be a Markovian coupling of two Markov chains on the same state
space S with the same transition matrix P . Define the coupling time as

= inf {t 0 : Xt = Yt } .

This is a stopping time for the Markov chain (Xt , Yt )t .

Define

X
t t
Zt =
Y
t t .
Then, (Zt )t is a Markov chain with transition matrix P , started from Z0 = X0 .

Specifically, (Zt , Yt )t is a coupling of Markov chains such that for all t , Zt = Yt .

c
Proof. Since { t + 1} = { < t + 1} ((X0 , Y0 ), . . . , (Xt , Yt )), the Markov property at
time t gives

P[Zt+1 = y|Zt = x, t+1, Zt1 , . . . , Z0 ] = P[Xt+1 = y|Xt = x, t+1, Xt1 , . . . , X0 ] = P (x, y).

Since is a stopping time, we can use the strong Markov property to deduce that for any t,

P[Zt+1 = y|Zt = x, t, Zt1 , . . . , Z0 ] = P[Yt+1 = y|Yt = x, . . . , Y ] = P (x, y).

Thus, for any t,

P[Zt+1 = y|Zt = x, Zt1 , . . . , Z0 ]

= P[Zt+1 , t + 1|Zt = x, Zt1 , . . . , Z0 ] + P[Zt+1 , t|Zt = x, Zt1 , . . . , Z0 ]

= P (x, y) (P[ t + 1|Zt = x, Zt1 , . . . , Z0 ] + P[ t|Zt = x, Zt1 , . . . , Z0 ]) = P (x, y).

t
u
40

6.3. The Convergence Theorem

In this section we will prove a fundamental result in the theory of Markov chains.

Theorem 6.5. Let P be an irreducible and aperiodic Markov chain. If P has a stationary
distribution , then for any starting distribution , and any state x,

P [Xt = x] (x).

Proof. Let (Yt )t be Markov-(, P ) independent of (Xt )t . Since P t = , we have that (x) =
P[Yt = x]. Let be the coupling time of (Xt , Yt )t .
First we show that P[ < ] = 1, so P[ > t] 0. Indeed, (Xt , Yt )t is a Markov chain on S 2 ,
with transition matrix Q((x, y), (x0 , y 0 )) = P (x, x0 )P (y, y 0 ). Moreover, for (x, y) = (x)(y),
we get that is stationary distribution for Q.
We claim that since P is irreducible and aperiodic, then Q is also irreducible (and aperi-
odic). Indeed, let (x, y), (x0 , y 0 ) S 2 . We already saw that there exist t(x, x0 ), t(y, y 0 ) such
that for all t > t(x, x0 ), P t (x, x0 ) > 0 and for all t > t(y, y 0 ), P t (y, y 0 ) > 0. Thus, for all
t > max {t(x, x0 ), t(y, y 0 )} we have that Qt ((x, x0 ), (y, y 0 )) > 0. Thus, Q is irreducible.
Since Q has a stationary distribution and Q is irreducible, we get that Q is positive recurrent.
Specifically, P[T(x,x) < ] = 1 for any x S. Since T(x,x) , we get that P[ < ] = 1.
Now define
Y
t t
Zt =
X
t t .

So (Xt , Zt )t is a coupling of Markov chains such that for all t , Xt = Zt . Also, since
Z0 = Y0 ,

P[Zt = x] = P[Zt = x, t < ] + P[Zt = x, t ] = P[Zt = x, t < ] + P[Xt = x, t ].

Adding this to

P[Xt = x] = P[Xt = x, t < ] + P[Xt = x, t ],

we get that

| P[Xt = x] P[Zt = x]| = | P[Xt = x, t < ] P[Zt = x, t < ]| P[ > t] 0.

Finally, the previous lemma tells us that (Zt )t is Markov-(S, P, ), most importantly, starting
distribution . So P[Zt = x] = (x). t
u
41

Number of exercises in lecture: 0

Total number of exercises until here: 10
42

Random Walks

Ariel Yadin

Lecture 7: Conditional Expectation

7.1. Conditional Probability

Recall that we want to define a random walk. A (simple) random walk is a process that
given the current location chooses among the available neighbors uniformly. So we need a way
of conditioning on the current position.
That is, we want the notions of conditional probability and conditional expectation.
The notion of conditional expectation is central to probability. It is developed using the
Radon-Nikodym derivative from measure theory:
Johann Radon (1887-1956)
Theorem 7.1. Let , be two probability measures on (, F). Suppose that is absolutely
continuous with respect to ; that is, (A) = 0 implies that (A) = 0 for all A F.
d
Then, there exists a (-a.s. unique) random variable d on (, F, ) such that for any event
A F,
d
E [1A ] = E [1A ].
d
Otto Nikodym (1887-1974)

X Lebesgue integrals give the following form:

d
Z Z
d = d,
A A d
d
which can be informally stated as d d = d.
This theorem is used to prove the following theorem.

Theorem 7.2. Let X be a random variable on a probability space (, F, P) such that

E[|X|] < . Let G F be a sub--algebra of F. Then, there exists a (P-a.s. unique) G-
measurable random variable Y such that for all A G, E[Y 1A ] = E[X1A ].

X Notation: An X such as above is called integrable.

X Notation: If Y is G-measurable then we write Y G.

Definition 7.3. Let X be an integrable (E[|X|] < ) random variable on a probability space
(, F, P). Let G F be a sub--algebra of F.
The random variable from the above theorem is denoted E[X|G].
If Y is a random variable on (, F, P) then we denote E[X|Y ] := E[X|(Y )].
If A F is any event then we write

P[A|G] := E[1A |G].

Proof of Theorem 7.2. Note that uniqueness is immediate from the fact that if Y, Y 0 are two
such random variables, then for An = Y Y 0 n1 we have that An G (as a function of

(Y, Y 0 )) and
P[An ]n1 E[(Y Y 0 )1An ] = E[X1An ] E[X1An ] = 0.

So by continuity of probability,
[
P[Y > Y 0 ] = P[ An ] = lim P[An ] = 0.
n
n

Exchanging the roles of Y and Y 0 we get that P[Y 6= Y 0 ] = 0.

For existence we use the Radon-Nikodym derivative: First assume that X 0. Then, define
a probability measure on (, G) by
E[X1A ]
AG Q(A) = .
E[X]
If P[A] = 0 then Q(A) = 0 (e.g. by Cauchy-Schwartz E[X1A ]2 E[X 2 ] P[A] = 0); that is
Q << P. So the Radon-Nikodym derivative exists and for all A G,

E[X1A ] = E[ dQ
d P 1A ] E[X].

dQ
Taking Y = dP E[X] completes the case of X 0.
For the general case, recall that X = X + X , and X + , X are non-negative. Let Y1 =
E[X + |G] and Y2 = E[X |G]. Then, Y1 Y2 G and for any A G,

E[X1A ] = E[X + 1A ] E[X 1A ] = E[(Y1 Y2 )1A ].

Thus, Y = Y1 Y2 completes the proof. t

X Note that to prove that Y = E[X|G] one needs to show two things: Y G and E[Y 1A ] =

E[X1A ] for all A G.

X Important: Conditional expectation E[X|G] is the average value of X given the information
44

in G; this is a random variable, not a number as is the usual expectation. One needs to be careful
with this. Whenever we write E[X|G] = Z we actually mean that E[X|G] = Z a.s.

Exercise 7.1. Let X be an integrable random variable on (, F, P). Let G F be a

sub--algebra. Then,

If X G then E[X|G] = X. [ The average value of X given X is X itself. ]

If G = {, } then E[X|G] = E[X]. [ Given no information, the average value of X is
E[X]. ]
If X = c for c a constant, then X is measurable with respect to the trivial -algebra
{, } G, so E[c|G] = c.
If X is independent of G then E[X|G] = E[X]. [ Given no information about X, the
average value of X is E[X]. ]
E[E[X|G]] = E[X].

Solution.

It is trivial that E[X1A ] = E[X1A ] so if X G then X satisfies both properties required

to be a conditional expectation.
Again, constants are measurable with respect to any -algebra. For the second property,
E[X1 ] = 0 = E[E[X]1 ] and E[X1 ] = E[X].
Easy. Follows from the previous bullets.
If X is independent of G, then for any A G, E[X1A ] = E[X] P[A] = E[E[X]1A ]. Also,
E[X] G since constants are measurable with respect to any -algebra.
Consider the event G. Since 1 = 1 we get that E[X] = E[X1 ] = E[E[X|G]1 ] =
E[E[X|G]].
t
u

Exercise 7.2. If Y = Y 0 a.s. then E[X|Y ] = E[X|Y 0 ]. [ Changing by measure 0 does

not change the conditioning. ]

Hint: Consider E[X|(Y ) (Y 0 )].

Solution. It suffices to prove that if G and G 0 are -algebras such that if A G4G 0 then P[A] = 0
(that is, G and G 0 only differ on measure 0 events) then E[X|G] = E[X|G 0 ] a.s.
G G 0 is a -algebra as an intersection of -algebras. Let Z = E[X|G G 0 ]. Since G G 0 G
and G G 0 G 0 we have that Z is both G and G 0 measurable. Moreover, for any A G: if A 6 G 0
then P[A] = 0 so E[X1A ] = 0 = E[Z1A ]. If A G 0 then A G G 0 so E[X1A ] = E[Z1A ] by
definition. Thus, Z = E[X|G]. Similarly, exchanging the roles of G and G 0 , we get Z = E[X|G 0 ],
so E[X|G] = E[X|G 0 ] a.s. t
u

Exercise 7.3. E[aX + Y |G] = a E[X|G] + E[Y |G] a.s.

Solution. The right hand side is of course G-measurable. For any A G,

E[(aX+Y )1A ] = a E[X1A ]+E[Y 1A ] = a E[E[X|G]1A ]+E[E[Y |G]1A ] = E[(a E[X|G]+E[Y |G])1A ].

t
u

Exercise 7.4. If X Y then E[X|G] E[Y |G].

Solution. Since Y X 0 is suffices to show that if X 0 then E[X|G] 0 a.s.

Let An = E[X|G] n1 . So An G and

P[An ]n1 E[E[X|G]1An ] = E[X1An ] 0.

So P[An ] = 0 for all n, and thus P[E[X|G] < 0] = P[ n : An ] = 0. t

Exercise 7.5. Let G G. Show that for any event A with P[A] > 0,
E[P[A|G]1G ]
P[G|A] = .
P[A]

Thomas Bayes (1701-1761)

Solution. Note that since G G, by definition,

E[P[A|G]1G ] = E[1A 1G ] = P[A G].

t
u

7.2. More Properties

Proposition 7.4 (Monotone Convergence). If (Xn )n is a monotone non-decreasing sequence

of non-negative integrable random variables, such that Xn % X for some integrable X, then
E[Xn |G] % E[X|G] a.s.

Proof. Let Yn = X Xn . Since Xn % X, we get that Yn 0 for all n. Thus, (E[Yn |G])n is a
monotone non-increasing sequence of non-negative random variables. Let Z() = inf n E[Yn |G]() =
limn E[Yn |G]() = lim inf n E[Yn |G](). So Z G and Z 0. Fatous Lemma gives that for any
A G,
E[Z] lim inf E[E[Yn |G]] = lim inf E[X Xn ] = 0,
n n

since E[Xn ] % E[X] by monotone convergence. Thus, Z = 0 a.s. This implies that
a.s.
E[X|G] E[Xn |G] 0.

t
u

Proposition 7.5. If Z G then E[XZ|G] = E[X|G]Z a.s.

Proof. Note that E[X|G]Z G so we only need to prove the second property.
We use the usual four-step proof, from indicators to simple random variables to non-negatives
to general.
If Z = 1B for some B G then for any A G,

E[XZ1A ] = E[X1BA ] = E[E[X|G]1BA ] = E[E[X|G]Z1A ].

P
If Z is simple, then Z = ak 1Ak and by linearity and the previous case,
k
X X
E[XZ|G] = ak E[X1Ak |G] = ak E[X|G]1Ak = E[X|G]Z.
k k

For general non-negative Z, in the case X is non-negative, we approximate Z by a non-

decreasing sequence of simple random variables, Zn % Z, so that XZn % XZ and by monotone
convergence and the previous case,

E[XZ|G] = lim E[XZn | G] = lim E[X|G]Zn = E[X|G]Z.

n n
47

For a general Z G, and general X, write Z = Z + Z and X = X + X , with 0

Z + , Z G and X + , X 0. By the previous case and linearity,

E[X Z|G] = E[X (Z + Z )|G] = E[X |G](Z + Z ) = E[X |G]Z,

which immediately leads to the assertion. t

The following properties all have their usual proof adapted to the conditional setting.

Proposition 7.6 (Jensens Inequality). If g : R R is convex, and X, g(X) are integrable,

then
g(E[X|G]) E[g(X)|G].

Proof. If g is convex then for any m there exist am , bm such that g(s) am s+bm for all s, and g(m) = am m+bm . Johan Jensen (1859-1925)

Thus, for any , there exist A(), B() such that g(s) A()s + B() for all s and g(E[X|G]()) =
A() E[X|G]() + B(). It is not difficult to see that A, B are measurable, and determined by E[X|G] and g, so
A, B are G-measurable random variables. Thus,

g(E[X|G]) = A E[X|G] + B = E[AX + B| G] E[g(X)|G].

u
t

Proposition 7.7 (Cauchy-Schwarz). If X, Y are in L2 (, F, P), then

(E[XY |G])2 E[X 2 |G] E[Y 2 |G].

E[X 2 ][Y 2 ] < , so E[XY |G] is a.s. finite. If E[Y 2 |G] =

p
Proof. By Jensens inequality, E | E[XY |G]| E[|XY |] Augustin-Louis Cauchy

0 a.s. then Y = 0 a.s. and so both sides of the inequality become 0. So we can assume that E[Y 2 |G] > 0. (1789-1857)
E[XY |G]
Set = E[Y 2 |G]
, which is a G-measurable random variable. By linearity,

0 E[(X Y )2 |G] = E[X 2 |G] + 2 E[Y 2 |G] 2 E[XY |G]

(E[XY |G])2
= E[X 2 |G] .
E[Y 2 |G]
u
t

Hermann Schwarz
Proposition 7.8 (Markov / Chebyshev ). If X 0 is integrable, then for any G-measurable
(1843-1921)
Z such that Z > 0,
E[X|G]
P[X Z|G] .
Z

Proof. Let Y = Z1{XZ} . So Y X. Thus,

Z P[X Z|G] = E[Y |G] E[X|G].

u
t

Pafnuty Chebyshev

(1821-1894)
48

Remark 7.9. Suppose that (, F, P) is a probability space, and G F is some sub--algebra.

We have two vector spaces associated: L2 (, G, P) L2 (, F, P); the spaces of square-integrable
G-measurable random variables and square-integrable F-measurable random variables. These
spaces come equipped with a inner-product structure given by < X, Y >= E[XY ]. The theory
of inner-product (or Hilbert) spaces tells us that L2 (, F, P) = L2 (, G, P) V where V is
the orthogonal complement to L2 (, G, P) in L2 (, F, P). So we can project any F-measurable
square integrable X onto L2 (, G, P). This projection turns out to be exactly X 7 E[X|G].
In fact, it is immediate that E[X|G] is a square-integrable G-measurable random variable.
Moreover, for Y L2 (, G, P),

hX E[X|G], Y i = E[XY E[X|G]Y ] = E[XY ] E[E[XY |G]] = 0.

Thus, to minimize E[(X Y )2 ] over all Y L2 (, G, P), we can take Y = E[X|G].

7.2.1. The smaller -algebra always wins. Perhaps the most important property that has
no unconditional counterpart is

Proposition 7.10. Let X be an integrable random variable on a probability space (, F, P).

Let H G F be sub--algebras. Then,

E[E[X|H]|G] = E[X|H].
E[E[X|G]|H] = E[X|H].

Proof. The first assertion comes from the fact that E[X|H] H G, so conditioning on G has
no effect.
For the second assertion we have that E[X|H] H of course, and for any A H, using that
A G as well,

E[E[X|G]1A ] = E[E[X1A |G]] = E[X1A ] = E[E[X|H]1A ].

t
u

7.3. Partitioned Spaces

During this course, we will almost always use conditional probabilities conditioned on some
discrete random variable. Note that if Y is discrete with range R (perhaps d-dimensional), then
P
rR 1{Y =r} = 1 a.s. This simplifies the discussion regarding conditional probabilities.

The main observation is the following

U
Exercise 7.6. Suppose that (, F, P) is a probability space with = kI Ak where
Ak F for all k I, with I some countable (possibly finite) index set. Show that
]
((Ak )kI ) = Ak : J I .
kJ

Hint: Show that any set in the right-hand side must be in ((Ak )kI ). Show that the right-
hand side is a -algebra.

Lemma 7.11. Let X be an integrable random variable on (, F, P). Let I be some countable
U
index set (possibly finite). Suppose that P[ kI Ak ] = 1 where Ak F for all k, and P[Ak ] > 0
for all k. Let G = ((Ak )kI ). Then,
X E[X1Ak ]
E[X|G] = 1Ak .
P[Ak ]
k

E[X1 ]
Proof. Let Y = k 1Ak P[AkA]k . The of course Y G. For any A G we have that 1A =
P
P
kJ 1Ak (P-a.s.) for some J I. Thus,
X E[X1Ak ] X
E[Y 1A ] = E[1Ak ] = E[X1Ak ] = E[X1A ].
P[Ak ]
kJ kJ

t
u

Corollary 7.12. Let Y be a discrete random variable with range R on (, F, P). Let X be an
integrable random variable on the same space. Then,
X E[X1{Y =r} ] X
E[X|Y ] = 1{Y =r} = 1{Y =r} E[X|Y = r],
P[Y = r]
rR rR

E[X1{Y =r} ]
where we take the convention that E[X|Y = r] = P[Y =r] = 0 for P[Y = r] = 0.
U
Proof. = rR {Y = r}. t
u

X Note that E[X|Y ] is a discrete random variable as well, regardless of the original distribution

of X.
Number of exercises in lecture: 6
Total number of exercises until here: 16
50

Random Walks

Ariel Yadin

Lecture 8: Martingales

8.1. Martingales

X Do conditional expectation

Definition 8.1. Let (, F, P) be a probability space. A filtration is a monotone sequence of

sub--algebras F0 F1 F.
A sequence (Xn )n of random variables is said to be adapted to a filtration (Fn )n if for all
n, Xn Fn .

Definition 8.2. Let (, F, P) be a probability space, and let (Fn )n be a filtration. A sequence
(Xn )n is said to be a martingale with respect to the filtration (Fn )n , or sometimes a (Fn )n -
martingale, if for all n,

E[|Xn |] < (i.e. Xn is integrable).

E[Xn+1 |Fn ] = Xn .
(Xn )n is adapted to (Fn )n .

If the filtration is not specified then we say that (Xn )n is a martingale if it is a martingale
with respect to the natural filtration Fn := (X0 , . . . , Xn ); that is, a sequence of integrable
random variables such that for all n,

E[Xn+1 |Xn , . . . , X0 ] = Xn .

Exercise 8.1. Show that if (Xn )n is an Fn -martingale then (Xn )n is also a

martingale with respect to the natural filtration ((X0 , . . . , Xn ))n . (Hint: Show that for all
n, (X0 , . . . , Xn ) Fn .)
51

Example 8.3. Let (Xn )n be a simple random walk on Z started at X0 = 0. The Markov
property gives that

1 1
E[Xn+1 |Xn , . . . , X0 ] = (Xn + 1) + (Xn 1) = Xn .
2 2

So (Xn )n is a martingale. 454

Example 8.4. More generally, if (Xn )n is a sequence of independent random variables with
Pn
E[Xn ] = 0 for all n, and Sn = k=0 Xk , then

E[Sn+1 |Sn , . . . , S0 ] = Sn + E[Xn+1 |Sn , . . . , S0 ].

Since Sn , . . . , S0 (X0 , . . . , Xn ) and since Xn+1 is independent of (X0 , . . . , Xn ) we have that

E[Xn+1 |Sn , . . . , S0 ] = E[Xn+1 ] = 0.
So, in conclusion, (Sn )n is a martingale. 454

Proposition 8.5. Let (Xn )n be a (Fn )n -martingale. For any k n we have E[Xn |Fk ] = Xk .

Proof. For k = n this is obvious. Assume that k < n. By properties of conditional expectation,
because Fk Fn1 ,

E[Xn |Fk ] = E[E[Xn |Fn1 ]|Fk ] = E[Xn1 |Fk ].

Continuing inductively, we get the proposition. t

Exercise 8.2. Let (Xn )n be a (Fn )n -martingale. Let T be a stopping time (with respect
to the filtration (Fn )n ). Prove that (Yn := XT n )n is a (Fn )n -martingale.

Theorem 8.6 (Optional Stopping). Let (Xn )n be an (Fn )n -martingale and T a stopping
time. We have that E[XT |X0 ] = X0 in the following cases:

If T is bounded; that is if T t a.s. for some 0 < t < .

If T is a.s. finite and there exists M > 0 such that |Xn | M for all n a.s. ((Xn )n is
bounded).
If E[T ] < and there exists M > 0 such that |Xn+1 Xn | M for all n a.s. (Xn )n
has bounded increments.
52

Proof. We start with the first case: Let Yn = XT n . Since T t a.s. we get that Yt = XT .
Since Y0 = X0 we conclude

E[XT |X0 ] = E[Yt |Y0 ] = Y0 = X0 .

For the second case: Let Yn = XT n as above. We have

| E[Yn |X0 ] E[XT |X0 ]| = | E[(XT n XT ) 1{T >n} |X0 ]| 2M P[T > n|X0 ] 0,

because T < a.s. Thus, since T n is a bounded stopping time,

E[XT |X0 ] = lim E[Yn |Y0 ] = lim Y0 = X0 .

n n

Finally, for the third case: Note that

T
X 1
|XT n XT | 1{T >n} |Xk+1 Xk |1{T >n} M T 1{T >n} .
k=n

Thus, similarly to the above,

| E[XT n |X0 ] E[XT |X0 ]| M E[T 1{T >n} |X0 ].

Since T 1{T >n} 0, and since E[T ] < , we get by dominated convergence that E[T 1{T >n} ]
0, and so
X0 = E[XT n |X0 ] E[XT |X0 ].

t
u

Let us use martingales to calculate some probabilities.

Example 8.7 (Gamblers Ruin). Let (Xt )t be a simple random walk on Z. Let T = T ({0, n})
be the first time the walk is at 0 or n.
We can think of Xt as a the amount of money a gambler playing a fair game has after the
t-th game. What is the probability that a gambler that starts with x reaches n before going
bankrupt?
Let
pn (x) = Px [Tn < T0 ].

Since (Xt )t is a martingale, we get that (XtT )t is a bounded martingale under the measure Px .
Since T is a.s. finite, we can apply the optional stopping theorem to get

x = Ex [XT T ] = Ex [XT ]

= Ex [XT |Tn < T0 ] pn (x) + Ex [XT |T0 < Tn ] (1 pn (x)) = pn (x) n.

x
So pn (x) = n. 454

Remark 8.8. This is another proof that Z is recurrent:

Let An = Tn < T0+ . So (An )n is a decreasing sequence of events. Thus,

\
P1 [ An ] = lim P[An ] = lim n1 = 0.
n n
n

By symmetry,
\
P1 [ An ] = 0.
n

Now, the event that the walk never returns to 0 is the event that the walk takes a step to either
1 or 1 and then never returns to 0; i.e.
( ) ( )
+ \ ] \
T0 = = X1 = 1, An X1 = 1, An .
n n

The Markov property gives

\ \
P0 [T0+ = ] = P1 [ An ] + P1 [ An ] = 0.
n n

Example 8.9. What about the amount of time it takes to reach 0 or n?

Consider Yt = Xt2 t.

1
(Xt + 1)2 (t + 1) + (Xt 1)2 (t + 1) = Yt .

E[Yt+1 |X0 , . . . , Xt ] =
2

So (Yt )t is a martingale, and thus (YT t ) is a bounded martingale under the measure Px . Thus,
since Y0 = X02 ,

x2 = Ex [YT |Y0 ] = Ex [XT2 ] Ex [T ] = pn (x)n2 Ex [T ].

So by the previous example, for any 0 x n,

Ex [T ] = xn x2 = x(n x).

454

Remark 8.10. This is another proof that Z is null-recurrent:

Under P0 , the event Tn < T0+ implies that T0+ 2n. So,

P0 [T0+ 2n] P0 [X1 = 1, Tn < T0+ ] = P1 [Tn < T0 ] = 1

n.
54

Since P0 [T0+ 2n 1] P0 [T0+ 2n], we get that

X
X
E0 [T0+ ] = P0 [T0+ > m] = P0 [T0+ m]
m=0 m=1

X X 2
= P0 [T0+ 2n 1] + P0 [T0+ 2n] = .
n=1 n=1
n

Example 8.11. Consider the martingale Xt2 t. Using the optional stopping theorem at time
T = T0+ we get that
1 = E1 [X02 0] = E1 [XT2 T ] = E1 [T ].

Similarly, E1 [T ] = 1. Since
1 1
E0 [T0+ ] = E0 [T0+ |X1 = 1] + E0 [T0+ |X1 = 1] = (E1 [T0 + 1] + E1 [T0 + 1]) ,
2 2
we get that E0 [T0+ ] = 2 < !
Where did we go wrong?
We could not use the optional stopping theorem, because the martingale Xt2 t is not bounded!
454

Example 8.12. Actually, this last bit gives a third proof that E0 [T0+ ] = . Suppose that
Ex [T0 ] < . Since (Xt )t is a martingale with bounded differences, by the optional stopping
theorem x = Ex [XT0 ]. But, XT0 = 0 a.s. so Ex [T0 ] = for all x. Using the Markov property,
1
E0 [T0+ ] = (E1 [T0 + 1] + E1 [T0 + 1]) = .
2
454

Number of exercises in lecture: 2

Total number of exercises until here: 18
55

Random Walks

Ariel Yadin

Lecture 9: Reversible Chains

9.1. Time Reversal

Let (Xt )t be Markov-P . Then, conditioned on Xt , we have that X[0, t] and X[t, ) are
independent. This suggests looking at the chain run backwards in time - since determining the
past given the future will only depend on the current state.
However, in accordance with the second law of thermodynamics (entropy always increases),
we know that nice enough chains converge to a stationary distribution, even if the chain is started
from a very ordered distribution - namely a -measure. This suggests that there is a specific
direction we are looking at, and that the chain is moving from order to disorder represented by
the stationary measure.
However, if we start the chain from the stationary distribution, perhaps we can view the chain
both forwards and backwards in time. This is the content of the following.

Definition 9.1. Let P be an irreducible Markov chain with stationary distribution . Define
(y)
P (x, y) = P (y, x) (x) . P is called the time reversal of P .

The next theorem justifies the name time reversal.

Theorem 9.2. Let be the stationary distribution for an irreducible Markov chain P .
Then, P is an irreducible Markov chain, and is a stationary distribution for P .
Moreover: Let (Xt )t be Markov-(, P ). Fix any T > 0 and define Yt = XT t , t = 0, . . . , T .
Then, (Yt )Tt=0 is Markov-(, P ).

Proof. The fact that P is a Markov chain follows from

(x)
X X
1
P (x, y) = (y)P (y, x) (x) = (x) = 1.
y y

Also,
X X X
1
( P )(x) = (y)P (y, x) = (y)(x)P (x, y) (y) = (x) P (x, y) = (x),
y y y
56

so is stationary for P .
Finally, note that (x)P (x, y) = (y)P (y, x). So,

P[Y0 = x0 , . . . , YT = xT ] = P [X0 = xT , X1 = xT 1 , . . . , XT = x0 ]

= (xT )P (xT , xT 1 ) P (x1 , x0 ) = P (xT 1 , xT ) P (xT 2 , xT 1 ) P (x0 , x1 ) (x0 )

= (x0 ) P (x0 , x1 ) P (x1 , x2 ) P (xT 1 , xT ).

t
u

9.2. Reversible Chains

Recall the following definition:

Definition 9.3. Let P be a Markov chain on S. A probability measure on S, , is said to

satisfy the detailed balance equations if for all x, y S,

(x)P (x, y) = (y)P (y, x).

We also say that P and are in detailed balance.

We also proved in the exercises that if P and are in detailed balance, then must be a
stationary distribution for P . (The opposite is not necessarily true, as is shown in the exercises.)
Immediately we see a connection between detailed balance and time reversals:

Proposition 9.4. Let P be a Markov chain with stationary distribution . The following are
equivalent:

P and are in detailed balance.

P = P .
For any T > 0, (Xt )Tt=0 is Markov-(, P ) if and only if (XT t )Tt=0 is Markov-(, P ). [
The time reversal is the same as the forward-time chain. ]

Proof. We show that each bullet implies the one after it.
If P and are in detailed balance, then for any states x, y,

(y) 1
P (x, y) = P (y, x) (x) = (x)P (x, y) (x) = P (x, y).

So P = P .
57

If P = P then for any T > 0, if (Xt )Tt=0 is Markov-(, P ) then (XT t )Tt=0 is Markov-(, P ).
Since P = P we get that (XT t )Tt=0 is Markov-(, P ). Reversing the roles of Xt and XT t we
get that for all T > 0, (Xt )Tt=0 is Markov-(, P ) if and only if (XT t )Tt=0 is Markov-(, P ).
Now for the third implication, assume that for all T > 0, (Xt )Tt=0 is Markov-(, P ) if and
only if (XT t )Tt=0 is Markov-(, P ). Take T = 1. Then (X0 , X1 ) is Markov-(, P ) if and only if
(X1 , X0 ) is Markov-(, P ). That is,

(x)P (x, y) = P [X0 = x, X1 = y] = P [X1 = y, X0 = x] = (y)P (x, y).

So P and are in detailed balance. t

9.3. Reversible chains as weighted graphs

Definition 9.5. Let G be a graph. A conductance on G is a function c : V (G)2 [0, )

satisfying

c(x, y) = c(y, x) for all x, y.

c(x, y) > 0 if and only if x y.

The pair (G, c) is called a weighted graph, or sometimes a network or electric network.

P P
Remark 9.6. Let (G, c) be a weighted graph, with C = x,y c(x, y) < . Define cx = y c(x, y)
c(x,y) cx
and P (x, y) = cx . P is a stochastic matrix, and so defines a Markov chain. For (x) = C

we have that is a distribution, and (x)P (x, y) = c(x, y) = c(y, x) = (y)P (y, x). Thus, P is
reversible.
We will refer to such a P as the random walk on G induced by c.
On the other hand, if P is a reversible Markov chain S, we can define a weighted graph as
follows: Let V (G) = S and c(x, y) = (x)P (x, y). Let x y if c(x, y) > 0. Note that
X X
c(x, y) = (x)P (x, y) = 1.
x,y x,y

Also, we see that P is the random walk induced by (G, c).

X Connection to multiple edges and self-loops.

P
Definition 9.7. If (G, c) is a weighted graph with x,y c(x, y) < , then the Markov chain
P (x, y) = Pc(x,y) is called the weighted random walk on G with weights c.
z c(x,z)
58

Example 9.8. Let (G, c) be the graph V (G) = {0, 1, 2}, with edges E(G) = {{0, 1} , {1, 2} , {0, 2}}
and c(0, 1) = 1, c(1, 2) = 2 and c(2, 0) = 3.
The weighted random walk is then

1 3
0 4 4

1 2
P = 3 0 .
3

3 2
5 5 0

c(z, w) so = [ 13 1 5
P P
The stationary measure is, of course, (x) = y c(x, y)/ z,w 4 12 ] is the
stationary distribution.
We can compute that P = P (which is expected since P is reversible). 454

Example 9.9 (One dimensional Markov chains are almost reversible). Let P be a Markov
chain on Z such that P (x, y) > 0 if and only if |x y| = 1. For x Z let px = P (x, x + 1) (so
1 px = P (x, x 1).
Consider the following conductances on Z: Let c(0, 1) = 1. For x > 0 set
x
Y py
c(x, x + 1) = .
y=1
1 py
1px
Let c(0, 1) = px and for x < 0 set
x
Y 1 py
c(x, x 1) = .
y=1
py

Note that for any x Z we have that

px
c(x, x + 1) = c(x 1, x) ,
1 px
so
c(x, x + 1) px
= px = px .
c(x, x + 1) + c(x, x 1) (1 px )( 1p x
+ 1)
So P is the weighted random walk with weights given by c.
Moreover, note that

1
(c(x, x 1) + c(x, x + 1))P (x, x + 1) = c(x, x 1) 1px px = c(x, x + 1)

and

1
(c(x + 1, x) + c(x + 1, x + 2))P (x + 1, x) = c(x, x + 1) 1px (1 px ) = c(x, x + 1),

So for m(x) = c(x, x 1) + c(x, x + 1) we have that m(x)P (x, y) = m(y)P (y, x) for all x, y. That
is, if m was a distribution, P would be reversible.
59

To normalize m to be a distribution we would need that

X X X
m(x) = c(x, x 1) + c(x, x + 1) = 2 c(x, x + 1) < .
x x x

For example, if px = 1/3 for x > 0 and px = 2/3 for x < 0 we would have that c(x, x+1) = 2x
for x 0 and c(x, x 1) = 2x for x 0. Thus
X
X
m(x) = 2 2x + 2x = 4 2 = 8 < .
x x=0
c(x,x1)+c(x,x+1)
So (x) = 8 is a stationary distribution.
In general, we see that a drift towards 0 would give a reversible chain. 454

Number of exercises in lecture: 0

Total number of exercises until here: 18
60

Random Walks

Ariel Yadin

Lecture 10: Discrete Analysis

10.1. Laplacian

In order to study electric networks and conductances, we will first introduce the concept of
harmonic functions.
Let G = (V (G), c) be a network; recall that by this we mean: c : V (G) V (G) [0, ) with
P
c(x, y) = c(y, x) for all x, y G and cx := y c(x, y) < for all x. We denote by E(G) the set
of oriented edges of G; that is,

E(G) = {(x, y) : c(x, y) > 0} .

(We write x y when c(x, y) > 0.) For e E(G) we write e = (e+ , e ). c is known as the
conductance of the network.
Let C 0 (V ) = {f : V (G) R} and C 0 (E) = {f : E(G) R} be the sets of all functions of
vertices and (oriented) edges of G respectively.
We can define an operator : C 0 (V ) C 0 (E) by: for any edge x y,

(f )(x, y) = c(x, y)(f (x) f (y)).

We can also define an operator div : C 0 (E) C 0 (V )

X 1
(divF )(x) = (F (x, y) F (y, x)).
c
yx x

We can consider the spaces C 0 (V ), C 0 (E) with the inner products

X X 1
hf, f 0 i = cx f (x)f 0 (x) and hF, F 0 i = F (e)F 0 (e).
x e
c(e)

Consider the subspaces L2 (V ) = f C 0 (V ) : hf, f i < and L2 (E) = F C 0 (E) : hF, F i < .

The operator is a linear operator from L2 (V ) to L2 (E). Also div : L2 (E) L2 (V ) is a linear
61

operator, and
X X
hf, F i = (f (x) f (y))F (x, y) = f (x)(F (x, y) F (y, x))
(x,y) xy
X X 1
= cx f (x) (F (x, y) F (y, x)) = hf, divF i.
x
c
yx x

So = div and div = are dual of each other.

Recall that the weighted random walk on the network G is just the Markov process with
c(x,y)
transition matrix given by P (x, y) = cx .

Define the operator : C 0 (V ) C 0 (V ) by = 12 div. That is,

X 1 X
f (x) = 12 divf (x) = 1
2 (f (x, y) f (y, x)) = P (x, y)(f (x) f (y)).
c
yx x y

Exercise 10.1. Show that (in matrix form) = I P where I is the identity
operator.

10.2. Harmonic functions

Definition 10.1. A function f : V (G) R is called harmonic at x if f (x) = 0. f is

said to be harmonic on A if for all x A, f is harmonic at x. f is said to be harmonic, if f is
harmonic at all x.

Harmonic functions and martingales are intimately related.

Proposition 10.2. Let G = (V (G), c) be a network. Let f : G R be a function. Let S G

and let T = TS c be the first exit time of S, for (Xt )t , the weighted random walk on G.
Then, f is harmonic in S if and only if the sequence (Mt = f (XtT ))t is a martingale under
Px for all x.

Proof. First assume that f is harmonic in S. Note that if x 6 S then XtT = X0 = x a.s. under
Px . So as a constant sequence, Mt = f (x) is a martingale. So we only need to deal with x S.
The main observation here is that the Markov property is just the fact that
X
Ex [f (Xt+1 )|Ft ] = P (Xt , y)f (y) = (P f )(Xt ).
y
62

For any t, since 1{T t+1} = 1{T >t} Ft , and f (XT )1{T t} Ft ,

Ex [Mt+1 |Ft ] = Ex [f (Xt+1 )|Ft ]1{T >t} + f (XT )1{T t} = (P f )(Xt )1{T >t} + f (XT )1{T t} .

If f is harmonic at x, then P f (x) = f (x). Thus, since on the event T > t, f is harmonic at Xt ,
we get that (P f )(Xt )1{T >t} = f (Xt )1{T >t} . In conclusion,

Ex [Mt+1 |Ft ] = (P f )(Xt )1{T >t} +f (XT )1{T t} = f (Xt )1{T >t} +f (XT )1{T t} = f (XtT ) = Mt .

So Mt is a martingale.
For the other direction, assume that MtT is a martingale. Then, for any x S,

f (x) = M0 = Ex [M1 ] = Ex [f (X1 )] = (P f )(x),

were we have used that under Px , T 1 a.s. So we have that for any x S, f (x) =
(I P )f (x) = 0. So f is harmonic in S. t
u

Harmonic functions exhibit properties analogous to those in the continuous case.

Proposition 10.3 (Solution to Dirichlet Problem). Let G = (V (G), c) be a network. Let

B G (we think of B as the boundary). Let

D = {x G : Px [TB < ] = 1} .

(So B D.) Let u : B R be some bounded function (boundary values).

Then, there exists a unique function f : D R that is bounded, harmonic in D \ B and
admits f (b) = u(b) for all b B.

Proof. Define f (x) = Ex [u(XTB )]. This is well defined, since under Px , TB < a.s. and since
u is bounded.
It is immediate to check that for any b B, f (b) = u(b). Also, for x D \ B, since TB 1
Px a.s. , by the Markov property,
X
f (x) = Ex [u(XTB )] = P (x, y) Ey [u(XTB )] = P f (x).
y

So f is harmonic at x.
For uniqueness, assume that g : D R is bounded, harmonic in D \ B, and g(b) = u(b) for
all b B. We want to show that

(10.1) for all x D, g(x) = Ex [u(XTB )].

g is bounded, so (g(XTB t ))t is a bounded martingale, so (10.1) holds by the optional stopping
theorem, because TB < Px -a.s. for all x D. t
u

If we remove the condition that TB < then we can only guaranty existence but not
uniqueness of the solution to the Dirichlet problem.

Proposition 10.4. Let G = (V (G), c) be a network. Let B G and let u : B R be some

function.
Then, there exists a function f : G R that is harmonic in G \ B and admits f (b) = u(b)
for all b B.

Proof. We define

f (x) = Ex [u(XTB )1{TB <} ].

Obviously, f (b) = u(b) for all b B. Also, for x 6 B, since TB 1 Px -a.s. we have that f is
harmonic at x by the Markov property. t
u

X Comparison to Poisson formula?

The maximum principle for harmonic functions in Rd states that if a non-constant function is har-

monic in a connected open subset of Rd then it will have all its maximal values on the boundary.

Proposition 10.5 (Maximum Principle). Let G = (V (G), c) be a network. Let B G and

D = {x G : Px [TB < ] = 1} . Let f : D R be a bounded function, harmonic in D \ B.
Then,

sup f (x) = sup f (x) and inf f (x) = inf f (x).

xD xB xD xB

That is, supremum and infimum are on the boundary.

Moreover, if D \ B is connected, and f is not constant, any x such that f (x) attains the
supremum or infimum must admit x B.

Proof. For any x D we know that

f (x) = Ex [f (XTB )] sup f (b),

because XTB B a.s.

Now, assume that f (x) supyD f (y) for some x D \ B. Let z D. Since D \ B is
connected, there exists a path from x to z that does not intersect B. Thus, there exists t > 0
64

such that Px [TB t, Xt = z] > 0. Since f is harmonic in D \ B, we get that f (XTB s )s is a

martingale. Thus, stopping at time s = t,

(f (x)f (z))Px [TB t, Xt = z] = Ex [(f (x)f (XtTB ))1{TB t,Xt =z} ] Ex [f (x)f (XtTB )] = 0.

So f (z) f (x) f (z) for any z D, and f is constant.

This completes the proofs for the supremum. For the infimum, consider the function g = f .
So g is bounded, harmonic in D \ B. Since supxS g(x) = inf xS f (x) for any set S, we can
apply the proposition to g to get the assertions for the infimum. t
u
x
p
Example 10.6. Consider the following network: V (G) = Z and c(x, x + 1) = 1p . Suppose
that p > 1/2 (if p = 1/2 this is just the simple random walk on Z, and if p < 1/2 then we can
exchange x 7 x to get the same thing).
The weighted random walk here is given by

c(x, x + 1)
P (x, x + 1) = =p and P (x, x 1) = 1 p.
c(x, x + 1) + c(x 1, x)

First lets prove that the weighted random walk here is transient. For example, recall that it
suffices to show that

X
P0 [Xt = 0] < .
t=0

Well, since at each step the walk moves right with probability p and left with probability 1 p
independently, we can model this walk by
t
X
Xt = k ,
k=1

where (k )k are independent and all have distribution P[k = 1] = p = 1 P[k = 1].
k +1
The usual trick here is to note that 2 Ber(p), so

2t t
P0 [X2t = 0] = P[Bin(2t, p) = t] = p (1 p)t .
t

(This is symmetric in p as expected.) Of course P[X2t+1 = 0] = 0 because of parity issues.

Now, since 2t

t is the number of size t subsets out of 2t elements, this is at most the total

number of subsets which is 22t . Since for p 6= 1/2, 4p(1 p) < 1, we get that

X X 1
P[Xt = 0] (4p(1 p))t = < .
t=0 t=0
1 4p(1 p)

This is one proof that for p 6= 1/2 the weighted walk is transient.
65

Now, let us consider B = {0} and boundary values u(0) = 1. What is a bounded harmonic
function f : G R such that f is harmonic in G \ B? Well, we can take f 1, which is one
option. Another option is to take f (x) = Px [T0 < ]. But since G is transient, we know that
f 6 1!
Since Px [T0 < ] = Ex [u(0)1{T0 <} ] we see that this is the second solution from above.
However, the uniqueness is only for functions defined on {x : Px [T0 < ] = 1}, so a-priori
there is freedom to choose more than one option for those xs such that Px [T0 < ] < 1. 4 5 4

X add discussion on finite networks?

10.3. Green Function

Let G = (V (G), c) be a network. Let u : G R be a function. Suppose we want to solve the

equation f = u.
If we had a function g : G G R that satisfied

g(, x) = 1{x=}

for every x, we could write

X
f (y) = g(y, x)u(x).
x

Then,
X
f (z) = u(x)1{x=z} = u(z),
x

which is a solution. So finding the solution to g = 1{x=} is the basic step.

It turns out that such a g exists, and g is called the Green Function. It is the counterpart of
the classical Green Function.

Proposition 10.7. Let G = (V (G), c) be a network. Let Z G be a set (possibly empty).

Define
Z 1
TX
gZ (x, y) = Ex [ 1{Xk =y} ].
k=0

Assume that at least one of the following conditions holds:

The weighted random walk on G is transient.

Z 6= .
66

Then,
gZ (, x) = 1{x=}

for all x 6 Z. Moreover, for all x, y,

cx gZ (x, y) = cy gZ (y, x).

Proof. The conditions of the proposition are there to ensure that

X
gZ (x, y) = Px [Xk = y, TZ > k] < .
k=0

First, the Markov property gives that for a fixed y, using h(x) = gZ (x, y),

X X
X
h(x) = 1{x=y} + Px [Xk = y, TZ > k] = 1{x=y} + P (x, w) Pw [Xk1 = y, TZ > k 1]
k=1 k=1 w
X
= 1{x=y} + P (x, w)h(w),
w

so h(x) = 1{x=y} .
The symmetry of gZ is shown as follows: By the definition of the weighted random walk, we
have that cx P (x, y) = cy P (y, x) = c(x, y) for all x y. Thus, for any path in G, (x0 , . . . , xn ),

cx0 Px0 [X0 = x0 , . . . , Xn = xn ] = cxn Pxn [X0 = xn , . . . , xn = x0 ].

Thus, for any x, y,

X
cx Px [Xk = y, TZ > k] = cx Px [X[0, k] = ]
:xy
||=k , Z=
X
= cy Py [X[0, k] = (k , k1 , . . . , 0 )]
:xy
||=k , Z=

= cy Py [Xk = x, TZ > k].

Summing over k completes the proof. t

Number of exercises in lecture: 1

Total number of exercises until here: 19
67

Random Walks

Ariel Yadin

Lecture 11: Networks

11.1. Some discrete analysis

Let G = (V, c) be a network.

Recall that for x y,
(f )(x, y) = c(x, y)(f (x) f (y)).

Also,
X 1
(divF )(x) = (F (x, y) F (y, x)).
c
yx x
We have the duality formula
X X
hf, F i = (f (x) f (y))F (x, y) = f (x)(F (x, y) F (y, x))
(x,y) xy
X X 1
= cx f (x) (F (x, y) F (y, x)) = hf, divF i,
x
c
yx x

1
0 0
(x) and hF, F 0 i = 0
Also, = I P = 21 div.
P P
where hf, f i = x cx f (x)f e c(e) F (e)F (e).

We want to think of as differentiation. So the opposite operation should be some kind of

integral.
Let : x y be a path in G. For a function F C 0 (E) on the oriented edges of G, define
||1
1
I X
F = F (j , j+1 ) .
j=0
c(j , j+1 )

For a path define its reversal by = (|| , ||1 , . . . , 0 ). Also, define F C 0 (E) by F (x, y) =
F (y, x).
We make a few observations:

Proposition 11.1. Let F C 0 (E).

H H
F = F . Thus, if F is anti-symmetric, i.e. F (x, y) = F (y, x) for all x y, then
H H
for any path F = F .
If F = f for some f C 0 (V ), then for any path : x y we have that F =
H

f (x) f (y).
68

If f = g then there exists a constant such that f = g + .

Proof. The first bullet is immediate just reversing the order of the edges in F .
For the second bullet, expanding the sum, we find that for : x y,
I ||1
X
F = f (j ) f (j+1 ) = f (x) f (y).
j=0

For the third bullet, note that for any : x y we have that
I I
f (x) f (y) = f = g = g(x) g(y).

So f (x) g(x) = f (y) g(y) for all x, y, and the difference f g is constant. t
u

Definition 11.2. A function F C 0 (E) is said to respect Kirchhoff s cycle law if for any
H
cycle : x x, F = 0.
Gustav Kirchhoff
Any gradient respects Kirchhoffs cycle law, as shown above. But the converse also holds:
(1824-1887)

Proposition 11.3. F C 0 (E) respects Kirchhoff s cycle law if and only if there exists
f C 0 (V ) such that F = f .
R
In other words, if F respects Kirchhoffs cycle law, then we can define F := f for any f
R
such that f = F , and then all representations of F differ by some constant.

Proof. We only need to prove the only if direction.

Assume that F respects Kirchhoffs cycle law. First, note that F must be anti-symmetric.
Indeed, for x y, the path (x, y, x) is a cycle, and
I
F (x, y) + F (y, x) = c(x, y) F = 0.
(x,y,x)

Now, fix x, y G and let : x y and : x y. Then, the path = = (0 , . . . , || , ||1 , . . . , 0 )

is a cycle : x x. So
I I I I I
F F = F+ F = F = 0.

H
So F does not depend on the choice of : x y, but only on the endpoints x and y.

H
Fix some a G and for any x G define f (x) = F for some : x a, with the convention
that f (a) = 0. It is clear that for any x y,
1
I
F (x, y) = F = f (x) f (y).
c(x, y) (x,y)

So F = f . t
u
69

11.2. Electrical Networks

Let G = (V, c) be a network. For each edge x y, define the resistance of the edge to be
1
r(x, y) = c(x,y) . Let A, Z G be two disjoint subsets.
If we were physicists, we could enforce voltage 1 on A, voltage 0 on Z, and look at the voltage
and current flowing through the graph G, where each edge is a r(x, y)-Ohm resistor. According
V
to Ohms law, the current equals the potential difference divided by the resistance I = R .
Kirchhoff would reformulate this telling us that the total current out of each node should be 0,
except for those nodes in A Z.
Let us turn this into a mathematical definition. The physics will only serve as intuition (albeit
usually good intuition).

Definition 11.4. Let G = (V, c) be a network. Let A, Z be disjoint subsets of G.

A voltage imposed on A and Z is a function v : G R that is harmonic in G \ (A Z).
A unit voltage is a voltage v with v(a) = 1 for all a A and v(z) = 0 for all z Z.
Given a voltage v, the current induced by v is defined I(x, y) = v(x, y) = c(x, y)(v(x)v(y))
for all oriented edges x y.

v(x)v(y)
X Note that this has the form I(x, y) = r(x,y) , which is the form of Ohms law.

Georg Ohm (1789-1854)

Definition 11.5. Let G = (V, c) be a network, and let A, Z be disjoint subsets of G. A flow
from A to Z is a function F on oriented edges of G satisfying:

F is anti-symmetric: For every edge x y, F (x, y) = F (y, x).

1
P
F is divergence free: For every x G \ (A Z), divF (x) = yx cx (F (x, y) F (y, x)) =
0. (A function being divergence free is sometimes said to respect Kirchhoff s node
law.)

X For simplicity, we will sometimes extend a flow F to all pairs (x, y) by defining F (x, y) = 0

for x 6 y.

Example 11.6. If v is a voltage, then the current induced by v is a flow; indeed,

I(x, y) = c(x, y)(v(x) v(y)) = c(y, x)(v(y) v(x)) = I(y, x),

and for x 6 A Z,
X 1 X c(x, y)
divI(x) = (I(x, y) I(y, x)) = 2 (v(x) v(y)) = 2v(x) = 0.
c
yx x yx
cx

This fact is Kirchhoffs node law. 454

Example 11.7. If v is a voltage, and I is the current induced by v, then we have Kirchhoffs
cycle law: for any cycle : x x, = (x = 0 , 1 , . . . , n = x),
n1
X n1
X
I(xj , xj+1 )r(xj , xj+1 ) = v(xj ) v(xj+1 ) = v(x) v(x) = 0.
j=0 j=0

This of course is due to the fact that any derivative v respects Kirchhoffs cycle law. 4 5 4

Exercise 11.1. Let G = (V, c) be a finite network and A, Z disjoint subsets of

G. Let I be flow from A to Z, that satisfies Kirchhoffs cycle law: for any cycle : x x,
= (x = 0 , 1 , . . . , n = x),
n1
X
I(xj , xj+1 )r(xj , xj+1 ) = 0.
j=0

Then, there exists a voltage v such that I is induced by v. Moreover, if u, v are two such voltages,
then v u = , for some constant .

11.3. Probability and Electric Networks

Since voltages are harmonic functions, it is not surprising that there is a connection between
probability and electric networks. Let us elaborate on this.

Definition 11.8. Let G = (V, c) be a network. Let a G and Z G. Let v be a voltage

such that v(z) = 0 for all z Z and v(a) = 1. Define the effective conductance from a to Z
by
X ca
Ceff (a, Z) : = I(a, x) = divI(a)
x
2
X
= c(a, x)(v(a) v(x)) = ca v(a),
x

where I is the current induced by v.

The effective resistance is defined as the reciprocal of the effective conductance.

Reff (a, Z) := (Ceff (a, Z))1 .

Proposition 11.9. Let G = (V, c) be a network. Let {a} , Z be disjoint subsets. Let v be a
voltage such that v(z) = 0 for all z Z, and v(a) 6= 0 arbitrary. Let I be the current induced by
v. Then,
P
I(a,x) ca v(a)
Ceff (a, Z) = x
v(a) = v(a) .

If the component of a in G \ Z is finite, then Ceff (a, Z) = ca Pa [TZ < Ta+ ]. Specifically,
in this case Ceff (a, Z) does not depend on the choice of the voltage.

1
Proof. The first bullet follows from the fact that u = v(a) v is a voltage with u(z) = 0 for all
z Z and u(a) = 1, and v(a)u = v.
For the second bullet, let D be the component of a in G \ Z. We have two harmonic functions
v
on D: u = v(a) and Px [TZ > Ta ], which are 0 on Z and 1 on a. Thus, these functions are equal,
because D is finite. Now,

X 1 X
ca Pa [TZ < Ta+ ] = ca P (a, x)(1 u(x)) = c(a, x)(v(a) v(x))
x
v(a) x
ca v(a)
= = Ceff (a, Z).
v(a)

t
u

11.4. Resistance to Infinity

Example 11.10. Let G = (V, c) be an infinite network, and let a G. Let (Gn )n be an
S
increasing sequence of finite connected subgraphs of G, that contain a, such that G = n Gn
(in this case we say that (Gn )n exhaust G).
For every n, let Zn = G \ Gn . Note that the connected component of a in G \ Zn is Gn which
is finite. Thus, we can consider the effective conductance from a to Zn , Ceff (a, Zn ). This is a
sequence of numbers, which converges to a limit; indeed, if Ta+ < , since X[0, Ta+ ] is a finite
path, there exists n0 such that for all n > n0 , X[0, Ta+ ] Gn . The events {TZn < Ta+ } form a
decreasing sequence, so

lim Ceff (a, Zn ) = ca lim Pa [TZn < Ta+ ] = ca Pa [Ta+ = ].

n n
72

Thus, we see that limn Ceff (a, Zn ) does not depend on the choice of the exhausting subgraphs
(Gn )n , and

(G, c) is recurrent lim Ceff (a, Zn ) = 0 lim Reff (a, Zn ) = .

n n

454

In light of the above:

Definition 11.11. Let G = (V, c) be an infinite network, and let a G. Let (Gn )n be an
S
increasing sequence of finite connected subgraphs of G, that contain a, such that G = n Gn .
Let Zn = G \ Gn .
Define the conductance from a to infinity and resistance from a to infinity as

Ceff (a, ) = lim Ceff (a, Zn ) and Reff (a, ) = Ceff (a, )1 .
n

Thus, the theorem is:

Theorem 11.12. The weighted random walk in a network G is recurrent if and only if the
resistance from some vertex a to infinity is infinite.

Number of exercises in lecture: 1

Total number of exercises until here: 20
73

Random Walks

Ariel Yadin

Lecture 12: Network Reduction

12.1. Network Reduction

Recall that
Ceff (a, ) = ca Pa [Ta+ = ].

So the effective resistance or conductance to infinity will not help us decide whether (G, c) is
recurrent unless we have a way of simplifying a sequence finite networks Gn .
We will now compute a few operations that will help us reduce networks to simpler ones
without changing the effective conductance between a and Z. Thus, it will give us the ability to
compute probabilities on some networks.
When we wish two differentiate between effective conductance (or resistance) in two networks,
we will use Ceff (a, Z; G) and Ceff (a, Z; G0 ).

12.1.1. Parallel Law.

Exercise 12.1. Suppose (G, c) is a network with multiple edges. Let (G0 , c0 ) be the
network without multiple edges where the weight c0 (x, y) is the sum of all weights between x
and y in (G, c). That is,
X
c0 (x, y) = c(e).
eE(G),e+ =x,e =y

Then, (G0 , c0 ) is a network without multiple edges, and the weighted random walk on (G0 , c0 )
has the same distribution as the weighted random walk on (G, c).
Specifically, for all a, Z the effective conductance between a and Z does not change.

Solution. This is just the fact that the transition probabilities for (G, c) and (G0 , c0 ) are propor-
tional to each-other:
X
P (x, y) c(e) = c0 (x, y) P 0 (x, y).
e : e+ =x,e =y
74

t
u

x y
c1

x c1 + c2 y

Figure 3. Parallel Law

12.1.2. Series Law.

Proposition 12.1 (Series Law). Let (G, c) be a network. Suppose there exists w that has
exactly two adjacent vertices u1 , u2 .
Let (G0 , c0 ) be the network given by V (G0 ) = V (G) \ {w}, and

c(x, y) x, y V (G0 ) , {x, y} =
6 {u1 , u2 }

0
c (x, y) =
1
r(u1 ,w)+r(u2 ,w) + c(u1 , u2 ) {x, y} = {u1 , u2 } .

1
That is, we remove the edges between u1 w and u2 w and add weight c(u1 ,w)1 +c(u2 ,w)1 to
the edge u1 u2 (which may have originally had weight 0).
Then, for any a, Z such that w 6 {a} Z, and such that the component of a in G \ Z is finite,
we have that Ceff (a, Z; G) = Ceff (a, Z; G0 ).

Proof. Let (G0 , c0 ) be a network identical to (G, c) except that c0 (u1 , w) = c0 (u2 , w) = 0 and
c0 (u1 , u2 ) = c(u1 , u2 ) + C. We want to calculate C so that any function that is harmonic at u1 , w
on G will be harmonic at u1 on G0 as well.
Let f : G R be harmonic at u1 , w on G. If f (u1 ) = f (w), then harmonicity at w together
with the fact that w is adjacent only to u1 , u2 , give that f (u1 ) = f (w) = f (u2 ). So the weight
of the edges between u1 , u2 , w does not affect harmonicity of function, and can be changed.
75

f f (w)
Hence, we assume that f (u1 ) 6= f (w). Let h = f (u1 )f (w) . So h is harmonic at u1 , w and
h(w) = 0 and h(u1 ) = 1. Harmonicity at u1 gives that
X
c(u1 , y)(h(u1 ) h(y)) = c(u1 , w)(h(u1 ) h(w)) = c(u1 , w).
y6=w

Harmonicity at w gives

c(u1 , w) + c(u2 , w)h(u2 ) = 0.

Thus, in order for h to be harmonic at u1 on G0 , we require that

X c(u1 , w)
0= c(u1 , y)(h(u1 ) h(y)) + C(h(u1 ) h(u2 )) = c(u1 , w) + C 1 + .
c(u2 , w)
y6=w

This leads to the equation

c(u1 , w) c(u2 , w) 1
C= = .
c(u1 , w) + c(u2 , w) r(u1 , w) + r(u2 , w)
1
Thus, we have shown that choosing the weight r(u1 ,w)+r(u2 ,w) as above, we get that if f is
0
harmonic at u1 , w on G, then f is also harmonic at u1 on G . Taking u1 to play the role of u2 ,
the same holds if f is harmonic at u2 and w on G.
Let a, Z be as in the proposition. Let D be the component of a in G \ Z. Let v be a unit
voltage imposed on a and Z in D. Since we chose the weight on u1 u2 in G0 correctly, we get
that v is also a unit voltage imposed on a and Z in G0 .
Because Ceff (a, Z; G) = y v(a, y) and similarly in G0 , and since G \ Z and G0 \ Z only
P

differ at edges adjacent to u1 , u2 and w, we have that Ceff (a, Z; G) Ceff (a, Z; G0 ) = 0 for all
a 6 {u1 , u2 }.
Now, if a = u1 then we have by harmonicity of v at w,

(c(u1 , w) + c(u2 , w))v(w) = c(u1 , w)v(a) + c(u2 , w)v(u2 ).

Since the only difference is on edges adjacent to u1 , u2 and w,

1
Ceff (a, Z; G) Ceff (a, Z; G0 ) = c(a, w)(v(a) v(w)) (v(a) v(u2 ))
r(u1 , w) + r(u2 , w)
c(u1 , w)
= ((c(u1 , w) + c(u2 , w))(v(a) v(w)) + c(u2 , w)(v(a) v(u2 )))
c(u1 , w) + c(u2 , w)
= 0.

t
u
76

Remark 12.2. Note that if w has exactly 2 neighbors in a network (G, c) as above, with resistances
r1 , r2 on these edges, then the network with these two resistors exchanged with a single resistor of
resistance r1 +r2 is an equivalent network in the sense that effective resistances and conductances
do note change, as above.

c1 c2
u1 w u2

1
1
c1
+ c1
2

u1 u2

Figure 4. Series Law

Example 12.3. What is the effective conductance between a and z in the following network:

a z a z

17/24

1/2
1/3
a 1/2 z a z

1/2
1/2
3/8
1/2

1/2
a z a z

3/2 3/5
77

454

12.1.3. Contracting Equal Voltages.

Exercise 12.2. Let (G, c) be a network, and let v be a unit voltage imposed on a and
Z. Suppose x, y 6 {a} Z are such that v(x) = v(y). Define (G0 , c0 ) by contracting x, y to the
same vertex; that is: V (G0 ) is V (G) with the vertices x, y removed and a new vertex xy instead.
All edges and weights stay the same, except for those adjacent to x or y, for which we have
c0 (xy, w) = c(x, w) + c(y, w) for all w.
Then, v is a unit voltage imposed on a and Z in G0 (where v(xy) := v(x) = v(y)), and the
effective conductance between a and Z does not change: Ceff (a, Z; G) = Ceff (a, Z; G0 ).

Solution. Since the only change is at edges adjacent to x and y, we only need to check that for
w = xy or w xy such that w 6 {a} Z, v is harmonic at w in G0 .
For w xy,

X X
c0 (w, u)(v(w) v(u)) = c(w, u)(v(w) v(u)) + (c(w, x) + c(w, y))(v(w) v(xy))
u u6=x,y
X
= c(w, u)(v(w) v(u)),
u

where we have used that v(xy) = v(x) = v(y). So if v is harmonic at w in G then v is harmonic
at w in G0 .
Similarly, for w = xy,

X X X
c0 (xy, u)(v(xy) v(u)) = c(x, u)(v(x) v(u)) + c(y, u)(v(y) v(u)),
u u u

so v is harmonic at xy in G0 . t
u

Example 12.4. What is the effective conductance between a and z in the following network:
78

a z

a z a z

2/3

1/2
a z a z

2 2
1/2

454

Exercise 12.3. Let (G, c) be a network, and let v be a unit voltage imposed on a and Z.
Suppose x, y 6 {a} Z are such that v(x) = v(y). Let c0 be a new weight function on G that is
identical to c except for the edge x y. For x y let c0 (x, y) = C 0 some arbitrary number,
possibly 0. Let 0 be the Laplacian on (G, c0 ).
Then, v is harmonic in G \ ({a} Z) also with respect to c0 . Conclude that the effective
conductance between a and Z is the same in both (G, c) and (G, c0 ).

Solution. Since the difference is only the edge x y, we only need to check that the harmonicity
is preserved at x and y. Because v(x) v(y) = 0,
X
c0z 0 v(z) = c0 (z, w)(v(z) v(w))
w
X
= c(z, w)(v(z) v(w)) + c0 (x, y)(v(x) v(y)) = cz v(z).
w : {z,w}6={x,y}
79

Thus v is a unit voltage imposed on a and Z with respect to c0 as well. Also,

Ceff (a, Z; (G, c0 )) = c0a 0 v(a) = ca v(a) = Ceff (a, Z; (G, c)).

t
u

Example 12.5. The network from the previous example can be reduced by removing the vertical
edge. 454

Exercise 12.4. Let G = (V, c) be a network such that V = Z and x y if and only if
|x y| = 1. For the weighted random walk (Xt )t on G define
t
X
Vt (x) = 1{Xn =x}
n=0

the number of visits to x up to time t. Let T0+ = inf {t 1 : Xt = 0}.

Calculate E0 [VT + (x)] as a function of c only.
0

Number of exercises in lecture: 4

Total number of exercises until here: 24
80

Random Walks

Ariel Yadin

Lecture 13: Thompsons Principle

Suppose G is a network. We think of weights as conductances, so it seems intuitive that

increasing the conductance of edges would result in making the graph more transient. This is
what we prove in this lecture.

13.1. Thomsons Principle

Definition 13.1. For F L2 (E) and for v such that v L2 (E) define the energy of F
and of v by
X X
E(F ) := hF, F i = r(e)F (e)2 and E(v) := hv, vi = c(x, y)(v(x) v(y))2 .
e xy

X Note that if v L2 (V ) then E(v) = 2hv, vi by the duality formula.

Lemma 13.2 (Thomsons Principle / Dirichlet Principle). Let G = (V, c) be a finite network,
let A, Z be disjoint subsets.
Joseph John Thomson The unit voltage v is the function that minimizes the energy E(f ) over all functions f with
(1856-1940) f (a) = 1 for all a A and f (z) = 0 for all z Z.

Proof. By the duality formula we have that for any f, f 0 C 0 (V ),

hf, f 0 i = 21 hf, f 0 i = hf, f 0 i.

(That is, the Laplacian is self-dual.) Since f v = 0 on A Z, and since v is harmonic off A Z,
we get that (f v)v 0. So,
X
h(f v), vi = hf v, vi = cx (f (x) v(x))v(x) = 0.
x

This implies

E(f ) = 2h(f v + v), f v + vi = E(f v) + E(v) E(v),

where we have used that the energy is always non-negative. t

u
Johann Dirichlet

(1805-1859)
81

Lemma 13.3 (Thomsons Principle - Dual Form). Let G be a finite network, let {a} , Z be
disjoint subsets. Let v(x) = 12 gZ (x, a) where gZ (x, a) is the Green function (the expected number
of visits to a started at x from time 0 until before hitting Z).
Then, over all flows F from a to Z with flow divF (a) = 1, the energy E(F ) is minimized at
I = v.

Proof. First, we know that v is a voltage on a and Z with v(z) = 0 for all z Z. Also,
divI(a) = 2v(a) = 1.
Let F be a flow from a to Z with divF (a) = 1. Then, F I is a flow from a to Z with
div(F I)(a) = 0. Since div(F I) is 0 off Z, and v is 0 on Z, we get that div(F I)v 0.
Thus,

hF I, Ii = hdiv(F I), vi = 0.

So,
E(F ) = E(F I + I) = E(F I) + E(I) E(I).

t
u

Corollary 13.4 (Rayleighs Monotonicity Principle). Let G be a finite network, and let a be
a point not in a subset Z. Suppose c0 is a weight function on G such that c c0 . Then,
John William Strutt, 3rd
Ceff (a, Z; c) Ceff (a, Z; c0 ).
Baron Rayleigh (1842-1919)

Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c0 .
Note that
X
E(v) = 2hv, vi = 2 cx v(x)v(x) = 2ca v(a) = 2Ceff (a, Z; c),
x

because v(x) = 0 for x 6 {a} Z, v(z) = 0 for z Z, and v(a) = 1. Similarly, E(u) =
2Ceff (a, Z; c0 ). (This fact is called conservation of energy.)
Since ca c0a , using Thomsons principle,
X X
Ceff (a, Z; c) = 1
2 c(x, y)(v(x) v(y))2 1
2 c(x, y)(u(x) u(y))2
x,y x,y
X
1
2 c0 (x, y)(u(x) u(y))2 = Ceff (a, Z; c0 ).
x,y

t
u
82

Corollary 13.5. Let G be an infinite network. Let c0 be a weight function on G such that
c0 c.
If (G, c) is transient, then also (G, c0 ) is transient.

Proof. Fix a vertex o G. For every n, let Gn be the ball of radius n around o; that is

Gn = {x G : dist(x, o) n} .

So (Gn )n form an increasing sequence of subgraphs that exhaust G. Let Zn = Gn+1 \ Gn , which
is the outer boundary of the ball of radius n. We know that G is transient, which is equivalent
to
lim Reff (a, Zn ; c) <
n

(because imposing a unit voltage on a and G \ Gn is the same as imposing a unit voltage on a
and Zn ). Now, for each fixed n, since c0 c, considering the finite networks (Gn , c) and (Gn , c0 ),
we have that
Ceff (a, Zn ; c) Ceff (a, Zn , c0 ).

Thus,
lim Reff (a, Zn ; c0 ) lim Reff (a, Zn ; c) < ,
n n
0
so (G, c ) is transient. t
u

Exercise 13.1. Let H be a subgraph of a graph G (not necessarily spanning all vertices
of G. Show that if the simple random walk on H is transient then so is the simple random walk
on G.

13.2. Shorting

Another intuitive network operation that can be done is to short two vertices. This can be
thought of as imposing a conductance of between them. Since this increases the conductance,
it is intuitive that this will increase the effective conductance.

Proposition 13.6. Let (G, c) be a finite network. Let b, d G and define (G0 , c0 ) by shorting
b and d: Let V (G0 ) = V (G) \ {b, d} {bd} and c0 (z, w) = c(z, w) for z, w 6 {b, d} and c0 (bd, w) =
c(b, w) + c(d, w).
Then, for any disjoint sets {a} , Z, we have that Ceff (a, Z; G) Ceff (a, Z; G0 ).
83

Proof. Let v be the unit voltage imposed on a and Z with respect to c, and let u be the unit
voltage imposed on a and Z with respect to c0 .
Conservation of energy tells us that
X X
2Ceff (a, Z; c) = c(x, y)(v(x) v(y))2 and 2Ceff (a, Z; c0 ) = c0 (x, y)(u(x) u(y))2 .
x,y x,y

Note that u can be viewed as a function on V (G) by setting u(b) = u(d) = u(bd). Since ca c0a ,
using Thomsons principle,
X X
Ceff (a, Z; c0 ) = 1
2 c0 (x, y)(u(x) u(y))2 + c0 (bd, w)(u(bd) u(w))2
x,yG\{b,d} w
X X X
= 1
2 c(x, y)(u(x) u(y))2 + c(k, w)(u(k) u(w))2
x,yG\{b,d} k=b,d w
X
= 1
2 c(x, y)(u(x) u(y))2
x,yG
X
1
2 c(x, y)(v(x) v(y))2 = Ceff (a, Z; c). t
u
x,yG

Number of exercises in lecture: 1

Total number of exercises until here: 25
84

Random Walks

Ariel Yadin

Lecture 14: Nash-Williams

14.1. A Probabilistic Interpretation of Current

Proposition 14.1. Let (G, c) be a network. Let a G and Z G such that the component
of a in G \ Z is finite.
For the weighted random walk on G, (Xt )t , and for any edge x y, let Vx,y be the number of
times the walk goes from x to y until hitting Z; that is,
TZ
X
Vx,y := 1{Xk1 =x,Xk =y} .
k=1

Then,
Ea [Vx,y Vy,x ] = v(x, y) Reff (a, Z),

where v is a unit voltage imposed on a, Z.

Proof. Let
Z 1
TX
ca ca
g(x) = gZ (a, x) = Ea 1{Xk =x} = gZ (x, a).
cx cx
k=0
We have already seen that g is harmonic in G \ ({a} Z). Also, g(z) = 0 for all z and
1
g(a) = = ca Reff (a, Z).
Pa [TZ < Ta+ ]
(g is a voltage imposed on a, Z with g(z) = 0 for all z Z.)
Now,

X
Ea [Vx,y ] = Pa [Xk1 = x, Xk = y, TZ > k 1]
k=1

X 1 1
= Pa [Xk = x, TZ > k]P (x, y) = cx g(x)P (x, y) = g(x)c(x, y).
ca ca
k=0

Thus,
1
Ea [Vx,y Vy,x ] = c(x, y)(g(x) g(y)).
ca
g
That is, since v = g(a) is a unit voltage imposed on a, Z, and since c1a g = Reff (a, Z)v,

Ea [Vx,y Vy,x ] = Reff (a, Z) c(x, y)(v(x) v(y)). t

u
85

14.2. The Nash-Williams Criterion

Definition 14.2. Let G be a graph. Let A, Z be disjoint subsets.

A subset of edges is a cut between A and Z if any path : a z with a A and z Z
must pass through and edge in .
A subset of edges is a cutest, (sometimes a cut between A and ), if any infinite simple
path that starts at a A, must pass through an edge in .

One intuitive statement is that if e is a cut edge between a and Z, then Reff (a, Z) r(e),
because there is at least some more resistance between a and Z.

Proposition 14.3. Let (G, c) be a finite network. Let {a} , Z be disjoint subsets and let e be
a cut edge between a and Z. Then Reff (a, Z) r(e).

Proof. Suppose that e = (x, y). Let Vx,y be the number of times a random walk crosses the edge
(x, y) until hitting Z and let Vy,x be the number of times the walk crosses y, x before hitting Z.
We have seen that
Ea [Vx,y Vy,x ] = v(x, y) Reff (a, Z),

where v is a unit voltage imposed on a and Z.

Because G is finite, we know by uniqueness of harmonic functions that v(x) = Px [Ta < TZ ].
Because (x, y) is a cut edge between a and Z, to get from y to a the walk must pass through x;
that is,
v(y) = Py [Ta < TZ ] = Py [Tx < TZ ] Px [Ta < TZ ] v(x).

So 0 v(x) v(y) 1.
Now, since (x, y) is a cut edge between a and Z, we must have that Vx,y Vy,x 1, because
the walk must cross the edge (x, y) and every time it crosses back over (y, x) it must return to
cross (x, y). Thus,

1 c(x, y)(v(x) v(y))Reff (a, Z) c(x, y)Reff (a, Z).

t
u

If is a cut between a and Z then shorting all edges in would result in a cut edge of
P
conductance at most e c(e). A natural generalization of the above is the following.

Lemma 14.4 (Nash-Williams Inequality). Let (G, c) be a finite network, and {a} , Z disjoint
sets. Suppose that 1 , . . . , k are k pairwise disjoint cuts between a and Z. Then,
Crispin Nash-Williams

(1932-2001)
86

1
k
X X
Reff (a, Z) c(e) .
j=1 ej

Proof. Note that since removing edges from a cut-set only increases the right hand side, we
can prove the lemma with the assumption that cut-sets are minimal. Specifically, they do not
contain both (x, y) and (y, x) for an edge x y.
Let v be a unit voltage imposed on a and Z. We know (conservation of energy) that 21 E(v) =
Ceff (a, Z).
For an edge (x, y) let Vx,y be the number of crossings from x to y until hitting Z; that is,

TZ
X
Vx,y = 1{Xk1 =x,Xk =y} .
k=1

Then, for any minimal cut between a and Z, we have that Pa -a.s.

X
Vx,y Vy,x 1.
(x,y)

Also, we have that for any edge (x, y),

Ea [Vx,y Vy,x ] = v(x, y) Reff (a, Z).

Thus, applying Cauchy-Schwarz, for any cut between a and Z,

2
X X X
1 Reff (a, Z)2 v(x, y) Reff (a, Z)2 c(x, y) c(x, y)(v(x) v(y))2 .
(x,y) (x,y) (x,y)

That is, for any one of the cuts j ,

1
X X
c(e) Reff (a, z)2 c(x, y)(v(x) v(y))2 .
ej (x,y)

Since the cuts j are disjoint, and since we assumed that the cut does not contain both (x, y)
and (y, x) (because they are minimal), we have that
1
k
X X 1X
c(e) Reff (a, z)2 c(x, y)(v(x) v(y))2 = Reff (a, Z).
j=1
2 x,y
ej

t
u
87

Corollary 14.5 (Nash-Williams Criterion). Let (G, c) be an infinite network. If (n )n is a

sequence of pairwise disjoint finite cutsets between a and such that

!1
X X
c(e) = ,
n=1 en

then (G, c) is recurrent.

Proof. Fix n. Let Gn be subnetwork induced by (G, c) on the smallest ball (in the graph metric)
Sn
that contains j=1 j . Let Zn = G \ Gn .
So (Gn )n exhaust G and for each fixed n,
1
n
X X
Reff (a, Zn ) c(e) .
j=1 ej

Letting n , the left hand side tends to Reff (a, ) and the right hand side tends to the
infinite sum. Since Reff (a, ) , (G, c) is recurrent. t
u

Example 14.6. We now give a proof that Z and Z2 are recurrent.

Recall that we could prove this by showing that

1/ t d = 1

P0 [X2t = 0] const .
1/t

d=2

However, it will be easier to do this without these calculations (especially the more complicated
Z2 case).
For Z this is easy, because Z is just composed of edges in series, so for any n > 0,

Reff (0, {n, n}) = n/2 .

Now for Z2 : By the Nash-Williams criterion, it suffices to find disjoint cutsets (n )n such
that
X 1
= .
n
|n |

Indeed, taking

n = {(x, y) : ||x|| = n, ||y|| = n + 1}

we have that |n | = 4(2n + 1). 454

X ADD: example that Nash-Williams is not necessary.

Number of exercises in lecture: 0

Total number of exercises until here: 25
89

Random Walks

Ariel Yadin

Lecture 15: Flows

15.1. Finite Energy Flows

The Nash-Williams criterion was a sufficient condition for recurrence. We now turn to a
stronger condition which is necessary and sufficient.
Let (G, c) be an infinite weighted graph. Recall that a flow from A to Z is an anti-symmetric
function with vanishing divergence off A Z.
In this spirit, we say that F is a flow from o G to if

F is anti-symmetric.
divF (o) 6= 0 and divF (x) = 0 for all x 6= o.

If in addition divF (o) = 1 we say that F is a unit flow from o to infinity.

Theorem 15.1 (T. Lyons, 1983). A weighted graph (G, c) is transient if and only if there
exists a finite energy flow on (G, c) from some vertex o G to .
Terrence Lyons

Proof. The proof is an adaptation of a method of H. Royden in the continuous world.

Assume that F is a flow from o to . By changing F 7 F/divF (o) we can assume without
loss of generality that F is a unit flow.
For each n let Gn be the finite subnetwork of G induced on the ball of radius n around o (in
the graph metric). Let Zn = Gn \ Gn1 . Transience of G is equivalent to

lim Reff (o, Zn ) < .

1
Let vn = 2 gZn (x, o), where gZn is the Green function on the finite network Gn . Since
divvn (o) = 2vn (o) = 1, the dual version of Thompsons principle tells us that E(F )
EGn (F ) EGn (vn ).
Also, since
1 co
vn (o) = 21 gZn (o, o) = = Reff (o, Zn )
2 Po [TZn < To+ ] 2
90

and vn (o) = 21 , we get that

X c2o
EGn (vn ) = 2 cx vn (x)vn (x) = 2co vn (o)vn (o) = Reff (o, Zn ).
x
2

Thus, if F has finite energy on G then

2 2
lim Reff (o, Zn ) = lim EG (vn ) 2 E(F ) <
n c20 n n co
and (G, c) is transient.
For the other direction, assume that (G, c) is transient and consider the functions vn (x) =
Px [To < TZn ] and v(x) = Px [To < ]. vn is a unit voltage imposed on o and Zn in Gn , and
vn (x) % v(x) for every x by monotone convergence. Note that v(o) = vn (o) = 1 and v, vn are
non-constant because (G, c) is transient.
Let In = vn and I = v. Note that for every edge e, In (e)2 r(e) I(e)2 r(e). Also, for
every n, since Gn is finite,

E(In ) = 2hvn , vn i = 2co vn (o)vn (o) = 2Ceff (o, Zn ) 2co < .

Thus, Fatous lemma (for sums) tells us that

X X
E(I) = lim In (e)2 r(e) lim inf In (e)2 r(e) 2co < .
n n
e e

Since I = v, we have that

X
divI(o) = 2v(o) = 2 P (o, y)(1 Py [To < ]) = 2 Po [To+ = ] > 0
y

by transience, and divI(x) = 2v(x) = 0 for all x 6= o. That is, I is a flow from o to with
finite energy. t
u

15.2. Flows on Zd

We now want to give some more details about random walks on Zd .

We start with a proof that Zd is transient for d 3. By Rayleighs monotonicity principle it
suffices to prove that Z3 is transient. By Lyons Theorem it suffices to provide a finite energy
flow on Z3 .
Let be a probability measure on paths on some graph G. Let denote the random path,
and suppose that -a.s. every vertex of G is visited finitely many times. Then, we can define
V (x, y) to be the number of times crosses the edge (x, y), and E (x, y) to be the expectation
of V (x, y) under .
91

Claim 15.2. Suppose that is infinite and 0 = o -a.s. Suppose also that E (x, y) < for
every edge (x, y).
Then, F (x, y) := E (x, y) E (y, x) is a flow from o to .

Proof. Anti-symmetry is clear. Also, for any x 6= o, since is infinite, it cannot terminate at x.
Thus, every time crosses an edge (y, x) it must then cross an edge (x, z) immediately after.
P
Thus, -a.s. yx V (x, y) V (y, x) = 0 and so divF (x) = 0.
Also, since 0 = o, we get one extra passage out of o, but the rest must cancel: co divF (o) =
P
2 y F (o, y) = 2. t
u

That is, to show a graph is transient, we need to construct a measure on infinite paths, that
start at some vertex, and the expected number of visits to any vertex is finite. If the energy is
finite for such a measure, we have transience.

15.2.1. Wedges. Let us prove something a bit more general than Z3 being transient and Z2
being recurrent.
Let : N N be an increasing function. Consider the subgraph of Z3 induced on

W = (x, y, z) Z3 : |z| (|x|) .

(This is the -wedge.)

Theorem 15.3 (T. Lyons 1983). If

X 1
=
n=1
n((n) + 1)

then W is recurrent.
If (n + 1) (n) 1 and

X 1
<
n=1
n((n) + 1)
then W is transient.

Proof. The first direction is simpler. Let W be a wedge, and let Bn denote the ball of radius n
around 0 in the graph metric (which is the L1 distance in R3 ). Let Bn be the edges connecting
Bn to Bnc . Thus, Bn form disjoint cutsets between 0 and .
What is the size of Bn ? Well there are at most 2n choices for x and then, given x there are
at most 2((|x|) + 1) 2((n) + 1) choices for z, which then determines y up to sign. Thus, the
92

size is bounded by |Bn | O(n((n) + 1)). So Nash-Williams tells us that if

X 1
=
n=1
n((n) + 1)

the walk is recurrent.

Now for the other direction. We define a measure on paths in W . Let U1 , U2 be uniformly
chosen on [0, 1] independently. Let L be the set {(n, U n, U 0 (n)) : n N}. Choose a monotone
path in W that is always at distance at most 1 from L. (A monotone path is a path in Z3
such that dist(t+1 , 0) dist(t , 0).)
Fix an edge e in W and suppose that (x, y, z) is an endpoint of e. Let R = |x| + |y| + |z|.
The event that e implies that (x, y, z) is at distance at most 1 from L. That implies that
R1
3n n + U n + U 0 (n) R 1 n
3
(where we have used that (n) n). Also

|x n| + |y U n| + |z U 0 (n)| 1

so nU [y 1, y + 1] and (n)U 0 [z 1, z + 1]. Thus,

4
[e ] .
n((n) + 1)
Because visits any edge at most once, this is also a bound on E (e). Since there are at most
O(R((R) + 1)) O(n((n) + 1)) such possibilities for (x, y, z) W , we have that the energy
of the flow is at most
X X X X 1
2 E (e)2 const. n((n) + 1)
n
n2 ((n) + 1)2
R |x|+|y|+|z|=R e : e+ =(x,y,z)
X 1
= const. .
n
n((n) + 1)
Since this is finite, the flow is of finite energy and the wedge is transient. t
u

Example 15.4. For example, if we choose (n) = n , we get a transient wedge. This is also
true if we take (n) = (log n)2 .
If we choose (n) = 1 we get essentially Z2 and recurrence, of course. Also, (n) = log n will
give a divergent sum, so this wedge is recurrent. 454

Number of exercises in lecture: 0

Total number of exercises until here: 25
93

Random Walks

Ariel Yadin

Lecture 16: Resistance in Euclidean Lattices

Lets wrap up our discussions with some examples of random walks on graphs.

16.1. Euclidean lattices

We have already seen that Zd are transient for d 3 and recurrent for d 2. We saw two
different methods to prove this.
The first was brute force computation of P0 [St = 0], using Stirlings formula, and then ap-
proximating E0 [Vt (0)] and E0 [V (0)].
The second method was more robust, and less computational. It involved approximating the
energy of certain flows, mainly taking a uniform direction and following that direction with a
path in the lattice.
Energy estimates and the Nash-Williams inequality can give us better control of the effective
resistance and Green function.

16.1.1. Resistance Estimates. Since Zd , d 3, is transient, we know that Reff (0, Bn ) is

bounded by two constants, where Bn is the boundary of the ball of radius n around 0.
However, for d = 2 we know that Reff (0, Bn ) . We now investigate the growth rate of
this function.

z Z2 : dist(0, z) n . Then, there exist constants 0 <

Proposition 16.1. Let Zn =
c, C < independent of n such that

c log n Reff (0, Zn ) C log n.

Proof. The lower bound follows by noting that for the sets

n = (z, z 0 ) Z2 : dist(z, 0) = n 1, dist(z 0 , 0) = n ,

all 1 , 2 , . . . , n are cuts between 0 and Zn , with size |n | = O(n). So the Nash-Williams
inequality gives
n n
X 1 X 1
Reff (0, Zn ) const. = const. log n.
|k | k
k=1 k=1
94

For the other direction, let vn (x) = 14 gZn (x, 0). So vn is a voltage imposed on 0 and Zn , with
vn (0) = 1
4 and vn (0) = (4 P0 [T0+ > TZn ])1 = Reff (0, Zn ). Also,

E(vn ) = 8vn (0)vn (0) = 2Reff (0, Zn ).

Let U be a uniform random variable in [0, 1], and let L = (n, U n) R2 . Let be some

random monotone path from 0 that always is at distance at most 1 from L. For any edge
e = (x, y) in Z2 , the event e implies that |x n| 1 and nU [y 1, y + 1]. Thus, the
2 2
expected number of times crosses e is at most n |x|1 . Let Fn be the flow given by this
random path restricted to G \ Zn . Since the number of edges with endpoint at distance n from
0 is O(n),
n
X
E(Fn ) O(k k 2 ) = O(log n).
k=1

Recall that divFn (0) = 1/2, so Thompsons principle tells us that for I = vn , since I is a
current with divI(0) = 2vn (0) = 21 ,

E(Fn ) E(I) = E(vn ) = 2Reff (0, Zn ).

t
u

Remark 16.2. If we tried to adapt the argument above to Zd , we would see that the probability
that an edge e at distance n from 0 is in is at most O(n(d1) ) (because we would be looking
at the direction (n, U1 n, U2 n, . . . , Ud1 n) for U1 , . . . , Ud i.i.d. ). Thus,
n
X n
X
Reff (0, Zn ) 21 E(Fn ) O(k d1 k 2(d1) ) = O(k 1d ) = O(n2d ).
k=1 k=1

Similarly the lower bound would follow from the Nash-Williams inequality.

16.2. Regular Trees

Let Td denote the d regular tree. Fix some vertex Td as the root. For n 0 let
Tn = {x Td : dist(x, ) = n}. It is easy to check that |T0 | = 1 and |Tn | = d(d 1)n1 for
n 1.
For any x, y Tn there exists a graph automorphism : Td Td that maps (x) = y and
fixes each level Tn ; i.e. (Tn ) = Tn . Thus, if vn is a unit voltage imposed on and Tn , we
have that vn is constant on Tk for k n. Thus, all vertices in each level Tk can be shorted into
95

one vertex, without changing the effective resistance Reff (, Tn ). This gives us a network whose
vertices are {0, 1, . . . , n}, and resistances r(k, k + 1) = |Tk+1 |1 . Thus, the effective resistance is
n
1X 1 d1
1 (d 1)n .

Reff (, Tn ) = k1
=
d (d 1) d(d 2)
k=1

d1
Thus, Reff (, ) = d(d2) < so Td is transient for d > 2.

16.2.1. A computational proof. We now give a computational proof that the random walk
on Td is transient for d > 2.
Let (Xt )t be the random walk on Td , and consider the following sequences: Dt := dist(Xt , )
and Mt = (d 1)Dt . Let Tj be the first time Xt Tj .
First, note that

E[Mt+1 | Ft ] = 1{Dt =0} (d 1)1 + 1{Dt >0} 1

1)Mt + (1 d1 )(d 1)1 Mt

d (d

= 1{Dt =0} (d 1)1 + 1{Dt >0} Mt .

So under Px for x 6= , we have that (MtT0 )t is a bounded martingale. If Px [T0 < ] = 1 we

would have by the optional stopping theorem that

(d 1)dist(x,) = Ex [MT0 ] = 1,

which is a contradiction. Since

1 X
P [T+ < ] = Px [T0 < ] < 1,
d
xT1

we get that Td is transient.

In fact, the above lets us calculate exactly the probability to escape from : If T = T0 Tn
then by the optional stopping theorem, for x T1 ,

(d 1)1 = Ex [MT ] = Px [T > Tn ] (d 1)n + 1 Px [T > Tn ],

so
d2 (d 1)n1 1
Px [T > Tn ] = = 1 .
d 1 (d 1)n+1 (d 1)n 1
Also, v(x) = Px [T < Tn ] is a unit voltage on and Tn . Thus,
1 X d2
v() = Px [T > Tn ] = .
d d 1 (d 1)n+1
xT1

So
d(d 2) d(d 2) 1
Ceff (, Tn ) = dv() = = 1 (d 1)n ,
d 1 (d 1)n+1 d1
96

which coincides with our calculation above.

16.3. Flows from random paths

In this section, we generalize the previous constructions on Zd .

Let be a probability measure on infinite paths in G started from o G. By mapping each
path in the support of to its loop-erasure, we may assume without loss of generality that is
supported on simple paths (paths that do not cross any vertex more than once).
Let , be two independent random paths of law . Now, define F C 0 (E) by F (x, y) =
((x, y) ) ((y, x) ) (by e we mean that there exists n such that e = (n , n+1 ).
We claim that F is a flow. Indeed, for x 6= o the number of edges going into x in equals the
number of edges exiting x in . Thus, for x 6= o,
X X
1 2
divF (x) = cx (F (x, y) F (y, x)) = cx E[1{(x,y)} 1{(y,x)} ] = 0.
yx yx

Similarly for x = o, there is one more edge exiting o than edges entering o. So we have divF (o) =
2
co .

Let us calculate the energy of F . First, note that for x y,

F (x, y)2 = (((x, y) ) ((y, x) ))2 ((x, y) )2 + ((y, x) )2 .

Thus, for , independent paths of law ,

X X
E(F ) = r(e)F (e)2 2 r(e)(e ) (e )
e e
X
= 2E r(e)1{e} 1{e}
e

We conclude:

Proposition 16.3. Let G be a graph. Suppose that G admits a probability measure on

infinite paths in G started from some fixed o G such that for two independent paths , we
have E | | < . Then G is transient.

The following is an open question.

Conjecture 16.4. Let G be a transitive graph. If the simple random walk on G is transient,
then there exists a measure on infinite paths started from some fixed o G such that for two
independent paths , of law , there exists > 0 with

E[e|| ] < .
97

Number of exercises in lecture: 0

Total number of exercises until here: 25
98

Random Walks

Ariel Yadin

Lecture 17: Spectral Analysis

17.1. Spectral Radius

Let (G, c) be a network. Recall that the transition matrix P is an operator on C 0 (V ) that
operates by P f (x) = y P (x, y)f (y). Also, recall that the space L2 (V ) is the space of functions
P

f C 0 (V ) that admit
X
hf, f i = cx f (x)2 < .
x

One can easily check that P : L2 (V ) L2 (V ). Also, P is a self-adjoint operator, and its norm
admits ||P || 1 (this is called a contraction).

Proposition 17.1. Let (G, c) be a weighted graph with transition matrix P . The limit

(P ) = lim sup(P n (x, y))1/n

does not depend on x, y.

Proof. Fix z, w V . We will show that (P ) lim supn (P n (z, w))1/n .

0
Because P is irreducible, we have that for some t, t0 > 0, P t (z, x) > 0, P t (y, w) > 0. Thus,
0 0
P n (z, w) P t (z, x)P ntt (x, y)P t (y, w).
0
Since (P t (z, x))1/n 1 and (P t (y, w))1/n 1,

lim sup(P n (z, w))1/n lim sup(P n (x, y))1/n .

n n

Exchanging the roles of x, y and z, w we get that the lim sup does not depend on the choice of
x, y. t
u

Definition 17.2 (Spectral Radius). Let (G, c) be a weighted graph with transition matrix P .
Define the spectral radius of (G, c) to be

(G, c) = (P ) := lim sup P n (x, x).

n
99

One of the reasons for the name spectral radius is that by the Cauchy-Hadamard criterion, the
generating function for the Green function has radius of convergence 1 . That is, the function

X
g(x, y|z) = P n (x, y)z n
n=0

converges when |z| < 1 .

Note that 1, and that g(x, y|z = 1) is exactly the Green function. Since the Green
function converges if and only if G is transient, we have that for recurrent graphs = 1. The
Jacques Hadamard
natural question arises, what are precisely the cases for which = 1? This has been answered
(1865-1963)
by Kesten in his PhD thesis in 1959, see Theorem 18.1 below.
The above is a good reason for the radius part of the name spectral radius. The next propo-
sition explains the spectral part of the name.

Proposition 17.3. Let (G, c) be a weighted graph with transition matrix P . Then, ||P || =
(P ). Moreover, for any x, y,
r
cy
P n (x, y) (P )n .
cx

Proof. First, note that for any x, y,

r
cy
P n (x, y) = c1 n 1 n
x hP y , x i cx ||P || ||y || ||x || = ||P ||n .
cx
Thus, (P ) ||P ||.
The other direction is a bit more complicated. Let f L2 (V ) have finite support S V .
Now, because S is finite, for every there exists N = N (, S), such that for all n > N , and all
x, y S we have that P 2n (x, y) ((P ) + )2n . Thus, for all n > N (, S),
X
||P n f ||2 = hP 2n f, f i = cx P 2n (x, y)f (x)f (y)
x,y
X
2n
cx P (x, y)f (x)f (y)1{f (x)>0} 1{f (y)>0}
x,y
X
((P ) + )2n cx f (x)f (y)1{f (x)>0} 1{f (y)>0} = Cf ((P ) + )2n .
x,y

Thus, lim supn ||P n f ||1/n (P ) + for any , and so lim supn ||P n f ||1/n (P ).
Now, consider the sequence an = ||P n f ||. We have that

a2n+1 = hP n+1 f, P n+1 f i = hP n f, P n+2 f i

||P n f || ||P n+2 f || = an an+2 .

100

an+1
That is, bn := an is a non-decreasing sequence. Thus, the following limits exist and satisfy

sup bn = lim bn = lim a1/n

n (P ).
n n n

So,
||P f ||
= b0 sup bn (P ).
||f || n

This holds for all finitely supported f .

We want this to hold for all f L2 (V ).We now use the fact that the finitely supported
functions are dense in L (V ). Indeed, let f L2 (V ). Fix > 0. Since x cx f (x)2 < , there
2
P

exists a finite set S V such that

X
cx f (x)2 < 2 .
x6S

Thus, setting g = f 1S , we have that ||f g||2 < 2 . Now, since g is finitely supported, and
since ||g|| ||f ||,

||P f || = ||P (f g) + P g|| ||P || ||f g|| + ||P g||

||P || + (P )||g|| ||P || + (P )||f ||.

Taking 0, ||P f || (P )||f ||. Since this holds for all f , we get that ||P || (P ). t
u

Exercise 17.1. Let (G, c) be a weighted graph with transition matrix P . Let (P ) be
the spectral radius. Show that if G is recurrent then (P ) = 1.

17.1.1. Energy minimization. Let (G, c) be a weighted graph. Consider the functions on G
with finite support; i.e. L0 (V ). These all have finite energy. We want to find the function that
minimizes the energy, when normalized to have length 1.

Proposition 17.4. Let (G, c) be a weighted graph. Then

E(f )
1 (G) = inf .
06=f L0 (V ) 2hf, f i

(Sometimes 1 is called the spectral gap. This is the minimal possible energy of unit
length functions.)
101

Proof. Note that for f L0 (V ) we can use duality so that

1
2 E(f ) = hf, f i = hf, f i hP f, f i.

Thus, it suffices to show that

hP f, f i
= := sup .
06=f L0 (V ) hf, f i

Now, for any f 6= 0 we have by Cauchy-Schwarz

|hP f, f i| ||P f || ||f || ||P || hf, f i,

so ||P || = (P ). On the other hand, since P is self-adjoint, for any f, g L0 (V ),

1
hP f, gi = (hP (f + g), f + gi hP (f g), f gi) .
4
So

hP f, gi (hf + g, f + gi + hf g, f gi) = (hf, f i + hg, gi) .
4 2
||f ||
Now take g = ||P f || P f . Plugging this in above gives

||f || ||f ||2

||f || ||P f || = hP f, P f i hf, f i + hP f, P f i = ||f ||2 .
||P f || 2 ||P f ||2
So ||P f || ||f || for all f L0 (V ).
Using the fact that L0 (V ) is dense in L2 (V ) completes the proof: For any f L2 (V ) and any
find g L0 (V ) such that ||f g|| < and ||g|| ||f ||. Then,

||P f || ||P (f g)|| + ||P g|| ||P || + ||g|| ||P || + ||f ||.

Taking 0 gives that (P ) = ||P || . t

17.2. Isoperimetric Constant

For a graph G, we are interested in how small a boundary of a set can be, compared to the
volume of that set. These serve as bottlenecks in the graph, so a random walk can get stuck
inside for a while. Thus, it makes sense to define the following.

Definition 17.5. Let (G, c) be a weighted graph. Let S G be a finite subset. Define the
(edge) boundary of S to be

S = {(x, y) E(G) : x S , y 6 S} .

Define the isoperimetric constant of G to be

= (G, c) := inf {c(S)/c(S) : S is a finite connected subset of G} .

102

P P
Here c(S) = eS c(e) and c(S) = xS cx .

Of course 1 (G) 0 for any graph. When (G) > 0, we have that sets expand: the
edges coming out of a set carry a constant proportion of the weight of the set.

Definition 17.6. Let (G, c) be a weighted graph. If (G, c) = 0 we say that (G, c) is amenable.
Otherwise we call (G, c) non-amenable.
A sequence a finite connected sets (Sn )n such that c(Sn )/c(Sn ) 0 is called a Folner
sequence, or the sets are called Folner sets.

Erling Folner (1919 - 1991) The concept of amenability was introduced by von Neumann in the context of groups and the
Banach-Tarski paradox. Folners criterion using boundaries of sets provided the ability to carry
over the concept of amenability to other geometric objects such as graphs.
The isoperimetric constant is a geometrical object. It turns out that positivity of the isoperi-
metric constant is equivalent to the spectral radius being strictly less than 1.

Exercise 17.2. Let S Td be a finite connected subset, with |S| 2. Show that
John von Neumann
|S| = |S|(d 2) + 2.
(1903-1957)
d2
Deduce that (Td ) = d .

Number of exercises in lecture: 2

Total number of exercises until here: 27
103

Random Walks

Ariel Yadin

Lecture 18: Kestens Amenability Criterion

18.1. Kestens Thesis

Kesten, in his PhD thesis in 1959 proved the connection between amenability and spectral
radius strictly less than 1. This was subsequently generalized to more general settings by others
(including Cheeger, Dodziuk, Mohar).

Theorem 18.1. A weighted graph (G, c) is amenable if and only if (G, c) = 1. In fact,

2 p
1 1 2 1 .
2

Harry Kesten (1931)

First we require

Lemma 18.2. Let (G, c) be a weighted graph. For any f L0 (V ) (that is, with finite support)

X X
2(G, c) cx f (x) |f (x, y)|.
x x,y

Note that if f = 1S for a finite set S, this is exactly the definition of .

Proof. Since f has finite support we can write

Z Z X X X
c({x : f (x) > t})dt = cx 1{f (x)>t} dt = cx f (x)1{f (x)0} cx f (x).
0 0 x x x

Also,
Z
1{f (x)>tf (y)} dt = |f (x) f (y)|1{f (x)f (y)} .
0

Using the set St = {x : f (x) > t} we see that

St = {(x, y) E : f (x) > t y} .

104

Since for any t, c(St ) c(St ), we can integrate over t to get

X Z Z X
cx f (x) c(St )dt c(x, y)1{f (x)>tf (y)} dt
x 0 0 x,y
X 1X
= c(x, y)|f (x) f (y)|1{f (x)f (y)} |f (x, y)|,
x,y
2 x,y

where we have used the fact that all sums are finite because f has finite support. t
u
p
Proof of Theorem 18.1. The leftmost inequality is just 2 /2 1 1 2 , valid for any
[0, 1].
The rightmost inequality follows by taking a sequence of finite connected sets (Sn )n such that
= limn c(Sn )/c(Sn ). Since

(1Sn (x, y))2 = c(x, y)2 (1{(x,y)Sn } + 1{(y,x)Sn } ),

1 1X X
E(1Sn ) = r(x, y)(1Sn (x, y))2 = c(x, y)1{(x,y)Sn } = c(Sn ).
2 2 x,y x,y
P
Also, h1Sn , 1Sn i = x cx 1{xSn } = c(Sn ). Thus,
E(f ) c(Sn )
1= inf lim = .
06=f L0 (V ) 2hf, f i n c(Sn )
The central inequality is 2 1 2 . We use that
E(f ) hP f, f i
1= inf0 and sup .
06=f L (V ) 2hf, f i 06=f L0 (V ) hf, f i

First, for f L0 (V ),
X X X
2hf, f i + 2hP f, f i = c(x, y)f (x)2 + c(x, y)f (y)2 + 2 c(x, y)f (x)f (y)
x,y x,y x,y
X
= c(x, y)(f (x) + f (y))2
x,y

Applying Cauchy-Schwarz,
X X
42 hf, f i2 c(x, y)(f (x) f (y))2 c(x, y)(f (x) + f (y))2
x,y x,y

= E(f ) (2hf, f i + 2hP f, f i) 2E(f ) hf, f i (1 + ).

105

Rearranging, we get that for any f L0 (V ),

E(f )
42 2 (1 + ).
hf, f i
Taking infimum over all f L0 (V ), we get that 2 (1 )(1 + ) = 1 2 as required. t
u

Example 18.3. Lets calculate (Td ), the spectral radius for the d-regular tree.
Let r be the root of Td , and let Tn = {x : dist(x, r) = n}.
For one direction, consider the function
n
X
fn (x) = (d 1)k/2 1{xTk } = 1{1dist(x,r)n} (d 1)dist(x,r)/2 .
k=1

If x y then c(x, y)f (x)f (y) = (d 1)(dist(x,r)+dist(y,r))/2 if 1 dist(x, r), dist(y, r) n, and
0 otherwise. Thus, since |Tk | = d(d 1)k1 ,
X n
X
||fn ||2 = cx (d 1)dist(x,r) = d(d 1)k1 d (d 1)k
x : 1dist(x,r)n k=1

= d2 (d 1)1 n.

Simlarly,
2(d1)1/2

d (d 1)dist(x,r)/2 2 dist(x, r) n 1,

1/2
(d1) (d 1)dist(x,r)/2

dist(x, r) {1, n} ,

d
P fn (x) =
(d 1)1/2 x = r,

1 (d 1)n/2

dist(x, r) = n + 1,
d

and 0 otherwise. So,

X
||P fn ||2 = cx (P fn (x))2
x
n1
X 4(d 1) 1
=d d(d 1)k1 (d 1)k + d (d 1)1 + d2 (d 1)n 2 (d 1)n
d2 d
k=2
d1 d1
+ d2 (d 1)1 + d2 (d 1)n1 2 (d 1)n
d2 d
d d
= 4(n 2) + + 1 + 1 + 1 = 4n 5 + .
d1 d1
This implies that
2 d1
(Td ) ||P fn ||/||fn || .
d

2 d1
For the other direction, since (Td ) = d2
p
d , we have that (Td ) 1 (Td )2 = d .
454
106

Number of exercises in lecture: 0

Total number of exercises until here: 27
107

Random Walks

Ariel Yadin

Lecture 19:

19.1. Speed of Random Walks

Let (G, c) be a weighted graph and let (Xt )t be the corresponding weighted random walk. In
the exercises one shows that the limit
Ex [dist(Xt , X0 )]
lim
t t
exists for transitive graphs, and is independent of the choice of starting vertex x. We call this
the speed of the random walk. For general graph this limit may not exist, so we consider lim inf
and lim sup of the sequence. Of course, these limits lie between 0 and 1.

Definition 19.1. Let (G, c) be a weighted graph and let (Xt )t be the corresponding weighted
random walk. Fix some o G. The lower speed and upper speed are defined to be
Eo [dist(Xt , X0 )] Eo [dist(Xt , X0 )]
lim inf and lim sup .
t t t t
If these limits coincide, we call the corresponding limit the speed.

Example 19.2. Let us calculate the speed of the random walk on Td .

Fix o Td . Let (Xt )t be the random walk and define Dt = dist(Xt , o). Let Lt = Lt (o) =
Pt
k=0 1{Xk =o} and L1 = 0.
d2
Consider the sequence Mt = dist(Xt , o) d t d2 Lt1 .

Eo [dist(Xt+1 , o) | Ft ] = 1{Xt =o} + 1{Xt 6=o} (dist(Xt , o) + 1) d1 1

d + (dist(Xt , o) 1) d

d2 d2

= dist(Xt , o) + + 1{Xt =o} 1 dist(Xt , o)
d d
d2 2
= dist(Xt , o) + + 1{Xt =o} ,
d d
where we have used that dist(Xt , o)1{Xt =o} = 0. Thus,
d2 2 d2 2
Eo [Mt+1 | Ft ] = dist(Xt , o) + + 1{Xt =o} (t + 1) Lt
d d d d
d2 2
= dist(Xt , o) t Lt1 = Mt .
d d
108

So (Mt )t is a martingale. This implies that

d2 2
0 = Eo [Mt ] = Eo [dist(Xt , o)] t Eo [Lt1 ].
d d
Since Td is transient, we know by monotone convergence that
1
lim Eo [Lt1 ] = Eo [V (o) + 1] = < .
t Po [To+ = ]
Thus,
1 d2
lim Eo [dist(Xt , o)] = .
t t d
454

It is not a coincidence that Td has positive speed. In fact, this has to do with the fact that
(Td ) < 1, or that Td is non-amenable.

Theorem 19.3. Let (G, c) be a weighted graph, and let (Xt )t be the corresponding random
walk started at some o G. Assume the following:

(G, c) < 1.
There exists M > 0 such that cx M for all x (i.e. cx is uniformly bounded).
The limit
b := lim sup |B(o, r)|1/r <
r

is finite, where B(o, r) is the ball of radius r around o.

Then, the lower speed is positive. In fact, a.s.

1 log (G)
lim inf dist(Xt , o) > 0.
t t log b

Proof. Let 0 < < log

log b . So b < 1. We can choose > b such that < 1. Because > b,

there exists some universal constant K > 0 such that |B(o, r)| Kr for all r. Because cx is
p
uniformly bounded, we have that K can be chosen large enough so that K > M/co . Thus, for
all x and all t, P t (o, x) Kt . Combining these two bounds we get that
X
P[dist(Xt , o) t] P t (o, x) K 2 t t .
xB(o,btc)

Since < 1, we get that these probabilities are summable. By Borel-Cantelli, we have that

P[lim inf 1t dist(Xt , o) ] = 0.

Taking log
log b completes the proof. t
u
109

X Recall that by Fatous Lemma

log
Eo [lim inf t1 dist(Xt , o)] lim inf t1 Eo [dist(Xt , o)].
log b
So, non-amenable graphs have positive (lower) speed.

Example 19.4. For all d, the random walk on Zd has zero speed:
In fact, we show that for a random walk (Xt )t on Zd , E0 [dist(Xt , 0)] t1/2 .
Consider the j-th coordinate Xt (j).
1 1
E0 [Xt+1 (j)2 | Ft ] = (Xt (j) + 1)2 + (Xt (j) 1)2 + 1 Xt (j)2 = Xt (j)2 + .
1

2d 2d d
t
is a martingale, and 0 = E0 [Xt (j)2 ] dt . So E0 [|Xt (j)|] t/d, and
p
Thus, Mt = Xt (j)2 d

E0 [dist(Xt , 0)] dt.
Also, note that we can write
t
X
Xt (j) = k
k=1
where (k )k are i.i.d. random variables with P[k = 1] = P[k = 1] = 1/2d and P[k = 0] =
1 1/2d. Since E[k ] = 0 and Var[k ] = E[k2 ] = 1/d, we get by the central limit theorem that
1/2
dt Xt (j) converges in distribution to a standard normal random variable, N (0, 1). So

lim P0 [ d|Xt (j)| t 12 ] = P[|N (0, 1)| 12 ] := c > 0.
t

Thus,
1 1 1
1/2 t c
lim inf E0 [|Xt (j)|] lim inf P0 [|Xt (j)| 2 td ] = ,
t t t t 2 d 2 d
and so
1 c
lim inf E0 [dist(Xt , 0)] d.
t t 2
454

Since many interesting graphs have zero speed, we sometimes are interested in a bit more
precision.

Definition 19.5. Let (G, c) be a weighted graph and let (Xt )t be the corresponding weighted
random walk. Fix some o G. The lower escape exponent and upper escape exponent
are defined to be
log Eo [dist(Xt , X0 )] log Eo [dist(Xt , X0 )]
lim inf and lim sup .
t log t t log t
If these limits coincide, we call the corresponding limit the escape exponent.
110

Example 19.6. Td has escape exponent 1. In fact any graph with positive speed has escape
exponent 1. (This is immediate from log Eo [dist(Xt , X0 )] = log( 1t Eo [dist(Xt , X0 )]) + log t.)
454

Example 19.7. Zd has escape exponent 1/2, as shown above. 454

Speed exponent 1/2 plays an important role in the theory. Walks with speed exponent 1/2
are called diffusive. Walks with speed exponent < 1/2 (resp. > 1/2) are called sub-diffusive
(resp. super-diffusive). Walks with speed exponent 1 (i.e. positive speed) are called ballistic.

19.2. Graph Powers

If G is a graph, there is a natural graph structure on V (G)d : Define the graph Gd to be as

follows. The vertex set of Gd is V (Gd ) = V (G)d . The edges are define by the relations:

(x1 , . . . , xd ) (y1 , . . . , yd ) k : j 6= k , xj = yj and xk yk .

Lemma 19.8. Let G be a graph with speed exponent .

Then, any lazy random walk on G has speed exponent . Moreover, for any d 1, the graph
d
G has speed exponent as well.

Proof. First,

Exercise 19.1. Show that

d
X
distGd ((x1 , . . . , xd ), (y1 , . . . , yd )) = distG (xj , yj ).
j=1

Now, let (Xt )t be a random walk on Gd and let Xt (j) be the j-th coordinate of Xt . Note
that (Xt (j))t is a lazy random walk on G with holding probability 1 d1 . Then,
X
dist(Xt , X0 ) = dist(Xt (j), X0 (j)),
j

so it suffices to prove that any lazy walk on G has speed exponent .

Let (Yt )t be a lazy walk on G with holding probability p. Let (Xt )t be a simple random walk
on G. Suppose that P is the transition matrix for the simple random walk (Xt )t on G, so that
111

the transition matrix for (Yt )t is Q = pI + (1 p)P . Let f (x) = dist(x, o). We have that
t
t
X t
Q = (1 p)k ptk P k ,
k
k=0
so
X
Eo [dist(Yt , o)] = Qt (o, x)dist(x, o) = (Qt f )(o)
x
t t
X t k tk k
X t
= (1 p) p P f (o) = (1 p)k ptk Eo [dist(Xk , o)].
k k
k=0 k=0

Now, for any > 0 there exists K such that for all k > K ,

k Eo [dist(Xk , o)] k + .
t
(1 p)k ptk = P[Bt = k]. By Chebychevs inequality,

Let Bt Bin(t, 1 p), and let qk = k

4 Var[Bt ] 4p
P[|Bt (1 p)t| > 12 (1 p)t] 2 2
= ,
(1 p) t (1 p)t
so
4p
P[Bt 21 (1 p)t] 1 1.
(1 p)t
Hence, for > 0, for all large enough t (so that (1 p)t > 2K ),
t
X
qk Eo [dist(Xk , o)] P[Bt 21 (1 p)t] 1
2 (1 p)t ,
k=0

which implies that

log P[Bt 12 (1 p)t] ((1 p)/2)

log Eo [dist(Yt , o)]
lim inf + lim = .
t log t t log t
On the other hand,
t
X
qk Eo [dist(Xk , o)] K + t+ ,
k=0
so
log 1 + tK

log Eo [dist(Yt , o)]
+
lim sup + + lim = + .
t log t t log t
t
u

Number of exercises in lecture: 1

Total number of exercises until here: 28
112

Random Walks

Ariel Yadin

Lecture 20:

20.1. Lamp-Lighter Graphs

We have already seen that non-amenable graphs must have positive speed and so escape
exponent 1. Non-amenable graphs are also transient, because their spectral radius is strictly less
than 1. The converse of these statements does not hold.
Figure 5 sums up the situation (for graphs) in terms of speed, amenability and transience.

non-amenable
positive speed
Td
(Ackermann)

LL(Z 3 )

LL(Z)
zero speed
= (( log2 k)k )
3

sub- Z
Z3 diffusive

transient recurrent

Exponential growth line

Figure 5. Possibilities for speed, amenability and transience

113

We will now construct a special class of graphs called lamp-lighter graphs. These are used
to give many examples in geometric group theory. They will provide us with examples of
(exponential volume growth) amenable graphs with positive speed.
Let us describe the construction in words, before the formal definition. We start with any
graph G (finite or infinite). This is the base graph. Suppose some lamp-lighter walks around
on the graph G. At every site of G there is some lamp, whose state is either on or off. The
lamp-lighter walks around and can also change the state of the lamp at her current position -
changing it either to on or to off.
What is a position in this new space? A position consists of the configuration of all lamps on
G, that is, a function from G to {0, 1} and the position of the lamp-lighter, i.e. a vertex in G.

Definition 20.1 (Lamp-Lighter Chain). Let P be a Markov chain on state space S. We define
the Markov chain LL(P ), called lamp-lighter on P , as follows.
S S
The state space for LL(P ) is LL(S) := S ({0, 1} )c , where ({0, 1} )c is the set of : S
{0, 1} with finite support (i.e. 1 (1) is finite). For a state (x, ) LL(S), we call x the position
of the lamp-lighter. If (y) = 1 we say the lamp at y is on, and if (y) = 0 we say it is off.
S S
For a lamp configuration ({0, 1} )c and a position x S we define x {0, 1} by
x x
(y) = (y) for all y 6= x and (x) = (x) + 1 (mod 2).
Define the transition matrix LL(P ) by setting
1
LL(P )((x, ), (y, )) = P (x, y),
4
for {, x , y , ( x )y } and 0 otherwise.
If (G, c) is a weighted graph, the LL(G) = LL(P ) for P the weighted random walk on (G, c).

X Note that the chain LL(P ) evolves as follows: At each step, the lamp-lighter chooses a

neighbor of her current position with distribution P (x, ) and moves there, then she refreshes
the state of the lamps at the old position and at the new position to on or off with probability
1/2 each, independently.

Remark 20.2. If G is a graph, then LL(G) defines a graph structure as well. P ((x, ), (y, )) > 0
if and only if P (x, y) > 0 and {, y }. So the graph structure on LL(G) is given by the
relations (x, ) (y, ) for {, x , y , ( x )y }.

In fact:
114

Exercise 20.1. Suppose that (G, c) is a weighted graph, and P is the transition matrix
of the weighted random walk on G. Show that LL(P ) is given by a weighted random walk on a
G
weighted graph whose vertices are (x, ), x G, ({0, 1} )c . What is the weight function on
this graph?

Exercise 20.2. Let P be an irreducible Markov chain. Let (Xt , t )t be Markov-LL(P ).

Show that (Xt )t is Markov-P .

G
Exercise 20.3. Let G be a graph, and let L = LL(G). Let o G and let 0 {0, 1}
denote the all zero function (configuration). Then, for any (x, ) L,

distL ((x, ), (o, 0)) | 1 (1)|.

The next example is an (exponential volume growth) amenable graph, but with positive speed.

Example 20.3. Consider L = LL(Z3 ).

First we show that L is amenable. We only need to demonstrate a Folner sequence. For every
r, let (Br )r be a Folner sequence in Z3 (e.g. the L balls of radius r). Let

Ar = (x, ) L : x Br , 1 (1) Br .

Note that |Ar | = |Br |2|Br | . Also, ((x, ), (y, )) Ar if and only if (x, y) Br and
{, y , x , ( x )y }. Thus, |Ar | = 4|Br |2|Br | . Thus, since the degree in L is 12,
|Ar | 4|Br |
(L) inf = inf = 0,
r 12|Ar | r 6|Br |

and so L is amenable.
Next we show that L has positive speed. Let 0 denote the all 0 lamp configuration, and let
o = (0, 0) L. Let (Xt , t ) be a random walk on L. We claim that for any z 6= 0,
1
(20.1) Po [t (z) = 1] = Po [Tz+ t],
2
115

where Tz+ = inf {t 1 : Xt = z}.

Given this, we have that
X 1
Eo [dist(Xt , o)] Eo [|t1 (1)|] = P[t (z) = 1] = Eo [|Rt |],
z
2

where Rt = {X1 , . . . , Xt }. Since (Xt )t is a random walk on Z3 , we are left with showing that
3
limt t1 EZ0 [|Rt |] > 0. In fact, we have using Exercise 20.4 below,
3
EZ0 [|Rt |] 3
PZ0 [T0+ = ] > 0.
t
We turn to proving (20.1). Let (y0 , 0 ), . . . , (yn , n ) be a path in L. Let T = inf {t : yt = z},
(where inf = ). Define a new path

(y0 , 0 ), . . . (yT 1 , T 1 ), (yT , Tz ), (yT +1 , Tz +1 ), . . . , (yn , nz ).

Since L is a regular graph, both paths have the same probability. Summing over all possible
paths, we get that for any k t,

Po [t (z) = 1, Tz+ = k] = Po [tz (z) = 1, Tz+ = k] = Po [t (z) = 0, Tz+ = k].

So
1
Po [t (z) = 1 | Tz+ = k] = Po [t (z) = 0 | Tz+ = k] = .
2
Thus,
t t
X 1 X 1
Po [t (z) = 1] = Po [t (z) = 1, Tz+ = k] = Po [Tz+ = k] = Po [Tz+ t],
2 2
k=1 k=1

proving (20.1). 454

Exercise 20.4. Show that for d 3, if (Xt )t is a random walk on Zd , and Rt =

{X1 , . . . , Xt } is the range, then
E0 [|Rt |]
P0 [T0+ = ].
t

Example 20.4. We have already seen an example of amenable zero-speed graphs: Zd . We in

fact know that these are diffusive.
Let us show that this can even be done with a graph of exponential volume growth.
116

We will show that LL(Z) is (at most) diffusive.

Let (Xt , t ) be a random walk on L = LL(Z). Let o = (0, 0) L. Define Mt = maxkt |Xt |.
Since the lamp-lighter up to time t never leaves [Mt , Mt ], we have Po -a.s. that 1 (1)
[Mt , Mt ].
Note that at time t, the lamp-lighter can walk to one of the ends of [Mt , Mt ] in at most
Mt steps, and then start turning off all the lamps in [Mt , Mt ] in at most 2Mt steps, finally
returning to 0 in at most another Mt steps. Thus, dist(Xt , o) 4Mt for all t, Po -a.s.

Thus it suffices to show that E[Mt ] 2 t for all t. For this we use a trick called the reflection
principle.
By the strong Markov property at time Tx , we have that
t
X
Po [Xt x, Tx t] = Po [Tx = k] Px [Xtk x]
k=0
t1
X 1
= 1{x=o} Po [Xt o] + Po [Tx = k] + Po [Tx = t]
2
k=1
1
Po [Tx t] .
2
where we have used transitivity, and symmetry by reflecting around 0:
1
Px [Xt x] = P0 [Xt 0] = P0 [Xt 0] ,
2
since P0 [Xt 0] + P0 [Xt 0] = 1 + P0 [Xt = 0] 1. We now have

P0 [max Xk x] = P0 [Tx t] 2 P0 [Xt x].

Reflecting around 0,

P0 [min Xk x] = P0 [Tx t] 2 P0 [Xt x].

P0 [Mt x] P0 [max Xk x] + P0 [min Xk x] 2 P0 [Xt x] + 2 P0 [Xt x].

kt kt

We conclude with

X
X
2 E[|Xt |] = 2 P0 [|Xt | x + 1] = 2 P0 [Xt x + 1] + 2 P0 [Xt (x + 1)]
x=0 x=0

X
P0 [Mt x + 1] = E0 [Mt ].
x=0
117

So E0 [Mt ] 2 E[|Xt |] 2 t. Thus,

Eo [dist(Xt , o)] 8 t.

454

Number of exercises in lecture: 4

Total number of exercises until here: 32
118

Random Walks

Ariel Yadin

Lecture 21:

Our next goal is to complete the picture in Figure 5; that is to give examples of graphs that
are transient, but have very slow speed (sub-diffusive), and examples of graphs that are recurrent
but have positive upper speed.

21.1. Concentration of Martingales: Azumas inequality

Let (Xt )t be a random walk on Z. We know (using the martingale |Xt |2 t) that E0 [T{r,r} ] =
r2 . That is, it takes a random walk r2 steps to reach distance r. We have already seen that this
implies diffusive behavior of the walk.
Let us prove a short concentration result, showing that actually T{r,r} is close to r2 with
very high probability.

Theorem 21.1 (Azumas Inequality). Let (Mt )t be a (Ft )t -martingale with bounded incre-
ments (i.e. |Mt+1 Mt | 1 a.s.). Then for any > 0,
2

P[Mt M0 ] exp .
2t
Proof. There are two main ideas:
The first idea, is that for a random variable X with E[X] = 0 and |X| 1 a.s. one has
2
E[eX ] e /2
. Indeed, f (x) = ex is a convex function, so for |x| 1 we can write x =
x+1
1 + (1 ) (1), where = 2 , so

ex e + (1 )e = cosh() + x sinh().

(Here 2 cosh() = e + e and 2 sinh() = e e .) Thus, because E[X] = 0, and using

(2k)! 2k k!,

E[eX ] cosh() + E[X] sinh() = cosh()

X 2k X 2k 2
= = e /2 .
(2k)! 2k k!
k=0 k=0

For the second idea, due to Sergei Bernstein, one applies the Chebychev / Markov inequality
Segei Bernstein (1880-1968) to the non-negative random variable eX , and then optimizes on .
119

In our case: For every t, since E[Mt Mt1 | Ft1 ] = 0 and |Mt Mt1 | 1, exactly as
above we could show that
2
E[e(Mt Mt1 ) | Ft1 ] e /2
a.s.

Thus,

E[e(Mt M0 ) ] = E e(Mt1 M0 ) E[e(Mt Mt1 ) | Ft1 ]

2 2
e /2
E[e(Mt1 M0 ) ] et /2
.

Now apply Markovs inequality to the non-negative random variable e(Mt M0 ) to get

P[Mt M0 ] = P[e(Mt M0 ) e ] exp 1 2

2 t .

Optimizing over we get that for = /t,

2

P[Mt M0 ] exp .
2t
t
u

Example 21.2. Lets apply Azumas inequality to random walks on Z.

Let (Xt )t be a random walk on Z. Recall that (Xt )t is a martingale. Consider the stopping
time T = T{r,r} . This is the first time |Xt | r.
Recall the reflection principle:

P0 [T t] = P0 [max |Xk | r] 4 P0 [Xt r].

Using Azumas inequality on this last term,

r2

P0 [T{r,r} t] 4 exp .
2t
454

21.2. Recurrent Trees - The Grove

Let (dk )kN be a sequence of positive numbers. For each k, let k be a binary tree of depth
dk .
Define the graph ((dk )k ) to be the graph N, with the tree k glued at the vertex k N (let
S
the root of k be k N); that is, the vertex set of ((dk )d ) is k=0 V (k ). The edges are those
in each k , with the edges k k + 1 for all k added. We call this the (dk )k -grove.

Proposition 21.3. The graph ((dk )k ) is recurrent.

120

d4
d3
d2
d1
d0

0 1 2 3 4

Figure 6. The graph ((dk )k ).

Proof. If v is a unit voltage on 0 and k , then for any n k, and any vertex x n we have
that v(x) = v(n). Indeed, if (Xt )t is a random walk on this graph, then because n is finite,
Px -a.s. the hitting time of n is finite, and also, v(Xt ) is a bounded martingale. Thus, by the
optional stopping theorem, v(x) = Ex [v(XTn )] = v(n).
Thus, we can short together all vertices in each tree n , n k. This results in the network
which is just the graph N. Thus, Reff (0, n ) = n . t
u

Pd
Recall that if is a finite binary tree of depth d, then |E( )| = |V ( )| 1 = k=0 2k 1 =
2d+1 2.

Lemma 21.4. Let r N ((dk )k ). The hitting time of r, Tr , has expectation given by

r1
X 3
E0 [Tr ] = 4 (r k)2dk r(r + 1).
2
k=0

Proof. Every time the walk as at a vertex k N, with probability 1/2 it starts a random walk
in the finite subtree k . The expected time to return to the root in a finite tree is the reciprocal
of the stationary distribution on that tree. Thus, we have

k := Ek [Tk+ | X1 6 N] = |E(k )| = 2(2dk 1).

Now, using the strong Markov property, for k > 0,

1 1 1
Ek [Tk+1 ] = 1 + Ek1 [Tk+1 ] + (k + Ek [Tk+1 ])
4 4 2
1 1 1 1 1
= + Ek1 [Tk ] + Ek [Tk+1 ] + k + Ek [Tk+1 ].
4 4 4 2 2
121

Rearranging and iterating,

k
X
Ek [Tk+1 ] = 2k + 1 + Ek1 [Tk ] = = 2 j + k + E0 [T1 ].
j=1

2
Similarly, E0 [T1 ] = 3 (0 + E0 [T1 ]) + 31 , so E0 [T1 ] = 20 + 1. Thus,
k
X k
X
Ek [Tk+1 ] = 2 j + k + 1 = 2dj +2 3(k + 1),
j=0 j=0

and
r1
X r1
X r1
X
E0 [Tr ] = Ek [Tk+1 ] = (r k)2dk +2 3 k+1
k=0 k=0 k=0
r1
X 3
= (r k)2dk +2 r(r + 1)
2
k=0
t
u

Exercise 21.1. Let be a finite binary tree of depth d with root o. Then,

Po [To+ > 2d ] (2e)1 .

The next theorem gives an example of a tree with speed exponent for any 1/2.

Theorem 21.5. Let dk = b log2 (k + 1)c. The tree ((dk )k ) has speed exponent ( + 2)1 .

Proof. Let (Xt )t be a random walk on = ((dk )k ). For x denote |x| the number that is
the root of the unique finite subtree k such that x k . So |x| dist(x, 0) |x| + d|x| . So it
suffices to prove that
log E0 [|Xt |]
lim = ( + 2)1 .
t log t
The lower bound is simpler. Note that

P0 [|Xt | < r] P0 [|Xt | < r, Tr > t] + P0 [|Xt | < r, Tr t]

1
P0 [Tr > t] + Pr [|Xt | < r] t1 E0 [Tr ] + .
2
If we take t 4 E0 [Tr ], we get that P0 [|Xt | < r] 34 . Since
r1 r
3 3
X Z
dk
E0 [Tr ] = 4 (r k)2 r(r + 1) (r x + 1)x dx r(r + 1) r+2 ,
2 1 2
k=0
122

We get that for t = d4 E0 [Tr ]e, r t1/(+2) . So

r
E0 [|Xt |] P0 [|Xt | r]r t1/(+2) .
4
We now turn to the upper bound on E0 [|Xt |]. Define inductively the following times. 0 = 0
and
n+1 = inf {t n : |Xt | =
6 |Xn |} .

That is, (n )n are the subsequent times the random walk moves from a vertex in N to a new
vertex in N. For every 0 < k N
1
Pk [X1 = k + 1 | X1 N] = Pk [X1 = k 1 | X1 N] = .
2
So the sequence (Zn = Xn )n is a random walk on N.
Now, if the walk is at a vertex k N, then with probability 1/2 it performs an excursion into
the finite subtree k , and with remaining probability 1/2 is moves in N. Thus, by the exercise
above,
P0 [n+1 > n + 2dk Zn = k, Fn ] (4e)1

a.s.

Let x = r, y = 2r, z = 3r. Let N < M be such that N = Ty and M = inf {m > N : Zm {x, z}}.
For n N let Jn = 1{n+1 >n +2dx } , and let S = {N n < M : Jn = 1}. Since dk dx for all
k [x, z], we have from the above, that for any set A {0, 1, . . . , k 1},
k|A|
P0 [S A + N | M N k] 1 (4e)1 .

Thus, for any < K, the event |S| < can be bounded by

P0 [|S| < ] P0 [M N < K] + P0 [|S| < | M N K]

K K
P0 [M N < K] + 1 (4e)1 .

Now, M N is the number of steps a random walk on Z started at y = 2r takes to reach
{x, z} = {r, 3r}. Translating r 7 0 we get that P0 [M N < ] is bounded by the probability
a random walk on Z started at 0 reaches [r, r] before time . Azumas inequality above (and
Example 21.2 following it) tells us that
r2

P0 [M N < K] 4 exp .
2K
Taking K = br2 (8 log r)2 c and = bKc for small enough (so that log(1 (4e)1 ) (1
1) > 2 log ) we have

P0 [|S| < ] < exp(((log r)2 )) + exp(((r/ log r)2 )),

123

which decays faster than any polynomial in r.

The event |S| implies that

T3r M > |S| 2dx + N > 2dx .

We thus conclude that for t = b 2dx c,

P0 [|Xt | > 3r] P0 [T3r < t] P0 [|S| < ] exp(((log r)2 )).

Since r2 (log r)1 and 2dx r . we get that t r2+ (log r)1 , and

E0 [|Xt |] 3r + t P0 [|Xt | > 3r] 3r + t exp(((log t)2 )) t1/(+2)+o(1) .

So
log E0 [|Xt |] 1
lim sup ,
t log t +2
which coincides with our lower bound. t
u

21.3. Transient and Sub-Diffusive

Example 21.6. We now have an example of a transient sub-diffusive graph. Let = ((dk )k )
be the grove for dk = b log2 (k + 1)c. Let G = 3 .
We know that as a graph power G has speed exponent ( + 2)1 . However, since N is a
subgraph of , then also N3 is a subgraph of G. We know that N3 is transient, so G must be
transient as well. 454

21.4. Recurrent Positive Speed

Raymond Paley

Exercise 21.2. [Paley-Zygmund Inequality] Let X be a non-negative random variable.

(1907-1933)

Let [0, 1]. Then,

(E[X])2
P[X > E[X]] (1 )2 .
E[X 2 ]

Lemma 21.7. There exists a universal constant p > 0 such that the following holds. Let be
Antoni Zygmund
a finite binary tree of depth d with root o. For any t d,
(1900-1992)
Po [dist(Xt , o) 16 t] p,

where (Xt )t is a random walk on .

124

Pt
Proof. Let Dt = dist(Xt , o). We have already seen that for Lt := k=0 1{Xt =o} (L1 = 0) and
Mt = Dt 13 t Lt1 that (Mt )dt=0 is a martingale (the restriction to t d is so that the walk
does not reach the leaves). Thus, for t d,

Eo [Dt ] = 13 t + Eo [Lt1 ].

Also, for t d,

Eo [Dt2 | Ft1 ] = 1{Xt1 =o} + 1{Xt1 6=o} 1

(Dt1 1)2 + 2
(Dt1 + 1)2

3 3

2 2
= Dt1 +1+ Dt1 .
3
So
t1
2 2X1
Eo [Dt2 ] = Eo [Dt1
2
]+1+ E0 [Dt1 ] = = t + 3 k + Eo [Lk1 ].
3 3
k=0
Note that for t d, we have that Lt is the number of visits to the root up to time t. Let q be
the probability that a random walk on an infinite rooted binary tree does not return to the root.
Then, if A is the set of leaves in then Po [TA < To+ ] q. However, since Po -a.s. t d TA ,
we get that for any t d,
1 1
1 Eo [Lt ] Eo [LTA ] = + < .
Po [TA < To ] q
Thus we conclude that
1 1
Eo [Dt2 ] t + t(t 1) + t.
9 q
We now use the Payley-Zygmund inequality to conclude that for any t d,
1 2
1 (Eo [Dt ])2 1 9t
Po [Dt Eo [Dt ]] 2 1 2 41 .
2 4 Eo [Dt ] 4 9 t + ( 98 + 1q ) t

Since Eo [Dt ] 13 t the proof is complete. t

Example 21.8. We complete the picture in Figure 5 by giving an example of a recurrent graph,
but with positive speed.
Recall that for the (dk )k grove, the expected time to reach the vertex r N is
r1
X 3
E0 [Tr ] = 4 (r k)2dk r(r + 1).
2
k=0

Let (dk )k be an increasing sequence that satisfies

r1
!
X 3
dr > 1 + 4 (r k)2dk r(r + 1) .
2
k=0
125

(This sequence must grow super-fast, at least like the Ackermann tower function.) Note that
this ensures that dr > d4 E0 [Tr ]e. Consider the (dk )k -grove = ((dk )k ).
is of course recurrent.
3
Recall that for a random walk (Xt )t on and for t 4 E0 [Tr ], we have that P0 [|Xt | < r] 4

(where |Xt | is the root of the finite subtree containing Xt ).

Given that X0 = r, we have by Lemma 21.7 for any t dr ,

Pr [dist(Xt , r) 16 t] c > 0,

for some universal constant c > 0. So if we take t = 2 t0 for t0 = d4 E0 [Tr ]e, then t0 < dr so
0
t
1 0
X 1
P0 [dist(Xt , 0) 6t ] P0 [|Xt0 | = k] Pk [dist(Xt , k) 16 t0 ] c.
4
k=r

So
1 1 0
E0 [dist(Xt , 0)] c t t.
4 6
And so has positive speed. 454

Number of exercises in lecture: 2

Total number of exercises until here: 34
126

Random Walks

Ariel Yadin

Lecture 22:

22.1. Galton-Watson Processes

The final topic for this course is a special Markov chain on trees, known as the Galton-Watson
process.
Francis Galton (1822-1911) Galton and Watson were interested in the question of the survival of aristocratic surnames in
the Victorian era. They proposed a model to study the dynamics of such a family name.
In words, the model can be stated as follows. We start with one individual. This individual
has a certain random number of offspring. Thus passes one generation. In the next generation,
each one of the offspring has its own offspring independently. The processes continues building
a random tree of descent.
Henry Watson (1827-1903)
The formal definition is a bit complicated. For the moment let us focus only on the population
size at a given generation.
P
Definition 22.1. Let be a distribution on N; i.e. : N [0, 1] such that n (n) = 1. The
Galton-Watson Process, with offspring distribution , (also denoted GW ,) is the following
Markov chain (Zn )n on N:
Let (Xj,k )j,kN be a sequence of i.i.d. random variables with distribution .

At generation n = 0 we set Z0 = 1. [ Start with one individual. ]

Given Zn , let
Zn
X
Zn+1 := Xn+1,k .
k=1
[ Xn+1,k represents the number of offspring of the k-th individual in generation n. ]

Example 22.2. If (0) = 1 then the GW process is just the sequence Z0 = 1, Zn = 0 for all
n > 0.
If (1) = 1 then GW is Zn = 1 for all n.
How about (0) = p = 1 (1)? In this case, Z0 = 1, and given that Zn = 1, we have
that Zn+1 = 0 with probability p, and Zn+1 = 1 with probability 1 p, independently of all
(Zk : k n). If Zn = 0 the Zn+1 = 0 as well.
127

What is the distribution of T = inf {n : Zn = 0}? Well, on can easily check that T Geo(p).
So GW is essentially a geometric random variable.
We will in general assume that (0) + (1) < 1, otherwise the process is not interesting.
454

22.2. Generating Functions

X Notation: For a function f : R R we write f (n) = f f for the composition of f

with itself n times.

Let X be a random variable with values in N. The probability generating function, or
PGF , is defined as
X
GX (z) := E[z X ] = P[X = n]z n .
n
This function can be thought of as a function from [0, 1] to [0, 1]. If (n) = P[X = n] is the
density of X, then we write G = GX .
Some immediate properties:

Exercise 22.1. Let GX be the probability generating function of a random variable X

with values in N. Show that

If z [0, 1] then 0 GX (z) 1.

GX (1) = 1.
GX (0) = P[X = 0].
G0X (1) = E[X].
E[X 2 ] = G00X (1) + G0X (1).
n
z n GX (0+) = P[X = n].

Proposition 22.3. A PGF GX is convex on [0, 1].

Proof. GX is twice differentiable, with

G00X (z) = E[X(X 1)z X2 ] 0.

t
u

The PGF is an important tool in the study of Galton-Watson processes.

128

Proposition 22.4. Let (Zn )n be a GW process. For z [0, 1],

E[z Zn+1 Z0 , . . . , Zn ] = G (z)Zn .

Thus,
GZn = G(n)
= G G .

Proof. Conditioned on Z0 , . . . , Zn , we have that

Zn
X
Zn+1 = Xk ,
k=1

where X1 , . . . , are i.i.d. distributed according to . Thus,

Zn
Y Zn
Y
E[z Zn+1 Z0 , . . . , Zn ] = E[ z Xk Z 0 , . . . , Z n ] = E[z Xk ] = G (z)Zn .

k=1 k=1

Taking expectations of booths sides we have that

GZn+1 (z) = E[z Zn+1 ] = E[G (z)Zn ] = GZn (G (z)) = GZn G (z).

An inductive procedure gives

GZn = GZn1 G = GZn2 G G = = G(n)

since GZ1 = G . t
u

22.3. Extinction

Recall that the first question we would like to answer is the extinction probability for a GW
process.
Let (Zn )n be a GW process. Extinction is the event {n : Zn = 0}. The extinction
probability is defined to be q = q(GW ) = P[n : Zn = 0]. Note that the events {Zn = 0}
form an increasing sequence, so

q(GW ) = lim P[Zn = 0].

Proposition 22.5. Consider a GW . (Assume that (0) + (1) < 1.) Let q = q(GW ) be the
extinction probability and G = G . Then,

q is the smallest solution to the equation G(z) = z. If only one solution exists, q = 1.
Otherwise, q < 1 and the only other solution is G(1) = 1.
q = 1 if and only if G0 (1) = E[X] 1.
129

X Positivity of the extinction probability depends only on the mean number of offspring!

Proof. If P[X = 0] = G(0) = 0 then Zn Zn1 for all n, so q = 0, because there is never
extinction. Also, the only solutions to G(z) = z in this case are 0, 1 because G00 (z) > 0 for z > 0
so G0 is strictly convex, and thus G(z) < z for all z (0, 1). So we can assume that G(0) > 0.
Let f (z) = G(z) z. So f 00 (z) > 0 for z > 0. Thus, f 0 is a strictly increasing function.

Case 1: If G0 (1) 1. So f 0 (1) 0. Since f 0 (0+) = (1 (1)) < 0 (because

(1) < 1), and since f 0 is strictly increasing, for all z < 1 we have that f 0 (z) < 0. Thus,
the minimal value of f is at 1; that is, f (z) > 0 for all z < 0 and there is only one
solution to f (z) = 0 at 1.
Case 2: If G0 (1) > 1. Then f 0 (1) > 0. Since f 0 (0+) < 0 there must be some 0 < x < 1
such that f 0 (x) = 0. Since f 0 is strictly increasing, this is the unique minimum of f in
[0, 1]. Since f 0 (z) > 0 for z > x, as a minimum, we have that f (x) < f ( 1+x
2 ) f (1) = 0.

Also, f (0) = (0) > 0, and because f is continuous, there exists a 0 < p < x such that
f (p) = 0.
We claim that p, 1 are the only solutions to f (z) = 0. Indeed, if a < b are any such
solutions, then because f is strictly convex, for any a < z < b, f (z) < f (a) + (1
)f (b) = 0 for some (0, 1).
In conclusion, in the case G0 (1) > 1 we have that there are exactly two solutions to
G(z) = z, which are p and 1.
Moreover, p < x for x the unique minimum of f , so because f 0 is strictly increasing,

1 (1 (1)) = f 0 (0+) f 0 (z) f 0 (p) < f 0 (x) = 0

for any z p. Thus, for any z p we have that

Z p
f (z) = f (z) f (p) = f 0 (t)dt p z,
z

which implies that G(z) p for any z p.

Now, recall that the extinction probability admits

q = lim P[Zn = 0] = lim GZn (0) = lim G(n) (0).

n n n

Since G is a continuous function, we get that G(q) = q so q is a solution to G(z) = z.

If two solutions exists (equivalently, G0 (1) > 1), say p and 1, then G(n) (0) p for all n, so
q p and thus must be q = p < 1.
130

If only one solution exists then q = 1. t

q
G0 (1) > 1
G0 (1) 1

0 1 0 1
Figure 7. The two possibilities for G0 (1). The blue dotted line and crosses
show how the iterates G(n) (0) advance toward the minimal solution of G(z) = z.

Martingales and Stopping Times
No ratings yet
Martingales and Stopping Times
5 pages
Understanding Brownian Motion
No ratings yet
Understanding Brownian Motion
40 pages
Stochastic Control and Filtering Theory
No ratings yet
Stochastic Control and Filtering Theory
112 pages
Kolmogorov's Probability Axioms Explained
No ratings yet
Kolmogorov's Probability Axioms Explained
14 pages
Tucker Family Vacation Probability Analysis
No ratings yet
Tucker Family Vacation Probability Analysis
7 pages
Strassen's 2x2 Matrix Multiplication Proof
No ratings yet
Strassen's 2x2 Matrix Multiplication Proof
6 pages
Mode of Binomial Distribution Explained
No ratings yet
Mode of Binomial Distribution Explained
16 pages
Markov Chains: Definitions and Examples
No ratings yet
Markov Chains: Definitions and Examples
59 pages
Understanding Diffusion Models Basics
No ratings yet
Understanding Diffusion Models Basics
50 pages
Stochastic Calculus Problem Set 2
No ratings yet
Stochastic Calculus Problem Set 2
7 pages
MAT311 Abstract Algebra Lecture Notes
No ratings yet
MAT311 Abstract Algebra Lecture Notes
57 pages
Billingsley's Legacy in Probability Theory
No ratings yet
Billingsley's Legacy in Probability Theory
11 pages
Discrete-Time Markov Control Processes
No ratings yet
Discrete-Time Markov Control Processes
285 pages
Markov Chain Analysis and Transition Matrices
No ratings yet
Markov Chain Analysis and Transition Matrices
24 pages
Finite Difference Methods for PDEs
No ratings yet
Finite Difference Methods for PDEs
39 pages
Martingales and Stopping Times Explained
100% (1)
Martingales and Stopping Times Explained
12 pages
Stochastic Differential Equations Overview
No ratings yet
Stochastic Differential Equations Overview
119 pages
Improving Software Estimation Promises Using Monte Carlo Simulation
No ratings yet
Improving Software Estimation Promises Using Monte Carlo Simulation
9 pages
Spectral Methods in Fluid Dynamics
No ratings yet
Spectral Methods in Fluid Dynamics
12 pages
Diffusion Models in Image Translation
No ratings yet
Diffusion Models in Image Translation
46 pages
Billingsley: Probability and Measure Insights
No ratings yet
Billingsley: Probability and Measure Insights
25 pages
Neyman-Fisher Factorization Theorem
No ratings yet
Neyman-Fisher Factorization Theorem
3 pages
Ising Model: Phase Transition Analysis
No ratings yet
Ising Model: Phase Transition Analysis
8 pages
Math F424 2191
100% (1)
Math F424 2191
3 pages
Advanced Probability Theory Overview
No ratings yet
Advanced Probability Theory Overview
69 pages
(Bank of America, Andersen) Efficient Simulation of The Heston Stochastic Volatility Model
No ratings yet
(Bank of America, Andersen) Efficient Simulation of The Heston Stochastic Volatility Model
38 pages
Advanced Stochastic Calculus Exercises
No ratings yet
Advanced Stochastic Calculus Exercises
2 pages
Stopping Times and Their Properties
No ratings yet
Stopping Times and Their Properties
3 pages
Øksendal's SDE Solutions Manual
33% (3)
Øksendal's SDE Solutions Manual
35 pages
Schilling's Guide to Brownian Motion
No ratings yet
Schilling's Guide to Brownian Motion
240 pages
Dynkin Systems and σ-Algebras Tutorial
100% (1)
Dynkin Systems and σ-Algebras Tutorial
493 pages
Stochastic Models: Mathematical Foundations
No ratings yet
Stochastic Models: Mathematical Foundations
18 pages
Ridge Regression and Regularization Methods
No ratings yet
Ridge Regression and Regularization Methods
21 pages
Errata for Multiplicative Number Theory I
No ratings yet
Errata for Multiplicative Number Theory I
3 pages
Understanding the PAC Learning Model
No ratings yet
Understanding the PAC Learning Model
7 pages
Algebraic Stochastic Calculus Foundations
No ratings yet
Algebraic Stochastic Calculus Foundations
21 pages
Gauss-Markov Model Overview
No ratings yet
Gauss-Markov Model Overview
150 pages
Functional Ito Calculus for Martingales
No ratings yet
Functional Ito Calculus for Martingales
33 pages
Amari Methods
No ratings yet
Amari Methods
38 pages
STAT3006 Statistical Learning Notes
No ratings yet
STAT3006 Statistical Learning Notes
110 pages
Advanced Econometric Methods Overview
No ratings yet
Advanced Econometric Methods Overview
362 pages
Geometrical Methods in Differential Equations
No ratings yet
Geometrical Methods in Differential Equations
366 pages
Financial Modeling with Brownian Motion
No ratings yet
Financial Modeling with Brownian Motion
60 pages
Stochastic Calculus Overview
No ratings yet
Stochastic Calculus Overview
114 pages
Differential Geometry Lecture Notes
No ratings yet
Differential Geometry Lecture Notes
376 pages
(Dimitris N. Politis, Joseph P. Romano, Michael Subsampling
No ratings yet
(Dimitris N. Politis, Joseph P. Romano, Michael Subsampling
180 pages
Projection Methods in Functional Equations
No ratings yet
Projection Methods in Functional Equations
31 pages
Susskind's Statistical Mechanics Notes
No ratings yet
Susskind's Statistical Mechanics Notes
21 pages
Advanced Numerical Methods in Finance
No ratings yet
Advanced Numerical Methods in Finance
170 pages
Data Analysis with Mapper & Homology
No ratings yet
Data Analysis with Mapper & Homology
71 pages
Courant-Fischer Theorem Overview
No ratings yet
Courant-Fischer Theorem Overview
5 pages
Protter's Stopping Time Solutions
No ratings yet
Protter's Stopping Time Solutions
25 pages
Voleon Interview Preparation Guide
No ratings yet
Voleon Interview Preparation Guide
8 pages
Recurrence and Transience in Random Walks
No ratings yet
Recurrence and Transience in Random Walks
16 pages
Random Walk: Recurrence vs Transience
No ratings yet
Random Walk: Recurrence vs Transience
13 pages
Recurrence vs Transience in Random Walks
No ratings yet
Recurrence vs Transience in Random Walks
45 pages
Survey of Random Walks on Graphs
No ratings yet
Survey of Random Walks on Graphs
46 pages
Understanding Simple Random Walks
No ratings yet
Understanding Simple Random Walks
11 pages
Sage Commands for Group Theory
No ratings yet
Sage Commands for Group Theory
15 pages
1974 Motor Trend Car Performance Data
No ratings yet
1974 Motor Trend Car Performance Data
1 page
MATH 1010A Tutorial Notes
No ratings yet
MATH 1010A Tutorial Notes
4 pages
Quadratic Residues and Legendre Symbols
No ratings yet
Quadratic Residues and Legendre Symbols
8 pages
Wavelet Analysis in fMRI Research
No ratings yet
Wavelet Analysis in fMRI Research
26 pages
Taylor's Theorem and Polynomials
No ratings yet
Taylor's Theorem and Polynomials
4 pages
Linear Systems in Control Engineering
No ratings yet
Linear Systems in Control Engineering
36 pages
Topology Exercises for Math3070 Course
No ratings yet
Topology Exercises for Math3070 Course
1 page
Lebesgue Measure and Convergence Theorems
No ratings yet
Lebesgue Measure and Convergence Theorems
10 pages
Operator Theory in Hilbert Space
No ratings yet
Operator Theory in Hilbert Space
74 pages
(Ronald L. Lipsman, Jonathan M. Rosenberg) Multiva (B-Ok - Xyz)
No ratings yet
(Ronald L. Lipsman, Jonathan M. Rosenberg) Multiva (B-Ok - Xyz)
280 pages
2017 2018 Undergraduate
No ratings yet
2017 2018 Undergraduate
1,138 pages
Lebesgue Measure in R² Explained
No ratings yet
Lebesgue Measure in R² Explained
53 pages
Complex Analysis Apuntes
No ratings yet
Complex Analysis Apuntes
31 pages
Real Harmonic Analysis Lectures
No ratings yet
Real Harmonic Analysis Lectures
113 pages
Basketball Rules and Skills Overview
No ratings yet
Basketball Rules and Skills Overview
15 pages
Overview of Brazil's Geography and Economy
No ratings yet
Overview of Brazil's Geography and Economy
166 pages
Sales Register for FI Dubai International
No ratings yet
Sales Register for FI Dubai International
3 pages
Charterhouse Auction Details - Feb 2023
No ratings yet
Charterhouse Auction Details - Feb 2023
50 pages
Understanding Spreadsheets: Features & Functions
No ratings yet
Understanding Spreadsheets: Features & Functions
3 pages
Attitudes of Midwives in Clinical Practice
No ratings yet
Attitudes of Midwives in Clinical Practice
37 pages
Causes of the French Revolution
No ratings yet
Causes of the French Revolution
1 page
Compression Molding Guidelines
No ratings yet
Compression Molding Guidelines
30 pages
Sharding in Blockchain Technology
No ratings yet
Sharding in Blockchain Technology
9 pages
Genie Awp Models
No ratings yet
Genie Awp Models
130 pages
Testbank - Business Research Methods 13th Ed Schindler
No ratings yet
Testbank - Business Research Methods 13th Ed Schindler
224 pages
EC175B Flight Manual Revision Update
No ratings yet
EC175B Flight Manual Revision Update
83 pages
Decision-Making Insights and School Newspaper
No ratings yet
Decision-Making Insights and School Newspaper
4 pages
2200W PV Charger for 12V Systems
No ratings yet
2200W PV Charger for 12V Systems
2 pages
Formative Assessment and Metacognition
No ratings yet
Formative Assessment and Metacognition
7 pages
70s, 80s & 90s Music Playlist
No ratings yet
70s, 80s & 90s Music Playlist
8 pages
One-Line Communication System Diagram
No ratings yet
One-Line Communication System Diagram
1 page
AC - CDU Size
No ratings yet
AC - CDU Size
20 pages
Intermediate Certificate of Chinni Yashwanth
No ratings yet
Intermediate Certificate of Chinni Yashwanth
1 page
Substation Foundation Construction Details
100% (1)
Substation Foundation Construction Details
1 page
900 MHz Wireless Gateway Overview
No ratings yet
900 MHz Wireless Gateway Overview
2 pages
Principles of Management Overview
No ratings yet
Principles of Management Overview
4 pages
Troubleshooting Powder Coating Systems
No ratings yet
Troubleshooting Powder Coating Systems
1 page
The Winning Way: Sports Meets Business
No ratings yet
The Winning Way: Sports Meets Business
28 pages
Pathfinder Character Sheet Overview
No ratings yet
Pathfinder Character Sheet Overview
4 pages
Acute Severe Asthma Management Guidelines
No ratings yet
Acute Severe Asthma Management Guidelines
1 page
Aseptic Techniques in Microbiology
No ratings yet
Aseptic Techniques in Microbiology
18 pages
Air Defence System Upgrades
100% (1)
Air Defence System Upgrades
63 pages
Understanding Appendicitis Symptoms
No ratings yet
Understanding Appendicitis Symptoms
11 pages
City-Level English Exam for 7th Graders
No ratings yet
City-Level English Exam for 7th Graders
24 pages

Random Walks and Markov Chains Overview

Uploaded by

Random Walks and Markov Chains Overview

Uploaded by

RANDOM WALKS

Course: 201.1.8031 Spring 2016

Lecture 2. Markov Chains 8

Lecture 3. Recurrence and Transience 18

Lecture 4. Stationary Distributions 26

Lecture 5. Positive Recurrent Chains 33

Lecture 6. Convergence to Equilibrium 37

Lecture 7. Conditional Expectation 42

Lecture 9. Reversible Chains 55

Lecture 10. Discrete Analysis 60

Lecture 11. Networks 67

Lecture 12. Network Reduction 73

Lecture 13. Thompsons Principle 80

Lecture 14. Nash-Williams 84

Lecture 15. Flows 89

Lecture 16. Resistance in Euclidean Lattices 93

Lecture 17. Spectral Analysis 98

Lecture 18. Kestens Amenability Criterion 103

Lecture 19. 107

Lecture 20. 112

Lecture 21. 118

Lecture 22. 126

Number of exercises in lecture: 0

A gambler playing the roulette.

1.2. Random Walks on Z

X Story about Polya meeting a couple in the woods.

George Polya (1887-1985)

Figure 1. Path of a drunk man walking in the streets.

Figure 2. Path of a drunk bird flying around.

Then, there exists a constant c > 0 such that for all t,

Proof. An inequality we will use is Stirlings approximation of n!:

This leads by a bit of careful computation to:

that the number of +s equals the number of s. Rigorously, if

then Rt + Lt = t. Moreover, the distribution of Rt is Bin(t, 1/2). Also, St = Rt Lt , so for

(St+k )k has the same distribution as a random walk on Z, and is independent of St .

So P[ k 1 : St+k = 0 | St = 0] = p. Thus, every time we are at 0 there is probability

So by the Borel-Cantelli Lemma P[St = 0 i.o. ] = 0. In other words,

P[ T : t > T St 6= 0] = P[lim inf {St 6= 0}] = 1.

Number of exercises in lecture: 0

Lecture 2: Markov Chains

2.1.1. Graphs. We will make use of the structure known as a graph:

X Notation: Many times we will use x G instead of x V (G).

Example 2.2. The complete graph.

CG,S is called the Cayley graph of G with respect to S.

Definition 2.3. Let G be a graph. A path in G is a sequence = (0 , 1 , . . . , n ) (with the

Definition 2.4. Let G be a graph. For two vertices x, y G define

dist(x, y) = distG (x, y) := inf {|| : : x y} ,

 Exercise 2.1. Show that distG defines a metric on G.

(x, y) 0 and (x, y) = 0 iff x = y.

 Exercise 2.2. Prove that is an equivalence relation in any graph.

X In this course we will focus on connected graphs.

S, we use the following time notation: If s = (s0 , s1 , . . . , sn , . . .) is a sequence in S (finite of

Pn [{s0 , s1 , . . . , sn1 }] := P[C(s0 , s1 , . . . , sn1 )]

Also, AB is the matrix defined by

These definitions can be generalized to infinite dimensions.

2.2. Markov Chains

A stochastic process is just a sequence of random variables. If (Xn )n is a stochastic

For any n 0, and any s0 , s1 , . . . , sn , sn+1 S,

P[Xn+1 = sn+1 |X0 = s0 , . . . , Xn = sn ] = P[Xn+1 = sn+1 |Xn = sn ] = P[X1 = sn+1 |X0 = sn ].

Remark 2.8. Any Markov chain is characterized by its transition matrix.

P[Xn = x|X0 = s0 , . . . , Xn = sn ] = P[Xn+1 = x|Xn = sn ] = Psn [X1 = x].

More generally, if is a probability measure on S, we write

P [Xn+k = y|A, Xk = x] = P n (x, y)

(provided P [A, Xk = x] > 0).

Remark 2.12. For those uncomfortable with -algebras,

an = (n1 )P (a) = (1 p)n1 (a) + pn1 (b) 1/2

= (1 2p)(n1 )(a) + p (a) + (1 2p)(a) = (1 2p)an1 .

P n f = E [f (Xn )] and (P n f )(x) = Ex [f (Xn )].

The second assertion also follows by conditional expectation,

2.3. Classification of Markov chains

Definition 2.15. Let P be the transition matrix of a Markov chain on S. P is called

P t (x, y) Px [(X0 , . . . , Xt ) = ] = 2t > 0.

Definition 2.17. Let P be a Markov chain on S.

Proposition 2.19. Let P be a Markov chain on state space S.

Number of exercises in lecture: 4

Lecture 3: Recurrence and Transience

Exercise 2.1. Show that distG defines a metric on G.

Exercise 2.2. Prove that is an equivalence relation in any graph.

Exercise 3.1. Define an equivalence relation on S N by t 0 if j = j0 for all

Exercise 3.2. Show that (X0 , . . . , XT ) is a -algebra.

Exercise 4.1. If satisfies the detailed balance equations, then is a stationary