Random Walks and Markov Chains Overview
ARIEL YADIN
Contents
Lecture 1. Introduction
Lecture 8. Martingales
Random Walks
Ariel Yadin
Lecture 1: Introduction
1.1. Overview
In this course we will study the behavior of random processes; that is, processes that evolve
in time with some randomness, or probability measure, governing the evolution.
Let us give some examples: some questions which we will (hopefully) be able to answer by the end of the course.
• Suppose a gambler starts with $N$ Shekel. What is the probability that the gambler will earn another $N$ Shekel before losing all of the money?
• How long will it take for a drunk man walking to reach either his house or the city limits?
• Suppose a chess knight moves randomly on a chess board. Will the knight eventually return to the starting point? What is the expected number of steps until the knight returns?
• Suppose that men of the Rothschild family have three children on average. What is the probability that the Rothschild name will still be alive in another 100 years? Is there positive probability for the Rothschild name to survive forever?
We will start with some soft examples, and then go into the deeper and more precise theory.
What is a random walk? A (simple) random walk on a graph is a process, or a sequence of vertices, such that at every step the next vertex is chosen uniformly among the neighbors of the current vertex, each step independently of all previous steps.
Now, suppose we want to perform a random walk on $\mathbb{Z}$. If the walker is at a vertex $z$, then a uniformly chosen neighbor is $z+1$ or $z-1$, with probability $1/2$ each.
That is, we can model a random walk on $\mathbb{Z}$ by considering an i.i.d. sequence $(X_k)_{k=1}^\infty$, where $X_k$ is uniform on $\{-1, 1\}$, and the walk will be $S_t = \sum_{k=1}^t X_k$. So $X_k$ is the $k$-th step of the walk, and $S_t$ is the position after $t$ steps.
Let us consider a few properties of the random walk on Z:
First let us calculate the expected number of visits to $0$ by time $t$:

Proposition 1.1. Let $(S_t)_t$ be a random walk on $\mathbb{Z}$. Denote by $V_t$ the number of visits to $0$ up to time $t$; that is,
$$V_t = \#\{1 \le k \le t : S_k = 0\}.$$
Then $\mathbb{E}[V_t] \ge c\sqrt{t}$ for some constant $c > 0$.

Proof. What is the probability $\mathbb{P}[S_k = 0]$? Note that there are $k$ steps, so for $S_k = 0$ we need the number of right steps to equal the number of left steps: writing
$$R_t = \#\{1 \le k \le t : X_k = 1\} \quad \text{and} \quad L_t = \#\{1 \le k \le t : X_k = -1\},$$
we need $R_k = L_k = k/2$, which is only possible for even $k$. Thus,
$$\mathbb{E}[V_t] = \sum_{k=1}^t \mathbb{P}[S_k = 0] = \sum_{k=1}^{\lfloor t/2 \rfloor} \binom{2k}{k} 2^{-2k}.$$
By Stirling's formula, $\binom{2k}{k} 2^{-2k} \sim \frac{1}{\sqrt{\pi k}}$. Since
$$\sum_{k=1}^m \frac{1}{\sqrt{k}} \ge \int_1^{m+1} \frac{dx}{\sqrt{x}} = 2(\sqrt{m+1} - 1),$$
we get that $\mathbb{E}[V_t] \ge c\sqrt{t}$ for some $c > 0$. □
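The sum appearing in Proposition 1.1 can be evaluated exactly, so the $\sqrt{t}$ growth is easy to check numerically. A minimal Python sketch (the function name `expected_visits` is our choice, not from the text):

```python
from math import comb, sqrt

def expected_visits(t):
    # E[V_t] = sum_{k=1}^{floor(t/2)} C(2k, k) 2^{-2k}; visits to 0 occur only at even times
    return sum(comb(2 * k, k) / 4 ** k for k in range(1, t // 2 + 1))

# By Stirling, each summand is ~ 1/sqrt(pi k), so E[V_t]/sqrt(t) tends to sqrt(2/pi).
for t in (100, 400, 1600):
    print(t, round(expected_visits(t) / sqrt(t), 3))
```

The printed ratios slowly approach $\sqrt{2/\pi} \approx 0.798$, consistent with the lower bound in the proposition.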
Let us now consider the probability that the random walker will return to the origin.

Proposition 1.2. $\mathbb{P}[\exists\, t \ge 1 : S_t = 0] = 1$.

Proof. Let $p = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Assume for a contradiction that $p < 1$. (Note $p > 0$, since $p \ge \mathbb{P}[S_2 = 0] = \frac{1}{2}$.) Suppose that $S_t = 0$ for some $t > 0$. Then, since $S_{t+k} = S_t + \sum_{j=1}^k X_{t+j}$, the walk after time $t$ is again a random walk started at $0$. Define the successive return times $T_0 = 0$ and
$$T_k = \inf\{t \ge T_{k-1} + 1 : S_t = 0\},$$
where $\inf \emptyset = \infty$. Now let $K$ be the first $k$ such that $T_k = \infty$. The analysis above gives that for $k \ge 1$,
$$\mathbb{P}[K = k] = \mathbb{P}[T_1 < \infty, \ldots, T_{k-1} < \infty, T_k = \infty] = \mathbb{P}[T_1 - T_0 < \infty, \ldots, T_{k-1} - T_{k-2} < \infty, T_k - T_{k-1} = \infty].$$
The main observation now is that the differences $T_k - T_{k-1}$ are independent, so $\mathbb{P}[K = k] = p^{k-1}(1-p)$. That is, $K \sim \mathrm{Geo}(1-p)$. Thus, $\mathbb{E}[K] = \frac{1}{1-p} < \infty$. But note that $K$ is exactly the number of visits to $0$ in the infinite-time walk. That is, $V_t \nearrow K$. However, in the previous proposition we have shown that $\mathbb{E}[V_t] \ge c\sqrt{t} \to \infty$, a contradiction!
So it must be that $p = 1$. □
It is not a coincidence that the expected number of visits to 0 is infinite, and that the
probability to return to 0 is 1. This will also be the case in 2-dimensions, but not in 3-dimensions.
In the upcoming classes we will rigorously prove the following theorem of Pólya.
Theorem 1.3. Fix $d \ge 1$. Let $(X_k)_k$ be i.i.d. $d$-dimensional random variables uniformly distributed on $\{\pm e_1, \ldots, \pm e_d\}$ (where $e_1, \ldots, e_d$ is the standard basis for $\mathbb{R}^d$). Let $S_t = \sum_{k=1}^t X_k$. Let $p(d) = \mathbb{P}[\exists\, t \ge 1 : S_t = 0]$. Then, $p(d) = 1$ for $d \le 2$ and $p(d) < 1$ for $d \ge 3$.
Remark 1.4. The proof for $d \ge 3$ is mainly that $\mathbb{P}[S_t = 0] \le C t^{-d/2}$. Thus, for $d \ge 3$,
$$\sum_{t=1}^{\infty} \mathbb{P}[S_t = 0] < \infty.$$
Thus, a.s. the number of visits to $0$ is finite. If the probability to return to $0$ were $1$, then the number of visits to $0$ would be infinite a.s. All this will be done rigorously in the upcoming classes.
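The dichotomy in Theorem 1.3 can be probed by simulation within a finite horizon. The sketch below is ours (function name, horizon, and trial counts are arbitrary choices); it estimates the probability of returning to the origin within 500 steps:

```python
import random

def return_prob(d, horizon=500, trials=1500, seed=0):
    # Estimate P[exists 1 <= t <= horizon : S_t = 0] for the SRW on Z^d.
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pos = [0] * d
        for _ in range(horizon):
            i = rng.randrange(d)           # pick a coordinate direction...
            pos[i] += rng.choice((-1, 1))  # ...and a sign, uniformly
            if not any(pos):               # back at the origin
                hits += 1
                break
    return hits / trials

# p(1) is already close to 1 at this horizon; p(2) -> 1 but only logarithmically
# slowly; p(3) stays bounded away from 1 (the limit is about 0.34).
print([return_prob(d) for d in (1, 2, 3)])
```

Note that the $d = 2$ estimate is far from its limit $1$ at any feasible horizon, since the non-return probability decays only like $1/\log t$.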
2.1. Preliminaries
X Notation: For a set $S$ we use $\binom{S}{k}$ to denote the set of all subsets of size $k$ in $S$; e.g.
$$\binom{S}{2} = \{\{x, y\} : x, y \in S,\ x \ne y\}.$$

Definition 2.1. A graph $G$ is a pair $G = (V(G), E(G))$, where $V(G)$ is a countable set, and $E(G) \subseteq \binom{V(G)}{2}$.
The elements of $V(G)$ are called vertices. The elements of $E(G)$ are called edges. The notation $x \overset{G}{\sim} y$ (sometimes just $x \sim y$ when $G$ is clear from the context) is used for $\{x,y\} \in E(G)$.
If $x \sim y$, we say that $x$ is a neighbor of $y$, or that $x$ is adjacent to $y$. If $x \in e \in E(G)$ then the edge $e$ is said to be incident to $x$, and $x$ is incident to $e$.
The degree of a vertex $x$, denoted $\deg(x) = \deg_G(x)$, is the number of edges incident to $x$ in $G$.
The notion of a path on a graph gives rise to two important notions: connectivity and graph
distance.
The graph distance between $x$ and $y$ is $\mathrm{dist}_G(x, y) = \inf\{n : \text{there is a path } x = x_0 \sim x_1 \sim \cdots \sim x_n = y\}$, where $\inf \emptyset = \infty$.
Definition 2.5. Let $G$ be a graph. We say that vertices $x$ and $y$ are connected if there exists a path $\gamma : x \to y$ of finite length; that is, if $\mathrm{dist}_G(x, y) < \infty$. We denote $x$ connected to $y$ by $x \leftrightarrow y$.
The relation $\leftrightarrow$ is an equivalence relation, so we can speak of equivalence classes. The equivalence class of a vertex $x$ under this relation is called the connected component of $x$.
If a graph $G$ has only one connected component it is called connected. That is, $G$ is connected if for every $x, y \in G$ we have that $x \leftrightarrow y$.
X Notation: For a path in a graph $G$, or more generally, a sequence $\omega$ of elements from a set, we write $\omega[s, t] = (\omega_s, \omega_{s+1}, \ldots, \omega_t)$ for the portion of the sequence between times $s$ and $t$.
2.1.2. S-valued random variables. Given a countable set $S$, we can define the discrete topology on $S$. Thus, the Borel $\sigma$-algebra on $S$ is just the complete $\sigma$-algebra $2^S$. This gives rise to the notion of $S$-valued random variables, which is just a fancy name for functions $X$ from a probability space into $S$ such that for every $s \in S$ the pull-back $X^{-1}(s)$ is an event.
That is,

Definition 2.6. Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and let $S$ be a countable set. An $S$-valued random variable is a function $X : \Omega \to S$ such that for any $s \in S$, $X^{-1}(s) \in \mathcal{F}$.
2.1.3. Sequences - infinite dimensional vectors. At some point, we will want to consider
sequences of random variables. If X = (Xn )n is a sequence of S-valued random variables, we
can think of X as an infinite dimensional vector.
What is the appropriate measurable space for such vectors?
Well, we can consider $\Omega = S^{\mathbb{N}}$, the space of all sequences in $S$. Next, we have a $\pi$-system of cylinder sets: given a finite sequence $s_0, s_1, \ldots, s_m$ in $S$, the cylinder induced by these is $C = C(s_0, \ldots, s_m) = \{\omega \in S^{\mathbb{N}} : \omega_0 = s_0, \ldots, \omega_m = s_m\}$. The collection of all cylinder sets forms a $\pi$-system. We let $\mathcal{F}$ be the $\sigma$-algebra generated by this $\pi$-system.
2.1.4. Carathéodory and Kolmogorov extension. Now suppose we have a probability measure $\mathbb{P}$ on $(\Omega, \mathcal{F})$ as above. For every $n$, we can consider the restriction of $\mathbb{P}$ to the first $n$ coordinates; that is, we can consider $\Omega_n = S^n$ with the full $\sigma$-algebra on $\Omega_n$, and then $\mathbb{P}$ defines a probability measure $\mathbb{P}_n$ on $\Omega_n$. Note that these measures are consistent, in the sense that for any $n > m$,
$$\mathbb{P}_m[\{(s_0, \ldots, s_m)\}] = \mathbb{P}_n[\{\omega \in S^n : \omega_0 = s_0, \ldots, \omega_m = s_m\}].$$
Theorems by Carathéodory and Kolmogorov tell us that if we started with a consistent family of probability measures on $S^n$, $n = 1, 2, \ldots$, we could find a unique extension of these; that is, a unique measure on $(\Omega, \mathcal{F})$ whose restrictions give these measures.
In other words, the finite-dimensional marginals determine the probability measure of the sequence.
2.1.5. Matrices. Recall that if $A, B$ are $n \times n$ matrices and $v$ is an $n$-dimensional vector, then $Av, vA$ are vectors defined by
$$(Av)_k = \sum_{j=1}^n A_{k,j} v_j \quad \text{and} \quad (vA)_k = \sum_{j=1}^n v_j A_{j,k}.$$
Definition 2.7. Let $S$ be a countable set. A Markov chain on $S$ is a sequence $(X_n)_{n \ge 0}$ of $S$-valued random variables (i.e. measurable functions $X_n : \Omega \to S$) that satisfies the following Markovian property: there exists a matrix $P$, indexed by $S$, such that for all $n$ and all states $s_0, \ldots, s_{n-1}, s, s'$ (whenever the conditioning event has positive probability),
$$\mathbb{P}[X_{n+1} = s' \mid X_0 = s_0, \ldots, X_{n-1} = s_{n-1}, X_n = s] = P(s, s').$$
That is, the probability to go from $s$ to $s'$ does not depend on $n$ or on the history, but only on the current position $s$ and on $s'$. This property is known as the Markov property. The matrix $P$ is called the transition matrix of the chain.
X A set $S$ as above is called the state space.
Note that every row of $P$ sums to $1$, i.e. $\sum_{s'} P(s, s') = 1$ for all $s$, and that all the entries of $P$ are in $[0,1]$. Such a matrix is called stochastic. [Each row of the matrix is a probability measure on $S$.]
On the other hand, suppose that $P$ is a stochastic matrix indexed by a countable set $S$. Then one can define a sequence of $S$-valued random variables as follows. Let $X_0 = x$ for some fixed starting point $x \in S$. For all $n \ge 0$, conditioned on $X_0 = s_0, \ldots, X_n = s_n$, define $X_{n+1}$ as the random variable with distribution $\mathbb{P}[X_{n+1} = y \mid X_n = s_n, \ldots, X_0 = s_0] = P(s_n, y)$. One can verify that this defines a Markov chain.
We will identify a stochastic matrix P with the Markov chain it defines.
X Notation: We say that $(X_t)_t$ is Markov-$(\mu, P)$ if $(X_t)_t$ is a Markov chain with transition matrix $P$ and starting distribution $X_0 \sim \mu$. If we wish to stress the state space, we say that $(X_t)_t$ is Markov-$(\mu, P, S)$. Sometimes we omit the starting distribution; i.e. $(X_t)_t$ is Markov-$P$ means that $(X_t)_t$ is a Markov chain with transition matrix $P$.
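The construction above (sampling $X_{n+1}$ from the row $P(X_n, \cdot)$) can be sketched directly in code. This is our illustration, not from the text; the helper `run_chain` and the 3-state matrix are arbitrary choices:

```python
import random

def run_chain(P, x0, steps, rng):
    # Sample X_0, ..., X_steps, where X_{n+1} ~ P(X_n, .), as in the construction above.
    path = [x0]
    for _ in range(steps):
        u, acc = rng.random(), 0.0
        for state, prob in P[path[-1]].items():
            acc += prob
            if u < acc:
                path.append(state)
                break
        else:  # guard against floating-point rounding in the row sum
            path.append(state)
    return path

# A small 3-state stochastic matrix (hypothetical numbers, rows sum to 1).
P = {"a": {"a": 0.5, "b": 0.5}, "b": {"b": 0.5, "c": 0.5}, "c": {"c": 0.5, "a": 0.5}}
path = run_chain(P, "a", 10000, random.Random(1))
```

By the cyclic symmetry of this particular matrix, the long-run fraction of time spent in each state is close to $1/3$.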
Example 2.9. Consider the following state space and matrix: $S = \mathbb{Z}$, with $P(x,y) = 0$ if $|x - y| \ne 1$ and $P(x,y) = 1/2$ if $|x - y| = 1$.
What if we change this to $P(x,y) = 1/4$ for $|x - y| = 1$ and $P(x,x) = 1/2$?
What about $P(x, x+1) = 3/4$ and $P(x, x-1) = 1/4$?
Example 2.10. Consider the set $\mathbb{Z}_n := \mathbb{Z}/n\mathbb{Z} = \{0, 1, \ldots, n-1\}$. Let $P(x,y) = 1/2$ for $x - y \equiv \pm 1 \pmod{n}$.
Example 2.11. Let $G$ be a graph. For $x, y \in G$ define $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
This Markov chain is called the simple random walk on $G$.
If we take $0 < \alpha < 1$ and set $Q(x,x) = \alpha$ and $Q(x,y) = (1-\alpha)\frac{1}{\deg(x)}$ for $x \sim y$, and $Q(x,y) = 0$ for $x \not\sim y$, then $Q$ is also a stochastic matrix, and defines what is sometimes called the lazy random walk on $G$ (with holding probability $\alpha$). Note that $Q = \alpha I + (1-\alpha) P$.
X Notation: We will usually use $(X_n)_n$ to denote the realization of Markov chains. We will also use $\mathbb{P}_x$ to denote the probability measure $\mathbb{P}_x = \mathbb{P}[\,\cdot \mid X_0 = x]$. Note that the Markov property is just the statement that
$$\mathbb{P}[X_{n+1} = y \mid X_n = x, X_{n-1} = s_{n-1}, \ldots, X_0 = s_0] = \mathbb{P}_x[X_1 = y] = P(x, y).$$
Exercise 2.3. Let $(X_n)_n$ be a Markov chain on state space $S$, with transition matrix $P$. Show that for any event $A \in \sigma(X_0, \ldots, X_k)$ with $\mathbb{P}[A, X_k = x] > 0$,
$$\mathbb{P}[X_{k+n} = y \mid A, X_k = x] = P^n(x, y).$$
Example 2.13. Consider a bored programmer. She has a (possibly biased) coin, and two chairs, say $a$ and $b$. Every minute, out of boredom, she tosses the coin. If it comes out heads, she moves to the other chair. Otherwise, she does nothing.
This can be modeled by a Markov chain on the state space $\{a, b\}$. At each time, with some probability $1-p$ the programmer does not move, and with probability $p$ she jumps to the other state. The corresponding transition matrix would be
$$P = \begin{pmatrix} 1-p & p \\ p & 1-p \end{pmatrix}.$$
What is the probability $\mathbb{P}_a[X_n = b]$? For this we need to calculate $P^n$.
A complicated way would be to analyze the eigenvalues of $P$...
An easier way: let $\mu_n = P^n(a, \cdot)$. So $\mu_{n+1} = \mu_n P$. Consider the vector $\pi = (1/2, 1/2)$. Then $\pi P = \pi$. Now, consider $a_n = (\mu_n - \pi)(a)$. Since $\mu_n$ is a probability measure, we get that $\mu_n(b) = 1 - \mu_n(a)$, so
$$a_{n+1} = \mu_n(a)(1-p) + \mu_n(b)\, p - \tfrac{1}{2} = (1 - 2p)\, a_n.$$
So $a_n = (1-2p)^n a_0 = (1-2p)^n \cdot \tfrac{1}{2}$ and $P^n(a,a) = \mu_n(a) = \frac{1 + (1-2p)^n}{2}$. (This also implies that $P^n(a,b) = 1 - P^n(a,a) = \frac{1 - (1-2p)^n}{2}$.)
We see that for $0 < p < 1$,
$$P^n \to \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}.$$
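The closed form $P^n(a,a) = \frac{1+(1-2p)^n}{2}$ and the convergence of $P^n$ can be checked by direct matrix multiplication. A small sketch (helper name `matmul2` and the value of $p$ are our choices):

```python
def matmul2(A, B):
    # product of two 2x2 matrices, represented as lists of rows
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

p = 0.3  # any 0 < p < 1; this value is just for illustration
P = [[1 - p, p], [p, 1 - p]]

Pn = [[1.0, 0.0], [0.0, 1.0]]  # P^0 = I
for n in range(1, 51):
    Pn = matmul2(Pn, P)
    # closed form derived above: P^n(a, a) = (1 + (1 - 2p)^n) / 2
    assert abs(Pn[0][0] - (1 + (1 - 2 * p) ** n) / 2) < 1e-9
```

After 50 steps, $(1-2p)^n$ is negligible and every row of $P^n$ is essentially $(1/2, 1/2)$.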
The following proposition relates starting distributions, and steps of the Markov chain, to
matrix and vector multiplication.
Proposition 2.14. Let $(X_n)_n$ be a Markov chain with transition matrix $P$ on some state space $S$, with starting distribution $\mu$; i.e. $\mu$ is an $S$-indexed vector with $\sum_s \mu(s) = 1$. Then, $\mathbb{P}_\mu[X_n = y] = (\mu P^n)(y)$. Specifically, taking $\mu = \delta_x$ we get that $\mathbb{P}_x[X_n = y] = P^n(x, y)$.
Moreover, if $f : S \to \mathbb{R}$ is any function, which can be viewed as an $S$-indexed vector, then $(P^n f)(x) = \mathbb{E}_x[f(X_n)]$.
Proof. This is shown by induction: it is the definition for $n = 0$ ($P^0 = I$, the identity matrix). The Markov property gives for $n > 0$, using induction,
$$\mathbb{P}_\mu[X_n = y] = \sum_{s \in S} \mathbb{P}_\mu[X_n = y \mid X_{n-1} = s]\, \mathbb{P}_\mu[X_{n-1} = s] = \sum_s P(s, y)\, (\mu P^{n-1})(s) = ((\mu P^{n-1}) P)(y) = (\mu P^n)(y).$$
The identity $(P^n f)(x) = \mathbb{E}_x[f(X_n)]$ follows by the same computation with $\mu = \delta_x$. □
When we spoke about graphs, we had the notion of connectivity. We are now interested in generalizing this notion to Markov chains. We want to say that a state $x$ is connected to a state $y$ if there is a way to get from $x$ to $y$; note that for general Markov chains this does not necessarily imply that one can get from $y$ to $x$.

Definition 2.15. A Markov chain with transition matrix $P$ on state space $S$ is called irreducible if for every pair of states $x, y \in S$ there exists $t \ge 0$ such that $P^t(x, y) > 0$.

This means that for every pair, there is a large enough time such that with positive probability the chain can go from one of the pair to the other in that time.
Example 2.16. Consider the cycle $\mathbb{Z}/n\mathbb{Z}$, for $n$ even. This is an irreducible chain: for any $x, y$, taking $t = \mathrm{dist}(x, y)$ and a path $\gamma$ of length $t$ from $x$ to $y$, we have
$$P^t(x, y) \ge P(\gamma_0, \gamma_1) \cdots P(\gamma_{t-1}, \gamma_t) = 2^{-t} > 0.$$
Note that at each step, the Markov chain moves from the current position by $+1$ or $-1$ $(\mathrm{mod}\ n)$. Thus, since $n$ is even, at even times the chain must be at even vertices, and at odd times the chain must be at odd vertices.
Thus, it is not true that there exists $t > 0$ such that for all $x, y$, $P^t(x, y) > 0$.
The main reason for this is that the chain has a period: at even times it is on some set, and at odd times on a different set. Similarly, the chain cannot be back at its starting point at odd times, only at even times.
A state $x$ is called periodic if $\gcd\{t \ge 1 : P^t(x,x) > 0\} > 1$, and this gcd is called the period of $x$.
If $\gcd\{t \ge 1 : P^t(x,x) > 0\} = 1$ then $x$ is called aperiodic.
$P$ is called aperiodic if all $x \in S$ are aperiodic. Otherwise $P$ is called periodic.
X Note that in the even-length cycle example, $\gcd\{t \ge 1 : P^t(x,x) > 0\} = \gcd\{2, 4, 6, \ldots\} = 2$.
Remark 2.18. If $P$ is periodic, then there is an easy way to fix $P$ to become aperiodic: namely, let $Q = \alpha I + (1-\alpha) P$, for some $0 < \alpha < 1$, be a lazy version of $P$. Then $Q(x,x) \ge \alpha > 0$ for all $x$, and thus $Q$ is aperiodic.
Lemma. Let $P$ be a Markov chain on a state space $S$. Then:
• $x$ is aperiodic if and only if there exists $t(x)$ such that for all $t > t(x)$, $P^t(x,x) > 0$.
• If $P$ is irreducible, then $P$ is aperiodic if and only if there exists an aperiodic state $x$.
• Consequently, if $P$ is irreducible and aperiodic, and if $S$ is finite, then there exists $t_0$ such that for all $t > t_0$, all $x, y$ admit $P^t(x, y) > 0$.
Proof. We start with the first assertion. Assume that $x$ is aperiodic. Let $R = \{t \ge 1 : P^t(x,x) > 0\}$. Since $P^{t+s}(x,x) \ge P^t(x,x) P^s(x,x)$, we get that $t, s \in R$ implies $t + s \in R$; i.e. $R$ is closed under addition. A number-theoretic result tells us that since $\gcd R = 1$ and $R$ is closed under addition, it must be that $R^c$ is finite.
The other direction is simpler. If $R^c$ is finite, then $R$ contains two distinct primes $p \ne q$, so $\gcd R \le \gcd(p, q) = 1$.
For the second assertion, if $P$ is irreducible and $x$ is aperiodic, then let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. For any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$ (which exists by irreducibility). Then, for any $t > t(y,x) + t(x) + t(x,y)$ we get that
$$P^t(y,y) \ge P^{t(y,x)}(y,x)\, P^{t - t(y,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0.$$
So for all large enough $t$, $P^t(y,y) > 0$, which implies that $y$ is aperiodic. This holds for all $y$, so $P$ is aperiodic.
The other direction is trivial from the definition.
For the third assertion, for any $z, y$ let $t(z,y)$ be such that $P^{t(z,y)}(z,y) > 0$. Let $T = \max_{z,y} t(z,y)$, which is finite since $S$ is finite. Let $x$ be an aperiodic state and let $t(x)$ be such that for all $t > t(x)$, $P^t(x,x) > 0$. We get that for any $t > 2T + t(x)$ we have $t - t(z,x) - t(x,y) \ge t - 2T > t(x)$, so
$$P^t(z, y) \ge P^{t(z,x)}(z,x)\, P^{t - t(z,x) - t(x,y)}(x,x)\, P^{t(x,y)}(x,y) > 0. \qquad \Box$$
Exercise 2.4. Let $G$ be a finite connected graph, and let $Q$ be the lazy random walk on $G$ with holding probability $\alpha$; i.e. $Q = \alpha I + (1-\alpha) P$, where $P(x,y) = \frac{1}{\deg(x)}$ if $x \sim y$ and $P(x,y) = 0$ if $x \not\sim y$.
Show that $Q$ is aperiodic. Show that for $\mathrm{diam}(G) = \max\{\mathrm{dist}(x,y) : x, y \in G\}$ we have that for all $t > \mathrm{diam}(G)$, all $x, y \in G$ admit $Q^t(x, y) > 0$.
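The claim in Exercise 2.4 can be sanity-checked numerically on the even cycle from Example 2.16, which is periodic while its lazy version is not. The setup below (cycle length, holding probability, helper name) is our choice:

```python
n, alpha = 6, 0.5  # even cycle (a periodic chain), holding probability 1/2

P = [[0.0] * n for _ in range(n)]
for x in range(n):
    P[x][(x + 1) % n] = P[x][(x - 1) % n] = 0.5
Q = [[alpha * (i == j) + (1 - alpha) * P[i][j] for j in range(n)] for i in range(n)]

def matpow(A, t):
    R = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(t):
        R = [[sum(R[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    return R

P4 = matpow(P, 4)  # periodicity: P^t(x, y) = 0 when t and dist(x, y) have different parity
Q4 = matpow(Q, 4)  # t = 4 > diam(G) = 3, so every entry should be positive
```

Here `P4[0][1]` is exactly zero (vertex 1 is at odd distance from 0), while every entry of `Q4` is strictly positive.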
X Notation: If $(X_t)_t$ is Markov-$P$ on state space $S$, we can define the following: for $A \subseteq S$,
$$T_A = \inf\{t \ge 0 : X_t \in A\} \quad \text{and} \quad T_A^+ = \inf\{t \ge 1 : X_t \in A\}.$$
These are the hitting time of $A$ and return time to $A$. (We use the convention that $\inf \emptyset = \infty$.)
If $A = \{x\}$ we write $T_x = T_{\{x\}}$ and similarly $T_x^+ = T_{\{x\}}^+$.
Recall that we saw that the simple random walk on $\mathbb{Z}$ a.s. returns to the origin. We also stated that on $\mathbb{Z}^3$ this is not true: with positive probability, the simple random walk never returns to the origin.
Let us classify Markov chains according to these properties.

Definition 3.1. Let $(X_t)_t$ be a Markov chain on $S$. A state $x \in S$ is called recurrent if $\mathbb{P}_x[T_x^+ < \infty] = 1$, and transient otherwise. A recurrent state $x$ is called positive recurrent if $\mathbb{E}_x[T_x^+] < \infty$, and null recurrent if $\mathbb{E}_x[T_x^+] = \infty$.
Theorem 3.2. Let $(X_t)_t$ be a Markov chain on $S$ with transition matrix $P$. If $P$ is irreducible, then for any $x, y \in S$: $x$ is (positive, null) recurrent if and only if $y$ is (positive, null) recurrent.
That is, for irreducible chains, all the states have the same classification.
The hitting and return times above have the property that their value can be determined by the history of the chain; that is, the event $\{T_A \le t\}$ is determined by $\sigma(X_0, X_1, \ldots, X_t)$.

Definition 3.3 (Stopping Time). Consider a Markov chain on $S$. Recall that the probability space is $(S^{\mathbb{N}}, \mathcal{F}, \mathbb{P})$, where $\mathcal{F}$ is the $\sigma$-algebra generated by the cylinder sets.
A random variable $T : S^{\mathbb{N}} \to \mathbb{N} \cup \{\infty\}$ is called a stopping time if for all $t \ge 0$, the event $\{T \le t\} \in \sigma(X_0, \ldots, X_t)$.
Example 3.4. Any hitting time and return time is a stopping time. Indeed,
$$\{T_A \le t\} = \bigcup_{j=0}^{t} \{X_j \in A\}.$$
Example 3.5. Consider the simple random walk on $\mathbb{Z}^3$. Let $T = \sup\{t : X_t = 0\}$. This is the last time the walk is at $0$. One can show that $T$ is a.s. finite. However, $T$ is not a stopping time: to decide whether $\{T \le t\}$ occurred one must know that the walk never visits $0$ after time $t$, which is not determined by $(X_0, \ldots, X_t)$.
Example 3.6. Let $(X_t)_t$ be a Markov chain and let $T = \inf\{t \ge T_A : X_t \in A'\}$, where $A, A' \subseteq S$. Then $T$ is a stopping time, since
$$\{T \le t\} = \bigcup_{k=0}^{t} \bigcup_{m=0}^{k} \{X_m \in A,\ X_k \in A'\}.$$
3.2.1. Conditioning on a stopping time. Stopping times are extremely important in the
theory of martingales, a subject we will come back to in the future.
For the moment, the important property we want is the Strong Markov Property.
For a fixed time $t$, we saw that conditioned on $X_t$, the process $(X_{t+n})_n$ is a Markov chain started at $X_t$, independent of $(X_0, \ldots, X_t)$. We want to do the same thing for stopping times.
Proposition 3.8 (Strong Markov Property). Let $(X_t)_t$ be Markov-$P$ on $S$, and let $T$ be a stopping time. For all $t \ge 0$, define $Y_t = X_{T+t}$. Then, conditioned on $T < \infty$ and $X_T$, the sequence $(Y_t)_t$ is independent of $(X_0, \ldots, X_T)$ and is Markov-$(\delta_{X_T}, P)$.

Proof. The (regular) Markov property tells us that for any $m > k$ and any event $A \in \sigma(X_0, \ldots, X_k)$,
$$\mathbb{P}[X_{m+1} = y \mid X_m = x, A] = P(x, y).$$
We claim the analogous identity for the stopping time $T$: for any $A \in \sigma(X_0, \ldots, X_T)$,
$$\mathbb{P}[X_{T+t+1} = y \mid X_{T+t} = x, A, T < \infty] = P(x, y)$$
(provided of course that $\mathbb{P}[X_{T+t} = x, A, T < \infty] > 0$). Indeed, this follows from the fact that $A \cap \{T = k\} \in \sigma(X_0, \ldots, X_k) \subseteq \sigma(X_0, \ldots, X_{k+t})$ for all $k$, so
$$\mathbb{P}[X_{T+t+1} = y, A, X_{T+t} = x, T < \infty] = \sum_{k=0}^{\infty} \mathbb{P}[X_{k+t+1} = y, X_{k+t} = x, A, T = k]$$
$$= P(x, y) \sum_{k=0}^{\infty} \mathbb{P}[X_{k+t} = x, A, T = k] = P(x, y)\, \mathbb{P}[X_{T+t} = x, A, T < \infty]. \qquad \Box$$
Another way to state the above proposition is that for a stopping time T , conditional on
T < we can restart the Markov chain from XT .
Fix a state $x$, and define inductively $T_x^{(0)} = 0$ and $T_x^{(k)} = \inf\{t > T_x^{(k-1)} : X_t = x\}$ (with $\inf \emptyset = \infty$); these times cut the path of the chain into excursions. These excursions are paths of the Markov chain ending at $x$ and starting at $x$ (except, possibly, the first excursion, which starts at $X_0$).
For $k > 0$ define
$$\tau_x^{(k)} = T_x^{(k)} - T_x^{(k-1)}$$
if $T_x^{(k)} < \infty$, and $0$ otherwise. For $T_x^{(k)} < \infty$, this is the length of the $k$-th excursion.
We claim that conditioned on $T_x^{(k-1)} < \infty$, the excursion $X[T_x^{(k-1)}, T_x^{(k)}]$ is independent of $(X_0, \ldots, X_{T_x^{(k-1)}})$, and has the distribution of the first excursion $X[0, T_x^+]$ conditioned on $X_0 = x$.
Indeed, let $Y_t = X_{T_x^{(k-1)} + t}$. For any $A \in \sigma(X_0, \ldots, X_{T_x^{(k-1)}})$, and for any path $\gamma : x \to x$, since $X_{T_x^{(k-1)}} = x$, the strong Markov property gives
$$\mathbb{P}[Y[0, \tau_x^{(k)}] = \gamma \mid A, T_x^{(k-1)} < \infty] = \mathbb{P}[X[T_x^{(k-1)}, T_x^{(k)}] = \gamma \mid A, T_x^{(k-1)} < \infty] = \mathbb{P}_x[X[0, T_x^+] = \gamma].$$
Consequently, we obtain the following: for the total number of visits $V(x) = V_\infty(x)$ (not counting time $0$),
$$1 + \mathbb{E}_x[V(x)] = \frac{1}{\mathbb{P}_x[T_x^+ = \infty]},$$
where $1/0 = \infty$.
Proof. The event $\{V(x) \ge k\}$ is the event that $x$ is visited at least $k$ times, which is exactly the event that the $k$-th excursion ends at some finite time. From the example above we have that for any $m$,
$$\mathbb{P}[T_x^{(m)} < \infty \mid T_x^{(m-1)} < \infty] = \mathbb{P}[\exists\, t \ge 1 : X_{T_x^{(m-1)} + t} = x \mid T_x^{(m-1)} < \infty] = \mathbb{P}_x[T_x^+ < \infty].$$
Since $\{T_x^{(m)} < \infty\} = \{T_x^{(m)} < \infty,\ T_x^{(m-1)} < \infty\}$, we can inductively conclude that
$$\mathbb{P}_x[V(x) \ge k] = \mathbb{P}_x[T_x^{(k)} < \infty] = \left(\mathbb{P}_x[T_x^+ < \infty]\right)^k.$$
Summing over $k \ge 1$ gives $\mathbb{E}_x[V(x)] = \frac{q}{1-q}$ for $q = \mathbb{P}_x[T_x^+ < \infty]$, so $1 + \mathbb{E}_x[V(x)] = \frac{1}{1-q}$, which is the assertion. □
Exercise 3.3. Let $(X_t)_t$ be Markov-$(S, P)$ for some irreducible $P$. Let $Z \subseteq S$. Show that under $\mathbb{P}_x$, the number of visits to $x$ until hitting $Z$ (i.e. the random variable $V = V_{T_Z}(x) + 1_{\{X_0 = x\}}$) is distributed geometric-$p$, for $p = \mathbb{P}_x[T_Z < T_x^+]$.
Corollary 3.11. Let $P$ be an irreducible Markov chain on $S$. Then the following are equivalent:
(1) $x$ is recurrent.
(2) $\mathbb{P}_x[V(x) = \infty] = 1$.
(3) For any state $y$, $\mathbb{P}_x[T_y^+ < \infty] = 1$.
(4) $\mathbb{E}_x[V(x)] = \infty$.
Proof. The equivalence of (1), (2) and (4) follows from the previous proposition and its proof: $\mathbb{P}_x[V(x) \ge k] = (\mathbb{P}_x[T_x^+ < \infty])^k$, which equals $1$ for all $k$ exactly when $x$ is recurrent, and in that case $\mathbb{E}_x[V(x)] = \infty$.
For (1) $\Rightarrow$ (3): since $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$ (this is an exercise; see Exercise 3.4). Thus, we have that $p := \mathbb{P}_x[T_y < T_x^+] \ge \mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$. This implies by the strong Markov property that each excursion from $x$ visits $y$ with probability $p$, independently of the previous excursions. So, using the fact that $\mathbb{P}_x[\forall\, k,\ T_x^{(k)} < \infty] = 1$ (by recurrence),
$$\mathbb{P}_x[T_y \ge T_x^{(k)}] = \mathbb{P}_x[T_y \ge T_x^{(k)} \mid T_y > T_x^{(k-1)},\ T_x^{(k-1)} < \infty] \cdot \mathbb{P}_x[T_y > T_x^{(k-1)}] \le (1-p)^k \to 0.$$
Thus, $\mathbb{P}_x[T_y^+ < \infty] = 1$ (for $y \ne x$ we have $T_y^+ = T_y$ under $\mathbb{P}_x$, and for $y = x$ this is just recurrence). □
Exercise 3.4. Show that if $P$ is irreducible, there exists $t > 0$ such that $\mathbb{P}_x[X_t = y,\ t < T_x^+] > 0$.

Solution to Exercise 3.4. There exists $n$ such that $P^n(x, y) > 0$ (because $P$ is irreducible). Thus, there is a sequence $x = x_0, x_1, \ldots, x_n = y$ such that $P(x_j, x_{j+1}) > 0$ for all $0 \le j < n$. Let $m = \max\{0 \le j < n : x_j = x\}$, and let $t = n - m$ and $y_j := x_{m+j}$ for $0 \le j \le t$. Then we have the sequence $x = y_0, \ldots, y_t = y$ with $y_j \ne x$ for all $0 < j \le t$, and we know that $P(y_j, y_{j+1}) > 0$ for all $0 \le j < t$. Thus,
$$\mathbb{P}_x[X_t = y,\ t < T_x^+] \ge P(y_0, y_1) \cdots P(y_{t-1}, y_t) > 0.$$
Example 3.12. A gambler plays a fair game. Each round she wins a dollar with probability $1/2$, and loses a dollar with probability $1/2$, all rounds independent. What is the probability that she never goes bankrupt, if she starts with $N$ dollars?
We have already seen that this defines a simple random walk on $\mathbb{Z}$, and that $\mathbb{E}_0[V_t(0)] \ge c\sqrt{t}$. Thus, taking $t \to \infty$ we get that $\mathbb{E}_0[V(0)] = \infty$, and so $0$ is recurrent.
Note that $0$ here was not special, since all vertices look the same. This symmetry implies that $\mathbb{P}_x[T_x^+ < \infty] = 1$ for all $x \in \mathbb{Z}$. Thus, by Corollary 3.11, for any $N$, $\mathbb{P}_N[T_0 = \infty] = 0$. That is, no matter how much money the gambler starts with, she will eventually go bankrupt.
Proof of Theorem 3.2 (transience). As usual, by irreducibility, for any pair of states $z, w$ we can find $t(z, w) > 0$ such that $P^{t(z,w)}(z, w) > 0$.
Fix $x, y \in S$ and suppose that $x$ is transient. For any $t > 0$,
$$P^{t + t(x,y) + t(y,x)}(x, x) \ge P^{t(x,y)}(x, y)\, P^t(y, y)\, P^{t(y,x)}(y, x).$$
Thus,
$$\mathbb{E}_y[V(y)] = \sum_{t=1}^{\infty} P^t(y, y) \le \frac{1}{P^{t(x,y)}(x, y)\, P^{t(y,x)}(y, x)} \sum_{t=1}^{\infty} P^{t + t(x,y) + t(y,x)}(x, x) < \infty.$$
So $y$ is transient as well. □
Suppose that $P$ is a Markov chain on state space $S$ such that, started from some state $y$, we have that $P^n(y, x) \to \pi(x)$, where $\pi$ is some limiting distribution. One immediately checks that in this case we must have
$$(\pi P)(x) = \lim_{n} \sum_s P^n(y, s)\, P(s, x) = \lim_n P^{n+1}(y, x) = \pi(x)$$
(at least when the exchange of limit and sum is justified, e.g. for finite $S$). A distribution $\pi$ satisfying $\pi P = \pi$ is called a stationary distribution for $P$.
Example 4.3. Consider a finite graph $G$. Let $P$ be the transition matrix of the simple random walk on $G$. So $P(x,y) = \frac{1}{\deg(x)} 1_{\{x \sim y\}}$; or: $\deg(x) P(x,y) = 1_{\{x \sim y\}}$. Thus,
$$\sum_x \deg(x) P(x, y) = \deg(y),$$
so the vector $v(x) = \deg(x)$ satisfies $vP = v$. Since $\sum_x \deg(x) = 2|E(G)|$, we normalize $\pi(x) = \frac{\deg(x)}{2|E(G)|}$ to get a stationary distribution for $P$.
The above stationary distribution has a special property, known as the detailed balance equations:
a distribution $\pi$ is said to satisfy the detailed balance equations with respect to a transition matrix $P$ if for all states $x, y$,
$$\pi(x) P(x, y) = \pi(y) P(y, x).$$
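Both stationarity of $\pi(x) = \deg(x)/2|E(G)|$ and detailed balance can be verified exactly with rational arithmetic. A sketch on a small graph of our own choosing (the vertex set and edge list below are illustrative, not from the text):

```python
from fractions import Fraction

# A small test graph: vertices 0..3, a triangle {1, 2, 3} plus the edge {0, 1}.
edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
nbrs = {}
for x, y in edges:
    nbrs.setdefault(x, []).append(y)
    nbrs.setdefault(y, []).append(x)

deg = {x: len(nbrs[x]) for x in nbrs}
pi = {x: Fraction(deg[x], 2 * len(edges)) for x in nbrs}  # pi(x) = deg(x) / 2|E|

def P(x, y):
    return Fraction(1, deg[x]) if y in nbrs[x] else Fraction(0)

# stationarity: (pi P)(y) = pi(y); detailed balance: pi(x) P(x,y) = pi(y) P(y,x)
for y in nbrs:
    assert sum(pi[x] * P(x, y) for x in nbrs) == pi[y]
for x in nbrs:
    for y in nbrs:
        assert pi[x] * P(x, y) == pi[y] * P(y, x)
```

Using `Fraction` keeps all identities exact, so the assertions check equalities rather than floating-point approximations.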
There is a deep connection between stationary distributions and return times. The main result here is:

Theorem 4.4. Let $P$ be an irreducible Markov chain on state space $S$. Then the following are equivalent:
(1) There exists a positive recurrent state $x$.
(2) $P$ has a stationary distribution $\pi$.
In this case, all states are positive recurrent and $\pi(x) = \frac{1}{\mathbb{E}_x[T_x^+]}$ for every $x$.
Before proving this, some preliminaries. For a vector $v : S \to [0, \infty]$, the product $vP$ may take the value $\infty$; since we are only dealing with non-negative numbers we can write $(vP)(x) = \sum_y v(y) P(y, x)$ without confusion (with the convention that $0 \cdot \infty = 0$).
Lemma 4.5. Let $P$ be an irreducible Markov chain on state space $S$. Let $v : S \to [0, \infty]$ be such that $vP = v$. Then:
• If there exists a state $x$ such that $v(x) < \infty$, then $v(y) < \infty$ for all states $y$.
• If $v$ is not the zero vector, then $v(y) > 0$ for all states $y$.
X Note that this implies that if $\pi$ is a stationary distribution, then all the entries of $\pi$ are strictly positive.
Proof. For the first assertion, note that $vP = v$ implies $vP^t = v$ for all $t$. Suppose $v(x) < \infty$ and let $y$ be any state. For a suitable choice of $t$, since $P$ is irreducible, we know that $P^t(y, x) > 0$, and so
$$v(x) = \sum_z v(z) P^t(z, x) \ge v(y) P^t(y, x), \quad \text{giving} \quad v(y) \le \frac{v(x)}{P^t(y, x)} < \infty.$$
For the second assertion, if $v$ is not the zero vector then, since it is non-negative, there exists a state $x$ such that $v(x) > 0$. Thus, for any state $y$ and for $t$ such that $P^t(x, y) > 0$ we get
$$v(y) = \sum_z v(z) P^t(z, y) \ge v(x) P^t(x, y) > 0. \qquad \Box$$
X Notation: Recall that for a Markov chain $(X_t)_t$ we denote by $V_t(x) = \sum_{k=1}^t 1_{\{X_k = x\}}$ the number of visits to $x$ up to time $t$.
Lemma 4.6. Let $(X_t)_t$ be Markov-$(\mu, P)$ for irreducible $P$. Assume $T \ge 1$ is a stopping time such that
$$\mathbb{P}_\mu[X_T = x] = \mu(x) \quad \text{for all } x.$$
Define $v(y) = \mathbb{E}_\mu[V_T(y)]$, the expected number of visits to $y$ up to time $T$. Then $vP = v$. Moreover, if $\mathbb{E}_\mu[T] < \infty$, then $\pi(x) = \frac{v(x)}{\mathbb{E}_\mu[T]}$ defines a stationary distribution.

Proof. Note that $v(y) = \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T \ge j]$, and set $u(y) = \sum_{j=0}^\infty \mathbb{P}_\mu[X_j = y, T > j]$. Since $\{T > j\} \in \sigma(X_0, \ldots, X_j)$, the Markov property gives
$$(uP)(y) = \sum_z \sum_{j=0}^\infty \mathbb{P}_\mu[X_j = z, T > j]\, P(z, y) = \sum_{j=0}^\infty \mathbb{P}_\mu[X_{j+1} = y, T > j] = \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T \ge j] = v(y).$$
On the other hand, using $T \ge 1$ and $\mathbb{P}_\mu[X_0 = y] = \mu(y) = \mathbb{P}_\mu[X_T = y] = \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T = j]$,
$$u(y) = \mathbb{P}_\mu[X_0 = y] + \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T > j] = \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T = j] + \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T > j] = \sum_{j=1}^\infty \mathbb{P}_\mu[X_j = y, T \ge j] = v(y).$$
That is, $u = v$, and so $vP = uP = v$.
Since
$$\sum_x v(x) = \mathbb{E}_\mu\Big[\sum_x V_T(x)\Big] = \mathbb{E}_\mu[T],$$
if $\mathbb{E}_\mu[T] < \infty$, then $\pi(x) = \frac{v(x)}{\mathbb{E}_\mu[T]}$ defines a stationary distribution. □
Example 4.7. Consider $(X_t)_t$ that is Markov-$P$ for an irreducible $P$, and let $v(y) = \mathbb{E}_x[V_{T_x^+}(y)]$.
If $x$ is recurrent, then $\mathbb{P}_x$-a.s. we have $1 \le T_x^+ < \infty$, and $\mathbb{P}_x[X_{T_x^+} = y] = 1_{\{y = x\}} = \mathbb{P}_x[X_0 = y]$. So by Lemma 4.6 we conclude that $vP = v$. Since $\mathbb{P}_x$-a.s. $V_{T_x^+}(x) = 1$, we have that $0 < v(x) = 1 < \infty$, so $0 < v(y) < \infty$ for all $y$.
Note that although it may be that $\mathbb{E}_x[T_x^+] = \infty$, i.e. $x$ is null recurrent, we still have that for any $y$, $\mathbb{E}_x[V_{T_x^+}(y)] < \infty$; i.e. the expected number of visits to $y$ until returning to $x$ is finite.
If $x$ is positive recurrent, then
$$\pi(y) = \frac{\mathbb{E}_x[V_{T_x^+}(y)]}{\mathbb{E}_x[T_x^+]}$$
is a stationary distribution for $P$.
Lemma 4.8. Let $P$ be an irreducible Markov chain. Let $u(y) = \mathbb{E}_x[V_{T_x^+}(y)]$. Let $v \ge 0$ be a non-negative vector such that $vP = v$ and $v(x) = 1$. Then, $v \ge u$. Moreover, if $x$ is recurrent, then $v = u$.
Proof. We show by induction on $t$ that
$$(4.1) \qquad \sum_{k=1}^{t} \mathbb{P}_x[X_k = y,\ T_x^+ \ge k] \le v(y).$$
For $t = 1$, using $v(x) = 1$,
$$\mathbb{P}_x[X_1 = y,\ T_x^+ \ge 1] = P(x, y) = v(x) P(x, y) \le \sum_z v(z) P(z, y) = v(y).$$
For the inductive step, note that for $k \ge 1$,
$$\mathbb{P}_x[X_{k+1} = y,\ T_x^+ \ge k+1] = \sum_{z \ne x} \mathbb{P}_x[X_{k+1} = y,\ X_k = z,\ T_x^+ \ge k] = \sum_{z \ne x} \mathbb{P}_x[X_k = z,\ T_x^+ \ge k]\, P(z, y).$$
So by induction,
$$\sum_{k=1}^{t+1} \mathbb{P}_x[X_k = y,\ T_x^+ \ge k] = P(x, y) + \sum_{k=1}^{t} \mathbb{P}_x[X_{k+1} = y,\ T_x^+ \ge k+1]$$
$$= P(x, y) + \sum_{z \ne x} P(z, y) \sum_{k=1}^{t} \mathbb{P}_x[X_k = z,\ T_x^+ \ge k] \le P(x, y) + \sum_{z \ne x} P(z, y)\, v(z) = \sum_z v(z) P(z, y) = v(y).$$
Taking $t \to \infty$ in (4.1) gives $u(y) = \sum_{k=1}^\infty \mathbb{P}_x[X_k = y,\ T_x^+ \ge k] \le v(y)$.
Now assume $x$ is recurrent. By Example 4.7, $uP = u$ and $u(x) = 1 = v(x)$. The vector $w = v - u \ge 0$ satisfies $wP = w$ (all quantities being finite, by Lemma 4.5 and Example 4.7) and $w(x) = 0$; by the second assertion of Lemma 4.5, $w$ must be the zero vector, so $v = u$. □
Proof of Theorem 4.4. Assume that $\pi$ is a stationary distribution for $P$. Fix any state $x$. Recall that $\pi(x) > 0$. Define the vector $v(z) = \frac{\pi(z)}{\pi(x)}$. We have that $v \ge 0$, $vP = v$ and $v(x) = 1$. Hence, by Lemma 4.8, $v(z) \ge \mathbb{E}_x[V_{T_x^+}(z)]$ for all $z$. That is,
$$\mathbb{E}_x[T_x^+] = \sum_y \mathbb{E}_x[V_{T_x^+}(y)] \le \sum_y v(y) = \sum_y \frac{\pi(y)}{\pi(x)} = \frac{1}{\pi(x)} < \infty,$$
so $x$ is positive recurrent.
For the other direction, if some state is positive recurrent, then by Example 4.7 this gives a stationary distribution for $P$.
Since $P$ has a stationary distribution, by the first implication all states are positive recurrent. Thus, for any state $z$, if $v = \frac{\pi}{\pi(z)}$ then $vP = v$ and $v(z) = 1$. So $z$ being recurrent, we get from Lemma 4.8 that $v(y) = \mathbb{E}_z[V_{T_z^+}(y)]$ for all $y$; summing over $y$ gives $\frac{1}{\pi(z)} = \mathbb{E}_z[T_z^+]$. □
Corollary 4.9 (Stationary distributions are unique). If an irreducible Markov chain $P$ has two stationary distributions $\pi$ and $\pi'$, then $\pi = \pi'$.
Exercise 4.2. Let $P$ be an irreducible Markov chain. Show that for positive recurrent states $x, y$,
$$\mathbb{E}_x[V_{T_x^+}(y)] \cdot \mathbb{E}_y[V_{T_y^+}(x)] = 1.$$
Theorem* (3.2). [restatement] Let $P$ be an irreducible Markov chain. For any two states $x, y$: $x$ is transient / null recurrent / positive recurrent if and only if $y$ is transient / null recurrent / positive recurrent.
In light of this, we may speak of an irreducible chain $P$ itself as being transient, null recurrent, or positive recurrent.
Last lecture we proved that an irreducible Markov chain $P$ has a stationary distribution if and only if $P$ is positive recurrent, and that the stationary distribution is the reciprocal of the expected return time.
Let's investigate what this means in the setting of a simple random walk on a graph.
Example 5.1. Let $G$ be a graph, and let $P$ be the simple random walk on $G$; that is, $P(x,y) = \frac{1}{\deg(x)} 1_{\{x \sim y\}}$. As computed before, the vector $v(y) = \deg(y)$ satisfies
$$\sum_x v(x) P(x, y) = \sum_{x \sim y} \deg(x) \cdot \frac{1}{\deg(x)} = \deg(y).$$
That is, $vP = v$.
If we take $u(y) = v(y)/v(x)$ for some $x$, then $uP = u$ and $u(x) = 1$. Thus, if $P$ is recurrent, then $\mathbb{E}_x[V_{T_x^+}(y)] = u(y) = \frac{\deg(y)}{\deg(x)}$ for all $x, y$. This does not depend on $\mathrm{dist}(x, y)$!
Another observation is that $\sum_x v(x) = 2|E(G)|$. That is, $P$ is positive recurrent if and only if $G$ is finite. Moreover, in this case, the stationary distribution for $P$ is $\pi(x) = \frac{\deg(x)}{2|E(G)|}$.
Note that if $G$ is a finite regular graph then the stationary distribution on $G$ is the uniform distribution.
Example 5.2. Recall the simple random walk on $\mathbb{Z}$. We have already seen that this is a recurrent Markov chain. Thus, if $vP = v$, then $v(y) = \mathbb{E}_x[V_{T_x^+}(y)]\, v(x)$ for all $x, y$. Since the constant vector $\vec{1}$ satisfies $\vec{1} P = \vec{1}$, we get that $\mathbb{E}_x[V_{T_x^+}(y)] = 1$ for all $x, y$. Thus, any $v$ such that $vP = v$ must be constant, $v \equiv c$.
So there is no stationary distribution on $\mathbb{Z}$; that is, the simple random walk on $\mathbb{Z}$ is null recurrent. (We could have also deduced this from the previous example.)
Example 5.3. Consider a different Markov chain on $\mathbb{Z}$: let $P(x, x+1) = p$ and $P(x, x-1) = 1-p$ for all $x$.
Suppose $vP = v$. Then, $v(x) = v(x-1)\, p + v(x+1)(1-p)$, or $v(x+1) = \frac{1}{1-p}\big(v(x) - p\, v(x-1)\big)$.
Solving such recursions is simple: set $u_x = \begin{pmatrix} v(x+1) \\ v(x) \end{pmatrix}$. So $u_{x+1} = \frac{1}{1-p} A u_x$, where
$$A = \begin{pmatrix} 1 & -p \\ 1-p & 0 \end{pmatrix}.$$
For $p \ne 1/2$ the matrix $A$ is diagonalizable, $A = M D M^{-1}$, where $D$ is diagonal with $p, 1-p$ on the diagonal, so
$$v(x) = u_x(2) = (1-p)^{-x} (A^x u_0)(2) = (1-p)^{-x} \begin{pmatrix} 0 & 1 \end{pmatrix} M D^x M^{-1} u_0 = a \left(\frac{p}{1-p}\right)^x + b,$$
where $a, b$ are constants that depend on the matrix $M$ and on $u_0$ (but are independent of $x$).
Thus, $\sum_x v(x)$ will only converge for $a = 0, b = 0$, which gives $v = 0$. That is, there is no stationary distribution, and $P$ is not positive recurrent.
In the future we will in fact see that $P$ is transient for $p \ne 1/2$, and for $p = 1/2$ we have already seen that $P$ is null recurrent.
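The general solution $v(x) = a\left(\frac{p}{1-p}\right)^x + b$ of the recursion can be verified exactly with rational arithmetic. A small sketch; the particular values of $p$, $a$, $b$ below are arbitrary illustrative choices:

```python
from fractions import Fraction

p = Fraction(1, 3)                # any p != 1/2; illustrative value
r = p / (1 - p)                   # the nontrivial characteristic root p/(1-p)
a, b = Fraction(5), Fraction(-2)  # arbitrary constants of the general solution

def v(x):
    # claimed general solution v(x) = a (p/(1-p))^x + b
    return a * r ** x + b

# v satisfies v(x) = v(x-1) p + v(x+1)(1-p) for every x, exactly
for x in range(-5, 6):
    assert v(x) == v(x - 1) * p + v(x + 1) * (1 - p)
```

Since `Fraction` supports negative integer powers, the identity is checked exactly on both sides of the origin.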
Example 5.4. A chess knight moves on a chess board; at each step it chooses uniformly among the possible legal moves. Suppose the knight starts at a corner. What is the expected time it takes the knight to return to its starting point?
At first, this looks difficult...
However, let $G$ be the graph whose vertices are the squares of the chess board, $V(G) = \{1, 2, \ldots, 8\}^2$. Let $x = (1,1)$ be the starting point of the knight. For edges, we will connect two vertices if the knight can jump from one to the other in a legal move.
Thus, for example, a vertex in the center of the board has $8$ adjacent vertices. A corner, on the other hand, has $2$ adjacent vertices. In fact, we can determine the degrees of all vertices:
2 3 4 4 4 4 3 2
3 4 6 6 6 6 4 3
4 6 8 8 8 8 6 4
4 6 8 8 8 8 6 4
4 6 8 8 8 8 6 4
4 6 8 8 8 8 6 4
3 4 6 6 6 6 4 3
2 3 4 4 4 4 3 2
Summing the degrees gives $\sum_y \deg(y) = 336 = 2|E(G)|$. Since the knight's walk is the simple random walk on $G$, the stationary distribution is $\pi(y) = \frac{\deg(y)}{2|E(G)|}$, and so the expected return time to the corner is
$$\mathbb{E}_x[T_x^+] = \frac{1}{\pi(x)} = \frac{2|E(G)|}{\deg(x)} = \frac{336}{2} = 168.$$
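The degree table and the resulting expected return time can be computed in a few lines. This is our sketch (names are ours), using the formula $\mathbb{E}_x[T_x^+] = 2|E(G)|/\deg(x)$ from Example 5.1:

```python
# Knight-move graph on the 8x8 board: degrees and the expected return time.
moves = [(1, 2), (2, 1), (-1, 2), (-2, 1), (1, -2), (2, -1), (-1, -2), (-2, -1)]

def deg(i, j):
    # number of legal knight moves from square (i, j), 0-indexed
    return sum(0 <= i + di < 8 and 0 <= j + dj < 8 for di, dj in moves)

total = sum(deg(i, j) for i in range(8) for j in range(8))  # sum of degrees = 2|E(G)|
expected_return = total / deg(0, 0)  # E_corner[T^+] = 2|E(G)| / deg(corner)
print(total, expected_return)
```

This reproduces the degree table above (corner degree 2, center degree 8) and the value $336/2 = 168$.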
Let us sum up what we know so far about irreducible chains. If $P$ is an irreducible Markov chain, then:
• $\mathbb{E}_x[V(x)] + 1 = \frac{1}{\mathbb{P}_x[T_x^+ = \infty]}$.
• For all states $x, y$: $x$ is transient if and only if $y$ is transient.
• If $P$ is recurrent, the vector $v(z) = \mathbb{E}_x[V_{T_x^+}(z)]$ is a positive left eigenvector for $P$, and any non-negative left eigenvector for $P$ is proportional to $v$.
• $P$ has a stationary distribution if and only if $P$ is positive recurrent.
• If $P$ is positive recurrent, then $\pi(x)\, \mathbb{E}_x[T_x^+] = 1$.
Recall that Lemma 4.6 connects the expected number of visits to x up to an appropriate
stopping time, to the stationary distribution and the expected value of the stopping time:
Good choices of the stopping time T for positive recurrent chains will give some nice identities.
Proposition 5.5. Let $P$ be a positive recurrent chain with stationary distribution $\pi$. Then:
• $\mathbb{E}_x[T_x^+] = \frac{1}{\pi(x)}$.
• $\mathbb{E}_x[V_{T_x^+}(y)] = \frac{\pi(y)}{\pi(x)}$.
• For $x \ne y$, $1 + \mathbb{E}_x[V_{T_y^+}(x)] = \pi(x)\,\big(\mathbb{E}_y[T_x^+] + \mathbb{E}_x[T_y^+]\big)$.
• For $x \ne y$, $\pi(x)\, \mathbb{P}_x[T_y^+ < T_x^+]\,\big(\mathbb{E}_y[T_x^+] + \mathbb{E}_x[T_y^+]\big) = 1$.
• For $x \sim y$,
$$\mathbb{E}_x[T_y] + \mathbb{E}_y[T_x] \le \frac{1}{\pi(x) P(x, y)}.$$
(This last bound is sometimes called the edge commute inequality. It will be important in the future.)
Proof. The first two bullets were proved above (Theorem 4.4 and Lemma 4.8, see also Example 4.7).
For the third bullet, apply Lemma 4.6 under $\mathbb{P}_x$ to the stopping time $T = \inf\{t \ge T_y : X_t = x\}$: we have $X_T = x$ a.s. and $\mathbb{E}_x[T] = \mathbb{E}_x[T_y^+] + \mathbb{E}_y[T_x^+]$, and the number of visits to $x$ up to time $T$ is $V_{T_y^+}(x) + 1$; by uniqueness of the stationary distribution, $1 + \mathbb{E}_x[V_{T_y^+}(x)] = \pi(x)\, \mathbb{E}_x[T]$.
The fourth bullet follows from the previous one since $\mathbb{P}_x$-a.s. $V_{T_y^+}(x) + 1 \sim \mathrm{Geo}(p)$ for $p = \mathbb{P}_x[T_y^+ < T_x^+]$, so $1 + \mathbb{E}_x[V_{T_y^+}(x)] = 1/p$.
For the last bullet, since for $x \sim y$ we have $\mathbb{P}_x[T_y^+ < T_x^+] \ge \mathbb{P}_x[X_1 = y] = P(x, y)$, we get the assertion from the previous bullet. □
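The first identity of Proposition 5.5 is easy to test by simulation. On the $n$-cycle the stationary distribution is uniform (a regular graph), so the proposition predicts $\mathbb{E}_0[T_0^+] = 1/\pi(0) = n$. A Monte Carlo sketch (function name, seed, and trial count are our choices):

```python
import random

def mean_return_time(n, trials, seed=0):
    # Monte Carlo estimate of E_0[T_0^+] for the SRW on the n-cycle.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, t = 0, 0
        while True:
            x = (x + rng.choice((-1, 1))) % n
            t += 1
            if x == 0:
                break
        total += t
    return total / trials

est = mean_return_time(6, 20000)  # prediction: 1/pi(0) = 6
```

With 20000 trials the estimate should fall close to the predicted value $6$.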
Recall that we saw that if $P^t(y, x) \to \pi(x)$ for all $x$, then $\pi$ must be a stationary distribution. We will now start to work our way toward proving the converse, at least for irreducible and aperiodic chains. Our goal:

Theorem* (6.5). [restatement] Let $(X_t)_t$ be an irreducible and aperiodic Markov chain. Suppose that $\pi$ is a stationary distribution for this chain. Then, for any starting distribution $\mu$, and any state $x$,
$$\mathbb{P}_\mu[X_t = x] \to \pi(x).$$
6.2. Couplings
Example 6.3. Let us use a Markovian coupling to show that lowering the winning probability for a gambler lowers their chances of winning.
Let p < q, and let P be the transition matrix on ℕ for the gambler that wins with probability p, and let Q be the transition matrix for the gambler that wins with probability q. That is, P(n, n+1) = p and P(n, n−1) = 1 − p for all n > 0, and P(0, 0) = 1. Similarly for Q.
The corresponding Markov chains are (X_t)_t for P and (Y_t)_t for Q. We can couple the chains as follows: Given (X_t, Y_t), since Y moves up with higher probability than X, we can organize a coupling such that Y_{t+1} ≥ X_{t+1} in any case. That is, given (X_t, Y_t), if X_t > 0 let
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (1, 1) with probability p,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (−1, 1) with probability q − p,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (−1, −1) with probability 1 − q.
If X_t = 0, Y_t > 0 let
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (0, 1) with probability q,
(X_{t+1}, Y_{t+1}) = (X_t, Y_t) + (0, −1) with probability 1 − q.
This defines a Markov chain (X_t, Y_t)_t with some transition matrix R, under which (X_t)_t moves according to P and (Y_t)_t according to Q. Thus,
P^Q_N[T_0 < T_M] = P^R_{(N,N)}[∃ t : Y_t = 0 and ∀ n < t, Y_n < M]
≤ P^R_{(N,N)}[∃ t : X_t = 0 and ∀ n < t, X_n < M] = P^P_N[T_0 < T_M],
where P^P, P^Q, P^R denote the probability measures for P, Q, and R respectively, and we have used the fact that under P^R_{(N,N)}, a.s. X_t ≤ Y_t for all t.
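A quick sanity check of this coupling by simulation (the parameter values p = 0.45, q = 0.55, N = 5 are illustrative choices): the ordering Y_t ≥ X_t should hold along every trajectory.

```python
import random

def coupled_step(x, y, p, q, rng):
    # One step of the Markovian coupling from Example 6.3 (requires p < q).
    u = rng.random()
    if x > 0:
        if u < p:
            return x + 1, y + 1          # both gamblers win
        if u < q:
            return x - 1, y + 1          # X loses while Y wins
        return x - 1, y - 1              # both lose
    if y > 0:                            # X is stuck at 0, Y still plays
        return 0, (y + 1 if u < q else y - 1)
    return 0, 0

rng = random.Random(1)
p, q, N = 0.45, 0.55, 5
ordered = True
for _ in range(2_000):
    x = y = N
    for _ in range(200):
        x, y = coupled_step(x, y, p, q, rng)
        ordered = ordered and (y >= x)
print(ordered)
```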
Lemma 6.4. Let (X_t, Y_t)_t be a Markovian coupling of two Markov chains on the same state space S with the same transition matrix P. Define the coupling time as
τ = inf {t ≥ 0 : X_t = Y_t}.
Then the process (Z_t)_t defined by Z_t = X_t for t ≤ τ and Z_t = Y_t for t ≥ τ is itself a Markov chain with transition matrix P (and the starting distribution of (X_t)_t).
Proof. Since {τ ≥ t+1} = {τ < t+1}^c ∈ σ((X_0, Y_0), ..., (X_t, Y_t)), the Markov property at time t gives
P[Z_{t+1} = y | Z_t = x, τ ≥ t+1, Z_{t−1}, ..., Z_0] = P[X_{t+1} = y | X_t = x, τ ≥ t+1, X_{t−1}, ..., X_0] = P(x, y).
Since τ is a stopping time, we can use the strong Markov property to deduce that for any t, on the event {τ ≤ t} the process continues to move according to P as well. □
In this section we will prove a fundamental result in the theory of Markov chains.
Theorem 6.5. Let P be an irreducible and aperiodic Markov chain. If P has a stationary distribution π, then for any starting distribution μ, and any state x,
P_μ[X_t = x] → π(x).
Proof. Let (Y_t)_t be Markov-(π, P) independent of (X_t)_t. Since πP^t = π, we have that π(x) = P[Y_t = x]. Let τ be the coupling time of (X_t, Y_t)_t.
First we show that P[τ < ∞] = 1, so P[τ > t] → 0. Indeed, (X_t, Y_t)_t is a Markov chain on S², with transition matrix Q((x, y), (x', y')) = P(x, x') P(y, y'). Moreover, for ν(x, y) = π(x)π(y), we get that ν is a stationary distribution for Q.
We claim that since P is irreducible and aperiodic, Q is also irreducible (and aperiodic). Indeed, let (x, y), (x', y') ∈ S². We already saw that there exist t(x, x'), t(y, y') such that for all t > t(x, x'), P^t(x, x') > 0, and for all t > t(y, y'), P^t(y, y') > 0. Thus, for all t > max {t(x, x'), t(y, y')} we have that Q^t((x, y), (x', y')) > 0. Thus, Q is irreducible.
Since Q has a stationary distribution and Q is irreducible, we get that Q is positive recurrent. Specifically, P[T_{(x,x)} < ∞] = 1 for any x ∈ S. Since τ ≤ T_{(x,x)}, we get that P[τ < ∞] = 1.
Now define
Z_t = Y_t for t ≤ τ, and Z_t = X_t for t ≥ τ.
So (X_t, Z_t)_t is a coupling of Markov chains such that for all t ≥ τ, X_t = Z_t. Also, since Z_0 = Y_0 and X_t = Z_t on {τ ≤ t},
|P[X_t = x] − P[Z_t = x]| ≤ P[X_t ≠ Z_t] ≤ P[τ > t] → 0.
Finally, the previous lemma tells us that (Z_t)_t is a Markov chain with matrix P and, most importantly, starting distribution π. So P[Z_t = x] = π(x), and thus P_μ[X_t = x] → π(x). □
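Theorem 6.5 is easy to see numerically: iterating μ ↦ μP from two different starting distributions drives both to the same limit π. The matrix below is an illustrative irreducible aperiodic chain, not one from the notes.

```python
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.4, 0.0, 0.6]]

def step(mu, P):
    # One step of the distribution: (mu P)(y) = sum_x mu(x) P(x, y).
    n = len(P)
    return [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]

mu = [1.0, 0.0, 0.0]   # delta measure at state 0
nu = [0.0, 0.0, 1.0]   # delta measure at state 2
for _ in range(200):
    mu, nu = step(mu, P), step(nu, P)
print(mu, nu)          # both have converged to the same stationary distribution
```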
Lecture 7: Conditional Expectation
Recall that we want to define a random walk. A (simple) random walk is a process that
given the current location chooses among the available neighbors uniformly. So we need a way
of conditioning on the current position.
That is, we want the notions of conditional probability and conditional expectation.
The notion of conditional expectation is central to probability. It is developed using the
Radon-Nikodym derivative from measure theory:
Theorem 7.1 (Radon-Nikodym). Let μ, ν be two probability measures on (Ω, F). Suppose that ν is absolutely continuous with respect to μ; that is, μ(A) = 0 implies that ν(A) = 0 for all A ∈ F.
Then, there exists a (μ-a.s. unique) random variable dν/dμ on (Ω, F, μ) such that for any event A ∈ F,
E_ν[1_A] = E_μ[(dν/dμ) · 1_A].
In other words,
∫_A dν = ∫_A (dν/dμ) dμ,
which can be informally stated as (dν/dμ) · dμ = dν.
This theorem is used to prove the following.
Theorem 7.2. Let X be an integrable random variable on a probability space (Ω, F, P), and let G ⊆ F be a sub-σ-algebra. Then there exists a (P-a.s. unique) G-measurable random variable Y such that E[Y 1_A] = E[X 1_A] for every A ∈ G.
Definition 7.3. Let X be an integrable (E[|X|] < ∞) random variable on a probability space (Ω, F, P). Let G ⊆ F be a sub-σ-algebra of F.
The random variable from the above theorem is denoted E[X|G].
If Y is a random variable on (Ω, F, P) then we denote E[X|Y] := E[X|σ(Y)].
If A ∈ F is any event then we write P[A|G] := E[1_A | G].
Proof of Theorem 7.2. Note that uniqueness is immediate from the fact that if Y, Y' are two such random variables, then for A_n = {Y − Y' ≥ 1/n} we have that A_n ∈ G (as a function of (Y, Y')) and
P[A_n] · (1/n) ≤ E[(Y − Y') 1_{A_n}] = E[X 1_{A_n}] − E[X 1_{A_n}] = 0.
So by continuity of probability,
P[Y > Y'] = P[⋃_n A_n] = lim_n P[A_n] = 0.
Exchanging the roles of Y and Y' gives Y = Y' a.s.
For existence, assume first that X ≥ 0. If E[X] = 0 take Y = 0, so assume E[X] > 0. Define a probability measure Q on (Ω, G) by Q(A) = E[X 1_A] / E[X]. Then Q is absolutely continuous with respect to P restricted to G, so by Theorem 7.1 there exists a G-measurable random variable dQ/dP such that for any A ∈ G,
E[X 1_A] = E[(dQ/dP) 1_A] · E[X].
Taking Y = (dQ/dP) · E[X] completes the case of X ≥ 0.
For the general case, recall that X = X⁺ − X⁻, and X⁺, X⁻ are non-negative. Let Y₁ = E[X⁺|G] and Y₂ = E[X⁻|G]. Then, Y₁ − Y₂ is G-measurable, and for any A ∈ G,
E[(Y₁ − Y₂) 1_A] = E[X⁺ 1_A] − E[X⁻ 1_A] = E[X 1_A]. □
• Note that to prove that Y = E[X|G] one needs to show two things: that Y is G-measurable, and that E[Y 1_A] = E[X 1_A] for all A ∈ G.
• Important: Conditional expectation E[X|G] is the average value of X given the information in G; this is a random variable, not a number as is the usual expectation. One needs to be careful with this. Whenever we write E[X|G] = Z we actually mean that E[X|G] = Z a.s.
Solution. It suffices to prove that if G and G' are σ-algebras such that A ∈ G △ G' implies P[A] = 0 (that is, G and G' only differ on measure-0 events), then E[X|G] = E[X|G'] a.s.
G ∩ G' is a σ-algebra, as an intersection of σ-algebras. Let Z = E[X | G ∩ G']. Since G ∩ G' ⊆ G and G ∩ G' ⊆ G', we have that Z is both G- and G'-measurable. Moreover, for any A ∈ G: if A ∉ G' then P[A] = 0, so E[X 1_A] = 0 = E[Z 1_A]. If A ∈ G' then A ∈ G ∩ G', so E[X 1_A] = E[Z 1_A] by definition. Thus, Z = E[X|G]. Similarly, exchanging the roles of G and G', we get Z = E[X|G'], so E[X|G] = E[X|G'] a.s. □
For linearity of conditional expectation, note that for any A ∈ G,
E[(aX + Y) 1_A] = a E[X 1_A] + E[Y 1_A] = a E[E[X|G] 1_A] + E[E[Y|G] 1_A] = E[(a E[X|G] + E[Y|G]) 1_A]. □
Exercise 7.5. Let G ∈ G. Show that for any event A with P[A] > 0,
P[G|A] = E[P[A|G] 1_G] / P[A].
Proof. Let Y_n = X − X_n. Since X_n ↗ X, we get that Y_n ≥ 0 for all n. Thus, (E[Y_n|G])_n is a monotone non-increasing sequence of non-negative random variables. Let Z(ω) = inf_n E[Y_n|G](ω) = lim_n E[Y_n|G](ω) = lim inf_n E[Y_n|G](ω). So Z is G-measurable and Z ≥ 0. Fatou's Lemma gives that
E[Z] ≤ lim inf_n E[E[Y_n|G]] = lim inf_n E[X − X_n] = 0,
since E[X_n] ↗ E[X] by monotone convergence. Thus, Z = 0 a.s. This implies that
E[X|G] − E[X_n|G] → 0 a.s. □
Proof. Note that E[X|G] Z is G-measurable, so we only need to prove the second property.
We use the usual four-step proof: from indicators, to simple random variables, to non-negative, to general.
If Z = 1_B for some B ∈ G, then for any A ∈ G,
E[Z E[X|G] 1_A] = E[E[X|G] 1_{A∩B}] = E[X 1_{A∩B}] = E[X Z 1_A].
The following properties all have their usual proofs, adapted to the conditional setting.
Proof. If g is convex then for any m there exist a_m, b_m such that g(s) ≥ a_m s + b_m for all s, and g(m) = a_m m + b_m. Thus, for any ω, there exist A(ω), B(ω) such that g(s) ≥ A(ω) s + B(ω) for all s, and g(E[X|G](ω)) = A(ω) E[X|G](ω) + B(ω). It is not difficult to see that A, B are measurable and determined by E[X|G] and g, so A, B are G-measurable random variables. Thus,
E[g(X)|G] ≥ A · E[X|G] + B = g(E[X|G]). □
Proof. If E[Y²|G] = 0 a.s. then Y = 0 a.s., and so both sides of the inequality become 0. So we can assume that E[Y²|G] > 0.
Set α = E[XY|G] / E[Y²|G], which is a G-measurable random variable. By linearity,
0 ≤ E[(X − αY)²|G] = E[X²|G] − (E[XY|G])² / E[Y²|G]. □
Proposition 7.8 (Markov / Chebyshev). If X ≥ 0 is integrable, then for any G-measurable Z such that Z > 0,
P[X ≥ Z | G] ≤ E[X|G] / Z. □
7.2.1. The smaller σ-algebra always wins. Perhaps the most important property that has no unconditional counterpart: for σ-algebras H ⊆ G ⊆ F,
E[E[X|H]|G] = E[X|H] and E[E[X|G]|H] = E[X|H].
Proof. The first assertion comes from the fact that E[X|H] is H-measurable and H ⊆ G, so conditioning on G has no effect.
For the second assertion we have that E[X|H] is H-measurable of course, and for any A ∈ H, using that A ∈ G as well,
E[E[X|G] 1_A] = E[X 1_A] = E[E[X|H] 1_A]. □
During this course, we will almost always use conditional probabilities conditioned on some discrete random variable. Note that if Y is discrete with range R (perhaps d-dimensional), then Σ_{r∈R} 1_{Y=r} = 1 a.s. This simplifies the discussion regarding conditional probabilities.
Exercise 7.6. Suppose that (Ω, F, P) is a probability space with Ω = ⨄_{k∈I} A_k, where A_k ∈ F for all k ∈ I, with I some countable (possibly finite) index set. Show that
σ((A_k)_{k∈I}) = { ⨄_{k∈J} A_k : J ⊆ I }.
Hint: Show that any set in the right-hand side must be in σ((A_k)_{k∈I}). Show that the right-hand side is a σ-algebra.
Lemma 7.11. Let X be an integrable random variable on (Ω, F, P). Let I be some countable index set (possibly finite). Suppose that P[⨄_{k∈I} A_k] = 1, where A_k ∈ F and P[A_k] > 0 for all k. Let G = σ((A_k)_{k∈I}). Then,
E[X|G] = Σ_k (E[X 1_{A_k}] / P[A_k]) · 1_{A_k}.
Proof. Let Y = Σ_k 1_{A_k} E[X 1_{A_k}] / P[A_k]. Then of course Y is G-measurable. For any A ∈ G we have that 1_A = Σ_{k∈J} 1_{A_k} (P-a.s.) for some J ⊆ I. Thus,
E[Y 1_A] = Σ_{k∈J} (E[X 1_{A_k}] / P[A_k]) · E[1_{A_k}] = Σ_{k∈J} E[X 1_{A_k}] = E[X 1_A]. □
Corollary 7.12. Let Y be a discrete random variable with range R on (Ω, F, P). Let X be an integrable random variable on the same space. Then,
E[X|Y] = Σ_{r∈R} (E[X 1_{Y=r}] / P[Y=r]) · 1_{Y=r} = Σ_{r∈R} 1_{Y=r} E[X|Y=r],
where we take the convention that E[X|Y=r] = E[X 1_{Y=r}] / P[Y=r] = 0 when P[Y=r] = 0.
Proof. Ω = ⨄_{r∈R} {Y = r}, so this is Lemma 7.11. □
• Note that E[X|Y] is a discrete random variable as well, regardless of the original distribution of X.
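Corollary 7.12 can be checked on an empirical (uniform) measure over a finite sample of (Y, X) pairs; the data below is synthetic and purely illustrative. Averaging E[X|Y] recovers E[X] (the tower property), here exactly up to rounding.

```python
import random

rng = random.Random(0)
omega = [(rng.randint(0, 3), rng.random()) for _ in range(10_000)]  # points (Y, X)

# E[X | Y = r] = E[X 1_{Y=r}] / P[Y = r], computed group by group:
groups = {}
for y, x in omega:
    groups.setdefault(y, []).append(x)
cond = {r: sum(xs) / len(xs) for r, xs in groups.items()}

# E[X|Y] is the random variable sum_r 1_{Y=r} E[X | Y = r]; its mean is E[X]:
lhs = sum(cond[y] for y, _ in omega) / len(omega)   # E[ E[X|Y] ]
rhs = sum(x for _, x in omega) / len(omega)         # E[X]
print(lhs, rhs)
```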
Number of exercises in lecture: 6
Total number of exercises until here: 16
Lecture 8: Martingales
8.1. Martingales
Definition 8.2. Let (Ω, F, P) be a probability space, and let (F_n)_n be a filtration. A sequence (X_n)_n is said to be a martingale with respect to the filtration (F_n)_n, or sometimes an (F_n)_n-martingale, if for all n: X_n is F_n-measurable, E[|X_n|] < ∞, and E[X_{n+1}|F_n] = X_n.
If the filtration is not specified then we say that (X_n)_n is a martingale if it is a martingale with respect to the natural filtration F_n := σ(X_0, ..., X_n); that is, a sequence of integrable random variables such that for all n,
E[X_{n+1} | X_n, ..., X_0] = X_n.
Example 8.3. Let (X_n)_n be a simple random walk on ℤ started at X_0 = 0. The Markov property gives that
E[X_{n+1} | X_n, ..., X_0] = ½(X_n + 1) + ½(X_n − 1) = X_n.
Example 8.4. More generally, if (X_n)_n is a sequence of independent random variables with E[X_n] = 0 for all n, and S_n = Σ_{k=0}^n X_k, then (S_n)_n is a martingale:
E[S_{n+1} | S_n, ..., S_0] = S_n + E[X_{n+1}] = S_n.
Proposition 8.5. Let (X_n)_n be an (F_n)_n-martingale. For any k ≤ n we have E[X_n|F_k] = X_k.
Proof. For k = n this is obvious. Assume that k < n. By the tower property of conditional expectation, because F_k ⊆ F_{n−1},
E[X_n | F_k] = E[E[X_n|F_{n−1}] | F_k] = E[X_{n−1} | F_k],
and the claim follows by induction. □
Exercise 8.2. Let (X_n)_n be an (F_n)_n-martingale. Let T be a stopping time (with respect to the filtration (F_n)_n). Prove that (Y_n := X_{T∧n})_n is an (F_n)_n-martingale.
Theorem 8.6 (Optional Stopping). Let (X_n)_n be an (F_n)_n-martingale and T a stopping time. We have that E[X_T|X_0] = X_0 in the following cases:
• T is a.s. bounded;
• T < ∞ a.s. and the martingale is uniformly bounded, |X_n| ≤ M a.s. for all n;
• E[T] < ∞ and the martingale has bounded increments, |X_{n+1} − X_n| ≤ M a.s. for all n.
Proof. We start with the first case: Let Y_n = X_{T∧n}. Since T ≤ t a.s. we get that Y_t = X_T. Since (Y_n)_n is a martingale with Y_0 = X_0, we conclude E[X_T|X_0] = E[Y_t|X_0] = X_0.
For the second case, for any n,
|E[Y_n|X_0] − E[X_T|X_0]| = |E[(X_{T∧n} − X_T) 1_{T>n} | X_0]| ≤ 2M · P[T > n|X_0] → 0,
while E[Y_n|X_0] = X_0 for all n.
For the third case, |X_{T∧n} − X_T| ≤ M · T · 1_{T>n}. Since T 1_{T>n} → 0, and since E[T] < ∞, we get by dominated convergence that E[T 1_{T>n}] → 0, and so
X_0 = E[X_{T∧n}|X_0] → E[X_T|X_0]. □
Example 8.7 (Gambler's Ruin). Let (X_t)_t be a simple random walk on ℤ. Let T = T({0, n}) be the first time the walk is at 0 or n.
We can think of X_t as the amount of money a gambler playing a fair game has after the t-th game. What is the probability that a gambler that starts with x reaches n before going bankrupt?
Let
p_n(x) = P_x[T_n < T_0].
Since (X_t)_t is a martingale, we get that (X_{t∧T})_t is a bounded martingale under the measure P_x. Since T is a.s. finite, we can apply the optional stopping theorem to get
x = E_x[X_T] = n · p_n(x) + 0 · (1 − p_n(x)).
So p_n(x) = x/n.
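A Monte Carlo sketch of Example 8.7 (the values n = 10, x = 3 are illustrative): the fraction of walks reaching n before 0 should be close to x/n.

```python
import random

def reaches_n_first(x, n, rng):
    # Run the fair gambler until hitting 0 or n; report whether n came first.
    while 0 < x < n:
        x += 1 if rng.random() < 0.5 else -1
    return x == n

rng = random.Random(0)
n, x, trials = 10, 3, 100_000
est = sum(reaches_n_first(x, n, rng) for _ in range(trials)) / trials
print(est, x / n)
```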
This gives a proof that the walk is recurrent: let A_n = {T_n < T_0}. By the gambler's ruin computation, P_1[A_n] = 1/n, and the events (A_n)_n are decreasing, so
P_1[⋂_n A_n] = lim_n P_1[A_n] = lim_n 1/n = 0.
By symmetry, with A'_n = {T_{−n} < T_0},
P_{−1}[⋂_n A'_n] = 0.
Now, the event that the walk never returns to 0 is the event that the walk takes a step to either 1 or −1 and then never returns to 0; i.e.
{T_0^+ = ∞} = {X_1 = 1, ⋂_n A_n} ⊎ {X_1 = −1, ⋂_n A'_n},
so P_0[T_0^+ = ∞] = 0.
Next, let us compute E_x[T] for T = T({0, n}). Let Y_t = X_t² − t. Then
E[Y_{t+1} | X_0, ..., X_t] = ½((X_t + 1)² − (t+1)) + ½((X_t − 1)² − (t+1)) = X_t² − t = Y_t.
So (Y_t)_t is a martingale, and thus (Y_{T∧t})_t is a bounded martingale under the measure P_x. Thus, since Y_0 = X_0²,
x² = E_x[Y_T] = E_x[X_T²] − E_x[T] = n² · (x/n) − E_x[T],
so
E_x[T] = xn − x² = x(n − x).
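The identity E_x[T] = x(n − x) can be checked the same way (again with illustrative n = 10, x = 3, so the expected duration is 21).

```python
import random

def duration(x, n, rng):
    # Number of steps of the fair game until the gambler hits 0 or n.
    t = 0
    while 0 < x < n:
        x += 1 if rng.random() < 0.5 else -1
        t += 1
    return t

rng = random.Random(0)
n, x, trials = 10, 3, 50_000
est = sum(duration(x, n, rng) for _ in range(trials)) / trials
print(est, x * (n - x))
```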
Example 8.11. Consider the martingale X_t² − t. "Using" the optional stopping theorem at time T = T_0 under P_1 we would get
1 = E_1[X_0² − 0] = E_1[X_T² − T] = −E_1[T],
and similarly under P_{−1}; in particular E_1[T_0] and E_{−1}[T_0] would be finite. Since
E_0[T_0^+] = ½ E_0[T_0^+ | X_1 = 1] + ½ E_0[T_0^+ | X_1 = −1] = ½ (E_1[T_0 + 1] + E_{−1}[T_0 + 1]),
we would get that E_0[T_0^+] < ∞!
Where did we go wrong?
We could not use the optional stopping theorem, because the martingale X_t² − t is not bounded!
Example 8.12. Actually, this last bit gives a third proof that E_0[T_0^+] = ∞. Suppose that E_x[T_0] < ∞ for some x ≠ 0. Since (X_t)_t is a martingale with bounded increments, by the optional stopping theorem x = E_x[X_{T_0}]. But X_{T_0} = 0 a.s., a contradiction. So E_x[T_0] = ∞ for all x ≠ 0. Using the Markov property,
E_0[T_0^+] = ½ (E_1[T_0 + 1] + E_{−1}[T_0 + 1]) = ∞.
Lecture 9: Time Reversal
Let (X_t)_t be Markov-P. Then, conditioned on X_t, we have that X[0, t] and X[t, ∞) are independent. This suggests looking at the chain run backwards in time, since determining the past given the future will only depend on the current state.
However, in accordance with the second law of thermodynamics (entropy always increases), we know that nice enough chains converge to a stationary distribution, even if the chain is started from a very ordered distribution, namely a δ-measure. This suggests that there is a specific direction we are looking at, and that the chain is moving from order to disorder, represented by the stationary measure.
However, if we start the chain from the stationary distribution, perhaps we can view the chain
both forwards and backwards in time. This is the content of the following.
Definition 9.1. Let P be an irreducible Markov chain with stationary distribution π. Define
P̂(x, y) = P(y, x) · π(y)/π(x).
P̂ is called the time reversal of P.
Theorem 9.2. Let π be the stationary distribution for an irreducible Markov chain P.
Then, P̂ is an irreducible Markov chain, and π is a stationary distribution for P̂.
Moreover: Let (X_t)_t be Markov-(π, P). Fix any T > 0 and define Y_t = X_{T−t}, t = 0, ..., T. Then, (Y_t)_{t=0}^T is Markov-(π, P̂).
Proof. First, P̂ is stochastic: since π is stationary for P,
Σ_y P̂(x, y) = Σ_y π(y) P(y, x) · (1/π(x)) = π(x)/π(x) = 1.
Also,
(π P̂)(x) = Σ_y π(y) P̂(y, x) = Σ_y π(y) π(x) P(x, y) (1/π(y)) = π(x) Σ_y P(x, y) = π(x),
so π is stationary for P̂.
Finally, note that π(x) P̂(x, y) = π(y) P(y, x). So,
P[Y_0 = x_0, ..., Y_T = x_T] = P[X_0 = x_T, X_1 = x_{T−1}, ..., X_T = x_0]
= π(x_T) P(x_T, x_{T−1}) ⋯ P(x_1, x_0) = π(x_0) P̂(x_0, x_1) ⋯ P̂(x_{T−1}, x_T). □
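A numerical illustration of Definition 9.1 and Theorem 9.2, on a biased 3-cycle (an illustrative chain that is not reversible): the reversal P̂ is stochastic and has the same stationary distribution.

```python
# Biased walk on a 3-cycle; it is doubly stochastic, so pi is uniform.
P = [[0.0, 0.9, 0.1],
     [0.1, 0.0, 0.9],
     [0.9, 0.1, 0.0]]
pi = [1/3, 1/3, 1/3]
n = len(P)

# Time reversal: P_hat(x, y) = P(y, x) * pi(y) / pi(x).
P_hat = [[P[y][x] * pi[y] / pi[x] for y in range(n)] for x in range(n)]

row_sums = [sum(row) for row in P_hat]
pi_P_hat = [sum(pi[x] * P_hat[x][y] for x in range(n)) for y in range(n)]
print(row_sums, pi_P_hat)   # each row sums to 1, and pi P_hat = pi
print(P_hat[0])             # with uniform pi, P_hat is just the transpose of P
```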
We also proved in the exercises that if P and π are in detailed balance, then π must be a stationary distribution for P. (The opposite is not necessarily true, as is shown in the exercises.)
Immediately we see a connection between detailed balance and time reversals:
Proposition 9.4. Let P be a Markov chain with stationary distribution π. The following are equivalent:
• P and π are in detailed balance;
• P̂ = P;
• for all T > 0, (X_t)_{t=0}^T is Markov-(π, P) if and only if (X_{T−t})_{t=0}^T is Markov-(π, P).
Proof. We show that each bullet implies the one after it.
If P and π are in detailed balance, then for any states x, y,
P̂(x, y) = P(y, x) · π(y)/π(x) = π(x) P(x, y) · (1/π(x)) = P(x, y).
So P̂ = P.
If P̂ = P, then for any T > 0, if (X_t)_{t=0}^T is Markov-(π, P) then (X_{T−t})_{t=0}^T is Markov-(π, P̂). Since P̂ = P we get that (X_{T−t})_{t=0}^T is Markov-(π, P). Reversing the roles of X_t and X_{T−t}, we get that for all T > 0, (X_t)_{t=0}^T is Markov-(π, P) if and only if (X_{T−t})_{t=0}^T is Markov-(π, P).
Now for the third implication, assume that for all T > 0, (X_t)_{t=0}^T is Markov-(π, P) if and only if (X_{T−t})_{t=0}^T is Markov-(π, P). Take T = 1. Then (X_0, X_1) is Markov-(π, P) if and only if (X_1, X_0) is Markov-(π, P). That is,
π(x) P(x, y) = P[X_0 = x, X_1 = y] = P[X_1 = x, X_0 = y] = π(y) P(y, x),
which is exactly detailed balance. □
The pair (G, c) is called a weighted graph, or sometimes a network or electric network.
Remark 9.6. Let (G, c) be a weighted graph, with C = Σ_{x,y} c(x, y) < ∞. Define c_x = Σ_y c(x, y) and P(x, y) = c(x, y)/c_x. P is a stochastic matrix, and so defines a Markov chain. For π(x) = c_x/C we have that π is a distribution, and
C · π(x) P(x, y) = c(x, y) = c(y, x) = C · π(y) P(y, x).
Thus, P is reversible.
We will refer to such a P as the random walk on G induced by c.
On the other hand, if P is a reversible Markov chain on S with stationary distribution π, we can define a weighted graph as follows: Let V(G) = S and c(x, y) = π(x) P(x, y). Let x ∼ y if c(x, y) > 0. Note that
Σ_{x,y} c(x, y) = Σ_{x,y} π(x) P(x, y) = 1.
Definition 9.7. If (G, c) is a weighted graph with Σ_{x,y} c(x, y) < ∞, then the Markov chain
P(x, y) = c(x, y) / Σ_z c(x, z)
is called the weighted random walk on G with weights c.
Example 9.8. Let (G, c) be the graph V(G) = {0, 1, 2}, with edges E(G) = {{0, 1}, {1, 2}, {0, 2}} and c(0, 1) = 1, c(1, 2) = 2 and c(2, 0) = 3.
The weighted random walk is then
P =
[ 0    1/4  3/4 ]
[ 1/3  0    2/3 ]
[ 3/5  2/5  0   ].
The stationary measure is, of course, π(x) = Σ_y c(x, y) / Σ_{z,w} c(z, w), so π = [1/3, 1/4, 5/12] is the stationary distribution.
We can compute that P̂ = P (which is expected, since P is reversible).
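The computations of Example 9.8 in code:

```python
# Conductances of the triangle: c(0,1) = 1, c(1,2) = 2, c(2,0) = 3.
c = [[0, 1, 3],
     [1, 0, 2],
     [3, 2, 0]]
n = len(c)
cx = [sum(c[x]) for x in range(n)]                       # [4, 3, 5]
C = sum(cx)                                              # 12
P = [[c[x][y] / cx[x] for y in range(n)] for x in range(n)]
pi = [cx[x] / C for x in range(n)]                       # [1/3, 1/4, 5/12]

# Detailed balance pi(x) P(x,y) = pi(y) P(y,x) holds, so the reversal is P itself:
balanced = all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) < 1e-12
               for x in range(n) for y in range(n))
print(pi, balanced)
```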
Example 9.9 (One-dimensional Markov chains are almost reversible). Let P be a Markov chain on ℤ such that P(x, y) > 0 if and only if |x − y| = 1. For x ∈ ℤ let p_x = P(x, x+1) (so 1 − p_x = P(x, x−1)).
Consider the following conductances on ℤ: Let c(0, 1) = 1, and for x > 0 set
c(x, x+1) = ∏_{y=1}^{x} p_y/(1 − p_y).
Let c(0, −1) = (1 − p_0)/p_0, and for x < 0 set
c(x, x−1) = ∏_{y=x}^{0} (1 − p_y)/p_y.
Then for all x,
(c(x, x−1) + c(x, x+1)) · P(x, x+1) = c(x, x−1) · p_x/(1 − p_x) = c(x, x+1),
and
(c(x+1, x) + c(x+1, x+2)) · P(x+1, x) = c(x, x+1) · (1/(1 − p_{x+1})) · (1 − p_{x+1}) = c(x, x+1).
So for m(x) = c(x, x−1) + c(x, x+1) we have that m(x) P(x, y) = m(y) P(y, x) for all x, y. That is, if m were a distribution, P would be reversible.
For example, if p_x = 1/3 for x > 0 and p_x = 2/3 for x < 0 (say with p_0 = 1/2), we would have that c(x, x+1) = 2^{−x} for x ≥ 0 and c(x, x−1) = 2^{x} for x ≤ 0. Thus
Σ_x m(x) = 2 · (Σ_{x=0}^∞ 2^{−x} + Σ_{x=0}^∞ 2^{−x}) = 4 · 2 = 8 < ∞.
So π(x) = (c(x, x−1) + c(x, x+1))/8 is a stationary distribution.
In general, we see that a drift towards 0 would give a reversible chain.
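A numerical check of this example (taking p₀ = 1/2, an assumption consistent with c(0, 1) = c(0, −1) = 1):

```python
def c_edge(x, y):
    # Conductance of the edge {x, y} with |x - y| = 1, from Example 9.9
    # with p_x = 1/3 for x > 0, p_x = 2/3 for x < 0 and p_0 = 1/2.
    lo = min(x, y)
    return 2.0 ** (-lo) if lo >= 0 else 2.0 ** (lo + 1)

def m(x):
    return c_edge(x, x - 1) + c_edge(x, x + 1)

def p(x):
    # Transition probability P(x, x+1) of the induced weighted walk.
    return c_edge(x, x + 1) / m(x)

total = sum(m(x) for x in range(-60, 61))   # should be close to 8
probs_ok = (all(abs(p(x) - 1/3) < 1e-12 for x in range(1, 20))
            and all(abs(p(x) - 2/3) < 1e-12 for x in range(-20, 0))
            and abs(p(0) - 0.5) < 1e-12)
print(total, probs_ok)
```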
10.1. Laplacian
In order to study electric networks and conductances, we will first introduce the concept of
harmonic functions.
Let G = (V(G), c) be a network; recall that by this we mean: c : V(G) × V(G) → [0, ∞) with c(x, y) = c(y, x) for all x, y ∈ G, and c_x := Σ_y c(x, y) < ∞ for all x. We denote by E(G) the set of oriented edges of G; that is, E(G) = {(x, y) : c(x, y) > 0}. (We write x ∼ y when c(x, y) > 0.) For e ∈ E(G) we write e = (e⁺, e⁻). c is known as the conductance of the network.
Let C⁰(V) = {f : V(G) → ℝ} and C⁰(E) = {F : E(G) → ℝ} be the sets of all functions of vertices and (oriented) edges of G respectively.
We can define an operator ∇ : C⁰(V) → C⁰(E) by: for any edge x ∼ y,
∇f(x, y) = c(x, y) (f(x) − f(y)),
and an operator div : C⁰(E) → C⁰(V) by
(div F)(x) = (1/c_x) Σ_{y∼x} (F(x, y) − F(y, x)).
Define inner products
⟨f, f'⟩ = Σ_x c_x f(x) f'(x) and ⟨F, F'⟩ = Σ_e (1/c(e)) F(e) F'(e).
Consider the subspaces L²(V) = {f ∈ C⁰(V) : ⟨f, f⟩ < ∞} and L²(E) = {F ∈ C⁰(E) : ⟨F, F⟩ < ∞}. The operator ∇ is a linear operator from L²(V) to L²(E). Also div : L²(E) → L²(V) is a linear operator, and
⟨∇f, F⟩ = Σ_{(x,y)} (f(x) − f(y)) F(x, y) = Σ_x f(x) Σ_{y∼x} (F(x, y) − F(y, x))
= Σ_x c_x f(x) · (1/c_x) Σ_{y∼x} (F(x, y) − F(y, x)) = ⟨f, div F⟩.
Exercise 10.1. Show that (in matrix form) ∆ := ½ div ∇ = I − P, where I is the identity operator.
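A finite-dimensional check of Exercise 10.1 on the triangle network of Example 9.8 (with ∇f(x, y) = c(x, y)(f(x) − f(y)), as above):

```python
# Triangle network: c(0,1) = 1, c(1,2) = 2, c(0,2) = 3.
c = {(0, 1): 1.0, (1, 0): 1.0, (1, 2): 2.0, (2, 1): 2.0, (0, 2): 3.0, (2, 0): 3.0}
V = [0, 1, 2]
cx = {x: sum(w for (a, _), w in c.items() if a == x) for x in V}

def grad(f):
    return {e: c[e] * (f[e[0]] - f[e[1]]) for e in c}

def div(F):
    return {x: sum(F[(x, y)] - F[(y, x)] for y in V if (x, y) in c) / cx[x]
            for x in V}

def laplacian(f):   # (I - P) f, with P(x, y) = c(x, y) / c_x
    return {x: f[x] - sum(c[(x, y)] / cx[x] * f[y] for y in V if (x, y) in c)
            for x in V}

f = {0: 1.0, 1: -2.0, 2: 0.5}
half_div_grad = {x: 0.5 * v for x, v in div(grad(f)).items()}
lap = laplacian(f)
print(half_div_grad, lap)   # the two agree: Delta = (1/2) div grad = I - P
```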
Proof. First assume that f is harmonic in S. Note that if x ∉ S then X_{t∧T} = X_0 = x a.s. under P_x. So, as a constant sequence, M_t = f(x) is a martingale. So we only need to deal with x ∈ S.
The main observation here is that the Markov property is just the fact that
E_x[f(X_{t+1}) | F_t] = Σ_y P(X_t, y) f(y) = (Pf)(X_t).
For any t, since 1_{T≥t+1} = 1_{T>t} ∈ F_t, and f(X_T) 1_{T≤t} ∈ F_t,
E_x[M_{t+1} | F_t] = E_x[f(X_{t+1}) | F_t] 1_{T>t} + f(X_T) 1_{T≤t} = (Pf)(X_t) 1_{T>t} + f(X_T) 1_{T≤t}.
If f is harmonic at x, then Pf(x) = f(x). Thus, since on the event T > t, f is harmonic at X_t, we get that (Pf)(X_t) 1_{T>t} = f(X_t) 1_{T>t}. In conclusion,
E_x[M_{t+1} | F_t] = (Pf)(X_t) 1_{T>t} + f(X_T) 1_{T≤t} = f(X_t) 1_{T>t} + f(X_T) 1_{T≤t} = f(X_{t∧T}) = M_t.
So M_t is a martingale.
For the other direction, assume that (M_t)_t is a martingale. Then, for any x ∈ S,
f(x) = E_x[M_0] = E_x[M_1] = E_x[f(X_{1∧T})] = (Pf)(x),
where we have used that under P_x, T ≥ 1 a.s. So we have that for any x ∈ S, ∆f(x) = (I − P)f(x) = 0. So f is harmonic in S. □
Consider the Dirichlet problem on a subset B ⊆ G with bounded boundary values u : B → ℝ: find a bounded f that is harmonic off B with f|_B = u. Set
D = {x ∈ G : P_x[T_B < ∞] = 1}.
On D the solution exists and is unique:
Proof. Define f(x) = E_x[u(X_{T_B})]. This is well defined, since under P_x, T_B < ∞ a.s., and since u is bounded.
It is immediate to check that for any b ∈ B, f(b) = u(b). Also, for x ∈ D \ B, since T_B ≥ 1 P_x-a.s., by the Markov property,
f(x) = E_x[u(X_{T_B})] = Σ_y P(x, y) E_y[u(X_{T_B})] = Pf(x).
So f is harmonic at x.
For uniqueness, assume that g : D → ℝ is bounded, harmonic in D \ B, and g(b) = u(b) for all b ∈ B. We want to show that
g(x) = E_x[g(X_{T_B})] = E_x[u(X_{T_B})] = f(x). (10.1)
g is bounded, so (g(X_{T_B ∧ t}))_t is a bounded martingale, so (10.1) holds by the optional stopping theorem, because T_B < ∞ P_x-a.s. for all x ∈ D. □
If we remove the condition that T_B < ∞, then we can only guarantee existence, but not uniqueness, of the solution to the Dirichlet problem.
Proof. We define
f(x) = E_x[u(X_{T_B}) 1_{T_B < ∞}].
Obviously, f(b) = u(b) for all b ∈ B. Also, for x ∉ B, since T_B ≥ 1 P_x-a.s., we have that f is harmonic at x by the Markov property. □
The maximum principle for harmonic functions in ℝ^d states that if a non-constant function is harmonic in a connected open subset of ℝ^d then it attains its maximal values on the boundary. A discrete analogue holds here: if f is harmonic and attains its maximum at x, then since f(x) − f(X_{t∧T_B}) ≥ 0 and (f(X_{t∧T_B}))_t is a martingale under P_x,
(f(x) − f(z)) P_x[T_B ≥ t, X_t = z] = E_x[(f(x) − f(X_{t∧T_B})) 1_{T_B≥t, X_t=z}] ≤ E_x[f(x) − f(X_{t∧T_B})] = 0.
Consider ℤ with conductances such that
P(x, x+1) = c(x, x+1)/(c(x, x+1) + c(x−1, x)) = p and P(x, x−1) = 1 − p.
First let's prove that the weighted random walk here is transient when p ≠ 1/2. For example, recall that it suffices to show that
Σ_{t=0}^∞ P_0[X_t = 0] < ∞.
Well, since at each step the walk moves right with probability p and left with probability 1 − p, independently, we can model this walk by
X_t = Σ_{k=1}^{t} ξ_k,
where (ξ_k)_k are independent and all have distribution P[ξ_k = 1] = p = 1 − P[ξ_k = −1].
The usual trick here is to note that (ξ_k + 1)/2 ∼ Ber(p), so
P_0[X_{2t} = 0] = P[Bin(2t, p) = t] = C(2t, t) p^t (1 − p)^t.
We may bound the binomial coefficient C(2t, t) by the total number of subsets, which is 2^{2t}. Since for p ≠ 1/2, 4p(1 − p) < 1, we get that
Σ_{t=0}^∞ P[X_t = 0] ≤ Σ_{t=0}^∞ (4p(1 − p))^t = 1/(1 − 4p(1 − p)) < ∞.
This is one proof that for p ≠ 1/2 the weighted walk is transient.
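Numerically, the geometric bound is quite crude but the series indeed converges (shown here for an illustrative p = 0.6; the true sum of the even-time return probabilities is (1 − 4p(1−p))^{−1/2}, by the generating function of the central binomial coefficients, which is below the geometric bound).

```python
from math import comb

p = 0.6
# Return probabilities P_0[X_{2t} = 0] = C(2t, t) p^t (1-p)^t, bounded by (4p(1-p))^t.
terms = [comb(2 * t, t) * (p * (1 - p)) ** t for t in range(300)]
partial = sum(terms)
geom_bound = 1 / (1 - 4 * p * (1 - p))
exact = (1 - 4 * p * (1 - p)) ** -0.5
print(partial, exact, geom_bound)
```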
Now, let us consider B = {0} and boundary values u(0) = 1. What is a bounded function f : G → ℝ such that f is harmonic in G \ B? Well, we can take f ≡ 1, which is one option. Another option is to take f(x) = P_x[T_0 < ∞]. But since G is transient, we know that f ≢ 1!
Since P_x[T_0 < ∞] = E_x[u(0) 1_{T_0 < ∞}], we see that this is the second solution from above. However, the uniqueness is only for functions defined on {x : P_x[T_0 < ∞] = 1}, so a priori there is freedom to choose more than one option for those x's such that P_x[T_0 < ∞] < 1.
Suppose that for every x we have a function g(·, x) with ∆g(·, x) = 1_{·=x}. Then, for f = Σ_x u(x) g(·, x),
∆f(z) = Σ_x u(x) 1_{x=z} = u(z),
so f solves the equation ∆f = u. Such a function is given by the stopped Green function
g_Z(x, y) = Σ_{k=0}^∞ P_x[X_k = y, T_Z > k],
the expected number of visits to y before hitting Z, started at x: for x ∉ Z,
∆g_Z(·, y)(x) = 1_{x=y}.
Indeed, the Markov property gives that for a fixed y, using h(x) = g_Z(x, y),
h(x) = 1_{x=y} + Σ_{k=1}^∞ P_x[X_k = y, T_Z > k] = 1_{x=y} + Σ_w P(x, w) Σ_{k=1}^∞ P_w[X_{k−1} = y, T_Z > k−1]
= 1_{x=y} + Σ_w P(x, w) h(w),
so ∆h(x) = 1_{x=y}.
The symmetry of g_Z is shown as follows: By the definition of the weighted random walk, we have that c_x P(x, y) = c_y P(y, x) = c(x, y) for all x ∼ y. Thus, for any path (x_0, ..., x_n) in G,
c_{x_0} P(x_0, x_1) ⋯ P(x_{n−1}, x_n) = c(x_0, x_1) ⋯ c(x_{n−1}, x_n) / (c_{x_1} ⋯ c_{x_{n−1}}) = c_{x_n} P(x_n, x_{n−1}) ⋯ P(x_1, x_0),
and summing over paths from x to y avoiding Z gives c_x g_Z(x, y) = c_y g_Z(y, x).
Recall the operators on a network (G, c): for f ∈ C⁰(V),
∇f(x, y) = c(x, y)(f(x) − f(y)),
and for F ∈ C⁰(E),
(div F)(x) = (1/c_x) Σ_{y∼x} (F(x, y) − F(y, x)).
We have the duality formula
⟨∇f, F⟩ = Σ_{(x,y)} (f(x) − f(y)) F(x, y) = Σ_x f(x) Σ_{y∼x} (F(x, y) − F(y, x))
= Σ_x c_x f(x) · (1/c_x) Σ_{y∼x} (F(x, y) − F(y, x)) = ⟨f, div F⟩,
where ⟨f, f'⟩ = Σ_x c_x f(x) f'(x) and ⟨F, F'⟩ = Σ_e (1/c(e)) F(e) F'(e).
Also, ∆ = I − P = ½ div ∇.
For a path γ = (γ_0, ..., γ_{|γ|}) define its reversal by ⃖γ = (γ_{|γ|}, γ_{|γ|−1}, ..., γ_0). Also, define ⃖F ∈ C⁰(E) by ⃖F(x, y) = F(y, x). For F ∈ C⁰(E) and a path γ, write ∫_γ F = Σ_j F(γ_j, γ_{j+1})/c(γ_j, γ_{j+1}).
We make a few observations:
• ∫_{⃖γ} F = ∫_γ ⃖F.
• For any path γ : x → y, ∫_γ ∇f = f(x) − f(y).
• If ∇f = ∇g then f − g is constant.
Proof. The first bullet is immediate, just reversing the order of the edges in the sum defining ∫ F.
For the second bullet, expanding the sum, we find that for γ : x → y,
∫_γ ∇f = Σ_{j=0}^{|γ|−1} (f(γ_j) − f(γ_{j+1})) = f(x) − f(y).
For the third bullet, note that for any γ : x → y we have that
f(x) − f(y) = ∫_γ ∇f = ∫_γ ∇g = g(x) − g(y).
So f(x) − g(x) = f(y) − g(y) for all x, y, and the difference f − g is constant. □
Definition 11.2. A function F ∈ C⁰(E) is said to respect Kirchhoff's cycle law if for any cycle γ : x → x, ∮_γ F = 0.
Any gradient respects Kirchhoff's cycle law, as shown above. But the converse also holds:
Proposition 11.3. F ∈ C⁰(E) respects Kirchhoff's cycle law if and only if there exists f ∈ C⁰(V) such that F = ∇f.
In other words, if F respects Kirchhoff's cycle law, then we can define ∫ F := f for any f such that ∇f = F, and then all representations of ∫ F differ by some constant.
Proof. One direction was shown above. For the other direction, fix a vertex o, and for every x choose a path γ : x → o; since F respects Kirchhoff's cycle law, f(x) := ∫_γ F does not depend on the choice of γ. For an edge x ∼ y, concatenating (x, y) to a path γ : y → o gives f(x) = F(x, y)/c(x, y) + f(y); that is, F(x, y) = c(x, y)(f(x) − f(y)). So F = ∇f. □
Let G = (V, c) be a network. For each edge x ∼ y, define the resistance of the edge to be r(x, y) = 1/c(x, y). Let A, Z ⊆ G be two disjoint subsets.
If we were physicists, we could enforce voltage 1 on A, voltage 0 on Z, and look at the voltage and current flowing through the graph G, where each edge is an r(x, y)-Ohm resistor. According to Ohm's law, the current equals the potential difference divided by the resistance, I = V/R. Kirchhoff would reformulate this, telling us that the total current out of each node should be 0, except for those nodes in A ∪ Z.
Let us turn this into a mathematical definition. The physics will only serve as intuition (albeit usually good intuition).
Definition. A voltage imposed on A and Z is a function v ∈ C⁰(V) which is harmonic at every x ∉ A ∪ Z. The current induced by v is I = ∇v.
• Note that this has the form I(x, y) = (v(x) − v(y))/r(x, y), which is the form of Ohm's law.
• For simplicity, we will sometimes extend a flow F to all pairs (x, y) by defining F(x, y) = 0 for x ≁ y.
Indeed, if v is a voltage and I = ∇v the induced current, then for x ∉ A ∪ Z,
div I(x) = (1/c_x) Σ_{y∼x} (I(x, y) − I(y, x)) = (2/c_x) Σ_{y∼x} c(x, y)(v(x) − v(y)) = 2 ∆v(x) = 0.
Example 11.7. If v is a voltage, and I is the current induced by v, then we have Kirchhoff's cycle law: for any cycle γ : x → x, γ = (x = x_0, x_1, ..., x_n = x),
∮_γ I = Σ_{j=0}^{n−1} I(x_j, x_{j+1}) r(x_j, x_{j+1}) = Σ_{j=0}^{n−1} (v(x_j) − v(x_{j+1})) = v(x) − v(x) = 0.
This of course is due to the fact that any gradient ∇v respects Kirchhoff's cycle law.
Then, there exists a voltage v such that I is induced by v. Moreover, if u, v are two such voltages, then v − u ≡ α for some constant α.
Since voltages are harmonic functions, it is not surprising that there is a connection between probability and electric networks. Let us elaborate on this.
Proposition 11.9. Let G = (V, c) be a network. Let {a}, Z be disjoint subsets. Let v be a voltage such that v(z) = 0 for all z ∈ Z, and v(a) ≠ 0 arbitrary. Let I be the current induced by v. Then:
• C_eff(a, Z) = Σ_x I(a, x)/v(a) = c_a ∆v(a)/v(a).
• If the component of a in G \ Z is finite, then C_eff(a, Z) = c_a P_a[T_Z < T_a^+]. Specifically, in this case C_eff(a, Z) does not depend on the choice of the voltage.
Proof. The first bullet follows from the fact that u = v/v(a) is a voltage with u(z) = 0 for all z ∈ Z and u(a) = 1, and v(a) · ∇u = ∇v.
For the second bullet, let D be the component of a in G \ Z. We have two harmonic functions on D \ ({a} ∪ Z): u = v/v(a) and x ↦ P_x[T_a < T_Z], both 0 on Z and 1 at a. Thus, these functions are equal, because D is finite. Now,
c_a P_a[T_Z < T_a^+] = c_a Σ_x P(a, x)(1 − u(x)) = (1/v(a)) Σ_x c(a, x)(v(a) − v(x)) = c_a ∆v(a)/v(a) = C_eff(a, Z). □
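A sketch checking the second bullet on a concrete network: the path 0 − 1 − 2 − 3 with conductances 1, 2, 3. By the series law (Proposition 12.1 below), C_eff(0, {3}) = (1 + 1/2 + 1/3)⁻¹ = 6/11, which should match c₀ · P₀[T₃ < T₀⁺] estimated by simulation.

```python
import random

# Path network 0-1-2-3 with conductances c(0,1)=1, c(1,2)=2, c(2,3)=3.
c = {(0, 1): 1.0, (1, 2): 2.0, (2, 3): 3.0}
c.update({(b, a): w for (a, b), w in list(c.items())})
nbrs = {x: [b for (a, b) in c if a == x] for x in range(4)}
cx = {x: sum(c[(x, y)] for y in nbrs[x]) for x in range(4)}

def escape_prob(trials, rng):
    # Monte Carlo estimate of P_0[T_3 < T_0^+] for the weighted walk.
    hits = 0
    for _ in range(trials):
        x = 0
        while True:
            u, acc = rng.random() * cx[x], 0.0
            for y in nbrs[x]:
                acc += c[(x, y)]
                if u < acc:
                    x = y
                    break
            if x in (0, 3):
                break
        hits += (x == 3)
    return hits / trials

est = cx[0] * escape_prob(200_000, random.Random(0))
print(est, 6 / 11)
```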
Example 11.10. Let G = (V, c) be an infinite network, and let a ∈ G. Let (G_n)_n be an increasing sequence of finite connected subgraphs of G that contain a, such that G = ⋃_n G_n (in this case we say that (G_n)_n exhausts G).
For every n, let Z_n = G \ G_n. Note that the connected component of a in G \ Z_n is G_n, which is finite. Thus, we can consider the effective conductance from a to Z_n, C_eff(a, Z_n). This is a sequence of numbers which converges to a limit; indeed, if T_a^+ < ∞, since X[0, T_a^+] is a finite path, there exists n_0 such that for all n > n_0, X[0, T_a^+] ⊆ G_n. The events {T_{Z_n} < T_a^+} form a decreasing sequence, so
P_a[T_{Z_n} < T_a^+] → P_a[T_a^+ = ∞] and C_eff(a, Z_n) = c_a P_a[T_{Z_n} < T_a^+] → c_a P_a[T_a^+ = ∞].
Thus, we see that lim_n C_eff(a, Z_n) does not depend on the choice of the exhausting subgraphs (G_n)_n, and
lim_n C_eff(a, Z_n) = c_a P_a[T_a^+ = ∞].
Definition 11.11. Let G = (V, c) be an infinite network, and let a ∈ G. Let (G_n)_n be an increasing sequence of finite connected subgraphs of G that contain a, such that G = ⋃_n G_n. Let Z_n = G \ G_n.
Define the conductance from a to infinity and resistance from a to infinity as
C_eff(a, ∞) = lim_n C_eff(a, Z_n) and R_eff(a, ∞) = C_eff(a, ∞)⁻¹.
Theorem 11.12. The weighted random walk on a network G is recurrent if and only if the resistance from some vertex a to infinity is infinite.
Recall that
C_eff(a, ∞) = c_a P_a[T_a^+ = ∞].
So the effective resistance or conductance to infinity will not help us decide whether (G, c) is recurrent unless we have a way of simplifying the sequence of finite networks G_n.
We will now develop a few operations that will help us reduce networks to simpler ones without changing the effective conductance between a and Z. Thus, it will give us the ability to compute probabilities on some networks.
When we wish to differentiate between effective conductances (or resistances) in two networks, we will use C_eff(a, Z; G) and C_eff(a, Z; G').
Exercise 12.1. Suppose (G, c) is a network with multiple edges. Let (G', c') be the network without multiple edges where the weight c'(x, y) is the sum of all weights between x and y in (G, c). That is,
c'(x, y) = Σ_{e∈E(G) : e⁺=x, e⁻=y} c(e).
Then, (G', c') is a network without multiple edges, and the weighted random walk on (G', c') has the same distribution as the weighted random walk on (G, c).
Specifically, for all a, Z the effective conductance between a and Z does not change.
Solution. This is just the fact that the transition probabilities for (G, c) and (G', c') are proportional to each other:
P(x, y) ∝ Σ_{e : e⁺=x, e⁻=y} c(e) = c'(x, y) ∝ P'(x, y). □
[Figure: the parallel law — two parallel edges between x and y, with conductances c₁ and c₂, are replaced by a single edge x ∼ y of conductance c₁ + c₂.]
Proposition 12.1 (Series Law). Let (G, c) be a network. Suppose there exists w that has exactly two adjacent vertices u₁, u₂.
Let (G', c') be the network given by V(G') = V(G) \ {w}, and
c'(x, y) = c(x, y) for x, y ∈ V(G'), {x, y} ≠ {u₁, u₂};
c'(u₁, u₂) = c(u₁, u₂) + 1/(r(u₁, w) + r(u₂, w)).
That is, we remove the edges u₁ ∼ w and u₂ ∼ w, and add weight 1/(c(u₁, w)⁻¹ + c(u₂, w)⁻¹) to the edge u₁ ∼ u₂ (which may have originally had weight 0).
Then, for any a, Z such that w ∉ {a} ∪ Z, and such that the component of a in G \ Z is finite, we have that C_eff(a, Z; G) = C_eff(a, Z; G').
Proof. Let (G', c') be the network above; that is, identical to (G, c), except that c'(u₁, w) = c'(u₂, w) = 0 and c'(u₁, u₂) = c(u₁, u₂) + C. We want to calculate C so that any function that is harmonic at u₁, w on G will be harmonic at u₁ on G' as well.
Let f : G → ℝ be harmonic at u₁, w on G. If f(u₁) = f(w), then harmonicity at w, together with the fact that w is adjacent only to u₁, u₂, gives that f(u₁) = f(w) = f(u₂). So the weight of the edges between u₁, u₂, w does not affect harmonicity of the function, and can be changed.
Hence, we assume that f(u₁) ≠ f(w). Let h = (f − f(w))/(f(u₁) − f(w)). So h is harmonic at u₁, w, and h(w) = 0 and h(u₁) = 1. Harmonicity at u₁ on G gives that
Σ_{y≠w} c(u₁, y)(h(u₁) − h(y)) = −c(u₁, w)(h(u₁) − h(w)) = −c(u₁, w).
Harmonicity at w gives c(u₁, w) h(u₁) + c(u₂, w) h(u₂) = (c(u₁, w) + c(u₂, w)) h(w) = 0, so h(u₂) = −c(u₁, w)/c(u₂, w). Hence, for h to be harmonic at u₁ on G' we need
0 = Σ_{y≠w} c(u₁, y)(h(u₁) − h(y)) + C (h(u₁) − h(u₂)) = −c(u₁, w) + C · (c(u₁, w) + c(u₂, w))/c(u₂, w);
that is,
C = c(u₁, w) c(u₂, w)/(c(u₁, w) + c(u₂, w)) = 1/(r(u₁, w) + r(u₂, w)).
Thus, we have shown that, choosing the weight 1/(r(u₁, w) + r(u₂, w)) as above, if f is harmonic at u₁, w on G, then f is also harmonic at u₁ on G'. Exchanging the roles of u₁ and u₂, the same holds if f is harmonic at u₂ and w on G.
Let a, Z be as in the proposition. Let D be the component of a in G \ Z. Let v be a unit voltage imposed on a and Z in D. Since we chose the weight on u₁ ∼ u₂ in G' correctly, we get that v is also a unit voltage imposed on a and Z in G'.
Because C_eff(a, Z; G) = Σ_y ∇v(a, y), and similarly in G', and since G \ Z and G' \ Z only differ at edges adjacent to u₁, u₂ and w, we have that C_eff(a, Z; G) − C_eff(a, Z; G') = 0 for all a ∉ {u₁, u₂}.
Now, if a = u₁ then, by harmonicity of v at w,
C_eff(a, Z; G) − C_eff(a, Z; G') = c(a, w)(v(a) − v(w)) − (1/(r(u₁, w) + r(u₂, w)))(v(a) − v(u₂))
= (c(u₁, w)/(c(u₁, w) + c(u₂, w))) · ((c(u₁, w) + c(u₂, w))(v(a) − v(w)) − c(u₂, w)(v(a) − v(u₂))) = 0. □
Remark 12.2. Note that if w has exactly 2 neighbors in a network (G, c) as above, with resistances r₁, r₂ on these edges, then the network with these two resistors exchanged for a single resistor of resistance r₁ + r₂ is an equivalent network, in the sense that effective resistances and conductances do not change, as above.
[Figure: the series law — edges u₁ ∼ w ∼ u₂ with conductances c₁, c₂ are replaced by a single edge u₁ ∼ u₂ of conductance (c₁⁻¹ + c₂⁻¹)⁻¹.]
Example 12.3. What is the effective conductance between $a$ and $z$ in the following network?

[Figure: a small network on $a$ and $z$, reduced step by step by the series and parallel laws; the edge labels appearing in the reduction are $1/2$, $1/3$, $3/8$, $3/2$, $3/5$ and $17/24$.]
Exercise 12.2. Let $(G, c)$ be a network, and let $v$ be a unit voltage imposed on $a$ and $Z$. Suppose $x, y \notin \{a\} \cup Z$ are such that $v(x) = v(y)$. Define $(G', c')$ by contracting $x, y$ to the same vertex; that is: $V(G')$ is $V(G)$ with the vertices $x, y$ removed and a new vertex $xy$ instead. All edges and weights stay the same, except for those adjacent to $x$ or $y$, for which we have $c'(xy, w) = c(x, w) + c(y, w)$ for all $w$.

Then, $v$ is a unit voltage imposed on $a$ and $Z$ in $G'$ (where $v(xy) := v(x) = v(y)$), and the effective conductance between $a$ and $Z$ does not change: $C_{\mathrm{eff}}(a, Z; G) = C_{\mathrm{eff}}(a, Z; G')$.
Solution. Since the only change is at edges adjacent to $x$ and $y$, we only need to check that for $w = xy$, or $w \sim xy$ such that $w \notin \{a\} \cup Z$, $v$ is harmonic at $w$ in $G'$.

For $w \sim xy$,
$$\sum_u c'(w, u)(v(w) - v(u)) = \sum_{u \neq x, y} c(w, u)(v(w) - v(u)) + (c(w, x) + c(w, y))(v(w) - v(xy)) = \sum_u c(w, u)(v(w) - v(u)),$$
where we have used that $v(xy) = v(x) = v(y)$. So if $v$ is harmonic at $w$ in $G$ then $v$ is harmonic at $w$ in $G'$.

Similarly, for $w = xy$,
$$\sum_u c'(xy, u)(v(xy) - v(u)) = \sum_u c(x, u)(v(x) - v(u)) + \sum_u c(y, u)(v(y) - v(u)) = 0,$$
so $v$ is harmonic at $xy$ in $G'$. $\square$
Example 12.4. What is the effective conductance between $a$ and $z$ in the following network?

[Figure: a network on $a$ and $z$, reduced using the series and parallel laws; the edge labels appearing in the reduction include $1/2$, $2/3$ and $2$.]
Exercise 12.3. Let $(G, c)$ be a network, and let $v$ be a unit voltage imposed on $a$ and $Z$. Suppose $x, y \notin \{a\} \cup Z$ are such that $v(x) = v(y)$. Let $c'$ be a new weight function on $G$ that is identical to $c$ except for the edge $x \sim y$: for $x \sim y$ let $c'(x, y) = C'$, some arbitrary number, possibly $0$. Let $\Delta'$ be the Laplacian on $(G, c')$.

Then, $v$ is harmonic in $G \setminus (\{a\} \cup Z)$ also with respect to $c'$. Conclude that the effective conductance between $a$ and $Z$ is the same in both $(G, c)$ and $(G, c')$.

Solution. Since the difference is only at the edge $x \sim y$, we only need to check that harmonicity is preserved at $x$ and $y$. Because $v(x) - v(y) = 0$, for $z \in \{x, y\}$,
$$c'_z \Delta' v(z) = \sum_w c'(z, w)(v(z) - v(w)) = \sum_{w : \{z, w\} \neq \{x, y\}} c(z, w)(v(z) - v(w)) + c'(x, y)(v(x) - v(y)) = c_z \Delta v(z).$$
Thus,
$$C_{\mathrm{eff}}(a, Z; (G, c')) = c'_a \Delta' v(a) = c_a \Delta v(a) = C_{\mathrm{eff}}(a, Z; (G, c)). \qquad \square$$
Example 12.5. The network from the previous example can be reduced by removing the vertical edge.

Exercise 12.4. Let $G = (V, c)$ be a network such that $V = \mathbb{Z}$ and $x \sim y$ if and only if $|x - y| = 1$. For the weighted random walk $(X_t)_t$ on $G$ define
$$V_t(x) = \sum_{n=0}^{t} \mathbf{1}_{\{X_n = x\}}.$$
Lecture 13
Definition 13.1. For $F \in L^2(E)$ and for $v$ such that $\nabla v \in L^2(E)$, define the energy of $F$ and of $\nabla v$ by
$$\mathcal{E}(F) := \langle F, F \rangle = \sum_e r(e) F(e)^2 \quad \text{and} \quad \mathcal{E}(\nabla v) := \langle \nabla v, \nabla v \rangle = \sum_{x \sim y} c(x, y)(v(x) - v(y))^2.$$
Lemma 13.2 (Thomson's Principle / Dirichlet Principle). Let $G = (V, c)$ be a finite network, let $A, Z$ be disjoint subsets. The unit voltage $v$ is the function that minimizes the energy $\mathcal{E}(\nabla f)$ over all functions $f$ with $f(a) = 1$ for all $a \in A$ and $f(z) = 0$ for all $z \in Z$.

[margin: Joseph John Thomson (1856-1940); Lejeune Dirichlet (1805-1859)]

Proof. (That is, the Laplacian is self-dual.) Since $f - v = 0$ on $A \cup Z$, and since $v$ is harmonic off $A \cup Z$, we get that $(f - v) \Delta v \equiv 0$. So,
$$\langle \nabla(f - v), \nabla v \rangle = \langle f - v, \Delta v \rangle = \sum_x c_x (f(x) - v(x)) \Delta v(x) = 0.$$
This implies
$$\mathcal{E}(\nabla f) = \mathcal{E}(\nabla(f - v)) + \mathcal{E}(\nabla v) \geq \mathcal{E}(\nabla v). \qquad \square$$
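Thomson's principle can be sanity-checked numerically: among all functions with the prescribed boundary values, the harmonic one has minimal energy. A minimal sketch of mine (not from the notes), on a 4-vertex path with unit conductances, where the unit voltage is the linear interpolation:

```python
import random

# Path network a = 0, 1, 2, z = 3, all edge conductances equal to 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0)]

def energy(f):
    # E(grad f) = sum over edges of c(x, y) (f(x) - f(y))^2
    return sum(c * (f[x] - f[y]) ** 2 for x, y, c in edges)

# The unit voltage is the harmonic interpolation of f(a) = 1, f(z) = 0;
# here its energy equals C_eff(a, z) = 1/3 (single count per edge).
v = [1.0, 2 / 3, 1 / 3, 0.0]

random.seed(0)
for _ in range(100):
    # Any competitor with the same boundary values has at least as much energy.
    f = [1.0, 2 / 3 + random.uniform(-1, 1), 1 / 3 + random.uniform(-1, 1), 0.0]
    assert energy(f) >= energy(v) - 1e-12
```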
Lemma 13.3 (Thomson's Principle - Dual Form). Let $G$ be a finite network, let $\{a\}, Z$ be disjoint subsets. Let $v(x) = \frac{1}{2} g_Z(x, a)$, where $g_Z(x, a)$ is the Green function (the expected number of visits to $a$ started at $x$, from time $0$ until before hitting $Z$).

Then, over all flows $F$ from $a$ to $Z$ with $\mathrm{div} F(a) = 1$, the energy $\mathcal{E}(F)$ is minimized at $I = \nabla v$.

Proof. First, we know that $v$ is a voltage on $a$ and $Z$ with $v(z) = 0$ for all $z \in Z$. Also, $\mathrm{div} I(a) = 2 \Delta v(a) = 1$.

Let $F$ be a flow from $a$ to $Z$ with $\mathrm{div} F(a) = 1$. Then, $F - I$ is a flow from $a$ to $Z$ with $\mathrm{div}(F - I)(a) = 0$. Since $\mathrm{div}(F - I)$ is $0$ off $Z$, and $v$ is $0$ on $Z$, we get that $\mathrm{div}(F - I) \cdot v \equiv 0$. Thus,
$$\langle F - I, I \rangle = \langle \mathrm{div}(F - I), v \rangle = 0.$$
So,
$$\mathcal{E}(F) = \mathcal{E}(F - I + I) = \mathcal{E}(F - I) + \mathcal{E}(I) \geq \mathcal{E}(I). \qquad \square$$
Corollary 13.4 (Rayleigh's Monotonicity Principle). Let $G$ be a finite network, and let $a$ be a point not in a subset $Z$. Suppose $c'$ is a weight function on $G$ such that $c \leq c'$. Then,
$$C_{\mathrm{eff}}(a, Z; c) \leq C_{\mathrm{eff}}(a, Z; c').$$

[margin: John William Strutt, 3rd Baron Rayleigh (1842-1919)]

Proof. Let $v$ be the unit voltage imposed on $a$ and $Z$ with respect to $c$, and let $u$ be the unit voltage imposed on $a$ and $Z$ with respect to $c'$.

Note that
$$\mathcal{E}(\nabla v) = 2 \langle \Delta v, v \rangle = 2 \sum_x c_x \Delta v(x) v(x) = 2 c_a \Delta v(a) = 2 C_{\mathrm{eff}}(a, Z; c),$$
because $\Delta v(x) = 0$ for $x \notin \{a\} \cup Z$, $v(z) = 0$ for $z \in Z$, and $v(a) = 1$. Similarly, $\mathcal{E}(\nabla u) = 2 C_{\mathrm{eff}}(a, Z; c')$. (This fact is called conservation of energy.)

Since $c \leq c'$, using Thomson's principle,
$$C_{\mathrm{eff}}(a, Z; c) = \tfrac{1}{2} \sum_{x, y} c(x, y)(v(x) - v(y))^2 \leq \tfrac{1}{2} \sum_{x, y} c(x, y)(u(x) - u(y))^2 \leq \tfrac{1}{2} \sum_{x, y} c'(x, y)(u(x) - u(y))^2 = C_{\mathrm{eff}}(a, Z; c'). \qquad \square$$
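Rayleigh's monotonicity can be illustrated with nothing more than the series and parallel laws. In the sketch below (the example network and function name are mine), the effective conductance of a triangle between $a$ and $z$ is written in closed form, and raising any single edge conductance never lowers it:

```python
def ceff_triangle(c1, c2, c3):
    """C_eff(a, z) in the triangle a-b-z: the direct edge a-z (conductance c3)
    in parallel with the series path a-b (c1), b-z (c2)."""
    return c3 + (c1 * c2) / (c1 + c2)

base = ceff_triangle(1.0, 2.0, 0.5)      # = 1/2 + 2/3 = 7/6
# Raising any single conductance never lowers the effective conductance.
assert ceff_triangle(1.5, 2.0, 0.5) >= base
assert ceff_triangle(1.0, 2.5, 0.5) >= base
assert ceff_triangle(1.0, 2.0, 1.0) >= base
```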
Corollary 13.5. Let $G$ be an infinite network. Let $c'$ be a weight function on $G$ such that $c' \geq c$. If $(G, c)$ is transient, then $(G, c')$ is also transient.

Proof. Fix a vertex $o \in G$. For every $n$, let $G_n$ be the ball of radius $n$ around $o$; that is,
$$G_n = \{x \in G : \mathrm{dist}(x, o) \leq n\}.$$
So $(G_n)_n$ form an increasing sequence of subgraphs that exhaust $G$. Let $Z_n = G_{n+1} \setminus G_n$, which is the outer boundary of the ball of radius $n$. We know that $G$ is transient, which is equivalent to
$$\lim_n R_{\mathrm{eff}}(a, Z_n; c) < \infty$$
(because imposing a unit voltage on $a$ and $G \setminus G_n$ is the same as imposing a unit voltage on $a$ and $Z_n$). Now, for each fixed $n$, since $c' \geq c$, considering the finite networks $(G_n, c)$ and $(G_n, c')$, we have that
$$C_{\mathrm{eff}}(a, Z_n; c) \leq C_{\mathrm{eff}}(a, Z_n; c').$$
Thus,
$$\lim_n R_{\mathrm{eff}}(a, Z_n; c') \leq \lim_n R_{\mathrm{eff}}(a, Z_n; c) < \infty,$$
so $(G, c')$ is transient. $\square$
Exercise 13.1. Let $H$ be a subgraph of a graph $G$ (not necessarily spanning all vertices of $G$). Show that if the simple random walk on $H$ is transient then so is the simple random walk on $G$.

13.2. Shorting

Another intuitive network operation is to short two vertices. This can be thought of as imposing a conductance of $\infty$ between them. Since this increases the conductance, it is intuitive that this will increase the effective conductance.
Proposition 13.6. Let $(G, c)$ be a finite network. Let $b, d \in G$ and define $(G', c')$ by shorting $b$ and $d$: let $V(G') = V(G) \setminus \{b, d\} \cup \{bd\}$, with $c'(z, w) = c(z, w)$ for $z, w \notin \{b, d\}$ and $c'(bd, w) = c(b, w) + c(d, w)$.

Then, for any disjoint sets $\{a\}, Z$, we have that $C_{\mathrm{eff}}(a, Z; G) \leq C_{\mathrm{eff}}(a, Z; G')$.

Proof. Let $v$ be the unit voltage imposed on $a$ and $Z$ with respect to $c$, and let $u$ be the unit voltage imposed on $a$ and $Z$ with respect to $c'$.

Conservation of energy tells us that
$$2 C_{\mathrm{eff}}(a, Z; c) = \sum_{x, y} c(x, y)(v(x) - v(y))^2 \quad \text{and} \quad 2 C_{\mathrm{eff}}(a, Z; c') = \sum_{x, y} c'(x, y)(u(x) - u(y))^2.$$
Note that $u$ can be viewed as a function on $V(G)$ by setting $u(b) = u(d) = u(bd)$. Using Thomson's principle,
$$C_{\mathrm{eff}}(a, Z; c') = \tfrac{1}{2} \sum_{x, y \in G \setminus \{b, d\}} c'(x, y)(u(x) - u(y))^2 + \sum_w c'(bd, w)(u(bd) - u(w))^2$$
$$= \tfrac{1}{2} \sum_{x, y \in G \setminus \{b, d\}} c(x, y)(u(x) - u(y))^2 + \sum_{k = b, d} \sum_w c(k, w)(u(k) - u(w))^2 = \tfrac{1}{2} \sum_{x, y \in G} c(x, y)(u(x) - u(y))^2$$
$$\geq \tfrac{1}{2} \sum_{x, y \in G} c(x, y)(v(x) - v(y))^2 = C_{\mathrm{eff}}(a, Z; c). \qquad \square$$
Lecture 14
Proposition 14.1. Let $(G, c)$ be a network. Let $a \in G$ and $Z \subset G$ such that the component of $a$ in $G \setminus Z$ is finite.

For the weighted random walk $(X_t)_t$ on $G$, and for any edge $x \sim y$, let $V_{x,y}$ be the number of times the walk goes from $x$ to $y$ until hitting $Z$; that is,
$$V_{x,y} := \sum_{k=1}^{T_Z} \mathbf{1}_{\{X_{k-1} = x, \, X_k = y\}}.$$
Then,
$$\mathbb{E}_a[V_{x,y} - V_{y,x}] = \nabla v(x, y) \cdot R_{\mathrm{eff}}(a, Z),$$
where $v$ is the unit voltage imposed on $a$ and $Z$.
Proof. Let
$$g(x) = \frac{c_a}{c_x} \, g_Z(a, x) = \frac{c_a}{c_x} \, \mathbb{E}_a \left[ \sum_{k=0}^{T_Z - 1} \mathbf{1}_{\{X_k = x\}} \right] = g_Z(x, a).$$
We have already seen that $g$ is harmonic in $G \setminus (\{a\} \cup Z)$. Also, $g(z) = 0$ for all $z \in Z$ and
$$g(a) = \frac{1}{\mathbb{P}_a[T_Z < T_a^+]} = c_a \, R_{\mathrm{eff}}(a, Z).$$
($g$ is a voltage imposed on $a, Z$ with $g(z) = 0$ for all $z \in Z$.)

Now,
$$\mathbb{E}_a[V_{x,y}] = \sum_{k=1}^{\infty} \mathbb{P}_a[X_{k-1} = x, X_k = y, T_Z > k - 1] = \sum_{k=0}^{\infty} \mathbb{P}_a[X_k = x, T_Z > k] \, P(x, y) = \frac{1}{c_a} \, c_x \, g(x) P(x, y) = \frac{1}{c_a} \, g(x) c(x, y).$$
Thus,
$$\mathbb{E}_a[V_{x,y} - V_{y,x}] = \frac{1}{c_a} \, c(x, y)(g(x) - g(y)).$$
That is, since $v = \frac{g}{g(a)}$ is a unit voltage imposed on $a, Z$, and since $\frac{1}{c_a} g = R_{\mathrm{eff}}(a, Z) \, v$, we get
$$\mathbb{E}_a[V_{x,y} - V_{y,x}] = c(x, y)(v(x) - v(y)) \cdot R_{\mathrm{eff}}(a, Z) = \nabla v(x, y) \cdot R_{\mathrm{eff}}(a, Z). \qquad \square$$
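The crossing identity can be checked by simulation. The sketch below (my own example, not from the notes) uses the triangle $a = 0$, $b = 1$, $z = 2$ with unit conductances, where the unit voltage is $v = (1, 1/2, 0)$ and $R_{\mathrm{eff}}(a, z) = 2/3$, so the proposition predicts $\mathbb{E}_a[V_{0,1} - V_{1,0}] = c(0,1)(v(0) - v(1)) \cdot R_{\mathrm{eff}} = 1/3$:

```python
import random

# Simple random walk on the triangle a = 0, b = 1, z = 2 (unit conductances).
rng = random.Random(42)
trials, total = 40000, 0
for _ in range(trials):
    x, signed = 0, 0
    while x != 2:                                        # walk until hitting z
        y = rng.choice([n for n in (0, 1, 2) if n != x])  # uniform neighbor
        if (x, y) == (0, 1):
            signed += 1                                   # crossing 0 -> 1
        elif (x, y) == (1, 0):
            signed -= 1                                   # crossing 1 -> 0
        x = y
    total += signed
estimate = total / trials   # should be close to 1/3
```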
One intuitive statement is that if $e$ is a cut edge between $a$ and $Z$, then $R_{\mathrm{eff}}(a, Z) \geq r(e)$, because there is at least that much resistance between $a$ and $Z$.

Proposition 14.3. Let $(G, c)$ be a finite network. Let $\{a\}, Z$ be disjoint subsets and let $e$ be a cut edge between $a$ and $Z$. Then $R_{\mathrm{eff}}(a, Z) \geq r(e)$.

Proof. Suppose that $e = (x, y)$. Let $V_{x,y}$ be the number of times a random walk crosses the edge $(x, y)$ until hitting $Z$, and let $V_{y,x}$ be the number of times the walk crosses $(y, x)$ before hitting $Z$. We have seen that
$$\mathbb{E}_a[V_{x,y} - V_{y,x}] = \nabla v(x, y) \cdot R_{\mathrm{eff}}(a, Z),$$
where $v$ is the unit voltage, so $0 \leq v(x) - v(y) \leq 1$.

Now, since $(x, y)$ is a cut edge between $a$ and $Z$, we must have that $V_{x,y} - V_{y,x} \geq 1$, because the walk must cross the edge $(x, y)$, and every time it crosses back over $(y, x)$ it must return to cross $(x, y)$. Thus,
$$1 \leq \mathbb{E}_a[V_{x,y} - V_{y,x}] = c(x, y)(v(x) - v(y)) \cdot R_{\mathrm{eff}}(a, Z) \leq c(e) \cdot R_{\mathrm{eff}}(a, Z),$$
so $R_{\mathrm{eff}}(a, Z) \geq r(e)$. $\square$
If $\Pi$ is a cut between $a$ and $Z$, then shorting all edges in $\Pi$ would result in a cut edge of conductance at most $\sum_{e \in \Pi} c(e)$. A natural generalization of the above is the following.

Lemma 14.4 (Nash-Williams Inequality). Let $(G, c)$ be a finite network, and $\{a\}, Z$ disjoint sets. Suppose that $\Pi_1, \ldots, \Pi_k$ are $k$ pairwise disjoint cuts between $a$ and $Z$. Then,
$$R_{\mathrm{eff}}(a, Z) \geq \sum_{j=1}^{k} \left( \sum_{e \in \Pi_j} c(e) \right)^{-1}.$$

[margin: Crispin Nash-Williams (1932-2001)]
Proof. Note that since removing edges from a cut-set only increases the right hand side, we can prove the lemma under the assumption that the cut-sets are minimal. Specifically, they do not contain both $(x, y)$ and $(y, x)$ for an edge $x \sim y$.

Let $v$ be a unit voltage imposed on $a$ and $Z$. We know (conservation of energy) that $\frac{1}{2} \mathcal{E}(\nabla v) = C_{\mathrm{eff}}(a, Z)$.

For an edge $(x, y)$ let $V_{x,y}$ be the number of crossings from $x$ to $y$ until hitting $Z$; that is,
$$V_{x,y} = \sum_{k=1}^{T_Z} \mathbf{1}_{\{X_{k-1} = x, \, X_k = y\}}.$$
Then, for any minimal cut $\Pi$ between $a$ and $Z$, we have that, $\mathbb{P}_a$-a.s.,
$$\sum_{(x, y) \in \Pi} V_{x,y} - V_{y,x} \geq 1.$$
Since the cuts $\Pi_j$ are disjoint, and since we assumed that a cut does not contain both $(x, y)$ and $(y, x)$ (because they are minimal), we have that
$$\sum_{j=1}^{k} \left( \sum_{e \in \Pi_j} c(e) \right)^{-1} \leq R_{\mathrm{eff}}(a, Z)^2 \cdot \frac{1}{2} \sum_{x, y} c(x, y)(v(x) - v(y))^2 = R_{\mathrm{eff}}(a, Z). \qquad \square$$
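The lemma can be tried out numerically on $\mathbb{Z}^2$. In the sketch below (mine, not from the notes), the cutsets are taken to be the edges leaving the $L^\infty$-box of radius $k - 1$, for which I assume the count $|\Pi_k| = 4(2k - 1)$; the resulting lower bound on $R_{\mathrm{eff}}(0, Z_n)$ grows like $\frac{1}{8} \log n$:

```python
def nw_bound(n):
    """Nash-Williams lower bound on R_eff(0, Z_n) in Z^2 (unit conductances),
    using the disjoint cutsets Pi_k = edges leaving the L^inf-box of radius
    k - 1, with |Pi_k| = 4(2k - 1)."""
    return sum(1.0 / (4 * (2 * k - 1)) for k in range(1, n + 1))

# The bound grows like (1/8) log n, consistent with R_eff(0, Z_n) ~ log n.
b1, b2 = nw_bound(100), nw_bound(10000)
```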
Corollary (Nash-Williams recurrence criterion). Let $(G, c)$ be an infinite network, and let $(\Pi_n)_n$ be pairwise disjoint finite cuts between $a$ and $\infty$ such that
$$\sum_{n=1}^{\infty} \left( \sum_{e \in \Pi_n} c(e) \right)^{-1} = \infty.$$
Then $(G, c)$ is recurrent.
Proof. Fix $n$. Let $G_n$ be the subnetwork induced by $(G, c)$ on the smallest ball (in the graph metric) that contains $\bigcup_{j=1}^{n} \Pi_j$. Let $Z_n = G \setminus G_n$.

So $(G_n)_n$ exhaust $G$, and for each fixed $n$,
$$R_{\mathrm{eff}}(a, Z_n) \geq \sum_{j=1}^{n} \left( \sum_{e \in \Pi_j} c(e) \right)^{-1}.$$
Letting $n \to \infty$, the left hand side tends to $R_{\mathrm{eff}}(a, \infty)$ and the right hand side tends to the infinite sum. Since $R_{\mathrm{eff}}(a, \infty) = \infty$, $(G, c)$ is recurrent. $\square$
Recall that one can compute
$$\mathbb{P}_0[X_{2t} = 0] \approx \mathrm{const} \cdot \begin{cases} 1/\sqrt{t} & d = 1, \\ 1/t & d = 2. \end{cases}$$
However, it will be easier to proceed without these calculations (especially in the more complicated $\mathbb{Z}^2$ case).

For $\mathbb{Z}$ this is easy, because $\mathbb{Z}$ is just composed of edges in series, so for any $n > 0$, $R_{\mathrm{eff}}(0, \{-n, n\}) = n/2 \to \infty$, and $\mathbb{Z}$ is recurrent.

Now for $\mathbb{Z}^2$: by the Nash-Williams criterion, it suffices to find disjoint cutsets $(\Pi_n)_n$ such that
$$\sum_n \frac{1}{|\Pi_n|} = \infty.$$
Indeed, taking $\Pi_n$ to be the set of edges leaving the box of radius $n$, we have $|\Pi_n| = O(n)$, so the sum diverges.
Lecture 15
The Nash-Williams criterion was a sufficient condition for recurrence. We now turn to a stronger condition, which is necessary and sufficient.

Let $(G, c)$ be an infinite weighted graph. Recall that a flow from $A$ to $Z$ is an anti-symmetric function with vanishing divergence off $A \cup Z$. In this spirit, we say that $F$ is a flow from $o \in G$ to $\infty$ if:
- $F$ is anti-symmetric;
- $\mathrm{div} F(o) \neq 0$ and $\mathrm{div} F(x) = 0$ for all $x \neq o$.

Theorem 15.1 (T. Lyons, 1983). A weighted graph $(G, c)$ is transient if and only if there exists a finite energy flow on $(G, c)$ from some vertex $o \in G$ to $\infty$.

[margin: Terence Lyons]
Let $v_n(x) = \frac{1}{2} g_{Z_n}(x, o)$, where $g_{Z_n}$ is the Green function on the finite network $G_n$. Since $\mathrm{div} \, \nabla v_n(o) = 2 \Delta v_n(o) = 1$, the dual version of Thomson's principle tells us that $\mathcal{E}(F) \geq \mathcal{E}_{G_n}(F) \geq \mathcal{E}_{G_n}(\nabla v_n)$. Also,
$$v_n(o) = \tfrac{1}{2} \, g_{Z_n}(o, o) = \frac{1}{2 \, \mathbb{P}_o[T_{Z_n} < T_o^+]} = \frac{c_o}{2} \, R_{\mathrm{eff}}(o, Z_n),$$
which stays bounded by transience, and $\mathrm{div} I(x) = 2 \Delta v(x) = 0$ for all $x \neq o$. That is, $I$ is a flow from $o$ to $\infty$ with finite energy. $\square$
15.2. Flows on $\mathbb{Z}^d$

Claim 15.2. Let $\mu$ be a measure on paths $\gamma$. Suppose that $\gamma$ is infinite and $\gamma_0 = o$, $\mu$-a.s. Suppose also that $\mathbb{E}_\mu V(x, y) < \infty$ for every edge $(x, y)$, where $V(x, y)$ is the number of times $\gamma$ crosses $(x, y)$.

Then, $F(x, y) := \mathbb{E}_\mu V(x, y) - \mathbb{E}_\mu V(y, x)$ is a flow from $o$ to $\infty$.

Proof. Anti-symmetry is clear. Also, for any $x \neq o$, since $\gamma$ is infinite, it cannot terminate at $x$. Thus, every time $\gamma$ crosses an edge $(y, x)$ it must then cross an edge $(x, z)$ immediately after. Thus, $\mu$-a.s. $\sum_{y \sim x} V(x, y) - V(y, x) = 0$, and so $\mathrm{div} F(x) = 0$.

Also, since $\gamma_0 = o$, we get one extra passage out of $o$, but the rest must cancel: $c_o \, \mathrm{div} F(o) = 2 \sum_y F(o, y) = 2$. $\square$

That is, to show a graph is transient, we need to construct a measure on infinite paths, starting at some vertex, such that the expected number of visits to any vertex is finite. If the energy of the corresponding flow is finite, we have transience.
15.2.1. Wedges. Let us prove something a bit more general than $\mathbb{Z}^3$ being transient and $\mathbb{Z}^2$ being recurrent.

Let $\phi : \mathbb{N} \to \mathbb{N}$ be an increasing function. Consider the wedge $W_\phi$, the subgraph of $\mathbb{Z}^3$ induced on the vertices $(x, y, z)$ with $|y| \leq |x|$ and $|z| \leq \phi(|x|)$. If
$$\sum_{n=1}^{\infty} \frac{1}{n(\phi(n) + 1)} = \infty,$$
then $W_\phi$ is recurrent. If $\phi(n + 1) - \phi(n) \leq 1$ and
$$\sum_{n=1}^{\infty} \frac{1}{n(\phi(n) + 1)} < \infty,$$
then $W_\phi$ is transient.
Proof. The first direction is simpler. Let $W_\phi$ be a wedge, and let $B_n$ denote the ball of radius $n$ around $0$ in the graph metric (which is the $L^1$ distance in $\mathbb{R}^3$). Let $\partial B_n$ be the edges connecting $B_n$ to $B_n^c$. Thus, the $\partial B_n$ form disjoint cutsets between $0$ and $\infty$.

What is the size of $\partial B_n$? There are at most $2n$ choices for $x$, and then, given $x$, there are at most $2(\phi(|x|) + 1) \leq 2(\phi(n) + 1)$ choices for $z$, which then determines $y$ up to sign. Thus $|\partial B_n| \leq \mathrm{const} \cdot n (\phi(n) + 1)$, and the Nash-Williams criterion gives recurrence.

For the transience direction, one considers random paths staying near the ray through $(n, Un, U' \phi(n))$ for uniform $U, U' \in [0, 1]$; that is, points with $|x - n| + |y - Un| + |z - U' \phi(n)| \leq 1$.
Example 15.4. For example, if we choose $\phi(n) = n^\epsilon$, we get a transient wedge. This is also true if we take $\phi(n) = (\log n)^2$.

If we choose $\phi(n) = 1$ we get essentially $\mathbb{Z}^2$, and recurrence, of course. Also, $\phi(n) = \log n$ gives a divergent sum, so this wedge is recurrent.
Lecture 16
Let us wrap up our discussion with some examples of random walks on graphs.

We have already seen that $\mathbb{Z}^d$ is transient for $d \geq 3$ and recurrent for $d \leq 2$. We saw two different methods to prove this. The first was brute-force computation of $\mathbb{P}_0[S_t = 0]$, using Stirling's formula, and then approximating $\mathbb{E}_0[V_t(0)]$ and $\mathbb{E}_0[V_\infty(0)]$. The second method was more robust, and less computational: it involved approximating the energy of certain flows, mainly taking a uniform direction and following that direction with a path in the lattice.

Energy estimates and the Nash-Williams inequality can give us better control of the effective resistance and Green function.
Proof. The lower bound (of the two-sided estimate $R_{\mathrm{eff}}(0, Z_n) \asymp \log n$ in $\mathbb{Z}^2$) follows by noting that, for the sets $\Pi_k = \partial B_k$ of edges leaving the ball of radius $k$, all of $\Pi_1, \Pi_2, \ldots, \Pi_n$ are cuts between $0$ and $Z_n$, with size $|\Pi_k| = O(k)$. So the Nash-Williams inequality gives
$$R_{\mathrm{eff}}(0, Z_n) \geq \mathrm{const} \cdot \sum_{k=1}^{n} \frac{1}{|\Pi_k|} \geq \mathrm{const} \cdot \sum_{k=1}^{n} \frac{1}{k} \geq \mathrm{const} \cdot \log n.$$
For the other direction, let $v_n(x) = \frac{1}{4} g_{Z_n}(x, 0)$. So $v_n$ is a voltage imposed on $0$ and $Z_n$, with $\Delta v_n(0) = \frac{1}{4}$ and $v_n(0) = \big( 4 \, \mathbb{P}_0[T_0^+ > T_{Z_n}] \big)^{-1} = R_{\mathrm{eff}}(0, Z_n)$.

Also, let $U$ be a uniform random variable in $[0, 1]$, and let $L$ be the segment from $0$ to $(n, Un)$ in $\mathbb{R}^2$. Let $\gamma$ be some random monotone path from $0$ that is always at distance at most $1$ from $L$. For any edge $e$ of $\mathbb{Z}^2$ whose endpoints have first coordinate $x$ and second coordinate near $y$, the event $e \in \gamma$ implies that $|x| \leq n$ and $xU \in [y - 1, y + 1]$. Thus, the expected number of times $\gamma$ crosses $e$ is at most $\frac{2}{|x| - 1}$. Let $F_n$ be the flow given by this random path, restricted to $G \setminus Z_n$. Since the number of edges with an endpoint at distance $k$ from $0$ is $O(k)$,
$$\mathcal{E}(F_n) \leq \sum_{k=1}^{n} O(k \cdot k^{-2}) = O(\log n).$$
Recall that $\mathrm{div} F_n(0) = 1/2$, so Thomson's principle tells us that for $I = \nabla v_n$, since $I$ is a current with $\mathrm{div} I(0) = 2 \Delta v_n(0) = \frac{1}{2}$,
$$R_{\mathrm{eff}}(0, Z_n) \leq \mathrm{const} \cdot \mathcal{E}(F_n) = O(\log n). \qquad \square$$
Remark 16.2. If we tried to adapt the argument above to $\mathbb{Z}^d$, we would see that the probability that an edge $e$ at distance $n$ from $0$ is in $\gamma$ is at most $O(n^{-(d-1)})$ (because we would be looking at the direction $(n, U_1 n, U_2 n, \ldots, U_{d-1} n)$ for $U_1, \ldots, U_{d-1}$ i.i.d. uniform). Thus,
$$R_{\mathrm{eff}}(0, Z_n) \leq \mathrm{const} \cdot \mathcal{E}(F_n) \leq \sum_{k=1}^{n} O(k^{d-1} \cdot k^{-2(d-1)}) = \sum_{k=1}^{n} O(k^{1-d}),$$
which is bounded uniformly in $n$ for $d \geq 3$ (the tail beyond $k = m$ being $O(m^{2-d})$). Similarly, the lower bound would follow from the Nash-Williams inequality.
Let $T^d$ denote the $d$-regular tree. Fix some vertex $\rho \in T^d$ as the root. For $n \geq 0$ let $T_n = \{x \in T^d : \mathrm{dist}(x, \rho) = n\}$. It is easy to check that $|T_0| = 1$ and $|T_n| = d(d-1)^{n-1}$ for $n \geq 1$.

For any $x, y \in T_n$ there exists a graph automorphism $\varphi : T^d \to T^d$ that maps $\varphi(x) = y$ and fixes each level $T_n$; i.e. $\varphi(T_n) = T_n$. Thus, if $v_n$ is a unit voltage imposed on $\rho$ and $T_n$, we have that $v_n$ is constant on $T_k$ for $k \leq n$. Thus, all vertices in each level $T_k$ can be shorted into one vertex, without changing the effective resistance $R_{\mathrm{eff}}(\rho, T_n)$. This gives us a network whose vertices are $\{0, 1, \ldots, n\}$, with resistances $r(k, k+1) = |T_{k+1}|^{-1}$. Thus, the effective resistance is
$$R_{\mathrm{eff}}(\rho, T_n) = \frac{1}{d} \sum_{k=1}^{n} (d-1)^{-(k-1)} = \frac{d-1}{d(d-2)} \left( 1 - (d-1)^{-n} \right).$$
Thus, $R_{\mathrm{eff}}(\rho, \infty) = \frac{d-1}{d(d-2)} < \infty$, so $T^d$ is transient for $d > 2$.
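The shorted-levels computation can be reproduced numerically (a small sketch of mine; the function name is an illustration):

```python
def reff_tree(d, n):
    """R_eff(root, T_n) on the d-regular tree after shorting each level:
    levels k and k+1 are joined by |T_{k+1}| = d(d-1)^k parallel unit edges,
    i.e. by a single resistor of resistance 1 / (d (d-1)^k)."""
    return sum(1.0 / (d * (d - 1) ** k) for k in range(n))

d = 3
limit = (d - 1) / (d * (d - 2))   # claimed R_eff(root, infinity) = 2/3 for d=3
reff40 = reff_tree(d, 40)         # partial sums converge quickly to the limit
```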
16.2.1. A computational proof. We now give a computational proof that the random walk on $T^d$ is transient for $d > 2$.

Let $(X_t)_t$ be the random walk on $T^d$, and consider the following sequences: $D_t := \mathrm{dist}(X_t, \rho)$ and $M_t = (d-1)^{-D_t}$. Let $T_j$ be the first time $X_t \in T_j$.

First, note that $(M_t)_t$ is a martingale, and optional stopping gives
$$(d-1)^{-\mathrm{dist}(x, \rho)} = \mathbb{E}_x[M_{T_0 \wedge T_n}] = \mathbb{P}_x[T_0 < T_n] + \mathbb{P}_x[T_0 > T_n] \, (d-1)^{-n},$$
so for $x \in T_1$,
$$\mathbb{P}_x[T_0 > T_n] = \frac{1 - (d-1)^{-1}}{1 - (d-1)^{-n}} = \frac{(d-2)(d-1)^{n-1}}{(d-1)^n - 1}.$$
Also, $v(x) = \mathbb{P}_x[T_0 < T_n]$ is a unit voltage on $\rho$ and $T_n$. Thus,
$$\Delta v(\rho) = \frac{1}{d} \sum_{x \in T_1} \mathbb{P}_x[T_0 > T_n] = \frac{d-2}{d-1} \cdot \frac{1}{1 - (d-1)^{-n}}.$$
So
$$C_{\mathrm{eff}}(\rho, T_n) = d \, \Delta v(\rho) = \frac{d(d-2)}{d-1} \cdot \frac{1}{1 - (d-1)^{-n}},$$
which tends to $\frac{d(d-2)}{d-1} > 0$; so $T^d$ is transient for $d > 2$.
Similarly for $x = o$: there is one more edge exiting $o$ than edges entering $o$, so we have $\mathrm{div} F(o) = \frac{2}{c_o}$.

We conclude:

Conjecture 16.4. Let $G$ be a transitive graph. If the simple random walk on $G$ is transient, then there exists a measure $\mu$ on infinite paths started from some fixed $o \in G$ such that for two independent paths $\gamma, \gamma'$ of law $\mu$, there exists $\epsilon > 0$ with
$$\mathbb{E}\big[ e^{\epsilon |\gamma \cap \gamma'|} \big] < \infty.$$
Lecture 17
Let $(G, c)$ be a network. Recall that the transition matrix $P$ is an operator on $C^0(V)$ that operates by $P f(x) = \sum_y P(x, y) f(y)$. Also, recall that the space $L^2(V)$ is the space of functions $f \in C^0(V)$ that admit
$$\langle f, f \rangle = \sum_x c_x f(x)^2 < \infty.$$
One can easily check that $P : L^2(V) \to L^2(V)$. Also, $P$ is a self-adjoint operator, and its norm admits $\|P\| \leq 1$ (such an operator is called a contraction).
Proposition 17.1. Let $(G, c)$ be a weighted graph with transition matrix $P$. The quantity
$$\limsup_{n \to \infty} \left( P^n(x, y) \right)^{1/n}$$
does not depend on the choice of $x, y$.

Proof (end). Exchanging the roles of $x, y$ and $z, w$, we get that the $\limsup$ does not depend on the choice of $x, y$. $\square$

Definition 17.2 (Spectral Radius). Let $(G, c)$ be a weighted graph with transition matrix $P$. Define the spectral radius of $(G, c)$ to be
$$\rho = \rho(P) := \limsup_{n \to \infty} \left( P^n(x, y) \right)^{1/n}.$$

One of the reasons for the name spectral radius is that, by the Cauchy-Hadamard criterion, the generating function for the Green function has radius of convergence $\rho^{-1}$. That is, the function
$$g(x, y \mid z) = \sum_{n=0}^{\infty} P^n(x, y) z^n$$
converges for $|z| < \rho^{-1}$.
Proposition 17.3. Let $(G, c)$ be a weighted graph with transition matrix $P$. Then $\|P\| = \rho(P)$. Moreover, for any $x, y$,
$$P^n(x, y) \leq \sqrt{\frac{c_y}{c_x}} \; \rho(P)^n.$$

Thus, $\limsup_n \|P^n f\|^{1/n} \leq \rho(P) + \epsilon$ for any $\epsilon$, and so $\limsup_n \|P^n f\|^{1/n} \leq \rho(P)$.

Now, consider the sequence $a_n = \|P^n f\|$. We have that $b_n := \frac{a_{n+1}}{a_n}$ is a non-decreasing sequence. Thus, the following limits exist and satisfy
$$\frac{\|P f\|}{\|f\|} = b_0 \leq \sup_n b_n = \lim_n a_n^{1/n} \leq \rho(P).$$

Thus, setting $g = f \mathbf{1}_S$, we have that $\|f - g\|^2 < \epsilon^2$. Now, since $g$ is finitely supported, and since $\|g\| \leq \|f\|$,
$$\|P f\| \leq \|P(f - g)\| + \|P g\| \leq \|P\| \, \epsilon + \rho(P) \|f\|.$$
Taking $\epsilon \to 0$, $\|P f\| \leq \rho(P) \|f\|$. Since this holds for all $f$, we get that $\|P\| \leq \rho(P)$. $\square$
Exercise 17.1. Let $(G, c)$ be a weighted graph with transition matrix $P$. Let $\rho(P)$ be the spectral radius. Show that if $G$ is recurrent then $\rho(P) = 1$.
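The spectral radius of the $d$-regular tree can be estimated numerically from the definition $\rho = \limsup_n P^n(x, x)^{1/n}$; for $T^d$ the value $2\sqrt{d-1}/d$ is a classical result of Kesten. A sketch of mine (the function name is an illustration), using the distance-from-root birth-death chain of the tree walk:

```python
def rho_estimate(d, n):
    """Estimate rho(T_d) as P^{2n}(o, o)^{1/(2n)}, computed exactly via the
    distance-from-root chain: from 0 go to 1; from k >= 1 go up with
    probability (d-1)/d and down with probability 1/d."""
    probs = {0: 1.0}
    for _ in range(2 * n):
        nxt = {}
        for k, p in probs.items():
            if k == 0:
                nxt[1] = nxt.get(1, 0.0) + p
            else:
                nxt[k + 1] = nxt.get(k + 1, 0.0) + p * (d - 1) / d
                nxt[k - 1] = nxt.get(k - 1, 0.0) + p / d
        probs = nxt
    return probs[0] ** (1.0 / (2 * n))

d = 3
est = rho_estimate(d, 500)
exact = 2 * (d - 1) ** 0.5 / d     # Kesten: rho(T_d) = 2 sqrt(d-1) / d
```

The convergence is slow (polynomial corrections to $\rho^n$), so the estimate is only close, not exact.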
17.1.1. Energy minimization. Let $(G, c)$ be a weighted graph. Consider the functions on $G$ with finite support, i.e. $L_0(V)$. These all have finite energy. We want to find the function that minimizes the energy, when normalized to have length $1$:
$$\lambda_1(G) = \inf_{0 \neq f \in L_0(V)} \frac{\mathcal{E}(\nabla f)}{2 \langle f, f \rangle}.$$
(Sometimes $\lambda_1$ is called the spectral gap; it is the minimal possible energy of unit-length functions.) Note that
$$\tfrac{1}{2} \, \mathcal{E}(\nabla f) = \langle \Delta f, f \rangle = \langle f, f \rangle - \langle P f, f \rangle.$$
For a graph $G$, we are interested in how small the boundary of a set can be, compared to the volume of that set. Such sets serve as bottlenecks in the graph: a random walk can get stuck inside them for a while. Thus, it makes sense to define the following.

Definition 17.5. Let $(G, c)$ be a weighted graph. Let $S \subset G$ be a finite subset. Define the (edge) boundary of $S$ to be
$$\partial S = \{(x, y) \in E(G) : x \in S, \ y \notin S\},$$
and define the isoperimetric constant of $(G, c)$ by
$$\Phi(G, c) = \inf_{S \text{ finite}} \frac{c(\partial S)}{c(S)}.$$
Here $c(\partial S) = \sum_{e \in \partial S} c(e)$ and $c(S) = \sum_{x \in S} c_x$.

Of course $\lambda_1(G) \geq 0$ for any graph. When $\Phi(G) > 0$, we have that sets expand: the edges coming out of a set carry a constant proportion of the weight of the set.
Definition 17.6. Let $(G, c)$ be a weighted graph. If $\Phi(G, c) = 0$ we say that $(G, c)$ is amenable. Otherwise we call $(G, c)$ non-amenable.

A sequence of finite connected sets $(S_n)_n$ such that $c(\partial S_n)/c(S_n) \to 0$ is called a Følner sequence, and the sets are called Følner sets.

[margin: Erling Følner (1919-1991)]

The concept of amenability was introduced by von Neumann in the context of groups and the Banach-Tarski paradox. Følner's criterion using boundaries of sets provided the ability to carry over the concept of amenability to other geometric objects such as graphs.

The isoperimetric constant is a geometric object. It turns out that positivity of the isoperimetric constant is equivalent to the spectral radius being strictly less than $1$.
Exercise 17.2. Let $S \subset T^d$ be a finite connected subset, with $|S| \geq 2$. Show that
$$|\partial S| = |S|(d - 2) + 2.$$
Deduce that $\Phi(T^d) = \frac{d-2}{d}$.

[margin: John von Neumann (1903-1957)]
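The boundary formula can be verified for concrete connected sets, for instance the balls $B_r$ in $T^d$ (a sketch of mine, not part of the exercise):

```python
# Check |dS| = |S|(d-2) + 2 for the balls B_r in the d-regular tree:
# |B_r| = 1 + sum_{k=1}^r d(d-1)^{k-1}, and the edges leaving B_r are
# exactly those into level r+1, so |dB_r| = d(d-1)^r.
d = 3
for r in range(10):
    size = 1 + sum(d * (d - 1) ** (k - 1) for k in range(1, r + 1))  # |B_r|
    out_edges = d * (d - 1) ** r                                     # |dB_r|
    assert out_edges == size * (d - 2) + 2
# Hence |dB_r| / (d |B_r|) -> (d-2)/d, matching Phi(T_d) = (d-2)/d.
```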
Lecture 18
Kesten, in his PhD thesis in 1959, proved the connection between amenability and spectral radius strictly less than $1$. This was subsequently generalized to more general settings by others (including Cheeger, Dodziuk, Mohar).

Theorem 18.1. A weighted graph $(G, c)$ is amenable if and only if $\rho(G, c) = 1$. In fact,
$$\frac{\Phi^2}{2} \leq 1 - \sqrt{1 - \Phi^2} \leq 1 - \rho \leq \Phi.$$
First we require:

Lemma 18.2. Let $(G, c)$ be a weighted graph. For any $f \in L_0(V)$ (that is, with finite support),
$$2 \Phi(G, c) \sum_x c_x f(x) \leq \sum_{x, y} |\nabla f(x, y)|.$$

Proof (fragment). Also,
$$\int_0^\infty \mathbf{1}_{\{f(x) > t \geq f(y)\}} \, dt = |f(x) - f(y)| \, \mathbf{1}_{\{f(x) \geq f(y)\}},$$
where we have used the fact that all sums are finite because $f$ has finite support. $\square$
p
Proof of Theorem 18.1. The leftmost inequality is just 2 /2 1 1 2 , valid for any
[0, 1].
The rightmost inequality follows by taking a sequence of finite connected sets (Sn )n such that
= limn c(Sn )/c(Sn ). Since
First, for f L0 (V ),
X X X
2hf, f i + 2hP f, f i = c(x, y)f (x)2 + c(x, y)f (y)2 + 2 c(x, y)f (x)f (y)
x,y x,y x,y
X
= c(x, y)(f (x) + f (y))2
x,y
For g = f 2 ,
X X
hf, f i = cx g(x) (2)1 c(x, y)|g(x) g(y)|
x x,y
X
= (2)1 c(x, y)|f (x) f (y)| |f (x) + f (y)|.
x,y
Applying Cauchy-Schwarz,
X X
42 hf, f i2 c(x, y)(f (x) f (y))2 c(x, y)(f (x) + f (y))2
x,y x,y
Example 18.3. Let us calculate $\rho(T^d)$, the spectral radius of the $d$-regular tree.

Let $r$ be the root of $T^d$, and let $T_n = \{x : \mathrm{dist}(x, r) = n\}$.

For one direction, consider the function
$$f_n(x) = \sum_{k=1}^{n} (d-1)^{-k/2} \mathbf{1}_{\{x \in T_k\}} = \mathbf{1}_{\{1 \leq \mathrm{dist}(x, r) \leq n\}} \, (d-1)^{-\mathrm{dist}(x, r)/2}.$$
If $x \sim y$ then $c(x, y) f_n(x) f_n(y) = (d-1)^{-(\mathrm{dist}(x, r) + \mathrm{dist}(y, r))/2}$ if $1 \leq \mathrm{dist}(x, r), \mathrm{dist}(y, r) \leq n$, and $0$ otherwise. Thus, since $|T_k| = d(d-1)^{k-1}$,
$$\|f_n\|^2 = \sum_{x : 1 \leq \mathrm{dist}(x, r) \leq n} c_x (d-1)^{-\mathrm{dist}(x, r)} = \sum_{k=1}^{n} d(d-1)^{k-1} \cdot d \cdot (d-1)^{-k} = d^2 (d-1)^{-1} \, n.$$
Similarly,
$$P f_n(x) = \begin{cases} \dfrac{2(d-1)^{1/2}}{d} \, (d-1)^{-\mathrm{dist}(x, r)/2} & 2 \leq \mathrm{dist}(x, r) \leq n - 1, \\[2mm] \dfrac{(d-1)^{1/2}}{d} \, (d-1)^{-\mathrm{dist}(x, r)/2} & \mathrm{dist}(x, r) \in \{1, n\}, \\[2mm] (d-1)^{-1/2} & x = r, \\[2mm] \dfrac{1}{d} \, (d-1)^{-n/2} & \mathrm{dist}(x, r) = n + 1. \end{cases}$$
Lecture 19
Let $(G, c)$ be a weighted graph and let $(X_t)_t$ be the corresponding weighted random walk. In the exercises one shows that the limit
$$\lim_{t \to \infty} \frac{\mathbb{E}_x[\mathrm{dist}(X_t, X_0)]}{t}$$
exists for transitive graphs, and is independent of the choice of starting vertex $x$. We call this limit the speed of the random walk. For general graphs this limit may not exist, so we consider the $\liminf$ and $\limsup$ of the sequence. Of course, these limits lie between $0$ and $1$.

Definition 19.1. Let $(G, c)$ be a weighted graph and let $(X_t)_t$ be the corresponding weighted random walk. Fix some $o \in G$. The lower speed and upper speed are defined to be
$$\liminf_{t \to \infty} \frac{\mathbb{E}_o[\mathrm{dist}(X_t, X_0)]}{t} \quad \text{and} \quad \limsup_{t \to \infty} \frac{\mathbb{E}_o[\mathrm{dist}(X_t, X_0)]}{t}.$$
If these limits coincide, we call the corresponding limit the speed.
For the $d$-regular tree, write $L_t = \sum_{k=0}^{t} \mathbf{1}_{\{X_k = o\}}$ and $M_t = \mathrm{dist}(X_t, o) - \frac{d-2}{d} \, t - \frac{2}{d} L_{t-1}$. Then
$$\mathbb{E}_o[\mathrm{dist}(X_{t+1}, o) \mid \mathcal{F}_t] = \mathrm{dist}(X_t, o) + \frac{d-2}{d} + \mathbf{1}_{\{X_t = o\}} \left( 1 - \frac{d-2}{d} \right) = \mathrm{dist}(X_t, o) + \frac{d-2}{d} + \frac{2}{d} \, \mathbf{1}_{\{X_t = o\}},$$
where we have used that $\mathrm{dist}(X_t, o) \mathbf{1}_{\{X_t = o\}} = 0$. Thus,
$$\mathbb{E}_o[M_{t+1} \mid \mathcal{F}_t] = \mathrm{dist}(X_t, o) + \frac{d-2}{d} + \frac{2}{d} \, \mathbf{1}_{\{X_t = o\}} - \frac{d-2}{d}(t+1) - \frac{2}{d} L_t = \mathrm{dist}(X_t, o) - \frac{d-2}{d} \, t - \frac{2}{d} L_{t-1} = M_t.$$
It is not a coincidence that $T^d$ has positive speed. In fact, this has to do with the fact that $\rho(T^d) < 1$, that is, that $T^d$ is non-amenable.

Theorem 19.3. Let $(G, c)$ be a weighted graph, and let $(X_t)_t$ be the corresponding random walk started at some $o \in G$. Assume the following:
- $\rho(G, c) < 1$;
- there exists $M > 0$ such that $c_x \leq M$ for all $x$ (i.e. $c_x$ is uniformly bounded);
- $b := \limsup_r |B(o, r)|^{1/r} < \infty$.

Then the walk has positive lower speed.

Proof. Since $b < \infty$, there exists some universal constant $K > 0$ such that $|B(o, r)| \leq K^r$ for all $r$. Because $c_x$ is uniformly bounded, we have that $K$ can be chosen large enough so that $K > \sqrt{M / c_o}$. Thus, for all $x$ and all $t$, $P^t(o, x) \leq K \rho^t$. Combining these two bounds we get that
$$\mathbb{P}[\mathrm{dist}(X_t, o) \leq \epsilon t] \leq \sum_{x \in B(o, \epsilon t)} P^t(o, x) \leq K^{1 + \epsilon t} \rho^t.$$
Since $K^\epsilon \rho < 1$ for $\epsilon$ small enough, we get that these probabilities are summable. By Borel-Cantelli, a.s. $\mathrm{dist}(X_t, o) > \epsilon t$ for all large enough $t$. Taking $\epsilon \uparrow -\frac{\log \rho}{\log b}$ completes the proof. $\square$

By Fatou's lemma,
$$-\frac{\log \rho}{\log b} \leq \mathbb{E}_o\left[ \liminf_t t^{-1} \, \mathrm{dist}(X_t, o) \right] \leq \liminf_t t^{-1} \, \mathbb{E}_o[\mathrm{dist}(X_t, o)].$$
So, non-amenable graphs have positive (lower) speed.
Example 19.4. For all $d$, the random walk on $\mathbb{Z}^d$ has zero speed. In fact, we show that for a random walk $(X_t)_t$ on $\mathbb{Z}^d$, $\mathbb{E}_0[\mathrm{dist}(X_t, 0)] \asymp t^{1/2}$.

Consider the $j$-th coordinate $X_t(j)$. Then
$$\mathbb{E}_0[X_{t+1}(j)^2 \mid \mathcal{F}_t] = \frac{1}{2d}(X_t(j) + 1)^2 + \frac{1}{2d}(X_t(j) - 1)^2 + \left( 1 - \frac{1}{d} \right) X_t(j)^2 = X_t(j)^2 + \frac{1}{d}.$$
Thus, $M_t = X_t(j)^2 - \frac{t}{d}$ is a martingale, and $0 = \mathbb{E}_0[M_t] = \mathbb{E}_0[X_t(j)^2] - \frac{t}{d}$. So $\mathbb{E}_0[|X_t(j)|] \leq \sqrt{t/d}$, and
$$\mathbb{E}_0[\mathrm{dist}(X_t, 0)] \leq \sum_{j=1}^{d} \mathbb{E}_0[|X_t(j)|] \leq \sqrt{d t}.$$
Also, note that we can write
$$X_t(j) = \sum_{k=1}^{t} \xi_k,$$
where $(\xi_k)_k$ are i.i.d. random variables with $\mathbb{P}[\xi_k = 1] = \mathbb{P}[\xi_k = -1] = \frac{1}{2d}$ and $\mathbb{P}[\xi_k = 0] = 1 - \frac{1}{d}$. Since $\mathbb{E}[\xi_k] = 0$ and $\mathrm{Var}[\xi_k] = \mathbb{E}[\xi_k^2] = \frac{1}{d}$, we get by the central limit theorem that $\sqrt{\frac{d}{t}} \, X_t(j)$ converges in distribution to a standard normal random variable $N(0, 1)$. So
$$\lim_t \mathbb{P}_0\left[ \sqrt{d} \, |X_t(j)| \geq \tfrac{1}{2} \sqrt{t} \right] = \mathbb{P}\left[ |N(0, 1)| \geq \tfrac{1}{2} \right] =: c > 0.$$
Thus,
$$\liminf_t \frac{1}{\sqrt{t}} \, \mathbb{E}_0[|X_t(j)|] \geq \liminf_t \frac{1}{\sqrt{t}} \cdot \frac{1}{2} \sqrt{\frac{t}{d}} \; \mathbb{P}_0\left[ |X_t(j)| \geq \tfrac{1}{2} \sqrt{t/d} \right] = \frac{c}{2\sqrt{d}},$$
and so
$$\liminf_t \frac{1}{\sqrt{t}} \, \mathbb{E}_0[\mathrm{dist}(X_t, 0)] \geq \frac{c}{2} \sqrt{d}.$$
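The diffusive scaling $\mathbb{E}_0[\mathrm{dist}(X_t, 0)] \asymp \sqrt{t}$ can be seen in a quick Monte Carlo experiment (a sketch of mine; function name and sample sizes are illustrative):

```python
import random

def mean_dist(d, t, trials=2000, seed=1):
    """Monte Carlo estimate of E_0[dist(X_t, 0)] (L^1 distance) for the
    simple random walk on Z^d."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = [0] * d
        for _ in range(t):
            j = rng.randrange(d)             # pick a coordinate...
            x[j] += rng.choice((-1, 1))      # ...and move +-1 along it
        total += sum(abs(c) for c in x)
    return total / trials

# sqrt(t) growth: multiplying t by 4 should roughly double the mean distance.
m1, m2 = mean_dist(2, 100), mean_dist(2, 400)
```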
Since many interesting graphs have zero speed, we are sometimes interested in a bit more precision.

Definition 19.5. Let $(G, c)$ be a weighted graph and let $(X_t)_t$ be the corresponding weighted random walk. Fix some $o \in G$. The lower escape exponent and upper escape exponent are defined to be
$$\liminf_{t \to \infty} \frac{\log \mathbb{E}_o[\mathrm{dist}(X_t, X_0)]}{\log t} \quad \text{and} \quad \limsup_{t \to \infty} \frac{\log \mathbb{E}_o[\mathrm{dist}(X_t, X_0)]}{\log t}.$$
If these limits coincide, we call the corresponding limit the escape exponent.
Example 19.6. $T^d$ has escape exponent $1$. In fact any graph with positive speed has escape exponent $1$. (This is immediate from $\log \mathbb{E}_o[\mathrm{dist}(X_t, X_0)] = \log\big( \frac{1}{t} \mathbb{E}_o[\mathrm{dist}(X_t, X_0)] \big) + \log t$.)

Escape exponent $1/2$ plays an important role in the theory. Walks with escape exponent $1/2$ are called diffusive. Walks with escape exponent $< 1/2$ (resp. $> 1/2$) are called sub-diffusive (resp. super-diffusive). Walks with escape exponent $1$ (i.e. positive speed) are called ballistic.
Proof. First, consider a lazy walk $(Y_t)_t$ on $G$ with holding probability $p$.

Now, let $(X_t)_t$ be a random walk on $G^d$ and let $X_t(j)$ be the $j$-th coordinate of $X_t$. Note that $(X_t(j))_t$ is a lazy random walk on $G$ with holding probability $1 - \frac{1}{d}$. Then,
$$\mathrm{dist}(X_t, X_0) = \sum_j \mathrm{dist}(X_t(j), X_0(j)).$$
The transition matrix for $(Y_t)_t$ is $Q = pI + (1 - p)P$. Let $f(x) = \mathrm{dist}(x, o)$. We have that
$$Q^t = \sum_{k=0}^{t} \binom{t}{k} (1 - p)^k p^{t-k} P^k,$$
so
$$\mathbb{E}_o[\mathrm{dist}(Y_t, o)] = \sum_x Q^t(o, x) \, \mathrm{dist}(x, o) = (Q^t f)(o) = \sum_{k=0}^{t} \binom{t}{k} (1 - p)^k p^{t-k} (P^k f)(o) = \sum_{k=0}^{t} \binom{t}{k} (1 - p)^k p^{t-k} \, \mathbb{E}_o[\mathrm{dist}(X_k, o)].$$
Now, for any $\epsilon > 0$ there exists $K_\epsilon$ such that for all $k > K_\epsilon$,
$$k^{\alpha - \epsilon} \leq \mathbb{E}_o[\mathrm{dist}(X_k, o)] \leq k^{\alpha + \epsilon}.$$
Let $B_t \sim \mathrm{Bin}(t, 1 - p)$, and let $q_k = \binom{t}{k}(1 - p)^k p^{t-k} = \mathbb{P}[B_t = k]$. By Chebyshev's inequality,
$$\mathbb{P}\left[ |B_t - (1 - p)t| > \tfrac{1}{2}(1 - p)t \right] \leq \frac{4 \, \mathrm{Var}[B_t]}{(1 - p)^2 t^2} = \frac{4p}{(1 - p)t},$$
so
$$\mathbb{P}\left[ B_t \geq \tfrac{1}{2}(1 - p)t \right] \geq 1 - \frac{4p}{(1 - p)t} \to 1.$$
Hence, for $\epsilon > 0$, for all large enough $t$ (so that $(1 - p)t > 2 K_\epsilon$),
$$\sum_{k=0}^{t} q_k \, \mathbb{E}_o[\mathrm{dist}(X_k, o)] \geq \mathbb{P}\left[ B_t \geq \tfrac{1}{2}(1 - p)t \right] \cdot \left( \tfrac{1}{2}(1 - p)t \right)^{\alpha - \epsilon},$$
Lecture 20
We have already seen that non-amenable graphs must have positive speed, and so escape exponent $1$. Non-amenable graphs are also transient, because their spectral radius is strictly less than $1$. The converses of these statements do not hold.

Figure 5 sums up the situation (for graphs) in terms of speed, amenability and transience.

[Figure 5: a diagram partitioning examples by transient/recurrent, amenable/non-amenable, and positive/zero speed; the examples placed in it include $T^d$, $LL(\mathbb{Z}^3)$, $LL(\mathbb{Z})$, $\mathbb{Z}^3$, a sub-diffusive grove $\Gamma = \Gamma((\lceil \log_2 k \rceil)_k)$, and a graph labeled "(Ackermann)".]
We will now construct a special class of graphs called lamp-lighter graphs. These are used
to give many examples in geometric group theory. They will provide us with examples of
(exponential volume growth) amenable graphs with positive speed.
Let us describe the construction in words, before the formal definition. We start with any
graph G (finite or infinite). This is the base graph. Suppose some lamp-lighter walks around
on the graph G. At every site of G there is some lamp, whose state is either on or off. The
lamp-lighter walks around and can also change the state of the lamp at her current position -
changing it either to on or to off.
What is a position in this new space? A position consists of the configuration of all lamps on
G, that is, a function from G to {0, 1} and the position of the lamp-lighter, i.e. a vertex in G.
Definition 20.1 (Lamp-Lighter Chain). Let $P$ be a Markov chain on state space $S$. We define the Markov chain $LL(P)$, called the lamp-lighter on $P$, as follows.

The state space for $LL(P)$ is $LL(S) := S \times (\{0, 1\}^S)_c$, where $(\{0, 1\}^S)_c$ is the set of $\omega : S \to \{0, 1\}$ with finite support (i.e. $\omega^{-1}(1)$ is finite). For a state $(x, \omega) \in LL(S)$, we call $x$ the position of the lamp-lighter. If $\omega(y) = 1$ we say the lamp at $y$ is on, and if $\omega(y) = 0$ we say it is off.

For a lamp configuration $\omega \in (\{0, 1\}^S)_c$ and a position $x \in S$ we define $\omega^x \in \{0, 1\}^S$ by $\omega^x(y) = \omega(y)$ for all $y \neq x$ and $\omega^x(x) = \omega(x) + 1 \pmod 2$.

Define the transition matrix $LL(P)$ by setting
$$LL(P)\big( (x, \omega), (y, \eta) \big) = \frac{1}{4} P(x, y)$$
for $\eta \in \{\omega, \omega^x, \omega^y, (\omega^x)^y\}$, and $0$ otherwise.

If $(G, c)$ is a weighted graph, then $LL(G) = LL(P)$ for $P$ the weighted random walk on $(G, c)$.
Note that the chain $LL(P)$ evolves as follows: at each step, the lamp-lighter chooses a neighbor $y$ of her current position $x$ with distribution $P(x, \cdot)$ and moves there; then she refreshes the state of the lamps at the old position and at the new position, setting each to on or off with probability $1/2$ each, independently.

Remark 20.2. If $G$ is a graph, then $LL(G)$ defines a graph structure as well: $LL(P)((x, \omega), (y, \eta)) > 0$ if and only if $P(x, y) > 0$ and $\eta \in \{\omega, \omega^x, \omega^y, (\omega^x)^y\}$. So the graph structure on $LL(G)$ is given by the relations $(x, \omega) \sim (y, \eta)$ for $\eta \in \{\omega, \omega^x, \omega^y, (\omega^x)^y\}$.

In fact:
Exercise 20.1. Suppose that $(G, c)$ is a weighted graph, and $P$ is the transition matrix of the weighted random walk on $G$. Show that $LL(P)$ is given by a weighted random walk on a weighted graph whose vertices are the pairs $(x, \omega)$, $x \in G$, $\omega \in (\{0, 1\}^G)_c$. What is the weight function on this graph?
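The move-then-refresh dynamics above can be sketched in a few lines for the base graph $\mathbb{Z}$ (my own illustration; the helper name `ll_step` is hypothetical). Lamp configurations are kept as finitely supported dictionaries:

```python
import random

def ll_step(state, rng):
    """One step of the lamp-lighter walk over Z: move the walker to a uniform
    neighbor, then refresh the lamps at the old and the new position
    independently with a fair coin."""
    x, lamps = state
    y = x + rng.choice((-1, 1))
    lamps = dict(lamps)              # configurations are finitely supported
    for z in (x, y):
        if rng.random() < 0.5:
            lamps[z] = 1             # lamp on
        else:
            lamps.pop(z, None)       # lamp off: drop it from the support
    return y, lamps

rng = random.Random(0)
state = (0, {})                      # o = (0, all lamps off)
for _ in range(1000):
    state = ll_step(state, rng)
x, lamps = state                     # lamps on can only be at visited sites
```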
Exercise 20.3. Let $G$ be a graph, and let $L = LL(G)$. Let $o \in G$ and let $\mathbf{0} \in \{0, 1\}^G$ denote the all-zero function (configuration). Then, for any $(x, \omega) \in L$,
The next example is an amenable graph (of exponential volume growth), but with positive speed: take $L = LL(\mathbb{Z}^3)$, with $(B_r)_r$ the balls in $\mathbb{Z}^3$, and set
$$A_r = \left\{ (x, \omega) \in L : x \in B_r, \ \omega^{-1}(1) \subseteq B_r \right\}.$$
Note that $|A_r| = |B_r| \, 2^{|B_r|}$. Also, $((x, \omega), (y, \eta)) \in \partial A_r$ if and only if $(x, y) \in \partial B_r$ and $\eta \in \{\omega, \omega^y, \omega^x, (\omega^x)^y\}$. Thus, $|\partial A_r| = 4 |\partial B_r| \, 2^{|B_r|}$. Thus, since the degree in $L$ is $12$,
$$\Phi(L) \leq \inf_r \frac{|\partial A_r|}{12 |A_r|} = \inf_r \frac{4 |\partial B_r|}{12 |B_r|} = 0,$$
and so $L$ is amenable.

Next we show that $L$ has positive speed. Let $\mathbf{0}$ denote the all-$0$ lamp configuration, and let $o = (0, \mathbf{0}) \in L$. Let $(X_t, \omega_t)$ be a random walk on $L$. We claim that for any $z \neq 0$,
$$(20.1) \qquad \mathbb{P}_o[\omega_t(z) = 1] = \frac{1}{2} \, \mathbb{P}_o[T_z^+ \leq t],$$
where $R_t = \{X_1, \ldots, X_t\}$ is the range of the walk. Since $(X_t)_t$ is a random walk on $\mathbb{Z}^3$, we are left with showing that $\lim_t t^{-1} \, \mathbb{E}_0^{\mathbb{Z}^3}[|R_t|] > 0$. In fact, using Exercise 20.4 below,
$$\frac{\mathbb{E}_0^{\mathbb{Z}^3}[|R_t|]}{t} \to \mathbb{P}_0^{\mathbb{Z}^3}[T_0^+ = \infty] > 0.$$

We turn to proving (20.1). Let $(y_0, \omega_0), \ldots, (y_n, \omega_n)$ be a path in $L$. Let $T = \inf\{t : y_t = z\}$ (where $\inf \emptyset = \infty$). Define a new path by flipping, in every configuration $\omega_T, \ldots, \omega_n$, the value of the lamp at $z$.

Since $L$ is a regular graph, both paths have the same probability. Summing over all possible paths, we get that for any $k \leq t$,
$$\mathbb{P}_o[\omega_t(z) = 1, \ T_z^+ = k] = \mathbb{P}_o[\omega_t(z) = 0, \ T_z^+ = k].$$
So
$$\mathbb{P}_o[\omega_t(z) = 1 \mid T_z^+ = k] = \mathbb{P}_o[\omega_t(z) = 0 \mid T_z^+ = k] = \frac{1}{2}.$$
Thus,
$$\mathbb{P}_o[\omega_t(z) = 1] = \sum_{k=1}^{t} \mathbb{P}_o[\omega_t(z) = 1, \ T_z^+ = k] = \frac{1}{2} \sum_{k=1}^{t} \mathbb{P}_o[T_z^+ = k] = \frac{1}{2} \, \mathbb{P}_o[T_z^+ \leq t].$$
Reflecting around $0$ (the reflection principle), for $M_t = \max_{k \leq t} X_k$,
$$\mathbb{P}_0[M_t \geq m] \leq 2 \, \mathbb{P}_0[X_t \geq m].$$
So, by symmetry of $X_t$, we conclude with
$$2 \, \mathbb{E}[|X_t|] = 2 \sum_{x=0}^{\infty} \mathbb{P}_0[|X_t| \geq x + 1] = 2 \sum_{x=0}^{\infty} \mathbb{P}_0[X_t \geq x + 1] + 2 \sum_{x=0}^{\infty} \mathbb{P}_0[X_t \leq -(x + 1)] \geq \sum_{x=0}^{\infty} \mathbb{P}_0[M_t \geq x + 1] = \mathbb{E}_0[M_t].$$
So $\mathbb{E}_0[M_t] \leq 2 \, \mathbb{E}[|X_t|] \leq 2\sqrt{t}$. Thus,
$$\mathbb{E}_o[\mathrm{dist}(X_t, o)] \leq 8 \sqrt{t}.$$
Lecture 21
Our next goal is to complete the picture in Figure 5; that is, to give examples of graphs that are transient, but have very slow speed (sub-diffusive), and examples of graphs that are recurrent but have positive upper speed.
Let (X_t)_t be a random walk on Z. We know (using the martingale |X_t|^2 - t) that E_0[T_{{-r,r}}] = r^2. That is, it takes a random walk r^2 steps to reach distance r. We have already seen that this implies diffusive behavior of the walk.

Let us prove a short concentration result, showing that actually T_{{-r,r}} is close to r^2 with very high probability.
Theorem 21.1 (Azuma's Inequality). Let (M_t)_t be a (F_t)_t-martingale with bounded increments (i.e. |M_{t+1} - M_t| ≤ 1 a.s.). Then for any ε > 0,

    P[M_t - M_0 ≥ ε] ≤ exp(-ε^2 / (2t)).
Proof. There are two main ideas.

The first idea is that for a random variable X with E[X] = 0 and |X| ≤ 1 a.s. one has E[e^{λX}] ≤ e^{λ^2/2}. Indeed, f(x) = e^{λx} is a convex function, so for |x| ≤ 1 we can write x = α · 1 + (1 - α) · (-1), where α = (x+1)/2, so

    e^{λx} ≤ α e^λ + (1 - α) e^{-λ} = cosh(λ) + x sinh(λ).

Taking expectations, and using cosh(λ) ≤ e^{λ^2/2}, we get E[e^{λX}] ≤ cosh(λ) ≤ e^{λ^2/2}.

For the second idea, due to Sergei Bernstein (1880-1968), one applies the Chebyshev / Markov inequality to the non-negative random variable e^{λX}, and then optimizes over λ.
In our case: For every t, since E[M_t - M_{t-1} | F_{t-1}] = 0 and |M_t - M_{t-1}| ≤ 1, exactly as above we can show that

    E[e^{λ(M_t - M_{t-1})} | F_{t-1}] ≤ e^{λ^2/2}    a.s.

Thus,

    E[e^{λ(M_t - M_0)}] ≤ e^{λ^2/2} · E[e^{λ(M_{t-1} - M_0)}] ≤ · · · ≤ e^{tλ^2/2}.

Now apply Markov's inequality to the non-negative random variable e^{λ(M_t - M_0)} to get

    P[M_t - M_0 ≥ ε] ≤ e^{-λε} E[e^{λ(M_t - M_0)}] ≤ exp(tλ^2/2 - λε).

Optimizing over λ (taking λ = ε/t) gives the bound exp(-ε^2/(2t)). □
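Azuma's bound can be compared against an empirical tail for the simplest martingale with increments bounded by 1, the ±1 random walk. A Monte Carlo sketch (not part of the notes; parameters are arbitrary):

```python
import random
from math import exp

def azuma_check(t, eps, trials, seed=1):
    """Empirical P[S_t >= eps] for the +-1 walk (a martingale with bounded
    increments) versus the Azuma bound exp(-eps^2 / (2t))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(t))
        if s >= eps:
            hits += 1
    return hits / trials, exp(-eps ** 2 / (2 * t))

emp, bound = azuma_check(t=400, eps=60, trials=3000)
print(emp, bound)  # the empirical tail sits below the bound
```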
Let (d_k)_{k∈N} be a sequence of positive numbers. For each k, let Γ_k be a binary tree of depth d_k. Define the graph Γ((d_k)_k) to be the graph N, with the tree Γ_k glued at the vertex k ∈ N (let the root of Γ_k be k ∈ N); that is, the vertex set of Γ((d_k)_k) is ∪_{k=0}^∞ V(Γ_k). The edges are those in each Γ_k, with the edges k ∼ k+1 for all k added. We call this the (d_k)_k-grove.
[Figure: the (d_k)_k-grove — binary trees of depths d_0, d_1, d_2, d_3, d_4 glued along N at the vertices 0, 1, 2, 3, 4.]
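The grove is easy to build explicitly. A minimal sketch (not part of the notes; the encoding of tree vertices as pairs (k, binary word) is our own choice, with (k, '') identified with the path vertex k):

```python
def grove(depths):
    """Adjacency lists for the (d_k)_k-grove: the path 0-1-2-... with a
    binary tree of depth depths[k] glued at vertex k."""
    adj = {}
    def add_edge(u, v):
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    for k, d in enumerate(depths):
        for level in range(d):
            for w in range(2 ** level):
                word = format(w, 'b').zfill(level) if level else ''
                add_edge((k, word), (k, word + '0'))
                add_edge((k, word), (k, word + '1'))
        if k + 1 < len(depths):
            add_edge((k, ''), (k + 1, ''))
    return adj

g = grove([2, 2, 2, 2])
# each finite tree of depth d has 2^(d+1) - 1 vertices
assert sum(1 for (k, w) in g if k == 0) == 2 ** 3 - 1
# an interior path vertex has degree 4: two path neighbors plus two children
assert len(g[(1, '')]) == 4
```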
Proof. If v is a unit voltage imposed on 0 and Γ_k, then for any n ≤ k, and any vertex x ∈ Γ_n, we have that v(x) = v(n). Indeed, if (X_t)_t is a random walk on this graph, then because Γ_n is finite, P_x-a.s. the hitting time of n is finite, and also, v(X_t) is a bounded martingale. Thus, by the optional stopping theorem, v(x) = E_x[v(X_{T_n})] = v(n).

Thus, we can short together all vertices in each tree Γ_n, n ≤ k. This results in the network which is just the graph N. Thus, R_eff(0, Γ_n) = n. □
Recall that if Γ is a finite binary tree of depth d, then |E(Γ)| = |V(Γ)| - 1 = Σ_{k=0}^d 2^k - 1 = 2^{d+1} - 2.
Lemma 21.4. Let r ∈ N ⊆ Γ((d_k)_k). The hitting time of r, T_r, has expectation given by

    E_0[T_r] = 4 Σ_{k=0}^{r-1} (r - k) 2^{d_k} - (3/2) r(r + 1).
Proof. Every time the walk is at a vertex k ∈ N, with probability 1/2 it starts a random walk in the finite subtree Γ_k. The expected time to return to the root in a finite tree is the reciprocal of the stationary distribution of the root on that tree; denote this expected excursion time by c_k = 2^{d_k+1} - 2 (Exercise 21.1). Thus, we have

    E_k[T_{k+1}] = 1 + (1/4) E_{k-1}[T_{k+1}] + (1/2) (c_k + E_k[T_{k+1}])
                 = 1 + (1/4) E_{k-1}[T_k] + (1/4) E_k[T_{k+1}] + (1/2) c_k + (1/2) E_k[T_{k+1}].
Similarly, E_0[T_1] = (2/3)(c_0 + E_0[T_1]) + 1/3, so E_0[T_1] = 2c_0 + 1 (recall c_k = 2^{d_k+1} - 2). Thus,

    E_k[T_{k+1}] = 2 Σ_{j=0}^k c_j + k + 1 = Σ_{j=0}^k 2^{d_j+2} - 3(k + 1),
and

    E_0[T_r] = Σ_{k=0}^{r-1} E_k[T_{k+1}] = Σ_{k=0}^{r-1} (r - k) 2^{d_k+2} - 3 Σ_{k=0}^{r-1} (k + 1)
             = Σ_{k=0}^{r-1} (r - k) 2^{d_k+2} - (3/2) r(r + 1). □
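The summation step in the proof is pure algebra and can be checked mechanically; a quick sketch (not part of the notes) verifying that summing E_k[T_{k+1}] reproduces the closed form of Lemma 21.4:

```python
def hit_time_closed(r, depths):
    """Closed form of Lemma 21.4: E_0[T_r] = 4 sum (r-k) 2^{d_k} - (3/2) r(r+1)."""
    return 4 * sum((r - k) * 2 ** depths[k] for k in range(r)) - 3 * r * (r + 1) // 2

def hit_time_stepwise(r, depths):
    """Sum over k < r of E_k[T_{k+1}] = sum_{j<=k} 2^{d_j + 2} - 3(k + 1)."""
    total = 0
    for k in range(r):
        total += sum(2 ** (depths[j] + 2) for j in range(k + 1)) - 3 * (k + 1)
    return total

depths = [1, 3, 2, 5, 4, 6]  # an arbitrary depth sequence
for r in range(1, 7):
    assert hit_time_closed(r, depths) == hit_time_stepwise(r, depths)
```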
Exercise 21.1. Let Γ be a finite binary tree of depth d with root o. Then, E_o[T_o^+] = 2^{d+1} - 2.
The next theorem gives an example of a tree with speed exponent β for any 0 < β ≤ 1/2.

Theorem 21.5. Let α ≥ 0 and d_k = ⌊α log_2(k + 1)⌋. The tree Γ((d_k)_k) has speed exponent (α + 2)^{-1}.
Proof. Let (X_t)_t be a random walk on Γ = Γ((d_k)_k). For x ∈ Γ denote by |x| the number that is the root of the unique finite subtree Γ_k such that x ∈ Γ_k. So |x| ≤ dist(x, 0) ≤ |x| + d_{|x|}. So it suffices to prove that

    lim_t log E_0[|X_t|] / log t = (α + 2)^{-1}.
The lower bound is simpler. Define τ_0 = 0 and, inductively, τ_{n+1} = inf {t > τ_n : X_t ∈ N, X_t ≠ X_{τ_n}}. That is, (τ_n)_n are the subsequent times the random walk moves from a vertex in N to a new vertex in N. For every 0 < k ∈ N,

    P_k[X_{τ_1} = k + 1 | X_{τ_1} ∈ N] = P_k[X_{τ_1} = k - 1 | X_{τ_1} ∈ N] = 1/2.

So the sequence (Z_n = X_{τ_n})_n is a random walk on N.
Now, if the walk is at a vertex k ∈ N, then with probability 1/2 it performs an excursion into the finite subtree Γ_k, and with remaining probability 1/2 it moves in N. Thus, by the exercise above,

    P_0[τ_{n+1} > τ_n + 2^{d_k} | Z_n = k, F_n] ≥ (4e)^{-1}    a.s.
Let x = r, y = 2r, z = 3r. Let N < M be such that τ_N = T_y and M = inf {m > N : Z_m ∈ {x, z}}. For n ≥ N let J_n = 1_{{τ_{n+1} > τ_n + 2^{d_x}}}, and let S = {N ≤ n < M : J_n = 1}. Since d_k ≥ d_x for all k ∈ [x, z], we have from the above, that for any set A ⊆ {0, 1, . . . , k - 1},

    P_0[S ⊆ A + N | M - N ≥ k] ≤ (1 - (4e)^{-1})^{k - |A|}.

Thus, the event {|S| < ℓ} can be bounded, and

    P_0[|X_t| > 3r] ≤ P_0[T_{3r} < t] ≤ P_0[|S| < ℓ] ≤ exp(-Ω((log r)^2)).

Since ℓ ≍ r^2 (log r)^{-1} and 2^{d_x} ≍ r^α, we get that t ≍ r^{2+α} (log r)^{-1}, and
So

    lim sup_t log E_0[|X_t|] / log t ≤ (α + 2)^{-1},

which coincides with our lower bound. □
Example 21.6. We now have an example of a transient sub-diffusive graph. Let Γ = Γ((d_k)_k) be the grove for d_k = ⌊α log_2(k + 1)⌋. Let G = Γ^3.

We know that as a graph power G has speed exponent (α + 2)^{-1}. However, since N is a subgraph of Γ, then also N^3 is a subgraph of G. We know that N^3 is transient, so G must be transient as well.
Lemma 21.7. There exists a universal constant p > 0 such that the following holds. Let Γ be a finite binary tree of depth d with root o. For any t ≤ d,

    P_o[dist(X_t, o) ≥ (1/6)√t] ≥ p.
Proof. Let D_t = dist(X_t, o). We have already seen that for L_t := Σ_{k=0}^t 1_{{X_k = o}} (with L_{-1} = 0) and M_t = D_t - (1/3)t - L_{t-1}, that (M_t)_{t=0}^d is a martingale (the restriction to t ≤ d is so that the walk does not reach the leaves). Thus, for t ≤ d,

    E_o[D_t] = (1/3)t + E_o[L_{t-1}].
Also, for t ≤ d,

    E[D_t^2 | F_{t-1}] = D_{t-1}^2 + 1 + (2/3) D_{t-1}.

So

    E_o[D_t^2] = E_o[D_{t-1}^2] + 1 + (2/3) E_0[D_{t-1}] = · · · = t + (2/3) Σ_{k=0}^{t-1} ((1/3)k + E_o[L_{k-1}]).
Note that for t ≤ d, we have that L_t is the number of visits to the root up to time t. Let q be the probability that a random walk on an infinite rooted binary tree does not return to the root. Then, if A is the set of leaves in Γ, then P_o[T_A < T_o^+] ≥ q. However, since P_o-a.s. t ≤ d ≤ T_A, we get that for any t ≤ d,

    E_o[L_t] ≤ E_o[L_{T_A}] = 1 / P_o[T_A < T_o^+] ≤ 1/q < ∞.

Thus we conclude that

    E_o[D_t^2] ≤ t + (1/9) t(t - 1) + (1/q) t.

We now use the Paley-Zygmund inequality (due to Raymond Paley and Antoni Zygmund (1900-1992)) to conclude that for any t ≤ d,

    P_o[D_t ≥ (1/2) E_o[D_t]] ≥ (1/4) · (E_o[D_t])^2 / E_o[D_t^2] ≥ (1/4) · ((1/9)t^2) / ((1/9)t^2 + (8/9 + 1/q) t) ≥ p,

for a universal constant p > 0. Since D_t ≥ (1/2) E_o[D_t] implies D_t ≥ t/6 ≥ (1/6)√t, the lemma follows. □
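The Paley-Zygmund inequality itself, P[Z ≥ θE[Z]] ≥ (1 - θ)^2 (E[Z])^2 / E[Z^2] for a non-negative Z, is easy to check on any explicit finite distribution; a small sketch (not part of the notes, with an arbitrary example distribution):

```python
def paley_zygmund_sides(values, probs, theta):
    """Return (lhs, rhs) of P[Z >= theta E[Z]] >= (1-theta)^2 (E[Z])^2 / E[Z^2]
    for a finite non-negative random variable Z."""
    ez = sum(v * p for v, p in zip(values, probs))
    ez2 = sum(v * v * p for v, p in zip(values, probs))
    lhs = sum(p for v, p in zip(values, probs) if v >= theta * ez)
    rhs = (1 - theta) ** 2 * ez ** 2 / ez2
    return lhs, rhs

lhs, rhs = paley_zygmund_sides([0, 1, 2, 5], [0.4, 0.3, 0.2, 0.1], 0.5)
assert lhs >= rhs
```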
Example 21.8. We complete the picture in Figure 5 by giving an example of a recurrent graph, but with positive speed.

Recall that for the (d_k)_k-grove, the expected time to reach the vertex r ∈ N is

    E_0[T_r] = 4 Σ_{k=0}^{r-1} (r - k) 2^{d_k} - (3/2) r(r + 1).

Choose the sequence (d_k)_k inductively (note that E_0[T_r] depends only on d_0, . . . , d_{r-1}). (This sequence must grow super-fast, at least like the Ackermann tower function.) Note that this ensures that d_r > ⌈4 E_0[T_r]⌉. Consider the (d_k)_k-grove Γ = Γ((d_k)_k).
Γ is of course recurrent.
Recall that for a random walk (X_t)_t on Γ and for t ≥ 4 E_0[T_r], we have that P_0[|X_t| < r] ≤ 3/4. Also, by Lemma 21.7,

    P_r[dist(X_t, r) ≥ (1/6)√t] ≥ c > 0,

for some universal constant c > 0, as long as t ≤ d_r. So if we take t = 2t_0 for t_0 = ⌈4 E_0[T_r]⌉, then t_0 < d_r, so

    P_0[dist(X_t, 0) ≥ (1/6)√t_0] ≥ Σ_{k≥r} P_0[|X_{t_0}| = k] · P_k[dist(X_{t_0}, k) ≥ (1/6)√t_0] ≥ (1/4) c.
So

    E_0[dist(X_t, 0)] ≥ (1/4) c · (1/6)√t_0 ≥ c' √t,

for a constant c' > 0. And so Γ has positive speed.
Random Walks
Ariel Yadin
Lecture 22:
The final topic for this course is a special Markov chain on trees, known as the Galton-Watson process.

Francis Galton (1822-1911) and Henry Watson (1827-1903) were interested in the question of the survival of aristocratic surnames in the Victorian era. They proposed a model to study the dynamics of such a family name.

In words, the model can be stated as follows. We start with one individual. This individual has a certain random number of offspring. Thus passes one generation. In the next generation, each one of the offspring has its own offspring independently. The process continues, building a random tree of descent.

The formal definition is a bit complicated. For the moment let us focus only on the population size at a given generation.
Definition 22.1. Let μ be a distribution on N; i.e. μ : N → [0, 1] such that Σ_n μ(n) = 1. The Galton-Watson Process, with offspring distribution μ, (also denoted GW_μ,) is the following Markov chain (Z_n)_n on N:

Let (X_{j,k})_{j,k∈N} be a sequence of i.i.d. random variables with distribution μ. Set Z_0 = 1 and, given Z_n, let Z_{n+1} = Σ_{k=1}^{Z_n} X_{n+1,k}.
Example 22.2. If μ(0) = 1 then the GW process is just the sequence Z_0 = 1, Z_n = 0 for all n > 0.

If μ(1) = 1 then GW_μ is Z_n = 1 for all n.

How about μ(0) = p = 1 - μ(1)? In this case, Z_0 = 1, and given that Z_n = 1, we have that Z_{n+1} = 0 with probability p, and Z_{n+1} = 1 with probability 1 - p, independently of all (Z_k : k ≤ n). If Z_n = 0 then Z_{n+1} = 0 as well.
What is the distribution of T = inf {n : Z_n = 0}? Well, one can easily check that T ∼ Geo(p). So GW_μ is essentially a geometric random variable.

We will in general assume that μ(0) + μ(1) < 1, otherwise the process is not interesting.
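The dynamics of the population sizes (Z_n)_n are easy to simulate. A Monte Carlo sketch (not part of the notes; the generation cap and population cap are our own pragmatic cutoffs for declaring a line "surviving"):

```python
import random

def extinction_estimate(mu, trials=4000, max_gen=60, cap=1000, seed=2):
    """Monte Carlo estimate of the extinction probability of GW_mu.
    `mu` maps k -> mu(k); lines exceeding `cap` are declared surviving."""
    rng = random.Random(seed)
    ks = list(mu)
    ws = [mu[k] for k in ks]
    extinct = 0
    for _ in range(trials):
        z = 1
        for _ in range(max_gen):
            if z == 0 or z > cap:
                break
            # each of the z individuals reproduces independently
            z = sum(rng.choices(ks, weights=ws, k=z))
        extinct += (z == 0)
    return extinct / trials

# mu(0) = 1/4, mu(2) = 3/4: G(z) = 1/4 + (3/4) z^2, smallest fixed point 1/3
print(extinction_estimate({0: 0.25, 2: 0.75}))
```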
Thus,

    G_{Z_{n+1}}(z) = E[z^{Z_{n+1}}] = E[G_μ(z)^{Z_n}] = G_{Z_n}(G_μ(z)) = G_{Z_n} ∘ G_μ(z),

since G_{Z_1} = G_μ. By induction,

    G_{Z_n} = G_μ^{(n)} = G_μ ∘ · · · ∘ G_μ    (n-fold composition). □
22.3. Extinction

Recall that the first question we would like to answer is the extinction probability for a GW process.

Let (Z_n)_n be a GW process. Extinction is the event {∃ n : Z_n = 0}. The extinction probability is defined to be q = q(GW_μ) = P[∃ n : Z_n = 0]. Note that the events {Z_n = 0} form an increasing sequence, so

    q = lim_{n→∞} P[Z_n = 0] = lim_{n→∞} G_μ^{(n)}(0).
Proposition 22.5. Consider a GW_μ. (Assume that μ(0) + μ(1) < 1.) Let q = q(GW_μ) be the extinction probability and G = G_μ. Then:

q is the smallest solution in [0, 1] to the equation G(z) = z. If only one solution exists, q = 1. Otherwise, q < 1 and the only other solution is z = 1 (note that G(1) = 1 always).

q = 1 if and only if G'(1) = E[X] ≤ 1.
Positivity of the extinction probability depends only on the mean number of offspring!
Proof. If P[X = 0] = G(0) = 0 then Z_n ≥ Z_{n-1} for all n, so q = 0, because there is never extinction. Also, the only solutions to G(z) = z in this case are 0, 1, because G''(z) > 0 for z > 0, so G is strictly convex, and thus G(z) < z for all z ∈ (0, 1). So we can assume that G(0) > 0.

Let f(z) = G(z) - z. So f''(z) > 0 for z > 0. Thus, f' is a strictly increasing function.

In the case G'(1) > 1 we have f'(1) > 0, so there is x < 1 with f(x) < f(1) = 0. Also, f(0) = μ(0) > 0, and because f is continuous, there exists a 0 < p < x such that f(p) = 0.

We claim that p, 1 are the only solutions to f(z) = 0. Indeed, if a < b are any two such solutions, then because f is strictly convex, for any a < z < b, f(z) < α f(a) + (1 - α) f(b) = 0 for some α ∈ (0, 1).

In conclusion, in the case G'(1) > 1 we have that there are exactly two solutions to G(z) = z, which are p and 1.

Moreover, p < x for x the unique minimum of f, so because f' is strictly increasing,
[Figure 7: plots of G over [0, 1] in the two cases G'(1) > 1 and G'(1) ≤ 1, with the fixed point q marked.]

Figure 7. The two possibilities for G'(1). The blue dotted line and crosses show how the iterates G^{(n)}(0) advance toward the minimal solution of G(z) = z.
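The iteration G^{(n)}(0) → q pictured in Figure 7 is easy to reproduce numerically; a sketch (not part of the notes) using the offspring law μ(0) = 1/4, μ(2) = 3/4, so that G(z) = 1/4 + (3/4)z^2 with fixed points 1/3 and 1:

```python
def extinction_probability(G, iters=200):
    """Iterate z -> G(z) starting from 0; the iterates increase to the
    smallest fixed point of G in [0, 1], i.e. the extinction probability."""
    z = 0.0
    for _ in range(iters):
        z = G(z)
    return z

G = lambda z: 0.25 + 0.75 * z * z   # supercritical: G'(1) = 1.5 > 1
q = extinction_probability(G)
assert abs(q - 1 / 3) < 1e-9        # smallest fixed point, so q < 1
```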