
Random graphs

Quodlibet
(preliminary version)

N.C.

September 28, 2020


Contents

1 Basics of percolation 3
1.1 Definition, clusters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Phase transition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 pc > 0 for bounded degree graphs . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 d-ary tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.2.3 Cubic lattice Z2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Mean-field regime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

I Galton–Watson trees 11

2 One-dimensional random walks 12


2.1 General theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.1 Reminder on Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.2 Asymptotic behavior . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Walks with finite mean and the LLN . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 Recurrence/transience . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.2 Wald’s equality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Chung-Fuchs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.3.2 Local central limit theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3 Skip-free random walks 22


3.1 Duality and the cycle lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.1 Duality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.2 Cycle lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Kemperman’s formula and applications . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Kemperman’s formula . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Simple symmetric random walk . . . . . . . . . . . . . . . . . . . . . . . . 26
3.2.3 Poisson random walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.3 Ballot theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.3.1 Ballot theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

3.3.2 Staying positive forever . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.4 Parking on the line . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.1 The model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.2 Parking probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

4 Galton-Watson trees 33
4.1 Plane trees and Galton–Watson processes . . . . . . . . . . . . . . . . . . . . . . 33
4.1.1 Plane trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.1.2 Galton–Watson trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Lukasiewicz walk and direct applications . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Lukasiewicz walk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Lukasiewicz walk of a Galton–Watson tree . . . . . . . . . . . . . . . . . . 36
4.2.3 Lagrange inversion formula . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.3 Probabilistic counting of trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.1 Prescribed degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.2 Uniform trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.3 Poisson Galton–Watson trees and Cayley trees . . . . . . . . . . . . . . . 43
4.4 Contour function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44

Chapter I: Basics of percolation
An appetizer

In this introductory chapter we present the model of (Bernoulli bond) percolation. This
famous model enables us to generate a random graph from a deterministic graph by keeping
edges at random. It is sometimes used to model porous media. The random graphs studied in
part I (random Galton–Watson trees) and in part II (Erdös–Rényi random graphs) can be seen,
at least in some cases, as a percolation model on a particular graph. Our goal here is only to
present the main features of the model, focusing on the phase transition for the existence of an
infinite cluster.

1.1 Definition, clusters


Graph. Let g be a graph1, i.e. a pair g = (V(g), E(g)), where V = V(g) is the set of vertices of
g and E = E(g) is the set of edges of g, which is a multiset over V × V, i.e. where repetitions are
allowed. The graph is simple if there are no multiple edges nor loops. If x, y ∈ V and {x, y} ∈ E


Figure 1.1: An example of a graph g = (V, E) with vertex set V = {1, 2, 3, 4} and
edge set E = {{{1, 1}, {1, 2}, {1, 2}, {1, 3}, {3, 2}}}. The vertex degrees of 1, 2, 3, 4 are
respectively 5, 3, 2, 0.

we say that x and y are neighbors and that x and y are incident to the edge {x, y}. The degree
of a vertex x ∈ V, denoted by deg(x), is the number of half-edges incident to x; in other words,
it is the number of edges incident to x, where loops are counted twice. The graph distance on g is
denoted by d^g_gr, or dgr when there is no ambiguity, and is defined by

dgr (x, y) = minimal number of edges to cross to go from x to y.


1
more formally, a non-oriented multi-graph

By convention we put dgr (x, y) = ∞ if there is no path linking x to y in g. The equivalence
classes from the relation x ∼ y ⇐⇒ dgr (x, y) < ∞ are the connected components of g. We say
that g is connected if it has only one connected component. Unless explicitly specified, we shall
always suppose that E(g) is finite or countable and that g is locally finite i.e. that the vertex
degrees are all finite (no vertices of infinite degree).
We record here a useful property known as König's lemma (which in this generality uses a
weak version of the axiom of choice, but in all our applications this axiom is actually superfluous):

Proposition 1.1 (König’s lemma). Let g = (V, E) be a locally finite connected graph. Then the
following propositions are equivalent

(i) E(g) is infinite (countable),

(ii) V(g) is infinite (countable),

(iii) For every v0 ∈ V , there is a self-avoiding path (a path with no repeated vertices) starting
from v0 and of infinite length,

(iv) For every v0 ∈ V and any n > 1, there is a self-avoiding path starting from v0 and of
length n.

Proof. Left to the reader.

Percolation. Once g is fixed, for any p ∈ [0, 1], one can define a random graph Perc(g, p), called
the (Bernoulli bond) percolation model on g with parameter p whose vertex set is V(g) and
where each edge e ∈ E(g) is kept independently of the others with probability p. The edges kept
are called “open” and those discarded are called “closed” (in agreement with the porous medium
interpretation of the model).
Obviously, for each p ∈ [0, 1], the random graph Perc(g, p) is a subgraph of g which gets bigger
as p increases. To make this statement formal, it is useful to couple, i.e. to realize on the same
probability space, all the graphs Perc(g, p) for p ∈ [0, 1]. A natural way to do this is to consider
a probability space (Ω, F, P) which supports i.i.d. random variables (Ue : e ∈ E(g)) uniformly
distributed on [0, 1] (recall that we suppose that E(g) is at most countable). It is now clear that
if we set

    Perc(g, p) = ( V(g) ; {e ∈ E(g) : Ue ≤ p} ),

then for each p, the random graph Perc(g, p) is indeed distributed as a percolation on g with
parameter p, and furthermore p ↦ Perc(g, p) is increasing (for the inclusion of edges).
Remark 1.1 (Application to global cooling). The increasing coupling of (Perc(g, p) : p ∈ [0, 1])
can be imagined as follows: each edge e of g possesses a random “height” given by 1 − Ue. At
p = 0, the sea level is at height 1 and all edges of the graph are invisible, covered by water. As
p increases, the sea level goes down, so that the edges that are high enough become visible. New
islands appear, and the process stops when the sea level is at 0, at which point we see the entire graph g.

The clusters of Perc(g, p) are its connected components.

1.2 Phase transition


In this section, we focus on graphs g which are infinite and connected. Much of the theory
of percolation is focused on the existence of large clusters in Perc(g, p). More precisely, if g
is an infinite connected graph, one can ask whether for some parameter p, the random graph
Perc(g, p) admits an infinite cluster. More precisely, the function

p 7→ P(Perc(g, p) contains an infinite cluster),

is easily seen to be increasing using our coupling. Since the event {∃ infinite cluster in Perc(g, p)}
is independent of the status of any finite number of edges, it has probability 0 or 1 by Kolmogorov's
0–1 law. We say that there is a phase transition if this probability does not depend trivially
on p.
Definition 1.1 (pc and phase transition). We define the critical parameter pc (g) as

    pc(g) = inf{p ∈ [0, 1] : P(Perc(g, p) has an infinite cluster) = 1}.

If pc(g) ∈ (0, 1) we say that there is a non-trivial phase transition for percolation on g.
Knowing whether or not there is an infinite cluster at the critical threshold pc, depending on
the geometry of the underlying graph, is one of the main open questions in the area.
Remark 1.2. The terminology “phase transition” comes from the fact that a slight variation of
the parameter p induces dramatic changes in the random graph Perc(g, p). This can be used
to model physical phase transitions (such as the transformation of water into ice when the
temperature drops below 0◦ C).
Example 1.1. For example the percolation on the line graph on Z has no phase transition since
pc (line) = 1 and similarly on the graph made of a star with infinite degree (though not locally
finite) we have pc (star) = 0.
In the rest of this section we shall prove the existence of a non-trivial phase transition for
percolation on two infinite graphs: the infinite binary tree and the cubic planar lattice. We shall
use the so-called first and second moment methods, which are (sometimes subtle) applications of
the Markov and Cauchy–Schwarz inequalities and which will accompany us all along this course.

1.2.1 pc > 0 for bounded degree graphs


Our first proposition shows that the critical parameter must be positive as long as the underlying
graph g has bounded degree.

Proposition 1.2. Let g be a graph such that

    max_{v ∈ V(g)} deg(v) ≤ M.

Then we have P(∃ infinite cluster in Perc(g, p)) = 0 as long as p(M − 1) < 1.

Proof. This lower bound follows the principle of the first moment method. Let us consider a
reference vertex v0 in g and let X(p) = 1 if there is an infinite cluster in Perc(g, p) containing
v0 and 0 otherwise. Our goal is to show that X(p) = 0 for p small. For this, we shall use proxy
random variables Xn (p) counting the number of self-avoiding paths starting from v0 of length n
and made of open edges in Perc(g, p). Clearly, since the degree of each vertex in g is bounded
above by M, there are at most M · (M − 1)^{n−1} non-backtracking paths of length n starting from
v0 in g. Since there are more non-backtracking paths than self-avoiding paths, by independence
of the status of the edges we have

    E[Xn(p)] ≤ M · (M − 1)^{n−1} · p^n.

A simple compactness argument (usually called König’s lemma) shows that v0 is in an infinite
cluster if and only if there is a self-avoiding path of arbitrary length starting from v0 . We deduce
that

    P(∃ infinite cluster containing v0 in Perc(g, p)) = P(Xn(p) ≥ 1, ∀ n ≥ 1)
                                                     ≤ inf_{n≥1} P(Xn(p) ≥ 1)
                                                     ≤ inf_{n≥1} E[Xn(p)]      (by Markov's inequality).

Hence if p·(M − 1) < 1 the above probability is 0. By countable union over v0 ∈ V(g), the probability
that there exists an infinite cluster (at all) is also zero in this regime.

1.2.2 d-ary tree


Fix d ≥ 3. Let us suppose in this section that g is the infinite tree td in which all vertices
have degree d except for the origin vertex v0 which has degree d − 1 (so that there are exactly
(d − 1)^n vertices at distance n from v0). By Proposition 1.2 we have pc(td) ≥ 1/(d − 1) and in
fact this lower bound is sharp:
Proposition 1.3. We have pc(td) = 1/(d − 1).

Proof. Let us focus on the case d = 3 to ease notation. To prove the upper bound, we use the
second moment method.

Lemma 1.4 (Second moment method). Let X ∈ {0, 1, 2, . . . } be a non-negative integer valued
random variable having a second moment. Suppose also X is not 0 a.s. Then we have
    P(X ≥ 1) ≥ E[X]² / E[X²].

One-line proof: use the Cauchy–Schwarz inequality: E[X]² = E[X 1_{X>0}]² ≤ E[X²] P(X > 0).
Recall that X(p) is the indicator that v0 lies in an infinite cluster of Perc(t3, p) and that Xn(p)
is the number of open self-avoiding paths in Perc(t3, p) starting at v0 and reaching level n. When
p > 1/2 we know that E[Xn(p)] = (2p)^n tends to infinity, but that does not imply that Xn(p) ≥ 1
with large probability. To ensure this, we shall compute the second moment of Xn(p):
 2 
X
E[(Xn (p))2 ] = E  1v0 ↔x in Perc(t3 ,p)
 
x:dgr (x,v0 )=n
X
= P(v0 ↔ x and v0 ↔ y in Perc(t3 , p))
x,y:dgr (x,v0 )=dgr (y,v0 )=n
p
= (2p)n 1 + p + 2p2 + 4p3 + · · · + 2n−1 pn ∼ (2p)2n ,

2p − 1

as n → ∞ for p > 1/2. We thus find that the second moment of Xn(p) is of the same
order as the first moment squared. Applying Lemma 1.4 we deduce that P(Xn(p) > 0) ≥
E[Xn(p)]²/E[Xn(p)²] ≥ (2p − 1)/p asymptotically. Hence by König's lemma, the probability that v0
is in an infinite cluster is at least (2p − 1)/p, and so by the 0–1 law there is an infinite cluster in
Perc(t3, p) when p > 1/2.

Figure 1.2: Increasing Bernoulli percolation on a complete binary tree up to level 10,
with parameters p = 0.2, 0.4, 0.5 and p = 0.6.

Exercise 1.1. Show that there is no infinite cluster in Perc(td, p) at p = 1/(d − 1).

Of course, the knowledgeable reader may have noticed that the open subtree of the origin in
Perc(td , p) is a Galton–Watson tree with offspring distribution Bin(d−1, p). The phase transition
for the existence of an infinite cluster happens when p(d − 1), the mean number of children in
the Galton–Watson tree, is larger than 1. We shall study Galton–Watson trees in more detail
in Part I and in particular get a new proof of the above proposition. However, the above proof
can easily be extended to the case of spherically symmetric trees (i.e. trees having d_k children at
generation k ≥ 1, exercise!).
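To see the phase transition of Proposition 1.3 numerically, one can exploit the observation above: the open cluster of the origin reaches level n exactly when a Galton–Watson process with Bin(d − 1, p) offspring survives for n generations. The following Monte Carlo sketch is our own illustration (the parameters and the early-stopping cutoff are ours) and estimates this probability around p = 1/(d − 1).

import random

def reaches_level(p, d, n, rng, cutoff=500):
    """Does the open cluster of v0 in Perc(t_d, p) reach depth n?  Equivalently: does a
    Galton-Watson process with Bin(d-1, p) offspring survive n generations?  We only track
    the population size; once it exceeds `cutoff` we declare survival, since the extinction
    probability starting from that many individuals is negligible (a small approximation)."""
    alive = 1
    for _ in range(n):
        alive = sum(1 for _ in range(alive * (d - 1)) if rng.random() <= p)
        if alive == 0:
            return False
        if alive >= cutoff:
            return True
    return True

rng = random.Random(1)
d, n, trials = 3, 25, 1000
for p in (0.3, 0.45, 0.5, 0.55, 0.7):
    est = sum(reaches_level(p, d, n, rng) for _ in range(trials)) / trials
    print(f"p = {p:.2f}   estimated P(reach level {n}) ~ {est:.3f}")
# the estimate is (nearly) 0 below p = 1/(d-1) = 0.5 and clearly positive above it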

1.2.3 Cubic lattice Z2


Let us focus now on the case when g is the standard Manhattan lattice i.e. the cubic lattice in
dimension 2. This is the usual grid graph, whose vertices are indexed by Z2 and an edge joins
the point x to the point x + e where e ∈ {(0, ±1), (±1, 0)}. Let us denote this graph by z2 . We

know from Proposition 1.2 that pc > 1/3, but the second moment method does not work well in
this setting since two paths of length n may have a very complicated structure. To show that
pc < 1, we shall rely on another argument specific to planar lattices

Proposition 1.5. We have 0 < pc (z2 ) < 1.

Proof: The idea is to use duality. More precisely, if the cluster of the origin vertex (0, 0) is
finite in Perc(z2 , p) this forces the existence of a blocking self-avoiding loop in the dual graph,
see Figure 1.3.

Figure 1.3: If the cluster of the origin is finite, then it is surrounded by a blocking dual
self-avoiding loop of length at least 4.

Since the dual graph of z2 is z2 itself, there are at most 4 · 3^{n−1} dual loops of length n
starting from the origin, and at most n · 4 · 3^{n−1} such loops blocking the origin (re-root the loop at its
first intersection with the positive horizontal axis, which must be at distance less than n). Hence, the
probability that there exists such a loop in the dual graph is upper bounded by its expectation
and so by

    P(∃ blocking loop) ≤ Σ_{n≥4} n 4^n (1 − p)^n.

The above sum can be made smaller than 1 if p is close enough to 1. In this case, the probability
that the origin cluster in Perc(z2, p) is infinite is strictly positive and so pc(z2) < 1.

Remark 1.3. In essence the duality argument shows that percolation on z2 is self-dual at
p = 1/2, and this is one of the key ingredients in proving that indeed pc(z2) = 1/2 (a result due to
Kesten).

Figure 1.4: Increasing Bernoulli percolation on a 50×50 grid with parameters p = 0.35,
p = 0.45, p = 0.55 and p = 0.65. Notice the appearance of a ubiquitous cluster between
the second and the third picture.
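Pictures such as Figure 1.4 can be reproduced with a few lines of code; the sketch below is ours (illustrative union-find implementation, hypothetical grid size and seed): it percolates an n × n grid and reports the proportion of vertices in the largest cluster, which jumps near p = 1/2.

import random

def largest_cluster_fraction(n, p, seed=0):
    """Bernoulli bond percolation on an n x n grid: keep each edge with probability p,
    then return the size of the largest cluster divided by n^2 (union-find)."""
    rng = random.Random(seed)
    parent = list(range(n * n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        rx, ry = find(x), find(y)
        if rx != ry:
            parent[rx] = ry

    for i in range(n):
        for j in range(n):
            v = i * n + j
            if i + 1 < n and rng.random() <= p:   # vertical edge is open
                union(v, (i + 1) * n + j)
            if j + 1 < n and rng.random() <= p:   # horizontal edge is open
                union(v, i * n + j + 1)

    sizes = {}
    for v in range(n * n):
        r = find(v)
        sizes[r] = sizes.get(r, 0) + 1
    return max(sizes.values()) / (n * n)

for p in (0.35, 0.45, 0.55, 0.65):
    print(p, round(largest_cluster_fraction(50, p, seed=42), 3))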

Since the two-dimensional cubic lattice is included in its higher-dimensional analog in a
trivial fashion, we deduce that there is a non-trivial phase transition in zd for any d ≥ 2. The
nature of the phase transition in low dimensions 3, 4, 5, 6 . . . and also, to some extent, in dimension 2
is still elusive.

1.3 Mean-field regime


In this section, we consider the case when the underlying graph g is the complete graph Kn
on n vertices. This is the graph made of the vertices 1, 2, . . . , n and where there is always an
edge between two distinct vertices (no loops). When n = ∞, the infinite vertex set is Z>0 . It
should be clear that for any p > 0 there is an infinite cluster in the p-percolation on K∞ , and
furthermore that the diameter of the percolated graph is two almost surely (exercise). However,
the following might be a surprise:

Proposition 1.6. For any p, p0 ∈ (0, 1) almost surely Perc(K∞ , p) and Perc(K∞ , p0 ) are iso-
morphic. In particular, the equivalence class of non trivial Bernoulli bond-percolation on K∞ is
almost surely constant: this is the Rado graph.

Recall that two graphs g, g′ are equivalent if there is a bijection between their vertex sets
which preserves the adjacency relation. The proof of the proposition is easy once we know the
following characteristic property of the Rado graph (over the vertex set Z>0): it is a countable
simple graph such that for any finite disjoint subsets U, V ⊂ Z>0, there exists v outside of U
and V such that v is a neighbor of all vertices in U and of none of V. We leave it to the reader to prove
that there is a unique (equivalence class of) graph with a countable vertex set satisfying this
property.

In a sense, percolation on the infinite complete graph K∞ is deterministic and rather boring.
One of the main objects of this course is obtained by studying Perc(Kn, p) when p may vary
with n. This random graph, usually referred to as the Erdös–Rényi random graph, will be denoted
by G(n, p). This model was introduced2 by Erdös3 and Rényi4 in 1959, who wanted to probe
randomly a graph with n (labeled) vertices. This random graph model has become ubiquitous
in probability and is commonly referred to as the “mean field model”. This means that the initial
geometry of the model is trivial: one could permute all the vertices and get the same model.

Bibliographical notes. Percolation theory is a very broad and lively area of modern probability
theory, [11, 25, 9]. When the underlying graph has strong geometric constraints (e.g. the cubic
lattices Zd for d ≥ 2) then the study of the phase transition, and in particular of the critical
behavior, is still a challenge for mathematicians. For more about the links between the geometry
of the graph and the behavior of Bernoulli percolation we advise the reading of the influential
paper [3].

2
Actually this definition of a random graph is not really due to Erdös and Rényi, who considered a random graph
on n vertices with a fixed number m ≤ (n choose 2) of edges. However, once conditioned on the number of edges the
two models are equivalent, and we shall use the name Erdös–Rényi instead of that of Edgar Gilbert who introduced this
variant.
3
Paul Erdös 1913-1996

4
Alfréd Rényi 1921-1970

Part I:
Galton-Watson trees
In this part we study the model of Galton–Watson trees, or branching processes,
modeling the genealogy of an asexual population where individuals reproduce indepen-
dently of each other according to the same offspring distribution. The main tool to study
such objects is their encoding by one-dimensional random walks.

Figure 1.5: A large critical Galton–Watson tree with finite variance

Chapter II: One-dimensional random walks
Back to basics
In this chapter we consider the following object:

Definition 2.1 (One-dimensional random walk). Let µ be a probability distribution on Z with


µ(Z>0 ) > 0 as well as µ(Z<0 ) > 0. Consider X1 , X2 , . . . i.i.d. copies of law µ which we see as
the increments of the process (Sn : n > 0) on Z defined as follows : S0 = 0 and for n > 1

Sn = X1 + · · · + Xn .

We say that (S) is a one-dimensional random walk with step distribution µ (or µ-random walk
for short).


Figure 2.1: Two samples of one-dimensional random walks with different step distri-
butions. The first one seems continuous at large scales whereas the second one displays
macroscopic jumps.
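Sample paths such as those of Figure 2.1 are easy to simulate; the following sketch is our own (the step distributions are illustrative, not the ones behind the figure): it generates one walk with bounded steps and one with heavy-tailed steps.

import random

def random_walk(steps, sample_increment, seed=0):
    """Return S_0 = 0, S_1, ..., S_n for i.i.d. increments drawn by sample_increment."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(steps):
        path.append(path[-1] + sample_increment(rng))
    return path

# A walk with a finite-variance step distribution (+-1) ...
simple = random_walk(1000, lambda rng: rng.choice((-1, 1)))

# ... and a heavy-tailed one, which displays macroscopic jumps.
def heavy(rng):
    sign = rng.choice((-1, 1))
    return sign * int(rng.paretovariate(1.2))  # P(|X| > x) decays like x^(-1.2)

heavy_walk = random_walk(1000, heavy, seed=1)
print(simple[-1], heavy_walk[-1])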

Notice that we restrict to the lattice case by demanding that the support of µ be included
in Z, and that we exclude the monotone situation since the support of µ contains both positive
and negative integers. Of course, the behavior of a one-dimensional random walk depends on
the step distribution µ in a non-trivial way, as we will see. We first recall the general background
on such objects before moving to skip-free random walks, which can only make negative jumps of
size −1 and which will be used in the next chapters to study random trees and graphs.

2.1 General theory
In this section we gather a few general results on one-dimensional random walk and start
with the applications of discrete Markov chain theory since a one-dimensional random walk is
clearly a very particular case of Markov chain in discrete time with a discrete state space.

2.1.1 Reminder on Markov chains


We start with the parity consideration:

Proposition 2.1. The Markov chain (Sn : n > 0) is

• irreducible if Supp(µ) is not included in aZ for some a > 1,

• It is furthermore aperiodic if Supp(µ) is not included in b + aZ for some a > 1 and b ∈ Z.

Proof. Using the fact that the walk is not monotone, it is an exercise to check that the chain
can come back to 0 with positive probability and so the set of integers accessible by the chain
starting from 0 is a subgroup of Z. Writing a Bézout relation, we can find non-negative integers
α1, . . . , αk, α′1, . . . , α′k and ℓ1, . . . , ℓk, ℓ′1, . . . , ℓ′k ∈ Supp(µ) so that

    α1 ℓ1 + · · · + αk ℓk = α′1 ℓ′1 + · · · + α′k ℓ′k + gcd(Supp(µ)).

Hence, by the above consideration gcd(Supp(µ)) is an accessible value for the walk and the first
point is proved. For the second point, notice that if Supp(µ) ⊂ b + aZ then

    Sn ≡ bn [a]

and so the chain cannot be aperiodic if a ∉ {−1, +1}. If Supp(µ) is not included in b + aZ for
|a| > 1, then we have gcd(Supp(µ) − k0) = 1 where k0 is any integer in the support of µ. We
pick k0 ∈ Supp(µ) such that the measure µ̃(·) = µ(k0 + ·) does not put all its mass on Z≥0 nor
on Z≤0. This is possible since otherwise Supp(µ) = {α, β} with α < 0 < β and the walk is not
aperiodic in this case. Then, by the first point of the proposition, we can find n0 > 0 large
enough so that a µ̃-random walk satisfies P(S̃n0 = k0) > 0, which means that

    P(Sn0 = (n0 + 1) k0) > 0.

Combining this with the trivial bound P(Sn0+1 = (n0 + 1) k0) ≥ µ(k0)^{n0+1} > 0, we deduce that
the integer (n0 + 1) k0 is accessible both at time n0 and at time n0 + 1 for the chain. By standard
results on Markov chains this implies aperiodicity.
Example 2.1. The simple random walk on Z with P(S1 = ±1) = 1/2 is irreducible but not
aperiodic.
The counting measure on Z is clearly an invariant measure for any µ-random walk (beware,
it is not usually reversible, and it might not be the only invariant measure up to multiplicative
constant in the transient case). Due to homogeneity of the process, the Markov property takes
a nice form in our setup: as usual Fn = σ(X1 , . . . , Xn ) is the natural filtration generated by the
walk (S) up to time n and a stopping time is a random variable τ ∈ {0, 1, 2, . . . } ∪ {∞} such
that for each n > 0 the event {τ = n} is measurable with respect to Fn .

Proposition 2.2 (Strong Markov property). If τ is a stopping time then, conditionally on
{τ < ∞} (implicitly of positive probability), the process (S_n^{(τ)})_{n≥0} = (S_{τ+n} − S_τ)_{n≥0} is independent
of (S_n)_{0≤n≤τ} and is distributed as the initial walk (S_n)_{n≥0}.

Proof. Let f, g be two positive measurable functions and let us compute


    E[ f((S_n)_{0≤n≤τ}) g((S_n^{(τ)})_{n≥0}) 1_{τ<∞} ]
      = Σ_{k=0}^{∞} E[ 1_{τ=k} f((S_n)_{0≤n≤k}) g((S_n^{(k)})_{n≥0}) ]
      = Σ_{k=0}^{∞} E[ 1_{τ=k} f((S_n)_{0≤n≤k}) ] E[ g((S_n^{(k)})_{n≥0}) ]      (by independence)
      = Σ_{k=0}^{∞} E[ 1_{τ=k} f((S_n)_{0≤n≤k}) ] E[ g((S_n)_{n≥0}) ]            (by stationarity)
      = E[ f((S_n)_{0≤n≤τ}) 1_{τ<∞} ] E[ g((S_n)_{n≥0}) ],

and this proves the proposition.

2.1.2 Asymptotic behavior


It is easy to see from the classification of states that for any µ-random walk we have almost
surely
    lim sup_{n→∞} Sn ∈ {−∞, +∞}  and  lim inf_{n→∞} Sn ∈ {−∞, +∞},

and this motivates the following definition:


Definition-Proposition 2.1. A one-dimensional random walk (S) falls into exactly one of the
three categories:

(i) Either Sn → ∞ a.s. as n → ∞ in which case (S) is said to drift towards ∞,

(ii) Or Sn → −∞ a.s. as n → ∞ in which case (S) is said to drift towards −∞,

(iii) Or (S) oscillates i.e. lim supn→∞ Sn = +∞ and lim inf n→∞ Sn = −∞ almost surely.
When a random walk drifts, it is obviously transient, but in the oscillating case, it may be
transient or recurrent. Recall that a random walk (S) is recurrent iff one of the following
equivalent conditions is satisfied
"∞ #
X
P(∃n > 0 : Sn = 0) = 1 ⇐⇒ E 1Sn =0 = ∞ ⇐⇒ lim inf |Sn | < ∞ a.s. (2.1)
n→∞
n=0

We will see below (Theorem 2.5) a recurrence criterion based on the Fourier transform.
Exercise 2.1. Find an example of an oscillating yet transient one-dimensional random walk.

2.2 Walks with finite mean and the LLN
In this section we examine the particular case when µ has finite mean and show in particular
that the walk is recurrent whenever it is centered, otherwise it is transient and drifts. It will be
a good opportunity to wiggle around the strong and weak laws of large numbers, see Exercises
2.2 and 2.3.

2.2.1 Recurrence/transience

Theorem 2.3 (Dichotomy for walks with finite mean)


Suppose E[|X1 |] < ∞ then

(i) If E[X1 ] 6= 0 then (S) is transient and drifts,

(ii) otherwise if E[X1 ] = 0 then (S) is recurrent.

Proof. The first point (i) is easy since by the strong law of large numbers we have n^{−1} Sn →
E[X1] almost surely: when E[X1] ≠ 0 this automatically implies that |Sn| → ∞ and so (S) is
transient and drifts towards ±∞ depending on the sign of E[X1].
In the second case we still use the law of large numbers to deduce that Sn /n → 0 almost
surely as n → ∞. This implies that for any ε > 0 we have |Sn| ≤ εn eventually, and so for n
large enough

    Σ_{i=0}^{∞} 1_{|Si|≤εn} ≥ n with high probability, so that E[ Σ_{i=0}^{∞} 1_{|Si|≤εn} ] ≥ n/2.   (2.2)

We claim that this inequality is not compatible with transience. Indeed, according to (2.1), if
the walk (S) is transient then for some constant C > 0 we have
"∞ #
X
E 1Si =0 6 C.
i=0

If k0 ∈ Z, applying the strong Markov property at the stopping time τ = inf{i > 0 : Si = k0 }
we deduce that

    E[ Σ_{i=0}^{∞} 1_{Si=k0} ] = P(τ < ∞) E[ Σ_{i=0}^{∞} 1_{Si=0} ] ≤ C.

Hence, if the walk were transient we would have


"∞ # "∞ #
X X X
E 1|Si |6εn 6 E 1Si =k 6 2εnC,
i=0 −εn6k6εn i=0

which contradicts (2.2) for ε > 0 small enough. Hence the walk cannot be transient.

The theorem above can be seen as a particular example of the Kesten–Spitzer–Whitman
theorem saying that a random walk with independent increments on a group is transient if and
only if its range (i.e. the number of visited vertices) grows linearly with time. Here is an exercise
on this theme:
Exercise 2.2 (Recurrence and weak law of large numbers). Let (Sn )n>0 be a one-dimensional
random walk with step distribution µ on R.
1. Show that if n^{−1} · Sn → 0 in probability as n → ∞, then (S) is recurrent.

2. (*) By considering increments having law P(X = k) = P(X = −k) ∼ c/k², for some c > 0,
show that the converse is false.

2.2.2 Wald’s equality

Theorem 2.4 (Wald equality )


Suppose E[|X1 |] < ∞. Let τ be a stopping time with finite expectation. Then we have

E[τ ] · E[X1 ] = E[Sτ ].

Proof with martingales. We present a first proof based on martingale techniques. If we


denote by m the mean of X1 then clearly the process (Sn − nm)n>0 is a martingale for the
canonical filtration Fn = σ(X1 , . . . , Xn ). By the optional sampling theorem we deduce that

E[Sn∧τ ] = mE[n ∧ τ ]. (2.3)

Since τ is almost surely finite, we can let n → ∞ and get by monotone convergence that the
right-hand side tends to mE[τ]. However, to deduce that the left-hand side also converges towards
E[Sτ], one would need a domination. To get this, the trick is to reproduce the argument with
the process

    Yn = Σ_{i=1}^{n} |Xi| − m̃ n,

where m̃ = E[|X1|]. Then (Yn)_{n≥0} is again a martingale for the filtration (Fn). Notice that Yn
is also a martingale for its own filtration, but the former statement is stronger. We can then apply
the optional sampling theorem again to n ∧ τ and use monotone convergence on both sides to
get that E[ Σ_{i=1}^{τ} |Xi| ] = m̃ E[τ]. Clearly the variable Σ_{i=1}^{τ} |Xi| dominates all the variables |S_{n∧τ}| for
n ≥ 0. One can then use this domination to prove convergence of the left-hand side in (2.3).

Proof with the law of large numbers. We give a second proof based on the law of large numbers.
The idea is to iterate the stopping rule. Let τ = τ1 ≤ τ2 ≤ τ3 ≤ . . . be the successive stopping
times obtained formally as

    τ_{i+1} = τ_i((S_n)_{n≥0}) + τ((S_{n+τ_i})_{n≥0}),

for i ≥ 1. In particular (τ_{i+1} − τ_i ; S_{τ_{i+1}} − S_{τ_i})_{i≥1} are i.i.d. with the same law as (τ, S_τ). By the law of large
numbers (we suppose here that τ ≠ 0, otherwise the result is trivial) we get that

    S_{τ_i}/τ_i → E[X1]  almost surely as i → ∞.

On the other hand, since τ has finite expectation by assumption, applying once more the law of
large numbers we deduce that

    S_{τ_i}/i = (S_{τ_i}/τ_i) · (τ_i/i) → E[X1] · E[τ]  almost surely as i → ∞.

We then use the converse of the law of large numbers (see Exercise 2.3) to deduce that S_τ has
a finite expectation, equal to E[τ] · E[X1] as claimed by Wald1.
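As a sanity check of Wald's equality, one can estimate both sides by Monte Carlo for a concrete stopping time; the sketch below is ours, with an illustrative step law and the exit time of a bounded interval (which has finite expectation).

import random

def run_once(rng):
    """One trajectory: stop when the walk leaves the interval (-5, 5).
    Increments take values -1, 0, +2 with equal probability (mean 1/3)."""
    s, t = 0, 0
    while -5 < s < 5:
        s += rng.choice((-1, 0, 2))
        t += 1
    return s, t

rng = random.Random(0)
trials = 200_000
tot_s = tot_t = 0
for _ in range(trials):
    s, t = run_once(rng)
    tot_s += s
    tot_t += t

mean_step = (-1 + 0 + 2) / 3
print("E[S_tau]        ~", tot_s / trials)
print("E[tau] * E[X_1] ~", (tot_t / trials) * mean_step)  # the two should agree (Wald)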
Exercise 2.3 (Converse to the strong law of large numbers). Let (Xi)_{i≥0} be i.i.d. real random variables
and suppose that for some constant c ∈ R we have

    (X1 + · · · + Xn)/n → c  almost surely as n → ∞.

The goal is to show that the Xi have a finite first moment and E[X1] = c. For this we argue by
contradiction and suppose that E[|X|] = ∞.

(i) Show that Σ_{n≥1} P(|X| > n) = ∞.

(ii) Deduce that |Xn| > n for infinitely many n's.

(iii) Conclude.

(iv) By considering increments having law P(X = k) = P(X = −k) ∼ c/(k² log k) for some
c > 0, show that the converse of the weak law of large numbers does not hold, i.e. that the
convergence

    (X1 + · · · + Xn)/n → c  in probability as n → ∞

does not imply that E[|X|] < ∞.

2.3 Fourier transform


In this section, we use the Fourier transform to give a recurrence criterion as well as a local
version of the central limit theorem. Recall that if µ is the step distribution of a random walk
(S) on Z, then the Fourier2 transform of the measure µ is defined by

    µ̂(ξ) = E[e^{iξX1}],  for ξ ∈ R.

1
Abraham Wald 1902–1950

2
Jean Baptiste Joseph Fourier 1768–1830

To get information on the walk from µ̂, the main idea is of course to use Cauchy’s formula to
relate probabilities to integrals of powers of the Fourier transform:
    ∀ x ∈ Z,  P(Sn = x) = (1/2π) ∫_{−π}^{π} dξ e^{−iξx} Σ_{k∈Z} e^{iξk} P(Sn = k)                              (2.4)
                        = (1/2π) ∫_{−π}^{π} dξ e^{−iξx} E[e^{iξSn}] = (1/2π) ∫_{−π}^{π} dξ e^{−iξx} (µ̂(ξ))^n,

where we used the fact that E[e^{iξSn}] = (µ̂(ξ))^n by independence of the increments, and where
the interchange of series and integral is easily justified by dominated convergence.

2.3.1 Chung-Fuchs
In this section, we give a criterion for recurrence of a one-dimensional random walk based on its
Fourier transform. The criterion is valid mutatis mutandis for more general random walks with
values in Rd .
Theorem 2.5 (Easy version of Chung–Fuchs)
The one-dimensional walk (S) is recurrent if and only if we have
    lim_{r↑1} ∫_{−π}^{π} dξ Re( 1/(1 − r µ̂(ξ)) ) = ∞.

Proof. By (2.1), the walk (S) is recurrent if and only if the series Σ_{n≥0} P(Sn = 0) diverges.
Recall from (2.4) that

    P(Sn = 0) = (1/2π) ∫_{−π}^{π} dt E[e^{itSn}] = (1/2π) ∫_{−π}^{π} dt (µ̂(t))^n.

We are led to sum the last equality over n ≥ 0, but before that we first multiply by r^n for some
r ∈ [0, 1) in order to be sure that we can exchange series, expectation and integral. One gets

    Σ_{n≥0} r^n P(Sn = 0) = (1/2π) ∫_{−π}^{π} dt Σ_{n≥0} r^n (µ̂(t))^n = (1/2π) ∫_{−π}^{π} dt / (1 − r µ̂(t)).

Since the left-hand side is real, one can take the real part in the integral. Letting r ↑ 1, the first
series diverges if and only if Σ_{n≥0} P(Sn = 0) = ∞. This completes the proof of the theorem.

In fact, there is a stronger version of Theorem 2.5 which is obtained by formally interchanging
the limit and the integral in the last theorem: the random walk (S) is transient or recurrent
according to whether the real part of (1 − µ̂(t))^{−1} is integrable or not near 0 (we do not give
the proof). Notice that in the case when the law µ is symmetric (i.e. X ∼ −X when X ∼ µ)
then µ̂ is real-valued and the monotone convergence theorem shows that the recurrence criterion
of Theorem 2.5 indeed reduces to

    ∫_{≈0} dξ / (1 − µ̂(ξ)) = ∞.

Let us also state without proof a theorem of Shepp3 which shows the disturbing fact that
there exist recurrent one-dimensional random walks with arbitrarily fat tails (but not regularly
varying):
Theorem 2.6 (Shepp [23])
For any positive function ε(x) tending to 0 as x → ∞, there exists a symmetric step distri-
bution µ such that µ(R \ [−x, x]) ≥ ε(x) for any x ≥ 0 and such that the associated random
walk (S) is recurrent.

Exercise 2.4. Suppose µ is centered.

1. Show that µ̂(t) = 1 + o(t) as t → 0.

2. Give another proof of Theorem 2.3 using Theorem 2.5.

2.3.2 Local central limit theorem


The central limit theorem is one of the most important theorems in probability theory and says
in our context that the rescaled random walk Sn/√n converges in distribution towards a normal
law, provided that µ is centered and has finite variance. There are many proofs of this result, the
most standard being through the use of the Fourier transform and Lévy's criterion for convergence
in law4. We will see below that the central limit theorem can be “disintegrated” to get a more
powerful “local” version of it. The proof is again based on (2.4).

When a one-dimensional random walk with mean m and variance σ 2 satisfies a central limit
theorem we mean that for any a < b we have
    P( (Sn − nm)/√n ∈ [a, b] ) → ∫_a^b dx e^{−x²/(2σ²)} / √(2πσ²)   as n → ∞.
We say that we have a local central limit theorem if we can diminish the interval [a, b] as a
function of n until it contains just one point of the lattice, that is if for x ∈ Z we have

    P(Sn = x) = P( (Sn − nm)/√n ∈ [ (x − nm)/√n , (x − nm + 1)/√n ) ) ≈ (1/√(2πσ²)) e^{−(x−nm)²/(2nσ²)} · (1/√n).

It turns out that the necessary conditions for the central limit theorem are already sufficient to
get the local central limit theorem (the result extends to higher dimension and to the case of
random walks converging towards stable Lévy process with mutatis mutandis the same proof):

3
Lawrence Alan Shepp 1936–2013
4
Here are a couple of other proofs: Lindeberg swapping trick, method of moments, Stein method,
contraction method and Zolotarev metric. See the beautiful page by Terence Tao on this subject:
https://terrytao.wordpress.com/2010/01/05/254a-notes-2-the-central-limit-theorem/

Theorem 2.7 (Local central limit theorem, Gnedenko)
Let µ be a distribution supported on Z, aperiodic, with mean m ∈ R and with a finite
variance σ² > 0. If we denote by γσ(x) = (1/√(2πσ²)) e^{−x²/(2σ²)} the density of
the centered normal law of variance σ², then we have

    lim_{n→∞} sup_{x∈Z} n^{1/2} | P(Sn = x) − n^{−1/2} γσ( (x − nm)/√n ) | = 0.

The usual central limit theorem follows from its local version. Indeed, consider the random
variable S̃n = Sn + Un where Un is uniform over [0, 1] and independent of Sn. Then the
local central limit theorem shows that the law of (S̃n − nm)/√n is absolutely continuous with
respect to the Lebesgue measure on R, with a density fn which converges pointwise towards the density
of γσ. Scheffé's lemma (see Exercise 3.1) then implies that (S̃n − nm)/√n converges in law
towards γσ(dx), and similarly after removing the tilde. Before moving to the proof of the local
CLT, let us translate the aperiodicity condition on the Fourier transform:

Lemma 2.8. When µ is aperiodic we have

|µ̂(ξ)| < 1, for ξ ∈ (0, 2π).


Proof. Indeed, if we have |µ̂(ξ)| = |E[e^{iξX1}]| = 1, then we also have |µ̂(ξ)|^n = |E[e^{iξSn}]| = 1. This
implies, by the equality case in the triangle inequality, that all the e^{iξx} for x ∈ Supp(L(Sn)) are
(positively) aligned. Using the aperiodicity assumption, one can choose n large enough so that
the support of the law of Sn contains 0 and 1. This shows that ξ ≡ 0 modulo 2π.

Proof of the local central limit theorem. The starting point is again Cauchy's formula
relating probabilities to the Fourier transform:

    P(Sn = x) = (1/2π) ∫_{−π}^{π} dt e^{−ixt} E[e^{iSn t}] = (1/2π) ∫_{−π}^{π} dt e^{−ixt} (µ̂(t))^n.

Since |µ̂(t)| < 1 when t ≠ 0, the main contribution to the integral comes from the integration near
0: we shall apply Laplace's method. Since we want to use the series expansion of the Fourier
transform near 0, it is natural to introduce ν, the image measure of µ after translation by −m, so
that ν is centered and has finite variance: we can write ν̂(t) = 1 − (σ²/2)t² + o(t²) for t small. The
last display then becomes

    P(Sn = x) = (1/2π) ∫_{−π}^{π} dt e^{−ixt} e^{inmt} (ν̂(t))^n
              = (1/2π) ∫_{−π}^{π} dt e^{−ixt} e^{inmt} (1 − (σ²/2)t² + o(t²))^n
              = (1/√n) (1/2π) ∫_{−π√n}^{π√n} du e^{−iux/√n} e^{i√n m u} (1 − (σ²/(2n))u² + o(u²/n))^n,

where the last factor in the integrand is ≈ γ_{1/σ}(u).

We can then approximate the last integral by

    (1/√n) (1/2π) ∫_{−∞}^{∞} du e^{−iux/√n} e^{i√n m u} γ_{1/σ}(u) = (1/√(2πn)) E[ exp( i (√n m − x/√n) N/σ ) ],

where N denotes a standard normal variable. Using the identity E[e^{itN}] = e^{−t²/2}, the last display
is indeed equal to γσ((x − nm)/√n)/√n as desired. It remains to quantify the last approximation.
The error made in the approximation is clearly bounded above by the sum of the two terms:

    A = (1/√n) (1/2π) ∫_{|u|>π√n} du γ_{1/σ}(u),

    B = (1/√n) (1/2π) ∫_{|u|<π√n} du | γ_{1/σ}(u) − (ν̂(u/√n))^n |.

The first term A causes no problem since it is exponentially small (of the order of e^{−n}), hence
negligible in front of 1/√n. The second term may be further bounded above by the sum of three
terms:

    B ≤ (1/√n) ∫_{|u|<n^{1/4}} du | γ_{1/σ}(u) − (ν̂(u/√n))^n |
      + (1/√n) ∫_{n^{1/4}<|u|<π√n} du γ_{1/σ}(u)
      + (1/√n) ∫_{n^{1/4}<|u|<π√n} du | ν̂(u/√n) |^n.

The first of these terms is shown to be o(n^{−1/2}) using dominated convergence: in the region
considered for u, the integrand converges pointwise to 0; for the domination we may use the fact
that for |u| < ε√n we have, by the expansion of ν̂, that |ν̂(u/√n)|^n ≤ (1 − σ²u²/(4n))^n ≤ e^{−σ²u²/4}. The
second term of the sum is handled as above and seen to be of order e^{−√n}. For the third term, we
bound the integrand by |ν̂(u/√n)|^n ≤ e^{−σ²u²/4} for |u| < ε√n, and for ε√n < |u| < π√n we use
the fact that |µ̂(x)| ≤ c < 1 for all x ∈ [ε, π] by aperiodicity. The sum of the three terms is then
of negligible order compared to n^{−1/2}, as desired.
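The local central limit theorem can be checked numerically on a small example; the sketch below is ours (the aperiodic step law is illustrative): it computes the exact law of Sn by repeated convolution and compares P(Sn = x) with n^{−1/2} γσ((x − nm)/√n).

import math

# Step law: P(X = -1) = 0.2, P(X = 0) = 0.3, P(X = 1) = 0.5 (aperiodic), with mean m and variance var.
law = {-1: 0.2, 0: 0.3, 1: 0.5}
m = sum(k * p for k, p in law.items())
var = sum(k * k * p for k, p in law.items()) - m * m

def convolve(dist, step):
    """One more i.i.d. increment: convolution of two discrete laws."""
    out = {}
    for a, pa in dist.items():
        for b, pb in step.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out

n = 400
dist = {0: 1.0}
for _ in range(n):
    dist = convolve(dist, law)

def gaussian_approx(x):
    """n^{-1/2} * gamma_sigma((x - n*m)/sqrt(n))."""
    return math.exp(-(x - n * m) ** 2 / (2 * n * var)) / math.sqrt(2 * math.pi * n * var)

for x in (int(n * m) + d for d in (-20, 0, 20)):
    print(x, round(dist.get(x, 0.0), 6), round(gaussian_approx(x), 6))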

Bibliographical notes. The material in this chapter is standard and can be found in many
textbooks, see e.g. [24, Chapter II], [5, Chapter 8] or [13, 18]. The Fourier transform is a
remarkable tool (whose efficiency is sometimes a bit mysterious) to study random walk with
independent increments. The results of this chapter are mostly based on [24, Chapter II]. The
local central limit theorem is valid in the much broader context of random walks converging
towards stable Lévy processes, see Gnedenko’s local limit theorem in [12, Theorem 4.2.1], and
can be sharpened when we have further moment assumptions, see [19].

Chapter III: Skip-free random walks

A simple but powerful observation

In this chapter we still consider a one-dimensional random walk (S) based on i.i.d. increments
of law µ (whose support is not contained in Z≥0 nor in Z≤0), and we furthermore assume that the walk is
skip-free, which means that

    Supp(µ) ⊂ {−1, 0, 1, 2, 3, . . . }.

In other words, the only negative steps of (S) are steps of size −1. We shall see that some
magic happens for such walks. Let us start by drawing a consequence of the last chapter: the
expectation

    m = Σ_{k≥−1} k µ_k

is always well defined and belongs to [−1, ∞], and so by Theorem 2.3 the walk is recurrent if
m = 0 and drifts otherwise.

3.1 Duality and the cycle lemma


In this section we perform two simple combinatorial operations on paths, namely reversing
them and applying cyclic shifts.

3.1.1 Duality
We begin with a simple but surprisingly important observation called duality. This is valid for
any random walk, not necessarily skip-free.

Proposition 3.1 (Duality). For each fixed n > 0, we have the following equality in distribution
    (0 = S0, S1, . . . , Sn) = (Sn − Sn, Sn − Sn−1, Sn − Sn−2, . . . , Sn − S1, Sn − S0)   in distribution.

Proof. It suffices to notice that the increments of the walk (Sn −Sn−1 , Sn −Sn−2 , . . . , Sn −S1 , Sn −
S0 ) are just given by (Xn , Xn−1 , . . . , X1 ) which obviously has the same law as (X1 , . . . , Xn ) since
the (Xi )i>1 are i.i.d. hence exchangeable.

22
Figure 3.1: Geometric interpretation of the duality: the rotation by angle π of the first
n steps of the walk (S) leaves the distribution invariant.

For A ⊂ Z, we denote by TA = inf{i > 0 : Si ∈ A} the first hitting time of A by the walk.
Then the previous proposition shows (make a drawing) that for any n ≥ 0

    P(T_{Z<0} > n) = P(S0 = 0, S1 ≥ 0, . . . , Sn ≥ 0)
                   = P(Sn − Sn = 0, Sn − Sn−1 ≥ 0, . . . , Sn − S0 ≥ 0)        (by duality)
                   = P(Sn ≥ Sn−1, Sn ≥ Sn−2, . . . , Sn ≥ S0) = P(n is a new record time for the walk).

Summing over n ≥ 0 we deduce that

    Σ_{n≥0} P(T_{Z<0} > n) = E[T_{Z<0}] = E[#record times].   (3.1)

However, it is easy to see that the number of (weak ascending) record times is a geometric
random variable which is finite almost surely if and only if the walk drifts towards −∞.
In particular, for a skip-free random walk, when T_{Z<0} < ∞ we must have S_{T_{Z<0}} = −1, and so when
m < 0 we have, by Theorem 2.4,

    E[T_{Z<0}] = 1/|m|.

3.1.2 Cycle lemma


The following theorem has many names, equivalent forms and ramifications in the probabilistic
and combinatorial literature (Kemperman’s formula, Otter–Dwass formula, Feller1 combinatorial
lemma, Désiré André cycle lemma, Lagrange inversion formula...). We shall start with the
following deterministic statement:
Let x1 , x2 , . . . , xn ∈ {−1, 0, 1, . . . } be integers which we consider as the increments of the
skip-free walk (s) defined by

s0 = 0, s1 = x1 , s2 = x1 + x2 , ... , sn = x1 + · · · + xn .

1
William Feller 1906–1970, born Vilibald Srećko Feller

If ℓ ∈ {0, 1, 2, . . . , n − 1} we consider (s^{(ℓ)}), the ℓ-th cyclic shift of the walk, obtained by cyclically
shifting its increments ℓ times, that is

    s^{(ℓ)}_0 = 0,  s^{(ℓ)}_1 = x_{ℓ+1},  . . . ,  s^{(ℓ)}_{n−ℓ} = x_{ℓ+1} + · · · + x_n,  . . . ,  s^{(ℓ)}_n = x_{ℓ+1} + · · · + x_n + x_1 + · · · + x_ℓ.

Lemma 3.2 (Feller). Suppose that sn = −k for k > 1. Then there are exactly k cyclic shifts
(s(`) ) with ` ∈ {0, 1, 2, . . . , n − 1} for which time n is the first hitting time of −k by the walk
(s(`) ).

Proof. Let us first prove there is at least one such cycle shift. For this consider the first time
` ∈ {1, 2, . . . , n} such that the walk (s) reaches its minimum. Then clearly, after performing the
cycle shift at that time, the new walk stays above −k over {0, 1, . . . , n − 1}, see Figure 3.2 below.

Ming t'
^

*
¥Æ •

Figure 3.2: Re-rooting the walk at the hitting time of the minimum gives a walk

:ËÆü
reaching its minimum at time n.

We can thus suppose without loss of generality that time n is the hitting time of −k by the
walk. It is now clear (again see Figure 3.3 below) that the only possible cyclic shifts of the
walk such that the resulting walk first hits −k at time n correspond to the first hitting times of
−1, −2, . . . , −k by the walk (we use the skip-free property here to see that these hitting times all occur).

Remark 3.1. Beware, Feller's combinatorial lemma does not say that the cyclic shifts (s^{(ℓ)}) are
distinct. Indeed, in the action of Z/nZ on {(s^{(ℓ)}) : ℓ ∈ {0, 1, . . . , n − 1}} by cyclic shift, the size
of the orbit is equal to n/j where j | n is the cardinality of the subgroup stabilizing (s^{(0)}). In our
case, it is easy to see that j must also divide k, and in this case there are only k/j distinct cyclic
shifts having n as the first hitting time of −k.

Figure 3.3: The only possible allowed cyclic shifts correspond to hitting times of
−1, −2, . . . , −k.

3.2 Kemperman’s formula and applications


3.2.1 Kemperman’s formula
For k ∈ Z, if Tk = inf{n > 0 : Sn = k} is the first hitting time of k by the walk, then an easy
corollary of the cycle lemma is Kemperman’s2 formula:

Proposition 3.3 (Kemperman’s formula). Let (S) be a skip-free random walk. Then for every
n ≥ 1 and every k ≥ 1 we have

    (1/n) P(Sn = −k) = (1/k) P(T−k = n).
Proof. Let us first re-write Lemma 3.2 as a single equation:

    1_{s_n=−k} = (1/k) Σ_{i=0}^{n−1} 1_{T_{−k}(s^{(i)})=n}.

Indeed, if the walk (s) is such that s_n = −k then there are exactly k shifts for which the
indicator functions on the right-hand side are non-zero. Since we divide by k, the total sum
is one. We then take expectations when (s) = (S) is a one-dimensional random walk with i.i.d.
increments. Using the fact that for all 0 ≤ i ≤ n − 1 we have (S^{(i)}_j)_{0≤j≤n} = (S_j)_{0≤j≤n} in
distribution, we deduce Kemperman's formula.

Remark 3.2. Actually, Kemperman's formula (and its later corollaries) holds true even
when the increments of the walk (S) are not necessarily i.i.d. Indeed, to apply duality and cyclic
symmetry, one just needs that the law of the increments X1, . . . , Xn be invariant under cyclic shift and
under time reversal, i.e.

    (X1, . . . , Xn) = (Xn, . . . , X1)  and  (X1, . . . , Xn) = (X2, X3, . . . , Xn, X1)  in distribution.

2
Johannes Henricus Bernardus Kemperman 1924–2011

Remark 3.3. Combining Kemperman’s formula with the local central limit theorem, we deduce
that if (S) is an aperiodic skip-free random walk with centered increments having finite variance
σ², then we have

    P(T−1 = n) ∼ (1/√(2πσ²)) · n^{−3/2},  as n → ∞.
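Kemperman's formula itself is easy to test by exact computation on a small skip-free law; in the sketch below (our own illustration, with a hypothetical step law) the left-hand side is obtained by convolution and the right-hand side by a dynamic-programming computation of the first hitting time.

# Step law of a skip-free (descending) walk: steps in {-1, 0, 1, 2}.
mu = {-1: 0.4, 0: 0.2, 1: 0.25, 2: 0.15}

def unconstrained(n):
    """Exact law of S_n by repeated convolution."""
    dist = {0: 1.0}
    for _ in range(n):
        new = {}
        for x, px in dist.items():
            for s, ps in mu.items():
                new[x + s] = new.get(x + s, 0.0) + px * ps
        dist = new
    return dist

def first_hit(n, k):
    """P(T_{-k} = n): the walk stays above -k before time n and sits at -k at time n."""
    alive = {0: 1.0}                     # law of S_t restricted to {S_j > -k for j <= t}
    for _ in range(n - 1):
        new = {}
        for x, px in alive.items():
            for s, ps in mu.items():
                y = x + s
                if y > -k:               # still strictly above -k: the walk survives
                    new[y] = new.get(y, 0.0) + px * ps
        alive = new
    # last step: by skip-freeness, hitting -k at time n means being at -k+1 and stepping by -1
    return alive.get(-k + 1, 0.0) * mu[-1]

n, k = 12, 3
lhs = unconstrained(n).get(-k, 0.0) / n
rhs = first_hit(n, k) / k
print(lhs, rhs)   # the two sides of Kemperman's formula coincide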

3.2.2 Simple symmetric random walk


Let us give a first application of this formula in the case of the symmetric simple random
walk, whose step distribution is (δ1 + δ−1)/2. For parity reasons T−1 must be odd, and by
Kemperman's formula we have

    P(T−1 = 2n − 1) = (1/(2n−1)) P(S_{2n−1} = −1) = (1/(2n−1)) (2n−1 choose n) 2^{−(2n−1)} = ((2n−2)!/(n!(n−1)!)) 2^{−2n+1}.   (3.2)

As an application of this formula, we can prove the famous arcsine law 3 :

Proposition 3.4 (1st Arcsine law). Let (S) be the simple symmetric random walk on Z. We
put Kn = inf{0 ≤ k ≤ n : Sk = sup_{0≤i≤n} Si}, then

    Kn/n → dx/(π√(x(1 − x))) · 1_{x∈(0,1)}   in distribution as n → ∞.

The name arcsine comes from the cumulative distribution function of the right-hand side, which
is (2/π) arcsin(√x).


Figure 3.4: The arcsine distribution

3
there are at least three arcsine laws in the theory of random walks...

Remark 3.4. Quoting Feller “Contrary to intuition, the maximum accumulated gain is much
more likely to occur towards the very beginning or the very end of a coin-tossing game than
somewhere in the middle.”
Proof. Putting T0+ = inf{n > 0 : Sn = 0} for the first return time to 0, using duality we can
compute exactly, for k ∈ {1, 2, . . .},

    P(Kn = k) = P(T−1 > n − k) P(T0+ > k)
              = (1/2) P(T−1 > n − k) P(T−1 > k − 1).
Using (3.2) and Stirling's formula, the last display is shown to be equivalent to 1/(π√(k(n − k))), where
the asymptotics hold as k and n − k tend to ∞. Let us add a little blur to Kn and consider
K̃n = Kn + Un, where Un is independent of Kn and uniformly distributed over [0, 1]. Then clearly
K̃n/n has a density with respect to the Lebesgue measure which converges pointwise towards the
density of the arcsine law. It follows from Scheffé's lemma that K̃n/n converges in total variation
towards the arcsine law, and consequently Kn/n converges in distribution towards the arcsine
law since Un/n → 0 in probability.
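The arcsine law can be observed numerically; the sketch below is our own (parameters are illustrative): it simulates Kn/n for the simple random walk and compares its empirical distribution function with (2/π) arcsin(√x).

import math
import random

def argmax_time(n, rng):
    """K_n = first time at which the simple random walk attains its running maximum over [0, n]."""
    s, best, k = 0, 0, 0
    for t in range(1, n + 1):
        s += rng.choice((-1, 1))
        if s > best:          # a new strict record: candidate for the time of the overall maximum
            best, k = s, t
    return k

rng = random.Random(0)
n, trials = 1000, 5000
samples = [argmax_time(n, rng) / n for _ in range(trials)]

for x in (0.1, 0.25, 0.5, 0.75, 0.9):
    empirical = sum(1 for v in samples if v <= x) / trials
    arcsine = 2 / math.pi * math.asin(math.sqrt(x))
    print(x, round(empirical, 3), round(arcsine, 3))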

Exercise 3.1 (Scheffé’s lemma). Let Xn , X be random variables having densities fn , f with
respect to a background measure π. We suppose that fn → f pointwise π-almost everywhere.
Prove that

(i) fn → f in L1 (π).

(ii) dT V (fn dπ, f dπ) → 0 where dTV is the total variation distance,

(iii) deduce that Xn → X in distribution.

Exercise 3.2. Extend the arcsine law to any centered random walk with finite variance.

3.2.3 Poisson random walk


Another explicit application of Kemperman's formula is obtained by considering a random walk
(S) with step distribution given by the law of Pα − 1, where Pα is a Poisson random variable of
parameter α, namely

    P(X = k − 1) = e^{−α} α^k / k!,   for k ≥ 0.

Clearly, if α > 1 then the walk is transient and drifts towards +∞. Using the additivity property
of independent Poisson laws, applying Kemperman's formula yields

    P(T−1 = n) = (1/n) P(Sn = −1) = (1/n) P(P_{nα} = n − 1) = e^{−αn} (αn)^{n−1} / n!.

Definition 3.1. For α ∈ [0, 1], the Borel–Tanner distribution ξα is the law on Z_{>0} given by

    ξα(n) = e^{−αn} (αn)^{n−1} / n!,   for n ≥ 1.
Exercise 3.3. Do you have an elementary way to see that the above display is a probability
distribution?
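Without spoiling the exercise, one can at least check numerically that ξα has total mass 1; a minimal sketch of ours (the truncation point is arbitrary):

import math

def borel_tanner_mass(alpha, nmax=200000):
    """Partial sum of xi_alpha(n) = exp(-alpha*n) * (alpha*n)^(n-1) / n! for n = 1..nmax."""
    total = 0.0
    for n in range(1, nmax + 1):
        log_term = -alpha * n + (n - 1) * math.log(alpha * n) - math.lgamma(n + 1)
        total += math.exp(log_term)
    return total

for alpha in (0.3, 0.7, 0.95, 1.0):
    print(alpha, round(borel_tanner_mass(alpha), 6))
# For alpha < 1 the total mass is 1; at alpha = 1 the tail decays like n^(-3/2), so the
# truncated sum approaches 1 only slowly.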

3.3 Ballot theorems


Let us now turn our attention to ballot theorems, where we require a positivity constraint on
the walk. In the following we say that (S) is skip-free ascending if (−S) is a skip-free random
walk.

3.3.1 Ballot theorem


Lemma 3.5. Let (S) be a skip-free ascending random walk. Then for every n ≥ 1 and every
k ≥ 1 we have

    P(Si > 0, ∀ 1 ≤ i ≤ n | Sn = k) = k/n.

Proof. By duality and Kemperman's formula (Proposition 3.3) we have

    P(Si > 0, ∀ 1 ≤ i ≤ n and Sn = k) = P(T−k = n) = (k/n) P(Sn = k).

Let us give an immediate application which is useful during election days:


Theorem 3.6 (Ballot theorem)
During an election, candidates A and B respectively receive a > b votes. Suppose that the
ballots are mixed uniformly at random in the urn. What is the chance that during the counting of the votes,
candidate A is always strictly ahead?

    Answer: (a − b)/(a + b).

Proof. Let us model the scenario by a uniform path making only +1 or −1 steps which starts
at (0, 0) and ends at (a + b, a − b). The +1 steps correspond to votes for candidate A and the
−1 steps to votes for B. Hence we ask for the probability that such a path stays positive
between time 1 and a + b. We simply remark that such a walk has the same distribution as the
first a + b steps of a simple random walk (Sk)_{0≤k≤a+b} (with ±1 steps of equal probability)
conditioned on Sa+b = a − b: indeed, each ±1 path starting at (0, 0) and ending at (a + b, a − b)
has the same probability under that distribution. The conclusion follows from Lemma 3.5 and
Remark 3.2.
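A quick Monte Carlo check of the ballot theorem (our own sketch, with hypothetical vote counts):

import random

def always_ahead(a, b, rng):
    """Shuffle a '+1' ballots (A) and b '-1' ballots (B); is A strictly ahead throughout the count?"""
    ballots = [1] * a + [-1] * b
    rng.shuffle(ballots)
    lead = 0
    for x in ballots:
        lead += x
        if lead <= 0:
            return False
    return True

rng = random.Random(0)
a, b, trials = 60, 40, 50000
est = sum(always_ahead(a, b, rng) for _ in range(trials)) / trials
print(est, (a - b) / (a + b))   # Monte Carlo estimate vs the ballot formula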

3.3.2 Staying positive forever
If (S) is a one-dimensional random walk with integrable increments with positive mean then by
the law of large numbers, the probability that the walk stays positive after time 1 is strictly
positive. We compute below this probability in the case of skip-free ascending and skip-free
(descending) walks:

Corollary 3.7. If (S) is skip-free ascending such that E[S1 ] > 0 then we have

P(Si > 0 : ∀i > 1) = E[S1 ].

Proof. We have

    P(Si > 0 : ∀ i ≥ 1) = lim_{n→∞} P(Si > 0 : ∀ 1 ≤ i ≤ n)
                        = lim_{n→∞} E[ P(Si > 0 : ∀ 1 ≤ i ≤ n | Sn) ]
                        = lim_{n→∞} E[ (Sn/n) 1_{Sn>0} ] = E[S1],      (by Lemma 3.5)

by the strong law of large numbers (since Sn/n → E[S1] almost surely and in L¹).

Proposition 3.8. If (S) is skip-free descending (with µ ≠ δ0) then P(Sn ≥ 0, ∀ n ≥ 0) = 1 − α,
where α is the smallest solution in [0, 1] of the equation

    α = Σ_{k=−1}^{∞} µ_k α^{k+1}.   (3.3)

Proof. Since µ is supported by {−1, 0, 1, . . . } its mean m is well defined and belongs to [−1, ∞].
We already know from the previous chapter that P(Sn ≥ 0, ∀ n ≥ 0) > 0 if and only if m > 0
(we use here the fact that the walk is not constant since µ ≠ δ0). We denote by τ<0 the hitting
time of {. . . , −3, −2, −1} by the walk (S). Notice that by our assumptions, if τ<0 is finite then
necessarily S_{τ<0} = −1. To get the equation of the proposition we perform one step of the random
walk S: if S1 = −1 then τ<0 < ∞. Otherwise, if S1 ≥ 0, consider the stopping times

    θ0 = 0,  θ1 = inf{k ≥ 1 : Sk = S1 − 1},  θ2 = inf{k ≥ θ1 : Sk = S1 − 2},  . . . .

By the strong Markov property we see that (θ_{i+1} − θ_i)_{i≥0} are i.i.d. with the law of τ<0. Furthermore, on
the event S1 ≥ 0 we have

    {τ<0 < ∞} = ∩_{n=0}^{S1} {θ_{n+1} − θ_n < ∞}.

Taking expectations, we deduce that P(τ<0 < ∞) is indeed a solution to (3.3). Now, notice that
F : α ↦ Σ_{k=−1}^{∞} µ_k α^{k+1} is a convex function on [0, 1] which always admits 1 as a fixed point.
Since F′(1) = m + 1, we deduce that F admits two fixed points in the case m > 0. But when
m > 0 we already know that α < 1, and so α must be equal to the smallest solution of (3.3).
Exercise 3.4. Let (S) be a skip-free descending random walk which drifts towards +∞. Compute
the law of inf{Sk : k > 0}.
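Equation (3.3) is easy to solve numerically by fixed-point iteration started at 0, which converges to the smallest fixed point. The sketch below is ours and treats a walk with steps Pλ − 1 as in Section 3.2.3 (we write λ for the Poisson parameter to avoid clashing with the fixed-point variable α), for which (3.3) reads α = e^{λ(α−1)}.

import math

def smallest_fixed_point(F, tol=1e-12):
    """Iterate alpha -> F(alpha) from 0; for a convex generating-function-type map this
    converges monotonically to the smallest fixed point in [0, 1]."""
    alpha = 0.0
    while True:
        nxt = F(alpha)
        if abs(nxt - alpha) < tol:
            return nxt
        alpha = nxt

# Skip-free descending walk with steps P_lambda - 1; equation (3.3) is alpha = exp(lambda*(alpha-1)).
for lam in (1.2, 1.5, 2.0, 3.0):
    alpha = smallest_fixed_point(lambda a: math.exp(lam * (a - 1)))
    print(lam, round(1 - alpha, 4))   # P(S_n >= 0 for all n)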

3.4 Parking on the line
In this section we give a nice application of skip-free random walk to the parking problem
on the line.

3.4.1 The model


Imagine an oriented discrete line with n vertices, the parking spots (each vertex can accommodate
at most one car). Cars labeled from 1 up to m arrive sequentially on those vertices and
try to park at their arrival node. If the parking spot is occupied, the car drives towards the left
of the line and parks on the first available spot. If it does not find a free spot, the car exits
the parking lot.

Figure 3.5: Illustration of the parking of 6 cars. Notice that car number 6 did not
manage to find a spot and exited the parking lot.

An easy abelian property shows that the unlabeled final configuration as well as the number
of cars exiting the parking lot does not depend on the order in which we try to park the cars.
In our random model, we imagine that the m cars pick their arrival vertices in
{1, 2, . . . , n} independently and uniformly at random. Of course, as soon as m > n, it is impossible that all cars
park.

3.4.2 Parking probability


We shall prove the following theorem due to Konheim and Weiss:
Theorem 3.9 (Konheim & Weiss [14])
Imagine that m cars try to park uniformly and independently on n vertices. The probability
that they all manage to park is equal to
    ((n + 1 − m)/(n + 1)) · (1 + 1/n)^m.

In particular, if m = [αn] with α ∈ (0, 1) the last probability converges to (1 − α)e^α as
n → ∞.

Proof. The idea is to encode the parking situation by a (random) walk (S). Specifically, each
vertex receiving k cars corresponds to an increment of the walk of 1 − k. The path we obtain
this way is clearly skip-free ascending. By construction of the coding, for any i ∈ {1, . . . , n} the
value of the walk at time i is equal to i minus the number of cars arriving on vertices on the left
of it. It is easy to see that full parking for the cars corresponds to the fact that the walk stays
non-negative until time n.

Figure 3.6: Encoding the parking configuration by a skip-free ascending random walk.

In our probabilistic model, where the m cars choose their arrival vertices independently and uniformly,
the increments of the walk (S) are not independent. However, we clearly have Sn = n − m
and the increments of this walk are exchangeable; we can thus apply Lemma 3.5 (see Remark 3.2).
The slight problem is that Lemma 3.5 evaluates the probability that the walk stays
positive, whereas we need the probability that it stays non-negative. To get around this problem,
we imagine that we add an (n + 1)-th vertex at the left extremity of the line. Clearly, each
successful parking configuration on {1, . . . , n} corresponds to a single configuration where the
m cars choose to park on vertices in {1, . . . , n} and the vertex n + 1 is empty at the end.
In terms of the random walk, we precisely ask that it stays positive. Hence, by Lemma 3.5, the
number of successful parking configurations with m drivers and n spots is equal to

    ((n + 1 − m)/(n + 1)) · (n + 1)^m.
The theorem follows immediately.
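The Konheim–Weiss formula can also be checked by direct simulation of the parking process; a minimal sketch of ours, with hypothetical values of n and m:

import random

def all_cars_park(n, m, rng):
    """m cars pick arrival spots uniformly in {0, ..., n-1}; a car drives to the left
    (towards smaller indices) until it finds a free spot, and is lost otherwise."""
    occupied = [False] * n
    for _ in range(m):
        spot = rng.randrange(n)
        while spot >= 0 and occupied[spot]:
            spot -= 1
        if spot < 0:
            return False        # this car exits the parking lot
        occupied[spot] = True
    return True

rng = random.Random(0)
n, m, trials = 30, 20, 50000
est = sum(all_cars_park(n, m, rng) for _ in range(trials)) / trials
exact = (n + 1 - m) / (n + 1) * (1 + 1 / n) ** m
print(round(est, 4), round(exact, 4))   # simulation vs the Konheim-Weiss formula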

Bibliographical notes. The study of skip-free random walks may be seen as a particular case
of fluctuation theory for random walks (related to the Wiener–Hopf factorisation), see e.g. [17]
for a more trajectorial approach. The combinatorial approach taken here and based on the
cycle lemma is adapted from [10, Chapter XII] and [5, Section 8.4]; it has many ramifications
in the combinatorial literature, see [1] for much more about ballot theorems and [8] for parking
functions. In general, path transformations are very useful tools in fluctuation theory for
random walks. In particular, we mention the Sparre-Andersen identity relating the position of
the maximum and the time spent on the positive half-line for a random walk of length n, see
[10, Chapter XII] for more details. More recent applications of fluctuation theory for random
walks can be found e.g. in [2, 21].

Chapter IV: Galton-Watson trees
Will my family name survive?

In this chapter we use our knowledge of one-dimensional random walks to study random trees coding for the genealogy of a population in which individuals reproduce independently of each other according to the same offspring distribution. These are the famous Galton–Watson trees.

Figure 4.1: A large Galton–Watson tree and its contour function.

4.1 Plane trees and Galton–Watson processes


4.1.1 Plane trees
Throughout this work we will use the standard formalism for plane trees as found in [22]. Let

$$U = \bigcup_{n=0}^{\infty} (\mathbb{N}^{*})^{n}$$

where N∗ = {1, 2, . . .} and (N∗)^0 = {∅} by convention. An element u of U is thus a finite sequence of positive integers. We let |u| be the length of the word u. If u, v ∈ U, uv denotes the concatenation of u and v. If v is of the form uj with j ∈ N∗, we say that u is the parent of v or

that v is a child of u. More generally, if v is of the form uw, for u, w ∈ U, we say that u is an
ancestor of v or that v is a descendant of u.
Definition 4.1. A plane tree τ is a (finite or infinite) subset of U such that

1. ∅ ∈ τ (∅ is called the root of τ),

2. if v ∈ τ and v ≠ ∅, the parent of v belongs to τ,

3. for every u ∈ U there exists k_u(τ) ∈ {0, 1, 2, . . . } ∪ {∞} such that uj ∈ τ if and only if j ≤ k_u(τ).

Figure 4.2: A finite plane tree.

A plane tree can be seen as a graph, in which an edge links two vertices u, v such that u is the parent of v or vice-versa. Notice that with our definition, vertices of infinite degree are allowed since k_u may be infinite. When all degrees are finite, the tree is said to be locally finite. In this case, this graph is of course a tree in the graph-theoretic sense, and has a natural embedding in the plane, in which the edges from a vertex u to its children u1, . . . , u k_u(τ) are drawn from left to right. All the trees considered in these pages are plane trees. The integer |τ| denotes the number of vertices of τ and is called the size of τ. For any vertex u ∈ τ, we denote the shifted tree at u by σ_u(τ) := {v ∈ U : uv ∈ τ}. If a and b are two vertices of τ, we denote the set of vertices along the unique geodesic path going from a to b in τ by [[a, b]].

Definition 4.2. The set U is a plane tree where ku = ∞, ∀u ∈ U. It is called Ulam’s tree.

4.1.2 Galton–Watson trees


Let µ be a distribution on {0, 1, 2, . . . } which we usually suppose to be different from δ1 . In-
formally speaking, a Galton–Watson1 tree with offspring distribution µ is a random (plane)

1
Francis Galton (1822-1911) and Henry William Watson (1827-1903)

tree coding the genealogy of a population starting with one individual and where all individu-
als reproduce independently of each other according to the distribution µ. Here is the proper
definition:
Definition 4.3. Let K_u for u ∈ U be independent and identically distributed random variables of law µ. We let T be the random plane tree made of all u = u_1 u_2 · · · u_n ∈ U such that u_i ≤ K_{u_1 ··· u_{i−1}} for all 1 ≤ i ≤ n. Then the law of T is the µ-Galton–Watson distribution.

Notice that the random tree defined above may very well be infinite.
Exercise 4.1. Show that the random tree T constructed above has the following branching property: conditionally on k_∅(T) = ℓ ≥ 1, the ℓ random trees σ_i(T) for 1 ≤ i ≤ ℓ are independent and distributed as T.
Exercise 4.2. If τ_0 is a finite plane tree and if T is a µ-Galton–Watson tree then
$$P(T = \tau_0) = \prod_{u \in \tau_0} \mu_{k_u(\tau_0)}.$$

We now link the Galton–Watson tree to the well-known Galton–Watson process. We first recall its construction. Let (ξ_{i,j} : i ≥ 0, j ≥ 1) be i.i.d. random variables of law µ. The µ-Galton–Watson process is defined by setting Z_0 = 1 and for i ≥ 0
$$Z_{i+1} = \sum_{j=1}^{Z_i} \xi_{i,j}.$$

It is then clear from our constructions that if T is a µ-Galton–Watson tree, then the process
Xn = #{u ∈ T : |u| = n} has the law of a µ-Galton–Watson process.
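As a quick illustration, here is a Python sketch (ours, not part of the notes) that simulates the generation sizes (Z_0, Z_1, . . . ) directly from the recursion above, assuming for simplicity that µ has finite support and is given as a list of probabilities.

```python
import random

def sample_gw_generations(mu, n_gen, rng):
    """Sample (Z_0, ..., Z_{n_gen}) of a Galton-Watson process whose offspring
    law mu is given as a list of probabilities (mu[k] = P(offspring = k))."""
    ks = list(range(len(mu)))
    z = 1
    sizes = [z]
    for _ in range(n_gen):
        z = sum(rng.choices(ks, weights=mu)[0] for _ in range(z))
        sizes.append(z)
        if z == 0:                          # extinction: all later Z_i equal 0
            sizes += [0] * (n_gen + 1 - len(sizes))
            break
    return sizes[: n_gen + 1]

rng = random.Random(1)
mu = [0.25, 0.5, 0.25]                      # a critical offspring law (mean 1)
for _ in range(5):
    print(sample_gw_generations(mu, 10, rng))
```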

4.2 Lukasiewicz walk and direct applications


In this section we will encode (finite) trees via one-dimensional walks. This will enable us
to get information on random Galton–Watson trees from our previous study of one-dimensional
random walks.

4.2.1 Lukasiewicz walk


The lexicographical order < on U is defined as the reader may imagine: if u = i_1 i_2 . . . i_k and v = j_1 j_2 . . . j_k are two words of the same length then u < v if i_ℓ < j_ℓ where ℓ is the first index such that i_ℓ ≠ j_ℓ. The breadth first order on U is defined by u ≺ v if |u| < |v|, and if the two words are of the same length then we require u < v (for the lexicographical order).

Definition 4.4. Let τ be a locally finite tree (i.e. k_u(τ) < ∞ for every u ∈ τ). Write u_0, u_1, . . . for its vertices listed in the breadth first order. The Lukasiewicz walk W(τ) = (W_n(τ), 0 ≤ n ≤ |τ|) associated to τ is given by W_0(τ) = 0 and for 0 ≤ n ≤ |τ| − 1:
$$W_{n+1}(\tau) = W_n(\tau) + k_{u_n}(\tau) - 1.$$

Figure 4.3: Left: a finite plane tree and its vertices listed in breadth-first order. Right: its associated Lukasiewicz walk.

In words, the Lukasiewicz walk consists in listing the vertices in breadth first order and making a stack by adding the number of children of each vertex and subtracting one (for the current vertex). Since, in a finite plane tree, the total number of children is equal to the number of vertices minus one, it should be clear that the Lukasiewicz walk, which starts at 0, stays non-negative until it finishes at the first time it touches the value −1. Note also that this walk (or more precisely its opposite) is skip-free in the sense of Chapter 3 since W_{i+1}(τ) − W_i(τ) ≥ −1 for any 0 ≤ i ≤ |τ| − 1. When the tree is infinite but locally finite, every vertex of the tree appears in the breadth first ordering2 and the Lukasiewicz path stays non-negative forever. We leave the following exercise to the reader:
Exercise 4.3. Let T_lf be the set of all finite and infinite locally finite plane trees. Let W_lf be the set of all finite and infinite paths (w_0, w_1, . . . , w_n) with n ∈ {1, 2, . . . } ∪ {∞} which start at w_0 = 0, end at w_n = −1, and are such that w_{i+1} − w_i ≥ −1 as well as w_i ≥ 0 for any 0 ≤ i ≤ n − 1. Then taking the Lukasiewicz walk creates a bijection between T_lf and W_lf.
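To make the construction concrete, here is a short Python sketch (ours) computing the Lukasiewicz walk of a finite plane tree; the nested-list representation of a tree (a vertex is the list of its children) is a convention chosen for the example, not a notation from these notes.

```python
from collections import deque

def lukasiewicz_walk(tree):
    """Lukasiewicz walk (W_0, ..., W_{|tree|}) of a finite plane tree given as
    nested lists: each node is the list of its children."""
    walk = [0]
    queue = deque([tree])                       # breadth-first exploration
    while queue:
        node = queue.popleft()
        walk.append(walk[-1] + len(node) - 1)   # add (number of children) - 1
        queue.extend(node)
    return walk

# Example: a root with three children, the middle one having two children.
t = [[], [[], []], []]
print(lukasiewicz_walk(t))   # [0, 2, 1, 2, 1, 0, -1]: non-negative, then hits -1
```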
Remark 4.1 (Different types of exploration). The Lukasiewicz path encodes the information gathered when we discover a tree using the breadth first search. Although we shall only use this exploration in these notes, one can similarly discover the tree using the depth first search or using more exotic types of exploration. This flexibility in the exploration algorithm is at the core of many nice results in random graph theory, see e.g. [4, 16, 6]. [See also next chapters]

4.2.2 Lukasiewicz walk of a Galton–Watson tree


As it turns out, the Lukasiewicz walk associated to a µ-Galton–Watson tree is, roughly speaking, a random walk. Recall that the offspring distribution µ is supported by {0, 1, 2, . . . }, so a µ-Galton–Watson tree is locally finite a.s.
2
this would not be true had we chosen to explore the tree in the lexicographical order.

Proposition 4.1. Let T be a µ-Galton–Watson tree, and let (S_n)_{n≥0} be a random walk with i.i.d. increments of law P(S_1 = k) = µ(k + 1) for k ≥ −1. If T^< is the first hitting time of −1 by the walk S then we have
$$\big(W_0(T), W_1(T), \ldots, W_{|T|}(T)\big) \overset{(d)}{=} \big(S_0, S_1, \ldots, S_{T^<}\big).$$

Proof. Let (ω^0_i : 0 ≤ i ≤ n) be the first n steps of a skip-free random walk such that n is less than or equal to the hitting time of −1 by this walk. By reversing the Lukasiewicz construction we see that, in order for the first n steps of the Lukasiewicz walk of the tree T to match (ω^0_i : 0 ≤ i ≤ n), the first n vertices of the tree in breadth first order as well as their numbers of children are fixed by (ω^0_i : 0 ≤ i ≤ n), see Figure 4.4.

Figure 4.4: Fixing the first n vertices explored (in red) during the breadth first exploration of a Galton–Watson tree. The black vertices and their subtrees (in gray) have not been explored yet.

The probability under the µ-Galton–Watson law of this event is seen to be
$$\prod_{u \in \tau_0} \mu_{k_u(T)} = \prod_{i=0}^{n-1} \mu_{\omega^0_{i+1} - \omega^0_i + 1} = P\big((S_i)_{0 \le i \le n} = (\omega^0_i)_{0 \le i \le n}\big),$$
where τ_0 is the tree made of the first n vertices in breadth first order in T. The proposition easily follows.
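Proposition 4.1 is easy to test numerically. The following Python sketch (ours) compares, for an arbitrary subcritical offspring law, the empirical law of the total progeny |T| with the empirical law of the hitting time T^< of −1 by the associated walk; by the proposition the two should agree up to Monte Carlo error.

```python
import random
from collections import Counter

rng = random.Random(2)
mu = [0.5, 0.3, 0.2]          # an arbitrary subcritical offspring law (mean 0.7)
ks = list(range(len(mu)))

def tree_size(rng):
    """Total progeny of a Galton-Watson tree, sampled generation by generation
    (finite almost surely since the law is subcritical)."""
    size, alive = 1, 1
    while alive:
        children = sum(rng.choices(ks, weights=mu)[0] for _ in range(alive))
        size += children
        alive = children
    return size

def hitting_time(rng):
    """First time the walk with steps distributed as (offspring - 1) hits -1."""
    s, t = 0, 0
    while s > -1:
        s += rng.choices(ks, weights=mu)[0] - 1
        t += 1
    return t

N = 100_000
sizes = Counter(tree_size(rng) for _ in range(N))
times = Counter(hitting_time(rng) for _ in range(N))
for k in range(1, 6):
    print(k, sizes[k] / N, times[k] / N)   # the two empirical laws should agree
```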

Extinction probability. Here is a direct application of the last result: a random walk proof of the
following well-known criterion for survival of a Galton–Watson process.
Theorem 4.2 (Extinction probability)
Let µ be an offspring distribution of mean m > 0 such that µ ≠ δ_1. Recall that T denotes a µ-Galton–Watson tree and (Z_n)_{n≥0} a µ-Galton–Watson process. Then the probability that the tree T is finite (equivalently, that the process (Z_n) dies out) is equal to the smallest solution α ∈ [0, 1] of the equation
$$\alpha = \sum_{k \ge 0} \mu_k \alpha^k, \qquad (4.1)$$
in particular it is equal to 1 if m ≤ 1.

Proof. The equivalence between finiteness of T and extinction of (Z_n) is obvious by the discussion at the end of Section 4.1.2. We then use Proposition 4.1 to deduce that P(|T| = ∞) = P(T^< = ∞). Since the walk S is non-trivial (i.e. not constant), we know from Chapter 2 that the last probability is positive if and only if its drift is strictly positive, i.e. if m > 1. The computation of the probability that the walk stays positive has been carried out in Proposition 3.8 and its translation in our context yields the statement.
Let us also recall the more “standard” proof of the last theorem, which is useful in what follows. Let g(z) = \sum_{k \ge 0} \mu_k z^k be the generating function of the offspring distribution µ. By standard operations on generating functions, the generating function of #[T]_n, the number of individuals at generation n, is given by the n-fold composition g ◦ g ◦ · · · ◦ g. Evaluating at z = 0, we deduce that u_n = P(#[T]_n = 0) satisfies the recurrence relation u_0 = 0 and u_{n+1} = g(u_n). This recursive system is easily studied and u_n converges towards the first fixed point of g in [0, 1], which is strictly less than 1 if and only if g′(1) > 1 by convexity of g. We conclude using the fact that |T| = ∞ if and only if #[T]_n > 0 for all n ≥ 0.
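The recursion u_{n+1} = g(u_n) also gives a simple numerical scheme for the extinction probability. Here is a minimal Python sketch (ours), where the offspring law is passed as a list of probabilities.

```python
def extinction_probability(mu, n_iter=10_000):
    """Iterate u_{n+1} = g(u_n) with u_0 = 0, where g is the generating function
    of the offspring law mu; the limit is the smallest fixed point of g in [0, 1]."""
    g = lambda z: sum(p * z ** k for k, p in enumerate(mu))
    u = 0.0
    for _ in range(n_iter):
        u = g(u)
    return u

print(extinction_probability([0.25, 0.25, 0.5]))   # supercritical (mean 5/4)
print(extinction_probability([0.5, 0.25, 0.25]))   # subcritical (mean 3/4)
```

In the first example the fixed-point equation 2α² − 3α + 1 = 0 gives extinction probability 1/2; in the second, subcritical, case the iteration converges to 1.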

Figure 4.5: Illustration of the “standard” proof of Theorem 4.2.

Exercise 4.4 (A theorem of Dekking [7]). We say that an infinite tree τ contains an infinite binary tree (starting at the root) if it is possible to find a subset S of vertices of τ containing the origin ∅ and such that each vertex in S has exactly two children in S. Let g(z) = \sum_{k \ge 0} \mu_k z^k be the generating function of the offspring distribution µ.

1. Show that the probability that a µ-GW tree T contains no infinite binary tree (starting at the root) is the smallest solution z ∈ [0, 1] to
$$z = g(z) + (1 - z)\, g'(z).$$

2. Application: in the case p_1 = 1 − p and p_3 = p with p ∈ [0, 1], show that there is almost surely no infinite binary tree in T if and only if p < 8/9, and that in the critical case p = 8/9 the probability of containing one is in fact positive (contrary to the above case for the survival of the tree).

Remark 4.2 (A historical remark). We usually attribute to Galton and Watson the introduction and study of the so-called Galton–Watson trees in 1873, in order to study the survival of family names among British lords. However, in their initial paper devoted to the calculation of the extinction probability, they conclude hastily that the latter is a fixed point of Equation (4.1), and since 1 is always a fixed point, that the extinction is almost sure whatever the offspring distribution. Too bad. This is even more surprising since almost thirty years before, in 1845, Irénée-Jules Bienaymé3 considered the very same model and derived correctly the extinction probability. This is yet another illustration of Stigler's law of eponymy!

3
Irénée-Jules Bienaymé (1796-1878)

4.2.3 Lagrange inversion formula


The Lagrange inversion formula is a closed formula for the coefficients of the inverse of a power series. More precisely, imagine that f(z) = \sum_{i \ge 0} f_i z^i ∈ C[[z]] is a formal power series in the indeterminate z (no convergence conditions are assumed) such that f_0 = 0 and f_1 ≠ 0. One would then like to invert f, i.e. find a power series φ ∈ C[[z]] such that z = φ(f(z)) = f(φ(z)). In combinatorics, the above equation is usually written in the “Lagrange formulation” by supposing that f(z) = z/R(z) with R(z) ∈ C[[z]] and R(0) ≠ 0, so that the equation becomes

φ(z) = z · R(φ(z)). (4.2)


Theorem 4.3 (Lagrange inversion formula)
Let R ∈ C[[z]] be a formal power series in z such that [z^0]R ≠ 0. Then there exists a unique formal power series φ satisfying (4.2) and we have, for all k ≥ 0 and all n ≥ 1,
$$[z^n]\, \varphi(z)^k = \frac{k}{n}\, [z^{n-1}]\big(z^{k-1} R(z)^n\big),$$
where [z^n]f(z) is the coefficient in front of z^n in the formal power series f ∈ C[[z]].

Proof. The idea is to interpret combinatorially the weights in the formal expansion z · R(φ(z)), where R(z) = \sum_{i \ge 0} r_i z^i. Indeed, by expanding recursively the equation (4.2), it can be seen that the coefficient in front of z^n in φ can be interpreted as a sum over all plane trees with n vertices, where the weight of a tree τ is given by
$$w(\tau) = \prod_{u \in \tau} r_{k_u(\tau)}.$$

Similarly, for k ≥ 1, the coefficient of z^n in φ^k is the total weight of forests of k trees having n vertices in total. Now, using the Lukasiewicz encoding, such a forest can be encoded by a skip-free descending path with n steps reaching −k for the first time at time n, and the weight of such a path becomes
$$w(S) = \prod_{i=0}^{n-1} r_{S_{i+1} - S_i + 1}.$$
By Feller's combinatorial lemma, for a skip-free descending walk (S) of length n such that S_n = −k there are exactly k cyclic shifts so that n is the k-th strict descending ladder time. So if we partition the set of all walks of length n with S_n = −k using the cyclic shift as an equivalence relation, we know that in each equivalence class the proportion of walks such that T_k^< = n is k/n (most of the classes actually have n elements in them, but it could be the case that the subgroup of cyclic shifts fixing the walk is non-trivial and has order ℓ | k, in which case there are n/ℓ elements in the orbit and k/ℓ of them are such that T_k^< = n). Since the weight w(·) is constant over each equivalence class we deduce that:
$$\sum_{\substack{(S) \text{ walks of length } n \\ S_n = -k}} w(S) \;=\; \frac{n}{k} \sum_{\substack{(S) \text{ walks of length } n \\ S_n = -k \text{ and } T_k^< = n}} w(S).$$
It remains to notice that
$$[z^{n-1}]\big(z^{k-1} R(z)^n\big)$$
is exactly the weight of all paths (S) of length n such that S_n = −k.
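The statement of Theorem 4.3 can be checked mechanically on truncated power series. The following Python sketch (ours) solves φ = zR(φ) by iteration for an arbitrary polynomial R and verifies the identity coefficient by coefficient; it is a brute-force check under the stated truncation, not an efficient implementation.

```python
def mul(a, b, N):
    """Product of two polynomials (coefficient lists), truncated at degree N."""
    c = [0] * (N + 1)
    for i, ai in enumerate(a[: N + 1]):
        for j, bj in enumerate(b[: N + 1 - i]):
            c[i + j] += ai * bj
    return c

def power(a, n, N):
    r = [1] + [0] * N
    for _ in range(n):
        r = mul(r, a, N)
    return r

def solve_phi(R, N):
    """Solve phi = z * R(phi) as a truncated power series, by iteration."""
    phi = [0] * (N + 1)
    for _ in range(N + 1):
        comp = [0] * (N + 1)              # R(phi), built term by term
        phipow = [1] + [0] * N
        for r_i in R:
            comp = [c + r_i * p for c, p in zip(comp, phipow)]
            phipow = mul(phipow, phi, N)
        phi = [0] + comp[:N]              # multiply by z
    return phi

N = 8
R = [1, 1, 1]                             # R(z) = 1 + z + z^2, an arbitrary example
phi = solve_phi(R, N)
for n in range(1, N + 1):
    for k in range(1, n + 1):
        lhs = power(phi, k, N)[n]                                  # [z^n] phi^k
        rhs_series = mul([0] * (k - 1) + [1], power(R, n, N), N)   # z^{k-1} R(z)^n
        assert abs(lhs - k * rhs_series[n - 1] / n) < 1e-9
print("Lagrange inversion checked up to degree", N)
```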
Here are two recreational (but surprising) applications of the Lagrange inversion formula:
Exercise 4.5. Let F(x) be the unique power series with rational coefficients such that for all n ≥ 0 the coefficient of x^n in F^{n+1}(x) is equal to 1. Show that F(x) = x/(1 − e^{−x}).
Exercise 4.6. For a ∈ (0, 1), show that the only positive solution x = x(a) of x^5 − x − a = 0 can be written as
$$x = -\sum_{k \ge 0} \binom{5k}{k} \frac{a^{4k+1}}{4k+1}.$$

4.3 Probabilistic counting of trees


In this section we illustrate how to enumerate certain classes of trees using our knowledge on
random walks. One underlying idea is to design a random variable which is uniformly distributed
on the set we wish to count.

4.3.1 Prescribed degrees

Theorem 4.4 (Harary & Prins & Tutte (1964))
The number of plane trees having, for each i ≥ 0, exactly d_i vertices with i children, where n = 1 + \sum_i i\, d_i = \sum_i d_i is the total number of vertices, is equal to
$$\frac{(n-1)!}{d_0!\, d_1! \cdots d_i! \cdots}.$$

Proof. Fix the d_i's and n as in the theorem; notice that we must have n = 1 + \sum_i i\, d_i = \sum_i d_i. By the encoding of plane trees into their Lukasiewicz paths, it suffices to enumerate the paths starting from 0, ending at −1 at time n, having d_i steps equal to i − 1 for each i, and staying non-negative until time n − 1. Clearly, if one removes the last condition there are
$$\binom{n}{d_0, \ldots, d_i, \ldots} = \frac{n!}{d_0!\, d_1! \cdots}$$
such paths. If we partition those paths according to the cyclic shift equivalence relation, then by Lemma 3.2 (see Remark 3.1) we know that each equivalence class has cardinality n and contains a unique element which stays non-negative until time n − 1. Hence the quantity we wanted to enumerate is equal to
$$\frac{1}{n} \binom{n}{d_0, \ldots, d_i, \ldots} = (n-1)! \prod_i \frac{1}{d_i!}.$$
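Theorem 4.4 is easy to confirm by brute force for small sizes. The sketch below (Python, ours) enumerates all plane trees with n vertices, groups them by degree sequence, and compares the counts with the formula.

```python
from math import factorial
from collections import Counter

def forests(m):
    """All ordered forests (tuples of plane trees) with m vertices in total."""
    if m == 0:
        return [()]
    out = []
    for k in range(1, m + 1):
        for first in trees(k):
            for rest in forests(m - k):
                out.append((first,) + rest)
    return out

def trees(n):
    """All plane trees with n vertices; a tree is identified with the ordered
    forest of the subtrees of its root (which has n - 1 vertices)."""
    return forests(n - 1)

def degree_counts(t):
    """Counter {i: number of vertices having exactly i children}."""
    c = Counter([len(t)])
    for sub in t:
        c += degree_counts(sub)
    return c

n = 6
by_degrees = Counter(tuple(sorted(degree_counts(t).items())) for t in trees(n))
for degs, count in by_degrees.items():
    denom = 1
    for _, di in degs:
        denom *= factorial(di)
    assert count == factorial(n - 1) // denom      # formula of Theorem 4.4
print("Theorem 4.4 checked for every degree sequence with", n, "vertices;",
      "total number of trees:", sum(by_degrees.values()))
```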

Corollary 4.5 (Catalan’s counting). We have


 
1 2n
#{plane trees with n + 1 vertices} = .
n+1 n

Proof. With the same notation as in the preceding theorem, the number of trees with n ≥ 1 vertices is equal to
$$\sum_{\substack{d_0, d_1, d_2, \ldots \\ 1 + \sum i d_i = n = \sum d_i}} \frac{(n-1)!}{d_0!\, d_1! \cdots} \;=\; \frac{1}{n} \sum_{\substack{d_0, d_1, d_2, \ldots \\ 1 + \sum i d_i = n = \sum d_i}} \binom{n}{d_0, d_1, \ldots} \;=\; \frac{1}{n}\, [z^{n-1}] \left(1 + z + z^2 + z^3 + \cdots\right)^n.$$
Using the Lagrange inversion formula (Theorem 4.3), the last quantity can be expressed as [z^n]φ(z) where φ(z) is the formal series solution to φ(z) = z/(1 − φ(z)). Solving, we get φ(z) = (1 − \sqrt{1 − 4z})/2 and a coefficient extraction yields the desired formula.

4.3.2 Uniform trees


We denote by Tn the set of all plane trees with n edges and by Tn a uniform plane tree taken
in Tn . As we shall see Tn can be seen as a conditioned version of a Galton–Watson tree:

Proposition 4.6. Let T be the Galton–Watson tree with geometric offspring distribution of parameter 1/2, i.e. µ_k = 2^{-(k+1)} for k ≥ 0. Then T_n has the law of T conditioned on having n edges.

Proof. Let τ_0 be a tree with n edges. Then by Exercise 4.2 we have
$$P(T = \tau_0) = \prod_{u \in \tau_0} 2^{-k_u(\tau_0) - 1}.$$
However, it is easy to see that in a plane tree with n edges we have ∑_{u ∈ τ_0} k_u(τ_0) = |τ_0| − 1 = n, so that the last display is equal to (1/2)·4^{−n}. The point is that this probability does not depend on τ_0 as long as it has n edges. Hence, the conditional law of T on T_n is the uniform law.
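Proposition 4.6 also gives a practical sampler: rejection sampling of the geometric(1/2) Galton–Watson tree until it has exactly n edges produces a uniform plane tree. Here is a Python sketch (ours), which works at the level of Lukasiewicz walks (Exercise 4.3); names and cut-offs are our choices.

```python
import random

def geometric_gw_walk(rng, max_steps=10_000):
    """Lukasiewicz walk of a geometric(1/2) Galton-Watson tree: i.i.d. increments
    G - 1 with P(G = k) = 2^{-(k+1)}, stopped when the walk hits -1.
    Returns the list of increments, or None if the tree is too large."""
    steps, s = [], 0
    while s > -1:
        if len(steps) >= max_steps:
            return None
        k = 0
        while rng.random() < 0.5:     # number of children = heads before the first tail
            k += 1
        steps.append(k - 1)
        s += k - 1
    return steps

def uniform_plane_tree_walk(n_edges, rng):
    """Rejection sampling: condition the GW tree on having n_edges edges
    (i.e. n_edges + 1 vertices); the walk encodes the tree (Exercise 4.3)."""
    while True:
        w = geometric_gw_walk(rng)
        if w is not None and len(w) == n_edges + 1:
            return w

rng = random.Random(3)
print(uniform_plane_tree_walk(4, rng))
# Sanity check: P(|T| = 6 vertices) should be Catalan(5) * (1/2) * 4^{-5} = 42/2048.
trials = 200_000
hits = sum(1 for _ in range(trials)
           if (w := geometric_gw_walk(rng)) is not None and len(w) == 6)
print(hits / trials, 42 / 2048)
```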

Notice that the above proposition and its proof hold for any non-trivial parameter of the geometric offspring distribution. However, we chose 1/2 because in this case the offspring distribution is critical, i.e. it has mean 1. We can now give another proof of Corollary 4.5:
Proof. Combining the last proposition with Proposition 4.1 and Kemperman's formula (Proposition 3.3) yields
$$P(|T| = n+1) = P(T^< = n+1) = \frac{1}{n+1} P(S_{n+1} = -1),$$
where (S) is the random walk whose increments are distributed as P(S_1 = k) = 2^{-k-2} for k ≥ −1, or equivalently as G − 1 where G has the geometric offspring distribution of parameter 1/2. Recall that G is also the number of failures before the first success in a series of independent fair coin flips: this is the negative binomial distribution with parameters (1, 1/2). Hence P(S_{n+1} = −1) = P(Binneg(n + 1, 1/2) = n) where Binneg(n, p) is the negative binomial distribution with parameters (n, p). This distribution is explicit: we have P(Binneg(n, p) = k) = \binom{n+k-1}{n-1} p^n (1-p)^k, which in our case reduces to
$$\frac{1}{n+1} P(S_{n+1} = -1) = \frac{1}{n+1} P(\mathrm{Binneg}(n+1, 1/2) = n) = \frac{1}{n+1} \binom{2n}{n} \frac{1}{2}\, 4^{-n}.$$
By the previous proposition (and its proof) we have on the other hand
$$P(|T| = n+1) = \#\{\text{plane trees with } n+1 \text{ vertices}\} \cdot \frac{1}{2}\, 4^{-n}.$$
The result follows by comparing the last two displays.

Exercise 4.7. Extend the above proof to show that the number of forests of f ≥ 1 trees whose total number of edges is n is equal to
$$\frac{f}{2n+f} \binom{2n+f}{n}.$$

Exercise 4.8. Give another proof of the last display using Theorem 4.3.
Theorem 4.7 (Height of a uniform point)
Let T_n be a uniform plane tree with n edges. Conditionally on T_n, let δ_n be a uniformly chosen vertex of T_n and denote by H_n its height (its distance to the root ∅ of T_n). Then we have
$$P(H_n = h) = \frac{2h+1}{2n+1} \cdot \frac{\binom{2n+1}{n-h}}{\binom{2n}{n}}.$$
In particular H_n/\sqrt{n} converges in distribution towards the Rayleigh distribution with density 2x e^{-x^2} \mathbf{1}_{x>0}.

Proof. We compute exactly the probability that the point δ_n is located at height h ≥ 0. If so, the tree T_n is obtained from the line joining ∅ to δ_n by grafting 2h + 1 plane trees on its left and on its right. Obviously, the total number of edges of these trees must be equal to n − h. Using Exercise 4.7 we deduce that
$$P(H_n = h) = \frac{2h+1}{2n+1} \cdot \frac{\binom{2n+1}{n-h}}{\binom{2n}{n}}.$$
The second item of the theorem follows after applying Stirling's formula and using Exercise 3.1.
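Both parts of Theorem 4.7 can be checked numerically from the exact formula. The following Python sketch (ours) verifies that the probabilities sum to 1 and compares the distribution function of H_n/√n with the Rayleigh distribution function 1 − e^{−x²}.

```python
from math import comb, exp, sqrt

def height_pmf(n, h):
    """P(H_n = h) from Theorem 4.7."""
    return (2 * h + 1) / (2 * n + 1) * comb(2 * n + 1, n - h) / comb(2 * n, n)

n = 400
print(sum(height_pmf(n, h) for h in range(n + 1)))     # should be 1 (up to rounding)

# Compare P(H_n <= x sqrt(n)) with the Rayleigh cdf 1 - exp(-x^2).
for x in (0.5, 1.0, 1.5):
    cdf = sum(height_pmf(n, h) for h in range(int(x * sqrt(n)) + 1))
    print(x, cdf, 1 - exp(-x * x))                     # approximately equal
```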

4.3.3 Poisson Galton–Watson trees and Cayley trees


In this section we focus on a different type of trees, first studied by Cayley4.

Definition 4.5. A Cayley tree is a labeled tree with n vertices, without any orientation or distinguished point. In other words, it is a spanning tree of K_n, the complete graph on n vertices. See Fig. 4.6.

Figure 4.6: A Cayley tree over {1, 2, 3, 4, . . . , 11}.

Let T be a Galton-Watson (plane) tree with Poisson offspring distribution of parameter 1 (in
particular, the mean number of children is 1 and we are in the critical case). As above we denote
by Tn the random tree T conditioned on having n vertices.

Proposition 4.8. Consider T_n and assign the labels {1, . . . , n} uniformly at random to its vertices. After forgetting the plane ordering of T_n, this produces a Cayley tree which we denote by T_n. Then T_n is uniformly distributed over all Cayley trees.

Proof. Let us first compute the probability that T has n vertices. Using the Lukasiewicz walk and the cycle lemma we get that P(|T| = n) = \frac{1}{n} P(S_n = -1), where S is the random walk whose increments are distributed as a Poisson(1) variable minus 1 (and are thus centered). It follows that
$$P(|T| = n) = \frac{1}{n} P(\mathrm{Poisson}(n) = n-1) = e^{-n} \frac{n^{n-2}}{(n-1)!}.$$
4
Arthur Cayley (1821-1895)

Fix a Cayley tree t and let us study the possible ways to obtain t by the above process. We first choose the root of the tree among the n possible vertices. Once the origin is distinguished, there are \prod_{u \in t} k_u! possible ways to give a planar orientation to the tree, where k_u is the number of children of the vertex u (for this we needed to distinguish the ancestor vertex). After these operations, each of the labeled, rooted, plane trees obtained appears with a probability (under the Galton–Watson measure) equal to
$$\frac{1}{n!}\, e^{-n} \prod_{u \in t} \frac{1}{k_u!}.$$
Performing the summation, the symmetry factors involving the k_u! conveniently disappear and we get
$$P(T_n = t) = n \times \frac{e^{-n}}{n!} \times \left( e^{-n} \frac{n^{n-2}}{(n-1)!} \right)^{-1} = \frac{1}{n^{n-2}}.$$
Since the result of the last display does not depend on the shape of t, the induced law is indeed uniform over all Cayley trees and we have even proved:

Corollary 4.9 (Cayley’s formula). The total number of Cayley trees on n vertices is nn−2 .

Exercise 4.9. For any p ≥ 2, a p-tree is a plane tree such that the number of children of each vertex is either 0 or p. When p = 2 we speak of binary trees. In particular the number of edges of a p-tree must be a multiple of p. Show that for any k ≥ 1 we have
$$\#\{p\text{-trees with } kp \text{ edges}\} = \frac{1}{kp+1} \binom{kp+1}{k},$$
in three ways: using a direct application of Theorem 4.4, using a probabilistic approach via a certain class of random Galton–Watson trees, or via the Lagrange inversion formula (Theorem 4.3).

4.4 Contour function


We finish this chapter by mentioning another, more geometrical, encoding of plane trees which is less convenient in the general case but very useful in the case of geometric Galton–Watson trees.
Let τ be a finite plane tree. The contour function C_τ associated with τ is heuristically obtained by recording the height of a particle that climbs the tree and makes its contour at unit speed. More formally, to define it properly one needs the notion of a corner: we view τ as embedded in the plane; a corner of a vertex in τ is then an angular sector formed by two consecutive edges in clockwise order around this vertex. Note that a vertex of degree k in τ has exactly k corners. If c is a corner of τ, Ver(c) denotes the vertex incident to c.
The corners are ordered clockwise cyclically around the tree in the so-called contour order. If τ has n ≥ 2 vertices we index the corners by letting (c_0, c_1, c_2, . . . , c_{2n−3}) be the sequence of corners visited during the contour process of τ, starting from the corner c_0 incident to ∅ that is located to the left of the oriented edge going from ∅ to 1 in τ.

Definition 4.6. Let τ be a finite plane tree with n ≥ 2 vertices and let (c_0, c_1, c_2, . . . , c_{2n−3}) be the sequence of corners visited during the contour process of τ. We put c_{2n−2} = c_0 for notational convenience. The contour function of τ is the walk defined by
$$C_\tau(i) = |\mathrm{Ver}(c_i)|, \quad \text{for } 0 \le i \le 2n-2.$$

Figure 4.7: The contour function associated with a plane tree.

Clearly, the contour function of a finite plane tree is a finite non-negative walk with 2(|τ| − 1) steps which only makes ±1 jumps. Here as well, the encoding of a tree into its contour function is invertible:
Exercise 4.10. Show that taking the contour function creates a bijection between the set of all finite plane trees and the set of all non-negative finite walks with ±1 steps which start and end at 0.
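Here is a Python sketch (ours) computing the contour function of a finite plane tree, with the same nested-list convention as in the Lukasiewicz sketch above; it simply records the height along a traversal that goes around the tree.

```python
def contour_function(tree):
    """Contour function of a finite plane tree given as nested lists
    (a node = list of its children): heights recorded while going around the tree."""
    heights = []

    def visit(node, h):
        heights.append(h)            # a corner of the current vertex
        for child in node:
            visit(child, h + 1)
            heights.append(h)        # back to the current vertex
    visit(tree, 0)
    return heights

# Example: root with two children, the first one having one child.
t = [[[]], []]
print(contour_function(t))           # [0, 1, 2, 1, 0, 1, 0]
```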
Now, we give a probabilistic description of the law of the contour function of T when T is
distributed as a geometric(1/2)-Galton–Watson tree (i.e. has the same law as in Section 4.3.2).

Proposition 4.10. Let T be as above. Then its contour function C_T has the same law as
$$(S_0, S_1, \ldots, S_{T^< - 1}),$$
where (S) is a simple symmetric random walk and T^< is the first hitting time of −1.
Proof. Notice first that T is almost surely finite by Theorem 4.2, so all the objects considered above are well defined. Let τ_0 be a plane tree with n edges. We have seen in the previous proposition that P(T = τ_0) = (1/2)·4^{−n}. On the other hand, the contour function of τ_0 has length 2n, and the probability that the first 2n steps of (S) coincide with this function and that T^< = 2n + 1 is equal to 2^{−2n} · (1/2) = (1/2)·4^{−n}. This concludes the proof.

Exercise 4.11. Give a new proof of Corollary 4.5 using the contour function.
Exercise 4.12. Let T be a Galton–Watson tree with geometric(1/2) offspring distribution. The height of T is the maximal generation |u| over the vertices u of T. Prove that
$$P(\mathrm{Height}(T) \ge n) = \frac{1}{n+1}.$$

Bibliographical notes. The material about Galton–Watson trees is rather classical. The coding of trees and the formalism for plane trees (the so-called Neveu notation [22]) can be found in [20]. The lecture notes of Igor Kortchemski [15] are a very good introduction, accessible to the first years of undergraduate studies in mathematics. Beware that some authors prefer to take the lexicographical order rather than the breadth first order to define the Lukasiewicz walk (in the finite case this causes no problem, but this does not give a bijection when the trees can be infinite). The two exercises illustrating the Lagrange inversion formula are taken from the MathOverflow post “What is Lagrange inversion good for?”.

Bibliography
[1] L. Addario-Berry and B. A. Reed, Ballot theorems, old and new, in Horizons of com-
binatorics, vol. 17 of Bolyai Soc. Math. Stud., Springer, Berlin, 2008, pp. 9–35.

[2] L. Alili, L. Chaumont, and R. Doney, On a fluctuation identity for random walks and Lévy processes, Bulletin of the London Mathematical Society, 37 (2005), pp. 141–148.

[3] I. Benjamini, O. Schramm, et al., Percolation beyond Z^d, many questions and a few answers, Electronic Communications in Probability, 1 (1996), pp. 71–82.

[4] N. Broutin and J.-F. Marckert, A new encoding of coalescent processes: applications
to the additive and multiplicative cases, Probability Theory and Related Fields, 166 (2016),
pp. 515–552.

[5] K. L. Chung, A course in probability theory, Academic Press [A subsidiary of Harcourt Brace Jovanovich, Publishers], New York-London, second ed., 1974. Probability and Mathematical Statistics, Vol. 21.

[6] N. Curien, Peeling random planar maps, Saint-Flour course 2019, https://www.imo.universite-paris-saclay.fr/ curien/.

[7] F. M. Dekking, Branching processes that grow faster than binary splitting, Amer. Math.
Monthly, 98 (1991), pp. 728–731.

[8] P. Diaconis and A. Hicks, Probabilizing parking functions, Advances in Applied Math-
ematics, 89 (2017), pp. 125–155.

[9] H. Duminil-Copin, Sixty years of percolation, arXiv preprint arXiv:1712.04651, (2017).

[10] W. Feller, An introduction to probability theory and its applications. Vol. II., Second
edition, John Wiley & Sons, Inc., New York-London-Sydney, 1971.

[11] G. Grimmett, Percolation and disordered systems, in Lectures on probability theory and
statistics, Springer, 1997, pp. 153–300.

[12] I. A. Ibragimov and Y. V. Linnik, Independent and stationary sequences of random variables, Wolters-Noordhoff Publishing, Groningen, 1971. With a supplementary chapter by I. A. Ibragimov and V. V. Petrov, Translation from the Russian edited by J. F. C. Kingman.

[13] O. Kallenberg, Foundations of Modern Probability, Springer, New York, second ed.,
2002.

[14] A. G. Konheim and B. Weiss, An occupancy discipline and applications, SIAM Journal
on Applied Mathematics, 14 (1966), pp. 1266–1274.

[15] I. Kortchemski, Arbres et marches aléatoires, Journées X-UPS, (2016).

[16] M. Krivelevich and B. Sudakov, The phase transition in random graphs: A simple
proof, Random Structures & Algorithms, 43 (2013), pp. 131–138.

[17] A. E. Kyprianou, Wiener–Hopf decomposition, Encyclopedia of Quantitative Finance, (2010).

[18] S. Lalley, One-dimensional random walks (lecture notes), http://galton.uchicago.edu/ lalley/Courses/312/RW.pdf.

[19] G. F. Lawler and V. Limic, Random walk: a modern introduction, vol. 123 of Cambridge
Studies in Advanced Mathematics, Cambridge University Press, Cambridge, 2010.

[20] J.-F. Le Gall, Random trees and applications, Probability Surveys, (2005).

[21] P. Marchal, Two consequences of a path transform, Bulletin of the London Mathematical
Society, 33 (2001), pp. 213–220.

[22] J. Neveu, Arbres et processus de Galton-Watson, Ann. Inst. H. Poincaré Probab. Statist.,
22 (1986), pp. 199–207.

[23] L. Shepp, Recurrent random walks with arbitrarily large steps, Bulletin of the American
Mathematical Society, 70 (1964), pp. 540–542.

[24] F. Spitzer, Principles of random walk, Springer-Verlag, New York-Heidelberg, second ed.,
1976. Graduate Texts in Mathematics, Vol. 34.

[25] W. Werner, Lectures on two-dimensional critical percolation, arXiv preprint arXiv:0710.0856, (2007).

