Dajani Dirksin Ergodic Theory
4 Entropy 55
4.1 Randomness and Information . . . . . . . . . . . . . . . . . . 55
4.2 Definitions and Properties . . . . . . . . . . . . . . . . . . . . 56
4.3 Calculation of Entropy and Examples . . . . . . . . . . . . . . 63
4.4 The Shannon-McMillan-Breiman Theorem . . . . . . . . . . . 66
4.5 Lochs’ Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6 Introduction and preliminaries
x, T x, T²x, . . .
. . . , T^{−1}x, x, T x, . . .
We also want the evolution to be in steady state, i.e., stationary. In the language of ergodic theory, we want T to be measure preserving.
f, f ◦ T, f ◦ T², . . .
is stationary. This means that for all Borel sets B1 , . . . , Bn , and all integers
r1 < r2 < . . . < rn , one has for any k ≥ 1,
µ({x : f(T^{r_1} x) ∈ B_1, . . . , f(T^{r_n} x) ∈ B_n}) = µ({x : f(T^{r_1+k} x) ∈ B_1, . . . , f(T^{r_n+k} x) ∈ B_n}).
Using the above Theorem, one can get an easier criterion for checking that
a transformation is measure preserving.
Proof. Let
C = {B ∈ B2 : T −1 B ∈ B1 , and µ1 (T −1 B) = µ2 (B)}.
µ_1(T^{−1}E) = µ_1(∪_{n=1}^∞ T^{−1}E_n) = lim_{n→∞} µ_1(T^{−1}E_n) = lim_{n→∞} µ_2(E_n) = µ_2(∪_{n=1}^∞ E_n) = µ_2(E).
µ T −1 {x : x0 = a0 , . . . , xn = an } = µ ({x : x0 = a0 , . . . , xn = an })
A∆B = (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A).
Proof. Let
D = {A ∈ B : for any ε > 0, there exists C ∈ A such that µ(A∆C) < ε}.
Hence, A ∈ D. Similarly, one can show that D is closed under decreasing intersections, so that D is a monotone class containing A. Hence, by the Monotone Class Theorem, B ⊆ D. Therefore, B = D, and the theorem is proved.
T x = x + θ mod 1 = x + θ − ⌊x + θ⌋,
and
λ(T^{−1}[a, b)) = b − a = λ([a, b)).
Although this map is very simple, it has in fact many facets. For example,
iterations of this map yield the binary expansion of points in [0, 1) i.e., using
T one can associate with each point in [0, 1) an infinite sequence of 0’s and
1’s. To do so, we define the function a1 by
a_1(x) = 0 if 0 ≤ x < 1/2, and a_1(x) = 1 if 1/2 ≤ x < 1.
Thus, x = ∑_{i=1}^∞ a_i/2^i. We shall later see that the sequence of digits a_1, a_2, . . .
forms an i.i.d. sequence of Bernoulli random variables.
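As an aside (our own illustration, not in the text), the digit scheme is easy to run on a computer: iterating the doubling map T x = 2x mod 1 and reading off a_1 at each step produces the binary digits, and summing ∑ a_i/2^i recovers x. A minimal sketch, using exact rational arithmetic to avoid floating-point drift:

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits of x in [0, 1), read off by iterating T x = 2x mod 1."""
    digits = []
    for _ in range(n):
        digits.append(0 if x < Fraction(1, 2) else 1)   # a_1 applied to the current iterate
        x = (2 * x) % 1                                 # the doubling map
    return digits

digits = binary_digits(Fraction(3, 8), 5)   # -> [0, 1, 1, 0, 0], i.e. 3/8 = 0.011 in binary
x_back = sum(Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits))   # reconstruct x
```

Reconstructing x from the digits returns exactly 3/8, illustrating x = ∑ a_i/2^i.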
(c) Baker’s Transformation – This example is the two-dimensional version
of example (b). The underlying probability space is [0, 1)2 with product
Lebesgue σ-algebra B × B and product Lebesgue measure λ × λ. Define
T : [0, 1)2 → [0, 1)2 by
T(x, y) = (2x, y/2) if 0 ≤ x < 1/2, and T(x, y) = (2x − 1, (y + 1)/2) if 1/2 ≤ x < 1.
(d) β-transformations – Let X = [0, 1) with the Lebesgue σ-algebra B. Let β = (1 + √5)/2, the golden mean. Notice that β² = β + 1. Define a transformation
Basic Examples 11
T : X → X by
T x = βx mod 1 = βx if 0 ≤ x < 1/β, and βx − 1 if 1/β ≤ x < 1,
where
g(x) = (5 + 3√5)/10 if 0 ≤ x < 1/β, and g(x) = (5 + √5)/10 if 1/β ≤ x < 1.
and
µ ({x : x−n+1 = a−n , . . . , xn+1 = an }) = pa−n . . . pan .
Notice that in case X = {0, 1, . . . k − 1}N , then one should consider cylinder
sets of the form {x : x0 = a0 , . . . xn = an }. In this case
T^{−1}{x : x_0 = a_0, . . . , x_n = a_n} = ∪_{j=0}^{k−1} {x : x_0 = j, x_1 = a_0, . . . , x_{n+1} = a_n},
Since
T −1 ({x : xn1 ∈ B1 , . . . xnr ∈ Br }) = {x : xn1 +1 ∈ B1 , . . . xnr +1 ∈ Br },
stationarity of the process Yn implies that T is measure preserving. Further-
more, if we let πi : X → R be the natural projection onto the ith coordinate,
then Yi (ω) = πi (φ(ω)) = π0 ◦ T i (φ(ω)).
(h) Random Shifts – Let (X, B, µ) be a probability space, and T : X → X an
invertible measure preserving transformation. Then, T −1 is measurable and
measure preserving with respect to µ. Suppose now that at each moment
instead of moving forward by T (x → T x), we first flip a fair coin to decide
whether we will use T or T −1 . We can describe this random system by means
of a measure preserving transformation in the following way.
Let Ω = {−1, 1}Z with product σ-algebra F (i.e. the σ-algebra generated
by the cylinder sets), and the uniform product measure IP (see example (e)),
and let σ : Ω → Ω be the left shift. As in example (e), the map σ is measure
preserving. Now, let Y = Ω × X with the product σ-algebra, and product
measure IP × µ. Define S : Y → Y by
S(ω, x) = (σω, T ω0 x).
Then S is invertible (why?), and measure preserving with respect to IP × µ.
To see the latter, for any set C ∈ F, and any A ∈ B, we have
(IP × µ)(S^{−1}(C × A)) = (IP × µ)({(ω, x) : S(ω, x) ∈ C × A})
= (IP × µ)({(ω, x) : ω_0 = 1, σω ∈ C, T x ∈ A}) + (IP × µ)({(ω, x) : ω_0 = −1, σω ∈ C, T^{−1}x ∈ A})
= IP({ω_0 = 1} ∩ σ^{−1}C) µ(T^{−1}A) + IP({ω_0 = −1} ∩ σ^{−1}C) µ(T A)
= IP({ω_0 = 1} ∩ σ^{−1}C) µ(A) + IP({ω_0 = −1} ∩ σ^{−1}C) µ(A)
= IP(σ^{−1}C) µ(A) = IP(C) µ(A) = (IP × µ)(C × A).
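As an illustration (our own, not from the text), the random system can be simulated with T the rotation x ↦ x + θ mod 1, which is invertible; after n coin flips the second coordinate of S^n(ω, x) is x displaced by θ times the random walk ∑ ω_i. A sketch:

```python
import random

def random_shift_orbit(x, theta, n, rng):
    """Second coordinate of S^n(omega, x) for S(omega, x) = (sigma omega, T^{omega_0} x),
    with T the rotation x -> x + theta mod 1 and omega i.i.d. fair coin flips in {-1, 1}."""
    walk = 0
    for _ in range(n):
        w = rng.choice([-1, 1])     # fair coin: apply T (w = 1) or T^{-1} (w = -1)
        walk += w
        x = (x + w * theta) % 1.0
    return x, walk

rng = random.Random(0)
x_n, walk = random_shift_orbit(0.25, 0.1, 100, rng)   # x_n = 0.25 + 0.1 * walk (mod 1)
```

The final position depends on ω only through the random walk ∑ ω_i, which is the sense in which the skew product encodes the coin-driven dynamics.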
(i) Continued Fractions – Consider ([0, 1), B), where B is the Lebesgue σ-algebra. Define a transformation T : [0, 1) → [0, 1) by T 0 = 0 and, for x ≠ 0,
T x = 1/x − ⌊1/x⌋.
An interesting feature of this map is that its iterations generate the continued
fraction expansion for points in (0, 1). For if we define
a_1 = a_1(x) = 1 if x ∈ (1/2, 1), and a_1 = a_1(x) = n if x ∈ (1/(n + 1), 1/n] for n ≥ 2,
then T x = 1/x − a_1, and hence x = 1/(a_1 + T x). For n ≥ 1, let a_n = a_n(x) = a_1(T^{n−1} x). Then, after n iterations we see that
x = 1/(a_1 + T x) = · · · = 1/(a_1 + 1/(a_2 + · · · + 1/(a_n + T^n x))).
In fact, if p_n/q_n = 1/(a_1 + 1/(a_2 + · · · + 1/a_n)), then one can show that {q_n} is monotonically increasing, and
|x − p_n/q_n| < 1/q_n² → 0 as n → ∞.
Hence,
x = 1/(a_1 + 1/(a_2 + 1/(a_3 + · · · ))).
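A short computational sketch (our own, not part of the text): the digits a_n come from iterating the Gauss map exactly as above, and the convergents p_n/q_n can be formed with the classical recursions p_k = a_k p_{k−1} + p_{k−2} and q_k = a_k q_{k−1} + q_{k−2}, which we assume here from the standard theory rather than derive:

```python
from fractions import Fraction

def cf_digits(x, n):
    """First continued fraction digits of x via the Gauss map T x = 1/x - floor(1/x)."""
    digits = []
    for _ in range(n):
        if x == 0:
            break
        inv = 1 / x
        a = int(inv)          # a_1 of the current iterate: floor(1/x)
        digits.append(a)
        x = inv - a           # apply T
    return digits

def convergents(digits):
    """Convergents p_k/q_k of [0; a_1, a_2, ...] via p_k = a_k p_{k-1} + p_{k-2}, same for q_k."""
    p_prev, q_prev, p_cur, q_cur = 1, 0, 0, 1
    out = []
    for a in digits:
        p_prev, q_prev, p_cur, q_cur = p_cur, q_cur, a * p_cur + p_prev, a * q_cur + q_prev
        out.append((p_cur, q_cur))
    return out

x = Fraction(27, 73)
ds = cf_digits(x, 10)     # the expansion terminates since x is rational
cs = convergents(ds)
```

For this x the denominators q_n increase and each convergent satisfies |x − p_n/q_n| < 1/q_n², matching the estimate above.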
1.4 Recurrence
Let T be a measure preserving transformation on a probability space (X, F, µ),
and let B ∈ F. A point x ∈ B is said to be B-recurrent if there exists k ≥ 1
such that T k x ∈ B.
Proof Let F be the subset of B consisting of all elements that are not B-
recurrent. Then,
F = {x ∈ B : T^k x ∉ B for all k ≥ 1}.
We want to show that µ(F ) = 0. First notice that F ∩ T −k F = ∅ for all
k ≥ 1, hence T^{−l}F ∩ T^{−m}F = ∅ for all l ≠ m. Thus, the sets F, T^{−1}F, . . .
are pairwise disjoint, and µ(T −n F ) = µ(F ) for all n ≥ 1 (T is measure
preserving). If µ(F ) > 0, then
1 = µ(X) ≥ µ(∪_{k≥0} T^{−k}F) = ∑_{k≥0} µ(T^{−k}F) = ∑_{k≥0} µ(F) = ∞,
a contradiction.
The proof of the above theorem implies that almost every x ∈ B returns
to B infinitely often. In other words, there exist infinitely many integers
n1 < n2 < . . . such that T ni x ∈ B. To see this, let
D = {x ∈ B : T k x ∈ B for finitely many k ≥ 1}.
Then,
D = {x ∈ B : T^k x ∈ F for some k ≥ 0} ⊆ ∪_{k=0}^∞ T^{−k}F.
Thus, µ(D) = 0 since µ(F ) = 0 and T is measure preserving.
For x ∈ A, let n(x) := inf{n ≥ 1 : T n x ∈ A}. We call n(x) the first return
time of x to A.
µ_A(B) = µ(B)/µ(A),  for B ∈ F ∩ A,
Ak = {x ∈ A : n(x) = k}
B_k = {x ∈ X \ A : T x, . . . , T^{k−1}x ∉ A, T^k x ∈ A}.
Notice that A = ∪_{k=1}^∞ A_k, and
Let C ∈ F ∩ A. Since T is measure preserving, it follows that µ(C) = µ(T^{−1}C).
To show that µA (C) = µA (TA−1 C), we show that
[Figure: the tower over A. The base level A ≡ (A, 1) is partitioned into A_1, A_2, A_3, . . .; above it sit the levels T A \ A ≡ (A, 2), T²A \ (A ∪ T A) ≡ (A, 3), and so on, with the sets B_1, B_2, B_3, . . . marking the points outside A that first return to A after 1, 2, 3, . . . steps.]
Now,
T_A^{−1}(C) = ∪_{k=1}^∞ (A_k ∩ T_A^{−1}C) = ∪_{k=1}^∞ (A_k ∩ T^{−k}C),
hence
µ(T_A^{−1}(C)) = ∑_{k=1}^∞ µ(A_k ∩ T^{−k}C).
On the other hand, using (1.1) repeatedly, one gets for any n ≥ 1,
Since
1 ≥ µ(∪_{n=1}^∞ (B_n ∩ T^{−n}C)) = ∑_{n=1}^∞ µ(B_n ∩ T^{−n}C),
it follows that
lim_{n→∞} µ(B_n ∩ T^{−n}C) = 0.
Thus,
µ(C) = µ(T^{−1}C) = ∑_{k=1}^∞ µ(A_k ∩ T^{−k}C) = µ(T_A^{−1}C).
This shows that µA (C) = µA (TA−1 C), which implies that TA is measure pre-
serving with respect to µA .
(b) Determine explicitly the induced transformation of T on the set [0, 1) × [0, 1/G).
where B ⊂ A, B ∈ F and i ∈ N.
(3) µ(B, i) = ν(B) / ∫_A f(y) dν(y), and then extend µ to all of X.

. . . = ν(B) / ∫_A f(y) dν(y) = µ(B, 1).
1.6 Ergodicity
Definition 1.6.1 Let T be a measure preserving transformation on a proba-
bility space (X, F, µ). The map T is said to be ergodic if for every measurable
set A satisfying T −1 A = A, we have µ(A) = 0 or 1.
(iv) If A, B ∈ F with µ(A) > 0 and µ(B) > 0, then there exists n > 0 such
that µ(T −n A ∩ B) > 0.
Remark 1.6.1
1. In case T is invertible, then in the above characterization one can replace
T −n by T n .
2. Note that if µ(B∆T^{−1}B) = 0, then µ(B \ T^{−1}B) = µ(T^{−1}B \ B) = 0. Since
B = (B \ T^{−1}B) ∪ (B ∩ T^{−1}B),
and
T^{−1}B = (T^{−1}B \ B) ∪ (B ∩ T^{−1}B),
we see that after removing a set of measure 0 from B and a set of measure 0
from T −1 B, the remaining parts are equal. In this case we say that B equals
T −1 B modulo sets of measure 0.
3. In words, (iii) says that if A is a set of positive measure, almost every
x ∈ X eventually (in fact infinitely often) will visit A.
4. (iv) says that elements of B will eventually enter A.
Using induction (and the fact that µ(E∆F) ≤ µ(E∆G) + µ(G∆F)), one can show that for each k ≥ 1 one has µ(T^{−k}B ∆ B) = 0. Hence, µ(C∆B) = 0, which implies that µ(C) = µ(B). Therefore, µ(B) = 0 or 1.
(ii)⇒(iii) Let µ(A) > 0 and let B = ∪_{n=1}^∞ T^{−n}A. Then T^{−1}B ⊂ B. Since T is measure preserving, then µ(B) > 0 and
We use the subscript R whenever we are dealing only with real-valued func-
tions.
Let (Xi , Fi , µi ), i = 1, 2 be two probability spaces, and T : X1 → X2 a mea-
sure preserving transformation i.e., µ2 (A) = µ1 (T −1 A). Define the induced
operator UT : L0 (X2 , F2 , µ2 ) → L0 (X1 , F1 , µ1 ) by
UT f = f ◦ T.
(i) UT is linear
(i) T is ergodic.
Proof
The implications (iii)⇒(ii), (ii)⇒(iv), (v)⇒(iv), and (iii)⇒(v) are all clear.
It remains to show (i)⇒(iii) and (iv)⇒(i).
(i)⇒(iii) Suppose f (T x) = f (x) a.e. and assume without any loss of gener-
ality that f is real (otherwise we consider separately the real and imaginary
parts of f ). For each n ≥ 1 and k ∈ Z, let
X_{(k,n)} = {x ∈ X : k/2^n ≤ f(x) < (k + 1)/2^n}.
Then, T^{−1}X_{(k,n)} ∆ X_{(k,n)} ⊆ {x : f(T x) ≠ f(x)}, which implies that µ(T^{−1}X_{(k,n)} ∆ X_{(k,n)}) = 0.
hence lim_{n→∞} k_n/2^n exists. Let Y = ∩_{n≥1} X_{(k_n,n)}; then µ(Y) = 1. Now, if x ∈ Y, then 0 ≤ |f(x) − k_n/2^n| < 1/2^n for all n. Hence, f(x) = lim_{n→∞} k_n/2^n, and f is constant on Y.
Exercise 1.8.1 Suppose θ = p/q with gcd(p, q) = 1. Find a non-trivial Tθ-invariant set. Conclude that Tθ is not ergodic if θ is rational.
Hence, ∑_{n∈Z} a_n (1 − e^{2πinθ}) e^{2πinx} = 0. By the uniqueness of the Fourier coefficients, we have a_n(1 − e^{2πinθ}) = 0 for all n ∈ Z. If n ≠ 0, since θ is irrational we have 1 − e^{2πinθ} ≠ 0. Thus, a_n = 0 for all n ≠ 0, and therefore f(x) = a_0 is a constant. By Theorem 1.7.1, Tθ is ergodic.
Exercise 1.8.2 Consider the probability space ([0, 1) × [0, 1), B × B, λ × λ), where as above B is the Lebesgue σ-algebra on [0, 1), and λ is normalized Lebesgue measure. Suppose θ ∈ (0, 1) is irrational, and define Tθ × Tθ : [0, 1) × [0, 1) → [0, 1) × [0, 1) by
Tθ × Tθ (x, y) = (x + θ mod 1, y + θ mod 1).
Show that Tθ × Tθ is measure preserving, but is not ergodic.
Example 2 – One (or two) sided shift. Let X = {0, 1, . . . , k − 1}^N, F the σ-algebra generated by the cylinders, and µ the product measure defined on
cylinder sets by
µ ({x : x0 = a0 , . . . xn = an }) = pa0 . . . pan ,
where p = (p0 , p1 , . . . , pk−1 ) is a positive probability vector. Consider the
left shift T defined on X by T x = y, where yn = xn+1 (See Example (e) in
Subsection 1.3). We show that T is ergodic. Let E be a measurable subset
of X which is T-invariant, i.e., T^{−1}E = E. For any ε > 0, by Lemma 1.2.1 (see subsection 1.2), there exists A ∈ F which is a finite disjoint union of cylinders such that µ(E∆A) < ε. Then
|µ(E) − µ(A)| = |µ(E \ A) − µ(A \ E)| ≤ µ(E \ A) + µ(A \ E) = µ(E∆A) < ε.
Since A depends on finitely many coordinates only, there exists n0 > 0
such that T −n0 A depends on different coordinates than A. Since µ is a
product measure, we have
µ(A ∩ T −n0 A) = µ(A)µ(T −n0 A) = µ(A)2 .
Further,
|µ(E) − µ(A ∩ T^{−n_0}A)| ≤ µ(E∆(A ∩ T^{−n_0}A)) ≤ µ(E∆A) + µ(E∆T^{−n_0}A) < 2ε.
Hence, using µ(A ∩ T^{−n_0}A) = µ(A)² and |µ(A) − µ(E)| < ε,
|µ(E) − µ(E)²| ≤ |µ(E) − µ(A)²| + |µ(A)² − µ(E)²| < 2ε + 2ε = 4ε.
Thus, since ε > 0 was arbitrary, µ(E) = µ(E)², so that µ(E) = 0 or µ(E) = 1, and T is ergodic.
The following lemma provides, in some cases, a useful tool to verify that a
measure preserving transformation defined on ([0, 1), B, µ) is ergodic, where
B is the Lebesgue σ-algebra, and µ is a probability measure equivalent to
Lebesgue measure λ (i.e., µ(A) = 0 if and only if λ(A) = 0).
then λ(B) = 1.
Proof The proof is done by contradiction. Suppose λ(B c ) > 0. Given ε > 0
there exists by Lemma 1.2.1 a set Eε that is a finite disjoint union of open
intervals such that λ(B c 4Eε ) < ε. Now by conditions (a) and (b) (that
is, writing Eε as a countable union of disjoint elements of C) one gets that
λ(B ∩ Eε ) ≥ γλ(Eε ).
Examples of Ergodic Transformations 27
we have that
γ(λ(B^c) − ε) < λ(B^c ∆ E_ε) < ε.
Hence γλ(B c ) < ε + γε, and since ε > 0 is arbitrary, we get a contradiction.
(see Example (b), subsection 1.3). We have seen that T is measure preserving.
We will use Lemma 1.8.1 to show that T is ergodic. Let C be the collection
of all intervals of the form [k/2n , (k + 1)/2n ) with n ≥ 1 and 0 ≤ k ≤ 2n − 1.
Notice that the set {k/2^n : n ≥ 1, 0 ≤ k < 2^n − 1} of dyadic rationals
is dense in [0, 1), hence each open interval is at most a countable union of
disjoint elements of C. Hence, C satisfies the first hypothesis of Knopp’s
Lemma. Now, T n maps each dyadic interval of the form [k/2n , (k + 1)/2n )
linearly onto [0, 1), (we call such an interval dyadic of order n); in fact,
T n x = 2n x mod(1). Let B ∈ B be T -invariant, and assume λ(B) > 0. Let
A ∈ C, and assume that A is dyadic of order n. Then, T n A = [0, 1) and
λ(A ∩ B) = λ(A ∩ T^{−n}B) = (1/2^n) λ(T^n A ∩ B) = (1/2^n) λ(B) = λ(A)λ(B).
Thus, the second hypothesis of Knopp’s Lemma is satisfied with γ = λ(B) >
0. Hence, λ(B) = 1. Therefore T is ergodic.
B_k = {x ∈ X \ A : T x, . . . , T^{k−1}x ∉ A, T^k x ∈ A}.
Exercise 1.8.4 Show that if T_A is ergodic and µ(∪_{k≥1} T^{−k}A) = 1, then T is ergodic.
Chapter 2
For example consider X = {0, 1}N , F the σ-algebra generated by the cylinder
sets, and µ the uniform product measure, i.e.,
µ ({x : x1 = a1 , x2 = a2 , . . . , xn = an }) = 1/2n .
Suppose one is interested in finding the frequency of the digit 1. More pre-
cisely, for a.e. x we would like to find
lim_{n→∞} (1/n) #{1 ≤ i ≤ n : x_i = 1}.
Using the Strong Law of Large Numbers one can answer this question easily.
Define
Y_i(x) := 1 if x_i = 1, and Y_i(x) := 0 otherwise.
30 The Ergodic Theorem
lim_{n→∞} (1/n) #{1 ≤ i ≤ n : x_i = 1} = 1/2.
Suppose now we are interested in the frequency of the block 011, i.e., we
would like to find
lim_{n→∞} (1/n) #{1 ≤ i ≤ n : x_i = 0, x_{i+1} = 1, x_{i+2} = 1}.
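Both frequencies are easy to estimate empirically; a sketch (our own illustration, with a seeded pseudo-random sequence standing in for a µ-typical point, so the averages are only approximately 1/2 and 1/8):

```python
import random

def block_frequency(bits, block):
    """Fraction of starting positions at which the word `block` occurs in `bits`."""
    n = len(bits) - len(block) + 1
    hits = sum(1 for i in range(n) if bits[i:i + len(block)] == block)
    return hits / n

rng = random.Random(42)
bits = [rng.randint(0, 1) for _ in range(200_000)]   # a sample point for the uniform product measure
freq_1 = block_frequency(bits, [1])                  # should be near 1/2
freq_011 = block_frequency(bits, [0, 1, 1])          # should be near 1/8
```

With 200,000 digits the deviations from 1/2 and 1/8 are well below one percent, as the strong law (and, later, the ergodic theorem) predicts.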
For the proof of the above theorem, we need the following simple lemma.
Lemma 2.1.1 Let M > 0 be an integer, and suppose {an }n≥0 , {bn }n≥0 are
sequences of non-negative real numbers such that for each n = 0, 1, 2, . . . there
exists an integer 1 ≤ m ≤ M with
an + · · · + an+m−1 ≥ bn + · · · + bn+m−1 .
Then, for each positive integer N > M , one has
a0 + · · · + aN −1 ≥ b0 + · · · + bN −M −1 .
Hence,
an + . . . + an+m−1 = F (T n x) + . . . + F (T n+m−1 x)
≥ f (T n x) + . . . + f (T n+m−1 x) = fm (T n x)
≥ bn + . . . + bn+m−1 .
– If T^n x ∉ X_0, then take m = 1, since
Integrating both sides, and using the fact that T is measure preserving, one gets
N ∫_X F(x) dµ(x) ≥ (N − M) ∫_X min(f(x), L)(1 − ε) dµ(x).
Since
∫_X F(x) dµ(x) = ∫_{X_0} f(x) dµ(x) + L µ(X \ X_0),
one has
∫_X f(x) dµ(x) ≥ ∫_{X_0} f(x) dµ(x) = ∫_X F(x) dµ(x) − L µ(X \ X_0)
≥ ((N − M)/N) ∫_X min(f(x), L)(1 − ε) dµ(x) − Lδ.
f_m(x)/m ≤ f(x) + ε.
For any δ > 0 there exists an integer M > 0 such that the set
The Ergodic Theorem and its consequences 35
Remarks
(1) Let us study further the limit f ∗ in the case that T is not ergodic. Let I
be the sub-σ-algebra of F consisting of all T -invariant subsets A ∈ F. Notice
that if f ∈ L1 (µ), then the conditional expectation of f given I (denoted by
Eµ (f |I)), is the unique a.e. I-measurable L1 (µ) function with the property
that
∫_A f(x) dµ(x) = ∫_A Eµ(f |I)(x) dµ(x)
and
∫_X f 1_A(x) dµ(x) = ∫_X f* 1_A(x) dµ(x).
(2) Suppose T is ergodic and measure preserving with respect to µ, and let
ν be a probability measure which is equivalent to µ (i.e. µ and ν have the
same sets of measure zero so µ(A) = 0 if and only if ν(A) = 0), then for
every f ∈ L1 (µ) one has
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} f(T^i x) = ∫_X f dµ,   ν-a.e.
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} n(T_A^i x) = 1/µ(A),
almost everywhere on A.
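Kac's formula above can be checked numerically. In the sketch below (our own illustration) T is the irrational rotation x ↦ x + θ mod 1 with θ = √2 − 1 and A = [0, 1/4), so the average of the first return times along an orbit should approach 1/µ(A) = 4:

```python
def mean_return_time(theta, a_len, x0, num_returns):
    """Average first return time to A = [0, a_len) along an orbit of x -> x + theta mod 1."""
    x, total, steps, returns = x0, 0, 0, 0
    while returns < num_returns:
        x = (x + theta) % 1.0
        steps += 1
        if x < a_len:              # the orbit is back in A: record n(x) and reset the clock
            total += steps
            steps = 0
            returns += 1
    return total / num_returns

# Kac: the average first return time should be close to 1/mu(A) = 4
avg = mean_return_time(2 ** 0.5 - 1, 0.25, 0.1, 20_000)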
Exercise 2.1.2 Let β = (1 + √5)/2, and consider the transformation Tβ : [0, 1) → [0, 1) given by Tβ x = βx mod 1 = βx − ⌊βx⌋. Define b_1 on [0, 1) by
b_1(x) = 0 if 0 ≤ x < 1/β, and b_1(x) = 1 if 1/β ≤ x < 1,
Fix k ≥ 0. Find the a.e. value (with respect to Lebesgue measure) of the
following limit
lim_{n→∞} (1/n) #{1 ≤ i ≤ n : b_i = 0, b_{i+1} = 0, . . . , b_{i+k} = 0}.
Using the Ergodic Theorem, one can give yet another characterization of
ergodicity.
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_A(T^i x) = ∫_X 1_A(x) dµ(x) = µ(A)   a.e.
Then,
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_{T^{−i}A ∩ B}(x) = lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_{T^{−i}A}(x) 1_B(x)
= 1_B(x) lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_A(T^i x) = 1_B(x) µ(A)   a.e.
Since for each n the function (1/n) ∑_{i=0}^{n−1} 1_{T^{−i}A ∩ B} is dominated by the constant function 1, it follows by the dominated convergence theorem that
lim_{n→∞} (1/n) ∑_{i=0}^{n−1} µ(T^{−i}A ∩ B) = lim_{n→∞} ∫_X (1/n) ∑_{i=0}^{n−1} 1_{T^{−i}A ∩ B}(x) dµ(x) = ∫_X 1_B µ(A) dµ(x) = µ(A)µ(B).
Hence, µ(E) = µ(E)2 . Since µ(E) > 0, this implies µ(E) = 1. Therefore, T
is ergodic.
To show ergodicity one needs to verify equation (2.1) for sets A and B
belonging to a generating semi-algebra only as the next proposition shows.
Proof We only need to show that if (2.2) holds for all A, B ∈ S, then it holds for all A, B ∈ F. Let ε > 0, and A, B ∈ F. Then, by Lemma 1.2.1 (subsection 1.2) there exist sets A_0, B_0, each of which is a finite disjoint union of elements of S, such that
Since µ(A∆A_0) < ε and µ(B∆B_0) < ε, it follows that
|µ(A)µ(B) − µ(A_0)µ(B_0)| < 2ε.
Further, for each i,
µ((T^{−i}A ∩ B) ∆ (T^{−i}A_0 ∩ B_0)) ≤ µ(T^{−i}A ∆ T^{−i}A_0) + µ(B∆B_0) < 2ε.
Hence,
|(1/n) ∑_{i=0}^{n−1} µ(T^{−i}A ∩ B) − µ(A)µ(B)| − |(1/n) ∑_{i=0}^{n−1} µ(T^{−i}A_0 ∩ B_0) − µ(A_0)µ(B_0)|
≤ (1/n) ∑_{i=0}^{n−1} |µ(T^{−i}A ∩ B) − µ(T^{−i}A_0 ∩ B_0)| + |µ(A)µ(B) − µ(A_0)µ(B_0)|
< 4ε.
Therefore,
lim_{n→∞} [ (1/n) ∑_{i=0}^{n−1} µ(T^{−i}A ∩ B) − µ(A)µ(B) ] = 0.
Theorem 2.1.2 Suppose µ1 and µ2 are probability measures on (X, F), and
T : X → X is measurable and measure preserving with respect to µ1 and µ2 .
Then,
Let
C_A = {x ∈ X : lim_{n→∞} (1/n) ∑_{i=0}^{n−1} 1_A(T^i x) = µ_1(A)},
where P^k = (p_{ij}^{(k)}) is the k-th power of the matrix P, and P^0 is the N × N identity matrix. More precisely, Q = (q_{ij}), where
q_{ij} = lim_{n→∞} (1/n) ∑_{k=0}^{n−1} p_{ij}^{(k)}.
Lemma 2.2.1 For each i, j ∈ {0, 1, . . . , N − 1}, the limit lim_{n→∞} (1/n) ∑_{k=0}^{n−1} p_{ij}^{(k)} exists, i.e., q_{ij} is well-defined.
Since (1/n) ∑_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) ≤ 1 for all n, by the dominated convergence theorem,
q_{ij} = (1/π_i) lim_{n→∞} (1/n) ∑_{k=0}^{n−1} µ({x ∈ X : x_0 = i, x_k = j})
= (1/π_i) ∫_X lim_{n→∞} (1/n) ∑_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) dµ(x)
= (1/π_i) ∫_X f*(x) 1_{{x : x_0 = i}}(x) dµ(x)
= (1/π_i) ∫_{{x : x_0 = i}} f*(x) dµ(x),
which is finite since f* is integrable. Hence q_{ij} exists.
Exercise 2.2.1 Show that the matrix Q has the following properties:
(a) Q is stochastic.
(b) Q = QP = P Q = Q2 .
(c) πQ = π.
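The matrix Q and the identities in the exercise can be checked numerically for a concrete P; the sketch below (our own, plain-Python, with an arbitrarily chosen 2 × 2 irreducible P whose stationary vector is π = (2/3, 1/3)) approximates Q by a finite Cesàro average:

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def cesaro_limit(p, n_terms):
    """Approximate Q = lim_n (1/n) sum_{k=0}^{n-1} P^k by a finite Cesaro average."""
    n = len(p)
    power = [[float(i == j) for j in range(n)] for i in range(n)]   # P^0 = identity
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = mat_mul(power, p)
    return [[total[i][j] / n_terms for j in range(n)] for i in range(n)]

P = [[0.9, 0.1], [0.2, 0.8]]      # irreducible; its stationary vector is pi = (2/3, 1/3)
Q = cesaro_limit(P, 4000)
```

For this P the computed Q is stochastic, its rows agree with π, and Q·P agrees with Q, in line with parts (a)–(c) of the exercise.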
Proof
(i) ⇒ (ii) By the ergodic theorem, for each i, j,
lim_{n→∞} (1/n) ∑_{k=0}^{n−1} 1_{{x : x_0 = i, x_k = j}}(x) = 1_{{x : x_0 = i}}(x) π_j.
Characterization of Irreducible Markov Chains 43
Hence,
q_{ij} = (1/π_i) lim_{n→∞} (1/n) ∑_{k=0}^{n−1} µ({x ∈ X : x_0 = i, x_k = j}) = π_j,
i.e., qij is independent of i. Therefore, all rows of Q are identical.
(ii) ⇒ (iii) If all the rows of Q are identical, then all the columns of Q are
constants. Thus, for each j there exists a constant cj such that qij = cj for
all i. Since πQ = π, it follows that qij = cj = πj > 0 for all i, j.
(iii) ⇒ (iv) For any i, j,
lim_{n→∞} (1/n) ∑_{k=0}^{n−1} p_{ij}^{(k)} = q_{ij} > 0.
Hence, there exists n such that p_{ij}^{(n)} > 0; therefore P is irreducible.
(iv) ⇒ (iii) Suppose P is irreducible. For any state i ∈ {0, 1, . . . , N − 1}, let S_i = {j : q_{ij} > 0}. Since Q is a stochastic matrix, it follows that S_i ≠ ∅. Let l ∈ S_i, so that q_{il} > 0. Since Q = QP = QP^n for all n, then for any state j,
q_{ij} = ∑_{m=0}^{N−1} q_{im} p_{mj}^{(n)} ≥ q_{il} p_{lj}^{(n)}
for any n. Since P is irreducible, there exists n such that p_{lj}^{(n)} > 0. Hence, q_{ij} > 0 for all i, j.
(iii) ⇒ (ii) Suppose q_{ij} > 0 for all i, j ∈ {0, 1, . . . , N − 1}. Fix any state j, and let q_j = max_{0≤i≤N−1} q_{ij}. Suppose that not all the q_{ij} are the same. Then there exists k ∈ {0, 1, . . . , N − 1} such that q_{kj} < q_j. Since Q is stochastic and Q² = Q, then for any i ∈ {0, 1, . . . , N − 1} we have
q_{ij} = ∑_{l=0}^{N−1} q_{il} q_{lj} < ∑_{l=0}^{N−1} q_{il} q_j = q_j.
Let
Since T is the left shift, for all n sufficiently large, the cylinders T −n A and
B depend on different coordinates. Hence, for n sufficiently large,
µ(T^{−n}A ∩ B) = π_{j_0} p_{j_0 j_1} · · · p_{j_{m−1} j_m} p_{j_m i_0}^{(n+r−s−m)} p_{i_0 i_1} · · · p_{i_{l−1} i_l}.
Thus,
lim_{n→∞} (1/n) ∑_{k=0}^{n−1} µ(T^{−k}A ∩ B)
= π_{j_0} p_{j_0 j_1} · · · p_{j_{m−1} j_m} p_{i_0 i_1} · · · p_{i_{l−1} i_l} lim_{n→∞} (1/n) ∑_{k=0}^{n−1} p_{j_m i_0}^{(k)}
= (π_{j_0} p_{j_0 j_1} · · · p_{j_{m−1} j_m})(π_{i_0} p_{i_0 i_1} · · · p_{i_{l−1} i_l})
= µ(B)µ(A).
Therefore, T is ergodic.
(ii) ⇒ (v) If all the rows of Q are identical, then q_{ij} = π_j for all i, j. If vP = v, then vQ = v. This implies that for all j, v_j = (∑_{i=0}^{N−1} v_i) π_j. Thus, v is a multiple of π. Therefore, 1 is a simple eigenvalue.
(v) ⇒ (ii) Suppose 1 is a simple eigenvalue. For any i, let q_i^* = (q_{i0}, . . . , q_{i(N−1)}) denote the i-th row of Q; then q_i^* is a probability vector. From Q = QP, we get q_i^* = q_i^* P. By hypothesis, π is the only probability vector satisfying πP = π; hence π = q_i^*, and all the rows of Q are identical.
2.3 Mixing
As a corollary to the ergodic theorem we found a new definition of ergodicity;
namely, asymptotic average independence. Based on the same idea, we now
define other notions of weak independence that are stronger than ergodicity.
Notice that strongly mixing implies weakly mixing, and weakly mixing implies ergodicity. This follows from the simple fact that if {a_n} is a sequence of real numbers such that lim_{n→∞} a_n = 0, then lim_{n→∞} (1/n) ∑_{i=0}^{n−1} |a_i| = 0, and hence lim_{n→∞} (1/n) ∑_{i=0}^{n−1} a_i = 0. Furthermore, if {a_n} is a bounded sequence, then the following are equivalent:
(i) lim_{n→∞} (1/n) ∑_{i=0}^{n−1} |a_i| = 0;
(ii) lim_{n→∞} (1/n) ∑_{i=0}^{n−1} |a_i|² = 0.
Using this one can give three equivalent characterizations of weakly mixing
transformations, can you state them?
(a) Show that if equation (2.3) holds for all A, B ∈ S, then T is weakly
mixing.
(b) Show that if equation (2.4) holds for all A, B ∈ S, then T is strongly
mixing.
Measure Preserving
Isomorphisms and Factor Maps
(1) The measure structure given by the σ-algebra and the probability mea-
sure. Note, that in this context, sets of measure zero can be ignored.
So our notion of being the same must mean that we have a map
ψ : (X, F, µ, T ) → (Y, C, ν, S)
satisfying
[Diagram: a commuting square expressing ψ ∘ T = S ∘ ψ — horizontally T maps N to N and S maps N′ to N′, while ψ maps N to N′ vertically on both sides.]

x → T x → T²x → · · · → T^n x → · · ·
↓  ↓  ↓     ↓
ψ(x) → S(ψ(x)) → S²(ψ(x)) → · · · → S^n(ψ(x)) → · · ·
Exercise 3.1.1 Suppose (X, F, µ, T ) and (Y, C, ν, S) are two isomorphic dy-
namical systems. Show that
Examples
(1) Let K = {z ∈ C : |z| = 1} be equipped with the Borel σ-algebra B on
K, and Haar measure (i.e., normalized Lebeque measure on the unit circle).
Measure Preserving Isomorphisms 49
where ∑_{k=1}^∞ a_k/N^k is the N-adic expansion of x (for example, if N = 2 we get the binary expansion, and if N = 10 we get the decimal expansion). Let
Exercise 3.1.3 Let G = (1 + √5)/2, so that G² = G + 1. Consider the set
X = ([0, 1/G) × [0, 1)) ∪ ([1/G, 1) × [0, 1/G)),
endowed with the product Borel σ-algebra. Define the transformation
T(x, y) = (Gx, y/G) if (x, y) ∈ [0, 1/G) × [0, 1), and T(x, y) = (Gx − 1, (1 + y)/G) if (x, y) ∈ [1/G, 1) × [0, 1/G).
(a) Show that T is measure preserving with respect to normalized Lebesgue
measure on X.
(b) Now let S : [0, 1) × [0, 1) → [0, 1) × [0, 1) be given by
S(x, y) = (Gx, y/G) if (x, y) ∈ [0, 1/G) × [0, 1), and S(x, y) = (G²x − G, (G + y)/G²) if (x, y) ∈ [1/G, 1) × [0, 1).
Factor Maps 51
T −1 G = T −1 ψ −1 C = ψ −1 S −1 C ⊆ ψ −1 C = G.
and let S be the left shift on X = {0, 1}N with the σ-algebra F generated by
the cylinders, and the uniform product measure µ. Define ψ : [0, 1)×[0, 1) →
X by
ψ(x, y) = (a1 , a2 , . . .),
where x = ∑_{n=1}^∞ a_n/2^n is the binary expansion of x. It is easy to check that ψ is a factor map.
Exercise 3.2.1 Let T be the left shift on X = {0, 1, 2}N which is endowed
with the σ-algebra F, generated by the cylinder sets, and the uniform product
measure µ giving each symbol probability 1/3, i.e.,
µ({x ∈ X : x_1 = i_1, x_2 = i_2, . . . , x_n = i_n}) = 1/3^n,
where i1 , i2 , . . . , in ∈ {0, 1, 2}.
Let S be the left shift on Y = {0, 1}N which is endowed with the σ-algebra G,
generated by the cylinder sets, and the product measure ν giving the symbol
0 probability 1/3 and the symbol 1 probability 2/3, i.e.,
ν({y ∈ Y : y_1 = j_1, y_2 = j_2, . . . , y_n = j_n}) = (2/3)^{j_1+j_2+···+j_n} (1/3)^{n−(j_1+j_2+···+j_n)},
3 3
where j1 , j2 , . . . , jn ∈ {0, 1}. Show that S is a factor of T .
∨_{k=0}^∞ T^k ψ^{−1} G

ψ(. . . , x_{−1}, x_0, x_1, . . .) = (x_0, x_1, . . .).
Entropy
the symbol a_i, and T the left shift. We define the entropy of this system by
H(p_1, . . . , p_n) = h(T) := − ∑_{i=1}^n p_i log₂ p_i.   (4.1)
The only function with this property is the logarithm function to any base.
We choose base 2 because information is usually measured in bits.
In the above example of the ticker-tape the symbols were transmitted
independently. In general, the symbol generated might depend on what has
been received before. In fact these dependencies are often ‘built-in’ so as to be able to check the transmitted sequence of symbols for errors (think here of the Morse sequence, sequences on compact discs, etc.). Such dependencies
must be taken into consideration in the calculation of the average information
per symbol. This can be achieved if one replaces the symbols ai by blocks of
symbols of particular size. More precisely, for every n, let Cn be the collection
of all possible n-blocks (or cylinder sets) of length n, and define
H_n := − ∑_{C∈C_n} P(C) log P(C).
Then (1/n)H_n can be seen as the average information per symbol when a block
of length n is transmitted. The entropy of the source is now defined by
h := lim_{n→∞} H_n/n.   (4.2)
The existence of the limit in (4.2) follows from the fact that Hn is a subadditive
sequence, i.e., Hn+m ≤ Hn + Hm , and proposition (4.2.2) (see proposition
(4.2.3) below).
Now replace the source by a measure preserving system (X, B, µ, T ).
How can one define the entropy of this system similar to the case of a
source? The symbols {a1 , a2 , . . . , an } can now be viewed as a partition
A = {A1 , A2 , . . . , An } of X, so that X is the disjoint union (up to sets
of measure zero) of A1 , A2 , . . . , An . The source can be seen as follows: with
each point x ∈ X, we associate an infinite sequence · · · x−1 , x0 , x1 , · · · , where
xi is aj if and only if T i x ∈ Aj . We define the entropy of the partition α by
H(α) = Hµ(α) := − ∑_{i=1}^n µ(A_i) log µ(A_i).
T −1 α := {T −1 A1 , . . . , T −1 An }
and
α ∨ β := {Ai ∩ Bj : Ai ∈ α, Bj ∈ β}
are both partitions of X.
The members of a partition are called the atoms of the partition. We say
that the partition β = {B1 , . . . , Bm } is a refinement of the partition α =
{A1 , . . . , An }, and write α ≤ β, if for every 1 ≤ j ≤ m there exists an
1 ≤ i ≤ n such that Bj ⊂ Ai (up to sets of measure zero). The partition
α ∨ β is called the common refinement of α and β.
Proof We prove properties (b) and (c), the rest are left as an exercise.
H(α ∨ β) = − ∑_{A∈α} ∑_{B∈β} µ(A ∩ B) log µ(A ∩ B)
= − ∑_{A∈α} ∑_{B∈β} µ(A ∩ B) log (µ(A ∩ B)/µ(A)) − ∑_{A∈α} ∑_{B∈β} µ(A ∩ B) log µ(A)
= H(β|α) + H(α).
We now show that H(β|α) ≤ H(β). Recall that the function f (t) = −t log t
for 0 < t ≤ 1 is concave down. Thus,
H(β|α) = − ∑_{B∈β} ∑_{A∈α} µ(A ∩ B) log (µ(A ∩ B)/µ(A))
= − ∑_{B∈β} ∑_{A∈α} µ(A) (µ(A ∩ B)/µ(A)) log (µ(A ∩ B)/µ(A))
= ∑_{B∈β} ∑_{A∈α} µ(A) f(µ(A ∩ B)/µ(A))
≤ ∑_{B∈β} f( ∑_{A∈α} µ(A) (µ(A ∩ B)/µ(A)) )
= ∑_{B∈β} f(µ(B)) = H(β).
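Properties (b) and (c) can be verified on a toy example; the following sketch (our own illustration: a four-point space with arbitrarily chosen weights, entropy in bits, with the convention 0 log 0 = 0) computes H(α), H(β), H(α ∨ β) and the conditional entropy H(β|α) = H(α ∨ β) − H(α):

```python
from math import log2

def H(partition, mu):
    """Entropy (bits) of a partition, given as a list of atoms (frozensets of points)."""
    ent = 0.0
    for atom in partition:
        p = sum(mu[w] for w in atom)
        if p > 0:                      # convention: 0 log 0 = 0
            ent -= p * log2(p)
    return ent

def join(alpha, beta):
    """Common refinement alpha v beta: all nonempty intersections of atoms."""
    return [a & b for a in alpha for b in beta if a & b]

mu = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.4}
alpha = [frozenset({0, 1}), frozenset({2, 3})]
beta = [frozenset({0, 2}), frozenset({1, 3})]
h_join = H(join(alpha, beta), mu)
h_cond = h_join - H(alpha, mu)     # H(beta|alpha), via property (b)
```

On this example H(β|α) comes out slightly below H(β), as the concavity argument above guarantees.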
exists.
Proof Fix any m > 0. For any n ≥ 0 one has n = km + i for some 0 ≤ i ≤ m − 1. By subadditivity it follows that
a_n/n = a_{km+i}/(km + i) ≤ a_{km}/(km) + a_i/(km) ≤ k a_m/(km) + a_i/(km) = a_m/m + a_i/(km).
Note that as n → ∞, also k → ∞, and so lim sup_{n→∞} a_n/n ≤ a_m/m. Since m is arbitrary, one has
lim sup_{n→∞} a_n/n ≤ inf_m a_m/m ≤ lim inf_{n→∞} a_n/n.
a_{n+p} = H(∨_{i=0}^{n+p−1} T^{−i}α) ≤ H(∨_{i=0}^{n−1} T^{−i}α) + H(∨_{i=n}^{n+p−1} T^{−i}α) = a_n + H(∨_{i=0}^{p−1} T^{−i}α) = a_n + a_p.
exists.
We are now in a position to give the definition of the entropy of the transformation T.
where
H(∨_{i=0}^{n−1} T^{−i}α) = − ∑_{D ∈ ∨_{i=0}^{n−1} T^{−i}α} µ(D) log µ(D).
Now, dividing by n and taking the limit as n → ∞, one gets the desired result.
Setting
where in the last inequality the supremum is taken over all possible finite
partitions α of X. Thus hν (S) ≤ hµ (T ). The proof of hµ (T ) ≤ hν (S) is done
similarly. Therefore hν (S) = hµ (T ), and the proof is complete.
Ai := {x ∈ X : x0 = i} , i = 1, . . . , n .
T −m Ai = {x ∈ X : xm = i} ,
and
In other words, ∨_{i=0}^m T^{−i}α is precisely the collection of cylinders of length m (i.e., the collection of all m-blocks), and these by definition generate F.
Hence, α is a generating partition, so that
h(T) = h(α, T) = lim_{m→∞} (1/m) H(∨_{i=0}^{m−1} T^{−i}α).
α, T −1 α, · · · , T −(m−1) α
H(α ∨ T^{−1}α ∨ · · · ∨ T^{−(m−1)}α) = H(α) + H(T^{−1}α) + · · · + H(T^{−(m−1)}α) = mH(α) = −m ∑_{i=1}^n p_i log p_i.
Thus,
h(T) = lim_{m→∞} (1/m) (−m ∑_{i=1}^n p_i log p_i) = − ∑_{i=1}^n p_i log p_i.
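The identity H(∨_{i=0}^{m−1} T^{−i}α) = mH(α) for the Bernoulli shift can be checked directly, since the measure of an m-block is a product of the p_i; a sketch (our own, with p = (1/2, 1/4, 1/4) an arbitrary choice):

```python
from itertools import product
from math import log2

def block_entropy(p, m):
    """Entropy (bits) of the partition of the Bernoulli(p) shift into m-blocks."""
    ent = 0.0
    for word in product(range(len(p)), repeat=m):
        mu = 1.0
        for s in word:
            mu *= p[s]                 # product measure of the cylinder determined by `word`
        ent -= mu * log2(mu)
    return ent

p = (0.5, 0.25, 0.25)
h1 = block_entropy(p, 1)               # H(alpha) = 1.5 bits
h4 = block_entropy(p, 4)               # equals 4 * H(alpha) for a product measure
```

So H_m/m is constant in m here, and the limit defining h(T) is attained at every m.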
Exercise 4.3.1 Let T be the left shift on X = {1, 2, · · · , n}Z endowed with
the σ-algebra F generated by the cylinder sets, and the Markov measure µ
given by the stochastic matrix P = (pij ), and the probability vector π =
(π1 , . . . , πn ) with πP = π. Show that
h(T) = − ∑_{i=1}^n ∑_{j=1}^n π_i p_{ij} log p_{ij}.
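The formula in the exercise is straightforward to evaluate; a sketch (our own illustration with a two-state chain; for a 2 × 2 stochastic matrix the stationary vector has the closed form used below):

```python
from math import log2

def markov_entropy(P, pi):
    """h(T) = - sum_i pi_i sum_j p_ij log2 p_ij for the Markov shift."""
    n = len(P)
    return -sum(pi[i] * P[i][j] * log2(P[i][j])
                for i in range(n) for j in range(n) if P[i][j] > 0)

P = [[0.9, 0.1], [0.2, 0.8]]
# For a 2x2 chain, pi = (p10, p01) / (p01 + p10) solves pi P = pi
pi = (P[1][0] / (P[0][1] + P[1][0]), P[0][1] / (P[0][1] + P[1][0]))
h = markov_entropy(P, pi)
```

For this chain h ≈ 0.553 bits, well below log₂ 2 = 1: the strong dependence between consecutive symbols lowers the information rate relative to the i.i.d. case.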
We claim that
I_{α|β}(x) = − log Eµ(1_{α(x)} | σ(β)) = − ∑_{A∈α} 1_A(x) log Eµ(1_A | σ(β)),   (4.3)
Furthermore,
lim_{n→∞} (1/(n+1)) H(∨_{i=0}^n T^{−i}α) = lim_{n→∞} (1/(n+1)) ∫_X I_{∨_{i=0}^n T^{−i}α}(x) dµ(x) = h(α, T).
and
B_n = {x ∈ X : f_1^A(x) ≤ t, . . . , f_{n−1}^A(x) ≤ t, f_n^A(x) > t}.
Notice that for x ∈ A one has f_n(x) = f_n^A(x), and for x ∈ B_n one has Eµ(1_A | ∨_{i=1}^n T^{−i}α)(x) < 2^{−t}. Since B_n ∈ σ(∨_{i=1}^n T^{−i}α), then
µ(B_n ∩ A) = ∫_{B_n} 1_A(x) dµ(x) = ∫_{B_n} Eµ(1_A | ∨_{i=1}^n T^{−i}α)(x) dµ(x) ≤ ∫_{B_n} 2^{−t} dµ(x) = 2^{−t} µ(B_n).
Thus,
µ({x ∈ A : f*(x) > t}) = µ({x ∈ A : f_n(x) > t for some n})
= µ({x ∈ A : f_n^A(x) > t for some n})
= µ(∪_{n=1}^∞ (A ∩ B_n)) = ∑_{n=1}^∞ µ(A ∩ B_n) ≤ ∑_{n=1}^∞ 2^{−t} µ(B_n) ≤ 2^{−t}.
hence,
. . . = ∑_{A∈α} ∫_0^{−log µ(A)} µ(A) dt + ∑_{A∈α} ∫_{−log µ(A)}^∞ 2^{−t} dt
= − ∑_{A∈α} µ(A) log µ(A) + ∑_{A∈α} µ(A)/log_e 2
= Hµ(α) + 1/(log_e 2) < ∞.
So far we have defined the notion of the conditional entropy Iα|β when α
and β are countable partitions. We can generalize the definition to the case
α is a countable partition and G is a σ-algebra as follows (see equation (4.3)),
Iα|G (x) = − log Eµ (1α(x) |G).
If we denote by ∨_{i=1}^∞ T^{−i}α = σ(∪_n ∨_{i=1}^n T^{−i}α), then
I_{α | ∨_{i=1}^∞ T^{−i}α}(x) = lim_{n→∞} I_{α | ∨_{i=1}^n T^{−i}α}(x).   (4.4)
Exercise 4.4.2 Give a proof of equation (4.4) using the following important theorem, known as the Martingale Convergence Theorem (stated here in our setting).
Theorem 4.4.1 (Martingale Convergence Theorem) Let C_1 ⊆ C_2 ⊆ · · · be an increasing sequence of σ-algebras, and let C = σ(∪_n C_n). If f ∈ L¹(µ), then
Eµ(f |C) = lim_{n→∞} Eµ(f |C_n)
lim_{n→∞} f(T^n x)/n = 0,   µ-a.e.

lim_{n→∞} (1/(n+1)) I_{∨_{i=0}^n T^{−i}α}(x) = h(α, T)   a.e.
= I_{∨_{i=1}^{n−1} T^{−i}α}(T x) + I_{α | ∨_{i=1}^{n−1} T^{−i}α}(T x) + f_n(x)
= I_{∨_{i=0}^{n−2} T^{−i}α}(T²x) + f_{n−1}(T x) + f_n(x)
...
= I_α(T^n x) + f_1(T^{n−1}x) + · · · + f_{n−1}(T x) + f_n(x).
We now study the sequence {(1/(n+1)) ∑_{k=0}^n (f_{n−k} − f)(T^k x)}. Let
If we take the limit as $n\to\infty$, then by Exercise 4.4.3 the second term tends to 0 a.e., and by the ergodic theorem together with the dominated convergence theorem the first term tends to zero a.e. Hence,
\[
\lim_{n\to\infty}\frac{1}{n+1}\, I_{\bigvee_{i=0}^{n} T^{-i}\alpha}(x) = h(\alpha,T) \quad \text{a.e.}
\]
The above theorem can be interpreted as providing an estimate of the size of the atoms of $\bigvee_{i=0}^{n} T^{-i}\alpha$. For $n$ sufficiently large, a typical element $A_n\in\bigvee_{i=0}^{n} T^{-i}\alpha$ satisfies
\[
-\frac{1}{n+1}\log\mu(A_n) \approx h(\alpha,T),
\]
or
\[
\mu(A_n) \approx 2^{-(n+1)h(\alpha,T)}.
\]
Furthermore, if $\alpha$ is a generating partition (i.e. $\bigvee_{i=0}^{\infty} T^{-i}\alpha = \mathcal{F}$), then in the conclusion of the Shannon-McMillan-Breiman Theorem one can replace $h(\alpha,T)$ by $h(T)$.
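The estimate $\mu(A_n)\approx 2^{-(n+1)h(\alpha,T)}$ is easy to observe numerically. The sketch below (an illustration, not part of the text's formal development) takes the simplest case: the left shift with a Bernoulli$(p, 1-p)$ product measure and $\alpha$ the first-coordinate partition, where the cylinder measures can be written down exactly and $-\frac{1}{n}\log_2\mu(A_n(x))$ converges, for a typical $x$, to the entropy $-p\log_2 p - (1-p)\log_2(1-p)$.

```python
import math
import random

def smb_estimate(p, n, seed=0):
    """Sample a 'typical' point of the Bernoulli(p, 1-p) shift and return
    -(1/n) * log2 of the measure of the n-cylinder containing it."""
    rng = random.Random(seed)
    x = [1 if rng.random() < p else 0 for _ in range(n)]
    # mu(cylinder) is a product of the one-symbol probabilities
    log_mu = sum(math.log2(p if xi == 1 else 1.0 - p) for xi in x)
    return -log_mu / n

p = 0.3
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # entropy of this shift
est = smb_estimate(p, 200_000)
print(round(est, 3), round(h, 3))  # both close to 0.881
```

For an i.i.d. sequence this convergence is just the strong law of large numbers; the content of the Shannon-McMillan-Breiman Theorem is that the same convergence holds for every ergodic system.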
\[
x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}} = [0; a_1, a_2, \cdots] \tag{4.5}
\]
In other words, if $B_n(x)$ denotes the decimal cylinder consisting of all points $y$ in $[0,1)$ such that the first $n$ decimal digits of $y$ agree with those of $x$, and if $C_j(x)$ denotes the continued fraction cylinder of order $j$ containing $x$, i.e., $C_j(x)$ is the set of all points in $[0,1)$ whose first $j$ continued fraction digits are the same as those of $x$, then $m(n,x)$ is the largest integer such that $B_n(x)\subset C_{m(n,x)}(x)$. Lochs proved the following theorem:
Theorem 4.5.1 Let $\lambda$ denote Lebesgue measure on $[0,1)$. Then for a.e. $x\in[0,1)$,
\[
\lim_{n\to\infty}\frac{m(n,x)}{n} = \frac{6\log 2\log 10}{\pi^2}.
\]
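Lochs' constant $6\log 2\log 10/\pi^2\approx 0.9703$ can be observed experimentally. The sketch below (an illustration, not part of Lochs' proof) approximates $m(n,x)$ for a "typical" $x$ built from random decimal digits: it counts how many leading continued fraction digits the two endpoints of the decimal cylinder $B_n(x)$ share, using exact rational arithmetic; the endpoint comparison can miscount by one digit at cylinder boundaries, which is negligible here.

```python
from fractions import Fraction
import random

def cf_digits(x, max_terms):
    """Continued fraction digits of a Fraction x in [0, 1)."""
    digits = []
    while x != 0 and len(digits) < max_terms:
        a = int(1 / x)        # next partial quotient
        digits.append(a)
        x = 1 / x - a         # one step of the continued fraction (Gauss) map
    return digits

def lochs_m(decimal_digits):
    """Approximate m(n, x): continued fraction digits of x determined by
    its first n decimal digits, via the endpoints of B_n(x)."""
    n = len(decimal_digits)
    d = int("".join(map(str, decimal_digits)))
    lo = Fraction(d, 10 ** n)       # left endpoint of the decimal cylinder
    hi = Fraction(d + 1, 10 ** n)   # right endpoint
    a, b = cf_digits(lo, 2 * n), cf_digits(hi, 2 * n)
    m = 0
    while m < min(len(a), len(b)) and a[m] == b[m]:
        m += 1
    return m

random.seed(1)
n = 1000
digits = [random.randrange(10) for _ in range(n)]
ratio = lochs_m(digits) / n
print(ratio)   # close to 6 log 2 log 10 / pi^2 ~ 0.9703 for a.e. x
```

Random digits give a Lebesgue-typical point, so the almost-everywhere statement of the theorem applies.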
Definition 4.5.2 Let $c\ge 0$. We say that $P$ has entropy $c$ a.e. with respect to $\lambda$ if
\[
-\frac{\log\lambda(P_n(x))}{n}\to c \quad \text{a.e.}
\]
\[
\frac{m_{P,Q}(n,x)}{n}\to\frac{c}{d} \quad \text{a.e.}
\]
(respectively, $\lambda(Q_n(x)) > 2^{-n(d+\eta)}$.)
Exercise 4.5.2 Prove Theorem 4.5.1 using Theorem 4.5.2. Use the fact that the continued fraction map $T$ is ergodic with respect to the Gauss measure $\mu$, given by
\[
\mu(B) = \int_B \frac{1}{\log 2}\,\frac{1}{1+x}\,dx,
\]
and has entropy equal to $h_\mu(T) = \dfrac{\pi^2}{6\log 2}$.
Exercise 4.5.3 Reformulate and prove Lochs’ Theorem for any two NTFM
maps S and T on [0, 1).
Chapter 5

Hurewicz Ergodic Theorem

\[
\mu(A) = 0 \Leftrightarrow \nu(A) = 0, \qquad A\in\mathcal{F}.
\]
Usually the function $f$ is denoted by $\dfrac{d\mu}{d\nu}$ and the function $g$ by $\dfrac{d\nu}{d\mu}$.
Now suppose that $(X,\mathcal{B},\mu)$ is a probability space, and $T: X\to X$ a measurable transformation. One can define a new measure $\mu\circ T^{-1}$ on $(X,\mathcal{B})$ by $\mu\circ T^{-1}(A) = \mu(T^{-1}A)$ for $A\in\mathcal{B}$. It is not hard to prove that for $f\in L^1(\mu)$,
\[
\int f\,d(\mu\circ T^{-1}) = \int f\circ T\,d\mu. \tag{5.1}
\]
Proof. We prove the result for indicator functions only; the rest of the proof is left to the reader.
\[
\int_X 1_A(x)\,d\mu(x) = \mu(A) = \mu(T(T^{-1}A)) = \int_{T^{-1}A}\omega_1(x)\,d\mu(x) = \int_X 1_A(Tx)\,\omega_1(x)\,d\mu(x).
\]
Proposition 5.2.2 Under the assumptions of Proposition 5.2.1, one has for
all n, m ≥ 1, that
Therefore, for almost every x ∈ A there exist infinitely many n ≥ 1 such that
T n x ∈ A. Replacing T by T −1 yields the result for n ≤ −1.
\[
= \sum_{n=1}^{\infty}\mu(T^n B) = \sum_{n=1}^{\infty}\int_X 1_{T^n B}(x)\,d\mu(x) = \int_X\sum_{n=1}^{\infty} 1_B(T^{-n}x)\,d\mu(x).
\]
Hence, $\int_X\sum_{n=1}^{\infty} 1_B(T^{-n}x)\,d\mu(x) < \infty$, which implies that
\[
\sum_{n=1}^{\infty} 1_B(T^{-n}x) < \infty \quad \mu\text{-a.e.}
\]
\[
\overline{f}(x) = \limsup_{n\to\infty}\frac{f_n(x)}{\sum_{i=0}^{n-1}\omega_i(x)} = \limsup_{n\to\infty}\frac{f_n(x)}{g_n(x)},
\]
and
\[
\underline{f}(x) = \liminf_{n\to\infty}\frac{f_n(x)}{\sum_{i=0}^{n-1}\omega_i(x)} = \liminf_{n\to\infty}\frac{f_n(x)}{g_n(x)}.
\]
By Proposition 5.2.2, one has $g_{n+m}(x) = g_n(x) + g_m(T^n x)$. Using Exercise 5.2.1 and Proposition 5.2.4, we will show that $\overline{f}$ and $\underline{f}$ are $T$-invariant. To this end,
\[
\begin{aligned}
\overline{f}(Tx) &= \limsup_{n\to\infty}\frac{f_n(Tx)}{g_n(Tx)}\\
&= \limsup_{n\to\infty}\frac{\bigl(f_{n+1}(x) - f(x)\bigr)/\omega_1(x)}{\bigl(g_{n+1}(x) - g(x)\bigr)/\omega_1(x)}\\
&= \limsup_{n\to\infty}\frac{f_{n+1}(x) - f(x)}{g_{n+1}(x) - g(x)}\\
&= \limsup_{n\to\infty}\Bigl(\frac{f_{n+1}(x)}{g_{n+1}(x)}\cdot\frac{g_{n+1}(x)}{g_{n+1}(x) - g(x)} - \frac{f(x)}{g_{n+1}(x) - g(x)}\Bigr)\\
&= \limsup_{n\to\infty}\frac{f_{n+1}(x)}{g_{n+1}(x)}\\
&= \overline{f}(x).
\end{aligned}
\]
(Similarly, $\underline{f}$ is $T$-invariant.)

Now, to prove that $f_*$ exists, is integrable and $T$-invariant, it is enough to show that
\[
\int_X \underline{f}\,d\mu \ge \int_X \overline{f}\,d\mu \ge \int_X \underline{f}\,d\mu.
\]
Hence,
\[
\omega_n(x)\,f_m(T^n x) \ge \min(\overline{f}(x), L)(1-\epsilon)\,g_m(T^n x)\,\omega_n(x).
\]
Now,
- If $T^n x\notin X_0$, then take $m = 1$, since
Hence, by the $T$-invariance of $\overline{f}$ and Lemma 2.1.1, for all integers $N > M$ one has
Integrating both sides, and using Proposition 5.2.1 together with the $T$-invariance of $\overline{f}$, one gets
\[
N\int_X F(x)\,d\mu(x) \ge \int_X \min(\overline{f}(x), L)(1-\epsilon)\,g_{N-M}(x)\,d\mu(x) = (N-M)\int_X \min(\overline{f}(x), L)(1-\epsilon)\,d\mu(x).
\]
Since
\[
\int_X F(x)\,d\mu(x) = \int_{X_0}\overline{f}(x)\,d\mu(x) + L\mu(X\setminus X_0),
\]
one has
\[
\begin{aligned}
\int_X \overline{f}(x)\,d\mu(x) &\ge \int_{X_0}\overline{f}(x)\,d\mu(x)\\
&= \int_X F(x)\,d\mu(x) - L\mu(X\setminus X_0)\\
&\ge \frac{N-M}{N}\int_X \min(\overline{f}(x), L)(1-\epsilon)\,d\mu(x) - L\delta.
\end{aligned}
\]
Notice that $G\le\underline{f}$. Let $b_n = G(T^n x)\,\omega_n(x)$ and $a_n = (\underline{f}(x)+\epsilon)\,\omega_n(x)$. We now check that the sequences $\{a_n\}$ and $\{b_n\}$ satisfy the hypothesis of Lemma 2.1.1 with $M > 0$ as above.
- If $T^n x\in Y_0$, then there exists $1\le m\le M$ such that
Hence,
- If $T^n x\notin Y_0$, then take $m = 1$, since
\[
G(x) + G(Tx)\,\omega_1(x) + \cdots + G(T^{N-M-1}x)\,\omega_{N-M-1}(x) \le (\underline{f}(x)+\epsilon)\,g_N(x).
\]
Chapter 6

Invariant Measures for Continuous Transformations

6.1 Existence
Suppose $X$ is a compact metric space, and let $\mathcal{B}$ be the Borel $\sigma$-algebra, i.e., the $\sigma$-algebra generated by the open sets. Let $M(X)$ be the collection of all Borel probability measures on $X$. There is a natural embedding of the space $X$ in $M(X)$ via the map $x\mapsto\delta_x$, where $\delta_x$ is the Dirac measure concentrated at $x$ ($\delta_x(A) = 1$ if $x\in A$, and is zero otherwise). Furthermore, $M(X)$ is a convex set, i.e., $p\mu + (1-p)\nu\in M(X)$ whenever $\mu,\nu\in M(X)$ and $0\le p\le 1$. Theorem 6.1.2 below shows that a member of $M(X)$ is determined by how it integrates continuous functions. We denote by $C(X)$ the Banach space of all complex-valued continuous functions on $X$ under the supremum norm.

Theorem 6.1.1 Every member of $M(X)$ is regular, i.e., for all $B\in\mathcal{B}$ and every $\epsilon > 0$ there exist an open set $U_\epsilon$ and a closed set $C_\epsilon$ such that $C_\epsilon\subseteq B\subseteq U_\epsilon$ and $\mu(U_\epsilon\setminus C_\epsilon) < \epsilon$.
Idea of proof Call a set B ∈ B with the above property a regular set. Let
R = {B ∈ B : B is regular }. Show that R is a σ-algebra containing all the
closed sets.
Corollary 6.1.1 For any $B\in\mathcal{B}$ and any $\mu\in M(X)$,
\[
\mu(B) = \sup_{C\subseteq B:\,C\text{ closed}}\mu(C) = \inf_{B\subseteq U:\,U\text{ open}}\mu(U).
\]
Proof. From the above corollary, it suffices to show that $\mu(C) = m(C)$ for all closed subsets $C$ of $X$. Let $\epsilon > 0$. By regularity of the measure $m$ there exists an open set $U$ such that $C\subseteq U$ and $m(U\setminus C) < \epsilon$. Define $f\in C(X)$ as follows:
\[
f(x) = \begin{cases}
0 & x\notin U,\\[4pt]
\dfrac{d(x, X\setminus U)}{d(x, X\setminus U) + d(x, C)} & x\in U.
\end{cases}
\]
Using a similar argument, one can show that m(C) ≤ µ(C) + . Therefore,
µ(C) = m(C) for all closed sets, and hence for all Borel sets.
This allows us to define a metric structure on $M(X)$ as follows. A sequence $\{\mu_n\}$ in $M(X)$ converges to $\mu\in M(X)$ if and only if
\[
\lim_{n\to\infty}\int_X f(x)\,d\mu_n(x) = \int_X f(x)\,d\mu(x)
\]
for all $f\in C(X)$. We will show that under this notion of convergence the space $M(X)$ is compact, but first we need the Riesz Representation Theorem.
\[
\mu_n = \frac{1}{n}\sum_{i=0}^{n-1} T^i\sigma_n.
\]
Proof. We need to show that for any continuous function $f$ on $X$ one has $\int_X f(x)\,d\mu(x) = \int_X f(Tx)\,d\mu(x)$. Since $M(X)$ is compact, there exist a $\mu\in M(X)$ and a subsequence $\{\mu_{n_j}\}$ such that $\mu_{n_j}\to\mu$ in $M(X)$. Now for any continuous $f$, we have
\[
\begin{aligned}
\Bigl|\int_X f(Tx)\,d\mu(x) - \int_X f(x)\,d\mu(x)\Bigr|
&= \lim_{j\to\infty}\Bigl|\int_X f(Tx)\,d\mu_{n_j}(x) - \int_X f(x)\,d\mu_{n_j}(x)\Bigr|\\
&= \lim_{j\to\infty}\frac{1}{n_j}\Bigl|\int_X \sum_{i=0}^{n_j-1}\bigl(f(T^{i+1}x) - f(T^i x)\bigr)\,d\sigma_{n_j}(x)\Bigr|\\
&= \lim_{j\to\infty}\frac{1}{n_j}\Bigl|\int_X \bigl(f(T^{n_j}x) - f(x)\bigr)\,d\sigma_{n_j}(x)\Bigr|\\
&\le \lim_{j\to\infty}\frac{2\sup_{x\in X}|f(x)|}{n_j} = 0.
\end{aligned}
\]
Therefore $\mu\in M(X,T)$.
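The key inequality above — the two averages differ by at most $2\sup|f|/n_j$ — is a telescoping sum, and can be checked directly on any concrete system. A sketch (the logistic map and the observable below are illustrative choices, not taken from the text):

```python
import math

def T(x):
    """The logistic map x -> 4x(1-x), a continuous map of the compact space [0,1]."""
    return 4.0 * x * (1.0 - x)

def f(x):
    """A continuous test observable with sup |f| = 1."""
    return math.cos(2.0 * math.pi * x)

n, x = 10_000, 0.123
orbit = []
for _ in range(n):          # orbit[i] = T^i(x0), the support of sigma_n-style averages
    orbit.append(x)
    x = T(x)

avg_f  = sum(f(y) for y in orbit) / n        # integral of f     against mu_n
avg_fT = sum(f(T(y)) for y in orbit) / n     # integral of f o T against mu_n
# The difference telescopes to |f(T^n x0) - f(x0)| / n <= 2 sup|f| / n:
print(abs(avg_fT - avg_f) <= 2.0 / n + 1e-9)  # True
```

This is exactly why any weak-* limit point of the averaged measures is $T$-invariant, even though the orbit itself is chaotic.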
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f_k(T^i x) = \int_X f_k(x)\,d\mu(x)
\]
for all $x\in X_k$. Let $Y = \bigcap_{k=1}^{\infty} X_k$; then $\mu(Y) = 1$, and
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f_k(T^i x) = \int_X f_k(x)\,d\mu(x)
\]
for all $k$ and all $x\in Y$. Now, let $f\in C(X)$; then there exists a subsequence $\{f_{k_j}\}$ converging to $f$ in the supremum norm, hence uniformly. For any $x\in Y$, using uniform convergence and the dominated convergence theorem, one gets
\[
\begin{aligned}
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) &= \lim_{n\to\infty}\lim_{j\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f_{k_j}(T^i x)\\
&= \lim_{j\to\infty}\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f_{k_j}(T^i x)\\
&= \lim_{j\to\infty}\int_X f_{k_j}\,d\mu = \int_X f\,d\mu.
\end{aligned}
\]
Hence by Theorem 6.1.7, there exists a measurable set $Y$ with $\mu(Y) = 1$ such that
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\int_X f(y)\,d(\delta_{T^i x})(y) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = \int_X f(y)\,d\mu(y)
\]
for all $x\in Y$ and $f\in C(X)$. Thus, $\frac{1}{n}\sum_{i=0}^{n-1}\delta_{T^i x}\to\mu$ for all $x\in Y$.
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x)\,g(x) = g(x)\int_X f(y)\,d\mu(y).
\]
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\int_X f(T^i x)\,g(x)\,d\mu(x) = \int_X g(x)\,d\mu(x)\int_X f(y)\,d\mu(y).
\]
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\int_X f(T^i x)\,G(x)\,d\mu(x) = \int_X G(x)\,d\mu(x)\int_X f(y)\,d\mu(y)
\]
\[
\Bigl|\frac{1}{n}\sum_{i=0}^{n-1}\int_X f(T^i x)\,G(x)\,d\mu(x) - \int_X G\,d\mu\int_X f\,d\mu\Bigr| < \epsilon.
\]
for all x ∈ Y and all f ∈ C(X). When T is uniquely ergodic we will see that
we have a much stronger result.
(iii) There exists a $\mu\in M(X,T)$ such that for every $f\in C(X)$ and all $x\in X$,
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = \int_X f(y)\,d\mu(y).
\]
Hence,
\[
\int_X f(Tx)\,d\mu(x) = \int_X f(x)\,d\mu(x).
\]
Thus, $\mu\in M(X,T)$, and for every $f\in C(X)$,
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = \int_X f(x)\,d\mu(x)
\]
for all $x\in X$.

(iii) $\Rightarrow$ (iv) Suppose $\mu\in M(X,T)$ is such that for every $f\in C(X)$,
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = \int_X f(x)\,d\mu(x).
\]
Let
\[
\mu_n = \frac{1}{n}\sum_{j=0}^{n-1}\delta_{T^j x_n} = \frac{1}{n}\sum_{j=0}^{n-1} T^j\delta_{x_n}.
\]
Then,
\[
\Bigl|\int_X g\,d\mu_n - \int_X g\,d\mu\Bigr| \ge \epsilon.
\]
Since $M(X)$ is compact, there exists a subsequence $\{\mu_{n_i}\}$ converging to some $\nu\in M(X)$. Hence,
\[
\Bigl|\int_X g\,d\nu - \int_X g\,d\mu\Bigr| \ge \epsilon.
\]
By Theorem 6.1.5, $\nu\in M(X,T)$, and by unique ergodicity $\mu = \nu$, which is a contradiction.
Example. If $T_\theta$ is an irrational rotation, then $T_\theta$ is uniquely ergodic. This is a consequence of the above theorem and Weyl's Theorem on uniform distribution: for any Riemann integrable function $f$ on $[0,1)$ and any $x\in[0,1)$, one has
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f\bigl(x + i\theta - \lfloor x + i\theta\rfloor\bigr) = \int_0^1 f(y)\,dy.
\]
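Weyl's theorem is easy to test numerically. The sketch below (the rotation angle, starting point and observable are arbitrary illustrative choices) compares the orbit average of an irrational rotation with the corresponding integral:

```python
import math

theta = math.sqrt(2.0)    # an irrational rotation angle (illustrative choice)
x, n = 0.3, 100_000

def f(y):
    return math.cos(2.0 * math.pi * y)   # its integral over [0, 1) is 0

# orbit average of f along the rotation x -> x + theta (mod 1)
avg = sum(f((x + i * theta) % 1.0) for i in range(n)) / n
print(abs(avg))   # close to 0, the value of the integral
```

Note that the limit holds for every starting point $x$, not merely almost every one; this uniformity is exactly what unique ergodicity adds to the ergodic theorem.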
\[
1, 2, 4, 8, 1, 3, 6, 1, \ldots
\]
obtained by writing the first decimal digit of each term of the sequence $\{2^n\}_{n\ge 0}$. For each $k = 1,2,\ldots,9$, let $p_k(n)$ be the number of times the digit $k$ appears in the first $n$ terms of the first-digit sequence. The asymptotic relative frequency of the digit $k$ is then $\lim_{n\to\infty}\frac{p_k(n)}{n}$. We want to identify this limit for each $k\in\{1,2,\ldots,9\}$. To do this, let $\theta = \log_{10} 2$; then $\theta$ is irrational. For $k = 1,2,\ldots,9$, let $J_k = [\log_{10} k, \log_{10}(k+1))$. By unique ergodicity of $T_\theta$, we have for each $k = 1,2,\ldots,9$,
\[
\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_{J_k}\bigl(T_\theta^i(0)\bigr) = \lambda(J_k) = \log_{10}\frac{k+1}{k}.
\]
Returning to our original problem, notice that the first digit of $2^i$ is $k$ if and only if
\[
k\cdot 10^r \le 2^i < (k+1)\cdot 10^r
\]
for some $r\ge 0$. In this case, $\log_{10} k + r \le i\theta < \log_{10}(k+1) + r$. But $T_\theta^i(0) = i\theta - \lfloor i\theta\rfloor$, so that $T_\theta^i(0)\in J_k$. Summarizing, we see that the first digit of $2^i$ is $k$ if and only if $T_\theta^i(0)\in J_k$. Thus,
\[
\lim_{n\to\infty}\frac{p_k(n)}{n} = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_{J_k}\bigl(T_\theta^i(0)\bigr) = \log_{10}\frac{k+1}{k}.
\]
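This prediction can be confirmed by brute force. The sketch below tabulates the leading digits of the first 10000 powers of 2, computed in exact integer arithmetic, against the Benford frequencies $\log_{10}\frac{k+1}{k}$ (the sample size is an arbitrary choice):

```python
import math
from collections import Counter

n = 10_000
counts = Counter()
p = 1
for i in range(1, n + 1):
    p *= 2                           # p = 2^i, exact
    counts[int(str(p)[0])] += 1      # leading decimal digit of 2^i

for k in range(1, 10):
    freq = counts[k] / n             # empirical p_k(n) / n
    pred = math.log10((k + 1) / k)   # predicted asymptotic frequency
    print(k, round(freq, 4), round(pred, 4))
```

For instance, about 30.1% of the powers of 2 begin with the digit 1, matching $\log_{10} 2$.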
Chapter 7
Topological Dynamics
In this chapter we will shift our attention from the theory of measure preserv-
ing transformations and ergodic theory to the study of topological dynamics.
The field of topological dynamics also studies dynamical systems, but in a
different setting: where ergodic theory is set in a measure space with a mea-
sure preserving transformation, topological dynamics focuses on a topological
space with a continuous transformation. The two fields bear great similarity
and seem to co-evolve. Whenever a new notion has proven its value in one
field, analogies will be sought in the other. The two fields are, however, also
complementary. Indeed, we shall encounter a most striking example of a map
which exhibits its most interesting topological dynamics in a set of measure
zero, right in the blind spot of the measure theoretic eye.
In order to illuminate this interplay between the two fields, we will restrict our attention to compact metric spaces. The compactness assumption will play
the role of the assumption of a finite measure space. The assumption of the
existence of a metric is sufficient for applications and will greatly simplify
all of our proofs. The chapter is structured as follows. We will first intro-
duce several basic notions from topological dynamics and discuss how they
are related to each other and to analogous notions from ergodic theory. We
will then devote a separate section to topological entropy. To conclude the
chapter, we will discuss some examples and applications.
• X is a Hausdorff space
• X is a Baire space
• f is uniformly continuous
The reader may recognize that this theorem contains some of the most
important results in analytic topology, most notably the Lebesgue number
lemma, Baire’s category theorem, the extreme value theorem and Urysohn’s
lemma.
Let us now introduce some concepts of topological dynamics: topological
transitivity, topological conjugacy, periodicity and expansiveness.
But $\cup_{m=-\infty}^{\infty} T^m U_n$ is $T$-invariant and therefore intersects every open set in $X$ by (3). Thus, $\cup_{m=-\infty}^{\infty} T^m U_n$ is dense in $X$ for every $n$. Now, since $X$ is a Baire space (cf. Theorem 7.1.1), we see that $\{x\in X : \overline{\mathcal{O}_T(x)} = X\}$ is itself dense in $X$ and thus certainly non-empty (as $X\ne\emptyset$).
Proof
Exercise 7.1.1 Let $X$ be a compact metric space with metric $d$ and let $T: X\to X$ be a topologically transitive homeomorphism. Show that if $T$ is an isometry (i.e. $d(T(x), T(y)) = d(x,y)$ for all $x,y\in X$), then $T$ is minimal.
Proof
Proof
The proof of the first two statements is trivial. For the third statement we note that
\[
\bigcap_{n=-\infty}^{\infty} T^{-n}(\phi^{-1} A_n) = \bigcap_{n=-\infty}^{\infty}\phi^{-1}\circ S^{-n}(A_n) = \phi^{-1}\Bigl(\bigcap_{n=-\infty}^{\infty} S^{-n} A_n\Bigr).
\]
The left-hand side of the equation contains at most one point if and only if the right-hand side does, so a finite open cover $\alpha$ is a generator for $S$ if and only if $\phi^{-1}\alpha = \{\phi^{-1}A \mid A\in\alpha\}$ is a generator for $T$. Hence the result follows from Theorem 7.1.4.
Definition 7.2.1 Let α be an open cover of X and let N (α) be the number
of sets in a finite subcover of α of minimal cardinality. We define the entropy
of α to be H(α) = log(N (α)).
We must show that h(T, α) is well-defined, i.e. that the right hand side exists.
Theorem 7.2.1 $\lim_{n\to\infty}\frac{1}{n} H\bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\bigr)$ exists.

Proof
Define
\[
a_n = H\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Bigr)
\]
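The heart of such proofs is subadditivity, $a_{n+m}\le a_n + a_m$, which by what is usually called Fekete's lemma forces $a_n/n$ to converge (to $\inf_n a_n/n$). As a numerical illustration outside the text, take the golden-mean shift (binary sequences with no two consecutive 1s): the count of allowed words of length $n$ plays the role of $N\bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\bigr)$, and its normalized log converges to $\log_2$ of the golden ratio.

```python
import math

def nwords(n):
    """Number of binary words of length n with no two consecutive 1s
    (the allowed n-blocks of the golden-mean shift)."""
    end0, end1 = 1, 1                     # length-1 words ending in 0 / in 1
    for _ in range(n - 1):
        end0, end1 = end0 + end1, end0    # append 0 to anything; 1 only after 0
    return end0 + end1

a = {n: math.log2(nwords(n)) for n in range(1, 401)}
# Subadditivity: every allowed word of length n+m splits into allowed words
# of lengths n and m, so nwords(n+m) <= nwords(n) * nwords(m).
ok = all(a[i + j] <= a[i] + a[j] + 1e-9 for i in range(1, 50) for j in range(1, 50))
print(ok, a[400] / 400)   # True, and the ratio is near log2((1+sqrt(5))/2) ~ 0.694
```

The same subadditive structure reappears below for $\mathrm{cov}(n,\epsilon,T)$ in Lemma 7.2.1.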
1. h(T, α) ≥ 0
3. h(T, α) ≤ H(α)
Proof
These are easy consequences of proposition 7.2.1. We will only prove the
third statement. We have:
\[
H\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Bigr) \le \sum_{i=0}^{n-1} H(T^{-i}\alpha) \le nH(\alpha)
\]
\[
h_1(T) = \sup_{\alpha} h(T,\alpha)
\]
We will defer a discussion of the properties of topological entropy till the end
of this chapter.
Figure 7.1: An $(n,\epsilon)$-separated set (left) and $(n,\epsilon)$-spanning set (right). The dotted circles represent open balls of $d_n$-radius $\frac{\epsilon}{2}$ (left) and of $d_n$-radius $\epsilon$ (right), respectively.
The following theorem shows that the above notions are actually two sides
of the same coin.
We will only prove the last two inequalities. The first is left as an exercise
to the reader. Let A be an (n, ε)-separated set of cardinality sep(n, ε, T ).
Suppose A is not (n, ε)-spanning for X. Then there is some x ∈ X such
that dn (x, a) ≥ ε, for all a ∈ A. But then A ∪ {x} is an (n, ε)-separated
set of cardinality larger than sep(n, ε, T ). This contradiction shows that A
is an (n, ε)-spanning set for X. The second inequality now follows since the
cardinality of A is at least as large as span(n, ε, T ).
To prove the third inequality, let A be an (n, ε)-separated set of cardinal-
ity sep(n, ε, T ). Note that if α is an open cover of dn -diameter less than ε,
then no element of α can contain more than 1 element of A. This holds in
particular for an open cover of minimal cardinality, so the third inequality is
proved.
The final statement follows for $\mathrm{span}(n,\epsilon,T)$ and $\mathrm{cov}(n,\epsilon,T)$ from the compactness of $X$, and subsequently for $\mathrm{sep}(n,\epsilon,T)$ by the last inequality.
Exercise 7.2.3 Finish the proof of the above theorem by proving the first
inequality.
Lemma 7.2.1 The limit $\mathrm{cov}(\epsilon,T) := \lim_{n\to\infty}\frac{1}{n}\log(\mathrm{cov}(n,\epsilon,T))$ exists and is finite.
Proof
Thus, A ∩ T −m B has dm+n -diameter less than ε and the sets {A ∩ T −m B|A ∈
α, B ∈ β} form a cover of X. Hence,
Motivated by the above lemma and theorem, we have the following definition:
\[
\begin{aligned}
h_2(T) &= \lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(\mathrm{cov}(n,\epsilon,T))\\
&= \lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(\mathrm{span}(n,\epsilon,T))\\
&= \lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(\mathrm{sep}(n,\epsilon,T))
\end{aligned}
\]
Lemma 7.2.2 Let $X$ be a compact metric space. Suppose $\{\alpha_n\}_{n=1}^{\infty}$ is a sequence of open covers of $X$ such that $\lim_{n\to\infty}\mathrm{diam}(\alpha_n) = 0$, where $\mathrm{diam}$ is the diameter under the metric $d$. Then $\lim_{n\to\infty} h_1(T,\alpha_n) = h_1(T)$.
Proof
We will only prove the lemma for the case $h_1(T) < \infty$. Let $\epsilon > 0$ be arbitrary and let $\beta$ be an open cover of $X$ such that $h_1(T,\beta) > h_1(T) - \epsilon$. By Theorem 7.1.1 there exists a Lebesgue number $\delta > 0$ for $\beta$. By assumption, there exists an $N > 0$ such that for $n\ge N$ we have $\mathrm{diam}(\alpha_n) < \delta$. So, for such $n$, we find that for any $A\in\alpha_n$ there exists a $B\in\beta$ such that $A\subset B$, i.e. $\beta\le\alpha_n$. Hence, by Proposition 7.2.2 and the above,
h1 (T ) − ε < h1 (T, β) ≤ h1 (T, αn ) ≤ h1 (T )
for all n ≥ N .
Exercise 7.2.4 Finish the proof of lemma 7.2.2. That is, show that if
h1 (T ) = ∞, then limn→∞ h1 (T, αn ) = ∞.
Theorem 7.2.3 For a continuous transformation T : X −→ X of a compact
metric space X, definitions (I) and (II) of topological entropy coincide.
Proof
Step 1: $h_2(T)\le h_1(T)$.
For any $n$, let $\{\alpha_k\}_{k=1}^{\infty}$ be the sequence of open covers of $X$ defined by $\alpha_k = \{B_n(x, \frac{1}{3k}) \mid x\in X\}$. Then, clearly, the $d_n$-diameter of $\alpha_k$ is smaller than $\frac{1}{k}$. Now, $\bigvee_{i=0}^{n-1} T^{-i}\alpha_k$ is an open cover of $X$ by sets of $d_n$-diameter smaller than $\frac{1}{k}$, hence $\mathrm{cov}(n, \frac{1}{k}, T)\le N\bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha_k\bigr)$. Since $\lim_{k\to\infty}\mathrm{diam}(\alpha_k) = 0$, by Lemma 7.2.2, we can subsequently take the log, divide by $n$, take the limit as $n\to\infty$ and the limit as $k\to\infty$ to obtain the desired result.
Step 2: h1 (T ) ≤ h2 (T ).
Let {βk }∞ 1
k=1 be defined by βk = {B(x, k )|x ∈ X}. Note that δk = k is a
2
Lebesgue number for βk , for each k. Let A be an (n, δ2k )-spanning set for X
of cardinality span(n, δ2k , T ). Then, for each a ∈ A the ball B(T i (a), δ2k ) is
contained in a member of βk (for all 0 ≤ i < n), hence Bn (T i (a), δ2k ) is con-
tained in a member of n−1
W −i
Wn−1 −i 1
i=0 T βk . Thus, N ( i=0 T βk ) ≤ span(n, 4k , T ).
Proceeding in the same way as in step 1, we obtain the desired inequality.
Properties of Entropy
We will now list some elementary properties of topological entropy.
Theorem 7.2.4 Let X be a compact metric space and let T : X −→ X be
continuous. Then h(T ) satisfies the following properties:
1. If the metrics d and d˜ both generate the topology of X, then h(T ) is the
same under both metrics
\[
d(x,y) < \epsilon_1 \;\Rightarrow\; \tilde{d}(x,y) < \epsilon_2
\]
\[
\tilde{d}(x,y) < \epsilon_2 \;\Rightarrow\; d(x,y) < \epsilon_3
\]
hence, $B_{\tilde{d}_Y}(y_0,\tilde\eta)\subset B_{d_Y}(y_0,\eta)$. Analogously, for any $y_0\in Y$ and $\tilde\delta > 0$, there is a $\delta > 0$ such that
\[
B_{d_Y}(y_0,\delta)\subset B_{\tilde{d}_Y}(y_0,\tilde\delta).
\]
Thus, $\tilde{d}_Y$ induces the topology of $Y$.
Now, for $x_1, x_2\in X$, we have:
\[
\max_{0\le k\le n-1}\,\max_{0\le i\le m-1} d\bigl(T^{ni+k}(x), T^{ni+k}(a)\bigr) = \max_{0\le j\le nm-1} d\bigl(T^j(x), T^j(a)\bigr) < \epsilon
\]
\[
\max_{0\le i\le n-1} d\bigl(T^{-i}(T^{n-1}(x)), T^{-i}(T^{n-1}(y))\bigr) = \max_{0\le i\le n-1} d\bigl(T^i(x), T^i(y)\bigr) > \epsilon
\]
Exercise 7.2.5 Show that assertion (4) of theorem 7.2.4, i.e. h(T −1 ) =
h(T ), still holds if X is merely assumed to be compact.
7.3 Examples
In this section we provide a few detailed examples to get some familiarity
with the material discussed in this chapter. We derive some simple properties
of the dynamical systems presented below, but leave most of the good stuff
for you to discover in the exercises. After all, the proof of the pudding is in
the eating!
plenty of fish to go around. If, on the other hand, the population is near the upper limit level $P^*$, it is expected to decrease as a result of overcrowding. We arrive at the following simple model for the population at time $n$, denoted by $P_n$:
\[
P_{n+1} = \lambda P_n(P^* - P_n)
\]
where $\lambda$ is a 'speed' parameter. If we now set $P^* = 1$, we can think of $P_n$ as the population at time $n$ as a percentage of the limiting population level. Then, writing $x = P_0$, the population dynamics are described by iteration of the following map:

Of course, in the context of the biological model formulated above, our attention is restricted to the interval $[0,1]$, since the population percentage cannot lie outside this interval.
The quadratic map is deceptively simple: it in fact displays a wide range of interesting dynamics, such as periodicity, topological transitivity, bifurcations and chaos.

Let us take a closer look at the case $\lambda > 4$. For $x\in(-\infty,0)\cup(1,\infty)$ it can be shown that $Q_\lambda^n(x)\to-\infty$ as $n\to\infty$. The interesting dynamics occur in the interval $[0,1]$. Figure 7.2 shows a phase portrait for $Q_\lambda$ on $[0,1]$, which is nothing more than the graph of $Q_\lambda$ together with the graph of the line $y = x$. In a phase portrait the trajectory of a point under iteration of the map $Q_\lambda$ is easily visualized by iteratively projecting from the line $y = x$ to the graph and vice versa. From our phase portrait for $Q_\lambda$ it is immediately visible that there is a 'leak' in our interval, i.e. the subset $A_1 = \{x\in[0,1] \mid Q_\lambda(x)\notin[0,1]\}$ is non-empty. Define $A_n = \{x\in[0,1] \mid Q_\lambda^i(x)\in[0,1] \text{ for } 0\le i\le n-1,\ Q_\lambda^n(x)\notin[0,1]\}$ and set $C = [0,1]\setminus\cup_{n=1}^{\infty} A_n$. This procedure is reminiscent of the construction of the 'classical' Cantor set, by removing middle thirds in subsequent steps. In the exercises below it is shown that $C$ is indeed a Cantor set.
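The 'leak' can be watched numerically: iterate many uniformly sampled points and record the fraction that has not yet escaped $[0,1]$. The fraction tends to 0, consistent with $C$ being a Lebesgue-null Cantor set. A sketch (the value $\lambda = 5$ and the sample sizes are illustrative choices):

```python
import random

lam = 5.0                        # a parameter with lam > 2 + sqrt(5)

def Q(x):
    return lam * x * (1.0 - x)   # the quadratic map Q_lambda

random.seed(0)
npts = 100_000
surviving = [random.random() for _ in range(npts)]
fractions = []                   # fraction still in [0,1] after each iteration
for _ in range(40):
    surviving = [Q(x) for x in surviving]
    surviving = [x for x in surviving if 0.0 <= x <= 1.0]
    fractions.append(len(surviving) / npts)

print(fractions[0], fractions[-1])   # roughly 0.55 after one step, near 0 after 40
```

The first step removes the interval $A_1$, of length $\sqrt{5}/5\approx 0.447$ for $\lambda = 5$, and each further step removes a middle piece of every remaining subinterval, just as in the Cantor construction.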
Figure 7.2: Phase portrait of the quadratic map $Q_\lambda$ for $\lambda > 4$ on $[0,1]$. The arrows indicate the (partial) trajectory of a point in $[0,1]$. The dotted lines divide $[0,1]$ into three parts: $I_0$, $A_1$ and $I_1$ (from left to right).
Proof
The first statement follows trivially from the fact that $e^{2\pi i n} = 1$ for $n\in\mathbb{Z}$.
Thus, the points $\{R_\theta^n(z) \mid n\in\mathbb{Z}\}$ are all distinct and it follows that $\{R_\theta^n(z)\}_{n=1}^{\infty}$ is an infinite sequence. By compactness, this sequence has a convergent subsequence, which is Cauchy. Therefore, we can find integers $n > m$ such that
\[
d\bigl(R_\theta^n(z), R_\theta^m(z)\bigr) < \epsilon,
\]
where $d$ is the arc length distance function. Now, since $R_\theta$ is distance preserving with respect to this metric, we can set $l = n-m$ to obtain $d(R_\theta^l(z), z) < \epsilon$. Also, by continuity of $R_\theta^l$, $R_\theta^l$ maps the connected, closed arc from $z$ to $R_\theta^l(z)$ onto the connected closed arc from $R_\theta^l(z)$ to $R_\theta^{2l}(z)$, and this one onto the arc connecting $R_\theta^{2l}(z)$ to $R_\theta^{3l}(z)$, etc. Since these arcs have positive and equal length, they cover $S^1$. The result now follows, since the arcs have length smaller than $\epsilon$, and $\epsilon > 0$ was arbitrary.
We will prove the proposition for the left shift $L$ on $X_k^+$; the case of the left shift on $X_k$ is similar. Fix $0 < \epsilon < 1$ and pick any $x = \{x_i\}_{i=0}^{\infty},\ y = \{y_i\}_{i=0}^{\infty}\in X_k^+$. Notice that if at least one of the first $n$ symbols in the itineraries of $x$ and $y$ differ, then
\[
d_n(x,y) = \max_{0\le i\le n-1} d\bigl(L^i(x), L^i(y)\bigr) = 1 > \epsilon,
\]
where we have used the metric $d$ on $X_k^+$ defined above. Thus, the set $A_n$ consisting of sequences $a\in X_k^+$ such that $a_i\in\{0,\ldots,k-1\}$ for $0\le i\le n-1$ and $a_j = 0$ for $j\ge n$ is $(n,\epsilon)$-separated. Explicitly, $a\in A_n$ is of the form
\[
a_0 a_1 a_2\ldots a_{n-1} 000\ldots
\]
Since there are $k^n$ possibilities to choose the first $n$ symbols of an itinerary, we obtain $\mathrm{sep}(n,\epsilon,L)\ge k^n$. Hence,
\[
h(L) = \lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(\mathrm{sep}(n,\epsilon,L))\ge\log(k)
\]
To prove the reverse inequality, take $l\in\mathbb{Z}_{>0}$ such that $2^{-l} < \epsilon$. Then $A_{n+l}$ is an $(n,\epsilon)$-spanning set, since for every $x\in X_k^+$ there is some $a\in A_{n+l}$ for which the first $n+l$ symbols coincide. In other words, $d_n(x,a)\le 2^{-l} < \epsilon$. Therefore,
\[
h(L) = \lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{1}{n}\log(\mathrm{span}(n,\epsilon,L))\le\lim_{\epsilon\downarrow 0}\lim_{n\to\infty}\frac{n+l}{n}\log(k) = \log(k),
\]
and our proof is complete.
Since $T$ is continuous, $T^{-i}I_{a_i}$ is closed for all $i$ and therefore $A_n$ is closed for $n\in\mathbb{Z}_{\ge 0}$. As $[0,1]$ is a compact metric space, it follows from Theorem 7.1.1 that $A_n$ is compact. Moreover, $\{A_n\}_{n=0}^{\infty}$ forms a decreasing sequence, in the sense that $A_{n+1}\subset A_n$. Therefore, the intersection $\cap_{n=0}^{\infty} A_n$ is not empty and $a$ is precisely the itinerary of $x\in\cap_{n=0}^{\infty} A_n$. We conclude that $\psi$ is onto.

Now, since $\psi\circ T = L\circ\psi$, with $L: \tilde{X}_2^+\to\tilde{X}_2^+$ the (induced) left shift, we have found a bijection between the periodic points of $L$ on $\tilde{X}_2^+$ and those of $T$ on $[0,1]$. The periodic points for $T$ are quite difficult to find, but the periodic points of $L$ are easily identified. First observe that there are no periodic points in the earlier defined equivalence classes in $\tilde{X}_2^+$, since these points are eventually mapped to $0$. Now, any periodic point $a$ of $L$ of period $n$ in $X_2^+$ must have an itinerary of the form
\[
(a_0 a_1\ldots a_{n-1}\, a_0 a_1\ldots a_{n-1}\, a_0 a_1\ldots)
\]
Hence, $T$ has $2^n$ periodic points of period $n$ in $[0,1]$.
Exercise 7.3.1 In this exercise you are asked to prove some properties of the shift map. Let $X_k = \{0,1,\ldots,k-1\}^{\mathbb{Z}}$ and $X_k^+ = \{0,1,\ldots,k-1\}^{\mathbb{N}\cup\{0\}}$.
c. Show that the one-sided shift on Xk+ is continuous. Show that the two-
sided shift on Xk is a homeomorphism.
Exercise 7.3.2 Let $Q_\lambda$ be the quadratic map and let $C$, $A_1$ be as in the first example. Set $[0,1]\setminus A_1 = I_0\cup I_1$, where $I_0$ is the interval to the left of $A_1$ and $I_1$ the interval to the right. We assume that $\lambda > 2+\sqrt{5}$, so that $|Q_\lambda'(x)| > 1$ for all $x\in I_0\cup I_1$ (check this!). Let $X_2^+ = \{0,1\}^{\mathbb{N}}$ and let $\phi: C\to X_2^+$ be the itinerary map generated by $Q_\lambda$ on $\{I_0, I_1\}$. This exercise serves as another demonstration of the usefulness of symbolic dynamics.
Exercise 7.3.3 Let $Q_\lambda$ be the quadratic map and let $C$ be as in the first example. This exercise shows that $C$ is a Cantor set, i.e. a closed, totally disconnected and perfect subset of $[0,1]$. As in the previous exercise, we will assume that $\lambda > 2+\sqrt{5}$.
1. $H_\mu(\alpha)\le\log(N(\alpha))$, where $N(\alpha)$ is the number of sets in the partition of non-zero measure. Equality holds only if $\mu(A) = \frac{1}{N(\alpha)}$ for all $A\in\alpha$ with $\mu(A) > 0$
Proofs
Now, by successively applying proposition (4.2.1) (d), (g) and finally using
the fact that T is measure preserving, we get
\[
H\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha \,\Big|\, \bigvee_{i=0}^{n-1} T^{-i}\gamma\Bigr) \le \sum_{i=0}^{n-1} H\Bigl(T^{-i}\alpha \,\Big|\, \bigvee_{i=0}^{n-1} T^{-i}\gamma\Bigr) \le \sum_{i=0}^{n-1} H\bigl(T^{-i}\alpha \mid T^{-i}\gamma\bigr) = nH(\alpha\mid\gamma)
\]
Combining the inequalities, dividing both sides by $n$ and taking the limit as $n\to\infty$ gives the desired result.
Finally, since $\alpha\le\bigvee_{i=0}^{n-1} T^{-i}\alpha$, we obtain by (2),
\[
h(T^n,\alpha)\le h\Bigl(T^n, \bigvee_{i=0}^{n-1} T^{-i}\alpha\Bigr) = nh(T,\alpha)
\]
Exercise 8.1.1 Use lemma 8.1.1 and proposition (4.2.1) to show that for an
invertible measure preserving transformation T on (X, F, µ):
Proof
\[
B_i\cap A_i = B_i, \qquad B_i\cap A_j = \emptyset \ \text{ if } i\ne j
\]
\[
\mu(A\mid B_0) = \frac{\mu(A\cap B_0)}{\mu(B_0)}
\]
Hence, we can apply (1) of lemma 8.1.1 and use the fact that
to obtain
Hµ (α|β) ≤ µ(B0 ) log(n) < nε log(n) < 1
Define for each $i = 1,\ldots,n$ the open set $C_i$ by $C_i = B_0\cup B_i = X\setminus\cup_{j\ne i}B_j$. Then we can define an open cover of $X$ by $\gamma = \{C_1,\ldots,C_n\}$. By (1) of Lemma 8.1.1 we have for $m\ge 1$, in the notation of the lemma, $H_\mu\bigl(\bigvee_{i=0}^{m-1} T^{-i}\beta\bigr)\le\log\bigl(N\bigl(\bigvee_{i=0}^{m-1} T^{-i}\beta\bigr)\bigr)$. Note that $\gamma$ is not necessarily a partition, but by the proof of (1) of Lemma 8.1.1, we still have $H_\mu\bigl(\bigvee_{i=0}^{m-1} T^{-i}\beta\bigr)\le\log\bigl(2^m N\bigl(\bigvee_{i=0}^{m-1} T^{-i}\gamma\bigr)\bigr)$, since $\beta$ contains (at most) one more set of non-zero measure.
Thus, it follows that
Proof
Exercise 8.1.3 Improve the result in the exercise above by showing that we
can replace the inequality by an equality sign, i.e. for any µ, ν ∈ M (X, T )
and p ∈ [0, 1] we have hpµ+(1−p)ν (T ) = phµ (T ) + (1 − p)hν (T ).
1. For any $x\in X$ and $\delta > 0$, there is a $0 < \eta < \delta$ such that $\mu(\partial B(x,\eta)) = 0$
Proof
(1): Suppose the statement is false. Then there exist an $x\in X$ and a $\delta > 0$ such that $\mu(\partial B(x,\eta)) > 0$ for all $0 < \eta < \delta$. The boundaries $\partial B(x,\eta)$, $0 < \eta < \delta$, are pairwise disjoint, and uncountably many of them have positive measure, so for some $n\ge 1$ infinitely many must have measure greater than $\frac{1}{n}$. Choosing a sequence $\{\eta_i\}_{i=1}^{\infty}$ of distinct numbers in $(0,\delta)$ with $\mu(\partial B(x,\eta_i)) > \frac{1}{n}$ then gives $\mu(\cup_{i=1}^{\infty}\partial B(x,\eta_i)) = \sum_{i=1}^{\infty}\mu(\partial B(x,\eta_i)) = \infty$, contradicting the fact that $\mu$ is a probability measure on $X$.
(2): Fix $\delta > 0$. By (1), for each $x\in X$ we can find an $0 < \eta_x < \delta/2$ such that $\mu(\partial B(x,\eta_x)) = 0$. The collection $\{B(x,\eta_x) \mid x\in X\}$ forms an open cover of $X$, so by compactness there exists a finite subcover, which we denote by $\beta = \{B_1,\ldots,B_n\}$. Define $\alpha$ by letting $A_1 = B_1$ and, for $1 < j\le n$, $A_j = B_j\setminus(\cup_{k=1}^{j-1}B_k)$. Then $\alpha$ is a partition of $X$, $\mathrm{diam}(A_j)\le\mathrm{diam}(B_j) < \delta$ and $\mu(\partial A_j)\le\mu(\cup_{i=1}^{n}\partial B_i) = 0$, since $\partial A_j\subset\cup_{i=1}^{n}\partial B_i$.
(3): Let $x\in\partial\bigl(\cap_{j=0}^{n-1} T^{-j}A_j\bigr)$. Then $x\in\overline{\cap_{j=0}^{n-1} T^{-j}A_j}$, but $x\notin\cap_{j=0}^{n-1} T^{-j}A_j$. That is, every open neighborhood of $x$ intersects every $T^{-j}A_j$, but $x\notin T^{-k}A_k$ for some $k$
where the final inequality follows from the assumption µ(∂A) = 0. The proof
of the opposite inequality is similar.
Lemma 8.1.4 Let $q, n$ be integers such that $1 < q < n$. Define, for $0\le j\le q-1$, $a(j) = \lfloor\frac{n-j}{q}\rfloor$, where $\lfloor\cdot\rfloor$ denotes the integer part. Then we have the following
2. Fix 0 ≤ j ≤ q − 1. Define
Then
Example 8.1.1 Let us work out what is going on in Lemma 8.1.4 for $n = 10$, $q = 4$. Then $a(0) = \lfloor\frac{10-0}{4}\rfloor = 2$ and similarly, $a(1) = 2$, $a(2) = 2$ and $a(3) = 1$. This gives us $S_0 = \{8,9\}$, $S_1 = \{0,9\}$, $S_2 = \{0,1\}$ and $S_3 = \{0,1,2,7,8,9\}$. For example, $S_3$ is obtained by taking all nonnegative integers smaller than $3$ and adding the numbers $3 + a(3)q = 7$, $3 + a(3)q + 1 = 8$ and $3 + a(3)q + 2 = 9$. Note that $\mathrm{card}(S_j)\le 8$. One can readily check all properties stated in the lemma, e.g. $(a(3)-1)q + 3 = \bigl(\lfloor\tfrac{10-3}{4}\rfloor - 1\bigr)\cdot 4 + 3 = 3\le 6 = n-q$.
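The bookkeeping in the example can be checked mechanically. In the sketch below the set $S_j$ is reconstructed from the worked example as $\{0,\ldots,j-1\}\cup\{j + a(j)q,\ldots,n-1\}$ — this reconstruction is an assumption, since the lemma's "Define" is abbreviated above:

```python
def a(j, n, q):
    """a(j) = floor((n - j) / q), as in Lemma 8.1.4."""
    return (n - j) // q

def S(j, n, q):
    """Tail index set S_j, reconstructed from Example 8.1.1:
    indices below j together with those from j + a(j)q up to n-1."""
    return set(range(j)) | set(range(j + a(j, n, q) * q, n))

n, q = 10, 4
print([a(j, n, q) for j in range(q)])    # [2, 2, 2, 1]
print(sorted(S(3, n, q)))                # [0, 1, 2, 7, 8, 9]
print(all(len(S(j, n, q)) <= 2 * q for j in range(q)))   # True: card(S_j) <= 2q
```

The bound $\mathrm{card}(S_j)\le 2q$ (here $8$) is what makes the "crude estimate for the tails" in the main proof cost only $2q\log k$.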
We shall now finish the proof of our main theorem. The proof is presented in a logical order, which is in this case not quite the same as thinking order. Therefore, before plunging into a formal proof, we will briefly discuss the main ideas.

We would like to construct a Borel probability measure $\mu$ with measure theoretic entropy $h_\mu(T)\ge\mathrm{sep}(\epsilon,T) := \lim_{n\to\infty}\frac{1}{n}\log(\mathrm{sep}(n,\epsilon,T))$. To do this, we first find a measure $\sigma_n$ with $H_{\sigma_n}\bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\bigr)$ equal to $\log(\mathrm{sep}(n,\epsilon,T))$, where $\alpha$ is a suitably chosen partition. This is not too difficult. The problem is to find an element of $M(X,T)$ with this property. Theorem 6.1.5 suggests a suitable measure $\mu$ which can be obtained from the $\sigma_n$. The trick in Lemma 8.1.4 plays a crucial role in getting from an estimate for $H_{\sigma_n}$ to one for $H_\mu$. The idea is to first remove the 'tails' $S_j$ of the partition $\bigvee_{i=0}^{n-1} T^{-i}\alpha$, make a crude estimate for these tails and later add them back on again. Lemma 8.1.3 fills in the remaining technicalities.
Proof
we find that
\[
\begin{aligned}
\log(\mathrm{sep}(n,\epsilon,T)) &= H_{\sigma_n}\Bigl(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Bigr)\\
&\le \sum_{r=0}^{a(j)-1} H_{\sigma_n}\Bigl(T^{-rq-j}\Bigl(\bigvee_{i=0}^{q-1} T^{-i}\alpha\Bigr)\Bigr) + \sum_{i\in S_j} H_{\sigma_n}(T^{-i}\alpha)\\
&\le \sum_{r=0}^{a(j)-1} H_{\sigma_n\circ T^{-(rq+j)}}\Bigl(\bigvee_{i=0}^{q-1} T^{-i}\alpha\Bigr) + 2q\log k
\end{aligned}
\]
Here we used Proposition 4.2.1(d) and Lemma 8.1.1. Now if we sum the above inequality over $j$ and divide both sides by $n$, we obtain by (3) and (1) of Lemma 8.1.4
\[
\frac{q}{n}\log(\mathrm{sep}(n,\epsilon,T)) \le \frac{1}{n}\sum_{l=0}^{n-1} H_{\sigma_n\circ T^{-l}}\Bigl(\bigvee_{i=0}^{q-1} T^{-i}\alpha\Bigr) + \frac{2q^2}{n}\log k \le H_{\mu_n}\Bigl(\bigvee_{i=0}^{q-1} T^{-i}\alpha\Bigr) + \frac{2q^2}{n}\log k
\]
To get a taste of the power of this statement, let us recast our proof of the
invariance of topological entropy under conjugacy ((2) of theorem 7.2.4). Let
φ : X1 → X2 denote the conjugacy. We note that µ ∈ M (X1 , T1 ) if and only
if µ ◦ φ−1 ∈ M (X2 , T2 ) and we observe that hµ (T1 ) = hµ◦φ−1 (T2 ). The result
now follows from the Variational Principle. It is that simple.
• If h(T ) < ∞ then the extreme points of Mmax (X, T ) are precisely the
ergodic members of Mmax (X, T ).
But then,
\[
h_\mu(T)\ge\sum_{n=1}^{N}\frac{h_{\mu_n}(T)}{2^n} > N
\]
This holds for arbitrary N ∈ Z>0 , so hµ (T ) = h(T ) = ∞ and µ is a measure
of maximal entropy.
Now suppose that $T$ has a unique measure of maximal entropy $\mu$. Then, for any $\nu\in M(X,T)$, $h_{\mu/2+\nu/2}(T) = \frac{1}{2}h_\mu(T) + \frac{1}{2}h_\nu(T) = \infty$, so $\mu/2+\nu/2$ is also a measure of maximal entropy and by uniqueness $\mu/2+\nu/2 = \mu$. Hence, $\mu = \nu$ and $M(X,T) = \{\mu\}$.
(4): The first statement follows from (2), for h(T ) < ∞, and (3), for
h(T ) = ∞. If T is uniquely ergodic, then M (X, T ) = {µ} for some µ.
By the Variational Principle, hµ (T ) = h(T ).
Proof
sep(ε, T ) ≤ sep(η, T ).
We conclude that sep(ε, T ) = sep(η, T ). Since 0 < η < ε was arbitrary, this
shows that h(T ) = limη↓0 sep(η, T ) = sep(ε, T ). This completes our proof.
At this point we would like to make a remark on the proof of the above proposition. One might be inclined to believe that the measure $\mu$ constructed in the proof of Theorem 8.1.2 is a measure of maximal entropy for any continuous transformation $T$. The reader should note, though, that $\mu$ still depends on $\epsilon$, and it is therefore only because of the above corollary that $\mu$ is a measure of maximal entropy for expansive homeomorphisms.