Dajani Ergodic Theory
Contents

1 Introduction and preliminaries
  1.1 What is Ergodic Theory?
  1.2 Measure Preserving Transformations
  1.3 Basic Examples
  1.4 Recurrence
  1.5 Induced and Integral Transformations
      1.5.1 Induced Transformations
      1.5.2 Integral Transformations
  1.6 Ergodicity
  1.7 Other Characterizations of Ergodicity
  1.8 Examples of Ergodic Transformations

2 The Ergodic Theorem
  2.1 The Ergodic Theorem and its consequences
  2.2 Characterization of Irreducible Markov Chains
  2.3 Mixing

3 Measure Preserving Isomorphisms and Factor Maps
  3.1 Measure Preserving Isomorphisms
  3.2 Factor Maps
  3.3 Natural Extensions

4 Entropy
  4.1 Randomness and Information
  4.2 Definitions and Properties
  4.3 Calculation of Entropy and Examples
  4.4 The Shannon-McMillan-Breiman Theorem
  4.5 Lochs Theorem

5 Hurewicz Ergodic Theorem
  5.1 Equivalent measures
  5.2 Non-singular and conservative transformations
  5.3 Hurewicz Ergodic Theorem
Chapter 1
Introduction and preliminaries
1.2 Measure Preserving Transformations
Definition 1.2.1 Let $(X, \mathcal{B}, \mu)$ be a probability space, and let $T : X \to X$ be measurable. The map $T$ is said to be measure preserving with respect to $\mu$ if
$$\mu(T^{-1}A) = \mu(A) \quad \text{for all } A \in \mathcal{B}.$$
This definition implies that for any measurable function $f : X \to \mathbb{R}$, the process
$$f,\ f \circ T,\ f \circ T^2,\ \ldots$$
is stationary. This means that for all Borel sets $B_1, \ldots, B_n$, all integers $r_1 < r_2 < \ldots < r_n$, and all $k \ge 1$,
$$\mu\left(\{x : f(T^{r_1}x) \in B_1, \ldots, f(T^{r_n}x) \in B_n\}\right) = \mu\left(\{x : f(T^{r_1+k}x) \in B_1, \ldots, f(T^{r_n+k}x) \in B_n\}\right).$$
In case $T$ is invertible, $T$ is measure preserving if and only if $\mu(TA) = \mu(A)$ for all $A \in \mathcal{B}$. We can generalize the definition of measure preserving to the following case. Let $T : (X_1, \mathcal{B}_1, \mu_1) \to (X_2, \mathcal{B}_2, \mu_2)$ be measurable; then $T$ is measure preserving if $\mu_1(T^{-1}A) = \mu_2(A)$ for all $A \in \mathcal{B}_2$. The following shows that it suffices to verify this property on the algebra $\mathcal{A}(\mathcal{S}_2)$ generated by a generating semi-algebra $\mathcal{S}_2$ of $\mathcal{B}_2$. Let
$$\mathcal{C} = \left\{E \in \mathcal{B}_2 : T^{-1}E \in \mathcal{B}_1 \text{ and } \mu_1\left(T^{-1}E\right) = \mu_2(E)\right\},$$
and let $E_1 \subseteq E_2 \subseteq \ldots$ be elements of $\mathcal{C}$ with $E = \bigcup_{i=1}^{\infty} E_i$. Then
$$T^{-1}E = \bigcup_{i=1}^{\infty} T^{-1}E_i \in \mathcal{B}_1$$
and
$$\mu_1\left(T^{-1}E\right) = \mu_1\left(\bigcup_{n=1}^{\infty} T^{-1}E_n\right) = \lim_{n\to\infty}\mu_1\left(T^{-1}E_n\right) = \lim_{n\to\infty}\mu_2(E_n) = \mu_2\left(\bigcup_{n=1}^{\infty} E_n\right) = \mu_2(E).$$
Thus, $E \in \mathcal{C}$. A similar proof shows that if $F_1 \supseteq F_2 \supseteq \ldots$ are elements of $\mathcal{C}$, then $\bigcap_{i=1}^{\infty} F_i \in \mathcal{C}$. Hence, $\mathcal{C}$ is a monotone class containing the algebra $\mathcal{A}(\mathcal{S}_2)$. By the Monotone Class Theorem, $\mathcal{B}_2$ is the smallest monotone class containing $\mathcal{A}(\mathcal{S}_2)$, hence $\mathcal{B}_2 \subseteq \mathcal{C}$. This shows that $\mathcal{B}_2 = \mathcal{C}$; therefore $T$ is measurable and measure preserving.
For example:

If $X = [0,1)$ with the Borel $\sigma$-algebra $\mathcal{B}$ and a probability measure $\mu$ on $\mathcal{B}$, then a transformation $T : X \to X$ is measurable and measure preserving if and only if $T^{-1}[a,b) \in \mathcal{B}$ and $\mu\left(T^{-1}[a,b)\right) = \mu([a,b))$ for any interval $[a,b)$.

If $X = \{0,1\}^{\mathbb{N}}$ with the product $\sigma$-algebra and product measure $\mu$, then a transformation $T : X \to X$ is measurable and measure preserving if and only if
$$T^{-1}\left(\{x : x_0 = a_0, \ldots, x_n = a_n\}\right) \in \mathcal{B}$$
and
$$\mu\left(T^{-1}\{x : x_0 = a_0, \ldots, x_n = a_n\}\right) = \mu\left(\{x : x_0 = a_0, \ldots, x_n = a_n\}\right)$$
for any cylinder set.
Exercise 1.2.1 Recall that if $A$ and $B$ are measurable sets, then
$$A \triangle B = (A \cup B) \setminus (A \cap B) = (A \setminus B) \cup (B \setminus A).$$
Show that for any measurable sets $A, B, C$ one has
$$\mu(A \triangle B) \le \mu(A \triangle C) + \mu(C \triangle B).$$
Another useful lemma is the following (see also [KT]).

Lemma 1.2.1 Let $(X, \mathcal{B}, \mu)$ be a probability space, and $\mathcal{A}$ an algebra generating $\mathcal{B}$. Then, for any $A \in \mathcal{B}$ and any $\varepsilon > 0$, there exists $C \in \mathcal{A}$ such that $\mu(A \triangle C) < \varepsilon$.
Proof. Let
$$\mathcal{D} = \left\{A \in \mathcal{B} : \text{for any } \varepsilon > 0 \text{ there exists } C \in \mathcal{A} \text{ such that } \mu(A \triangle C) < \varepsilon\right\}.$$
Clearly, $\mathcal{A} \subseteq \mathcal{D} \subseteq \mathcal{B}$. By the Monotone Class Theorem (Theorem 1.2.1), we need only show that $\mathcal{D}$ is a monotone class. To this end, let $A_1 \subseteq A_2 \subseteq \ldots$ be a sequence in $\mathcal{D}$, and let $A = \bigcup_{n=1}^{\infty} A_n$; notice that $\mu(A) = \lim_{n\to\infty}\mu(A_n)$. Let $\varepsilon > 0$. There exists an $N$ such that $\mu(A \triangle A_N) = \mu(A) - \mu(A_N) < \varepsilon/2$. Since $A_N \in \mathcal{D}$, there exists $C \in \mathcal{A}$ such that $\mu(A_N \triangle C) < \varepsilon/2$. Then
$$\mu(A \triangle C) \le \mu(A \triangle A_N) + \mu(A_N \triangle C) < \varepsilon.$$
Hence, $A \in \mathcal{D}$. Similarly, one can show that $\mathcal{D}$ is closed under decreasing intersections, so that $\mathcal{D}$ is a monotone class containing $\mathcal{A}$; hence by the Monotone Class Theorem $\mathcal{B} \subseteq \mathcal{D}$. Therefore, $\mathcal{B} = \mathcal{D}$, and the lemma is proved.
1.3 Basic Examples
(b) The doubling map $Tx = 2x \bmod 1$ on $([0,1), \mathcal{B}, \lambda)$, with $\lambda$ Lebesgue measure, is measure preserving: for any interval $[a,b)$,
$$T^{-1}[a,b) = \left[\tfrac{a}{2}, \tfrac{b}{2}\right) \cup \left[\tfrac{a+1}{2}, \tfrac{b+1}{2}\right)$$
and
$$\lambda\left(T^{-1}[a,b)\right) = b - a = \lambda([a,b)).$$
Although this map is very simple, it has in fact many facets. For example, iterating this map yields the binary expansion of points in $[0,1)$; i.e., using $T$ one can associate with each point in $[0,1)$ an infinite sequence of 0s and 1s. To do so, we define the function $a_1$ by
$$a_1(x) = \begin{cases} 0 & \text{if } 0 \le x < 1/2, \\ 1 & \text{if } 1/2 \le x < 1, \end{cases}$$
so that $Tx = 2x - a_1(x)$. Now, for $n \ge 1$ set $a_n(x) = a_1\left(T^{n-1}x\right)$. Fix $x \in X$; for simplicity, we write $a_n$ instead of $a_n(x)$. Then $Tx = 2x - a_1$; rewriting, we get $x = \frac{a_1}{2} + \frac{Tx}{2}$. Similarly, $Tx = \frac{a_2}{2} + \frac{T^2x}{2}$. Continuing in this manner, we see that for each $n \ge 1$,
$$x = \frac{a_1}{2} + \frac{a_2}{2^2} + \cdots + \frac{a_n}{2^n} + \frac{T^n x}{2^n}.$$
Since
$$0 \le x - \sum_{i=1}^{n}\frac{a_i}{2^i} = \frac{T^n x}{2^n} \to 0 \ \text{ as } n \to \infty,$$
we conclude that $x = \sum_{i=1}^{\infty}\frac{a_i}{2^i}$. We shall later see that the sequence of digits $a_1, a_2, \ldots$ forms an i.i.d. sequence of Bernoulli random variables.
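The digit algorithm above is easy to run in code. The following sketch (ours, not part of the notes; the name `binary_digits` is made up) iterates $T$ with exact rational arithmetic so that no floating-point drift occurs:

```python
from fractions import Fraction

def binary_digits(x, n):
    """First n binary digits of x in [0, 1), obtained by iterating the
    doubling map T(x) = 2x mod 1 and recording a_1(T^(k-1) x)."""
    digits = []
    for _ in range(n):
        digits.append(0 if x < Fraction(1, 2) else 1)
        x = 2 * x - digits[-1]          # T(x) = 2x - a_1(x)
    return digits

# 0.625 = 1/2 + 0/4 + 1/8, so its binary digits start 1, 0, 1
print(binary_digits(Fraction(5, 8), 5))   # -> [1, 0, 1, 0, 0]
```

Note that a rational with an odd denominator, such as $1/3$, produces the expected periodic digit string.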
(c) Baker's Transformation This example is the two-dimensional version of example (b). The underlying probability space is $[0,1)^2$ with the product Lebesgue $\sigma$-algebra $\mathcal{B} \times \mathcal{B}$ and product Lebesgue measure $\lambda \times \lambda$. Define $T : [0,1)^2 \to [0,1)^2$ by
$$T(x,y) = \begin{cases} \left(2x, \frac{y}{2}\right) & 0 \le x < 1/2, \\[2pt] \left(2x - 1, \frac{y+1}{2}\right) & 1/2 \le x < 1. \end{cases}$$

Exercise 1.3.1 Verify that $T$ is invertible, measurable and measure preserving.
(d) $\beta$-transformations Let $X = [0,1)$ with the Lebesgue $\sigma$-algebra $\mathcal{B}$. Let $\beta = \frac{1+\sqrt{5}}{2}$, the golden mean. Notice that $\beta^2 = \beta + 1$. Define a transformation $T : X \to X$ by
$$Tx = \beta x \bmod 1 = \begin{cases} \beta x & 0 \le x < 1/\beta, \\ \beta x - 1 & 1/\beta \le x < 1. \end{cases}$$
Then $T$ is measure preserving with respect to the measure $\mu$ given by $\mu(B) = \int_B g(x)\, dx$, where
$$g(x) = \begin{cases} \dfrac{5+3\sqrt{5}}{10} & 0 \le x < 1/\beta, \\[4pt] \dfrac{5+\sqrt{5}}{10} & 1/\beta \le x < 1. \end{cases}$$
Iterating $T$ exactly as in example (b) yields a digit sequence $b_1, b_2, \ldots \in \{0,1\}$ with
$$x = \sum_{i=1}^{\infty}\frac{b_i}{\beta^i}.$$
Notice that in case $X = \{0, 1, \ldots, k-1\}^{\mathbb{N}}$, one should consider cylinder sets of the form $\{x : x_0 = a_0, \ldots, x_n = a_n\}$. In this case
$$T^{-1}\{x : x_0 = a_0, \ldots, x_n = a_n\} = \bigcup_{j=0}^{k-1}\{x : x_0 = j,\ x_1 = a_0, \ldots, x_{n+1} = a_n\},$$
and it is easy to see that $T$ is measurable and measure preserving.
(f) Markov Shifts Let $(X, \mathcal{F}, T)$ be as in example (e). We define a measure $\mu$ on $\mathcal{F}$ as follows. Let $P = (p_{ij})$ be a stochastic $k \times k$ matrix, and $q = (q_0, q_1, \ldots, q_{k-1})$ a positive probability vector such that $qP = q$. Define $\mu$ on cylinders by
$$\mu\left(\{x : x_0 = a_0, \ldots, x_n = a_n\}\right) = q_{a_0}\, p_{a_0 a_1} \cdots p_{a_{n-1} a_n}.$$
Just as in example (e), one sees that $T$ is measurable and measure preserving.
(g) Stationary Stochastic Processes Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space, and
$$\ldots, Y_{-2}, Y_{-1}, Y_0, Y_1, Y_2, \ldots$$
a stationary stochastic process on $\Omega$ with values in $\mathbb{R}$. Hence, for each $k \in \mathbb{Z}$,
$$\mathbb{P}\left(Y_{n_1} \in B_1, \ldots, Y_{n_r} \in B_r\right) = \mathbb{P}\left(Y_{n_1+k} \in B_1, \ldots, Y_{n_r+k} \in B_r\right)$$
for any $n_1 < n_2 < \cdots < n_r$ and any Lebesgue sets $B_1, \ldots, B_r$. We want to see this process as coming from a measure preserving transformation.

Let $X = \mathbb{R}^{\mathbb{Z}} = \{x = (\ldots, x_{-1}, x_0, x_1, \ldots) : x_i \in \mathbb{R}\}$ with the product $\sigma$-algebra (i.e. generated by the cylinder sets). Let $T : X \to X$ be the left shift, i.e. $Tx = z$ where $z_n = x_{n+1}$. Define $\phi : \Omega \to X$ by
$$\phi(\omega) = \left(\ldots, Y_{-2}(\omega), Y_{-1}(\omega), Y_0(\omega), Y_1(\omega), Y_2(\omega), \ldots\right).$$
Then $\phi$ is measurable, since if $B_1, \ldots, B_r$ are Lebesgue sets in $\mathbb{R}$, then
$$\phi^{-1}\left(\{x \in X : x_{n_1} \in B_1, \ldots, x_{n_r} \in B_r\}\right) = Y_{n_1}^{-1}(B_1) \cap \cdots \cap Y_{n_r}^{-1}(B_r) \in \mathcal{F}.$$
Define a measure $\mu$ on $X$ by
$$\mu(E) = \mathbb{P}\left(\phi^{-1}(E)\right).$$
On cylinder sets $\mu$ has the form
$$\mu\left(\{x \in X : x_{n_1} \in B_1, \ldots, x_{n_r} \in B_r\}\right) = \mathbb{P}\left(Y_{n_1} \in B_1, \ldots, Y_{n_r} \in B_r\right).$$
Since
$$T^{-1}\left(\{x : x_{n_1} \in B_1, \ldots, x_{n_r} \in B_r\}\right) = \{x : x_{n_1+1} \in B_1, \ldots, x_{n_r+1} \in B_r\},$$
stationarity of the process $\{Y_n\}$ implies that $T$ is measure preserving. Furthermore, if we let $\pi_i : X \to \mathbb{R}$ be the natural projection onto the $i$th coordinate, then $Y_i(\omega) = \pi_i(\phi(\omega)) = \pi_0\left(T^i(\phi(\omega))\right)$.
(h) Random Shifts Let $(X, \mathcal{B}, \mu)$ be a probability space, and $T : X \to X$ an invertible measure preserving transformation. Then $T^{-1}$ is measurable and measure preserving with respect to $\mu$. Suppose now that at each moment, instead of moving forward by $T$ (i.e. $x \mapsto Tx$), we first flip a fair coin to decide whether we will use $T$ or $T^{-1}$. We can describe this random system by means of a measure preserving transformation in the following way.

Let $\Omega = \{-1, 1\}^{\mathbb{Z}}$ with product $\sigma$-algebra $\mathcal{F}$ (i.e. the $\sigma$-algebra generated by the cylinder sets) and the uniform product measure $\mathbb{P}$ (see example (e)), and let $\sigma : \Omega \to \Omega$ be the left shift. As in example (e), the map $\sigma$ is measure preserving. Now, let $Y = \Omega \times X$ with the product $\sigma$-algebra, and product measure $\mathbb{P} \times \mu$. Define $S : Y \to Y$ by
$$S(\omega, x) = \left(\sigma\omega,\ T^{\omega_0}x\right).$$
Then $S$ is invertible (why?), and measure preserving with respect to $\mathbb{P} \times \mu$. To see the latter, for any set $C \in \mathcal{F}$ and any $A \in \mathcal{B}$, we have
$$\begin{aligned}
(\mathbb{P} \times \mu)\left(S^{-1}(C \times A)\right) &= (\mathbb{P} \times \mu)\left(\{(\omega, x) : S(\omega, x) \in C \times A\}\right)\\
&= (\mathbb{P} \times \mu)\left(\{(\omega, x) : \omega_0 = 1,\ \sigma\omega \in C,\ Tx \in A\}\right)\\
&\quad + (\mathbb{P} \times \mu)\left(\{(\omega, x) : \omega_0 = -1,\ \sigma\omega \in C,\ T^{-1}x \in A\}\right)\\
&= \mathbb{P}\left(\{\omega_0 = 1\} \cap \sigma^{-1}C\right)\mu\left(T^{-1}A\right) + \mathbb{P}\left(\{\omega_0 = -1\} \cap \sigma^{-1}C\right)\mu(TA)\\
&= \mathbb{P}\left(\{\omega_0 = 1\} \cap \sigma^{-1}C\right)\mu(A) + \mathbb{P}\left(\{\omega_0 = -1\} \cap \sigma^{-1}C\right)\mu(A)\\
&= \mathbb{P}\left(\sigma^{-1}C\right)\mu(A) = \mathbb{P}(C)\mu(A) = (\mathbb{P} \times \mu)(C \times A).
\end{aligned}$$
(i) Continued Fractions Consider $([0,1), \mathcal{B})$, where $\mathcal{B}$ is the Lebesgue $\sigma$-algebra. Define a transformation $T : [0,1) \to [0,1)$ by $T0 = 0$ and, for $x \neq 0$,
$$Tx = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor.$$
Let $a_1 = a_1(x) = \lfloor 1/x \rfloor$, so that
$$a_1(x) = \begin{cases} 1 & \text{if } x \in \left(\frac{1}{2}, 1\right), \\ n & \text{if } x \in \left(\frac{1}{n+1}, \frac{1}{n}\right],\ n \ge 2, \end{cases}$$
and for $n \ge 1$ let $a_n = a_n(x) = a_1\left(T^{n-1}x\right)$. Then $Tx = \frac{1}{x} - a_1$, hence $x = \dfrac{1}{a_1 + Tx}$, and after $n$ iterations we see that
$$x = \cfrac{1}{a_1 + Tx} = \cdots = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + T^n x}}}}.$$
In fact, if
$$\frac{p_n}{q_n} = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}},$$
then the $q_n$ are monotonically increasing, and
$$\left|x - \frac{p_n}{q_n}\right| < \frac{1}{q_n^2} \to 0 \ \text{ as } n \to \infty.$$
Thus,
$$x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}.$$
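The digit iteration can be carried out in code just as for the binary expansion. A small illustrative script (our own sketch, not from the notes; `cf_digits` is a made-up name), using exact rational arithmetic:

```python
from fractions import Fraction

def cf_digits(x, nmax=30):
    """Continued fraction digits of a rational x in (0, 1), computed by
    iterating the Gauss map T(x) = 1/x - floor(1/x)."""
    digits = []
    while x != 0 and len(digits) < nmax:
        inv = 1 / x              # exact, since x is a Fraction
        a = int(inv)             # floor(1/x) for positive rationals
        digits.append(a)
        x = inv - a              # apply the Gauss map
    return digits

# 4/11 = 1/(2 + 1/(1 + 1/3)): the digit sequence is 2, 1, 3
print(cf_digits(Fraction(4, 11)))
```

For a rational input the iteration terminates, mirroring the fact that rationals have finite continued fraction expansions.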
1.4 Recurrence
a contradiction.
The proof of the above theorem implies that almost every $x \in B$ returns to $B$ infinitely often. In other words, there exist infinitely many integers $n_1 < n_2 < \ldots$ such that $T^{n_i}x \in B$. To see this, let
$$D = \{x \in B : T^k x \in B \text{ for finitely many } k \ge 1\}.$$
Then
$$D = \{x \in B : T^k x \in F \text{ for some } k \ge 0\} \subseteq \bigcup_{k=0}^{\infty} T^{-k}F.$$
1.5 Induced and Integral Transformations

1.5.1 Induced Transformations
For $x \in A$, let $n(x) := \inf\{n \ge 1 : T^n x \in A\}$. We call $n(x)$ the first return time of $x$ to $A$.

Exercise 1.5.1 Show that $n$ is measurable with respect to the $\sigma$-algebra $\mathcal{F} \cap A$ on $A$.

By Poincaré's Recurrence Theorem, $n(x)$ is finite a.e. on $A$. In the sequel we remove from $A$ the set of measure zero on which $n(x) = \infty$, and we denote the new set again by $A$. Consider the $\sigma$-algebra $\mathcal{F} \cap A$ on $A$, which is the restriction of $\mathcal{F}$ to $A$. Furthermore, let $\mu_A$ be the probability measure on $A$ defined by
$$\mu_A(B) = \frac{\mu(B)}{\mu(A)}, \quad \text{for } B \in \mathcal{F} \cap A,$$
and
$$T^{-1}B_n = A_{n+1} \cup B_{n+1}. \tag{1.1}$$
[Figure: the tower over $A$, with columns $A_1, A_2, A_3, \ldots$ and levels $B_1, B_2, B_3, \ldots$; the levels correspond to $A \to (A,1)$, $TA \setminus A \to (A,2)$, $T^2A \setminus (A \cup TA) \to (A,3)$, illustrating the first-return decomposition of $A$.]
For $C \in \mathcal{F} \cap A$ one has
$$T_A^{-1}C \supseteq \bigcup_{k=1}^{\infty}\left(A_k \cap T^{-k}C\right),$$
hence
$$\mu\left(T_A^{-1}C\right) \ge \sum_{k=1}^{\infty}\mu\left(A_k \cap T^{-k}C\right).$$
On the other hand, using (1.1) repeatedly, one gets for any $n \ge 1$,
$$\begin{aligned}
\mu\left(T^{-1}C\right) &= \mu\left(A_1 \cap T^{-1}C\right) + \mu\left(B_1 \cap T^{-1}C\right)\\
&= \mu\left(A_1 \cap T^{-1}C\right) + \mu\left(T^{-1}\left(B_1 \cap T^{-1}C\right)\right)\\
&= \mu\left(A_1 \cap T^{-1}C\right) + \mu\left(A_2 \cap T^{-2}C\right) + \mu\left(B_2 \cap T^{-2}C\right)\\
&\ \ \vdots\\
&= \sum_{k=1}^{n}\mu\left(A_k \cap T^{-k}C\right) + \mu\left(B_n \cap T^{-n}C\right).
\end{aligned}$$
Since
$$1 \ge \mu\left(\bigcup_{n=1}^{\infty}\left(B_n \cap T^{-n}C\right)\right) = \sum_{n=1}^{\infty}\mu\left(B_n \cap T^{-n}C\right),$$
it follows that
$$\lim_{n\to\infty}\mu\left(B_n \cap T^{-n}C\right) = 0.$$
Thus,
$$\mu(C) = \mu\left(T^{-1}C\right) = \sum_{k=1}^{\infty}\mu\left(A_k \cap T^{-k}C\right) = \mu\left(T_A^{-1}C\right).$$
Dividing by $\mu(A)$ gives $\mu_A(C) = \mu_A\left(T_A^{-1}C\right)$, so $T_A$ is measure preserving with respect to $\mu_A$.
Exercise 1.5.4 Let $G = \frac{1+\sqrt{5}}{2}$, so that $G^2 = G + 1$. Consider the set
$$X = \left[0, \tfrac{1}{G}\right) \times [0,1)\ \cup\ \left[\tfrac{1}{G}, 1\right) \times \left[0, \tfrac{1}{G}\right),$$
endowed with the product Borel $\sigma$-algebra, and the normalized Lebesgue measure $\mu$. Define the transformation
$$T(x,y) = \begin{cases} \left(Gx, \dfrac{y}{G}\right), & (x,y) \in \left[0, \frac{1}{G}\right) \times [0,1), \\[6pt] \left(Gx - 1, \dfrac{1+y}{G}\right), & (x,y) \in \left[\frac{1}{G}, 1\right) \times \left[0, \frac{1}{G}\right). \end{cases}$$
1.5.2 Integral Transformations
On the tower define the transformation $T$ by
$$T(y, i) = \begin{cases} (y, i+1) & \text{if } i + 1 \le f(y), \\ (Sy, 1) & \text{if } i + 1 > f(y), \end{cases}$$
and define a measure $\mu$ on sets of the form $(B, i)$, with $B \subseteq \{y \in A : f(y) \ge i\}$ measurable, by
$$\mu(B, i) = \frac{\nu(B)}{\int_A f(y)\, d\nu(y)}.$$
With $A_n = \{y \in A : f(y) = n\}$ one has
$$T^{-1}(B, 1) = \bigcup_{n=1}^{\infty}\left(A_n \cap S^{-1}B,\ n\right) \quad \text{(disjoint union)}.$$
Since $S$ is measure preserving with respect to $\nu$,
$$\mu\left(T^{-1}(B, 1)\right) = \frac{\sum_{n=1}^{\infty}\nu\left(A_n \cap S^{-1}B\right)}{\int_A f(y)\, d\nu(y)} = \frac{\nu\left(S^{-1}B\right)}{\int_A f(y)\, d\nu(y)} = \frac{\nu(B)}{\int_A f(y)\, d\nu(y)} = \mu(B, 1).$$
1.6 Ergodicity
Definition 1.6.1 Let $T$ be a measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$. The map $T$ is said to be ergodic if for every measurable set $A$ satisfying $T^{-1}A = A$, we have $\mu(A) = 0$ or $1$.

Theorem 1.6.1 Let $(X, \mathcal{F}, \mu)$ be a probability space and $T : X \to X$ measure preserving. The following are equivalent:

(i) $T$ is ergodic.

(ii) If $B \in \mathcal{F}$ with $\mu\left(T^{-1}B \triangle B\right) = 0$, then $\mu(B) = 0$ or $1$.

(iii) If $A \in \mathcal{F}$ with $\mu(A) > 0$, then $\mu\left(\bigcup_{n=1}^{\infty} T^{-n}A\right) = 1$.

(iv) If $A, B \in \mathcal{F}$ with $\mu(A) > 0$ and $\mu(B) > 0$, then there exists $n > 0$ such that $\mu\left(T^{-n}A \cap B\right) > 0$.
Remark 1.6.1

1. In case $T$ is invertible, then in the above characterization one can replace $T^{-n}$ by $T^n$.

2. Note that if $\mu\left(B \triangle T^{-1}B\right) = 0$, then $\mu\left(B \setminus T^{-1}B\right) = \mu\left(T^{-1}B \setminus B\right) = 0$. Since
$$B = \left(B \setminus T^{-1}B\right) \cup \left(B \cap T^{-1}B\right)$$
and
$$T^{-1}B = \left(T^{-1}B \setminus B\right) \cup \left(B \cap T^{-1}B\right),$$
we see that after removing a set of measure 0 from $B$ and a set of measure 0 from $T^{-1}B$, the remaining parts are equal. In this case we say that $B$ equals $T^{-1}B$ modulo sets of measure 0.

3. In words, (iii) says that if $A$ is a set of positive measure, almost every $x \in X$ eventually (in fact infinitely often) will visit $A$.

4. (iv) says that elements of $B$ will eventually enter $A$.
Consider the set
$$C = \bigcap_{n=1}^{\infty}\bigcup_{k=n}^{\infty} T^{-k}B.$$
Then
$$\mu(C \triangle B) \le \mu\left(\bigcup_{k=1}^{\infty}\left(T^{-k}B \cap B^c\right)\right) + \mu\left(\bigcup_{k=1}^{\infty}\left(T^{-k}B^c \cap B\right)\right) \le \sum_{k=1}^{\infty}\mu\left(T^{-k}B \triangle B\right).$$
Furthermore,
$$\mu(B) = \mu\left(B \cap \bigcup_{n=1}^{\infty} T^{-n}A\right) \le \sum_{n=1}^{\infty}\mu\left(B \cap T^{-n}A\right),$$
hence $\mu\left(B \cap T^{-n}A\right) > 0$ for some $n \ge 1$.
1.7 Other Characterizations of Ergodicity

We denote by $L^0(X, \mathcal{F}, \mu)$ the space of all complex valued measurable functions on the probability space $(X, \mathcal{F}, \mu)$. Let
$$L^p(X, \mathcal{F}, \mu) = \left\{f \in L^0(X, \mathcal{F}, \mu) : \int_X |f|^p\, d\mu(x) < \infty\right\}.$$
We use the subscript $\mathbb{R}$ whenever we are dealing only with real-valued functions.

Let $(X_i, \mathcal{F}_i, \mu_i)$, $i = 1, 2$, be two probability spaces, and $T : X_1 \to X_2$ a measure preserving transformation, i.e., $\mu_2(A) = \mu_1\left(T^{-1}A\right)$. Define the induced operator $U_T : L^0(X_2, \mathcal{F}_2, \mu_2) \to L^0(X_1, \mathcal{F}_1, \mu_1)$ by
$$U_T f = f \circ T.$$
The following properties of $U_T$ are easy to prove.

Proposition 1.7.1 The operator $U_T$ has the following properties:

(i) $U_T$ is linear.

(ii) $U_T(fg) = U_T(f)\,U_T(g)$.

(iii) $U_T c = c$ for any constant $c$.

(iv) $U_T$ is a positive linear operator.

(v) $U_T 1_B = 1_B \circ T = 1_{T^{-1}B}$ for all $B \in \mathcal{F}_2$.

(vi) $\int_{X_1} U_T f\, d\mu_1 = \int_{X_2} f\, d\mu_2$ for all $f \in L^0(X_2, \mathcal{F}_2, \mu_2)$ (where if one side doesn't exist or is infinite, then the other side has the same property).

(vii) Let $p \ge 1$. Then $U_T L^p(X_2, \mathcal{F}_2, \mu_2) \subseteq L^p(X_1, \mathcal{F}_1, \mu_1)$, and $\|U_T f\|_p = \|f\|_p$ for all $f \in L^p(X_2, \mathcal{F}_2, \mu_2)$.

Exercise 1.7.1 Prove Proposition 1.7.1.

Using the above properties, we can give the following characterization of ergodicity.
Theorem 1.7.1 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $T : X \to X$ measure preserving. The following are equivalent:

(i) $T$ is ergodic.

(ii) If $f \in L^0(X, \mathcal{F}, \mu)$ with $f(Tx) = f(x)$ for all $x$, then $f$ is a constant a.e.

(iii) If $f \in L^0(X, \mathcal{F}, \mu)$ with $f(Tx) = f(x)$ for a.e. $x$, then $f$ is a constant a.e.

(iv) If $f \in L^2(X, \mathcal{F}, \mu)$ with $f(Tx) = f(x)$ for all $x$, then $f$ is a constant a.e.

(v) If $f \in L^2(X, \mathcal{F}, \mu)$ with $f(Tx) = f(x)$ for a.e. $x$, then $f$ is a constant a.e.
Proof The implications (iii)$\Rightarrow$(ii), (ii)$\Rightarrow$(iv), (v)$\Rightarrow$(iv), and (iii)$\Rightarrow$(v) are all clear. It remains to show (i)$\Rightarrow$(iii) and (iv)$\Rightarrow$(i).

(i)$\Rightarrow$(iii): Suppose $f(Tx) = f(x)$ a.e., and assume without any loss of generality that $f$ is real (otherwise we consider separately the real and imaginary parts of $f$). For each $n \ge 1$ and $k \in \mathbb{Z}$, let
$$X_{(k,n)} = \left\{x \in X : \frac{k}{2^n} \le f(x) < \frac{k+1}{2^n}\right\}.$$
Each $X_{(k,n)}$ is $T$-invariant modulo a set of measure zero, so by ergodicity $\mu\left(X_{(k,n)}\right) = 0$ or $1$. Hence, for each $n \ge 1$, there exists a unique integer $k_n$ such that $\mu\left(X_{(k_n,n)}\right) = 1$. In fact, $X_{(k_1,1)} \supseteq X_{(k_2,2)} \supseteq \ldots$, and $\left\{\frac{k_n}{2^n}\right\}$ is a bounded increasing sequence, hence $\lim_{n\to\infty}\frac{k_n}{2^n}$ exists. Let $Y = \bigcap_{n\ge1} X_{(k_n,n)}$; then $\mu(Y) = 1$. Now, if $x \in Y$, then $0 \le \left|f(x) - k_n/2^n\right| < 1/2^n$ for all $n$. Hence, $f(x) = \lim_{n\to\infty}\frac{k_n}{2^n}$, and $f$ is a constant on $Y$.
1.8 Examples of Ergodic Transformations
Example (Irrational Rotations) Consider $([0,1), \mathcal{B}, \lambda)$ with $Tx = x + \theta \bmod 1$, where $\theta$ is irrational. Suppose $f \in L^2(\lambda)$ satisfies $f(Tx) = f(x)$ a.e., and write its Fourier series $f(x) = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i nx}$. Then
$$f(Tx) = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n(x+\theta)} = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i n\theta}e^{2\pi i nx} = f(x) = \sum_{n\in\mathbb{Z}} a_n e^{2\pi i nx}.$$
Hence,
$$\sum_{n\in\mathbb{Z}} a_n\left(1 - e^{2\pi i n\theta}\right)e^{2\pi i nx} = 0.$$
By the uniqueness of the Fourier coefficients, we have $a_n\left(1 - e^{2\pi i n\theta}\right) = 0$ for all $n \in \mathbb{Z}$. If $n \neq 0$, since $\theta$ is irrational we have $1 - e^{2\pi i n\theta} \neq 0$. Thus, $a_n = 0$ for all $n \neq 0$, and therefore $f(x) = a_0$ is a constant. By Theorem 1.7.1, $T$ is ergodic.
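A quick numerical illustration (our own sketch, not part of the notes): by ergodicity of the irrational rotation, the Birkhoff averages of the indicator of $[0, 1/2)$ converge to $\lambda([0,1/2)) = 1/2$ for every starting point.

```python
import math

def birkhoff_average(x, theta, n):
    """Average of 1_[0, 1/2) along the orbit of x under the rotation
    T(x) = x + theta (mod 1)."""
    hits = 0
    for _ in range(n):
        if x < 0.5:
            hits += 1
        x = (x + theta) % 1.0
    return hits / n

# theta = sqrt(2) - 1 is irrational; the average approaches 1/2
avg = birkhoff_average(0.1, math.sqrt(2) - 1, 100_000)
print(avg)
```

For quadratic irrationals such as $\sqrt{2}-1$ the convergence is very fast, so even a modest orbit length gives an average extremely close to $1/2$.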
Exercise 1.8.2 Consider the probability space $([0,1)^2, \mathcal{B} \times \mathcal{B}, \lambda \times \lambda)$, where as above $\mathcal{B}$ is the Lebesgue $\sigma$-algebra on $[0,1)$ and $\lambda$ normalized Lebesgue measure. Suppose $\theta \in (0,1)$ is irrational, and define $T \times T : [0,1) \times [0,1) \to [0,1) \times [0,1)$ by
$$(T \times T)(x, y) = \left(x + \theta \bmod 1,\ y + \theta \bmod 1\right).$$
Show that $T \times T$ is measure preserving, but is not ergodic.
Example 2 (One- or Two-sided shift) Let $X = \{0, 1, \ldots, k-1\}^{\mathbb{N}}$, $\mathcal{F}$ the $\sigma$-algebra generated by the cylinders, and $\mu$ the product measure defined on cylinder sets by
$$\mu\left(\{x : x_0 = a_0, \ldots, x_n = a_n\}\right) = p_{a_0}\cdots p_{a_n},$$
where $p = (p_0, p_1, \ldots, p_{k-1})$ is a positive probability vector. Consider the left shift $T$ defined on $X$ by $Tx = y$, where $y_n = x_{n+1}$ (see Example (e) in Subsection 1.3). We show that $T$ is ergodic. Let $E$ be a measurable subset of $X$ which is $T$-invariant, i.e., $T^{-1}E = E$. For any $\varepsilon > 0$, by Lemma 1.2.1 (see Subsection 1.2) there exists $A \in \mathcal{F}$ which is a finite disjoint union of cylinders such that $\mu(E \triangle A) < \varepsilon$. Then
$$|\mu(E) - \mu(A)| = |\mu(E \setminus A) - \mu(A \setminus E)| \le \mu(E \setminus A) + \mu(A \setminus E) = \mu(E \triangle A) < \varepsilon.$$
Since $A$ depends on finitely many coordinates only, there exists $n_0 > 0$ such that $T^{-n_0}A$ depends on different coordinates than $A$. Since $\mu$ is a product measure, we have
$$\mu\left(A \cap T^{-n_0}A\right) = \mu(A)\,\mu\left(T^{-n_0}A\right) = \mu(A)^2.$$
26
Further,
(ET n0 A) = (T n0 ET n0 A) = (EA) < ,
and
E(A T n0 A) (EA) + (ET n0 A) < 2.
Hence,
|(E) ((A T n0 A))| E(A T n0 A) < 2.
Thus,
|(E) (E)2 | |(E) (A)2 | + |(A)2 (E)2 |
= |(E) ((A T n0 A))| + ((A) + (E))|(A) (E)|
< 4.
Since > 0 is arbitrary, it follows that (E) = (E)2 , hence (E) = 0 or 1.
Therefore, T is ergodic.
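Ergodicity of the Bernoulli shift means that along a typical sample path the frequency of any fixed block equals the measure of the corresponding cylinder. A rough simulation (ours, not from the notes; the helper `pattern_frequency` is a made-up name) for the fair coin and the block $0,1,1$:

```python
import random

def pattern_frequency(bits, pattern):
    """Frequency of occurrences of `pattern` along the sequence `bits`,
    i.e. the Birkhoff average of the indicator of a cylinder set
    under the left shift."""
    n = len(bits) - len(pattern) + 1
    count = sum(1 for i in range(n)
                if bits[i:i + len(pattern)] == pattern)
    return count / n

random.seed(1)
bits = [random.randint(0, 1) for _ in range(200_000)]
# For the fair Bernoulli shift the limit is mu(cylinder) = (1/2)^3 = 0.125
freq = pattern_frequency(bits, [0, 1, 1])
print(freq)
```

The same experiment with any other block of length 3 should give a frequency near $1/8$ as well.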
The following lemma provides, in some cases, a useful tool to verify that a measure preserving transformation defined on $([0,1), \mathcal{B}, \mu)$ is ergodic, where $\mathcal{B}$ is the Lebesgue $\sigma$-algebra and $\mu$ is a probability measure equivalent to Lebesgue measure $\lambda$ (i.e., $\mu(A) = 0$ if and only if $\lambda(A) = 0$).

Lemma 1.8.1 (Knopp's Lemma) If $B$ is a Lebesgue set and $\mathcal{C}$ is a class of subintervals of $[0,1)$ satisfying

(a) every open subinterval of $[0,1)$ is at most a countable union of disjoint elements from $\mathcal{C}$,

(b) $\forall A \in \mathcal{C}$, $\lambda(A \cap B) \ge \gamma\,\lambda(A)$, where $\gamma > 0$ is independent of $A$,

then $\lambda(B) = 1$.

Proof The proof is done by contradiction. Suppose $\lambda(B^c) > 0$. Given $\varepsilon > 0$ there exists by Lemma 1.2.1 a set $E_\varepsilon$ that is a finite disjoint union of open intervals such that $\lambda\left(B^c \triangle E_\varepsilon\right) < \varepsilon$. Now by conditions (a) and (b) (that is, writing $E_\varepsilon$ as a countable union of disjoint elements of $\mathcal{C}$) one gets that
$$\lambda(B \cap E_\varepsilon) \ge \gamma\,\lambda(E_\varepsilon).$$
$$\lambda(A \cap B) = \lambda\left(A \cap T^{-n}B\right) = \lambda(A)\,\lambda(B) = \frac{1}{2^n}\,\lambda(B).$$
Thus, the second hypothesis of Knopp's Lemma is satisfied with $\gamma = \lambda(B) > 0$. Hence, $\lambda(B) = 1$. Therefore $T$ is ergodic.
Exercise 1.8.3 Let $\beta > 1$ be a non-integer, and consider the transformation $T_\beta : [0,1) \to [0,1)$ given by $T_\beta x = \beta x \bmod 1 = \beta x - \lfloor \beta x \rfloor$. Use Lemma 1.8.1 to show that $T_\beta$ is ergodic with respect to Lebesgue measure $\lambda$, i.e. if $T_\beta^{-1}A = A$, then $\lambda(A) = 0$ or $1$.
$$\left(\bigcup_{k\ge1}\left(A_k \cap T^{-k}C\right)\right) \cup \left(\bigcup_{k\ge1}\left(B_k \cap T^{-k}C\right)\right) = C \cup E = F.$$
Hence, $F$ is $T$-invariant, and by ergodicity of $T$ we have $\mu(F) = 0$ or $1$. If $\mu(F) = 0$, then $\mu(C) = 0$, and hence $\mu_A(C) = 0$. If $\mu(F) = 1$, then $\mu(X \setminus F) = 0$. Since
$$X \setminus F = (A \setminus C) \cup \left((X \setminus A) \setminus E\right) \supseteq A \setminus C,$$
it follows that
$$\mu(A \setminus C) \le \mu(X \setminus F) = 0.$$
Since $\mu(A \setminus C) = \mu(A) - \mu(C)$, we have $\mu(A) = \mu(C)$, i.e., $\mu_A(C) = 1$.

Exercise 1.8.4 Show that if $T_A$ is ergodic and $\mu\left(\bigcup_{k\ge1} T^{-k}A\right) = 1$, then $T$ is ergodic.
Chapter 2
The Ergodic Theorem
2.1 The Ergodic Theorem and its consequences
Recall the Strong Law of Large Numbers: if $Y_1, Y_2, \ldots$ is an i.i.d. sequence of integrable random variables, then
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n} Y_i = E[Y_1] \quad \text{(a.e.)}.$$
For example, consider $X = \{0,1\}^{\mathbb{N}}$, $\mathcal{F}$ the $\sigma$-algebra generated by the cylinder sets, and $\mu$ the uniform product measure, i.e.,
$$\mu\left(\{x : x_1 = a_1, x_2 = a_2, \ldots, x_n = a_n\}\right) = 1/2^n.$$
Suppose one is interested in finding the frequency of the digit 1. More precisely, for a.e. $x$ we would like to find
$$\lim_{n\to\infty}\frac{1}{n}\#\{1 \le i \le n : x_i = 1\}.$$
Using the Strong Law of Large Numbers one can answer this question easily. Define
$$Y_i(x) := \begin{cases} 1 & \text{if } x_i = 1, \\ 0 & \text{otherwise}. \end{cases}$$
The $Y_i$ form an i.i.d. Bernoulli sequence, so the desired frequency is a.e. equal to $E[Y_1] = 1/2$. Suppose next that one wants the frequency of the block 011, and define $Z_i(x) = 1$ if $x_i = 0$, $x_{i+1} = 1$, $x_{i+2} = 1$, and $Z_i(x) = 0$ otherwise. Then,
$$\frac{1}{n}\#\{1 \le i \le n : x_i = 0,\ x_{i+1} = 1,\ x_{i+2} = 1\} = \frac{1}{n}\sum_{i=1}^{n} Z_i(x).$$
It is not hard to see that this sequence is stationary but not independent, so one cannot directly apply the strong law of large numbers. Notice that if $T$ is the left shift on $X$, then $Y_n = Y_1 \circ T^{n-1}$ and $Z_n = Z_1 \circ T^{n-1}$.
In general, suppose $(X, \mathcal{F}, \mu)$ is a probability space and $T : X \to X$ a measure preserving transformation. For $f \in L^1(X, \mathcal{F}, \mu)$, we would like to know under what conditions the limit $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f(T^i x)$ exists a.e. If it does exist, what is its value? This is answered by the Ergodic Theorem, which was originally proved by G.D. Birkhoff in 1931. Since then, several proofs of this important theorem have been obtained; here we present a recent proof given by T. Kamae and M.S. Keane in [KK].
Theorem 2.1.1 (The Ergodic Theorem) Let $(X, \mathcal{F}, \mu)$ be a probability space and $T : X \to X$ a measure preserving transformation. Then, for any $f \in L^1(\mu)$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f\left(T^i x\right) = f^*(x)$$
exists a.e., is $T$-invariant, and $\int_X f^*\, d\mu = \int_X f\, d\mu$. If moreover $T$ is ergodic, then $f^*$ is a constant a.e. and $f^* = \int_X f\, d\mu$.
For the proof of the above theorem, we need the following simple lemma.

Lemma 2.1.1 Let $M > 0$ be an integer, and suppose $\{a_n\}_{n\ge0}$, $\{b_n\}_{n\ge0}$ are sequences of non-negative real numbers such that for each $n = 0, 1, 2, \ldots$ there exists an integer $1 \le m \le M$ with
$$a_n + \cdots + a_{n+m-1} \ge b_n + \cdots + b_{n+m-1}.$$
Then, for each positive integer $N > M$, one has
$$a_0 + \cdots + a_{N-1} \ge b_0 + \cdots + b_{N-M-1}.$$

Proof of Lemma 2.1.1 Using the hypothesis we recursively find integers $m_0 < m_1 < \ldots < m_k < N$ with the following properties:
$$m_0 \le M, \quad m_{i+1} - m_i \le M \ \text{ for } i = 0, \ldots, k-1, \quad \text{and} \quad N - m_k < M,$$
$$a_0 + \ldots + a_{m_0-1} \ge b_0 + \ldots + b_{m_0-1},$$
$$a_{m_0} + \ldots + a_{m_1-1} \ge b_{m_0} + \ldots + b_{m_1-1},$$
$$\vdots$$
$$a_{m_{k-1}} + \ldots + a_{m_k-1} \ge b_{m_{k-1}} + \ldots + b_{m_k-1}.$$
Then,
$$a_0 + \ldots + a_{N-1} \ge a_0 + \ldots + a_{m_k-1} \ge b_0 + \ldots + b_{m_k-1} \ge b_0 + \ldots + b_{N-M-1}.$$
Proof of Theorem 2.1.1 Assume with no loss of generality that $f \ge 0$ (otherwise we write $f = f^+ - f^-$, and consider each part separately). Let $f_n(x) = f(x) + \ldots + f\left(T^{n-1}x\right)$, $\overline{f}(x) = \limsup_{n\to\infty}\frac{f_n(x)}{n}$, and $\underline{f}(x) = \liminf_{n\to\infty}\frac{f_n(x)}{n}$. Then $\overline{f}$ and $\underline{f}$ are $T$-invariant. This follows from
$$\overline{f}(Tx) = \limsup_{n\to\infty}\frac{f_n(Tx)}{n} = \limsup_{n\to\infty}\left[\frac{f_{n+1}(x)}{n+1}\cdot\frac{n+1}{n} - \frac{f(x)}{n}\right] = \limsup_{n\to\infty}\frac{f_{n+1}(x)}{n+1} = \overline{f}(x).$$
Since
$$\int_X F(x)\, d\mu(x) = \int_{X_0} \overline{f}(x)\, d\mu(x) + L\,\mu(X \setminus X_0),$$
one obtains, combining the estimates above and letting first $N \to \infty$, then $\varepsilon \to 0$ and $L \to \infty$,
$$\int_X \overline{f}(x)\, d\mu(x) \le \int_X f(x)\, d\mu(x).$$
Since $f \ge 0$, the measure $\nu$ defined by $\nu(A) = \int_A f(x)\, d\mu(x)$ is absolutely continuous with respect to the measure $\mu$. Hence, for each $\varepsilon_0 > 0$ there exists $\delta > 0$ such that if $\mu(A) < \delta$, then $\nu(A) < \varepsilon_0$. Since $\mu(X \setminus Y_0) < \delta$, we get $\nu(X \setminus Y_0) = \int_{X \setminus Y_0} f(x)\, d\mu(x) < \varepsilon_0$. Hence,
$$\int_X f(x)\, d\mu(x) = \int_X G(x)\, d\mu(x) + \int_{X \setminus Y_0} f(x)\, d\mu(x) \le \frac{N}{N-M}\int_X\left(\underline{f}(x) + \varepsilon\right) d\mu(x) + \varepsilon_0.$$
Letting first $N \to \infty$ and then $\varepsilon, \varepsilon_0 \to 0$, we arrive at
$$\int_X f\, d\mu \le \int_X \underline{f}\, d\mu \le \int_X \overline{f}\, d\mu \le \int_X f\, d\mu,$$
so that $\underline{f} = \overline{f} =: f^*$ a.e.; the limit $f^*(x) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f\left(T^i x\right)$ exists a.e., and $\int_X f^*\, d\mu = \int_X f\, d\mu$. If moreover $T$ is ergodic, then the $T$-invariant function $f^*$ is constant a.e., and
$$f^*(x) = \int_X f^*(y)\, d\mu(y) = \int_X f(y)\, d\mu(y).$$
Remarks

(1) Let us study further the limit $f^*$ in the case that $T$ is not ergodic. Let $\mathcal{I}$ be the sub-$\sigma$-algebra of $\mathcal{F}$ consisting of all $T$-invariant subsets $A \in \mathcal{F}$. Notice that if $f \in L^1(\mu)$, then the conditional expectation of $f$ given $\mathcal{I}$ (denoted by $E_\mu(f|\mathcal{I})$) is the unique a.e. $\mathcal{I}$-measurable $L^1(\mu)$ function with the property that
$$\int_A f(x)\, d\mu(x) = \int_A E_\mu(f|\mathcal{I})(x)\, d\mu(x)$$
for all $A \in \mathcal{I}$. For such $A$ one has
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}(f 1_A)\left(T^i x\right) = 1_A(x)\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f\left(T^i x\right) = 1_A(x)f^*(x) \quad \text{a.e.},$$
and
$$\int_X f^* 1_A(x)\, d\mu(x) = \int_X f 1_A(x)\, d\mu(x).$$
It follows that $f^* = E_\mu(f|\mathcal{I})$ a.e.

(2) If $T$ is ergodic, then for every $f \in L^1(\mu)$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} f\left(T^i(x)\right) = \int_X f\, d\mu \quad \text{a.e.}$$
Exercise 2.1.1 (Kac's Lemma) Let $T$ be a measure preserving and ergodic transformation on a probability space $(X, \mathcal{F}, \mu)$. Let $A$ be a measurable subset of $X$ of positive measure, denote by $n$ the first return time map, and let $T_A$ be the induced transformation of $T$ on $A$ (see Section 1.5). Prove that
$$\int_A n(x)\, d\mu = 1.$$
Conclude that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} n\left(T_A^i(x)\right) = \frac{1}{\mu(A)}$$
almost everywhere on $A$.
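Kac's Lemma is easy to observe numerically. The sketch below (ours, not from the notes; the helper `mean_return_time` is a made-up name) follows an orbit of an irrational rotation and averages the gaps between successive visits to $A = [0, 1/10)$; the predicted mean return time is $1/\mu(A) = 10$.

```python
import math

def mean_return_time(theta, a, n_steps):
    """Average first-return time to A = [0, a) along an orbit of the
    rotation T(x) = x + theta (mod 1), started inside A."""
    x = 0.0                    # the starting point 0 lies in A
    last_visit, count, total = 0, 0, 0
    for step in range(1, n_steps):
        x = (x + theta) % 1.0
        if x < a:              # the orbit has returned to A
            total += step - last_visit
            last_visit = step
            count += 1
    return total / count

# Kac's Lemma predicts an average return time of 1 / mu(A) = 10
avg = mean_return_time(math.sqrt(2) - 1, 0.1, 500_000)
print(avg)
```

Individual return times vary (for rotations they take only a few distinct values), but their average matches $1/\mu(A)$.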
Exercise 2.1.2 Let $\beta = \frac{1+\sqrt{5}}{2}$, and consider the transformation $T_\beta : [0,1) \to [0,1)$ given by $T_\beta x = \beta x \bmod 1 = \beta x - \lfloor \beta x \rfloor$. Define $b_1$ on $[0,1)$ by
$$b_1(x) = \begin{cases} 0 & \text{if } 0 \le x < 1/\beta, \\ 1 & \text{if } 1/\beta \le x < 1, \end{cases}$$
and set $b_i = b_i(x) = b_1\left(T_\beta^{i-1}x\right)$. Fix $k \ge 0$. Find the a.e. value (with respect to Lebesgue measure) of the following limit:
$$\lim_{n\to\infty}\frac{1}{n}\#\{1 \le i \le n : b_i = 0,\ b_{i+1} = 0,\ \ldots,\ b_{i+k} = 0\}.$$
Using the Ergodic Theorem, one can give yet another characterization of ergodicity.

Corollary 2.1.1 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $T : X \to X$ a measure preserving transformation. Then, $T$ is ergodic if and only if for all $A, B \in \mathcal{F}$, one has
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A \cap B\right) = \mu(A)\mu(B). \tag{2.1}$$

Proof Suppose first that $T$ is ergodic, and let $A, B \in \mathcal{F}$. By the Ergodic Theorem,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right) = \int_X 1_A(x)\, d\mu(x) = \mu(A) \quad \text{a.e.}$$
Then,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A \cap B}(x) = \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A}(x)\,1_B(x) = 1_B(x)\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right) = 1_B(x)\,\mu(A) \quad \text{a.e.}$$
Since for each $n$ the function $\frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A \cap B}$ is dominated by the constant function 1, it follows by the dominated convergence theorem that
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A \cap B\right) = \int_X \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A \cap B}(x)\, d\mu(x) = \int_X 1_B(x)\,\mu(A)\, d\mu(x) = \mu(A)\mu(B).$$
Conversely, suppose (2.1) holds for all $A, B \in \mathcal{F}$. Let $E \in \mathcal{F}$ satisfy $T^{-1}E = E$ and $\mu(E) > 0$. Taking $A = B = E$ in (2.1) gives
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}E \cap E\right) = \mu(E)^2,$$
while $T^{-i}E = E$ makes the left-hand side equal to $\mu(E)$. Hence, $\mu(E) = \mu(E)^2$. Since $\mu(E) > 0$, this implies $\mu(E) = 1$. Therefore, $T$ is ergodic.
To show ergodicity one needs to verify equation (2.1) only for sets $A$ and $B$ belonging to a generating semi-algebra, as the next proposition shows.

Proposition 2.1.1 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $\mathcal{S}$ a generating semi-algebra of $\mathcal{F}$. Let $T : X \to X$ be a measure preserving transformation. Then, $T$ is ergodic if and only if for all $A, B \in \mathcal{S}$, one has
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A \cap B\right) = \mu(A)\mu(B). \tag{2.2}$$
Proof We only need to show that if (2.2) holds for all $A, B \in \mathcal{S}$, then it holds for all $A, B \in \mathcal{F}$. Let $\varepsilon > 0$, and $A, B \in \mathcal{F}$. Then, by Lemma 1.2.1 (Subsection 1.2) there exist sets $A_0, B_0$, each of which is a finite disjoint union of elements of $\mathcal{S}$, such that
$$\mu(A \triangle A_0) < \varepsilon \quad \text{and} \quad \mu(B \triangle B_0) < \varepsilon.$$
Since
$$\left(T^{-i}A \cap B\right) \triangle \left(T^{-i}A_0 \cap B_0\right) \subseteq \left(T^{-i}A \triangle T^{-i}A_0\right) \cup \left(B \triangle B_0\right),$$
it follows that
$$\left|\mu\left(T^{-i}A \cap B\right) - \mu\left(T^{-i}A_0 \cap B_0\right)\right| \le \mu\left(\left(T^{-i}A \cap B\right) \triangle \left(T^{-i}A_0 \cap B_0\right)\right) \le \mu\left(T^{-i}A \triangle T^{-i}A_0\right) + \mu\left(B \triangle B_0\right) < 2\varepsilon.$$
Further,
$$|\mu(A)\mu(B) - \mu(A_0)\mu(B_0)| < 2\varepsilon.$$
Hence,
$$\begin{aligned}
&\left|\left(\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A \cap B\right) - \mu(A)\mu(B)\right) - \left(\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A_0 \cap B_0\right) - \mu(A_0)\mu(B_0)\right)\right|\\
&\qquad \le \frac{1}{n}\sum_{i=0}^{n-1}\left|\mu\left(T^{-i}A \cap B\right) - \mu\left(T^{-i}A_0 \cap B_0\right)\right| + |\mu(A)\mu(B) - \mu(A_0)\mu(B_0)| < 4\varepsilon.
\end{aligned}$$
Therefore,
$$\lim_{n\to\infty}\left[\frac{1}{n}\sum_{i=0}^{n-1}\mu\left(T^{-i}A \cap B\right) - \mu(A)\mu(B)\right] = 0.$$
Theorem 2.1.2 Suppose $\mu_1$ and $\mu_2$ are probability measures on $(X, \mathcal{F})$, and $T : X \to X$ is measurable and measure preserving with respect to $\mu_1$ and $\mu_2$. Then,

(i) if $T$ is ergodic with respect to $\mu_1$, and $\mu_2$ is absolutely continuous with respect to $\mu_1$, then $\mu_1 = \mu_2$;

(ii) if $T$ is ergodic with respect to $\mu_1$ and $\mu_2$, then either $\mu_1 = \mu_2$ or $\mu_1$ and $\mu_2$ are singular with respect to each other.

Proof (i) Suppose $T$ is ergodic with respect to $\mu_1$, and $\mu_2$ is absolutely continuous with respect to $\mu_1$. For any $A \in \mathcal{F}$, by the Ergodic Theorem, for a.e. $x$ one has
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right) = \mu_1(A).$$
Let
$$C_A = \left\{x \in X : \lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right) = \mu_1(A)\right\};$$
then $\mu_1(C_A) = 1$, and by absolute continuity of $\mu_2$ one has $\mu_2(C_A) = 1$. Since $T$ is measure preserving with respect to $\mu_2$, for each $n \ge 1$ one has
$$\int_X \frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right)\, d\mu_2(x) = \mu_2(A).$$
By the dominated convergence theorem,
$$\mu_2(A) = \lim_{n\to\infty}\int_X \frac{1}{n}\sum_{i=0}^{n-1} 1_A\left(T^i x\right)\, d\mu_2(x) = \int_X \mu_1(A)\, d\mu_2(x) = \mu_1(A).$$
Hence $\mu_1 = \mu_2$.

(ii) Suppose $T$ is ergodic with respect to $\mu_1$ and $\mu_2$. For $i = 1, 2$, let
$$C_i = \left\{x \in X : \lim_{n\to\infty}\frac{1}{n}\sum_{j=0}^{n-1} 1_A\left(T^j x\right) = \mu_i(A)\right\}.$$
If $\mu_1(A) \neq \mu_2(A)$ for some $A \in \mathcal{F}$, then $C_1 \cap C_2 = \emptyset$ while $\mu_1(C_1) = \mu_2(C_2) = 1$, so $\mu_1$ and $\mu_2$ are mutually singular.
Note that
$$\mu(A) = \int_X E_\mu(1_A|\mathcal{I})(x)\, d\mu(x) = \int_X \mu_x(A)\, d\mu(x),$$
for an appropriate family of measures $\mu_x$ (the ergodic decomposition of $\mu$).
2.2 Characterization of Irreducible Markov Chains
Let $T$ be the Markov shift of example (f) in Subsection 1.3, defined by a stochastic $N \times N$ matrix $P = (p_{ij})$ and a positive probability vector $\pi = (\pi_0, \ldots, \pi_{N-1})$ with $\pi P = \pi$. Consider the Cesàro limits
$$Q = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} P^k, \qquad q_{ij} = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)},$$
where $p_{ij}^{(k)}$ denotes the $(i,j)$ entry of $P^k$.

Lemma 2.2.1 For each $i, j \in \{0, 1, \ldots, N-1\}$, the limit $\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)}$ exists, i.e., $q_{ij}$ is well-defined.

Proof Notice that
$$\frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)} = \frac{1}{\pi_i}\cdot\frac{1}{n}\sum_{k=0}^{n-1}\mu\left(\{x \in X : x_0 = i,\ x_k = j\}\right).$$
Since $T$ is measure preserving, by the ergodic theorem the limit
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_k = j\}}(x) = \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = j\}}\left(T^k x\right) = f^*(x)$$
exists a.e., and hence also
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\, x_k = j\}}(x) = 1_{\{x : x_0 = i\}}(x)\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = j\}}\left(T^k x\right) = f^*(x)\,1_{\{x : x_0 = i\}}(x).$$
Since $\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\, x_k = j\}}(x) \le 1$ for all $n$, by the dominated convergence theorem,
$$\begin{aligned}
q_{ij} &= \frac{1}{\pi_i}\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\mu\left(\{x \in X : x_0 = i,\ x_k = j\}\right)\\
&= \frac{1}{\pi_i}\int_X \lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\, x_k = j\}}(x)\, d\mu(x)\\
&= \frac{1}{\pi_i}\int_X f^*(x)\,1_{\{x : x_0 = i\}}(x)\, d\mu(x)\\
&= \frac{1}{\pi_i}\int_{\{x : x_0 = i\}} f^*(x)\, d\mu(x).
\end{aligned}$$
In particular, the limit defining $q_{ij}$ exists.

Exercise 2.2.1 Show that the matrix $Q$ has the following properties:

(a) $Q$ is stochastic.

(b) $Q = QP = PQ = Q^2$.

(c) $\pi Q = \pi$.
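The properties of Exercise 2.2.1 can be checked numerically on a concrete chain. A rough sketch (ours, not from the notes; the matrix `P` below is an arbitrary example whose stationary vector is $\pi = (2/3, 1/3)$):

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cesaro_limit(p, n_terms):
    """Approximate Q = lim (1/n) * sum_{k=0}^{n-1} P^k."""
    n = len(p)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    total = [[0.0] * n for _ in range(n)]
    for _ in range(n_terms):
        for i in range(n):
            for j in range(n):
                total[i][j] += power[i][j]
        power = mat_mul(power, p)             # next power of P
    return [[total[i][j] / n_terms for j in range(n)] for i in range(n)]

P = [[0.9, 0.1],
     [0.2, 0.8]]
Q = cesaro_limit(P, 2000)
print(Q)   # both rows approach the stationary vector (2/3, 1/3)
```

Since this $P$ is irreducible, both rows of the approximated $Q$ coincide, in line with Theorem 2.2.1 below.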
We now give a characterization of the ergodicity of $T$. Recall that the matrix $P$ is said to be irreducible if for every $i, j \in \{0, 1, \ldots, N-1\}$ there exists $n \ge 1$ such that $p_{ij}^{(n)} > 0$.

Theorem 2.2.1 The following are equivalent:

(i) $T$ is ergodic.

(ii) All rows of $Q$ are identical.

(iii) $q_{ij} > 0$ for all $i, j$.

(iv) $P$ is irreducible.

(v) 1 is a simple eigenvalue of $P$.

Proof
(i) $\Rightarrow$ (ii): By the ergodic theorem, for each $i, j$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\, x_k = j\}}(x) = 1_{\{x : x_0 = i\}}(x)\,\pi_j \quad \text{a.e.}$$
Integrating and applying the dominated convergence theorem yields
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\mu\left(\{x \in X : x_0 = i,\ x_k = j\}\right) = \pi_i\,\pi_j.$$
Hence,
$$q_{ij} = \frac{1}{\pi_i}\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1}\mu\left(\{x \in X : x_0 = i,\ x_k = j\}\right) = \pi_j,$$
so all rows of $Q$ are identical.

(iii) $\Rightarrow$ (iv): If
$$\lim_{n\to\infty}\frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)} = q_{ij} > 0,$$
then $p_{ij}^{(k)} > 0$ for some $k$, i.e., $P$ is irreducible.

(iv) $\Rightarrow$ (iii): Since $Q = QP^n$, we have
$$q_{ij} = \sum_{m=0}^{N-1} q_{im}\, p_{mj}^{(n)} \ge q_{il}\, p_{lj}^{(n)}$$
for any $n$ and any $l$; choose $l$ with $q_{il} > 0$, which is possible since $Q$ is stochastic. Since $P$ is irreducible, there exists $n$ such that $p_{lj}^{(n)} > 0$. Hence, $q_{ij} > 0$ for all $i, j$.
(iii) $\Rightarrow$ (ii): Suppose $q_{ij} > 0$ for all $i, j = 0, 1, \ldots, N-1$. Fix any state $j$, and let $q_j = \max_{0 \le i \le N-1} q_{ij}$. Suppose that not all the $q_{ij}$ are the same. Then there exists $k \in \{0, 1, \ldots, N-1\}$ such that $q_{kj} < q_j$. Since $Q$ is stochastic and $Q^2 = Q$, for any $i \in \{0, 1, \ldots, N-1\}$ we have
$$q_{ij} = \sum_{l=0}^{N-1} q_{il}\, q_{lj} < \sum_{l=0}^{N-1} q_{il}\, q_j = q_j,$$
where the strict inequality holds because every $q_{il} > 0$ and $q_{kj} < q_j$. Taking $i$ with $q_{ij} = q_j$ gives a contradiction; hence all rows of $Q$ are identical.
44
pi0 i1 . . . pil1 il .
Thus,
n1
1X
(T k A B)
lim
n n
k=0
n1
1 X (k)
pjm i0
= j0 pj0 j1 . . . pjm1 jm pi0 i1 . . . pil1 il lim
n n
k=0
= (j0 pj0 j1 . . . pjm1 jm )(i0 pi0 i1 . . . pil1 il )
= (B)(A).
Therefore, T is ergodic.
(ii) $\Rightarrow$ (v): If all the rows of $Q$ are identical, then $q_{ij} = \pi_j$ for all $i, j$. If $vP = v$, then $vQ = v$. This implies that for all $j$, $v_j = \left(\sum_{i=0}^{N-1} v_i\right)\pi_j$. Thus, $v$ is a multiple of $\pi$. Therefore, 1 is a simple eigenvalue.

(v) $\Rightarrow$ (ii): Suppose 1 is a simple eigenvalue. For any $i$, let $q_i = \left(q_{i0}, \ldots, q_{i(N-1)}\right)$ denote the $i$th row of $Q$; then $q_i$ is a probability vector. From $Q = QP$, we get $q_i = q_i P$. By hypothesis, $\pi$ is the only probability vector satisfying $\pi P = \pi$; hence $\pi = q_i$, and all the rows of $Q$ are identical.
2.3 Mixing
Let $(X, \mathcal{F}, \mu)$ be a probability space and $T : X \to X$ measure preserving. Then $T$ is called weakly mixing if for all $A, B \in \mathcal{F}$,
$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}\left|\mu\left(T^{-i}A \cap B\right) - \mu(A)\mu(B)\right| = 0, \tag{2.3}$$
and strongly mixing if for all $A, B \in \mathcal{F}$,
$$\lim_{n\to\infty}\mu\left(T^{-n}A \cap B\right) = \mu(A)\mu(B). \tag{2.4}$$
Notice that strongly mixing implies weakly mixing, and weakly mixing implies ergodicity. This follows from the simple fact that if $\{a_n\}$ is a sequence of real numbers such that $\lim_{n\to\infty} a_n = 0$, then $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}|a_i| = 0$, and hence also $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1} a_i = 0$. Furthermore, if $\{a_n\}$ is a bounded sequence, then the following are equivalent:

(i) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}|a_i| = 0$;

(ii) $\lim_{n\to\infty}\frac{1}{n}\sum_{i=0}^{n-1}|a_i|^2 = 0$;

(iii) there exists a subset $J \subseteq \mathbb{N}$ of density zero such that $a_n \to 0$ as $n \to \infty$ with $n \notin J$.

Using this, one can give three equivalent characterizations of weakly mixing transformations; can you state them?
Exercise 2.3.1 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $T : X \to X$ a measure preserving transformation. Let $\mathcal{S}$ be a generating semi-algebra of $\mathcal{F}$.

(a) Show that if equation (2.3) holds for all $A, B \in \mathcal{S}$, then $T$ is weakly mixing.

(b) Show that if equation (2.4) holds for all $A, B \in \mathcal{S}$, then $T$ is strongly mixing.

Exercise 2.3.2 Consider the one- or two-sided Bernoulli shift $T$, as given in Example (e) in Subsection 1.3 and Example (2) in Subsection 1.8. Show that $T$ is strongly mixing.

Exercise 2.3.3 Let $(X, \mathcal{F}, \mu)$ be a probability space, and $T : X \to X$ a measure preserving transformation. Consider the transformation $T \times T$ defined on $(X \times X, \mathcal{F} \times \mathcal{F}, \mu \times \mu)$ by $(T \times T)(x, y) = (Tx, Ty)$.

(a) Show that $T \times T$ is measure preserving with respect to $\mu \times \mu$.

(b) Show that $T \times T$ is ergodic if and only if $T$ is weakly mixing.
Chapter 3
Measure Preserving Isomorphisms and Factor Maps

3.1 Measure Preserving Isomorphisms
[Commutative diagram: $\psi$ carries the orbit $x, Tx, T^2x, \ldots$ in $N \subseteq X$ to the orbit $\psi(x), S\psi(x), S^2\psi(x), \ldots$ in $M \subseteq Y$, i.e. $S \circ \psi = \psi \circ T$.]
Definition 3.1.1 Two dynamical systems $(X, \mathcal{F}, \mu, T)$ and $(Y, \mathcal{C}, \nu, S)$ are isomorphic if there exist measurable sets $N \subseteq X$ and $M \subseteq Y$ with $\mu(X \setminus N) = \nu(Y \setminus M) = 0$ and $T(N) \subseteq N$, $S(M) \subseteq M$, and finally if there exists a measurable map $\psi : N \to M$ such that (i)–(iv) are satisfied.
Exercise 3.1.1 Suppose $(X, \mathcal{F}, \mu, T)$ and $(Y, \mathcal{C}, \nu, S)$ are two isomorphic dynamical systems. Show that

(a) $T$ is ergodic if and only if $S$ is ergodic.

(b) $T$ is weakly mixing if and only if $S$ is weakly mixing.

(c) $T$ is strongly mixing if and only if $S$ is strongly mixing.

Examples

(1) Let $K = \{z \in \mathbb{C} : |z| = 1\}$ be equipped with the Borel $\sigma$-algebra $\mathcal{B}$ on $K$, and Haar measure (i.e., normalized Lebesgue measure on the unit circle).
(2) Let $T$ be given on $([0,1), \mathcal{B}, \lambda)$ by $Tx = Nx \bmod 1$, and let $S$ be the left shift on $Y = \{0, 1, \ldots, N-1\}^{\mathbb{N}}$, equipped with the cylinder $\sigma$-algebra and the uniform product measure $\mu$, which gives each cylinder on $n$ coordinates measure $\frac{1}{N^n}$. Define $\psi : [0,1) \to Y$ by
$$\psi : x = \sum_{k=1}^{\infty}\frac{a_k}{N^k} \mapsto (a_k)_{k\ge1},$$
where $\sum_{k=1}^{\infty} a_k/N^k$ is the $N$-adic expansion of $x$ (for example, if $N = 2$ we get the binary expansion, and if $N = 10$ we get the decimal expansion). Let
$$C(i_1, \ldots, i_n) = \left\{(y_i)_{i\ge1} \in Y : y_1 = i_1, \ldots, y_n = i_n\right\}.$$
In order to see that $\psi$ is an isomorphism, one needs to verify measurability and measure preservingness on cylinders:
$$\psi^{-1}\left(C(i_1, \ldots, i_n)\right) = \left[\frac{i_1}{N} + \frac{i_2}{N^2} + \cdots + \frac{i_n}{N^n},\ \frac{i_1}{N} + \frac{i_2}{N^2} + \cdots + \frac{i_n + 1}{N^n}\right)$$
and
$$\lambda\left(\psi^{-1}\left(C(i_1, \ldots, i_n)\right)\right) = \frac{1}{N^n} = \mu\left(C(i_1, \ldots, i_n)\right).$$
Note that the set
$$\mathcal{N} = \left\{(y_i)_{i\ge1} \in Y : \text{there exists a } k \ge 1 \text{ such that } y_i = N-1 \text{ for all } i \ge k\right\}$$
of sequences ending in an infinite tail of digits $N-1$ has $\mu$-measure zero.
$$T(x, y) = \begin{cases} \left(2x, \tfrac{1}{2}y\right), & 0 \le x < \tfrac{1}{2}, \\[2pt] \left(2x - 1, \tfrac{1}{2}(y+1)\right), & \tfrac{1}{2} \le x < 1. \end{cases}$$
Exercise 3.1.3 Let $G = \frac{1+\sqrt{5}}{2}$, so that $G^2 = G + 1$. Consider the set
$$X = \left[0, \tfrac{1}{G}\right) \times [0,1)\ \cup\ \left[\tfrac{1}{G}, 1\right) \times \left[0, \tfrac{1}{G}\right),$$
endowed with the product Borel $\sigma$-algebra. Define the transformation
$$T(x,y) = \begin{cases} \left(Gx, \dfrac{y}{G}\right), & (x,y) \in \left[0, \frac{1}{G}\right) \times [0,1), \\[6pt] \left(Gx - 1, \dfrac{1+y}{G}\right), & (x,y) \in \left[\frac{1}{G}, 1\right) \times \left[0, \frac{1}{G}\right). \end{cases}$$
3.2
Factor Maps
    T(x, y) = (2x, y/2)              if 0 ≤ x < 1/2,
    T(x, y) = (2x − 1, (y + 1)/2)    if 1/2 ≤ x < 1,

and let S be the left shift on X = {0, 1}^N with the σ-algebra F generated by the cylinders, and the uniform product measure µ. Define ψ : [0, 1) × [0, 1) → X by

    ψ(x, y) = (a_1, a_2, . . .),

where x = Σ_{n=1}^∞ a_n / 2^n is the binary expansion of x. It is easy to check that ψ is a factor map.
Exercise 3.2.1 Let T be the left shift on X = {0, 1, 2}^N which is endowed with the σ-algebra F, generated by the cylinder sets, and the uniform product measure µ giving each symbol probability 1/3, i.e.,

    µ({x ∈ X : x_1 = i_1, x_2 = i_2, . . . , x_n = i_n}) = 1/3^n,
3.3
Natural Extensions
⋁_{k=0}^∞ T^k ψ^{-1}G

Example Let T be the left shift on ({0, 1}^Z, F, µ), . . . ,

    ⋁_{k=0}^∞ T^k ψ^{-1}G = F.

To prove this, we show that ⋁_{k=0}^∞ T^k ψ^{-1}G contains all cylinders generating F. Let Δ = {x ∈ X : x_{-k} = a_{-k}, . . . , x_ℓ = a_ℓ} be an arbitrary cylinder in F, and let D = {y ∈ Y : y_0 = a_{-k}, . . . , y_{k+ℓ} = a_ℓ}, which is a cylinder in G. Then,

    ψ^{-1}D = {x ∈ X : x_0 = a_{-k}, . . . , x_{k+ℓ} = a_ℓ}   and   T^k ψ^{-1}D = Δ.

This shows that ⋁_{k=0}^∞ T^k ψ^{-1}G = F.
Chapter 4
Entropy
4.1
Randomness and Information

the symbol a_i, and T the left shift. We define the entropy of this system by

    H(p_1, . . . , p_n) = h(T) := − Σ_{i=1}^n p_i log₂ p_i.    (4.1)
    (1/n) H(p_1, . . . , p_n) = (1/n) Σ_{i=1}^n f(p_i) ≤ f( (1/n) Σ_{i=1}^n p_i ) = f(1/n) = (1/n) log₂ n,

so H(p_1, . . . , p_n) ≤ log₂ n for all probability vectors (p_1, . . . , p_n). But

    H(1/n, . . . , 1/n) = log₂ n,

so the maximum value is attained at (1/n, . . . , 1/n).
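The bound is easy to confirm numerically. A small sketch (illustrative only; `H` is a hypothetical helper name): compute H in base 2 for random probability vectors and for the uniform vector.

```python
import math, random

def H(p):
    """Base-2 entropy of a probability vector, with 0 log 0 := 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

n = 5
uniform = [1.0 / n] * n
assert abs(H(uniform) - math.log2(n)) < 1e-12   # maximum attained at uniform

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    s = sum(w)
    p = [wi / s for wi in w]                    # random probability vector
    assert H(p) <= math.log2(n) + 1e-12         # Jensen bound H(p) <= log2 n
print("H(1/n,...,1/n) =", H(uniform))
```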
4.2
Definitions and Properties
So far H is defined as the average information per symbol. The above definition can be extended to define the information transmitted by the occurrence
of an event E as − log₂ P(E). This definition has the property that the information transmitted by E ∩ F for independent events E and F is the sum of the information transmitted by each one individually, i.e.,

    − log₂ P(E ∩ F) = − log₂ P(E) − log₂ P(F).
57
The only function with this property is the logarithm function to any base.
We choose base 2 because information is usually measured in bits.
In the above example of the ticker-tape the symbols were transmitted
independently. In general, the symbol generated might depend on what has
been received before. In fact these dependencies are often built-in to be
able to check the transmitted sequence of symbols on errors (think here of
the Morse sequence, sequences on compact discs etc.). Such dependencies
must be taken into consideration in the calculation of the average information
per symbol. This can be achieved if one replaces the symbols ai by blocks of
symbols of particular size. More precisely, for every n, let Cn be the collection
of all possible n-blocks (or cylinder sets) of length n, and define
    H_n := − Σ_{C∈C_n} P(C) log₂ P(C).

Then (1/n)H_n can be seen as the average information per symbol when a block of length n is transmitted. The entropy of the source is now defined by

    h := lim_{n→∞} H_n / n.    (4.2)

The existence of the limit in (4.2) follows from the fact that {H_n} is a subadditive sequence, i.e., H_{n+m} ≤ H_n + H_m, and Proposition (4.2.2) (see Proposition (4.2.3) below).
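The limit (4.2) can be watched numerically for a source with built-in dependencies. The sketch below is an illustration only (the transition matrix and stationary vector are made up): it computes H_n exactly for a two-state Markov source and shows H_n/n decreasing to the per-symbol entropy.

```python
import math
from itertools import product

P = [[0.9, 0.1], [0.4, 0.6]]        # transition matrix of the source
pi = [0.8, 0.2]                     # stationary: pi P = pi

def block_entropy(n):
    """H_n = -sum over n-blocks C of P(C) log2 P(C)."""
    h = 0.0
    for block in product(range(2), repeat=n):
        p = pi[block[0]]
        for a, b in zip(block, block[1:]):
            p *= P[a][b]
        h -= p * math.log2(p)
    return h

h_markov = -sum(pi[i] * P[i][j] * math.log2(P[i][j])
                for i in range(2) for j in range(2))
ratios = [block_entropy(n) / n for n in (1, 4, 8, 12)]
assert all(r2 <= r1 + 1e-12 for r1, r2 in zip(ratios, ratios[1:]))
assert abs(ratios[-1] - h_markov) < 0.05      # H_n/n approaches the entropy
print(ratios, h_markov)
```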
Now replace the source by a measure preserving system (X, B, , T ).
How can one define the entropy of this system similar to the case of a
source? The symbols {a1 , a2 , . . . , an } can now be viewed as a partition
A = {A1 , A2 , . . . , An } of X, so that X is the disjoint union (up to sets
of measure zero) of A1 , A2 , . . . , An . The source can be seen as follows: with
each point x ∈ X, we associate an infinite sequence . . . , x_{-1}, x_0, x_1, . . . , where x_i is a_j if and only if T^i x ∈ A_j. We define the entropy of the partition α by

    H(α) = H_µ(α) := − Σ_{i=1}^n µ(A_i) log₂ µ(A_i).
Exercise 4.2.1 Let α = {A_1, . . . , A_n} and β = {B_1, . . . , B_m} be two partitions of X. Show that

    T^{-1}α := {T^{-1}A_1, . . . , T^{-1}A_n}

and

    α ∨ β := {A_i ∩ B_j : A_i ∈ α, B_j ∈ β}

are both partitions of X.
The members of a partition are called the atoms of the partition. We say that the partition β = {B_1, . . . , B_m} is a refinement of the partition α = {A_1, . . . , A_n}, and write α ≤ β, if for every 1 ≤ j ≤ m there exists 1 ≤ i ≤ n such that B_j ⊆ A_i (up to sets of measure zero). The partition α ∨ β is called the common refinement of α and β.

Exercise 4.2.2 Show that if β is a refinement of α, each atom of α is a finite (disjoint) union of atoms of β.
Given two partitions α = {A_1, . . . , A_n} and β = {B_1, . . . , B_m} of X, we define the conditional entropy of α given β by

    H(α|β) := − Σ_{A∈α} Σ_{B∈β} µ(A ∩ B) log ( µ(A ∩ B) / µ(B) ).

(Under the convention that 0 log 0 := 0.)
The above quantity H(α|β) is interpreted as the average uncertainty about which element of the partition α the point x will enter (under T) if we already know which element of β the point x will enter.
Proposition 4.2.1 Let α, β and γ be partitions of X. Then,

(a) H(T^{-1}α) = H(α);
(b) H(α ∨ β) = H(α) + H(β|α);
(c) H(β|α) ≤ H(β);
(d) H(α ∨ β) ≤ H(α) + H(β);
(e) If α ≤ β, then H(α) ≤ H(β);
(f) H(α ∨ β | γ) = H(α|γ) + H(β | α ∨ γ);
(g) If β ≤ γ, then H(α|γ) ≤ H(α|β);
(h) If α ≤ β, then H(α|β) = 0.
(i) We call two partitions α and β independent if

    µ(A ∩ B) = µ(A)µ(B) for all A ∈ α, B ∈ β.

If α and β are independent partitions, one has that

    H(α ∨ β) = H(α) + H(β).
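The identities above are easy to sanity-check numerically by encoding two finite partitions through their joint distribution µ(A_i ∩ B_j). A sketch (the particular joint matrix is made up for illustration):

```python
import math

def H(ps):
    return -sum(p * math.log2(p) for p in ps if p > 0)

# joint[i][j] = mu(A_i n B_j) for partitions alpha = {A_i}, beta = {B_j}
joint = [[0.20, 0.05, 0.10],
         [0.15, 0.30, 0.20]]
a = [sum(row) for row in joint]                       # mu(A_i)
b = [sum(col) for col in zip(*joint)]                 # mu(B_j)

H_ab = H([p for row in joint for p in row])           # H(alpha v beta)
H_b_given_a = -sum(p * math.log2(p / a[i])
                   for i, row in enumerate(joint) for p in row if p > 0)

assert abs(H_ab - (H(a) + H_b_given_a)) < 1e-12       # property (b)
assert H_b_given_a <= H(b) + 1e-12                    # property (c)
assert H_ab <= H(a) + H(b) + 1e-12                    # property (d)

# (i): for an independent joint distribution equality holds in (d)
indep = [[ai * bj for bj in b] for ai in a]
H_indep = H([p for row in indep for p in row])
assert abs(H_indep - (H(a) + H(b))) < 1e-12
print(H_ab, H(a) + H(b))
```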
Proof We prove properties (b) and (c); the rest are left as an exercise.

    H(α ∨ β) = − Σ_{A∈α} Σ_{B∈β} µ(A ∩ B) log µ(A ∩ B)
             = − Σ_{A∈α} Σ_{B∈β} µ(A ∩ B) log ( µ(A ∩ B) / µ(A) ) − Σ_{A∈α} Σ_{B∈β} µ(A ∩ B) log µ(A)
             = H(β|α) + H(α).

We now show that H(β|α) ≤ H(β). Recall that the function f(t) = −t log t for 0 < t ≤ 1 is concave down. Thus,

    H(β|α) = − Σ_{B∈β} Σ_{A∈α} µ(A ∩ B) log ( µ(A ∩ B) / µ(A) )
           = − Σ_{B∈β} Σ_{A∈α} µ(A) · ( µ(A ∩ B)/µ(A) ) log ( µ(A ∩ B)/µ(A) )
           = Σ_{B∈β} Σ_{A∈α} µ(A) f( µ(A ∩ B)/µ(A) )
           ≤ Σ_{B∈β} f( Σ_{A∈α} µ(A) · µ(A ∩ B)/µ(A) )
           = Σ_{B∈β} f( µ(B) ) = H(β).
    H( ⋁_{i=0}^{n-1} T^{-i}α ) = H(α) + Σ_{j=1}^{n-1} H( α | ⋁_{i=1}^{j} T^{-i}α ).
exists.
Proof Fix any m > 0. For any n ≥ 0 one has n = km + i for some i with 0 ≤ i ≤ m − 1. By subadditivity it follows that

    a_n / n = a_{km+i} / (km + i) ≤ a_{km}/(km) + a_i/(km) ≤ k a_m/(km) + a_i/(km) = a_m/m + a_i/(km).

Note that if n → ∞, then k → ∞, and so lim sup_{n→∞} a_n/n ≤ a_m/m. Since m is arbitrary one has

    lim sup_{n→∞} a_n/n ≤ inf_m a_m/m ≤ lim inf_{n→∞} a_n/n.
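The subadditivity lemma can be illustrated numerically. The sketch below (an illustrative choice of sequence, not from the notes) uses a_n = log₂ of the number of 0-1 strings of length n with no two consecutive 1s; splitting a string into two halves shows a_{n+m} ≤ a_n + a_m, and a_n/n converges to log₂ of the golden mean:

```python
import math

def admissible(n):
    """Number of 0-1 strings of length n with no two consecutive 1s."""
    a, b = 1, 1        # counts of length-1 strings ending in 1 / ending in 0
    for _ in range(n - 1):
        a, b = b, a + b
    return a + b

a = lambda n: math.log2(admissible(n))   # subadditive: a(n+m) <= a(n) + a(m)
for n in range(1, 30):
    for m in range(1, 30):
        assert a(n + m) <= a(n) + a(m) + 1e-9

phi = (1 + math.sqrt(5)) / 2
inf_ratio = min(a(m) / m for m in range(1, 400))
assert abs(a(399) / 399 - math.log2(phi)) < 0.01   # a_n/n -> log2(golden mean)
assert a(399) / 399 >= inf_ratio - 1e-12           # the limit is inf a_m/m
print(a(399) / 399, math.log2(phi))
```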
Proof Let a_n = H( ⋁_{i=0}^{n-1} T^{-i}α ) ≥ 0. Then, by Proposition 4.2.1, we have

    a_{n+p} = H( ⋁_{i=0}^{n+p-1} T^{-i}α )
            ≤ H( ⋁_{i=0}^{n-1} T^{-i}α ) + H( ⋁_{i=n}^{n+p-1} T^{-i}α )
            = a_n + H( ⋁_{i=0}^{p-1} T^{-i}α )
            = a_n + a_p.

Hence, by Proposition 4.2.2,

    lim_{n→∞} a_n/n = lim_{n→∞} (1/n) H( ⋁_{i=0}^{n-1} T^{-i}α )

exists.
We are now in a position to give the definition of the entropy of the transformation T.

Definition 4.2.1 The entropy of the measure preserving transformation T with respect to the partition α is given by

    h(α, T) = h_µ(α, T) := lim_{n→∞} (1/n) H( ⋁_{i=0}^{n-1} T^{-i}α ),

where

    H( ⋁_{i=0}^{n-1} T^{-i}α ) = − Σ_{D ∈ ⋁_{i=0}^{n-1} T^{-i}α} µ(D) log µ(D).
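As a concrete sanity check of this definition (an illustration, not part of the notes): for the doubling map Tx = 2x (mod 1) with Lebesgue measure and α = {[0, 1/2), [1/2, 1)}, the refinement ⋁_{i=0}^{n-1} T^{-i}α consists of the 2^n dyadic intervals, so (1/n)H(⋁_{i=0}^{n-1} T^{-i}α) = log₂ 2 = 1 bit for every n. The sketch below estimates this from itineraries (`entropy_of_partition_refinement` is a made-up helper name):

```python
import math

def entropy_of_partition_refinement(T, symbol, n, grid=2**16):
    """Estimate (1/n) H( alpha v T^-1 alpha v ... v T^-(n-1) alpha )
    for Lebesgue measure on [0,1), using a uniform grid of sample points."""
    counts = {}
    for k in range(grid):
        x = (k + 0.5) / grid
        itinerary = []
        for _ in range(n):
            itinerary.append(symbol(x))
            x = T(x)
        key = tuple(itinerary)
        counts[key] = counts.get(key, 0) + 1
    H = -sum((c / grid) * math.log2(c / grid) for c in counts.values())
    return H / n

doubling = lambda x: (2 * x) % 1.0
symbol = lambda x: 0 if x < 0.5 else 1        # alpha = {[0,1/2), [1/2,1)}
h = entropy_of_partition_refinement(doubling, symbol, n=10)
assert abs(h - 1.0) < 1e-6    # h(alpha, T) = log2 2 = 1 bit per iterate
print(h)
```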
    lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{-i}α ).

Proof Notice that the sequence { H( α | ⋁_{i=1}^{n} T^{-i}α ) } is bounded from below and is non-increasing, hence lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{-i}α ) exists. Furthermore,

    lim_{n→∞} H( α | ⋁_{i=1}^{n} T^{-i}α ) = lim_{n→∞} (1/n) Σ_{j=1}^{n} H( α | ⋁_{i=1}^{j} T^{-i}α ).

Recall that

    H( ⋁_{i=0}^{n-1} T^{-i}α ) = H(α) + Σ_{j=1}^{n-1} H( α | ⋁_{i=1}^{j} T^{-i}α ).

Now, dividing by n, and taking the limit as n → ∞, one gets the desired result.
Theorem 4.2.2 Entropy is an isomorphism invariant.
Proof Let (X, B, µ, T) and (Y, C, ν, S) be two isomorphic measure preserving systems (see Definition 1.2.3 for a definition), with ψ : X → Y the corresponding isomorphism. We need to show that h_µ(T) = h_ν(S).

Let β = {B_1, . . . , B_n} be any partition of Y; then ψ^{-1}β = {ψ^{-1}B_1, . . . , ψ^{-1}B_n} is a partition of X. Set A_i = ψ^{-1}B_i, for 1 ≤ i ≤ n. Since ψ : X → Y is an isomorphism, we have that ν = µ ∘ ψ^{-1} and ψ ∘ T = S ∘ ψ, so that for any n ≥ 0 and B_{i_0}, . . . , B_{i_{n-1}} ∈ β,

    ψ^{-1}( B_{i_0} ∩ S^{-1}B_{i_1} ∩ . . . ∩ S^{-(n-1)}B_{i_{n-1}} )
    = ψ^{-1}B_{i_0} ∩ ψ^{-1}S^{-1}B_{i_1} ∩ . . . ∩ ψ^{-1}S^{-(n-1)}B_{i_{n-1}}
    = ψ^{-1}B_{i_0} ∩ T^{-1}ψ^{-1}B_{i_1} ∩ . . . ∩ T^{-(n-1)}ψ^{-1}B_{i_{n-1}}
    = A_{i_0} ∩ T^{-1}A_{i_1} ∩ . . . ∩ T^{-(n-1)}A_{i_{n-1}}.

Setting

    A(n) = A_{i_0} ∩ . . . ∩ T^{-(n-1)}A_{i_{n-1}}   and   B(n) = B_{i_0} ∩ . . . ∩ S^{-(n-1)}B_{i_{n-1}},
we get

    h_ν(β, S) = lim_{n→∞} −(1/n) Σ_{B(n) ∈ ⋁_{i=0}^{n-1} S^{-i}β} ν(B(n)) log ν(B(n))
              = lim_{n→∞} −(1/n) Σ_{A(n) ∈ ⋁_{i=0}^{n-1} T^{-i}ψ^{-1}β} µ(A(n)) log µ(A(n))
              = h_µ(ψ^{-1}β, T).

Hence

    h_ν(S) = sup_β h_ν(β, S) = sup_β h_µ(ψ^{-1}β, T) ≤ sup_α h_µ(α, T) = h_µ(T),

where in the last inequality the supremum is taken over all possible finite partitions α of X. Thus h_ν(S) ≤ h_µ(T). The proof of h_µ(T) ≤ h_ν(S) is done similarly. Therefore h_ν(S) = h_µ(T), and the proof is complete.
4.3
Calculation of Entropy and Examples

Let σ( ⋁_{i=n}^{m} T^{-i}α ) be the smallest σ-algebra containing the partition ⋁_{i=n}^{m} T^{-i}α. Furthermore, let σ( ⋁_{i=-∞}^{∞} T^{-i}α ) be the smallest σ-algebra containing all the partitions ⋁_{i=n}^{m} T^{-i}α for all n and m. We call a partition α a generator with respect to T if σ( ⋁_{i=-∞}^{∞} T^{-i}α ) = F,
In other words, ⋁_{i=0}^{m-1} T^{-i}α is precisely the collection of cylinders of length m (i.e., the collection of all m-blocks), and these by definition generate F. Hence, α is a generating partition, so that

    h(T) = h(α, T) = lim_{m→∞} (1/m) H( ⋁_{i=0}^{m-1} T^{-i}α ).

Thus,

    h(T) = lim_{m→∞} −(1/m) Σ_i p_i^{(m)} log p_i^{(m)} = − Σ_{i=1}^{n} p_i log p_i,

where the p_i^{(m)} denote the measures of the m-blocks.
Exercise 4.3.1 Let T be the left shift on X = {1, 2, . . . , n}^Z endowed with the σ-algebra F generated by the cylinder sets, and the Markov measure given by the stochastic matrix P = (p_{ij}) and the probability vector π = (π_1, . . . , π_n) with πP = π. Show that

    h(T) = − Σ_{j=1}^{n} Σ_{i=1}^{n} π_i p_{ij} log p_{ij}.
Exercise 4.3.2 Suppose (X_1, B_1, µ_1, T_1) and (X_2, B_2, µ_2, T_2) are two dynamical systems. Show that

    h_{µ_1 × µ_2}(T_1 × T_2) = h_{µ_1}(T_1) + h_{µ_2}(T_2).
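Exercise 4.3.2 can be sanity-checked for two Bernoulli shifts: their product is again a Bernoulli shift, on the product alphabet with probabilities p_i q_j, and the additivity of entropy reduces to an identity about the entropy function (illustrative probability vectors):

```python
import math

def H(p):
    return -sum(x * math.log2(x) for x in p if x > 0)

p = [0.5, 0.25, 0.25]          # Bernoulli scheme for T1
q = [0.7, 0.3]                 # Bernoulli scheme for T2
# Product of the two Bernoulli shifts = Bernoulli shift on the product
# alphabet, with probabilities p_i * q_j.
prod = [pi * qj for pi in p for qj in q]
assert abs(H(prod) - (H(p) + H(q))) < 1e-12
print(H(prod), H(p) + H(q))
```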
4.4
The Shannon-McMillan-Breiman Theorem

In the previous sections we have considered only finite partitions of X; however, all the definitions and results hold if we were to consider countable partitions of finite entropy. Before we state and prove the Shannon-McMillan-Breiman Theorem, we need to introduce the information function associated with a partition.
Let (X, F, µ) be a probability space, and α = {A_1, A_2, . . .} be a finite or a countable partition of X into measurable sets. For each x ∈ X, let α(x) be the element of α to which x belongs. Then, the information function associated to α is defined to be

    I_α(x) = − log µ(α(x)) = − Σ_{A∈α} 1_A(x) log µ(A).

For two finite or countable partitions α and β of X, we define the conditional information function of α given β by

    I_{α|β}(x) = − Σ_{B∈β} Σ_{A∈α} 1_{A∩B}(x) log ( µ(A ∩ B) / µ(B) ).    (4.3)

We claim that

    I_{α|β}(x) = − log E_µ( 1_{α(x)} | σ(β) )(x)   and   H(α|β) = ∫_X I_{α|β}(x) dµ(x).
Furthermore,

    lim_{n→∞} (1/(n+1)) H( ⋁_{i=0}^{n} T^{-i}α ) = lim_{n→∞} (1/(n+1)) ∫_X I_{⋁_{i=0}^{n} T^{-i}α}(x) dµ(x) = h(α, T).

The Shannon-McMillan-Breiman theorem says that if T is ergodic and if α has finite entropy, then in fact the integrand (1/(n+1)) I_{⋁_{i=0}^{n} T^{-i}α}(x) converges a.e. to h(α, T). Notice that the integrand can be written as

    (1/(n+1)) I_{⋁_{i=0}^{n} T^{-i}α}(x) = −(1/(n+1)) log µ( ( ⋁_{i=0}^{n} T^{-i}α )(x) ),

where ( ⋁_{i=0}^{n} T^{-i}α )(x) is the element of ⋁_{i=0}^{n} T^{-i}α containing x (often referred to as the α-cylinder of order n containing x). Before we proceed we need the following proposition.
Proposition 4.4.1 Let α = {A_1, A_2, . . .} be a countable partition with finite entropy. For each n = 1, 2, 3, . . ., let f_n(x) = I_{α | ⋁_{i=1}^{n} T^{-i}α}(x), and let f* = sup_{n≥1} f_n. Then, for each t ≥ 0 and for each A ∈ α,

    µ({x ∈ A : f*(x) > t}) ≤ 2^{-t}.

Furthermore, f* ∈ L¹(X, F, µ).

Proof Let t ≥ 0 and A ∈ α. For n ≥ 1, let

    f_n^A(x) = − log E_µ( 1_A | ⋁_{i=1}^{n} T^{-i}α )(x),

and

    B_n = { x ∈ X : f_1^A(x) ≤ t, . . . , f_{n-1}^A(x) ≤ t, f_n^A(x) > t }.
Notice that for x ∈ A one has f_n(x) = f_n^A(x), and for x ∈ B_n one has E_µ( 1_A | ⋁_{i=1}^{n} T^{-i}α )(x) < 2^{-t}. Since B_n ∈ σ( ⋁_{i=1}^{n} T^{-i}α ), then

    µ(B_n ∩ A) = ∫_{B_n} 1_A(x) dµ(x)
               = ∫_{B_n} E_µ( 1_A | ⋁_{i=1}^{n} T^{-i}α )(x) dµ(x)
               ≤ ∫_{B_n} 2^{-t} dµ(x) = 2^{-t} µ(B_n).

Thus,

    µ({x ∈ A : f*(x) > t}) = µ({x ∈ A : f_n(x) > t, for some n})
                           = µ({x ∈ A : f_n^A(x) > t, for some n})
                           = µ( ∪_{n=1}^{∞} (A ∩ B_n) )
                           = Σ_{n=1}^{∞} µ(A ∩ B_n)
                           ≤ 2^{-t} Σ_{n=1}^{∞} µ(B_n) ≤ 2^{-t}.

Hence,

    µ({x ∈ A : f*(x) > t}) ≤ min( µ(A), 2^{-t} ).
Therefore,

    ∫_X f* dµ = Σ_{A∈α} ∫_A f* dµ
              = Σ_{A∈α} ∫_0^∞ µ({x ∈ A : f*(x) > t}) dt
              ≤ Σ_{A∈α} ∫_0^∞ min( µ(A), 2^{-t} ) dt
              = Σ_{A∈α} ∫_0^{−log₂ µ(A)} µ(A) dt + Σ_{A∈α} ∫_{−log₂ µ(A)}^{∞} 2^{-t} dt
              = H_µ(α) + Σ_{A∈α} µ(A) / log_e 2
              = H_µ(α) + 1/log_e 2 < ∞.    (4.4)
Exercise 4.4.2 Give a proof of equation (4.4) using the following important theorem, known as the Martingale Convergence Theorem (stated here in our setting).

Theorem 4.4.1 (Martingale Convergence Theorem) Let C_1 ⊆ C_2 ⊆ · · · be an increasing sequence of σ-algebras, and let C = σ( ∪_n C_n ). If f ∈ L¹(µ), then

    E_µ(f | C) = lim_{n→∞} E_µ(f | C_n)   a.e.
    (1/(n+1)) I_{⋁_{i=0}^{n} T^{-i}α}(x) = (1/(n+1)) Σ_{k=0}^{n} f_{n-k}(T^k x)
    = (1/(n+1)) Σ_{k=0}^{n} f(T^k x) + (1/(n+1)) Σ_{k=0}^{n} (f_{n-k} − f)(T^k x).

By the ergodic theorem,

    lim_{n→∞} (1/(n+1)) Σ_{k=0}^{n} f(T^k x) = ∫_X f dµ   a.e.
It remains to control (1/(n+1)) Σ_{k=0}^{n} (f_{n-k} − f)(T^k x). Let F_N = sup_{k≥N} |f_k − f|. Then for n ≥ N,

    (1/(n+1)) Σ_{k=0}^{n} |f_{n-k} − f|(T^k x) ≤ (1/(n+1)) Σ_{k=0}^{n-N} F_N(T^k x) + (1/(n+1)) Σ_{k=0}^{N-1} |f_k − f|(T^{n-k} x).

If we take the limit as n → ∞, then by exercise (4.4.3) the second term tends to 0 a.e., and by the ergodic theorem as well as the dominated convergence theorem, the first term tends to zero a.e. Hence,

    lim_{n→∞} (1/(n+1)) I_{⋁_{i=0}^{n} T^{-i}α}(x) = h(α, T)   a.e.

The above theorem can be interpreted as providing an estimate of the size of the atoms of ⋁_{i=0}^{n} T^{-i}α. For n sufficiently large, a typical element A ∈ ⋁_{i=0}^{n} T^{-i}α satisfies

    µ(A) ≈ 2^{-(n+1) h(α,T)}.
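This estimate can be watched directly for a Bernoulli shift, where −(1/n) log₂ µ(cylinder of order n around x) is an average of i.i.d. terms and converges a.e. to the entropy (illustrative sketch; the parameter p and helper name are made up):

```python
import math, random

random.seed(1)
p = 0.3                              # Bernoulli(p) measure on {0,1}^N
h = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))   # entropy of the shift

def smb_estimate(n):
    """-(1/n) log2 mu(cylinder of order n around a mu-typical point)."""
    x = [1 if random.random() < p else 0 for _ in range(n)]
    log_mu = sum(math.log2(p) if xi else math.log2(1 - p) for xi in x)
    return -log_mu / n

est = smb_estimate(200000)
assert abs(est - h) < 0.01          # a.e. convergence to h(alpha, T)
print(est, h)
```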
4.5
Lochs' Theorem
In 1964, G. Lochs compared the decimal and the continued fraction expansions. Let x ∈ [0, 1) be an irrational number, and suppose x = .d_1 d_2 · · · is the decimal expansion of x (which is generated by iterating the map Sx = 10x (mod 1)). Suppose further that

    x = 1/(a_1 + 1/(a_2 + 1/(a_3 + · · ·))) = [0; a_1, a_2, · · ·]    (4.5)
    z = 1/(c_1 + 1/(c_2 + · · · + 1/c_k))    (4.6)
In other words, if B_n(x) denotes the decimal cylinder consisting of all points y in [0, 1) such that the first n decimal digits of y agree with those of x, and if C_j(x) denotes the continued fraction cylinder of order j containing x, i.e., C_j(x) is the set of all points in [0, 1) whose first j continued fraction digits are the same as those of x, then m(n, x) is the largest integer such that B_n(x) ⊆ C_{m(n,x)}(x). Lochs proved the following theorem:
Theorem 4.5.1 Let λ denote Lebesgue measure on [0, 1). Then for a.e. x ∈ [0, 1)

    lim_{n→∞} m(n, x)/n = 6 log 2 log 10 / π².
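Lochs' theorem can be tested empirically with exact rational arithmetic. The sketch below is illustrative only: a random 300-digit rational stands in for a Lebesgue-typical point, all helper names are made up, and m(n, x) is computed by comparing the decimal cylinder with continued fraction cylinders built from convergents.

```python
import random
from fractions import Fraction
from math import log, pi

random.seed(12345)
# A random rational with 300 digits stands in for a Lebesgue-typical x.
x = Fraction(random.randrange(10**300), 10**300)

def cf_cylinder(y, m):
    """Endpoints of the continued fraction cylinder C_m(y)."""
    p0, q0, p1, q1 = 1, 0, 0, 1          # convergents p_{-1}/q_{-1}, p_0/q_0
    for _ in range(m):
        y = 1 / y
        a = int(y)                        # next CF digit
        y -= a
        p0, q0, p1, q1 = p1, q1, a * p1 + p0, a * q1 + q0
    lo, hi = Fraction(p1, q1), Fraction(p1 + p0, q1 + q0)
    return min(lo, hi), max(lo, hi)

def m_of(n):
    """m(n, x): largest m with the decimal cylinder B_n(x) inside C_m(x)."""
    d = int(x * 10**n)
    b_lo, b_hi = Fraction(d, 10**n), Fraction(d + 1, 10**n)
    m = 0
    while True:
        lo, hi = cf_cylinder(x, m + 1)
        if lo <= b_lo and b_hi <= hi:
            m += 1
        else:
            return m

lochs = 6 * log(2) * log(10) / pi**2      # the Lochs constant, about 0.97
ratio = m_of(100) / 100
assert abs(ratio - lochs) < 0.3           # loose band; convergence is slow
print(ratio, lochs)
```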
In this section, we will prove a generalization of Lochs' theorem that allows one to compare any two known expansions of numbers. We show that Lochs' theorem is true for any two sequences of interval partitions of [0, 1) satisfying the conclusion of the Shannon-McMillan-Breiman theorem. The content of this section as well as the proofs can be found in [DF]. We begin with a few definitions that will be used in the arguments to follow.
Definition 4.5.1 By an interval partition we mean a finite or countable
partition of [0, 1) into subintervals. If P is an interval partition and x [0, 1),
we let P (x) denote the interval of P containing x.
Let P = {P_n}_{n=1}^{∞} be a sequence of interval partitions. Let λ denote Lebesgue probability measure on [0, 1).
Definition 4.5.2 Let c ≥ 0. We say that P has entropy c a.e. with respect to λ if

    lim_{n→∞} − log λ(P_n(x)) / n = c   a.e.
Note that we do not assume that each Pn is refined by Pn+1 .
    lim_{n→∞} m_{P,Q}(n, x)/n = c/d   a.e.
    m_{P,Q}(n, x)/n ≤ (c/d)(1 + ε)   a.e.

Since ε > 0 was arbitrary, we have the desired result:

    lim sup_{n→∞} m_{P,Q}(n, x)/n ≤ c/d   a.e.
Now we show that

    lim inf_{n→∞} m_{P,Q}(n, x)/n ≥ (1 − ε) c/d   a.e.

Fix ε ∈ (0, 1). Choose δ > 0 so that η := c − (1 + δ)(1 − ε)c > 0. For each n ∈ N let m̃(n) = ⌊(1 − ε)(c/d) n⌋. For brevity, for each n ∈ N we call an element of P_n (respectively Q_n) (n, δ)-good if
Let

    D_n(ε) = { x : P_n(x) and Q_{m̃(n)}(x) are (n, δ)-good, and P_n(x) ⊄ Q_{m̃(n)}(x) }.

For x ∈ D_n(ε),

    λ(P_n(x)) / λ(Q_{m̃(n)}(x)) < 2^{-ηn},

and hence Σ_{n=1}^{∞} λ(D_n(ε)) < ∞.
Thus, for almost every x ∈ [0, 1), there exists N ∈ N so that for all n ≥ N, m_{P,Q}(n, x) ≥ m̃(n), so that

    m_{P,Q}(n, x)/n ≥ ⌊(1 − ε)(c/d) n⌋ / n.

This proves that

    lim inf_{n→∞} m_{P,Q}(n, x)/n ≥ (1 − ε) c/d   a.e.
Exercise 4.5.2 Prove Theorem (4.5.1) using Theorem (4.5.2). Use the fact that the continued fraction map T is ergodic with respect to the Gauss measure µ, given by

    µ(B) = ∫_B (1/log 2) · 1/(1 + x) dx,

and has entropy equal to h_µ(T) = π² / (6 log 2).
Exercise 4.5.3 Reformulate and prove Lochs' Theorem for any two NTFM
maps S and T on [0, 1).
Chapter 5
Hurewicz Ergodic Theorem
In this chapter we consider a class of non-measure preserving transformations.
In particular, we study invertible, non-singular and conservative transformations on a probability space. We first start with a quick review of equivalent
measures, we then define non-singular and conservative transformations, and
state some of their properties. We end this chapter by giving a new proof of the Hurewicz Ergodic Theorem, which is a generalization of the Birkhoff Ergodic Theorem to non-singular conservative transformations.
5.1
Equivalent measures
Recall that two measures µ and ν on a measure space (Y, F) are equivalent if µ and ν have the same null sets, i.e.,

    µ(A) = 0 ⟺ ν(A) = 0,   A ∈ F.

The theorem of Radon-Nikodym says that if µ, ν are σ-finite and equivalent, then there exist measurable functions f, g ≥ 0 such that

    µ(A) = ∫_A f dν   and   ν(A) = ∫_A g dµ.
Usually the function f is denoted by dµ/dν, and the function g by dν/dµ.

Now suppose that (X, B, µ) is a probability space, and T : X → X a measurable transformation. One can define a new measure µ ∘ T^{-1} on (X, B) by µ ∘ T^{-1}(A) = µ(T^{-1}A) for A ∈ B. It is not hard to prove that for f ∈ L¹(µ),

    ∫ f d(µ ∘ T^{-1}) = ∫ f ∘ T dµ.    (5.1)
5.2
Non-singular and conservative transformations

(5.2)

Proof We show the result for indicator functions only; the rest of the proof is left to the reader.

    ∫_X 1_A(x) dµ(x) = µ(A) = µ(T(T^{-1}A)) = ∫_{T^{-1}A} ω_1(x) dµ(x) = ∫_X 1_A(Tx) ω_1(x) dµ(x).

Proposition 5.2.2 Under the assumptions of Proposition 5.2.1, one has for all n, m ≥ 1, that

    ω_{n+m}(x) = ω_n(x) · ω_m(T^n x),   a.e.
    µ(T^n B) = 0 for all n ≥ 1.

Therefore, for almost every x ∈ A there exist infinitely many n ≥ 1 such that T^n x ∈ A. Replacing T by T^{-1} yields the result for n ≤ −1.

Proposition 5.2.4 Suppose T is invertible, non-singular and conservative. Then

    Σ_{n=1}^{∞} ω_n(x) = ∞,   a.e.

Proof Let A = { x ∈ X : Σ_{n=1}^{∞} ω_n(x) < ∞ }. Then

    A = ∪_{M=1}^{∞} { x ∈ X : Σ_{n=1}^{∞} ω_n(x) < M }.
Suppose some B = { x ∈ X : Σ_{n=1}^{∞} ω_n(x) < M } has positive measure. Then ∫_B Σ_{n=1}^{∞} ω_n(x) dµ(x) ≤ M µ(B) < ∞. However,

    ∫_B Σ_{n=1}^{∞} ω_n(x) dµ(x) = Σ_{n=1}^{∞} ∫_B ω_n(x) dµ(x)
                                 = Σ_{n=1}^{∞} µ(T^n B)
                                 = Σ_{n=1}^{∞} ∫_X 1_{T^n B}(x) dµ(x)
                                 = ∫_X Σ_{n=1}^{∞} 1_B(T^{-n}x) dµ(x).

Hence,

    Σ_{n=1}^{∞} 1_B(T^{-n}x) < ∞   a.e.,

which contradicts the conservativity of T^{-1} on the set B of positive measure. Therefore,

    Σ_{n=1}^{∞} ω_n(x) = ∞,   a.e.
5.3
Hurewicz Ergodic Theorem

    lim_{n→∞} Σ_{i=0}^{n-1} f(T^i x) ω_i(x) / Σ_{i=0}^{n-1} ω_i(x) = f*(x)

Write f_n(x) = Σ_{i=0}^{n-1} f(T^i x) ω_i(x) and g_n(x) = Σ_{i=0}^{n-1} ω_i(x), and set

    f̄(x) = lim sup_{n→∞} f_n(x)/g_n(x)   and   f̲(x) = lim inf_{n→∞} f_n(x)/g_n(x).
By Proposition (5.2.2), one has g_{n+m}(x) = g_n(x) + ω_n(x) g_m(T^n x). Using Exercise (5.2.1) and Proposition (5.2.4), we will show that f̄ and f̲ are T-invariant. To this end,

    f̄(Tx) = lim sup_{n→∞} f_n(Tx)/g_n(Tx)
          = lim sup_{n→∞} [ (f_{n+1}(x) − f(x)) / ω_1(x) ] / [ (g_{n+1}(x) − g_1(x)) / ω_1(x) ]
          = lim sup_{n→∞} (f_{n+1}(x) − f(x)) / (g_{n+1}(x) − g_1(x))
          = lim sup_{n→∞} f_{n+1}(x) / g_{n+1}(x)
          = f̄(x),

since g_{n+1}(x) → ∞ a.e. by Proposition (5.2.4). (Similarly, f̲ is T-invariant.)
Now, to prove that f* exists, is integrable and T-invariant, it is enough to show that

    ∫_X f̄ dµ ≤ ∫_X f dµ ≤ ∫_X f̲ dµ.

Since f̲ ≤ f̄ always, this forces equality throughout, and hence f̄ = f̲ = f* a.e.
If T^n x ∉ X_0, then take m = 1 since

    a_n = F(T^n x) ω_n(x) = L ω_n(x) ≥ min(f̄(x), L)(1 − ε) ω_n(x) = b_n.

Hence by the T-invariance of f̄, and Lemma 2.1.1, for all integers N > M one has

    F(x) + F(Tx) ω_1(x) + · · · + ω_{N-1}(x) F(T^{N-1}x) ≥ min(f̄(x), L)(1 − ε) g_{N-M}(x).

Integrating both sides, and using Proposition (5.2.1) together with the T-invariance of f̄, one gets

    N ∫_X F(x) dµ(x) ≥ ∫_X min(f̄(x), L)(1 − ε) g_{N-M}(x) dµ(x) = (N − M) ∫_X min(f̄(x), L)(1 − ε) dµ(x).
Since

    ∫_{X_0} F(x) dµ(x) = ∫_{X_0} f(x) dµ(x),

one has

    ∫_X f(x) dµ(x) ≥ ∫_{X_0} f(x) dµ(x) = ∫_X F(x) dµ(x) − L µ(X \ X_0)
                   ≥ ((N − M)/N) ∫_X min(f̄(x), L)(1 − ε) dµ(x) − L µ(X \ X_0).

Letting N → ∞, and then ε → 0, L → ∞ and µ(X \ X_0) → 0, one gets

    ∫_X f(x) dµ(x) ≥ ∫_X f̄(x) dµ(x).
If T^n x ∉ Y_0, then take m = 1 since

    b_n = G(T^n x) ω_n(x) = 0 ≤ (f̲(x) + ε) ω_n(x) = a_n.

Hence by Lemma 2.1.1 one has for all integers N > M

    G(x) + G(Tx) ω_1(x) + . . . + G(T^{N-M-1}x) ω_{N-M-1}(x) ≤ (f̲(x) + ε) g_N(x).
Since f̲ ≥ 0, the measure ν defined by ν(A) = ∫_A f̲(x) dµ(x) is absolutely continuous with respect to the measure µ. Hence, for every ε_0 > 0 there exists a δ > 0 such that if µ(A) < δ, then ν(A) < ε_0. Since µ(X \ Y_0) < δ, then ν(X \ Y_0) = ∫_{X\Y_0} f̲(x) dµ(x) < ε_0. Hence,

    ∫_X f̲(x) dµ(x) = ∫_{Y_0} G(x) dµ(x) + ∫_{X\Y_0} f̲(x) dµ(x)
                    ≤ (N/(N − M)) ∫_X (f̲(x) + ε) dµ(x) + ε_0.

Letting N → ∞, and then ε, ε_0 → 0, we conclude

    ∫_X f̄ dµ ≤ ∫_X f dµ ≤ ∫_X f̲ dµ,
Remark We can extend the notion of ergodicity to our setting. If T is non-singular and conservative, we say that T is ergodic if for any measurable set A satisfying µ(A △ T^{-1}A) = 0, one has µ(A) = 0 or 1. It is easy to check that the proof of Proposition (1.7.1) holds in this case, so that T ergodic implies that each T-invariant function is a constant a.e. Hence, if T is invertible, non-singular, conservative and ergodic, then by the Hurewicz Ergodic Theorem one has for any f ∈ L¹(µ),

    lim_{n→∞} Σ_{i=0}^{n-1} f(T^i x) ω_i(x) / Σ_{i=0}^{n-1} ω_i(x) = ∫_X f dµ   a.e.
Chapter 6
Invariant Measures for
Continuous Transformations
6.1
Existence
Suppose X is a compact metric space, and let B be the Borel σ-algebra, i.e., the σ-algebra generated by the open sets. Let M(X) be the collection of all Borel probability measures on X. There is a natural embedding of the space X in M(X) via the map x ↦ δ_x, where δ_x is the Dirac measure concentrated at x (δ_x(A) = 1 if x ∈ A, and is zero otherwise). Furthermore, M(X) is a convex set, i.e., pµ + (1 − p)ν ∈ M(X) whenever µ, ν ∈ M(X) and 0 ≤ p ≤ 1.
Theorem 6.1.2 below shows that a member of M (X) is determined by how
it integrates continuous functions. We denote by C(X) the Banach space of
all complex valued continuous functions on X under the supremum norm.
Theorem 6.1.1 Every member µ of M(X) is regular, i.e., for every B ∈ B and every ε > 0 there exist an open set U_ε and a closed set C_ε such that C_ε ⊆ B ⊆ U_ε and µ(U_ε \ C_ε) < ε.

Idea of proof Call a set B ∈ B with the above property a regular set. Let R = {B ∈ B : B is regular}. Show that R is a σ-algebra containing all the closed sets.
Corollary 6.1.1 For any B ∈ B, and any µ ∈ M(X),

    µ(B) = sup_{C⊆B : C closed} µ(C) = inf_{B⊆U : U open} µ(U).
Using a similar argument, one can show that m(C) ≤ µ(C) + ε. Therefore, µ(C) = m(C) for all closed sets, and hence for all Borel sets.
This allows us to define a metric structure on M(X) as follows. A sequence {µ_n} in M(X) converges to µ ∈ M(X) if and only if

    lim_{n→∞} ∫ f(x) dµ_n(x) = ∫ f(x) dµ(x)

for all f ∈ C(X). We will show that under this notion of convergence the space M(X) is compact, but first we need the Riesz Representation Theorem.
Theorem 6.1.3 (The Riesz Representation Theorem) Let X be a compact metric space and J : C(X) → C a continuous linear map such that J is a positive operator and J(1) = 1. Then there exists a µ ∈ M(X) such that J(f) = ∫_X f(x) dµ(x).
Theorem 6.1.4 The space M (X) is compact.
    ν_n = (1/n) Σ_{i=0}^{n-1} ν ∘ T^{-i}.

Then, any limit point of {ν_n} is a member of M(X, T).
Proof We need to show that for any continuous function f on X, one has ∫_X f(x) dµ(x) = ∫_X f(Tx) dµ(x). Since M(X) is compact there exists a µ ∈ M(X) and a subsequence {ν_{n_j}} such that ν_{n_j} → µ in M(X). Now for any f continuous, we have

    | ∫_X f(Tx) dµ(x) − ∫_X f(x) dµ(x) | = lim_{j→∞} | ∫_X f(Tx) dν_{n_j}(x) − ∫_X f(x) dν_{n_j}(x) |
    = lim_{j→∞} (1/n_j) | Σ_{i=0}^{n_j-1} ∫_X ( f(T^{i+1}x) − f(T^i x) ) dν(x) |
    = lim_{j→∞} (1/n_j) | ∫_X ( f(T^{n_j}x) − f(x) ) dν(x) |
    ≤ lim_{j→∞} 2 sup_{x∈X} |f(x)| / n_j = 0.

Therefore µ ∈ M(X, T).
    µ_1(B) = µ(B ∩ E)/µ(E)   and   µ_2(B) = µ(B ∩ (X \ E))/µ(X \ E).
94
1X
lim
fk (T i x) =
n n
i=0
for all x Xk . Let Y =
k=1
Z
fk (x) d(x)
X
Xk , then (Y ) = 1, and
n1
1X
lim
fk (T i x) =
n n
i=0
Z
fk (x)d(x)
X
for all k and all x Y . Now, let f C(X), then there exists a subsequence {fkj } converging to f in the supremum norm, and hence is uniformly
convergent. For any x Y , using uniform convergence and the dominated
convergence theorem, one gets
n1
1X
f (T i x) =
n n
i=0
lim
n1
1X
fkj (T i x)
n j n
i=0
lim lim
n1
1X
fkj (T i x)
j n n
i=0
Z
Z
f d.
= lim
fkj d =
=
lim lim
Theorem 6.1.8 Let T : X → X be continuous, and µ ∈ M(X, T). Then T is ergodic with respect to µ if and only if

    (1/n) Σ_{i=0}^{n-1} δ_{T^i x} → µ   a.e.

Proof Suppose T is ergodic with respect to µ. Notice that for any f ∈ C(X),

    ∫_X f(y) d(δ_{T^i x})(y) = f(T^i x),
so by the ergodic theorem there exists a set Y with µ(Y) = 1 such that

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} ∫_X f(y) d(δ_{T^i x})(y) = lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^i x) = ∫_X f(y) dµ(y),

hence (1/n) Σ_{i=0}^{n-1} δ_{T^i x} → µ for all x ∈ Y.

Conversely, suppose (1/n) Σ_{i=0}^{n-1} δ_{T^i x} → µ for all x ∈ Y, where µ(Y) = 1. Then for any f ∈ C(X) and any g ∈ L¹(X, B, µ) one has

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^i x) g(x) = g(x) ∫_X f(y) dµ(y).
By the dominated convergence theorem,

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} ∫_X f(T^i x) g(x) dµ(x) = ∫_X g(x) dµ(x) ∫_X f(y) dµ(y),

and likewise, for bounded measurable G,

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} ∫_X f(T^i x) G(x) dµ(x) = ∫_X G(x) dµ(x) ∫_X f(y) dµ(y).
    ≤ (1/n) Σ_{i=0}^{n-1} ∫_X | F(T^i x) − f(T^i x) | |G(x)| dµ(x)
    + | (1/n) Σ_{i=0}^{n-1} ∫_X f(T^i x) G(x) dµ(x) − ∫_X f dµ ∫_X G dµ |
    + | ∫_X f dµ ∫_X G dµ − ∫_X F dµ ∫_X G dµ |.

Letting n → ∞ and approximating F by f ∈ C(X), it follows that

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} ∫_X F(T^i x) G(x) dµ(x) = ∫_X F(y) dµ(y) ∫_X G(x) dµ(x)
6.2
Unique Ergodicity
for all x Y and all f C(X). When T is uniquely ergodic we will see that
we have a much stronger result.
Theorem 6.2.1 Let T : X X be a continuous transformation on a compact metric space X. Then the following are equivalent:
(i) For every f ∈ C(X), the sequence { (1/n) Σ_{j=0}^{n-1} f(T^j x) } converges uniformly to a constant.

(ii) For every f ∈ C(X), the sequence { (1/n) Σ_{j=0}^{n-1} f(T^j x) } converges pointwise to a constant.

(iii) There exists a µ ∈ M(X, T) such that for every f ∈ C(X) and all x ∈ X,

    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^i x) = ∫_X f(y) dµ(y).
(iv) T is uniquely ergodic.
Proof (i) ⇒ (ii) immediate.

(ii) ⇒ (iii) Define L : C(X) → C by

    L(f) = lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^i x).

By assumption L(f) is independent of x, hence L is well defined. It is easy to see that L is linear, continuous (|L(f)| ≤ sup_{x∈X} |f(x)|), positive and L(1) = 1. Thus, by the Riesz Representation Theorem there exists a probability measure µ ∈ M(X) such that

    L(f) = ∫_X f(x) dµ(x).

Moreover,

    L(f ∘ T) = lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^{i+1}x) = L(f).
Hence,

    ∫_X f(Tx) dµ(x) = ∫_X f(x) dµ(x).
Let

    µ_n = (1/n) Σ_{j=0}^{n-1} δ_{T^j x_n}.

Then,

    | ∫_X g dµ_n − ∫_X g dµ |
    lim_{n→∞} (1/n) Σ_{i=0}^{n-1} 1_{J_k}(T_θ^i(0)) = λ(J_k) = log₁₀( (k + 1)/k ).
Returning to our original problem, notice that the first digit of 2^i is k if and only if

    k · 10^r ≤ 2^i < (k + 1) · 10^r

for some r ≥ 0. In this case,

    r + log₁₀ k ≤ i log₁₀ 2 = iθ < r + log₁₀(k + 1).

This shows that r = ⌊iθ⌋, and

    log₁₀ k ≤ iθ − ⌊iθ⌋ < log₁₀(k + 1).

But T_θ^i(0) = iθ − ⌊iθ⌋, so that T_θ^i(0) ∈ J_k. Summarizing, we see that the first digit of 2^i is k if and only if T_θ^i(0) ∈ J_k. Thus,

    lim_{n→∞} p_k(n)/n = lim_{n→∞} (1/n) Σ_{i=0}^{n-1} 1_{J_k}(T_θ^i(0)) = log₁₀( (k + 1)/k ).
Chapter 7
Topological Dynamics
In this chapter we will shift our attention from the theory of measure preserving transformations and ergodic theory to the study of topological dynamics.
The field of topological dynamics also studies dynamical systems, but in a
different setting: where ergodic theory is set in a measure space with a measure preserving transformation, topological dynamics focuses on a topological
space with a continuous transformation. The two fields bear great similarity
and seem to co-evolve. Whenever a new notion has proven its value in one
field, analogies will be sought in the other. The two fields are, however, also
complementary. Indeed, we shall encounter a most striking example of a map
which exhibits its most interesting topological dynamics in a set of measure
zero, right in the blind spot of the measure theoretic eye.
In order to illuminate this interplay between the two fields we will restrict our attention to compact metric spaces. The compactness assumption will play
the role of the assumption of a finite measure space. The assumption of the
existence of a metric is sufficient for applications and will greatly simplify
all of our proofs. The chapter is structured as follows. We will first introduce several basic notions from topological dynamics and discuss how they
are related to each other and to analogous notions from ergodic theory. We
will then devote a separate section to topological entropy. To conclude the
chapter, we will discuss some examples and applications.
7.1
Basic Notions
The reader may recognize that this theorem contains some of the most important results in analytic topology, most notably the Lebesgue number lemma, Baire's category theorem, the extreme value theorem and Urysohn's lemma.
Let us now introduce some concepts of topological dynamics: topological
transitivity, topological conjugacy, periodicity and expansiveness.
Definition 7.1.1 Let T : X → X be a continuous transformation. For x ∈ X, the forward orbit of x is the set FO_T(x) = {T^n(x) | n ∈ Z_{≥0}}. T is called one-sided topologically transitive if for some x ∈ X, FO_T(x) is dense in X.
If T is invertible, the orbit of x is defined as O_T(x) = {T^n(x) | n ∈ Z}. If T is a homeomorphism, T is called topologically transitive if for some x ∈ X, O_T(x) is dense in X.
    ∩_{n=1}^{∞} ∪_{m=-∞}^{∞} T^{-m}U_n

But ∪_{m=-∞}^{∞} T^{-m}U_n is T-invariant and therefore intersects every open set in X by (3). Thus, ∪_{m=-∞}^{∞} T^{-m}U_n is dense in X for every n. Now, since X is a Baire space (cf. Theorem 7.1.1), we see that {x ∈ X | cl(O_T(x)) = X} is itself dense in X and thus certainly non-empty (as X ≠ ∅).
Exercise 7.1.1 Let X be a compact metric space with metric d and let T : X → X be a topologically transitive homeomorphism. Show that if T is an isometry (i.e. d(T(x), T(y)) = d(x, y) for all x, y ∈ X), then T is minimal.

Definition 7.1.3 Let X be a topological space, T : X → X a transformation and x ∈ X. Then x is called a periodic point of T of period n (n ∈ Z_{>0}) if T^n(x) = x and T^m(x) ≠ x for 0 < m < n. A periodic point of period 1 is called a fixed point.
{A_n}_{n=-∞}^{∞}, A_i ∈ α.
Theorem 7.1.4 Let X be a compact metric space and T : X X a
homeomorphism. Then T is expansive if and only if T has a generator.
Proof
Suppose that T is expansive and let δ > 0 be an expansive constant for T. Let β be the open cover of X defined by β = {B(a, δ/2) | a ∈ X}. By compactness, there exists a finite subcover α of β. Suppose that x, y ∈ ∩_{n=-∞}^{∞} T^{-n} cl(B_n), with B_i ∈ α for all i. Then d(T^m(x), T^m(y)) ≤ δ for all m ∈ Z. But by expansiveness, if x ≠ y there exists an l ∈ Z such that d(T^l(x), T^l(y)) > δ, a contradiction. Hence, x = y. We conclude that α is a generator for T.
Conversely, suppose that α is a generator for T. By Theorem 7.1.1, there exists a Lebesgue number δ > 0 for the open cover α. We claim that δ/4 is an expansive constant for T. Indeed, suppose x, y ∈ X are such that d(T^m(x), T^m(y)) ≤ δ/4 for all m ∈ Z. Then, for all m ∈ Z, the closed ball cl(B(T^m(x), δ/4)) contains T^m(y) and has diameter δ/2 < δ. Hence,

    cl(B(T^m(x), δ/4)) ⊆ A_m,

for some A_m ∈ α. It follows that x, y ∈ ∩_{m=-∞}^{∞} T^{-m} cl(A_m). But α is a generator, so ∩_{m=-∞}^{∞} T^{-m} cl(A_m) contains at most one point. We conclude that x = y, and T is expansive.
Proof
The proof of the first two statements is trivial. For the third statement we note that

    ∩_{n=-∞}^{∞} T^{-n} ψ^{-1}(cl A_n) = ψ^{-1}( ∩_{n=-∞}^{∞} S^{-n} cl A_n ).

The left-hand side of the equation contains at most one point if and only if the right-hand side does, so a finite open cover α is a generator for S if and only if ψ^{-1}α = {ψ^{-1}A | A ∈ α} is a generator for T. Hence the result follows from Theorem 7.1.4.
7.2
Topological Entropy

Topological entropy was first introduced as an analogue of the successful concept of measure theoretic entropy. It will turn out to be a conjugacy invariant and is therefore a useful tool for distinguishing between topological dynamical systems. The definition of topological entropy comes in two flavors. The first is in terms of open covers and is very similar to the definition of measure theoretic entropy in terms of partitions. The second (and chronologically later) definition uses (n, ε)-separated and (n, ε)-spanning sets. Interestingly, the latter definition was a topological dynamical discovery, which was later engineered into a similar definition of measure theoretic entropy.
7.2.1
Two Definitions

We will start with a definition of topological entropy similar to the definition of measure theoretic entropy introduced earlier. To spark the reader's recognition of the similarities between the two, we use analogous notation and terminology. Before kicking off, let us first make a remark on the assumptions about our space X. The first definition presented will only require X to be compact. The second, on the other hand, only requires X to be a metric space. We will gloss over these subtleties and simply stick to the context of a compact metric space, in which the two definitions will turn out to be equivalent. Nevertheless, the reader should be aware that both definitions represent generalizations of topological entropy in different directions and are therefore interesting in their own right.
Let X be a compact metric space and let α, β be open covers of X. We say that β is a refinement of α, and write α ≤ β, if for every B ∈ β there is an A ∈ α such that B ⊆ A. The common refinement of α and β is defined to be α ∨ β = {A ∩ B | A ∈ α, B ∈ β}. For a finite collection {α_i}_{i=1}^{n} of open covers we may define ⋁_{i=1}^{n} α_i = { ∩_{i=1}^{n} A_{j_i} | A_{j_i} ∈ α_i }. For a continuous transformation T : X → X we define T^{-1}α = {T^{-1}A | A ∈ α}. Note that these are all again open covers of X. Finally, we define the diameter of an open cover α as diam(α) := sup_{A∈α} diam(A), where diam(A) = sup_{x,y∈A} d(x, y).
Exercise 7.2.1 Show that T 1 ( ) = T 1 () T 1 () and show that
implies T 1 T 1 .
Definition 7.2.1 Let be an open cover of X and let N () be the number
of sets in a finite subcover of of minimal cardinality. We define the entropy
of to be H() = log(N ()).
The following proposition summarizes some easy properties of H(). The
proof is left as an exercise.
Proposition 7.2.1 Let α, β be open covers of X and let H(α) be the entropy of α. Then
1. H(α) ≥ 0
2. H(α) = 0 if and only if N(α) = 1, if and only if X ∈ α
3. If α ≤ β, then H(α) ≤ H(β)
4. H(α ∨ β) ≤ H(α) + H(β)
5. For T : X → X continuous we have H(T^{-1}α) ≤ H(α). If T is surjective, then H(T^{-1}α) = H(α).
Exercise 7.2.2 Prove proposition 7.2.1. (Hint: if T is surjective, then T(T^{-1}A) = A.)
We will now move on to the definition of topological entropy for a continuous transformation with respect to an open cover, and subsequently make this definition independent of the choice of open cover.
The entropy of T with respect to the open cover α is defined as
h(T, α) := lim_{n→∞} (1/n) H(∨_{i=0}^{n-1} T^{-i}α),
and the first notion of topological entropy, definition (I), is h_1(T) := sup{h(T, α) | α an open cover of X}. We must show that h(T, α) is well-defined, i.e. that the limit on the right hand side exists.
Theorem 7.2.1 lim_{n→∞} (1/n) H(∨_{i=0}^{n-1} T^{-i}α) exists.
Proof
Define
a_n = H(∨_{i=0}^{n-1} T^{-i}α).
Then
a_{n+p} = H(∨_{i=0}^{n+p-1} T^{-i}α) ≤ H(∨_{i=0}^{n-1} T^{-i}α) + H(T^{-n}(∨_{j=0}^{p-1} T^{-j}α)) ≤ a_n + a_p,
by (4) and (5) of proposition 7.2.1. Thus the sequence {a_n} is subadditive, so lim_{n→∞} a_n/n exists (and equals inf_n a_n/n).
Proposition 7.2.2 Let α, β be open covers of X and let T : X → X be continuous. Then h(T, α) ≥ 0; if α ≤ β, then h(T, α) ≤ h(T, β); and h(T, α) ≤ H(α).
Proof
These are easy consequences of proposition 7.2.1. We will only prove the third statement. We have
H(∨_{i=0}^{n-1} T^{-i}α) ≤ Σ_{i=0}^{n-1} H(T^{-i}α) ≤ n·H(α),
so dividing by n and taking the limit for n → ∞ gives h(T, α) ≤ H(α).
Definition 7.2.4 Let n ∈ Z_{>0}, ε > 0 and A ⊂ X, and let d_n(x, y) := max_{0≤i≤n-1} d(T^i x, T^i y). A is called (n, ε)-spanning for X if for all x ∈ X there exists a y ∈ A such that d_n(x, y) < ε. Define span(n, ε, T) to be the minimum cardinality of an (n, ε)-spanning set. A is called (n, ε)-separated if d_n(x, y) > ε for any distinct x, y ∈ A. We define sep(n, ε, T) to be the maximum cardinality of an (n, ε)-separated set.
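To make the definitions concrete, here is a small computational sketch (our own illustration, not part of the text): we take the doubling map x ↦ 2x (mod 1) on the circle and greedily build an (n, ε)-separated set from a finite grid. By maximality, the resulting set also (n, ε)-spans the grid — exactly the mechanism exploited in the proof of theorem 7.2.2 below. All names (`doubling`, `greedy_separated`) are ours.

```python
def doubling(x):
    # the doubling map T(x) = 2x (mod 1) on the circle
    return (2.0 * x) % 1.0

def circle_dist(x, y):
    # distance on [0, 1) with endpoints identified
    d = abs(x - y)
    return min(d, 1.0 - d)

def d_n(x, y, n):
    # Bowen metric: d_n(x, y) = max_{0 <= i <= n-1} d(T^i x, T^i y)
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = doubling(x), doubling(y)
    return best

def greedy_separated(points, n, eps):
    # keep a point only if it is d_n-further than eps from all kept points;
    # the result is a maximal (n, eps)-separated subset of `points`
    A = []
    for p in points:
        if all(d_n(p, a, n) > eps for a in A):
            A.append(p)
    return A

grid = [i / 512 for i in range(512)]
A = greedy_separated(grid, 4, 0.1)
```

Because A is maximal, every grid point lies within d_n-distance ε of A, so the same finite set witnesses both definitions at once.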
Figure 7.1: An (n, ε)-separated set (left) and an (n, ε)-spanning set (right). The dotted circles represent open balls of d_n-radius ε/2 (left) and of d_n-radius ε (right), respectively.
Note that we can extend this definition by letting A ⊂ K, where K is a compact subset of X. This generalization to the context of non-compact metric spaces was devised by R.E. Bowen. See [W] for an outline of this extended framework.
We can also formulate the above in terms of open balls. If B(x, r) = {y ∈ X | d(x, y) < r} is an open ball in the metric d, then the open ball with centre x and radius r in the metric d_n is given by
B_n(x, r) := ∩_{i=0}^{n-1} T^{-i} B(T^i(x), r).
In particular, A is (n, ε)-spanning for X precisely when X = ∪_{a∈A} B_n(a, ε). Finally, let cov(n, ε, T) denote the minimal cardinality of a cover of X by sets of d_n-diameter less than ε.
Theorem 7.2.2 cov(n, 3ε, T) ≤ span(n, ε, T) ≤ sep(n, ε, T) ≤ cov(n, ε, T). Furthermore, sep(n, ε, T), span(n, ε, T) and cov(n, ε, T) are finite, for all n, all ε > 0 and all continuous T.
Proof
We will only prove the last two inequalities. The first is left as an exercise to the reader. Let A be an (n, ε)-separated set of cardinality sep(n, ε, T). Suppose A is not (n, ε)-spanning for X. Then there is some x ∈ X such that d_n(x, a) ≥ ε for all a ∈ A. But then A ∪ {x} is an (n, ε)-separated set of cardinality larger than sep(n, ε, T). This contradiction shows that A is an (n, ε)-spanning set for X. The second inequality now follows, since the cardinality of A is at least as large as span(n, ε, T).
To prove the third inequality, let A be an (n, ε)-separated set of cardinality sep(n, ε, T). Note that if β is an open cover of d_n-diameter less than ε, then no element of β can contain more than one element of A. This holds in particular for such a cover of minimal cardinality, so the third inequality is proved.
The final statement follows for span(n, ε, T) and cov(n, ε, T) from the compactness of X, and subsequently for sep(n, ε, T) by the last inequality.
Exercise 7.2.3 Finish the proof of the above theorem by proving the first
inequality.
Lemma 7.2.1 The limit cov(ε, T) := lim_{n→∞} (1/n) log(cov(n, ε, T)) exists and is finite.
Proof
Fix ε > 0. Let α, β be open covers of X consisting of sets with d_m-diameter and d_n-diameter smaller than ε, and with cardinalities cov(m, ε, T) and cov(n, ε, T), respectively. Pick any A ∈ α and B ∈ β. Then, for x, y ∈ A ∩ T^{-m}B,
d_{m+n}(x, y) = max_{0≤i≤m+n-1} d(T^i x, T^i y) = max( max_{0≤i≤m-1} d(T^i x, T^i y), max_{m≤j≤m+n-1} d(T^j x, T^j y) ) < ε.
Thus, A ∩ T^{-m}B has d_{m+n}-diameter less than ε and the sets {A ∩ T^{-m}B | A ∈ α, B ∈ β} form a cover of X. Hence,
cov(m + n, ε, T) ≤ cov(m, ε, T) · cov(n, ε, T).
Now, since log is an increasing function, the sequence {a_n}_{n=1}^∞ defined by a_n = log(cov(n, ε, T)) is subadditive, so the result follows from the subadditivity argument in the proof of theorem 7.2.1.
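The subadditivity step is an instance of Fekete's lemma: if a_{m+n} ≤ a_m + a_n, then a_n/n converges to inf_n a_n/n. A quick numerical sanity check of this mechanism, with a made-up subadditive sequence of our own choosing:

```python
import math

# a_n = log(C * 2^n) is subadditive whenever C >= 1, and a_n / n -> log 2
C = 7.0
a = {n: math.log(C * 2.0 ** n) for n in range(1, 401)}

# verify subadditivity a_{m+n} <= a_m + a_n on a sample of pairs
for m in range(1, 60):
    for n in range(1, 60):
        assert a[m + n] <= a[m] + a[n] + 1e-12

# the ratios a_n / n decrease toward the infimum, here log 2
ratios = [a[n] / n for n in range(1, 401)]
```

Here a_n/n = log 2 + (log C)/n, so the ratios approach the infimum log 2 from above, mirroring how (1/n) log cov(n, ε, T) settles down to cov(ε, T).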
Motivated by the above lemma and theorem, we have the following definition.
Definition 7.2.6 (II) The topological entropy of a continuous transformation T : X → X is given by
h_2(T) = lim_{ε→0} lim_{n→∞} (1/n) log(cov(n, ε, T))
= lim_{ε→0} lim_{n→∞} (1/n) log(span(n, ε, T))
= lim_{ε→0} lim_{n→∞} (1/n) log(sep(n, ε, T)).
7.2.2 Equivalence of the Two Definitions
We will now show that definitions (I) and (II) of topological entropy coincide on compact metric spaces. This will justify the use of the notation h(T) for both definitions of topological entropy. In the meantime, we will continue to write h_1(T) for topological entropy in the definition using open covers and h_2(T) for the second definition of entropy. After all the work done in the last section, our only missing ingredient is the following lemma:
Lemma 7.2.2 Let X be a compact metric space. Suppose {α_n}_{n=1}^∞ is a sequence of open covers of X such that lim_{n→∞} diam(α_n) = 0, where diam is the diameter under the metric d. Then lim_{n→∞} h_1(T, α_n) = h_1(T).
Proof
We will only prove the lemma for the case h_1(T) < ∞. Let ε > 0 be arbitrary and let α be an open cover of X such that h_1(T, α) > h_1(T) − ε. By theorem 7.1.1 there exists a Lebesgue number δ > 0 for α. By assumption, there exists an N > 0 such that for n ≥ N we have diam(α_n) < δ. So, for such n, we find that for any A ∈ α_n there exists a B ∈ α such that A ⊂ B, i.e. α ≤ α_n. Hence, by proposition 7.2.2 and the above,
h_1(T) − ε < h_1(T, α) ≤ h_1(T, α_n) ≤ h_1(T)
for all n ≥ N.
Exercise 7.2.4 Finish the proof of lemma 7.2.2. That is, show that if h_1(T) = ∞, then lim_{n→∞} h_1(T, α_n) = ∞.
Theorem 7.2.3 For a continuous transformation T : X → X of a compact metric space X, definitions (I) and (II) of topological entropy coincide.
Proof
Step 1: h_2(T) ≤ h_1(T).
For each k, let α_k be the open cover of X defined by α_k = {B(x, 1/(3k)) | x ∈ X}. Then each element of ∨_{i=0}^{n-1} T^{-i}α_k has d_n-diameter at most 2/(3k) < 1/k, so ∨_{i=0}^{n-1} T^{-i}α_k is an open cover of X by sets of d_n-diameter smaller than 1/k. Hence cov(n, 1/k, T) ≤ N(∨_{i=0}^{n-1} T^{-i}α_k). Since lim_{k→∞} diam(α_k) = 0, by lemma 7.2.2 we can subsequently take the log, divide by n, take the limit for n → ∞ and the limit for k → ∞ to obtain the desired result.
Step 2: h_1(T) ≤ h_2(T).
Let {α_k}_{k=1}^∞ be defined by α_k = {B(x, 2/k) | x ∈ X}. Note that δ_k = 1/k is a Lebesgue number for α_k, for each k. Let A be an (n, 1/(4k))-spanning set for X of cardinality span(n, 1/(4k), T). Then, for each a ∈ A and 0 ≤ i < n, the ball B(T^i(a), 1/(4k)) has diameter at most 1/(2k) < 1/k and is therefore contained in a member of α_k; hence B_n(a, 1/(4k)) is contained in a member of ∨_{i=0}^{n-1} T^{-i}α_k. Since A is spanning, the sets B_n(a, 1/(4k)) cover X, so N(∨_{i=0}^{n-1} T^{-i}α_k) ≤ span(n, 1/(4k), T). Proceeding in the same way as in step 1, we obtain the desired inequality.
7.2.3 Properties of Entropy
We will now list some elementary properties of topological entropy.
Theorem 7.2.4 Let X be a compact metric space and let T : X → X be continuous. Then h(T) satisfies the following properties:
1. If the metrics d and d̃ both generate the topology of X, then h(T) is the same under both metrics
2. h(T) is a conjugacy invariant
3. h(T^n) = n·h(T), for all n ∈ Z_{>0}
4. If T is a homeomorphism, then h(T^{-1}) = h(T). In this case, h(T^n) = |n|·h(T), for all n ∈ Z
Proof
(1): Let (X, d) and (X, d̃) denote the two metric spaces. Since both induce the same topology, the identity maps i : (X, d) → (X, d̃) and j : (X, d̃) → (X, d) are continuous and hence by theorem 7.1.1 uniformly continuous. Fix ε_1 > 0. Then, by uniform continuity, we can subsequently find an ε_2 > 0 and an ε_3 > 0 such that for all x, y ∈ X:
d̃(x, y) < ε_2 ⟹ d(x, y) < ε_1,
d(x, y) < ε_3 ⟹ d̃(x, y) < ε_2.
Let A be an (n, ε_2)-spanning set for T under d̃. Then A is also (n, ε_1)-spanning for T under d. Analogously, every (n, ε_3)-spanning set for T under d is (n, ε_2)-spanning for T under d̃. Therefore,
span(n, ε_1, T, d) ≤ span(n, ε_2, T, d̃) ≤ span(n, ε_3, T, d),
where the fourth argument emphasizes the metric. By subsequently taking the log, dividing by n, taking the limit for n → ∞ and the limit for ε_1 → 0 (so that ε_2, ε_3 → 0) in these inequalities, we obtain the desired result.
(2): Suppose that S : Y → Y is topologically conjugate to T, where Y is a compact metric space, with conjugacy ψ : Y → X. Let d_X be the metric on X. Then the map d_Y defined by d_Y(x, y) = d_X(ψ(x), ψ(y)) defines a metric on Y which induces the topology of Y. Indeed, let B_{d_Y}(y_0, ε) be a basis element of the topology defined by d_Y. Then ψ(B_{d_Y}(y_0, ε)) is open in X, as ψ is an open map. Hence, there is an open ball B_{d_X}(ψ(y_0), δ) ⊂ ψ(B_{d_Y}(y_0, ε)), since the open balls are a basis for the topology of X. Now, ψ^{-1}(B_{d_X}(ψ(y_0), δ)) ⊂ B_{d_Y}(y_0, ε) and
y ∈ ψ^{-1}(B_{d_X}(ψ(y_0), δ)) ⟺ d_Y(y_0, y) = d_X(ψ(y), ψ(y_0)) < δ,
hence B_{d_Y}(y_0, δ) ⊂ B_{d_Y}(y_0, ε). Analogously, for any y_0 ∈ Y and ε > 0 one finds inside every d_Y-ball a basis element of the original topology containing y_0, so the two topologies coincide. Since ψ intertwines S and T and is an isometry from (Y, d_Y) to (X, d_X), a set A ⊂ Y is (n, ε)-separated for S under d_Y if and only if ψ(A) is (n, ε)-separated for T under d_X. Hence sep(n, ε, S, d_Y) = sep(n, ε, T, d_X), so h(S) computed with d_Y equals h(T). By (1), h(S) does not depend on the choice of metric, and the result follows.
(3): Note that the Bowen metric of T^m satisfies
d_n^{T^m}(x, y) = max_{0≤j≤n-1} d(T^{mj}x, T^{mj}y) ≤ max_{0≤k≤nm-1} d(T^k x, T^k y) = d_{nm}^T(x, y),
so every (nm, ε)-spanning set for T is (n, ε)-spanning for T^m, and span(n, ε, T^m) ≤ span(nm, ε, T); taking logs, dividing by nm and letting n → ∞, ε → 0 gives h(T^m) ≤ m·h(T). Conversely, by uniform continuity of T, . . . , T^{m-1}, for every ε > 0 there is a δ > 0 such that d(x, y) < δ implies max_{0≤i≤m-1} d(T^i x, T^i y) < ε. Writing k = mj + i with 0 ≤ j ≤ n−1 and 0 ≤ i ≤ m−1, it follows that every (n, δ)-spanning set for T^m is (nm, ε)-spanning for T, so span(nm, ε, T) ≤ span(n, δ, T^m) and m·h(T) ≤ h(T^m).
(4): Let A be an (n, ε)-separated set for T. Since
max_{0≤i≤n-1} d(T^{-i}(T^{n-1}x), T^{-i}(T^{n-1}y)) = max_{0≤i≤n-1} d(T^i x, T^i y) = d_n(x, y),
T^{n-1}A is an (n, ε)-separated set for T^{-1} of the same cardinality. Conversely, every (n, ε)-separated set B for T^{-1} gives the (n, ε)-separated set T^{-(n-1)}B for T. Thus, sep(n, ε, T) = sep(n, ε, T^{-1}) and the first statement readily follows. The second statement is a consequence of (3).
Exercise 7.2.5 Show that assertion (4) of theorem 7.2.4, i.e. h(T^{-1}) = h(T), still holds if X is merely assumed to be compact.
Exercise 7.2.6 Let (X, d_X), (Y, d_Y) be compact metric spaces and T : X → X, S : Y → Y continuous. Show that h(T × S) = h(T) + h(S). (Hint: first show that the metric d = max{d_X, d_Y} induces the product topology on X × Y.)
7.3 Examples
Consider, for example, a population of fish in a pond. If the population is small, it is expected to grow: there is plenty of room and food to go around. If, on the other hand, the population is near the upper limit level P, it is expected to decrease as a result of overcrowding. We arrive at the following simple model for the population at time n, denoted by P_n:
P_{n+1} = λ·P_n·(P − P_n),
where λ is a speed parameter. If we now set P = 1, we can think of P_n as the population at time n as a percentage of the limiting population level. Then, writing x = P_0, the population dynamics are described by iteration of the following map:
Definition 7.3.1 The Quadratic Map with parameter λ, Q_λ : R → R, is defined to be
Q_λ(x) = λx(1 − x).
Of course, in the context of the biological model formulated above, our attention is restricted to the interval [0, 1], since the population percentage cannot lie outside this interval.
The quadratic map is deceptively simple: it in fact displays a wide range of interesting dynamics, such as periodicity, topological transitivity, bifurcations and chaos.
Let us take a closer look at the case λ > 4. For x ∈ (−∞, 0) ∪ (1, ∞) it can be shown that Q_λ^n(x) → −∞ as n → ∞. The interesting dynamics occur in the interval [0, 1]. Figure 7.2 shows a phase portrait for Q_λ on [0, 1], which is nothing more than the graph of Q_λ together with the graph of the line y = x. In a phase portrait the trajectory of a point under iteration of the map Q_λ is easily visualized by iteratively projecting from the line y = x to the graph and vice versa. From our phase portrait for Q_λ it is immediately visible that there is a leak in our interval, i.e. the subset A_1 = {x ∈ [0, 1] | Q_λ(x) ∉ [0, 1]} is non-empty. Define
A_n = {x ∈ [0, 1] | Q_λ^i(x) ∈ [0, 1] for 0 ≤ i ≤ n − 1, Q_λ^n(x) ∉ [0, 1]}
and set Λ = [0, 1] \ ∪_{n=1}^∞ A_n, the set of points whose entire forward orbit stays in [0, 1].
Figure 7.2: Phase portrait of the quadratic map Q_λ for λ > 4 on [0, 1]. The arrows indicate the (partial) trajectory of a point in [0, 1]. The dotted lines divide [0, 1] into three parts: I_0, A_1 and I_1 (from left to right).
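The leak can be made quantitative with a few lines of code (our own sketch, not part of the text). For λ = 4.5, solving Q_λ(x) > 1 gives the escape interval A_1 = (1/3, 2/3) exactly, and the fraction of grid points whose first n iterates stay in [0, 1] shrinks at every step — illustrating that the survivor set is small (in fact a measure-zero Cantor set):

```python
def Q(lam, x):
    # the quadratic map Q_lambda(x) = lambda * x * (1 - x)
    return lam * x * (1.0 - x)

def survivors(lam, n, grid_size):
    # count grid points whose first n iterates all stay inside [0, 1]
    count = 0
    for i in range(grid_size):
        x = (i + 0.5) / grid_size
        ok = True
        for _ in range(n):
            x = Q(lam, x)
            if x < 0.0 or x > 1.0:
                ok = False
                break
        if ok:
            count += 1
    return count

lam = 4.5  # lambda > 4, so A_1 = (1/3, 2/3) leaks out after one step
counts = [survivors(lam, n, 30000) for n in range(6)]
```

Roughly a third of the surviving points escape at each step, so `counts` decreases geometrically while never reaching zero at any finite stage.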
Example 7.3.1 (Circle Rotations) Let S^1 denote the unit circle in C. We define the rotation map R_θ : S^1 → S^1 with parameter θ by R_θ(z) = e^{iθ}z = e^{i(θ+φ)}, where θ ∈ [0, 2π) and z = e^{iφ} for some φ ∈ [0, 2π). It is easily seen that R_θ is a homeomorphism for every θ. The rotation map is very similar to the translation map introduced earlier. This is not surprising, considering the fact that the interval [0, 1] is homeomorphic to S^1 after identification of the endpoints 0 and 1.
Proposition 7.3.1 Let θ ∈ [0, 2π) and let R_θ : S^1 → S^1 be the rotation map. If θ/(2π) is rational, say θ/(2π) = a/b in lowest terms, then every z ∈ S^1 is periodic with period b. R_θ is minimal if and only if θ/(2π) is irrational.
Proof
The first statement follows trivially from the fact that e^{2πni} = 1 for n ∈ Z.
Suppose that θ/(2π) is irrational. Fix ε > 0 and z ∈ S^1, so z = e^{iφ} for some φ ∈ [0, 2π). Then
R_θ^n(z) = R_θ^m(z) ⟺ e^{i(φ+nθ)} = e^{i(φ+mθ)} ⟺ e^{i(n−m)θ} = 1 ⟺ (n − m)·θ/(2π) ∈ Z.
Thus, the points {R_θ^n(z) | n ∈ Z} are all distinct and it follows that {R_θ^n(z)}_{n=1}^∞ is an infinite sequence. By compactness, this sequence has a convergent subsequence, which is in particular Cauchy. Therefore, we can find integers n > m such that
d(R_θ^n(z), R_θ^m(z)) < ε,
where d is the arc length distance function. Now, since R_θ is distance preserving with respect to this metric, we can set l = n − m to obtain d(R_θ^l(z), z) < ε. Also, by continuity of R_θ^l, R_θ^l maps the connected, closed arc from z to R_θ^l(z) onto the connected closed arc from R_θ^l(z) to R_θ^{2l}(z), and this one onto the arc connecting R_θ^{2l}(z) to R_θ^{3l}(z), etc. Since these arcs have positive and equal length, they cover S^1. The result now follows, since the arcs have length smaller than ε, and ε > 0 was arbitrary.
As mentioned in the proof, R_θ is an isometry with respect to the arc length distance function, and this metric induces the topology on S^1. Hence, we see that span(n, ε, R_θ) = span(1, ε, R_θ) for all n ∈ Z_{>0} and it follows that h(R_θ) = 0, for any θ ∈ [0, 2π).
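Minimality of the irrational rotation can be watched numerically (our own sketch; function names are ours). The largest gap left on the circle by the first n orbit points shrinks toward zero for an irrational angle, while for a rational angle the orbit closes up after finitely many steps and the gap stalls:

```python
import math

def largest_gap(theta, n):
    # largest gap between the first n orbit points k*theta (mod 2*pi)
    pts = sorted((k * theta) % (2.0 * math.pi) for k in range(n))
    gaps = [pts[i + 1] - pts[i] for i in range(n - 1)]
    gaps.append(2.0 * math.pi - pts[-1] + pts[0])  # wrap-around gap
    return max(gaps)

golden = 2.0 * math.pi * (math.sqrt(5.0) - 1.0) / 2.0  # irrational multiple of 2*pi
rational = 2.0 * math.pi * 0.25                        # theta/(2*pi) = 1/4

g100 = largest_gap(golden, 100)
g1000 = largest_gap(golden, 1000)
```

With the rational angle the orbit consists of only four points, so the largest gap stays at π/2 no matter how many iterates are taken.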
Example 7.3.2 (Bernoulli Shift Revisited) In this example we will reintroduce the Bernoulli shift in a topological context. Let X_k = {0, 1, . . . , k − 1}^Z (or X_k^+ = {0, . . . , k − 1}^{N∪{0}}) and recall that the left shift on X_k is defined by L : X_k → X_k, L(x) = y, y_n = x_{n+1}. We can define a topology on X_k (or X_k^+) which makes the shift map continuous (in fact, a homeomorphism in the case of X_k) as follows: we equip {0, . . . , k − 1} with the discrete topology and subsequently give X_k the corresponding product topology. It is easy to see that the cylinder sets form a basis for this topology. By the Tychonoff theorem (see e.g. [Mu]), which asserts that an arbitrary product of compact spaces is compact, X_k (and likewise X_k^+) is compact. On X_k^+ the product topology is induced by the metric d(x, y) = 2^{-l} with l = min{j : x_j ≠ y_j} (cf. exercise 7.3.1). Let us use this to compute h(L) on X_k^+. Fix 0 < ε < 1 and n ∈ Z_{>0}. For distinct sequences x, y whose first n symbols do not all agree, the Bowen metric satisfies d_n(x, y) = 1 > ε,
where we have used the metric d on X_k^+ defined above. Thus, the set A_n consisting of sequences a ∈ X_k^+ such that a_i ∈ {0, . . . , k − 1} for 0 ≤ i ≤ n − 1 and a_j = 0 for j ≥ n is (n, ε)-separated. Explicitly, a ∈ A_n is of the form
a_0 a_1 a_2 . . . a_{n−1} 0 0 0 . . .
Since there are k^n possibilities to choose the first n symbols of an itinerary, we obtain sep(n, ε, L) ≥ k^n. Hence,
lim_{n→∞} (1/n) log(sep(n, ε, L)) ≥ log(k).
To prove the reverse inequality, take l ∈ Z_{>0} such that 2^{-l} < ε. Then A_{n+l} is an (n, ε)-spanning set, since for every x ∈ X_k^+ there is some a ∈ A_{n+l} for which the first n + l symbols coincide; in other words, d_n(x, a) ≤ 2^{-l} < ε. Therefore,
lim_{ε→0} lim_{n→∞} (1/n) log(span(n, ε, L)) ≤ lim_{ε→0} lim_{n→∞} ((n + l)/n) log(k) = log(k),
and we conclude that h(L) = log(k).
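The counting argument can be checked directly in code (our own sketch; the tuple representation and helper names are ours). We realize A_n as the k^n words of length n padded with zeros and verify that any two distinct members are (n, ε)-separated for ε = 1/2, so sep(n, ε, L) ≥ k^n and the growth rate is exactly log k:

```python
import math
from itertools import product

def d(x, y):
    # d(x, y) = 2^(-l), l the first index where the sequences disagree;
    # finite tuples are implicitly padded with zeros, matching A_n
    for j in range(max(len(x), len(y))):
        a = x[j] if j < len(x) else 0
        b = y[j] if j < len(y) else 0
        if a != b:
            return 2.0 ** (-j)
    return 0.0

def d_n(x, y, n):
    # Bowen metric for the left shift: max over the first n shifts
    return max(d(x[i:], y[i:]) for i in range(n))

k, n, eps = 2, 4, 0.5
A = [w for w in product(range(k), repeat=n)]  # the set A_n from the text
rate = math.log(len(A)) / n
```

Two distinct words first disagree at some index j < n, so shifting j times puts the disagreement in front and d_n = 1 > ε, exactly as in the text.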
Exercise 7.3.1 In this exercise you are asked to prove some properties of the shift map. Let X_k = {0, 1, . . . , k − 1}^Z and X_k^+ = {0, 1, . . . , k − 1}^{N∪{0}}.
a. Show that the map d : X_k × X_k → R, defined by
d(x, y) = 2^{-l} if l = min{|j| : x_j ≠ y_j} (and d(x, y) = 0 if x = y),
is a metric on X_k that induces the product topology on X_k.
b. Show that the map d′ : X_k × X_k → R, defined by
d′(x, y) = Σ_{n=−∞}^{∞} |x_n − y_n| / 2^{|n|},
is a metric on X_k that induces the same topology.
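As a warm-up for part (a), a quick numerical check on finite windows (our own sketch, with two-sided sequences truncated to indices −R..R) confirms that d even satisfies the strong triangle inequality d(x, z) ≤ max(d(x, y), d(y, z)) — it is an ultrametric — which implies the ordinary triangle inequality:

```python
import random

def d(x, y, radius):
    # d(x, y) = 2^(-l), l = min{|j| : x_j != y_j}; x, y are dicts indexed
    # by j in [-radius, radius], standing in for two-sided sequences
    # (sequences equal on the whole window are treated as equal)
    for l in range(radius + 1):
        if x[l] != y[l] or x[-l] != y[-l]:
            return 2.0 ** (-l)
    return 0.0

random.seed(1)
R = 5

def rand_window():
    return {j: random.randint(0, 1) for j in range(-R, R + 1)}

triples = [(rand_window(), rand_window(), rand_window()) for _ in range(300)]
```

The reason is simple: if x and y agree for |j| < a and y and z agree for |j| < b, then x and z agree for |j| < min(a, b).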
Chapter 8
The Variational Principle
In this chapter we will establish a powerful relationship between measure theoretic and topological entropy, known as the Variational Principle. It asserts that for a continuous transformation T of a compact metric space X the topological entropy is given by h(T) = sup{h_μ(T) | μ ∈ M(X, T)}. To prove this statement, we will proceed along the shortest and most popular route to victory, provided by [M]. The proof is at times quite technical and is therefore divided into several digestible pieces.
8.1 Main Theorem
For the first part of the proof of the Variational Principle we will only use some properties of measure theoretic entropy. All the necessary ingredients are listed in the following lemma:
Lemma 8.1.1 Let α, β, γ be finite partitions of X and T a measure preserving transformation of the probability space (X, F, μ). Then,
1. H_μ(α) ≤ log(N(α)), where N(α) is the number of sets in the partition α of non-zero measure. Equality holds only if μ(A) = 1/N(α) for all A ∈ α with μ(A) > 0
2. If α ≤ β, then h_μ(T, α) ≤ h_μ(T, β)
3. h_μ(T, β) ≤ h_μ(T, α) + H_μ(β|α)
4. h_μ(T^n, ∨_{i=0}^{n-1} T^{-i}α) = n·h_μ(T, α) and h_μ(T^n) = n·h_μ(T), for all n ∈ Z_{>0}
Proof
We will prove the third statement; the remaining ones are left as exercises. Since
∨_{i=0}^{n-1} T^{-i}β ≤ (∨_{i=0}^{n-1} T^{-i}α) ∨ (∨_{i=0}^{n-1} T^{-i}β),
we have
H_μ(∨_{i=0}^{n-1} T^{-i}β) ≤ H_μ((∨_{i=0}^{n-1} T^{-i}α) ∨ (∨_{i=0}^{n-1} T^{-i}β))
= H_μ(∨_{i=0}^{n-1} T^{-i}α) + H_μ(∨_{i=0}^{n-1} T^{-i}β | ∨_{i=0}^{n-1} T^{-i}α).
Now, by successively applying proposition (4.2.1) (d), (g) and finally using the fact that T is measure preserving, we get
H_μ(∨_{i=0}^{n-1} T^{-i}β | ∨_{i=0}^{n-1} T^{-i}α) ≤ Σ_{i=0}^{n-1} H_μ(T^{-i}β | ∨_{j=0}^{n-1} T^{-j}α) ≤ Σ_{i=0}^{n-1} H_μ(T^{-i}β | T^{-i}α) = n·H_μ(β|α).
Combining the inequalities, dividing both sides by n and taking the limit for n → ∞ gives the desired result.
For the fourth statement, note first that
h_μ(T^n, ∨_{i=0}^{n-1} T^{-i}α) = lim_{k→∞} (1/k) H_μ(∨_{j=0}^{k-1} T^{-nj}(∨_{i=0}^{n-1} T^{-i}α))
= lim_{k→∞} (n/(kn)) H_μ(∨_{i=0}^{kn-1} T^{-i}α)
= lim_{m→∞} (n/m) H_μ(∨_{i=0}^{m-1} T^{-i}α)
= n·h_μ(T, α).
Notice that {∨_{i=0}^{n-1} T^{-i}α | α a partition of X} ⊂ {β | β a partition of X}. Hence,
n·h_μ(T) = n·sup_α h_μ(T, α) = sup_α h_μ(T^n, ∨_{i=0}^{n-1} T^{-i}α) ≤ sup_β h_μ(T^n, β) = h_μ(T^n).
Finally, since α ≤ ∨_{i=0}^{n-1} T^{-i}α, we obtain by (2),
h_μ(T^n, α) ≤ h_μ(T^n, ∨_{i=0}^{n-1} T^{-i}α) = n·h_μ(T, α),
and taking the supremum over α gives h_μ(T^n) ≤ n·h_μ(T).
Exercise 8.1.1 Use lemma 8.1.1 and proposition (4.2.1) to show that for an invertible measure preserving transformation T on (X, F, μ):
h_μ(T^n) = |n|·h_μ(T), for all n ∈ Z.
We are now ready to prove the first part of the theorem.
Theorem 8.1.1 Let T : X → X be a continuous transformation of a compact metric space X. Then sup{h_μ(T) | μ ∈ M(X, T)} ≤ h(T).
Proof
Fix μ ∈ M(X, T) and let α = {A_1, . . . , A_n} be a finite partition of X. Choose δ > 0 such that nδ log(n) < 1. By regularity of μ there exist compact sets B_i ⊂ A_i with μ(A_i \ B_i) < δ, for i = 1, . . . , n. Let β be the partition β = {B_0, B_1, . . . , B_n}, where B_0 = X \ ∪_{i=1}^n B_i. Writing f(t) = −t log(t), we have
H_μ(α|β) = Σ_{i=0}^n μ(B_i) Σ_{j=1}^n f(μ(B_i ∩ A_j)/μ(B_i)) = μ(B_0) Σ_{j=1}^n f(μ(B_0 ∩ A_j)/μ(B_0)) = μ(B_0)·H_μ(α|B_0),
since for i ≥ 1 we have B_i ⊂ A_i, so that the terms with i ≥ 1 vanish. Hence, we can apply (1) of lemma 8.1.1 and use the fact that
μ(B_0) = μ(X \ ∪_{i=1}^n B_i) = μ(∪_{i=1}^n A_i \ ∪_{i=1}^n B_i) ≤ μ(∪_{i=1}^n (A_i \ B_i)) < nδ
to obtain
H_μ(α|β) ≤ μ(B_0) log(n) < nδ log(n) < 1.
Define for each i, i = 1, . . . , n, the open set C_i by C_i = B_0 ∪ B_i = X \ ∪_{j≠i} B_j. Then we can define an open cover of X by γ = {C_1, . . . , C_n}. By (1) of lemma 8.1.1 we have for m ≥ 1, in the notation of the lemma,
H_μ(∨_{i=0}^{m-1} T^{-i}β) ≤ log(2^m·N(∨_{i=0}^{m-1} T^{-i}γ)),
since each element of ∨_{i=0}^{m-1} T^{-i}γ contains at most 2^m elements of ∨_{i=0}^{m-1} T^{-i}β of non-zero measure (each C_i is the union of the two partition elements B_0 and B_i). Thus, it follows that
h_μ(T, β) ≤ h(T, γ) + log 2 ≤ h(T) + log 2,
and by (3) of lemma 8.1.1 we obtain
h_μ(T, α) ≤ h_μ(T, β) + H_μ(α|β) ≤ h(T) + log 2 + 1.
Note that if μ ∈ M(X, T), then also μ ∈ M(X, T^m), so the above inequality holds for T^m as well. Applying (4) of lemma 8.1.1 and (3) of theorem 7.2.4 leads us to m·h_μ(T) ≤ m·h(T) + log 2 + 1. By dividing by m, taking the limit for m → ∞ and taking the supremum over all μ ∈ M(X, T), we obtain the desired result.
We will now finish the proof of the Variational Principle by proving the opposite inequality. This is the hard part of the proof, since it does not only require more machinery, but also a clever trick.
Lemma 8.1.2 Let X be a compact metric space and α a finite partition of X. Then, for any μ, ν ∈ M(X, T) and p ∈ [0, 1] we have
H_{pμ+(1−p)ν}(α) ≥ p·H_μ(α) + (1 − p)·H_ν(α).
Proof
Define the function f : [0, ∞) → R by f(t) = −t log(t) for t > 0 and f(0) = 0. Then f is concave and so, for A measurable, we have:
0 ≤ f(pμ(A) + (1 − p)ν(A)) − p·f(μ(A)) − (1 − p)·f(ν(A)).
The result now follows easily.
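Concavity can be sanity-checked numerically for a fixed finite partition, where H_μ(α) depends only on the probability vector (μ(A_1), . . . , μ(A_m)) — our own illustration, with made-up vectors:

```python
import math

def H(p):
    # partition entropy of a probability vector
    return -sum(t * math.log(t) for t in p if t > 0)

def mix(p, q, lam):
    # the probability vector of lam*mu + (1 - lam)*nu on the partition
    return [lam * a + (1 - lam) * b for a, b in zip(p, q)]

mu = [0.7, 0.2, 0.1]
nu = [0.1, 0.3, 0.6]
checks = [(lam, H(mix(mu, nu, lam)) - (lam * H(mu) + (1 - lam) * H(nu)))
          for lam in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Each gap in `checks` is nonnegative, and it vanishes at the endpoints λ = 0, 1, exactly as the concavity inequality predicts.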
where the final inequality follows from the assumption μ(∂A) = 0. The proof of the opposite inequality is similar.
Lemma 8.1.4 Let q, n be integers such that 1 < q < n. Define, for 0 ≤ j ≤ q − 1, a(j) = ⌊(n − j)/q⌋, where ⌊·⌋ means taking the integer part. Then we have the following:
1. a(0) ≥ a(1) ≥ · · · ≥ a(q − 1)
2. Fix 0 ≤ j ≤ q − 1. Define
S_j = {0, 1, . . . , j − 1, j + a(j)q, j + a(j)q + 1, . . . , n − 1}.
Then
{0, 1, . . . , n − 1} = {j + rq + i | 0 ≤ r ≤ a(j) − 1, 0 ≤ i ≤ q − 1} ∪ S_j
and card(S_j) ≤ 2q.
3. (a(j) − 1)q + j ≤ n − q, for 0 ≤ j ≤ q − 1.
Example 8.1.1 Let us work out what is going on in lemma 8.1.4 for n = 10, q = 4. Then a(0) = ⌊10/4⌋ = 2 and, similarly, a(1) = 2, a(2) = 2 and a(3) = 1. This gives us S_0 = {8, 9}, S_1 = {0, 9}, S_2 = {0, 1} and S_3 = {0, 1, 2, 7, 8, 9}. For example, S_3 is obtained by taking all nonnegative integers smaller than 3 and adding the numbers 3 + a(3)q = 7, 3 + a(3)q + 1 = 8 and 3 + a(3)q + 2 = 9. Note that card(S_j) ≤ 8. One can readily check all properties stated in the lemma, e.g. (a(3) − 1)q + 3 = (⌊7/4⌋ − 1)·4 + 3 = 3 ≤ 6 = n − q.
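The bookkeeping in lemma 8.1.4 is easy to verify mechanically; the following check (our own, mirroring the lemma's notation) confirms the decomposition and the bound card(S_j) ≤ 2q for any admissible n and q:

```python
def a_of(n, q, j):
    # a(j) = floor((n - j) / q)
    return (n - j) // q

def S_of(n, q, j):
    aj = a_of(n, q, j)
    return set(range(j)) | set(range(j + aj * q, n))

def check(n, q):
    # verify the monotonicity, decomposition and cardinality bound
    a_vals = [a_of(n, q, j) for j in range(q)]
    assert all(a_vals[i] >= a_vals[i + 1] for i in range(q - 1))
    for j in range(q):
        blocks = {j + r * q + i for r in range(a_of(n, q, j)) for i in range(q)}
        Sj = S_of(n, q, j)
        assert blocks | Sj == set(range(n)) and not (blocks & Sj)
        assert len(Sj) <= 2 * q
    return True
```

Running `check(10, 4)` reproduces the example above, and the same verification goes through for any 1 < q < n.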
We shall now finish the proof of our main theorem. The proof is presented in a logical order, which is in this case not quite the same as thinking order. Therefore, before plunging into a formal proof, we will briefly discuss the main ideas.
We would like to construct a Borel probability measure μ with measure theoretic entropy h_μ(T) ≥ sep(ε, T) := lim sup_{n→∞} (1/n) log(sep(n, ε, T)). To do this, we first find a measure σ_n with H_{σ_n}(∨_{i=0}^{n-1} T^{-i}α) equal to log(sep(n, ε, T)), where α is a suitably chosen partition. This is not too difficult. The problem is to find an element of M(X, T) with this property. Theorem (6.1.5) suggests a suitable measure μ which can be obtained from the σ_n. The trick in lemma 8.1.4 plays a crucial role in getting from an estimate for H_{σ_n} to one for H_μ. The idea is to first remove the tails S_j of the partition ∨_{i=0}^{n-1} T^{-i}α, make a crude estimate for these tails and later add them back on again. Lemma 8.1.3 fills in the remaining technicalities.
Theorem 8.1.2 Let T : X → X be a continuous transformation of a compact metric space. Then h(T) ≤ sup{h_μ(T) | μ ∈ M(X, T)}.
Proof
Fix ε > 0. For each n, let E_n be an (n, ε)-separated set of cardinality sep(n, ε, T). Define σ_n ∈ M(X) by σ_n = (1/sep(n, ε, T)) Σ_{x∈E_n} δ_x, where δ_x is the Dirac measure concentrated at x. Define μ_n ∈ M(X) by μ_n = (1/n) Σ_{i=0}^{n-1} σ_n ∘ T^{-i}. Since M(X) is compact in the weak topology, there exists a subsequence {μ_{n_j}}_{j=1}^∞ such that {μ_{n_j}} converges in M(X) to some μ ∈ M(X). By theorem (6.1.5), μ ∈ M(X, T). We will show that h_μ(T) ≥ sep(ε, T), from which the result clearly follows.
By (2) of lemma 8.1.3, we can find a partition α = {A_1, . . . , A_k} of X such that diam(A_j) < ε and μ(∂A_j) = 0, for j = 1, . . . , k. We may assume that every x ∈ E_n is contained in some A_j. Now, if A ∈ ∨_{i=0}^{n-1} T^{-i}α, then A cannot contain more than one element of E_n. Hence, σ_n(A) = 0 or 1/sep(n, ε, T). Since ∪_{j=1}^k A_j contains all of E_n, we see that
H_{σ_n}(∨_{i=0}^{n-1} T^{-i}α) = log(sep(n, ε, T)).
Fix integers q, n with 1 < q < n and define for each 0 ≤ j ≤ q − 1 the number a(j) as in lemma 8.1.4. Fix 0 ≤ j ≤ q − 1. Since
∨_{i=0}^{n-1} T^{-i}α = (∨_{r=0}^{a(j)-1} T^{-(rq+j)}(∨_{i=0}^{q-1} T^{-i}α)) ∨ (∨_{i∈S_j} T^{-i}α),
we find that
log(sep(n, ε, T)) = H_{σ_n}(∨_{i=0}^{n-1} T^{-i}α)
≤ Σ_{r=0}^{a(j)-1} H_{σ_n}(T^{-(rq+j)}(∨_{i=0}^{q-1} T^{-i}α)) + Σ_{i∈S_j} H_{σ_n}(T^{-i}α)
≤ Σ_{r=0}^{a(j)-1} H_{σ_n∘T^{-(rq+j)}}(∨_{i=0}^{q-1} T^{-i}α) + 2q log k.
Here we used proposition (4.2.1)(d) and lemma 8.1.1. Now if we sum the above inequality over j and divide both sides by n, we obtain by (3) and (1) of lemma 8.1.4, together with lemma 8.1.2,
(q/n) log(sep(n, ε, T)) ≤ (1/n) Σ_{l=0}^{n-1} H_{σ_n∘T^{-l}}(∨_{i=0}^{q-1} T^{-i}α) + (2q²/n) log k
≤ H_{μ_n}(∨_{i=0}^{q-1} T^{-i}α) + (2q²/n) log k.
By (3) of lemma 8.1.3, each atom A of ∨_{i=0}^{q-1} T^{-i}α has boundary of μ-measure zero, so by (4) of the same lemma, lim_{j→∞} μ_{n_j}(A) = μ(A). Hence, if we replace n by n_j in the above inequality and take the limit for j → ∞, we obtain
q·sep(ε, T) ≤ H_μ(∨_{i=0}^{q-1} T^{-i}α).
If we now divide both sides by q and take the limit for q → ∞, we get the desired inequality.
Corollary 8.1.1 (The Variational Principle) The topological entropy of a continuous transformation T : X → X of a compact metric space X is given by h(T) = sup{h_μ(T) | μ ∈ M(X, T)}.
To get a taste of the power of this statement, let us recast our proof of the invariance of topological entropy under conjugacy ((2) of theorem 7.2.4). Let ψ : X_1 → X_2 denote the conjugacy. We note that μ ∈ M(X_1, T_1) if and only if μ ∘ ψ^{-1} ∈ M(X_2, T_2), and we observe that h_μ(T_1) = h_{μ∘ψ^{-1}}(T_2). The result now follows from the Variational Principle. It is that simple.
8.2 Measures of Maximal Entropy
The Variational Principle suggests an educated way of choosing a Borel probability measure on X, namely one that maximizes the entropy of T.
Definition 8.2.1 Let X be a compact metric space and T : X → X be continuous. A measure μ ∈ M(X, T) is called a measure of maximal entropy if h_μ(T) = h(T). Let M_max(X, T) = {μ ∈ M(X, T) | h_μ(T) = h(T)}. If M_max(X, T) = {μ}, then μ is called a unique measure of maximal entropy.
Example. Recall that the topological entropy of the circle rotation R_θ is given by h(R_θ) = 0. Since h_μ(R_θ) ≥ 0 for all μ ∈ M(X, R_θ), we see that every μ ∈ M(X, R_θ) is a measure of maximal entropy, i.e. M_max(X, R_θ) = M(X, R_θ). More generally, we have h(T) = 0 and hence M_max(X, T) = M(X, T) for any continuous isometry T : X → X.
Measures of maximal entropy are closely connected to (uniquely) ergodic
measures, as will become apparent from the following theorem.
Theorem 8.2.1 Let X be a compact metric space and T : X → X be continuous. Then
1. M_max(X, T) is a convex set.
2. If h(T) < ∞, then the extreme points of M_max(X, T) are precisely the ergodic members of M_max(X, T).
3. If h(T) = ∞, then M_max(X, T) ≠ ∅. If, moreover, T has a unique measure of maximal entropy, then T is uniquely ergodic.
4. A unique measure of maximal entropy is ergodic. Conversely, if T is uniquely ergodic, then T has a measure of maximal entropy.
Proof
(1): Let p ∈ [0, 1] and μ, ν ∈ M_max(X, T). Then, by exercise 8.1.3,
h_{pμ+(1−p)ν}(T) = p·h_μ(T) + (1 − p)·h_ν(T) = p·h(T) + (1 − p)·h(T) = h(T).
Hence, pμ + (1 − p)ν ∈ M_max(X, T).
(2): Suppose μ ∈ M_max(X, T) is ergodic. Then, by theorem 6.1.6, μ cannot be written as a non-trivial convex combination of elements of M(X, T). Since M_max(X, T) ⊂ M(X, T), μ is an extreme point of M_max(X, T). Conversely, suppose μ is an extreme point of M_max(X, T) and suppose there is a p ∈ (0, 1) and μ_1, μ_2 ∈ M(X, T) such that μ = pμ_1 + (1 − p)μ_2. By exercise 8.1.3,
h(T) = h_μ(T) = p·h_{μ_1}(T) + (1 − p)·h_{μ_2}(T).
But by the Variational Principle, h_{μ_i}(T) ≤ h(T) for i = 1, 2, so h(T) = h_{μ_1}(T) = h_{μ_2}(T). In other words, μ_1, μ_2 ∈ M_max(X, T), thus μ = μ_1 = μ_2. Therefore, μ is also an extreme point of M(X, T) and we conclude that μ is ergodic.
For the second part, suppose that M_max(X, T) ≠ ∅.
(3): By the Variational Principle, for any n ∈ Z_{>0} we can find a μ_n ∈ M(X, T) such that h_{μ_n}(T) > 2^n. Define μ ∈ M(X, T) by
μ = Σ_{n=1}^∞ μ_n/2^n.
For any N, write
μ = Σ_{n=1}^N μ_n/2^n + Σ_{n=N+1}^∞ μ_n/2^n.
But then, by exercise 8.1.3,
h_μ(T) ≥ Σ_{n=1}^N h_{μ_n}(T)/2^n > N.
Since N was arbitrary, h_μ(T) = ∞ = h(T), i.e. μ ∈ M_max(X, T).
Now suppose that T has a unique measure of maximal entropy μ. Then, for any ν ∈ M(X, T), h_{ν/2+μ/2}(T) = ½·h_ν(T) + ½·h_μ(T) = ∞. Hence, ν/2 + μ/2 = μ, i.e. ν = μ and M(X, T) = {μ}.
(4): The first statement follows from (2), for h(T) < ∞, and (3), for h(T) = ∞. If T is uniquely ergodic, then M(X, T) = {μ} for some μ. By the Variational Principle, h_μ(T) = h(T).
So T^l A is a (2l + n, ε)-separated set for T of cardinality sep(n, ε, T). Hence,
sep(ε, T) = lim sup_{n→∞} (1/n) log(sep(n, ε, T))
≤ lim sup_{n→∞} (1/n) log(sep(2l + n, ε, T))
≤ lim sup_{n→∞} ((2l + n)/n) · (1/(2l + n)) log(sep(2l + n, ε, T))
= sep(ε, T).
Bibliography
[B]
[D]
[DF]
[De]
[G]
[Ho] Hoppensteadt, F.C., Analysis and Simulation of Chaotic Systems, Springer-Verlag, New York, 1993.
[H]
[KK]
[KT] Kingman, J.F.C. and Taylor, S.J., Introduction to Measure and Probability, Cambridge University Press, 1966.
[M]
[Mu]
[Pa]
[P]
[W]
Index
sep(n, ε, T), 111
span(n, ε, T), 111
ergodic decomposition, 40
expansive, 106
constant, 106
extreme point, 92
monotone class, 7
factor map, 51
first return time, 16
fixed point, 105
orbit, 103
forward, 103
natural extension, 52
non-singular transformation, 80
one-sided, 103
translations, 9
uniquely ergodic, 96
Variational Principle, 136
weakly mixing, 45