

GRADUATE STUDIES IN MATHEMATICS 228

Topological and
Ergodic Theory of
Symbolic Dynamics

Henk Bruin
Editorial Committee
Matthew Baker
Marco Gualtieri
Gigliola Staffilani (Chair)
Jeff A. Viaclovsky
Rachel Ward

2020 Mathematics Subject Classification. Primary 37B10; Secondary 37B05, 28D05, 11J70, 68R15.

For additional information and updates on this book, visit

Library of Congress Cataloging-in-Publication Data

Names: Bruin, Henk, 1966- author.
Title: Topological and ergodic theory of symbolic dynamics / Henk Bruin.
Description: Providence, Rhode Island : American Mathematical Society, [2022] | Series: Graduate studies in mathematics, 1065-7339 ; 228 | Includes bibliographical references and index.
Identifiers: LCCN 2022034733 | ISBN 9781470469849 (hardcover) | ISBN 9781470472191 (paperback) | ISBN 9781470472184 (ebook)
Subjects: LCSH: Symbolic dynamics. | Topological dynamics. | Ergodic theory. | AMS: Dynamical systems and ergodic theory – Topological dynamics – Symbolic dynamics. | Dynamical systems and ergodic theory – Topological dynamics – Transformations and group actions with special properties (minimality, distality, proximality, etc.). | Measure and integration – Measure-theoretic ergodic theory – Measure-preserving transformations. | Number theory – Diophantine approximation, transcendental number theory – Continued fractions and generalizations. | Computer science – Discrete mathematics in relation to computer science – Combinatorics on words.
Classification: LCC QA614.85 .B78 2022 | DDC 515/.39–dc23/eng20221021
LC record available at

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting
for them, are permitted to make fair use of the material, such as to copy select pages for use
in teaching or research. Permission is granted to quote brief passages from this publication in
reviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publication
is permitted only under license from the American Mathematical Society. Requests for permission
to reuse portions of AMS publication content are handled by the Copyright Clearance Center. For
more information, please visit
Send requests for translation rights and licensed reprints to

© 2022 by the author. All rights reserved.
Printed in the United States of America.

∞ The paper used in this book is acid-free and falls within the guidelines
established to ensure permanence and durability.
Visit the AMS home page at
10 9 8 7 6 5 4 3 2 1 27 26 25 24 23 22

Contents

Preface

Chapter 1. First Examples and General Properties of Subshifts

§1.1. Symbol Sequences and Subshifts
§1.2. Word-Complexity
§1.3. Transitive and Synchronized Subshifts
§1.4. Sliding Block Codes
§1.5. Word-Frequencies and Shift-Invariant Measures
§1.6. Symbolic Itineraries

Chapter 2. Topological Dynamics

§2.1. Basic Notions from Dynamical Systems
§2.2. Transitive and Minimal Systems
§2.3. Equicontinuous and Distal Systems
§2.4. Topological Entropy
§2.5. Mathematical Chaos
§2.6. Transitivity and Topological Mixing
§2.7. Shadowing and Specification

Chapter 3. Subshifts of Positive Entropy

§3.1. Subshifts of Finite Type
§3.2. Sofic Shifts
§3.3. Coded Subshifts
§3.4. Hereditary and Density Shifts
§3.5. β-Shifts and β-Expansions
§3.6. Unimodal Subshifts
§3.7. Gap Shifts
§3.8. Spacing Shifts
§3.9. Power-Free Shifts
§3.10. Dyck Shifts

Chapter 4. Subshifts of Zero Entropy

§4.1. Linear Recurrence
§4.2. Substitution Shifts
§4.3. Sturmian Subshifts
§4.4. Interval Exchange Transformations
§4.5. Toeplitz Shifts
§4.6. B-Free Shifts
§4.7. Unimodal Restrictions to Critical Omega-Limit Sets

Chapter 5. Further Minimal Cantor Systems

§5.1. Kakutani-Rokhlin Partitions
§5.2. Cutting and Stacking
§5.3. Enumeration Systems
§5.4. Bratteli Diagrams and Vershik Maps

Chapter 6. Methods from Ergodic Theory

§6.1. Ergodicity
§6.2. Birkhoff’s Ergodic Theorem
§6.3. Unique Ergodicity
§6.4. Measure-Theoretic Entropy
§6.5. Isomorphic Systems
§6.6. Measures of Maximal Entropy
§6.7. Mixing
§6.8. Spectral Properties
§6.9. Eigenvalues of Bratteli-Vershik Systems

Chapter 7. Automata and Linguistic Complexity

§7.1. Automata
§7.2. The Chomsky Hierarchy
§7.3. Automatic Sequences and Cobham’s Theorems

Chapter 8. Miscellaneous Background Topics

§8.1. Pisot and Salem Numbers
§8.2. Continued Fractions
§8.3. Uniformly Distributed Sequences
§8.4. Diophantine Approximation
§8.5. Density and Banach Density
§8.6. The Perron-Frobenius Theorem
§8.7. Countable Graphs and Matrices

Appendix. Solutions to Exercises
Bibliography
Index

Preface

Symbolic dynamics and coding: For many students in mathematics, symbolic dynamics appears first in a general course on dynamical systems, as a symbolic description of Smale’s horseshoe, basic sets of Axiom A attractors, or toral automorphisms. For these systems there is a Markov partition, and on a symbolic level they are described by a subshift of finite type (SFT). That symbolic dynamics can lead to other subshifts (e.g. Sturmian shifts, β-shifts, kneading theory) often remains beyond view or is only covered in specialized topics courses. Also, a standard undergraduate program will not reveal that symbolic dynamics is a flourishing subject in its own right, offering very flexible approaches to constructing examples with particular topological and ergodic properties. This book is meant to give an overview of many of these aspects of symbolic sequences and coding and to allow readers to look beyond their own initial interest. That some topics (in particular ergodic theory and also kneading theory, i.e. symbolic dynamics of unimodal interval maps) are studied at considerably greater depth than others has to do with my own research background and interests.
Because $\{0, 1\}^{\mathbb{N}}$ and $\{0, 1\}^{\mathbb{Z}}$ are standard representations of the Cantor set, symbolic dynamics is in essence the study of transformations of Cantor sets. When I once asked my second thesis supervisor, Jan Aarts, whether there was a general topological classification of such Cantor systems, his answer was “no”. Had I been able to ask this question again, his answer might have been very different. Not only has there been an explosion of new types of subshifts that have been investigated in detail, but several unified ways to study them and Cantor systems as a whole have been developed as well. That a complete classification is still lacking nowadays seems less important, or of rather abstract interest, than how modern techniques help us to understand concrete (symbolic) systems.


What you will find in this book:

Chapter 1: Chapter 1 presents the basic notions in symbolic dynamics, including itineraries, word-complexity, and word-frequency (and a first mention of shift-invariant measures), and it presents some of the simplest examples. In Section 1.4 we discuss the Curtis-Hedlund-Lyndon Theorem 1.23 on conjugacy between subshifts given by sliding block codes.
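To make the notion concrete: a sliding block code computes each output symbol from a fixed finite window of the input sequence. The following is a minimal sketch on finite prefixes of one-sided sequences, assuming a window with anticipation only (no memory); the function names and the 2-block rule `xor_rule` are hypothetical choices for illustration, not taken from Section 1.4.

```python
from typing import Callable


def sliding_block_code(x: str, block_map: Callable[[str], str], window: int) -> str:
    """Apply y_i = block_map(x_i ... x_{i+window-1}) along a finite prefix x.

    Anticipation only (no memory), so this suits one-sided sequences.
    The output is window - 1 symbols shorter than the input prefix.
    """
    return "".join(block_map(x[i:i + window]) for i in range(len(x) - window + 1))


def xor_rule(w: str) -> str:
    """Hypothetical 2-block map on {0, 1}: y_i = x_i + x_{i+1} mod 2."""
    return str((int(w[0]) + int(w[1])) % 2)


x = "0110100110010110"
y = sliding_block_code(x, xor_rule, 2)
# The code commutes with the shift: code(sigma(x)) = sigma(code(x)).
assert sliding_block_code(x[1:], xor_rule, 2) == y[1:]
```

Commuting with the shift, together with continuity, is exactly what characterizes sliding block codes; this is the content of the Curtis-Hedlund-Lyndon Theorem.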
Chapter 2: Chapter 2 gives a brief introduction to topological dynamics. We start with dynamical systems and their basic terminology. Then we present transitive (i.e. dynamically indecomposable) systems, followed by minimal systems, which can be characterized by bounded gaps in the sequence of visit times to open sets. We discuss the distinction between expansive and expanding, and between expansive and distal (and equicontinuity, which is achieved under the appropriate metric). Mean equicontinuity is a more flexible and important variation of this. In addition to topological entropy and power entropy (which are the logarithmic and polynomial growth rates of the word-complexity), we discuss the fairly new notion of amorphic complexity, which is a finer tool than power entropy for distinguishing zero entropy systems, such as constant length substitution shifts and Toeplitz shifts. We discuss several forms of mathematical chaos (in the sense of Devaney, of Li-Yorke, and of Auslander-Yorke for minimal systems) and the Auslander-Yorke dichotomy, with some of its measure-theoretic versions. We present topological mixing in various strengths, and shadowing and specification properties, from Anosov’s and Bowen’s work to their use in intrinsic ergodicity, i.e. the existence of a unique measure of maximal entropy.
Chapter 3: The main types of subshifts are divided into positive entropy and zero entropy subshifts, although this distinction doesn’t apply strictly to every type; for instance, zero and positive entropy Toeplitz shifts and B-free shifts might be equally important. Chapter 3 covers the positive entropy subshifts. The main class of these is the subshifts of finite type, followed by sofic shifts. Both can be characterized by transition graphs (vertex-labeled versus edge-labeled), and their topological entropy can be computed simply as the logarithm of the leading (Perron-Frobenius) eigenvalue of their transition matrices; therefore it takes only specific values (the logarithms of Perron numbers). More general types of subshifts, in which all values of topological entropy can be achieved, include coded shifts, density shifts, gap shifts, and spacing shifts, which probably appear here for the first time at textbook level. Directly related to interval maps (β-transformations and unimodal maps) are β-shifts and kneading theory. Our treatment of the latter is broader than in the (by now 40-year-old) monographs of Milnor & Thurston [420] or Collet & Eckmann [164]. We cover at length Hofbauer’s approach of cutting times, which is closely related to the enumeration systems of Section 5.3. Infinitely renormalizable unimodal maps, ∗-products, and (strange) adding machines are deferred to Section 4.7.1 because they fit better in the zero entropy chapter. Finally, we present the square-free subshifts and Dyck shifts, which are of more interest in automata theory than in dynamical systems.
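The claim that the entropy of an SFT is the logarithm of the leading eigenvalue of its transition matrix can be checked numerically. A minimal sketch, assuming the matrix is primitive (so that plain power iteration converges); the function name is hypothetical, and the example matrix is the transition matrix of the shift with no two consecutive 1’s, whose leading eigenvalue is the golden mean $(1+\sqrt{5})/2$.

```python
import math


def perron_eigenvalue(M: list[list[int]], iterations: int = 200) -> float:
    """Leading eigenvalue of a non-negative primitive matrix, by power iteration.

    The vector v is rescaled so that max(v) = 1; once v is close to the
    Perron eigenvector, max(M v) is close to the leading eigenvalue.
    """
    n = len(M)
    v = [1.0] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [wi / lam for wi in w]
    return lam


# Transition matrix of the SFT on {0, 1} forbidding the word 11.
no_11 = [[1, 1], [1, 0]]
entropy = math.log(perron_eigenvalue(no_11))  # log((1 + sqrt(5)) / 2)
```

Power iteration is used only because it is short; any eigenvalue routine would do for small matrices.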
Chapter 4: Zero entropy subshifts are covered in Chapter 4, starting with linear recurrence, which is a property rather than a class of subshifts. From linear recurrence various properties can be derived, including unique ergodicity, linear complexity, and the absence of arbitrarily high powers of subwords. Substitution shifts are a main class of zero entropy subshifts that have been studied with regard to their intriguing ergodic and spectral properties, as demonstrated in Chapter 6. They are central in the theory of tiling spaces, (Rauzy) fractals, and applications such as paper-folding sequences. Their generalization to S-adic shifts is powerful enough to describe most minimal subshifts. The second main class of zero entropy subshifts consists of the Sturmian shifts, which are almost one-to-one extensions of circle rotations (via Denjoy’s construction; see Section 4.3.1) and which also appear in many problems in combinatorics. Sturmian sequences have the threefold characterization as symbolic dynamics of irrational circle rotations, as sequences of minimal non-trivial word-complexity p(n) = n + 1, and as minimal 1-balanced sequences. We use them to introduce Rauzy graphs (see Section 4.3.4) and also give a detailed account of their representation as S-adic shifts (generalized to Arnoux-Rauzy shifts of torus rotations). Circle rotations generalize naturally to interval exchange transformations (IETs), and the corresponding subshifts have word-complexity p(n) = (d − 1)n + 1, where d is the number of intervals of the IET. We discuss Rauzy induction, which leads to their representation as S-adic shifts; this will later (Section 6.3.5) be used to give a solution of the Keane conjecture on the typical unique ergodicity of IETs, bypassing the use of Teichmüller flows in the original proofs of Masur [410] and Veech [543]. Sections 4.5 and 4.6 combine adding machines (odometers), which are not actual subshifts as they are not expansive, with Toeplitz shifts and B-free subshifts, because the latter two are, under mild conditions, almost one-to-one extensions of odometers. The final Section 4.7 comes largely from some of my own research papers (with or without coauthors) in topological dynamics. It shows how some minimal subshifts (known and new) arise naturally in the setting of unimodal interval maps, but, unlike the section on kneading theory, they fit in the zero entropy chapter.
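The characterization of Sturmian sequences by word-complexity p(n) = n + 1 is easy to observe experimentally. A minimal sketch, assuming the coding $x_n = \lfloor (n+1)\alpha + \beta \rfloor - \lfloor n\alpha + \beta \rfloor$ of an irrational rotation (one common convention, chosen here for illustration) with $\alpha = (\sqrt{5}-1)/2$, evaluated in floating point; counting the distinct n-words in a long finite prefix only estimates p(n), and the function names are hypothetical.

```python
import math


def rotation_coding(alpha: float, length: int, beta: float = 0.0) -> str:
    """x_n = floor((n+1)*alpha + beta) - floor(n*alpha + beta) for 0 <= n < length.

    For 0 < alpha < 1 this is 1 exactly when the rotation orbit point
    {n*alpha + beta} lies in [1 - alpha, 1), and 0 otherwise.
    """
    return "".join(
        str(math.floor((n + 1) * alpha + beta) - math.floor(n * alpha + beta))
        for n in range(length)
    )


def complexity(x: str, n: int) -> int:
    """Number of distinct n-words occurring in the finite word x."""
    return len({x[i:i + n] for i in range(len(x) - n + 1)})


x = rotation_coding((math.sqrt(5) - 1) / 2, 10_000)
counts = [complexity(x, n) for n in range(1, 11)]  # expected: n + 1 for each n
```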
Chapter 5: Chapter 5 discusses minimal Cantor systems that are not necessarily subshifts and presents three methods of describing them, namely cutting and stacking, enumeration systems, and Bratteli-Vershik systems. In Section 5.1 we show a basic construction of Kakutani-Rokhlin partitions, probably first described in general form by Herman, Putnam & Skau [310], that can be used as a building block to translate (virtually) any minimal system into any of these three descriptions. Cutting and stacking was used by Kakutani, Chacon, and others to produce the earliest examples of uniquely ergodic systems with particular (weak) mixing or spectral properties. Enumeration systems are a generalization of the Ostrowski numeration (see Example 5.22), which is based on the standard continued fraction expansion. Liardet and various coauthors studied these in detail, obtaining, among other things, novel results on unique ergodicity. Under certain conditions (related to Pisot numbers), the method also gives a way of associating Rauzy fractals to particular minimal Cantor systems; see Section 5.3.1. The discussion of Bratteli-Vershik (BV) systems is the most extensive section in this chapter, partly because we give explicit constructions of various subshifts, and also of cutting and stacking and of enumeration systems, in terms of BV-systems. In this form, these systems frequently return in Chapter 6, where ergodic and spectral properties of minimal Cantor systems are studied. It is worth mentioning that Gjerde & Johansen’s description [275] of IETs in fact follows from Rauzy induction described in Section 4.4. The description of Toeplitz shifts in terms of a BV-system is also given by Gjerde & Johansen [274], and we largely follow their exposition.
Chapter 6: Chapter 6 treats the properties of invariant measures of subshifts and other (minimal) Cantor systems. After briefly recalling the notions of ergodicity, the Choquet (and Poulsen) simplex of invariant probability measures, and Birkhoff’s Ergodic Theorem, we discuss unique ergodicity in Section 6.3. Special topics here are Boshernitzan’s results based on low word-complexity, the “classical” proof of unique ergodicity of primitive substitution shifts, unique ergodicity of BV-systems based on contraction in the Hilbert metric, and a proof of unique ergodicity of typical IETs (the Keane conjecture), based on Rauzy induction. In Section 6.4 we discuss measure-theoretic entropy, preparing our discussion of metrically isomorphic systems and Ornstein’s Theorem in Section 6.5. We mention the Variational Principle and measures of maximal entropy in Section 6.6 and discuss the notion of entropy denseness. We also present the Shannon-Parry measure (the measure of maximal entropy for SFTs).

In Section 6.7 we recall proofs that many minimal subshifts cannot be strongly mixing (i.e. the correlation coefficients cannot tend to zero). This applies to substitution shifts, or more generally linearly recurrent subshifts, and also to cutting and stacking systems with a bounded number of layers of spacer. In contrast, for staircase systems (i.e. cutting and stacking with an unbounded number of slices in between the stacks, without bound on the number of layers of spacer) strong mixing can occur. This was conjectured by Smorodinsky and proved by Adams, and we present the result in detail. In Section 6.7.3 we discuss the notion of weak mixing, which in spectral terms means that constant functions are the only eigenfunctions of the Koopman operator. The search for (continuous or measurable) eigenfunctions for various subshifts, specifically substitution shifts, goes back to Dekking, Keane, Queffélec, Host, and coauthors. We follow the more recent approach of Durand, Maass, and coauthors and give detailed proofs of how the condition $|||\alpha h_v(n)||| := d(\alpha h_v(n), \mathbb{Z}) \to 0$ as $n \to \infty$ (where $h_v(n)$ are the height vectors of the Bratteli diagram) relates to $e^{2\pi i\alpha}$ being a continuous or measurable eigenvalue of the Koopman operator.

In Section 6.8 we study spectral properties of Cantor systems, that is, the properties that can be expressed in terms of the spectrum of the Koopman operator $U_T : L^2(\mu) \to L^2(\mu)$, $U_T f = f \circ T$. This includes the spectral measures (Fourier transforms) of observables $f \in L^2(\mu)$ that are important for the spectral decomposition of the Hilbert space $L^2(\mu)$ and of the Koopman operator itself. We discuss the spectral measure and spectral type of $U_T$ itself, in particular conditions under which Cantor systems have pure point spectrum or mixed spectrum.

Chapter 7: Chapter 7 discusses ways in which symbol sequences play a role in computer science, information theory, and data transmission. Automata represent a theoretical model for computers and a concrete way of generating sequences, which are hence called automatic sequences. Fixed points of constant-length substitutions are automatic, by Cobham’s Little Theorem. We discuss automata and automatic sequences in Sections 7.1 and 7.3. In Section 7.2 we present the Chomsky hierarchy of formal languages, consisting of four basic levels of complexity of their underlying grammar (production rules) and of the type of automaton that can detect or produce them. The lowest level, regular languages, can be produced by finite automata and also corresponds to the sofic subshifts discussed in Section 3.2. Pumping lemmas are a major tool for distinguishing between these levels, and they show that most other types of subshifts discussed in earlier chapters are in the context-sensitive category or beyond.
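As an illustration of Cobham’s Little Theorem, the Thue-Morse sequence is the fixed point starting with 0 of the constant-length substitution 0 → 01, 1 → 10, and it is also produced by a two-state automaton that reads the binary digits of n and toggles its state at each digit 1. The sketch below (the function names are hypothetical) generates the sequence both ways and compares them.

```python
def thue_morse_substitution(k: int) -> str:
    """Apply the constant-length substitution 0 -> 01, 1 -> 10 k times to 0."""
    rules = {"0": "01", "1": "10"}
    word = "0"
    for _ in range(k):
        word = "".join(rules[a] for a in word)
    return word


def thue_morse_automaton(n: int) -> str:
    """Two-state automaton: read n in base 2, toggle the state on every digit 1."""
    state = "0"
    for digit in bin(n)[2:]:
        if digit == "1":
            state = "1" if state == "0" else "0"
    return state  # the output is the label of the final state


w = thue_morse_substitution(10)  # the first 2**10 symbols
assert w == "".join(thue_morse_automaton(n) for n in range(len(w)))
```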
Chapter 8: Here we give background material to which earlier chapters frequently refer, but some of its sections comprise short topics of independent interest as well. Section 8.1 on Pisot numbers, apart from giving definitions and some historical background, addresses the question for which $\alpha \in \mathbb{R}$ we have $|||\alpha G_n||| := d(\alpha G_n, \mathbb{Z}) \to 0$ as $n \to \infty$, where $(G_n)$ is a sequence of integers satisfying a linear recursion formula (as the height vectors of the Bratteli diagram do). In Section 8.2 we discuss the standard continued fraction algorithm, as well as Farey arithmetic, and associated ways (Kepler, Calkin-Wilf) of enumerating the rationals.
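One compact way to list the Calkin-Wilf enumeration of the positive rationals is Newman’s successor formula $x \mapsto 1/(2\lfloor x \rfloor - x + 1)$, starting from 1. The following sketch uses exact rational arithmetic from Python’s `fractions` module; it illustrates the enumeration only and is not meant to reproduce the construction used in Section 8.2 (the function name is hypothetical).

```python
import math
from fractions import Fraction


def calkin_wilf(count: int) -> list[Fraction]:
    """First `count` terms of the Calkin-Wilf enumeration of the positive rationals."""
    x = Fraction(1)
    terms = []
    for _ in range(count):
        terms.append(x)
        x = 1 / (2 * math.floor(x) - x + 1)  # Newman's successor formula
    return terms


print([str(q) for q in calkin_wilf(7)])  # ['1', '1/2', '2', '1/3', '3/2', '2/3', '3']
```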
Section 8.3 covers uniformly distributed sequences, Weyl’s criterion, and Van der Corput’s Difference Theorem. Section 8.4 on Diophantine approximation, in addition to definitions and some historical background, discusses Dirichlet’s Theorem, the Lagrange spectrum, and Roth’s Theorem.
Section 8.5 covers the (Banach) density and logarithmic density
(important in B-free subshifts) of sequences.
In Section 8.6 we present the Perron-Frobenius Theorem 8.58,
on the leading eigenvalue and eigenspace of non-negative matrices,
also in connection with the Hilbert metric on projective space. In
Section 8.7 we treat countable Markov graphs and matrices and
present the Vere-Jones classification into transient, positive recur-
rent, and null recurrent systems, including Gurevich entropy and
refined classifications by Salama and Ruette. We also give some
results on the (non-)existence of measures of maximal entropy for
countable state Markov chains. Section 8.7.3 presents the rome
technique from the paper [88] by Block et al., which facilitates en-
tropy computations for countable Markov graphs.

The scope and aim of the book: Although the text grew from two courses I gave on symbolic dynamics at the University of Vienna, there is no attempt to shape it as a textbook for a course on symbolic dynamics. The material is too diverse and hardly balanced in depth and detail, and I have made no attempt to indicate which sections together would constitute such a course. My experience is that hardly any book, however well-conceived and written, completely fits the purpose and taste of the lecturers teaching the actual class. Instead, I hope this book can serve a purpose for topics courses or reading courses or as a reference book for anyone wishing to become acquainted with a particular topic.
Likewise, the exercises are not devised as a tool for testing the students’ understanding of the material. Symbolic dynamics is at the intersection of dynamical systems, topological dynamics, combinatorics, and of course coding theory, and there are a lot of trivia to share. Some of these trivia are disguised as exercises. Most of the exercises have solutions in the back of the book. I have given, probably multiple times, proofs of simple results that are used as exercises in comparable books, but I wanted to avoid the annoying situation that one cannot refer to a well-known result because it is only presented as an exercise without solution.
The necessary background for the book varies: for most of it, a solid knowledge of real analysis and linear algebra and first courses in probability and measure theory (a few times, conditional expectations are used, and martingales in the proof of Theorem 6.118), metric spaces, number theory, topology, and set theory suffice. Chapter 6 is not meant as an introduction to ergodic theory, so a course in ergodic theory (and, for Section 6.8, in Hilbert spaces and Fourier analysis) is probably necessary. Section 8.1 uses some Galois theory.
By adding an extensive index and cross-references within the main text,
I tried to enable the reader to study the chapters independently. However,
readers without any prior knowledge of symbolic dynamics should not skip
(the first halves of) Chapters 1 and 2. To follow Chapter 5 one should have a
good understanding of SFTs (Section 3.1), substitution shifts (Section 4.2),
and Sturmian shifts (Section 4.3). To follow Chapter 6, an additional un-
derstanding of BV-systems (Section 5.4) is required. Chapter 8 can be read
largely independently of the rest, except for several examples with direct
references to earlier parts in the text.
The book is largely, but not entirely, self-contained; making it fully self-contained would go too far, because various topics are covered much better in other textbooks. General books on ergodic theory are [165, 277, 305, 346, 408, 456, 479, 509, 551]; for proofs of Birkhoff’s Ergodic Theorem we suggest [341, 346, 349, 551], and for the proof of the Variational Principle we recommend [551]. In topological dynamics the book by Auslander is a classic (although difficult to navigate). Other texts are [12, 22, 198, 199, 284, 381]. In symbolic dynamics there are Kitchens [364] and Lind & Marcus [398], both specializing in SFTs, the more general book by Kůrka [381], and the topic collections by Blanchard et al. [85] and Bedford et al. [57]. Substitution shifts have the expert monographs Queffélec [465], “Pytheas Fogg” [249], and others by groups of authors [68]. I should not fail to mention Viana’s monograph [548] on interval exchange transformations. Continued fractions and Diophantine approximation are the subject of monographs [98, 175, 360, 377]. For general dynamical systems there are [17, 113, 346, 474], and [22, 87, 116, 164, 414, 462] for one-dimensional dynamics in particular.
Various milestones in the theory have been treated extensively by people
far more expert and well-placed than I am. For example, this holds for
Williams’s Theorem on conjugacy in subshifts of finite type; see [364, 398].
This also holds for Ornstein’s Theorem [438] that entropy is a complete
invariant among invertible Bernoulli shifts, which we only briefly introduce
in Section 6.5 because the full proof is beyond the scope of this book. We refer
to [208,209,216,352,456] for further developments, new (more conceptual)
proofs, and more detailed expositions.
How much detail is given for each topic depended on my own taste and judgment, and if I overstretched the reader and/or my own abilities and understanding, then so be it. My apologies to the true experts. Thus I decided to include Li-Yorke chaos but not distributional chaos, some versions of entropy but no dynamical ζ-functions, no IP-sets, only a few variations of shadowing and specification (see [461] for a monograph on shadowing), no higher-dimensional shifts, and no automorphism groups of shift spaces. Although the first use of symbolic dynamics by Hadamard ([296] in 1898) was for geodesic flows on modular surfaces, this topic does not appear in this book; see e.g. [494]. We cover Kakutani-Rokhlin partitions, cutting and stacking, S-adic transformations, and Bratteli-Vershik systems, but not graph covers, although they describe Cantor systems in even greater generality. See [273, 500–502] for their construction and some of their earliest uses.
Partly, this book is a compendium of bits of knowledge and curiosities that are scattered over the literature, if not over Wikipedia pages. The material that I present is not equally up to date. For instance, notions such as B-free shifts (or at least their current state) and amorphic complexity are from the past decade or even less, whereas in the section on SFTs, the material is all from before 1990. Topics that are, to my knowledge, new to textbook and monograph literature include gap shifts, spacing shifts, power-free shifts, B-free shifts, amorphic complexity, and enumeration systems. The breadth of topics allows one to see similarities of methods of proof in different subfields of symbolic dynamics. I hope that my extensive treatment of Bratteli-Vershik systems and unique ergodicity, as well as my treatment of Sturmian shifts and Rauzy fractals (redoing and sometimes reproving results of Arnoux), has some added value. The sections on weak mixing for Bratteli-Vershik systems and on Adams’s Theorem of mixing staircase systems grew out of topics I set for Master’s Theses for Silvia Radinger and Kathrin Peticzka,

Acknowledgments: In writing this book, in addition to countless articles, monographs, and survey papers, I benefited greatly from various lecture notes, conference presentations, and online lectures in the recent coronavirus-dominated situation. It is impossible to list exactly which. But most of all I would like to thank Lori Alvin, Ana Anušić, Max Auer, Jernej Činč, María Isabel Cortez, Michel Dekking, Fabien Durand, Robbert Fokkink, Gernot Greschonig, Maik Gröger, Jane Hawkins, Olena Karpel, Mike Keane, Tom Kempton, Henna Koivusalo, Cor Kraaikamp, Dominik Kwietniak, Mariusz Lemańczyk, Olga Lukina, Kathrin Peticzka, Silvia Radinger, Michel Rigo, Klaus Schmidt, Dalia Terhesiu, Jörg Thuswaldner, and Reem Yassawi for proofreading and/or answering few or many questions, although they didn’t always know it was for this book. The input of several anonymous referees and the work of the AMS publication team are also gratefully acknowledged.

Henk Bruin
Vienna, November 2, 2022

Chapter 1

First Examples and General Properties of Subshifts

Symbolic dynamics is concerned with spaces of (infinite) sequences of symbols. Such sequences can come from the symbolic description of a dynamical system, but they also have intrinsic interest. Symbol sequences are used to code messages and to process sound and images digitally, and they are the objects that computers process. The “dynamics” usually, but not exclusively, refers to a transformation σ of such sequences in the form of a shift by one unit to the left. For example,

σ(10011…) = 0011… for one-sided sequences,
σ(…011.10011…) = …0111.0011… for two-sided sequences.

That is, for a right-infinite sequence, the first symbol disappears, and all other symbols move one place to the left. For a bi-infinite sequence, the dot that indicates position zero moves one place to the right. A closed σ-invariant subset of sequences over some fixed set of symbols (the alphabet), combined with this left-shift operation σ, is called a subshift. In this chapter, we give the basic notions and examples of subshifts and discuss the number and frequency of their subwords.
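A minimal sketch of the two examples above, assuming (for illustration only) that a one-sided sequence is represented by a finite prefix string and a two-sided sequence by a finite window together with the index of the symbol at position 0; the function names are hypothetical.

```python
def shift_one_sided(prefix: str) -> str:
    """Left shift on a finite prefix of a one-sided sequence: drop x_0."""
    return prefix[1:]


def shift_two_sided(block: str, zero: int) -> tuple[str, int]:
    """Left shift on a finite window of a two-sided sequence.

    `zero` is the index in `block` of the symbol at position 0 (just right
    of the dot). The symbols stay where they are; shifting left moves the
    dot one place to the right, so only `zero` increases.
    """
    return block, zero + 1


def show_two_sided(block: str, zero: int) -> str:
    """Render the window with a dot in front of position 0."""
    return block[:zero] + "." + block[zero:]


print(shift_one_sided("10011"))                      # 0011
print(show_two_sided(*shift_two_sided("01110011", 3)))  # 0111.0011
```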

1.1. Symbol Sequences and Subshifts

Let $A$ be a finite or countable alphabet of letters. Usually $A = \{0, \dots, N-1\}$ or $\{0, 1, 2, \dots\}$, but we can use other letters and symbols too. We are interested in the space of infinite or bi-infinite sequences of letters:

$$\Sigma = A^{\mathbb{N} \text{ or } \mathbb{Z}} = \{x = (x_i)_{i \in \mathbb{N} \text{ or } \mathbb{Z}} : x_i \in A\}.$$
Such symbol strings find applications in data transmission and storage, linguistics, theoretical computer science, and also dynamical systems (symbolic dynamics). A finite string of letters, say $x_1 \cdots x_n \in A^n$, is called a word or block. A $k$-word is a word of $k$ letters, and $\epsilon$ is the empty word (of length 0). We use the notation $A^k = \{k\text{-words in } \Sigma\}$ and
$$A^* = \{\text{words of any finite length in } \Sigma, \text{ including the empty word}\}.$$
Given a subshift $(X, \sigma)$, a finite word $u$ appearing in some $x \in X$ is sometimes called a factor¹ of $x$. If $u$ is concatenated as $u = vw$, then $v$ is a prefix and $w$ a suffix of $u$.
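A cylinder set² is any set of the form
$$[e_k \cdots e_l] = \{x \in \Sigma : x_i = e_i \text{ for } k \le i \le l\}.$$
The intersection of two cylinder sets is either empty or a union of cylinder sets (finitely many if $A$ is finite); it is itself a cylinder set if the two index ranges overlap or are adjacent. The cylinder sets form a basis of the product topology on $\Sigma$; i.e. a set is open in the product topology precisely if it can be written as a union of cylinder sets.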
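The following sketch makes cylinder sets concrete, assuming a cylinder $[e_k \cdots e_l]$ is stored as the pair `(k, "e_k...e_l")` and a two-sided sequence is given as a function from positions to symbols (both are hypothetical representations, as are the function names). It checks membership and writes the intersection of two cylinders over a finite alphabet as a finite union of cylinders, filling any gap between the two index ranges with all possible symbols.

```python
from itertools import product
from typing import Callable

Cylinder = tuple[int, str]  # (k, "e_k ... e_l"): the set {x : x_i = e_i for k <= i <= l}


def in_cylinder(x: Callable[[int], str], cyl: Cylinder) -> bool:
    """Check whether x_i = e_i for all k <= i <= l."""
    k, word = cyl
    return all(x(k + j) == a for j, a in enumerate(word))


def intersect(c1: Cylinder, c2: Cylinder, alphabet: str) -> list[Cylinder]:
    """Write [c1] ∩ [c2] as a finite (possibly empty) union of cylinders.

    Assumes at least one of the two words is non-empty.
    """
    constraints: dict[int, str] = {}
    for k, word in (c1, c2):
        for j, a in enumerate(word):
            if constraints.setdefault(k + j, a) != a:
                return []  # conflicting symbols at position k + j: empty set
    lo, hi = min(constraints), max(constraints) + 1
    free = [i for i in range(lo, hi) if i not in constraints]
    result = []
    for fill in product(alphabet, repeat=len(free)):
        symbols = {**constraints, **dict(zip(free, fill))}
        result.append((lo, "".join(symbols[i] for i in range(lo, hi))))
    return result
```

For instance, the cylinder fixing 0 at position 0, intersected with the one fixing 1 at position 2, is the union of the cylinders `(0, "001")` and `(0, "011")`.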
Note that a cylinder set is both open and closed (because it is the com-
plement of the union of complementary cylinders). Sets that are both open
and closed are called clopen.
Lemma 1.1. If $2 \le \#A < \infty$, then $\Sigma = A^{\mathbb{N} \text{ or } \mathbb{Z}}$ is a Cantor set (that is, $\Sigma$ (i) is compact, (ii) has no isolated points, and (iii) has only points as connected components). If $\#A = \infty$, then $\Sigma$ is not compact, but (ii) and (iii) still hold.

Proof. (i) Set $A = \{0, 1, \dots, N-1\}$ with the discrete topology. Clearly $A$ is compact, because it is finite. Compactness of $\Sigma$ then follows from Tychonov’s Theorem.
(ii) No point is isolated, because, for arbitrary $x \in \Sigma$, the sequence of points $x^n$ defined by $x^n_i = x_i$ if $i \neq n$ and $x^n_n = x_n + 1 \pmod N$ converges to $x$ as $n \to \infty$, while $x^n \neq x$ for every $n$.
(iii) If $x \neq y$, set $n = \min\{|i| : x_i \neq y_i\}$; then $Z := \{z \in \Sigma : z_i = x_i \text{ for all } |i| \le n\}$ and $\Sigma \setminus Z$ are two disjoint non-empty clopen sets whose union is $\Sigma$, with $x \in Z$ and $y \notin Z$. Thus $x$ and $y$ cannot belong to the same connected component.
If $\#A = \infty$, then the collection $\{[a]\}_{a \in A}$ is an open cover without finite subcover, so $\Sigma$ is not compact. □

¹ We would rather not use this word because of possible confusion with a factor of a subshift (= image under a sliding block code; see Section 1.4).

² In greater generality, if $X$ is a topological space and $n \in \mathbb{N} \cup \{\infty\}$, then every set of the form $A \times X^{n-k}$ for $A \subset X^k$ is called a cylinder set. If $X = \mathbb{R}$, $n = 3$, and $A$ is a circle in $\mathbb{R}^2$, then $A \times \mathbb{R}$ is indeed a geometrical cylinder, stretching infinitely far in the $z$-direction.

Shift spaces with the product topology are metrizable. One of the usual³ metrics that generates the product topology is
$$d(x, y) = \begin{cases} 0 & \text{if } x = y, \\ 2^{-m} & \text{otherwise, for } m = \sup\{n \ge 0 : x_i = y_i \text{ for all } |i| < n\}; \end{cases} \tag{1.1}$$
so in particular $d(x, y) = 1$ if $x_0 \neq y_0$, and $\operatorname{diam}(\Sigma) = 1$. If $(x^k)_{k \in \mathbb{N}}$ is a sequence of sequences such that $x^k \to x$, then for every $m$ there is $k_0 \in \mathbb{N}$ such that $d(x^k, x) < 2^{-m}$ for every $k \ge k_0$. The definition of the metric $d$ implies that $x^k_i = x_i$ for all $|i| \le m$. In other words, $x^k \to x$ means that $x^k_{[a,b]}$ is eventually equal to $x_{[a,b]}$ on every finite window $[a, b]$.
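The shift map or left-shift $\sigma : \Sigma \to \Sigma$, defined as
$$\sigma(x)_i = x_{i+1} \quad \text{for } i \in \mathbb{N} \text{ or } \mathbb{Z},$$
is invertible on $A^{\mathbb{Z}}$ (with inverse $\sigma^{-1}(x)_i = x_{i-1}$) but non-invertible on $A^{\mathbb{N}}$. Since $d(\sigma(x), \sigma(y)) \le 2\, d(x, y)$, we can use the ε-δ definition of continuity with $\delta = \varepsilon/2$ to show that σ is uniformly continuous. This is even true if $\#A = \infty$.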
Definition 1.2. A pair (X, σ) with X ⊂ Σ and σ the left-shift is a subshift (often simply called a shift) if X is closed (in the product topology) and strongly shift-invariant; i.e. σ(X) = X. If σ is invertible, then we also stipulate that $\sigma^{-1}(X) = X$. For example, if $\Sigma = \{0,1\}^{\mathbb Z}$ and
$$x = \dots 000.111111\dots,$$
then $X = \{\sigma^n(x) : n \ge 0\}$ is not a subshift, because x ∈ X but $\sigma^{-1}(x) \notin X$.
In Examples 1.3–1.6, we use A = {0, 1}.
Example 1.3. The set $X = \{x \in \Sigma : x_i = 1 \Rightarrow x_{i+1} = 0\}$ is called the Fibonacci shift⁴. It disallows sequences with two consecutive 1’s. This Fibonacci shift is an example of a subshift of finite type (SFT); see Section 3.1.
The collection X can be represented by a graph in multiple ways:
• X is the collection of labels of infinite paths through the vertex-
labeled graph in Figure 1.1 (left). Labels are given to the vertices
of the graph, and no label is repeated.
• X is the collection of labels of infinite paths through the edge-
labeled graph in Figure 1.1 (right). Labels are given to the arrows
of the graph, and labels can be repeated (different arrows with the
same label can occur).

3 Other metrics are $d'(x,y) = \sum_i |x_i - y_i|\, 2^{-|i|}$ or $d''(x,y) = 1/m$ with m as in (1.1). They are equivalent to d(x, y): the former in the sense that there is some C such that $\frac1C d(x,y) \le d'(x,y) \le C\, d(x,y)$ for all x, y ∈ Σ, the latter in the weaker sense that the embedding $i : (\Sigma, d'') \to (\Sigma, d)$ as well as its inverse $i^{-1}$ are uniformly continuous. In either case, they generate the same topology.
4 Warning: There is also a Fibonacci substitution shift (= Fibonacci Sturmian shift; see Example 4.3), which is different from this one.

4 1. First Examples and General Properties of Subshifts

Figure 1.1. Transition graphs: vertex-labeled (left) and edge-labeled (right).

Example 1.4. $X_{\text{even}} \subset \{0,1\}^{\mathbb N}$ is the collection of infinite sequences in which the 1’s appear only in blocks of even length, together with $1111\cdots \in X_{\text{even}}$. We call $X_{\text{even}}$ the even shift. Similarly, the odd shift $X_{\text{odd}}$ is the collection of infinite sequences in which the 0’s appear only in blocks of odd length, together with $0000\cdots \in X_{\text{odd}}$; see Figure 1.2.

Figure 1.2. Edge-labeled graphs for $X_{\text{odd}}$, $X_{\text{even}}$, and $X_{\text{odd}} \cap X_{\text{even}}$.

Example 1.5. Let S be a non-empty subset of ℕ. Let $X \subset \{0,1\}^{\mathbb Z}$ be the collection of sequences in which consecutive occurrences of 1 are always s positions apart for some s ∈ S. Hence, sequences in X have the form
$$x = \dots 1\, 0^{s_{-1}-1}\, 1\, 0^{s_0-1}\, 1\, 0^{s_1-1}\, 1\, 0^{s_2-1}\, 1 \dots$$
where $s_i \in S$ for each i ∈ ℤ. If #S = ∞, then allowed sequences can also end and/or start with $0^\infty$. This space is called the S-gap shift; see Section 3.7. For S = {2, 3, 4, . . . } we get the Fibonacci SFT, and for S = {1, 2} we get the Fibonacci SFT with the symbols 0 and 1 interchanged.
1.2. Word-Complexity 5

Example 1.6. The Thue-Morse substitution⁵ $\chi_{TM} : \{0,1\} \to \{0,1\}^*$ is a special substitution (see Section 4.2) defined by
$$\chi_{TM} : \begin{cases} 0 \to 01, \\ 1 \to 10, \end{cases}$$
and extended to longer words by concatenation. It has two fixed points:
$$\rho^0 = 01\ 10\ 1001\ 10010110\ 1001011001101001 \dots,$$
$$\rho^1 = 10\ 01\ 0110\ 01101001\ 0110100110010110 \dots.$$
These sequences make their appearance in many settings in combinatorics and elsewhere; cf. [19, 561]. For instance, the n-th entry of $\rho^0$ (where we start counting at n = 0) is the parity of the number of 1’s in the binary expansion of n. The Thue-Morse sequence $\rho^i$ can be defined by the relations $\rho^i_0 = i$, $\rho^i_{2n} = \rho^i_n$, and $\rho^i_{2n+1} = 1 - \rho^i_n$. Also, if we have a sequence $(P_k)_{k\ge0}$ of players of decreasing quality (e.g. rugby players) whom we want to divide into two teams $T_0$ and $T_1$ so that the teams are as close in strength as possible, then we assign $P_k$ to team $T_i$ if i is the k-th digit of $\rho^0$ (or equivalently, of $\rho^1$).
Also the series $\sum_{n\ge1} \rho^0_n 2^{-n} = 1 - \sum_{n\ge1} \rho^1_n 2^{-n}$ has been proved to sum to a transcendental number; see e.g. [20, Theorem 13.4.2].
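The descriptions of $\rho^0$ given above (iterating the substitution, the parity of the binary digit sum, and the recursion) can be cross-checked numerically. A minimal Python sketch; the function names are ours, chosen for illustration.

```python
def substitute(word: str, k: int) -> str:
    """Apply chi_TM (0 -> 01, 1 -> 10) k times to `word`."""
    rule = {"0": "01", "1": "10"}
    for _ in range(k):
        word = "".join(rule[a] for a in word)
    return word


def digit_parity(n: int) -> int:
    """Parity of the number of 1's in the binary expansion of n."""
    return bin(n).count("1") % 2


rho0 = substitute("0", 10)  # prefix of length 2^10 of the fixed point rho^0
rho1 = substitute("1", 10)  # prefix of length 2^10 of the fixed point rho^1

# rho^0_n is the binary digit-sum parity of n ...
assert all(int(rho0[n]) == digit_parity(n) for n in range(len(rho0)))
# ... and satisfies rho_{2n} = rho_n and rho_{2n+1} = 1 - rho_n.
assert all(rho0[2 * n] == rho0[n] and int(rho0[2 * n + 1]) == 1 - int(rho0[n])
           for n in range(len(rho0) // 2))
print(rho0[:32])  # 01101001100101101001011001101001
```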

Example 1.7. The alphabet A consists of the brackets (, ), [, ], and the allowed words are those that can be extended to words consisting of properly paired and unlinked brackets. So [ ( [ ] ) ] and ( ( ) [ ] ) are allowed, but [ ( ] and ( [ ) ] are not. The subshift (X, σ) of which these are the allowed subwords is called the Dyck shift; see Section 3.10.
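One way to decide whether a finite word is allowed (our reformulation, not taken from the text) is to cancel matched pairs with a stack: the word is allowed exactly when no closing bracket meets a mismatched open bracket, because unmatched closing brackets at the start and unmatched opening brackets at the end can be completed outside the word. A minimal sketch:

```python
def is_dyck_word(word: str) -> bool:
    """Return True if `word` belongs to the language of the Dyck shift on (, ), [, ]."""
    partner = {")": "(", "]": "["}
    stack = []  # opening brackets that are still open inside the word
    for c in word:
        if c.isspace():
            continue  # allow the spaced notation used in the text
        if c in "([":
            stack.append(c)
        elif c in partner:
            if stack:
                if stack[-1] != partner[c]:
                    return False  # a linked pair such as "( ]" or "( [ )"
                stack.pop()
            # with an empty stack, c closes a bracket opened to the left of the word
        else:
            raise ValueError(f"unexpected symbol {c!r}")
    return True  # unmatched opening brackets can be closed to the right of the word


print(is_dyck_word("[ ( [ ] ) ]"), is_dyck_word("( [ ) ]"))  # True False
```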

1.2. Word-Complexity
Definition 1.8. Given a subshift X, the collection
L(X) = {words of any finite length in X}
is called the language of X. We use the notation $L_n(X)$ for the set of words of length n in the language.
Definition 1.9. The function $p := p_X : \mathbb N \to \mathbb N$ defined by $p(n) = \#L_n(X)$ is called the word-complexity of X.
5 After the Norwegian mathematician Axel Thue (1863–1922) and the American mathematician Marston Morse (1892–1977), but the corresponding sequence was used earlier by the French mathematician Eugène Prouhet (1817–1867), a student of Sturm.

Example 1.10. For the Fibonacci SFT of Example 1.3, let
$$F_n = \#\{w \in L_n(X) : w_n = 0\}.$$
Then $F_1 = 1$, $F_2 = 2$, and $F_{n+1} = F_n + F_{n-1}$ for n ≥ 2, because $F_n$ is the cardinality of the set of (n+1)-words ending in 00 and $F_{n-1}$ is the cardinality of the set of (n+1)-words ending in 010. Therefore the $F_n$’s are the Fibonacci numbers. The same argument gives $p(1) = 2 = F_2$ and $p(n) = F_n + F_{n-1} = F_{n+1}$ for n ≥ 2.
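The counts in Example 1.10 can be confirmed by brute force, using that a binary word belongs to the language of the Fibonacci SFT exactly when it avoids 11. A minimal sketch; `fib` follows the indexing $F_1 = 1$, $F_2 = 2$ of the example.

```python
from itertools import product


def fibonacci_sft_words(n: int) -> list[str]:
    """All words of length n over {0, 1} without two consecutive 1's."""
    return [w for w in map("".join, product("01", repeat=n)) if "11" not in w]


def fib(k: int) -> int:
    """F_k with F_1 = 1, F_2 = 2 and F_{k+1} = F_k + F_{k-1}."""
    a, b = 1, 2  # F_1, F_2
    for _ in range(k - 1):
        a, b = b, a + b
    return a


for n in range(1, 15):
    words = fibonacci_sft_words(n)
    assert sum(w.endswith("0") for w in words) == fib(n)  # F_n words end in 0
    assert len(words) == fib(n + 1)                        # p(n) = F_{n+1}
```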

1.2.1. Sublinear and Polynomial Complexity. We start with some terminology and a useful proposition.
Definition 1.11. We call a word u ∈ Ln (X) over the alphabet A = {0, 1}
• left-special if both 0u and 1u belong to L(X);
• right-special if both u0 and u1 belong to L(X);
• bi-special if u is both left-special and right-special.
Note, however, that there are different types of bi-special words u depending
on how many of the words 0u0, 0u1, 1u0, and 1u1 are allowed. If only one
choice of 0u or 1u is right-special and only one choice of u0 and u1 is left-
special, then u is a regular bi-special word. For larger alphabets, we
can formulate similar definitions and, naturally, there are more types of
left/right/bi-special words.

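For a subshift X over {0, 1} with σ(X) = X, every word $u \in L_n(X)$ has one or two extensions to the right, and two exactly when u is right-special; the same holds to the left. Counting the words of length n + 1 by their first n letters, or by their last n letters, gives
$$p(n+1) - p(n) = \#\{\text{left-special words of length } n\} = \#\{\text{right-special words of length } n\}.$$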
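For the Fibonacci SFT, where $L_n(X)$ consists of all binary words of length n avoiding 11, this identity can be verified by brute force. A minimal sketch (the helper names are ours):

```python
from itertools import product


def language(n: int, forbidden: str = "11") -> set[str]:
    """L_n(X) for the Fibonacci SFT: binary words of length n avoiding `forbidden`."""
    return {w for w in map("".join, product("01", repeat=n)) if forbidden not in w}


def special_counts(n: int) -> tuple[int, int]:
    """Numbers of left-special and right-special words of length n."""
    longer = language(n + 1)
    words = language(n)
    left = sum(("0" + u) in longer and ("1" + u) in longer for u in words)
    right = sum((u + "0") in longer and (u + "1") in longer for u in words)
    return left, right


for n in range(1, 12):
    left, right = special_counts(n)
    assert left == right == len(language(n + 1)) - len(language(n))
```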
The following result goes back to Morse & Hedlund [425].
Proposition 1.12. If the word-complexity of a subshift (X, σ) satisfies
p(n) ≤ n for some n, then (X, σ) consists of finitely many periodic sequences.

Proof. If p(1) = 1, then $X = \{a^\infty\}$ is obviously periodic. So assume p(1) ≥ 2. Since p is non-decreasing, the assumption of this proposition implies that there is a minimal n such that p(n − 1) = p(n) = n. Hence there are no right-special words of length n − 1. Start with a word $u \in L_{n-1}(X)$; there is only one way to extend it to the right by one letter, say to ua. Then the (n − 1)-suffix of ua can also be extended to the right by one letter in only one way. Continue this way until, after at most p(n − 1) = n steps, we obtain an (n − 1)-suffix that we have seen before. All strings need to be extendable to the left in some allowed way (otherwise σ(X) ≠ X). When extending u to the left symbol by symbol, we need to arrive at the same periodic pattern, because otherwise p(n + 1) > n + 1. Therefore X consists of (at most n) periodic sequences. □

This proposition shows that the minimal complexity of interest is p(n) = n + 1, because if p(n) ≤ n for some n, then X consists of at most n periodic sequences. We say that (X, σ) is of sublinear complexity if there is a constant C such that p(n) ≤ Cn. Sturmian sequences (see Section 4.3) have p(n) = n + 1; in fact, all recurrent words with this word-complexity are Sturmian. There are further possibilities for non-recurrent subshifts. The sequences
$$x = \dots 000.10000\dots \qquad\text{and}\qquad y = 00001111.00000\dots$$
both have p(n) = n + 1. They are not uniformly recurrent, but asymptotically fixed for n → ±∞. Ormes & Pavlov [435, Theorems 1.2 & 1.3] showed that for non-recurrent shifts (X, σ) that are not asymptotically periodic in both directions, $\liminf_n p(n)/n \ge \frac32$, and that this bound is sharp, as is demonstrated by
$$z = 0000.1\,0^{n_0}\,1\,0^{n_1}\,1\,0^{n_2}\,1\,0^{n_3}\,1\dots$$
for a carefully chosen increasing sequence of gaps $(n_i)_{i\ge0}$. In fact, given any non-decreasing function g : ℕ → ℕ that tends to infinity, there is x ∈ X such that $p_{\{x\}}(n) := \#\{w \text{ is a subword of } x : |w| = n\} < \frac32 n + g(n)$. In further detail, if a transitive⁶ shift (X, σ) with a recurrent point contains m minimal subsystems, of which $m_\infty$ are infinite, then
$$\limsup_{n\to\infty}\, p_X(n) - (m + m_\infty + 1)n = \infty, \qquad \liminf_{n\to\infty}\, p_X(n) - (m + m_\infty)n = \infty,$$
and these bounds are sharp. The second estimate also holds without the existence of a recurrent point. See [230], specifically Theorems 1.2 and 1.3.
Symbolic spaces associated with interval exchange transformations on d intervals have p(n) = (d − 1)n + 1; see Proposition 4.80. The Chacon substitution shift and the primitive Chacon substitution shift (see Example 1.27) have word-complexity p(n) = 2n − 1 (for n ≥ 2) and p(n) = 2n + 1, respectively; see [243].
For many subshifts, $p_X(n)/n$ is bounded in n but hard to compute exactly; often $\lim_n p(n)/n$ does not exist. For instance, the word-complexity of the Thue-Morse shift (i.e. the orbit closure $\overline{\{\sigma^n(\rho_{TM}) : n \in \mathbb N_0\}}$ of Example 1.6) is
$$p(n) = \begin{cases} 3\cdot 2^m + 4r & \text{if } 0 \le r < 2^{m-1}, \\ 4\cdot 2^m + 2r & \text{if } 2^{m-1} \le r < 2^m, \end{cases} \tag{1.2}$$
where $n = 2^m + r + 1 \ge 3$ with m ≥ 1 and $0 \le r < 2^m$; see [115, 406]. In [129], the word-complexity of the restrictions of certain (Fibonacci-like) unimodal maps to their critical ω-limit sets is studied.
The following curious result is due to Heinis; see [150, 311].
Proposition 1.13. If limn pX (n)/n exists and is finite, then it has to be an

All primitive substitution shifts, in fact all linearly recurrent shifts, have sublinear complexity; see Theorem 4.4.
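Formula (1.2) for the Thue-Morse shift can be compared with a brute-force count of subwords of a long prefix of $\rho^0$. A minimal sketch: a prefix of length $2^{12}$ suffices for n ≤ 64, because every subword of length $n \le 2^6$ lies inside some $\chi_{TM}^6(a)\chi_{TM}^6(b)$ with ab a two-letter subword, and all four two-letter words already occur in $\chi_{TM}^3(0) = 01101001$.

```python
def thue_morse_prefix(length: int) -> str:
    """Prefix of rho^0, using the parity of the binary digit sum."""
    return "".join(str(bin(k).count("1") % 2) for k in range(length))


def empirical_complexity(word: str, n: int) -> int:
    """Number of distinct subwords of length n of a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})


def formula_1_2(n: int) -> int:
    """Right-hand side of (1.2), valid for n >= 3."""
    m = (n - 1).bit_length() - 1  # n - 1 = 2^m + r with 0 <= r < 2^m
    r = n - 1 - 2 ** m
    return 3 * 2 ** m + 4 * r if r < 2 ** (m - 1) else 4 * 2 ** m + 2 * r


rho = thue_morse_prefix(2 ** 12)
assert all(empirical_complexity(rho, n) == formula_1_2(n) for n in range(3, 65))
print([formula_1_2(n) for n in range(3, 11)])  # [6, 10, 12, 16, 20, 22, 24, 28]
```

The printed values show the alternation between increments 4 and 2 that prevents $p(n)/n$ from converging.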
6 See Definition 1.18 below.

The polynomial growth rate is defined as $r = \lim_n \frac{\log p(n)}{\log n}$. Naturally, linear complexity implies r = 1, but every r ∈ {0} ∪ [1, ∞] is possible. Subshifts with polynomial growth rate r > 1 are less studied, but, for example, symbolic spaces for polygonal billiards on d-dimensional billiard tables can have polynomial growth rate r = d.

1.2.2. Exponential Complexity. Anticipating the definition for general dynamical systems in Section 2.4, we note that for subshifts the topological entropy is the exponential growth rate of the word-complexity:
$$h_{top}(\sigma) = \lim_{n\to\infty} \frac1n \log p_X(n). \tag{1.3}$$

To show that the limit in (1.3) exists, we need one more notion and one
well-known lemma.
Definition 1.14. We call a real-valued sequence $(a_n)_{n\ge1}$ subadditive if
$$a_{m+n} \le a_m + a_n \quad\text{for all } m, n \ge 1.$$
Analogously, $(a_n)_{n\ge1}$ is superadditive if $a_{m+n} \ge a_m + a_n$ for all m, n ∈ ℕ.
Lemma 1.15 (Fekete’s Subadditive Lemma). If $(a_n)_{n\ge1}$ is subadditive, then $\lim_n \frac{a_n}{n} = \inf_{r\ge1} \frac{a_r}{r}$ (possibly −∞). Analogously, if $(a_n)_{n\ge1}$ is superadditive, then $\lim_n \frac{a_n}{n} = \sup_{r\ge1} \frac{a_r}{r}$ (possibly +∞).

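Proof. Fix r ∈ ℕ. Every integer n can be written as n = i · r + j with 0 ≤ j < r. By subadditivity, $a_{i\cdot r+j} \le i\,a_r + a_j$ (with $a_0 := 0$), so
$$\limsup_{n\to\infty} \frac{a_n}{n} = \limsup_{i\to\infty} \frac{a_{i\cdot r+j}}{i\cdot r+j} \le \limsup_{i\to\infty} \frac{i\,a_r + a_j}{i\cdot r+j} = \frac{a_r}{r}.$$
This holds for all r ∈ ℕ, so we obtain
$$\inf_{r\in\mathbb N} \frac{a_r}{r} \le \liminf_{n\to\infty} \frac{a_n}{n} \le \limsup_{n\to\infty} \frac{a_n}{n} \le \inf_{r\in\mathbb N} \frac{a_r}{r},$$
as required. The proof for superadditive sequences goes likewise. □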
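Remark 1.16. A positive sequence $(a_n)_{n\in\mathbb N}$ is submultiplicative if $a_{m+n} \le a_m a_n$ (and supermultiplicative if $a_{m+n} \ge a_m a_n$). By taking logarithms, we can turn a sub/supermultiplicative sequence into a sub/superadditive one, and this suffices for our purposes. In particular, every word of length m + n is the concatenation of a word of length m and a word of length n, so $p_X(m+n) \le p_X(m)\,p_X(n)$; hence $(\log p_X(n))_n$ is subadditive, and the limit in (1.3) exists by Lemma 1.15.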
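For the Fibonacci SFT, $a_n = \log p_X(n)$ is subadditive (a word of length m + n splits into words of lengths m and n), so by Fekete’s Lemma $a_n/n$ decreases to its infimum in the limit. A minimal sketch checking subadditivity numerically and comparing $a_n/n$ with $\log\frac{1+\sqrt5}{2}$, the value suggested by Example 1.10:

```python
import math


def fibonacci_sft_complexity(n: int) -> int:
    """p(n) for the Fibonacci SFT, counting words that end in 0 and in 1."""
    end0, end1 = 1, 1  # words of length 1: "0" and "1"
    for _ in range(n - 1):
        end0, end1 = end0 + end1, end0  # a 1 may only follow a 0
    return end0 + end1


a = [math.log(fibonacci_sft_complexity(n)) for n in range(1, 201)]  # a[n-1] = log p(n)
# subadditivity: log p(m+n) <= log p(m) + log p(n), up to rounding
assert all(a[m + n - 1] <= a[m - 1] + a[n - 1] + 1e-12
           for m in range(1, 100) for n in range(1, 100))
golden = (1 + math.sqrt(5)) / 2
print(a[-1] / 200, math.log(golden))  # the first value lies slightly above the second
```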

We devote separate chapters to subshifts of positive and subshifts of zero entropy, because most⁷ tend to have different topological properties, such as topological mixing, existence and number of periodic orbits, and shadowing; see Chapter 2.

7 At least most shifts we encounter in this book, but it is not a strict rule. For example, Petersen’s shift mentioned below Theorem 2.77 has zero entropy and is topologically mixing, while Grillenberger’s [287] construction gives minimal shifts of positive entropy (and therefore lacking periodic orbits), with further examples among Toeplitz shifts (see Theorem 4.94) and B-free shifts (Section 4.6).
1.3. Transitive and Synchronized Subshifts 9

The maximal entropy of a subshift on N letters is log N, and this is achieved by the full shift $(\{0, \dots, N-1\}^{\mathbb N}, \sigma)$. One can ask whether all intermediate values between 0 and log N can be achieved as the topological entropy of some subshift. As we shall see later, this is not true within the class of subshifts of finite type or of sofic shifts, because the entropy is then the logarithm of the leading eigenvalue of some integer matrix, hence the logarithm of an algebraic number, in fact of a Perron number; see [397] and (the text below) Definition 8.4.
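For an SFT described by a 0-1 transition matrix, the leading (Perron) eigenvalue can be approximated by power iteration. A minimal sketch, assuming the matrix is primitive (irreducible and aperiodic) so that the iteration converges; the matrix below is the vertex transition matrix of the Fibonacci SFT (rows and columns indexed by 0, 1, with the transition 1 → 1 forbidden).

```python
import math


def perron_eigenvalue(matrix: list[list[int]], iterations: int = 200) -> float:
    """Leading eigenvalue of a non-negative primitive matrix, by power iteration."""
    v = [1.0] * len(matrix)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(len(v))) for row in matrix]
        lam = max(w)              # normalise by the sup-norm ...
        v = [x / lam for x in w]  # ... so that lam converges to the Perron eigenvalue
    return lam


fibonacci_sft = [[1, 1],   # 0 -> 0 and 0 -> 1 allowed
                 [1, 0]]   # 1 -> 0 allowed, 1 -> 1 forbidden
print(math.log(perron_eigenvalue(fibonacci_sft)))  # approximately log((1 + sqrt(5)) / 2)
```

For the full shift on N letters (the all-ones N × N matrix), the same routine returns N, so the entropy is log N.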
On the other hand, the topological entropy of β-shifts $(X_\beta, \sigma)$ can take any positive value, because $h_{top}(X_\beta) = \log\beta$ for β > 1. Also within the class of gap shifts, every value of the entropy can be achieved, as can be derived from Theorem 3.114. Some specific constructions of subshifts of a chosen entropy can be found among spacing shifts; see [380] and Section 3.8.
Remark 1.17. For many subshifts in $A^{\mathbb N}$ or $A^{\mathbb Z}$, the topological entropy can be computed exactly, but not so for subshifts in $A^{\mathbb Z^d}$, d ≥ 2, i.e. cellular automata. Even for the simplest direct generalization of the Fibonacci SFT, namely 0-1-patterns on ℤ² in which no two 1’s occur directly next to each other (horizontally or vertically), the entropy $\lim_{m,n\to\infty} \frac{1}{mn} \log p_X(m,n)$ is not known in closed form. There are, however, numerical approximations (e.g. for this example, the entropy equals 0.5878911 . . . with logarithms to base 2, with these digits certainly correct; see [251]) and characterizations of which values can occur; see e.g. [259, 260, 289, 313, 314, 399].

1.3. Transitive and Synchronized Subshifts

The following definition expresses that all parts of a subshift connect to each other.
Definition 1.18. A subshift X is transitive or irreducible if for every
u, w ∈ L(X), there is v ∈ L(X) such that uvw ∈ L(X).

This definition does not automatically produce periodic sequences: even if u = w, so that uvu ∈ L(X), it does not follow that uvuvu ∈ L(X).
Definition 1.19. A subshift (X, σ) is called synchronized if it is transi-
tive and there is a word v ∈ L(X) (called (intrinsically) synchronizing
word8 ) such that uv, vw ∈ L(X) implies uvw ∈ L(X). In other words, the
appearance of v cancels the influence of the past.
Theorem 1.20. A synchronized shift (X, σ) has a dense set of periodic
points. If X is not periodic itself, then the entropy htop (X, σ) > 0.
8 Kitchens in [364] calls it a magic Markov word.

Proof. Let v be a synchronizing word and let x ∈ L(X) be arbitrary. Since a synchronized X is, by definition, transitive, there is w ∈ L(X) such that vwx ∈ L(X), and then u ∈ L(X) such that vwxuv ∈ L(X); in particular xuv ∈ L(X). Since v is synchronizing, $(vwxu)^k v \in L(X)$ for every k ≥ 1, so the infinite periodic word $(xuvw)^\infty$ belongs to X. Since x ∈ L(X) was arbitrary, denseness of periodic points follows.
Next use transitivity again to find distinct words u, u′ ∈ L(X) such that vuv, vu′v ∈ L(X). Let X′ be the subshift constructed by free concatenations of vu and vu′; clearly X′ is a subshift of X, and for N = max{|v| + |u|, |v| + |u′|} we find $p_{X'}(nN) \ge 2^n$. Hence $h_{top}(X, \sigma) \ge h_{top}(X', \sigma) \ge \frac1N \log 2 > 0$. □
Example 1.21. The Fibonacci SFT (see Example 1.3) has synchronizing word 0. In this case, every 1 is preceded and succeeded by a 0. If we swap 0’s and 1’s, then we obtain the S-gap shift with gap sizes 1 and 2. Hence $h_{top}(X, \sigma) = \log\lambda$ where $\lambda^{-1} + \lambda^{-2} = 1$, so $\lambda = \frac{1+\sqrt5}{2}$. This is in agreement with Example 1.10.
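Example 1.21 determines λ from $\lambda^{-1} + \lambda^{-2} = 1$. Assuming the analogous equation $\sum_{s\in S} \lambda^{-s} = 1$ for a finite gap set S (an assumption of this sketch, which reproduces the example for S = {1, 2}), λ can be found by bisection, since the left-hand side is decreasing in λ > 1.

```python
import math


def gap_shift_entropy(gaps: list[int], tol: float = 1e-13) -> float:
    """log(lambda), where lambda >= 1 solves sum_{s in S} lambda^(-s) = 1 (finite S)."""
    def excess(lam: float) -> float:
        return sum(lam ** (-s) for s in gaps) - 1.0  # decreasing for lam > 1

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:  # enlarge the bracket until the sign changes
        hi *= 2
    while hi - lo > tol:   # bisection
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if excess(mid) > 0 else (lo, mid)
    return math.log((lo + hi) / 2)


print(gap_shift_entropy([1, 2]))  # approximately log((1 + sqrt(5)) / 2) = 0.4812...
```

Truncating S = {2, 3, 4, . . . } (the Fibonacci SFT, see Example 1.5) at a large gap gives the same value up to a tiny error, as it should.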

1.4. Sliding Block Codes

Definition 1.22 (Sliding Block Code). Let A and Ã be alphabets. A map $\pi : A^{\mathbb Z} \to \tilde A^{\mathbb Z}$ is called a sliding block code or local rule of window size 2N + 1 if there is a function $f : A^{2N+1} \to \tilde A$ such that $\pi(x)_i = f(x_{i-N} \cdots x_{i+N})$.
In other words, we have a window9 of width 2N + 1 put on the sequence
x. If it is centered at position i, then the recoded word y = π(x) will have at
position i the f -image of what is visible in the window. After that we slide
the window to the next position and repeat.
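On a finite word, a sliding block code can only be evaluated at positions where the whole window fits; the sketch below does exactly that. The rule `edge` is a hypothetical local rule with N = 1, chosen for illustration and not used elsewhere in the text.

```python
from typing import Callable


def apply_sliding_block_code(x: str, f: Callable[[str], str], N: int) -> str:
    """Apply the local rule f (window size 2N + 1) to a finite word x.

    Output position i is f(x[i-N : i+N+1]); only the len(x) - 2N positions
    where the whole window fits inside x are returned.
    """
    return "".join(f(x[i - N:i + N + 1]) for i in range(N, len(x) - N))


def edge(window: str) -> str:
    """Hypothetical rule: 1 iff the centre symbol differs from its left neighbour."""
    return "1" if window[0] != window[1] else "0"


print(apply_sliding_block_code("0011101000", edge, 1))  # 01001110
```

Dropping the first letter of the input drops the first letter of the output, which is the commutation σ ∘ π = π ∘ σ on finite words.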
Theorem 1.23 (Curtis–Hedlund–Lyndon¹⁰). Let X and Y be subshifts over finite alphabets A and Ã, respectively. A map π : X → Y is continuous and commutes with the shift (i.e. σ ∘ π = π ∘ σ) if and only if π is a sliding block code.

If π : X → Y is, moreover, a homeomorphism, then we call (X, σ) and (Y, σ) (topologically) conjugate.

Proof. First assume that π is continuous and commutes with the shift. For each a ∈ Ã, the cylinder [a] = {y ∈ Y : y_0 = a} is clopen, so $V_a := \pi^{-1}([a])$ is clopen too. Since $V_a$ is open, it can be written as a union of cylinders, and since $V_a$ is closed (and hence compact), it can be written as a finite union of cylinders: $V_a = \bigcup_{i=1}^{r_a} U_{a,i}$. Let N be so large that every $U_{a,i}$ is the union of (2N + 1)-cylinders $U_{a,i,j}$; i.e. each $U_{a,i,j}$ is determined by a word $x_{-N} \cdots x_N$. This makes 2N + 1 a sufficient window size, and there is a function $f : A^{2N+1} \to \tilde A$ such that $\pi(x)_0 = f(x_{-N} \cdots x_N)$. By shift-invariance, $\pi(x)_i = f(x_{i-N} \cdots x_{i+N})$ for all i ∈ ℤ.

9 Sometimes the window can have memory and anticipation of different lengths, so the window would be [−m, n], but calling their maximum N covers all cases.
10 Curtis and Lyndon were working for the military at the time, so their work was “classified”, and the paper was published under Hedlund’s name only, [308].
Conversely, assume that π is a sliding block code of window size¹¹ 2N + 1. Since the same local rule f is applied at every position, σ ∘ π = π ∘ σ. Take $\varepsilon = 2^{-M} > 0$ arbitrary and $\delta = \varepsilon\, 2^{-N}$. If d(x, y) < δ, then $x_i = y_i$ for |i| ≤ M + N. By the construction of the sliding block code, $\pi(x)_i = \pi(y)_i$ for all |i| ≤ M. Therefore d(π(x), π(y)) < ε, so π is continuous (in fact uniformly continuous). □
Exercise 1.24. Give a surjective sliding block code between the Fibonacci
SFT and the even subshift (see Examples 1.3 and 1.4).
Corollary 1.25. If (X, σ) and (Y, σ) are conjugate shifts, then there is N
such that pX (k − N ) ≤ pY (k) ≤ pX (k + N ) for k > N .

Proof. Let 2M + 1 be the maximal window size among the sliding block codes from X to Y and from Y to X, and set N = 2M. Then every k-word in Y is the image of a (k + N)-word in X, so $p_Y(k) \le p_X(k + N)$. Exchanging the roles of X and Y gives $p_X(k) \le p_Y(k + N)$, i.e. $p_X(k - N) \le p_Y(k)$ for k > N. □
Exercise 1.26. If ψ : X → Y is an onto sliding block code which is k-to-one
for some fixed k, show that htop (X, σ) = htop (Y, σ).
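Example 1.27. The following substitutions (see Section 4.2) are called the Chacon substitution and the primitive Chacon substitution:
$$\chi_{chac} : \begin{cases} 0 \to 0010, \\ 1 \to 1, \end{cases} \qquad\text{and}\qquad \chi_{Chac} : \begin{cases} 0 \to 0021, \\ 1 \to 021, \\ 2 \to 21, \end{cases} \tag{1.4}$$
with fixed points
$$\rho_{chac} = 0010\ 0010\ 1\ 0010\ 0010\ 0010\ 1\ 0010\ 1\ 0010 \dots,$$
$$\rho_{Chac} = 0021\ 0021\ 21\ 021\ 0021\ 0021\ 21\ 021 \dots.$$
They can be transformed into each other using the sliding block codes
$$f : \begin{cases} 00a \to 0, \\ 10a \to 1, \\ a1b \to 2, \end{cases} \quad a, b \in \{0,1\}, \qquad\text{and}\qquad \tilde f : \begin{cases} 0 \to 0, \\ 1 \to 0, \\ 2 \to 1, \end{cases}$$
where f reads the window $x_{i-1}x_ix_{i+1}$, and this extends to the shift orbit closures
$$X_{chac} = \overline{\{\sigma^n(\rho_{chac}) : n \ge 0\}} \quad\text{and}\quad X_{Chac} = \overline{\{\sigma^n(\rho_{Chac}) : n \ge 0\}}.$$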
11 If (X, σ) is a one-sided subshift, with window size [0, N ] (so no memory, only anticipation),

then this part of the proof still works. The first part of the proof fails: one must first extend
(X, σ) to a two-sided shift before the Curtis–Hedlund–Lyndon Theorem can be applied in full.
Therefore, these substitution shifts are topologically conjugate, although their word-complexities are different: $p_{X_{chac}}(1) = 2$, $p_{X_{chac}}(n) = 2n - 1$ for n ≥ 2, and $p_{X_{Chac}}(n) = 2n + 1$ for n ≥ 1; see [243].
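The conjugacy of Example 1.27 can be checked on long prefixes of the two fixed points. A minimal sketch: the first code has memory 1 (a 1 becomes 2, and a 0 copies its left neighbour), so it is evaluated from position 1 on; the second code (0 → 0, 1 → 0, 2 → 1) has window size 1.

```python
def substitute(rule: dict[str, str], word: str, k: int) -> str:
    """Apply the substitution `rule` k times to `word`."""
    for _ in range(k):
        word = "".join(rule[a] for a in word)
    return word


chac = {"0": "0010", "1": "1"}
Chac = {"0": "0021", "1": "021", "2": "21"}
x = substitute(chac, "0", 6)  # prefix of rho_chac (length 1093)
y = substitute(Chac, "0", 6)  # prefix of rho_Chac (length 1093)

# First code: a 1 becomes 2, a 0 copies its left neighbour (00a -> 0, 10a -> 1).
# Position 0 has no left neighbour inside the prefix, so we start at position 1.
f_x = "".join("2" if x[i] == "1" else x[i - 1] for i in range(1, len(x)))
# Second code, window size 1: 0 -> 0, 1 -> 0, 2 -> 1.
g_y = y.translate(str.maketrans("012", "001"))

assert f_x == y[1:]  # the first code maps rho_chac to rho_Chac
assert g_y == x      # the second code maps rho_Chac back to rho_chac
```

Counting distinct subwords of x and y of length n in the same way reproduces the values 2n − 1 and 2n + 1 quoted above for small n.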

Figure 1.3. The edge-labeled transition graph of the 2-block even shift.

Each subshift (X, σ) over an alphabet A is conjugate to its ℓ-block shift, where the alphabet $\tilde A \subset A^\ell$ consists of the words in $L_\ell(X)$, and a, b ∈ Ã can only follow each other if the (ℓ − 1)-suffix of a coincides with the (ℓ − 1)-prefix of b. For instance, if $(X_{even}, \sigma)$ is the even shift and ℓ = 2, then Ã = {00, 01, 10, 11}, and the edge-labeled transition graph is given in Figure 1.3. Note that to recover the coding of paths in the original shift, we use only the first letters of the codes at the edges.
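The recoding into ℓ-blocks and its inverse, which reads off first letters, can be sketched on finite words as follows. We use windows $x_i \cdots x_{i+\ell-1}$ (anticipation only), matching the first-letter convention above; the word x is an arbitrary example from the even shift.

```python
def higher_block_code(x: str, ell: int) -> list[str]:
    """Recode x into the ell-block alphabet: the i-th new letter is x_i ... x_{i+ell-1}."""
    return [x[i:i + ell] for i in range(len(x) - ell + 1)]


def first_letters(blocks: list[str]) -> str:
    """Undo the recoding: first letter of each block, plus the tail of the last block."""
    return "".join(b[0] for b in blocks[:-1]) + blocks[-1]


x = "0110011110"  # a word of the even shift (1's in blocks of even length)
blocks = higher_block_code(x, 2)
print(blocks)     # ['01', '11', '10', '00', '01', '11', '11', '11', '10']
# Consecutive letters overlap: the (ell-1)-suffix of one is the (ell-1)-prefix of the next.
assert all(a[1:] == b[:-1] for a, b in zip(blocks, blocks[1:]))
assert first_letters(blocks) == x
```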
Taking a block shift generally doesn’t change the nature of the shift (SFTs
remain SFTs, substitution shifts remain substitution shifts, see Section 6.3.2,
etc.). Block shifts can be used to shrink the window size of sliding block
codes; see [398, Proposition 1.5.12].
Proposition 1.28. If π is a sliding block code between X and Y of window size 2N + 1, then there is a sliding block code π̃ (of window size 1) between the (2N + 1)-block shift X̃ of X and Y.

Proof. We do the proof for invertible shifts; one-sided shifts work as well, but then we cannot allow memory in the sliding block code, only anticipation. Let φ : X → X̃ be the sliding block code that recodes the (2N + 1)-blocks in X into the letters of Ã; i.e. $\varphi(x)_i = x_{i-N} \cdots x_{i+N} \in \tilde A$. Then $\tilde\pi = \pi \circ \varphi^{-1}$ is the required sliding block code: if f is the local rule of π, then $\tilde\pi(\tilde x)_i = f(\tilde x_i)$. □

1.5. Word-Frequencies and Shift-Invariant Measures

In addition to the number of words, we can also study the frequency of words w appearing inside infinite sequences:
$$f_w(x) = \lim_{n\to\infty} \frac1n \#\{0 \le i < n : x_i \cdots x_{i+|w|-1} = w\}. \tag{1.5}$$
The question of whether the limit exists and to what extent it depends on x is answered by Birkhoff’s Ergodic Theorem 6.13. For this we need a measure μ that assigns a number to every cylinder set, according to the following rules:

(i) 0 ≤ μ([w]) ≤ 1 for every cylinder [w];
(ii) μ(∅) = 0, μ(X) = 1;
(iii) $\mu\big(\bigcup_i [w_i]\big) = \sum_i \mu([w_i])$ for all disjoint cylinders $[w_1], [w_2], \dots$.

The Kolmogorov Extension Theorem (see e.g. [56, Section 21.10]) implies that μ can be extended uniquely to every set in the σ-algebra $\mathcal B$ generated by the cylinder sets. Thus, if x ∈ X is such that $f_w(x)$ exists for every w ∈ L(X), then there is a shift-invariant probability measure μ such that $\mu([w]) = f_w(x)$ for all w ∈ L(X).
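Empirical versions of the frequencies (1.5) can be computed by counting windows along a finite prefix. A minimal sketch for the Thue-Morse sequence $\rho^0$ of Example 1.6. For comparison, the relation $\rho^0 = \chi_{TM}(\rho^0)$ forces the limits 1/6, 1/3, 1/3, 1/6 for 00, 01, 10, 11 (our own short computation: pairs at even positions are 01 or 10, and the frequency of 00 is half that of 10).

```python
from collections import Counter


def word_frequencies(x: str, n: int) -> dict[str, float]:
    """Relative frequencies of the length-n words among the len(x) - n + 1 windows of x."""
    windows = len(x) - n + 1
    counts = Counter(x[i:i + n] for i in range(windows))
    return {w: c / windows for w, c in counts.items()}


rho = "".join(str(bin(k).count("1") % 2) for k in range(2 ** 16))  # prefix of rho^0
freq = word_frequencies(rho, 2)
print({w: round(f, 4) for w, f in sorted(freq.items())})  # close to 1/6, 1/3, 1/3, 1/6
```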

Remark 1.29. The Kolmogorov Extension Theorem is about extending probability measures $\mu_n$ on finite Cartesian products $X^n$ (equipped with the n-fold product σ-algebra) to a measure on the infinite product $X^\infty$ (equipped with the infinite-product σ-algebra). That is, if $\mu_{n+1}(A \times X) = \mu_n(A)$ for every n ∈ ℕ and measurable set $A \subset X^n$, then there is a unique probability measure μ on $X^\infty$ such that $\mu(A \times X^\infty) = \mu_n(A)$ for every n ∈ ℕ and measurable set $A \subset X^n$.
This carries over to indicator functions. Linear combinations of indicator functions $1_A$ with $A \subset X^n$, n ∈ ℕ, lie dense in $L^1(\mu)$; i.e. for every $\psi \in L^1(\mu)$ and ε > 0 there are N, finitely many sets $A_k \subset X^N$ and numbers $a_k \in \mathbb R$ such that $\int_{X^\infty} \big|\psi - \sum_k a_k 1_{A_k}\big|\, d\mu < \varepsilon$.

Definition 1.30. A measure μ on a subshift (X, σ) is called invariant or shift-invariant if $\mu(B) = \mu(\sigma^{-1}B)$ for all $B \in \mathcal B$.
A measure is called ergodic if $\sigma^{-1}(A) = A \bmod \mu$ for some $A \in \mathcal B$ implies that μ(A) = 0 or $\mu(A^c) = 0$. That is, the only measurable shift-invariant sets are null sets or the whole space up to a null set.

Birkhoff’s Ergodic Theorem 6.13 implies that if μ is an ergodic shift-invariant probability measure on (X, σ), then for μ-a.e. x ∈ X, $f_w(x) = \mu([w])$ for all w ∈ L(X). However, if $f_w(x)$ exists for every w ∈ L(X), the associated measure need not be ergodic. For example, the sequence
$$x = 1\,00\,111\,0000\,11111\,000000\,1111111 \cdots 0^n 1^{n+1} \cdots$$
is associated to the combination of Dirac measures $\frac12(\delta_{0^\infty} + \delta_{1^\infty})$, and this measure is clearly not ergodic.
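A long prefix of this sequence can be generated block by block. In the sketch below, the frequency of 0 is close to 1/2, while the frequency of 01 tends to 0, consistent with the measure $\frac12(\delta_{0^\infty} + \delta_{1^\infty})$ giving the cylinder [01] measure 0.

```python
def block_sequence(num_blocks: int) -> str:
    """The prefix 1 00 111 0000 11111 ... made of num_blocks blocks of lengths 1, 2, 3, ..."""
    return "".join(("1" if k % 2 else "0") * k for k in range(1, num_blocks + 1))


x = block_sequence(2000)            # length 2000 * 2001 / 2 = 2_001_000
f0 = x.count("0") / len(x)          # frequency of the word 0
f01 = x.count("01") / (len(x) - 1)  # "01" cannot overlap itself, so count() is exact
print(f0, f01)                      # f0 close to 1/2, f01 close to 0
```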

Remark 1.31. Regardless of whether μ is ergodic or not, we call it a generic measure if there is a point x ∈ X such that the frequency $f_w(x) = \mu([w])$ for all w ∈ L(X).

Definition 1.32. Let A = {1, 2, . . . , d} and let $X = A^{\mathbb N}$ or $A^{\mathbb Z}$ be the full shift space. Let $p = (p_1, \dots, p_d)$ be a probability vector; i.e. $p_i \ge 0$ and $p_1 + \dots + p_d = 1$. The product measure that assigns to every cylinder set
$$\mu_p([x_k \cdots x_l]) = p_{x_k} p_{x_{k+1}} \cdots p_{x_l}$$
is called the p-Bernoulli measure. The measure can be extended to the Borel σ-algebra by means of the Kolmogorov Extension Theorem. Each p-Bernoulli measure is shift-invariant.

Bernoulli measures¹² are a basic tool in probability theory. For example, encode a sequence of coin-flips by, say, $x_i = 0$ if the i-th flip gives “heads” and $x_i = 1$ if the i-th flip gives “tails”. This gives a sequence $x \in \{0,1\}^{\mathbb N}$. If the coin is biased, say “heads” comes up with probability $p > \frac12$ and “tails” with probability $q = 1 - p < \frac12$, then the probability of a word can be computed by multiplying probabilities; e.g. $\mathbb P(x_1x_2x_3x_4 = 0010) = p^3q$.
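A minimal sketch of the p-Bernoulli measure on cylinders of a one-sided shift, with a biased coin (p = 0.6 is an arbitrary choice). It also checks the consistency relation $\mu([w]) = \sum_a \mu([wa])$ and shift-invariance on cylinders, $\mu(\sigma^{-1}[w]) = \sum_a \mu([aw]) = \mu([w])$.

```python
from itertools import product
from math import isclose, prod


def bernoulli_measure(word: str, p: dict[str, float]) -> float:
    """mu_p([w]) = p_{w_1} p_{w_2} ... p_{w_n} for the cylinder [w] starting at position 0."""
    return prod(p[a] for a in word)


p = {"0": 0.6, "1": 0.4}  # biased coin: heads (0) with probability 0.6
print(bernoulli_measure("0010", p))  # p^3 q, approximately 0.0864

for n in range(1, 6):
    for w in map("".join, product("01", repeat=n)):
        m = bernoulli_measure(w, p)
        assert isclose(m, sum(bernoulli_measure(w + a, p) for a in p))  # consistency
        assert isclose(m, sum(bernoulli_measure(a + w, p) for a in p))  # shift-invariance
```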

Definition 1.33. A subshift (X, σ) is uniquely ergodic if it admits only one invariant probability measure. If (X, σ) is both uniquely ergodic and minimal, it is called strictly ergodic. (This should not be confused with intrinsically ergodic, which means that there is a unique measure of maximal entropy; see Definition 6.70.)

The full shift is obviously not uniquely ergodic (it has, for instance, a Bernoulli measure for every probability vector p), and neither are SFTs, sofic shifts, or β-shifts (which are, in fact, intrinsically ergodic). The Thue-Morse shift, on the other hand, is uniquely ergodic. Clearly, unique ergodicity implies intrinsic ergodicity, but not the other way around. It follows from Oxtoby’s Theorem 6.20 that a recurrent subshift (X, σ) is uniquely ergodic if and only if, for every w ∈ L(X), $f_w(x)$ exists and is the same for every x ∈ X. In this case, the convergence in the limit (1.5) is uniform in x.

1.6. Symbolic Itineraries

An important application of symbol sequences is to use them to represent trajectories of dynamical systems (see Section 2.1 for an introduction to dynamical systems). It was probably Hadamard who first used this idea in his studies of geodesic flows [296]. Over 40 years later, Morse & Hedlund [425] wrote the first monograph on symbolic dynamics. If T : X → X is some map on a topological space, denote the n-fold compositions by $T^n = T \circ \cdots \circ T$ (and $T^{-n}$ is the n-fold composition of $T^{-1}$ if T is invertible). Symbolic dynamics emerges from the dynamical system (X, T) by coding the T-orbits of the points x ∈ X. To this end, for a finite or countable alphabet A, we let $\mathcal J = \{J_a\}_{a\in A}$ be a partition of X. Then to each x ∈ X we assign an itinerary $i(x) \in A^{\mathbb N_0}$:
$$i_n(x) = a \quad\text{if } T^n(x) \in J_a.$$
If T is invertible, then we can extend itineraries to sequences in $A^{\mathbb Z}$. It is clear that $i \circ T(x) = \sigma \circ i(x)$. Therefore, i(X) is σ-invariant, and if T : X → X is onto, then σ(i(X)) = i(X). In general, however, i(X) is not closed, so we need to take the closure before it can be called a subshift. Using this subshift, we can often show the abundance of different trajectories (periodic or with other properties) of the original system (X, T).

12 Named after Jacob Bernoulli, a member of the family of mathematicians from Basel, who wrote “Ars Conjectandi”, one of the first books on probability theory.

Figure 1.4. A circle rotation $R_\alpha(x) = x + \alpha \pmod 1$, a β-transformation $T_\beta(x) = \beta x \pmod 1$, and the quadratic map $f_4(x) = 4x(1-x)$.
Example 1.34. Let X be the closure of the collection of symbolic itineraries of a circle rotation $R_\alpha : S^1 \to S^1$ over angle $\alpha \in [0,1] \setminus \mathbb Q$; see Figure 1.4 (left). We use the partition $\mathcal J = \{J_0, J_1\}$ with $J_0 = [0, \alpha)$ and $J_1 = [\alpha, 1)$. Hence, if $y \in S^1$ and n ∈ ℤ, then
$$i(y)_n = \begin{cases} 0 & \text{if } R_\alpha^n(y) \in [0, \alpha), \\ 1 & \text{if } R_\alpha^n(y) \in [\alpha, 1). \end{cases}$$
Slightly different coding comes from the partition {(0, α], (α, 1]}, but the closure of $i(S^1)$ is the same for both partitions. The resulting shift is called a Sturmian shift; see Definition 4.60.
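A sketch of the coding in Example 1.34, assuming the golden-mean angle α = (√5 − 1)/2 and the starting point y = 0.1 purely for illustration. It checks that the frequency of 0 is close to $|J_0| = \alpha$ and that the number of distinct subwords of length n is n + 1, as stated for Sturmian sequences in Section 1.2.1.

```python
import math


def rotation_itinerary(alpha: float, y: float, length: int) -> str:
    """Itinerary of y under R_alpha with respect to J_0 = [0, alpha), J_1 = [alpha, 1)."""
    # Computing (y + n * alpha) % 1 directly avoids accumulating rounding errors.
    return "".join("0" if (y + n * alpha) % 1.0 < alpha else "1" for n in range(length))


alpha = (math.sqrt(5) - 1) / 2  # an irrational angle (illustrative choice)
x = rotation_itinerary(alpha, 0.1, 100_000)
print(x.count("0") / len(x))    # close to |J_0| = alpha
for n in range(1, 13):          # Sturmian word-complexity p(n) = n + 1
    assert len({x[i:i + n] for i in range(len(x) - n + 1)}) == n + 1
```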
Example 1.35. Consider the β-transformation $T_\beta : [0,1] \to [0,1]$, $T_\beta(x) = \beta x \bmod 1$ (see Figure 1.4 (middle)), and $i(x)_n = a$ if $T_\beta^n(x) \in J_a := [\frac{a}{\beta}, \frac{a+1}{\beta})$. The closure of i([0, 1]) is called a β-shift; see Section 3.5.
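Since $T_\beta^n(x) \in J_a$ exactly when $a = \lfloor \beta T_\beta^n(x) \rfloor$, the itinerary is the greedy β-expansion of x. A minimal sketch with the golden mean β and x = 0.3 as arbitrary choices; floating-point arithmetic limits the number of reliable digits, which is why only 40 are computed.

```python
import math


def beta_itinerary(beta: float, x: float, length: int) -> list[int]:
    """Digits i(x)_n = floor(beta * T_beta^n(x)), i.e. T_beta^n(x) lies in J_a."""
    digits = []
    for _ in range(length):
        a = math.floor(beta * x)
        digits.append(a)
        x = beta * x - a  # T_beta(x) = beta * x mod 1
    return digits


beta = (1 + math.sqrt(5)) / 2  # golden mean (illustrative choice)
x0 = 0.3
d = beta_itinerary(beta, x0, 40)
# The itinerary is the greedy beta-expansion: x = sum_n d_n beta^{-(n+1)}.
print(sum(a * beta ** -(n + 1) for n, a in enumerate(d)))  # close to 0.3
```

For this β, a digit 1 is always followed by a 0, since $\beta(\beta x - 1) < \beta(\beta - 1) = 1$ for x < 1.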

Example 1.36. Let X = [0, 1] and $T(x) = f_4(x) = 4x(1-x)$; see Figure 1.4 (right). Let $J_0 = [0, \frac12]$ and $J_1 = (\frac12, 1]$. Then i(X) is not closed, because there is no x ∈ [0, 1] such that $i(x) = 1100000\dots$, while $1100000\dots = \lim_{x\downarrow 1/2} i(x)$. Naturally, redefining the partition to $J_0 = [0, \frac12)$ and $J_1 = [\frac12, 1]$ doesn’t help, because then there is no x ∈ [0, 1] such that $i(x) = 0100000\dots$, while $0100000\dots = \lim_{x\uparrow 1/2} i(x)$. This shows that we have to take the closure to obtain a subshift.
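The two one-sided limits can be observed numerically. In this sketch the parameter `half_to` (ours) decides which symbol the point 1/2 receives, modelling the two partition choices above.

```python
def logistic_itinerary(x: float, length: int, half_to: str = "0") -> str:
    """Itinerary of x under f_4(x) = 4x(1 - x); the point 1/2 receives the symbol half_to."""
    symbols = []
    for _ in range(length):
        symbols.append(half_to if x == 0.5 else ("0" if x < 0.5 else "1"))
        x = 4 * x * (1 - x)
    return "".join(symbols)


for eps in (1e-3, 1e-4, 1e-5):
    print(logistic_itinerary(0.5 + eps, 8), logistic_itinerary(0.5 - eps, 8))
    # 11000000 01000000: the limits from the right and from the left
print(logistic_itinerary(0.5, 8, "0"), logistic_itinerary(0.5, 8, "1"))
# 01000000 11000000: the itinerary of 1/2 itself under the two partition choices
```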
There are other ways of coding in the literature to obtain a subshift:
• Assign a different symbol (often ∗ or C) to 1/2. That is, use the partition $J_0 = [0, \frac12)$, $J_* = \{\frac12\}$ and $J_1 = (\frac12, 1]$. This resolves the “ambiguity” about which symbol to give to 1/2, but it doesn’t make the shift space closed.
• Assign both symbols to 1/2, so $J_0 = [0, \frac12]$ and $J_1 = [\frac12, 1]$ are no longer a partition but have 1/2 in common. Therefore 1/2 will have two itineraries, and so will every point in the backward orbit of 1/2. With all these extra itineraries, i(X) becomes closed. But this doesn’t work in all cases; see Exercise 1.37.
• Take a quotient space i(X)/∼ where in this case x ∼ y if there is $n \in \mathbb N_0$ such that
$$x_0 \cdots x_{n-1} = y_0 \cdots y_{n-1} \quad\text{and}\quad \begin{cases} x_n x_{n+1} x_{n+2} x_{n+3} x_{n+4} \cdots = 11000\dots, \\ y_n y_{n+1} y_{n+2} y_{n+3} y_{n+4} \cdots = 01000\dots, \end{cases}$$
or vice versa. This quotient space adopts the quotient topology (so i(X)/∼ is not a Cantor set anymore), and it turns the coding map $i : [0,1] \to \{0,1\}^{\mathbb N_0}/\!\sim$ into a genuine homeomorphism.
Exercise 1.37. Let a = 3.83187405528332 . . . and $T(x) = f_a(x) = ax(1-x)$. For this parameter, $T^3(\frac12) = \frac12$. Let $\mathcal J' = \{[0, \frac12], (\frac12, 1]\}$ and $\mathcal J = \{[0, \frac12], [\frac12, 1]\}$, so 1/2 gets two symbols. Let $\Sigma' = \overline{i(X)}$ with respect to $\mathcal J'$ and Σ = i(X) with respect to $\mathcal J$. Show that Σ′ ≠ Σ.

From now on, assume that X is a compact metric space without isolated points. We will now discuss the properties of the coding map i itself. First of all, for i to be continuous, it is crucial that $T|_{J_a}$ is continuous on each element $J_a \in \mathcal J$. But this is not enough: if x is a common boundary point of two elements of $\mathcal J$, then (no matter how you assign the symbol to x, as in Example 1.36) for each neighborhood U ∋ x, diam(i(U)) = 1, so continuity fails at x. It is only by using quotient spaces of i(X) (so changing the topology of i(X)) that we can make i continuous. Normally, we choose to live with the discontinuity, because it affects only a few points:
Lemma 1.38. Let $\partial\mathcal J$ denote the collection of common boundary points of different elements in a partition $\mathcal J$. If $\mathrm{orb}(x) \cap \partial\mathcal J = \emptyset$, then the coding map $i : X \to A^{\mathbb N_0}$ or $A^{\mathbb Z}$ is continuous at x.

Proof. We carry out the proof for invertible maps. Let ε > 0 be arbitrary and fix N ∈ ℕ such that $2^{-N} < \varepsilon$. For each n ∈ ℤ with |n| ≤ N, let $U_n \ni T^n(x)$ be a neighborhood so small that it is contained in a single partition element $J_{i_n(x)}$. Since $\mathrm{orb}(x) \cap \partial\mathcal J = \emptyset$, this is possible. Then $U := \bigcap_{|n|\le N} T^{-n}(U_n)$ is an open neighborhood of x and $i_n(y) = i_n(x)$ for all |n| ≤ N and y ∈ U. Therefore $\mathrm{diam}(i(U)) \le 2^{-N} < \varepsilon$, and continuity at x follows. □
Definition 1.39. A transformation T : X → X of a metric space (X, d) is
called expansive if there exists δ > 0 such that for all distinct x, y ∈ X,
there is n ≥ 0 (or n ∈ Z if T is invertible) such that d(T n (x), T n (y)) > δ.
We call δ the expansivity constant.
Every subshift (X, σ) is expansive. Indeed, if x ≠ y, then there is $n \in \mathbb N_0$ (or n ∈ ℤ if (X, σ) is a two-sided shift) such that $x_n \neq y_n$, so $d(\sigma^n(x), \sigma^n(y)) = 1$. This makes every δ ∈ (0, 1) an expansivity constant.
Lemma 1.40. Suppose that T is a continuous expansive map that is injective on each $J_a$ of some partition $\mathcal J$. If the expansivity constant satisfies $\delta > \sup_{a\in A} \mathrm{diam}(J_a)$, then the coding map $i : X \to A^{\mathbb N_0}$ or $A^{\mathbb Z}$ is injective.

Proof. Suppose that i is not injective, so there are x ≠ y ∈ X such that i(x) = i(y). Since $T|_{J_a}$ is injective for each a ∈ A, $T^n(x) \neq T^n(y)$ for all n ∈ ℤ. By expansiveness, there is n ∈ ℤ such that $d(T^n(x), T^n(y)) > \delta$. By assumption, $T^n(x)$ and $T^n(y)$ cannot lie in the same element of $\mathcal J$. Hence x and y cannot have the same itinerary after all. □

To obtain injectivity of the coding map, it often suffices (but not always; see Example 1.43 below) that $T$ is expanding on each partition element $J_a$. Expanding (and expansion) should not be confused with expansive (and expansivity) of Definition 1.39.
Definition 1.41. Let $T : X \to Y$ be a map between metric spaces. We call $T$ expanding if there is $\rho > 1$ such that $d_Y(T(x), T(y)) \ge \rho\, d_X(x, y)$ for all $x, y \in X$, and locally expanding if there are $\varepsilon > 0$ and $\rho > 1$ such that $d_Y(T(x), T(y)) \ge \rho\, d_X(x, y)$ for all $x, y \in X$ with $d_X(x, y) < \varepsilon$.
Proposition 1.42 (Gottschalk & Hedlund [284]). Let $T : X \to X$ be a homeomorphism on a compact metric space $(X, d)$. If $T$ is locally expanding, then $X$ is finite.
Compactness is important. For example, $T : \mathbb{R} \to \mathbb{R}$, $x \mapsto 2x$, would be a counterexample without the compactness assumption.

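Proof. Let $\varepsilon > 0$ and $\rho > 1$ be as in Definition 1.41. Since $T^{-1}$ is continuous and $X$ is compact, there is a $\delta > 0$ such that $d(x, y) < \delta$ implies $d(T^{-1}(x), T^{-1}(y)) < \varepsilon$. Let $\{U_i\}_{i=1}^N$ be a finite open cover of $X$ such that $\operatorname{diam}(U_i) < \delta$. Then $\{T^{-1}(U_i)\}_{i=1}^N$ is an open cover of $X$ and $\operatorname{diam} T^{-1}(U_i) < \varepsilon$, so by local expansion, $\operatorname{diam} T^{-1}(U_i) \le \operatorname{diam}(U_i)/\rho < \delta/\rho$.
Repeating this argument, we find that $\{T^{-n}(U_i)\}_{i=1}^N$ is a finite open cover of $X$ with $\operatorname{diam}(T^{-n}(U_i)) < \delta\rho^{-n}$. Since $n$ is arbitrary, $\#X \le N$. Indeed, if $X$ contained $N+1$ distinct points, take $n$ so large that $\delta\rho^{-n}$ is smaller than their minimal mutual distance. Then each $T^{-n}(U_i)$ contains at most one of them, and $N$ sets cannot cover all $N+1$ points. □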
Example 1.43. In this example, we show that even though $T$ is locally expanding, the coding map $i : X \to \mathcal{A}^{\mathbb{N}_0}$ need not be injective if the diameter of some of the $J_a$'s is too big.
Let $T : \mathbb{S}^1 \to \mathbb{S}^1$, $x \mapsto 2x \bmod 1$, be the doubling map, and let $J_0 = (\frac14, \frac34)$ and $J_1 = \mathbb{S}^1 \setminus J_0$. Clearly $T'(x) = 2$ for all $x \in \mathbb{S}^1$. However, $T$ is not expanding on the whole of $\mathbb{S}^1$, because for instance $d(T(\frac14), T(\frac34)) = 0 < \frac12 = d(\frac14, \frac34)$. More importantly, $T$ is not expanding on $J_0$ or $J_1$ either; for example, $d(T(\frac14 + \varepsilon), T(\frac34 - \varepsilon)) = 4\varepsilon < \frac12 - 2\varepsilon = d(\frac14 + \varepsilon, \frac34 - \varepsilon)$ for each $\varepsilon \in (0, \frac1{12})$. The corresponding coding map is not injective. To see this, note that the involution $S(x) = 1 - x$ commutes with $T$ and also preserves each $J_a$. It follows that $i(x) = i(S(x))$ for all $x \in \mathbb{S}^1$, and only $x = 0$ and $x = \frac12$ (the fixed points of $S$) have unique itineraries. For the more general partition $J_0^b = (b, b + \frac12)$ and $J_1^b = \mathbb{S}^1 \setminus J_0^b$ for $b \in [0, \frac12)$, see Remark 3.102.
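The failure of injectivity in Example 1.43 is easy to observe numerically. The sketch below computes itinerary prefixes for the partition $J_0 = (\frac14, \frac34)$, $J_1 = \mathbb{S}^1 \setminus J_0$. It represents $\mathbb{S}^1$ as $[0, 1)$ and uses exact rational arithmetic from Python's `fractions` module. The helper names are hypothetical.

```python
from fractions import Fraction

QUARTER, THREE_QUARTERS = Fraction(1, 4), Fraction(3, 4)


def doubling(x):
    """Doubling map T(x) = 2x mod 1 on the circle, represented as [0, 1)."""
    return (2 * x) % 1


def involution(x):
    """S(x) = 1 - x mod 1; it commutes with the doubling map."""
    return (1 - x) % 1


def symbol(x):
    """Symbol 0 on J_0 = (1/4, 3/4) and symbol 1 on J_1 = S^1 minus J_0."""
    return 0 if QUARTER < x < THREE_QUARTERS else 1


def itinerary(x, length):
    """First `length` symbols of the itinerary i(x)."""
    word = []
    for _ in range(length):
        word.append(symbol(x))
        x = doubling(x)
    return word


# x = 1/5 and S(x) = 4/5 are distinct points with the same itinerary.
x = Fraction(1, 5)
same = itinerary(x, 8) == itinerary(involution(x), 8)
```

Exact fractions are a deliberate choice. Doubling a binary floating-point number mod 1 discards one bit per step, so floating-point orbits collapse to 0 after about 53 iterations and would give misleading itineraries.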
Chapter 2

Topological Dynamics

In essence, symbolic dynamical systems are dynamical systems on a topological (in fact metric) space and therefore share many of the topological properties that general dynamical systems can have. In this chapter, we discuss several of these general topological properties, such as minimality, entropy, versions of equicontinuity and mathematical chaos, as well as topological mixing and shadowing properties.

2.1. Basic Notions from Dynamical Systems

A dynamical system is a mathematical description of how a physical system evolves in time. It consists of the following:
• A phase space $X$, usually a metric space, or at least a topological space, describing the state of the system. For example, $\mathbb{R}^{2n}$ can be used to describe the positions and velocities of $n$ point-particles moving along a line, or $\mathbb{R}^{6n}$ for the positions and velocities of $n$ point-particles moving in $\mathbb{R}^3$.
• A time space, which could be $\mathbb{R}$ (for continuous time) or $\mathbb{N}_0 := \mathbb{N} \cup \{0\}$ (or $\mathbb{Z}$ if the dynamical system is time-invertible) if the observations are only made at discrete time steps. More complicated (multi-dimensional or group-valued) time can be considered too, but in this text, time is always discrete: $\mathbb{N}_0$ or $\mathbb{Z}$.
• An evolution rule, which for discrete time takes the form of a transformation $T : X \to X$ satisfying:
(1) $T^0(x) = x$ for all $x \in X$.
(2) $T^{m+n}(x) = T^m(T^n(x))$ for all $m, n \in \mathbb{N}_0$ (or $\mathbb{Z}$) and all $x \in X$.
This is realized if we let $T^n$ be the $n$-fold composition
$$T^n(x) = \underbrace{T \circ T \circ \cdots \circ T}_{n \text{ times}}(x),$$
and $T^{-n}$ the $n$-fold composition of its inverse transformation if it exists. If $T$ is continuous, then $(X, T)$ is called a continuous dynamical system.

Definition 2.1. Let $(X, T)$ be a dynamical system on a topological space. The orbit of $x \in X$ is the set
$$\operatorname{orb}(x) = \begin{cases} \{T^n(x) : n \in \mathbb{Z}\} & \text{if } T \text{ is invertible},\\ \{T^n(x) : n \ge 0\} & \text{if } T \text{ is non-invertible}. \end{cases}$$
The set $\operatorname{orb}^+(x) = \{T^n(x) : n \ge 0\}$ is the forward orbit of $x$. This notation is useful if $T$ is invertible; if $T$ is non-invertible, then $\operatorname{orb}^+(x) = \operatorname{orb}(x)$.

Exercise 2.2. Let $\sigma : \Sigma \to \Sigma$ be invertible. Is there a difference between $x \in \overline{\operatorname{orb}(x) \setminus \{x\}}$ and $x \in \overline{\operatorname{orb}^+(x) \setminus \{x\}}$?

We distinguish several types of orbits. Namely, a point $x$ is:
• Periodic if $T^n(x) = x$ for some $n \ge 1$. The smallest such $n$ is called the period of $x$. If the period is 1, then $x$ is called a fixed point.
• Preperiodic if $T^{m+n}(x) = T^m(x)$ for some $m, n \in \mathbb{N}$. The minimal such $m, n$ are called the preperiod and period of $x$.
• Asymptotically periodic if there is a periodic point $y \notin \operatorname{orb}(x)$ such that $d(T^n(x), T^n(y)) \to 0$ as $n \to \infty$. A periodic point $y$ of period $p$ is attracting if it has a neighborhood¹ $U$ such that $\bigcap_{n \ge 0} T^{pn}(U) = \{y\}$. If $y$ has a neighborhood $U$ such that $\bigcap_{n \ge 0} T^{-pn}(U) = \{y\}$, then $y$ is repelling.
For example, for the quadratic family with $a = 3.83187405528332\ldots$ as in Exercise 1.37, the point $x = \frac12$ has period 3, and since $Q_a'(\frac12) = 0$, it is easy to show that $\frac12$ is attracting. The two fixed points are $0$ and $1 - \frac1a$; they are repelling. For the circle rotation $R_\alpha$, every point is periodic if and only if $\alpha \in \mathbb{Q}$. If $\alpha = m/n$ in lowest terms, then each point $x \in \mathbb{S}^1$ has period $n$ and can be called neutral. If $\alpha \notin \mathbb{Q}$, then every orbit is dense in $\mathbb{S}^1$.
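The period-3 example can be checked numerically. The sketch below assumes that the quadratic family has the form $Q_a(x) = a x (1 - x)$, which matches the fixed points $0$ and $1 - \frac1a$ stated above. It uses the truncated parameter value from the text, and the names are illustrative only.

```python
A = 3.83187405528332  # parameter value quoted in the text (digits truncated)


def Q(x, a=A):
    """Quadratic family Q_a(x) = a x (1 - x) (assumed form, see lead-in)."""
    return a * x * (1 - x)


def dQ(x, a=A):
    """Derivative Q_a'(x) = a (1 - 2x)."""
    return a * (1 - 2 * x)


def orbit(x, steps, a=A):
    """The points x, Q(x), ..., Q^steps(x)."""
    points = [x]
    for _ in range(steps):
        points.append(Q(points[-1], a))
    return points


cycle = orbit(0.5, 3)            # 1/2, Q(1/2), Q^2(1/2), Q^3(1/2)
multiplier = 1.0
for point in cycle[:3]:          # chain rule: (Q^3)'(1/2) is a product of Q'
    multiplier *= dQ(point)

p = 1 - 1 / A                    # the nonzero fixed point
```

The multiplier of the 3-cycle is exactly $0$ because the cycle passes through the critical point $\frac12$. By contrast, $|Q_a'(0)| = a$ and $|Q_a'(1 - \frac1a)| = |2 - a|$ both exceed $1$.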

Definition 2.3. Let $(X, T)$ be a dynamical system on a topological space. The ω-limit set of $x$ is the set of accumulation points of its forward orbit. In formula,
$$\omega(x) = \bigcap_{n \in \mathbb{N}} \overline{\bigcup_{m \ge n} \{T^m(x)\}} = \{y \in X : \exists\, n_i \to \infty,\ \lim_i T^{n_i}(x) = y\}.$$
We call $x$ recurrent if $x \in \omega(x)$.
Analogously for invertible dynamical systems, the α-limit set of $x$ is the set of accumulation points of the backward orbit of $x$:
$$\alpha(x) = \bigcap_{n \in \mathbb{N}} \overline{\bigcup_{m \le -n} \{T^m(x)\}} = \{y \in X : \exists\, n_i \to \infty,\ \lim_i T^{-n_i}(x) = y\}.$$
¹ If the space $X$ is one-dimensional, then we can speak of one-sided attracting if there is a one-sided neighborhood $U$ of $y$ such that $\bigcap_{n \ge 0} T^n(U \cup \{y\}) = \{y\}$.
Definition 2.4. Given a dynamical system $(X, T)$, a point $x \in X$ is called non-wandering if for every neighborhood $U \ni x$ there is an $n \ge 1$ such that $T^{-n}(U) \cap U \ne \emptyset$. The non-wandering set $\Omega(T)$ is the set of all non-wandering points.
Recurrent points are always non-wandering, but $\Omega(T)$ can contain non-recurrent points. In the one-sided full shift, for instance, $x = 0111111\cdots$ is not recurrent but is non-wandering. If $(X, T)$ has a dense orbit and $X$ has no isolated points, then $\Omega(T) = X$.
Definition 2.5. Two dynamical systems $(X, f)$ and $(Y, g)$ are (topologically) conjugate if there is a homeomorphism $\psi : X \to Y$ such that $\psi \circ f = g \circ \psi$.
If $\psi \circ f = g \circ \psi$ and $\psi : X \to Y$ is a continuous, onto, but not necessarily one-to-one map, then $\psi$ is called a semi-conjugacy or factor map, $(Y, g)$ is called a factor of $(X, f)$, and $(X, f)$ is called an extension of $(Y, g)$. This extension is almost one-to-one if there is a dense set $Y' \subset Y$ such that $\#\psi^{-1}(y) = 1$ for all $y \in Y'$.
A conjugacy $\psi : X \to Y$ is called pointed if it sends specified points $x \in X$ and $y \in Y$ to each other.
Lemma 2.6. Let $(X, f)$ and $(Y, g)$ be dynamical systems that are conjugate via $g \circ \psi = \psi \circ f$. Then:
(1) If $p$ is a (pre)periodic point for $f$, then $\psi(p)$ is a (pre)periodic point of $g$, and the (pre)periods are the same.
(2) If $f, g$ are continuous, then the conjugacy preserves ω-limit sets: $\psi(\omega(x)) = \omega(\psi(x))$.
(3) If the periodic point $p$ is attracting/repelling, then $\psi(p)$ is also attracting/repelling.
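Proof. First note that
$$\psi \circ f^n = \psi \circ f \circ \psi^{-1} \circ \psi \circ f \circ \psi^{-1} \circ \cdots \circ \psi \circ f = g \circ \psi \circ \psi^{-1} \circ g \circ \psi \circ \psi^{-1} \circ \cdots \circ g \circ \psi = g^n \circ \psi.$$
1. Take $p$ such that $f^n(p) = p$, and let $q = \psi(p)$. Then $g^n(q) = g^n \circ \psi(p) = \psi \circ f^n(p) = \psi(p) = q$, so $q$ is $n$-periodic for $g$. Next, suppose that $f^{m+n}(p) = f^m(p)$, and set $q = \psi(p)$. Then $g^{m+n}(q) = g^{m+n} \circ \psi(p) = \psi \circ f^{m+n}(p) = \psi \circ f^m(p) = g^m \circ \psi(p) = g^m(q)$. Applying the same argument to $\psi^{-1}$ (which conjugates $g$ to $f$) shows that the (pre)periods coincide.
2. Now assume that $x \in \omega_f(a)$, so there is a sequence $n_k \to \infty$ such that $f^{n_k}(a) \to x$. Set $y = \psi(x)$ and $b = \psi(a)$. Then, by continuity of $\psi$, $g^{n_k}(b) = g^{n_k} \circ \psi(a) = \psi \circ f^{n_k}(a) \to \psi(x) = y$, so $y \in \omega_g(b)$. This shows $\psi(\omega_f(a)) \subset \omega_g(\psi(a))$. The reverse inclusion follows by applying the same argument to $\psi^{-1}$.
3. If $p = f(p)$ is asymptotically attracting, then for every $a \in X$ sufficiently close to $p$, we have $\omega_f(a) = \{p\}$. By part 1, $q := \psi(p)$ is a fixed point of $g$, and by part 2, $\omega_g(b) = \{q\}$ for $b = \psi(a)$. Since $\psi$ is a homeomorphism, every $b$ sufficiently close to $q$ is of this form. □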
Exercise 2.7. Is the following true? If $(X, f)$ is a factor of $(Y, g)$ and $(Y, g)$ is a factor of $(X, f)$, then $(X, f)$ and $(Y, g)$ are conjugate.
Example 2.8. The quadratic Chebyshev polynomial $Q_2(y) = 2y^2 - 1$ on $[-1, 1]$ is conjugate to the tent map $T(x) = \min\{2x, 2(\pi - x)\}$ on $[0, \pi]$:
$$Q_2 \circ \psi = \psi \circ T \quad \text{for } \psi(x) = \cos x. \tag{2.1}$$
It is very unusual to find smooth conjugacies between maps, and even here, $\psi$ is not diffeomorphic at the endpoints $0$ and $\pi$. But applying (2.1) $n$ times and then differentiating, we find
$$(Q_2^n)' \circ \psi(x) \cdot \psi'(x) = \psi'(T^n(x)) \cdot (T^n)'(x).$$
If $x$ is a $p$-periodic point of $T$, and hence $y = \psi(x)$ a $p$-periodic point of $Q_2$, we see that $|(Q_2^p)'(y)| = 2^p$. The only periodic point where this fails is $y = \psi(0) = 1$, because $\psi'(0) = 0$.
Note that the same conjugacy works for the degree $n$ Chebyshev polynomial $Q_n$ and the slope $n$ tent map with $n$ branches. The cause of this is the characterization $Q_n(x) = \cos(n \arccos x)$ of Chebyshev polynomials.
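The conjugacy (2.1) and the multiplier identity can be checked numerically, up to floating-point rounding. This sketch evaluates the defect $|Q_2(\cos x) - \cos(T(x))|$ on a grid. It also computes $(Q_2^p)'(y)$ by the chain rule at the periodic orbits of $T$ through $x = 2\pi/5$ (period 2) and $x = 2\pi/7$ (period 3). These sample orbits and the helper names are our own choices.

```python
import math


def Q2(y):
    """Quadratic Chebyshev polynomial Q_2(y) = 2y^2 - 1 on [-1, 1]."""
    return 2 * y * y - 1


def tent(x):
    """Tent map T(x) = min{2x, 2(pi - x)} on [0, pi]."""
    return min(2 * x, 2 * (math.pi - x))


def conjugacy_defect(samples=10_000):
    """max |Q_2(cos x) - cos(T(x))| over a grid in [0, pi]; should be ~0."""
    grid = (math.pi * k / samples for k in range(samples + 1))
    return max(abs(Q2(math.cos(x)) - math.cos(tent(x))) for x in grid)


def multiplier(y, p):
    """(Q_2^p)'(y) by the chain rule, using Q_2'(y) = 4y."""
    m = 1.0
    for _ in range(p):
        m *= 4 * y
        y = Q2(y)
    return m


# Periodic points of the tent map: 2pi/5 -> 4pi/5 -> 2pi/5 and
# 2pi/7 -> 4pi/7 -> 6pi/7 -> 2pi/7.
examples = {2: 2 * math.pi / 5, 3: 2 * math.pi / 7}
multipliers = {p: multiplier(math.cos(x), p) for p, x in examples.items()}
```

The expected values are $|(Q_2^2)'(y)| = 4$ and $|(Q_2^3)'(y)| = 8$. The fixed point $y = 1$ has multiplier $Q_2'(1) = 4$ rather than $2$, in line with the exception noted above.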
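Example 2.9. We show that two circle rotations $R_\alpha$ and $R_\beta$ are not conjugate if $0 \le \alpha < \beta < 1$ and $\alpha + \beta \ne 1$. (If $\alpha + \beta = 1$, then the orientation-reversing map $x \mapsto 1 - x$ is a conjugacy between $R_\alpha$ and $R_\beta$.) Let $<$ denote the positive orientation on $\mathbb{S}^1$, and suppose that $\psi : \mathbb{S}^1 \to \mathbb{S}^1$ is a homeomorphism with $\psi \circ R_\alpha = R_\beta \circ \psi$. Choose $n \in \mathbb{N}$ such that $n\alpha \le k < n\beta$ and $(n-1)\beta \le k$ for some integer $k$. Then, setting $y = \psi(0)$,
$$R_\alpha^n(0) \le 0 \le R_\alpha(0) \quad\text{and}\quad y \le R_\beta^n(y) \le R_\beta(y). \tag{2.2}$$
The homeomorphism $\psi$ must either preserve or reverse the orientation of the circle. If $\psi$ preserves orientation, this is incompatible with (2.2). If $\psi$ reverses orientation, then $x \mapsto 1 - \psi(x)$ is an orientation-preserving conjugacy between $R_\alpha$ and $R_{1-\beta}$. Since $\alpha \ne 1 - \beta$, the same argument applies, with the roles of the two rotations swapped if $1 - \beta < \alpha$. Therefore there cannot be any conjugacy.
A more structural way to see this is using lifts and rotation numbers; see Theorem 4.54. Indeed, the rotation number $\rho(f)$ is preserved under orientation-preserving conjugacy and replaced by $1 - \rho(f)$ (mod 1) under orientation-reversing conjugacy. Here $\rho(R_\alpha) = \alpha$ differs from both $\rho(R_\beta) = \beta$ and $1 - \beta$.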
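Theorem 4.54 is not reproduced here, but the invariance of the rotation number is easy to observe numerically. The sketch works with lifts $F : \mathbb{R} \to \mathbb{R}$ and the approximation $\rho(F) \approx (F^n(x_0) - x_0)/n$. The conjugating homeomorphism with lift $\Psi(x) = x + \frac{c}{2\pi}\sin(2\pi x)$, $c = 0.5$, and all helper names are hypothetical choices made for illustration.

```python
import math


def make_lift(c):
    """Lift Psi(x) = x + c/(2 pi) sin(2 pi x) and a numerical inverse.

    For |c| < 1, Psi is strictly increasing with Psi(x + 1) = Psi(x) + 1, so
    it lifts an orientation-preserving homeomorphism of the circle.
    """
    def psi(x):
        return x + c / (2 * math.pi) * math.sin(2 * math.pi * x)

    def psi_inv(t):
        # |Psi(x) - x| < 1, so the preimage of t lies in [t - 1, t + 1];
        # a fixed number of bisection steps avoids float-precision stalls.
        lo, hi = t - 1.0, t + 1.0
        for _ in range(100):
            mid = 0.5 * (lo + hi)
            if psi(mid) < t:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    return psi, psi_inv


def rotation_number(lift, x0=0.0, n=2000):
    """Approximate rho(F) = lim (F^n(x0) - x0) / n for a lift F."""
    x = x0
    for _ in range(n):
        x = lift(x)
    return (x - x0) / n


alpha = (math.sqrt(5) - 1) / 2
psi, psi_inv = make_lift(0.5)


def conjugated(x):
    """Lift of psi o R_alpha o psi^{-1}."""
    return psi(psi_inv(x) + alpha)


def reflected(x):
    """Lift of R_alpha conjugated by the reflection x -> -x, i.e. x - alpha."""
    return -(-x + alpha)
```

Conjugating by the orientation-reversing map $x \mapsto -x$ turns the lift $x + \alpha$ into $x - \alpha$, whose rotation number is $-\alpha \equiv 1 - \alpha \pmod 1$. This matches the fact that $R_\alpha$ and $R_{1-\alpha}$ are conjugate by a reflection.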

Definition 2.10. Two dynamical systems $(X, f)$ and $(Y, g)$ are called orbit equivalent if there is a homeomorphism $\psi : X \to Y$ such that $\psi(\operatorname{orb}_f(x)) = \operatorname{orb}_g(\psi(x))$ for all $x \in X$; i.e. $\psi$ sends orbits to orbits (set-wise, not necessarily point-wise).

Clearly, a conjugacy is an orbit equivalence. If $f$ and $g$ are themselves homeomorphisms and $\psi \circ f = g^{-1} \circ \psi$, then $\psi$ is called a flip-conjugacy, and this is also an orbit equivalence. More generally, if $\psi$ is a conjugacy or flip-conjugacy, then $\psi \circ f^k$ is an orbit equivalence for each $k \in \mathbb{Z}$.
Orbit equivalence implies the existence of two functions $m, n : X \to \mathbb{Z}$, called orbit cocycles, such that
$$\psi \circ f(x) = g^{n(x)} \circ \psi(x) \quad\text{and}\quad \psi \circ f^{m(x)}(x) = g \circ \psi(x).$$
Thus the orbit cocycle of a conjugacy is constant $1$ and that of a flip-conjugacy is constant $-1$. Another special case of orbit equivalence is a speed-up: $(Y, g)$ is a speed-up of $(X, f)$ if it is orbit equivalent and the orbit cocycle $m : X \to \mathbb{Z}$ is non-negative.
Definition 2.11. Two dynamical systems $(X, f)$ and $(Y, g)$ are strongly orbit equivalent if their orbit cocycles are continuous on $X$, except for at most one point each.

2.2. Transitive and Minimal Systems

The following definition expresses that all parts of a dynamical system connect to each other:
Definition 2.12. A dynamical system $(X, T)$ is (topologically) transitive if for every two non-empty open² sets $U, V \subset X$, there is an $n \ge 0$ such that $U \cap T^{-n}(V) \ne \emptyset$.³ It is called totally transitive if $T^N$ is transitive for each $N \in \mathbb{N}$.

Clearly totally transitive implies transitive. The other implication fails; for example, $\sigma$ is transitive on the two-point subshift $\{(10)^\infty, (01)^\infty\}$ but $\sigma^2$ is not.
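For finite subshifts such as the one above, transitivity can be decided by brute force. A finite Hausdorff space is discrete, so singletons are open, and transitivity reduces to every forward orbit visiting every point. The sketch below is an illustration under that assumption. The labels `"10"` and `"01"` stand for $(10)^\infty$ and $(01)^\infty$, and the helper names are hypothetical.

```python
def is_transitive_finite(T, points):
    """Topological transitivity of T on a finite set with the discrete topology.

    Singletons are open, so transitivity means: for all u, v there is n >= 0
    with T^n(u) = v, i.e. every forward orbit visits every point.
    """
    points = set(points)
    for u in points:
        visited, x = {u}, u
        for _ in range(len(points)):
            x = T(x)
            visited.add(x)
        if visited != points:
            return False
    return True


def power(T, N):
    """The N-th iterate T^N as a function."""
    def T_N(x):
        for _ in range(N):
            x = T(x)
        return x
    return T_N


# The two-point subshift {(10)^inf, (01)^inf}, labelled by its first two symbols.
SHIFT = {"10": "01", "01": "10"}
X = list(SHIFT)


def sigma(x):
    """The left shift swaps the two points."""
    return SHIFT[x]
```

Here `is_transitive_finite(sigma, X)` holds, while `power(sigma, 2)` is the identity and fails the test. Odd powers are again transitive.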
Proposition 2.13. Let $X$ be a compact regular Hausdorff space⁴ without isolated points that is second countable, i.e. it possesses a countable basis of its topology. A continuous map $T : X \to X$ is topologically transitive if and only if there is a dense orbit.
² Some authors use the abbreviation opene for open and non-empty.
³ Many texts write $T^n(U) \cap V \ne \emptyset$, which may be more intuitive, but the fact that $T^n(U)$ need not be open (or not measurable even if $U$ is measurable) might in some cases lead to inadvertent
⁴ Regular Hausdorff means that singletons $\{x\}$ are closed and for all closed sets $A$ and $x \notin A$ there are neighborhoods $U \ni x$ and $V \supset A$ such that $U \cap V = \emptyset$.

Remark 2.14. The notion of dense orbit may need further explanation if the subshift is two-sided. Consider the sequence
$$x = \cdots 000000000000000000.101000101000000000101000101\cdots. \tag{2.3}$$
This sequence emerges from the Cantor substitution
$$\chi_{\mathrm{Cantor}} : \begin{cases} 0 \mapsto 000,\\ 1 \mapsto 101 \end{cases}$$
from the seed $0.1$. This sequence has a dense forward orbit $\operatorname{orb}^+(x)$ within its forward orbit closure $\overline{\operatorname{orb}^+(x)}$ as well as a dense backward orbit $\operatorname{orb}^-(x)$ within its backward orbit closure $\overline{\operatorname{orb}^-(x)}$. However, $\operatorname{orb}^-(x)$ is not dense in its two-sided orbit closure.
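The right half of (2.3) can be generated directly from the substitution. This minimal sketch iterates $\chi_{\mathrm{Cantor}}$ on the seed symbol $1$ and measures the longest block of $0$s. The helper names are hypothetical.

```python
CANTOR = {"0": "000", "1": "101"}


def substitute(word, rule=CANTOR):
    """Apply the substitution letter by letter."""
    return "".join(rule[letter] for letter in word)


def right_half(k):
    """Right half of (2.3): k iterations of chi_Cantor applied to the seed '1'."""
    word = "1"
    for _ in range(k):
        word = substitute(word)
    return word


def longest_zero_block(word):
    """Length of the longest run of 0s in the word."""
    return max(len(block) for block in word.split("1"))


third_iterate = right_half(3)   # matches the digits displayed in (2.3)
```

The longest $0$-block in the $k$-th iterate has length $3^{k-1}$, so arbitrarily long $0$-blocks occur to the right of the dot. Shifting into the middle of such a block shows that $0^\infty$ belongs to the forward orbit closure $\overline{\operatorname{orb}^+(x)}$.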

Proof. Suppose first that $x \in X$ has a dense orbit, and let $\{U_j\}_{j \in \mathbb{N}}$ be a countable basis of the topology. Let $U, V \subset X$ be arbitrary non-empty open sets. Take $j, k \in \mathbb{N}$ such that $U_j \subset U$ and $U_k \subset V$. Since $\operatorname{orb}(x)$ is dense and $X$ has no isolated points, $x$ visits each $U_j$ infinitely often. Hence there are $m, n \in \mathbb{N}$ such that $T^m(x) \in U_j$ and $T^{m+n}(x) \in U_k$. This shows that $U \cap T^{-n}(V) \ne \emptyset$.
Conversely, by topological transitivity applied to $U_1$ and $U_2$, we can find $n_1$ such that $U_1 \cap T^{-n_1}(U_2) \ne \emptyset$. By continuity of $T$, $U_1 \cap T^{-n_1}(U_2)$ is an open set. Choose $V_2$ non-empty and open such that $\overline{V_2} \subset U_1 \cap T^{-n_1}(U_2)$. Here we use the regular Hausdorff property of $X$.
Next, using topological transitivity applied to $V_2$ and $U_3$, choose $n_2$ such that $V_2 \cap T^{-n_2}(U_3) \ne \emptyset$. Then choose a non-empty open set $V_3$ such that $\overline{V_3} \subset V_2 \cap T^{-n_2}(U_3)$.
Continuing this way, we find a nested sequence of non-empty open sets $V_k$ with $\overline{V_k} \subset V_{k-1}$, and a sequence of integers $(n_k)$ such that $V_{k+1} \subset T^{-n_k}(U_{k+1})$. Let $V_\infty = \bigcap_k \overline{V_k}$. Since $\overline{V_k} \subset V_{k-1}$ and closed sets in $X$ are automatically compact, this intersection is non-empty. Every $x \in V_\infty$ has a dense orbit, because it lies in $U_1$ and $T^{n_k}(x) \in U_{k+1}$ for every $k$. This concludes the proof. □

A strong form of transitivity is minimality:
Definition 2.15. A dynamical system $(X, T)$ is minimal if every orbit is dense in $X$.
Remark 2.16. It is a straightforward application of Zorn's Lemma that every dynamical system on a compact space⁵ contains at least one minimal subsystem. For compact metric spaces, this fact can also be shown without the use of Zorn's Lemma; see [304, Chapter 1, Theorem 2.2.1].
⁵ Compactness is important; otherwise one could take a single non-recurrent orbit (without its closure) as the phase space. An interesting example with only recurrent orbits but no minimal subset is due to Auslander [38, page 27].

Proposition 2.17. Let $X$ be a compact topological space. We have the following equivalent characterizations for a continuous dynamical system $(X, T)$ to be minimal:
(i) There is no non-empty closed $T$-invariant proper subset of $X$.
(ii) Every orbit is dense in $X$.
(iii) There is a dense orbit and $T$ is uniformly recurrent⁶; i.e. for every open set $U \subset X$ there is an $N \in \mathbb{N}$ such that for every $x \in U$ there is $1 \le n \le N$ such that $T^n(x) \in U$.

Proof. We prove the three implications by the contrapositive.
(i) ⇒ (ii): Suppose that $x \in X$ has an orbit that is not dense. Then $\overline{\operatorname{orb}(x)}$ is a non-empty $T$-invariant closed proper subset, so (i) fails.
(ii) ⇒ (iii): By (ii) every orbit is dense, so there is at least one dense orbit.
Now to prove uniform recurrence, let $U$ be any non-empty open set and $U_0$ a non-empty open subset such that $\overline{U_0} \subset U$.
Suppose that for every $N \in \mathbb{N}$ there is $x_N \in X$ such that $T^n(x_N) \notin U_0$ for all $1 \le n \le N$. Let $x \in X$ be an accumulation point of $(x_N)_{N \in \mathbb{N}}$. Since $x$ has a dense orbit, we can take $n \ge 1$ such that $T^n(x) \in U_0$. Take an open set $V \ni x$ such that $T^n(V) \subset U_0$. Next take $N \ge n$ so large that $x_N \in V$. But this means that $T^n(x_N) \in U_0$, which is against the definition of $x_N$. This contradiction shows that there is $N \in \mathbb{N}$ such that every point of $X$ enters $U_0$ within $N$ steps.
Now take $y \in U$ arbitrary, and take $x \in X$ with a dense orbit. Find a sequence $k_i$ such that $T^{k_i}(x) \to y$. For each $i$ there is $1 \le n_i \le N$ such that $T^{k_i + n_i}(x) \in U_0$. Passing to a subsequence, we may as well assume that $n_i \equiv n$. Then $T^n(y) = T^n(\lim_i T^{k_i}(x)) = \lim_i T^{k_i + n}(x) \in \overline{U_0} \subset U$. This proves the uniform recurrence of $U$.
(iii) ⇒ (i): Let $x$ be a point with a dense orbit. Suppose that $Y$ is a non-empty closed $T$-invariant proper subset of $X$, and let $U \subset X$ be non-empty and open such that $\overline{U} \cap Y = \emptyset$. Let $n \ge 0$ be minimal such that $u := T^n(x) \in U$. Let $N = N(U) \ge 1$ be as in the definition of uniform recurrence, and let $y \in Y$ be arbitrary. Since $\operatorname{orb}(y) \subset Y$, there is an open set $V \ni y$ such that $V \cap T^{-i}(U) = \emptyset$ for $0 \le i \le N$.
Take $n' \ge 1$ minimal such that $T^{n'}(u) \in V$, and let $0 \le n'' < n'$ be maximal such that $u' := T^{n''}(u) \in U$. Then $T^i(u') \notin U$ for all $1 \le i \le n' - n'' + N$. Since $n' - n'' + N > N$, this contradicts the uniform recurrence, and hence such $Y$ cannot exist. □
⁶ The expression "almost periodic" is frequently used as well, e.g. in [284, 381, 398, 465], but it is not used in the same way by all authors and sometimes refers to a different notion. For instance, in [482] it is used in the sense of "periodically recurrent" in our Definition 2.19.
Definition 2.18. Uniform recurrence means that the set
$$N(x, U) := \{n \in \mathbb{Z} \text{ or } \mathbb{N} : x \in T^{-n}(U)\}$$
is syndetic for every $x \in X$; i.e. it has bounded gaps (from the Greek συνδετικός = bound together). A set that is not syndetic has a complement that is thick: for every $N \in \mathbb{N}$ it contains blocks $\{n, n+1, \ldots, n+N\}$.
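Syndeticity is a statement about all of $\mathbb{N}$ (or $\mathbb{Z}$), so no finite computation can prove it. The largest gap in $N(x, U)$ over a finite horizon is nonetheless a useful diagnostic. The sketch below compares an irrational rotation, which is minimal, with the Cantor sequence from (2.3). The parameters ($\alpha = (\sqrt5 - 1)/2$, $U = [0, 0.1)$, and the horizons) are arbitrary illustrative choices, and the helper names are hypothetical.

```python
import math


def return_times(in_U, horizon):
    """Elements of N(x, U) below `horizon`, given a test n -> (T^n(x) in U)."""
    return [n for n in range(horizon) if in_U(n)]


def largest_gap(times):
    """Largest difference between consecutive return times."""
    return max(b - a for a, b in zip(times, times[1:]))


# Irrational rotation R_alpha (minimal): x = 0 and U = [0, 0.1).
alpha = (math.sqrt(5) - 1) / 2
rotation_gap = largest_gap(return_times(lambda n: (n * alpha) % 1.0 < 0.1, 10_000))


def cantor_word(k):
    """Right half of (2.3) after k substitution steps (length 3**k)."""
    word = "1"
    for _ in range(k):
        word = "".join({"0": "000", "1": "101"}[s] for s in word)
    return word


# Cantor sequence: times n with x_n = 1, i.e. returns to the cylinder [1].
cantor_gaps = []
for k in range(1, 8):
    w = cantor_word(k)
    cantor_gaps.append(largest_gap(return_times(lambda n: w[n] == "1", len(w))))
```

For the rotation, the largest gap stays bounded as the horizon grows. For the Cantor sequence, the largest gap in the $k$-th iterate is $3^{k-1} + 1$, so the set of $n$ with $x_n = 1$ is not syndetic (its complement is thick).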
Definition 2.19. A dynamical system is called periodically recurrent if for every non-empty open set $U$, there is $N \ge 1$ such that $U \subset T^{-kN}(U)$ for all $k \in \mathbb{N}$ (or $k \in \mathbb{Z}$ if $T$ is invertible).

Since periodic recurrence is obviously stronger than uniform recurrence, we have the following corollary of Proposition 2.17.
Corollary 2.20. Every periodically recurrent dynamical system with a dense orbit is minimal.
Definition 2.21. Given a dynamical system $(X, T)$, a point $x \in X$ is uniformly recurrent (resp. periodically recurrent) if for every neighborhood $U \ni x$, the set $N(x, U)$ is syndetic (resp. $N(x, U) \supset \{bk : k \in \mathbb{N} \text{ or } \mathbb{Z}\}$ for some $b \in \mathbb{N}$).
Corollary 2.22. Let $(X, T)$ be a continuous dynamical system and let $x \in X$ have a dense orbit. Then $(X, T)$ is minimal (resp. periodically recurrent) if and only if $x$ is uniformly recurrent (resp. periodically recurrent).

Proof. If $(X, T)$ is minimal, then $x$ is uniformly recurrent by Proposition 2.17, part (iii).
Conversely, assume that $x$ is uniformly recurrent. First observe that every $u \in \operatorname{orb}(x)$ is uniformly recurrent too. Indeed, suppose $u = T^n(x)$, and let $V$ be an open neighborhood of $x$. Then for every open neighborhood $U$ of $u$, also $U' = T^{-n}(U) \cap V$ is an open neighborhood of $x$, and $N(u, U) \supset N(x, U')$, because $T^k(x) \in T^{-n}(U)$ implies $T^k(u) = T^n(T^k(x)) \in U$. Now minimality of $(X, T)$ follows precisely as in the step (iii) ⇒ (i) in the proof of Proposition 2.17.
The proof for $x$ periodically recurrent is analogous. □
Definition 2.23. A dynamical system $(X, T)$ on a metric space $(X, d)$ is uniformly rigid if for every $\varepsilon > 0$ there is an iterate $n \ge 1$ such that $d(T^n(x), x) < \varepsilon$ for all $x \in X$.
Lemma 2.24. A continuous dynamical system $(X, T)$ on a Cantor set (or compact zero-dimensional set) is uniformly rigid if and only if it is periodically recurrent.
For this result, it is important that the space $X$ is zero-dimensional. For example, irrational rotations $R_\alpha$ on the circle are uniformly rigid but only uniformly, not periodically, recurrent. The uniform rigidity follows immediately because a circle rotation is an isometry and every point is recurrent. But periodic recurrence fails because for every $n \in \mathbb{N}$ and $x \in \mathbb{S}^1$, the set $\{R_\alpha^{kn}(x) : k \in \mathbb{N}\}$ is dense in $\mathbb{S}^1$. The proof below, however, shows that a periodically recurrent dynamical system on a compact space is uniformly rigid.
Proof. ⇒: Take $\varepsilon > 0$ arbitrary with corresponding iterate $n \ge 1$, and let $k \in \mathbb{N}$ be the smallest integer such that $2^{-k} < \varepsilon$. Thus the distance between every two distinct $k$-cylinders $Z$ in $X$ is at least $\varepsilon$. By uniform rigidity, $T^n(Z) \subset Z$, and therefore $T^{jn}(Z) \subset Z$ for all $j \ge 0$, proving periodic recurrence.
⇐: Let $\varepsilon > 0$ be arbitrary. For each $x \in X$, we can find a neighborhood $U_x$ with $\operatorname{diam}(U_x) < \varepsilon$ and an iterate $n_x$ such that $T^{n_x}(U_x) \subset U_x$. By compactness, there is a finite collection $x_1, \ldots, x_N$ such that $X = \bigcup_{i=1}^N U_{x_i}$. Take $n = \operatorname{lcm}\{n_{x_1}, \ldots, n_{x_N}\}$. Then $d(T^n(x), x) < \varepsilon$ for each $x \in X$, as required. □
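For the irrational rotation discussed before the proof, uniform rigidity is explicit. Since $R_\alpha$ is an isometry, $d(R_\alpha^n(x), x) = \|n\alpha\|$ (the distance from $n\alpha$ to the nearest integer) is the same for every $x$. The sketch below finds the first such $n$ for $\varepsilon = 0.05$, using the illustrative choice $\alpha = (\sqrt5 - 1)/2$. The helper names are hypothetical.

```python
import math


def circle_distance(a, b):
    """Distance on S^1 = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def rigidity_time(alpha, eps, n_max=10**6):
    """Smallest n >= 1 with ||n alpha|| < eps, i.e. d(R^n(x), x) < eps for all x."""
    for n in range(1, n_max):
        if circle_distance(n * alpha, 0.0) < eps:
            return n
    return None


alpha = (math.sqrt(5) - 1) / 2
n = rigidity_time(alpha, 0.05)
samples = [k / 1000 for k in range(1000)]
worst = max(circle_distance((x + n * alpha) % 1.0, x) for x in samples)
```

The resulting $n = 13$ is a Fibonacci number, i.e. a continued fraction denominator of this $\alpha$, and `worst` agrees with $\|13\alpha\|$ at every sample point.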

The following weakening of minimality is of importance for e.g. Toeplitz shifts and B-free shifts; see Sections 4.5 and 4.6.
Definition 2.25. A dynamical system $(X, T)$ is called essentially minimal if it contains a unique minimal set $Y$, i.e. a unique non-empty closed set $Y$ with $T(Y) = Y$ that contains no smaller non-empty closed $T$-invariant subset.

Clearly, essentially minimal maps can have at most one periodic orbit, but as the subshift $X := \{\sigma^k(\cdots 000001000000\cdots)\}_{k \in \mathbb{Z}} \cup \{0^\infty\}$ shows, $X \setminus Y \ne \emptyset$ is possible. However, the two-sided orbit closure of (2.3) does not give an essentially minimal shift.
Proposition 2.26. Given a dynamical system $(X, T)$ and a point $y \in X$, the following are equivalent:
(i) $(X, T)$ is essentially minimal and $y$ is contained in its minimal set.
(ii) For every $x \in X$, $\omega(x) \ni y$.
If, in addition, $T$ is invertible, then two further equivalent statements are:
(iii) For every $x \in X$, $\alpha(x) \ni y$.
(iv) For every open set $U \ni y$, $\bigcup_{n \in \mathbb{Z}} T^n(U) = X$.

Proof. (i) ⇒ (ii): $\omega(x)$ is a closed non-empty $T$-invariant set, so by Zorn's Lemma, it contains a minimal set. But $Y$ is the unique minimal set, so $y \in Y \subset \omega(x)$.
(ii) ⇒ (i): Let $Y$ and $Y'$ be minimal sets, and take $x \in Y$, $x' \in Y'$. By assumption $y \in \omega(x) \cap \omega(x')$, so $y \in Y \cap Y'$. Thus $Y \cap Y'$ is a non-empty, closed, and $T$-invariant subset of both $Y$ and $Y'$. Since $Y$ and $Y'$ are minimal, $Y = Y \cap Y' = Y'$.
(i) ⇔ (iii): Use the above with $T^{-1}$ instead of $T$.
(i) ⇒ (iv): Let $U$ be an arbitrary open neighborhood of $y$. Since $\bigcup_{n \in \mathbb{Z}} T^n(U)$ is an open (two-sided!) $T$-invariant set, its complement $Y'$ is closed and $T$-invariant. If $Y' \ne \emptyset$, then it contains a minimal set that does not contain $y$, contradicting essential minimality. Hence $\bigcup_{n \in \mathbb{Z}} T^n(U) = X$.
(iv) ⇒ (iii): Let $x \in X$ be arbitrary. We can assume without loss of generality that $x \ne T^k(y)$ for all $k \ge 0$: if $y$ is periodic, then $\alpha(x) = \operatorname{orb}(y) \ni y$, and otherwise we replace $x$ by $T^{-(k+1)}(x)$ to get it outside the forward orbit of $y$. Let $(U_r)_{r \in \mathbb{N}}$ be a nested sequence of open neighborhoods of $y$ such that $\bigcap_r U_r = \{y\}$. Since $\bigcup_{n \in \mathbb{Z}} T^n(U_r) = X$ and $X$ is compact, there is a finite $N_r$ such that $\bigcup_{n=-N_r}^{N_r} T^n(U_r) = X$. Applying $T^{N_r}$ to both sides, we obtain $\bigcup_{n=0}^{2N_r} T^n(U_r) = X$. Thus there is $n_r \le 2N_r$ such that $T^{-n_r}(x) \in U_r$. As we can do this for every $r$, we have found a sequence $(n_r)$ such that $T^{-n_r}(x) \to y$, and $n_r \to \infty$ because $x \ne T^k(y)$ for any $k \ge 0$. Thus $y \in \alpha(x)$, as required. □

2.3. Equicontinuous and Distal Systems

The opposite to expansive (recall Definition 1.39) is equicontinuous.
Definition 2.27. A dynamical system $(X, T)$ on a metric space $(X, d)$ is called equicontinuous if for all $\varepsilon > 0$ there exists $\delta > 0$ such that if $d(x, y) < \delta$, then $d(T^n(x), T^n(y)) < \varepsilon$ for all $n \ge 0$ (or $n \in \mathbb{Z}$ if $T$ is invertible). This is sometimes also called Lyapunov stability.

Naturally, if $T$ is not injective, then distality (see Definition 2.32 below) fails immediately. Every isometry, i.e. a dynamical system such that $d(T(x), T(y)) = d(x, y)$ for all $x, y \in X$, is equicontinuous.
Exercise 2.28. Let $(X, T)$ be an equicontinuous dynamical system. Show that it is topologically transitive if and only if it is minimal.
Lemma 2.29. Let $(X, T)$ be an equicontinuous surjection on a compact metric space $(X, d)$. Then the non-wandering set $\Omega(T) = X$.

Proof. Suppose by contradiction that $x \in X$ is wandering; i.e. there is an $\varepsilon > 0$ such that $T^k(B_\varepsilon(x)) \cap B_\varepsilon(x) = \emptyset$ for all $k \ge 1$. In particular, $x$ is not periodic. By equicontinuity, there is $\delta > 0$ such that $d(a, b) < \delta$ implies $d(T^n(a), T^n(b)) < \varepsilon/2$ for all $n \ge 0$. Using surjectivity, construct a backward orbit $(x_{-n})_{n \ge 0}$, i.e. $T^n(x_{-n}) = x$ and $T^k(x_{-n}) \notin B_\varepsilon(x)$ for all $k \in \mathbb{N} \setminus \{n\}$.
By compactness of $X$, $(x_{-n})_{n \ge 0}$ has an accumulation point $y \in X$. Let $m < n$ be such that $d(y, x_{-m}) < \delta$ and $d(y, x_{-n}) < \delta$. Then $T^n(x_{-n}) = x \in B_\varepsilon(x)$ and $T^n(x_{-m}) = T^{n-m}(x) \notin B_\varepsilon(x)$, so $d(T^n(x_{-m}), T^n(y)) \ge \varepsilon/2$ or $d(T^n(x_{-n}), T^n(y)) \ge \varepsilon/2$. This contradicts equicontinuity of $T$, and hence there cannot be a wandering point. □

Lemma 2.30. If an equicontinuous dynamical system $(X, T)$ on a compact metric space $(X, d)$ is topologically transitive, then it is uniformly rigid.
See [324, Proposition 1.1] for more general results in this direction.

Proof. Suppose $z \in X$ has a dense orbit. Take $\varepsilon > 0$ arbitrary and choose $\delta \in (0, \varepsilon/3)$ such that $d(x, y) < \delta$ implies $d(T^n(x), T^n(y)) < \varepsilon/3$ for all $n \ge 0$. Choose $N \in \mathbb{N}$ so large that $\bigcup_{n=0}^{N-1} B_\delta(T^n(z)) = X$ and $d(T^N(z), z) < \delta$. Now let $x$ be arbitrary and take $0 \le n < N$ such that $d(T^n(z), x) < \delta$. Then
$$d(T^N(x), x) \le d(T^N(x), T^{N+n}(z)) + d(T^{n+N}(z), T^n(z)) + d(T^n(z), x) \le \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \delta < \varepsilon,$$
as required. □

If $T : X \to X$ is equicontinuous on a compact metric space $(X, d)$, then the metric
$$d_\infty(x, y) := \sup_{n \ge 0} d(T^n(x), T^n(y))$$
is well-defined (i.e. not infinite) and non-expanding, because
$$d_\infty(T(x), T(y)) = \sup_{n \ge 1} d(T^n(x), T^n(y)) \le \sup_{n \ge 0} d(T^n(x), T^n(y)) = d_\infty(x, y).$$
However, equicontinuity also gives for every $\varepsilon > 0$ some $\delta > 0$ (and $\delta \to 0$ as $\varepsilon \to 0$) such that $d(x, y) < \delta$ implies $d_\infty(x, y) \le \varepsilon$. Therefore $x_n \to x$ in the metric $d$ if and only if $x_n \to x$ in the metric $d_\infty$, and hence both metrics generate the same topology.
If $T$ is itself a strict contraction, then also $d_\infty(T(x), T(y)) < d_\infty(x, y)$. But if $X$ is compact and $T$ is surjective, then the dynamical system $(X, T)$ is an isometry in the metric $d_\infty$.
Proposition 2.31. If a dynamical system $(X, T)$ is equicontinuous and surjective on a compact metric space $(X, d)$, then $T$ preserves $d_\infty$.
Since isometries are injective, $(X, T)$ is automatically a homeomorphism in this case.

Proof. We have already seen that $d_\infty(T(x), T(y)) \le d_\infty(x, y)$ for all $x, y \in X$. Assume by contradiction that we have strict inequality for some choice $a \ne b$, say $d_\infty(a, b) = d_\infty(T(a), T(b)) + 9\varepsilon$ for some $\varepsilon > 0$.
Consider the product system $T_2 = T \times T : X^2 \to X^2$ with metric
$$d_2((x, x'), (y, y')) := \max\{d_\infty(x, y), d_\infty(x', y')\}.$$
Clearly $T_2$ is non-expanding on $(X^2, d_2)$. Let $B \subset X^2$ be the $\varepsilon$-ball w.r.t. $d_2$ around $(a, b)$. So, if $(x, y) \in B$, then $d_\infty(x, a) < \varepsilon$ and $d_\infty(y, b) < \varepsilon$. Suppose that there is $n \ge 1$ such that $B \cap T_2^n(B) \ne \emptyset$, say $(x, y) \in B$ with $T_2^n(x, y) \in B$. This would mean that $d_\infty(T^n(x), a) < 3\varepsilon$ and $d_\infty(T^n(y), b) < 3\varepsilon$. But then
$$\begin{aligned} d_\infty(a, b) &\le d_\infty(a, T^n(x)) + d_\infty(T^n(x), T^n(y)) + d_\infty(T^n(y), b)\\ &\le 3\varepsilon + d_\infty(T(x), T(y)) + 3\varepsilon\\ &\le 6\varepsilon + d_\infty(T(x), T(a)) + d_\infty(T(a), T(b)) + d_\infty(T(b), T(y))\\ &\le 6\varepsilon + \varepsilon + d_\infty(a, b) - 9\varepsilon + \varepsilon = d_\infty(a, b) - \varepsilon. \end{aligned}$$
This contradiction shows that $T_2^n(B) \cap B = \emptyset$ for all $n \ge 1$. But then $(a, b)$ is a wandering point for $T_2$, contradicting Lemma 2.29 applied to the equicontinuous surjection $T_2$. □

Related notions to equicontinuity are distality and its opposite: proximality.
Definition 2.32. A dynamical system $(X, T)$ on a metric space $(X, d)$ is distal if $\liminf_n d(T^n(x), T^n(y)) > 0$ for every $x \ne y$. Conversely, a pair $(x, y) \in X^2$ is called proximal if $\liminf_n d(T^n(x), T^n(y)) = 0$. That is, a distal dynamical system has no proximal pairs (except $(x, x)$). A dynamical system $(X, T)$ is called proximal if every pair $(x, y) \in X^2$ is proximal.

Auslander & Ellis (see e.g. [13]) proved that for every $x \in X$, there exists a $y \in X$ such that $\overline{\operatorname{orb}(y)}$ is a minimal subset of $X$ and $(x, y)$ is a proximal pair. Note that proximality is not an equivalence relation: it is not transitive. For example, $(101)(000)^2(101)^3(000)^4\cdots$ and $(000)(101)^2(000)^3(101)^4\cdots$ are both proximal to $0^\infty$ under the shift, but not to each other. A stronger version of proximality that does give an equivalence relation is the following:
Definition 2.33. Let $(X, T)$ be a dynamical system on a metric space $(X, d)$. Then a pair of points $(x, y)$ is syndetically proximal if for every $\varepsilon > 0$, the set $\{n \in \mathbb{N} \text{ or } \mathbb{Z} : d(T^n(x), T^n(y)) < \varepsilon\}$ is syndetic.

The following result for subshifts goes back to [156, 562]; see also [434, Theorem 19] for the proof.
Theorem 2.34. Given a subshift $(X, \sigma)$, the following are equivalent:
(1) Proximality is an equivalence relation.
(2) Every proximal pair is syndetically proximal.
(3) The orbit closure $\overline{\{\sigma^n \times \sigma^n(x, y) : n \in \mathbb{N} \text{ or } \mathbb{Z}\}}$ of every $(x, y) \in X \times X$ contains exactly one minimal set in the product shift.
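The example of non-transitive proximality given before Definition 2.33 can be examined on finite prefixes. This sketch builds the two sequences block by block. Assuming the one-sided metric $d(x, y) = 2^{-\min\{i : x_i \ne y_i\}}$, it checks that every shift of the pair disagrees within the first two symbols, while both sequences contain long $0$-blocks. The helper names are hypothetical.

```python
def proximal_example(num_blocks):
    """Prefixes of (101)(000)^2(101)^3... and (000)(101)^2(000)^3..."""
    x_parts, y_parts = [], []
    for k in range(1, num_blocks + 1):
        a, b = ("101", "000") if k % 2 == 1 else ("000", "101")
        x_parts.append(a * k)
        y_parts.append(b * k)
    return "".join(x_parts), "".join(y_parts)


def first_difference(u, v):
    """Index of the first disagreement, or None if the words agree."""
    for i, (a, b) in enumerate(zip(u, v)):
        if a != b:
            return i
    return None


x, y = proximal_example(12)
# Largest index of first disagreement over all shifts that fit in the prefix;
# a value of 1 means d(sigma^n x, sigma^n y) >= 1/2 for every such n.
closest = max(first_difference(x[n:], y[n:]) for n in range(len(x) - 2))
```

Every block starts at a multiple of $3$, so the two sequences differ at all positions $\equiv 0, 2 \pmod 3$, and hence $d(\sigma^n(x), \sigma^n(y)) \ge \frac12$ for all $n$. The growing $0$-blocks, on the other hand, make each of them proximal to $0^\infty$.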

Distality doesn't imply equicontinuity; see Exercise 2.37. Neither does equicontinuity imply distality; think of $T(x) = x/2$ on $X = [0, 1]$ or on $X = \mathbb{R}$. However:

Corollary 2.35. Every equicontinuous surjection $(X, T)$ on a compact metric space $(X, d)$ is distal.
Proof. Assume by contradiction that $T$ is not distal. Then there are $x \ne y$ and a sequence $(n_k)_{k \in \mathbb{N}}$ such that $d(T^{n_k}(x), T^{n_k}(y)) \to 0$. Since $X$ is compact, by taking a subsequence, we can assume $\lim_k T^{n_k}(x) = \lim_k T^{n_k}(y) = z$ in the metric $d$. But then also $\lim_k T^{n_k}(x) = \lim_k T^{n_k}(y) = z$ in the metric $d_\infty$. This contradicts the fact that $T$ is an isometry in $d_\infty$ (Proposition 2.31), which gives $d_\infty(T^{n_k}(x), T^{n_k}(y)) = d_\infty(x, y) > 0$ for all $k$. □

In particular, equicontinuous surjections on a compact metric space are invertible, because distal dynamical systems are.

Corollary 2.36. An equicontinuous surjection $(X, T)$ on a compact metric space $(X, d)$ has an equicontinuous inverse.
Proof. Take $K_\varepsilon = \{(x, y) \in X^2 : d(x, y) \ge \varepsilon\}$ for $\varepsilon > 0$. We claim that
$$\delta(\varepsilon) := \inf\{d(T^n x, T^n y) : (x, y) \in K_\varepsilon,\ n \in \mathbb{N}\} > 0.$$
Indeed, assume by contradiction that there are sequences $(x_k, y_k) \subset K_\varepsilon$ and $(n_k) \subset \mathbb{N}$ such that $d(T^{n_k} x_k, T^{n_k} y_k) \le 1/k$ for all $k \in \mathbb{N}$. By compactness of $K_\varepsilon$, we may assume that $(x_k, y_k) \to (x_\infty, y_\infty) \in K_\varepsilon$. By Corollary 2.35, $T$ is distal, so $\eta := \inf\{d(T^n(x_\infty), T^n(y_\infty)) : n \ge 0\} > 0$. By equicontinuity, there is $\gamma(\eta) > 0$ such that $d(x, y) < \gamma(\eta)$ implies that $d(T^n(x), T^n(y)) < \eta/3$ for all $n \ge 0$. Take $k > 3/\eta$ so large that $(x_k, y_k) \in B_{\gamma(\eta)}(x_\infty, y_\infty)$. Then by the triangle inequality
$$\begin{aligned} d(T^{n_k}(x_\infty), T^{n_k}(y_\infty)) &\le d(T^{n_k}(x_\infty), T^{n_k}(x_k)) + d(T^{n_k}(x_k), T^{n_k}(y_k)) + d(T^{n_k}(y_k), T^{n_k}(y_\infty))\\ &< \eta/3 + \eta/3 + \eta/3 = \eta, \end{aligned}$$
contradicting the choice of $\eta$. Hence two points $u, v \in X$ with $d(u, v) < \delta(\varepsilon)$ have $d(T^{-n}(u), T^{-n}(v)) < \varepsilon$ for all $n \in \mathbb{N}$. This is equicontinuity of $T^{-1}$. □

Exercise 2.37. (a) Show that the map $T(x, y) = (x, x + y)$ on the two-torus $\mathbb{T}^2$ is distal but not equicontinuous.
(b) Let $\alpha \in [0, 1]$ be irrational. Show that the map $T(x, y) = (x + \alpha, x + y)$ on the two-torus $\mathbb{T}^2$ is distal but not equicontinuous. (Here showing minimality is the hard part; see Proposition 6.26.)
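Without solving Exercise 2.37(a), the failure of equicontinuity can be seen numerically. Since $T^n(x, y) = (x, y + nx)$, two points whose first coordinates differ by $\delta$ drift apart at speed $\delta$ in the second coordinate. The sketch below assumes the max metric on $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$ and uses illustrative sample points. The helper names are hypothetical.

```python
def circle_distance(a, b):
    """Distance on S^1 = R/Z."""
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


def torus_distance(p, q):
    """Max metric on T^2 = R^2 / Z^2 (an assumption for this illustration)."""
    return max(circle_distance(p[0], q[0]), circle_distance(p[1], q[1]))


def T(p):
    """T(x, y) = (x, x + y) on T^2."""
    x, y = p
    return (x, (x + y) % 1.0)


def distances(p, q, steps):
    """d(T^n p, T^n q) for n = 0, ..., steps - 1."""
    out = []
    for _ in range(steps):
        out.append(torus_distance(p, q))
        p, q = T(p), T(q)
    return out


# Points at distance delta in the x-direction separate to about 1/2 ...
separation = {delta: max(distances((0.3, 0.1), (0.3 + delta, 0.1), int(1 / delta)))
              for delta in (1e-2, 1e-3, 1e-4)}
# ... while points on the same vertical circle keep their distance forever.
same_fibre = distances((0.3, 0.1), (0.3, 0.2), 1000)
```

However small $\delta$ is, the orbits separate to distance about $\frac12$, so no $\delta$ works for $\varepsilon < \frac12$ in Definition 2.27. Points on the same vertical circle keep their distance, and points on different vertical circles keep their first-coordinate distance, which hints at why $T$ is distal.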

Proposition 2.38. Every subshift $(X, \sigma)$ with a non-periodic minimal set is not distal, i.e. it has proximal pairs $(x, y)$ with $x \ne y$ (so it is not equicontinuous by Corollary 2.35).

The non-periodicity is essential; otherwise $X = \{(01)^\infty, (10)^\infty\}$ is an equicontinuous counterexample. Non-periodicity implies in particular that $X$ is uncountable.

Proof. First assume that the shift is one-sided. If it is distal, then it has to be invertible, and therefore a homeomorphism. But a one-sided shift is locally expanding, and locally expanding homeomorphisms only exist on finite spaces; see Proposition 1.42. Hence there are no distal one-sided shifts other than finite unions of periodic orbits.
Now if $(X, \sigma)$ is a two-sided shift, then its one-sided restriction $(X^+, \sigma)$ is a subshift too. Here we need to check that $\sigma : X^+ \to X^+$ is surjective. This follows because if $x^+$ is the one-sided restriction of $x \in X$, then $y^+ := \sigma^{-1}(x)^+ \in X^+$ and $\sigma(y^+) = x^+$. Furthermore, since $X$ has a non-periodic minimal set, $X^+$ has a non-periodic minimal set too. Thus the above argument shows that $(X^+, \sigma)$ cannot be distal. □
Definition 2.39. Given a dynamical system (X, T ), we say that (Y, S) is
the maximal equicontinuous factor (MEF) if it is equicontinuous and
semi-conjugate to (X, T ) and every other equicontinuous factor of (X, T ) is
also a factor of (Y, S).
Every dynamical system has an MEF, and it can be shown that the MEF
is unique up to conjugacy. This goes back to a result of Ellis & Gottschalk
[236]. The proof we give is for invertible dynamical systems7 and relies on
the notion of regional proximality:
Definition 2.40. Let (X, T) be a dynamical system on a metric space (X, d). Two points x, y ∈ X are regionally proximal if there are sequences $x_i \to x$ and $y_i \to y$ and $(n_i) \subset \mathbb{N}$ such that $d(T^{n_i}(x_i), T^{n_i}(y_i)) \to 0$. In this case we write $x \sim_{rp} y$. It is not obvious that $\sim_{rp}$ is a transitive relation8, and therefore we take the transitive hull: $x \sim_{trp} y$ if there is a sequence $x = z_0 \sim_{rp} z_1 \sim_{rp} \cdots \sim_{rp} z_N = y$.
Proposition 2.41. Every continuous invertible dynamical system (X, T ) on
a compact metric space (X, d) has a maximal equicontinuous factor.

Proof. First we note that if (X, T) is equicontinuous and $x \sim_{rp} y$, then x = y. Indeed, otherwise for any ε > 0 and δ = δ(ε) as in the definition of equicontinuity, there are $x_i \in B_\delta(x)$, $y_i \in B_\delta(y)$ and $n_i$ such that $d(T^{n_i}(x_i), T^{n_i}(y_i)) < \varepsilon$. But then also
$$
d(T^{n_i}(x), T^{n_i}(y)) \le d(T^{n_i}(x), T^{n_i}(x_i)) + d(T^{n_i}(x_i), T^{n_i}(y_i)) + d(T^{n_i}(y_i), T^{n_i}(y)) < 3\varepsilon.
$$
7 See [381, Theorem 2.44] for a proof of the non-invertible case, which is not constructive when it comes to the factor map.
8 See e.g. [321, 434, 497] for further information.

Therefore (x, y) is not a distal pair, but equicontinuous maps are distal; see
Corollary 2.35.
The (transitive hull) relation $\sim_{trp}$ is an equivalence relation that is T-invariant and also $T^{-1}$-invariant. The equivalence classes are closed, and if $x_k \to x$, $y_k \to y$ are such that $x_k \sim_{trp} y_k$, then also $x \sim_{trp} y$. Therefore the quotient space $X_{eq} = X/\!\sim_{trp}$ is a well-defined Hausdorff space (and in fact a metric space with quotient metric $d_{eq}$), and the maps T and $T^{-1}$ are well-defined on it.
Now suppose by contradiction that T, and hence $T^{-1}$, is not equicontinuous on the quotient space $X_{eq}$. Then there is ε > 0 such that for all i ∈ N, there are $x_i, y_i \in X_{eq}$ with $d_{eq}(x_i, y_i) < 1/i$, and $n_i \in \mathbb{N}$ such that $d_{eq}(x'_i, y'_i) > \varepsilon$ for $x'_i = T^{-n_i}(x_i)$ and $y'_i = T^{-n_i}(y_i)$. By passing to a subsequence, we can assume that $x'_i \to x$ and $y'_i \to y$ with $d_{eq}(x, y) \ge \varepsilon$. But $d_{eq}(T^{n_i}(x'_i), T^{n_i}(y'_i)) = d_{eq}(x_i, y_i) \to 0$, so x and y are regionally proximal in $X_{eq}$ by construction, contradicting that $X_{eq}$ has only trivial regionally proximal pairs. □

2.3.1. Mean Equicontinuity. Instead of assuming that nearby points always remain close under iteration, mean equicontinuity stipulates that iterates of nearby points remain close on average. This notion was first used by Fomin [250] under the name of mean Lyapunov stability9.

Definition 2.42. A dynamical system (X, T, d) on a metric space is called mean equicontinuous if for every ε > 0 there is a δ > 0 such that d(x, y) < δ implies
$$
\limsup_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} d(T^i x, T^i y) < \varepsilon.
$$

Mean equicontinuity is more versatile than its strict version. Clearly, circle rotations $R_\alpha : S^1 \to S^1$ with angle $\alpha \notin \mathbb{Q}$ are isometries and therefore equicontinuous. Their symbolic versions, i.e. Sturmian shifts, see Section 4.3, are expansive and therefore not equicontinuous. Indeed, equip $S^1$ with an orientation and a partition {[0, α), [α, 1)}, with symbols 1 and 0, respectively, as is done in Example 1.34. If $x < y \in S^1$ are very close together, then there are still iterates n ∈ N such that $R_\alpha^n(x) < 0 < R_\alpha^n(y)$, i.e. the short arc from $R_\alpha^n(x)$ to $R_\alpha^n(y)$ contains 0, so the symbolic distance $d_\sigma(\sigma^n \circ i(x), \sigma^n \circ i(y)) = 1$. However, since this happens less frequently as the distance |x − y| becomes smaller, mean equicontinuity of a Sturmian shift is still achieved.
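The last claim can be tested numerically. The following Python sketch is only an illustration under stated assumptions. The angle α = (√5 − 1)/2, the base point x = 0.1, and the one-sided metric d_σ(u, v) = 2^{−m} (where m is the first index with u_m ≠ v_m) are choices made here, and the helper names are hypothetical. Take y = x + h with h < min{α, 1 − α}. The symbols at time n differ exactly when $R_\alpha^n(x) \in [\alpha - h, \alpha) \cup [1 - h, 1)$. By equidistribution, the disagreement frequency therefore tends to 2h, and the mean symbolic distance is at most about $\sum_m 2^{-m} \cdot 2h = 4h$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # assumed rotation angle (golden mean)


def coding(z, alpha=ALPHA):
    """Symbol of z mod 1 in the partition {[0, alpha), [alpha, 1)} -> {1, 0}."""
    return 1 if z % 1.0 < alpha else 0


def itinerary(x, length, alpha=ALPHA):
    """First `length` symbols of the itinerary of x under the rotation R_alpha."""
    return [coding(x + k * alpha, alpha) for k in range(length)]


def disagreement_frequency(x, h, n, alpha=ALPHA):
    """Fraction of times 0 <= k < n at which the itineraries of x and x + h differ."""
    a, b = itinerary(x, n, alpha), itinerary(x + h, n, alpha)
    return sum(u != v for u, v in zip(a, b)) / n


def mean_symbolic_distance(x, h, n, window=40, alpha=ALPHA):
    """(1/n) * sum_{j<n} d_sigma(sigma^j i(x), sigma^j i(x + h)), where
    d_sigma(u, v) = 2**(-m) and m is the first index with u_m != v_m.
    Disagreements more than `window` symbols ahead are ignored (error <= 2**-window)."""
    a = itinerary(x, n + window, alpha)
    b = itinerary(x + h, n + window, alpha)
    total = 0.0
    for j in range(n):
        for m in range(window):
            if a[j + m] != b[j + m]:
                total += 2.0 ** (-m)
                break
    return total / n


if __name__ == "__main__":
    for h in (1e-1, 1e-2, 1e-3):
        print(h, disagreement_frequency(0.1, h, 100_000), mean_symbolic_distance(0.1, h, 20_000))
```

Both quantities shrink roughly linearly in h. This is the mechanism behind mean equicontinuity here. No single bound on h keeps all symbols equal, which is why the shift is not equicontinuous.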
Another variation of equicontinuity, which is a priori stronger than mean equicontinuity (but weaker than equicontinuity), is Weyl mean equicontinuity: for every ε > 0 there is a

9 This was defined as: for every ε > 0 there is δ > 0 such that d(x, y) < δ implies $d(T^n(x), T^n(y)) < \varepsilon$ for all n ∈ N except for a set of density zero. This is equivalent to Definition 2.42 by Lemma 8.53.

δ > 0 such that d(x, y) < δ implies
$$
\limsup_{n-m\to\infty} \frac{1}{n-m} \sum_{i=m}^{n-1} d(T^i x, T^i y) < \varepsilon.
$$

However, it was proved in [211] for minimal dynamical systems (and [263, 464] in more generality) that (X, T) is mean equicontinuous if and only if for every ε > 0 there are δ > 0 and N ∈ N such that d(x, y) < δ implies
$$
\frac{1}{n-m} \sum_{i=m}^{n-1} d(T^i x, T^i y) < \varepsilon \quad \text{for all } m \text{ and } n \ge m + N.
$$

Some of the stronger results on mean equicontinuity rely on invariant measures and therefore don’t quite fit in this section on topological dynamics. We present some of this nonetheless and refer to Chapter 6 for the relevant details. Given a T-invariant Borel probability measure μ, we call (X, T) μ-mean equicontinuous if for every η > 0, there is a set Y ⊂ X of measure μ(Y) > 1 − η such that T is mean equicontinuous on Y.
As shown in [208, 449], if (X, T ) is an almost one-to-one extension of a
minimal equicontinuous dynamical system (Y, S), then (Y, S) is the maximal
equicontinuous factor of (X, T ).
It follows from Theorem 6.22 (or more precisely the remarks that follow
it) that transitive mean equicontinuous dynamical systems are uniquely er-
godic. Thus the following characterization of mean equicontinuity, due to
[211] for minimal dynamical systems and to [263] in general, makes sense:
Theorem 2.43. A continuous dynamical system (X, T ) is mean equicontin-
uous if and only if its semi-conjugacy to its maximal equicontinuous factor is
at the same time a measure-theoretic isomorphism between the unique invari-
ant probability measures of (X, T ) and its maximal equicontinuous factor10 .

Let us show that the symbolic version of an equicontinuous homeomorphism with a partition that is not too complicated (see condition (2) below) is mean equicontinuous.
Theorem 2.44. Let (X, T) be an equicontinuous homeomorphism on a compact metric space (X, d) with T-invariant measure11 μ. Let $\mathcal{P} = \{P_0, \dots, P_{r-1}\}$ be a finite partition such that:
(1) $\mathcal{P}$ is generating (cf. Theorem 6.48); i.e. for every x ≠ x′ ∈ X there is n ∈ Z such that $T^n(x)$ and $T^n(x')$ lie in different partition elements;
10 In this case, (X, T ) is called a topo-isomorphic extension of its MEF.
11 If (X, T ) is minimal, then μ is unique.

(2) $\lim_{\varepsilon\to0} \mu(U_\varepsilon) = 0$, where $U_\varepsilon$ is the ε-neighborhood of $\partial\mathcal{P} = \{x \in X : x \in \overline{P_i} \cap \overline{P_j} \text{ for some } 0 \le i < j < r\}$.

Let (Y, σ) be the symbolic system associated to (X, T, $\mathcal{P}$), i.e. the smallest subshift such that the itinerary i(x) ∈ Y for every x ∈ X. Then (Y, σ) is mean equicontinuous.

Proof. If (X, T) is transitive, then μ is the only T-invariant probability measure, see Theorem 6.22, and we can use Oxtoby’s Ergodic Theorem 6.20 later on. Otherwise, we can separate X into transitive parts and deal with each part separately.
Choose N ∈ N arbitrary and $0 < \varepsilon' < 2^{-N}/(2N+1)$. Choose ε > 0 so small that $\mu(U_\varepsilon) < \varepsilon'$. By equicontinuity of (X, T) there is δ > 0 such that $d(T^n(x), T^n(x')) < \varepsilon$ for all n ∈ Z whenever d(x, x′) < δ. Next take M ∈ N so large that $\operatorname{diam}(i^{-1}([e_{-M} \cdots e_M])) < \delta$ for every two-sided (2M + 1)-cylinder $[e_{-M} \cdots e_M]$.
Now take y, y′ ∈ Y such that $d_\sigma(y, y') \le 2^{-M}$, where $d_\sigma$ is the symbolic metric; i.e. y, y′ are in the same two-sided (2M + 1)-cylinder. The sequences y, y′ may not be well-defined itineraries of points in X, but this is remedied by assuming that points x ∈ X such that $T^n(x) \in \partial\mathcal{P}$ get multiple itineraries, according to which $\overline{P_i}$ contains $T^n(x)$. In this sense there are x, x′ such that at least one of their multiple itineraries equals y and y′, respectively. In particular, d(x, x′) < δ and therefore $d(T^n(x), T^n(x')) < \varepsilon$ for all n ∈ Z. The points $T^n(x)$ and $T^n(x')$ can only lie in different partition elements if they both lie in $U_\varepsilon$. Unless $T^n(x), T^n(x') \in V_\varepsilon := \bigcup_{j=-N}^{N} T^j(U_\varepsilon)$, their itineraries satisfy $d_\sigma(i(T^n(x)), i(T^n(x'))) \le 2^{-N}$. But the measure $\mu(V_\varepsilon) \le (2N+1)\varepsilon'$, and by Oxtoby’s Ergodic Theorem 6.20 (see footnote 12), x and x′ visit $V_\varepsilon$ with frequency $\le (2N+1)\varepsilon'$. Therefore
$$
\begin{aligned}
\limsup_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} d_\sigma(\sigma^j(y), \sigma^j(y')) &= \limsup_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} d_\sigma(\sigma^j(i(x)), \sigma^j(i(x'))) \\
&\le \limsup_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} d_\sigma(i(T^j(x)), i(T^j(x'))) \\
&\le (2N+1)\varepsilon' + (1 - (2N+1)\varepsilon')\,2^{-N} \le 2^{-N+1}.
\end{aligned}
$$
This proves mean equicontinuity. □

12 We will apply Oxtoby’s Ergodic Theorem for the indicator function $1_{V_\varepsilon}$, which is discontinuous. But by assumption (2), $\mu(\partial V_\varepsilon)$ can be made arbitrarily small by taking ε small, so that $1_{V_\varepsilon}$ can be approximated by a continuous function with negligible error.

2.4. Topological Entropy

The notion of topological entropy was introduced by Adler, Konheim & McAndrew [9] in 1965. Nowadays, the definition due to the American mathematician Rufus Bowen [102] and, independently, his Russian colleague Efim Dinaburg [202] is the one most often13 used.
Entropy is a measure of disorder of the dynamical system, and one pop-
ular definition of chaos is that the topological entropy is positive.
Let (X, T) be a continuous dynamical system on a compact metric space (X, d). If my eyesight is not so good, I cannot distinguish two points x, y ∈ X if d(x, y) ≤ ε. I may still be able to distinguish their orbits, if $d(T^k x, T^k y) > \varepsilon$ for some k ≥ 0. Hence, if I’m willing to wait up to n − 1 iterations, I can distinguish x and y if
$$
d_n(x, y) := \max\{d(T^k x, T^k y) : 0 \le k < n\} > \varepsilon.
$$
If this holds, then x and y are said to be (n, ε)-separated. Among all the subsets of X of which all elements are mutually (n, ε)-separated, choose one, say $E_n(\varepsilon)$, of maximal cardinality. Then $s_n(\varepsilon) := \#E_n(\varepsilon)$ is the maximal number of n-orbits I can distinguish with my ε-poor eyesight.
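The quantity $s_n(\varepsilon)$ can be explored numerically. The following Python sketch is a heuristic and not an algorithm for computing topological entropy. It works on the circle ℝ/ℤ with the arc-length metric and restricts to a hypothetical finite grid {k/512}. It selects points greedily, so it returns the size of a maximal (n, ε)-separated subset of the grid, which only bounds $s_n(\varepsilon)$ from below. For an irrational rotation $d_n = d$, so the count does not depend on n. For the doubling map x ↦ 2x mod 1 (an example chosen here for contrast) the count grows roughly like $2^n$.

```python
import math

GRID = [k / 512 for k in range(512)]  # assumed finite sample of the circle R/Z


def circle_dist(a, b):
    """Arc-length distance on the circle R/Z."""
    t = abs(a - b) % 1.0
    return min(t, 1.0 - t)


def rotation_orbit(x, n, alpha=(math.sqrt(5) - 1) / 2):
    """x, R(x), ..., R^{n-1}(x) for the rotation by alpha."""
    return [(x + k * alpha) % 1.0 for k in range(n)]


def doubling_orbit(x, n):
    """x, 2x, ..., 2^{n-1}x mod 1 (exact in floating point for dyadic grid points)."""
    return [(x * 2 ** k) % 1.0 for k in range(n)]


def d_n(orb_x, orb_y):
    """Bowen distance d_n = max_{0 <= k < n} d(T^k x, T^k y) from precomputed orbits."""
    return max(circle_dist(u, v) for u, v in zip(orb_x, orb_y))


def greedy_separated_count(orbit, n, eps, points=GRID):
    """Size of a maximal (n, eps)-separated subset of `points`, chosen greedily."""
    chosen = []
    for p in points:
        op = orbit(p, n)
        if all(d_n(op, oq) > eps for oq in chosen):
            chosen.append(op)
    return len(chosen)


if __name__ == "__main__":
    for n in range(1, 6):
        r = greedy_separated_count(rotation_orbit, n, 0.1)
        s = greedy_separated_count(doubling_orbit, n, 0.1)
        print(n, r, s, math.log(s) / n)
```

Dividing the logarithm of the count by n gives a crude growth-rate estimate. With small n and a finite grid, these numbers should be read qualitatively only.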
Remark 2.45. Compactness of X together with continuity of T ensures that $s_n(\varepsilon) < \infty$. However, also for discontinuous maps, such as β-transformations, it can be proven that $s_n(\varepsilon) < \infty$ for all ε > 0 and n ∈ N. Consequently, this approach to topological entropy usually also works for discontinuous maps.

The topological entropy is defined as the limit (as ε → 0) of the exponential growth rate of $s_n(\varepsilon)$:
$$
h_{top}(T) = \lim_{\varepsilon\to0} \limsup_{n\to\infty} \frac{1}{n} \log s_n(\varepsilon). \tag{2.4}
$$
Note that $s_n(\varepsilon_1) \ge s_n(\varepsilon_2)$ if $\varepsilon_1 \le \varepsilon_2$, so $\limsup_n \frac{1}{n} \log s_n(\varepsilon)$ is a decreasing function of ε, and the limit as ε → 0 indeed exists (we allow the limit to be ∞).
Instead of (n, ε)-separated sets, we can also work with (n, ε)-spanning
sets, that is, sets that contain, for every x ∈ X, a point y such that dn (x, y) ≤
ε. Let rn (ε) denote the minimal cardinality among all (n, ε)-spanning sets.
Due to its maximality, En (ε) is always (n, ε)-spanning, and no proper subset
of En (ε) is (n, ε)-spanning. Each y ∈ En (ε) must have a point of an (n, ε/2)-
spanning set within an ε/2-ball (in dn -metric) around it, and by the triangle

13 Note, however, that the Adler, Konheim & McAndrew definition requires only a topology, whereas the Bowen-Dinaburg definition is metric.


inequality, this ε/2-ball is disjoint from the ε/2-balls centered around all other points in $E_n(\varepsilon)$. Therefore,
$$
r_n(\varepsilon) \le s_n(\varepsilon) \le r_n(\varepsilon/2). \tag{2.5}
$$
Thus we can equally well define
$$
h_{top}(T) = \lim_{\varepsilon\to0} \limsup_{n\to\infty} \frac{1}{n} \log r_n(\varepsilon). \tag{2.6}
$$
Example 2.46. Let (X, σ) be the full shift on N symbols. Let ε > 0 be arbitrary, and take m minimal such that $2^{-m} < \varepsilon$. If we select a point from each (n + m)-cylinder, this gives an (n, ε)-spanning set, whereas selecting one point from each n-cylinder gives an (n, ε)-separated set. Therefore
$$
\begin{aligned}
\log N &= \limsup_{n\to\infty} \frac{1}{n}\log N^n \le \limsup_{n\to\infty} \frac{1}{n}\log s_n(\varepsilon) \le h_{top}(\sigma) \\
&\le \limsup_{n\to\infty} \frac{1}{n}\log r_n(\varepsilon) \le \limsup_{n\to\infty} \frac{1}{n}\log N^{n+m} = \log N.
\end{aligned}
$$
Exercise 2.47. Show that for subshifts the definition of (1.3) coincides with the (n, ε)-definition in this section.
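For subshifts, Example 2.46 suggests counting words instead of separated sets. This sketch assumes that (1.3) is the exponential growth rate $\lim_n \frac{1}{n}\log p(n)$ of the number p(n) of words of length n; that formula is not shown in this section. The following Python sketch counts words by brute force for the golden mean shift, the subshift of $\{0, 1\}^{\mathbb{N}}$ in which the word 11 is forbidden, chosen here only as an illustration. The counts are Fibonacci numbers, and $\log(p(n+1)/p(n))$ approaches $\log\frac{1+\sqrt5}{2} \approx 0.4812$.

```python
import math
from itertools import product


def count_words(n, forbidden=("11",), alphabet="01"):
    """Number of words of length n over `alphabet` containing no word of `forbidden`."""
    return sum(
        1
        for letters in product(alphabet, repeat=n)
        if not any(f in "".join(letters) for f in forbidden)
    )


if __name__ == "__main__":
    counts = [count_words(n) for n in range(1, 15)]
    print(counts)  # Fibonacci numbers 2, 3, 5, 8, ...
    golden = (1 + math.sqrt(5)) / 2
    print(math.log(counts[-1] / counts[-2]), math.log(golden))
```

Brute force is exponential in n. For larger n one would count paths in the transition graph instead, but enumeration keeps the link with "number of n-words" explicit.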
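Example 2.48. Consider the β-transformation $T_\beta : [0, 1) \to [0, 1)$, $x \mapsto \beta x \bmod 1$ for some β > 1. Take $\varepsilon < \frac{1}{2\beta^2}$ and $G_n = \{\frac{k}{\beta^n} : 0 \le k < \beta^n\}$. Then $G_n$ is (n, ε)-separated, so $s_n(\varepsilon) \ge \beta^n$. On the other hand, $G'_n = \{\frac{2k\varepsilon}{\beta^n} : 0 \le k < \frac{\beta^n}{2\varepsilon}\}$ is (n, ε)-spanning, so $r_n(\varepsilon) \le \frac{\beta^n}{2\varepsilon}$. Therefore
$$
\log\beta = \limsup_{n\to\infty}\frac{1}{n}\log\beta^n \le h_{top}(T_\beta) \le \limsup_{n\to\infty}\frac{1}{n}\log\frac{\beta^n}{2\varepsilon} = \log\beta.
$$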
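For β = 2 the separation claim can be checked exactly with rational arithmetic. The following Python sketch makes two assumptions: β = 2, and Python’s `fractions.Fraction` for exact arithmetic. It computes $d_n$ on $G_n = \{k/2^n : 0 \le k < 2^n\}$ with the metric |x − y| on [0, 1). The minimum over all pairs is 1/2. To see why, let m be the least significant binary digit in which k and l differ. Then $T_2^{\,n-1-m}$ moves that digit to the front while the lower digits agree, so the two iterates differ by exactly 1/2. Hence any ε < 1/2 works, in particular ε < 1/(2β²) = 1/8.

```python
from fractions import Fraction
from itertools import combinations


def doubling(x):
    """T_2(x) = 2x mod 1 on [0, 1), in exact arithmetic."""
    return (2 * x) % 1


def bowen_distance(x, y, n, T=doubling):
    """d_n(x, y) = max_{0 <= k < n} |T^k x - T^k y| (interval metric on [0, 1))."""
    best = Fraction(0)
    for _ in range(n):
        best = max(best, abs(x - y))
        x, y = T(x), T(y)
    return best


def min_separation(n):
    """Smallest d_n-distance between two distinct points of G_n = {k / 2^n}."""
    grid = [Fraction(k, 2 ** n) for k in range(2 ** n)]
    return min(bowen_distance(x, y, n) for x, y in combinations(grid, 2))


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, min_separation(n))
```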
Circle rotations, or in general isometries, have zero topological entropy.
Indeed, if E(ε) is an ε-separated set (or ε-spanning set), it will also be (n, ε)-
separated (or (n, ε)-spanning) for every n ≥ 1. Hence sn (ε) and rn (ε) are
independent of n, and their exponential growth rates are equal to zero. In
more generality:
Proposition 2.49. Every equicontinuous transformation (X, T ) on a com-
pact metric space (X, d) has zero entropy.

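Proof. Let ε > 0 be arbitrary and choose δ > 0 as in the definition of equicontinuity. Then $\operatorname{diam}(T^n(B_\delta(x))) \le 2\varepsilon$ for all x ∈ X and n ≥ 0 (or n ∈ Z if T is invertible). By compactness, X can be covered by finitely many, say M, δ-balls, where M depends on ε but not on n. Hence this single cover of X by M δ-balls constitutes a cover by (n, ε)-balls for all n, so $r_n(\varepsilon) \le M$. Therefore $h_{top}(T) \le \lim_{\varepsilon\to0}\lim_{n\to\infty}\frac{1}{n}\log M = 0$. □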
Corollary 2.50. Given a continuous map T : X → X, $h_{top}(T^k) = k\,h_{top}(T)$ for all k ≥ 0, and if T is invertible, then $h_{top}(T^k) = |k|\,h_{top}(T)$ for all k ∈ Z.

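Proof. Let k ∈ N and ε > 0. By uniform continuity of T, there is ε′ ∈ (0, ε] such that d(x, y) ≤ ε′ implies $d(T^i(x), T^i(y)) \le \varepsilon$ for all 0 ≤ i < k. An (n, ε)-separated set for $T^k$ is also a (kn, ε)-separated set for T, and, by the choice of ε′, a (kn, ε)-separated set for T is also an (n, ε′)-separated set for $T^k$. Therefore
$$
s_n(\varepsilon, T^k) \le s_{kn}(\varepsilon, T) \le s_n(\varepsilon', T^k).
$$
Taking $\frac{1}{n}\log$, then $\limsup_{n\to\infty}$ (using that $s_m(\varepsilon, T)$ is non-decreasing in m), and finally ε → 0 (so that also ε′ → 0), gives
$$
h_{top}(T^k) = k \lim_{\varepsilon\to0}\limsup_{n\to\infty}\frac{1}{kn}\log s_{kn}(\varepsilon, T) = k\,h_{top}(T).
$$
Clearly the identity $T^0$ has zero entropy. If T is invertible and $E_n(\varepsilon)$ is an (n, ε)-separated set, then $T^{n-1}(E_n(\varepsilon))$ is an (n, ε)-separated set for $T^{-1}$. Therefore $h_{top}(T^{-1}) = h_{top}(T)$. Combined with the first part, it follows that $h_{top}(T^k) = |k|\,h_{top}(T)$ for all k ∈ Z. □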
Corollary 2.51. If (Y, S) is a continuous factor of (X, T) (where (X, d) is a compact metric space), then $h_{top}(S) \le h_{top}(T)$. In particular, conjugate dynamical systems on compact metric spaces have the same topological entropy.

Proof. Let π : X → Y be a continuous factor map. Since X is compact, π is uniformly continuous, so for ε > 0, we can find δ > 0 such that d(x, y) < δ
implies d(π(x), π(y)) < ε. Therefore, if En (δ) is an (n, δ)-spanning set for
T , then π(En (δ)) is an (n, ε)-spanning set for S (but possibly not a minimal
(n, ε)-spanning set, even if En (δ) is minimal). It follows that rn (δ, T ) ≥
rn (ε, S), and hence htop (T ) ≥ htop (S). 
Proposition 2.52. Let (X, T) be a continuous dynamical system on a compact metric space (X, d). The entropy of its restriction to the non-wandering set Ω(T) satisfies $h_{top}(T) = h_{top}(T|_{\Omega(T)})$.
Since T -invariant measures have to be supported on the non-wandering
set, Proposition 2.52 follows from the Variational Principle (Theorem 6.63).
A direct proof (not using invariant measures) can be found in [22, Lemma
Example 2.53. The non-wandering set Ω(σ) of the subshift
$$
X = \{0^{n_1}1^{n_2}0^{n_3}1^{n_4}\cdots : 0 \le n_1 \le \max\{n_1, 1\} \le n_2 \le n_3 \le n_4 \le \cdots\}
$$
consists of periodic orbits $0^k1^k0^k1^k\cdots$ or $1^k0^k1^k0^k\cdots$, i.e. with period 2k. Each such orbit contains 2k points, so the number of 2n-periodic points (not necessarily prime period 2n) in these orbits is $\sum_{k \mid n} 2k \le n(n+1)$, which grows only polynomially in n. In view of Proposition 2.52, we have $h_{top}(\sigma) = 0$.
2.4.1. Amorphic Complexity. If the cardinalities of (n, ε)-separated and of (n, ε)-spanning sets increase subexponentially, then one could compute the polynomial growth rate instead. This is called power entropy:
$$
h_{pow}(T) = \lim_{\varepsilon\to0} \limsup_{n\to\infty} \frac{\log s_n(\varepsilon)}{\log n}; \tag{2.7}
$$
see [304]. However, in practice this isn’t a very powerful tool to distinguish between dynamical systems, because, for instance, all dynamical systems with linear word-complexity have $h_{pow}(T) = 1$. A recent approach [264], which turns out to distinguish between many zero-entropy systems (even of linear complexity and between some semi-conjugate dynamical systems), is amorphic complexity14. It is based on the asymptotic frequency v with which orbits are at least δ apart. Given a dynamical system (X, T) on a metric space (X, d), two points x, y ∈ X are (δ, v)-separated for some δ > 0 if
$$
\limsup_{n\to\infty} \frac{1}{n}\,\#\{0 \le j < n : d(T^j(x), T^j(y)) \ge \delta\} \ge v.
$$
A set S ⊂ X is (δ, v)-separated if every x ≠ y ∈ S is (δ, v)-separated. Let
Sep(δ, v) denote the maximal cardinality of the (δ, v)-separated sets. We say
that (X, T ) has finite separation numbers if Sep(δ, v) < ∞ for all δ, v > 0.
If Sep(δ, v) = ∞ for some δ, v > 0, then (X, T ) has infinite separation
numbers, and in this case the amorphic complexity defined below is infinite,
hence not so useful. This occurs, for instance, in the following cases; see
[264, Theorem 1.1]:
Theorem 2.54. Let (X, T ) be a continuous dynamical system on a com-
pact metric space (X, d). If htop (T ) > 0 or T is weakly mixing w.r.t. some
non-atomic invariant probability measure (see Definition 6.83), then T has
infinite separation numbers.

Hence we are only interested in dynamical systems with separation numbers that are finite, but potentially unbounded in v.
Definition 2.55. Assume that (X, T) has finite separation numbers. The upper/lower amorphic complexity is the polynomial growth rate of the separation numbers as a function of v tending to zero:
$$
\overline{ac}(T) = \sup_{\delta>0}\limsup_{v\to0}\frac{\log \operatorname{Sep}(\delta, v)}{-\log v}, \qquad
\underline{ac}(T) = \sup_{\delta>0}\liminf_{v\to0}\frac{\log \operatorname{Sep}(\delta, v)}{-\log v}. \tag{2.8}
$$
If these quantities are the same, then $ac(T) = \sup_{\delta>0}\lim_{v\to0}\frac{\log \operatorname{Sep}(\delta, v)}{-\log v}$ is the amorphic complexity of T.
Remark 2.56. Amorphic complexity can also be defined by spanning sets [264, Section 3.2]. A set S ⊂ X is (δ, v)-spanning if for every y ∈ X there is an x ∈ S such that
$$
\limsup_{n\to\infty} \frac{1}{n}\,\#\{0 \le j < n : d(T^j(x), T^j(y)) \ge \delta\} < v.
$$
Letting Span(δ, v) denote the minimal cardinality of the (δ, v)-spanning sets, (2.8) holds with Sep(δ, v) replaced by Span(δ, v).
14 This notion was first used in the context of aperiodic tilings that approximate “amorphous” material. The name was coined for this reason.


If T is an isometry, then the frequency of two points x, y ∈ X being ≥ δ apart is 0 or 1, depending on whether d(x, y) < δ or ≥ δ. Therefore Sep(δ, v) is independent of v, so ac(T) = 0. More generally:
Proposition 2.57. If (X, T ) is equicontinuous, then the amorphic complex-
ity ac(T ) = 0.

Proof. Let ε > 0 be arbitrary. By equicontinuity and the compactness of X, we can take δ > 0 such that $T^n(B_\delta(x)) \subset B_{\varepsilon/2}(T^n(x))$ for all x ∈ X and n ∈ N or Z. Thus two points in $B_\delta(x)$ are never (ε, v)-separated for any v ∈ (0, 1]. Let N(δ) be the number of such δ-balls that can be packed in X, so that no such ball contains the center of another. Then
$$
\frac{\log \operatorname{Sep}(\varepsilon, v)}{-\log v} \le \frac{\log N(\delta)}{-\log v} \to 0 \quad \text{as } v \to 0.
$$
Therefore ac(T) = 0. □

Further properties concern iterates and factors; see [264, Proposition 1.3].
Lemma 2.58. Let (X, T ) and (Y, S) be two dynamical systems on compact
metric spaces.
• If (Y, S) is a topological factor of (X, T ), then ac(S) ≤ ac(T ). In
particular, amorphic complexity is preserved under conjugacy.
• $ac(T^n) = ac(T)$ for every n ∈ N.
• ac(S × T ) = ac(S) + ac(T ).

In later sections, we compute the amorphic complexity of some particular dynamical systems, such as Sturmian shifts, see Section 4.3.1, and Toeplitz shifts, see Section 4.5.

2.5. Mathematical Chaos

Mathematical chaos doesn’t have a single definition, but the basic idea it tries to capture is that forward orbits are unpredictable. The computation of orbits in any (physical) dynamical system inherently brings errors: measurement errors, round-off errors, errors in the mathematical model. Unpredictability means that initial errors blow up over time (sometimes exponentially fast, as is the case with subshifts). Therefore distal dynamical systems on compact spaces (in particular isometries) are not chaotic in any common definition. On the other hand, expansivity is in general too strong a property to require for chaos. For instance, a tent map
$$
T_s : [0, 1] \to [0, 1], \qquad T_s : x \mapsto \min\{sx, s(1 - x)\}
$$
is chaotic if the slope $s \in (\sqrt{2}, 2]$, but not expansive. Indeed, $x = \frac{1+\varepsilon}{2}$ and $y = \frac{1-\varepsilon}{2}$ are ε apart, but $T_s(x) = s\,\frac{1-\varepsilon}{2} = T_s(y)$, so $T_s^n(x) = T_s^n(y)$ for all n ≥ 1. A weaker, more appropriate, definition in this context is the following:

Definition 2.59. A dynamical system (X, T) on a metric space (X, d) has sensitive dependence on initial conditions if there is δ > 0 such that for all ε > 0 and x ∈ X, there is $y \in B_\varepsilon(x)$ and n ≥ 0 such that $d(T^n(x), T^n(y)) > \delta$.
This leads to one of the most common definitions of chaos [196]:
Definition 2.60. A dynamical system (X, T ) on a metric space (X, d) is
chaotic in the sense of Devaney if:
1. (X, T ) has sensitive dependence on initial conditions;
2. (X, T ) has a dense orbit;
3. X has a dense set of periodic orbits.
As was soon realized by Banks et al. [46], unless X is a single periodic orbit,
1 follows automatically from 2 and 3. See also Silverman’s study [510] on
chaos and topological transitivity.
Proposition 2.61. Let (X, T ) be a continuous dynamical system on an
infinite metric space (X, d). If T has a dense set of periodic orbits as well as
a dense orbit, then T has sensitive dependence on initial conditions.

Proof. Since X is infinite and has a dense orbit, no periodic point is isolated, and there are at least two periodic orbits, say orb(p) and orb(q). Let $\delta := \min\{d(x, y) : x, y \in \operatorname{orb}(p) \cup \operatorname{orb}(q),\ x \ne y\}/6 > 0$. Take x ∈ X and ε > 0 arbitrary. Then $B_\varepsilon(x)$ contains a periodic point $r \notin \operatorname{orb}(p) \cup \operatorname{orb}(q)$. If there is n ≥ 0 such that $d(T^n(x), T^n(r)) > \delta$, then sensitive dependence is established at x. Therefore assume that $d(T^n(x), T^n(r)) \le \delta$ for all n ≥ 0.
Since there is a dense orbit, we can find $y \in B_\varepsilon(x)$ such that $p, q \in \overline{\operatorname{orb}(y)} = X$. If there is n ≥ 0 such that $d(T^n(x), T^n(y)) > \delta$, then sensitive dependence is again established at x, so we assume that $d(T^n(x), T^n(y)) \le \delta$ for all n ≥ 0.
Take j, k ∈ N such that
$$
d(T^{j+i}(y), T^i(p)) < \delta \quad \text{and} \quad d(T^{k+i'}(y), T^{i'}(q)) < \delta
$$
for all $0 \le i, i' \le \operatorname{per}(r)$. We can choose $0 \le i, i' \le \operatorname{per}(r)$ such that $r = T^{j+i}(r) = T^{k+i'}(r)$. Therefore
$$
\begin{aligned}
d(T^i(p), r) &\le d(T^i(p), T^{j+i}(y)) + d(T^{j+i}(y), T^{j+i}(x)) + d(T^{j+i}(x), T^{j+i}(r)) < 3\delta, \\
d(T^{i'}(q), r) &\le d(T^{i'}(q), T^{k+i'}(y)) + d(T^{k+i'}(y), T^{k+i'}(x)) + d(T^{k+i'}(x), T^{k+i'}(r)) < 3\delta.
\end{aligned}
$$
But then $d(T^i(p), T^{i'}(q)) < 6\delta$, contradicting the choice of δ. This proves the result. □

The requirement of a dense set of periodic orbits in Devaney chaos is restrictive, because it precludes infinite minimal systems from being chaotic. The following notion doesn’t have this drawback.

Definition 2.62. A dynamical system (X, T) on a metric space (X, d) is chaotic in the sense of Auslander-Yorke if:
1. (X, T) has sensitive dependence on initial conditions;
2. (X, T) has a dense orbit.

The following result is known as the Auslander-Yorke dichotomy [39]:

Theorem 2.63. Every minimal dynamical system (X, T) is either equicontinuous or has sensitive dependence on initial conditions.

Proof. That sensitive dependence precludes equicontinuity is clear from the definition. For the converse, we will assume that equicontinuity fails at one point x, and we show that T is sensitive at every x′ ∈ X. Since (X, T) is minimal, orb(x) is dense in X. Given that T is not equicontinuous at x, there are δ > 0 and sequences $y_k \to x$, $n_k \to \infty$ such that $d(T^{n_k}(x), T^{n_k}(y_k)) \ge \delta$.
Let x′ ∈ X and U′ ∋ x′ be an arbitrary open neighborhood. Using denseness of orb(x), we find m such that $T^m(x) \in U'$. Since $T^m$ is continuous, we can take k so large that $n_k > m$ and $T^m(y_k) \in U'$ as well. Now $T^{n_k}(x), T^{n_k}(y_k) \in T^{n_k-m}(U')$ and $d(T^{n_k}(x), T^{n_k}(y_k)) \ge \delta$, so by the triangle inequality
$$
d(T^{n_k}(x), T^{n_k-m}(x')) \ge \delta/2 \quad \text{or} \quad d(T^{n_k}(y_k), T^{n_k-m}(x')) \ge \delta/2.
$$
Since $T^{n_k}(x) = T^{n_k-m}(T^m(x))$ and $T^{n_k}(y_k) = T^{n_k-m}(T^m(y_k))$ with $T^m(x), T^m(y_k) \in U'$, this proves sensitive dependence with sensitivity constant δ/3 (any constant smaller than δ/2 will do). □

Remark 2.64. In fact, there is a version of the Auslander-Yorke dichotomy, see [14, 423, 553, 563], saying that a transitive dynamical system either has sensitive dependence on initial conditions (see Definition 2.59) or is uniformly rigid. This implies in particular that for minimal dynamical systems, equicontinuity is equivalent to uniform rigidity.

Remark 2.65. There is also an analogue for mean equicontinuity, see [394] and also [270], saying that every minimal dynamical system is either mean equicontinuous or mean sensitive, which means that there is a δ > 0 such that for every x ∈ X and neighborhood U ∋ x, there is y ∈ U such that $\limsup_n \frac{1}{n}\sum_{i=0}^{n-1} d(T^i x, T^i y) > \delta$. A measure-theoretic version of the dichotomy is due to [269], which states that given an ergodic T-invariant Borel measure μ, (X, T) is either μ-mean equicontinuous or μ-mean sensitive, i.e. mean sensitive with “neighborhood U” replaced by “Borel set U ∋ x with μ(U) > 0”.

The paper of Li & Yorke [395] from 1973 might be called a popular
(partial) rediscovery of Sharkovskiy’s theorem [498] from 196415 , but it also
initiated the study of the following notions.
Definition 2.66. Let (X, T) be a dynamical system on a metric space (X, d). A pair of points x, y ∈ X is called a Li-Yorke pair if
$$
\liminf_{n\to\infty} d(T^n(x), T^n(y)) = 0 \quad \text{and} \quad \limsup_{n\to\infty} d(T^n(x), T^n(y)) > 0.
$$
A set S ⊂ X is called scrambled if (x, y) is a Li-Yorke pair for every two distinct x, y ∈ S. The dynamical system is chaotic in the sense of Li and Yorke if there is an uncountable scrambled set.

Huang & Ye [324, Theorem 4.1] proved that if a continuous dynamical system is transitive and properly contains a periodic orbit, then it is chaotic in the sense of Li-Yorke. In particular, Devaney chaos implies Li-Yorke chaos.
Remark 2.67. A quantitative version of Li-Yorke chaos is distributional chaos, introduced by Schweizer & Smítal [492] (see also [45, 517] for the versions DC1–DC3). It measures the proportion of time that points in a scrambled set spend close to each other and far away from each other.
Example 2.68. Let us construct an uncountable scrambled set in the full shift space $X = \{0, 1\}^{\mathbb{N}_0}$. First define an equivalence relation ∼ by setting x ∼ y if there is $n_0 \in \mathbb{N}$ such that either $x_n = y_n$ for all $n \ge n_0$ or $x_n \ne y_n$ for all $n \ge n_0$. That is, x and y have either the same or opposite tails. Each equivalence class is countable, because for each fixed $n_0$ there are finitely many equivalent points with the same $n_0$. Since X is uncountable, there are uncountably many equivalence classes.
Next, using the axiom of choice, construct a set Y ⊂ X that contains exactly one point in each equivalence class.
Now define an injection π : X → X by $\pi(x)_j = x_n$ for each $2^n - 1 \le j < 2^{n+1} - 1$. Then S = π(Y) is uncountable and scrambled. Indeed, for every x ≠ y ∈ Y we have x ≁ y, so there are infinitely many n such that $x_n = y_n$, and then $d(\sigma^{2^n-1} \circ \pi(x), \sigma^{2^n-1} \circ \pi(y)) \le 2^{-n}$. Also there are infinitely many n such that $x_n \ne y_n$, and then $d(\sigma^{2^n-1} \circ \pi(x), \sigma^{2^n-1} \circ \pi(y)) \ge 1 - 2^{-n}$.
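The mechanism behind Example 2.68 can be checked on finite prefixes. The following Python sketch assumes the metric $d(u, v) = \sum_k |u_k - v_k|\,2^{-(k+1)}$ on $\{0,1\}^{\mathbb{N}_0}$, which is one metric for which the bounds above hold. It uses two hand-picked prefixes of x and y. It is not a construction of Y, which needs the axiom of choice. Since π(x) repeats $x_n$ on a block of $2^n$ positions starting at $j = 2^n - 1$, shifting by $2^n - 1$ makes the two images agree or disagree on the first $2^n$ coordinates. The distance is then at most $2^{-2^n} \le 2^{-n}$ or at least $1 - 2^{-2^n} \ge 1 - 2^{-n}$.

```python
def pi(x, levels):
    """pi(x)_j = x_n for 2**n - 1 <= j < 2**(n+1) - 1, for the first `levels` symbols of x."""
    image = []
    for n in range(levels):
        image.extend([x[n]] * 2 ** n)  # block n repeats x_n, and has length 2**n
    return image


def dist(u, v):
    """Assumed metric sum_k |u_k - v_k| 2^{-(k+1)}, evaluated on finite prefixes."""
    return sum(abs(a - b) * 2.0 ** -(k + 1) for k, (a, b) in enumerate(zip(u, v)))


if __name__ == "__main__":
    x = [0, 1, 1, 0, 1, 0, 0, 1]  # hypothetical prefixes
    y = [0, 0, 1, 1, 1, 1, 0, 0]
    px, py = pi(x, 8), pi(y, 8)
    for n in range(1, 6):
        shift = 2 ** n - 1
        print(n, x[n] == y[n], dist(px[shift:], py[shift:]))
```

Truncating to a finite prefix can only lower the computed distance, so the upper bounds checked here are safe. The lower bounds come entirely from the first $2^n$ coordinates.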

15 Sharkovskiy’s Theorem states that if a continuous map of the real line has a periodic point of period n, it also has a periodic point of period m for every m ≺ n in the Sharkovskiy order
1 ≺ 2 ≺ 4 ≺ 8 ≺ · · · ≺ 4 · 7 ≺ 4 · 5 ≺ 4 · 3 · · · ≺ 2 · 7 ≺ 2 · 5 ≺ 2 · 3 · · · ≺ 7 ≺ 5 ≺ 3.
Sharkovskiy related during the 2018 IWCTA: International Workshop and Conference on Topology & Applications (Kochi, India) in honor of his 1000th moon that the printer of his original publication didn’t have the sign ≺ at his disposal, and therefore he suggested using the letter Y turned sideways. The publisher followed this suggestion but turned the Y in the opposite direction from the one Sharkovskiy had intended, and therefore the Sharkovskiy order was first printed as

3 5 7 . . . 6 10 14 . . . 12 20 28 ...... 4 2 1 in [498]. Štefan [524] in

his 1977 proof used 3  5  7  . . . and the English translation of Sharkovskiy’s proof [499] by
Tolosa used 3 5 7 . . . .

Similarly, all non-trivial subshifts of finite type (SFTs) are Li-Yorke chaotic, but Sturmian subshifts are not Li-Yorke chaotic, and neither are distal maps, for which $\liminf_n d(T^n(x), T^n(y)) > 0$ for all distinct x ≠ y ∈ X.
Exercise 2.69. Let $X = \mathcal{A}^{\mathbb{N}_0}$ be the full shift space for some alphabet $\mathcal{A}$ containing a. Define π : X → X by
$$
\pi(x)_k = \begin{cases} x_{k-n^2}, & n^2 \le k \le n^2 + n, \\ a, & n^2 + n < k \le n^2 + 2n. \end{cases}
$$
Show that π(X) is a scrambled set.

An important, long-conjectured result ties Li-Yorke chaos to topological entropy:

Theorem 2.70. Every continuous dynamical system of positive entropy on a compact metric space is Li-Yorke chaotic.

This is the main result of [83]; see also [482, Chapter 5] and [210]. The converse is, however, not true. There exist examples of continuous (so-called $2^\infty$) interval maps which have periodic points of period $2^n$ for each n ∈ N and no periodic points with other periods, which have (therefore) zero topological entropy, but which still are Li-Yorke chaotic; see [516, 566]. Example 2.53 gives a subshift which has zero entropy but is Li-Yorke chaotic.
Theorem 2.71. Let $X = \{1, \dots, d\}^{\mathbb{N}}$. For every probability vector $p = (p_1, \dots, p_d)$, every scrambled set has zero p-Bernoulli measure.

Proof. Let $(X, \mathcal{B}, \mu_p, \sigma)$ be the p-Bernoulli shift and assume by contradiction that S ⊂ X is a scrambled set with $\mu_p(S) > 0$. Take two distinct Lebesgue density points16 a and b of S, and for any n, let $Z_n(a)$ and $Z_n(b)$ be the corresponding n-cylinders of a and b, respectively; for n large these cylinders are disjoint. Because a and b are density points, the measures $\mu_p(\sigma^n(S \cap Z_n(a))) = \mu_p(S \cap Z_n(a))/\mu_p(Z_n(a))$ and $\mu_p(\sigma^n(S \cap Z_n(b)))$ tend to 1 as n → ∞, so for n large these two images intersect. That means that there are distinct $x \in S \cap Z_n(a)$, $y \in S \cap Z_n(b)$ such that $\sigma^n(x) = \sigma^n(y)$. But then (x, y) is not a Li-Yorke pair. This contradiction shows that $\mu_p(S) = 0$. □

2.6. Transitivity and Topological Mixing

Transitivity prevents the phase space from consisting of multiple pieces that don’t communicate with each other. Topological mixing prevents them from communicating with each other only at a periodic sequence of iterates. There
16 For Lebesgue measure μ on Euclidean space, x is called a density point of A if $\lim_{\varepsilon\to0} \mu(A \cap B_\varepsilon(x))/\mu(B_\varepsilon(x)) = 1$. The Lebesgue Density Theorem says that if μ(A) > 0 then μ-a.e. x ∈ A is a density point of A. We now use the same result for Bernoulli measure $\mu_p$, but this is justified because there is a measure-preserving map $\psi : (X, \mu_p) \to ([0, 1], \mu)$ which is also continuous and sends cylinder sets to intervals; cf. Example 6.55.

are several related concepts in addition to (total) transitivity from Definition 2.12:
Definition 2.72. A dynamical system (X, T) on a topological space is called topologically mixing if for every two non-empty open sets U, V there is N ≥ 0 such that $U \cap T^{-n}(V) \ne \emptyset$ for all n ≥ N.

Topologically mixing dynamical systems on metric spaces are sensitive to initial conditions (provided X consists of at least two points), and therefore equicontinuous dynamical systems cannot be topologically mixing. In particular, since topological mixing is inherited by factors, the maximal equicontinuous factor of a topologically mixing dynamical system is trivial.
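For subshifts of finite type these notions can be decided by finite linear algebra. This uses a standard fact that is not proved in this section. A vertex shift with 0-1 transition matrix A, in which every symbol occurs, is topologically transitive if and only if A is irreducible. It is topologically mixing if and only if A is primitive, i.e. some power $A^n$ is entrywise positive. The following Python sketch tests primitivity using Wielandt’s bound: for a k × k matrix it suffices to check the power $(k-1)^2 + 1$. The example matrices are chosen here for illustration. The golden mean shift is mixing. The shift on $\{(01)^\infty, (10)^\infty\}$ from Proposition 2.38’s counterexample is transitive but not mixing.

```python
def bool_matmul(A, B):
    """Boolean matrix product: entry (i, j) is 1 iff A[i][l] and B[l][j] for some l."""
    k = len(A)
    return [[int(any(A[i][l] and B[l][j] for l in range(k))) for j in range(k)] for i in range(k)]


def is_primitive(A):
    """True iff some power of the square 0-1 matrix A is entrywise positive.
    By Wielandt's bound it suffices to inspect A ** ((k - 1)**2 + 1)."""
    k = len(A)
    power = A
    for _ in range((k - 1) ** 2):
        power = bool_matmul(power, A)
    return all(all(row) for row in power)


if __name__ == "__main__":
    golden_mean = [[1, 1], [1, 0]]  # forbids the word 11
    two_cycle = [[0, 1], [1, 0]]    # only the orbit {(01)^inf, (10)^inf}
    print(is_primitive(golden_mean), is_primitive(two_cycle))
```

The two-cycle matrix is irreducible, so its shift is transitive. Its powers alternate between the identity and the swap matrix, which reflects returns only at a periodic sequence of iterates.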
Definition 2.73. A dynamical system (X, T) on a topological space is called topologically exact (also called locally eventually onto or leo for short) if for every non-empty open set U there is N ≥ 0 such that $T^N(U) = X$.

Invertible dynamical systems (other than the identity on a singleton) are never topologically exact, and neither are nontrivial dynamical systems with zero entropy.
Lemma 2.74. If a dynamical system (X, T) on a non-trivial metric space (X, d) is topologically exact, then $h_{top}(T) > 0$.

Proof. Take $x_0 \ne x_1 \in X$, and choose $0 < \varepsilon < d(x_0, x_1)/3$. Let $U_0$ and $U_1$ be the ε-neighborhoods of $x_0$ and $x_1$, respectively. By topological exactness, there is N ∈ N such that $T^N(U_0) = X = T^N(U_1)$. Hence, for an arbitrary n ∈ N and every $w = w_0 w_1 \cdots w_{n-1} \in \{0, 1\}^n$, there is $x_w \in X$ such that $T^{kN}(x_w) \in U_{w_k}$ for all 0 ≤ k < n. If $w \ne w' \in \{0, 1\}^n$, then the nN-distance $d_{nN}(x_w, x_{w'}) > \varepsilon$. Hence $\{x_w : w \in \{0, 1\}^n\}$ is an (nN, ε)-separated set, so $s_{nN}(\varepsilon) \ge 2^n$ and $h_{top}(T) \ge \frac{1}{N}\log 2 > 0$. □
Theorem 2.75. If T : [0, 1] → [0, 1] is a continuous transitive interval map, then $h_{top}(T) \ge \frac{1}{2}\log 2$. If in addition T is topologically mixing, then $h_{top}(T) > \frac{1}{2}\log 2$.

This result is due to Blokh; see [90, 92]. A compact exposition of this
and related results can be found in [482, Proposition 4.70].
Definition 2.76. A dynamical system (X, T) on a topological space is called weakly topologically mixing if for every four non-empty open sets $U_1, U_2, V_1, V_2$, there is n such that $U_1 \cap T^{-n}(V_1) \neq \emptyset$ and $U_2 \cap T^{-n}(V_2) \neq \emptyset$, or equivalently, the product system T × T on X × X is transitive.

When presenting these notions, we consistently write the adjective “topological” because there are also measure-theoretic versions of exact, mixing, and weak mixing. These are discussed in Section 6.7. Some specific differences exist; for instance, there is no topological analog of Theorem 6.86.
From the definition it is clear that topological weak mixing implies that the product system $(X^2, T \times T)$ is transitive. In fact, Furstenberg [266] showed that this holds for every N-fold Cartesian product $(X^N, T \times \cdots \times T)$. An important result on topological weak mixing is the following multiple recurrence (a dynamical version of Van der Waerden’s Theorem) due to Furstenberg & Weiss [267]: if (X, T) is minimal, then for every non-empty open set U ⊂ X and m ∈ ℕ, there is n ∈ ℕ such that $U \cap T^n(U) \cap T^{2n}(U) \cap \cdots \cap T^{mn}(U) \neq \emptyset$. Glasner [276] extended this to multiple transitivity: if (X, T) is minimal and topologically weak mixing, then for x in a residual subset of X, the m-tuple (x, . . . , x) has a dense orbit under $T \times T^2 \times \cdots \times T^m$. Further results can be found in e.g. [139, 424].
The following hierarchy (which also holds for the measure-theoretic ana-
log) will not come as a surprise:

Theorem 2.77. The following implications hold:

topologically exact ⇒ topologically mixing ⇒ topologically weak mixing ⇒ totally transitive ⇒ topologically transitive.

The reverse implications are in general false.

Counterexamples to the reverse implications can be found among subshifts. Listed in the order of the properties in Theorem 2.77, they are:

full shift, Petersen’s shift, Chacon substitution shift, Fibonacci substitution shift, Thue-Morse shift,

where the Fibonacci, Chacon, and Thue-Morse substitution shifts are defined in Examples 1.3, 1.27, and 1.6, respectively. Petersen’s shift [454] is an example of a zero entropy subshift that is topologically mixing. Lemma 2.74 shows that it cannot be topologically exact.

Remark 2.78. Although none of the reverse implications in Theorem 2.77 hold in all generality, for many subshifts some of these notions are equivalent. For instance, sofic shifts and density shifts that are totally transitive are topologically mixing; cf. [237] and Theorem 3.61. For coded and synchronized shifts, total transitivity is equivalent to topological weak mixing; see [237, Theorem 1.1].

In terms of the set of visit times for sets U, V ⊂ X,

$$N(U, V) = \{n \in \mathbb{N}_0 \text{ or } \mathbb{Z} : U \cap T^{-n}(V) \neq \emptyset\}, \tag{2.9}$$

the notions in this section can be expressed as follows. For all non-empty open sets U, U′, V, V′ ⊂ X:
• Topologically exact: $\bigcap_{x \in X} N(U, \{x\})$ is cofinite.
• Topologically mixing: N(U, V) is cofinite.
• Topologically weak mixing: N(U, V) ∩ N(U′, V′) is infinite.
• Topologically transitive: N(U, V) is non-empty.
• Totally transitive: for every k ∈ ℕ, N(U, V) ∩ kℕ is non-empty.
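For SFTs (Section 3.1), the visit-time sets of one-letter cylinders can be read off from powers of the transition matrix. The following minimal Python sketch assumes a vertex-labeled SFT given by a 0/1 matrix without dead ends, so that $[a] \cap \sigma^{-n}[b] \neq \emptyset$ exactly when $(A^n)_{ab} > 0$. The helper names `bool_mult` and `visit_times` are illustrative and do not come from the text. The sketch contrasts the Fibonacci SFT, where N([1], [1]) is cofinite, with the two-point flip shift. The flip shift is transitive, but N([0], [1]) contains only odd times, so it is neither totally transitive nor topologically mixing.

```python
def bool_mult(P, Q):
    """Boolean product of two square 0/1 matrices (lists of lists)."""
    k = len(P)
    return [[int(any(P[i][l] and Q[l][j] for l in range(k))) for j in range(k)]
            for i in range(k)]


def visit_times(A, a, b, n_max):
    """Visit times n <= n_max for the one-letter cylinders U = [a], V = [b].

    Assumes every vertex of the transition graph has an outgoing arrow, so
    each allowed finite path extends to a point of the SFT; then
    [a] ∩ σ^{-n}[b] is non-empty exactly when (A^n)_{ab} > 0.
    """
    k = len(A)
    power = [[int(i == j) for j in range(k)] for i in range(k)]  # A^0
    times = []
    for n in range(n_max + 1):
        if power[a][b]:
            times.append(n)
        power = bool_mult(power, A)
    return times


golden = [[1, 1], [1, 0]]  # Fibonacci SFT: the word 11 is forbidden
flip = [[0, 1], [1, 0]]    # only the two points 0101... and 1010...

print(visit_times(golden, 1, 1, 10))  # [0, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(visit_times(flip, 0, 1, 10))    # [1, 3, 5, 7, 9]
```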

2.7. Shadowing and Specification

Definition 2.79. Let (X, T) be a dynamical system on a metric space (X, d). A sequence $(x_n)_{n \in \mathbb{N}_0 \text{ or } \mathbb{Z}}$ is called a δ-pseudo-orbit if $d(T(x_n), x_{n+1}) < \delta$ for all n ∈ ℕ_0 or ℤ. A point x ∈ X is chain-recurrent if for every δ > 0 there is a δ-pseudo-orbit $x_0, \dots, x_k$ of some length k such that $x = x_0$ and $d(T(x_k), x) < \delta$.

Chain recurrence is the weakest version of recurrence in the following sequence of implications:

periodic ⇒ periodically recurrent ⇒ uniformly recurrent ⇒ recurrent ⇒ non-wandering ⇒ chain-recurrent,

and none of the reverse implications hold in general.
Given that every floating-point calculation has round-off errors, orbits
that a computer calculates numerically are always pseudo-orbits for some
small δ. Whether such a pseudo-orbit represents an approximation of an
actual orbit is captured in the following definition.
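The following small Python sketch makes the remark about round-off concrete. It measures the pseudo-orbit error $d(T(x_n), x_{n+1})$ of a floating-point orbit. The map $T(x) = 4x(1-x)$ on [0, 1] is an illustrative choice that does not appear in the text. Each stored float is converted exactly to a `Fraction`, so the exact image can be compared with the rounded one.

```python
from fractions import Fraction


def logistic(x):
    """The map T(x) = 4x(1 - x); works for floats and for exact Fractions."""
    return 4 * x * (1 - x)


def pseudo_orbit_error(x0, steps):
    """Largest d(T(x_k), x_{k+1}) along a floating-point orbit of T.

    x_{k+1} is computed in floating point, while T(x_k) is evaluated exactly
    from the stored float x_k, so the difference is precisely the round-off
    committed in that single step.
    """
    x = x0
    worst = Fraction(0)
    for _ in range(steps):
        x_next = logistic(x)                 # rounded step
        exact = logistic(Fraction(x))        # exact image of the stored value
        worst = max(worst, abs(exact - Fraction(x_next)))
        x = x_next
    return float(worst)


print(pseudo_orbit_error(0.1234, 1000))  # a tiny delta, below 1e-15
```

In IEEE double precision, each step of this computation commits an error of at most a few units of $10^{-16}$. The computed orbit is therefore a δ-pseudo-orbit for a very small δ. Whether a true orbit shadows it is a separate question.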
Definition 2.80. A dynamical system (X, T) on a metric space (X, d) has the shadowing property if for every ε > 0 there is δ > 0 such that for every δ-pseudo-orbit $(x_n)_{n \in \mathbb{N}_0 \text{ or } \mathbb{Z}}$, there is y ∈ X so that orb(y) ε-shadows $(x_n)$; i.e. $d(x_n, T^n(y)) < \varepsilon$ for all n ∈ ℕ_0 or ℤ.

By now, many variations of shadowing have been studied, for example average shadowing (the average error needs to be smaller than ε), periodic shadowing (periodic pseudo-orbits are ε-shadowed by actual periodic orbits), and limit shadowing (the ε in the shadowing tends to zero as the iterates |n| → ∞). We refer to the monograph by Pilyugin [461], although many variations of shadowing are from a later date; cf. [55, 280, 423].
The seminal result for shadowing is the Anosov Shadowing Lemma [27]
for hyperbolic sets. Work by Bowen [99] showed that hyperbolic dynamical
systems, and this includes SFTs, have the shadowing property.
Definition 2.81. Let f : M → M be a $C^1$ diffeomorphism of a $C^1$ Riemannian manifold M. An f-invariant set Λ is called hyperbolic if there is a uniformly transversal splitting $T_qM = E^s_q \oplus E^u_q$ of the tangent spaces that is continuous in q ∈ Λ, invariant under f, i.e. $Df_q(E^s_q) = E^s_{f(q)}$ and $Df_q(E^u_q) = E^u_{f(q)}$, and tangent vectors in $E^s_q$, resp. $E^u_q$, decrease exponentially fast under forward, resp. backward, iteration.
If f : M → M is not invertible, then we need to select inverse branches in order for $E^u_q$ to be well-defined. The manifold contains stable and unstable local manifolds $W^s_{loc}(q)$ and $W^u_{loc}(q)$ of q, tangent to $E^s_q$ and $E^u_q$, respectively, such that

$$d(f^n(q), f^n(x)) \to 0 \text{ exponentially, as } \begin{cases} n \to \infty & \text{if } x \in W^s_{loc}(q),\\ n \to -\infty & \text{if } x \in W^u_{loc}(q).\end{cases}$$

In the symbolic setting, i.e. when a subshift (X, σ) takes the place of (M, f), we can define

$$W^s_{loc}(q) = \{x \in X : x_n = q_n \text{ for all } n \ge 0\}, \qquad W^u_{loc}(q) = \{x \in X : x_n = q_n \text{ for all } n \le 0\}.$$

Theorem 2.82 (Anosov Shadowing Lemma). Let Λ be a hyperbolic set of a $C^1$ diffeomorphism f : M → M, and let $\Lambda_\varepsilon$ denote the ε-neighborhood of Λ. Then for every ε > 0 there is δ > 0 such that for every δ-pseudo-orbit $(x_k) \subset \Lambda_\delta$ (finite, one-sided or two-sided infinite), there is $x \in \Lambda_\varepsilon$ such that $d(f^k(x), x_k) < \varepsilon$ for all k.
The analogue of this theorem for periodic shadowing is called the Anosov
Closing Lemma. See [346, Sections 6.4 and 18.1].
One may think that uniform expansion is enough to guarantee shadow-
ing, but it is not as simple as that. For example (see [170] and [116, Theorem
6.3.5]), tent maps Ts with slope s ∈ (1, 2) have the shadowing property if
and only if the critical point c is recurrent or its kneading map is unbounded
(in the terminology of Section 3.6.3).
An important variation of shadowing, also introduced by Bowen [101],
is specification. In this case, no pseudo-orbits are involved, but particular
pieces of orbits are to be ε-shadowed for particular intervals of time, allowing
gaps in between that are inversely proportional to log ε.
Definition 2.83. A dynamical system (X, T) on a metric space (X, d) has specification for K points if for every ε > 0 there is a gap size N with the following property: for all points $x_1, \dots, x_K \in X$ and iterates $m_1 \le n_1 < m_2 \le n_2 < \cdots < m_K \le n_K$ with $m_{k+1} - n_k \ge N$, there is x ∈ X such that

$$d(T^j(x), T^{j-m_k}(x_k)) < \varepsilon \quad \text{for all } k \in \{1, \dots, K\},\ m_k \le j < n_k. \tag{2.10}$$
Sometimes specification includes the requirement that x is periodic as well (periodic specification) and that specification holds for all K ∈ ℕ (strong specification).

Remark 2.84. For subshifts (X, σ), this definition can be simplified. We give the version for strong specification, because it is the one in most frequent use in this context. There is a gap size $N^*$ such that for all K ∈ ℕ, every K-tuple $x^1, \dots, x^K \in X$ and iterates $m_1 \le n_1 < m_2 \le n_2 < \cdots < m_K \le n_K$ with $m_{k+1} - n_k \ge N^*$, there is x ∈ X such that

$$x_j = x^k_{j-m_k} \quad \text{for all } k \in \{1, \dots, K\},\ m_k \le j < n_k. \tag{2.11}$$

Since $d(x, x^k) \le \frac12$ if and only if $x_0 = x^k_0$, condition (2.11) implies (2.10) with $N(\varepsilon) = N^*$ for $\varepsilon > \frac12$. For $\varepsilon \in (0, \frac12]$, condition (2.11) implies (2.10) with $N(\varepsilon) = N^* + n$, where n is minimal such that $2^{-n} < \varepsilon$.
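The arithmetic in Remark 2.84 can be checked directly. The sketch below assumes the usual metric $d(x, y) = 2^{-\min\{i \ge 0 : x_i \neq y_i\}}$ on a one-sided shift. This metric is consistent with $d(x, x^k) \le \frac12 \Leftrightarrow x_0 = x^k_0$, and here it is evaluated on finite prefixes. `shift_distance` and `gap_for_epsilon` are hypothetical helper names.

```python
def shift_distance(x, y):
    """d(x, y) = 2^{-i}, where i is the first index at which the words differ."""
    for i, (a, b) in enumerate(zip(x, y)):
        if a != b:
            return 2.0 ** (-i)
    return 0.0  # no disagreement within the compared prefixes


def gap_for_epsilon(n_star, eps):
    """Gap N(eps) for (2.10), given the symbolic gap N* of (2.11)."""
    if eps > 0.5:
        return n_star
    n = 0
    while 2.0 ** (-n) >= eps:  # find the minimal n with 2^{-n} < eps
        n += 1
    return n_star + n


print(shift_distance("0110", "0100"))  # 0.25: first disagreement at index 2
print(gap_for_epsilon(3, 0.75), gap_for_epsilon(3, 0.5), gap_for_epsilon(3, 0.1))  # 3 5 7
```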

The strength of specification is that a single orbit can shadow many other orbits consecutively, in particular orbits that have different dynamical
Lemma 2.85. A dynamical system with specification for some K ≥ 2 is
topologically mixing and if the specification is periodic, then the set of periodic
orbits is dense.

Proof. Specification allows one to connect ε-neighborhoods of any two points $x_1, x_2$ by an orbit of length N = N(ε). To show topological mixing, take n ≥ 1 arbitrary and $m_1 = n_1 = 0$, $m_2 = n_2 = n_1 + N(\varepsilon)$, and $m_2' = n_2' = n_1 + N + n$ as in the definition of specification. Then there are $x, x' \in B_\varepsilon(x_1)$ such that $T^N(x) \in B_\varepsilon(x_2)$ and $T^{N+n}(x') \in B_\varepsilon(x_2)$, as required. Finally, if the specification is periodic, then for any $x_1 \in X$ and ε > 0 we can find a periodic point $x \in B_\varepsilon(x_1)$, so the set of periodic points is dense.

The next result is due to Bowen [101] and in more generality to Sigmund
[508, Proposition 3].
Proposition 2.86. Every continuous dynamical system with specification
for all K ∈ N on a compact metric space has positive topological entropy.

Proof. Take distinct points a, b ∈ X and let ε = d(a, b)/3. Let N be the gap size associated to ε. Now for every K ∈ ℕ, every K-tuple $(x_1, \dots, x_K) \in \{a, b\}^K$ and the integers $m_k = n_k = m_{k+1} - N$, there is a point x such that $d(T^{m_k}(x), x_k) < \varepsilon$ for k = 1, . . . , K. There are $2^K$ choices of $(x_1, \dots, x_K)$, and the corresponding points x are $(n_K, \varepsilon)$-separated. Hence, according to Definition (2.4), $h_{top}(T) \ge \frac{1}{1+N} \log 2 > 0$.

The following was first shown by Bowen [103].

Lemma 2.87. Every continuous factor of a dynamical system (X, T ) with
specification on a compact metric space (X, d) has specification.

Proof. Let (Y, S) be a factor of (X, T) such that π : X → Y is the semi-conjugacy. Since X is compact, π is uniformly continuous. Choose ε > 0
arbitrary, and take δ > 0 such that the π-image of every δ-neighborhood in X
is contained in an ε-neighborhood in Y . Find N = N (δ) as in Definition 2.83
of specification for (X, T ). Choose K ∈ N and m1 ≤ n1 < m2 ≤ n2 < · · · <
mK ≤ nK with gaps mk+1 − nk ≥ N and points y1 , . . . , yK ∈ Y arbitrary.
Choose xk ∈ π −1 (yk ) for each 1 ≤ k ≤ K. Since (X, T ) has specification,
there is x ∈ X that δ-shadows the pieces of orbits of the xk ’s at the required
time intervals. Thus y := π(x) ε-shadows the pieces of orbits of the yk ’s at
the required time intervals. This completes the proof. 
Theorem 2.88. Let (X, T ) be an expansive continuous dynamical system on
a compact metric space (X, d). If T has specification, then it is intrinsically
ergodic; i.e. T has a unique measure of maximal entropy.

This was proven in [103], and it applies of course to subshifts. Strong specification makes it possible, and even easy, to approximate invariant measures in the weak∗ topology by equidistributions on periodic orbits. Indeed, if x is a typical¹⁷ point for an ergodic T-invariant measure μ, then for arbitrarily large n, we can find an n-periodic point $p_n$ that ε-shadows the orbit of x up to iterate n − N. The equidistribution $\mu_n := \frac{1}{n} \sum_{i=0}^{n-1} \delta_{T^i(p_n)}$ then tends to μ as n → ∞. Similar ideas work for non-ergodic measures; see Definition 1.30. An extended version of this argument yields that the measure of maximal entropy is the weak∗ limit of

$$\frac{1}{\#\{p : \mathrm{Per}(p) \le n\}} \sum_{\mathrm{Per}(p) \le n} \delta_p,$$

where Per(p) denotes the period of p; see [103] and [159]. Further variations of specification were designed to extend this proof of intrinsic ergodicity to dynamical systems where specification fails; see Buzzi [135], Climenhaga & Thompson [159, 160], and Kwietniak and coauthors [383, 384]. This applies for instance to (factors of) β-shifts and gap shifts.

17 In the sense that the Ergodic Theorem 6.13 holds for x.
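Here is a small numerical illustration of the periodic-point approximation above, using the Fibonacci SFT (forbidden word 11), whose measure of maximal entropy is the Parry measure. The Python sketch below only approximates the statement in the text. Instead of all p with Per(p) ≤ n, it uses the uniform measure on the points x with $\sigma^n(x) = x$, and it evaluates only the mass of the cylinder [1], whose Parry measure is $1/(1+\gamma^2)$ with γ the golden mean. The function names are illustrative.

```python
from fractions import Fraction


def mat_mult(P, Q):
    """Product of two integer matrices given as lists of lists."""
    return [[sum(P[i][l] * Q[l][j] for l in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def mat_pow(A, n):
    """A^n for n >= 0 by repeated multiplication (fine for small n)."""
    result = [[int(i == j) for j in range(len(A))] for i in range(len(A))]
    for _ in range(n):
        result = mat_mult(result, A)
    return result


def periodic_mass_of_one(n):
    """Mass of [1] under the uniform measure on {x : σ^n(x) = x}.

    Fibonacci SFT with states 0, 1 and transition matrix [[1, 1], [1, 0]].
    There are tr(A^n) such points, and (A^n)_{11} of them have x_0 = 1.
    Since σ permutes this finite set, the uniform measure on it is
    σ-invariant: a convex combination of periodic-orbit equidistributions.
    """
    An = mat_pow([[1, 1], [1, 0]], n)
    return Fraction(An[1][1], An[0][0] + An[1][1])


gamma = (1 + 5 ** 0.5) / 2
parry_mass_of_one = 1 / (1 + gamma ** 2)  # Parry measure of the cylinder [1]
for n in (2, 3, 10, 30):
    # the difference to the Parry value shrinks rapidly as n grows
    print(n, periodic_mass_of_one(n), float(periodic_mass_of_one(n)) - parry_mass_of_one)
```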

Chapter 3

Subshifts of Positive Entropy

Most of the subshifts of positive entropy are symbolic versions of positive entropy dynamical systems on manifolds, for example dynamical systems possessing a Markov partition, β-transformations, or unimodal interval maps. Symbolically these correspond to subshifts of finite type (SFTs), β-shifts, and kneading theory, respectively, and the entropy is given by the exponential growth rate of periodic points. We also discuss some subshifts that are not in the first instance symbolic versions of other dynamical systems, such as density shifts, coded shifts, gap shifts, and spacing shifts, and in some cases (such as power-free shifts), the entropy is not related to periodic sequences at all.

3.1. Subshifts of Finite Type

Subshifts of finite type are the simplest and most frequently used subshifts
in applications. They emerge naturally in hyperbolic dynamical systems
such as toral automorphisms, Markov partitions of Anosov diffeomorphisms,
Axiom A attractors (including Smale’s horseshoe), but also in topological
Markov chains.

3.1.1. Definition of SFTs and Transition Matrices and Graphs.

Definition 3.1. A subshift of finite type (SFT) is a subshift consisting
of all strings avoiding a finite list of forbidden words as subwords.

For example, the Fibonacci SFT has 11 as forbidden word. Naturally, then also 110 and 111 are forbidden, but we take only the smallest collection of forbidden words. If M + 1 is the length of the longest forbidden word, then this SFT is an M-step SFT, or an SFT with memory M. Indeed, an M-step SFT has the property that if uv ∈ L(X) and vw ∈ L(X) and if |v| ≥ M, then uvw ∈ L(X) as well. The following property is therefore

Lemma 3.2. Every SFT (X, σ) on a finite alphabet can be recoded such that
the list of forbidden words consists of 2-words only.

Proof. Assume that (X, σ) is a subshift over the alphabet A and the longest forbidden word has length M + 1 ≥ 2. Take a new alphabet $\tilde A = A^M$, say $a_1, \dots, a_n$ are its letters. Recode every x ∈ X using a sliding block code π, where for each index i, $\pi(x)_i = a_j$ if $a_j$ is the symbol used for $x_i x_{i+1} \cdots x_{i+M-1}$. Effectively, this is replacing X by its M-block code. Then every (M + 1)-word is uniquely coded by a 2-word in the new alphabet $\tilde A$, and vice versa, every $a_1 a_2$ such that the (M − 1)-suffix of $\pi^{-1}(a_1)$ equals the (M − 1)-prefix of $\pi^{-1}(a_2)$ encodes a unique (M + 1)-word in $A^*$. Now we forbid a 2-word $a_1 a_2 \in \tilde A^2$ if these overlaps do not match or if $\pi^{-1}(a_1 a_2)$ contains a forbidden word of X. Since A is finite, so is $\tilde A$, and this leads to a finite list of forbidden 2-words in the recoded subshift.

Example 3.3. Let X be the SFT with forbidden words 11 and 101, so
M = 2. We recode using the alphabet a = 00, b = 01, c = 10, and
d = 11. Draw the vertex-labeled transition graph (see Figure 3.1); labels at
the arrows indicate which word in {0, 1}3 they stand for. For example, the
edge a → b labeled 001 has prefix a = 00 and suffix b = 01. Each arrow
containing a forbidden word is dashed and then removed in the right panel
of Figure 3.1.

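[Figure 3.1: left panel, the transition graph on the vertices a, b, c, d with arrows labeled by 3-words; right panel, the same graph after removing the dashed arrows.]

Figure 3.1. The recoding of the SFT with forbidden words 11 and 101.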
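The recoding of Lemma 3.2 is easy to automate. The following Python sketch uses the illustrative helper `recoded_edges`. It enumerates the 3-words over {0, 1}, keeps those that avoid the forbidden words 11 and 101 of Example 3.3, and records each surviving word as an arrow from its 2-prefix to its 2-suffix. The letters a, b, c, d are the names used in Example 3.3.

```python
from itertools import product


def recoded_edges(forbidden, M, alphabet="01"):
    """Arrows of the M-block recoding of the SFT avoiding `forbidden`.

    Each (M+1)-word w without forbidden subwords yields an arrow from its
    M-prefix w[:M] to its M-suffix w[1:]; the vertices are M-words.
    """
    edges = set()
    for letters in product(alphabet, repeat=M + 1):
        w = "".join(letters)
        if not any(f in w for f in forbidden):
            edges.add((w[:M], w[1:]))
    return edges


names = {"00": "a", "01": "b", "10": "c", "11": "d"}
edges = recoded_edges({"11", "101"}, M=2)
print(sorted((names[u], names[v]) for u, v in edges))
# [('a', 'a'), ('a', 'b'), ('b', 'c'), ('c', 'a')]
```

Only the arrows a → a, a → b, b → c, and c → a survive. The vertex d = 11 carries no arrows, so it can be discarded.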

Corollary 3.4. Every SFT (X, σ) on a finite alphabet A can be represented by a finite graph G with vertices labeled by the letters in $\tilde A$ and arrows $b_1 \to b_2$ only if $\pi^{-1}(b_1 b_2)$ contains no forbidden word of X.

Definition 3.5. The directed graph G constructed in the previous corollary is called the transition graph of the SFT. The matrix $A = (a_{ij})_{i,j \in A}$ with $a_{ij} = \#\{\text{arrows } i \to j \text{ in } G\}$ is its transition matrix. The graph is vertex-labeled, which means that to each vertex there is assigned a symbol in the alphabet. We will stipulate throughout this book that the vertex-labels are unique (i.e. no two distinct vertices have the same label), although this assumption is not entirely uniform in the literature.

Definition 3.6. A non-negative N × N matrix $A = (a_{ij})_{i,j \in A}$ is called irreducible if for every i, j ∈ A there is k ≥ 1 such that $A^k$ has (i, j)-entry $a^{(k)}_{ij} > 0$. For index i, set $\mathrm{per}(i) = \gcd\{k \ge 1 : a^{(k)}_{ii} > 0\}$. If A is irreducible, then per(i) is the same for every i, and we call it the period of A. We call A aperiodic if its period is 1. The matrix is called primitive if there is k ∈ ℕ such that $a^{(k)}_{ij} > 0$ for all i, j ∈ A.
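The notions in Definition 3.6 can be tested mechanically for small 0/1 matrices. The following Python sketch shows one possible approach, not taken from the text. Irreducibility is checked via graph reachability. The period is computed as the gcd of return times, scanning k ≤ 3·#A, which suffices for irreducible A. Primitivity is checked via Wielandt's bound: a primitive n × n matrix satisfies $A^{(n-1)^2+1} > 0$. All function names are illustrative.

```python
from math import gcd


def bool_mult(P, Q):
    """Boolean product of two square 0/1 matrices (lists of lists)."""
    k = len(P)
    return [[int(any(P[i][l] and Q[l][j] for l in range(k))) for j in range(k)]
            for i in range(k)]


def reachable(A, i):
    """Vertices j that can be reached from i by a path of length >= 1."""
    k = len(A)
    stack = [j for j in range(k) if A[i][j]]
    seen = set(stack)
    while stack:
        u = stack.pop()
        for v in range(k):
            if A[u][v] and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def is_irreducible(A):
    """Every j is reachable from every i (including j = i) in >= 1 steps."""
    vertices = set(range(len(A)))
    return all(reachable(A, i) == vertices for i in vertices)


def period(A, i=0):
    """gcd{k >= 1 : (A^k)_{ii} > 0}, scanning 1 <= k <= 3 * #A.

    For irreducible A the scan suffices: for a simple cycle of length l not
    through i there are closed walks through i of lengths a + b and
    a + b + l below 3 * #A (if the cycle passes through i, l itself occurs),
    so the gcd found divides every cycle length.
    """
    power = [[int(bool(x)) for x in row] for row in A]  # A^1
    g = 0
    for k in range(1, 3 * len(A) + 1):
        if power[i][i]:
            g = gcd(g, k)
        power = bool_mult(power, A)
    return g


def is_primitive(A):
    """Check A^K > 0 entrywise for K = (#A - 1)^2 + 1 (Wielandt's bound)."""
    power = [[int(bool(x)) for x in row] for row in A]  # A^1
    for _ in range((len(A) - 1) ** 2):
        power = bool_mult(power, A)
    return all(all(row) for row in power)


golden = [[1, 1], [1, 0]]                   # Fibonacci SFT
flip = [[0, 1], [1, 0]]                     # period 2
cycle3 = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]  # period 3
reducible = [[1, 1], [0, 1]]                # vertex 1 cannot reach vertex 0

for M in (golden, flip, cycle3, reducible):
    print(is_irreducible(M), period(M), is_primitive(M))
```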

Exercise 3.7. Show that if A is aperiodic and irreducible, then A is primitive, but irreducibility or aperiodicity alone doesn’t imply primitivity. Conversely, if A is primitive, then it is also aperiodic and irreducible. If A is irreducible, show that per(i) is indeed independent of i.

Lemma 3.8. Every irreducible SFT is synchronized; in fact, every word of length M (the memory of the SFT) is synchronizing.

Proof. Let v be any word of length M. If uv ∈ L(X), then u has no influence on what happens after v. Hence if vw ∈ L(X), then uvw ∈ L(X). Irreducibility of A then gives a dense orbit.

Example 3.9. Let T : [0, 1] → [0, 1] be a piecewise monotone map; i.e. there is a finite partition $\{J_i\}_{i \in A}$ of [0, 1] into intervals such that $T|_{J_i}$ is continuous and monotone for each i. Assume also that for each i, $T(J_i)$ is the closure of a union of $J_k$'s. In this case we call $\{J_i\}_{i \in A}$ a Markov partition. Write

$$a_{ij} = \begin{cases} 1 & \text{if } T(J_i) \supset J_j^\circ,\\ 0 & \text{if } T(J_i) \cap J_j^\circ = \emptyset.\end{cases}$$

Then the resulting matrix $A = (a_{ij})_{i,j \in A}$ is the transition matrix for the subshift obtained by taking the closure of the collection of itineraries $\{i(x) : x \in [0, 1]\}$. This yields a one-sided shift.
The example in Figure 3.2 produces the transition matrix $A = \begin{pmatrix} 0 & 1\\ 1 & 1\end{pmatrix}$, so the corresponding subshift is the Fibonacci SFT; see Example 1.3. It should not come as a surprise that the leading eigenvalue of A is exactly the slope of T: both equal $e^{h_{top}(T)} = e^{h_{top}(\sigma)} = \gamma$; see Section 3.1.2.
For the bi-infinite Fibonacci SFT, we can look at a toral automorphism.

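$$T(x) = \begin{cases} \gamma\left(x + \frac{2-\gamma}{\gamma}\right) & \text{if } x \in J_1 := \left[0, \frac{\gamma-1}{\gamma}\right],\\ \gamma(1 - x) & \text{if } x \in J_2 := \left[\frac{\gamma-1}{\gamma}, 1\right],\end{cases} \qquad \gamma = \frac{1+\sqrt5}{2}.$$

[Figure 3.2: graph of T, with the intervals J_1 and J_2 marked on the horizontal axis.]

Figure 3.2. The tent map with slope equal to the golden mean.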

Definition 3.10. A toral automorphism $T : \mathbb{T}^d \to \mathbb{T}^d$ is an invertible linear map on the (d-dimensional) torus $\mathbb{T}^d$. Each such T is of the form $T_M(x) = Mx \bmod 1$, where
• M is an integer matrix with det(M ) = ±1 (i.e. M is unimodular);
• the eigenvalues of M are not on the unit circle; this property is
called hyperbolicity; for toral automorphisms, this is equivalent
to Td being a hyperbolic set in terms of Definition 2.81.

The map $T_M$ has a Markov partition¹, that is, a partition $\{J_i\}_{i \in A}$ of $\mathbb{T}^d$ into sets such that:

(1) The $J_i$ have disjoint interiors and $\bigcup_i J_i = \mathbb{T}^d$.
(2) If $T_M(J_i^\circ) \cap J_j^\circ \neq \emptyset$, then $T_M(J_i)$ stretches across $J_j^\circ$ in the unstable direction (i.e. the direction spanned by the unstable eigenspaces of M).
(3) If $T_M^{-1}(J_i^\circ) \cap J_j^\circ \neq \emptyset$, then $T_M^{-1}(J_i)$ stretches across $J_j^\circ$ in the stable direction (i.e. the direction spanned by the stable eigenspaces of M).
Every hyperbolic toral automorphism has a Markov partition (see [100]), but in general they are fiendishly difficult to find explicitly, especially in dimension ≥ 3, where the boundaries of the $J_i$ might have to be fractal (see [104]). Therefore we confine ourselves to the simpler case of $M = \begin{pmatrix} 1 & 1\\ 1 & 0\end{pmatrix}$, for which a Markov partition of three rectangles $J_i$, i = 1, 2, 3, can be constructed; see Figure 3.3. The corresponding transition matrix is

$$A = (a_{ij}) = \begin{pmatrix} 0 & 1 & 1\\ 1 & 0 & 1\\ 0 & 1 & 0\end{pmatrix}, \quad \text{where } a_{ij} = \begin{cases} 1 & \text{if } T_M(J_i^\circ) \cap J_j \neq \emptyset,\\ 0 & \text{if } T_M(J_i^\circ) \cap J_j = \emptyset.\end{cases}$$

1 The construction of Markov partitions for toral automorphisms on $\mathbb{T}^2$ goes back to Berg [58] and Adler & Weiss [10], extended to more general settings in [99, 100, 258, 512] among others.






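[Figure 3.3: left panel, the Markov partition into the rectangles J_i; right panel, the catmap.]

Figure 3.3. The Markov partition for $T_M : \mathbb{T}^2 \to \mathbb{T}^2$; the catmap is $T_M^2$.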


The characteristic polynomial of A is

$$\det(A - \lambda I) = -\lambda^3 + 2\lambda + 1 = -(\lambda + 1)(\lambda^2 - \lambda - 1) = -(\lambda + 1)\det(M - \lambda I),$$

so A has the eigenvalues of M (no coincidence!), together with λ = −1.
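The factorization can be verified by a cofactor expansion along the first row. This routine worked step is included for convenience:

$$\det\begin{pmatrix} -\lambda & 1 & 1\\ 1 & -\lambda & 1\\ 0 & 1 & -\lambda\end{pmatrix} = -\lambda(\lambda^2 - 1) - (-\lambda - 0) + (1 - 0) = -\lambda^3 + 2\lambda + 1.$$

Expanding $-(\lambda + 1)(\lambda^2 - \lambda - 1) = -(\lambda^3 - 2\lambda - 1)$ recovers the same polynomial, and $\det(M - \lambda I) = (1 - \lambda)(-\lambda) - 1 = \lambda^2 - \lambda - 1$.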
Example 3.11. The most “famous” toral automorphism is Arnol’d’s catmap, and it has the matrix $\begin{pmatrix} 2 & 1\\ 1 & 1\end{pmatrix} = \begin{pmatrix} 1 & 1\\ 1 & 0\end{pmatrix}^2$; see Figure 3.3 (right). It is called catmap because Arnol’d used this example, including the drawing of a cat’s head, in his book(s) [30] to illustrate the nature of hyperbolic maps.
Exercise 3.12. Show that if $x \in \mathbb{T}^d$ has only rational coordinates, then x is periodic under a toral automorphism. Conclude that, if the pixels in Figure 3.3 have rational coordinates (such as the dyadic coordinates that computers use), then the cat will return intact after a finite number of iterations.
The following characterization for shadowing subshifts is due to Walters [552] (see also [381, Theorem 3.33] and [279] for an entirely general characterization of dynamical systems with shadowing in terms of SFTs).
Theorem 3.13. A subshift (X, σ) has the shadowing property if and only if
it is a subshift of finite type.

Proof. We give the proof for $X \subset A^{\mathbb{N}_0}$ only; the two-sided case follows in a similar way.
⇐: Let (X, σ) be an SFT of memory M (see below Definition 3.1), so M + 1 is the length of the longest forbidden word. Let ε > 0 be arbitrary and choose m ≥ M + 1 so large that $2^{-m} < \varepsilon$. Take $\delta = 2^{2-m}$. We need to show that for every δ-pseudo-orbit $(x^n)_{n \ge 0} \subset X$ (in other words,

$$\sigma(x^n)_0 \cdots \sigma(x^n)_{m-3} = x^n_1 \cdots x^n_{m-2} = x^{n+1}_0 \cdots x^{n+1}_{m-3}$$

for every n), there is y ∈ X that ε-shadows $(x^n)_{n \ge 0}$. To this end, set $y_n = x^n_0$ for each n ≥ 0. Then for 0 ≤ i < m, we have

$$y_{n+i} = x^{n+i}_0 = x^{n+i-1}_1 = x^{n+i-2}_2 = \cdots = x^n_i,$$

so $y_n \cdots y_{n+m-1} = x^n_0 \cdots x^n_{m-1} \in L(X)$. Since X is an SFT, y ∈ X and $d(\sigma^n(y), x^n) < \varepsilon$ by construction.
⇒: Let (X, σ) be a subshift with the shadowing property, so in particular, for ε = 1, there exists δ > 0 such that every δ-pseudo-orbit in X is ε-shadowed in X. Take N ∈ ℕ such that $2^{2-N} < \delta$, and let $y \in A^{\mathbb{N}_0}$ be such that $y_n \cdots y_{n+N-1} \in L(X)$ for each n. Then there exists a sequence $(x^n)_{n \ge 0}$ in X such that $x^n_0 \cdots x^n_{N-1} = y_n \cdots y_{n+N-1}$ for each n ≥ 0. Therefore

$$\sigma(x^n)_0 \cdots \sigma(x^n)_{N-2} = x^n_1 \cdots x^n_{N-1} = y_{n+1} \cdots y_{n+N-1} = x^{n+1}_0 \cdots x^{n+1}_{N-2}$$

and $d(\sigma(x^n), x^{n+1}) \le 2^{-N+2} < \delta$. Hence $(x^n)_{n \ge 0}$ is a δ-pseudo-orbit, which can be ε-shadowed by some z ∈ X. But then $z_n = x^n_0 = y_n$ for every n ≥ 0, so z = y ∈ X. Since y was arbitrary, up to the condition that each of its N-blocks belongs to L(X), it follows that the only restriction of X involves forbidden blocks of length ≤ N. Therefore X is an SFT.
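The construction in the proof of "⇐" only uses the zeroth coordinates of the pseudo-orbit. The toy Python sketch below assumes the Fibonacci SFT (forbidden word 11, memory M = 1) and works with finite words in place of infinite sequences. The pseudo-orbit is built by hand: each $x^n$ follows a fixed allowed sequence for K = 6 symbols and then continues with a tail of 0s. The helper names are illustrative. The sketch checks that consecutive points agree on K − 1 symbols after one shift, that $y_n = x^n_0$ is allowed, and that $\sigma^n(y)$ agrees with $x^n$ on at least K symbols.

```python
FORBIDDEN = ("11",)  # Fibonacci SFT, memory M = 1 (illustrative choice)


def allowed(w):
    """True if the finite word w contains no forbidden subword."""
    return not any(f in w for f in FORBIDDEN)


def common_prefix(u, v):
    """Length of the longest common prefix; d(u, v) <= 2^{-this length}."""
    n = 0
    for a, b in zip(u, v):
        if a != b:
            break
        n += 1
    return n


def shadowing_point(pseudo_orbit):
    """The candidate y with y_n = x^n_0, as in the proof of Theorem 3.13."""
    return "".join(x[0] for x in pseudo_orbit)


def fibonacci_word(length):
    """Prefix of the fixed point of 0 -> 01, 1 -> 0 (it avoids 11)."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


K = 6  # each x^n follows a common sequence for K symbols, then 0s
base = fibonacci_word(60)
pseudo = [base[n:n + K] + "0" * 10 for n in range(40)]
y = shadowing_point(pseudo)

print(all(common_prefix(pseudo[n][1:], pseudo[n + 1]) >= K - 1 for n in range(39)))  # True
print(allowed(y) and all(common_prefix(y[n:], pseudo[n]) >= K for n in range(40 - K + 1)))  # True
```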

3.1.2. Topological Entropy of SFTs.

Theorem 3.14. The topological entropy of an SFT equals max{0, log λ}, where λ is the leading eigenvalue of the transition matrix A.

If A is irreducible, then the Perron-Frobenius Theorem 8.58 gives that λ ≥ 1 (and λ > 1 if A is not a permutation matrix).
Proof. Let $A^n = (a^{(n)}_{ij})_{i,j \in A}$ be the n-th power of A. Every word in $L_n(X)$ corresponds to an n-path in the transition graph, and the number of n-paths from i to j is given by $a^{(n)}_{ij}$. Using the Jordan normal form $A = UJU^{-1}$, we can find C ≥ 1 such that

$$C^{-1}\lambda^n \le \max_{i,j} a^{(n)}_{ij} \le C n^{\#A} \lambda^n.$$

It follows that $C^{-1}\lambda^n \le p(n) \le (\#A)^2 C n^{\#A} \lambda^n$ (where p(n) stands for the word-complexity; see Definition 1.9) and $\lim_{n \to \infty} \frac{1}{n} \log p(n) = \log \lambda$.
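Theorem 3.14 can be checked numerically on small examples. The Python sketch below counts the allowed n-words as the sum of the entries of $A^{n-1}$ and estimates the leading eigenvalue by power iteration, which assumes that A is primitive. The examples are the Fibonacci SFT and the recoded matrix of Example 3.3 on the vertices a, b, c, with arrows a → a, a → b, b → c, c → a. The function names are illustrative.

```python
import math


def mat_mult(P, Q):
    """Product of two integer matrices given as lists of lists."""
    return [[sum(P[i][l] * Q[l][j] for l in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def word_count(A, n):
    """p(n) = number of allowed n-words = number of paths through n vertices."""
    k = len(A)
    power = [[int(i == j) for j in range(k)] for i in range(k)]  # A^0
    for _ in range(n - 1):
        power = mat_mult(power, A)
    return sum(map(sum, power))


def leading_eigenvalue(A, iterations=200):
    """Power iteration; assumes A is primitive so that the iteration converges."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iterations):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = max(w)               # v has maximal entry 1, so max(w) -> λ
        v = [x / lam for x in w]
    return lam


fibonacci = [[1, 1], [1, 0]]                     # forbidden word 11
example_3_3 = [[1, 1, 0], [0, 0, 1], [1, 0, 0]]  # vertices a, b, c

print([word_count(fibonacci, n) for n in range(1, 8)])  # [2, 3, 5, 8, 13, 21, 34]
lam = leading_eigenvalue(fibonacci)
print(lam, math.log(word_count(fibonacci, 60)) / 60, math.log(lam))
print(leading_eigenvalue(example_3_3))  # real root of λ^3 = λ^2 + 1
```

For the Fibonacci SFT, p(n) runs through the Fibonacci numbers, and $\frac1n \log p(n)$ approaches $\log\gamma$ only at rate O(1/n). For the matrix of Example 3.3, the characteristic polynomial is $\lambda^3 - \lambda^2 - 1$.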
Proposition 3.15. If (Y, σ) is a factor of (X, σ), then htop (Y, σ) ≤ htop (X, σ).
If (X, σ) and (Y, σ) are conjugate, then htop (X, σ) = htop (Y, σ).

The result also holds in general, i.e. not just in the context of subshifts,
see Corollary 2.51, but using the word-complexity and sliding block codes,
the proof is particularly straightforward here.

Proof. Let ψ : X → Y be the factor map. Since it is continuous, it is a sliding block code by Theorem 1.23, say of window length 2N + 1. Therefore the word-complexities relate as $p_Y(n) \le p_X(n + 2N)$, and hence

$$\limsup_{n \to \infty} \frac{1}{n} \log p_Y(n) \le \limsup_{n \to \infty} \frac{1}{n} \log p_X(n + 2N) = \limsup_{n \to \infty} \frac{n + 2N}{n} \cdot \frac{1}{n + 2N} \log p_X(n + 2N) = \limsup_{n \to \infty} \frac{1}{n + 2N} \log p_X(n + 2N).$$

This proves the first statement. Using this in both directions, we find $h_{top}(X, \sigma) = h_{top}(Y, \sigma)$.
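The inequality $p_Y(n) \le p_X(n + 2N)$ can be observed on a concrete sliding block code. The sketch below is illustrative only. It takes X to be the two-sided Fibonacci SFT and uses the 3-block code $\psi(x)_i = x_{i-1} + x_i + x_{i+1} \bmod 2$ (so N = 1); neither choice comes from the text. It then counts the n-words of Y = ψ(X) as the images of the allowed (n + 2)-words of X.

```python
from itertools import product


def allowed_words(n, forbidden=("11",)):
    """All n-words over {0, 1} avoiding the forbidden words (Fibonacci SFT)."""
    words = ("".join(t) for t in product("01", repeat=n))
    return [w for w in words if not any(f in w for f in forbidden)]


def block_map(w):
    """Illustrative 3-block code (N = 1): y_i = x_{i-1} + x_i + x_{i+1} mod 2."""
    return "".join(str((int(w[i - 1]) + int(w[i]) + int(w[i + 1])) % 2)
                   for i in range(1, len(w) - 1))


def image_word_count(n):
    """Number of n-words of Y: the distinct images of the (n + 2)-words of X."""
    return len({block_map(w) for w in allowed_words(n + 2)})


for n in range(1, 9):
    print(n, image_word_count(n), len(allowed_words(n + 2)))  # p_Y(n) <= p_X(n + 2)
```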

As shown by Parry [446], see Theorem 6.67, irreducible SFTs are intrin-
sically ergodic. This follows also from Theorem 3.48 and Proposition 3.41.
Weiss [556] showed that factors of irreducible SFTs are intrinsically ergodic
as well.

3.1.3. Vertex-Splitting and Conjugacies between SFTs. It is natural to ask which SFTs are conjugate to each other. We have seen that having
equal topological entropy is a necessary condition for this, but it is not
sufficient. The conjugacy problem for SFTs was solved by Williams and in
this section we discuss the ingredients required for this result. Complete
details can be found in [364, 398].
We know that an SFT (X, σ) has a graph representation (as vertex-
labeled subshift or edge-labeled subshift, and certainly not unique). The
following operation on the graph G, called vertex splitting, produces a
related subshift:

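[Figure 3.4: three graphs, from left to right: the insplit graph (v split into v1, v2), the original graph G, and the outsplit graph (v split into v1, v2).]

Figure 3.4. Insplit graph, original G, and outsplit graph.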

Let G = (V, E), where V is the vertex set and E the edge set. For each v ∈ V, let $E_v \subset E$ be the set of edges starting in v and let $E^v \subset E$ be the set of edges terminating in v.

Definition 3.16. Let G = (V, E), and assume that $\#E^v \ge 2$. An elementary insplit graph $\hat G = (\hat V, \hat E)$ is obtained by
• doubling one vertex v ∈ V into two vertices $v_1, v_2 \in \hat V$;
• replacing each $e = (v \to w) \in E_v$ for w ≠ v by an edge $\hat e_1 = (v_1 \to w)$ and an edge $\hat e_2 = (v_2 \to w)$;
• replacing each $e = (w \to v) \in E^v$ for w ≠ v by a single edge $\hat e_1 = (w \to v_1)$ or an edge $\hat e_2 = (w \to v_2)$ (but make sure that $v_1$ and $v_2$ both have incoming edges);
• replacing each loop (v → v) by two edges $(v_1 \to v_i), (v_2 \to v_i) \in \hat E$ (so one of them is a loop), where i ∈ {1, 2}.
An insplit graph is then obtained by successive elementary insplits.

(Elementary) outsplit graphs are defined similarly, interchanging the roles of $E_v$ and $E^v$.
Definition 3.17. Let G = (V, E), and assume that $\#E_v \ge 2$. An elementary outsplit graph $\hat G = (\hat V, \hat E)$ is obtained by
• doubling one vertex v ∈ V into two vertices $v_1, v_2 \in \hat V$;
• replacing each $e = (v \to w) \in E_v$ for w ≠ v by a single edge $\hat e = (v_1 \to w)$ or $\hat e = (v_2 \to w)$ (but make sure that $v_1$ and $v_2$ both have outgoing edges);
• replacing each $e = (w \to v) \in E^v$ for w ≠ v by an edge $\hat e = (w \to v_1)$ and an edge $\hat e = (w \to v_2)$;
• replacing each loop (v → v) by two edges $(v_i \to v_1), (v_i \to v_2) \in \hat E$ (so one of them is a loop), where i ∈ {1, 2}.
An outsplit graph is then obtained by successive elementary outsplits.

If every e ∈ E has a unique label, then we will also give each $\hat e \in \hat E$ a unique label.
Proposition 3.18. Let Ĝ be an in- or outsplit graph obtained from G. Then
the edge-labeled subshift X̂ of Ĝ and the edge-labeled subshift X of G are
mutually semi-conjugate to each other.

Proof. We give the proof for an elementary outsplit graph $\hat G$; general outsplit graphs and (elementary) insplit graphs follow similarly. By Theorem 1.23, it suffices to give sliding block code representations for $\pi : \hat X \to X$ and $\hat\pi : X \to \hat X$.
• The factor map $\pi : \hat X \to X$ is simple. If $\hat e \in \hat E$ replaces $e \in E$, then set $f(\hat e) = e$ and $\pi(x)_i = f(x_i)$.
• Each 2-word $ee' \in L(X)$ uniquely determines the first edge $\hat e$ of the 2-path in $\hat G$ that replaces the 2-path in G coded by $ee'$. Set $\hat f(e, e') = \hat e$ and $\hat\pi(x)_i = \hat f(x_i, x_{i+1})$.
This concludes the proof. In general, mutual semi-conjugacy is not enough to conclude conjugacy (it is not given that $\hat\pi = \pi^{-1}$), but in this situation, conjugacy holds; see Theorem 3.24.

Now let $\hat G = (\hat V, \hat E)$ be an outsplit graph of G = (V, E) with transition matrices $\hat A$ and A, respectively. Assume that $\hat N = \#\hat V$ and N = #V. Then there is an $N \times \hat N$-matrix $D = (d_{v,\hat v})_{v \in V, \hat v \in \hat V}$, where $d_{v,\hat v} = 1$ if $\hat v$ replaces v and $d_{v,\hat v} = 0$ otherwise. (Thus D is a sort of rectangular diagonal matrix.)
There also is an $\hat N \times N$-matrix $C = (c_{\hat v,v})_{\hat v \in \hat V, v \in V}$, where $c_{\hat v,v}$ is the number of edges $\hat e \in \hat E_{\hat v}$ that replace an edge $e \in E^v$.
Proposition 3.19. With the above notation,
DC = A and CD = Â.

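Sketch of proof. Prove it first for an elementary outsplit, and then compose elementary outsplits to a general outsplit. For the first step, we compute the elementary outsplit for the example of Figure 3.4:

$$A = \begin{pmatrix} 1 & 1 & 1\\ 0 & 1 & 1\\ 1 & 0 & 0\end{pmatrix} \quad \text{and} \quad \hat A = \begin{pmatrix} 0 & 0 & 0 & 1\\ 1 & 1 & 1 & 0\\ 0 & 0 & 1 & 1\\ 1 & 1 & 0 & 0\end{pmatrix}.$$

Also

$$D = \begin{pmatrix} 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 0 & 0 & 1\\ 1 & 1 & 0\\ 0 & 1 & 1\\ 1 & 0 & 0\end{pmatrix}.$$

Matrix multiplication confirms that DC = A and CD = Â.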
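The matrix identities of Proposition 3.19 for the example of Figure 3.4 can be verified by machine. In the Python sketch below, A, Â (`A_hat`), D, and C are the matrices from the sketch of proof, and the helper names are illustrative. Since $\mathrm{tr}((DC)^n) = \mathrm{tr}((CD)^n)$, the traces of all powers of A and Â agree as well.

```python
def mat_mult(P, Q):
    """Product of two integer matrices given as lists of lists (any shapes)."""
    return [[sum(P[i][l] * Q[l][j] for l in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def trace_of_power(A, n):
    """tr(A^n) for n >= 1."""
    power = A
    for _ in range(n - 1):
        power = mat_mult(power, A)
    return sum(power[i][i] for i in range(len(A)))


A = [[1, 1, 1], [0, 1, 1], [1, 0, 0]]
A_hat = [[0, 0, 0, 1], [1, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 0]]
D = [[1, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
C = [[0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 0]]

print(mat_mult(D, C) == A, mat_mult(C, D) == A_hat)  # True True
print([trace_of_power(A, n) == trace_of_power(A_hat, n) for n in range(1, 6)])
```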
Exercise 3.20. Do the same for the elementary insplit graph in the example
of Figure 3.4.
Definition 3.21. Two matrices A and Â are strongly shift equivalent (of lag ℓ) (denoted as A ≈ Â) if there are (rectangular) matrices $D_i$, $C_i$ and $A_i$, 1 ≤ i ≤ ℓ, over ℕ_0 such that

$$A = A_0, \quad A_{i-1} = D_i C_i, \quad C_i D_i = A_i, \quad i = 1, \dots, \ell, \quad A_\ell = \hat A. \tag{3.2}$$
Remark 3.22. One important restriction of this definition is that the conjugating matrices must have non-negative integer entries. Even if a square matrix has determinant ±1, its inverse may still have negative integers among its entries. For example,

$$A = \begin{pmatrix} 4 & 1\\ 1 & 0\end{pmatrix} \quad \text{and} \quad \hat A = \begin{pmatrix} 3 & 2\\ 2 & 1\end{pmatrix}$$

are similar via $\begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix} A = \hat A \begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}$. From this, we can easily compute that the traces satisfy $\mathrm{tr}(A^n) = \mathrm{tr}(\hat A^n)$ for all n ∈ ℤ, so A and Â share ζ-functions $\zeta_A(t) := \exp\left(\sum_{n=1}^{\infty} \frac{\mathrm{tr}(A^n)}{n} t^n\right)$. However, A and Â are not (strongly) shift equivalent. This is Williams’s [559, Example 3] counterexample to Bowen’s question of whether sharing ζ-functions for SFTs suffices to conclude conjugacy.
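Here is a quick check of the claims in Remark 3.22, as a Python sketch with illustrative helper names. The conjugating matrix $P = \begin{pmatrix} 1 & 1\\ 1 & -1\end{pmatrix}$ satisfies PA = ÂP, but det P = −2. The similarity therefore holds over ℚ, not via a matrix in GL(2, ℤ). The traces of all powers of A and Â coincide because both matrices have characteristic polynomial $\lambda^2 - 4\lambda - 1$.

```python
def mat_mult(P, Q):
    """Product of two integer matrices given as lists of lists."""
    return [[sum(P[i][l] * Q[l][j] for l in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def trace_of_power(A, n):
    """tr(A^n) for n >= 1."""
    power = A
    for _ in range(n - 1):
        power = mat_mult(power, A)
    return sum(power[i][i] for i in range(len(A)))


A = [[4, 1], [1, 0]]
A_hat = [[3, 2], [2, 1]]
P = [[1, 1], [1, -1]]

print(mat_mult(P, A) == mat_mult(A_hat, P))   # True: P A = Â P
print(P[0][0] * P[1][1] - P[0][1] * P[1][0])  # -2, so P is not in GL(2, Z)
print([trace_of_power(A, n) for n in range(1, 6)])
print([trace_of_power(A_hat, n) for n in range(1, 6)])  # the same list
```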
Exercise 3.23. Show that strong shift equivalence ≈ is indeed an equivalence relation between non-negative square matrices. Show that A ≈ Â implies that A and Â have the same leading eigenvalue $\lambda = \hat\lambda$.

Strong shift equivalence between matrices A and Â means, in effect, that their associated graphs G and Ĝ can be transformed into each other by a sequence of elementary vertex-splittings and their inverses (vertex-mergers). Conjugacy between SFTs can always be reduced to vertex-splittings and vertex-mergers, as shown in Williams’s Theorem [559] from 1973. The full proof is in [364, Chapter 2] and [398, Chapter 7, specifically Theorem 7.2.7].
Theorem 3.24. Two SFTs are conjugate if and only if their transition ma-
trices are strongly shift equivalent.

Strong shift equivalence A ≈ Â is thus a complete invariant for conjugacy between edge-labeled SFTs $X_A$ and $X_{\hat A}$. In practice, however, it is difficult to check whether A ≈ Â. Even if A and Â have the same characteristic polynomial, they need not be strongly shift equivalent. The following weaker notion may
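Definition 3.25. Two matrices A and Â are shift equivalent (of lag ℓ) (denoted as A ∼ Â) if there are matrices C, D over ℕ_0 such that

$$A^\ell = CD, \quad \hat A^\ell = DC \quad \text{and} \quad AC = C\hat A, \quad \hat A D = DA. \tag{3.3}$$

Said differently, the following diagram commutes:

[Commutative diagram: a top row of copies of $\mathbb{Z}^n$ joined by the maps A, $A^{\ell-1}$, A, and a bottom row of copies of $\mathbb{Z}^{\hat n}$, the two rows being linked by the maps C and D.]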

Shift equivalence means that the ℓ-th powers $A^\ell$ and $\hat A^\ell$ are strongly shift equivalent (with lag 1). Shift equivalence is easier to verify than strong shift equivalence, although verification can still be very complicated. But, and this is Williams’s Conjecture, it is still not fully² known whether it is a complete invariant; see [398, Section 7.3] and [106, Problem 19.1]. If $A \not\sim \hat A$, then $X_A$ and $X_{\hat A}$ cannot be conjugate, but if $A \sim \hat A$, this is insufficient to conclude that $(X_A, \sigma)$ and $(X_{\hat A}, \sigma)$ are conjugate.
Exercise 3.26. Show that (i) $A \sim_\ell \hat A$ implies $A \sim_k \hat A$ for all k ≥ ℓ, (ii) shift equivalence ∼ is an equivalence relation between non-negative square matrices, and (iii) strong shift equivalence implies shift equivalence, with the same value of ℓ.
Shift-equivalent matrices have the same ζ-function, and many other properties coincide too.
Lemma 3.27. If A and  are shift equivalent (of lag ), then they have the
same non-zero eigenvalues (so also htop (XA , σ) = htop (XÂ , σ)).

Proof. By induction on $n$, (3.3) gives $A^n C = C \hat A^n$ and $D A^n = \hat A^n D$ for all $n \ge 0$. By linearity, $q(A) \cdot C = C \cdot q(\hat A)$ and $D \cdot q(A) = q(\hat A) \cdot D$ for every polynomial $q$. If $q$ is the characteristic polynomial of $A$ (so $q(A) = 0$ by the Cayley–Hamilton Theorem), then $0 = D \cdot q(A) \cdot C = q(\hat A) \cdot DC = q(\hat A) \cdot \hat A^\ell$. Thus $\hat A$ is annihilated by $x^\ell q(x)$, so $\hat A$ has no other eigenvalues than those of $A$, possibly plus 0. On the other hand, if $q$ is the characteristic polynomial of $\hat A$, then $0 = C \cdot q(\hat A) \cdot D = q(A) \cdot CD = q(A) \cdot A^\ell$, so $A$ has no non-zero eigenvalues other than those of $\hat A$.
Since $h_{top}(X_A, \sigma) = \log \lambda_A$ for the leading eigenvalue $\lambda_A$ of $A$, the entropies are the same too. ∎
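The argument can be tested on a small example. The Python sketch below is illustrative only: the matrices C and D are an arbitrary choice (not from the text), and the characteristic polynomial is computed with the Faddeev–LeVerrier recursion in exact rational arithmetic. It forms A = CD and Â = DC, which are strongly shift equivalent of lag 1 and hence shift equivalent, and checks that their characteristic polynomials agree after dividing out powers of x.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def char_poly(A):
    """Coefficients of det(xI - A), highest degree first (Faddeev-LeVerrier)."""
    n = len(A)
    coeffs = [Fraction(1)]                      # c_n = 1
    M = [[Fraction(0)] * n for _ in range(n)]   # M_0 = 0
    for k in range(1, n + 1):
        AM = matmul(A, M)
        # M_k = A M_{k-1} + c_{n-k+1} I
        M = [[AM[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AMk = matmul(A, M)
        coeffs.append(-sum(AMk[i][i] for i in range(n)) / k)  # c_{n-k}
    return coeffs

def nonzero_part(coeffs):
    """Divide out factors of x; the remaining roots are the non-zero eigenvalues."""
    coeffs = list(coeffs)
    while len(coeffs) > 1 and coeffs[-1] == 0:
        coeffs.pop()
    return coeffs

# Illustrative choice (not from the text): A = CD and A_hat = DC.
C = [[1, 1, 0], [0, 1, 1]]        # 2 x 3
D = [[1, 0], [1, 1], [0, 1]]      # 3 x 2
A = matmul(C, D)                  # [[2, 1], [1, 2]]
A_hat = matmul(D, C)              # [[1, 1, 0], [1, 2, 1], [0, 1, 1]]

same_nonzero_spectrum = nonzero_part(char_poly(A)) == nonzero_part(char_poly(A_hat))
```

Here det(xI − A) = x² − 4x + 3 and det(xI − Â) = x(x² − 4x + 3), so both matrices have the non-zero eigenvalues 1 and 3, and both edge shifts have entropy log 3.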

In order to say what can be proved with shift equivalence, we define SFTs $(X_A, \sigma)$ and $(X_{\hat A}, \sigma)$ to be eventually conjugate if $(X_A, \sigma^n)$ and $(X_{\hat A}, \sigma^n)$ are conjugate for all sufficiently large $n$. Then, see [398, Theorem 7.5.15]:
Theorem 3.28. Two SFTs (XA , σ) and (XÂ , σ) are eventually conjugate if
and only if A and  are shift equivalent.
There remain many open (classification) problems for SFTs, as well as for sofic and other subshifts. The survey of Boyle [106] contains a long list of open problems, many of which are still open today.

3.2. Sofic Shifts

Sofic shifts are shifts that can be described by finite edge-labeled (rather than vertex-labeled, as needed for SFTs) transition graphs. The word sofic was coined by Benjy Weiss; it comes from the Hebrew word for “finite”. Much of this section can be found in concise form in [364, Section 6.1].

² Kim & Roush [361] gave a negative answer, but only for reducible matrices.
62 3. Subshifts of Positive Entropy

Definition 3.29. A subshift $(X, \sigma)$ is called sofic if it is the space of (label sequences of) paths in an edge-labeled graph. Unlike in vertex-labeling, in this edge-labeling more than one edge is allowed to carry the same symbol.

Lemma 3.30. Every SFT is sofic.

Proof. Assume that the SFT has memory $M$. Let $G$ be the vertex-labeled $M$-block transition graph of the SFT; i.e. each $a_1 \cdots a_M \in \mathcal{L}_M(X)$ is the label of a unique vertex. We have an edge $a_1 \cdots a_M \to b_1 \cdots b_M$ if and only if $a_2 \cdots a_M = b_1 \cdots b_{M-1}$ and $a_1 \cdots a_M b_M = a_1 b_1 \cdots b_M \in \mathcal{L}_{M+1}(X)$, and then this $(M+1)$-word is also the label of the edge. Since each infinite vertex-labeled path is in one-to-one correspondence with an infinite edge-labeled path and also in one-to-one correspondence with an infinite word in $X$, we have represented $X$ as a sofic shift. ∎
Remark 3.31. Not every sofic shift is an SFT. For example the even shift
(Example 1.4) has an infinite collection of forbidden words, but it cannot be
described by a finite collection of forbidden words. Sofic shifts that are not
of finite type are called strictly sofic.
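The infinitude of the forbidden words can be seen by brute force. The sketch below assumes the convention that in the even shift every block of 1's between two 0's has even length (the convention behind the follower set F(01) in Example 3.35); the function names are hypothetical. It lists the minimal forbidden words up to a given length.

```python
from itertools import product

def in_even_shift(w):
    """Assumed convention: every block of 1s enclosed by 0s has even length."""
    return all(len(block) % 2 == 0 for block in w.split("0")[1:-1])

def minimal_forbidden_words(max_len):
    """Forbidden words of length <= max_len whose proper subwords are all allowed."""
    result = []
    for n in range(1, max_len + 1):
        for letters in product("01", repeat=n):
            w = "".join(letters)
            # The language is factorial, so checking w[1:] and w[:-1] suffices.
            if not in_even_shift(w) and in_even_shift(w[1:]) and in_even_shift(w[:-1]):
                result.append(w)
    return result
```

Up to length 9 this yields 010, 01110, 0111110 and 011111110: one minimal forbidden word $01^{2k+1}0$ for each odd number of 1's, so no finite list of forbidden words can describe the even shift.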

The following theorem shows that we can equally define the sofic subshifts
as those that are a factor of a subshift of finite type.

Theorem 3.32. A subshift $X$ is generated by an edge-labeled graph if and only if it is the factor of an SFT.

Proof. ⇒: Let $G$ be the edge-labeled graph of $X$, with edges labeled in alphabet $\mathcal{A}$. Relabel $G$ in a new alphabet $\mathcal{A}'$ such that every edge has a distinct label, and call the new edge-labeled graph $G'$. Due to the injective edge-labeling, the edge-subshift $X'$ generated by $G'$ is isomorphic to an SFT. For this, we can take the dual graph in which the edges of $G'$ are the vertices, and $a \to b$ if and only if $a$ labels the incoming edge and $b$ the outgoing edge of the same vertex in $G'$. Now $\pi : X' \to X$ is given by $\pi(x)_i = a$ if $a$ is the label in $G$ of the same edge that is labeled $x_i$ in $G'$. This $\pi$ is clearly a surjective sliding block code, so by Theorem 1.23, $\pi$ is continuous and commutes with the shift.
⇐: If $X$ is a factor of an SFT $Y$, then the factor map is a sliding block code by Theorem 1.23, say of window size $2N+1$: $\pi(x)_i = f(x_{i-N}, \dots, x_{i+N})$. Represent $Y$ by an edge-labeled graph $G'$ where the labels are the $(2N+1)$-words $w \in \mathcal{L}_{2N+1}(Y)$. These are all distinct. The factor map turns $G'$ into an edge-labeled graph $G$ with labels $f(w)$. Therefore $X$ is sofic. ∎

Corollary 3.33. Every factor of a sofic shift is again a sofic shift. Every
shift conjugate to a sofic shift is again sofic.
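The implication ⇒ in the proof of Theorem 3.32 can be illustrated on the even shift. The sketch below assumes the standard two-vertex presentation of the even shift (the edge names e, f, g are hypothetical). It builds the words of the edge shift X′ in which every edge carries its own symbol, applies the 1-block code π that replaces each edge name by its label in G, and compares the result with the words of the even shift.

```python
from itertools import product

# Assumed presentation of the even shift: vertex 0 carries a loop labeled 0 and
# an edge to vertex 1 labeled 1; vertex 1 returns to vertex 0 with label 1.
# The edge names "e", "f", "g" play the role of the injective labels of G'.
EDGES = {"e": (0, 0, "0"), "f": (0, 1, "1"), "g": (1, 0, "1")}

def edge_shift_words(n):
    """Length-n words of the edge shift X' generated by G'."""
    words = [[name] for name in EDGES]
    for _ in range(n - 1):
        words = [w + [name] for w in words for name in EDGES
                 if EDGES[w[-1]][1] == EDGES[name][0]]  # target matches source
    return words

def factor_image(n):
    """Image of L_n(X') under the 1-block code pi: edge name -> label in G."""
    return {"".join(EDGES[name][2] for name in w) for w in edge_shift_words(n)}

def even_shift_words(n):
    """Words of length n in which every block of 1s enclosed by 0s is even."""
    words = ("".join(p) for p in product("01", repeat=n))
    return {w for w in words
            if all(len(b) % 2 == 0 for b in w.split("0")[1:-1])}
```

The edge shift X′ is an SFT (consecutive edges must match at a vertex), while its image is the strictly sofic even shift.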

Moreover, a sofic shift with an irreducible transition matrix is always transitive, has a dense set of periodic points, and is mixing if and only if it is totally transitive; see [47, Theorem 3.3].

3.2.1. Follower sets. A further characterization of sofic shifts relies on the following notion.
Definition 3.34. Given a subshift X and a word v ∈ L(X), the follower
set F (v) is the collection of words w ∈ L(X) such that vw ∈ L(X).
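Example 3.35. Let $X_{even}$ be the even shift from Example 1.4, in which every block of 1's between two 0's has even length. Then $F(0)$ consists of the words $w \in \mathcal{L}(X_{even})$ whose initial block of 1's, if it is followed by a 0, has even length; for instance $10 \notin F(0)$ because $010 \notin \mathcal{L}(X_{even})$. Also $F(011) = F(0)$, but $F(01) = 1F(0) = \{1w : w \in F(0)\}$. Although each follower set is infinite, there are only these two distinct follower sets among the words containing a 0. Indeed, $F(v0) = F(0)$ for every $v \in \mathcal{L}(X)$, and $F(v0111) = F(v01)$, $F(v01111) = F(v011)$, etc. For the words $v = 1^k$ without a 0 we get $F(1^k) = \mathcal{L}(X_{even})$; this follower set is of a different nature, but we can ignore it.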

The following theorem, appearing in [556], is in fact a consequence of the Myhill-Nerode Theorem [428, 430].
Theorem 3.36. A subshift (X, σ) is sofic if and only if the collection of its
follower sets is finite.

Proof. First assume that the collection $V = \{F(v) : v \in \mathcal{L}(X)\}$ is finite. We will build an edge-labeled graph representation $G$ of $X$ as follows:
(1) Let $V$ be the vertices of $G$.
(2) If $a \in \mathcal{A}$ and $w \in \mathcal{L}(X)$ are such that $wa \in \mathcal{L}(X)$, then $F(wa) \in V$; draw an edge $F(w) \to F(wa)$, and label it with the symbol $a$. (Although there are infinitely many $w \in \mathcal{L}(X)$, there are only finitely many follower sets, and we need not repeat arrows between the same vertices with the same label.)
The resulting edge-labeled graph $G$ represents $X$.
Conversely, assume that $X$ is sofic, with edge-labeled graph representation $G$. For every $w \in \mathcal{L}(X)$, consider the collection of paths in $G$ representing $w$, and let $T(w)$ be the collection of terminal vertices of these paths. Then $F(w)$ is the collection of labels of paths starting at a vertex in $T(w)$, so $F(w)$ is determined by $T(w)$. Since $G$ is finite and there are only finitely many subsets of its vertex set, the collection of follower sets is finite. ∎
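The first half of this proof can be explored numerically for the even shift. The sketch below assumes the convention that blocks of 1's between 0's have even length; follower sets are truncated to words of length at most k (a hypothetical parameter), which here is enough to tell them apart. It finds three follower sets, namely F(0), F(01), and the follower set of the words 1^j without a 0 (the one set aside in Example 3.35), and checks that the edges F(v) → F(va) form a right-resolving graph.

```python
from itertools import product

def in_even_shift(w):
    """Assumed convention: every block of 1s enclosed by 0s has even length."""
    return all(len(block) % 2 == 0 for block in w.split("0")[1:-1])

def language(max_len):
    """Non-empty words of length <= max_len in L(X_even)."""
    words = ("".join(p) for n in range(1, max_len + 1)
             for p in product("01", repeat=n))
    return [w for w in words if in_even_shift(w)]

def follower_set(v, k):
    """F(v), truncated to words of length <= k."""
    return frozenset(w for w in language(k) if in_even_shift(v + w))

def follower_graph(m, k):
    """Vertices F(v) for |v| <= m and labeled edges F(v) -a-> F(va)."""
    vertices = {follower_set(v, k) for v in language(m)}
    edges = {(follower_set(v, k), a, follower_set(v + a, k))
             for v in language(m - 1) for a in "01" if in_even_shift(v + a)}
    return vertices, edges

vertices, edges = follower_graph(6, 6)
targets = {}
for source, label, target in edges:
    targets.setdefault((source, label), set()).add(target)
right_resolving = all(len(t) == 1 for t in targets.values())
```

Restricted to words containing a 0, only the two vertices F(0) and F(01) remain, with the three edges F(0) →0 F(0), F(0) →1 F(01) and F(01) →1 F(0).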
Definition 3.37. An edge-labeled transition graph $G$ is right-resolving if for each vertex $v \in G$, the outgoing arrows all have different labels. It is called follower-separated if for each vertex $v \in G$, the follower set (i.e. the set of labeled words associated to paths starting in $v$) is different from the follower set of every other vertex.

Every sofic shift has a right-resolving follower-separated graph representation, and (at least for irreducible sofic shifts) if we minimize the number of vertices in such a graph, there is, up to isomorphism, only one such graph with these properties. In fact, the follower set representation $G$ constructed in the first half of the proof of Theorem 3.36 is right-resolving, follower-separated, and of smallest size. The latter two properties follow by the choice of $V$. To see the former, assume that $v \in V$ and that $v \to w$ and $v \to w'$ have the same label $a$. This implies that
$$F(w) = \{x : ax \in F(v)\} = F(w'),$$
so $w = w'$.
Corollary 3.38. Every transitive sofic shift $X$ is synchronized and (unless it is a single periodic orbit) has positive entropy. In fact, $h_{top}(X) = \log \lambda_A$, where $\lambda_A$ is the leading eigenvalue of the transition matrix $A$ of the minimal right-resolving representation of $X$.

Proof. Let the edge-labeled graph $G$ be the right-resolving follower-separated representation of $X$. Pick any word $u \in \mathcal{L}(X)$ and let $T(u)$ be the collection of terminal vertices of paths in $G$ representing $u$. If $T(u)$ consists of one vertex $v \in V$, then every path representing $u$ ends in $v$, and the follower set $F(u)$ is the collection of words representing paths starting in $v$. In particular, $u$ is a synchronizing word.
If $\#T(u) > 1$, then we show that we can extend $u$ to the right so that it becomes a synchronizing word. Suppose that $v \ne v' \in T(u)$. Since $G$ is follower-separated, there is $u_1 \in \mathcal{L}(X)$ such that $u_1 \in F(v)$ but $u_1 \notin F(v')$ (or vice versa; the argument is the same). Extend $u$ to $uu_1$. Because $G$ is right-resolving, $u_1$ can only represent a single path starting at any single vertex. Therefore $\#T(uu_1) \le \#T(u)$. But since $u_1 \notin F(v')$, we have in fact $\#T(uu_1) < \#T(u)$. Continue this way, extending $uu_1$ until eventually $\#T(uu_1 \cdots u_N) = 1$. Then $uu_1 \cdots u_N$ is synchronizing. (In fact, what we proved here is that every $u \in \mathcal{L}(X)$ can be extended on the right to a synchronizing word.)
The positive entropy follows from Theorem 1.20 or Corollary 3.47. In fact, since $G$ is right-resolving, each word in $\mathcal{L}_n(X)$ is the label of at least one and at most $\#V$ paths of length $n$ in $G$ (at most one for each starting vertex). Therefore $p_X(n) \le \#\{n\text{-paths}\} \le \#V \cdot p_X(n)$, and we can use Theorem 3.14. ∎
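A sketch for the even shift, assuming that its minimal right-resolving presentation is the follower-set graph on F(0) and F(01), with adjacency matrix A = ((1, 1), (1, 0)). It checks the inequality p_X(n) ≤ #{n-paths} ≤ #V · p_X(n) for small n, and that the path counts grow at the rate λ_A = (1+√5)/2, so that h_top(X_even) = log((1+√5)/2). The helper names are hypothetical.

```python
from itertools import product

def in_even_shift(w):
    """Assumed convention: every block of 1s enclosed by 0s has even length."""
    return all(len(block) % 2 == 0 for block in w.split("0")[1:-1])

def word_count(n):
    """p_X(n) for the even shift, by brute force."""
    return sum(in_even_shift("".join(p)) for p in product("01", repeat=n))

# Right-resolving follower-set presentation of the even shift:
# vertex 0 = F(0) (loop labeled 0, edge labeled 1 to vertex 1),
# vertex 1 = F(01) (edge labeled 1 back to vertex 0).
A = [[1, 1], [1, 0]]

def path_count(n):
    """Number of paths with n edges in the graph with adjacency matrix A."""
    ends = [1, 1]  # paths of length 0 ending at each vertex
    for _ in range(n):
        ends = [sum(ends[i] * A[i][j] for i in range(2)) for j in range(2)]
    return sum(ends)

golden_mean = (1 + 5 ** 0.5) / 2
```

The path counts are Fibonacci numbers, so their ratio converges to the leading eigenvalue of A very quickly.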
Remark 3.39. Irreducible sofic shifts are intrinsically ergodic; see [556] and
Theorem 3.48.

3.3. Coded Subshifts

Rather than forbid words to appear, as one does in SFTs, we can prescribe
which words need to be used, and then these words can be concatenated
freely. This type of subshift was first described by Blanchard & Hansel [84].

Definition 3.40. A coded subshift $(X_C, \sigma)$ is the closure of the collection of free concatenations of words from a finite or countable collection $\mathcal{C}$.

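Of course, this does not mean that concatenations of words in $\mathcal{C}$ are the only words in the language $\mathcal{L}(X)$. For example, if $\mathcal{C} = \{10, 01\}$, then $00 \in \mathcal{L}(X) \setminus \mathcal{C}^*$, since it occurs in $10 \cdot 01$.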
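Definition 3.40 can be made concrete for a finite code. The sketch below (the function name is hypothetical) lists the words of a given length in L(X_C) by generating concatenations of code words and reading off their subwords. For C = {0, 01} it recovers exactly the words without 11, and for C = {10, 01} it confirms that 00 ∈ L(X).

```python
def coded_language(codes, n):
    """Words of length n in L(X_C) for a finite code C of non-empty words.

    Any such word lies inside a finite concatenation; keeping only the code
    words that overlap it leaves a string of length < n + 2 * max|c|.
    """
    max_len = n + 2 * max(len(c) for c in codes)
    words, frontier = set(), {""}
    while frontier:
        extended = set()
        for s in frontier:
            # record every length-n subword of the current concatenation
            words.update(s[i:i + n] for i in range(len(s) - n + 1))
            extended.update(s + c for c in codes if len(s) + len(c) <= max_len)
        frontier = extended
    return words
```

The padding bound relies on C being finite; for a countable code one would have to truncate the list of code words.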

Proposition 3.41. Every transitive SFT is a coded shift.

For example, the Fibonacci SFT of Example 1.3 and the even shift of Example 1.4 are both coded subshifts, with sets of code words $\mathcal{C} = \{0, 01\}$ and $\mathcal{C} = \{0, 11\}$, respectively. On the other hand, the SFT $(X_A, \sigma)$ on the alphabet $\{0, 1\}$ with transition matrix $A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$ is not transitive, and it is also not a coded shift, because no code word containing 01 can ever be used twice in a concatenation.

Proof. Rewrite the SFT as an SFT with memory $M = 1$; i.e. all forbidden words have length $\le 2$. Let $G$ be the transition graph; since the SFT is transitive, $G$ is strongly connected. Fix vertices $a, b$ such that the arrow $a \to b$ occurs in $G$. Now let $\mathcal{C}$ contain the labels of all finite paths $b \to \cdots \to a$; since $a \to b$ is allowed, these can be freely concatenated. ∎

Remark 3.42. Naturally, the set C of codes may not be the most economical,
but the idea of the proof of Proposition 3.41 is quite general. It can also be
used to show that sofic and synchronized subshifts are coded. Therefore we
have the inclusion:

SFTs ⊂ sofic shifts ⊂ synchronized subshifts ⊂ coded subshifts.

All these inclusions are strict. For instance, Dyck shifts are coded but not synchronized; see Section 3.10. Coded shifts are always transitive, but not always totally transitive; indeed, if the lengths of all code words are multiples of $N \ge 2$, then $\sigma^N$ can easily be non-transitive (but not necessarily; see [180, Theorem 4.1]). Totally transitive coded subshifts are always weakly mixing (since they have a dense set of periodic orbits, see [325, Corollary 3.6]), and also topologically mixing, see [180, Theorem 2.2]. Thus for coded systems, these three notions coincide.

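It is useful to make some distinction between sequences that are the concatenations of “short” words:
$$V_C = \{x \in \mathcal{A}^{\mathbb{Z}} : \exists\, (s_k)_{k \in \mathbb{Z}} \subset \mathbb{Z} \text{ such that } x_{s_k} \cdots x_{s_{k+1}-1} \in \mathcal{C} \text{ for all } k\}, \tag{3.4}$$
and sequences for which every finite subword appears as a subword of “long” words in $\mathcal{C}$:
$$U_C = \{x \in \mathcal{A}^{\mathbb{Z}} : \forall\, k \in \mathbb{N},\ x_{-k} \cdots x_k \text{ is a subword of some word in } \mathcal{C}\}. \tag{3.5}$$
We have $X_C = \overline{V_C} \supset U_C$.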
Example 3.43. The odd shift $X_{odd}$ (recall from Example 1.4 that in this subshift, blocks of 0's have odd lengths) is a coded shift with collection of code words
$$\mathcal{C} = \{1, 10, 1000, 100000, \dots, 10^{2n-1}, \dots\}.$$
The sequence $\cdots 010101010 \cdots$ belongs to $V_C$ but not to $U_C$. On the other hand, $\cdots 000000 \cdots$ belongs to $U_C$ but not to $V_C$. The sequence $\cdots 0001000 \cdots$ belongs to neither, but lies in the closure $\overline{V_C}$ (but not in $U_C$; in fact $U_C = \overline{U_C} = \{0^\infty\}$).

One can view coded shifts by means of (infinite) edge-labeled transition graphs $G_C$, with a central vertex $v_0$ from which $q_\ell$ loops of length $\ell$ emerge, where
$$q_\ell = \#\mathcal{C}_\ell \quad\text{for}\quad \mathcal{C}_\ell := \{C \in \mathcal{C} : |C| = \ell\}.$$
The theory of infinite Markov graphs, as summarized in Section 8.7, should then be applicable. In particular, $\limsup_\ell \frac{1}{\ell} \log q_\ell \le h_G(G_C)$, where $h_G$ denotes the Gurevich entropy; see Definition 8.63. According to Theorem 8.73, unless $G_C$ is transient, the topological entropy ought to be the solution $h$ of
$$F(h) := \sum_{\ell \ge 1} q_\ell e^{-\ell h} = 1. \tag{3.6}$$

This is indeed true in many cases, see e.g. Example 3.43 (and Exercise 3.115) and Example 3.49 below, but there are two problems.
First, several paths in $G_C$ can code the same point of $X_C$, leading to an overestimate of the entropy. We call $x \in X_C$ recognizable or uniquely decipherable if the sequence $(s_k)$ in (3.4) is unique. The collection of code words $\mathcal{C}$ has the unique decomposition property if every finite word $w \in \mathcal{L}(X_C)$ can be decomposed in at most one way into words of $\mathcal{C}$.
Example 3.44. Let $\mathcal{C} = \{0, 10, 100\}$; then $X_C$ is the Fibonacci SFT, but clearly the word 100 is superfluous here, since it is the concatenation of the first two. Thus $X_C$ is neither uniquely decipherable nor has the unique decomposition property. The entropy is not the logarithm of the real root $\approx 1.8393$ of $x^3 = x^2 + x + 1$, as (3.6) with $q_1 = q_2 = q_3 = 1$ would suggest, but truly the logarithm of the golden mean.

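Let $\mathcal{C} = \{1010, 0100\}$. Then $X_C$ does not have the unique decomposition property, because
$$010010 = 0100 \cdot 10 = 010 \cdot 010,$$
where $10$ is a prefix of $1010$, and in the second decomposition the first $010$ is a suffix of $1010$ and the second a prefix of $0100$. However, if this word is extended by one symbol (either on the left or on the right), then the decomposition is unique. Therefore $X_C$ is uniquely decipherable.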
Let $\mathcal{C} = \{10, 00, 01\}$. In this case, every word containing 11 is uniquely decipherable, and all other words can be deciphered in exactly two ways; e.g.
$$010010 = 01 \cdot 00 \cdot 10 = 0 \cdot 10 \cdot 01 \cdot 0,$$
where in the second decomposition the outer symbols are a suffix and a prefix of code words. Formula (3.6) suggests that the topological entropy is $h_{top}(\sigma) = \frac12 \log 3$, and this is indeed true.

[Figure 3.5. The edge-labeled transition graphs of $X_C$ and $X_{\tilde C}$, each with a central vertex $v_0$.]

We see this by considering $X_{\tilde C}$ for $\tilde{\mathcal{C}} = \{01, 23, 45\}$. These have isomorphic transition graphs (with isomorphic path spaces), see Figure 3.5, but the latter is clearly uniquely decipherable with entropy $\frac12 \log 3$. Now $(X_C, \sigma)$ is a factor of $(X_{\tilde C}, \sigma)$ via the sliding block code $\pi : 0 \mapsto 0, 1 \mapsto 1, 2 \mapsto 1, 3 \mapsto 0, 4 \mapsto 0, 5 \mapsto 0$, and this factor map is at most 2-to-1 and hence does not decrease entropy, so $h_{top}(X_C, \sigma) = \frac12 \log 3$.
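Formula (3.6) is easy to evaluate for a finite code. The Python sketch below (the function name and the bisection bracket are illustrative choices) solves F(h) = Σ q_ℓ e^{−ℓh} = 1 given the list of code word lengths. For C = {10, 00, 01} it returns ½ log 3, in line with the computation above, while for C = {0, 10, 100} it returns the logarithm of the real root ≈ 1.8393 of x³ = x² + x + 1, which exceeds the true entropy log((1+√5)/2) of the Fibonacci SFT.

```python
import math

def entropy_from_code_lengths(lengths, lo=0.0, hi=10.0, tol=1e-12):
    """Solve F(h) = sum_l q_l exp(-l h) = 1, i.e. formula (3.6), by bisection.

    `lengths` contains |c| for each code word c, so q_l = lengths.count(l).
    F is strictly decreasing in h; we assume F(lo) > 1 > F(hi).
    """
    def F(h):
        return sum(math.exp(-l * h) for l in lengths)

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) > 1:
            lo = mid      # the root lies to the right of mid
        else:
            hi = mid
    return (lo + hi) / 2

h_fib = entropy_from_code_lengths([1, 2])         # C = {0, 01}
h_three = entropy_from_code_lengths([2, 2, 2])    # C = {10, 00, 01}
h_over = entropy_from_code_lengths([1, 2, 3])     # C = {0, 10, 100}
```

Only the lengths enter (3.6), which is exactly why it cannot detect the redundancy of 100 = 10 · 0.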

The second problem is that there may not be a good correspondence between the number of loops of length $\ell$ and the number of subwords of length $\ell$. The solution of (3.6) can then underestimate the true value of the entropy, and indeed $h_G(G) \le h_{top}(X_C)$. A crude example of this is
$$\mathcal{C} = \{01,\ 00011011,\ 000001010011100101110111,\ \dots\};$$
i.e. the $n$-th code word is the concatenation, in lexicographic order, of all words in $\{0,1\}^*$ of length $n$. Then $q_\ell = 1$ if $\ell = n2^n$ and $q_\ell = 0$ otherwise. Since every word appears in $X_C$, the true entropy is $h_{top}(X_C) = \log 2$, but (3.6) yields
$$e^{-2h} + e^{-8h} + e^{-24h} + e^{-64h} + \cdots = 1, \quad\text{which gives } h = \log 1.1809\ldots < \log 2.$$
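The numbers in this example can be reproduced directly. The sketch below (names hypothetical; the series is truncated after n_max terms, which is harmless here) builds the code words, confirms that the n-th one has length n2^n (so 2, 8, 24, 64, …), and solves (3.6) numerically, giving e^h ≈ 1.1809, well below the true value 2.

```python
import math
from itertools import product

def nth_code_word(n):
    """Concatenation of all words in {0, 1}^n, in lexicographic order."""
    return "".join("".join(p) for p in product("01", repeat=n))

def F(h, n_max=12):
    """F(h) = sum_n exp(-n 2^n h): one code word of length n 2^n for each n,
    truncated at n_max (the omitted terms are negligible near the root)."""
    return sum(math.exp(-n * 2 ** n * h) for n in range(1, n_max + 1))

def solve_F_equals_one(lo=1e-9, hi=1.0, tol=1e-12):
    """Bisection for F(h) = 1; F is strictly decreasing in h."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

h_formula = solve_F_equals_one()
```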

Hence, knowing the numbers $q_\ell$ of length-$\ell$ code words is insufficient to decide on the entropy. Pavlov [450] suggests using the collection $W_n$ of $n$-subwords inside code words instead. The exponential growth rate of their number is $\lim_n \frac1n \log \#W_n = h(U_C)$.

Theorem 3.45 ([450, Theorems 1.7 and 1.8]). Recall from (3.6) that $F(h) = \sum_\ell q_\ell e^{-\ell h}$.
(i) If $h > h(U_C)$ and $F(h) < 1$, then $h_{top}(X_C) \le h$.
(ii) Conversely, if $F(h) > 1$ and $\mathcal{C}$ has the unique decomposition property, then $h_{top}(X_C) > h$.

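Proof. (i) Let $\mathrm{Pre}_n$ and $\mathrm{Suf}_n$ denote the sets of length-$n$ prefixes and suffixes of code words $C \in \mathcal{C}$. Note that $\mathrm{Pre}_n \cup \mathrm{Suf}_n \subset W_n$. Every word in $\mathcal{L}(X_C)$ that is not a subword of a single code word can be written as the concatenation of one suffix, some code words, and one prefix, and therefore
$$\mathcal{L}_n(X_C) \subset W_n \cup \bigcup_{k=2}^{n} \bigcup_{\substack{n_1+\dots+n_k=n\\ n_i \ge 1}} \mathrm{Suf}_{n_1}\, \mathcal{C}_{n_2} \cdots \mathcal{C}_{n_{k-1}}\, \mathrm{Pre}_{n_k},$$
where the inner union really runs over the concatenations of all words in the indicated sets. Note that if the concatenation starts with a full code word, then this counts as a suffix, and similarly if the concatenation ends with a full code word, this counts as a prefix. Therefore it is justified to assume that $n_i \ge 1$ for each $i$. This gives
$$\#\mathcal{L}_n(X_C) \le \#W_n + \sum_{k=2}^{n} \sum_{\substack{n_1+\dots+n_k=n\\ n_i \ge 1}} \#\mathrm{Suf}_{n_1} \cdot q_{n_2} \cdots q_{n_{k-1}} \cdot \#\mathrm{Pre}_{n_k}.$$
Since $\lim_n \frac1n \log \#W_n = h(U_C)$, our assumption $h > h(U_C)$ implies that there is a constant $K$ such that
$$\max\{\#\mathrm{Pre}_n, \#\mathrm{Suf}_n\} \le \#W_n \le K e^{nh} \quad\text{for all } n.$$
Therefore, setting $m = n_1 + n_k$ (for which there are at most $n+1$ splittings into $n_1$ and $n_k$) and using $\sum_{j=2}^{k-1} n_j = n - m$,
$$\#\mathcal{L}_n(X_C) \le K e^{nh} + \sum_{k=2}^{n} \sum_{\substack{n_1+\dots+n_k=n\\ n_i \ge 1}} K^2 e^{(n_1+n_k)h} \prod_{j=2}^{k-1} q_{n_j} \le e^{nh}\Big( K + (n+1)K^2 \sum_{m=0}^{n} \sum_{k=2}^{n} \sum_{\substack{n_2+\dots+n_{k-1}=n-m\\ n_i \ge 1}} \prod_{j=2}^{k-1} q_{n_j} e^{-n_j h} \Big),$$
where the empty product counts as 1. All the terms in the last sum are part of the expansion of $F(h)^{k-2} = \big(\sum_{j=1}^\infty q_j e^{-jh}\big)^{k-2}$. By the assumption that $F(h) < 1$, we obtain
$$\#\mathcal{L}_n(X_C) \le e^{nh}\Big(K + (n+1)K^2 \sum_{k \ge 2} F(h)^{k-2}\Big) \le e^{nh}\Big(K + \frac{(n+1)K^2}{1 - F(h)}\Big).$$
Taking logarithms, dividing by $n$, and taking the limit $n \to \infty$ gives $h_{top}(X_C) \le h$.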

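(ii) If $F(h) > 1$, then for all $t \in \mathbb{N}$ sufficiently large, $S_t := \sum_{j=1}^t q_j e^{-jh} > 1$. For $k \in \mathbb{N}$, we have the expansion
$$S_t^k = \Big(\sum_{j=1}^t q_j e^{-jh}\Big)^k = \sum_{n=k}^{tk} e^{-nh} \sum_{\substack{n_1+\dots+n_k=n\\ 1 \le n_i \le t}} \prod_{j=1}^{k} q_{n_j}.$$
Choose $n = N_k$ such that the corresponding term of the outer sum is maximized. Obviously $k \le N_k \le tk$. Then, since the outer sum has at most $tk$ terms,
$$S_t^k \le tk\, e^{-N_k h} \sum_{\substack{n_1+\dots+n_k=N_k\\ 1 \le n_i \le t}} \prod_{j=1}^{k} q_{n_j}.$$
For every choice $n_1, \dots, n_k$ with $\sum_{i=1}^k n_i = N_k$, the concatenation of words from $\mathcal{C}_{n_1}, \dots, \mathcal{C}_{n_k}$ belongs to $\mathcal{L}(X_C)$. Also, by the unique decomposition property, every different choice of such concatenation gives a different word in $\mathcal{L}(X_C)$. Therefore, using $S_t > 1$ and $k \ge N_k/t$,
$$\#\mathcal{L}_{N_k}(X_C) \ge \sum_{\substack{n_1+\dots+n_k=N_k\\ 1 \le n_i \le t}} \prod_{j=1}^{k} q_{n_j} \ge \frac{e^{N_k h} S_t^k}{tk} \ge \frac{e^{N_k h} S_t^{N_k/t}}{tk}.$$
Next take logarithms, divide by $N_k$, and let $k \to \infty$ (so that $N_k \to \infty$ and $\frac{1}{N_k}\log(tk) \to 0$) to obtain $h_{top}(X_C) \ge h + \frac1t \log S_t$. Since $S_t > 1$ for all sufficiently large $t$, we get $h_{top}(X_C) > h$ as required. ∎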

We can now state the consequence for the entropy of coded shifts, paraphrasing results of Pavlov [450, Theorems 1.1–1.3].
Corollary 3.46. Let $h(U_C) = \lim_n \frac1n \log p_{U_C}(n)$ be the exponential growth rate of words in $U_C$, and recall the function $F(h) = \sum_{\ell \ge 1} q_\ell e^{-\ell h}$ from (3.6).
(a) Assume that $X_C$ has the unique decomposition property. If $F(h(U_C)) \ge 1$, then $F(h_{top}(X_C, \sigma)) = 1$. In fact, $h = h_{top}(X_C)$ is the only solution of $F(h) = 1$.
(b) If $F(h(U_C)) < 1$, then $h_{top}(X_C, \sigma) = h(U_C)$.
Also $h_{top}(X_C, \sigma) = h(U_C)$ if and only if $F(h(U_C)) \le 1$.

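Proof. The map $h \mapsto F(h)$ has a critical value $h_c$ such that $F(h) = \infty$ for $h < h_c$, and $F(h) < \infty$ is strictly decreasing for $h > h_c$. At $h = h_c$, $F(h)$ can be finite or infinite.
(a) If $1 < F(h(U_C)) < \infty$, then $h_c \le h(U_C)$, and since $F$ is continuous and strictly decreasing on $[h(U_C), \infty)$ with $F(h) \to 0$ as $h \to \infty$, there is a unique $h_1 > h(U_C)$ such that $F(h_1) = 1$. Applying Theorem 3.45(ii) to $h \in [h(U_C), h_1)$ and Theorem 3.45(i) to $h > h_1$ gives $h_{top}(X_C) = h_1$.
(b) If $F(h(U_C)) < 1$, then also $F(h(U_C) + \varepsilon) < 1$, so by Theorem 3.45(i) we have $h_{top}(X_C) \le h(U_C) + \varepsilon$ for every $\varepsilon > 0$. Since $X_C \supset U_C$, we have $h_{top}(X_C) \ge h(U_C)$, so $h_{top}(X_C) = h(U_C)$ follows.
Combining (a) and (b) shows that $h_{top}(X_C) = h(U_C)$ if and only if $F(h(U_C)) \le 1$. ∎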
Corollary 3.47. Every non-periodic coded shift $(X_C, \sigma)$ has positive entropy.
Proof. If $\mathcal{C}$ is a single word, then $X_C$ is periodic. Otherwise, let $C, C' \in \mathcal{C}$ be the two shortest words in $\mathcal{C}$. Then by Theorem 8.73, the entropy satisfies $h_{top}(X_C, \sigma) \ge \log x$, where $x$ is the largest solution to the equation $x^{-|C|} + x^{-|C'|} = 1$. Clearly $x > 1$. ∎

The classification also has an analogue for the intrinsic ergodicity of coded shifts. This was studied in several papers by Climenhaga, Thompson, and Pavlov; see [159, Theorem B] and [450]. For countable directed graphs, intrinsic ergodicity is equivalent to positive recurrence; see Theorem 8.68. The results for coded shifts are parallel, except that $h(U_C)$ plays the role of $\limsup_\ell \frac1\ell \log q_\ell$ in the case that the graph $G$ is formed by a single vertex $v_0$ from which $q_\ell$ loops of length $\ell$ emerge.
That is, if $F(h(U_C)) > 1$, then there is a unique measure of maximal entropy $\mu$, and $\mathrm{supp}(\mu) = X_C$. If $F(h(U_C)) < 1$, then every measure of maximal entropy $\mu$ (if there is any) satisfies $\mu(U_C) = 1$. The case $F(h(U_C)) = 1$ is a mixture of the two: there may be one or multiple measures of maximal entropy.
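Theorem 3.48. Let $(X, \sigma)$ be a coded shift with $q_\ell = \#\{c \in \mathcal{C} : |c| = \ell\}$.
(1) If $\limsup_\ell \frac1\ell \log q_\ell < h_{top}(X, \sigma)$, then $(X, \sigma)$ is intrinsically ergodic.
(2) If $\limsup_\ell \frac1\ell \log q_\ell = 0$, then every factor of $(X, \sigma)$ is intrinsically ergodic.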
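Example 3.49. The next example, based on [450, Examples 5.3 and 5.4], shows that in certain cases the theory of countable directed graphs does apply to coded shifts. Important for this seems to be that
$$q_\ell \approx \#\{\text{subwords of length } \ell \text{ of the code words from } \mathcal{C}\}.$$
For the alphabet $\mathcal{A} = \{0, 1, \dots, d\}$ and some function $\tau : \mathbb{N} \to \mathbb{N}$, take the set of code words
$$\mathcal{C} = \{a_1 a_2 \cdots a_n 0^{\tau(n)} : a_i \in \{1, \dots, d\},\ n \ge 2\}.$$
Hence $U_C$ contains $\{1, \dots, d\}^{\mathbb{Z}}$, and since every subword of a code word is a word in $\{1, \dots, d\}$ followed by a block of 0's, $h(U_C) = \log d$. Moreover,
$$F(h(U_C)) = \sum_{\ell=1}^\infty q_\ell d^{-\ell} = \sum_{n=2}^\infty d^n \cdot e^{-(n+\tau(n))\log d} = \sum_{n \ge 2} d^{-\tau(n)}.$$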

If d = 2 and τ (n) = n, then F (h(UC )) = 1, so htop (XC ) = h(UC ) = log 2. In

fact, this is a situation (see [450, Proposition 5.1]) where one can equally well
work with the transition graph G and the Gurevich entropy hG (G) = log 2.
Since also
$$-F'(h)\big|_{h = h_G(G)} = \log d \sum_{n \ge 2} (n + \tau(n))\, d^{-\tau(n)} = 2 \log 2 \sum_{n \ge 2} n 2^{-n} < \infty,$$
the graph $G$ is positively recurrent. Thus there is a unique measure of maximal entropy, and it is supported on the whole of $X_C$.
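If $d = 4$ and $\tau(n) = \lfloor \log_2 n \rfloor$, then
$$F(h(U_C)) = \sum_{n \ge 2} d^{-\tau(n)} = \sum_{n \ge 2} 4^{-\lfloor \log_2 n \rfloor} = \sum_{k=1}^\infty \sum_{n=2^k}^{2^{k+1}-1} 4^{-k} = \sum_{k=1}^\infty 2^k 4^{-k} = 1.$$
Again, $h_{top}(X_C) = h(U_C) = h_G(G)$, and
$$-F'(h)\big|_{h = h_G(G)} = \log d \sum_{n \ge 2} (n + \tau(n))\, d^{-\tau(n)} \ge \log 4 \sum_{n \ge 2} n\, 4^{-\lfloor \log_2 n \rfloor} \ge \log 4 \sum_{k=1}^\infty \sum_{n=2^k}^{2^{k+1}-1} 2^k 4^{-k} = \log 4 \sum_{k=1}^\infty 1 = \infty.$$
Therefore $G$ is null recurrent, and the measure of maximal entropy is supported on $U_C$. In fact, it is the $(\frac14, \frac14, \frac14, \frac14)$-Bernoulli measure on $\{1, 2, 3, 4\}^{\mathbb{Z}}$, giving no weight to the symbol 0.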
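For d = 4 and τ(n) = ⌊log₂ n⌋ the two series above can be checked with exact arithmetic. The sketch below (function names hypothetical) sums over n < 2^K: the partial sums of F(h(U_C)) equal 1 − 2^{1−K}, so F(h(U_C)) = 1, while each dyadic block 2^k ≤ n < 2^{k+1} contributes at least 1 to the derivative series (written here without the factor log d), which therefore diverges.

```python
from fractions import Fraction

def tau(n):
    """tau(n) = floor(log2 n), computed exactly via the bit length."""
    return n.bit_length() - 1

def F_partial(K, d=4):
    """Partial sum of F(h(U_C)) = sum_{n >= 2} d^(-tau(n)) over 2 <= n < 2^K."""
    return sum(Fraction(1, d ** tau(n)) for n in range(2, 2 ** K))

def derivative_partial(K, d=4):
    """Partial sum of sum_{n >= 2} (n + tau(n)) d^(-tau(n)), i.e. -F'(log d) / log d."""
    return sum(Fraction(n + tau(n), d ** tau(n)) for n in range(2, 2 ** K))
```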

3.4. Hereditary and Density Shifts

The natural order on the alphabet A = {0, . . . , N − 1} can be used to create
shift-invariant rules.

Definition 3.50. A collection $X \subset \mathcal{A}^{\mathbb{N}}$ or $\mathcal{A}^{\mathbb{Z}}$ is hereditary if whenever $x \in X$ and $y \le_{her} x$ (meaning that $y_n \le x_n$ for all $n$), then also $y \in X$.

Hereditary shifts first appeared in [356, page 882]. It is clear that this rule is shift-invariant, but it is not necessarily closed under taking limits. For example, the collection
$$X = \{x \in \{0,1\}^{\mathbb{N}} : x_i = 0 \text{ infinitely often}\} \tag{3.7}$$
is hereditary, but it contains the sequence $1^\infty$ in its closure. Therefore, some authors [382] make the distinction between hereditary shift and subordinate shift, the latter being hereditary and closed. We will write hereditary subshift, meaning that it is indeed closed. An SFT is hereditary if its collection of forbidden words of length $M$ is closed upwards in the partial order $\le_{her}$ on $\mathcal{A}^M$, i.e. every word that dominates a forbidden word is itself forbidden. A similar fact holds for sofic shifts.
Lemma 3.51. The hereditary closure of (i.e. smallest hereditary subshift
containing) the sofic shift (X, σ) is sofic.

Proof. Extend the edge-labeled transition graph $G$ of $X$ to $G'$ so that for each edge $v \xrightarrow{a} w$, there is also an edge $v \xrightarrow{a'} w$ for each letter $a' < a$. ∎

We will see later that β-shifts (Corollary 3.71) and spacing shifts are also hereditary. Another way to create hereditary subshifts is by stipulating an upper bound on the frequency of non-zero digits.
Definition 3.52. Let $\mathcal{A} = \{0, 1, \dots, N-1\}$ be the alphabet. The (upper) density of the subshift $X \subset \mathcal{A}^{\mathbb{N}}$ or $\mathcal{A}^{\mathbb{Z}}$ is
$$\bar d(X) = \sup\{\bar d(x) : x \in X\},$$
where $\bar d(x)$ is the upper density (see Definition 8.52) of the set of indices $j$ such that $x_j \ne 0$; i.e. $\bar d(x) = \limsup_k \frac1k \#\{0 \le j < k : x_j \ne 0\}$. Let $X_\delta := \{x \in \mathcal{A}^{\mathbb{N}} : \bar d(x) \le \delta\}$.

It is clear that $\bar d(X_\delta) = \delta$, but the example of (3.7) shows that the property that $\bar d(x) \le \delta$ for every $x \in X_\delta$ is not closed under taking limits.
Remark 3.53. Assume that a collection $X \subset X_\delta$ is shift-invariant and closed. Then it makes no difference to use Banach density (see Section 8.5) instead of density. Indeed, if there was a sequence $x \in X$ with upper Banach density $\delta$, then there is a sequence $n_k$ such that $\frac1k \#\{1 \le j \le k : x_{n_k+j} \ne 0\} \to \delta$. But then $\frac1k \#\{1 \le j \le k : \sigma^{n_k}(x)_j \ne 0\} \to \delta$, and by compactness, we can find a subsequence of $(n_k)_{k \in \mathbb{N}}$ along which $\sigma^{n_k}(x)$ converges to some $y \in X$. This $y$ has upper density $\delta$. Secondly, if we define a measure $\mu$ as an accumulation point of $\frac{1}{n_k} \sum_{j=0}^{n_k-1} \delta_{\sigma^j(y)}$, where the sequence $(n_k)_{k \in \mathbb{N}}$ is such that $\lim_k \frac{1}{n_k} \#\{0 \le j < n_k : y_j \ne 0\} = \delta$, then each $\mu$-typical point $z$ satisfies
$$\lim_n \frac1n \#\{0 \le j < n : z_j \ne 0\} = \delta.$$

The following entropy estimate is adapted from [382].


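Theorem 3.54. A hereditary subshift $(X, \sigma)$ on the alphabet $\mathcal{A} = \{0, 1, \dots, N-1\}$ has positive topological entropy if and only if $\bar d(X) > 0$. In fact³ $h_{top}(X, \sigma) \ge \bar d(X) \log 2$, and $h_{top}(X, \sigma) = 0$ if $\bar d(X) = 0$.

Proof. Let $X$ be a one-sided hereditary shift (the two-sided case goes similarly). Assume that $X$ is not a single periodic orbit, which for hereditary shifts means $X \ne \{0^\infty\}$. If $\bar d(X) > 0$, then for every $\varepsilon > 0$ there are $x \in X$ and infinitely many integers $n$ such that $\#\{1 \le i \le n : x_i \ne 0\} \ge (\bar d(X) - \varepsilon)n$. Since $X$ is hereditary, each of these non-zero symbols can independently be replaced by 0 without leaving $\mathcal{L}_n(X)$, so
$$\frac1n \log p(n) \ge \frac1n \log 2^{(\bar d(X) - \varepsilon)n} = (\bar d(X) - \varepsilon) \log 2.$$
But $\lim_n \frac1n \log p(n)$ exists according to Fekete's Lemma 1.15, and $\varepsilon > 0$ is arbitrary, so $h_{top}(\sigma) \ge \bar d(X) \log 2$. Note that if, for every $\varepsilon > 0$, $X$ contains sequences $x$ such that $\#\{1 \le i \le n : x_i = N-1\} \ge (\bar d(X) - \varepsilon)n$ for infinitely many $n$, then we find $h_{top}(\sigma) \ge \bar d(X) \log N$.
For the converse, assume that $\bar d(X) = 0$, so for every $\varepsilon \in (0, \frac12)$ there is $n_0$ such that for all $n \ge n_0$, every word in $\mathcal{L}_n(X)$ contains at most $\varepsilon n$ non-zero symbols. Since $\binom{n}{k}$ is increasing in $k \le n/2$,
$$p(n) \le \sum_{k=0}^{\lfloor \varepsilon n \rfloor} \binom{n}{k} (N-1)^k \le (\varepsilon n + 1) \binom{n}{\lfloor \varepsilon n \rfloor} (N-1)^{\varepsilon n}.$$
Using Stirling's formula⁴, $\frac1n \log \binom{n}{\lfloor \varepsilon n \rfloor} = -\varepsilon \log \varepsilon - (1-\varepsilon)\log(1-\varepsilon) + O\big(\frac{\log n}{n}\big)$, and therefore
$$\frac1n \log p(n) \le \varepsilon \log(N-1) - \varepsilon \log \varepsilon - (1-\varepsilon) \log(1-\varepsilon) + O\Big(\frac{\log n}{n}\Big).$$
Since $\varepsilon > 0$ is arbitrary and the right-hand side tends to 0 as $\varepsilon \to 0$ and $n \to \infty$, it follows that $h_{top}(\sigma) = \lim_n \frac1n \log p(n) = 0$. ∎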

The drawback of Definition 3.52 above is that the collection
$$X_\delta := \{x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : \bar d(x) \le \delta\}$$
is not closed. For instance, $x_n := 1^n 0^\infty \in X_\delta$ for all $\delta \ge 0$, but $\lim_n x_n = 1^\infty$ belongs only to $X_1$. To obtain closedness, we need to impose further conditions, of the sort that every $n$-block (for $n$ sufficiently large) contains no more than $\delta n$ non-zero symbols. The general approach that we shall present is due to Stanley [523].

³ But this is not a sharp bound; see Example 3.59.
⁴ $n! \sim n^n e^{-n} \sqrt{2\pi n}$.

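Definition 3.55. Let $\mathcal{A} = \{0, 1, \dots, N-1\}$ be the alphabet. Given a function $f : \mathbb{N} \to \mathbb{R}$, we define the density shift of $f$ as
$$X_f := \Big\{x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : \sum_{i=0}^{n-1} x_{k+i} \le f(n) \text{ for all } k \in \mathbb{N} \text{ or } \mathbb{Z} \text{ and } n \in \mathbb{N}\Big\}.$$
In particular, if $\mathcal{A} = \{0, 1\}$, then
$$X_f := \{x \in \mathcal{A}^{\mathbb{N} \text{ or } \mathbb{Z}} : |x_k \cdots x_{k+n-1}|_1 \le f(n) \text{ for all } k \in \mathbb{N} \text{ or } \mathbb{Z} \text{ and } n \in \mathbb{N}\}.$$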

Since the condition in the definition is on finite blocks, $X_f$ is closed, and $\sigma$-invariance is clear too. Therefore $X_f$ is a subshift, and it is obviously hereditary. We could define density shifts on the infinite alphabet $\mathcal{A} = \{0, 1, 2, \dots\}$, but as $f(1) < \infty$, we can use only the $\lfloor f(1) \rfloor + 1$ symbols $0, \dots, \lfloor f(1) \rfloor$ anyway.
Example 3.56. The odd shift $X_{odd}$ from Example 1.4 is not a density shift, because it is not hereditary. For example, $1011 \in \mathcal{L}(X_{odd})$ but $1001 \notin \mathcal{L}(X_{odd})$.
Definition 3.57. The canonical function $f$ of a density shift $X$ is the smallest function such that $X = X_f$, in the sense that if $X = X_{f'}$, then $f(n) \le f'(n)$ for all $n \in \mathbb{N}$.
Theorem 3.58. The canonical function f of a density shift satisfies
(1) f (N) ⊂ N;
(2) f is non-decreasing;
(3) f (m + n) ≤ f (m) + f (n) (subadditive).
Conversely, every function f satisfying (1)–(3) is the canonical function of
some density shift.
Example 3.59. If $f(n) = (n+1)/2$, then the word 11 is forbidden, but no other word is (apart from words that contain 11). Thus $X_f$ is the Fibonacci SFT, and its density is $\bar d(X_f) = 1/2$, achieved by $x = 101010\cdots$. If we set $\tilde f(n) = \lfloor (n+1)/2 \rfloor$, then we get the same shift: $X_{\tilde f} = X_f$. In fact, $\tilde f$ is the smallest function with this property. This example also shows that the lower bound of the entropy in Theorem 3.54 is not sharp, because $h_{top}(X_f, \sigma) = \log(\frac12(1+\sqrt5))$, which is larger than the $\frac12 \log 2$ given by Theorem 3.54.
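The first claims of Example 3.59 can be checked by brute force. The sketch below assumes that a finite word lies in L(X_f) exactly when all its blocks satisfy the constraint of Definition 3.55 (valid here because f is non-decreasing, so such words can be padded with 0's); the helper names are hypothetical. It confirms that the words allowed by f(n) = (n+1)/2 are exactly those without 11, and that their number grows by the factor (1+√5)/2.

```python
import math
from itertools import product

def in_density_shift(word, f):
    """True if every block of `word` of length m has digit sum at most f(m)."""
    n = len(word)
    return all(sum(word[k:k + m]) <= f(m)
               for m in range(1, n + 1) for k in range(n - m + 1))

def f(n):
    return (n + 1) / 2   # the function of Example 3.59

def allowed_words(n):
    """Words of length n satisfying the block constraints of X_f."""
    return {w for w in product((0, 1), repeat=n) if in_density_shift(w, f)}

def no_11_words(n):
    """Words of length n of the Fibonacci SFT (no two adjacent 1s)."""
    return {w for w in product((0, 1), repeat=n)
            if all(not (w[i] and w[i + 1]) for i in range(n - 1))}

golden_mean = (1 + math.sqrt(5)) / 2
```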

Proof of Theorem 3.58. For simplicity of exposition, we only consider one-sided shifts. Define the partial order $\preceq_{sum}$ on $X$ as
$$x \preceq_{sum} y \quad\text{if}\quad \sum_{i=1}^n x_i \le \sum_{i=1}^n y_i \text{ for all } n \in \mathbb{N}. \tag{3.8}$$
Let $z \in X$ be such that, inductively, for every $n \in \mathbb{N}$, $z_n \in \mathcal{A}$ is the largest symbol such that $z_1 \cdots z_n \in \mathcal{L}_n(X)$. We claim that $x \preceq_{sum} z$ for all $x \in X$.
We prove the claim by induction on the length $n$. Clearly $x_1 \le z_1$. Assume by induction that $x_1 \cdots x_n \preceq_{sum} z_1 \cdots z_n$, and let $\xi_{n+1} \in \mathcal{A}$ be maximal such that
$$x_1 \cdots x_n \xi_{n+1} \in \mathcal{L}(X). \tag{3.9}$$
Set $p = \sum_{i=1}^n z_i - \sum_{i=1}^n x_i \ge 0$. If $\xi_{n+1} \le p$, then $\sum_{i=1}^{n+1} x_i \le \sum_{i=1}^n x_i + \xi_{n+1} \le \sum_{i=1}^n z_i$, and the induction step is immediate. If $\xi_{n+1} > p$, then take $a = \xi_{n+1} - p \le N-1$. For each $1 \le r \le n$, we have
$$\begin{aligned} \sum_{i=r}^n z_i + a &= \sum_{i=1}^n z_i + a - \sum_{i=1}^{r-1} z_i \\ &\le \sum_{i=1}^n x_i + p + a - \sum_{i=1}^{r-1} x_i && \text{(by the induction hypothesis)} \\ &= \sum_{i=r}^n x_i + \xi_{n+1} && \text{(by the choice of } a\text{)}, \end{aligned}$$
which is an allowed sum in $X$ by the choice of $\xi_{n+1}$. Therefore $z_1 \cdots z_n a \in \mathcal{L}(X_f)$, and because $\sum_{i=1}^n z_i + a = \sum_{i=1}^n x_i + \xi_{n+1}$, we have
$$x_1 \cdots x_{n+1} \preceq_{sum} x_1 \cdots x_n \xi_{n+1} \preceq_{sum} z_1 \cdots z_n a \preceq_{sum} z_1 \cdots z_{n+1}.$$
This finishes the induction step. It follows that $\sigma^m(z) \preceq_{sum} z$ for all $m \ge 0$; i.e. $z$ is shift-maximal with respect to $\preceq_{sum}$. Define $f(n) = \sum_{i=1}^n z_i$. Then $f$ is integer-valued and non-decreasing. Also $f(m+n) = \sum_{i=1}^m z_i + \sum_{i=1}^n \sigma^m(z)_i \le f(m) + f(n)$. Hence (1)–(3) hold.
Conversely, suppose that $f$ satisfies (1)–(3) and set $X = X_f$. Let $z$ be the maximal sequence with respect to $\preceq_{sum}$ as before. We will prove by induction that
$$f(r) = \sum_{i=1}^r z_i \quad\text{for all } r \in \mathbb{N}. \tag{3.10}$$
This is clear for $r = 1$, so assume that (3.10) holds for all $1 \le r \le n$. Set $a = f(n+1) - f(n)$. We must show that $z_1 \cdots z_n a \in \mathcal{L}(X_f)$, and for this it suffices to show that $\sum_{i=r}^n z_i + a \le f(n - r + 2)$ for each $1 \le r \le n+1$. For $r = 1$ this holds by the choice of $a$. Otherwise
$$\begin{aligned} \sum_{i=r}^n z_i + a &= \sum_{i=1}^n z_i - \sum_{i=1}^{r-1} z_i + a \\ &= f(n) - f(r-1) + a && \text{(by the induction hypothesis)} \\ &= f(n+1) - f(r-1) && \text{(by the choice of } a\text{)} \\ &\le f(n - r + 2) && \text{(by property (3))}. \end{aligned}$$
This concludes the induction step and the entire proof. ∎
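The greedy construction of the maximal sequence z in this proof can be sketched in Python. The sketch assumes (with hypothetical helper names) that f is non-negative and non-decreasing, so that a finite word belongs to L(X_f) as soon as all its blocks obey the constraint. For f(n) = (n+1)/2 from Example 3.59 it produces z = 1010⋯, whose partial sums give the canonical function ⌊(n+1)/2⌋; feeding that function back in returns the same z, in line with (3.10).

```python
def block_sums_ok(word, f):
    """True if every block of `word` of length m has digit sum at most f(m)."""
    n = len(word)
    return all(sum(word[k:k + m]) <= f(m)
               for m in range(1, n + 1) for k in range(n - m + 1))

def maximal_sequence(f, alphabet_size, length):
    """Greedy prefix z_1 ... z_length: each z_n is the largest admissible symbol."""
    z = []
    for _ in range(length):
        for a in range(alphabet_size - 1, -1, -1):
            if block_sums_ok(z + [a], f):
                z.append(a)
                break
    return z

def canonical_function(z):
    """f(n) = z_1 + ... + z_n, as in the proof of Theorem 3.58."""
    return lambda n: sum(z[:n])
```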

By Fekete's Lemma 1.15, $\lim_n f(n)/n = \inf_n f(n)/n$, so $\inf_n f(n)/n = 0$ if and only if the density shift $(X_f, \sigma)$ has zero topological entropy, by Theorem 3.54. Without proof we state ([523, Theorem 2.10]):

Corollary 3.60. If $\sigma^m(z) \preceq_{sum} z$ for all $m \ge 0$, then $z$ is the maximal sequence of the density shift $X_f$ for $f(n) = \sum_{i=1}^n z_i$.

Theorem 3.61. Let $X_f$ be a non-trivial density shift with canonical function $f$. The following are equivalent:
(a) (Xf , σ) is topologically transitive.
(b) (Xf , σ) is topologically mixing.
(c) f is unbounded.

Proof. (a) ⇒ (b): Let $v, w \in \mathcal{L}(X_f)$ be arbitrary non-empty words. By topological transitivity, there is $u \in \mathcal{L}(X_f)$ such that $vuw \in \mathcal{L}(X_f)$ as well. But then $v0^k w \in \mathcal{L}(X_f)$ for every $k \ge |u|$, proving topological mixing.
(b) ⇒ (a): Trivial.
(a) ⇒ (c): Since $1 \in \mathcal{L}(X_f)$, topological transitivity gives a sequence $x \in X_f$ containing infinitely many 1's. Thus $f(n) \ge \sum_{i=1}^n x_i \to \infty$ as $n \to \infty$.
(c) ⇒ (a): Let $u, v \in \mathcal{L}(X_f)$ be arbitrary non-empty words. Since $f$ is unbounded, there is $n \in \mathbb{N}$ such that $f(n) \ge \sum_{i=1}^{|u|} u_i + \sum_{i=1}^{|v|} v_i$. Then $u0^n v \in \mathcal{L}(X_f)$. ∎

In particular, SFTs $(X, \sigma)$ that are also density shifts are transitive, because, unless $X = \{0^\infty\}$, there is a non-trivial word $v$ and $x \in X$ that contains $v$ infinitely often as a subword. In fact, density SFTs are completely characterized as those for which the canonical function $f$ satisfies $\inf_n f(n)/n = f(p)/p$ for some $p \in \mathbb{N}$; see [523, Theorem 4.3]. On the other hand, if $f$ is bounded, then all $x \in X_f$ end with $0^\infty$. They can be represented by a finite edge-labeled transition graph [523, Theorem 2.16] and also have a finite collection of follower sets. Hence such density shifts are non-transitive sofic shifts.
Sofic density shifts, in general, are characterized [523, Theorem 6.3] as those for which the maximal sequence $z$ is eventually periodic ($z_n = z_{n+p}$ for $n$ sufficiently large), or equivalently $f(n+p) = f(n) + k$ for $n$ sufficiently large (where $k = \sum_{i=n+1}^{n+p} z_i$, and $k > 0$ if and only if $X_f$ is transitive).

Theorem 3.62. Let $X_f$ be a non-trivial density shift with canonical function $f$. The following are equivalent:
(a) $X_f$ contains a periodic point other than $0^\infty$.
(b) There is $\lambda > 0$ such that $f(n) \ge \lambda n$ for all $n \in \mathbb{N}$.
(c) $(X_f, \sigma)$ is a coded shift.
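Proof. (a) ⇒ (b): If $0^\infty \neq x = \sigma^p(x) \in X_f$, then x contains a 1 in every block of p consecutive symbols. Shifting x if necessary so that $x_1 = 1$, we get $f(n) \ge \sum_{i=1}^{n} x_i \ge n/p$, so (b) holds for λ = 1/p.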
(b) ⇒ (c): Define $s(u) = \lceil \frac{1}{\lambda} \sum_{i=1}^{|u|} u_i \rceil$, and let
$$\mathcal{C} := \{0^{s(u)}\, u\, 0^{s(u)} : u \in L(X_f)\}$$
be the collection of code words. The “padding blocks” $0^{s(u)}$ keep the “core words” u far enough apart that the code words can be concatenated freely; see [523, Theorem 3.1] for the details. Hence we have the coded shift $X_{\mathcal{C}} \subset X_f$. On the other hand, $L(X_f) \subset L(X_{\mathcal{C}})$, so the reverse inclusion $X_f \subset X_{\mathcal{C}}$ follows.
(c) ⇒ (a): If u is a non-trivial code word, then $u^\infty \in X_f$ is a periodic point other than $0^\infty$. □

Since every infinite subshift is expansive (see below Definition 1.39), Theorems 3.61 and 3.62 give the following characterizations of chaos for density shifts.
Corollary 3.63. Let (Xf, σ) be a density shift with canonical function f.
(1) (Xf, σ) is Devaney chaotic if and only if $\inf_n f(n)/n > 0$.
(2) (Xf, σ) is Auslander-Yorke chaotic if and only if f is unbounded.
(3) (Xf, σ) is Li-Yorke chaotic if and only if f is unbounded.

3.5. β-Shifts and β-Expansions

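Throughout this section, we fix β > 1. A number x ∈ [0, 1] can be expressed as the (infinite) sum of powers of β:
$$x = \sum_{k \ge 1} b_k \beta^{-k}, \quad \text{where} \quad \begin{cases} b_k \in \{0, 1, \dots, \lfloor \beta \rfloor\} & \text{if } \beta \notin \mathbb{N},\\ b_k \in \{0, 1, \dots, \beta - 1\} & \text{if } \beta \in \{2, 3, 4, \dots\}. \end{cases}$$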
For the case β ∈ {2, 3, 4, . . . }, this is the usual β-ary expansion; it is unique except for the β-adic rationals $\{m/\beta^n : m \in \mathbb{Z},\ n \in \mathbb{N}\}$. For example, if β = 10, then 0.3 = 0.29999 . . . . If β ∉ N, then x need not have a unique β-expansion
either. As summarized in Theorem 3.67, some points have uncountably many
different expansions, but there is a canonical way to define an expansion,
called the greedy expansion:
• Take $b_1 = \lfloor \beta x \rfloor$; that is, we take $b_1$ as large as we possibly can.
• Let $x_1 = \beta x - b_1$ and $b_2 = \lfloor \beta x_1 \rfloor$; again $b_2$ is as large as possible.
• Let $x_2 = \beta x_1 - b_2$ and $b_3 = \lfloor \beta x_2 \rfloor$, etc.
78 3. Subshifts of Positive Entropy

In other words, $x_k = T_\beta^k(x)$ for the map $T_\beta : x \mapsto \beta x \bmod 1$, and $b_{k+1}$ is the integer part of $\beta x_k$.
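The greedy recursion can be run directly. The Python sketch below returns the first n greedy digits; the name `greedy_expansion` is ours. Passing `Fraction` values keeps the arithmetic exact, whereas floats accumulate rounding errors after a few dozen steps. For x = 1 it uses the convention $x_1 = \beta - \lfloor\beta\rfloor$, so it yields a prefix of the expansion c of 1.

```python
import math
from fractions import Fraction


def greedy_expansion(x, beta, n):
    """Return the first n digits b_1, ..., b_n of the greedy beta-expansion of x.

    Implements b_{k+1} = floor(beta * x_k) and x_{k+1} = beta * x_k - b_{k+1},
    starting from x_0 = x. Use Fraction inputs for exact arithmetic.
    """
    digits = []
    for _ in range(n):
        b = math.floor(beta * x)  # the largest digit that can still be used
        digits.append(b)
        x = beta * x - b          # x_{k+1} = T_beta(x_k)
    return digits


# Binary digits of 3/10, and the first digits of the expansion of 1 for beta = 3/2.
print(greedy_expansion(Fraction(3, 10), 2, 10))
print(greedy_expansion(Fraction(1), Fraction(3, 2), 9))
```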
Definition 3.64. The closure of the greedy β-expansions of all x ∈ [0, 1] is a subshift of $\{0, \dots, \lfloor\beta\rfloor\}^{\mathbb{N}}$; it is called the β-shift and we will denote it as (Xβ, σ).

If $b = (b_k)_{k=1}^\infty$ is the β-expansion of some x ∈ [0, 1], then σ(b) is the β-expansion of Tβ(x). The following lemma from [445] characterizes the β-shift in terms of the lexicographic order $\preceq_{\mathrm{lex}}$:
Lemma 3.65. Let c = c1 c2 c3 · · · be the β-expansion of 1, and suppose it is not finite; i.e. $c_i > 0$ infinitely often⁵. Then b ∈ Xβ if and only if $\sigma^n(b) \preceq_{\mathrm{lex}} c$ for all n ≥ 0.

Moreover, the greedy expansion $(b_i)_{i \ge 1}$ of x is the largest sequence in lexicographical order among all the expansions of x.
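For finite words, Lemma 3.65 reduces to a finite check. A word w lies in L(Xβ) exactly when each suffix of w is lexicographically at most the prefix of c of the same length. This is sufficient because $w0^\infty$ then satisfies the lemma. The sketch assumes, as the lemma does, that c is not finite, and that at least |w| digits of c are supplied. The helper name `in_beta_language` is hypothetical, and the sample digits are those of c shown in Figure 3.8.

```python
def in_beta_language(word, c):
    """Check whether the finite word belongs to L(X_beta), via Lemma 3.65.

    `c` holds (at least len(word)) digits of the non-finite beta-expansion
    of 1. Python compares equal-length lists lexicographically.
    """
    n = len(word)
    if len(c) < n:
        raise ValueError("need at least len(word) digits of c")
    return all(word[k:] <= c[: n - k] for k in range(n))


c = [2, 1, 0, 2, 0, 1, 0, 2]  # digits of c from Figure 3.8
print(in_beta_language([2, 1, 0, 1], c), in_beta_language([0, 2, 1, 1], c))
```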
Example 3.66. Let β = 1.8393 . . . be the largest root of the equation β³ = β² + β + 1. One can check that $c = 1110^\infty$. Therefore b ∈ Xβ if and only if one of
σⁿ(b) = 0 · · · , σⁿ(b) = 10 · · · , σⁿ(b) = 110 · · · , or σⁿ(b) = c
holds for every n ≥ 0. The subshift Xβ is itself not of finite type, because there are infinitely many forbidden words $1110^k1$, k ≥ 0, but by some recoding it can be seen to be conjugate to an SFT (see the middle panel of Figure 3.6), and it has a simple edge-labeled transition graph.

Figure 3.6. Left: The map Tβ for β³ = β² + β + 1; then $T_\beta^3(1) = 0$. Middle: A corresponding vertex-labeled graph. Right: A corresponding edge-labeled graph.

5 This condition is required for the “if” direction. For example, if $c = 1110^\infty$ as in Example 3.66, then $b = (110)^\infty \prec_{\mathrm{lex}} c$, but there is no point x ∈ [0, 1] with this itinerary. In fact, b is the lazy expansion of the point 1; it is “the other” canonical itinerary that 1 has.

Proof of Lemma 3.65. Let $b = (b_k(x))_{k \ge 1}$ be the β-expansion of some x ∈ [0, 1). (If x = 1, there is nothing to prove because b = c.) Since x < 1 we have $b_1 = \lfloor \beta x \rfloor \le c_1 = \lfloor \beta \cdot 1 \rfloor$. If the inequality is strict, then $b \prec_{\mathrm{lex}} c$. Otherwise, $0 \le x_1 = T_\beta(x) = \beta x - b_1 < \beta \cdot 1 - c_1 = T_\beta(1)$, and we find that $b_2 = \lfloor \beta x_1 \rfloor \le c_2 = \lfloor \beta T_\beta(1) \rfloor$. Continue by induction.
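Conversely, define half-open subintervals of [0, 1]:
$$\begin{aligned}
A_j &= \Big[\tfrac{j}{\beta}, \tfrac{j+1}{\beta}\Big), && 0 \le j < c_1,\\
A_{c_1 j} &= \Big[\tfrac{c_1}{\beta} + \tfrac{j}{\beta^2}, \tfrac{c_1}{\beta} + \tfrac{j+1}{\beta^2}\Big), && 0 \le j < c_2,\\
A_{c_1 c_2 j} &= \Big[\tfrac{c_1}{\beta} + \tfrac{c_2}{\beta^2} + \tfrac{j}{\beta^3}, \tfrac{c_1}{\beta} + \tfrac{c_2}{\beta^2} + \tfrac{j+1}{\beta^3}\Big), && 0 \le j < c_3,\\
&\ \ \vdots
\end{aligned} \tag{3.11}$$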
They are adjacent and clearly $T_\beta(A_j) = [0, 1)$ for $0 \le j < c_1$. Also $T_\beta(A_{c_1 j}) = [j/\beta, (j+1)/\beta)$ for $0 \le j < c_2$. Since $\sigma^n((c_k)_{k\ge1}) \preceq_{\mathrm{lex}} (c_k)_{k\ge1}$ by the first part of the proof, we have $c_2 \le c_1$. In particular $T_\beta(A_{c_1 j})$ is one of the intervals in the first row of (3.11). Therefore $T_\beta^2(A_{c_1 j}) = [0, 1)$. By induction, we obtain
$$T_\beta^{k+1}(A_{c_1 c_2 \cdots c_k j}) = [0, 1) \quad \text{for all } k \in \mathbb{N},\ 0 \le j < c_{k+1}. \tag{3.12}$$
In fact, $A_{c_1 \cdots c_k j} = \{x \in [0, 1] : b_n(x) = c_n \text{ for } 1 \le n \le k,\ b_{k+1}(x) = j\}$.
Now take $(b_k)_{k\ge1} \in A^{\mathbb{N}}$ such that $\sigma^n((b_k)_{k\ge1}) \preceq_{\mathrm{lex}} (c_k)_{k\ge1}$ for all n ≥ 0, and define $n_0 = 0$ and recursively $n_{r+1} = \min\{k > n_r : b_k \neq c_{k-n_r}\}$. Suppose first that all $n_r$'s are finite. Then $b_{n_r+1} \cdots b_{n_{r+1}}$ is the index of one of the intervals in the $(n_{r+1} - n_r)$-th row of (3.11). The intersection
$$\bigcap_{r \ge 0} T_\beta^{-n_r}\big(A_{b_{n_r+1} \cdots b_{n_{r+1}}}\big)$$
(of intervals of length $\le \beta^{-r}$) is a single point x with $(b_k(x))_{k\ge1} = (b_k)_{k\ge1}$. If $n_{s+1} = \infty$ for some s ≥ 0 and we set $A_{b_{n_s+1} b_{n_s+2} \cdots} = \{1\}$, then $\{x\} = \bigcap_{r=0}^{s} T_\beta^{-n_r}(A_{b_{n_r+1} \cdots b_{n_{r+1}}})$ gives again the unique point with $(b_k(x))_{k\ge1} = (b_k)_{k\ge1}$. □

The greedy expansion above is not the only way of expressing $x = \sum_{k\ge1} b_k \beta^{-k}$ with $b_k$ in the digit set $\{0, \dots, \lfloor\beta\rfloor\}$. For instance, in the lazy expansion we always take the smallest possible⁶ digit $b_k$ such that the sum x can still be achieved. For β = 2, the greedy and the lazy expansion differ in how they express dyadic rationals in (0, 1): $x = b_1 \cdots b_k 1000\cdots$ (greedy, with partition $\{[0, \tfrac12), [\tfrac12, 1]\}$) and $x = b_1 \cdots b_k 0111\cdots$ (lazy, with partition $\{[0, \tfrac12], (\tfrac12, 1]\}$). All other numbers in [0, 1] have a unique expansion for β = 2.
6 In terms of the algorithm given for the greedy expansion, we need to take $b_k = \max\{0, \lceil \beta x_{k-1} - \lfloor\beta\rfloor/(\beta - 1) \rceil\}$ so that $x_k \le \sum_{j>k} \lfloor\beta\rfloor \beta^{k-j}$; i.e. $x_k$ (and therefore x) can still be reached by choosing the remaining digits $b_j$ maximal.
In general, the number of expansions can be much larger, for a larger set of points. This can be shown by counting the number of orbits of the point x under iteration of the multivalued map
$$T_\beta : \Big[0, \frac{\lfloor\beta\rfloor}{\beta - 1}\Big] \to \Big[0, \frac{\lfloor\beta\rfloor}{\beta - 1}\Big], \qquad x \mapsto \beta x - i \quad \text{if } x \in \Delta_i.$$
Here the $\Delta_i = \big[\frac{i}{\beta}, \frac{i}{\beta} + \frac{\lfloor\beta\rfloor}{\beta(\beta - 1)}\big]$ are the domains of the branches of Tβ, and the labels i are used as the symbols of the itinerary $i(x) = b_0 b_1 b_2 \cdots$ of points; i.e. every sequence with $b_k = i$ if $T_\beta^k(x) \in \Delta_i$ along some forward orbit is an expansion of x. The intervals where the Δi's overlap are called switch regions; see Figure 3.7. Points for which infinitely many forward Tβ-orbits each visit the switch regions infinitely often have uncountably many expansions.

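Figure 3.7. The map $T_\beta : [0, \frac{\lfloor\beta\rfloor}{\beta-1}] \to [0, \frac{\lfloor\beta\rfloor}{\beta-1}]$ with switch regions $\Delta_i \cap \Delta_{i+1}$. (Panel labels: the orbit of the greedy expansion stays in [0, 1); the orbit of the lazy expansion stays in $[\frac{\lfloor\beta\rfloor}{\beta-1} - 1, \frac{\lfloor\beta\rfloor}{\beta-1})$.)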

On the other hand, points whose forward Tβ-orbits avoid switch regions (and then these forward orbits are indeed uniquely defined) have only one expansion. Such points are called univoque; we denote the set of univoque points in $(0, \lfloor\beta\rfloor/(\beta - 1))$ by Uβ. Larger values of β lead to smaller switch regions and thus larger univoque sets, at least until β becomes an integer and the digit set is increased by one. The following theorem is a summary of results from [278, 368], just for the digit set {0, 1}.

Theorem 3.67. The set Uβ of positive univoque points satisfies:
(1) Uβ = ∅ for 1 < β ≤ γ = ½(1 + √5) ≈ 1.618 . . . , the golden mean;
(2) #Uβ = 2 for γ < β ≤ βc ≈ 1.755 . . . , the leading root of x³ = 2x² − x + 1;
(3) #Uβ = ℵ0 for βc < β < βKL ≈ 1.787 . . . , the so-called Komornik-Loreti constant⁷;
(4) #Uβ = 2^ℵ0 for βKL ≤ β < 2; for βKL < β < 2 it is a Cantor set of positive Hausdorff dimension;
(5) Uβ = (0, 1) \ {dyadic rationals} if β = 2.
In fact, the Lebesgue measure Leb(Uβ) = 0 for all β ∈ (1, 2).
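The three thresholds in Theorem 3.67 can be computed numerically. In the sketch below, βc is taken to be the root in (1.5, 2) of x³ − 2x² + x − 1 = 0, which is the cubic that matches the quoted value 1.755…. We take βKL as the solution of $\sum_{k\ge1} \rho_k \beta^{-k} = 1$ for the Thue-Morse sequence ρ (our reading of footnote 7). Plain bisection is our choice of root finder, and the Thue-Morse series is truncated after 200 terms.

```python
import math


def bisect_root(g, lo, hi, iters=100):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        gmid = g(mid)
        if (gmid > 0) == (glo > 0):
            lo, glo = mid, gmid
        else:
            hi = mid
    return (lo + hi) / 2


def thue_morse(k):
    """rho_k = parity of the number of 1's in the binary expansion of k."""
    return bin(k).count("1") % 2


golden = (1 + math.sqrt(5)) / 2
beta_c = bisect_root(lambda x: x**3 - 2 * x**2 + x - 1, 1.5, 2.0)
beta_kl = bisect_root(
    lambda b: sum(thue_morse(k) * b ** (-k) for k in range(1, 201)) - 1, 1.6, 2.0
)
print(golden, beta_c, beta_kl)
```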

Further details are given also in [201]. Previously, Erdős and coauthors [238–240] studied the number of β-representations of 1 as a function of β. For similar results for larger digit sets {0, 1, . . . , m}, see e.g. [42, 200], among a by now very extensive literature.
Proposition 3.68. The β-shift is a coded shift.
Proof. Let c = c1 c2 c3 · · · be the β-expansion of 1. Then we can take as set of code words
$$S = \{\underbrace{0, 1, \dots, c_1 - 1}_{\text{1-words}},\ \underbrace{c_1 0, c_1 1, \dots, c_1(c_2 - 1)}_{\text{2-words}},\ \underbrace{c_1 c_2 0, c_1 c_2 1, \dots, c_1 c_2 (c_3 - 1)}_{\text{3-words}},\ \dots,\ \underbrace{c_1 c_2 c_3 c_4 c_5 c_6 \cdots}_{\text{a single infinite word}}\}. \tag{3.13}$$
Apart from the single infinite word, these are exactly the indices of the intervals $A_{c_1 \cdots c_k j}$ in (3.11). We know from (3.12) that $T_\beta^{k+1}(A_{c_1 \cdots c_k j}) = [0, 1)$, so free concatenations of such code words all represent $(b_k(x))_{k\ge1}$ for some x ∈ [0, 1]. Any concatenation in $S^*$ also satisfies Lemma 3.65, so that $S^*$ is dense in (and in fact equal to) Xβ. □
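The finite code words in (3.13) can be listed mechanically from a prefix of c. The sketch below does this and then checks, with the finite form of Lemma 3.65, that concatenations of two code words are again admissible. The helper names are ours, and we reuse the digits c = 21020102 from Figure 3.8 as sample data.

```python
from itertools import product


def code_words(c, max_len):
    """Finite code words of (3.13) up to length max_len: c_1 ... c_{k-1} a with 0 <= a < c_k."""
    words = []
    for k in range(1, max_len + 1):
        for a in range(c[k - 1]):
            words.append(c[: k - 1] + [a])
    return words


def in_beta_language(word, c):
    """Finite form of Lemma 3.65: every suffix is lexicographically <= c's prefix."""
    n = len(word)
    return all(word[k:] <= c[: n - k] for k in range(n))


c = [2, 1, 0, 2, 0, 1, 0, 2]
S = code_words(c, 4)
print(S)
print(all(in_beta_language(s + t, c) for s, t in product(S, repeat=2)))
```

The number of code words of length k equals $c_k$, which is the count used later in the proof of Theorem 3.77.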

To illustrate this for the β-shift Xβ with c = c1 c2 c3 · · · , consider an edge-labeled countable transition graph with vertices $(v_n)_{n \ge 0}$ and arrows⁸
$v_{n-1} \xrightarrow{c_n} v_n$ for n ≥ 1, so c is written out over the horizontal spine, and
$v_{n-1} \xrightarrow{a} v_0$ for n ≥ 1, $0 \le a < c_n$;
see Figure 3.8. The code words are the labels of the simple loops from v0 to itself, and the infinite paths starting from v0 are in one-to-one correspondence with the points in Xβ.
7 This constant is the solution of the equation $\sum_{k \ge 0} \rho_{k+1} \beta^{-(k+1)} = 1$ for the Thue-Morse sequence ρTM = ρ0 ρ1 ρ2 · · · = 0110 1001 . . . ; see Example 1.6 and [368]. The numerical value is 1.7872316501 < βKL < 1.7872316505 and βKL was proven to be transcendental in [18].
8 This graph is the edge-labeled version of the Hofbauer tower of the corresponding β-transformation; see Section 3.6.3.



Figure 3.8. The edge-labeled transition graph for a β-shift with c = 21020102 . . . ; the spine v0 → v1 → v2 → ⋯ carries the labels 2, 1, 0, 2, 0, 1, 0, 2.

Corollary 3.69. Every β-transformation is intrinsically ergodic.
Proof. This was first shown by Hofbauer [315]; see also [159] based on a weakened form of specification⁹. Applying Theorem 3.48, we have $\#\{s \in S : |s| = n\} = c_n \le \lfloor\beta\rfloor$ for each n, so the exponential growth rate of these words is 0. Hence Theorem 3.48 even implies that every factor of the β-shift is intrinsically ergodic. □
Remark 3.70. For the β-transformation with slope β > 1, the measure of maximal entropy is absolutely continuous w.r.t. Lebesgue measure, and there is an explicit formula for the density:
$$\frac{d\mu}{dx} = \frac{1}{\Lambda} \sum_{n \ge 0} \beta^{-n}\, \mathbb{1}_{[0, T_\beta^n(1)]}$$
for an appropriate normalizing constant Λ; see [281, 447].
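The density formula can be tested numerically. The sketch below evaluates the series truncated after 100 terms and normalizes it by $\Lambda = \sum_n \beta^{-n} T_\beta^n(1)$. It then checks invariance by comparing h with its image under the transfer operator $Ph(y) = \frac1\beta \sum_{x \in T_\beta^{-1}(y)} h(x)$. Choosing β = 3/2 (a rational slope, so the orbit of 1 can be computed exactly) and using half-open indicators $[0, T_\beta^n(1))$ are our assumptions; the endpoints form a null set.

```python
import math
from fractions import Fraction


def parry_density(beta, n_terms=100):
    """Truncated density of Remark 3.70 for T_beta(x) = beta*x mod 1, normalised to integral 1.

    h(x) = (1/Lambda) * sum_{n < n_terms} beta^(-n) * 1_{[0, T_beta^n(1))}(x).
    The orbit of 1 is computed exactly, so beta should be an int or a Fraction.
    """
    orbit, t = [], Fraction(1)
    for _ in range(n_terms):
        orbit.append(float(t))
        bt = beta * t
        t = bt - math.floor(bt)  # T_beta(t); note T_beta(1) = beta - floor(beta)
    weights = [float(Fraction(beta) ** -n) for n in range(n_terms)]
    lam = sum(w * a for w, a in zip(weights, orbit))  # integral of 1_[0, a) equals a

    def h(x):
        return sum(w for w, a in zip(weights, orbit) if x < a) / lam

    return h


def transfer(h, beta, y):
    """Perron-Frobenius operator of T_beta applied to h, evaluated at y in [0, 1)."""
    beta = float(beta)
    total, i = 0.0, 0
    while (y + i) / beta < 1:  # preimages (y + i)/beta that lie in [0, 1)
        total += h((y + i) / beta)
        i += 1
    return total / beta


beta = Fraction(3, 2)
h = parry_density(beta)
print([(y, h(y), transfer(h, beta, y)) for y in (0.1, 0.37, 0.8)])
```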

The following result was probably first stated in [382, Section 6].
Corollary 3.71. For every β ∈ [1, 2], the β-shift (Xβ, σ) is hereditary.
Proof. This follows directly from Lemma 3.65, which determines the shape of the code words in Proposition 3.68. Indeed, let x ∈ Xβ and $n = \min\{i \ge 1 : x_i \neq c_i\}$. Then $x_n < c_n$ and $x_1 \cdots x_n$ is a code word. Now repeat the argument with σⁿ(x). □
Theorem 3.72. The Tβ-orbit of 1
(1) contains 0 if and only if Xβ is conjugate to an SFT;
(2) is preperiodic if and only if Xβ is sofic¹⁰;
(3) is not dense in [0, 1] if and only if Xβ is synchronized;
(4) is disjoint from [0, δ] for some δ > 0 if and only if Xβ has specification.
9 This is because specification as in Definition 2.83, and hence Lemma 2.87, does not apply in general.
10 Since 1 is not in the range of Tβ, the orbit of 1 cannot be periodic. If $T_\beta^n(1) = j/\beta$ for some j ∈ N, then $T_\beta^{n+1}(1) = 0$ and case (1) applies, even though $\lim_{y \uparrow j/\beta} T_\beta(y) = 1$.
We give a proof below but refer to [446, 488] for other proofs and related results.

Proof. First note that if β ∈ N, then Xβ is the full shift on β symbols, so clearly an SFT. Assume therefore that β is non-integer.
For statement (1), let $a_j = T_\beta^j(1)$, so $a_0 = 1$ and $a_N = 0$ for some N ≥ 2. Let P be the partition given by the branches of $T_\beta^{N-1}$. Then $a_j \in \partial\mathcal{P}$ and the image $T_\beta^{N-1}(\partial J) \subset \{a_i\}_{i=0}^{N}$ for each J ∈ P. This means that P is a Markov partition for $T_\beta^{N-1}$, and hence $(X_\beta, \sigma^{N-1})$ is a memory N − 1 SFT over the alphabet $\{0, 1, \dots, \lfloor\beta\rfloor\}$. See Example 3.66 for an illustration of this.
For statement (2) and $c = c_1 c_2 \cdots c_N (c_{N+1} \cdots c_{N+p})^\infty$, we claim that Xβ only has finitely many different follower sets; see Definition 3.34. Let w be a proper prefix of some $s_1 s_2 s_3 \cdots \in S^*$ for the collection of words S from (3.13). That is, there are k ≥ 1 and $0 \le m < |s_k|$ such that $|w| = |s_1 \cdots s_{k-1}| + m$. The possible follower sets are
$$F(w) = \begin{cases} S^* & \text{if } m = 0,\\ \{aS^* : 0 \le a < c_2\} \cup \{c_2 a S^* : 0 \le a < c_3\} \cup \dots & \text{if } m = 1,\\ \{aS^* : 0 \le a < c_3\} \cup \{c_3 a S^* : 0 \le a < c_4\} \cup \dots & \text{if } m = 2,\\ \{aS^* : 0 \le a < c_4\} \cup \{c_4 a S^* : 0 \le a < c_5\} \cup \dots & \text{if } m = 3,\\ \quad \vdots & \quad \vdots \end{cases}$$
Since c is eventually periodic, this list of follower sets eventually becomes periodic as well: for each i ≥ 0, they are the same for m = N + i and m = N + p + i. This proves the claim, so by Theorem 3.36, Xβ is sofic. (It is easy to construct an edge-labeled transition graph for Xβ; see Example 3.73.)
If, on the other hand, the expansion of 1 is not preperiodic, so the Tβ-orbit of 1 is infinite, then there are infinitely many different follower sets by Theorem 3.76 below, so Xβ cannot be sofic.
For statement (3), assume that orb(1) is not dense in [0, 1] and let U be an interval that is disjoint from orb(1). Take N so large that the domain Z of an entire branch of $T_\beta^N$ is contained in U. The set Z is a cylinder set, associated to a unique word $v \in L_N(X_\beta)$. If $u \in L_M(X_\beta)$ is such that $uv \in L(X_\beta)$, then the domain Y of the corresponding branch of $T_\beta^M$ is such that $T_\beta^M(Y) \cap Z \neq \emptyset$. But since $\mathrm{orb}(1) \cap Z = \emptyset$, we have $T_\beta^M(Y) \supset Z$. Therefore, for every $z \in T_\beta^N(Z)$, there is y ∈ Y such that $T_\beta^{M+N}(y) = z$. Symbolically, this means that for every word $w \in L(X_\beta)$ such that $vw \in L(X_\beta)$, also $uvw \in L(X_\beta)$. In other words, v is synchronizing.
Conversely, suppose that $v \in L(X_\beta)$ is some word. Then v corresponds to the domain Z of some branch of $T_\beta^N$. If orb(1) is dense, then there is n ∈ N such that $T_\beta^n(1) \in Z$. Therefore there is a one-sided neighborhood Y of 1 such that $T_\beta^n(Y) = [0, T_\beta^n(1)]$, and there is $x \in Z \setminus T_\beta^n(Y)$. Let w be the itinerary of $T_\beta^N(x)$; since x ∈ Z, $vw \in L(X_\beta)$. Similarly, taking $u = c_1 c_2 \cdots c_n$, since $T_\beta^n(1) \in Z$, also $uv \in L(X_\beta)$. However, $uvw \notin L(X_\beta)$, because there is no y ∈ Y such that $T_\beta^n(y) = x$. This shows that v is not synchronizing, and since v was arbitrary, Xβ is not synchronized.
Finally, for statement (4), take N such that the cylinder set $[0^N]$ corresponds to a subinterval $Z_N$ contained in [0, δ]. Then $T_\beta^N(Z_N) = [0, 1]$. Also, for any k-cylinder [x] corresponding to an interval $Z_x \subset [0, 1]$, we have $T_\beta^k(Z_x) \supset [0, \delta] \supset Z_N$. Specification follows from this.
On the other hand, if 0 is an accumulation point of orb(1), then for any M, N ∈ N, there is some word $x \in L_M(X_\beta)$ corresponding to an interval $Z_x$ such that $T_\beta^M(Z_x) \subset [0, \beta^{-(N+1)})$. Then the N + 1 symbols following x are all 0, so there is no word $y \in L_N(X_\beta)$ such that $xy1 \in L(X_\beta)$, and thus specification fails. □

Example 3.73. Let β = 1.801937735 . . . be the largest root of the equation β³ = β² + 2β − 1. One can check that c = 11010101010 · · · is preperiodic, and the Tβ-orbit of 1 is {1, β − 1, β(β − 1) − 1, β − 1, β(β − 1) − 1, . . . }. The points {0, β(β − 1) − 1, 1/β, β − 1, 1} define a Markov partition; see Figure 3.9. Therefore the dynamical system ([0, 1], Tβ) can be described as an SFT, but not in the alphabet {0, 1}. However, by edge-labeling the transition graph in Figure 3.9, we get Xβ. By Lemma 3.65, x ∈ Xβ if and only if $\sigma^n(x) \preceq_{\mathrm{lex}} c = 11(01)^\infty$ for every n ≥ 0. The subshift Xβ is sofic, but since the Tβ-orbit of 1 does not contain 0, it is not conjugate to an SFT by Theorem 3.72; for instance, the words $11(01)^k1$, k ≥ 0, are all forbidden. Also, Xβ is the image of the length one sliding block code π(a) = π(b) = 0, π(c) = π(d) = 1, because a, b ⊂ [0, 1/β] and c, d ⊂ [1/β, 1].
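The orbit of 1 in Example 3.73 can be computed exactly in $\mathbb{Z}[\beta]$. We store $a_0 + a_1\beta + a_2\beta^2$ as an integer triple and multiply by β using the reduction $\beta^3 = \beta^2 + 2\beta - 1$. Only the floors are taken from floating-point values, which we assume is safe here because the numbers that occur stay far from integers. This representation and the helper names are our own.

```python
import math


def bisect_root(g, lo, hi, iters=100):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        gmid = g(mid)
        if (gmid > 0) == (glo > 0):
            lo, glo = mid, gmid
        else:
            hi = mid
    return (lo + hi) / 2


BETA = bisect_root(lambda b: b**3 - b**2 - 2 * b + 1, 1.5, 2.0)


def times_beta(v):
    """Multiply a0 + a1*beta + a2*beta^2 by beta, reducing with beta^3 = beta^2 + 2*beta - 1."""
    a0, a1, a2 = v
    return (-a2, a0 + 2 * a2, a1 + a2)


def value(v):
    return v[0] + v[1] * BETA + v[2] * BETA**2


def orbit_of_one(max_steps=50):
    """Digits of c and (preperiod, period) of the T_beta-orbit of 1, exact in Z[beta]."""
    seen, digits, t = {}, [], (1, 0, 0)
    for k in range(max_steps):
        if t in seen:
            return digits, seen[t], k - seen[t]
        seen[t] = k
        bt = times_beta(t)
        d = math.floor(value(bt))   # digit c_{k+1}
        digits.append(d)
        t = (bt[0] - d, bt[1], bt[2])  # T_beta^{k+1}(1) = beta * T_beta^k(1) - d
    return digits, None, None


digits, pre, per = orbit_of_one()
print(BETA, digits, pre, per)
```

The output shows that the orbit of 1 enters a cycle of length 2 after one step. This gives c = 1(10)^∞ = 11(01)^∞, in agreement with the example.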

The first two types of β-shifts in Theorem 3.72 correspond to certain algebraic properties of β, which we will mention but not prove. For the definitions of Pisot and Perron numbers, see Section 8.1.
Theorem 3.74. If β is a Pisot number, then Xβ is sofic. If the subshift Xβ is sofic, then β is a Perron number.

Figure 3.9. The transition graph for a sofic β-shift for β = 1.801937735 . . . (left: the map Tβ), with Markov partition a = [0, β(β − 1) − 1], b = [β(β − 1) − 1, 1/β], c = [1/β, β − 1], d = [β − 1, 1].

Remark 3.75. We refer to [490] and [85, Chapter 7] for more results in this spirit. If Xβ is sofic, then the Tβ-orbit of 1 is a finite set, say $0 = x_0 < x_1 < x_2 < \cdots < x_d = 1$, where $x_0 = 0$ is added for convenience, also if it is not part of $\mathrm{orb}_{T_\beta}(1)$. The intervals $\tau_i = [x_{i-1}, x_i]$ form a Markov partition with associated matrix $M = (m_{ij})_{i,j=1}^{d}$, where $m_{ij} = 1$ if $T_\beta(\tau_i) \supset \tau_j$ and $m_{ij} = 0$ otherwise. This also defines a substitution $\chi_\beta(a) = a_1 \dots a_t$ (with the letters $a_i$ in increasing order) if $T_\beta(\tau_a) = \tau_{a_1} \cup \cdots \cup \tau_{a_t}$, with fixed point $\rho = \lim_n \chi_\beta^n(a_1)$ and substitution shift $(X_\rho, \sigma)$ for $X_\rho = \overline{\mathrm{orb}_\sigma(\rho)}$. See [15, 16] for studies of these kinds of substitution systems. The Pisot substitution conjecture states that this subshift has a purely point spectrum (see Section 6.8.3) if and only if β is a Pisot number. This special version of the Pisot substitution conjecture was proved by Barge [50].

Continuing on the theme of follower sets, let
$$F(n) := \#\{F : F \text{ is the follower set of some } v \in L_n(X_\beta)\} \tag{3.14}$$
be the number of distinct follower sets of words in $L_n(X_\beta)$. Clearly, F(n) ≤ p(n), but in general F(n) is much smaller. Recall from Theorem 3.36 that F(n) is a bounded sequence if and only if the subshift is sofic. For β-shifts, we see in general linear growth of F(n).
Theorem 3.76. For every β-shift (Xβ , σ) with β > 1, we have F (n) = n+1,
except when orb(1) is finite; in this case, (Xβ , σ) is sofic.

Proof. This result comes from [435, Theorem 2.25], but we give a different dynamical proof. Set β > 1, and assume that c = c1 c2 c3 · · · is the β-expansion of 1. Let D0 = [0, 1] and in general¹¹ $D_n = [0, T_\beta^n(1)]$. First assume that all points $T_\beta^n(1)$ are distinct. The proof will be by induction.
11 This notation is derived from the Hofbauer tower construction from Section 3.6.3 applied to β-transformations. If the orbit of 1 is infinite, then there are n + 1 levels in the tower of height ≤ n. The image of each n-cylinder under $T_\beta^n$ is one of these, and therefore #F(n) = n + 1. The same result holds for unimodal maps. More generally, for interval maps with d + 1 branches, we have #F(n) ≤ dn + 1.
For n = 0, there is only one follower set F0 of the empty word: F0 = L(Xβ). Therefore F(0) = 1.
For n = 1 and $a_1 < c_1$, $T_\beta([a_1/\beta, (a_1 + 1)/\beta]) = [0, 1] = D_0$ and the follower set of $a_1$ is F0. If $a_1 = c_1$, then $T_\beta([a_1/\beta, 1]) = [0, T_\beta(1)] = D_1$ and the follower set F1 of $a_1$ is equal to the collection of itineraries of points x ∈ D1. Therefore F(1) = 2.
For general n, if $v = a_1 a_2 \cdots a_n$ and k is the smallest index such that $a_{k+1} \cdots a_n = c_1 \cdots c_{n-k}$, then the corresponding follower set equals $F_{n-k}$. In particular, if k = 0, then the follower set of $a_1 \cdots a_n$ is the collection of itineraries of x ∈ Dn. Hence F(n) = n + 1, proving the statement.
If Dn = Dk for some k < n (say n is minimal with this property), then we get no new follower sets anymore, and F(m) = n + 1 for all m ≥ n. As shown in Theorem 3.36, Xβ is sofic in this case. □

Theorem 3.77. The β-shift for β > 1 has topological entropy log β.

Proof. This is a special case of a theorem of interval dynamics saying that

every piecewise affine map with slope ±β has topological entropy htop (Tβ ) =
max{log β, 0}, but we will give a purely symbolic proof.
Recall the β-expansion c = c1 c2 · · · of 1 and the set of code words S from (3.13). By Proposition 3.68, every word in L(Xβ) has the form¹²
$$s_1 s_2 \cdots s_m\, c_1 c_2 \cdots c_k \quad \text{for some (maximal) } s_1, \dots, s_m \in S,\ k \ge 0. \tag{3.15}$$
Let $p_\beta(n)$ and $p_{S^*}(n)$ be the number of words in $L_n(X_\beta)$ and $L_n(S^*)$, respectively. Since every word in $S^*$ is a word in $L(X_\beta)$, we have $p_{S^*}(n) \le p_\beta(n)$. Conversely, by (3.15),
$$p_\beta(n) \le \sum_{m=0}^{n} p_{S^*}(m) \le (n + 1) \max_{0 \le m \le n} p_{S^*}(m).$$


Therefore the exponential growth rates are the same:
$$h_{top}(X_\beta) = \limsup_{n\to\infty} \frac1n \log p_\beta(n) = \limsup_{n\to\infty} \frac1n \log p_{S^*}(n).$$
Now to compute the latter, we use generating functions:
$$f_{S^*}(t) = \sum_{n \ge 0} p_{S^*}(n)\, t^n \quad \text{and} \quad f_S(t) = \sum_{n \ge 1} \#\{s \in S : |s| = n\}\, t^n.$$
Note that $p_{S^*}(0) = 1$ (the single empty word) and $\#\{s \in S : |s| = n\} = c_n$. We have $p_{S^*}(n) = \sum_{k=1}^{n} \#\{s \in S : |s| = k\}\, p_{S^*}(n-k)$ for n ≥ 1, and this gives for the power series
$$\begin{aligned}
1 + f_{S^*}(t) f_S(t) &= 1 + \sum_{n \ge 0} p_{S^*}(n) t^n \sum_{m \ge 1} \#\{s \in S : |s| = m\} t^m\\
&= 1 + \sum_{N \ge 1} \sum_{k=1}^{N} p_{S^*}(N-k) t^{N-k}\, \#\{s \in S : |s| = k\} t^k\\
&= 1 + \sum_{N \ge 1} p_{S^*}(N) t^N = f_{S^*}(t).
\end{aligned}$$
Therefore $f_{S^*}(t) = \frac{1}{1 - f_S(t)}$. Since $1 = \sum_{n \ge 1} c_n \beta^{-n} = f_S(\beta^{-1})$, $\beta^{-1}$ is a (simple) pole of $f_{S^*}$, and $f_{S^*}(t)$ is well-defined for $|t| < \beta^{-1}$. Hence $\beta^{-1}$ is the radius of convergence of $f_{S^*}$, and this means that the coefficients of $f_{S^*}$ satisfy
$$\limsup_{n\to\infty} \frac1n \log p_{S^*}(n) = \log \beta.$$
This concludes the proof. □
12 The fact that $\{A_{c_1 \cdots c_k j} : k \in \mathbb{N},\ 0 \le j < c_{k+1}\}$ is a partition of [0, 1) shows that $(b_k)_{k\ge1}$ starts with a code word rather than the suffix of a code word for every x ∈ [0, 1).
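The recursion $p_{S^*}(n) = \sum_{k=1}^{n} c_k\, p_{S^*}(n-k)$ from the proof of Theorem 3.77 is easy to run, and its growth rate can be compared with log β. We take β = 3/2 as a sample (non-integer) slope, compute its digits $c_k$ exactly with fractions, and use 800 terms. These choices are ours; the sketch only illustrates the limit and does not prove it.

```python
import math
from fractions import Fraction


def digits_of_one(beta, n):
    """First n digits c_1..c_n of the greedy beta-expansion of 1 (exact arithmetic)."""
    digits, t = [], Fraction(1)
    for _ in range(n):
        bt = beta * t
        d = math.floor(bt)
        digits.append(d)
        t = bt - d
    return digits


def count_concatenations(c, n):
    """p(m) for m = 0..n from p(0) = 1 and p(m) = sum_{k=1}^{m} c_k p(m - k)."""
    p = [1]
    for m in range(1, n + 1):
        p.append(sum(c[k - 1] * p[m - k] for k in range(1, m + 1) if c[k - 1]))
    return p


beta, N = Fraction(3, 2), 800
c = digits_of_one(beta, N)
p = count_concatenations(c, N)
print(math.log(p[N]) / N, math.log(float(beta)))
```

The ratio $\frac1N \log p_{S^*}(N)$ approaches log β only at rate O(1/N), because of the constant in front of $\beta^N$.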

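One can ask whether β-shifts are density shifts and vice versa. After all, the one-sided β-shift (Xβ, σ) is characterized as $\{x \in A^{\mathbb{N}} : \sigma^n(x) \preceq_{\mathrm{lex}} c \text{ for all } n \ge 0\}$ for the $\preceq_{\mathrm{lex}}$-maximal sequence c of Lemma 3.65, and the one-sided density shift (Xf, σ) is characterized as $\{x \in A^{\mathbb{N}} : \sigma^n(x) \preceq_{\mathrm{sum}} z \text{ for all } n \ge 0\}$ for the $\preceq_{\mathrm{sum}}$-maximal sequence z of (3.8). If $\sigma^n(y) \preceq_{\mathrm{sum}} x$ for all n ≥ 0, then $\sigma^n(y) \preceq_{\mathrm{lex}} x$ for all n ≥ 0; see [523, Lemma 8.1]. Therefore the shift-maximal sequence of a density shift is also shift-maximal for a β-shift, and every one-sided density shift is also a β-shift. The converse, however, is false. For example, $c = 302^\infty$ is shift-maximal w.r.t. $\preceq_{\mathrm{lex}}$ but not w.r.t. $\preceq_{\mathrm{sum}}$, because $\sigma^2(c) \not\preceq_{\mathrm{sum}} c$ (in fact, these two sequences are not comparable). A way of finding (non-sofic) density β-shifts with β ∈ (1, 2]¹³ is as follows: Given a β-transformation $T_\beta : [0, 1] \to [0, 1]$, define $\bar T_\beta : [1 - \frac1\beta, 1] \to [1 - \frac1\beta, 1]$ by
$$\bar T_\beta(x) = \begin{cases} T_\beta(x) = \beta x & \text{if } 1 - \frac1\beta \le x \le \frac1\beta,\\ 1 - \frac1\beta & \text{if } \frac1\beta < x < \frac{2\beta - 1}{\beta^2},\\ T_\beta(x) = \beta x - 1 & \text{if } \frac{2\beta - 1}{\beta^2} \le x \le 1; \end{cases} \tag{3.16}$$
see Figure 3.10.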
Since $\bar T_\beta(1 - \frac1\beta) = \bar T_\beta(1) = \beta - 1$, this map can be considered as a non-decreasing circle endomorphism on $[1 - \frac1\beta, 1]/_{1 - \frac1\beta \sim 1}$, with plateau $A = [\frac1\beta, \frac{2\beta - 1}{\beta^2}]$. If $\bar T_\beta^n(1) \notin A$ for all n ≥ 1, then the rotation number $\alpha := \rho(\bar T_\beta) \notin \mathbb{Q}$, and c = i(1) is shift-maximal both w.r.t. $\preceq_{\mathrm{lex}}$ and $\preceq_{\mathrm{sum}}$, and it is in particular a sequence with maximum frequency of 1's. It is also a Sturmian sequence; more specifically, the itinerary of α for the circle rotation $R_\alpha : \mathbb{S}^1 \to \mathbb{S}^1$ w.r.t. the partition (0, α] (with symbol 1) and (α, 1] (with symbol 0); cf. Definition 4.48. The canonical function of the density shift equals the Beatty sequence $f(n) = \lceil n\alpha \rceil$.
13 This is for simplicity of exposition; similar constructions for β > 2 are of course possible.
Figure 3.10. Turning a β-transformation into a Sturmian shift: the maps Tβ and T̄β, with the points 0, 1 − 1/β, 1/β, (2β − 1)/β² and 1 marked.
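For a Sturmian itinerary, the canonical function can be checked numerically. The sketch below codes the rotation orbit of α with symbol 1 on (0, α] and 0 on (α, 1]. It then compares both the prefix sums and the maximal number of 1's in a block of length n with ⌈nα⌉. We pick α = (√5 − 1)/2 as a sample irrational rotation number and do not construct T̄β itself. Floating-point rounding is harmless here because the orbit points stay well away from the partition points.

```python
import math


def rotation_itinerary(alpha, n):
    """Itinerary of alpha under x -> x + alpha mod 1: symbol 1 on (0, alpha], 0 on (alpha, 1]."""
    word, x = [], alpha
    for _ in range(n):
        word.append(1 if 0 < x <= alpha else 0)
        x = (x + alpha) % 1.0
    return word


def max_block_sums(word, n_max):
    """Largest number of 1's in a block of length n, for n = 1..n_max."""
    return [max(sum(word[i:i + n]) for i in range(len(word) - n + 1))
            for n in range(1, n_max + 1)]


alpha = (math.sqrt(5) - 1) / 2
w = rotation_itinerary(alpha, 400)
print(w[:12], max_block_sums(w, 10))
```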

3.6. Unimodal Subshifts

A unimodal map is a continuous map f : [0, 1] → [0, 1] with a single point c ∈ (0, 1), called the critical or turning point, such that $f|_{[0,c]}$ is increasing and $f|_{[c,1]}$ is decreasing¹⁴. This makes the critical value f(c) the largest value that f assumes. Examples are the family of quadratic maps $f_a(x) = ax(1 - x)$, a ∈ (0, 4], and the family of tent maps $T_s(x) = \min\{sx, s(1 - x)\}$, s ∈ (0, 2]; see Figure 3.11. It is customary to scale unimodal maps so that f(0) = f(1) = 0, but the interesting dynamics takes place on the core $[f^2(c), f(c)]$, provided $f^2(c) < c < f(c)$.
Unimodal maps are simple to define, but difficult to analyze. Before starting on the symbolic description, i.e. kneading theory, we give some background on the topological properties of unimodal maps¹⁵.

14 Decreasing in our definition; in the frequently used family $f_c(z) = z^2 + c$, $c \in [-2, \frac14]$, the roles of increasing and decreasing are reversed.
15 Symbolic dynamics has also been studied for multimodal maps (i.e. continuous interval maps with multiple critical points). Much of the structure presented here has a direct analogue, albeit on a larger alphabet, for the multimodal case. However, since the multimodal case does not present substantially different phenomena from the unimodal case, we omit it from this text, but see e.g. [92, 414, 420].
Figure 3.11. Unimodal maps: a quadratic map and a tent map.

The lap-number ℓ(fⁿ) of a unimodal map f is the number of maximal intervals on which fⁿ is monotone. Its exponential growth rate is the topological entropy¹⁶; see Misiurewicz & Szlenk [422]: $h_{top}(f) = \lim_{n\to\infty} \frac1n \log \ell(f^n)$, and this limit exists by Fekete's Lemma 1.15.
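For tent maps, lap numbers can be counted exactly. The turning points of $T_s^n$ are the points x with $T_s^k(x) = c = \frac12$ for some 0 ≤ k < n, so ℓ(Tₛⁿ) is one more than the number of such preimages. We assume this description of the turning points, which holds because $T_s$ is symmetric about c. The sketch uses rational slopes so that preimages are computed exactly; the function name is ours.

```python
from fractions import Fraction
import math


def lap_number(s, n):
    """Lap number of T_s^n on [0, 1] for the tent map T_s(x) = s*min(x, 1-x), s a Fraction."""
    c = Fraction(1, 2)
    level = {c}          # points x with T_s^k(x) = c for the current k
    turning = set(level)
    for _ in range(n - 1):
        new = set()
        for y in level:
            if y <= s / 2:               # y lies in the range of T_s
                new.add(y / s)           # preimage on the increasing branch
                new.add(1 - y / s)       # preimage on the decreasing branch
        level = new
        turning |= level
    return len(turning) + 1


s = Fraction(3, 2)
laps = [lap_number(s, n) for n in range(1, 13)]
print(laps, math.log(laps[-1]) / 12, math.log(1.5))
```

For the full tent map (s = 2) this gives 2ⁿ laps. For any slope s, each lap of $T_s^n$ has an image of length at most 1 while $|(T_s^n)'| = s^n$, so $\ell(T_s^n) \ge s^n$.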
A necessary condition for a unimodal map f to be transitive on its core is that $h_{top}(f|_{[c_2, c_1]}) \ge \frac12 \log 2$, where $c_k = f^k(c)$. (For tent maps Ts the two are equivalent.) Otherwise, there is an interval J ∋ c such that f²(J) ⊂ J and J ∩ f(J) = ∅. We call f renormalizable if such a cycle of intervals exists; see Section 4.7.1. Every orbit that intersects J is eventually trapped in orb(J). Renormalization prevents topological (weak) mixing, but non-renormalizable transitive unimodal maps are also topologically mixing, and in fact topologically exact on the core.
The symbolic dynamics have many features in common with β-shifts; for instance, analogous to Theorem 3.72, parts (1) and (2), a unimodal shift is an SFT or strictly sofic if and only if the critical point is periodic or preperiodic. However, the orientation-reversing branch requires a parity-lexicographical order (see Definition 3.81) rather than the lexicographical order to make the itinerary map i order preserving. The specification properties of unimodal maps were described by Blokh: every continuous interval map has specification if and only if it is topologically mixing¹⁷ [91, 92]; see also Theorem 2.75. The shadowing properties (at least for tent maps) are completely understood too. For this result, we define the unimodal map f to be long-branched if there is η > 0 such that for all n ∈ N and maximal intervals J on which fⁿ is monotone, |fⁿ(J)| > η.
16 Also $h_{top}(f) = \max\{0, \lim_{n\to\infty} \frac1n \log \mathrm{Var}(f^n)\} = \limsup_n \frac1n \log \#\{n\text{-periodic points}\}$, so specifically, $h_{top}(T_s) = \max\{0, \log s\}$ for the tent map $T_s$ with slope s ∈ [0, 2].
17 This is a much weaker sufficient condition for specification than for β-transformations, but the discontinuities of the β-transformations make a very important difference.
Theorem 3.78. A tent map has the shadowing property if and only if its
critical point c is periodic or T is not long-branched.

Proof. See [170] and also [116, Section 6.3]. □

3.6.1. Kneading Theory. Kneading theory is the theory of symbolic dynamics of unimodal (or multimodal) maps. It was introduced in [416] but the first deep contributions are due to Milnor & Thurston [420]. Further comprehensive sources are [164, 196, 414, 462]. Kneading theory got its name from the image of a baker kneading a slab of dough with a raisin in it. The dough is repeatedly stretched and folded, and we write a 0 or 1 according to whether the raisin appears in the top or bottom half of the slab.
We use the partition {[0, c], [c, 1]} to generate symbolic dynamics by defining the itinerary i(x) = e0 e1 e2 · · · of x ∈ [0, 1] as
$$e_k = \begin{cases} 0, & f^k(x) \in [0, c],\\ 1, & f^k(x) \in [c, 1]. \end{cases}$$
As discussed in Section 1.6, this definition is ambiguous if $f^k(x) = c$ for some k ≥ 0. There are several ways of resolving this, such as introducing a third symbol (usually ∗ or C) if $f^k(x) = c$, or making a consistent choice of 0 or 1. However, we will allow both symbols, so a point $x \in \bigcup_{k \ge 0} f^{-k}(c)$ will have two itineraries, because in this way, the unimodal shift space
$$\Sigma_f = \{e \in \{0, 1\}^{\mathbb{N}_0} : e = i(x) \text{ for some } x \in [f^2(c), f(c)]\}$$
becomes a closed shift-invariant set. That is, (Σf, σ) is a genuine subshift.
Remark 3.79. If c is periodic, then ek can be ambiguous for infinitely
many k’s. Allowing all possibilities, however, yields a subshift with isolated
points, because not every choice is the limit of “unambiguous itineraries”.
This “periodic” unimodal shift-space is best described by an SFT. We will
not be overly concerned with this in this section.
Exercise 3.80. Another way of coding orbits of unimodal maps (used by Milnor & Thurston [420] and Derrida et al. [195]) is as follows:
$$\theta_k(x) = (-1)^{\sum_{j=0}^{k} i_j(x)} = \begin{cases} +1 & \text{if } f^{k+1} \text{ is increasing at } x,\\ -1 & \text{if } f^{k+1} \text{ is decreasing at } x. \end{cases}$$
Find the sliding block code transforming θ(x) into i(x). Is there an inverse sliding block code transforming i(x) into θ(x)?

The basic questions of kneading theory are:

(1) Admissibility of itineraries for a fixed unimodal map f . Give
necessary and sufficient conditions for a sequence e ∈ {0, 1}N0 to
belong to Σf .
(2) Admissibility of kneading sequences. The answer to question (1) can be phrased purely in terms of the itinerary ν(f) = i(f(c)) of the critical value. We call this the kneading sequence of f.
Since unimodal maps only have interesting dynamics if f 2 (c) <
c < f (c), we will always assume that ν starts with 10. But apart
from this: give necessary and sufficient conditions for a sequence
ν ∈ {0, 1}N to be a kneading sequence of some unimodal map.
(3) Topological properties: derive properties of the unimodal map
from the subshift, such as the renormalizability and the properties
of the critical omega-limit set ω(c).

3.6.2. Admissibility Conditions. The first ingredient necessary to answer questions (1) and (2) above is an order relation on $\{0, 1\}^{\mathbb{N}_0}$.
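Definition 3.81. Set 0 < 1. Given distinct $e, \tilde e \in \Sigma = \{0, 1\}^{\mathbb{N}}$ and $n = \min\{k \ge 1 : e_k \neq \tilde e_k\}$, the parity-lexicographical order between e and ẽ is given by
$$e \prec_{pl} \tilde e \quad \text{if} \quad \begin{cases} e_n < \tilde e_n \text{ and } \#\{i < n : e_i = 1\} \text{ is even},\\ e_n > \tilde e_n \text{ and } \#\{i < n : e_i = 1\} \text{ is odd}. \end{cases}$$
Also $e \preceq_{pl} \tilde e$ if $e \prec_{pl} \tilde e$ or $e = \tilde e$.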
Remark 3.82. Let $e_1 \cdots e_{n-1} \bar e_n \in \{0, 1\}^n$ be the word $e_1 \cdots e_{n-1} e_n$ with the last symbol switched. One can easily check that $e_1 \cdots e_{n-1} \bar e_n \succ_{pl} e_1 \cdots e_{n-1} e_n$ if and only if $|e_1 \cdots e_n|_1$ (i.e. the number of 1's in $e_1 \cdots e_n$) is even.
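Definition 3.81 compares two sequences at their first difference and reverses the usual order when an odd number of 1's precedes it. The Python sketch below implements this for equal-length finite words (the function name is ours). It also checks Remark 3.82 exhaustively for all words of length less than 8.

```python
from itertools import product


def pl_compare(e, f):
    """Parity-lexicographical comparison of two equal-length 0/1 words.

    Returns -1 if e <_pl f, 0 if e == f, and 1 if e >_pl f (Definition 3.81):
    at the first index where they differ, the order 0 < 1 is used if the common
    prefix contains an even number of 1's, and it is reversed if that number is odd.
    """
    ones = 0
    for a, b in zip(e, f):
        if a != b:
            smaller = (a < b) != (ones % 2 == 1)
            return -1 if smaller else 1
        ones += a
    return 0


# Remark 3.82: switching the last symbol makes the word larger exactly when
# the original word contains an even number of 1's.
for n in range(1, 8):
    for w in product((0, 1), repeat=n):
        switched = w[:-1] + (1 - w[-1],)
        assert (pl_compare(switched, w) == 1) == (sum(w) % 2 == 0)
print("Remark 3.82 verified for all words of length < 8")
```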
Theorem 3.83. A sequence $e \in \{0, 1\}^{\mathbb{N}_0}$ belongs to $\Sigma_f$ if and only if
$$\sigma(\nu) \preceq_{pl} \sigma^k(e) \preceq_{pl} \nu \quad \text{for all } k \ge 0. \tag{3.17}$$
Proof. First let x, y ∈ [0, 1] be such that x < y. If $c \notin [f^k(x), f^k(y)]$ for all k ≥ 0, then i(x) = i(y). Otherwise, take n ≥ 0 minimal such that $c \in [f^n(x), f^n(y)]$. Then $i_k(x) = i_k(y)$ for 0 ≤ k < n. Furthermore, $f^n|_{(x,y)}$ is increasing/decreasing precisely if $|i_0(x) \cdots i_{n-1}(x)|_1$ is even/odd. In the former case, $i_n(x) < i_n(y)$, so $i(x) \prec_{pl} i(y)$. In the latter case, $i_n(x) > i_n(y)$, so again $i(x) \prec_{pl} i(y)$.
This shows that
$$i : ([0, 1], <) \to (\Sigma, \prec_{pl}) \text{ is order preserving.} \tag{3.18}$$
Since $f^2(c) \le x \le f(c)$ for every x in the core, (3.17) follows.
92 3. Subshifts of Positive Entropy

For the converse, assume that $e \in \{0,1\}^{\mathbb{N}_0}$ satisfies (3.17). We need to find $x \in [f^2(c), f(c)]$ with $e = i(x)$. Define cylinder sets $Z_n = \{x \in [f^2(c), f(c)] : i_k(x) = e_k \text{ for } 0 \le k < n\}$. By definition $Z_{n+1} \subset Z_n$ for all $n$. We will show that $Z_\infty := \bigcap_n Z_n \neq \emptyset$, and then each $x \in Z_\infty$ has itinerary $i(x) = e$.
Using (3.18),
$$Z_1 = \begin{cases} [f^2(c), c) \text{ and } \sigma(\nu) \preceq_{pl} e & \text{if } e_0 = 0,\\ (c, f(c)] \text{ and } e \preceq_{pl} \nu & \text{if } e_0 = 1.\end{cases}$$
Now assume by induction that $Z_{n-1}$ is found and that $\partial Z_{n-1} \subset \bigcup_{j=-1}^{n-1} f^{-j}(c)$. Therefore $f^n(Z_{n-1})$ is contained in an interval $[f^a(c), f^b(c)]$, $0 < a, b \le n+2$. First assume $c \in f^n(Z_{n-1})$. Regardless of the value of $e_n$, we can take $Z_n$ to be the closure of the component of $Z_{n-1} \setminus f^{-n}(c)$ such that $i_n(Z_n) = e_n$. If $c \notin f^n(Z_{n-1})$, then $Z_n = Z_{n-1}$, but because $e$ satisfies (3.17), $i_n(Z_n) = e_n$.
This induction step proves that all the $Z_n$'s are closed non-empty nested intervals and $i(x) = e$ for every $x \in \bigcap_n Z_n$. This concludes the proof. ∎
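The order-preservation (3.18) and condition (3.17) can be tested numerically. The sketch below is an illustration under stated assumptions: it uses a tent map $T_s$ with the arbitrarily chosen slope $s = 9/5$, exact rational arithmetic via Python's `fractions` module so that no orbit is misplaced by rounding, and it only compares finite prefixes of itineraries. All names (`tent`, `itinerary`, `pl_compare`, `SLOPE`) are hypothetical.

```python
from fractions import Fraction

SLOPE = Fraction(9, 5)   # hypothetical slope s in (1, 2]
C = Fraction(1, 2)       # critical point c


def tent(x):
    """T_s(x) = s * min(x, 1 - x), a unimodal map with maximum at c."""
    return SLOPE * x if x <= C else SLOPE * (1 - x)


def itinerary(x, n):
    """i_0(x) ... i_{n-1}(x) with 0 = left of c and 1 = right of c."""
    word = []
    for _ in range(n):
        if x == C:
            raise ValueError("orbit hits c")
        word.append("0" if x < C else "1")
        x = tent(x)
    return "".join(word)


def pl_compare(e, f):
    """-1, 0, +1 according to the parity-lexicographical order (equal lengths)."""
    ones = 0
    for a, b in zip(e, f):
        if a != b:
            sign = -1 if a < b else 1
            return sign if ones % 2 == 0 else -sign
        ones += 1 if a == "1" else 0
    return 0


# Grid points k/997 have odd denominators, and so do all their images under
# T_{9/5}, so no orbit hits c = 1/2 exactly.
points = [Fraction(k, 997) for k in range(998)]

# (3.18): x < y implies i(x) <=_pl i(y) (checked on prefixes of length 14).
words = [itinerary(x, 14) for x in points]
assert all(pl_compare(u, v) <= 0 for u, v in zip(words, words[1:]))

# (3.17): for x in the core [c_2, c_1], every shift of i(x) lies between
# sigma(nu) and nu, where nu = i(c_1) is the kneading sequence.
c1 = tent(C)
nu = itinerary(c1, 40)
m = 20
core = [x for x in points if tent(c1) <= x <= c1]
for x in core:
    e = itinerary(x, 20 + m)
    for k in range(20):
        block = e[k:k + m]
        assert pl_compare(nu[1:1 + m], block) <= 0 <= pl_compare(nu[:m], block)
print(len(core), nu[:10])
```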

3.6.3. Cutting Times and the Kneading Map. Let $\nu$ be the kneading sequence. We can split any sequence $e$ into maximal pieces (up to the last symbol) that coincide with a prefix of $\nu$. To this end, define
$$\rho : \mathbb{N} \to \mathbb{N}, \qquad \rho(n) = \max\{k > n : e_{n+1}e_{n+2} \cdots e_{k-1} \text{ is a prefix of } \nu\}. \tag{3.19}$$
Note that the function $\rho$ depends on $e$ and $\nu$, but we will suppress this dependence. When we apply this for $e = \nu$ itself, we obtain the sequence of cutting times, which were introduced in the late 1970s by Hofbauer [316]. They are given recursively by
$$S_0 = 1, \qquad S_{k+1} = \rho(S_k),$$
or in other words $S_k = \rho^k(1)$ for $e = \nu$ and $k \ge 0$.
Example 3.84. There is a unique transitive unimodal map, up to conjugacy and homtervals¹⁸, that has cutting times $S_0, S_1, S_2, S_3, S_4, \ldots = 1, 2, 3, 5, 8, \ldots$ equal to the Fibonacci numbers. We call this the Fibonacci (unimodal) map, and, as one would expect, it has connections with Fibonacci substitutions and golden mean rotations; see Proposition 5.26 and [125].
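The recursion for cutting times translates directly into code. In this sketch the prefix `100111011001010011100` of the Fibonacci kneading sequence is an input we computed from the kneading map $Q(k) = \max(k-2, 0)$ (an assumption of the example, not a statement from the text); `rho` and `cutting_times` are hypothetical names, and $\rho$ is returned as `None` when the finite prefix does not determine it. The final assertion also illustrates the parity statement of Lemma 3.85 below.

```python
def rho(e, nu, n):
    """rho(n) from (3.19) for finite words e, nu (1-based indices).

    Returns the first k > n with e_k != nu_{k-n}, i.e. the largest k such that
    e_{n+1} ... e_{k-1} is a prefix of nu; None if the words run out first.
    """
    k = n + 1
    while k <= len(e) and k - n <= len(nu):
        if e[k - 1] != nu[k - n - 1]:
            return k
        k += 1
    return None


def cutting_times(nu):
    """Cutting times S_0 = 1, S_{k+1} = rho(S_k), computed from a prefix of nu."""
    times = [1]
    while (nxt := rho(nu, nu, times[-1])) is not None:
        times.append(nxt)
    return times


FIB_NU = "100111011001010011100"   # prefix of the Fibonacci kneading sequence
S = cutting_times(FIB_NU)
print(S)  # [1, 2, 3, 5, 8, 13, 21]

# Lemma 3.85: each nu_1 ... nu_{S_k} contains an odd number of 1's.
assert all(FIB_NU[:s].count("1") % 2 == 1 for s in S)
```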
Lemma 3.85. Let $\nu$ be an admissible kneading sequence. The integer $n \ge 1$ is a cutting time if and only if $\nu_1 \cdots \nu_n$ is admissible w.r.t. $\nu$ in the sense that (3.17) holds for it. In this case also $\nu_1 \cdots \nu_n$ contains an odd number of 1's.

18 An interval $J \subset [0,1]$ is called a homterval if $f^n : J \to f^n(J)$ is a homeomorphism for every $n \in \mathbb{N}$.
3.6. Unimodal Subshifts 93

Proof. We argue by induction. Since $\nu$ starts with 10, the statement holds for $n = 1, 2$. For the induction step, assume the assertion holds for all $j < n$. Let $k$ be maximal such that $S_k < n$ and assume $\rho(S_k) < \infty$, because otherwise, $\nu$ is $S_k$-periodic and there is nothing to prove. We distinguish four cases:
• $n < \rho(S_k)$ and $n - S_k$ is not a cutting time. Then the word $\nu_{S_k+1} \cdots \nu_n = \nu_1 \cdots \nu_{n-S_k}$ is not admissible by induction, and hence $\sigma^{S_k}(\nu_1 \cdots \nu_n)$ fails (3.17).
• $n < \rho(S_k)$ and $S_j := n - S_k$ is a cutting time. Then the word $\nu_{S_k+1} \cdots \nu_n = \nu_1 \cdots \nu_{S_j}$ is admissible by induction. Since $|\nu_1 \cdots \nu_{S_k}|_1$ and $|\nu_1 \cdots \nu_{S_j}|_1$ are odd, $|\nu_1 \cdots \nu_n|_1$ is even and $\nu_1 \cdots \nu_{n-1}\overline{\nu_n} \succ_{pl} \nu_1 \cdots \nu_n$, so it fails (3.17).
• $n = \rho(S_k)$ and $n - S_k$ is not a cutting time. Then the words $\nu_1 \cdots \nu_{n-S_k}$ and $\nu_{S_k+1} \cdots \nu_n = \nu_1 \cdots \nu_{n-S_k-1}\overline{\nu_{n-S_k}}$ both occur in $\nu$, which is against the induction hypothesis.
• The remaining case is $n = \rho(S_k)$ and $S_j := n - S_k$ is a cutting time. Since $\rho(S_k) < \infty$, this must be allowed. Furthermore, $|\nu_1 \cdots \nu_n|_1 = |\nu_1 \cdots \nu_{S_k}|_1 + |\nu_1 \cdots \nu_{S_j}|_1 \pm 1$ is odd, since $|\nu_1 \cdots \nu_{S_k}|_1$ and $|\nu_1 \cdots \nu_{S_j}|_1$ are odd by the induction hypothesis.
This finishes the induction step and the proof. ∎

Hence we can define the kneading map $Q : \mathbb{N} \to \mathbb{N}_0 \cup \{\infty\}$ by
$$S_k = S_{k-1} + S_{Q(k)}. \tag{3.20}$$
If $\rho(S_k) = \infty$, then $Q$ is only defined on $\{1, \ldots, k\}$. Based on the $\rho$-function and cutting times, several further admissibility conditions were formulated [118, 316, 534], of which we mention two in the next theorem.
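Conversely, (3.20) together with the definition of $\rho$ lets one rebuild $\nu$ from $Q$: after the cutting time $S_{k-1}$, the sequence repeats its prefix of length $S_{Q(k)}$ with the last symbol switched. Below is a hedged Python sketch of both directions (function names hypothetical), tried on the Fibonacci kneading map $Q(k) = \max(k-2, 0)$, which reproduces the cutting times of Example 3.84.

```python
def nu_from_kneading_map(Q, length):
    """Generate nu_1 ... nu_length from a kneading map Q (with Q[0] = 0).

    By (3.20) and the definition of rho, the block nu_{S_{k-1}+1} ... nu_{S_k}
    repeats nu_1 ... nu_{S_{Q(k)}} with its last symbol switched.
    """
    S, nu = [1], [1]                       # S_0 = 1 and nu_1 = 1
    k = 1
    while len(nu) < length and k < len(Q):
        block = S[Q[k]]                    # S_{Q(k)} = S_k - S_{k-1}; needs Q[k] < k
        nu += nu[:block - 1] + [1 - nu[block - 1]]
        S.append(S[-1] + block)
        k += 1
    return "".join(map(str, nu[:length])), S


def kneading_map(S):
    """Q(k) defined by S_k - S_{k-1} = S_{Q(k)}, with Q(0) = 0."""
    return [0] + [S.index(S[k] - S[k - 1]) for k in range(1, len(S))]


Q_FIB = [0] + [max(k - 2, 0) for k in range(1, 7)]   # Fibonacci: Q(k) = max(k-2, 0)
nu, S = nu_from_kneading_map(Q_FIB, 21)
print(nu)                # 100111011001010011100
print(S)                 # [1, 2, 3, 5, 8, 13, 21]
print(kneading_map(S))   # [0, 0, 0, 1, 2, 3, 4]
```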
Exercise 3.86. Let $\nu$ and $\tilde\nu$ be two admissible kneading sequences with kneading maps $Q$ and $\tilde Q$. Show that $\nu \prec_{pl} \tilde\nu$ if and only if $(Q(j))_{j\ge1} \succ_{lex} (\tilde Q(j))_{j\ge1}$, where $\succ_{lex}$ is the lexicographical order.
Exercise 3.87. How is the $\rho$-function defined if kneading sequences are expressed as the itinerary of $c_1$ w.r.t. the alternative way of Exercise 3.80?
Exercise 3.88. Given an admissible kneading sequence $\nu$, let
$$\kappa := \min\{n \ge 2 : \nu_n = 1\}$$
be the position of the second 1. The numbers $\hat S_j = \rho^j(\kappa)$ are called the co-cutting times.
(a) Show that $\hat S_k - \hat S_{k-1}$ is a cutting time, so there is a co-kneading map $\hat Q : \mathbb{N} \to \mathbb{N} \cup \{\infty\}$, $\hat S_k - \hat S_{k-1} = S_{\hat Q(k)}$.
(b) Assume that $c$ is not periodic. Show that
$$|f^n(c) - c| \text{ has a } \begin{cases} \text{local maximum} & \text{if } n \text{ is a cutting time},\\ \text{local minimum} & \text{if } n \text{ is a co-cutting time}.\end{cases}$$
(c) Show that if $n$ is such that $|f^n(c) - c| < |f^k(c) - c|$ for all $1 \le k < n$, then $n$ is a cutting or a co-cutting time. That is, closest returns of $c$ happen at cutting or co-cutting times.
(d) If $\hat Q(k)$ is bounded, show that $c$ is not recurrent.
(e) Give an example of a unimodal map with bounded kneading map for which $c$ is non-periodic and recurrent.
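Co-cutting times are obtained by iterating the same $\rho$ from $\kappa$ instead of from 1. The sketch below (hypothetical names; a finite prefix of the Fibonacci kneading sequence assumed as input) only illustrates the definitions: it checks part (a) and the disjointness of cutting and co-cutting times on the available symbols, which is of course not a proof.

```python
def rho(e, nu, n):
    """rho(n) of (3.19) for finite words (1-based); None if undetermined."""
    k = n + 1
    while k <= len(e) and k - n <= len(nu):
        if e[k - 1] != nu[k - n - 1]:
            return k
        k += 1
    return None


def orbit_of_rho(nu, start):
    """start, rho(start), rho(rho(start)), ... as long as nu determines them."""
    times = [start]
    while (nxt := rho(nu, nu, times[-1])) is not None:
        times.append(nxt)
    return times


FIB_NU = "100111011001010011100"    # Fibonacci kneading sequence prefix
kappa = next(n for n in range(2, len(FIB_NU) + 1) if FIB_NU[n - 1] == "1")
cutting = orbit_of_rho(FIB_NU, 1)
co_cutting = orbit_of_rho(FIB_NU, kappa)
print(kappa, cutting, co_cutting)
# 4 [1, 2, 3, 5, 8, 13, 21] [4, 6, 7, 9, 10, 11, 14, 15, 16, 18]

# Part (a): consecutive differences of co-cutting times are cutting times.
assert all(b - a in cutting for a, b in zip(co_cutting, co_cutting[1:]))
# The two sequences are disjoint (on the available prefix).
assert not set(cutting) & set(co_cutting)
```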

A more graphical way of seeing the cutting times is by means of the following construction. Write $c_k := f^k(c)$. Inductively, define intervals $D_1 = [0, c_1]$ and
$$D_{n+1} = \begin{cases} f(D_n) & \text{if } c \notin D_n,\\ [c_{n+1}, c_1] & \text{if } c \in D_n; \text{ i.e. } n = S_k \text{ is a cutting time}.\end{cases}$$
It follows by induction that $D_n = [c_n, c_{\beta(n)}]$ or $[c_{\beta(n)}, c_n]$, where $\beta(n) = n - \max\{S_k : S_k < n\}$. Moreover, $D_n \subset D_{\beta(n)}$ for all $n \ge 2$, and these two intervals have $c_{\beta(n)}$ as common boundary point.
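The closed formula for $\beta(n)$ can be compared with the recursive definition of the levels $D_n$. This sketch (hypothetical names) only tracks the indices of the endpoints, with the Fibonacci cutting times $1, 2, 3, 5, 8, 13, 21$ as assumed input. The resulting values $\beta(2), \ldots, \beta(9)$ give $D_5 = [c_2, c_5]$, $D_8 = [c_3, c_8]$, and so on, up to the order of the endpoints.

```python
def beta(n, S):
    """beta(n) = n - max{S_k : S_k < n} for n >= 2 (S = list of cutting times)."""
    return n - max(s for s in S if s < n)


def beta_by_recursion(S, N):
    """beta(n), 2 <= n <= N, following the recursive definition of D_n.

    If n is a cutting time then D_{n+1} = [c_{n+1}, c_1], so beta(n+1) = 1;
    otherwise D_{n+1} = f(D_n) and both endpoint indices increase by one.
    """
    cut = set(S)
    b = {2: 1}                 # D_2 = [c_2, c_1], since S_0 = 1 is a cutting time
    for n in range(2, N):
        b[n + 1] = 1 if n in cut else b[n] + 1
    return b


S_FIB = [1, 2, 3, 5, 8, 13, 21]          # cutting times of the Fibonacci map
rec = beta_by_recursion(S_FIB, 21)
assert all(rec[n] == beta(n, S_FIB) for n in range(2, 22))
print([beta(n, S_FIB) for n in range(2, 10)])   # [1, 1, 1, 2, 1, 2, 3, 1]
```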
Lemma 3.89. A tent map $T_s$ with slope $s > 1$ is long-branched if and only if $c$ is periodic or its kneading map $Q$ is bounded.

Proof. Note that a tent map $T_s$ is long-branched if and only if $\liminf_n |D_n| > 0$, and since $|D_{n+1}| = s|D_n|$ unless $n$ is a cutting time, this is equivalent to $\liminf_k |D_{1+S_k}| > 0$.
If $c$ is periodic, then $\{|D_n| : n \in \mathbb{N}\}$ is a finite collection and hence $T_s$ is long-branched. So let us assume that $c$ is not periodic. If $Q(k) \le B$, then $S_k - S_{k-1} \le S_B$. It follows that $\liminf_k |c - c_{S_k}| > 0$, because otherwise, the time between cutting times is unbounded. Therefore $\liminf_k |D_{1+S_k}| > 0$ and $T_s$ is long-branched.
If, on the other hand, $\limsup_k Q(k) = \infty$, then $\limsup_k (S_k - S_{k-1}) = \infty$, and hence $\liminf_k |c - c_{S_k}| \le \liminf_k s^{-(S_k - S_{k-1})} = 0$. This gives $\liminf_k |D_{1+S_k}| = 0$, and $T_s$ is not long-branched. ∎
The disjoint union $D = \bigsqcup_{n\ge1} D_n$ supports a map
$$\hat f(x \in D_n) = f(x) \in \begin{cases} D_{n+1} & \text{if } c \notin [c_n, x],\\ D_{S_{Q(k)}+1} & \text{if } c \in [c_n, x], \text{ so } n = S_k \text{ is a cutting time}.\end{cases}$$
The collection $\{D_n\}_{n\ge1}$ forms a countable Markov partition for $(D, \hat f)$. It is easy to see that the inclusion map $\pi : x \in D_n \mapsto x \in [0,1]$ satisfies $\pi \circ \hat f = f \circ \pi$. Hence $(D, \hat f)$ is an extension of $([0,1], f)$, and Hofbauer [316] called it the canonical Markov extension of $f$, but the object became better known as the Hofbauer tower.


Figure 3.12. The Hofbauer tower and extended Hofbauer tower for the
Fibonacci map.

Hofbauer saw $(D, \hat f)$ as an infinite Markov chain extending the interval dynamics $(I, f)$ and explicitly added arrows $D_i \to D_j$ if $\hat f(D_i) \supset D_j$. We can edge-label this graph by setting
$$D_i \xrightarrow{\;0\;} D_j \text{ if } \hat f^{-1}(D_j) \cap D_i \subset [0, c], \qquad D_i \xrightarrow{\;1\;} D_j \text{ if } \hat f^{-1}(D_j) \cap D_i \subset [c, 1].$$
The infinite paths on this graph starting in $D_1$ are thus put in one-to-one correspondence with $X = \{i(x) : x \in [0, f(c)]\} = \{x \in \{0,1\}^{\mathbb{N}_0} : \sigma^k(x) \preceq_{pl} \nu \text{ for all } k \ge 0\}$. Therefore the edge-labeled Hofbauer tower is immediately the countable state automaton accepting the language $\mathcal{L}(X)$; see Figure 3.13 for the Fibonacci map. Such automata are discussed at length in [565, Chapters 5 & 6]. If $c$ has an infinite orbit, then the sets $D_n$ are all different. Therefore the corresponding unimodal shift has $F(k) = k + 1$ distinct follower sets associated to $k$-words; see (3.14) and Theorem 3.76. If $c$ is preperiodic, then there are only finitely many different levels $D_n$, and $\mathcal{L}(X)$ is sofic; in fact it is an SFT if $c$ is periodic.


Figure 3.13. The edge-labeled Markov graph for the Fibonacci map.

One can extend the Hofbauer tower so as to account for the co-cutting times as well. Set $\hat D_1 = [0,1]$ and inductively
$$\hat D_{n+1} = \begin{cases} f(\hat D_n) & \text{if } c \notin \hat D_n,\\ f(E_n) & \text{if } c \in \hat D_n \text{ and } E_n \text{ is the component of } \hat D_n \setminus \{c\} \text{ containing } c_n.\end{cases}$$
See Figure 3.12 for the Hofbauer tower and extended Hofbauer tower of the unimodal Fibonacci map. Then $c_n \in \hat D_n$ for all $n \ge 1$, and there is a neighborhood $Z_{n-1} \ni f(c)$ such that $f^{n-1} : Z_{n-1} \to \hat D_n$ is monotone onto. Also, if $c \in \hat D_n$, then $n$ is a cutting or a co-cutting time. More precisely,
$$\text{if } c \in \hat D_n, \text{ then } n \text{ is a } \begin{cases} \text{cutting time} & \text{if } c \in D_n,\\ \text{co-cutting time} & \text{if } c \in \hat D_n \setminus D_n.\end{cases} \tag{3.21}$$
It is clear from this that the cutting and co-cutting times are disjoint sequences.
Theorem 3.90. A sequence $\nu = 10\cdots$ is an admissible kneading sequence if and only if one of the following equivalent conditions is satisfied:
(a) $\sigma(\nu) \preceq_{pl} \sigma^n(\nu) \preceq_{pl} \nu$ for all $n \in \mathbb{N}_0$.
(b) The kneading map is well-defined by (3.20) above, and (according to Hofbauer [316])
$$\{Q(k+j)\}_{j\ge1} \succeq_{lex} \{Q(Q^2(k)+j)\}_{j\ge1} \tag{3.22}$$
for all $k \ge 1$, where $\succeq_{lex}$ stands for the lexicographical order on sequences. Here we set $Q(0) = 0$ by convention.
(c) If $\rho(m) < \infty$, then $\rho(m) - m$ is a cutting time.
(d) The sequences of cutting times $\{S_k\}_{k\ge0}$ and co-cutting times $\{\hat S_\ell\}_{\ell\ge0}$ (see Exercise 3.88) are disjoint.

Proof. We first show that admissibility implies the four conditions (a)–(d).
The necessity of condition (a) is shown in Theorem 3.83.
Condition (d) follows directly from (3.21).
Define the closest precritical point $\zeta \in [0,1]$ as any point such that $f^n(\zeta) = c$ for some $n \ge 1$ and $f^k(x) \neq c$ for all $k \le n$ and $x \in (\zeta, c)$. By symmetry, if $\zeta$ is a closest precritical point, $\hat\zeta = 1 - \zeta$ is also a closest precritical point. If $\zeta' \in (\zeta, c)$ is the closest precritical point of lowest $n' > n$, then the itineraries of $f(c)$ and $f(x)$, $x \in (\hat\zeta', \hat\zeta)$, coincide for exactly $n' - 2$ entries and differ at entry $n' - 1$. Hence $n'$ is a cutting time, say $n' = S_{k'}$ for some $k' \ge 1$. We use the notation $\zeta = \zeta_k$ if $n = S_k$. That is,
$$f^{S_k}(\zeta_k) = f^{S_k}(\hat\zeta_k) = c, \qquad \cdots < \zeta_k < \zeta_{k+1} < \cdots < c < \cdots < \hat\zeta_{k+1} < \hat\zeta_k < \cdots,$$
$$x \in \Upsilon_k := (\zeta_{k-1}, \zeta_k] \cup [\hat\zeta_k, \hat\zeta_{k-1}) \implies i(f(x)) = \nu_1 \cdots \nu_{S_k - 1}\overline{\nu_{S_k}}\cdots. \tag{3.24}$$
Applying this to $x = f^m(c)$, we obtain that $\rho(m) - m$ is a cutting time, and this proves (c).

Figure 3.14. The points $\zeta_{Q(k)} < c_{S_{k-1}} < \zeta_{Q(k)-1}$ and their images under $f^{S_{Q(k)}}$.

In particular,
$$f^{S_{k-1}}(c) \in \Upsilon_{Q(k)} = (\zeta_{Q(k)-1}, \zeta_{Q(k)}] \cup [\hat\zeta_{Q(k)}, \hat\zeta_{Q(k)-1}); \tag{3.25}$$
see Figure 3.14. The larger $Q(k)$, the closer $f^{S_{k-1}}(c)$ is to $c$.
Formula (3.22) can be interpreted geometrically as $c_{S_k} \in [c, c_{S_{Q^2(k)}}]$; see Figure 3.14. To see this, apply $f^{S_{Q(k)}}$ to the points $\zeta_{Q(k)}$, $c_{S_{k-1}}$, and $\zeta_{Q(k)-1}$. We find $c_{S_k} \in (c, c_{S_{Q^2(k)}})$, so $c_{S_k}$ is closer to $c$ than $c_{S_{Q^2(k)}}$ is. This implies that $Q(k+1) \ge Q(Q^2(k)+1)$. If the inequality is strict, then (3.22) holds. Otherwise, i.e. if $Q(k+1) = Q(Q^2(k)+1)$, then both $c_{S_k}$ and $c_{S_{Q^2(k)}} \in (\zeta_{Q(k+1)}, \zeta_{Q(k+1)-1})$, and we apply $f^{S_{Q(k+1)}}$, which maps $(\zeta_{Q(k+1)}, \zeta_{Q(k+1)-1})$ into $[c, f^{S_{Q^2(k+1)}}(c))$. Therefore $c_{S_{k+1}} \in (c, c_{S_{Q^2(k)+1}})$. This shows that $Q(k+2) \ge Q(Q^2(k)+2)$. If the inequality is strict, then again (3.22) holds; otherwise both $c_{1+S_{k+1}}$ and $c_{1+S_{Q^2(k)+1}} \in \Upsilon_{Q(k+2)}$, and we can apply $f^{S_{Q(k+2)}}$. Repeating the argument shows that (3.22) holds in any case, and (b) is proven.
(c) ⇒ (a): Since $\rho(m) - m$ is a cutting time, $\#\{m < j \le \rho(m) : \nu_j = 1\}$ is even by Lemma 3.85. Hence $\sigma^m(\nu) \preceq_{pl} \nu$ (cf. Exercise 3.86). Since $\nu_1 = 1$, the parity-lexicographical order implies that $\sigma^{m+1}(\nu) \succeq_{pl} \sigma(\nu)$ for all $m$.

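(a) ⇒ (b): Writing $\overline{\nu_j}$ for the symbol $\nu_j$ switched, we have
$$\nu_1 \cdots \nu_{S_k} = \nu_1 \cdots \nu_{S_{k-1}}\,\nu_1 \cdots \nu_{S_{Q(k)}-1}\overline{\nu_{S_{Q(k)}}} = \nu_1 \cdots \nu_{S_{k-1}}\,\nu_1 \cdots \nu_{S_{Q(k)-1}}\,\nu_1 \cdots \nu_{S_{Q^2(k)}},$$
so $\nu_1 \cdots \nu_{S_{Q^2(k)}}$ is a suffix of $\nu_1 \cdots \nu_{S_k}$. Suppose by contradiction that (3.22) fails at entry $k$, say $Q(k+j) = Q(Q^2(k)+j)$ for $1 \le j \le j_0$ and $Q(k+j_0+1) < Q(Q^2(k)+j_0+1)$. Then
$$\begin{aligned}
\sigma^{S_k - S_{Q^2(k)}}(\nu) &= \underbrace{\nu_1 \cdots \nu_{S_{Q^2(k)}}}_{\text{odd}}\;\underbrace{\nu_1 \cdots \overline{\nu_{S_{Q(k+1)}}}}_{\text{even}} \cdots \underbrace{\nu_1 \cdots \overline{\nu_{S_{Q(k+j_0)}}}}_{\text{even}}\;\underbrace{\nu_1 \cdots \overline{\nu_{S_{Q(k+j_0+1)}}}}_{\text{even}} \cdots\\
&= \nu_1 \cdots \nu_{S_{Q^2(k)}}\;\nu_1 \cdots \overline{\nu_{S_{Q(Q^2(k)+1)}}} \cdots \nu_1 \cdots \overline{\nu_{S_{Q(Q^2(k)+j_0)}}}\;\nu_1 \cdots \overline{\nu_{S_{Q(k+j_0+1)}}} \cdots\\
&= \nu_1 \cdots \overline{\nu_n} \cdots \succ_{pl} \nu,
\end{aligned}$$
where $n$ is not a cutting time because $S_{Q^2(k)+j_0} < n < S_{Q^2(k)+j_0+1}$. This contradicts (3.17).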
(b) ⇒ (d): First we claim that if $S_{k-1} < \hat S_\ell < S_k < \hat S_{\ell+1}$, then $S_k - \hat S_\ell = S_{Q^2(k)}$. This is true for $\hat S_0 = \kappa$ and $S_k = \kappa + 1$, because then $S_k - \hat S_0 = 1 = S_{Q^2(k)}$.
Assume now by induction that $S_{k-1} < \hat S_\ell = S_k - S_{Q^2(k)} < S_k$ for some $k, \ell$. Then (3.22) gives $\hat S_{\ell+1} = S_{k'} + S_j$ for some $k' \ge k$ and $j < Q(k'+1)$. But since $\nu_{S_{k'}+1} \cdots \nu_{S_{k'+1}} = \nu_1 \cdots \nu_{S_j} \cdots \overline{\nu_{S_{Q(k'+1)}}}$, the integers $S_{k'} + S_j, S_{k'} + S_{j+1}, \ldots, S_{k'} + S_{j'}$ are co-cutting times for all $j \le j' < Q(k'+1)$. The largest such integer is $\hat S_{\ell'} := S_{k'} + S_{Q(k'+1)-1}$, so
$$S_{k'+1} - \hat S_{\ell'} = S_{k'+1} - (S_{k'} + S_{Q(k'+1)-1}) = S_{Q(k'+1)} - S_{Q(k'+1)-1} = S_{Q^2(k'+1)},$$
and this completes the induction step. But repeating this step also shows that $\{S_k\}_{k\ge0}$ and $\{\hat S_\ell\}_{\ell\ge0}$ are disjoint, so (d) holds.
It remains to prove that (d) implies admissibility. For this we will use the quadratic family $f_a(x) = ax(1-x)$, $a \in [0,4]$, with critical point $c = \frac12$, to which we assign the symbol $*$. Let
$$A_{\nu_1\cdots\nu_n} := \{a \in [0,4] : \text{the kneading sequence } \nu(f_a) \text{ starts with } \nu_1 \cdots \nu_n\}.$$
Then $A_0 = [0, 2)$, $A_1 = (2, 4]$, while $f_2(c) = c$. Also $A_{11} = (2, 1+\sqrt5)$, $A_{10} = (1+\sqrt5, 4]$, while $f_{1+\sqrt5}^2(c) = c$. We are only interested in kneading sequences starting with 10, so we continue with $A_{10}$.
Define $\varphi_n(a) := f_a^n(c)$. It is easy to check that $\varphi_2 : A_{10} \to [0, c] = [f_4^2(c), f_{1+\sqrt5}^2(c)]$ is monotone onto. We claim that this holds in general: for all prefixes $\nu_1 \cdots \nu_n$ of some $\nu$ satisfying (b),
$$\varphi_n : A_{\nu_1\cdots\nu_n} \to [f_{a_1}^{n-S_{\max}}(c), f_{a_2}^{n-\hat S_{\max}}(c)] \text{ is monotone onto}, \tag{3.26}$$
where $S_{\max}$ and $\hat S_{\max}$ are the largest cutting and co-cutting times in $\nu_1 \cdots \nu_n$ (see Exercise 3.88), $a_1, a_2$ are the boundary points of $A_{\nu_1\cdots\nu_n}$, and the order of the boundary points in $[f_{a_1}^{n-S_{\max}}(c), f_{a_2}^{n-\hat S_{\max}}(c)]$ may be the other way around. Also $\hat S_{\max} = 0$ if $\nu_1 \cdots \nu_n = 10\cdots0$, by convention.
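If $n+1$ is neither a cutting nor a co-cutting time, then $\nu_{n-S_{\max}+1} = \nu_{n-\hat S_{\max}+1} = \nu_{n+1}$, so $(f_{a_1}^{n-S_{\max}+1}(c), f_{a_2}^{n-\hat S_{\max}+1}(c)) \not\ni c$. Therefore $A_{\nu_1\cdots\nu_{n+1}} = A_{\nu_1\cdots\nu_n}$, and $S_{\max}$ and $\hat S_{\max}$ remain the same. Also $f_a : f^n(A_{\nu_1\cdots\nu_{n+1}}) \to \varphi_{n+1}(A_{\nu_1\cdots\nu_{n+1}})$ is monotone onto, so (3.26) holds for $A_{\nu_1\cdots\nu_{n+1}}$.
If $n+1$ is a cutting time, then $\nu_{n-S_{\max}+1} \neq \nu_{n-\hat S_{\max}+1} = \nu_{n+1}$, so
$$(f_{a_1}^{n-S_{\max}+1}(c), f_{a_2}^{n-\hat S_{\max}+1}(c)) \ni c.$$
Now $A_{\nu_1\cdots\nu_{n+1}}$ is a proper subset of $A_{\nu_1\cdots\nu_n}$, the largest cutting time becomes $S_{\max} = n+1$ (so that $n + 1 - S_{\max} = 0$), and $\hat S_{\max}$ remains the same. Again $f_a : f^n(A_{\nu_1\cdots\nu_{n+1}}) \to \varphi_{n+1}(A_{\nu_1\cdots\nu_{n+1}})$ is monotone onto, so (3.26) holds for $A_{\nu_1\cdots\nu_{n+1}}$.
If $n+1$ is a co-cutting time, then $\nu_{n-\hat S_{\max}+1} \neq \nu_{n-S_{\max}+1} = \nu_{n+1}$, so
$$(f_{a_1}^{n-S_{\max}+1}(c), f_{a_2}^{n-\hat S_{\max}+1}(c)) \ni c.$$
Again, $A_{\nu_1\cdots\nu_{n+1}}$ is a proper subset of $A_{\nu_1\cdots\nu_n}$, the largest co-cutting time becomes $\hat S_{\max} = n+1$, and $S_{\max}$ remains the same. Also $f_a : f^n(A_{\nu_1\cdots\nu_{n+1}}) \to \varphi_{n+1}(A_{\nu_1\cdots\nu_{n+1}})$ is monotone onto, so (3.26) holds for $A_{\nu_1\cdots\nu_{n+1}}$.
Since $A_{\nu_1\cdots\nu_{n+1}} \subset A_{\nu_1\cdots\nu_n}$, $\bigcap_{n\ge2} A_{\nu_1\cdots\nu_n} \neq \emptyset$. If $\nu$ is periodic and $\nu_n = *$, then there is $a \in \partial A_{\nu_1\cdots\nu_n}$ with $\nu(f_a) = \nu$. Otherwise, $A_{\nu_1\cdots\nu_{n+1}} \subsetneq A_{\nu_1\cdots\nu_n}$ infinitely often, so $\bigcap_{n\ge2} A_{\nu_1\cdots\nu_n} \neq \emptyset$ and $\nu(f_a) = \nu$ for each $a \in \bigcap_{n\ge2} A_{\nu_1\cdots\nu_n}$. ∎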
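Condition (3.22) of Theorem 3.90 is easy to test on a finite piece of a kneading map. This Python sketch (hypothetical function name) stores $Q$ as a list with $Q(0) = 0$ and compares only the entries that are available, so a `True` answer means only that no violation was found among the given entries. The second example, $Q = (0, 0, 1, 0, 0)$, is a made-up map that fails (3.22) at $k = 2$.

```python
def hofbauer_condition(Q):
    """Check (3.22) on a finite kneading map Q = [Q(0)=0, Q(1), ..., Q(K)].

    For every k, the tail Q(k+1), Q(k+2), ... must be lexicographically at
    least Q(Q^2(k)+1), Q(Q^2(k)+2), ...; only available entries are compared.
    """
    K = len(Q) - 1
    for k in range(1, K + 1):
        m = Q[Q[k]]                      # Q^2(k) = Q(Q(k)) < k
        j = 1
        while k + j <= K:
            if Q[k + j] != Q[m + j]:
                if Q[k + j] < Q[m + j]:
                    return False         # (3.22) fails at k
                break                    # strict inequality: (3.22) holds at k
            j += 1
    return True


print(hofbauer_condition([0, 0, 0, 1, 2, 3, 4, 5, 6]))   # Fibonacci map: True
print(hofbauer_condition([0, 0, 1, 0, 0]))               # False
```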

Exercise 3.91. Show that if $S_{k-1} < n < S_k \le \rho(n)$, then $S_k - n$ is a cutting time.

3.6.4. Kneading Determinants and Topological Entropy. The theory of kneading determinants was developed by Milnor & Thurston [420]
(see also the exposition in [414, Section II.8]) in order to address properties of topological entropy and counting periodic orbits for unimodal maps $f : I \to I$. Recall the alternative way of symbolic dynamics for unimodal maps from Exercise 3.80, evaluated at the critical value:
$$\theta_k = \begin{cases} +1 & \text{if } f^k \text{ is increasing at } f(c),\\ -1 & \text{if } f^k \text{ is decreasing at } f(c).\end{cases} \tag{3.27}$$
The power series
$$D_f(t) := 1 + \theta_1 t + \theta_2 t^2 + \theta_3 t^3 + \cdots \tag{3.28}$$

is called the kneading determinant of the unimodal map $f$. To deal with the case that $f^p(c) = c$ for some (minimal) period $p \ge 1$, it is more accurate to say increasing/decreasing on a left neighborhood of $f(c)$ instead of at $f(c)$. Thus $\theta_p = 1$ in this case, and $D_f(t)$ has a periodic sequence of coefficients. Therefore $D_f(t) = \frac{P(t)}{1 - t^p}$ for some polynomial $P$ of degree $p - 1$ with coefficients $\pm1$ if $c$ is $p$-periodic (see Table 3.1¹⁹ and Corollary 3.96), but this rational function can sometimes be reduced to a simpler form. For example, if $f^2(c) = c$, then the itinerary of $f(c)$ is $i(f(c)) = 1*1*\ldots$ and
$$D_f(t) = 1 - t + t^2 - t^3 + t^4 - \cdots = \frac{1-t}{1-t^2} = \frac{1}{1+t}.$$

Example 3.92. Hofbauer [315] showed that tent maps $T_s(x)$ are intrinsically ergodic. If $s > 1$, then the measure of maximal entropy is absolutely continuous w.r.t. Lebesgue measure, and its density is given explicitly as
$$\frac{d\mu}{dx} = \sum_{n\ge0} \frac{\theta(n)}{s^n}\, 1_{[T_s^{n+1}(\frac12), \frac s2]}(x), \tag{3.29}$$
for $\theta(n)$ as in (3.27); see [195, Section 5.3]. In fact, (3.29) extends to skew tent maps
$$T_{s,t} : [0,1] \to [0,1], \qquad T_{s,t}(x) = \begin{cases} sx & \text{if } 0 \le x \le c := \frac{t}{s+t},\\ t(1-x) & \text{if } c \le x \le 1,\end{cases} \tag{3.30}$$
with slopes $s, t > 1$, $st \le s + t$, as
$$\frac{d\mu}{dx} = \sum_k \frac{1}{(T_{s,t}^k)'(T_{s,t}(c))}\, 1_{[T_{s,t}^k(c), T_{s,t}(c)]};$$
see [330]. Further results in this direction can be found in [282, 370].

The main result of this section relates the kneading determinant to the topological entropy of the map. The rest of this section leads up to its proof.
Theorem 3.93. The topological entropy $h_{top}(f) > 0$ if and only if $t_0 := \inf\{t > 0 : D_f(t) = 0\} \in (0,1)$, and in this case $h_{top}(f) = -\log t_0$.

By setting $0 < * < 1$ we can extend $\prec_{pl}$ to sequences in $\{0, *, 1\}^{\mathbb{N}}$ with the property that if $e_m = *$, then $\sigma^m(e) = \nu$ is the kneading sequence $\nu$ of
Milnor & Thurston [420] used formal power series rather than symbolic dynamics to phrase their kneading theory. This is a bit more involved, but for many purposes a very powerful method. Let us interpret the intervals
19 Exercise 7.15 gives a precise recursive formula of the lap-number of the Feigenbaum map, and Exercise 3.98 gives the kneading determinant of the quadratic map with a period 3 critical point.

Table 3.1. Kneading determinants and lap-numbers for the quadratic family $f_a(x) = ax(1-x)$.

Attractor | Kneading det. $D_{f_a}(t)$ | Lap-number $\ell(f_a^n|_{[0,1]})$
period 1 | $\frac{1}{1-t}$ | $2$
period 2 | $\frac{1}{1+t}$ | $2n$
period 4 | $\frac{1-t}{1+t^2}$ | $n^2 - n + 2$
period 8 | $\frac{(1-t)(1-t^2)}{1+t^4}$ | $\frac{1}{12}(2n^3 - 3n^2 + 22n + \{0 \text{ or } 3\})$
period $2^{r+1}$ | $\frac{(1-t)(1-t^2)\cdots(1-t^{2^{r-1}})}{1+t^{2^r}}$ | polynomial of degree $r+1$
Feigenbaum | $\prod_{r\ge0}(1 - t^{2^r})$ | superpolynomial
period 3 | $1 - t - t^2$ | $2F_n$ (Fibonacci number)

$E_0 := [f^2(c), c)$ and $E_1 := (c, f(c)]$ as formal unit vectors associated with symbols 0 and 1. For $j \ge 0$ define
$$\varepsilon_j(x) := \begin{cases} +1 & \text{if } f^j(x) \in E_0,\\ 0 & \text{if } f^j(x) = c,\\ -1 & \text{if } f^j(x) \in E_1,\end{cases}$$
expressing the fact that $f|_{E_0}$ preserves and $f|_{E_1}$ reverses orientation. Then the product
$$\Theta_n(x) := \varepsilon_0(x) \cdots \varepsilon_{n-1}(x) \cdot \begin{cases} E_k & \text{if } f^n(x) \in E_k \text{ for } k = 0, 1,\\ \frac12(E_0 + E_1) & \text{if } f^n(x) = c,\end{cases}$$
is a formal vector expressing where $f^n(x)$ is situated and whether $f^n$ is locally increasing, decreasing or assumes an extremum at $x$. The vector-valued formal power series
$$\Theta(x, t) = \sum_{n\ge0} \Theta_n(x)\, t^n$$
is called the invariant coordinate of $x$.
Lemma 3.94. The sum of the coefficients of $E_j$, $j = 0, 1$, satisfies
$$\sum_{j=0}^{1} (1 - \varepsilon(E_j)t) \cdot \delta_j(\Theta(x,t)) = 1 \tag{3.31}$$
for every point $x$. Here the Kronecker delta ($\delta_i(E_j) = 1$ if $i = j$ and $\delta_i(E_j) = 0$ otherwise) is extended by linearity to vectors with $\mathbb{Q}[t]$-valued coefficients.
Example 3.95. Before proving this lemma, let us see how this works out for the fixed points of $f$. The orientation-reversing fixed point $\alpha \in E_1$ has $\Theta(\alpha) = (1 - t + t^2 - t^3 + \cdots)E_1$, so formally $(1 - \varepsilon(E_1)t)\,\delta_1(\Theta(\alpha)) = (1+t)(1 - t + t^2 - t^3 + \cdots) = 1$, whereas $\delta_0(\Theta(\alpha)) = 0$, because $E_0$ doesn't appear in $\Theta(\alpha)$.
For the orientation-preserving fixed point $\beta \in E_0$ it works similarly with indices and signs reversed: $\Theta(\beta) = (1 + t + t^2 + t^3 + \cdots)E_0$, so $(1 - \varepsilon(E_0)t)\,\delta_0(\Theta(\beta)) = (1-t)(1 + t + t^2 + t^3 + \cdots) = 1$.
If $f$ also has a period two orbit $\{\gamma_0, \gamma_1\}$ with $\gamma_i \in E_i$, then we can compute
$$\Theta(\gamma_0) = \frac{E_0 + tE_1}{1 + t^2}, \qquad \Theta(\gamma_1) = \frac{E_1 - tE_0}{1 + t^2}.$$
Therefore $\frac{1-t}{1+t^2} + \frac{(1+t)t}{1+t^2} = 1 = \frac{(1-t)(-t)}{1+t^2} + \frac{1+t}{1+t^2}$, as Lemma 3.94 claimed.
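Lemma 3.94 can also be checked on a truncated invariant coordinate. The sketch below is an illustration under explicit assumptions: it uses the tent map with slope $9/5$ in exact arithmetic, treats the whole of $[0, c)$ as $E_0$ and $(c, 1]$ as $E_1$ (the telescoping argument only uses the signs $\varepsilon_j$), and starts at the arbitrary point $x = 3/7$, whose orbit never hits $c$. After truncating $\Theta(x,t)$ at $t^{N-1}$, the left-hand side of (3.31) should equal $1$ plus a single boundary term in $t^N$. The names are hypothetical.

```python
from fractions import Fraction

SLOPE = Fraction(9, 5)     # hypothetical tent-map slope
C = Fraction(1, 2)         # critical point


def tent(x):
    return SLOPE * x if x <= C else SLOPE * (1 - x)


def invariant_coordinate(x, N):
    """Truncated Theta(x, t) = a(t) E_0 + b(t) E_1 (coefficients of t^0..t^{N-1}).

    Left of c counts as E_0 (eps = +1), right of c as E_1 (eps = -1).
    Also returns eps_0(x) ... eps_{N-1}(x).
    """
    a, b = [0] * N, [0] * N
    sign = 1                               # eps_0(x) ... eps_{n-1}(x)
    for n in range(N):
        if x == C:
            raise ValueError("orbit hits c; this sketch assumes it does not")
        if x < C:
            a[n] = sign
        else:
            b[n] = sign
            sign = -sign                   # f reverses orientation on E_1
        x = tent(x)
    return a, b, sign


def lemma_lhs(a, b):
    """Coefficients of (1 - t) a(t) + (1 + t) b(t), cf. (3.31)."""
    out = [0] * (len(a) + 1)
    for n, (an, bn) in enumerate(zip(a, b)):
        out[n] += an + bn
        out[n + 1] += bn - an
    return out


N = 30
a, b, last_sign = invariant_coordinate(Fraction(3, 7), N)
lhs = lemma_lhs(a, b)
print(lhs[:5])   # [1, 0, 0, 0, 0]
assert lhs[0] == 1 and not any(lhs[1:N])
assert lhs[N] == -last_sign            # the only term left by the truncation
```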

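Proof. We write $\sum_{j=0}^{1}(1 - \varepsilon(E_j)t)\,\delta_j(\Theta(x,t))$ as a double sum and assume for simplicity that $\operatorname{orb}_f(x) \not\ni c$:
$$\begin{aligned}
\sum_{j=0}^{1}\sum_{n\ge0}(1 - \varepsilon(E_j)t)\,\delta_j(\Theta_n(x))\,t^n
&= \sum_{j=0}^{1}(1 - \varepsilon(E_j)t)\sum_{\substack{n\ge0\\ f^n(x)\in E_j}}\prod_{k=0}^{n-1}\varepsilon_k(x)\,t^n\\
&= \sum_{\substack{n\ge0\\ f^n(x)\in E_0}}\prod_{k=0}^{n-1}\varepsilon_k(x)\,t^n - \sum_{\substack{n\ge0\\ f^n(x)\in E_0}}\prod_{k=0}^{n}\varepsilon_k(x)\,t^{n+1}\\
&\quad + \sum_{\substack{n\ge0\\ f^n(x)\in E_1}}\prod_{k=0}^{n-1}\varepsilon_k(x)\,t^n - \sum_{\substack{n\ge0\\ f^n(x)\in E_1}}\prod_{k=0}^{n}\varepsilon_k(x)\,t^{n+1}.
\end{aligned}$$
Formally, all positive powers $t^n$ cancel, leaving only $\sum_{x\in E_0} t^0 + \sum_{x\in E_1} t^0 = 1$. If $f^n(x) = c$ for some $n$, then the definition of $\Theta_n(x)$ allows a similar computation. ∎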

The qualitative behavior of the entire interval map is given by the invariant coordinate of the critical value. In this terminology, the kneading increment
$$\nu(t) := \lim_{x\uparrow c}\Theta(x,t) - \lim_{x\downarrow c}\Theta(x,t)$$
is the object closest to our kneading sequence.²⁰ This formula obviously expresses the change of kneading coordinate $\Theta(x)$ as the point $x$ moves across $c$, but it can also be used to express a change of kneading coordinate $\Theta(x)$ as the point $x$ moves across $z$ for any precritical point $z$, say $f^n(z) = c$ for some minimal $n \ge 0$:
$$\lim_{x\uparrow z}\Theta(x,t) - \lim_{x\downarrow z}\Theta(x,t) = t^n\nu(t).$$
Milnor & Thurston [420] continue to define kneading matrices and kneading determinants $D_f(t)$, which in the unimodal case are equal to
$$D_f(t) = \tfrac12\big(\delta_0(\nu(t)) - \delta_1(\nu(t))\big) = 1 + \varepsilon_1(c_1)t + \varepsilon_1(c_1)\varepsilon_2(c_1)t^2 + \cdots, \tag{3.32}$$
which is the same as (3.28).

20 We changed the sign from the definition on page 483 of [420] because in our setting $f$ assumes a maximum rather than a minimum at the critical point. The same construction, with $d$ formal unit vectors $E_k$, $k = 0, 1, \ldots, d-1$, can be carried out for a $(d-1)$-modal interval map (i.e. with $d-1$ critical points) and also (although not covered in [420]) for piecewise continuous maps.

Corollary 3.96. If the critical point $c$ is periodic of period $p$, then the kneading determinant $D(t)$ is a polynomial of degree $p - 1$. If $c$ is attracted to a periodic orbit $\operatorname{orb}(x)$, then the kneading determinant is rational:
$$D_f(t) = \frac{P(t)}{1 \pm t^p},$$
where $p$ is the period of $x$, $P$ is a polynomial of degree $< p$, and the sign $\pm$ is according to whether $f^p$ reverses or preserves orientation at $x$.

Proof. If $f^p(c) = c$, then $\varepsilon_p = 0$, so $D_f(t)$ is truncated at the $(p-1)$-st term.
If $c$ is attracted to a periodic attractor, rather than periodic itself, then $c$ is in the immediate basin of some periodic point $x$. Writing $\varepsilon_0 = 1$, we obtain $\varepsilon_0 \cdot \varepsilon_1 \cdots \varepsilon_{p-1} = \varepsilon_{kp} \cdot \varepsilon_{kp+1} \cdots \varepsilon_{kp+p-1} = \pm1$ for all $k$, where the sign only depends on whether $f^p$ preserves or reverses orientation. Hence $D_f(t) = P(t)\sum_{k\ge0}(\pm t^p)^k = \frac{P(t)}{1 \mp t^p}$ for the polynomial $P(t) = 1 + \varepsilon_1(c_1)t + \varepsilon_1(c_1)\varepsilon_2(c_1)t^2 + \cdots + \varepsilon_1(c_1)\varepsilon_2(c_1)\cdots\varepsilon_{p-1}(c_1)t^{p-1}$. ∎


For an open interval $J \subset I$ and $n \ge 0$, let $\gamma_n(J)$ be the number of precritical points $z \in J$ such that $n$ is the minimal integer such that $f^n(z) = c$, forming the formal power series $\gamma(J) = \sum_{n\ge0}\gamma_n(J)t^n$. Define also the lap-number $\ell(f^n|_J)$ as the number of maximal subintervals of $J$ on which $f^n$ is monotone²¹.

21 But not necessarily strictly monotone; even if $f^n|_J$ has flat pieces, this does not count towards the lap-number.

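Then, as formal power series,
$$\begin{aligned}
(1-t)\sum_{n=1}^{\infty}\ell(f^n|_J)\,t^{n-1} &= 1 - \sum_{n=0}^{\infty}\ell(f^n|_J)\,t^n + \sum_{n=0}^{\infty}\ell(f^{n+1}|_J)\,t^n\\
&= 1 + \sum_{n=0}^{\infty}\big(\ell(f^{n+1}|_J) - \ell(f^n|_J)\big)t^n\\
&= 1 + \sum_{n=0}^{\infty}\gamma_n(J)\,t^n,
\end{aligned}$$
so that
$$(1-t)\sum_{n=1}^{\infty}\ell(f^n|_J)\,t^{n-1} = 1 + \sum_{n=0}^{\infty}\gamma_n(J)\,t^n. \tag{3.33}$$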

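Lemma 3.97. For the interval $J = (a,b)$, the difference satisfies
$$\lim_{x\uparrow b}\Theta(x,t) - \lim_{x\downarrow a}\Theta(x,t) = \gamma(J)\nu(t).$$
Also $\gamma([0,1]) = (1-t)^{-1}D_f(t)^{-1}$ and
$$\sum_{n=1}^{\infty}\ell(f^n)\,t^{n-1} = \frac{1}{(1-t)^2}\big(1 - t + D_f(t)^{-1}\big). \tag{3.34}$$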

Proof. The difference $\lim_{x\uparrow b}\Theta(x,t) - \lim_{x\downarrow a}\Theta(x,t)$ is the sum of the increments of all precritical points $z$. Each precritical point of order $n$ gives a contribution of $t^n\nu(t)$, and $\gamma(J)$ counts how many order $n$ precritical points there are, giving them weight $t^n$. So the first formula follows.
Since
$$\Theta(\beta) = E_0(1 + t + t^2 + \cdots) = \frac{E_0}{1-t}, \qquad \Theta(-\beta) = E_1 - E_0(t + t^2 + t^3 + \cdots) = E_1 - \frac{tE_0}{1-t},$$
we can use formula (3.32) to simplify for $J = [0,1]$:
$$\begin{aligned}
\gamma([0,1])D(t) &= \tfrac12\big(\delta_0(\gamma(t)\nu(t)) - \delta_1(\gamma(t)\nu(t))\big)\\
&= \tfrac12\,\delta_0\Big(\lim_{x\uparrow1}\Theta(x,t) - \lim_{x\downarrow0}\Theta(x,t)\Big) - \tfrac12\,\delta_1\Big(\lim_{x\uparrow1}\Theta(x,t) - \lim_{x\downarrow0}\Theta(x,t)\Big)\\
&= \frac12\Big(\frac{1}{1-t} + \frac{t}{1-t} + 1\Big) = \frac{1}{1-t}.
\end{aligned}$$
This gives $\gamma([0,1]) = (1-t)^{-1}D_f(t)^{-1}$. Combined with (3.33), we also get $\sum_{n=1}^{\infty}\ell(f^n)\,t^{n-1} = \frac{1}{(1-t)^2}\big(1 - t + D_f(t)^{-1}\big)$. ∎
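Formula (3.34) turns a kneading determinant into lap-numbers by power series arithmetic. The following sketch (hypothetical names; integer coefficients, and a numerator of $D_f$ with constant term 1, are assumed) recomputes the first eight lap-numbers for the period 2, 4 and 8 rows of Table 3.1.

```python
def series_div(num, den, N):
    """First N power-series coefficients of num(t)/den(t); requires den[0] == 1."""
    num = list(num) + [0] * N
    out = []
    for n in range(N):
        acc = num[n]
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * out[n - j]
        out.append(acc)
    return out


def poly_mul(p, q):
    """Product of two polynomials given by coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def lap_numbers(D_num, D_den, N):
    """l(f^1), ..., l(f^N) from (3.34), for D_f(t) = D_num(t) / D_den(t)."""
    c = series_div(D_den, D_num, N)       # D_f(t)^{-1}; needs D_num[0] == 1
    c[0] += 1
    c[1] -= 1                             # c(t) = 1 - t + D_f(t)^{-1}
    # dividing by (1 - t)^2 = convolving with the coefficients 1, 2, 3, ...
    return [sum(c[j] * (n - j) for j in range(n)) for n in range(1, N + 1)]


N = 8
period2 = lap_numbers([1], [1, 1], N)                      # D = 1/(1+t)
period4 = lap_numbers([1, -1], [1, 0, 1], N)               # D = (1-t)/(1+t^2)
period8 = lap_numbers(poly_mul([1, -1], [1, 0, -1]), [1, 0, 0, 0, 1], N)
print(period4)   # [2, 4, 8, 14, 22, 32, 44, 58]
print(period8)   # [2, 4, 8, 14, 24, 38, 58, 84]
```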

Exercise 3.98. Let the parameter $a = 3.83187405528332\ldots$ of the quadratic family $f_a(x) = ax(1-x)$ be such that $f^3(c) = c$ for the critical point $c = \frac12$.

(1) Show that the kneading determinant is $D_{f_a}(t) = \frac{1 - t - t^2}{1 - t^3}$.
(2) Show that
$$1 + \gamma(t) = \frac{2}{1 - t - t^2} = \sum_{n\ge0} 2F_n t^n,$$
for the Fibonacci numbers $F_0 = F_1 = 1$, $F_2 = 2, \ldots$.
(3) Hence show that $\ell(f_a^n|_{[0,1]}) = 2F_n$ and that $h_{top}(f_a) = \log\frac12(1 + \sqrt5)$.

The main result of this section follows directly from (3.34).

Proof of Theorem 3.93. The power series $\sum_{n=1}^{\infty}\ell(f^n)\,t^{n-1}$ converges for all $t$ less than its radius of convergence $R$. But by [422], the lap-number grows as $\ell(f^n) \sim e^{h_{top}(f)n}$, in the sense that $\frac1n\log\ell(f^n) \to h_{top}(f)$, so $-\log R = h_{top}(f)$. By (3.34), $R$ is also the first zero of the kneading determinant, as required. ∎
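Theorem 3.93 suggests a simple numerical procedure: truncate $D_f(t)$, find its first zero $t_0$ in $(0,1)$, and return $-\log t_0$. This is a rough sketch rather than a rigorous algorithm. It assumes $\theta_k = \varepsilon(\nu_1)\cdots\varepsilon(\nu_k)$ with $\varepsilon(0) = +1$ and $\varepsilon(1) = -1$, locates a sign change on a grid of step $10^{-3}$ (so a zero without sign change would be missed), and then bisects; all names are hypothetical. For the tent map $T_{3/2}$ the answer should be $t_0 = 2/3$, since $h_{top}(T_s) = \log s$. For the period-3 map of Exercise 3.98, reading $*$ as 1 so that the coefficients repeat $-1, -1, +1$, the first zero should be $(\sqrt5 - 1)/2$.

```python
import math
from fractions import Fraction


def theta_from_kneading(nu):
    """theta_k = eps(nu_1) ... eps(nu_k), with eps(0) = +1 and eps(1) = -1."""
    theta, sign = [], 1
    for symbol in nu:
        sign *= -1 if symbol == "1" else 1
        theta.append(sign)
    return theta


def kneading_det(t, theta):
    """Truncated D_f(t) = 1 + theta_1 t + theta_2 t^2 + ..."""
    return 1 + sum(th * t**k for k, th in enumerate(theta, start=1))


def first_zero(theta, step=1e-3, tol=1e-12):
    """First sign change of the truncated D_f on (0, 1): grid scan, then bisection."""
    lo, hi = 0.0, step
    while hi < 1 and kneading_det(hi, theta) > 0:
        lo, hi = hi, hi + step
    if hi >= 1:
        return None                           # no sign change detected
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kneading_det(mid, theta) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tent_kneading(s, n):
    """First n symbols of nu for T_s(x) = s*min(x, 1-x), in exact arithmetic."""
    half, x, word = Fraction(1, 2), Fraction(s) / 2, []
    for _ in range(n):
        if x == half:
            raise ValueError("c is periodic; this sketch does not handle '*'")
        word.append("0" if x < half else "1")
        x = s * x if x < half else s * (1 - x)
    return "".join(word)


t0_tent = first_zero(theta_from_kneading(tent_kneading(Fraction(3, 2), 80)))
print(t0_tent, -math.log(t0_tent), math.log(1.5))   # t0 close to 2/3

t0_p3 = first_zero(theta_from_kneading("101" * 70))
print(t0_p3, (math.sqrt(5) - 1) / 2)
```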

Remark 3.99. Theorem 3.93 amounts to a result from [195, Section 3] stating that the slope $s$ of the tent map $T_s(x)$ satisfies $s = \sum_{j=0}^{\infty}\Theta_j(c_1)s^{-j}$, which can be rewritten to
$$D_{T_s}\Big(\frac1s\Big) = \sum_{k\ge0}\frac{1}{DT_s^k(T_s(c))} = 0.$$
As discovered in [330] as a consequence of their so-called $f$-expansion formula, this fact extends to skew tent maps $T_{s,t}$ as $\sum_{k=0}^{\infty}\frac{1}{DT_{s,t}^k(T_{s,t}(c))} = 0$. One of the consequences is that the first zero of $D_f(t)$ on $[0,1)$ is $t_0 = \exp(-h_{top}(T_{s,t}))$ provided $h_{top}(T_{s,t}) > 0$.

The following theorem is one of the main results in [420], namely Theorem 9.2 and Corollary 10.7. We don't prove or need it for our purpose, but we mention it for completeness' sake.

Theorem 3.100. The reduced dynamical $\zeta$-function²²
$$\zeta(t) := \exp\Big(\sum_{k\ge1}\frac{\#\{x : f^k(x) = x\}}{k}\,t^k\Big)$$
of $f : \mathbb{R} \to \mathbb{R}$ satisfies
$$\zeta(t)^{-1} = \begin{cases}(1-t)D(t) & \text{if } c \text{ is non-periodic},\\ (1-t)(1-t^p)D(t) & \text{if } c \text{ is periodic of period } p.\end{cases} \tag{3.35}$$

3.6.5. Complex Kneading Theory. The standard and most direct extension of unimodal dynamics to the complex plane is via the quadratic family²³ $f_c(z) = z^2 + c$. This family is conjugate to $f_a(w) = aw(1-w)$ via
$$w = \psi(z) = \frac12 - \frac{z}{a} \quad \text{and} \quad c = \frac a2 - \frac{a^2}{4}, \qquad a = 1 + \sqrt{1 - 4c}.$$
The family $f_c$ has its own kneading theory with features interesting enough to devote a separate section to. Instead of symbolic dynamics on a core interval in the real setting, we now address the symbolic dynamics on a tree $H$ (called the Hubbard tree) or dendrite²⁴.
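The conjugacy between $f_c$ and $f_a$ stated above can be verified numerically. This hedged sketch uses Python's `cmath` (principal square root, so $a \neq 0$) and a few arbitrary sample values of $c$ and $z$; the function name is hypothetical. For instance $c = -1$ corresponds to $a = 1 + \sqrt5$, the parameter where $f_a^2(\frac12) = \frac12$.

```python
import cmath


def conjugacy_defect(c, z):
    """|psi(f_c(z)) - f_a(psi(z))| with psi(z) = 1/2 - z/a, a = 1 + sqrt(1 - 4c)."""
    a = 1 + cmath.sqrt(1 - 4 * c)          # principal branch, so a != 0

    def psi(w):
        return 0.5 - w / a

    def f_a(w):
        return a * w * (1 - w)

    return abs(psi(z * z + c) - f_a(psi(z)))


for c in (0.25, -1, 1j, -0.12 + 0.75j):
    a = 1 + cmath.sqrt(1 - 4 * c)
    assert abs(a / 2 - a * a / 4 - c) < 1e-12      # c = a/2 - a^2/4
    for z in (0.3, 1 + 1j, -0.7j):
        assert conjugacy_defect(c, z) < 1e-12
print("conjugacy verified on the samples")
```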

Definition 3.101. A quadratic Hubbard tree is a tree $H$ equipped with a continuous tree map $f_H : H \to H$ which is at most two-to-one and with a single critical point $c_0$, where $f_H$ is not a local homeomorphism onto its image, and all end-points lie on the critical orbit; see [124, 354, 452]. We extend this definition to $H$ being a dendrite; the properties of $f_H : H \to H$ remain the same, except that all post-critical points are end-points but there can be end-points that are in the closure of $\operatorname{orb}_{f_H}(c_0)$, but not in $\operatorname{orb}_{f_H}(c_0)$.

The Hubbard tree models the closed connected hull of the critical orbit within the Julia set $\mathcal{J}_c$ provided $\mathcal{J}_c$ is a dendrite. This applies to most of the parameters $c$ in the Mandelbrot set $\mathcal{M}$ that do not lie in the closure of any of its hyperbolic components. But also if $\mathcal{J}_c$ is not a dendrite (because it is not locally connected, or there are bounded Fatou components), there is always a topological model for the Hubbard tree that satisfies Definition 3.101.

22 Here we count at most one $k$-periodic point in each lap of $f^k$; if there are two such orbits with the period of one twice the period of the other (as is the case shortly after a period doubling bifurcation), then only the orbit of the smaller period is counted. Note, however, that the period need not be minimal: a $k$-periodic point also counts for $2k, 3k, \ldots$.
23 We use a different font to distinguish from $f_a(w) = aw(1-w)$. Also we will write $c_0 = 0$ for the critical point of $f_c$ (to distinguish it from the parameter), so $c_1 = f_c(c_0) = c$.
24 A dendrite is a compact, connected, locally connected (and therefore arc-connected) set without loops.

The questions to answer are:

(1) Which sequences in $\{0,1\}^{\mathbb{N}_0}$ are itineraries of points in the Hubbard tree?
(2) Which sequences in $\{0,1\}^{\mathbb{N}_0}$ are kneading sequences, i.e. the itinerary of the critical value in the Julia set?
(3) Determine the combinatorial structure of the Hubbard tree (branch-points, their number of arms, and relative position) from the kneading sequence.
Answers to questions (2) and (3) can be found in [126], but we will give combinatorial proofs in terms of the 0-1-sequences, showing the following:
• $c_1 = f_H(c_0)$ is an end-point of the Hubbard tree.
• Every branch-point is precritical or preperiodic; i.e. there are “no wandering triangles” (see [535] and Proposition 3.106).
We first need to recall some basics from complex dynamics; see [419]. Let $f : \mathbb{C} \to \mathbb{C}$ be a polynomial of degree $\ge 2$. It is often handy to extend $f$ to the Riemann sphere $\hat{\mathbb{C}} = \mathbb{C} \cup \{\infty\}$, i.e. the one-point compactification of the complex plane. Then $\infty$ is always a superattracting fixed point, with no other preimage than itself. For $f_c(z) = z^2 + c$ one can see this easily by using a change of coordinates $w = 1/z$, sending $\infty$ to 0 and $f_c$ to $w \mapsto \frac{w^2}{1 + cw^2}$. The Riemann sphere falls apart into two fully invariant sets, called the Fatou set $\mathcal{F}$ and the Julia set $\mathcal{J}$. The Fatou set is the set of regular dynamics: every $z \in \mathcal{F}$ has a neighborhood $U \ni z$ such that
(1) $f^n(U)$ converges to an attracting or parabolic periodic orbit $\operatorname{orb}(p)$, i.e. $f^r(p) = p$ and $(f^r)'(p)$ lies in the open unit disk $\mathbb{D}$ or is a root of unity, or
(2) $f$ maps $U$ into an open topological disk $D$ (called a Siegel disk) on which the dynamics is conjugate to an irrational rotation on the unit disk $\mathbb{D}$.
By Sullivan's Theorem, see [527] and [419, Section 16], there are no other possibilities for rational maps²⁵, and entire maps can have Baker domains (wandering regions). The Fatou set is open, and for polynomials, $\mathcal{F} \neq \emptyset$ because the superattracting fixed point $\infty \in \mathcal{F}$ and its basin of attraction $A_\infty \subset \mathcal{F}$.
The Julia set $\mathcal{J}$, the complement of the Fatou set, is the set of chaotic motion. It is closed, completely invariant ($f^{-1}(\mathcal{J}) = \mathcal{J}$), non-empty (since it contains all repelling periodic points; in fact $\mathcal{J} = \overline{\{\text{repelling periodic points}\}}$; see [419, Theorem 14.1]) and for polynomials, $\mathcal{J} = \partial A_\infty$.

25 Although rational maps can also have Herman rings (as Siegel disks, but now on an annulus instead of a disk).
The Mandelbrot set $\mathcal{M}$ is the locus in parameter space of the family $f_c$ where $\mathcal{J}_c$ is connected, and equivalently, the critical orbit $\{f_c^n(0)\}_{n\ge0}$ is bounded; see [144, Theorem VIII.1.1]. This means that for $c \in \mathcal{M}$, the basin $A_\infty$ is a topological disk in $\hat{\mathbb{C}}$. Using the Riemann Mapping Theorem we can find a conformal homeomorphism $\psi_c : \mathbb{D} \to A_\infty$ which in fact can be chosen such that $\psi_c(z^2) = f_c(\psi_c(z))$. These are the Böttcher coordinates; see [419, Section 9]. The images $R_c(\vartheta) = \psi_c(\{re^{2\pi i\vartheta} : 0 < r < 1\})$ are called the dynamic external rays (see [419, Section 18]); they form an $f_c$-invariant foliation of $A_\infty$, namely $R_c(2\vartheta) = f_c(R_c(\vartheta))$. If $\mathcal{J}_c$ is locally connected, then each external ray $R_c(\vartheta)$ lands²⁶; i.e. $\psi_c(re^{2\pi i\vartheta}) \to z_\vartheta \in \mathcal{J}_c$ as $r \to 1$, and this $z_\vartheta$ is called the landing point of $R_c(\vartheta)$. In fact, also if $\mathcal{J}_c$ is not locally connected, $R_c(\vartheta)$ lands for Lebesgue-a.e. $\vartheta$, including every rational $\vartheta \in \mathbb{S}^1$.
A point $z$ on the Julia set is called biaccessible if it is the landing point of at least two external rays $R_c(\vartheta)$ and $R_c(\vartheta')$, and these external angles are then also called biaccessible. Biaccessibility is an equivalence relation $\sim_c$ on the circle $\mathbb{S}^1$ of external angles. Its equivalence classes are closed, forward invariant under the doubling map $g(\vartheta) = 2\vartheta \bmod 1$, and if $\vartheta_i \to \vartheta$, $\vartheta'_i \to \vartheta'$, and $\vartheta_i \sim_c \vartheta'_i$ for all $i \in \mathbb{N}$, then also $\vartheta \sim_c \vartheta'$. The quotient space $\mathbb{S}^1/\sim_c$ is well-defined and Hausdorff, with a well-defined doubling map on it. The collection of geodesic circles connecting equivalent angles in the disk is called a Thurston lamination, and the quotient space is the pinched disk model of the Julia set. If $\mathcal{J}_c$ is a dendrite, then $(\mathcal{J}_c, f_c)$ is in fact conjugate to its pinched disk model $(\mathbb{S}^1/\sim_c, g)$.
Remark 3.102. The Thurston laminations present an interesting example of how simple partitions of $\mathbb{S}^1$ can lead to interesting non-injective itinerary maps. Namely, set $J_0^b = (b, b + \frac12)$, $J_1^b = \mathbb{S}^1 \setminus J_0^b$ for $b \in [0, \frac12)$, and let $S(x) = 1 - x$ be an involution. Then $i(x) = i(S(x))$ whenever $\operatorname{orb}(x)$ avoids the symmetric difference $J_0^b \,\triangle\, S(J_0^b) = (b, \frac12 - b) \cup (b + \frac12, 1 - b)$. If b > 15 + 17
then there is a Cantor set of points for which i(x) = i(S(x)).
Also if x = S(y), this still doesn’t guarantee that i(x) = i(y) for the
itinerary map i w.r.t. {J0b , J1b }. The reason for this is that the quotient
space S1 / ∼ for x ∼ y if i(x) = i(y) is a topological “pinched disk” model for
the Julia set Jc of fc : z → z 2 + c for some specific c, namely the landing
point of the external parameter ray with angle 2b; see [124, 354, 452] and
also Section 3.6.5, in particular Figure 3.15 for an illustration of this pinched
disk model. Injectivity of i is equivalent to S/ ∼ being a topological circle,
which means that Jc is the boundary of a Siegel disk. This happens if c lies
on the main cardioid of the Mandelbrot set.
26 Due to the theorem of Carathéodory; see [419, Section 17].
3.6. Unimodal Subshifts 109

We call z an end-point if it is not biaccessible; i.e. its equivalence class

under ∼c contains only z and z is a branch-point with q ≥ 3 arms if its
equivalence class under ∼c consists of q points27 .

Figure 3.15. The Hubbard tree inside the disk model and the Julia set of $f_i : z \mapsto z^2 + i$, with external angle $\vartheta_c = 1/6$ and kneading sequence $\nu = 1\overline{10}$.

By the symmetry $z \mapsto -z$, the critical point 0 is always biaccessible (at least in its pinched disk model), so it has (at least) two external angles $\vartheta_*$ and $\vartheta'_* = \vartheta_* + \frac12$. These divide the circle into two open semicircles $S_0$ and $S_1$, where by convention28 $\vartheta_c := g(\vartheta_*) = g(\vartheta'_*) \in S_1$. Now we can define the itinerary map $i : S^1 \to \{0, *, 1\}^{\mathbb N_0}$:

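$$
i(\vartheta)_k = \begin{cases} 0 & \text{if } g^k(\vartheta) \in S_0,\\ * & \text{if } g^k(\vartheta) \in \{\vartheta_*, \vartheta'_*\},\\ 1 & \text{if } g^k(\vartheta) \in S_1. \end{cases}
$$
The kneading sequence $\nu = \nu_c = \nu_1\nu_2\nu_3\cdots$ is the itinerary of $\vartheta_c$ w.r.t. this partition; see Figure 3.15.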
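The itinerary map can be evaluated exactly on rational angles. The Python sketch below is only an illustration (the helper names `in_open_arc` and `itinerary` are our own); it assumes $\vartheta_c \ne 0$ (cf. footnote 28) and represents angles as `Fraction`s in $[0, 1)$, so that hitting a critical angle is detected exactly. For $\vartheta_c = 1/6$, the angle of Figure 3.15, the itinerary of $\vartheta_c$ comes out as $110101010\cdots$, that is, $\nu = 1\overline{10}$.

```python
from fractions import Fraction


def in_open_arc(t, a, b):
    """True if t lies on the open arc of R/Z running counterclockwise from a to b."""
    if a < b:
        return a < t < b
    return t > a or t < b          # the arc wraps through 0


def itinerary(theta, theta_c, length):
    """First `length` symbols of i(theta) for the partition of S^1 by the two
    critical angles g^{-1}(theta_c); all angles are Fractions in [0, 1)."""
    crit1 = theta_c / 2                       # the two preimages of theta_c
    crit2 = crit1 + Fraction(1, 2)            # under g(t) = 2t mod 1
    # S_1 is the open semicircle containing theta_c, S_0 is the other one
    s1 = (crit1, crit2) if in_open_arc(theta_c, crit1, crit2) else (crit2, crit1)
    symbols = []
    t = theta % 1
    for _ in range(length):
        if t in (crit1, crit2):
            symbols.append("*")
        elif in_open_arc(t, *s1):
            symbols.append("1")
        else:
            symbols.append("0")
        t = (2 * t) % 1                       # apply the doubling map g
    return "".join(symbols)


theta_c = Fraction(1, 6)
print(itinerary(theta_c, theta_c, 9))            # 110101010
print(itinerary(Fraction(1, 12), theta_c, 4))    # *110  (a critical angle)
```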
Each itinerary i(ϑ) ∈ {0, 1}N0 is well-defined, except for the countably
many precritical angles. On the other hand, for every e ∈ {0, 1}N0 except
those for which ν is a suffix of e, there is ϑ ∈ S1 such that i(ϑ) = e (and recall
that itineraries of angles in the same equivalence class of ∼c are the same).
However, not every 0-1-sequence is achieved by the post-critical angle.
Theorem 3.103. The map c → htop (fc ) is non-increasing on R.
27 This also means that Jc \ z has q connected components.
28 We exclude the case ϑc = ϑ∗ = 0 achieved for c = 1/4.

This entropy function cannot be strictly decreasing, because it is constant inside every hyperbolic component. Since the family $f_a(w) = aw(1-w)$ is conjugate to $f_c(z) = z^2 + c$ via conjugacies that relate the parameters in an orientation-reversing way, $a \mapsto h_{top}(f_a)$ is non-decreasing. Although Theorem 3.103 is a result in real dynamics, all known proofs rely on complex dynamics. The elegant proof we present here was given by Douady [205].

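Proof. Recall again the definition of $\theta(x)$ of Exercise 3.80 as an alternative way to code itineraries of unimodal maps $f_c$. Applied to the critical value $c_1$, this reads as
$$
\theta_n = \begin{cases} +1 & \text{if } f_c^n \text{ is increasing at } c_1,\\ -1 & \text{if } f_c^n \text{ is decreasing at } c_1, \end{cases}
$$
and we set
$$
\gamma = \frac12 \sum_{n \ge 1} \frac{1 - \theta_n}{2}\, 2^{-n} \in \big(0, \tfrac12\big].
$$
Next, for the doubling map $g : S^1 \to S^1$, $\vartheta \mapsto 2\vartheta \bmod 1$, define the set
$$
K_\gamma = \{\vartheta \in S^1 : g^j(\vartheta) \notin (\gamma, 1 - \gamma) \text{ for all } j \ge 0\};
$$
see Figure 3.16.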

Figure 3.16. A schematic Julia set for external angle $\gamma = 25/56$ (with $\nu = 100\overline{101}$) and the Mandelbrot set with some external rays.

The set $K_\gamma$ is $g$-invariant, and usually a Cantor set, except for $\gamma = \frac12$, when $K_\gamma = S^1$. Then $(K_\gamma, T)$ is semi-conjugate to $([c_1, c_2], f)$, in an entropy-preserving way. Let $\beta = \beta(c) = \frac12\big(1 + \sqrt{1 - 4c}\big)$ be the orientation-preserving fixed point of $f_c$. Then there is a semi-conjugacy $L : K_\gamma \to [c_1, \beta]$ given by the landing point $L(\vartheta)$ of the dynamical external ray $R(\vartheta)$ with angle $\vartheta$. Also $\#L^{-1}(x) \le 4$ (namely $\#L^{-1}(x) = 4$ if $x$ is (pre)critical, $\#L^{-1}(x) = 1$ if $x = \pm\beta$ (but $-\beta \in [c_1, \beta)$ only if $c_1 = -2$), and $\#L^{-1}(x) = 2$ for all other $x \in [c_1, \beta)$). In particular, $h_{top}(T|_{K_\gamma}) = h_{top}(f_c|_{[c_1, \beta]})$. Moreover,

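$$
\begin{cases}
c \mapsto (\theta_n)_{n \ge 1} \text{ (lexicographic order) is order preserving};\\
(\theta_n)_{n \ge 1} \mapsto \gamma = \frac12 \sum_{n \ge 1} \frac{1 - \theta_n}{2}\, 2^{-n} \text{ is order reversing};\\
\gamma \mapsto K_\gamma \text{ (inclusion order) is order preserving};\\
K_\gamma \mapsto h_{top}(T|_{K_\gamma}) \text{ is order preserving}.
\end{cases}
$$
Since $T|_{K_\gamma}$ and $f_c|_{[c_1, \beta]}$ have the same entropy, $c \mapsto h_{top}(f_c|_{[c_1, \beta]})$ is an orientation-reversing map, proving the monotonicity of entropy for the quadratic family.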
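For eventually periodic kneading sequences, $\gamma$ and membership in $K_\gamma$ can be computed exactly. The following Python sketch assumes (our reading of Exercise 3.80, which is consistent with the data of Figure 3.16) that $\theta_n = (-1)^{\nu_1 + \cdots + \nu_n}$, i.e. that $f_c$ reverses orientation precisely at points with symbol 1; the function names are hypothetical. With this convention, $\nu = 100\overline{101}$ gives $\gamma = 25/56$.

```python
from fractions import Fraction


def gamma_from_kneading(pre, per):
    """gamma = 1/2 * sum_{n>=1} (1 - theta_n)/2 * 2^(-n) for nu = pre + per + per + ...

    Assumes theta_n = (-1)^(nu_1 + ... + nu_n).  After the preperiod, theta is
    periodic with period 2*len(per), so the binary expansion of 2*gamma is
    eventually periodic and can be summed exactly."""
    bits, sign = [], 1
    for symbol in pre + 2 * per:
        if symbol == "1":
            sign = -sign                       # orientation flips at symbol 1
        bits.append((1 - sign) // 2)           # (1 - theta_n)/2 in {0, 1}
    a, period = len(pre), 2 * len(per)
    head, block = bits[:a], bits[a:]
    value = sum(Fraction(b, 2 ** n) for n, b in enumerate(head, start=1))
    block_value = int("".join(map(str, block)), 2)
    value += Fraction(block_value, (2 ** period - 1) * 2 ** a)
    return value / 2


def in_K(theta, gamma):
    """Does the doubling orbit of the rational angle theta avoid (gamma, 1 - gamma)?"""
    seen, t = set(), theta % 1
    while t not in seen:                       # rational orbits are eventually periodic
        if gamma < t < 1 - gamma:
            return False
        seen.add(t)
        t = (2 * t) % 1
    return True


gamma = gamma_from_kneading("100", "101")
print(gamma, in_K(Fraction(0), gamma), in_K(Fraction(1, 2), gamma))   # 25/56 True False
```

As a second check, $\nu = 1\overline{0}$ (the kneading sequence of $c = -2$) gives $\gamma = \frac12$, in which case $K_\gamma = S^1$.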

We can extend the ρ-function from (3.19) to this complex case without changing the definition:
$$
\rho : \mathbb N \to \mathbb N, \quad \rho(n) = \max\{k > n : \nu_{n+1}\nu_{n+2}\cdots\nu_{k-1} \text{ is a prefix of } \nu\},
$$
and the sequence of cutting times
$$
S_0 = 1, \quad S_{k+1} = \rho(S_k), \tag{3.36}
$$

is again uniquely determined by νc . In the complex setting, the sequence

of cutting times is called the internal address. This name comes from a
procedure of locating the parameter c in the Mandelbrot set. A hyperbolic
component of period r in M is a subset of parameters where fc has an
attracting periodic orbit of period r. Since each such periodic orbit must
attract a critical point (by Montel’s Theorem; see [144, Theorem I.3.1]),
there can be only one of them, and hence hyperbolic components are disjoint.
Attracting periodic orbits persist under small perturbations, so hyperbolic
components are open. The hyperbolic component of period 1 is called the
main cardioid and contains 0. It is conjectured (and this follows from the
MLC-conjecture: the Mandelbrot set is locally connected; see [206, 407])
that the hyperbolic components lie dense in M. For c ∈ M, take an arc
connecting 0 to c without self-intersections. Starting from 0, list the periods
of hyperbolic components the closures of which the arc passes through, and
retain only the smallest entries. That is, $S_0 = 1$ and if $S_k$ is found, let $S_{k+1}$ be the smallest period29 that you can find on the remainder of the arc. The sequence $\{S_k\}_{k \ge 0}$ turns out to be exactly the one obtained from (3.36).
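The cutting times can be read off a finite prefix of $\nu$. The sketch below uses the equivalent description $\rho(n) = \min\{k > n : \nu_k \ne \nu_{k-n}\}$, the first position at which $\nu_{n+1}\nu_{n+2}\cdots$ stops copying $\nu$; the function names are hypothetical, and the output is only as long as the given prefix determines it. With the prefix from Remark 3.105 it confirms $\rho(7) - 7 = 6$.

```python
def rho(nu, n):
    """rho(n) = min{k > n : nu_k != nu_{k-n}}, with nu a 0-1 string indexed from 1.
    Returns None when the answer lies beyond the given finite prefix."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def cutting_times(nu):
    """S_0 = 1, S_{k+1} = rho(S_k), as far as the prefix nu determines them."""
    times = [1]
    while (nxt := rho(nu, times[-1])) is not None:
        times.append(nxt)
    return times


nu = "1011001101101"
print(rho(nu, 7))          # 13
print(cutting_times(nu))   # [1, 2, 4, 5, 6, 8, 9, 11, 12]
```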

29 It was shown by Lavaurs [388] that between two hyperbolic components of the same period, there is always a hyperbolic component of a lower period, and therefore this list determines the hyperbolic components on this arc uniquely.

We extend the use of closest precritical points to the Julia set: $\zeta \in J_c$ is a closest precritical point if $f_c^n(\zeta) = c_0$ and $f_c^k(x) \ne c_0$ for all $k \le n$ and $x \in (\zeta, c_0)$, which now denotes the open arc between $\zeta$ and $c_0$ in the Julia set. The closest precritical points $\zeta^1_{S_k} \in J_c^1$ and $\zeta^0_{S_k} \in J_c^0$ from the real case in (3.23) belong to the interval $[c_1, c_2]$, i.e. the Hubbard tree, but in the whole Julia set there is a closest precritical point $\zeta_n$ for every $n \in \mathbb N$. Indeed, since $J_c$ is completely invariant, $\{J_c^0, J_c^1\}$ is a Markov partition such that $f_c(J_c^0 \cup \{c\}) = f_c(J_c^1 \cup \{c\}) = J_c$. There are therefore $2^n$ cylinders of generation $n$, and each $z \in \bigcup_{k=0}^{n-1} f_c^{-k}(c_0)$ is a boundary point of one of these cylinders. Hence $\zeta^1_n$ and $\zeta^0_n$ can be found in the interior of the two $n$-cylinders with $c$ in their boundary.
The arc $[\zeta^1_n, \zeta^0_n]$ maps in a two-to-one way onto the arc $[\hat\zeta_n, c_1]$ for $\hat\zeta_n := f_c(\zeta^1_n) = f_c(\zeta^0_n)$, with the property that $f_c^k(x) \ne c_1$ for all $x \in (\hat\zeta_n, c_1)$ and $k \le n$. Also the $\hat\zeta_{n'} \in (\hat\zeta_n, c_1]$ of lowest index $n'$ satisfies $n' = \rho(n)$. Indeed, $f_c^{n-1}$ maps $[\hat\zeta_n, c_1]$ homeomorphically onto $[c_0, c_n] \ni \zeta_{n'-n}$, and $n' - n$ is the smallest positive iterate of $[c_0, c_n]$ to contain $c_0$. But then $f_c^{n'-n}(c_0)$ and $f_c^{n'-n}(c_n) = c_{n'}$ lie in different components $J_c^1$ and $J_c^0$, and hence $\rho(n) = n'$, as claimed. Therefore, each ρ-orbit $\mathrm{orb}_\rho(k)$ represents a monotone sequence $(\hat\zeta_{\rho^i(k)})_{i \ge 1}$ of closest precritical points approaching $c_1$ (or ending in $c_1$ if $c$ is periodic). These ρ-orbits are disjoint if and only if the corresponding sequences approach $c_1$ from different components of $J_c \setminus \{c_1\}$. The following proposition therefore implies that the points $c_k$, $k \ge 2$ (with corresponding ρ-orbits $\mathrm{orb}_\rho(\rho(k) - k)$), all lie in the same component of $J_c \setminus \{c_1\}$, and hence $c_1$ is an end-point of the Hubbard tree.
Proposition 3.104. For each kneading sequence ν ∈ {0, 1}N and 2 ≤ m ∈ N
such that ρ(m) − m < ∞, there exists a k ≤ ρ(m) such that k ∈ orbρ (1) ∩
orbρ (ρ(m) − m).

Proof. We argue by induction on n, using the induction hypothesis IH[n]:

IH[n]: For every ν ∈ {0, 1}N with corresponding ρ-function
and for every m ∈ N such that ρ(m) − m = n, the orbits
orbρ (1) and orbρ (n) intersect at the latest at ρ(m).
Remark 3.105. IH[n] does not imply that orbρ (1) ∩ orbρ (n) contains ρ(m),
not even if m is minimal such that ρ(m) − m = n. For example, if
ν = 1011001101101 · · ·
with n = 6 and m = 7, then n ∈ orbρ (1), but ρ(n) > m > n.

The induction hypothesis is trivially true for $n = 1$. So assume that IH[$n'$] holds for all $n' < n$. Take $\nu \in \{0,1\}^{\mathbb N}$ arbitrary and $m \in \mathbb N$ minimal such that $\rho(m) - m = n$. If no such $m$ exists, then IH[$n$] is true for this $\nu$ by default. Let $n_0 \in \mathrm{orb}_\rho(n)$ be maximal such that $n_0 \le \rho(m)$; thus $\rho(n_0) > \rho(m)$. We distinguish two cases:
Case I: $n_0 < \rho(m)$. If $n_0 \le m$, then $\rho(n_0) > \rho(m)$ implies $n_0 < m$ and
$$
\nu_1\cdots\nu_{m-n_0}\nu_{m-n_0+1}\cdots\nu_{\rho(m)-n_0} = \nu_{n_0+1}\cdots\nu_m\nu_{m+1}\cdots\nu_{\rho(m)};
$$
hence $\rho(m - n_0) - (m - n_0) = \rho(m) - m = n$, contradicting minimality of $m$. Therefore $m < n_0 < \rho(m)$. Since $\rho(n_0) > \rho(m)$ and
$$
\nu_{m+1}\cdots\nu_{n_0+1}\cdots\nu_{\rho(m)} = \nu_1\cdots\nu_{n_0-m+1}\cdots\bar\nu_n
$$
(where $\bar\nu_n$ is the opposite symbol of $\nu_n$), we have $\rho(n_0 - m) = n$. Consider $\tilde\nu := \nu_1\cdots\nu_{n_0-1}\bar\nu_{n_0}\nu_{n_0+1}\cdots$ (with arbitrary continuation) with associated function $\tilde\rho$. Then $\tilde\rho(m) = n_0$.
(i) If $n_0 = n$, then the fact that $\rho(n_0 - m) = n$ implies $\tilde\rho(n_0 - m) > n_0$, so $\tilde\rho(m) \notin \mathrm{orb}_{\tilde\rho}(n_0 - m)$.
(ii) If $n_0 > n$, then $\tilde\rho(m) = n_0 \notin \mathrm{orb}_{\tilde\rho}(n)$, and $\tilde\rho(n_0 - m) = \rho(n_0 - m) = n < n_0$ again implies $\tilde\rho(m) \notin \mathrm{orb}_{\tilde\rho}(n_0 - m)$.
So in both cases $\tilde\rho(m) \notin \mathrm{orb}_{\tilde\rho}(n_0 - m)$. Now $\tilde\rho(m) - m = n_0 - m < \rho(m) - m = n$, so by the induction hypothesis IH[$n_0 - m$], $\mathrm{orb}_{\tilde\rho}(1)$ and $\mathrm{orb}_{\tilde\rho}(n_0 - m)$ meet at or before $\tilde\rho(m)$; since $\tilde\rho(m) \notin \mathrm{orb}_{\tilde\rho}(n_0 - m)$, they meet before $\tilde\rho(m) = n_0$.
As a result, also $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n_0 - m)$ meet before $n_0 < \rho(m)$, and since $\rho(n_0 - m) = n$, also $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ meet before $\rho(m)$.
Case II: $n_0 = \rho(m)$. In this case $\rho(m) \in \mathrm{orb}_\rho(n)$. Let $p_0 \in \mathrm{orb}_\rho(1)$ be maximal such that $p_0 \le \rho(m)$; hence $\rho(p_0) > \rho(m)$. If $p_0 = \rho(m)$, then there is nothing to prove, so assume that $p_0 < \rho(m) < \rho(p_0)$. As in Case I (by minimality of $m$), we only need to consider the case that $m < p_0 < \rho(m) < \rho(p_0)$. Since $\nu_{m+1}\cdots\nu_{\rho(m)} = \nu_1\cdots\bar\nu_n$, we have $\rho(p_0 - m) = n$ (similarly as above). Set $\tilde\nu := \nu_1\cdots\bar\nu_{p_0}\cdots$ with associated function $\tilde\rho$. Then $\tilde\rho(m) = p_0 < \rho(m)$ and by IH[$p_0 - m$], $\mathrm{orb}_{\tilde\rho}(1)$ and $\mathrm{orb}_{\tilde\rho}(p_0 - m)$ meet at the latest at $\tilde\rho(m) = p_0$.
(i) If $n < p_0$, then $\tilde\rho(p_0 - m) = \rho(p_0 - m) = n$, so $\mathrm{orb}_{\tilde\rho}(1)$ and $\mathrm{orb}_{\tilde\rho}(n)$ meet at the latest at $p_0$. But $p_0 \notin \mathrm{orb}_{\tilde\rho}(1)$, so in fact $\mathrm{orb}_{\tilde\rho}(1)$ and $\mathrm{orb}_{\tilde\rho}(n)$ meet before $p_0$. But then $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ also meet before $p_0 < \rho(m)$.
(ii) If $n = p_0$, then $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ obviously meet at $p_0 < \rho(m)$.
(iii) The case $n > p_0$ is impossible. Indeed, $\rho(m) - m = n > p_0 > m$, so $\rho(m) > 2m$. Since $\rho(p_0) > \rho(m) = n + m > p_0 + m$, we find that $\nu_{m+1}\cdots\nu_{p_0+1}\cdots\nu_{\rho(m)-1} = \nu_1\cdots\nu_{p_0-m+1}\cdots\nu_{\rho(m)-m-1}$; hence $\rho(p_0 - m) \ge \rho(m) - m > p_0$. For the sequence $\tilde\nu$ this means that $\tilde\rho(p_0 - m) = p_0$, while $p_0 \notin \mathrm{orb}_{\tilde\rho}(1)$. Therefore $\mathrm{orb}_{\tilde\rho}(1)$ and $\mathrm{orb}_{\tilde\rho}(p_0 - m)$ do not meet at or before $p_0$; since $\tilde\rho(m) - m = p_0 - m$, this contradicts IH[$p_0 - m$].
This completes Case II and proves that $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ intersect at the latest at $\rho(m)$, where $m$ is minimal with the property that $\rho(m) - m = n$. For an arbitrary (i.e. not necessarily minimal) $m$ with $\rho(m) - m = n$, let $m'$ be minimal with this property. Then the ρ-orbits $\mathrm{orb}_\rho(1)$ and $\mathrm{orb}_\rho(n)$ meet at the latest at $\rho(m') = n + m' \le n + m = \rho(m)$, so the statement holds for arbitrary $m$. This proves IH[$n$].

Thurston’s Non-wandering Triangle Theorem states that every branch-point in the dendritic Julia set of a quadratic map is precritical or (pre)periodic. The next proposition proves this using only the properties of ρ-functions. However, the theorem holds only for quadratic polynomials, because cubic or higher-order polynomials can have wandering (i.e. neither precritical nor preperiodic) triods or n-ods30. This was discovered by Blokh & Oversteegen [93] and studied systematically by Childers [154]. Upper bounds for the number of external rays were already given in [365].
We extend the ρ-function to arbitrary sequences $x \in \{0,1\}^{\mathbb N}$ as
$$
\rho_x(n) = \min\{k > n : x_k \ne \nu_{k-n}\} \quad \text{for kneading sequence } \nu.
$$
Hence $\rho_\nu = \rho$. The $\rho_x$-orbits correspond to sequences of closest-to-$x$ precritical points that monotonically approach $x$ (or end in $x$ if $x$ is precritical). The number of disjoint $\rho_x$-orbits is equal to the number of components of $J_c \setminus \{x\}$.
Proposition 3.106. Let x, ν ∈ {0, 1}N . If there are three disjoint ρx -orbits,
then x is preperiodic.

Proof. Assume by contradiction that $(A_h)_{h \ge 0} = \mathrm{orb}_{\rho_x}(A_0)$, $(B_i)_{i \ge 0} = \mathrm{orb}_{\rho_x}(B_0)$, and $(C_j)_{j \ge 0} = \mathrm{orb}_{\rho_x}(C_0)$ are pairwise disjoint $\rho_x$-orbits. There are infinitely many triples $(A_h, B_i, C_j)$ such that $B_{i-1} < A_{h-1} < B_i < A_h$ and $C_{j-1} < A_{h-1} < C_j < A_h$ (possibly with the roles of $A_h$, $B_i$, and $C_j$ permuted). Assume that $(A_h, B_i, C_j)$ is one such triple, with span $d(A_h, B_i, C_j) := \max\{A_h - A_{h-1}, B_i - B_{i-1}, C_j - C_{j-1}\}$ taking the minimal value $d_{min}$ among the spans of all such triples; see Figure 3.17.

Figure 3.17. Parts of the three disjoint $\rho_x$-orbits.

30 n arcs glued together at a common branch-point.


We have

$$
\nu_{B_{i-1}+1}\cdots\nu_{A_{h-1}}\cdots\nu_{B_i} = \nu_1\cdots\nu_{A_{h-1}-B_{i-1}}\cdots\bar\nu_{B_i-B_{i-1}}
$$

and hence $\rho(A_{h-1} - B_{i-1}) = B_i - B_{i-1}$. Proposition 3.104 for $m := A_{h-1} - B_{i-1}$, and therefore $\rho(m) - m = B_i - A_{h-1}$, gives a $k \le \rho(m) = B_i - B_{i-1}$ such that $k \in \mathrm{orb}_\rho(1) \cap \mathrm{orb}_\rho(\rho(m) - m)$. Similarly, for $m' = A_{h-1} - C_{j-1}$, we have a $k' \le \rho(m') = C_j - C_{j-1}$ such that $k' \in \mathrm{orb}_\rho(1) \cap \mathrm{orb}_\rho(\rho(m') - m')$.
We have $x_{A_{h-1}+1}\cdots x_{A_h-1} = \nu_1\cdots\nu_{A_h-A_{h-1}-1}$. If $A_h - A_{h-1} > \max\{B_i - B_{i-1}, C_j - C_{j-1}\}$, then $A_{h-1} + \max\{k, k'\} \in \mathrm{orb}_{\rho_x}(B_i) \cap \mathrm{orb}_{\rho_x}(C_j)$, contradicting the disjointness of $\mathrm{orb}_{\rho_x}(B_0)$ and $\mathrm{orb}_{\rho_x}(C_0)$. Therefore $A_h - A_{h-1} \le \max\{B_i - B_{i-1}, C_j - C_{j-1}\}$.
Now take $i' \ge i$ and $j' \ge j$ maximal such that $B_{i'} \le A_h$ and $C_{j'} \le A_h$. Assume without loss of generality that $B_{i'} < C_{j'}$, and take $j'' \le j'$ minimal such that $B_{i'} < C_{j''}$. Then $(B_{i'}, C_{j''}, A_h)$ forms the next new triple, with span $d(B_{i'}, C_{j''}, A_h) \le d(A_h, B_i, C_j)$. By the choice of $(A_h, B_i, C_j)$, we have in fact $d(B_{i'}, C_{j''}, A_h) = d(A_h, B_i, C_j)$, and that means that the span of all later triples is $d_{min}$. A fortiori, the first “over-arching” distance $A_h - A_{h-1}$ of these triples is equal to $d_{min}$. Therefore $x$ is periodic from this point onwards, with period dividing $d_{min}$.

Remark 3.107. This proof is more general than Thurston’s proof, because it applies also to non-admissible kneading sequences, i.e. those that do not come with a Thurston lamination. For instance, if $\nu = 101100\cdots$, then there is a periodic point (see Figure 3.18, right) with itinerary $x = \overline{101}$. The $\rho_x$-orbits of $A_0 = 3$, $B_0 = 6$, and $C_0 = 1$ are disjoint, with span $d_{min} = 6$.
The precritical branch-points are not covered by this proposition, because of the issue of assigning a proper symbol to the critical point. Each choice of 0 or 1 allows for one or two branches according to whether $\nu$ is an end-point in the Julia set or not (unless $\nu$ is eventually periodic). Therefore $0\nu$ and $1\nu$ together account for two or four arms.
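The data of Remark 3.107 can be checked with a few lines of Python. This sketch (hypothetical names) computes $\rho_x$-orbits from finite prefixes of $x$ and $\nu$, stopping as soon as the prefixes no longer determine the next element; only the six known symbols $101100$ of $\nu$ are needed here.

```python
def rho_x(x, nu, n):
    """rho_x(n) = min{k > n : x_k != nu_{k-n}} for 1-indexed 0-1 strings;
    None if the finite prefixes of x and nu do not determine it."""
    for k in range(n + 1, len(x) + 1):
        if k - n > len(nu):
            return None
        if x[k - 1] != nu[k - n - 1]:
            return k
    return None


def rho_x_orbit(x, nu, start):
    orbit = [start]
    while (nxt := rho_x(x, nu, orbit[-1])) is not None:
        orbit.append(nxt)
    return orbit


x, nu = "101" * 10, "101100"       # x = (101)^infinity, truncated
A, B, C = (rho_x_orbit(x, nu, s) for s in (3, 6, 1))
print(A)   # [3, 9, 15, 21, 27]
print(B)   # [6, 12, 18, 24, 30]
print(set(A).isdisjoint(B), set(A).isdisjoint(C), set(B).isdisjoint(C))   # True True True
```

The orbit of $C_0 = 1$ runs through all $k \not\equiv 0 \pmod 3$, while those of $A_0$ and $B_0$ advance in steps of 6, which is the span $d_{min} = 6$.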

For quadratic dendritic Julia sets, each non-precritical branch-point has a

so-called characteristic branch-point z ∈ [c1 , c0 ] on its orbit that is periodic
and is closest to c1 in the sense that the arm, i.e. component of Jc \ {z},
is disjoint from orbfc (z); see [126, Section 3]. Again, the combinatorial
properties of characteristic periodic points can be read off the ρ-function, as
summarized below (see [126, Proposition 4.19] for the proof):

Proposition 3.108. Let $\nu \in \{0,1\}^{\mathbb N}$ be the kneading sequence of a dendritic quadratic Julia set $J_c$. Take $m \in \mathrm{orb}_\rho(1)$ and write $\rho(m) = qm + r$ for $r \in \{1, \dots, m\}$. Then there is a characteristic $m$-periodic point $z \in [c_1, c_0]$ with itinerary $i(z) = \overline{\nu_1\cdots\nu_m}$. Its number of arms is
$$
\begin{cases} q + 1 & \text{if } m \in \mathrm{orb}_\rho(r),\\ q + 2 & \text{if } m \notin \mathrm{orb}_\rho(r), \end{cases}
$$
and locally, these arms are permuted cyclically by $f_c^m$; see Figure 3.18, left.
There are no other characteristic branch-points in $J_c$.
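Proposition 3.108 turns into a short computation. In the sketch below (hypothetical names; the prefix of $\nu$ must be long enough to determine $\rho(m)$ and the relevant orbit points) we use $\nu = 1\overline{10}$, the kneading sequence of $c = i$ (cf. Figure 3.15). For $m = 1$ it returns three arms, consistent with the fact that the $\alpha$-fixed point of $z \mapsto z^2 + i$ is the landing point of three external rays; for $m = 3$ it returns two arms, so that periodic point is not a branch-point.

```python
def rho(nu, n):
    """rho(n) = min{k > n : nu_k != nu_{k-n}} for a 0-1 prefix nu (1-indexed);
    None means rho(n) > len(nu)."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def in_orbit(nu, start, target):
    """Is target in orb_rho(start)?  Sound for target <= len(nu)."""
    k = start
    while k is not None and k < target:
        k = rho(nu, k)
    return k == target


def arms(nu, m):
    """Number of arms of the characteristic m-periodic point of Proposition 3.108."""
    if not in_orbit(nu, 1, m):
        raise ValueError("m is not in orb_rho(1)")
    rho_m = rho(nu, m)
    if rho_m is None:
        raise ValueError("prefix too short to determine rho(m)")
    q, r = (rho_m - 1) // m, (rho_m - 1) % m + 1     # rho(m) = q*m + r, 1 <= r <= m
    return q + 1 if in_orbit(nu, r, m) else q + 2


nu = "1" + "10" * 10                  # prefix of 1(10)^infinity
print(arms(nu, 1), arms(nu, 3))       # 3 2
```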

Figure 3.18. The Hubbard tree of ν = 1 10 111 1100 has two periodic orbits of branch-points. The Hubbard tree of ν = 1 0 11 0 0 has an orbit of evil branch-points.

However, it is possible that the Hubbard tree corresponding to some $\nu \in \{0,1\}^{\mathbb N}$ has characteristic branch-points that are not described by Proposition 3.108. These are called evil points; the existence of evil points is the only obstruction to $\nu$ being the kneading sequence of some quadratic map; see [126, Proposition 4.13].

Proposition 3.109. Let $\nu \in \{0,1\}^{\mathbb N}$ and $m \in \mathbb N$ and write $\rho(m) = qm + r$ for $r \in \{1, \dots, m\}$. If
$$
\begin{cases} m \notin \mathrm{orb}_\rho(1),\\ \rho(k) < m \text{ if } k < m \text{ divides } m,\\ m \in \mathrm{orb}_\rho(r), \end{cases} \tag{3.37}
$$
then the Hubbard tree associated to $\nu$ has an evil characteristic branch-point $z \in [c_1, c_0]$. Its itinerary is $\overline{\nu_1\cdots\nu_m}$ and its number of arms is $q + 2$. Locally, the $m$-th iterate of the tree map fixes the arm towards $c_0$ and permutes the other arms cyclically.

It follows from Proposition 3.104 that if m ∈ N satisfies (3.37), then

ρ(m) ∈ orbρ (1).
The fact that arms are not permuted cyclically prevents the existence of a polynomial $f_c$ with a periodic branch-point as described, but this is the only restriction; see [126, Section 4]. Hence we have the Complex Admissibility Condition:
If ν ∈ {0, 1}N is such that (3.37) fails for every m ∈ N, then
ν is the kneading sequence of some quadratic polynomial.
If m ≥ ρ(m) − m ∈ orbρ (1) for all m, then all characteristic periodic
points have two arms, according to Propositions 3.108 and 3.109, and the
Hubbard tree is an arc [c1 , c2 ]. But this condition gives the existence of a
kneading map Q, which is central to having a real kneading sequence.
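Condition (3.37) can also be tested mechanically on a finite prefix of $\nu$. The sketch below (hypothetical names) reports every $m$ up to the prefix length for which $\rho(m)$ is determined and (3.37) holds; an empty list is therefore evidence, not a proof, that the Complex Admissibility Condition holds. For the prefix $101100$ of Remark 3.107 it finds $m = 3$, matching the periodic point with itinerary $\overline{101}$ there.

```python
def rho(nu, n):
    """rho(n) = min{k > n : nu_k != nu_{k-n}} (1-indexed); None means rho(n) > len(nu)."""
    for k in range(n + 1, len(nu) + 1):
        if nu[k - 1] != nu[k - n - 1]:
            return k
    return None


def in_orbit(nu, start, target):
    """Is target in orb_rho(start)?  Sound for target <= len(nu)."""
    k = start
    while k is not None and k < target:
        k = rho(nu, k)
    return k == target


def evil_periods(nu):
    """All m <= len(nu) with rho(m) determined by the prefix and (3.37) satisfied."""
    found = []
    for m in range(1, len(nu) + 1):
        rho_m = rho(nu, m)
        if rho_m is None or in_orbit(nu, 1, m):
            continue
        # each proper divisor k of m needs rho(k) < m (None means rho(k) > len(nu) >= m)
        if any(m % k == 0 and (rho(nu, k) is None or rho(nu, k) >= m)
               for k in range(1, m)):
            continue
        r = (rho_m - 1) % m + 1                  # rho(m) = q*m + r with 1 <= r <= m
        if in_orbit(nu, r, m):
            found.append(m)
    return found


print(evil_periods("101100"))          # [3]
print(evil_periods("1" + "10" * 10))   # []  (prefix of the kneading sequence of c = i)
```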

3.7. Gap Shifts

These were introduced by Lind & Marcus [398, page 7] as another example
of the variety of the notion of subshift, but they were not further developed
Definition 3.110. Let $S$ be a collection of non-negative integers. The corresponding gap shift (or S-gap shift, for apparently no other reason than that $S$ denotes the collection of gap-sizes) is the subshift
$$
X_S = \{x \in \{0,1\}^{\mathbb Z} : \text{if } 10^s1 \text{ is a subword of } x, \text{ then } s \in S\}.
$$
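Membership of a finite word in $\mathcal L(X_S)$ only depends on the gaps between consecutive 1s inside the word: any word whose internal gaps lie in $S$ extends to a point of $X_S$ by padding it with 0s on both sides. A minimal Python sketch, in which $S$ is passed as a predicate (our choice, so that infinite sets such as the odd numbers can be used):

```python
def in_gap_language(word, gap_allowed):
    """Is the finite 0-1 word in L(X_S)?  gap_allowed(s) decides whether s is in S.

    Only gaps between two 1s inside the word matter; leading and trailing
    runs of 0s can always be continued by 0s."""
    ones = [i for i, ch in enumerate(word) if ch == "1"]
    return all(gap_allowed(j - i - 1) for i, j in zip(ones, ones[1:]))


odd = lambda s: s % 2 == 1                  # S = {1, 3, 5, ...}
print(in_gap_language("0101000100", odd))   # True  (gaps 1 and 3)
print(in_gap_language("10011", odd))        # False (gaps 2 and 0)
```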
Example 3.111. We obtain the Fibonacci SFT by taking $S = \{1, 2, 3, 4, \dots\}$, and if we interchange the roles of 0's and 1's, then $S = \{0, 1\}$. The even shift with the roles of 0 and 1 interchanged is obtained by taking $S = \{0, 2, 4, 6, \dots\}$. Also β-shifts are S-gap shifts, namely with $S = \{s \in \mathbb N : c_s = 1\}$ for the greedy β-representation of 1, i.e. $1 = \sum_{s \in S} \beta^{-s}$; see the proof of Theorem 3.77.
Example 3.112. The subshift in which every other symbol is a 1 (but with no other restrictions) can be seen as a sofic shift (see the edge-labeled transition graph in Figure 3.19), as an S-gap shift with $S = \{1, 3, 5, 7, 9, \dots\}$, and it is isomorphic to an SFT on $\{0, 1, 2\}$ (see the vertex-labeled transition graph in Figure 3.19). It encodes the tent map $T_s(x) = \min\{sx, s(1-x)\}$ such that $T_s^2(\frac12) < \frac12 < T_s^3(\frac12) = T_s^4(\frac12)$, so $s = \sqrt2$. These dynamical systems are topologically transitive, but not topologically mixing. The entropy is exactly $\frac12 \log 2$.
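The claims about the tent map in Example 3.112 are easy to check numerically. A floating-point sketch (so the equality $T_s^3(\frac12) = T_s^4(\frac12)$ is only checked up to rounding; the helper name `tent_orbit` is ours):

```python
import math


def tent_orbit(s, x, n):
    """Return [x, T_s(x), ..., T_s^n(x)] for the tent map T_s(x) = min(s*x, s*(1-x))."""
    orbit = [x]
    for _ in range(n):
        x = min(s * x, s * (1 - x))
        orbit.append(x)
    return orbit


s = math.sqrt(2)
c = tent_orbit(s, 0.5, 4)                              # c[k] = T_s^k(1/2)
print(c[2] < 0.5 < c[3], math.isclose(c[3], c[4]))     # True True
print(math.isclose(math.log(s), 0.5 * math.log(2)))    # True
```

Here $T_s^3(\frac12) = 2 - \sqrt2 = s/(1+s)$ is the orientation-reversing fixed point of $T_s$.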

Figure 3.19. Graphs of a sofic shift, SFT, and tent map with topological entropy $\frac12 \log 2$.

The following collection of results was shown in [182].

Theorem 3.113. Let (XS , σ) be an S-gap shift. Then (XS , σ)

(a) is an SFT if and only if S is finite or cofinite;

(b) is sofic if and only if {si }i is eventually periodic;
(c) is synchronized and coded;
(d) is topologically mixing if and only if gcd{s + 1 : s ∈ S} = 1;
(e) has specification if and only if S is syndetic and gcd{s + 1 :
s ∈ S} = 1.

Proof. (a) If $\#S < \infty$, then let $N = \max S$, and if $S$ is cofinite, then take $N$ such that $s \in S$ for all $s > N$. Now declare all $(N+1)$-words forbidden if they don't occur in any concatenation of words $10^s$, $s \in S$. Conversely, if $N$ is the maximal length of a forbidden word of the SFT, then $S \supset \{N + 1, N + 2, N + 3, \dots\}$.
(b) Take $B = \{0^s1 : s \in S\}$. Clearly, every (left-infinite) word $w$ ending with 1 has the same follower set $F(w) = B^{\mathbb N}$. For every (left-infinite) word $w$ ending in 0, $w = \cdots0000$ or there is a unique $n \in \mathbb N$ such that $10^n$ is a suffix of $w$. The word $\cdots0000$ has its own follower set $F(\cdots0000)$ and if $n < \infty$, then the follower set $F(w)$ depends only on $n$ and is eventually periodic in $n$ by our assumption. Therefore there are only finitely many distinct follower sets and by Theorem 3.36 $X_S$ is sofic.
Conversely, if $X_S$ is sofic, then again $F(w)$ depends only on $n$. The follower set $F(w) = \{0^\infty\} \cup \{0^a1B^{\mathbb N} : a + n \in S\}$. That is, there is an infinite collection of follower sets, unless $\{\mathbb N \cap (S - n)\}_{n \ge 1}$ is a finite collection of sets, and this only holds if $S$ is eventually periodic.
(c) The S-gap shift, as the free concatenation of words 10s , s ∈ S, is
obviously a coded shift. Each such word is synchronizing.
(d) If $g := \gcd\{s + 1 : s \in S\} > 1$, then $\sigma^n([1]) \cap [1] = \emptyset$ if $n$ is not a multiple of $g$. In this case, topological mixing fails. Conversely, there is $N$ such that for every $n \ge N$, there is a word $v \in \mathcal L_n(X_S)$ which is the concatenation of words $10^s$, $s \in S$. Now let $u, w \in \mathcal L(X_S)$ be arbitrary. By extending $u$ on the right by a word $u'$ and $w$ on the left by a word $w'$, each of no more than $\min S$ symbols, we can turn them into a suffix and a prefix of concatenations of words $10^s$, $s \in S$. But then $uu'vw'w \in \mathcal L(X_S)$ as well. This proves the topological mixing; cf. [336].
Finally, for the specification, we refer to [182]. In fact, in [43] it is shown that for every $h \in (0, \log 2]$, there are (sometimes uncountably many) gap shifts with specification satisfying $h_{top}(X_S, \sigma) = h$.
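Part (d) of Theorem 3.113 is a one-line test. In the sketch below (hypothetical name) $S$ is given as a finite set; for infinite $S$ a long enough truncation suffices, because the gcd of an infinite set of positive integers is already attained on a finite subset (how long is long enough depends on $S$).

```python
from functools import reduce
from math import gcd


def gap_shift_is_mixing(gaps):
    """Theorem 3.113(d): X_S is topologically mixing iff gcd{s + 1 : s in S} = 1.

    `gaps` is a finite set of gap sizes (or a long enough truncation of S)."""
    return reduce(gcd, (s + 1 for s in gaps), 0) == 1


print(gap_shift_is_mixing(range(1, 100, 2)))   # odd gaps (Example 3.112): False
print(gap_shift_is_mixing({1, 2}))             # True
```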

As mentioned before, a gap shift (X, σ) is coded with C = {0s 1 : s ∈ S}

as the set of code words. Therefore Theorem 3.48 immediately gives that
(X, σ) (and any of its factors) is intrinsically ergodic; see also [159].
Theorem 3.114. The topological entropy of the S-gap shift $(X_S, \sigma)$ is $\log\lambda$ where $\lambda$ is the largest solution of the equation
$$
\sum_{s \in S} \lambda^{-(s+1)} = 1.
$$

Proof. Gap shifts are coded shifts, with C = {10s : s ∈ S} as code words.
Therefore the results of Section 3.3 apply, but the situation here is simpler
because UC from (3.5) reduces to {0∞ }. In fact, we can pass directly to the
representation of a gap shift by an infinite transition graph consisting of a
single central vertex from which loops of length s + 1, s ∈ S, emerge. So
Theorem 3.114 follows directly from Theorem 8.73.
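The equation of Theorem 3.114 rarely has a closed-form solution, but it is easy to solve numerically. The sketch below (hypothetical names) truncates $S$ at `max_gap`, which can only decrease $\lambda$, and locates the root by bisection on $[1, 2]$, using that the left-hand side is decreasing in $\lambda$. For the odd gaps of Example 3.112 it recovers $\frac12 \log 2$.

```python
import math


def gap_shift_entropy(gap_allowed, max_gap=400, tol=1e-14):
    """log(lambda), where lambda solves sum_{s in S, s <= max_gap} lambda^-(s+1) = 1."""
    gaps = [s for s in range(max_gap + 1) if gap_allowed(s)]

    def F(lam):                      # decreasing in lam
        return sum(lam ** -(s + 1) for s in gaps)

    lo, hi = 1.0, 2.0                # F(2) < 1 for any finite set of gaps
    if F(lo) <= 1:                   # at most one gap length: zero entropy
        return 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) > 1:
            lo = mid
        else:
            hi = mid
    return math.log((lo + hi) / 2)


h = gap_shift_entropy(lambda s: s % 2 == 1)    # S = {1, 3, 5, ...}
print(math.isclose(h, 0.5 * math.log(2)))      # True
```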

Exercise 3.115. Use Theorem 3.114 to compute the entropy of the Fi-
bonacci SFT, the odd shift, and the even shift.

A generalization of S-gap shifts was initiated in [183, 409]. For the alphabet $A = \{0, 1, \dots, d-1\}$ and for each $a \in A$, there is a set $S_a \subset \mathbb N_0$, such that the maximal blocks of each symbol $a$ must have length $s \in S_a$. If in addition, these blocks appear cyclically, i.e. as $0^{s_0}1^{s_1}\cdots(d-1)^{s_{d-1}}$, $s_i \in S_i$, before the next 0 is allowed to appear31, then we call this shift space the cyclic S-limited gap shift or simply cyclic S-gap shift. Clearly S-gap shifts are cyclic S-gap shifts on two symbols with $S_0 = S$ and $S_1 = \{1\}$.
For cyclic S-gap shifts, a fairly straightforward generalization of Theo-
rem 3.113 holds. For instance, a cyclic S-gap shift is topologically mixing if
and only if gcd{s0 + s1 + · · · + sd−1 : si ∈ Si } = 1; see [409, Proposition 3.6].
In [409, Theorem 4.3] (using the results of [159]), it is shown that cyclic S-gap shifts are intrinsically ergodic, and so are their factors. Also, conditions are given in [409, Theorems 5.1 and 5.2] for when two cyclic S-gap shifts are conjugate.
Theorem 3.116. The topological entropy of the cyclic S-gap shift $(X_S, \sigma)$ is $\log\lambda$ where $\lambda$ is the largest solution of the equation
$$
\prod_{a \in A} \sum_{s \in S_a} \lambda^{-s} = 1.
$$

Proof. We give the proof first for the truncated $S'_a := S_a \cap \{0, \dots, N\}$. Since the entropy increases as $N$ increases, the theorem follows by taking $N \to \infty$.
31 If $S_a \ni 0$, then the symbol $a$ can be “jumped”; if $S_a \supset \{0, 1\}$ for each $a$, then $X_S = A^{\mathbb N}$ or $A^{\mathbb Z}$.

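Also, we use the rome technique from Section 8.7.3 as opposed to the proofs in [183, 398, 409].
Let $B$ be the $n \times n$ transition matrix (for some $n \le d(N+1)$) for the cyclic $S'$-gap shift. Then by Theorem 8.72,
$$
\det(B - \lambda I_n) = (-\lambda)^{n-d} \det(A_{rome}(\lambda) - \lambda I_d)
$$
for
$$
A_{rome}(\lambda) = \begin{pmatrix}
0 & \Sigma_0 & 0 & \cdots & 0\\
0 & 0 & \Sigma_1 & \ddots & \vdots\\
\vdots & & \ddots & \ddots & 0\\
0 & & & 0 & \Sigma_{d-2}\\
\Sigma_{d-1} & 0 & \cdots & 0 & 0
\end{pmatrix},
$$
and $\Sigma_a := \sum_{s \in S'_a} \lambda^{1-s}$, $a \in A$. Since $A_{rome}(\lambda)$ has characteristic polynomial $t^d - \prod_{a \in A} \Sigma_a$, a straightforward computation gives that
$$
\det(B - \lambda I_n) = (-\lambda)^n - (-1)^n \lambda^{n-d} \prod_{a \in A} \Sigma_a = (-\lambda)^n \Big(1 - \prod_{a \in A} \sum_{s \in S'_a} \lambda^{-s}\Big).
$$
Therefore the leading root satisfies $\prod_{a \in A} \sum_{s \in S'_a} \lambda^{-s} = 1$.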
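The same bisection works for Theorem 3.116. This sketch (hypothetical names) assumes that each $S_a$ is a finite subset of $\{1, 2, \dots\}$ (so no symbol can be skipped, cf. footnote 31), in line with the truncation $S'_a$ used in the proof; then every factor is below 1 at $\lambda = 2$. With $d = 2$ and $S_1 = \{1\}$ it reproduces the equation of Theorem 3.114.

```python
import math


def cyclic_gap_entropy(gap_sets, tol=1e-14):
    """log(lambda), where prod_a sum_{s in S_a} lambda^(-s) = 1, for finite sets
    S_a of positive integers, one per symbol a = 0, ..., d-1."""
    def F(lam):                      # decreasing in lam
        return math.prod(sum(lam ** -s for s in S) for S in gap_sets)

    lo, hi = 1.0, 2.0                # F(2) < 1: each factor is below sum_{s>=1} 2^-s = 1
    if F(lo) <= 1:                   # every S_a a singleton (or one is empty)
        return 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if F(mid) > 1 else (lo, mid)
    return math.log((lo + hi) / 2)


h2 = cyclic_gap_entropy([range(1, 400, 2), {1}])   # the S-gap shift with odd gaps
print(math.isclose(h2, 0.5 * math.log(2)))         # True
print(cyclic_gap_entropy([{1}, {1}, {1}]))         # 0.0
```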

3.8. Spacing Shifts

Instead of determining the gaps between 1’s in allowed sequences of a sub-
shift, we can specify the distances of all (not just neighboring) 1’s. This leads
to the definition of a spacing shift. These were first described in [387] with-
out giving them a specific name. The name, as well as a rigorous treatment
of this type of subshift, stems from [47].
Definition 3.117. Given a subset $P \subset \mathbb N$, the spacing shift is the collection of all sequences $x \in \{0,1\}^{\mathbb N}$ or $\{0,1\}^{\mathbb Z}$ such that $x_i = x_j = 1$ implies $i = j$ or $|i - j| \in P$. We denote this subshift as $(X_P, \sigma)$.
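In contrast to gap shifts, membership in $\mathcal L(X_P)$ involves every pair of 1s, not just neighboring ones. A minimal sketch with $P$ passed as a predicate (our convention; the function name is hypothetical). As for gap shifts, a finite word satisfying the condition extends to a point of $X_P$ by padding with 0s, so the test is exact.

```python
from itertools import combinations


def in_spacing_language(word, in_P):
    """Is the finite 0-1 word in L(X_P)?  in_P(p) decides whether p is in P."""
    ones = [i for i, ch in enumerate(word) if ch == "1"]
    return all(in_P(j - i) for i, j in combinations(ones, 2))


even = lambda p: p % 2 == 0                   # P = 2N
print(in_spacing_language("10101", even))     # True  (distances 2, 2, 4)
print(in_spacing_language("1001", even))      # False (distance 3)
```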
Example 3.118. If P = 2N, then XP is the odd shift from Example 1.4. If
P = 2N + 1, then
XP = {0∞ } ∪ {σ n (· · · 000.1000 · · · )}n∈Z ∪ {σ n (· · · 000.11000 · · · )}n∈Z .

Clearly every spacing shift contains $0^\infty$ and $\cdots0001000\cdots$, even if $P = \emptyset$, and $P \subset P'$ implies that $X_P \subset X_{P'}$. In particular, no spacing shift is minimal. Also $(X_P, \sigma)$ is an SFT if and only if $P$ is cofinite, and $\max\{\mathbb N \setminus P\} - 1$ is the length of the longest forbidden word of the SFT; see [47, Theorem

The condition in the definition of the spacing shift is more restrictive than the restriction of gap shifts. Clearly if $p_1, p_2 \in P$, then $10^{p_1-1}10^{p_2-1}1 \in \mathcal L(X)$ only if also $p_1 + p_2 \in P$. Therefore it is natural to require that $P$ is closed under addition, but this requirement is a necessary (but not the only) condition for a spacing shift to be a gap shift; see [47, Theorems 2.19 and 2.21]. However, there are many interesting spacing shifts $(X_P, \sigma)$ for which $P$ is not closed under addition.
Proposition 3.119. If a spacing shift $(X_P, \sigma)$ is topologically transitive, then $P$ is infinite. If $P \ne \emptyset$ is closed under addition, then $(X_P, \sigma)$ is topologically transitive.

Proof. If P is finite, then no point x ∈ [1] can return to [1] infinitely often,
so topological transitivity fails. Conversely, assume that u, v ∈ L(XP ) both
start and end with a 1, and let $p \in P$ be arbitrary. We claim that $w = u0^{p-1}v$
also belongs to L(XP ). Indeed, let 1 ≤ i < j ≤ |w| be such that wi = wj = 1.
If j ≤ |u|, then |j − i| ∈ P because u ∈ L(XP ), and if |u| + p ≤ i, then
|j −i| ∈ P because v ∈ L(XP ). The case i ≤ |u| < |u|+p ≤ j follows because
P is closed under addition. Therefore (XP , σ) is topologically transitive. 

The same proof gives that $(X_P, \sigma)$ has a dense set of periodic orbits, so if $P$ is closed under addition and non-empty, then $(X_P, \sigma)$ is automatically Devaney chaotic.
Contrary to gap shifts, spacing shifts are hereditary, which gives a certain
freedom in constructing proofs.
Theorem 3.120. A spacing shift (XP , σ) is Li-Yorke chaotic if and only if
the set P is infinite.

Proof. If P is finite, then every x ∈ XP has only finitely many non-zero

entries, so XP is countable. This is too small to allow for an uncountable
scrambled set.
For the converse, let $P = (p_i)_{i=1}^\infty$. Construct a scrambled set $S \subset \{0,1\}^{\mathbb N}$ as in Example 2.68, and map it into $X_P$ via
$$
\pi(x)_n = \begin{cases} x_i & \text{if } n = p_i,\\ 0 & \text{otherwise.} \end{cases}
$$
Then π(S) ⊂ XP is still scrambled and still uncountable. 

A spacing shift $(X_P, \sigma)$ is topologically mixing if and only if $P$ is cofinite, which can easily be seen from the definition by taking open sets $U = V = [1] \subset X_P$. It is weakly topologically mixing if and only if $P$ is thick; see [387, Proposition 1]32 and [47, Theorem 2.1]. As such, for every thick set $P$ with $\mathbb N \setminus P$ infinite, $(X_P, \sigma)$ is topologically weak mixing but not topologically mixing; see [387, Theorem 1.3]. It is known from [266] that a dynamical system $(X, T)$ is topologically weak mixing if the product of $(X, T)$ with every transitive system is again transitive. However, [387, Theorem 1.1] points out that the converse is false. Namely, if $P$ and $P'$ are disjoint thick sets, then $(X_P, \sigma)$ and $(X_{P'}, \sigma)$ are both topologically weak mixing, but their product is not topologically transitive (since $[1] \times [1]$ is not infinitely recurrent). Topologically weak mixing precludes the existence of non-constant Borel-measurable eigenfunctions [359]; i.e. no non-constant Borel function $f : X \to \mathbb C$ satisfies $U_T f := f \circ T = \lambda f$. However, [387, Theorem 1.2] presents a spacing shift that is not topologically weak mixing, but which has a non-constant Borel eigenfunction. All this shows that there is only a partial topological analog of the characterization of measure-theoretically weak mixing in Theorem 6.86.
32 The term replete was used instead of thick.
Regarding topological entropy, htop (XP , σ) ≥ k1 log 2 if P ⊃ kN; cf.
Theorem 3.54 and [47, Lemma 3.1]. Other than this, there seems to be no
easy way to compute the topological entropy htop (XP , σ) from the properties
of P . However [47, Theorem 3.6] gives a criterion for htop (XP , σ) = 0.

3.9. Power-Free Shifts

A square is a non-empty word of the form $ww$, e.g. the word bonbon; a cube is a non-empty word of the form $www$, e.g. the Dutch word kerkerker33. Naturally, you can go to higher powers, or fractional powers, e.g. the word sense. An overlap is a non-empty word of the form $wvwvw$, e.g. the word alfalfa, where $w$ = a and $v$ = lf. Naturally, every square word or higher power of length ≥ 4 contains an overlap. In fact, an overlap is a fractional power or repetition where the exponent $p/q > 2$; see Definition 3.126 below.
Thue’s pioneering articles [531, 533] on (what is now called) the Thue-
Morse sequence, see Example 1.6, started this topic in the early 20th century.
In computer science, finding languages that avoid powers and overlaps has
been pursued at least since the 1970s.

Definition 3.121. A subshift $(X, \sigma)$ over some alphabet $A$ is called square-free, cube-free, power-free, and overlap-free, if its language $\mathcal L(X)$ contains no squares, cubes, etc. Overlap-free, power-free, and repetition-free are here synonyms, but if the exponent is indicated, $k^+$-repetition-free = $k^+$-power-free = $k$-overlap-free. That is, $w^k$ is a $k$-th power or $k$-th repetition, but $w^kw_1$, where $w_1$ is the first letter of $w$, is a $k$-overlap.

33 Meaning church-niche, just as kerkerkerkerker means dungeon in a church-niche.


Theorem 3.122. The smallest alphabet size for which square-free subshifts exist is 3. The Thue-Morse sequence is square+ε-free (i.e. overlap-free) in the sense that $www_1 \notin \mathcal L(X)$ for every $w \in \mathcal L(X)$, where $w_1$ is the first letter of $w$.
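These notions are easy to test by brute force on finite words. The following naive Python sketch (hypothetical names, roughly cubic running time, fine for a few hundred letters) generates a prefix of the Thue-Morse sequence with $0 \mapsto 01$, $1 \mapsto 10$ and checks it for squares and overlaps; it also confirms that every binary word of length 4 contains a square.

```python
from itertools import product


def thue_morse(n):
    """Prefix of length 2^n of the Thue-Morse sequence, via 0 -> 01, 1 -> 10."""
    w = "0"
    for _ in range(n):
        w = "".join("01" if ch == "0" else "10" for ch in w)
    return w


def has_square(w):
    """Does w contain a factor uu with u non-empty?"""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1) for i in range(n - 2 * p + 1))


def has_overlap(w):
    """Does w contain a factor u u u_1 (u non-empty, u_1 its first letter),
    i.e. a factor of length 2p + 1 with period p?"""
    n = len(w)
    return any(all(w[i + j] == w[i + j + p] for j in range(p + 1))
               for p in range(1, (n - 1) // 2 + 1) for i in range(n - 2 * p))


tm = thue_morse(8)                                    # 256 letters
print(has_overlap(tm), has_square(tm))                # False True
print(all(has_square("".join(t)) for t in product("01", repeat=4)))   # True
```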

The crux of the proof relies on a property of the substitution $\chi_{TM}$ (see [107, Theorem 3]) which is an idea we can reuse elsewhere, so we formulate it in the following lemma.
Lemma 3.123. A word $w \in \{0,1\}^*$ has a $k$-overlap if and only if $\chi_{TM}(w)$ has a $k$-overlap. The same result holds for $(k+1)$-power instead of $k$-overlap.

This is different from Theorem 3.122 in the sense that $w$ can be any 0-1-word, not just words from the Thue-Morse language $L(X_{\rho_{TM}})$. On the other hand, it applies to every k ≥ 1, not just to k = 1 (i.e. overlap-free/squares-

Proof. It is immediate that if $w$ has a $k$-overlap (or $k$-power), then so has $\chi_{TM}(w)$. Hence we only have to prove the “if”-direction, and we start with a preliminary remark. By the shape of $\chi_{TM}$,
$$|\chi_{TM}(x)|_0 = |\chi_{TM}(x)|_1 = |x| \quad \text{for all } x \in \{0,1\}^*. \tag{3.38}$$
Suppose that $w \in \{0,1\}^*$ is such that $\chi_{TM}(w)$ contains a $k$-overlap, i.e. $\chi_{TM}(w) = a v^k v_1 b$ where $v_1$ is the first letter of $v$, but $w$ itself does not contain a $k$-overlap. Assume also that $w$ is the shortest word with this property, so $|a|, |b| \le 1$. Suppose by contradiction that there is $x$ such that $\chi_{TM}(x) = avvc$ with $|a|, |c| \le 1$. Since $|\chi_{TM}(x)| = |avvc|$ is even, $|a| = |c|$.
If $|v|$ is odd, then $|vv|_0 - |vv|_1 = \pm 2$, so $a, c \ne \varepsilon$ by (3.38). But then $|a| = |c| = 1$, and when we now divide $avvc$ into blocks of two, the two copies of $v$ are chopped into two-blocks in two different ways. Each such block is 01 or 10; therefore a = v2 = v4 = · · · = vn and c = v1 = v3 = · · · = vn−1 = a, where we wrote $v = v_1 \cdots v_n$. But then $|avvc|_0 = 1 + 2|v|_0 = 1 + 2|v|_1 = |avvc|_1$, contradicting (3.38).
Therefore |v| is even. If |a| = 1, then if we divide avvc in blocks of
two, we see that a = v1 and c = vn . The parity of (3.38) gives a = c, so
a = vn = v1 = c. But then w shortened by its last letter has the same
property as w, contradicting that w is shortest.
Hence $a = c = \varepsilon$ and $\chi_{TM}(x) = vv$. But then $x^k x_1$ is a prefix of $w$, so $w$ contains a $k$-overlap after all. □

Proof of Theorem 3.122. If you try to create a two-letter square-free word, then you soon get stuck:
0 → 01 → 010 → stuck.
124 3. Subshifts of Positive Entropy

To create a three-letter square-free infinite word (a problem that was first solved by Thue; see [531, 533] and also [290, 404] for more modern results and approaches), start with a fixed point ρ0 of the Thue-Morse substitution $\chi_{TM}$ and replace the symbol by a 2 if a square threatens to appear:
0120 1021 20210120 1021012021201021 . . . .
This turns out to work. Another way of creating a square-free word x from $\rho_{TM}$ is by taking $x_i = \rho_{TM,i} - \rho_{TM,i+1} \pmod 3$, because if x contains a square, then $\rho_{TM}$ contains an overlap.
For the Thue-Morse sequence, we work by induction on $n$ in $\chi_{TM}^n$. We can see by inspection that the first 8 digits of $\rho_{TM}$ are overlap-free. By applying $\chi_{TM}$ and Lemma 3.123, the first $16, 32, \dots, 2^n, \dots$ digits of $\rho_{TM}$ are overlap-free as well. □
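Both constructions in this proof can be tested on finite prefixes. The sketch below (function names are ours) generates $\rho_0$ via the binary digit-sum parity, checks that a long prefix is overlap-free but not square-free, forms the ternary word $x_i = \rho_i - \rho_{i+1} \pmod 3$, and checks that this prefix is square-free. A prefix check is evidence, not a proof.

```python
def thue_morse(n: int) -> str:
    """First n letters of rho_0: rho_i is the parity of the binary digit sum of i."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def has_square(w: str) -> bool:
    """True if w contains a non-empty factor of the form uu."""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))


def has_overlap(w: str) -> bool:
    """True if w has a factor of length 2p + 1 with period p (an overlap)."""
    n = len(w)
    return any(all(w[j] == w[j + p] for j in range(i, i + p + 1))
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p))


rho = thue_morse(256)
# Ternary word x_i = rho_i - rho_{i+1} (mod 3); Python's % returns values in {0, 1, 2}.
x = "".join(str((int(a) - int(b)) % 3) for a, b in zip(rho, rho[1:]))

print(rho[:16], has_overlap(rho), has_square(rho))
print(x[:16], has_square(x))
```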

The next lemma (see [107, Theorem 2]) can be used to produce any number of square-free languages.
Lemma 3.124. Let $\chi: A \to B^*$ be a constant length substitution; i.e. $|\chi(a)| = |\chi(b)|$ for all $a, b \in A$. If $\chi(w)$ is square-free for every square-free 3-letter word $w$, then $\chi(x)$ is square-free for square-free words $x$ of any length.
Proof. Clearly $\chi(a) \ne \chi(b)$ for all $a \ne b$, because otherwise $\chi(aba)$ is not square-free. If $|\chi(a)| = 1$ for all $a \in A$, then $\chi$ is an injective renaming of letters, and $\chi$ preserves square-freeness. So let us assume that $|\chi(a)| =: d \ge 2$.
Assume by contradiction that a square-free word $x = x_1 \cdots x_n$ maps to a non-square-free word $\chi(x) = rsst$. Assume that $x$ is the shortest such word, so $|x| \ge 4$, $\chi(x_1) = r r'$ for some non-empty prefix $r'$ of $ss$, and $\chi(x_n) = t' t$ for some non-empty suffix $t'$ of $ss$. However, $|\chi(x_1)| = |\chi(x_n)|$, so there is some $1 < k < n$ such that $\chi(x_k) = y y'$ and
$$x = x_1\, u\, x_k\, v\, x_n \longmapsto r\, \underbrace{r'\, \chi(u)\, y}_{s}\, \underbrace{y'\, \chi(v)\, t'}_{s}\, t.$$
Therefore $|r'| + d|u| + |y| = |s| = |y'| + d|v| + |t'|$.

• If $|r'| = |y'|$, then $r' = y'$ because both are prefixes of $s$. Therefore $d > \big||t'| - |y|\big| = d\,\big||u| - |v|\big|$, so $|u| = |v|$, $|y| = |t'|$, and $y = t'$ (both are suffixes of $s$). But then also $\chi(u) = \chi(v)$, so $u = v$ because $\chi$ is injective. If $x_k = x_1$ or $x_k = x_n$, then $x = x_1 u x_1 u x_n$ or $x = x_1 u x_n u x_n$, contradicting that $x$ is square-free. If $x_k \notin \{x_1, x_n\}$, then $x_1 x_k x_n$ is a square-free 3-letter word, but $\chi(x_1 x_k x_n) = r r' y y' t' t = r y' y y' y t$ is not square-free, contrary to the hypothesis of the lemma.

• If $|r'| > |y'|$, then $\chi(x_1) = r y' r''$ where $r'' \ne \varepsilon$, and $\chi(v_1) = r'' r'''$. Since $|\chi(x_1)| = |\chi(v_1)| = d$, also $r''' \ne \varepsilon$. Now $\chi(x_1 v_1 x_1) = r y' r'' r'' r''' r y' r''$ is not square-free, so $x_1 = v_1$. Thus we can rewrite $\chi(x_1) = r'' q r''$ for some $q \ne \varepsilon$, because otherwise not even $\chi(x_1)$ is square-free. But $r'''$ is also a prefix of $\chi(u)$, so $r'' q r'' r'''$ is a prefix of $\chi(x_1 u)$, contradicting the minimality of $x$.
• If $|r'| < |y'|$, then $\chi(x_k) = y r' y''$ where $y'' \ne \varepsilon$, and $\chi(u_1) = y'' y'''$. Since $|\chi(x_k)| = |\chi(u_1)| = d$, also $y''' \ne \varepsilon$. Now $\chi(x_k u_1 x_k) = y r' y'' y'' y''' y r' y''$ is not square-free, so $x_k = u_1$. Thus we can rewrite $\chi(x_k) = y'' q y''$ for some $q \ne \varepsilon$, because otherwise not even $\chi(x_k)$ is square-free. But $y'''$ is also a prefix of $\chi(v)$, so $y'' q y'' y'''$ is a prefix of $\chi(x_k v)$, contradicting the minimality of $x$.
This proves the lemma. Note that $\chi: a \to ab,\ b \to cb,\ c \to cd$ maps every square-free 2-letter word to a square-free word, but $\chi(abc) = abcbcd$ is not square-free. Therefore the word length $|w| = 3$ in the hypothesis of the lemma is optimal. □
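The hypothesis of Lemma 3.124 is a finite check, so it can be automated. The sketch below (our own helper names, brute force over all words of a fixed length) tests whether a substitution maps every square-free word of a given length to a square-free word, and reproduces the counterexample above: lengths 1 and 2 pass, length 3 fails.

```python
from itertools import product


def has_square(w: str) -> bool:
    """True if w contains a non-empty factor of the form uu."""
    n = len(w)
    return any(w[i:i + p] == w[i + p:i + 2 * p]
               for p in range(1, n // 2 + 1)
               for i in range(n - 2 * p + 1))


def apply_substitution(sub: dict, w: str) -> str:
    """Extend the letter map sub to words by concatenation."""
    return "".join(sub[c] for c in w)


def preserves_square_freeness(sub: dict, length: int) -> bool:
    """Check that sub maps every square-free word of this length
    (over the letters that are keys of sub) to a square-free word."""
    for letters in product(sorted(sub), repeat=length):
        w = "".join(letters)
        if not has_square(w) and has_square(apply_substitution(sub, w)):
            return False
    return True


# The 2-uniform example from the text: a -> ab, b -> cb, c -> cd.
chi = {"a": "ab", "b": "cb", "c": "cd"}
for length in (1, 2, 3):
    print(length, preserves_square_freeness(chi, length))
print(apply_substitution(chi, "abc"))
```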

Lemma 3.124 is a building block for the proof of the following result; see [107, Theorem 5].

Theorem 3.125. The square-free subshift (X, σ) on three or more letters has positive topological entropy.

As we mentioned in Theorem 3.122, there are no square-free sequences in two letters, but the Thue-Morse sequences are overlap-free. Their entropy, however, is zero: the word-complexity is known exactly; see (1.2).

Proof. The idea is to start with a square-free word $x \in \{0,1,2\}^n$, from which we can create $2^n$ different square-free words in a 6-letter alphabet $A = \{a, a', b, b', c, c'\}$ by replacing occurrences of 0 by $a$ or $a'$, occurrences of 1 by $b$ or $b'$, and occurrences of 2 by $c$ or $c'$, all independently. Let $y$ be any of the resulting words, and apply the following length-22 substitution to $y$:
$$\chi: \begin{cases} a \to 0102012021012102010212,\\ a' \to 0102012021201210120212,\\ b \to 0102012101202101210212,\\ b' \to 0102012101202120121012,\\ c \to 0102012102010210120212,\\ c' \to 0102012102120210120212. \end{cases}$$
By Lemma 3.124 and because $\chi$ is injective, this produces $2^n$ square-free words of length $22n$ in $\{0,1,2\}^*$. Hence the topological entropy satisfies $h_{top}(X,\sigma) \ge \lim_n \frac{1}{22n}\log 2^n = \frac{1}{22}\log 2 \approx 0.0315 > 0$.

In addition, the proof in [107] also produces an upper bound, by remarking that there are 1,172 square-free 24-letter words starting with 01. Combined with the six square-free 2-letter words, there are altogether $6 \cdot 1{,}172$ square-free 24-letter words, and they can be extended in at most 1,172 ways to a square-free 46-letter word, etc. This gives $p(n) \le 6 \cdot 1{,}172^{n/22}$, so that $h_{top}(X,\sigma) \le \frac{1}{22}\log 1{,}172 \approx 0.321$. □

Methods other than the one in this proof have been designed; see e.g. [60, 114, 235, 366, 472, 518]. If $p(n)$ denotes the number of square-free words in $\{0,1,2\}^n$, then $h_{top}(X,\sigma) = \lim_n \frac{1}{n}\log p(n)$. For the square-free subshift over $\{0,1,2\}$, the most accurate estimate to date is $h_{top}(X,\sigma) = \log\alpha$ for $1.3017597 < \alpha < 1.3017619$; see [503, 504], which also contain numerical estimates of the topological entropy of k-power-free shifts for various values of k and alphabet sizes.
Definition 3.126. If $w$ is a finite word, its repetition exponent is the largest rational $\frac{p}{q}$ such that there is a prefix $v$ of $w$ such that $w$ is a prefix of $v^\infty$ and $|w| = \frac{p}{q}|v|$. If $x$ is an infinite word, then the critical exponent of $x$ is the supremum of the repetition exponents of all its subwords $w$.
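The largest $\frac{p}{q}$ in Definition 3.126 is attained by the shortest admissible prefix $v$, i.e. by the smallest period of $w$, so the repetition exponent is $|w|$ divided by that period. The sketch below (our own helper names) computes it for the example words of this section and, as an illustration only, takes the maximum over all factors of a finite Thue-Morse prefix, which is a lower bound for the critical exponent.

```python
from fractions import Fraction


def smallest_period(w: str) -> int:
    """Length of the shortest prefix v of w such that w is a prefix of v^infinity."""
    n = len(w)
    for p in range(1, n + 1):
        if all(w[i] == w[i + p] for i in range(n - p)):
            return p
    return n  # only reached for the empty word


def repetition_exponent(w: str) -> Fraction:
    """|w| / |v| for the shortest such prefix v (w must be non-empty)."""
    return Fraction(len(w), smallest_period(w))


def max_exponent_of_factors(x: str) -> Fraction:
    """Largest repetition exponent over all non-empty factors of the finite word x."""
    best = Fraction(0)
    for i in range(len(x)):
        for j in range(i + 1, len(x) + 1):
            best = max(best, repetition_exponent(x[i:j]))
    return best


tm = "".join(str(bin(i).count("1") % 2) for i in range(32))
for word in ("bonbon", "kerkerker", "sense", "alfalfa"):
    print(word, repetition_exponent(word))
print(max_exponent_of_factors(tm))
```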

Thus the Thue-Morse sequence has critical exponent 2, and this is the smallest critical exponent of any sequence in $\{0,1\}^{\mathbb{N}}$. For general finite alphabets, we have Dejean’s Theorem:
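Theorem 3.127 (Dejean’s Theorem). The least critical exponent of $x \in \{0, \dots, N-1\}^{\mathbb{N}}$ is
$$\begin{cases} \frac{7}{4} & \text{if } N = 3,\\ \frac{7}{5} & \text{if } N = 4,\\ \frac{N}{N-1} & \text{if } N = 2 \text{ or } N \ge 5. \end{cases}$$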

The proof was completed in a series of articles [173, 189, 427, 470]. See [467, 468] for related results. This raises the question of the topological entropy of fractional repetition-free subshifts. For example, in [343] it is shown that the 7/3-repetition-free subshift over 2 letters has polynomial word-complexity, whereas k-repetition-free shifts have positive entropy if k > 7/3.
In fact [343, Theorems 7 and 11], the word-complexity of the k-repetition-
free language satisfies

p(n) = O(nlog2 25 ) if 2 < k ≤ 7/3,
log p(n)
0 < lim supn n ≤ 63 log 2 if k > 7/3.

Furthermore, a two-sided infinite word is k-overlap-free for some 2 < k ≤ 7/3 if and only if the set of all its subwords is contained in $L(\rho_{TM})$.

Cassaigne [145] discovered that for overlap-free shifts, there is no $\gamma \ge 1$ such that $p(n) \approx n^\gamma$, because the behavior differs along different subsequences. The following theorem comes from [337].

Theorem 3.128. Let $(X_{OF}, \sigma)$ be the overlap-free shift, and $p_{OF}(n)$ its word-complexity. Then:
• $\liminf_{n\to\infty} \frac{\log p_{OF}(n)}{\log n} \in [1.2690, 1.2736]$.
• $\limsup_{n\to\infty} \frac{\log p_{OF}(n)}{\log n} \in [1.3322, 1.3326]$.
• The ratio $\frac{\log p_{OF}(n)}{\log n}$ has a limit as $n \to \infty$ along some subsequence of density 1, and this limit belongs to the interval $[1.3005, 1.3098]$.

In addition to the word-complexity, several authors also study the minimal frequency
$$f(a) = \liminf_{n\to\infty} \frac{|x_1 \cdots x_n|_a}{n}$$
of a letter $a \in A$ in k-repetition-free sequences x. This turns out to be a non-trivial number, at least for the minimal alphabet size that allows a k-power-free shift. For example, [530] computes that within square-free sequences in $\{0,1,2\}^{\mathbb{N}}$, $f(0) \ge 0.2746\ldots$. For k-repetition-free sequences with $2 < k \le 7/3$, it was shown in [367] that $f(0) = \frac{1}{2}$, which is in agreement with the above result that all subwords belong to $L(\rho_{TM})$.

Proposition 3.129. Power-free and overlap-free subshifts are not sofic.

Proof. If (X, σ) were sofic, then there would be a finite edge-labeled transition graph representing X; see Theorem 3.32. But then we can traverse a loop arbitrarily often, creating powers of arbitrarily high order. (This is basically the Pumping Lemma 7.9 from Section 7.2.2.) □

Since k-power-free shifts don’t contain blocks $0^k$, they are not S-gap or spacing shifts either. Moreover, because power-free shifts contain no periodic sequences, we can ask whether power-free shifts are minimal. The entire k-power-free shift is not minimal, because it contains non-recurrent sequences, but there exist minimal k-power-free subshifts for any value of k > 0. Naturally this holds for the Thue-Morse shift, and by Lemma 3.123 other overlap-free shifts are obtained by applying substitutions to $(X_{TM}, \sigma)$. Theorem 4.4 in Section 4.1 shows that linearly recurrent shifts with constant L are $(L+1)$-power-free. Sturmian shifts are also k-power-free for k sufficiently large if and only if their frequencies are of bounded type; see Example 4.46 in Section 4.2.5.

3.10. Dyck Shifts

Dyck34 shifts first appeared in a paper by Krieger [373], whose interest was partially to give examples of subshifts with multiple measures of maximal entropy.
Definition 3.130. A Dyck shift (X, σ) is a two-sided shift on an alphabet of k types of bracket pairs, such that every v ∈ L(X) can be extended to a word $v'$ in such a way that all opening/closing brackets in v are closed/opened in $v'$ and every two distinct pairs of opening and closing brackets are unlinked in $v'$.
For example, ( ) [ ] is a legal word, as is ( ( ( ) [ ] [, but ( [ [ ) is illegal because the ( bracket is not allowed to be closed before the [ brackets are.
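Membership in $L(X)$ for a finite word can be decided with one stack, in the spirit of the push-down automaton mentioned below. In the sketch that follows (our own code, shown for two bracket types), an opening bracket is pushed; a closing bracket must match the top of the stack if the stack is non-empty, and a closing bracket met with an empty stack is allowed because a matching opening bracket can be added on the left. The last two examples are the words ( ) ] and ( ( ) ], which reappear in Example 3.135.

```python
# Closing bracket -> matching opening bracket; add pairs for more bracket types.
PAIRS = {")": "(", "]": "["}
OPENING = set(PAIRS.values())


def is_dyck_word(v: str) -> bool:
    """Decide whether the finite word v belongs to the language of the Dyck shift.

    Opening brackets go on a stack.  A closing bracket must match the opening
    bracket on top of the stack, if there is one; otherwise it can be opened
    further to the left.  Brackets left on the stack can be closed on the right.
    """
    stack = []
    for s in v:
        if s in OPENING:
            stack.append(s)
        elif s in PAIRS:
            if stack:
                if stack[-1] != PAIRS[s]:
                    return False          # the two bracket pairs would be linked
                stack.pop()
        else:
            raise ValueError(f"unexpected symbol {s!r}")
    return True


for v in ("()[]", "((()[][", "([[)", ")(", "()]", "(()]"):
    print(v, is_dyck_word(v))
```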

Every Dyck shift has positive entropy, because it contains the coded shift
with code words () and (()). However, a Dyck shift with at least two pairs
of brackets is not synchronized. Indeed, there is no word v that can synchronize in such a way that both ([v)] and [(v]) become admissible. On
the other hand, the Dyck shift is coded; see [450, Example 5.5]. Indeed,
let C be the collection of all the well-formed expressions with brackets
where each opening bracket is closed, without linking. In the terminology of
groups generated by the brackets, these are the expressions that reduce to
the identity if each pair of brackets ( ) = [ ] = · · · = Id.
Example 3.131. The language of the Dyck shift with one pair of brackets
is isomorphic to the language of the full shift on two symbols (with entropy
log 2), because every word v ∈ {(, )}∗ can be extended by brackets on the left
to supply brackets ( for every unopened ) and on the right to supply ) for
every unclosed (. The collection Lext of such extended words in which every
opening bracket is closed, and vice verse without illegally linked pairs, has
a representation with a countable automaton; see Figure 3.20. It can also
be represented as a push-down automaton (see Section 7.2.2), where we put
or remove a plate on/from the stack whenever we read an opening/closing

Figure 3.20. A countable automaton for the two bracket Dyck shift.

34 Named after Walther von Dyck (1856–1934) who, being a student of Klein, was more interested in group theory.


Exercise 3.132. Show that the number of well-formed expressions in $L_{ext}$ of length $2n$ is $C_n = \frac{1}{n+1}\binom{2n}{n}$, i.e. the n-th Catalan number.
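A numerical check of Exercise 3.132 (not a solution) can track the nesting depth of prefixes. The sketch below is our own code: an opening bracket raises the depth, its matching closing bracket lowers it, and the depth may never become negative. With k bracket types each opening bracket has k choices and the closing bracket is forced, which gives the count $C_n k^n$ used in the entropy discussion below. The last line compares $C_{100}$ with the leading asymptotic term $4^n/(\sqrt{\pi}\,n^{3/2})$.

```python
from math import comb, pi, sqrt


def count_well_formed(n: int, k: int = 1) -> int:
    """Number of well-formed expressions of length 2n with k bracket types.

    ways[d] counts prefixes with current nesting depth d.  An opening bracket
    (k choices) increases d; the closing bracket, whose type is forced by the
    matching opening bracket, decreases d.  The depth never becomes negative.
    """
    ways = {0: 1}
    for _ in range(2 * n):
        new = {}
        for d, c in ways.items():
            new[d + 1] = new.get(d + 1, 0) + k * c
            if d > 0:
                new[d - 1] = new.get(d - 1, 0) + c
        ways = new
    return ways.get(0, 0)


def catalan(n: int) -> int:
    return comb(2 * n, n) // (n + 1)


print([count_well_formed(n) for n in range(8)])
print([catalan(n) for n in range(8)])
n = 100
print(catalan(n) * sqrt(pi) * n ** 1.5 / 4 ** n)  # close to 1 - 9/(8n) + ...
```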

The generating function of the Catalan numbers (with $C_0 = 1$ by convention) is
$$G_{Cat}(x) := \sum_{n=0}^{\infty} C_n x^n = \frac{1 - \sqrt{1-4x}}{2x} = \frac{2}{1 + \sqrt{1-4x}}.$$
More precise asymptotics are
$$C_n = \frac{4^n}{\sqrt{\pi}}\left(n^{-3/2} - \frac{9}{8}n^{-5/2} + \frac{145}{128}n^{-7/2} + \cdots\right),$$
so the entropy of the Dyck shift with one pair of brackets is indeed log 2, just as in the full shift of two symbols: the well-formed 2n-words are a small fraction of all 2n-words, but not an exponentially small fraction.
To compute the entropy of the Dyck shift with k types of bracket pairs,
we obtain the well-formed expressions of length 2n by starting with the well-
formed expressions with one pair of brackets and then, for each joined pair of
open-and-closing brackets, choosing one of the k bracket types independently.
Thus there are $C_n k^n$ well-formed expressions of length $2n$ with k types of bracket pairs, and the entropy is $\ge \log 2 + \frac{1}{2}\log k$. This is only a lower bound, because not every 2n-word in this Dyck shift is a well-formed expression. The topological entropy is really $\log(k+1)$, which follows from the next result by Krieger.

Theorem 3.133. The Dyck shift (X, σ) on k ≥ 2 types of bracket pairs has
exactly two ergodic measures of maximal entropy log(k + 1), and each one is
fully supported and isomorphic to a Bernoulli shift.

The notion of isomorphism between measures, as well as the techniques used in the proof, are discussed in detail in Sections 6.5 and 6.1.

Proof. Let B− ⊂ X be the set of all sequences in which every left bracket
has a corresponding right bracket, and let B+ be the set of all sequences in
which every right bracket has a corresponding left bracket. Note that B+ and
B− are shift-invariant. One can show that every shift-invariant measure has
μ(B− ∪ B+ ) = 1 by partitioning the complement into a countable collection
of disjoint sets indexed by the location of the first/last left/right bracket
with no partner.
Define a map π+ : B+ → {0, 1, . . . , k}Z by sending the k left brackets
to the symbols {1, . . . , k} and sending every right bracket to the symbol 0.
Then π+ is an isomorphism between the two shift spaces because every right

bracket has a corresponding left bracket, and hence its identity is uniquely
determined by the rules of the shift. Similarly, the analogous map π− :
B− → {0, 1, . . . , k}Z is an isomorphism.
Because every ergodic invariant measure on X is supported on either
B− or B+ , we conclude that htop (X, σ) = log(k + 1) and that there are
exactly two ergodic measures of maximal entropy μ± = ν ◦ π± , where ν is
the Bernoulli measure on the full shift on k + 1 symbols that gives equal
weight to all symbols. Each of these measures gives positive measure to
every open set in X.
Finally, note that if k = 1, then B+ and B− largely overlap. If we let ν be the $(\frac12, \frac12)$-Bernoulli measure, then by the Law of Large Numbers, the mass is concentrated on sequences with zeros and ones occurring with frequency 1/2, so that the number of opening brackets and closing brackets is asymptotically the same and μ+ = μ−. □
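The value $\log(k+1)$ can also be seen numerically. For counting legal words, only the depth of the stack of unmatched opening brackets matters: at depth d > 0 there are k + 1 legal symbols (k opening brackets and the one matching closing bracket), and at depth 0 there are 2k (k opening brackets and k unmatched closing brackets). The sketch below (our own code) counts words of the language this way for k = 2, cross-checks the counts against a brute-force stack test for short lengths, and prints $\frac{1}{n}\log p(n)$ next to $\log 3$.

```python
from itertools import product
from math import log

PAIRS = {")": "(", "]": "["}


def is_legal(v: str) -> bool:
    """Stack test for membership in the language of the Dyck shift (two bracket types)."""
    stack = []
    for s in v:
        if s in PAIRS:
            if stack and stack.pop() != PAIRS[s]:
                return False
        else:
            stack.append(s)
    return True


def count_dyck_language(n: int, k: int) -> int:
    """Number of words of length n in the language of the Dyck shift with k bracket types."""
    ways = {0: 1}                     # ways[d]: words whose unmatched-opening stack has depth d
    for _ in range(n):
        new = {}
        for d, c in ways.items():
            new[d + 1] = new.get(d + 1, 0) + k * c        # any of k opening brackets
            if d > 0:
                new[d - 1] = new.get(d - 1, 0) + c        # the matching closing bracket
            else:
                new[0] = new.get(0, 0) + k * c            # an unmatched closing bracket
        ways = new
    return sum(ways.values())


brute = [sum(is_legal("".join(t)) for t in product("([)]", repeat=n)) for n in range(7)]
print(brute)
print([count_dyck_language(n, 2) for n in range(7)])
n = 1000
print(log(count_dyck_language(n, 2)) / n, log(3))
```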

This and the next result from [373] have been shown in simplified form
in the Math Blog of Climenhaga [157].

Proposition 3.134. The set of ergodic measures for the Dyck shift is arc-
wise connected but is not dense in the Choquet simplex of invariant measures
(see Section 6.1 for the definition).

Proof. Let M± denote the set of ergodic measures supported on B±. By the isomorphism in the previous proof, each of M+ and M− is arc-wise connected. Moreover, because B+ ∩ B− is a non-empty closed invariant subset of X, it supports at least one ergodic measure; hence $M_+ \cap M_- \neq \emptyset$. This shows arc-wise connectedness of the set of ergodic measures $M_{erg}$.
To see that Merg is not dense in the Choquet simplex M (i.e. M is
not a Poulsen simplex), let ν1 be the δ-measure supported on the fixed point
. . . [[[. . . , and let ν2 be the δ-measure supported on the fixed point . . . ))) . . . .
Let $\nu = \frac12(\nu_1 + \nu_2)$. Then any ergodic measure μ close to ν in the weak∗ topology must give mass close to $\frac12$ to each of the 1-cylinders corresponding to [ and ), and almost no mass to the 1-cylinders corresponding to ] and (. Thus if x is a typical35 point for μ, most symbols in x are [ and ). However, the Dyck shift does not contain such x, because the symbol ) cannot appear until all the preceding symbols [ have been closed with the corresponding symbol ]. This contradiction shows that ν cannot be approximated by ergodic measures in the weak∗ topology, so $M_{erg}$ is not dense in M. □

35 In the sense of the Birkhoff Ergodic Theorem 6.13.




Figure 3.21. A three-dimensional ‘heterogeneous’ baker transformation.

Example 3.135. A piecewise affine and ‘heterogeneous’ (i.e. stable manifolds don’t have the same dimension at every point) hyperbolic map $F: [0,1]^3 \to [0,1]^3$ (see Figure 3.21) is defined as
$$F(x,y,z) = \begin{cases} \left(4x-2,\ \frac{y}{2},\ \frac{1+z}{2}\right) & \text{if } (x,y,z) \in A,\\ \left(4x-3,\ \frac{1+y}{2},\ \frac{1+z}{2}\right) & \text{if } (x,y,z) \in B,\\ \left(2x,\ 2y,\ \frac{z}{4}\right) & \text{if } (x,y,z) \in C,\\ \left(2x,\ 2y-1,\ \frac{1+z}{4}\right) & \text{if } (x,y,z) \in D, \end{cases}$$
and primes indicate the F-images of each of these four boxes; see [483].
That is, the two pizza-box shaped regions A and B are mapped into shoe-boxes $A'$ and $B'$, and the shoe-box shaped regions C and D are mapped into pizza-boxes $C'$ and $D'$. The partition into these four boxes is not a Markov partition because of the heterogeneity of the hyperbolicity. The symbolic shift (X, σ) associated with this partition (i.e. a subshift of $\{A, B, C, D\}^{\mathbb{Z}}$) is not an SFT. For example, AC can be followed by D but AAC cannot be followed by D. However, (X, σ) is the Dyck shift with two types of brackets, by the sliding block code
A → (, B → [, C → ), D → ].
Consequently, it is a context-free subshift (see Section 7.2.2) and also syn-
Chapter 4

Subshifts of Zero Entropy

Circle rotations and, more generally, interval exchange transformations are zero entropy dynamical systems, whose symbolic versions are well-studied subshifts. Substitution shifts are another major class of zero entropy subshifts that were studied even before their role as symbolic descriptions of dynamical systems (e.g. translations on Rauzy fractals) became apparent. In this chapter we also discuss adding machines (odometers); these are not subshifts, but they are at the core of Toeplitz shifts and B-free shifts.

4.1. Linear Recurrence

Definition 4.1. A subshift (X, σ) is linearly recurrent if there is $L \in \mathbb{N}$ such that for every w ∈ L(X) and x ∈ X, there is $0 < k \le L|w|$ such that $\sigma^k(x) \in [w]$. That is, every word w ∈ L(X) recurs with gap ≤ L|w|.
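Linear recurrence concerns the largest gap between consecutive occurrences of a word. The sketch below (our own helper names) measures these gaps in a finite prefix of the Fibonacci word of Example 4.6 and prints, for each word length n, the ratio of the largest gap to n. A finite prefix only gives lower bounds for the true gaps, so this is an empirical illustration, not a computation of the constant L.

```python
def fibonacci_word(n: int) -> str:
    """Prefix of length n of the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < n:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:n]


def max_gap(x: str, w: str) -> int:
    """Largest distance between consecutive occurrences of w in the finite word x."""
    pos = [i for i in range(len(x) - len(w) + 1) if x.startswith(w, i)]
    return max((b - a for a, b in zip(pos, pos[1:])), default=0)


def recurrence_ratio(x: str, n: int) -> float:
    """Largest gap of any length-n factor of x, divided by n."""
    factors = {x[i:i + n] for i in range(len(x) - n + 1)}
    return max(max_gap(x, w) for w in factors) / n


x = fibonacci_word(5000)
print(max_gap(x, "0"), max_gap(x, "1"))
print([round(recurrence_ratio(x, n), 3) for n in range(1, 13)])
```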

This notion is stronger than uniform recurrence, in that it relates the N = N(U) in the definition of uniform recurrence (in the case that U is a cylinder set) in a uniform way to the length of U. An equivalent definition, in terms of shift-invariant measures, is given in Lemma 6.30. Examples of minimal shifts that are not linearly recurrent can be found among the Sturmian shifts, i.e. symbolic versions of circle rotations; see Section 4.3.3. To be precise, a Sturmian shift is linearly recurrent if and only if its associated rotation number is of bounded type; see Section 8.4.

Definition 4.2. Given u ∈ L(X), we call w a return word for u if

• u is a prefix and suffix of wu but u does not occur elsewhere in wu;
• wu ∈ L(X).
We denote the collection of return words of u by Ru .


In other words, we can write every recurrent point x ∈ [u] as

(4.1) x = w1 w2 w3 w4 w5 w6 · · · = uw1 uw2 uw3 uw4 uw5 uw6 · · · ,
where uwj = wj ∈ Ru for each j ∈ N, and the only appearances of u are
as prefix and suffix of wj , j ≥ 1. If (X, σ) is minimal (and hence u appears
with bounded gaps), then Ru is finite.
Example 4.3. Construct $\rho \in \{0,1\}^{\mathbb{N}}$ by setting $\rho_1 = 0$, $\rho_2 = 1$, and recursively
$$\rho_{S_k+1} \cdots \rho_{S_{k+1}} = \rho_1 \cdots \rho_{S_{k-1}}, \quad k \ge 1,$$
for the Fibonacci numbers $S_0, S_1, S_2, S_3, \ldots = 1, 2, 3, 5, \ldots$. This gives
$$\rho = 01\ 0\ 01\ 010\ 01001\ 01001010\ 0100101001001 \cdots.$$
(This sequence is in fact the fixed point of the Fibonacci substitution of Example 4.6.) If u = 010010, then w = 010 ∈ R_u because wu = 010|010010 starts and ends with u (even though these occurrences of u overlap). Note that it is therefore possible that w ∈ R_u is shorter than u. In fact, here the only other return word of u is 01001, which is also shorter than u.
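The recursion of Example 4.3 and the notion of return word are both easy to experiment with. The sketch below (our own helper names) builds ρ from the recursion, compares it with the fixed point of the Fibonacci substitution, and collects the return words of u = 010010 as the segments between consecutive occurrences of u in a finite prefix. Return words seen in a finite prefix form a subset of $R_u$.

```python
def rho_by_recursion(length: int) -> str:
    """Build rho via rho_{S_k+1} ... rho_{S_{k+1}} = rho_1 ... rho_{S_{k-1}}."""
    rho = "01"                 # rho_1 rho_2, so len(rho) = S_1 = 2
    S = [1, 2]                 # S_0, S_1
    while len(rho) < length:
        rho += rho[:S[-2]]     # append the prefix of length S_{k-1}
        S.append(S[-1] + S[-2])
    return rho[:length]


def fibonacci_fixed_point(length: int) -> str:
    """Prefix of the fixed point of 0 -> 01, 1 -> 0."""
    w = "0"
    while len(w) < length:
        w = "".join("01" if c == "0" else "0" for c in w)
    return w[:length]


def return_words(x: str, u: str) -> set:
    """Segments x[a:b] between consecutive occurrences a < b of u in x;
    each such segment w has the property that w + u starts and ends with u."""
    pos = [i for i in range(len(x) - len(u) + 1) if x.startswith(u, i)]
    return {x[a:b] for a, b in zip(pos, pos[1:])}


rho = rho_by_recursion(2000)
print(rho[:34])
print(rho == fibonacci_fixed_point(2000))
print(sorted(return_words(rho, "010010"), key=len))
```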

The following result is due to Durand, Host & Skau [225].

Theorem 4.4. Let the subshift (X, σ) be non-periodic and linearly recurrent with constant L. Then:
(i) The word-complexity is sublinear: p(n) ≤ Ln for all n ∈ ℕ, so $h_{top}(X, \sigma) = 0$.
(ii) X is $(L+1)$-power-free; i.e. $u^{L+1} \notin L(X)$ for any word $u \ne \varepsilon$.
(iii) For all $w \in R_u$, $|u| < L|w|$.
(iv) $\#R_u \le L(L+1)^2$.
(v) Every factor (Y, σ) of (X, σ) is linearly recurrent1.

Proof. (i) Linear recurrence implies that for every n ∈ ℕ, every word $u \in L_n(X)$, and every x ∈ X, the occurrence frequency of u in x satisfies
$$\liminf_{k\to\infty} \frac{1}{k}\,\#\{1 \le i \le k : x_i \cdots x_{i+n-1} = u\} \ge \frac{1}{Ln}.$$
Therefore there is no space for more than Ln words of length n.
(ii) If $v \in L_n(X)$, then the gap between two occurrences of v is ≤ Ln, so every word u of length $(L+1)n - 1$ contains v at least once. If $v^{L+1} \in L(X)$, then all words of length n are cyclic permutations of v because the
1 As shown in [221, Theorem 1], a linearly recurrent subshift has, up to isomorphism, only finitely many different factors.

4.2. Substitution Shifts 135

gap between any other words of length n becomes too large otherwise; cf.
Proposition 1.12. But then X is periodic.
(iii) Take u ∈ L(X) and w ∈ R_u. If |u| ≥ L|w|, then the word wu (which starts and ends with u) must have $w^{L+1}$ as prefix. This contradicts (ii).
(iv) Take u ∈ L(X) and v ∈ L(X) of length $(L+1)^2|u|$. By the proof of (ii), every word of length ≤ (L+1)|u| occurs in v and, in particular, every return word w ∈ R_u occurs in v. Now return words in v don’t overlap, see (4.1), so using the minimal length |w| ≥ |u|/L of return words (from item (iii)), we find $\#R_u \le |v|/(|u|/L) = L(L+1)^2$.
(v) Finally, suppose that the subshift (Y, σ) over alphabet B is a factor of (X, σ) and $f: A^{2N+1} \to B$ is the corresponding sliding block code, so 2N + 1 is its window size. Take u ∈ L(X) of length |u| ≥ 2N + 1 and v its image under f. Then |v| = |u| − 2N. If $w \in R_v$, then
$$|w| \le \max\{|s| : s \in R_u\} \le L|u| \le (|v| + 2N)L \le (2N+1)|v|L.$$
Therefore Y is linearly recurrent with constant (2N + 1)L.
The bound (2N + 1)L is not the sharpest. One can show that for every ε > 0, there is $L_0$ such that for all $n \ge L_0$, x ∈ X, and $v \in L_n(X)$, the gap between two occurrences of v in x is at most (L + ε)n. □

4.2. Substitution Shifts

Definition 4.5. Let A = {0, . . . , N − 1} be a finite alphabet. A substitution2 χ is a map that assigns to every a ∈ A a single word χ(a) ∈ A∗:
$$\chi: \begin{cases} 0 \to \chi(0),\\ 1 \to \chi(1),\\ \quad\vdots\\ N-1 \to \chi(N-1), \end{cases}$$
and extends to A∗ (and to $A^{\mathbb{N}}$) by concatenation:
$$\chi(a_1 a_2 \cdots a_r) = \chi(a_1)\chi(a_2)\cdots\chi(a_r).$$
The substitution is of constant length if |χ(a)| is the same for every a ∈ A.

Example 4.6. The Fibonacci substitution $\chi_{Fib}: 0 \to 01,\ 1 \to 0$ acts as
0 → 01 → 010 → 01001 → 01001010 → 0100101001001 → · · · .
2 Some authors use the word morphism for substitution. Formally, a morphism ψ : X → Y

is a map for which ψ(xy) = ψ(x)ψ(y) holds, provided concatenations (or products) are properly
defined on X and Y . The word substitution agrees better with our intuition of this concept, so
we will use substitution.

The lengths of $\chi^n(0)$ are exactly the Fibonacci numbers. The limit word $\rho_{Fib}$ is also a Sturmian sequence, namely the one associated to the golden mean as rotation number; see Section 4.3.

Lemma 4.7. Assume that χ(a) is non-empty for every a ∈ A. Then for every a ∈ A, $\chi^n(a)$ tends to a periodic orbit of χ as n → ∞.

Proof. As can be seen in Example 4.6, if a is the first symbol of χ(a), then χ(a) is a prefix of $\chi^2(a)$, which is a prefix of $\chi^3(a)$, etc. Therefore $\chi^n(a)$ tends to a fixed point of χ as n → ∞.
Since #A = N, there must be p < r ≤ N such that $\chi^p(a)$ and $\chi^r(a)$ start with the same symbol b. Now we can apply the above argument to $\chi^{r-p}$ and b. □

Example 4.8. Take χ(0) = 10 and χ(1) = 1. Then
0 → 10 → 110 → 1110 → 11110 → · · · → $1^\infty$ fixed by χ,
1 → 1 fixed by χ.
The second line of this example is not interesting, so we will usually make the assumption
$$\lim_{n\to\infty} |\chi^n(b)| = \infty \quad \text{for all } b \in A. \tag{4.2}$$


Example 4.9. Recall the Thue-Morse sequences ρ0 and ρ1 from Example 1.6. Applying the sliding block code f([01]) = f([10]) = 1 and f([00]) = f([11]) = 0, the images of ρ0 and ρ1 are the same:
$$\rho_{feig} = 10\ 11\ 1010\ 10111011\ 1011101010111010 \cdots \tag{4.3}$$
which is the fixed point of the period doubling or Feigenbaum substitution
$$\chi_{feig}: \begin{cases} 0 \to 11,\\ 1 \to 10. \end{cases} \tag{4.4}$$

This sequence appears as the kneading sequence (i.e. itinerary of the crit-
ical value) of the (infinitely renormalizable) Feigenbaum interval map; see
Section 4.7.1. It is also a Toeplitz sequence; see Example 4.86.
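The identity in Example 4.9 can be confirmed on long prefixes. The sketch below (our own helper names) applies the sliding block code f to prefixes of ρ0 and of its complement ρ1, and compares both images with the fixed point of $\chi_{feig}$ that starts with 1.

```python
def thue_morse(n: int) -> str:
    """First n letters of rho_0 (parity of the binary digit sum of i)."""
    return "".join(str(bin(i).count("1") % 2) for i in range(n))


def sliding_block_code(x: str) -> str:
    """f([01]) = f([10]) = 1 and f([00]) = f([11]) = 0, applied to consecutive pairs."""
    return "".join("1" if a != b else "0" for a, b in zip(x, x[1:]))


def feigenbaum_fixed_point(n: int) -> str:
    """Prefix of length n of the fixed point of 0 -> 11, 1 -> 10 starting with 1."""
    w = "1"
    while len(w) < n:
        w = "".join("11" if c == "0" else "10" for c in w)
    return w[:n]


rho0 = thue_morse(1025)
rho1 = "".join("1" if c == "0" else "0" for c in rho0)   # rho_1 is the complement of rho_0
print(sliding_block_code(rho0)[:16])
print(sliding_block_code(rho0) == sliding_block_code(rho1) == feigenbaum_fixed_point(1024))
```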

Example 4.10. The paper folding sequence is obtained by taking a strip of paper, folding it (by a 180° right turn), folding it over once more (by a 180° right turn), and again, etc. Then, after unfolding the paper strip again, write down the traces of the folds on the paper: 1 for a right-turn fold, 0 for a left-turn fold; see Figure 4.1.
Figure 4.1. Folding a paper strip: 1 = right-turn, 0 = left-turn.

The folding patterns this creates are
1, 110, 1101100, 110110011100100, . . .
Since the n-th stage sequence is always a prefix of the (n+1)-st stage sequence, there is a well-defined limit x.
In folded form, the strip of paper looks much like the zero-composant
of the so-called Knaster continuum3 ; see Figure 4.2 (left) 4 . Note that after
the central right-turn, the second half of the strip follows the first half in the
opposite direction. This explains the palindromic anti-symmetry x2n −k =
1 − x2n +k for all n ≥ 1 and 1 ≤ k < 2n .
If we untighten the folded paper half-way in such a way that all 180◦
angles become 90◦ angles, then a fractal called Heighway dragon5 appears;
see Figure 4.2 (right).
The paper-folding sequence is also generated by the block substitution
$$\chi_{pf}: \begin{cases} 11 \to 1101,\\ 10 \to 1100,\\ 01 \to 1001,\\ 00 \to 1000 \end{cases} \qquad \text{equivalent to} \qquad \chi: \begin{cases} 3 \to 31,\\ 2 \to 30,\\ 1 \to 21,\\ 0 \to 20 \end{cases}$$

3 Also called bucket handle.

4 This is also the standard planar embedding of the inverse limit space of the tent map with
slope 2 or of the unimodal Chebyshev polynomial Q2 (x) = 4x(1 − x).
5 After the NASA physicist John Heighway; see e.g. [529] for a bit of the story and more mathematical background and references.


Figure 4.2. The Knaster continuum and the Heighway dragon.

in a 4-letter alphabet. Each 2-letter block has enough information to determine what it looks like after one more fold; see Figure 4.1. The fixed point of $\chi_{pf}$ is
ρpf = 11 01 1001 11001001 1101100011001001 11011001110010001101 · · · ,
which can be recoded to
ρpf ≅ 31 21 3021 31203021 3121302031203021 · · · .
Closer inspection shows that the 3’s and 2’s appear alternatingly, and the 1’s and 0’s appear in a pattern equal to ρpf itself.
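The folding stages and the block substitution $\chi_{pf}$ can be compared directly. In the sketch below (our own code) we model one more fold by the rule $S_{m+1} = S_m\,1\,\overline{S_m}^{R}$, where $\overline{S_m}^{R}$ is $S_m$ reversed with 0 and 1 swapped; this rule is our reading of the folding description and is consistent with the anti-symmetry stated above. The sketch checks that the stages agree with the fixed point of $\chi_{pf}$ and verifies the anti-symmetry on a long prefix.

```python
def folding_stages(n: int) -> list:
    """Stages S_1, ..., S_n with S_1 = 1 and S_{m+1} = S_m + 1 + (reversed complement of S_m)."""
    s = "1"
    stages = [s]
    for _ in range(n - 1):
        s = s + "1" + "".join("0" if c == "1" else "1" for c in reversed(s))
        stages.append(s)
    return stages


CHI_PF = {"11": "1101", "10": "1100", "01": "1001", "00": "1000"}


def pf_fixed_point(n: int) -> str:
    """Prefix of length n of the fixed point of the block substitution chi_pf."""
    w = "11"
    while len(w) < n:
        w = "".join(CHI_PF[w[i:i + 2]] for i in range(0, len(w), 2))
    return w[:n]


stages = folding_stages(12)
print(stages[:4])
x = stages[-1]                      # length 2**12 - 1
print(x == pf_fixed_point(len(x)))
# Anti-symmetry x_{2^n - k} = 1 - x_{2^n + k}, with 1-indexed positions.
print(all(x[2**n - k - 1] != x[2**n + k - 1]
          for n in range(1, 11) for k in range(1, 2**n)))
```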
Definition 4.11. A substitution subshift is any subshift (X, σ) that can be written as $X = X_\rho = \overline{\mathrm{orb}_\sigma(\rho)}$, where ρ is a fixed point (or periodic point) of a substitution satisfying (4.2).
Lemma 4.12. For every substitution with a non-shift-periodic fixed point ρ, the subshift $(X_\rho, \sigma)$ has at least one asymptotic pair, i.e. a pair of distinct points x, y such that $\lim_n d(\sigma^n(x), \sigma^n(y)) = 0$.

For example, the Feigenbaum substitution has an asymptotic pair (0ρ, 1ρ), and each other asymptotic pair is a backward shift of this pair. The Thue-Morse shift has two asymptotic pairs $(0\rho_0, 1\rho_0)$ and $(0\rho_1, 1\rho_1)$. Lemma 4.12 holds for all infinite expansive subshifts, as can be derived from [362, Theorem 2.1], but the current setting allows a short proof.

Proof. Since $X_\rho$ is not a collection of periodic orbits, there is some left-special word w and $a \ne b \in A$ such that $\chi^n(aw)$ is not a suffix of $\chi^n(bw)$ for every n ≥ 0 and vice versa. Therefore there are $a' \ne b' \in A$, $x \in X_\rho$, and a sequence $(k_n)_{n\in\mathbb{N}}$ such that
$$\sigma^{k_n} \circ \chi^n(aw) \to a'x \quad\text{and}\quad \sigma^{k_n} \circ \chi^n(bw) \to b'x$$
as n → ∞. Hence the limit words $a'x$ and $b'x$ are asymptotic. □

Lemma 4.13. Each one-sided substitution shift space (Xρ , σ) admits a two-
sided substitution shift extension.

Proof. By Lemma 4.7, we can assume that χ(a) starts with a. First define χ on two-sided sequences as
$$\chi(\cdots x_{-2}x_{-1}.x_0x_1x_2x_3\cdots) = \cdots\chi(x_{-2})\chi(x_{-1}).\chi(x_0)\chi(x_1)\chi(x_2)\chi(x_3)\cdots,$$
where the central dot indicates where the zeroth coordinate is.
To create a two-sided substitution shift, take some i > 1 such that $\rho_i = a$, and let $a' = \rho_{i-1}$. Similar to the argument of Lemma 4.7, there is b ∈ A and p < q ∈ ℕ such that $\chi^p(a')$ and $\chi^q(a')$ both end in b. Set K = q − p, so $\chi^K(b)$ ends with b. Next iterate $\chi^K(b.a)$ repeatedly, so that $\lim_n \chi^{nK}(b.a) =: \hat\rho$ is a two-sided fixed point of $\chi^K$. Finally, set $\hat X_\rho = \overline{\{\sigma^n(\hat\rho) : n \in \mathbb{Z}\}}$.
Even though $\hat\rho$ need not be unique (because the choices of b and K are not unique), due to minimality (see below), the shift space $\hat X_\rho$ is unique. □

4.2.1. Primitive Substitutions.

Definition 4.14. The associated or transition matrix of a substitution χ is the matrix $A = (a_{ij})_{i,j\in A}$ such that $a_{ij} = |\chi(j)|_i$ is the number of symbols i appearing in χ(j). We call χ aperiodic and/or irreducible if A is aperiodic and/or irreducible, in the sense of the Perron-Frobenius Theorem 8.58; see Definition 3.6. The substitution is primitive if it is both irreducible and aperiodic. Equivalently, χ is irreducible if for every i, j ∈ A there exists n ≥ 1 such that i appears in $\chi^n(j)$.

This way of writing the associated matrix (and not its transpose) ensures
that composition of substitutions and composition of associated matrices
work in the same way: Aχ̃◦χ = Aχ̃ · Aχ .
Lemma 4.15. Let a primitive substitution χ with χ(0) = 0 · · · have the fixed point ρ = 0 · · · and associated matrix A. Let v be the right eigenvector of the leading eigenvalue of A. If v is scaled so that $\sum_i v_i = 1$, then $v_j = \lim_n \frac{1}{n}\#\{1 \le i \le n : \rho_i = j\}$ is the frequency of the j-th letter in ρ.

Proof. Let $u = (u_j)_{j\in\mathcal A}$, $u_j = |w|_j/|w|$, be the frequency vector of some word $w = 0\cdots \in \mathcal A^*$ and let $u'$ be the frequency vector of $\chi(w)$. Since $|\chi(w)|_i = \sum_j a_{ij}|w|_j$, we get
$$u'_i = \frac{\sum_j a_{ij}u_j}{\sum_{k,j} a_{kj}u_j}, \quad\text{so}\quad u' = f(u) := \frac{Au}{\|Au\|_1}.$$
Since $\chi$ is primitive, the Perron-Frobenius Theorem 8.58 ensures that $f^n(u)$ converges to the leading eigenvector, which is therefore the frequency vector of the letters in the fixed point $\rho = \lim_n \chi^n(w)$. □
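The proof is in effect a power iteration. A minimal sketch, assuming the Fibonacci substitution $0 \to 01$, $1 \to 0$ as the example and using hypothetical helper names: iterate $f(u) = Au/\|Au\|_1$ from the frequency vector of $w = 0$ and compare with the letter counts in a long prefix of $\rho$.

```python
def associated_matrix(sub, alphabet):
    return [[sub[j].count(i) for j in alphabet] for i in alphabet]

def f(A, u):
    """One step u -> Au / ||Au||_1, as in the proof of Lemma 4.15."""
    Au = [sum(A[i][j] * u[j] for j in range(len(u))) for i in range(len(A))]
    total = sum(Au)
    return [x / total for x in Au]

def iterate(sub, word, n):
    """Apply the substitution n times to a word."""
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

alphabet = "01"
fib = {"0": "01", "1": "0"}
A = associated_matrix(fib, alphabet)

u = [1.0, 0.0]                 # frequency vector of w = "0"
for _ in range(60):
    u = f(A, u)

rho = iterate(fib, "0", 25)    # a prefix of the fixed point
empirical = [rho.count(a) / len(rho) for a in alphabet]
print(u, empirical)            # both close to [0.618..., 0.381...]
```

Both vectors approximate $(\gamma^{-1}, \gamma^{-2})$ with $\gamma$ the golden mean, the normalized leading eigenvector of $A$.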
Remark 4.16. By taking the associated matrix $A$ instead of the substitution, we lose the order structure of the substitution words. For instance, the
140 4. Subshifts of Zero Entropy

Thue-Morse substitution $\chi_{TM}$ and the substitution $\chi: 0 \to 01,\ 1 \to 01$ have the same associated matrix, but they behave entirely differently as subshifts.
The associated matrix is called the abelianization of the substitution. A method to retain the order information is to take matrices with power series as entries: let $x = (x_a)_{a\in\mathcal A}$ be a formal vector and set
$$a_{ij}(x) = \sum_{k=1}^{|\chi(j)|} \delta_{\chi(j)_k,\, i} \prod_{\ell=1}^{k-1} x_{\chi(j)_\ell},$$
where $\delta_{a,b}$ is the Kronecker delta and an empty product is 1. Then $A(1, \dots, 1) =$
A and A(x) satisfies the composition rule for substitutions: Aχ◦ψ (x) =
Aχ (ATψ (x)) · Aψ (x). See e.g. [132].
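To see concretely how the power-series entries keep track of order, here is a small Python sketch. The encoding is our own assumption: each entry $a_{ij}(x)$ is stored as a dictionary from exponent vectors (the monomial $\prod_a x_a^{e_a}$) to coefficients, and `series_matrix`, `evaluate_at_ones` are hypothetical names. It confirms that the two substitutions of Remark 4.16 have different matrices $A(x)$ but the same $A(1, \dots, 1)$.

```python
from collections import defaultdict

def series_matrix(sub, alphabet):
    """Entry (i, j) maps exponent vectors to coefficients:
    a_ij(x) = sum over positions k with chi(j)_k = i of prod_{l<k} x_{chi(j)_l}."""
    idx = {a: n for n, a in enumerate(alphabet)}
    M = [[defaultdict(int) for _ in alphabet] for _ in alphabet]
    for j in alphabet:
        exps = [0] * len(alphabet)   # exponents of the prefix chi(j)_1 ... chi(j)_{k-1}
        for letter in sub[j]:
            M[idx[letter]][idx[j]][tuple(exps)] += 1
            exps[idx[letter]] += 1
    return [[dict(entry) for entry in row] for row in M]

def evaluate_at_ones(M):
    """A(1, ..., 1): every monomial evaluates to 1, so add up the coefficients."""
    return [[sum(entry.values()) for entry in row] for row in M]

tm = {"0": "01", "1": "10"}
other = {"0": "01", "1": "01"}
M_tm, M_other = series_matrix(tm, "01"), series_matrix(other, "01")
print(M_tm[0][1], M_other[0][1])                            # {(0, 1): 1} {(0, 0): 1}
print(evaluate_at_ones(M_tm) == evaluate_at_ones(M_other))  # True
```

So $a_{01}(x) = x_1$ for Thue-Morse but $a_{01}(x) = 1$ for the other substitution, although both abelianizations equal the all-ones matrix.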

Theorem 4.17. Let $\chi$ be a substitution satisfying hypothesis (4.2). Assume that $\chi(a)$ starts with $a$, and let $\rho$ be the corresponding fixed point of $\chi$. Then the corresponding substitution subshift $(X_\rho, \sigma)$ is minimal if and only if for every $b \in \mathcal A$ appearing in $\rho$, there is $k \ge 1$ such that $\chi^k(b)$ contains $a$.

Proof. If $X_\rho$ is minimal (i.e. uniformly recurrent according to Proposition 2.17), then every word, in particular $a$, appears with bounded gaps. Let $b$ be a letter appearing in $\rho$. Then $\chi^k(b)$ is a word in $\chi^k(\rho) = \rho$, and since $|\chi^k(b)| \to \infty$ by (4.2), $\chi^k(b)$ must contain $a$ for $k$ sufficiently large.
Conversely, let $k(b) = \min\{i \ge 1 : \chi^i(b) \text{ contains } a\}$ and $K = \max\{k(b) : b \text{ appears in } \rho\}$. Set $\Delta_b = \chi^{k(b)}(b)$ and decompose $\rho$ into blocks:
$$\rho = \Delta_{\rho_1}\Delta_{\rho_2}\Delta_{\rho_3}\cdots = \rho_1\cdots\rho_{k(\rho_1)}\;\rho_{k(\rho_1)+1}\cdots\rho_{k(\rho_1)+k(\rho_2)}\;\rho_{k(\rho_1)+k(\rho_2)+1}\cdots.$$
By the choice of $k(\rho_j)$, each block $\Delta_{\rho_j}$ contains an $a$, so $a$ appears with gap $K$. Now take $w \in \mathcal L(X_\rho)$ arbitrary. There exists $m \in \mathbb N$ such that $w$ appears in $\chi^m(a)$. By the above, $w$ appears in each $\chi^m(\Delta_{\rho_j})$ and hence $w$ appears with gap $\max_j |\chi^m(\Delta_{\rho_j})| = \max\{|\chi^{m+k(b)}(b)| : b \text{ appears in } \rho\}$. This proves the uniform recurrence of $\rho$. The minimality of the orbit closure $X_\rho$ follows from Corollary 2.22. □

Theorem 4.18 below shows that if $\chi$ is primitive, then $(X_\rho, \sigma)$ is linearly recurrent and hence of linear complexity ($p_\rho(n) \le Ln$), but it does not exclude that $\rho$ is periodic. For instance,
$$\chi: \begin{cases} 0 \to 010, \\ 1 \to 101 \end{cases} \tag{4.5}$$
produces two fixed points $\rho_0 = (01)^\infty$ and $\rho_1 = (10)^\infty$. We call a substitution aperiodic if its fixed point $\rho$ is not periodic under the shift. Note
4.2. Substitution Shifts 141

that this is different from 'the associated matrix of $\chi$ is aperiodic', so be aware of this unfortunate clash of terminology.
A mild assumption dispenses with such periodic examples, and then $p_\rho(n) \ge n + 1$; see Proposition 1.12.
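The contrast between the periodic example (4.5) and an aperiodic primitive substitution is easy to see numerically. The sketch below counts the distinct factors of length $n$ in a long prefix, which is only a finite approximation of $p_\rho(n)$; the Fibonacci substitution serves as an illustrative aperiodic example, and the helper names are hypothetical.

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def factor_count(word, n):
    """Number of distinct length-n subwords of a finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})

chi_45 = {"0": "010", "1": "101"}   # the substitution (4.5)
fib = {"0": "01", "1": "0"}

rho_0 = iterate(chi_45, "0", 6)     # prefix of (01)^infinity
rho_fib = iterate(fib, "0", 20)     # prefix of the Fibonacci word

print([factor_count(rho_0, n) for n in range(1, 8)])    # [2, 2, 2, 2, 2, 2, 2]
print([factor_count(rho_fib, n) for n in range(1, 8)])  # [2, 3, 4, 5, 6, 7, 8]
```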
Theorem 4.18. Every primitive substitution shift is linearly recurrent.

We follow the exposition of Durand [221, 222] here; the paper [176] shows that for substitution shifts, linear recurrence is equivalent to minimality.

Proof. Let $\chi: \mathcal A \to \mathcal A^*$ be the substitution with fixed point $\rho$ and $(X_\rho, \sigma)$ the corresponding shift. Let
$$S_k := \sup\{|\chi^k(a)| : a \in \mathcal A\} \quad\text{and}\quad I_k := \inf\{|\chi^k(a)| : a \in \mathcal A\}.$$
Note that $I_k \le S_1I_{k-1}$ and $I_1S_{k-1} \le S_k$ for all $k \in \mathbb N$. Since $\chi$ is primitive, for every $a, b \in \mathcal A$ there exists $N_{a,b}$ such that $\chi^{N_{a,b}}(a)$ contains $b$. Therefore
$$|\chi^k(b)| \le |\chi^{k+N_{a,b}}(a)| \le S_{N_{a,b}}|\chi^k(a)| \quad\text{for all } k \in \mathbb N.$$
Hence, taking $N = \sup\{N_{a,b} : a, b \in \mathcal A\}$, we find
$$I_k \le S_k \le S_NI_k \quad\text{for all } k \in \mathbb N.$$
Now let $u \in \mathcal L(X_\rho)$ and $v \in \mathcal R_u$ be arbitrary. Choose $k \ge 1$ minimal so that $|u| \le I_k$. Since $\rho = \chi^k(\rho)$ and $|u| \le I_k$, there exists a 2-word $ab \in \mathcal L(X_\rho)$ such that $u$ appears in $\chi^k(ab)$. Let $R$ be the largest distance between two occurrences of any 2-word in $\mathcal L(X_\rho)$. Then $R$ is finite by minimality of the shift. Using $I_{k-1} < |u|$ (by the minimality of $k$), we have
$$|v| \le RS_k \le RS_NI_k \le RS_NS_1I_{k-1} \le RS_NS_1|u|.$$
This proves linear recurrence with $L = RS_NS_1$. □
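Linear recurrence can be probed numerically: for each length $n$, the longest gap between consecutive occurrences of a length-$n$ factor is the length of its longest return word, and the ratio to $n$ should stay bounded. A minimal sketch on a finite prefix of the Fibonacci word (a finite check, not a proof; `longest_return` is a hypothetical helper):

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def longest_return(word, n):
    """Largest gap between consecutive occurrences of any length-n factor of word."""
    last, best = {}, 0
    for i in range(len(word) - n + 1):
        u = word[i:i + n]
        if u in last:
            best = max(best, i - last[u])
        last[u] = i
    return best

rho = iterate({"0": "01", "1": "0"}, "0", 20)    # Fibonacci prefix, length 17711
ratios = [longest_return(rho, n) / n for n in range(1, 60)]
print(max(ratios))    # stays bounded, as linear recurrence predicts
```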

A more general result on the complexity of substitutions (without the assumption of primitivity) is due to Pansiot [441–443].
Theorem 4.19. If $\chi: \mathcal A \to \mathcal A^*$ is a non-erasing (i.e. $\chi(a) \ne \epsilon$, the empty word, for every $a \in \mathcal A$) substitution with $\chi(a) = au$ for some $a \in \mathcal A$, $\epsilon \ne u \in \mathcal A^*$, then the complexity of $\rho = \lim_n \chi^n(a)$ is one of the following:
(1) $p_\rho(n)$ is bounded (if $\rho$ is (pre)periodic).
(2) $p_\rho(n) \approx n$, including the primitive case.
(3) $p_\rho(n) \approx n\log\log n$.
(4) $p_\rho(n) \approx n\log n$.
(5) $p_\rho(n) \approx n^2$.
Here $p_\rho(n) \approx a(n)$ means that there is $C > 0$ such that $C^{-1}a(n) \le p_\rho(n) \le Ca(n)$ for all $n$ sufficiently large.

Deviatov [197] extended these results to S-adic shifts; see Section 4.2.5.

Example 4.20. If we remove the non-erasing condition in the above theorem, then even more asymptotics for $p(n)$ become possible. Let $\mathcal A = \{a, b_0, \dots, b_r\}$ for some $r \in \mathbb N$ and let $\chi: \mathcal A \to \mathcal A^*$ be given by
$$\chi: \begin{cases} a \to ab_r, \\ b_k \to b_kb_{k-1}, & \text{for } k = 1, \dots, r, \\ b_0 \to b_0. \end{cases}$$
Then $\chi$ has a unique fixed point, which for e.g. $r = 3$ looks like
$$\rho = ab_3\,.\,b_3b_2\,.\,b_3b_2b_2b_1\,.\,b_3b_2b_2b_1b_2b_1b_1b_0\,.\,b_3b_2b_2b_1b_2b_1b_1b_0b_2b_1b_1b_0b_1b_0b_0\cdots.$$
Set $v_i = \chi^i(ab_r)$ for $i \ge 0$. The dots separate the blocks $w_i$, where $w_0 = ab_r$ and $w_i$ is the suffix of $v_i$ of length $|v_i| - |v_{i-1}|$. Then the symbol $b_k$ appears exactly $\binom{i}{r-k}$ times in $w_i$.
Next apply an erasing substitution $\tilde\chi: \mathcal A \to \{0, 1\}^*$ given by
$$\tilde\chi: \begin{cases} a \to \epsilon, \\ b_k \to 0, & \text{for } k = 0, \dots, r-1, \\ b_r \to 1 \end{cases}$$
to $\rho$. Then
$$\tilde\rho := \tilde\chi(\rho) = 1.10^{n_0}.10^{n_1}.10^{n_2}.10^{n_3}.10^{n_4}.10^{n_5}\cdots \quad\text{for } n_i = \sum_{j=1}^{r}\binom{i+1}{j} \approx \frac{i^r}{r!}.$$
It can be shown (see [68, Proposition 4.7.2]) that the complexity of $\tilde\rho$ is $p_{\tilde\rho}(n) \approx n\sqrt[r]{n}$.
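The binomial counts can be checked directly. Since $v_i = \chi^i(a)\chi^i(b_r) = v_{i-1}\chi^i(b_r)$, the block $w_i$ equals $\chi^i(b_r)$ for $i \ge 1$. The sketch below (letters encoded as strings "a", "b0", ..., "b3"; the function names are hypothetical) counts the letters in $\chi^i(b_r)$ for $r = 3$ and computes the length of the run of 0s that $w_i$ produces under $\tilde\chi$, which is $n_{i-1}$ in the notation above.

```python
from math import comb

def example_substitution(r):
    """chi from Example 4.20: a -> a b_r, b_k -> b_k b_{k-1} (k >= 1), b_0 -> b_0."""
    sub = {"a": ["a", f"b{r}"], "b0": ["b0"]}
    for k in range(1, r + 1):
        sub[f"b{k}"] = [f"b{k}", f"b{k - 1}"]
    return sub

def apply(sub, word):
    return [c for letter in word for c in sub[letter]]

def block_counts(r, imax):
    """Letter counts of chi^i(b_r) for i = 0, ..., imax."""
    sub, w, rows = example_substitution(r), [f"b{r}"], []
    for _ in range(imax + 1):
        rows.append({k: w.count(f"b{k}") for k in range(r + 1)})
        w = apply(sub, w)
    return rows

r = 3
rows = block_counts(r, 10)
# chi~ sends chi^i(b_r) to 1 0^(|chi^i(b_r)| - 1): one b_r, all other letters become 0.
zero_runs = [sum(row.values()) - 1 for row in rows]
print(zero_runs[:6])   # [0, 1, 3, 7, 14, 25]
```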

Regarding the amorphic complexity of primitive constant length substitutions, Fuhrmann & Gröger [262] proved the following result:

Theorem 4.21. Let $\chi: \{0, 1\} \to \{0, 1\}^*$ be an aperiodic primitive substitution of constant length $\ell$, and let $(X, \sigma)$ be the associated subshift. Then the amorphic complexity
$$ac(\sigma) = \frac{\log\ell}{\log\ell - \log\ell^*} \quad\text{for } \ell^* = \#\{1 \le i \le \ell : \chi(0)_i \ne \chi(1)_i\}.$$
In this theorem, $ac(\sigma) = \infty$ is allowed if the denominator $\log\ell - \log\ell^* = 0$, as is the case with the Thue-Morse substitution.
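Evaluating the formula of Theorem 4.21, as stated above, only requires counting the positions where $\chi(0)$ and $\chi(1)$ differ. A minimal sketch (the function name is hypothetical; it does not check primitivity or aperiodicity):

```python
import math

def amorphic_complexity(sub):
    """Value of ac(sigma) from Theorem 4.21 for a binary constant-length substitution."""
    w0, w1 = sub["0"], sub["1"]
    ell = len(w0)
    if len(w1) != ell:
        raise ValueError("the substitution must have constant length")
    ell_star = sum(1 for a, b in zip(w0, w1) if a != b)
    if ell_star == 0:
        raise ValueError("chi(0) and chi(1) must differ")
    denominator = math.log(ell) - math.log(ell_star)
    return math.inf if denominator == 0 else math.log(ell) / denominator

print(amorphic_complexity({"0": "01", "1": "10"}))   # inf (Thue-Morse)
print(amorphic_complexity({"0": "01", "1": "00"}))   # 1.0 (period doubling)
```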

4.2.2. Block and Return Word Substitutions. We can also view substitutions on the level of $\ell$-block shifts, as in Section 1.4. That is, we introduce a new alphabet $\mathcal A_\ell$ having the $\ell$-words in $\mathcal L_\ell(X)$ as letters, and we study $\chi$ as a substitution $\chi_\ell: \mathcal A_\ell \to \mathcal A_\ell^*$. To this end, if $u = u_1u_2\cdots u_\ell \in \mathcal A_\ell$ and $\chi(u) = w$ and $k = |\chi(u_1)|$, define
$$\chi_\ell(u) = \bar w := \underbrace{w_1\cdots w_\ell}_{\bar w_1}\,\underbrace{w_2\cdots w_{\ell+1}}_{\bar w_2}\cdots\underbrace{w_k\cdots w_{\ell+k-1}}_{\bar w_k}.$$
That is, $\bar w_j$ is the $j$-th word of length $\ell$ inside $w = \chi(u)$. Note that $|\chi_\ell(u)|$ is equal to $|\chi(u_1)|$, which is not necessarily the same as the number of $\ell$-words that fit in $\chi(u)$. For example, if $\chi = \chi_{Fib}: 0 \to 01,\ 1 \to 0$ is the Fibonacci substitution on the alphabet $\mathcal A = \{0, 1\}$, and $\ell = 3$, then the new alphabet is $\mathcal A_3 = \{001, 010, 100, 101\} = \{a, b, c, d\}$ and
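$$\chi: \begin{cases} 001 \to 01010, \\ 010 \to 01001, \\ 100 \to 00101, \\ 101 \to 0010, \end{cases} \qquad \chi_3: \begin{cases} a \to bd & \text{because } |\chi(u_1)| = 2, \\ b \to bc & \text{because } |\chi(u_1)| = 2, \\ c \to a & \text{because } |\chi(u_1)| = 1, \\ d \to a & \text{because } |\chi(u_1)| = 1, \end{cases}$$
with associated matrices
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{and}\quad A_3 = \begin{pmatrix} 0&0&1&1 \\ 1&1&0&0 \\ 0&1&0&0 \\ 1&0&0&0 \end{pmatrix}$$
and eigenvalues $\frac12(1 \pm \sqrt5)$ and $\frac12(1 \pm \sqrt5), 0, 0$, respectively. For the second iterates we find
$$\chi^2: \begin{cases} 001 \to 01001001, \\ 010 \to 01001010, \\ 100 \to 01010010, \\ 101 \to 0101001, \end{cases} \qquad \chi_3^2: \begin{cases} a \to bca & \text{because } |\chi^2(u_1)| = 3, \\ b \to bca & \text{because } |\chi^2(u_1)| = 3, \\ c \to bd & \text{because } |\chi^2(u_1)| = 2, \\ d \to bd & \text{because } |\chi^2(u_1)| = 2. \end{cases}$$
This example shows that powers of $\chi$ and powers of $\chi_\ell$ match: $f \circ \chi^n(x) = \chi_\ell^n \circ f(x)$ if $f$ is the transposition of words $x \in \mathcal A^*$ into words in $\mathcal A_\ell^*$.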
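The $\ell$-block construction is mechanical, so the example can be reproduced by a short sketch. Assumptions: the $\ell$-words are collected from a finite prefix of the fixed point and lettered $a, b, c, \dots$ in lexicographic order (which matches the labelling above); `block_substitution` and `labelled` are hypothetical names. The last line compares $(\chi^2)_3$ with $(\chi_3)^2$.

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def block_substitution(sub, ell, words):
    """chi_ell(u): the ell-words of chi(u) starting at positions 1, ..., |chi(u_1)|."""
    result = {}
    for u in words:
        w = iterate(sub, u, 1)
        k = len(sub[u[0]])
        assert len(w) >= k + ell - 1, "chi(u) too short for this ell"
        result[u] = [w[j:j + ell] for j in range(k)]
    return result

fib = {"0": "01", "1": "0"}
rho = iterate(fib, "0", 15)
words3 = sorted({rho[i:i + 3] for i in range(len(rho) - 2)})   # ['001', '010', '100', '101']
label = dict(zip(words3, "abcd"))

def labelled(block_sub):
    return {label[u]: "".join(label[v] for v in image) for u, image in block_sub.items()}

chi3 = labelled(block_substitution(fib, 3, words3))
chi_sq = {a: iterate(fib, a, 2) for a in "01"}                 # chi^2
chi3_of_sq = labelled(block_substitution(chi_sq, 3, words3))   # (chi^2)_3
print(chi3)        # {'a': 'bd', 'b': 'bc', 'c': 'a', 'd': 'a'}
print(chi3_of_sq)  # {'a': 'bca', 'b': 'bca', 'c': 'bd', 'd': 'bd'}
```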
Proposition 4.22. Let $\chi_\ell$ be the $\ell$-block version of the substitution $\chi$, with associated matrix $A_\ell$. If $\chi$ is a primitive substitution, then so is $\chi_\ell$, and the leading eigenvalue of $A_\ell$ is equal to the leading eigenvalue of $A_1$. The remaining eigenvalues of $A_\ell$ are the same as those of $A_2$, possibly with extra eigenvalues 0.

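Proof. We follow the proof in [465, Section V.5]. Since $\chi: \mathcal A \to \mathcal A^*$ is primitive, every word $u \in \mathcal L(X_\rho)$ appears in $\chi^n(a)$, $a \in \mathcal A$, for $n$ sufficiently large. But then $\chi_\ell: \mathcal A_\ell \to \mathcal A_\ell^*$ is also a primitive substitution.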

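Let $\lambda$ and $\lambda_\ell$ be the leading eigenvalues of the associated matrices $A$ and $A_\ell$ of $\chi$ and $\chi_\ell$, respectively. Let $u = u_1\cdots u_\ell$ be the "first" letter of the alphabet $\mathcal A_\ell$, where $u_1\cdots u_\ell$ is a prefix of $\chi(u_1\cdots u_\ell)$. Then
$$\lambda_\ell = \lim_{n\to\infty}\frac{|\chi_\ell^{n+1}(u)|}{|\chi_\ell^n(u)|} = \lim_{n\to\infty}\frac{|\chi^{n+1}(u_1\cdots u_\ell)|}{|\chi^n(u_1\cdots u_\ell)|} = \lambda.$$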
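Now for the remaining eigenvalues, fix $\ell \ge 3$ and take $p \in \mathbb N$ so that $|\chi^p(a)| \ge \ell - 1$ for each $a \in \mathcal A$. This means that $\chi^p(w_1w_2) = y_1y_2\cdots y_{|\chi^p(w_1w_2)|}$ is long enough so that we can properly define $\psi_\ell: \mathcal A_2 \to \mathcal A_\ell^*$,
$$\psi_\ell(w_1w_2) = \underbrace{y_1y_2\cdots y_\ell}_{\bar w_1\in\mathcal A_\ell}\,\underbrace{y_2y_3\cdots y_{\ell+1}}_{\bar w_2\in\mathcal A_\ell}\cdots\underbrace{y_{|\chi^p(w_1)|}y_{|\chi^p(w_1)|+1}\cdots y_{|\chi^p(w_1)|+\ell-1}}_{\bar w_{|\chi^p(w_1)|}}.$$
Conversely, let $\psi_2: \mathcal A_\ell \to \mathcal A_2$, $\psi_2(w_1w_2\cdots w_\ell) = w_1w_2$, be the reduction of $w \in \mathcal A_\ell$ to its first two letters in $\mathcal A$. Then the maps $\chi_\ell$, $\chi_\ell^p$ on $\mathcal A_\ell^*$, the maps $\chi_2$, $\chi_2^p$ on $\mathcal A_2^*$, and the maps $\psi_2$, $\psi_\ell$ between them form a commutative diagram:
$$\psi_\ell \circ \psi_2 = \chi_\ell^p, \qquad \psi_2 \circ \psi_\ell = \chi_2^p, \qquad \chi_\ell \circ \psi_\ell = \psi_\ell \circ \chi_2, \qquad \psi_2 \circ \chi_\ell = \chi_2 \circ \psi_2.$$
Let $A$ and $\hat A$ be the abelianizations of $\chi_2$ and $\chi_\ell$, and let $D$ and $C$ be the abelianizations of $\psi_2$ and $\psi_\ell$. Then these relations translate to $CD = \hat A^p$, $DC = A^p$, $\hat AC = CA$ and $D\hat A = AD$, i.e. $A$ and $\hat A$ are shift equivalent with lag $p$; see Definition 3.25. By Lemma 3.27 they have the same eigenvalues up to possibly 0. □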
Corollary 4.23. With the notation from the proof of Proposition 4.22, let $v$ be a right eigenvector of $A$. Then $v' = Cv$ is a right eigenvector of the associated matrix $\hat A$ for the same eigenvalue, provided $Cv \ne 0$. In particular, if $v$ is the leading eigenvector of $\chi_2$, then the normalization of the vector $Cv$ is the frequency vector of the $\ell$-words in the fixed point of $\chi$.

Proof. Since $CA = \hat AC$, we have $\hat Av' = \hat ACv = CAv = C\lambda v = \lambda v'$. □
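For the Fibonacci substitution with $\ell = 3$ and $p = 2$, the corollary can be tested against empirical 3-word frequencies. A minimal sketch, assuming plain power iteration for the leading eigenvector and hypothetical helper names; `block_images` implements both $\chi_2$ (with $p = 1$, $\ell = 2$) and $\psi_3$ (with $p = 2$, $\ell = 3$).

```python
from collections import Counter

def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def block_images(sub, p, ell, words):
    """ell-words of chi^p(w) starting at positions 1, ..., |chi^p(w_1)|."""
    return {w: [iterate(sub, w, p)[j:j + ell] for j in range(len(iterate(sub, w[0], p)))]
            for w in words}

def abelianization(images, rows, cols):
    return [[Counter(images[c])[r] for c in cols] for r in rows]

def leading_vector(M, steps=300):
    """Power iteration, normalized so that the entries sum to 1."""
    v = [1.0] * len(M[0])
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        s = sum(v)
        v = [x / s for x in v]
    return v

fib = {"0": "01", "1": "0"}
rho = iterate(fib, "0", 25)
two = sorted({rho[i:i + 2] for i in range(1000)})     # ['00', '01', '10']
three = sorted({rho[i:i + 3] for i in range(1000)})   # ['001', '010', '100', '101']

A = abelianization(block_images(fib, 1, 2, two), two, two)     # abelianization of chi_2
C = abelianization(block_images(fib, 2, 3, two), three, two)   # abelianization of psi_3
v = leading_vector(A)
Cv = [sum(C[i][j] * v[j] for j in range(len(v))) for i in range(len(C))]
predicted = [x / sum(Cv) for x in Cv]

counts = Counter(rho[i:i + 3] for i in range(len(rho) - 2))
observed = [counts[w] / (len(rho) - 2) for w in three]
print([round(x, 4) for x in predicted])   # 3-word frequencies from Corollary 4.23
print([round(x, 4) for x in observed])    # empirical 3-word frequencies
```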

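Example 4.24. In order to show that $A = A_1$ itself does not fully determine the eigenvalues of $A_\ell$, we consider
$$\chi: \begin{cases} 1 \to 121, \\ 2 \to 212 \end{cases} \quad\text{and}\quad \psi: \begin{cases} 1 \to 112, \\ 2 \to 212; \end{cases}$$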

see [3]. Clearly they both have the same associated matrix with eigenvalues 1 and 3. However, the fixed point of $\chi$ is the shift-periodic sequence $121212121\cdots$, and $\chi_\ell = \chi$ for each $\ell$ if we recode the two $\ell$-blocks by their first letters. For $\psi$ with $\{a, b, c, d, e, f\} = \{112, 212, 121, 211, 122, 221\}$ we get
$$\psi_3: \begin{cases} a \to abd, \\ b \to bcd, \\ c \to aef, \\ d \to bcd, \\ e \to aef, \\ f \to bef \end{cases} \quad\text{with associated matrix}\quad A_3 = \begin{pmatrix} 1&0&1&0&1&0 \\ 1&1&0&1&0&1 \\ 0&1&0&1&0&0 \\ 1&1&0&1&0&0 \\ 0&0&1&0&1&1 \\ 0&0&1&0&1&1 \end{pmatrix},$$
which has eigenvalues $3, 1, \frac12(1 \pm \sqrt3\,i), 0, 0$. In Table 4.1 we worked out some of the details for the Fibonacci substitution.

Rather than dividing words $x \in X_\rho$ into blocks of equal length, we can also divide $x$ into return words. Suppose that $\rho$ is the fixed point of a primitive substitution $\chi: \mathcal A \to \mathcal A^*$ and $u \ne \epsilon$ is a prefix of $\rho$. Then we can divide $\rho$ into blocks equal to the return words $v \in \mathcal R_u$, simply by starting a new block at every next occurrence of $u$ in $\rho$. Let
$$\Theta_u: \mathcal A_u := \{1, \dots, \#\mathcal R_u\} \to \mathcal R_u$$
be a bijection such that $\Theta_u(1)$ is the first return word in this decomposition. Thus there is a sequence $\rho^u \in \mathcal A_u^{\mathbb N}$ such that the concatenation $\Theta_u(\rho^u) = \rho$.
The following results are due to Durand [219].
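The decomposition into return words is easy to compute on a finite prefix. In the sketch below (hypothetical helper `derived_sequence`; the last, possibly incomplete block is dropped), return words are numbered $1, 2, \dots$ in order of first appearance, so $\Theta_u(1)$ is the first block. For the Fibonacci word and $u = 0$ the return words are exactly the blocks $\chi(0) = 01$ and $\chi(1) = 0$, so $\rho^u$ is the Fibonacci word again, with $0, 1$ renamed $1, 2$.

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def derived_sequence(rho, u):
    """Cut rho at each occurrence of the prefix u and code the return words
    by 1, 2, ... in order of first appearance (so Theta_u(1) is the first block)."""
    starts = [i for i in range(len(rho) - len(u) + 1) if rho.startswith(u, i)]
    theta_inverse, coded = {}, []
    for s, t in zip(starts, starts[1:]):   # the last block may be incomplete: skip it
        block = rho[s:t]
        theta_inverse.setdefault(block, len(theta_inverse) + 1)
        coded.append(theta_inverse[block])
    return theta_inverse, coded

rho = iterate({"0": "01", "1": "0"}, "0", 18)   # Fibonacci prefix
theta_inverse, rho_u = derived_sequence(rho, "0")
print(theta_inverse)   # {'01': 1, '0': 2}
print(rho_u[:10])      # [1, 2, 1, 1, 2, 1, 2, 1, 1, 2]
```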
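Lemma 4.25. Let $\chi: \mathcal A \to \mathcal A^*$ be a primitive substitution with fixed point $\rho$, and let $\Theta_u$ and $\rho^u \in \mathcal A_u^{\mathbb N}$ for some non-empty prefix $u$ of $\rho$ be as above. Then there is a primitive substitution $\chi_u: \mathcal A_u \to \mathcal A_u^*$ with fixed point $\rho^u$ such that
$$\Theta_u \circ \chi_u = \chi \circ \Theta_u \quad\text{and}\quad \chi_u(\rho^u) = \rho^u.$$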

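Proof. If $u$ is a prefix of $w$ and $w$ is a prefix of $\rho$, then each return word in $\mathcal R_w$ is a concatenation of return words in $\mathcal R_u$. Let $v = \Theta_u(a) \in \mathcal R_u$ be arbitrary. Then $u$ is a prefix of $v$ and $vu$ occurs in $\rho$; hence $\chi(v)$ starts with $\chi(u)$, and $\chi(v)\chi(u)$ occurs in $\rho$. Moreover, $u$ is a prefix of $\chi(u)$, because $u$ and $\chi(u)$ are both prefixes of $\rho = \chi(\rho)$ and $|\chi(u)| \ge |u|$. Thus $\chi(v)$ is a subword of $\rho$ that starts with $u$ and is succeeded in $\rho$ by $u$. Therefore $\chi(v) = \Theta_u(a_1)\cdots\Theta_u(a_n)$ is some concatenation of return words in $\mathcal R_u$. Hence, if we set
$$\chi_u(a) = a_1\cdots a_n$$
and do likewise for all $a \in \mathcal A_u$, then $\Theta_u \circ \chi_u = \chi \circ \Theta_u$. By construction, $\chi_u(1)$ starts with 1, so $\lim_k \chi_u^k(1) = \rho^u$ and $\Theta_u(\rho^u) = \rho$.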

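Table 4.1. Block substitutions of the Fibonacci substitution, with $\gamma = \frac12(1+\sqrt5)$. For each $\ell$ we list $\chi$ on the $\ell$-words, the block substitution $\chi_\ell$ with its associated matrix and the leading right eigenvector of that matrix (proportional to the frequencies of the $\ell$-words), and, for $\ell \ge 3$, the map $\psi_\ell$ from the letters $a = 00$, $b = 01$, $c = 10$ of $\mathcal A_2$ to $\mathcal A_\ell^*$ with its associated matrix.

$\ell = 2$:
$\chi: a := 00 \to 0101,\ b := 01 \to 010,\ c := 10 \to 001$; $\chi_2: a \to bc,\ b \to bc,\ c \to a$; associated matrix $\begin{pmatrix}0&0&1\\1&1&0\\1&1&0\end{pmatrix}$; leading eigenvector $(1, \gamma, \gamma)^T$.

$\ell = 3$, $p = 2$:
$\chi: a := 001 \to 01010,\ b := 010 \to 01001,\ c := 100 \to 00101,\ d := 101 \to 0010$; $\chi_3: a \to bd,\ b \to bc,\ c \to a,\ d \to a$; associated matrix $\begin{pmatrix}0&0&1&1\\1&1&0&0\\0&1&0&0\\1&0&0&0\end{pmatrix}$; leading eigenvector $(1+\gamma, 1+2\gamma, 1+\gamma, \gamma)^T$.
$\psi_3: a \to bca,\ b \to bca,\ c \to bd$; associated matrix $\begin{pmatrix}1&1&0\\1&1&1\\1&1&0\\0&0&1\end{pmatrix}$.

$\ell = 4$, $p = 3$:
$\chi: a := 0010 \to 0101001,\ b := 0100 \to 0100101,\ c := 0101 \to 010010,\ d := 1001 \to 001010,\ e := 1010 \to 001001$; $\chi_4: a \to ce,\ b \to bd,\ c \to bd,\ d \to a,\ e \to a$; associated matrix $\begin{pmatrix}0&0&0&1&1\\0&1&1&0&0\\1&0&0&0&0\\0&1&1&0&0\\1&0&0&0&0\end{pmatrix}$; leading eigenvector $(1+2\gamma, 1+2\gamma, 1+\gamma, 1+2\gamma, 1+\gamma)^T$.
$\psi_4: a \to bdace,\ b \to bdace,\ c \to bda$; associated matrix $\begin{pmatrix}1&1&1\\1&1&1\\1&1&0\\1&1&1\\1&1&0\end{pmatrix}$.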

Finally, if $\chi_u$ is not primitive, then there are $a, b \in \mathcal A_u$ such that $\Theta_u(b)$ is not a subword of $\chi^k(\Theta_u(a)) = \Theta_u(\chi_u^k(a))$ for any $k \ge 1$. But this contradicts that $\chi$ is primitive. □

For each prefix $u \ne \epsilon$ of $\rho$, we call $\chi_u$ a derived substitution of $\chi$.

Corollary 4.26. Every primitive substitution has only finitely many different derived substitutions.

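Proof. Recall from Theorem 4.18 that a primitive substitution shift is linearly recurrent; let $L$ be the corresponding constant. Then, independently of the prefix $u \ne \epsilon$ of $\rho$, we have by Theorem 4.4 that
$$\#\mathcal A_u = \#\mathcal R_u \le L(L+1)^2, \qquad \frac{|u|}{L} \le |v| \le L|u|, \quad\text{and}\quad |\chi(v)| \le KL|u|$$
for all $v \in \mathcal R_u$, where $K = \sup_{a\in\mathcal A}|\chi(a)|$. Hence each word $\chi_u(a)$ consists of at most $KL^2$ letters, and since $\#\mathcal A_u$ is bounded as well, there is no space for more than finitely many different substitutions. □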
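Proposition 4.27. All derived substitutions of a primitive substitution $\chi$ have the same eigenvalues, possibly with extra eigenvalues 0.

Proof. Let $u$ and $v$ be prefixes of the fixed point $\rho$ of $\chi$ so that $u$ is a prefix of $v$. Return words of $\mathcal R_v$ are concatenations of return words of $\mathcal R_u$, so there is a substitution $\psi: \mathcal A_v \to \mathcal A_u^*$ such that $\Theta_u \circ \psi = \Theta_v$. Hence
$$\Theta_u \circ \chi_u \circ \psi = \chi \circ \Theta_u \circ \psi = \chi \circ \Theta_v = \Theta_v \circ \chi_v = \Theta_u \circ \psi \circ \chi_v. \tag{4.6}$$
Next take $\ell$ so large that $v$ is a prefix of $\chi^\ell(u)$. The images under $\chi^\ell$ of return words in $\mathcal R_u$ are concatenations of return words of $\mathcal R_v$, so there is a substitution $\tilde\psi: \mathcal A_u \to \mathcal A_v^*$ such that $\Theta_v \circ \tilde\psi = \chi^\ell \circ \Theta_u$. Hence
$$\Theta_v \circ \chi_v \circ \tilde\psi = \chi \circ \Theta_v \circ \tilde\psi = \chi^{\ell+1} \circ \Theta_u = \chi^\ell \circ \Theta_u \circ \chi_u = \Theta_v \circ \tilde\psi \circ \chi_u. \tag{4.7}$$
Also
$$\begin{cases} \Theta_v \circ \tilde\psi \circ \psi = \chi^\ell \circ \Theta_u \circ \psi = \chi^\ell \circ \Theta_v = \Theta_v \circ \chi_v^\ell, \\ \Theta_u \circ \psi \circ \tilde\psi = \Theta_v \circ \tilde\psi = \chi^\ell \circ \Theta_u = \Theta_u \circ \chi_u^\ell. \end{cases} \tag{4.8}$$
Removing the left factors $\Theta_u$ and $\Theta_v$ in (4.6), (4.7), and (4.8) gives
$$\chi_u \circ \psi = \psi \circ \chi_v, \quad \chi_v \circ \tilde\psi = \tilde\psi \circ \chi_u \quad\text{and}\quad \tilde\psi \circ \psi = \chi_v^\ell, \quad \psi \circ \tilde\psi = \chi_u^\ell.$$
This means that the abelianizations $A$ and $\hat A$ of $\chi_u$ and $\chi_v$ are shift equivalent of lag $\ell$, with the abelianizations $D$ and $C$ of $\psi$ and $\tilde\psi$ as conjugating matrices. By Lemma 3.27, $A$ and $\hat A$ have the same eigenvalues, up to 0. □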

By the same sort of argument, Durand [219, Proposition 9] also showed that these eigenvalues are the same as those of $\chi$, except for possible 0 and roots of unity.
Example 4.28. The primitive Pisot substitution $\chi: 0 \to 0110,\ 1 \to 010$ has eigenvalues $\frac12(3 \pm \sqrt{17})$ and fixed point
$$\rho = 0\ 110\ 0100100110\ 011001001100110010011001100100100110\cdots.$$

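Table 4.2 shows the derived substitution for the first few prefixes of $\rho$.

Table 4.2. Derived substitutions of the substitution $\chi$ (columns: $u$; $\mathcal R_u = \{a, b, (c)\}$; $\rho^u$; $\chi_u$; eigenvalues).

$u = 0$: $\mathcal R_u = \{011, 0, 01\}$; $\rho^u = abcbcbababcbab\cdots$; $\chi_u: a \to abcbcb,\ b \to ab,\ c \to abcb$; eigenvalues $0, \frac12(3 \pm \sqrt{17})$.
$u = 01$: $\mathcal R_u = \{0110, 010\}$; $\rho^u = abbaabaabaabba\cdots$; $\chi_u: a \to abba,\ b \to aba$; eigenvalues $\frac12(3 \pm \sqrt{17})$.
$u = 011$: $\mathcal R_u = \{0110010010, 0110, 0110010\}$; $\rho^u = abcbcbababcbab\cdots$; $\chi_u: a \to abcbcb,\ b \to ab,\ c \to abcb$; eigenvalues $0, \frac12(3 \pm \sqrt{17})$.
$u = 0110$: $\mathcal R_u = \{0110010010, 0110, 0110010\}$; $\rho^u = abcbcbababcbab\cdots$; $\chi_u: a \to abcbcb,\ b \to ab,\ c \to abcb$; eigenvalues $0, \frac12(3 \pm \sqrt{17})$.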

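For $u = 0$, $\ell = 1$, and $v = 01$ from the proof of Proposition 4.27, we find
$$\psi: \begin{cases} a \to ab, \\ b \to cb \end{cases} \quad\text{and}\quad \tilde\psi: \begin{cases} a \to abb, \\ b \to a, \\ c \to ab. \end{cases} \tag{4.9}$$
Note that $\rho^u$ is also the fixed point of the substitution
$$a \to abcbc, \quad b \to bab, \quad c \to abc,$$
but this one does not match with (4.9).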
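Derived substitutions can be computed from a finite prefix as in the proof of Lemma 4.25: for each return word $v$, cut $\chi(v)$ at the occurrences of $u$. Since $\chi(v)$ is followed in $\rho$ by $\chi(u)$, which starts with $u$, it suffices to search $\chi(v)u$. A minimal sketch for Example 4.28, with return words lettered $a, b, c$ by first appearance; the function names are hypothetical.

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def return_words(rho, u):
    """Return words of the prefix u, lettered a, b, c, ... by first appearance in rho."""
    starts = [i for i in range(len(rho) - len(u) + 1) if rho.startswith(u, i)]
    letters = {}
    for s, t in zip(starts, starts[1:]):
        letters.setdefault(rho[s:t], chr(ord("a") + len(letters)))
    return letters

def derived_substitution(sub, rho, u):
    """chi_u: decompose chi(v), for each return word v, into return words of u."""
    letters = return_words(rho, u)
    result = {}
    for v, letter in letters.items():
        image = iterate(sub, v, 1)
        extended = image + u   # chi(v) is followed by a word starting with u
        cuts = [i for i in range(len(image)) if extended.startswith(u, i)] + [len(image)]
        result[letter] = "".join(letters[image[s:t]] for s, t in zip(cuts, cuts[1:]))
    return letters, result

chi = {"0": "0110", "1": "010"}   # Example 4.28
rho = iterate(chi, "0", 6)
for u in ["0", "01", "011"]:
    print(u, *derived_substitution(chi, rho, u))
```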

4.2.3. Recognizability. We call a substitution injective if $\chi(a) \ne \chi(b)$ for all $a \ne b \in \mathcal A$. Most of the examples above were indeed injective, but in general substitutions need not be injective, and hence need not be invertible, not even as maps $\chi: X_\rho \to X_\rho$. But we can still ask:
Is an injective substitution $\chi: X_\rho \to \chi(X_\rho)$ invertible, and what does the inverse look like?
To illustrate the difficulty here, assume that $\chi$ from (4.5) acts on a two-sided shift space. Then what is the inverse of $x = \cdots 010101010 \cdots$? Without

putting in the dot to indicate the zeroth position, there are three ways of dividing $x$ into three-blocks,
$$x = \cdots|010|101|010|\cdots = \cdots 0|101|010|10\cdots = \cdots 01|010|101|0\cdots, \tag{4.10}$$
and each has its own inverse. The way to cut $x$ into blocks $\chi(a)$ is called a 1-cutting of $x$. The problem is thus: can a sequence $x \in \chi(X_\rho)$ have multiple 1-cuttings if we do not know a priori where the first block starts?
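For a constant-length substitution, the candidate 1-cuttings of a finite window can simply be enumerated: try each offset and check that every complete block is one of the words $\chi(a)$. A minimal sketch (the helper `one_cuttings` is hypothetical, and a finite window can only show that a cutting is consistent with what it sees):

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def one_cuttings(x, images):
    """Offsets p in {0, ..., L-1} such that every complete length-L block of the
    finite window x, read from offset p, is one of the words chi(a)."""
    L = len(next(iter(images)))
    return [p for p in range(L)
            if all(x[i:i + L] in images for i in range(p, len(x) - L + 1, L))]

window = "010101010101"
print(one_cuttings(window, {"010", "101"}))     # [0, 1, 2]: the three cuttings of (4.10)
tm = iterate({"0": "01", "1": "10"}, "0", 6)    # Thue-Morse prefix
print(one_cuttings(tm[5:45], {"01", "10"}))     # [1]: a unique cutting
```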

Remark 4.29. We give a brief history of this problem. In 1973, Martin claimed that any aperiodic substitution on a two-letter alphabet is one-sided recognizable (or 'rank one determined'). His proof was not convincing. In 1986, Host proved that a primitive substitution shift $X_\rho$ is one-sided recognizable if and only if $\chi(X_\rho)$ is open in $X_\rho$. This condition is not so easy to check, though. In 1987, Queffélec announced a short proof, due to Rauzy, of the unilateral recognizability of constant length substitutions. Nobody could check this proof. In his 1989 PhD thesis, Mentzen claimed to prove this result, using a paper by Kamae from 1972. However, in 1999, Apparicio found a gap in Mentzen's proof (Kamae's results only work for a particular case of the theorem, namely if the length is a power of a prime number). She solved the problem using a 1978 result by Dekking. In the meantime, in 1992, Mossé proved a more general result (also for non-constant length), but using a new notion of (two-sided) recognizable substitution. She refined this result in 1996. Her results are currently considered the definitive reference, although since then proofs by other methods (using results from Downarowicz & Maass [215]) have also been found [71, 78].

Fix $x \in X_\rho$ and define the sequences
$$E = \{|\chi(x_1x_2\cdots x_i)|\}_{i\ge0} \quad\text{and}\quad E_k = \{|\chi^k(x_1x_2\cdots x_i)|\}_{i\ge0}.$$
By convention, the zeroth entry (for $i = 0$) is 0. In short, $E_k$ tells us how to divide $x$ into blocks of length $|\chi^k(x_i)|$ if we start at 0. Clearly if $\chi$ is of constant length $M$, then $E_k = \{iM^k\}_{i\ge0}$.
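The sets $E_k$ are just partial sums of block lengths. A minimal sketch for a finite word $x$ (the function name `cut_positions` is hypothetical), checked on the Fibonacci substitution and on the constant-length Thue-Morse substitution, where $E_k = \{iM^k\}$ with $M = 2$:

```python
from itertools import accumulate

def cut_positions(sub, x, k):
    """E_k = { |chi^k(x_1 ... x_i)| : i >= 0 } for a finite word x, i.e. the
    positions (counting from 0) where the blocks chi^k(x_1), chi^k(x_2), ... start."""
    def power(word):
        for _ in range(k):
            word = "".join(sub[c] for c in word)
        return word
    return [0] + list(accumulate(len(power(c)) for c in x))

fib = {"0": "01", "1": "0"}
tm = {"0": "01", "1": "10"}
print(cut_positions(fib, "01001", 1))   # [0, 2, 3, 5, 7, 8]
print(cut_positions(tm, "0110", 2))     # [0, 4, 8, 12, 16], i.e. i * 2^2
```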

Definition 4.30. A substitution word $x \in X_\rho$ is
• one-sided recognizable if there is $N$ such that for every $i, j \in \mathbb N$ such that $x_i\cdots x_{i+N} = x_j\cdots x_{j+N}$ we have $i \in E$ if and only if $j \in E$,
• two-sided recognizable if there is $N$ such that for every $i, j \in \mathbb N$ such that $x_{i-N+1}\cdots x_{i+N} = x_{j-N+1}\cdots x_{j+N}$ we have $i \in E$ if and only if $j \in E$.
We call $N$ the recognizability index.

In this definition, the sequence x from (4.10) is not recognizable, but for
example the fixed point of the Fibonacci substitution χFib is recognizable
with recognizability index 2. The Thue-Morse sequence ρ0 (or ρ1 ) is recog-
nizable with recognizability index 4. The following result is due to Mentzen
(1989) and Apparicio [29]:
Theorem 4.31. Every primitive injective constant length substitution with
aperiodic fixed point is one-sided recognizable.

For non-constant length substitutions, things are more involved.

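Example 4.32. The substitutions
$$\chi: \begin{cases} 0 \to 0001, \\ 1 \to 01 \end{cases} \quad\text{and}\quad \chi_{chac}: \begin{cases} 0 \to 0012, \\ 1 \to 12, \\ 2 \to 012 \end{cases}$$
are not one-sided recognizable. For example, the fixed point of the first one is
$$\rho = 0001.0001.0001.01.0001.0001.0001.01.0001.0001.0001.01.0001.01\cdots,$$
where the dots indicate the 1-cutting. The word $u = 010001$ occurs both straddling two blocks $0001.0001$ (no cut directly before $u$) and as $01.0001$ (a cut directly before $u$), so just based on $u$ we cannot say whether the cut is directly before its occurrence or not. This problem does not disappear if we take longer words. The latter substitution $\chi_{chac}$ is called the primitive Chacon substitution; see Example 6.124.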

In 1992 [426], Mossé also gave conditions under which recognizability fails:

Theorem 4.33. Let $X_\rho$ be an aperiodic primitive substitution shift. Suppose that for every $n \in \mathbb N$ there exist $v \in \mathcal L(X_\rho)$ with $|v| \ge n$ and $a, b \in \mathcal A$ such that
(1) $\chi(a)$ is a proper suffix of $\chi(b)$ and
(2) $\chi(a)v$ and $\chi(b)v \in \mathcal L(X)$ and have the same 1-cutting of $v$.
Then $\chi$ is not one-sided recognizable.
Theorem 4.34. Every aperiodic primitive injective substitution is two-sided recognizable.

The recognizability index was determined by Durand & Leroy [227].

Recognizability of aperiodic, but not necessarily primitive, substitution shifts was proved in [78, Theorem 5.17] and later in [71, Theorems 4.6 and 5.3]; [71, Theorems 5.1 and 5.2] extend part of the result to S-adic shifts; see Section 4.2.5.

4.2.4. Pisot Substitutions. Substitutions $\chi$ for which the leading eigenvalue $\lambda$ of the associated matrix $A$ is a Pisot number (i.e. an algebraic integer $\lambda > 1$ whose algebraic conjugates lie inside the open unit disc; see Definition 8.2) have particularly nice properties. They are called Pisot substitutions, and irreducible Pisot substitutions if the characteristic polynomial of $A$ is irreducible. Be aware that, confusingly, this is not the same "irreducible" as in Definition 4.14.
Remark 4.35. Let λ be the leading eigenvalue of a Pisot matrix A. The
minimal polynomial p(x) of λ always divides the characteristic polynomial of
A. If these two polynomials are not equal, say det(A − xI) = p(x)q(x), then
the roots of q are zero or roots of unity; i.e. Pisot matrices with reducible
characteristic polynomials can still have eigenvalues on the unit circle.

A Pisot substitution is called unimodular⁶ if the associated matrix satisfies $\det(A) = \pm1$.
For Pisot numbers $\lambda$, the distance to the nearest integer $|||\lambda^n||| \to 0$ exponentially; see Proposition 8.5. This leads to the very useful property of Pisot substitutions that
$$e_n(a) := \big|\lambda|\chi^n(a)| - |\chi^{n+1}(a)|\big| \to 0 \quad\text{exponentially.} \tag{4.11}$$
Indeed, assume that the second eigenvalue $\mu$ has multiplicity $m$, and let
$$f_a := \lim_{n\to\infty}\frac{|\rho_1\cdots\rho_n|_a}{n} \tag{4.12}$$
be the letter frequencies of the fixed point $\rho$ of $\chi$. As a column vector, $f = (f_a)_{a\in\mathcal A}$ is the leading right eigenvector of $A$, normalized such that $\sum_a f_a = 1$. Using the Jordan decomposition $A = UJU^{-1}$, where $(f_a)$ is the leftmost column of $U$, and writing $1_b$ for the column vector with a single 1 at position $b$, we find
$$|\chi^n(b)| = \sum_{a\in\mathcal A}(A^n1_b)_a = \sum_{a\in\mathcal A}(UJ^nU^{-1}1_b)_a = \sum_{a\in\mathcal A}f_a\lambda^n(U^{-1})_{1b} + O(n^{m-1}|\mu|^n) = C_b\lambda^n + O(n^{m-1}|\mu|^n), \tag{4.13}$$
for $C_b = (U^{-1})_{1b}$. Therefore $\lambda|\chi^n(b)| = \lambda^{n+1}C_b + O(n^{m-1}|\mu|^n) = |\chi^{n+1}(b)| + O(n^{m-1}|\mu|^n)$, implying (4.11).
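For the Fibonacci substitution the lengths $|\chi^n(0)|$ are Fibonacci numbers and $e_n(0)$ decays like $\gamma^{-n}$, with $\gamma$ the golden mean. The sketch below computes $|\chi^n(b)|$ as the column sum of $A^n1_b$ (as in (4.13)) rather than building the words, and checks that consecutive ratios $e_{n+1}/e_n$ equal $1/\gamma \approx 0.618$; the function name is hypothetical.

```python
import math

def lengths_from_matrix(A, b, nmax):
    """|chi^n(b)| for n = 0..nmax, computed as the entry sum of A^n 1_b."""
    d = len(A)
    v = [1 if i == b else 0 for i in range(d)]
    out = []
    for _ in range(nmax + 1):
        out.append(sum(v))
        v = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
    return out

A = [[1, 1], [1, 0]]           # Fibonacci substitution 0 -> 01, 1 -> 0
lam = (1 + math.sqrt(5)) / 2   # leading eigenvalue (a Pisot number)
L = lengths_from_matrix(A, 0, 21)
e = [abs(lam * L[n] - L[n + 1]) for n in range(20)]
ratios = [e[n + 1] / e[n] for n in range(19)]
print([round(x, 4) for x in ratios[:5]])   # [0.618, 0.618, 0.618, 0.618, 0.618]
```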
Condition (4.11) suffices to conclude⁷ that there is a continuous function $g_\lambda: X_\rho \to S^1$ such that $g_\lambda \circ \sigma = e^{2\pi i\lambda}g_\lambda$. That is, $g_\lambda$ is a continuous eigenfunction of the Koopman operator $U_\sigma f = f \circ \sigma$. Dynamically this means that the rotation $R_\lambda: S^1 \to S^1$ over angle $\lambda$ is a factor of $(X_\rho, \sigma)$, with $g_\lambda$ as the semi-conjugacy.

6 Not to be confused with unimodal interval maps in Section 3.6.1.

7 See Theorem 6.118 for a more general result.

Using the above computation, we see that also $|||\lambda^k|\chi^n(b)|||| \to 0$ exponentially, and therefore the $\lambda^k$ are eigenvalues as well; cf. Theorem 8.8. If the minimal polynomial of $\lambda$ has degree $d = \#\mathcal A$, then $1, \lambda, \dots, \lambda^{d-1}$ are linearly independent over $\mathbb Q$, but $\lambda^d$ is a rational linear combination of $1, \lambda, \dots, \lambda^{d-1}$. Thus
$$g: X_\rho \to \mathbb T^{d-1}, \quad x \mapsto (g_\lambda(x), g_{\lambda^2}(x), \dots, g_{\lambda^{d-1}}(x))$$
is a semi-conjugacy between $(X_\rho, \sigma)$ and the toral rotation $R_{\boldsymbol\lambda}: \mathbb T^{d-1} \to \mathbb T^{d-1}$, $x \mapsto x + \boldsymbol\lambda \bmod 1$ for the translation vector $\boldsymbol\lambda = (\lambda, \dots, \lambda^{d-1})$. Again, since $1, \lambda, \dots, \lambda^{d-1}$ are linearly independent over $\mathbb Q$, $R_{\boldsymbol\lambda}$ is minimal and uniquely ergodic, with Lebesgue measure as its only $R_{\boldsymbol\lambda}$-invariant probability measure.
It is widely believed that, for every irreducible Pisot substitution, $(X_\rho, \sigma, \mu)$ is isomorphic to $(\mathbb T^{d-1}, R_{\boldsymbol\lambda}, \mathrm{Leb})$; i.e. the semi-conjugacy $g$ is one-to-one $\mu$-a.e. This would follow from Halmos & von Neumann's Structure Theorem 6.100, together with the Pisot substitution conjecture, which states that every irreducible Pisot substitution has a pure point spectrum; see Section 6.8.3.
In this section, we will give some more properties of Pisot substitutions, leading to a more geometrical understanding of $g$.
The letter frequencies $f_a = \lim_n \frac1n|x_1\cdots x_n|_a$ of primitive substitution shifts exist for all $a \in \mathcal A$, independently of $x \in X$. Frequency is a limit notion, but there are ways to measure how often subwords and letters appear in finite words, without taking limits. Given a word $v = v_1\cdots v_n \in \mathcal A^*$, let $|v|_a = \#\{1 \le i \le n : v_i = a\}$ be the number of appearances of the letter $a$ in $v$. Similarly, $|v|_u$ stands for the number of occurrences of the word $u$ in $v$.
Definition 4.36. A language L(X) is called R-balanced if there is an
R ∈ N such that
||v|a − |w|a | ≤ R
for all a ∈ A, n ∈ N, and words v, w ∈ Ln (X). If R is not specified, then we
just say balanced. Similarly, we call L(X) balanced on words if there is
R ∈ N such that
||v|u − |w|u | ≤ R
for all u ∈ L, integers n ≥ |u|, and words v, w ∈ Ln (X).
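Balance can be examined on finite data by comparing letter counts over all length-$n$ subwords of a long prefix. A minimal sketch (the helper name is hypothetical) on the Fibonacci word, which is Sturmian and hence 1-balanced:

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def letter_balance(word, n, letter):
    """max - min of |v|_letter over the length-n subwords v of a finite word."""
    counts = [word.count(letter, i, i + n) for i in range(len(word) - n + 1)]
    return max(counts) - min(counts)

rho = iterate({"0": "01", "1": "0"}, "0", 20)   # Fibonacci prefix
print([letter_balance(rho, n, "0") for n in range(1, 11)])   # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```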
Theorem 4.37. Every primitive Pisot substitution shift is balanced.

Proof. Let $f = (f_a)_{a\in\mathcal A}$ be the frequency vector; it is the right eigenvector of the associated matrix $A$ of $\chi$; see Lemma 4.15. Let $\lambda, \mu$ be the largest two eigenvalues of $A$. Because $\lambda$ is a Pisot number, $\lambda > 1 > |\mu|$. Assume that $\mu$ has multiplicity $m$. Then, using the Jordan decomposition $A = UJU^{-1}$, where $f$ is the leftmost column of $U$, and writing $1_b$ for the unit column vector with a single 1 at position $b$, we find
$$|\chi^n(b)|_a = (A^n1_b)_a = (UJ^nU^{-1}1_b)_a = f_a\lambda^n(U^{-1})_{1b} + O(n^{m-1}|\mu|^n).$$

We sum over $a \in \mathcal A$, noting that $\sum_{a\in\mathcal A}f_a = 1$: $|\chi^n(b)| = \sum_{a\in\mathcal A}(A^n1_b)_a = \lambda^n(U^{-1})_{1b} + O(n^{m-1}|\mu|^n)$. Therefore
$$\big|\,|\chi^n(b)|_a - f_a|\chi^n(b)|\,\big| = O(n^{m-1}|\mu|^n), \tag{4.14}$$
proving that the discrepancy is bounded at the words $\chi^n(b)$; see (8.18) and Definition 8.40 in Section 8.3.1. We can split an arbitrary word $w \in \mathcal L(\rho)$ as
$$w = v'_0\chi(v'_1)\cdots\chi^{n-1}(v'_{n-1})\chi^n(v_n)\chi^{n-1}(v_{n-1})\cdots\chi(v_1)v_0 \tag{4.15}$$
for some maximal $n$ such that $v_n \ne \epsilon$, where each $v_k$ and $v'_k$ has length $\le L := \max_{a\in\mathcal A}|\chi(a)|$. Applying (4.14) to each of $\chi^j(v_j)$ and $\chi^j(v'_j)$ and summing the resulting geometric series, we get bounded discrepancy altogether. It follows by Proposition 8.43 that $\rho$ is balanced; see also Proposition 4.22. □

Remark 4.38. The above proof can be adapted to show that also whole words $v \in \mathcal L_\ell(\rho)$ appear with bounded discrepancy, namely by considering the $\ell$-block shift, which is also Pisot, and in which $v$ is simply a single letter. Without proof (see [3, 4]), we remark that if $\lambda$ is not a Pisot number, then the discrepancy satisfies
$$nD_n(\rho) \approx \begin{cases} (\log n)^m\, n^{\log|\mu|/\log|\lambda|} & \text{if } |\mu| > 1, \\ (\log n)^m \text{ or } (\log n)^{m-1} & \text{if } |\mu| = 1 \text{ and } \mu \text{ is a root of unity}, \\ (\log n)^m & \text{if } |\mu| = 1 \text{ and } \mu \text{ is not a root of unity}, \end{cases}$$
where again $\mu$ is the second largest eigenvalue, of geometric multiplicity $m$.

References for the following construction of Rauzy fractals include [33, 69, 328, 329]. Let us label the coordinate axes of $\mathbb R^d$ by the letters $a \in \mathcal A$ (so $d = \#\mathcal A$ is also the degree of $\lambda$, provided that $\chi$ is indeed irreducible). Let $1_a$, $a \in \mathcal A$, denote the unit vectors. Let $E^+_\lambda$ be the positive half-line in the direction of the leading right eigenvector $f$ of $A$. To each $x \in X_\rho$ we will assign a broken line
$$x \mapsto L(x) = (\ell_i(x))_{i\ge1}$$
as follows: Starting at the origin, we concatenate unit length arcs $u_i(x)$, $i \ge 1$, parallel to $1_a$ if $x_i = a$, so that $u_{i+1}(x)$ meets $u_i(x)$ only at a single common endpoint $\ell_i(x) \in \mathbb Z^d$ with coordinates $\ell_i(x)_a = |x_1\cdots x_i|_a$; see Figure 4.3. We also let $\ell_0(x) = 0$ be the origin.
Let $V$ be the $(d-1)$-dimensional hyperplane spanned by the (generalized) eigenvectors of $A$ other than $f$. Equivalently, $V$ is the orthogonal complement of the leading left eigenvector of $A$. Let $\pi: \mathbb R^d \to V$ be the projection parallel to $E^+_\lambda$. The set
$$R := \overline{\pi(\{\ell_n(\rho) : n \in \mathbb N\})}$$


Figure 4.3. Broken line construction of a Rauzy fractal, for $\chi: 0 \to 02,\ 1 \to 0,\ 2 \to 1$ with fixed point $\rho = 02100202102\cdots$; the vertex $\ell_5(\rho)$ is marked.

is called the Rauzy fractal of $\chi$ [32, 53, 471]. See Figure 4.4 for some examples in dimension two. Strictly speaking, for Rauzy fractals that are topological disks, it is only the boundary of $R$ that is fractal.

Figure 4.4. The Rauzy fractals for $x^3 = x^2 + x + 1$ (tribonacci) and $x^3 = x^2 + 1$.
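The broken-line construction translates directly into a point cloud approximating $R$. A minimal sketch, assuming the substitution $0 \to 02$, $1 \to 0$, $2 \to 1$ of Figure 4.3 (characteristic polynomial $x^3 - x^2 - 1$), power iteration for the leading right and left eigenvectors $f$ and $w$, and the projection $\pi(y) = y - \frac{w\cdot y}{w\cdot f}f$ onto $V = w^\perp$ parallel to $f$. All names are hypothetical; the points could be plotted with any plotting library.

```python
def iterate(sub, word, n):
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def leading_vector(M, steps=500):
    """Perron-Frobenius eigenvector of a primitive matrix by power iteration (entries sum to 1)."""
    v = [1.0] * len(M)
    for _ in range(steps):
        v = [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]
        s = sum(v)
        v = [x / s for x in v]
    return v

alphabet = "012"
sub = {"0": "02", "1": "0", "2": "1"}                # the substitution of Figure 4.3
A = [[sub[b].count(a) for b in alphabet] for a in alphabet]
f = leading_vector(A)                                # direction of E_lambda^+
w = leading_vector([list(col) for col in zip(*A)])   # leading left eigenvector, V = w-perp
wf = sum(wi * fi for wi, fi in zip(w, f))

def project(point):
    """Projection onto V = {y : w.y = 0} parallel to f."""
    c = sum(wi * pi for wi, pi in zip(w, point)) / wf
    return [pi - c * fi for pi, fi in zip(point, f)]

rho = iterate(sub, "0", 25)
counts, cloud = [0, 0, 0], []
for letter in rho[:5000]:
    counts[alphabet.index(letter)] += 1              # the vertex ell_n(rho)
    cloud.append(project(counts))
print(max(sum(c * c for c in y) ** 0.5 for y in cloud))   # stays bounded (Pisot)
```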

We can transfer the shift action $\sigma$ from $X_\rho$ to the space of broken lines:
$$\hat\sigma \circ L = L \circ \sigma \quad\text{for}\quad \hat\sigma(L)_k = \ell_{k+1} - \ell_1,\quad L = (\ell_i)_{i\ge0}. \tag{4.16}$$


Also the substitution can be carried over to the space of broken lines. Set
$$\hat\chi(1_a) = u_1\cdots u_{|\chi(a)|}, \quad u_j \text{ is parallel to } 1_{\chi(a)_j},$$
and extend this to a broken line $L$ by concatenating the broken arcs $\hat\chi(x_i)$ such that $\hat\chi(x_1)$ starts at the origin and $\hat\chi(x_i)$ and $\hat\chi(x_{i+1})$ have a boundary point in common, namely the vector $(|\chi(x_1\cdots x_i)|_a)_{a\in\mathcal A}$. It also follows that
$$h \circ \pi = \pi \circ \hat\chi, \quad h = A|_V: V \to V. \tag{4.17}$$
Theorem 4.39. The map $\hat\pi: \mathrm{orb}_\sigma(\rho) \to R \subset V$ defined by $\hat\pi \circ \sigma^n(\rho) = \pi \circ \ell_n(\rho)$ extends continuously to $X_\rho$ and commutes with the piecewise translation
$$T: R \to R, \quad y \mapsto y + \pi(1_a) \text{ if } y \in \hat\pi([a]). \tag{4.18}$$
In particular, $\hat\pi(X_\rho) = R$. In fact, $T$ is a group translation on $V/\Lambda$ for some lattice $\Lambda$. If $A$ is unimodular, then $R$ is a fundamental domain of $\Lambda$, and $\hat\pi: X_\rho \to R \cong V/\Lambda$ is a measure-theoretic isomorphism.

Note, however, that $T$ is multivalued at points in $\hat\pi([a]) \cap \hat\pi([a'])$, $a \ne a' \in \mathcal A$. Under the assumption that $A$ is unimodular, the sets $\hat\pi([a])$ only overlap at common boundary points, and $\hat\pi([a])$ and $\hat\pi([a'])$ have disjoint interiors for $a \ne a'$; see the different gray-tones in Figure 4.4. Arnoux & Ito [33, Theorem 2] (see also [242]) proved that for unimodular Pisot substitutions, $\hat\pi([a]) \cap \hat\pi([a'])$ has zero Lebesgue measure, and hence $(X_\rho, \sigma)$ with respect to its unique invariant measure is isomorphic to $(R, T, \mathrm{Leb})$. In this case, the map $T$ can properly be called a domain exchange transformation, since the $T$-images of the sets $\hat\pi([a])$ are disjoint up to sets of Lebesgue measure zero. In general, Rauzy fractals need not be connected or simply connected, and their boundaries need not have zero Lebesgue measure; see [66, 335] for more information. If $A$ is not unimodular, $\pi$ fails to be one-to-one for a.e. $x \in X_\rho$. In this case, according to Halmos & von Neumann's Structure Theorem 6.100, the group translation to which $(X_\rho, \sigma)$ is isomorphic is a solenoid (a skew-product of a Cantor set and a $(d-1)$-dimensional torus); see [44, 53, 70].

Proof of Theorem 4.39. The main step is showing that $\hat\pi$ is uniformly continuous on $\mathrm{orb}_\sigma(\rho)$, so that it has a unique continuous extension to $X_\rho = \overline{\mathrm{orb}_\sigma(\rho)}$; then (4.18) follows directly from (4.16).
Since $X_\rho$ is recognizable, there is $N$ such that every word $w \in \mathcal L(\rho)$ of length $|w| \ge N$ is a subword of the $\chi$-image of some unique $v$, shortest in the sense that $w$ is not a subword of $\chi(v')$ for any proper subword $v'$ of $v$. Now suppose $n_1 < n_2$ are such that $d(\sigma^{n_1}(\rho), \sigma^{n_2}(\rho)) = 2^{-n}$ for some $n \ge N$, so
$$\rho_{n_1+1}\cdots\rho_{n_1+n} = \rho_{n_2+1}\cdots\rho_{n_2+n} \quad\text{but}\quad \rho_{n_1+n+1} \ne \rho_{n_2+n+1}.$$
We can take the inverse of $\chi$ on $\rho_{n_1+1}\cdots\rho_{n_2+n}$, i.e. find $m_1$ maximal and $m_2 + m$ minimal such that $\rho_{n_1+1}\cdots\rho_{n_2+n}$ is a subword of $\chi(\rho_{m_1+1}\cdots\rho_{m_2+m})$

and $\rho_{m_1+1}\cdots\rho_{m_1+m} = \rho_{m_2+1}\cdots\rho_{m_2+m}$. Note that $n_2 - n_1 \approx \lambda(m_2 - m_1)$ and $n \approx \lambda m$.
Continue this way until the common length of the two coinciding words drops below $N$. That is, we find $k \in \mathbb N$, $l_1$ maximal and $l_2 + l$ minimal such that $\rho_{n_1+1}\cdots\rho_{n_2+n}$ is a subword of $\chi^k(\rho_{l_1+1}\cdots\rho_{l_2+l})$ and $\rho_{l_1+1}\cdots\rho_{l_1+l} = \rho_{l_2+1}\cdots\rho_{l_2+l}$. Also $n_2 - n_1 \approx \lambda^k(l_2 - l_1)$ and $n \approx \lambda^kl \le \lambda^kN$, and there is an integer $K = O(\lambda^k)$ such that $\rho_{n_1+1}\cdots\rho_{n_1+n}$ starts at the $K$-th letter of $\chi^k(\rho_{l_1+1}\cdots\rho_{l_1+l})$ and $\rho_{n_2+1}\cdots\rho_{n_2+n}$ starts at the $K$-th letter of $\chi^k(\rho_{l_2+1}\cdots\rho_{l_2+l})$.
Since A ◦ π ◦ L = π ◦ χ̂ ◦ L = π̂ ◦ χ by (4.17) and V is the contracting
hyperplane of A, we have
* n *
* 2 *
* *
&π̂(σ (ρ)) − π̂(σ (ρ))& = *
n2 n1
p(1ρi ) *
* *
i=n1 +1
* *
* l2 *
* *
= * p ◦ σ ◦ χ̂(1ρi ) *
* i=l1 +1 *
* ⎛ ⎞ *
* *
* K l2
= C *T ◦ A k⎝ ⎠ K  *
π̂(1ρi ) − T (0)*

* i=l1 +1 *
≤ Ck m |μ|k ,

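where $C$ is a uniform constant and $m$ is the multiplicity of the second largest eigenvalue $\mu$ of $A$. If $k$ is sufficiently large, we have $C k^m |\mu|^k \leq |\mu|^{k/2} = (\lambda^k)^\alpha$ for $\alpha = \frac{\log|\mu|}{2\log\lambda} < 0$. Because $n \approx \lambda^k l \leq \lambda^k N$, this gives
$$\|\hat\pi(\sigma^{n_2}(\rho)) - \hat\pi(\sigma^{n_1}(\rho))\| \leq \lambda^{k\alpha} \leq N^{-\alpha} n^{\alpha} = \left( \frac{-\log d(\sigma^{n_1}(\rho), \sigma^{n_2}(\rho))}{N \log 2} \right)^{\alpha}.$$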

This implies the required uniform continuity of π̂ : orbσ (ρ) → V and allows
us to extend π̂ continuously to Xρ .
However, on each domain $\hat\pi([a])$, the translation vector $\pi_a := \pi(1_a)$ is different. When we divide the hyperplane $V$ by a well-chosen lattice $\Lambda$, these translation vectors become the same. That is, we need $\pi_a - \pi_{a'}$ to be lattice points for all $a, a' \in \mathcal{A} = \{0, \dots, d-1\}$. The simplest way to achieve this is to let $\Lambda_j = \pi_j - \pi_{j-1}$, for $j \in \{1, \dots, d-1\}$, be the vectors spanning $\Lambda$.

Let us compute the $\pi_j$ more explicitly. Let $u_j$, $j \in \{0, \dots, d-1\}$, be the (generalized) right eigenvectors of $A$, where $u_0$ is associated to the leading eigenvalue $\lambda$. The $u_j$ are the columns of $U$ in the Jordan decomposition $A = UJU^{-1}$. Therefore the $j$-th standard basis vector, which is the $j$-th column of $UU^{-1} = I$, satisfies
$$e_j = (UU^{-1})_j = \sum_{i=0}^{d-1} u^{-1}_{ij}\, u_i, \qquad \text{where } U^{-1} = (u^{-1}_{ij})_{i,j=0}^{d-1}.$$
Hence
$$\pi_j = e_j - u^{-1}_{0j}\, u_0 = \sum_{i=1}^{d-1} u^{-1}_{ij}\, u_i \quad \text{and} \quad \Lambda_j = e_j - e_{j-1} - \big(u^{-1}_{0,j} - u^{-1}_{0,j-1}\big)\, u_0 \quad \text{for } 1 \leq j \leq d-1.$$
Each $\pi_j$ has rationally independent coordinates, and therefore $T$ acts as a minimal map on the quotient space $V/\Lambda$, which is a $(d-1)$-dimensional torus.
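The formulas for $\pi_j$ and $\Lambda_j$ can be checked numerically with the pure-Python sketch below. The sketch takes the first row of $U^{-1}$ to be the left Perron-Frobenius eigenvector $v$ of $A$, rescaled so that $v \cdot u_0 = 1$. Then $\pi_j = e_j - \frac{v_j}{v \cdot u_0}\, u_0$ is the projection of $e_j$ along $u_0$ onto $V = \{x : v \cdot x = 0\}$. The example matrix is our own choice: it belongs to the substitution $0 \to 012$, $1 \to 0$, $2 \to 1$, with $a_{ij}$ the number of letters $i$ in $\chi(j)$. All helper names are hypothetical.

```python
# Substitution matrix a_ij = number of letters i in chi(j) for
# chi: 0 -> 012, 1 -> 0, 2 -> 1 (example chosen for illustration).
A = [[1, 1, 0],
     [1, 0, 1],
     [1, 0, 0]]
d = len(A)


def mat_vec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]


def perron_vector(M, iters=500):
    """Power iteration for a primitive non-negative matrix; returns (lambda, eigenvector)."""
    x = [1.0] * len(M)
    for _ in range(iters):
        y = mat_vec(M, x)
        s = sum(y)
        x = [t / s for t in y]          # normalize so that sum(x) == 1
    lam = sum(mat_vec(M, x))            # = lambda * sum(x) = lambda
    return lam, x


lam, u0 = perron_vector(A)                          # A u0 = lam u0
_, v = perron_vector([list(r) for r in zip(*A)])    # v A = lam v (left eigenvector)
vu0 = sum(v[i] * u0[i] for i in range(d))


def proj(x):
    """Projection along u0 onto the contracting hyperplane V = {x : v . x = 0}."""
    t = sum(v[i] * x[i] for i in range(d)) / vu0
    return [x[i] - t * u0[i] for i in range(d)]


e = [[1.0 if i == j else 0.0 for i in range(d)] for j in range(d)]
pi = [proj(e[j]) for j in range(d)]                 # pi_j = e_j - (v_j / v.u0) u0
Lambda = [[pi[j][i] - pi[j - 1][i] for i in range(d)] for j in range(1, d)]

for j, vec in enumerate(pi):
    print(f"pi_{j} =", [round(t, 6) for t in vec])
```

The power iteration converges because the matrix is primitive. Its rate is governed by $|\mu|/\lambda$.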

The same proof shows that the substitution $\chi$ acts as a contraction on $R$, with contraction factor $\approx |\mu|$. That is, $\hat\chi(0) = 0$ and $\|\hat\chi(y_1) - \hat\chi(y_2)\| \leq |\mu|\, \|y_1 - y_2\|$, assuming that $\mu$ has geometric multiplicity 1.
Rauzy fractals of Pisot substitutions can be seen as attractors of iterated function systems (IFS) in the sense of Hutchinson [326]. More precisely, they are attractors of graph-directed IFS defined by a kind of transition graph called the prefix-suffix graph, as introduced in [138]. The vertices of this graph are labeled by the letters of $\mathcal{A}$. There is an arrow $i \to j$ labeled $(p, i, s) \in \mathcal{A}^* \times \mathcal{A} \times \mathcal{A}^*$ ($p$ = prefix, $s$ = suffix) for every occurrence of $i$ in $\chi(j)$, i.e. for every decomposition $\chi(j) = pis$; see Figure 4.5. In particular:
- the transition matrix of the prefix-suffix graph is $A$;
- the number of incoming arrows to vertex $j$ is $|\chi(j)|$;
- each label $(p, i, s)$ of an arrow $i \to j$ reads $\chi(j)$ if we ignore the commas and the empty word $\epsilon$.
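Figure 4.5. Prefix-suffix graphs for two substitutions. Left: χ : 0 → 012, 1 → 0, 2 → 1, with arrows 0 → 0 labeled (ε, 0, 12), 1 → 0 labeled (0, 1, 2), 2 → 0 labeled (01, 2, ε), 0 → 1 labeled (ε, 0, ε) and 1 → 2 labeled (ε, 1, ε). Right: χ : 0 → 02, 1 → 0, 2 → 1, with arrows 0 → 0 labeled (ε, 0, 2), 2 → 0 labeled (0, 2, ε), 0 → 1 labeled (ε, 0, ε) and 1 → 2 labeled (ε, 1, ε).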
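The prefix-suffix graph is straightforward to generate from a substitution. The following sketch lists one arrow $i \to j$ labeled $(p, i, s)$ for each decomposition $\chi(j) = pis$ and reproduces the left graph of Figure 4.5. The helper names and the dictionary encoding of $\chi$ are our own. The empty word is stored as the empty string.

```python
from collections import Counter


def prefix_suffix_graph(chi):
    """Arrows (i, j, (p, i, s)), one for each decomposition chi(j) = p i s."""
    arrows = []
    for j, image in chi.items():
        for pos, i in enumerate(image):
            p, s = image[:pos], image[pos + 1:]
            arrows.append((i, j, (p, i, s)))
    return arrows


chi = {"0": "012", "1": "0", "2": "1"}   # left substitution of Figure 4.5
graph = prefix_suffix_graph(chi)
for i, j, (p, a, s) in graph:
    # The empty word is displayed as "ε", as in the figure.
    print(f"{i} -> {j}  label ({p or 'ε'}, {a}, {s or 'ε'})")

# Transition matrix: a_ij = number of arrows i -> j = number of letters i in chi(j).
counts = Counter((i, j) for i, j, _ in graph)
```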

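Theorem 4.40. Let $A$ be the matrix associated to an irreducible unimodular Pisot substitution $\chi$, and let $h = A|_V$ be the map restricted to its contracting eigenspace $V$. The subtiles $R(i) := \overline{\{\pi(\ell_{n-1}) : \rho_n = i\}}$ of the Rauzy fractal $R$ satisfy
$$R(i) = \bigcup_{i \xrightarrow{(p,i,s)} j} \big( h(R(j)) + \hat\pi(p) \big), \qquad \text{for } \hat\pi(p) = \pi\Big( \sum_{k=1}^{|p|} 1_{p_k} \Big),$$
where the union runs over all labeled arrows $i \xrightarrow{(p,i,s)} j$ of the prefix-suffix graph.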

The result was first shown for general irreducible unit Pisot substitutions by Sirvent & Wang [515], although special cases were known earlier; see e.g. [33, 125, 329]. In particular, Arnoux & Ito [33] gave a condition under which the tiles $R(i)$ overlap at most on a null-set. We follow the proof presented in [69], which is Chapter 5 in [68].

Proof. Recall that $L = \{\ell_i\}_{i \geq 0}$ is the broken line associated to the fixed point $\rho$ of the Pisot substitution $\chi$. The subtile $R(i)$ is the closure of the set $\{\pi(\ell_{n-1}) : \rho_n = i\}$. Since $\chi(\rho) = \rho$, for each such $n$ there is $m$ such that $\rho_1 \cdots \rho_{n-1} = \chi(\rho_1 \cdots \rho_{m-1})\, p$, where $\rho_m = j$ and $\chi(j) = pis$. By (4.17), we have
$$\pi(\ell_{n-1}) = \pi(\ell_{|\chi(\rho_1 \cdots \rho_{m-1})|}) + \hat\pi(p) = h \circ \pi(\ell_{m-1}) + \hat\pi(p), \qquad \rho_m = j.$$

Taking the union of such points over all $n$ with $\rho_n = i$ and then taking the closure, we arrive at
$$R(i) \subset \bigcup_{i \xrightarrow{(p,i,s)} j} \big( h(R(j)) + \hat\pi(p) \big),$$
where $i \xrightarrow{(p,i,s)} j$ are the labeled arrows of the prefix-suffix graph. Now $h$ contracts the $(d-1)$-dimensional Lebesgue measure $\mathrm{Leb}$ of $V$ by a factor $1/\lambda$, because $A$ is a unimodular Pisot matrix. Write $w_i = \mathrm{Leb}(R(i))$. Using $\mathrm{Leb}(h(R(j)) + \hat\pi(p)) = w_j/\lambda$, we obtain
$$\lambda w_i \leq \sum_{i \xrightarrow{(p,i,s)} j} w_j = \sum_{j \in \mathcal{A}} a_{ij} w_j \qquad \text{for every } i \in \mathcal{A}. \tag{4.19}$$
Here $A = (a_{ij})$ is both the associated matrix of the substitution $\chi$ and the transition matrix of the prefix-suffix graph. However, the Perron-Frobenius Theorem 8.58 (part (c)) restricts when (4.19) can hold. Let $A$ be a non-negative matrix with leading eigenvalue $\lambda$ and $w$ a non-negative vector. Then $\lambda w \leq Aw$ coordinatewise (that is, (4.19)) can only hold if $w$ is a multiple of the leading eigenvector of $A$, and in that case we have equality. Therefore $\lambda\, \mathrm{Leb}(R(i)) = \sum_{i \xrightarrow{(p,i,s)} j} \mathrm{Leb}(R(j))$ for every $i \in \mathcal{A}$, and $R(i) = \bigcup_{i \xrightarrow{(p,i,s)} j} \big( h(R(j)) + \hat\pi(p) \big)$ as claimed.
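Equality in (4.19) forces $(\mathrm{Leb}(R(i)))_{i \in \mathcal{A}}$ to be a multiple of the right Perron-Frobenius eigenvector of $A$. The same eigenvector gives the letter frequencies of $\rho$, so the relative sizes of the subtiles can be estimated from a long prefix of the fixed point. The sketch below compares the normalized eigenvector with the empirical letter frequencies in $\chi^{18}(0)$. It uses the substitution $0 \to 012$, $1 \to 0$, $2 \to 1$, which is our own choice of example. The sketch does not compute any Lebesgue measure directly.

```python
def substitute(chi, word):
    return "".join(chi[c] for c in word)


chi = {"0": "012", "1": "0", "2": "1"}
letters = sorted(chi)
# a_ij = number of letters i in chi(j)
A = [[chi[j].count(i) for j in letters] for i in letters]

# Right Perron-Frobenius eigenvector by power iteration, normalized to sum 1.
w = [1.0] * len(letters)
for _ in range(500):
    y = [sum(A[i][j] * w[j] for j in range(len(w))) for i in range(len(w))]
    s = sum(y)
    w = [t / s for t in y]

# chi(0) starts with 0, so chi^18(0) is a prefix of the fixed point rho.
rho = "0"
for _ in range(18):
    rho = substitute(chi, rho)
freq = [rho.count(a) / len(rho) for a in letters]

print("normalized PF eigenvector:", [round(t, 5) for t in w])
print("letter frequencies in rho:", [round(t, 5) for t in freq])
```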

4.2.5. S-adic Transformations. Instead of using a single substitution to create an infinite word $\rho \in \mathcal{A}^{\mathbb{N}}$, we can use a sequence of substitutions $\chi_n : \mathcal{A}_n \to \mathcal{A}_{n-1}^*$, potentially between different alphabets $\mathcal{A}_n$. Thus
$$\rho = \lim_{n \to \infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(a_n), \qquad a_n \in \mathcal{A}_n. \tag{4.20}$$
A priori, the limit need not exist, or it can depend on the choice of letters $a_n \in \mathcal{A}_n$. If $\rho$ exists and is an infinite sequence, then we have the following.

Definition 4.41. Let S be a collection8 of substitutions χ and choose χn ∈ S

such that alphabets match: χn : An → A∗n−1 . Assume that the sequence ρ
defined in (4.20) exists and is infinite, and let Xρ = orbσ (ρ). Then (Xρ , σ)
is called an S-adic shift.
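A finite stage of (4.20) is easy to compute. The Python sketch below encodes each substitution as a dictionary from letters to words; this encoding is our own choice. The helper `compose_image` evaluates $\chi_1 \circ \cdots \circ \chi_n(a)$ and applies $\chi_n$ first. As a hypothetical example, take the constant sequence $\chi_n = (0 \to 01, 1 \to 0)$. The S-adic shift is then the Fibonacci substitution shift, and successive images are prefixes of each other.

```python
def compose_image(subs, letter):
    """Return chi_1 o chi_2 o ... o chi_n(letter) for subs = [chi_1, ..., chi_n]."""
    word = letter
    for chi in reversed(subs):  # chi_n is applied first, chi_1 last
        word = "".join(chi[c] for c in word)
    return word


fib = {"0": "01", "1": "0"}  # hypothetical constant choice chi_n = fib
prefixes = [compose_image([fib] * n, "0") for n in range(1, 8)]
for p in prefixes:
    print(p)

# Different substitutions can be mixed freely (here with chi_0 of Example 4.46).
chi0 = {"0": "0", "1": "10"}
mixed = compose_image([fib, chi0, fib], "0")
```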

The word S-adic was first used by Ferenczi [244], and the S in S-adic stands for substitution. If the sequence $(\chi_n)_{n\in\mathbb{N}}$ itself is periodic, then the S-adic shift reduces to a substitution shift. The reverse question, when S-adic shifts are isomorphic to substitution shifts, was addressed in [318].
The following simple set of conditions implies the existence of $\rho$: $\mathcal{A}_n = \mathcal{A} \ni 0$, $a_n \equiv 0$, and $\chi_n(0)$ starts with $0$ for each $n \in \mathbb{N}$. However, this by itself does not imply that $(X_\rho, \sigma)$ is minimal. For minimality, we use a straightforward generalization of Definition 4.14.

Definition 4.42. A sequence (χn )n∈N is called primitive9 if there is N

such that for all 0 ≤ m < n, a ∈ An+N , every b ∈ Am appears in χm+1 ◦
· · · ◦ χn+N (a).

If the sequence $(\chi_n)_{n\in\mathbb{N}}$ is primitive, then $(X_\rho, \sigma)$ is minimal. Indeed, let $\rho^{(m)} := \lim_n \chi_{m+1} \circ \cdots \circ \chi_n(0)$, so $\chi_{m+1}(\rho^{(m+1)}) = \rho^{(m)}$. Primitivity implies that all letters $a \in \mathcal{A}_n$ occur with bounded gaps in $\rho^{(n)}$. Hence the words $w = \chi_{m+1} \circ \cdots \circ \chi_n(a)$ occur with bounded gaps in $\rho^{(m)}$. This proves minimality. So far this is [221, Lemma 7].

However, not every primitive S-adic subshift is linearly recurrent, because the recurrence of two-letter words can be problematic. For instance [221, Section 2 of the addendum], consider the substitutions on the alphabet $\mathcal{A} = \{0, 1, 2\}$
$$\chi : \begin{cases} 0 \to 012, \\ 1 \to 012, \\ 2 \to 002, \end{cases} \qquad \text{and} \qquad \tilde\chi : \begin{cases} 0 \to 021, \\ 1 \to 121, \\ 2 \to 012. \end{cases}$$

These always form primitive S-adic shifts, because every letter occurs in every image of every composition of two substitutions. The problem is the word 20, which only occurs when straddling the concatenated images of two words $\chi_1 \circ \cdots \circ \chi_n(a)$, $a \in \mathcal{A}$. As a result, consecutive appearances of 20 in $\tilde\chi^n \circ \chi(w)$ are always $3^{n+1}$ places apart. Hence, to achieve linear recurrence, we need a bound on the distance between occurrences of two-letter words. By [221, Lemma 3.1 of the addendum], such a bound is also sufficient.

8 Some, but not all, authors require S to be finite. We will not require finiteness, because in the few results where this requirement matters, it can easily be assumed separately.
9 In [70] a weaker notion of primitive is used: for every $m$ there is $n$ such that $\chi_{m+1} \circ \cdots \circ \chi_n$ has a strictly positive associated matrix. This is strong enough to conclude minimality, but not linear recurrence.

Lemma 4.43. Let $(X_\rho, \sigma)$ be an S-adic shift with a well-defined infinite $\rho$, and take $\rho^{(m)}$ as below Definition 4.42. Define the gap-size
$$g^{(m)}(j) = \min\{i \geq 1 : \rho^{(m)}_j \rho^{(m)}_{j+1} = \rho^{(m)}_{j+i} \rho^{(m)}_{j+i+1}\}.$$
If
$$D := \sup\{g^{(m)}(j) : j \geq 1,\ m \geq 0\} < \infty,$$
then $(X_\rho, \sigma)$ is linearly recurrent.

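Proof. First, recalling $N$ from the definition of primitivity, we can define
$$\begin{aligned}
K_1 &:= \max\{|\chi_n \circ \cdots \circ \chi_{n+N}(a)| : n \in \mathbb{N},\ a \in \mathcal{A}_{n+N}\}, \\
K_2 &:= \min\{|\chi_n \circ \cdots \circ \chi_{n+N}(a)| : n \in \mathbb{N},\ a \in \mathcal{A}_{n+N}\} > 0.
\end{aligned}$$
Hence, for all $m \leq m + N \leq n$ and $a, b \in \mathcal{A}_n$,
$$\frac{|\chi_{m+1} \circ \cdots \circ \chi_n(a)|}{|\chi_1 \circ \cdots \circ \chi_n(b)|} \leq \frac{|\chi_1 \circ \cdots \circ \chi_{n-N}(\chi_{n-N+1} \circ \cdots \circ \chi_n(a))|}{|\chi_1 \circ \cdots \circ \chi_{n-N}(\chi_{n-N+1} \circ \cdots \circ \chi_n(b))|} \leq \frac{K_1}{K_2} =: K.$$
Let $u \in \mathcal{L}(X_\rho)$ be such that $|u| \geq \min\{|\chi_1 \circ \cdots \circ \chi_N(a)| : a \in \mathcal{A}_N\}$, and let $v = wu$ be a return word to $u$; see Definition 4.2. Take $N' > N$ such that $u$ is a subword of $\chi_1 \circ \cdots \circ \chi_{N'}(ab)$ for some $a, b \in \mathcal{A}_{N'}$. Then
$$\begin{aligned}
|v| &\leq D \max_{c \in \mathcal{A}_{N'}} |\chi_1 \circ \cdots \circ \chi_{N'}(c)| \\
&\leq D K \min_{c \in \mathcal{A}_{N'}} |\chi_1 \circ \cdots \circ \chi_{N'}(c)| \\
&\leq D K \min\{|\chi_{N'}(c)| : c \in \mathcal{A}_{N'}\} \cdot \min_{c \in \mathcal{A}_{N'-1}} |\chi_1 \circ \cdots \circ \chi_{N'-1}(c)| \\
&\leq D K^2 \min\{|\chi_{N'}(c)| : c \in \mathcal{A}_{N'}\} \cdot \max_{c \in \mathcal{A}_{N'-1}} |\chi_1 \circ \cdots \circ \chi_{N'-1}(c)| \\
&\leq D K^2 \min\{|\chi_n(c)| : c \in \mathcal{A}_n,\ n \in \mathbb{N}\}\, |u|.
\end{aligned}$$
This gives linear recurrence with constant $L = D K^2 \min\{|\chi_n(c)| : c \in \mathcal{A}_n,\ n \in \mathbb{N}\}$.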
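The claims about $\chi$ and $\tilde\chi$ made before Lemma 4.43 can be checked mechanically for small $n$. The sketch below has two parts. First, it verifies that every image of a letter under a composition of two of these substitutions contains all three letters. Second, it computes the distances between consecutive occurrences of 20 in $\tilde\chi^n \circ \chi(w)$ for the sample word $w = 0120$. The helper names and the sample word are our own choices.

```python
from itertools import product


def apply(chi, word):
    return "".join(chi[c] for c in word)


chi = {"0": "012", "1": "012", "2": "002"}
chi_t = {"0": "021", "1": "121", "2": "012"}  # chi tilde
letters = "012"

# Every image under a composition of two of these substitutions contains all letters.
for s1, s2 in product([chi, chi_t], repeat=2):
    for a in letters:
        assert set(apply(s1, apply(s2, a))) == set(letters)


def occurrences(word, factor):
    return [i for i in range(len(word) - len(factor) + 1) if word.startswith(factor, i)]


def gaps_of_20(n, w="0120"):
    """Set of distances between consecutive occurrences of 20 in chi_tilde^n(chi(w))."""
    word = apply(chi, w)
    for _ in range(n):
        word = apply(chi_t, word)
    pos = occurrences(word, "20")
    return {b - a for a, b in zip(pos, pos[1:])}


for n in range(1, 5):
    print(n, gaps_of_20(n))
```

The set of gaps never shrinks below $3^{n+1}$, while the word 20 has length 2. This is why $D$ in Lemma 4.43 cannot be bounded for S-adic sequences that use $\tilde\chi^n$ with unbounded $n$.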

Verifying that $D$ in Lemma 4.43 is finite is easy in many cases. Durand also gave a general condition that is equivalent to linear recurrence.
Definition 4.44. A substitution χ is called proper if there exist two letters
b, e ∈ A such that for every a ∈ A, χ(a) starts with b and ends with e.
Theorem 4.45. The sequence ρ is produced by a proper primitive S-adic
system if and only if (Xρ , σ) is linearly recurrent.

Proof. For the proof, see [221, Proposition 1.1 of the addendum]. 
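Properness in the sense of Definition 4.44 is a finite check on the images of the letters. The following is a minimal sketch. Its helper name is hypothetical, substitutions are encoded as dictionaries, and all images are assumed to be non-empty.

```python
def is_proper(chi):
    """Definition 4.44: all images chi(a) start with one letter b and end with one letter e."""
    images = list(chi.values())
    return len({w[0] for w in images}) == 1 and len({w[-1] for w in images}) == 1


chi0 = {"0": "0", "1": "10"}       # chi_0 of Example 4.46
chi1 = {"0": "01", "1": "1"}       # chi_1 of Example 4.46
made_up = {"0": "01", "1": "001"}  # a made-up proper substitution (b = 0, e = 1)
print(is_proper(chi0), is_proper(chi1), is_proper(made_up))
```

Neither $\chi_0$ nor $\chi_1$ of Example 4.46 is proper. The substitution `made_up` serves only to show a positive case.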

It follows that primitive S-adic shifts have sublinear word-complexity. Mossé [426] showed that $p(n+1) - p(n)$ is bounded for substitution shifts. Extending this result, Durand showed that $p(n+1) - p(n)$ is bounded for primitive S-adic shifts as well. Sturmian shifts have $p(n+1) - p(n) \equiv 1$; see Definition 4.60. They are indeed S-adic, as explained in Section 4.3.5 and in our next Example 4.46. More generally, the symbolic itinerary space coming from a $d$-interval exchange transformation has $p(n+1) - p(n) \equiv d - 1$; see Section 4.4.

Host conjectured that $p(n+1) - p(n)$ is bounded for a subshift $(X, \sigma)$ if and only if $X$ is S-adic. Durand's result gives the "if" part, but the "only if" part was disproved by Ferenczi [244]. Cassaigne [146] showed that every sequence in $\{0, \dots, d-1\}^{\mathbb{N}}$, no matter what its word-complexity is, can be written as an S-adic transformation on the alphabet $\{0, \dots, d-1\}$. This is a somewhat stronger form of the S-adic complexity conjecture. See also [228], which asks under what additional conditions a minimal subshift $(X, \sigma)$ is S-adic if and only if its word-complexity $p(n)$ is sublinear.

Example 4.46. Sturmian shifts can be represented as S-adic shifts; see Section 4.3.5 for details. Consider the substitutions
$$\chi_0 : \begin{cases} 0 \to 0, \\ 1 \to 10, \end{cases} \qquad \text{and} \qquad \chi_1 : \begin{cases} 0 \to 01, \\ 1 \to 1; \end{cases}$$
see (4.31). By themselves they are not primitive, and neither are their powers $\chi_0^a$ and $\chi_1^a$. However,
$$\chi_1 \circ \chi_0^a : \begin{cases} 0 \to \chi_1(0) = 01, \\ 1 \to \chi_1(10^a) = 1(01)^a, \end{cases} \qquad \text{and} \qquad \chi_0 \circ \chi_1^a : \begin{cases} 0 \to \chi_0(01^a) = 0(10)^a, \\ 1 \to \chi_0(1) = 10 \end{cases}$$
are primitive. Consider the limit sequence $\rho = \lim_n \chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_1^{a_n}(0)$, in which $\chi_0$ and $\chi_1$ alternate. It is linearly recurrent if and only if $(a_n)_{n\in\mathbb{N}}$ is a bounded sequence. All Sturmian sequences can be found this way, where the corresponding frequency $\alpha$ has continued fraction expansion $\alpha = [0; a_1, a_2, \dots]$. Consequently, Sturmian sequences are linearly recurrent if and only if $\alpha$ is of bounded type; see Durand [221, Proposition 10 and Proposition 5.1 of the addendum]. Note that $\{\chi_0, \chi_1\}$ is not a collection of proper substitutions; a proper S-adic representation of Sturmian sequences was given in [179].
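The images of $\chi_1 \circ \chi_0^a$ and $\chi_0 \circ \chi_1^a$ displayed above can be confirmed for small $a$ by direct computation. The following Python sketch asserts the four formulas for $1 \leq a \leq 5$. The helper names are our own.

```python
def apply(chi, word):
    return "".join(chi[c] for c in word)


def power(chi, a, word):
    """Apply chi a times to word."""
    for _ in range(a):
        word = apply(chi, word)
    return word


chi0 = {"0": "0", "1": "10"}
chi1 = {"0": "01", "1": "1"}

for a in range(1, 6):
    # chi_1 o chi_0^a
    assert apply(chi1, power(chi0, a, "0")) == "01"
    assert apply(chi1, power(chi0, a, "1")) == "1" + "01" * a
    # chi_0 o chi_1^a
    assert apply(chi0, power(chi1, a, "0")) == "0" + "10" * a
    assert apply(chi0, power(chi1, a, "1")) == "10"
print("all four formulas hold for a = 1, ..., 5")
```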
Well before Durand's work, Mignosi [418] showed (see also [20, Theorem 10.6.1]) that Sturmian sequences are $k$-power-free for some $k \in \mathbb{N}$ if and only if the corresponding frequency $\alpha$ has a continued fraction expansion of bounded type. One direction is easy. If $\alpha$ is of unbounded type, say $a_{r_k} > k$ for arbitrary $k$, then (4.34) shows the occurrence of a $k$-power $\chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_0^{a_{r_k}}(b)$. Here $b = 0$ or $1$, depending on whether $r_k$ is even or odd. Mignosi's proof shows that there are no unexpected $k$-powers for $k > \sup_n a_n$.

In [70], many parts of the structure of Pisot substitutions and Rauzy

fractals are recovered for classes of S-adic shifts. A specific example of this,
presented in [34], is when the associated matrices of all the substitutions in
the collection are the same. Another example is provided by the so-called
Arnoux-Rauzy substitutions:
Example 4.47. In [35], Arnoux & Rauzy proposed a generalization of Example 4.46 to describe translations on higher dimensional tori $\mathbb{T}^{d-1}$ and interval exchange transformations on $d$ intervals. These are the Arnoux-Rauzy substitutions on the alphabet $\{0, 1, \dots, d-1\}$:
$$\alpha_i : \begin{cases} i \to i, \\ j \to ji & \text{for } j \neq i. \end{cases} \tag{4.21}$$
Itineraries of IETs and torus rotations with respect to a natural partition can be written as
$$\lim_{n \to \infty} \alpha_{i(1)} \circ \alpha_{i(2)} \circ \cdots \circ \alpha_{i(n)}(0), \tag{4.22}$$
or with shifts $\sigma$ interspersed, as discussed in Section 4.3.5 for Sturmian shifts.

Arnoux & Rauzy also showed that for three-letter alphabets, every minimal
sequence with complexity p(n) = 2n+1 can be written as in (4.22). However,
the conjecture that every sequence produced by (4.22) is the itinerary for a
point in such a dynamical system was disproved in [149], by constructing
unbalanced Arnoux-Rauzy sequences. Note that itineraries of torus rotations
and IETs have to be balanced. There are even weakly mixing Arnoux-Rauzy
sequences; see [148]. Positive results on large classes of Arnoux-Rauzy se-
quences were achieved in [65]. In [70, Theorems 3.7 and 3.8] it is shown that
typical (in some sense) Arnoux-Rauzy sequences on three letters do corre-
spond to itineraries of torus translations and also that linearly recurrent
Arnoux-Rauzy shifts have pure point spectra; see Section 6.8.3.
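The limit (4.22) can be computed directly for the periodic directive sequence $i(1)i(2)i(3)\cdots = 012012\cdots$. This choice of directive sequence is our own. The sketch below builds $\alpha_0 \circ \alpha_1 \circ \cdots \circ \alpha_2(0)$ from 18 substitutions and counts factors. Every letter occurs infinitely often in this directive sequence, so for the resulting Arnoux-Rauzy word one expects the complexity $p(n) = 2n + 1$ on three letters. The helper names are hypothetical.

```python
def apply(chi, word):
    return "".join(chi[c] for c in word)


def arnoux_rauzy(i, letters="012"):
    """The substitution alpha_i of (4.21): i -> i and j -> j i for j != i."""
    return {j: j if j == i else j + i for j in letters}


def ar_prefix(directive, letter="0"):
    """alpha_{i(1)} o ... o alpha_{i(n)}(letter); alpha_{i(n)} is applied first."""
    word = letter
    for i in reversed(directive):
        word = apply(arnoux_rauzy(i), word)
    return word


def complexity(word, n):
    """Number of distinct factors of length n of word."""
    return len({word[k:k + n] for k in range(len(word) - n + 1)})


w = ar_prefix("012" * 6)
print(w[:20], len(w))
print([complexity(w, n) for n in range(1, 9)])
```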

Further results on S-adic shifts pertain to recognizability, e.g. Theorems

5.1 and 5.1 of [71], where it was also proved that a recognizable S-adic shift
has finite rank; cf. [71, Corollary 6.7].

4.3. Sturmian Subshifts

Sturmian sequences emerge as symbolic dynamics of circle rotations or similar systems. There are several textbook sources on the properties of Sturmian sequences, e.g. [85, Chapter 1], [249, Section 6], and [20, Section 5.10]. There are at least three equivalent defining properties, to which we will devote separate sections.

Figure 4.6. Sturmian sequences produced as intersections with horizontal and vertical grid-lines (left) and billiards on a rectangular billiard table (right).
The name Sturmian was given by Morse & Hedlund [425], seemingly
because these sequences appear in connection with the work of the French
mathematician Jacques Sturm (1803–1855) on the number of zeroes that
$\sin((\alpha x + \beta)\pi)$ has in the interval $[n, n+1)$, but the sequences as such were
certainly not studied by Sturm. There are multiple other ways to obtain
Sturmian sequences. For instance, take a piece of paper with a square grid,
draw a line on it with slope α, and write a 0 whenever it crosses a horizontal
grid-line and a 1 whenever it crosses a vertical grid-line (see Figure 4.6, left).
Then we obtain a Sturmian sequence. Also, the trajectory of a billiard ball
moving frictionless on a rectangular billiard table can be coded symbolically
by writing a 0 for each collision with a long edge and a 1 for each collision
with a short edge (see Figure 4.6, right). If the motion is not periodic, then
the resulting sequence is Sturmian.
Equivalently, Sturmian sequences can be obtained as the difference sequence $b_{n+1} - b_n$ of a Beatty sequence $b_n = \lfloor \alpha n \rfloor$ for some irrational number $\alpha \in (0, 1)$. For irrational $\alpha > 1$, we would obtain Sturmian sequences on a larger alphabet $\{0, 1, \dots, \lceil \alpha \rceil\}$, but we will not address these in this text.

4.3.1. Rotational Sequences.

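Definition 4.48. Let $R_\alpha : \mathbb{S}^1 \to \mathbb{S}^1$, $x \mapsto x + \alpha \bmod 1$, be the rotation over an irrational angle $\alpha$. Let $\beta \in \mathbb{S}^1$ and build the itinerary $i(\beta) = u = (u_n)_{n \geq 0}$ by
$$u_n = \begin{cases} 1 & \text{if } R_\alpha^n(\beta) \in [0, \alpha), \\ 0 & \text{if } R_\alpha^n(\beta) \notin [0, \alpha). \end{cases} \tag{4.23}$$
Then $u$ is called a rotational sequence.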
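For a concrete rotational sequence, take the golden rotation number $\alpha = (\sqrt5 - 1)/2$ and $\beta = 0$; both are our own choices. Floating-point evaluation of $n\alpha \bmod 1$ can misclassify orbit points that lie very close to $0$ or $\alpha$. The sketch below therefore computes $u_n$ exactly as $u_n = \lfloor n\alpha \rfloor - \lfloor (n-1)\alpha \rfloor$, which equals 1 precisely when $R_\alpha^n(0) \in [0, \alpha)$. The floor is obtained from the integer square root $\lfloor n\sqrt5 \rfloor = \mathrm{isqrt}(5n^2)$. The helper names are hypothetical.

```python
from math import isqrt


def floor_n_alpha(n: int) -> int:
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and integer n >= 0."""
    return (isqrt(5 * n * n) - n) // 2   # isqrt(5 n^2) = floor(n * sqrt(5))


def rotational_word(length: int) -> str:
    """u_1 ... u_length of the rotational sequence (4.23) with beta = 0."""
    return "".join(str(floor_n_alpha(n) - floor_n_alpha(n - 1)) for n in range(1, length + 1))


u = rotational_word(30)
print(u)
```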

Remark 4.49. The additional sequences obtained by taking the closure can also be obtained by taking the half-open interval the other way around:
$$u_n = \begin{cases} 1 & \text{if } R_\alpha^n(x) \in (0, \alpha], \\ 0 & \text{if } R_\alpha^n(x) \notin (0, \alpha]. \end{cases}$$
In either case, the resulting two-sided subshift $(X_\alpha, \sigma)$ for $X_\alpha = \overline{\mathrm{orb}_\sigma(u)}$ is an extension of $(\mathbb{S}^1, R_\alpha)$, where $i : \mathbb{S}^1 \to X_\alpha$ is the inverse $i = \psi^{-1}$ of the factor map $\psi$. The points $x_n = R_\alpha^n(0)$, $n \in \mathbb{Z}$, have fibers $\psi^{-1}(x_n)$ consisting of two points, whereas $\#\psi^{-1}(x) = 1$ for all other $x$. Thus $(X_\alpha, \sigma)$ is an almost one-to-one extension of the circle rotation; see Section 2.3.1.

Lemma 4.50. Every rotational word u is palindromic: it contains palin-

dromes of arbitrary length.

Remark 4.51. A minimal palindromic shift $(X, \sigma)$ is also mirror invariant. This means that if $w_1 w_2 \cdots w_n \in \mathcal{L}(X)$, then also $w_n w_{n-1} \cdots w_1 \in \mathcal{L}(X)$. It is an open question (posed by Surer) for which shifts mirror invariance implies that the shift is palindromic. This question looks especially interesting for substitution shifts.

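Proof. Let $\beta := \alpha/2$. By symmetry, the two-sided itinerary of $\beta$ is itself a palindrome: $u_n = u_{-n}$ for all $n \in \mathbb{Z}$. Indeed, the reflection $x \mapsto \alpha - x$ maps $R_\alpha^n(\beta)$ to $R_\alpha^{-n}(\beta)$ and $[0, \alpha)$ to $(0, \alpha]$, and the orbit of $\beta$ never hits $0$ or $\alpha$. Since $\{k\alpha + \beta \bmod 1\}_k$ is dense in the circle and uniformly recurrent, every subword $w_1 w_2 \cdots w_n$ in every itinerary will have its reversed copy $w_n w_{n-1} \cdots w_1$ in the same itinerary.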
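Lemma 4.50 can be observed on a finite prefix of the golden rotational sequence $\alpha = (\sqrt5 - 1)/2$, $\beta = 0$. The prefix length 3000 is an arbitrary choice of ours. The sketch lists the palindromic factors of each length. For Sturmian words one expects two palindromes of each odd length and one of each even length. This is a known property of Sturmian words that is not proved in this text.

```python
from math import isqrt


def floor_n_alpha(n):
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and n >= 0."""
    return (isqrt(5 * n * n) - n) // 2


u = "".join(str(floor_n_alpha(n) - floor_n_alpha(n - 1)) for n in range(1, 3001))


def palindromic_factors(word, n):
    """All palindromes of length n occurring in word."""
    return {word[k:k + n] for k in range(len(word) - n + 1)
            if word[k:k + n] == word[k:k + n][::-1]}


for n in range(1, 11):
    print(n, sorted(palindromic_factors(u, n)))
```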

Lemma 4.52. If w is a bi-special subword of a rotational sequence, then

it coincides with a prefix of the one-sided itinerary i(2α mod 1) of length
qn + aqn+1 − 2 for some n ∈ N and 0 ≤ a < an+1 , where pn /qn are the
convergents of the continued fraction expansion α = [0; a1 , a2 , a3 , . . . ] (see
Section 8.2).

Proof. Each subword $w$ corresponds to a subinterval $J_w$ of the circle, namely the interval of points $x$ such that $i(x)$ starts with $w$. If $w$ is left-special, so $0w$ and $1w$ are both allowed, then $R_\alpha^{-1}(J_w)$ contains $0$ or $\alpha$ in its interior. In the former case, $\alpha \in J_w^\circ$, so not all $x \in J_w$ would have the same first letter in their itinerary. This is impossible. Therefore $\alpha \in R_\alpha^{-1}(J_w^\circ)$, i.e. $R_\alpha^2(0) \in J_w^\circ$.

Let $\hat J_w := R_\alpha^{-2}(J_w^\circ) \ni 0$. Now if $w$ is also right-special, then $R_\alpha^{|w|+2}(\hat J_w^\circ) = R_\alpha^{|w|}(J_w^\circ) \ni 0$, and therefore $y := R_\alpha^{-(|w|+2)}(0) \in \hat J_w^\circ$. This means that $y$ is a preimage of $0$ such that no preimage of $0$ of lower order belongs to $(0, y)$. The points $y$ with this property are ordered as in Figure 4.7, where the numbers $j$ refer to the points $R_\alpha^{-j}(0)$. Therefore $|w| + 2 = q_n + a q_{n+1}$, and the lemma follows.

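Figure 4.7. Positions of the preimages of 0 under $R_\alpha$ that are closest to 0. A label $j$ marks the point $R_\alpha^{-j}(0)$. From left to right the labels read $q_{n-1}$, $q_n + q_{n-1}$, $2q_n + q_{n-1} = q_{n+1}$, $0$, $q_{n+2}$, $q_{n+1} + q_n$, $q_n$.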
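Take $\alpha = (\sqrt5 - 1)/2 = [0; 1, 1, 1, \dots]$. Then $a_{n+1} = 1$, so Lemma 4.52 only allows $a = 0$ and predicts bi-special words of lengths $q_n - 2$, i.e. $0, 1, 3, 6, 11, 19, 32, \dots$. The sketch below is our own illustration of this prediction. It detects left- and right-special factors in a long prefix of the golden rotational sequence with $\beta = 0$ and reports the lengths of the bi-special ones.

```python
from math import isqrt


def floor_n_alpha(n):
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and n >= 0."""
    return (isqrt(5 * n * n) - n) // 2


u = "".join(str(floor_n_alpha(n) - floor_n_alpha(n - 1)) for n in range(1, 5001))


def factors(word, n):
    return {word[k:k + n] for k in range(len(word) - n + 1)}


def bispecial_lengths(word, max_len):
    """Lengths n <= max_len of factors w with 0w, 1w, w0, w1 all occurring in word."""
    lengths = []
    for n in range(max_len + 1):
        longer = factors(word, n + 1)
        for w in factors(word, n):
            left = ("0" + w in longer) and ("1" + w in longer)
            right = (w + "0" in longer) and (w + "1" in longer)
            if left and right:
                lengths.append(n)
    return lengths


print(bispecial_lengths(u, 40))
```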

Exercise 4.53. Show that every bi-special word of a rotational sequence (so
Sturmian sequence by Lemma 4.63) is a palindrome.

We discuss the work of Denjoy [193] on circle homeomorphisms, specifi-

cally Denjoy circle maps with only one orbit of maximal wandering intervals.
They have minimal sets that are exactly conjugate to Sturmian shifts.
Theorem 4.54 (Denjoy). The rotation number
$$\rho(f) = \lim_{n \to \infty} \frac{F^n(x) - x}{n} \bmod 1$$
of a circle homeomorphism $f : \mathbb{S}^1 \to \mathbb{S}^1$ exists independently of $x$ (and the convergence is uniform). Here $F : \mathbb{R} \to \mathbb{R}$ is a lift of $f$, i.e. a continuous map of the universal cover $\mathbb{R}$ of $\mathbb{S}^1$ such that $F(x) \bmod 1 = f(x \bmod 1)$.
• $\rho(f) = \frac{p}{q} \in \mathbb{Q}$ (in lowest terms) if and only if $f$ has a $q$-periodic point;
• if $\rho = \rho(f) \notin \mathbb{Q}$, then $f$ is semi-conjugate to the rotation $R_\rho$: $h \circ f = R_\rho \circ h$. In fact, $h$ is a conjugacy if and only if $f$ is minimal.
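The sketch below shows the definition of $\rho(f)$ at work. It estimates $(F^n(x) - x)/n$ for the lift $F(x) = x + \omega + \frac{K}{2\pi}\sin(2\pi x)$ of the Arnold circle map, which is an orientation-preserving circle homeomorphism for $0 \leq K < 1$. This family is our own choice of example and is not discussed in the text. The estimate uses a fixed finite $n$, so it only approximates the limit.

```python
import math


def lift(x, omega, K):
    """Lift F of the Arnold circle map; F(x + 1) = F(x) + 1, increasing for 0 <= K < 1."""
    return x + omega + K / (2 * math.pi) * math.sin(2 * math.pi * x)


def rotation_number(omega, K, x0=0.0, n=10_000):
    """Finite-n estimate of (F^n(x0) - x0) / n; rho(f) is the limit of this, mod 1."""
    x = x0
    for _ in range(n):
        x = lift(x, omega, K)
    return (x - x0) / n


print(rotation_number(0.3, 0.0))             # the rigid rotation R_0.3
print(rotation_number(0.3, 0.9, x0=0.0), rotation_number(0.3, 0.9, x0=0.37))
```

For $K = 0$ the map is the rotation $R_\omega$, and the estimate returns $\omega$. For $K > 0$, estimates from different starting points agree up to $1/n$. This is because $|(F^n(x) - x) - (F^n(y) - y)| < 1$ for the lift of a circle homeomorphism.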

For the proof we refer to [414, Chapter I, Theorem 2.1], but let us give some details on how non-minimal circle homeomorphisms $f$ with irrational rotation numbers can be constructed. Start with the rotation $R_\rho : \mathbb{S}^1 \to \mathbb{S}^1$ and select some $x_1 \in \mathbb{S}^1$ (or any finite or countable set of points $x_k \in \mathbb{S}^1$ having disjoint orbits under $R_\rho$). For each $k$ and $n \in \mathbb{Z}$, replace $R_\rho^n(x_k)$ by a closed interval $I_{k,n}$ of length $2^{-(k+|n|)}$. This creates a new circle $K$ with circumference $1 + \sum_k \sum_{n \in \mathbb{Z}} 2^{-(k+|n|)} = 1 + 3 \sum_k 2^{-k}$. Define $f : I_{k,n} \to I_{k,n+1}$ as an affine (or any orientation-preserving) homeomorphism, and set $f(x) = R_\rho(x)$ for all $x \in \mathbb{S}^1 \setminus \bigcup_{k,n} R_\rho^n(x_k)$. Then $f : K \to K$ is indeed a homeomorphism, and $h : K \to \mathbb{S}^1$,
$$h(x) = \begin{cases} R_\rho^n(x_k) & \text{if } x \in I_{k,n}, \\ x & \text{otherwise}, \end{cases} \tag{4.24}$$
is a semi-conjugacy; see Figure 4.8. Such circle homeomorphisms $f$ are called Denjoy circle maps. There is some restriction on how smooth such homeomorphisms can be. Denjoy proved that if $f$ is a $C^1$ diffeomorphism such that $\log f'$ has bounded variation¹⁰, then $f$ is minimal. On the other hand, for every $\gamma \in [0, 1)$, there are $C^{1+\gamma}$ Denjoy circle maps; see [309].

Figure 4.8. The semi-conjugacy $h$ from a Denjoy circle map $f : K \to K$ to the rotation $R_\rho : \mathbb{S}^1 \to \mathbb{S}^1$. Each interval $I_{k,n}$ (shown for $n = -2, \dots, 2$) is collapsed to the point $h(I_{k,n})$.

Take $R_\rho$ and split open the orbit of 0, replacing the points $R_\rho^n(0)$ by intervals $I_n$. Denote the corresponding Denjoy circle map by $f : K \to K$. Then $X := K \setminus \bigcup_n I_n^\circ$ is a minimal Cantor set. Code $[\sup I_0, \inf I_1] \cap X$ by 1 and $[\sup I_1, \inf I_0] \cap X$ by 0. The resulting coding map $i : X \to \{0, 1\}^{\mathbb{Z}}$ is precisely a conjugacy between $(X, f)$ and a two-sided rotational shift $X_\rho$ with frequency $\rho = \rho(f)$.
If we split open $\mathbb{S}^1$ only along the backward orbit of 0, then the map $f$ is not invertible at $\rho$, and we obtain a one-sided rotational shift.
Remark 4.55. In this construction, we have split open only a single orbit,
and this leads to a rotational shift. It is of course possible to split open the
circle at several orbits. This still leads to an almost one-to-one extension of
the circle map, but no longer to a rotational shift of Definition 4.48. The
following result on amorphic complexity holds for these more general Denjoy
examples, and the proof given works in this generality. An easier proof for
rotational shifts is given in [288].
Theorem 4.56. The amorphic complexity of any non-periodic two-sided ro-
tational subshift (Xρ , σ) is 1. Equivalently, ac(f ) = 1 for any Denjoy circle
map f : K → K.

Proof. The two-sided shift $\sigma : X_\rho \to X_\rho$ is conjugate to $f : C \to C$ for $C = K \setminus \bigcup_{k,n} I_{k,n}^\circ$. Therefore it suffices to show that $\mathrm{ac}(f|_C) = 1$.
Take three points $\xi_1, \xi_2, \xi_3 \in \bigcup_{k,n} I_{k,n}$ such that $d(h(\xi_i), h(\xi_j)) \geq \frac14$ for $i \neq j$. Let $\delta := \min\{|I_{k,n}| : I_{k,n} \ni \xi_j \text{ for some } j\}$ be the minimal length of the intervals corresponding to the $\xi_i$'s.
Since $h(\bigcup_{k,n} I_{k,n})$ is a countable set, we can take $N := \lfloor 1/v \rfloor$ points $x_i \in C$ such that $S := \{h(x_i) : i = 1, \dots, N\}$ is an equidistant lattice in $\mathbb{S}^1$ with minimal mutual distance $1/N$. Set $J = [x_i, x_j]$ for some $i \neq j$, ordered in such a way that $|h(J)| < \frac12$. Whenever $R_\rho^n(h(J)) \ni h(\xi_1)$, we have $|f^n(J)| \geq \delta$. On the other hand, $\mathbb{S}^1 \setminus R_\rho^n(h(J))$ has length $\geq \frac12$, so it must contain $h(\xi_2)$ and/or $h(\xi_3)$. Therefore also $|K \setminus f^n(J)| \geq \delta$, and thus $d(f^n(x_i), f^n(x_j)) \geq \delta$. We have
$$\lim_{n \to \infty} \frac1n \#\{0 \leq k < n : R_\rho^k(h(J)) \ni h(\xi_1)\} = \mathrm{Leb}(h(J)) \geq \frac1N \geq v.$$
From this we obtain $\limsup_n \frac1n \#\{0 \leq k < n : d(f^k(x_i), f^k(x_j)) \geq \delta\} \geq v$, so $S$ is $(\delta, v)$-separated. We have $\#S \geq \frac1v - 1$ and therefore $\mathrm{ac}(f) \geq 1$.

10 For the definition of variation, see before Theorem 8.42.
Now for the other direction, we will use $(\delta, v)$-spanning sets; see Remark 2.56. For $v \in (0, 1]$, define a function $\psi_v : \mathbb{S}^1 \to [0, |K|]$, where $|K|$ is the circumference of $K$, as
$$\psi_v(x) = \mathrm{Leb}(h^{-1}([x, x+v])).$$
Note that $d(x, y) \leq \mathrm{Leb}(h^{-1}([h(x), h(y)]))$, because $d(x, y)$ measures the shortest arc between $x$ and $y$. Also $\psi_v(x) \geq \mathrm{diam}(h^{-1}([x, x+v]))$ for all sufficiently small $v$ and all $x$ outside the countable set $h(\bigcup_{k,n} I_{k,n})$. Therefore $\psi_v$ is an $L^1$-function. The Birkhoff Ergodic Theorem 6.13 implies that for Leb-a.e. $y \in \mathbb{S}^1$,
$$\lim_{n \to \infty} \frac1n \#\{0 \leq k < n : \psi_v(R_\rho^k(y)) \geq \delta|K|\} = \mathrm{Leb}(\{\psi_v \geq \delta|K|\}). \tag{4.25}$$
We claim that $m_v := \mathrm{Leb}(\{\psi_v \geq \delta|K|\}) \leq 2v(1/\delta + 1)$. Suppose instead that $m_v > 2v(1/\delta + 1)$. Then the set $\{\psi_v \geq \delta|K|\}$ cannot be contained in the union of at most $\lfloor 1/\delta \rfloor + 2$ intervals of length $v$. Therefore there are $\tilde N = \lfloor 1/\delta \rfloor + 2$ points $\xi_i \in \mathbb{S}^1$ such that $\psi_v(\xi_i) \geq \delta|K|$ and $d(\xi_i, \xi_j) \geq v$ for $i \neq j$. It follows that
$$\sum_{i=1}^{\tilde N} \mathrm{Leb}(h^{-1}([\xi_i, \xi_i + v])) = \sum_{i=1}^{\tilde N} \psi_v(\xi_i) \geq \tilde N \delta |K| \geq (1 + \delta)|K|.$$
This contradicts the fact that the sets $h^{-1}([\xi_i, \xi_i + v])$ are $\tilde N$ disjoint intervals inside a circle of circumference $|K|$, which proves the claim.
Hence we can find a set $S = \{y_1, y_2, \dots, y_N\}$ for $N = \lfloor 1/v \rfloor$ such that $h(S)$ is an equidistant lattice on $\mathbb{S}^1$ (with minimal mutual distance $1/N$) and (4.25) holds for every $h(y_i)$. Without loss of generality, the $y_i$'s can be arranged in circular order on $K$.

Now take $y \in K$ arbitrary and $i$ such that $y \in [y_i, y_{i+1 \bmod N})$. Then $h(y) \in [h(y_i), h(y_i) + v)$ and $d(f^k(y_i), f^k(y)) \leq \psi_v(R_\rho^k(h(y_i)))$. Therefore
$$\limsup_{n \to \infty} \frac1n \#\{0 \leq k < n : d(f^k(y_i), f^k(y)) \geq \delta\} \leq \limsup_{n \to \infty} \frac1n \#\{0 \leq k < n : \psi_v(R_\rho^k(h(y_i))) \geq \delta\} = m_v,$$
which means that $S$ is $(\delta, m_v)$-spanning. Using the spanning set equivalent of (2.8), we obtain
$$\mathrm{ac}(f) \leq \sup_{\delta|K| > 0} \limsup_{v \to 0} \frac{-\log v}{-\log(2v(1/\delta + 1))} = 1,$$
and the result follows.

4.3.2. Balanced Words. Another characterization of Sturmian words is by means of their property of being balanced. Recall from Definition 4.36 that a language $\mathcal{L} \subset \mathcal{A}^*$ is $R$-balanced if for all $a \in \mathcal{A}$, $n \in \mathbb{N}$, and $v, w \in \mathcal{L}_n$, the numbers $|v|_a$ and $|w|_a$ of letters $a$ in $v$ and $w$ differ by at most $R$. Here we are only interested in the case $R = 1$, so balanced will mean 1-balanced in this section.
Definition 4.57. Clearly, a balanced word $x$ contains precisely one of 00 and 11 as factors, unless $x = 10101010\cdots$ or $x = 01010101\cdots$. We say that a balanced word $x \in \{0,1\}^{\mathbb{N}}$ or $\{0,1\}^{\mathbb{Z}}$ is of type $i$ if the word $ii$ appears in $x$.
Lemma 4.58. Every rotational sequence is balanced.

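Proof. An equivalent way to define a rotational sequence $u$ is that there is a fixed $\beta \in \mathbb{S}^1$ such that
$$u_n = \lfloor n\alpha + \beta \rfloor - \lfloor (n-1)\alpha + \beta \rfloor \tag{4.26}$$
for all $n \in \mathbb{Z}$. This is easy to check. To include the sequences mentioned in Remark 4.49, we also need the alternative definition
$$u_n = \lceil n\alpha + \beta \rceil - \lceil (n-1)\alpha + \beta \rceil \tag{4.27}$$
for all $n \in \mathbb{Z}$. By telescoping series,
$$\begin{aligned}
|u_{k+1} \cdots u_{k+n}|_1 &= \lfloor (k+1)\alpha + \beta \rfloor - \lfloor k\alpha + \beta \rfloor \\
&\quad + \lfloor (k+2)\alpha + \beta \rfloor - \lfloor (k+1)\alpha + \beta \rfloor \\
&\quad + \cdots + \lfloor (k+n)\alpha + \beta \rfloor - \lfloor (k+n-1)\alpha + \beta \rfloor \\
&= \lfloor (k+n)\alpha + \beta \rfloor - \lfloor k\alpha + \beta \rfloor = \lfloor n\alpha \rfloor \text{ or } \lfloor n\alpha \rfloor + 1,
\end{aligned}$$
regardless of what $k$ is. It follows that $u$ is balanced.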
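The telescoping identity and the resulting balance can be tested on a long prefix. This sketch uses $\alpha = (\sqrt5 - 1)/2$ and $\beta = 0$ in (4.26), which are our own choices, with exact integer arithmetic for the floors. The hypothetical helper `ones_in_window(k, n)` is the closed form $\lfloor (k+n)\alpha \rfloor - \lfloor k\alpha \rfloor$ for $|u_{k+1} \cdots u_{k+n}|_1$.

```python
from math import isqrt


def floor_n_alpha(n):
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and n >= 0."""
    return (isqrt(5 * n * n) - n) // 2


def ones_in_window(k, n):
    """|u_{k+1} ... u_{k+n}|_1 via the telescoping sum: floor((k+n)alpha) - floor(k alpha)."""
    return floor_n_alpha(k + n) - floor_n_alpha(k)


# u_1 u_2 ... from (4.26) with beta = 0; u[k] (0-based) is u_{k+1}.
u = "".join(str(floor_n_alpha(m) - floor_n_alpha(m - 1)) for m in range(1, 2001))


def imbalance(word, n):
    """max - min of the number of 1s over all length-n windows of word."""
    counts = [word[k:k + n].count("1") for k in range(len(word) - n + 1)]
    return max(counts) - min(counts)


print(max(imbalance(u, n) for n in range(1, 61)))
```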

Lemma 4.59. If X is an unbalanced subshift on alphabet {0, 1}, then there

is a (possibly empty) word w such that both 0w0, 1w1 ∈ L(X).

Proof. Let $K$ be minimal such that there are $K$-words $u = u_1 \cdots u_K$ and $v = v_1 \cdots v_K \in \mathcal{L}_K(X)$ such that $\big| |u|_1 - |v|_1 \big| \geq 2$. Shortening or expanding $u, v$ by one letter changes $|u|_1 - |v|_1$ by at most 1. Hence the minimality of $K$ implies that $u = 0u_2 \cdots u_{K-1}0$ and $v = 1v_2 \cdots v_{K-1}1$ (or vice versa) and $|u_2 \cdots u_{K-1}|_1 = |v_2 \cdots v_{K-1}|_1$. If $u_2 \cdots u_{K-1} = v_2 \cdots v_{K-1}$, then we have found our word $w$. Otherwise, take $k = \min\{j > 1 : u_j \neq v_j\}$ and $l = \max\{j < K : u_j \neq v_j\}$. For the letters at positions $1, k, l, K$ we have four¹¹ possibilities:
$$\begin{array}{ll}
u = 0 \cdots 1 \cdots 1 \cdots 0, & v = 1 \cdots 0 \cdots 0 \cdots 1, \\
u = 0 \cdots 1 \cdots 0 \cdots 0, & v = 1 \cdots 0 \cdots 1 \cdots 1, \\
u = 0 \cdots 0 \cdots 1 \cdots 0, & v = 1 \cdots 1 \cdots 0 \cdots 1, \\
u = 0 \cdots 0 \cdots 0 \cdots 0, & v = 1 \cdots 1 \cdots 1 \cdots 1.
\end{array}$$
In each case, suitable corresponding subwords of $u$ and $v$ form a shorter pair with the same property. This contradicts the minimality of $K$. The proof is complete, but note that we have proved that $|w| \leq K - 2$ as well.

4.3.3. Sturmian Sequences.

Definition 4.60. A sequence $u \in \{0,1\}^{\mathbb{N}}$ or $\{0,1\}^{\mathbb{Z}}$ is called Sturmian if it is recurrent under the shift $\sigma$ and the number of different words of length $n$ in $u$ equals $p_u(n) = n + 1$ for each $n \geq 0$. The subshift $(X, \sigma)$ for the shift-orbit closure $X = \overline{\mathrm{orb}_\sigma(u)}$ is called a Sturmian subshift.
Remark 4.61. The assumption that u is recurrent is important for the two-
sided case. Also · · · 00000100000 · · · has p(n) = n + 1, but we don’t want
to consider such asymptotically periodic sequences. For one-sided infinite
words, the recurrence follows from the assumption that pu (n) = n + 1 for all
n ∈ N.
Remark 4.62. A Sturmian sequence contains exactly one left-special and
one right-special word of length n for each n ∈ N. If they coincide, then this
is a bi-special word; see Lemma 4.52.
Lemma 4.63. Every rotational sequence is Sturmian.
11 In the first case, the shorter words u, v are not necessarily of the form u = 0w0 and 1w1

yet, so the whole argument needs repeating.


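Proof. Let $i(x)$ denote the itinerary of $x \in \mathbb{S}^1$ with respect to $\{[0, \alpha), [\alpha, 1)\}$. If $i_k(x) = i_k(y)$ for $0 \leq k < n$, then $R_\alpha^k(x)$ and $R_\alpha^k(y)$ belong to the same set $[0, \alpha)$ or $[\alpha, 1)$ for each $0 \leq k < n$. In other words, the interval $[x, y)$ contains no point in $Q_n := \{R_\alpha^{-k}(\alpha) : 0 \leq k \leq n\}$. Note that $Q_n$ contains all preimages $R_\alpha^{-k}(0) = R_\alpha^{-(k+1)}(\alpha)$ and $R_\alpha^{-k}(\alpha)$, $0 \leq k < n$, of the partition points. Since $\alpha$ is irrational, $Q_n$ consists of exactly $n + 1$ points, and it divides the circle into $n + 1$ intervals. Each such interval corresponds to a unique word of length $n$ in the language, so $p(n) = n + 1$.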
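Lemma 4.63 can be checked on a finite prefix of the golden rotational sequence. As our own illustration, we use the exact floor formula (4.26) with $\alpha = (\sqrt5 - 1)/2$ and $\beta = 0$. Every factor of the prefix is a factor of $u$, and a prefix of length 5000 contains all short factors of this linearly recurrent sequence. So the counts below should equal $n + 1$.

```python
from math import isqrt


def floor_n_alpha(n):
    """Exact floor(n * alpha) for alpha = (sqrt(5) - 1) / 2 and n >= 0."""
    return (isqrt(5 * n * n) - n) // 2


u = "".join(str(floor_n_alpha(m) - floor_n_alpha(m - 1)) for m in range(1, 5001))


def complexity(word, n):
    """Number of distinct factors of length n of word."""
    return len({word[k:k + n] for k in range(len(word) - n + 1)})


print([complexity(u, n) for n in range(1, 16)])
```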
Example 4.64. This lemma depends crucially on the partition of S1 into
intervals [0, α) and [α, 1). If we take the intervals [0, γ) and [γ, 1) for some γ
rationally independent of α ∈ [0, 1] \ Q instead, then p(n) = 2n for all n ≥ 1.
Exercise 4.65. Given N ∈ N, find a subshift with complexity p(n) = 2n
for n ≤ N and p(n) = N + n for n ≥ N .
Theorem 4.66. A non-periodic sequence is Sturmian if and only if it is balanced.

Proof. Let x ∈ AN or AZ for A = {0, 1}.

⇐: We prove by contrapositive, so assume that there is a minimal N ∈ N
such that p(N ) ≥ N + 2. (Recall from Proposition 1.12 that p(N ) ≤ N
implies that x is periodic.) Since p(1) = #A = 2 and 00 and 11 cannot
both be words of x (otherwise it wouldn’t be balanced at word-length 2),
N ≥ 3. For every n < N − 1, there is one right-special word, but there are
two distinct right-special words, say u and v, of length N − 1. In particular,
u and v can only differ at their first symbol, because otherwise there are
two distinct right-special words of length N − 2. Hence there is w such that
0w = u and 1w = v. But since u and v are right-special, 0w0 and 1w1 are
both words in x, and x cannot be balanced.
⇒: Again, proof by contrapositive, so assume that p(n) = n + 1 for all
n ∈ N, but x is not balanced. Let N be the minimal integer where this
unbalance becomes apparent. We have p(2) = 3. Since both 01 and 10 occur
in x (otherwise it would end in 0∞ or 1∞ ) at least one of 00 and 11 cannot
occur in x, and hence N ≥ 3.
By Lemma 4.59, there is a word w = w1 · · · wN −2 such that both 0w0
and 1w1 occur in x.
Observe that w1 = wN −2 , because otherwise both 00 and 11 occur in x.
To be definite, suppose that w1 = wN −2 = 0.
If N = 3, then w1 = wN −2 , so w is a palindrome. If N ≥ 4, then
w2 = wN −3 because otherwise 000 and 101 both occur in x, contradicting
that N is the minimal length where the unbalance becomes apparent.
Continuing this way, we conclude that w is a palindrome: wk = wN −k−1
for all 1 ≤ k ≤ N − 2.
Since $p(N-2) = N - 1$ and $w$ is bi-special, exactly one of $0w$ and $1w$ is right-special. Say $0w0$, $0w1$, and $1w1$ occur, but not $1w0$.
Claim: If $1w1$ is a prefix of the $(2N-2)$-word $x_{j+1} \cdots x_{j+2N-2}$, then $0w$ does not occur in this word.
Suppose otherwise. Since $|1w1| = N$ and $|0w| = N - 1$, the occurrence of $0w$ must overlap with $1w1$, say starting at entry $k$. Then $w_k \cdots w_{N-2}1 = 0w_1 \cdots w_{N-k-1}$, so $w_k = 0$ and $w_{N-k-1} = 1$. This contradicts that $w$ is a palindrome and proves the claim.
Now $x_{j+1} \cdots x_{j+2N-2}$ contains $N$ words of length $N - 1$, counted with multiplicity, but by the claim none of them is $0w$. Hence one of the remaining $(N-1)$-words must appear twice, and none of these words is right-special. It follows that $x_{j+1} \cdots x_{j+2N-2}$ can only be continued to the right periodically, and $p(n) \leq N$ for all $n$. This contradiction concludes the proof.
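Proposition 4.67. If the infinite non-periodic sequence $u$ is balanced, then
$$\alpha := \lim_{n \to \infty} \frac{|u_1 \cdots u_n|_1}{n}$$
exists and is irrational. We call $\alpha$ the frequency of $u$.

Proof. Define
$$M_n = \min\{|u_{k+1} \cdots u_{k+n}|_1 : k \geq 0\}. \tag{4.28}$$
Since $u$ is balanced, $\max\{|u_{k+1} \cdots u_{k+n}|_1 : k \geq 0\} = M_n + 1$, so $|u_{k+1} \cdots u_{k+n}|_1 = M_n$ or $M_n + 1$ for every $k \in \mathbb{N}$.
For $q, n \in \mathbb{N}$ such that $n > q^2$, we can write $n = kq + r$ for a unique $k \geq q$ and $0 \leq r < q$. We have
$$k M_q \leq M_{kq+r} = M_n \leq k(M_q + 1) + r. \tag{4.29}$$
Dividing by $n$ gives
$$\frac{M_q - 1}{q} \leq \frac{M_n}{n} \leq \frac{M_q}{q} + \frac{2}{q}.$$
As this holds for all $q$ with $q^2 < n$, we conclude that $\{M_n/n\}_{n \in \mathbb{N}}$ is a Cauchy sequence, say with limit $\alpha$. Since $|u_1 \cdots u_n|_1 \in \{M_n, M_n + 1\}$, also $|u_1 \cdots u_n|_1/n \to \alpha$.
Now to prove that $\alpha$ is irrational, assume by contradiction that $\alpha = \frac{p}{q}$. Take $k = 2^m$, $r = 0$, and $n = 2^m q$ in (4.29) for increasing $m \in \mathbb{N}$. This gives
$$\frac{M_q}{q} \leq \frac{M_{2q}}{2q} \leq \frac{M_{4q}}{4q} \leq \cdots \leq \frac{M_{2^m q}}{2^m q} \leq \frac{M_{2^m q} + 1}{2^m q} \leq \cdots \leq \frac{M_{2q} + 1}{2q} \leq \frac{M_q + 1}{q},$$
so $\big\{\frac{M_{2^m q}}{2^m q}\big\}_m$ is increasing and $\big\{\frac{M_{2^m q} + 1}{2^m q}\big\}_m$ is decreasing in $m$. Both converge to $\frac{p}{q}$, so $p = M_q$ or $M_q + 1$.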

If $p = M_q$, then in particular $M_{2^m q} = 2^m M_q$ for all $m \geq 0$. This implies that every $2^m q$-word $w$ with minimal $|w|_1$ is in fact a concatenation $w_1 w_2 \cdots w_{2^m}$ of $q$-words all with $|w_i|_1 = M_q$. Take a subword W :=
wm · · · wn containing all q-words such that wm+1 = wm = wn ; also, since u
is non-periodic, W can be taken non-periodic too. Therefore there exists a
v1 in W with |v1 |1 = Mq + 1; we take the leftmost. Since wm = wm+1 , this
word v overlaps with wm , and we can write W = w1 v1 v2 · · · vn−m wn , where
all vi are q-words, and w1 is a prefix of w1 and wn a suffix of wn such that
w1 wn = w1 . But this means that |W |1 ≥ qMq + 1, a contradiction.
Finally, if $p = M_q + 1$, then we repeat this argument with a concatenation $W = w_m \cdots w_n$ of $q$-words $w_i$ with $|w_i|_1 = M_q + 1$. This completes the proof. ∎
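The proof of Proposition 4.67 can be mirrored numerically. The following Python sketch is only an illustration under stated assumptions. It takes the rotation sequence $u_j = \lfloor (j+1)\alpha \rfloor - \lfloor j\alpha \rfloor$ with $\alpha = \sqrt2 - 1$ as a test case of a balanced non-periodic sequence; this formula, the value of $\alpha$, and the helper names are choices made here and do not come from the text. The sketch counts $|u_{k+1} \cdots u_{k+n}|_1$ over a finite prefix with a sliding window and reports the finite-prefix analogue of $M_n$ from (4.28).

```python
import math

def rotation_word(alpha, length):
    """Assumed test sequence u_j = floor((j+1)*alpha) - floor(j*alpha), j = 0..length-1."""
    return [math.floor((j + 1) * alpha) - math.floor(j * alpha) for j in range(length)]

def window_counts(u, n):
    """Number of 1's in every length-n window of the finite word u (sliding sum)."""
    s = sum(u[:n])
    counts = [s]
    for k in range(len(u) - n):
        s += u[k + n] - u[k]      # drop u[k] and append u[k+n]
        counts.append(s)
    return counts

alpha = math.sqrt(2) - 1
u = rotation_word(alpha, 5000)
for n in (1, 10, 100, 1000):
    counts = window_counts(u, n)
    M_n = min(counts)             # finite-prefix version of (4.28)
    print(n, M_n, max(counts) - M_n, M_n / n)
```

Only finitely many windows are inspected, so the output is evidence rather than proof. The spread $\max - M_n$ in the third column should never exceed 1, and $M_n/n$ should lie within $1/n$ of $\alpha$.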

Lemma 4.68. If $u$ and $u' \in \{0,1\}^{\mathbb{N}}$ or $\{0,1\}^{\mathbb{Z}}$ are balanced words with the same frequency $\alpha$, then $u$ and $u'$ generate the same language.

Proof. From the proof of Proposition 4.67 we know that $\alpha \in (\frac{M_n}{n}, \frac{M_n+1}{n})$ and $\alpha \in (\frac{M'_n}{n}, \frac{M'_n+1}{n})$, where $M_n$ and $M'_n$ are given by (4.28) for $u$ and $u'$, respectively. This implies that $M_n = M'_n$ for all $n \in \mathbb{N}$. For each $n \in \mathbb{N}$, $u$ and $u'$ each have only one right-special word in $L_n(X)$. We first show that these right-special words, say $w$ and $w'$, are the same. Assume by contradiction that there is some minimal $n$ such that $w \neq w'$. Hence there is an $(n-1)$-word $v$ such that $w = 0v$ and $w' = 1v$ (or vice versa). But $v$ is right-special, so all four of $0v0$, $0v1$, $1v0$, and $1v1$ occur in the combined languages. But then $M_{n+1} \leq |0v0|_1 = |v|_1 \leq M'_{n+1} - 1$, a contradiction. By uniform recurrence (of minimal subshifts), every word of length $n$ appears in any sufficiently long word, specifically in every sufficiently long right-special word. But as these right-special words of $u$ and $u'$ are the same, $u$ and $u'$ have the same subwords altogether. ∎

We finish this section by proving the last implication for the three equiva-
lent characterizations of Sturmian sequences, due to Morse & Hedlund [425].

Theorem 4.69. Every Sturmian sequence is rotational.

Proof. Let $u$ be a Sturmian sequence. By Theorem 4.66 it is balanced as well. By Proposition 4.67, $u$ has an irrational frequency $\alpha = \lim_n \frac{1}{n}|u_1 \cdots u_n|_1$, and by Lemma 4.68, every Sturmian sequence with frequency $\alpha$ generates the same language as $u$. It is clear that the rotational sequence $v_n = \lfloor n\alpha \rfloor - \lfloor (n-1)\alpha \rfloor$, i.e. $v = i(0)$ (as in (4.26)), has frequency $\alpha$. Therefore there is a sequence $(b_j)$ such that $\sigma^{b_j}(v) \to u$. By passing to a subsequence if necessary, we can assume that $\lim_j R_\alpha^{b_j}(0) = \beta$. Then (assuming that $n\alpha + \beta \notin \mathbb{Z}$, so we can use continuity of $x \mapsto \lfloor x \rfloor$ at this point),
$$u_n = \lim_{j\to\infty} (\sigma^{b_j} v)_n = \lim_{j\to\infty} \lfloor (n+b_j)\alpha \rfloor - \lfloor (n+b_j-1)\alpha \rfloor = \lfloor n\alpha + \beta \rfloor - \lfloor (n-1)\alpha + \beta \rfloor,$$
so $u = i(\beta)$. If $n\alpha + \beta \in \mathbb{Z}$, then we need to take the definition (4.27) into account. Note, however, that since $\alpha \notin \mathbb{Q}$, this occurs for at most one value of $n \in \mathbb{Z}$. This proves the theorem. ∎

4.3.4. Rauzy Graphs. The Rauzy graph $\Gamma_n$ of a Sturmian subshift $X$ is the word-graph in which the vertices are the words $u \in L_n(X)$ and there is an arrow $u \to u'$ if $ua = bu' \in L_{n+1}(X)$ for some $a, b \in \{0, 1\}$. Hence $\Gamma_n$ has $p(n) = n+1$ vertices and $p(n+1) = n+2$ edges. It is the vertex-labeled transition graph of the $n$-block shift interpretation of the Sturmian shift.

[Figure 4.9. The Rauzy graph $\Gamma_6$ based on the Fibonacci Sturmian sequence, with vertices 110110, 011011, 010110, 101101, 011010, 101011, 110101.]

In the example of Figure 4.9, the word $u = 101101$ is bi-special, but only $0u0$, $0u1$, and $1u0 \in L(X)$ (i.e. $u$ is a regular bi-special word). Since $p(n+1) - p(n) = 1$ for a Sturmian sequence, every Rauzy graph contains exactly one left-special and one right-special word, and they may be merged into a single bi-special word. Hence, there are two types of Rauzy graphs; see Figure 4.10.
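The Rauzy graph of Figure 4.9 can be recomputed from a long prefix of a Sturmian sequence. The Python sketch below uses helper names chosen here, and it generates the sequence as the fixed point of $1 \to 10$, $0 \to 1$, which is the Fibonacci substitution with interchanged symbols. That choice is an assumption, made because the vertices in Figure 4.9 contain 11 but not 00. In the sketch, vertices are the $n$-words, arrows are read off from the $(n+1)$-words, and special vertices are detected by counting in-degrees and out-degrees.

```python
from collections import defaultdict

def fixed_point(sub, start, iterations):
    """Iterate a substitution (dict letter -> word) on a start letter."""
    word = start
    for _ in range(iterations):
        word = "".join(sub[c] for c in word)
    return word

def rauzy_graph(x, n):
    """Vertices: n-words of x. Arrows u -> u' whenever ua = bu' is an (n+1)-word of x."""
    vertices = {x[i:i + n] for i in range(len(x) - n + 1)}
    edges = {(x[i:i + n], x[i + 1:i + n + 1]) for i in range(len(x) - n)}
    return vertices, edges

def special_vertices(edges):
    """Return (left-special, right-special) vertices, using in- and out-degrees."""
    indeg, outdeg = defaultdict(int), defaultdict(int)
    for u, v in edges:
        outdeg[u] += 1
        indeg[v] += 1
    left = {v for v, d in indeg.items() if d >= 2}
    right = {u for u, d in outdeg.items() if d >= 2}
    return left, right

x = fixed_point({"1": "10", "0": "1"}, "1", 15)
V, E = rauzy_graph(x, 6)
print(len(V), len(E))          # p(6) vertices and p(7) arrows
print(special_vertices(E))
```

For $n = 6$ this should return the seven vertices of Figure 4.9 and eight arrows, with 101101 as the only left-special vertex and also the only right-special vertex.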

[Figure 4.10. The two types of Rauzy graphs for a Sturmian sequence: one with a separate left-special and right-special vertex, and one with a single bi-special vertex.]

The transformation from $\Gamma_n$ to $\Gamma_{n+1}$ is as follows:
(a) If $\Gamma_n$ is of the first type, then the middle path decreases by one vertex, and the upper and lower paths increase by one vertex. The left-special vertex of $\Gamma_n$ splits into two vertices, with outgoing arrows leading to the previous successor vertex, which now becomes left-special. Similarly, the right-special vertex of $\Gamma_n$ is split into two vertices with incoming arrows emerging from the previous predecessor vertex, which now becomes right-special.
(b) If $\Gamma_n$ is of the second type, then one of the two paths becomes the central path in $\Gamma_{n+1}$, the other path becomes the upper path of $\Gamma_{n+1}$, and there is an extra arrow in $\Gamma_{n+1}$ from the right-special word to the left-special word. Thus the bi-special vertex of $\Gamma_n$ is split into two vertices, one of which becomes left-special in $\Gamma_{n+1}$, and one of the predecessors of the bi-special vertex in $\Gamma_n$ becomes right-special in $\Gamma_{n+1}$.
We can combine all Rauzy graphs into a single inverse limit space
$$\Gamma = \varprojlim (\Gamma_n, \pi_n) = \{(\gamma_n)_{n \geq 0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \geq 0\},$$
where $\Gamma_0$ has only one vertex $\epsilon$ (the empty word) and one arrow $\epsilon \to \epsilon$, and $\pi_{n+1} : \Gamma_{n+1} \to \Gamma_n$ is the prefix map $\pi_{n+1}(ua) = u$ for every $u \in L_n(X)$, $a \in \mathcal{A}$. It is the inverse of the map described in items (a) and (b) above. By definition, the arrows in $\Gamma_n$ have the following property, called edge surjective:
If $v \to v'$ is an arrow in $\Gamma_{n-1}$, then there is an arrow $u \to u'$ in $\Gamma_n$ such that $\pi_n(u) = v$ and $\pi_n(u') = v'$.
It also has the property called positive directional:
If $u \to u'$ and $u \to u''$ are two arrows in $\Gamma_n$ starting at the same vertex, then $\pi_n(u') = \pi_n(u'')$; that is, $\pi_n$ maps these arrows to the same arrow in $\Gamma_{n-1}$.
Equipped with the product topology, $\Gamma$ is a Cantor set. We can define a map $f : \Gamma \to \Gamma$ by "moving one step" along the arrows:
$$f(\gamma)_n = u' \quad \text{if } u = \gamma_n \in \Gamma_n \text{ and there is an arrow } u \to u'. \tag{4.30}$$
This definition may seem ambiguous, but by the positive directional property, all choices for $f(\gamma)_{n+1}$ give the same answer for $f(\gamma)_n$. Since this holds for all $n \geq 0$, the sequence $\gamma = (\gamma_n)_{n \geq 0}$ completely determines $f(\gamma)$. Moreover, $f : \Gamma \to \Gamma$ is continuous. The system $(\Gamma, f)$ is called the graph cover of the shift $(X, \sigma)$. It provides a model that is conjugate to the shift. (In order to agree with other graph cover constructions, we can speed up this process by "telescoping" between $n$'s where there is a bi-special word.) This point of view, which holds for shifts in general and in fact for all continuous Cantor systems, was introduced by Gambaudo & Martens [273] and studied by several authors, especially Shimomura; see e.g. [500–502].
Theorem 4.70. For each $k \in \mathbb{N}$, there are at most three values that the limit
$$\lim_{n\to\infty} \frac{1}{n}\, \#\{1 \leq j \leq n : x_{j+1} \cdots x_{j+k} = w\}$$
can take for a $k$-word $w$ in a Sturmian sequence $x$. These three values depend only on $k$ and the rotation angle $\alpha$.
Remark 4.71. This is the symbolic version of what is known as the Three Gap Theorem, which was conjectured by Hugo Steinhaus and eventually proven by Vera Sós [520, 521]:
For every $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and $n \in \mathbb{N}$, the set $\{j\alpha \bmod 1\}_{j=0}^{n-1}$ divides the circle into intervals of at most three different lengths.
Indeed, since Lebesgue measure is the only probability measure preserved by the rotation $R_\alpha : x \mapsto x + \alpha \bmod 1$, the frequencies in Theorem 4.70 correspond to the Lebesgue measures (i.e. lengths) of these intervals.
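The Three Gap Theorem is easy to observe numerically. The Python sketch below sorts the points $j\alpha \bmod 1$ for $0 \le j < n$ and computes the gaps between circular neighbours. It then merges gaps that agree up to floating-point error. The function name, the tolerance, and the test value $\alpha = \sqrt2 - 1$ are illustrative choices made here.

```python
import math

def distinct_gaps(alpha, n, tol=1e-9):
    """Distinct gap lengths between consecutive points of {j*alpha mod 1 : 0 <= j < n}."""
    points = sorted((j * alpha) % 1.0 for j in range(n))
    gaps = [b - a for a, b in zip(points, points[1:])]
    gaps.append(1.0 - points[-1] + points[0])    # the gap that wraps around through 0
    gaps.sort()
    distinct = [gaps[0]]
    for g in gaps[1:]:
        if g - distinct[-1] > tol:               # farther than rounding error: a new length
            distinct.append(g)
    return distinct

alpha = math.sqrt(2) - 1
print(max(len(distinct_gaps(alpha, n)) for n in range(1, 500)))
```

For this $\alpha$ the printed maximum is 3, which is attained for example at $n = 4$.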

Proof. This is a special case of the more general statement that the frequency can take at most $3(p(n+1) - p(n))$ values, which we will prove here. For Sturmian sequences $3(p(n+1) - p(n)) = 3$.
Let $n \in \mathbb{N}$ be arbitrary and let $\Gamma_n$ be the word-graph (Rauzy graph) of the language. For every vertex $a \in \Gamma_n$ let $a^-$ and $a^+$ be the number of incoming and outgoing arrows, respectively. That is, $a$ is left-special, resp. right-special, if $a^- \geq 2$, resp. $a^+ \geq 2$.
Let $V_1 = \{a \in \Gamma_n : a^+ \geq 2\}$ be the collection of right-special words of length $n$. Then
$$\#V_1 = \sum_{a^+ \geq 2} 1 \leq \sum_{a^+ \geq 2} (a^+ - 1) \leq p(n+1) - p(n).$$
Next set $V_2 = \{a \in \Gamma_n : a^+ = 1,\ a \to b,\ b^- \geq 2\}$. These are the words $a = a_0 c$ that can be extended to the right in a unique way, say $a_0 c a_{n+1}$, but $b = c a_{n+1}$ is left-special. We have
$$\#V_2 \leq \sum_{b^- \geq 2} b^- = \sum_{b^- \geq 2} (b^- - 1) + \sum_{b^- \geq 2} 1 \leq 2(p(n+1) - p(n)).$$
Now every $a \in \Gamma_n \setminus (V_1 \cup V_2)$ is not right-special, and if $a \to b$, then $b$ is not left-special. This means that $a$ and $b$ appear with the same frequency in infinite words $x \in X$. Every maximal path $\ell \subset \Gamma_n \setminus (V_1 \cup V_2)$ is succeeded by a vertex $v \in V_1 \cup V_2$ (because otherwise the last vertex of $\ell$ belongs to $V_2$), and no other such maximal path is succeeded by $v$. The vertex in $V_1 \cup V_2$ succeeding $\ell$ then has the same frequency as the elements in $\ell$. Therefore, the number of different frequencies is bounded by $\#(V_1 \cup V_2) \leq 3(p(n+1) - p(n))$, as claimed. ∎

Applying this proof to Figure 4.10, the upper and lower arrows indicate two maximal paths, both with an extra vertex in $V_2$, with distinct frequencies. The middle path (including the left-special and right-special word) has the sum of these frequencies. In the right panel, the bi-special word is the unique vertex that occurs with the sum of the frequencies.
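Theorem 4.70 can also be checked on a finite prefix. The Python sketch below uses the rotation sequence $u_j = \lfloor (j+1)\alpha\rfloor - \lfloor j\alpha\rfloor$ with $\alpha = \sqrt2 - 1$, which is an assumption made here; any Sturmian sequence would serve. The sketch estimates the frequency of every $k$-word by counting occurrences and then groups estimates that agree up to a tolerance. Because the frequencies are only empirical, the tolerance (again a choice made here) has to be larger than the counting error but smaller than the true differences between distinct frequencies.

```python
import math
from collections import Counter

def rotation_word(alpha, length):
    """Assumed Sturmian coding u_j = floor((j+1)*alpha) - floor(j*alpha), as a 0/1 string."""
    return "".join(str(math.floor((j + 1) * alpha) - math.floor(j * alpha)) for j in range(length))

def word_frequencies(x, k):
    """Empirical frequency of each k-word in the finite word x."""
    total = len(x) - k + 1
    counts = Counter(x[j:j + k] for j in range(total))
    return {w: c / total for w, c in counts.items()}

def cluster(values, tol):
    """Group sorted values whose distance to the first value of their group is at most tol."""
    vals = sorted(values)
    groups = [vals[0]]
    for v in vals[1:]:
        if v - groups[-1] > tol:
            groups.append(v)
    return groups

x = rotation_word(math.sqrt(2) - 1, 100_000)
for k in range(1, 13):
    freqs = word_frequencies(x, k)
    print(k, len(freqs), len(cluster(freqs.values(), tol=5e-3)))
```

The second column should equal $p(k) = k + 1$, and the third column should be at most 3.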

4.3.5. Sturmian Sequences and Substitutions. In this section we will show that every Sturmian shift $(X_\alpha, \sigma)$ is generated as an S-adic shift based on two substitutions. The order in which these substitutions, $\chi_0$ and $\chi_1$, are applied is determined by the continued fraction expansion (see Section 8.2) of the rotation number $\alpha$. Every sequence $s \in X_\alpha$ can be generated by using these two substitutions interspersed with shifts. This idea goes back to Morse & Hedlund [425]. Many more details are given by Arnoux in [249, Section 6.3] and [31, 179]. Contrary to Arnoux's exposition, we prefer to use one-sided Sturmian sequences only.
Define the substitutions
$$\chi_0 : \begin{cases} 0 \to 0, \\ 1 \to 10, \end{cases} \qquad \chi_1 : \begin{cases} 0 \to 01, \\ 1 \to 1. \end{cases} \tag{4.31}$$
Lemma 4.72. Suppose that $s = s_1 s_2 s_3 \cdots$ is not left-special, i.e. there is only one symbol $a_0 \in \{0, 1\}$ such that all finite prefixes of $a_0 s$ occur in $s$. Then there is a unique sequence
$$t = t_1 t_2 t_3 \cdots =: \Phi(s) \quad \text{such that} \quad \begin{cases} s = \chi_i(t) & \text{if } a_0 s_1 \neq \chi_i(b_0) \text{ for every } b_0 \in \{0,1\}, \\ s = \sigma \circ \chi_i(t) & \text{if } a_0 s_1 = \chi_i(b_0) \text{ for some } b_0 \in \{0,1\}. \end{cases} \tag{4.32}$$
Moreover, $s$ is Sturmian if and only if $t$ is Sturmian.
If $s$ is left-special, then the first symbol $t_1$ is not uniquely determined. The only left-special right-infinite Sturmian words (when viewed as rotation sequences) are the itineraries of $\alpha$ and $2\alpha$.
Before proving this, note that if $s = \chi_i(t)$ or $s = \sigma \circ \chi_i(t)$ for $i \in \{0, 1\}$, then more than half of its symbols are equal to $i$. We call this symbol the type of $s$; see Definition 4.57. This gives an immediate way to determine whether the inverse of $\chi_0$ or of $\chi_1$ is applied in $\Phi$.

Proof. Assume that $s$ is of Type 0 (the proof for the other type goes likewise). Note that since 11 doesn't appear in $s$, it has a unique 1-cutting into words 0 and 10. In this 1-cutting, $s_1$ is either a block by itself or the second letter of a block, and which of the two holds is determined by whether the symbol $a_0$ that can be put in front of $s$ is 0 or 1. With this caveat, the choice of $t$ is unique.
Now for the second statement, suppose by contradiction that $s$ is not Sturmian (and hence not balanced; see Theorem 4.66). Then by Lemma 4.59, there is a word $w$ such that both $0w0$ and $1w1$ appear in $s$. Since $s$ doesn't contain 11, $w = 0v0$ for some $v$, and $v0 = \chi_0(v')$. Now $10v01$ is a prefix of $\chi_0(1v'1)$, and $00v00$ is a suffix of $\chi_0(10v'0)$ or equal to $\chi_0(00v'0)$. This means that both $0v'0$ and $1v'1$ are factors of $t$, so $t$ is not balanced.
For the converse, suppose by contradiction that $t$ is not Sturmian (hence not balanced) and that $w$ is such that $0w0$ and $1w1$ both appear in $t$. Then $a0w0$ appears as a factor too for some $a \in \{0, 1\}$, unless $0w0$ only appears in $t$ as initial word. Let $w' = \chi_0(w)$. Now $\chi_0(1w1) = 10w'10$ and $\chi_0(a0w0) = \chi_0(a)0w'0$. Because $\chi_0(a)$ ends in 0, both $10w'1$ and $00w'0$ appear in $s$, so $s$ is not Sturmian.
Finally, if $0w0$ is a prefix of $t$ and doesn't appear elsewhere in $t$, then $a = 0$ and $0\chi_0(0w0) = 00w'0$ appears in $s$. As above, $10w'1$ also appears in $s$, so $s$ is not Sturmian. ∎

Recall that $X_\alpha$ is the space of one-sided Sturmian sequences with frequency of the symbol 1 equal to $\alpha$.
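Proposition 4.73. If $s \in X_\alpha$ is a Sturmian sequence and $t = \Phi(s)$, then $t \in X_{g(\alpha)}$, where $g : [0, 1] \to [0, 1]$ is defined as
$$g(\alpha) = \begin{cases} \dfrac{\alpha}{1-\alpha}, & \alpha \in [0, \frac12), \\[1ex] \dfrac{2\alpha - 1}{\alpha}, & \alpha \in (\frac12, 1]. \end{cases} \tag{4.33}$$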

Proof. Let $\alpha'$ be the frequency of symbols 1 in $t$; i.e.
$$\alpha' = \lim_{n\to\infty} \frac{1}{n} |t_{k+1} \cdots t_{k+n}|_1,$$
uniformly in $k \in \mathbb{N}$. If $w = t_{k+1} \cdots t_{k+n}$, then the limit frequency of 1's in $\chi_i(w)$ as $n \to \infty$ is
$$\alpha = \begin{cases} \dfrac{\alpha'}{1+\alpha'} \in [0, \frac12], & i = 0, \\[1ex] \dfrac{1}{2-\alpha'} \in [\frac12, 1], & i = 1. \end{cases}$$
Inverting this relation gives $\alpha' = g(\alpha)$. We already know from Lemma 4.72 that a Sturmian $s \in X_\alpha$ can always be written as $\chi_i(t)$ or $\sigma \circ \chi_i(t)$ for some Sturmian sequence $t$, so we have now also determined that $t \in X_{g(\alpha)}$. ∎
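For Type 0 words, the map $\Phi$ of Lemma 4.72 amounts to parsing $s$ into the blocks $0 = \chi_0(0)$ and $10 = \chi_0(1)$. The Python sketch below does this on a finite word; its function names are hypothetical. It uses the preceding symbol $a_0$ to decide whether $s_1$ is the tail of a block $10$. It also checks the relation $\alpha = \alpha'/(1+\alpha')$ from the proof of Proposition 4.73, which is an exact identity for finite words.

```python
from fractions import Fraction

CHI0 = {"0": "0", "1": "10"}          # chi_0 from (4.31)

def chi0(t):
    """Apply chi_0 letter by letter."""
    return "".join(CHI0[c] for c in t)

def phi_type0(s, a0):
    """Desubstitute a finite Type-0 word s, given the symbol a0 that can precede it."""
    t, i = [], 0
    if a0 == "1" and s[0] == "0":
        # a0 s1 = 10 = chi_0(1): s1 is the tail of a block 10, so s = sigma(chi_0(t)) with t1 = 1
        t, i = ["1"], 1
    while i < len(s):
        if s[i] == "0":               # block 0 = chi_0(0)
            t.append("0")
            i += 1
        elif s[i:i + 2] == "10":      # block 10 = chi_0(1)
            t.append("1")
            i += 2
        else:                         # a final lone 1 whose block is cut off by the end of s
            break
    return "".join(t)

def freq1(w):
    """Exact frequency of the symbol 1 in a finite word."""
    return Fraction(w.count("1"), len(w))

t = "1011010110110101101011011"      # a finite word over {0, 1}
s = chi0(t)
print(phi_type0(s, "0") == t, phi_type0(s[1:], "1") == t)
print(freq1(s) == freq1(t) / (1 + freq1(t)))   # alpha = alpha' / (1 + alpha')
```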
Definition 4.74. If we iterate this procedure, then we get a sequence of types $(\tau_j)_{j \geq 1}$, which is called the additive coding of the Sturmian shift $X_\alpha$. We can abbreviate this sequence as
$$\tau_1 \tau_2 \tau_3 \cdots = 0^{a_1} 1^{a_2} 0^{a_3} 1^{a_4} \cdots,$$
where the $a_i$ denote the lengths of the consecutive blocks of $\tau_j = 0$ and $\tau_j = 1$. Here all exponents $a_i \geq 1$, except that $a_1 = 0$ if $\alpha \in (\frac12, 1)$. This is the multiplicative coding.

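In particular, the fixed points of the S-adic shifts
$$\rho_b = \lim_{j\to\infty} \chi_{\tau_0} \circ \chi_{\tau_1} \circ \cdots \circ \chi_{\tau_j}(b) = \lim_{n\to\infty} \chi_0^{a_1} \circ \chi_1^{a_2} \circ \cdots \circ \chi_0^{a_{n-1}} \circ \chi_1^{a_n}(b), \tag{4.34}$$
for $b \in \{0, 1\}$, are sequences in $X_\alpha$. To get all sequences $s \in X_\alpha$, we need to intersperse the $\chi_{\tau_j}$ with shifts as indicated in (4.32).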
Proposition 4.75. The frequency $\alpha \in (0, 1)$ of a Sturmian sequence $s$ satisfies
$$\alpha = [0; 1 + a_1, a_2, a_3, \dots] := \cfrac{1}{1 + a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}}, \tag{4.35}$$
i.e. $\alpha$ has the continued fraction expansion of the multiplicative coding, after adding 1 to $a_1$.
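Proof. For simplicity, we take $\alpha$ irrational. Let
$$\theta = \frac{\alpha}{1-\alpha} = \lim_{n\to\infty} \frac{|s_1 \cdots s_n|_1}{|s_1 \cdots s_n|_0}$$
be the limit proportion of 1's vs. 0's in $s$. Then for $s' = \Phi(s)$ we find the relation between its proportion $\theta'$ and $\theta$ as
$$\theta = \begin{cases} \tilde g_0^{-1}(\theta') := \dfrac{\theta'}{1+\theta'} & \text{if } \theta \in (0, 1), \text{ i.e. } s \text{ is of Type 0}, \\[1ex] \tilde g_1^{-1}(\theta') := \theta' + 1 & \text{if } \theta \in (1, \infty), \text{ i.e. } s \text{ is of Type 1}. \end{cases}$$
The iterates of these maps are $\tilde g_0^{-a_1}(\theta') = \dfrac{\theta'}{a_1\theta' + 1}$ and $\tilde g_1^{-a_2}(\theta') = \theta' + a_2$. Therefore $\tilde g_0^{-a_1} \circ \tilde g_1^{-a_2}(x) = \cfrac{1}{a_1 + \cfrac{1}{a_2 + x}}$ is the first part of a continued fraction expansion. Therefore, for any $x \in (0, \infty)$,
$$\lim_{n\to\infty} \tilde g_0^{-a_1} \circ \tilde g_1^{-a_2} \circ \cdots \circ \tilde g_0^{-a_{n-1}} \circ \tilde g_1^{-a_n}(x) = \lim_{n\to\infty} \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + x}}}}$$
if $\theta \in (0, 1)$ (that is, $s$ is of Type 0), and
$$\lim_{n\to\infty} \tilde g_1^{-a_1} \circ \tilde g_0^{-a_2} \circ \tilde g_1^{-a_3} \circ \cdots \circ \tilde g_0^{-a_n} \circ \tilde g_1^{-a_{n+1}}(x) = \lim_{n\to\infty} a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_{n+1} + x}}}}$$
if $\theta \in (1, \infty)$ (that is, $s$ is of Type 1). Transforming back to $\alpha = \frac{1}{1 + 1/\theta}$ and using that $a_1 = 0$ if $s$ is of Type 1, we find (4.35). ∎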
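The additive and multiplicative codings can be computed by iterating the map $g$ of (4.33). The Python sketch below records the types, collapses them into the exponents $a_1, a_2, \dots$ of Definition 4.74, and compares the result with the continued fraction of $\alpha$, as predicted by (4.35). All names in it are illustrative. It works in floating point, so only the first few dozen steps are reliable.

```python
import math
from itertools import groupby

def g(alpha):
    """The map (4.33)."""
    return alpha / (1 - alpha) if alpha < 0.5 else (2 * alpha - 1) / alpha

def additive_coding(alpha, steps):
    """Types tau_1, tau_2, ...: 0 if the current frequency is < 1/2, else 1."""
    types = []
    for _ in range(steps):
        types.append(0 if alpha < 0.5 else 1)
        alpha = g(alpha)
    return types

def multiplicative_coding(types):
    """Block lengths a_1, a_2, ... of 0^{a_1} 1^{a_2} 0^{a_3} ...; a_1 = 0 if tau_1 = 1."""
    runs = [len(list(block)) for _, block in groupby(types)]
    return runs if types[0] == 0 else [0] + runs

def cf_digits(x, terms):
    """Partial quotients c_1, c_2, ... of x = [0; c_1, c_2, ...] in (0, 1)."""
    digits = []
    for _ in range(terms):
        x = 1 / x
        digits.append(math.floor(x))
        x -= digits[-1]
    return digits

for alpha in (math.sqrt(2) - 1, (3 - math.sqrt(5)) / 2, (math.sqrt(5) - 1) / 2):
    a = multiplicative_coding(additive_coding(alpha, 13))
    print([1 + a[0]] + a[1:5], cf_digits(alpha, 5))   # by (4.35) both lists should agree
```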

The substitutions $\chi_0$ and $\chi_1$ are the symbolic versions of first return maps of a circle rotation
$$R_\alpha(x) = x + \alpha \bmod 1 \quad \text{with symbol 1 if and only if } x \in [0, \alpha).$$
• If $\alpha \in (0, \frac12)$ (so $s$ is of Type 0), then we take the first return map to $J_0 := [2\alpha, 1) \cup [0, \alpha)$ and rescale this interval to unit size. The resulting rotation is
$$\tilde R_0(x) = x + \frac{\alpha}{1-\alpha} \bmod 1 \quad \text{with symbol 1 if and only if } x \in \Big[0, \frac{\alpha}{1-\alpha}\Big);$$
see Figure 4.11 (left). If $x \in J_0$ has $\tilde R_0$-itinerary $t$, then its $R_\alpha$-itinerary is $\chi_0(t)$. If $x \in \mathbb{S}^1 \setminus J_0$, then $i_{-1}(x) i_0(x) = 10$ and $R_\alpha^{-1}(x) \in J_0$. In this case, if $t$ is the $\tilde R_0$-itinerary of $R_\alpha^{-1}(x)$, then the $R_\alpha$-itinerary of $x$ is $i(x) = \sigma \circ \chi_0(t)$.
• If $\alpha \in (\frac12, 1)$ (so $s$ is of Type 1), then we take the first return map to $J_1 := [\alpha, 1) \cup [0, 2\alpha - 1)$ and rescale this interval to unit size. The resulting rotation is
$$\tilde R_1(x) = x + \frac{2\alpha - 1}{\alpha} \bmod 1 \quad \text{with symbol 1 if and only if } x \in \Big[0, \frac{2\alpha - 1}{\alpha}\Big);$$
see Figure 4.11 (right). If $x$ has $\tilde R_1$-itinerary $t$, then its $R_\alpha$-itinerary is $\chi_1(t)$. If $x \in \mathbb{S}^1 \setminus J_1$, then $i_{-1}(x) i_0(x) = 01$ and $R_\alpha^{-1}(x) \in J_1$. In this case, if $t$ is the $\tilde R_1$-itinerary of $R_\alpha^{-1}(x)$, then the $R_\alpha$-itinerary of $x$ is $i(x) = \sigma \circ \chi_1(t)$.
In conclusion, this renormalization operation turns $R_\alpha$ into $R_{g(\alpha)}$. To obtain the itinerary of any other point $x \in \mathbb{S}^1$, we need to apply shifts as in the second line of (4.32) every time the renormalization image of $x$ doesn't belong to $J_0$ or $J_1$.

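[Figure 4.11. First returns of circle rotations for $\alpha < \frac12$ (Type 0, left: $[0,1)$ with marks at $\alpha$ and $2\alpha$, rescaled to $[0,1)$ with mark $\frac{\alpha}{1-\alpha}$) and $\alpha > \frac12$ (Type 1, right: marks at $2\alpha - 1$ and $\alpha$, rescaled with mark $\frac{2\alpha-1}{\alpha}$).]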

Since the return maps are always to neighborhoods of $0 \in \mathbb{S}^1$, iterating the above procedure gives the itinerary of 0:
$$i(0) = \lim_{j\to\infty} \chi_{\tau_0} \circ \chi_{\tau_1} \circ \cdots \circ \chi_{\tau_j}(1).$$

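Example 4.76. For the Fibonacci rotation over $\alpha = (3 - \sqrt5)/2$ (of Type 0), the itinerary of 0 is
$$t := i(0) = 10\ 01001\ 01001001\ 010010100100101001001 \cdots = 10\rho_{\mathrm{Fib}},$$
where $\rho_{\mathrm{Fib}}$ is the fixed point of the Fibonacci substitution $\chi_{\mathrm{Fib}}$. At the same time,
$$t = \lim_{j\to\infty} \underbrace{\chi_0 \circ \chi_1 \circ \cdots \circ \chi_0 \circ \chi_1}_{j \text{ pairs}}(1).$$
By the same token, the itinerary of 0 for the Fibonacci rotation with $\alpha = (\sqrt5 - 1)/2$ (of Type 1) is
$$t' := i(0) = 10\ 10110\ 10110101\ 101101011011010110110 \cdots = 10\rho'_{\mathrm{Fib}},$$
where $\rho'_{\mathrm{Fib}}$ is the fixed point of the Fibonacci substitution $\chi'_{\mathrm{Fib}}$ with interchanged symbols ($\chi'_{\mathrm{Fib}} : 0 \to 1,\ 1 \to 10$). However, there is no direct way of writing $\chi_{\mathrm{Fib}}$ or $\chi_{\mathrm{Fib}}^2$ as a composition of $\chi_0$ and $\chi_1$, but
$$\rho = \lim_{j\to\infty} \chi_0 \circ \sigma \circ \chi_1 \circ \underbrace{\chi_0 \circ \chi_1 \circ \cdots \circ \chi_0 \circ \chi_1}_{j \text{ pairs}}(0)$$
and
$$\rho' = \lim_{j\to\infty} \chi_1 \circ \sigma \circ \chi_0 \circ \underbrace{\chi_1 \circ \chi_0 \circ \cdots \circ \chi_1 \circ \chi_0}_{j \text{ pairs}}(1).$$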
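The identity $i(0) = \lim_j \chi_{\tau_0} \circ \cdots \circ \chi_{\tau_j}(1)$ can be tested on the two Fibonacci rotations of Example 4.76. The Python sketch below composes the substitutions (4.31) in the order of the types and applies the composition to the letter 1. It then compares the result with the itinerary of 0 under $R_\alpha$, coded by 1 on $[0,\alpha)$. The helper names are illustrative.

```python
import math

CHI = {0: {"0": "0", "1": "10"},    # chi_0 of (4.31)
       1: {"0": "01", "1": "1"}}    # chi_1 of (4.31)

def compose_on(types, letter):
    """chi_{types[0]} o chi_{types[1]} o ... o chi_{types[-1]} applied to a letter."""
    word = letter
    for tau in reversed(types):      # the innermost substitution acts first
        word = "".join(CHI[tau][c] for c in word)
    return word

def rotation_itinerary(alpha, x, n):
    """Symbol 1 if R_alpha^k(x) lies in [0, alpha), else 0, for k = 0, ..., n-1."""
    return "".join("1" if (x + k * alpha) % 1.0 < alpha else "0" for k in range(n))

for alpha, pair in (((3 - math.sqrt(5)) / 2, [0, 1]), ((math.sqrt(5) - 1) / 2, [1, 0])):
    word = compose_on(pair * 6, "1")          # six pairs of substitutions
    print(word[:20], word == rotation_itinerary(alpha, 0.0, len(word)))
```

Both comparisons should print `True`. The first word starts with the prefix 1001001010... displayed in the example.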

Once the frequency of the Sturmian sequence has been determined, a further argument identifies the point $y \in \mathbb{S}^1$ for which the Sturmian sequence is $x = i(y)$, but we will skip these details; see [249, Section 6.3].

4.4. Interval Exchange Transformations

Interval exchange transformations (IETs) were studied by Katok [345, 568] and Sinaı̆, cf. [513], and hints towards such systems appear earlier than that (see [252]), in the context of billiard dynamics and flows on flat surfaces, including translation surfaces. The subject was taken up for its own sake in a series of papers by Mike Keane, in which minimality and unique ergodicity were discussed. Together with Rauzy [351, Theorem 7] he showed that the uniquely ergodic IETs form a residual set. The question of whether unique ergodicity for IETs is a Lebesgue typical phenomenon (the Keane Conjecture) inspired the creation of a wealth of new mathematics, in particular new applications of Teichmüller theory, which is central in the solutions of Veech [543] and Masur [410]. We won't discuss this in this text (the monograph of Viana [548] is strongly recommended), but we say some more about unique ergodicity and counterexamples to unique ergodicity in Section 6.3.5.

Definition 4.77. A map $T : [0, 1) \to [0, 1)$ is called an interval exchange transformation (IET) if there is a finite partition into half-open intervals $\{\Delta_i\}_{i=1}^d$ such that $T|_{\Delta_i}$ is a translation and $T$ is invertible. That is, the intervals $\Delta_i$, $i = 1, \dots, d$, are mapped back into $[0, 1)$ after a permutation $\pi : \{1, \dots, d\} \to \{1, \dots, d\}$. As a formula, with $\lambda_i = |\Delta_i|$,
$$T(x) = x - \sum_{j < i} \lambda_j + \sum_{\pi(j) < \pi(i)} \lambda_j \quad \text{if } x \in \Delta_i = [\gamma_{i-1}, \gamma_i).$$
Equivalently, writing $\gamma_0 = \gamma'_0 = 0$, $\gamma_i = \sum_{j \leq i} \lambda_j$ and $\gamma'_i = \sum_{\pi(j) \leq i} \lambda_j$, we have
$$T(x) = x - \gamma_{i-1} + \gamma'_{\pi(i)-1} \quad \text{if } x \in \Delta_i = [\gamma_{i-1}, \gamma_i).$$
Thus $T$ is invertible but has discontinuity points at $\gamma_i$, $i = 1, \dots, d-1$, unless $\pi(i-1) = \pi(i) - 1$. Every IET preserves Lebesgue measure and is uniquely determined by the probability vector $(\lambda_1, \dots, \lambda_d)$ and the permutation $\pi$. In fact, if $\lambda_i = 0$ or if $\pi(i-1) = \pi(i) - 1$ for some $i$, then $T$ is degenerate, but there are advantages to including these cases in the parameter space $\Sigma_d \times S_d$, the $(d-1)$-dimensional simplex times the group of permutations of $\{1, \dots, d\}$. These degenerate cases are excluded by the following type of irreducibility condition, which is usually imposed.

Definition 4.78. An interval exchange transformation on $d$ intervals satisfies the Keane condition if $T^n(\gamma_i) \neq \gamma_j$ for all $i, j \in \{1, \dots, d-1\}$ and $n \geq 1$. It is irreducible (or, rather, has an irreducible permutation) if $\pi(\{1, \dots, k\}) \neq \{1, \dots, k\}$ for all $k < d$.

Instead of $[0, 1)$, it may be more convenient to define IETs on the circle $\mathbb{S}^1$. Every IET on two intervals then becomes a rotation, so IETs are a generalization of circle rotations.

Example 4.79. To illustrate that Poincaré maps for polygonal billiard flows can be IETs, we present an example of a torus $\mathbb{T}^2 = \mathbb{R}^2/\mathbb{Z}^2$ with a single horizontal wall $[\frac14, \frac34) \times \{\frac12\}$ in it; see Figure 4.12 (left). A particle moves with constant speed and angle $\theta$ w.r.t. the positive horizontal axis. When it hits the wall, it reflects elastically: the angle of incidence equals the angle of reflection. Thus the outgoing angle is $-\theta$ until the next hit at the other side of the wall, when the angle returns to $\theta$.
We take $2 < \tan\theta < 4$. The upper side of the wall is parametrized by $x \in [0, \frac12)$ and the lower side of the wall is parametrized by $x \in [\frac12, 1)$ (left to right). Then the map $T : [0, 1) \to [0, 1)$ assigning to $x$ the coordinate of the next hit is an IET with six intervals:

[Figure 4.12. Billiards on the torus against a wall (left) and its IET, with the points $\gamma_1$, $\gamma_2$ marked (right).]

$$T(x) = \left\{\begin{array}{lll} x + 1 - \gamma_1 & \text{if } x \in [0, \gamma_1), & \gamma_0 = 0, \\ x + 1 - \gamma_1 - \gamma_2 & \text{if } x \in [\gamma_1, \gamma_2), & \gamma_1 = \frac14 - \frac{1}{2\tan\theta}, \\ x + \frac12 - \gamma_2 & \text{if } x \in [\gamma_2, \gamma_3), & \gamma_2 = \frac34 - \frac{2}{\tan\theta}, \\ x - \gamma_1 & \text{if } x \in [\gamma_3, \gamma_4), & \gamma_3 = \frac12, \\ x - \gamma_1 - \gamma_2 & \text{if } x \in [\gamma_4, \gamma_5), & \gamma_4 = \frac34 - \frac{1}{2\tan\theta}, \\ x - \frac12 - \gamma_2 & \text{if } x \in [\gamma_5, 1), & \gamma_5 = \frac54 - \frac{2}{\tan\theta}; \end{array}\right.$$
see Figure 4.12 (right). Typically, $T$ satisfies the Keane condition, but the permutation of $T^2$ is reducible.

Using the alphabet $\mathcal{A} = \{1, \dots, d\}$ and the partition $\{\Delta_i\}$, define the corresponding symbolic shift space $X_T = \overline{\{i(x) : x \in [0, 1)\}} \subset \mathcal{A}^{\mathbb{Z}}$. It is easily seen that the closure can be taken care of as follows:
$$X_T = \{i(x) : x \in [0, 1)\} \cup \{\lim_{y \uparrow x} i(y) : x \in (0, 1]\}.$$

Proposition 4.80. The subshift associated to an interval exchange transformation on $d$ pieces has word-complexity $p(n) \leq (d-1)n + 1$. If $T$ satisfies the Keane condition, then $p(n) = (d-1)n + 1$ for all $n \geq 0$.

Proof. We argue by induction. Clearly $p(n) = (d-1)n + 1$ for $n = 1$. Suppose now that $\mathcal{J}_n$ is the collection of sets $J \subset [0, 1)$ such that $(i(x)_k)_{k=0}^{n-1}$ is constant on $J$. Then each boundary point $y$ of $J$ satisfies $T^k(y) \in \{\gamma_1, \dots, \gamma_{d-1}\}$ for some $0 \leq k < n$, because otherwise the set $J$ can be enlarged with points with the same first $n$ symbols in their itineraries.
Assume by induction that $\#\mathcal{J}_n \leq (d-1)n + 1$. Then $J \in \mathcal{J}_n$ but $J \notin \mathcal{J}_{n+1}$ if and only if $T^{-n}(\gamma_i) \in J^\circ$ for some $i = 1, \dots, d-1$. Since $T$ is invertible, there are at most $d-1$ such points $T^{-n}(\gamma_i)$ that can divide a $J \in \mathcal{J}_n$, and the total number of extra pieces is therefore at most $d-1$ (and precisely $d-1$ if $T^{-n}(\gamma_j) \notin \{\gamma_1, \dots, \gamma_{d-1}\}$). Hence $\#\mathcal{J}_{n+1} \leq \#\mathcal{J}_n + d - 1 \leq (d-1)(n+1) + 1$. This concludes the proof. ∎
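Definition 4.77 and Proposition 4.80 can be explored with a few lines of Python. The sketch below builds $T$ from $(\lambda, \pi)$, using the fact that $T(\Delta_i)$ starts at the total length of the intervals $\Delta_j$ with $\pi(j) < \pi(i)$. It then counts the distinct $n$-words along a long orbit. All names, the lengths, and the permutation are an arbitrary example chosen here. The lengths $\sqrt2/5$, $\sqrt3/5$ and $1 - (\sqrt2+\sqrt3)/5$ are rationally independent and $\pi = (3,2,1)$ is irreducible. By a classical result of Keane, this implies the Keane condition, so the count should be $2n+1$.

```python
import bisect
import math

def make_iet(lengths, perm):
    """IET on [0,1): Delta_i = [gamma_{i-1}, gamma_i) is moved to slot pi(i); perm is 1-based."""
    d = len(lengths)
    gamma = [0.0]
    for lam in lengths:
        gamma.append(gamma[-1] + lam)         # gamma_i = lambda_1 + ... + lambda_i
    # left endpoint of T(Delta_i): total length of the Delta_j with pi(j) < pi(i)
    image_left = [sum(lengths[j] for j in range(d) if perm[j] < perm[i]) for i in range(d)]

    def symbol(x):
        """0-based index i with x in Delta_{i+1}, clamped against rounding near 0 and 1."""
        return min(max(bisect.bisect_right(gamma, x) - 1, 0), d - 1)

    def T(x):
        i = symbol(x)
        return x - gamma[i] + image_left[i]

    return T, symbol

def orbit_word(T, symbol, x, length):
    """Itinerary of x as a string of digits 1..d."""
    digits = []
    for _ in range(length):
        digits.append(str(symbol(x) + 1))
        x = T(x)
    return "".join(digits)

def complexity(word, n):
    """Number of distinct n-words in a finite word."""
    return len({word[j:j + n] for j in range(len(word) - n + 1)})

lengths = [math.sqrt(2) / 5, math.sqrt(3) / 5, 1 - (math.sqrt(2) + math.sqrt(3)) / 5]
T, symbol = make_iet(lengths, [3, 2, 1])
w = orbit_word(T, symbol, 0.1, 100_000)
print([complexity(w, n) for n in range(1, 9)])   # Proposition 4.80 predicts 2n + 1
```

With $d = 2$ and $\pi = (2,1)$, the same constructor gives the rotation by $\lambda_2$, in line with the remark after Definition 4.78.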
Theorem 4.81. An irreducible IET satisfying the Keane condition is minimal.
Note that irreducibility follows from minimality. The Keane condition is, however, not implied by minimality. Without the Keane condition, minimality can fail, but as shown in [489], an interval exchange transformation (and in fact interval translation maps¹²) with $d$ pieces can have at most $d/2$ distinct minimal subsets.

Proof. Our first claim is that $T$ cannot have periodic points. Clearly $T(0) \neq 0$ and $\lim_{y \uparrow 1} T(y) \neq 1$, because otherwise the permutation would be reducible. Now assume by contradiction that there is $x \in (0, 1)$ with $x = T^k(x)$. Let $y = \max\{T^n(\gamma_i) : 0 \leq n < k,\ 1 \leq i < d\}$. Then $T^k$ restricted to $[y, x]$ is continuous and therefore the identity. So $T^k(y) = y$, contradicting the Keane condition.
Now assume that there is a point $z$ whose full orbit $\mathrm{orb}(z)$ is not dense in $[0, 1)$. Then there is an interval $J = [a, b)$ disjoint from $\mathrm{orb}(z)$. For each $\gamma_i$, $1 \leq i < d$, there is at most one point $\gamma'_i \in J$ such that $T^{k_i}(\gamma'_i) = \gamma_i$ (the first preimage in $J$). Similarly $a$ and $b$ have such first preimages $a', b' \in J$. Thus there is a partition of $J$ into half-open intervals with points in $\{\gamma'_i\} \cup \{a, b, a', b'\}$ as boundary points. Any such interval $J'$ is mapped continuously by $T^n$ until $T^n(J') \subset J$ for some finite minimal $n = n(J')$. Then the union $K = \bigcup_{J'} \bigcup_{j=0}^{n(J')-1} T^j(J')$ is a $T$-invariant set consisting of finitely many intervals, all disjoint from $\mathrm{orb}(z)$.
For any $x \in \partial K \setminus \{0, 1\}$, either $T(x) \in \partial K$ or $x = \gamma_i$ for some $1 \leq i < d$. Since $\partial K$ is a finite set, $x$ must be periodic in forward time (but that contradicts our first claim) or eventually map to some $\gamma_i$. In backward time, $x$ must also be periodic (contradicting the first claim) or eventually map to a point $\gamma_j$. But then $T^m(\gamma_j) = \gamma_i$ for some $m \geq 1$, contradicting the Keane condition. Therefore no such $z$ can exist. In other words, every orbit is dense. ∎
A standard tool in the theory of interval exchange transformations is Rauzy induction. This is a particular first return map expressed in parameter space $\Sigma_d \times S_d$ rather than in the dynamical space. We will explain it here (see [548] for further reading), but primarily to show that $(X_T, \sigma)$ is an S-adic subshift.
12 I.e. the pieces are translated without assuming invertibility.

Definition 4.82. Given an IET $(\lambda, \pi) \in \Sigma_d \times S_d$, the Rauzy induction produces a new IET $(\lambda', \pi') = \Theta(\lambda, \pi) \in \Sigma_d \times S_d$, which represents the first return map to
$$\begin{cases} [0, 1 - \lambda_e) & \text{if } \lambda_d > \lambda_e, \\ [0, 1 - \lambda_d) & \text{if } \lambda_d < \lambda_e, \end{cases} \qquad \text{for } e = \pi^{-1}(d).$$
In words, choose the shorter of the rightmost interval $\Delta_d$ (Type 0) and the interval $\Delta_e$ with the rightmost image (Type 1), cut this length off from the right of the interval $[0, 1)$, and take the first return map to the remaining interval, scaled to unit length again; see Figure 4.13.

[Figure 4.13. The two types of Rauzy induction (Type 0 and Type 1), with the intervals labeled $e$ and $d$ in each panel.]

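Example 4.83. If $d = 2$ and $\lambda = \lambda_1$, then the new coordinate after a Rauzy induction step is
$$\lambda' = \begin{cases} \dfrac{\lambda}{1-\lambda} & \text{if } \lambda < \frac12 \text{ (Type 0)}, \\[1ex] \dfrac{2\lambda - 1}{\lambda} & \text{if } \lambda > \frac12 \text{ (Type 1)}. \end{cases}$$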

Since we cut off at a discontinuity point (Type 0) or a discontinuity point of $T^{-1}$ (Type 1), the total number of intervals of continuity of the first return map is again $d$. Only if $\lambda_d = \lambda_e$ does one have to make a choice, but under the Keane condition this cannot happen. If $\lambda_d = 0$ or $\lambda_e = 0$, then the Rauzy induction doesn't do anything, but this degenerate case is also prevented by the Keane condition. In effect, if the Keane condition holds, then every iterate of the Rauzy induction is well-defined. As a formula, the Rauzy induction step looks as in Table 4.3.
The Rauzy induction map $\Theta : \Sigma_d \times S_d \to \Sigma_d \times S_d$ is 2-to-1. Indeed, if we are given the type of the Rauzy induction step that produces $(\lambda', \pi') = \Theta(\lambda, \pi)$, then we can reconstruct $\pi$ from $\pi'$ and then $\lambda$ from $\lambda'$ and $\pi$. See Table 4.3 for a summary. One can easily compute that the incidence matrices of the substitutions are all unimodular.

Table 4.3. Rauzy induction formulas (with σ(e) = d).

Type 0: $\lambda_e < \lambda_d$.
$$\lambda' = \frac{1}{1-\lambda_e}(\lambda_1, \dots, \lambda_{d-1}, \lambda_d - \lambda_e), \qquad \begin{cases} \pi'(j) = \pi(j) & \text{if } \pi(j) \leq \pi(d), \\ \pi'(e) = \pi(d) + 1, \\ \pi'(j) = \pi(j) + 1 & \text{if } \pi(j) > \pi(d),\ j \neq e, \end{cases} \qquad \chi : \begin{cases} j \to j & \text{if } j \neq e, \\ e \to ed. \end{cases}$$

Type 1: $\lambda_d < \lambda_e$.
$$\lambda' = \frac{1}{1-\lambda_d}(\lambda_1, \dots, \lambda_e - \lambda_d, \lambda_d, \dots, \lambda_{d-1}), \qquad \begin{cases} \pi'(j) = \pi(j) & \text{if } j \leq e, \\ \pi'(e+1) = \pi(d), \\ \pi'(j) = \pi(j-1) & \text{if } j > e+1, \end{cases} \qquad \chi : \begin{cases} j \to j & \text{if } j \leq e, \\ e+1 \to ed, \\ j \to j-1 & \text{if } j > e+1. \end{cases}$$

Let $\chi_n$ denote the substitution emerging from the $n$-th Rauzy induction step. Under the Keane condition, $\chi_n$ is well-defined for every $n \in \mathbb{N}$. Since $\chi_n(1)$ starts with 1 for every $n$, there is a fixed point of the corresponding S-adic transformation:
$$\rho_T := \lim_{n\to\infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(1).$$
Since the iterates of the Rauzy induction represent first return maps to shorter and shorter one-sided neighborhoods of 0 (assuming the Keane condition holds), every letter will eventually play the role of $d$ and $e$, and therefore this S-adic substitution is primitive. This gives another proof that irreducible IETs satisfying the Keane condition are minimal.
Since $\chi_1 \circ \chi_2 \circ \cdots \circ \chi_n(1)$ represents the successive intervals $\Delta_i$ that 0 visits before its first return time associated to the $n$-th Rauzy induction step, $\rho_T = i(0)$. Since $x = 0$ has a dense orbit, the one-sided subshift is $X_T = \overline{\mathrm{orb}_\sigma(\rho_T)}$.

4.5. Toeplitz Shifts

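Definition 4.84. A sequence $x \in \mathcal{A}^{\mathbb{N}}$ or $\mathcal{A}^{\mathbb{Z}}$ is called a Toeplitz sequence if for every $i \in \mathbb{N}$, there exists $q_i \in \mathbb{N}$ such that $x_i = x_{i + kq_i}$ for all $k \in \mathbb{N}$ or $\mathbb{Z}$. The orbit closure $X_q = \overline{\{\sigma^n(x) : n \geq 0\}}$ is called a Toeplitz shift.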

Although the first example of this appeared in a paper by Oxtoby [440], the name and notion were introduced by Jacobs & Keane [332]. They were inspired by a construction by Otto Toeplitz (1881–1940) [537] to create an almost periodic function on the real line; otherwise, Toeplitz was not involved.
Proposition 4.85. If χ : A → A∗ is a constant length substitution such
that χ(a) starts with the same symbol for each a ∈ A, then the unique fixed
point of χ is a Toeplitz sequence.

Proof. Fix the symbol $a \in \mathcal{A}$ such that $\chi(a)$ starts with $a$, so $\rho = \rho_1 \rho_2 \rho_3 \cdots = \lim_n \chi^n(a)$ is the fixed point of $\chi$. Let $N = |\chi(b)|$ for each $b \in \mathcal{A}$. Then clearly $\rho_{1+kN} = a$ for all $k \in \mathbb{N}$, so we can take $q_1 = N$.
It follows that $\chi(\rho_1 \cdots \rho_{1+kN})$ (which has length $kN^2 + N$) starts and ends with $\chi(a)$. Therefore $q_i = N^2$ for $i = 2, \dots, N$. Continuing by induction, we find $q_i = N^r$ for $i = N^{r-1} + 1, \dots, N^r$. ∎
Example 4.86. The simplest way to construct a Toeplitz sequence is to take the powers of 2, $q_i = 2^i$, and set $x_{q_i/2 + kq_i} = \frac12(1 - (-1)^i)$ for all $k \geq 0$ and $i = 1, 2, 3, \dots$. The resulting Toeplitz sequence is the Feigenbaum sequence
$$\rho_{\mathrm{feig}} = 1011101010111011101110101011101010111010101110111 \cdots;$$
see Example 1.6 for more details on this sequence. Although $\rho_{\mathrm{feig}}$ is Toeplitz, not every sequence in $X_{\mathrm{feig}} = \overline{\mathrm{orb}_\sigma(\rho_{\mathrm{feig}})}$ has the Toeplitz property. For example, $\rho_{\mathrm{feig}}$ has two preimages in $X_{\mathrm{feig}}$, namely $0\rho_{\mathrm{feig}}$ and $1\rho_{\mathrm{feig}}$. Of these two, only $0\rho_{\mathrm{feig}}$ is a Toeplitz sequence.
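The Toeplitz construction of Example 4.86 can be compared directly with a substitution description. In the Python sketch below, the Feigenbaum substitution is assumed to be $1 \to 10$, $0 \to 11$. This assumption reproduces the prefix of $\rho_{\mathrm{feig}}$ displayed above and satisfies the hypothesis of Proposition 4.85. The other function fills position $q_i/2 + kq_i$ with $\frac12(1-(-1)^i)$ for $q_i = 2^i$. The function names are illustrative.

```python
def substitution_fixed_point(sub, start, length):
    """Prefix of length `length` of the fixed point of a substitution starting with `start`."""
    word = start
    while len(word) < length:
        word = "".join(sub[c] for c in word)
    return word[:length]

def toeplitz_powers_of_two(length):
    """Set x_{q_i/2 + k q_i} = (1 - (-1)^i)/2 with q_i = 2^i (1-based positions)."""
    x = [None] * (length + 1)          # x[0] is unused
    i = 1
    while any(v is None for v in x[1:]):
        q = 2 ** i
        for pos in range(q // 2, length + 1, q):
            x[pos] = "1" if i % 2 == 1 else "0"
        i += 1
    return "".join(x[1:])

rho = substitution_fixed_point({"1": "10", "0": "11"}, "1", 1024)
print(rho[:49])
print(rho == toeplitz_powers_of_two(1024))
```

Each position is filled exactly once: position $m$ receives its symbol at step $i = v_2(m) + 1$, where $v_2(m)$ is the exponent of 2 in $m$.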
As will be shown in Section 4.7.1, ρfeig is the kneading sequence of an
infinitely renormalizable unimodal map. In fact, the kneading sequence of
every infinitely renormalizable unimodal map is a Toeplitz sequence. More
generally, Alvin [23, 24] classifies all the Toeplitz sequences which appear
as a kneading sequence (and for which the unimodal maps act on ω(c) as
(strange) adding machines).
Proposition 4.87. The Thue-Morse sequence
ρTM = 1001 0110 0110 1001 0110 1001 1001 0110 · · ·
obtained from the Thue-Morse substitution χTM : 0 → 01, 1 → 10, is not a
Toeplitz sequence.

However, the Thue-Morse shift factorizes to the Feigenbaum substitution shift via the sliding block code $01, 10 \to 1$ and $00, 11 \to 0$ (see Example 1.6), and the Feigenbaum substitution shift is Toeplitz.

Sketch of Proof. We show that there is no period $p_1$ such that $\rho_1 = \rho_{1+kp_1}$ for all $k$. First assume by contradiction that $p_1$ is odd. Then $\{2^n \bmod p_1 : n \geq 1\} = \{1, \dots, p_1 - 1\}$; in fact, $2^n \bmod p_1$ traverses these residue classes periodically. Therefore $2^n + 3 = kp_1$ for infinitely many $k, n \in \mathbb{N}$. Since $\rho_1 \rho_2 \rho_3 \rho_4 = 1001$ is the opposite word to $\rho_{2^n+1} \rho_{2^n+2} \rho_{2^n+3} \rho_{2^n+4} = 0110$, we get $\rho_1 \neq \rho_{2^n+4}$, so the period cannot be an odd $p_1$.
However, if $p_1$ is even, then $kp_1 + 1$ is odd for all $k \in \mathbb{N}$. If we divide $\rho$ into blocks of length 2, then $kp_1 + 1$ is always the first symbol of such a block. By taking the inverse of the substitution (which fixes $\rho$), it follows that $p_1/2$ is also a period of $\rho_1$. Continuing by induction, we find that $\rho_1$ has an odd period after all, contradicting the first half of this proof.
There is no infinite arithmetic progression $k_j = jq + r$ such that $\rho_{k_j}$ is the same for all $j$. This follows from estimates of the possible lengths of arithmetic progressions in the Thue-Morse sequence by Pashina [448]. ∎

Lemma 4.88. A Toeplitz shift (Xq , σ) is uniformly rigid and hence minimal.

Proof. We give the proof for one-sided Toeplitz sequences; the proof for two-sided sequences is analogous. Let $[x_1x_2\cdots x_n]$ be any cylinder set. Then every digit $x_i$ reappears with gap $q_i$. Hence, if $L_n = \mathrm{lcm}(q_1, \dots, q_n)$ is the least common multiple of $q_1, \dots, q_n$, then $\sigma^{kL_n}([x_1x_2\cdots x_n]) \subset [x_1x_2\cdots x_n]$ for all $k \in \mathbb N$. This is uniform rigidity. The minimality of the corresponding subshift follows from Lemma 2.24 and Corollary 2.20. □

The way to build up a Toeplitz sequence in $\{0,1\}^{\mathbb N}$ or $\{0,1\}^{\mathbb Z}$ is to start with $x_1 = 1$, choose $q_1$, and set $x_{1+kq_1} = 1$ for all $k \in \mathbb N$ (or $k \in \mathbb Z$ for a two-sided Toeplitz sequence, but we will focus on one-sided Toeplitz sequences). The rest of the entries get a “temporary ∗”: $x_i = *$. Next set $x_2 = 0$, choose $q_2$ (not coprime with $q_1$), and set $x_{2+kq_2} = 0$. Continuing this way
inductively, let xi be the first remaining temporary ∗’s and choose qi − i
a multiple of the period of the pattern of the remaining ∗’s. The periodic
sequence $Sk(q_j) \in \{0, 1, *\}^{\mathbb N}$ of the $j$-th step of this construction is called the
$q_j$-skeleton of the Toeplitz sequence.

Example 4.89. As an example of building a Toeplitz sequence from skeletons, take

    q1 = 3 :   1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗1∗∗ . . .
    q2 = 6 :   10∗1∗∗10∗1∗∗10∗1∗∗10∗1∗∗10∗ . . .
    q3 = 3 :   1011∗11011∗11011∗11011∗1101 . . .        (4.36)
    q4 = 12 :  1011011011∗11011011011∗1101 . . .
       ⋮
In most cases, qj+1 is a multiple of qj , but (4.36) shows that this is not
necessary. However, if q = (qj )j≥1 is such that qj divides qj+1 for all j ∈ N,
then we call q the periodic structure of the Toeplitz sequence x.
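The skeletons of Example 4.89 can be reproduced mechanically. In this sketch (function and variable names are hypothetical), each step is a triple (position, gap, symbol) read off from (4.36). The ASCII character '*' stands for ∗. The code raises an error if a step tries to overwrite a position that is already filled, which is the consistency condition of the construction.

```python
def build_skeletons(length, steps):
    """Successive skeletons of the Toeplitz construction, truncated to `length`.

    steps: list of (i, q, symbol); `symbol` is written at the 1-indexed
    positions i, i + q, i + 2q, ... Returns one string per step.
    """
    x = ["*"] * length
    skeletons = []
    for i, q, symbol in steps:
        for pos in range(i - 1, length, q):          # 0-indexed i-1, i-1+q, ...
            if x[pos] != "*":
                raise ValueError(f"position {pos + 1} is already filled")
            x[pos] = symbol
        skeletons.append("".join(x))
    return skeletons

steps = [(1, 3, "1"), (2, 6, "0"), (3, 3, "1"), (5, 12, "0")]
for (i, q, _), sk in zip(steps, build_skeletons(27, steps)):
    print(f"q = {q:2d}: {sk}")
```

Printing the four skeletons reproduces the rows of (4.36) up to the 27th symbol. The third step uses gap 3, which is not a multiple of the previous gap 6.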

This construction of skeletons yields an extension of Proposition 4.85.

Theorem 4.90. The one-sided sequence $x \in A^{\mathbb N}$ is Toeplitz if and only if there is a sequence of constant length substitutions $\chi_i : A_i \to A_{i-1}$ (mapping letters of $A_i$ to words over $A_{i-1}$) on finite alphabets $A_i$ with $A = A_0$ such that $\chi_i(a)$ starts with the same symbol for each $a \in A_i$, and $x = \lim_{i\to\infty} \chi_1 \circ \chi_2 \circ \cdots \circ \chi_i(a)$, $a \in A_i$ arbitrary.
Proof. Let $N_i = |\chi_i(a)|$ be the length of the words of the $i$-th substitution. By the condition that $x_1$ is the first symbol of $\chi_1(a)$ for all $a \in A_1$, we find $x_{1+kN_1} = x_1$ for all $k \in \mathbb N$. By composing $\chi_1 \circ \chi_2$, we obtain $x_1 \cdots x_{N_1} = x_{1+kN_1N_2} \cdots x_{N_1+kN_1N_2}$ for all $k \in \mathbb N$. In general, the initial block $x_1 \cdots x_{N_1N_2\cdots N_r}$ repeats with period $N_1N_2 \cdots N_rN_{r+1}$, so $x$ is Toeplitz.
Conversely, if $x = x_1x_2x_3\cdots$ is Toeplitz on alphabet $A_0$, then there is $N_1$ such that $x_{1+kN_1} = x_1$ for all $k \in \mathbb N$, and there is a finite collection of $N_1$-words $b_k$, $k = 1, \dots, K_1$, all starting with $x_1$, such that $x = b_{k_1}b_{k_2}b_{k_3}\cdots$. Consider $\{b_k\}_{k=1}^{K_1}$ as the letters of alphabet $A_1$, and define the substitution $\chi_1(b_k) = b_k$, mapping $b_k$ as a letter to $b_k$ as an $N_1$-word. Then $x = \chi_1(b_{k_1}b_{k_2}b_{k_3}\cdots)$. Since the $N_1$-words $b_{k_i}$ appear with their own gaps, $b_{k_1}b_{k_2}b_{k_3}\cdots \in A_1^{\mathbb N}$ is a Toeplitz sequence in its own right, and we can repeat the construction. □
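Here is a small sketch illustrating Theorem 4.90; the names are hypothetical. It evaluates $\chi_1 \circ \chi_2 \circ \cdots \circ \chi_i(a)$ for a finite list of constant-length substitutions. It then checks, within the resulting finite word, the repetition property used in the first half of the proof. As an assumed example, every $\chi_i$ is the Feigenbaum substitution $0 \to 11$, $1 \to 10$, whose images both start with 1. With this choice, the words obtained from $a = 0$ and $a = 1$ differ only in their last symbol, in line with "$a \in A_i$ arbitrary".

```python
def apply_substitution(rule, word):
    """Replace every letter of `word` by its image under `rule`."""
    return "".join(rule[c] for c in word)

def compose(substitutions, letter):
    """chi_1 o chi_2 o ... o chi_i (letter), for substitutions = [chi_1, ..., chi_i]."""
    word = letter
    for rule in reversed(substitutions):      # chi_i acts first, chi_1 last
        word = apply_substitution(rule, word)
    return word

def initial_blocks_repeat(x, lengths):
    """Check inside the finite word x that, for r = 0, 1, ..., the initial block
    of length N_1...N_r reappears at every multiple of N_1...N_{r+1}."""
    block_len = 1
    for n in lengths:
        period = block_len * n
        block = x[:block_len]
        if any(x[s:s + block_len] != block
               for s in range(0, len(x) - block_len + 1, period)):
            return False
        block_len = period
    return True

chi_feig = {"0": "11", "1": "10"}   # constant length 2, both images start with 1
subs = [chi_feig] * 8               # chi_1 = ... = chi_8 (illustrative choice)
x1, x0 = compose(subs, "1"), compose(subs, "0")
print(x1[:16])                                                   # 1011101010111011
print(x1[:-1] == x0[:-1], initial_blocks_repeat(x1, [2] * 8))    # True True
```

For contrast, the Thue-Morse word 0110 1001 … already fails at $r = 0$: $x_1 = 0$ but $x_3 = 1$.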

4.5.1. Regular Toeplitz Sequences. When constructing a Toeplitz sequence this way, at step $n$ we have an $L_n$-periodic sequence, where $L_n = \mathrm{lcm}(q_1, \dots, q_n)$. We call the Toeplitz sequence regular if $\frac{1}{L_n}\#\{1 \le i \le L_n : x_i = *\} \to 0$ as $n \to \infty$, where $x$ denotes the $n$-th skeleton. The official definition is slightly weaker:
Definition 4.91. A sequence $x \in A^{\mathbb N}$ or $A^{\mathbb Z}$ is a regular Toeplitz sequence if it is the limit of skeletons $Sk(L_n) \in (A \cup \{*\})^{\mathbb N}$ or $(A \cup \{*\})^{\mathbb Z}$ of period $L_n$ such that
$$\lim_{n\to\infty} \frac{Sk^*(L_n)}{L_n} = 0 \quad\text{for}\quad Sk^*(L_n) := \#\{1 \le i \le L_n : Sk(L_n)_i = *\}.$$
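Here is a concrete family of skeletons with vanishing ∗-density, offered as an illustration rather than taken from the text. Start from the one-symbol skeleton ∗. Repeatedly double the period and fill the first ∗ of the first copy, alternating the fill symbols 1, 0, 1, 0, …. The filled symbols agree with the Feigenbaum fixed point 1011 1010 … quoted in the proof of Proposition 4.102. The density $Sk^*(L_n)/L_n$ equals $2^{-n}$, so the limit in Definition 4.91 is 0. Names are hypothetical.

```python
from fractions import Fraction

def double_skeleton(v, symbol):
    """From one period v of an L-periodic skeleton, build one period of a
    2L-periodic skeleton: fill the first '*' of the first copy with `symbol`
    and keep the second copy unchanged."""
    i = v.index("*")
    return v[:i] + symbol + v[i + 1:] + v

def star_density(v):
    """Sk^*(L)/L for the skeleton with period word v."""
    return Fraction(v.count("*"), len(v))

v = "*"
for n, symbol in enumerate("1010101", start=1):   # fills alternate 1, 0, 1, 0, ...
    v = double_skeleton(v, symbol)
    print(n, len(v), star_density(v), v[:16])
```

Exact fractions are used so that the densities $1/2, 1/4, \dots$ print without rounding.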
Theorem 4.92. A regular Toeplitz shift has zero entropy.

Proof. We follow [381, Theorem 4.76]. Let $V(i)$ be the $L_i$-word in $(A \cup \{*\})^{L_i}$ obtained in the $i$-th step of the construction of Example 4.89; i.e. we now have an $L_i$-periodic skeleton $Sk(i) = V(i)^\infty \in (A \cup \{*\})^{\mathbb N}$. Let $r_i = |V(i)|_*$ be the number of ∗'s in $V(i)$. Then there are at most $(\#A)^{r_i}$ ways to fill in the ∗'s later on, and there are at most $(\#A)^{r_i}$ $L_i$-words in the Toeplitz sequence $x$ starting at a position $1 + kL_i$. Therefore $p_x(L_i) \le L_i(\#A)^{r_i}$, and
$$\lim_{i\to\infty} \frac{1}{L_i}\log p_x(L_i) \le \lim_{i\to\infty} \frac{\log L_i + r_i \log \#A}{L_i} = \log \#A \lim_{i\to\infty} \frac{r_i}{L_i} = 0.$$
Since $n \mapsto \log p_x(n)$ is subadditive, $\lim_n \frac1n \log p_x(n) = 0$ by Fekete's Lemma 1.15. □
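Consider the Feigenbaum sequence $\rho_{\rm feig}$, generated here by the substitution $0 \to 11$, $1 \to 10$ (our assumption). Its $2^i$-skeleton has a single ∗ per period, so $L_i = 2^i$ and $r_i = 1$, and the proof gives $p_x(L_i) \le 2L_i$. The sketch below (names ours) counts the distinct $L_i$-words in a finite prefix. Such counts are lower bounds for $p_x(L_i)$, so they must respect this bound, and $\log p / L_i$ visibly tends to 0.

```python
import math

def feigenbaum_prefix(n_iter):
    """Prefix of length 2**n_iter of the fixed point of 0 -> 11, 1 -> 10."""
    word = "1"
    for _ in range(n_iter):
        word = "".join({"0": "11", "1": "10"}[c] for c in word)
    return word

def count_factors(x, n):
    """Number of distinct length-n subwords of the finite word x."""
    return len({x[i:i + n] for i in range(len(x) - n + 1)})

x = feigenbaum_prefix(14)           # 16384 symbols
for i in range(1, 8):
    L = 2 ** i                      # L_i = 2^i and r_i = 1
    p = count_factors(x, L)
    print(L, p, p <= 2 * L, round(math.log(p) / L, 4))
```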

The following upper bound for the amorphic complexity of regular Toeplitz sequences was shown in [264].
Theorem 4.93. Let $(X_q, \sigma)$ be a Toeplitz shift with periodic structure $q = (q_j)_{j=1}^\infty$. Then the amorphic complexity satisfies
$$\mathrm{ac}(\sigma) \le \limsup_{j\to\infty} \frac{\log q_{j+1}}{-\log Sk^*(q_j)}.$$
In particular, if $q_{j+1} \le C_1 q_j^t$ and $Sk^*(q_j) \le C_2 q_j^{-u}$, then $\mathrm{ac}(\sigma) \le t/u$.
With some more work, and for the two-letter alphabet, we could improve the upper bound to $\mathrm{ac}(\sigma) \le \limsup_j \frac{\log q_j}{-\log Sk^*(q_j)}$. By stipulating further properties on the Toeplitz sequence, one can (see [264, Section 5]) give examples showing that this upper bound is sharp and also that for a dense set of values $a \in [1, \infty)$ (including $a = 1$), there is a regular Toeplitz shift with $\mathrm{ac}(\sigma) = a$.

Proof of Theorem 4.93. Note that the densities $Sk^*(q_j)$ are decreasing in $j$, and by regularity of the Toeplitz shift, $\lim_j Sk^*(q_j) = 0$. Choose $\delta > 0$ arbitrary and $m \in \mathbb N$ such that $2^{-m} < \delta$. Next choose $v$ arbitrary and $j \in \mathbb N$ such that $(2m+1)Sk^*(q_{j+1}) < v \le (2m+1)Sk^*(q_j)$.
We claim that $\mathrm{Sep}(\delta, v) \le q_{j+1}$. Indeed, assume by contradiction that there is a $(\delta, v)$-separated set $S$ with more than $q_{j+1}$ elements. Then at least two of them, say $x, y \in S$, share the same $q_{j+1}$-skeleton. This means that $x$ and $y$ differ in at most $q_{j+1}Sk^*(q_{j+1})$ positions in every $q_{j+1}$-block. Since $d(\sigma^k(x), \sigma^k(y)) \ge \delta > 2^{-m}$ only if $x_i \ne y_i$ for some $i$ with $|i - k| \le m$, each position where $x$ and $y$ differ accounts for at most $2m+1$ such $k$, so, up to boundary terms that vanish as $n \to \infty$,
$$\frac{1}{nq_{j+1}}\#\{0 \le k < nq_{j+1} : d(\sigma^k(x), \sigma^k(y)) \ge \delta\} \le \frac{2m+1}{nq_{j+1}}\#\{0 \le k < nq_{j+1} : x_k \ne y_k\} \le (2m+1)Sk^*(q_{j+1}).$$
When taking the limit $n \to \infty$, we get a contradiction with the choice of $j$, because $(2m+1)Sk^*(q_{j+1}) < v$. This proves the claim.
Therefore $\mathrm{Sep}(\delta, v) \le q_{j+1}$. Take logarithms and divide the left- and right-hand sides by $-\log v$ and by $-\log\big((2m+1)Sk^*(q_j)\big) \le -\log v$, respectively. This gives
$$\frac{\log \mathrm{Sep}(\delta, v)}{-\log v} \le \frac{\log q_{j+1}}{-\log(2m+1) - \log Sk^*(q_j)}.$$
Note that $m$ depends only on $\delta$. Thus taking the superior limit $v \to 0$ (and hence $j \to \infty$), we obtain $\mathrm{ac}(\sigma) \le \limsup_j \frac{\log q_{j+1}}{-\log Sk^*(q_j)}$ as claimed. □
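The following worked instance is our own computation, not taken from [264]. Consider the Feigenbaum sequence with periodic structure $q_j = 2^j$. Its $q_j$-skeleton leaves exactly one ∗ in each period of length $2^j$. Reading $Sk^*(q_j)$ as the density of ∗'s, as in the proof above, this gives $Sk^*(q_j) = 2^{-j}$. The bound of Theorem 4.93 then reads

$$\frac{\log q_{j+1}}{-\log Sk^*(q_j)} = \frac{(j+1)\log 2}{j\log 2} \xrightarrow[j\to\infty]{} 1, \qquad\text{so}\qquad \mathrm{ac}(\sigma) \le 1.$$

More generally, suppose $q_{j+1} \le C_1 q_j^t$ and $Sk^*(q_j) \le C_2 q_j^{-u}$ with $u > 0$ and $q_j \to \infty$. Then for all large $j$

$$\frac{\log q_{j+1}}{-\log Sk^*(q_j)} \le \frac{\log C_1 + t\log q_j}{u\log q_j - \log C_2} \xrightarrow[j\to\infty]{} \frac{t}{u},$$

which is where the bound $t/u$ in the "in particular" part comes from. In the Feigenbaum example, $C_1 = 2$, $t = 1$, $C_2 = 1$ and $u = 1$.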

Theorem 4.94. For every real number $K \ge 0$, there is a Toeplitz shift $(X, \sigma)$ such that $h_{top}(\sigma) = K$. For every real number $K \ge 1$, there is a Toeplitz shift $(X, \sigma)$ that has polynomial word-complexity with exponent $K$; i.e. $\lim_{n\to\infty} \frac{\log p(n)}{\log n} = K$.

Proof. We start with the positive entropy Toeplitz sequence, following [381, Theorem 4.77], which in turn follows [560]. Let $A$ be an alphabet such that $\log \#A \ge 2K$ and take a sequence $(k_i)_{i\ge0}$ such that $\prod_{i=0}^\infty (1 - \frac{1}{k_i}) = \frac{2K}{\log \#A} \in (0, 1)$. Start with an $L_0$-word $V(0)$ containing $r_0 = L_0/2$ symbols ∗. We construct the $i$-th skeleton $V(i)^\infty$ with $|V(i)| = L_i$ recursively. Given $V(i)$, let $W(i)$ be the concatenation of the $(\#A)^{r_i}$ copies of $V(i)$ in which the $r_i$ symbols ∗ are replaced by each of the $(\#A)^{r_i}$ words of length $r_i$ over $A$. Then set
$$V(i+1) := W(i)\, V(i)^{(\#A)^{r_i}(k_i - 1)}$$
so that $L_{i+1} = |V(i+1)| = k_i(\#A)^{r_i}L_i$, each non-∗ symbol in $V(i)$ returns with periodic gap $\le L_i$, and $V(i+1)$ contains $r_{i+1} = (k_i - 1)(\#A)^{r_i}r_i$ symbols ∗; that is, $\frac{r_{i+1}}{L_{i+1}} = \frac{k_i - 1}{k_i}\cdot\frac{r_i}{L_i}$.
It follows that $\lim_i \frac{r_i}{L_i} = \frac{r_0}{L_0}\prod_{i=0}^\infty (1 - \frac{1}{k_i}) = \frac{r_0}{L_0}\cdot\frac{2K}{\log \#A} > 0$ (so regularity fails). The number of $L_i$-words $p(L_i)$ is bounded below by $(\#A)^{r_i}$ (namely the words that start at a position $1 + kL_i$) and bounded above by $L_i(\#A)^{r_i}$ (all starting positions). Therefore
$$\frac{r_i \log \#A}{L_i} \le \frac{\log p(L_i)}{L_i} \le \frac{\log L_i + r_i \log \#A}{L_i},$$
whence $\lim_i \frac{\log p(L_i)}{L_i} = \frac{r_0 \log \#A}{L_0}\prod_{i=0}^\infty (1 - \frac{1}{k_i}) = \frac{r_0}{L_0}\cdot 2K = K$. Therefore the topological entropy is $\lim_n \frac{\log p(n)}{n} = K$ by Fekete's Lemma 1.15.
We will not give the examples with polynomial word-complexity $\lim_n \frac{\log p(n)}{\log n} = K \ge 1$, but the technique is the same. □
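The length and ∗-count recursions can be checked on a toy instance. This sketch builds $V(i+1) = W(i)\,V(i)^{(\#A)^{r_i}(k_i-1)}$ literally. It then asserts $|V(i+1)| = k_i(\#A)^{r_i}L_i$ and $r_{i+1} = (k_i - 1)(\#A)^{r_i}r_i$, both of which follow by counting copies of $V(i)$. The names and parameters are hypothetical: $\#A = 2$, $V(0) = 0{*}$ (so $L_0 = 2$ and $r_0 = L_0/2 = 1$), then $k_0 = 2$ and $k_1 = 3$. These parameters are far too small to realize a prescribed entropy $K$. The words grow doubly exponentially, so only two steps are feasible.

```python
from itertools import product

def next_level(v, alphabet, k):
    """V(i+1) = W(i) V(i)^{(#A)^{r_i} (k - 1)}, where W(i) lists V(i) with its
    '*'s filled in by every word of length r_i over the alphabet."""
    r = v.count("*")
    copies = []
    for fill in product(alphabet, repeat=r):      # all (#A)^r fillings
        symbols = iter(fill)
        copies.append("".join(next(symbols) if c == "*" else c for c in v))
    return "".join(copies) + v * (len(alphabet) ** r * (k - 1))

v, alphabet = "0*", "01"          # hypothetical V(0): L_0 = 2, r_0 = 1
for k in (2, 3):                  # hypothetical k_0 = 2, k_1 = 3
    L, r = len(v), v.count("*")
    v = next_level(v, alphabet, k)
    assert len(v) == k * len(alphabet) ** r * L                # L_{i+1}
    assert v.count("*") == (k - 1) * len(alphabet) ** r * r    # r_{i+1}
    print(len(v), v.count("*"), v.count("*") / len(v))         # ratio gains (1 - 1/k)
```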

4.5.2. Adding Machines. Just like the more general enumeration systems
in Section 5.3, adding machines are a class of symbolic systems that are
not subshifts. They are also called odometers¹³, after the device in a car to
measure distance. Such an odometer consists of a number of disks, with the
digits 0, . . . , 9 written on the edge. A single “tick” moves the rightmost disk
by one unit, and if the 9 is passed (so the disk is back at position 0), it ticks
over the second disk by one unit. A mathematical odometer has infinitely
many disks, and the number of digits may vary from disk to disk.
The most common one is the dyadic adding machine or dyadic odometer $a : \Sigma \to \Sigma$ for $\Sigma = \{0,1\}^{\mathbb N}$. For $x \in \Sigma$, let $k = \inf\{i : x_i = 0\}$. Then $a$ is defined as
$$a(x)_i = \begin{cases} 0, & i < k,\\ 1, & i = k,\\ x_i, & i > k. \end{cases} \tag{4.37}$$
Also, if $x = 111\cdots$, so $k = \infty$, then $a(x) = 000\cdots$.

In more generality, we can choose a sequence $p := (p_k)_{k\ge1}$ of integers $p_k \ge 2$ and define $a$ on $\Sigma_p := \{(x_k)_{k\ge1} : x_k \in \{0, 1, \dots, p_k - 1\}\}$ analogously to (4.37). It is also instructive to view this procedure algorithmically, as “add one and carry”:

    c := 1 ; k := 1
    Repeat
        s := x_k + c
        If s ≥ p_k then c := 1 else c := 0            (4.38)
        x_k := s mod p_k ; k := k + 1
    Until c = 0

In fact, $\Sigma_p$ is a group under the same rule of add and carry, and $a : \Sigma_p \to \Sigma_p$ is invertible.

¹³ After the ancient Greek words ὁδός and μέτρον for road and measure.
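The following is a direct transcription of (4.38), as a sketch. It acts on a finite truncation $(x_1, \dots, x_n)$ of a point of $\Sigma_p$. A carry out of the last wheel is simply dropped, so the map becomes addition of 1 modulo $p_1 \cdots p_n$. With all $p_k = 2$ it reduces to the dyadic odometer (4.37). The wheel sizes in the example are hypothetical.

```python
def odometer_step(x, p):
    """One step of a on a truncation x = [x_1, ..., x_n], following (4.38).

    p = [p_1, ..., p_n] with p_k >= 2; a carry out of the last wheel is dropped,
    so on truncations a acts as +1 modulo p_1 * ... * p_n.
    """
    x = list(x)                  # do not modify the caller's list
    c = 1
    for k in range(len(x)):
        s = x[k] + c
        c = 1 if s >= p[k] else 0
        x[k] = s % p[k]
        if c == 0:               # "Until c = 0"
            break
    return x

p = [2, 3, 5]                    # hypothetical wheel sizes
orbit = [[0, 0, 0]]
for _ in range(2 * 3 * 5):
    orbit.append(odometer_step(orbit[-1], p))
print(orbit[:5])     # [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 2, 0]]
print(orbit[-1] == orbit[0], len({tuple(y) for y in orbit[:-1]}))   # True 30

# Dyadic case (4.37): the first 0 becomes 1 and the 1's before it become 0.
print(odometer_step([1, 1, 0, 1], [2, 2, 2, 2]))                    # [0, 0, 1, 1]
```

On the truncation, every point returns after exactly $p_1p_2p_3 = 30$ steps, having visited all 30 states once.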
Proposition 4.95. Every odometer $\Sigma_p$ is a topological group under addition.
Proof. The addition $z = x + y$ of two sequences $x, y \in \Sigma_p$ with add and carry goes according to the algorithm:

    c := 0 ; k := 1
    Repeat for all k ∈ ℕ
        s := x_k + y_k + c
        If s ≥ p_k then c := 1 else c := 0
        z_k := s mod p_k ; k := k + 1

It is straightforward to check that this is commutative and continuous in $x$ and $y$. □
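The addition algorithm in the proof can be sketched the same way, again on finite truncations. The final carry is dropped, so this is addition modulo $p_1 \cdots p_n$. The helpers `to_integer` and `from_integer` (our names) translate between a truncation and the integer $x_1 + x_2p_1 + x_3p_1p_2 + \cdots$ it represents, which makes the group structure easy to test.

```python
def odometer_add(x, y, p):
    """z = x + y with add and carry on truncations; the final carry is dropped."""
    z, c = [], 0
    for xk, yk, pk in zip(x, y, p):
        s = xk + yk + c
        c = 1 if s >= pk else 0
        z.append(s % pk)
    return z

def to_integer(x, p):
    """The integer x_1 + x_2 p_1 + x_3 p_1 p_2 + ... represented by a truncation."""
    value, weight = 0, 1
    for xk, pk in zip(x, p):
        value += xk * weight
        weight *= pk
    return value

def from_integer(n, p):
    """Mixed-radix digits of n (least significant first) for wheel sizes p."""
    digits = []
    for pk in p:
        n, d = divmod(n, pk)
        digits.append(d)
    return digits

p = [2, 3, 5]                                    # hypothetical wheel sizes
print(odometer_add([1, 2, 3], [1, 1, 4], p))     # [0, 1, 3]
print(to_integer([1, 2, 3], p), to_integer([1, 1, 4], p), to_integer([0, 1, 3], p))  # 23 27 20
```

Since $23 + 27 = 50 \equiv 20 \pmod{30}$, the digitwise carry rule agrees with integer addition modulo $p_1p_2p_3$.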
Exercise 4.96. Show that an odometer (Σp , a) is conjugate to its own
inverse (Σp , a−1 ).
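Remark 4.97. There is a common alternative way to write adding machines. Given $p = (p_j)_{j\in\mathbb N}$, define a sequence $q = (q_j)_{j\in\mathbb N_0}$ by $q_0 = 1$ and $q_j = \prod_{k=1}^j p_k$. Set
$$\tilde\Sigma_q = \{y = (y_j)_{j=1}^\infty : y_j \in \{0, \dots, q_j - 1\},\ q_{j-1} \text{ divides } (y_j - y_{j-1}) \text{ for all } j \in \mathbb N\},$$
where $y_0 = 0$ by convention. Define $b : \Sigma_p \to \tilde\Sigma_q$ by
$$b(x)_k = \sum_{j=1}^k x_j q_{j-1} \quad\text{with inverse}\quad b^{-1}(y)_k = \frac{y_k - y_{k-1}}{q_{k-1}}. \tag{4.39}$$
Then $b$ is a homeomorphism, and
$$b \circ a = \tilde a \circ b, \quad\text{where}\quad \tilde a(y)_k = y_k + 1 \bmod q_k \text{ for all } k \in \mathbb N.$$
If car odometers were constructed as $\tilde\Sigma_q$, then $q_j = 10^j$, and the entry $y_n = \sum_{j=1}^n x_j 10^{j-1}$ on the odometer would be the total number of kilometers driven mod $10^n$.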
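Remark 4.97 can be checked on truncations with the following sketch; the helper names are hypothetical.
- `b_map` computes $b(x)_k = \sum_{j\le k} x_jq_{j-1}$.
- `b_inverse` undoes it via (4.39).
- `a_tilde` is $\tilde a$.
- `odometer_step` is a transcription of (4.38) with the final carry dropped.

The car-odometer example uses $p_j = 10$, so $q_j = 10^j$ and the last entry of $b(x)$ is the kilometer count mod $10^n$.

```python
def odometer_step(x, p):
    """a on a truncation: add one and carry (4.38); a carry out of the last wheel is dropped."""
    x, c = list(x), 1
    for k in range(len(x)):
        s = x[k] + c
        c = 1 if s >= p[k] else 0
        x[k] = s % p[k]
        if c == 0:
            break
    return x

def b_map(x, p):
    """b(x)_k = sum_{j<=k} x_j q_{j-1}, with q_0 = 1 and q_j = p_1 * ... * p_j."""
    y, total, q = [], 0, 1
    for xk, pk in zip(x, p):
        total += xk * q          # add x_k q_{k-1}
        q *= pk                  # q is now q_k
        y.append(total)
    return y

def b_inverse(y, p):
    """b^{-1}(y)_k = (y_k - y_{k-1}) / q_{k-1}, with y_0 = 0."""
    x, prev, q = [], 0, 1
    for yk, pk in zip(y, p):
        x.append((yk - prev) // q)
        prev, q = yk, q * pk
    return x

def a_tilde(y, p):
    """a~(y)_k = y_k + 1 mod q_k."""
    out, q = [], 1
    for yk, pk in zip(y, p):
        q *= pk
        out.append((yk + 1) % q)
    return out

p = [10, 10, 10, 10]             # a car odometer with four wheels
x = [9, 9, 9, 1]                 # digits of 1999, least significant first
print(b_map(x, p))                         # [9, 99, 999, 1999]
print(b_map(odometer_step(x, p), p))       # [0, 0, 0, 2000]
print(a_tilde(b_map(x, p), p))             # [0, 0, 0, 2000], since b o a = a~ o b
```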
Remark 4.98. There is yet another, less common, way to write the adding machine, provided all the $p_i$'s are pairwise coprime. Let $\hat\Sigma_p = \Sigma_p$ and define $\hat a : \hat\Sigma_p \to \hat\Sigma_p$ as
$$\hat a(y)_i = y_i + 1 \bmod p_i \quad\text{for all } i \ge 1.$$
Then $(\Sigma_p, a)$ and $(\hat\Sigma_p, \hat a)$ are conjugate via $\psi : \Sigma_p \to \hat\Sigma_p$ defined as
$$\psi(x)_i = \sum_{j=1}^{i} x_j \prod_{l=0}^{j-1} p_l \bmod p_i \quad\text{with } p_0 = 1.$$
The inverse of this map $\psi$ can be computed using the Chinese Remainder Theorem. It states that whenever $p_1, \dots, p_k$ are pairwise coprime integers greater than 1, $N = \prod_{i=1}^k p_i$, and $0 \le a_i < p_i$ are given integers, the congruences
$$x \bmod p_i = a_i, \quad 1 \le i \le k, \tag{4.40}$$
have a unique solution $0 \le x < N$. A constructive solution can be found inductively. Since $\gcd(p_1, p_2) = 1$, Bézout's identity (effectively the Euclidean algorithm) gives $n_1, n_2 \in \mathbb Z$ such that $n_1p_1 + n_2p_2 = 1$. Then $x = a_{1,2} := a_1n_2p_2 + a_2n_1p_1 \bmod p_1p_2$ solves the first two congruence equations. Now replace these first two congruence equations by $x \equiv a_{1,2} \bmod p_1p_2$ and continue by induction. This inductive procedure also shows that if we increase $k$ to $k+1$, the new solution is in the same congruence class mod $N$ as the previous one. Hence $\psi^{-1}(y)$ can be computed term by term.
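Here is a sketch of $\psi$ and of its inverse via the inductive Chinese Remainder construction just described; the function names are ours.
- `extended_gcd` supplies the Bézout coefficients $n_1, n_2$.
- `crt` merges the congruences two at a time, as in the text.
- `odometer_step` transcribes (4.38) with the final carry dropped.

The final check confirms $\psi \circ a = \hat a \circ \psi$ and $\psi^{-1} \circ \psi = \mathrm{id}$ on all truncations for the hypothetical coprime wheel sizes $p = (2, 3, 5, 7)$.

```python
from itertools import product

def extended_gcd(a, b):
    """Return (g, s, t) with s*a + t*b = g = gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, s, t = extended_gcd(b, a % b)
    return g, t, s - (a // b) * t

def crt(residues, moduli):
    """Unique 0 <= x < prod(moduli) with x = a_i mod p_i, for pairwise coprime p_i."""
    x, m = 0, 1
    for a, p in zip(residues, moduli):
        g, n1, n2 = extended_gcd(m, p)            # n1*m + n2*p = 1 (Bezout)
        if g != 1:
            raise ValueError("moduli must be pairwise coprime")
        x = (x * n2 * p + a * n1 * m) % (m * p)   # a_{1,2} = a_1 n_2 p_2 + a_2 n_1 p_1
        m *= p
    return x

def odometer_step(x, p):
    """a on a truncation (4.38); a carry out of the last wheel is dropped."""
    x, c = list(x), 1
    for k in range(len(x)):
        s = x[k] + c
        c = 1 if s >= p[k] else 0
        x[k] = s % p[k]
        if c == 0:
            break
    return x

def psi(x, p):
    """psi(x)_i = (x_1 + x_2 p_1 + ... + x_i p_1...p_{i-1}) mod p_i."""
    out, total, q = [], 0, 1
    for xi, pi in zip(x, p):
        total += xi * q
        q *= pi
        out.append(total % pi)
    return out

def psi_inverse(y, p):
    """CRT recovers the integer x_1 + x_2 p_1 + ...; then read off its mixed-radix digits."""
    n, x = crt(y, p), []
    for pi in p:
        n, digit = divmod(n, pi)
        x.append(digit)
    return x

p = [2, 3, 5, 7]
print(crt([1, 2, 3], [2, 3, 5]))                  # 23
ok = all(psi(odometer_step(list(x), p), p) == [(yi + 1) % pi for yi, pi in zip(psi(list(x), p), p)]
         and psi_inverse(psi(list(x), p), p) == list(x)
         for x in product(*(range(pi) for pi in p)))
print(ok)                                         # True
```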
Proposition 4.99. Every odometer is uniformly rigid and hence periodically recurrent.
Proof. Let $\varepsilon > 0$ be arbitrary and take $k$ such that $2^{-k} < \varepsilon$. Let $q_k = p_1p_2\cdots p_k$. Then $a^{q_k}(x)_i = x_i$ for all $i \le k$; i.e. $d(a^{q_k}(x), x) < \varepsilon$ as required. Periodic recurrence follows by Lemma 2.24. □
Proposition 4.100. Every odometer is strictly ergodic; i.e. it is minimal
and has a unique invariant probability measure; see Section 6.3.

Proof. Given any n-cylinder Z, every x ∈ Σp will visit it exactly once in

every p1 p2 · · · pn iterates of a. Therefore orba (x) is dense in Σp and the only
a-invariant probability measure has μ(Z) = (p1 p2 · · · pn )−1 . 
Proposition 4.101. Every odometer is an isometry, and hence of zero entropy.
Proof. Let $x, y \in \Sigma_p$ and $n = \min\{i \ge 1 : x_i \ne y_i\}$, so $d(x, y) = 2^{-n}$. The algorithmic definition of $a$ shows that $\min\{i : a(x)_i \ne a(y)_i\} = n$ as well, because the carry into each of the first $n$ coordinates is the same for $x$ and $y$. Therefore $a$ is an isometry, and in particular equicontinuous. Proposition 2.49 shows that $h_{top}(a) = 0$. □

Odometers can be classified by the structure of the sequence $p = (p_i)_{i\in\mathbb N}$. There is no restriction in assuming that all $p_i$ are primes, because otherwise, i.e. if $p_i$ is the product of $k$ primes, we can replace the $i$-th “wheel” of the odometer by $k$ wheels, each with a prime number of digits. Define $k_p : \{\text{primes}\} \to \{0, 1, 2, \dots, \infty\}$ by setting $k_p(n) = \#\{i \in \mathbb N : p_i = n\}$. It is shown in e.g. [208] that $(\Sigma_p, a)$ and $(\Sigma_{p'}, a)$ are conjugate if and only if $k_p \equiv k_{p'}$. Also $(\Sigma_p, a)$ is a factor of $(\Sigma_{p'}, a)$ if and only if $k_p(n) \le k_{p'}(n)$ for every prime $n$. Therefore the only proper factors of simple odometers, i.e. those odometers for which all $p_i$'s are the same prime, are finite cyclic groups. All non-simple odometers have other odometers as factors.
Proposition 4.102. An odometer has no subshift other than periodic subshifts as continuous factors. However, an odometer can be a factor of a subshift.
Proof. Clearly the restriction of $a$ to the first $n$ digits gives a $p_1p_2\cdots p_n$-periodic orbit. However, since $a$ is an isometry, it cannot have an expansive continuous factor, and by Proposition 2.38, all non-periodic transitive subshifts are expansive.
Conversely, take the Feigenbaum substitution shift $(X_{\rm feig}, \sigma)$ with $X_{\rm feig} = \overline{\mathrm{orb}_\sigma(\rho_{\rm feig})}$ for the fixed point
$$\rho_{\rm feig} = \rho_1\rho_2\rho_3\cdots = 1011\,1010\,1011\,1011\,1011\,1010\,1011\,1010\,1011\cdots.$$
The shift is invertible on $X_{\rm feig}$, except that $\rho_{\rm feig}$ itself has two preimages $0\rho_{\rm feig}$ and $1\rho_{\rm feig}$. We define a factor map $\varphi$ onto the inverse dyadic odometer $(\Sigma, a^{-1})$, for $\Sigma = \{0,1\}^{\mathbb N}$. Since odometers are conjugate to their own inverses (see Exercise 4.96), this gives a factor map onto $(\Sigma, a)$ too. Carry out the following algorithm:
$$\begin{aligned}
y_1 &:= \min\{n \ge 1 : x_n = 0\} \bmod 2,\\
y_2 &:= \min\{n \ge 1 : x_{y_1+2n} = 1\} \bmod 2,\\
y_3 &:= \min\{n \ge 1 : x_{y_1+2y_2+4n} = 0\} \bmod 2,\\
y_4 &:= \min\{n \ge 1 : x_{y_1+2y_2+4y_3+8n} = 1\} \bmod 2,\\
&\ \ \vdots
\end{aligned}$$
and set $\varphi(x) = y$. For example, we get
$$\varphi(\rho_{\rm feig}) = 0000\cdots \quad\text{and}\quad \varphi\circ\sigma(\rho_{\rm feig}) = 1111\cdots.$$
Note that this is not a sliding block code, since the windows to consider to determine $y_i$ increase with $i$. However, $\varphi$ is continuous, and one can check that $\varphi \circ \sigma = a^{-1} \circ \varphi$. The above minima are taken over $n \ge 1$. Therefore $\varphi(0\rho_{\rm feig}) = \varphi(1\rho_{\rm feig})$, and in fact $\varphi(\sigma^{-k}(0\rho_{\rm feig})) = \varphi(\sigma^{-k}(1\rho_{\rm feig}))$ for all $k \ge 0$. □
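The map $\varphi$ can be tested numerically on a long prefix of $\rho_{\rm feig}$. In the sketch below (names ours; substitution $0 \to 11$, $1 \to 10$ assumed; $x$ indexed from 1 as in the algorithm), level $j$ searches along $x_{y_1 + 2y_2 + \cdots + 2^{j-2}y_{j-1} + 2^{j-1}n}$. It looks for the symbol 0 when $j$ is odd and for 1 when $j$ is even, which is the reading that produces $\varphi(\rho_{\rm feig}) = 000\cdots$ and $\varphi\circ\sigma(\rho_{\rm feig}) = 111\cdots$. The loop then checks $\varphi(\sigma^k(\rho_{\rm feig})) = a^{-k}(000\cdots)$ on the first 8 coordinates for $k < 16$.

```python
def feigenbaum_prefix(n_iter):
    """Prefix of length 2**n_iter of rho_feig (substitution 0 -> 11, 1 -> 10)."""
    word = "1"
    for _ in range(n_iter):
        word = "".join({"0": "11", "1": "10"}[c] for c in word)
    return word

def phi(x, levels):
    """First `levels` coordinates of phi(x); the string entry x[0] is x_1."""
    y, offset = [], 0                      # offset = y_1 + 2 y_2 + ... + 2^(j-2) y_{j-1}
    for j in range(1, levels + 1):
        target = "0" if j % 2 == 1 else "1"
        step, n = 2 ** (j - 1), 1
        while x[offset + step * n - 1] != target:   # symbol x_{offset + step*n}
            n += 1
        y.append(n % 2)
        offset += step * (n % 2)
    return y

def dyadic_minus_one(y):
    """a^{-1} on a truncation of the dyadic odometer: subtract one with borrow."""
    y = list(y)
    for i, d in enumerate(y):
        if d == 1:
            y[i] = 0
            return y
        y[i] = 1
    return y

rho = feigenbaum_prefix(13)                 # 8192 symbols, far more than needed
print(phi(rho, 8), phi(rho[1:], 8))         # all zeros, then all ones
expected = [0] * 8
for k in range(16):
    assert phi(rho[k:], 8) == expected      # phi(sigma^k rho) = a^{-k}(000...)
    expected = dyadic_minus_one(expected)
```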

Theorem 4.103. Let (Xq , σ) be a Toeplitz shift with periodic structure q and
assume that p = (pi )i≥1 with p1 = q1 , pi = qi /qi−1 is an integer sequence.
Then (Σp , a) is the maximal equicontinuous factor of (Xq , σ), and (Xq , σ)
is a non-trivial almost one-to-one extension of (Σp , a).

By [208], a minimal system $(X, T)$ is an almost one-to-one extension of an odometer if and only if $X$ is the orbit closure of a periodically recurrent point. Let us denote the factor map by $\pi$. Then the periodically recurrent points $x \in X$ are exactly those with singleton fibers: $\pi^{-1}\circ\pi(x) = \{x\}$. If $\pi$ is a homeomorphism, then $(X, T)$ is an odometer. The Feigenbaum shift of Example 4.86 demonstrates non-singleton fibers:
$$\sigma^{-1}(\rho_{\rm feig}) = \{0\rho_{\rm feig}, 1\rho_{\rm feig}\} = \pi^{-1}\circ a^{-1}\circ\pi(\rho_{\rm feig}) = \pi^{-1}\circ a^{-1}(0^\infty) = \pi^{-1}(1^\infty).$$

Proof. Let $X_q$ be the orbit closure of the Toeplitz sequence $x$ with periodic structure $q$. Let $Sk(q_j)$ be the $j$-th skeleton of $x$, so it is a $q_j$-periodic sequence in $(A \cup \{*\})^{\mathbb N}$. For $y \in X_q$, define
$$\pi_j(y) = r \in \{0, \dots, q_j - 1\} \quad\text{if } y_i = Sk(q_j)_{i+r} \text{ whenever } Sk(q_j)_{i+r} \ne *.$$
Therefore $\pi_j(\sigma^n y) = \pi_j(y) + n \bmod q_j$, so $\pi_j$ is surjective, and $\pi_j^{-1}(r)$, $r = 0, \dots, q_j - 1$, are $q_j$ disjoint clopen sets in $X_q$. For $y \in X_q$, it may not be clear from the first $q_j$ entries what $\pi_j(y)$ is. However, for every $j$, there is $m_j$ such that the first $m_j$ entries determine the value of $\pi_j(y)$. Therefore $\pi_j$ is continuous.
Note that $\pi_j(y) - \pi_{j-1}(y)$ is always a multiple of $q_{j-1}$. Thus we can define $\pi : X_q \to \tilde\Sigma_q$ by
$$\pi(y)_j = \pi_j(y).$$
Then $\pi^{-1}(z) = \bigcap_j \pi_j^{-1}(z_j)$, as the intersection of nested non-empty closed sets, is itself non-empty. Thus $\pi$ is surjective, continuous, and $\pi \circ \sigma = \tilde a \circ \pi$, where $\tilde a$ is defined in Remark 4.97. Via $b$ we can recode $(\tilde\Sigma_q, \tilde a)$ to the adding machine $(\Sigma_p, a)$ as Remark 4.97 explains. This adding machine is thus a factor of the Toeplitz shift and, as with all adding machines, it is equicontinuous.
If we set $\tilde\pi = b^{-1} \circ \pi$, we see further that $\tilde\pi(\sigma^n(x)) = a^n(00000\cdots) =: (n)$ for each $n \in \mathbb N_0$ and that also $\tilde\pi^{-1}((n)) = \{\sigma^n(x)\}$. Therefore $(X_q, \sigma)$ is an almost one-to-one extension of $(\Sigma_p, a)$. However, there must be $z \in \Sigma_p$ such that $\#\tilde\pi^{-1}(z) \ge 2$, because otherwise $(\Sigma_p, a)$ would be conjugate to the (expansive) subshift $(X_q, \sigma)$, contradicting Proposition 4.102.
It follows from Theorem 2.43 that $(\Sigma_p, a)$ is the maximal equicontinuous factor of $(X_q, \sigma)$. □

The following result can be found in [381, Theorem 4.4].


Theorem 4.104. Every transitive equicontinuous dynamical system $(X, T)$ on the Cantor set $X$ is conjugate to an adding machine.
Proof. Recall from Proposition 2.31 that $(X, T)$ preserves the metric
$$d_\infty(x, y) := \sup_n d(T^n(x), T^n(y)).$$
Define $\varepsilon$-chain-connectedness as the equivalence relation
$$x \approx_\varepsilon y \quad\text{if there are } x = x_0, x_1, \dots, x_n = y \text{ such that } d_\infty(x_i, x_{i+1}) < \varepsilon.$$
Since $T$ preserves $d_\infty$, we have $x \approx_\varepsilon y$ if and only if $T(x) \approx_\varepsilon T(y)$, so $T$ permutes the equivalence classes of $\approx_\varepsilon$.
Take $\varepsilon_0$ maximal such that $X$ no longer consists of a single equivalence class of $\approx_{\varepsilon_0}$. By compactness there are finitely many, say $p_1$, equivalence classes. Since $T$ is transitive, it permutes these classes cyclically, so we can number them as $E_0, E_1, \dots, E_{p_1-1}$ where $T(E_i) = E_{i+1 \bmod p_1}$, and in particular, $T^{p_1}$ fixes each $E_i$.
Next take $\varepsilon_1 < \varepsilon_0$ maximal such that $E_0$ is not a single equivalence class for $\approx_{\varepsilon_1}$, and as before there are finitely many, say $p_2$, equivalence classes in $E_0$. Since $T$ is transitive, $T^{p_1}$ permutes these classes cyclically, and we can number them as $E_{00}, E_{01}, \dots, E_{0(p_2-1)}$ where $T^{p_1}(E_{0i}) = E_{0(i+1 \bmod p_2)}$. Furthermore, we can number the images of the $E_{0i}$ such that, for all $0 \le i < p_2$, we have $T(E_{ji}) = E_{(j+1)i}$ for $0 \le j \le p_1 - 2$ and $T(E_{(p_1-1)i}) = E_{0(i+1 \bmod p_2)}$. In particular, $T^{p_1p_2}$ fixes each $E_{ji}$.
Continuing this way, we see that $T$ permutes the sets $E_{x_1x_2\cdots x_n}$ in accordance with the map $a$ on the adding machine $\Sigma_p$ for the sequence $p = (p_j)_{j\ge1}$, and this carries over to the infinite intersections $E_{x_1x_2\cdots} = \bigcap_{n\ge1} E_{x_1x_2\cdots x_n}$. These intersections are in fact points, because $X$ is totally disconnected. This completes the proof. □

4.6. B-Free Shifts

In order to study his famous conjecture, Sarnak introduced B-free shifts in 2010, although B-free sets had already been studied since the 1930s. Our main source for this section is the monograph [231].
Definition 4.105. For a subset $B \subset \mathbb N$ and $F_B := \mathbb Z \setminus \{nb : b \in B, n \in \mathbb Z\}$, let $\eta := 1_{F_B}$ be the indicator function of $F_B$. That is, $\eta_k = 0$ if $k = nb$ for some $b \in B$ and $n \in \mathbb Z$; otherwise $\eta_k = 1$. Let $X_B = \overline{\mathrm{orb}_\sigma(\eta)}$ be the shift-orbit closure of $\eta$. The two-sided subshift $(X_B, \sigma)$ is called the B-free shift.

We will assume that $1 \notin B$, because otherwise $F_B = \emptyset$. More generally, we will assume that no element $b \in B$ is a multiple of some other $b' \in B$. This property, called primitive¹⁴, prevents $B$ from having unnecessary elements that don't change $F_B$ but might interfere with conditions put on $B$ later on.

Example 4.106. If $B = \{\text{prime numbers}\}$, then $F_B = \{-1, 1\}$ and
$$X_B = \{\sigma^n(\cdots 000101000 \cdots) : n \in \mathbb Z\} \cup \{0^\infty\}$$
is clearly a non-minimal shift. However, there is a minimal subshift $(\{0^\infty\}, \sigma)$ that every sequence in $X_B$ is asymptotic to, both in forward and backward time. If $B = \{pq : p, q \text{ are prime numbers}\}$, then $F_B = \{\pm\text{prime numbers}\} \cup \{-1, 1\}$; this is effectively the sieve of Eratosthenes¹⁵. Since there are arbitrarily long gaps between primes (i.e. $(F_B)^c$ is thick; see Definition 2.18), $(\{0^\infty\}, \sigma)$ is again a minimal subshift of $X_B$, and in this case, every $x \in X_B$ is proximal to $0^\infty$.
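Here is a quick sanity check of the second example; it is a sketch and the names are ours. For $|k| \le N$, only the elements $b \le N$ of $B$ can divide $k \ne 0$, and every $b$ divides $k = 0$. So truncating $B = \{pq\}$ at $N$ gives $\eta_k$ exactly on the window $-N \le k \le N$.

```python
def is_prime(n):
    return n >= 2 and all(n % d for d in range(2, int(n ** 0.5) + 1))

def eta_window(B, lo, hi):
    """eta_k for lo <= k <= hi: 0 if some b in B divides k, else 1."""
    return {k: 0 if any(k % b == 0 for b in B) else 1 for k in range(lo, hi + 1)}

N = 60
primes = [p for p in range(2, N + 1) if is_prime(p)]
B = {p * q for p in primes for q in primes if p * q <= N}    # {pq} truncated at N
eta = eta_window(B, -N, N)
print("".join(str(eta[k]) for k in range(-20, 21)))          # eta_{-20} ... eta_{20}
```

The ones sit exactly at $\pm1$ and at $\pm$ the primes, as claimed for $F_B$.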

The B-free sets date back to the first half of the 20th century; research from that time includes the question of under which conditions the density $d(F_B) = \lim_n \frac1n \#(F_B \cap \{1, 2, \dots, n\})$ exists; see [76, 155, 184, 186, 187]. Davenport & Erdös [186] showed that the logarithmic density $\delta(F_B)$ (see Definition 8.55) always exists and is equal to the upper density $\bar d(F_B)$. Besicovitch [75] gave the following sufficient condition for $d(F_B)$ to exist:
$$B \text{ is pairwise coprime and thin; i.e. } \sum_{b\in B} \frac1b < \infty. \tag{4.41}$$
Since $\bar d\big(\bigcup_{b\in B,\, b>K} b\mathbb Z\big) \le \sum_{b\in B,\, b>K} 1/b$, every thin sequence $B$ has light tails, which means that the upper densities $\bar d\big(\bigcup_{b\in B,\, b>K} b\mathbb Z\big) \to 0$ as $K \to \infty$.
The set $B$ might contain superfluous elements $b_0$ in the sense that $F_{B\setminus\{b_0\}}$ and its related shifts have the same properties as $F_B$ and its related shifts. A condition on $B$ to avoid superfluous elements is the following:
Definition 4.107. The set $B \subset \mathbb N$ is taut if the logarithmic densities satisfy
$$\delta\Big(\bigcup_{b\in B,\, b\ne b_0} b\mathbb Z\Big) < \delta\Big(\bigcup_{b\in B} b\mathbb Z\Big) \quad\text{for all } b_0 \in B;$$
that is, every $b_0$ has a significant contribution to $F_B$.

Having light tails implies tautness, but not the other way around.

¹⁴ This notion of primitive has nothing to do with primitive matrices.

¹⁵ The Greek geographer and mathematician Eratosthenes of Cyrene (276–195/194 BC) was the chief librarian of Alexandria in his time. His most famous achievement was an estimate of the circumference of the Earth.

4.6.1. Hereditary and Admissible B-Free Shifts. Apart from the B-free shift, the following two shifts related to B-free sets are of use.
Definition 4.108. The B-admissible shift is
$$X_B^{adm} = \{x \in \{0,1\}^{\mathbb Z} : \forall\, b \in B\ \exists\, a \in \mathbb N\ \forall\, n \in \mathbb Z\ \ x_{a+nb} = 0\}.$$
A subshift $X \subset \{0,1\}^{\mathbb N}$ or $\{0,1\}^{\mathbb Z}$ is hereditary if whenever $x \in X$ and $y_n \le x_n$ for all $n$, then also $y \in X$. The hereditary subshift $X_B^{her}$ is the smallest hereditary subshift containing $X_B$.

It is clear from the definitions that $X_B^{adm}$ is hereditary and
$$X_B \subset X_B^{her} \subset X_B^{adm}. \tag{4.42}$$
We summarize from [231] some of the properties of $X_B^{her}$ and $X_B^{adm}$:
• There are examples where these inclusions are strict (in particular, $X_B$ need not be hereditary). Indeed, for $B$ the primes as in Example 4.106, $X_B^{her} = X_B \cup \{\sigma^n(\cdots 0001000 \cdots) : n \in \mathbb Z\}$, and $X_B^{adm}$ is uncountable, because it is hereditary and contains a sequence with infinitely many ones, for instance $x = \cdots 000.010001\cdots$ (with 1's at positions $\sum_{j=1}^n p_j$, where $p_j$ is the $j$-th prime number).
• However, as shown in [2], condition (4.41) implies equality in (4.42). In fact, if $B$ is taut, then $X_B = X_B^{her}$; see [353, Theorem 3].
• Every set $B$ can be made taut in the sense that there exists a taut $B'$ such that $F_B \subset F_{B'}$, $X_B^{her} \subset X_{B'}^{her}$, and these spaces carry the same shift-invariant measures. For every $x \in X_B^{her}$ and $\varepsilon > 0$, the set $\{n \in \mathbb Z : \sigma^n(x) \notin B_\varepsilon(X_{B'}^{her})\}$ has zero density.
• If $B$ and $B'$ are both taut, then equality of $B$ and $B'$ is equivalent to $F_B = F_{B'}$, to $X_B = X_{B'}$, to $X_B^{her} = X_{B'}^{her}$, and also to $X_B^{adm} = X_{B'}^{adm}$.
• If $B$ has light tails, then the density $d(F_B)$ exists. Additionally, if $B$ is pairwise coprime and has light tails¹⁶, then $X_B = X_B^{her}$; i.e. we have equality in (4.42).
Proposition 4.109. Regardless of whether equality holds in (4.42) or not, the entropies are equal:
$$h_{top}(X_B^{her}, \sigma) = h_{top}(X_B^{adm}, \sigma) = \bar d(F_B) = \delta(F_B), \tag{4.43}$$
where $\delta(F_B)$ is the logarithmic density; see Definition 8.55.
Proof. We sketch the proof from [231, Proposition K and Theorem 2.28]. It is easy to see that $h_{top}(X_B^{her}, \sigma) \ge \bar d(F_B)$. Indeed, for every $\varepsilon > 0$ and infinitely many $n$, the first $n$ entries of $\eta$ contain at least $(\bar d(F_B) - \varepsilon)n$ ones. Since $X_B^{her}$ is hereditary, every way of replacing some of these ones by zeros gives a word of $X_B^{her}$, and therefore $p_{X_B^{her}}(n) \ge 2^{(\bar d(F_B) - \varepsilon)n}$. Since $h_{top}(X_B^{her}, \sigma) = \lim_n \frac1n \log p_{X_B^{her}}(n)$, the inequality $h_{top}(X_B^{her}, \sigma) \ge \bar d(F_B)$ follows. One step in [231, Proposition K] is therefore to show that the other inequality $p_{X_B^{her}}(n) \le 2^{(\bar d(F_B) + \varepsilon)n}$ holds for all large $n$. □

¹⁶ Keller [353] derives this conclusion under the weaker assumption that $B$ is taut and pairwise coprime.

Example 4.110. If $B = \{p^2 : p \text{ is prime}\}$, then $F_B$ is the set of square-free integers. In terms of the Möbius function¹⁷ $\mu : \mathbb Z \to \{-1, 0, 1\}$ defined as
$$\mu(n) = \begin{cases} (-1)^k & \text{if } |n| \text{ is the product of } k \text{ distinct primes},\\ 0 & \text{if } |n| \text{ is a multiple of a square of a prime},\\ 1 & \text{if } |n| = 1,\\ 0 & \text{if } n = 0, \end{cases} \tag{4.44}$$
we have $F_B = \{n \in \mathbb Z : \mu(|n|) \ne 0\}$. The study of this example was stimulated by Sarnak's conjecture¹⁸. The density of $F_B$ is $d(F_B) = 6/\pi^2 = 1/\zeta(2)$, see [303], so by (4.43), $h_{top}(X_B, \sigma) = 1/\zeta(2)$.
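The value $d(F_B) = 6/\pi^2$ can be illustrated, though of course not proved, by counting square-free integers. This sketch (the name is ours) strikes out all multiples of $d^2$ for $d \ge 2$ up to $n$ and compares the proportion of survivors with $6/\pi^2 \approx 0.6079$.

```python
import math

def squarefree_count(n):
    """Number of square-free integers in {1, ..., n}: strike out multiples of d^2, d >= 2."""
    squarefree = [True] * (n + 1)
    d = 2
    while d * d <= n:
        for m in range(d * d, n + 1, d * d):
            squarefree[m] = False
        d += 1
    return sum(squarefree[1:])

for n in (100, 10_000, 100_000):
    q = squarefree_count(n)
    print(n, q, q / n)
print(6 / math.pi ** 2)       # 0.6079...
```

Striking out multiples of $d^2$ for composite $d$ is redundant but harmless, and it keeps the sieve short.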
Exercise 4.111. Let $A = \{1, 2, \dots, a\}$ for some $a \in \mathbb N$. Show that the number of periodic sequences $x \in A^{\mathbb Z}$ of minimal period $n$ equals
$$\mathrm{Per}(n) = \sum_{d \mid n} \mu(d)\, a^{n/d},$$
where $\mu$ denotes the Möbius function, roughly counting the parity of distinct prime factors; see (4.44). In particular, $\mathrm{Per}(n) = a^n - a$ if $n$ is a prime. Derive Fermat's Little Theorem, $a^{n-1} \equiv 1 \pmod n$ if $n$ is a prime not dividing $a$.
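The formula of Exercise 4.111 can be compared with a brute-force count, which does not replace the proof. A word $w$ of length $n$ is one period of a sequence of minimal period $n$ exactly when $w$ is not a proper power of a shorter word. That holds if and only if the first occurrence of $w$ inside $ww$ after position 0 is at position $n$. The sketch below (names ours) uses this test, with the alphabet $\{0, \dots, a-1\}$ in place of $\{1, \dots, a\}$.

```python
from itertools import product

def mobius(n):
    """Möbius function for n >= 1 by trial division, following (4.44)."""
    result, d = 1, 2
    while d * d <= n:
        if n % d == 0:
            n //= d
            if n % d == 0:                  # d^2 divided n
                return 0
            result = -result
        d += 1
    return -result if n > 1 else result     # one remaining prime factor, if n > 1

def per_formula(n, a):
    """Sum over divisors d of n of mu(d) * a^(n/d)."""
    return sum(mobius(d) * a ** (n // d) for d in range(1, n + 1) if n % d == 0)

def per_brute_force(n, a):
    """Count words of length n over {0, ..., a-1} that are not proper powers."""
    count = 0
    for letters in product("0123456789"[:a], repeat=n):
        w = "".join(letters)
        if (w + w).find(w, 1) == n:   # w is primitive iff it first reappears in ww at n
            count += 1
    return count

for a in (2, 3):
    print(a, [per_formula(n, a) for n in range(1, 8)])
```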

The connection between B-free shifts and Toeplitz shifts is that every
B-free shift has a unique minimal “core” that is a Toeplitz shift, although it
is usually a very simple one, namely ({0∞ }, σ). This is summarized in the
next result, which is [231, Theorem A].
Theorem 4.112. Every B-free shift (XB , σ) has a unique minimal subshift,
which is a Toeplitz shift, and every x ∈ XB is proximal to this subshift.

As the proof shows, every $x \in X_B$ is syndetically proximal to this subshift, and the result holds for $(X_B^{her}, \sigma)$ as well. Furthermore, $(X_B, \sigma)$ is proximal (i.e. every pair $(x, y) \in X_B^2$ is proximal) if and only if its maximal equicontinuous factor is trivial; see [231, Theorem 3.22].

¹⁷ The Möbius function is central in algebraic number theory. For instance, $\sum_n \frac{\mu(n)}{n^s} = \zeta(s)^{-1}$, i.e. the inverse of the Riemann ζ-function. As a consequence, the statement $\sum_{n\le N} \mu(n) = o(N)$ is equivalent to the prime number theorem, and $\sum_{n\le N} \mu(n) = O(N^{1/2+\varepsilon})$ (for every $\varepsilon > 0$) is equivalent to the Riemann hypothesis.

¹⁸ His conjecture states that every continuous dynamical system $(X, T)$ of zero entropy has the property that every continuous $f : X \to \mathbb R$ is orthogonal to the Möbius function, which means that the averages $\frac1n \sum_{k=0}^{n-1} \mu(k) \cdot f \circ T^k(x)$ tend to zero for every $x \in X$. Many dynamical systems satisfy this conjecture, e.g. circle rotations [185]. It is known that the converse is false: there are continuous positive entropy systems such that every continuous function is orthogonal to the Möbius function; see Downarowicz & Serafin [217]. A recent account of the progress on this problem can be found in [245].

Proof. By construction, each 0 appearing in $\eta = 1_{F_B}$ reappears with period $b$ for some $b \in B$. Thus every block of 0's also reappears periodically. We will show that some blocks of 1's appear periodically as well. Throughout this proof, blocks of 0's (or of 1's) are always taken to be of maximal length; i.e. they cannot be extended to the left or right with another 0 (or 1).
If $\eta$ contains arbitrarily long blocks of 0's, all reappearing periodically, then $0^\infty \in X_B$, and the proof of Proposition 2.17 shows that $\{0^\infty\}$ is the unique minimal subset of $(X_B, \sigma)$. Trivially $0^\infty$ is a Toeplitz sequence.
Otherwise, let $A_0$ be the longest block of 0's that appears in $\eta$. Each appearance is followed by a block of 1's. Let $A_1 := A_0 1^s$ where $s \in \mathbb N$ is the shortest length of all blocks of 1's succeeding an appearance of $A_0$ in $\eta$. Next take $A_2 := A_1 0^t$ where $t \in \mathbb N$ is the longest length of all blocks of 0's succeeding an appearance of $A_1$ in $\eta$. The blocks $A_0$ and $0^t$ both reappear periodically, and if they reappear simultaneously, the $s$-block in between has to be $1^s$ again. This follows because $A_0$ was a longest block of 0's and $s$ was the length of the shortest block of 1's succeeding $A_0$. Therefore $A_2$ as a whole reappears periodically.
Next extend $A_2$ to the left as $A_3 := 1^u A_2$ where $u \in \mathbb N$ is the shortest length of all blocks of 1's preceding an appearance of $A_0$ in $\eta$. Next take $A_4 := 0^v A_3$ where $v \in \mathbb N$ is the longest length of all blocks of 0's preceding an appearance of $A_3$ in $\eta$. By the argument above, $A_4$ reappears periodically. Continue by induction; i.e.
$$A_{4n+1} = A_{4n} 1^{s_n}, \quad A_{4n+2} = A_{4n+1} 0^{t_n}, \quad A_{4n+3} = 1^{u_n} A_{4n+2}, \quad A_{4n+4} = 0^{v_n} A_{4n+3}$$
are extensions with the shortest possible blocks of 1's and longest possible blocks of 0's available. Then $y := \lim_n A_n$ is a two-sided sequence of which each subword reappears periodically, so it is Toeplitz. Therefore $\overline{\mathrm{orb}_\sigma(y)}$ is a minimal subset of $(X_B, \sigma)$. Because the blocks $A_n$ appear with the same periods in $\eta$, it is the only minimal subset. □

The case that $\{0^\infty\}$ is the minimal subset is easy to determine from $B$:
Lemma 4.113. The set $B$ contains an infinite set of pairwise coprime integers if and only if $0^\infty \in X_B$. In this case $\{0^\infty\}$ is the unique minimal set in $X_B$ and in $X_B^{her}$.
Proof. ⇒: Let $b_1, \dots, b_k \in B$ be pairwise coprime, and let $N = \prod_{i=1}^k b_i$. By the Chinese Remainder Theorem, there is $m \in \{0, \dots, N-1\}$ such that $m \equiv -i \bmod b_i$ for $i = 1, \dots, k$. Therefore $\eta_j = 0$ for $j = m+1, \dots, m+k$. Since $k$ is arbitrary, $0^\infty \in X_B$.
⇐: Assume that $A = \{a+1, \dots, a+n\}$ is a longest block of 0's in $\eta$, and let $N$ be its period. Then $\eta_{a+kN} = 1$ for all $k \in \mathbb Z$. If $b \in B$ is coprime with $N$, then there are $k, \ell \in \mathbb Z$ such that $\ell b = kN + a \notin F_B$, contradicting that $\eta_{a+kN} = 1$. Therefore no $b \in B$ is coprime with $N$, and $B$ cannot contain infinitely many pairwise coprime integers. □
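The ⇒ direction is constructive. As a hypothetical illustration, take the pairwise coprime elements 4, 9, 25, 49 of $B = \{p^2 : p \text{ prime}\}$. The sketch finds $m$ with $m \equiv -i \bmod b_i$, so that $m+1, \dots, m+4$ are four consecutive integers that are not square-free. Here `crt` merges congruences one at a time using the modular inverse `pow(m, -1, p)` (Python 3.8 or later) instead of the explicit Bézout step of Remark 4.98.

```python
def crt(residues, moduli):
    """Smallest x >= 0 with x = a_i mod p_i, for pairwise coprime moduli."""
    x, m = 0, 1
    for a, p in zip(residues, moduli):
        # choose t with x + m*t = a (mod p); m is invertible mod p by coprimality
        t = ((a - x) * pow(m, -1, p)) % p
        x, m = x + m * t, m * p
    return x

b = [4, 9, 25, 49]                                 # pairwise coprime elements of B
m = crt([-i for i in range(1, len(b) + 1)], b)     # m = -i mod b_i
print(m, [(m + i) // bi for i, bi in enumerate(b, start=1)])   # m + i = b_i * quotient
```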

The following characterization is [231, Theorem B].

Theorem 4.114. The following statements about a B-free shift $(X_B, \sigma)$ are equivalent:
(a) The unique minimal subshift of $(X_B, \sigma)$ is $(\{0^\infty\}, \sigma)$.
(b) $0^\infty \in X_B$.
(c) $(X_B, \sigma)$ is proximal.
(d) $B$ contains an infinite pairwise coprime subset.
Part (a) says that $B$ is so large that $\{nb : n \in \mathbb Z, b \in B\}$ is a thick set; see Definition 2.18. For part (c), it may be worth pointing out that every $x \in X_B$ is proximal to $0^\infty$, but usually not asymptotic to it. Consider, for instance, the set $B = \{pq : p, q \text{ are primes}\}$ from Example 4.106 with $F_B = \{\pm\text{primes}\} \cup \{-1, 1\}$. The sequence $x = 1_{F_B}$ is proximal to $0^\infty$ because there are arbitrarily large gaps between the primes, but not asymptotic to it, because there are infinitely many primes.
Exercise 4.115. Let XCantor be the shift space emerging from the Cantor
substitution χCantor ; see Remark 2.14. Is XCantor a B-free shift? Is it a
Toeplitz shift?

4.6.2. The Canonical Odometer and the Mirsky Measure. Write $B = \{b_1, b_2, b_3, \dots\}$. Let
$$\hat\Sigma_B = \prod_{k\ge1} \{0, 1, \dots, b_k - 1\} \quad\text{and}\quad \hat a : \hat\Sigma_B \to \hat\Sigma_B,\ x_j \mapsto x_j + 1 \bmod b_j$$
be the adding machine as described in Remark 4.98. Then $(\hat\Sigma_B, \hat a)$ is called the canonical odometer of $(X_B, \sigma)$. We abbreviate $\mathbf 0 = 0^\infty$ and $\mathbf 1 = \hat a(\mathbf 0) = 1^\infty \in \hat\Sigma_B$. Note that, contrary to Remark 4.98, we did not make the assumption that $B = \{b_1, b_2, b_3, \dots\}$ consists of pairwise coprime integers. Therefore $\mathrm{orb}_{\hat a}(\mathbf 1)$ need not be dense in $\hat\Sigma_B$; every $x$ in its closure satisfies $x_i \equiv x_j \bmod \gcd(b_i, b_j)$ for all $i, j \in \mathbb N$. In more detail, $\hat\Sigma_B$ is an abelian group under addition, and the cyclic subgroup $\{\dots, -\mathbf 1, \mathbf 0, \mathbf 1, 2\cdot\mathbf 1, 3\cdot\mathbf 1, \dots\}$ is dense if and only if $B$ is pairwise coprime. As a consequence, $(\hat\Sigma_B, \hat a)$ is minimal and uniquely ergodic if and only if $B$ is pairwise coprime.

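Define the window
$$W := \{w \in \hat\Sigma_B : w_k \ne 0 \text{ for all } k \ge 1\}.$$
Then $n_B := \hat a^n(\mathbf 0) \in W$ if and only if $n \in F_B$. Also define $\varphi_B : \hat\Sigma_B \to \{0,1\}^{\mathbb Z}$ by
$$\varphi_B(w)_j = \begin{cases} 1 & \text{if } j \not\equiv -w_k \bmod b_k \text{ for all } k \ge 1,\\ 0 & \text{otherwise.} \end{cases} \tag{4.45}$$
Then $\eta = \varphi_B(\mathbf 0)$ and $\varphi_B \circ \hat a = \sigma \circ \varphi_B$. Although $\varphi_B$ sends Borel sets to Borel sets, $\varphi_B$ is not continuous. For instance, $\hat a^{\prod_{j=1}^k b_j}(\mathbf 0) \to \mathbf 0$ as $k \to \infty$, but in general $\sigma^{\prod_{j=1}^k b_j}(\eta) \not\to \eta$.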
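The relations $\eta = \varphi_B(\mathbf 0)$ and $\varphi_B \circ \hat a = \sigma \circ \varphi_B$ can be checked on finite data. The sketch below (names ours) truncates $B$ to the hypothetical, non-coprime set $\{4, 6, 9, 10\}$ and evaluates (4.45) on a finite window of indices $j$. It compares $\varphi_B(\hat a(w))_j$ with $\varphi_B(w)_{j+1}$. It also checks that $n_B = \hat a^n(\mathbf 0)$ lies in $W$ exactly when $\eta_n = 1$.

```python
def phi_B(w, B, lo, hi):
    """(4.45) on the window lo <= j <= hi: 1 iff j + w_k is not divisible by b_k for all k."""
    return [1 if all((j + wk) % bk != 0 for wk, bk in zip(w, B)) else 0
            for j in range(lo, hi + 1)]

def a_hat(w, B):
    """Canonical odometer: add one in every coordinate, w_k -> w_k + 1 mod b_k."""
    return [(wk + 1) % bk for wk, bk in zip(w, B)]

def in_window(w):
    """w lies in W iff no coordinate is 0."""
    return all(wk != 0 for wk in w)

B = [4, 6, 9, 10]                    # hypothetical truncation of B
zero = [0] * len(B)
eta = phi_B(zero, B, 0, 59)          # eta_0, ..., eta_59
w = [3, 1, 7, 2]                     # an arbitrary point of the odometer
shift_ok = phi_B(a_hat(w, B), B, 0, 58) == phi_B(w, B, 1, 59)

n_B, window_ok = zero, True
for n in range(60):                  # n_B = a_hat^n(0)
    window_ok = window_ok and (in_window(n_B) == (eta[n] == 1))
    n_B = a_hat(n_B, B)
print("".join(map(str, eta)), shift_ok, window_ok)
```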
Note that the interior $W^\circ$ consists of sequences $w$ for which a finite number of entries $w_k$ determine that $w \in W$. Indeed, since cylinder sets are open, for $w \in W^\circ$ there exists $m$ such that $[w]_m = \{y \in \hat\Sigma_B : y_k = w_k \text{ for } k \le m\} \subset W^\circ$. If $n \in \mathbb Z$ is such that $n_B \in [w]_m$, then $(n + jM)_B \in [w]_m$ as well for $M = \prod_{k=1}^m b_k$ and all $j \in \mathbb Z$. But then also $\eta_n = 1$ and $\eta_{n+jM} = 1$ as well, for all $j \in \mathbb Z$. Therefore a non-empty interior of the window refers to 1's that appear periodically in $\eta$, just as in the second part of the proof of Theorem 4.112. On the other hand, if $B$ contains an infinite pairwise coprime subset, then the window has empty interior and no 1 in $\eta$ appears periodically. This is exactly the situation of the first part of the proof of Theorem 4.112, where $\{0^\infty\}$ is the minimal subset (trivially Toeplitz) of $X_B$. For a more extended argument, see [380, Theorem C], which also proves¹⁹ that $W$ is the closure of $W^\circ$ if and only if $\eta$ itself is a Toeplitz sequence. In this case, $(X_B, \sigma)$ is an almost one-to-one extension of $(\overline{\{n_B : n \in \mathbb Z\}}, \hat a)$.
Definition 4.116. Let $\mu_B$ be the unique $\hat a$-invariant probability measure on $(\hat\Sigma_B, \hat a)$. The Mirsky measure $\nu_B$ is the push-forward of $\mu_B$ under $\varphi_B$:
$$\nu_B(A) := \mu_B(\varphi_B^{-1}(A)) \quad \text{for every Borel set } A \subset \{0,1\}^{\mathbb{Z}}.$$
In particular, $\nu_B([1]) = \prod_{b \in B}(1 - \frac1b) > 0$ if and only if $B$ is thin. Therefore, if $B$ is not thin, then $\nu_B = \delta_{0^\infty}$. Peckner [451] showed that the $B$-free shift for $B = \{p^2 : p \text{ prime}\}$ is intrinsically ergodic. Kułaga-Przymus, Lemańczyk & Weiss [379] showed that if $h_{top}(X_\eta) > 0$, then $(X_B, \sigma)$ need not be intrinsically ergodic; the set of shift-invariant measures can be the Poulsen simplex; see Section 6.1.
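For a finite pairwise coprime set $B$, the Chinese Remainder Theorem makes $\nu_B([1]) = \prod_{b \in B}(1 - \frac1b)$ the exact frequency of $B$-free integers over one period $\prod_{b \in B} b$. The sketch below confirms this with exact rational arithmetic for sets chosen for illustration, and shows that the product formula fails once coprimality is dropped.

```python
from fractions import Fraction
from math import prod

def free_density(bs):
    """Exact density of {n : no b in bs divides n}, computed over the period prod(bs)."""
    period = prod(bs)
    count = sum(1 for n in range(period) if all(n % b != 0 for b in bs))
    return Fraction(count, period)

def product_formula(bs):
    """The product of (1 - 1/b) over b in bs."""
    return prod((1 - Fraction(1, b) for b in bs), start=Fraction(1))

print(free_density([4, 9, 25]), product_formula([4, 9, 25]))  # 16/25 16/25
print(free_density([4, 6]), product_formula([4, 6]))          # 2/3 5/8
```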
Theorem 4.117. The sequence $\eta = 1_{F_B}$ is quasi-generic for the Mirsky measure $\nu_B$; i.e. there is a subsequence $(n_k)_{k\ge1}$ of $\mathbb{N}$ such that the Cesàro means of Dirac measures
$$\frac{1}{n_k} \sum_{j=0}^{n_k-1} \delta_{\sigma^j(\eta)} \longrightarrow \nu_B \quad \text{in the weak}^* \text{ topology.}$$
Any sequence $(n_k)_{k\ge1}$ such that $\frac{1}{n_k}\#\big(F_B \cap \{0, \dots, n_k - 1\}\big) \to \bar d(F_B)$ suffices, so if $d(F_B)$ exists, then $\eta$ is typical²⁰ for $\nu_B$.

¹⁹ In Theorem B; additionally, in Proposition 1.2 it is proved that this Toeplitz sequence is regular if and only if the Haar measure of the boundary $\partial W$ is zero.

It follows that, although $\nu_B$ is defined on $X_B^{\mathrm{her}}$ or even $X_B^{\mathrm{adm}}$, we have $\nu_B(X_B) = 1$.

In fact, $(X_B, \sigma)$ is uniquely ergodic and $h_{top}(X_B, \sigma) = 0$, because $(\hat\Sigma_B, \hat a)$ has these properties (still assuming that the elements of $B = \{b_1, b_2, b_3, \dots\}$ are pairwise coprime). If $B$ is taut, then $X_B$ is the support of $\nu_B$; see [353, Theorem 2]. The shift $(X_B^{\mathrm{her}}, \sigma)$ is intrinsically ergodic ([378] and [231, Theorem J]), even though $X_B^{\mathrm{her}}$ in general carries other measures, and $h_{top}(X_B^{\mathrm{her}}, \sigma)$ can be positive.

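Proof. Since $\varphi_B \circ \hat a = \sigma \circ \varphi_B$, it suffices to prove that
$$\lim_{k\to\infty} \frac{1}{N_k} \sum_{n=0}^{N_k-1} 1_{\varphi_B^{-1}(Z)} \circ \hat a^n(\mathbf{0}) = \mu_B(\varphi_B^{-1}(Z)) \tag{4.46}$$
for cylinder sets $Z = \{x \in \{0,1\}^{\mathbb{Z}} : x_{k_i} = 0 \text{ for } i = 1, \dots, r\}$, for arbitrary $r \in \mathbb{N}$ and $k_1, \dots, k_r \in \mathbb{Z}$. Define
$$W_K = \{w \in \hat\Sigma_B : w_k \not\equiv 0 \bmod b_k \text{ for all } 1 \le k \le K\}.$$
Then $W_K$ is clopen and $W_K \supset W$. Note that
$$\bigcap_{i=1}^r \hat a^{-k_i}(W_K^c) \subset \varphi_B^{-1}(Z) = \bigcap_{i=1}^r \hat a^{-k_i}(W^c) \subset \bigcap_{i=1}^r \hat a^{-k_i}(W_K^c) \cup \bigcup_{i=1}^r \hat a^{-k_i}(W_K \setminus W). \tag{4.47}$$
Choose $\varepsilon > 0$ arbitrary and let $K \in \mathbb{N}$ be so large that
$$\mu_B\Big(\bigcap_{i=1}^r \hat a^{-k_i}(W_K^c)\Big) \ge \mu_B\Big(\bigcap_{i=1}^r \hat a^{-k_i}(W^c)\Big) - \varepsilon \quad\text{and}\quad d(F_{\{b_1, \dots, b_K\}}) \le \bar d(F_B) + \varepsilon. \tag{4.48}$$
Because $\bigcap_{i=1}^r \hat a^{-k_i}(W_K^c)$ is a clopen set, its indicator function is continuous. The unique ergodicity of $(\hat\Sigma_B, \hat a)$ implies that
$$\lim_{k\to\infty} \frac{1}{N_k} \sum_{n=0}^{N_k-1} 1_{\bigcap_{i=1}^r \hat a^{-k_i}(W_K^c)} \circ \hat a^n(\mathbf{0}) = \mu_B\Big(\bigcap_{i=1}^r \hat a^{-k_i}(W_K^c)\Big). \tag{4.49}$$
Note that $\hat a^n(\mathbf{0}) \in W_K \setminus W$ if and only if $n \in F_{\{b_1, \dots, b_K\}} \setminus F_B$. Since $F_{\{b_1, \dots, b_K\}}$ is periodic with density $d(F_{\{b_1, \dots, b_K\}})$ and $(N_k)$ is chosen such that $\frac{1}{N_k}\#\big(F_B \cap \{0, \dots, N_k - 1\}\big) \to \bar d(F_B)$, (4.48) gives
$$\limsup_{k\to\infty} \frac{1}{N_k} \sum_{n=0}^{N_k-1} 1_{\bigcup_{i=1}^r \hat a^{-k_i}(W_K \setminus W)} \circ \hat a^n(\mathbf{0}) \le r\varepsilon.$$
This combined with (4.47) and (4.49) gives (4.46), because $\varepsilon > 0$ was arbitrary, and the proof follows. □

²⁰ In the sense that the Ergodic Theorem 6.13 holds for $\eta$.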

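For every $x \in X_B^{\mathrm{her}}$ and $k \in \mathbb{N}$, there is a $w_k \in \mathbb{Z}$ such that $x_{b_k n + w_k} = 0$ for all $n \in \mathbb{Z}$. Thus, for the set $Y_B^{\mathrm{her}} := \{x \in X_B^{\mathrm{her}} : w_k \text{ is unique modulo } b_k \text{ for all } k \in \mathbb{N}\}$, we can define the map
$$\theta_B : Y_B^{\mathrm{her}} \to \hat\Sigma_B, \quad \theta_B(x)_k = -w_k \bmod b_k.$$
Then $\hat a \circ \theta_B = \theta_B \circ \sigma$ and $x \le \varphi_B \circ \theta_B(x)$ coordinate-wise, and due to the unique ergodicity of $(\hat\Sigma_B, \hat a)$, we have $\nu_B \circ \theta_B^{-1} = \mu_B$.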

4.7. Unimodal Restrictions to Critical Omega-Limit Sets

There are many unimodal maps $f$ for which (or rather, many combinatorial conditions implying that) the critical ω-limit set is a minimal Cantor set on which $f$ acts in an interesting way, e.g. (semi-)conjugate to a substitution shift, an adding machine, or a Sturmian shift. In this section, we present some results from the literature in this direction.
A precise condition for $\omega(c)$ to be a minimal Cantor set containing $c$ is due to Alvin [25, Theorem 5.2], and for this we need the following definition.
Definition 4.118. We call the sequence $\{\ell_n, A_n\}_{n\ge1}$ a uniform scheme if:
(1) $(\ell_n)_{n\ge1}$ is a strictly increasing sequence of integers with $\ell_1 \ge 2$.
(2) $A_n \subset \{0,1\}^{\ell_n}$.
(3) For every $n \ge 1$ and $u \in A_{n+1}$, we can write $u = v_1 \cdots v_k$ with $v_i \in A_n$, and each $w \in A_n$ is equal to $v_i$ for some $i$.
The sequence $x \in \{0,1\}^{\mathbb{N}}$ is generated by $\{\ell_n, A_n\}_{n\ge1}$ if $x_{i\ell_n+1} \cdots x_{(i+1)\ell_n} \in A_n$ for all $n, i \in \mathbb{N}$.
Theorem 4.119. A sequence $\nu$ is the kneading sequence of a unimodal map such that $\omega(c) \ni c$ is a minimal Cantor set if and only if $\nu$ is generated by a uniform scheme $\{\ell_n, A_n\}_{n\ge1}$ such that the first elements $a_n \in A_n$ satisfy the following:
(i) $a_n$ is a prefix of $a_{n+1}$.
(ii) $\nu = \lim_{n\to\infty} a_n$.
(iii) $\sigma^k(a_n) \preceq_{pl} a_n$ for each $n \in \mathbb{N}$ and $0 \le k < \ell_n$. Here $\preceq_{pl}$ is the parity-lexicographical order (see Definition 3.81), and $a \preceq_{pl} b$ also holds if $a$ is a prefix of $b$.

In terms of kneading maps, we have the following straightforward sufficient, but certainly not necessary, condition.

Theorem 4.120. Let $f$ be a unimodal map with kneading map $Q$. If $Q(k) \to \infty$, then $\omega(c)$ is a minimal Cantor set and $h_{top}(f|_{\omega(c)}) = 0$.

Proof. First note that $c$ is not (pre)periodic, because $(Q(k))_{k\ge1}$ is unbounded but finite for each $k$. Therefore $c$ has an infinite orbit and is recurrent. This means that $\omega(c) = \overline{\mathrm{orb}(c)}$. Once we have shown minimality, it is clear that $\omega(c)$ cannot contain periodic points and thus has to be nowhere dense. In addition, $\omega(c)$ contains no isolated points, so it must be a Cantor set.
In Example 5.23 we will see that every unimodal restriction to $\omega(c)$ with $Q(k) \to \infty$ is conjugate to an enumeration system with a low enumeration scale. Thus minimality follows by Proposition 5.17.
Theorem 5.25 then also shows that $h_{top}(f|_{\omega(c)}) = 0$, but we will give a slightly more direct proof. Let $j \in \mathbb{N}$ be such that $Q(k) > Q(j)$ for all $k > j$; since $Q(k) \to \infty$, there are infinitely many such $j$. Since the kneading sequence $\nu$ is a concatenation of blocks $\nu_1 \cdots \nu_{S_{Q(k)}-1}\hat\nu_{S_{Q(k)}}$, $k \ge j$, where $\hat\nu_n = 1 - \nu_n$, the subshift derived from $(\omega(c), f)$ is a subshift of the coded shift $X_j$ with code words $\{\nu_1 \cdots \nu_{S_{Q(k)}-1}\hat\nu_{S_{Q(k)}} : k \ge j\}$. By Theorem 8.73, $\lambda_j := \exp(h_{top}(\sigma|_{X_j}))$ satisfies $\sum_{Q(k) \ge Q(j)} \lambda_j^{-S_{Q(k)}} = 1$. But
$$\sum_{Q(k) \ge Q(j)} \lambda_j^{-S_{Q(k)}} \le \sum_{k \ge 0} \lambda_j^{-(S_{Q(j)} + k)} = \frac{\lambda_j^{1 - S_{Q(j)}}}{\lambda_j - 1}.$$
Hence $\lambda_j - 1 \le \lambda_j^{1 - S_{Q(j)}}$, and since $S_{Q(j)} \to \infty$, this forces $\lambda_j \to 1$ as $j \to \infty$. Thus $0 \le h_{top}(f|_{\omega(c)}) \le \lim_j \log \lambda_j = 0$ as claimed. □
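The mechanism $\lambda_j \to 1$ can be watched numerically. The Python sketch below solves $\sum_L \lambda^{-L} = 1$, the equation used in the proof, by bisection. As an example of our own choosing it uses the Fibonacci map, i.e. the kneading map $Q(k) = \max(k-2, 0)$, whose cutting times are $1, 2, 3, 5, 8, \dots$; dropping the shortest code word lengths plays the role of increasing $j$. The list of lengths is truncated after 30 cutting times, which is an approximation.

```python
def fibonacci_cutting_times(count):
    """Cutting times S_0 = 1, S_1 = 2, S_k = S_{k-1} + S_{k-2} of the Fibonacci map."""
    s = [1, 2]
    while len(s) < count:
        s.append(s[-1] + s[-2])
    return s

def coded_shift_growth(lengths, tol=1e-12):
    """Root lambda in (1, 2) of sum(lambda**(-L) for L in lengths) = 1, by bisection.

    Assumes at least two lengths and min(lengths) >= 2, so that the sum exceeds 1
    near lambda = 1 and is below 1 at lambda = 2.
    """
    def excess(lam):
        return sum(lam ** (-L) for L in lengths) - 1.0

    lo, hi = 1.0 + 1e-9, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

S = fibonacci_cutting_times(30)
for j in range(1, 9):
    lam = coded_shift_growth(S[j:])  # code word lengths S_j, S_{j+1}, ...
    assert lam - 1 <= lam ** (1 - S[j])  # the bound lambda_j - 1 <= lambda_j^(1 - S_min)
    print(j, S[j], round(lam, 6))
```

The printed growth rates decrease toward 1 as the shortest admissible length grows, in line with $h_{top}(f|_{\omega(c)}) = 0$.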

4.7.1. Renormalizable Unimodal Maps and ∗-Products. In this section, we explain renormalization for unimodal maps $f$, and more specifically for the quadratic maps $f_a(x) = ax(1-x)$, as this is the setting in which the concept was first made popular. Let 0 and $p = 1 - \frac1a \in (\frac12, 1)$ be the fixed points of the quadratic map $f_a$, $a \in [2, a^*]$, where $a^* = 2.9196\dots$ is the solution of $a^3 = 2(a^2 + a + 1)$. If $J_1 = [1 - p, p]$, then $f_a(J_1)$ and $J_1$ have disjoint interiors, but $f_a^2(J_1) \subset J_1$; see Figure 4.14. In this case, $f_a^2 : J_1 \to J_1$ is the first return map to $J_1$ and again is unimodal, although turned upside down.

Definition 4.121. A unimodal map $f$ is renormalizable if there exist an interval $J_1 \ni c$ and $q_1 \ge 2$ such that $f^{q_1}(J_1) \subset J_1$ and $J_1, f(J_1), \dots, f^{q_1-1}(J_1)$ have disjoint interiors. The map $f^{q_1} : J_1 \to J_1$ is called the renormalization of $f$.

As the renormalization of a unimodal map is again unimodal, $f^{q_1} : J_1 \to J_1$ may be renormalizable again; i.e. there is $J_2 \subset J_1$ such that $J_2$ and $f^j(J_2)$ have disjoint interiors for $0 < j < q_2$ but $f^{q_2}(J_2) \subset J_2$. Such a map is twice renormalizable. Figure 4.14 shows how these intervals are permuted if $q_1 = 2$, $q_2 = 4$.

Figure 4.14. A renormalizable quadratic map and its second iterate (left) and the permutation of intervals if $f$ is twice renormalizable.
Similarly, there are maps that are 3, 4, … times renormalizable, or even infinitely renormalizable. In the latter case there is an infinite sequence of nested intervals
$$\cdots \subset J_4 \subset J_3 \subset J_2 \subset J_1 \subset [0, 1]$$
and a sequence of periods $(q_k)_{k\in\mathbb{N}}$ (where $q_k$ divides $q_{k+1}$) such that
$$f^{q_k}(J_k) \subset J_k \quad\text{and}\quad J_k, f(J_k), \dots, f^{q_k-1}(J_k) \text{ have pairwise disjoint interiors.}$$
This is what happens, with $q_k = 2^k$, during the first period doubling cascade in the quadratic family. There is an increasing sequence of parameters $(\alpha_k)_{k\in\mathbb{N}}$ such that $Q_\alpha$ becomes $k$ times renormalizable if $\alpha \ge \alpha_k$. At the limit parameter
$$\alpha_{feig} = \lim_{k\to\infty} \alpha_k \approx 3.569945672\dots$$
the map becomes infinitely renormalizable. This behavior was first observed in the 1970s by Tresser & Coullet [538] and Feigenbaum²¹ [241], and an amazing observation was that the ratios of the distances between consecutive parameters converge:
$$\frac{|\alpha_{k+1} - \alpha_k|}{|\alpha_{k+2} - \alpha_{k+1}|} \to \delta = 4.669201609102990\dots. \tag{4.50}$$
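The convergence in (4.50) is easy to observe numerically. The Python sketch below does not compute the bifurcation parameters $\alpha_k$ themselves. As a standard substitute, which is an assumption of this sketch, it computes the superstable parameters $s_k$ of $f_a(x) = ax(1-x)$, at which $c = \frac12$ is periodic with period $2^k$. These lie between consecutive $\alpha_k$ and their gap ratios converge to the same constant $\delta$. Each $s_k$ is found by Newton's method in $a$, starting from an extrapolation of the previous gaps.

```python
def orbit_and_derivative(a, n):
    """Return f_a^n(1/2) and its derivative with respect to a, for f_a(x) = a x (1 - x)."""
    x, dx = 0.5, 0.0
    for _ in range(n):
        # value and d/da are updated together; the right-hand side uses the old x, dx
        x, dx = a * x * (1 - x), x * (1 - x) + a * (1 - 2 * x) * dx
    return x, dx

def superstable_parameters(count):
    """Parameters s_0 < s_1 < ... with f_{s_k}^{2^k}(1/2) = 1/2 along the cascade."""
    s = [2.0, 1 + 5 ** 0.5]  # s_0 = 2 and s_1 = 1 + sqrt(5) are known in closed form
    ratio = 4.7              # rough initial guess for the gap ratio
    for k in range(2, count):
        a = s[-1] + (s[-1] - s[-2]) / ratio  # extrapolated starting point for Newton
        for _ in range(50):
            x, dx = orbit_and_derivative(a, 2 ** k)
            step = (x - 0.5) / dx
            a -= step
            if abs(step) < 1e-15:
                break
        s.append(a)
        ratio = (s[-2] - s[-3]) / (s[-1] - s[-2])
    return s

s = superstable_parameters(11)
for k in range(1, len(s) - 1):
    print(k, s[k], (s[k] - s[k - 1]) / (s[k + 1] - s[k]))
```

The printed ratios settle near $4.6692$ after a handful of steps; double precision limits how far the cascade can be followed this way.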
This phenomenon has been a major source of inspiration since the 1970s;
see e.g. [112, 140, 386] and the monograph in [414, Section VI]. The next
proposition gives the effect of having periodic intervals on the kneading map.
21 Mitchell Feigenbaum (1944–2019).

Proposition 4.122. Let $f$ be a unimodal map with kneading map $Q$ and cutting times $(S_k)_{k\ge0}$.
(1) Suppose that $f$ has no attracting periodic point. Then $Q(k+1) \le k$ for all $k$, and $f$ has an $n$-periodic interval $J \ni c$ for some $n \ge 2$ if and only if $n = S_k$ for some $k \ge 1$ and $Q(k+j) \ge k$ for all $j \ge 1$.
(2) If a quadratic map $f = Q_a$ has an attracting $n$-periodic point $p$, then $n$ is a cutting time if $p$ is orientation reversing, and $n$ is a co-cutting time if $p$ is orientation preserving.

Proof. Recall the closest precritical points $\zeta_k, \hat\zeta_k$ from the proof of Theorem 3.90 and (3.25).
(1) If $Q(k+1) > k$, then $f^{S_k}(c) \in [\zeta_k, \hat\zeta_k]$, so $f^{S_k}$ maps one of the intervals $[\zeta_k, c]$ or $[c, \hat\zeta_k]$ monotonically into itself (in an orientation-reversing way). This interval contains an attracting $S_k$-periodic or $2S_k$-periodic point, contrary to the assumption; hence $Q(k+1) \le k$ for all $k$.
If $J \ni c$ is $n$-periodic, then $f^n(J^\circ) \ni c$, because otherwise $f^n$ maps $[p, c]$ into itself, producing an attracting $n$-periodic point. Therefore $J \ni \zeta_k, \hat\zeta_k$ for some minimal $k$, and $n = S_k$. Additionally, $f^j(J) \ni c$ only if $j$ is a multiple of $n$. In particular, the $S_{k+j}$ are all multiples of $n$, and thus $Q(k+j) \ge k$ for all $j \ge 1$.
Conversely, if $Q(k+j) \ge Q(k+1) = k$ for all $j \ge 1$, then $f^{S_k}(c) \in (\zeta_{k-1}, \hat\zeta_{k-1})$, and $f^{S_k}$ maps one of the intervals $[\zeta_{k-1}, \zeta_k]$ or $[\hat\zeta_k, \hat\zeta_{k-1}]$ onto itself in an orientation-reversing way, producing an orientation-reversing $S_k$-periodic point $p$. The other interval contains a preperiodic point $\hat p$. If there are more such points, then we can take $p, \hat p$ furthest away from $c$. If $f^{S_k}(c) \notin [p, \hat p]$, then $f^{jS_k}(c) \notin (\zeta_{k-1}, \hat\zeta_{k-1})$ for some $j \ge 1$. Then $Q(k+j) < k$ for this $j$, contrary to our assumption. Therefore $f^{S_k}(c) \in [p, \hat p]$, making $J := [p, \hat p]$ an $S_k$-periodic interval.
(2) Let $p$ be an attracting periodic point, and assume it is the one closest to $c$ on its orbit. We can assume without loss of generality that $p < c$. Since $f$ is quadratic, Singer's Theorem²² [414, Chapter II.6] implies that $c$ is in the immediate basin of $p$, so $f^{kn}(c) \to p$ as $k \to \infty$, and there is no $0 < j < n$ such that $f^j(c) \in [p, c]$.
If $p$ reverses orientation, there is an interval $[\zeta, p]$ that is mapped monotonically onto $[p, c]$ by $f^n$. But this means that $\zeta = \zeta_k$ is a closest precritical point, so $n = S_k$.
If $p$ preserves orientation, then $f^n([p, c]) \subset [p, c)$, so $n$ is not a cutting time. Take $y > f(c)$ maximal such that $f^{n-1}$ is monotone on $[f(p), y]$. Then $f^{n-1}([f(p), y]) \ni c$, so $n$ is a co-cutting time. □

²² It suffices for this that $f$ has negative Schwarzian derivative, $Sf = \frac{f'''}{f'} - \frac32\left(\frac{f''}{f'}\right)^2 < 0$, as quadratic maps do; see [414, Chapter II.6].

Exercise 4.123. Show that if $Q$ is an admissible kneading map such that $k - 1 \le Q(j)$ for $j = k, k+1, \dots, k+r-1$ and $Q(k+r) > k - 1$, then the corresponding unimodal map is renormalizable with period $S_k$.
Exercise 4.124. For the Feigenbaum map, we can take the uniform scheme $\{\ell_n, A_n\}$ with $\ell_n = 2^n$ and $A_n = \{\nu_1 \cdots \nu_{2^n-1}\nu_{2^n},\ \nu_1 \cdots \nu_{2^n-1}\hat\nu_{2^n}\}$. Find a uniform scheme if $f$ is infinitely renormalizable with periods $(q_k)_{k\ge1}$.
The dynamics of $f$ on $\omega(c) = \bigcap_k \bigcup_{j=0}^{q_k-1} f^j(J_k)$ can be represented by an adding machine; see Section 4.5.2. Indeed, let $p_1 = q_1$ and $p_k = q_k/q_{k-1}$ for $k \ge 2$, and let $X = \{(x_i)_{i\ge1} : 0 \le x_i < p_i\}$ with the "add and carry" operation $a : X \to X$. Then $(X, a)$ and $(\omega(c), f)$ are conjugate. The intervals $\{J_{k,j}\}_{j=0}^{q_k-1}$, $J_{k,j} = f^j(J_k)$, in the $k$-th cycle of the renormalization then represent the cylinders $[x_1 \cdots x_k]$, where $j = \sum_{i=1}^k q_{i-1}x_i$ with $q_0 := 1$. However, $f|_{\omega(c)}$ can be conjugate to an adding machine also if $f$ is not infinitely renormalizable. In this case, we speak of a strange adding machine [89].
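The correspondence between odometer digits and the intervals $J_{k,j}$ can be checked mechanically: one "add and carry" step must advance the index $j = \sum_{i \le k} q_{i-1}x_i$ by one modulo $q_k$, just as $f$ maps $J_{k,j}$ to $J_{k,j+1}$. The Python sketch below uses the hypothetical periods $q_1, \dots, q_4 = 2, 6, 12, 48$ and a finite number of digits, both assumptions made for illustration.

```python
def add_and_carry(x, radices):
    """One step of the odometer on finitely many digits x_1, x_2, ... (mixed radix)."""
    x = list(x)
    for i, p in enumerate(radices):
        if x[i] + 1 < p:
            x[i] += 1
            return x
        x[i] = 0  # carry to the next digit
    return x      # every digit rolled over: the truncation wraps around

def interval_index(x, periods, k):
    """j = sum_{i=1}^k q_{i-1} x_i with q_0 = 1, i.e. the cylinder [x_1 ... x_k] as J_{k,j}."""
    q = [1] + periods
    return sum(q[i] * x[i] for i in range(k))

periods = [2, 6, 12, 48]  # hypothetical q_1, ..., q_4, each dividing the next
radices = [periods[0]] + [periods[i] // periods[i - 1] for i in range(1, len(periods))]
x = [0] * len(radices)
for step in range(200):
    for k in range(1, len(periods) + 1):
        # after `step` applications, the point sits in J_{k, step mod q_k}
        assert interval_index(x, periods, k) == step % periods[k - 1]
    x = add_and_carry(x, radices)
print(radices)  # [2, 3, 2, 4]
```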
Renormalization for unimodal maps can also be described symbolically by means of so-called ∗-products; see [195] and [164, page 72]. Suppose that $f$ is a renormalizable unimodal interval map; say it has a periodic interval $J \ni c$ of period $q$, the itineraries of orbits in $f(J)$ start with the block $\beta = [\beta_1 \cdots \beta_{q-1} *]$,²³ and the renormalization $f^q|_J$ is conjugate to a unimodal map $\tilde f$ with kneading sequence $\tilde\nu$. Then the kneading sequence $\nu$ of $f$ has the form $\nu = \beta *_{par} \tilde\nu$, where the parity ∗-product $*_{par}$ is defined as
$$(\beta *_{par} \tilde\nu)_j = \begin{cases} \beta_{j \bmod q} & \text{if } q \text{ does not divide } j,\\ \tilde\nu_{j/q} & \text{if } q \text{ divides } j \text{ and } \#\{k < q : \beta_k = 1\} \text{ is even},\\ \hat{\tilde\nu}_{j/q} & \text{if } q \text{ divides } j \text{ and } \#\{k < q : \beta_k = 1\} \text{ is odd}. \end{cases}$$
The parity of the number of 1's in the block $\beta$ determines whether or not the orientation of the map $f^q|_J$ is reversed w.r.t. the orientation of the original map $f$.
Example 4.125. The kneading sequence of the Feigenbaum map emerges as the infinite ∗-product with $\beta = 1*$. Indeed,
$$\begin{aligned} \beta *_{par} \nu &= 1\hat\nu_1 1\hat\nu_2 1\hat\nu_3 1\hat\nu_4 1\hat\nu_5 1\hat\nu_6 1\hat\nu_7 \cdots,\\ \beta *_{par} (\beta *_{par} \nu) &= 101\nu_1 101\nu_2 101\nu_3 101\nu_4 \cdots,\\ \beta *_{par} (\beta *_{par} (\beta *_{par} \nu)) &= 1011101\hat\nu_1 1011101\hat\nu_2 1011101 \cdots,\\ &\ \ \vdots \end{aligned}$$
This follows the pattern of a Toeplitz sequence; see Section 4.5. The limit sequence is also obtained by the period doubling or Feigenbaum substitution $\chi_{feig} : 0 \to 11, 1 \to 10$; see Example 1.6. In fact, every ∗-product of a constant type renormalization is generated by a substitution. If the type of renormalization varies, then we need S-adic transformations instead.
23 We don’t specify the final symbol, since it is not the same for every point in f (J).
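The following Python sketch implements the parity ∗-product from the display above and checks that iterating $\beta *_{par} \cdot$ with $\beta = 1*$ reproduces the fixed point of $\chi_{feig}$. Starting the iteration from the single symbol $\nu_1 = 1$ is a choice made here; since the Feigenbaum sequence satisfies $\rho_{feig} = \beta *_{par} \rho_{feig}$, each iteration doubles the length of the correct prefix.

```python
def star_par(beta, nu):
    """Parity *-product beta *_par nu.

    beta = [beta_1, ..., beta_{q-1}] (the final * is omitted) and nu = [nu_1, nu_2, ...];
    Python index j-1 holds the symbol with index j.
    """
    q = len(beta) + 1
    flip = sum(beta) % 2 == 1  # an odd number of 1's in beta reverses orientation
    out = []
    for j in range(1, q * len(nu) + 1):
        if j % q != 0:
            out.append(beta[j % q - 1])
        else:
            v = nu[j // q - 1]
            out.append(1 - v if flip else v)
    return out

def chi_feig_fixed_point(length):
    """Prefix of the fixed point of chi_feig: 0 -> 11, 1 -> 10."""
    w = [1]
    while len(w) < length:
        w = [s for a in w for s in ((1, 0) if a == 1 else (1, 1))]
    return w[:length]

nu = [1]
for _ in range(6):
    nu = star_par([1], nu)  # beta = 1*
print("".join(map(str, nu[:16])))  # 1011101010111011
assert nu == chi_feig_fixed_point(64)
```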

4.7.2. Homeomorphic Restrictions. This section discusses whether and when a unimodal map $f : [0,1] \to [0,1]$, when restricted to $\omega(c)$, is a homeomorphism, and which kneading sequences correspond to this. We exclude the trivial case of $\omega(c)$ being a single periodic orbit.
Lemma 4.126. If $X \subset [0,1]$ is an infinite closed and $f$-invariant set and $f|_X$ is a homeomorphism, then $c \in X$. In addition, if $(X, f)$ is transitive, then it is minimal.

Proof. By (semi)conjugating to a tent map, we can assume that $f$ has constant slope $\pm\lambda$; see [420] and [116, Section 9.5]. If $h_{top}(f) = \log\lambda > 0$ and $c \notin X$, then $f$ is locally expanding on $X$, and thus $X$ is finite by Proposition 1.42. If $h_{top}(f) = 0$, then every orbit of $f$ is asymptotic either to a periodic orbit or to a Cantor set $C$ of Feigenbaum type, of which $c$ is an accumulation point. Therefore $X \cap C = \emptyset$, and $X$ is finite also in this case.
If $(X, f)$ is not minimal, say $Y \subset X$ is a closed invariant subset of $X$ not containing $c$, then $Y$ is finite by Proposition 1.42 and in particular contains a periodic orbit $P$. By transitivity, we can find $x \in X \setminus Y$ with a dense orbit. As before, we can assume that $f|_Y$ is locally expanding. Let $(n_k)_{k\ge1}$ be such that the $f^{n_k}(x)$ are the successive closest approaches of $x$ to $P$. Let $p \in P$ be an accumulation point of $(f^{n_k}(x))_{k\ge1}$. Then $f^{n_k-1}(x) \to q$ where $f(q) = p$ but $q \in X \setminus P$. This contradicts that $f|_X$ is one-to-one. □

Thus if ω(c) is a Cantor set on which f is one-to-one, then c ∈ ω(c).

When we view the problem symbolically, i.e. ask ourselves which kneading sequences $\nu$ correspond to such ω-limit sets, we get the following additional property; see [26, Theorem 3.5].
Lemma 4.127. The sequence $\nu$ is the kneading sequence of a unimodal map for which $\omega(c)$ is a minimal set on which $f$ acts as a homeomorphism if and only if $\nu$ is the only infinite sequence $u \in X_\nu := \overline{\{\sigma^k(\nu) : k \ge 0\}}$ such that both $0u$ and $1u \in X_\nu$.

This means that c has to be accumulated by ω(c) from both the right
and the left.

Proof. First assume that $f : \omega(c) \to \omega(c)$ is one-to-one; then $i(x) \in X_\nu$ cannot be preceded by both 0 and 1, because $x$ has only one preimage in $\omega(c)$. The only exception is $f(c)$, but $i(f(c)) = \nu$. To show that both $0\nu$ and $1\nu \in X_\nu$, note that for every $n$ there must be a left-special word in $\mathcal{L}_n(X_\nu)$, because otherwise $X_\nu$ is a single periodic orbit by Proposition 1.12. But then, by taking a convergent sequence of these left-special words, there must be an infinite left-special word. We have already seen that $\nu$ is the only candidate.

Conversely, if $\nu$ is the only infinite left-special word in $X_\nu$, then there is no $x \in \omega(c)$ with two preimages. This also holds if $f^n(x) = c$ and $u = \lim_{y \uparrow x} i(y)$ or $u = \lim_{y \downarrow x} i(y)$. □

This proof shows that the subshift $X_\nu$ has only one infinite left-special sequence, like Sturmian sequences. However, there may be left-special words of finite length in $\mathcal{L}(X_\nu)$ that cannot be extended indefinitely to the right and remain left-special. This happens for example in the Feigenbaum case
$$\nu = \rho_{feig} = 1011\ 1010\ 1011\ 1011\ 1011\ 1010\ 1011\ 1010 \cdots,$$
where 11 is left-special and also right-special, but both 110 and 111 are no longer left-special. This corresponds to property (iii) in Theorem 4.128.
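The claims about 11, 110 and 111 can be confirmed by brute force on a long prefix of $\rho_{feig}$. The sketch below generates the prefix with $\chi_{feig}$ and lists the factors $u$ of a given length for which both $0u$ and $1u$ (respectively $u0$ and $u1$) occur. The prefix length 4096 is an arbitrary choice; it is far more than enough to contain every factor of length at most 4.

```python
def feigenbaum_prefix(length):
    """Prefix of the fixed point of chi_feig: 0 -> 11, 1 -> 10, starting with 1."""
    word = "1"
    while len(word) < length:
        word = "".join("10" if a == "1" else "11" for a in word)
    return word[:length]

def factors(word, n):
    """All subwords of length n occurring in word."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def left_special(word, n):
    """Factors u of length n such that both 0u and 1u occur."""
    longer = factors(word, n + 1)
    return {u for u in factors(word, n) if "0" + u in longer and "1" + u in longer}

def right_special(word, n):
    """Factors u of length n such that both u0 and u1 occur."""
    longer = factors(word, n + 1)
    return {u for u in factors(word, n) if u + "0" in longer and u + "1" in longer}

rho = feigenbaum_prefix(4096)
print(rho[:32])  # 10111010101110111011101010111010
print(sorted(left_special(rho, 2)), sorted(right_special(rho, 2)))
print(sorted(left_special(rho, 3)))
```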
Excluding the periodic and infinitely renormalizable cases, the first examples of kneading sequences for maps $f$ such that $f|_{\omega(c)}$ is a homeomorphism were described in [119], in terms of the kneading map. The construction is flexible enough to provide, say for the tent family $T_\lambda$, uncountably many parameters within every open subinterval of $[\sqrt2, 2]$ of slopes. Further examples emerged with the discovery of strange adding machines in [23, 89]. The following characterization comes from [26, Theorem 4.3].

Theorem 4.128. A sequence $\nu \in \{0,1\}^{\mathbb{N}}$ is the kneading sequence of a unimodal map with an infinite minimal set $\omega(c)$ on which $f$ is a homeomorphism if and only if every uniform scheme $\{\ell_n, A_n\}_{n\ge1}$ (see Definition 4.118) that generates $\nu$ satisfies the following:
(i) For the first elements $a_n$ of $A_n$, we have $\nu = \lim_n a_n$, and $a_n$ is a prefix of $a_{n+1}$.
(ii) For sufficiently large $N$ there exists $1 \le m < N$ (with $m \to \infty$ as $N \to \infty$) such that every $u \in A_N$ occurring in the decomposition of $\nu$ into words of $A_N$ is preceded by $a_m$ or $\hat a_m$ (i.e. $a_m$ with the last letter changed).
(iii) If $a^1 \in A_N$ is different from $a_N$, then there exists an extension $a^1 a^2 \cdots a^k \in A_N^k$ for some $k \in \mathbb{N}$ such that every occurrence of $a^1 a^2 \cdots a^k$ in $\nu$ is preceded by $a_N$.

4.7.3. Sturmian Restrictions to the Critical Omega-Limit Set.

There are multiple ways of choosing the kneading map Q so that (ω(c), f ) is
Sturmian. The simplest way is by means of the Ostrowski numeration; see
Example 5.22. Indeed, let θ ∈ [0, 1] be some irrational number and let an
be the entries and pn /qn the convergents of its continued fraction expansion.
Thus q−1 = 0, q0 = 1, and qn = an qn−1 + qn−2 for n ≥ 1; see Section 8.2.

Take kn = j=0 aj and then cutting times as follows:

$$\begin{cases} S_k = k + 1 & \text{for } 0 \le k \le a_1,\\ S_{k_n} = q_n & \text{for } n \ge 1,\\ S_{k_n + a} = aq_n + q_{n-1} & \text{for } 1 \le a \le a_n,\ n \ge 1. \end{cases}$$
It is clear that Q(k) → ∞ in this case, and the Sk ’s interpolate between the
qn ’s; cf. [125]. However, f : ω(c) → ω(c) is in general not invertible, since c
itself and/or other points in the backward orbit of c have two preimages in
ω(c); see [119].
Yet also if $Q(k)$ is bounded (and even if $Q(k) \le 1$), there are examples where $(\omega(c), f)$ is Sturmian; see [117, Chapter III, 3.6]. Let $\varphi : [0,1] \to [0,1]$ be a Lorenz-like map, i.e. an interval map that is continuous and increasing both on $[0, c)$ and on $(c, 1]$, with $\lim_{x\uparrow c}\varphi(x) = 1$ and $\lim_{x\downarrow c}\varphi(x) = 0$. Thus it has a discontinuity at the critical point $c$. In addition, we will assume that $\varphi$ is symmetric: $\varphi(1-x) = 1 - \varphi(x)$ (i.e. $\varphi(\hat x) = \widehat{\varphi(x)}$ with the notation $\hat x = 1 - x$) for all $x \in [0,1] \setminus \{c\}$, so the critical point is $c = \frac12$.
Every symmetric unimodal map $f : [0,1] \to [0,1]$ with $f(c) = 1$ can be made into a symmetric Lorenz-like map by flipping the right half of the graph vertically around $c = \frac12$ (see [28, 117]):
$$\varphi(x) = \begin{cases} f(x) & \text{if } x \in [0, c),\\ 1 - f(x) & \text{if } x \in (c, 1]. \end{cases}$$
Then $\varphi$ is semi-conjugate to $f$ via $f$ itself: $f \circ \varphi = f \circ f$; see Figure 4.15. In fact
$$\varphi^n(x) = \begin{cases} f^n(x) & \text{if } f^n \text{ is increasing at } x,\\ 1 - f^n(x) & \text{if } f^n \text{ is decreasing at } x. \end{cases}$$
We will use the itinerary map $i^\varphi$ for $\varphi$ with codes 0 for $[0, c)$ and 1 for $(c, 1]$.

Figure 4.15. A symmetric Lorenz-like map obtained from a unimodal map.


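Recall the definition of $\theta(x)$ from Exercise 3.80 as an alternative way to code itineraries of unimodal maps $f : [0,1] \to [0,1]$. Namely, $\theta_0(x) = +1$ and
$$\theta_n(x) = \prod_{j=0}^{n-1} (-1)^{i_j(x)} = \begin{cases} +1 & \text{if } f^n \text{ is increasing at } x,\\ -1 & \text{if } f^n \text{ is decreasing at } x, \end{cases} \tag{4.51}$$
for $n \ge 1$. It follows that $\theta(f(x)) = \sigma(\theta(x))$ if $i_0(x) = 0$ and $\theta(f(x)) = -\sigma(\theta(x))$ if $i_0(x) = 1$. For the itinerary $i^\varphi$ of $x \in I \setminus \bigcup_{j=0}^n \varphi^{-j}(c)$ under the map $\varphi$, this means that
$$i^\varphi_n(x) = 0 \iff \begin{cases} i_n(x) = 0 \text{ and } \theta_n(x) = +1, \text{ or}\\ i_n(x) = 1 \text{ and } \theta_n(x) = -1 \end{cases} \iff \theta_{n+1}(x) = +1,$$
$$i^\varphi_n(x) = 1 \iff \begin{cases} i_n(x) = 1 \text{ and } \theta_n(x) = +1, \text{ or}\\ i_n(x) = 0 \text{ and } \theta_n(x) = -1 \end{cases} \iff \theta_{n+1}(x) = -1.$$
In other words, $i^\varphi_n(x) = (1 - \theta_{n+1}(x))/2$. This gives $i^\varphi \circ \varphi(x) = \sigma \circ i^\varphi(x)$.

Also, $\nu^\varphi = \lim_{x\uparrow c} i^\varphi(x)$ with the first symbol neglected. Define $\rho^\varphi(n) = \min\{k > n : \nu^\varphi_k = \nu^\varphi_{k-n}\}$; then we recover the cutting times as $S_0 = 1$, $S_{k+1} = \rho^\varphi(S_k)$. (The co-cutting times can be recovered as $\hat S_0 = \kappa = \min\{k \ge 1 : \nu^\varphi_k = 0\}$ and $\hat S_{i+1} = \min\{k > \hat S_i : \nu^\varphi_k = \nu^\varphi_{k-\hat S_i}\}$.) See the example in the proof of Proposition 4.129 below.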
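As a consistency check of this recipe, the Python sketch below takes the finite kneading prefix $\nu = 1001101101110$ from the example in the proof of Proposition 4.129, forms $\nu^\varphi$ via $\nu^\varphi_n = (1 - \theta_n)/2$ with $\theta_n = \prod_{j\le n}(-1)^{\nu_j}$ (this indexing is chosen here because it reproduces the rows of that example), and compares the cutting times given by $\rho^\varphi$ with those given by $\rho(n) = \min\{k > n : \nu_k \ne \nu_{k-n}\}$, which we assume to be the convention of (3.19). Only finitely many symbols are known, so the recursion stops when the prefix runs out.

```python
def phi_coding(nu):
    """nu^phi_n = (1 - theta_n) / 2, where theta_n = prod_{j <= n} (-1)^{nu_j}."""
    theta, out = 1, []
    for symbol in nu:
        theta *= -1 if symbol == 1 else 1
        out.append((1 - theta) // 2)
    return out

def cutting_times(seq, differ):
    """S_0 = 1 and S_{k+1} = rho(S_k), where rho(n) is the least k > n with
    seq_k != seq_{k-n} (differ=True) or seq_k == seq_{k-n} (differ=False).
    seq is 1-indexed: seq_k is seq[k - 1]. Stops when the prefix is exhausted."""
    times = [1]
    while True:
        n = times[-1]
        nxt = next(
            (k for k in range(n + 1, len(seq) + 1)
             if (seq[k - 1] != seq[k - n - 1]) == differ),
            None,
        )
        if nxt is None:
            return times
        times.append(nxt)

nu = [1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # prefix from the example
nu_phi = phi_coding(nu)
print(nu_phi)                               # [1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0]
print(cutting_times(nu, differ=True))       # [1, 2, 3, 5, 6, 8, 9, 11]
print(cutting_times(nu_phi, differ=False))  # [1, 2, 3, 5, 6, 8, 9, 11]
```

Both computations reproduce the positions of the dots in that example.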
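To each $x \in I$ we can assign a rotation number by first assigning a lift $\Phi : \mathbb{R} \to \mathbb{R}$ to the Lorenz map $\varphi$:
$$\Phi(x) = \begin{cases} \varphi(x) & \text{if } x \in [0, c], \text{ where we set } \varphi(c) = 1,\\ \varphi(x) + 1 & \text{if } x \in (c, 1),\\ \Phi(x - n) + n & \text{if } x \in [n, n+1). \end{cases}$$
Then $\Phi(x) \bmod 1 = \varphi(x \bmod 1)$, and the rotation number is defined as
$$\alpha(x) = \limsup_{n\to\infty} \frac{\Phi^n(x) - x}{n}. \tag{4.52}$$
Since $\lfloor\Phi(x)\rfloor = \lfloor x\rfloor$ if $x \bmod 1 \in [0, c)$ and $\lfloor\Phi(x)\rfloor = \lfloor x\rfloor + 1$ otherwise, we obtain
$$\alpha(x) = \limsup_{n\to\infty} \frac1n \#\{0 \le k < n : i^\varphi_k(x) = 1\} = \limsup_{n\to\infty} \frac1n \#\{1 \le k \le n : \theta_k(x) = -1\}. \tag{4.53}$$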

Next we turn $\varphi$ into a proper circle endomorphism (with unique rotation number independent of $x \in S^1$) by setting
$$\bar\varphi(x) = \begin{cases} \varphi(1) = \widehat{f(1)}, & x \in [0, a], \text{ where } a < c \text{ is such that } \varphi(a) = \varphi(1),\\ \varphi(x), & \text{otherwise}. \end{cases}$$
Also let $b > c$ be such that $\varphi(b) = a$; see Figure 4.16.

Figure 4.16. A stunted symmetric Lorenz map $\bar\varphi$ as a circle endomorphism.

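Proposition 4.129. Assume that $f$ is a unimodal map with cutting times $\{S_k\}_{k\ge0}$. Let $b > c$ be such that $\bar\varphi(b) = a$; see Figure 4.16. Then the rotation number of the corresponding $\bar\varphi$ equals
$$\alpha = \begin{cases} \frac{k}{S_k} \in [\frac12, 1] \cap \mathbb{Q} & \text{if } k \text{ is minimal such that } f^{S_k}(c) \in (\hat b, b),\\ \lim_{k\to\infty} \frac{k}{S_k} \in [\frac12, 1] & \text{if no such } k \text{ exists}. \end{cases}$$
In the latter case, the kneading map satisfies $Q(k) \le 1$ for all $k \in \mathbb{N}$, and if $\alpha \notin \mathbb{Q}$, then $f : \omega(c) \to \omega(c)$ is a minimal homeomorphism.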

Proof. Recall that $f(c) = 1$ and assume that there is a minimal integer $n \ge 1$ such that $\varphi^n(1) \in (c, b]$. Then $\bar\varphi^{n+1}(1) \in (0, a]$ and $\bar\varphi^{n+2}(1) = \bar\varphi(1)$ is periodic with period $n + 1$.
Recall that $b > c$ is such that $\bar\varphi(b) = a$, so $f(b) = \hat a > c$, and $f^2(b) = f(a) = \widehat{f^2(c)} > c$. Therefore $b \in (\hat\zeta_2, \hat\zeta_1)$ for the closest precritical points $\hat\zeta_1 > \hat\zeta_2 > c$, see (3.23), and $\hat b \in (\zeta_1, \zeta_2)$. There are two possibilities:
• $\varphi^n(1) = f^n(1)$. In this case $f^n$ is increasing at 1, and thus $n + 1 = S_k$ is a cutting time.
• $\varphi^n(1) = \widehat{f^n(1)}$. In this case $f^n$ is decreasing at 1, and again $n + 1 = S_k$ is a cutting time.

By minimality of $k$, $f^{S_j}(c) \notin [\hat b, b] \setminus \{c\}$ for all $j < k$, and hence the kneading sequence $\nu$ of $f$ consists of blocks 0 or 11. For example,

ν   = 1. 0. 0. 1 1. 0. 1 1. 0. 1 1. 1 0 ⋯,
θ   = +1 −1 −1 −1 +1 −1 −1 +1 −1 −1 +1 −1 +1 +1 ⋯,
ν^ϕ = 1. 1. 1. 0 1. 1. 0 1. 1. 0 1. 0 0 ⋯

where dots indicate cutting times and the bold symbol the position $S_k$. Since $n + 1$ is the period of $\bar\varphi(1)$, this shows that $\#\{1 \le j \le S_k : \theta_j = -1\} = k$, and in view of (4.53) we have $\alpha = k/S_k$.
If there is no such minimal $n$, i.e. $\varphi^n(1) \notin (\hat b, b)$ for all $n \ge 1$, then $f^n(1) \notin (\hat b, b)$ for all $n \ge 1$ (and in particular $Q(j) \le 1$ for all $j \ge 1$). A counting argument similar to the above shows that $\alpha = \limsup_k k/S_k = \lim_k k/S_k$. It is possible that $\alpha$ is rational, e.g. for the logistic map $f_a(x) = 1 - a(x - \frac12)^2$ with $a = 3.5097$. In this case, $\nu = (101)^\infty$ and $\bar\varphi^i(1)$ converges to an attracting orbit of period 3. Also for the tent map $T_s(x) = 1 - s|x - \frac12|$ with $s = \frac12(1 + \sqrt5)$, the critical orbit $\{\frac12, 1, \frac34 - \frac14\sqrt5\}$ has period three and avoids $[0, a]$.
If $\alpha \notin \mathbb{Q}$, then $\omega_{\bar\varphi}(c)$ is a Cantor set, disjoint from $[0, a]$ and minimal w.r.t. the action of $\bar\varphi$. Under the semi-conjugacy $f$ between $\varphi$ and $f$ (indeed $f \circ f = f \circ \varphi$), this projects to a minimal map $f : \omega_f(c) \to \omega_f(c)$. We will show that $f : \omega_\varphi(c) \to \omega_f(c)$ is a homeomorphism, from which it follows that $f : \omega_f(c) \to \omega_f(c)$ is also a homeomorphism. Assume by contradiction that $x < c < \hat x$ are points in $\omega_\varphi(c)$ such that $f(x) = f(\hat x) = y \in \omega_f(c)$. Then, since $f$ is the semi-conjugacy between $\varphi$ and $f$, we must have $f(\varphi^n(\hat x)) = f(\varphi^n(x)) = f^n(y)$ for every $n \in \mathbb{N}$. Note that $\varphi^n(\hat x) = \widehat{\varphi^n(x)}$ for every $n \in \mathbb{N}$, and thus $\varphi^n(\hat x) \ne \varphi^n(x)$ unless $\varphi^n(x) = c$, i.e. $f^n(x) = c$. Since $c$ is not periodic, there exists $N \in \mathbb{N}$ such that $\varphi^n(\hat x) \ne \varphi^n(x)$ for all $n \ge N$, and thus $f^n(y)$ has two $f$-preimages in $\omega_\varphi(c)$. Since $f : \omega_f(c) \to \omega_f(c)$ is minimal, for every $\varepsilon > 0$ there exist infinitely many $y' \in \mathrm{orb}_f(y)$ which are ε-close to $f^2(c) = f(1)$. For sufficiently small ε, an $f$-preimage of a point ε-close to $f(1)$ will be contained in $[0, a]$. Since every point in $\mathrm{orb}_f(y)$ eventually has both $f$-preimages in $\omega_\varphi(c)$, we conclude that $\omega_{\bar\varphi}(c) \cap [0, a] = \omega_\varphi(c) \cap [0, a] \ne \emptyset$, which is a contradiction. □

We argued so far that there exist stunted Lorenz maps for which $\overline{\mathrm{orb}_{\bar\varphi}(c)}$ is a Cantor set with dynamics similar to circle rotations (or more precisely to Denjoy circle maps) with irrational rotation number, and that there are also unimodal maps with kneading map bounded by 1 such that $f|_{\omega(c)}$ is semi-conjugate to a circle rotation. The rotation number is $\alpha = \lim_k k/S_k$. Therefore $(\omega(c), f)$ represents a Sturmian shift.

In fact, every irrational rotation number (hence every Sturmian shift) can be realized this way, as we can prove by studying this rotation number more closely. Indeed, let $\alpha = [0; a_1, a_2, a_3, \dots]$ be the continued fraction expansion of $\alpha$, with convergents $p_i/q_i$. For the irrational rotation $R_\alpha$, the denominators $q_i$ are the times of closest returns of any point $x \in S^1$ to itself, and these returns occur alternatingly on the left and on the right; see Section 8.2.
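The closest-return property of $R_\alpha$ can be observed directly. The sketch below records the record-breaking times $n$ at which $n\alpha \bmod 1$ comes closer to 0 than before, together with the side of the approach, and compares them with the denominators $q_n = a_nq_{n-1} + q_{n-2}$. The choice $\alpha = \sqrt2 - 1 = [0; 2, 2, 2, \dots]$ matches the Pell example below; floating-point arithmetic is adequate for the small range of $n$ used.

```python
import math

def closest_returns(alpha, n_max):
    """Times n <= n_max at which n*alpha mod 1 is closer to 0 than ever before,
    with the side: 'right' if just above an integer, 'left' if just below."""
    best, returns = math.inf, []
    for n in range(1, n_max + 1):
        offset = n * alpha - round(n * alpha)  # signed distance to the nearest integer
        if abs(offset) < best:
            best = abs(offset)
            returns.append((n, "right" if offset > 0 else "left"))
    return returns

def convergent_denominators(partial_quotients):
    """q_0 = 1 and q_n = a_n q_{n-1} + q_{n-2}, with q_{-1} = 0."""
    q_prev, q = 0, 1
    qs = [q]
    for a in partial_quotients:
        q_prev, q = q, a * q + q_prev
        qs.append(q)
    return qs

alpha = math.sqrt(2) - 1                 # [0; 2, 2, 2, ...]
print(closest_returns(alpha, 200))
print(convergent_denominators([2] * 6))  # [1, 2, 5, 12, 29, 70, 169]
```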
For the map $\bar\varphi$, the closest returns on the left indeed accumulate on $c$, but the right neighborhood $[c, b)$ is the preimage of the plateau $[0, a)$, and no further iterates of $c$ enter that region. Instead, returns on the right accumulate on $b$.
Translating this back to the unimodal map $f$ with kneading sequence $\nu = \nu_1\nu_2\nu_3\cdots$, the closest returns on the left correspond to closest returns at co-cutting times (recall that there are no cutting times $S_j$ such that $f^{S_j}(c) \in (\hat b, b)$). If $q_i$ is such a co-cutting time, then (recalling the function $\rho$ from (3.19) and using the above argument) the Farey convergents $\rho^a(q_i) = q_i + aq_{i+1}$ are also the next co-cutting times for $1 \le a \le a_{i+1}$, and in particular, $\rho^{a_{i+1}}(q_i) = q_{i+2}$.
The closest returns on the right correspond to cutting times, but this time the $f^{q_i}(c)$ accumulate on $b$, and because $f^3(b) = f^3(c)$, the itinerary of $b$ is
$$i(b) = b_1b_2b_3b_4b_5\cdots = 11\nu_3\nu_4\nu_5\cdots. \tag{4.54}$$
Therefore we need to consider the analogous function $\rho_b(m) = \min\{n > m : b_n \ne b_{n-m}\}$ and find that $\rho_b^a(q_i) = q_i + aq_{i+1}$ for $1 \le a \le a_{i+1}$, and in particular, $\rho_b^{a_{i+1}}(q_i) = q_{i+2}$.
For example, if $a_i \equiv 2$, so the $q_i$'s are the Pell numbers 2, 5, 12, 29, 70, 169, …, then we obtain

ν = 10.1 1.1 1.0 ⋯

where dots indicate cutting times and primes co-cutting times. The bold symbols indicate the positions $q_i$. In fact, for each $i$,
$$\nu_{q_{i+1}-q_i+1} \cdots \nu_{q_{i+1}-1}\nu_{q_{i+1}} = \nu_1 \cdots \nu_{q_i-1}\nu_{q_i} \quad\text{or}\quad \nu_1 \cdots \nu_{q_i-1}\hat\nu_{q_i}.$$
Therefore $c$ has two limit itineraries $\lim_{x\uparrow c} i(x) = 0\nu$ and $\lim_{x\downarrow c} i(x) = 1\nu$, but $c$ has only one preimage in $\omega(c)$.
Outside maps: Boyland, de Carvalho & Hall in [105, Section 3] present a different way of creating a circle endomorphism from a unimodal map. They call this the outside map $B$ and use it to study the inverse limit space of the unimodal map as attractors of sphere homeomorphisms. Starting from a unimodal map $f : I \to I$ such that the second branch is surjective (i.e. $f([c, 1]) = I$), they
• double the interval to a circle $\mathbb{R}/2\mathbb{Z} = [0, 2]/0\sim2$;
• let $B$ map the second branch onto $[1, 2]$ by flipping this branch;
• extend the definition of $f$ on $[1, 2-d]$, for the unique point $d \in (c, 1]$ with $f(d) = f(0)$, to cover the interval $[0, f(0)]$;
• map the remaining interval $[2-d, 2]$ to the constant $f(0)$.

Figure 4.17. Constructing the outside map and the stunted Lorenz map for a tent map $T_s$.

That is, as shown in Figure 4.17,
$$B(x) = \begin{cases} f(x) & \text{if } x \in [0, c),\\ 2 - f(x) & \text{if } x \in [c, 1),\\ f(2 - x) & \text{if } x \in [1, 2-d),\\ f(0) & \text{if } x \in [2-d, 2). \end{cases}$$

Let us carry this out for the family of cores of tent maps $T_s : I \to I$,
$$T_s(x) = \begin{cases} sx + 2 - s, & x \in [0, \frac{s-1}{s}],\\ -sx + s, & x \in [\frac{s-1}{s}, 1], \end{cases}$$

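for all $s \in (1, 2]$. Then the map
$$\bar\varphi(x) = \begin{cases} \frac s2 & \text{if } 0 \le x \le a = \frac{s-1}{s},\\ s(x - \frac12) \bmod 1 & \text{if } a = \frac{s-1}{s} \le x < 1, \end{cases} \quad \text{on } \mathbb{R}/\mathbb{Z}, \tag{4.55}$$
and the outside map
$$B(x) = \begin{cases} s(x - 1) + 2 \bmod 2 & \text{if } 0 \le x < \frac2s,\\ 2 - s & \text{if } \frac2s \le x < 2, \end{cases} \quad \text{on } \mathbb{R}/2\mathbb{Z},$$
are conjugate with conjugacy $G : \mathbb{R}/\mathbb{Z} \to \mathbb{R}/2\mathbb{Z}$, $G(x) = 2(1 - x) \bmod 2$; i.e. $G \circ \bar\varphi = B \circ G$. But the conjugacy reverses orientation, so the rotation numbers are each other's opposite: $\alpha$ for $\bar\varphi$ versus $1 - \alpha$ for $B$.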
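The conjugacy $G \circ \bar\varphi = B \circ G$ can be checked numerically on a grid. The sketch below transcribes (4.55), the outside map $B$ and $G$ for the tent cores, and measures the largest discrepancy in the circle metric of $\mathbb{R}/2\mathbb{Z}$. The grid size and the sample slopes are arbitrary choices; a defect at the level of floating-point rounding is what one expects if the formulas are consistent.

```python
def phi_bar(x, s):
    """Stunted Lorenz map (4.55) on R/Z, evaluated for x in [0, 1)."""
    a = (s - 1) / s
    return (s / 2) % 1.0 if x <= a else (s * (x - 0.5)) % 1.0

def outside_map(x, s):
    """Outside map B on R/2Z, evaluated for x in [0, 2)."""
    return (s * (x - 1) + 2) % 2.0 if x < 2 / s else 2 - s

def G(x):
    """Orientation-reversing conjugacy R/Z -> R/2Z."""
    return (2 * (1 - x)) % 2.0

def circle_distance(u, v, length=2.0):
    """Distance between u and v on the circle R/(length Z)."""
    d = abs(u - v) % length
    return min(d, length - d)

def max_conjugacy_defect(s, samples=10_000):
    """Largest |G(phi_bar(x)) - B(G(x))| over a uniform grid in [0, 1)."""
    xs = [i / samples for i in range(samples)]
    return max(circle_distance(G(phi_bar(x, s)), outside_map(G(x), s)) for x in xs)

for s in (1.2, 1.5, 1.8, 2.0):
    print(s, max_conjugacy_defect(s))
```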
Chapter 5

Further Minimal Cantor Systems

In this chapter we present three main types of dynamical systems that, although not subshifts themselves, are popular tools to describe (minimal) continuous maps on Cantor sets. Cutting and stacking goes back to von Neumann and Kakutani in the early 1960s. These systems were originally used to create examples to test specific ergodic properties. Enumeration systems are a generalization of and a more number-theoretic approach to both odometers and Ostrowski numeration systems. Bratteli-Vershik systems came in the 1980s and seem to have become the most frequently used tool to describe Cantor systems. We explain how to represent some of the subshifts from earlier chapters in terms of these tools.

5.1. Kakutani-Rokhlin Partitions

First return maps (sometimes called induced maps) form an important tool for studying dynamical systems. They are defined by taking a subset $Y \subset X$ and setting
$$T_Y(y) = T^r(y) \quad \text{for the return time } r = r(y) := \min\{i > 0 : T^i(y) \in Y\}.$$
The exhaustion of the space (if $(X, T)$ is minimal) as
$$X = \bigcup_{i \ge 0} T^i(\{y \in Y : i < r(y)\}) \tag{5.1}$$
and the Rokhlin Lemma [479, Theorem 3.10] in the measure-preserving setting are classical techniques associated with first return maps.
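The exhaustion (5.1) can be seen in a toy example. The sketch below uses the rotation $x \mapsto x + 3$ on $\mathbb{Z}/10\mathbb{Z}$, a minimal finite system chosen only for illustration (it is not a Cantor system), with $Y = \{0, 4, 7\}$. It computes the return times and the levels $T^i(\{y \in Y : i < r(y)\})$ and checks that these levels partition the whole space.

```python
def return_time(y, T, Y):
    """r(y) = min{i > 0 : T^i(y) in Y}; assumes the orbit of y returns to Y."""
    i, x = 1, T(y)
    while x not in Y:
        i, x = i + 1, T(x)
    return i

def tower_levels(T, Y):
    """Return times on Y and the levels T^i({y in Y : i < r(y)}) from (5.1)."""
    r = {y: return_time(y, T, Y) for y in Y}
    levels = []
    for i in range(max(r.values())):
        level = set()
        for y in Y:
            if i < r[y]:
                x = y
                for _ in range(i):
                    x = T(x)
                level.add(x)
        levels.append(level)
    return r, levels

N = 10

def T(x):
    """Rotation by 3 on Z/10Z, which is minimal since gcd(3, 10) = 1."""
    return (x + 3) % N

Y = {0, 4, 7}
r, levels = tower_levels(T, Y)
print(sorted(r.items()))                  # [(0, 8), (4, 1), (7, 1)]
print(sorted(x for L in levels for x in L))  # every point of Z/10Z exactly once
```

The return times add up to $N = 10$, which is the finite analogue of Kac's lemma.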


For continuous minimal (or at least aperiodic) transformations of the
Cantor set, this led to a generalization of (5.1), called the Kakutani-Rokhlin
partition. The seminal paper is by Herman et al. [310], who coined the
Definition 5.1. Let (X, T) be a continuous dynamical system on a Cantor
set. A Kakutani-Rokhlin (KR) partition is a partition
$$\mathcal{P} = \{T^j(B_i) : 1 \le i \le N,\ 0 \le j < h_i\}$$
of X into clopen sets that are pairwise disjoint and together cover X. We call
$B = \bigcup_{i=1}^N B_i$ the base of the KR-partition and the integers $h_i$ the heights.
Also we assume that $T^{h_i}(B_i) \subset B$. (If T is invertible, this is automatic.)

Usually we need a sequence $(\mathcal{P}_n)_{n \ge 0}$ of KR-partitions, with bases $B(n)$
and height vectors $(h_i(n))_{i=1}^{N_n}$, having the following properties:
(KR1) The sequence of bases is nested: $B(n+1) \subset B(n)$.
(KR2) $\mathcal{P}_{n+1} \succeq \mathcal{P}_n$; that is, $\mathcal{P}_{n+1}$ refines $\mathcal{P}_n$.
(KR3) $\bigcap_n B(n)$ is a single point.
(KR4) $\{A \in \mathcal{P}_n : n \in \mathbb{N}\}$ is a basis of the topology.
The following property (KR5) relies crucially on minimality. Property (KR6)
is optional but ensures that there is a unique smallest path in the context of
Bratteli-Vershik systems and cutting and stacking systems.
(KR5) For all $n \in \mathbb{N}$, $i \le N_n$, and $i' \le N_{n-1}$, there is $0 \le j < h_i(n)$
such that $T^j(B_i(n)) \subset B_{i'}(n-1)$.
(KR6) $B(n) \subset B_1(n-1)$ for all $n \in \mathbb{N}$.
Theorem 5.2. Every continuous minimal Cantor system (X, T) has Kakutani-Rokhlin partitions $\mathcal{P}_n$ satisfying (KR1)–(KR6).

This result comes from [310] and was extended from minimal to transi-
tive aperiodic in [78].

Proof. Let B(1) be any clopen subset of X. By minimality, the first entry time
$$r(x) = \min\{i \ge 1 : T^i(x) \in B(1)\}$$
is well-defined and finite for every $x \in X$. Take $B_r(1) = \{x \in B(1) : r(x) = r\}$. Then
$$B_r(1) = \big(T^{-r}(B(1)) \cap B(1)\big) \setminus \bigcup_{0 < j < r} T^{-j}(B(1)).$$
From this it follows that the $B_r(1)$'s are clopen and pairwise disjoint. By
compactness of B(1) (or uniform recurrence), B(1) is the union of finitely
many such $B_r(1)$'s, say $B(1) = \bigcup_{i=1}^N B_{r_i}(1)$. Then
$$X = \bigcup_{i=1}^N \bigcup_{j=0}^{r_i - 1} T^j(B_{r_i}(1)),$$
and the sets in this union are pairwise disjoint. Hence, we have found our
first KR-partition $\mathcal{P}_1 = \{T^j(B_{r_i}(1)) : 1 \le i \le N,\ 0 \le j < r_i\}$.
To continue, take a clopen set B(2) inside one of the Bri (1)’s in the pre-
vious partition, and repeat the above construction. In this way, we can con-
struct inductively a sequence (Pn )n∈N , where Pn+1 refines Pn . The heights
hi (n) = ri for step n in this construction.
Without loss of generality, we can assume that diam(B(n)) < 1/n, so
$\bigcap_n B(n)$ is a singleton, and (KR1)–(KR3) hold. By renumbering the
$B_{r_i}(n)$'s, (KR6) holds as well.
To show (KR4), we can view the sets $T^j(B_{r_i}(n))$, $1 \le i \le N_n$, $0 \le j < h_i(n)$,
obtained at the previous step as the targets in which to choose the next
clopen set $B'$. That is, for fixed η > 0 and for each pair (i, j), choose as the
next clopen set $B' \subset T^j(B_{r_i}(n))$ such that $\operatorname{diam}(B')$ and $\operatorname{diam}(T^j(B_{r_i}(n)) \setminus B')$
are both less than $(1 - \eta)\operatorname{diam}(T^j(B_{r_i}(n)))$. After going through all these $1 \le i \le N_n$,
$0 \le j < h_i(n)$, we return to taking the next clopen set $B(n+1) \ni x$.
The corresponding $\mathcal{P}_{n+1} = \{T^j(B_{r_i}(n+1)) : 1 \le i \le N,\ 0 \le j < r_i\}$
refines all the intermediate KR-partitions, and therefore $\{\mathcal{P}_k\}_{k \ge 1}$ generates
the topology of X.
Finally, (KR5) can be achieved using minimality and passing to a subsequence
of $(\mathcal{P}_n)_{n \in \mathbb{N}}$ if necessary. □
Remark 5.3. If T : X → X is equicontinuous, then the above construction
can be refined so as to obtain that $N_n \equiv 1$ and $T^{h_1(n)}(B(n)) = B(n)$.
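Remark 5.3 can be illustrated on the dyadic odometer, the prototypical equicontinuous Cantor system. The Python sketch below works with a finite model: the odometer acting on the first `NUM_DIGITS` binary digits, which is just "add one with carry" modulo $2^{\text{NUM\_DIGITS}}$. This truncation is our assumption and is made so that every check is finite. With base $B(n) = [0^n]$, it computes the first return times and confirms the following:

- there is a single tower ($N_n = 1$) of height $2^n$;
- the levels partition the space;
- $T^{2^n}(B(n)) = B(n)$.

```python
from itertools import product

NUM_DIGITS = 8  # digits kept in this finite model of the odometer (an assumption)

def odometer(x):
    """Dyadic odometer on a tuple of digits: add 1 to x[0] and carry to the right."""
    x = list(x)
    for k in range(len(x)):
        if x[k] == 0:
            x[k] = 1
            return tuple(x)
        x[k] = 0
    return tuple(x)  # 11...1 wraps around to 00...0

def in_base(x, n):
    """Membership of the cylinder B(n) = [0^n]."""
    return all(d == 0 for d in x[:n])

def first_return_time(x, n):
    """r(x) = min{i > 0 : T^i(x) in B(n)}."""
    y, r = odometer(x), 1
    while not in_base(y, n):
        y, r = odometer(y), r + 1
    return r

def kr_tower(n):
    """Return (height, levels, closes_up) for the KR-partition with base B(n)."""
    space = list(product((0, 1), repeat=NUM_DIGITS))
    base = [x for x in space if in_base(x, n)]
    heights = {first_return_time(x, n) for x in base}
    assert len(heights) == 1  # a single tower: N_n = 1
    h = heights.pop()
    levels, level = [], base
    for _ in range(h):
        levels.append(set(level))
        level = [odometer(x) for x in level]
    # after h steps the top level is mapped back onto the base: T^h(B(n)) = B(n)
    return h, levels, set(level) == set(base)
```

In this finite model, `kr_tower(3)` should give height 8, and its eight levels should cover all $2^8$ points exactly once.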

In the next sections, we will discuss (minimal) Cantor systems represented
as cutting and stacking systems, as enumeration systems, and as
Bratteli-Vershik systems. In all of these representations, as well as for substitution
shifts, graph covers (see Remark 5.4), and Toeplitz shifts, nested
sequences of KR-partitions appear naturally. The terminology of base and
height can be used for all of them; see Remarks 5.8, 5.34, and 5.56. For substitution
shifts associated to a primitive (or aperiodic; see [78]) substitution
$\chi : \mathcal{A} \to \mathcal{A}^*$, the base elements $B_a(n)$, $a \in \mathcal{A}$, are the cylinder sets associated
to the words $\chi^n(a)$, $a \in \mathcal{A}$, and the remaining partition elements are the
shifts of these. That this constitutes a partition relies on the recognizability
of the substitution shift [78]. More generally, for linearly recurrent shifts
(with constant L), one can also use the n-th step return words as bases of
the KR-partitions. In this case, $N_n = \#B(n) \le L$ and $h_i(n) \le L\,h_{i'}(n-1)$
for all $1 \le i, i' \le N_n$.
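For a substitution, the tower over the base $B_a(n)$ has height $h_a(n) = |\chi^n(a)|$. Since $\chi^n(a) = \chi^{n-1}(\chi(a))$, these heights satisfy $h_a(n) = \sum_{b} h_b(n-1)$, where $b$ runs over the letters of $\chi(a)$. The Python sketch below checks both descriptions on the Fibonacci substitution $0 \to 01$, $1 \to 0$. The choice of example and of this convention for the Fibonacci substitution are ours.

```python
FIB = {"0": "01", "1": "0"}  # assumed convention for the Fibonacci substitution

def iterate(sub, word, n):
    """Apply the substitution n times to a word."""
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word

def tower_heights(sub, n):
    """Heights h_a(n) = |chi^n(a)| of the KR-towers with bases [chi^n(a)]."""
    return {a: len(iterate(sub, a, n)) for a in sub}

def heights_by_recursion(sub, n):
    """The same heights via h_a(n) = sum of h_b(n-1) over the letters b of chi(a)."""
    h = {a: 1 for a in sub}
    for _ in range(n):
        h = {a: sum(h[b] for b in sub[a]) for a in sub}
    return h
```

For the Fibonacci substitution, both functions should produce consecutive Fibonacci numbers, for example `{"0": 13, "1": 8}` at n = 5.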

Remark 5.4. The transition from a nested sequence of Kakutani-Rokhlin
partitions to a graph cover (see Section 4.3.4) is fairly direct: The vertices of
the n-th graph $\Gamma_n$ are the elements of $\mathcal{P}_n$, and there are arrows $T^j(B_i(n)) \to T^{j+1}(B_i(n))$
if $0 \le j < h_i(n) - 1$ and also $T^{h_i(n)-1}(B_i(n)) \to B_{i'}(n)$ if
$T^{h_i(n)}(B_i(n)) \cap B_{i'}(n) \ne \emptyset$. As bonding maps $\pi_n : \Gamma_n \to \Gamma_{n-1}$ we take the
inclusion: $\pi_n(A) = A'$ for vertices A of $\Gamma_n$ and A' of $\Gamma_{n-1}$ if $A \subset A'$. Then the
inverse limit space
$$\Gamma = \varprojlim(\Gamma_n, \pi_n) = \{(\gamma_n)_{n \ge 0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \ge 0\}$$
is the graph cover and (KR6) provides the positive directional property. The
dynamics $f : \Gamma \to \Gamma$ is by following the arrows as in equation (4.30).

5.2. Cutting and Stacking

The purpose of cutting and stacking is to create invertible maps of the
interval that preserve Lebesgue measure and have further desired properties
such as “unique ergodicity”, “not weak mixing”, or to the contrary “weak
mixing but not strong mixing”. The area was initiated by famous examples
by Kakutani and by Chacon, and we first give Kakutani's example.

Example 5.5 (Kakutani). Start with the half-open interval [0, 1) as first
stack. Cut it in half and put the right half on top of the left half. Repeat
this procedure.

[Figure 5.1. The von Neumann-Kakutani map by cutting and stacking.]

The limit map T : [0, 1) → [0, 1) is called the von Neumann-Kakutani map and the resulting formula is
$$
T(x) = \begin{cases}
x + \frac12 & \text{if } x \in [0, \frac12),\\
x - \frac14 & \text{if } x \in [\frac12, \frac34),\\
x - \frac34 + \frac18 & \text{if } x \in [\frac34, \frac78),\\
\quad\vdots & \quad\vdots\\
x - (1 - \frac{1}{2^n}) + \frac{1}{2^{n+1}} & \text{if } x \in [1 - \frac{1}{2^n}, 1 - \frac{1}{2^{n+1}}),\ n \ge 0;
\end{cases}
$$

see Figure 5.1. If x ∈ [0, 1) is written in base 2, i.e.
$$x = 0.b_1b_2b_3\dots, \quad b_i \in \{0, 1\}, \quad x = \sum_i b_i 2^{-i},$$
then T acts as the adding machine or odometer: add 0.1 with carry. That
is, if $k = \min\{i \ge 1 : b_i = 0\}$, then $T(0.b_1b_2b_3\dots) = 0.0\dots01b_{k+1}b_{k+2}\dots$.
If k = ∞, so x = 0.111111..., then T(x) = 0.0000.... However, x =
0.11111··· = 1, so we need to extend the domain of T.
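The identification of the von Neumann-Kakutani map with the odometer can be checked exactly on dyadic rationals. The Python sketch below (our names) evaluates $T$ from the piecewise formula. It also adds 0.1 with carry on the binary digits $b_1 b_2 \dots$, and compares the two results using exact `Fraction` arithmetic. Points with all digits equal to 1 are excluded, since $T$ is not yet defined there.

```python
from fractions import Fraction

def vnk_formula(x):
    """von Neumann-Kakutani map via the piecewise formula, for 0 <= x < 1."""
    n = 0
    while not (1 - Fraction(1, 2**n) <= x < 1 - Fraction(1, 2**(n + 1))):
        n += 1
    return x - (1 - Fraction(1, 2**n)) + Fraction(1, 2**(n + 1))

def vnk_odometer(bits):
    """Add 0.1 with carry to the right: bits[0] is b_1, bits[1] is b_2, ..."""
    bits = list(bits)
    k = bits.index(0)  # first zero digit; we assume one exists
    return [0] * k + [1] + bits[k + 1:]

def to_fraction(bits):
    """The dyadic rational 0.b_1 b_2 ... b_m."""
    return sum(Fraction(b, 2**(i + 1)) for i, b in enumerate(bits))
```

For instance, `vnk_formula(Fraction(3, 4))` should equal `Fraction(1, 8)`, matching $x - \frac34 + \frac18$ on $[\frac34, \frac78)$.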

The general procedure is as follows:

• Cut the unit interval into several intervals, say Δ1 , Δ2 , . . . (these
will be called the stacks, and in this initial step they all have height
1) and a remaining interval S (the spacer).
• Cut each stack into slices (a fixed finite number for each Δi ), and
potentially also cut off some intervals from S.
• Pile the slices of the stacks and the cut-off pieces of S on top of
each other, according to some fixed rule. By choosing the pieces
in the previous steps of the correct length, we can ensure that all
intervals in each separate stack have the same length, so they can
be neatly aligned vertically. Denote the j-th level of the i-th stack
by $\Delta_i^j$.
• Map every point on a level $\Delta_i^j$ of stack i directly to the level $\Delta_i^{j+1}$
above it. Then every point has a well-defined image (except for
points at the top levels in a stack and points in the remainder of
S) and also a well-defined preimage (except for points at a bottom
level in a stack and points in the remainder of S). Where defined,
Lebesgue measure is preserved.
• Repeat the process, now slicing vertically through whole stacks and
stacking whole slices on top of other slices, possibly putting some
intervals of S in between. Wherever the map was defined at a
previous step, the definition remains the same.
• As we repeat this procedure, the measure of points where the map
is not defined yet tends to zero. In the limit, assuming that the
spacer S will be entirely spent, there will only be a finite set of
points X max (not more than the number of stacks) without image
and a finite set of points X min (not more than the number of stacks)
without preimage.
• The resulting transformation of the interval preserves Lebesgue
measure and is invertible up to at most finitely many points.
Remark 5.6. If the stacking is “right stacks on top of left stacks”, as is the
case in many examples, then 0 ∈ X min and 1 ∈ X max . It seems appealing
to map X max to X min (e.g. T (1) = 0 in Example 5.5), but it is not always
possible to do this continuously or bijectively as Example 5.7 shows.
Example 5.7. Figure 5.2 shows the cutting and stacking procedure for the
Fibonacci substitution shift. There are two points, 1 and $\gamma^{-1} = \frac12(\sqrt5 - 1)$,
that are always at the top of their stacks, but 0 is the only point that is
always at the bottom of a stack.
Remark 5.8. The n-th step in a cutting and stacking construction relates to
the n-th Kakutani-Rokhlin partition as follows: The stacks created at the
n-th step in the cutting and stacking procedure, with heights $h_i(n)$, are
$\{T^j(B_i(n))\}_{j=0}^{h_i(n)-1}$, so $B_i(n) = \Delta_i^0$ is the bottom level of the i-th stack at step n. The unused spacer is left out
and $T^{h_i(n)}|_{B_i(n)}$ remains undefined until later steps.

Definition 5.9. The rank is $r = \liminf_n \#\{\text{stacks used in step } n\}$, regardless
of whether spacers are used or not. Note that this is the number of
stacks after piling the slices of the previous stacks on top of each other, so
not the number of slices.

[Figure 5.2. A cutting and stacking representation of the Fibonacci substitution, with $\gamma = \frac12(1 + \sqrt5)$ and levels marked at $0, \gamma^{-5}, \gamma^{-4}, \gamma^{-3}, \gamma^{-2}, \gamma^{-1}, 1$; 0 is the minimal point, and 1 and $\gamma^{-1}$ are the maximal points.]

Minimal subshifts can usually be represented as cutting and stacking systems,
and the following result¹ of Ferenczi [244, Proposition 4] ties the rank
of the cutting and stacking system to the word-complexity of the subshift.

Theorem 5.10. Let (X, σ) be a minimal subshift with word-complexity satisfying
$p_X(n) \le an + b$ for all $n \in \mathbb{N}$. Then (X, σ) can be represented as a
cutting and stacking system of rank $r \le 2a$.
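As an illustration, the Fibonacci shift is Sturmian, so $p_X(n) = n + 1$. Theorem 5.10 then applies with $a = b = 1$ and gives rank at most 2. The Python sketch below counts the distinct factors of a long prefix of the Fibonacci fixed point, using the convention $0 \to 01$, $1 \to 0$ (our choice). This is a finite numerical check, not a proof. It only works because every short factor already occurs in the prefix, which holds since the Fibonacci word is linearly recurrent.

```python
def fibonacci_word(min_length):
    """A prefix of the Fibonacci fixed point, by iterating 0 -> 01, 1 -> 0 on '0'."""
    word = "0"
    while len(word) < min_length:
        word = "".join("01" if c == "0" else "0" for c in word)
    return word

def complexity(word, n):
    """Number of distinct length-n factors of the finite word."""
    return len({word[i:i + n] for i in range(len(word) - n + 1)})
```

On a prefix of length at least 5000, `complexity(w, n)` should equal `n + 1` for small n.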

Cutting and stacking transformations, considered as single-valued maps
on the interval [0, 1), are discontinuous. To be definite, we take the intervals
of continuity at each finite step in the construction as closed from the left,
open from the right. That is, levels $\Delta_i^j$ in the stacks or pieces of spacer are
of the form [a, b), and T is discontinuous at a (unless a = 0). In particular,
the whole map is defined on [0, 1) but not at 1.
A way of resolving the discontinuities is to double all the discontinuity
points x into $x^-$ belonging to the interval at the left of x, and $x^+$ belonging
to the interval at the right of x. That is, $T(x^-) = \lim_{y \uparrow x} T(y)$ and $T(x^+) = \lim_{y \downarrow x} T(y)$.
We start with domain [0, 1], and in the limit, T is defined on a
totally disconnected space $I^*$. Namely, $I^*$ is a Cantor set, and T and $T^{-1}$ are
defined and continuous everywhere, except (possibly) at the sets $X^{\max}$ and
$X^{\min}$, respectively. We illustrate the issue with two examples.

Example 5.11.
I: Take spacer $[r^+, 1] = [\frac12, 1]$ and at every step slice the stack in
two halves, stack the right half on the left, and put a single layer
of spacer on top; see Figure 5.3 (left). Doubling the discontinuity
points will not produce a minimal map, irrespective of how we define
T at 1 (note that $T(r^-) = \frac34$). If we set T(1) = 1, then T has a
fixed point where it is discontinuous. If we set T(1) = 0, then T is
continuous, but not minimal because $T^n(r^-) \in S$ for all n ≥ 1.
II: Take spacer $[r^+, 1] = [\frac23, 1]$ and at every step slice the stack in three
equal thirds, stack the second slice on the first, then put in a single
layer of spacer, and finally stack the third slice on top; see Figure 5.3
(right). This is the Chacon map (or one of the Chacon maps; see
Example 1.27), related to the non-primitive Chacon substitution
$$\chi_{\mathrm{chac}} : \begin{cases} 0 \to 0010,\\ 1 \to 1. \end{cases}$$

¹ Extending a particular case in [35]; see also [226, 392, 393] for later results, also for representations as Bratteli-Vershik systems; see Section 5.4.

[Figure 5.3. Two examples of cutting and stacking with spacer.]

If after doubling the discontinuity points we set T (1) = T (r− ) =

0, then the result is a continuous, minimal, uniquely ergodic (but
weakly mixing) transformation on the Cantor set, but 1 has no

Proposition 5.12. The $(p_i)$-odometer is conjugate to a cutting and stacking
transformation on the space $I^*$, i.e. with discontinuity points doubled. Adding
machines have rank 1.

Proof. The von Neumann-Kakutani map of Example 5.5 is a realization of
the dyadic odometer. For the general $(p_i)$-odometer, proceed as follows: In
step i, cut the stack in $p_i \ge 2$ slices, and, without spacer, put them on top
of each other to make the stack for step i + 1. Since only one stack is used
at each step, adding machines are rank 1 transformations. The resulting
cutting and stacking map T : [0, 1] → [0, 1] is
$$
T(x) = \begin{cases}
x - \big(1 - \frac{1}{q_{i-1}}\big) + \frac{1}{q_i} & \text{if } x \in [1 - \frac{1}{q_{i-1}}, 1 - \frac{1}{q_i}),\ i \ge 1,\\
0 & \text{if } x = 1,
\end{cases}
$$
where $q_i = p_1p_2\cdots p_i$ and $q_0 = 1$. □
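The formula in this proof can be compared with the mixed-radix odometer. Write $x = \sum_i d_i/q_i$ with digits $0 \le d_i < p_i$. Then $T$ should reset the leading run of maximal digits to 0 and raise the first non-maximal digit by one. The Python sketch below (our names) checks this exactly on a finite number of digits, for a sample choice of $(p_i)$. Points in the top level of the truncated stack are skipped, because their image depends on digits beyond the truncation.

```python
from fractions import Fraction
from itertools import product

def q(p, i):
    """q_i = p_1 p_2 ... p_i, with q_0 = 1 (p = [p_1, p_2, ...] is 0-indexed)."""
    out = 1
    for j in range(i):
        out *= p[j]
    return out

def cs_map(x, p):
    """T(x) = x - (1 - 1/q_{i-1}) + 1/q_i on [1 - 1/q_{i-1}, 1 - 1/q_i), i >= 1."""
    for i in range(1, len(p) + 1):
        lo = 1 - Fraction(1, q(p, i - 1))
        hi = 1 - Fraction(1, q(p, i))
        if lo <= x < hi:
            return x - lo + Fraction(1, q(p, i))
    raise ValueError("x lies in the top level of the truncated stack")

def odometer(digits, p):
    """Mixed-radix add one and carry: digits[i] ranges over 0, ..., p[i] - 1."""
    digits = list(digits)
    for i, base in enumerate(p):
        if digits[i] + 1 < base:
            digits[i] += 1
            return digits
        digits[i] = 0
    raise ValueError("carry past the last digit")

def point(digits, p):
    """The point x = sum_i d_i / q_i of [0, 1) encoded by the digits d_1 d_2 ..."""
    return sum(Fraction(d, q(p, i + 1)) for i, d in enumerate(digits))

def all_digit_strings(p):
    """All digit strings (d_1, ..., d_m) with 0 <= d_i < p_i."""
    return product(*(range(base) for base in p))
```

With `p = [2, 2, 2]` the map reduces to the von Neumann-Kakutani map; for example, `cs_map(Fraction(1, 2), [2, 2, 2])` should return `Fraction(1, 4)`.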
We call a cutting and stacking map primitive if for every stack $\{\Delta_i^j\}_{j=0}^{h_i-1}$
at each cutting and stacking step n, there is n' > n such that part of
$\{\Delta_i^j\}_{j=0}^{h_i-1}$ is stacked inside each stack $\{\Delta_{i'}^j\}_{j=0}^{h_{i'}-1}$ at cutting and stacking
step n'. This definition is the equivalent of primitivity in substitution and
S-adic subshifts.
Proposition 5.13. If a cutting and stacking transformation $(I^*, T)$ is primitive
and the maximal number $s^*$ of consecutive layers of spacer is finite, then
$(I^*, T)$ is minimal.

Proof. Let an open interval U be compactly contained in [0, 1], and take
n so large that the base Δ at the n-th cutting and stacking step has length
$|\Delta| < \frac12|U|$ and the part of U inside the spacer is all included in the stacks.
Then for at least one i,
$$\text{there exists } 0 \le j < h_i \text{ such that } \Delta_i^j \subset U, \tag{5.2}$$
and $h_i < \infty$ by our assumption on the spacer. By primitivity, there is n' such that
for each i', part of the stack $\{\Delta_i^j\}_{j=0}^{h_i-1}$ finds its way into the stack $\{\Delta_{i'}^j\}_{j=0}^{h_{i'}-1}$
of cutting and stacking step n'.
Therefore at this step, (5.2) is true for all i'. This implies that for every
x ∈ [0, 1] with a properly defined orbit, there is $0 \le k \le \max_{i'} h_{i'} + s^*$ such
that $T^k(x) \in U$. Minimality follows by Proposition 2.17. □
Example 5.14. Transitivity is a weaker property than primitivity, and there
are transitive but non-minimal cutting and stacking transformations. In-
deed, the following non-primitive but transitive rank 2 cutting and stacking
transformation is not minimal. At every step, cut the left stack into three
slices a, b, c and the right stack into two slices d, e, and stack ac and dbe.
This implies that in all later cutting and stacking steps, the left stack is en-
coded acac · · · ac, so the orbit of 0 has a non-dense closure. However, every
well-defined orbit outside orbT (0) is dense. See also Example 5.45.

5.3. Enumeration Systems

A generalization of adding machines is an enumeration system², in the
sense that the sequence $(q_k)_k$ for $q_k = \prod_{j=1}^k p_j$ is replaced by an arbitrary
strictly increasing integer sequence $G = (G_k)_k$. General references are
[48, 49, 286]. The theory goes back to Coquet [166], while many number-theoretic
properties are presented in [253].
Given an increasing sequence of integers $(G_k)_{k \ge 0}$ with $G_0 = 1$, called the
enumeration scale (after the French échelles de numération), we can
construct the greedy expansion $\langle N_0\rangle = x_0x_1x_2\cdots$ of any integer $N_0 \ge 0$
as follows. Start with the sequence $x = x_0x_1x_2\cdots$ of all zeroes. Take the
maximal $G_k \le N_0$, replace $x_k$ with $x_k = \lfloor N_0/G_k \rfloor$, and continue with the
remainder $N_1 := N_0 - x_kG_k$. That is, find $k'$ maximal such that $G_{k'} \le N_1$
and let $x_{k'} = \lfloor N_1/G_{k'} \rfloor$, etc. After a finite number of steps, $N_i = 0$ and
$$N_0 = \sum_j x_jG_j.$$
² Also called generalized odometers; they are related to the Ostrowski numerations [439]; see Example 5.22.
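The greedy algorithm just described translates directly into code. In the Python sketch below, the scale is a finite list `G` with `G[0] == 1`, and we assume $n$ is smaller than its last entry, so that all digits beyond the list are zero. The names are ours.

```python
def greedy_expansion(n, G):
    """Greedy expansion <n> = x_0 x_1 x_2 ... of n >= 0 in the scale G.

    G is a finite, strictly increasing list with G[0] == 1; we assume
    n < G[-1], so digits beyond the list are all zero.
    """
    if not 0 <= n < G[-1]:
        raise ValueError("n must satisfy 0 <= n < G[-1]")
    x = [0] * len(G)
    remainder = n
    while remainder > 0:
        k = max(i for i, g in enumerate(G) if g <= remainder)  # maximal G_k <= remainder
        x[k] = remainder // G[k]                               # x_k = floor(N_i / G_k)
        remainder -= x[k] * G[k]                               # N_{i+1} = N_i - x_k G_k
    return x

def value(x, G):
    """Recover N_0 = sum_j x_j G_j from the digits."""
    return sum(xj * gj for xj, gj in zip(x, G))
```

Three scales are worth trying:

- Binary scale $G_k = 2^k$: this gives the usual binary digits, least significant first.
- Fibonacci scale $1, 2, 3, 5, 8, \dots$: this gives Zeckendorf expansions with no two adjacent 1's.
- Scale $1, 2, 3, 4, 6, 9, 13, \dots$ (which reappears in Example 5.27): any two 1's are separated by at least two 0's.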
Remark 5.15. Sometimes the greedy expansion is the only possible expansion,
for example the binary expansion, if $G_n = 2^n$ and $x_n \in \{0, 1\}$. Zeckendorf's
Theorem [567] states that the expansion exists and is unique if the
$G_n$ are the Fibonacci numbers and the digits $x_j \in \{0, 1\}$ satisfy $x_jx_{j+1} = 0$.

Let $X'_G = \langle\mathbb{N}_0\rangle := \{\langle n\rangle : n \ge 0\}$ be the set of greedy expansions of
non-negative integers. Each $x \in X'_G$ is an infinite string of integers, but with
only finitely many non-zero entries. Let $X_G$ be the closure of $X'_G$ in the
product topology on $\mathbb{N}_0^{\mathbb{N}}$.
Lemma 5.16.
$$X_G = \Big\{x = x_0x_1x_2\cdots \in \mathbb{N}_0^{\mathbb{N}} : 0 \le x_n < G_{n+1}/G_n,\ x_0G_0 + x_1G_1 + \dots + x_nG_n < G_{n+1} \text{ for all } n \ge 0\Big\}.$$

Proof. For every n, if $x_0x_1\dots x_n000\dots$ is the greedy expansion of $r_n := x_0G_0 + \dots + x_nG_n$,
then $r_n < G_{n+1}$. Hence $x \in X_G$. If, on the other hand,
$x \in X_G$, so $r_n := x_0G_0 + \dots + x_nG_n < G_{n+1}$ for each n, then no “carry”
takes place, so $x_0x_1\cdots x_n000\dots$ is the greedy expansion of $r_n$. □

Define the “addition of one” map $a : X'_G \to X'_G$ as $a(\langle n\rangle) = \langle n+1\rangle$.
This leads to an “add one and carry” algorithm extending the one of (4.38):
c := 1 ; k := 0
Repeat
    s := x_k + c
    If there is n ≥ k such that G_{n+1} = s G_k + x_{k+1} G_{k+1} + · · · + x_n G_n (take n maximal)
    then x_k := 0; x_{k+1} := 0; . . . ; x_n := 0; k := n + 1
    else x_k := s; c := 0
Until c = 0
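Here is a minimal Python version of the add-one-and-carry step. It works on finite digit lists (an assumption), so it only handles values below the last entry of `G`. The rule implemented is the following. Take the maximal $n$ with $1 + x_0G_0 + \dots + x_nG_n = G_{n+1}$. Then set $x_0, \dots, x_n$ to 0 and raise $x_{n+1}$ by one. If no such $n$ exists, only $x_0$ goes up. The function name is ours.

```python
def add_one(x, G):
    """'Add one and carry' on the greedy expansion x (a list of digits) in scale G.

    Takes the maximal n with 1 + x_0 G_0 + ... + x_n G_n == G_{n+1}; then
    x_0, ..., x_n become 0 and x_{n+1} goes up by 1. If there is no such n,
    only x_0 goes up by 1.
    """
    x = list(x)
    partial, carry_to = 1, None   # partial = 1 + x_0 G_0 + ... + x_n G_n
    for n in range(len(G) - 1):
        partial += x[n] * G[n]
        if partial == G[n + 1]:
            carry_to = n + 1      # remember the position after the maximal such n
    if carry_to is None:
        x[0] += 1
    else:
        x[:carry_to] = [0] * carry_to
        x[carry_to] += 1
    return x
```

In the Fibonacci scale, repeated calls starting from $0^\infty$ should run through the Zeckendorf expansions of $1, 2, 3, \dots$. Applied to the truncation $1010101000$ of the maximal sequence $(1, 0, 1, 0, \dots)$ of Example 5.21, a single call carries all the way to $0000000100$.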
Proposition 5.17. Let Q(n) < n be the maximal integer such that
$$G_n = d_1G_{n-1} + \dots + d_{n-Q(n)}G_{Q(n)} \tag{5.3}$$
for integers $0 \le d_j \le G_{j+1}/G_j$, so maximality implies that $d_{n-Q(n)} > 0$.
Then $\lim_{n\to\infty} Q(n) = \infty$ if and only if a can be extended to a continuous
“add one and carry” operation $a : X_G \to X_G$ which is then also surjective
and minimal.
Proof. We show that under the condition in the proposition, $a : X'_G \to X'_G$
is uniformly continuous. Then the extension is well-defined and uniformly
continuous as well. Since $a : X'_G \to X'_G \setminus \{0^\infty\}$ is surjective and
$a(\langle G_n - 1\rangle) = \langle G_n\rangle \to 0^\infty$ as $n \to \infty$, surjectivity follows.
Now for minimality, let $Z := [x_0 \dots x_{N-1}]$ be an arbitrary cylinder set
and let $S := \sum_{j=0}^{N-1} x_jG_j$. Because $Q(n) \to \infty$, there is $N'$ such that $Q(n) \ge N$
for all $n \ge N'$. Then $\langle S + x_{N'}G_{N'}\rangle \in Z$ for each $0 \le x_{N'} < G_{N'+1}/G_{N'}$.
Next $\langle S + G_{N'+1}\rangle \in Z$ and $\langle S + G_{N'} + G_{N'+1}\rangle \in Z$ or $\langle S + G_{N'+2}\rangle \in Z$,
whichever integer is smaller. Continuing this way we find for each $n \in \mathbb{N}$
some $m \le G_{N'}$ such that $\langle n + m\rangle \in Z$. This is uniform recurrence, so by
Proposition 2.17 minimality follows.
To transfer these properties to the whole of $X_G$, let us prove that
$a : X'_G \to X'_G \setminus \{0^\infty\}$ is uniformly continuous. Let $N \in \mathbb{N}$ be arbitrary, and
take $N'$ so large that $Q(n) > N$ for all $n \ge N'$. For any $x \in X'_G$ and
$y \in [x_0 \dots x_{N'}] \cap X'_G$, $a(x)_i = a(y)_i$ for $0 \le i \le N$, because, by the choice of
$N'$, the first N + 1 digits cannot give a carry to a digit $n \ge N'$. This proves
uniform continuity with $\varepsilon = 2^{-(N+1)}$ for arbitrary N and $\delta = 2^{-N'}$.
Conversely, suppose $Q(n) \not\to \infty$, so there is a sequence $(n_k)_{k \ge 1}$ with
N0 = Q(nk ) for all k. Let nk be such that dnk = 0; then in (Gnk ), the
entries xnk = 0 and x0 G0 + x1 G1 + · · · + xN0 −1 GN0 −1 = GN0 − 1. Also
nk → ∞ as k → ∞. Therefore, for any ε > 0, there is k such that
d((Gnk − 1), (Gnk − Gnk − 1)) < ε. But a((Gnk − 1)) starts with N0 + 1
zeroes and a((Gnk −Gnk −1)) starts with N0 zeroes and some non-zero digit.
Hence a is not continuous on XG. 

Lemma 5.18. Let
$$X''_G = \{x \in X_G : x_0G_0 + \dots + x_nG_n = G_{n+1} - 1 \text{ infinitely often}\}. \tag{5.4}$$
Then $a(x) = 0^\infty$ if and only if $x \in X''_G$, and $a^{-1}$ is well-defined at every
$x \in X_G \setminus \{0^\infty\}$.

Proof. If $x \ne y \notin X''_G$, there is a largest $n_x$ such that $x_0G_0 + \dots + x_{n_x}G_{n_x} = G_{n_x+1} - 1$
and a largest $n_y$ such that $y_0G_0 + \dots + y_{n_y}G_{n_y} = G_{n_y+1} - 1$. Take $n = \max\{n_x, n_y\} + 1$.
If no such $n_x$, $n_y$ exist, then take n = 1. Then $a(x)_j = x_j$ and $a(y)_j = y_j$
for all $j > n$. Hence, if $x_j \ne y_j$ for some $j > n$, then $a(x) \ne a(y)$. If $x_j = y_j$
for all $j > n$, then $a(x)_0, \dots, a(x)_n, 0, 0, \dots$ and $a(y)_0, \dots, a(y)_n, 0, 0, \dots$
are the greedy expansions of $r(x) := 1 + x_0G_0 + \dots + x_nG_n$ and $r(y) := 1 + y_0G_0 + \dots + y_nG_n$,
respectively, and hence they are not equal because
$x \ne y$. This proves that a is injective on $X_G \setminus X''_G$. Surjectivity follows from
Proposition 5.17, so $a^{-1} : X_G \setminus \{0^\infty\} \to X_G \setminus X''_G$ is well-defined.
On the other hand, if $x \in X''_G$, then $a(x) = 0^\infty$, because the condition
$x_0G_0 + x_1G_1 + \dots + x_nG_n = G_{n+1} - 1$ holds for infinitely many n. □
Corollary 5.19. If $x \in X_G$, then $a(x) = 0^\infty$ if and only if $x \in X''_G$. Hence
a is invertible if and only if $\#X''_G = 1$.

The set XG is called the arbre de retenues in [48]. The system (XG , a)
is called an enumeration system.

Example 5.20. For an integer sequence $(p_j)_{j \ge 0}$ with $p_0 = 1$, $p_j \ge 2$ for
$j \ge 1$, set $G_n = \prod_{j=1}^n p_j$. Then the corresponding numeration scale $X_G$
is exactly the odometer, and $X''_G = \{(p_1 - 1, p_2 - 1, p_3 - 1, p_4 - 1, \dots)\}$.
Therefore $a : X_G \to X_G$ is invertible.
Example 5.21. If (Gn )n≥0 = 1, 2, 3, 5, 8, 13, 21, . . . are the Fibonacci num-
bers, then XG is exactly the space for the Fibonacci SFT, although the ad-
dition map is of course entirely different from the shift. The two “maximal”
sequences are

$$X''_G = \{(1, 0, 1, 0, 1, 0, 1, \dots),\ (0, 1, 0, 1, 0, 1, \dots)\},$$
so a is not invertible.
Example 5.22. The standard continued fraction expansion of a number
θ ∈ (0, 1) is
$$\theta = [0; a_1, a_2, a_3, \dots] := \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}};$$
see Section 8.2. For every rational θ ∈ (0, 1) there are two finite expansions,
and for every irrational θ the expansion is unique. The convergents $\frac{p_n}{q_n} :=
[0; a_1, a_2, \dots, a_n]$ are the partial fractions obtained by cutting the infinite
expansion at step n. If we set $p_{-1} = 1$, $q_{-1} = 0$, $p_0 = 0$, $q_0 = 1$, then the
sequences $(p_n)_{n \ge 1}$ and $(q_n)_{n \ge 1}$ satisfy the recursive relations (see Section 8.2)
$$p_n = a_np_{n-1} + p_{n-2}, \qquad q_n = a_nq_{n-1} + q_{n-2}.$$
Furthermore, $(\frac{p_n}{q_n} - \theta)_{n \ge 1}$ is an alternating sequence converging to 0
(super)exponentially, depending on how fast the sequence $(a_n)_{n \ge 1}$ grows.
If we let $(G_n)_{n \ge 0} = (q_n)_{n \ge 0}$, then we obtain the Ostrowski numeration
$X_G$ (see [439] and [20, Section 3.9]) with
$$G_0 = 1, \quad G_1 = a_1, \quad G_n = a_nG_{n-1} + G_{n-2},$$
so $a_i \equiv 1$ reduces this example to Example 5.21. In any case
$$X''_G = \{(a_1 - 1, 0, a_3, 0, a_5, 0, a_7, \dots),\ (0, a_2, 0, a_4, 0, a_6, \dots)\},$$
so $0^\infty$ has two preimages under a. A curious property of this numeration is
that θ(n + 1) = i (n)i Gi ; see [20, Corollary 9.1.14].
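The recursions for $p_n$ and $q_n$ in Example 5.22 are easy to test. The $q_n$ are the Ostrowski scale $G_n$. The Python sketch below (our names) computes the convergents from the recursion and checks them against exact evaluation of the truncated continued fraction. As a sample irrational, take $\theta = \sqrt2 - 1 = [0; 2, 2, 2, \dots]$ (our choice); its convergents should alternate around $\theta$.

```python
from fractions import Fraction

def convergents(a):
    """Convergents (p_n, q_n), n = 1..N, of [0; a_1, ..., a_N] via the recursions
    p_n = a_n p_{n-1} + p_{n-2}, q_n = a_n q_{n-1} + q_{n-2},
    started from p_{-1} = 1, q_{-1} = 0, p_0 = 0, q_0 = 1."""
    p_prev, p = 1, 0
    q_prev, q = 0, 1
    out = []
    for a_n in a:
        p_prev, p = p, a_n * p + p_prev
        q_prev, q = q, a_n * q + q_prev
        out.append((p, q))
    return out

def evaluate(a):
    """Exact value of the finite continued fraction [0; a_1, ..., a_N]."""
    x = Fraction(0)
    for a_n in reversed(a):
        x = 1 / (a_n + x)
    return x
```

For `a = [2] * 6` the convergents should be 1/2, 2/5, 5/12, 12/29, 29/70, 70/169, lying alternately above and below $\sqrt2 - 1$.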
Example 5.23. Let $G_k = S_k$ be the cutting times of a unimodal map
$f : [0, 1] \to [0, 1]$ with critical point c. Then $G_0 = 1$ and $G_k = G_{k-1} + G_{Q(k)} \le 2G_{k-1}$.
Here the kneading map plays exactly the role of the Q in
Proposition 5.17, and we assume that $Q(k) \to \infty$. Then
$$X_G = \{x \in \{0, 1\}^{\mathbb{N}} \mid x_n = 1 \Rightarrow x_j = 0 \text{ for } Q(n+1) \le j < n\}. \tag{5.5}$$
In the language of [48], such a G is a low enumeration scale (échelle basse).
Define $\pi : X'_G \to \operatorname{orb}(c)$ by $\pi(\langle n\rangle) = f^n(c)$. This map is uniformly
continuous and extends to a continuous map $\pi : X_G \to \omega(c)$. Although this
extension need not be injective, we have $\pi \circ a = f \circ \pi$; see [125, Theorem 1].
Remark 5.24. Let f : [0, 1] → [0, 1] be a unimodal map with kneading map
Q(k) → ∞. One can show that f |ω(c) is invertible if and only if #(XS ) = 1.
This applies to infinitely renormalizable unimodal maps, but also if there is
a strange adding machine; see [24].

In [49] we have the following result concerning entropy.

Theorem 5.25. Assume that Q(n) → ∞ and Gn /Gn−1 is bounded (so that
XG is compact). Then the enumeration system (XG , a) has zero entropy.

Recall from Theorem 4.104 that an equicontinuous minimal Cantor system
is conjugate to an adding machine, so, unless $G_{n-1}$ divides $G_n$ for all n
sufficiently large, enumeration systems are not equicontinuous.

Proof. Let ε > 0 be arbitrary and let N be such that $2^{-N} < \varepsilon$. Set $Q := \inf_{n > N} Q(n)$.
Let $x \in X_G$ be arbitrary. For a(x) to have a carry beyond
index N, we need $\sum_{i=1}^N x_iG_i \ge G_Q - 1$, and this happens at most once
every $G_Q$ iterates of a. At each such iterate there can be a carry or not, so it
can no more than double the number of points in an (n, ε)-separated set
and still have an $(n + G_Q, \varepsilon)$-separated set. The maximal cardinality of a
$(G_n, \varepsilon)$-separated set is bounded by $\varepsilon^{-1}2^{G_n/G_Q}$, so
$$h_{top}(a) \le \lim_{\varepsilon \to 0} \limsup_{n \to \infty} \frac{\log(\varepsilon^{-1}2^{G_n/G_Q})}{G_n} \le \lim_{N \to \infty} \frac{\log 2}{G_Q} = 0.$$
This ends the proof. □

5.3.1. Factors of Enumeration Systems. Let
$$|||x||| = \min\{x - \lfloor x \rfloor,\ \lceil x \rceil - x\}$$
denote the distance of x to the nearest integer, and let $\operatorname{sf}(x) = x - \lceil x - \frac12 \rceil$
be a signed fractional part of x taken in $(-\frac12, \frac12]$. Hence |||x||| is the absolute
value of sf(x).
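Proposition 5.26. Let $(X_G, a)$ be an enumeration system associated to the
integer sequence $(G_n)_{n \ge 0}$. If $\rho \in \mathbb{R}$ is such that
$$\sum_{j \ge 0} |||\rho G_j||| < \infty, \tag{5.6}$$
then
$$g : X_G \to \mathbb{T}^1, \quad x \mapsto e^{2\pi i \sum_{j=0}^\infty \rho x_jG_j},$$
where $\mathbb{T}^1$ is the unit circle in $\mathbb{C}$, is a continuous eigenfunction of $(X_G, a)$, with
eigenvalue $e^{2\pi i\rho}$.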

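Proof. Take ε > 0 and $N \in \mathbb{N}$ such that $\sum_{j=N}^\infty |||\rho G_j||| < \varepsilon$. Then
$$\Big|\sum_{j=0}^\infty \operatorname{sf}(\rho x_jG_j) - \sum_{j=0}^N \operatorname{sf}(\rho x_jG_j)\Big| \le \sum_{j=N+1}^\infty |||\rho G_j||| < \varepsilon.$$
This shows that $g : X_G \to \mathbb{T}^1$ is uniformly continuous. For each $n \in \mathbb{N}_0$, we
have $g(\langle n\rangle) = e^{2\pi i\rho n}$. Therefore
$$g \circ a(\langle n\rangle) = g(\langle n+1\rangle) = e^{2\pi i\rho(n+1)} = e^{2\pi i\rho}\, g(\langle n\rangle).$$
By uniform continuity, we can extend this relation from $X'_G$ to $X_G$. □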
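Condition (5.6) and the eigenvalue relation can be explored numerically. The Python sketch below uses the Fibonacci scale $1, 2, 3, 5, \dots$ with $\rho$ the golden mean. This example is our choice; it works because the golden mean is a Pisot number, so $|||\rho G_j|||$ decays geometrically. The sketch implements $\operatorname{sf}$ and $|||\cdot|||$ as defined above, recomputes greedy expansions, and evaluates $g$ by summing signed fractional parts. All names are ours, and the computation is in floating point.

```python
import cmath
import math

def sf(x):
    """Signed fractional part sf(x) = x - ceil(x - 1/2), with values in (-1/2, 1/2]."""
    return x - math.ceil(x - 0.5)

def dist_to_int(x):
    """|||x|||: distance from x to the nearest integer, i.e. |sf(x)|."""
    return abs(sf(x))

def greedy_expansion(n, G):
    """Greedy expansion of n in the scale G (G[0] == 1, increasing, G[-1] > n)."""
    x, rem = [0] * len(G), n
    while rem > 0:
        k = max(i for i, g in enumerate(G) if g <= rem)
        x[k], rem = rem // G[k], rem % G[k]
    return x

def eigenfunction(x, G, rho):
    """g(x) = exp(2 pi i sum_j rho x_j G_j), summing signed fractional parts."""
    return cmath.exp(2j * math.pi * sum(sf(rho * xj * gj) for xj, gj in zip(x, G)))
```

With $\rho = (1+\sqrt5)/2$, the terms $|||\rho G_j|||$ should decrease geometrically. The value $g(\langle n\rangle)$ should agree with $e^{2\pi i\rho n}$, because each $\operatorname{sf}(\rho x_jG_j)$ differs from $\rho x_jG_j$ by an integer.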

The choice ρ = 0 trivially satisfies (5.6), with the eigenfunction g(x) ≡ 1.
This just confirms ergodicity, and it is not what we want to study. There can
be multiple (rationally independent) non-zero ρ's, say $\rho_1, \dots, \rho_{d-1}$, satisfying
(5.6) simultaneously. In this case, we can build a continuous factor map onto
the (d − 1)-dimensional torus:
$$g : X_G \to \mathbb{T}^{d-1}, \quad x \mapsto (g_1(x), \dots, g_{d-1}(x)),$$
where $g_k(x) = \exp(2\pi i \sum_{j=0}^\infty \rho_kx_jG_j)$. This happens for instance if $(G_n)_{n \ge 0}$
is generated by a recursive relation $G_n = a_1G_{n-1} + \dots + a_dG_{n-d}$ such that
the corresponding characteristic equation $1 = a_1x^{-1} + \dots + a_dx^{-d}$ has a
Pisot number λ as leading root; see Proposition 8.6. In this case, we can
take $\rho_1 = \lambda$, $\rho_2 = \lambda^2, \dots, \rho_{d-1} = \lambda^{d-1}$, and $\lambda^d$ is an integer combination
of $\{1, \lambda, \dots, \lambda^{d-1}\}$. The standard (and actually first [471]) example is the
Rauzy fractal based on the tribonacci number, i.e. the leading root of $x^3 = x^2 + x + 1$;
see Figure 4.4 (left).
Example 5.27. In [125]³, this approach is used to describe the ω-limit sets
of unimodal maps with kneading maps Q with k − Q(k) bounded. The left
panel in Figure 5.4 is constructed in this way from the sequence $(G_n)_{n \ge 0} =
1, 2, 3, 4, 6, 9, 13, 19, \dots$ (sometimes called the Narayana cow sequence⁴).
³ However, [125] also gives examples where (ω(c), f) factors to a torus (of any dimension) and to solenoids, i.e. circle suspensions over Cantor sets.
The picture suggests, and this is indeed true, that the boundary of such
a Rauzy fractal is a fractal, non-rectifiable curve. It has infinite length, but
the interesting question is whether it has positive two-dimensional Lebesgue
measure or not. Occasionally, this can be decided by a simple geometric
argument. For the case $x^3 = x^2 + 1$ with solutions $\lambda_0 > 1$ and $\lambda_1 = \bar\lambda_2$, the
space $X_G$ consists of sequences in which every two 1's are separated by at
least two 0's. Define $\pi : X_G \to \mathbb{C}$ as
$$\pi(x) = \lim_{n \to \infty} \Big(\sum_{j=0}^n \operatorname{sf}(\lambda_0x_jG_j) + i\sum_{j=0}^n \operatorname{sf}(\lambda_0^2x_jG_j)\Big)$$
and set $P = \pi(X_G)$. Identify the two-dimensional torus $\mathbb{T}^2$ with the quotient
space $\mathbb{C}/(\mathbb{Z} + i\mathbb{Z})$, and note that we have
$$T \circ \pi = \pi \circ a \quad \text{for the translation } T : \mathbb{T}^2 \to \mathbb{T}^2,\ z \mapsto z + \lambda_0 + i\lambda_0^2.$$
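In Figure 5.4 (left), the three shades refer to three cylinder sets $P_{00} = \pi(00X_G)$,
$P_{100} = \pi(100X_G)$, and $P_{0100} = \pi(0100X_G)$. As shown in [125],
$P = P_{00} \cup P_{100} \cup P_{0100}$ is the attractor (in the sense of Hutchinson [326])
of an iterated function system (IFS):
$$
\begin{cases}
\psi_{00} : P \to P_{00}, & z \mapsto \lambda_1^2 z,\\
\psi_{100} : P \to P_{100}, & z \mapsto \lambda_1^4 + \lambda_1^3 z,\\
\psi_{0100} : P \to P_{0100}, & z \mapsto \lambda_1^5 + \lambda_1^4 z.
\end{cases} \tag{5.7}
$$
Since $\lambda_0\lambda_1\lambda_2 = 1$, the squares of the absolute values of the contraction
factors sum to
$$
\begin{aligned}
|\lambda_1^2|^2 + |\lambda_1^3|^2 + |\lambda_1^4|^2 &= \lambda_1^2\lambda_2^2 + \lambda_1^3\lambda_2^3 + \lambda_1^4\lambda_2^4 = \lambda_0^{-2} + \lambda_0^{-3} + \lambda_0^{-4}\\
&= \lambda_0^{-4}(\lambda_0^2 + \lambda_0 + 1)\\
&= \lambda_0^{-4}\big(\lambda_0^4 - (\lambda_0 + 1)(\lambda_0^3 - \lambda_0^2 - 1)\big) = 1.
\end{aligned}
$$
Since each of the three maps multiplies two-dimensional Lebesgue measure by the square of its contraction factor, this gives Leb(P) = Leb(P₀₀) + Leb(P₁₀₀) + Leb(P₀₁₀₀), so that the three
cylinders overlap in a Lebesgue nullset. On the other hand, $P \bmod (\mathbb{Z} + i\mathbb{Z}) = \mathbb{T}^2$
because $\pi(X'_G) \bmod (\mathbb{Z} + i\mathbb{Z}) = \{n(\lambda_0 + i\lambda_0^2) : n \ge 0\} \bmod (\mathbb{Z} + i\mathbb{Z})$ is
dense in the torus.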
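The arithmetic behind the measure argument can be checked numerically. The Python sketch below does the following:

- finds $\lambda_0$ by bisection on $[1, 2]$;
- obtains $\lambda_1, \lambda_2$ from the quadratic factor $x^3 - x^2 - 1 = (x - \lambda_0)(x^2 + (\lambda_0 - 1)x + \lambda_0^{-1})$;
- confirms that $\lambda_1 = \bar\lambda_2$, that $\lambda_0\lambda_1\lambda_2 = 1$, and that the squared contraction factors of (5.7) sum to 1.

The root-finding method and the names are our choices.

```python
import cmath

def leading_root(tol=1e-15):
    """The root lambda_0 > 1 of x^3 = x^2 + 1, by bisection on [1, 2]."""
    lo, hi = 1.0, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid**3 - mid**2 - 1 < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def complex_roots(lam0):
    """The other two roots, from x^3 - x^2 - 1 = (x - lam0)(x^2 + (lam0 - 1) x + 1/lam0)."""
    b, c = lam0 - 1, 1 / lam0
    disc = cmath.sqrt(b * b - 4 * c)
    return (-b + disc) / 2, (-b - disc) / 2

def squared_contraction_sum(lam1):
    """|lambda_1^2|^2 + |lambda_1^3|^2 + |lambda_1^4|^2 for the IFS (5.7)."""
    return sum(abs(lam1**k) ** 2 for k in (2, 3, 4))
```

The sum returned by `squared_contraction_sum` should be 1 up to rounding, in line with the computation above.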
This discussion on Rauzy fractals raises the question of how the construction
in this section is related to the construction in Section 4.2.4, i.e.
the left and right panels of Figure 5.4.

[Figure 5.4. Rauzy fractals for $x^3 = x^2 + 1$ in two different constructions.]

⁴ The Indian mathematician Narayana Pandita (1325–1400) studied cows in much the same way that Fibonacci studied rabbits, only cows take more time to mature than rabbits.

Continuing our example, consider the substitution
$$\chi : \begin{cases} 0 \to 02,\\ 1 \to 0,\\ 2 \to 1, \end{cases} \qquad \text{with associated matrix } A = \begin{pmatrix} 1 & 1 & 0\\ 0 & 0 & 1\\ 1 & 0 & 0 \end{pmatrix}.$$

This matrix has left and right eigenvectors $(\lambda_i^2, \lambda_i, 1)$ and $(\lambda_i^2, 1, \lambda_i)^T$, where
$\lambda_i$, i = 0, 1, 2, are the roots of the characteristic polynomial $p(x) = x^3 - x^2 - 1$.
Hence p(x) = 0 is exactly the characteristic equation of our enumeration
scale. This means that the attracting right eigenspace of A is
$V = (\lambda_0^2, \lambda_0, 1)^\perp$. Applying Theorem 4.40 to this substitution (its prefix-suffix
graph is given in the right panel of Figure 4.5) we get that its Rauzy
fractal $R \subset V$ is the attractor of the graph-directed IFS
$$
\begin{cases}
R(0) = h(R(0)) \cup h(R(1)),\\
R(1) = h(R(2)),\\
R(2) = h(R(0)) + \pi(1_0),
\end{cases}
\qquad \text{where } h = A|_V. \tag{5.8}
$$
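The eigenvector claims for $A$ can be verified numerically. In the Python sketch below, $A$ is built from $\chi$ with the convention "entry $(i, j)$ counts the letter $i$ in $\chi(j)$", which reproduces the matrix above. The roots of $x^3 - x^2 - 1$ come from Newton's method plus the same quadratic factor as before. The convention, the root-finding method and the names are our choices.

```python
import cmath

# The substitution chi: 0 -> 02, 1 -> 0, 2 -> 1, as in the text.
CHI = {0: [0, 2], 1: [0], 2: [1]}
# Entry (i, j) counts occurrences of the letter i in chi(j).
A = [[sum(1 for c in CHI[j] if c == i) for j in range(3)] for i in range(3)]

def roots_x3_x2_1():
    """Roots of x^3 - x^2 - 1: Newton for the real root, then the quadratic factor."""
    lam = 1.5
    for _ in range(50):
        lam -= (lam**3 - lam**2 - 1) / (3 * lam**2 - 2 * lam)
    b, c = lam - 1, 1 / lam
    d = cmath.sqrt(b * b - 4 * c)
    return [lam, (-b + d) / 2, (-b - d) / 2]

def mat_vec(M, v):
    """Right multiplication M v."""
    return [sum(M[i][j] * v[j] for j in range(3)) for i in range(3)]

def vec_mat(u, M):
    """Left multiplication u M."""
    return [sum(u[i] * M[i][j] for i in range(3)) for j in range(3)]
```

For each root $\lambda$, `mat_vec(A, [lam**2, 1, lam])` should equal $\lambda$ times that vector. Likewise, `vec_mat([lam**2, lam, 1], A)` should equal $\lambda$ times $(\lambda^2, \lambda, 1)$.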

Using (5.8) twice on its first line, we obtain
$$R(0) = h^2(R(0)) \cup h^2(R(1)) \cup h^2(R(2)) = h^2(R),$$
so the first tile R(0) is an affine (and since $\lambda_1 = \bar\lambda_2$ actually conformal) copy
of the entire Rauzy fractal R. Applying (5.8) twice on R(1) and once more
on R(2), we get
$$R(0) = h^2(R(0)) \cup \big(h^3(R(0)) + h^2 \circ \pi(1_0)\big) \cup \big(h^4(R(0)) + h^3 \circ \pi(1_0)\big).$$
But $R(0) = h^2(R)$, so
$$R = h^2(R) \cup \big(h^3(R) + \pi(1_0)\big) \cup \big(h^4(R) + h \circ \pi(1_0)\big).$$
Next let $f : V \to \mathbb{C}$ be a linear isometry. Since $\lambda_1 = \bar\lambda_2$, the contraction
h turns into a multiplication by $\lambda_1$: $f \circ h = \lambda_1 \cdot f$. Therefore $Q := f(R)$ satisfies
$$Q = \lambda_1^2Q \cup (\lambda_1^3Q + z) \cup (\lambda_1^4Q + \lambda_1z), \qquad \text{for } z := f \circ \pi(1_0) \ne 0.$$
Substitute $Q = z\lambda_1^{-4}\tilde P$ and finally multiply the set-equation by $\lambda_1^4z^{-1}$. This gives
$$\tilde P = \lambda_1^2\tilde P \cup (\lambda_1^3\tilde P + \lambda_1^4) \cup (\lambda_1^4\tilde P + \lambda_1^5),$$
exactly the same as the set-equation we derived from (5.7). Uniqueness of
Hutchinson attractors shows that $\tilde P = P$, and hence the Rauzy fractal of the
above example is, up to a scaling, the same as the Rauzy fractal associated
to χ.

5.4. Bratteli Diagrams and Vershik Maps

Bratteli diagrams emerged in the area of operator algebras [108], C∗-algebras
in the first place, but were given a dynamical interpretation when Vershik
equipped them with an order and a successor map [546]. Vershik named this
map the adic transformation of a Markov compactum (see [402, 403, 519,
546, 547]); the explicit connection with the Bratteli diagram seems to have
been made later. Herman, Putnam & Skau [310] showed that every (es-
sentially) minimal homeomorphism on the Cantor set can be represented
as a Bratteli-Vershik system. Later, Medynets [413] extended this to all
aperiodic homeomorphisms on the Cantor set; see also [212, Theorem 6.14].
An ordered Bratteli diagram is an infinite graph consisting of
• a sequence of finite non-empty vertex sets $V_i$, $i \ge 0$, such that $V_0$ consists of a single vertex $v_0$;
• a sequence of finite non-empty edge sets $E_i$, $i \ge 1$, such that each edge $e \in E_i$ connects a vertex $s(e) \in V_{i-1}$ to a vertex $t(e) \in V_i$. (Here $s$ and $t$ stand for source and target.) For every $v \in V_{i-1}$, there exists at least one outgoing edge $e \in E_i$ with $v = s(e)$, and for every $v \in V_i$ there exists at least one incoming edge $e \in E_i$ with $v = t(e)$;
• for each $v \in \bigcup_{i \ge 1} V_i$, a total order $<$ on its incoming edges.
The path space
$$X_{BV} := \{(x_i)_{i \ge 1} : x_i \in E_i,\ t(x_i) = s(x_{i+1}) \text{ for all } i \in \mathbb{N}\}$$
is the collection of all infinite edge-labeled paths starting from $v_0$, endowed with the product topology. That is, the set of infinite paths with a common initial $n$-path is clopen, and all sets of this type form a basis of the topology.
234 5. Further Minimal Cantor Systems

To each $E_i$ we assign an incidence matrix $M(i) = (m_{v,w}(i))_{v \in V_{i-1},\, w \in V_i}$ of size $\#V_{i-1} \times \#V_i$, where $m_{v,w}(i)$ is the number of edges from $v \in V_{i-1}$ to $w \in V_i$.
Definition 5.28. For $n \in \mathbb{N}$, define the height vectors $h(n) = (h_v(n))_{v \in V_n}$ (interpreted as row vectors) as
$$h_v(n) = \#\{x_1 \dots x_n : s(x_1) = v_0,\ t(x_n) = v\},$$
that is, the number of $n$-paths from $v_0$ to $v \in V_n$. Taking $h(0) = (1)$ by default, it follows by induction that $h(n) = h(0) \cdot M(1) \cdot M(2) \cdots M(n)$ for all $n \in \mathbb{N}$. The Bratteli diagram has a simple cap if $h(1) = M(1) = (1, \dots, 1)$.
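The recursion $h(n) = h(n-1) \cdot M(n)$ is easy to check by machine. The following is a minimal sketch in plain Python; the function names are our own and not taken from the text. Each incidence matrix is stored as a list of rows indexed by $V_{i-1}$, and the usage line uses the matrices $M(1)$, $M(2)$, $M(3)$ of Figure 5.5.

```python
from typing import List

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of integer matrices a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def height_vectors(incidence: List[Matrix]) -> List[List[int]]:
    """Return [h(1), ..., h(n)] where h(n) = h(0) M(1) ... M(n) and h(0) = (1).

    incidence[i - 1] holds M(i), of size #V_{i-1} x #V_i.
    """
    h: Matrix = [[1]]  # h(0) as a 1 x 1 row vector
    heights = []
    for m in incidence:
        h = mat_mul(h, m)  # h(i) = h(i-1) M(i)
        heights.append(h[0])
    return heights


# Matrices of Figure 5.5: M(1) = (1 1), M(2) = ((1 1), (2 0)), M(3) = column (1, 1).
fig_5_5 = [[[1, 1]], [[1, 1], [2, 0]], [[1], [1]]]
print(height_vectors(fig_5_5))  # [[1, 1], [3, 1], [4]]
```

Since $h(1) = (1, 1) = M(1)$, this diagram has a simple cap.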
Remark 5.29. Instead of taking the collections of edges $E_i$ separately, we can also consider all the paths from $V_{i-1}$ to $V_j$ for some $j \ge i$. We denote this collection of paths by $E_{i,j}$. The incidence matrix associated to $E_{i,j}$ can be shown to be the matrix product $M(i,j) = M(i) \cdots M(j)$. This process is called telescoping. The opposite procedure, i.e. inserting extra levels of vertex and edge sets, is called microscoping.
Example 5.30. In the Bratteli diagram in Figure 5.5, we telescope the first two levels away. The corresponding computation of the associated matrices is as follows:
$$M(1,3) = \begin{pmatrix} 1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \end{pmatrix} = 4.$$

Figure 5.5. Telescoping a Bratteli diagram. (Left: levels $v_0, V_1, V_2, V_3$ with $M(1) = (1\ 1)$, $M(2) = \begin{pmatrix} 1 & 1 \\ 2 & 0 \end{pmatrix}$, $M(3) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Middle: after telescoping one level, $M'(1) = (3\ 1)$ and $M'(2) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Right: after telescoping both levels, $M''(1) = (4)$.)

Exercise 5.31. Show that also if $M(1) \neq (1, \dots, 1)$, then there is an equivalent Bratteli diagram with $M(1) = (1, \dots, 1)$.

It is often useful to telescope Bratteli diagrams in such a way that all incidence matrices become strictly positive. This is possible if and only if, for every $i$, there exists $j \ge i$ such that for every $v \in V_i$ and $w \in V_{j+1}$, there is a path from $v$ to $w$.
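Telescoping only multiplies consecutive incidence matrices, so both the operation and the positivity test fit in a few lines. In this sketch (all names are hypothetical), `cuts` lists the levels that are kept, starting with 0; telescoping between levels $a < b$ replaces $M(a+1), \dots, M(b)$ by the product $M(a+1) \cdots M(b)$, which counts the paths from $V_a$ to $V_b$ as in Remark 5.29. The usage lines reproduce the two telescopings of Figure 5.5.

```python
from functools import reduce
from typing import List, Sequence

Matrix = List[List[int]]


def mat_mul(a: Matrix, b: Matrix) -> Matrix:
    """Product of integer matrices a (p x q) and b (q x r)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def telescope(incidence: Sequence[Matrix], cuts: Sequence[int]) -> List[Matrix]:
    """Telescope a diagram whose i-th incidence matrix M(i) is incidence[i - 1].

    cuts = [0, m_1, m_2, ...] are the surviving levels; the new matrices are
    M(m_{j-1} + 1) ... M(m_j), counting paths from V_{m_{j-1}} to V_{m_j}.
    """
    return [reduce(mat_mul, incidence[lo:hi]) for lo, hi in zip(cuts, cuts[1:])]


def strictly_positive(m: Matrix) -> bool:
    """True if every entry of m is positive."""
    return all(entry > 0 for row in m for entry in row)


fig_5_5 = [[[1, 1]], [[1, 1], [2, 0]], [[1], [1]]]
print(telescope(fig_5_5, [0, 2, 3]))  # [[[3, 1]], [[1], [1]]]
print(telescope(fig_5_5, [0, 3]))     # [[[4]]]
```

Note that $M(2)$ itself is not strictly positive, while both telescoped diagrams have strictly positive incidence matrices.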

Definition 5.32. A Bratteli diagram is called simple if there is an increasing sequence $0 = m_0 < m_1 < \cdots$ such that, after telescoping between levels $m_{i-1}$ and $m_i$, the new matrices
$$M'(i) := M(m_{i-1}, m_i) \quad \text{are strictly positive}$$
for all $i \ge 1$. This is analogous to the notion of primitive used for incidence matrices of SFTs or S-adic transformations, except that primitive requires that $\sup_i (m_i - m_{i-1}) < \infty$; see Definition 4.42.

If the Bratteli diagram is stationary (i.e. $M(i) = M$ is the same matrix for all $i \ge 2$), then the path space $X_{BV}$ is identical to the path space of an edge-labeled transition graph associated to $M$. The Vershik map $\tau : X_{BV} \to X_{BV}$ that we will define below, however, is quite different⁵ from the left-shift $\sigma$. The latter is hyperbolic and of positive entropy⁶, whereas $\tau$ is not hyperbolic and, on stationary Bratteli diagrams, has zero entropy.
If $x = x_1 \dots x_N$ and $y = y_1 \dots y_N$, with $x_i, y_i \in E_i$ for $1 \le i \le N$, are finite paths, then we can compare $x$ and $y$ if they have the same endpoint in $V_N$. Let $m \le N$ be the largest index such that $x_m \neq y_m$. This means that $t(x_m) = t(y_m)$, and we say that $x < y$ if $x_m < y_m$. This gives a partial order on the set of $N$-paths and a total order on the set of $N$-paths ending in the same $v \in V_N$. For every $v \in V_i$, there is a unique minimal path from $v$ up to $v_0$ and at least one $e \in E_{i+1}$ with $s(e) = v$. From this it follows that there are infinite paths $x \in X_{BV}$ such that, for every $N$, the initial $N$-path $x^{\min}_{[1,N]}$ is minimal among all $N$-paths with the same terminal vertex. That is, the collection $X_{BV}^{\min}$ of minimal infinite paths is non-empty, and at the same time, every $v \in V_N$ can have only one minimal incoming path $x^{\min}_{[1,N]}$ terminating in $v$. If $X_{BV}^{\min}$ consists of a single element, we denote it as $x_{\min}$.
The same is true for the collection $X_{BV}^{\max}$ of maximal infinite paths, and if $X_{BV}^{\max}$ consists of a single element, we denote it as $x_{\max}$. In other words, we have proved the following lemma:
Lemma 5.33. For every ordered Bratteli diagram $X_{BV}$, we have
$$1 \le \#X_{BV}^{\min},\ \#X_{BV}^{\max} \le \liminf_n \#V_n.$$

The Vershik adic transformation (Vershik map) $\tau : X_{BV} \to X_{BV}$ is defined as follows [546]: For $x \in X_{BV}$, let $i$ be minimal such that $x_i \in E_i$ is not the maximal incoming edge. Then set
$$\begin{cases} \tau(x)_j = x_j & \text{for } j > i,\\ \tau(x)_i \text{ is the successor of } x_i \text{ among all incoming edges at this level},\\ \tau(x)_1 \dots \tau(x)_{i-1} \text{ is the minimal path connecting } v_0 \text{ with } s(\tau(x)_i). \end{cases}$$
If no such $i$ exists, then $x \in X_{BV}^{\max}$, and we need to choose $y \in X_{BV}^{\min}$ to define $\tau(x) = y$. Whether $\tau$ extends continuously to $X_{BV}^{\max}$ depends on how well we can make this choice. Medynets [413] gave an example of a Bratteli diagram that does not allow any ordering by which $\tau$ is continuously extendable, even if $\#X_{BV}^{\min} = \#X_{BV}^{\max}$; see Figure 5.6 (right). For this diagram the only incoming edges to $u \in V_n$ come from $u \in V_{n-1}$, and therefore there is a minimal and a maximal path going through vertices $u$ only. By the same token, there is a minimal and a maximal path going through vertices $w$ only. No matter how $\tau$ is defined on these two maximal paths, there is no way of putting an order on the incoming edges to $v \in V_n$ such that this definition makes $\tau$ continuous at these maximal paths.

5 Without going into details, the shift and Vershik map can be likened to the geodesic flow and horocycle flow on a manifold of curvature −1; see the interesting exposition in [487].
6 See [487] for a comparison to geodesic and horocycle flow.
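The definition of $\tau$ acts on finite paths in the same way, as long as the path is not maximal among the paths ending at its terminal vertex. The sketch below uses our own encoding, not the text's: level $i$ of the diagram is a dictionary sending each $v \in V_i$ to the string of sources of its incoming edges, listed in increasing order, and a path $x_1 \dots x_N$ is stored as its terminal vertex in $V_N$ plus the positions of $x_1, \dots, x_N$ in these lists. The symbol `*` stands for $v_0$. On the stationary diagram with one vertex and two edges per level, the orbit of the minimal path counts in binary, as for the dyadic odometer of Example 5.37.

```python
from typing import Dict, List, Optional, Sequence

# diagram[i][v] = sources of the incoming edges of v at level i + 1, in increasing order.
Diagram = Sequence[Dict[str, str]]


def path_vertices(diagram: Diagram, top: str, idx: Sequence[int]) -> List[str]:
    """Return t(x_1), ..., t(x_N) for the N-path ending at `top` with edge positions idx."""
    verts = [""] * len(idx)
    v = top
    for i in range(len(idx) - 1, -1, -1):
        verts[i] = v
        v = diagram[i][v][idx[i]]  # source of x_{i+1}, a vertex one level up
    return verts


def vershik(diagram: Diagram, top: str, idx: Sequence[int]) -> Optional[List[int]]:
    """Successor of a finite path, or None if it is the maximal path ending at `top`."""
    verts = path_vertices(diagram, top, idx)
    for i, (v, k) in enumerate(zip(verts, idx)):
        if k + 1 < len(diagram[i][v]):  # first edge that is not maximal at its target
            new = list(idx)
            new[i] = k + 1               # replace it by its successor
            for j in range(i):           # below it: the minimal path from v_0
                new[j] = 0
            return new
    return None


# One vertex 'a' per level: a single edge v_0 -> a, then two edges a -> a per level.
odometer = [{"a": "*"}] + [{"a": "aa"}] * 3
path: Optional[List[int]] = [0, 0, 0, 0]
orbit = []
while path is not None:
    orbit.append(path)
    path = vershik(odometer, "a", path)
print(len(orbit), orbit[:4])  # 8 [[0, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 1, 0]]
```

Returning `None` at the maximal path is where the choice of $\tau(x_{\max})$ discussed above would have to be made.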
Remark 5.34. To relate Bratteli-Vershik systems to Kakutani-Rokhlin partitions: the partition $P_n$ is formed by the $n$-cylinders, represented by $n$-paths connecting $v_0$ with some $v \in V_n$. There are then $h_v(n)$ such paths, and the smallest path corresponds to the base element $B_v(n)$.

Example 5.35. There are examples where $X_{BV}$ is not a Cantor set. The two examples in Figure 5.6 (left) have opposite orderings. On the left, $X_{BV}^{\min}$ consists of the (vertex-labeled) paths $e := v_0 \to v \to v \to v \to \cdots$ and $e' := v_0 \to v' \to v' \to v' \to \cdots$, and $X_{BV}^{\max}$ consists of the path $v_0 \to v \to v \to v \to \cdots$. That is, $e$ is both minimal and maximal, and defining $\tau(e) = e$ is the only continuous possibility. The path $e'$ is an isolated point (as are in fact all paths different from $e$). The resulting system is conjugate to $f : Y \to Y$ for $Y := \{\frac{1}{n} : n \in \mathbb{N}\} \cup \{0\}$, defined by $f(0) = 0$, $f(\frac{1}{n}) = \frac{1}{n+1}$. Note that $f$ is not surjective.
On the right, $X_{BV}^{\min} = \{e\}$ and $X_{BV}^{\max} = \{e, e'\}$, and $\tau(e) = e$ is the only continuous possibility, but then $\tau(e') = e$ is forced too (since we want $\tau(X_{BV}^{\max}) \subset X_{BV}^{\min}$). Now $\tau$ is conjugate to $f^{-1} : Y \to Y$ defined as $f^{-1}(0) = f^{-1}(1) = 0$ (compensating for $f$ not being surjective) and $f^{-1}(\frac{1}{n+1}) = \frac{1}{n}$ for $n \ge 1$.

Figure 5.6. Non-Cantor set Bratteli diagrams and a non-simple Bratteli diagram. (Two left panels: stationary diagrams with two vertices per level and edges labeled 0 and 1; right panel: a diagram with vertices $u$, $v$, $w$ at each level.)
Proposition 5.36. The Vershik map $\tau$ can be extended continuously on an ordered Bratteli diagram in the following situations:
• $\#X_{BV}^{\min} = \#X_{BV}^{\max} = 1$. In this case $\tau : X_{BV} \setminus X_{BV}^{\max} \to X_{BV} \setminus X_{BV}^{\min}$ extends to a homeomorphism of $X_{BV}$.
• $2 \le \#X_{BV}^{\min} \le \#X_{BV}^{\max}$ and $X_{BV}^{\min}$ and $X_{BV}^{\max}$ have non-empty interiors. If also $\tau : X_{BV} \setminus X_{BV}^{\max} \to X_{BV} \setminus X_{BV}^{\min}$ is uniformly continuous, then $\tau$ extends to an endomorphism of $X_{BV}$. If $\#X_{BV}^{\min} = \#X_{BV}^{\max}$, then this extension is a homeomorphism.

The first part was already addressed in [310, Section 2]; they called such Bratteli-Vershik systems "essentially simple", whereas nowadays the term "properly ordered" is used; see Definition 5.38. The question has been investigated in detail by Bezuglyi et al. [81, 82]⁷, who call such Bratteli-Vershik systems perfect. A different account is due to Downarowicz & Karpel [212, 213]. They call a Bratteli-Vershik system decisive if $\tau$ can be extended to a homeomorphism in a unique way. According to [212, Lemma 6.11], a Bratteli-Vershik system is decisive if and only if $\tau$ is uniformly continuous on $X_{BV} \setminus X_{BV}^{\max}$ and the interior of $X_{BV}^{\max}$ is either empty or a single isolated point.
Example 5.37. There are four ways to assign a stationary order to a stationary Bratteli diagram with incidence matrix $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$; see Figure 5.7.
• The right two cases represent the Thue-Morse shift. They have two minimal and two maximal infinite paths, but it is impossible to define $\tau : X_{BV}^{\max} \to X_{BV}^{\min}$ so that $\tau : X_{BV} \to X_{BV}$ becomes continuous, let alone a homeomorphism. As shown in [81, Example 3.5] (see also [212, Example 6.12]), it is not possible to extend $\tau$ continuously, also for non-stationary orders, as soon as there are two minimal paths.
• The left two cases have only one minimal and one maximal path. Now $\tau$ can be extended continuously to a homeomorphism of $X_{BV}$, and it is conjugate to the dyadic odometer; see Remark 5.55, [81, Proposition 2.20], or [255, Section 5], which are concerned with the characterization of odometers as Bratteli-Vershik systems. This also follows because $(X_{BV}, \tau)$ represents an invertible Toeplitz shift; see [208, below Theorem 5.1] and Theorem 5.54 and the text below it.

Figure 5.7. Different orders on the same stationary Bratteli diagram. (Four panels, each with vertices 0 and 1 at every level below $v_0$.)

7 Also for infinite rank Bratteli diagrams, see Definition 5.41 below.
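For a stationary ordered diagram given by a substitution $\chi$ (incoming edges of $v$ ordered as the letters of $\chi(v)$, as in Lemma 5.50) with a simple cap, the counts in Example 5.37 can be checked mechanically. The reasoning behind the sketch is ours, not the text's: an infinite minimal path is a sequence $(v_i)_{i \ge 1}$ with $v_{i-1}$ the first letter of $\chi(v_i)$, so every $v_i$ lies in the eventual image of the first-letter map $g$; on that set $g$ is a bijection, whence $\#X_{BV}^{\min}$ equals the number of periodic points of $g$. Maximal paths are handled the same way with the last-letter map. All helper names are hypothetical.

```python
from typing import Dict, Set, Tuple


def periodic_points(g: Dict[str, str]) -> Set[str]:
    """Periodic points of a self-map g of a finite set (= its eventual image)."""
    points = set(g)
    for _ in range(len(g)):  # the images g^n(V) stabilise after at most #V steps
        points = {g[x] for x in points}
    return points


def extremal_path_counts(chi: Dict[str, str]) -> Tuple[int, int]:
    """(#minimal, #maximal) infinite paths of the stationary diagram of chi."""
    first = {a: word[0] for a, word in chi.items()}
    last = {a: word[-1] for a, word in chi.items()}
    return len(periodic_points(first)), len(periodic_points(last))


# The four stationary orders on the diagram with incidence matrix ((1 1), (1 1)).
orders = {
    "01/01": {"0": "01", "1": "01"},
    "10/10": {"0": "10", "1": "10"},
    "Thue-Morse": {"0": "01", "1": "10"},
    "Thue-Morse, reversed order": {"0": "10", "1": "01"},
}
for name, chi in orders.items():
    print(name, extremal_path_counts(chi))
```

The two odometer orders give $(1, 1)$ and the two Thue-Morse orders give $(2, 2)$. The substitution of Example 5.45 below also gives $(1, 1)$, matching its unique $x_{\min}$ and $x_{\max}$.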
Definition 5.38. A Bratteli-Vershik system $(X_{BV}, \tau)$ is called properly ordered if it is simple (as in Definition 5.32) and $\#X_{BV}^{\min} = \#X_{BV}^{\max} = 1$.

Proposition 5.39. All well-defined forward orbits of a simple Bratteli-Vershik system are dense; in particular, a properly ordered Bratteli-Vershik system is minimal.
Proof. Take any cylinder set $[x_1 \dots x_n]$; it corresponds to an $n$-path from $v_0$ to some $v \in V_n$. The map $\tau$ goes cyclically through all paths from $v_0$ to $v$, except that there are other paths (not to $v$) between the maximal path from $v_0$ to $v$ and the reappearance of the minimal path between $v_0$ and $v$. Recall from Definition 5.28 that $h_v(n) = \#\{\text{paths between } v_0 \text{ and } v \in V_n\}$.
By simplicity, there is $k \ge 1$ such that $M(n, n+k) = M(n) \cdots M(n+k)$ is strictly positive. Hence, in whichever way $x_1 \dots x_n$ continues to $x_1 \dots x_{n+k}$, at most $K$ successor paths of $x_{n+1} \dots x_{n+k}$ occur before a path $x_{n+1} \dots x_{n+k}$ with $s(x_{n+1}) = v$ appears, where $K$ is the maximal column-sum of $M(n, n+k)$.
Therefore, if $x \in [x_1 \dots x_n]$ has a well-defined orbit, then there is $0 < j \le K \cdot \max_{v \in V_n} h_v(n)$ such that $\tau^j(x) \in [x_1 \dots x_n]$.
This proves that $\mathrm{orb}_\tau(x)$ is dense and uniformly recurrent. Hence, if $X_{BV}$ is properly ordered and, in particular, $\tau$ is a homeomorphism, then Proposition 2.17 implies that $(X_{BV}, \tau)$ is minimal. □

Figure 5.6 illustrates the effect of reversing the order of each collection of incoming edges:
Lemma 5.40. The system $(X_{BV}, \le, \tau)$ is conjugate to $(X_{BV}, \ge, \tau^{-1})$ wherever $\tau$ is well-defined and injective. That is, if we reverse the order on the incoming edges everywhere, we obtain a system whose inverse Vershik map is conjugate to the original system. In particular, the sets of minimal and maximal paths change roles.
Definition 5.41. A Bratteli diagram $B = ((E_i)_{i \in \mathbb{N}}, (V_i)_{i \in \mathbb{N}})$ has rank $r_B := \liminf_i \#V_i$. A Bratteli diagram has rank $r$ if $r$ is the smallest integer such that $(X_{BV}, \tau)$ is isomorphic to a system on a Bratteli diagram with $r_B = r$. If no such finite $r$ exists, then the Bratteli diagram is said to have infinite rank.
As was shown in [204], every minimal subshift with sublinear word-complexity can be represented as a Bratteli-Vershik system of finite rank. There are, however, also minimal subshifts with superlinear word-complexity that can be represented as a Bratteli-Vershik system of finite rank.
Exercise 5.42. Show that telescoping and microscoping produce conjugate Bratteli-Vershik systems. Show that the rank $r \le r_B$ and that the inequality can be strict. Show that a rank $r_B$ Bratteli diagram can be telescoped to a Bratteli diagram with $r_B = \#V_i$ for all $i$.
Example 5.43. The Pascal Bratteli diagram is characterized by $V_k = \{0, 1, \dots, k\}$ and $E_k = \{i \to i : 0 \le i \le k-1\} \cup \{i \to i+1 : 0 \le i \le k-1\}$. The corresponding Bratteli-Vershik system has uncountably many measures that are ergodic w.r.t. the equivalence relation of having the same tail; namely, every $(p, 1-p)$-Bernoulli measure represents one such ergodic measure; see [415] and e.g. [254, 256] for more results on infinite rank Bratteli-Vershik systems. In this text we will confine ourselves to finite rank systems.
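In the Pascal diagram, a path from $v_0$ to $j \in V_k$ chooses at each of the $k$ levels whether to stay ($i \to i$) or to step up ($i \to i+1$), so one expects $h_j(k) = \binom{k}{j}$; note also that $\#V_k = k + 1$ is unbounded. A short Python check of this, as our own illustration:

```python
from math import comb
from typing import List


def pascal_heights(k: int) -> List[int]:
    """Height vector h(k) of the Pascal Bratteli diagram, indexed by V_k = {0, ..., k}."""
    h = [1]  # h(0): the single vertex 0 of V_0
    for level in range(1, k + 1):
        new = [0] * (level + 1)
        for i, paths in enumerate(h):  # i runs over V_{level-1}
            new[i] += paths            # edge i -> i
            new[i + 1] += paths        # edge i -> i + 1
        h = new
    return h


print(pascal_heights(4))  # [1, 4, 6, 4, 1]
assert all(pascal_heights(k) == [comb(k, j) for j in range(k + 1)] for k in range(10))
```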

The following result, reminiscent of the Auslander-Yorke dichotomy, is due to Downarowicz & Maass [215] in the minimal case and Bezuglyi et al. [78, Theorem 4.8] for aperiodic systems.
Theorem 5.44. Every minimal Bratteli-Vershik system of finite rank is either expansive (namely if its rank $r \ge 2$) or conjugate to an odometer. If the Bratteli-Vershik system is not minimal but aperiodic and if no minimal component is conjugate to an odometer, then it is expansive.

Example 5.45. The non-primitive substitution
$$\chi : \begin{cases} 0 \to 01, \\ 1 \to 01, \\ 2 \to 02031, \\ 3 \to 02131 \end{cases}$$
of Figure 5.8 illustrates the second part of Theorem 5.44. The corresponding Bratteli-Vershik system $(X_{BV}, \tau)$ is invertible, because there is a unique $x_{\min} = 000\dots$ and a unique $x_{\max} = 111\dots$. Since $\chi$ is not primitive, $(X_{BV}, \tau)$ cannot be simple, and hence not minimal. The subdiagram $(X'_{BV}, \tau)$ using only symbols 0 and 1 (dashed in the figure) is a minimal subsystem which is actually conjugate to a dyadic odometer as in Example 5.37. However, there are also points with dense orbits, such as $y = 222\dots$. The point $z = 333\dots$ is non-recurrent, because every 3 that disappears when iterating $\tau$ will never reoccur. Instead, $\mathrm{orb}_\tau(y)$ accumulates on $X'_{BV}$. For more information on Bratteli-Vershik systems with this kind of non-minimal structure, see [77].

Figure 5.8. A transitive, non-primitive Bratteli diagram. (Vertices 0, 1, 2, 3 at each level.)

Definition 5.46. A pointed dynamical system $(X, T, x)$ is essentially minimal if for every open set $U \ni x$, $\bigcup_n T^n(U) = X$. Thus essentially minimal is a pointed version of topologically exact. Also, $(X \setminus \{x\}, T)$ is essentially minimal in the sense of Definition 2.25.
Minimality implies essential minimality, but $(X, T, x)$ can be essentially minimal even if $x$ is a fixed point. For example, the Bratteli-Vershik system in the left panel of Figure 5.9 has two minimal sequences $x = SSS\dots$ and $y = 000\dots$ and two maximal sequences $x$ and $z = 333\dots$. If we set $\tau(x) = x$ and $\tau(z) = y$, then $x$ is fixed but the whole system $(X_{BV}, \tau, x)$ is essentially minimal. We can call $x$ the spacer path, because it plays the role of spacer in a cutting and stacking system; see Example 5.11, II. If we set $\tau(x) = y$ and $\tau(z) = x$, then the whole system $(X_{BV}, \tau, x)$ is minimal. For neither choice does $\tau$ extend continuously to $X_{BV}^{\max}$.

The following theorem from [310] says that Bratteli-Vershik systems model every essentially minimal Cantor system.

Figure 5.9. A Bratteli-Vershik system isomorphic to the Chacon substitution. (Two panels, each below a top vertex $v_0$; the left panel has vertices 0, 1, 2, 3 and a spacer vertex $S$ at each level.)

Theorem 5.47. For every essentially minimal homeomorphism $T$ on the Cantor set $X$ and $x \in X$, there exists a properly ordered Bratteli-Vershik system $(X_{BV}, \tau)$ such that $(X, T, x)$ is pointedly conjugate to $(X_{BV}, \tau, x_{\min})$.

Example 5.48. The Bratteli-Vershik systems in Figure 5.9 are both isomorphic to the Chacon substitution shift (see Example 1.27) generated by the fixed point
$$\rho = 0010\ 0010\ 1\ 0010\ 0010\ 0010\ 1\ 0010\ 1\ 0010\ 0010\ 1\ 0010 \dots$$
of the Chacon substitution
$$\chi_{\mathrm{chac}} : \begin{cases} 0 \to 0010, \\ 1 \to 1. \end{cases}$$
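The fixed point $\rho$ can be generated by iterating $\chi_{\mathrm{chac}}$ on the letter 0; since $\chi_{\mathrm{chac}}(0)$ begins with 0, each iterate is a prefix of the next. A minimal Python sketch (helper names are ours):

```python
from typing import Dict


def substitute(chi: Dict[str, str], word: str) -> str:
    """Apply the substitution letter by letter and concatenate."""
    return "".join(chi[a] for a in word)


def fixed_point_prefix(chi: Dict[str, str], seed: str, length: int) -> str:
    """Prefix of the fixed point of chi starting with seed (chi(seed) must start with seed)."""
    word = seed
    while len(word) < length:
        longer = substitute(chi, word)
        if len(longer) <= len(word):
            raise ValueError("iteration does not grow")
        word = longer
    return word[:length]


chacon = {"0": "0010", "1": "1"}
rho = fixed_point_prefix(chacon, "0", 40)
# chi^3(0) = B B 1 B with B = chi^2(0) = 0010 0010 1 0010
print(rho)
```

The 40 symbols printed are exactly the prefix of $\rho$ displayed above.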

The one on the left is not properly ordered, because it has two minimal
sequences x and y and two maximal sequences x and z as we saw below
Definition 5.46.
The one on the right, constructed by Park [444], is properly ordered.
See [225] for general results on finding properly ordered Bratteli-Vershik

5.4.1. Bratteli-Vershik Systems and S-adic Shifts. Substitutions are a common way to build minimal subshifts; in fact, if we are allowed to build the substitution shift on a countable collection of substitutions, i.e. use S-adic shifts, then every minimal Cantor system can be expressed in this way. For each $i \ge 1$, let $V_i$ be a finite alphabet, and let $V_i^*$ denote the collection of finite words in this alphabet. For $i \ge 1$, let $\chi_i : V_i \to V_{i-1}^*$ be a substitution; hence, to each $v \in V_i$, $\chi_i$ assigns a string of letters from $V_{i-1}$. The substitution acts on strings by concatenation:
$$\chi_i(v_1 v_2 \dots v_N) = \chi_i(v_1)\chi_i(v_2) \dots \chi_i(v_N).$$
To each substitution $\chi_i$ we associate the incidence matrix $M(i)$ of size $\#V_{i-1} \times \#V_i$, where the entry $m_{k,l}(i)$ denotes the number of appearances of the $k$-th letter of $V_{i-1}$ in the $\chi_i$-image of the $l$-th letter of $V_i$. Hence $M(i)$ is the associated matrix of the $i$-th substitution; see Definition 4.14. By iterating the substitutions, we can construct an infinite string
$$s = \lim_{i \to \infty} \chi_2 \circ \chi_3 \circ \cdots \circ \chi_i(v),$$
where $v$ is taken from $V_i$. For completeness, we also set $\chi_1(v) = 0$ for every $v \in V_1$. Using the irreducibility condition
$$\forall i\ \exists w \in V_{i-1}\ \exists j > i\ \forall v \in V_j: \ \text{the word } \chi_i \circ \cdots \circ \chi_j(v) \text{ starts with } w,$$
the limit can be shown to exist, independently of the choice of $v \in V_j$.
Moreover, $s$ is a uniformly recurrent string in $V_1^{\mathbb{N}}$. It generates a minimal subshift $(\Sigma, \sigma)$, where $\sigma$ is the left-shift and $\Sigma = \overline{\{\sigma^n(s) : n \ge 0\}}$, the closure taken with respect to the product topology.

Example 5.49. The best-known examples are of course the stationary substitutions, i.e. $V_i \equiv V$ and $\chi_i \equiv \chi$. For example, the Fibonacci substitution acts on the alphabet $\{0, 1\}$ by
$$\chi_{\mathrm{Fib}} : \begin{cases} 0 \to 01, \\ 1 \to 0, \end{cases}$$
and $s = 0100101001001\dots$. This sequence is equal to the sequence of first labels of $\{\tau^n(x_{\min})\}_{n \ge 0}$ in the Fibonacci Bratteli diagram in Figure 5.13. As a result, the Fibonacci substitution is isomorphic to the Fibonacci Bratteli diagram, which in turn is isomorphic to the Fibonacci enumeration system.
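The limit $s = \lim_i \chi_2 \circ \cdots \circ \chi_i(v)$ can be approximated by applying a finite list of substitutions from the innermost one outward. The sketch below is a hypothetical helper in plain Python; for the stationary Fibonacci case it reproduces the prefix of $s$ quoted above, and once enough substitutions are applied the prefix no longer depends on the starting letter.

```python
from typing import Dict, Sequence

Substitution = Dict[str, str]


def s_adic_word(subs: Sequence[Substitution], v: str) -> str:
    """Return chi_2 o chi_3 o ... o chi_j (v) for subs = [chi_2, ..., chi_j]."""
    word = v
    for chi in reversed(subs):  # chi_j is applied first, chi_2 last
        word = "".join(chi[a] for a in word)
    return word


fib = {"0": "01", "1": "0"}
print(s_adic_word([fib] * 5, "0"))  # 0100101001001
print(s_adic_word([fib] * 6, "1"))  # 0100101001001
```

Because the list of substitutions may vary with $i$, the same function applies verbatim to non-stationary S-adic sequences, provided consecutive alphabets match.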

The following result can be found in e.g. [225].

Lemma 5.50. Every S-adic shift such that each letter $v \in V_{i-1}$, $i \ge 2$, appears in some word $\chi_i(w)$, $w \in V_i$, is isomorphic to a Vershik transformation on an ordered Bratteli diagram, and vice versa.
If the substitutions $(\chi_i)$ are such that for every $i \ge 2$, there is $j_0 \ge 1$ such that
$$\chi_i \circ \cdots \circ \chi_j(v) \text{ starts with the same symbol for all } j \ge j_0,\ v \in V_j, \tag{5.9}$$
then the corresponding Bratteli diagram has a unique minimal path $x_{\min}$.

Proof. The vertices of the Bratteli diagram coincide with the alphabets $V_i$ (for this reason we choose the same notation), except that the Bratteli diagram has a first level $V_0 = \{v_0\}$. Let⁸ $E_1 = \{v_0 \to v \mid v \in V_1\}$. For each $v \in V_i$, $i \ge 1$, there is an incoming edge $w \to v$ for each appearance of $w \in V_{i-1}$ in $\chi_i(v)$, and the ordering of the incoming edges in $v$ is the same as the order of the letters in $\chi_i(v)$. It follows that the incidence matrices of the substitution $\chi_i$ coincide with the incidence matrices associated to the edges $E_i$. Hence $M(i)$ is the transpose of the matrix associated to $\chi_i$.
Clearly, the Bratteli diagrams and substitutions $(\chi_i)_{i \ge 2}$ are in one-to-one correspondence, provided every $w \in V_{i-1}$ appears in at least one $\chi_i(v)$, $v \in V_i$.
Let $v_i \in V_i$ be the symbol indicated by (5.9); then it easily follows that $v_{i-1}$ is the first symbol of $\chi_i(v_i)$ and that $x_{\min} := v_0 \to v_1 \to v_2 \to v_3 \to \cdots$ is the unique minimal element.
The sequence $s = \lim_j \chi_2 \circ \cdots \circ \chi_j(v)$ can be read off as
$$s_n = s(\tau^n(x_{\min})_2).$$
In other words, $s_n$ records the vertex in $V_1$ that the $n$-th $\tau$-image of $x_{\min}$ goes through. The way to see this is the following: since the incoming edges to $w \in V_2$ are ordered as in $\chi_2(w)$, a path starting with $\chi_2(w)_1 \to w$ is followed by a path starting with $\chi_2(w)_2 \to w$, etc. Because this is true for every vertex in every level $V_i$, the required sequence $s$ will emerge. □
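The identity $s_n = s(\tau^n(x_{\min})_2)$ can be tested on finite paths. The following self-contained sketch uses our own encoding (not the text's): it builds the levels of the diagram from a list of substitutions as in the proof, with a simple cap to $v_0$ written `*`, runs through all $N$-paths ending at a fixed $w \in V_N$ in increasing order by repeatedly applying the Vershik successor, and records the vertex of $V_1$ that each path visits. For the Fibonacci substitution the result is compared with $\chi^{N-1}(w)$.

```python
from typing import Dict, List, Optional, Sequence

Substitution = Dict[str, str]


def bratteli_levels(alphabet: str, subs: Sequence[Substitution]) -> List[Dict[str, str]]:
    """Level 1 is the simple cap; level i >= 2 orders the edges into v like chi_i(v)."""
    return [{a: "*" for a in alphabet}] + list(subs)


def vershik(levels, top: str, idx: List[int]) -> Optional[List[int]]:
    """Successor of the finite path (top, idx); None if it is maximal."""
    verts, v = [""] * len(idx), top
    for i in range(len(idx) - 1, -1, -1):  # recover t(x_1), ..., t(x_N)
        verts[i] = v
        v = levels[i][v][idx[i]]
    for i, (v, k) in enumerate(zip(verts, idx)):
        if k + 1 < len(levels[i][v]):
            return [0] * i + [k + 1] + idx[i + 1:]
    return None


def read_first_vertices(levels, top: str) -> str:
    """Vertices in V_1 visited by the paths ending at `top`, in tau-order from the minimal one."""
    out: List[str] = []
    path: Optional[List[int]] = [0] * len(levels)
    while path is not None:
        v = top
        for i in range(len(path) - 1, 0, -1):  # walk up to level 1
            v = levels[i][v][path[i]]
        out.append(v)
        path = vershik(levels, top, path)
    return "".join(out)


fib = {"0": "01", "1": "0"}
levels = bratteli_levels("01", [fib] * 5)  # N = 6 levels
print(read_first_vertices(levels, "0"))    # 0100101001001
```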
Remark 5.51. A graph cover map $f : \Gamma \to \Gamma$ on the inverse limit
$$\Gamma = \varprojlim (\Gamma_n, \pi_n) = \{(\gamma_n)_{n \ge 0} : \pi_{n+1}(\gamma_{n+1}) = \gamma_n \in \Gamma_n \text{ for all } n \ge 0\}$$
(where $\Gamma_0$ is a single loop from a single vertex) can be turned into a Bratteli-Vershik system as follows. The edges $a \in \Gamma_n$ correspond bijectively to the vertices $v \in V_n$ of an ordered Bratteli diagram. We call the bijection $P_n$. If the bonding map $\pi_n$ maps $a$ to the concatenation $\pi_n(a) = a_1 \cdots a_k$ of edges in $\Gamma_{n-1}$, then we draw $k$ incoming edges $e_i \in E_n$ to $v = P_n(a)$ from the vertices $P_{n-1}(a_i) = s(e_i) \in V_{n-1}$, $i = 1, 2, \dots, k$, in this order. The map $f : \Gamma \to \Gamma$ will then lift to the Vershik map $\tau$ on the path space $X_{BV}$.
Conversely, when given an ordered Bratteli diagram $(E_n, V_n)_{n \ge 1}$, the vertices $v \in V_n$ correspond bijectively to edges $a = P_n^{-1}(v)$ in $\Gamma_n$. The ordered set of incoming edges $e_i \in E_n$ to $v$ determines the bonding map $\pi_n$ by the concatenation $\pi_n(a) = a_1 \dots a_k$ if $P_{n-1}(a_i) = s(e_i)$. The Vershik map $\tau$ then translates to the graph cover map $f$. The difficult step of how to connect the edges of $\Gamma_n$, i.e. how to determine the vertices of $\Gamma_n$, is solved by Shimomura [502] using an equivalence relation whose equivalence classes in $V_n$ are called clusters. Vertices in the same cluster correspond to edges in $\Gamma_n$ with the same terminal vertex.

8 In fact, in the construction of Theorem 4.120, there is an extra edge $(1 \to 2) \in E$. This gives rise to an isomorphic Bratteli diagram.

5.4.2. Bratteli-Vershik Systems and Interval Exchange Transformations. As in Section 4.4, let $T : [0,1) \to [0,1)$ be an interval exchange transformation of $d$ intervals $\Delta_i = [\gamma_{i-1}, \gamma_i)$, $i = 1, \dots, d$, with $\gamma_0 = 0$. Since the subshifts of interval exchange transformations can be seen as S-adic shifts (using Rauzy induction), ordered Bratteli diagrams can be constructed for them in the same way as in Section 5.4.1; see specifically Table 4.3. A more direct construction was given by Gjerde & Johansen [275] (see also [222]).
Theorem 5.52. The subshift associated to an IET on $d$ intervals has an ordered Bratteli diagram $((E_i)_{i \ge 1}, (V_i)_{i \ge 0})$ where
• $d \ge \#V_i \ge \#V_{i+1}$ for all $i \ge 1$. If $\#V_{i+1} < \#V_i$, then⁹ $\#V_{i+1} = \#V_i - 1$ and $\lim_i \#V_i = 1 + \#\{\text{disjoint orbits of discontinuity points}\}$.
• If $\#V_{i-1} = \#V_i = m$, then the incidence matrix of edge set $E(i)$ has the form
$$M(i) = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
0 & 1 & & & & & & 0\\
0 & 0 & \ddots & & & & & \vdots\\
\vdots & & & 1 & 1 & & & \vdots\\
\vdots & & & 0 & 0 & 1 & & \vdots\\
\vdots & & & & & & \ddots & \vdots\\
0 & \cdots & & & & & 0 & 1\\
s_1 & s_2 & \cdots & s_k & s_{k+1} & s_{k+2} & \cdots & s_m
\end{pmatrix} \tag{5.10}$$
for $s_j \in \{0, r, r+1\}$ for some $r \ge 0$. The matrices $M(i)$ are all unimodular. If $\#V_{i-1} = \#V_i + 1 = m$, then the $(k+1)$-st row in the matrix should be removed.
• The order on the incoming edges $\{e \in E_i : t(e) = v\}$ is left to right.
However, not every Bratteli-Vershik system of this form corresponds to an IET.

9 This corresponds to the loss of an interval for some first return map due to a critical connection and is hence not generic.

Proof. The proof is based on an accelerated version of Rauzy induction, although [275] does not mention Rauzy and basically reinvents the procedure. First all the points $x$ in the backward orbits and left forward orbits of discontinuity points $\gamma_i$ are doubled to $x^-$ and $x^+$. This transforms $[0,1)$ into a Cantor set with the order topology, provided the Keane condition holds. Then consider the first return map $T_1$ to $[0, \gamma_{d-1}) = \bigcup_{i=1}^{d-1} \Delta_i$. Let $e < d$ be the index of the interval such that $T(\Delta_e) \ni 1^-$.
• If $|\Delta_d| < |\Delta_e|$, then the procedure coincides with a Type 1 Rauzy induction step; see Figure 5.10. The corresponding substitution (as in Table 4.3) and incidence matrix are
$$\chi : \begin{cases} j \to j, & 1 \le j \le e,\\ e+1 \to ed, & \\ j \to j-1, & e+1 < j \le d, \end{cases} \qquad M(i) = \begin{pmatrix}
1 & 0 & \cdots & \cdots & \cdots & 0\\
0 & \ddots & & & & \vdots\\
\vdots & & 1 & 1 & 0 & \vdots\\
\vdots & & 0 & 0 & \ddots & 0\\
\vdots & & & & \ddots & 1\\
0 & \cdots & 0 & 1 & 0 & 0
\end{pmatrix}.$$
Clearly $M(1)$ is unimodular and has the form (5.10).

Figure 5.10. The first return map to $[0, \gamma_{d-1}^-]$ for the case $|\Delta_d| < |\Delta_e|$. (It shows the intervals $\Delta_e$, $\Delta_d$ and their images $T(\Delta_d)$, $T(\Delta_e)$.)

• If $|\Delta_e| = |\Delta_d|$, then $T(\gamma_e) = \gamma_d$ and we do not have to split $\Delta_e$:
$$\chi : \begin{cases} j \to j, & 1 \le j < d,\ j \neq e,\\ e \to ed, & \end{cases} \qquad M(i) = \begin{pmatrix}
1 & 0 & \cdots & \cdots & 0\\
0 & \ddots & & & \vdots\\
\vdots & & 1 & & \vdots\\
\vdots & & & \ddots & 0\\
0 & \cdots & & 0 & 1\\
0 & \cdots & 1 & \cdots & 0
\end{pmatrix}.$$
This is a rectangular matrix of the form (5.10) with a column removed.

• If $|\Delta_d| > |\Delta_e|$, then there are $k < d$ and $r \ge 0$ such that $T^{1+r}(\Delta_k) \ni \gamma_{d-1}$ and $T^j(\Delta_k) \subset \Delta_d$ for $1 \le j \le r$. Thus $r \ge 1$ is the minimal iterate such that $T^r(1^-) \notin \Delta_d$. The intervals $\Delta_j$ mapped into $\Delta_d$ remain in $\Delta_d$ for $r$ or $r+1$ steps, depending on whether $T(\Delta_j)$ is to the left or right of $T(\Delta_k)$; see Figure 5.11. The first return map to $[0, \gamma_{d-1})$ consists of $r'$ Type 1 Rauzy steps and a single Type 0 Rauzy step, where
$$r' = r \times \#\{j : T(\Delta_j) \cap \Delta_d \neq \emptyset\} + \#\{j : T(\Delta_j) \text{ lies to the right of } T(\Delta_k)\}.$$

Figure 5.11. The first return map to $[0, \gamma_{d-1})$ for the case $|\Delta_d| > |\Delta_e|$. (It shows the intervals $\Delta_e$, $\Delta_k$, $\Delta_d$ and the images $T(\Delta_d)$, $T(\Delta_k)$, $T(\Delta_e)$, $T^2(\Delta_k)$.)

The corresponding substitution is
$$\chi : \begin{cases}
j \to j d^r \text{ or } j d^{r+1} & \text{if } 1 \le j < k \text{ and } T(\Delta_j) \subset [\gamma_{d-1}, 1),\\
j \to j & \text{if } 1 \le j < k \text{ and } T(\Delta_j) \subset [0, \gamma_{d-1}),\\
k \to k d^r, & \\
k+1 \to k d^{r+1}, & \\
j \to (j-1) d^r \text{ or } (j-1) d^{r+1} & \text{if } k+1 < j \le d \text{ and } T(\Delta_{j-1}) \subset [\gamma_{d-1}, 1),\\
j \to j-1 & \text{if } k+1 < j \le d \text{ and } T(\Delta_{j-1}) \subset [0, \gamma_{d-1}),
\end{cases}$$
and the incidence matrix is
$$M(i) = \begin{pmatrix}
1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0\\
0 & 1 & & & & & & 0\\
0 & 0 & \ddots & & & & & \vdots\\
\vdots & & & 1 & 1 & & & \vdots\\
\vdots & & & 0 & 0 & 1 & & \vdots\\
\vdots & & & & & & \ddots & \vdots\\
0 & \cdots & & & & & 0 & 1\\
s_1 & s_2 & \cdots & s_k & s_{k+1} & s_{k+2} & \cdots & s_m
\end{pmatrix}.$$
Since $s_{k+1} = s_k + 1$, a simple computation shows that $M(1)$ is unimodular and has the form (5.10). (Alternatively, multiplying the incidence matrices of the Type 1 and Type 0 Rauzy induction steps gives this result as well.) If $T^{r+1}(\Delta_k)$ has $\gamma_{d-1}$ as right boundary point, then we do not need to split $\Delta_k$ and the $(k+1)$-st row is removed from (5.10).
Now continue inductively, creating substitutions and incidence matrices for the $n$-th first return map $T_n : [0, a_n^-] \to [0, a_n^-]$ for $a_1^- = \gamma_{d-1}^-$ and $a_n$ the left endpoint of the rightmost interval on which $T_{n-1}$ is continuous.
For the final statement that not every Bratteli-Vershik system is of this form, we refer to [275, Section 4]. □
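The unimodularity claim for square matrices of the form (5.10) with $s_{k+1} = s_k + 1$ can be spot-checked numerically. (By hand: subtracting column $k$ from column $k+1$ turns row $k$ into a unit row and leaves the single entry $s_{k+1} - s_k = 1$ in column $k+1$, in the last row.) The sketch below builds the square case of (5.10) from $m$, $k$ and the bottom row $s$, using our own parametrization ($k$ is 1-based with $1 \le k < m$), and computes the determinant exactly with fractions.

```python
from fractions import Fraction
from typing import List, Sequence


def det(matrix: Sequence[Sequence[int]]) -> Fraction:
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n, result = len(a), Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result  # a row swap flips the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for cc in range(c, n):
                a[r][cc] -= factor * a[c][cc]
    return result


def shape_5_10(m: int, k: int, s: Sequence[int]) -> List[List[int]]:
    """Square m x m matrix of the shape (5.10); k is 1-based with 1 <= k < m."""
    rows = []
    for j in range(1, m):              # rows 1, ..., m-1
        row = [0] * m
        if j < k:
            row[j - 1] = 1             # unit row e_j
        elif j == k:
            row[k - 1] = row[k] = 1    # the row with ones in columns k and k+1
        else:
            row[j] = 1                 # shifted unit row e_{j+1}
        rows.append(row)
    rows.append(list(s))               # bottom row (s_1, ..., s_m)
    return rows


print(det(shape_5_10(5, 2, [0, 1, 2, 0, 1])))  # prints 1, since s_3 = s_2 + 1
print(det(shape_5_10(3, 1, [0, 0, 0])))        # prints 0, since s_2 = s_1
```

The second call shows that the condition $s_{k+1} = s_k + 1$ is genuinely needed.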

5.4.3. Bratteli-Vershik Systems and Toeplitz Shifts. Gjerde & Johansen [274] gave a characterization of Bratteli-Vershik systems coming from Toeplitz shifts. The following notion is central in this characterization.
Definition 5.53. A Bratteli diagram has the equal path number property¹⁰ if for every $i \ge 1$, the number of incoming edges $\#t^{-1}(v)$ is the same for all $v \in V_i$.
Theorem 5.54. A minimal shift is a Toeplitz shift if and only if it has a representation as a Bratteli-Vershik system that is expansive, has a unique minimal path $x_{\min}$, and satisfies the equal path number property.
In fact, Gjerde & Johansen [274] proved this for invertible Toeplitz shifts, provided the Bratteli diagram is properly ordered, so it has both a unique minimal and a unique maximal path.
Remark 5.55. If the equal path number property holds, then the sequence $p = (p_i)_{i \in \mathbb{N}} \subset \mathbb{N}$, $p_i := \#t^{-1}(v)$ for $v \in V_i$, is well-defined. We label the incoming edges $e \in E_i$ with $t(e) = v$ in increasing order with the labels $0, 1, \dots, p_i - 1$. Then the labeling map $\psi : X_{BV} \to \Sigma_p$ assigning the sequence of labels to each path is a continuous factor map onto the $p$-adic odometer $(\Sigma_p, a)$, and $\psi \circ \tau = a \circ \psi$. This gives another way of seeing that odometers are factors of Toeplitz shifts.
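On the level of labels, the Vershik map in Remark 5.55 is "add one with carry": the first non-maximal label is increased by one and all lower labels, which then belong to the minimal path, are reset to 0. The sketch below is a hypothetical helper that implements the odometer $a$ on the first $n$ coordinates of $\Sigma_p$; the digit sequence $p$ used in the usage lines is an arbitrary illustrative choice. The all-maximal digit string wraps around to $0 \dots 0$, mirroring $\tau(x_{\max}) = x_{\min}$.

```python
from typing import List, Sequence


def odometer_step(digits: Sequence[int], p: Sequence[int]) -> List[int]:
    """Add 1 with carry to (x_1, ..., x_n), where 0 <= x_i < p_i and x_1 is the lowest digit."""
    out = list(digits)
    for i, base in enumerate(p[:len(out)]):
        if out[i] + 1 < base:
            out[i] += 1        # first non-maximal label: increase it and stop
            return out
        out[i] = 0             # maximal label: reset to the minimal one and carry
    return out                 # all labels were maximal: wrap around to 0...0


p = [2, 3, 2]
x = [0, 0, 0]
orbit = []
for _ in range(12):
    orbit.append(x)
    x = odometer_step(x, p)
print(orbit[:4], x)  # [[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]] [0, 0, 0]
```

After $p_1 p_2 p_3 = 12$ steps every truncated label sequence has been visited exactly once, just as $\tau$ cycles through all $q_3$ paths ending at a vertex of $V_3$.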
Remark 5.56. Relating Toeplitz shifts to Kakutani-Rokhlin partitions: for a Toeplitz sequence with periodic structure $(q_n)_{n=1}^\infty$, the elements of $P_n$ are the sequences that share the same $q_n$-skeleton, and the heights are $h_i(n) = q_n$.

In order to prove Theorem 5.54, we start with a lemma that holds for
general Bratteli-Vershik systems with a unique minimal path.

10 In [77] this property is called equal row sum (ERS).


Lemma 5.57. Every Bratteli-Vershik system with a unique minimal path can be telescoped such that the minimal incoming edge at every $v \in \hat V_{k+1}$, $k \ge 0$, has the same source $\hat u_k \in \hat V_k$.
Proof. First remove the minimal path $x_{\min}$ from the Bratteli diagram. Since there is only one minimal path, no $v \in V_i$, $i \in \mathbb{N}$, is the starting point of a remaining infinite minimal path. That is, there is an increasing sequence $(i_k)_{k \in \mathbb{N}}$ such that no minimal path connects $V_{i_{k-1}}$ to $V_{i_k}$. Therefore, after telescoping between $V_{i_{k-1}}$ and $V_{i_k}$ for all $k \in \mathbb{N}$, obtaining a Bratteli diagram $(\hat E_k, \hat V_k)_{k \in \mathbb{N}}$, there is no minimal edge connecting $\hat V_{k-1}$ to $\hat V_k$ for any $k \in \mathbb{N}$. Now reinsert the (telescoped version of the) minimal path in $(\hat E_k, \hat V_k)_{k \in \mathbb{N}}$. This achieves the required property. □

Proof of Theorem 5.54. First we note that telescoping a Bratteli diagram preserves the equal path number property. Indeed, set $p_i = \#\{e \in E_i : t(e) = v\}$ for $v \in V_i$; this is independent of $v$ because of the equal path number property. Then there are $q_i := \prod_{j=1}^{i} p_j$ paths connecting $v_0$ to each $v \in V_i$. If we telescope between $V_k$ and $V_i$, $k < i$, then the new $\hat p_i = \prod_{j=k+1}^{i} p_j$ is still independent of $v$.
Suppose the Vershik map is expansive, with expansivity constant $\delta > 0$. We can find $i$ such that the distance of every two paths with $x_j = y_j$ for all $j \le i$ is less than $\delta$. Therefore, if we telescope between $v_0$ and $V_i$, the new Bratteli-Vershik system $(\hat E, \hat V, \hat\tau)$ has the property that for every two distinct paths $x, y$, there is an $n$ such that $\hat\tau^n(x)$ and $\hat\tau^n(y)$ differ at the first edge.
Now we come to the proof, treating the "if"-part first:
⇐: Because there is a unique minimal path, Lemma 5.57 gives a telescoping after which the minimal incoming edge at every $v \in \hat V_{k+1}$, $k \ge 0$, has the same source $\hat u_k \in \hat V_k$. The analogous argument holds for maximal edges, provided we have a unique maximal path. This is useful to obtain an invertible Toeplitz shift (which is in fact an odometer; see [208, below Theorem 5.1]), but it is not required for one-sided Toeplitz shifts.
Now let $\hat p_k = \#\{e \in \hat E_k : t(e) = v\}$ for $v \in \hat V_k$, and let $\hat q_k = \prod_{j=1}^{k} \hat p_j$ be the number of paths connecting $v_0$ to $v \in \hat V_k$. Then every path will go through $\hat u_k$ exactly once every $\hat q_{k+1}$ iterates of the Vershik map $\hat\tau$. Let $\theta_0 \dots \theta_{\hat q_k - 1}$ be the first $\hat q_k$ symbols read off at the first edge set $\hat E_1$ from the iterates $\hat\tau^j(x_{\min})$, $0 \le j < \hat q_k$. This word is repeated with period $\hat q_{k+1}$. Since $k$ is arbitrary, the full sequence $(\theta_j)_{j \ge 0}$ read off from $(\hat\tau^j(x_{\min}))_{j \ge 0}$ is Toeplitz, as required.
⇒: For the opposite direction, we will first distill a sequence of KR-
partitions (Pn )n≥1 from the Toeplitz sequence θ = (θj )j≥0 in alphabet A.
5.4. Bratteli Diagrams and Vershik Maps 249

We write $X = \overline{\mathrm{orb}_\sigma(\theta)}$ for the (one-sided) Toeplitz shift space. Let $q_1$ be the period of $\theta_0$, and let $V_1$ be the collection of $q_1$-prefixes of $\{\sigma^{kq_1}(\theta) : k \ge 0\}$. Next set the first base
$$B_v(1) := \{x \in X : x \text{ and } \theta \text{ share the } q_1\text{-skeleton and } x_0 \dots x_{q_1-1} = v\}$$
for $v \in V_1$, so that we obtain the first Kakutani-Rokhlin partition
$$\mathcal P_1 := \{\sigma^j(B_v(1)) : 0 \le j < q_1,\ v \in V_1\}.$$
To continue the induction, suppose we have found $q_n$, $V_n$, $(B_v(n))_{v \in V_n}$, and $\mathcal P_n = \{\sigma^j(B_v(n)) : 0 \le j < q_n,\ v \in V_n\}$. Let $q_{n+1}$ be the minimal period with which the word $\theta_0 \dots \theta_{q_n}$ appears in $\theta$. Since $\theta_0 \dots \theta_{q_{n-1}-1}$ is a prefix of $\theta_0 \dots \theta_{q_n}$, $q_{n+1}$ is a multiple of $q_n$. Let $V_{n+1}$ be the collection of $q_{n+1}$-prefixes of $\{\sigma^{kq_{n+1}}(\theta) : k \ge 0\}$. Next set the $(n+1)$-st base
$$B_v(n+1) := \{x \in X : x \text{ and } \theta \text{ share the } q_{n+1}\text{-skeleton and } x_0 \dots x_{q_{n+1}-1} = v\}$$
for $v \in V_{n+1}$, so that we obtain the $(n+1)$-st Kakutani-Rokhlin partition
$$\mathcal P_{n+1} := \{\sigma^j(B_v(n+1)) : 0 \le j < q_{n+1},\ v \in V_{n+1}\}.$$
Clearly $\mathcal P_{n+1}$ refines $\mathcal P_n$ and the height of each base element $B_v(n+1)$ is the same, namely $q_{n+1}$, for each $v \in V_{n+1}$.

Since all $v \in V_{n+1}$ have $\theta_0 \dots \theta_{q_n-1}$ as prefix, $\bigcup_{v \in V_{n+1}} B_v(n+1) \subset B_{\theta_0 \dots \theta_{q_n-1}}(n)$, verifying condition (KR6) of Section 5.1. To check (KR4), suppose that $x \ne x' \in X$ and let $k$ be the smallest entry for which $x_k \ne x'_k$. Take $n$ such that $q_n > k$. If $x, x'$ are in different $q_n$-skeletons, then they belong to different levels in $\mathcal P_n$. If $x, x'$ are in the same $q_n$-skeleton, then there is $j < q_n$, but two different $w, w' \in V_n$ such that $x \in \sigma^j(B_w(n))$ and $x' \in \sigma^j(B_{w'}(n))$. This shows that $(\mathcal P_n)_{n \ge 1}$ separates points. Since all $\mathcal P_n$'s are partitions into clopen sets, the $\mathcal P_n$'s generate the topology of $X$. Hence $(\mathcal P_n)_{n \ge 1}$ satisfies (KR1)–(KR4) and (KR6). Condition (KR5) may fail but can be achieved by taking a subsequence of $(\mathcal P_n)_{n \ge 1}$.
Finally, to construct the Bratteli-Vershik system, we take the $V_n$ as vertex sets. For each $v \in V_1$, the edge set $E_1$ contains $q_1$ edges connecting $v_0$ to $v$. To get a simple cap of the Bratteli diagram, we can microscope between $\{v_0\}$ and $V_1$ by inserting a level $\bar V_1 = A$ such that there is a single edge (labeled $a$) between $v_0$ and each $a \in \bar V_1$. Each $v \in V_1$ then receives $q_1$ edges from $\bar V_1$, one for each letter of $v$, ordered in the same way as the letters appear in $v$. The general $E_n$, $n \ge 2$, will, for each $v \in V_n$, contain $q_n/q_{n-1}$ edges connecting $v$ with $u \in V_{n-1}$, ordered in the same way that $\sigma^{(r_v+k)q_{n-1}}(\theta)$ visits the $u$'s. Here $r_v = \min\{r \ge 0 : \sigma^{r q_{n-1}}(\theta) \in [v]\}$. Clearly the equal path number property is satisfied.

Figure 5.12. Bratteli diagrams for the Feigenbaum Toeplitz shift (left: levels V1 = {10, 11}, V2 = {1011, 1010}, V3 = {10111010, 10111011}, V4, ..., with edges labeled 0 and 1; right: the microscoped diagram).

Example 5.58. The Feigenbaum sequence,

ρfeig = 10 11 1010 10111011 1011101010111010 . . . ,
i.e. the fixed point of the Feigenbaum substitution χfeig : 0 → 11, 1 → 10 (see
Examples 1.6 and 4.86) is a Toeplitz sequence. Its periodic structure is $q_i = 2^i$, so $\rho_0 \dots \rho_{2^i-1}$ reappears with period $2^{i+1}$ for $i \ge 0$. We find $V_1 = \{10, 11\}$, $V_2 = \{1011, 1010\}$, and in general $V_n = \{\rho_0 \dots \rho_{2^n-2}0,\ \rho_0 \dots \rho_{2^n-2}1\}$. The resulting Bratteli diagram and its microscoped form (right panel) are given in Figure 5.12. This microscoped Bratteli diagram coincides with the one obtained from the Feigenbaum substitution. It is not properly ordered, because
it has two maximal paths, which agrees with ρfeig having two preimages in
Xfeig .
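The periodic structure and the vertex sets of Example 5.58 can be checked numerically on a finite prefix. This is only an illustrative sketch: the helper names (`feigenbaum_prefix`, `periodic_structure`, `vertex_set`) are our own, "period" is tested on a finite prefix only, and $q_{n+1}$ is searched among multiples of $q_n$, as in the construction of the KR-partitions above.

```python
def feigenbaum_prefix(length):
    """First `length` symbols of the fixed point of 0 -> 11, 1 -> 10."""
    rules = {"0": "11", "1": "10"}
    word = "1"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]


def appears_with_period(theta, word, p):
    """True if `word` occurs at every position k*p that fits in the prefix."""
    return all(theta[k:k + len(word)] == word
               for k in range(0, len(theta) - len(word) + 1, p))


def periodic_structure(theta, levels):
    """q_1 = least period of theta_0; q_{n+1} = least multiple of q_n with
    which theta_0 ... theta_{q_n} appears (tested on the finite prefix only)."""
    q = 1
    while not appears_with_period(theta, theta[:1], q):
        q += 1
    qs = [q]
    while len(qs) < levels:
        p = qs[-1]
        while not appears_with_period(theta, theta[:qs[-1] + 1], p):
            p += qs[-1]
        qs.append(p)
    return qs


def vertex_set(theta, q):
    """V_n: the q-prefixes of sigma^{kq}(theta), k >= 0, seen in the prefix."""
    return {theta[k:k + q] for k in range(0, len(theta) - q + 1, q)}


rho = feigenbaum_prefix(4096)
print(periodic_structure(rho, 6))   # [2, 4, 8, 16, 32, 64]
print(sorted(vertex_set(rho, 4)))   # ['1010', '1011']
```

On a finite prefix a candidate period longer than the prefix passes the test trivially, so the prefix has to be long compared with the periods one wants to detect.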

5.4.4. Bratteli-Vershik Systems and Cutting and Stacking. Bratteli-Vershik systems are in one-to-one correspondence with cutting and stacking
systems. The translation algorithm from ordered Bratteli-Vershik system
to cutting and stacking in its short version is as follows: Start with #V1
stacks Si , coded by i ∈ V1 = {0, . . . , #V1 − 1}. Then by induction, assuming
Sv , v ∈ Vn−1 , are the stacks for n ∈ N, we repeat the following two steps:
(1) Cut each stack Sv , v ∈ Vn−1 , into #{e ∈ En : s(e) = v} slices Sv,e .
(2) If $e' \in E_n$ (with $s(e') = v'$) is the direct successor of $e \in E_n$ (with $s(e) = v$) among all edges with the same terminal vertex $w = t(e) = t(e') \in V_n$, then put slice $S_{v',e'}$ on top of $S_{v,e}$.
At every finite stage n, the codes at the bottom of the stacks represent
minimal n-paths, and the codes at the top of the stacks represent maximal
n-paths. This shows in particular that the number of minimal and maximal
paths in an ordered Bratteli diagram is bounded by the rank.
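The two steps can be simulated combinatorially if a stack is recorded only as its list of levels (bottom to top), each level labelled by its code, i.e. the finite path it represents; widths are ignored. In this sketch `cut_and_stack` and the edge encoding `(source, target, rank)` are hypothetical choices of ours, and the Fibonacci diagram is encoded under the assumption that vertex 0 receives the edges 0 → 0 < 1 → 0, vertex 1 receives 0 → 1, and v0 sends one edge to each vertex of V1.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]   # (source, target, rank among edges into target)
Path = Tuple[Edge, ...]


def cut_and_stack(levels: List[List[Edge]]) -> Dict[str, List[Path]]:
    """Apply steps (1)-(2) to E_1, ..., E_N and return the stacks over V_N.

    Each stack is a list of levels from bottom to top, and every level is
    labelled by the finite path (code) that it represents.
    """
    stacks: Dict[str, List[Path]] = {"v0": [()]}   # root: one level, empty code
    for edges in levels:
        new: Dict[str, List[Path]] = {}
        # Visit the edges into each target w in increasing rank, so that the
        # slice for a successor edge lands on top of its predecessor.
        for edge in sorted(edges, key=lambda e: (e[1], e[2])):
            source, target, _ = edge
            # Step (1): the slice S_{v,e} of S_v (v = source) has the same
            # levels as S_v; its codes are extended by the edge e.
            # Step (2): put it on top of what is already stacked over w.
            new.setdefault(target, []).extend(p + (edge,) for p in stacks[source])
        stacks = new
    return stacks


# Hypothetical encoding of the Fibonacci diagram (substitution 0 -> 01, 1 -> 0).
E1 = [("v0", "0", 0), ("v0", "1", 0)]
EN = [("0", "0", 0), ("1", "0", 1), ("0", "1", 0)]

stacks = cut_and_stack([E1] + [EN] * 4)
print(len(stacks["0"]), len(stacks["1"]))   # 8 5
```

The stack heights are Fibonacci numbers, the bottom code of S_0 is the all-minimal path, and the top codes of S_0 and S_1 alternate between the two vertices, which matches the single minimal path and the two maximal paths described in Example 5.59.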
Example 5.59. The first three iterations of the above algorithm are worked
out for the Fibonacci Bratteli diagram in Figure 5.13. In this case, one part
of one bottom level always stays at the bottom; this corresponds to the one
minimal path xmin . The top level of one part of the split stack and the
top level of the other stack always stay on top; this corresponds to the two
maximal paths.

Figure 5.13. The Fibonacci Bratteli diagram (levels v0, V1, V2, V3, with edges labeled 0 and 1) and equivalent cutting and stacking

If $(X_{BV}, \tau)$ is equipped with a non-atomic $\tau$-invariant measure $\mu$, then we can give the precise width of all stacks and slices in terms of the $\mu$-measure of cylinder sets, and ultimately the precise form of the cutting and stacking interval map $T : [0,1] \to [0,1]$. We use here a simplified version of the algorithm described in [401]¹¹, for which we need, for each $n \in \mathbb N$ and $v \in V_{n-1}$, an order $\le_s$ on the outgoing edges $\{e \in E_n : s(e) = v\}$. This produces a total lexicographical order $\prec_{lex}$ on $X_{BV}$: given $x \ne x' \in X_{BV}$, find the smallest $n \in \mathbb N$ such that $x_n \ne x'_n$ (whence $s(x_n) = s(x'_n)$) and set $x \prec_{lex} x'$ if and only if $x_n <_s x'_n$.
11 In fact, this paper goes on to associate vertical and horizontal flows on a translation surface with a double Bratteli-Vershik system, on which a renormalization of the translation surface corresponds to the shift in the double Bratteli diagram.
Now define a weight function $w : \bigcup_n E_n \to [0,1]$ by setting
$$w(e) := \frac{\mu([x_1 \dots x_n e])}{\mu([x_1 \dots x_n])} \quad \text{for any path } x_1 \dots x_n \text{ with } t(x_n) = s(e) \in V_n.$$
By $\tau$-invariance of $\mu$, any path $x_1 \dots x_n$ with $t(x_n) = v \in V_n$ has the same mass as any other path with the same terminal vertex $v$, so the above definition makes sense. Also $\sum_{s(e) = v} w(e) = 1$ for every $n \in \mathbb N_0$ and $v \in V_n$. Now define a function $\varphi : X_{BV} \to [0,1]$ by
$$\varphi(x) = \sum_{s(e) = v_0,\ e <_s x_1} w(e) + \sum_{n=2}^{\infty} \mu([x_1 \dots x_{n-1}]) \sum_{s(e) = s(x_n),\ e <_s x_n} w(e).$$
Then $d(x, x') = |\varphi(x) - \varphi(x')|$ is a semi-metric on $X_{BV}$. Define $T : [0,1] \to [0,1]$ by
$$T(y) = \varphi \circ \tau \circ \varphi^{-1}(y).$$
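For the Fibonacci diagram, $\varphi$ can be evaluated on cylinders, and the images of the $n$-cylinders can be seen to tile $[0,1]$ in the $\prec_{lex}$ order. This is a sketch under explicit assumptions: the weights $w(0 \to 0) = 1/g$, $w(0 \to 1) = 1/g^2$, $w(1 \to 0) = 1$ (and the same two weights on the edges leaving $v_0$), with $g$ the golden mean, which is what $\tau$-invariance gives if an $n$-path ending at vertex 0 has mass $g^{-n}$ and one ending at vertex 1 has mass $g^{-n-1}$; and an order $\le_s$ that puts the edge to vertex 0 first. The names `OUT`, `n_paths` and `phi_min_and_mass` are hypothetical.

```python
import math

G = (1 + math.sqrt(5)) / 2   # golden mean

# Assumed data: for each source vertex, its outgoing edges listed in the
# chosen order <=_s, each as (target, w(e)).
OUT = {
    "v0": [("0", 1 / G), ("1", 1 / G**2)],
    "0": [("0", 1 / G), ("1", 1 / G**2)],
    "1": [("0", 1.0)],
}


def n_paths(n):
    """All n-paths from v0, coded by the index of each edge in OUT[source]."""
    paths = [((), "v0")]
    for _ in range(n):
        paths = [(code + (i,), target)
                 for code, vertex in paths
                 for i, (target, _) in enumerate(OUT[vertex])]
    return [code for code, _ in paths]


def phi_min_and_mass(code):
    """Return (phi(x_w^min), mu([w])) for the n-path w with this code.

    Beyond level n the path x_w^min only uses <=_s-minimal edges, which add 0,
    so phi is the finite sum of mu([x_1...x_{k-1}]) * sum_{e <_s x_k} w(e).
    """
    vertex, mass, phi = "v0", 1.0, 0.0
    for i in code:
        edges = OUT[vertex]
        phi += mass * sum(w for _, w in edges[:i])
        mass *= edges[i][1]
        vertex = edges[i][0]
    return phi, mass


# Lexicographic order of the codes is exactly <_lex on the cylinders.
cells = [phi_min_and_mass(c) for c in sorted(n_paths(10))]
gaps = max(abs(a + m - b) for (a, m), (b, _) in zip(cells, cells[1:]))
print(len(cells), gaps < 1e-12)   # 144 True
```

Each interval $[\varphi(x^{\min}_w), \varphi(x^{\min}_w) + \mu([w])]$ ends exactly where the next one starts, which is the numerical counterpart of the identification of $x^{\max}_w$ with the minimal point of the next cylinder.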
Proposition 5.60. The system ([0, 1], Leb, T ) obtained this way is a cutting
and stacking system isomorphic to (XBV , μ, τ ), and ϕ is the isomorphism
(but not a conjugacy, because it is two-to-one at a countable dense subset of
XBV ).

Proof. If $w = x_1 \dots x_n$ is an $n$-path from $v_0$ in the Bratteli diagram and $x^{\min}_w$ and $x^{\max}_w$ are the minimal and maximal sequences in the cylinder $[w]$ in the lexicographical order $\prec_{lex}$, then
$$\varphi(x^{\max}_w) - \varphi(x^{\min}_w) = \mu([w]). \tag{5.11}$$
However, if $w'$ is the $n$-path from $v_0$ succeeding $w$ in the $\prec_{lex}$-ordering, then
$$\varphi(x^{\max}_w) = \varphi(x^{\min}_{w'}), \tag{5.12}$$
which shows that $\varphi$ is not injective but identifies pairs of points and turns the Cantor set $X_{BV}$ into the interval $\varphi(X_{BV}) = [0,1]$. These pairs $\{x^{\max}_w, x^{\min}_{w'}\}$ are only countably many (so their total $\mu$-measure is zero¹²), but otherwise $\varphi$ is injective. Formulas (5.11) and (5.12) together show that $\varphi$ is an order-preserving isomorphism from $(X_{BV}, \mu, \prec_{lex})$ to $([0,1], \mathrm{Leb}, <)$ with standard order $<$.
It follows that $T = \varphi \circ \tau \circ \varphi^{-1}$ preserves Lebesgue measure and is one-to-one except at countably many points where it is one-to-two. Indeed, if $w = x_1 \dots x_n$ and $w' = x'_1 \dots x'_n$ are two consecutive $n$-paths as before with $\varphi(x^{\max}_w) = \varphi(x^{\min}_{w'}) = y$, and if there is $m < n$ such that $x_j = x'_j$ for all $j \le m$ and $x_m$ is not a maximal incoming edge (we can take $m$ minimal with this property), then $\tau(x^{\max}_w) = x^{\max}_v$ and $\tau(x^{\min}_{w'}) = x^{\min}_{v'}$ for consecutive $n$-paths $v = \tau(w)$ and $v' = \tau(w')$. In this case, $T(y) = \varphi(x^{\max}_v) = \varphi(x^{\min}_{v'})$, so $T$ is well-defined (and in fact continuous) at $y$ after all. Therefore, for every $n$ there are at most $\sum_{i,j} m_{ij}(n)$ pairs $x^{\max}_w, x^{\min}_{w'}$ (corresponding to the total number of slices of the $(n-1)$-st level stacks to create the $n$-th level stacks) where $T$ can be discontinuous at $y = \varphi(x^{\max}_w) = \varphi(x^{\min}_{w'})$. At all other points, $T$ acts as a local isometry, that is, precisely as a cutting and stacking map. This proves the proposition. □
12 Recall that μ is assumed to be non-atomic.

5.4.5. Bratteli-Vershik Systems and Enumeration Systems. Enumeration systems are also conjugate to Bratteli-Vershik systems. Let us start with the enumeration system $(X_G, a)$ associated with the sequence $(G_i)_{i \ge 0}$ with $G_0 = 1$ and extend the notation of (5.3):
$$G_n = d_1(n)G_{n-1} + \dots + d_{n-Q(n)}(n)G_{Q(n)}. \tag{5.13}$$
For simplicity we assume that $d_1(n) \ge 1$. For $n \ge 2$, let
$$R_G(n) := \{m \in \mathbb N : G_m \text{ appears in the greedy expansion of } G_n\},$$
in the sense that $d_{n-m}(n)$ in (5.13) is non-zero.
Proposition 5.61. The enumeration system $(X_G, a)$ is conjugate to the following Bratteli-Vershik system $(X_{BV}, \tau)$:
(1) $V_0 = \{0\} = \{v_0\}$ and $V_n = \{m : Q(m) < n \le m\}$ for $n \ge 1$.
(2) $E_n = \{n-1 \xrightarrow{a} m : 0 \le a < d_{m-n+1}(m),\ n-1 \in R_G(m)\} \cup \{m \to m : Q(m) < n-1\}$ for $n \ge 1$. Thus the incoming edges in $m \in V_n$ have labels $a = 0, \dots, d_{m-(n-1)}(m)$, and this gives the order among these edges.

Figure 5.14 gives an example based on the recursion $G_n = 3G_{n-1} + 2G_{n-3} + G_{n-4}$.

Proof. We start by showing that all vertices have incoming and outgoing edges. Indeed, if $m \in V_n$, so $Q(m) < n \le m$, then there are
$$\text{incoming edges } \begin{cases} n-1 \to m & \text{if } n-1 \in R_G(m), \\ m \to m & \text{if } n-1 \notin R_G(m), \text{ so } Q(m) < n-1, \end{cases}$$
$$\text{outgoing edges } \begin{cases} m \to m+1 & \text{if } m = n, \\ m \to m & \text{if } Q(m) < n < m. \end{cases}$$
We call the path $v_0 \xrightarrow{0} 1 \xrightarrow{0} 2 \xrightarrow{0} \cdots$ with code $x_{\min} := 0000\dots$ the spine of the Bratteli diagram. It is clearly the unique minimal path. A path $x$ is maximal if, whenever it passes through the first vertex $n \in V_n$, it passes through $Q(n) \in V_{Q(n)}$ as well, but not through $m \in V_m$ for $Q(n) < m < n$. The edge-labeled paths with this property are exactly the sequences in $X_G$. We set $\tau(x) = x_{\min}$ for every $x \in X_G$.

Figure 5.14. The Bratteli diagram for $G_n = 3G_{n-1} + 2G_{n-3} + G_{n-4}$, $n \ge 4$ (levels $V_0 = \{0\}$ and $V_n = \{n, n+1, n+2, n+3\}$ for $1 \le n \le 5$).

Thus there are finitely many maximal paths if $n - Q(n)$ is bounded, and only one maximal path if there are infinitely many $n$ such that $Q(m) \ge n$ for all $m > n$. In this case, the Bratteli diagram corresponds to an adding machine based on these particular $G_n$'s. So far, this proves that the ordered Bratteli diagram is well-defined, with a single minimal and finitely many maximal paths.
Claim: The number of paths from $v_0$ to $n \in V_n$ is equal to $G_n$.
Since there are $G_1 = d_1(1)$ edges $v_0 \to 1 \in V_1$, the claim holds for $G_1$. We continue to prove the claim by induction, assuming the claim is true for all $m < n$. The number of paths through both $n-1 \in V_{n-1}$ and $n \in V_n$ equals $d_1(n)G_{n-1}$. This proves the inductive step if $Q(n) = n-1$. Otherwise, the remaining incoming edge $n \to n$ is the last edge of a strand $V_{Q(n)} \ni Q(n) \to n \to n \to \dots \to n \to n \in V_n$. Going up this strand, we accumulate $d_2(n)G_{n-2} + d_3(n)G_{n-3} + \dots + d_{n-Q(n)}(n)G_{Q(n)}$ paths. Together, this adds up to $G_n$, proving the induction step, and thus the claim.
If $0 \le k < G_n$, then, counting from the spine, the $k$-th path from $v_0$ through $n \in V_n$ in the Vershik order, i.e. $x = \tau^k(00000\dots)$, satisfies $k = \sum_{i=1}^{n} x_i G_{i-1}$. From this it follows that $\tau(x) = a(x)$ for every $x \in X_G$, and by continuity also $\tau(x) = a(x)$ for every $x \in X_G$. This completes the proof. □
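The Claim can be tested by counting paths level by level in the diagram of Proposition 5.61 and comparing with the recursion (5.13). In this sketch the coefficients are passed as `d[n] = {j: d_j(n)}`; the function names are hypothetical, and the rows for n = 1, 2, 3 are our own assumption, chosen so that the recursion of Figure 5.14 starts with G_1 = 2, G_2 = 7, G_3 = 23 and Q(1) = Q(2) = Q(3) = 0, so that V_1 = {1, 2, 3, 4}.

```python
def g_values(d, N):
    """G_0 = 1 and G_n = sum_j d_j(n) G_{n-j}, with d[n] = {j: d_j(n)}."""
    G = [1]
    for n in range(1, N + 1):
        G.append(sum(c * G[n - j] for j, c in d[n].items()))
    return G


def spine_path_counts(d, N):
    """Count paths from v0 to the vertex n in V_n, level by level, in the
    diagram of Proposition 5.61 (vertices m > N are never needed)."""
    Q = {m: m - max(j for j, c in d[m].items() if c) for m in range(1, N + 1)}
    prev = {0: 1}                      # level 0: V_0 = {v0} = {0}
    counts = [1]
    for n in range(1, N + 1):
        new = {}
        for m in range(n, N + 1):
            if Q[m] >= n:              # m is not yet in V_n
                continue
            # d_{m-n+1}(m) labelled edges n-1 -> m (none if n-1 not in R_G(m))
            total = d[m].get(m - n + 1, 0) * prev[n - 1]
            if Q[m] < n - 1:           # the strand edge m -> m
                total += prev[m]
            new[m] = total
        counts.append(new[n])
        prev = new
    return counts


# Recursion of Figure 5.14 for n >= 4; rows n = 1, 2, 3 are assumptions.
d = {1: {1: 2}, 2: {1: 3, 2: 1}, 3: {1: 3, 3: 2}}
for n in range(4, 13):
    d[n] = {1: 3, 3: 2, 4: 1}

print(g_values(d, 6))   # [1, 2, 7, 23, 74, 238, 767]
print(spine_path_counts(d, 12) == g_values(d, 12))   # True
```

The inner loop mirrors the two kinds of incoming edges in the proof: labelled edges from the spine vertex $n-1$, and the strand edge $m \to m$ that carries the paths accumulated since level $Q(m)+1$.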

Figure 5.15. The Bratteli diagram for $S_k = S_{k-1} + S_{\max\{k-3,0\}}$ (with $S_0 = 1$, $S_1 = 2$, $S_2 = 3$, $S_3 = 4$, $S_4 = 6$, $S_5 = 9$).

Example 5.62. If $Q$ is the kneading map of a unimodal map, then the corresponding Bratteli diagram has a single spine $v_0 \to 1 \to 2 \to 3 \to 4 \to \cdots$, which is the unique minimal path. The vertex $k \in V_k$ has a second incoming edge, which is larger than the edge $k-1 \to k$, and the number of paths connecting $v_0$ to $k \in V_k$ is exactly the $k$-th cutting time $S_k$. All other vertices have only one incoming and one outgoing edge. Figure 5.15 gives the ordered diagram for $Q(k) = \max\{k-3, 0\}$; see also Section 5.3.1 where the same example is used.
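For a kneading map $Q$ the cutting times satisfy $S_k = S_{k-1} + S_{Q(k)}$ with $S_0 = 1$ (as in Exercise 5.63), which is also the path count at the spine vertex $k$: one path family through $k-1 \to k$ and one through the second incoming edge. A minimal sketch (the function name is ours) for $Q(k) = \max\{k-3, 0\}$:

```python
def cutting_times(Q, N):
    """S_0 = 1 and S_k = S_{k-1} + S_{Q(k)} for 1 <= k <= N."""
    S = [1]
    for k in range(1, N + 1):
        S.append(S[k - 1] + S[Q(k)])
    return S


print(cutting_times(lambda k: max(k - 3, 0), 8))   # [1, 2, 3, 4, 6, 9, 13, 19, 28]
```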
Exercise 5.63. Assume that Gk = Sk and Sk = Sk−1 + SQ(k) with S0 = 1
are the cutting times of a unimodal map f , and consider the corresponding
Bratteli-Vershik system (XBV , τ ). Show that if f is infinitely renormalizable
(see Proposition 4.122), then (XBV , τ ) is isomorphic to an adding machine.
Chapter 6

Methods from Ergodic Theory

In ergodic theory, we study dynamical systems $(X, \mathcal B, T)$ by means of probability¹ measures $\mu : \mathcal B \to [0,1]$. Here $\mathcal B$ is the σ-algebra of measurable sets, which in our setting is always the Borel σ-algebra generated by the open sets or, equivalently, by the closed sets.
Definition 6.1. A measure $\mu$ on $(X, \mathcal B, T)$ is called invariant if $\mu(A) = \mu(T^{-1}A)$ for all $A \in \mathcal B$. Equivalently, $\int g \circ T \, d\mu = \int g \, d\mu$ for every $g \in L^1(\mu)$.

Invariant measures allow us to study the behavior of typical orbits, that is, all orbits except for a set of μ-measure zero (i.e. up to a nullset). This is denoted as a.e. (almost everywhere) or μ-a.e. or mod μ. The word generic is used too, but since “generic” plays so many other roles in this text, we reserve the word “typical” to be associated to (mod μ). That invariant measures exist in the first place is guaranteed by the Krylov-Bogul’yubov Theorem.
Theorem 6.2 (Krylov-Bogul’yubov). If T is a continuous map on a compact
space X, then there is at least one T -invariant probability measure.

Proof. Let $\nu$ be any probability measure and define the Cesàro means
$$\nu_n(A) = \frac1n \sum_{j=0}^{n-1} \nu(T^{-j}A).$$
These are all probability measures. The collection of probability measures on a compact metric space is compact in the weak∗ topology; i.e. there is a limit probability measure $\mu$ and a subsequence $(n_i)_{i \in \mathbb N}$ such that for every continuous function $\psi : X \to \mathbb R$,
$$\int \psi \, d\nu_{n_i} \to \int \psi \, d\mu \quad \text{as } i \to \infty. \tag{6.1}$$
1 Occasionally also by infinite measures, but not in this text.
In a metric space, for any $\varepsilon > 0$ and closed set $A$, we can find a continuous function $\psi_A : X \to [0,1]$ such that $\psi_A(x) = 1$ if $x \in A$ and
$$\mu(A) \le \int_X \psi_A \, d\mu \le \mu(A) + \varepsilon, \qquad \mu(T^{-1}A) \le \int_X \psi_A \circ T \, d\mu \le \mu(T^{-1}A) + \varepsilon.$$
Here we use outer regularity of the measure $\mu$: $\mu(A) = \inf\{\mu(U) : A \subset U \text{ is open}\}$. We take $U \supset A$ so small that $\mu(U) - \mu(A) < \varepsilon$ and $\psi_A(x) = 0$ for all $x \notin U$. Note that it is important that $A$ is closed, because if there exists $a \in \partial A \setminus A$, then the above property fails for $\mu = \delta_a$.
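By the definition of $\mu$,
$$\begin{aligned}
|\mu(T^{-1}(A)) - \mu(A)| &\le \left| \int \psi_A \circ T \, d\mu - \int \psi_A \, d\mu \right| + \varepsilon \\
&= \lim_{i\to\infty} \left| \int \psi_A \circ T \, d\nu_{n_i} - \int \psi_A \, d\nu_{n_i} \right| + \varepsilon \\
&= \lim_{i\to\infty} \frac{1}{n_i} \left| \sum_{j=0}^{n_i-1} \left( \int \psi_A \circ T^{j+1} \, d\nu - \int \psi_A \circ T^j \, d\nu \right) \right| + \varepsilon \\
&\le \lim_{i\to\infty} \frac{1}{n_i} \left| \int \psi_A \circ T^{n_i} \, d\nu - \int \psi_A \, d\nu \right| + \varepsilon \\
&\le \lim_{i\to\infty} \frac{\|\psi_A\|_\infty}{n_i} + \varepsilon = \varepsilon.
\end{aligned}$$
Since $\varepsilon > 0$ is arbitrary, $\mu(T^{-1}(A)) = \mu(A)$. The closed sets generate the σ-algebra of Borel sets, so $\mu(T^{-1}(A)) = \mu(A)$ also for arbitrary Borel sets. □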
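The averaging trick in the proof is easy to see on a finite toy system. With $\nu = \delta_{x_0}$, the Cesàro mean $\nu_n$ is the empirical measure of the first $n$ orbit points, and $\nu_n(T^{-1}A) - \nu_n(A) = (1_A(T^n x_0) - 1_A(x_0))/n$ telescopes, so $\nu_n$ is invariant up to an error of at most $1/n$. The map $T(x) = x^2 + 1 \bmod 101$ below is a hypothetical example chosen by us, and the helper names are ours.

```python
from fractions import Fraction

N = 101


def T(x):
    """A hypothetical non-invertible map on Z/101."""
    return (x * x + 1) % N


def cesaro_means(x0, n):
    """nu_n = (1/n) sum_{j<n} delta_{T^j(x0)}: the Cesaro means of delta_{x0}."""
    counts, x = {}, x0
    for _ in range(n):
        counts[x] = counts.get(x, 0) + 1
        x = T(x)
    return {y: Fraction(c, n) for y, c in counts.items()}


def measure(nu, A):
    return sum((nu.get(y, Fraction(0)) for y in A), Fraction(0))


def preimage(A):
    return {x for x in range(N) if T(x) in A}


n = 500
nu = cesaro_means(3, n)
worst = max(abs(measure(nu, preimage({y})) - measure(nu, {y})) for y in range(N))
print(worst <= Fraction(1, n))   # True
```

Exact rational arithmetic makes the $1/n$ bound visible without rounding; letting $n \to \infty$ along a subsequence is what produces the invariant limit $\mu$ in the proof.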
Exercise 6.3. To demonstrate the role of the compactness assumption in
Theorem 6.2, consider the fixed point ρCantor of the Cantor substitution
$\chi_{Cantor}$ in Remark 2.14 and Example 6.19. Let $X = \{\sigma^n(\rho_{Cantor}) : n \ge 0\}$, so no closure is taken! Show that $(X, \sigma)$ has no invariant probability measure.
Remark 6.4. Invariant measures are related to fixed points of the transfer operator of a dynamical system $(X, T)$. First define the Koopman² operator $U_T g = g \circ T$. With respect to some reference measure $m$ (e.g. Lebesgue measure), $U_T$ becomes an operator from $L^2(m)$ to itself, and the transfer operator $L_T : L^2(m) \to L^2(m)$ is defined by duality:
$$\int g \circ T \cdot h \, dm = \int g \cdot L_T h \, dm. \tag{6.2}$$
2 Named after Bernard Koopman (1900–1981), a student of Birkhoff.

So if $L_T h = h$ for some $h \in L^1(m)$ and $d\mu = h \, dm$, then (6.2) becomes $\int_X g \circ T \, d\mu = \int_X g \, d\mu$; i.e. $\mu$ is $T$-invariant. If the reference measure $m$ is already $T$-invariant, then $L_T$ preserves constant functions.
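On a finite set with counting measure $m$, the Koopman and transfer operators become explicit: $L_T h(y)$ is the sum of $h$ over the preimages of $y$. The sketch below (the map and all names are hypothetical choices of ours) checks the duality (6.2), finds a fixed point $h$ of $L_T$, confirms that $h\,dm$ is $T$-invariant on every subset, and shows that $m$ itself is not invariant because $L_T 1 \ne 1$.

```python
from itertools import combinations

# Hypothetical map on X = {0, 1, 2, 3, 4}: 0->1, 1->2, 2->0, 3->0, 4->3.
T = [1, 2, 0, 0, 3]
X = range(len(T))


def koopman(g):
    """U_T g = g o T; functions on X are stored as lists indexed by X."""
    return [g[T[x]] for x in X]


def transfer(h):
    """L_T h(y) = sum of h(x) over all x with T(x) = y (m = counting measure)."""
    out = [0] * len(T)
    for x in X:
        out[T[x]] += h[x]
    return out


def integral(f, g):
    """The integral of f * g with respect to counting measure m."""
    return sum(a * b for a, b in zip(f, g))


def mu(A, h):
    """The measure with density h with respect to m, evaluated on A."""
    return sum(h[x] for x in A)


g, h = [3, -1, 4, 1, 5], [2, 7, 1, 8, 2]
print(integral(koopman(g), h) == integral(g, transfer(h)))   # True, as in (6.2)

h_inv = [1, 1, 1, 0, 0]             # indicator of the 3-cycle {0, 1, 2}
print(transfer(h_inv) == h_inv)     # True: L_T h = h
print(all(mu([x for x in X if T[x] in A], h_inv) == mu(A, h_inv)
          for r in range(len(T) + 1) for A in combinations(X, r)))   # True
print(transfer([1] * len(T)))       # [2, 1, 1, 1, 0]
```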

6.1. Ergodicity
The notion of ergodicity says that the space $X$ does not fall apart into separate $T$-invariant components, both of positive measure.
Definition 6.5. A measure $\mu$ is called ergodic if for every $T$-invariant set $A \in \mathcal B$ (i.e. $T^{-1}(A) = A$ mod μ) either $\mu(A) = 0$ or $\mu(A^c) = 0$. That is, the only $T$-invariant sets are nullsets or the whole space up to a nullset.
Example 6.6. For the full shift ($A^{\mathbb N}$ or $A^{\mathbb Z}$, $\sigma$), Bernoulli measures (see Definition 1.32) are ergodic σ-invariant measures. So are equidistributions on periodic orbits. Non-trivial convex combinations of such measures are still σ-invariant, but not ergodic.
Corollary 6.7. A dynamical system $(X, T, \mu)$ is ergodic if and only if the only $T$-invariant $L^1(\mu)$-functions (i.e. $\psi = \psi \circ T$ μ-a.e.) are constant μ-a.e.

Proof. If $\psi$ is a $T$-invariant function that is not constant μ-a.e., then there is $c \in \mathbb R$ such that the sets $\{x \in X : \psi(x) \le c\}$ and $\{x \in X : \psi(x) > c\}$ both have positive measure. But these sets are $T$-invariant, proving that $\mu$ cannot be ergodic. Conversely, if $T$ is not ergodic, say $A = T^{-1}(A)$ mod μ with both $\mu(A) > 0$ and $\mu(A^c) > 0$, then the indicator function $\psi = 1_A$ is $T$-invariant, but not constant μ-a.e. □
Exercise 6.8. Show that ergodicity of $\mu$ is equivalent to the following: if $\mu = \alpha\mu_1 + (1-\alpha)\mu_2$ for two invariant probability measures $\mu_1, \mu_2$ and some $\alpha \in (0,1)$, then $\mu = \mu_1 = \mu_2$. Conclude that if there is only one invariant probability measure, it has to be ergodic.
Example 6.9. A Sturmian shift is ergodic. To prove this, let $R_\alpha : \mathbb S^1 \to \mathbb S^1$ be an irrational circle rotation; it preserves Lebesgue measure $m$. We show that every $R_\alpha$-invariant function $\psi \in L^2(m)$ must be constant. Indeed, write $\psi(x) = \sum_{n \in \mathbb Z} a_n e^{2\pi i n x}$ as a Fourier series. The $R_\alpha$-invariance implies that $a_n e^{2\pi i n \alpha} = a_n$ for all $n \in \mathbb Z$. Since $\alpha \notin \mathbb Q$, this means that $a_n = 0$ for all $n \ne 0$, so $\psi(x) \equiv a_0$ is indeed constant.
Since the Sturmian shift with rotation number α (with its unique invari-
ant probability measure) is isomorphic (see Definition 6.53) to (S1 , Rα , m),
the ergodicity of the rotation carries over to the Sturmian subshift.
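A numerical companion to the Fourier argument: for $n \ne 0$, the Birkhoff average of the mode $e_n(x) = e^{2\pi i n x}$ along an $R_\alpha$-orbit is a geometric sum with ratio $z = e^{2\pi i n\alpha} \ne 1$, so it is bounded by $2/(N|z - 1|)$ and tends to 0; only the constant mode survives averaging. The sketch assumes $\alpha = (\sqrt5 - 1)/2$ and uses a hypothetical helper name.

```python
import cmath
import math


def mode_average(n, alpha, x, N):
    """(1/N) sum_{k<N} e_n(R_alpha^k x), where e_n(x) = exp(2 pi i n x)."""
    return sum(cmath.exp(2j * math.pi * n * (x + k * alpha)) for k in range(N)) / N


alpha = (math.sqrt(5) - 1) / 2
N = 10_000
for n in range(1, 6):
    z = cmath.exp(2j * math.pi * n * alpha)
    bound = 2 / (N * abs(z - 1))   # from |sum_{k<N} z^k| = |z^N - 1| / |z - 1|
    print(n, abs(mode_average(n, alpha, 0.3, N)) <= bound + 1e-9)   # n True
```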

The set $\mathcal M(T)$ of all invariant probability measures is convex; it is a special case of a Choquet simplex. The actual definition of a Choquet simplex is that it is a compact, metrizable, convex set in which every element can be decomposed uniquely³ as a convex combination of extremal points⁴ (this is an instance of Choquet’s Theorem; see [460]).
The set of invariant probability measures has indeed this property, since, as Exercise 6.8 showed, the ergodic measures $\mathcal M_{erg}(T)$ are precisely the extremal points of this simplex. Hence, as a consequence of Choquet’s Theorem, for every $\mu \in \mathcal M(T)$, there is a probability measure $\nu$ on $\mathcal M_{erg}(T)$ such that
$$\mu(A) = \int_{\mathcal M_{erg}(T)} \mu_{erg}(A) \, d\nu(\mu_{erg}) \quad \text{for every } A \in \mathcal B.$$
This is called the ergodic decomposition of $\mu$. The Choquet simplex is called a Poulsen simplex if its set of extremal points (here the ergodic measures) is dense. Up to homeomorphisms (and in fact affine homeomorphisms) there is only one non-singleton simplex in which the extremal points are dense, see [400], so we can speak of the Poulsen simplex. The next theorem is due to Sigmund [507, 508] with precursors by Ruelle [480].
Theorem 6.10. If $(X, T)$ is a continuous dynamical system with specification on a compact metric space, then the set of equidistributions on periodic orbits is dense in $\mathcal M(T)$, so $\mathcal M(T)$ is a Poulsen simplex. A fortiori, for every non-empty closed convex subset $V \subset \mathcal M(T)$, there is $x \in X$ such that $V$ is exactly the set of weak∗ accumulation points of $\left(\frac1n \sum_{i=0}^{n-1} \delta_{T^i(x)}\right)_{n \in \mathbb N}$.

However, also for dynamical systems lacking specification, and in particular zero entropy subshifts, the Choquet simplex can be Poulsen. Downarowicz demonstrated that the family of Toeplitz shifts is so rich that for every
simplex Σ, there is a Toeplitz shift whose Choquet simplex equals Σ; see
[207]. Cortez & Rivera-Letelier [168] showed that for enumeration systems
every simplex Σ with a compact, totally disconnected set of extremal points
can emerge as Choquet simplex. Kułaga-Przymus et al. [379] showed that
for B-free shifts with positive entropy, the set of shift-invariant measures is
a Poulsen simplex.

6.2. Birkhoff’s Ergodic Theorem

A simple consequence of having an invariant probability measure is:
Theorem 6.11 (Poincaré Recurrence Theorem). If $(X, \mathcal B, T)$ has an invariant probability measure $\mu$, then for every set $A \in \mathcal B$ and μ-a.e. $x \in A$, there is $n \ge 1$ such that $T^n(x) \in A$.

This property of μ is called recurrence, hence the name of the theorem.

3 Therefore a filled triangle is a Choquet simplex, but a filled square is not, because its center is the convex combination of the corners in multiple ways.

4 I.e. points that cannot be written as non-trivial convex combinations of other points.

Proof. Let $A \in \mathcal B$ be an arbitrary set of positive measure (if $\mu(A) = 0$, the result is trivially true). As $\mu$ is invariant, $\mu(T^{-i}(A)) = \mu(A) > 0$ for all $i \ge 0$. On the other hand, $1 = \mu(X) \ge \mu(\bigcup_i T^{-i}(A))$, so there must be overlap in the backward iterates of $A$; i.e. there are $0 \le i < j$ such that $\mu(T^{-i}(A) \cap T^{-j}(A)) > 0$. Take the $j$-th iterate and find $\mu(T^{j-i}(A) \cap A) \ge \mu(T^{-i}(A) \cap T^{-j}(A)) > 0$. This means that a positive measure part of the set $A$ returns to itself after $n := j - i$ iterates.
For the part $A'$ of $A$ that did not return within $n$ steps, assuming $A'$ has positive measure, we repeat the argument. That is, there is $n'$ such that $\mu(T^{n'}(A') \cap A') > 0$ and then also $\mu(T^{n'}(A') \cap A) > 0$.
Repeating this argument, we can exhaust the set $A$ up to a set of measure zero, and this proves the theorem. □

Remark 6.12. Kac’s Lemma provides a quantitative version of Poincaré recurrence. If $\mu$ is ergodic and $\tau_A(x) = \min\{n \ge 1 : T^n(x) \in A\}$ is the first return time to some $A \in \mathcal B$ with $\mu(A) > 0$, then $\int_A \tau_A(x) \, d\mu = 1$.
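Kac's Lemma can be checked exactly on a finite toy model: a bijection of $\{0, \dots, N-1\}$ preserves the uniform measure, and this measure is ergodic precisely when the bijection is a single cycle. The sketch below (hypothetical names, rotations of $\mathbb Z/30$ chosen by us) computes $\int_A \tau_A \, d\mu$ for one ergodic and one non-ergodic rotation.

```python
from fractions import Fraction


def first_return(T, x, A):
    """tau_A(x) = min{n >= 1 : T^n(x) in A}; assumes the orbit of x meets A."""
    n, y = 1, T(x)
    while y not in A:
        n, y = n + 1, T(y)
    return n


def kac_integral(T, size, A):
    """Integral over A of tau_A for the uniform measure on {0, ..., size-1}."""
    return sum(Fraction(first_return(T, x, A), size) for x in A)


A = {0, 3, 4, 11, 25}
print(kac_integral(lambda x: (x + 7) % 30, 30, A))        # 1   (one 30-cycle)
print(kac_integral(lambda x: (x + 2) % 30, 30, {0, 2}))   # 1/2 (two 15-cycles)
```

In the non-ergodic case the integral equals the measure of the cycle that contains $A$, which illustrates why ergodicity is needed for the value 1.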

The central result in ergodic theory is paraphrased as
Space Average = Time Average,
at least for typical points. This is called Birkhoff’s⁵ Ergodic Theorem:
Theorem 6.13. Let $\mu$ be a $T$-invariant probability measure and let $\psi \in L^1(\mu)$. Then the ergodic average
$$\psi^*(x) := \lim_{n \to \infty} \frac1n \sum_{i=0}^{n-1} \psi \circ T^i(x)$$
exists μ-a.e., and $\psi^*$ is $T$-invariant. If in addition $\mu$ is ergodic, then
$$\psi^* = \int_X \psi \, d\mu \quad \mu\text{-a.e.} \tag{6.3}$$

Remark 6.14. A point $x \in X$ satisfying (6.3) is called μ-typical. To be precise, the set of μ-typical points also depends on $\psi$, but for different functions $\psi, \phi$, the $(\mu, \psi)$-typical points and $(\mu, \phi)$-typical points differ only on a nullset.
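Birkhoff's Ergodic Theorem can be illustrated with the irrational rotation of Example 6.9: for $\psi$ the indicator of $[0, 1/3)$, the time averages should approach the space average $\int \psi \, d\mathrm{Leb} = 1/3$. A sketch, assuming $\alpha = (\sqrt5 - 1)/2$ and floating-point arithmetic on the circle; the function names are ours and only three starting points are tried.

```python
import math

alpha = (math.sqrt(5) - 1) / 2


def rotation(x):
    return (x + alpha) % 1.0


def indicator(x):
    """psi = 1_{[0, 1/3)}, whose space average is 1/3."""
    return 1.0 if x < 1 / 3 else 0.0


def birkhoff_average(psi, T, x, n):
    """(1/n) sum_{i<n} psi(T^i x)."""
    total = 0.0
    for _ in range(n):
        total += psi(x)
        x = T(x)
    return total / n


for x0 in (0.0, 0.1, 0.77):
    print(x0, round(birkhoff_average(indicator, rotation, x0, 100_000), 4))
```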

Exercise 6.15. Let $(X, T, \mathcal B, \mu)$ be an ergodic measure-preserving system on a compact metric space. Show that $T$ is topologically transitive on $\mathrm{supp}(\mu)$.

5 Named after George Birkhoff (1884–1944). Details of the controversy on priority of the Ergodic Theorem (John von Neumann was earlier in proving his $L^2$-version $\frac1n \sum_{k=0}^{n-1} U_T^k \psi \to \int_X \psi \, d\mu$ in $L^2(\mu)$, but Birkhoff delayed its publication until after the appearance of his own paper) can be found in [569].

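Definition 6.16. A point $x \in X$ is called quasi-generic⁶ w.r.t. $\mu$ if there are sequences $(a_n)_{n \in \mathbb N}$ and $(b_n)_{n \in \mathbb N}$ with $b_n \to \infty$ such that
$$\lim_{n \to \infty} \frac{1}{b_n} \sum_{j=a_n}^{a_n+b_n-1} \psi \circ T^j(x) = \int_X \psi \, d\mu$$
for every continuous function $\psi$.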

6.3. Unique Ergodicity

For continuous dynamical systems on compact spaces (such as subshifts),
Theorem 6.2 provides at least one invariant measure. The question we raise
in this section is whether there is a unique invariant measure.
Definition 6.17. A transformation (X, T ) is uniquely ergodic if it admits
only one invariant probability measure. If (X, T ) is both uniquely ergodic
and minimal, we call it strictly ergodic.
Example 6.18. An SFT is not uniquely ergodic, except when it consists of a single periodic orbit, because the equidistribution $\frac1n \sum_{i=0}^{n-1} \delta_{\sigma^i(x)}$ on each periodic sequence $x = (x_1 \dots x_n)^\infty$ is an invariant measure. The same holds for sofic shifts. On the other hand, Sturmian shifts $(X, \sigma)$ are strictly ergodic, and their unique measure is obtained by lifting Lebesgue measure from the circle, using the itinerary map $i : \mathbb S^1 \to X$; that is, $\mu(A) = \mathrm{Leb}(i^{-1}(A))$.
Example 6.19. The Cantor substitution from Remark 2.14
$$\chi_{Cantor} : \begin{cases} 0 \to 000, \\ 1 \to 101 \end{cases}$$
with fixed point $\rho_{Cantor} = 101000101000000000101000101\dots$ generates a non-minimal (since it contains a fixed point $0^\infty$) subshift $(\Sigma_{Cantor}, \sigma)$. It is uniquely ergodic, with the Dirac measure $\delta_{0^\infty}$ being the only shift-invariant probability measure, because for every $n \in \mathbb N$ and among all words of length $n$, the word $0^n$ occurs with limit frequency 1. This example shows that the support of the unique invariant measure need not be the whole space.
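The frequency statement can be seen on the words $\chi^k_{Cantor}(1)$: they have length $3^k$ and contain exactly $2^k$ ones (only $\chi(1) = 101$ produces ones), so the proportion of ones is $(2/3)^k \to 0$. The sketch below (hypothetical names) also estimates the frequency of the word $000$.

```python
def cantor_prefix(k):
    """chi^k(1) for chi: 0 -> 000, 1 -> 101; a word of length 3**k."""
    rules = {"0": "000", "1": "101"}
    word = "1"
    for _ in range(k):
        word = "".join(rules[c] for c in word)
    return word


def word_frequency(x, u):
    """Relative number of positions i with x[i:i+len(u)] == u."""
    positions = len(x) - len(u) + 1
    return sum(x[i:i + len(u)] == u for i in range(positions)) / positions


w = cantor_prefix(10)
print(w.count("1"), len(w))             # 1024 59049
print(word_frequency(w, "000") > 0.9)   # True
```

Each 1 lies in at most three windows of length 3, which gives the crude bound $1 - 3(2/3)^{10} > 0.9$ used in the last line.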

As mentioned in Exercise 6.8, if $(X, T)$ is uniquely ergodic, its unique measure is ergodic. A very useful property of uniquely ergodic systems is that Birkhoff averages converge uniformly, rather than only μ-a.e.
Theorem 6.20 (Oxtoby’s Theorem). A continuous dynamical system $T : X \to X$ on a compact space is uniquely ergodic if and only if, for every continuous function $\psi : X \to \mathbb R$, the Birkhoff averages $\frac1n \sum_{i=0}^{n-1} \psi \circ T^i(x)$ converge uniformly to a constant function.

6 Recall that sometimes μ-typical points are called μ-generic.

In fact [277, Theorem 4.10], if every point is typical for a generic measure⁷, then $(X, T)$ is uniquely ergodic.
A major consequence of unique ergodicity is the uniform existence of visit frequencies; i.e. for a uniquely ergodic subshift $(X, \sigma, \mu)$,
$$\mu([a_1 \dots a_N]) = \lim_{n \to \infty} \frac1n \#\{0 \le j < n : x_{j+1} \dots x_{j+N} = a_1 \dots a_N\}, \tag{6.4}$$
for every word $a_1 \dots a_N$ and all $x \in X$.
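The uniform frequencies in (6.4) can be observed on the Feigenbaum sequence of Example 5.58, a fixed point of the primitive substitution $0 \to 11$, $1 \to 10$, whose subshift is uniquely ergodic. The target values $2/3$ for the letter 1 and $1/3$ for the word 11 are our own derivation from $\rho_{2m} = 1$ and $\rho_{2m+1} = 1 - \rho_m$; the helper names are hypothetical, and only a few starting positions are sampled.

```python
def feigenbaum_prefix(length):
    """First `length` symbols of the fixed point of 0 -> 11, 1 -> 10."""
    rules = {"0": "11", "1": "10"}
    word = "1"
    while len(word) < length:
        word = "".join(rules[c] for c in word)
    return word[:length]


def window_frequency(x, u, start, n):
    """(1/n) #{0 <= j < n : u occurs in x at position start + j}."""
    return sum(x[start + j:start + j + len(u)] == u for j in range(n)) / n


x = feigenbaum_prefix(2 ** 15)
for start in (0, 1000, 12345):
    print(start, round(window_frequency(x, "1", start, 2 ** 14), 3),
          round(window_frequency(x, "11", start, 2 ** 14), 3))
```

The frequencies barely depend on the starting position, which is the uniformity that Oxtoby's Theorem guarantees.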

Proof. If $\mu$ and $\nu$ were two different ergodic measures, then we can find a continuous function $\psi : X \to \mathbb R$ such that $\int \psi \, d\mu \ne \int \psi \, d\nu$. Using Birkhoff’s Ergodic Theorem 6.13 for both measures (with their own typical points $x$ and $y$), we see that
$$\lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(x) = \int \psi \, d\mu \ne \int \psi \, d\nu = \lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(y),$$
so there is no uniform convergence to a constant function.
Conversely, we know by Birkhoff’s Ergodic Theorem 6.13 that
$$\lim_{n\to\infty} \frac1n \sum_{k=0}^{n-1} \psi \circ T^k(x) = \int \psi \, d\mu$$
is constant μ-a.e. But if the convergence is not uniform, then there are sequences $(y_i)_{i \in \mathbb N} \subset X$ and $(n_i)_{i \in \mathbb N} \subset \mathbb N$ such that $\lim_i \frac{1}{n_i} \sum_{k=0}^{n_i-1} \psi \circ T^k(y_i) \ne \int_X \psi \, d\mu$. Define probability measures $\nu_i := \frac{1}{n_i} \sum_{k=0}^{n_i-1} \delta_{T^k(y_i)}$. This sequence $(\nu_i)_{i \in \mathbb N}$ has a weak∗ accumulation point $\nu$, which is shown to be $T$-invariant in the same way as in the proof of Theorem 6.2. But $\nu \ne \mu$ because $\int \psi \, d\nu \ne \int \psi \, d\mu$. Hence $(X, T)$ cannot be uniquely ergodic. □

We think of unique ergodicity as an indicator of low complexity. For instance, the uniquely ergodic subshifts presented so far (such as Sturmian and
substitution shifts) have zero entropy, and the non-uniquely ergodic shifts
(SFTs, sofic shifts) have positive entropy. However, there are minimal zero
entropy shifts that are not uniquely ergodic. One of the first such examples
is due to Keane [348] and comes from an interval exchange transformation
on four intervals. It is known [542, Theorem 2.12] that any transitive inter-
val exchange transformation8 on n intervals can have at most n/2 ergodic
measures, so IETs on two or three intervals are uniquely ergodic.
On the other hand, there are positive entropy subshifts that are uniquely ergodic. Without proof, we state a general result by Krieger [372] in this direction:
Theorem 6.21. Every subshift $(X, \sigma)$ has a uniquely ergodic subshift $(X', \sigma)$ of the same entropy.
7 See Remark 1.31 for the definition.
8 For Interval Translation Maps on n intervals, the bound is (n + 1)/2; see [128, 137].
On the positive side, there are several conditions implying unique ergodicity:
Theorem 6.22. Let (X, T ) be an equicontinuous surjection on compact met-
ric space (X, d). Then the following are equivalent:
(a) T is transitive.
(b) T is uniquely ergodic.
(c) Every T -invariant probability measure is ergodic.

The main implication (a) ⇒ (b) is due to Fomin [250] for minimal dy-
namical systems and was generalized to transitive systems by Oxtoby [440],
as we shall do in the proof below (but see Exercise 2.28). In fact, Oxtoby’s
proof applies to transitive mean equicontinuous systems as well.

Proof. (a) ⇒ (b): Assume that $T : X \to X$ is equicontinuous and transitive. Let $\psi : X \to \mathbb R$ be continuous and, due to the compactness of $X$, also uniformly continuous. Take $\varepsilon > 0$ arbitrary, and choose $\delta > 0$ so that $|\psi(x) - \psi(y)| < \varepsilon$ whenever $d(x, y) < \delta$. By equicontinuity, we can find $\gamma > 0$ such that $d(x, y) < \gamma$ implies $d(T^j(x), T^j(y)) < \delta$ for all $j \ge 0$. Take a point $x \in X$ with a dense orbit, and let $y \in X$ be arbitrary. Choose $k \ge 0$ such that $d(T^k(x), y) < \gamma$. Then
$$\begin{aligned}
\left| \frac1n \sum_{j=0}^{n-1} \psi \circ T^j(x) - \frac1n \sum_{j=0}^{n-1} \psi \circ T^j(y) \right|
&\le \frac1n \left| \sum_{j=0}^{k-1} \left( \psi \circ T^j(x) - \psi \circ T^{n-k+j}(y) \right) \right| \\
&\quad + \frac1n \left| \sum_{j=k}^{n-1} \left( \psi \circ T^j(x) - \psi \circ T^{j-k}(y) \right) \right| \\
&\le \frac1n \left( k(\sup\psi - \inf\psi) + (n-k)\varepsilon \right) \to \varepsilon
\end{aligned}$$
as $n \to \infty$. Since $\varepsilon$ and $y \in X$ are arbitrary, it follows that the ergodic average of every point converges to the same value. By Oxtoby’s Theorem 6.20, this implies unique ergodicity.
(b) ⇒ (c): This follows immediately from Exercise 6.8.
(c) ⇒ (a): Suppose that $T$ is not transitive, so in particular not minimal. Let $(Y, T)$ be a $T$-invariant subsystem for some compact proper subset $Y \subset X$. It supports an invariant measure $\mu_0$, due to the Krylov-Bogul’yubov Theorem 6.2. Take $\varepsilon > 0$ so small that there is $x \in X$ such that $d(x, Y) > \varepsilon$. As $T$ is equicontinuous and surjective, Corollary 2.35 implies that there is $\delta > 0$ such that $d(T^n x, Y) > \delta$ for all $n \in \mathbb N$. Hence, for any weak∗ accumulation point
$$\mu_1 = \lim_{k \to \infty} \frac{1}{n_k} \sum_{i=0}^{n_k-1} \delta_{T^i x},$$
the support satisfies $\mathrm{supp}(\mu_1) \cap Y = \emptyset$. Therefore $\frac12(\mu_0 + \mu_1)$ is not ergodic. □
Lemma 6.23. If a shift (X, σ) is balanced on words, see Definition 4.36,
then it is uniquely ergodic.

Proof. Suppose that $\mu$ and $\nu$ are two different ergodic invariant measures and that $u \in \mathcal L(X)$ is such that $\mu([u]) \ne \nu([u])$. Take typical points $x$ and $y$ (in the sense of the Birkhoff Ergodic Theorem 6.13) for $\mu$ and $\nu$, respectively. Then $\big| |x_1 \dots x_n|_u - |y_1 \dots y_n|_u \big| \sim n|\mu([u]) - \nu([u])| \to \infty$ as $n \to \infty$, so $\mathcal L(X)$ cannot be balanced. □
Example 6.24. If $\mathcal L(X)$ is $R$-balanced on words, then it is $R$-balanced on letters, but the other direction fails of course. For example, the full shift $X := \{0,1\}^{\mathbb Z}$ is not balanced, but $\chi_{TM}(X)$ for the Thue-Morse substitution $\chi_{TM}$ is balanced on letters but not even balanced on 2-words. This example shows that balancedness on letters is not sufficient for Lemma 6.23.
Adding machines are minimal isometries, and therefore uniquely ergodic by Theorem 6.22. For the more general class of Toeplitz shifts, the question of unique ergodicity is more interesting. The following result (requiring regularity; see Section 4.5.1) is by Jacobs & Keane [332].
Theorem 6.25. Every regular Toeplitz shift is uniquely ergodic.

Proof. The original result is [332, Theorem 5]; we follow [381, Theorem 4.78], using the notation from Theorem 4.92. In particular, $L_i = \mathrm{lcm}(q_1, \dots, q_i)$, where $(q_i)_{i \in \mathbb N}$ is the periodic structure of the Toeplitz sequence. Let $u \in \mathcal L(x)$ be an arbitrary word; then for $i$ so large that $L_i > |u|$, the frequency $\mu_i(u) := \frac{1}{L_i}|V(i)|_u$ of the word $u$ in $V(i)$ is a lower bound for $\inf_{j \ge 0} \liminf_n \frac1n \#\{1 \le k \le n : x_{j+k+1} \dots x_{j+k+|u|} = u\}$, whereas $\mu_i(u) + \frac{|u| + r_i}{L_i}$ is an upper bound for $\sup_{j \ge 0} \limsup_n \frac1n \#\{1 \le k \le n : x_{j+k+1} \dots x_{j+k+|u|} = u\}$. But $\frac{|u| + r_i}{L_i} \to 0$ as $i \to \infty$. Therefore $\lim_i \mu_i(u)$ exists, and the visit frequency is well defined, with uniform convergence on $\overline{\mathrm{orb}_\sigma(x)}$. By Oxtoby’s Theorem 6.20, this implies unique ergodicity. □

However, there are non-uniquely ergodic Toeplitz shifts. For an explicit counterexample to unique ergodicity if regularity fails, see [381, Theorem 4.78] and also the aforementioned result of Downarowicz [207].
The following result is due to Furstenberg [265] and is used to prove the
unique ergodicity (and minimality) of many skew-product systems.
Proposition 6.26. Let $(X, T, \mu)$ be uniquely ergodic and let $G$ be a compact group with Haar measure⁹ $m$. Let the group extension $S : Y \to Y$ be defined on $Y := X \times G$ as $S(x, g) = (T(x), f(x)g)$ for some continuous $f : X \to G$. If $S$ is ergodic w.r.t. $\nu = \mu \times m$, then $S$ is uniquely ergodic.

Proof. Let $(x, g) \in Y$ be a ν-typical point, so it satisfies Birkhoff’s Ergodic Theorem 6.13 w.r.t. every continuous function $\varphi : Y \to \mathbb R$. For any $h \in G$, the function $\tilde\varphi$ defined by $\tilde\varphi(x, g) = \varphi(x, gh)$ is continuous too, so $(x, gh)$ is ν-typical w.r.t. $\varphi$ because $(x, g)$ is ν-typical w.r.t. $\tilde\varphi$. It follows that there is a subset $W \subset X$ with $\mu(W) = 1$ such that $W \times G$ consists entirely of ν-typical points.
If $\nu'$ were another ergodic $S$-invariant probability measure, then the argument above gives a set $W' \subset X \setminus W$ with $\nu'(W' \times G) = 1$ such that $W' \times G$ consists entirely of $\nu'$-typical points. Then the projected measure $\mu'$ on $X$ defined by $\mu'(A) = \nu'(A \times G)$ is $T$-invariant and satisfies $\mu'(W') = 1$. But $W$ and $W'$ are disjoint, so $\mu \ne \mu'$, contradicting that $T$ is uniquely ergodic. □

6.3.1. Unique Ergodicity and Word-Complexity. Subshifts with a
sufficiently low word-complexity p(n) are uniquely ergodic. The following
results in this direction are due to Boshernitzan [96, 97].
Theorem 6.27. Let (X, σ) be a minimal shift and let b := lim inf_n p(n)/n <
∞. Then there are at most¹⁰ max{b, 1} shift-invariant ergodic measures.
Let us first discuss some extensions and related results. A strengthening
of the previous theorem is: if b < 3, then (X, σ) is uniquely ergodic, see
[96, Theorem 1.5], and without the minimality assumption [230, Theorem
Damron & Fickenscher [177, Theorem 1] show, using bi-special words
and Rauzy graph techniques, that if p(n) = bn + C for some integers b ≥ 3 and
C ∈ N and all sufficiently large n, then there are at most b − 2 shift-invariant
ergodic measures. In [178], they improved this estimate to b/2 shift-invariant
ergodic measures for transitive subshifts, provided all bi-special words are
regular; see Definition 1.11.
There are also extensions to the non-minimal setting; see [435, Theorem
1.5]. Namely, if X = $\overline{\mathrm{orb}_\sigma(x)}$ for a bi-infinite sequence x that is not
eventually periodic in both directions, then lim inf_n p(n)/n < 2 implies unique
ergodicity. Cyr & Kra [174] noted that, without the minimality assumption,
there are no more than max{b, 1} generic measures¹¹. They also give, for
any integer b ≥ 4, an example of a subshift with lim inf_n p(n)/n = b − 1 < b =
lim sup_n p(n)/n having precisely b ergodic measures.
9 Haar measure, introduced by Alfréd Haar [295], is the unique probability measure on a compact group that is
preserved under each group rotation x ↦ gx.

10 Or max{b+  − 1 , 1} shift-invariant ergodic measures if b+ := lim inf p(n)/n < ∞.
11 See Remark 1.31 for the definition.
6.3. Unique Ergodicity 267

Theorem 6.28. Let (X, σ) be a minimal subshift and μ a shift-invariant
ergodic measure. Define
$$\varepsilon_\mu(n) = \min\{\mu(Z) : Z \text{ is an } n\text{-cylinder in } X\}.$$
If lim sup_n n ε_μ(n) > 0, then (X, σ) is uniquely ergodic.

Corollary 6.29. Every transitive linearly recurrent subshift is uniquely ergodic.

Proof. Since every word w ∈ L_n reoccurs with gap ≤ Ln, Birkhoff's Ergodic
Theorem 6.13 implies that μ([w]) ≥ 1/(Ln) > 0 for every ergodic invariant
measure μ. Therefore Theorem 6.28 applies. □
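As a numerical sanity check of Theorem 6.28 and Corollary 6.29 (this check is not part of the text's argument), the sketch below uses the Fibonacci word. This is the fixed point of the substitution 0 → 01, 1 → 0, and it generates a minimal, linearly recurrent (Sturmian) subshift. The sketch assumes that empirical frequencies in a long prefix are a good proxy for μ([w]), and the helper names are illustrative. For each n it prints three quantities:

- the complexity p(n);
- the number of right-special words of length n, which are used in the proof of Theorem 6.27;
- n times the smallest empirical n-cylinder frequency, as an estimate of nε_μ(n).

For this example one expects p(n) = n + 1, exactly one right-special word of each length, and nε_μ(n) bounded away from 0.

```python
from collections import Counter


def fibonacci_word(min_length):
    """Prefix of the Fibonacci word, the fixed point of 0 -> 01, 1 -> 0."""
    rules = {"0": "01", "1": "0"}
    w = "0"
    while len(w) < min_length:
        w = "".join(rules[c] for c in w)
    return w


def factor_frequencies(w, n):
    """Empirical frequencies of the length-n factors of the finite word w."""
    total = len(w) - n + 1
    counts = Counter(w[i:i + n] for i in range(total))
    return {u: c / total for u, c in counts.items()}


def right_special(w, n):
    """Length-n factors u of w such that both u0 and u1 are factors of w."""
    longer = {w[i:i + n + 1] for i in range(len(w) - n)}
    return {v[:-1] for v in longer
            if v[:-1] + "0" in longer and v[:-1] + "1" in longer}


w = fibonacci_word(100_000)
for n in range(1, 9):
    freqs = factor_frequencies(w, n)
    p_n = len(freqs)                 # word-complexity p(n)
    eps_n = min(freqs.values())      # empirical estimate of eps_mu(n)
    print(n, p_n, len(right_special(w, n)), round(n * eps_n, 3))
```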

Lemma 6.30. A subshift (X, σ) is linearly recurrent (see Definition 4.1) if
and only if ε̂ := lim inf_n nε_μ(n) > 0 for every σ-invariant measure μ.

Proof. The implication ⇒ uses the same argument as in the previous proof.
The reverse implication ⇐ is due to Boshernitzan (unpublished) and appears
in [74]¹². Let u ∈ L_n be arbitrary, and let N = N(n, u) be the length of the
longest return word x = x_1x_2…x_N associated to u. Note that N > n,
because otherwise x is periodic; see Example 4.3. For n ≤ k ≤ N, let U_k be the
collection of words ending in x_1…x_k. Then μ(⋃_{v∈U_k}[v]) = μ([x_1…x_k]) ≥
ε̂/k. Since the sets U_k are all disjoint (after all, u cannot reappear in the
return word x), it follows that
$$1 \;\ge\; \sum_{k=n}^{N} \frac{\hat\varepsilon}{k} \;\ge\; \int_n^{N+1} \frac{\hat\varepsilon}{t}\,dt \;\ge\; \hat\varepsilon \log\frac{N}{n}.$$
Therefore N/n ≤ e^{1/ε̂}, uniformly in n and u ∈ L_n(X). Since every x ∈ X is a
concatenation of return words to u, linear recurrence follows for L = e^{1/ε̂}. □

Proof of Theorem 6.27. A sequence of words (B_k)_{k≥1} of lengths |B_k| → ∞
as k → ∞ is called generic for a measure μ if for every word B ∈ L (defining
a cylinder [B] = {x ∈ X : B is a prefix of x}),
$$\lim_{k\to\infty} \frac{|B_k|_B}{|B_k|} = \mu([B]),$$
where |B_k|_B denotes the number of appearances of B in B_k. An argument
similar to Theorem 6.2 shows that there is a subsequence (k_j)_{j≥1} and a measure
μ such that (B_{k_j})_{j≥1} is generic for μ.

12 I am grateful to Fabien Durand for pointing this out to me and for the streamlined argument.

Let W = {n ∈ N : there are ≤ b right-special words of length n}. Since
p(n) ≤ (b + 1/2)n for infinitely many n and p(n+1) ≥ p(n) + #{right-special words in L_n},
the set W is infinite. For each n ∈ W, let {R_{n,1}, …, R_{n,b}} ⊂ L_n be the
collection of right-special words. (If there are fewer, then we just duplicate
some.) By taking subsequences (at most b times), we can construct a subset
W̃ ⊂ W such that (R_{n,k})_{n∈W̃} is generic for a single measure μ_k, k = 1, …, b.
Take m(n) = (b + 1)n. The following claim is proved as in Proposition 1.12.
Claim 1: Every m(n)-word contains a right-special word.
Let μ be an ergodic shift-invariant measure. Choose a μ-typical point
x ∈ X. For every n ∈ W̃, we can decompose x into m(n)-words
x = C_{n,1}C_{n,2}C_{n,3}C_{n,4}…, and each word C_{n,i} contains a right-special word B_{i,k(i)},
so that at least one of them occurs with upper density
$$\limsup_{j\to\infty} \frac{1}{j}\,\#\{0 \le i < j : B_{i,k(i)} = R_{n,k}\} \;\ge\; \frac{1}{b}.$$
Claim 2: Set η = 1/(4b(b+1)); then μ̃ = (1/(1−η))(μ − η·μ_k) is a
non-negative measure.
From Claim 2 we conclude that μ = (1 − η)μ̃ + ημ_k. But since μ was
assumed to be ergodic (see Exercise 6.8), μ = μ_k. Hence there are at most
b ergodic measures.
It remains to prove Claim 2. Let A ⊂ X be an arbitrary cylinder. Since (R_{n,k})_{n∈W̃}
is generic for μ_k, we have |R_{n,k}|_A/|R_{n,k}| ≥ (1/2)μ_k(A) for all sufficiently large n ∈ W̃.
By definition of upper density, #{0 ≤ i < j : C_{n,i} contains R_{n,k}} > j/(2b) for
infinitely many j. For such j,
$$|x_1\cdots x_{j\cdot m(n)}|_A \;>\; \frac{j}{2b}\,|R_{n,k}|_A \;\ge\; \frac{j}{2b}\cdot\frac{\mu_k(A)\,n}{2}.$$
Dividing by j m(n), we find
$$\frac{|x_1\cdots x_{j\cdot m(n)}|_A}{j\,m(n)} \;\ge\; \frac{\mu_k(A)}{4b}\cdot\frac{n}{m(n)} \;=\; \frac{\mu_k(A)}{4b(b+1)} \;=\; \eta\cdot\mu_k(A).$$
But since x is typical for μ, also (1/(j m(n)))|x_1⋯x_{j·m(n)}|_A → μ(A), and therefore
μ(A) ≥ η·μ_k(A). This proves Claim 2, thus completing the entire proof. □

Proof of Theorem 6.28. If lim inf_n p(n)/n in Theorem 6.27 is infinite,
then lim sup_n nε_μ(n) = 0 (since ε_μ(n) ≤ 1/p(n)), and there is nothing to prove.
Hence we can assume that b := lim inf_n p(n)/n < ∞, so that there are finitely
many ergodic shift-invariant measures. Clearly μ is one of them. Assume there
is another ergodic measure μ′, and take an N-cylinder [a_1…a_N] such that
μ([a_1…a_N]) ≠ μ′([a_1…a_N]). Without loss of generality, we can assume
$$\mu([a_1\dots a_N]) < u < v < \mu'([a_1\dots a_N])$$
and μ″([a_1…a_N]) ∉ [u, v] for every ergodic shift-invariant measure μ″. By
the Birkhoff Ergodic Theorem 6.13,
$$\lim_{n\to\infty} \mu\Big(\Big\{x \in X : \frac{1}{n}\#\{0 \le j < n : \sigma^j(x) \in [a_1\dots a_N]\} \in [u,v]\Big\}\Big) = 0.$$
However, by minimality there are arbitrarily large n and two n-cylinders Z_1
and Z_2 such that
$$\frac{1}{n}|Z_1|_{a_1\dots a_N} < u < v < \frac{1}{n}|Z_2|_{a_1\dots a_N}.$$
Let Z = z_1…z_r be the smallest word having Z_1 as prefix and Z_2 as suffix. Since
$$\big|\,|z_{j+1}\dots z_{j+n}|_{a_1\dots a_N} - |z_{j+2}\dots z_{j+n+1}|_{a_1\dots a_N}\big| \le 1,$$
there must be at least n(v − u) distinct integers j such that
$$u < \frac{1}{n}|z_{j+1}\dots z_{j+n}|_{a_1\dots a_N} < v.$$
Therefore
$$n\,\varepsilon_\mu(n) \;\le\; \frac{2}{v-u}\cdot\frac{n(v-u)\,\varepsilon_\mu(n)}{2} \;\le\; \frac{2}{v-u}\,\mu\Big(\Big\{x \in X : \frac{1}{n}\#\{0 \le j < n : \sigma^j(x) \in [a_1\dots a_N]\} \in [u,v]\Big\}\Big) \;\to\; 0$$
as n → ∞. This finishes the proof. □

6.3.2. Unique Ergodicity for Primitive Substitution Shifts. We start
with primitive substitution shifts. We give the original proof due to Michel
[417] here (see also [465, Theorem V.13] and Host in [85, Chapter 15]),
because it is classical, although unique ergodicity also follows directly from linear
recurrence; see Corollary 6.29. The more general results of Section 6.3.3
would apply as well, as they cover certain S-adic transformations; see
e.g. [70, Theorem 3.1(i)].
Theorem 6.31. Every primitive substitution shift (X, σ) is strictly ergodic.

Proof. Since every primitive substitution shift is minimal (see Theorem 4.17),
it remains to prove unique ergodicity. Recall that
$$|w|_u = \#\{1 \le i \le |w| - |u| + 1 : w_i w_{i+1}\cdots w_{i+|u|-1} = u\}$$
stands for the number of occurrences of u in w. We claim that for every non-empty
u ∈ L(X), there is a frequency p_u ∈ (0, 1) and a function ε_u(n) → 0
as n → ∞ such that
$$\left|\frac{|w|_u}{|w| - |u| + 1} - p_u\right| \le \varepsilon_u(|w|) \quad\text{for all } w \in L(X). \tag{6.5}$$
If μ is a σ-invariant probability measure on X, then for every non-empty
u ∈ L(X) and every n ∈ N, we have
$$|\mu([u]) - p_u| = \left|\int_X \frac{1}{n}\sum_{j=0}^{n-1} 1_{[u]}(\sigma^j(x))\,d\mu - p_u\right| \le \int_X \left|\frac{1}{n}|x_{[1,n+|u|]}|_u - p_u\right| d\mu \le \int_X \varepsilon_u(n)\,d\mu = \varepsilon_u(n) \to 0.$$
Therefore there is only one σ-invariant probability measure, given by μ([u]) =
p_u (and extended to the Borel σ-algebra by the Kolmogorov Extension Theorem).
Now the proof of the claim (6.5) follows fairly directly from Theorem 8.58, at
least for single letters u = a ∈ A. Indeed, let A be the matrix associated to
the substitution χ. Then A^n_{b,a} = |χ^n(b)|_a. By the Perron-Frobenius Theorem
8.58, there are 1 < ρ < λ (the leading eigenvalue of A) and C > 0 such that
$$\big|\,|\chi^n(b)|_a - \lambda^n p_a q_b\big| \le C\rho^n,$$
where p and q are the left and right leading eigenvectors of A, scaled so that
Σ_{a∈A} p_a = 1 and Σ_{a∈A} p_a q_a = 1. Also, by the triangle inequality,
$$\big|\,|\chi^n(b)| - \lambda^n q_b\big| = \Big|\sum_{a\in A}\big(|\chi^n(b)|_a - p_a\lambda^n q_b\big)\Big| \le \sum_{a\in A}\big|\,|\chi^n(b)|_a - \lambda^n p_a q_b\big| \le C\,\#A\,\rho^n$$
and
$$\big|\,|\chi^n(b)|_a - p_a|\chi^n(b)|\big| \le \big|\,|\chi^n(b)|_a - \lambda^n p_a q_b\big| + p_a\big|\,|\chi^n(b)| - \lambda^n q_b\big| \le C'\rho^n$$
for C′ = C(1 + p_a #A). Therefore, since |χ^n(b)| ≥ cλ^n for some c > 0, we have
$$\left|\frac{|\chi^n(b)|_a}{|\chi^n(b)|} - p_a\right| \le C''(\rho/\lambda)^n \to 0,$$

uniformly in a, b ∈ A. For general words w ∈ L(X) (instead of w = χ^n(b)),
we can split
$$w = v_0\,\chi(v_1)\cdots\chi^{n-1}(v_{n-1})\,\chi^n(v_n)\,\chi^{n-1}(v'_{n-1})\cdots\chi(v'_1)\,v'_0 \tag{6.6}$$
for some maximal n such that v_n is non-empty and each v_k and v′_k has length ≤ L :=
max_{a∈A} |χ(a)|. Therefore | |χ^k(v_k)|_a − p_a|χ^k(v_k)| | ≤ LC′ρ^k, and the same holds
for each χ^k(v′_k). Additionally, there can be at most 2n|u| appearances of u
in between the words χ^k(v_k) and χ^k(v′_k) in (6.6). Altogether
$$\big|\,|w|_a - p_a|w|\big| \le 2n|u| + \sum_{k=0}^{n} 2LC'\rho^k \le C'''\rho^n.$$
Divide by |w| ≥ cλ^n to obtain (6.5) for u = a ∈ A.

Now for a general word u ∈ L_ℓ(X) we consider the ℓ-block shift as
in Section 4.2.2. By Proposition 4.22 this ℓ-block substitution χ_ℓ : A_ℓ →
A_ℓ^* is primitive. Hence we can apply the first half of the proof to χ_ℓ and
conclude that u, viewed as a letter of A_ℓ, appears with a single frequency in all words of the
ℓ-block substitution shift space X_ℓ. Since u appears in x ∈ L(X) if and
only if the letter u ∈ A_ℓ appears in the ℓ-block recoding of x in L(X_ℓ), the unique ergodicity follows. □
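The letter-frequency step of the proof above can be illustrated numerically. The sketch below assumes the convention M[b][a] = |χ(b)|_a used above. It computes the left Perron-Frobenius vector p of a primitive substitution matrix, normalized so that Σ_a p_a = 1, and compares it with the letter frequencies in χ^n(b). The eigenvector is found by power iteration, a choice made here for simplicity rather than the method of the text. The substitutions and helper names are illustrative.

```python
def substitution_matrix(sub, alphabet):
    """M[b][a] = |chi(b)|_a, the number of letters a in chi(b)."""
    return [[sub[b].count(a) for a in alphabet] for b in alphabet]


def perron_frequencies(sub, alphabet, iterations=200):
    """Left Perron-Frobenius vector p of M (p M = lambda p), normalized so
    that sum(p) = 1, by power iteration; assumes the substitution is primitive."""
    M = substitution_matrix(sub, alphabet)
    d = len(alphabet)
    p = [1.0 / d] * d
    for _ in range(iterations):
        q = [sum(p[b] * M[b][a] for b in range(d)) for a in range(d)]
        total = sum(q)
        p = [x / total for x in q]
    return dict(zip(alphabet, p))


def iterate(sub, word, n):
    """Return chi^n(word)."""
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


fib = {"0": "01", "1": "0"}
p = perron_frequencies(fib, "01")
for b in "01":
    image = iterate(fib, b, 20)
    # deviation of the letter frequencies in chi^20(b) from p_a
    print(b, {a: image.count(a) / len(image) - p[a] for a in "01"})
```

For the Fibonacci substitution one gets p_0 = (√5 − 1)/2, the reciprocal of the golden mean. The deviations for both starting letters b are small, which matches the uniformity in b used in the proof.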

6.3.3. Unique Ergodicity and Bratteli-Vershik Systems. Let (X_BV, τ)
be a BV-system with incidence matrices M(n), n ≥ 1, i.e., m_{v,w}(n) =
#{e ∈ E_n : s(e) = v, t(e) = w}. Throughout, we assume that the diagram is
telescoped so that all M(n) are strictly positive. Recall from Definition 5.28 that
h_w(n) = #{paths between v_0 and w ∈ V_n}.

The (non-)unique ergodicity of Bratteli-Vershik systems was investigated
in [77, 80, 248], and in [121, 168, 169] for specific cases. It requires a
probabilistic version of the incidence matrices M(n) = (m_{v,w}(n))_{v∈V_{n−1}, w∈V_n} of
the Bratteli diagram:
$$K(n) = (k_{v,w}(n))_{v\in V_{n-1},\,w\in V_n}, \qquad k_{v,w}(n) = m_{v,w}(n)\,\frac{h_v(n-1)}{h_w(n)}. \tag{6.7}$$
Due to the appearance of h_v(n − 1) and h_w(n), the matrix K(n) is not
just a normalized version of M(n): all the previous incidence matrices
M(1), …, M(n − 1) play a role.

Lemma 6.32. The columns of every K(n) all add up to 1.

Proof. Suppose μ is a τ-invariant Borel measure. Then μ(Z) = μ(Z′) for
every two cylinder sets Z = [x_1, …, x_n], Z′ = [x′_1, …, x′_n] ⊂ X_BV with
t(x_n) = t(x′_n) = w ∈ V_n. Denote this probability by p_w(n). Note that
p(n) = (p_w(n))_{w∈V_n} is in general not a probability vector, but p̃(n) with
p̃_w(n) := h_w(n)p_w(n) is. Indeed, p̃_w(n) represents the total mass of all
cylinder sets Z = [x_1, …, x_n] with t(x_n) = w. It follows that p(n − 1) =
M(n)p(n) and
$$\tilde p_v(n-1) = h_v(n-1)\,p_v(n-1) = h_v(n-1)\sum_{w\in V_n} m_{v,w}(n)\,p_w(n) = \sum_{w\in V_n} h_v(n-1)\,m_{v,w}(n)\,\frac{\tilde p_w(n)}{h_w(n)} = \sum_{w\in V_n} k_{v,w}(n)\,\tilde p_w(n).$$
That is,
$$\tilde p(n-1) = K(n)\,\tilde p(n). \tag{6.8}$$
Finally, since h_w(n) = Σ_{v∈V_{n−1}} h_v(n − 1)m_{v,w}(n),
$$\sum_{v\in V_{n-1}} k_{v,w}(n) = \sum_{v\in V_{n-1}} m_{v,w}(n)\,\frac{h_v(n-1)}{h_w(n)} = \frac{1}{h_w(n)}\sum_{v\in V_{n-1}} h_v(n-1)\,m_{v,w}(n) = \frac{h_w(n)}{h_w(n)} = 1,$$
as claimed. □
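The column-sum identity of Lemma 6.32 is purely combinatorial, so it can be checked exactly with rational arithmetic. The sketch below computes the heights recursively from h_w(n) = Σ_v h_v(n − 1)m_{v,w}(n), builds K(n) from (6.7), and checks the column sums. The incidence matrices are a made-up (hypothetical) strictly positive example. Here matrices[n − 1] plays the role of M(n), with rows indexed by V_{n−1} and columns by V_n.

```python
from fractions import Fraction


def heights(matrices):
    """Heights h(0) = [1], h_w(n) = sum_v h_v(n-1) * m_{v,w}(n).

    matrices[n-1] is M(n): rows indexed by V_{n-1}, columns by V_n."""
    h = [[1]]
    for M in matrices:
        prev = h[-1]
        h.append([sum(prev[v] * M[v][w] for v in range(len(prev)))
                  for w in range(len(M[0]))])
    return h


def probabilistic_matrices(matrices):
    """K(n) with k_{v,w}(n) = m_{v,w}(n) h_v(n-1) / h_w(n), as in (6.7)."""
    h = heights(matrices)
    return [[[Fraction(M[v][w] * h[n - 1][v], h[n][w])
              for w in range(len(M[0]))]
             for v in range(len(M))]
            for n, M in enumerate(matrices, start=1)]


# Hypothetical strictly positive incidence matrices M(1), M(2), M(3)
# with #V_0 = 1, #V_1 = 2, #V_2 = 3, #V_3 = 2.
M = [
    [[1, 2]],
    [[1, 1, 2], [3, 1, 1]],
    [[2, 1], [1, 1], [1, 4]],
]
for K in probabilistic_matrices(M):
    # each column sum equals 1, as Lemma 6.32 asserts
    print([sum(row[w] for row in K) for w in range(len(K[0]))])
```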
Example 6.33. To illustrate this lemma, we repeat the telescoping of Ex-
ample 5.30 with probabilistic incidence matrices:
  12 1 3
K(1, 3) = 1 1 1 1 = 1 .
2 0 3

Figure 6.1. Telescoping a probabilistic Bratteli diagram.

The following result appears in, e.g., [80, Proposition 2.13].

Proposition 6.34. The number of ergodic invariant measures of a BV-system is bounded by its rank.

Proof. If μ is a probability measure on the path space of the BV-system,
then the probabilities p̃_v(n) = μ(⋃{[x_1…x_n] : t(x_n) = v}) for v ∈ V_n, n ∈ N,
determine the measure completely due to the Kolmogorov Extension Theorem.
Furthermore, these p̃(n)'s satisfy (6.8), and we can view this as an
infinite chain of linear maps, mapping cones R^{#V_n}_{≥0} to each other:
$$\mathbb{R}_{\ge 0} \xleftarrow{K(1)} \mathbb{R}^{\#V_1}_{\ge 0} \xleftarrow{K(2)} \mathbb{R}^{\#V_2}_{\ge 0} \xleftarrow{K(3)} \mathbb{R}^{\#V_3}_{\ge 0} \xleftarrow{K(4)} \cdots.$$
Let Σ_n = {x ∈ R^{#V_n}_{≥0} : Σ_{j=1}^{#V_n} x_j = 1} be the unit simplex in R^{#V_n}_{≥0}. The
ergodic measures correspond to the extremal points of the sets
$$S_n := \bigcap_{j\ge 1} K(n+1)\cdot K(n+2)\cdots K(n+j)(\Sigma_{n+j}).$$
If r is the rank of the BV-system, then #V_n = r infinitely often, and S_n can
have no more than r extremal points for every n. This proves that (X, τ)
cannot preserve more than r ergodic measures. □
Lemma 6.35. Let K(n) = (k_{v,w})_{v∈V_{n−1}, w∈V_n} be the probabilistic matrices of
a Bratteli diagram as defined in (6.7). Define
$$\rho_n := \max_{w,w'\in V_n} \frac{\max\{k_{v,w}/k_{v,w'} : v \in V_{n-1}\}}{\min\{k_{v,w}/k_{v,w'} : v \in V_{n-1}\}}$$
for n ≥ 1. If Σ_{n=1}^∞ 1/ρ_n = ∞, then the BV-system is uniquely ergodic.

Proof. This goes as the proof of Proposition 6.34, but now, because of
Lemma 8.61, there is a unique solution to (6.8) if Σ_n 1/ρ_n = ∞. Hence,
under this assumption, unique ergodicity follows. See also [121] and [80, Section 4]. □

This result gives another proof that minimal linearly recurrent shifts are
uniquely ergodic, because (after telescoping to make the transition
matrices M(n) ≥ 1 entrywise), the M(n) are still bounded, and the entries of K(n)
are bounded and bounded away from zero. Therefore sup_n ρ_n < ∞ and
Σ_{n=1}^∞ 1/ρ_n = ∞. In fact, we have a type of exponential mixing:

Lemma 6.36. Assume that the Bratteli diagram of a minimal linearly
recurrent (with constant L) shift is telescoped so that its transition matrices
are strictly positive. Then there exist C > 0 and β = β(L) ∈ (0, 1) such that
for every 0 ≤ k ≤ n and v ∈ V_{n−k}, w ∈ V_n,
$$\left|\frac{\mu([v]\cap[w])}{\mu([w])} - \mu([v])\right| \le C\beta^k.$$
Here [v] = {x ∈ X_BV : t(x_{n−k}) = v} and [w] = {x ∈ X_BV : t(x_n) = w}
are cylinder sets.

Proof. Let K = K(n − k + 1)⋯K(n) and let Σ_n denote the unit simplex
in R^{#V_n}_{≥0}. Then there is β ∈ (0, 1) (which can be derived from (8.29)) such
that diam(K(Σ_n)) ≤ Cβ^k diam(K). This means that for each v ∈ V_{n−k}, the
entries K_{v,w} are no more than Cβ^k apart from each other or from any of
their convex combinations. That is,
$$|\mu([v]\cap[w]) - \mu([v])\mu([w])| = \Big|K_{v,w}\,\mu([w]) - \sum_{w'\in V_n} K_{v,w'}\,\mu([w'])\,\mu([w])\Big| \le C\beta^k\mu([w]).$$
Now divide by μ([w]) to get the lemma. □
Example 6.37. Let F_0, F_1, F_2, F_3, F_4, … = 1, 1, 2, 3, 5, … be the Fibonacci
numbers. For the Fibonacci Bratteli-Vershik system of Figure 5.13 (i.e. the
diagram is stationary with M(1) = (1 1) and M(n) = $\begin{pmatrix}1&1\\1&0\end{pmatrix}$ for every n ≥ 2),
we find h_{v_n}(n) = F_n and h_{w_n}(n) = F_{n−1} for the vertex sets V_n = {v_n, w_n}.
Therefore
$$K(n) = \begin{pmatrix} F_{n-1}/F_n & 1 \\ F_{n-2}/F_n & 0 \end{pmatrix}.$$
After telescoping
$$K(n-1)K(n) = \begin{pmatrix} 2F_{n-2}/F_n & F_{n-2}/F_{n-1} \\ F_{n-3}/F_n & F_{n-3}/F_{n-1} \end{pmatrix},$$
we obtain strictly positive probabilistic incidence matrices. Now, for the
golden mean γ = (1 + √5)/2, we find
$$\left|\frac{2F_{n-2}}{F_n} - \frac{F_{n-2}}{F_{n-1}}\right| + \left|\frac{F_{n-3}}{F_n} - \frac{F_{n-3}}{F_{n-1}}\right| = \frac{2F_{n-2}F_{n-3}}{F_nF_{n-1}} \to 2\gamma^{-4} \ne 0$$
by Binet's formula (8.5). Lemma 6.35 shows that the Fibonacci Bratteli-Vershik
system is uniquely ergodic. Indeed, we can compute
$$\tilde\rho_n := \rho(K(2n-1)K(2n)) = \frac{2F_{2n-1}/F_{2n}}{F_{2n-1}/F_{2n}} = 2,$$
so Σ_n 1/ρ̃_n = ∞.
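The computations in Example 6.37 can be reproduced with exact fractions. The sketch below (helper names are illustrative) carries out four steps:

- it builds K(n) from the closed form above;
- it telescopes pairs K(n − 1)K(n);
- it evaluates ρ exactly as defined in Lemma 6.35;
- it adds up the absolute column differences, which should equal 2F_{n−2}F_{n−3}/(F_nF_{n−1}).

```python
from fractions import Fraction


def fib(n):
    """F_0 = F_1 = 1 and F_{n+1} = F_n + F_{n-1}, as in Example 6.37."""
    a, b = 1, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def K(n):
    """Probabilistic incidence matrix K(n), n >= 2, of the Fibonacci BV-system."""
    return [[Fraction(fib(n - 1), fib(n)), Fraction(1)],
            [Fraction(fib(n - 2), fib(n)), Fraction(0)]]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def rho(Km):
    """rho of Lemma 6.35; needs strictly positive entries."""
    best = Fraction(0)
    cols = range(len(Km[0]))
    for w in cols:
        for w2 in cols:
            if w != w2:
                ratios = [row[w] / row[w2] for row in Km]
                best = max(best, max(ratios) / min(ratios))
    return best


for n in range(3, 12):
    T = matmul(K(n - 1), K(n))                     # telescoped K(n-1)K(n)
    col_diff = sum(abs(row[0] - row[1]) for row in T)
    print(n, rho(T), float(col_diff))
```

The printed ρ is exactly 2 for every n. The column difference settles near 2γ⁻⁴ ≈ 0.2918 and does not tend to 0.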

Lemma 6.35 gives only a sufficient condition for unique ergodicity, not a necessary
one; see Example 6.37. An "if and only if" condition for unique
ergodicity is the following.
Theorem 6.38. A Bratteli-Vershik system (X_BV, τ) is uniquely ergodic if
and only if, after sufficient telescoping,
$$\lim_{n\to\infty}\ \max_{w\ne w'\in V_n} |k_{vw}(n) - k_{vw'}(n)| = 0.$$

This is [77, Theorem 3.1]¹³, and this paper also contains a description of
BV-systems with any fixed number of ergodic measures.
Example 6.39. Assume that we have a rank 2 simple Bratteli diagram with
V_n = {v_n, w_n} and incidence matrices
$$M(1) = (1, 1) \quad\text{and}\quad M(n) = \begin{pmatrix} a_n - 1 & 1 \\ 1 & a_n - 1 \end{pmatrix} \quad\text{for } a_n \ge 2,\ n \ge 2.$$
Then h_{v_n}(n) = h_{w_n}(n) = ∏_{j=2}^{n} a_j. This gives
$$K(n) = \begin{pmatrix} 1-\varepsilon_n & \varepsilon_n \\ \varepsilon_n & 1-\varepsilon_n \end{pmatrix}, \qquad \varepsilon_n = \frac{1}{a_n}, \quad n \ge 2.$$
We compute that ρ_n = (1 − ε_n)/ε_n for ρ_n as in Lemma 6.35. So if Σ_n ε_n = ∞ (hence
Σ_n 1/ρ_n = ∞), then the corresponding BV-system is uniquely ergodic.
Now assume that Σ_n ε_n < ∞. Telescope the matrices K(n) to
$$L(n,r) = (l_{vw}(n,r))_{v\in V_{n-1},\,w\in V_r} := K(n)\cdots K(r) = \begin{pmatrix} 1-\eta_{n,r} & \eta_{n,r} \\ \eta_{n,r} & 1-\eta_{n,r} \end{pmatrix}$$
for r ≥ n ≥ 2, with η_{n,n} = ε_n and η_{n,r+1} = η_{n,r} + ε_{r+1} − 2η_{n,r}ε_{r+1} <
η_{n,r} + ε_{r+1}. Since Σ_{n≥2} ε_n < ∞, we have lim_r η_{n,r} ≤ Σ_{r≥n} ε_r → 0 as
n → ∞. Therefore
$$\lim_{r\to\infty}\ \max_{w\ne w'\in V_r} |l_{vw}(n,r) - l_{vw'}(n,r)| \to 2 \quad\text{as } n\to\infty.$$
In other words, no telescoping will be enough to satisfy Theorem 6.38.
Therefore unique ergodicity fails. Instead, there will be two ergodic invariant measures
(not more, because the rank is 2), and what they look like depends on
the ordering on the Bratteli diagram; cf. [79, Lemma 2.7]. Further examples
of this sort can be found in [7, Section 3].
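The recursion for η_{n,r} in Example 6.39 can be rewritten as η_{n,r+1} − 1/2 = (η_{n,r} − 1/2)(1 − 2ε_{r+1}). So the behaviour of the telescoped matrices hinges on whether ∏(1 − 2ε_j) tends to 0, that is, on whether Σ ε_j diverges. The sketch below iterates the recursion with exact fractions for two hypothetical choices of a_j: one with Σ ε_j < ∞ and one with Σ ε_j = ∞.

```python
from fractions import Fraction


def eta(n, r, a):
    """eta_{n,r} from Example 6.39: eta_{n,n} = eps_n and
    eta_{n,j+1} = eta_{n,j} + eps_{j+1} - 2 eta_{n,j} eps_{j+1}, eps_j = 1/a(j)."""
    x = Fraction(1, a(n))
    for j in range(n + 1, r + 1):
        e = Fraction(1, a(j))
        x = x + e - 2 * x * e
    return x


def summable(j):
    return j * j        # a_j = j^2, so sum_j eps_j < infinity


def divergent(j):
    return j + 1        # a_j = j + 1, so sum_j eps_j = infinity


for n in (2, 5, 10, 20):
    print(n, float(eta(n, 200, summable)), float(eta(n, 200, divergent)))
```

In the summable case η_{n,r} stays below Σ_{j≥n} ε_j, so the two columns of L(n, r) stay far apart. In the divergent case η_{n,r} → 1/2, so the columns become equal, in line with Theorem 6.38.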

6.3.4. Unique Ergodicity and Enumeration Systems. Barat et al.
proved the following results on enumeration systems (see [49, Corollary 2
and Theorem 7] for proofs):
Theorem 6.40. Let (X_G, a) be the enumeration system based on the
enumeration scale (G_j)_{j≥0}.
(1) If Σ_{k≥1} 1/G_k < ∞, then X_G supports an a-invariant probability measure.
(2) Conversely, the existence of an a-invariant probability measure
implies that lim inf_k k/G_k = 0.
(3) If G_j − G_{j−1} → ∞ and the sequence (m Σ_{k≥m} 1/G_k)_{m≥1} is bounded,
then (X_G, a) is uniquely ergodic.
13 In [7, Proposition 3.1] the complete case of 2 × 2 matrices with the equal row sum property
is given.

A special case holds for unimodal maps.

Corollary 6.41. Let f be a unimodal map with kneading map Q. If k − Q(k)
is bounded, then f : ω(c) → ω(c) is uniquely ergodic.

Proof. The cutting times (S_k)_{k≥1} form an enumeration scale with S_k =
S_{k−1} + S_{Q(k)}. The boundedness of k − Q(k) implies that S_k → ∞
exponentially, and therefore condition (3) in Theorem 6.40 is satisfied. □

Unique ergodicity is not guaranteed for minimal critical ω-limit sets; see [121].
In fact, the collection of kneading maps is so rich that for every simplex Σ
with a compact totally disconnected set of extremal points, there is a
kneading map Q such that the corresponding system (ω(c), f) has its Choquet
simplex homeomorphic to Σ; see [168]. The invariant measures can
even form a Poulsen simplex.
Note that a : X_G → X_G need not be continuous, so the existence of an
a-invariant measure does not follow from the Krylov-Bogul’yubov Theorem
6.2. Theorem 6.40 shows, however, that unique ergodicity holds whenever
lim inf_j G_{j+1}/G_j > 1, hence also if e.g. G_j = 2G_{j−1} + 1 for all j ≥ 1, even
though in this case a is not continuous. As shown in [121, 168], there are
non-uniquely ergodic enumeration systems.
Example 6.42. Condition (1) in Theorem 6.40 is fairly strict. Having
k/G_k → 0 is not enough, as [49, Example 6] shows. Here we choose
J_n = Σ_{k=1}^{n} k! for n ∈ N, G_0 = 1, and
$$G_j = \begin{cases} (n+1)G_{j-1} + 1 & \text{if } j = J_n, \\ G_{j-1} + 1 & \text{otherwise.} \end{cases}$$
In this case a : XG → XG is discontinuous at (0) (because Gj − Gj−1 = 1
infinitely often). Also (m)0 = 0 for each GJn ≤ m ≤ GJn+1 , but one can
compute that (m)0 = 1 for a definite proportion of the integers GJn−1 <
m < GJn . As a result, the cylinder [1] has no well-defined visit frequency for
the a-orbit of (0), or in fact of any x ∈ X_G. Therefore a admits no invariant
probability measure.
6.3.5. Unique Ergodicity and Interval Exchange Transformations.
Recall the definition of interval exchange transformations with permutation π
on {1, …, d} and length vector λ = (λ_1, …, λ_d). Under the Keane condition,
interval exchange transformations are minimal. The next natural question,
whether they are uniquely ergodic (i.e. whether Lebesgue measure is the only
T-invariant probability measure), has a negative answer in general. The work of Veech
[541] contains counterexamples, but this was not realized as such, partly
because of the terminology he used (skew-product of rotations), except by
Keynes & Newton [358], who created an IET T with five pieces that is not
uniquely ergodic (in fact, it is an IET on three pieces with eigenvalue −1, and
therefore T² on five pieces allows two independent T-invariant functions).
Keane [348] found a class of examples with four pieces, and this is the
minimal possible number, because an IET with d pieces can have at most
d/2 ergodic probability measures; see [345,347,542]. Chaika & Masur [151]
gave an example of a six-piece IET with two ergodic measures and one extra
generic (but non-ergodic) measure.
We start with Keane’s counterexample; the technique of proving non-
unique ergodicity is similar to what is discussed in Section 6.3.3. This ex-
ample T has permutation π : {1, 2, 3, 4} → {3, 2, 4, 1} and lengths λi = |Δi |
satisfying λ3 > λ4 > λ1 ; see Figure 6.2.

m − 1 times n times

Δ1 Δ2 Δ3 Δ4
d c b a

ca b d

Δ4 Δ2 Δ1 Δ3

Figure 6.2. An IET and the first return map to D.

The first return map to Δ4 is again an IET with four pieces and
permutation π′ = π, provided we number the sub-pieces of Δ4 in reverse order.
Since λ4 > λ1, T maps the right part Δ′ of Δ4 into Δ2, and then T translates
T(Δ′) some m − 1 ∈ N times within Δ2 until it covers the right end-point
γ2 of Δ2. Thus there is a decomposition Δ′ = b ∪ a into intervals such
that γ2 is the common boundary point of T^m(b) and T^m(a). Since T|Δ3
is a translation over λ4 − λ1 = |Δ′|, T^{m+1}(b) is adjacent to the other side
of T^m(a). Now the adjacent intervals T^m(a), T^{m+1}(b) are mapped n times
within Δ3 (for some n ∈ N) before returning to Δ4. The left part Δ″ of
Δ4 is first mapped onto Δ1, then into Δ3, and after n − 1 more iterates it
covers γ3. Thus there is a decomposition Δ″ = d ∪ c into intervals such that
γ3 is the common boundary point of T^{n+1}(d) and T^{n+1}(c). The interval c
is now back in Δ4, and d needs one more iterate to return. As a whole, the
itineraries of the four sub-pieces of Δ4 before returning to Δ4 are described
by the substitution and associated matrix:
$$\chi: \begin{cases} 1 \to 4\,2^{m-1}\,3^n, \\ 2 \to 4\,2^m\,3^n, \\ 3 \to 4\,1\,3^{n-1}, \\ 4 \to 4\,1\,3^n, \end{cases} \qquad A = \begin{pmatrix} 0 & 0 & 1 & 1 \\ m-1 & m & 0 & 0 \\ n & n & n-1 & n \\ 1 & 1 & 1 & 1 \end{pmatrix}$$

Proposition 6.43. There is an IET as above that is not uniquely ergodic.

From the proof it is clear that there are actually uncountably many such
IETs. Whether Lebesgue measure is ergodic for any of them is still an open
question.

Proof. For k ∈ N, let A_k be the matrix associated to the substitution χ_k of
the k-th induction step. It was shown in [348, Theorem 1] that for any choice
of sequences (m_k)_{k∈N} and (n_k)_{k∈N} of integers appearing in the matrices (A_k)_{k∈N}, there
is an IET realizing these sequences.
Let f_k : Σ3 → Σ3, v ↦ vA_k/‖vA_k‖_1 be the map v ↦ vA_k normalized to
the 3-simplex Σ3. Let v_0 = (1/2)(0, 1, 1, 0) and w_0 = (0, 0, 1, 0) be two vectors
in Σ3. The goal is to show that for appropriate choices of the integers m_k
and n_k appearing in A_k, the differences satisfy
$$\|f_1\circ\cdots\circ f_k(v_0) - f_1\circ\cdots\circ f_k(w_0)\|_1 \ge \tfrac12 \quad\text{for all } k \in \mathbb{N}, \tag{6.9}$$
because then the convergence to limit frequencies of symbols in σ^j(ρ_T) is
not uniform in j. Oxtoby's Theorem 6.20 then shows that unique ergodicity fails.
Take ε ∈ (0, 1/8) arbitrary and N ∈ N such that 2/N < ε. Set n_k = N2^k,
m_k = 2n_k, and ε_k = ε2^{−k}. Note that ‖v_0 − w_0‖_1 = 1. Therefore (6.9) follows
immediately from the following claim:
$$\|f_k(v) - v_0\|_1,\ \|f_k(w) - w_0\|_1 \le \varepsilon_k$$
for all v, w ∈ Σ3 with ‖v − v_0‖_1, ‖w − w_0‖_1 < ε_{k+1}. To prove this, assume
that v = v_0 + η, where ‖η‖_1 < ε_{k+1}. Then
$$f_k(v) = \frac{1}{2n_k + O(1)}\Big(\tfrac12(1, 2n_k, 2n_k - 1, 2) + \big(\eta_3+\eta_4,\ 2n_k(\eta_1+\eta_2) - \eta_1,\ n_k(\eta_1+\eta_2+\eta_3+\eta_4) - \eta_3,\ \eta_1+\eta_2+\eta_3+\eta_4\big)\Big)$$
$$\qquad = v_0 + \Big(0,\ \eta_1+\eta_2,\ \tfrac12(\eta_1+\eta_2+\eta_3+\eta_4),\ 0\Big) + O\Big(\frac{1}{n_k}\Big).$$
Therefore, since 1/N < ε/2, ‖f_k(v) − v_0‖_1 ≤ (3/2)ε_{k+1} + O(n_k^{−1}) < ε_k. The
computation for w is analogous. This finishes the proof of the claim and of
the whole proposition. □
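The leading-order terms used in this proof can be checked directly. The sketch below builds Keane's substitution χ for given m and n and forms the matrix A whose column b counts the letters of χ(b); this is the layout of the displayed A. It then applies v ↦ Av/‖Av‖_1 with v treated as a column vector. That reading is an assumption of the sketch: it is the one under which v_0 produces the vector (1/2)(1, 2n_k, 2n_k − 1, 2) in the proof. The sketch only checks these leading terms for m = 2n and does not verify (6.9). The helper names are illustrative.

```python
def keane_substitution(m, n):
    """The substitution chi of Keane's example, on the letters '1'..'4'."""
    return {
        "1": "4" + "2" * (m - 1) + "3" * n,
        "2": "4" + "2" * m + "3" * n,
        "3": "4" + "1" + "3" * (n - 1),
        "4": "4" + "1" + "3" * n,
    }


def column_matrix(chi):
    """A[a][b] = number of letters a in chi(b): column b counts chi(b)."""
    letters = "1234"
    return [[chi[b].count(a) for b in letters] for a in letters]


def normalized_image(A, v):
    """v -> A v / ||A v||_1, with v treated as a column vector."""
    Av = [sum(A[i][j] * v[j] for j in range(4)) for i in range(4)]
    total = sum(Av)
    return [x / total for x in Av]


def l1(x, y):
    return sum(abs(a - b) for a, b in zip(x, y))


n = 1000
A = column_matrix(keane_substitution(2 * n, n))   # m = 2n as in the proof
v0 = [0, 0.5, 0.5, 0]
w0 = [0, 0, 1, 0]
print(l1(normalized_image(A, v0), v0))   # equals 6 / (4n + 2)
print(l1(normalized_image(A, w0), w0))   # equals 4 / (n + 1)
```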

Despite these examples, the prevalent case is that IETs are uniquely
ergodic. Keane & Rauzy showed in [351] that a residual set of the IETs in
⋃_{d≥2} Σ_{d−1} × S_d is uniquely ergodic. In [348], Keane stated the conjecture
that Lebesgue-a.e. IET is uniquely ergodic, and this was proven in separate
papers (but in the same issue of the Annals of Mathematics) by Veech [543]
and Masur [410].

Figure 6.3. Graphs for the Rauzy induction with d = 2 and d = 3. (Vertices: (12) for d = 2; (132), (13), (123) for d = 3; arrows labeled 0 and 1 by type.)

Rauzy induction constitutes a dynamical system on the parameter space
of the family of interval exchange transformations. Recall that for IETs
on d pieces, this parameter space is Σ_{d−1} × S_d, where Σ_{d−1} = {λ ∈ [0, 1]^d :
Σ_{i=1}^{d} λ_i = 1} is the (d − 1)-dimensional simplex and S_d is the group of
permutations of {1, …, d}. Each copy Σ_{d−1} × {π} is divided into two halves Σ^0_{d−1} × {π}
and Σ^1_{d−1} × {π} according to whether λ_d > λ_{π^{−1}(d)} (Type 0) or λ_d < λ_{π^{−1}(d)}
(Type 1). The case λ_d = λ_{π^{−1}(d)} fails the Keane condition. The Rauzy
induction Θ maps Σ^0_{d−1} × {π} diffeomorphically onto Σ_{d−1} × {π′} and Σ^1_{d−1} × {π}
diffeomorphically onto Σ_{d−1} × {π″} for some π′, π″ ∈ S_d (and π′ ≠ π″ if
d ≥ 3). As such, Θ is a 2-to-1 map, and it is schematically represented by
a graph with the permutations π ∈ S_d as vertices and arrows π → π′ and
π → π″; see Figure 6.3 for the cases d = 2 (circle rotations) and d = 3.
Reducible permutations are left out. For example, for d = 3 there are only three
vertices instead of #S_3 = 3! = 6, because the permutations (12), (23) and the
identity are reducible.
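A single Rauzy step can be sketched in code. The sketch omits the normalization, so Θ appears as an induced map on [0, 1 − min(λ_d, λ_{π^{−1}(d)})) rather than as a map on the simplex. The following conventions are assumptions made for this sketch:

- perm[i] is the (1-based) position of the image of interval i + 1, as in π : {1, 2, 3, 4} → {3, 2, 4, 1} of Keane's example;
- after each step, the intervals are relabeled in their left-to-right order.

The test compares the induced IET with the first return map computed by brute force, using exact rational lengths. With these conventions the d = 3 graph of Figure 6.3 is reproduced. Starting from (3, 2, 1) in one-line notation (the transposition (13)), type 0 leads to (2, 3, 1) = (123) and type 1 leads to (3, 1, 2) = (132).

```python
from fractions import Fraction


def iet(lengths, perm, x):
    """Evaluate the IET at x. perm[i] is the 1-based position of the image of
    interval i+1, which is translated to start at the total length of the
    intervals whose images come earlier."""
    left = 0
    for i, length in enumerate(lengths):
        if x < left + length:
            start = sum(ln for ln, p in zip(lengths, perm) if p < perm[i])
            return x - left + start
        left += length
    raise ValueError("x outside [0, sum(lengths))")


def rauzy_step(lengths, perm):
    """One unnormalized Rauzy step: induce on [0, 1 - min(lambda_d, lambda_e)),
    where e = pi^{-1}(d). Returns (new_lengths, new_perm, type)."""
    lengths = list(lengths)
    last = len(lengths) - 1
    e = perm.index(len(lengths))       # interval whose image is rightmost
    if lengths[last] > lengths[e]:     # type 0
        new_perm = [p + 1 if (p > perm[last] and i != e) else p
                    for i, p in enumerate(perm)]
        new_perm[e] = perm[last] + 1   # image of e now follows image of d
        lengths[last] -= lengths[e]
        return lengths, new_perm, 0
    if lengths[last] < lengths[e]:     # type 1
        # the right part of interval e (length lambda_d) becomes a new interval
        new_lengths = lengths[:e + 1] + [lengths[last]] + lengths[e + 1:last]
        new_lengths[e] -= lengths[last]
        new_perm = perm[:e + 1] + [perm[last]] + perm[e + 1:last]
        return new_lengths, new_perm, 1
    raise ValueError("lambda_d = lambda_{pi^{-1}(d)}: Keane condition fails")


def first_return(lengths, perm, r, x):
    """First return of x in [0, r) to [0, r) under the IET, by brute force."""
    y = iet(lengths, perm, x)
    while y >= r:
        y = iet(lengths, perm, y)
    return y


lam = [Fraction(1, 10), Fraction(3, 10), Fraction(7, 20), Fraction(1, 4)]
pi = [3, 2, 4, 1]     # pi(1), ..., pi(4) as in Keane's example
print(rauzy_step(lam, pi))
```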
The case d = 4 (see Figure 6.4) is the first case which shows that the
graph need not be connected but can fall apart into so-called Rauzy classes.
All permutations in the first Rauzy class in Figure 6.4 keep (at least two)
adjacent intervals adjacent, and therefore this class effectively describes IETs on
three intervals. On each Rauzy class, Θ is topologically transitive and in
fact topologically exact (i.e. locally eventually onto).

Figure 6.4. The graph for the Rauzy induction with d = 4. (Vertices: (1234), (1324), (14), (13)(24), (1432), (1423), (1342), (142), (143), (1243), (124), (134); arrows labeled 0 and 1 by type.)

The restriction of Θ to each half-simplex Σ^i_{d−1} × {π} is an expanding
diffeomorphism. This expansion is achieved by the normalization (i.e. division by
1 − λ_d or 1 − λ_{π^{−1}(d)}), and therefore it is not uniform. Indeed, Σ^0_{d−1} has
a hyperplane {λ_d = 0} and Σ^1_{d−1} has a hyperplane {λ_{π^{−1}(d)} = 0} of
neutral points, which can in fact contain fixed points. To overcome this lack
of expansion, we can accelerate Θ (i.e. take an induced map). This acceleration is called the
Zorich acceleration Z : Σ_{d−1} × S_d → Σ_{d−1} × S_d, defined as follows. Let
τ(λ) = min{n ≥ 1 : Θ^n changes type at λ} and Z(λ, π) = Θ^{τ(λ)}(λ, π).
There are countably many connected components of level sets of τ, and it
can be shown that Z is uniformly expanding on each of them. The following
theorem, after Veech [543] and Masur [410], is the main ingredient for the
proof of the Keane conjecture.

Theorem 6.44. Rauzy induction Θ preserves an infinite measure, equivalent
to (d − 1)-dimensional Lebesgue measure × counting measure, and it is
ergodic on each Rauzy class. Zorich acceleration preserves a finite measure
μ_Z, equivalent to (d − 1)-dimensional Lebesgue × counting measure, and it
is ergodic on each Rauzy class. Moreover, its density dμ_Z(λ)/dλ is a rational
function in λ, and dμ_Z(λ)/dλ is bounded and bounded away from 0.
Now the Keane conjecture can be stated as a corollary of the above.
Corollary 6.45. The Keane conjecture holds: Lebesgue-a.e. irreducible
interval exchange transformation is uniquely ergodic, with Lebesgue measure
as its only invariant probability measure.

Proof. Assume we are in some Rauzy class R; the proof for every Rauzy
class is the same. Rauzy induction Θ "removes" the rightmost interval of
length λ_d or λ_{π^{−1}(d)}, whichever is shorter. So by applying Θ repeatedly,
each interval j will eventually become rightmost, so some χ-image will have
j as second letter; see Table 4.3 in Section 4.4. Hence we can find an open
set U on which Z^N is continuous. The points of U visit all parts Σ_{d−1} × {π} in R
sufficiently often during these N iterates that the telescoping of the corresponding
substitutions χ_1 ∘ ⋯ ∘ χ_N has a strictly positive transition matrix A. Therefore
ρ(A) as computed in (8.29) is a fixed positive number, and consequently A is
a strict contraction in the Hilbert metric. By Birkhoff's Ergodic Theorem 6.13,
μ_Z-a.e. (λ, π) ∈ R visits U infinitely often, so we can apply Lemma 8.61 and
conclude that such (λ, π) corresponds to a uniquely ergodic IET. Since μ_Z is
equivalent to Lebesgue measure × counting measure, the Keane conjecture
follows. □
Remark 6.46. The collection of non-uniquely ergodic IETs of d pieces has
Lebesgue measure 0 in the (d − 1)-dimensional parameter space, but its
Hausdorff dimension is equal to d − 3/2 for d ≥ 4; see [37, 152]. Dimensions d = 2, 3
are too low to get any non-unique ergodicity other than via a rational
relation between the lengths of the pieces (for d = 2, i.e. circle rotations, this
means a rational rotation number), and therefore the Hausdorff dimension of
non-uniquely ergodic IETs is d − 2 in these cases.
Remark 6.47. Katok (quoted in [165]) showed that for every IET, uniquely
ergodic or not, Lebesgue measure is not mixing; see Section 6.7. Avila &
Forni [40] showed that typical IETs are weak mixing. This comes after the
result of Nogueira & Rudolph [432] that generic IETs have no continuous
eigenfunctions apart from constant functions, and results by Sinaı̆ & Ulci-
grai [514] showing that IETs for which the Rauzy induction has a certain
type of periodicity are weak mixing. Conditions ensuring that IETs have a
continuous spectrum and satisfy Sarnak’s conjecture were given in [131] and
[342], respectively.

6.4. Measure-Theoretic Entropy

Every T -invariant probability measure μ has its own entropy, called Kol-
mogorov entropy, measure-theoretic entropy, or (a misnomer) metric
entropy. It is denoted as hμ (T ).
Both topological and measure-theoretic entropy are measures for the
complexity of a dynamical system (X, T), but h_μ(T) also plays a role in
information theory under the name of Shannon entropy. The topological
entropy from Section 2.4 gives the exponential growth rate of the cardinality
of¹⁴ P_n = ⋁_{k=0}^{n−1} T^{−k}P for some natural partition P of the space X. In this
section, instead of just counting P_n, we take a particular weighted sum of
the elements Z_n ∈ P_n. If the mass of μ is equally distributed (as much as
possible) over all Z_n ∈ P_n, then the outcome of this sum is largest; μ would
then be the measure of maximal entropy.
The weighting function is
ϕ : [0, 1] → R, ϕ(x) = −x log x,
with ϕ(0) := lim_{x↓0} ϕ(x) = 0. Clearly ϕ′(x) = −(1 + log x), so ϕ(x) assumes
its maximum at 1/e and ϕ(1/e) = 1/e. Also ϕ″(x) = −1/x < 0, so ϕ is
strictly concave.
Given a finite partition P of a probability space (X, μ), let
$$H_\mu(\mathcal P) = \sum_{P\in\mathcal P} \phi(\mu(P)) = -\sum_{P\in\mathcal P} \mu(P)\log\mu(P), \tag{6.10}$$
where we can ignore the partition elements with μ(P) = 0 because ϕ(0) = 0.
For a T-invariant probability measure μ on (X, B, T) and a partition P,
define the entropy of μ w.r.t. P as
$$h_\mu(T,\mathcal P) = \lim_{n\to\infty} \frac{1}{n} H_\mu\Big(\bigvee_{k=0}^{n-1} T^{-k}\mathcal P\Big). \tag{6.11}$$
This limit exists by Fekete's Lemma 1.15 and the fact that n ↦ H_μ(⋁_{k=0}^{n−1} T^{−k}P)
is subadditive; see [551, Corollary 4.9.1]. Finally, the measure-theoretic
entropy of μ is
$$h_\mu(T) = \sup\{h_\mu(T,\mathcal P) : \mathcal P \text{ is a finite partition of } X\}. \tag{6.12}$$

The next theorem is the key to actually computing entropy, as it shows that a single well-chosen partition $\mathcal{P}$ suffices to compute the entropy as $h_\mu(T) = h_\mu(T, \mathcal{P})$.

14 The join $\mathcal{P} \vee \mathcal{Q} := \{P \cap Q : P \in \mathcal{P},\ Q \in \mathcal{Q}\}$. The expression here is an n-fold join.


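Theorem 6.48 (Kolmogorov-Sinaı̆). Let (X, B, T, μ) be a measure-preserving dynamical system. If the partition $\mathcal{P}$ is such that

$$\begin{cases} \bigvee_{k=0}^{\infty} T^{-k}\mathcal{P} \text{ generates } \mathcal{B} & \text{if } T \text{ is non-invertible},\\ \bigvee_{k=-\infty}^{\infty} T^{-k}\mathcal{P} \text{ generates } \mathcal{B} & \text{if } T \text{ is invertible}, \end{cases}$$

then $h_\mu(T) = h_\mu(T, \mathcal{P})$.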

We haven't explained properly what “generates B” means, but the idea to have in mind is that (up to measure 0) every two points in X should lie in different elements of $\bigvee_{k=0}^{n-1} T^{-k}\mathcal{P}$ (if T is non-invertible) or of $\bigvee_{k=-n}^{n-1} T^{-k}\mathcal{P}$ (if T is invertible) for some sufficiently large n.
Example 6.49. Recall from Definition 1.32 that Bernoulli shifts are the full shifts $A^{\mathbb{N}_0}$ or $A^{\mathbb{Z}}$ (where we take A = {1, . . . , N}) equipped with a stationary product measure $\mu_p$, depending on a fixed probability vector $p = (p_1, \dots, p_N)$. The partition into cylinder sets [a], a ∈ A, is generating, and the entropy can be computed to be ([551, Theorem 4.26])

$$h_{\mu_p}(\sigma) = -\sum_{i=1}^{N} p_i \log p_i. \tag{6.13}$$
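For a Bernoulli measure the n-cylinders have product masses, so $H_{\mu_p}(\bigvee_{k=0}^{n-1}\sigma^{-k}\mathcal{P}) = n \cdot (-\sum_i p_i \log p_i)$ and the limit in (6.11) is already attained at every n. The brute-force sketch below checks this numerically; it labels the symbols $0, \dots, N-1$ instead of $1, \dots, N$, and the probability vector is an arbitrary choice.

```python
import itertools
import math


def block_entropy(p, n):
    """H_mu of the join of sigma^{-k}P, k = 0..n-1, for the p-Bernoulli measure.

    P is the partition into 1-cylinders [a]; the join consists of the n-cylinders
    [a_0 ... a_{n-1}], whose mass is the product p[a_0] * ... * p[a_{n-1}].
    """
    total = 0.0
    for word in itertools.product(range(len(p)), repeat=n):
        mass = math.prod(p[a] for a in word)
        if mass > 0:
            total -= mass * math.log(mass)
    return total


p = [0.5, 0.3, 0.2]                          # an arbitrary probability vector
rhs = -sum(q * math.log(q) for q in p)       # right-hand side of (6.13)
for n in (1, 2, 4, 6):
    print(n, block_entropy(p, n) / n, rhs)   # the two columns agree for every n
```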


The existence of a finite generating partition is guaranteed by a theorem due to Krieger [372].
Theorem 6.50. Let (X, B, μ) be a Lebesgue space¹⁵. If T is an invertible measure-preserving transformation of finite entropy, then there is a finite generator $\mathcal{P} = \{P_1, \dots, P_N\}$ and $e^{h_\mu(T)} \le N \le e^{h_\mu(T)} + 1$.
Example 6.51. An example where a likely candidate partition is not generating comes from the doubling map $T_2 : S^1 \to S^1$, $T_2(x) = 2x \bmod 1$, preserving Lebesgue measure μ. The partition $\mathcal{P} = \{[0, \frac12), [\frac12, 1)\}$ separates each pair of points, because if x ≠ y, say $2^{-(n+1)} < |x - y| \le 2^{-n}$, then there is k ≤ n such that $T_2^k x$ and $T_2^k y$ belong to different partition elements. However, $\mathcal{Q} = \{[\frac14, \frac34), [0, \frac14) \cup [\frac34, 1)\}$ does not separate points. Indeed, if y = 1 − x, then $T_2^k(y) = 1 - T_2^k(x)$ for all k ≥ 0, so x and y belong to the same partition element, and $T_2^k(y)$ and $T_2^k(x)$ also belong to the same partition element.
The partitions $\mathcal{P}$ and $\mathcal{Q}$ are special cases of the family of partitions $\{J_0^b, J_1^b\}$ in Example 1.43. The partition $\mathcal{P}$ can be used to compute $h_\mu(T_2)$, while $\mathcal{Q}$ in principle cannot (although here, for every Bernoulli measure $\mu = \mu_{(p, 1-p)}$, we have $h_\mu(T_2) = h_\mu(T_2, \mathcal{P}) = h_\mu(T_2, \mathcal{Q})$).
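The contrast between $\mathcal{P}$ and $\mathcal{Q}$ can be checked with exact rational arithmetic. In this sketch (helper names are illustrative), $x = 1/7$ and $y = 1 - x$ get different $\mathcal{P}$-itineraries but identical $\mathcal{Q}$-itineraries; rationals with odd denominators are used so that no orbit point lands on a boundary point $\frac14, \frac12, \frac34$, where the half-open convention would matter.

```python
from fractions import Fraction

HALF, QUARTER, THREE_QUARTERS = Fraction(1, 2), Fraction(1, 4), Fraction(3, 4)


def doubling(x):
    """Doubling map T_2(x) = 2x mod 1, computed exactly on rationals."""
    return (2 * x) % 1


def label_P(x):
    """Index of the element of P = {[0, 1/2), [1/2, 1)} containing x."""
    return 0 if x < HALF else 1


def label_Q(x):
    """Index of the element of Q = {[1/4, 3/4), [0, 1/4) u [3/4, 1)} containing x."""
    return 0 if QUARTER <= x < THREE_QUARTERS else 1


def itinerary(x, label, n):
    """Labels of x, T_2 x, ..., T_2^{n-1} x."""
    out = []
    for _ in range(n):
        out.append(label(x))
        x = doubling(x)
    return out


x = Fraction(1, 7)
y = 1 - x
print(itinerary(x, label_P, 6), itinerary(y, label_P, 6))  # [0, 0, 1, 0, 0, 1] [1, 1, 0, 1, 1, 0]
print(itinerary(x, label_Q, 6), itinerary(y, label_Q, 6))  # [1, 0, 0, 1, 0, 0] twice
```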

15 That is, (X, B, μ) is isomorphic to ([0, 1], Leb) ⊔ (countable set with counting measure).

For piecewise differentiable maps T : I → I on the interval that preserve a measure μ equivalent to Lebesgue measure, there is the Rokhlin formula to find the entropy:

$$h_\mu(T) = \int_0^1 \log |T'| \, d\mu. \tag{6.14}$$

See [475], [549, Theorem 9.7.3], and [390] for proofs.¹⁶ The entropy in this case is positive, provided T is non-invertible in the sense that $\mu(\{x \in [0, 1] : \#T^{-1}(x) \ge 2\}) > 0$. This follows from the following general result.
Proposition 6.52. Let (X, T ) be a transformation that preserves an ergodic
probability measure μ. If there are two disjoint sets A, B such that T (A) =
T (B) are measurable and μ(A), μ(B) > 0, then μ has positive entropy.

Proof. Construct the factor system $Y \subset \{A, B, C\}^{\mathbb{N}_0}$, where $C = X \setminus (A \cup B)$, with the itinerary map as factor map. This factor system is non-invertible, but such non-invertible symbolic systems have positive entropy; see [209, Fact 2.3.12]. It follows that (X, T) has positive entropy as well. □
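A quick numerical illustration of the Rokhlin formula (6.14), as a sketch: take the piecewise linear map of Example 6.55, whose i-th branch has length $p_i$ and slope $1/p_i$ and which preserves Lebesgue measure. Then $\int_0^1 \log|T'|\,d\mathrm{Leb} = -\sum_i p_i \log p_i$, the entropy (6.13) of the $p$-Bernoulli shift to which it is isomorphic. The probability vector and the midpoint-rule discretization below are illustrative choices, and the helper names are hypothetical.

```python
import bisect
import itertools
import math


def make_slope(p):
    """|T'(x)| for the piecewise linear map whose i-th branch is an interval of
    length p[i] mapped linearly onto [0, 1), so |T'| = 1/p[i] there."""
    right_ends = list(itertools.accumulate(p))   # right endpoints of the branches

    def slope(x):
        i = bisect.bisect_right(right_ends, x)
        return 1 / p[min(i, len(p) - 1)]         # clamp guards against round-off near x = 1

    return slope


def rokhlin_integral(p, m=100_000):
    """Midpoint-rule approximation of the integral of log|T'| against Lebesgue measure."""
    slope = make_slope(p)
    return sum(math.log(slope((k + 0.5) / m)) for k in range(m)) / m


p = [0.5, 0.3, 0.2]
print(rokhlin_integral(p), -sum(q * math.log(q) for q in p))   # both approximately 1.0297
```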

6.5. Isomorphic Systems

An isomorphism is the measure-theoretic equivalent of conjugacy. It is both
a stronger notion than conjugacy (since it requires the preservation of mea-
sures) and a weaker notion (isomorphisms need not be homeomorphisms or
continuous or even defined everywhere). In particular, the phase spaces of
isomorphic dynamical systems need not be homeomorphic; see e.g. Theo-
rem 6.56.
Definition 6.53. Two measure-preserving dynamical systems (X, B, T, μ) and (Y, C, S, ν) are called isomorphic if there are X′ ∈ B, Y′ ∈ C, and φ : Y′ → X′ such that
• μ(X′) = 1, ν(Y′) = 1;
• φ : Y′ → X′ is a bi-measurable bijection;
• φ is measure preserving: ν(φ⁻¹(B)) = μ(B) for all B ∈ B;
• φ ∘ S = T ∘ φ.
Example 6.54. The doubling map $T_2 : [0, 1] \to [0, 1]$ with Lebesgue measure is isomorphic to the one-sided $(\frac12, \frac12)$-Bernoulli shift (X, B, σ, μ), via the coding map i : Y′ → X′, where Y′ = [0, 1] \ {dyadic rationals}; the dyadic rationals are removed because they map to ½ under some iterate of $T_2$, and at ½ the coding map is not well-defined. Note that $X' = \{0, 1\}^{\mathbb{N}} \setminus \{v10^\infty, v01^\infty : v \in \{0, 1\}^*\}$.
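On $Y'$ the coding map sends x to its sequence of binary digits, and it conjugates $T_2$ to the left shift. The sketch below (exact rationals; function names are illustrative) checks both facts for a few rational points. At a dyadic rational the half-open convention silently selects the expansion ending in $0^\infty$, which is one way to see why the text removes these points.

```python
from fractions import Fraction


def coding(x, n):
    """First n symbols of the itinerary of x under T_2 w.r.t. {[0, 1/2), [1/2, 1)}."""
    symbols = []
    for _ in range(n):
        symbols.append(1 if x >= Fraction(1, 2) else 0)
        x = (2 * x) % 1
    return symbols


def binary_digits(x, n):
    """First n digits d_1 d_2 ... of the binary expansion x = sum_k d_k 2^{-k}."""
    return [int(x * 2 ** k) % 2 for k in range(1, n + 1)]


x = Fraction(5, 13)
print(coding(x, 12))
print(coding((2 * x) % 1, 11) == coding(x, 12)[1:])   # T_2 corresponds to the left shift: True
```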
16 Note that the Rokhlin formula can fail if T has infinitely many branches; see Example 6.71 and [127]. The matrix A in that example is the transition matrix of a countably piecewise Markov map T : [0, 1] → [0, 1] such that the slope |T′| > 4 wherever defined. Yet the entropy is $h_\mu(T) < h_{top}(T) = \log 4$. Similar examples can be found in [94, 95, 421].

Example 6.55. Let $p = (p_1, \dots, p_N)$ be some probability vector with all $p_i > 0$. The one-sided p-Bernoulli shift is isomorphic to ([0, 1], B, T, Leb) where T : [0, 1] → [0, 1] has N linear branches of slope $1/p_i$. The one-sided p-Bernoulli shift is also isomorphic to ([0, 1], B, S, ν) where $S(x) = Nx \bmod 1$. But here ν is another measure that gives $[\frac{i-1}{N}, \frac{i}{N}]$ the mass $p_i$ and $[\frac{i-1}{N} + \frac{j-1}{N^2}, \frac{i-1}{N} + \frac{j}{N^2}]$ the mass $p_i p_j$, etc.
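Under the first isomorphism of Example 6.55, the cylinder $[a_0 \dots a_{n-1}]$ corresponds to an interval of Lebesgue measure $p_{a_0} \cdots p_{a_{n-1}}$. The sketch computes these intervals exactly by composing inverse branches; it assumes, for illustration only, that the branches are increasing and placed left to right in the order $0, \dots, N-1$ (symbols relabeled from 0), which the text does not specify.

```python
from fractions import Fraction
from itertools import accumulate


def cylinder_interval(word, p):
    """Interval of points whose T-itinerary begins with `word` (symbols 0, ..., N-1).

    Assumes the i-th branch maps [c_i, c_i + p_i) increasingly and linearly onto
    [0, 1), so its inverse branch is y -> c_i + p_i * y.  Applying the inverse
    branches of word[-1], ..., word[0] to [0, 1] yields the cylinder.
    """
    c = [Fraction(0)] + list(accumulate(p))[:-1]    # left endpoints c_i of the branches
    lo, hi = Fraction(0), Fraction(1)
    for a in reversed(word):
        lo, hi = c[a] + p[a] * lo, c[a] + p[a] * hi
    return lo, hi


p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
lo, hi = cylinder_interval([0, 2, 1], p)
print(lo, hi, hi - lo)    # 11/24 35/72 1/36, and 1/36 = p_0 * p_2 * p_1
```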
Theorem 6.56. A Sturmian shift with rotation number $\alpha \notin \mathbb{Q}$ (with its unique invariant probability measure) is isomorphic to $(S^1, R_\alpha, \mu)$.

Proof. Lebesgue measure μ is the only $R_\alpha$-invariant probability measure. The itinerary map i : S¹ → X is a bijection from S¹ \ orb(0) onto its image. Since μ(orb(0)) = 0, i serves as isomorphism φ, and ν(A) := μ(φ⁻¹(A)) is automatically σ-invariant. By unique ergodicity of (X, σ), ν is indeed the only σ-invariant measure on X. □
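The isomorphism of Theorem 6.56 can be observed numerically: the itinerary of a point under an irrational rotation is a Sturmian sequence, with exactly k + 1 distinct subwords of each length k, and its symbol frequencies equal the Lebesgue measures of the coding intervals. The coding partition used below is one common convention and is an assumption here; floating-point rotation is adequate only for short orbits such as this one.

```python
import math


def rotation_itinerary(x, alpha, n):
    """Itinerary of x under R_alpha(x) = x + alpha mod 1, coding [0, 1 - alpha) by 0
    and [1 - alpha, 1) by 1 (one common convention; others shift or mirror it)."""
    word = []
    for _ in range(n):
        word.append(0 if x < 1 - alpha else 1)
        x = (x + alpha) % 1
    return word


def factor_count(word, k):
    """Number of distinct subwords of length k occurring in a finite word."""
    return len({tuple(word[i:i + k]) for i in range(len(word) - k + 1)})


alpha = (math.sqrt(5) - 1) / 2          # an irrational rotation number
w = rotation_itinerary(0.1, alpha, 5000)
print([factor_count(w, k) for k in range(1, 8)])   # Sturmian complexity: k + 1 subwords
print(sum(w) / len(w), alpha)                      # frequency of 1 = Leb([1 - alpha, 1))
```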

Clearly, invertible systems cannot be isomorphic to non-invertible systems, i.e. systems where μ({x : #T⁻¹(x) ≥ 2}) > 0. But there is a construction to make a non-invertible system invertible, namely the natural extension.
Definition 6.57. Let (X, B, μ, T) be a measure-preserving dynamical system. A system (Y, C, S, ν) is a natural extension of (X, B, μ, T) if there are X′ ∈ B, Y′ ∈ C, and φ : Y′ → X′ such that
• μ(X′) = 1, ν(Y′) = 1;
• S : Y′ → Y′ is invertible;
• φ : Y′ → X′ is a measurable surjection;
• φ is measure preserving: ν(φ⁻¹(B)) = μ(B) for all B ∈ B;
• φ ∘ S = T ∘ φ.

Any two natural extensions are isomorphic, see [456, page 13], so it makes sense to speak of the natural extension. Sometimes natural extensions have explicit formulas, e.g. the baker transformation

$$b : [0, 1]^2 \to [0, 1]^2, \qquad b(x, y) = \begin{cases} (2x, \frac{y}{2}) & \text{if } x \le \frac12,\\ (2x - 1, \frac{1+y}{2}) & \text{if } x > \frac12, \end{cases}$$

is the natural extension of the doubling map $T_2(x) = 2x \bmod 1$. There is also a general construction: set
$$Y = \{(x_i)_{i \ge 0} : T(x_{i+1}) = x_i \in X \text{ for all } i \ge 0\}$$
with $S(x_0, x_1, \dots) = (T(x_0), x_0, x_1, \dots)$. Then S is invertible (with the left shift $\sigma = S^{-1}$) and
$$\nu(A_0, A_1, A_2, \dots) = \inf_i \mu(A_i) \quad \text{for } (A_0, A_1, A_2, \dots) \subset Y$$
is S-invariant. The surjection $\varphi(x_0, x_1, x_2, \dots) := x_0$ makes the diagram commute: $T \circ \varphi = \varphi \circ S$. Also φ is measure preserving because, for each A ∈ B,
$$\varphi^{-1}(A) = (A, T^{-1}(A), T^{-2}(A), T^{-3}(A), \dots)$$
and clearly $\nu(A, T^{-1}(A), T^{-2}(A), T^{-3}(A), \dots) = \mu(A)$ because $\mu(T^{-i}(A)) = \mu(A)$ for every i by T-invariance of μ.
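The claim that the baker transformation is the natural extension of $T_2$, with factor map $\varphi(x, y) = x$, can be checked on sample points. The sketch uses exact rationals chosen off the null set where b or its inverse is ambiguous; the function names are illustrative. Both pieces of b have Jacobian determinant $2 \cdot \frac12 = 1$, so b also preserves two-dimensional Lebesgue measure.

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def doubling(x):
    """Doubling map T_2(x) = 2x mod 1."""
    return (2 * x) % 1


def baker(x, y):
    """Baker transformation b(x, y) on [0, 1]^2."""
    if x <= HALF:
        return 2 * x, y / 2
    return 2 * x - 1, (1 + y) / 2


def baker_inverse(x, y):
    """Inverse of b (off a null set): y < 1/2 records that the preimage had x <= 1/2."""
    if y < HALF:
        return x / 2, 2 * y
    return (1 + x) / 2, 2 * y - 1


points = [(Fraction(k, 7), Fraction(l, 11)) for k in range(1, 7) for l in range(1, 11)]
print(all(baker(*q)[0] == doubling(q[0]) for q in points))    # projection: phi o b = T_2 o phi
print(all(baker_inverse(*baker(*q)) == q for q in points))    # b is invertible
```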

6.5.1. The Bernoulli Property and Ornstein’s Theorem.

Definition 6.58. Let (X, B, μ, T) be a measure-preserving dynamical system.
(1) If T is invertible, then the system is called Bernoulli if it is iso-
morphic to a two-sided Bernoulli shift.
(2) If T is non-invertible, then the system is called one-sided Bernoulli
if it is isomorphic to a one-sided Bernoulli shift.
(3) If T is non-invertible, then the system is called Bernoulli if its
natural extension is isomorphic to a two-sided Bernoulli shift.

The third Bernoulli property is quite general (for example, one-sided SFTs and similar shifts with natural measures have this property; see e.g. [158]), even though the isomorphism φ may be very difficult to find explicitly.
Expanding circle maps that are sufficiently smooth are also Bernoulli, i.e.
have a Bernoulli natural extension; see [390]. Being one-sided Bernoulli,
on the other hand, is quite special. If T : [0, 1] → [0, 1] has N linear surjec-
tive branches Ii , i = 1, . . . , N , then Lebesgue measure m is invariant, and
([0, 1], B, m, T ) is isomorphic to the one-sided Bernoulli system with prob-
ability vector (|I1 |, . . . , |IN |); see Example 6.55. If T is piecewise C 2 but
not piecewise linear, then it has to be C 2 -conjugate to a piecewise linear
expanding map to be one-sided Bernoulli; see [123].
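Entropy is preserved under isomorphisms; this follows directly from the definition, because an isomorphism carries finite partitions to finite partitions (up to measure zero), preserves the measures of their elements, and intertwines the dynamics. The converse question, namely whether systems with the same entropy are isomorphic, was solved for two-sided Bernoulli shifts by Ornstein [438] in 1974 (cf. [551, Theorem 4.28] and [479, Chapter 7]).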
Theorem 6.59 (Ornstein’s Theorem). Two two-sided Bernoulli shifts are
isomorphic if and only if they have the same entropy.

This is a remarkable result; e.g. it implies that the $(\frac14, \frac14, \frac14, \frac14)$-Bernoulli shift and the $(\frac12, \frac18, \frac18, \frac18, \frac18)$-Bernoulli shift are isomorphic, although the first is on four and the second on five symbols.
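With (6.13), the two entropies in this example are computed as follows:

```latex
\begin{aligned}
h_{\mu_{(1/4,1/4,1/4,1/4)}}(\sigma) &= -4 \cdot \tfrac14 \log \tfrac14 = \log 4, \\
h_{\mu_{(1/2,1/8,1/8,1/8,1/8)}}(\sigma) &= -\tfrac12 \log \tfrac12 - 4 \cdot \tfrac18 \log \tfrac18
  = \tfrac12 \log 2 + \tfrac32 \log 2 = 2 \log 2 = \log 4.
\end{aligned}
```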
Remark 6.60. Ornstein’s Theorem is usually stated for Bernoulli shifts,
but it holds for two-sided SFTs with stationary Markov measures as well,
see Definition 6.65 below, because these systems are isomorphic to Bernoulli
shifts [261]. Ornstein’s Theorem also holds for infinite alphabet shifts; see
[209]. A short and elegant proof was given by Downarowicz & Serafin [216].
Ornstein’s Theorem strengthened a result by Sinaı̆ [511] from 1962:
Theorem 6.61 (Sinaı̆’s Theorem). Every ergodic measure-preserving trans-
formation (X, B, T, μ) with entropy hμ (T ) has every p-Bernoulli shift with
hμp (σ) ≤ hμ (T ) as measure-theoretic factor.
Sinaı̆'s Theorem says, for example, that if two Bernoulli shifts (with probability vectors p and p′) have the same entropy, then there are measure-preserving factor maps ψ and ψ′ from the one to the other and vice versa. But this leaves unanswered whether ψ′ = ψ⁻¹. Ornstein's Theorem settles this in the affirmative.
We stress that (unlike Sinaı̆'s Theorem) Ornstein's Theorem holds only for two-sided shifts, because in the one-sided shift setting the number of preimages is, almost surely, preserved under isomorphisms. Walters [550] showed that the one-sided $(p_1, \dots, p_m)$-Bernoulli shift is isomorphic to the $(p'_1, \dots, p'_n)$-Bernoulli shift if and only if m = n and $(p_1, \dots, p_m)$ is a permutation of $(p'_1, \dots, p'_n)$.
The isomorphism for the two-sided setting is very complicated and has
nothing to do with sliding block codes (no continuity is required). The
proof of the existence of the isomorphism by Ornstein is not constructive,
but in 1979, Keane & Smorodinsky [352] (sketched also in [456]) gave a
constructive proof showing that the isomorphism can be made finitary.
Definition 6.62. A factor map ψ : (X, μ) → (Y, ν) is called finitary if one
of the following equivalent properties holds:
• ψ is continuous μ-a.e.
• For μ-a.e. x ∈ X, there is N = N (x) such that the zeroth entry of
ψ(x) depends only on $[x_{-N}, \dots, x_N]$. In this sense, a finitary factor
map is a sliding block code with window size depending on x.
If ψ is invertible ν-a.e. and ψ −1 satisfies the above two properties, then ψ is
a finitary isomorphism.

6.6. Measures of Maximal Entropy

There is an important connection between topological and measure-theoretic
entropy (see Section 6.4); namely the former is the supremum over the latter.
Theorem 6.63 (Variational Principle). If (X, T ) is a continuous dynamical
system on a compact space, then
(6.15) htop (T ) = sup{hμ (T ) : μ is a T -invariant probability measure}.

Therefore it makes sense to define:

Definition 6.64. A T -invariant probability measure μ is called a measure
of maximal entropy if hμ (T ) = htop (T ).

For β-transformations and tent maps (or, more generally, every interval map T of constant slope s > 1, so that htop(T) = log s), the absolutely continuous (w.r.t. Lebesgue measure) invariant probability measures are also the measures of maximal entropy. This follows from the Rokhlin formula (6.14). See Remark 3.70 and Example 3.92 for precise formulas for these measures.
Full shifts on A = {0, . . . , d − 1} have a unique measure of maximal entropy, namely the $(\frac1d, \dots, \frac1d)$-Bernoulli measure. Markov measures are a generalization of Bernoulli measures to SFTs. For such measures, the probability of $x_k$ depends on the value of $x_{k-1}$ but not on the further past $\dots x_{k-3}, x_{k-2}$.
Definition 6.65. Let A = {0, . . . , d − 1} be our alphabet. Define a d × d probability transition matrix $P = (p_{ij})_{i,j=0}^{d-1}$ where all rows are probability vectors. Let $\pi \in \mathbb{R}^d$ be a probability row-vector. The measure defined on cylinders as
$$\mu_\pi([x_0 \dots x_n]) = \pi_{x_0} p_{x_0 x_1} p_{x_1 x_2} \cdots p_{x_{n-1} x_n}$$
and extended to the Borel σ-algebra B of $A^{\mathbb{N}_0}$ by the Kolmogorov Extension Theorem is called a Markov measure. It is shift-invariant provided π is stationary, i.e. πP = π, and (provided P is irreducible) ergodic.

The following result follows directly from the Perron-Frobenius Theorem.

Theorem 6.66. If P is a primitive d × d probability matrix, then there is a unique stationary probability row-vector $p = (p_0, \dots, p_{d-1})$ with the following equivalent properties:
(i) p is the probability left-eigenvector of P w.r.t. eigenvalue 1.
(ii) $P^n \to \begin{pmatrix} p_0 & \dots & p_{d-1} \\ \vdots & & \vdots \\ p_0 & \dots & p_{d-1} \end{pmatrix}$ as n → ∞.
(iii) $xP^n \to p$ as n → ∞ for every probability vector $x \in \mathbb{R}^d$.
The convergence in items (ii) and (iii) is exponential. Moreover, the expected first return time is $E(\tau_i \mid x_0 = i) = 1/p_i$, where $\tau_i(x) = \min\{n > 0 : x_n = i\}$.
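For a concrete primitive P, the sketch below computes p via item (iii) and checks the return-time formula $E(\tau_i \mid x_0 = i) = 1/p_i$ by solving the first-passage equations of the chain with a fixed-point iteration (a standard technique, not the argument of the text). The matrix is an arbitrary example; by detailed balance its stationary vector is $(\frac14, \frac12, \frac14)$. Function names are illustrative.

```python
def stationary_vector(P, steps=500):
    """Approximate the stationary row vector p of a primitive stochastic matrix P
    by iterating x -> xP from the uniform vector, as in item (iii)."""
    d = len(P)
    x = [1 / d] * d
    for _ in range(steps):
        x = [sum(x[i] * P[i][j] for i in range(d)) for j in range(d)]
    return x


def mean_return_time(P, i, steps=5000):
    """E(tau_i | x_0 = i), via fixed-point iteration of the first-passage equations
    m_j = 1 + sum_{k != i} P[j][k] * m_k (m_j = expected time to reach i from j)."""
    d = len(P)
    m = [0.0] * d
    for _ in range(steps):
        m = [1 + sum(P[j][k] * m[k] for k in range(d) if k != i) for j in range(d)]
    return m[i]


P = [[0.5, 0.5, 0.0],
     [0.25, 0.5, 0.25],
     [0.0, 0.5, 0.5]]
p = stationary_vector(P)
print(p)                                               # approximately (1/4, 1/2, 1/4)
print([mean_return_time(P, i) for i in range(3)])     # approximately (4, 2, 4) = 1/p_i
```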

For subshifts of finite type, Shannon [495] and Parry [446] (see also
[364, Section 6.2] and [346, Section 4.4]) demonstrated how to construct
the measure of maximal entropy. Let (ΣA , σ) be a subshift of finite type on
alphabet {0, . . . , d − 1} with transition matrix $A = (A_{i,j})_{i,j=0}^{d-1}$, so $x = (x_n) \in \Sigma_A$ if and only if $A_{x_n, x_{n+1}} = 1$ for all n. Let us assume that A is aperiodic and irreducible. Then by the Perron-Frobenius Theorem 8.58, there is a unique real eigenvalue λ, of multiplicity one, which is larger in absolute value than every other eigenvalue, and htop(σ) = log λ. Furthermore, by irreducibility of A, the left and right eigenvectors $u = (u_0, \dots, u_{d-1})$ and $v = (v_0, \dots, v_{d-1})^T$ associated to λ are unique up to a multiplicative factor, and they can be chosen to be strictly positive. We will scale them such that

$$\sum_{i=0}^{d-1} u_i v_i = 1.$$

Now define the Shannon-Parry measure by

$$p_i := u_i v_i = \mu_{SP}([i]), \qquad p_{i,j} := \frac{A_{i,j} v_j}{\lambda v_i} = \mu_{SP}([ij] \mid [i]),$$

so $p_{i,j}$ indicates the conditional probability that $x_{n+1} = j$ knowing that $x_n = i$. Therefore $\mu_{SP}([ij]) = \mu_{SP}([i])\,\mu_{SP}([ij] \mid [i]) = p_i p_{i,j}$. It is stationary (i.e. shift-invariant) but not quite a product measure: $\mu_{SP}([i_m \dots i_n]) = p_{i_m} \cdot p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n}$.

Theorem 6.67. The Shannon-Parry measure μSP is the unique measure of maximal entropy for an SFT with a primitive transition matrix.

Proof. This measure was introduced by Shannon [495], and Parry showed
in [446] that it is indeed the unique measure. In this proof, we will only show
that hμSP (σ) = htop (σ) = log λ and skip the (more complicated) uniqueness
part; see [550, Theorem 8.10].
The definitions of the masses of 1-cylinders and 2-cylinders are compatible, because (since v is a right eigenvector)

$$\sum_{j=0}^{d-1} \mu_{SP}([ij]) = \sum_{j=0}^{d-1} p_i p_{i,j} = \sum_{j=0}^{d-1} p_i \frac{A_{i,j} v_j}{\lambda v_i} = p_i \frac{\lambda v_i}{\lambda v_i} = p_i = \mu_{SP}([i]).$$

Summing over i, we get $\sum_{i=0}^{d-1} \mu_{SP}([i]) = \sum_{i=0}^{d-1} u_i v_i = 1$, due to our scaling.

17 In fact, also if $k = A_{ij} \in \mathbb{N} \setminus \{1\}$, we can interpret this as k paths from state i to state j. The theory doesn't change.

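To show shift-invariance, take any cylinder set $Z = [i_m \dots i_n]$ and compute

$$\begin{aligned} \mu_{SP}(\sigma^{-1}Z) &= \sum_{i=0}^{d-1} \mu_{SP}([i\, i_m \dots i_n]) = \sum_{i=0}^{d-1} \frac{p_i p_{i,i_m}}{p_{i_m}} \mu_{SP}([i_m \dots i_n]) \\ &= \sum_{i=0}^{d-1} \frac{u_i v_i A_{i,i_m} v_{i_m}}{\lambda v_i u_{i_m} v_{i_m}} \mu_{SP}([i_m \dots i_n]) \\ &= \sum_{i=0}^{d-1} \frac{u_i A_{i,i_m}}{\lambda u_{i_m}} \mu_{SP}(Z) = \frac{\lambda u_{i_m}}{\lambda u_{i_m}} \mu_{SP}(Z) = \mu_{SP}(Z). \end{aligned}$$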

This invariance carries over to all sets in the σ-algebra B generated by the
cylinder sets.
Based on the interpretation of conditional probabilities, the identities

$$\begin{cases} \sum_{i_{m+1}, \dots, i_n = 0}^{d-1} p_{i_m} p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n} = p_{i_m}, \\ \sum_{i_m, \dots, i_{n-1} = 0}^{d-1} p_{i_m} p_{i_m, i_{m+1}} \cdots p_{i_{n-1}, i_n} = p_{i_n} \end{cases} \tag{6.16}$$

follow because the left-hand side indicates the total probability of starting in state $i_m$ and reaching some state after n − m steps, respectively, starting at some state and reaching state $i_n$ after n − m steps.
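To compute $h_{\mu_{SP}}(\sigma)$, we will confine ourselves to the partition $\mathcal{P}$ of 1-cylinder sets; this partition is generating, so this restriction is justified by Theorem 6.48.

$$\begin{aligned} H_{\mu_{SP}}\Big(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P}\Big) &= -\sum_{\substack{i_0, \dots, i_{n-1} = 0 \\ A_{i_k, i_{k+1}} = 1}}^{d-1} \mu_{SP}([i_0 \dots i_{n-1}]) \log \mu_{SP}([i_0 \dots i_{n-1}]) \\ &= -\sum_{\substack{i_0, \dots, i_{n-1} = 0 \\ A_{i_k, i_{k+1}} = 1}}^{d-1} p_{i_0} p_{i_0, i_1} \cdots p_{i_{n-2}, i_{n-1}} \times \big(\log p_{i_0} + \log p_{i_0, i_1} + \dots + \log p_{i_{n-2}, i_{n-1}}\big) \\ &= -\sum_{i_0=0}^{d-1} p_{i_0} \log p_{i_0} - (n-1) \sum_{i,j=0}^{d-1} p_i p_{i,j} \log p_{i,j}, \end{aligned}$$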
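by using (6.16) repeatedly. Hence

$$\begin{aligned} h_{\mu_{SP}}(\sigma) &= \lim_{n \to \infty} \frac1n H_{\mu_{SP}}\Big(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P}\Big) = -\sum_{i,j=0}^{d-1} p_i p_{i,j} \log p_{i,j} \\ &= -\sum_{i,j=0}^{d-1} \frac{u_i A_{i,j} v_j}{\lambda} \big(\log A_{i,j} + \log v_j - \log v_i - \log \lambda\big). \end{aligned}$$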

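The first term in the brackets is zero because $A_{i,j} \in \{0, 1\}$ (terms with $A_{i,j} = 0$ carry zero weight). The second term (summing first over i) simplifies to

$$-\sum_{j=0}^{d-1} \frac{\lambda u_j v_j}{\lambda} \log v_j = -\sum_{j=0}^{d-1} u_j v_j \log v_j,$$

whereas the third term (summing first over j) simplifies to

$$\sum_{i=0}^{d-1} \frac{u_i \lambda v_i}{\lambda} \log v_i = \sum_{i=0}^{d-1} u_i v_i \log v_i.$$

Hence these two terms cancel each other. The remaining term is

$$\sum_{i,j=0}^{d-1} \frac{u_i A_{i,j} v_j}{\lambda} \log \lambda = \sum_{i=0}^{d-1} \frac{u_i \lambda v_i}{\lambda} \log \lambda = \sum_{i=0}^{d-1} u_i v_i \log \lambda = \log \lambda.$$

This shows that μSP maximizes entropy. □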
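The construction of μSP and the identity $h_{\mu_{SP}}(\sigma) = \log \lambda$ can be reproduced numerically. The sketch obtains λ, u, v by power iteration (an implementation choice that relies on primitivity of A) and uses the two matrices of Figure 6.5 as input; the function names are illustrative.

```python
import math


def perron(A, steps=1000):
    """Leading eigenvalue and positive right eigenvector of a primitive nonnegative
    matrix A, by power iteration (the vector is normalized to have maximum 1)."""
    d = len(A)
    v = [1.0] * d
    lam = 1.0
    for _ in range(steps):
        w = [sum(A[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam, v


def shannon_parry(A):
    """Return (lambda, p, P) with p_i = u_i v_i and p_ij = A_ij v_j / (lambda v_i)."""
    d = len(A)
    lam, v = perron(A)
    _, u = perron([list(row) for row in zip(*A)])   # left eigenvector of A
    scale = sum(u[i] * v[i] for i in range(d))
    u = [x / scale for x in u]                      # now sum_i u_i v_i = 1
    p = [u[i] * v[i] for i in range(d)]
    P = [[A[i][j] * v[j] / (lam * v[i]) for j in range(d)] for i in range(d)]
    return lam, p, P


def markov_entropy(p, P):
    """-sum_{i,j} p_i p_ij log p_ij, with the convention 0 log 0 = 0."""
    return -sum(p[i] * q * math.log(q) for i, row in enumerate(P) for q in row if q > 0)


for A in ([[1, 1], [1, 0]], [[0, 1, 1], [1, 0, 1], [1, 1, 0]]):   # matrices of Figure 6.5
    lam, p, P = shannon_parry(A)
    print(lam, markov_entropy(p, P), math.log(lam))   # the entropy equals log(lambda)
```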

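Figure 6.5. Slope λ maps with prescribed transition matrices. Left panel: $A = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}$, λ = 2. Right panel: $A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, $\lambda = \frac12(1 + \sqrt5)$.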

We can interpret the Shannon-Parry measure geometrically by building an interval map T with a Markov partition $\{P_i\}_{i=0}^{d-1}$ (for d = #A) of intervals. Whenever $A_{ij} = 1$, ensure that there is a subinterval of $P_i$ that is mapped monotonically with slope ±λ onto $P_j$. Therefore the topological entropy is htop(T) = log λ; see [422]. This may result in a discontinuous map T, see
the left panel of Figure 6.5¹⁸, but from the measure-theoretic viewpoint this doesn't matter. As a piecewise expanding map, T preserves a probability measure μ ≪ Leb with density $\rho = \frac{d\mu}{d\mathrm{Leb}}$ constant on each $P_i$. If we denote the lengths of the partition elements by $v_i = |P_i|$ and set $u_i = \rho|_{P_i}$, then $\sum_i u_i v_i = \sum_i \mu(P_i) = 1$. By the Rokhlin formula (6.14) the entropy is $h_\mu(T) = \int \log |T'| \, d\mu = \log \lambda = h_{top}(T)$, so μ is a measure of maximal entropy. Also

$$\sum_j A_{ij} v_j = \mathrm{Leb}(T(P_i)) = \lambda |P_i| = \lambda v_i,$$

and, because ρ is a fixed point of the transfer operator $\mathcal{L}_T$,

$$\frac1\lambda \sum_i A_{ij} u_i = \sum_{T(y) = x} \frac{\rho(y)}{|T'(y)|} =: \mathcal{L}_T(\rho)(x) = u_j \tag{6.17}$$

for all $x \in P_j^\circ$. An intuitive way to see this is that the expansion factor λ of T dilutes the density by a factor 1/λ, but summing over all preimages of the interval $P_j$ gives (6.17). Thus u and v are left and right eigenvectors of A for the leading eigenvalue λ. Finally $\frac{A_{ij} v_j}{\lambda v_i}$ is the relative measure of the subinterval of $P_i$ that is mapped to $P_j$ by T.
Example 6.68. We carry out the computation for the Fibonacci SFT with associated matrix

$$A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix} \quad \text{and leading eigenvalue} \quad \lambda = \gamma = \tfrac12(1 + \sqrt5).$$

In this case, we can let T be the tent map $T_\gamma(x) = \min\{\gamma x, \gamma(1 - x)\}$; see the right panel of Figure 6.5. Then $P_0 = [\frac12, \frac\gamma2]$ has length $v_0 = \frac12(\gamma - 1)$ and $P_1 = [\frac12(\gamma - 1), \frac12]$ has length $v_1 = \frac12(2 - \gamma)$. Solving

$$\mu_0 = u_0 v_0, \quad \mu_1 = u_1 v_1, \quad \mu_0 + \mu_1 = 1, \quad u_1 = u_0/\gamma,$$

we find $\mu_0 = \frac{1}{3 - \gamma} = \frac{\gamma^2}{\gamma^2 + 1}$ and $\mu_1 = \frac{1}{\gamma^2 + 1}$, which is in agreement with what Theorem 6.67 would provide. Although the Fibonacci substitution $\chi_{Fib}$ has the same associated matrix A, μ does not describe the frequency of symbols 0, 1 or of words in the Fibonacci substitution shift $(X_{Fib}, \sigma)$. This is because the fixed point $\rho_{Fib} = 0100101001001\dots$ of $\chi_{Fib}$ is not the itinerary of a μ-typical point.
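The discrepancy noted at the end of Example 6.68 is easy to see numerically: $\mu([0]) = \gamma^2/(\gamma^2 + 1) \approx 0.7236$, whereas the symbol 0 occurs in the Fibonacci word with frequency $1/\gamma \approx 0.6180$. A minimal sketch, with an illustrative function name:

```python
import math

gamma = (1 + math.sqrt(5)) / 2
mu0 = gamma**2 / (gamma**2 + 1)          # Shannon-Parry mass of [0] (Example 6.68)


def fibonacci_word(iterations):
    """Apply chi_Fib: 0 -> 01, 1 -> 0 repeatedly to the word "0"."""
    w = "0"
    for _ in range(iterations):
        w = "".join("01" if c == "0" else "0" for c in w)
    return w


w = fibonacci_word(25)                    # a prefix of rho_Fib of length 196418
print(w[:13])                             # 0100101001001
print(mu0, w.count("0") / len(w))         # about 0.7236 versus about 0.6180 = 1/gamma
```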

The next example applies this to S-gap shifts and finds their measure of
maximal entropy.

18 Since in the left panel of Figure 6.5 the matrix has zeros on the diagonal, there is no fixed point, which is impossible for a continuous map T : I → I.

Example 6.69. Let the directed graph G consist of a vertex $v_0$ from which $q_\ell$ loops of length ℓ emerge. Let (X, σ) be the corresponding SFT. We want to find the Shannon-Parry measure. First we can replace the directed graph by one with $\sum_\ell \ell q_\ell$ vertices $v_{i,j}$, $1 \le i \le L := \sum_\ell q_\ell$ and $1 \le j \le \ell$ if i is the index of a loop of length ℓ (let us write $\ell = \ell_i$ in this case). For such i, the edges of the graph are $v_{i,j} \to v_{i,j+1}$ if $j < \ell_i$ and $v_{i,\ell_i} \to v_{i',1}$ for each $i' \in \{1, \dots, L\}$. Thus the collection $R = \{v_{i,1} : i = 1, \dots, L\}$ is a rome; see Definition 8.71. Once in vertex $v_{i,1}$ it takes $\ell_i$ steps to return to R, and the first return map to R has the full graph as transition graph.
By Theorem 8.73, htop(σ) = log λ, where λ is the unique positive solution of

$$\sum_{\ell \ge 1} q_\ell \lambda^{-\ell} = \sum_{i=1}^{L} \lambda^{-\ell_i} = 1. \tag{6.18}$$

The Shannon-Parry measure $\tilde\mu$ is completely determined by the probabilities $\tilde p_i$ that when you return to R, you are in vertex $v_{i,1}$. Thus the entropy $h_{\tilde\mu}$ of the first return shift $\tilde\sigma$ to R is $-\sum_i \tilde p_i \log \tilde p_i$. Abramov's formula gives

$$h_\mu(\sigma) = \frac{h_{\tilde\mu}(\tilde\sigma)}{E_{\tilde\mu}(\ell)} = \frac{-\sum_i \tilde p_i \log \tilde p_i}{\sum_i \ell_i \tilde p_i}. \tag{6.19}$$

We try $\tilde p_i = \lambda^{-\ell_i}$. By (6.18), this gives a probability vector. Inserting it in (6.19) gives

$$h_\mu(\sigma) = \frac{-\sum_i \lambda^{-\ell_i} \log \lambda^{-\ell_i}}{\sum_i \ell_i \lambda^{-\ell_i}} = \log \lambda,$$

so we found the measure of maximal entropy. It remains to construct a σ-invariant probability measure by setting $p_{i,j} = \tilde p_i / \sum_k \ell_k \tilde p_k$ for each $1 \le j \le \ell_i$ as the probability of being in vertex $v_{i,j}$. The transition probabilities are uniquely determined by this: $\mathbb{P}(x_n \in v_{i',1} \mid x_{n-1} \in v_{i,\ell_i}) = \tilde p_{i'}$.
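For a concrete choice of loop lengths, the sketch below solves (6.18) by bisection, evaluates the right-hand side of (6.19) for the trial vector $\tilde p_i = \lambda^{-\ell_i}$, and checks that the vertex masses $\tilde p_i / \sum_k \ell_k \tilde p_k$ (the same for every vertex of loop i) form a σ-invariant probability vector. With one loop of length 1 and one of length 2 (an arbitrary example), λ is the golden mean. The function name is illustrative.

```python
import math


def solve_lambda(lengths, iterations=200):
    """The unique lambda > 1 with sum_i lambda**(-l_i) = 1, cf. (6.18), by bisection.

    Assumes at least two loops, so that the sum exceeds 1 at lambda = 1; the sum is
    strictly decreasing in lambda.
    """
    def excess(lam):
        return sum(lam ** -ell for ell in lengths) - 1

    lo, hi = 1.0, 2.0
    while excess(hi) > 0:
        hi *= 2
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


lengths = [1, 2]                                      # loop lengths l_i (hypothetical example)
lam = solve_lambda(lengths)
pt = [lam ** -ell for ell in lengths]                 # trial vector p~_i = lambda^{-l_i}
mean_l = sum(ell * q for ell, q in zip(lengths, pt))  # E(l) in Abramov's formula
h = -sum(q * math.log(q) for q in pt) / mean_l        # right-hand side of (6.19)
# Stationary mass of vertex v_{i,j}: p~_i / E(l), constant along each loop.
mass = {(i, j): pt[i] / mean_l for i, ell in enumerate(lengths) for j in range(1, ell + 1)}
print(lam, h, math.log(lam))                          # golden mean; h equals log(lambda)
```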

6.6.1. Intrinsic Ergodicity. If (X, T) has a unique measure of maximal entropy μ, then μ can be shown to be ergodic; see [551, Theorem 8.7].
Definition 6.70. A dynamical system (X, T ) is intrinsically ergodic if it
has a unique measure of maximal entropy. This notion was coined by Benjy
Weiss [555].

Clearly, uniquely ergodic systems are intrinsically ergodic, and for zero
entropy systems, the two notions are equivalent. But for positive entropy,
most intrinsically ergodic dynamical systems are not uniquely ergodic.
As shown in Theorem 2.88, specification implies intrinsic ergodicity. For
this reason, irreducible SFTs19 , irreducible sofic shifts (Theorem 3.48), and
factors thereof [556] are all intrinsically but not uniquely ergodic. Intrinsic

19 However, for SFTs in higher dimension, intrinsic ergodicity can fail [133, 134].
ergodicity need not hold for SFTs on infinite alphabets, as the next example shows.
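Example 6.71. Take the infinite alphabet ℕ and the infinite transition matrix $A = (A_{i,j})_{i,j \in \mathbb{N}}$ given by

$$A_{i,j} = \begin{cases} 1 & \text{if } j \ge i - 1,\\ 0 & \text{if } j < i - 1, \end{cases} \qquad A = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & \cdots \\ 1 & 1 & 1 & 1 & 1 & \cdots \\ 0 & 1 & 1 & 1 & 1 & \cdots \\ 0 & 0 & 1 & 1 & 1 & \cdots \\ 0 & 0 & 0 & 1 & 1 & \cdots \\ \vdots & \vdots & \vdots & \vdots & & \ddots \end{pmatrix}.$$

Then htop(σ) = log 4, but there is no measure of maximal entropy. For the proof, see [127].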

There are weaker versions of specification that still imply intrinsic er-
godicity. This approach has been used to show that β-shifts (Corollary 3.69)
and unimodal shifts [159, 315] (see Example 3.92) are intrinsically ergodic.
Some further results, not relying on (any form of) specification, follow:
• S-gap shifts are intrinsically ergodic; see [159].
• Coded shifts and their factors are intrinsically ergodic under the
conditions given by Theorem 3.48.
• The hereditary B-free subshift $(X_B^{her}, \sigma)$ is intrinsically ergodic [378] and [231, Theorem J], and also B-free shifts themselves are often intrinsically ergodic (such as the B-free shift for $B = \{p^2 : p \text{ prime}\}$; see [451]), but $(X_B, \sigma)$ need not be intrinsically ergodic if $h_{top}(X_\eta) > 0$ [379].
On the other hand, there exist transitive and even minimal shifts that are
not intrinsically ergodic; see e.g. [306] and [194, Example 27.2] where there
are infinitely many measures of maximal entropy.
Definition 6.72. A dynamical system (X, T ) is called entropy dense if for
every invariant measure μ, there is a sequence of ergodic measures μn such
that μn → μ in the weak∗ topology and the entropies hμn (T ) → hμ (T ).

Obviously, uniquely ergodic systems are entropy dense, but there are
many more systems which have this property for non-trivial reasons.
• Every dynamical system with specification is entropy dense; see
[234] with an extended result in [458, Theorem 2.1] and [459,536].
Thus topologically mixing unimodal maps are entropy dense (they
have specification; see [91, 92]). Weaker versions of specification
hold for β-shifts, and this can be used to show that also β-shifts are
entropy dense [458, Theorem 2.1 and Proposition 5.1].
• Every transitive dynamical system with the shadowing property is
entropy dense, see [383, Corollary 31].
• General conditions for entropy denseness were given in [283] in the context of hyperbolic measures, and in [344] for B-free shifts.
Remark 6.73. Entropy denseness implies that the Choquet simplex is
Poulsen, which in its turn implies that the Choquet simplex is arc-wise con-
nected. The reverse implications are not true in general. For Dyck shifts,
the Choquet simplex is arc-wise connected, but not Poulsen; see Proposi-
tion 3.134. In [369] it was shown that the set of ergodic measures of a
hereditary shift is arc-wise connected, but according to [379] not necessarily
Poulsen. In [271, Proposition 4.29] examples are given where the Poulsen
simplex is not entropy dense; using [207] one can create minimal shifts with
this property.

6.7. Mixing
Whereas a Bernoulli process consists of totally independent trials, mixing
refers to an asymptotic independence:
Definition 6.74. A dynamical system (X, B, μ, T ) preserving the probabil-
ity measure μ is called mixing (or strongly mixing) if
(6.20) μ(T −n (A) ∩ B) → μ(A)μ(B) as n → ∞
for every A, B ∈ B.
Lemma 6.75. Every Bernoulli system is mixing.

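Proof. For any pair of cylinder sets C, C′ we have μ(σ⁻ⁿ(C) ∩ C′) = μ(C)μ(C′) for n sufficiently large. Hence (6.20) holds for cylinder sets, and the convergence carries over to all measurable sets because these can be approximated in measure by finite unions of cylinder sets (which generate B, cf. the Kolmogorov Extension Theorem). □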

Similar to Bernoulli shifts, SFTs with stationary Markov measures given by a primitive matrix P (so in particular the Shannon-Parry measure of an SFT with primitive transition matrix) are mixing.
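For 1-cylinders, the mixing property (6.20) of a Markov measure reduces to Theorem 6.66(ii): $\mu([a] \cap \sigma^{-n}[b]) = p_a (P^n)_{ab} \to p_a p_b$. The sketch computes this exactly for the two-state chain $P = ((\frac12, \frac12), (1, 0))$ on the golden mean shift, an illustrative choice rather than the Shannon-Parry chain. Here $\mu([1] \cap \sigma^{-1}[1]) = 0$, and the difference from $p_1^2 = \frac19$ decays like $(-\frac12)^n$, the second eigenvalue of P.

```python
from fractions import Fraction


def mat_mult(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def cylinder_correlation(p, P, a, b, n):
    """mu([a] intersect sigma^{-n}[b]) = p_a (P^n)_{ab} for the Markov measure with stationary p."""
    d = len(P)
    Pn = [[Fraction(int(i == j)) for j in range(d)] for i in range(d)]   # identity matrix
    for _ in range(n):
        Pn = mat_mult(Pn, P)
    return p[a] * Pn[a][b]


P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1), Fraction(0)]]          # golden mean SFT: symbol 1 is always followed by 0
p = [Fraction(2, 3), Fraction(1, 3)]      # stationary: pP = p
for n in (1, 2, 3, 10, 20):
    print(n, cylinder_correlation(p, P, 1, 1, n), p[1] * p[1])
```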
Proposition 6.76. A probability-preserving dynamical system (X, B, μ, T) is mixing if and only if the correlation coefficients satisfy

$$\int f \circ T^n(x) \cdot g(x) \, d\mu \to \int f(x) \, d\mu \cdot \int g(x) \, d\mu \quad \text{as } n \to \infty \tag{6.21}$$

for all $f, g \in L^2(\mu)$. Written in the notation of the Koopman operator $U_T f = f \circ T$ and inner product $(f, g) = \int_X f(x) \cdot g(x) \, d\mu$, we get

$$(U_T^n f, g) \to (f, 1)(1, g) \quad \text{as } n \to \infty. \tag{6.22}$$

Proof. The “if” direction follows by taking indicator functions $f = 1_A$ and $g = 1_B$. For the “only if” direction, all $f, g \in L^2(\mu)$ can be approximated by linear combinations of indicator functions. □

Corollary 6.77. If (X, T) is uniformly rigid, then no T-invariant measure apart from a Dirac measure can be strongly mixing.

Proof. Take A, B ⊂ X measurable such that μ(A), μ(B) > 0 and inf{d(a, b) : a ∈ A, b ∈ B} =: ε > 0. Next take any n such that d(Tⁿ(x), x) < ε for all x ∈ X. Then A ∩ Tⁿ(B) = ∅, so μ(T⁻ⁿ(A) ∩ B) = 0 ≠ μ(A)μ(B). Since n can be arbitrarily large, μ(T⁻ⁿ(A) ∩ B) ↛ μ(A)μ(B). □

6.7.1. Linearly Recurrent Shifts and Mixing. Dekking & Keane [191]
gave the first general proof that constant length substitution shifts are never
strongly mixing. A short and more general result [167] states that no lin-
early recurrent shift20 can be strongly mixing, and this includes all primitive
substitution shifts.

Theorem 6.78. A linearly recurrent shift (X, σ) is not mixing w.r.t. its
unique21 invariant probability measure.

Proof. Let L be the constant appearing in the definition of linear recurrence. Let $u_0 \in \mathcal{L}(X)$ be some word and recursively construct a sequence of return words $u_n \in \mathcal{R}(u_{n-1})$; see Definition 4.2. As in Example 4.3, we can choose $u_n$ at least as long as $u_{n-1}$, so any appearance of $u_n$ in any x ∈ X has $u_{n-1}$ as prefix and is also followed by $u_{n-1}$. Set $h_n := |u_n|$. Since $u_n$ reappears with gap ≤ $L h_n$, the measure of the cylinder $[u_n]$ satisfies

$$\mu([u_n]) \ge 1/(L h_n). \tag{6.23}$$

Take m so large that $\mu([u_m]) < L^{-2}$. Define

$$D(n) = [u_m] \cap \sigma^{h_n}([u_m]) \quad \text{and} \quad E(n) = \{0 \le j < h_{n-1} : \sigma^j([u_{n-1}]) \subset [u_m]\}.$$

For j ∈ E(n) we have

$$\sigma^j([u_n]) \subset \sigma^j([u_{n-1}]) \subset [u_m] \quad \text{and} \quad \sigma^{j+h_n}([u_n]) \subset \sigma^j([u_{n-1}]) \subset [u_m],$$

and therefore $\sigma^j([u_n]) \subset D(n)$. Because $\sigma^i([u_n]) \cap \sigma^j([u_n]) = \emptyset$ for $0 \le i < j < h_{n-1}$, it follows that $\mu(D(n)) \ge \#E(n)\,\mu([u_n])$. Unique ergodicity of

20 In fact, [167] applies to any linearly recurrent mapping of the Cantor set.
21 Recall from Corollary 6.29 that linear recurrent shifts are uniquely ergodic.
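(X, σ) gives $\lim_{n \to \infty} \#E(n)/h_{n-1} = \mu([u_m])$. Combining all of the above, we get

$$\begin{aligned} \liminf_{n \to \infty} \mu(D(n)) &\ge \liminf_{n \to \infty} \#E(n)\,\mu([u_n]) \\ &\ge \liminf_{n \to \infty} h_{n-1}\,\mu([u_n])\,\mu([u_m]) && \text{(by Theorem 4.4(iii))} \\ &\ge \liminf_{n \to \infty} \frac{h_n}{L}\,\mu([u_n])\,\mu([u_m]) && \text{(bounded gaps)} \\ &\ge \frac{h_n}{L} \cdot \frac{1}{L h_n}\,\mu([u_m]) && \text{(by (6.23))} \\ &> \mu([u_m])^2 && \text{(by the choice of } m\text{)}. \end{aligned}$$

On the other hand,

$$\mu(D(n)) = \mu([u_m] \cap \sigma^{h_n}([u_m])) = \mu(\sigma^{-h_n}([u_m] \cap \sigma^{h_n}([u_m]))) = \mu(\sigma^{-h_n}([u_m]) \cap \sigma^{-h_n} \circ \sigma^{h_n}([u_m])).$$

But $h_{top}(\sigma|_X) = h_\mu(\sigma) = 0$, so by Proposition 6.52, σ is invertible μ-a.s. Therefore $\mu([u_m] \,\triangle\, \sigma^{-h_n} \circ \sigma^{h_n}([u_m])) = 0$ and thus

$$\liminf_{n \to \infty} \mu(\sigma^{-h_n}([u_m]) \cap [u_m]) > \mu([u_m])^2.$$

In other words, (X, σ, μ) cannot be mixing. This finishes the proof. □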

6.7.2. Cutting and Stacking and Mixing. A proof similar to that of the previous section works for cutting and stacking systems of finite rank with a bound on the number of layers of spacer.
Theorem 6.79. Let (Δ, T, μ) be a finite rank cutting and stacking system
such that at each step of the construction, at most s layers of spacer are
inserted between stacked slices. Then (Δ, T, μ) is not strongly mixing.

Proof. Let w(n) be the number of stacks $\Delta_i(n)$ at step n in the construction, and let $h_i(n)$ be their heights, so $\Delta_i(n) = \bigcup_{j=0}^{h_i(n)-1} \Delta_i^j(n)$, $1 \le i \le w(n)$. Also let $h_{min}(n) = \min_i h_i(n)$ and $h_{max}(n) = \max_i h_i(n)$. By speeding up the cutting and stacking construction, we can assume that $h_{min}(n) \ge 2^n$. Let $S_n$ be the spacer left at step n; the assumption on $h_{min}(n)$ also implies that $\mu(S_n) = O(2^{-n})$. Because the system is of finite rank r, 1 ≤ w(n) ≤ r for all n, but there need not be a fixed upper bound on the number of slices each stack is cut into.
Choose ε ∈ (0, 2rs ). Take m so large that εhmin (m) ≥ 100(1 + s),
$\mu(S_m) < \varepsilon/100$, and $e^{-4s/2^m} \ge \frac89$. Now let

$$A = \bigcup_{i=1}^{w(m)} \bigcup_{j=0}^{\varepsilon h_i(m)} \Delta_i^j(m).$$
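Clearly $\frac43 \varepsilon \ge \mu(A) \ge \frac34 \varepsilon$. We claim

$$a_n := \min_{i=1,\dots,w(n)} \frac{\mu(A \cap \Delta_i(n))}{\mu(\Delta_i(n))} \ge \frac{2\varepsilon}{3} \quad \text{for all } n \ge m.$$

By the choice of A, $a_m \ge \frac34 \varepsilon$. Each of the slices into which each stack $\Delta_i(m)$ is cut receives at most s layers of spacer. Therefore $a_{m+1} \ge a_m \frac{h_{min}(m)}{h_{min}(m) + s} \ge a_m\big(1 - \frac{s}{h_{min}(m)}\big)$. By induction

$$a_n \ge a_m \prod_{j=m}^{n-1} \Big(1 - \frac{s}{h_{min}(j)}\Big) \ge \frac{3\varepsilon}{4} \exp\Big(\sum_{j=m}^{\infty} \log\Big(1 - \frac{s}{h_{min}(j)}\Big)\Big) \ge \frac{3\varepsilon}{4} \exp\Big(-2s \sum_{j=m}^{\infty} 2^{-j}\Big) = \frac{3\varepsilon}{4} e^{-4s/2^m} \ge \frac{2\varepsilon}{3},$$

proving the claim.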

Now let n ≥ m be arbitrary. Note that the cutting and stacking procedure ensures that $A \cap \Delta_i^j(n) \ne \emptyset$ implies $\Delta_i^j(n) \subset A$. In fact, $A \cap \Delta_i(n)$ consists of (multiple) layers of at least 100(1 + s) consecutive levels $\Delta_i^j(n)$.
Let p > n be minimal such that for each 1 ≤ i ≤ w(n), at least two slices of the stack $\Delta_i(n)$ can be found in at least one of the stacks $\Delta_k(p)$, 1 ≤ k ≤ w(p). In between appearances of those slices, there can be layers of spacer and slices of other stacks $\Delta_{i'}(n)$. But the difference in level in $\Delta_k(p)$ of the appearance of two slices of the same stack $\Delta_i(n)$, without slices of $\Delta_i(n)$ or more than one slice of any stack $\Delta_{i'}(n)$, $i' \ne i$, in between, can take at most $s r 2^{r-1}$ values. Take the value t that occurs the most often. Then

μ(T −t (A) ∩ A) > r−1
> 2ε2 > μ(A)2 .
2 rs
Since t → ∞ as n → ∞, this contradicts mixing. 
Remark 6.80. Ferenczi [244, Corollary 3], using cutting and stacking techniques, showed that every minimal uniquely ergodic subshift $(X, \sigma)$ with word-complexity satisfying $\limsup_n p_X(n)/n < \infty$ is not mixing.
Note that although in Theorem 6.79 every stack can receive no more than $s$ layers of spacer throughout the procedure, we don't require a bound on the number of slices a stack is cut into. Without the bound $s$, and still without a bound on the number of slices, strong mixing may be achieved. This was first hinted at by Ornstein in [436]. In detail, in the $n$-th step of the construction, we cut the stack into $z(n)$ slices and add $i-1$ layers of spacer to the $i$-th slice before stacking them (slice $i$ on top of slice $i-1$) into
6.7. Mixing 299

a single stack again. Symbolically, this amounts to a (non-adic) substitution
$$B_0 = 0, \qquad B_{n+1} = B_n\, B_n 1\, B_n 11 \cdots B_n 1^{z(n)-1}. \tag{6.24}$$
Although this is a rank 1 cutting and stacking, we rather speak of a staircase system. Smorodinsky²² conjectured that such a system is strongly mixing if $z(n) \to \infty$. Extending unpublished results in [6], Adams [5] showed that this is indeed true if $z(n) \to \infty$ in such a way that $z(n)^2/h(n) \to 0$. We follow his exposition, but see [171, 244] for further results.
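To make the recursion (6.24) concrete, here is a minimal Python sketch (an illustration, not part of the original construction) that builds the words $B_n$, writing "0" for the initial interval and "1" for a spacer level, and checks that their lengths obey $h(n+1) = z(n)h(n) + \frac12 z(n)(z(n)-1)$. The choice $z(n) = n+2$ and all function names are hypothetical.

```python
def staircase_words(z, n_max):
    """Return [B_0, ..., B_{n_max}] for the staircase recursion (6.24)."""
    words = ["0"]  # B_0 = 0 (the initial interval); "1" denotes a spacer level
    for n in range(n_max):
        b = words[-1]
        # slice i (i = 1, ..., z(n)) is followed by i - 1 spacer levels
        words.append("".join(b + "1" * (i - 1) for i in range(1, z(n) + 1)))
    return words


def heights(z, n_max):
    """h(0) = 1 and h(n+1) = z(n) h(n) + z(n)(z(n) - 1)/2."""
    h = [1]
    for n in range(n_max):
        h.append(z(n) * h[-1] + z(n) * (z(n) - 1) // 2)
    return h


def z(n):
    return n + 2  # hypothetical cut numbers with z(n) -> infinity, for illustration only


assert [len(w) for w in staircase_words(z, 5)] == heights(z, 5)
for n, h in enumerate(heights(z, 10)):
    print(n, h, z(n) ** 2 / h)  # the ratio z(n)^2 / h(n) decreases towards 0
```

With this choice $h(n)$ grows roughly factorially, so the hypothesis $z(n)^2/h(n) \to 0$ of Theorem 6.82 below is comfortably met.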
Before going to this main result, we prove [5, Lemma 2.2]:
Lemma 6.81. Let $E$ be a union of levels in $\Delta(n_0)$ for some arbitrary $n_0 \in \mathbb N$ and let $(\rho_n)_{n \ge n_0}$ be an integer sequence such that $h(n) \le \rho_n < 2h(n)$. Then there exists a sequence $(\ell_n)_{n \ge n_0}$ tending to infinity such that
$$\int_X \Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x)) - \mu(E)\Big|\, d\mu \to 0 \qquad \text{as } n \to \infty.$$

Proof. Choose $i \ge 1$. For each $n \ge \max\{i^2, n_0\}$, take $j \ge 1$ and $t < h(n)$ such that $i\rho_n = jh(n) + t$. Take $C_n = D_1 \cup D_2$, where $D_1$ consists of the top $t$ levels of $C_n$ and $D_2$ of the bottom $h(n)-t$ levels. Take
$$D_3 := \{\text{bottom } (j+1)z(n) \text{ levels of } D_1\} \cup \{j+1 \text{ rightmost slices of } D_1\}$$
and $E_1 := (D_1 \cap E) \setminus D_3$. Then
$$\mu((D_1 \cap E) \setminus E_1) \le \frac{(j+1)z(n)}{h(n)} + \frac{j+1}{z(n)} \to 0$$
as $n \to \infty$. Hence, the part of $D_1 \cap E$ that is outside $E_1$ is so small for large $n$ that it doesn't affect the limit behavior. Let $I$ be a level of $E_1$. Then $T^{i\rho_n}(I)$ goes through the roof $j+1$ times and thus intersects $z(n)-(j+1)$ levels of $C_n$ in an arithmetic progression $\{\eta_0 + \eta(j+1)\}_{\eta=0}^{z(n)-j-2}$; see the dashed lines in Figure 6.6. The mass of each intersection is $\mu(I)/z(n)$. Let $I^*$ be the highest of the levels of $C_n$ that intersects $T^{i\rho_n}(I)$. Then for $E_1^* = \bigcup_{I \subset E_1} I^*$, we have
we have
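$$\mu(T^{i\rho_n}(E_1) \cap E) = \frac{1}{z(n)}\sum_{\eta=0}^{z(n)-j-2}\mu(T^{-\eta(j+1)}(E_1^*) \cap E) \to \mu(E_1^*)\mu(E) \tag{6.25}$$
as $n \to \infty$, by ergodicity of $T$ (see Lemma 6.84). For $E_2$ and $E_2^* = \bigcup_{I \subset E_2} I^*$ we get
$$\mu(T^{i\rho_n}(E_2) \cap E) = \frac{1}{z(n)}\sum_{\eta=0}^{z(n)-j-1}\mu(T^{-\eta j}(E_2^*) \cap E) \to \mu(E_2^*)\mu(E). \tag{6.26}$$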

22 See [5, 172, 244]; the conjecture was probably stated by Meir Smorodinsky in personal discussions with Nat Friedman. Ferenczi, building on the example from a symbolic point of view, states that the map was proposed by Smorodinsky but not published before [6].


Figure 6.6. The action of $T^{i\rho_n}$ on a level $I \subset E_1$. [Figure labels: the top $t$ levels $D_1$, containing $\bar E_1 \supset I$; the bottom $h(n)-t$ levels $D_2$.]

Since $E_1$ and $E_2$ are disjoint, we can add (6.25) and (6.26) to obtain $\mu(T^{i\rho_n}(E) \cap E) \to \mu(E)^2$ as $n \to \infty$. By $T$-invariance of $\mu$, we obtain for $i_1 - i_2 = i$,
$$\mu(T^{i_1\rho_n}(E) \cap T^{i_2\rho_n}(E)) = \mu(T^{(i_1-i_2)\rho_n}(E) \cap E) \to \mu(E)^2,$$
as $n \to \infty$. Choosing $\ell_n \in \mathbb N$ sufficiently small that the above convergence holds uniformly over $1 \le i_1, i_2 < \ell_n$, $i_1 \ne i_2$, but still so that $\ell_n \to \infty$, we arrive at
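$$\begin{aligned}
\int_X \Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x)) - \mu(E)\Big|\, d\mu
&\le \Big(\int_X \Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1} 1_E(T^{-i\rho_n}(x)) - \mu(E)\Big|^2 d\mu\Big)^{1/2} \quad \text{(by Cauchy-Schwarz)}\\
&= \Big(\int_X \frac{1}{\ell_n^2}\sum_{i,j=0}^{\ell_n-1}\big(1_E(T^{-i\rho_n}(x)) - \mu(E)\big)\big(1_E(T^{-j\rho_n}(x)) - \mu(E)\big)\, d\mu\Big)^{1/2}\\
&= \Big(\frac{1}{\ell_n^2}\sum_{i,j=0}^{\ell_n-1}\Big[\int_X 1_E(T^{-i\rho_n}(x))\,1_E(T^{-j\rho_n}(x))\, d\mu - \mu(E)^2\Big]\Big)^{1/2}.
\end{aligned}$$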

Since the terms in this sum tend to 0 for $i \ne j$ and the terms with $i = j$ are only an $\ell_n$-th part of the whole, the average tends to 0 as $n \to \infty$. This is what we needed to prove. 

Theorem 6.82. Let $(X, T, \mu)$ be a staircase cutting and stacking system such that $z(n) \to \infty$ and $z(n)^2/h(n) \to 0$. Then $(X, T, \mu)$ is strongly mixing.

Proof. Let us go through the construction once more. We start with a single interval $\Delta(0)$ and a certain amount of spacer. For the induction step, assume that $\Delta(n)$ is a single stack of height $h(n)$. Cut it into $z(n)$ slices, place $i-1$ layers of spacer on the $i$-th slice, and then stack them up, slice $i$ on top of slice $i-1$ for $2 \le i \le z(n)$, into a single stack $\Delta(n+1)$. Thus the heights satisfy
$$h(n+1) = z(n)h(n) + \tfrac12 z(n)(z(n)-1),$$
and the condition $z(n)^2/h(n) \to 0$ implies that $\frac12 z(n)(z(n)-1) < h(n)/10$ for $n$ sufficiently large. In the limit, we obtain a transformation $T: X \to X$ that preserves Lebesgue measure $\mu$ and is invertible up to a set of zero measure. Let $A, B$ be measurable subsets of $X$. We need to show that
$$\lim_{m\to\infty} |\mu(T^m(A) \cap B) - \mu(A)\mu(B)| = 0. \tag{6.27}$$

Note that for any $\varepsilon > 0$ we can take $n_0 \in \mathbb N$ so large and $A', B'$ consisting of full levels of $\Delta(n_0)$ such that the symmetric differences satisfy $\mu(A \triangle A'), \mu(B \triangle B') < \varepsilon$. Then $A', B'$ also consist of full levels of $\Delta(n)$ for all $n \ge n_0$. If (6.27) holds for every such $A', B'$ and $\varepsilon > 0$, then (6.27) holds for $A$ and $B$ as well. So there is no loss of generality to work with sets $A, B$ that consist of full levels of $\Delta(n)$ for every $n \ge n_0$.
Next take $m \in \mathbb N$ arbitrary, and let $n \in \mathbb N$ be such that
$$h(n) \le m = k_n h(n) + t_n < h(n+1), \qquad 1 \le k_n \le z(n), \quad 0 \le t_n < h(n).$$
We can assume that $m$ is so large that $n \ge n_0$.
Divide $\Delta(n)$ into pieces $D_1, D_2, D_3$ where
$$\begin{aligned}
D_1 &= \{k_n+1 \text{ rightmost slices of } \Delta(n)\},\\
D_2 &= \{\text{top } t_n \text{ levels of } \Delta(n) \setminus D_1\},\\
D_3 &= \{\text{bottom } h(n)-t_n \text{ levels of } \Delta(n) \setminus D_1\};
\end{aligned}$$
see Figure 6.7.
Mixing on $D_1$: Since the levels of $D_1$ get interspersed with layers of spacer, $D_1$ occupies the top $(k_n+1)h(n) + (z(n)-1) + (z(n)-2) + \cdots + (z(n)-k_n-1) = (k_n+1)h(n) + \frac12(k_n+1)(2z(n)-k_n-2)$ levels of $\Delta(n+1)$.
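As a check on this level count: slice $i$ receives $i-1$ layers of spacer and $D_1$ consists of slices $z(n)-k_n, \dots, z(n)$, so the spacer inside $D_1$ adds up as an arithmetic series:
$$\sum_{\ell=1}^{k_n+1}\big(z(n)-\ell\big) = (k_n+1)z(n) - \frac{(k_n+1)(k_n+2)}{2} = \frac12(k_n+1)\big(2z(n)-k_n-2\big).$$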


⎪ I ⊂ Ā1

D1 D̄1

' ⎪

spacer ⎪

⎧ D2 ⎪

⎪ ⎪

⎪ and ⎨

⎨ Ā2 ⊃ I D2 D3 ⎫
tn ⎪
⎪ ⎪

⎪ inter- ⎪

⎪ ⎪
⎪ T m (I)

⎩ spersed ⎪
⎪ ⎪

D1 ⎪
⎪ ⎪

⎧ ⎪

⎪ ⎪

⎨ ⎪

h(n) − tn D3 ⎪

⎪ ⎪

⎩ ⎪

Δ(n) Δ(n + 1)

Figure 6.7. The staircase represented as Δ(n) (left) and Δ(n + 1) (right).

Inside the representation $\Delta(n+1)$, define
$$\bar D_1 = D_1 \setminus \big(\{\text{bottom } h(n)+z(n) \text{ levels of } \Delta(n+1)\} \cup \{\text{rightmost slice of } \Delta(n+1)\}\big).$$
Let $\bar A_1 = A \cap \bar D_1$. It consists of levels of $\Delta(n+1) \setminus \{\text{rightmost slice of } \Delta(n+1)\}$, and $\mu((A \cap D_1) \triangle \bar A_1) < \frac{1}{z(n+1)} \to 0$ as $n \to \infty$. Let $I$ be one of these levels. The iterate $T^m$ pushes $I$ through the roof, where it splits up into $z(n+1)-1$ intervals that intersect $z(n+1)-1$ consecutive levels of $\Delta(n+1)$ (see Figure 6.7, the "dotted diagonal" in the right panel), and each of the intersections has mass $\mu(I)/(z(n+1)-1)$. Let $I^*$ be the highest level of $\Delta(n+1)$ that $T^m(I)$ intersects. Then
$$\mu(T^m(I) \cap B) = \frac{1}{z(n+1)-1}\sum_{i=0}^{z(n+1)-2}\mu(T^{-i}(I^*) \cap B),$$
and $\mu(I^*) = \frac{z(n+1)}{z(n+1)-1}\mu(I)$. Let $A_1^* = \bigcup_{I \subset \bar A_1} I^*$. Summing over all $I \subset \bar A_1$,
$$|\mu(T^m(\bar A_1) \cap B) - \mu(\bar A_1)\mu(B)| = \Big|\frac{1}{z(n+1)-1}\sum_{i=0}^{z(n+1)-2}\mu(T^{-i}(A_1^*) \cap B) - \Big(1 + \frac{1}{z(n+1)-1}\Big)\mu(A_1^*)\mu(B)\Big|.$$

By ergodicity (see Lemma 6.84)
$$\frac{1}{z(n+1)-1}\sum_{i=0}^{z(n+1)-2}\mu(T^{-i}(A_1^*) \cap B) - \mu(A_1^*)\mu(B) \to 0$$
as $m \to \infty$ (and thus $z(n+1) \to \infty$). Naturally $\frac{1}{z(n+1)-1}\mu(A_1^*)\mu(B) \to 0$ as well. This proves (6.27) for the part $\bar A_1 \subset A$.
Mixing on $D_2$: Let
$$\bar A_2 = (A \cap D_2) \setminus \{\text{bottom } z(n)^2 \text{ levels of } D_2 \text{ in } \Delta(n)\}.$$
Then $\mu((A \cap D_2) \setminus \bar A_2) \le \frac{z(n)^2}{h(n)} \to 0$ as $m \to \infty$ (and hence $n \to \infty$). Let $I$ be a level in $\bar A_2$, which is only a $(z(n)-k_n-1)/z(n)$ part of a level in $\Delta(n)$. Under $T^m$, $I$ goes through the roof $k_n+1$ times, splitting into $z(n)-k_n-1$ intervals which intersect $z(n)-k_n-1$ levels of $\Delta(n)$, in an arithmetic progression $\{i_0 + i(k_n+1)\}_{i=0}^{z(n)-k_n-2}$; see the dotted lines in Figure 6.7 (left). Let $I^*$ be the highest level of $\Delta(n)$ that $T^m(I)$ intersects. Then
$$\mu(T^m(I) \cap B) = \frac{1}{z(n)}\sum_{i=0}^{z(n)-k_n-2}\mu(T^{-i(k_n+1)}(I^*) \cap B).$$
Let $A_2^* = \bigcup_{I \subset \bar A_2} I^*$. Note that $\mu(\bar A_2) = \frac{z(n)-k_n-1}{z(n)}\mu(A_2^*)$. Thus we can assume that $\frac{k_n+1}{z(n)} < 1-\varepsilon$, because otherwise $|\mu(T^m(\bar A_2) \cap B) - \mu(\bar A_2)\mu(B)| < \varepsilon$ immediately.
Summing up over all $I \subset \bar A_2$ we obtain
$$\big|\mu(T^m(\bar A_2) \cap B) - \mu(\bar A_2)\mu(B)\big| = \frac{z(n)-k_n-1}{z(n)}\,\Big|\frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}\mu(T^{-i(k_n+1)}(A_2^*) \cap B) - \mu(A_2^*)\mu(B)\Big|.$$

Since $T$ is invertible mod $\mu$ and preserves $\mu$, the expression in the absolute value bars can be estimated as
$$\begin{aligned}
&\frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}\mu(T^{-i(k_n+1)}(A_2^*) \cap B) - \mu(A_2^*)\mu(B)\\
&\quad= \frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}\mu(A_2^* \cap T^{i(k_n+1)}(B)) - \mu(A_2^*)\mu(B)\\
&\quad= \int_{A_2^*}\frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}1_B \circ T^{-i(k_n+1)} - \mu(B)\, d\mu\\
&\quad\le \int_X\Big|\frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}1_B \circ T^{-i(k_n+1)} - \mu(B)\Big|\, d\mu.
\end{aligned}$$
So we need to show that this expression tends to 0 as $n \to \infty$.

Take $p \in \mathbb N$ such that $h(p-1) \le k_n+1 < h(p)$. We have
$$\frac{(z(n)-k_n-1)(k_n+1)}{h(p)} \ge \frac{\varepsilon z(n)(k_n+1)}{h(p)} \ge \varepsilon\frac{h(p-1)^2}{h(p)} = \varepsilon\,\frac{z(p-1)h(p-1)}{h(p)}\cdot\frac{h(p-1)}{z(p-1)} \to \infty.$$

Choose $k'_n \ge 1$ minimal such that $k'_n(k_n+1) \ge h(p)$. Then $\frac{z(n)-k_n-1}{k'_n} \to \infty$ as well. By Lemma 6.81 we can choose a sequence $\ell_n \to \infty$ so slowly that $a_n := \frac{z(n)-k_n-1}{\ell_n k'_n} \to \infty$, but

$$\int_X\Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1}1_B \circ T^{-ik'_n(k_n+1)} - \mu(B)\Big|\, d\mu \to 0 \qquad \text{as } n \to \infty. \tag{6.28}$$


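In the rest of the proof, we abbreviate $S = T^{k_n+1}$ and $g = 1_B - \mu(B)$. By $S$-invariance of $\mu$,
$$\int_X\Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1}g \circ S^{-ik'_n}\Big|\, d\mu = \int_X\Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1}g \circ S^{-(ik'_n+j)}\Big|\, d\mu$$
for $0 \le j < k'_n$. Taking the average of this expression over $j = 0, \dots, k'_n-1$ and moving the average inside the absolute value bars (triangle inequality), we find
$$\begin{aligned}
\int_X\Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1}g \circ S^{-ik'_n}\Big|\, d\mu
&= \frac{1}{k'_n}\sum_{j=0}^{k'_n-1}\int_X\Big|\frac{1}{\ell_n}\sum_{i=0}^{\ell_n-1}g \circ S^{-(ik'_n+j)}\Big|\, d\mu\\
&\ge \int_X\Big|\frac{1}{k'_n\ell_n}\sum_{i=0}^{k'_n\ell_n-1}g \circ S^{-i}\Big|\, d\mu.
\end{aligned}\tag{6.29}$$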

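Write $z(n)-k_n-1 = a_nk'_n\ell_n + b_n$, where $a_n$ is replaced by its integer part and $0 \le b_n < k'_n\ell_n$. Then
$$\begin{aligned}
\int_X\Big|\frac{1}{z(n)-k_n-1}\sum_{i=0}^{z(n)-k_n-2}g \circ S^{-i}\Big|\, d\mu
&= \int_X\Big|\frac{1}{a_nk'_n\ell_n+b_n}\sum_{i=0}^{a_nk'_n\ell_n+b_n-1}g \circ S^{-i}\Big|\, d\mu\\
&= \int_X\Big|\frac{a_nk'_n\ell_n}{a_nk'_n\ell_n+b_n}\cdot\frac{1}{a_nk'_n\ell_n}\Big(\sum_{i=0}^{a_nk'_n\ell_n-1}g \circ S^{-i} + \sum_{i=a_nk'_n\ell_n}^{a_nk'_n\ell_n+b_n-1}g \circ S^{-i}\Big)\Big|\, d\mu\\
&\le \int_X\Big|\frac{1}{a_nk'_n\ell_n}\sum_{i=0}^{a_nk'_n\ell_n-1}g \circ S^{-i}\Big|\, d\mu + \frac{2b_n\|g\|_\infty}{a_nk'_n\ell_n+b_n}\\
&\le \int_X\Big|\frac{1}{a_n}\sum_{j=0}^{a_n-1}\frac{1}{k'_n\ell_n}\sum_{i=0}^{k'_n\ell_n-1}g \circ S^{-(i+jk'_n\ell_n)}\Big|\, d\mu + \frac{2\|g\|_\infty}{a_n}\\
&\le \int_X\Big|\frac{1}{k'_n\ell_n}\sum_{i=0}^{k'_n\ell_n-1}g \circ S^{-i}\Big|\, d\mu + \frac{2\|g\|_\infty}{a_n},
\end{aligned}$$
where in the last line we used the argument of (6.29) in the opposite direction (triangle inequality and $S$-invariance of $\mu$). Combining this result with (6.29) and (6.28), we get the required convergence. This finishes the proof for $D_2$.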
Mixing on $D_3$: The argument here is the same as for $D_2$, except that $T^m(I)$ now only goes through the roof $k_n$ times. 

Recall that, since this staircase example uses only one stack, it is of rank 1. In [244] the word-complexity of a small variation of this example is computed. Namely, instead of (6.24), the recursion is
$$B_0 = 0, \qquad B_{n+1} = B_n B_n 1 B_n 11 B_n \cdots B_n 1^{n-1} B_n. \tag{6.30}$$


Then $h(0) = h(1) = 1$, $h(2) = 2$, $h(3) = 7$, and in general $h(n+1) = (n+1)h(n) + \frac12 n(n-1)$. For this example, [244, Proposition 1] gives
$p(1) = 2$, $p(2) = 4$, $p(3) = 7$, $p(4) = 12$,
$p(5) = 18$, $p(6) = 26$, $p(7) = 35$, $p(8) = 43$,

⎪k+i+1 for k = h(n) + i, 1 ≤ i < k,

⎪ for h(n) + n < k ≤ 2h(n) + 1,


⎨k + n − i for 2h(n) + 2i ≤ k ≤ 2h(n) + 2i + 1,
p(k + 1) − p(k) =

⎪ 1 ≤ i ≤ n − 2,

⎪k + 2 for 2h(n) + 2n − 2 < k ≤ 3h(n) + 2n − 3,

⎩k + 1 for 3h(n) + 2n − 2 ≤ k ≤ h(n + 1).
This amounts to quadratic complexity:
$$\frac12(k^2+3k-2) \le p(k) \le \frac12k^2 + O\Big(\frac{k\log k}{\log\log k}\Big).$$
For staircase systems with general $z(n)$, i.e.
$$B_0 = 0, \qquad B_{n+1} = B_n B_n 1 B_n 11 B_n \cdots B_n 1^{z(n)-1} B_n, \tag{6.31}$$
the word-complexity satisfies $p(k) \ge \frac12(k^2+3k-2)$ with infinitely many values of $k$ where equality holds. There is no definite upper bound apart from $p(k)$ being subexponential, but it can be superpolynomial.
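The first few values quoted from [244] can be reproduced by brute force. The Python sketch below (function names are illustrative assumptions) builds the words of (6.30), lists the heights $h(0), \dots, h(6)$, and counts the distinct factors of length $k$ in the long word $B_6$. This only recovers $p(k)$ for $k$ small compared with $|B_6|$, so it is a sanity check rather than a proof.

```python
def variant_words(n_max):
    """Return [B_0, ..., B_{n_max}] for (6.30): B_{n+1} = B_n B_n 1 B_n 11 B_n ... 1^(n-1) B_n."""
    words = ["0"]
    for n in range(n_max):
        b = words[-1]
        # n + 1 copies of B_n, separated by 0, 1, ..., n - 1 spacer symbols "1"
        words.append(b + "".join("1" * s + b for s in range(n)))
    return words


def complexity(word, k):
    """Number of distinct factors (subwords) of length k occurring in word."""
    return len({word[i:i + k] for i in range(len(word) - k + 1)})


ws = variant_words(6)
print([len(w) for w in ws])                          # [1, 1, 2, 7, 31, 161, 976]
print([complexity(ws[-1], k) for k in range(1, 4)])  # [2, 4, 7], i.e. p(1), p(2), p(3)
```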

6.7.3. Weak Mixing. Weak mixing refers to the decay of correlations in Cesàro mean.

Definition 6.83. A dynamical system $(X, \mathcal B, \mu, T)$ preserving a probability measure $\mu$ is called weak mixing if
$$\frac1n\sum_{i=0}^{n-1}|\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)| \to 0 \qquad \text{as } n \to \infty \tag{6.32}$$
for every $A, B \in \mathcal B$.

We can express ergodicity in analogy to (6.20) and (6.32):

Lemma 6.84. A probability-preserving dynamical system $(X, \mathcal B, T, \mu)$ is ergodic if and only if
$$\frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B) \to 0 \qquad \text{as } n \to \infty, \tag{6.33}$$
for all $A, B \in \mathcal B$. Note the absence of absolute value bars compared to (6.32).

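Proof. Assume that $T$ is ergodic, so by Birkhoff's Ergodic Theorem 6.13, $\frac1n\sum_{i=0}^{n-1}1_A \circ T^i(x) \to \mu(A)$ for μ-a.e. $x$. Multiplying by $1_B$ gives
$$\frac1n\sum_{i=0}^{n-1}1_A \circ T^i(x)\,1_B(x) \to \mu(A)1_B(x) \qquad \mu\text{-a.e.}$$
Integrating over $x$ (using the Dominated Convergence Theorem to swap limit and integral) gives $\lim_n \frac1n\sum_{i=0}^{n-1}\int_X 1_A \circ T^i(x)\,1_B(x)\, d\mu = \mu(A)\mu(B)$, and here $\int_X 1_A \circ T^i \cdot 1_B\, d\mu = \mu(T^{-i}(A) \cap B)$.

Conversely, assume that $A = T^{-1}A$ and take $B = A$. Then we obtain $\mu(A) = \frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap A) \to \mu(A)^2$; hence $\mu(A) \in \{0, 1\}$. 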
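A quick numerical illustration of Lemma 6.84, under assumptions of our own choosing: the irrational rotation $R_\alpha$ with $\alpha = \sqrt2 - 1$ (ergodic for Lebesgue measure) and $A = B = [0, 1/3]$. The Python sketch below computes $\mu(R_\alpha^{-i}(A) \cap B)$ exactly as the overlap of two arcs. The Cesàro averages approach $\mu(A)\mu(B) = 1/9$, while the individual terms keep oscillating between 0 and 1/3, so ergodicity alone only gives convergence on average.

```python
import math


def arc_pieces(start, length):
    """Split the arc [start, start + length) mod 1, 0 <= length <= 1, into subintervals of [0, 1)."""
    start %= 1.0
    end = start + length
    if end <= 1.0:
        return [(start, end)]
    return [(start, 1.0), (0.0, end - 1.0)]


def arc_overlap(a, la, b, lb):
    """Lebesgue measure of the intersection of two arcs of the circle R/Z."""
    return sum(max(0.0, min(e1, e2) - max(s1, s2))
               for s1, e1 in arc_pieces(a, la)
               for s2, e2 in arc_pieces(b, lb))


alpha = math.sqrt(2) - 1  # illustrative irrational rotation number
third = 1.0 / 3.0         # A = B = [0, 1/3]


def correlation(i):
    """mu(R_alpha^{-i}(A) ∩ B), using R_alpha^{-i}(A) = A - i*alpha (mod 1)."""
    return arc_overlap(-i * alpha, third, 0.0, third)


n = 10_000
terms = [correlation(i) for i in range(n)]
cesaro = sum(terms) / n
print(cesaro, third * third)   # Cesàro average versus mu(A) mu(B) = 1/9
print(min(terms), max(terms))  # the individual terms keep oscillating
```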

Theorem 6.85. We have the following implications:

Bernoulli ⇒ mixing ⇒ weak mixing ⇒ ergodic ⇒ recurrent.
None of the reverse implications holds in general.

Proof. Bernoulli ⇒ mixing holds by Lemma 6.75. The classical example of why mixing automorphisms need not be two-sided Bernoulli comes from Ornstein [437]. For non-invertible systems, it is easier to find mixing systems that are not one-sided Bernoulli. For instance, typical $C^2$ expanding circle maps have this property; see [123].
Mixing ⇒ weak mixing is immediate from the definition. Conversely,
Examples 6.125, 6.121 and 6.124 all give weakly mixing systems that are not
strongly mixing.
Weak mixing ⇒ ergodic holds by Lemma 6.84²³. Conversely, irrational circle rotations are ergodic but not weakly mixing.
Ergodic ⇒ recurrent. If $B \in \mathcal B$ has positive mass, then $A := \bigcup_{i\in\mathbb N}T^{-i}(B)$ is $T$-invariant up to a set of measure 0; see the Poincaré Recurrence Theorem 6.11. By ergodicity, $\mu(A) = 1$, which is the definition of recurrence. Conversely, rational circle rotations are recurrent but not ergodic. 

There are many equivalent characterizations of weak mixing. The following refer mostly to the product system.

Theorem 6.86. Let $(X, \mathcal B, \mu, T)$ be a probability measure-preserving dynamical system. Then the following are equivalent:
(1) $(X, \mathcal B, \mu, T)$ is weak mixing.
(2) $\lim_{n\to\infty,\, n\notin E}\mu(T^{-n}A \cap B) = \mu(A)\mu(B)$ for all $A, B \in \mathcal B$ and a subset $E \subset \mathbb N$ of zero density.

23 A direct argument: Let $A = T^{-1}(A)$ be a measurable $T$-invariant set. Then by weak mixing, $\mu(A) = \frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap A) \to \mu(A)\mu(A) = \mu(A)^2$. This means that $\mu(A) = 0$ or $1$.

(3) $T \times T$ is weak mixing.
(4) $T \times S$ is ergodic on $X \times Y$ for every ergodic system $(Y, \mathcal C, \nu, S)$.
(5) $T \times T$ is ergodic.

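Proof. (1) ⇔ (2): Use Lemma 8.53 for $a_i = |\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)|$.
(2) ⇒ (3): For every $A, B, C, D \in \mathcal B$, there are subsets $E_1$ and $E_2$ of $\mathbb N$ of zero density such that
$$\lim_{n\to\infty,\, n\notin E_1}\mu(T^{-n}(A) \cap B) - \mu(A)\mu(B) = \lim_{n\to\infty,\, n\notin E_2}\mu(T^{-n}(C) \cap D) - \mu(C)\mu(D) = 0.$$
The union $E = E_1 \cup E_2$ still has density 0, and
$$\begin{aligned}
0 &\le \lim_{n\to\infty,\, n\notin E}\big|(\mu\times\mu)\big((T\times T)^{-n}(A\times C) \cap (B\times D)\big) - (\mu\times\mu)(A\times B)\cdot(\mu\times\mu)(C\times D)\big|\\
&= \lim_{n\to\infty,\, n\notin E}\big|\mu(T^{-n}(A) \cap B)\cdot\mu(T^{-n}(C) \cap D) - \mu(A)\mu(B)\mu(C)\mu(D)\big|\\
&\le \lim_{n\to\infty,\, n\notin E}\mu(T^{-n}(A) \cap B)\cdot\big|\mu(T^{-n}(C) \cap D) - \mu(C)\mu(D)\big|\\
&\qquad + \lim_{n\to\infty,\, n\notin E}\mu(C)\mu(D)\cdot\big|\mu(T^{-n}(A) \cap B) - \mu(A)\mu(B)\big| = 0.
\end{aligned}$$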


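(3) ⇒ (4): If $T \times T$ is weakly mixing, then so is $T$ itself. Suppose $(Y, \mathcal C, \nu, S)$ is an ergodic system; then for $A, B \in \mathcal B$ and $C, D \in \mathcal C$ we have
$$\mu(T^{-i}(A) \cap B)\,\nu(S^{-i}(C) \cap D) = \mu(A)\mu(B)\,\nu(S^{-i}(C) \cap D) + \big(\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)\big)\,\nu(S^{-i}(C) \cap D).$$
By ergodicity of $S$ (see Lemma 6.84), $\frac1n\sum_{i=0}^{n-1}\nu(S^{-i}(C) \cap D) \to \nu(C)\nu(D)$, so the Cesàro average of the first term tends to $\mu(A)\mu(B)\nu(C)\nu(D)$. The Cesàro average of the second term is majorized by $\frac1n\sum_{i=0}^{n-1}|\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)|$, which tends to 0 because $T$ is weak mixing. Since finite unions of such rectangles are dense in the product σ-algebra, Lemma 6.84 shows that $T \times S$ is ergodic.
(4) ⇒ (5): By assumption $T \times S$ is ergodic for the trivial map $S : \{0\} \to \{0\}$. Therefore $T$ itself is ergodic, and hence, applying (4) with $S = T$, $T \times T$ is ergodic.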
6.8. Spectral Properties 309

(5) ⇒ (1): First observe that ergodicity of $T \times T$ trivially implies ergodicity of $T$. Because of Lemma 8.53 it suffices to show that the limit
$$L := \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\big(\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)\big)^2 = 0.$$
Expanding the square gives
$$L = \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap B)^2 + (\mu(A)\mu(B))^2 - 2\mu(A)\mu(B)\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap B).$$
The first limit is $(\mu(A)\mu(B))^2$ by ergodicity of $T \times T$; indeed apply Lemma 6.84 to $(T\times T)^{-i}(A \times A)$ and $B \times B$. Hence
$$L = -2\mu(A)\mu(B)\lim_{n\to\infty}\Big(\frac1n\sum_{i=0}^{n-1}\mu(T^{-i}(A) \cap B) - \mu(A)\mu(B)\Big),$$
which is indeed zero by ergodicity of $T$ (use Lemma 6.84 again). 
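To see criterion (5) fail for a system that is ergodic but not weak mixing, consider again the rotation $R_\alpha$ (with the illustrative choice $\alpha = \sqrt2 - 1$). The Python sketch below checks that $\phi(x, y) = e^{2\pi i(x-y)}$ is a non-constant $R_\alpha \times R_\alpha$-invariant function, and that the quantity $L$ from the proof of (5) ⇒ (1), with $A = B = [0, 1/3]$, stays near $1/81$ instead of tending to 0.

```python
import cmath
import math

alpha = math.sqrt(2) - 1  # illustrative irrational rotation number
third = 1.0 / 3.0         # A = B = [0, 1/3]


def correlation(i):
    """mu(R_alpha^{-i}(A) ∩ B) for A = B = [0, 1/3], a tent function of x = i*alpha mod 1."""
    x = (i * alpha) % 1.0
    return max(0.0, third - x) + max(0.0, x - 2 * third)


def phi(x, y):
    """The non-constant function phi(x, y) = exp(2 pi i (x - y)) on the 2-torus."""
    return cmath.exp(2j * math.pi * (x - y))


# phi is invariant under R_alpha x R_alpha, so the product rotation is not ergodic
x, y = 0.123, 0.789
drift = abs(phi((x + alpha) % 1.0, (y + alpha) % 1.0) - phi(x, y))

# the quantity L from the proof of (5) => (1); its limit is 2/81 - 1/81 = 1/81, not 0
n = 10_000
L = sum((correlation(i) - third * third) ** 2 for i in range(n)) / n
print(drift, L, 1 / 81)
```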

6.8. Spectral Properties

For a measure space $(X, \mathcal B, \mu)$, the space $L^2(\mu)$ of complex-valued square-integrable observables $f, g \in L^2(\mu)$, equipped with inner product $(f, g) = \int_X f(x)\cdot\overline{g(x)}\, d\mu$, is a Hilbert space. Let $T$ be a measure-preserving transformation of $(X, \mathcal B, \mu)$. We restrict the Koopman operator (see Remark 6.4) to the Hilbert space $L^2(\mu)$: $U_Tf = f \circ T$ for all $f \in L^2(\mu)$. Then the corresponding transfer operator $\mathcal L_T$, defined via duality $\int_X \mathcal L_Tf\cdot g\, d\mu = \int_X f\cdot U_Tg\, d\mu$, fixes constant functions.

Definition 6.87. A dynamical system $(X, T)$ has a continuous eigenvalue $\lambda \in \mathbb C$ if there is a continuous eigenfunction $\varphi : X \to \mathbb C$ such that $U_T\varphi = \varphi \circ T = \lambda\varphi$. A measurable dynamical system $(X, \mathcal B, \mu, T)$ has a measurable eigenvalue $\lambda \in \mathbb C$ if there is an $L^2(\mu)$ eigenfunction $\varphi : X \to \mathbb C$ such that $U_T\varphi = \varphi \circ T = \lambda\varphi$ μ-a.e.

The spectral properties of a dynamical system $(X, \mathcal B, \mu, T)$ refer to the spectral properties (in particular, the eigenvalues and eigenfunctions, and their span) of the Koopman operator $U_T$. In particular, the spectral properties of isomorphic systems are the same.

The Koopman operator is unitary because $T$ preserves $\mu$. Indeed
$$(U_Tf, U_Tg) = \int_X f \circ T(x)\cdot\overline{g \circ T(x)}\, d\mu = \int_X (f\cdot\bar g) \circ T(x)\, d\mu = \int_X f\cdot\bar g\, d\mu = (f, g),$$
and therefore $U_T^*U_T = U_TU_T^* = I$. This has several consequences, common to all unitary operators.

Proposition 6.88. The spectrum $\sigma(U_T)$ of $U_T$ is a closed subgroup of the unit circle $\mathbb S^1$. Eigenvectors to different eigenvalues $\kappa \ne \lambda$ are orthogonal. If $\mu$ is ergodic, then the eigenfunctions have constant modulus and each eigenspace is one-dimensional.

In fact, if $\mu$ is ergodic, then $\sigma(U_T) = \mathbb S^1$, but for the proof of this result we refer to [429].

Proof. The spectrum of every operator is closed, and if $\kappa$ is the eigenvalue of eigenfunction $f$, then $(f, f) = (U_Tf, U_Tf) = (\kappa f, \kappa f) = |\kappa|^2(f, f)$, so $\kappa$ lies on the unit circle. If $\kappa, \lambda \in \mathbb S^1$ are eigenvalues with eigenfunctions $f$ and $g$, respectively, then
$$U_T(fg) = (fg) \circ T = (f \circ T)\cdot(g \circ T) = U_Tf\cdot U_Tg = \kappa\lambda(fg),$$
$$U_T(\bar f) = \bar f \circ T = \overline{f \circ T} = \overline{U_Tf} = \bar\kappa\bar f = \kappa^{-1}\bar f,$$
so the set of eigenvalues forms a multiplicative subgroup of the unit circle (and this carries over to the closure $\sigma(U_T)$). Additionally,
$$(f, g) = (U_Tf, U_Tg) = (\kappa f, \lambda g) = \kappa\bar\lambda(f, g),$$
and if $\kappa \ne \lambda$, then this can only hold if $f$ and $g$ are orthogonal.
Assume now that $\mu$ is ergodic, so by Corollary 6.7, the only eigenvectors of eigenvalue 1 are constant μ-a.e. If $f$ is the eigenfunction of eigenvalue $\kappa$, then $|f|$ is an eigenfunction of eigenvalue $|\kappa| = 1$, so $|f|$ is constant μ-a.e.; we can scale $|f| = 1$. If $g$ is another eigenfunction of $\kappa$, scaled so that $|g| = 1$, then $f/g$ is an eigenfunction of 1, so $f/g$ is constant μ-a.e., i.e. $g$ is a scalar multiple of $f$. 

Lemma 6.89. If $(Y, \mathcal C, \nu, S)$ is a measure-theoretic factor of $(X, \mathcal B, \mu, T)$ (with factor map $\pi$ and $\nu = \mu \circ \pi^{-1}$), then every eigenvalue of $S$ is also an eigenvalue of $T$.

In particular, the spectrum of $S$ is contained in the spectrum of $T$, and isomorphic systems have the same eigenvalues and spectrum.

Proof. Let $g$ be an eigenfunction of $S$, with eigenvalue $\lambda$. Then $f := g \circ \pi$ is an eigenvector of $T$, because
$$f \circ T = g \circ \pi \circ T = g \circ S \circ \pi = \lambda g \circ \pi = \lambda f \qquad \mu\text{-a.e.}$$
Hence $f$ is an eigenfunction of $(X, T, \mu)$ with the same eigenvalue $\lambda$. 

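6.8.1. Spectral Measures and Decompositions. Given a non-negative measure $\nu \in \mathcal M(\mathbb S^1)$ on the circle, the Fourier coefficients of $\nu$ are defined as
$$\hat\nu(n) = \int_{\mathbb S^1}z^n\, d\nu = \int_0^1 e^{2\pi inx}\, d\nu(e^{2\pi ix}).$$
For every sequence $(z_j)_{j\in\mathbb N}$ of complex numbers and $N \in \mathbb N$, we have
$$\begin{aligned}
\sum_{j,k=1}^N z_j\bar z_k\,\hat\nu(j-k) &= \sum_{j,k=1}^N\int_0^1 z_je^{2\pi ijx}\,\overline{z_ke^{2\pi ikx}}\, d\nu = \int_0^1\Big(\sum_{j=1}^N z_je^{2\pi ijx}\Big)\overline{\Big(\sum_{k=1}^N z_ke^{2\pi ikx}\Big)}\, d\nu\\
&= \int_0^1\Big|\sum_{j=1}^N z_je^{2\pi ijx}\Big|^2 d\nu \ge 0.
\end{aligned}$$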

This property of $(\hat\nu(n))_{n\in\mathbb Z}$ is called positive definiteness. Conversely, the Bochner-Herglotz Theorem (see e.g. [277, Chapter 5], [389], or [465, page 2]) states that for every positive definite sequence $(a_n)_{n\in\mathbb Z} \subset \mathbb C$, there is a unique non-negative measure $\nu \in \mathcal M(\mathbb S^1)$ such that $\hat\nu(n) = a_n$ for each $n$, and $\nu(\mathbb S^1) = \sup_n|a_n|$.
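Positive definiteness is easy to test numerically for a discrete measure $\nu = \sum_k w_k\delta_{z_k}$, for which $\hat\nu(n) = \sum_k w_kz_k^n$. The Python sketch below (the three atoms and their weights are arbitrary illustrative choices) evaluates the Hermitian form $\sum_{j,k}c_j\bar c_k\hat\nu(j-k)$ for random coefficients, compares it with $\int|\sum_jc_jz^j|^2\, d\nu$, and checks $|\hat\nu(n)| \le \hat\nu(0) = \nu(\mathbb S^1)$.

```python
import cmath
import random


def fourier_coefficient(atoms, n):
    """hat(nu)(n) = sum_k w_k z_k^n for nu = sum_k w_k delta_{z_k}, with |z_k| = 1."""
    return sum(w * z ** n for z, w in atoms)


def hermitian_form(atoms, coeffs):
    """sum_{j,k} c_j conj(c_k) hat(nu)(j - k); positive definiteness says this is >= 0."""
    N = len(coeffs)
    return sum(coeffs[j] * coeffs[k].conjugate() * fourier_coefficient(atoms, j - k)
               for j in range(N) for k in range(N))


def direct_integral(atoms, coeffs):
    """int |sum_j c_j z^j|^2 d nu(z), the right-hand side of the computation above."""
    return sum(w * abs(sum(c * z ** j for j, c in enumerate(coeffs))) ** 2
               for z, w in atoms)


# hypothetical discrete measure on the circle: three atoms, total mass 1
atoms = [(cmath.exp(2j * cmath.pi * 0.1), 0.5),
         (cmath.exp(2j * cmath.pi * 0.37), 0.3),
         (1 + 0j, 0.2)]
random.seed(0)
coeffs = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(8)]
q = hermitian_form(atoms, coeffs)
print(q, direct_integral(atoms, coeffs))  # q is real up to rounding and equals the integral
```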

Let $(X, \mathcal B, \mu, T)$ be an invertible dynamical system. Given a function $f \in L^2(\mu)$, the sequence $a_n := (U_T^nf, f) = \int_X f \circ T^n\,\bar f\, d\mu$ is positive definite:
$$\sum_{j,k=1}^N z_j\bar z_k\,a_{j-k} = \sum_{j,k=1}^N z_j\bar z_k\,(U_T^{j-k}f, f) = \sum_{j,k=1}^N(z_jU_T^jf, z_kU_T^kf) = \Big(\sum_{j=1}^N z_jU_T^jf, \sum_{k=1}^N z_kU_T^kf\Big) = \Big\|\sum_{j=1}^N z_jU_T^jf\Big\|^2 \ge 0.$$
Therefore the Bochner-Herglotz Theorem associates a non-negative measure $\nu_f \in \mathcal M(\mathbb S^1)$ to $f$, called the spectral measure of $f$.
Remark 6.90. If $U$ is an invertible unitary operator, then
$$\hat\nu_f(-n) = (U^{-n}f, f) = (U^{-n}f, U^{-n}U^nf) = (f, U^nf) = \overline{(U^nf, f)} = \overline{\hat\nu_f(n)}$$
for every $n \in \mathbb N$. Therefore it makes sense to define $\hat\nu_f(-n) := \overline{\hat\nu_f(n)}$ also for non-invertible unitary operators. Most of the theory remains valid.

Remark 6.91. For the Koopman operator $U_T$ of an invertible dynamical system $(X, \mathcal B, \mu, T)$, the Fourier coefficients
$$\hat\nu_f(n) = (U_T^nf, f) = \int_X f \circ T^n\,\bar f\, d\mu$$
are the autocorrelation coefficients of the observable $f \in L^2(\mu)$. If $\mu$ is mixing, then $\hat\nu_f(n) \to 0$ for every $f \in L^2(\mu)$ with $\int_X f\, d\mu = 0$. Conversely, if $\nu_f \ll \mathrm{Leb}$, then by the Riemann-Lebesgue Lemma, these Fourier coefficients, i.e. the autocorrelation coefficients of $f$, tend to 0. In fact, the correlation coefficients $(U_T^nf, g) = \int_X f \circ T^n\,\bar g\, d\mu$ of two observables $f, g \in L^2(\mu)$ are the Fourier coefficients of a complex measure $\nu_{f,g}$. (This is an application of a more general version of the Bochner-Herglotz Theorem.)

Suppose that a unitary operator $U$ acts on a Hilbert space $H$. We can decompose $H$ into subspaces that are the linear spans of $U$-orbits of well-chosen functions in $H$; see [465, Theorem II.4]:
Theorem 6.92. Let $U$ be an invertible unitary operator acting on a separable Hilbert space $H$. Then there is a (possibly finite) sequence of functions $h_j \in H$ such that
$$\begin{cases} H = \bigoplus_j\overline{\mathrm{Span}}(U^nh_j : n \in \mathbb Z),\\ \mathrm{Span}(U^nh_j : n \in \mathbb Z) \perp \mathrm{Span}(U^nh_k : n \in \mathbb Z) & \text{if } j \ne k.\end{cases} \tag{6.34}$$
The corresponding spectral measures satisfy $\nu_{h_1} \gg \nu_{h_2} \gg \nu_{h_3} \gg \cdots$. Moreover, if another sequence $(h'_j)$ satisfies (6.34) with the same chain of absolute continuity, then $\nu_{h'_j} \sim \nu_{h_j}$ for each $j$.
Definition 6.93. The spectral measure $\nu_{h_1}$ of the leading function $h_1$ in (6.34) is called the maximal spectral type. If $U = U_T$ is the Koopman operator of an invertible dynamical system, then we call $\nu_{h_1}$ the spectral measure of $T$ and we will denote it as $\nu_T$.
Example 6.94. If $f$ is an eigenfunction of $U_T$ to eigenvalue $\lambda$ scaled so that $\|f\|_2 = 1$, then $\nu_f = \delta_\lambda$ is the Dirac measure at the eigenvalue. Indeed,
$$\hat\delta_\lambda(n) = \int z^n\, d\delta_\lambda = \lambda^n = (\lambda^nf, f) = (U_T^nf, f).$$
For each eigenfunction $f$, $\mathrm{Span}(U_T^nf : n \in \mathbb Z) =: \mathrm{Span}(f)$ is only a one-dimensional subspace. However, the Kronecker factor
$$H_{pp} := \overline{\mathrm{Span}}(f : U_Tf = \lambda f)$$
can be as large as the whole Hilbert space $L^2(\mu)$.
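For the rotation $R_\alpha$ and $f(x) = e^{2\pi ix}$, Example 6.94 predicts $(U_T^nf, f) = \lambda^n$ with $\lambda = e^{2\pi i\alpha}$. The Python sketch below (with the illustrative choice $\alpha = \sqrt2 - 1$) approximates the integral by a midpoint Riemann sum; since $f(x + n\alpha)\overline{f(x)} = e^{2\pi in\alpha}$ pointwise, the sum agrees with $\lambda^n$ up to rounding.

```python
import cmath
import math

alpha = math.sqrt(2) - 1               # illustrative rotation number
lam = cmath.exp(2j * math.pi * alpha)  # eigenvalue of U_T for f(x) = exp(2 pi i x)


def f(x):
    return cmath.exp(2j * math.pi * x)


def autocorrelation(n, grid=1000):
    """(U_T^n f, f) = int_0^1 f(R_alpha^n x) conj(f(x)) dx, by a midpoint Riemann sum."""
    total = 0j
    for k in range(grid):
        x = (k + 0.5) / grid
        total += f((x + n * alpha) % 1.0) * f(x).conjugate()
    return total / grid


for n in range(4):
    print(n, autocorrelation(n), lam ** n)  # the two columns agree: nu_f = delta_lambda
```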

Using $\nu_T$, we can also give a (continuous) decomposition of $U_T$ in orthogonal projections, called the spectral decomposition; see [298]. For a fixed eigenfunction $f$ (with eigenvalue $\lambda \in \mathbb S^1$), we let $\Pi_\lambda : L^2(\mu) \to L^2(\mu)$ be the orthogonal projection onto the span of $f$. More generally, if $S \subset \sigma(U_T)$, we define $\Pi_S$ as the orthogonal projection on the largest closed subspace $V$ such that $U_T|_V$ has spectrum contained in $S$. As with any orthogonal projection, they have the following properties:
• $\Pi_S^2 = \Pi_S$ ($\Pi_S$ is idempotent).
• $\Pi_S^* = \Pi_S$ ($\Pi_S$ is self-adjoint).
• $\Pi_S\Pi_{S'} = 0$ if $S \cap S' = \emptyset$.
• The kernel $\ker(\Pi_S) = V^\perp$, the orthogonal complement of $V$.

Theorem 6.95 (Spectral Decomposition of Unitary Operators). There is a measure $\nu_T$ on $\mathbb S^1$ such that
$$U_T = \int_{\sigma(U_T)}\lambda\,\Pi_\lambda\, d\nu_T(\lambda),$$
and $\nu_T(\{\lambda\}) \ne 0$ if and only if $\lambda$ is an eigenvalue of $U_T$. Using the above properties of orthogonal projections, we also get
$$U_T^n = \int_{\sigma(U_T)}\lambda^n\,\Pi_\lambda\, d\nu_T(\lambda).$$
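In finite dimensions the integral in Theorem 6.95 becomes a finite sum over eigenvalues. As an illustrative analogue (not the general theorem), the Python sketch below takes the Koopman operator of the rotation $k \mapsto k+1$ on $\mathbb Z/N$, whose eigenvalues are the $N$-th roots of unity $\omega^m$ with eigenvectors $e_m(k) = \omega^{mk}/\sqrt N$, and checks that $U^nv = \sum_m\omega^{mn}\Pi_mv$. The size $N = 6$ and the test vector are arbitrary choices.

```python
import cmath

N = 6                                # size of the cyclic group Z/N (illustrative)
omega = cmath.exp(2j * cmath.pi / N)


def koopman(v, n=1):
    """(U^n v)(k) = v(k + n mod N): Koopman operator of the rotation k -> k + 1 on Z/N."""
    return [v[(k + n) % N] for k in range(N)]


def eigvec(m):
    """Normalised eigenvector e_m(k) = omega^(m k) / sqrt(N), with eigenvalue omega^m."""
    return [omega ** (m * k) / N ** 0.5 for k in range(N)]


def inner(u, v):
    """Inner product (u, v) = sum_k u_k conj(v_k)."""
    return sum(a * b.conjugate() for a, b in zip(u, v))


def spectral_power(v, n):
    """sum_m omega^(m n) Pi_m v, where Pi_m is the orthogonal projection onto span(e_m)."""
    out = [0j] * N
    for m in range(N):
        e = eigvec(m)
        c = inner(v, e) * omega ** (m * n)
        out = [o + c * ek for o, ek in zip(out, e)]
    return out


v = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5]
for n in range(3):
    lhs, rhs = koopman(v, n), spectral_power(v, n)
    print(n, max(abs(a - b) for a, b in zip(lhs, rhs)))  # rounding-level differences only
```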

6.8.2. Weak Mixing Revisited. Although not immediately apparent from the definition, the most important aspect of the notion of weak mixing is that it excludes the existence of eigenfunctions other than constant functions (with eigenvalue 1).

Lemma 6.96. A probability measure-preserving transformation $(X, \mathcal B, \mu, T)$ is weakly mixing if and only if the Koopman operator $U_T$ has no measurable eigenfunctions other than constants.

Proof. ⇒: Assume that $(X, \mathcal B, \mu, T)$ is weakly mixing. By Theorem 6.86, the product system $T \times T$ is ergodic. Suppose that $f$ is an eigenfunction with eigenvalue $\kappa$. Write $\phi(x, y) = f(x)\overline{f(y)}$. Then
$$\phi \circ (T\times T)(x, y) = \phi(Tx, Ty) = f(Tx)\overline{f(Ty)} = |\kappa|^2\phi(x, y) = \phi(x, y),$$
because $|\kappa| = 1$ by Proposition 6.88. Hence $\phi$ is $T \times T$-invariant. By ergodicity of $T \times T$, $\phi$ must be constant μ×μ-a.e. But then $f$ must also be constant μ-a.e.
⇐: The other direction relies on spectral theory of unitary operators. If $\phi$ is an eigenfunction of $U_T$, then by assumption, $\phi$ is constant, so the eigenvalue is 1. Let $V = \mathrm{Span}(\phi)$ and let $\Pi_1$ be the orthogonal projection onto $V$; clearly $V^\perp = \{f \in L^2(\mu) : \int f\, d\mu = 0\}$. One can derive that the spectral measure $\nu_T$ cannot have any atoms, except possibly at $\lambda = 1$ (corresponding to $\Pi_1$).

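Now take $f \in V^\perp$ and $g \in L^2(\mu)$ arbitrary. Using the Spectral Theorem 6.95, we have
$$\begin{aligned}
\frac1n\sum_{i=0}^{n-1}|(U_T^if, g)|^2 &= \frac1n\sum_{i=0}^{n-1}\Big|\int_{\sigma(U_T)}\lambda^i(\Pi_\lambda f, g)\, d\nu_T(\lambda)\Big|^2\\
&= \frac1n\sum_{i=0}^{n-1}\int_{\sigma(U_T)}\lambda^i(\Pi_\lambda f, g)\, d\nu_T(\lambda)\ \overline{\int_{\sigma(U_T)}\kappa^i(\Pi_\kappa f, g)\, d\nu_T(\kappa)}\\
&= \frac1n\sum_{i=0}^{n-1}\iint_{\sigma(U_T)\times\sigma(U_T)}\lambda^i\bar\kappa^i(\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa)\\
&= \iint_{\sigma(U_T)\times\sigma(U_T)}\frac1n\sum_{i=0}^{n-1}(\lambda\bar\kappa)^i\,(\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa)\\
&= \iint_{\sigma(U_T)\times\sigma(U_T)}\frac1n\,\frac{1-(\lambda\bar\kappa)^n}{1-\lambda\bar\kappa}\,(\Pi_\lambda f, g)\overline{(\Pi_\kappa f, g)}\, d\nu_T(\lambda)\, d\nu_T(\kappa).
\end{aligned}$$
In the final line we used that the diagonal $\{\lambda = \kappa\}$ has $\nu_T\times\nu_T$-measure zero, because $\nu_T$ is non-atomic (except possibly the atom at $\lambda = 1$, but then $\Pi_1f = 0$). Now $\frac1n\frac{1-(\lambda\bar\kappa)^n}{1-\lambda\bar\kappa}$ is bounded by 1 in absolute value (it is an average of $n$ numbers of modulus 1) and tends to 0 for $\lambda \ne \kappa$, so by the Bounded Convergence Theorem, we have
$$\lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}|(U_T^if, g)|^2 = 0.$$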

By Corollary 8.54, also $\lim_n\frac1n\sum_{i=0}^{n-1}|(U_T^if, g)| = 0$ (i.e. without the square).

Finally, if $f \in L^2(\mu)$ is arbitrary, then $f - (f, 1) \in V^\perp$. We find
$$\begin{aligned}
0 &= \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}|(U_T^i(f - (f, 1)), g)| = \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}|(U_T^if - (f, 1), g)|\\
&= \lim_{n\to\infty}\frac1n\sum_{i=0}^{n-1}|(U_T^if, g) - (f, 1)(1, g)|.
\end{aligned}$$
Take $f = 1_A$, $g = 1_B$ to get the definition of weak mixing. 
Example 6.97. Circle rotations $R_\alpha$, of any rotation angle $\alpha \in [0, 1)$, are neither mixing nor weakly mixing. To prove non-mixing, set $A = B = [0, 1/3]$. There are infinitely many $n$ such that $R_\alpha^{-n}(A) \cap B \supset [0, 1/4]$, so $\mu(R_\alpha^{-n}(A) \cap B) \ge 1/4$ for these $n$, and hence $\mu(R_\alpha^{-n}(A) \cap B) \not\to \mu(A)\mu(B) = 1/9$. Furthermore, $R_\alpha$ has a non-constant eigenfunction $\psi : \mathbb S^1 \to \mathbb C$ defined as $\psi(x) = e^{2\pi ix}$ because $\psi \circ R_\alpha(x) = e^{2\pi i(x+\alpha)} = e^{2\pi i\alpha}\psi(x)$. Therefore $R_\alpha$ is not weakly mixing.
Since the Sturmian shift with rotation number $\alpha \in [0, 1] \setminus \mathbb Q$ (with its unique invariant probability measure) is isomorphic to $(\mathbb S^1, R_\alpha, \mu)$, the absence of (weak) mixing carries over to the Sturmian shift.
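The two claims of Example 6.97 can be checked numerically. In the Python sketch below, the golden-mean angle $\alpha = (\sqrt5 - 1)/2$ is an illustrative choice: we list times $n$ with $n\alpha \bmod 1 \in [0, 1/12]$, at which $R_\alpha^{-n}(A) \supset [0, 1/4]$ for $A = [0, 1/3]$, verify that the correlation is at least $1/4$ there, and confirm the eigenfunction relation $\psi \circ R_\alpha = e^{2\pi i\alpha}\psi$.

```python
import cmath
import math

alpha = (math.sqrt(5) - 1) / 2  # illustrative irrational rotation number (golden mean)
third = 1.0 / 3.0               # A = B = [0, 1/3]


def correlation(n):
    """mu(R_alpha^{-n}(A) ∩ B) for A = B = [0, 1/3], with x = n*alpha mod 1."""
    x = (n * alpha) % 1.0
    return max(0.0, third - x) + max(0.0, x - 2 * third)


# times n with n*alpha mod 1 in [0, 1/12]: then R_alpha^{-n}(A) = [-x, 1/3 - x] contains [0, 1/4]
good = [n for n in range(1, 2001) if (n * alpha) % 1.0 <= 1 / 12]
worst = min(correlation(n) for n in good)
print(len(good), good[:5], worst)  # worst >= 1/4, far from mu(A) mu(B) = 1/9


def psi(x):
    """The eigenfunction psi(x) = exp(2 pi i x) of Example 6.97."""
    return cmath.exp(2j * math.pi * x)


x = 0.3
print(abs(psi((x + alpha) % 1.0) - cmath.exp(2j * math.pi * alpha) * psi(x)))  # rounding-level
```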

6.8.3. Pure Point Spectra. The spectral measure of $T$ decomposes as
$$\nu_T = \nu_{pp} + \nu_{ac} + \nu_{sing}$$
where the following hold:
• $\nu_{pp}$ is the discrete or pure point part of $\nu_T$. It is an at most countable linear combination of Dirac measures, namely at every eigenvalue, so in particular at $\lambda = 1$. For weak mixing transformations $\nu_{pp} = c\delta_1$ for some $c \in (0, 1]$.
• $\nu_{ac}$ is absolutely continuous w.r.t. Lebesgue measure.
• $\nu_{sing}$ is non-atomic but singular w.r.t. Lebesgue measure.
The parts $\nu_{ac} + \nu_{sing} = \nu_{cont}$ together are called the continuous part of the spectral measure. It follows from a result by Wiener [558] that $\nu = \nu_{cont}$ if and only if the averages of Fourier coefficients $\frac{1}{2N+1}\sum_{n=-N}^{N}|\hat\nu(n)| \to 0$.

Definition 6.98. A measure-preserving dynamical system $(X, \mathcal B, \mu, T)$ is said to have pure point spectrum (also called discrete spectrum) if the collection of eigenfunctions of the Koopman operator $U_T$ spans $L^2(\mu)$. That is, the Kronecker factor is $L^2(\mu)$. Equivalently, the spectral measure $\nu_T = \nu_{pp}$ is a countable linear combination of Dirac measures.

As we have seen in Proposition 6.88, the eigenvalues of $U_T$ form an (in case of pure point spectrum countable) subgroup of $\mathbb S^1$. Von Neumann [299] proved that this group is a complete isomorphism invariant among the measure-preserving dynamical systems with pure point spectrum; see [550, Theorem 3.4]²⁴:
Theorem 6.99. Two measure-preserving dynamical systems with pure point
spectra are isomorphic if and only if their eigenvalues are the same.

The structure theorem by Halmos & von Neumann [299] associates pure point spectrum to group rotations:
Theorem 6.100. An ergodic probability measure-preserving system $(X, \mathcal B, \mu, T)$ on a compact metric space has pure point spectrum if and only if it is isomorphic to a rotation on a compact metrizable abelian group $G$ with Haar measure $\mu_G$, so there is $g_0 \in G$ such that $Tx = \varphi^{-1}(\varphi(x) + g_0)$, where $\varphi : X \to G$ is the isomorphism.

Examples of such group rotations are rotations on the torus $\mathbb T^d$ or addition on an odometer $\Sigma_p$, or (skew-)products of these. We give a proof following Kůrka [381, Theorem 2.55], who, however, works with continuous eigenfunctions only.
24 Walters uses the word conjugate for what is called isomorphic in this text. Conjugate in this text is topologically conjugate in Walters's book.


Proof. ⇒: Suppose $(f_n)_{n\in\mathbb N}$ is a system of continuous eigenfunctions of $U_T$, spanning $L^2(\mu)$ and scaled so that $|f_n| = 1$. Let $d$ be the metric of $X$, but we define a new metric $\rho$ as
$$\rho(x, y) = \sum_{n\in\mathbb N}2^{-n}|f_n(x) - f_n(y)|.$$
The triangle inequality is easily checked. For all $x \ne y \in X$, there is a function $g \in \mathrm{Span}(f_n : n \in \mathbb N)$ such that $g(x) \ne g(y)$, so there must be some $n \in \mathbb N$ such that $f_n(x) \ne f_n(y)$, so $\rho(x, y) \ne 0$. Since the eigenvalues $\lambda_n$ of the $f_n$'s all lie on the unit circle, we have
$$\rho(T(x), T(y)) = \sum_n 2^{-n}|f_n(T(x)) - f_n(T(y))| = \sum_n 2^{-n}|\lambda_n|\,|f_n(x) - f_n(y)| = \rho(x, y),$$
so $T$ is an (invertible!) isometry w.r.t. $\rho$. The identity map $\mathrm{Id} : (X, d) \to (X, \rho)$ is continuous. To show this, take $\varepsilon > 0$ arbitrary and $N$ such that $\sum_{n>N}2^{-n} \le \varepsilon/4$ (recall that $|f_n(x) - f_n(y)| \le 2$). Also using uniform continuity of the eigenfunctions $f_n$, we can choose $\delta > 0$ such that $d(x, y) < \delta$ implies that $|f_n(x) - f_n(y)| < \varepsilon/2$ for all $n \le N$. Therefore
$$\rho(x, y) = \sum_{n=1}^N\frac{1}{2^n}|f_n(x) - f_n(y)| + \sum_{n>N}\frac{1}{2^n}|f_n(x) - f_n(y)| < \sum_{n=1}^N\frac{1}{2^n}\frac{\varepsilon}{2} + \frac{\varepsilon}{2} < \varepsilon.$$
Thus $(X, \rho)$, as a continuous image of the compact space $(X, d)$, is compact itself. Also $T$ is assumed to be transitive, so by Exercise 2.28 it is minimal (and in fact uniformly rigid; see Lemma 2.30). It remains to give $(X, \rho)$ a group structure. Fix $x_0 \in X$, and define a homomorphism $h : \mathbb Z \to \mathrm{orb}(x_0)$ by $h(n) = T^n(x_0)$. Since $T$ is an isometry on $(X, \rho)$, it is easy to check that the addition on $\mathbb Z$ transfers to a uniformly continuous action on $\mathrm{orb}(x_0)$ and $T(x) = h(h^{-1}(x) + 1)$. But $\overline{\mathrm{orb}(x_0)} = X$, so this action extends continuously to $X$ and the group $G$ is the compactification of $\mathbb Z$ in the topology that $\mathbb Z$ inherits from $(X, \rho)$ via $h^{-1}$.
⇐: Let $\hat G$ be the group of characters of $G$; i.e. each $\gamma \in \hat G$ is a continuous function $\gamma : G \to \mathbb S^1$ such that $\gamma(g_1 + g_2) = \gamma(g_1)\gamma(g_2)$ for all $g_1, g_2 \in G$. Define $\hat T : G \to G$ as $\hat T(g) = g + g_0$, so $\varphi \circ T = \hat T \circ \varphi$ μ-a.e. Then
$$\gamma(\hat T(g)) = \gamma(g + g_0) = \gamma(g_0)\cdot\gamma(g) = \lambda\gamma(g) \qquad \text{for } \lambda = \gamma(g_0) \in \mathbb S^1,$$
so each character is an eigenfunction. The linear span $\mathrm{Span}(\gamma : \gamma \in \hat G)$ is an algebra of complex-valued continuous functions, containing the constant function (because $\gamma \equiv 1$ is also a character), closed under complex conjugation (because $\bar\gamma = \gamma^{-1} \in \hat G$), and separating the points of $G$. The Theorem of Stone-Weierstraß, see e.g. [56, Theorem 20.45], implies that $\mathrm{Span}(\gamma : \gamma \in \hat G)$ is dense in $C(G, \mathbb C)$ and therefore also in $L^2(\mu_G)$. Hence $(G, \mu_G, \hat T)$ has pure point spectrum and so has the isomorphic system $(X, \mu, T)$. 
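As a concrete instance of the ⇐ direction, consider the dyadic odometer $\Sigma_2$ (the group of 2-adic integers, rotated by $g_0 = 1$). Its characters include $\gamma_{m,n}(x) = \exp\big(2\pi im(x_0 + 2x_1 + \dots + 2^{n-1}x_{n-1})/2^n\big)$, with eigenvalue $e^{2\pi im/2^n}$. The Python sketch below checks the eigenfunction relation on a random point; the digit convention (least significant digit first), the truncation to a finite prefix and the function names are our assumptions.

```python
import cmath
import random


def add_one(x):
    """Odometer map on a finite prefix x = [x_0, x_1, ...] of a 2-adic integer: add 1 with carry."""
    y = list(x)
    for j in range(len(y)):
        if y[j] == 0:
            y[j] = 1
            return y
        y[j] = 0  # 1 + 1 = 0, carry into the next digit
    return y      # a carry beyond the prefix is dropped; it does not affect the digits kept


def character(x, n, m=1):
    """gamma(x) = exp(2 pi i m (x_0 + 2 x_1 + ... + 2^(n-1) x_(n-1)) / 2^n); needs n <= len(x)."""
    value = sum(d << j for j, d in enumerate(x[:n]))
    return cmath.exp(2j * cmath.pi * m * value / 2 ** n)


random.seed(1)
x = [random.randint(0, 1) for _ in range(20)]
for n in range(1, 5):
    lam = cmath.exp(2j * cmath.pi / 2 ** n)  # eigenvalue gamma(g_0) for g_0 = 1
    print(n, abs(character(add_one(x), n) - lam * character(x, n)))  # rounding-level
```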
Corollary 6.101. Let (X, T ) be a continuous dynamical system on a com-
pact metric space. If T has pure point spectrum for each of its invariant
measures, then htop (T ) = 0.

Proof. Since a group rotation is an isometry, it has zero topological entropy, and hence zero entropy w.r.t. Haar measure. By Theorem 6.100, and since isomorphisms preserve entropy, each ergodic $T$-invariant measure therefore has zero entropy. Now use the Variational Principle 6.63. □
Definition 6.102. We say that $U$ has simple spectrum if there is a single $h \in L^2(\mu)$ such that $\overline{\mathrm{Span}}(U_T^n h : n \in \mathbb Z) = L^2(\mu)$. In other words, for the decomposition of Theorem 6.92, the sequence $(h_j)$ consists of a single function $h$.

Dynamical systems with simple spectrum are necessarily ergodic [429].

A theorem by von Neumann [431] says that pure point spectrum implies simple spectrum. We will not prove this theorem in full but only give the special case in Theorem 6.104 for circle rotations, where the eigenfunctions are the Fourier modes. This special case is representative of the general case, because the eigenfunctions form a complete orthogonal system when the spectrum is pure point.
A result going back to Fomin [250] states that every equicontinuous system²⁵ has pure point spectrum. A measure-theoretic version of equicontinuity is in fact enough; see [323]. Recently, this result was extended to mean equicontinuity in [263, 394]:
Theorem 6.103. If (X, T ) is mean equicontinuous, then it has pure point
spectrum w.r.t. each of its ergodic invariant measures.

This was again strengthened in [269] to μ-mean equicontinuous systems with ergodic $T$-invariant measures $\mu$: $(X,T)$ is μ-mean equicontinuous if and only if $T$ has pure point spectrum w.r.t. $\mu$. The condition that $\mu$ is ergodic was later removed in [322]; see also [263, Theorem 1.4]. For invertible maps²⁶, mean equicontinuity was shown to be equivalent to $(X,T)$ having pure point spectrum; see [263, Corollary 1.6].
25 Recall that they are uniquely ergodic; see Theorem 6.20.
26 Recall from Corollary 2.36 that equicontinuous surjections are invertible, but mean equicontinuous maps need not be invertible.
318 6. Methods from Ergodic Theory

6.8.4. Examples of Pure Point Spectra. That the dynamical systems in this section have pure point spectrum can now be seen as a special case of Theorem 6.99 or Theorem 6.103, but the specific proofs may be instructive, not least because we omit the proofs of Theorems 6.99 and 6.103 as well as Fomin's original proof.

Theorem 6.104. A Sturmian shift with rotation number $\alpha \notin \mathbb Q$ has pure point spectrum, with eigenvalues $e^{2\pi i\alpha n}$, $n \in \mathbb Z$. It also has simple spectrum.

Proof. Since the Sturmian shift with frequency $\alpha$ (with its unique invariant probability measure) is isomorphic to $(\mathbb S^1, \mathcal B, \mathrm{Leb}, R_\alpha)$, it suffices to consider the irrational circle rotation; it preserves Lebesgue measure Leb, so we take $\mu$ to be Lebesgue measure. The Koopman operator $U_{R_\alpha}$ has eigenfunctions $\psi_n : \mathbb S^1 \to \mathbb C$ defined as $\psi_n(x) = e^{2\pi n i x}$, $n \in \mathbb Z$, because
$$\psi_n\circ R_\alpha(x) = e^{2\pi n i(x + \alpha \bmod 1)} = e^{2\pi n i\alpha}\,\psi_n(x).$$
But the $(\psi_n)_{n\in\mathbb Z}$ form the standard basis of Fourier modes spanning $L^2(\mu)$, so $U_{R_\alpha}$ has pure point spectrum.
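Now for the simple spectrum part: irrational rotations $R_\alpha$ indeed have simple spectrum, but the Fourier modes, i.e. the eigenfunctions, do not play the role of $h$. Quite the opposite: take $h \in L^2(\mu)$ such that $\hat h(n) = \int_0^1 h(x)e^{2\pi i n x}\,dx \neq 0$ for all $n \in \mathbb Z$. We show that the orthogonal complement $\mathrm{Span}(U_T^n h : n \in \mathbb Z)^\perp = \{0\}$. Indeed, suppose that $g \in L^2(\mu)$ satisfies $g \perp U_T^n h$ for all $n \in \mathbb Z$. Write $h = \sum_j \hat h(j)e^{-2\pi i j x}$ and $g = \sum_k \hat g(k)e^{-2\pi i k x}$, where both sequences of Fourier coefficients belong to $\ell^2(\mathbb C)$. Then
$$\begin{aligned} 0 = (U_T^n h, g) &= \Big(U_T^n\Big(\sum_{j\in\mathbb Z}\hat h(j)e^{-2\pi i j x}\Big), \sum_{k\in\mathbb Z}\hat g(k)e^{-2\pi i k x}\Big) = \Big(\sum_{j\in\mathbb Z}\hat h(j)e^{-2\pi i j(x+n\alpha)}, \sum_{k\in\mathbb Z}\hat g(k)e^{-2\pi i k x}\Big)\\ &= \sum_{j,k\in\mathbb Z}e^{-2\pi i j n\alpha}\big(\hat h(j)e^{-2\pi i j x}, \hat g(k)e^{-2\pi i k x}\big) = \sum_{j\in\mathbb Z}\hat h(j)\overline{\hat g(j)}\,e^{-2\pi i j n\alpha}.\end{aligned}$$
Hence, if we abbreviate $\zeta = e^{-2\pi i\alpha} \in \mathbb S^1$ and $(b_j) = (\hat h(j)\overline{\hat g(j)})$, which belongs to $\ell^1(\mathbb C)$ by the Cauchy-Schwarz inequality, then $\sum_j b_j\zeta^{nj} = 0$ for each $n \in \mathbb Z$. The function $b(z) := \sum_{j\in\mathbb Z} b_j z^j \in L^2(\mu)$ is therefore continuous on $\mathbb S^1$. But since $\{\zeta^n\}_{n\in\mathbb Z}$ is dense in $\mathbb S^1$ and $b(\zeta^n) = 0$, $b(z)$ is identically zero. By the Parseval identity, $\sum_j |b_j|^2 = \|b\|_2^2 = 0$, and therefore $b_j = \hat h(j)\overline{\hat g(j)} = 0$ for each $j$. But $\hat h(j) \neq 0$, so $\hat g(j) = 0$ for each $j \in \mathbb Z$ and $g(x) \equiv 0$ as well. □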
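Both ingredients of this proof can be checked numerically. The sketch below is a hypothetical illustration, not taken from the text. It verifies the eigenfunction identity $\psi_n\circ R_\alpha = e^{2\pi i n\alpha}\psi_n$ at a few sample points. It then computes Fourier coefficients, with the sign convention $\hat h(n) = \int_0^1 h(x)e^{2\pi i n x}\,dx$ used above, of the Poisson kernel $h(x) = \frac{1-r^2}{1 - 2r\cos(2\pi x) + r^2}$. Its coefficients $\hat h(n) = r^{|n|}$ are all nonzero, so such an $h$ is a candidate for the generating function in the simple spectrum argument. The choice of $h$, of $r = 1/2$, of $\alpha$ and of a Riemann-sum quadrature are our assumptions.

```python
import cmath
import math

ALPHA = math.sqrt(2) - 1  # hypothetical irrational rotation number
R = 0.5                   # hypothetical Poisson-kernel parameter, 0 < R < 1


def psi(n: int, x: float) -> complex:
    """Fourier mode psi_n(x) = exp(2*pi*i*n*x)."""
    return cmath.exp(2j * math.pi * n * x)


def eigen_defect(n: int, x: float) -> float:
    """|psi_n(R_alpha x) - e^{2 pi i n alpha} psi_n(x)|, which should be ~0."""
    lhs = psi(n, (x + ALPHA) % 1.0)
    rhs = cmath.exp(2j * math.pi * n * ALPHA) * psi(n, x)
    return abs(lhs - rhs)


def poisson(x: float, r: float = R) -> float:
    """Poisson kernel sum_n r^|n| e^{2 pi i n x}, written in closed form."""
    return (1 - r * r) / (1 - 2 * r * math.cos(2 * math.pi * x) + r * r)


def fourier_coeff(h, n: int, m: int = 512) -> complex:
    """Riemann-sum approximation of int_0^1 h(x) e^{2 pi i n x} dx."""
    return sum(h(k / m) * psi(n, k / m) for k in range(m)) / m


if __name__ == "__main__":
    print(max(eigen_defect(n, x) for n in range(-5, 6) for x in (0.0, 0.3, 0.77)))
    # These coefficients should be close to R**abs(n), hence nonzero.
    print([abs(fourier_coeff(poisson, n)) for n in range(-3, 4)])
```

For a smooth periodic integrand the equispaced Riemann sum is extremely accurate, which is why a few hundred sample points suffice here.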

Theorem 6.105. Every odometer has pure point spectrum.

Proof. Let $(X,a)$ be the odometer, with $a$-invariant measure $\mu$. Set $q_j = p_1p_2\cdots p_j$. Recall that the $j$-cylinder $Z[0^j] = [0\dots0]$ ($j$ zeros) is periodic with period $q_j$. Abbreviate $Z_j^n = a^n(Z[0^j])$. Now $\lambda_{kj} := e^{2\pi i k/q_j}$ for $0 \le k < q_j$ is an eigenvalue, because we can construct a corresponding eigenfunction $f_{j,k}$ of the Koopman operator as $f_{j,k}|_{Z_j^n} = e^{2\pi i n k/q_j}$; indeed $a(Z_j^n) = Z_j^{n+1}$, so $f_{j,k}\circ a = \lambda_{kj}f_{j,k}$. In particular, as shown in Proposition 6.88, $(f_{j,k})_{j\in\mathbb N, 0\le k<q_j}$ forms an orthogonal system. Also the $L^2(\mu)$-norms $\|f_{j,k}\|_2 = 1$ for all $j \in \mathbb N$, $0 \le k < q_j$.
To show that it is a complete orthonormal system, i.e. $\mathrm{Span}(\{f_{j,k} : j \in \mathbb N, 0 \le k < q_j\})$ is dense in $L^2(\mu)$, it suffices to show that if $h \in L^2(\mu)$ is such that $\int_X h\,f_{j,k}\,d\mu = 0$ for all $j \in \mathbb N$, $0 \le k < q_j$, then $h \equiv 0$ μ-a.e. Since $C(X)$ is dense in $L^2(\mu)$, we can assume that $h$ is continuous.
Assume that there is a cylinder set $Z_j^n$ such that $\int_{Z_j^n} h\,d\mu \neq 0$. Let $g_\varepsilon = \varepsilon + (1-\varepsilon)e^{-2\pi i n/q_j}f_{j,1}$. Then $g_\varepsilon|_{Z_j^n} = 1$ and for $\varepsilon > 0$ sufficiently small $|g_\varepsilon(x)| \le 1 - \varepsilon^2 < 1$ for $x \notin Z_j^n$. Clearly $g_\varepsilon$ is a linear combination of eigenfunctions, so $\int_X h\,g_\varepsilon\,d\mu = 0$. The algebraic power $g_\varepsilon^r$ is a linear combination of eigenfunctions too, and hence also $\int_X h\,g_\varepsilon^r\,d\mu = 0$. But since $|g_\varepsilon^r(x)| \le (1-\varepsilon^2)^r \to 0$ for each $x \notin Z_j^n$, we get $\lim_r\int_X h\,g_\varepsilon^r\,d\mu = \int_{Z_j^n} h\,d\mu \neq 0$. This contradiction shows that $(f_{j,k})_{j\in\mathbb N, 0\le k<q_j}$ is indeed a complete orthonormal system.
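Another way to see that $(f_{j,k})_{j\in\mathbb N, 0\le k<q_j}$ spans $L^2(\mu)$ is the following: Since the $f_{j,k}$'s are linear combinations of the $1_{Z_j^n}$'s, the $1_{Z_j^n}$'s are linear combinations of the $f_{j,k}$'s (for fixed $j$, passing from one family to the other is an invertible discrete Fourier transform). But the cylinder sets $\{Z_j^k\}_{j\in\mathbb N, 0\le k<q_j}$ generate the Borel σ-algebra, whence the indicator functions $\{1_{Z_j^k}\}_{j\in\mathbb N, 0\le k<q_j}$ span a dense subspace of $L^1(\mu)$; see Remark 1.29, where $\mu$ is the unique $a$-invariant measure, obtained from the Kolmogorov Extension Theorem. Since $\mu$ is a probability measure, $L^2(\mu) \subset L^1(\mu)$. □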
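A minimal sketch of the eigenfunctions $f_{j,k}$, assuming a hypothetical odometer with bases $(p_1,p_2,p_3) = (2,3,2)$ and keeping only the first three digits of a point, which is all that $f_{j,k}$ with $j \le 3$ depends on. We use the convention $f_{j,k}|_{Z_j^n} = e^{2\pi i n k/q_j}$, for which $f_{j,k}\circ a = e^{2\pi i k/q_j}f_{j,k}$. The function `add_one` plays the role of $a$, and `level` returns the index $n$ with $x \in Z_j^n$; all names are ours. The code checks the eigenvalue equation and the orthonormality of the $f_{3,k}$ with respect to the weight $1/q_3$ that $\mu$ gives to each cylinder of depth 3.

```python
import cmath
import math
from itertools import product

BASES = (2, 3, 2)  # hypothetical p_1, p_2, p_3 of the odometer


def add_one(digits):
    """Odometer map a: add 1 to the first digit, carrying to the right.
    A carry out of the last stored digit is dropped (truncation)."""
    out = list(digits)
    for i, p in enumerate(BASES):
        if out[i] + 1 < p:
            out[i] += 1
            return tuple(out)
        out[i] = 0
    return tuple(out)


def q(j):
    """q_j = p_1 * ... * p_j."""
    return math.prod(BASES[:j])


def level(digits, j):
    """The n with digits in Z_j^n: mixed-radix value of the first j digits."""
    n, weight = 0, 1
    for i in range(j):
        n += digits[i] * weight
        weight *= BASES[i]
    return n


def f(j, k, digits):
    """Eigenfunction f_{j,k} = exp(2*pi*i*n*k/q_j) on the cylinder Z_j^n."""
    return cmath.exp(2j * math.pi * level(digits, j) * k / q(j))


# One representative point for each of the q_3 = 12 cylinders of depth 3.
POINTS = list(product(*(range(p) for p in BASES)))

if __name__ == "__main__":
    x = (1, 2, 0)
    print(level(x, 3), level(add_one(x), 3))  # the level increases by 1 mod q_3
```

Truncating to finitely many digits is harmless here because $a$ changes the first $j$ digits exactly as addition of $1$ modulo $q_j$.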

There are sufficient conditions for substitution shifts to have pure point
spectrum, such as [519, Lemma 3.2] (see also Host’s results quoted in [465,
Section VI.27]). We state it for the associated BV-system:

Theorem 6.106. Let $(X_{BV},\tau)$ be a stationary BV-system with incidence matrix $M = (m_{ij})_{i,j\in\mathcal A}$ and τ-invariant measure $\mu$. Suppose $m_{11} = 1$, and let $h(n) = \sum_{w\in V_n}h_w(n)$ be the number of $n$-paths from $v_0$ to $V_n$ as in Definition 5.28. Set
$$D_{n,k} = \{x \in X_{BV} : x \text{ and } \tau^{h(n)}(x) \text{ belong to different } k\text{-cylinders}\}.$$


The following hold:
(1) If $(X_{BV},\tau)$ has pure point spectrum, then $\lim_n\mu(D_{n,k}) = 0$ for each $k \in \mathbb N$.
(2) If $\sum_{n=1}^\infty\mu(D_{n,k}) < \infty$ for every $k \in \mathbb N$, then $(X_{BV},\tau)$ has pure point spectrum.

Remark 6.107. A famous conjecture in this context is the Pisot substitution conjecture, asking whether a substitution subshift has pure point spectrum if and only if the leading Perron eigenvalue of the associated matrix is a Pisot number. This is solved for substitutions on two letters [52, 317] (see also [54]) and in general [50] for β-substitutions, i.e. substitutions associated to β-transformations; see Remark 3.75.

The next theorem is the main result in [332] of Jacobs & Keane.

Theorem 6.108. A regular Toeplitz shift (X, σ) has pure point spectrum.

The assumption “regular” is important here, as shown by the counterexample of Downarowicz & Lacroix [214] for non-regular Toeplitz shifts. However, there are also non-regular Toeplitz shifts with pure point spectrum; see [331].

Proof. Let $(X,\sigma)$, $X \subset \mathcal A^{\mathbb N}$ or $\mathcal A^{\mathbb Z}$, be our regular Toeplitz shift; from Theorem 6.25 we know that it preserves a unique probability measure $\mu$. Let $x \in X$ be a sequence with skeletons $\mathrm{Sk}_j(x)$ of periods $q_j$. Set $r_j = \frac{1}{q_j}\#\{1 \le i \le q_j : \mathrm{Sk}_j(x)_i = *\}$. By our assumption of regularity, $r_j \to 0$ as $j \to \infty$.
First we need to modify the skeleton to $\widetilde{\mathrm{Sk}}_j(x)$ as follows. If $\mathrm{Sk}_j(x)_m = *$ for some $1 \le m \le q_j$ and $x_{m+kq_j} = a$ for all $k$, for a single $a \in \mathcal A$, then set $\widetilde{\mathrm{Sk}}_j(x)_{m+kq_j} = a$ for all $k$. This new skeleton has period $\tilde q_j$, which is equal to, or at least divides, $q_j$. Continue these modifications for all $1 \le m \le \tilde q_j$; after a finite number of steps the skeleton stabilizes to some $\widetilde{\mathrm{Sk}}_j(x)$ of period $\tilde q_j$ with the property that for each $1 \le m \le \tilde q_j$ with $\widetilde{\mathrm{Sk}}_j(x)_m = *$, the sequence $(x_{m+k\tilde q_j})_k$ contains at least two different letters from $\mathcal A$. The corresponding sequence $\tilde r_j \to 0$.
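Next define
$$A^j_m = \overline{\{\sigma^{m+k\tilde q_j}(x) : k \in \mathbb N \text{ or } \mathbb Z\}}\quad\text{for } 1 \le m \le \tilde q_j.$$
Then
$$A^j_m = A^j_{m'} \text{ if } m \equiv m' \bmod \tilde q_j, \qquad A^j_m \cap A^j_{m'} = \emptyset \text{ otherwise},$$
and since $(X,\sigma)$ is minimal, $\bigcup_{m=1}^{\tilde q_j}A^j_m = X$. It follows that²⁷ $\{A^j_m\}_{m=1}^{\tilde q_j}$ is a clopen partition of $X$, independently of the choice of $x \in X$. Also
$$\sigma(A^j_m) = A^j_{m+1 \bmod \tilde q_j}\quad\text{for all } 1 \le m \le \tilde q_j. \tag{6.35}$$
Define
$$f_{r,j} = \sum_{m=1}^{\tilde q_j}\lambda_{r,j}^m\,1_{A^j_m}\quad\text{for } \lambda_{r,j} = e^{2\pi i r/\tilde q_j}.$$
By (6.35), $f_{r,j}\circ\sigma = \lambda_{r,j}f_{r,j}$, so the $\lambda_{r,j}$'s are eigenvalues to the eigenfunctions $f_{r,j}$ associated with the clopen partition $\{A^j_m\}_{m=1}^{\tilde q_j}$.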
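To show that $\{f_{r,j}\}_{j\in\mathbb N, 1\le r\le\tilde q_j}$ spans $L^2(\mu)$, first note that because the $f_{r,j}$'s are linear combinations of the indicator functions $1_{A^j_m}$, the indicator functions $1_{A^j_m}$ are linear combinations of the $f_{r,j}$'s. The Borel σ-algebra is generated by the cylinder sets, so let us show that for each cylinder set $B = \sigma^m([b_1,\dots,b_n])$, $1_B$ is the $L^1(\mu)$-limit of linear combinations of $1_{A^j_m}$'s. Set
$$B_j = \bigcup\{A^j_m : 1 \le m \le \tilde q_j,\ A^j_m \subset B\}.$$
By Birkhoff's Ergodic Theorem 6.13,
$$\mu(B\setminus B_j) = \lim_{N\to\infty}\frac1N\,\#\{1 \le t \le N : \sigma^t(x) \in B\setminus B_j\}.$$
Take $t \in \mathbb N$ or $\mathbb Z$ such that $\sigma^t(x) \in B$. If
$$\widetilde{\mathrm{Sk}}_j(x)_t,\ \widetilde{\mathrm{Sk}}_j(x)_{t+1},\ \dots,\ \widetilde{\mathrm{Sk}}_j(x)_{t+n-1} \in \mathcal A,$$
then $\sigma^{t+k\tilde q_j}(x) \in B$ for all $k$ by construction of $\widetilde{\mathrm{Sk}}_j$. But then $\sigma^t(x) \in B_j$. So every $t$ with $\sigma^t(x) \in B\setminus B_j$ has a hole of $\widetilde{\mathrm{Sk}}_j(x)$ among the $n$ positions $t, \dots, t+n-1$. Therefore $\mu(B\setminus B_j) \le n\,\tilde r_j \to 0$ as $j \to \infty$, and hence $1_B$ can be approximated in $L^1(\mu)$ by linear combinations of $f_{r,j}$'s.
Kolmogorov's Extension Theorem implies that $\{f_{r,j}\}_{j\in\mathbb N, 1\le r\le\tilde q_j}$ spans a dense subspace of $L^1(\mu)$; see Remark 1.29. Because $L^2(\mu) \subset L^1(\mu)$ for the probability measure $\mu$, it spans $L^2(\mu)$ as well. □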
Example 6.109. Take $x = 101101101111101101\dots$, continuing Example 4.89. We have for $j = 3$ and $\tilde q_j = q_j = 6$,
$$\begin{cases} A^j_1 = [011011011] \cup [011111],\\ A^j_2 = [110110111] \cup [111110],\\ A^j_3 = [101101111] \cup [111101],\\ A^j_4 = [011011111] \cup [111011],\\ A^j_5 = [110110110] \cup [110111],\\ A^j_6 = [101101101] \cup [101111],\end{cases}$$
showing that the sets $A^j_m$ are in general more complicated than $\tilde q_j$-cylinders.
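Skeletons can be explored on a finite prefix. The sketch below is a heuristic with hypothetical function names, not part of the proof. It marks position $m$ (counted from 0 here, whereas the text counts from 1) as filled if all letters $x_{m+kq}$ visible in the prefix agree, and as a hole `*` otherwise. A finite prefix can only overestimate the filled positions, so the resulting fraction of holes is a lower bound for $r_j$.

```python
def skeleton(prefix: str, period: int) -> str:
    """Heuristic period-`period` skeleton read off a finite prefix.

    Position m (0-based) is filled with letter a if prefix[m + k*period] == a
    for every k visible in the prefix; otherwise it is a hole '*'.
    """
    marks = []
    for m in range(period):
        letters = set(prefix[m::period])
        marks.append(letters.pop() if len(letters) == 1 else "*")
    return "".join(marks)


def hole_fraction(sk: str) -> float:
    """Fraction of holes in a skeleton: a finite-prefix lower bound for r_j."""
    return sk.count("*") / len(sk)


if __name__ == "__main__":
    x = "101101101111101101"  # the prefix from Example 6.109
    for period in (3, 6):
        sk = skeleton(x, period)
        print(period, sk, hole_fraction(sk))
```

On this prefix, the period-3 pattern `1*1` is refined by the period-6 pattern `1011*1`, in line with skeletons of increasing periods filling in more positions.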
27 Later these sets were used in [560]; see also [381, Chapter 4.6].

The next few results on the spectrum of substitution shifts where the substitution $\chi$ is of constant length $q$ go back to Kamae [340] and Dekking [190]. In this case, because the height vectors of the corresponding Bratteli diagram $(E_n, V_n)_{n\in\mathbb N}$ are $h_v(n) = q^n$ for every $v \in V_n$, the Koopman operator $U_\sigma$ has an eigenvalue $e^{2\pi i\alpha}$ for each $\alpha = p/q^n$. According to [465, Proposition VI.11], there are no irrational continuous eigenvalues, but other rational eigenvalues are not excluded.

Example 6.110. Kamae [340, Example 2] and Dekking [190, Example before Lemma 10] give the constant length 3 substitution
$$\chi : \begin{cases} 0 \to 010,\\ 1 \to 201,\\ 2 \to 102,\end{cases}\qquad\text{with fixed point}\qquad \rho = 010201010102010201010201010\dots, \tag{6.36}$$
so $\alpha = p/3^n$ gives an eigenvalue for each $p, n \in \mathbb N_0$. However, every second²⁸ entry of $\rho$ is 0, so the shift space $X_\rho$ falls apart into two disjoint sets $[0]\cap X_\rho$ and $([1]\cup[2])\cap X_\rho$, which are permuted cyclically by $\sigma$. Using the sliding block code $f : 0 \mapsto 0$, $1, 2 \mapsto 1$, we obtain the shift $\{(01)^\infty, (10)^\infty\}$ with eigenvalue $e^{\pi i} = -1$. By Lemma 6.89, this is also an eigenvalue for the substitution shift of (6.36). The subshift $(X_\rho, \sigma^2)$ is isomorphic to another constant length substitution, called the pure base substitution. For our example, the pure base substitution is obtained as block-substitution using $a = 01$ and $b = 02$; see Section 4.2.2. We get the constant length substitution $\tilde\chi(a) = aba$, $\tilde\chi(b) = aab$.
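The claims in Example 6.110 are easy to check by machine. The hypothetical helper code below (names are ours) computes a prefix of the fixed point $\rho$ of $\chi$ and confirms that every second entry is 0. It then recovers the pure base substitution by applying $\chi$ to the two-letter blocks $0x$ that start at even positions: cutting $\chi(01)$ and $\chi(02)$ into blocks of length 2 gives $a \mapsto aba$ and $b \mapsto aab$ for $a = 01$, $b = 02$.

```python
def apply(sub: dict, word: str) -> str:
    """Apply a substitution letter by letter."""
    return "".join(sub[c] for c in word)


def fixed_point_prefix(sub: dict, seed: str, length: int) -> str:
    """Prefix of the fixed point obtained by iterating sub on seed
    (assumes sub[seed] starts with seed and is longer than one letter)."""
    word = seed
    while len(word) < length:
        word = apply(sub, word)
    return word[:length]


def block_images(sub: dict, blocks: list) -> dict:
    """Image of each block under sub, cut into blocks of the same length."""
    out = {}
    for b in blocks:
        image = apply(sub, b)
        out[b] = [image[i:i + len(b)] for i in range(0, len(image), len(b))]
    return out


CHI = {"0": "010", "1": "201", "2": "102"}

if __name__ == "__main__":
    rho = fixed_point_prefix(CHI, "0", 3**6)
    print(rho[:27])
    print(all(c == "0" for c in rho[::2]))  # every second entry is 0
    print(block_images(CHI, ["01", "02"]))
```

Blocks starting at even positions are mapped to blocks starting at even positions, because a letter at position $i$ is replaced by the word occupying positions $3i, 3i+1, 3i+2$.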

The constant length substitution $\chi$ has a coincidence if there are $n \in \mathbb N$ and $j \le q^n$ such that $\chi^n(a)_j$ is the same for each $a \in \mathcal A$. The substitution $\chi$ of Example 6.110 has no coincidence, but the pure base substitution has. The following result was proved by Dekking [190, Theorem 7].

Theorem 6.111. A constant length substitution shift has pure point spec-
trum if and only if its pure base substitution has a coincidence.

Theorem 6.111 is closely related to Toeplitz sequences; after all, Theorem 5.54 shows that Toeplitz shifts can be represented as BV-systems in a way similar to substitutions of constant length (namely with the equal path property). However, a coincidence does not imply that the BV-representation of a constant length substitution shift has a unique minimal path, as the example $0 \to 11$, $1 \to 01$ shows.

28 This “period” 2 is called the height by Dekking, see [190, Definition 8] and [320, page 531], but to avoid confusion with height vectors, we will not adopt that terminology.

Example 6.112. The substitutions
$$\chi : \begin{cases} 0 \to 020,\\ 1 \to 110,\\ 2 \to 210\end{cases}\qquad\text{and}\qquad \tilde\chi : \begin{cases} 0 \to 002,\\ 1 \to 110,\\ 2 \to 210\end{cases}$$
are both of constant length $q = 3$ and share the associated matrix with eigenvalues 3 and 1 (with multiplicity 2). In either case $e^{2\pi i\alpha}$ is an eigenvalue for all $\alpha = p/q^n$. Only $\chi$ has a coincidence, namely the 0 at the third position, and therefore the corresponding subshift has pure point spectrum. For the other substitution, there is no coincidence for any iterate (because $\tilde\chi^n(0)$ does not agree in a single position with either $\tilde\chi^n(1)$ or $\tilde\chi^n(2)$, for any $n \in \mathbb N$). Therefore the corresponding subshift has mixed spectrum. This example shows that, although it is not visible in the eigenvalues of the Koopman operator $U_\sigma$, the order of the letters in a substitution has an impact on the spectral type of $U_\sigma$.
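The coincidence condition of Theorem 6.111 can be tested for small iterates by brute force. The sketch below uses hypothetical helper names and counts positions from 0. It lists the positions $j$ at which all words $\chi^n(a)$, $a \in \mathcal A$, carry the same letter. A finite search can confirm a coincidence but can never rule one out for all $n$; for $\tilde\chi$ that requires an argument such as an induction on $n$.

```python
def iterate(sub: dict, letter: str, n: int) -> str:
    """The word sub^n(letter)."""
    word = letter
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


def coincidences(sub: dict, n: int) -> list:
    """0-based positions j where sub^n(a)_j is the same letter for every a."""
    images = [iterate(sub, a, n) for a in sorted(sub)]
    return [j for j in range(len(images[0]))
            if len({w[j] for w in images}) == 1]


CHI = {"0": "020", "1": "110", "2": "210"}
CHI_TILDE = {"0": "002", "1": "110", "2": "210"}

if __name__ == "__main__":
    print(coincidences(CHI, 1))  # the 0 at the third position
    print([len(coincidences(CHI_TILDE, n)) for n in range(1, 6)])
```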
6.8.5. Mixed Spectrum. After this discussion of the pure point spectrum case, we present two well-known shifts whose spectrum has a continuous part.
Theorem 6.113. The two-sided Thue-Morse shift $(X_{TM},\sigma)$ from Example 1.6 has mixed spectrum.

Proof. First recall that $(X_{TM},\sigma,\mu_{TM})$ is a 2-to-1 extension of the two-sided Feigenbaum substitution shift $(X_{feig},\sigma,\mu_{feig})$; see Example 4.9. Since $X_{feig}$ is also a regular Toeplitz shift, it has pure point spectrum by Theorem 6.108; in fact, $e^{2\pi i\alpha}$ is an eigenvalue for every dyadic rational $\alpha$. Define a two-point skew-product extension over $(X_{feig},\sigma)$ by
$$\tilde\sigma : \tilde X_{feig} := X_{feig}\times\{-1,1\} \to \tilde X_{feig},\qquad (x,y) \mapsto (\sigma(x), (-1)^{x_0}\cdot y).$$
Then $(X_{TM},\sigma)$ is conjugate to $(\tilde X_{feig},\tilde\sigma)$. Indeed, if $\pi_1 : \tilde X_{feig} \to X_{feig}$ is the projection onto the first coordinate and $\psi : X_{TM} \to \tilde X_{feig}$ is defined as²⁹
$$\psi(x)_n = \Big(\pi(x)_n,\ \prod_{j=0}^{n}(-1)^{x_j}\Big),$$
then the diagram in Figure 6.8 commutes.

The conjugacy $\psi$ transforms $\mu_{TM}$ into $\tilde\mu_{feig} = \mu_{feig}\otimes(\frac12\delta_{-1} + \frac12\delta_{+1})$, and therefore it suffices to show that the system $(\tilde X_{feig},\tilde\sigma,\tilde\mu_{feig})$ has mixed spectrum. Decompose $L^2(\tilde\mu_{feig}) = H_{even}\oplus H_{odd}$ for
$$H_{even} = \{f \in L^2(\tilde\mu_{feig}) : f(x,y) = f(x,-y)\},\qquad H_{odd} = \{f \in L^2(\tilde\mu_{feig}) : f(x,y) = -f(x,-y)\},$$
29 For $n < 0$ it may be more standard to write $\prod_{j=n}^{0}(-1)^{x_j}$.

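[Figure 6.8: commutative diagram with $\psi : X_{TM} \to \tilde X_{feig}$ on top, the factor maps $\pi : X_{TM} \to X_{feig}$ and $\pi_1 : \tilde X_{feig} \to X_{feig}$ below it, and the maps $\sigma$, $\tilde\sigma$, $\sigma$ acting on $X_{TM}$, $\tilde X_{feig}$, $X_{feig}$.]
Figure 6.8. The Thue-Morse shift, the Feigenbaum shift, and its skew-product extension.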

and likewise
$$f(x,y) = \underbrace{\frac{f(x,y) + f(x,-y)}{2}}_{f_{even}} + \underbrace{\frac{f(x,y) - f(x,-y)}{2}}_{f_{odd}}.$$

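Note that for $f \in H_{even}$ and $g \in H_{odd}$, we have
$$\begin{aligned}\int_{\tilde X_{feig}} f\cdot g\,d\tilde\mu_{feig} &= \int_{X_{feig}\times\{+1\}} f\cdot g\,d\tilde\mu_{feig} + \int_{X_{feig}\times\{-1\}} f\cdot g\,d\tilde\mu_{feig}\\ &= \int_{X_{feig}\times\{+1\}} f\cdot g\,d\tilde\mu_{feig} + \int_{X_{feig}\times\{+1\}} f\cdot(-g)\,d\tilde\mu_{feig} = 0,\end{aligned}$$
so $H_{even} \perp H_{odd}$. Since $H_{even}$ consists of the functions that do not depend on $y$, it is isomorphic to $L^2(X_{feig},\mu_{feig})$, so it has pure point spectrum. It corresponds to the Kronecker factor $(X_{feig},\mu_{feig})$.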
It remains to show that $L^2(\tilde\mu_{feig})\setminus H_{even}$ contains no eigenfunctions. Indeed, if $f \in L^2(\tilde\mu_{feig})$ satisfies $f\circ\tilde\sigma = \lambda f$ $\tilde\mu_{feig}$-a.e., then
$$f_{even}\circ\tilde\sigma = \frac{f\circ\tilde\sigma(x,y) + f\circ\tilde\sigma(x,-y)}{2} = \lambda\,\frac{f(x,y) + f(x,-y)}{2} = \lambda f_{even}.$$
That is, $f_{even} \in H_{even}$ is an eigenfunction with the same eigenvalue $\lambda = e^{2\pi i\alpha}$, so $\alpha$ must be a dyadic rational. By ergodicity (see Proposition 6.88), every eigenspace of $U_{\tilde\sigma}$ is one-dimensional, so if both $f$ and $f_{even}$ belong to the same eigenspace, $f$ must be a multiple of $f_{even}$, and hence $f \in H_{even}$. □
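The 2-to-1 extension behind this proof can be seen on finite words. In the sketch below we assume, as our own choice that may differ from the labelling of Example 4.9, that the factor map sends the Thue-Morse sequence $t$ to $p_n = t_n + t_{n+1} \bmod 2$ and that the Feigenbaum substitution is written $1 \to 10$, $0 \to 11$. The code checks that $p$ is then the fixed point of that substitution. It also recovers $t$ from $p$ by the running sign $y_{n+1} = (-1)^{p_n}y_n$, which is the fibre coordinate of the skew product $(x,y) \mapsto (\sigma(x), (-1)^{x_0}y)$; the two starting values $y_0 = \pm1$ give $t$ and its mirror image.

```python
def thue_morse(length: int) -> list:
    """t_n = parity of the number of 1s in the binary expansion of n."""
    return [bin(n).count("1") % 2 for n in range(length)]


def fixed_point(sub: dict, seed: int, length: int) -> list:
    """Prefix of the fixed point of a substitution on integer letters."""
    word = [seed]
    while len(word) < length:
        word = [b for a in word for b in sub[a]]
    return word[:length]


FEIG = {1: (1, 0), 0: (1, 1)}  # assumed form of the Feigenbaum substitution


def lift(p: list, y0: int = 1) -> list:
    """Rebuild a 0/1 sequence from p via the skew-product fibre coordinate:
    y_{n+1} = (-1)^{p_n} y_n, and the n-th letter is 0 iff y_n = +1."""
    y, out = y0, []
    for pn in p:
        out.append(0 if y == 1 else 1)
        y *= (-1) ** pn
    out.append(0 if y == 1 else 1)
    return out


if __name__ == "__main__":
    t = thue_morse(64)
    p = [(t[n] + t[n + 1]) % 2 for n in range(len(t) - 1)]
    print(p == fixed_point(FEIG, 1, len(p)))
    print(lift(p) == t, lift(p, -1) == [1 - b for b in t])
```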
Example 6.114. The Rudin-Shapiro substitution is a substitution on a four letter alphabet $\mathcal A = \{0, 1, \bar0, \bar1\}$, given by
$$\zeta_{RS} : \begin{cases} 0 \to 0\bar1, & \bar0 \to \bar01,\\ 1 \to \bar0\bar1, & \bar1 \to 01,\end{cases}$$

with fixed point $\rho = 0\bar1010\bar1\bar0\bar10\bar101\bar0101\dots$ and word-complexity $p(n) = 8n - 8$ for $n$ sufficiently large. An alternative, sometimes convenient representation of $\zeta_{RS}$ is as a block substitution:
$$\chi_{RS} : \begin{cases} +1\,+1 \to +1\,+1\,+1\,-1,\\ +1\,-1 \to +1\,+1\,-1\,+1,\\ -1\,+1 \to -1\,-1\,+1\,-1,\\ -1\,-1 \to -1\,-1\,-1\,+1,\end{cases}$$
giving rise to the Rudin-Shapiro sequence:
$$\rho = +1\ +1\ +1\ -1\ +1\ +1\ -1\ +1\ +1\ +1\ +1\ -1\ -1\ -1\ +1\ -1\ \dots.$$
This sequence was discovered by Shapiro in his Ph.D. Thesis [496] and later by Rudin [477] to solve certain extremal problems for trigonometric polynomials; see [20, Example 3.3.1 and Section 6.9] and [249, Proposition 2.2.3]. Another description of the Rudin-Shapiro sequence $\rho = (\rho_n)_{n\ge0}$ is
$$\rho_0 = 1,\qquad \rho_{2n} = \rho_n,\qquad \rho_{2n+1} = (-1)^n\rho_n,$$
and also, see [249, Proposition 2.2.2] and [111]:
$$\rho_n = (-1)^{\#\{\text{(possibly overlapping) blocks } 11 \text{ in the binary expansion of } n\}} = (-1)^{n_0n_1 + n_1n_2 + \cdots + n_{k-1}n_k},$$
where $n = \sum_{j\ge0}n_j2^j$ is the binary expansion of $n$.
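The three descriptions of the Rudin-Shapiro sequence given above (block substitution, recursion, and counting blocks $11$ in binary) can be compared on a long prefix. The following sketch uses hypothetical function names and only the rules stated in the text.

```python
BLOCK = {  # the block substitution chi_RS acting on pairs of signs
    (1, 1): (1, 1, 1, -1),
    (1, -1): (1, 1, -1, 1),
    (-1, 1): (-1, -1, 1, -1),
    (-1, -1): (-1, -1, -1, 1),
}


def rs_block(length: int) -> list:
    """Iterate chi_RS on consecutive pairs, starting from +1 +1."""
    word = [1, 1]
    while len(word) < length:
        word = [s for i in range(0, len(word), 2)
                for s in BLOCK[(word[i], word[i + 1])]]
    return word[:length]


def rs_recursive(length: int) -> list:
    """rho_0 = 1, rho_{2n} = rho_n, rho_{2n+1} = (-1)^n rho_n."""
    rho = [1] * length
    for n in range(1, length):
        if n % 2 == 0:
            rho[n] = rho[n // 2]
        else:
            m = (n - 1) // 2
            rho[n] = (-1) ** m * rho[m]
    return rho


def rs_binary(n: int) -> int:
    """(-1)^(number of possibly overlapping blocks 11 in binary n)."""
    bits = bin(n)[2:]
    count = sum(1 for a, b in zip(bits, bits[1:]) if a == b == "1")
    return (-1) ** count


if __name__ == "__main__":
    N = 64
    print(rs_block(N) == rs_recursive(N) == [rs_binary(n) for n in range(N)])
```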
Since $\chi_{RS}$ is a substitution of constant length 2, $e^{2\pi i\alpha}$ is an eigenvalue of $U_\sigma$ for every dyadic rational $\alpha$, and there are no other eigenvalues. However, the Kronecker factor $H_{pp}$ is not the whole space $L^2(\mu)$; its orthogonal complement in the decomposition (6.34) is $H_{cont}$. Queffélec [466, Theorems 1 & 2] showed that $\hat\nu_f \ll \mathrm{Leb}$ for every $f \in H_{cont}$. It follows that $\nu_{cont} = \nu_{ac}$ and $\nu_{sing}$ is absent. The multiplicity of the continuous spectrum is 2, so there are exactly two independent functions $h_1, h_2$ with non-purely Dirac spectrum.
There are generalizations of the Rudin-Shapiro shift to substitution shifts
of constant length q, and then the multiplicity of the continuous part of the
spectrum becomes q; see [465, Section 4]. Results of a somewhat similar
flavor can be found in [41].

6.9. Eigenvalues of Bratteli-Vershik Systems

The search for eigenvalues of the Koopman operator is a central problem for many dynamical systems. For BV-systems, the most likely candidates for eigenvalues are $\lambda = e^{2\pi i\alpha}$ with $\alpha = \mu([w])$ for some cylinder set $[w]$. We start with a lemma that appears to have originated from Gottschalk & Hedlund [284, Theorem 14.11]; see also [285, 455].

Lemma 6.115. Let $(X,T)$ be a continuous dynamical system, and let $f : X \to \mathbb R$ be a continuous function such that $\sup_{n\in\mathbb N}\sup_{x\in X}\big|\sum_{j=0}^{n-1}f\circ T^j(x)\big| < \infty$. Then there is a continuous function $g : X \to \mathbb R$ such that $f = g - g\circ T$.

Proof. The space $C(X)$ of continuous complex-valued functions with sup-norm $\|\cdot\|_\infty$ is a Banach space. Let $U_T : C(X) \to C(X)$ be the Koopman operator, i.e. $U_Th = h\circ T$, and set $Vh = f + U_Th$. Since $T$ is continuous, both $U_T$ and $V$ are continuous operators. Abbreviate $F_n = \sum_{j=0}^{n-1}f\circ T^j$.
Let $K$ be the closure of the convex hull of $\{F_n\}_{n\in\mathbb N}$; i.e. each $h \in K$ can be written as the limit of finite convex combinations $\sum_{j=1}^n a_{j,n}F_j$ where $a_{j,n} \ge 0$ and $\sum_{j=1}^n a_{j,n} = 1$. This is a compact convex subset of $C(X)$. Since
$$\begin{aligned} V\Big(\sum_{j=1}^n a_{j,n}F_j\Big) &= f + \Big(\sum_{j=1}^n a_{j,n}F_j\Big)\circ T = \Big(\sum_{j=1}^n a_{j,n}\Big)f + \sum_{j=1}^n a_{j,n}F_j\circ T\\ &= \sum_{j=1}^n a_{j,n}(f + F_j\circ T) = \sum_{j=1}^n a_{j,n}F_{j+1} \in K,\end{aligned}$$
it follows that $V(K) \subset K$. Now the Schauder-Tychonov Fixpoint Theorem, see e.g. [478, Theorem 5.28], shows that $V$ has a fixed point: $g = Vg = f + g\circ T$, so $f = g - g\circ T$ as required. □

The function $f = g - g\circ T$ in this lemma is called a coboundary. If $T$ preserves a probability measure $\mu$, then the lemma has an analogue in $L^p(\mu)$, which is obtained by replacing the Banach space $C(X)$ in the proof by $L^p(\mu)$. Also, if $(X,T)$ is minimal, then the condition $\sup_{n\in\mathbb N}\big|\sum_{j=0}^{n-1}f\circ T^j(x)\big| < \infty$ at a single point $x \in X$ suffices to show that $f$ is a coboundary.

Remark 6.116. With the help of a result by Kesten [357], see also the end
of Section 8.3, Petersen [455] used this for circle rotations Rα : S1 → S1
to show that 1[0,β] − β is a function of bounded discrepancy if and only if
β ∈ Z[α] if and only if e2πiβ is an eigenvalue of the Koopman operator URα .
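For the simplest case $\beta = \alpha$ the transfer function can be written down explicitly. With $g(x) = \{x - \alpha\}$ (fractional part) one checks $1_{[0,\alpha)} - \alpha = g - g\circ R_\alpha$, so the Birkhoff sums equal $g(x) - g(R_\alpha^n x)$ and stay in $(-1,1)$. This $g$ is bounded and measurable but not continuous, in line with the $L^p$ version of Lemma 6.115. The sketch below verifies this numerically for a hypothetical $\alpha$, with names of our choosing; it uses the half-open interval $[0,\alpha)$, which differs from $[0,\alpha]$ only by a null set.

```python
import math

ALPHA = math.sqrt(2) - 1  # hypothetical irrational rotation number


def frac(x: float) -> float:
    """Fractional part {x} in [0, 1)."""
    return x - math.floor(x)


def f(x: float) -> float:
    """f = 1_[0, alpha) - alpha on the circle [0, 1)."""
    return (1.0 if frac(x) < ALPHA else 0.0) - ALPHA


def g(x: float) -> float:
    """Transfer function g(x) = {x - alpha}, so that f = g - g o R_alpha."""
    return frac(x - ALPHA)


def birkhoff_sums(x: float, n_max: int) -> list:
    """S_n f(x) = sum_{j<n} f(R_alpha^j x) for n = 1, ..., n_max.
    Orbit points are computed as {x + j*alpha} to avoid accumulating drift."""
    sums, s = [], 0.0
    for j in range(n_max):
        s += f(frac(x + j * ALPHA))
        sums.append(s)
    return sums


if __name__ == "__main__":
    x = 0.123
    print(max(abs(s) for s in birkhoff_sums(x, 10_000)))  # stays below 1
```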

Corollary 6.117. Let (Xρ , σ) be an irreducible Pisot substitution shift with

its unique shift-invariant measure μ. Then for every w ∈ L(Xρ ), e2πiμ([w]) is
an eigenvalue of the Koopman operator.

Proof. The word $w$ has bounded discrepancy; see Remark 4.38. Therefore
$$\sup_{n\in\mathbb N}\sup_{x\in X_\rho}\Big|\sum_{k=0}^{n-1}(1_{[w]} - \mu([w]))\circ\sigma^k(x)\Big| < \infty.$$
Lemma 6.115 implies in this case that $1_{[w]} - \mu([w]) = g - g\circ\sigma$ for some continuous map $g : X_\rho \to \mathbb R$. This gives
$$e^{2\pi i g}\circ\sigma = e^{2\pi i\,g\circ\sigma} = e^{2\pi i(g - 1_{[w]} + \mu([w]))} = e^{2\pi i\mu([w])}\cdot e^{2\pi i g},$$
because $e^{2\pi i 1_{[w]}(x)} = 1$ for all $x \in X_\rho$. Therefore $e^{2\pi i\mu([w])}$ is the eigenvalue to the (continuous) eigenfunction $e^{2\pi i g}$. □
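As an illustration of Corollary 6.117, take (as our own example) the Fibonacci substitution $0 \to 01$, $1 \to 0$, an irreducible Pisot substitution, and $w = 0$, for which $\mu([0]) = (\sqrt5 - 1)/2$. The sketch computes the Birkhoff sums of $1_{[0]} - \mu([0])$ along a prefix of the fixed point. They stay bounded, in fact below 1 in absolute value because the Fibonacci word is balanced, consistent with $e^{2\pi i(\sqrt5-1)/2}$ being an eigenvalue. The function names are hypothetical.

```python
import math

FIB = {"0": "01", "1": "0"}
MU0 = (math.sqrt(5) - 1) / 2  # frequency of the letter 0 in the Fibonacci word


def fixed_point_prefix(sub: dict, seed: str, length: int) -> str:
    """Prefix of the fixed point obtained by iterating sub on seed."""
    word = seed
    while len(word) < length:
        word = "".join(sub[c] for c in word)
    return word[:length]


def max_discrepancy(word: str, letter: str, freq: float) -> float:
    """max over n of |#{k < n : word[k] == letter} - n * freq|, i.e. the
    largest Birkhoff sum of 1_[letter] - freq along the orbit of the word."""
    count, worst = 0, 0.0
    for n, c in enumerate(word, start=1):
        count += (c == letter)
        worst = max(worst, abs(count - n * freq))
    return worst


if __name__ == "__main__":
    word = fixed_point_prefix(FIB, "0", 100_000)
    print(max_discrepancy(word, "0", MU0))
```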

For primitive substitution shifts $(X_\rho,\sigma)$, Host [320, Theorem 1.4] formulated necessary and sufficient conditions for $e^{2\pi i\alpha}$ to be an eigenvalue:
$$\lim_{n\to\infty}e^{2\pi i\alpha|\chi^n(a)|} = h(a) \tag{6.37}$$
for some coboundary³⁰ $h : X \to \mathbb C$ that is constant on cylinders $[a]$. He also showed that measurable eigenfunctions coincide almost everywhere with continuous eigenfunctions. In fact, all eigenvalues in this setting are associated to continuous eigenfunctions. See also [21, 62, 246, 403, 519] and the survey [222]. We skip the details in favor of more general results coming from a series of papers by Durand & Maass with various combinations of coauthors, [109, 110, 167, 224]. We state a version assuming linear recurrence and uniqueness of the minimal path.
Recall that for a vertex $v \in V_n$ in a Bratteli diagram, $h_v(n)$ is the number of paths (height of the tower) from $v_0$ to $v$. Let $|||x||| = \min\{x - \lfloor x\rfloor, \lceil x\rceil - x\}$ denote the distance of $x$ to the nearest integer.
Theorem 6.118. Let $(X_{BV},\tau)$ be a linearly recurrent Bratteli-Vershik system such that for every $n \ge 1$ and $w \in V_{n+1}$
$$s(e) = v_n^{min} \in V_n \text{ is a vertex on the minimal path } x^{min} \tag{6.38}$$
for every smallest incoming edge $e \in E_{n+1}$ with $t(e) = w$. Then $\lambda = e^{2\pi i\alpha}$ is
(a) an $L^2(\mu)$ eigenvalue if and only if $\sum_n\max_{v\in V_n}|||\alpha h_v(n)|||^2 < \infty$,
(b) a continuous eigenvalue if and only if $\sum_n\max_{v\in V_n}|||\alpha h_v(n)||| < \infty$.

This theorem invites the question of whether there are systems that
have measurable but no continuous eigenfunctions (other than constant func-
tions). Such systems indeed exist; see [110, Theorem 2.5].
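To see the criterion of Theorem 6.118 at work, consider, as our own example, the period doubling substitution $0 \to 01$, $1 \to 00$. It is primitive, both images start with 0 (the situation of Remark 6.120), and all heights are $h_v(n) = 2^n$. The sketch below evaluates partial sums of $\sum_n\max_v|||\alpha h_v(n)|||^2$ in exact rational arithmetic. For $\alpha = 1/4$ the terms vanish for $n \ge 2$, matching the dyadic eigenvalues. For $\alpha = 1/3$ every term equals $1/9$, so the series diverges. A computer only sees partial sums, so this illustrates the criterion rather than proving anything; all names are hypothetical.

```python
import math
from fractions import Fraction


def dist_to_int(x: Fraction) -> Fraction:
    """|||x|||, the distance from x to the nearest integer."""
    return min(x - math.floor(x), math.ceil(x) - x)


def period_doubling_heights(n: int) -> tuple:
    """Heights h_v(n) = |chi^n(v)| for chi: 0 -> 01, 1 -> 00 (constant length 2)."""
    return (2**n, 2**n)


def partial_sum(alpha: Fraction, heights, n_max: int, power: int = 2) -> Fraction:
    """sum_{n=1}^{n_max} max_v |||alpha * h_v(n)|||^power
    (power=2: the L^2 criterion (a); power=1: the continuous criterion (b))."""
    return sum(max(dist_to_int(alpha * h) for h in heights(n)) ** power
               for n in range(1, n_max + 1))


if __name__ == "__main__":
    for alpha in (Fraction(1, 4), Fraction(1, 3)):
        print(alpha, [partial_sum(alpha, period_doubling_heights, m) for m in (10, 20, 40)])
```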
30 Cases where $h \not\equiv 1$ have to do with periodic structure of the fixed point $\rho = \chi(\rho)$, similar to Example 6.110.

Remark 6.119. Condition (6.38) corresponds to condition (KR6) of the

Kakutani-Rokhlin partitions and implies that there is a unique minimal path
xmin . Conversely, if there is a unique minimal path, then by Lemma 5.57 we
can telescope the Bratteli diagram in such a way that (6.38) holds. However,
this telescoping could a priori compromise the linear recurrence of the BV-
system. In Theorem 6.118 we therefore assume (6.38) without telescoping.
If we can telescope such that both (6.38) and linear recurrence remain true,
then the conclusions of Theorem 6.118 hold.

Remark 6.120. Theorem 6.118 applies directly to primitive (hence linearly recurrent; see Theorem 4.18) substitution shifts $\chi : \mathcal A \to \mathcal A^*$ for which there are $n \ge 1$ and $b \in \mathcal A$ such that $\chi^n(a)$ begins with $b$ for each $a \in \mathcal A$. Indeed, this latter condition guarantees that the corresponding ordered Bratteli diagram has a unique minimal path. Host's result (6.37) shows that this condition is not necessary.
Example 6.121. Dekking & Keane [191] give the example
$$\chi : \begin{cases} 0 \to 01010,\\ 1 \to 011\end{cases}\qquad\text{with associated matrix}\qquad A = \begin{pmatrix}3 & 1\\ 2 & 2\end{pmatrix}.$$
This matrix has integer eigenvalues 1 and 4. However, the height vectors $h(n) = (1\ 1)\,A^n$, i.e. $h_a(n) = |\chi^n(a)|$, have only odd components. Therefore Theorem 6.118 implies that this substitution shift is weak mixing. If we alter this substitution to
$$\chi : \begin{cases} 0 \to 01010,\\ 1 \to 0111\end{cases}\qquad\text{with associated matrix}\qquad A = \begin{pmatrix}3 & 1\\ 2 & 3\end{pmatrix},$$
then the eigenvalues of $A$ are $3 \pm \sqrt2 > 1$. According to Theorem 8.10 and Exercise 8.12, the conditions of Theorem 6.118 cannot be met. Hence also this substitution shift is weak mixing.
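The numbers in Example 6.121 are easy to reproduce. In the sketch below, with hypothetical helper names, the heights are the lengths $h_a(n) = |\chi^n(a)|$, computed by the recursion $|\chi^n(a)| = \sum_b |\chi^{n-1}(b)|$ over the letters $b$ of $\chi(a)$. The associated matrix has entry $(i,j)$ equal to the number of letters $i$ in $\chi(j)$, as in the matrices displayed above, and its eigenvalues come from the trace and the determinant.

```python
import math


def heights(sub: dict, n: int) -> dict:
    """Lengths |chi^n(a)| for every letter a."""
    lengths = {a: 1 for a in sub}
    for _ in range(n):
        lengths = {a: sum(lengths[b] for b in sub[a]) for a in sub}
    return lengths


def matrix(sub: dict, letters: str = "01") -> list:
    """Associated matrix: entry (i, j) counts the letter i in chi(j)."""
    return [[sub[j].count(i) for j in letters] for i in letters]


def eigenvalues_2x2(m: list) -> tuple:
    """Eigenvalues of a 2x2 matrix with real spectrum, from trace and determinant."""
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = math.sqrt(tr * tr - 4 * det)
    return ((tr + root) / 2, (tr - root) / 2)


DK1 = {"0": "01010", "1": "011"}
DK2 = {"0": "01010", "1": "0111"}

if __name__ == "__main__":
    print(matrix(DK1), eigenvalues_2x2(matrix(DK1)))  # [[3, 1], [2, 2]] (4.0, 1.0)
    print([heights(DK1, n) for n in range(1, 4)])     # all entries odd
    print(matrix(DK2), eigenvalues_2x2(matrix(DK2)))  # eigenvalues 3 + sqrt(2), 3 - sqrt(2)
```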

Let us first give an idea of what eigenfunctions should look like. This step requires neither (6.38) nor linear recurrence. Let
$$r_n(x) = \min\{j \ge 1 : \exists k > n \text{ such that } \tau^j(x)_k \neq x_k\} \tag{6.39}$$
be the number of iterates of the Vershik map necessary to change the path $x$ beyond the $n$-th edge.
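For intuition about $r_n$, consider as a hypothetical example the simplest stationary Bratteli diagram: one vertex per level and $Q$ ordered edges between consecutive levels, whose Vershik map is the $Q$-adic odometer. A path is then a label sequence $(x_1, x_2, \dots)$, $\tau$ adds 1 with carry, and (6.39) gives $r_n(x) = Q^n - \sum_{k=1}^n x_kQ^{k-1}$. The brute-force sketch below stores only the first $n+1$ edges, which suffices to detect the first change beyond the $n$-th edge. It also checks the relation $r_n(\tau(x)) = r_n(x) - 1$ whenever $r_n(x) \ge 2$, which is used in the proofs below.

```python
from itertools import product

Q = 3  # hypothetical number of ordered edges between consecutive levels


def vershik(x: tuple) -> tuple:
    """Vershik map on the one-vertex-per-level diagram: the successor path,
    i.e. add 1 to the first edge label with carry. x[k] is the label of edge
    k+1; a carry out of the last stored edge is dropped."""
    y = list(x)
    for i in range(len(y)):
        if y[i] + 1 < Q:
            y[i] += 1
            return tuple(y)
        y[i] = 0
    return tuple(y)


def r(n: int, x: tuple) -> int:
    """r_n(x) from (6.39): least j >= 1 such that tau^j(x) differs from x in
    some edge beyond the n-th. Needs at least n + 1 stored edges."""
    y, j = vershik(x), 1
    while y[n:] == x[n:]:
        y, j = vershik(y), j + 1
    return j


if __name__ == "__main__":
    n = 2
    for x in product(range(Q), repeat=n + 1):
        print(x, r(n, x))
```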

Proposition 6.122. The complex number $\lambda = e^{2\pi i\alpha}$ is a measurable eigenvalue of a BV-system $(X_{BV},\tau)$ if and only if there is a sequence $\rho_n : V_n \to \mathbb R$ such that
$$|||(r_n(x) + \rho_n\circ t(x_n))\,\alpha||| \text{ converges } \mu\text{-a.e. as } n \to \infty. \tag{6.40}$$
Similarly, $\lambda = e^{2\pi i\alpha}$ is a continuous eigenvalue if and only if
$$|||r_n(x)\alpha||| \text{ converges uniformly as } n \to \infty, \tag{6.41}$$
so the $\rho_n$ is not required in this case.

Proof. Measurable eigenvalues: Let $\lambda = e^{2\pi i\alpha}$ be a measurable eigenvalue of $(X_{BV},\mu,\tau)$ and let $f \in L^2(\mu)$ be the corresponding eigenfunction. Define $\Psi_n(x) = \lambda^{r_n(x)}f(x)$. Note that $r_n$ is constant on elements of the $n$-th Kakutani-Rokhlin partition
$$\mathcal P_n = \{[x_1\dots x_n] : x_1\dots x_n \text{ is a path from } v_0 \text{ to } V_n\}.$$
Therefore, the conditional expectations w.r.t. $\mu$ satisfy
$$\lambda^{-r_n}\,\mathbb E_\mu(\Psi_n|\mathcal P_n) = \mathbb E_\mu(\lambda^{-r_n}\Psi_n|\mathcal P_n) = \mathbb E_\mu(f|\mathcal P_n) \to f\quad\mu\text{-a.e.}$$
as $n \to \infty$. If $r_n(x) \ge 2$, then $\Psi_n\circ\tau(x) = \lambda^{r_n(x)-1}f\circ\tau(x) = \lambda^{r_n(x)}f(x) = \Psi_n(x)$. Hence $\Psi_n$ is constant on each cylinder set³¹ $[v] = \{x \in X_{BV} : t(x_n) = v\}$, $v \in V_n$. Therefore we can define functions $\rho_n : V_n \to \mathbb R$ implicitly by
$$\lambda^{-\rho_n(v)} := \mathbb E_\mu(\Psi_n|\mathcal P_n)|_{[v]} = \frac{1}{\mu([v])}\int_{[v]}\Psi_n\,d\mu. \tag{6.42}$$
It denotes a kind of average phase of the cylinder $[v]$. For μ-a.e. $x \in X_{BV}$,
$$e^{-2\pi i\alpha(r_n(x) + \rho_n(t(x_n)))} = \lambda^{-r_n(x)}\lambda^{-\rho_n(t(x_n))} = \lambda^{-r_n(x)}\,\mathbb E_\mu(\Psi_n|\mathcal P_n) \to f(x).$$
Therefore (6.40) holds.

For the converse, suppose that $\alpha$ satisfies (6.40). Set $\lambda = e^{2\pi i\alpha}$ and
$$f(x) = \lim_{n\to\infty}f_n(x)\quad\text{for}\quad f_n(x) = e^{-2\pi i\alpha(r_n(x) + \rho_n\circ t(x_n))}. \tag{6.43}$$
For $x \notin X^{max}$, there is $n_0 \in \mathbb N$ such that $r_n(x) \ge 2$ for all $n \ge n_0$. In particular, $x$ and $\tau(x)$ belong to the same $[v]$ for $v \in V_n$. Therefore
$$f_n\circ\tau(x) = e^{-2\pi i\alpha(r_n(x) - 1 + \rho_n\circ t(x_n))} = \lambda f_n(x).$$
In the limit $f_n \to f$ we get $f\circ\tau = \lambda f$ μ-a.e.

Continuous eigenvalues: Let $\lambda = e^{2\pi i\alpha}$ be a continuous eigenvalue of $(X_{BV},\tau)$ and let $f : X_{BV} \to \mathbb C$ be the corresponding continuous eigenfunction. As stated in (6.38), each minimal incoming edge $e \in E_{n+1}$ satisfies $s(e) = v_n^{min}$. Then also $s\circ\tau^{r_n(x)}(x)_{n+1} = v_n^{min}$.

31 This set corresponds to the v-th stack of the n-th tower in a cutting and stacking


Take $M \in \mathbb N$ arbitrary. Then the distance $d(\tau^{r_n(x)}(x), x^{min}) < 2^{-n} < 2^{-M}$ for all $n > M$. In other words, $\tau^{r_n(x)}(x) \to x^{min}$ uniformly, so
$$e^{2\pi i\alpha r_n(x)} = \lambda^{r_n(x)} = \frac{f(\tau^{r_n(x)}(x))}{f(x)} \to \frac{f(x^{min})}{f(x)}$$
uniformly. Hence $|||r_n(x)\alpha|||$ converges uniformly in $x$.
Conversely, if $|||r_n(x)\alpha|||$ converges uniformly, then $f(x) := \lim_n\lambda^{-r_n(x)}$ is uniformly continuous. If $x \notin X_{BV}^{max}$, then $r_n(\tau(x)) = r_n(x) - 1$ for $n$ sufficiently large. Therefore $f(\tau(x)) = \lim_n\lambda^{-r_n(\tau(x))} = \lambda\lim_n\lambda^{-r_n(x)} = \lambda f(x)$. To get the same equality also for $x^{max} \in X_{BV}^{max}$, let $\varepsilon > 0$ be arbitrary. Since $f$ is uniformly continuous, we can find $\delta > 0$ such that $f(B_\delta(x)) \subset B_\varepsilon(f(x))$ for every $x \in X_{BV}$. Therefore, for $y \in \mathrm{orb}(x)\cap B_\delta(x^{max})$ with $\tau(y) \in B_\delta(\tau(x^{max}))$, we have
$$\begin{aligned}|f(\tau(x^{max})) - \lambda f(x^{max})| &\le |f(\tau(x^{max})) - \lambda f(y)| + |\lambda f(y) - \lambda f(x^{max})|\\ &= |f(\tau(x^{max})) - f(\tau(y))| + |\lambda|\,|f(y) - f(x^{max})|\\ &\le \varepsilon + \varepsilon = 2\varepsilon.\end{aligned} \tag{6.44}$$
Since $\varepsilon > 0$ was arbitrary, $f(\tau(x^{max})) = \lambda f(x^{max})$ as required. □

Equation (6.43) suggests that we can approximate $f$ by $\lambda^{-(r_n(x) + \rho_n(t(x_n)))}$, where the power contains an “average phase” $\rho_n(v)$ if $t(x_n) = v \in V_n$ and an integer $r_n(x)$ indicating for how many iterates $\tau^j(x) \in [v]$. However, in the proofs below, it is convenient to approximate $f$ by a martingale $(f_n)_{n\in\mathbb N}$ obtained from the conditional expectations³²:
$$f_n(x) := \mathbb E_\mu(f|\mathcal P_n)(x) = \frac{1}{\mu(\mathcal P_n[x])}\int_{\mathcal P_n[x]}f\,d\mu, \tag{6.45}$$
where $\mathcal P_n[x]$ denotes the partition element of the $n$-th Kakutani-Rokhlin partition containing $x$. This sequence $(f_n)_{n\in\mathbb N}$ is indeed a martingale because
$$\mathbb E_\mu(f_{n+1}|\mathcal P_n) = \mathbb E_\mu(\mathbb E_\mu(f|\mathcal P_{n+1})|\mathcal P_n) = \mathbb E_\mu(f|\mathcal P_n) = f_n. \tag{6.46}$$
The Martingale Theorem (see e.g. [86, Theorem 35.6]) implies that $f_n \to f$ in $L^2(\mu)$ and μ-a.e. as $n \to \infty$.
Let $[v^{min}]$ denote the cylinder set of the minimal path connecting $v_0$ with $v \in V_n$. Then the elements of $\mathcal P_n$ are of the form $\tau^j([v^{min}])$ for $v \in V_n$, $0 \le j < h_v(n)$, and for each $v \in V_n$, the measures $\mu(\tau^j([v^{min}]))$ for $0 \le j < h_v(n)$ all coincide. Assume now that $r_n(x) \ge 2$, so $\tau(x) \in [v]$. Then

32 For background on martingales and conditional expectation, see e.g. [56, Chapter 21] and [86, Sections 34 & 35].

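for $x \in \tau^j([v^{min}]) = \mathcal P_n[x]$,
$$\begin{aligned} f_n\circ\tau(x) &= \frac{1}{\mu(\mathcal P_n[x])}\int_{\mathcal P_n[x]}f\circ\tau\,d\mu = \frac{1}{\mu(\tau^j([v^{min}]))}\int_{\tau^j([v^{min}])}f\circ\tau\,d\mu\\ &= \frac{1}{\mu([v^{min}])}\int_{[v^{min}]}f\circ\tau^{j+1}\,d\mu = \frac{\lambda}{\mu([v^{min}])}\int_{[v^{min}]}f\circ\tau^j\,d\mu = \lambda f_n(x).\end{aligned}$$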
Since averages over a function $f$ taking values in the unit circle lie in the closed unit disk, we can find $c_n(v) \in [0,1]$ and $\rho_n : V_n \to \mathbb R$ such that
$$f_n(x) = c_n(v)\lambda^{-(r_n(x) + \rho_n(v))}\quad\text{if } t(x_n) = v \in V_n. \tag{6.47}$$
The $L^2(\mu)$-norm of $f_n$ satisfies
$$\begin{aligned}\|f_n\|_2^2 &= \sum_{v\in V_n}\sum_{j=0}^{h_v(n)-1}|c_n(v)\lambda^{-(r_n(x)+\rho_n(v))}|^2\,\mu([v^{min}]) = \sum_{v\in V_n}|c_n(v)|^2\,h_v(n)\,\mu([v^{min}])\\ &= \sum_{v\in V_n}c_n(v)^2\,\mu(x \in X_{BV} : t(x_n) = v).\end{aligned} \tag{6.48}$$
Since our system is linearly recurrent with constant $L$, we have
$$\frac{1}{L^2} \le \frac{\mu(x \in X_{BV} : t(x_n) = v)}{\mu(x \in X_{BV} : t(x_n) = w)} \le L^2 \tag{6.49}$$
for all $v, w \in V_n$ and all $n \in \mathbb N$. Since $\|f_n\|_2^2 \to 1$ by the Martingale Convergence Theorem, (6.48) and (6.49) together imply that
$$1 \ge \min_{v\in V_n}c_n(v) \to 1\quad\text{as } n \to \infty. \tag{6.50}$$

Proof of Theorem 6.118. Since the BV-system is assumed to be linearly recurrent, say with constant $L$, $\#V_n \le L$ for all $n \in \mathbb N$ and $\tau$ preserves a single invariant probability measure $\mu$; see Corollary 6.29.
Measurable eigenvalues, “only if” direction: Let $f_n = \mathbb E_\mu(f|\mathcal P_n)$ be the martingale as in (6.45). By the Martingale Convergence Theorem $f_n \to f$ μ-a.e. Also, since $\mathbb E_\mu(f_{m+1} - f_m|\mathcal P_m) = 0$ and $f_{n+1} - f_n$ is $\mathcal P_m$-measurable for $m > n$,
$$\begin{aligned}\mathbb E_\mu((f_{n+1} - f_n)(f_{m+1} - f_m)) &= \mathbb E_\mu\big(\mathbb E_\mu((f_{n+1} - f_n)(f_{m+1} - f_m)\,|\,\mathcal P_m)\big)\\ &= \mathbb E_\mu\big((f_{n+1} - f_n)\,\mathbb E_\mu(f_{m+1} - f_m\,|\,\mathcal P_m)\big) = 0\end{aligned}$$

for $m > n \ge 1$. This makes the mixed terms disappear in
$$\sum_{n\ge1}\|f_{n+1} - f_n\|_2^2 = \Big\|\sum_{n\ge1}(f_{n+1} - f_n)\Big\|_2^2 = \|f - f_1\|_2^2 < \infty.$$

As before, let $v^{min}$ denote the minimal path connecting $v_0$ and $v \in V_n$, and let $[v^{min}]$ be the corresponding $n$-cylinder. Define for $v \in V_n$ and $w \in V_{n+1}$
$$J(v,w) = \{0 \le j < h_w(n+1) : \tau^j(w^{min}) \in [v^{min}]\}.$$
Then for $j \in J(v,w)$ and $x \in \tau^{j+k}([w^{min}]) \subset \tau^k([v^{min}])$ we have by (6.47):
$$f_{n+1}(x) = e^{2\pi i\alpha(j+k)}c_{n+1}(w)\lambda^{\rho_w(n+1)},\qquad f_n(x) = e^{2\pi i\alpha k}c_n(v)\lambda^{\rho_v(n)},$$
for $0 \le k < h_v(n)$. Because all the sets $\tau^{j+k}([w^{min}])$ have the same mass, it follows that

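$$\begin{aligned}\|f_{n+1} - f_n\|_2^2 &\ge \sum_{k=0}^{h_v(n)-1}\int_{\tau^{j+k}([w^{min}])}|f_{n+1} - f_n|^2\,d\mu\\ &= \sum_{k=0}^{h_v(n)-1}\int_{\tau^{j+k}([w^{min}])}|e^{2\pi i\alpha j}c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|^2\,d\mu\\ &= \mu([w^{min}])\,h_v(n)\,|e^{2\pi i\alpha j}c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|^2\\ &\ge \frac{1}{C}\,|e^{2\pi i\alpha j}c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|^2,\end{aligned}$$
where the last line is by linear recurrence, with a constant $C > 0$ depending only on $L$. Therefore
$$\sum_n\max_{v\in V_n, w\in V_{n+1}}\ \max_{j\in J(v,w)}|e^{2\pi i\alpha j}c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|^2 < \infty. \tag{6.51}$$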

By (6.38), $0 \in J(v_n^{min}, w)$ for every $n \in \mathbb N$ and $w \in V_{n+1}$, so (6.51) for $j = 0$ gives
$$\sum_n\max_{w\in V_{n+1}}|c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v_n^{min})\lambda^{\rho_{v_n^{min}}(n)}|^2 < \infty.$$

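By the triangle inequality, for any $v \in V_n$, $w \in V_{n+1}$ also
$$\begin{aligned}|c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|^2 \le \Big(&|c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v_n^{min})\lambda^{\rho_{v_n^{min}}(n)}|\\ &+ |c_n(v_n^{min})\lambda^{\rho_{v_n^{min}}(n)} - c_{n-1}(v_{n-1}^{min})\lambda^{\rho_{v_{n-1}^{min}}(n-1)}|\\ &+ |c_{n-1}(v_{n-1}^{min})\lambda^{\rho_{v_{n-1}^{min}}(n-1)} - c_n(v)\lambda^{\rho_v(n)}|\Big)^2\\ \le 9\max\Big\{&|c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v_n^{min})\lambda^{\rho_{v_n^{min}}(n)}|^2,\\ &|c_n(v_n^{min})\lambda^{\rho_{v_n^{min}}(n)} - c_{n-1}(v_{n-1}^{min})\lambda^{\rho_{v_{n-1}^{min}}(n-1)}|^2,\\ &|c_{n-1}(v_{n-1}^{min})\lambda^{\rho_{v_{n-1}^{min}}(n-1)} - c_n(v)\lambda^{\rho_v(n)}|^2\Big\}\end{aligned} \tag{6.52}$$
is summable, because each of the three terms is summable by the previous display, applied at levels $n$ and $n-1$.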
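By (6.50), $1 \ge \max_{v\in V_n}c_n(v) \ge \min_{v\in V_n}c_n(v) \to 1$ as $n \to \infty$. Therefore, for $n$ so large that $c_{n+1}(w) \ge 1/2$ for all $w \in V_{n+1}$,
$$\begin{aligned}|e^{2\pi i\alpha j} - 1|^2 &\le \Big(\Big|e^{2\pi i\alpha j} - \frac{c_n(v)\lambda^{\rho_v(n)}}{c_{n+1}(w)\lambda^{\rho_w(n+1)}}\Big| + \Big|\frac{c_n(v)\lambda^{\rho_v(n)}}{c_{n+1}(w)\lambda^{\rho_w(n+1)}} - 1\Big|\Big)^2\\ &\le 4\Big(|e^{2\pi i\alpha j}c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}| + |c_{n+1}(w)\lambda^{\rho_w(n+1)} - c_n(v)\lambda^{\rho_v(n)}|\Big)^2.\end{aligned}$$
Combining this with (6.51) and (6.52), we obtain
$$\sum_n\max_{v\in V_n, w\in V_{n+1}}\ \max_{j\in J(v,w)}|e^{2\pi i\alpha j} - 1|^2 < \infty. \tag{6.53}$$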

Now
$$|\lambda^{h_v(n)}-1|^2 = |e^{2\pi i\alpha h_v(n)}-1|^2 = |e^{2\pi i\alpha(j+h_v(n))} - e^{2\pi i\alpha j}|^2 \le \big(|e^{2\pi i\alpha(j+h_v(n))}-1| + |e^{2\pi i\alpha j}-1|\big)^2 \le 4\max\big\{|e^{2\pi i\alpha(j+h_v(n))}-1|,\ |e^{2\pi i\alpha j}-1|\big\}^2.$$
For $j \in J(v,w)$, we have $j + h_v(n) \in J(v',w)$ for some $v' \in V_n$, and therefore the summability in (6.53) implies that $\sum_n \max_{v\in V_n}|\lambda^{h_v(n)}-1|^2 < \infty$ as well, completing this direction of the proof.

Measurable eigenvalues, “if” direction: Let $\lambda = e^{2\pi i\alpha}$ be the candidate eigenvalue, and set $\kappa_n := \max_{v\in V_n}|||h_v(n)\alpha|||$, where we recall that
$$h_v(n) = \#\{n\text{-paths from } v_0 \text{ to } v \in V_n\}.$$
By assumption, $\sum_n \kappa_n^2 < \infty$. Define
$$\theta_n(x) = \sum_{\substack{e\in E_{n+1},\ e > x_{n+1}\\ t(e)=t(x_{n+1})}} h_{s(e)}(n)\,\alpha \mod \mathbb Z.$$
Since $(X_{BV},\tau)$ is linearly recurrent with recurrence constant $L$, we have $|||\theta_n(x)||| \le L\kappa_n$. Recalling $r_n(x)$ from (6.39), we have $r_n(x)\alpha \equiv \sum_{j=0}^{n-1}\theta_j(x) \bmod 1$. Define
$$g_n(x) = \Big(\sum_{j=0}^{n-1}\theta_j(x) - \mathbb E_\mu(\theta_j)\Big) \bmod \mathbb Z. \tag{6.54}$$

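Then $g_n(x) - g_n\circ\tau(x) = \alpha$ whenever $t(x_{n+1}) = t(\tau(x)_{n+1})$. In order to show that $g := \lim_n g_n$ exists in $L^2(\mu)$, we decompose $g_{n+1}-g_n$ as
$$g_{n+1}-g_n = \underbrace{\mathbb E_\mu\big(\theta_n - \mathbb E_\mu(\theta_n) \mid \mathcal P_n\big) \bmod \mathbb Z}_{Y_n} + \underbrace{\big(\theta_n - \mathbb E_\mu(\theta_n \mid \mathcal P_n)\big) \bmod \mathbb Z}_{Z_n}$$
and show that both $\sum_n Y_n$ and $\sum_n Z_n$ converge in $L^2(\mu)$. For this we replace reduction mod $\mathbb Z$ by the distance $|||\cdot|||$ to the nearest integer.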
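The $L^2(\mu)$-norm of $Z_n$ satisfies
$$\|Z_n\|_2 = \|\theta_n - \mathbb E_\mu(\theta_n \mid \mathcal P_n)\|_2 \le \|\theta_n\|_2 + \|\mathbb E_\mu(\theta_n \mid \mathcal P_n)\|_2 \le 2\|\theta_n\|_2 \le 2L\kappa_n,$$
so $(\|Z_n\|_2)_n$ is square-summable. Secondly, $Z_n$ is $\mathcal P_m$-measurable for $m > n$ because it depends only on the first $n+1$ edges of $x$, and
$$\mathbb E_\mu(Z_m \mid \mathcal P_m) = \mathbb E_\mu(\theta_m \mid \mathcal P_m) - \mathbb E_\mu\big(\mathbb E_\mu(\theta_m \mid \mathcal P_m) \mid \mathcal P_m\big) = 0.$$
Hence
$$\mathbb E_\mu(Z_mZ_n) = \mathbb E_\mu\big(\mathbb E_\mu(Z_mZ_n \mid \mathcal P_m)\big) = \mathbb E_\mu\big(Z_n\,\mathbb E_\mu(Z_m \mid \mathcal P_m)\big) = 0,$$
so that the mixed terms in the following expression vanish:
$$\mathbb E_\mu\Big(\Big(\sum_{m=n+1}^\infty Z_m\Big)^2\Big) = \sum_{m=n+1}^\infty \mathbb E_\mu(Z_m^2) \le 4L^2\sum_{m=n+1}^\infty \kappa_m^2 < \infty.$$
Since the right-hand side tends to $0$ as $n \to \infty$, the series $\sum_n Z_n$ converges in $L^2(\mu)$.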

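Now for $Y_n$, let $\theta_n(e)$ be the value of $\theta_n$ on the cylinder set $[e] = \{x \in X_{BV} : x_{n+1} = e\}$ and recall that $\mu([v]) \ge 1/L$ by linear recurrence. Then
$$\begin{aligned}
\|Y_n\|_2 &= \|\mathbb E_\mu(\theta_n - \mathbb E_\mu(\theta_n) \mid \mathcal P_n)\|_2 = \|\mathbb E_\mu(\theta_n \mid \mathcal P_n) - \mathbb E_\mu(\theta_n)\|_2\\
&= \Big\|\sum_{\substack{w\in V_{n+1}\\ v\in V_n}}\ \sum_{\substack{e > x_{n+1}\\ t(e)=t(x_{n+1})}}\Big(\frac{h_v(n)\,\mu([w])}{h_w(n+1)\,\mu([v])}\,1_{\{s(x_{n+1})=v\}} - \frac{h_v(n)\,\mu([w])}{h_w(n+1)}\Big)\theta_n(e)\Big\|_2\\
&= \Big\|\sum_{\substack{w\in V_{n+1}\\ v\in V_n}}\ \sum_{\substack{e > x_{n+1}\\ t(e)=t(x_{n+1})}}\frac{h_v(n)\,\mu([w])}{h_w(n+1)}\Big(\frac{1}{\mu([v])}\,1_{\{s(x_{n+1})=v\}} - 1\Big)\theta_n(e)\Big\|_2\\
&\le \sum_{w\in V_{n+1}}\sum_{v\in V_n}\frac{h_v(n)\,m_{v,w}(n)}{h_w(n+1)}\,\mu([w])\,(L-1)\,L\kappa_n\\
&= \sum_{w\in V_{n+1}}\mu([w])\,(L-1)\,L\kappa_n \le L^2\kappa_n.
\end{aligned}$$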

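Since $\theta_n$ only depends on the edges in $E_{n+1}$, $Y_n$ is constant on cylinders $[w]$, $w \in V_n$. Thus w.r.t. the partition $\mathcal Q_{n-1} = \{[w] : w \in V_{n-1}\}$, the conditional expectation satisfies
$$Y_n(x) = \mathbb E_\mu\big(\theta_n - \mathbb E_\mu(\theta_n) \mid \mathcal Q_{n-1}(x)\big) = \frac{1}{\mu([w])}\int_{[w]}\big(\theta_n - \mathbb E_\mu(\theta_n)\big)\,d\mu =: q(w)$$
for $x \in [w]$. It follows that for $1 \le k \le n$, $v \in V_{n-k}$, and $x \in [v]$, the conditional expectation
$$\begin{aligned}
\mathbb E_\mu(Y_n \mid \mathcal Q_{n-k})(x) &= \frac{1}{\mu([v])}\int_{[v]}Y_n\,d\mu = \sum_{w\in V_n}\frac{\mu([v]\cap[w])}{\mu([v])}\,q(w)\\
&= \sum_{w\in V_n}\Big(\frac{\mu([v]\cap[w])}{\mu([v])} - \mu([w])\Big)q(w),
\end{aligned}$$
where in the last line we used $\int_{X_{BV}}Y_n\,d\mu = \sum_{w\in V_n}\mu([w])\,q(w) = 0$. Since $(X_{BV},\tau)$ is linearly recurrent and $|q(w)| \le L\kappa_n$, Lemma 6.36 gives us a $C > 0$ and $\beta \in (0,1)$ such that $\big|\frac{\mu([v]\cap[w])}{\mu([v])} - \mu([w])\big| \le C\beta^k$. It follows that
$$|\mathbb E_\mu(Y_n \mid \mathcal Q_{n-k})| \le C\beta^kL\kappa_n.$$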

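Consequently, for $m < n$,
$$\begin{aligned}
\Big\|\sum_{j=m+1}^n Y_j\Big\|_2^2 &= \mathbb E_\mu\Big(\Big(\sum_{j=m+1}^n Y_j\Big)^2\Big) = \sum_{m<j,k\le n}\mathbb E_\mu(Y_jY_k)\\
&\le 2L^2C\sum_{m<k\le j\le n}\beta^{j-k}\kappa_j\kappa_k = 2L^2C\sum_{i=0}^{n-m-1}\beta^i\sum_{j=m+1}^{n-i}\kappa_j\kappa_{j+i}\\
&\le \frac{2L^2C}{1-\beta}\sum_{j=m+1}^{n}\kappa_j^2,
\end{aligned}$$
which tends to $0$ as $m \to \infty$, uniformly in $n$. Recalling $g_n$ from (6.54), we conclude that $g_n \to g$ in $L^2(\mu)$ and the limit satisfies $g - g\circ\tau = \alpha$. Therefore $f := e^{-2\pi i g}$ is the required eigenfunction.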
Continuous eigenvalues, “only if” direction: For $x$ such that $t(x_n) = v \in V_n$, let $0 \le r_n(x) < h_v(n)$ be such that $x \in \tau^{r_n(x)}([v^{\min}])$. Then $r'_n(x) = h_v(n) - r_n(x)$, and since $x^{\min}$ is the unique minimal path, $f\circ\tau^{-r_n(x)}(x) \to f(x^{\min})$. Continuity of $f$ shows that
$$\lambda^{r_n(x)} = \frac{f(x)}{f\circ\tau^{-r_n(x)}(x)} \to \frac{f(x)}{f(x^{\min})},$$
and hence $\alpha r_n(x) \bmod \mathbb Z$ converges uniformly in $x$. Combining this with the second part of Proposition 6.122, we obtain that $\max_{v\in V_n}|||\alpha h_v(n)||| \to 0$, but we want to show its summability.
By telescoping, we can assume that all entries of the transition matrices are at least 2. Due to Lemma 5.57, for each $n \in \mathbb N$ and $v \in V_n$, the minimal incoming edge $e \in E_n$ with $t(e) = v$ satisfies $s(e) = v_{n-1}^{\min}$, i.e. the $(n-1)$-st vertex of the unique minimal path $x^{\min}$.
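Now for each $n \in \mathbb N$, let $v_n \in V_n$ be the vertex such that $|||\alpha h_{v_n}(n)|||$ is maximal among all $v \in V_n$. We can divide $\mathbb N = I^+_{odd} \cup I^+_{even} \cup I^-_{odd} \cup I^-_{even}$, where
$$I^+_{odd/even} = \{n : \alpha h_{v_n}(n) - \operatorname{round}(\alpha h_{v_n}(n)) \ge 0,\ n \text{ is odd/even}\},$$
$$I^-_{odd/even} = \{n : \alpha h_{v_n}(n) - \operatorname{round}(\alpha h_{v_n}(n)) < 0,\ n \text{ is odd/even}\}.$$
Here $\operatorname{round}(x)$ is the integer nearest to $x$.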

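Let $x^+_{odd} \in X_{BV}$ be the infinite path such that
$$t(x^+_{odd,n}) = \begin{cases} v_n & \text{if } n \in I^+_{odd},\\ v_n^{\min} & \text{otherwise.}\end{cases}$$
Let $y^+_{odd} \in X_{BV}$ be the infinite path such that for each $n \in I^+_{odd}$, the edge $y^+_{odd,n+1} \in E_{n+1}$ is the successor edge to $x^+_{odd,n+1}$. Since the entries of $M(n+1)$ are $\ge 2$, such edges with $t(x^+_{odd,n+1}) = t(y^+_{odd,n+1}) = v_{n+1}^{\min}$ can be found. For all other $n \in \mathbb N$, $s(y^+_{odd,n+1}) = v_n^{\min}$; see Figure 6.9.

Figure 6.9. The paths $x^+_{odd}$ and $y^+_{odd}$ near some level $n \in I^+_{odd}$.

Then
$$\alpha\big(r_n(y^+_{odd}) - r_n(x^+_{odd})\big) \bmod \mathbb Z \equiv \sum_{k\in I^+_{odd},\ k<n}\big(\alpha h_{v_k}(k) - \operatorname{round}(\alpha h_{v_k}(k))\big),$$
where each term equals $|||\alpha h_{v_k}(k)||| = \max_{v\in V_k}|||\alpha h_v(k)|||$. Since the left-hand side converges as $n \to \infty$, while the terms on the right-hand side are nonnegative and tend to $0$ (so the partial sums cannot keep winding around the circle), we obtain
$$\sum_{n\in I^+_{odd}}\max_{v\in V_n}|||\alpha h_v(n)||| < \infty. \tag{6.55}$$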

The arguments for $I^+_{even}$, $I^-_{odd}$ and $I^-_{even}$ are similar, so adding the four variants of (6.55) gives $\sum_{n\in\mathbb N}\max_{v\in V_n}|||\alpha h_v(n)||| < \infty$, as required.
Continuous eigenvalues, “if” direction: Set $f_n(x) = \lambda^j$ if $x \in \tau^j([v^{\min}])$, $v \in V_n$, and $0 \le j < h_v(n)$. Then $f_n$ is continuous because it is constant on the elements of $\mathcal P_n$. Also, if $t(\tau(x)_n) = t(x_n)$, i.e. $j < h_v(n) - 1$, then $f_n\circ\tau(x) = \lambda^{j+1} = \lambda f_n(x)$.

Now $f_{n+1}(x)/f_n(x) = \lambda^{u_n(x)}$, where $u_n(x) = \sum_{e\in E_{n+1},\,e<x_{n+1}} h_{s(e)}(n)$. Using linear recurrence (so each vertex has at most $L$ incoming edges),
$$|f_{n+1}(x) - f_n(x)| = \Big|\frac{f_{n+1}(x)}{f_n(x)} - 1\Big| \le 2\pi\,|||\alpha u_n(x)||| \le 2\pi L\max_{v\in V_n}|||\alpha h_v(n)|||.$$
We have $\sum_n \max_{v\in V_n}|||\alpha h_v(n)||| < \infty$ by assumption, so $f_n$ converges uniformly to some $f : X_{BV} \to \mathbb C$, and therefore $f$ is continuous as well.

For $x \notin X^{\max}$, there is $N$ such that $t(\tau(x)_n) = t(x_n)$ for all $n \ge N$, and therefore $f(\tau(x)) = \lim_n f_n(\tau(x)) = \lim_n \lambda f_n(x) = \lambda f(x)$. Finally, for $x^{\max} \in X^{\max}$ we use the argument of (6.44). □

Corollary 6.123. Let $(X_{BV},\tau)$ be a BV-system of finite rank, and set
$$K_n = \min\{|h_v(n) - h_w(n)| : v, w \in V_n,\ h_v(n) \ne h_w(n)\}.$$
If $K := \liminf_n K_n \in [1,\infty)$, then there are at most $K$ eigenvalues.

Since 1 is an eigenvalue, $K = 1$ means that $(X_{BV},\tau)$ is weakly mixing. On the other hand, if $h_v(n)$ is a multiple of $K$ for all $n \in \mathbb N$ and $v \in V_n$, then setting $f(x) = e^{2\pi ij/K}$ whenever $x \in \tau^j(\{x_1 = e \in E_1\})$ gives a continuous eigenfunction to the eigenvalue $e^{2\pi i/K}$. Corollary 6.123 doesn't pertain to substitutions of constant length $q$ because $K = 0$, and naturally $e^{2\pi i\alpha}$ is an eigenvalue for every $\alpha = p/q^n$.

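Proof. Let $\lambda = e^{2\pi i\alpha}$ be an eigenvalue. From Theorem 6.118 we know that
$$\lim_{n\to\infty}\max_{v\in V_n}|1-\lambda^{h_v(n)}| = \lim_{n\to\infty}\max_{v\in V_n}|||\alpha h_v(n)||| = 0.$$
But there is a subsequence $(n_k)$ so that we can find $v_k, w_k \in V_{n_k}$ with $h_{w_k}(n_k) = h_{v_k}(n_k) + K$. Therefore both $|||\alpha h_{v_k}(n_k)|||$ and $|||\alpha(h_{v_k}(n_k)+K)|||$ tend to $0$ as $k \to \infty$. Since $|||\alpha K||| \le |||\alpha h_{v_k}(n_k)||| + |||\alpha(h_{v_k}(n_k)+K)|||$ does not depend on $k$, it follows that $|||\alpha K||| = 0$, which means that $\alpha = j/K$ for some $j \in \{0, 1, \dots, K-1\}$. □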

Example 6.124. The primitive Chacon substitution shift (see Example 1.27) given by
$$\chi_{Chac}: \begin{cases} 0 \to 0012,\\ 1 \to 021,\\ 2 \to 21 \end{cases}$$
is weakly mixing. Since $(X_{\chi_{Chac}},\sigma)$ and the standard Chacon substitution shift are topologically conjugate and uniquely ergodic, this is immediate. It can also be seen by direct computation as follows. The associated matrix of $\chi_{Chac}$ and its diagonalization are
$$A = \begin{pmatrix}2&1&1\\1&1&1\\0&1&1\end{pmatrix} = \begin{pmatrix}0&-1&3\\-1&0&2\\1&1&1\end{pmatrix}\begin{pmatrix}0&0&0\\0&1&0\\0&0&3\end{pmatrix}\frac16\begin{pmatrix}2&-4&2\\-3&3&3\\1&1&1\end{pmatrix}.$$

When turned into an ordered Bratteli diagram³³ of rank 3, the heights of order $n$ can be computed as
$$\begin{cases} h_0(n) = |\chi^n(0)| = \|(1,0,0)A^n\|_1 = \frac12(3^{n+1}-1),\\ h_1(n) = |\chi^n(1)| = \|(0,1,0)A^n\|_1 = 3^n,\\ h_2(n) = |\chi^n(2)| = \|(0,0,1)A^n\|_1 = \frac12(3^n+1).\end{cases}$$
It follows that $2h_2(n) - h_1(n) = 1$, so for any $\alpha \in (0,1)$,
$$0 < |||\alpha||| = |||\alpha(2h_2(n) - h_1(n))||| \le 2|||\alpha h_2(n)||| + |||\alpha h_1(n)|||,$$
and the right-hand side does not tend to $0$ as $n \to \infty$. Theorem 6.118 thus says that $e^{2\pi i\alpha}$ is not an eigenvalue, and therefore the primitive Chacon substitution shift is weakly mixing. In fact, any substitution shift obtained from $\chi_{Chac}$ by permuting the letters in each substitution word is also weakly mixing (provided $\chi(0)$ starts with 0).
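As a sanity check on these formulas, the following short Python sketch (the helper names are our own) iterates $\chi_{Chac}$ directly, compares the word lengths with the closed forms for $h_0, h_1, h_2$, confirms the identity $2h_2(n) - h_1(n) = 1$ used above, and verifies the diagonalization of $A$ in exact rational arithmetic. It only checks the arithmetic; it does not by itself prove weak mixing.

```python
from fractions import Fraction

# Primitive Chacon substitution of Example 6.124.
CHACON = {"0": "0012", "1": "021", "2": "21"}


def iterate(sub, word, n):
    """Apply the substitution `sub` n times to `word`."""
    for _ in range(n):
        word = "".join(sub[c] for c in word)
    return word


def heights(n):
    """(h_0(n), h_1(n), h_2(n)) = (|chi^n(0)|, |chi^n(1)|, |chi^n(2)|)."""
    return tuple(len(iterate(CHACON, a, n)) for a in "012")


def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


# Associated matrix: row a counts the letters 0, 1, 2 in chi(a).
A = [[CHACON[a].count(b) for b in "012"] for a in "012"]
P = [[0, -1, 3], [-1, 0, 2], [1, 1, 1]]
D = [[0, 0, 0], [0, 1, 0], [0, 0, 3]]
P_INV = [[Fraction(x, 6) for x in row] for row in [[2, -4, 2], [-3, 3, 3], [1, 1, 1]]]

assert A == [[2, 1, 1], [1, 1, 1], [0, 1, 1]]
assert matmul(matmul(P, D), P_INV) == A          # A = P D P^{-1}

for n in range(10):
    h0, h1, h2 = heights(n)
    assert 2 * h0 == 3 ** (n + 1) - 1
    assert h1 == 3 ** n
    assert 2 * h2 == 3 ** n + 1
    assert 2 * h2 - h1 == 1                      # the identity behind weak mixing
```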
Example 6.125. If $f$ is a unimodal map with kneading map $Q(k) = \max\{k-5, 0\}$, then the restriction $f|_{\omega(c)}$ is a uniquely ergodic minimal Cantor system, which can be described as a BV-system (although not invertible, because $\#X^{\max} = 5 > 1 = \#X^{\min}$) or as an enumeration system; see Example 5.23. Sufficiently large cutting times satisfy the recursive relation
$$S_k = S_{k-1} + S_{k-5} = \begin{cases} S_{k-2}+S_{k-3}+1 & \text{if } k \equiv 2 \text{ or } 3 \bmod 6,\\ S_{k-2}+S_{k-3}-1 & \text{if } k \equiv 0 \text{ or } 5 \bmod 6,\\ S_{k-2}+S_{k-3} & \text{if } k \equiv 1 \text{ or } 4 \bmod 6.\end{cases}$$
Also, $\#V_k = 5$ for all $k$ sufficiently large, and the height vector is $h(k) = (S_k, S_{k-1}, S_{k-2}, S_{k-3}, S_{k-4})$. For $k \equiv 2$ or $3 \bmod 6$ and $\alpha \notin \mathbb Z$, we have
$$\begin{aligned}
0 < |e^{2\pi i\alpha}-1| &= |e^{2\pi i\alpha(S_k-S_{k-2}-S_{k-3})}-1| = |e^{2\pi i\alpha S_k} - e^{2\pi i\alpha(S_{k-2}+S_{k-3})}|\\
&\le |e^{2\pi i\alpha S_k}-1| + |e^{2\pi i\alpha(S_{k-2}+S_{k-3})}-1| = |e^{2\pi i\alpha S_k}-1| + |e^{2\pi i\alpha S_{k-2}} - e^{-2\pi i\alpha S_{k-3}}|\\
&\le |e^{2\pi i\alpha S_k}-1| + |e^{2\pi i\alpha S_{k-2}}-1| + |e^{2\pi i\alpha S_{k-3}}-1|,
\end{aligned}$$
so $\max_{v\in V_k}|e^{2\pi i\alpha h_v(k)}-1| \ge \frac13|e^{2\pi i\alpha}-1|$ for infinitely many $k$, and $e^{2\pi i\alpha}$ is not an eigenvalue by Theorem 6.118. Hence 1 is the only eigenvalue, and $(\omega(c), f)$ is weakly mixing. This example was discussed at length in [125] and [122, Lemma 6.1]. Weak mixing holds for kneading maps $Q(k) = \max\{k-d, 0\}$ for all $d \ge 5$, but there are non-trivial eigenvalues if $1 \le d \le 4$. For such $d$, the leading solution of the characteristic equation $x^d = x^{d-1}+1$ is Pisot; see Section 5.3.1.
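The recursion and its residues mod 6 can be checked numerically. The sketch below assumes the usual convention for cutting times, $S_0 = 1$ and $S_k = S_{k-1} + S_{Q(k)}$; with this convention the stated pattern of $S_k - S_{k-2} - S_{k-3}$ already holds for all $k \ge 5$. (The pattern is forced: $D_k = S_k - S_{k-2} - S_{k-3}$ satisfies $D_k = D_{k-1} - D_{k-2}$, whose solutions have period 6.)

```python
def cutting_times(d, count):
    """First `count` cutting times for the kneading map Q(k) = max(k - d, 0).

    Assumes the standard convention S_0 = 1 and S_k = S_{k-1} + S_{Q(k)}.
    """
    S = [1]
    for k in range(1, count):
        S.append(S[k - 1] + S[max(k - d, 0)])
    return S


# Residue class of k mod 6  ->  S_k - S_{k-2} - S_{k-3}, as in Example 6.125.
EXPECTED_DEFECT = {2: 1, 3: 1, 0: -1, 5: -1, 1: 0, 4: 0}

S = cutting_times(5, 80)
for k in range(5, len(S)):
    assert S[k] == S[k - 1] + S[k - 5]
    assert S[k] - S[k - 2] - S[k - 3] == EXPECTED_DEFECT[k % 6]

print(S[:12])  # [1, 2, 3, 4, 5, 6, 8, 11, 15, 20, 26, 34]
```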

Recall from Proposition 6.88 that the eigenvalues form a multiplicative subgroup of $\mathbb S^1$. The number of rationally independent elements in this group is estimated by the following result; see [110, Theorem 9].

³³ This diagram has two minimal sequences and one maximal sequence, but this has no consequence for the application of Theorem 6.118 in regard to measurable eigenvalues.

Theorem 6.126. A Bratteli-Vershik system of finite rank $r$ with $e$ ergodic probability measures has at most $r-e+1$ rationally independent eigenvalues.

Clearly, this bound is sharp for $r = 1$, and since 1 is an eigenvalue, all eigenvalues have to be roots of unity. The next example shows that the bound is also sharp for $r = 2$, and now irrational eigenvalues are possible.
Example 6.127. Let $\chi: 0 \to 0001,\ 1 \to 01$ be a substitution, with fixed point $\rho = 0001\,0001\,0001\,01\cdots$ and associated matrix
$$\begin{pmatrix}3&1\\1&1\end{pmatrix} \quad\text{with eigenvalues } \alpha_\pm = 2 \pm \sqrt2.$$
In particular, $\alpha_+$ is Pisot, so $|||\alpha_+^n||| \to 0$ exponentially, according to Proposition 8.5. Consider the corresponding Bratteli diagram, where $V_n = \{0,1\}$ (with vertex 0 on the spine) has height functions
$$h_0(n) = \frac{\sqrt2}{4}(\alpha_+^n - \alpha_-^n), \qquad h_1(n) = \frac{2-\sqrt2}{4}\alpha_+^n + \frac{2+\sqrt2}{4}\alpha_-^n.$$
All dyadic rationals are eigenvalues because every $2^k$ divides $h_0(n)$ and $h_1(n)$ for $n$ sufficiently large.
Also $e^{2\pi i\alpha_+}$ is an eigenvalue, because
$$|||\alpha_+h_0(n)||| = \Big|\Big|\Big|\frac{\sqrt2}{4}(\alpha_+^{n+1} - \alpha_+\alpha_-^n)\Big|\Big|\Big| = \Big|\Big|\Big|h_0(n+1) + \frac{\sqrt2}{4}(\alpha_- - \alpha_+)\alpha_-^n\Big|\Big|\Big|$$
decreases exponentially, and likewise for $\alpha_+h_1(n)$.
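These claims are easy to test numerically. The Python sketch below uses the recursion $h_0(n+1) = 3h_0(n) + h_1(n)$, $h_1(n+1) = h_0(n) + h_1(n)$ coming from $\chi(0) = 0001$ and $\chi(1) = 01$, normalized so that $h_0(1) = h_1(1) = 1$ as in the closed forms above, together with high-precision decimals (the precision of 80 digits is our choice). Note that $\frac{\sqrt2}{4}(\alpha_+ - \alpha_-) = 1$, so the display above gives exactly $|||\alpha_+h_0(n)||| = \alpha_-^n$ as soon as $\alpha_-^n < 1/2$, i.e. for $n \ge 2$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80                 # plenty of digits for n up to 40
SQRT2 = Decimal(2).sqrt()
ALPHA_PLUS, ALPHA_MINUS = 2 + SQRT2, 2 - SQRT2
TOL = Decimal("1e-40")


def heights(n):
    """(h_0(n), h_1(n)) with h_0(1) = h_1(1) = 1."""
    h0, h1 = 1, 1
    for _ in range(n - 1):
        # chi(0) = 0001 contributes 3 h_0 + h_1, chi(1) = 01 contributes h_0 + h_1.
        h0, h1 = 3 * h0 + h1, h0 + h1
    return h0, h1


def dist_to_int(x):
    """|||x|||: distance from x to the nearest integer."""
    return abs(x - x.to_integral_value())


def two_adic_valuation(m):
    """Largest v with 2^v dividing the positive integer m."""
    v = 0
    while m % 2 == 0:
        m //= 2
        v += 1
    return v


for n in range(1, 41):
    h0, h1 = heights(n)
    # Closed forms from the text.
    assert abs(SQRT2 / 4 * (ALPHA_PLUS**n - ALPHA_MINUS**n) - h0) < TOL
    assert abs((2 - SQRT2) / 4 * ALPHA_PLUS**n + (2 + SQRT2) / 4 * ALPHA_MINUS**n - h1) < TOL
    # Dyadic eigenvalues: 2^((n-1)//2) divides both heights (A^2 = 2 * integer matrix).
    assert min(two_adic_valuation(h0), two_adic_valuation(h1)) >= (n - 1) // 2
    if n >= 2:
        # |||alpha_+ h_0(n)||| = alpha_-^n decays exponentially.
        assert abs(dist_to_int(ALPHA_PLUS * h0) - ALPHA_MINUS**n) < TOL
    if n >= 4:
        # Similarly |||alpha_+ h_1(n)||| = (1 + sqrt 2) alpha_-^n.
        assert abs(dist_to_int(ALPHA_PLUS * h1) - (1 + SQRT2) * ALPHA_MINUS**n) < TOL
```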
Chapter 7

Automata and Linguistic Complexity

7.1. Automata
In this section we discuss Turing machines and variations of them and ask the
question of what languages they can recognize or generate. The terminology
is not entirely consistent in the literature, so some of the notions below may
be called differently depending on which book you read.

7.1.1. Turing Machines. A Turing machine is a formal description of a simple type of computer (or, if you prefer, algorithm), named after the British mathematician Alan Turing (1912–1954). He used it in theoretical papers to explore the limits of what is computable by computers and what is not. So rather than an actual computing device, Turing machines are a theoretical tool to study which types of problems are in principle solvable by a computer.
For us, the minimal size of a Turing machine that can accept words in a language L(X) (and reject words that don't belong to L(X)) is a measure of how complicated a language is. In fact, a subshift is called recursively enumerable in the Chomsky hierarchy, see Section 7.2, if its language can be recognized by a Turing machine.
A Turing machine has the following components:
• A tape on which the input is written as a word over the input alphabet A = {0, 1, b}, where b stands for a blank symbol. Only finitely many symbols are allowed to be non-blank, so if the tape is bi-infinite (some variants of Turing machines use bounded tapes), then it starts and ends with infinitely many blanks.

• A reading device that can read a symbol at one position on the tape at a time. It can also erase the symbol and write a new one, and it can move to the next or previous position on the tape. At the start, the reading device is located at the first non-blank position.
• A finite collection of states Q = {q0, . . . , qN−1}, so N is the size of the Turing machine. One state, say q0, is the initial state. A collection H of states are the halting states; these split into accepting, rejecting, and indecisive states. When a state q ∈ H is reached, the machine stops and accepts or rejects the input or remains undecided according to the status of q.
• Each state comes with a short list of instructions:
– read the symbol;
– replace the symbol or not;
– move to the left or right position;
– move to another (or the same) state.
This instruction list is the outcome of the transition function
δ : (Q \ H) × A ⇉ Q × A × {L, N, R},
where {L, N, R} stands for the moves of the reading device (left, no move, right). Some authors disallow N; it can always be replaced by move right + move left, using a few extra states. Because the Turing machine halts when it reaches a state q ∈ H, δ need not be defined on H.
Furthermore, in full generality, the transition function can be multivalued (so we write ⇉ rather than →). When reading a ∈ A in state q ∈ Q \ H, the Turing machine can choose which instruction list in δ(q, a) it will perform. In order to emphasize this, we speak of a non-deterministic Turing machine, whereas the Turing machine is deterministic if δ is a proper single-valued function.

A Turing machine recognizes a language L if it halts and accepts every x ∈ L, whereas it does not accept any x ∉ L (i.e. on such x it rejects, is indecisive, or never halts).

Example 7.1. Let χFib be the Fibonacci substitution of Example 4.6. The
following Turing machine replaces the input w ∈ {0, 1}∗ by χFib (w). The
word w is written as · · · bbbwbbbb · · · on the tape, and the reading device
(cursor) starts at the first symbol of w. The initial state is q0 and there is
one halting state q8. The transition function δ is as follows (with a ∈ {0, 1}):

(q0, 0) → (q1, b, R)
(q0, 1) → (q4, b, R)      read input symbol
(q0, b) → (q8, b, R)

(q1, a) → (q1, a, R)
(q1, b) → (q2, b, R)      move cursor right until the second blank
(q2, a) → (q2, a, R)

(q2, b) → (q3, 0, R)      write 01
(q3, b) → (q6, 1, L)

(q4, a) → (q4, a, R)
(q4, b) → (q5, b, R)      move cursor right until the second blank
(q5, a) → (q5, a, R)

(q5, b) → (q6, 0, L)      write 0

(q6, a) → (q6, a, L)
(q6, b) → (q7, b, L)      move cursor left until the second blank
(q7, a) → (q7, a, L)

(q7, b) → (q0, b, R)      start at next input symbol

For the input w = 01, the machine works step by step as follows (the scanned cell is shown in brackets):

q0 : [0]1bbbb → q1 : b[1]bbbb → q1 : b1[b]bbb → q2 : b1b[b]bb
→ q3 : b1b0[b]b → q6 : b1b[0]1b → q6 : b1[b]01b → q7 : b[1]b01b
→ q7 : [b]1b01b → q0 : b[1]b01b → q4 : bb[b]01b → q5 : bbb[0]1b
→ q5 : bbb0[1]b → q5 : bbb01[b] → q6 : bbb0[1]0 → q6 : bbb[0]10
→ q6 : bb[b]010 → q7 : b[b]b010 → q0 : bb[b]010 → q8 : bbb[0]10

If we remove the halting state q8 and replace the instruction in the third line by (q0, b) → (q0, b, R), then the Turing machine will never halt, but instead replace w by χFib(w) and then by χ²Fib(w), χ³Fib(w), χ⁴Fib(w), etc.
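To experiment with this machine, here is a small Python simulator (a sketch; encoding the moves as ±1 and the step limit are our own choices). It stores the tape as a dictionary in which missing cells are blank, runs the transition table above until the halting state q8 is reached, and compares the result with $\chi_{Fib}$ on all short words, where we take $\chi_{Fib}: 0 \to 01,\ 1 \to 0$, which is what the table implements.

```python
from itertools import product

BLANK = "b"


def fibonacci_tm():
    """Transition table of Example 7.1: (state, read) -> (state, write, move)."""
    R, L = +1, -1
    delta = {
        ("q0", "0"): ("q1", BLANK, R),    # read input symbol 0 and erase it
        ("q0", "1"): ("q4", BLANK, R),    # read input symbol 1 and erase it
        ("q0", BLANK): ("q8", BLANK, R),  # no input left: halt
        ("q1", BLANK): ("q2", BLANK, R),
        ("q2", BLANK): ("q3", "0", R),    # write 01
        ("q3", BLANK): ("q6", "1", L),
        ("q4", BLANK): ("q5", BLANK, R),
        ("q5", BLANK): ("q6", "0", L),    # write 0
        ("q6", BLANK): ("q7", BLANK, L),
        ("q7", BLANK): ("q0", BLANK, R),  # start at next input symbol
    }
    for a in "01":
        delta[("q1", a)] = ("q1", a, R)   # move right until the second blank
        delta[("q2", a)] = ("q2", a, R)
        delta[("q4", a)] = ("q4", a, R)
        delta[("q5", a)] = ("q5", a, R)
        delta[("q6", a)] = ("q6", a, L)   # move left until the second blank
        delta[("q7", a)] = ("q7", a, L)
    return delta


def run(word, delta, start="q0", halting=("q8",), max_steps=10**6):
    """Run a deterministic Turing machine and return the non-blank tape contents."""
    tape = dict(enumerate(word))          # cells missing from the dict are blank
    head, state = 0, start
    for _ in range(max_steps):
        if state in halting:
            return "".join(tape[i] for i in sorted(tape) if tape[i] != BLANK)
        state, tape[head], move = delta[(state, tape.get(head, BLANK))]
        head += move
    raise RuntimeError("no halting state reached within max_steps")


def chi_fib(word):
    return "".join({"0": "01", "1": "0"}[c] for c in word)


delta = fibonacci_tm()
print(run("01", delta))  # 010

# The machine computes chi_Fib on every word of length at most 6.
for n in range(7):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert run(w, delta) == chi_fib(w)
```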

A priori, Turing machines need not halt on every input. Hence, if a machine keeps running, you can't be sure whether it simply hasn't come to a conclusion yet or never will. Having an estimate (in terms of the input length) of how long a Turing machine will take to accept an input is therefore of interest. Since the reading device can also move left, the number of unread symbols gives no indication.

7.1.2. Finite Automata. A finite automaton is a simplified Turing machine that can only read a tape from left to right but not write on it. Formally, it is a 4-tuple
(7.1) M = (Q, A, q0, δ),
where
Q = collection of states the machine can be in. This Q is divided into accepting and rejecting states, which take effect once the input has been read to the end.
A = the input alphabet over which the tape is written.
q0 = the initial state in Q.
δ = the rule of how to go from one state to the next when reading a symbol on the tape. This is the transition function δ : Q × A ⇉ Q.

A language L is regular if it can be recognized by a finite automaton, i.e. w ∈ L if and only if the automaton parses w to the end and finishes in an accepting state.
The finite automaton is deterministic (DFA) if the transition function δ is single-valued, and it is non-deterministic (NFA) if δ is multivalued. So, if we are in state q, read symbol a on the input tape, and δ(q, a) has multiple outcomes, then we need to make a choice. For computers, making random choices is somewhat problematic. We don't want to go into the theoretical subtleties of random number generators, but we suppose that we can simply assign equal probability to every valid choice, independently of the choices made elsewhere in the process. The underlying stochastic process is then a discrete Markov process. A word is accepted by an NFA if there is a positive probability that the word is parsed to the end and finishes in an accepting state.

Example 7.2. The language of the even shift of Example 1.4 is recognized
by the automaton in Figure 7.1. The tape is written over the alphabet
A = {0, 1, b}. The arrow qi → qj labeled a ∈ A represents δ(qi , a) = qj .

Figure 7.1. Transition graph for a finite automaton recognizing the even shift, with states q0, q1, q2, the accepting state q3 and the rejecting state q4. Parsing ends when the first blank is read.
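Since the drawing itself did not survive, the following Python sketch uses a transition table that is only our reading of Figure 7.1, under the assumption (consistent with the grammar of Example 7.5) that the accepted words are the concatenations of the blocks 0 and 11. In this hypothetical table, q2 is a dead state, and reading the blank b ends the parse in q3 (accept) or q4 (reject).

```python
import re
from itertools import product

BLANK = "b"

# Hypothetical transition table modelled on Figure 7.1 (our reading of the figure).
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1", ("q0", BLANK): "q3",
    ("q1", "1"): "q0", ("q1", "0"): "q2", ("q1", BLANK): "q4",
    ("q2", "0"): "q2", ("q2", "1"): "q2", ("q2", BLANK): "q4",
}
ACCEPTING = {"q3"}


def accepts(word):
    """Parse word followed by a blank; the verdict is the state reached at the blank."""
    state = "q0"
    for symbol in word + BLANK:
        state = DELTA[(state, symbol)]
    return state in ACCEPTING


# Cross-check: the accepted words are exactly the concatenations of 0 and 11.
for n in range(9):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert accepts(w) == (re.fullmatch(r"(?:0|11)*", w) is not None)
```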

This example demonstrates how to assign an edge-labeled transition graph to a finite automaton and explains why the regular languages are precisely the sofic languages.
Sometimes it is easier, for proofs or constructing compact examples, to allow finite automata to have transitions in the graph without reading the symbol on the input tape¹ (and moving to the next symbol). Such transitions are called ε-moves. Automata with ε-moves are non-deterministic, because if a state q has an outgoing arrow with label a and an outgoing arrow with label ε and the input tape reads a, then there is still the choice to follow the a-arrow or the ε-arrow.
We mention without proof (see [319, page 22] or [20, Chapter 4]):

Theorem 7.3. Let L be a language recognized by a non-deterministic finite automaton with ε-moves. Then there is a deterministic finite automaton that recognizes L as well.
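The standard proof of Theorem 7.3 is the subset construction: the states of the DFA are ε-closed sets of NFA states. A minimal Python sketch follows; the example NFA (accepting $0^*$ together with all words ending in 11) is our own and was chosen only because it uses ε-moves.

```python
import re
from collections import deque
from itertools import product

EPS = ""  # label used for epsilon-moves


def eps_closure(states, delta):
    """All states reachable from `states` by epsilon-moves alone."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def determinize(alphabet, delta, start, accepting):
    """Subset construction; `delta` maps (state, label) to a set of states."""
    dfa_start = eps_closure({start}, delta)
    dfa_delta, dfa_accepting = {}, set()
    queue, seen = deque([dfa_start]), {dfa_start}
    while queue:
        subset = queue.popleft()
        if subset & accepting:
            dfa_accepting.add(subset)
        for a in alphabet:
            moved = {r for q in subset for r in delta.get((q, a), ())}
            target = eps_closure(moved, delta)
            dfa_delta[(subset, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_delta, dfa_start, dfa_accepting


def dfa_accepts(dfa, word):
    dfa_delta, state, accepting = dfa
    for a in word:
        state = dfa_delta[(state, a)]
    return state in accepting


# Example NFA: s -eps-> z (loop on 0), s -eps-> a (loop on 0, 1), a -1-> b -1-> f.
NFA_DELTA = {
    ("s", EPS): {"z", "a"},
    ("z", "0"): {"z"},
    ("a", "0"): {"a"}, ("a", "1"): {"a", "b"},
    ("b", "1"): {"f"},
}
DFA = determinize("01", NFA_DELTA, "s", {"z", "f"})

for n in range(9):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert dfa_accepts(DFA, w) == (re.fullmatch(r"(?:0*|[01]*11)", w) is not None)
```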

Corollary 7.4. Let $w^R = w_n\cdots w_1$ stand for the reverse of a word $w = w_1\cdots w_n$. If a language L is recognized by some finite automaton, then so is its reversed language $L^R = \{w^R : w \in L\}$.

Proof. Let (G, A) be the edge-labeled directed graph representing the finite automaton for L. Assume without loss of generality that the automaton directs every path to a single state, say $q_e$, when the entire input is read. Then the reverse graph $(G^R, A)$, in which the directions of all arrows are reversed and the roles of $q_e$ and the initial state are swapped, recognizes $L^R$. However, even if every outgoing arrow in G has a different label (so the automaton is deterministic), this need no longer be true for $(G^R, A)$. But by Theorem 7.3 there is also a DFA that recognizes $L^R$. □
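The proof can be mirrored in code. In the Python sketch below (our own toy example: a DFA for the words that start with 01), the arrows are reversed and the old accepting states serve as initial states, which has the same effect as first merging them into a single state $q_e$ as in the proof. The reversed automaton is in general non-deterministic, so it is simulated on sets of states, i.e. the subset construction of Theorem 7.3 is performed on the fly.

```python
from itertools import product

# DFA (our own example) for the words that start with 01.
DELTA = {
    ("s0", "0"): "s1", ("s0", "1"): "dead",
    ("s1", "1"): "s2", ("s1", "0"): "dead",
    ("s2", "0"): "s2", ("s2", "1"): "s2",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
START, ACCEPTING = "s0", {"s2"}


def reverse_automaton(delta, start, accepting):
    """Reverse all arrows; old accepting states become initial, the old start accepts."""
    rev = {}
    for (p, a), q in delta.items():
        rev.setdefault((q, a), set()).add(p)
    return rev, set(accepting), {start}


def nfa_accepts(nfa, word):
    """Simulate the reversed (non-deterministic) automaton on sets of states."""
    rev, current, final = nfa
    for a in word:
        current = {p for q in current for p in rev.get((q, a), ())}
    return bool(current & final)


REVERSED = reverse_automaton(DELTA, START, ACCEPTING)

# w is in L^R  iff  w reversed starts with 01  iff  w ends with 10.
for n in range(8):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert nfa_accepts(REVERSED, w) == w.endswith("10")
```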

7.2. The Chomsky Hierarchy

A different approach to complexity of languages is due to Noam Chomsky's study to describe the grammar of natural languages, based on production rules.

¹ Said differently, if we include the movement of the reading device in the transition function δ : Q × A ⇉ Q × {N, R}, then the ε-moves are the transitions in δ⁻¹(Q × {N}).

For example, to build sentences in English, you could (repeatedly) use the following rules, until there are no variables (i.e. the things within ⟨ ⟩) left:

⟨sentence⟩ → ⟨articled noun phrase⟩⟨transitive verb⟩⟨articled noun phrase⟩
⟨articled noun phrase⟩ → ⟨article⟩⟨noun phrase⟩
⟨noun phrase⟩ → ⟨adjective⟩⟨noun phrase⟩
⟨noun phrase⟩ → ⟨noun⟩
⟨noun⟩ → mouse, cat, book, fluency
⟨article⟩ → the, a
⟨adjective⟩ → big, small, high, low, red, green, orange, yellow
⟨transitive verb⟩ → chases, eats, hits, reads

This produces sentences such as

(7.2)  a small yellow mouse chases the big green cat,
       a high low red fluency eats a orange book.

Here the first sentence is fine; the second is nonsense. But apart from the fact that "a orange" should be "an orange", it is grammatically correct.
In arithmetic, we can make the following example:

⟨expression⟩ → ⟨expression⟩ ∗ ⟨expression⟩,
⟨expression⟩ → ⟨expression⟩ + ⟨expression⟩,
⟨expression⟩ → (⟨expression⟩),
⟨expression⟩ → 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

This can generate all kinds of arithmetic expressions by repeatedly adding and multiplying the numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 that a pocket calculator should be able to compute. For instance
9 + (5 ∗ 3) + 7, (9 + 5) ∗ 3 + 7, 9 + 5 ∗ (3 + 7), (9 + 5) ∗ (3 + 7),
all with different outcomes (31, 49, 59 and 140, respectively).
Formally, this grammar has the components
G = (V, T, P, S),
V = set of variables to which production rules can be applied.
T = set of terminals which remain unchanged.
P = set of production rules to replace variables with words in $(V \cup T)^*$.
S = a special variable, called the starting symbol.

The language L(G) of a grammar G is the collection of all words in $T^*$ that, starting from S, can be generated by repeated application of the production rules until no variables are left.
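The definition of $L(G)$ can be turned into a brute-force enumerator. The Python sketch below (our own illustration, not an efficient parser) explores leftmost derivations breadth-first. It assumes that no production has an empty right-hand side, so sentential forms never get shorter and can be pruned by length. We encode the variable for expressions by the single symbol E.

```python
from collections import deque


def generate(grammar, start, max_len):
    """All words of length <= max_len in L(G), via breadth-first leftmost derivations.

    `grammar` maps each variable to a list of right-hand sides (tuples of symbols);
    every other symbol is a terminal. Assumes no right-hand side is empty.
    """
    words, seen, queue = set(), {(start,)}, deque([(start,)])
    while queue:
        form = queue.popleft()
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:                    # no variables left: a word of L(G)
            words.add("".join(form))
            continue
        for rhs in grammar[form[idx]]:     # rewrite the leftmost variable
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


# The arithmetic grammar, with E standing for the expression variable.
ARITH = {"E": [("E", "*", "E"), ("E", "+", "E"), ("(", "E", ")")]
              + [(d,) for d in "0123456789"]}
# The grammar S -> 0S1, S -> 01 (used again in Example 7.8).
ZERO_ONE = {"S": [("0", "S", "1"), ("0", "1")]}

print(sorted(generate(ZERO_ONE, "S", 6)))  # ['000111', '0011', '01']
```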
The Chomsky hierarchy is a classification of languages according to how complicated their grammars are. In order of increasing complexity, they are
regular grammars ⊂ context-free grammars ⊂ context-sensitive grammars ⊂ recursively enumerable grammars.
A language that is generated by a regular/context-free/. . . grammar is itself called regular/context-free/. . . . Note that different grammars can produce the same language. We will define and discuss these types of grammars in the following sections.

7.2.1. Regular Grammars. The regular grammars can be brought in a form where the production rules are of one of the following types:
    left-linear        right-linear
    A → Bw             A → wB
    A → w              A → w
where A, B ∈ V and $w \in T^*$ (possibly w is empty).

Example 7.5. The even shift (Example 1.4) is generated by the following left- and right-linear regular grammars with T = {0, 1}:
    left-linear        right-linear
    S → S0             S → 0S
    S → S11            S → 11S
    S → ε              S → ε
Note that the language L is closed under taking reverses, i.e. $L^R = L$, and this property makes it so simple to convert the left-linear productions into the right-linear productions.

Theorem 7.6 (Kleene’s Theorem). Every regular grammar (left-linear or right-linear) produces a language that can be recognized by a finite automaton and vice versa.

Hence a regular grammar produces the language of a sofic subshift.

Proof. First assume that G = {V, T, P, S} is a right-regular grammar. Construct a finite automaton with ε-moves {Q, A, q0, δ}, where q0 = S and Q consists of all q such that q = S or q is a (not necessarily proper) suffix of the right-hand side of a production rule. Define a transition function
δ(q, a) = q'   if q ∈ V, a = ε, and q → q' is a production,
δ(q, a) = q'   if q = aq' ∈ $T^* \cup T^*V$ and a ∈ T,
and let the empty suffix ε be the accepting state.
Conversely, if a finite automaton is given by {Q, A, q0, δ}, then make the right-regular grammar G = {Q, A, P, q0}, where the productions are p → aq whenever δ(p, a) = q, and p → a whenever δ(p, a) is an accepting state.
A left-linear grammar is found by first constructing a finite automaton that accepts exactly the reverse $w^R = w_n\cdots w_1$ of every $w = w_1\cdots w_n \in L$ (see Corollary 7.4) and then taking the right-linear grammar for this reverse language $L^R$. Then rewrite every production rule A → wB to A → Bw to obtain a left-linear grammar that generates exactly the original L. □
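The first half of this proof is constructive, and the following Python sketch follows it directly (the tuple encoding of states is our own choice): states are the start variable and the suffixes of right-hand sides, ε-moves rewrite a variable by one of its productions, and reading a terminal strips it off the front of the current suffix. The empty suffix is the accepting state. Applied to the right-linear grammar of Example 7.5, it accepts exactly the concatenations of 0 and 11.

```python
import re
from itertools import product


def grammar_to_nfa(productions, start):
    """NFA with epsilon-moves for a right-linear grammar, as in the proof.

    `productions` is a list of pairs (A, rhs) with rhs a tuple over V and T.
    States are tuples of symbols; the empty tuple () is the accepting state.
    """
    variables = {head for head, _ in productions}
    states = {(start,)}
    for _, rhs in productions:
        states.update(tuple(rhs[i:]) for i in range(len(rhs) + 1))
    eps = {}                                   # q in V and q -> q' a production
    for head, rhs in productions:
        eps.setdefault((head,), set()).add(tuple(rhs))
    step = {(q, q[0]): q[1:]                   # q = a q' with a a terminal
            for q in states if q and q[0] not in variables}
    return eps, step, (start,)


def eps_closure(states, eps):
    closure, stack = set(states), list(states)
    while stack:
        for r in eps.get(stack.pop(), ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return closure


def nfa_accepts(nfa, word):
    eps, step, start = nfa
    current = eps_closure({start}, eps)
    for a in word:
        current = eps_closure({step[(q, a)] for q in current if (q, a) in step}, eps)
    return () in current


# Right-linear grammar of Example 7.5: S -> 0S | 11S | epsilon.
EVEN = grammar_to_nfa([("S", ("0", "S")), ("S", ("1", "1", "S")), ("S", ())], "S")

for n in range(9):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        assert nfa_accepts(EVEN, w) == (re.fullmatch(r"(?:0|11)*", w) is not None)
```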

7.2.2. Context-Free Grammars. The second sentence in (7.2) makes no sense, because (for example) high does not go together with low and fluencies don't eat. In other words, the grammar rules produce word combinations without looking at the meaning of the particular words and which words can go together. This is the explanation behind the term context-free.
Formally, a context-free grammar (V, T, P, S) is one in which the set P of productions is finite, and each of them has the form A → α, where A ∈ V and $\alpha \in (V \cup T)^*$ is a finite string of variables and terminals.
Example 7.7. Dyck shifts (see Section 3.10) are context-free. For example, if there are two types of brackets, ( ) and [ ], then a valid set of production rules is
1. S → ε (the empty word).
2. S → SS.
3. S → (S).
4. S → [S].
For instance, the expression ( [ ] ( [ ] ) ) is produced by the chain
S →₃ (S) →₂ (SS) →₄ ([S]S) →₃ ([S](S)) →₄ ([S]([S])) →₁ ([ ]([S])) →₁ ([ ]([ ])),
where the subscript of each arrow indicates the production rule applied.
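Membership in the language generated by rules 1–4 (the balanced bracket words) can be decided with a stack: push every opening bracket and, at every closing bracket, pop the top symbol and check that it is the matching opening bracket. A minimal Python sketch:

```python
from itertools import product

CLOSING = {")": "(", "]": "["}


def is_dyck_word(word):
    """True iff the brackets in `word` are balanced, i.e. it is produced by rules 1-4."""
    stack = []
    for c in word:
        if c in "([":
            stack.append(c)                 # remember the opening bracket
        elif c in CLOSING:
            if not stack or stack.pop() != CLOSING[c]:
                return False                # wrong or missing partner
        else:
            raise ValueError(f"unexpected symbol {c!r}")
    return not stack                        # every bracket has been closed


print(is_dyck_word("([]([]))"))  # True
print(is_dyck_word("([)]"))      # False

# Count the balanced words of length 6 over the two bracket types.
balanced_length_6 = sum(is_dyck_word("".join(t)) for t in product("()[]", repeat=6))
```

The count `balanced_length_6` equals $C_3 \cdot 2^3 = 5 \cdot 8 = 40$, where $C_3$ is the Catalan number: choose a balanced shape with one bracket type, then a type for each of the three pairs.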

Example 7.8. Consider the language $L := \{0^n1^n : n \ge 1\}$. That is, a block of 0's followed by an equally long block of 1's.
This is a context-free language, generated by the productions
S → 0S1,
S → 01.
In order to show that L is not simpler than context-free, we assume, for a contradiction, that L is sofic. Then there is a finite edge-labeled transition graph G which generates L. Since there are only finitely many, say r, vertices, every word $0^n$ for $n \ge r$ must contain a subword $0^m$ corresponding to a loop in G. But then we can also take this loop k times. In particular, for each word $0^n1^n$, also
$$0^{n+(k-1)m}1^n = 0^a\,\underbrace{0^m0^m\cdots0^m}_{\text{the } m\text{-loop } k \text{ times}}\,0^b\,1^n$$
is generated in G. But $0^{n+(k-1)m}1^n \notin L$ for $k \ge 2$, so we have a contradiction.

This example shows that context-free grammars are a strictly wider class than the regular grammars, and it also illustrates the working of a general class of lemmas, called Pumping Lemmas, that are frequently used as a tool to distinguish grammars. The simplest (applied in Example 7.8) is:
Lemma 7.9 (Pumping Lemma for Regular Languages). Let L be a regular language. Then there is N such that for every w ∈ L of length |w| ≥ N, we can decompose w = tuv such that |uv| ≤ N, u ≠ ε, and $tu^kv \in L$ for all k ≥ 0.

Proof. As in Example 7.8, one can take N to be the number of vertices of G. □
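The proof of Lemma 7.9 is the pigeonhole principle: while reading the last N symbols, the automaton visits N + 1 states, so two of them coincide and the word read in between can be repeated. The Python sketch below (our own toy DFA, accepting the concatenations of 0 and 11, with N = 3 states including a dead state) finds such a decomposition and checks the conclusion of the lemma on short words.

```python
from itertools import product

# Toy DFA (our own choice) accepting the concatenations of the blocks 0 and 11.
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "1"): "q0", ("q1", "0"): "dead",
    ("dead", "0"): "dead", ("dead", "1"): "dead",
}
START, ACCEPTING = "q0", {"q0"}
N = 3  # number of states, serving as the constant N of Lemma 7.9


def accepts(word):
    state = START
    for a in word:
        state = DELTA[(state, a)]
    return state in ACCEPTING


def pump_split(word):
    """Return (t, u, v) with word = t + u + v, u nonempty and |uv| <= N, such that
    the automaton is in the same state before and after reading u."""
    if len(word) < N:
        raise ValueError("the word must have length at least N")
    states = [START]
    for a in word:
        states.append(DELTA[(states[-1], a)])
    first_visit = {}
    # N + 1 positions but only N states: some state repeats (pigeonhole).
    for pos in range(len(word) - N, len(word) + 1):
        q = states[pos]
        if q in first_visit:
            i = first_visit[q]
            return word[:i], word[i:pos], word[pos:]
        first_visit[q] = pos
    raise AssertionError("impossible by the pigeonhole principle")


# Check the conclusion of Lemma 7.9 on all accepted words of length 3..9.
for n in range(N, 10):
    for letters in product("01", repeat=n):
        w = "".join(letters)
        if accepts(w):
            t, u, v = pump_split(w)
            assert t + u + v == w and u and len(u + v) <= N
            assert all(accepts(t + u * k + v) for k in range(6))
```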

Corollary 7.10. The language of a Sturmian sequence x with irrational rotation number is not regular.

Proof. Suppose, for a contradiction, that the language L(x) is regular. Then by the Pumping Lemma 7.9, there are words $tu^kv \in L(x)$ for some $u \ne \varepsilon$ and all $k \ge 1$. Writing $|w|_1$ for the number of 1's in a word w, the frequency of 1's is $\lim_k |tu^kv|_1/|tu^kv| = |u|_1/|u| \in \mathbb Q$, and this contradicts that the rotation number of x is irrational. See [249, Corollary 6.1.11]. □
Lemma 7.11 (Pumping Lemma for Context-Free Languages). Let L be a context-free language. Then there is N such that for every w ∈ L of length |w| ≥ N, we can decompose w = rstuv such that 1 ≤ |su| ≤ |stu| ≤ N, and $rs^ktu^kv \in L$ for all k ≥ 1.

Proof. See [319, Chapter 6] or [203, Chapter 4]. □

The very form of the Pumping Lemma 7.11 implies that square-free (or power-free) subshifts cannot be context-free. The same holds for linearly recurrent subshifts and in fact most other subshifts discussed in previous chapters.
Corollary 7.12. The language L(x) of a Sturmian sequence x is not context-free.

Proof. Suppose, for a contradiction, that the language L(x) is context-free. Then the Pumping Lemma 7.11 gives an N ∈ ℕ such that every word $w \in L_N(x)$ of x can be decomposed as w = rstuv such that $rs^ktu^kv \in L(x)$ as well for each k ≥ 1. But then the limit frequency of 1's is $\lim_k |rs^ktu^kv|_1/|rs^ktu^kv| = |su|_1/|su| \in \mathbb Q$, contradicting that Sturmian sequences have irrational frequencies. □

A stronger form of Lemma 7.11 is Ogden’s Lemma; see [319, 433] and
[203, Chapter 4]. In this lemma, we mark positions in words w in any way
we like, but after our choice, the marking cannot change anymore.

Lemma 7.13 (Ogden’s Lemma). Let L be a context-free language. Then there is p = p(L) ∈ ℕ such that for every w ∈ L, if at least p positions in w are marked, then we can decompose w = rstuv where s and u together contain at least one and at most p marked positions, such that $rs^ktu^kv \in L$ for all k ≥ 1.

Ogden’s Lemma reduces to Lemma 7.11 if all positions in w are marked. In this case |stu| ≤ p automatically. However, as we shall see in the following result, marking positions can be very useful. In fact, there are stronger versions, such as the one cited as [203, Corollary 4.2.5], which states that, in addition,

(7.3) each of r, s, t or each of t, u, v contains at least one marked position.

Corollary 7.14. Let $\nu = \nu_{feig} = 1011\,1010\,1011\,1011\cdots$ be the Feigenbaum kneading sequence. The associated unimodal shift space $X_{feig}$ is not context-free.

Proof. The proof is an adaptation of [565, Section 5]. Recall that ν is the fixed point of the Feigenbaum substitution $\chi_{feig}: 0 \to 11,\ 1 \to 10$ from (4.4). In particular, ν is a concatenation of blocks $B_j = \nu_1\cdots\nu_{2^j-1}\nu_{2^j}$ and $B'_j = \nu_1\cdots\nu_{2^j-1}(1-\nu_{2^j})$. In fact, letting $\chi_{feig}$ act on $B_j$ and $B'_j$ in the same way as it acts on 1 and 0, respectively, it produces the Feigenbaum sequence.
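The block structure is easy to confirm on a long prefix of ν. In the Python sketch below (helper names are our own), $\chi_{feig}^n(1)$ is the prefix of length $2^n$ of ν; replacing every 1 by $B_j$ and every 0 by $B'_j$ in a prefix of ν reproduces ν, because $\chi_{feig}^j(1) = B_j$ and $\chi_{feig}^j(0) = B'_j$.

```python
FEIG = {"0": "11", "1": "10"}


def feigenbaum_prefix(n):
    """chi_feig^n(1), the prefix of length 2^n of the fixed point nu."""
    word = "1"
    for _ in range(n):
        word = "".join(FEIG[c] for c in word)
    return word


def blocks(nu, j):
    """B_j = nu_1 ... nu_{2^j}, and B'_j = B_j with its last symbol flipped."""
    b = nu[: 2 ** j]
    return b, b[:-1] + ("0" if b[-1] == "1" else "1")


nu = feigenbaum_prefix(12)
assert nu.startswith("1011101010111011")
for j in range(6):
    b, b_prime = blocks(nu, j)
    # Replace 1 by B_j and 0 by B'_j in a prefix of nu: this reproduces nu.
    coded = "".join(b if c == "1" else b_prime for c in nu[: 2 ** (12 - j)])
    assert coded == nu
```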
Recall also that the Feigenbaum unimodal map is infinitely renormalizable, and its edge-labeled Hofbauer tower is shown in Figure 7.2.
This shows that any $x \in X_{feig}$ can be decomposed as
$$\sigma^a\big(0^n1\,B_0^{n_0}B'_0\,B_1^{n_1}B'_1\,B_2^{n_2}B'_2\cdots\big), \tag{7.4}$$
where $a, n, n_i \in \mathbb N_0$ and $n_i = 0$ means that the block $B_i$ is left out. The graph also shows the following:
(1) If $u^4 \in L(X_{feig})$ for some $|u| \ge 2$, then u is (a cyclic permutation of) $B_j^a$ for some $a, j \ge 1$.

Figure 7.2. The edge-labeled Hofbauer tower for the Feigenbaum map.

(2) Since ν is fourth-power-free, neither Bj Bj Bj Bj nor Bj Bj Bj Bj can
be a subword of x after the first appearance of Bj+1 .
Now take $m \in \mathbb N$ so large that $2^{m-1} > p$ from Ogden's Lemma 7.13, and
(7.5) z = Bm Bm Bm Bm = Bm Bm Bm Bm−1 Bm−1 ∈ L(Xfeig )

of which we mark the last 2m−1

positions. Let z = rstuv be the decom-
position promised by Ogden’s Lemma, and by (7.3), t has to intersect the
final block Bm−1 in (7.5). Therefore, if u = , then it is contained in the

final block Bm−1 and hence |u| < 2m−1 . Thus item (2) above implies that
rs5 tx5 v ∈
/ L(Xfeig ). Finally, if u = , then r has to intersect the final block

Bm−1 in (7.5), and s is contained in Bm−1 . Thus we can repeat the above
argument with s instead of u. 
Exercise 7.15. We compute the number $\ell_n$ of n-paths starting in the leftmost node of Figure 7.2.
(1) Let $a_n$ and $b_n$ be the number of n-paths from the second and third node of Figure 7.2. By convention we set $\ell_0 = a_0 = b_0 = 1$. Show that $\ell_n = \ell_{n-1} + a_{n-1}$ and $a_n = a_{n-1} + b_{n-1}$.
(2) Show that $b_{2n} = b_{2n+1}$.
(3) Use the self-similarity of the graph of Figure 7.2 to conclude that $b_{2n} = a_n$. Hence $a_0 = 1$, $a_1 = 2$, and $a_n = a_{n-1} + a_{\lfloor (n-1)/2\rfloor}$.
(4) Conclude that the lap-number $\ell(f^n|_{[0,1]})$ of the Feigenbaum map equals $\ell_n$; it grows superpolynomially but subexponentially².

From the shape of its production rules, it is clear that the language
of Example 7.8 is context-free. No finite automaton can keep track of the
precise number of 0’s before starting on the 1’s, but there is a simple memory
device that can. Imagine that for every 0, we put a card on a stack, until
log (f n )
2 That is, lim sup 1
log (f n ) = 0 but lim inf n log n
= ∞.
352 7. Automata and Linguistic Complexity

we reach the first 1. At every 1 we remove a card again. If at the end of the
word no cards are left on the stack, the word is accepted.
This device is simple in construction: we can only add or remove at the
top of the stack; what is further down cannot be read until all the cards
above it are removed. On the other hand, the stack is unbounded, so it
requires unbounded memory.
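The card-stack device can be sketched in a few lines. We assume, as the description suggests, that the language of Example 7.8 is $\{0^n1^n : n \ge 1\}$; the function name `accepts_0n1n` is hypothetical, and the check that no 0 follows a 1 is done by the finite control (the flag `seen_one`), not by the stack.

```python
def accepts_0n1n(word):
    """Card-stack recognizer for {0^n 1^n : n >= 1} (assumed form of Example 7.8)."""
    stack = []
    seen_one = False                  # finite-control memory: has a 1 been read yet?
    for symbol in word:
        if symbol == "0":
            if seen_one:              # a 0 after a 1 is rejected by the finite control
                return False
            stack.append("card")      # one card per 0
        elif symbol == "1":
            seen_one = True
            if not stack:             # more 1's than 0's so far
                return False
            stack.pop()               # remove one card per 1
        else:
            return False              # symbol outside the alphabet {0, 1}
    return seen_one and not stack     # accept iff the stack is empty at the end


print([w for w in ["01", "0011", "011", "0101", ""] if accepts_0n1n(w)])  # ['01', '0011']
```

Only the top card is ever inspected, which is exactly the restriction of a push-down stack; the unbounded height of `stack` is the unbounded memory mentioned above.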
Formally, the (push-down) stack has its (finite) stack alphabet C (think
of cards of different color) which is different from A and an “empty stack”
symbol e. The transition function needs to include instructions for the stack:
δ : Q × A × (C ∪ {}) → Q × (C ∪ {, r})
where c ∈ C refers to adding a card of color c to the stack, r refers to
removing the top card from the stack, and  refers to leaving the stack
unchanged. When the input is read entirely, its acceptance depends on the
status of both the stack and state finally reached. The resulting automaton
with stack is called a push-down automaton.
Theorem 7.16. A language is (not more complicated than) context-free if
and only if it is recognized by a push-down automaton.

See [319, Section 5.3] for the proof. Using this theorem, it becomes
clear that Dyck shifts from Example 7.7 are context-free languages (and
non-regular if there are at least two types of brackets). We use a different
color card for each type of bracket, add a card of the correct color for every
opening bracket, and remove it for the corresponding closing bracket. If the
correct color is not at the top of the stack, then there are linked sets of

7.2.3. Context-Sensitive Grammars. We call a grammar (V, T, P, S) context-sensitive if its set P of productions is finite and each of them has the form α → β, where α, β ∈ (V ∪ T)∗ and |β| ≥ |α|. The terminals themselves cannot change, but they can swap position with a variable. For example, aA → Aa and $a_1a_2A \to Ba_1a_2$ are valid production rules in a context-sensitive grammar.
Remark 7.17. The word context-sensitive comes from a particular normal form of the productions, in which each of them has the form $\alpha_1A\alpha_2 \to \alpha_1B\alpha_2$, where A ∈ V is a variable, B ∈ (V ∪ T)∗ is a non-empty finite string of variables and terminals, and $\alpha_1, \alpha_2 \in (V \cup T)^*$ are contexts in which the production rule can be applied. The production rule can be applied only if A is preceded by $\alpha_1$ and followed by $\alpha_2$, and it leaves the context $\alpha_1, \alpha_2$ unchanged.
Example 7.18. In contrast to Example 7.8, consider the language $L = \{0^n1^n2^n : n \ge 1\}$. Pumping Lemma 7.11 can be applied to show that L is not context-free. However, L is context-sensitive. For example, we can use the productions
S → 012,
S → 00A12,
A1 → 1A,
1A2 → 11B22,
1B → B1,
0B1 → 00A1,
1A2 → 1122.
In practice, A is a cursor moving to the right; when it reaches the first 2, it inserts an extra 1 and an extra 2. The procedure can stop here (by using the last production rule) or produce a cursor B that moves to the left and inserts an extra 0 when it reaches the last 0. Note that at any stage there is at most one variable: A can change into B and B into A. Only the last production rule can remove this one variable and stop the procedure altogether.
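One way to convince oneself that these productions generate exactly the words $0^n1^n2^n$ is to enumerate derivations mechanically. The sketch below (`generate` is a hypothetical helper) runs a breadth-first search over sentential forms of bounded length. Since every production here satisfies |β| ≥ |α|, no derivation of a short word passes through a longer form, so the length bound loses nothing.

```python
from collections import deque

# Productions of Example 7.18: S, A, B are variables, 0, 1, 2 are terminals.
RULES = [
    ("S", "012"), ("S", "00A12"), ("A1", "1A"), ("1A2", "11B22"),
    ("1B", "B1"), ("0B1", "00A1"), ("1A2", "1122"),
]


def generate(rules, start, max_len, terminals):
    """Terminal words derivable from start via sentential forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:                       # rewrite every occurrence of lhs
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


print(sorted(generate(RULES, "S", 10, "012"), key=len))
# ['012', '001122', '000111222']
```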
Example 7.19. The following set of productions produces the language $L = \{1^{2^n} : n \ge 1\}$, that is, strings of 1’s whose length is a power of 2 (at least 2):
S → AC1B,    1D → D1,
C1 → 11C,    AD → AC,
CB → DB,     1E → E1,
CB → E,      AE → ε.
Here A and B are begin-marker and end-marker, and C is a moving marker, doubling the number of 1’s when it moves to the right. When it reaches the end-marker B, one of two things happens:
• It changes into a moving marker D, which just moves to the left until it hits the begin-marker A, and changes itself into C again. In the next pass of C, the number of 1’s is doubled again.
• Or it merges with the end-marker B into a new marker E. This marker E moves left until it hits the begin-marker A. It then merges with A into the empty word: end of algorithm.
This language is context-sensitive, although the production rules CB → E and AE → ε are, strictly speaking, not of the required form. The way around this is to glue a terminal 1 to (pairs of) variables in a clever way and then call these glued strings the new variables of the grammar; see [319, page 224].
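The same brute-force search can be run on Example 7.19 (again with a hypothetical helper `generate`; ε is the empty string ""). Here the rules CB → E and AE → ε shorten forms, so a length bound could in principle cut off derivations. For this grammar the longest form in the derivation of $1^{2^n}$ is $A1^{2^n}CB$, of length $2^n + 3$, so a bound of 12 captures every word of length at most 8.

```python
from collections import deque

# Productions of Example 7.19; "" is the empty word epsilon.
RULES = [
    ("S", "AC1B"), ("C1", "11C"), ("CB", "DB"), ("CB", "E"),
    ("1D", "D1"), ("AD", "AC"), ("1E", "E1"), ("AE", ""),
]


def generate(rules, start, max_len, terminals):
    """Terminal words derivable from start via sentential forms of length <= max_len."""
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        if all(ch in terminals for ch in form):
            words.add(form)
            continue
        for lhs, rhs in rules:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= max_len and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return words


print(sorted(generate(RULES, "S", 12, "1"), key=len))
# ['11', '1111', '11111111']
```

The output also illustrates why the exponent starts at n = 1: the rule C1 → 11C must fire before C can reach B, so the single word 1 is never produced.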
Proposition 7.20. Let χ : A → A∗ be a substitution on a finite alphabet
A = {a1 , . . . , aN } with χ(a1 ) = a1 · · · ar and fixed point ρ = limn χn (a1 ).
Then the corresponding substitution shift language L(Xρ ) is context-sensitive.
Proof. The language L(Xρ) consists of all the finite subwords of ρ. We present production rules that generate L(Xρ) using the terminals T = A and variables $\{A_i : i = 1, \dots, N\} \cup \{B, E, P, Q, R, S\}$. The production rules, with initial variable B, are³
B → SRA₁E: the starting rule;
RAᵢ → χ(Aᵢ)R: χ acts on Aᵢ in the same way as on aᵢ ∈ A;
RE → QE: introduces a cursor to repeat the substitution;
AᵢQ → QAᵢ: the cursor walks backwards;
SQ → SR: initiates the next round of substitutions;
SQ → SP: initial step to replace variables with terminals;
PAᵢ → aᵢP: replacing variables by terminals;
aᵢPE → PE: to remove suffixes;
PE → ε: final step to remove P;
Saᵢ → S: acts as the left-shift σ to remove prefixes;
S → ε: final step to remove S.
It is straightforward to check that the first five rules mimic the substitution.
The seventh and eighth rule take a subword of the result of the first five
rules, and the last three rules remove the auxiliary variables P and S. 
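The language these productions generate (iterate χ, then delete a prefix and a suffix) can be listed directly for short words. The sketch below uses the Thue-Morse substitution 0 ↦ 01, 1 ↦ 10 as an example, and its helper names are hypothetical. Reading subwords off a finite prefix of ρ is only an approximation. It is exact for short lengths because ρ is uniformly recurrent for primitive substitutions, and we assume a prefix of length 4096 is ample here.

```python
def fixed_point_prefix(chi, a1, length):
    """Prefix of rho = lim chi^n(a1); needs chi(a1) to start with a1 and |chi(a1)| >= 2."""
    assert chi[a1][0] == a1 and len(chi[a1]) >= 2
    word = a1
    while len(word) < length:
        word = "".join(chi[a] for a in word)
    return word[:length]


def subwords(chi, a1, n, prefix_length=4096):
    """Length-n subwords of a long prefix of rho: a finite stand-in for L_n(X_rho)."""
    rho = fixed_point_prefix(chi, a1, prefix_length)
    return {rho[i : i + n] for i in range(len(rho) - n + 1)}


thue_morse = {"0": "01", "1": "10"}
print(sorted(subwords(thue_morse, "0", 3)))
# ['001', '010', '011', '100', '101', '110']
```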
Proposition 7.21. Let α ∈ (0, 1) be a quadratic irrational number. Then
the Sturmian subshift space $X_\alpha$ with frequency α has a context-sensitive language.

Proof. By Lagrange’s Theorem 8.45, the continued fraction expansion $\alpha = [0; a_1, \dots, a_m, \overline{a_{m+1}, \dots, a_{m+n}}]$ of every quadratic irrational is (pre)periodic. Recall from Section 4.3.5 that the S-adic transformation $(\chi_i)_{i\ge1}$ with $\chi_i = \chi_{0/1}^{a_i}$ (where $\chi_0$ and $\chi_1$ are defined in (4.31)) produces a Sturmian sequence with the required frequency. But we can rewrite this as $\chi_{\mathrm{pre}} = \chi_{0/1}^{a_1} \circ \cdots \circ \chi_{0/1}^{a_m}$ (which is used once) and $\chi_{\mathrm{period}} = \chi_{0/1}^{a_{m+1}} \circ \cdots \circ \chi_{0/1}^{a_{m+n}}$ (which is used repeatedly). Thus we can find a set of production rules in the same way as in Proposition 7.20.
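The periodicity supplied by Lagrange’s Theorem is easy to observe. As a sketch, we restrict to $\alpha = \sqrt D - \lfloor \sqrt D \rfloor$ (an assumption made for simplicity; a general quadratic irrational $(P + \sqrt D)/Q$ needs slightly more bookkeeping), whose expansion $[0; a_1, a_2, \dots]$ is that of $\sqrt D$ with $a_0$ dropped. The classical integer recurrence avoids floating-point errors; the function name is hypothetical.

```python
from math import isqrt


def sqrt_continued_fraction(D, terms):
    """First `terms` partial quotients [a0; a1, a2, ...] of sqrt(D) for non-square D."""
    a0 = isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0
    quotients = [a0]
    for _ in range(terms - 1):
        # Complete quotient (sqrt(D) + m) / d, updated by an exact integer recurrence.
        m = d * a - m
        d = (D - m * m) // d
        a = (a0 + m) // d
        quotients.append(a)
    return quotients


# alpha = sqrt(7) - 2 = [0; 1, 1, 1, 4, 1, 1, 1, 4, ...], period (1, 1, 1, 4)
print(sqrt_continued_fraction(7, 9))   # [2, 1, 1, 1, 4, 1, 1, 1, 4]
```

For such α the sequence $(\chi_i)$ consists of a finite preperiod followed by one repeated block, which is what makes a finite set of production rules possible.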

³ Note that the last three productions actually shorten strings. To avoid this, we can replace them by “dummy” variables: PE → DD, Saᵢ → SD, and S → D, and finally put all dummies at the end by an extra production DB → BD for each B ∈ T ∪ V.

A cardinality argument shows that not all Sturmian shifts or S-adic shifts (or β-shifts or unimodal shifts) can have context-sensitive grammars. Indeed, there are only countably many collections of production rules but uncountably many Sturmian shifts, etc. However, because its frequency is a quadratic irrational, $X_\alpha$ involves an eventually periodic S-adic system, and there are only countably many such systems. In [554] it is shown that the language of every unimodal map with unbounded kneading map cannot be context-free. This provides further evidence for the conjecture that context-free languages coming from unimodal maps actually have to be regular (and thus be derived from a unimodal map with preperiodic critical point; see

Proposition 7.22. Let $\nu_{\mathrm{feig}} = 1011\ 1010\ 1011\ 1011 \cdots$ be the Feigenbaum kneading sequence. The associated unimodal shift space $X_{\mathrm{feig}}$ is context-sensitive.

Complete proofs of this (regarding the itineraries of all x ∈ [0, 1], not just
those contained in the attractor) were given in [153] and [565, Section 6.2].
We give an explicit set of production rules. In [554] it is also shown that
two types of languages derived from Fibonacci unimodal maps are context-
sensitive as well.

Proof. Recall the structure of L(Xfeig ) from Corollary 7.14. The production
rules will be in groups, doing specific tasks.
(1) The initial word is $BC_r1HE$, where B and E are begin- and end-markers that will also be used to eventually remove symbols from the left and right.
(2) Produce a prefix of length $2^n$ of $\nu_{\mathrm{feig}}$ (for arbitrary $n \ge 1$) with markers H at positions $2^k$, $0 \le k \le n$. The markers $C_r$, $C_l$ are cursors running right and left, respectively.

BCr 1H → B1H0Cr H, XCr 1 → X10Cr , X = B,

Cr 0 → 11Cr , Cr 0 → 11r C,
Cr H → HCr , Cr E → Cl E,
BCl → BCr , XCl → Cl X, X = B, B  ,
BCl → BHCs , BCl → BHCr . 

(3) Produce an $n$-fold concatenation of $\nu_{2^{k-1}+1}\cdots\nu_{2^k-1}(1-\nu_{2^k})$ before the appearance of $\nu_{2^{k-1}+1}\cdots\nu_{2^k-1}\nu_{2^k}$. Here markers S play the role of cards in the stack, and the cursor $C_s$ puts them there. The cursors $C_0$ and $C_1$ are used to copy a 0 and 1 from the beginning of a subword to its end. Cursors $C_l$ and $C_r$ are again cursors going to the left and right, respectively.

Cs a → SaCs , Cs H → Cl H,
XCl → Cl X, X = H, HCl Sa → aHCa , a ∈ {0, 1},
HCl SaS → aCa S, HCl SaX → a Ca Xa = 1 − a, X = S,
HCl → HCs a, HCl a → Cr ,
XCl → 
Cl X, X = H, Cr X → XCr if X = H,
Cr H → HCs , Cr H → HCr .

(4) Remove the markers H.

Cr E → Ch , BCh → B  ,
HCh → Ch , XCh → Ch X, X = H.
(5) Remove symbols from the left and the right, and remove final variables.
Ba → B for every a ∈ {0, 1}, B  → ,
aE → E for every a ∈ {0, 1}, E → .
This concludes the set of production rules. 

In effect, a Turing machine is a finite automaton with a memory device in the form of an input tape that can be read, erased, and written on, in small steps of one symbol at a time, but otherwise without restrictions on the
tape. If we restrict the tape to the length of the initial input, then we have
a linearly bounded non-deterministic Turing machine or linearly
bounded automaton. To avoid going beyond the initial input, we assume
that the input is preceded by a begin-marker, that cannot be erased and to
the left of which the reading/writing device cannot go. Similarly, the input
is followed by an end-marker, that cannot be erased and to the right of which
the reading/writing device cannot go. The next characterization is proved
in [319, Theorem 9.5].
Theorem 7.23. A language is context-sensitive if and only if it is recognized
by a linearly bounded non-deterministic Turing machine.

With this restriction, a Turing machine can still be very powerful. It can compute prime numbers in the sense that $\{1^p : p \text{ prime}\}$ is a context-sensitive language (but not context-free); see [486]. Theorem 7.23 suggests that to find languages which are not context-sensitive, one needs to search for problems that take a lot of memory to solve. The class EXPSPACE is the class of problems whose solution requires memory space of order $2^{p(n)}$ (but not
less) for some polynomial p(n) of the input length n. The known examples
of such problems are complicated, and even more so to state in the form of
a language, so we will not try to give an example.

7.2.4. Recursively Enumerable Grammars. A grammar is called recursively enumerable if there is no restriction anymore on the type of production rules. For this largest class in the Chomsky hierarchy, there is no restriction on the Turing machine anymore either, not even halting. That is, for every word in the language, the Turing machine needs to halt and accept the word, but for words not in the language, the Turing machine need not halt. If the Turing machine halts on every input (and hence can decide for every word whether it belongs to the language or not), then the corresponding grammar is called recursive (without “enumerable”); recursively enumerable grammars form a strictly larger class than recursive grammars.

7.3. Automatic Sequences and Cobham’s Theorems 357
Theorem 7.24. A language is recursively enumerable if and only if it is
recognized by a Turing machine.

A language that is not recursively enumerable can be shown to exist by a cardinality argument and also by a version of the diagonal argument. In short, let $\{M_i\}_{i\in\mathbb N}$ be an enumeration of all Turing machines, let $\{w_i\}_{i\in\mathbb N}$ be an enumeration of $\{0,1\}^*$, and let $L = \{w_i : M_i \text{ does not accept } w_i\}$, that is, $M_i$ rejects $w_i$ or does not halt on input $w_i$. Then no $M_i$ recognizes L, because $M_i$ and L disagree on $w_i$.
Seen in terms of push-down automata, a Turing machine is equivalent
to a push-down automaton with two stacks. Namely, we first read and store
the input of the tape into the first stack; then it serves as the input plus the
infinite left end of the tape. The second stack, empty at the beginning, serves
as the infinite right end of the tape. Then all computations that otherwise would have been done on the tape can now be done with these two stacks.
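The two-stack picture can be made concrete. In the sketch below (class and function names are hypothetical), the stack `left` holds the cells to the left of the head and the stack `right` holds the head cell on top of the cells to its right; moving the head pops from one stack and pushes onto the other. For brevity the input is placed on `right` directly rather than first being read into a stack, which is an equivalent starting configuration. A toy Turing machine that adds 1 to a binary number runs on it unchanged.

```python
class TwoStackTape:
    """A Turing-machine tape stored as two stacks (only push, pop and peek are used)."""

    def __init__(self, word, blank="_"):
        self.blank = blank
        self.left = []                                # cells left of the head; top = nearest
        self.right = list(reversed(word)) or [blank]  # head cell on top, then cells to its right

    def read(self):
        return self.right[-1]                         # peek at the head cell

    def write(self, symbol):
        self.right.pop()
        self.right.append(symbol)

    def move_right(self):
        self.left.append(self.right.pop())
        if not self.right:                            # walked past the input: fresh blank cell
            self.right.append(self.blank)

    def move_left(self):
        self.right.append(self.left.pop() if self.left else self.blank)

    def contents(self):
        """For display only: the non-blank part of the tape, left to right."""
        return "".join(self.left + list(reversed(self.right))).strip(self.blank)


def increment(binary_word):
    """Toy Turing machine: add 1 to a binary number written on the tape."""
    tape = TwoStackTape(binary_word)
    state = "scan"
    while state != "halt":
        symbol = tape.read()
        if state == "scan":                           # run to the right end of the input
            if symbol == tape.blank:
                state = "carry"
                tape.move_left()
            else:
                tape.move_right()
        else:                                         # state "carry": propagate the carry leftwards
            if symbol == "1":
                tape.write("0")
                tape.move_left()
            else:                                     # a 0 or a blank absorbs the carry
                tape.write("1")
                state = "halt"
    return tape.contents()


print(increment("1011"))  # 1100
```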
Table 7.1 summarizes the Chomsky hierarchy.

Table 7.1. Summary of the Chomsky hierarchy.

Type                     Automaton                            Productions
regular (sofic shift)    finite automaton                     A → w, A → wB (right-linear), or A → w, A → Bw (left-linear)
context-free             push-down automaton                  A → γ ∈ (V ∪ T)∗
context-sensitive        linearly bounded non-deterministic   α → β with α, β ∈ (V ∪ T)∗ and |β| ≥ |α|
                         Turing machine                       (or αAβ → αγβ with non-empty γ ∈ (V ∪ T)∗)
recursively enumerable   Turing machine = two-stack           α → β (no restrictions)
                         push-down automaton

7.3. Automatic Sequences and Cobham’s Theorems

A deterministic finite automaton with output (DFAO) is a sextuple
(7.6) Mout = (Q, A, q0 , δ, τ, B),
where the first four components are the same as in a finite automaton in
(7.1) and τ : Q → B is an output function that gives a symbol associated to
each state in Q. We extend the transition function δ to words $w = w_0w_1\cdots w_{k-1} \in A^*$, so that δ(q0, w) is the state the DFAO is in when the last letter of w is read (or when the automaton halts). Clearly
(7.7) δ(q, w) = δ(δ(q, w0 . . . wk−2 ), wk−1 ) = δ(δ(q, w0 ), w1 . . . wk−1 ).
Similarly, τ (q0 , w) denotes the symbol that is read off when the last letter of
w is read (or when the automaton halts).
The following central notion was originally due to Büchi [130]; see also
the monograph by Allouche & Shallit [20].
Definition 7.25. Fix a base N ≥ 2. Let A = {0, 1, . . . , N − 1}, and for the integer n ≥ 0, let w(n) ∈ A∗ be the representation of n in base N; i.e. $n = [w(n)]_N := \sum_{i=1}^{|w(n)|} w_i(n)\, N^{|w(n)|-i}$ and $w_1(n) \neq 0$. A sequence $x \in B^{\mathbb N}$ is N-automatic if $x_n = \tau(q_0, w(n))$ for all n ∈ N, and an automaton that generates x is called an N-automaton. A sequence $x \in B^{\mathbb N}$ is called automatic if it is N-automatic for some N ∈ N.
Example 7.26. The automaton in Figure 7.3 assigns the symbol 1 to every word w ∈ L(X_even) of the one-sided even shift (Example 1.4) and the symbol 0 to words $w \in \{0,1\}^* \setminus L(X_{\mathrm{even}})$. The output function τ : Q → {0, 1} is indicated by the second symbol at each state (i.e. the numbers in the circles).

Figure 7.3. The DFAO for the even shift, with states q0/1, q1/0, q2/0. The label qi/t stands for the state qi with output τ(qi) = t.

As such, the sequence $x \in \{0,1\}^{\mathbb N}$ defined by $x_n = 1$ if n contains only even blocks of 1 in its binary expansion, and $x_n = 0$ otherwise, is a 2-automatic sequence.
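The sketch below evaluates the DFAO of Figure 7.3 on binary expansions. The transition table is our reading of the figure (an assumption): q0 loops on 0 and goes to q1 on 1, q1 returns to q0 on 1 and falls into the sink q2 on 0, and q2 loops on both digits. The helper names are hypothetical. For n = 0 the expansion w(0) is the empty word, so $x_0 = \tau(q_0) = 1$.

```python
# DFAO of Figure 7.3 as we read it: DELTA[(state, digit)] and outputs TAU[state].
DELTA = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "1"): "q0", ("q1", "0"): "q2",
    ("q2", "0"): "q2", ("q2", "1"): "q2",
}
TAU = {"q0": 1, "q1": 0, "q2": 0}


def base_n_digits(n, base):
    """w(n): digits of n in the given base (base <= 10), most significant first; w(0) = ''."""
    digits = []
    while n > 0:
        n, r = divmod(n, base)
        digits.append(str(r))
    return "".join(reversed(digits))


def dfao_output(n, delta=DELTA, tau=TAU, q0="q0", base=2):
    """x_n = tau(delta(q0, w(n)))."""
    state = q0
    for digit in base_n_digits(n, base):
        state = delta[(state, digit)]
    return tau[state]


print([dfao_output(n) for n in range(16)])
```

The ones occur at n = 0, 3, 6, 12, 15, whose binary expansions 11, 110, 1100, 1111 (and the empty word) contain only even blocks of 1’s.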

Figure 7.4. The DFAO for the Thue-Morse sequence, with states q0/0 and q1/1.

The Thue-Morse sequence $\rho_{TM} = 0110\ 1001\ 1001\ 0110\ 10\cdots$ serves as an example of a 2-automatic sequence, because of its characterization as $x_n = \#\{1\text{'s in the binary expansion of } n\} \bmod 2$; see Example 1.6 and [20, Section 5.1]. Figure 7.4 gives its 2-automaton.
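The two-state DFAO can be compared against the digit-sum formula. In this sketch (hypothetical name `thue_morse_dfao`), we assume, consistently with the characterization above, that digit 1 switches between q0 and q1 and digit 0 keeps the state.

```python
def thue_morse_dfao(n):
    """Run the 2-state DFAO on the binary expansion of n; state q_i has output i."""
    state = 0                          # 0 stands for q0, 1 for q1
    for digit in bin(n)[2:]:           # most significant digit first
        if digit == "1":
            state = 1 - state          # digit 1 switches state, digit 0 keeps it
    return state


print("".join(str(thue_morse_dfao(n)) for n in range(20)))   # 01101001100101101001
# Agreement with x_n = (number of 1's in the binary expansion of n) mod 2:
assert all(thue_morse_dfao(n) == bin(n).count("1") % 2 for n in range(1000))
```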
An exact characterization of which sequences are automatic was given by Cobham in 1972. This is sometimes called Cobham’s Little Theorem; see [61, 162] and [20, Theorem 6.3.2].
Theorem 7.27. A sequence x is automatic if and only if x = ψ(ρ) for some letter-to-letter substitution ψ and some fixed point ρ of a constant length substitution χ.

Proof. The proof relies on a way to rewrite the DFAO as a pair of substitu-
tions χ and ψ, and vice versa. The state space Q is the alphabet of both sub-
stitutions, the base N of the N -automaton is the length of the substitution
words χ(q) and also the cardinality of the input alphabet A = {0, . . . , N −1},
and the output function τ : Q → B is the letter-to-letter ψ : Q → B.
First assume that χ and ψ are given so that χ(q0 ) starts with q0 . Since
χ has the fixed point ρ, the letter q0 is the zeroth letter of ρ = ρ0 ρ1 ρ2 . . . ,
and we take it as the initial state of the DFAO. Now define the transition
function δ : Q × A → Q as
δ(q, a) is the a-th letter of χ(q).
Then for n ∈ {0, . . . , N − 1}, the representation of n in base N is simply
w(n) = n. If this is the input word, then q = δ(q0 , w(n)) is the n-th letter
of χ(q0 ), which is the n-th letter of ρ.
We continue by induction, with induction hypothesis δ(q0, w(n)) = ρn. We verified this for 0 ≤ n < N; assume that it holds for all m < n. Write $n = n'N + n''$ with $0 \le n'' < N$, and let $\ell = |w(n)|$. Then
$$\begin{aligned}
\delta(q_0, w(n)) &= \delta(q_0, w(n)_1 \cdots w(n)_\ell)\\
&= \delta(\delta(q_0, w(n)_1 \cdots w(n)_{\ell-1}), w(n)_\ell) && \text{(by (7.7))}\\
&= \delta(\delta(q_0, w(n')), n'')\\
&= \delta(\rho_{n'}, n'') && \text{(by induction)}\\
&= \text{the } n''\text{-th letter of } \chi(\rho_{n'})\\
&= \rho_{Nn'+n''} = \rho_n.
\end{aligned}$$
This completes the induction. It follows that $\tau(q_0, w(n)) = \tau(\rho_n) = \psi(\rho_n) = x_n$, as required for an automatic sequence.
Now for the converse, we are given the DFAO, and we can assume that δ(q0, 0) = q0 for the initial state q0, because this simply deals with the insignificant leading digits 0 in a base N representation of n. Set ψ = τ : Q → B and define $\chi : Q \to Q^N$ by
$$\chi(q) = \delta(q, 0)\,\delta(q, 1) \cdots \delta(q, N-1).$$
Then χ(q0) starts with q0, so χ has a fixed point ρ starting with the letter q0. For n ∈ {0, . . . , N − 1} and its representation w(n) = n in base N, we find that δ(q0, w(n)) is the n-th letter of χ(q0), which is the n-th letter of ρ.
The induction hypothesis is again δ(q0, w(n)) = ρn. We verified this for 0 ≤ n < N; assume that it holds for all m < n. Write $n = n'N + n''$ with $0 \le n'' < N$, and let $\ell = |w(n)|$. Then
$$\begin{aligned}
\delta(q_0, w(n)) &= \delta(q_0, w(n)_1 \cdots w(n)_\ell)\\
&= \delta(\delta(q_0, w(n)_1 \cdots w(n)_{\ell-1}), w(n)_\ell) && \text{(by (7.7))}\\
&= \delta(\rho_{n'}, n'') && \text{(by induction)}\\
&= \text{the } n''\text{-th letter of } \chi(\rho_{n'})\\
&= \rho_{Nn'+n''} = \rho_n.
\end{aligned}$$
This completes the induction. Again $\tau(q_0, w(n)) = \tau(\rho_n) = \psi(\rho_n) = x_n$.
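The construction $\chi(q) = \delta(q,0)\cdots\delta(q,N-1)$ from this proof can be tested directly. The sketch below uses the powers-of-2 automaton of Figure 7.5 with transitions as we read them from the figure (an assumption): q0 loops on 0 and goes to q1 on 1, q1 loops on 0 and goes to the sink q2 on 1, and q2 loops on both digits. All helper names are hypothetical.

```python
def dfao_to_substitution(delta, states, base):
    """chi(q) = delta(q, 0) delta(q, 1) ... delta(q, N-1), as in the proof of Theorem 7.27."""
    return {q: [delta[(q, a)] for a in range(base)] for q in states}


def fixed_point(chi, q0, length):
    """Prefix of the fixed point of chi that starts with q0 (requires chi(q0)[0] == q0)."""
    word = [q0]
    while len(word) < length:
        word = [b for a in word for b in chi[a]]
    return word[:length]


def dfao_sequence(delta, tau, q0, base, length):
    """x_n = tau(delta(q0, w(n))) for n < length, reading w(n) most significant digit first."""
    out = []
    for n in range(length):
        m, digits = n, []
        while m > 0:
            m, r = divmod(m, base)
            digits.append(r)
        state = q0
        for a in reversed(digits):
            state = delta[(state, a)]
        out.append(tau[state])
    return out


# Powers-of-2 automaton as we read Figure 7.5 (assumed transitions).
delta = {("q0", 0): "q0", ("q0", 1): "q1",
         ("q1", 0): "q1", ("q1", 1): "q2",
         ("q2", 0): "q2", ("q2", 1): "q2"}
tau = {"q0": 0, "q1": 1, "q2": 0}

chi = dfao_to_substitution(delta, ["q0", "q1", "q2"], 2)
rho = fixed_point(chi, "q0", 64)
psi_of_rho = [tau[q] for q in rho]             # psi = tau applied letter by letter
assert psi_of_rho == dfao_sequence(delta, tau, "q0", 2, 64)
print([n for n in range(64) if psi_of_rho[n] == 1])   # [1, 2, 4, 8, 16, 32]
```

Here χ(q0) = q0q1, χ(q1) = q1q2, χ(q2) = q2q2, a constant length 2 substitution whose fixed point, read through ψ = τ, is the indicator sequence of the powers of 2.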

All sequences that are eventually periodic are N-automatic for every N ≥ 2. In particular, the indicator sequence $x = 1_E$ of every finite set E is N-automatic for every N ≥ 2. However, if $\sup E > N^{\#Q}$ for an N-automaton $M_{\mathrm{out}}$ with set of states Q generating $1_E$, then there must be a loop. That is, for some $m = [w]_N \in E$, the automaton $M_{\mathrm{out}}$ reading w must reach the same state q ∈ Q twice: there must be a loop from q to q. The (proof of the) Pumping Lemma 7.9 gives that $M_{\mathrm{out}}$ must accept the words that take this loop an arbitrary number of times. These loops explain the existence of geometric progressions in automatic sequences, such as the indicator sequence $x = 1_{\{2^k : k \ge 0\}}$ of the powers of 2; its 2-automaton is shown in Figure 7.5. It also shows that $1_{\{n! : n \in \mathbb N\}}$, or the indicator sequence of any superexponentially increasing sequence, cannot be automatic.

Figure 7.5. A 2-automaton recognizing the powers of 2 (states q0/0 (start), q1/1, q2/0), and the lack of structure of 3^k expressed in base 2:

k    3^k in base 2
0    1
1    11
2    1001
3    11011
4    1010001
5    11110011
6    1011011001
7    100010001011
8    1100110100001
We call q ∈ Q a rejecting state if τ(q) = 0 and an accepting state otherwise. From any DFAO, we can remove halting states q by removing their halting status and instead adding labeled arrows $q \xrightarrow{a} q$ for each a ∈ A. These are the dashed arrows in Figure 7.5. If $M_{\mathrm{out}}$ has no halting states and from every q ∈ Q there is some path to an accepting state q′, then for every u ∈ A∗, we can find a suffix v ∈ A∗ such that uv is accepted by $M_{\mathrm{out}}$. A language with this property is called right dense. This is a substantial extra requirement; the 2-automaton from Figure 7.5, even when the halting states are removed, doesn’t satisfy it. It has the following consequence for the density of automatic sequences of right dense languages; see [20, Theorem
Proposition 7.28. If x is N-automatic and for every u ∈ A∗ there is v such that $x_n \neq 0$ for $n = [uv]_N$, then $\{n \in \mathbb N : x_n \neq 0\}$ is syndetic.

This is not surprising in view of Theorem 7.27 and the fact that fixed
points of substitution shifts are syndetic.
For example, the indicator sequence $x = 1_{\{3^k : k \ge 0\}}$ of the powers of 3 is trivially 3-automatic, but the powers of 3 written in base 2 betray no pattern; see Figure 7.5. The natural question that inspired Büchi’s paper [130] is whether x is 2-automatic. The answer relies on an elegant application of the Pumping Lemma 7.9.
Proposition 7.29. If N and Ñ are multiplicatively independent, i.e. $\frac{\log \tilde N}{\log N} \notin \mathbb Q$, then the indicator sequence $x = 1_{\{\tilde N^k : k \ge 0\}}$ of the powers of Ñ is not N-automatic.

Before giving the proof, we need a simple result from number theory.
Lemma 7.30. If N, Ñ ∈ N are multiplicatively independent, then for every ε > 0 there are r, r̃ ∈ N such that $|\tilde N^{\tilde r} - N^r| < \varepsilon N^r$.

Proof. For every m ∈ N, there is a unique $d_m \in \mathbb N$ such that $1 \le \tilde N^m N^{-d_m} < N$. Let ε > 0 be given and divide [0, N] into intervals of length ≤ ε. The pigeonhole principle gives m < m′ ∈ N such that $|\tilde N^m N^{-d_m} - \tilde N^{m'} N^{-d_{m'}}| < \varepsilon$. Multiplying with $N^{d_{m'}}\tilde N^{-m}$, we get
$$|\tilde N^{m'-m} - N^{d_{m'}-d_m}| < \varepsilon N^{d_{m'}}\tilde N^{-m} \le \varepsilon N^{d_{m'}-d_m}.$$
Hence the lemma holds for $r = d_{m'} - d_m$ and $\tilde r = m' - m$.
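Lemma 7.30 is easy to explore by brute force. In this sketch (`close_powers` is a hypothetical helper), floating-point logarithms only propose candidates $\tilde r \approx r\log N/\log\tilde N$; the inequality $|\tilde N^{\tilde r} - N^r| < \varepsilon N^r$ is then tested exactly with Python integers and `Fraction`. For N = 2, Ñ = 3 and ε = 1/10 it finds $3^5 = 243$ and $2^8 = 256$.

```python
from fractions import Fraction
from math import log


def close_powers(N, N_tilde, eps, r_max=10_000):
    """Return (r, r_tilde) with |N_tilde**r_tilde - N**r| < eps * N**r, or None if r > r_max."""
    ratio = log(N) / log(N_tilde)
    for r in range(1, r_max + 1):
        guess = round(r * ratio)              # r_tilde making N_tilde**r_tilde close to N**r
        for r_tilde in (guess - 1, guess, guess + 1):
            if r_tilde >= 1 and abs(N_tilde ** r_tilde - N ** r) < eps * N ** r:
                return r, r_tilde
    return None


print(close_powers(2, 3, Fraction(1, 10)))   # (8, 5): |243 - 256| = 13 < 25.6
```

Shrinking ε forces larger exponents, mirroring the pigeonhole argument: the search is guaranteed to succeed eventually only because N and Ñ are multiplicatively independent.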

Proof of Proposition 7.29. Suppose $M_{\mathrm{out}}$ is an N-automaton generating x; let Q be its set of states. Let $\varepsilon = \tilde N^{-\#Q}$. By Lemma 7.30 we can find r, r̃ such that $|\tilde N^{\tilde r} - N^r| < \varepsilon N^r$. This means that
$$[10^{\#Q}v]_N = \tilde N^{\tilde r} = N^{\#Q+|v|} + [v]_N = N^r + [v]_N$$
for some word v. Hence, when $M_{\mathrm{out}}$ parses $10^{\#Q}v$, it must see some state q ∈ Q twice before it finishes reading $10^{\#Q}$. Say the corresponding loop from q to q has length s. By the Pumping Lemma 7.9, $M_{\mathrm{out}}$ has to accept $10^{\#Q+ks}v$ for every integer k ≥ 0. Therefore there is an ℓ(k) ∈ N such that
$$[10^{\#Q+ks}v]_N = \tilde N^{\ell(k)} = N^{\#Q+ks+|v|} + [v]_N = N^{r+ks} + [v]_N. \tag{7.8}$$
Note that ℓ(k + 1) − ℓ(k) is bounded in k. Subtracting (7.8) for k from (7.8) for k + 1, we obtain
$$\tilde N^{\ell(k)}\left(\tilde N^{\ell(k+1)-\ell(k)} - 1\right) = N^{\#Q+ks+|v|}(N^s - 1).$$
This can only hold for all k ≥ 0 if $\tilde N^{\tilde m} = N^m$ for some m, m̃ ∈ N, which contradicts our assumption that N and Ñ are multiplicatively independent. This concludes the proof.

Cobham [161] generalized this to general Ñ-automatic sequences, not just the indicator sequences of the powers of Ñ. This is Cobham’s Theorem.
Theorem 7.31. If 2 ≤ N, Ñ ∈ N are multiplicatively independent, then the only sequences which are both N-automatic and Ñ-automatic are eventually periodic.

Since Cobham’s proof⁴, several other proofs have been given; see [229, 301, 453, 473] and [20, Section 11]. We will follow a new and much shorter proof due to Krebs [371]. We start with a definition and a lemma.
Definition 7.32. A sequence (xn )n∈N is locally periodic of period p ≥ 1
on an integer interval I if xn = xn+p whenever n, n + p ∈ I.
Lemma 7.33. Let $(I_k)_{k\in\mathbb N}$ be a sequence of integer intervals such that $\mathbb N_0 \setminus \bigcup_{k\in\mathbb N} I_k$ is finite. Assume that $(x_n)_{n\ge0}$ is a sequence that is locally periodic with period $p_k$ on $I_k$ for each k ∈ N. If $\#(I_k \cap I_{k+1}) \ge p_k + p_{k+1}$, then $(x_n)_{n\in\mathbb N}$ is eventually periodic with period $p = \min_{k\in\mathbb N} p_k$.

Proof. The overlap of neighboring intervals Ik and Ik+1 is large enough that
the period pk extends to Ik ∪ Ik+1 . This is a special case of the Fine-Wilf
Theorem [247, Theorem 3]. In detail:
(i) Suppose that n, n + pk ∈ Ik ∪ Ik+1 . If n + pk ∈ Ik , then xn = xn+pk
by the local periodicity on Ik .
(ii) Otherwise $n, n + p_k \in I_{k+1}$, and by local periodicity on $I_{k+1}$ we can find $n' \equiv n \bmod p_{k+1}$ such that $n', n' + p_k \in I_k \cap I_{k+1}$; see Figure 7.6. But then $x_n = x_{n'} = x_{n'+p_k} = x_{n+p_k}$, using local periodicity on $I_{k+1}$, on $I_k$, and again on $I_{k+1}$, respectively.
⁴ Eilenberg [233] called Cobham’s proof correct but long and unreasonable; yet Cobham’s proof is just six pages, and whether it is unreasonably technical is in the eye of the reader.

Figure 7.6. Overlapping intervals with local periods p and p′ (left: case (i); right: case (ii)).

Therefore the local period pk carries over to Ik+1 . By the same argument,
the period pk carries over to all next neighbors, both left and right. By
induction, the period carries over to all Ij , j ∈ N. Hence xn = xn+pk for all
$n \ge \min I_1$. This is true for all k, so in particular for $p = \min_k p_k$.

Proof of Theorem 7.31. Assume that the sequence (xn )n≥0 is both N -
automatic and Ñ -automatic. The aim is to prove that (xn )n≥0 is locally
periodic. We will specify the integer intervals Ik , show that they have a
local period, and finally show that they have sufficient overlap to apply
Lemma 7.33. An important idea to achieve this overlap is to use larger
input alphabets A = {0, 1, . . . , 2N − 1} and à = {0, 1, . . . , 2Ñ − 1} than the
bases suggest, but according to [20, Theorem 6.8.6], there are DFAOs that
produce the same automatic sequences in this case. We don’t change the
bases, so integers may have multiple representations in the extended input
alphabets, but for all representations, the DFAO will give the same output.
Without loss of generality, assume that N < Ñ . Let Q and Q̃ be the
sets of states of the corresponding automata. Since we only have to prove
that (xn )n≥0 is eventually periodic, it suffices to consider states q such that
δ(q0 , w) = q for infinitely many w ∈ A∗ .
If w, w′ ∈ A∗ are such that δ(q0, w) = δ(q0, w′), then also δ(q0, wz) = δ(q0, w′z) for every z ∈ A∗. For the N-automatic sequence $(x_n)_{n\ge0}$ and any r ∈ N, this means that
$$x_{kN^r+j} = x_{k'N^r+j} \quad \text{for every } j \in \{0, \dots, 2N^r - 1\}, \tag{7.9}$$
if $k = [w]_N = \sum_{i=1}^{|w|} w_i N^{|w|-i}$ (as in Definition 7.25) and $k' = [w']_N = \sum_{i=1}^{|w'|} w'_i N^{|w'|-i}$ are the integers represented by w and w′, respectively. The analogous statement holds for the Ñ-automaton.
For each q ∈ Q, we can find q̃ ∈ Q̃ and distinct integers $a_q, a'_q$ such that $q = \delta(q_0, w) = \delta(q_0, w')$ and $\tilde q = \delta(\tilde q_0, \tilde w) = \delta(\tilde q_0, \tilde w')$ whenever w, w′ ∈ A∗ and w̃, w̃′ ∈ Ã∗ are such that $a_q = [w]_N = [\tilde w]_{\tilde N}$ and $a'_q = [w']_N = [\tilde w']_{\tilde N}$. Let $\xi = \max_{q\in Q}\max\{a_q, a'_q\}$. Using Lemma 7.30 for ε = 1/(8ξ) we can find r, r̃ ∈ N, independently of q ∈ Q, such that
$$\xi\,|\tilde N^{\tilde r} - N^r| \le \tfrac18 N^r.$$
In particular, $\frac78 N^r \le \tilde N^{\tilde r}$. Define integer intervals
$$I_k = \left[kN^r + \tfrac13 N^r,\ kN^r + \tfrac53 N^r\right] \cap \mathbb N, \qquad k \in \mathbb N,$$
centered at $(k+1)N^r$ with radius $\frac23 N^r$.
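Assume that $k = [w]_N$ for some word w with δ(q0, w) = q. Swapping $a_q$ and $a'_q$ if necessary, we fix the local period on $I_k$ as
$$p_q := (a'_q - a_q)\cdot(\tilde N^{\tilde r} - N^r) \in \left(0, \tfrac18 N^r\right].$$
To show that $p_q$ is indeed a local period, choose $j \in \{0, \dots, 2N^r - 1\}$ so that $kN^r + j,\ kN^r + j + p_q \in I_k$. Then
$$|j - a_q(\tilde N^{\tilde r} - N^r) - \tilde N^{\tilde r}| \le |j - N^r| + (a_q + 1)|\tilde N^{\tilde r} - N^r| \le \tfrac23 N^r + \tfrac18 N^r < \tfrac78 N^r \le \tilde N^{\tilde r},$$
so $0 \le j - a_q(\tilde N^{\tilde r} - N^r) < 2\tilde N^{\tilde r}$, and
$$\begin{aligned}
x_{kN^r+j} &= x_{a_qN^r+j} && \text{by (7.9), since } 0 \le j < 2N^r\\
&= x_{a_q\tilde N^{\tilde r} + j - a_q(\tilde N^{\tilde r} - N^r)}\\
&= x_{a'_q\tilde N^{\tilde r} + j - a_q(\tilde N^{\tilde r} - N^r)} && \text{by the analogue of (7.9) for the } \tilde N\text{-automaton,}\\
& && \text{since } 0 \le j - a_q(\tilde N^{\tilde r} - N^r) < 2\tilde N^{\tilde r}\\
&= x_{a'_qN^r + j + (a'_q - a_q)(\tilde N^{\tilde r} - N^r)}\\
&= x_{a'_qN^r + j + p_q}\\
&= x_{kN^r + j + p_q} && \text{by (7.9), since } 0 \le j + p_q < 2N^r.
\end{aligned}$$
Since $\#(I_k \cap I_{k+1}) = \frac13 N^r > 2\max_{q\in Q} p_q$, the overlap of these integer intervals is as large as required in Lemma 7.33.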

Since all automatic sequences can basically be written as in Theorem 7.27 (cf. [405]), this retrospectively gives information on constant length substitution shifts; see Durand & Rigo [220, 229] and also [223] and references therein. The question is then whether Cobham’s Theorem extends beyond automatic sequences to fixed points of substitutions that are not of constant length. Durand [223, Theorem 1] indeed proves a corresponding result.

Theorem 7.34. Let ρ and ρ̃ be the fixed points of two primitive⁵ substitutions whose associated matrices have leading eigenvalues λ and λ̃. If there are substitutions ψ and ψ̃ such that ψ(ρ) = ψ̃(ρ̃) and this is not an eventually periodic sequence, then $\frac{\log \tilde\lambda}{\log \lambda} \in \mathbb Q$.

That non-trivially different substitutions can have the same fixed point
is shown in e.g. Example 4.28.

Example 7.35. One application of Theorem 7.34 to the classification of inverse limit spaces of tent maps helps to solve the so-called Ingram Conjecture. Let $T_s(x) = \min\{sx, s(1-x)\}$ be a tent map whose critical point c = 1/2 has period N. Restrict $T_s$ to the core $[c_2, c_1]$, where $c_n = T_s^n(c)$. The inverse limit space $\varprojlim([c_2, c_1], T_s)$ is the collection of backward orbits of $T_s : [c_2, c_1] \to [c_2, c_1]$ with the product topology. Such inverse limit spaces consist of uncountably many continuous images of ℝ as well as N continuous images of [0, ∞) (and the image of 0 is called an end-point), all lying dense in $\varprojlim([c_2, c_1], T_s)$. They can be embedded in the plane and then look similar to the Knaster continuum⁶ of Figure 4.2 (left), except that instead of one end-point, $\varprojlim([c_2, c_1], T_s)$ has N end-points. Ingram’s Conjecture states that if $T_s$ and $T_{\tilde s}$, both with periodic critical points, have different slopes, then their inverse limit spaces are not homeomorphic.
Let $c_2 = x_1 < x_2 < \cdots < x_N = c_1$ be the critical orbit arranged in increasing order. It defines a Markov partition (see Example 3.9 and Theorem 3.14), and the leading eigenvalue of the corresponding transition matrix A is $e^{h_{top}(T_s)} = s$. We can also associate a substitution χ to $T_s$ on the alphabet $\{1, 2, \dots, M, \bar 1, \dots, \bar M\}$ (where M = N − 1) by
$$\begin{cases}
i \to j \cdots k & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation preserving},\\
i \to \bar k \cdots \bar j & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation reversing},\\
\bar i \to \bar k \cdots \bar j & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation preserving},\\
\bar i \to j \cdots k & \text{if } T_s([x_{i-1}, x_i]) = [x_{j-1}, x_k] \text{ is orientation reversing}.
\end{cases}$$
Figure 7.7 gives the three possibilities if the period N = 5.

As shown in [120, Lemma 3], the associated matrix of χ has the same eigenvalues as A as well as M eigenvalues on the unit circle. Since c2 has
period N , χN (1) starts with 1, and ρ := limn χnN (1) is a fixed point of
χN . It represents the way the arc-component of the end-point (c2 , c1 , c =
cN , cN −1 , . . . , c3 , c2 , c1 , c, . . . ) coils through the inverse limit space.

5 In fact, [223] uses a slightly weaker assumption than primitive.

6 Which is the inverse limit space of T2 .
Figure 7.7. Partitions for three different tent maps with critical period 5 (each panel marks c2, c, c1 and lists the associated substitution χ).

It is a topological invariant in the sense that if $T_{\tilde s}$ is another tent map where the critical point has period Ñ and χ̃ and ρ̃ are constructed analogously, then $\varprojlim([c_2, c_1], T_s)$ can only be homeomorphic to $\varprojlim([c_2, c_1], T_{\tilde s})$ if N = Ñ and there is a substitution ψ such that ψ(ρ̃) = ρ; see [120]. But Theorem 7.34 implies that this can only happen if the logarithms of the leading eigenvalues of the associated matrices of χ and χ̃ are rationally dependent, that is, if $\frac{h_{top}(T_s)}{h_{top}(T_{\tilde s})} \in \mathbb Q$.
By now, the Ingram Conjecture has been confirmed in [338], [526], and
[51] for periodic, preperiodic, and general critical orbits, respectively. For
multimodal maps, the Ingram Conjecture remains open. Although similar
techniques are likely to work, the problem has an extra facet in that there
are non-conjugate multimodal tent maps with the same entropy.
Chapter 8

Miscellaneous Background Topics

8.1. Pisot and Salem Numbers

Definition 8.1. A real number α is called algebraic if it is a zero of a non-constant polynomial with integer coefficients. The smallest possible degree d that such a polynomial can have is called the degree of α, and the polynomial¹ $p \in \mathbb{Z}[x]$ of this smallest degree is called the minimal polynomial. The other solutions of $p(x) = 0$ are called the algebraic conjugates or Galois conjugates of α. If the leading coefficient of $p(x)$ is 1, then α is called an algebraic integer. Non-algebraic numbers are called transcendental.

The set of algebraic numbers is countable, as one can check from the fact that for each $n \in \mathbb{N}$, there are only finitely many algebraic numbers that are the root of a degree d polynomial with integer coefficients $a_i$ such that $d + \sum_{i=0}^{d} |a_i| = n$. Hence, most real numbers are transcendental. They are more difficult to specify (short of writing down all their decimal digits), but examples of transcendental numbers are $e$ and $\pi$ (in fact, $\pi\alpha$ for every non-zero algebraic number α), as proved by Hermite (1873) and Lindemann (1882), respectively; Apéry (1978) proved that $\zeta(3)$ is irrational, but whether it is transcendental is still open. Hilbert asked in his 1900 address to the International Mathematical Congress whether numbers such as $2^{\sqrt 2}$ are transcendental (Hilbert's 7-th Problem). This was solved a good thirty years later by Gelfond [272] and Schneider [491]: every number of the form $a^b$ where $a \neq 0, 1$ is algebraic and $b$ is an algebraic irrational is transcendental.

1 For definiteness, we assume that the coefficients have no common prime divisor and the first coefficient is positive.

368 8. Miscellaneous Background Topics

Among the algebraic numbers there are some classes that are responsible
for special properties in various dynamical systems.
Definition 8.2. An algebraic integer α > 1 is called a Pisot number if all its Galois conjugates, i.e. the other roots of its minimal polynomial (called the Pisot polynomial), are in the open unit disk. If the Galois conjugates are in the closed unit disk, with at least one on the boundary, then α is a Salem number.

For example, all the multinacci numbers, i.e. the leading solutions of the equations
$$x^d = x^{d-1} + x^{d-2} + \cdots + 1,$$
are Pisot numbers. The numbers $x_a = \frac{a + \sqrt{a^2 + 4}}{2}$, i.e. the leading roots of $x^2 - ax - 1$, $a \in \mathbb{N}$, are all Pisot numbers. In particular, $x_1 = \frac12(1 + \sqrt5)$ is the golden mean, $x_2 = 1 + \sqrt2$ is the silver mean, and in general, the numbers $x_a$ are called the metallic means; see also (8.21). Salem [485] showed that the set of Pisot numbers is closed, so there is a smallest Pisot number. This turns out to be the cubic irrational $x = 1.3247\ldots$ solving $x^3 = x + 1$. It is known as the plastic number (see [1] for more on the history of this terminology), and it is isolated in the set of Pisot numbers [506]. The next one is the leading root of $x^4 = x^3 + 1$ and every other one is larger than $\sqrt2$.
The smallest known Salem number is called Lehmer's number $\lambda = 1.17628\ldots$ [312, 391]; it is the leading root of Lehmer's polynomial
$$p(x) = x^{10} + x^9 - x^7 - x^6 - x^5 - x^4 - x^3 + x + 1.$$
There are polynomials of lower degree that non-trivially have roots on the unit circle, for example $x^4 - 2x^3 - 2x + 1 = 0$, which has smallest possible degree, but the leading root is larger than Lehmer's number. It is an open question whether all characteristic polynomials of non-negative integer matrices with roots on the unit circle are reducible (so not of Salem type).
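To experiment with Definition 8.2, one can compute all roots of a candidate minimal polynomial numerically and inspect their moduli. The sketch below uses the Durand–Kerner (Weierstrass) iteration, which is our choice of root finder and not something used in the text; the function names and the tolerance `tol` are illustrative assumptions. Floating-point tolerances can only suggest, not prove, that a conjugate lies exactly on the unit circle, and irreducibility of the input polynomial is assumed rather than checked.

```python
def poly_roots(coeffs, iterations=1000):
    """Approximate all complex roots of a monic polynomial (Durand-Kerner).

    `coeffs` runs from the leading coefficient, which must be 1, down to
    the constant term; e.g. x^3 - x - 1 is [1, 0, -1, -1].
    """
    n = len(coeffs) - 1

    def p(z):
        value = 0j
        for c in coeffs:           # Horner's scheme
            value = value * z + c
        return value

    roots = [(0.4 + 0.9j) ** k for k in range(n)]   # customary starting points
    for _ in range(iterations):
        new_roots = []
        for i, z in enumerate(roots):
            denom = 1 + 0j
            for j, w in enumerate(roots):
                if j != i:
                    denom *= z - w
            new_roots.append(z - p(z) / denom)
        roots = new_roots
    return roots


def classify(coeffs, tol=1e-8):
    """Label the leading root of an (assumed irreducible) monic integer polynomial."""
    roots = poly_roots(coeffs)
    k = max(range(len(roots)), key=lambda i: abs(roots[i]))
    alpha = roots[k]
    moduli = [abs(z) for i, z in enumerate(roots) if i != k]
    if abs(alpha.imag) > tol or alpha.real <= 1 + tol:
        return "neither", alpha.real
    if all(m < 1 - tol for m in moduli):
        return "Pisot", alpha.real
    if all(m < 1 + tol for m in moduli) and any(m > 1 - tol for m in moduli):
        return "Salem", alpha.real
    return "neither", alpha.real


if __name__ == "__main__":
    examples = {
        "plastic x^3 - x - 1": [1, 0, -1, -1],
        "x^4 - x^3 - 1": [1, -1, 0, 0, -1],
        "Lehmer": [1, 1, 0, -1, -1, -1, -1, -1, 0, 1, 1],
        "x^4 - 2x^3 - 2x + 1": [1, -2, 0, -2, 1],
    }
    for name, coeffs in examples.items():
        print(name, classify(coeffs))
```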
Proposition 8.3. If α > 1 is a Salem number, then its minimal polynomial is palindromic; i.e.
$$p(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_1 x + a_0 = a_0 x^d + a_1 x^{d-1} + \cdots + a_{d-1} x + a_d.$$
Except for 1/α, all Galois conjugates of α lie on the unit circle but are not roots of unity.

Proof. Let $p(x) = a_d x^d + \cdots + a_1 x + a_0$ be the minimal polynomial of α, and let $p^*(x) = a_0 x^d + \cdots + a_{d-1} x + a_d$ be the reciprocal polynomial, i.e. $p(x)$ with the coefficients written in backward order. We need to show that $p(x) = p^*(x)$.
Note that the reciprocal polynomial satisfies $p^*(x) = x^d p(1/x)$, so $\alpha'$ is a root of p if and only if $1/\alpha'$ is a root of $p^*$. Since α is a Salem number, it has a Galois conjugate $\alpha'$ on the unit circle. Its complex conjugate $\bar\alpha' = 1/\alpha'$ is also a root of $p(x)$, because p has real coefficients, and hence $\alpha'$ is a root of $p^*(x)$. But then $\alpha'$ is also a root of the polynomial $a_0 p(x) - a_d p^*(x)$, which has degree $< d$. This contradicts that $p(x)$ is the minimal polynomial of α, unless $p(x) = p^*(x)$. In particular $1/\alpha$ is a Galois conjugate, and no other Galois conjugate $\alpha'$ can have $|\alpha'| < 1$, because then $|1/\alpha'| > 1$ and this contradicts that α is a Salem number. Thus all the remaining Galois conjugates $\alpha'$ lie on the unit circle. If $\alpha'$ is a root of unity, then its minimal polynomial, which is (a factor of) $x^r - 1$, divides $p(x)$, but is not equal to it. This contradicts that p is irreducible. The proof is complete. □
Definition 8.4. An algebraic number α > 1 is a Perron number if all its algebraic conjugates $\alpha'$ satisfy $|\alpha'| < \alpha$.

That the leading eigenvalue of a non-negative aperiodic irreducible integer matrix is a Perron number follows directly from the Perron-Frobenius Theorem 8.58. The converse, however, is also true: every Perron number is the leading eigenvalue of a non-negative aperiodic irreducible integer matrix; see [397, Theorem 1]. In fact [397, Theorem 3], the algebraic conjugates $\alpha'$ of α satisfy $|\alpha'| \le \alpha$ if and only if α is the leading eigenvalue of a non-negative irreducible integer matrix (without stipulating that this matrix is aperiodic).
Recall that |||x||| denotes the distance of x to the nearest integer.
Proposition 8.5. If α > 1 is Pisot, then $|||\alpha^n||| \to 0$ exponentially fast.

Proof. Let $\alpha_1, \dots, \alpha_{d-1}$ be the Galois conjugates of α, and let A be a $d \times d$ integer matrix whose characteristic polynomial is the minimal polynomial of α. Let $J = U^{-1}AU$ be the Jordan normal form of A, so $J^n = U^{-1}A^nU$, and $J^n$ and $A^n$ have the same characteristic polynomial $\det(\lambda I - A^n) = \lambda^d + p_{d-1}\lambda^{d-1} + \cdots + p_1\lambda + p_0$, with the same coefficient $p_{d-1}$. This coefficient is minus the trace, so
$$\alpha^n + \sum_{i=1}^{d-1} \alpha_i^n = \operatorname{tr}(J^n) = \operatorname{tr}(A^n) = -p_{d-1} \in \mathbb{Z}.$$
Therefore $|||\alpha^n||| = |||\sum_{i=1}^{d-1} \alpha_i^n||| \to 0$ exponentially fast. □
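For the golden mean the proof above can be watched in action: here the trace $\alpha^n + \alpha_1^n$ is the Lucas number $L_n$, so $|||\alpha^n||| = |\alpha_1|^n$ as soon as $|\alpha_1|^n < \frac12$, i.e. for $n \ge 2$. The sketch below checks this with Python's `decimal` module; the precision of 60 digits is an arbitrary choice that comfortably covers the exponents used.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60  # working precision in significant digits


def dist_to_int(x):
    """|||x|||: distance from the Decimal x to the nearest integer."""
    return abs(x - x.to_integral_value())


SQRT5 = Decimal(5).sqrt()
ALPHA = (1 + SQRT5) / 2       # golden mean, a Pisot number
CONJ = (1 - SQRT5) / 2        # its Galois conjugate, |CONJ| < 1

if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        # both columns should agree: |||alpha^n||| = |conjugate|^n
        print(n, dist_to_int(ALPHA ** n), abs(CONJ) ** n)
```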

8.1.1. Conditions for recursive sequences satisfying $|||\alpha G_n||| \to 0$. We will investigate for which irrational α (if any) $|||\alpha G_n||| \to 0$ for sequences $G_n$. This is of interest in several questions in ergodic theory and elsewhere; see e.g. Section 6.9. Pisot numbers are important, but this is not the whole story.
Proposition 8.6. Let the integer sequence $(G_n)_{n \ge 0}$ satisfy the recursion
$$G_n = \sum_{i=1}^{d} a_{d-i} G_{n-i}, \qquad G_0, \dots, G_{d-1} \in \mathbb{N}_0 \text{ arbitrary}, \tag{8.1}$$
where $a_i \in \mathbb{N}_0$, $a_1 \ge 1$, are the coefficients of a Pisot polynomial $p(x) := x^d - \sum_{i=0}^{d-1} a_i x^i$. If α is the leading root of $p(x)$, then
$$|||\alpha G_n||| \to 0 \text{ exponentially fast.} \tag{8.2}$$

The recursive relation (8.1) can be written as
$$\begin{pmatrix} G_{n-d+1} \\ G_{n-d+2} \\ \vdots \\ G_n \end{pmatrix} = A \begin{pmatrix} G_{n-d} \\ G_{n-d+1} \\ \vdots \\ G_{n-1} \end{pmatrix}, \qquad A = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & 1 \\ a_0 & a_1 & \cdots & a_{d-2} & a_{d-1} \end{pmatrix}, \tag{8.3}$$
so $p(x) = \det(xI - A)$. This has the advantage that in the Jordan decomposition of A, namely $A = UJU^{-1}$, the matrix
$$U = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ \lambda_1 & \lambda_2 & \cdots & \lambda_d \\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_d^2 \\ \vdots & \vdots & & \vdots \\ \lambda_1^{d-1} & \lambda_2^{d-1} & \cdots & \lambda_d^{d-1} \end{pmatrix}$$
is (the transpose of) a Vandermonde matrix, provided that all the eigenvalues $\lambda_i$ of A are distinct. However, we get the same result for any matrix A with characteristic polynomial $p(x) = x^d - \sum_{i=0}^{d-1} a_i x^i$. Indeed, if we take $G^{(0)} = (G_0^{(0)}, \dots, G_{d-1}^{(0)})^T \in \mathbb{N}_0^d$ arbitrary and set $G^{(n)} = A^n G^{(0)}$, then by the Cayley-Hamilton Theorem, $p(A) = 0$. Hence
$$G^{(n)} = a_{d-1} G^{(n-1)} + \cdots + a_0 G^{(n-d)}.$$
In particular, each component of $G^{(n)}$ satisfies (8.1).
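The Cayley-Hamilton argument can be checked on small examples: iterate $G^{(n)} = A^n G^{(0)}$ and test each component against (8.1). In the sketch below, `companion` (a name of ours) builds the matrix of (8.3) from the list $[a_0, \dots, a_{d-1}]$, and the matrix `B` is an arbitrary non-companion example, chosen by us, with the same characteristic polynomial $x^2 - x - 1$.

```python
def companion(a):
    """Matrix A of (8.3) for the coefficient list a = [a_0, ..., a_{d-1}]."""
    d = len(a)
    rows = [[1 if j == i + 1 else 0 for j in range(d)] for i in range(d - 1)]
    rows.append(list(a))
    return rows


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def orbit(m, g0, steps):
    """The vectors G^(0), ..., G^(steps), where G^(n) = m^n G^(0)."""
    out = [list(g0)]
    for _ in range(steps):
        out.append(mat_vec(m, out[-1]))
    return out


def satisfies_recursion(seq, a):
    """Does seq obey G_n = sum_{i=1}^{d} a_{d-i} G_{n-i} for all n >= d?"""
    d = len(a)
    return all(seq[n] == sum(a[d - i] * seq[n - i] for i in range(1, d + 1))
               for n in range(d, len(seq)))


if __name__ == "__main__":
    fib = [v[-1] for v in orbit(companion([1, 1]), [0, 1], 10)]
    print(fib)                                  # Fibonacci numbers
    B = [[1, 1], [1, 0]]                        # also has char. poly x^2 - x - 1
    vectors = orbit(B, [3, 7], 10)
    for k in range(2):
        print(satisfies_recursion([v[k] for v in vectors], [1, 1]))
```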

Proof of Proposition 8.6. Let $(v_i)_{i=1}^d$ be the eigenvectors of A associated to eigenvalues $\lambda_i$ where $\lambda = \lambda_1 > 1 > |\lambda_i|$ for $i = 2, \dots, d$. For simplicity, we assume that A is indeed complex diagonalizable; otherwise the proof becomes more technical but remains in essence the same. Decompose $(G_0, \dots, G_{d-1})^T = \sum_{i=1}^d c_i v_i$ for $c_i \in \mathbb{C}$; then
$$(G_{n-d+1}, \dots, G_n)^T = A^{n-d+1} \sum_{i=1}^d c_i v_i = \sum_{i=1}^d c_i v_i \lambda_i^{n-d+1}. \tag{8.4}$$
Therefore, writing $(v_i)_d$ for the d-th component of $v_i$, the d-th component gives
$$\lambda G_n - G_{n+1} = \sum_{i=1}^d c_i (v_i)_d \lambda_i^{n-d+1}(\lambda - \lambda_i) = \sum_{i=2}^{d} c_i (v_i)_d \lambda_i^{n-d+1}(\lambda - \lambda_i),$$
which is indeed exponentially small in n. Since $G_{n+1} \in \mathbb{Z}$, this proves (8.2). □

Remark 8.7. For the case $p(x) = x^2 - x - 1$ with $G_n = F_n$, $F_0 = 0$, $F_1 = 1$ the Fibonacci numbers, (8.4) reduces to Binet's formula
$$F_n = \frac{1}{\sqrt5}\left(\gamma^n - (-\gamma)^{-n}\right), \qquad \gamma = \frac12(1 + \sqrt5). \tag{8.5}$$
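Binet's formula and the estimate behind Proposition 8.6 are easy to test with exact integers and high-precision decimals. From (8.5) one gets $F_{n+1} - \gamma F_n = (-\gamma)^{-n}$, so $|||\gamma F_n||| = \gamma^{-n}$ for $n \ge 2$. The sketch below (function names ours, precision an arbitrary choice) checks both statements.

```python
from decimal import Decimal, getcontext

getcontext().prec = 60

SQRT5 = Decimal(5).sqrt()
GAMMA = (1 + SQRT5) / 2


def fib(n):
    """Exact Fibonacci number F_n with F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def binet(n):
    """Right-hand side of (8.5), evaluated in Decimal arithmetic."""
    return (GAMMA ** n - (-GAMMA) ** (-n)) / SQRT5


def dist_to_int(x):
    """|||x|||: distance from the Decimal x to the nearest integer."""
    return abs(x - x.to_integral_value())


if __name__ == "__main__":
    for n in (5, 10, 20, 40):
        print(n, fib(n), binet(n), dist_to_int(GAMMA * fib(n)), GAMMA ** (-n))
```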

The problem we would like to solve (for various purposes in number theory and Diophantine approximation, and as presented in Section 6.9) is under what conditions (8.2) has a solution. That is, when is α ∈ R such that $|||\alpha G_n||| \to 0$? If the $G_n$'s are multiples of q, then α = p/q solves this equation. Conversely, no α = p/q ∈ Q in lowest terms solves the equation unless $q \mid G_n$ for all sufficiently large n. But for irrational α beyond Pisot numbers, it is an important and intriguing question, which sometimes, maybe surprisingly, has a positive answer.

For the rest of this section, let $p(x) = \det(xI - A) = x^d - \sum_{i=0}^{d-1} a_i x^i$ be the characteristic polynomial of an integer matrix A. Let $\Lambda = \{\lambda_i \in \mathbb{C} : p(\lambda_i) = 0\}$ be the set of eigenvalues of A, which we split into $\Lambda^+ = \{\lambda_i \in \Lambda : |\lambda_i| \ge 1\}$ and $\Lambda^- = \{\lambda_i \in \Lambda : |\lambda_i| < 1\}$. Let also $\lambda_1$ be the leading root of $p(x)$; if A is a non-negative matrix, $\lambda_1 > 0$ due to the Perron-Frobenius Theorem 8.58.
Theorem 8.8. Assume that the eigenvalues $\lambda \in \Lambda^+$ are all distinct. Suppose that the integer sequence $(G_n)_{n \ge 0}$ satisfies (8.1). If there is a polynomial $g(x) = \sum_j g_j x^j \in \mathbb{Q}[x]$ such that $g(\lambda) = \alpha$ for all $\lambda \in \Lambda^+$, then
$$|||\ell\alpha G_n||| \to 0 \quad \text{for } \ell = \operatorname{lcm}\{\text{denominators of } g_j\}.$$
In addition,
$$|||\ell\alpha^k G_n||| \to 0 \quad \text{for every } k \in \mathbb{N} \text{ and some } \ell = \ell(k) \in \mathbb{N}.$$

This result goes back to Livshits; see [402, 403] and also [519]. For Pisot matrices, we can trivially choose $g(x) = x$ and $\alpha = \lambda_1$, the leading eigenvalue. By choosing $g(x) = x^k$, we see that $|||\lambda_1^k G_n||| \to 0$ for all $k \in \mathbb{N}$.
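Example 8.9. In [246, Section 4], the following example is presented:
$$\chi: \begin{cases} 0 \to 0133, \\ 1 \to 12, \\ 2 \to 3, \\ 3 \to 0, \end{cases} \qquad \text{with associated matrix } A = \begin{pmatrix} 1 & 0 & 0 & 1 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 2 & 0 & 1 & 0 \end{pmatrix}.$$
The characteristic polynomial
$$p(x) = x^4 - 2x^3 - x^2 + 2x - 1 = (x^2 - x - 1 - \sqrt2)(x^2 - x - 1 + \sqrt2)$$
has solutions
$$\lambda_1 = \frac{1 + \sqrt{5 + 4\sqrt2}}{2} > 1, \qquad \lambda_2 = \frac{1 - \sqrt{5 + 4\sqrt2}}{2} < -1$$
and
$$\lambda_3 = \frac{1 + i\sqrt{-5 + 4\sqrt2}}{2}, \qquad \lambda_4 = \frac{1 - i\sqrt{-5 + 4\sqrt2}}{2}$$
inside the unit circle (indeed $|\lambda_3|^2 = |\lambda_4|^2 = \sqrt2 - 1$). For $g(x) = x^2 - x - 1$, we have $g(\lambda_1) = g(\lambda_2) = \sqrt2$, so $\sqrt2$ solves (8.2).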
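Example 8.9 can be checked numerically. The sketch below builds A from χ, generates $G_n$ as the first component of $A^n(1,0,0,0)^T$ (the starting vector is an arbitrary choice of ours), and compares $\sqrt2\,G_n$ with the integer $F_n = G_{n+2} - G_{n+1} - G_n$ obtained from $g(x) = x^2 - x - 1$ as in the proof of Theorem 8.8. The test also confirms that $G_n$ obeys the recursion coming from $p(x)$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 80

CHI = {"0": "0133", "1": "12", "2": "3", "3": "0"}  # substitution of Example 8.9
ALPHABET = "0123"
# entry (i, j) counts the letter i in chi(j)
A = [[CHI[b].count(a) for b in ALPHABET] for a in ALPHABET]


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def sequence(g0, length, component=0):
    """G_n = (A^n g0)[component] for n = 0, ..., length - 1."""
    v, out = list(g0), []
    for _ in range(length):
        out.append(v[component])
        v = mat_vec(A, v)
    return out


SQRT2 = Decimal(2).sqrt()
G = sequence([1, 0, 0, 0], 60)

if __name__ == "__main__":
    for n in (10, 20, 40, 55):
        F = G[n + 2] - G[n + 1] - G[n]   # integer produced by g(x) = x^2 - x - 1
        print(n, abs(SQRT2 * G[n] - F))
```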

Proof of Theorem 8.8. By the recursive relation (8.1) we can write $G_n = \sum_{i=1}^{d} c_i \lambda_i^n$, where $\lambda_i$ are the roots of p. Here we assume that all roots of $p(x)$ are distinct, not just the roots outside the unit disk. Without this assumption we need to consider non-trivial Jordan blocks for these roots, which is more technical but not seriously different. Thus we have
$$\alpha G_n = \alpha \sum_{|\lambda_i| \ge 1} c_i \lambda_i^n + \alpha \sum_{|\lambda_i| < 1} c_i \lambda_i^n.$$
Suppose that the polynomial $g(x) = \sum_{j=0}^m g_j x^j$. Then
$$F_n := \sum_{j=0}^m g_j G_{n+j} = \sum_{j=0}^m g_j \sum_{i=1}^d c_i \lambda_i^{n+j} = \sum_{i=1}^d c_i \lambda_i^n g(\lambda_i) = \alpha \sum_{|\lambda_i| \ge 1} c_i \lambda_i^n + \sum_{|\lambda_i| < 1} c_i \lambda_i^n g(\lambda_i).$$
Therefore, letting $\ell = \operatorname{lcm}\{\text{denominators of } g_j\}$, we obtain
$$\ell\alpha G_n - \ell F_n = \ell \sum_{|\lambda_i| < 1} c_i (\alpha - g(\lambda_i)) \lambda_i^n \to 0 \quad \text{as } n \to \infty.$$
Since $\ell F_n \in \mathbb{Z}$, $|||\ell\alpha G_n||| \to 0$ as claimed. To obtain that also $|||\ell\alpha^k G_n||| \to 0$, we repeat the argument with $g^k(x) = p(x)q(x) + r(x)$, noting that $p(\lambda_i) = 0$ for all i and $r(\lambda_i) = \alpha^k$ for $|\lambda_i| \ge 1$. □

The following converse of Theorem 8.8 requires some notation. Decompose the characteristic polynomial into factors that are irreducible over Q:
$$p(x) = \det(xI - A) = \prod_i q_i(x),$$
and let $\Lambda_i = \{\lambda_j \in \Lambda : q_i(\lambda_j) = 0\}$. Also let
$$H_i = \operatorname{Span}\{v_j : v_j \text{ is a (generalized) eigenvector of } \lambda_j \in \Lambda_i\}$$
be the joint span of the eigenspaces associated to the eigenvalues in $\Lambda_i$. Clearly $\mathbb{C}^d = \bigoplus_i H_i$ is a direct sum, and we write $H_i^\perp = \bigoplus_{j \ne i} H_j$ for the subspace complementary to $H_i$. Naturally, all these notions simplify to $\Lambda_i = \Lambda$, $H_i = \mathbb{C}^d$, and $H_i^\perp = \{0\}$ if p is irreducible.

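Theorem 8.10. Assume that the eigenvalues $\lambda \in \Lambda^+$ all have multiplicity one. If $\alpha \notin \mathbb{Q}$ satisfies $|||\alpha G_n||| \to 0$, then for each $\Lambda_i$ intersecting $\Lambda^+$ such that $(G_0, \dots, G_{d-1})^T \notin H_i^\perp$, there is a polynomial $g_i \in \mathbb{Q}[x]$ such that $\alpha = g_i(\lambda)$ for all $\lambda \in \Lambda_i \cap \Lambda^+$.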
Remark 8.11. For irreducible polynomials p(x), also this result goes back to Livshits [402, 403], who in fact claims² that we can take $g(x) \in \mathbb{Z}[x]$; the paper [246] comes to the same result as Theorem 8.10.
Exercise 8.12. Suppose that the characteristic polynomial p(x) of A has
an irreducible quadratic factor q(x) such that both roots |λ± | ≥ 1. Show
that there is no α ∈ R \ Q such that |||αGn (x)||| → 0.

The proof of Theorem 8.10 uses a fair amount of Galois theory and
properties of the Galois group G associated to the field extension Q(λj ∈ Λ).
This is the group G of automorphisms on Q(λj ∈ Λ) that fixes Q itself. It
turns out that every τ ∈ G permutes the roots in Λ, and if we extend τ coordinate-wise to the eigenvectors³ $v_j$ corresponding to the eigenvalues $\lambda_j$, then it permutes the eigenvectors in the same way as the eigenvalues:
Lemma 8.13. Given τ ∈ G we have $\tau(v_j) = v_{j'}$ if and only if $\tau(\lambda_j) = \lambda_{j'}$.

Proof. Let G be the Galois group of the splitting field K, and for τ ∈ G, denote the coordinate-wise action on vectors in $K^d$ again by τ. Because the eigenvectors satisfy $Av_j = \lambda_j v_j$, indeed $v_j \in \mathbb{Q}(\lambda_j)^d \subset K^d$. Since A has integer coefficients, we have for every τ ∈ G and vector $v_j \in K^d$,
$$A\,\tau(v_j) = \tau(A v_j) = \tau(\lambda_j v_j) = \tau(\lambda_j)\,\tau(v_j).$$
This shows that $\tau(v_j)$ is an eigenvector of A with eigenvalue $\tau(\lambda_j)$. □

In the case that $p(x)$ is irreducible, and hence $G_i = G$ acts transitively on $\Lambda_i = \Lambda$, we can say a bit more. Since each τ ∈ G either fixes the set $\Lambda_g = \{\lambda_j \in \Lambda : g(\lambda_j) = \alpha\}$ or satisfies $\tau(\Lambda_g) \cap \Lambda_g = \emptyset$, we derive that $\Lambda = \bigsqcup_{i=0}^{s-1} \tau_i(\Lambda_g)$ for some $s \in \mathbb{N}$ and suitable $\tau_i \in G$, and therefore $\#\Lambda_g$ divides the degree d of the characteristic polynomial $p(x)$; see [519, Corollary 2.2]. It follows [519, Corollary 2.3] that if d is a prime, only (integer linear combinations of powers of) Pisot numbers $\alpha = \lambda_0$ can satisfy Theorem 8.10. The smallest degree for which Theorem 8.10 non-trivially applies, i.e. there are at least two roots outside the unit circle, is four; see Example 8.9.
We finish this section with the proof of Theorem 8.10, which needs some preparatory lemmas.

2 But I am not able to verify this from his paper.

3 We scale the eigenvectors vj so that one and therefore every entry belongs to Q(λj ).

Lemma 8.14. Let $q_i(x)$ be an irreducible factor of the characteristic polynomial $p(x)$ of the matrix A, vanishing on $\Lambda_i$. Let $\vec w \in \mathbb{Q}^d \setminus H_i^\perp$. Then in the decomposition $\vec w = \sum_{j=0}^{d-1} c_j v_j$ we have $c_j \neq 0$ for all $\lambda_j \in \Lambda_i$.

Proof. Let $G_i$ be the Galois group associated to the splitting field $K_i$ of the irreducible factor $q_i$. If $\deg(q_i) = 1$, then $\dim(H_i) = 1$ and since $\vec w \notin H_i^\perp$, there is nothing to prove.
Now suppose $\deg(q_i) \ge 2$, so $\dim H_i \ge 2$. Suppose by contradiction that there are $\lambda_j, \lambda_{j'} \in \Lambda_i$ such that $c_j = 0 \neq c_{j'}$. Since $G_i$ acts transitively on $\Lambda_i$, we can find $\tau \in G_i$ such that $\tau(\lambda_{j'}) = \lambda_j$. Therefore
$$\tau(\vec w) = \tau\Big(\sum_{k=0}^{d-1} c_k v_k\Big) = \sum_{k=0}^{d-1} \tau(c_k)\,\tau(v_k).$$
Since τ fixes Q, we have $\tau(\vec w) = \vec w$, and
$$0 = \vec w - \tau(\vec w) = \sum_{k=0}^{d-1} \big(c_k v_k - \tau(c_k)\,\tau(v_k)\big).$$
But this is a contradiction to linear independence of the $v_k$, because the vector $\tau(v_{j'})$ is collinear with $v_j$, and the coefficient of $v_j$ in the above expression is $c_j - \tau(c_{j'}) = -\tau(c_{j'}) \neq 0$. □

Since we are only interested in distance to the nearest integer vector, we can think of the action of A as an endomorphism $T_A$ on the d-dimensional torus $\mathbb{T}^d$. Let $\vec w = (G_0, \dots, G_{d-1})^T \in \mathbb{Z}^d$. Assume that $\alpha \in \mathbb{R} \setminus \{0\}$ is such that $|||A^n(\alpha \vec w)||| \to 0$; that is, $T_A^n(\alpha \vec w - \vec z) \to 0$, where $\vec z$ is the integer vector such that $\alpha \vec w \pmod 1 = \alpha \vec w - \vec z$. By projecting $\alpha \vec w$ along the stable eigenspace onto the local unstable eigenspace of 0 (assuming there is no neutral eigenspace), the image is a homoclinic⁴ point of $T_A$.
Let $\vec w = \sum_j c_j v_j$ and $\vec z = \sum_j d_j v_j$ be the linear decompositions into (generalized) eigenvectors of A. We try to scale $\vec w$ in such a way that $\alpha \vec w \pmod 1 = \alpha \vec w - \vec z = \sum_j (\alpha c_j - d_j) v_j$ has only contributions from stable (generalized) eigenvectors. If the matrix A is Pisot, this is easy, because we only have to select $\alpha = d_0/c_0$ to eliminate the contribution of $v_0$. In general, we have

Lemma 8.15. Suppose that $\vec w := (G_0, \dots, G_{d-1})^T = \sum_j c_j v_j$ is a non-zero integer vector. Then α ∈ R satisfies (8.2) if and only if there is an integer vector $\vec z = \sum_j d_j v_j$ such that $\alpha = d_j/c_j$ for all $\lambda_j \in \Lambda^+$.
It follows from Lemma 8.14 that the coefficients $c_j, d_j$ with $\lambda_j \in \Lambda_i$ are either all non-zero or all zero (namely, if $\vec w, \vec z \in H_i^\perp$).
4 Meaning that $x \neq 0$ but $T^n(x) \to 0$ both as $n \to \infty$ and $n \to -\infty$.

Proof. We have $\alpha \vec w - \vec z = \sum_j (\alpha c_j - d_j) v_j$. If $\alpha = d_j/c_j$ for all $\lambda_j \in \Lambda^+$, then
$$|||A^n(\alpha \vec w)||| = |||A^n(\alpha \vec w - \vec z)||| = \Big|\Big|\Big| A^n \sum_{\lambda_j \in \Lambda} (\alpha c_j - d_j) v_j \Big|\Big|\Big| = \Big|\Big|\Big| \sum_{\lambda_j \in \Lambda^-} (\alpha c_j - d_j) \lambda_j^n v_j \Big|\Big|\Big| \to 0.$$
Conversely, if $|||A^n \alpha \vec w||| \to 0$, then $\alpha \vec w \pmod 1$ belongs to the stable manifold of the toral endomorphism $T_A : \mathbb{T}^d \to \mathbb{T}^d$ induced by A. Hence $\alpha \vec w - \vec z$ belongs to the stable manifold of $\vec 0$ for some $\vec z \in \mathbb{Z}^d$, and in the decomposition $\alpha \vec w - \vec z = \sum_j (\alpha c_j - d_j) v_j$ all the coefficients belonging to non-stable eigenvectors are zero. Hence $\alpha = d_j/c_j$ for all $\lambda_j \in \Lambda^+$. □
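Lemma 8.16. Assume that all $\lambda_j \in \Lambda^+$ have multiplicity one. The coefficients $c_j$ and $d_j$ in the previous lemma belong to the field extension $\mathbb{Q}(\lambda_j)$ for every $\lambda_j \in \Lambda^+$. In particular, $\alpha \in \bigcap_{\lambda_j \in \Lambda^+} \mathbb{Q}(\lambda_j)$.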

Proof. Since $Av_j = \lambda_j v_j$, the eigenvector can be scaled such that $v_j \in \mathbb{Q}(\lambda_j)^d$. Note also that the matrix V with these eigenvectors as columns satisfies $VDV^{-1} = A$, where D is the Jordan matrix with eigenvalues $\lambda_j$ on the diagonal. Since we assumed that all $\lambda_j \in \Lambda^+$ have multiplicity one, all the corresponding Jordan blocks are trivial.
Then also $(V^{-1})^T D V^T = A^T$ and $(V^{-1})^T$ has the eigenvectors of $A^T$ as columns. Because the transpose matrix $A^T$ has the same eigenvalues $\lambda_0, \dots, \lambda_{d-1}$, the same argument as before gives that these eigenvectors belong to $\mathbb{Q}(\lambda_j)^d$, respectively. Hence, the j-th row of $V^{-1}$ belongs to $\mathbb{Q}(\lambda_j)^d$.
Now $\vec w = Vc$, so $c = V^{-1}\vec w$; in other words, the j-th component of c satisfies $c_j \in \mathbb{Q}(\lambda_j)$. Similarly $d_j \in \mathbb{Q}(\lambda_j)$ and hence $\alpha = d_j/c_j \in \mathbb{Q}(\lambda_j)$. Since this is true for all $\lambda_j \in \Lambda^+$, we find $\alpha \in \bigcap_{\lambda_j \in \Lambda^+} \mathbb{Q}(\lambda_j)$. □
Remark 8.17. Note that this proof gives that the j-th and $j'$-th rows of $V^{-1}$ are obtained from each other by replacing every $\lambda_j$ by $\lambda_{j'}$.
Now we can finish the proof of Theorem 8.10.

Proof of Theorem 8.10. Choose i such that $\Lambda_i \cap \Lambda^+ \neq \emptyset$ and $\vec w \notin H_i^\perp$, and choose $\lambda_j \in \Lambda_i \cap \Lambda^+$. Since $c_j \in \mathbb{Q}(\lambda_j)$ by Lemma 8.16, there is a polynomial $f_i \in \mathbb{Q}[x]$ of lowest degree such that $c_j = f_i(\lambda_j)$. Take $\lambda_{j'} \in \Lambda_i \cap \Lambda^+$ and $\tau \in G_i$ such that $\tau(\lambda_j) = \lambda_{j'}$. Since $q_i$ is irreducible, the Galois group $G_i$ acts transitively on $\Lambda_i$, so such a τ exists. Recall from the proof of Lemma 8.16 that $c = V^{-1}\vec w$, and by Remark 8.17,
$$c_{j'} = \tau(c_j) = \tau(f_i(\lambda_j)) = f_i(\tau(\lambda_j)) = f_i(\lambda_{j'}).$$
By the same reasoning, there is a polynomial $\tilde f_i \in \mathbb{Q}[x]$ of lowest degree such that $d_j = \tilde f_i(\lambda_j)$, and $d_{j'} = \tilde f_i(\lambda_{j'})$ for each $\lambda_{j'} \in \Lambda_i \cap \Lambda^+$. Therefore
$$\alpha = \frac{d_k}{c_k} = \frac{\tilde f_i(\lambda_k)}{f_i(\lambda_k)} \quad \text{for all } \lambda_k \in \Lambda_i \cap \Lambda^+.$$
Note that the splitting field $K_i$ of the factor $q_i(x)$ is a finite separable field extension, and by the Primitive Element Theorem (see [36, Chapter 2, Theorem 27]), there exists $\gamma_i \in K_i$ such that $K_i = \mathbb{Q}(\gamma_i)$. Let $m_i(x)$ be the minimal polynomial of $\gamma_i$ over Q. Then by [525, Theorem 5.12] there is an isomorphism $\mathbb{Q}[x]/\!\sim\, \to \mathbb{Q}(\gamma_i)$, where $\mathbb{Q}[x]/\!\sim$ denotes the quotient field of polynomials in $\mathbb{Q}[x]$, where $r_1 \sim r_2$ if $m_i(x)$ divides $r_1(x) - r_2(x)$. This isomorphism is realized by $[r(x)] \mapsto r(\gamma_i)$, where $[r(x)]$ is the equivalence class of $r(x)$ in $\mathbb{Q}[x]$. This isomorphism tells us that $\mathbb{Q}[x]/\!\sim$ is a field of polynomials, so the quotient $\tilde f_i/f_i$ can be expressed in $\mathbb{Q}[x]/\!\sim$ as a single polynomial $g_i$. This concludes the proof. □

8.2. Continued Fractions

Continued fractions⁵ evolved from the Euclidean algorithm to find the greatest common divisor of two integers, or more precisely the real-valued analog of this algorithm. Given two sticks $S_p$ and $S_q$, of lengths p and q, we are asked to determine the proportion $\theta_1 := p/q$. Assume p < q. Cut length p off the stick $S_q$ and see if it is still longer than $S_p$. If yes, cut again length p off the stick $S_q$ and repeat. If $S_p$ is longer than $S_q$, then reverse the roles of $S_p$ and the remaining bit of $S_q$.
In other words, assume length p is cut off $a_1 \in \mathbb{N}$ times from $S_q$. Then $S_q$ has remaining length $q - a_1 p < p$, and we reverse the roles of $S_p$ and $S_q$. Then $a_1 p \le q < (a_1 + 1)p$, so
$$\frac{1}{a_1 + 1} < \theta_1 \le \frac{1}{a_1}$$
gives a first approximation of p/q. In reality, $\theta_1 = \frac{1}{a_1 + \theta_2}$ for $\theta_2 = (q - a_1 p)/p$, the proportion of the lengths of the remainder of stick $S_q$ and $S_p$. Let $a_2 \in \mathbb{N}$ be the number of times the new length of $S_q$ is cut off from $S_p$, etc. Then $\frac{1}{a_2 + 1} < \theta_2 \le \frac{1}{a_2}$, so
$$\frac{1}{a_1 + \frac{1}{a_2}} \le \theta_1 < \frac{1}{a_1 + \frac{1}{a_2 + 1}}.$$
Continuing this way, as long as $a_i \neq 0$, we obtain
$$\frac{1}{a_1 + \frac{1}{a_2}} < \frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3 + \frac{1}{a_4}}}} < \cdots < \theta_1 < \cdots < \frac{1}{a_1 + \frac{1}{a_2 + \frac{1}{a_3}}} < \frac{1}{a_1}.$$

5 See [98, 175, 360] for some of the many general references on continued fractions.

In the limit, we find
$$\theta_1 = [0; a_1, a_2, a_3, \dots] := \lim_{n \to \infty} \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_{n-1} + \cfrac{1}{a_n}}}}}.$$

Every irrational has a unique continued fraction expansion, but every ra-
tional θ1 ∈ (0, 1) has two (finite) continued fraction expansions, namely
[0; a1 , . . . , an ] and [0; a1 , . . . , an − 1, 1] for an ≥ 2.
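The stick-cutting procedure is exactly the Euclidean algorithm on $(p, q)$, so the digits of a rational $\theta_1 = p/q$ can be read off from successive integer divisions. The sketch below (function names ours) computes them and evaluates a finite expansion exactly with `fractions.Fraction`, which also confirms the two expansions of a rational mentioned above.

```python
from fractions import Fraction


def cf_of_rational(p, q):
    """Digits [a_1, a_2, ...] of p/q in (0, 1) via the Euclidean algorithm."""
    digits = []
    while p:
        a, r = divmod(q, p)   # cut length p off the stick S_q exactly a times
        digits.append(a)
        p, q = r, p           # the remainder and S_p swap roles
    return digits


def cf_value(digits):
    """Exact value of [0; a_1, ..., a_n]."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


if __name__ == "__main__":
    digits = cf_of_rational(3, 8)
    print(digits, cf_value(digits))                        # [2, 1, 2] 3/8
    print(cf_value(digits[:-1] + [digits[-1] - 1, 1]))     # 3/8 again
```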

Exercise 8.18. An alternative to the Euclidean algorithm uses only addition and division by 2. Let p, q ∈ N, q odd, and iterate
$$f : (p, q) \mapsto (p', q') = \begin{cases} \left(\frac{p}{2}, q\right) & \text{if } p \text{ is even}, \\ \left(\frac{p+q}{2}, p\right) & \text{if } p \text{ is odd}. \end{cases}$$
Show that there is n ≥ 0 such that $f^n(p, q) = (\gcd(p, q), \gcd(p, q))$.

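Define recursively
$$\begin{aligned} p_0 &= 0, & p_1 &= 1, & p_{n+1} &= a_{n+1} p_n + p_{n-1}, \\ q_0 &= 1, & q_1 &= a_1, & q_{n+1} &= a_{n+1} q_n + q_{n-1}. \end{aligned} \tag{8.6}$$
The fractions $\frac{p_n}{q_n}$ are called the convergents of $\theta_1$. They are the best rational approximations of $\theta_1$ and $|\theta_1 - \frac{p_n}{q_n}| \le \frac{1}{q_n q_{n+1}}$ as we shall see.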
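The recursion (8.6) is easy to run with exact rational arithmetic. The sketch below (names ours) computes $p_n, q_n$ for a finite digit list; the test then checks the identity $p_{n-1}q_n - p_nq_{n-1} = (-1)^n$ used in the proof of Theorem 8.22 and the error bound $|\theta_1 - p_n/q_n| \le 1/(q_nq_{n+1})$ for the rational $\theta_1$ with those digits.

```python
from fractions import Fraction


def convergents(digits):
    """Lists (p_0..p_n, q_0..q_n) from (8.6) for [0; a_1, ..., a_n]."""
    p = [0, 1]
    q = [1, digits[0]]
    for a in digits[1:]:
        p.append(a * p[-1] + p[-2])
        q.append(a * q[-1] + q[-2])
    return p, q


def cf_value(digits):
    """Exact value of [0; a_1, ..., a_n]."""
    x = Fraction(0)
    for a in reversed(digits):
        x = 1 / (a + x)
    return x


if __name__ == "__main__":
    digits = [2, 1, 3, 1, 4, 1, 5]          # an arbitrary example
    theta = cf_value(digits)
    p, q = convergents(digits)
    for n in range(len(digits)):
        error = abs(theta - Fraction(p[n], q[n]))
        print(n, Fraction(p[n], q[n]), error, error <= Fraction(1, q[n] * q[n + 1]))
```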

Exercise 8.19. Denote the convergents $\frac{p_n}{q_n} = \frac{p_n(a_1, \dots, a_n)}{q_n(a_1, \dots, a_n)}$. Show that
$$q_n(a_1, \dots, a_{n-1}, a_n) = q_n(a_n, a_{n-1}, \dots, a_1), \qquad p_n(a_1, a_2, \dots, a_{n-1}, a_n) = q_{n-1}(a_n, a_{n-1}, \dots, a_2).$$
In particular, $p_n(a_1, \dots, a_n)$ is independent of $a_1$.

Lemma 8.20. For each $z \in \mathbb{R} \setminus \{-\frac{q_{n-1}}{q_n}\}$ we have
$$\frac{z p_n + p_{n-1}}{z q_n + q_{n-1}} = [0; a_1, a_2, \dots, a_n, z].$$

Proof. The proof is by induction. For n = 1, we have
$$[0; a_1, z] = \frac{1}{a_1 + \frac1z} = \frac{z}{z a_1 + 1} = \frac{z p_1 + p_0}{z q_1 + q_0}$$
as required. Now for the induction step,
$$\begin{aligned} [0; a_1, \dots, a_{n-1}, a_n, z] &= \Big[0; a_1, \dots, a_{n-1}, a_n + \frac1z\Big] = \frac{(a_n + \frac1z) p_{n-1} + p_{n-2}}{(a_n + \frac1z) q_{n-1} + q_{n-2}} = \frac{a_n p_{n-1} + p_{n-2} + \frac1z p_{n-1}}{a_n q_{n-1} + q_{n-2} + \frac1z q_{n-1}} \\ &= \frac{p_n + \frac1z p_{n-1}}{q_n + \frac1z q_{n-1}} = \frac{z p_n + p_{n-1}}{z q_n + q_{n-1}}. \end{aligned}$$
The induction step and hence the proof are complete. □
Remark 8.21. Relations as (8.6) and Lemma 8.20 hold for more general continued fraction constructions; see e.g. [67, 334]. For instance, more general continued fractions
$$\theta = \lim_{n \to \infty} \cfrac{b_1}{a_1 + \cfrac{b_2}{a_2 + \cfrac{b_3}{\ddots + \cfrac{b_{n-1}}{a_{n-1} + \cfrac{b_n}{a_n}}}}}$$
for arbitrary sequences of non-zero integers $(a_n)$ and $(b_n)$ have convergents $\frac{p_n}{q_n}$ satisfying the recursive relations
$$\begin{aligned} p_0 &= 0, & p_1 &= b_1, & p_{n+1} &= a_{n+1} p_n + b_{n+1} p_{n-1}, \\ q_0 &= 1, & q_1 &= a_1, & q_{n+1} &= a_{n+1} q_n + b_{n+1} q_{n-1}. \end{aligned}$$

Beyond real numbers, we can take $\mathbb{F}[X]$ to be the ring of polynomials with coefficients in a field $\mathbb{F}$, such as $\mathbb{Z}/p\mathbb{Z}$ for some prime p, or $\mathbb{F}_q$ for the prime power $q = p^m$.
We extend this to formal Laurent series $f = \sum_{n \ge -n_0} f_n X^{-n}$, where $f_n \in \mathbb{F}$ and there is a largest $n_0 \in \mathbb{Z}$ such that $f_{-n_0} \neq 0$. Define the integer and fractional part of such a Laurent series as
$$[f] = \sum_{n=-n_0}^{0} f_n X^{-n} \quad \text{and} \quad \{f\} = f - [f] = \sum_{n=1}^{\infty} f_n X^{-n}.$$
Now we perform the recursive continued fraction algorithm:
$$T(f) = \begin{cases} 1/\{f\} & \text{if } \{f\} \neq 0, \\ 0 & \text{if } \{f\} = 0, \end{cases} \qquad \text{and} \qquad a_i(X) = [T^i f], \quad i \ge 0.$$
The result is
$$f = [a_0(X); a_1(X), a_2(X), \dots] = a_0(X) + \cfrac{1}{a_1(X) + \cfrac{1}{a_2(X) + \cdots}}.$$
The convergents $\frac{p_n(X)}{q_n(X)} = [a_0(X); a_1(X), a_2(X), \dots, a_n(X)]$ then satisfy (8.6) and Lemma 8.20 (if $\mathbb{F} = \mathbb{Z}$). If X = p ∈ {2, 3, 4, ...} and $\mathbb{F} = \mathbb{Z}/p\mathbb{Z}$, then we have our standard continued fractions back. If X = β > 1 is not an integer, then we speak of the β-continued fractions.
Theorem 8.22. The convergents $\{\frac{p_n}{q_n}\}_{n \ge 0}$ and more generally the Farey convergents⁶
$$\left\{ \frac{a p_n + p_{n-1}}{a q_n + q_{n-1}} \right\}_{1 \le n,\ 0 \le a \le a_{n+1}}$$
are the best rational approximations of $x \in [0, 1]$ in the sense that there are no rationals of denominator $0 < q' < q$ between x and $\frac pq$. Moreover, for n odd, the Farey convergents are situated as in Figure 8.1, and for n even, the Farey convergents are situated as in the mirror image of Figure 8.1.

[Figure 8.1: from left to right, $\frac{p_{n-1}}{q_{n-1}}$, $\frac{p_n + p_{n-1}}{q_n + q_{n-1}}$, $\frac{2p_n + p_{n-1}}{2q_n + q_{n-1}} = \frac{p_{n+1}}{q_{n+1}}$, $\theta$, $\frac{p_{n+2}}{q_{n+2}}$, $\frac{p_{n+1} + p_n}{q_{n+1} + q_n}$, $\frac{p_n}{q_n}$.]

Figure 8.1. Farey convergents of x for some odd n and $a_{n+1} = a_{n+2} = 2$.

Proof. For n = 1 and $a \in \{1, \dots, a_2\}$ we have
$$\frac{a p_1 + p_0}{a q_1 + q_0} - \frac{p_1}{q_1} = \frac{(a p_1 + p_0) q_1 - p_1 (a q_1 + q_0)}{(a q_1 + q_0) q_1} = \frac{p_0 q_1 - p_1 q_0}{(a q_1 + q_0) q_1} = \frac{-1}{(a q_1 + q_0) q_1},$$
because $p_0 q_1 - p_1 q_0 = -1$. Assume by induction that $p_{n-1} q_n - p_n q_{n-1} = (-1)^n$; then for $a \in \{1, \dots, a_{n+1}\}$,
$$\frac{a p_n + p_{n-1}}{a q_n + q_{n-1}} - \frac{p_n}{q_n} = \frac{(a p_n + p_{n-1}) q_n - p_n (a q_n + q_{n-1})}{(a q_n + q_{n-1}) q_n} = \frac{p_{n-1} q_n - p_n q_{n-1}}{(a q_n + q_{n-1}) q_n} = \frac{(-1)^n}{(a q_n + q_{n-1}) q_n}. \tag{8.7}$$
In particular $(a p_n + p_{n-1}) q_n - p_n (a q_n + q_{n-1}) = (-1)^n$. For $a = a_{n+1}$, this gives $p_n q_{n+1} - p_{n+1} q_n = (-1)^{n+1}$ and we can continue the induction. This proves (8.7) for all $n \in \mathbb{N}$ and $0 \le a < a_{n+1}$.

Next, by the proof of Lemma 8.26 below, the fraction in $(\frac pq, \frac{p'}{q'})$ of smallest denominator is $\frac{p+p'}{q+q'}$. If we apply this to the interval between $\frac{p_{n-1}}{q_{n-1}}$ and $\frac{p_n}{q_n}$, which contains θ, we get as successive rational approximations of θ the Farey convergents
$$\frac{p_n + p_{n-1}}{q_n + q_{n-1}} < \frac{2p_n + p_{n-1}}{2q_n + q_{n-1}} < \cdots < \frac{a_{n+1} p_n + p_{n-1}}{a_{n+1} q_n + q_{n-1}} = \frac{p_{n+1}}{q_{n+1}} \le \theta.$$
6 Named after the British geologist John Farey (1766–1826), who had only a minor interest in mathematics, but his studies in number theory led to what are now called Farey fractions.

The next Farey convergent $\frac{p_{n+1} + p_n}{q_{n+1} + q_n} \in (\theta, \frac{p_n}{q_n})$ starts a new list of Farey convergents
$$\theta \le \frac{p_{n+2}}{q_{n+2}} = \frac{a_{n+2} p_{n+1} + p_n}{a_{n+2} q_{n+1} + q_n} < \cdots < \frac{2p_{n+1} + p_n}{2q_{n+1} + q_n} < \frac{p_{n+1} + p_n}{q_{n+1} + q_n} < \frac{p_n}{q_n},$$
as in Figure 8.1. This completes the proof. □

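The sequence of integers $(a_i)_{i \ge 1}$ in the continued fraction of $\theta_1$ is found by a simple algorithm:
$$a_i = \left\lfloor \frac{1}{\theta_i} \right\rfloor, \qquad \theta_{i+1} = G(\theta_i) := \frac{1}{\theta_i} - \left\lfloor \frac{1}{\theta_i} \right\rfloor. \tag{8.8}$$
The map G is called the Gauß map; see Figure 8.2 (right).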
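The algorithm (8.8) can be sketched as below (the function name is ours). It accepts an exact `Fraction`, for which the digits terminate, or a float; since $|G'| \ge 1$ amplifies rounding errors, only the first dozen or so digits of a floating-point input should be trusted.

```python
import math
from fractions import Fraction


def gauss_digits(theta, count):
    """First `count` digits a_1, a_2, ... of theta in (0, 1) via (8.8)."""
    digits = []
    for _ in range(count):
        if theta == 0:            # rational input: the expansion has ended
            break
        inverse = 1 / theta
        a = math.floor(inverse)
        digits.append(a)
        theta = inverse - a       # theta_{i+1} = G(theta_i)
    return digits


if __name__ == "__main__":
    print(gauss_digits(Fraction(3, 8), 10))          # [2, 1, 2]
    print(gauss_digits(math.sqrt(2) - 1, 12))        # twelve 2's
    print(gauss_digits((math.sqrt(5) - 1) / 2, 12))  # twelve 1's
```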


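[Figure 8.2: graphs of the Farey map (left) and the Gauß map (right) on [0, 1]; the horizontal axis of the right panel is marked at 1/5, 1/4, 1/3, 1/2, 1.]

Figure 8.2. The Farey map and the Gauß map.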

Theorem 8.23 (Gauß). The Gauß map $G(x) = \frac1x - \lfloor \frac1x \rfloor$ preserves the probability density $g(x) = \frac{1}{\log 2} \frac{1}{1+x}$.
Proof. Clearly $\int_0^1 g(x)\,dx = \frac{1}{\log 2}[\log(1+x)]_0^1 = 1$. The rest of the proof just consists of checking that g is indeed a fixed point of the transfer operator $\mathcal{L}_G f(x) = \sum_{G(y) = x} \frac{f(y)}{|G'(y)|}$. It is an intriguing question how Gauß came to guess the formula, because he left no indication of how he derived it; see Keane's account [350] and also [333]. □
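The fixed-point check left implicit above is a telescoping sum. The preimages of $x \in (0, 1)$ under G are $y = \frac{1}{k+x}$, $k \in \mathbb{N}$, with $|G'(y)| = y^{-2} = (k+x)^2$ and $g(\frac{1}{k+x}) = \frac{1}{\log 2}\,\frac{k+x}{k+x+1}$, so

```latex
\mathcal{L}_G g(x)
  = \sum_{k \ge 1} \frac{g\big(\frac{1}{k+x}\big)}{(k+x)^2}
  = \frac{1}{\log 2} \sum_{k \ge 1} \frac{1}{(k+x)(k+x+1)}
  = \frac{1}{\log 2} \sum_{k \ge 1} \Big( \frac{1}{k+x} - \frac{1}{k+x+1} \Big)
  = \frac{1}{\log 2}\,\frac{1}{1+x} = g(x).
```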
Proposition 8.24. Given a rotation $R_\alpha$, the first return map of $R_\alpha$ to $[0, \alpha)$, rescaled to unit length, is equal to the rotation $R_{1-G(\alpha)}$.

Proof. The first return map T to $[0, \alpha)$ is piecewise affine and invertible because $R_\alpha$ is. The first return time of 0 is $\lfloor \frac1\alpha \rfloor + 1$. Taking into account the rescaling of $[0, \alpha)$ to unit size, we obtain
$$T(0) = \frac{\left(\lfloor \frac1\alpha \rfloor + 1\right)\alpha - 1}{\alpha} = 1 - \left(\frac1\alpha - \left\lfloor \frac1\alpha \right\rfloor\right) = 1 - G(\alpha).$$
Hence $T = R_{1-G(\alpha)}$ as claimed. □
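Proposition 8.24 can be tested numerically by following orbits of $R_\alpha$ until they re-enter $[0, \alpha)$. In the sketch below (names ours, floating-point arithmetic, test values of α chosen arbitrarily) the rescaled first return of x is compared with $x/\alpha + 1 - G(\alpha) \bmod 1$, using the distance on the circle to absorb the wrap-around at 0.

```python
import math


def first_return(x, alpha):
    """First return of x in [0, alpha) to [0, alpha) under R_alpha.

    Returns the rescaled return point in [0, 1) and the return time.
    """
    y = (x + alpha) % 1.0
    steps = 1
    while y >= alpha:
        y = (y + alpha) % 1.0
        steps += 1
    return y / alpha, steps


def gauss(alpha):
    """The Gauss map G(alpha) = 1/alpha - floor(1/alpha)."""
    return 1 / alpha - math.floor(1 / alpha)


def circle_distance(a, b):
    d = abs(a - b) % 1.0
    return min(d, 1.0 - d)


if __name__ == "__main__":
    alpha = math.sqrt(2) - 1
    shift = 1 - gauss(alpha)
    for k in range(5):
        x = alpha * (k + 0.5) / 5
        y, steps = first_return(x, alpha)
        print(steps, y, (x / alpha + shift) % 1.0)
```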

More information on continued fractions and their role in ergodic theory

and dynamics can be found in e.g. [98, 175, 360].

8.2.1. Farey Arithmetic. The goal of Diophantine approximation (also called rational approximation) in dimension one is to approximate numbers $x \in \mathbb{R}$ as well as possible with fractions of denominators as small as possible. How rationals with smallest denominators are distributed over the unit interval can be understood in terms of Farey arithmetic.

Definition 8.25. Two rationals $\frac pq$ and $\frac{p'}{q'}$ (in lowest terms) are called Farey neighbors if there is no rational $\frac ab$ between them with $b \le \max\{q, q'\}$. If $\frac pq$ and $\frac{p'}{q'}$ are Farey neighbors, then
$$\frac pq \oplus \frac{p'}{q'} := \frac{p + p'}{q + q'}$$
is called their Farey sum or mediant.

[Figure 8.3: the Farey web on [0, 1], built row by row from 0/1 and 1/1 by Farey sums, with the vertical line at θ = 0.3580...]

Figure 8.3. The Farey web with convergents of θ = [0; 2, 1, 3, 1, ...] in boldface.

Sometimes $\frac pq \oplus \frac{p'}{q'}$ is called the Farey child of the Farey parents $\frac pq$ and $\frac{p'}{q'}$, but as this Farey child produces another child with either of its parents and then again with all of its Farey children, the rationals become a rather incestuous collection. The collection of all rationals (in [0, 1]) with lines connecting Farey parents with children is called the Farey web; see Figure 8.3.

In order to find the continued fraction of some $\theta \in (\frac pq, \frac{p'}{q'})$ using this web, perform the Euclidean algorithm as in the proof of Theorem 8.22. That is, to find the Farey convergents of θ, starting with $\frac{p_0}{q_0} = \frac01$ and $\frac{p_1}{q_1} = \frac{1}{a_1}$, take Farey sums “towards θ”, i.e. with the Farey parent on the other side of θ. Just before you cross the vertical line at θ, we have a “true” convergent; see Figure 8.3.
Every next Farey convergent that we find in this algorithm is the Farey sum of the previous Farey convergent and “true” convergent, and it is neighbor to both of them. At some point it is the turn of $\frac pq$ and $\frac{p'}{q'}$; in fact, since $\theta \in (\frac pq, \frac{p'}{q'})$, at least one of them is a “true” convergent of θ.
Clearly $0 = \frac01$ and $1 = \frac11$ are Farey neighbors, and their Farey sum is $\frac01 \oplus \frac11 = \frac12$. We see that $\frac12$ is Farey neighbor to both $\frac01$ and $\frac11$. This is no coincidence:

Lemma 8.26. Two rationals $0 \le \frac pq < \frac{p'}{q'} \le 1$ are Farey neighbors if and only if $p'q - pq' = 1$. In this case they are both neighbors with $\frac pq \oplus \frac{p'}{q'}$ as well.

Proof. Clearly $\frac pq = \frac01$ and $\frac{p'}{q'} = \frac11$ are Farey neighbors and $p'q - pq' = 1 - 0 = 1$. We continue by induction, assuming that $\frac pq < \frac{p'}{q'}$ are Farey neighbors and $p'q - pq' = 1$. Clearly $\frac pq < \frac pq \oplus \frac{p'}{q'} = \frac{p+p'}{q+q'} < \frac{p'}{q'}$ and both
$$(p + p')q - p(q + q') = p'q - pq' = 1 \quad \text{and} \quad p'(q + q') - (p + p')q' = p'q - pq' = 1.$$
Note also that if $\frac pq < \frac ab < \frac{p'}{q'}$ and $aq - pb = p'b - aq' = 1$, then $a(q + q') = b(p + p')$ so $\frac ab = \frac{p+p'}{q+q'}$, but that doesn't contribute to the proof.
Instead, it remains to check that $\frac pq \oplus \frac{p'}{q'}$ is the fraction with smallest denominator between $\frac pq$ and $\frac{p'}{q'}$. Take any fraction $\frac ab \in (\frac pq, \frac{p'}{q'})$. Then there is some $\eta \in (0, 1)$ such that
$$0 < \frac ab - \frac pq = \eta\left(\frac{p'}{q'} - \frac pq\right) = \frac{\eta}{qq'} \quad \text{and} \quad 0 < \frac{p'}{q'} - \frac ab = (1 - \eta)\left(\frac{p'}{q'} - \frac pq\right) = \frac{1 - \eta}{qq'}.$$
Multiply out the denominators:
$$0 < q'(aq - pb) = \eta b \quad \text{and} \quad 0 < q(p'b - aq') = (1 - \eta) b.$$
Thus $aq - pb$ and $p'b - aq'$ are positive integers, and therefore $q + q' \le q'(aq - pb) + q(p'b - aq') = b$. This means that $\frac{p+p'}{q+q'}$ has indeed the smallest denominator of all fractions in $(\frac pq, \frac{p'}{q'})$. □
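Lemma 8.26 and the minimality of the mediant can be confirmed by brute force on small denominators. The sketch below (names ours) walks through consecutive pairs of the Farey sequence of order 8, i.e. all fractions in [0, 1] with denominator at most 8, which are Farey neighbors, and compares the determinant criterion and the mediant with an exhaustive search.

```python
from fractions import Fraction


def are_farey_neighbors(x, y):
    """Criterion of Lemma 8.26 for fractions x < y (Fraction is in lowest terms)."""
    return y.numerator * x.denominator - x.numerator * y.denominator == 1


def mediant(x, y):
    return Fraction(x.numerator + y.numerator, x.denominator + y.denominator)


def smallest_denominator_between(lo, hi):
    """Exhaustive search for a fraction in (lo, hi) with least denominator."""
    b = 1
    while True:
        for a in range(b + 1):
            if lo < Fraction(a, b) < hi:
                return Fraction(a, b)
        b += 1


def farey_sequence(order):
    return sorted({Fraction(a, b) for b in range(1, order + 1) for a in range(b + 1)})


if __name__ == "__main__":
    seq = farey_sequence(8)
    for x, y in zip(seq, seq[1:]):
        print(x, y, are_farey_neighbors(x, y),
              mediant(x, y) == smallest_denominator_between(x, y))
```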

Exercise 8.27. Consider two circles of radii $\frac12$ and centers $(0, \frac12)$ and $(1, \frac12)$ in the plane. They are tangent to each other and “perched” on the horizontal axis ℓ, with base-points 0 and 1. Inscribe the maximal circle between these two and ℓ; it clearly touches ℓ in base-point $\frac12$. Continue inscribing new circles in between ℓ and neighboring circles; see Figure 8.4. These circles are called Ford circles after the American mathematician Lester Ford (1886–1967). Show that if the base-points of the neighboring circles are $\frac pq$ and $\frac{p'}{q'}$ (in lowest terms), then the base-point of the new circle is $\frac pq \oplus \frac{p'}{q'}$. Show that the diameter of a circle with base-point $\frac pq$ is $\frac{1}{q^2}$. See [98, Chapter 9] for more details on Ford circles.

[Figure 8.4: Ford circles standing on the horizontal axis with base-points 0/1, 1/3, 1/2, 2/3, 1/1.]

Figure 8.4. Inscribed Ford circles with rational base-points.

Exercise 8.28. Show that if you join Farey neighbors in R by semi-circles

(hyperbolic geodesics) in the upper half-plane, then none of these semi-circles
intersect except possibly at their end-points.
Example 8.29. Besides the Farey tree, there are other tree structures to enumerate the rationals. For instance, the Kepler tree (first appearing in Harmonices mundi, Book III by Johannes Kepler [355]) and the Calkin-Wilf tree⁷ are both rooted binary trees, with roots $\frac12$ and $\frac11$, respectively, where
$$\frac pq \text{ has descendants } \begin{cases} \frac{p}{p+q} \text{ and } \frac{q}{p+q} & \text{for the Kepler tree}, \\ \frac{p}{p+q} \text{ and } \frac{p+q}{q} & \text{for the Calkin-Wilf tree}; \end{cases} \tag{8.9}$$
see Figure 8.5. The Kepler tree contains all fractions $\frac pq \in (0, 1)$ and the Calkin-Wilf tree contains all fractions $\frac pq \in (0, \infty)$, and all the indicated fractions appear exactly once. For both trees, going up along branches in the tree mimics the Euclidean algorithm in the sense that if the descendant is $\frac ab$, then the parent has denominator and numerator equal to $\min\{a, b\}$ and $|b - a|$ or vice versa.
There is a single function $f$ of the positive rationals, called the Calkin-Wilf function, defined as
$$f(x) = \frac{1}{2\lfloor x \rfloor - x + 1},$$
such that the $f$-orbit of 1 denumerates all rationals in the Calkin-Wilf tree row by row.

[Figure 8.5. The Kepler tree and the Calkin-Wilf tree. First four rows of the Kepler tree: $\frac12$; $\frac13, \frac23$; $\frac14, \frac34, \frac25, \frac35$; $\frac15, \frac45, \frac37, \frac47, \frac27, \frac57, \frac38, \frac58$. First four rows of the Calkin-Wilf tree: $\frac11$; $\frac12, \frac21$; $\frac13, \frac32, \frac23, \frac31$; $\frac14, \frac43, \frac35, \frac52, \frac25, \frac53, \frac34, \frac41$.]

$^{7}$ The tree was introduced earlier by Jean Berstel and Aldo de Luca [73] as a Raney tree, since they drew some ideas from a paper by George Raney [469].
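The claim about the Calkin-Wilf function can be checked with exact rational arithmetic. The Python sketch below (function names are ours) iterates $f$ from 1 and compares the result with a breadth-first listing of the tree generated by the descendant rule (8.9); the same listing routine also produces the rows of the Kepler tree.

```python
import math
from fractions import Fraction

def calkin_wilf_next(x: Fraction) -> Fraction:
    """The Calkin-Wilf function f(x) = 1 / (2*floor(x) - x + 1)."""
    return 1 / (2 * math.floor(x) - x + 1)

def orbit_of_one(length: int) -> list:
    """The first `length` points of the f-orbit of 1."""
    xs = [Fraction(1)]
    while len(xs) < length:
        xs.append(calkin_wilf_next(xs[-1]))
    return xs

def tree_rows(root: Fraction, children, depth: int) -> list:
    """Breadth-first listing of the first `depth` rows of a binary tree."""
    row, out = [root], []
    for _ in range(depth):
        out.extend(row)
        row = [c for x in row for c in children(x)]
    return out

def calkin_wilf_children(x: Fraction):
    p, q = x.numerator, x.denominator
    return Fraction(p, p + q), Fraction(p + q, q)

def kepler_children(x: Fraction):
    p, q = x.numerator, x.denominator
    return Fraction(p, p + q), Fraction(q, p + q)

# the f-orbit of 1 lists the Calkin-Wilf tree row by row
assert orbit_of_one(2**8 - 1) == tree_rows(Fraction(1), calkin_wilf_children, 8)
```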
Exercise 8.30. Express the rules (8.9) of the Kepler tree and the Calkin-
Wilf tree in terms of continued fractions. Find a function, similar to the
Calkin-Wilf function, that denumerates the rationals in (0, 1) row by row in
the Kepler tree.

8.2.2. Closest Returns for the Circle Rotation. Let $\theta \in [0,1] \setminus \mathbb Q$, and consider the integers $q$ such that $R_\theta^q(0)$ is closer to $0 \in S^1$ than $R_\theta^i(0)$ for all $0 < i < q$. This means that $q\theta$ is closer to some integer $p$ than ever before, so $\frac pq$ is closer to $\theta$ than $\frac ab$ is for each integer $1 \le b < q$. Theorem 8.22 tells us that this happens if $\frac pq$ are Farey convergents.

Proposition 8.31. Let $\theta = [0; a_1, a_2, a_3, \dots] \in (0,1)$ be irrational, and consider the circle rotation $R_\theta$. Then the closest return times of 0 to itself are of the form 1 and $q_{n-1} + aq_n$ for $n \in \mathbb N$ and $0 \le a < a_{n+1}$.

Proof. Since $q_1 = \lfloor 1/\theta \rfloor < 1/\theta$, the intervals
$$[0,\theta),\ [\theta, 2\theta),\ \dots,\ [(q_1-1)\theta, q_1\theta), \text{ and } [q_1\theta, 1) \tag{8.10}$$
tile the circle, and $q_0\theta$ (recall $q_0 = 1$) and $q_1\theta$ are closest returns of 0 to itself, on the right and left, respectively (these points are the same if $q_1 = a_1 = 1$). Because $0 < (q_1+1)\theta \bmod 1 = \theta - |1 - q_1\theta|$, also $(1+q_1)\theta \bmod 1$ is a closest return to the right. Since $R_\theta$ is an isometry and due to Theorem 8.22, the next closest returns at the right are $(1+2q_1)\theta \bmod 1$, $(1+3q_1)\theta \bmod 1, \dots$ until $(1+a_2q_1)\theta \bmod 1 = q_2\theta \bmod 1$; see Figure 8.6.
We then overshoot (giving a next closest return on the left) with $(q_1+q_2)\theta \bmod 1$, and continuing this way, the next closest returns on the left are $(q_1+2q_2)\theta \bmod 1$, $(q_1+3q_2)\theta \bmod 1, \dots$ until $(q_1+a_3q_2)\theta \bmod 1 = q_3\theta \bmod 1$. In this way, we obtain all the closest returns at times indicated in the proposition. ∎

[Figure 8.6. The closest returns of 0 to itself for rotation $R_\theta$ (drawn with $a_2 = 3$); labelled points: $q_1\theta$, $0$, $\theta$, $(1+q_1)\theta$, $(1+2q_1)\theta$, $q_2\theta = (1+3q_1)\theta$, $(q_1+q_2)\theta$, $(q_1+2q_2)\theta$, $(q_1+3q_2)\theta$.]
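Proposition 8.31 lends itself to a numerical check. In the Python sketch below (helper names and the choice $\theta = \sqrt2 - 1 = [0; 2, 2, 2, \dots]$ are ours), a time $t$ is recorded when $\{t\theta\}$ is closer to 0 from the right, or from the left, than all earlier orbit points, which is the one-sided notion of closest return used in the proof. These times are compared with 1 and $q_{n-1} + aq_n$, $0 \le a < a_{n+1}$. Floating-point arithmetic is adequate for the modest range of times used here.

```python
import math

def one_sided_closest_returns(theta: float, t_max: int) -> list:
    """Times t <= t_max at which {t*theta} is closer to 0 than all earlier
    orbit points, either from the right (small x) or from the left (x near 1)."""
    best_right = best_left = 1.0
    times = []
    for t in range(1, t_max + 1):
        x = (t * theta) % 1.0
        if x < best_right or 1.0 - x < best_left:
            times.append(t)
        best_right = min(best_right, x)
        best_left = min(best_left, 1.0 - x)
    return times

def predicted_returns(partial_quotients: list, t_max: int) -> list:
    """1 and q_{n-1} + a*q_n for 0 <= a < a_{n+1}, where q_0 = 1, q_1 = a_1
    and partial_quotients = [a_1, a_2, ...]."""
    q_prev, q = 1, partial_quotients[0]
    times = {1}
    for a_next in partial_quotients[1:]:
        times.update(q_prev + a * q for a in range(a_next))
        q_prev, q = q, a_next * q + q_prev
    return sorted(t for t in times if t <= t_max)

theta = math.sqrt(2) - 1      # partial quotients 2, 2, 2, ...
assert one_sided_closest_returns(theta, 1000) == predicted_returns([2] * 20, 1000)
```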

8.3. Uniformly Distributed Sequences

Definition 8.32. A sequence $(x_n)_{n\in\mathbb N}$ in the circle $S^1$ is called uniformly distributed if
$$\lim_{n\to\infty} \frac1n \#\{1 \le j \le n : x_j \in J\} = |J|$$
for every subinterval $J \subset S^1$. We call $(x_n)_{n\in\mathbb N}$ well-distributed if
$$\lim_{n\to\infty} \frac1n \#\{m+1 \le j \le m+n : x_j \in J\} = |J|$$
uniformly in $m$ for every subinterval $J \subset S^1$.

This makes the fractional parts $x_n := \{n\alpha\}$ for irrational $\alpha$ a well-distributed sequence [557], because the rotation $R_\alpha$ preserves Lebesgue measure and is in fact uniquely ergodic (so the uniform convergence in the above definition follows from Oxtoby's Theorem 6.20). Another classical example of a well-distributed sequence is $x_n = \{\lambda^n\}$ for a.e. $\lambda > 1$ (but not Pisot); see [302]. More recent examples emerge from the $\alpha$-Kakutani equidistribution$^{8}$ [8, 339] and variations thereof, e.g. [11, 141–143]. A classical reference on uniformly distributed sequences is the book by Kuipers & Niederreiter [377]; more recent is the monograph by Drmota & Tichy [218].
The following lemma shows that uniformly distributed sequences can be used for Monte Carlo method estimates of integrals:
Lemma 8.33. A sequence $(x_n)_{n\ge1}$ is uniformly distributed in $S^1$ if and only if for every continuous function $f : S^1 \to \mathbb C$,
$$\frac1N \sum_{n=1}^N f(x_n) \to \int_{S^1} f(x)\,dx \quad\text{as } N \to \infty. \tag{8.11}$$

$^{8}$ Starting from the trivial partition $\mathcal P_0$ of the circle, $\mathcal P_n$ is obtained from $\mathcal P_{n-1}$ by dividing the largest intervals of $\mathcal P_{n-1}$ into two subintervals of relative lengths $\alpha$ and $1-\alpha$.

The analogous statement holds for well-distributed sequences. In other words, Birkhoff's Ergodic Theorem 6.13 w.r.t. Lebesgue measure applies to uniformly distributed sequences, and conversely, typical orbits of Lebesgue measure-preserving ergodic circle maps are uniformly distributed (and every orbit is well-distributed if the map is continuous and uniquely ergodic).

Sketch of Proof. For the "if" part, take an interval $[a,b] \subset S^1$; uniform distribution is then equivalent to (8.11) applied to the indicator function $1_{[a,b]}$. Next approximate $1_{[a,b]}$ from above and from below by continuous functions whose integrals differ from $b-a$ by at most $\varepsilon$.
For the "only if" part split $f$ into its real and imaginary part (which are both Riemann integrable), and approximate $\operatorname{Re} f$ and $\operatorname{Im} f$ by step functions. ∎
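Theorem 8.34 (Weyl's Criterion). A sequence $(x_n)_{n\in\mathbb N}$ is uniformly distributed if and only if
$$\lim_{N\to\infty} \frac1N \sum_{n=1}^N e^{2\pi i j x_n} = 0 \quad\text{for every integer } j \ne 0. \tag{8.12}$$

Proof. Since $x \mapsto e^{2\pi i j x}$ is continuous with integral 0, the "only if" part follows immediately from Lemma 8.33.
For the "if" part use Lemma 8.33 again (it suffices to consider $f$ with integral 0, since constants satisfy (8.11) trivially), and approximate an arbitrary continuous function $f : S^1 \to \mathbb C$ with $\int_{S^1} f(x)\,dx = 0$ uniformly by trigonometric polynomials $\sum_{j=-K}^{K} c_j e^{2\pi i j x}$, for instance by Fejér sums of its Fourier series. Here the constant coefficient is $c_0 = \hat f_0 = 0$ because $\int_{S^1} f(x)\,dx = 0$. Now (8.12) implies that $\lim_{N\to\infty} \frac1N \sum_{n=1}^N f(x_n) = 0$, so $(x_n)_{n\in\mathbb N}$ is uniformly distributed. ∎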
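Weyl's Criterion is easy to probe numerically. The Python sketch below (our own helper; the frequency is called `h` to avoid a clash with Python's imaginary unit `1j`) evaluates the normalized exponential sums of (8.12) for $x_n = \{n\sqrt2\}$ and for $x_n = \{n/3\}$. In the first case the sums are small for all tested $h \ne 0$, whereas for $\alpha = \frac13$ the sum with $h = 3$ equals 1, reflecting that $(\{n/3\})$ is not uniformly distributed.

```python
import cmath
import math

def weyl_average(xs, h):
    """(1/N) * sum_{n=1}^N exp(2*pi*i*h*x_n); h plays the role of j in (8.12)."""
    return sum(cmath.exp(2j * math.pi * h * x) for x in xs) / len(xs)

N = 10_000
irrational = [(n * math.sqrt(2)) % 1.0 for n in range(1, N + 1)]
rational = [(n / 3) % 1.0 for n in range(1, N + 1)]
for h in range(1, 6):
    print(h, abs(weyl_average(irrational, h)), abs(weyl_average(rational, h)))
```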
Theorem 8.35 (Van der Corput's Difference Theorem). Let $(x_n)_{n\in\mathbb N}$ be a sequence in $S^1$. If $(x_{n+k} - x_n)_{n\in\mathbb N}$ is uniformly distributed for every integer $k \ge 1$, then $(x_n)_{n\ge1}$ is uniformly distributed.

Proof. Let $(u_n)_{n\ge1} \subset \mathbb C$. We claim that for all $K \le N \in \mathbb N$,
$$K^2 \Big| \sum_{n=1}^N u_n \Big|^2 \le (N+K-1)K \sum_{n=1}^N |u_n|^2 + 2(N+K-1) \sum_{k=1}^{K-1} (K-k) \operatorname{Re} \sum_{n=1}^{N-k} u_n \bar u_{n+k}. \tag{8.13}$$
Fix an integer $j \ne 0$ and apply this to $u_n = e^{2\pi i j x_n}$, so $|u_n| = 1$. Divide (8.13) by $K^2N^2$. Then
$$\Big| \frac1N \sum_{n=1}^N e^{2\pi i j x_n} \Big|^2 \le \frac{N+K-1}{KN} + 2 \sum_{k=1}^{K-1} \frac{(N+K-1)(K-k)(N-k)}{K^2N^2} \times \Big| \frac{1}{N-k} \sum_{n=1}^{N-k} e^{2\pi i j (x_n - x_{n+k})} \Big|.$$
As $N \to \infty$, the second factor $|\cdot|$ tends to zero for each $k$ by Weyl's Criterion, and the first term tends to $\frac1K$. Since $K$ is arbitrary, $\frac1N \sum_{n=1}^N e^{2\pi i j x_n} \to 0$, so again by Weyl's Criterion, $(x_n)_{n\in\mathbb N}$ is uniformly distributed.

It remains to prove the claim (8.13). Set $u_n = 0$ for $n \le 0$ and $n > N$. As can be seen from Figure 8.7,
$$K \sum_{n=1}^N u_n = \sum_{n=1}^{N+K-1} \sum_{k=0}^{K-1} u_{n-k}. \tag{8.14}$$

[Figure 8.7. Diagram explaining (8.14): an array with rows $k = 0, 1, \dots, K-1$ and columns $n = 0, 1, 2, \dots, N, \dots, N+K-1$; the entries $u_{n-k}$ vanish outside the band $1 \le n-k \le N$.]

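The Cauchy-Schwarz inequality gives
$$\begin{aligned} K^2 \Big|\sum_{n=1}^N u_n\Big|^2 &= \Big| \sum_{n=1}^{N+K-1} 1 \cdot \sum_{k=0}^{K-1} u_{n-k} \Big|^2 \le (N+K-1) \sum_{n=1}^{N+K-1} \Big| \sum_{k=0}^{K-1} u_{n-k} \Big|^2 \\ &= (N+K-1) \sum_{n=1}^{N+K-1} \sum_{k,\ell=0}^{K-1} u_{n-k} \bar u_{n-\ell} \\ &= (N+K-1) \sum_{n=1}^{N+K-1} \Big( \sum_{k=0}^{K-1} |u_{n-k}|^2 + 2 \operatorname{Re} \sum_{0 \le k < \ell \le K-1} u_{n-k} \bar u_{n-\ell} \Big). \end{aligned}$$
The first sum is equal to $(N+K-1)K \sum_{n=1}^N |u_n|^2$ by the same counting as in (8.14). For the second sum, substitute $m = n - \ell$ and $d = \ell - k$: since there are exactly $K - d$ pairs $0 \le k < \ell \le K-1$ with difference $\ell - k = d$, it is equal to $2(N+K-1) \sum_{d=1}^{K-1} (K-d) \operatorname{Re} \sum_{m=1}^{N-d} u_m \bar u_{m+d}$ (note that $\operatorname{Re}(u_{m+d}\bar u_m) = \operatorname{Re}(u_m \bar u_{m+d})$). Renaming $d$ as $k$ and $m$ as $n$ and combining everything, we obtain the claim (8.13), ending the proof. ∎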
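Since (8.13) holds for arbitrary complex numbers, it can be sanity-checked on random data. The Python sketch below (the helper name is ours) computes both sides of (8.13) for given $u_1, \dots, u_N$ and $1 \le K \le N$; by the proof, the right-hand side equals $(N+K-1)\sum_n |\sum_k u_{n-k}|^2$, so it is never smaller than the left-hand side.

```python
import random

def van_der_corput_sides(u, K):
    """Return (lhs, rhs) of (8.13) for the list u = [u_1, ..., u_N] and 1 <= K <= N."""
    N = len(u)
    lhs = K**2 * abs(sum(u)) ** 2
    rhs = (N + K - 1) * K * sum(abs(z) ** 2 for z in u)
    for k in range(1, K):
        # Re sum_{n=1}^{N-k} u_n * conj(u_{n+k}); Python lists are 0-based
        corr = sum(u[n] * u[n + k].conjugate() for n in range(N - k))
        rhs += 2 * (N + K - 1) * (K - k) * corr.real
    return lhs, rhs

random.seed(1)
u = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50)]
for K in range(1, 51):
    lhs, rhs = van_der_corput_sides(u, K)
    assert lhs <= rhs + 1e-9 * max(1.0, rhs)   # tolerance for rounding only
```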
Some direct consequences of the Van der Corput Difference Theorem are the following.

Corollary 8.36. If $\lim_n (x_{n+k} - x_n) = \alpha \notin \mathbb Q$ for a fixed $k \in \mathbb N$, then $(x_n)_{n\in\mathbb N}$ is uniformly distributed.

Corollary 8.37. If $p(x) = \sum_{j=0}^d p_jx^j$ is a real polynomial with at least one irrational coefficient $p_j$, $j \ge 1$, then $(p(n))_{n\in\mathbb N}$ is uniformly distributed.

Example 8.38. In some cases, there is a dynamical alternative to Van der Corput's Difference Theorem. Define the skew product
$$f : \mathbb T^2 \to \mathbb T^2, \quad (x, y) \mapsto (x + \alpha,\ y + 2x + \alpha).$$
A short computation shows that $f^n(0,0) = (\alpha n, \alpha n^2)$. Now it follows from the unique ergodicity of $f$ for irrational $\alpha$, see Proposition 6.26, that $(\{\alpha n^2\})$ is well-distributed. See [72] for more examples along this line.
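The "short computation" can be confirmed by machine. The Python sketch below uses exact rational arithmetic, with coordinates reduced modulo 1, and checks $f^n(0,0) = (n\alpha, n^2\alpha) \bmod 1$. The identity is algebraic and holds for every $\alpha$, so a rational $\alpha$ (our choice) is used purely to avoid rounding; unique ergodicity, of course, requires $\alpha$ irrational.

```python
from fractions import Fraction

def skew_product(x, y, alpha):
    """f(x, y) = (x + alpha, y + 2x + alpha) on the torus R^2 / Z^2."""
    return (x + alpha) % 1, (y + 2 * x + alpha) % 1

alpha = Fraction(7, 19)          # any value works for the identity
x, y = Fraction(0), Fraction(0)
for n in range(1, 200):
    x, y = skew_product(x, y, alpha)
    # f^n(0, 0) = (n*alpha, n^2*alpha) modulo 1
    assert (x, y) == ((n * alpha) % 1, (n * n * alpha) % 1)
```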

Exercise 8.39. Use Example 8.38 to show that $(\{\alpha n^p\})_{n\in\mathbb N}$ is well-distributed for each $p \in \mathbb N$. Conclude that $(\{\alpha p(n)\})$ is well-distributed for every non-constant polynomial $p$ with rational coefficients.

8.3.1. Discrepancy. Discrepancy measures the amount that averages over finite sequences deviate from limit frequencies.

Definition 8.40. The discrepancy of a sequence $(x_n)_{n\in\mathbb N} \subset S^1$ is defined as
$$D_N = \sup_{0 \le a \le b \le 1} \Big| \frac1N \#\{1 \le n \le N : x_n \in [a,b)\} - (b-a) \Big|.$$
Analogously (meant for well-distributed sequences)
$$D^*_N = \sup_k \sup_{0 \le a \le b \le 1} \Big| \frac1N \#\{k+1 \le n \le k+N : x_n \in [a,b)\} - (b-a) \Big|.$$

Clearly $\frac1N \le D_N \le D^*_N \le 1$ for all $N \in \mathbb N$, and $(x_n)_{n\in\mathbb N}$ is uniformly distributed (well-distributed) if and only if $D_N \to 0$ ($D^*_N \to 0$) as $N \to \infty$. Given that $(x_n)_{n\ge0} = (\{n\alpha\})_{n\ge0}$ is the simplest well-distributed sequence, it is natural to ask about the discrepancies for this sequence. This is an old question, going back to Ostrowski [439] and Hecke [307], who proved that if $b - a = \{j\alpha\}$ for some $j \in \mathbb Z$, then the discrepancy is "bounded":
$$\sup_N N D^*_N([a,b]) := \sup_N \sup_k \big| \#\{k+1 \le n \le k+N : x_n \in [a,b)\} - N(b-a) \big|$$
is finite. Indeed, $\{(n-j)\alpha - a\} = \{n\alpha - a\} - \{j\alpha\} + 1_{[a,b)}(\{n\alpha\})$, and hence
$$\Big| \sum_{n=0}^{N-1} \big( 1_{[a,b)}(\{n\alpha\}) - (b-a) \big) \Big| = \Big| \sum_{n=0}^{N-1} \big( \{(n-j)\alpha - a\} - \{n\alpha - a\} \big) \Big| = \Big| \sum_{n=-j}^{-1} \{n\alpha - a\} - \sum_{n=N-j}^{N-1} \{n\alpha - a\} \Big| \le j$$
uniformly in $N$ (for $j \ge 1$; negative $j$ are handled similarly). The reverse direction, namely that this boundedness only occurs for intervals of these lengths, was shown by Kesten [357]. See the next section for more results on the discrepancy of the sequence $(\{n\alpha\})_{n\ge0}$.
Uniformly distributed sequences are useful to approximate integrals by sums, not unlike Riemann sums. A central result is the Koksma inequality, which applies to integrands $f : [0,1] \to \mathbb R$ of bounded variation, i.e. with finite total variation
$$\operatorname{Var}(f) = \sup_{0 \le x_0 < x_1 < \cdots < x_n \le 1} \sum_{i=1}^n |f(x_i) - f(x_{i-1})|.$$

Theorem 8.41. Let $f : S^1 \to \mathbb R$ have bounded variation. Then
$$\Big| \frac1N \sum_{j=1}^N f(x_j) - \int_{S^1} f(x)\,dx \Big| \le \operatorname{Var}(f)\, D^*_N. \tag{8.15}$$

The proof below is reworked from [377, Theorem 5.1].

Proof. Without loss of generality, we can reorder the points $x_n$ such that $0 =: x_0 \le x_1 \le \cdots \le x_N \le x_{N+1} := 1$. Integration by parts and telescoping series give
$$\begin{aligned}
\sum_{n=0}^{N} \int_{x_n}^{x_{n+1}} \Big(t - \frac nN\Big)\,df(t) &= \int_0^1 t\,df(t) - \sum_{n=0}^{N} \frac nN \big(f(x_{n+1}) - f(x_n)\big) \\
&= [tf(t)]_0^1 - \int_0^1 f(t)\,dt - \sum_{n=0}^{N} \Big( \frac{n+1}{N} f(x_{n+1}) - \frac nN f(x_n) \Big) + \frac1N \sum_{n=0}^{N} f(x_{n+1}) \\
&= f(1) - \int_0^1 f(t)\,dt - \frac{N+1}{N} f(x_{N+1}) + \frac1N \sum_{n=1}^{N+1} f(x_n) \\
&= \frac1N \sum_{n=1}^{N} f(x_n) - \int_0^1 f(t)\,dt.
\end{aligned} \tag{8.16}$$
We used the Stieltjes integral with $df(t)$ instead of our usual notation $f'(t)\,dt$, because $f$ need not be differentiable to carry out this step. Also $\int_{x_n}^{x_{n+1}} |df| = \operatorname{Var}(f|_{[x_n,x_{n+1}]})$. Note that
$$\begin{aligned} D^*_N &\ge \max_{n=0,\dots,N}\ \sup_{x_n < a \le x_{n+1}} \Big| \frac1N \#\{1 \le i \le N : x_i \in [0,a)\} - a \Big| \\ &= \max_{n=0,\dots,N}\ \sup_{x_n < a \le x_{n+1}} \Big| \frac nN - a \Big| = \max_{n=0,\dots,N} \max\Big\{ \Big| \frac nN - x_n \Big|, \Big| \frac nN - x_{n+1} \Big| \Big\}. \end{aligned}$$
From (8.16), we get
$$\begin{aligned} \Big| \int_0^1 f(t)\,dt - \frac1N \sum_{n=1}^N f(x_n) \Big| &= \Big| \sum_{n=0}^{N} \int_{x_n}^{x_{n+1}} \Big(t - \frac nN\Big)\,df(t) \Big| \\ &\le \sum_{n=0}^{N} \max\Big\{ \Big| x_n - \frac nN \Big|, \Big| x_{n+1} - \frac nN \Big| \Big\} \int_{x_n}^{x_{n+1}} |df| \\ &\le D^*_N \sum_{n=0}^{N} \operatorname{Var}(f|_{[x_n,x_{n+1}]}) = D^*_N \operatorname{Var}(f), \end{aligned}$$
as required. ∎
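A minimal numerical illustration of (8.15) follows, as a Python sketch under our own choices: the points are $x_n = \{n(\sqrt2 - 1)\}$, $n \le 1000$, and the test functions are listed with their exact integrals and variations on $[0,1]$. It computes the anchored discrepancy $\sup_a |\frac1N\#\{x_i \in [0,a)\} - a|$ that appears in the proof, using the classical sorted-sample formula $\frac{1}{2N} + \max_i |x_{(i)} - \frac{2i-1}{2N}|$ from [377]. This quantity is at most $D_N \le D^*_N$, so checking Koksma's bound with it checks a slightly stronger inequality.

```python
import math

def anchored_discrepancy(xs):
    """sup_a |(1/N) #{x_i in [0, a)} - a| for points in [0, 1), via the formula
    1/(2N) + max_i |x_(i) - (2i - 1)/(2N)| over the sorted sample."""
    pts = sorted(xs)
    N = len(pts)
    return 1 / (2 * N) + max(abs(x - (2 * i - 1) / (2 * N)) for i, x in enumerate(pts, start=1))

def integration_error(f, integral, xs):
    """|(1/N) sum_j f(x_j) - int_0^1 f(x) dx|."""
    return abs(sum(f(x) for x in xs) / len(xs) - integral)

alpha = math.sqrt(2) - 1
xs = [(n * alpha) % 1.0 for n in range(1, 1001)]
D = anchored_discrepancy(xs)
# (f, exact integral over [0, 1], total variation on [0, 1])
tests = [(lambda t: t, 0.5, 1.0),
         (lambda t: t * t, 1 / 3, 1.0),
         (lambda t: math.sin(2 * math.pi * t), 0.0, 4.0)]
for f, integral, var in tests:
    assert integration_error(f, integral, xs) <= var * D
```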

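The special case that $x_n = \{\alpha n\}$ for irrational $\alpha$ is called the Denjoy-Koksma inequality. It applies to specific values of $N$.
Theorem 8.42. Let $f : S^1 \to \mathbb R$ have bounded variation and let $p/q$ be a convergent of the irrational number $\alpha$. Then
$$\Big| \frac1q \sum_{j=1}^{q} f(\alpha j) - \int_{S^1} f(x)\,dx \Big| \le \frac1q \operatorname{Var}(f). \tag{8.17}$$

Proof. Since $p/q$ is a convergent of $\alpha$, we have $|\alpha - p/q| \le 1/q^2$, say $0 \le q\alpha - p \le 1/q$. For each $i = 1, \dots, q$, the interval $I_i := [\frac{i-1}{q}, \frac iq)$ contains exactly one point $x_i \in \{\{\alpha j\}\}_{j=0}^{q-1}$; see Section 8.2.2. (Replacing $j = 0$ by $j = q$ changes nothing here, since $\{q\alpha\} = q\alpha - p$ also lies in $I_1$; this gives the sum over $j = 1, \dots, q$ in (8.17).) Therefore
$$\begin{aligned} \Big| \frac1q \sum_{j=0}^{q-1} f(j\alpha) - \int_{S^1} f(x)\,dx \Big| &\le \sum_{i=1}^q \Big| \int_{I_i} \big( f(x_i) - f(x) \big)\,dx \Big| \le \sum_{i=1}^q |I_i| \sup_{x \in I_i} |f(x_i) - f(x)| \\ &\le \frac1q \sum_{i=1}^q \operatorname{Var}(f|_{I_i}) \le \frac1q \operatorname{Var}(f). \end{aligned}$$
This finishes the proof. ∎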
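The proof can be traced numerically. For $\alpha = \sqrt2 - 1$, whose convergent denominators are $q = 1, 2, 5, 12, 29, 70, \dots$ (our choice of example), the Python sketch below checks that $\{j\alpha\}$, $j = 1, \dots, q$, puts exactly one point in each interval $I_i$, and it verifies the bound (8.17) for $f(x) = x$. This $f$ is viewed on $[0,1)$, where its variation over the partition $\{I_i\}$ is 1; on $S^1$ its jump at 0 would only add to $\operatorname{Var}(f)$.

```python
import math

def one_point_per_interval(alpha: float, q: int) -> bool:
    """Does each I_i = [(i-1)/q, i/q) contain exactly one of {j*alpha}, j = 1..q?"""
    cells = sorted(math.floor(q * ((j * alpha) % 1.0)) for j in range(1, q + 1))
    return cells == list(range(q))

def birkhoff_error(f, integral: float, alpha: float, q: int) -> float:
    """|(1/q) sum_{j=1}^{q} f({j*alpha}) - integral of f|."""
    return abs(sum(f((j * alpha) % 1.0) for j in range(1, q + 1)) / q - integral)

alpha = math.sqrt(2) - 1
convergent_denominators = [1, 2, 5, 12, 29, 70, 169, 408, 985]
for q in convergent_denominators:
    assert one_point_per_interval(alpha, q)
    # f(x) = x: integral 1/2, variation 1 over [0, 1)
    assert birkhoff_error(lambda x: x, 0.5, alpha, q) <= 1.0 / q
```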
In a symbolic context, where the letter frequencies replace the measure of intervals, discrepancy looks as follows. Take an infinite word $\rho \in \mathcal A^{\mathbb N}$ and let $\mathcal L_n(\rho) = \{\text{subwords of } \rho \text{ of length } n\}$. Assuming that for every $a \in \mathcal A$ the letter frequencies $f_a(\sigma^k(\rho)) := \lim_{n\to\infty} \frac1n |\rho_{k+1} \cdots \rho_{k+n}|_a$ exist independently of $k$, we define the discrepancy
$$D^*_n(\rho) = \sup_{a\in\mathcal A}\ \sup_{w\in\mathcal L_n(\rho)} \Big| \frac{|w|_a}{n} - f_a(\rho) \Big|. \tag{8.18}$$

We take this formulation in order to be in accordance with Definition 8.40, used in the theory of uniformly distributed sequences. However, many authors use
$$\sup_{a\in\mathcal A}\ \sup_{w\in\mathcal L_n(\rho)} \big| |w|_a - |w| f_a(\rho) \big|,$$
which differs from (8.18) by a factor $n$, and "bounded discrepancy" then refers to $\sup_n nD^*_n(\rho) < \infty$.
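The next proposition was proved in e.g. [4, 64].
Proposition 8.43. A sequence $\rho \in \mathcal A^{\mathbb N}$ is balanced if and only if its letter frequencies $f_a$ exist and $\sup_n nD^*_n(\rho) < \infty$.

Proof. If $\rho$ has bounded discrepancy, say $R := 2\sup_n nD^*_n(\rho)$, then
$$\big| |v|_a - |w|_a \big| \le \big| |v|_a - |v|f_a \big| + \big| |w|_a - |w|f_a \big| \le 2nD^*_n(\rho) \le R$$
for all $a \in \mathcal A$ and $v, w \in \mathcal L_n(\rho)$. That is, $\rho$ is $R$-balanced.
Conversely, assume that $\rho$ is $R$-balanced. Fix $a \in \mathcal A$ and let $K_n \in \mathbb N$ be such that $K_n \le |w|_a \le K_n + R$ for all $w \in \mathcal L_n(\rho)$. If $|w| = pq$, then splitting $w$ into $p$ subwords of length $q$, or into $q$ subwords of length $p$, we have
$$pK_q \le |w|_a \le pK_q + pR \quad\text{and}\quad qK_p \le |w|_a \le qK_p + qR.$$
This gives
$$-pR \le pK_q - qK_p \le qR. \tag{8.19}$$
Dividing this by $pq$, we find $\big| \frac{K_q}{q} - \frac{K_p}{p} \big| \le \max\{\frac Rp, \frac Rq\}$, so $(\frac{K_p}{p})_{p\in\mathbb N}$ is a Cauchy sequence; call the limit $f_a$. From (8.19) we also derive $-R \le K_p - p\frac{K_q}{q} \le R\frac pq$, and taking the limit as $q \to \infty$ we get $-R \le K_p - pf_a \le 0$. Therefore $\big| |w|_a - nf_a \big| \le R$ for all $n \in \mathbb N$ and $w \in \mathcal L_n(\rho)$. This holds for all $a \in \mathcal A$, so $\sup_n nD^*_n(\rho) \le R < \infty$. ∎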
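To illustrate Proposition 8.43, the Python sketch below (our own construction) builds a prefix of the Sturmian rotation word $\rho_k = \lfloor (k+1)\alpha \rfloor - \lfloor k\alpha \rfloor$ with $\alpha = \sqrt2 - 1$, in which the letter 1 has frequency $\alpha$. It then computes $\max_w ||w|_1 - n\alpha|$ over the subwords $w$ of length $n$ of this prefix, i.e. $n$ times the discrepancy (8.18) for the letter 1. Since $|w|_1 \in \{\lfloor n\alpha \rfloor, \lceil n\alpha \rceil\}$ for such words, the values stay below 1, in line with the word being 1-balanced.

```python
import math

def rotation_word(alpha: float, length: int) -> list:
    """Prefix rho_0 rho_1 ... of the Sturmian word rho_k = floor((k+1)alpha) - floor(k alpha)."""
    return [math.floor((k + 1) * alpha) - math.floor(k * alpha) for k in range(length)]

def scaled_discrepancy(word: list, freq: float, n: int) -> float:
    """max over length-n subwords w of the given prefix of | |w|_1 - n*freq |."""
    count = sum(word[:n])                       # number of 1s in the first window
    worst = abs(count - n * freq)
    for start in range(1, len(word) - n + 1):
        count += word[start + n - 1] - word[start - 1]   # slide the window one step
        worst = max(worst, abs(count - n * freq))
    return worst

alpha = math.sqrt(2) - 1
rho = rotation_word(alpha, 3000)
worst = max(scaled_discrepancy(rho, alpha, n) for n in range(1, 200))
print(worst)   # stays below 1 for this Sturmian prefix
```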

8.4. Diophantine Approximation

Diophantine$^{9}$ approximation refers to the process of finding the closest rationals (preferably of smallest denominator) to irrational numbers or vectors.

$^{9}$ Diophantus of Alexandria lent his name to this, yet his work was about finding exact integer and rational solutions to systems of linear equations, and he didn't go for approximations.
The error in the approximation is usually expressed in terms of the denominator of the approximating fraction. Dirichlet proved the following theorem, using the pigeon hole principle:
Theorem 8.44 (Dirichlet's Theorem). For every irrational $x \in \mathbb R$ there are $k > 0$ and infinitely many pairs of co-prime integers $p, q$ (with $q \ge 1$) such that
$$\Big| x - \frac pq \Big| < \frac{1}{kq^2}. \tag{8.20}$$

We write $k(x) = \sup\{k \in \mathbb R : (8.20) \text{ holds}\}$; it turns out that $k(x) \ge \sqrt5$ for all $x$.

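Proof. Take $N \in \mathbb N$ arbitrary and let $P = \{0, \{x\}, \{2x\}, \dots, \{(N-1)x\}, 1\} \subset [0,1]$, where $\{x\}$ denotes the fractional part of $x$. Since $\#P = N+1$, by the pigeon hole principle at least two of them must have a distance $\le 1/N$. Suppose these two points are $\{mx\}$ and $\{nx\}$. Then
$$\frac1N \ge \big| \{mx\} - \{nx\} \big| = \big| (mx - m') - (nx - n') \big| = |qx - p|$$
for $q = |m - n| < N$ and $p = \pm(m' - n')$, where $m' = \lfloor mx \rfloor$, $n' = \lfloor nx \rfloor$ and the sign matches that of $m - n$. Therefore $|x - \frac pq| \le \frac{1}{Nq} \le \frac{1}{q^2}$. Similar arguments hold if the two points are 0 and $\{mx\}$, or $\{nx\}$ and 1. Since $x$ is irrational, $|qx - p| > 0$, so letting $N \to \infty$ produces infinitely many distinct fractions $\frac pq$, which can be taken in lowest terms. ∎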
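The pigeonhole argument is constructive. The Python sketch below (the helper name is ours) follows it literally. Each point of $P$ is stored together with integers $q, p$ such that the point equals $qx - p$. After sorting, the $N$ gaps between the $N+1$ points sum to 1, so the smallest gap is at most $1/N$, and its two endpoints yield $q$ and $p$.

```python
import math

def dirichlet_approximation(x: float, N: int):
    """Return (p, q) with 1 <= q < N and |q*x - p| <= 1/N, following the pigeonhole
    proof of Theorem 8.44 (assumes N >= 2 and x irrational)."""
    # each point v of P is stored as (v, q, p) with v = q*x - p
    points = [(m * x - math.floor(m * x), m, math.floor(m * x)) for m in range(N)]
    points.append((1.0, 0, -1))            # the point 1 = 0*x - (-1)
    points.sort()
    # N + 1 sorted points in [0, 1] leave N gaps summing to 1, so some gap is <= 1/N
    lo, hi = min(zip(points, points[1:]), key=lambda pair: pair[1][0] - pair[0][0])
    q, p = hi[1] - lo[1], hi[2] - lo[2]    # hi[0] - lo[0] = q*x - p
    if q < 0:
        q, p = -q, -p
    return p, q

p, q = dirichlet_approximation(math.pi, 1000)
print(p, q, abs(math.pi - p / q))
```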

Dirichlet's Theorem follows also from the properties of continued fractions, because if $\frac{p_{n-1}}{q_{n-1}} < x < \frac{p_n}{q_n}$ are convergents of $x$, then
$$\Big| x - \frac{p_{n-1}}{q_{n-1}} \Big| < \Big| \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} \Big| = \frac{1}{q_nq_{n-1}} < \frac{1}{q_{n-1}^2}.$$
This suggests (as is indeed true) that the irrational $x$ that requires the smallest such $k(x)$ is the golden mean $\gamma = \frac12(1+\sqrt5)$ with continued fraction $\gamma = [1; 1, 1, 1, 1, \dots]$. We have $q_n \sim \gamma q_{n-1}$ in this case and $k(\gamma) = \sqrt5$. The suprema $k(x)$ for these $x \in \mathbb R$ form a closed subset $L \subset [\sqrt5, \infty)$ called the Lagrange spectrum. In fact, the first values of $L$ are all isolated:
$$k_1 = \sqrt5 < k_2 = 2\sqrt2 < k_3 = \tfrac15\sqrt{221} < \cdots.$$
These $(k_i)$ form an increasing sequence of the form $k_i = \sqrt{9 - 4/z_i^2}$ for some specific increasing integer sequence$^{10}$ $(z_i)_{i\in\mathbb N}$ starting with $1, 2, 5, \dots$, as proved by Hall [297] in 1947. We have $\lim_i k_i = 3$, and after 3, there are more gaps in $L$ but their structure is not yet understood completely.

$^{10}$ To be precise, the $z_i$'s are the third components of Markov triples $(x_i, y_i, z_i)$, which are positive integers satisfying $x_i^2 + y_i^2 + z_i^2 = 3x_iy_iz_i$. Starting with $(1,1,1)$ as a root of a tree, they can all be found applying the transformation $f(x,y,z) = (3xy - z, x, y)$ on permutations of the previous triple. See [192, 396, 411, 412] and references therein for more information on recent developments on Lagrange spectra.

However, Freiman [257] proved in 1975 that the maximal half-line (called Hall's ray) that $L$ contains is $[k^*, \infty)$ for
$$k^* = \frac{2221564096 + 283748\sqrt{462}}{491993569} \approx 4.52782956616\ldots.$$
The numbers $x \in \mathbb R$ for which $k(x)$ in (8.20) is finite are irrationals of bounded type, i.e. with a bounded sequence $(a_i)_{i\in\mathbb N}$ in their continued fraction expansion. One special class of these is the quadratic numbers, i.e. irrational solutions to quadratic equations $ax^2 + bx + c = 0$ for $a, b, c \in \mathbb Z$, because these have an eventually periodic sequence $(a_i)_{i\in\mathbb N}$. This is Lagrange's$^{11}$ Theorem; see [385].
Theorem 8.45. An irrational $x$ is a quadratic number if and only if the sequence $(a_i)_{i\in\mathbb N}$ of partial quotients (i.e. digits in its continued fraction expansion) is (eventually) periodic.
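For square roots, this periodicity can be observed with exact integer arithmetic. The Python sketch below uses the standard recurrence $m_{k+1} = d_ka_k - m_k$, $d_{k+1} = (D - m_{k+1}^2)/d_k$, $a_{k+1} = \lfloor (a_0 + m_{k+1})/d_{k+1} \rfloor$ for the continued fraction of $\sqrt D$, with $D$ a positive integer that is not a perfect square. This well-known algorithm is not taken from the text.

```python
import math

def sqrt_continued_fraction(D: int, n_terms: int) -> list:
    """First n_terms partial quotients [a_0; a_1, ...] of sqrt(D), D > 0 not a perfect square."""
    a0 = math.isqrt(D)
    if a0 * a0 == D:
        raise ValueError("D must not be a perfect square")
    m, d, a = 0, 1, a0
    terms = [a0]
    while len(terms) < n_terms:
        m = d * a - m            # all quantities stay integers
        d = (D - m * m) // d     # exact division
        a = (a0 + m) // d
        terms.append(a)
    return terms

print(sqrt_continued_fraction(7, 9))   # the block 1, 1, 1, 4 repeats
```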

The numbers of constant type, i.e. the numbers
$$x_a = [a; a, a, a, a, \dots] = \frac{a + \sqrt{a^2+4}}{2} \tag{8.21}$$
for a fixed $a \in \mathbb N$, are called the metallic means because $a = 1$ gives the golden mean $\gamma$, and $x_2 = 1 + \sqrt2$ is called the silver mean; see Section 8.1. Just as the golden mean is the limit ratio of Fibonacci numbers, the silver mean is the limit ratio of the Pell numbers $1, 2, 5, 12, 29, \dots$, $p_n = 2p_{n-1} + p_{n-2}, \dots$.
Numbers of bounded type are also called badly approximable numbers, because they have $k(x) < \infty$.
Definition 8.46. An irrational number $x \in \mathbb R$ is called Diophantine of order $\nu \ge 0$ if there is a $\kappa > 0$ such that
$$\Big| x - \frac pq \Big| \ge \frac{1}{\kappa q^{2+\nu}} \quad\text{for all } p, q \in \mathbb Z,\ q \ne 0. \tag{8.22}$$
Irrational numbers that are not Diophantine for any finite order $\nu$ are called Liouville numbers.

Thus the numbers of bounded type are Diophantine of order 0. Furthermore, since
$$\Big| x - \frac{p_{n-1}}{q_{n-1}} \Big| \le \Big| \frac{p_n}{q_n} - \frac{p_{n-1}}{q_{n-1}} \Big| = \frac{1}{q_nq_{n-1}} \le \frac{1}{a_nq_{n-1}^2},$$
we conclude that Diophantine numbers of order $\nu$ have partial quotients satisfying $a_n \le \kappa q_{n-1}^{\nu}$ for all $n \in \mathbb N$, where $\kappa$ is the constant in (8.22).

$^{11}$ Or rather, one of his many theorems.

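Proposition 8.47. Diophantine numbers are a meager set of full Lebesgue measure.

Proof. Write
$$Q_{k,\nu} = \bigcup_{p,q} \Big( \frac pq - \frac{1}{kq^{2+\nu}}, \frac pq + \frac{1}{kq^{2+\nu}} \Big).$$
Clearly every $\mathbb R \setminus Q_{k,\nu}$ is nowhere dense, and because every Diophantine number belongs to the countable union $\bigcup_{k\in\mathbb N} \mathbb R \setminus Q_{k,k}$, they are contained in a meager set.
On the other hand, there are $q+1$ rationals in $[0,1]$ with denominator $q$, and the lengths of the intervals $(\frac pq - \frac{1}{kq^{2+\nu}}, \frac pq + \frac{1}{kq^{2+\nu}})$ summed over $p = 0, \dots, q$ are at most $\frac4k q^{-(1+\nu)}$. Assuming $\nu > 0$, this is summable in $q$, so by the Borel-Cantelli Lemma, Lebesgue-a.e. $x \in [0,1]$ belongs to
$$\Big( \frac pq - \frac{1}{kq^{2+\nu}}, \frac pq + \frac{1}{kq^{2+\nu}} \Big)$$
for at most finitely many $q \in \mathbb N$. But every irrational $x \in [0,1]$ with this property is Diophantine of order $\nu$ (decrease the constant to accommodate the finitely many exceptions). ∎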

Thue [532] proved that algebraic numbers of degree $d \ge 2$ are Diophantine of order $d-2$ (improved by Siegel [505] to order $2\sqrt d - 1$ for $d$ sufficiently large). It turns out that algebraic numbers are in fact Diophantine of any positive order $\nu$ and therefore are hardly better approximable than quadratic numbers. This was proved by Roth [476] and sharpened by Davenport & Roth [188] in 1955:

Theorem 8.48 (Roth's Theorem). For every algebraic irrational $x \in \mathbb R$ and every $\nu > 0$, there are only finitely many integer solutions to
$$\Big| x - \frac pq \Big| < \frac{1}{q^{2+\nu}}. \tag{8.23}$$
Remark 8.49. If $x$ is transcendental, this doesn't imply that (8.23) has infinitely many integer solutions. For instance, all numbers of bounded type (and there are uncountably many such numbers, but only countably many algebraic numbers) have only finitely many solutions of (8.23). An example of a transcendental number of unbounded type for which (8.23) has only finitely many solutions is $e = 2.71828182846\ldots$. This number has a very regular continued fraction expansion (see [163]):
$$e = [2; 1, 2, 1, 1, 4, 1, 1, 6, 1, 1, 8, 1, 1, 10, 1, 1, \dots];$$
that is, $a_0 = 2$, $a_{3i-1} = 2i$, and $a_i = 1$ otherwise. For the convergents we then have
$$\Big| e - \frac{p_{3i-2}}{q_{3i-2}} \Big| \sim \Big| \frac{p_{3i-2}}{q_{3i-2}} - \frac{p_{3i-1}}{q_{3i-1}} \Big| = \frac{1}{q_{3i-2}q_{3i-1}} \ge \frac{1}{q_{3i-2}^2 \log q_{3i-1}}.$$
In a similar gist, the continued fraction expansion $[a_0; a_1, a_2, \dots]$ of Liouville numbers must satisfy $\limsup_n \frac1n \log a_n > 0$.
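The regular pattern in the continued fraction of $e$ can be reproduced with exact rational arithmetic. In the Python sketch below (an illustration of ours), $e$ is replaced by a partial sum of $\sum_k 1/k!$, whose error is far smaller than the lengths of the continued fraction cylinders involved, and this rational number is expanded.

```python
import math
from fractions import Fraction

def continued_fraction(x: Fraction, n_terms: int) -> list:
    """First (at most) n_terms partial quotients of a positive rational x."""
    terms = []
    for _ in range(n_terms):
        a = math.floor(x)
        terms.append(a)
        if x == a:            # the expansion of a rational number terminates
            break
        x = 1 / (x - a)
    return terms

e_approx = sum(Fraction(1, math.factorial(k)) for k in range(40))   # error < 1/39!
print(continued_fraction(e_approx, 17))
```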
Remark 8.50. A stronger conjecture than Roth's Theorem is due to Lang: for every algebraic irrational $x$ and $\varepsilon > 0$, the inequality $|x - \frac pq| < \frac{1}{q^2(\log q)^{1+\varepsilon}}$ has only finitely many integer solutions.

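Quoting from [377, Theorem 3.2-3.4], we have the following results on the discrepancies of the sequence $(\{\alpha n\})_{n\in\mathbb N}$:
Theorem 8.51. If $\alpha$ is Diophantine of order $\nu$, then for every $\varepsilon > 0$ there is $C_\varepsilon$ such that
$$D^*_N(\{\alpha n\}) \le C_\varepsilon N^{-\frac{1}{1+\nu} + \varepsilon} \quad\text{but}\quad D^*_{N_j}(\{\alpha n\}) \ge C_\varepsilon N_j^{-\frac{1}{1+\nu} - \varepsilon}$$
along some subsequence $(N_j)_{j\in\mathbb N}$.
If $\alpha = [0; a_1, a_2, a_3, \dots]$ is of bounded type, say $K := \sup_i a_i$, then
$$D^*_N(\{\alpha n\}) \le \frac{3 + \Big( \frac{1}{\log\gamma} + \frac{K}{\log(K+1)} \Big) \log N}{N},$$
where $\gamma$ is the golden mean.

It follows from Proposition 8.31 that for any $\alpha$, $D^*_q(\{\alpha n\}) \sim \frac1q$ if $p/q$ is a convergent of $\alpha$.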

8.5. Density and Banach Density

Definition 8.52. Given a subset $E \subset \mathbb N$, the quantities
$$\underline d(E) := \liminf_{n\to\infty} \frac1n \#(E \cap \{1, \dots, n\}), \qquad \overline d(E) := \limsup_{n\to\infty} \frac1n \#(E \cap \{1, \dots, n\})$$
are called the lower and upper density of $E$. If they coincide, then $d(E) := \lim_n \frac1n \#(E \cap \{1, \dots, n\})$ is the density of $E$.
More generally, the lower Banach density and upper Banach density of $E$ are
$$\underline d^*(E) := \liminf_{m,n\to\infty} \frac1m \#(E \cap \{n+1, \dots, n+m\}), \qquad \overline d^*(E) := \limsup_{m,n\to\infty} \frac1m \#(E \cap \{n+1, \dots, n+m\}).$$
If they coincide, then $d^*(E) := \lim_{m,n\to\infty} \frac1m \#(E \cap \{n+1, \dots, n+m\})$ is the Banach density of $E$.
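Lemma 8.53. Let $(a_i)_{i\ge0}$ be a bounded non-negative sequence of real numbers. Then $\lim_n \frac1n \sum_{i=1}^n a_i = 0$ if and only if there is a set $E \subset \mathbb N$ of zero density such that $\lim_{n\to\infty,\, n\notin E} a_n = 0$.

Proof. ⇐: Assume that $\lim_{n\to\infty,\, n\notin E} a_n = 0$, and for $\varepsilon > 0$, take $N$ such that $a_n < \varepsilon$ for all $n \notin E$ with $n \ge N$. Also let $A = \sup_n a_n$. Then
$$0 \le \frac1n \sum_{i=1}^n a_i = \frac1n \sum_{\substack{i=1\\ i\notin E}}^n a_i + \frac1n \sum_{\substack{i=1\\ i\in E}}^n a_i \le \frac{NA + (n-N)\varepsilon}{n} + \frac An \#(E \cap \{1, \dots, n\}) \to \varepsilon$$
as $n \to \infty$. Since $\varepsilon > 0$ is arbitrary, $\lim_n \frac1n \sum_{i=1}^n a_i = 0$.
⇒: Let $E_m = \{n : a_n \ge \frac1m\}$. Then clearly $E_1 \subset E_2 \subset E_3 \subset \cdots$ and each $E_m$ has density 0 because
$$0 = m \lim_{n\to\infty} \frac1n \sum_{i=1}^n a_i \ge \limsup_{n\to\infty} \frac1n \sum_{i=1}^n 1_{E_m}(i) = \limsup_{n\to\infty} \frac1n \#(E_m \cap \{1, \dots, n\}).$$
Now take $0 = N_0 < N_1 < N_2 < \cdots$ such that $\frac1n \#(E_m \cap \{1, \dots, n\}) < \frac1m$ for every $n \ge N_{m-1}$. Let $E = \bigcup_m \big( E_m \cap \{N_{m-1}+1, \dots, N_m\} \big)$. Then, taking $m = m(n)$ maximal such that $N_{m-1} \le n$,
$$\begin{aligned} \frac1n \#(E \cap \{1, \dots, n\}) &\le \frac1n \#(E_{m-1} \cap \{1, \dots, N_{m-1}\}) + \frac1n \#(E_m \cap \{N_{m-1}+1, \dots, n\}) \\ &\le \frac{1}{N_{m-1}} \#(E_{m-1} \cap \{1, \dots, N_{m-1}\}) + \frac1n \#(E_m \cap \{1, \dots, n\}) \le \frac{1}{m-1} + \frac1m \to 0 \end{aligned}$$
as $n \to \infty$, so $E$ has zero density. Finally, if $n \notin E$ and $N_{m-1} < n \le N_m$, then $n \notin E_m$, so $a_n < \frac1m$; hence $\lim_{n\to\infty,\, n\notin E} a_n = 0$. ∎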
Corollary 8.54. For every bounded non-negative sequence $(a_n)_{n\ge0}$,
$$\lim_{n\to\infty} \frac1n \sum_{i=1}^n a_i = 0 \quad\text{if and only if}\quad \lim_{n\to\infty} \frac1n \sum_{i=1}^n a_i^2 = 0.$$
Proof. By the previous lemma, the average $\lim_n \frac1n \sum_{i=1}^n a_i = 0$ if and only if $\lim_{n\to\infty,\, n\notin E} a_n = 0$ for a set $E$ of zero density. But the latter is clearly equivalent to $\lim_{n\to\infty,\, n\notin E} a_n^2 = 0$ for the same set $E$. Applying the lemma again (to the bounded sequence $(a_n^2)$), we arrive at $\lim_n \frac1n \sum_{i=1}^n a_i^2 = 0$. ∎

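Definition 8.55. Given a subset $E \subset \mathbb N$, the quantities
$$\underline\delta(E) := \liminf_{n\to\infty} \frac{1}{\log n} \sum_{\substack{i=1\\ i\in E}}^n \frac1i \quad\text{and}\quad \overline\delta(E) := \limsup_{n\to\infty} \frac{1}{\log n} \sum_{\substack{i=1\\ i\in E}}^n \frac1i$$
are called the logarithmic lower density and logarithmic upper density of $E$. If they coincide, then $\delta(E) := \lim_n \frac{1}{\log n} \sum_{i\le n,\, i\in E} \frac1i$ is the logarithmic density of $E$.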

The following relation between density and logarithmic density follows from Abel's summation formula:
Lemma 8.56. For every set $E \subset \mathbb N$ we have
$$\underline d(E) \le \underline\delta(E) \le \overline\delta(E) \le \overline d(E).$$

Proof. Let $(a_j)_{j\ge1}$ be a sequence in $\mathbb C$ and set $A(x) := \sum_{1\le j\le x} a_j$. Note that $A(j) - A(j-1) = a_j$ and $A(j) = A(t)$ for $t \in [j, j+1)$, $1 \le j \le n = \lfloor x \rfloor$. Abel's summation formula states that for every $C^1$ function $f$,
$$\sum_{1\le j\le x} a_jf(j) = A(x)f(x) - \int_1^x A(t)f'(t)\,dt. \tag{8.24}$$
This follows because
$$\begin{aligned} \sum_{1\le j\le x} a_jf(j) &= A(n)f(n) - \sum_{j=1}^{n-1} A(j)\big(f(j+1) - f(j)\big) \\ &= -\sum_{j=1}^{n-1} A(j)\big(f(j+1) - f(j)\big) - A(n)\big(f(x) - f(n)\big) + A(x)f(x) \\ &= A(x)f(x) - \sum_{j=1}^{n-1} \int_j^{j+1} A(j)f'(t)\,dt - \int_n^x A(n)f'(t)\,dt \\ &= A(x)f(x) - \int_1^x A(t)f'(t)\,dt. \end{aligned}$$
We apply (8.24) to $a_j = 1_E(j)$ and $f(t) = \frac1t$, so $A(n) = \#\{1 \le j \le n : j \in E\}$ and
$$L(x) := \sum_{\substack{j\le x\\ j\in E}} \frac1j = \frac{A(x)}{x} + \int_1^x \frac{A(t)}{t^2}\,dt. \tag{8.25}$$
Take $\varepsilon > 0$ arbitrary, and find $x_0$ so large that $\underline d(E) - \varepsilon \le \frac{A(x)}{x} \le \overline d(E) + \varepsilon$ for all $x \ge x_0$. Substitute these inequalities into the integral of (8.25) to obtain
$$L(x) \ge \int_{x_0}^x \frac{\underline d(E) - \varepsilon}{t}\,dt = (\underline d(E) - \varepsilon)(\log x - \log x_0)$$
and, since $\frac{A(t)}{t} \le 1$ for all $t$,
$$L(x) \le 1 + \int_1^{x_0} \frac1t\,dt + \int_{x_0}^x \frac{\overline d(E) + \varepsilon}{t}\,dt = 1 + \log x_0 + (\overline d(E) + \varepsilon)(\log x - \log x_0).$$
Divide these inequalities by $\log x$ and take the limit $x \to \infty$. Finally letting $\varepsilon \to 0$ gives the result. ∎

Exercise 8.57. Let $E = \bigcup_{n\ge1} \{k \in \mathbb N : 2^{2n} \le k < 2^{2n+1}\}$. Show that
$$\frac13 = \underline d(E) < \frac12 = \delta(E) < \frac23 = \overline d(E).$$
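Numerical evidence for Exercise 8.57 (not a solution) is given by the Python sketch below, whose helper names are ours. It evaluates the density ratio at $n = 2^{2m} - 1$ and $n = 2^{2m+1} - 1$, where it is close to $\frac13$ and $\frac23$, respectively. It also evaluates the logarithmic average, which approaches $\frac12$ only slowly (roughly at rate $1/\log n$).

```python
import math

def in_E(k: int) -> bool:
    """k lies in a block [2^(2n), 2^(2n+1)) with n >= 1,
    i.e. k has an odd number (at least 3) of binary digits."""
    b = k.bit_length()
    return b >= 3 and b % 2 == 1

def density_ratio(n: int) -> float:
    """(1/n) * #(E ∩ {1, ..., n})"""
    return sum(1 for k in range(1, n + 1) if in_E(k)) / n

def log_density_ratio(n: int) -> float:
    """(1/log n) * sum of 1/k over k <= n with k in E"""
    return math.fsum(1 / k for k in range(1, n + 1) if in_E(k)) / math.log(n)

m = 8
print(density_ratio(2 ** (2 * m) - 1))      # close to 1/3
print(density_ratio(2 ** (2 * m + 1) - 1))  # close to 2/3
print(log_density_ratio(2 ** 18))           # tends to 1/2, but slowly
```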

8.6. The Perron-Frobenius Theorem

Theorem 8.58 (Perron-Frobenius). Let $A$ be a primitive$^{12}$ non-negative $N \times N$-matrix. Then the following hold:
(a) There is a real positive eigenvalue $\lambda$ (called the leading or Perron-Frobenius eigenvalue) of algebraic multiplicity one, such that $\lambda > |\tilde\lambda|$ for every other eigenvalue $\tilde\lambda$ of $A$.
(b) The eigenvector (left and right) associated to $\lambda$ can be chosen to be strictly positive.
(c) If $\lambda w \ge Aw$ coordinate-wise for some non-negative vector $w$, then $w$ is a multiple of the right eigenvector associated to $\lambda$.

Proof. Let $C = \mathbb R^N_{\ge0}$ be the non-negative $N$-dimensional quadrant, i.e. the one-sided cone of non-negative vectors. Since $A$ is non-negative, $AC \subset C$, and because $A^m > 0$ for some $m \in \mathbb N$ (by primitivity), $A^mC \subset C^\circ \cup \{0\}$, by which we mean that every non-zero vector in $C$ is mapped into the interior of $C$ by $A^m$. Define the simplex $S = \{x \in C : \|x\|_1 = 1\}$ spanned by the unit vectors $e_i$, and let the map $f : S \to S$ be defined by $f(x) = Ax/\|Ax\|_1$. Since $A^m > 0$, it is impossible that $Ax = 0$ for $x \in S$, so $f$ is well-defined. Although non-linear, the map $f$ is convex: it sends convex subsets of $S$ to convex subsets and extremal points to extremal points. Applying this to $\Pi_n := \bigcap_{k=0}^n f^k(S)$, we conclude that $(\Pi_n)_{n\ge0}$ is a nested sequence of convex sets with $f^n(e_i)$, $i = 1, \dots, N$, as extremal points. This carries over to the limit $\Pi := \bigcap_n \Pi_n$ as well; note that $\Pi \subset S^\circ$ because $A^m > 0$. We can select a subsequence $(n_j)$ such that $f^{n_j}(e_i) \to p_i$ are the extremal points of $\Pi$. This is a finite set, invariant under $f$, so there is $M$ such that each $p_i$ is fixed by $f^M$ and therefore an eigenvector of $A^M$ associated to a positive eigenvalue. By reordering the $p_i$, we can assume that the corresponding eigenvalues of $A^M$ are $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_N$.

$^{12}$ See Definition 3.6. We emphasize that $A$ need not be an integer matrix in this theorem.
(1) If $\lambda_2 = \lambda_1$ and $p_1 \ne p_2$, then we can find $v = \alpha_1p_1 + \alpha_2p_2 \in \partial C \setminus \{0\}$. This is also an eigenvector of $A^M$, so $A^{kM}v \in \partial C$ for all $k$, but this contradicts that $A^mC \subset C^\circ \cup \{0\}$.
(2) If $\lambda_2 < \lambda_1$, then take $v = p_2 - \varepsilon p_1 \in C$ (for $\varepsilon > 0$ sufficiently small), and note that $A^{kM}v = \lambda_2^kp_2 - \varepsilon\lambda_1^kp_1$ cannot be contained in $C$ for all $k$. This contradicts again the invariance of $C$. Hence, $M = 1$, all $p_i$ coincide, and it is the unique fixed point of $f$.
(3) To show that $\lambda_1$ has multiplicity one, assume by contradiction that there is a generalized eigenvector $v \in S$ with $Av = \lambda_1v + p_1$. Then also $A^kv = \lambda_1^kv + k\lambda_1^{k-1}p_1$. Take $w = p_1 - \varepsilon v \in C$ for some small $\varepsilon > 0$. Then $A^kw = \lambda_1^{k-1}(\lambda_1 - \varepsilon k)p_1 - \varepsilon\lambda_1^kv$, which cannot be contained in $C$ for large $k$. This again contradicts that $A^kC \subset C$ for all $k \ge 0$.
(4) Finally, suppose that $\tilde\lambda \ne \lambda_1$ is some eigenvalue of $A$, not necessarily associated with an eigenvector in $S$, such that $|\tilde\lambda| \ge \lambda_1$. There is an $A$-invariant subspace $V$ (possibly of dimension two if $\tilde\lambda \in \mathbb C \setminus \mathbb R$) such that $A : V \to V$ is the composition of an isometry and a dilatation by a factor $|\tilde\lambda|$. In particular, there is a subsequence $(k_j)$ such that $|\tilde\lambda|^{-k_j}A^{k_j}v \to v$ for every $v \in V$. Take $v \in V$ so that $w := v + p_1 \in \partial C$. If $|\tilde\lambda| = \lambda_1$, then $|\tilde\lambda|^{-k_j}A^{k_j}w \to w$, contradicting that $A^kC \subset C^\circ \cup \{0\}$ for all $k \ge m$. If $|\tilde\lambda| > \lambda_1$, then $|\tilde\lambda|^{-k_j}A^{k_j}w \to v$, again contradicting that $A^mC \subset C^\circ \cup \{0\}$. Hence all other eigenvalues of $A$ are strictly smaller than $\lambda_1$ in absolute value.
For item (c), let $v$ be the right eigenvector associated to $\lambda$, with all its coordinates positive. Take $u = w - \alpha v$ for $\alpha \ge 0$ maximal such that $u \in C$, so that $u_j = 0$ for some $j$. If $u = 0$, then there is nothing to prove. Otherwise, $Au = Aw - \alpha\lambda v \le \lambda w - \alpha\lambda v = \lambda u$ coordinate-wise, and hence $A^ku \le \lambda^ku$ coordinate-wise for all $k$. But then $(A^ku)_j \le 0$ for all $k$, contradicting that $A^m$ maps $C \setminus \{0\}$ into the interior of $C$. ∎
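The map $f(x) = Ax/\|Ax\|_1$ on the simplex $S$ from the proof can be iterated directly, and for a primitive $A$ the iterates converge to the positive Perron-Frobenius eigenvector. The Python sketch below (our own, using plain lists) does this for the golden mean matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$, an example of ours that is primitive because its square is positive; its leading eigenvalue is $\gamma = \frac12(1+\sqrt5)$.

```python
import math

def perron_frobenius(A, iterations: int = 200):
    """Iterate f(x) = Ax/||Ax||_1 on the simplex S, starting at its barycentre.
    For a primitive non-negative matrix A the iterates converge to the positive
    eigenvector; the returned value ||Ax||_1 then approximates lambda."""
    n = len(A)
    x = [1.0 / n] * n
    for _ in range(iterations):
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(y)                 # ||Ax||_1, since all entries are non-negative
        x = [v / norm for v in y]
    lam = sum(sum(A[i][j] * x[j] for j in range(n)) for i in range(n))
    return lam, x

lam, v = perron_frobenius([[1, 1], [1, 0]])
gamma = (1 + math.sqrt(5)) / 2
print(lam, v)                         # lam ≈ gamma, v ≈ (gamma, 1)/(gamma + 1)
```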

In this proof, the uniqueness of the eigenvector in the positive quadrant follows from the fact that $\bigcap_n f^n(S)$ is a single point. If we don't iterate the same matrix but different matrices $A^{(i)}$ every time, i.e. we replace $f$ by non-autonomous iteration $f_1 \circ f_2 \circ f_3 \circ \cdots$ where $f_i(x) = A^{(i)}x/\|A^{(i)}x\|_1$, then it is possible that
$$S_\infty := \bigcap_n f_1 \circ f_2 \circ \cdots \circ f_n(S) \text{ is a non-singleton convex set.} \tag{8.26}$$

The non-unique ergodicity of e.g. various $S$-adic transformations and Bratteli-Vershik systems, see Section 6.3.3, relies on this: the different ergodic measures correspond to the extremal points of $S_\infty$; see e.g. Keane [348] (in Section 6.3.5) or [121, 168, 169]. This is also why rank $r$ transformations (namely if all the $A^{(i)}$ are $d_i \times d_i$-matrices with $d_i \le r$) have at most $r$ ergodic measures. The following example shows that this upper bound $r$ is sharp (here for $r = 2$).
Lemma 8.59. For $n \ge 1$, let $A^{(n)} = \begin{pmatrix} c_n & 1 \\ 1 & c_n \end{pmatrix}$, with $\sum_{n\ge1} \frac{1}{c_n+1} < \frac12$. Then $S_\infty$ in (8.26) is a non-degenerate arc.

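Proof. Let $(a_1, b_1) = (1, 0)$ and inductively $(a_{n+1}, b_{n+1}) = (a_n, b_n)A^{(n)}$. Set $\lambda_n = a_n/(a_n + b_n) \in [0,1]$, so $\lambda_1 = 1$. If we parametrize the simplex by $S = \{(t, 1-t) : t \in [0,1]\}$, then
$$\lambda_{n+1} = f_n(\lambda_n) = \frac{c_n\lambda_n + 1 - \lambda_n}{c_n + 1} = \lambda_n + \frac{1 - 2\lambda_n}{c_n + 1} \ge \lambda_n - \frac{1}{c_n + 1}.$$
Therefore $\lambda_n \ge 1 - \sum_{k<n} \frac{1}{c_k+1} > \frac12$ for all $n$, so $1 - 2\lambda_n < 0$ and $(\lambda_n)_{n\ge1}$ is a decreasing sequence with limit $\lambda_\infty \ge 1 - \sum_{n\ge1} \frac{1}{c_n+1} > \frac12$ by assumption. By symmetry, the same procedure starting with $(a_1, b_1) = (0, 1)$ produces an increasing sequence with limit $1 - \lambda_\infty < \frac12$. Therefore $S_\infty$ is the non-degenerate arc $\{(t, 1-t) : t \in [1 - \lambda_\infty, \lambda_\infty]\}$. ∎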
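The recursion in the proof is easy to run. In the Python sketch below, $c_n = 4^n$ is our illustrative choice, for which $\sum_n \frac{1}{c_n+1} = \frac15 + \frac1{17} + \cdots < \frac12$. The sketch iterates $\lambda_{n+1} = (c_n\lambda_n + 1 - \lambda_n)/(c_n + 1)$ from both endpoints $\lambda_1 = 1$ and $\lambda_1 = 0$; the two limits stay apart, so $S_\infty$ is a non-degenerate arc.

```python
def endpoint_limit(c, lam: float, steps: int = 60) -> float:
    """Iterate lambda_{n+1} = (c(n)*lambda_n + 1 - lambda_n) / (c(n) + 1), n = 1..steps."""
    for n in range(1, steps + 1):
        cn = c(n)
        lam = (cn * lam + 1 - lam) / (cn + 1)
    return lam

def c(n: int) -> int:
    """Illustrative choice c_n = 4^n, so that sum 1/(c_n + 1) < 1/2."""
    return 4 ** n

upper = endpoint_limit(c, 1.0)             # start from (a_1, b_1) = (1, 0)
lower = endpoint_limit(c, 0.0)             # start from (a_1, b_1) = (0, 1)
bound = 1 - sum(1 / (c(n) + 1) for n in range(1, 61))
print(lower, upper, bound)                 # lower = 1 - upper < 1/2 < bound <= upper
```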

Recall that a cone $C$ is a subset of a linear space that is closed under addition and multiplication by non-negative scalars. The Hilbert metric is defined on a cone $C$ as
$$\Theta_C(v, w) = \log \frac{\inf\{\mu > 0 : \mu v - w \in C\}}{\sup\{\lambda > 0 : w - \lambda v \in C\}}, \qquad v, w \in C. \tag{8.27}$$
More precisely said, $\Theta_C$ is a pseudo-metric, but a metric on projective space, because $\Theta_C(\alpha v, \beta w) = \Theta_C(v, w)$ for all $\alpha, \beta > 0$, and $\Theta_C(v, w) = 0$ if and only if $w$ is a scalar multiple of $v$ (i.e. $0$, $v$, and $w$ are collinear). If $T : C \to C'$ is a linear map, then, for all non-collinear $v, w \in C$,
$$\frac{\Theta_{C'}(Tv, Tw)}{\Theta_C(v, w)} \le \tanh(\mathrm{Diam}/4), \qquad \text{for } \mathrm{Diam} = \sup_{x,y\in TC} \Theta_{C'}(x, y). \tag{8.28}$$
This means that $T$ is a strict contraction in the Hilbert metric, provided $T(C \setminus \{0\})$ belongs to the interior of $C'$.
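For the positive quadrant $C = C' = \mathbb R^d_{\ge0}$ and strictly positive vectors, (8.27) reduces to $\Theta(v, w) = \log\big(\max_k \frac{w_k}{v_k} \big/ \min_k \frac{w_k}{v_k}\big)$. The Python sketch below (helper names and the test matrix are ours) uses this to check the contraction (8.28) on random pairs for a strictly positive $3 \times 3$ matrix. The diameter of $T(C)$ is taken to be the largest Hilbert distance between two columns of the matrix, relying on the fact that $T(C)$ is the cone spanned by the columns and that, on this cone, $\Theta$ is quasi-convex in each argument.

```python
import math
import random

def hilbert_distance(v, w):
    """Hilbert distance (8.27) on the positive quadrant, for strictly positive v, w."""
    ratios = [wk / vk for vk, wk in zip(v, w)]
    return math.log(max(ratios) / min(ratios))

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def image_diameter(A):
    """Largest Hilbert distance between two columns of a strictly positive matrix A."""
    cols = list(zip(*A))
    return max(hilbert_distance(c1, c2) for c1 in cols for c2 in cols)

A = [[2.0, 1.0, 1.0], [1.0, 3.0, 1.0], [1.0, 1.0, 4.0]]
contraction = math.tanh(image_diameter(A) / 4)
random.seed(0)
for _ in range(1000):
    v = [random.uniform(0.01, 1) for _ in range(3)]
    w = [random.uniform(0.01, 1) for _ in range(3)]
    d = hilbert_distance(v, w)
    if d > 1e-9:
        assert hilbert_distance(matvec(A, v), matvec(A, w)) <= contraction * d * (1 + 1e-9) + 1e-12
```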
Remark 8.60. A more general definition of Hilbert metric applies to open bounded convex sets $C$ in a normed space. Given $x \ne y \in C$, we can extend the line joining $x$ and $y$ until it intersects $\partial C$ in the points $a$ and $b$ (where $a$ is closer to $x$ and $b$ closer to $y$). Then
$$\Theta_C(x, y) = \log \frac{\|x - b\|\,\|y - a\|}{\|x - a\|\,\|y - b\|} \ge 0.$$
If $C' \subset C$ is another open convex set and $x, y \in C'$, then $\Theta_{C'}(x, y) \ge \Theta_C(x, y)$.

If $T : \mathbb R^d_{\ge0} \to \mathbb R^{d'}_{\ge0}$ has the matrix representation $A = (a_{i,j})_{i=1,j=1}^{d',d}$, then $\mathrm{Diam}(T(\mathbb R^d_{\ge0})) = 2\log\rho(A)$ for
$$\rho(A) := \max_{1\le j,j'\le d} \frac{\max\{a_{k,j}/a_{k,j'} : 1 \le k \le d'\}}{\min\{a_{k,j}/a_{k,j'} : 1 \le k \le d'\}}, \tag{8.29}$$
and this is finite if and only if $A$ is strictly positive. In this case
$$\tanh\big(\mathrm{Diam}(T(\mathbb R^d_{\ge0}))/4\big) = \frac{\sqrt\rho - 1/\sqrt\rho}{\sqrt\rho + 1/\sqrt\rho} = \frac{\rho^2 + 1 - 2\rho}{\rho^2 - 1} \sim \frac12(\rho - 1) \tag{8.30}$$
as $\rho \to 1$. In particular, if $d' = d$, i.e. $T$ maps the cone $\mathbb R^d_{\ge0}$ into itself, and its matrix representation is strictly positive, then $\bigcap_n T^n(\mathbb R^d_{\ge0})$ is a single half-line, and the convergence to this half-line is exponential. More generally, we have the following result (see [121, Proposition 3]):

Lemma 8.61. Let $(C_n)_{n\geq 1}$ be a sequence of cones and let $T_n : C_{n+1} \to C_n$ be linear transformations with matrix representations $A_n$ such that $\rho_n := \rho(A_n)$ satisfies $\sum_{n=1}^\infty 1/\rho_n = \infty$. Then $\bigcap_n T_1 \circ T_2 \circ \cdots \circ T_n(C_{n+1})$ is a single half-line.

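Proof. By (8.30), the contraction factor of $T_n$ is
$$\frac{\sqrt{\rho_n} - 1/\sqrt{\rho_n}}{\sqrt{\rho_n} + 1/\sqrt{\rho_n}} = 1 - \frac{2}{\rho_n + 1},$$
and the infinite product of contraction factors $\prod_n \big(1 - \frac{2}{\rho_n + 1}\big) = 0$ provided $\sum_n 1/\rho_n = \infty$ (recall that $\rho_n \geq 1$). This proves the lemma. ∎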
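In the stationary case $T_n = T$ with a strictly positive matrix $A$, $\rho_n$ is constant, so $\sum_n 1/\rho_n = \infty$ and Lemma 8.61 applies. The Python sketch below tracks the Hilbert diameters of the nested cones $A^n(\mathbb{R}^2_{\geq 0})$, whose decay reflects the convergence to a single half-line. By (8.28), each step shrinks the diameter by at least the factor $\tanh(\mathrm{Diam}/4)$ of $A$. The matrix is an arbitrary illustrative choice, and the helper names are hypothetical.

```python
import math


def hilbert_metric(v, w):
    # Hilbert metric on the open positive orthant (vectors with positive entries)
    ratios = [wi / vi for vi, wi in zip(v, w)]
    return math.log(max(ratios) / min(ratios))


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def cone_diameter(M):
    # M(R^d_{>=0}) is spanned by the columns of M; the diameter is attained on them
    cols = list(zip(*M))
    return max(hilbert_metric(c1, c2) for c1 in cols for c2 in cols)


def nested_diameters(A, n):
    """Hilbert diameters of A(C), A^2(C), ..., A^n(C) for C the non-negative orthant."""
    diams, P = [], A
    for _ in range(n):
        diams.append(cone_diameter(P))
        P = matmul(P, A)
    return diams


A = [[2, 1], [1, 3]]
for k, d in enumerate(nested_diameters(A, 8), start=1):
    print(k, d)
```

The diameters decay geometrically (for this matrix roughly by a factor $0.39$ per step), consistent with exponential convergence to the half-line spanned by the Perron eigenvector.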

8.7. Countable Graphs and Matrices

In this section, we look at the analogue of the Perron-Frobenius theory for countable graphs. Some of the main studies were done in connection with countable Markov chains in [268, 300, 363, 463, 540], by Vere-Jones in the 1960s [544, 545] (whose approach we largely follow), and later by Salama [484] and Ruette [481]. See also the surveys [493] and [364, Chapter 7].
Let us consider a matrix $A = (a_{ij})_{i,j\in I}$, where the index set $I$ is countably infinite. (If $I$ is finite, then the trivial versions of the results below hold.) If the ordinal type of $I$ is that of $\mathbb{N}$ or $\mathbb{Z}$, then the matrices look like
$$A = \begin{pmatrix} 1 & 2 & 3 & \cdots \\ 2 & 3 & 4 & \cdots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix} \qquad \text{and} \qquad A = \begin{pmatrix} \ddots & \vdots & \vdots & \vdots & \\ \cdots & -1 & 0 & -1 & \cdots \\ \cdots & -2 & 0 & -2 & \cdots \\ & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
402 8. Miscellaneous Background Topics

and other ordinal types of $I$ can be re-indexed by $\mathbb{N}$ or $\mathbb{Z}$. The powers $A^n$ are denoted as $(a^{(n)}_{i,j})_{i,j\in I}$. The matrix $A$ will be called
• irreducible, if for each pair of indices $i, j$ there exists a positive integer $n$ such that $a^{(n)}_{ij} > 0$, and
• aperiodic, if for each index $i \in I$ the value $\gcd\{\ell : a^{(\ell)}_{ii} > 0\} = 1$.
Proposition 8.62. Let $A = (a_{ij})_{i,j\in I}$ be a non-negative irreducible aperiodic matrix indexed by a countable index set $I$. There exists a common value $\lambda_A$ such that for each $i, j$
$$\lim_{n\to\infty} \big[a^{(n)}_{ij}\big]^{1/n} = \sup_{n\in\mathbb{N}} \big[a^{(n)}_{ii}\big]^{1/n} = \lambda_A. \tag{8.31}$$
For any value $r > 0$ and all $i, j \in I$
• the power series $\sum_{n\in\mathbb{N}} a^{(n)}_{ij} r^n$ are either all convergent or all divergent;
• either all or none of the sequences $(a^{(n)}_{ij} r^n)_n$ tend to zero as $n \to \infty$.

The number $\lambda_A$ defined by (8.31) is called the Perron eigenvalue of $A$. We will assume throughout that $A = (a_{ij})_{i,j\in I}$ is a non-negative irreducible aperiodic matrix and that its Perron value $\lambda_A$ is finite. The fact that convergence and divergence of the power series occur simultaneously for all $i, j$ follows because $a^{(n)}_{ij} \geq a^{(n_1)}_{ik}\, a^{(n-n_1-n_2)}_{kl}\, a^{(n_2)}_{lj}$ for all states $i, j, k, l$ and all $n_1, n_2 \geq 0$ with $n_1 + n_2 \leq n$.
Associated to $A = (a_{ij})_{i,j\in I}$ is a directed graph $G = G(A) = (I, E \subset I \times I)$ containing $a_{ij}$ edges from $i$ to $j$. Clearly, $a^{(n)}_{ij}$ is equal to the number of $n$-paths in $G$ connecting $i$ to $j$. The following is the generally accepted analogue of topological entropy for infinite graphs/transition matrices.
Definition 8.63. The Gurevich entropy of $A$ (or of $G = G(A)$) is defined as
$$h_G(G) = h_G(A) = \sup\{\log r(A') : A' \text{ is a finite submatrix of } A\},$$
where $r(A')$ is the leading eigenvalue of $A'$. Equivalently [291], $h_G(A) = \log \lambda_A$.

By Proposition 8.62 the value $R = \lambda_A^{-1}$ is the common radius of convergence of the power series $A_{ij}(z) := \sum_{n\geq 0} a^{(n)}_{ij} z^n$. It follows that for each pair $i, j \in I$,
$$A_{ij}(r) \begin{cases} \in \mathbb{R}, & 0 \leq r < R, \\ = \infty, & r > R. \end{cases}$$

Since the underlying alphabet is no longer compact and (unless $\sum_j a_{ij} < \infty$ for every $i$) not even locally compact, many “standard” properties of entropy fail. The set of shift-invariant measures is no longer compact in the weak∗ topology, the Variational Principle can fail, and there is not necessarily a measure of maximal (Gurevich) entropy.

8.7.1. The Vere-Jones Classification. Following [545], consider the “reduced” coefficients for each $n \in \mathbb{N}$:
• First entrance to $j$: $f^{(n)}_{ij}$ is the number of $n$-paths connecting $i$ to $j$, without appearance of $j$ in between. Let $F_{ij}(z) = \sum_{n\geq 1} f^{(n)}_{ij} z^n$ be the corresponding power series, with radius of convergence $\Phi_{i,j}$.
• Last exit of $i$: $\ell^{(n)}_{ij}$ is the number of $n$-paths connecting $i$ to $j$, without appearance of $i$ in between. Let $L_{ij}(z) = \sum_{n\geq 1} \ell^{(n)}_{ij} z^n$ be the corresponding power series, with radius of convergence $\Lambda_{i,j}$.
Clearly $f^{(n)}_{ii} = \ell^{(n)}_{ii}$ for each $i \in I$ and $n \in \mathbb{N}$. Also, since $f^{(n)}_{ij}, \ell^{(n)}_{ij} \leq a^{(n)}_{ij}$ for all $n \in \mathbb{N}$ and $i, j \in I$, we have $R \leq \Phi_{ij}, \Lambda_{ij}$. The following relations hold for the power series (see [545, page 365]):
$$A_{ii}(z) = \frac{1}{1 - L_{ii}(z)} = \frac{1}{1 - F_{ii}(z)}, \qquad A_{ij}(z) = F_{ij}(z) A_{jj}(z) = A_{ii}(z) L_{ij}(z) \quad \text{if } i \neq j.$$
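One might think that in irreducible aperiodic graphs, $\Phi_{i,j}$ and $\Lambda_{i,j}$ are independent of $i, j$, but that is not entirely true, not even in finite graphs. This is illustrated by the transition matrix $A = \begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$ for state space $\{1, 2\}$. Here $F_{1,1}(z) = 2z + z^2$ and $F_{2,2}(z) = z^2/(1-2z)$, so $\Phi_{1,1} = \infty$ whereas $\Phi_{2,2} = \frac12$.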
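The first-entrance coefficients and the relation $A_{ii} = 1/(1 - F_{ii})$ can be checked by brute force on a finite matrix. The Python sketch below (function names are illustrative; states are indexed from 0) counts $f^{(n)}_{ij}$ with a dynamic program that stops a path as soon as it enters $j$, applied to the matrix $\begin{pmatrix} 2 & 1 \\ 1 & 0 \end{pmatrix}$: $F_{1,1}(z) = 2z + z^2$ is a polynomial, whereas $f^{(n)}_{2,2} = 2^{n-2}$ for $n \geq 2$. The identity $A_{ii}(1 - F_{ii}) = 1$ amounts to the renewal equation $a^{(n)}_{ii} = \sum_{k=1}^n f^{(k)}_{ii}\, a^{(n-k)}_{ii}$ for $n \geq 1$.

```python
def first_entrance_counts(A, i, j, N):
    """f[n] = number of n-paths from i to j that do not visit j at times 1, ..., n-1."""
    d = len(A)
    f = [0] * (N + 1)
    w = [1 if k == i else 0 for k in range(d)]   # paths so far, never having hit j
    for n in range(1, N + 1):
        new = [0] * d
        for k in range(d):
            for m in range(d):
                new[m] += w[k] * A[k][m]
        f[n] = new[j]
        new[j] = 0                                # stop paths on first entrance to j
        w = new
    return f


def path_counts(A, i, j, N):
    """a[n] = (A^n)_{ij} for n = 0, ..., N."""
    d = len(A)
    row = [1 if k == i else 0 for k in range(d)]
    a = [row[j]]
    for _ in range(N):
        row = [sum(row[k] * A[k][m] for k in range(d)) for m in range(d)]
        a.append(row[j])
    return a


A = [[2, 1], [1, 0]]
print(first_entrance_counts(A, 0, 0, 6))   # [0, 2, 1, 0, 0, 0, 0]
print(first_entrance_counts(A, 1, 1, 6))   # [0, 0, 1, 2, 4, 8, 16]
```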
The next proposition was stated in [481, Proposition 2.6], with a corrected proof (for part (i)) in [94].
Proposition 8.64. Suppose we have $G = G(A)$ with $R = \lambda_A^{-1}$.
(i) If there is a vertex $j$ such that $R = \Phi_{jj}$, then there exists an irreducible subgraph $G' \subsetneq G$ such that $h_G(G') = h_G(G)$.
(ii) If there is a vertex $j$ such that $R < \Phi_{jj}$, then $h_G(G') < h_G(G)$ for all proper strongly connected subgraphs $G'$.
(iii) If there is a vertex $j$ such that $R < \Phi_{jj}$, then $R < \Phi_{ii}$ for all $i$.

The Vere-Jones classification of irreducible aperiodic matrices [545] is based on the behavior of the series $A_{ij}(z)$, $F_{ij}(z)$ for $z = R$. Vere-Jones originally distinguished the R-transient, null R-recurrent, and positive R-recurrent cases. The classification was later refined by Ruette in [481], adding the strongly positive R-recurrent case¹³. This is summarized in Table 8.1, which applies independently of the sites $i, j \in I$ for an irreducible matrix $A$; compare the last row of Table 8.1 and Proposition 8.64. We call the corresponding classes of matrices transient, null recurrent, weakly positive recurrent, and strongly positive recurrent.

Table 8.1. The Vere-Jones classification.

                             transient        null recurrent    weakly positive recurrent    strongly positive recurrent
$F_{ii}(R)$                  $< 1$            $= 1$             $= 1$                        $= 1$
$F'_{ii}(R)$                 $\leq \infty$    $= \infty$        $< \infty$                   $< \infty$
$A_{ij}(R)$                  $< \infty$       $= \infty$        $= \infty$                   $= \infty$
$\lim_n a^{(n)}_{ij} R^n$    $= 0$            $= 0$             $\in (0, \infty)$            $\in (0, \infty)$
for all $i$                  $R = \Phi_{ii}$  $R = \Phi_{ii}$   $R = \Phi_{ii}$              $R < \Phi_{ii}$

¹³ Gurevich & Savchenko called this stable positive recurrent (see [293, Definition 2.8]), but the term strongly positive recurrent stuck.

Remark 8.65. Note that $F'_{ii}(x) = \frac1x \sum_{n\geq 1} n f^{(n)}_{ii} x^n$, and several authors give the second line of this table in the form of $\sum_{n\geq 1} n f^{(n)}_{ii} R^n$.

In order to find out which box in Table 8.1 a matrix fits in, one can use
Salama’s criteria (see [481, 484]); they depend on whether the underlying
graph G can be enlarged/reduced (in the class of strongly connected directed
graphs) without changing the entropy.

Theorem 8.66. Let $G$ be an irreducible directed graph.
(i) $G$ is transient if and only if there is a graph $G' \supsetneq G$ such that $h_G(G') = h_G(G)$.
(ii) $G$ is strongly positive recurrent if and only if $h_G(G') < h_G(G)$ for every $G' \subsetneq G$.
(iii) $G$ is recurrent but not strongly positive recurrent if and only if there exists $G' \subsetneq G$ with $h_G(G') = h_G(G)$, but $h_G(G) < h_G(G')$ for every $G' \supsetneq G$.
Also $R < \Phi_{ii}$ for some (and hence every) vertex $i$ implies that $G$ is positive recurrent.

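Example 8.67. Let $G$ have vertex set $\mathbb{N}$ and directed edges $n+1 \to n$ and $1 \to n$ for each $n \in \mathbb{N}$. Then the truncated $n \times n$ transition matrix is
$$A_n = \begin{pmatrix} 1 & 1 & 1 & \cdots & \cdots & 1 \\ 1 & 0 & 0 & \cdots & \cdots & 0 \\ 0 & 1 & 0 & & & \vdots \\ \vdots & \ddots & \ddots & \ddots & & \vdots \\ \vdots & & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & \cdots & 0 & 1 & 0 \end{pmatrix},$$
so $\det(A_1 - \lambda I_1) = 1 - \lambda$, and
$$\begin{aligned} \det(A_n - \lambda I_n) &= (-\lambda)\det(A_{n-1} - \lambda I_{n-1}) - (-1)^n \\ &= (-1)^n(\lambda^n - \lambda^{n-1} - \lambda^{n-2} - \cdots - 1) \\ &= (-1)^n \frac{\lambda^{n+1} - 2\lambda^n + 1}{\lambda - 1}. \end{aligned}$$
Therefore the leading eigenvalues $\lambda_n$ of $A_n$ increase to $2$ as $n \to \infty$ (since $\lambda_n = 2 - \lambda_n^{-n}$), and the Gurevich entropy is $h_G(G) = \log 2$. We have
$$F_{11}(z) = \sum_{n\geq 1} z^n = \frac{z}{1-z} = 1 \quad \text{for } z = \frac12 = e^{-h_G(G)},$$
$$F'_{11}(z) = \sum_{n\geq 1} n z^{n-1} = \frac{1}{(1-z)^2} = 4 < \infty \quad \text{for } z = \frac12.$$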

Hence $G$ is positive recurrent. Removing any of the edges $1 \to n$ makes $G$ transient, because the Gurevich entropy does not change, but $F_{11}(1/2)$ becomes $< 1$. The classification of Theorem 8.66 shows that $G$ is weakly positive recurrent. Adding any edge to $G$ makes it strongly positive recurrent (and increases the Gurevich entropy).
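The convergence $\lambda_n \uparrow 2$ and the values of $F_{11}$ and $F'_{11}$ at $z = \frac12$ can be checked numerically. In the Python sketch below (helper names are illustrative), `truncated_matrix(n)` builds the displayed matrix $A_n$ (first row all ones, ones on the subdiagonal), its leading eigenvalue is estimated by power iteration, and the partial sums of $\sum_n z^n$ and $\sum_n n z^{n-1}$ at $z = \frac12$ approach $1$ and $4$.

```python
def truncated_matrix(n):
    """A_n: first row all ones (edges 1 -> k), ones on the subdiagonal (edges k+1 -> k)."""
    A = [[0] * n for _ in range(n)]
    A[0] = [1] * n
    for k in range(1, n):
        A[k][k - 1] = 1
    return A


def leading_eigenvalue(A, iters=500):
    """Power iteration; A_n is non-negative, irreducible and aperiodic."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = max(w)                  # v is normalized so that max(v) = 1
        v = [x / lam for x in w]
    return lam


lams = [leading_eigenvalue(truncated_matrix(n)) for n in range(1, 16)]
print(lams)                           # increasing towards 2

z = 0.5
F = sum(z ** n for n in range(1, 200))                 # partial sum of F_11(1/2)
dF = sum(n * z ** (n - 1) for n in range(1, 200))      # partial sum of F'_11(1/2)
print(F, dF)                                           # close to 1 and 4
```

For $n = 2$ the leading eigenvalue is the golden mean, and each $\lambda_n$ is a root of $\lambda^n - \lambda^{n-1} - \cdots - 1$, in line with the determinant computation above.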

8.7.2. Measures of Maximal Entropy for Countable Matrices. Non-compactness of the graph $G$ sometimes results in the non-existence of a measure of maximal entropy, even if the Gurevich entropy is finite. To illustrate this, we consider the graph and transition matrix in Figure 8.8, which is the purely combinatorial version of the standard symmetric random walk on $\mathbb{Z}$.
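[Figure 8.8, graph panel: the vertices …, −1, 0, 1, … of $\mathbb{Z}$, each joined to its two neighbours.]
$$\begin{pmatrix} \ddots & \ddots & \ddots & & & & \\ & 1 & 0 & 1 & & & \\ & & 1 & 0 & 1 & & \\ & & & 1 & 0 & 1 & \\ & & & & \ddots & \ddots & \ddots \end{pmatrix}$$
Figure 8.8. The transition graph and matrix for the symmetric random walk on $\mathbb{Z}$.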

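Since every vertex has exactly two outgoing (and two incoming) edges, the Gurevich entropy is $h_G(G) = \log 2$. In more detail, using the reflection principle (see Exercise 3.132) and Stirling's formula, we obtain the following for every $i \in \mathbb{Z}$ and even $n \in \mathbb{N}$:
$$\begin{aligned} f^{(n+2)}_{ii} &= 2\left[\binom{n}{n/2} - \binom{n}{(n-2)/2}\right] = 2\left[\frac{n!}{(n/2)!\,(n/2)!} - \frac{n!}{(n/2-1)!\,(n/2+1)!}\right] \\ &= \frac{2\,n!}{(n/2)!\,(n/2-1)!}\left(\frac{1}{n/2} - \frac{1}{n/2+1}\right) = \frac{4}{n+2}\binom{n}{n/2} \sim \sqrt{\frac{2}{\pi n}}\,\frac{2^{n+2}}{n+2}. \end{aligned}$$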

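Hence, for the radius of convergence $R = e^{-h_G(G)} = \frac12$, we find
$$F'_{ii}(R) = \frac1R \sum_{n\geq 0 \text{ even}} (n+2)\, f^{(n+2)}_{ii} R^{n+2} = \infty,$$
because by the above the terms $(n+2) f^{(n+2)}_{ii} R^{n+2}$ are asymptotic to $\sqrt{2/(\pi n)}$, whose sum diverges. Therefore $G$ is null recurrent.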
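The closed formula for $f^{(n+2)}_{ii}$ can be compared with a brute-force count, and partial sums illustrate $F_{ii}(R) = 1$ and $F'_{ii}(R) = \infty$ at $R = \frac12$. In the Python sketch below (helper names are illustrative), `partial_sums` returns partial sums of $F_{ii}(\frac12)$ and of $\sum_n n f^{(n)}_{ii} 2^{-n}$, which equals $\frac12 F'_{ii}(\frac12)$.

```python
import math


def first_return_formula(n):
    """f_ii^(n) = 4/(m+2) * binom(m, m/2) with m = n - 2, for even n >= 2; zero otherwise."""
    if n < 2 or n % 2:
        return 0
    m = n - 2
    return 4 * math.comb(m, m // 2) // (m + 2)


def first_return_brute(N):
    """Count n-paths on Z from 0 back to 0 avoiding 0 in between, n = 0, ..., N."""
    f = [0] * (N + 1)
    w = {0: 1}                       # position -> number of paths not yet returned
    for n in range(1, N + 1):
        new = {}
        for x, c in w.items():
            for y in (x - 1, x + 1):
                new[y] = new.get(y, 0) + c
        f[n] = new.pop(0, 0)         # paths arriving at 0 are first returns
        w = new
    return f


def partial_sums(N):
    """Partial sums of F_ii(1/2) and of sum_n n f_ii^(n) 2^(-n), up to n = N."""
    F = sum(first_return_formula(n) / 2 ** n for n in range(2, N + 1, 2))
    G = sum(n * first_return_formula(n) / 2 ** n for n in range(2, N + 1, 2))
    return F, G


print(partial_sums(500), partial_sums(2000))   # F creeps up to 1, G keeps growing
```

The first partial sum approaches $1$ only slowly (the missing mass decays like $N^{-1/2}$), while the second grows like $\sqrt{N}$, which is the divergence of $F'_{ii}(R)$.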
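Next, if we truncate $G$ to $G_N$ consisting of states $-N, \dots, N$, then there is a Markov measure $\mu_N$ obtained by setting
$$p_i = \begin{cases} \frac{1}{2N}, & i = -(N-1), \dots, N-1, \\ \frac{1}{4N}, & i = -N, N, \end{cases} \qquad p_{ij} = \begin{cases} 1, & i = \pm N,\ j = \pm(N-1) \text{ (same sign)}, \\ \frac12, & |i-j| = 1,\ i \neq \pm N, \\ 0, & \text{otherwise}. \end{cases}$$
This would have been the Shannon-Parry measure if $a_{-N,-N+1} = a_{N,N-1} = 2$ instead of $1$. As it is, we find the measure-theoretic entropy w.r.t. the partition $\mathcal{P} = \{[-N], \dots, [N]\}$ as
$$\begin{aligned} h(\mu_N, \mathcal{P}) &= \lim_{n\to\infty} \frac1n H_{\mu_N}\Big(\bigvee_{k=0}^{n-1} \sigma^{-k}\mathcal{P}\Big) \\ &= \lim_{n\to\infty} -\frac1n \sum_{i_0, \dots, i_{n-1} = -N}^{N} \mu_N([i_0 \dots i_{n-1}]) \log \mu_N([i_0 \dots i_{n-1}]) \\ &= \lim_{n\to\infty} -\frac1n \sum_{i_0=-N}^{N} p_{i_0} \sum_{i_1, \dots, i_{n-1}} 2^{-\#\{1 \leq k < n : p_{i_{k-1} i_k} = \frac12\}} \Big(\log p_{i_0} - \#\{1 \leq k < n : p_{i_{k-1} i_k} = \tfrac12\} \log 2\Big) \\ &= \lim_{n\to\infty} \frac1n \sum_{i_0=-N}^{N} p_{i_0} \sum_{i_1, \dots, i_{n-1}} 2^{-\#\{0 \leq k < n-1 : i_k \neq \pm N\}} \Big(-\log p_{i_0} + \#\{0 \leq k < n-1 : i_k \neq \pm N\} \log 2\Big), \end{aligned}$$
where the inner sums run over admissible paths $i_0 \to i_1 \to \cdots \to i_{n-1}$ in $G_N$.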

By the Birkhoff Ergodic Theorem 6.13, $\mu_N$-a.e. $n$-path satisfies $\#\{0 \leq k < n-1 : i_k \neq \pm N\} \sim \frac{2N-1}{2N}\, n$. Therefore $h(\mu_N, \mathcal{P}) = \frac{2N-1}{2N}\log 2$. The partition $\mathcal{P}$ obviously generates the truncated path space, so $h(\mu_N) = \frac{2N-1}{2N}\log 2$, and hence $h_G(G_N) \geq \frac{2N-1}{2N}\log 2 \to \log 2 = h_G(G)$ as $N \to \infty$. However, in the weak topology, $\mu_N$ doesn't converge, and in the vague topology (i.e. the weak∗ topology restricted to compact subsets), $\mu_N$ converges to the zero-measure. The entire system $G$ has no measure of maximal entropy. Indeed, regarding the existence of measures of maximal entropy and intrinsic ergodicity, Gurevich [292] proved the following:
Theorem 8.68. Let $G$ be a transition graph with positive Gurevich entropy. There exists an invariant Borel probability measure $\mu$ on $G$ such that $h_\mu(G) = h_G(G)$ if and only if $G$ is positive recurrent.

Apart from graphs failing to have a measure of maximal entropy, upper semi-continuity of the entropy function $\mu \mapsto h(\mu)$ can fail as well. For instance, in the above example, $\frac12(\mu_N + \mu_1) \to \frac12\mu_1$ in the vague topology¹⁴ and also¹⁵ on cylinders, and the measure-theoretic entropies $h(\frac12(\mu_N + \mu_1)) \to \frac34\log 2$, which is larger than $h(\frac12\mu_1) = \frac14\log 2$. Here we adopt the convention that $h(\mu) = \mu(X)\,h(\mu/\mu(X))$ for finite non-zero measures.
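The entropy values in this example can be reproduced from the standard formula $h(\mu) = -\sum_i p_i \sum_j p_{ij}\log p_{ij}$ for a Markov measure. The Python sketch below (helper names are illustrative) computes $h(\mu_N) = \frac{2N-1}{2N}\log 2$. It compares this with the logarithm of the leading eigenvalue $2\cos(\pi/(2N+2))$ of the truncated $(2N+1)$-state path graph; that eigenvalue is certified by the positive eigenvector $(\sin(k\pi/(2N+2)))_{k=1}^{2N+1}$. Using affinity of the entropy and the convention $h(\mu) = \mu(X)h(\mu/\mu(X))$, it also evaluates $h(\frac12(\mu_N+\mu_1)) = \frac12 h(\mu_N) + \frac12 h(\mu_1)$ and $h(\frac12\mu_1) = \frac12 h(\mu_1)$.

```python
import math


def mu_N_entropy(N):
    """Entropy of the Markov measure mu_N on states -N..N (stationary vector p, matrix P)."""
    h = 0.0
    for i in range(-N, N + 1):
        p_i = 1 / (4 * N) if abs(i) == N else 1 / (2 * N)
        row = [1.0] if abs(i) == N else [0.5, 0.5]   # transition probabilities out of i
        h -= p_i * sum(q * math.log(q) for q in row)
    return h


def path_graph_perron(N):
    """Leading eigenvalue 2 cos(pi/(2N+2)) and positive eigenvector of the (2N+1)-state path."""
    theta = math.pi / (2 * N + 2)
    return 2 * math.cos(theta), [math.sin(k * theta) for k in range(1, 2 * N + 2)]


for N in (1, 2, 5, 50):
    lam, _ = path_graph_perron(N)
    print(N, mu_N_entropy(N), math.log(lam))

h1 = mu_N_entropy(1)
print(0.5 * mu_N_entropy(1000) + 0.5 * h1, 0.5 * h1)   # approx (3/4) log 2 and (1/4) log 2
```

For $N = 1$ the two entropies agree ($\frac12\log 2$), and for all $N$ the value $h(\mu_N)$ stays below $\log 2\cos(\pi/(2N+2))$, as the variational principle on the finite graph $G_N$ requires.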

¹⁴ If a sequence of probability measures $\mu_n$ converges to $\mu$ in the weak∗ topology, which in particular implies that $\mu$ is also a probability measure, then $h(\mu) \geq \limsup_n h(\mu_n)$, so for this type of convergence, upper semi-continuity of the entropy function holds.
¹⁵ In fact equivalently, because $\sum_j a_{ij} < \infty$ for all $i$.

Not only mass but also entropy “escapes to infinity”, and one would like to quantify how much entropy is carried “at infinity”, i.e. outside every compact subgraph. This is addressed in papers by Buzzi [136] and Iommi, Todd & Velozo [327].

Definition 8.69. Let $G$ be a countable Markov graph on the state space $\mathbb{N}$ and let $M_1(\sigma)$ be its collection of shift-invariant probability measures. Define (see [136, Definition 1.13] and also [481])
$$b_\infty := \inf_F \inf_{\varepsilon > 0} \sup\{h(\mu) : \mu \in M_1(\sigma),\ \mu(F) < \varepsilon\},$$
where the first infimum is taken over all finite subgraphs $F$ of $G$.

Let $z_n(\varepsilon, q) = \#\{[x_1 \cdots x_n] : x_1, x_n \leq q,\ \frac1n \#\{i : x_i \leq q\} < \varepsilon\}$ denote the number of $n$-cylinders that, whilst starting and ending in states $\leq q$, spend only an $\varepsilon$-fraction of their time in states $\leq q$. Define (see [327, Definition …])
$$\delta_\infty := \liminf_{q\to\infty} \liminf_{\varepsilon\to 0} \lim_{n\to\infty} \frac1n \log z_n(\varepsilon, q).$$

Finally, the measure-theoretic entropy at infinity is
$$h_\infty := \sup\{\limsup_n h(\mu_n) : M_1 \supset (\mu_n) \to 0 \text{ on cylinders}\}.$$
Since it was shown in [327, Theorem 1.4] that these three quantities coincide, we can call them the entropy at infinity of $G$.
It was shown in [294] and [293, Theorem 3.8] that $G$ is strongly positive recurrent if and only if $\delta_\infty < h_G(G)$. The entropy at infinity $\delta_\infty$ is precisely the defect that may occur in upper semi-continuity of the entropy function; see [327, Theorem 1.1].

Theorem 8.70. Let $G$ be a countable directed graph with path space $X_G$ and finite Gurevich entropy. Let $(\mu_n)_{n\in\mathbb{N}}$ be a sequence of probability measures that converge to $\mu$ on cylinders (i.e. $\mu_n([Z]) \to \mu([Z])$ for every cylinder set $[Z]$). Then
$$\limsup_n h(\mu_n) \leq \mu(X_G)\, h\big(\mu/\mu(X_G)\big) + \big(1 - \mu(X_G)\big)\, \delta_\infty.$$

It is important to use the topology of convergence on cylinders here (rather than the weak∗ topology) because it allows limit measures in $M_{\leq 1}$, i.e. shift-invariant measures with $\mu(X_G) \in [0,1]$. If $G$ is irreducible and has the property that for every state $a \in I$ and $n \in \mathbb{N}$, there are only finitely many paths of length $n$ ending in $a$, then $M_{\leq 1}$ is in fact so large that it is homeomorphic to the Poulsen simplex $\Sigma$ (i.e. equidistributions are dense in $\Sigma$). In addition [327, Theorem 8.7], if $G$ has finite Gurevich entropy, then $G$ is entropy dense.
8.7.3. Graphs and Romes. In this section, we discuss some techniques presented in [88] to compute characteristic polynomials and leading eigenvalues (hence entropy) of transition graphs. This gives an alternative proof (via Theorem 8.73) of parts of Theorems 3.77 and 3.114.
Definition 8.71. A subgraph $R$ of a directed graph $G$ is called a rome if it is connected and every infinite path in $G$ has at least one vertex in common with $R$.
The name rome¹⁶ was coined by Misiurewicz, since all roads lead to Rome. Clearly $G$ itself is a rome, as is every connected subgraph of $G$ that contains a rome $R$.
Let $B = (b_{i,j})_{i,j=1}^n$ be the transition matrix associated to $G$, so we enumerate the vertices of $G$ as $\{1, \dots, n\}$. A simple (i.e. without self-intersections) path $p$ of length $l(p)$ is given by $i = i_0 \to i_1 \to \cdots \to i_{l(p)} = j$, where $i, j \in R$, but the intermediate vertices belong to $G \setminus R$. Let $w(p) = \prod_{k=1}^{l(p)} b_{i_{k-1}, i_k}$ be the weight of $p$. The rome matrix $A_{\mathrm{rome}}(x) = (a_{i,j}(x))$, where $i, j$ run over the vertices of $R$, is given by
$$a_{i,j}(x) = \sum_p w(p)\, x^{1-l(p)},$$
where the sum runs over all simple paths $p$ as above. (Note that with the convention that $x^0 = 1$ for $x = 0$, $A_{\mathrm{rome}}(0)$ reduces to the weighted transition matrix of the rome $R$.) The result from [88, Theorem 1.7]¹⁷ is:
Theorem 8.72. Let $G$ be a transition graph containing a rome $R$, of cardinalities $n$ and $r$, respectively. The characteristic polynomial of its associated matrix $B$ is equal to
$$\det(B - xI_n) = (-x)^{n-r} \det(A_{\mathrm{rome}}(x) - xI_r), \tag{8.32}$$
where $I_n$ and $I_r$ are the identity matrices of the appropriate dimensions.

Proof. Clearly $G$ itself is a rome of $G$, and in this case all the path lengths are $l(p) = 1$ and $a_{i,j}(x) = b_{i,j}$, so $\det(A_{\mathrm{rome}}(x) - xI_r) = \det(B - xI_n)$ holds. Now we argue by induction, decreasing the number of vertices in steps of one until we get down to $R$. Recall that the graphs of the intermediate steps are all romes by themselves.
Let $S = \{s_1, s_2, \dots, s_k\}$ be the vertex set of such an intermediate rome, and let $S' = \{s_0\} \cup S$ be the vertex set of the rome at the previous step. Set $A_{S'} - xI_{k+1} = (u_{i,j})_{i,j=0}^k$ and $A_S - xI_k = (v_{i,j})_{i,j=1}^k$, where $A_{S'}$ and $A_S$ are the rome matrices as in (8.32). Since there is no loop $s_0 \to s_0$, we have $u_{0,0} = -x$. Therefore, if we add the first column of $A_{S'} - xI_{k+1}$ multiplied by $u_{0,j}/x$ to the $j$-th column for $j = 1, \dots, k$, then we obtain a matrix $(\tilde u_{i,j})_{i,j=0}^k$ where $\tilde u_{0,0} = -x$, $\tilde u_{0,j} = 0$, and $\tilde u_{i,j} = v_{i,j}$ for all $i, j \in \{1, \dots, k\}$. Thus $\det(A_{S'} - xI_{k+1}) = -x\det(A_S - xI_k)$, and this completes the induction. ∎

¹⁶ The original definition in [88] says that there are no loops disjoint from $R$, but that is for finite graphs $G$. Since we don't accept e.g. $1 \rightleftarrows 2$ as a rome in $1 \rightleftarrows 2 \to 3 \to 4 \to \cdots$, the statement with infinite paths is the correct generalization to infinite connected graphs and is in agreement with the adage that all roads lead to Rome.
¹⁷ Put in the form used in [116, Theorem 9.3.13].
The following corollary follows from Vere-Jones [545] and is covered by Ruette [481]. It has been mentioned independently, without (full) proof, in [457, 555]. In [398, page 117] it is left as an exercise for the reader. Further proofs are in [522, Section 7] and [181] (but only in specific cases). Pavlov [450] applies the result to coded shifts; see Section 3.3.

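Theorem 8.73. Let the directed graph $G$ consist of a vertex $v_0$ from which $q_\ell$ loops of length $\ell$ emerge. Let $H_* := \limsup_n \frac1n \log \#\{\text{closed } n\text{-paths in } G\}$. Then $e^{H_*}$ is the positive solution of the equation
$$1 = \sum_{\ell\geq 1} q_\ell\, x^{-\ell},$$
if a finite solution exists. Moreover,
$$H_* \geq Q := \limsup_{\ell\to\infty} \frac1\ell \log q_\ell = -\log\Big(\text{radius of convergence of } \sum_\ell q_\ell z^\ell\Big).$$
If $\sum_{\ell\geq 1} q_\ell e^{-Q\ell} < 1$, then $H_* = Q$. (In particular, if $Q = \infty$, then $H_* = \infty$.)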

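Proof. First assume that $G$ is finite, i.e. $q_\ell = 0$ for $\ell$ sufficiently large. For the rome $R = \{v_0\}$ we have the rome matrix $A_{\mathrm{rome}}(x) = \sum_\ell q_\ell x^{1-\ell}$. Therefore Theorem 8.72 gives
$$\det(B - xI) = (-x)^{\#G - 1}\Big(\sum_\ell q_\ell x^{1-\ell} - x\Big) = (-x)^{\#G}\Big(1 - \sum_\ell q_\ell x^{-\ell}\Big).$$
Equating this to zero, we get $1 = \sum_\ell q_\ell x^{-\ell}$, as required. Thus $x$ is the leading eigenvalue of the corresponding transition matrix $B$, and this is also equal to $e^{H_*}$.
In the case that $q_\ell > 0$ infinitely often, write $x_N$ for the positive solution of $1 = \sum_{\ell=1}^N q_\ell x^{-\ell}$. Then $x_N$ is increasing in $N$; set $x_\infty = \lim_N x_N \in [1, \infty]$. If there is a finite solution $x_*$ to $1 = \sum_\ell q_\ell x^{-\ell}$, then $x_* = x_\infty$, because both of the following alternatives lead to a contradiction:
• If $x_\infty < x_* - 2\varepsilon$ for some positive $\varepsilon$, then there is $N$ such that $\sum_{\ell=1}^N q_\ell x_N^{-\ell} \geq \sum_{\ell=1}^N q_\ell (x_* - \varepsilon)^{-\ell} > 1$, contradicting the definition of $x_N$.
• If $x_\infty > x_* + 2\varepsilon$ for some positive $\varepsilon$, then there is $N$ such that $x_N > x_* + \varepsilon$ and $\sum_{\ell=1}^\infty q_\ell x_*^{-\ell} > \sum_{\ell=1}^N q_\ell x_N^{-\ell} = 1$, contradicting the definition of $x_*$.
Let $G_N$ be the subgraph of $G$ consisting of loops of length $\leq N$ and define
$$H_N := \lim_{n\to\infty} \frac1n \log \#\{\text{closed } n\text{-paths in } G_N\}.$$
The existence of the limit $H_N$ follows from Fekete's Lemma 1.15 because $\#\{\text{closed } n\text{-paths in } G_N\}$ is supermultiplicative in $n$. Clearly $H_N$ is non-decreasing in $N \in \mathbb{N}$; set $H_\infty = \lim_{N\to\infty} H_N \in [0, \infty]$. Since $H_* \geq H_N$ for all $N$, also $H_* \geq H_\infty$. The above argument in the finite graph case shows that $x_N = e^{H_N}$, and therefore $x_\infty = e^{H_\infty} \leq e^{H_*}$.
Next assume $x_\infty < \infty$. Assume by contradiction that $e^{H_*} > x_\infty$. Then there is $N$ such that $\log x_\infty < \frac1N \log K$ for $K := \#\{\text{closed } N\text{-paths in } G\}$. There are also $K$ closed $N$-paths in the graph $G_N$, and $\#\{\text{closed } rN\text{-paths in } G_N\} \geq K^r$ for every $r \in \mathbb{N}$. Therefore $\log x_N = H_N \geq \frac1N \log K > \log x_\infty$, and this contradicts the monotonicity $x_N \nearrow x_\infty$.
We have $H_* \geq \limsup_\ell \frac1\ell \log q_\ell = Q$, because $\#\{\text{closed } \ell\text{-paths in } G\} \geq q_\ell$. This covers the case $Q = \infty$ with $H_* = \infty$ as well.
Assume that $Q < \infty$ and $x_*$ doesn't exist because $\sum_\ell q_\ell x_\infty^{-\ell} < 1$. If $H_* > Q$, then we can find $N$ such that $H_* \geq H_N > Q$, and similarly as in the above argument, $x_N = e^{H_N} > e^Q$. However, then $1 = \sum_{\ell=1}^N q_\ell x_N^{-\ell} < \sum_\ell q_\ell e^{-Q\ell}$, which contradicts that $\sum_\ell q_\ell e^{-Q\ell} < 1$. Therefore $H_* = Q = \log x_\infty$. ∎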
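For a finite loop system, Theorem 8.73 identifies $e^{H_*}$ with the positive root of $1 = \sum_\ell q_\ell x^{-\ell}$. The Python sketch below compares this root, found by bisection, with the growth rate of the number $c_n$ of closed $n$-paths at $v_0$. These counts satisfy the renewal recursion $c_n = \sum_\ell q_\ell c_{n-\ell}$ with $c_0 = 1$, since closed paths at $v_0$ are concatenations of loops. The loop counts `q` are a hypothetical example, and the function names are illustrative.

```python
def loop_root(q, tol=1e-13):
    """Positive root of 1 = sum_l q[l] x^(-l) (q[0] is ignored), by bisection on [1, 1 + sum q]."""
    g = lambda x: sum(ql * x ** (-l) for l, ql in enumerate(q) if l >= 1) - 1
    lo, hi = 1.0, 1.0 + sum(q[1:])
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def closed_path_counts(q, n_max):
    """c[n] = number of closed n-paths at v_0: c_0 = 1, c_n = sum_l q_l c_{n-l}."""
    c = [1] + [0] * n_max
    for n in range(1, n_max + 1):
        c[n] = sum(q[l] * c[n - l] for l in range(1, min(n, len(q) - 1) + 1))
    return c


q = [0, 1, 2, 0, 3]          # hypothetical: one loop of length 1, two of length 2, three of length 4
x = loop_root(q)
c = closed_path_counts(q, 300)
print(x, c[300] / c[299])     # the ratio approaches the root
```

Python integers keep the counts $c_n$ exact, so the ratio $c_{300}/c_{299}$ is a reliable estimate of $e^{H_*}$ here; the loop of length 1 makes the system aperiodic, which is why the ratio (and not just $c_n^{1/n}$) converges.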

Remark 8.74. The fact that $x_\infty = e^Q$ if $\sum_\ell q_\ell x_\infty^{-\ell} < 1$ also confirms that $F_{ii}(R) \leq 1$ in the Vere-Jones classification (first line in Table 8.1); in this case $R = e^{-Q}$ is the radius of convergence of $\sum_\ell q_\ell z^\ell$. If $1 = \sum_\ell q_\ell x_\infty^{-\ell} < \sum_\ell q_\ell e^{-Q\ell}$, then $G$ is positive recurrent (cf. the last statement of Theorem 8.66), and hence it has a unique measure of maximal entropy by Theorem 8.68. If $1 = \sum_\ell q_\ell x_\infty^{-\ell} = \sum_\ell q_\ell e^{-Q\ell}$, then more information on $(q_\ell)_{\ell\in\mathbb{N}}$ is required to decide whether $G$ is positive recurrent or not. Ruette [481, Example 2.9] and Pavlov [450, Section 5] give examples illustrating this distinction.

Solutions to Exercises

Solution to Exercise 1.24: Take π([00]) = 0, π([10]) = π([01]) = 1.

Solution to Exercise 1.26: Since $(Y, \sigma)$ is a factor of $(X, \sigma)$, we have $h_{top}(Y, \sigma) \leq h_{top}(X, \sigma)$; see Corollary 2.51. For the other inequality, assume that the sliding block code has window length $2N+1$; then
$$\begin{aligned} p_Y(n) &\geq p_X(n+2N), \\ h_{top}(Y, \sigma) &= \lim_{n\to\infty} \frac1n \log p_Y(n) \geq \lim_{n\to\infty} \frac1n \log p_X(n+2N) \\ &= \lim_{n\to\infty} \Big(\frac{n+2N}{n} \cdot \frac{1}{n+2N} \log p_X(n+2N) - \frac1n \log k\Big) \\ &= \lim_{n\to\infty} \frac{1}{n+2N} \log p_X(n+2N) = h_{top}(X, \sigma). \end{aligned}$$
Therefore the entropies are the same.

Solution to Exercise 1.37: Let XFib be the space of the one-sided Fi-
bonacci SFT. Then Σ = XFib \{x110 : x ∈ L(XFib )} and Σ = XFib ∪{x1001 :
x ∈ L(XFib )}.
Solution to Exercise 2.7: Take the one-point compactification $X = \mathbb{N} \cup \{\infty\}$ and set $Y = X \setminus \{1\}$, both equipped with the map
$$f(n) = \begin{cases} (n+1)/2 & \text{if } n+1 \text{ is a power of } 2, \\ \infty & \text{if } n = \infty, \\ n+1 & \text{otherwise}. \end{cases}$$
Then $n$ is $2^k$-periodic if $2^k \leq n < 2^{k+1}$, and $f|_X$ has two fixed points, $1$ and $\infty$, but $f|_Y$ has only one, so they are not conjugate. There is a factor map $\pi : X \to Y$ given by $\pi(1) = \pi(\infty) = \infty$ and $\pi(n) = n$ otherwise. The reverse factor map $\tilde\pi : Y \to X$ can be taken as $\tilde\pi(\infty) = \infty$ and $\tilde\pi(n) = 2^{k-1} + (n \bmod 2^{k-1})$ if $2^k \leq n < 2^{k+1}$ and $k \geq 1$ (halving the period of each $n < \infty$).
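The maps in this solution can be checked on an initial segment of $\mathbb{N}$. In the Python sketch below, the point $\infty$ is represented by `float("inf")` (an implementation choice, not from the text). The checks confirm the periods $2^k$, the equivariance relations $\pi \circ f = f \circ \pi$ and $\tilde\pi \circ f = f \circ \tilde\pi$, and the fixed points of $f$ on $X$ and on $Y$.

```python
INF = float("inf")   # stands for the point at infinity


def f(n):
    if n == INF:
        return INF
    if (n + 1) & n == 0:                 # n + 1 is a power of 2
        return (n + 1) // 2
    return n + 1


def pi(n):
    """Factor map X -> Y."""
    return INF if n in (1, INF) else n


def pi_tilde(n):
    """Factor map Y -> X, for n >= 2 or n = INF."""
    if n == INF:
        return INF
    k = n.bit_length() - 1               # 2^k <= n < 2^(k+1)
    return 2 ** (k - 1) + n % 2 ** (k - 1)


def period(n):
    m, p = f(n), 1
    while m != n:
        m, p = f(m), p + 1
    return p


print([period(n) for n in range(1, 16)])   # [1, 2, 2, 4, 4, 4, 4, 8, 8, 8, 8, 8, 8, 8, 8]
```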
Solution to Exercise 2.28: If $(X, T)$ is minimal, then it is of course transitive. Suppose it is transitive but not minimal. Let $x$ have a dense orbit and $y$ a non-dense orbit. Then there are $\varepsilon > 0$ and $z \in X$ such that $d(z, \mathrm{orb}(y)) > 2\varepsilon$. Take $\delta > 0$ so small that $d(u, v) < \delta$ implies $d(T^n u, T^n v) < \varepsilon$ for all $n \geq 1$. Since $\overline{\mathrm{orb}(x)} = X$, there are $0 \leq m < n$ such that $d(T^m x, y) < \delta$ and $d(T^n x, z) < \varepsilon$. Then $d(T^{n-m} y, z) \leq d(T^{n-m} y, T^n x) + d(T^n x, z) < \varepsilon + \varepsilon = 2\varepsilon$, contradicting the choice of $z$ and $\varepsilon$.
Solution to Exercise 2.47: First note that $p_X(m+n) \leq p_X(m)\, p_X(n)$, so $\log p_X(n)$ is subadditive. Thus by Fekete's Lemma 1.15,
$$\lim_n \frac1n \log p_X(n) = \inf_n \frac1n \log p_X(n)$$
exists. Next take $\varepsilon > 0$ arbitrary and $N \in \mathbb{N}$ such that $2^{-N} < \varepsilon$. Then every $(n+N)$-cylinder is an $(n, \varepsilon)$-ball, and we need exactly $p_X(n+N)$ of them to cover the space. Therefore, writing $m = n+N$, $h_{top}(\sigma) = \lim_\varepsilon \lim_n \frac1n \log p_X(n+N) = \lim_m \frac1m \log p_X(m)$.
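As an illustration of this argument, the Python sketch below takes $X$ to be the golden mean shift (no two consecutive 1s), a hypothetical choice of example. It counts $p_X(n)$ by a transfer recursion, checks the submultiplicativity $p_X(m+n) \leq p_X(m)p_X(n)$, and shows $\frac1n \log p_X(n)$ approaching $\log\frac{1+\sqrt5}{2}$ from above, as the infimum in Fekete's Lemma suggests.

```python
import math


def golden_mean_complexity(n_max):
    """p[n] = number of words of length n with no '11', for n = 1, ..., n_max (p[0] unused)."""
    end0, end1 = 1, 1                 # words of length 1 ending in 0 and in 1
    p = [None, 2]
    for _ in range(2, n_max + 1):
        end0, end1 = end0 + end1, end0
        p.append(end0 + end1)
    return p


p = golden_mean_complexity(60)
rates = [math.log(p[n]) / n for n in range(1, 61)]
print(rates[:5], rates[-1], math.log((1 + 5 ** 0.5) / 2))
```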

Solution to Exercise 3.12: Since $M$ is invertible and $M^{-1}$ is an integer matrix, the orbits of points with rational coordinates (say with common denominator $q$) are periodic, because $T_M$ permutes the finitely many points with denominator $q$. In computer graphics, all pixels have rational coordinates, so after iterating $T_M$ sufficiently often, all these pixels are back where they started, and the original picture reappears; see https://www. The problem sparked further results; see e.g. [232] and [528] and references therein.
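A small experiment in this spirit: the Python sketch below takes the hypothetical choice $M = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}$ (Arnold's cat map, $\det M = 1$) acting on a $q \times q$ grid of pixels, i.e. on $(\mathbb{Z}/q\mathbb{Z})^2$. It computes the number of iterations after which every pixel is back in place, which is the order of $M$ modulo $q$.

```python
def mat_mult_mod(P, Q, q):
    return [[(P[i][0] * Q[0][j] + P[i][1] * Q[1][j]) % q for j in range(2)] for i in range(2)]


def picture_period(M, q):
    """Smallest n >= 1 with M^n = I mod q: after n steps every pixel of the q x q grid is back."""
    identity = [[1 % q, 0], [0, 1 % q]]
    P, n = [[x % q for x in row] for row in M], 1
    while P != identity:
        P, n = mat_mult_mod(P, M, q), n + 1
    return n


cat = [[2, 1], [1, 1]]
print([picture_period(cat, q) for q in (2, 3, 5, 10, 101)])
```

The loop terminates because $M$ is invertible modulo $q$ and therefore has finite order in the finite group $GL_2(\mathbb{Z}/q\mathbb{Z})$.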
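Solution to Exercise 3.20: The transition matrices are
$$A = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad \hat A = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}.$$
Also
$$D = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix} \quad \text{and} \quad C = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix},$$
so this time $C$ is the rectangular “diagonal” matrix. Now check that $DC = A$ and $CD = \hat A$.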
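The matrix identities can be verified mechanically. The Python sketch below also estimates the leading eigenvalues of $A$ and $\hat A$ by power iteration, illustrating that they coincide; here both equal $2$, since $\det(A - \lambda I) = \lambda^2(2 - \lambda)$. The helper names are illustrative.

```python
def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def leading_eigenvalue(A, iters=200):
    """Power iteration for a non-negative irreducible matrix."""
    v = [1.0] * len(A)
    lam = 0.0
    for _ in range(iters):
        w = [sum(a * x for a, x in zip(row, v)) for row in A]
        lam = max(w)                  # v is normalized so that max(v) = 1
        v = [x / lam for x in w]
    return lam


A = [[1, 1, 1], [0, 1, 1], [1, 0, 0]]
A_hat = [[1, 0, 1, 1], [1, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 0]]
D = [[1, 0, 1, 1], [0, 0, 1, 1], [0, 1, 0, 0]]
C = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]]

print(matmul(D, C) == A, matmul(C, D) == A_hat)          # True True
print(leading_eigenvalue(A), leading_eigenvalue(A_hat))
```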

Solution to Exercise 3.23: To show that $\approx$ is an equivalence relation:
Reflexivity follows because $A = AI = IA = A$.
Transitivity follows from concatenating the chains in (3.2).
Symmetry follows from reversing the chains in (3.2), i.e. swapping the roles of $C_i$ and $D_i$.
Regarding the leading eigenvalues, if $A = DC$ and $CD = \hat A$, then $A^n = (DC)^n = D(CD)^{n-1}C = D\hat A^{n-1}C$, and likewise $\hat A^n = C A^{n-1} D$. Since the maximal entries of $A^n$ grow at the exponential rate $\lambda$, the leading eigenvalue of $A$ (see the Perron-Frobenius Theorem 8.58), and similarly for $\hat A$ and $\hat\lambda$, we obtain $\lambda \leq \hat\lambda$ and $\hat\lambda \leq \lambda$, so $\lambda = \hat\lambda$. This equality propagates through the chain.
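Solution to Exercise 3.26: (i) If (3.3) holds, then $A^{\ell+1} = (AC)D$ and $D(AC) = DC\hat A = \hat A^{\ell+1}$. Also $A(AC) = (AC)\hat A$, and $\hat A D = DA$ as before.
(ii) Reflexivity: $A^\ell = A^\ell I = I A^\ell$. Symmetry: Swap the roles of $C$ and $D$. Transitivity: If $A \sim_\ell \hat A$ via $C, D$ and $\hat A \sim_{\hat\ell} \hat{\hat A}$ via $\hat C, \hat D$, then $(C\hat C)(\hat D D) = C\hat A^{\hat\ell} D = CDA^{\hat\ell} = A^{\ell+\hat\ell}$ and $(\hat D D)(C\hat C) = \hat D \hat A^{\ell} \hat C = \hat{\hat A}^{\ell} \hat D\hat C = \hat{\hat A}^{\ell+\hat\ell}$. Also $AC\hat C = C\hat A\hat C = C\hat C\hat{\hat A}$ and $\hat D D A = \hat D\hat A D = \hat{\hat A}\hat D D$.
(iii) If $A \approx \hat A$ with lag $\ell = 1$, then the first half of (3.3) implies (3.2), whereas the second half is an immediate consequence of the first. Now for general lag $\ell$: if $A \approx \hat A$ with chain $C_i, D_i$, $i = 1, \dots, \ell$, and intermediate matrices $A_i$, then $A_{i-1} \approx A_i$ with lag 1 and hence $A_{i-1} \sim_1 A_i$ for $i = 1, \dots, \ell$. By the proof of the transitivity in part (ii), we get $A = A_0 \sim_\ell A_\ell = \hat A$.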
Solution to Exercise 3.80: We have $\theta(x)_0 = +1$ for all $x$, because the identity preserves orientation. Then $i(x)_k = g(\theta(x)_k, \theta(x)_{k+1})$ for $g(+1,-1) = g(-1,+1) = 1$ and $g(+1,+1) = g(-1,-1) = 0$. However, $\sigma \circ \theta = \theta \circ f$, and there is no inverse sliding block code transforming $i(x)$ into $\theta(x)$. Indeed, whether $\theta(x)_k = \pm 1$, i.e. whether $f^k$ is increasing/decreasing at $x$, does not only depend on the positions $i(x)_j$, $k - N \leq j \leq k + N$, for a fixed block size, but on all positions $i(x)_j$, $0 \leq j < k$.
Solution to Exercise 3.87: With respect to θ, we have ρ(m) =
min{n > m : θn = θn−m }.
Solution to Exercise 3.91: We prove this by induction on $k$. For $\kappa := \min\{j \geq 2 : \nu_j = 1\}$, all positions $\leq \kappa+1$ are cutting times, so there is nothing to prove. For $k \geq \kappa+2$, since $\nu_{S_{k-1}+1} \cdots \nu_n \cdots \nu_{S_k} = \nu_1 \cdots \nu_{n-S_{k-1}} \cdots \nu_{S_{Q(k)}}$, we have $\rho(n - S_{k-1}) \geq S_{Q(k)}$ whenever $\rho(n) \geq S_k$. By the induction hypothesis, $S_k - n = S_{Q(k)} - (n - S_{k-1})$ is a cutting time, as required.

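Solution to Exercise 3.98:
(1) Since the kneading sequence of $f_a$ is $\nu = 10{*}10{*}10{*}\cdots$, we get $\theta(t) = 1, -1, -1, 1, -1, -1, \dots$ and $D_{f_a}(t) = \frac{1 - t - t^2}{1 - t^3}$ by Corollary 3.96.
(2) By (3.34) we have
$$1 + \gamma(t) = 1 + (1-t)^{-1} D_{f_a}(t)^{-1} = 1 + \frac{1 - t^3}{(1-t)(1-t-t^2)} = \frac{2}{1-t-t^2}.$$
Developing the power series we obtain
$$\frac{1}{1-t-t^2} = \sum_{n=0}^\infty (t + t^2)^n = \sum_{n=0}^\infty t^n \sum_{j=0}^n \binom{n}{j} t^j = \sum_{n=0}^\infty \Big(\sum_{j=0}^{\lfloor n/2 \rfloor} \binom{n-j}{j}\Big) t^n = \sum_{n=0}^\infty F_n t^n,$$
so that $1 + \gamma(t) = \sum_{n\geq 0} 2F_n t^n$, where $F_n$ are the Fibonacci numbers with $F_0 = F_1 = 1$, and the last step can be done using an induction proof on Pascal's triangle.
(3) Since $F_n$ grows like $\gamma^n$ for the golden ratio $\gamma = \frac12(1+\sqrt5)$, the radius of convergence is $\gamma^{-1}$ and $h_{top}(f_a) = \log \frac12(1+\sqrt5)$. An easier proof follows from the fact that $f_a$ is Markov with transition matrix $\begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.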
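The power-series identities in parts (1) and (2) can be checked with exact integer coefficients. The Python sketch below (helper names are illustrative) inverts the series $\sum_n \theta_n t^n$, multiplies by $(1-t)^{-1}$ and adds $1$, then compares the result with $\sum_n 2F_n t^n$. It also checks the diagonal sums of Pascal's triangle.

```python
from math import comb


def series_inverse(a, N):
    """Coefficients of 1/a(t) up to t^N, assuming a[0] = 1 (exact integer arithmetic)."""
    b = [1] + [0] * N
    for n in range(1, N + 1):
        b[n] = -sum(a[k] * b[n - k] for k in range(1, min(n, len(a) - 1) + 1))
    return b


N = 30
theta = [(1, -1, -1)[n % 3] for n in range(N + 1)]          # D_{f_a}(t) = sum theta_n t^n
inv_D = series_inverse(theta, N)
one_plus_gamma = [sum(inv_D[: n + 1]) for n in range(N + 1)]  # (1 - t)^{-1} D(t)^{-1}
one_plus_gamma[0] += 1                                        # add the constant 1

fib = [1, 1]                                                  # F_0 = F_1 = 1
while len(fib) <= N:
    fib.append(fib[-1] + fib[-2])

print(one_plus_gamma[:8])      # [2, 2, 4, 6, 10, 16, 26, 42]
print(all(sum(comb(n - j, j) for j in range(n // 2 + 1)) == fib[n] for n in range(N + 1)))
```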
Solution to Exercise 3.115: The Fibonacci shift is a gap shift with set of gaps $S = \mathbb{N}$, and $1 = x^{-2} + x^{-3} + \cdots = \frac{x^{-2}}{1 - x^{-1}}$ is equivalent to $x^2 = x + 1$. The largest solution is the golden mean $\frac12(1+\sqrt5)$, so the entropy is $h_{top}(\sigma) = \log(\frac12(1+\sqrt5))$.
The odd shift is a gap shift with set of gaps $\{0, 1, 3, 5, 7, \dots\}$. The equation $1 = x^{-1} + x^{-2} + x^{-4} + x^{-6} + \cdots = x^{-1} + \frac{x^{-2}}{1 - x^{-2}}$ is equivalent to $x^3 - x^2 - 2x + 1 = 0$, so the entropy is $h_{top}(\sigma) \approx \log 1.80193773580484\ldots$.
The even shift is a gap shift with reversed symbols and set of gaps $\{0, 2, 4, 6, \dots\}$. The equation $1 = x^{-1} + x^{-3} + x^{-5} + \cdots = \frac{x^{-1}}{1 - x^{-2}}$ is equivalent to $x^2 = x + 1$, so again the entropy is $h_{top}(\sigma) = \log(\frac12(1+\sqrt5))$.
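The three equations can also be solved numerically. In the Python sketch below, a gap $s$ contributes a loop of length $s+1$, so the equation reads $1 = \sum_{s \in S} x^{-(s+1)}$. Each gap set is truncated at a hypothetical cut-off (the neglected tail is of order $x^{-\text{cutoff}}$), and the root is found by bisection.

```python
def gap_shift_root(gaps, tol=1e-14):
    """Root x > 1 of 1 = sum_{s in gaps} x^(-(s+1)), by bisection on [1, 2]."""
    g = lambda x: sum(x ** (-(s + 1)) for s in gaps) - 1
    lo, hi = 1.0 + 1e-9, 2.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return (lo + hi) / 2


CUTOFF = 200   # hypothetical truncation of the infinite gap sets
fibonacci = range(1, CUTOFF)
odd = [0] + list(range(1, CUTOFF, 2))
even = range(0, CUTOFF, 2)
for name, gaps in (("Fibonacci", fibonacci), ("odd", odd), ("even", even)):
    print(name, gap_shift_root(gaps))
```

The odd shift root agrees with the root $1.80193773580484\ldots$ of $x^3 - x^2 - 2x + 1$, and the Fibonacci and even shifts both give the golden mean.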
Solution to Exercise 3.132: This is an application of the reflection principle from the theory of random walks. Consider each word $u \in L_{ext}$ of length $2n$ as a walk on a grid $\Lambda$, which is $\mathbb{Z}^2$ rotated by $45^\circ$. The walk starts at the origin and ends at $(2n, 0)$, and every bracket ( indicates a step northeast and ) a step southeast; see Figure A.1.

Figure A.1. The walk $p$ and its reflection $\tilde p$.

There are $\binom{2n}{n}$ such paths, but many go below the horizontal axis and therefore don't correspond to any $u \in L_{ext}$. If $p = (p_0, \dots, p_{2n}) \in \Lambda^{2n}$ is such a disallowed path, there is a smallest $0 \leq k < n$ such that $p_{2k+1} = (2k+1, -1)$. Reflect the initial part $(p_0, \dots, p_{2k+1})$ of the path in the horizontal line $\{y = -1\}$, and call the resulting path $\tilde p = (\tilde p_0, \dots, \tilde p_{2k+1}, \dots, \tilde p_{2n})$ (so $\tilde p_j = p_j$ for $j \geq 2k+1$, and $\tilde p_0 = (0, -2)$). This gives a bijection between disallowed paths $p$ and paths $\tilde p$ from $(0, -2)$ to $(2n, 0)$. Since $\tilde p$ has $n+1$ northeast steps and only $n-1$ southeast steps, there are $\binom{2n}{n+1}$ such paths. Finally compute
$$\binom{2n}{n} - \binom{2n}{n+1} = \frac{(2n)!}{n!\,n!} - \frac{(2n)!}{(n+1)!\,(n-1)!} = \frac{(2n)!}{n!\,n!}\Big(1 - \frac{n}{n+1}\Big) = \frac{1}{n+1}\binom{2n}{n},$$
as required.
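The count can be confirmed by brute force for small $n$. The Python sketch below (an illustration, not part of the original solution) enumerates all bracket words of length $2n$ and keeps those whose walk never drops below the horizontal axis and returns to it. It then compares the count with $\binom{2n}{n} - \binom{2n}{n+1} = \frac{1}{n+1}\binom{2n}{n}$.

```python
from itertools import product
from math import comb


def balanced_words(n):
    """Count words in '(' and ')' of length 2n whose walk never goes below 0 and ends at 0."""
    count = 0
    for word in product("()", repeat=2 * n):
        height = 0
        for ch in word:
            height += 1 if ch == "(" else -1
            if height < 0:
                break              # the walk crossed below the horizontal axis
        else:
            if height == 0:
                count += 1
    return count


for n in range(1, 8):
    print(n, balanced_words(n), comb(2 * n, n) - comb(2 * n, n + 1), comb(2 * n, n) // (n + 1))
```

Each row shows three equal numbers, the Catalan numbers $1, 2, 5, 14, 42, 132, 429$.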
Solution to Exercise 4.53: For every length $n$, there is at most one bi-special word $w$, namely if the unique left-special word coincides with the unique right-special word. Since rotational shifts are palindromic, we can take a palindrome $v$ in the language large enough to contain $awb$ as subwords for all possible letters $a, b$. Reversing $v$, we see that the reverse of $awb$ also occurs in the language, so the reverse of $w$ is bi-special as well. The uniqueness of $w$ shows that it is a palindrome.
Solution to Exercise 4.65: Take α ∈ [0, 1] \ Q and partition S1 into inter-
vals [0, N α mod 1) and [N α mod 1, 1). Then the symbolic dynamics of the
rotation Rα w.r.t. this partition has the required complexity function.
Solution to Exercise 4.96: Take ϕ : X → X, x → −x, i.e. the additive
inverse under the group action on the odometer. Then ϕ ◦ a(x) = ϕ(x + 1) =
−(x + 1) = −x − 1 = a−1 ◦ ϕ(x).

Solution to Exercise 4.111: There are $a^d$ periodic sequences of (not necessarily minimal) period $d$. In order not to count twice, we go from $d = n$ down along the divisors of $n$.
• For $d = n$, we count $+a^n$, and indeed $\mu(\frac nn) = \mu(1) = 1$.
• For $d = n/p$ for some prime $p$, we count $-a^d$ because they were already counted in $a^n$, but not anywhere else, because $d$ has no other multiple that divides $n$. Indeed $\mu(\frac nd) = \mu(p) = -1$.
• For $d = n/(pq)$ for some distinct primes $p, q$, we count $+a^d$ because they were already counted in $a^n$, but then subtracted in $a^{n/p}$ and in $a^{n/q}$. Indeed $\mu(\frac nd) = \mu(pq) = 1$.
• For $d = n/(pqr)$ for some distinct primes $p, q, r$, we have to subtract $a^d$ because they were already counted in $a^n$, then subtracted in $a^{n/p}$, $a^{n/q}$, and $a^{n/r}$, but then added again in $a^{n/(pq)}$, $a^{n/(qr)}$, and $a^{n/(rp)}$. Indeed $\mu(\frac nd) = \mu(pqr) = -1$.
• Continue like this for all divisors $d$ for which $n/d$ is square-free.
• If $d = n/(kp^2)$ for some prime $p$, then all the overcounting done at $a^{n/(lp)}$ for $l \mid k$, $1 \leq l < k$, is offset by undercounting at $a^{n/(kp/l)}$, and the overcounting at $a^n$ is offset by undercounting at $a^{n/(kp)}$. Therefore we don't count $a^d$, and indeed $\mu(\frac nd) = \mu(kp^2) = 0$.
This proves the formula. For a prime $p$, the required number of $p$-periodic sequences is $a^p - a$ (where we subtracted the $a$ constant sequences). These come in groups of $p$, namely their $\sigma$-orbits. Hence $p \mid a^p - a$. Dividing by $a$ turns this into $p \mid a^{p-1} - 1$ because $p$ is not a divisor of $a$. Therefore $a^{p-1} \equiv 1 \bmod p$.
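The inclusion-exclusion above amounts to Möbius inversion. Assuming the formula in the exercise is $\sum_{d \mid n} \mu(n/d)\, a^d$ for the number of sequences of minimal period $n$ over $a$ symbols, the Python sketch below compares it with a brute-force count. It also checks that the count is divisible by $n$ (the $\sigma$-orbits) and the resulting congruence $a^p \equiv a \bmod p$. The function names are illustrative.

```python
from itertools import product


def mobius(n):
    """Möbius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result


def minimal_period_count(a, n):
    """Number of periodic sequences over a symbols with minimal period n (Möbius formula)."""
    return sum(mobius(n // d) * a ** d for d in range(1, n + 1) if n % d == 0)


def brute_force_count(a, n):
    """Count words w of length n whose infinite repetition has minimal period exactly n."""
    count = 0
    for w in product(range(a), repeat=n):
        if all(w != w[d:] + w[:d] for d in range(1, n)):
            count += 1
    return count


print([minimal_period_count(2, n) for n in range(1, 9)])   # [2, 2, 6, 12, 30, 54, 126, 240]
```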
Solution to Exercise 4.115: The fixed point of the Cantor substitution $\rho = 101000101000000000101000101\cdots$ has zeroes at positions 2 and 5, so $2, 5 \in B$. But that would mean that $\rho_{25} = 0$, which is not true. Therefore $\rho$ is not $B$-free for any $B$. Neither is $X_{Cantor}$, because $\rho$ contains arbitrarily long blocks of zeroes.
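The positions used in this argument can be read off from the fixed point directly. The Python sketch below assumes the Cantor substitution $1 \mapsto 101$, $0 \mapsto 000$ (consistent with the prefix of $\rho$ displayed above) and 1-based positions as in the text.

```python
def cantor_fixed_point(levels):
    """Prefix of length 3**levels of the fixed point of 1 -> 101, 0 -> 000."""
    word = "1"
    for _ in range(levels):
        word = "".join("101" if ch == "1" else "000" for ch in word)
    return word


rho = cantor_fixed_point(3)
print(rho)                       # 101000101000000000101000101
position = lambda n: rho[n - 1]  # 1-based indexing as in the text
print(position(2), position(5), position(25))   # 0 0 1
```

Iterating further shows blocks of $3^{k-1}$ zeroes in the middle of the level-$k$ word, which are the arbitrarily long zero blocks mentioned above.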
Solution to Exercise 5.42: Note that $r_B$ is the smallest integer such that $\#V_i = r_B$ infinitely often. Let $(k_i)_i$ be an increasing sequence such that $\#V_{k_i} = r_B$, and telescope to $\tilde M_1 = M^{(1)} \cdots M^{(k_1)}$ and $\tilde M_i = M^{(k_{i-1}+1)} \cdots M^{(k_i)}$ for $i \geq 2$. This way, it is clear that $r \leq r_B$. However, $r < r_B$ is possible, as Example 5.37 shows. In that case, $r_B = 2$ because the Bratteli diagram is stationary. But since the BV-system is isomorphic to an odometer, the rank is actually 1. Figure 5.9 gives another example.

Solution to Exercise 6.3: Since $\lim_n \frac1n \#\{m \leq i < m+n : \rho_i = 0\} = 1$ for all $m \in \mathbb{N}$, the only possible shift-invariant measure would be the Dirac measure $\delta_{0^\infty}$. But $0^\infty \notin X$ (it is contained only in the closure). Hence $X$ supports no shift-invariant measure.
Solution to Exercise 6.8: First assume that $\mu$ is not ergodic for the system $(X, T)$. Hence there is a $T$-invariant set $A$ such that $0 < \mu(A) < 1$. Define
$$\mu_1(B) = \frac{\mu(B \cap A)}{\mu(A)} \quad \text{and} \quad \mu_2(B) = \frac{\mu(B \setminus A)}{\mu(X \setminus A)}.$$
Then $\mu = \alpha\mu_1 + (1-\alpha)\mu_2$ for $\alpha = \mu(A) \in (0,1)$, and $\mu_1 \neq \mu_2$ because $\mu_1(A) = 1 \neq 0 = \mu_2(A)$. In particular, if $T$ preserves only one probability measure, this cannot happen, so $\mu$ is ergodic.
Conversely, suppose that $\mu$ is ergodic and $\mu = \alpha\mu_1 + (1-\alpha)\mu_2$ for some $\alpha \in (0,1)$ and $T$-invariant measures $\mu_1$ and $\mu_2$. Then $\mu_1$ and $\mu_2$ are both absolutely continuous w.r.t. $\mu$. Now let $A \in \mathcal{B}$ and let $Y \subset X$ be the set of $\mu$-typical points. Then $\mu(Y^c) = 0$ and hence $\mu_1(Y^c) = 0$. Applying Birkhoff's Ergodic Theorem 6.13 to $\mu_1$ and $\mu$ separately for the indicator function $1_A$ and some $y \in Y$, we get
$$\mu(A) = \lim_{n\to\infty} \frac1n \sum_{j=0}^{n-1} 1_A \circ T^j(y) = \mu_1(A).$$
But $A \in \mathcal{B}$ was arbitrary, so $\mu_1 = \mu$. Then also $\mu_2 = \mu$.

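Solution to Exercise 6.15: The support $\operatorname{supp}(\mu)$ is a compact subset of the metric space $X$, so there is a countable base $(U_i)_i$ of its topology. Let $\varphi_i$ be a non-negative continuous function supported on $U_i$ with $\int \varphi_i \, d\mu > 0$; this is possible because every nonempty open subset of $\operatorname{supp}(\mu)$ has positive measure. Let $X_i$ be the set of $\mu$-typical points for $\varphi_i$. Thus $\mu(X_i) = 1$ and $\#(\operatorname{orb}(x) \cap U_i) = \infty$ for every $x \in X_i$. If $U, V$ are arbitrary nonempty open subsets of $\operatorname{supp}(\mu)$ and $U_i \subset U$, $U_j \subset V$, then $X_i \cap X_j$ has full measure, and for every $x \in X_i \cap X_j$ there are $n_i < n_j$ such that $T^{n_i}(x) \in U_i$ and $T^{n_j}(x) \in U_j$. Hence $U \cap T^{-(n_j - n_i)}(V) \ne \emptyset$.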
Solution to Exercise 7.15: We number the nodes of Figure 7.2 as $0, 1, 2, 3, \dots$.

(1) Since node 0 and node 1 have arrows to themselves, but no arrows back from higher nodes, the relations $\ell_n = \ell_{n-1} + a_{n-1}$ and $a_n = a_{n-1} + b_{n-1}$ follow.
(2) Since beyond node 2, only even nodes can have two outgoing arrows, we have $b_{2n} = b_{2n+1}$.
(3) Again from node 2 onwards, if we remove every odd-numbered node, we obtain the original graph from node 1 onwards. Therefore $b_{2n} = a_n$.
(4) Obviously $a_0 = 1$, $a_1 = 2$. The previous steps together give $a_n = a_{n-1} + a_{n/2}$ for $n \ge 2$.
(5) The number of $n$-paths from a level $D_k$ of a Hofbauer tower is equal to the lap number $\ell(f^n|_{D_k})$, and here we use $D_1 = [0, c_1]$. Using d'Alembert's convergence criterion,
$$\lim_{n\to\infty} \frac{a_n}{a_{n-1}} = \lim_{n\to\infty} \Big(1 + \frac{a_{n/2}}{a_{n-1}}\Big) = 1,$$
so $a_n$ grows only subexponentially. On the other hand, if $a_n \sim C n^d$ for some $d \ge 1$, then $a_n - a_{n-1}$ would grow like $C d\, n^{d-1}$, but in reality $a_n - a_{n-1} = a_{n/2} \sim C 2^{-d} n^d$. Hence $(a_n)$ grows superpolynomially. Thus $\ell_n = 1 + \sum_{k=0}^{n-1} a_k$ grows superpolynomially but subexponentially as well.
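The two growth claims can be observed numerically. The sketch below reads $a_{n/2}$ as $a_{\lfloor n/2 \rfloor}$ (an assumption about the rounding convention) and starts from $a_0 = 1$, $a_1 = 2$; it prints the ratio $a_n/a_{n-1}$, which should approach 1, and the effective exponent $\log a_n / \log n$, which should keep increasing.

```python
import math


def a_sequence(n_max: int) -> list[int]:
    """a_0 = 1, a_1 = 2 and a_n = a_{n-1} + a_{n // 2} for n >= 2 (floor is an assumed reading)."""
    a = [1, 2]
    for n in range(2, n_max + 1):
        a.append(a[n - 1] + a[n // 2])
    return a


a = a_sequence(4000)
for n in (100, 1000, 4000):
    ratio = a[n] / a[n - 1]                  # tends to 1: subexponential growth
    exponent = math.log(a[n]) / math.log(n)  # keeps increasing: superpolynomial growth
    print(n, round(ratio, 4), round(exponent, 2))
```

Python integers are unbounded, so the terms are computed exactly; only the printed ratios and exponents are floating-point.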

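Solution to Exercise 8.12: Since $q(x)$ is quadratic, $\lambda_\pm = a \pm \sqrt b$ for some $a, b \in \mathbb{Q}$. We need to check whether $\alpha = g(\lambda_+) = g(\lambda_-)$ for some polynomial $g$. By a translation over $a$, we can assume that $g(x) = \tilde g(x - a)$, so
$$g(\lambda_\pm) = \tilde g(\pm\sqrt b) = \sum_{i=0}^{d} g_i (\pm\sqrt b)^i = \sum_{i \text{ even}} g_i b^{i/2} \pm \sqrt b \sum_{i \text{ odd}} g_i b^{(i-1)/2}.$$
If this is irrational, then $\sum_{i \text{ odd}} g_i b^{(i-1)/2} \ne 0$, so $g(\lambda_+)$ and $g(\lambda_-)$ have opposite irrational parts.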
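The conjugation argument can be checked with exact arithmetic in $\mathbb{Q}(\sqrt b)$. The sketch below represents $r + s\sqrt b$ as a pair of `Fraction`s and evaluates a polynomial with rational coefficients by Horner's scheme; the function name is illustrative, and $\sqrt b$ is assumed irrational so that the pair $(r, s)$ is uniquely determined.

```python
from fractions import Fraction


def eval_conjugate(coeffs, a, b, sign):
    """Evaluate g(a + sign*sqrt(b)) exactly, returning (r, s) with value r + s*sqrt(b).

    coeffs = [g_0, g_1, ..., g_d] are the rational coefficients of g(x) = sum_i g_i x^i;
    a, b are rationals and sign is +1 or -1. Uses Horner's scheme in Q(sqrt(b)).
    """
    a, b = Fraction(a), Fraction(b)
    x0, x1 = a, Fraction(sign)  # the point x = x0 + x1*sqrt(b)
    r, s = Fraction(0), Fraction(0)
    for c in reversed(coeffs):
        # (r + s sqrt b)(x0 + x1 sqrt b) + c, expanded using (sqrt b)^2 = b
        r, s = r * x0 + s * x1 * b + Fraction(c), r * x1 + s * x0
    return r, s


# Example: lambda_± = 1 ± sqrt(2) are the roots of x^2 - 2x - 1, and g(x) = x^2 + 1.
print(eval_conjugate([1, 0, 1], 1, 2, +1))  # (Fraction(4, 1), Fraction(2, 1))
print(eval_conjugate([1, 0, 1], 1, 2, -1))  # (Fraction(4, 1), Fraction(-2, 1))
```

The two results always share the rational part and have opposite $\sqrt b$-parts, so $g(\lambda_+) = g(\lambda_-)$ happens exactly when the $\sqrt b$-part vanishes, i.e. when the common value is rational.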

Solution to Exercise 8.18: If $k$ divides both $p$ and $q$, it also divides $p'$ and $q'$, where $(p', q') = f(p, q)$. Hence $d := \gcd(p, q)$ divides both components of $f^n(p, q)$ for all $n \ge 0$. Also $f(d, d) = (d, d)$, and $\max\{p', q'\} \le \max\{p, q\}$ with equality only if $\max\{p, q\} = q'$. Now unless $p = q$, the roles of $p$ and $q$ have to switch every so often. Hence the maximum component of $f^n(p, q)$ will eventually reach $d$.
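Solution to Exercise 8.19: Write the recursive rule (8.6) in matrix form:
$$\begin{pmatrix} q_n & p_n \\ q_{n-1} & p_{n-1} \end{pmatrix} = \begin{pmatrix} a_n & 1 \\ 1 & 0 \end{pmatrix} \begin{pmatrix} q_{n-1} & p_{n-1} \\ q_{n-2} & p_{n-2} \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} q_0 & p_0 \\ q_{-1} & p_{-1} \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
Hence
$$\begin{pmatrix} q_n & p_n \\ q_{n-1} & p_{n-1} \end{pmatrix} = A_n \cdots A_1 \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{for } A_i = \begin{pmatrix} a_i & 1 \\ 1 & 0 \end{pmatrix}.$$
Taking the transpose and using that the $A_i$'s are symmetric and that the identity matrix commutes with every matrix, we find
$$\begin{pmatrix} q_n & p_n \\ q_{n-1} & p_{n-1} \end{pmatrix}^t = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}^t A_1^t \cdots A_n^t = A_1 \cdots A_n \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
This shows that $q_n(a_1, \dots, a_n) = q_n(a_n, \dots, a_1)$ and $p_n(a_1, a_2, \dots, a_{n-1}, a_n) = q_{n-1}(a_n, a_{n-1}, \dots, a_2)$. That makes $p_n$ independent of $a_1$.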
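Both identities can be verified exhaustively for short digit strings. The sketch assumes that the recursion (8.6) is $q_k = a_k q_{k-1} + q_{k-2}$, $p_k = a_k p_{k-1} + p_{k-2}$ with the initial values from the matrix form above, so that $p_n/q_n = [0; a_1, \dots, a_n]$; the function name is illustrative.

```python
from itertools import product


def convergent(a):
    """Return (q_n, p_n) with p_n/q_n = [0; a_1, ..., a_n], via the recursion
    q_k = a_k q_{k-1} + q_{k-2} and p_k = a_k p_{k-1} + p_{k-2},
    starting from (q_0, p_0) = (1, 0) and (q_{-1}, p_{-1}) = (0, 1)."""
    q_prev, p_prev = 0, 1
    q, p = 1, 0
    for ak in a:
        q, q_prev = ak * q + q_prev, q
        p, p_prev = ak * p + p_prev, p
    return q, p


# Check q_n(a_1..a_n) = q_n(a_n..a_1) and p_n(a_1..a_n) = q_{n-1}(a_n..a_2).
for n in range(1, 6):
    for a in product(range(1, 5), repeat=n):
        q, p = convergent(a)
        assert q == convergent(a[::-1])[0]
        assert p == convergent(a[:0:-1])[0]  # a[:0:-1] is (a_n, ..., a_2)
print(convergent([2, 3]))  # (7, 3), i.e. [0; 2, 3] = 3/7
```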
Solution to Exercise 8.30: In the Kepler tree, $[0; a_1, a_2, a_3, \dots, a_n]$ has descendants $[0; a_1 + 1, a_2, a_3, \dots, a_n]$ and $[0; 1, a_1, a_2, a_3, \dots, a_n]$.
In the Calkin-Wilf tree, $[0; a_1, a_2, a_3, \dots, a_n]$ has the two descendants $[0; a_1 + 1, a_2, a_3, \dots, a_n]$ and $[1; a_1, a_2, a_3, \dots, a_n]$, and for $a_0 \ge 1$ (i.e., rationals $> 1$), $[a_0; a_1, a_2, a_3, \dots, a_n]$ has descendants $[0; 1, a_0, a_1, a_2, a_3, \dots, a_n]$ and
$[a_0 + 1; a_1, a_2, a_3, \dots, a_n]$. In particular, a rational $\frac pq \in (0, 1)$ has the same left descendant in both trees, but the right descendant in the Kepler tree is reached in the Calkin-Wilf tree only by a less direct path, though in the same row. The $f^2$-orbit of $\frac12$ for the Calkin-Wilf function $f$ is an enumeration of the rationals in $(0, 1)$, row by row in the Kepler tree.
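The last claim can be tested with exact fractions. This sketch assumes that the Calkin-Wilf function $f$ is Newman's successor map $f(x) = 1/(2\lfloor x \rfloor - x + 1)$, which steps through the Calkin-Wilf tree row by row; whether this matches the book's definition is an assumption. The Kepler tree is generated from the root $\frac12$ by $\frac pq \mapsto \frac{p}{p+q}, \frac{q}{p+q}$, and each block of $1, 2, 4, \dots$ consecutive orbit points is compared with the corresponding Kepler row as a set (the order inside a row may differ).

```python
from fractions import Fraction
from math import floor


def calkin_wilf_next(x: Fraction) -> Fraction:
    """Newman's successor map x -> 1 / (2*floor(x) - x + 1) (assumed form of f)."""
    return 1 / (2 * floor(x) - x + 1)


def f2_orbit_of_half(count: int) -> list[Fraction]:
    """First `count` points of the f^2-orbit of 1/2."""
    x, out = Fraction(1, 2), []
    for _ in range(count):
        out.append(x)
        x = calkin_wilf_next(calkin_wilf_next(x))
    return out


def kepler_rows(depth: int) -> list[list[Fraction]]:
    """Rows of the Kepler tree from the root 1/2; p/q has children p/(p+q) and q/(p+q)."""
    rows = [[Fraction(1, 2)]]
    for _ in range(depth - 1):
        nxt = []
        for r in rows[-1]:
            p, q = r.numerator, r.denominator
            nxt += [Fraction(p, p + q), Fraction(q, p + q)]
        rows.append(nxt)
    return rows


rows = kepler_rows(6)
orbit = f2_orbit_of_half(sum(len(r) for r in rows))
start = 0
for row in rows:
    block = orbit[start:start + len(row)]
    assert set(block) == set(row)  # same rationals, possibly in a different order
    start += len(row)
print([str(x) for x in orbit[:7]])  # ['1/2', '1/3', '2/3', '1/4', '3/5', '2/5', '3/4']
```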
Solution to Exercise 8.39: Use the unique ergodicity of $f : \mathbb{T}^p \to \mathbb{T}^p$,
$$f : (x_1, \dots, x_p) \mapsto \Big(x_1 + \alpha,\ x_2 + 2x_1 + \alpha,\ x_3 + 3x_2 + 3x_1 + \alpha,\ \dots,\ x_p + \sum_{j=1}^{p-1} \binom{p}{j} x_j + \alpha\Big).$$
Then $f^n(0, \dots, 0) = (\alpha n, \alpha n^2, \alpha n^3, \dots, \alpha n^p) \bmod 1$, so $(\{\alpha n^j\})_n$ is uniformly (and even well-)distributed for each $1 \le j \le p$. Well-distribution carries over to linear combinations, so $(\{\alpha\, p(n)\})_n$ is well-distributed for every non-constant polynomial $p$ with rational coefficients.
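The identity $f^n(0, \dots, 0) = (\alpha n, \dots, \alpha n^p) \bmod 1$ is purely algebraic, so it can be checked exactly. In this sketch $\alpha$ is a rational stand-in chosen only to allow exact `Fraction` arithmetic; the equidistribution statement itself requires irrational $\alpha$ and is not tested here. The general coordinate is taken to be $x_j + \sum_{i<j} \binom{j}{i} x_i + \alpha$, as in the displayed map.

```python
from fractions import Fraction
from math import comb


def skew_step(x: list[Fraction], alpha: Fraction) -> list[Fraction]:
    """One step of the skew product on the p-torus:
    x_j -> x_j + sum_{i=1}^{j-1} C(j, i) x_i + alpha (mod 1), coordinates j = 1..p."""
    new = []
    for j in range(1, len(x) + 1):
        s = x[j - 1] + sum(comb(j, i) * x[i - 1] for i in range(1, j)) + alpha
        new.append(s % 1)
    return new


alpha = Fraction(1_000_003, 2_718_281)  # rational stand-in for alpha (exact arithmetic only)
p = 4
x = [Fraction(0)] * p
for n in range(1, 51):
    x = skew_step(x, alpha)
    # the n-th iterate of the origin is (alpha n, alpha n^2, ..., alpha n^p) mod 1
    assert x == [(alpha * n**j) % 1 for j in range(1, p + 1)]
print("f^n(0) = (alpha n^j mod 1)_j verified for n <= 50")
```

The check works because $(n+1)^j = n^j + \sum_{i<j} \binom{j}{i} n^i$, which is exactly the update rule applied coordinate by coordinate.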
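Solution to Exercise 8.57: Straightforward calculation:
$$\underline{d}(E) = \lim_{N\to\infty} \frac{1}{2^{2N+2}} \sum_{n=1}^{N} 2^{2n} = \lim_{N\to\infty} \frac{1}{4^{N+1}} \cdot \frac{4^{N+1} - 4}{4 - 1} = \frac13,$$
$$\overline{d}(E) = \lim_{N\to\infty} \frac{1}{2^{2N+1}} \sum_{n=1}^{N} 2^{2n} = 2\,\underline{d}(E) = \frac23,$$
$$\underline{\delta}(E) = \lim_{N\to\infty} \frac{1}{\log 2^{2N+2}} \sum_{n=1}^{N} \sum_{k=2^{2n}}^{2^{2n+1}-1} \frac1k = \lim_{N\to\infty} \frac{1}{2(N+1)\log 2} \sum_{n=1}^{N} \Big(\log 2^{2n+1} - \log 2^{2n} + O(2^{-2n})\Big) = \lim_{N\to\infty} \frac{N \log 2 + O(1)}{2(N+1)\log 2} = \frac12,$$
$$\overline{\delta}(E) = \lim_{N\to\infty} \frac{1}{\log 2^{2N+1}} \sum_{n=1}^{N} \sum_{k=2^{2n}}^{2^{2n+1}-1} \frac1k = \lim_{N\to\infty} \frac{2N+2}{2N+1}\, \underline{\delta}(E) = \frac12.$$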
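The four limits can be compared with finite-$N$ values. The sketch takes $E = \bigcup_{n \ge 1} [2^{2n}, 2^{2n+1})$, as read off from the sums above, and evaluates the natural and logarithmic densities at $M = 2^{2N+2}$ and $M = 2^{2N+1}$ by direct summation; the names `in_E`, `density` and `log_density` are illustrative.

```python
import math


def in_E(k: int) -> bool:
    """k lies in E = union over n >= 1 of [2^(2n), 2^(2n+1)), i.e. floor(log2 k) is even and >= 2."""
    return k >= 4 and (k.bit_length() - 1) % 2 == 0


def density(M: int) -> float:
    """Proportion of integers 1 <= k < M lying in E."""
    return sum(1 for k in range(1, M) if in_E(k)) / M


def log_density(M: int) -> float:
    """(1 / log M) * sum of 1/k over 1 <= k < M with k in E."""
    return sum(1 / k for k in range(1, M) if in_E(k)) / math.log(M)


N = 8
print(density(2 ** (2 * N + 2)), density(2 ** (2 * N + 1)))          # close to 1/3 and 2/3
print(log_density(2 ** (2 * N + 2)), log_density(2 ** (2 * N + 1)))  # close to N/(2N+2) and N/(2N+1)
```

The natural densities are already within $10^{-4}$ of $\frac13$ and $\frac23$ at $N = 8$, whereas the logarithmic densities approach $\frac12$ only at rate $O(1/N)$, in line with the factor $\frac{N}{2(N+1)}$ in the calculation.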

[1] J. Aarts, R. Fokkink, and G. Kruijtzer, Morphic numbers (Dutch, with Dutch summary),
Nieuw Arch. Wiskd. (5) 2 (2001), no. 1, 56–58. MR1823158
[2] E. H. El Abdalaoui, M. Lemańczyk, and T. de la Rue, A dynamical point of view
on the set of B-free integers, Int. Math. Res. Not. IMRN 16 (2015), 7258–7286, DOI
10.1093/imrn/rnu164. MR3428961
[3] B. Adamczewski, Balances for fixed points of primitive substitutions: Words, Theoret. Com-
put. Sci. 307 (2003), no. 1, 47–75, DOI 10.1016/S0304-3975(03)00092-6. MR2014730
[4] B. Adamczewski, Symbolic discrepancy and self-similar dynamics (English, with English
and French summaries), Ann. Inst. Fourier (Grenoble) 54 (2004), no. 7, 2201–2234 (2005).
[5] T. M. Adams, Smorodinsky’s conjecture on rank-one mixing, Proc. Amer. Math. Soc. 126
(1998), no. 3, 739–744, DOI 10.1090/S0002-9939-98-04082-9. MR1443143
[6] T. Adams and N. Friedman, Staircase mixing, Preprint, 1997.
[7] M. Adamska, S. Bezuglyi, O. Karpel, and J. Kwiatkowski, Subdiagrams and invariant mea-
sures on Bratteli diagrams, Ergodic Theory Dynam. Systems 37 (2017), no. 8, 2417–2452,
DOI 10.1017/etds.2016.8. MR3719266
[8] R. L. Adler and L. Flatto, Uniform distribution of Kakutani’s interval splitting proce-
dure, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 38 (1977), no. 4, 253–259, DOI
10.1007/BF00533157. MR447521
[9] R. L. Adler, A. G. Konheim, and M. H. McAndrew, Topological entropy, Trans. Amer. Math.
Soc. 114 (1965), 309–319, DOI 10.2307/1994177. MR175106
[10] R. L. Adler and B. Weiss, Entropy, a complete metric invariant for automorphisms of the
torus, Proc. Nat. Acad. Sci. U.S.A. 57 (1967), 1573–1576, DOI 10.1073/pnas.57.6.1573.
[11] C. Aistleitner, M. Hofer, and V. Ziegler, On the uniform distribution modulo 1 of multi-
dimensional LS-sequences, Ann. Mat. Pura Appl. (4) 193 (2014), no. 5, 1329–1344, DOI
10.1007/s10231-013-0331-0. MR3262635
[12] E. Akin, The general topology of dynamical systems, Graduate Studies in Mathemat-
ics, vol. 1, American Mathematical Society, Providence, RI, 1993, DOI 10.1090/gsm/001.
[13] E. Akin, J. Auslander, and E. Glasner, The topological dynamics of Ellis actions, Mem.
Amer. Math. Soc. 195 (2008), no. 913, vi+152, DOI 10.1090/memo/0913. MR2437846

424 Bibliography

[14] E. Akin, J. Auslander, and K. Berg, When is a transitive map chaotic?, Convergence in
ergodic theory and probability (Columbus, OH, 1993), Ohio State Univ. Math. Res. Inst.
Publ., vol. 5, de Gruyter, Berlin, 1996, pp. 25–40. MR1412595
[15] S. Akiyama, Cubic Pisot units with finite beta expansions, Algebraic number theory and
Diophantine analysis (Graz, 1998), de Gruyter, Berlin, 2000, pp. 11–26. MR1770451
[16] S. Akiyama, On the boundary of self affine tilings generated by Pisot numbers, J. Math.
Soc. Japan 54 (2002), no. 2, 283–308, DOI 10.2969/jmsj/05420283. MR1883519
[17] K. T. Alligood, T. D. Sauer, and J. A. Yorke, Chaos: An introduction to dynamical systems,
Textbooks in Mathematical Sciences, Springer-Verlag, New York, 1997, DOI 10.1007/978-
3-642-59281-2. MR1418166
[18] J.-P. Allouche and M. Cosnard, The Komornik-Loreti constant is transcendental, Amer.
Math. Monthly 107 (2000), no. 5, 448–449, DOI 10.2307/2695302. MR1763399
[19] J.-P. Allouche and J. Shallit, The ubiquitous Prouhet-Thue-Morse sequence, Sequences and
their applications (Singapore, 1998), Springer Ser. Discrete Math. Theor. Comput. Sci.,
Springer, London, 1999, pp. 1–16. MR1843077
[20] J.-P. Allouche and J. Shallit, Automatic sequences: Theory, applications, generaliza-
tions, Cambridge University Press, Cambridge, 2003, DOI 10.1017/CBO9780511546563.
[21] J.-P. Allouche, J. Shallit, and R. Yassawi, How to prove that a sequence is not automatic,
Expo. Math. 40 (2022), no. 1, 1–22, DOI 10.1016/j.exmath.2021.08.001. MR4388977
[22] L. Alsedà, J. Llibre, and M. Misiurewicz, Combinatorial dynamics and entropy in dimension
one, 2nd ed., Advanced Series in Nonlinear Dynamics, vol. 5, World Scientific Publishing
Co., Inc., River Edge, NJ, 2000, DOI 10.1142/4205. MR1807264
[23] L. Alvin, The strange star product, J. Difference Equ. Appl. 18 (2012), no. 4, 657–674, DOI
10.1080/10236198.2011.608066. MR2905289
[24] L. Alvin, Toeplitz kneading sequences and adding machines, Discrete Contin. Dyn. Syst. 33
(2013), no. 8, 3277–3287, DOI 10.3934/dcds.2013.33.3277. MR3021357
[25] L. Alvin, Uniformly recurrent sequences and minimal Cantor omega-limit sets, Fund. Math.
231 (2015), no. 3, 273–284, DOI 10.4064/fm231-3-3. MR3397281
[26] L. Alvin, Homeomorphisms on minimal Cantor sets in the unimodal setting, Topology Appl.
282 (2020), 107292, 10, DOI 10.1016/j.topol.2020.107292. MR4119460
[27] D. Anosov, Tangential fields of transversal foliations in Y -systems, Math. Notes 2 (1967),
[28] A. Anušić, H. Bruin, and J. Činč, Topological properties of Lorenz maps derived
from unimodal maps, J. Difference Equ. Appl. 26 (2020), no. 8, 1174–1191, DOI
10.1080/10236198.2020.1760260. MR4164085
[29] C. Apparicio, Reconnaissabilité des substitutions de longueur constante, Stage de Maîtrise
de l’ENS Lyon, 1999.
[30] V. I. Arnold and A. Avez, Ergodic problems of classical mechanics, Translated from the
French by A. Avez, W. A. Benjamin, Inc., New York-Amsterdam, 1968. MR0232910
[31] P. Arnoux and A. M. Fisher, The scenery flow for geometric structures on the
torus: the linear setting, Chinese Ann. Math. Ser. B 22 (2001), no. 4, 427–470, DOI
10.1142/S0252959901000425. MR1870070
[32] P. Arnoux and E. Harriss, What is ... a Rauzy fractal?, Notices Amer. Math. Soc. 61 (2014),
no. 7, 768–770, DOI 10.1090/noti1144. MR3235844
[33] P. Arnoux and S. Ito, Pisot substitutions and Rauzy fractals (English, with English and
French summaries), Journées Montoises d’Informatique Théorique (Marne-la-Vallée, 2000),
Bull. Belg. Math. Soc. Simon Stevin 8 (2001), no. 2, 181–207. MR1838930
[34] P. Arnoux, M. Mizutani, and T. Sellami, Random product of substitutions with the same
incidence matrix, Theoret. Comput. Sci. 543 (2014), 68–78, DOI 10.1016/j.tcs.2014.06.002.
Bibliography 425

[35] P. Arnoux and G. Rauzy, Représentation géométrique de suites de complexité 2n+1 (French,
with English summary), Bull. Soc. Math. France 119 (1991), no. 2, 199–215. MR1116845
[36] E. Artin, Galois Theory, Notre Dame Mathematical Lectures 2 (1971).
[37] J. S. Athreya and J. Chaika, The Hausdorff dimension of non-uniquely ergodic direc-
tions in H(2) is almost everywhere 12 , Geom. Topol. 19 (2015), no. 6, 3537–3563, DOI
10.2140/gt.2015.19.3537. MR3447109
[38] J. Auslander, Minimal flows and their extensions, Notas de Matemática [Mathematical
Notes], 122, North-Holland Mathematics Studies, vol. 153, North-Holland Publishing Co.,
Amsterdam, 1988. MR956049
[39] J. Auslander and J. A. Yorke, Interval maps, factors of maps, and chaos, Tohoku Math. J.
(2) 32 (1980), no. 2, 177–188, DOI 10.2748/tmj/1178229634. MR580273
[40] A. Avila and G. Forni, Weak mixing for interval exchange transformations and translation
flows, Ann. of Math. (2) 165 (2007), no. 2, 637–664, DOI 10.4007/annals.2007.165.637.
[41] M. Baake and U. Grimm, Squirals and beyond: substitution tilings with singular con-
tinuous spectrum, Ergodic Theory Dynam. Systems 34 (2014), no. 4, 1077–1102, DOI
10.1017/etds.2012.191. MR3227148
[42] S. Baker, Generalized golden ratios over integer alphabets, Integers 14 (2014), Paper No.
A15, 28, DOI 10.15546/aeei-2014-0005. MR3239596
[43] S. Baker and A. E. Ghenciu, Dynamical properties of S-gap shifts and other shift spaces, J.
Math. Anal. Appl. 430 (2015), no. 2, 633–647, DOI 10.1016/j.jmaa.2015.04.092. MR3351972
[44] V. Baker, M. Barge, and J. Kwapisz, Geometric realization and coincidence for reducible
non-unimodular Pisot tiling spaces with an application to β-shifts, Numération, pavages,
substitutions, Ann. Inst. Fourier (Grenoble) 56 (2006), no. 7, 2213–2248. MR2290779
[45] F. Balibrea, J. Smítal, and M. Štefánková, The three versions of distributional chaos, Chaos
Solitons Fractals 23 (2005), no. 5, 1581–1583, DOI 10.1016/j.chaos.2004.06.011. MR2101573
[46] J. Banks, J. Brooks, G. Cairns, G. Davis, and P. Stacey, On Devaney’s definition of chaos,
Amer. Math. Monthly 99 (1992), no. 4, 332–334, DOI 10.2307/2324899. MR1157223
[47] J. Banks, T. T. D. Nguyen, P. Oprocha, B. Stanley, and B. Trotta, Dynamics
of spacing shifts, Discrete Contin. Dyn. Syst. 33 (2013), no. 9, 4207–4232, DOI
10.3934/dcds.2013.33.4207. MR3038059
[48] G. Barat, T. Downarowicz, A. Iwanik, and P. Liardet, Propriétés topologiques et combi-
natoires des échelles de numération. part 2 (French, with English summary), dedicated
to the memory of Anzelm Iwanik, Colloq. Math. 84/85 (2000), no. part 2, 285–306, DOI
10.4064/cm-84/85-2-285-306. MR1784198
[49] G. Barat, T. Downarowicz, and P. Liardet, Dynamiques associées à une échelle de numéra-
tion (French), Acta Arith. 103 (2002), no. 1, 41–78, DOI 10.4064/aa103-1-5. MR1904893
[50] M. Barge, The Pisot conjecture for β-substitutions, Ergodic Theory Dynam. Systems 38
(2018), no. 2, 444–472, DOI 10.1017/etds.2016.44. MR3774828
[51] M. Barge, H. Bruin, and S. Štimac, The Ingram conjecture, Geom. Topol. 16 (2012), no. 4,
2481–2516, DOI 10.2140/gt.2012.16.2481. MR3033522
[52] M. Barge and B. Diamond, Coincidence for substitutions of Pisot type (English, with Eng-
lish and French summaries), Bull. Soc. Math. France 130 (2002), no. 4, 619–626, DOI
10.24033/bsmf.2433. MR1947456
[53] M. Barge and J. Kwapisz, Geometric theory of unimodular Pisot substitutions, Amer. J.
Math. 128 (2006), no. 5, 1219–1282. MR2262174
[54] M. Barge, S. Štimac, and R. F. Williams, Pure discrete spectrum in substitution tiling spaces,
Discrete Contin. Dyn. Syst. 33 (2013), no. 2, 579–597, DOI 10.3934/dcds.2013.33.579.
426 Bibliography

[55] A. D. Barwell, C. Good, and P. Oprocha, Shadowing and expansivity in subspaces, Fund.
Math. 219 (2012), no. 3, 223–243, DOI 10.4064/fm219-3-2. MR3001240
[56] R. Bass, Real analysis for graduate students, CreateSpace Independent Publishing Platform
(2016), Version 3.1 online on:
[57] T. Bedford et al., Ergodic theory, symbolic dynamics, and hyperbolic spaces (Trieste, 1989),
Oxford Sci. Publ., Oxford Univ. Press, New York, 1991.
[58] K. R. Berg, On the conjugacy problem for K-systems, ProQuest LLC, Ann Arbor, MI, 1967.
Thesis (Ph.D.)–University of Minnesota. MR2616688
[59] E. R. Berlekamp, J. H. Conway, and R. K. Guy, Winning ways for your mathematical plays.
Vol. 1: Games in general, Academic Press, Inc. [Harcourt Brace Jovanovich, Publishers],
London-New York, 1982. MR654501
[60] J. Berstel, Growth of repetition-free words—a review, Theoret. Comput. Sci. 340 (2005),
no. 2, 280–290, DOI 10.1016/j.tcs.2005.03.039. MR2150766
[61] J. Berstel, A. Lauve, C. Reutenauer, and F. V. Saliola, Combinatorics on words: Christoffel
words and repetitions in words, CRM Monograph Series, vol. 27, American Mathematical
Society, Providence, RI, 2009, DOI 10.1090/crmm/027. MR2464862
[62] V. Berthé, P. Cecchi Bernales, and R. Yassawi, Coboundaries and eigenvalues of finitary
S-adic systems, Preprint, 2022, arXiv:2202.07270.
[63] V. Berthé, P. Cecchi Bernales, F. Durand, J. Leroy, D. Perrin, and S. Petite, On the di-
mension group of unimodular S-adic subshifts, Monatsh. Math. 194 (2021), no. 4, 687–717,
DOI 10.1007/s00605-020-01488-3. MR4228544
[64] V. Berthé and V. Delecroix, Beyond substitutive dynamical systems: S-adic expansions,
Numeration and substitution 2012, RIMS Kôkyûroku Bessatsu, B46, Res. Inst. Math. Sci.
(RIMS), Kyoto, 2014, pp. 81–123. MR3330561
[65] V. Berthé, T. Jolivet, and A. Siegel, Substitutive Arnoux-Rauzy sequences have pure discrete
spectrum, Unif. Distrib. Theory 7 (2012), no. 1, 173–197. MR2943167
[66] V. Berthé, T. Jolivet, and A. Siegel, Connectedness of fractals associated with Arnoux-
Rauzy substitutions, RAIRO Theor. Inform. Appl. 48 (2014), no. 3, 249–266, DOI
10.1051/ita/2014008. MR3302487
[67] V. Berthé and H. Nakada, On continued fraction expansions in positive characteristic:
equivalence relations and some metric properties, Expo. Math. 18 (2000), no. 4, 257–284.
[68] V. Berthé et al., Combinatorics, automata and number theory, Eds. V. Berthé and M. Rigo.
[69] V. Berthé, A. Siegel, and J. Thuswaldner, Substitutions, Rauzy fractals and tilings, Com-
binatorics, automata and number theory, Encyclopedia Math. Appl., vol. 135, Cambridge
Univ. Press, Cambridge, 2010, pp. 248–323. MR2759108
[70] V. Berthé, W. Steiner, and J. M. Thuswaldner, Geometry, dynamics, and arithmetic of
S-adic shifts (English, with English and French summaries), Ann. Inst. Fourier (Grenoble)
69 (2019), no. 3, 1347–1409. MR3986918
[71] V. Berthé, W. Steiner, J. M. Thuswaldner, and R. Yassawi, Recognizability for sequences
of morphisms, Ergodic Theory Dynam. Systems 39 (2019), no. 11, 2896–2931, DOI
10.1017/etds.2017.144. MR4015135
[72] V. Bergelson, G. Kolesnik, and Y. Son, Uniform distribution of subpolynomial func-
tions along primes and applications, J. Anal. Math. 137 (2019), no. 1, 135–187, DOI
10.1007/s11854-018-0068-1. MR3938000
[73] J. Berstel and A. de Luca, Sturmian words, Lyndon words and trees, Theoret. Comput. Sci.
178 (1997), no. 1-2, 171–203, DOI 10.1016/S0304-3975(96)00101-6. MR1453849
[74] A. Besbes, M. Boshernitzan, and D. Lenz, Delone sets with finite local complexity: linear
repetitivity versus positivity of weights, Discrete Comput. Geom. 49 (2013), no. 2, 335–347,
DOI 10.1007/s00454-012-9455-z. MR3017915
Bibliography 427

[75] A. S. Besicovitch, On the density of certain sequences of integers, Math. Ann. 110 (1935),
no. 1, 336–341, DOI 10.1007/BF01448032. MR1512943
[76] E. Bessel-Hagen, Zahlentheorie, Teubner, 1929.
[77] S. Bezuglyi, O. Karpel, and J. Kwiatkowski, Exact number of ergodic invariant mea-
sures for Bratteli diagrams, J. Math. Anal. Appl. 480 (2019), no. 2, 123431, 49, DOI
10.1016/j.jmaa.2019.123431. MR4000100
[78] S. Bezuglyi, J. Kwiatkowski, and K. Medynets, Aperiodic substitution systems and
their Bratteli diagrams, Ergodic Theory Dynam. Systems 29 (2009), no. 1, 37–72, DOI
10.1017/S0143385708000230. MR2470626
[79] S. Bezuglyi, J. Kwiatkowski, K. Medynets, and B. Solomyak, Invariant measures on station-
ary Bratteli diagrams, Ergodic Theory Dynam. Systems 30 (2010), no. 4, 973–1007, DOI
10.1017/S0143385709000443. MR2669408
[80] S. Bezuglyi, J. Kwiatkowski, K. Medynets, and B. Solomyak, Finite rank Bratteli diagrams:
structure of invariant measures, Trans. Amer. Math. Soc. 365 (2013), no. 5, 2637–2679,
DOI 10.1090/S0002-9947-2012-05744-8. MR3020111
[81] S. Bezuglyi, J. Kwiatkowski, and R. Yassawi, Perfect orderings on finite rank Brat-
teli diagrams, Canad. J. Math. 66 (2014), no. 1, 57–101, DOI 10.4153/CJM-2013-041-6.
[82] S. Bezuglyi and R. Yassawi, Orders that yield homeomorphisms on Bratteli diagrams, Dyn.
Syst. 32 (2017), no. 2, 249–282, DOI 10.1080/14689367.2016.1197888. MR3638433
[83] F. Blanchard, E. Glasner, S. Kolyada, and A. Maass, On Li-Yorke pairs, J. Reine Angew.
Math. 547 (2002), 51–68, DOI 10.1515/crll.2002.053. MR1900136
[84] F. Blanchard and G. Hansel, Systèmes codés (French, with English summary), Theoret.
Comput. Sci. 44 (1986), no. 1, 17–49, DOI 10.1016/0304-3975(86)90108-8. MR858689
[85] F. Blanchard et al., Topics in Symbolic Dynamics and Applications, London Mathematical
Society Lecture Note Series, Editors: F. Blanchard, A. Maass, and A. Nogueira, Cambridge
Univ. Press, 2000, ISBN 9780521796606.
[86] P. Billingsley, Probability and measure, 3rd ed., A Wiley-Interscience Publication, Wiley
Series in Probability and Mathematical Statistics, John Wiley & Sons, Inc., New York,
1995. MR1324786
[87] L. S. Block and W. A. Coppel, Dynamics in one dimension, Lecture Notes in Mathematics,
vol. 1513, Springer-Verlag, Berlin, 1992, DOI 10.1007/BFb0084762. MR1176513
[88] L. Block, J. Guckenheimer, M. Misiurewicz, and L. S. Young, Periodic points and topologi-
cal entropy of one-dimensional maps, Global theory of dynamical systems (Proc. Internat.
Conf., Northwestern Univ., Evanston, Ill., 1979), Lecture Notes in Math., vol. 819, Springer,
Berlin, 1980, pp. 18–34. MR591173
[89] L. Block, J. Keesling, and M. Misiurewicz, Strange adding machines, Ergodic Theory Dy-
nam. Systems 26 (2006), no. 3, 673–682, DOI 10.1017/S0143385705000635. MR2237463
[90] A. M. Blokh, Sensitive mappings of an interval (Russian), Uspekhi Mat. Nauk 37 (1982),
no. 2(224), 189–190. MR650765
[91] A. M. Blokh, Decomposition of dynamical systems on an interval (Russian), Uspekhi Mat.
Nauk 38 (1983), no. 5(233), 179–180. MR718829
[92] A. M. Blokh, The “spectral” decomposition for one-dimensional maps, Dynamics reported,
Dynam. Report. Expositions Dynam. Systems (N.S.), vol. 4, Springer, Berlin, 1995, pp. 1–59.
[93] A. Blokh and L. Oversteegen, Wandering triangles exist (English, with English and
French summaries), C. R. Math. Acad. Sci. Paris 339 (2004), no. 5, 365–370, DOI
10.1016/j.crma.2004.06.024. MR2092465
[94] J. Bobok and H. Bruin, Constant slope maps and the Vere-Jones classification, Entropy 18
(2016), no. 6, Paper No. 234, 27, DOI 10.3390/e18060234. MR3530057
428 Bibliography

[95] J. Bobok and M. Soukenka, On piecewise affine interval maps with countably many laps,
Discrete Contin. Dyn. Syst. 31 (2011), no. 3, 753–762, DOI 10.3934/dcds.2011.31.753.
[96] M. Boshernitzan, A unique ergodicity of minimal symbolic flows with linear block growth,
J. Analyse Math. 44 (1984/85), 77–96, DOI 10.1007/BF02790191. MR801288
[97] M. D. Boshernitzan, A condition for unique ergodicity of minimal symbolic flows, Er-
godic Theory Dynam. Systems 12 (1992), no. 3, 425–428, DOI 10.1017/S0143385700006866.
[98] W. Bosma et al., Continued fractions, Eds. W. Bosma and C. Kraaikamp, URL:∼bosma/Students/CF.pdf.
[99] R. Bowen, Markov partitions for Axiom A diffeomorphisms, Amer. J. Math. 92 (1970),
725–747, DOI 10.2307/2373370. MR277003
[100] R. Bowen, Markov partitions and minimal sets for Axiom A diffeomorphisms, Amer. J.
Math. 92 (1970), 907–918, DOI 10.2307/2373402. MR277002
[101] R. Bowen, Periodic points and measures for Axiom A diffeomorphisms, Trans. Amer. Math.
Soc. 154 (1971), 377–397, DOI 10.2307/1995452. MR282372
[102] R. Bowen, Equilibrium states and the ergodic theory of Anosov diffeomorphisms, Second
revised edition, with a preface by David Ruelle; edited by Jean-René Chazottes, Lecture
Notes in Mathematics, vol. 470, Springer-Verlag, Berlin, 2008. MR2423393
[103] R. Bowen, Some systems with unique equilibrium states, Math. Systems Theory 8 (1974/75),
no. 3, 193–202, DOI 10.1007/BF01762666. MR399413
[104] R. Bowen, Markov partitions are not smooth, Proc. Amer. Math. Soc. 71 (1978), no. 1,
130–132, DOI 10.2307/2042234. MR474415
[105] P. Boyland, A. de Carvalho, and T. Hall, Natural extensions of unimodal maps: virtual
sphere homeomorphisms and prime ends of basin boundaries, Geom. Topol. 25 (2021),
no. 1, 111–228, DOI 10.2140/gt.2021.25.111. MR4226229
[106] M. Boyle, Open problems in symbolic dynamics, Geometric and probabilistic struc-
tures in dynamics, Contemp. Math., vol. 469, Amer. Math. Soc., Providence, RI,
and updates on∼mboyle/open/, 2008, pp. 69–118, DOI
10.1090/conm/469/09161. MR2478466
[107] F.-J. Brandenburg, Uniformly growing kth power-free homomorphisms, Theoret. Comput.
Sci. 23 (1983), no. 1, 69–82, DOI 10.1016/0304-3975(88)90009-6. MR693069
[108] O. Bratteli, Inductive limits of finite dimensional C ∗ -algebras, Trans. Amer. Math. Soc.
171 (1972), 195–234, DOI 10.2307/1996380. MR312282
[109] X. Bressaud, F. Durand, and A. Maass, Necessary and sufficient conditions to be an eigen-
value for linearly recurrent dynamical Cantor systems, J. London Math. Soc. (2) 72 (2005),
no. 3, 799–816, DOI 10.1112/S0024610705006800. MR2190338
[110] X. Bressaud, F. Durand, and A. Maass, On the eigenvalues of finite rank Bratteli-Vershik
dynamical systems, Ergodic Theory Dynam. Systems 30 (2010), no. 3, 639–664, DOI
10.1017/S0143385709000236. MR2643706
[111] J. Brillhart and L. Carlitz, Note on the Shapiro polynomials, Proc. Amer. Math. Soc. 25
(1970), 114–118, DOI 10.2307/2036537. MR260955
[112] K. Briggs, A precise calculation of the Feigenbaum constants, Math. Comp. 57 (1991),
no. 195, 435–439, DOI 10.2307/2938684. MR1079009
[113] M. Brin and G. Stuck, Introduction to dynamical systems, Cambridge University Press,
Cambridge, 2002, DOI 10.1017/CBO9780511755316. MR1963683
[114] J. Brinkhuis, Nonrepetitive sequences on three symbols, Quart. J. Math. Oxford Ser. (2) 34
(1983), no. 134, 145–149, DOI 10.1093/qmath/34.2.145. MR698202
[115] S. Brlek, Enumeration of factors in the Thue-Morse word, First Montreal Conference on
Combinatorics and Computer Science, 1987, Discrete Appl. Math. 24 (1989), no. 1-3, 83–96,
DOI 10.1016/0166-218X(92)90274-E. MR1011264
Bibliography 429

[116] K. M. Brucks and H. Bruin, Topics from one-dimensional dynamics, London Mathemat-
ical Society Student Texts, vol. 62, Cambridge University Press, Cambridge, 2004, DOI
10.1017/CBO9780511617171. MR2080037
[117] H. Bruin, Invariant measures of interval maps, ProQuest LLC, Ann Arbor, MI, 1994. Thesis
(Dr.)–Technische Universiteit Delft (The Netherlands). MR2714793
[118] H. Bruin, Combinatorics of the kneading map, Proceedings of the Conference “Thirty Years
after Sharkovskiı̆’s Theorem: New Perspectives” (Murcia, 1994), Internat. J. Bifur. Chaos
Appl. Sci. Engrg. 5 (1995), no. 5, 1339–1349, DOI 10.1142/S0218127495001010. MR1361922
[119] H. Bruin, Homeomorphic restrictions of unimodal maps, Geometry and topology in dynam-
ics (Winston-Salem, NC, 1998/San Antonio, TX, 1999), Contemp. Math., vol. 246, Amer.
Math. Soc., Providence, RI, 1999, pp. 47–56, DOI 10.1090/conm/246/03773. MR1732370
[120] H. Bruin, Inverse limit spaces of post-critically finite tent maps, Fund. Math. 165 (2000),
no. 2, 125–138, DOI 10.4064/fm-165-2-125-138. MR1808727
[121] H. Bruin, Minimal Cantor systems and unimodal maps, dedicated to Professor Alexander N.
Sharkovsky on the occasion of his 65th birthday, J. Difference Equ. Appl. 9 (2003), no. 3-4,
305–318, DOI 10.1080/1023619021000047743. MR1990338
[122] H. Bruin and J. Hawkins, Exactness and maximal automorphic factors of unimodal
interval maps, Ergodic Theory Dynam. Systems 21 (2001), no. 4, 1009–1034, DOI
10.1017/S0143385701001481. MR1849599
[123] H. Bruin and J. Hawkins, Rigidity of smooth one-sided Bernoulli endomorphisms, New York
J. Math. 15 (2009), 451–483. MR2558792
[124] H. Bruin, A. Kaffl, and D. Schleicher, Existence of quadratic Hubbard trees, Fund. Math.
202 (2009), no. 3, 251–279, DOI 10.4064/fm202-3-4. MR2476617
[125] H. Bruin, G. Keller, and M. St. Pierre, Adding machines and wild attractors, Ergodic
Theory Dynam. Systems 17 (1997), no. 6, 1267–1287, DOI 10.1017/S0143385797086392.
[126] H. Bruin and D. Schleicher, Admissibility of kneading sequences and structure of Hubbard
trees for quadratic polynomials, J. Lond. Math. Soc. (2) 78 (2008), no. 2, 502–522, DOI
10.1112/jlms/jdn033. MR2439637
[127] H. Bruin and M. Todd, Transience and thermodynamic formalism for infinitely branched
interval maps, J. Lond. Math. Soc. (2) 86 (2012), no. 1, 171–194, DOI 10.1112/jlms/jdr081.
[128] H. Bruin and S. Troubetzkoy, The Gauss map on a class of interval translation mappings,
Israel J. Math. 137 (2003), 125–148, DOI 10.1007/BF02785958. MR2013352
[129] H. Bruin and O. Volkova, The complexity of Fibonacci-like kneading sequences, Theoret.
Comput. Sci. 337 (2005), no. 1-3, 379–389, DOI 10.1016/j.tcs.2005.02.001. MR2141232
[130] J. R. Büchi, Weak second-order arithmetic and finite automata, Z. Math. Logik Grundlagen
Math. 6 (1960), 66–92, DOI 10.1002/malq.19600060105. MR125010
[131] A. Bufetov, Y. G. Sinai, and C. Ulcigrai, A condition for continuous spectrum of an in-
terval exchange transformation, Representation theory, dynamical systems, and asymptotic
combinatorics, Amer. Math. Soc. Transl. Ser. 2, vol. 217, Amer. Math. Soc., Providence, RI,
2006, pp. 23–35, DOI 10.1090/trans2/217/03. MR2276099
[132] A. I. Bufetov and B. Solomyak, On the modulus of continuity for spectral measures in
substitution dynamics, Adv. Math. 260 (2014), 84–129, DOI 10.1016/j.aim.2014.04.004.
[133] R. Burton and J. E. Steif, Non-uniqueness of measures of maximal entropy for sub-
shifts of finite type, Ergodic Theory Dynam. Systems 14 (1994), no. 2, 213–235, DOI
10.1017/S0143385700007859. MR1279469
[134] R. Burton and J. E. Steif, New results on measures of maximal entropy, Israel J. Math. 89
(1995), no. 1-3, 275–300, DOI 10.1007/BF02808205. MR1324466
430 Bibliography

[135] J. Buzzi, Subshifts of quasi-finite type, Invent. Math. 159 (2005), no. 2, 369–406, DOI
10.1007/s00222-004-0392-1. MR2116278
[136] J. Buzzi, Puzzles of quasi-finite type, zeta functions and symbolic dynamics for multi-
dimensional maps (English, with English and French summaries), Ann. Inst. Fourier (Greno-
ble) 60 (2010), no. 3, 801–852. MR2680817
[137] J. Buzzi and P. Hubert, Piecewise monotone maps without periodic points: rigidity, mea-
sures and complexity, Ergodic Theory Dynam. Systems 24 (2004), no. 2, 383–405, DOI
10.1017/S0143385703000488. MR2054049
[138] V. Canterini and A. Siegel, Automate des préfixes-suffixes associé à une substitution primi-
tive (French, with English and French summaries), J. Théor. Nombres Bordeaux 13 (2001),
no. 2, 353–369. MR1879663
[139] F. Cai, A characterization of weak-mixing for minimal systems, Topology Appl. 267 (2019),
106844, 11, DOI 10.1016/j.topol.2019.106844. MR4001122
[140] M. Campanino and H. Epstein, On the existence of Feigenbaum’s fixed point, Comm. Math.
Phys. 79 (1981), no. 2, 261–302. MR612250
[141] I. Carbone, A van der Corput-type algorithm for LS-sequences of points, Preprint, 2012,
arXiv:1209.3611, and Extension of van der Corput algorithm to LS-sequences, Appl. Math.
Comput. 255 (2015), 207–213.
[142] I. Carbone, Discrepancy of LS-sequences of partitions and points, Ann. Mat. Pura Appl.
(4) 191 (2012), no. 4, 819–844, DOI 10.1007/s10231-011-0208-z. MR2993975
[143] I. Carbone, M. R. Iacò, and A. Volčič, A dynamical system approach to the Kakutani-
Fibonacci sequence, Ergodic Theory Dynam. Systems 34 (2014), no. 6, 1794–1806, DOI
10.1017/etds.2013.20. MR3272771
[144] L. Carleson and T. W. Gamelin, Complex dynamics, Universitext: Tracts in Mathematics,
Springer-Verlag, New York, 1993, DOI 10.1007/978-1-4612-4364-9. MR1230383
[145] J. Cassaigne, Counting overlap-free binary words, STACS 93 (Würzburg, 1993), Lecture
Notes in Comput. Sci., vol. 665, Springer, Berlin, 1993, pp. 216–225, DOI 10.1007/3-540-
56503-5_24. MR1249296
[146] J. Cassaigne, Special factors of sequences with linear subword complexity, Developments in
language theory, II (Magdeburg, 1995), World Sci. Publ., River Edge, NJ, 1996, pp. 25–34.
[147] J. Cassaigne, Subword complexity and periodicity in two or more dimensions, Developments
in language theory (Aachen, 1999), World Sci. Publ., River Edge, NJ, 2000, pp. 14–21.
[148] J. Cassaigne, S. Ferenczi, and A. Messaoudi, Weak mixing and eigenvalues for Arnoux-Rauzy
sequences, Ann. Inst. Fourier (Grenoble) 58 (2008), no. 6, 1983–2005. MR2473626
[149] J. Cassaigne, S. Ferenczi, and L. Q. Zamboni, Imbalances in Arnoux-Rauzy sequences
(English, with English and French summaries), Ann. Inst. Fourier (Grenoble) 50 (2000),
no. 4, 1265–1276. MR1799745
[150] J. Cassaigne and F. Nicolas, Factor complexity, Combinatorics, automata and number the-
ory, Encyclopedia Math. Appl., vol. 135, Cambridge Univ. Press, Cambridge, 2010, pp. 163–
247. MR2759107
[151] J. Chaika and H. Masur, There exists an interval exchange with a non-ergodic generic
measure, J. Mod. Dyn. 9 (2015), 289–304, DOI 10.3934/jmd.2015.9.289. MR3412151
[152] J. Chaika and H. Masur, The set of non-uniquely ergodic d-IETs has Hausdorff codi-
mension 1/2, Invent. Math. 222 (2020), no. 3, 749–832, DOI 10.1007/s00222-020-00978-3.
[153] X. Chen, Q.-H. Lu, and H.-M. Xie, Grammatical complexity of Feigenbaum attractor, Advances in Mathematics (China) 22 (1993), 185–186.
[154] D. K. Childers, Wandering polygons and recurrent critical leaves, Ergodic Theory Dynam.
Systems 27 (2007), no. 1, 87–107, DOI 10.1017/S0143385706000526. MR2297088
Bibliography 431

[155] S. Chowla, On abundant numbers, J. Indian Math. Soc. New Ser. 1 (1934), 41–44.
[156] J. P. Clay, Proximity relations in transformation groups, Trans. Amer. Math. Soc. 108
(1963), 88–96, DOI 10.2307/1993827. MR154269
[157] V. Climenhaga, MathBlog:
[158] V. Climenhaga, Specification and towers in shift spaces, Comm. Math. Phys. 364 (2018),
no. 2, 441–504, DOI 10.1007/s00220-018-3265-y. MR3869435
[159] V. Climenhaga and D. J. Thompson, Intrinsic ergodicity beyond specification: β-shifts, S-gap
shifts, and their factors, Israel J. Math. 192 (2012), no. 2, 785–817, DOI 10.1007/s11856-
012-0052-x. MR3009742
[160] V. Climenhaga and D. J. Thompson, Intrinsic ergodicity via obstruction entropies, Er-
godic Theory Dynam. Systems 34 (2014), no. 6, 1816–1831, DOI 10.1017/etds.2013.16.
[161] A. Cobham, On the base-dependence of sets of numbers recognizable by finite automata,
Math. Systems Theory 3 (1969), 186–192, DOI 10.1007/BF01746527. MR250789
[162] A. Cobham, Uniform tag sequences, Math. Systems Theory 6 (1972), 164–192, DOI
10.1007/BF01706087. MR457011
[163] H. Cohn, A short proof of the simple continued fraction expansion of e, Amer. Math.
Monthly 113 (2006), no. 1, 57–62, DOI 10.2307/27641837. MR2202921
[164] P. Collet and J.-P. Eckmann, Iterated maps on the interval as dynamical systems, Progress
in Physics, vol. 1, Birkhäuser, Boston, Mass., 1980. MR613981
[165] I. P. Cornfeld, S. V. Fomin, and Ya. G. Sinaı̆, Ergodic theory, translated from the Russian by
A. B. Sosinskiı̆, Grundlehren der mathematischen Wissenschaften [Fundamental Principles
of Mathematical Sciences], vol. 245, Springer-Verlag, New York, 1982, DOI 10.1007/978-1-
4615-6927-5. MR832433
[166] J. Coquet, Représentations lacunaires des entiers naturels I, II, Arch. Math. (Basel) 38
(1982), 184–188, and 41 (1983), 238–242.
[167] M. I. Cortez, F. Durand, B. Host, and A. Maass, Continuous and measurable eigenfunctions
of linearly recurrent dynamical Cantor systems, J. London Math. Soc. (2) 67 (2003), no. 3,
790–804, DOI 10.1112/S0024610703004320. MR1967706
[168] M. I. Cortez and J. Rivera-Letelier, Invariant measures of minimal post-critical sets of logis-
tic maps, Israel J. Math. 176 (2010), 157–193, DOI 10.1007/s11856-010-0024-y. MR2653190
[169] M. I. Cortez and J. Rivera-Letelier, Topological orbit equivalence classes and numeration
scales of logistic maps, Ergodic Theory Dynam. Systems 32 (2012), no. 5, 1501–1526, DOI
10.1017/S0143385711000435. MR2974208
[170] E. M. Coven, I. Kan, and J. A. Yorke, Pseudo-orbit shadowing in the family of tent maps,
Trans. Amer. Math. Soc. 308 (1988), no. 1, 227–241, DOI 10.2307/2000960. MR946440
[171] D. Creutz and C. E. Silva, Mixing on a class of rank-one transformations, Ergodic Theory
Dynam. Systems 24 (2004), no. 2, 407–440, DOI 10.1017/S0143385703000464. MR2054050
[172] D. Creutz and C. E. Silva, Mixing on rank-one transformations, Studia Math. 199 (2010),
no. 1, 43–72, DOI 10.4064/sm199-1-4. MR2652597
[173] J. Currie and N. Rampersad, A proof of Dejean’s conjecture, Math. Comp. 80 (2011),
no. 274, 1063–1070, DOI 10.1090/S0025-5718-2010-02407-X. MR2772111
[174] V. Cyr and B. Kra, Counting generic measures for a subshift of linear growth, J. Eur. Math.
Soc. (JEMS) 21 (2019), no. 2, 355–380, DOI 10.4171/JEMS/838. MR3896204
[175] K. Dajani and C. Kraaikamp, Ergodic theory of numbers, Carus Mathematical Monographs,
vol. 29, Mathematical Association of America, Washington, DC, 2002. MR1917322
[176] D. Damanik and D. Lenz, A condition of Boshernitzan and uniform convergence in the mul-
tiplicative ergodic theorem, Duke Math. J. 133 (2006), no. 1, 95–123, DOI 10.1215/S0012-
7094-06-13314-8. MR2219271
[177] M. Damron and J. Fickenscher, On the number of ergodic measures for minimal shifts with
eventually constant complexity growth, Ergodic Theory Dynam. Systems 37 (2017), no. 7,
2099–2130, DOI 10.1017/etds.2015.138. MR3693122
[178] M. Damron and J. Fickenscher, The number of ergodic measures for transitive subshifts
under the regular bispecial condition, Ergodic Theory Dynam. Systems 42 (2022), no. 1,
86–140, DOI 10.1017/etds.2020.134. MR4348411
[179] P. Dartnell, F. Durand, and A. Maass, Orbit equivalence and Kakutani equivalence with
Sturmian subshifts, Studia Math. 142 (2000), no. 1, 25–45, DOI 10.4064/sm-142-1-25-45.
[180] D. A. Dastjerdi and M. Dabbaghian Amiri, Mixing coded systems, Georgian Math. J. 26
(2019), no. 4, 637–642, DOI 10.1515/gmj-2017-0058; also arXiv:1507.08048. MR4036605
[181] D. A. Dastjerdi and S. Jangjoo, Computations on sofic S-gap shifts, Qual. Theory Dyn.
Syst. 12 (2013), no. 2, 393–406, DOI 10.1007/s12346-013-0096-2. MR3101268
[182] D. A. Dastjerdi and S. Jangjoo, Dynamics and topology of S-gap shifts, Topology Appl. 159
(2012), no. 10-11, 2654–2661, DOI 10.1016/j.topol.2012.04.002. MR2923435
[183] D. Dastjerdi and S. Shaldehi, $(S, S')$-gap shifts as a generalization of run-length-limited
codes, British J. of Math. & Comp. Science 4 (2014), 2765–2780.
[184] H. Davenport, Über numeri abundantes, Sitzungsberichte Preuss. Akad. Wiss. (1933), 830–
[185] H. Davenport, On some infinite series involving arithmetical functions. II, Quart. J. Math.
Oxford 8 (1937), 313–320.
[186] H. Davenport and P. Erdös, On sequences of positive integers, Acta Arith. 2 (1936), 147–
[187] H. Davenport and P. Erdös, On sequences of positive integers, J. Indian Math. Soc. (N.S.)
15 (1951), 19–24. MR43835
[188] H. Davenport and K. F. Roth, Rational approximations to algebraic numbers, Mathematika
2 (1955), 160–167, DOI 10.1112/S0025579300000814. MR77577
[189] F. Dejean, Sur un théorème de Thue (French), J. Combinatorial Theory Ser. A 13 (1972),
90–99, DOI 10.1016/0097-3165(72)90011-8. MR300959
[190] F. M. Dekking, The spectrum of dynamical systems arising from substitutions of constant
length, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 41 (1977/78), no. 3, 221–239, DOI
10.1007/BF00534241. MR461470
[191] F. M. Dekking and M. Keane, Mixing properties of substitutions, Z. Wahrscheinlichkeitsthe-
orie und Verw. Gebiete 42 (1978), no. 1, 23–33, DOI 10.1007/BF00534205. MR466485
[192] V. Delecroix, C. Matheus, and C. G. Moreira, Approximations of the Lagrange and Markov
spectra, Math. Comp. 89 (2020), no. 325, 2521–2536, DOI 10.1090/mcom/3513. MR4109576
[193] A. Denjoy, Sur les courbes définies par les équations différentielles à la surface du tore,
Journal de Mathématiques Pures et Appliquées 11 (1932), 333–375.
[194] M. Denker, C. Grillenberger, and K. Sigmund, Ergodic theory on compact spaces, Lecture
Notes in Mathematics, Vol. 527, Springer-Verlag, Berlin-New York, 1976. MR0457675
[195] B. Derrida, A. Gervois, and Y. Pomeau, Iteration of endomorphisms on the real axis and
representation of numbers (English, with French summary), Ann. Inst. H. Poincaré Sect. A
(N.S.) 29 (1978), no. 3, 305–356. MR519698
[196] R. L. Devaney, An introduction to chaotic dynamical systems, The Benjamin/Cummings
Publishing Co., Inc., Menlo Park, CA, 1986. MR811850
[197] R. Deviatov, On subword complexity of morphic sequences, Computer science—theory and
applications, Lecture Notes in Comput. Sci., vol. 5010, Springer, Berlin, 2008, pp. 146–157,
DOI 10.1007/978-3-540-79709-8_17. MR2475157
[198] J. de Vries, Elements of topological dynamics, Mathematics and its Applications, vol. 257,
Kluwer Academic Publishers Group, Dordrecht, 1993, DOI 10.1007/978-94-015-8171-4.
[199] J. de Vries, Topological dynamical systems: An introduction to the dynamics of contin-
uous mappings, De Gruyter Studies in Mathematics, vol. 59, De Gruyter, Berlin, 2014.
[200] M. de Vries, V. Komornik, and P. Loreti, Topology of the set of univoque bases, Topology
Appl. 205 (2016), 117–137, DOI 10.1016/j.topol.2016.01.023. MR3493310
[201] M. de Vries, V. Komornik, and P. Loreti, Topology of univoque sets in real base expan-
sions, Topology Appl. 312 (2022), Paper No. 108085, 36, DOI 10.1016/j.topol.2022.108085.
[202] E. I. Dinaburg, A correlation between topological entropy and metric entropy (Russian),
Dokl. Akad. Nauk SSSR 190 (1970), 19–22. MR0255765
[203] P. Dömösi and M. Ito, Context-free languages and primitive words, World Scientific Pub-
lishing Co. Pte. Ltd., Hackensack, NJ, 2015. MR3243137
[204] S. Donoso, F. Durand, A. Maass, and S. Petite, On automorphism groups of low
complexity subshifts, Ergodic Theory Dynam. Systems 36 (2016), no. 1, 64–95, DOI
10.1017/etds.2015.70. MR3436754
[205] A. Douady, Julia sets and the Mandelbrot set, in: H.-O. Peitgen and P. Richter, The beauty
of fractals, Springer-Verlag, New York, 1986, pp. 161–173.
[206] A. Douady and J. Hubbard, Étude dynamique des polynômes complexes I & II, Publ. Math.
Orsay (1984-1985) (The Orsay notes).
[207] T. Downarowicz, The Choquet simplex of invariant measures for minimal flows, Israel J.
Math. 74 (1991), no. 2-3, 241–256, DOI 10.1007/BF02775789. MR1135237
[208] T. Downarowicz, Survey of odometers and Toeplitz flows, Algebraic and topological dy-
namics, Contemp. Math., vol. 385, Amer. Math. Soc., Providence, RI, 2005, pp. 7–37, DOI
10.1090/conm/385/07188. MR2180227
[209] T. Downarowicz, Entropy in dynamical systems, New Mathematical Monographs,
vol. 18, Cambridge University Press, Cambridge, 2011, DOI 10.1017/CBO9780511976155.
[210] T. Downarowicz, Positive topological entropy implies chaos DC2, Proc. Amer. Math. Soc.
142 (2014), no. 1, 137–149, DOI 10.1090/S0002-9939-2013-11717-X. MR3119189
[211] T. Downarowicz and E. Glasner, Isomorphic extensions and applications, Topol. Methods
Nonlinear Anal. 48 (2016), no. 1, 321–338, DOI 10.12775/TMNA.2016.050. MR3586277
[212] T. Downarowicz and O. Karpel, Dynamics in dimension zero: a survey, Discrete Contin.
Dyn. Syst. 38 (2018), no. 3, 1033–1062, DOI 10.3934/dcds.2018044. MR3808986
[213] T. Downarowicz and O. Karpel, Decisive Bratteli-Vershik models, Studia Math. 247 (2019),
no. 3, 251–271, DOI 10.4064/sm170519-5-2. MR3937447
[214] T. Downarowicz and Y. Lacroix, A non-regular Toeplitz flow with preset pure point spectrum,
Studia Math. 120 (1996), no. 3, 235–246. MR1410450
[215] T. Downarowicz and A. Maass, Finite-rank Bratteli-Vershik diagrams are expansive, Er-
godic Theory Dynam. Systems 28 (2008), no. 3, 739–747, DOI 10.1017/S0143385707000673.
[216] T. Downarowicz and J. Serafin, A short proof of the Ornstein theorem, Ergodic Theory
Dynam. Systems 32 (2012), no. 2, 587–597, DOI 10.1017/S0143385711000265. MR2901361
[217] T. Downarowicz and J. Serafin, A strictly ergodic, positive entropy subshift uniformly
uncorrelated to the Möbius function, Studia Math. 251 (2020), no. 2, 195–206, DOI
10.4064/sm180719-13-12. MR4045659
[218] M. Drmota and R. F. Tichy, Sequences, discrepancies and applications, Lecture Notes
in Mathematics, vol. 1651, Springer-Verlag, Berlin, 1997, DOI 10.1007/BFb0093404.
[219] F. Durand, A characterization of substitutive sequences using return words, Discrete Math.
179 (1998), no. 1-3, 89–101, DOI 10.1016/S0012-365X(97)00029-0. MR1489074
[220] F. Durand, A generalization of Cobham’s theorem, Theory Comput. Syst. 31 (1998), no. 2,
169–185, DOI 10.1007/s002240000084. MR1491657
[221] F. Durand, Linearly recurrent subshifts have a finite number of non-periodic subshift factors,
Ergodic Theory Dynam. Systems 20 (2000), 1061–1078, and corrigendum and addendum,
Ergodic Theory Dynam. Systems 23 (2003), 663–669. MR1779393; MR1972245
[222] F. Durand, Combinatorics on Bratteli diagrams and dynamical systems, Combinatorics,
automata and number theory, Encyclopedia Math. Appl., vol. 135, Cambridge Univ. Press,
Cambridge, 2010, pp. 324–372. MR2759109
[223] F. Durand, Cobham’s theorem for substitutions, J. Eur. Math. Soc. (JEMS) 13 (2011), no. 6,
1799–1814, DOI 10.4171/JEMS/294. MR2835330
[224] F. Durand, A. Frank, and A. Maass, Eigenvalues of minimal Cantor systems, J. Eur. Math.
Soc. (JEMS) 21 (2019), no. 3, 727–775, DOI 10.4171/JEMS/849. MR3908764
[225] F. Durand, B. Host, and C. Skau, Substitutional dynamical systems, Bratteli diagrams
and dimension groups, Ergodic Theory Dynam. Systems 19 (1999), no. 4, 953–993, DOI
10.1017/S0143385799133947. MR1709427
[226] F. Durand and J. Leroy, S-adic conjecture and Bratteli diagrams (English, with English
and French summaries), C. R. Math. Acad. Sci. Paris 350 (2012), no. 21-22, 979–983, DOI
10.1016/j.crma.2012.10.015. MR2996779
[227] F. Durand and J. Leroy, The constant of recognizability is computable for primitive mor-
phisms, J. Integer Seq. 20 (2017), no. 4, Art. 17.4.5, 15. MR3622264
[228] F. Durand, J. Leroy, and G. Richomme, Do the properties of an S-adic representation
determine factor complexity?, J. Integer Seq. 16 (2013), no. 2, Article 13.2.6, 30. MR3032389
[229] F. Durand and M. Rigo, On Cobham’s theorem, Chapter in Automata: from Mathematics
to Applications, Eur. Math. Soc., Editor J.-E. Pin.
[230] A. Dykstra, N. Ormes, and R. Pavlov, Subsystems of transitive subshifts with linear complex-
ity, Ergodic Theory Dynam. Systems 42 (2022), no. 6, 1967–1993, DOI 10.1017/etds.2021.8.
[231] A. Dymek, S. Kasjan, J. Kułaga-Przymus, and M. Lemańczyk, B-free sets and dynamics,
Trans. Amer. Math. Soc. 370 (2018), no. 8, 5425–5489, DOI 10.1090/tran/7132. MR3803141
[232] F. J. Dyson and H. Falk, Period of a discrete cat mapping, Amer. Math. Monthly 99 (1992),
no. 7, 603–614, DOI 10.2307/2324989. MR1176587
[233] S. Eilenberg, Automata, languages, and machines. Vol. A, Pure and Applied Mathemat-
ics, Vol. 58, Academic Press [Harcourt Brace Jovanovich, Publishers], New York, 1974.
[234] A. Eizenberg, Y. Kifer, and B. Weiss, Large deviations for $\mathbb{Z}^d$-actions, Comm. Math.
Phys. 644 (1994), 33–54.
[235] S. B. Ekhad and D. Zeilberger, There are more than $2^{n/17}$ $n$-letter ternary square-free
words, J. Integer Seq. 1 (1998), Article 98.1.9 (1 HTML document). MR1677077
[236] R. Ellis and W. H. Gottschalk, Homomorphisms of transformation groups, Trans. Amer.
Math. Soc. 94 (1960), 258–271, DOI 10.2307/1993310. MR123635
[237] J. Epperlein, D. Kwietniak, and P. Oprocha, Mixing properties in coded systems, New trends
in one-dimensional dynamics, Springer Proc. Math. Stat., vol. 285, Springer, Cham, 2019,
pp. 183–200, DOI 10.1007/978-3-030-16833-9_10. MR4043215
[238] P. Erdős, M. Horváth, and I. Joó, On the uniqueness of the expansions $1 = \sum q^{-n_i}$, Acta
Math. Hungar. 58 (1991), no. 3-4, 333–342, DOI 10.1007/BF01903963. MR1153488
[239] P. Erdős and I. Joó, On the number of expansions $1 = \sum q^{-n_i}$, Ann. Univ. Sci. Budapest.
Eötvös Sect. Math. 35 (1992), 129–132. MR1198106
[240] P. Erdös, I. Joó, and V. Komornik, Characterization of the unique expansions
$1 = \sum_{i=1}^{\infty} q^{-n_i}$ and related problems (English, with French summary), Bull. Soc. Math. France
118 (1990), no. 3, 377–390. MR1078082
[241] M. J. Feigenbaum, Quantitative universality for a class of nonlinear transformations, J.
Statist. Phys. 19 (1978), no. 1, 25–52, DOI 10.1007/BF01020332. MR501179
[242] D.-J. Feng, M. Furukado, S. Ito, and J. Wu, Pisot substitutions and the Hausdorff dimen-
sion of boundaries of atomic surfaces, Tsukuba J. Math. 30 (2006), no. 1, 195–223, DOI
10.21099/tkbjm/1496165037. MR2248292
[243] S. Ferenczi, Les transformations de Chacon: combinatoire, structure géométrique, lien avec
les systèmes de complexité 2n + 1 (French, with English and French summaries), Bull. Soc.
Math. France 123 (1995), no. 2, 271–292. MR1340291
[244] S. Ferenczi, Rank and symbolic complexity, Ergodic Theory Dynam. Systems 17 (1996),
[245] S. Ferenczi, J. Kułaga-Przymus, and M. Lemańczyk, Sarnak’s conjecture: what’s new, Er-
godic theory and dynamical systems in their interactions with arithmetics and combinatorics,
Lecture Notes in Math., vol. 2213, Springer, Cham, 2018, pp. 163–235. MR3821717
[246] S. Ferenczi, C. Mauduit, and A. Nogueira, Substitution dynamical systems: algebraic char-
acterization of eigenvalues, Ann. Sci. École Norm. Sup. (4) 29 (1996), no. 4, 519–533.
[247] N. J. Fine and H. S. Wilf, Uniqueness theorems for periodic functions, Proc. Amer. Math.
Soc. 16 (1965), 109–114, DOI 10.2307/2034009. MR174934
[248] A. M. Fisher, Nonstationary mixing and the unique ergodicity of adic transformations,
Stoch. Dyn. 9 (2009), no. 3, 335–391, DOI 10.1142/S0219493709002701. MR2566907
[249] N. P. Fogg, Substitutions in dynamics, arithmetics and combinatorics, edited by V. Berthé,
S. Ferenczi, C. Mauduit and A. Siegel, Lecture Notes in Mathematics, vol. 1794, Springer-
Verlag, Berlin, 2002, DOI 10.1007/b13861. MR1970385
[250] S. Fomin, On dynamical systems with a purely point spectrum (Russian), Doklady Akad.
Nauk SSSR (N.S.) 77 (1951), 29–32. MR0043397
[251] S. Forchhammer and J. Justesen, Entropy bounds for constrained two-dimensional random
fields, IEEE Trans. Inform. Theory 45 (1999), no. 1, 118–127, DOI 10.1109/18.746776.
[252] R. H. Fox and R. B. Kershner, Concerning the transitive properties of geodesics on a rational
polyhedron, Duke Math. J. 2 (1936), no. 1, 147–150, DOI 10.1215/S0012-7094-36-00213-2.
[253] A. S. Fraenkel, Systems of numeration, Amer. Math. Monthly 92 (1985), no. 2, 105–114,
DOI 10.2307/2322638. MR777556
[254] S. B. Frick and N. Ormes, Dimension groups for polynomial odometers, Acta Appl. Math.
126 (2013), 165–186, DOI 10.1007/s10440-013-9812-9. MR3077947
[255] S. Frick, K. Petersen, and S. Shields, Dynamical properties of some adic systems with
arbitrary orderings, Ergodic Theory Dynam. Systems 37 (2017), no. 7, 2131–2162, DOI
10.1017/etds.2015.128. MR3693123
[256] S. Frick, K. Petersen, and S. Shields, Periodic codings of Bratteli-Vershik systems, Math.
Scand. 126 (2020), no. 2, 298–320, DOI 10.7146/math.scand.a-117570. MR4102566
[257] G. A. Freı̆man, Diofantovy priblizheniya i geometriya chisel (zadacha Markova) (Russian),
Kalinin. Gosudarstv. Univ., Kalinin, 1975. MR0485714
[258] D. Fried, Finitely presented dynamical systems, Ergodic Theory Dynam. Systems 7 (1987),
no. 4, 489–507, DOI 10.1017/S014338570000417X. MR922362
[259] S. Friedland, On the entropy of $\mathbb{Z}^d$ subshifts of finite type, Linear Algebra Appl. 252 (1997),
199–220, DOI 10.1016/0024-3795(95)00676-1. MR1428636
[260] S. Friedland and U. N. Peled, Theory of computation of multidimensional entropy with an
application to the monomer-dimer problem, Adv. in Appl. Math. 34 (2005), no. 3, 486–522,
DOI 10.1016/j.aam.2004.08.005. MR2123547
[261] N. A. Friedman and D. S. Ornstein, On isomorphism of weak Bernoulli transformations, Ad-
vances in Math. 5 (1970), 365–394 (1970), DOI 10.1016/0001-8708(70)90010-1. MR274718
[262] G. Fuhrmann and M. Gröger, Constant length substitutions, iterated function systems and
amorphic complexity, Math. Z. 295 (2020), no. 3-4, 1385–1404, DOI 10.1007/s00209-019-
02426-2. MR4125694
[263] G. Fuhrmann, M. Gröger, and D. Lenz, The structure of mean equicontinuous group actions,
Israel J. Math. 247 (2022), no. 1, 75–123, DOI 10.1007/s11856-022-2292-8. MR4425329
[264] G. Fuhrmann, M. Gröger, and T. Jäger, Amorphic complexity, Nonlinearity 29 (2016), no. 2,
528–565, DOI 10.1088/0951-7715/29/2/528. MR3461608
[265] H. Furstenberg, Strict ergodicity and transformation of the torus, Amer. J. Math. 83 (1961),
573–601, DOI 10.2307/2372899. MR133429
[266] H. Furstenberg, Disjointness in ergodic theory, minimal sets, and a problem in Diophan-
tine approximation, Math. Systems Theory 1 (1967), 1–49, DOI 10.1007/BF01692494.
[267] H. Furstenberg and B. Weiss, Topological dynamics and combinatorial number theory, J.
Analyse Math. 34 (1978), 61–85 (1979), DOI 10.1007/BF02790008. MR531271
[268] F. R. Gantmacher, The theory of matrices. Vol. 1, translated from the Russian by K. A.
Hirsch; Reprint of the 1959 translation, AMS Chelsea Publishing, Providence, RI, 1998.
[269] F. García-Ramos, Weak forms of topological and measure-theoretical equicontinuity: rela-
tionships with discrete spectrum and sequence entropy, Ergodic Theory Dynam. Systems 37
(2017), no. 4, 1211–1237, DOI 10.1017/etds.2015.83. MR3645516
[270] F. García-Ramos, J. Li, and R. Zhang, When is a dynamical system mean sensitive?, Er-
godic Theory Dynam. Systems 39 (2019), no. 6, 1608–1636, DOI 10.1017/etds.2017.101.
[271] K. Gelfert and D. Kwietniak, On density of ergodic measures and generic points, Er-
godic Theory Dynam. Systems 38 (2018), no. 5, 1745–1767, DOI 10.1017/etds.2016.97.
[272] A. Gelfond, Sur le septième Problème de Hilbert, Bulletin de l’Académie des Sciences de
l’URSS, Classe des sciences mathématiques et na. VII (4) (1934), 623–634.
[273] J.-M. Gambaudo and M. Martens, Algebraic topology for minimal Cantor sets, Ann. Henri
Poincaré 7 (2006), no. 3, 423–446, DOI 10.1007/s00023-005-0255-3. MR2226743
[274] R. Gjerde and Ø. Johansen, Bratteli-Vershik models for Cantor minimal systems: applica-
tions to Toeplitz flows, Ergodic Theory Dynam. Systems 20 (2000), no. 6, 1687–1710, DOI
10.1017/S0143385700000948. MR1804953
[275] R. Gjerde and Ø. Johansen, Bratteli-Vershik models for Cantor minimal systems associ-
ated to interval exchange transformations, Math. Scand. 90 (2002), no. 1, 87–100, DOI
10.7146/math.scand.a-14363. MR1887096
[276] E. Glasner, Structure theory as a tool in topological dynamics, Descriptive set theory and
dynamical systems (Marseille-Luminy, 1996), London Math. Soc. Lecture Note Ser., vol. 277,
Cambridge Univ. Press, Cambridge, 2000, pp. 173–209. MR1774426
[277] E. Glasner, Ergodic theory via joinings, Mathematical Surveys and Monographs, vol. 101,
American Mathematical Society, Providence, RI, 2003, DOI 10.1090/surv/101. MR1958753
[278] P. Glendinning and N. Sidorov, Unique representations of real numbers in non-integer bases,
Math. Res. Lett. 8 (2001), no. 4, 535–543, DOI 10.4310/MRL.2001.v8.n4.a12. MR1851269
[279] C. Good and J. Meddaugh, Shifts of finite type as fundamental objects in the theory of
shadowing, Invent. Math. 220 (2020), no. 3, 715–736, DOI 10.1007/s00222-019-00936-8.
[280] C. Good, J. Meddaugh, and J. Mitchell, Shadowing, internal chain transitivity and α-limit
sets, J. Math. Anal. Appl. 491 (2020), no. 1, 124291, 19, DOI 10.1016/j.jmaa.2020.124291.
[281] P. Góra, Invariant densities for generalized β-maps, Ergodic Theory Dynam. Systems 27
(2007), no. 5, 1583–1598, DOI 10.1017/S0143385707000053. MR2358979
[282] P. Góra, Invariant densities for piecewise linear maps of the unit interval, Ergodic
Theory Dynam. Systems 29 (2009), no. 5, 1549–1583, DOI 10.1017/S0143385708000801.
[283] A. Gorodetski and Y. Pesin, Path connectedness and entropy density of the space of hy-
perbolic ergodic measures, Modern theory of dynamical systems, Contemp. Math., vol. 692,
Amer. Math. Soc., Providence, RI, 2017, pp. 111–121, DOI 10.1090/conm/692. MR3666070
[284] W. H. Gottschalk and G. A. Hedlund, Topological dynamics, American Mathematical Society
Colloquium Publications, Vol. 36, American Mathematical Society, Providence, R.I., 1955.
[285] P. J. Grabner, P. Hellekalek, and P. Liardet, The dynamical point of view of low-discrepancy
sequences, Unif. Distrib. Theory 7 (2012), no. 1, 11–70. MR2943160
[286] P. J. Grabner, P. Liardet, and R. F. Tichy, Odometers and systems of numeration, Acta
Arith. 70 (1995), no. 2, 103–123, DOI 10.4064/aa-70-2-103-123. MR1322556
[287] C. Grillenberger, Constructions of strictly ergodic systems. I. Given entropy, Z. Wahrschein-
lichkeitstheorie und Verw. Gebiete 25 (1972/73), 323–334, DOI 10.1007/BF00537161.
[288] M. Gröger, Examples of dynamical systems in the interface between order and chaos, PhD.
Thesis, University of Bremen (2015).
[289] J. G. Simonsen, On the computability of the topological entropy of subshifts, Discrete Math.
Theor. Comput. Sci. 8 (2006), no. 1, 83–95. MR2247517
[290] J. Grytczuk, H. Kordulewski, and A. Niewiadomski, Extremal square-free words, Electron.
J. Combin. 27 (2020), no. 1, Paper No. 1.48, 9, DOI 10.37236/9264. MR4075246
[291] B. Gurevič, Topological entropy for denumerable Markov chains, Dokl. Akad. Nauk SSSR
10 (1969), 911–915.
[292] B. M. Gurevič, Shift entropy and Markov measures in the space of paths of a countable
graph (Russian), Dokl. Akad. Nauk SSSR 192 (1970), 963–965. MR0268356
[293] B. M. Gurevich and S. V. Savchenko, Thermodynamic formalism for symbolic Markov chains
with a countable number of states (Russian), Uspekhi Mat. Nauk 53 (1998), no. 2(320), 3–
106, DOI 10.1070/rm1998v053n02ABEH000017; English transl., Russian Math. Surveys 53
(1998), no. 2, 245–344. MR1639451
[294] B. M. Gurevich and A. S. Zargaryan, Conditions for the existence of a maximal measure
for a countable symbolic Markov chain (Russian), Vestnik Moskov. Univ. Ser. I Mat. Mekh.
5 (1988), 14–18, 103; English transl., Moscow Univ. Math. Bull. 43 (1988), no. 5, 18–23.
[295] A. Haar, Der Massbegriff in der Theorie der kontinuierlichen Gruppen (German), Ann. of
Math. (2) 34 (1933), no. 1, 147–169, DOI 10.2307/1968346. MR1503103
[296] J. Hadamard, Les surfaces à courbures opposées et leurs lignes géodésiques, Journ. Math.
Pures et Appliquées 4 (1898), 27–73.
[297] M. Hall Jr., On the sum and product of continued fractions, Ann. of Math. (2) 48 (1947),
966–993, DOI 10.2307/1969389. MR22568
[298] P. R. Halmos, Introduction to Hilbert Space and the theory of Spectral Multiplicity, Chelsea
Publishing Co., New York, N. Y., 1951. MR0045309
[299] P. R. Halmos and J. von Neumann, Operator methods in classical mechanics. II, Ann. of
Math. (2) 43 (1942), 332–350, DOI 10.2307/1968872. MR6617
[300] T. E. Harris, Transient Markov chains with stationary measures, Proc. Amer. Math. Soc.
8 (1957), 937–942, DOI 10.2307/2033696. MR91564
[301] G. Hansel, À propos d’un théorème de Cobham, Actes de la Fête des mots, Ed. Perrin, Greco
de Programmation, Rouen, 1982, pp. 55–59.
[302] G. H. Hardy and J. E. Littlewood, Some problems of diophantine approximation, Acta Math.
37 (1914), no. 1, 193–239, DOI 10.1007/BF02401834. MR1555099
[303] G. H. Hardy and E. M. Wright, An introduction to the theory of numbers, 5th ed., The
Clarendon Press, Oxford University Press, New York, 1979. MR568909
[304] B. Hasselblatt and A. Katok, Principal structures, Handbook of dynamical systems, Vol.
1A, North-Holland, Amsterdam, 2002, pp. 1–203, DOI 10.1016/S1874-575X(02)80003-0.
[305] J. Hawkins, Ergodic dynamics—from basic theory to applications, Graduate Texts in
Mathematics, vol. 289, Springer, Cham, 2021, DOI 10.1007/978-3-030-59242-4.
[306] N. T. A. Haydn, Phase transitions in one-dimensional subshifts, Discrete Contin. Dyn. Syst.
33 (2013), no. 5, 1965–1973, DOI 10.3934/dcds.2013.33.1965. MR3002738
[307] E. Hecke, Über analytische Funktionen und die Verteilung von Zahlen mod. eins (German),
Abh. Math. Sem. Univ. Hamburg 1 (1922), no. 1, 54–76, DOI 10.1007/BF02940580.
[308] G. A. Hedlund, Endomorphisms and automorphisms of the shift dynamical system, Math.
Systems Theory 3 (1969), 320–375, DOI 10.1007/BF01691062. MR259881
[309] M.-R. Herman, Sur la conjugaison différentiable des difféomorphismes du cercle à des ro-
tations (French), Inst. Hautes Études Sci. Publ. Math. 49 (1979), 5–233. MR538680
[310] R. H. Herman, I. F. Putnam, and C. F. Skau, Ordered Bratteli diagrams, dimension
groups and topological dynamics, Internat. J. Math. 3 (1992), no. 6, 827–864, DOI
10.1142/S0129167X92000382. MR1194074
[311] A. Heinis, Arithmetics and combinatorics of words of low complexity, PhD. Thesis, Univer-
sity of Leiden (2001).
[312] E. Hironaka, What is . . . Lehmer’s number?, Notices Amer. Math. Soc. 56 (2009), no. 3,
374–375. MR2494103
[313] M. Hochman, Multidimensional shifts of finite type and sofic shifts, Combinatorics, words
and symbolic dynamics, Encyclopedia Math. Appl., vol. 159, Cambridge Univ. Press, Cam-
bridge, 2016, pp. 296–358. MR3525488
[314] M. Hochman and T. Meyerovitch, A characterization of the entropies of multidimensional
shifts of finite type, Ann. of Math. (2) 171 (2010), no. 3, 2011–2038,
DOI 10.4007/annals.2010.171.2011. MR2680402
[315] F. Hofbauer, On intrinsic ergodicity of piecewise monotonic transformations with posi-
tive entropy, Israel J. Math. 34 (1979), no. 3, 213–237 (1980), DOI 10.1007/BF02760884.
[316] F. Hofbauer, The topological entropy of the transformation x → ax(1 − x), Monatsh. Math.
90 (1980), no. 2, 117–141, DOI 10.1007/BF01303262. MR595319
[317] M. Hollander and B. Solomyak, Two-symbol Pisot substitutions have pure dis-
crete spectrum, Ergodic Theory Dynam. Systems 23 (2003), no. 2, 533–540, DOI
10.1017/S0143385702001384. MR1972237
[318] C. Holton and L. Q. Zamboni, Directed graphs and substitutions, Theory Comput. Syst. 34
(2001), no. 6, 545–564, DOI 10.1007/s00224-001-1038-y. MR1865811
[319] J. E. Hopcroft and J. D. Ullman, Introduction to automata theory, languages, and computa-
tion, Addison-Wesley Series in Computer Science, Addison-Wesley Publishing Co., Reading,
Mass., 1979. MR645539
[320] B. Host, Valeurs propres des systèmes dynamiques définis par des substitutions de longueur
variable (French), Ergodic Theory Dynam. Systems 6 (1986), no. 4, 529–540, DOI
10.1017/S0143385700003679. MR873430
[321] B. Host, B. Kra, and A. Maass, Nilsequences and a structure theorem for topological dy-
namical systems, Adv. Math. 224 (2010), no. 1, 103–129, DOI 10.1016/j.aim.2009.11.009.
[322] W. Huang, J. Li, J.-P. Thouvenot, L. Xu, and X. Ye, Bounded complexity, mean equicon-
tinuity and discrete spectrum, Ergodic Theory Dynam. Systems 41 (2021), no. 2, 494–533,
DOI 10.1017/etds.2019.66. MR4177293
[323] W. Huang, P. Lu, and X. Ye, Measure-theoretical sensitivity and equicontinuity, Israel J.
Math. 183 (2011), 233–283, DOI 10.1007/s11856-011-0049-x. MR2811160
[324] W. Huang and X. Ye, Devaney’s chaos or 2-scattering implies Li-Yorke’s chaos, Topology
Appl. 117 (2002), no. 3, 259–272, DOI 10.1016/S0166-8641(01)00025-6. MR1874089
[325] W. Huang and X. Ye, Dynamical systems disjoint from any minimal system, Trans. Amer.
Math. Soc. 357 (2005), no. 2, 669–694, DOI 10.1090/S0002-9947-04-03540-8. MR2095626
[326] J. E. Hutchinson, Fractals and self-similarity, Indiana Univ. Math. J. 30 (1981), no. 5,
713–747, DOI 10.1512/iumj.1981.30.30055. MR625600
[327] G. Iommi, M. Todd, and A. Velozo, Escape of entropy for countable Markov shifts, Adv.
Math. 405 (2022), Paper No. 108507, DOI 10.1016/j.aim.2022.108507. MR4438058
[328] S. Ito and H. Rao, Purely periodic β-expansions with Pisot unit base, Proc. Amer. Math.
Soc. 133 (2005), no. 4, 953–964, DOI 10.1090/S0002-9939-04-07794-9. MR2117194
[329] S. Ito and H. Rao, Atomic surfaces, tilings and coincidence. I. Irreducible case, Israel J.
Math. 153 (2006), 129–155, DOI 10.1007/BF02771781. MR2254640
[330] S. Ito, S. Tanaka, and H. Nakada, On unimodal linear transformations and chaos II, Tokyo
J. Math. 2 (1979), 241–259.
[331] A. Iwanik, Toeplitz flows with pure point spectrum, Studia Math. 118 (1996), no. 1, 27–35,
DOI 10.4064/sm-118-1-27-35. MR1373622
[332] K. Jacobs and M. Keane, 0-1-sequences of Toeplitz type, Z. Wahrscheinlichkeitstheorie und
Verw. Gebiete 13 (1969), 123–131, DOI 10.1007/BF00537017. MR255766
[333] H. Jager and C. Kraaikamp, On the approximation by continued fractions, Nederl. Akad.
Wetensch. Indag. Math. 51 (1989), no. 3, 289–307. MR1020023
[334] M. Jellali, M. Mkaouar, K. Scheicher, and J. M. Thuswaldner, Beta-continued fractions over
Laurent series, Publ. Math. Debrecen 77 (2010), no. 3-4, 443–463. MR2741860
[335] T. Jolivet, B. Loridant, and J. Luo, Rauzy fractals with countable fundamental group, J.
Fractal Geom. 1 (2014), no. 4, 427–447, DOI 10.4171/JFG/13. MR3299819
[336] U. Jung, On the existence of open and bi-continuing codes, Trans. Amer. Math. Soc. 363
(2011), no. 3, 1399–1417, DOI 10.1090/S0002-9947-2010-05035-4. MR2737270
[337] R. M. Jungers, V. Y. Protasov, and V. D. Blondel, Overlap-free words and spectra of matri-
ces, Theoret. Comput. Sci. 410 (2009), no. 38-40, 3670–3684, DOI 10.1016/j.tcs.2009.04.022.
[338] L. Kailhofer, A classification of inverse limit spaces of tent maps with periodic critical
points, Fund. Math. 177 (2003), no. 2, 95–120, DOI 10.4064/fm177-2-1. MR1992527
[339] S. Kakutani, A problem of equidistribution on the unit interval [0, 1], Measure theory (Proc.
Conf., Oberwolfach, 1975), Springer, Berlin, 1976, pp. 369–375. Lecture Notes in Math., Vol.
541. MR0457678
[340] T. Kamae, A topological invariant of substitution minimal sets, J. Math. Soc. Japan 24
(1972), 285–306, DOI 10.2969/jmsj/02420285. MR293611
[341] T. Kamae, A simple proof of the ergodic theorem using nonstandard analysis, Israel J. Math.
42 (1982), no. 4, 284–290, DOI 10.1007/BF02761408. MR682311
[342] A. Kanigowski, M. Lemańczyk, and M. Radziwiłł, Rigidity in dynamics and Möbius disjoint-
ness, Fund. Math. 255 (2021), no. 3, 309–336, DOI 10.4064/fm931-11-2020. MR4324828
[343] J. Karhumäki and J. Shallit, Polynomial versus exponential growth in repetition-free binary
words, J. Combin. Theory Ser. A 105 (2004), no. 2, 335–347, DOI 10.1016/j.jcta.2003.12.004.
[344] S. Kasjan, G. Keller, and M. Lemańczyk, Dynamics of B-free sets: a view through the
window, Int. Math. Res. Not. IMRN 9 (2019), 2690–2734, DOI 10.1093/imrn/rnx196.
[345] A. B. Katok, Invariant measures of flows on orientable surfaces (Russian), Dokl. Akad.
Nauk SSSR 211 (1973), 775–778. MR0331438
[346] A. Katok and B. Hasselblatt, Introduction to the modern theory of dynamical systems,
with a supplementary chapter by Katok and Leonardo Mendoza, Encyclopedia of Mathe-
matics and its Applications, vol. 54, Cambridge University Press, Cambridge, 1995, DOI
10.1017/CBO9780511809187. MR1326374
[347] M. Keane, Interval exchange transformations, Math. Z. 141 (1975), 25–31, DOI
10.1007/BF01236981. MR357739
[348] M. Keane, Non-ergodic interval exchange transformations, Israel J. Math. 26 (1977), no. 2,
188–196, DOI 10.1007/BF03007668. MR435353
[349] M. S. Keane, Ergodic theory and subshifts of finite type, Ergodic theory, symbolic dynamics,
and hyperbolic spaces (Trieste, 1989), Oxford Sci. Publ., Oxford Univ. Press, New York,
1991, pp. 35–70. MR1130172
[350] M. Keane, A continued fraction titbit, Symposium in Honor of Benoit Mandelbrot (Curaçao,
1995), Fractals 3 (1995), no. 4, 641–650, DOI 10.1142/S0218348X95000576. MR1410284
[351] M. S. Keane and G. Rauzy, Stricte ergodicité des échanges d’intervalles (French), Math. Z.
174 (1980), no. 3, 203–212, DOI 10.1007/BF01161409. MR593819
[352] M. Keane and M. Smorodinsky, Bernoulli schemes of the same entropy are finitarily iso-
morphic, Ann. of Math. (2) 109 (1979), no. 2, 397–406, DOI 10.2307/1971117. MR528969
[353] G. Keller, Tautness for sets of multiples and applications to B-free dynamics, Studia Math.
247 (2019), no. 2, 205–216, DOI 10.4064/sm180305-9-4. MR3920387
[354] K. Keller, Invariant factors, Julia equivalences and the (abstract) Mandelbrot set, Lecture
Notes in Mathematics, vol. 1732, Springer-Verlag, Berlin, 2000, DOI 10.1007/BFb0103999.
[355] J. Kepler, Harmonices mundi, Linz, 1619.
[356] D. Kerr and H. Li, Independence in topological and C*-dynamics, Math. Ann. 338 (2007),
no. 4, 869–926, DOI 10.1007/s00208-007-0097-z. MR2317754
[357] H. Kesten, On a conjecture of Erdős and Szüsz related to uniform distribution mod 1, Acta
Arith. 12 (1966/67), 193–212, DOI 10.4064/aa-12-2-193-212. MR209253
[358] H. B. Keynes and D. Newton, A “minimal”, non-uniquely ergodic interval exchange trans-
formation, Math. Z. 148 (1976), no. 2, 101–105, DOI 10.1007/BF01214699. MR409766
[359] H. B. Keynes and J. B. Robertson, On ergodicity and mixing in topological transformation
groups, Duke Math. J. 35 (1968), 809–819. MR234441
[360] A. Ya. Khinchin, Continued fractions, with a preface by B. V. Gnedenko; reprint of the
1964 translation, translated from the third (1961) Russian edition, Dover Publications, Inc.,
Mineola, NY, 1997. MR1451873
[361] K. H. Kim and F. W. Roush, Williams’s conjecture is false for reducible subshifts, J. Amer.
Math. Soc. 5 (1992), no. 1, 213–215, DOI 10.2307/2152756. MR1130528
[362] J. L. King, A map with topological minimal self-joinings in the sense of del Junco, Er-
godic Theory Dynam. Systems 10 (1990), no. 4, 745–761, DOI 10.1017/S0143385700005873.
[363] J. F. C. Kingman, The exponential decay of Markov transition probabilities, Proc. London
Math. Soc. (3) 13 (1963), 337–358, DOI 10.1112/plms/s3-13.1.337. MR152014
[364] B. P. Kitchens, Symbolic dynamics: One-sided, two-sided and countable state Markov shifts,
Universitext, Springer-Verlag, Berlin, 1998, DOI 10.1007/978-3-642-58822-8. MR1484730
[365] J. Kiwi, Wandering orbit portraits, Trans. Amer. Math. Soc. 354 (2002), no. 4, 1473–1485,
DOI 10.1090/S0002-9947-01-02896-3. MR1873015
[366] R. Kolpakov, Efficient lower bounds on the number of repetition-free words, J. Integer Seq.
10 (2007), no. 3, Article 07.3.2, 16. MR2291946
[367] R. Kolpakov, G. Kucherov, and Y. Tarannikov, On repetition-free binary words of minimal
density, WORDS (Rouen, 1997), Theoret. Comput. Sci. 218 (1999), no. 1, 161–175, DOI
10.1016/S0304-3975(98)00257-6. MR1687788
[368] V. Komornik and P. Loreti, Unique developments in non-integer bases, Amer. Math.
Monthly 105 (1998), no. 7, 636–639, DOI 10.2307/2589246. MR1633077
[369] J. Konieczny, M. Kupsa, and D. Kwietniak, Arcwise connectedness of the set of ergodic
measures of hereditary shifts, Proc. Amer. Math. Soc. 146 (2018), no. 8, 3425–3438, DOI
10.1090/proc/14029. MR3803667
[370] C. Kopf, Invariant measures for piecewise linear transformations of the interval, Appl.
Math. Comput. 39 (1990), no. 2, 123–144, DOI 10.1016/0096-3003(90)90027-Z. MR1071209
[371] T. J. P. Krebs, A more reasonable proof of Cobham’s theorem, Internat. J. Found. Comput.
Sci. 32 (2021), no. 2, 203–207, DOI 10.1142/S0129054121500118. MR4218824
[372] W. Krieger, On entropy and generators of measure-preserving transformations, Trans.
Amer. Math. Soc. 149 (1970), 453–464, and Erratum 168 (1972) 519, DOI 10.2307/1995407.
[373] W. Krieger, On the uniqueness of the equilibrium state, Math. Systems Theory 8 (1974/75),
no. 2, 97–104, DOI 10.1007/BF01762180. MR399412
[374] W. Krieger, On topological Markov chains, Dynamical systems, Vol. II—Warsaw, Soc. Math.
France, Paris, 1977, pp. 193–196. Astérisque, No. 50. MR0500874
[375] W. Krieger, On a dimension for a class of homeomorphism groups, Math. Ann. 252
(1979/80), no. 2, 87–95, DOI 10.1007/BF01420115. MR593623
[376] W. Krieger, On dimension functions and topological Markov chains, Invent. Math. 56
(1980), no. 3, 239–250, DOI 10.1007/BF01390047. MR561973
[377] L. Kuipers and H. Niederreiter, Uniform distribution of sequences, Pure and Applied
Mathematics, Wiley-Interscience [John Wiley & Sons], New York-London-Sydney, 1974.
[378] J. Kułaga-Przymus, M. Lemańczyk, and B. Weiss, On invariant measures for B-free sys-
tems, Proc. Lond. Math. Soc. (3) 110 (2015), no. 6, 1435–1474, DOI 10.1112/plms/pdv017.
[379] J. Kułaga-Przymus, M. Lemańczyk, and B. Weiss, Hereditary subshifts whose simplex of
invariant measures is Poulsen, Ergodic theory, dynamical systems, and the continuing in-
fluence of John C. Oxtoby, Contemp. Math., vol. 678, Amer. Math. Soc., Providence, RI,
2016, pp. 245–253, DOI 10.1090/conm/678. MR3589826
[380] M. Kulczycki, D. Kwietniak, and J. Li, Entropy of subordinate shift spaces, Amer. Math.
Monthly 125 (2018), no. 2, 141–148, DOI 10.1080/00029890.2018.1401875. MR3756340
[381] P. Kůrka, Topological and symbolic dynamics, Cours Spécialisés [Specialized Courses],
vol. 11, Société Mathématique de France, Paris, 2003. MR2041676
[382] D. Kwietniak, Topological entropy and distributional chaos in hereditary shifts with ap-
plications to spacing shifts and beta shifts, Discrete Contin. Dyn. Syst. 33 (2013), no. 6,
2451–2467, DOI 10.3934/dcds.2013.33.2451. MR3007694
[383] D. Kwietniak, M. Łącka, and P. Oprocha, A panorama of specification-like properties and
their consequences, Dynamics and numbers, Contemp. Math., vol. 669, Amer. Math. Soc.,
Providence, RI, 2016, pp. 155–186, DOI 10.1090/conm/669/13428. MR3546668
[384] D. Kwietniak, P. Oprocha, and M. Rams, On entropy of dynamical systems with almost
specification, Israel J. Math. 213 (2016), no. 1, 475–503, DOI 10.1007/s11856-016-1339-0.
[385] J. Lagrange, Additions au mémoire sur la résolution des équations numériques, Mém. Acad.
royale sc. et belles-lettres, Berlin 24 (1770); also in Oeuvres II, 581–652.
[386] O. E. Lanford III, A computer-assisted proof of the Feigenbaum conjectures, Bull. Amer.
Math. Soc. (N.S.) 6 (1982), no. 3, 427–434, DOI 10.1090/S0273-0979-1982-15008-X.
[387] K. Lau and A. Zame, On weak mixing of cascades, Math. Systems Theory 6 (1972/73),
307–311, DOI 10.1007/BF01740722. MR321058
[388] P. Lavaurs, Une description combinatoire de l’involution définie par M sur les rationnels à
dénominateur impair (French, with English summary), C. R. Acad. Sci. Paris Sér. I Math.
303 (1986), no. 4, 143–146. MR853606
[389] P. D. Lax, Functional analysis, Pure and Applied Mathematics (New York), Wiley-
Interscience [John Wiley & Sons], New York, 2002. MR1892228
[390] F. Ledrappier, Some properties of absolutely continuous invariant measures on an interval,
Ergodic Theory Dynam. Systems 1 (1981), no. 1, 77–93, DOI 10.1017/s0143385700001176.
[391] D. H. Lehmer, Factorization of certain cyclotomic functions, Ann. of Math. (2) 34 (1933),
no. 3, 461–479, DOI 10.2307/1968172. MR1503118
[392] J. Leroy, Some improvements of the S-adic conjecture, Adv. in Appl. Math. 48 (2012),
no. 1, 79–98, DOI 10.1016/j.aam.2011.03.005. MR2845508
[393] J. Leroy and G. Richomme, A combinatorial proof of S-adicity for sequences with linear
complexity, Integers 13 (2013), Paper No. A5, 19. MR3083467
[394] J. Li, S. Tu, and X. Ye, Mean equicontinuity and mean sensitivity, Ergodic Theory Dynam.
Systems 35 (2015), no. 8, 2587–2612, DOI 10.1017/etds.2014.41. MR3456608
[395] T. Y. Li and J. A. Yorke, Period three implies chaos, Amer. Math. Monthly 82 (1975),
no. 10, 985–992, DOI 10.2307/2318254. MR385028
[396] D. Lima, C. Matheus, C. G. Moreira, and S. Romaña, Classical and dynamical Markov and
Lagrange spectra—dynamical, fractal and arithmetic aspects, World Scientific Publishing
Co. Pte. Ltd., Hackensack, NJ, [2021] ©2021. MR4274593
[397] D. A. Lind, The entropies of topological Markov shifts and a related class of al-
gebraic integers, Ergodic Theory Dynam. Systems 4 (1984), no. 2, 283–300, DOI
10.1017/S0143385700002443. MR766106
[398] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge
University Press, Cambridge, 1995, DOI 10.1017/CBO9780511626302. MR1369092
[399] D. Lind, K. Schmidt, and T. Ward, Mahler measure and entropy for commuting
automorphisms of compact groups, Invent. Math. 101 (1990), no. 3, 593–629, DOI
10.1007/BF01231517. MR1062797
[400] J. Lindenstrauss, G. Olsen, and Y. Sternfeld, The Poulsen simplex (English, with French
summary), Ann. Inst. Fourier (Grenoble) 28 (1978), no. 1, vi, 91–114. MR500918
[401] K. Lindsey and R. Treviño, Infinite type flat surface models of ergodic systems, Discrete
Contin. Dyn. Syst. 36 (2016), no. 10, 5509–5553, DOI 10.3934/dcds.2016043. MR3543559
[402] A. N. Livshits, On the spectra of adic transformations of Markov compact sets (Russian),
Uspekhi Mat. Nauk 42 (1987), no. 3(255), 189–190. MR896889
[403] A. N. Livshits, Sufficient conditions for weak mixing of substitutions and of station-
ary adic transformations (Russian), Mat. Zametki 44 (1988), no. 6, 785–793, 862, DOI
10.1007/BF01158030; English transl., Math. Notes 44 (1988), no. 5-6, 920–925 (1989).
[404] M. Lothaire, Combinatorics on words, Encyclopedia of Mathematics and its Applications,
vol. 17, a collective work by Dominique Perrin, Jean Berstel, Christian Choffrut, Robert
Cori, Dominique Foata, Jean Eric Pin, Guiseppe Pirillo, Christophe Reutenauer, Marcel-P.
Schützenberger, Jacques Sakarovitch and Imre Simon; with a foreword by Roger Lyndon;
edited and with a preface by Perrin, Addison-Wesley Publishing Co., Reading, Mass., 1983.
[405] M. Lothaire, Applied combinatorics on words, Encyclopedia of Mathematics and its Appli-
cations, vol. 105, a collective work by Jean Berstel, Dominique Perrin, Maxime Crochemore,
Eric Laporte, Mehryar Mohri, Nadia Pisanti, Marie-France Sagot, Gesine Reinert, Sophie
Schbath, Michael Waterman, Philippe Jacquet, Wojciech Szpankowski, Dominique Poulal-
hon, Gilles Schaeffer, Roman Kolpakov, Gregory Koucherov, Jean-Paul Allouche and Valérie
Berthé; with a preface by Berstel and Perrin, Cambridge University Press, Cambridge, 2005,
DOI 10.1017/CBO9781107341005. MR2165687
[406] A. de Luca and S. Varricchio, Some combinatorial properties of the Thue-Morse sequence
and a problem in semigroups, Theoret. Comput. Sci. 63 (1989), no. 3, 333–348, DOI
10.1016/0304-3975(89)90013-3. MR993769
[407] M. Lyubich, Six lectures on real and complex dynamics, https://www.math.stonybrook.
[408] R. Mañé, Ergodic theory and differentiable dynamics, translated from the Portuguese by
Silvio Levy, Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics
and Related Areas (3)], vol. 8, Springer-Verlag, Berlin, 1987, DOI 10.1007/978-3-642-70335-
5. MR889254
[409] B. Matson and E. Sattler, S-limited shifts, Real Anal. Exchange 43 (2018), no. 2, 393–415,
DOI 10.14321/realanalexch.43.2.0393. MR3942586
[410] H. Masur, Interval exchange transformations and measured foliations, Ann. of Math. (2)
115 (1982), no. 1, 169–200, DOI 10.2307/1971341. MR644018
[411] C. Matheus and C. G. Moreira, Markov spectrum near Freiman’s isolated points in M \ L,
J. Number Theory 194 (2019), 390–408, DOI 10.1016/j.jnt.2018.06.016. MR3860483
[412] C. Matheus and C. G. Moreira, Diophantine approximation, Lagrange and Markov spectra,
and dynamical Cantor sets, Notices Amer. Math. Soc. 68 (2021), no. 8, 1301–1311, DOI
10.1090/noti2329. MR4309164
[413] K. Medynets, Cantor aperiodic systems and Bratteli diagrams (English, with English
and French summaries), C. R. Math. Acad. Sci. Paris 342 (2006), no. 1, 43–46, DOI
10.1016/j.crma.2005.10.024. MR2193394
[414] W. de Melo and S. van Strien, One-dimensional dynamics, Springer, Berlin, Heidelberg,
New York, 1993.
[415] X. Méla and K. Petersen, Dynamical properties of the Pascal adic transformation, Er-
godic Theory Dynam. Systems 25 (2005), no. 1, 227–256, DOI 10.1017/S0143385704000173.
[416] N. Metropolis, M. L. Stein, and P. R. Stein, On finite limit sets for transformations on
the unit interval, J. Combinatorial Theory Ser. A 15 (1973), 25–44, DOI 10.1016/0097-
3165(73)90033-2. MR316636
[417] P. Michel, Stricte ergodicité d’ensembles minimaux de substitution (French), C. R. Acad.
Sci. Paris Sér. A 278 (1974), 811–813. MR362276
[418] F. Mignosi, On the number of factors of Sturmian words, Theoret. Comput. Sci. 82 (1991),
no. 1, Algorithms Automat. Complexity Games, 71–84, DOI 10.1016/0304-3975(91)90172-X.
[419] J. Milnor, Dynamics in one complex variable: Introductory lectures, Friedr. Vieweg & Sohn,
Braunschweig, 1999. MR1721240
[420] J. Milnor and W. Thurston, On iterated maps of the interval, Dynamical systems (College
Park, MD, 1986), Lecture Notes in Math., vol. 1342, Springer, Berlin, 1988, pp. 465–563,
DOI 10.1007/BFb0082847. MR970571
[421] M. Misiurewicz and S. Roth, Constant slope maps on the extended real line, Ergodic Theory
Dynam. Systems 38 (2018), no. 8, 3145–3169, DOI 10.1017/etds.2017.3. MR3868025
[422] M. Misiurewicz and W. Szlenk, Entropy of piecewise monotone mappings, Studia Math. 67
(1980), no. 1, 45–63, DOI 10.4064/sm-67-1-45-63. MR579440
[423] J. Mitchell, On origin of orbits and the shadow of chaos, Ph.D. Thesis, 2020, University of
[424] T. K. S. Moothathu, Diagonal points having dense orbit, Colloq. Math. 120 (2010), no. 1,
127–138, DOI 10.4064/cm120-1-9. MR2652611
[425] M. Morse and G. A. Hedlund, Symbolic dynamics II. Sturmian trajectories, Amer. J. Math.
62 (1940), 1–42, DOI 10.2307/2371431. MR745
[426] B. Mossé, Puissances de mots et reconnaissabilité des points fixes d’une substitu-
tion (French), Theoret. Comput. Sci. 99 (1992), no. 2, 327–334, DOI 10.1016/0304-
3975(92)90357-L. MR1168468
[427] J. Moulin Ollagnier, Proof of Dejean’s conjecture for alphabets with 5, 6, 7, 8, 9, 10 and 11
letters (English, with French summary), Theoret. Comput. Sci. 95 (1992), no. 2, 187–205,
DOI 10.1016/0304-3975(92)90264-G. MR1156042
[428] J. Myhill, Finite automata and the representation of events, WADD technical report (1957),
[429] M. G. Nadkarni, Spectral theory of dynamical systems, Birkhäuser Advanced Texts: Basler
Lehrbücher. [Birkhäuser Advanced Texts: Basel Textbooks], Birkhäuser Verlag, Basel, 1998,
DOI 10.1007/978-3-0348-8841-7. MR1719722
[430] A. Nerode, Linear automaton transformations, Proc. Amer. Math. Soc. 9 (1958), 541–544,
DOI 10.2307/2033204. MR135681
[431] J. von Neumann, Zur Operatorenmethode in der klassischen Mechanik (German), Ann. of
Math. (2) 33 (1932), no. 3, 587–642, DOI 10.2307/1968537. MR1503078
[432] A. Nogueira and D. Rudolph, Topological weak-mixing of interval exchange maps, Ergodic
Theory Dynam. Systems 17 (1997), no. 5, 1183–1209, DOI 10.1017/S0143385797086276.
[433] W. Ogden, A helpful result for proving inherent ambiguity, Math. Systems Theory 2 (1968),
191–194, DOI 10.1007/BF01694004. MR233645
[434] P. Oprocha and G. Zhang, Topological aspects of dynamics of pairs, tuples and sets, Recent
progress in general topology. III, Atlantis Press, Paris, 2014, pp. 665–709, DOI 10.2991/978-
94-6239-024-9_16. MR3205496
[435] N. Ormes and R. Pavlov, On the complexity function for sequences which are not uniformly
recurrent, Dynamical systems and random processes, Contemp. Math., vol. 736, Amer.
Math. Soc., [Providence], RI, [2019] ©2019, pp. 125–137, DOI 10.1090/conm/736/14833.
[436] D. S. Ornstein, On the root problem in ergodic theory, Proceedings of the Sixth Berkeley
Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif.,
1970/1971), Univ. California Press, Berkeley, Calif., 1972, pp. 347–356. MR0399415
[437] D. Ornstein, A Kolmogorov automorphism that is not a Bernoulli shift, Matematika 15
(1971), 131–150.
[438] D. S. Ornstein, Ergodic theory, randomness, and dynamical systems, James K. Whittemore
Lectures in Mathematics given at Yale University, Yale Mathematical Monographs, No. 5,
Yale University Press, New Haven, Conn.-London, 1974. MR0447525
[439] A. Ostrowski, Bemerkungen zur Theorie der Diophantischen Approximationen (German),
Abh. Math. Sem. Univ. Hamburg 1 (1922), no. 1, 77–98, DOI 10.1007/BF02940581.
[440] J. C. Oxtoby, Ergodic sets, Bull. Amer. Math. Soc. 58 (1952), 116–136, DOI 10.1090/S0002-
9904-1952-09580-X. MR47262
[441] J.-J. Pansiot, Complexité des facteurs des mots infinis engendrés par morphismes itérés
(French, with English summary), Automata, languages and programming (Antwerp, 1984),
Lecture Notes in Comput. Sci., vol. 172, Springer, Berlin, 1984, pp. 380–389, DOI 10.1007/3-
540-13345-3_34. MR784265
[442] J.-J. Pansiot, À propos d’une conjecture de F. Dejean sur les répétitions dans les mots
(French, with English summary), Discrete Appl. Math. 7 (1984), no. 3, 297–311, DOI
10.1016/0166-218X(84)90006-4. MR736893
[443] J.-J. Pansiot, On various classes of infinite words obtained by iterated mappings, Automata
on infinite words (Le Mont-Dore, 1984), Lecture Notes in Comput. Sci., vol. 192, Springer,
Berlin, 1985, pp. 188–197, DOI 10.1007/3-540-15641-0_34. MR814743
[444] S. S. Park, Bratteli diagram isomorphic to Chacon homeomorphism, Bull. Korean Math.
Soc. 37 (2000), no. 3, 519–536. MR1779242
[445] W. Parry, On the β-expansions of real numbers (English, with Russian summary), Acta
Math. Acad. Sci. Hungar. 11 (1960), 401–416, DOI 10.1007/BF02020954. MR142719
[446] W. Parry, Intrinsic Markov chains, Trans. Amer. Math. Soc. 112 (1964), 55–66, DOI
10.2307/1994009. MR161372
[447] W. Parry, Representations for real numbers, Acta Math. Acad. Sci. Hungar. 15 (1964),
95–105, DOI 10.1007/BF01897025. MR166332
[448] O. G. Parshina, On arithmetic progressions in the generalized Thue-Morse word, Combina-
torics on words, Lecture Notes in Comput. Sci., vol. 9304, Springer, Cham, 2015, pp. 191–196,
DOI 10.1007/978-3-319-23660-5_16. MR3446321
[449] M. E. Paul, Construction of almost automorphic symbolic minimal flows, General Topology
and Appl. 6 (1976), no. 1, 45–56. MR388365
[450] R. Pavlov, On entropy and intrinsic ergodicity of coded subshifts, Proc. Amer. Math. Soc.
148 (2020), no. 11, 4717–4731, DOI 10.1090/proc/15145. MR4143389
[451] R. Peckner, Uniqueness of the measure of maximal entropy for the squarefree flow, Israel J.
Math. 210 (2015), no. 1, 335–357, DOI 10.1007/s11856-015-1255-8. MR3430278
[452] C. Penrose, On quotients of shifts associated with dendrite Julia sets of quadratic polyno-
mials, Ph.D. Thesis, University of Coventry, 1994.
[453] D. Perrin, Finite automata, Handbook of theoretical computer science, Vol. B, Elsevier,
Amsterdam, 1990, pp. 1–57. MR1127186
[454] K. E. Petersen, A topologically strongly mixing symbolic minimal set, Trans. Amer. Math.
Soc. 148 (1970), 603–612, DOI 10.2307/1995392. MR259884
[455] K. Petersen, On a series of cosecants related to a problem in ergodic theory, Compositio
Math. 26 (1973), 313–317. MR325927
[456] K. Petersen, Ergodic theory, Cambridge Studies in Advanced Mathematics, vol. 2, Cambridge
University Press, Cambridge, 1983, DOI 10.1017/CBO9780511608728. MR833286
[457] K. Petersen, Chains, entropy, coding, Ergodic Theory Dynam. Systems 6 (1986), no. 3,
415–448, DOI 10.1017/S014338570000359X. MR863204
[458] C.-E. Pfister and W. G. Sullivan, Large deviations estimates for dynamical systems without
the specification property. Applications to the β-shifts, Nonlinearity 18 (2005), no. 1, 237–
261, DOI 10.1088/0951-7715/18/1/013. MR2109476
[459] C.-E. Pfister and W. G. Sullivan, On the topological entropy of saturated sets, Ergodic
Theory Dynam. Systems 27 (2007), no. 3, 929–956, DOI 10.1017/S0143385706000824.
[460] R. R. Phelps, Lectures on Choquet’s theorem, 2nd ed., Lecture Notes in Mathematics,
vol. 1757, Springer-Verlag, Berlin, 2001, DOI 10.1007/b76887. MR1835574
[461] S. Yu. Pilyugin, Shadowing in dynamical systems, Lecture Notes in Mathematics, vol. 1706,
Springer-Verlag, Berlin, 1999. MR1727170
[462] C. Preston, Iterates of maps on an interval, Lecture Notes in Mathematics, vol. 999,
Springer-Verlag, Berlin, 1983, DOI 10.1007/BFb0061749. MR706078
[463] W. E. Pruitt, Eigenvalues of non-negative matrices, Ann. Math. Statist. 35 (1964), 1797–
1800, DOI 10.1214/aoms/1177700401. MR168579
[464] J. Qiu and J. Zhao, A note on mean equicontinuity, J. Dynam. Differential Equations 32
(2020), no. 1, 101–116, DOI 10.1007/s10884-018-9716-5. MR4061636
[465] M. Queffélec, Substitution dynamical systems—spectral analysis, Lecture Notes in Mathe-
matics, vol. 1294, Springer-Verlag, Berlin, 1987, DOI 10.1007/BFb0081890. MR924156
[466] M. Queffélec, Une nouvelle propriété des suites de Rudin-Shapiro (French, with English
summary), Ann. Inst. Fourier (Grenoble) 37 (1987), no. 2, 115–138. MR898934
[467] N. Rampersad and J. Shallit, Repetitions in words, Combinatorics, words and symbolic
dynamics, Encyclopedia Math. Appl., vol. 159, Cambridge Univ. Press, Cambridge, 2016,
pp. 101–150. MR3525483
[468] N. Rampersad, J. Shallit, and É. Vandomme, Critical exponents of infinite balanced words,
Theoret. Comput. Sci. 777 (2019), 454–463, DOI 10.1016/j.tcs.2018.10.017. MR3961908
[469] G. N. Raney, On continued fractions and finite automata, Math. Ann. 206 (1973), 265–283,
DOI 10.1007/BF01355980. MR340166
[470] M. Rao, Last cases of Dejean’s conjecture, Theoret. Comput. Sci. 412 (2011), no. 27, 3010–
3018, DOI 10.1016/j.tcs.2010.06.020. MR2830264
[471] G. Rauzy, Nombres algébriques et substitutions (French, with English summary), Bull. Soc.
Math. France 110 (1982), no. 2, 147–178. MR667748
[472] C. Richard and U. Grimm, On the entropy and letter frequencies of ternary square-free
words, Electron. J. Combin. 11 (2004), no. 1, Research Paper 14, 19. MR2035308
[473] M. Rigo and L. Waxweiler, A note on syndeticity, recognizable sets and Cobham’s theorem,
Bull. Eur. Assoc. Theor. Comput. Sci. EATCS 88 (2006), 169–173. MR2222340
[474] C. Robinson, Dynamical systems: Stability, symbolic dynamics, and chaos, 2nd ed., Studies
in Advanced Mathematics, CRC Press, Boca Raton, FL, 1999. MR1792240
[475] V. A. Rokhlin, Exact endomorphisms of a Lebesgue space (Russian), Izv. Akad. Nauk SSSR
Ser. Mat. 25 (1961), 499–530. MR0143873
[476] K. F. Roth, Rational approximations to algebraic numbers, Mathematika 2 (1955), 1–20;
corrigendum, 168, DOI 10.1112/S0025579300000644. MR72182
[477] W. Rudin, Some theorems on Fourier coefficients, Proc. Amer. Math. Soc. 10 (1959), 855–
859, DOI 10.2307/2033608. MR116184
[478] W. Rudin, Functional analysis, McGraw-Hill Series in Higher Mathematics, McGraw-Hill
Book Co., New York-Düsseldorf-Johannesburg, 1973. MR0365062
[479] D. J. Rudolph, Fundamentals of measurable dynamics: Ergodic theory on Lebesgue spaces,
Oxford Science Publications, The Clarendon Press, Oxford University Press, New York,
1990. MR1086631
[480] D. Ruelle, Statistical mechanics on a compact set with Z^ν action satisfying expansiveness
and specification, Trans. Amer. Math. Soc. 187 (1973), 237–251, DOI 10.2307/1996437.
[481] S. Ruette, On the Vere-Jones classification and existence of maximal measures for
countable topological Markov chains, Pacific J. Math. 209 (2003), no. 2, 365–380, DOI
10.2140/pjm.2003.209.365. MR1978377
[482] S. Ruette, Chaos on the interval, University Lecture Series, vol. 67, American Mathematical
Society, Providence, RI, 2017, DOI 10.1090/ulect/067. MR3616574
[483] Y. Saiki, M. A. F. Sanjuán, and J. A. Yorke, Low-dimensional paradigms for high-
dimensional hetero-chaos, Chaos 28 (2018), no. 10, 103110, 7, DOI 10.1063/1.5045693.
[484] I. A. Salama, Topological entropy and recurrence of countable chains, Pacific J. Math. 134
(1988), no. 2, 325–341. MR961239
[485] R. Salem, A remarkable class of algebraic integers. Proof of a conjecture of Vijayaraghavan,
Duke Math. J. 11 (1944), 103–108. MR10149
[486] A. Salomaa, Theory of automata, International Series of Monographs in Pure and Applied
Mathematics, Vol. 100, Pergamon Press, Oxford-New York-Toronto, Ont., 1969. MR0262021
[487] O. Sarig and M. Schmoll, Adic flows, transversal flows, and horocycle flows, Ergodic theory
and dynamical systems, De Gruyter Proc. Math., De Gruyter, Berlin, 2014, pp. 241–259.
[488] J. Schmeling, Symbolic dynamics for β-shifts and self-normal numbers, Ergodic Theory
Dynam. Systems 17 (1997), no. 3, 675–694, DOI 10.1017/S0143385797079182. MR1452189
[489] J. Schmeling and S. Troubetzkoy, Interval translation mappings, Dynamical systems
(Luminy-Marseille, 1998), World Sci. Publ., River Edge, NJ, 2000, pp. 291–302. MR1796167
[490] K. Schmidt, On periodic expansions of Pisot numbers and Salem numbers, Bull. London
Math. Soc. 12 (1980), no. 4, 269–278, DOI 10.1112/blms/12.4.269. MR576976
[491] T. Schneider, Transzendenzuntersuchungen periodischer Funktionen, Teil 1, 2, Journal für
die Reine und Angewandte Mathematik 172 (1934), 65–69, 70–74,
[492] B. Schweizer and J. Smítal, Measures of chaos and a spectral decomposition of dynam-
ical systems on the interval, Trans. Amer. Math. Soc. 344 (1994), no. 2, 737–754, DOI
10.2307/2154504. MR1227094
[493] E. Seneta, Nonnegative matrices and Markov chains, 2nd ed., Springer Series in Statistics,
Springer-Verlag, New York, 1981, DOI 10.1007/0-387-32792-4. MR719544
[494] C. Series, Geometrical methods of symbolic coding, Ergodic theory, symbolic dynamics, and
hyperbolic spaces (Trieste, 1989), Oxford Sci. Publ., Oxford Univ. Press, New York, 1991,
pp. 125–151. MR1130175
[495] C. E. Shannon, A mathematical theory of communication, Bell System Tech. J. 27 (1948),
379–423, 623–656, DOI 10.1002/j.1538-7305.1948.tb01338.x. MR26286
[496] H. S. Shapiro, Extremal problems for polynomials and power series, ProQuest LLC, Ann
Arbor, MI, 1953. Thesis (Ph.D.)–Massachusetts Institute of Technology. MR2938495
[497] S. Shao and X. Ye, Regionally proximal relation of order d is an equivalence one for minimal
systems and a combinatorial consequence, Adv. Math. 231 (2012), no. 3-4, 1786–1817, DOI
10.1016/j.aim.2012.07.012. MR2964624
[498] A. Sharkovskiy, Coexistence of cycles of a continuous map of the line into itself, (Russian)
Ukrain. Math. Zh. 16 (1964), 61–71.
[499] A. Sharkovskiy, Coexistence of cycles of a continuous mapping of the line into itself, trans-
lation into English by J. Tolosa, International Journal of Bifurcation and Chaos 05 (1995)
[500] T. Shimomura, Special homeomorphisms and approximation for Cantor systems, Topology
Appl. 161 (2014), 178–195, DOI 10.1016/j.topol.2013.10.018. MR3132360
[501] T. Shimomura, Graph covers and ergodicity for zero-dimensional systems, Ergodic Theory
Dynam. Systems 36 (2016), no. 2, 608–631, DOI 10.1017/etds.2014.72. MR3503037
[502] T. Shimomura, Bratteli-Vershik models and graph covering models, Adv. Math. 367 (2020),
107127, 54, DOI 10.1016/j.aim.2020.107127. MR4080580
[503] A. M. Shur, Growth rates of complexity of power-free languages, Theoret. Comput. Sci. 411
(2010), no. 34-36, 3209–3223, DOI 10.1016/j.tcs.2010.05.017. MR2676864
[504] A. M. Shur, Growth properties of power-free languages, Developments in language the-
ory, Lecture Notes in Comput. Sci., vol. 6795, Springer, Heidelberg, 2011, pp. 28–43, DOI
10.1007/978-3-642-22321-1_3. MR2862712
[505] C. Siegel, Approximation algebraischer Zahlen (German), Math. Z. 10 (1921), no. 3-4, 173–
213, DOI 10.1007/BF01211608. MR1544471
[506] C. L. Siegel, Algebraic integers whose conjugates lie in the unit circle, Duke Math. J. 11
(1944), 597–602. MR10579
[507] K. Sigmund, Generic properties of invariant measures for Axiom A diffeomorphisms, Invent.
Math. 11 (1970), 99–109, DOI 10.1007/BF01404606. MR286135
[508] K. Sigmund, On dynamical systems with the specification property, Trans. Amer. Math. Soc.
190 (1974), 285–299, DOI 10.2307/1996963. MR352411
[509] C. E. Silva, Invitation to ergodic theory, Student Mathematical Library, vol. 42, American
Mathematical Society, Providence, RI, 2008, DOI 10.1090/stml/042. MR2371216
[510] S. Silverman, On maps with dense orbits and the definition of chaos, Rocky Mountain J.
Math. 22 (1992), no. 1, 353–375, DOI 10.1216/rmjm/1181072815. MR1159963
[511] Ja. G. Sinaı̆, A weak isomorphism of transformations with invariant measure (Russian),
Dokl. Akad. Nauk SSSR 147 (1962), 797–800. MR0161960
[512] Ja. G. Sinaı̆, Construction of Markov partitionings (Russian), Funkcional. Anal. i Priložen.
2 (1968), no. 3, 70–80 (Loose errata). MR0250352
[513] Ya. G. Sinai, Introduction to ergodic theory, translated by V. Scheffer, Mathematical Notes,
vol. 18, Princeton University Press, Princeton, N.J., 1976. MR0584788
[514] Ya. G. Sinai and C. Ulcigrai, Weak mixing in interval exchange transformations of periodic
type, Lett. Math. Phys. 74 (2005), no. 2, 111–133, DOI 10.1007/s11005-005-0011-0.
[515] V. F. Sirvent and Y. Wang, Self-affine tiling via substitution dynamical systems and
Rauzy fractals, Pacific J. Math. 206 (2002), no. 2, 465–485, DOI 10.2140/pjm.2002.206.465.
[516] J. Smítal, Chaotic functions with zero topological entropy, Trans. Amer. Math. Soc. 297
(1986), no. 1, 269–282, DOI 10.2307/2000468. MR849479
[517] J. Smítal and M. Štefánková, Distributional chaos for triangular maps, Chaos Solitons Fractals
21 (2004), no. 5, 1125–1128, DOI 10.1016/j.chaos.2003.12.105. MR2047330
[518] M. Sollami, C. C. Douglas, and M. Liebmann, An improved lower bound on the number of
ternary squarefree words, J. Integer Seq. 19 (2016), no. 6, Article 16.6.7, 21. MR3546621
[519] B. Solomyak, On the spectral theory of adic transformations, Representation theory and
dynamical systems, Adv. Soviet Math., vol. 9, Amer. Math. Soc., Providence, RI, 1992,
pp. 217–230. MR1166205
[520] V. T. Sós, On the theory of diophantine approximations. I, Acta Math. Acad. Sci. Hungar.
8 (1957), 461–472, DOI 10.1007/BF02020329. MR93510
[521] V. Sós, On the distribution (mod 1) of the sequence {nα}, Ann. Univ. Sci. Budapest. Eötvös
Sect. Math. 1 (1958), 127–134.
[522] C. Spandl, Computing the topological entropy of shifts, MLQ Math. Log. Q. 53 (2007),
no. 4-5, 493–510, DOI 10.1002/malq.200710014. MR2351946
[523] B. Stanley, Bounded density shifts, Ergodic Theory Dynam. Systems 33 (2013), no. 6, 1891–
1928, DOI 10.1017/etds.2013.38. MR3122156
[524] P. Štefan, A theorem of Šarkovskii on the existence of periodic orbits of continuous endomorphisms
of the real line, Comm. Math. Phys. 54 (1977), no. 3, 237–248. MR445556
[525] I. Stewart, Galois Theory, Chapman & Hall, 2004.
[526] S. Štimac, A classification of inverse limit spaces of tent maps with finite critical orbit,
Topology Appl. 154 (2007), no. 11, 2265–2281, DOI 10.1016/j.topol.2007.03.003.
[527] D. Sullivan, Conformal dynamical systems, Geometric dynamics (Rio de Janeiro,
1981), Lecture Notes in Math., vol. 1007, Springer, Berlin, 1983, pp. 725–752, DOI
10.1007/BFb0061443. MR730296
[528] F. Svanström, Properties of a generalized Arnold’s discrete cat map, Master Thesis, Linnaeus
University Uppsala, 2014,
[529] S. Tabachnikov, Dragon curves revisited, Math. Intelligencer 36 (2014), no. 1, 13–17, DOI
10.1007/s00283-013-9428-y. MR3166985
[530] Y. Tarannikov, The minimal density of a letter in an infinite ternary square-free word is
0.2746 · · · , J. Integer Seq. 5 (2002), no. 2, Article 02.2.2, 8. MR1938221
[531] A. Thue, Über unendliche Zeichenreihen, Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania
7 (1906), 1–12.
[532] A. Thue, Über Annäherungswerte algebraischer Zahlen (German), J. Reine Angew. Math.
135 (1909), 284–305, DOI 10.1515/crll.1909.135.284. MR1580770
[533] A. Thue, Über die gegenseitige Lage gleicher Teile gewisser unendliche Zeichenreihen,
Norske Vid. Selsk. Skr. I. Mat. Nat. Kl. Christiania 1 (1912), 1–67.
[534] H. Thunberg, A recycled characterization of kneading sequences: Discrete dynamical systems,
Internat. J. Bifur. Chaos Appl. Sci. Engrg. 9 (1999), no. 9, 1883–1887, DOI
10.1142/S0218127499001371. MR1728748
[535] W. Thurston, On the geometry of iterated rational maps, Preprint, Princeton University,
[536] X. Tian, Different asymptotic behavior versus same dynamical complexity: recurrence
& (ir)regularity, Adv. Math. 288 (2016), 464–526, DOI 10.1016/j.aim.2015.11.006.
[537] O. Toeplitz, Ein Beispiel zur Theorie der fastperiodischen Funktionen (German), Math.
Ann. 98 (1928), no. 1, 281–295, DOI 10.1007/BF01451594. MR1512405
[538] C. Tresser and P. Coullet, Itérations d’endomorphismes et groupe de renormalisation
(French, with English summary), C. R. Acad. Sci. Paris Sér. A-B 287 (1978), no. 7, A577–
A580. MR512110
[539] R. Tijdeman, Periodicity and almost-periodicity, More sets, graphs and numbers, Bolyai
Soc. Math. Stud., vol. 15, Springer, Berlin, 2006, pp. 381–405, DOI 10.1007/978-3-540-
32439-3_18. MR2223402
[540] W. Veech, The necessity of Harris’ condition for the existence of a stationary measure,
Proc. Amer. Math. Soc. 14 (1963), 856–860, DOI 10.2307/2035014. MR156379
[541] W. A. Veech, Strict ergodicity in zero dimensional dynamical systems and the Kronecker-
Weyl theorem mod 2, Trans. Amer. Math. Soc. 140 (1969), 1–33, DOI 10.2307/1995120.
[542] W. A. Veech, Interval exchange transformations, J. Analyse Math. 33 (1978), 222–272, DOI
10.1007/BF02790174. MR516048
[543] W. A. Veech, Gauss measures for transformations on the space of interval exchange maps,
Ann. of Math. (2) 115 (1982), no. 1, 201–242, DOI 10.2307/1971391. MR644019
[544] D. Vere-Jones, Geometric ergodicity in denumerable Markov chains, Quart. J. Math. Oxford
Ser. (2) 13 (1962), 7–28, DOI 10.1093/qmath/13.1.7. MR141160
[545] D. Vere-Jones, Ergodic properties of nonnegative matrices. I, Pacific J. Math. 22 (1967),
361–386. MR214145
[546] A. M. Vershik, Uniform algebraic approximation of shift and multiplication operators
(Russian), Dokl. Akad. Nauk SSSR 259 (1981), no. 3, 526–529. MR625756
[547] A. M. Vershik, A theorem on Markov periodic approximation in ergodic theory (Russian),
Boundary value problems of mathematical physics and related questions in the theory of
functions, 14, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 115 (1982),
72–82, 306. MR660072
[548] M. Viana, Ergodic theory of interval exchange maps, Rev. Mat. Complut. 19 (2006), no. 1,
7–100, DOI 10.5209/rev_REMA.2006.v19.n1.16621. MR2219821
[549] M. Viana and K. Oliveira, Foundations of ergodic theory, Cambridge Studies in Advanced
Mathematics, vol. 151, Cambridge University Press, Cambridge, 2016, DOI
10.1017/CBO9781316422601. MR3558990
[550] P. Walters, Some results on the classification of non-invertible measure preserving transformations,
Recent advances in topological dynamics (Proc. Conf. Topological Dynamics,
Yale Univ., New Haven, Conn., 1972; in honor of Gustav Arnold Hedlund), Lecture Notes
in Math., Vol. 318, Springer, Berlin, 1973, pp. 266–276. MR0393424
[551] P. Walters, An introduction to ergodic theory, Graduate Texts in Mathematics, vol. 79,
Springer-Verlag, New York-Berlin, 1982. MR648108
[552] P. Walters, On the pseudo-orbit tracing property and its relationship to stability, The structure
of attractors in dynamical systems (Proc. Conf., North Dakota State Univ., Fargo, N.D.,
1977), Lecture Notes in Math., vol. 668, Springer, Berlin, 1978, pp. 231–244. MR518563
[553] H. Wang, X. Long, and H. Fu, Sensitivity and chaos of semigroup actions, Semigroup Forum
84 (2012), no. 1, 81–90, DOI 10.1007/s00233-011-9335-5. MR2885999
[554] Y. Wang, L. Yang, and H. Xie, Complexity of unimodal maps with aperiodic kneading
sequences, Nonlinearity 12 (1999), no. 4, 1151–1176, DOI 10.1088/0951-7715/12/4/323.
[555] B. Weiss, Intrinsically ergodic systems, Bull. Amer. Math. Soc. 76 (1970), 1266–1269, DOI
10.1090/S0002-9904-1970-12632-5. MR267076
[556] B. Weiss, Subshifts of finite type and sofic systems, Monatsh. Math. 77 (1973), 462–474,
DOI 10.1007/BF01295322. MR340556
[557] H. Weyl, Über die Gleichverteilung von Zahlen mod. Eins (German), Math. Ann. 77 (1916),
no. 3, 313–352, DOI 10.1007/BF01475864. MR1511862
[558] N. Wiener, Generalized harmonic analysis, Acta Math. 55 (1930), no. 1, 117–258, DOI
10.1007/BF02546511. MR1555316
[559] R. F. Williams, Classification of subshifts of finite type, Ann. of Math. (2) 98 (1973), 120–
153; errata, ibid. (2) 99 (1974), 380–381, DOI 10.2307/1970908. MR331436
[560] S. Williams, Toeplitz minimal flows which are not uniquely ergodic, Z. Wahrsch. Verw.
Gebiete 67 (1984), no. 1, 95–107, DOI 10.1007/BF00534085. MR756807
[561] C. Williamson, An overview of the Thue-Morse sequence, unpublished manuscript, 2012,
[562] T. S. Wu, Proximal relations in topological dynamics, Proc. Amer. Math. Soc. 16 (1965),
513–514, DOI 10.2307/2034685. MR179775
[563] X. Wu, Y. Luo, X. Ma, and T. Lu, Rigidity and sensitivity on uniform spaces, Topology
Appl. 252 (2019), 145–157, DOI 10.1016/j.topol.2018.11.014. MR3883168
[564] H. M. Xie, On formal languages in one-dimensional dynamical systems, Nonlinearity 6
(1993), no. 6, 997–1007. MR1251254
[565] H. Xie, Grammatical complexity and one-dimensional dynamical systems, Directions in
Chaos, vol. 6, World Scientific Publishing Co., Inc., River Edge, NJ, 1996, DOI 10.1142/2877.
[566] J. C. Xiong, A chaotic map with topological entropy, Acta Math. Sci. (English Ed.) 6 (1986),
no. 4, 439–443, DOI 10.1016/S0252-9602(18)30503-4. MR924033
[567] E. Zeckendorf, Représentation des nombres naturels par une somme de nombres de Fibonacci
ou de nombres de Lucas (French, with English summary), Bull. Soc. Roy. Sci. Liège 41
(1972), 179–182. MR308032
[568] A. N. Zemljakov and A. B. Katok, Topological transitivity of billiards in polygons (Russian),
Mat. Zametki 18 (1975), no. 2, 291–300. MR399423
[569] J. D. Zund, George David Birkhoff and John von Neumann: a question of priority and the
ergodic theorems, 1931–1932 (English, with English and French summaries), Historia Math.
29 (2002), no. 2, 138–156, DOI 10.1006/hmat.2001.2338. MR1896971

(δ, v)-separated, 39
(δ, v)-spanning, 39
(n, ε)-separated, 36
(n, ε)-spanning, 36
∗-product, 207
1-cutting, 149, 150, 176
N_0, 19
N-automaton, 358
S-adic, 142, 158, 176, 183, 235, 242, 269, 399
S-adic shift, 207, 242, 354
S-gap shift, 117, 292, 294
U_T (Koopman operator), 309
α(x) (α-limit), 21, 27
a (adding machine), 190
β-shift, 15, 50, 72, 78
β-transformation, 15, 89
B-admissible shift, 197
B-free shift, 195, 260, 295
B-hereditary subshift, 197
L(X) (language), 5
L_T (transfer operator), 258, 292, 309
L_n(X), 5
ε (empty word), 2, 141
ε-move, 345
i (itinerary), 15
μ-mean equicontinuous, 34, 317
ω(x) (ω-limit set), 20, 27
σ (left-shift), 3
|||·|||, 229, 327, 369
|·|_0, 168
|·|_1, 91, 168
|·|_a, 270
ζ-function, 60, 61, 106, 198, 367

Abel’s summation formula, 397
abelianization, 140
Abramov’s formula, 293
accepting state, 342, 343, 361
add and carry, 191, 226
adding machine, 186, 190, 200, 207, 229, 265
additive coding, 177
admissible, 91–93, 96, 115, 117, 197
algebraic conjugate, 367
algebraic integer, 367
algebraic number, 367
almost one-to-one extension, 21, 34, 164, 194
alpha-limit set, 21
alphabet, 1, 53, 182
Alvin, 186, 203
amorphic complexity, 39, 39, 142, 166, 188
Anosov Closing Lemma, 48
Anosov Shadowing Lemma, 48
aperiodic, 53, 139, 218, 402
arbre de retenues, 228
Arnoux, 155, 157, 162
Arnoux-Rauzy conjecture, 162
Arnoux-Rauzy substitution, 162
associated matrix, 139
asymptotic pair, 138
attracting, 20
Auslander, 24, 30, 42
Auslander-Yorke chaos, 42, 77

452 Index

Auslander-Yorke dichotomy, 42, 42, 239
autocorrelation coefficient, 312
automatic sequence, 358
automaton, 95
axiom of choice, 43

baker transformation, 131, 285
balanced, 152, 162, 168, 177, 265
Banach density, 395
base, 218, 249
basis of the topology, 2, 218
Beatty sequence, 88
Bernoulli measure, 14, 130, 239, 259, 288
Bernoulli shift, 44, 129, 283
Bernoulli trials, 295
Berstel, 383
bi-special, 6, 164, 169, 266
biaccessible, 108
Binet’s formula, 371
Birkhoff, 258, 261
Birkhoff Ergodic Theorem, 12, 167, 261, 265, 307, 321, 386, 407
block, 2
block code, 52
block shift, 12, 61, 143, 153, 173, 271
block substitution, 137, 143, 146
Bochner-Herglotz Theorem, 311
Borel-Cantelli Lemma, 394
Bounded Convergence Theorem, 314
bounded gaps, 26
bounded type, 127, 133, 161, 393, 395
bounded variation, 166
Bowen, 36, 47, 49, 60
Bratteli, 233
Bratteli diagram, 327
Bratteli-Vershik system, 218, 219, 271, 319, 325, 327, 399
broken line, 153

Calkin-Wilf function, 383
Calkin-Wilf tree, 383
canonical function, 74, 88
canonical Markov extension, 95
canonical odometer, 200
Cantor set, 2, 204, 233, 236
Cantor substitution, 24, 200, 262
Carathéodory’s Theorem, 108
Catalan number, 129
Cayley-Hamilton Theorem, 61, 370
Cesàro mean, 202, 257
Chacon, 220
Chacon map, 223
Chacon substitution, 11, 46, 150, 223, 241, 338
chain-connected, 195
chain-recurrent, 47
chaos, 36, 40
  Auslander-Yorke, 42
  Devaney, 41, 43, 121
  distributional, 43
  Li-Yorke, 43, 44, 121
character, 316
characteristic periodic point, 115
characteristic polynomial, 60, 151, 369, 373, 409
Chebyshev polynomial, 22, 137
Chinese Remainder Theorem, 192, 199
Chomsky hierarchy, 341, 347
Choquet simplex, 130, 259, 276, 295
Choquet’s Theorem, 260
circle rotation, 15, 22, 33, 162, 163, 175, 179
Climenhaga, 70, 130
clopen, 2, 218, 233
closest precritical point, 96, 112, 206
co-cutting time, 93, 99, 206
Cobham’s Little Theorem, 359
Cobham’s Theorem, 362
coboundary, 326
code, 10
  block, 52
  sliding block, 10, 56, 58, 84, 287
coded shift, 46, 65, 118, 119, 204, 294
coincidence, 322
concave, 282
Conjecture
  Arnoux-Rauzy, 162
  Host’s, 161
  Ingram, 365
  Keane, 279–281
  Lang’s, 395
  MLC, 111
  Pisot substitution, 85, 152, 320
  Sarnak’s, 198, 281
  Smorodinsky’s, 299
  Steinhaus’s, 175
  Williams’s, 61
conjugate, 10, 21
constant
  Komornik-Loreti, 81
constant length substitution, 135, 142, 322, 359
constant type, 393
context-free, 131, 347–349
context-sensitive, 347, 352, 354–356
continued fraction, 164, 176, 178, 228, 376
continuous eigenvalue, 309
convergents, 164, 228, 377, 379, 392, 395
core, 88, 365
correlation coefficient, 295, 312
critical exponent, 126
critical point, 88
critical value, 88
cube, 122
cube-free, 122
cutting and stacking, 218, 219, 240, 250
cutting time, 92, 99, 206, 229, 255, 276
cyclic S-gap shift, 119
cylinder set, 2

decisive, 237
Dejean’s Theorem, 126
Dekking, 149, 296, 322, 328
dendrite, 106
Denjoy, 165, 390
Denjoy circle map, 165, 213
Denjoy Theorem, 165
Denjoy-Koksma inequality, 390
density, 33, 72, 127, 196, 395, 397
density point, 44
density shift, 46, 74, 87
derived substitution, 147
deterministic finite automaton, 344
Devaney chaos, 41, 121
DFAO (deterministic finite automaton with output), 357
Diophantine approximation, 381, 391
Diophantine number, 393
Dirichlet’s Theorem, 392
discrepancy, 388, 391, 395
discrete spectrum, 315
distal, 30, 31, 44
doubling map, 18, 110, 285
Downarowicz, 149, 198, 237, 239, 260, 265, 287, 295, 320
Durand, 134, 141, 145, 150, 160, 161, 327, 364
dyadic odometer, 190, 240
Dyck shift, 5, 128, 128, 348
dynamical system, 19

edge surjective, 174
edge-labeled, 3, 4, 61, 62, 62–64, 66, 78, 81, 84, 95, 117, 233, 345, 349
eigenfunction, 122, 309, 310, 312, 317
eigenvalue, 309
elementary insplit, 58
elementary outsplit, 58
entropy, 8, 36, 44, 119, 188
  at infinity, 408
  Gurevich, 66, 71, 402, 405
  measure-theoretic, 406
  topological, 89, 100, 122, 197
entropy dense, 294, 295, 408
enumeration scale, 225
enumeration system, 190, 204, 219, 225, 228, 260, 275
equal path number property, 247, 322
equicontinuous, 28, 29, 40, 219, 229, 317
  μ-mean, 34, 42, 317
  mean, 33, 264, 317
  Weyl mean, 33
equidistribution, 50, 50, 259, 260, 262, 408
Eratosthenes, 196
ergodic, 13, 259, 260, 288, 293, 399
  intrinsically, 201, 202
  uniquely, 152, 200, 265, 276, 298
ergodic average, 261
ergodic decomposition, 260
Ergodic Theorem, 167, 261, 265, 307
essentially minimal, 27, 240, 241
Euclidean algorithm, 192, 376, 377, 381,
even shift, 4, 11, 12, 62, 63, 65, 117, 119, 344, 347, 358
eventually conjugate, 61
evolution rule, 19
expanding, 17
expansive, 17, 17, 33, 50, 193, 194, 247
expansivity constant, 17
exponential growth, 8, 36, 51, 68, 69, 89
extension, 21
external ray, 108

factor, 2, 21
factor map, 21
Farey, 379, 381
  child, 381
  convergents, 379
  neighbors, 381
  parent, 381
  sum, 381
  web, 381
Farey convergents, 214
Fatou set, 107
Feigenbaum
  map, 100, 205, 207–209, 350
  sequence, 350, 355
  shift, 186, 194
  substitution, 136, 193, 207, 250, 323, 350
Feigenbaum map, 100
Fekete’s Lemma, 8, 76, 89, 188, 190, 282, 411, 414
Ferenczi, 159, 161, 223, 298, 299
Fermat’s Little Theorem, 198
Fibonacci
  Bratteli-Vershik system, 242, 251, 274
  number, 6, 92, 105, 134, 226, 228, 274, 371, 393
  SFT, 3–5, 9, 11, 51, 53, 66, 74, 117, 119, 228, 292
  substitution, 3, 92, 134, 135, 143, 145, 180, 222, 242, 292
  unimodal map, 92, 95, 96, 355
Fibonacci substitution, 46
Fine-Wilf Theorem, 362
finitary, 287
first return map, 179, 183, 204, 217
first return time, 217, 218, 261, 288
fixed point, 20
flip-conjugacy, 23
follower separated, 64
follower set, 63, 63, 76, 83, 85, 118
Ford circles, 383
formula
  Abel’s summation, 397
  Abramov’s, 293
  Binet’s, 371
  Rokhlin’s, 284, 292
  Stirling’s, 73, 406
forward orbit, 20
frequency, 12, 127, 134, 151, 171, 172, 175, 177, 262, 263, 350, 354, 391
Furstenberg, 46, 265

Galois conjugate, 367, 368, 369
gap shift, 50, 117
Gauß map, 380
generating, 34, 283
generic, 267
generic measure, 13, 266, 277
golden mean, 54, 80, 92, 136, 274, 368, 392, 393, 395, 416
Gottschalk, 325
grammar, 346
  context-free, 348, 349
  context-sensitive, 352, 353
  recursively enumerable, 347, 356
  regular, 347, 349
graph cover, 174, 220, 243
greedy expansion, 77, 79, 225, 226, 253
Gurevich entropy, 66, 71, 402, 405

Haar measure, 201, 266, 315, 317
Halmos & von Neumann Structure Theorem, 155, 315
halting state, 342
Hedlund, 6, 10, 17, 163, 172, 176, 325
height, 218, 221, 234, 271, 327
hereditary, 71, 71, 72, 82, 121, 197
  closure, 72
  subshift, 72
hereditary subshift, 197
Hilbert, 367, 400
Hilbert metric, 281, 400, 400
Hilbert space, 309
Hofbauer, 82, 95, 96, 100
Hofbauer tower, 81, 85, 95
homoclinic point, 374
homterval, 92
Host, 134, 149, 161, 269, 319, 327
Host’s conjecture, 161
Hubbard tree, 106
hyperbolic, 47, 54, 131

IET, 162, 181, 244
incidence matrix, 234, 242, 243, 271
inequality
  Denjoy-Koksma, 390
  Parseval, 318
infinitely renormalizable, 136, 205, 207, 229, 255
Ingram conjecture, 365
initial state, 342
insplit graph, 58
internal address, 111
interval exchange transformation (IET), 161, 162, 181, 263, 281
intrinsically ergodic, 14, 50, 57, 64, 70, 82, 100, 119, 201, 202, 293, 407
invariant coordinate, 101
invariant measure, 13, 257
inverse limit space, 137, 365
irreducible, 9, 53, 139, 151, 181, 288, 402
isometry, 28, 37, 40, 192, 265
isomorphic, 152, 284, 309
isomorphism, 34, 252, 315
iterated function system (IFS), 157, 231
itinerary, 15, 90, 109
Julia set, 106, 107, 108, 114

Kac’s Lemma, 261
Kakutani, 220
Kakutani-Rokhlin partition, 218, 329
Keane, 180, 263, 265, 277, 279, 287, 296, 320, 328, 380, 399
Keane condition, 181, 245, 276, 279
Keane conjecture, 279–281
Kepler tree, 383
Kleene’s Theorem, 347
Knaster continuum, 137, 365
kneading determinant, 100, 103
kneading increment, 102
kneading map, 48, 204, 209, 229, 255, 276, 354
kneading sequence, 91, 107, 109, 186, 203, 208, 350, 355
kneading theory, 90
Koksma inequality, 389
Kolmogorov entropy, 282
Kolmogorov Extension Theorem, 13, 14, 270, 273, 288, 295, 319, 321
Komornik-Loreti constant, 81
Koopman, 258
Koopman operator, 122, 151, 258, 309, 315, 326
KR-partition (Kakutani-Rokhlin), 218,
Krieger, 128, 129, 263, 283
Kronecker factor, 312, 315, 324, 325
Krylov-Bogul’yubov Theorem, 276

Lagrange spectrum, 392
Lagrange’s Theorem, 354, 393
Lang’s conjecture, 395
language, 5, 65, 345
lap-number, 89, 103, 351
lazy expansion, 78, 79
leading eigenvalue, 398
Lebesgue Density Theorem, 44
Lebesgue space, 283
left-linear, 347
left-shift, 3
left-special, 6, 138, 164, 169, 176, 209
Lehmer’s number, 368
Lemma
  Anosov Closing, 48
  Anosov Shadowing, 48
  Borel-Cantelli, 394
  Fekete’s, 8, 76, 89, 188, 190, 282, 411,
  Kac’s, 261
  Ogden’s, 350
  Pumping, 349, 360
  Riemann-Lebesgue, 312
leo (locally eventually onto), 45, 279
Li-Yorke chaos, 43, 77, 121
Li-Yorke chaotic, 44
Li-Yorke pair, 43
lift, 22, 165
light tails, 196
linear complexity, 39
linearly recurrent, 7, 133, 141, 160–162, 219, 273, 296, 327, 349
Liouville number, 393
local rule, 10
locally eventually onto, 45, 279
locally expanding, 17, 208
logarithmic density, 196, 197, 397
long-branched, 90, 94
Lorenz-like map, 210
low enumeration scale, 204, 229
Lyapunov stable, 28, 33

Maass, 239, 327
main cardioid, 108, 111
Mandelbrot set, 108, 108, 111
map
  Chacon, 223
  doubling, 285
  Feigenbaum, 205, 207–209, 350
  skew tent, 100, 105
  tent, 22, 40, 48, 88, 90, 94, 100, 105, 117, 137, 213, 365
  unimodal, 88, 229, 354
    Markov, 94
Markov measure, 288, 406
Markov partition, 53, 83–85
Markov triple, 392
martingale, 330
Martingale Convergence Theorem, 330
maximal equicontinuous factor, 32, 32, 34, 45, 194, 199
maximal measure, 14, 50, 71, 82, 128, 129, 288, 289, 293, 294, 403, 405, 407, 411
maximal spectral type, 312
mean equicontinuous, 33, 264, 317
mean sensitive, 42
measurable eigenvalue, 309
measure
  Bernoulli, 14, 259
  generic, 13, 266, 277
  Haar, 266, 315, 317
  Mirsky, 201
  of maximal entropy, 14, 71, 82, 128, 129, 288, 289, 293, 294, 403, 405, 407, 411
  Shannon-Parry, 289, 293, 295, 406
  spectral, 311, 312, 315
measure-theoretic entropy, 282, 287, 406
mediant, 381
memory, 10, 52
metallic mean, 368, 393
metric entropy, 282
microscoping, 234
minimal, 24, 25, 42, 157, 159, 183, 200, 208, 238, 298
minimal polynomial, 152, 367, 369
mirror invariant, 164
Mirsky measure, 201, 201
Misiurewicz, 207, 209, 409
mixed spectrum, 323
mixing, 295, 296
  strong, 295, 296
  topological, 45, 49, 63, 65, 76, 118
  weak, 39, 224, 306, 307, 339
  weak topological, 45
  weak topologically, 65
Möbius function, 198, 198
Montel’s Theorem, 111
morphism, 135
Morse, 5, 6, 163, 172, 176
multinacci number, 368
multiplicative coding, 178
Myhill-Nerode Theorem, 63

natural extension, 285
neutral, 20
non-deterministic finite automaton, 344
non-erasing, 141
non-wandering, 47
non-wandering point, 21
non-wandering set, 21, 28, 38
Non-wandering Triangle Theorem, 114
null recurrent, 71, 403
nullset, 257
number
  algebraic, 367
  Catalan, 129
  Diophantine, 393
  Fibonacci, 6, 92, 105, 134, 226, 228, 274, 371, 393
  Lehmer’s, 368
  Liouville, 393
  multinacci, 368
  Pell, 393
  Perron, 9, 84
  Pisot, 84, 151, 230, 339, 368
  plastic, 368
  quadratic, 354, 393
  rotation, 22, 87, 165
  Salem, 368
  transcendental, 5, 81, 367
  tribonacci, 154, 230

odd shift, 4, 66, 74, 119, 120
odometer, 190, 224, 238, 239, 248, 265, 319
  canonical, 200
  dyadic, 240
  simple, 193
Ogden’s Lemma, 350
omega-limit set, 7, 20, 21, 204
orb, 20
orbit, 20
orbit cocycle, 23
orbit equivalent, 23
ordered Bratteli system, 233
Ornstein, 286, 298
Ornstein’s Theorem, 286
orthogonal, 310
Ostrowski numeration, 228
outsplit graph, 58
overlap, 122
overlap-free, 122
Oxtoby’s Theorem, 14, 262

palindromic, 137, 164, 368
paper folding sequence, 136
parity-lexicographical order, 89, 91, 203
Parry, 57, 288, 289, 291
Parseval inequality, 318
partial quotients, 393
Pavlov, 68, 70, 410
Pell number, 393
perfect Bratteli-Vershik system, 237
period, 20, 53
period doubling cascade, 205
period doubling substitution, 136, 207
periodic point, 20
periodic structure, 187, 188, 265
periodically recurrent, 26, 26, 47, 192, 194
Perron eigenvalue, 320, 402
Perron number, 9, 84, 369
Perron-Frobenius eigenvalue, 398
Perron-Frobenius Theorem, 56, 139, 158, 289, 369, 398
Petersen, 46, 285, 326
Petersen’s shift, 46
phase space, 19
piecewise monotone, 53
pinched disk, 108
Pisot, 340
Pisot number, 84, 151, 230, 339, 368,
Pisot polynomial, 368
Pisot substitution, 151, 162
Pisot substitution conjecture, 85, 152, 320
Poincaré Recurrence Theorem, 260
pointed, 21
pointed conjugacy, 21
polynomial
  characteristic, 60, 369, 373, 409
  minimal, 367, 369
  Pisot, 368
  quadratic, 117
  reciprocal, 368
polynomial growth, 8, 38, 39, 189
positive definite, 311
positive directional, 174, 220
positive recurrent, 71, 403
Poulsen simplex, 130, 201, 260, 276, 295, 408
power entropy, 38
power-free, 122, 134, 161
prefix, 2
prefix-suffix graph, 157, 232
preperiod, 20
preperiodic point, 20
primitive, 53, 139–143, 149, 150, 185, 196, 224, 235, 269, 288, 289
Primitive Element Theorem, 376
probability matrix, 288
probability vector, 14, 288
product measure, 14
product topology, 2, 233
production rule, 345, 346
proper, 160
properly ordered, 238, 241, 247, 250
proximal, 30, 198, 200
proximal pair, 30, 199
pseudo-orbit, 47
Pumping Lemma, 349, 360
pure base substitution, 322
pure point spectrum, 85, 152, 162, 315, 317–320, 322–324
push-down automaton, 128, 352

quadratic map, 88, 114
quadratic number, 354, 393
quadratic polynomial, 117
quasi-generic, 201, 262

random walk, 405
rank, 149, 162, 222, 224, 239, 251, 305
rational approximation, 381
Rauzy, 149, 154, 162, 173, 180, 230, 245, 279
Rauzy class, 279
Rauzy fractal, 154, 231
Rauzy graph, 173, 175, 266
Rauzy induction, 183, 184, 245
reciprocal polynomial, 368
recognizable, 66, 149, 155, 162, 219
recurrent, 21, 47, 169, 260
  chain-, 47
  periodically, 26, 47
  uniformly, 25, 26, 47
recursively enumerable, 347, 356
reflection principle, 406, 416
regionally proximal, 32
regular, 201
regular bi-special word, 6, 173, 266
regular grammar, 347
regular language, 344, 349
regular Toeplitz sequence, 188
regularly enumerable, 341
rejecting state, 342, 343, 361
renormalizable, 204, 229, 350
renormalization, 179, 204, 207
repelling, 20
repetition exponent, 126
return word, 133, 145, 160, 219, 267
Riemann Mapping Theorem, 108
Riemann-Lebesgue Lemma, 312
right dense, 361
right resolving, 63
right-linear, 347
right-special, 6, 169, 209, 268
Rokhlin formula, 284, 292
rome, 293, 409
rotation, 15, 22, 27, 33, 37, 88, 151, 162, 163, 175, 179, 181, 198, 279, 314, 326, 380, 385
rotation number, 22, 87, 165
rotation sequence, 163, 176
Rudin-Shapiro substitution, 324

Salama’s criteria, 404
Salem number, 368
Sarnak’s conjecture, 198, 281
Schauder-Tychonov Theorem, 326
scrambled, 43, 44, 121
second countable, 23
semi-conjugacy, 21
sensitive dependence, 41, 42
separation numbers, 39
sequence
  automatic, 358
  Beatty, 88
  Feigenbaum, 350
  Fibonacci, 6, 92, 105, 134, 228, 274, 371, 393
  kneading, 91, 107, 109, 186, 203, 208, 350, 355
  Narayana, 231
  Narayana cow, 231
  paper folding, 136
  Pell, 393
  regular Toeplitz, 188
  rotation, 163, 176
  Sturmian, 88
  Thue-Morse, 81, 359
  Toeplitz, 185, 186, 201
SFT
  Fibonacci, 119, 292
SFT (subshift of finite type), 3, 9, 12, 44, 47, 51, 55, 72, 76, 89, 95, 117, 118, 120, 131, 288, 289, 293, 295
shadowing property, 8, 47, 47, 55, 89
Shannon, 289
Shannon-Parry measure, 289, 293, 295, 406
Sharkovskiy’s Theorem, 43
shift
  S-adic, 207, 242
  S-gap, 117, 294
  B-free, 195
  block, 61, 173
  coded, 204, 294
  density, 46, 74
  Dyck, 128, 348
  even, 4, 11, 12, 62, 63, 65, 117, 119, 344, 347, 358
  Feigenbaum, 186, 194
  gap, 117
  hereditary, 72
  odd, 74
  of finite type, 51
  Petersen’s, 46
  sofic, 46, 293
  spacing, 9, 120
  square-free, 349
  Sturmian, 133
  subordinate, 72
  substitution, 7, 85, 138, 161, 207, 219, 269, 365
  synchronized, 9, 83, 131
  Thue-Morse, 7, 14, 237, 323
shift equivalence, 60, 144
shift map, 3
Siegel disk, 107, 108
silver mean, 368, 393
simple Bratteli diagram, 235
simple cap, 234
simple odometer, 193
simple spectrum, 317
simplex
  Choquet, 259, 276, 295
  Poulsen, 130, 201, 260, 276, 295
Sinaĭ’s Theorem, 287
skeleton, 187, 188, 247, 249, 320
skew tent map, 100, 105
sliding block code, 10, 52, 56, 58, 84, 287
Smorodinsky’s conjecture, 299
sofic, 62, 84, 89
sofic shift, 9, 46, 95, 117, 118, 127, 293
solenoid, 230
Sós, 175
spacer, 221, 222, 240
spacing shift, 9, 72, 120, 120
specification, 47, 48, 48, 82, 83, 118, 293, 294
spectral decomposition, 313
spectral measure, 311, 312, 315
spectral properties, 309
spectrum
  continuous, 281, 325
  Lagrange, 392
  mixed, 323
  pure point, 152, 162, 315, 317–320, 322–324
  simple, 317
speed-up, 23
spine, 253, 255
square, 122
square-free, 122, 198, 418
square-free shift, 349
staircase system, 299
starting symbol, 346
Steinhaus’s conjecture, 175
Stirling’s formula, 73, 406
Stone-Weierstraß Theorem, 317
strange adding machine, 207, 209, 229
strictly ergodic, 14, 192, 262, 269
strictly sofic, 62
strong orbit equivalence, 23
strong shift equivalence, 59
strongly mixing, 295
strongly positive recurrent, 403
Sturmian sequence, 88, 161
Sturmian shift, 15, 33, 44, 133, 161, 162, 169, 173, 176, 209, 259, 314, 318, 349, 354
subadditive, 8, 74, 282
sublinear complexity, 6
submultiplicative, 8
subshift, 3
subshift of finite type, 3, 9, 51, 288, 289
substitution, 85, 127, 135, 365
  Arnoux-Rauzy, 162
  block, 137, 143, 146
  Cantor, 24, 200, 262
  Chacon, 11, 46, 150, 223, 241
  constant length, 124, 135, 142, 296, 322, 359
  derived, 147
  Feigenbaum, 136, 193, 207, 250, 323
  Fibonacci, 3, 46, 92, 134, 135, 143, 145, 180, 222, 242, 292
  Pisot, 151, 162
  primitive Chacon, 11, 338
  pure base, 322
  Rudin-Shapiro, 324
  shift, 7, 85, 138, 161, 219, 269
  Thue-Morse, 5, 46, 123, 124, 140, 150, 186, 265
substitution shift, 7, 138, 161, 207, 219, 269, 365
suffix, 2
Sullivan’s Theorem, 107
superadditive, 8
supermultiplicative, 8
switch region, 80
synchronized shift, 9, 46, 83, 118, 128, 131
synchronizing word, 9, 10, 84
syndetic, 26, 30, 118, 361
syndetically proximal, 30, 198

taut, 196
telescoping, 174, 234, 248
tent map, 22, 40, 48, 88, 90, 94, 100, 105, 117, 137, 213, 365
terminal, 346
Theorem
  Birkhoff’s Ergodic, 12, 167, 261, 265, 307, 321, 386, 407
  Bochner-Herglotz, 311
  Bounded Convergence, 314
  Carathéodory, 108
  Cayley-Hamilton, 61, 370
  Chinese Remainder, 192, 199
  Choquet’s, 260
  Cobham’s, 362
  Cobham’s Little, 359
  Dejean’s, 126
  Denjoy’s, 165
  Dirichlet’s, 392
  Fermat’s Little, 198
  Fine-Wilf, 362
  Halmos & von Neumann’s Structure, 155, 315
  Kleene’s, 347
  Kolmogorov Extension, 13, 14, 270, 273, 288, 295, 319, 321
  Krylov-Bogul’yubov, 257, 276
  Lagrange’s, 354, 393
  Lebesgue Density, 44
  Martingale Convergence, 330
  Montel’s, 111
  Myhill-Nerode, 63
  Non-wandering Triangle, 114
  Ornstein’s, 286
  Oxtoby’s, 14, 262
  Perron-Frobenius, 56, 139, 158, 289, 369, 398
  Poincaré Recurrence, 260
  Primitive Element, 376
  Riemann Mapping, 108
  Schauder-Tychonov, 326
  Sinaĭ, 287
  Stone-Weierstraß, 317
  Sullivan’s, 107
  Three Gap, 175
  Tychonov, 2
  Van der Corput’s Difference, 386
  Williams’s, 60
  Zeckendorf’s, 226
thick, 26, 121, 196
thin, 196, 201
Three Gap Theorem, 175
Thue, 5, 122, 394
Thue-Morse
  sequence, 81, 359
  shift, 7, 14, 237, 323
  substitution, 5, 123, 124, 140, 150, 186, 265
Thue-Morse shift, 14
Thue-Morse substitution, 5, 46
Thurston lamination, 108
Toeplitz sequence, 185, 186, 201
Toeplitz shift, 185, 198, 219, 247, 260, 265, 320
topological entropy, 8, 36, 44, 66, 89, 100, 110, 122, 197
topological mixing, 45, 49, 63, 65, 76, 118, 121
topologically conjugate, 21
topologically exact, 45, 240, 279
topologically transitive, 23, 28, 29, 65, 76, 264, 279
toral automorphism, 51, 54, 54
toral endomorphism, 374, 375
totally transitive, 23, 63, 65
transcendental number, 5, 81, 367
transfer operator, 258, 292, 309, 380
transient, 403
transition function, 342, 344, 352
transition graph, 53, 345, 409
transition matrix, 53, 139, 288
transitive, 9, 23, 29, 41, 45, 122, 261
  topologically, 23, 28, 65, 264
  totally, 63, 65
tribonacci number, 154, 230
Turing machine, 341, 356
turning point, 88
Tychonov’s Theorem, 2
type of a Sturmian sequence, 168, 176
typical, 50, 202, 261

uniform recurrence, 219
uniform scheme, 203
uniformly distributed, 385
uniformly recurrent, 25, 26, 47
uniformly rigid, 26, 29, 42, 187, 192
unimodal map, 85, 88, 110, 151, 207, 208, 211, 229, 354
unimodal subshift, 90
unimodular, 54, 151, 155, 184, 244
unique decomposition, 66
uniquely decipherable, 66
uniquely ergodic, 14, 34, 152, 200, 224, 262–265, 267, 276, 293, 298, 399
unitary operator, 310
univoque, 80

Van der Corput’s Difference Theorem, 386
Vandermonde matrix, 370
variable, 346
variation, 166
Variational Principle, 38, 287, 403
Vere-Jones classification, 403
Vershik, 233
Vershik map, 235
vertex splitting, 57
vertex-labeled, 3, 4, 52, 53, 57, 61, 62, 78, 117, 173
visit time, 46
von Neumann, 152, 155, 261, 315, 317
von Neumann-Kakutani map, 221
weak mixing, 39, 224, 306, 313, 339
weak topological mixing, 45, 121
Weiss, 46, 54, 57, 62, 201, 260, 293, 294
well-distributed, 385
Weyl, 33, 386
Weyl mean equicontinuity, 33
Weyl’s Criterion, 386
Williams’s Conjecture, 61
Williams’s Theorem, 60
window, 201
window size, 10
word, 2, 5, 7
word-complexity, 5, 8, 57, 126, 161, 189, 223, 239, 266, 298, 305

Yorke, 42, 43, 90

Zeckendorf’s Theorem, 226
Zorich acceleration, 280
Selected Published Titles in This Series
228 Henk Bruin, Topological and Ergodic Theory of Symbolic Dynamics, 2022
225 Jacob Bedrossian and Vlad Vicol, The Mathematical Analysis of the Incompressible Euler and Navier-Stokes Equations, 2022
223 Volodymyr Nekrashevych, Groups and Topological Dynamics, 2022
222 Michael Artin, Algebraic Geometry, 2022
221 David Damanik and Jake Fillman, One-Dimensional Ergodic Schrödinger Operators,
220 Isaac Goldbring, Ultrafilters Throughout Mathematics, 2022
219 Michael Joswig, Essentials of Tropical Combinatorics, 2021
218 Riccardo Benedetti, Lectures on Differential Topology, 2021
217 Marius Crainic, Rui Loja Fernandes, and Ioan Mărcuţ, Lectures on Poisson Geometry, 2021
216 Brian Osserman, A Concise Introduction to Algebraic Varieties, 2021
215 Tai-Ping Liu, Shock Waves, 2021
214 Ioannis Karatzas and Constantinos Kardaras, Portfolio Theory and Arbitrage, 2021
213 Hung Vinh Tran, Hamilton–Jacobi Equations, 2021
212 Marcelo Viana and José M. Espinar, Differential Equations, 2021
211 Mateusz Michalek and Bernd Sturmfels, Invitation to Nonlinear Algebra, 2021
210 Bruce E. Sagan, Combinatorics: The Art of Counting, 2020
209 Jessica S. Purcell, Hyperbolic Knot Theory, 2020
208 Vicente Muñoz, Ángel González-Prieto, and Juan Ángel Rojo, Geometry and Topology of Manifolds, 2020
207 Dmitry N. Kozlov, Organized Collapse: An Introduction to Discrete Morse Theory, 2020
206 Ben Andrews, Bennett Chow, Christine Guenther, and Mat Langford, Extrinsic Geometric Flows, 2020
205 Mikhail Shubin, Invitation to Partial Differential Equations, 2020
204 Sarah J. Witherspoon, Hochschild Cohomology for Algebras, 2019
203 Dimitris Koukoulopoulos, The Distribution of Prime Numbers, 2019
202 Michael E. Taylor, Introduction to Complex Analysis, 2019
201 Dan A. Lee, Geometric Relativity, 2019
200 Semyon Dyatlov and Maciej Zworski, Mathematical Theory of Scattering Resonances, 2019
199 Weinan E, Tiejun Li, and Eric Vanden-Eijnden, Applied Stochastic Analysis, 2019
198 Robert L. Benedetto, Dynamics in One Non-Archimedean Variable, 2019
197 Walter Craig, A Course on Partial Differential Equations, 2018
196 Martin Stynes and David Stynes, Convection-Diffusion Problems, 2018
195 Matthias Beck and Raman Sanyal, Combinatorial Reciprocity Theorems, 2018
194 Seth Sullivant, Algebraic Statistics, 2018
193 Martin Lorenz, A Tour of Representation Theory, 2018
192 Tai-Peng Tsai, Lectures on Navier-Stokes Equations, 2018
191 Theo Bühler and Dietmar A. Salamon, Functional Analysis, 2018
190 Xiang-dong Hou, Lectures on Finite Fields, 2018
189 I. Martin Isaacs, Characters of Solvable Groups, 2018
188 Steven Dale Cutkosky, Introduction to Algebraic Geometry, 2018
187 John Douglas Moore, Introduction to Global Analysis, 2017

Symbolic dynamics is essential in the study of dynamical systems of various
types and is connected to many other fields such as stochastic processes, ergodic
theory, representation of numbers, information and coding, etc. This graduate
text introduces symbolic dynamics from a perspective of topological dynamical
systems and presents a vast variety of important examples.
After an introduction to symbolic and topological dynamics, the core of the book
consists of discussions of various subshifts of positive entropy and of zero entropy,
of other non-shift minimal actions on the Cantor set, and a study of the ergodic
properties of these systems. The author presents recent developments such as
spacing shifts, square-free shifts, density shifts, B-free shifts, Bratteli-Vershik
systems, enumeration scales, amorphic complexity, and a modern and complete
treatment of kneading theory. Later, he provides an overview of automata and
linguistic complexity (Chomsky’s hierarchy).
The necessary background for the book varies, but for most of it a solid knowledge
of real analysis and linear algebra and first courses in probability and
measure theory, metric spaces, number theory, topology, and set theory suffice.
Most of the exercises have solutions in the back of the book.
