A First Course in Ergodic Theory 9780367226206, 9781032021843, 9780429276019


English, 268 pages, 2021


Table of contents :
Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Preface
Author Bios
Chapter 1: Measure Preservingness and Basic Examples
1.1. WHAT IS ERGODIC THEORY?
1.2. MEASURE PRESERVING TRANSFORMATIONS
1.3. BASIC EXAMPLES
Chapter 2: Recurrence and Ergodicity
2.1. RECURRENCE
2.2. ERGODICITY
2.3. EXAMPLES OF ERGODIC TRANSFORMATIONS
Chapter 3: The Pointwise Ergodic Theorem and Mixing
3.1. THE POINTWISE ERGODIC THEOREM
3.2. NORMAL NUMBERS
3.3. IRREDUCIBLE MARKOV CHAINS
3.4. MIXING
Chapter 4: More Ergodic Theorems
4.1. THE MEAN ERGODIC THEOREM
4.2. THE HUREWICZ ERGODIC THEOREM
Chapter 5: Isomorphisms and Factor Maps
5.1. MEASURE PRESERVING ISOMORPHISMS
5.2. FACTOR MAPS
5.3. NATURAL EXTENSIONS
Chapter 6: The Perron-Frobenius Operator
6.1. ABSOLUTELY CONTINUOUS INVARIANT MEASURES
6.2. EXACTNESS
6.3. PIECEWISE MONOTONE INTERVAL MAPS
Chapter 7: Invariant Measures for Continuous Transformations
7.1. EXISTENCE
7.2. UNIQUE ERGODICITY AND UNIFORM DISTRIBUTION
7.3. SOME TOPOLOGICAL DYNAMICS
Chapter 8: Continued Fractions
8.1. REGULAR CONTINUED FRACTIONS
8.2. ERGODIC PROPERTIES OF THE GAUSS MAP
8.3. THE DOEBLIN-LENSTRA CONJECTURE
8.4. OTHER CONTINUED FRACTION TRANSFORMATIONS
Chapter 9: Entropy
9.1. RANDOMNESS AND INFORMATION
9.2. DEFINITIONS AND PROPERTIES
9.3. CALCULATION OF ENTROPY AND EXAMPLES
9.4. THE SHANNON-MCMILLAN-BREIMAN THEOREM
9.5. LOCHS’ THEOREM
Chapter 10: The Variational Principle
10.1. TOPOLOGICAL ENTROPY
10.2. PROOF OF THE VARIATIONAL PRINCIPLE
10.3. MEASURES OF MAXIMAL ENTROPY
Chapter 11: Infinite Ergodic Theory
11.1. EXAMPLES
11.2. CONSERVATIVE AND DISSIPATIVE PART
11.3. INDUCED SYSTEMS
11.4. JUMP TRANSFORMATIONS
11.5. INFINITE ERGODIC THEOREMS
Chapter 12: Appendix
12.1. TOPOLOGY
12.2. MEASURE THEORY
12.3. LEBESGUE SPACES
12.4. LEBESGUE INTEGRATION
12.5. HILBERT SPACES
12.6. BOREL MEASURES ON COMPACT METRIC SPACES
12.7. FUNCTIONS OF BOUNDED VARIATION
Bibliography
Index


A First Course in Ergodic Theory


Karma Dajani

Utrecht University

Charlene Kalle

Leiden University

First edition published 2021
by CRC Press, 6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press, 2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

© 2021 Taylor & Francis Group, LLC. CRC Press is an imprint of Taylor & Francis Group, LLC.

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact [email protected]

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
ISBN: 9780367226206 (hbk)
ISBN: 9781032021843 (pbk)
ISBN: 9780429276019 (ebk)

Typeset in LMR12 by KnowledgeWorks Global Ltd.

To Annabel, Isaac, Noam, and Rafael


Preface

As the title suggests, this book is intended as a textbook for an introductory course in ergodic theory. It originated from our own Ergodic Theory course that we have been teaching for many years in the Dutch national master program in mathematics called Mastermath. Originally we used the lecture notes made by Karma in the 1990s. Over the years we added topics, theorems, examples and explanations from various sources to the course, and the list of reference material we provided to the students grew longer and longer. With this book we hope to create one source that is easy to teach from and that contains all the topics one would like to touch upon in an introductory course in ergodic theory. The text has been put to the test on several groups of students, and it contains many worked examples and exercises.

The book is designed for a one semester course. We treat approximately one chapter per week, with the exception of Chapters 6, 9 and 11, which take a bit longer. In many places the book combines material that is also covered in several classical textbooks on ergodic theory. We mention in particular, in chronological ordering, [48] by W. Parry, [12] by I. P. Cornfeld, S. V. Fomin and Ya. G. Sinai, [65] by P. Walters, [50] by K. Petersen, [33] by A. Lasota and M. C. Mackey, [1] by J. Aaronson, [8] by A. Boyarsky and P. Góra, [10] by M. Brin and G. Stuck, [14] by the first author and C. Kraaikamp, [17] by M. Einsiedler and T. Ward, [29] by M. Kesseböhmer, S. Munday and B. O. Stratmann and [63] by M. Viana and K. Oliveira.

The material in this book requires a basic knowledge of measure theory, some topology and a very basic knowledge of functional analysis. We have collected the results from these areas that are most relevant for the content of the book in the appendix for easy reference. For an introduction into these fields, more details and proofs we refer the reader to standard textbooks on the topic, as for example [20, 30, 44, 56, 58].



The picture on the cover illustrates the concept of mixing, see Chapter 3, while also referring to the statement by A. Rényi that mathematicians are machines for turning coffee into theorems. And of course, it is a reminder of the many cups that fuelled us while writing the book.

We would like to take this opportunity to thank several people for their help in the process of making this book and for reducing the number of mistakes and typos. Our thanks go to all the students who participated in our Mastermath Ergodic Theory course over the past years, in particular the class of 2020, for their careful scrutiny of the text and honest comments. We would also like to thank Jonny Imbierski, Marta Maggioni, Marks Ruziboev and Benthen Zeegers for reading the final versions. Finally, we would like to thank the Hausdorff Research Institute for Mathematics in Bonn for hosting us for a short time during the trimester program Dynamics: Topology and Numbers.

Karma and Charlene

Author Bios

Karma Dajani earned her PhD degree from the George Washington University in Washington, DC and is currently an Associate Professor in Mathematics at Utrecht University in the Netherlands. She has over 30 years of teaching experience, close to 60 publications and is a coauthor of the book Ergodic Theory of Numbers. Her research interests are primarily in Ergodic Theory and its applications to other fields such as Number Theory, Probability Theory and Symbolic Dynamics. Mathematics is her career, but it is also one of her three passions, the others being dance and classical music.

Charlene Kalle obtained her PhD in mathematics at Utrecht University in the Netherlands. After postdoctoral positions in Warwick and Vienna, she moved to Leiden University, where she is now an Associate Professor in Mathematics. Her research is in Ergodic Theory with applications to Probability Theory. She is also interested in relations to Number Theory and Fractal Geometry. Besides 20 research articles, she has co-authored a book on extracurricular high school mathematics. She has accumulated 20 years of teaching experience, ranging from teaching Italian language to adults to lecturing master courses in mathematics. She devotes her time not spent on mathematics to her three children and playing bridge.


CHAPTER 1

Measure Preservingness and Basic Examples

1.1 WHAT IS ERGODIC THEORY?

Dynamical systems theory is the area of mathematics dedicated to the study of processes that evolve over time. There are many ways to approach this subject, and ergodic theory focuses on describing the asymptotic average behavior of dynamical systems. To do that it uses techniques from many different fields, including probability theory, statistical mechanics, number theory, vector fields on manifolds, group actions on homogeneous spaces and fractal geometry.

The word ergodic is a mixture of two Greek words: ergon (work) and odos (path). The word was introduced by L. Boltzmann (1844–1906) regarding his hypothesis in statistical mechanics: for large systems of interacting particles in equilibrium, the time average along a single trajectory equals the space average. The hypothesis as he stated it was false, and the search for conditions under which these two quantities are equal led to the birth of ergodic theory as it is known nowadays.

A dynamical system consists of a space X that represents the collection of all states the system can be in, called the state space, and a way to describe the time evolution. A general setup to represent time evolution is by a (semi-)group G together with a collection of transformations {Tg : X → X}g∈G, one for each element of the group, with the property that for any two elements g, h ∈ G the composition of the corresponding


maps satisfies Tgh = Tg ◦ Th. Continuous time dynamical systems (or flows) are obtained by setting G = R for example. In this book time will be discrete and time evolution is governed by a single transformation T : X → X. So, if the system is in state x ∈ X at some moment in time, then it will be in state T x next. This can be seen as taking G = N (or Z, depending on whether the dynamics is invertible or not) and associating to each n ∈ N the iterate

T^n = T ◦ T ◦ · · · ◦ T (n times).

Our dynamical systems are thus represented by a pair (X, T ). This setup is analogous to that of discrete time stochastic processes.

Without further assumptions on the pair (X, T ) this setup is too general to lead to any interesting results. The space X usually has a special structure, and we want T to preserve the basic structure on X. For example:

• if X is a measure space, then we like T to be measurable;
• if X is a topological space, then T must be continuous;
• if X has a differentiable structure, then T is a diffeomorphism.

In this book we mostly take the measure theoretic approach: all our spaces are measure spaces (X, F, µ), and more specifically finite or infinite measure σ-finite Lebesgue spaces (see Definition 12.3.2 from the Appendix). The transformations T : X → X we consider are always measurable (so T⁻¹A ∈ F for all A ∈ F) and also satisfy the following fundamental property, which guarantees that mass cannot appear out of nowhere.

Definition 1.1.1. Let (X, F, µ) be a measure space and T : X → X measurable. Then T is called non-singular with respect to µ if for any A ∈ F it holds that µ(A) = 0 if and only if µ(T⁻¹A) = 0.

Throughout the book (and without mentioning this any further) we assume the following:

• whenever we write that (X, F, µ) is a measure space, we assume it is a finite or infinite measure σ-finite Lebesgue space;
• any transformation T : X → X is always assumed to be measurable and non-singular.

In this case we talk about a dynamical system.

Definition 1.1.2. We call a quadruple (X, F, µ, T ) a dynamical system if (X, F, µ) is a measure space and T : X → X is a transformation.


Since any finite measure space can be rescaled to a probability space, the measure spaces under consideration will either be probability spaces or have infinite total measure. In the remainder of this chapter and Chapters 2, 6 and 11 we consider both. Chapters 4, 5, 8 and 9 will be mainly concerned with probability spaces. In Chapters 7 and 10 the underlying state space will be a compact metric space (X, d) and we take the measure structure on X that is compatible with the topology, i.e., the σ-algebra will be the Borel σ-algebra generated by open balls under the metric d. The transformation T will then be continuous on X, which automatically makes it measurable.

1.2 MEASURE PRESERVING TRANSFORMATIONS

Let (X, F, µ) be a measure space. Assume it is the state space of a dynamical system for which the time evolution is governed by a measurable transformation T : X → X. The measure structure on X is determined by the σ-algebra as well as the measure, and we would like the measure µ to be compatible with the dynamics of T. This is captured in the definition of measure preservingness.

Definition 1.2.1. Let (X, F, µ) be a measure space and T : X → X a transformation. Then T is said to be measure preserving with respect to µ (or equivalently µ is said to be T -invariant) if µ(T⁻¹A) = µ(A) for all A ∈ F.

The definition of measure preservingness concerns inverse images of sets, which in contrast to forward images are always measurable. We say T is invertible if it is one-to-one and if T⁻¹ is measurable. Note that an invertible T is measure preserving if and only if µ(T A) = µ(A) for all A ∈ F. Also note that any measure preserving transformation is automatically non-singular. If a transformation T is not measure preserving with respect to µ, then one way to check the non-singularity of T is by proving that the Radon-Nikodym derivative d(µ ◦ T⁻¹)/dµ (see Theorem 12.4.4) is positive µ-a.e. In case T is invertible one can replace T⁻¹ by T.

The dynamics of a transformation T is represented by the orbits. For each x ∈ X, the (forward) orbit of x under T is the sequence x, T x, T²x, . . . . If T is invertible, then one speaks of the two-sided orbit . . . , T⁻¹x, x, T x, . . . .


If T is measure preserving, then for any measurable function f : X → R the process f, f ◦ T, f ◦ T², . . . is stationary. This means that for all Borel sets B1, . . . , Bn, all integers r1 < r2 < . . . < rn, and any k ≥ 1, one has

µ({x : f (T^{r1} x) ∈ B1, . . . , f (T^{rn} x) ∈ Bn}) = µ({x : f (T^{r1+k} x) ∈ B1, . . . , f (T^{rn+k} x) ∈ Bn}).

According to the definitions, to check whether a transformation is measurable and measure preserving, one needs to verify the conditions for all measurable sets. It would be convenient of course if it would be sufficient to check the conditions on a more manageable, smaller collection of subsets of X. One possible option is a generating semi-algebra. A collection S of subsets of X is called a semi-algebra if it satisfies (i) ∅ ∈ S, (ii) A ∩ B ∈ S whenever A, B ∈ S, and (iii) if A ∈ S, then the complement Ac = X \ A = E1 ∪ · · · ∪ En is a finite disjoint union of elements E1, . . . , En of S. Recall that a collection S of subsets of X is generating for the σ-algebra F if F = σ(S), where σ(S) denotes the smallest σ-algebra containing all sets from S.

Theorem 1.2.1. Let (X, F, µ) be a measure space and T : X → X a map. Suppose S is a generating semi-algebra of F that contains an exhausting sequence (Sn), i.e., an increasing sequence with X = ∪_{n=1}^∞ Sn. Suppose that for each A ∈ S one has T⁻¹A ∈ F and µ(T⁻¹A) = µ(A). If furthermore µ(Sn) = µ(T⁻¹Sn) < ∞ for all n, then T is measurable and measure preserving.

The proof of this theorem is given in the Appendix, where it is formulated in the slightly more general setting of a transformation T : X1 → X2 between two measure spaces, see Theorem 12.3.1.

The next lemma, of which the proof is an easy exercise, says that the notions of measurability and measure preservingness are preserved under taking the completion of the underlying measure space.

Lemma 1.2.1. Let (X, F, µ) be a measure space and F̄ the completion of F. If T : X → X is measurable and measure preserving on (X, F, µ), then it has the same properties on (X, F̄, µ̄), where µ̄ is the extended measure.


This lemma is especially useful, since many of the examples of dynamical systems we consider have Borel or Lebesgue σ-algebras, and according to the lemma, for issues regarding measurability and measure preservingness we do not need to bother with distinguishing between them. Whenever a transformation is measurable and measure preserving with respect to the Borel σ-algebra, the same automatically holds for the Lebesgue σ-algebra.

Another way of verifying whether a given measure is invariant is by using the Koopman operator. If necessary, recall the definition of Lp spaces from Section 12.2. Let (X, F, µ) be a measure space and T : X → X a transformation. Define the induced operator or Koopman operator UT : L0(X, F, µ) → L0(X, F, µ) by

UT (f ) = f ◦ T.   (1.1)

The following properties of UT are easy to prove.

Proposition 1.2.1. The operator UT has the following properties.
(i) UT is linear.
(ii) UT (f g) = UT (f )UT (g).
(iii) UT (c) = c for any constant c.
(iv) UT is positive.
(v) UT (1B) = 1B ◦ T = 1_{T⁻¹B} for all B ∈ F.

Exercise 1.2.1. Prove Proposition 1.2.1.

The next proposition helps to check whether a measure is invariant.

Proposition 1.2.2. Let (X, F, µ, T ) be a dynamical system. The measure µ is T -invariant if and only if for each f ∈ L0(X, F, µ) it holds that ∫_X UT f dµ = ∫_X f dµ (where if one side does not exist or is infinite, then the other side has the same property).

Proof. First, let A ∈ F be given. Then 1A ∈ L0(X, F, µ) and

∫_X 1A dµ = µ(A) = µ(T⁻¹A) = ∫_X 1_{T⁻¹A} dµ = ∫_X UT 1A dµ,

where both sides are infinite precisely if µ(A) = ∞. So the property holds for indicator functions. By linearity of the integral it then also holds for simple functions, and by approximation we then get the result for all nonnegative measurable functions. By splitting a function into its positive and negative parts, we then obtain the statement. For the other direction, let A ∈ F be given. Then by assumption ∫_X 1A dµ = ∫_X UT 1A dµ = ∫_X 1_{T⁻¹A} dµ, giving the T -invariance of µ.


A consequence of the previous proposition is the following.

Proposition 1.2.3. Let (X, F, µ) be a measure space and T : X → X a measure preserving transformation. Let p ≥ 1. Then UT (Lp(X, F, µ)) ⊆ Lp(X, F, µ), and ‖UT f‖p = ‖f‖p for all f ∈ Lp(X, F, µ).

Exercise 1.2.2. Prove Proposition 1.2.3.

Exercise 1.2.3. Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Let f ∈ L1(X, F, µ). Show that if UT f ≤ f µ-a.e., then f = UT f µ-a.e.
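The integral criterion of Proposition 1.2.2 can also be probed numerically. The Python sketch below is our own illustration, not part of the book's text: it takes the translation map T x = x + θ (mod 1), for which Lebesgue measure is invariant, and checks by Monte Carlo sampling that the estimates of ∫ UT f dλ and ∫ f dλ agree; the angle θ = 0.3 and test function f are arbitrary choices.

```python
import random

THETA = 0.3  # an arbitrary angle; any theta works for this check

def T(x):
    # Translation mod 1; Lebesgue measure on [0, 1) is T-invariant.
    return (x + THETA) % 1.0

def f(x):
    # Any bounded measurable test function will do.
    return x * x

random.seed(0)
xs = [random.random() for _ in range(200_000)]  # i.i.d. samples from Lebesgue measure on [0, 1)

int_f = sum(f(x) for x in xs) / len(xs)        # Monte Carlo estimate of the integral of f
int_UTf = sum(f(T(x)) for x in xs) / len(xs)   # Monte Carlo estimate of the integral of f ∘ T

# The two estimates coincide up to sampling error, as Proposition 1.2.2 predicts.
```

For a map that does not preserve λ, such as the β-transformation of Example 1.3.6, the two estimates would visibly disagree.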

1.3 BASIC EXAMPLES

Here we collect some examples that will be used throughout the book. In the first couple of examples X is (a subset of) a Euclidean space. To keep the notation simple, in all these cases we use B for the appropriate Lebesgue σ-algebra and λ for the appropriate Lebesgue measure.

Example 1.3.1 (Translation). For any t ∈ R the transformation T : R → R given by T x = x + t is a translation by t. By the shift invariance of the Lebesgue measure λ it follows immediately from Proposition 1.2.2 that λ is invariant for T. See Figure 1.1(a) for the graph.

Figure 1.1: The graphs of the transformation from Example 1.3.1 in (a) Translation by t, and of the transformations from Example 1.3.2 in (b) Rotation and (c) Translation (mod 1).

Example 1.3.2 (Rotation). Let S¹ ⊆ C denote the complex unit circle. The measure we consider on the Lebesgue σ-algebra B is the normalized Haar (or Lebesgue) measure λ. Let 0 < θ < 1 and define rotation by angle θ on S¹ by Rθ : S¹ → S¹, z ↦ e^{2πiθ}z. One easily verifies that the collection of all half open arcs is a generating semi-algebra for S¹,


so by Theorem 1.2.1 it is enough to check measure preservingness only for such arcs. Since Rθ is an isometry, it is clear that λ is Rθ-invariant. One can also view this transformation additively by defining the transformation Tθ on the unit interval [0, 1) by Tθ x = x + θ (mod 1) = x + θ − ⌊x + θ⌋. The graphs of Rθ and Tθ are shown in Figures 1.1(b) and (c).

Example 1.3.3 (Doubling map). Let T : [0, 1) → [0, 1) be given by

T x = 2x (mod 1) = 2x if 0 ≤ x < 1/2, and 2x − 1 if 1/2 ≤ x < 1.

T is called the doubling map; the graph is shown in Figure 1.2(a). Note that the intervals [a, b) form a generating semi-algebra for ([0, 1), B), so to verify that T is measure preserving with respect to the Lebesgue measure λ, by Theorem 1.2.1 it is enough to only consider such intervals. For any interval [a, b),

T⁻¹[a, b) = [a/2, b/2) ∪ [(a + 1)/2, (b + 1)/2)

and

λ(T⁻¹[a, b)) = b − a = λ([a, b)).

Although this map is very simple, it has in fact many facets. For example, iterations of this map yield the binary expansion of points in [0, 1). In other words, using T one can associate with each point in [0, 1) an infinite sequence (bn)_{n≥1} of 0's and 1's, such that x = Σ_{n≥1} bn/2^n. To do so, define the function b1 by

b1(x) = 0 if 0 ≤ x < 1/2, and 1 if 1/2 ≤ x < 1,   (1.2)

so that T x = 2x − b1(x). Now, for n ≥ 1 and x ∈ [0, 1) set bn(x) = b1(T^{n−1}x). Fix x ∈ [0, 1). For simplicity we write bn instead of bn(x). Then T^n x = 2T^{n−1}x − bn. Rewriting we get x = b1/2 + T x/2. Similarly, T x = b2/2 + T²x/2. Continuing in this manner, we see that for each n ≥ 1,

x = b1/2 + b2/2² + · · · + bn/2^n + T^n x/2^n.

Since 0 ≤ T^n x < 1, this gives

x − Σ_{i=1}^n bi/2^i = T^n x/2^n → 0 as n → ∞.

Thus, we have found the binary expansion of x. We shall later see that the sequence of digits b1, b2, . . . forms an i.i.d. sequence of Bernoulli random variables.

Figure 1.2: The graphs of the transformations from Example 1.3.3 (a) Doubling, Exercise 1.3.1 (b) Baker and Example 1.3.4 (c) Boole.

Exercise 1.3.1 (Baker's transformation). Consider the probability space ([0, 1)², B, λ), where B is the product Lebesgue σ-algebra and λ is the two-dimensional Lebesgue measure. Define T : [0, 1)² → [0, 1)² by

T (x, y) = (2x, y/2) if 0 ≤ x < 1/2, and (2x − 1, (y + 1)/2) if 1/2 ≤ x < 1.

T is called the baker's transformation, supposedly because the action of the map on the unit square resembles a kneading technique that bakers use when kneading dough. The graph is shown in Figure 1.2(b). Show that T is invertible, measurable and measure preserving.

Example 1.3.4 (Boole's transformation). Let T : R → R be given by T x = x − 1/x. This map, the graph of which is shown in Figure 1.2(c), is called Boole's transformation. To show that the Lebesgue measure is invariant for T, we use Theorem 1.2.1 again and we consider the inverse image of intervals. Any point c ∈ R has two pre-images c1 < 0 and c2 > 0 under T that are the solutions to the equation x² − cx − 1 = 0. Since these pre-images are real numbers, we can write

x² − cx − 1 = (x − c1)(x − c2) = 0


and we see that c1 + c2 = c. Hence, for any interval (a, b) ⊆ R it holds that λ(T⁻¹(a, b)) = λ((a1, b1) ∪ (a2, b2)) = b1 − a1 + b2 − a2 = b − a = λ((a, b)). The same statement holds for any other type of interval, and since the collection of all intervals is a generating semi-algebra for B on R and the intervals (−n, n) give an exhausting sequence of finite measure intervals, λ is T -invariant.

Example 1.3.5 (Lüroth series). Besides binary expansions there are many other ways to represent real numbers. In 1883 J. Lüroth introduced in [38] a kind of number expansions that are now known as Lüroth expansions. A Lüroth (series) expansion is a representation of a real number x of the form

x = 1/a1 + 1/(a1(a1 − 1)a2) + · · · + 1/(a1(a1 − 1) · · · an−1(an−1 − 1)an) + · · · ,

with ak ≥ 2 for each k ≥ 1. The series can be finite or infinite. Lüroth showed that every x ∈ (0, 1) can be written in such a way. How is such a series generated? The Lüroth map T : [0, 1) → [0, 1) is defined by

T x = n(n + 1)x − n if x ∈ [1/(n + 1), 1/n), and T 0 = 0.   (1.3)

The graph is shown in Figure 1.3(a). Let x ≠ 0. For k ≥ 1 and T^{k−1}x ≠ 0 we define the digits ak = ak(x) = n if T^{k−1}x ∈ [1/n, 1/(n − 1)), n ≥ 2. Then (1.3) can be written as

T x = a1(a1 − 1)x − (a1 − 1) if x ≠ 0, and T 0 = 0,

and for any x ∈ (0, 1) such that T x ≠ 0, we have

x = 1/a1 + T x/(a1(a1 − 1)) = 1/a1 + (1/(a1(a1 − 1))) (1/a2 + T²x/(a2(a2 − 1)))
  = 1/a1 + 1/(a1(a1 − 1)a2) + T²x/(a1(a1 − 1)a2(a2 − 1)).

After k steps we obtain, if T^{k−1}x ≠ 0, that

x = 1/a1 + · · · + 1/(a1(a1 − 1) · · · ak−1(ak−1 − 1)ak) + T^k x/(a1(a1 − 1) · · · ak(ak − 1)).

Notice that, if T^{k−1}x = 0 for some k ≥ 1, and if we assume that k is the smallest positive integer with this property, then

x = 1/a1 + · · · + 1/(a1(a1 − 1) · · · ak−1(ak−1 − 1)ak) = Σ_{i=1}^k (ai − 1) Π_{j=1}^i 1/(aj(aj − 1)).

In case T^{k−1}x ≠ 0 for all k ≥ 1, one gets an infinite series

x = 1/a1 + 1/(a1(a1 − 1)a2) + · · · + 1/(a1(a1 − 1) · · · ak−1(ak−1 − 1)ak) + · · ·
  = Σ_{i≥1} (ai − 1) Π_{j=1}^i 1/(aj(aj − 1)).

The above infinite series indeed converges to x. To see this, note that

| x − Σ_{i=1}^k (ai − 1) Π_{j=1}^i 1/(aj(aj − 1)) | = T^k x/(a1(a1 − 1) · · · ak(ak − 1));

since T^k x ∈ [0, 1) and ak ≥ 2 for all x and k, we find

| x − Σ_{i=1}^k (ai − 1) Π_{j=1}^i 1/(aj(aj − 1)) | ≤ 1/2^k → 0 as k → ∞.

From the above we also see that if x and y have the same Lüroth expansion, then, for each k ≥ 1,

|x − y| ≤ 1/2^{k−1},

and it follows that x equals y. So a Lüroth expansion uniquely determines a real number.

Exercise 1.3.2. Show that the Lüroth map T from Example 1.3.5 is measure preserving with respect to Lebesgue measure λ.
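In the same spirit as the binary expansion of Example 1.3.3, the Lüroth digits can be generated mechanically. The sketch below is our own Python illustration (not from the book); exact rational arithmetic via fractions avoids rounding errors in the digit computation:

```python
from fractions import Fraction
from math import ceil

def luroth_digits(x, kmax=20):
    """Lüroth digits a_1, a_2, ... of x in (0, 1), by iterating the Lüroth map."""
    digits = []
    while x != 0 and len(digits) < kmax:
        a = ceil(1 / x)                # a = n + 1 exactly when x lies in [1/(n+1), 1/n)
        digits.append(a)
        x = a * (a - 1) * x - (a - 1)  # T x = a1(a1 - 1)x - (a1 - 1)
    return digits

def luroth_value(digits):
    """Sum of the Lüroth series determined by the given digits."""
    total, weight = Fraction(0), Fraction(1)
    for a in digits:
        total += weight / a        # next term 1/(a1(a1-1) ... a_{k-1}(a_{k-1}-1) a_k)
        weight /= a * (a - 1)
    return total

digits = luroth_digits(Fraction(3, 10))  # finite expansion: T^3 x = 0 here
```

Feeding the digits back into luroth_value reproduces the starting point exactly, illustrating that a Lüroth expansion uniquely determines a real number.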

Example 1.3.6 (β-transformation). Let β ∈ (1, 2). The β-transformation T : [0, 1) → [0, 1) is defined by

T x = βx (mod 1) = βx if 0 ≤ x < 1/β, and βx − 1 if 1/β ≤ x < 1.

Figure 1.3(b) shows the graph of T in case β equals G = (1 + √5)/2, the golden mean. The map T is not measure preserving with respect to Lebesgue measure λ, but it is non-singular. To see this, note that for any measurable set A ∈ B we can write

T⁻¹A = (A/β) ∪ (((A ∩ [0, β − 1)) + 1)/β),

where the union is disjoint. The non-singularity then follows by the translation and scaling invariance of λ. In case β = G (notice that G² = G + 1) an invariant measure µ for T that is equivalent to λ is given by

µ(B) = ∫_B g dλ, for all B ∈ B,   (1.4)

where g(x) = (5 + 3√5)/10 if 0 ≤ x < 1/β, and (5 + √5)/10 if 1/β ≤ x < 1.

Exercise 1.3.3. (a) Show that for any β ∈ (1, 2) Lebesgue measure is not invariant for T and that (similar to Example 1.3.3) iterations of this map generate expansions for points x ∈ [0, 1) of the form

x = Σ_{i=1}^∞ bi/β^i,   (1.5)

where bi ∈ {0, 1} and bi bi+1 = 0 for all i ≥ 1. Expansions as in (1.5) are called β-expansions.

(b) Verify that µ from above is an invariant measure for T in case β = G.

Example 1.3.7 (Continued fractions). The Gauss map T : [0, 1) → [0, 1) is defined by T 0 = 0 and, for x ≠ 0,

T x = 1/x (mod 1).

12  A First Course in Ergodic Theory 1

1

0

1 β

0

1

11 1 43 2

0

1

1 β

(b) β-transformation

(a) L¨ uroth Figure 1.3

1

11 1 43 2

1

(c) Gauss

The graphs of the transformations from Examples 1.3.5, 1.3.6

and 1.3.7. Exercise 1.3.4. The Gauss probability measure µ is given by Z

µ(B) = B

1 1 dλ(x), log 2 1 + x

for all B ∈ B.

(1.6)

Show that T is not measure preserving with respect to Lebesgue measure, but is measure preserving with respect to µ. An interesting feature of this map is that its iterations generate continued fraction expansions for points in (0, 1). For if we define for each k≥1 (  1, if T k−1 x ∈ 21 , 1 , ak = ak (x) =  1 , n1 , n ≥ 2, n, if T k−1 x ∈ n+1 then for x 6= 0 we can write T x =

1 x

− a1 and hence x =

n iterations, if T n−1 x 6= 0 we see that x=

1 = ··· = a1 + T x a1 +

1 . After a1 + T x

1

.

1 . a2 + . . +

1 an + T n x

If T n x = 0 for some n, then we get a finite continued fraction expansion of x. If, on the other hand, T n x 6= 0 for all n, then we can continue this process indefinitely. In fact, in Chapter 8 we show that if we write pn = qn

1 a1 +

1 1 . a2 + . . + an

,

Measure Preservingness and Basic Examples  13

then the sequence (qn ) is monotonically increasing, and x− pqnn < as n → ∞. The last statement implies that



1

x=

a2 +

→0

.

1

a1 +

1 2 qn

1 a3 +

1 ..

.

Example 1.3.8 (Arnold's cat map). Let T² = R²/Z² be the two-dimensional torus. The map T : T² → T² defined by

T (x, y) = (2x + y (mod 1), x + y (mod 1)),

i.e., the action of the matrix

( 2 1 )
( 1 1 )

on (x, y) modulo Z², is called Arnold's cat map. The name refers to the fact that V. Arnold and A. Avez used a picture of a cat to illustrate the dynamics of this map in [3].¹ The map is illustrated in Figure 1.4. The map is invertible, since the matrix above has determinant 1. The scaling invariance of the Lebesgue measure together with Proposition 1.2.2 then give that Lebesgue measure λ is T -invariant.

The map T is a specific example of a hyperbolic toral automorphism. For any n × n matrix M, n ≥ 2, with integer entries and determinant |det M| = 1 we can define the corresponding toral automorphism TM : Tⁿ → Tⁿ by TM x = M x (mod Zⁿ). Since M maps points in Zⁿ to points in Zⁿ, for any two points x, y ∈ Rⁿ that have the same representative in Tⁿ it holds that M x (mod Zⁿ) = M y (mod Zⁿ). Hence, the map TM is well defined and invertible, and Lebesgue measure is invariant. In case none of the eigenvalues of M are on the unit circle, TM is called a hyperbolic toral automorphism.

The next couple of examples have a symbolic space as a state space, i.e., X = {0, 1, . . . , k − 1}^Z or X = {0, 1, . . . , k − 1}^N is the set of two-

¹ Another version of the story says that CAT is an abbreviation of Chaotic Automorphism of a Torus (in Russian) and that the picture of a cat was later associated to the cat map due to the appearance of the map in an earlier text by V. Arnold in which the action of another dynamical system was illustrated using the picture of a cat.

14  A First Course in Ergodic Theory

Figure 1.4 An illustration of Arnold’s cat map.

or one-sided infinite sequences of symbols 0, 1, . . . , k − 1. On such spaces we consider the product σ-algebra C generated by the cylinder sets:

[a_i ··· a_j] := {x ∈ X : x_i = a_i, . . . , x_j = a_j}   (1.7)

with a_i, . . . , a_j ∈ {0, 1, . . . , k − 1} and i ≤ j (and i > 0 in the one-sided case). For one-sided sequences the collection of all cylinders of the form [a_1 ··· a_j] for j ≥ 0 (with the empty set for j = 0) forms a generating semi-algebra, and for two-sided sequences this is obtained from all cylinders of the form [a_{-i} a_{-i+1} ··· a_{j-1} a_j] with i, j ≥ 0 together with the empty set. By Theorem 1.2.1 it is enough to check measurability and measure preservingness on such sets only.

Example 1.3.9 (Bernoulli shifts). Let X = {0, 1, . . . , k − 1}^Z. Let p = (p_0, p_1, . . . , p_{k-1}) be a positive probability vector. The measure µ on C defined by specifying it on cylinder sets by

µ({x : x_i = a_i, . . . , x_j = a_j}) = p_{a_i} ··· p_{a_j}

is called the p-Bernoulli measure. If all p_i are equal and thus equal to 1/k, then µ is called the uniform Bernoulli measure. The transformation T : X → X given by Tx = y, where y_n = x_{n+1} for all n, is called the left shift. It is measurable and measure preserving, since

T^{-1}{x : x_i = a_i, . . . , x_j = a_j} = {x : x_{i+1} = a_i, . . . , x_{j+1} = a_j},   (1.8)

and µ({x : x_{i+1} = a_i, . . . , x_{j+1} = a_j}) = p_{a_i} ··· p_{a_j}. The system (X, C, µ, T) is called the two-sided Bernoulli shift on k symbols. The one-sided Bernoulli shift with X = {0, 1, . . . , k − 1}^N is set up in exactly the same way. Note that for the proof of measurability and measure preservingness the inverse image of a cylinder as in (1.8) in this case becomes a union of k disjoint cylinder sets.
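The cylinder computation for the one-sided shift can be checked mechanically. A minimal sketch (the vector `p` and the helper `mu` are our illustrative choices, not from the book):

```python
p = [0.2, 0.3, 0.5]            # an assumed positive probability vector on k = 3 symbols

def mu(word):
    """p-Bernoulli measure of the cylinder fixing `word` on consecutive coordinates."""
    out = 1.0
    for a in word:
        out *= p[a]
    return out

# For the one-sided shift, T^{-1}[a_1 ... a_j] is the disjoint union over b of the
# cylinders [b a_1 ... a_j]; its measure is sum_b p_b * mu(word) = mu(word).
word = (0, 2, 1)
preimage = sum(mu((b,) + word) for b in range(len(p)))
assert abs(preimage - mu(word)) < 1e-12
```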


Example 1.3.10 (Markov shifts). Let (X, C, T) be as in Example 1.3.9. We now define a measure ν on C as follows. Let P = (p_{ij}) be a stochastic k × k matrix, and q = (q_0, q_1, . . . , q_{k-1}) a positive probability vector such that qP = q. Define ν on cylinders by

ν({x : x_i = a_i, . . . , x_j = a_j}) = q_{a_i} p_{a_i a_{i+1}} ··· p_{a_{j-1} a_j}.

The measure ν is called a Markov measure. Just as in Example 1.3.9, one sees that T is measurable and measure preserving.

Example 1.3.11 (Binary odometer). On {0, 1}^N consider the (p, 1 − p)-Bernoulli measure µ_p for some 0 < p < 1, that is,

µ_p({x : x_1 = a_1, . . . , x_n = a_n}) = p^{n − Σ_{i=1}^n a_i} (1 − p)^{Σ_{i=1}^n a_i}.

The transformation T : {0, 1}^N → {0, 1}^N given by T(1, 1, 1, . . .) = (0, 0, 0, . . .) and, for n ≥ 1,

T(1, . . . , 1, 0, x_{n+1}, . . .) = (0, . . . , 0, 1, x_{n+1}, . . .),

where the initial block of n − 1 ones on the left-hand side is replaced by n − 1 zeros on the right-hand side, is called the binary odometer. We will show that T is a non-singular transformation. First note that T is invertible, so it is enough to check that µ_p and µ_p ◦ T are equivalent measures. Let d(x) = inf{n ≥ 1 : x_n = 0} and set m(x) = d(x) − 2. Note that d(x) = ∞ only for the constant sequence (1, 1, 1, . . .). Consider any cylinder set A ⊆ {x ∈ X : d(x) = k}; then

A = {x ∈ X : x_1 = ··· = x_{k-1} = 1, x_k = 0, x_{k+1} = ℓ_1, . . . , x_{k+j} = ℓ_j}

for some ℓ_1, . . . , ℓ_j ∈ {0, 1}. We have

TA = {x ∈ X : x_1 = ··· = x_{k-1} = 0, x_k = 1, x_{k+1} = ℓ_1, . . . , x_{k+j} = ℓ_j},

and

µ_p(TA) = p^{k-1}(1 − p) µ_p({x : x_{k+1} = ℓ_1, . . . , x_{k+j} = ℓ_j}) = (p/(1 − p))^{k-2} µ_p(A).

Note that for all x ∈ A one has m(x) = k − 2, hence the last expression can be written as

µ_p(TA) = ∫_A (p/(1 − p))^{m(x)} dµ_p(x).

This shows that (dµ_p ◦ T / dµ_p)(x) = (p/(1 − p))^{m(x)} µ_p-a.e. and hence µ_p and µ_p ◦ T are equivalent measures. So T is non-singular with respect to µ_p for all p. Notice that dµ_p ◦ T / dµ_p = 1 µ_p-a.e. if and only if p = 1/2, and so T is measure preserving with respect to µ_p if and only if p = 1/2.
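The non-singularity computation can be replayed on finite cylinders. A sketch (the helpers `odometer` and `mu_p` are our illustrative names):

```python
def odometer(x):
    """Binary odometer (add 1 with carry) on a finite 0/1 list; all ones maps to all zeros."""
    y = list(x)
    for i, bit in enumerate(y):
        if bit == 0:
            y[i] = 1
            return y
        y[i] = 0               # carry past a 1
    return y                   # input was all ones

def mu_p(word, p):
    """mu_p of the cylinder fixing `word`: a factor p for each 0 and (1 - p) for each 1."""
    out = 1.0
    for a in word:
        out *= p if a == 0 else 1 - p
    return out

p = 0.3
A = [1, 1, 1, 0, 1, 0]         # a cylinder inside {d(x) = 4}, so m(x) = k - 2 = 2
TA = odometer(A)
assert TA == [0, 0, 0, 1, 1, 0]
# the Radon-Nikodym factor (p/(1-p))^{k-2} from the text, here with k = 4
assert abs(mu_p(TA, p) / mu_p(A, p) - (p / (1 - p)) ** 2) < 1e-12
```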

We end this section with two examples of symbolic systems of a more probabilistic flavor.

Example 1.3.12 (Stationary stochastic processes). Let (Ω, F, P) be a probability space, and . . . , Y_{-2}, Y_{-1}, Y_0, Y_1, Y_2, . . . a stationary stochastic process on Ω with values in R. Hence, for each i ∈ Z, Y_i : Ω → R is a random variable (or measurable function) and for each k ∈ Z,

P(Y_{n_1} ∈ B_1, . . . , Y_{n_r} ∈ B_r) = P(Y_{n_1+k} ∈ B_1, . . . , Y_{n_r+k} ∈ B_r)

for any n_1 < n_2 < ··· < n_r and any Lebesgue sets B_1, . . . , B_r. We can see this process as coming from a measure preserving transformation in the following way. Let X = R^Z = {x = (. . . , x_{-1}, x_0, x_1, . . .) : x_i ∈ R} with the product σ-algebra (i.e., generated by the cylinder sets). Let T : X → X be the left shift. Define Ψ : Ω → X by

Ψ(ω) = (. . . , Y_{-2}(ω), Y_{-1}(ω), Y_0(ω), Y_1(ω), Y_2(ω), . . .).

Then Ψ is measurable, since if B_1, . . . , B_r are Lebesgue sets in R, then

Ψ^{-1}({x ∈ X : x_{n_1} ∈ B_1, . . . , x_{n_r} ∈ B_r}) = Y_{n_1}^{-1}(B_1) ∩ ··· ∩ Y_{n_r}^{-1}(B_r) ∈ F.

Define a measure µ on X by µ(E) = P(Ψ^{-1}(E)) for any E in the product σ-algebra. On cylinder-type sets µ has the form

µ({x ∈ X : x_{n_1} ∈ B_1, . . . , x_{n_r} ∈ B_r}) = P(Y_{n_1} ∈ B_1, . . . , Y_{n_r} ∈ B_r).

Since

T^{-1}{x : x_{n_1} ∈ B_1, . . . , x_{n_r} ∈ B_r} = {x : x_{n_1+1} ∈ B_1, . . . , x_{n_r+1} ∈ B_r},

the stationarity of the process (Y_n) implies that T is measure preserving. Furthermore, if we let π_i : X → R be the natural projection onto the i-th coordinate, then Y_i(ω) = π_i(Ψ(ω)) = π_0 ◦ T^i(Ψ(ω)).


Example 1.3.13 (Random shifts). Let (X, F, µ) be a measure space, and T : X → X an invertible measure preserving transformation. Then T^{-1} is measurable and measure preserving with respect to µ. Suppose now that at each moment, instead of moving forward by T (x → Tx), we first flip a fair coin to decide whether we will use T or T^{-1}. We can describe this random system by means of a measure preserving transformation in the following way. Let Ω = {−1, 1}^Z with product σ-algebra C, the uniform Bernoulli measure P and the left shift S : Ω → Ω, which, as we saw in Example 1.3.9, is measure preserving. Now, let Y = Ω × X with the product σ-algebra, and product measure P × µ. Define R : Y → Y by R(ω, x) = (Sω, T^{ω_0}x). Then R is invertible (why?), and measure preserving with respect to P × µ. To see the latter, for any set C ∈ C and any A ∈ F, we have

(P × µ)(R^{-1}(C × A))
 = (P × µ)({(ω, x) : R(ω, x) ∈ C × A})
 = (P × µ)({(ω, x) : ω_0 = 1, Sω ∈ C, Tx ∈ A}) + (P × µ)({(ω, x) : ω_0 = −1, Sω ∈ C, T^{-1}x ∈ A})
 = (P × µ)(({ω_0 = 1} ∩ S^{-1}C) × T^{-1}A) + (P × µ)(({ω_0 = −1} ∩ S^{-1}C) × TA)
 = P({ω_0 = 1} ∩ S^{-1}C) µ(T^{-1}A) + P({ω_0 = −1} ∩ S^{-1}C) µ(TA)
 = P({ω_0 = 1} ∩ S^{-1}C) µ(A) + P({ω_0 = −1} ∩ S^{-1}C) µ(A)
 = P(S^{-1}C) µ(A) = P(C) µ(A) = (P × µ)(C × A).

CHAPTER 2
Recurrence and Ergodicity

2.1 RECURRENCE

One of the most general statements in ergodic theory is the Poincaré Recurrence Theorem. It was formulated and discussed by H. Poincaré in 1890 (see [51]) and it is one of the first results that uses a measure theoretic approach in the study of dynamical systems. For any subset B of the state space X a point x ∈ B is said to be B-recurrent if it eventually returns to B under iterations of T, i.e., if there exists a k ≥ 1 such that T^k x ∈ B.

Theorem 2.1.1 (Poincaré Recurrence Theorem). Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Let B ∈ F with µ(B) > 0. Then µ-a.e. x ∈ B is B-recurrent.

Proof. Let N be the subset of B consisting of all elements that are not B-recurrent. Then N = {x ∈ B : T^k x ∉ B for all k ≥ 1}. We want to show that µ(N) = 0. First notice that N ∩ T^{-k}N = ∅ for all k ≥ 1, hence T^{-ℓ}N ∩ T^{-m}N = ∅ for all ℓ ≠ m. Thus, the sets N, T^{-1}N, T^{-2}N, . . . are pairwise disjoint, and µ(T^{-n}N) = µ(N) for all n ≥ 1 (T is measure preserving). If µ(N) > 0, then

1 = µ(X) ≥ µ(⋃_{k≥0} T^{-k}N) = Σ_{k≥0} µ(T^{-k}N) = Σ_{k≥0} µ(N) = ∞,

a contradiction.
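The theorem is easy to watch in action. A numerical sketch (not from the book; the rotation angle, the set B and the helper `return_time` are our illustrative choices) samples points of B under a measure preserving circle rotation and finds a finite return time for each:

```python
import math

theta = math.sqrt(2) - 1        # an irrational rotation number (illustrative choice)

def T(x):
    """Rotation x -> x + theta (mod 1), a Lebesgue measure preserving map."""
    return (x + theta) % 1.0

def return_time(x, in_B, max_steps=100_000):
    """Smallest k >= 1 with T^k x in B, as promised by Poincaré recurrence."""
    y = T(x)
    for k in range(1, max_steps + 1):
        if in_B(y):
            return k
        y = T(y)
    return None

in_B = lambda x: x < 0.1        # B = [0, 0.1), a set of positive measure
times = [return_time(0.1 * k / 50, in_B) for k in range(50)]
assert all(t is not None for t in times)   # every sampled point of B returns to B
```

For the translation Tx = x + t on R discussed next, the same experiment would loop forever: with infinite measure, recurrence can fail.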


The proof of the Poincaré Recurrence Theorem implies that almost every x ∈ B returns to B infinitely often. In other words, there exist infinitely many integers n_1 < n_2 < . . . such that T^{n_i}x ∈ B for all i ≥ 1. Recall that we can write

{x ∈ B : T^n x ∈ B i.o.} = B ∩ ⋂_{n≥0} ⋃_{k≥n} T^{-k}B.

To see that µ({x ∈ B : T^n x ∈ B i.o.}) = µ(B), let D = {x ∈ B : T^k x ∈ B for finitely many k ≥ 1}. Then

D = {x ∈ B : T^k x ∈ N for some k ≥ 0} ⊆ ⋃_{k=0}^∞ T^{-k}N.

Thus, µ(D) = 0 since µ(N) = 0 and T is non-singular.

The assumption that the measure of the space µ(X) is finite is essential in the proof. It is not hard to see that the statement does not necessarily hold for infinite measure systems. Recall the translation Tx = x + t on R from Example 1.3.1 and assume that t > 0. We saw that T is measure preserving with respect to the Lebesgue measure. The interval [0, t) has positive Lebesgue measure, but λ(T^n[0, t) ∩ [0, t)) = 0 for all n ≥ 1. In other words, no element from [0, t) ever returns to [0, t) under iterations of T.

Definition 2.1.1. Let (X, F, µ) be a measure space. A set W ∈ F is called wandering for a transformation T : X → X if µ(T^{-n}W ∩ T^{-m}W) = 0 for all 0 ≤ n < m. We denote the collection of all wandering sets of T by W_T.

In other words, a set W is wandering for T if the collection {T^{-n}W : n ≥ 0} is essentially pairwise disjoint. Note that any subset of a wandering set is wandering itself. In the example above the interval [0, t) is wandering and so is any other interval of length at most t.

Definition 2.1.2. Let (X, F, µ) be a measure space. A transformation T : X → X is called conservative if µ(W) = 0 for all W ∈ W_T.

Note that if T is a measure preserving transformation and W is a wandering set, then

µ(X) ≥ µ(⋃_{n≥0} T^{-n}W) = Σ_{n≥0} µ(T^{-n}W).

Recurrence and Ergodicity  21

This shows that any measure preserving transformation on a probability space is automatically conservative. If T is conservative, then no set B ∈ F with µ(B) > 0 can be wandering. So, if µ(B) > 0, then there are 0 ≤ n < m such that µ(T^{-n}B ∩ T^{-m}B) > 0. The non-singularity of T then gives that µ(B ∩ T^{-(m-n)}B) > 0. So, if T is conservative, then from any positive measure set at least a positive measure part returns to the set after some time. It is a consequence of the following more general recurrence theorem, which is due to P. R. Halmos [20], that conservativity actually guarantees that a system satisfies the statement from the Poincaré Recurrence Theorem.

Theorem 2.1.2 (Halmos Recurrence Theorem). Let (X, F, µ) be a measure space and T : X → X a transformation. Then for every A ∈ F the following property holds: µ(A ∩ W) = 0 for all W ∈ W_T if and only if for all measurable sets B ⊆ A,

µ({x ∈ B : T^n x ∈ B i.o.}) = µ(B).   (2.1)

Proof. Assume first that µ(A ∩ W) = 0 for all wandering sets W. Fix a measurable set B ⊆ A and let

N = {x ∈ B : T^k x ∉ B for all k ≥ 1} = B \ ⋃_{k≥1} T^{-k}B

be the set of points in B that never return to B. It suffices to show that

µ(B \ ⋂_{k≥0} ⋃_{n≥k} T^{-n}B) = 0.

From the definition of N it follows that T^{-n}N ∩ T^{-m}N = ∅ for all n ≠ m, so N is wandering and by assumption µ(N) = µ(N ∩ A) = 0. The non-singularity of T then implies that for each k ≥ 0,

0 = µ(T^{-k}N) = µ(T^{-k}B \ ⋃_{n≥k+1} T^{-n}B),

so also µ(⋃_{n≥k} T^{-n}B \ ⋃_{n≥k+1} T^{-n}B) = 0. Since ⋃_{n≥k+1} T^{-n}B ⊆ ⋃_{n≥k} T^{-n}B for all k, from this we obtain inductively, starting from B ⊆ ⋃_{n≥0} T^{-n}B, that

µ(B \ ⋃_{n≥k} T^{-n}B) = 0


for all k ≥ 0 and the result follows. For the other direction, let W be a wandering set and assume that µ(A ∩ W ) > 0. Set B = A ∩ W . Then B ⊆ A, so by assumption µ({x ∈ B : T n x ∈ B i.o.}) = µ(B) > 0. On the other hand B is wandering, so no positive measure set of points can return to B. This gives a contradiction. Note that if T is conservative, we can take A = X in Theorem 2.1.2 to get the statement from the Poincar´e Recurrence Theorem for every set B ∈ F. Conversely, if (2.1) holds for all B ∈ F, then µ(W ) = 0 for every wandering set W and T is conservative. Hence, conservativity can be characterized as follows. Corollary 2.1.1. Let (X, F, µ) be a measure space and T : X → X a transformation. Then T is conservative if and only if for each B ∈ F it holds that µ-almost every x ∈ B returns to B infinitely often. If T is invertible, then we can replace T by T −1 in the above results and obtain that for µ-a.e. x ∈ B there are infinitely many positive and negative integers n such that T n x ∈ B. Note that since any measure preserving transformation on a probability space is conservative, the Poincar´e Recurrence Theorem follows from this corollary. We stated and proved it separately, because of its historical importance.

2.2 ERGODICITY

The right condition under which Boltzmann’s Hypothesis is true turned out to be ergodicity. Ergodicity is an irreducibility condition defined as follows. Recall that if A and B are measurable sets, then their symmetric difference is defined by A∆B = (A ∪ B) \ (A ∩ B) = (A \ B) ∪ (B \ A). We use the notation A^c for the complement X \ A of A in X.

Definition 2.2.1. Let T : X → X be a transformation on a measure space (X, F, µ). The map T is said to be ergodic if for every measurable set A satisfying µ(A∆T^{-1}A) = 0, we have µ(A) = 0 or µ(A^c) = 0.


We can actually replace essential invariance by total invariance, using the following small exercise.

Exercise 2.2.1. Let (X, F, µ) be a measure space.
(a) Show that for any measurable sets A, B, C one has µ(A∆B) ≤ µ(A∆C) + µ(C∆B).
(b) Let T : X → X be a transformation and let A ∈ F satisfy µ(A∆T^{-1}A) = 0. Prove that then µ(A∆T^{-k}A) = 0 for each k ≥ 1.

Proposition 2.2.1. Let T : X → X be a transformation on a measure space (X, F, µ). Then T is ergodic if and only if for every measurable set A satisfying T^{-1}A = A, we have µ(A) = 0 or µ(A^c) = 0.

Proof. One direction is immediate. For the other direction, let A ∈ F be such that µ(A∆T^{-1}A) = 0. Set B = {x ∈ X : T^n x ∈ A i.o.}. Then obviously T^{-1}B = B, so by assumption µ(B) = 0 or µ(B^c) = 0. Furthermore,

µ(A∆B) = µ(⋂_{n≥1} ⋃_{k≥n} T^{-k}A ∩ A^c) + µ(⋃_{n≥1} ⋂_{k≥n} T^{-k}A^c ∩ A)
 ≤ µ(⋃_{k≥1} T^{-k}A ∩ A^c) + µ(⋃_{k≥1} T^{-k}A^c ∩ A)
 ≤ Σ_{k≥1} µ(T^{-k}A ∆ A).

It follows from Exercise 2.2.1 that µ(T^{-k}A ∆ A) = 0 for each k ≥ 1. Hence, µ(B∆A) = 0, which implies that µ(B) = µ(A). Therefore, µ(A) = 0 or µ(A^c) = 0.

To illustrate the concept, consider the following example of a transformation that is not ergodic.

Example 2.2.1. Let T : [0, 1] → [0, 1] be given by

Tx = 2x, if 0 ≤ x < 1/4;  Tx = 2x − 1/2, if 1/4 ≤ x < 3/4;  Tx = 2x − 1, if 3/4 ≤ x ≤ 1.

See Figure 2.1 for the graph. One readily checks that Lebesgue measure λ is invariant for T, but for A = [0, 1/2) we see that T^{-1}A = A and 0 < λ(A) < 1. The dynamics of T splits into two independent parts: the orbit of any x ∈ A is completely contained in A and similarly for points in A^c.


Figure 2.1 The graph of the transformation from Example 2.2.1.
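The splitting of the dynamics can be observed directly. A small sketch (our code, not the book's; the starting point 0.137 is arbitrary):

```python
def T(x):
    """The interval map of Example 2.2.1 (three linear branches)."""
    if x < 0.25:
        return 2 * x
    if x < 0.75:
        return 2 * x - 0.5
    return min(2 * x - 1, 1.0)

# The invariant set A = [0, 1/2): an orbit that starts in A never leaves it,
# which is exactly the failure of ergodicity described above.
x = 0.137
for _ in range(1000):
    x = T(x)
    assert x < 0.5
```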

The following proposition says that in a conservative and ergodic system, any positive measure set is eventually visited by almost all points in the space.

Proposition 2.2.2. Let (X, F, µ) be a measure space and let A ∈ F with µ(A) > 0 be given. If T : X → X is a conservative and ergodic transformation, then µ(X \ ⋃_{n≥0} T^{-n}A) = 0.

Proof. Let µ(A) > 0 and set B = X \ ⋃_{n≥0} T^{-n}A = ⋂_{n≥0} T^{-n}A^c. By definition B ⊆ T^{-1}B and B = A^c ∩ T^{-1}B. The set T^{-1}B contains precisely those elements of X that do not enter A under any iteration of T. Hence, by conservativity Corollary 2.1.1 gives that µ(A ∩ T^{-1}B) = 0. This yields µ(T^{-1}B) = µ(A^c ∩ T^{-1}B) = µ(B), so µ(B∆T^{-1}B) = µ(T^{-1}B \ B) = µ(A ∩ T^{-1}B) = 0. By ergodicity then µ(B) = 0 or µ(B^c) = 0. Since A ⊆ B^c, we get the result.

Remark 2.2.1. Note that the non-singularity of T implies that if µ(X \ ⋃_{n≥0} T^{-n}A) = 0, then

µ(X \ ⋃_{n≥k} T^{-n}A) = µ(T^{-k}(X \ ⋃_{n≥0} T^{-n}A)) = 0

for each k ≥ 0 and hence,

0 ≤ µ(X \ ⋂_{k≥0} ⋃_{n≥k} T^{-n}A) = µ(⋃_{k≥0} (X \ ⋃_{n≥k} T^{-n}A)) ≤ Σ_{k≥0} µ(X \ ⋃_{n≥k} T^{-n}A) = 0.

So µ-a.e. x ∈ X visits a set A eventually under iterations of T if and only if µ-a.e. x ∈ X visits A infinitely often under T. Proposition 2.2.2 can be strengthened.


Proposition 2.2.3. A transformation T : X → X on a measure space (X, F, µ) is conservative and ergodic if and only if for each f ∈ L^1(X, F, µ) with f ≥ 0 and ∫_X f dµ > 0 it holds that Σ_{n≥0} f(T^n x) = ∞ for µ-almost all x ∈ X.

Proof. Assume first that T is conservative and ergodic and let f ∈ L^1(X, F, µ) with f ≥ 0 and ∫_X f dµ > 0 be given. Then there must be an ε > 0 such that µ({x ∈ X : f(x) > ε}) > 0. Set A = {x ∈ X : f(x) > ε}. From Proposition 2.2.2 we obtain as in Remark 2.2.1 that

µ(X \ ⋂_{n≥0} ⋃_{k≥n} T^{-k}A) = 0,

which is equivalent to

Σ_{n≥0} 1_A(T^n x) = ∞  µ-a.e.

By definition of A this implies the statement. For the other direction, first let W ∈ F be a wandering set for T with µ(W) > 0. Then ∫_X 1_W dµ > 0, but Σ_{n≥0} 1_W(T^n x) ≤ 1 for µ-a.e. x ∈ X, contradicting the assumption made. Hence µ(W) = 0 and T is conservative. Now, let A ∈ F be a T-invariant set with µ(A) > 0. Then by assumption

Σ_{n≥0} 1_A(x) = Σ_{n≥0} 1_{T^{-n}A}(x) = Σ_{n≥0} 1_A(T^n x) = ∞  µ-a.e.,

so µ-a.e. x ∈ X is an element of A and thus µ(X \ A) = 0. Hence, T is ergodic.

The following theorem contains several equivalent characterizations of ergodicity in case T is conservative.

Theorem 2.2.1. Let (X, F, µ) be a measure space and T : X → X conservative. The following are equivalent.

(i) T is ergodic.

(ii) If A ∈ F with µ(A) > 0, then µ(X \ ⋃_{n≥1} T^{-n}A) = 0.

(iii) If A, B ∈ F with µ(A)µ(B) > 0, then there exists an n > 0 such that µ(T^{-n}A ∩ B) > 0.


Remark 2.2.2. (1) In case T is invertible, then in the above characterization one can replace T^{-n} by T^n.
(2) In words, (ii) says that if A is a set of positive measure, almost every x ∈ X will eventually (in fact infinitely often) visit A.
(3) Item (iii) says that a positive measure set of elements from B will eventually enter A.

Proof of Theorem 2.2.1. (i) ⇒ (ii) This follows from Proposition 2.2.2 and the non-singularity of T.
(ii) ⇒ (iii) Let A, B ∈ F be such that µ(A)µ(B) > 0. From µ(X \ ⋃_{n≥1} T^{-n}A) = 0 we can conclude that

0 < µ(B) = µ(B ∩ ⋃_{n≥1} T^{-n}A) = µ(⋃_{n≥1} (B ∩ T^{-n}A)).

Hence, there exists an n ≥ 1 such that µ(B ∩ T^{-n}A) > 0.
(iii) ⇒ (i) Let A ∈ F be such that µ(A∆T^{-1}A) = 0 and µ(A) > 0. If µ(A^c) > 0, then by (iii) there exists an n ≥ 1 such that µ(A^c ∩ T^{-n}A) > 0. On the other hand, by Exercise 2.2.1,

0 = µ(A∆T^{-n}A) = µ(A^c ∩ T^{-n}A) + µ(A ∩ T^{-n}A^c),

a contradiction. Hence, µ(A^c) = 0 and T is ergodic.

We can obtain several other characterizations of ergodicity using the Koopman operator. Recall its definition from (1.1).

Theorem 2.2.2. Let (X, F, µ) be a measure space, and T : X → X a transformation. The following are equivalent.

(i) T is ergodic.

(ii) If f ∈ L^0(X, F, µ) with U_T f = f µ-a.e., then f is a constant µ-a.e.

In addition, if (X, F, µ) is a probability space, then (i) and (ii) are equivalent to

(iii) If f ∈ L^2(X, F, µ) with U_T f = f µ-a.e., then f is a constant µ-a.e.

Proof. (i) ⇒ (ii) Suppose f ∈ L^0(X, F, µ) satisfies U_T f = f µ-a.e. and assume without any loss of generality that f is real (otherwise we consider separately the real and imaginary parts of f). For each a ∈ R,


let B_a = {x ∈ X : f(x) ≤ a}. From f(Tx) = f(x) µ-a.e., it follows that µ(B_a ∆ T^{-1}B_a) = 0 and by ergodicity then µ(B_a) = 0 or µ(B_a^c) = 0. Let a_0 = inf{a ∈ R : µ(B_a^c) = 0}, and note that {x ∈ X : f(x) = a_0} = B_{a_0} \ ⋃_{n=1}^∞ B_{a_0 - 1/n}. Since µ(B_{a_0}^c) = 0 and µ(B_{a_0 - 1/n}) = 0 for all n ≥ 1, we have µ({x ∈ X : f(x) = a_0}^c) = 0 and f is a constant µ-a.e.
(ii) ⇒ (i) Let A ∈ F with µ(A∆T^{-1}A) = 0 be given. Then 1_A ∈ L^0(X, F, µ) and 1_A = 1_{T^{-1}A} = U_T 1_A µ-a.e., so by assumption 1_A is a constant µ-a.e. and µ(A) = 0 or µ(A^c) = 0.

Finally, in case µ(X) = 1 just notice that for any A ∈ F the indicator function 1A is in L2 (X, F, µ) and the arguments giving (ii) ⇒ (i) also provide (iii) ⇒ (i). Since the implication (ii) ⇒ (iii) is immediate, this finishes the proof.

2.3 EXAMPLES OF ERGODIC TRANSFORMATIONS

Example 2.3.1 (Ergodicity of irrational rotations). For θ ∈ (0, 1), let Tθ : [0, 1) → [0, 1) be the map Tθ x = x + θ (mod 1) as in Example 1.3.2. We have seen that Tθ is measure preserving with respect to Lebesgue measure λ. When is Tθ ergodic? As an example, consider θ = 1/4. Then the set

A = [0, 1/8) ∪ [1/4, 3/8) ∪ [1/2, 5/8) ∪ [3/4, 7/8)

is Tθ-invariant, but λ(A) = 1/2. Hence, T_{1/4} is not ergodic.
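The invariance of this set is easy to verify numerically. A sketch (our code; the sample grid of 1000 points is an arbitrary choice):

```python
def T(x):
    """Rotation by theta = 1/4."""
    return (x + 0.25) % 1.0

def in_A(x):
    """A = [0,1/8) u [1/4,3/8) u [1/2,5/8) u [3/4,7/8)."""
    return any(lo <= x < lo + 0.125 for lo in (0.0, 0.25, 0.5, 0.75))

# A is T-invariant with lambda(A) = 1/2, witnessing that T_{1/4} is not ergodic:
# membership in A is preserved along every orbit.
for k in range(1000):
    x = k / 1000
    assert in_A(T(x)) == in_A(x)
```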

Exercise 2.3.1. Suppose θ = p/q with gcd(p, q) = 1. Prove that the orbit of any point x ∈ [0, 1) is periodic with period q, i.e., show that Tθ^q x = x and Tθ^i x ≠ x for all 1 ≤ i < q. Find a non-trivial Tθ-invariant set and conclude that Tθ is not ergodic if θ is rational.

Claim. Tθ is ergodic if and only if θ is irrational.

Proof of claim. Suppose θ is irrational, and let f ∈ L^2([0, 1), B, λ) be Tθ-invariant λ-a.e. Let A = {x ∈ [0, 1) : f(x) = f(Tθ x)}. Then λ(A) = 1. Since f ∈ L^2([0, 1), B, λ), we can write f in its Fourier series¹:

f(x) = Σ_{n∈Z} a_n e^{2πinx}.   (2.2)

It was proven by L. Carleson in [11] that for L2 -functions on [0, 1] the convergence of the Fourier series from (2.2) holds almost everywhere.


For x ∈ A we get from f(Tθ x) = f(x) that

f(Tθ x) = Σ_{n∈Z} a_n e^{2πin(x+θ)} = Σ_{n∈Z} a_n e^{2πinθ} e^{2πinx} = f(x) = Σ_{n∈Z} a_n e^{2πinx}.

Hence, Σ_{n∈Z} a_n(1 − e^{2πinθ}) e^{2πinx} = 0. By the uniqueness of the Fourier coefficients, we have a_n(1 − e^{2πinθ}) = 0 for all n ∈ Z. For n ≠ 0 the irrationality of θ implies that 1 − e^{2πinθ} ≠ 0. Thus, a_n = 0 for all n ≠ 0, and therefore f(x) = a_0 for all x ∈ A. By Theorem 2.2.2, Tθ is ergodic. Together with Exercise 2.3.1 this proves the claim.
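Both ingredients of the argument can be sampled numerically. The sketch below (our code; the angle is an arbitrary irrational) checks that the factor 1 − e^{2πiθ} does not vanish and that orbit averages of f(x) = e^{2πix} tend to its integral 0, as ergodicity suggests:

```python
import cmath
import math

theta = math.sqrt(5) / 2 - 0.5      # an irrational angle (illustrative choice)

# the nonvanishing factor from the proof, for n = 1
assert abs(1 - cmath.exp(2j * math.pi * theta)) > 0.1

# orbit averages of f(x) = e^{2 pi i x} under the rotation approach 0 = integral of f
N = 100_000
avg = sum(cmath.exp(2j * math.pi * ((k * theta) % 1.0)) for k in range(N)) / N
assert abs(avg) < 0.01
```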

Exercise 2.3.2. Consider the probability space ([0, 1)^2, B, λ), where B is the Lebesgue σ-algebra on [0, 1)^2, and λ the two-dimensional Lebesgue measure. Suppose θ ∈ (0, 1) is irrational, and define Tθ × Tθ : [0, 1)^2 → [0, 1)^2 by

Tθ × Tθ(x, y) = (x + θ (mod 1), y + θ (mod 1)).

Show that Tθ × Tθ is measure preserving, but is not ergodic.

Exercise 2.3.3. Consider the probability space ([0, 1]^2, B, λ), where B is the two-dimensional Lebesgue σ-algebra and λ is the two-dimensional Lebesgue measure. Prove that the transformation S : [0, 1]^2 → [0, 1]^2 given by

S(x, y) = (x + θ (mod 1), x + y (mod 1))

with θ ∈ (0, 1) irrational is measure preserving and ergodic with respect to λ.
(Hint: the Fourier series Σ_{n,m} c_{n,m} e^{2πi(nx+my)} of a function f ∈ L^2([0, 1]^2, B, λ) satisfies Σ_{n,m} |c_{n,m}|^2 < ∞.)

Example 2.3.2 (Ergodicity of one- (or two-)sided shift). Let (X, C, µ, T) be the one-sided Bernoulli shift from Example 1.3.9 with positive probability vector p = (p_0, p_1, . . . , p_{k-1}). To show that the left shift T is ergodic, let E be a measurable subset of X which is T-invariant, i.e., T^{-1}E = E. For any ε > 0 there exists an A ∈ C which is a finite disjoint union of cylinders and satisfies µ(E∆A) < ε, see Lemma 12.2.1. Then

|µ(E) − µ(A)| = |µ(E \ A) − µ(A \ E)| ≤ µ(E \ A) + µ(A \ E) = µ(E∆A) < ε.


A depends on finitely many coordinates only, so there exists an n_0 > 0 such that T^{-n_0}A depends on different coordinates than A. Since µ is a product measure, we have µ(A ∩ T^{-n_0}A) = µ(A)µ(T^{-n_0}A) = µ(A)^2. Further, µ(E∆T^{-n_0}A) = µ(T^{-n_0}E∆T^{-n_0}A) = µ(E∆A) < ε, and thus

|µ(E) − µ(A ∩ T^{-n_0}A)| ≤ µ(E∆(A ∩ T^{-n_0}A)) ≤ µ(E∆A) + µ(E∆T^{-n_0}A) < 2ε.

Hence,

|µ(E) − µ(E)^2| ≤ |µ(E) − µ(A)^2| + |µ(A)^2 − µ(E)^2| = |µ(E) − µ(A ∩ T^{-n_0}A)| + (µ(A) + µ(E))|µ(A) − µ(E)| < 4ε.

Since ε > 0 is arbitrary, it follows that µ(E) = µ(E)^2, hence µ(E) = 0 or 1. Therefore, T is ergodic.

The following lemma provides, in some cases, a useful tool to verify that a measure preserving transformation defined on ([0, 1), B, µ) is ergodic. Here B is the Lebesgue σ-algebra, and µ is a probability measure equivalent to Lebesgue measure λ (i.e., µ(A) = 0 if and only if λ(A) = 0). It was proven by K. Knopp in [31] from 1926.

Lemma 2.3.1 (Knopp’s Lemma). If E ∈ B is a Lebesgue set and S is a class of subintervals of [0, 1) satisfying

(a) every open subinterval of [0, 1) is at most a countable union of disjoint elements from S,
(b) for all A ∈ S, λ(A ∩ E) ≥ γλ(A), where γ > 0 is independent of A,

then λ(E) = 1.

Proof. The proof is done by contradiction. Suppose λ(E^c) > 0. Given ε > 0 there exists by Lemma 12.2.1 a set A_ε that is a finite disjoint union of open intervals such that λ(E^c ∆ A_ε) < ε. Now by conditions (a) and (b) (that is, writing A_ε as a countable union of disjoint elements of S) one gets that λ(E ∩ A_ε) ≥ γλ(A_ε). Also, from our choice of A_ε and the fact that

λ(E^c ∆ A_ε) ≥ λ(E ∩ A_ε) ≥ γλ(A_ε) ≥ γλ(E^c ∩ A_ε) > γ(λ(E^c) − ε),

we obtain γ(λ(E^c) − ε) < λ(E^c ∆ A_ε) < ε, implying that γλ(E^c) < ε + γε. Since ε > 0 is arbitrary, we get a contradiction.

Example 2.3.3 (Ergodicity of the doubling map). Let T : [0, 1) → [0, 1) be the doubling map from Example 1.3.3 given by

Tx = 2x (mod 1) = 2x, if 0 ≤ x < 1/2,  and  2x − 1, if 1/2 ≤ x < 1.

We have already seen that T is measure preserving and we will use Lemma 2.3.1 to show that T is ergodic. Let

S = {[k/2^n, (k+1)/2^n) : n ≥ 1 and 0 ≤ k ≤ 2^n − 1}

be the collection of dyadic intervals. Notice that the set {k/2^n : n ≥ 1, 0 ≤ k ≤ 2^n − 1} of dyadic rationals is dense in [0, 1), hence each open interval is an at most countable union of disjoint elements of S. So, S satisfies the first condition of Knopp’s Lemma. Now, T^n maps each dyadic interval of the form [k/2^n, (k+1)/2^n) linearly onto [0, 1) (we call such an interval dyadic of order n); in fact, T^n x = 2^n x (mod 1). Let E be a T-invariant Lebesgue set, and assume λ(E) > 0. Let A ∈ S, and assume that A is dyadic of order n. Then T^n A = [0, 1) and

λ(A ∩ E) = λ(A ∩ T^{-n}E) = (1/2^n) λ(E) = λ(A)λ(E).

Thus, the second condition of Knopp’s Lemma is satisfied with γ = λ(E) > 0. Hence, λ(E) = 1 and T is ergodic.

Example 2.3.4 (Ergodicity of the Lüroth transformation). Consider the map T from Example 1.3.5. In Exercise 1.3.2 we saw that T is measure preserving with respect to Lebesgue measure λ. We now show that T is ergodic with respect to Lebesgue measure λ using Knopp’s Lemma and essentially the same approach as for the doubling map from


the previous example. We first define the collection S. Recall the definition of the digits a_i ≥ 2 in Example 1.3.5. For each i ≥ 2 we set

∆(i) = [1/i, 1/(i−1)) = {x ∈ [0, 1) : a_1(x) = i}

and for n ≥ 1 we define the fundamental intervals of rank n to be all intervals of the form

∆(i_1, i_2, . . . , i_n) = ∆(i_1) ∩ T^{-1}∆(i_2) ∩ ··· ∩ T^{-(n-1)}∆(i_n) = {x : a_1(x) = i_1, a_2(x) = i_2, . . . , a_n(x) = i_n}.

We choose as S the collection of all fundamental intervals of all ranks. Notice that ∆(i_1, i_2, . . . , i_n) is an interval with endpoints

P_n/Q_n  and  P_n/Q_n + 1/(i_1(i_1 − 1) ··· i_n(i_n − 1)),

where

P_n/Q_n = 1/i_1 + 1/(i_1(i_1 − 1)i_2) + ··· + 1/(i_1(i_1 − 1) ··· i_{n-1}(i_{n-1} − 1)i_n).

Furthermore, T^n(∆(i_1, i_2, . . . , i_n)) = [0, 1), and T^n restricted to ∆(i_1, i_2, . . . , i_n) has slope

i_1(i_1 − 1) ··· i_n(i_n − 1) = 1/λ(∆(i_1, i_2, . . . , i_n)).

Since lim_{n→∞} λ(∆(i_1, i_2, . . . , i_n)) = 0 for any choice of digits i_1, i_2, . . ., the collection S generates the Lebesgue σ-algebra. Now let E be a T-invariant Lebesgue set of positive Lebesgue measure, and let A be any fundamental interval of rank n. Then λ(A ∩ E) = λ(A ∩ T^{-n}E) = λ(E)λ(A). By Knopp’s Lemma with γ = λ(E) we get that λ(E) = 1, i.e., T is ergodic with respect to λ.

Remark 2.3.1. The notion of fundamental intervals introduced in the previous example is not specific to the Lüroth map. In fact, they are just the maximal intervals on which the iterates T^n are monotone, and one can define them similarly for any piecewise smooth, monotone interval map. For the doubling map these correspond to the dyadic intervals [k/2^n, (k+1)/2^n). In some sense one can think of these intervals as analogues for interval maps of the cylinder sets for symbolic systems.
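One payoff of the ergodicity of the doubling map (via the Pointwise Ergodic Theorem of Chapter 3) is that λ-a.e. x visits a dyadic interval with limiting frequency equal to its length. The orbit point T^i x lies in [3/8, 1/2) exactly when binary digits i+1, i+2, i+3 of x are 0, 1, 1, and a λ-random point has i.i.d. fair binary digits, so we can sample the digits directly. An illustrative sketch only (sample size and seed are arbitrary):

```python
import random

random.seed(1)
digits = [random.randint(0, 1) for _ in range(300_000)]   # binary digits of a "random" x
# count visits of the orbit to the dyadic interval [3/8, 1/2), i.e. the pattern 0,1,1
hits = sum(1 for i in range(len(digits) - 3) if digits[i:i + 3] == [0, 1, 1])
freq = hits / (len(digits) - 3)
assert abs(freq - 1 / 8) < 0.005   # frequency close to lambda([3/8, 1/2)) = 1/8
```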


Exercise 2.3.4. Let λ be the Lebesgue measure on ([0, 1), B), where B is the Lebesgue σ-algebra. Consider the transformation T : [0, 1) → [0, 1) given by

Tx = 3x, if 0 ≤ x < 1/3,  and  Tx = (3/2)x − 1/2, if 1/3 ≤ x < 1.

For x ∈ [0, 1) let

s_1(x) = 3, if 0 ≤ x < 1/3, and 3/2, if 1/3 ≤ x < 1;   h_1(x) = 0, if 0 ≤ x < 1/3, and 1/2, if 1/3 ≤ x < 1.

Set s_n = s_n(x) = s_1(T^{n-1}x) and h_n = h_n(x) = h_1(T^{n-1}x).

(a) Show that for any x ∈ [0, 1) one has x = Σ_{k=1}^∞ h_k/(s_1 s_2 ··· s_k).

(b) Show that T is measure preserving and ergodic with respect to the measure λ.

Let

a_1(x) = 0, if 0 ≤ x < 1/3, and 1, if 1/3 ≤ x < 1,

and set a_n = a_n(x) = a_1(T^{n-1}x) for n ≥ 1.

(c) Show that for each n ≥ 1 and any sequence i_1, i_2, . . . , i_n ∈ {0, 1} one has

λ({x ∈ [0, 1) : a_1(x) = i_1, a_2(x) = i_2, . . . , a_n(x) = i_n}) = 2^k/3^n,

where k = #{1 ≤ j ≤ n : i_j = 1}.
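The series in part (a) can be checked numerically before proving it. A sketch (our helper names `step` and `partial_sum`, not the book's):

```python
def step(x):
    """One step of the map from Exercise 2.3.4, returning (Tx, s_1(x), h_1(x))."""
    if x < 1 / 3:
        return 3 * x, 3.0, 0.0
    return 1.5 * x - 0.5, 1.5, 0.5

def partial_sum(x, n):
    """First n terms of the expansion x = sum_k h_k / (s_1 s_2 ... s_k)."""
    total, denom = 0.0, 1.0
    for _ in range(n):
        x, s, h = step(x)
        denom *= s
        total += h / denom
    return total

# The tail T^n x / (s_1 ... s_n) is at most (2/3)^n, so the series converges to x.
for x0 in (0.1, 0.5, 0.9):
    assert abs(partial_sum(x0, 60) - x0) < 1e-9
```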

Exercise 2.3.5. Consider for β = (1 + √5)/2, the golden mean, the β-transformation Tβ : [0, 1) → [0, 1) given by Tβ x = βx (mod 1) from Example 1.3.6. Use Lemma 2.3.1 to show that Tβ is ergodic with respect to Lebesgue measure λ and thus also with respect to the invariant measure µ given in Example 1.3.6.

CHAPTER 3
The Pointwise Ergodic Theorem and Mixing

In this chapter we let (X, F, µ) be a probability space. If T : X → X is measure preserving and f ∈ L^1(X, F, µ), then the sequence f, f ◦ T, f ◦ T^2, . . ., when considered as random variables, is identically distributed and E_µ(|f|) = E_µ(|f ◦ T^n|) = ∫_X |f ◦ T^n| dµ < ∞. The Strong Law of Large Numbers in probability theory states that for a sequence Y_1, Y_2, . . . of i.i.d. random variables on a probability space (Ω, G, P) with E_P(|Y_i|) < ∞ one has

lim_{n→∞} (1/n) Σ_{i=1}^n Y_i = E_P(Y_1)  P-a.e.

Since the sequence f, f ◦ T, f ◦ T^2, . . . satisfies two out of three conditions of the Strong Law of Large Numbers (only independence is missing), one could wonder about the asymptotic behavior of the sequence of averages ((1/n) Σ_{i=0}^{n-1} f ◦ T^i). When does it converge? In this chapter we will see one result where independence is replaced by the weaker condition of ergodicity and pointwise convergence of (1/n) Σ_{i=0}^{n-1} f ◦ T^i to ∫_X f dµ is guaranteed. This is the Pointwise Ergodic Theorem, first proved in 1931 by G. D. Birkhoff, see [5]. The next chapter contains two other ergodic theorems with different conditions on the system and different modes of convergence.


3.1 THE POINTWISE ERGODIC THEOREM

Since 1931 several proofs of the Pointwise Ergodic Theorem have been obtained; here we present a more recent proof given by T. Kamae and M. S. Keane in [28].

Theorem 3.1.1 (Pointwise Ergodic Theorem). Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Then, for any f ∈ L^1(X, F, µ),

lim_{n→∞} (1/n) Σ_{i=0}^{n-1} f(T^i x) = f*(x)

exists µ-a.e., is T-invariant and ∫_X f dµ = ∫_X f* dµ. If moreover T is ergodic, then f* is a constant µ-a.e. and f* = ∫_X f dµ.

For the proof of the above theorem, we need the following simple lemma.

Lemma 3.1.1. Let M > 0 be an integer, and suppose (a_n)_{n≥0}, (b_n)_{n≥0} are sequences of non-negative real numbers such that for each n ≥ 0 there exists an integer 1 ≤ m ≤ M with

a_n + ··· + a_{n+m-1} ≥ b_n + ··· + b_{n+m-1}.

Then, for each positive integer N > M, one has

a_0 + ··· + a_{N-1} ≥ b_0 + ··· + b_{N-M-1}.

Proof of Lemma 3.1.1. Using the hypothesis we recursively find integers m_0 < m_1 < ··· < m_k < N with the following properties:

m_0 ≤ M,  m_{i+1} − m_i ≤ M for i = 0, . . . , k − 1,  and  N − m_k < M,
a_0 + ··· + a_{m_0-1} ≥ b_0 + ··· + b_{m_0-1},
a_{m_0} + ··· + a_{m_1-1} ≥ b_{m_0} + ··· + b_{m_1-1},
...
a_{m_{k-1}} + ··· + a_{m_k-1} ≥ b_{m_{k-1}} + ··· + b_{m_k-1}.

Then,

a_0 + ··· + a_{N-1} ≥ a_0 + ··· + a_{m_k-1} ≥ b_0 + ··· + b_{m_k-1} ≥ b_0 + ··· + b_{N-M-1}.
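The lemma is purely combinatorial and can be verified on concrete sequences. A sketch (our helper `check_lemma` and the sample sequences are illustrative, not from the book):

```python
def check_lemma(a, b, M):
    """Check the hypothesis of Lemma 3.1.1 for concrete sequences and return
    whether the conclusion a_0+...+a_{N-1} >= b_0+...+b_{N-M-1} holds."""
    N = len(a)
    for n in range(N):
        # hypothesis: some window of length at most M starting at n works
        assert any(sum(a[n:n + m]) >= sum(b[n:n + m]) for m in range(1, M + 1))
    return sum(a) >= sum(b[:N - M])

a = [0, 3, 0, 0, 4, 0, 2, 1, 0, 5]
b = [1] * 10
assert check_lemma(a, b, 3)    # the conclusion holds, as the lemma guarantees
```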

The Pointwise Ergodic Theorem and Mixing  35

Proof of Theorem 3.1.1. Assume with no loss of generality that f ≥ 0 (otherwise we write f = f⁺ − f⁻, and we consider each part separately). Write f_n(x) = f(x) + · · · + f(T^{n−1}x), and set
\[ \bar f(x) = \limsup_{n\to\infty} \frac{f_n(x)}{n}, \qquad \underline f(x) = \liminf_{n\to\infty} \frac{f_n(x)}{n}. \]
Then both $\bar f$ and $\underline f$ are T-invariant, since
\[ \bar f(Tx) = \limsup_{n\to\infty} \frac{f_n(Tx)}{n} = \limsup_{n\to\infty} \left( \frac{f_{n+1}(x)}{n+1}\cdot\frac{n+1}{n} - \frac{f(x)}{n} \right) = \limsup_{n\to\infty} \frac{f_{n+1}(x)}{n+1} = \bar f(x), \]
and similarly for $\underline f$. Now, to prove that f* exists, is integrable and T-invariant, it is enough to show that
\[ \int_X \underline f\,d\mu \ge \int_X f\,d\mu \ge \int_X \bar f\,d\mu. \]
For since $\bar f - \underline f \ge 0$, this would imply that $\bar f = \underline f =: f^*$ µ-a.e.

We first prove that $\int_X \bar f\,d\mu \le \int_X f\,d\mu$. Fix any 0 < ε < 1, and let L > 0 be any real number. By definition of $\bar f$, for any x ∈ X there exists an integer m > 0 such that
\[ \frac{f_m(x)}{m} \ge \min(\bar f(x), L)(1-\varepsilon). \]
For n ≥ 1, let Y_n = {x ∈ X : ∃ 1 ≤ m ≤ n with f_m(x) ≥ m min($\bar f$(x), L)(1−ε)}. Note that (Y_n) is an increasing sequence with X = ∪_{n≥1} Y_n. Thus, for any δ > 0 there exists an integer M > 0 such that the set Y_M = {x ∈ X : ∃ 1 ≤ m ≤ M with f_m(x) ≥ m min($\bar f$(x), L)(1−ε)} has measure at least 1 − δ. Set X_0 = Y_M and define F on X by
\[ F(x) = \begin{cases} f(x), & \text{if } x \in X_0, \\ L, & \text{if } x \notin X_0. \end{cases} \]
Notice that f ≤ F (why?). For any x ∈ X and n ≥ 0, let a_n = a_n(x) = F(T^n x), and b_n = b_n(x) = min($\bar f$(x), L)(1−ε) (so b_n is independent of n). We now show that (a_n) and (b_n) satisfy the hypothesis of Lemma 3.1.1 with M > 0 as above. For any n ≥ 0 there are two cases:

• If T^n x ∈ X_0, then there exists an 1 ≤ m ≤ M such that
\[ f_m(T^n x) \ge m\min(\bar f(T^n x), L)(1-\varepsilon) = m\min(\bar f(x), L)(1-\varepsilon) = b_n + \cdots + b_{n+m-1}, \]
using the T-invariance of $\bar f$. Hence,
\[ a_n + \cdots + a_{n+m-1} = F(T^n x) + \cdots + F(T^{n+m-1}x) \ge f(T^n x) + \cdots + f(T^{n+m-1}x) = f_m(T^n x) \ge b_n + \cdots + b_{n+m-1}. \]

• If T^n x ∉ X_0, then take m = 1 since
\[ a_n = F(T^n x) = L \ge \min(\bar f(x), L)(1-\varepsilon) = b_n. \]

Hence by Lemma 3.1.1, for all integers N > M one has
\[ F(x) + \cdots + F(T^{N-1}x) \ge (N-M)\min(\bar f(x), L)(1-\varepsilon). \]
Integrating both sides, and using the fact that T is measure preserving, one gets by Proposition 1.2.2 that
\[ N\int_X F\,d\mu \ge (N-M)\int_X \min(\bar f(x), L)(1-\varepsilon)\,d\mu(x). \]
Since
\[ \int_X F\,d\mu = \int_{X_0} f\,d\mu + L\,\mu(X\setminus X_0), \]
one has
\[ \int_X f\,d\mu \ge \int_{X_0} f\,d\mu = \int_X F\,d\mu - L\,\mu(X\setminus X_0) \ge \frac{N-M}{N}\int_X \min(\bar f(x), L)(1-\varepsilon)\,d\mu(x) - L\delta. \]
Now letting first N → ∞, then δ → 0, then ε → 0, and lastly L → ∞, one gets together with the Monotone Convergence Theorem that $\bar f$ is integrable, and
\[ \int_X f\,d\mu \ge \int_X \bar f\,d\mu. \]

We now prove that $\int_X f\,d\mu \le \int_X \underline f\,d\mu$. Fix ε > 0 and δ₀ > 0. Since f ≥ 0, there exists a δ > 0 such that whenever A ∈ F with µ(A) < δ, then ∫_A f dµ < δ₀. Note that for any x ∈ X there exists an integer m such that
\[ \frac{f_m(x)}{m} \le \underline f(x) + \varepsilon. \]
Now choose M > 0 such that the set Y_0 = {x ∈ X : ∃ 1 ≤ m ≤ M with f_m(x) ≤ m($\underline f$(x) + ε)} has measure at least 1 − δ. Define G on X by
\[ G(x) = \begin{cases} f(x), & \text{if } x \in Y_0, \\ 0, & \text{if } x \notin Y_0. \end{cases} \]
Then G ≤ f. Let b_n = G(T^n x), and a_n = $\underline f$(x) + ε (so a_n is independent of n). One can easily check that the sequences (a_n) and (b_n) satisfy the hypothesis of Lemma 3.1.1 with M > 0 as above. Hence for any N > M, one has
\[ G(x) + \cdots + G(T^{N-M-1}x) \le N(\underline f(x) + \varepsilon). \]
Integrating both sides yields
\[ (N-M)\int_X G(x)\,d\mu(x) \le N\Big( \int_X \underline f(x)\,d\mu(x) + \varepsilon \Big). \]
Since µ(X \ Y_0) < δ, then $\int_{X\setminus Y_0} f\,d\mu < \delta_0$. Hence,
\[ \int_X f\,d\mu = \int_{Y_0} G\,d\mu + \int_{X\setminus Y_0} f\,d\mu \le \frac{N}{N-M}\int_X (\underline f(x) + \varepsilon)\,d\mu(x) + \delta_0. \]
Now, by letting first N → ∞, then δ → 0, δ₀ → 0, and finally ε → 0, one gets
\[ \int_X f\,d\mu \le \int_X \underline f\,d\mu. \]
This shows that
\[ \int_X \underline f\,d\mu \ge \int_X f\,d\mu \ge \int_X \bar f\,d\mu, \]
hence $\bar f = \underline f = f^*$ µ-a.e., and f* is T-invariant. In case T is ergodic, then by Theorem 2.2.2 the T-invariance of f* implies that f* is a constant µ-a.e. Therefore, for µ-a.e. x ∈ X,
\[ f^*(x) = \int_X f^*\,d\mu = \int_X f\,d\mu. \]

Remark 3.1.1. (i) We can say a bit more about the limit f* in case T is not ergodic. Let I be the sub-σ-algebra of F consisting of all T-invariant subsets A ∈ F. Notice that if f ∈ L1(X, F, µ), then the conditional expectation of f given I (denoted by Eµ(f|I)) is the unique (up to µ-a.e. equality) I-measurable L1(X, F, µ)-function with the property that
\[ \int_A f\,d\mu = \int_A E_\mu(f\,|\,\mathcal I)\,d\mu \]
for all A ∈ I, i.e., for all A ∈ F with T⁻¹A = A. We claim that f* = Eµ(f|I). Since the limit function f* is T-invariant, it follows that f* is I-measurable. Furthermore, for any A ∈ I, by the Pointwise Ergodic Theorem and the T-invariance of 1_A,
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} (f 1_A)(T^i x) = 1_A(x)\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = 1_A(x)\,f^*(x) \quad \mu\text{-a.e.} \]
and
\[ \int_X f 1_A\,d\mu = \int_X f^* 1_A\,d\mu. \]
This shows that f* = Eµ(f|I).

(ii) Suppose T is ergodic and measure preserving with respect to a probability measure µ, and let ν be a probability measure equivalent to µ (i.e., µ and ν have the same sets of measure zero). Then for every f ∈ L1(X, F, µ) one has ν-a.e. that
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} f(T^i x) = \int_X f\,d\mu. \]
This observation is useful in Exercise 3.1.2 below.


Exercise 3.1.1. Show that if T is measure preserving on the probability space (X, F, µ) and f ∈ L1(X, F, µ), then
\[ \lim_{n\to\infty} \frac{f(T^n x)}{n} = 0 \]
for µ-a.e. x ∈ X.

Exercise 3.1.2. For β = (1+√5)/2, the golden mean, consider the β-transformation T_β : [0, 1) → [0, 1), given by T_β x = βx (mod 1) = βx − ⌊βx⌋, as in Example 1.3.6. Define b_n on [0, 1) by
\[ b_1(x) = \begin{cases} 0, & \text{if } 0 \le x < \frac{1}{\beta}, \\ 1, & \text{if } \frac{1}{\beta} \le x < 1, \end{cases} \]
and b_n(x) = b_1(T_β^{n−1} x) for n ≥ 1. Fix k ≥ 0. Consider the limit
\[ \lim_{n\to\infty} \frac{1}{n}\,\#\{1 \le i \le n : b_i = 0,\ b_{i+1} = 0,\ \ldots,\ b_{i+k} = 0\} \]

and find its a.e. value (with respect to Lebesgue measure).

Exercise 3.1.3. Let (X, F, µ) be a probability space and f ∈ L1(X, F, µ). Suppose {T_t : t ∈ R} is a family of transformations T_t : X → X satisfying
(i) T_0 = id_X and T_{t+s} = T_t ◦ T_s,
(ii) each T_t is measurable, measure preserving and ergodic with respect to µ,
(iii) the map G : X × R → R given by G(x, t) = f(T_t(x)) is measurable, where X × R is endowed with the product σ-algebra F ⊗ B and product measure µ × λ, with B the Borel σ-algebra on R and λ the Lebesgue measure.
(a) Show that for all s ≥ 0,
\[ \int_{[0,s]} \int_X f(T_t(x))\,d\mu(x)\,d\lambda(t) = \int_X \int_{[0,s]} f(T_t(x))\,d\lambda(t)\,d\mu(x) = s\int_X f\,d\mu. \]
(b) Show that for all s ≥ 0, ∫_{[0,s]} f(T_t(x)) dλ(t) < ∞ µ-a.e.
(c) Define F : X → R by F(x) = ∫_{[0,1]} f(T_t(x)) dλ(t), and consider the transformation T_1 corresponding to t = 1. Show that for any n ≥ 1 one has
\[ \sum_{k=0}^{n-1} F(T_1^k(x)) = \int_{[0,n]} f(T_t(x))\,d\lambda(t), \]
and ∫_X F dµ = ∫_X f dµ.
(d) Show that
\[ \lim_{n\to\infty} \frac{1}{n}\int_{[0,n]} f(T_t(x))\,d\lambda(t) = \int_X f\,d\mu \]

holds for µ-a.e. x.

Using the Pointwise Ergodic Theorem, one can give yet another characterization of ergodicity.

Corollary 3.1.1. Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Then T is ergodic if and only if for all A, B ∈ F one has
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) = \mu(A)\mu(B). \tag{3.1} \]

Proof. Suppose T is ergodic, and let A, B ∈ F. Since the indicator function 1_A ∈ L1(X, F, µ), by the Pointwise Ergodic Theorem one has
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_A(T^i x) = \int_X 1_A\,d\mu = \mu(A) \quad \mu\text{-a.e.} \]
Then,
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A \cap B}(x) = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A}(x)\,1_B(x) = 1_B(x)\lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_A(T^i x) = 1_B(x)\,\mu(A) \quad \mu\text{-a.e.} \]
Since for each n the function (1/n) Σ_{i=0}^{n−1} 1_{T^{-i}A∩B} is dominated by the constant function 1, it follows by the Dominated Convergence Theorem that
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) = \int_X \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{T^{-i}A \cap B}(x)\,d\mu(x) = \int_X 1_B\,\mu(A)\,d\mu = \mu(A)\mu(B). \]


Conversely, suppose (3.1) holds for every A, B ∈ F. Let E ∈ F be such that T⁻¹E = E. By invariance of E, we have µ(T^{-i}E ∩ E) = µ(E) for each i, so
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}E \cap E) = \mu(E). \]
On the other hand, by (3.1),
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}E \cap E) = \mu(E)^2. \]
Hence µ(E) = µ(E)², which implies µ(E) = 0 or µ(E) = 1. Therefore, T is ergodic.

As the next proposition shows, to establish ergodicity one only needs to verify equation (3.1) for sets A and B belonging to a generating semi-algebra.

Proposition 3.1.1. Let (X, F, µ) be a probability space, and S a generating semi-algebra of F. Let T : X → X be a measure preserving transformation. Then T is ergodic if and only if for all A, B ∈ S one has
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) = \mu(A)\mu(B). \tag{3.2} \]

Proof. Assume that (3.2) holds for all A, B ∈ S. Note that it then also holds for all elements in the algebra generated by S. We only need to show that (3.2) holds for all A, B ∈ F. Let ε > 0, and A, B ∈ F. Then, by Lemma 12.2.1 there exist sets A₀, B₀, each of which is a finite disjoint union of elements of S, such that µ(A∆A₀) < ε and µ(B∆B₀) < ε. Since
\[ (T^{-i}A \cap B)\,\Delta\,(T^{-i}A_0 \cap B_0) \subseteq (T^{-i}A\,\Delta\,T^{-i}A_0) \cup (B\,\Delta\,B_0) \]
for any i ≥ 1, it follows that
\[ |\mu(T^{-i}A \cap B) - \mu(T^{-i}A_0 \cap B_0)| \le \mu\big((T^{-i}A \cap B)\,\Delta\,(T^{-i}A_0 \cap B_0)\big) \le \mu(T^{-i}A\,\Delta\,T^{-i}A_0) + \mu(B\,\Delta\,B_0) < 2\varepsilon \]


for any i ≥ 1. Further,
\[ |\mu(A)\mu(B) - \mu(A_0)\mu(B_0)| \le \mu(A)|\mu(B) - \mu(B_0)| + \mu(B_0)|\mu(A) - \mu(A_0)| \le \mu(B\,\Delta\,B_0) + \mu(A\,\Delta\,A_0) < 2\varepsilon. \]
Hence, for any n,
\[ \Big| \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) - \mu(A)\mu(B) \Big| - \Big| \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A_0 \cap B_0) - \mu(A_0)\mu(B_0) \Big| \le \frac{1}{n}\sum_{i=0}^{n-1} \big|\mu(T^{-i}A \cap B) - \mu(T^{-i}A_0 \cap B_0)\big| + \big|\mu(A)\mu(B) - \mu(A_0)\mu(B_0)\big| < 4\varepsilon. \]
Since (3.2) holds for A₀ and B₀, the second term on the left tends to 0 as n → ∞, and as ε > 0 was arbitrary we conclude that
\[ \lim_{n\to\infty} \Big| \frac{1}{n}\sum_{i=0}^{n-1} \mu(T^{-i}A \cap B) - \mu(A)\mu(B) \Big| = 0. \]

Theorem 3.1.2. Suppose µ₁ and µ₂ are probability measures on (X, F), and T : X → X is a transformation that is measure preserving with respect to both µ₁ and µ₂. Then,
(i) if T is ergodic with respect to µ₁, and µ₂ is absolutely continuous with respect to µ₁, then µ₁ = µ₂, and
(ii) if T is ergodic with respect to both µ₁ and µ₂, then either µ₁ = µ₂ or µ₁ and µ₂ are singular with respect to each other.

Proof. (i) Suppose T is ergodic with respect to µ₁ and µ₂ is absolutely continuous with respect to µ₁. For any A ∈ F, by the Pointwise Ergodic Theorem one has for µ₁-a.e. x
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_A(T^i x) = \mu_1(A). \]
Let
\[ C_A = \Big\{ x \in X : \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_A(T^i x) = \mu_1(A) \Big\}; \]
then µ₁(C_A) = 1, and by absolute continuity of µ₂ also µ₂(C_A) = 1. Since T is measure preserving with respect to µ₂, for each n ≥ 1 one has
\[ \frac{1}{n}\sum_{i=0}^{n-1} \int_X 1_A(T^i x)\,d\mu_2(x) = \mu_2(A). \]
On the other hand, by the Dominated Convergence Theorem one has
\[ \lim_{n\to\infty} \int_X \frac{1}{n}\sum_{i=0}^{n-1} 1_A(T^i x)\,d\mu_2(x) = \int_X \mu_1(A)\,d\mu_2. \]
This implies that µ₁(A) = µ₂(A). Since A ∈ F is arbitrary, we have µ₁ = µ₂.

(ii) Suppose T is ergodic with respect to µ₁ and µ₂. Assume that µ₁ ≠ µ₂. Then there exists a set A ∈ F such that µ₁(A) ≠ µ₂(A). For i = 1, 2, let
\[ C_i = \Big\{ x \in X : \lim_{n\to\infty} \frac{1}{n}\sum_{j=0}^{n-1} 1_A(T^j x) = \mu_i(A) \Big\}. \]
By the Pointwise Ergodic Theorem µ_i(C_i) = 1 for i = 1, 2. Since µ₁(A) ≠ µ₂(A), then C₁ ∩ C₂ = ∅. Thus µ₁ and µ₂ are supported on disjoint sets, and hence µ₁ and µ₂ are mutually singular.

Exercise 3.1.4. Imagine a monkey typing on a laptop with a keyboard that has only 26 keys, one key for each letter of the alphabet and no space bar, numbers or other keys. Suppose that the monkey hits one key every second, that he hits each key with equal probability (independently of the preceding letters) and that he goes on forever.
(a) Show how to model this using a shift space on 26 symbols with an appropriate Bernoulli measure.
(b) Use the Pointwise Ergodic Theorem to show that with probability 1 the monkey will eventually type the word “BANANA”.
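The heuristic behind Exercise 3.1.4(b) can be previewed with a simulation (ours, not a solution of the exercise): in an i.i.d. uniform stream over a 26-letter alphabet, a fixed block of length k occurs with limiting frequency 26^{-k}. A two-letter block keeps the counts large enough to observe.

```python
import random

# Frequency of a fixed block in an i.i.d. uniform letter stream.
# For "BANANA" the frequency 26**-6 is too small to see in a short
# simulation, so we use a block of length 2 instead.
def block_freq(word="AB", n=1_000_000, seed=0):
    rng = random.Random(seed)
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    stream = "".join(rng.choice(letters) for _ in range(n))
    count = sum(1 for i in range(n - len(word) + 1)
                if stream[i:i + len(word)] == word)
    return count / (n - len(word) + 1)

freq = block_freq()
print(freq, 26.0 ** -2)   # the two numbers should be close
```

Positive frequency in particular means infinitely many occurrences, which is the content of the exercise for the block “BANANA”.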

3.2  NORMAL NUMBERS

The Pointwise Ergodic Theorem can be used to compute the frequencies of digits in various number expansions. Recall the doubling map T2 : [0, 1) → [0, 1) defined by T2 x = 2x (mod 1) from Example 1.3.3 and its association to binary expansions of numbers in [0, 1).


Except for the dyadic rationals, each number x ∈ [0, 1) has a unique binary expansion, and this can be obtained from iterations of T₂ as in (1.2). For a number x the n-th binary digit b_n(x) equals 0 precisely if T₂^{n−1}x ∈ [0, 1/2). The frequency of the digit 0 in the sequence (b_n(x)) representing the binary expansion of x is therefore given by
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{[0,\frac12)}(T_2^i x). \]
T₂ is measure preserving and ergodic with respect to the Lebesgue measure. The Pointwise Ergodic Theorem then implies that this limit is equal to λ([0, 1/2)) = 1/2 for Lebesgue almost every x ∈ [0, 1). Obviously for those numbers x the frequency of the digit 1 in their binary expansion is also equal to 1/2.

Instead of single digits, we could also consider arbitrary blocks of digits a₁, a₂, ..., a_k ∈ {0, 1}. The block a₁a₂···a_k occurs at position n of the binary expansion of x precisely if
\[ T_2^{n-1}x \in \Big[ \frac{a_1}{2} + \frac{a_2}{2^2} + \cdots + \frac{a_k}{2^k},\ \frac{a_1}{2} + \frac{a_2}{2^2} + \cdots + \frac{a_k+1}{2^k} \Big). \]
By the Pointwise Ergodic Theorem the block a₁a₂···a_k occurs in the binary expansion of Lebesgue almost every x ∈ [0, 1) with frequency
\[ \lambda\Big( \Big[ \frac{a_1}{2} + \frac{a_2}{2^2} + \cdots + \frac{a_k}{2^k},\ \frac{a_1}{2} + \frac{a_2}{2^2} + \cdots + \frac{a_k+1}{2^k} \Big) \Big) = \frac{1}{2^k}. \]
So typically any block of digits of length k occurs with frequency 1/2^k in the binary expansion of x. A number with this property is called normal in base 2.

Now consider another integer N ≥ 2 and define the ×N transformation T_N : [0, 1) → [0, 1) by T_N x = Nx − ⌊Nx⌋. Iterations of T_N generate the base N expansion of points in the unit interval, much like how binary expansions were obtained from the doubling map. For x ∈ [0, 1) and i ≥ 1 the i-th base N digit is given by c_i = c_i(x) = k if T_N^{i−1}x ∈ [k/N, (k+1)/N). Then
\[ x = \sum_{i=1}^{\infty} \frac{c_i}{N^i}. \]
This representation is unique if x is not of the form k/N^i for some i ≥ 1 and 0 ≤ k ≤ N^i − 1.


Exercise 3.2.1. Prove that T_N is measure preserving and ergodic on the probability space ([0, 1), B, λ), where B is the Lebesgue σ-algebra and λ the Lebesgue measure.

From Exercise 3.2.1 we see that we can again use the Pointwise Ergodic Theorem to obtain the frequency of digits and blocks of digits in typical base N expansions. Take any a₁, ..., a_k ∈ {0, 1, ..., N − 1}. For Lebesgue almost all x ∈ [0, 1) the frequency of the block a₁a₂···a_k in the base N expansion of x is equal to
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{\left[\frac{a_1}{N} + \cdots + \frac{a_k}{N^k},\ \frac{a_1}{N} + \cdots + \frac{a_k+1}{N^k}\right)}(T_N^i x) = \int_{[0,1)} 1_{\left[\frac{a_1}{N} + \cdots + \frac{a_k}{N^k},\ \frac{a_1}{N} + \cdots + \frac{a_k+1}{N^k}\right)}\,d\lambda = \frac{1}{N^k}. \]
Hence, Lebesgue almost every x ∈ [0, 1) is normal in base N.

Numbers in [0, 1) that are normal in all integer bases N ≥ 2 are called (absolutely) normal. If we denote the exceptional set of numbers that are not normal in base N by E_N, then λ(E_N) = 0 for all N ≥ 2, and thus also λ(∪_{N≥2} E_N) = 0. So we see that Lebesgue almost every x ∈ [0, 1) is normal. This result was first obtained using the Borel-Cantelli Lemma by E. Borel in [6] from 1909. Unfortunately his proof was not constructive (nor is the proof above), and even though nowadays it is possible to construct a normal number up to any desired precision, it is still not known whether famous constants like π (or π − 3), e (or e − 2) and √2/2 are normal or not.

Exercise 3.2.2. For x ∈ [0, 1], let (a_n(x))_{n≥1} denote the finite or infinite sequence of digits of x as produced by the Lüroth map from Example 1.3.5.
(a) Let E denote the set of x ∈ [0, 1] for which the sequence (a_n(x))_{n≥1} contains only even integers. Prove that E has zero Lebesgue measure.
(b) Prove that for Lebesgue almost every x ∈ [0, 1] we have
\[ \lim_{n\to\infty} \frac{\#\{1 \le k \le n : a_k(x) \text{ is even}\}}{n} = \ln 2. \]
What is the frequency of the odd digits in the Lüroth expansion of Lebesgue almost every x ∈ [0, 1]?
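The block-frequency statement above can be checked numerically. Iterating T_N in binary floating point is treacherous (round-off destroys the orbit of the doubling map after about 53 steps), so the sketch below — ours, not the book's — instead realises a Lebesgue-random x through its i.i.d. uniform base-N digits, which is exactly the Bernoulli-shift model of T_N.

```python
import random

def block_frequency(N, block, n_digits=100_000, seed=1):
    """Estimate the frequency of `block` among the first n_digits base-N
    digits of a Lebesgue-random point of [0, 1), realised through
    i.i.d. uniform digits."""
    rng = random.Random(seed)
    digits = [rng.randrange(N) for _ in range(n_digits)]
    k = len(block)
    hits = sum(1 for i in range(n_digits - k + 1)
               if digits[i:i + k] == list(block))
    return hits / (n_digits - k + 1)

freq42 = block_frequency(10, (4, 2))   # the block "42" in base 10
print(freq42)                          # close to 1/N^k = 1/100
```

For a normal number every block of length k appears with frequency exactly N^{-k} in the limit; the simulation only exhibits the typical behavior.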


3.3  IRREDUCIBLE MARKOV CHAINS

Consider the Markov shift from Example 1.3.10, i.e., X = {0, 1, ..., N−1}^Z, C is the σ-algebra generated by the cylinders, T : X → X is the left shift, and µ the Markov measure defined by a stochastic N × N matrix P = (p_{ij}) and a positive probability vector π = (π₀, π₁, ..., π_{N−1}) satisfying πP = π through
\[ \mu(\{x : x_l = i_l,\ x_{l+1} = i_{l+1},\ \ldots,\ x_n = i_n\}) = \pi_{i_l}\, p_{i_l i_{l+1}}\, p_{i_{l+1} i_{l+2}} \cdots p_{i_{n-1} i_n}. \]
We want to find necessary and sufficient conditions for T to be ergodic. To achieve this, we first set
\[ Q = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} P^k, \]
where P^k = (p_{ij}^{(k)}) is the k-th power of the matrix P, and P⁰ is the N × N identity matrix. More precisely, Q = (q_{ij}), where
\[ q_{ij} = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)}. \]
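Before proving that the q_{ij} are well-defined, it can be instructive to compute Q numerically for a concrete matrix. The pure-Python sketch below (ours, not the book's) averages the powers P^k for an irreducible 2 × 2 example; the matrix and the value n = 2000 are arbitrary choices.

```python
# Numerical Cesaro average Q = lim (1/n) * sum_{k<n} P^k.
def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def cesaro_Q(P, n=2000):
    size = len(P)
    Pk = [[float(i == j) for j in range(size)] for i in range(size)]  # P^0
    S = [[0.0] * size for _ in range(size)]
    for _ in range(n):
        for i in range(size):
            for j in range(size):
                S[i][j] += Pk[i][j]
        Pk = mat_mul(Pk, P)
    return [[s / n for s in row] for row in S]

P = [[0.9, 0.1], [0.5, 0.5]]   # irreducible; stationary pi = (5/6, 1/6)
Q = cesaro_Q(P)
print(Q)   # both rows are approximately pi
```

That both rows of Q coincide with π for this irreducible P is exactly the behavior characterized by Theorem 3.3.1 below.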

Lemma 3.3.1. For each i, j ∈ {0, 1, ..., N−1}, the limit lim_{n→∞} (1/n) Σ_{k=0}^{n−1} p_{ij}^{(k)} exists, i.e., q_{ij} is well-defined.

Proof. For each n,
\[ \frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)} = \frac{1}{\pi_i}\cdot\frac{1}{n}\sum_{k=0}^{n-1} \mu(\{x \in X : x_0 = i,\ x_k = j\}). \]
Since T is measure preserving, by the Pointwise Ergodic Theorem it holds for µ-a.e. y ∈ X that
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_k = j\}}(y) = \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = j\}}(T^k y) = f^*(y), \]
where f* is T-invariant and integrable. Then,
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\ x_k = j\}}(y) = 1_{\{x : x_0 = i\}}(y)\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = j\}}(T^k y) = f^*(y)\,1_{\{x : x_0 = i\}}(y). \]
Since (1/n) Σ_{k=0}^{n−1} 1_{\{x : x_0 = i, x_k = j\}}(y) ≤ 1 for all n, by the Dominated Convergence Theorem,
\[
\begin{aligned}
q_{ij} &= \frac{1}{\pi_i}\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(\{x \in X : x_0 = i,\ x_k = j\}) = \frac{1}{\pi_i}\int_X \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\ x_k = j\}}(y)\,d\mu(y) \\
&= \frac{1}{\pi_i}\int_X f^*(y)\,1_{\{x : x_0 = i\}}(y)\,d\mu(y) = \frac{1}{\pi_i}\int_{\{x : x_0 = i\}} f^*(y)\,d\mu(y),
\end{aligned}
\]
which is finite since f* is integrable. Hence q_{ij} exists.

Exercise 3.3.1. Show that the matrix Q has the following properties: (a) Q is stochastic, (b) Q = QP = PQ = Q², (c) πQ = π.

We now give a characterization of the ergodicity of T. Recall that the matrix P is said to be irreducible if for every i, j ∈ {0, 1, ..., N−1} there exists an n ≥ 1 such that p_{ij}^{(n)} > 0.

Theorem 3.3.1. The following are equivalent.
(i) T is ergodic.
(ii) All rows of Q are identical.
(iii) q_{ij} > 0 for all i, j.
(iv) P is irreducible.
(v) 1 is a simple eigenvalue of P.

Proof. (i) ⇒ (ii) By the Pointwise Ergodic Theorem it holds for each i, j that
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} 1_{\{x : x_0 = i,\ x_k = j\}}(y) = 1_{\{x : x_0 = i\}}(y)\,\pi_j \quad \mu\text{-a.e.} \]
By the Dominated Convergence Theorem,
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(\{x \in X : x_0 = i,\ x_k = j\}) = \pi_i \pi_j. \]


Hence,
\[ q_{ij} = \frac{1}{\pi_i}\lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} \mu(\{x \in X : x_0 = i,\ x_k = j\}) = \pi_j, \]
i.e., q_{ij} is independent of i. Therefore, all rows of Q are identical.

(ii) ⇒ (iii) If all the rows of Q are identical, then all the columns of Q are constant. Thus, for each j there exists a constant c_j such that q_{ij} = c_j for all i. Since πQ = π, it follows that q_{ij} = c_j = π_j > 0 for all i, j.

(iii) ⇒ (iv) For any i, j,
\[ \lim_{n\to\infty} \frac{1}{n}\sum_{k=0}^{n-1} p_{ij}^{(k)} = q_{ij} > 0. \]
Hence, there exists an n such that p_{ij}^{(n)} > 0 and therefore P is irreducible.

(iv) ⇒ (iii) Suppose P is irreducible. For any state i ∈ {0, 1, ..., N−1}, let S_i = {j : q_{ij} > 0}. Since Q is a stochastic matrix, it follows that S_i ≠ ∅. Let l ∈ S_i, so that q_{il} > 0. Since Q = QP = QP^n for all n, then for any state j
\[ q_{ij} = \sum_{m=0}^{N-1} q_{im}\, p_{mj}^{(n)} \ge q_{il}\, p_{lj}^{(n)} \]
for any n. Since P is irreducible, there exists an n such that p_{lj}^{(n)} > 0. Hence, q_{ij} > 0 for all i, j.

(iii) ⇒ (ii) Suppose q_{ij} > 0 for all i, j ∈ {0, 1, ..., N−1}. Fix any state j, and let q_j = max_{0≤i≤N−1} q_{ij}. Suppose that not all the q_{ij}'s are the same. Then there exists a k ∈ {0, 1, ..., N−1} such that q_{kj} < q_j. Since Q is stochastic and Q² = Q, then for any i ∈ {0, 1, ..., N−1} we have
\[ q_{ij} = \sum_{l=0}^{N-1} q_{il}\, q_{lj}. \]
[…] for any ε > 0 there is an N ≥ 1 such that if j ≥ N,
\[
\begin{aligned}
\Big\|\frac{1}{n}\sum_{i=0}^{n-1} h \circ T^i\Big\|_2
&\le \Big\|\frac{1}{n}\sum_{i=0}^{n-1} (h - h_j) \circ T^i\Big\|_2 + \Big\|\frac{1}{n}\sum_{i=0}^{n-1} h_j \circ T^i\Big\|_2 \\
&\le \frac{1}{n}\sum_{i=0}^{n-1} \|(h - h_j) \circ T^i\|_2 + \Big\|\frac{1}{n}\sum_{i=0}^{n-1} h_j \circ T^i\Big\|_2 \\
&= \|h - h_j\|_2 + \Big\|\frac{1}{n}\sum_{i=0}^{n-1} h_j \circ T^i\Big\|_2 \\
&< \varepsilon + \Big\|\frac{1}{n}\sum_{i=0}^{n-1} h_j \circ T^i\Big\|_2.
\end{aligned}
\]
Taking the limit as n → ∞ we see that (1/n) Σ_{i=0}^{n−1} h ◦ T^i converges to 0 in L²(µ). Since
\[ \frac{1}{n}\sum_{i=0}^{n-1} f \circ T^i = \frac{1}{n}\sum_{i=0}^{n-1} \Pi_T(f) \circ T^i + \frac{1}{n}\sum_{i=0}^{n-1} h \circ T^i, \]
we get the required result.
Exercise 4.1.1. Let (X, F, µ) be a probability space and T : X → X a measure preserving transformation. Prove that for any f ∈ L1(X, F, µ) the sequence ((1/n) Σ_{i=0}^{n−1} f ◦ T^i) converges in L¹ to some T-invariant function f* ∈ L1(X, F, µ). (Hint: use Scheffé's Lemma, Lemma 12.4.1.)

Exercise 4.1.2. Let (X, F, µ) be a probability space, T : X → X a measure preserving transformation and 1 < p < ∞. Prove that for any f ∈ L^p(X, F, µ) the sequence ((1/n) Σ_{i=0}^{n−1} f ◦ T^i) converges in L^p to some T-invariant function f* ∈ L^p(X, F, µ).

4.2  THE HUREWICZ ERGODIC THEOREM

In this section we will assume that T : X → X is an invertible transformation that is not necessarily measure preserving. The invertibility and the non-singularity of T imply that for any n ∈ Z,
\[ \mu(A) = 0 \quad\text{if and only if}\quad \mu(T^n A) = 0. \]
Hence, for any n the measure µ ◦ Tⁿ defined by µ ◦ Tⁿ(A) = µ(TⁿA) for any measurable set A is equivalent to µ (and hence also equivalent to any other measure µ ◦ T^k). By the Radon-Nikodym Theorem there then exists for each n a non-negative measurable function ω_n = d(µ ◦ Tⁿ)/dµ such that
\[ \mu(T^n A) = \int_A \omega_n\,d\mu, \qquad A \in \mathcal F. \]
(So ω₀ = 1.) We have the following propositions.

Proposition 4.2.1. Suppose (X, F, µ) is a probability space, and T : X → X is an invertible transformation. Then for every f ∈ L1(X, F, µ) and n ∈ Z,
\[ \int_X f\,d\mu = \int_X f(T^n x)\,\omega_n(x)\,d\mu(x). \]

Proof. We show the result for indicator functions only; the rest of the proof is left to the reader. Let A ∈ F and n ∈ Z. Then
\[ \int_X 1_A\,d\mu = \mu(A) = \mu(T^n(T^{-n}A)) = \int_{T^{-n}A} \omega_n(x)\,d\mu(x) = \int_X 1_A(T^n x)\,\omega_n(x)\,d\mu(x). \]
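A two-point toy system (our construction, not the book's) makes the change-of-variables formula concrete. Take X = {0, 1} with µ({0}) = p, µ({1}) = 1 − p and T the invertible, non measure preserving swap T(0) = 1, T(1) = 0; the defining relation µ(TA) = ∫_A ω₁ dµ forces ω₁(0) = (1−p)/p and ω₁(1) = p/(1−p). The last lines also check the multiplicative identity ω₂ = ω₁ · (ω₁ ◦ T), which the next proposition establishes in general.

```python
from fractions import Fraction

p = Fraction(1, 3)
mu = {0: p, 1: 1 - p}
T = {0: 1, 1: 0}
omega1 = {0: (1 - p) / p, 1: p / (1 - p)}   # Radon-Nikodym weights

f = {0: Fraction(5), 1: Fraction(2)}         # an arbitrary "integrable" f

lhs = sum(f[x] * mu[x] for x in (0, 1))                  # int f dmu
rhs = sum(f[T[x]] * omega1[x] * mu[x] for x in (0, 1))   # int f(Tx) w1 dmu
print(lhs, rhs)   # equal, as in Proposition 4.2.1

# Cocycle identity omega_2(x) = omega_1(x) * omega_1(Tx); here T^2 = id,
# so omega_2 must be identically 1.
omega2 = {x: omega1[x] * omega1[T[x]] for x in (0, 1)}
print(omega2)
```

Exact `Fraction` arithmetic makes both checks identities rather than approximate equalities.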

More Ergodic Theorems  57

Proposition 4.2.2. Under the assumptions of Proposition 4.2.1, one has for all n, m ≥ 1 that the set
\[ Y := \{x \in X : \omega_{n+m}(x) = \omega_n(x)\,\omega_m(T^n x)\} \tag{4.1} \]
satisfies µ(Y) = 1.

Proof. By Proposition 4.2.1, for any A ∈ F,
\[
\begin{aligned}
\int_A \omega_n(x)\,\omega_m(T^n x)\,d\mu(x)
&= \int_X 1_{T^n A}(T^n x)\,\omega_m(T^n x)\,\omega_n(x)\,d\mu(x) = \int_X 1_{T^n A}(x)\,\omega_m(x)\,d\mu(x) \\
&= \int_X 1_{T^{n+m} A}(T^m x)\,\omega_m(x)\,d\mu(x) = \int_X 1_{T^{m+n} A}(x)\,d\mu(x) = \int_A \omega_{n+m}(x)\,d\mu(x).
\end{aligned}
\]

Hence, ω_{n+m}(x) = ω_n(x) ω_m(Tⁿx) for µ-a.e. x.

Exercise 4.2.1. Let (X, F, µ) be a probability space, and T : X → X an invertible transformation. For any measurable function f, set f_n(x) = Σ_{i=0}^{n−1} f(T^i x) ω_i(x), n ≥ 1. Show that for all n, m ≥ 1,
\[ f_{n+m}(x) = f_n(x) + \omega_n(x)\,f_m(T^n x) \]
for every element x of the set Y from (4.1).

Proposition 4.2.3. Suppose T is invertible and conservative. Then
\[ \sum_{n=1}^{\infty} \omega_n(x) = \infty \quad \mu\text{-a.e.} \]

Proof. Let A = {x ∈ X : Σ_{n=1}^∞ ω_n(x) < ∞}. Note that
\[ A = \bigcup_{M=1}^{\infty} \Big\{ x \in X : \sum_{n=1}^{\infty} \omega_n(x) < M \Big\}. \]
If µ(A) > 0, then there exists an M ≥ 1 such that the set
\[ B = \Big\{ x \in X : \sum_{n=1}^{\infty} \omega_n(x) < M \Big\} \]
has positive measure. Then ∫_B Σ_{n=1}^∞ ω_n dµ ≤ M µ(B) < ∞. However, by the Monotone Convergence Theorem,
\[ \int_B \sum_{n=1}^{\infty} \omega_n\,d\mu = \sum_{n=1}^{\infty} \int_B \omega_n\,d\mu = \sum_{n=1}^{\infty} \mu(T^n B) = \sum_{n=1}^{\infty} \int_X 1_{T^n B}\,d\mu = \int_X \sum_{n=1}^{\infty} 1_B \circ T^{-n}\,d\mu. \]
Hence, ∫_X Σ_{n=1}^∞ 1_B ◦ T⁻ⁿ dµ < ∞, which implies that
\[ \sum_{n=1}^{\infty} 1_B(T^{-n}x) < \infty \quad \mu\text{-a.e.} \]
Therefore, for µ-a.e. x one has T⁻ⁿx ∈ B for only finitely many n ≥ 1, contradicting Corollary 2.1.1. Thus µ(A) = 0, and
\[ \sum_{n=1}^{\infty} \omega_n(x) = \infty \quad \mu\text{-a.e.} \]

The following theorem by W. Hurewicz is a generalization of the Pointwise Ergodic Theorem to the setting of non-measure preserving transformations; see also W. Hurewicz' original paper [24]. We give a new proof similar to the proof presented for the Pointwise Ergodic Theorem in Section 3.1, see also [28].

Theorem 4.2.1 (Hurewicz Ergodic Theorem). Let (X, F, µ) be a probability space, and T : X → X an invertible and conservative transformation. For any f ∈ L1(X, F, µ) the limit
\[ \lim_{n\to\infty} \frac{\sum_{i=0}^{n-1} f(T^i x)\,\omega_i(x)}{\sum_{i=0}^{n-1} \omega_i(x)} = f_*(x) \]
exists µ-a.e., is T-invariant and ∫_X f dµ = ∫_X f_* dµ. Furthermore, if T is ergodic, then f_* = ∫_X f dµ is a constant.


Proof. By the invertibility and non-singularity of T,
\[ \mu(\{x \in X : \omega_n(T^k x) > 0 \text{ for all } n \ge 1 \text{ and } k \in \mathbb Z\}) = 1. \]
In this proof we only consider points x in this set that moreover satisfy the conclusions of Propositions 4.2.2 and 4.2.3, which leaves a subset of X of full µ-measure. Assume with no loss of generality that f ≥ 0 (otherwise we write f = f⁺ − f⁻, and we consider each part separately). For n ≥ 1 let
\[ f_n(x) = f(x) + f(Tx)\,\omega_1(x) + \cdots + f(T^{n-1}x)\,\omega_{n-1}(x), \qquad g_n(x) = \omega_0(x) + \omega_1(x) + \cdots + \omega_{n-1}(x). \]
Moreover, set
\[ \bar f(x) = \limsup_{n\to\infty} \frac{f_n(x)}{g_n(x)}, \qquad \underline f(x) = \liminf_{n\to\infty} \frac{f_n(x)}{g_n(x)}. \]
By Proposition 4.2.2, one has g_{n+m}(x) = g_n(x) + ω_n(x) g_m(Tⁿx). Using Exercise 4.2.1 and Proposition 4.2.3, we will show that $\bar f$ and $\underline f$ are T-invariant. To this end,
\[
\begin{aligned}
\bar f(Tx) &= \limsup_{n\to\infty} \frac{f_n(Tx)}{g_n(Tx)} = \limsup_{n\to\infty} \frac{\big(f_{n+1}(x) - f(x)\big)/\omega_1(x)}{\big(g_{n+1}(x) - g_1(x)\big)/\omega_1(x)} = \limsup_{n\to\infty} \frac{f_{n+1}(x) - f(x)}{g_{n+1}(x) - g_1(x)} \\
&= \limsup_{n\to\infty} \left( \frac{f_{n+1}(x)}{g_{n+1}(x)} \cdot \frac{g_{n+1}(x)}{g_{n+1}(x) - g_1(x)} - \frac{f(x)}{g_{n+1}(x) - g_1(x)} \right) = \limsup_{n\to\infty} \frac{f_{n+1}(x)}{g_{n+1}(x)} = \bar f(x),
\end{aligned}
\]
where the last step uses that g_{n+1}(x) → ∞ by Proposition 4.2.3. Similarly $\underline f$ is T-invariant. Now, to prove that f_* exists, is integrable and T-invariant, it is enough to show that
\[ \int_X \underline f\,d\mu \ge \int_X f\,d\mu \ge \int_X \bar f\,d\mu. \]
For since $\bar f - \underline f \ge 0$, this would imply that $\bar f = \underline f = f_*$ µ-a.e.

We first prove that $\int_X \bar f\,d\mu \le \int_X f\,d\mu$. Fix any 0 < ε < 1, and let L > 0 be any real number. By definition of $\bar f$, for any x there exists an integer m > 0 such that
\[ \frac{f_m(x)}{g_m(x)} \ge \min(\bar f(x), L)(1-\varepsilon). \]
Now, for any δ > 0 there exists an integer M > 0 such that the set X₀ = {x : ∃ 1 ≤ m ≤ M with f_m(x) ≥ g_m(x) min($\bar f$(x), L)(1−ε)} has measure at least 1 − δ. Define F by
\[ F(x) = \begin{cases} f(x), & \text{if } x \in X_0, \\ L, & \text{if } x \notin X_0. \end{cases} \]
Notice that f ≤ F (why?). For any x and n ≥ 0, let a_n = a_n(x) = F(Tⁿx) ω_n(x), and b_n = b_n(x) = min($\bar f$(x), L)(1−ε) ω_n(x). We now show that (a_n) and (b_n) satisfy the hypothesis of Lemma 3.1.1 with M > 0 as above. For any n ≥ 0 the following holds.

• If Tⁿx ∈ X₀, then there exists an 1 ≤ m ≤ M such that f_m(Tⁿx) ≥ min($\bar f$(Tⁿx), L)(1−ε) g_m(Tⁿx) = min($\bar f$(x), L)(1−ε) g_m(Tⁿx), using the T-invariance of $\bar f$. Hence,
\[ \omega_n(x)\,f_m(T^n x) \ge \min(\bar f(x), L)(1-\varepsilon)\,g_m(T^n x)\,\omega_n(x). \]
Now, with Proposition 4.2.2 we get
\[
\begin{aligned}
b_n + \cdots + b_{n+m-1} &= \min(\bar f(x), L)(1-\varepsilon)\,g_m(T^n x)\,\omega_n(x) \le \omega_n(x)\,f_m(T^n x) \\
&= f(T^n x)\,\omega_n(x) + f(T^{n+1}x)\,\omega_{n+1}(x) + \cdots + f(T^{n+m-1}x)\,\omega_{n+m-1}(x) \\
&\le F(T^n x)\,\omega_n(x) + F(T^{n+1}x)\,\omega_{n+1}(x) + \cdots + F(T^{n+m-1}x)\,\omega_{n+m-1}(x) \\
&= a_n + a_{n+1} + \cdots + a_{n+m-1}.
\end{aligned}
\]

• If Tⁿx ∉ X₀, then take m = 1 since
\[ a_n = F(T^n x)\,\omega_n(x) = L\,\omega_n(x) \ge \min(\bar f(x), L)(1-\varepsilon)\,\omega_n(x) = b_n. \]

Hence by the T-invariance of $\bar f$ and Lemma 3.1.1, for all integers N > M one has
\[ F(x) + F(Tx)\,\omega_1(x) + \cdots + F(T^{N-1}x)\,\omega_{N-1}(x) \ge \min(\bar f(x), L)(1-\varepsilon)\,g_{N-M}(x). \]
Integrating both sides, and using Proposition 4.2.1 together with the T-invariance of $\bar f$, one gets
\[ N\int_X F\,d\mu \ge \int_X \min(\bar f(x), L)(1-\varepsilon)\,g_{N-M}(x)\,d\mu(x) = (N-M)\int_X \min(\bar f(x), L)(1-\varepsilon)\,d\mu(x). \]
Since
\[ \int_X F\,d\mu = \int_{X_0} f\,d\mu + L\,\mu(X\setminus X_0), \]
one has
\[ \int_X f\,d\mu \ge \int_{X_0} f\,d\mu = \int_X F\,d\mu - L\,\mu(X\setminus X_0) \ge \frac{N-M}{N}\int_X \min(\bar f(x), L)(1-\varepsilon)\,d\mu(x) - L\delta. \]
Now letting first N → ∞, then δ → 0, then ε → 0, and lastly L → ∞, one gets together with the Monotone Convergence Theorem that $\bar f$ is integrable, and
\[ \int_X f\,d\mu \ge \int_X \bar f\,d\mu. \]

We now prove that
\[ \int_X f\,d\mu \le \int_X \underline f\,d\mu. \]
Fix ε > 0. Then for any x there exists an integer m such that
\[ \frac{f_m(x)}{g_m(x)} \le \underline f(x) + \varepsilon. \]
Fix δ₀ > 0. Then there is a δ > 0 such that for any measurable set A with µ(A) < δ it holds that ∫_A f dµ < δ₀. Fix such a δ. Then there exists an integer M > 0 such that the set Y₀ = {x : ∃ 1 ≤ m ≤ M with f_m(x) ≤ ($\underline f$(x) + ε) g_m(x)} has measure at least 1 − δ. Define G by
\[ G(x) = \begin{cases} f(x), & \text{if } x \in Y_0, \\ 0, & \text{if } x \notin Y_0. \end{cases} \]
Notice that G ≤ f. For any n ≥ 0 let b_n = G(Tⁿx) ω_n(x), and a_n = ($\underline f$(x) + ε) ω_n(x). We now check that the sequences (a_n) and (b_n) satisfy the hypothesis of Lemma 3.1.1 with M > 0 as above.

• If Tⁿx ∈ Y₀, then there exists an 1 ≤ m ≤ M such that f_m(Tⁿx) ≤ ($\underline f$(Tⁿx) + ε) g_m(Tⁿx) = ($\underline f$(x) + ε) g_m(Tⁿx). Hence,
\[ \omega_n(x)\,f_m(T^n x) \le (\underline f(x) + \varepsilon)\,g_m(T^n x)\,\omega_n(x) = (\underline f(x) + \varepsilon)\big(\omega_n(x) + \cdots + \omega_{n+m-1}(x)\big). \]
By Proposition 4.2.2, and the fact that f ≥ G, one gets
\[
\begin{aligned}
b_n + \cdots + b_{n+m-1} &= G(T^n x)\,\omega_n(x) + \cdots + G(T^{n+m-1}x)\,\omega_{n+m-1}(x) \\
&\le f(T^n x)\,\omega_n(x) + \cdots + f(T^{n+m-1}x)\,\omega_{n+m-1}(x) = \omega_n(x)\,f_m(T^n x) \\
&\le (\underline f(x) + \varepsilon)\big(\omega_n(x) + \cdots + \omega_{n+m-1}(x)\big) = a_n + \cdots + a_{n+m-1}.
\end{aligned}
\]

• If Tⁿx ∉ Y₀, then take m = 1 since
\[ b_n = G(T^n x)\,\omega_n(x) = 0 \le (\underline f(x) + \varepsilon)\,\omega_n(x) = a_n. \]

Hence by Lemma 3.1.1 one has for all integers N > M,
\[ G(x) + G(Tx)\,\omega_1(x) + \cdots + G(T^{N-M-1}x)\,\omega_{N-M-1}(x) \le (\underline f(x) + \varepsilon)\,g_N(x). \]
Integrating both sides, using Proposition 4.2.1 and the T-invariance of $\underline f$, yields
\[ (N-M)\int_X G\,d\mu \le N\Big( \int_X \underline f\,d\mu + \varepsilon \Big). \]
Since µ(X \ Y₀) < δ, then $\int_{X\setminus Y_0} f\,d\mu < \delta_0$. Hence,
\[ \int_X f\,d\mu = \int_X G\,d\mu + \int_{X\setminus Y_0} f\,d\mu \le \frac{N}{N-M}\int_X (\underline f(x) + \varepsilon)\,d\mu(x) + \delta_0. \]
Now, letting first N → ∞, then δ → 0, δ₀ → 0 and finally ε → 0, one gets
\[ \int_X f\,d\mu \le \int_X \underline f\,d\mu. \]
This shows that
\[ \int_X \underline f\,d\mu \ge \int_X f\,d\mu \ge \int_X \bar f\,d\mu, \]

hence $\bar f = \underline f = f_*$ µ-a.e., and f_* is T-invariant. The last statement follows from Theorem 2.2.2 and the fact that µ is a probability measure.

Example 4.2.1. Consider the binary odometer from Example 1.3.11. There we have seen that T is non-singular with respect to the product measure µ_p on X = {0, 1}^N and that
\[ \omega_1(x) = \frac{d\mu_p \circ T}{d\mu_p}(x) = \Big( \frac{p}{1-p} \Big)^{m(x)}, \]
where m(x) = inf{n ≥ 1 : x_n = 0} − 2. By Proposition 4.2.2, one has
\[ \omega_n(x) = \omega_1(x)\,\omega_1(Tx) \cdots \omega_1(T^{n-1}x) = \Big( \frac{p}{1-p} \Big)^{m(x) + m(Tx) + \cdots + m(T^{n-1}x)} \]
for µ_p-a.e. x ∈ X. The map T is in fact ergodic (see Example 6.2.1) and conservative (by Exercise 11.2.2). Consider the function f on X given by f((x₁, x₂, ...)) = x₁; then f is integrable with ∫_X f dµ_p = 1 − p. Furthermore, for each n ≥ 1,
\[ \sum_{i=0}^{n-1} f(T^i x)\,\omega_i(x) = \begin{cases} \displaystyle\sum_{i=0}^{\lfloor (n-1)/2 \rfloor} \omega_{2i}(x), & \text{if } x_1 = 1, \\[2ex] \displaystyle\sum_{i=1}^{\lfloor n/2 \rfloor} \omega_{2i-1}(x), & \text{if } x_1 = 0. \end{cases} \]
In either case, by the Hurewicz Ergodic Theorem we have for µ_p-a.e. x,
\[ \lim_{n\to\infty} \frac{\sum_{i=0}^{n-1} f(T^i x)\,\omega_i(x)}{\sum_{i=0}^{n-1} \omega_i(x)} = 1 - p. \]
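The convergence in Example 4.2.1 can be watched numerically. The sketch below (ours; the variable names are not the book's) starts from x = (0, 0, 0, ...), for which T^i x is the binary expansion of i with least significant bit first, and accumulates the weights through the cocycle relation ω_{i+1}(x) = ω_i(x) ω_1(T^i x) with ω_1 = (p/(1−p))^m as in the example.

```python
# Hurewicz averages for the binary odometer, starting at x = (0,0,0,...).
def hurewicz_average(p, n, n_bits=64):
    y = [0] * n_bits             # current point T^i x
    w = 1.0                      # omega_i(x), starting from omega_0 = 1
    num = den = 0.0
    for _ in range(n):
        num += y[0] * w          # f(T^i x) * omega_i(x), with f = x_1
        den += w
        j = y.index(0)           # first zero coordinate (0-indexed)
        w *= (p / (1 - p)) ** (j - 1)   # omega_1(T^i x); here m = j - 1
        for k in range(j):       # odometer step: add 1 with carry
            y[k] = 0
        y[j] = 1
    return num / den

p = 0.3
print(hurewicz_average(p, 2 ** 12))   # approaches 1 - p = 0.7
```

For this particular starting point the ratio is even exact (up to round-off) whenever n is a power of 2, since the first coordinates of T^i x, i < 2^K, run through all binary patterns.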

CHAPTER 5

Isomorphisms and Factor Maps

In this chapter we only consider dynamical systems on probability spaces. Given two such dynamical systems (X, F, µ, T ) and (Y, G, ν, S), it is natural to wonder when these are actually two versions of the same thing. What would it mean for two systems to be the same? And when could one system be considered as a subsystem of the other? In this chapter we define the notions that make this precise.

5.1  MEASURE PRESERVING ISOMORPHISMS

Let (X, F, µ, T) and (Y, G, ν, S) be two dynamical systems on probability spaces with T and S measure preserving. On each space there are two important structures:
(1) the measure structure given by the σ-algebra and the probability measure; note that in this context sets of measure zero can be ignored;
(2) the dynamical structure given by the measure preserving transformation.
So our notion of being the same must mean that we have a map ψ : (X, F, µ, T) → (Y, G, ν, S) that preserves both these structures, i.e., that satisfies
(i) ψ is one-to-one and onto,
(ii) ψ is measurable (ψ⁻¹(G) ∈ F for all G ∈ G), and so is ψ⁻¹,
(iii) ψ preserves the measures, i.e., ν = µ ◦ ψ⁻¹, and the same holds for ψ⁻¹,
(iv) ψ preserves the dynamics of T and S, i.e., ψ ◦ T = S ◦ ψ, which is the same as saying that the diagram from Figure 5.1 commutes;
and for all these properties it is sufficient that they hold almost everywhere.

Figure 5.1  A commuting diagram: ψ ◦ T = S ◦ ψ, with T acting on X and S acting on Y.

The last property means that T-orbits are mapped to S-orbits:
\[ \begin{array}{ccccccccc} x & \to & Tx & \to & T^2x & \to & \cdots & T^n x & \to \cdots \\ \downarrow & & \downarrow & & \downarrow & & & \downarrow & \\ \psi(x) & \to & S\psi(x) & \to & S^2\psi(x) & \to & \cdots & S^n\psi(x) & \to \cdots \end{array} \]
This leads to the following definition.

Definition 5.1.1. Two dynamical systems (X, F, µ, T) and (Y, G, ν, S) on probability spaces are (measure preservingly or metrically) isomorphic if there exist measurable sets N ⊆ X and M ⊆ Y with µ(N) = 0 = ν(M) and T(X \ N) ⊆ X \ N, S(Y \ M) ⊆ Y \ M, and finally if there exists a measurable map ψ : X \ N → Y \ M such that (i)–(iv) are satisfied for the systems restricted to X \ N and Y \ M. The map ψ is called a (measurable or metric) isomorphism.

Exercise 5.1.1. Suppose (X, F, µ, T) and (Y, G, ν, S) are two isomorphic dynamical systems. Show that
(a) T is ergodic if and only if S is ergodic,
(b) T is weakly mixing if and only if S is weakly mixing,
(c) T is strongly mixing if and only if S is strongly mixing.

Example 5.1.1. Let S¹ = {z ∈ C : |z| = 1} be equipped with the Lebesgue σ-algebra B̂ on S¹, and Haar measure (i.e., normalized Lebesgue measure on the unit circle). Define S : S¹ → S¹ by Sz = z²; equivalently, Se^{2πiθ} = e^{2πi(2θ)}. Figure 5.2(a) shows the graph. One can

Isomorphisms and Factor Maps  67

easily check that S is measure preserving. In fact, the map S is isomorphic to the doubling map T on ([0, 1), B, λ) given by Tx = 2x (mod 1) (see Examples 1.3.3 and 2.3.3). Define the map ψ: [0, 1) → S¹ by ψ(x) = e^{2πix}. We leave it to the reader to check that ψ is an isomorphism.

Example 5.1.2. Let T: [0, 1) → [0, 1) be the ×N transformation given by Tx = Nx − ⌊Nx⌋ for some integer N ≥ 2 as in Section 3.2. See Figure 5.2(b) for the graph. As we have seen, the iterates of T provide for each x ∈ [0, 1) that is not of the form k/Nⁱ a unique base N expansion x = Σ_{k=1}^∞ a_k/N^k with a_k ∈ {0, 1, ..., N − 1} for all k ≥ 1 and with the property that there is no k such that a_j = N − 1 for all j ≥ k. Let Y := {0, 1, ..., N − 1}^ℕ. We construct an isomorphism between ([0, 1), B, λ, T) and the Bernoulli shift (Y, C, µ, S) (see Example 1.3.9) with µ the uniform Bernoulli measure defined on cylinders by

µ({(y_i)_{i≥1} ∈ Y : y_1 = a_1, y_2 = a_2, ..., y_n = a_n}) = 1/Nⁿ.

Define ψ: [0, 1) → Y by

ψ: x = Σ_{k=1}^∞ a_k/N^k ↦ (a_k)_{k≥1},

where Σ_{k=1}^∞ a_k/N^k is the base N expansion of x produced by T. Let

C(i_1, ..., i_n) = {(y_i)_{i≥1} ∈ Y : y_1 = i_1, ..., y_n = i_n}.

In order to see that ψ is an isomorphism we verify measurability and measure preservingness on such cylinders:

ψ⁻¹(C(i_1, ..., i_n)) = [ i_1/N + i_2/N² + ⋯ + i_n/Nⁿ , i_1/N + i_2/N² + ⋯ + (i_n + 1)/Nⁿ ),

and

λ(ψ⁻¹(C(i_1, ..., i_n))) = 1/Nⁿ = µ(C(i_1, ..., i_n)).

Note that N = {(y_i)_{i≥1} ∈ Y : ∃ k ≥ 1, y_i = N − 1 for all i ≥ k} is a subset of Y of measure 0. Then ψ: [0, 1) → Y \ N is a bijection. Finally, it is easy to see that ψ ◦ T = S ◦ ψ.
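The relation ψ ◦ T = S ◦ ψ can be made concrete in a few lines of code: applying the ×N map and then reading off base N digits gives the shifted digit sequence. A small numerical sketch (the helper names times_n and digits are ours, not the book's):

```python
# Digits of x produced by iterating the xN map, as in Example 5.1.2:
# psi sends x to its digit sequence, and psi o T = S o psi with S the
# left shift.

def times_n(x, n):
    """The xN transformation T x = N x - floor(N x)."""
    y = n * x
    return y - int(y)

def digits(x, n, k):
    """First k base-n digits of x, read off along the orbit under T."""
    ds = []
    for _ in range(k):
        ds.append(int(n * x))   # the first digit of the current point
        x = times_n(x, n)       # move on to T x
    return ds

x, N = 0.123456789, 10
# psi(T x) is the shift of psi(x): the digits of T x are the digits of x
# with the first one removed.
assert digits(times_n(x, N), N, 8) == digits(x, N, 9)[1:]
```

Because both sides iterate the same floating-point orbit, the check is exact; it illustrates the conjugacy, it does not prove it.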


Figure 5.2. The transformations from Examples 5.1.1, 5.1.2 and 5.1.3: (a) the doubling map, (b) the ×N map, (c) the tent map.

Example 5.1.3. Consider the doubling map Tx = 2x (mod 1) and the tent map F: [0, 1] → [0, 1] defined by Fx = min{2x, 2 − 2x}, see Figure 5.2(c), both on the space ([0, 1], B, λ). (We can take T1 = 1 for example.) As in Example 1.3.3 we can associate to each x ∈ [0, 1] an infinite sequence (c_n) ∈ {0, 1}^ℕ that codes the orbit of x under F by setting, for each n ≥ 1,

c_n = c_n(x) = 0 if 0 ≤ F^{n−1}x < 1/2,  and  c_n = c_n(x) = 1 if 1/2 ≤ F^{n−1}x ≤ 1.

Very similarly to Example 5.1.2 one can show that the map φ: [0, 1] → {0, 1}^ℕ defined by φ(x) = (c_n)_{n≥1} is an isomorphism between ([0, 1], B, λ, F) and the uniform Bernoulli shift ({0, 1}^ℕ, C, µ, S). The composition ψ⁻¹ ◦ φ: [0, 1] → [0, 1) of φ with the inverse of the isomorphism ψ from Example 5.1.2 (taking N = 2) then becomes an isomorphism between ([0, 1], B, λ, F) and ([0, 1), B, λ, T).

To show that two systems are isomorphic it is enough to identify one isomorphism. Proving that two systems are not isomorphic requires different techniques.

Example 5.1.4 (Positive and negative β-transformations). The β-transformation was defined in Example 1.3.6 for β = (1 + √5)/2, the golden mean. In Exercise 1.3.3 the relation between this map and number expansions in base β was identified. One can define a β-transformation for any value β ∈ (1, 2) on the interval [0, 1] by T_β x = βx (mod 1). Each of these transformations has an invariant probability measure µ_β that is equivalent to the Lebesgue measure (i.e., has the same null sets). The density of this measure was given in Example 1.3.6 for the golden mean and we will give it for general β ∈ (1, 2) in Example 6.3.4. One can also

Isomorphisms and Factor Maps  69

consider the corresponding negative β-transformation S_β: [0, 1] → [0, 1] defined by

S_β x = 1 − βx if 0 ≤ x < 1/β,  and  S_β x = 2 − βx if 1/β ≤ x ≤ 1.

This map is related to β-expansions of numbers in [0, 1] with a negative base −β and digits 1, 2, i.e., each x ∈ [0, 1] can be expressed as a sum x = Σ_{i=1}^∞ (−b_i)/(−β)ⁱ with b_i ∈ {1, 2} for each i ≥ 1. We show that if 1 < β < (1 + √5)/2, then T_β and S_β are not isomorphic.

Assume that φ: [0, 1] → [0, 1] is a map that satisfies φ ◦ T_β = S_β ◦ φ. Then for each n ≥ 1 it must hold that φ ◦ T_β^n = S_β^n ◦ φ. Figure 5.3 shows the first three iterates of the maps T_β and S_β for some value β < (1 + √5)/2. Let I be the set of x ∈ [0, 1) that have precisely two pre-images under T_β³. As one can see from Figure 5.3(c), I = [T_β²(β − 1), 1), so µ_β(I) > 0. On the other hand, there are no points that have precisely two pre-images under S_β³, so φ(I) = ∅. This shows that no matter what invariant probability measure ν_β we would consider for S_β, we would always get 0 < µ_β(I) ≠ ν_β(φ(I)) = 0. So, no map φ can satisfy both conditions (iii) and (iv) from Definition 5.1.1, which implies that ([0, 1), B, µ_β, T_β) cannot be isomorphic to ([0, 1], B, ν_β, S_β) for any probability space ([0, 1], B, ν_β). In fact, in [26] it was shown that for any β ∈ (1, 2) that is not a so-called multinacci number, i.e., the positive real root of a polynomial of the form

x^k − x^{k−1} − ⋯ − x − 1 = 0,  k ≥ 2,   (5.1)

the system ([0, 1), B, µ_β, T_β) cannot be isomorphic to ([0, 1], B, ν_β, S_β) for any probability space ([0, 1], B, ν_β).

Exercise 5.1.2. Let T: [0, 1)² → [0, 1)² be the baker's transformation from Exercise 1.3.1 given by

T(x, y) = (2x, y/2) if 0 ≤ x < 1/2,  and  T(x, y) = (2x − 1, (y + 1)/2) if 1/2 ≤ x < 1.

Show that T is isomorphic to the two-sided Bernoulli shift S on ({0, 1}^ℤ, C, µ) with µ the uniform Bernoulli measure.

Exercise 5.1.3. Consider the measurable space (ℝ, B), where ℝ is the

Figure 5.3. The first three iterates of the positive and negative β-transformations for β = 3/2 from Example 5.1.4: (a) T_β (top) and S_β (bottom), (b) T_β² and S_β², (c) T_β³ and S_β³. The interval I of points that have precisely two pre-images under T_β³ is indicated in the top picture of (c).
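The pre-image counts behind the argument of Example 5.1.4 can be checked numerically for β = 3/2; the helper names and the sampling grid below are ours. Counting pre-images of the third iterates branch by branch shows that the count 2 occurs for T_β³ but never for S_β³:

```python
# Count pre-images under the third iterates of T_beta and S_beta for
# beta = 3/2, supporting the non-isomorphism argument of Example 5.1.4.

def preimages_T(y, beta):
    """Pre-images in [0, 1) of y under T_beta x = beta*x mod 1."""
    return [(y + k) / beta for k in range(int(beta) + 1)
            if (y + k) / beta < 1]

def preimages_S(y, beta):
    """Pre-images in [0, 1] of y under the negative beta-transformation."""
    xs = []
    x1 = (1 - y) / beta              # branch S x = 1 - beta*x on [0, 1/beta)
    if 0 <= x1 < 1 / beta:
        xs.append(x1)
    x2 = (2 - y) / beta              # branch S x = 2 - beta*x on [1/beta, 1]
    if 1 / beta <= x2 <= 1:
        xs.append(x2)
    return xs

def count3(y, pre, beta=1.5):
    """Number of pre-images of y under the third iterate of the map."""
    return sum(len(pre(z, beta)) for w in pre(y, beta) for z in pre(w, beta))

grid = [i / 1000 for i in range(1000)]
counts_T = {count3(y, preimages_T) for y in grid}
counts_S = {count3(y, preimages_S) for y in grid}
assert 2 in counts_T       # the interval I of the example has positive length
assert 2 not in counts_S   # no point has precisely two S^3-pre-images
```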

real line and B is the Lebesgue σ-algebra. Define a transformation T: ℝ → ℝ by T0 = 0 and

Tx = (1/2)(x − 1/x) for x ≠ 0.

Consider the measure µ on B given by

µ(A) = ∫_A 1/(π(1 + x²)) dλ(x), for all A ∈ B.

(a) Show that T is measure preserving with respect to the probability measure µ.

(b) Let ψ: ℝ → [0, 1) be defined by

ψ(x) = (1/π) arctan x + 1/2.

Show that ψ is a measurable isomorphism between the dynamical systems (ℝ, B, µ, T) and ([0, 1), B ∩ [0, 1), λ, S), where Sx = 2x (mod 1) is the doubling map.

(c) Show that T is strongly mixing with respect to µ.
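Part (b) can be spot-checked numerically: the identity ψ(Tx) = Sψ(x) is, after unwinding, the double angle formula for the tangent. A minimal sketch using only the maps defined in the exercise:

```python
# Check psi o T = S o psi at a few sample points, where T x = (x - 1/x)/2,
# psi(x) = arctan(x)/pi + 1/2 and S is the doubling map.
import math

def T(x):
    return 0.5 * (x - 1 / x) if x != 0 else 0.0

def psi(x):
    return math.atan(x) / math.pi + 0.5

def S(y):
    return (2 * y) % 1.0

for x in [-3.7, -0.9, 0.4, 1.2, 25.0]:
    assert abs(psi(T(x)) - S(psi(x))) < 1e-12
```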


5.2 FACTOR MAPS

In the above section we discussed the notion of isomorphism, which describes when two dynamical systems are considered the same. Now we give a precise definition of what it means for a dynamical system to be a subsystem of another one.

Definition 5.2.1. Let (X, F, µ, T) and (Y, G, ν, S) be two dynamical systems. We say that S is a factor of T if there exist measurable sets N ∈ F and M ∈ G such that µ(N) = 0 = ν(M), T(X \ N) ⊆ X \ N and S(Y \ M) ⊆ Y \ M, and finally if there exists a measurable and measure preserving map ψ: X \ N → Y \ M that is surjective and satisfies ψ(Tx) = Sψ(x) for all x ∈ X \ N. We call ψ a factor map.

Remark 5.2.1. Notice that if ψ is a factor map, then E = ψ⁻¹G is a T-invariant sub-σ-algebra of F, since

T⁻¹E = T⁻¹ψ⁻¹G = ψ⁻¹S⁻¹G ⊆ ψ⁻¹G = E.

Example 5.2.1. As in Exercise 5.1.2 let T be the baker's transformation on ([0, 1)², B, λ). Let (X = {0, 1}^ℕ, C, µ, S) be the uniform Bernoulli shift (see Example 1.3.9). Define ψ: [0, 1)² → X by ψ(x, y) = (a_1, a_2, ...), where x = Σ_{n=1}^∞ a_n/2ⁿ is the binary expansion of x. It is easy to check that ψ is a factor map.

Exercise 5.2.1. Let T be the left shift on X = {0, 1, 2}^ℕ, which is endowed with the σ-algebra C generated by the cylinder sets, and the uniform Bernoulli measure µ giving each symbol probability 1/3, i.e.,

µ({x ∈ X : x_1 = i_1, x_2 = i_2, ..., x_n = i_n}) = 1/3ⁿ,

for all i_1, i_2, ..., i_n ∈ {0, 1, 2}. Let S be the left shift on Y = {0, 1}^ℕ, which is endowed with the σ-algebra D generated by the cylinder sets, and the product measure ν giving the symbol 0 probability 1/3 and the symbol 1 probability 2/3, i.e.,

ν({y ∈ Y : y_1 = j_1, y_2 = j_2, ..., y_n = j_n}) = (2/3)^{j_1+⋯+j_n} (1/3)^{n−(j_1+⋯+j_n)},

where j_1, j_2, ..., j_n ∈ {0, 1}. Show that S is a factor of T.
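A natural candidate for the factor map in Exercise 5.2.1 sends a sequence over {0, 1, 2} to the 0/1 sequence obtained by collapsing the symbols 1 and 2 into the single symbol 1 (this choice of map is our assumption; the exercise itself does not name one). That this map pushes µ forward to ν can be verified exactly on cylinders of any fixed length:

```python
# Exact check on length-4 cylinders that collapsing {1, 2} -> 1 pushes the
# uniform (1/3, 1/3, 1/3) Bernoulli measure forward to the (1/3, 2/3)
# Bernoulli measure.
from fractions import Fraction
from itertools import product

def pushforward(pattern):
    """Total uniform measure of all {0,1,2}-words collapsing to pattern."""
    n = len(pattern)
    hits = sum(1 for w in product((0, 1, 2), repeat=n)
               if tuple(0 if c == 0 else 1 for c in w) == tuple(pattern))
    return Fraction(hits, 3 ** n)

def bernoulli(pattern):
    """The (1/3, 2/3)-Bernoulli measure of the 0/1 cylinder."""
    n, s = len(pattern), sum(pattern)
    return Fraction(2, 3) ** s * Fraction(1, 3) ** (n - s)

for pattern in product((0, 1), repeat=4):
    assert pushforward(pattern) == bernoulli(pattern)
```

Since collapsing symbols commutes with the left shift, this is exactly the factor relation of Definition 5.2.1 tested on cylinder sets.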


Exercise 5.2.2. Show that a factor of an ergodic (weakly mixing/strongly mixing) transformation is also ergodic (weakly mixing/strongly mixing).

5.3 NATURAL EXTENSIONS

Exercise 5.2.2 shows that many properties of a dynamical system carry over to a factor. This fact can also be put to use in the other direction; sometimes the properties of a system can be described better by embedding it in a larger system that is easier to analyze, for example by choosing the larger system to be invertible. A natural extension is such a larger system. By putting a minimality condition on it, it is guaranteed that many properties of the original system also hold for the natural extension.

Definition 5.3.1. Let (Y, G, ν, S) be a non-invertible measure preserving dynamical system. An invertible measure preserving dynamical system (X, F, µ, T) is called a natural extension of (Y, G, ν, S) if S is a factor of T and the factor map ψ satisfies

⋁_{n=0}^∞ Tⁿψ⁻¹(G) = F,

where ⋁_{n=0}^∞ Tⁿψ⁻¹(G) is the smallest σ-algebra containing the σ-algebras Tᵏψ⁻¹(G) for all k ≥ 0.

Example 5.3.1. Let T on ({0, 1}^ℤ, C, µ) be the two-sided Bernoulli shift and S on ({0, 1}^ℕ, G, ν) the one-sided Bernoulli shift; both spaces are endowed with the uniform Bernoulli measure. Notice that T is invertible, while S is not. Define ψ: {0, 1}^ℤ → {0, 1}^ℕ by ψ(..., x₋₁, x₀, x₁, ...) = (x₁, x₂, ...). Then ψ is a factor map. We claim that

⋁_{n=0}^∞ Tⁿψ⁻¹(G) = C.

To prove this, we show that ⋁_{n=0}^∞ Tⁿψ⁻¹(G) contains all cylinders generating C.

Let ∆ = {x ∈ X : x₋ₖ = a₋ₖ, ..., x_ℓ = a_ℓ} be an arbitrary cylinder


in C, and let D = {y ∈ Y : y_1 = a₋ₖ, ..., y_{k+ℓ+1} = a_ℓ}, which is a cylinder in G. Then

ψ⁻¹(D) = {x ∈ X : x_1 = a₋ₖ, ..., x_{k+ℓ+1} = a_ℓ}

and T^{k+1}ψ⁻¹(D) = ∆. This shows that

⋁_{n=0}^∞ Tⁿψ⁻¹(G) = C.

Thus, T is a natural extension of S.

Natural extensions were extensively studied by V. A. Rohlin in [55]. In 1961 he gave a canonical construction of a natural extension that resembles the idea of converting a one-sided shift to a two-sided shift. We will outline the construction and end with a theorem summarizing some important results.

Start with a non-invertible measure preserving system (Y, G, ν, S) with ν a probability measure. As the underlying space of the natural extension we take the set

X = {x = (x₀, x₁, ...) : x_i ∈ Y and S x_{i+1} = x_i, i ≥ 0}.

The sequence (x₁, x₂, ...) can be seen as a possible past of the point x₀ and is typically not unique, due to the non-invertibility of S. On X we consider the product σ-algebra F generated by sets of the form

A_{i,C} = {x = (x₀, x₁, ...) : x_i ∈ C},

with C ∈ G and i ≥ 0. Note that any cylinder set can be written as a set of this form, since if C₀, C₁, ..., C_n ∈ G, then

{x = (x₀, x₁, ...) : x_j ∈ C_j, 0 ≤ j ≤ n} = A_{n, S⁻ⁿC₀ ∩ S^{−(n−1)}C₁ ∩ ⋯ ∩ C_n}.

On (X, F) we consider the measure µ defined on the generating sets by µ(A_{i,C}) = ν(C). Note that if C₀, C₁, ..., C_n ∈ G, then

µ({x = (x₀, x₁, ...) : x_j ∈ C_j, 0 ≤ j ≤ n}) = ν(S⁻ⁿC₀ ∩ S^{−(n−1)}C₁ ∩ ⋯ ∩ C_n).

The above shows that we have a pre-measure on the collection of all cylinders, so by the Carathéodory Extension Theorem the measure µ is a well defined probability measure on F. On the probability space (X, F, µ) we consider the transformation T: X → X defined by

Tx = T(x₀, x₁, x₂, ...) = (Sx₀, Sx₁, Sx₂, ...) = (Sx₀, x₀, x₁, ...).
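The construction can be sketched in code for the doubling map S y = 2y (mod 1). A point of the extension is a present x₀ together with one chosen past; truncating the past at a finite depth (a simplification that is ours, not part of the construction) already shows the mechanism:

```python
# A finite-horizon sketch of Rohlin's natural extension for the doubling
# map: tuples (x0, x1, ..., xk) with S(x_{i+1}) = x_i.

def S(y):
    return (2 * y) % 1.0

def T(xs):
    """T(x0, x1, ...) = (S x0, x0, x1, ...): push a new present in front."""
    return (S(xs[0]),) + xs

def T_inv(xs):
    """T^{-1}(x0, x1, ...) = (x1, x2, ...): forget the present."""
    return xs[1:]

x = (0.3, 0.65, 0.825)     # a compatible past: S(0.65) = 0.3, S(0.825) = 0.65
assert all(abs(S(x[i + 1]) - x[i]) < 1e-12 for i in range(len(x) - 1))
assert T_inv(T(x)) == x    # T is invertible on these sequences
assert T(x)[0] == S(x[0])  # the factor map psi(x) = x0 gives psi o T = S o psi
```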


Since T⁻¹A_{0,C} = A_{0,S⁻¹C} and T⁻¹A_{i,C} = A_{i−1,C} for i > 0, we see that T is measurable and measure preserving. Furthermore, T is invertible with inverse defined by T⁻¹(x₀, x₁, ...) = (x₁, x₂, ...). Since T A_{i,C} = A_{i+1,C} for i ≥ 0, we see that T⁻¹ is measurable and measure preserving. Using the factor map ψ: X → Y defined by ψ(x₀, x₁, ...) = x₀, one can easily check that Definition 5.3.1 is satisfied, so that the invertible dynamical system (X, F, µ, T) is a natural extension of (Y, G, ν, S).

Theorem 5.3.1 (Rohlin). Let (Y, G, ν) be a probability space and S: Y → Y a measure preserving and non-invertible transformation.

(i) There exists a natural extension (X, F, µ, T) of (Y, G, ν, S).

(ii) If (X, F, µ, T) and (X′, F′, µ′, T′) are two natural extensions of (Y, G, ν, S), then (X, F, µ, T) and (X′, F′, µ′, T′) are isomorphic.

(iii) If S is ergodic/weakly mixing/strongly mixing and (X, F, µ, T) is a natural extension, then T is ergodic/weakly mixing/strongly mixing.

From Theorem 5.3.1(ii) we see that a natural extension of a dynamical system is unique up to isomorphism. With this in mind we can therefore refer to the natural extension of a system.

Exercise 5.3.1. Prove that if (X, F, µ, T) is a natural extension of (Y, G, ν, S) and S is strongly mixing, then T is strongly mixing.

Exercise 5.3.2. Let T: [0, 1) → [0, 1) be the Lüroth map from Example 1.3.5 defined by

Tx = n(n + 1)x − n if x ∈ [1/(n + 1), 1/n), n ≥ 1,  and  Tx = 0 if x = 0.

Recall from Exercise 1.3.2 that Lebesgue measure λ is a T-invariant measure and that for any point x ∈ [0, 1) with Tᵏx ≠ 0 for all k, the sequence (a_k) defined by a_k = a_k(x) = n if T^{k−1}x ∈ [1/n, 1/(n − 1)), n ≥ 2, gives

x = 1/a_1 + 1/(a_1(a_1 − 1)a_2) + ⋯ + 1/(a_1(a_1 − 1) ⋯ a_{k−1}(a_{k−1} − 1)a_k) + ⋯ .

(a) Consider the Bernoulli shift (X, C, µ, S), where X = {2, 3, ...}^ℕ and µ is the product measure with µ({x : x_1 = j}) = 1/(j(j − 1)). Show that ([0, 1), B, λ, T) and (X, C, µ, S) are isomorphic.


(b) Show that T is strongly mixing.

(c) Consider the product space ([0, 1)², B², λ²). Define the transformation 𝒯: [0, 1)² → [0, 1)² by

𝒯(x, y) = (Tx, (y + n)/(n(n + 1))) if x ∈ [1/(n + 1), 1/n), n ≥ 1,  and  𝒯(x, y) = (0, 0) if x = 0.

Show that ([0, 1)², B², λ², 𝒯) is the natural extension of ([0, 1), B, λ, T).

Exercise 5.3.3. Let β > 1 be the real number satisfying β³ = β² + β + 1 (sometimes called the tribonacci number, see also (5.1)) and consider the β-transformation T_β: [0, 1) → [0, 1) given by T_β x = βx (mod 1). Define a measure ν on the Lebesgue σ-algebra B by

ν(A) = ∫_A h dλ,

where

h(x) = (1 + 1/β + 1/β²) / (1/β + 2/β² + 3/β³)  if x ∈ [0, 1/β),
h(x) = (1 + 1/β) / (1/β + 2/β² + 3/β³)  if x ∈ [1/β, 1/β + 1/β²),
h(x) = 1 / (1/β + 2/β² + 3/β³)  if x ∈ [1/β + 1/β², 1).

(a) Show that T_β is measure preserving with respect to ν.

Let

X = ([0, 1/β) × [0, 1)) ∪ ([1/β, 1/β + 1/β²) × [0, 1/β + 1/β²)) ∪ ([1/β + 1/β², 1) × [0, 1/β)).

Let D be the restriction of the two-dimensional Lebesgue σ-algebra to X, and λ² the normalized (two-dimensional) Lebesgue measure on (X, D). Define on X the transformation 𝒯_β by

𝒯_β(x, y) = (T_β x, (⌊βx⌋ + y)/β).

(b) Show that 𝒯_β is measurable and measure preserving with respect to λ². Prove also that 𝒯_β is one-to-one and onto λ²-a.e.

(c) Show that 𝒯_β is the natural extension of T_β.

CHAPTER 6

The Perron-Frobenius Operator

As a starting point in much of the theory developed so far we took a dynamical system (X, F, µ, T ) with a transformation T : X → X that is measure preserving with respect to µ. In this chapter we consider the existence of invariant measures.

6.1 ABSOLUTELY CONTINUOUS INVARIANT MEASURES

Typically a dynamical system has many different invariant measures.

Exercise 6.1.1. (a) Let X be a set and T: X → X a transformation. Suppose that x ∈ X is a periodic point for T, i.e., it satisfies Tⁿx = x for some n ≥ 1 and Tʲx ≠ x for all j < n. Prove that the measure µ defined by

µ(A) = (1/n) Σ_{j=0}^{n−1} δ_{Tʲx}(A),

for any subset A ⊆ X, where δ_y denotes the Dirac measure at y, is invariant for T.

(b) Use part (a) to find infinitely many invariant measures for Arnold's cat map from Example 1.3.8.

So any periodic point of a transformation T can be associated to an invariant measure. The next example shows that there can still be many others.

Example 6.1.1. Consider the doubling map Tx = 2x (mod 1) on the


unit interval [0, 1]. As we have seen in Example 1.3.3, Lebesgue measure is invariant for T. From the previous exercise we know that, for example, the measure (1/2)δ_{1/3} + (1/2)δ_{2/3} is also invariant for T. Now, let 0 < p < 1 and let ({0, 1}^ℕ, C, m_p, S) be the Bernoulli shift on two symbols, where C is the σ-algebra generated by the cylinders, m_p is the (p, 1 − p)-Bernoulli measure and S is the left shift as in Example 1.3.9. It was shown in Example 2.3.2 that this system is ergodic for any choice of p. Let B be the Lebesgue σ-algebra on [0, 1], fix a p ≠ 1/2 and define the measure µ_p on ([0, 1], B) by µ_p = m_p ◦ ψ, where

ψ(x) = ψ(Σ_{n≥1} b_n/2ⁿ) = (b_n)_{n≥1}

is the sequence of binary digits of x generated by T, so that (b_n)_{n≥1} does not end in an infinite string of 1's. It follows from Example 5.1.2 that ψ is an isomorphism between the systems ([0, 1], B, µ_p, T) and ({0, 1}^ℕ, C, m_p, S). Since m_p is ergodic for S, so is µ_p for T. It then follows from Theorem 3.1.2(ii) that λ and µ_p are singular.

For systems defined on a subset of ℝⁿ, invariant measures that are singular with respect to Lebesgue measure see very little of the "observable" dynamics of the system. One can wonder if and under which conditions there exist invariant measures that are absolutely continuous with respect to some reference measure on the space X. To this end we introduce the Perron-Frobenius operator.

Let (X, F, m, T) be a dynamical system. To any non-negative function f ∈ L¹(X, F, m) we can associate a measure on X by setting

µ(A) = ∫_A f dm for all A ∈ F,

so that µ ≪ m. What does the measure µ ◦ T⁻¹ look like? Is it also absolutely continuous with respect to m? Suppose this is the case; then the corresponding density, let us call it P_T f, would satisfy

∫_A P_T f dm = µ(T⁻¹A) = ∫_{T⁻¹A} f dm.

To see that P_T f exists, define the measure ν on (X, F) (check that it is indeed a measure) by setting

ν(A) = ∫_{T⁻¹A} f dm, for all A ∈ F.

Note that by the integrability of f the measure ν is a finite measure. The


non-singularity of T then implies that ν ≪ m, so that by the Radon-Nikodym Theorem a unique element P_T f ∈ L¹(X, F, m) exists such that

ν(A) = ∫_A P_T f dm, for all A ∈ F.

This motivates the following definition.

Definition 6.1.1. Let (X, F, m) be a measure space and T: X → X a transformation. Then for each f ∈ L¹(X, F, m), denote by P_T f the unique element in L¹(X, F, m) that satisfies

∫_A P_T f dm = ∫_{T⁻¹A} f dm for all A ∈ F.

The operator P_T: L¹(X, F, m) → L¹(X, F, m) is called the Perron-Frobenius operator for T. Note that the definition of P_T f is up to m-a.e. equivalence.

The Perron-Frobenius operator has some nice properties, listed below.

Proposition 6.1.1. Let (X, F, m, T) be a dynamical system.

(i) P_T is linear.

(ii) P_T is positive.

(iii) P_T preserves the integral, i.e., for any f ∈ L¹(X, F, m) we have

∫_X P_T f dm = ∫_X f dm.

(iv) P_T: L¹(X, F, m) → L¹(X, F, m) is a contraction, i.e., ‖P_T f‖₁ ≤ ‖f‖₁ for any f ∈ L¹(X, F, m).

Exercise 6.1.2. Prove Proposition 6.1.1.

The Perron-Frobenius operator also has the following composition property.

Proposition 6.1.2. Let T: X → X and S: X → X be two transformations on the same measure space (X, F, m). Then P_{T◦S} = P_T ◦ P_S. In particular, P_{Tⁿ} = P_Tⁿ.


Proof. Let f ∈ L¹(X, F, m). The non-singularity of T and S implies that the transformation T ◦ S: X → X is also non-singular, so that the operator P_{T◦S} exists. For each A ∈ F we have

∫_A P_{T◦S} f dm = ∫_{S⁻¹(T⁻¹A)} f dm = ∫_{T⁻¹A} P_S f dm = ∫_A P_T(P_S f) dm = ∫_A (P_T ◦ P_S) f dm.

Hence, P_{T◦S} = P_T ◦ P_S m-a.e.

The following proposition shows why we are interested in the Perron-Frobenius operator.

Proposition 6.1.3. Let (X, F, m, T) be a dynamical system. For a non-negative function f ∈ L¹(X, F, m) it holds that P_T f = f if and only if T is measure preserving with respect to the measure µ on (X, F) given by

µ(A) = ∫_A f dm for all A ∈ F.

Proof. Let A ∈ F be given. Then the statement follows from

µ(T⁻¹A) = ∫_{T⁻¹A} f dm = ∫_A P_T f dm.
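Both the composition rule of Proposition 6.1.2 and the fixed-point criterion of Proposition 6.1.3 can be spot-checked for the doubling map Tx = 2x (mod 1) with m Lebesgue measure. For this map the two inverse branches x ↦ x/2 and x ↦ (x + 1)/2 each contribute with weight 1/|T′| = 1/2; this explicit form is our own worked-out instance of the branch formula that appears in Section 6.3:

```python
# Perron-Frobenius operator of the doubling map and of its square.
import math

def P_T(f):
    return lambda x: 0.5 * (f(x / 2) + f((x + 1) / 2))

def P_T2(f):
    # T o T has the four inverse branches x -> (x + k)/4, k = 0, 1, 2, 3.
    return lambda x: 0.25 * sum(f((x + k) / 4) for k in range(4))

f = lambda x: x * x + math.sin(3 * x)

for x in [0.0, 0.1, 0.37, 0.5, 0.99]:
    # Composition property P_{T o T} = P_T o P_T (Proposition 6.1.2).
    assert abs(P_T2(f)(x) - P_T(P_T(f))(x)) < 1e-12
    # P_T 1 = 1, so Lebesgue measure is invariant (Proposition 6.1.3).
    assert P_T(lambda y: 1.0)(x) == 1.0
```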

A function f ∈ L¹(X, F, m) that satisfies P_T f = f is called a fixed point of the operator P_T. Hence, the Perron-Frobenius operator can be used to verify whether a measure that is absolutely continuous with respect to m is an invariant measure, by checking whether the corresponding density is a fixed point of the operator. The next proposition shows that fixed points that are not necessarily non-negative give invariant measures through their positive and negative parts.

Proposition 6.1.4. Let (X, F, m, T) be a dynamical system and let f ∈ L¹(X, F, m). Then P_T f = f if and only if P_T f⁺ = f⁺ and P_T f⁻ = f⁻, where f⁺ = max{f, 0} and f⁻ = max{−f, 0}.

Proof. One direction follows directly from the linearity of P_T. For the other direction, assume that P_T f = f. Then from the linearity and positivity of P_T it follows that

f⁺ = (P_T f)⁺ = max{P_T f⁺ − P_T f⁻, 0} ≤ P_T f⁺


and hence

∫_X (P_T f⁺ − f⁺) dm ≥ 0.

Similarly f⁻ ≤ P_T f⁻ and ∫_X (P_T f⁻ − f⁻) dm ≥ 0. So,

0 ≤ ∫_X (P_T f⁺ − f⁺) dm + ∫_X (P_T f⁻ − f⁻) dm = ∫_X P_T(f⁺ + f⁻) dm − ∫_X (f⁺ + f⁻) dm = ‖P_T |f|‖₁ − ‖f‖₁ ≤ 0,

where the last step follows from Proposition 6.1.1(iv). Thus, P_T f⁺ = f⁺ and P_T f⁻ = f⁻.

The next theorem says that if T is ergodic with respect to m, then there can be at most one T-invariant probability measure that is absolutely continuous with respect to m.

Theorem 6.1.1. Let (X, F, m, T) be a dynamical system with T: X → X ergodic with respect to m. Then there is at most one h ∈ L¹(X, F, m) that satisfies h ≥ 0, ∫_X h dm = 1 and P_T h = h.

Proof. Suppose that g, h ∈ L¹(X, F, m) are two densities that satisfy the properties from the theorem. Set f = g − h. Then ∫_X f dm = 1 − 1 = 0 and by linearity P_T f = f. If either f(x) ≥ 0 for m-a.e. x ∈ X (i.e., g ≥ h m-a.e.) or f(x) ≤ 0 for m-a.e. x ∈ X (i.e., h ≥ g m-a.e.), then from ∫_X f dm = 0 one gets f = 0 m-a.e., so g = h m-a.e. and we are done. So, assume that this is not the case. Write f = f⁺ − f⁻ and let E₁ = {x ∈ X : f⁺(x) > 0} and E₂ = {x ∈ X : f⁻(x) > 0}. Then by the assumption m(E₁) > 0 and m(E₂) > 0. The sets E₁ and E₂ are the supports of the functions f⁺ and f⁻, respectively. We now use Proposition 6.1.4 to deduce a contradiction. Since P_T f⁺ = f⁺, it holds that P_T f⁺ = 0 on X \ E₁. Therefore,

0 = ∫_{X \ E₁} P_T f⁺ dm = ∫_{X \ T⁻¹E₁} f⁺ dm,

so f⁺(x) = 0 for m-a.e. x ∈ X \ T⁻¹E₁. Hence, by the definition of E₁ we have E₁ ⊆ T⁻¹E₁ ∪ N for some set N of m-measure zero, so m(E₁ \ T⁻¹E₁) = 0. By the non-singularity of T it follows that for each k, n ≥ 0


with k ≤ n there is an m-null set N_{k,n} such that T⁻ᵏE₁ ⊆ T⁻ⁿE₁ ∪ N_{k,n}, and similarly for E₂ there is an m-null set M_{k,n} with T⁻ᵏE₂ ⊆ T⁻ⁿE₂ ∪ M_{k,n}. This leads to

0 ≤ m(T⁻ᵏE₁ ∩ T⁻ⁿE₂) ≤ m(T^{−max{k,n}}E₁ ∩ T^{−max{k,n}}E₂) = 0   (6.1)

for each k, n ≥ 0, where the last step follows from the non-singularity of T combined with the fact that E₁ ∩ E₂ = ∅. Both sets ⋃_{n≥0} T⁻ⁿE_i, i = 1, 2, have positive measure, so that for i = 1, 2,

0 ≤ m( ⋃_{n≥0} T⁻ⁿE_i ∆ T⁻¹(⋃_{n≥0} T⁻ⁿE_i) ) = m( ⋃_{n≥0} T⁻ⁿE_i ∆ ⋃_{n≥1} T⁻ⁿE_i ) = m( (E_i ∪ ⋃_{n≥1} T⁻ⁿE_i) ∆ ⋃_{n≥1} T⁻ⁿE_i ) = m( E_i \ ⋃_{n≥1} T⁻ⁿE_i ) ≤ m(E_i \ T⁻¹E_i) = 0.

Hence by ergodicity m( (⋃_{n≥0} T⁻ⁿE_i)ᶜ ) = 0. Since m(E₁ ∩ E₂) = 0, it holds by (6.1) that m(T⁻ᵏE₁ ∩ T⁻ⁿE₂) = 0 for all k, n ≥ 0, so also

m( ⋃_{n≥0} T⁻ⁿE₁ ∩ ⋃_{n≥0} T⁻ⁿE₂ ) = 0.

This contradicts the fact that both unions have full measure. Hence g = h m-a.e.

6.2 EXACTNESS

Proposition 6.2.2. Let (X, F, µ) be a probability space and T: X → X a forward measurable and measure preserving transformation. Then T is exact if and only if for each A ∈ F with µ(A) > 0 it holds that

lim_{n→∞} µ(TⁿA) = 1.   (6.3)

Proof. First assume that (6.3) holds for each positive measure set. Let A ∈ ⋂_{n≥0} T⁻ⁿF and assume for a contradiction that 0 < µ(A) < 1. Then for each n ≥ 0 there is a set B_n ∈ F such that A = T⁻ⁿB_n. Fix an n ≥ 0. T is measure preserving, so 0 < µ(B_n) = µ(A) < 1, and from TⁿA = Tⁿ(T⁻ⁿB_n) ⊆ B_n we obtain µ(TⁿA) ≤ µ(B_n) = µ(A) < 1. Since n was arbitrary, this contradicts (6.3). Hence, T is exact.

For the other direction, assume T is exact and let A ∈ F be a set with µ(A) > 0. Note that for any n ≥ 0, T⁻ⁿ(TⁿA) ⊆ T^{−(n+1)}(T^{n+1}A), so that the limit set

B = lim_{n→∞} T⁻ⁿ(TⁿA) = ⋃_{n≥0} T⁻ⁿ(TⁿA) ∈ ⋂_{n≥0} T⁻ⁿF.


From A ⊆ B it follows that µ(B) = 1, and since T is measure preserving and the sequence (T⁻ⁿ(TⁿA)) is increasing, we obtain

lim_{n→∞} µ(TⁿA) = lim_{n→∞} µ(T⁻ⁿ(TⁿA)) = µ(B) = 1.

This gives the result.

From this proposition it follows that an invertible measure preserving transformation cannot be exact. To see this, let A ∈ F be a measurable set with 0 < µ(A) < 1. Recall that invertibility of a transformation T implies that all sets TⁿA, n ≥ 1, are in F, so T is forward measurable. Then for any n ≥ 1,

µ(TⁿA) = µ(T⁻ⁿ(TⁿA)) = µ(A) < 1.

The following result on exactness is in the spirit of Theorem 6.2.1.

Theorem 6.2.2. Let (X, F, µ) be a probability space and T: X → X forward measurable and measure preserving. Then T is exact if and only if for all f ∈ L¹(X, F, µ),

lim_{n→∞} ∫_X | P_Tⁿ f − ∫_X f dµ | dµ = 0.

Proof. Assume T is exact and let f ∈ L¹(X, F, µ) be given. Note that for each n ≥ 1, T⁻ⁿF ⊆ T^{−n+1}F, so the sequence (T⁻ⁿF) is decreasing. Also note that by the measure preservingness of T, for each n ≥ 1,

∫_X | P_Tⁿ f − ∫_X f dµ | dµ = ∫_X | P_Tⁿ f − ∫_X f dµ | ◦ Tⁿ dµ = ∫_X | P_Tⁿ f ◦ Tⁿ − ∫_X f dµ | dµ,

and that by the exactness of T

∫_X f dµ = E_µ( f | ⋂_{n≥0} T⁻ⁿF ).

Let n ≥ 1. The function P_Tⁿ f ◦ Tⁿ is T⁻ⁿF-measurable. Moreover, if we take B ∈ F, we see with Proposition 6.1.2 that for A = T⁻ⁿB we get

∫_A P_Tⁿ f ◦ Tⁿ dµ = ∫_{T⁻ⁿB} P_Tⁿ f ◦ Tⁿ dµ = ∫_B P_{Tⁿ} f dµ = ∫_{T⁻ⁿB} f dµ = ∫_A f dµ.


In other words, P_Tⁿ f ◦ Tⁿ = E_µ(f | T⁻ⁿF). Hence, since the sequence (T⁻ⁿF) is decreasing,

lim_{n→∞} ∫_X | P_Tⁿ f − ∫_X f dµ | dµ = lim_{n→∞} ∫_X | E_µ(f | T⁻ⁿF) − E_µ( f | ⋂_{n≥0} T⁻ⁿF ) | dµ = 0.

For the other direction, let A ∈ F with µ(A) > 0 be given. For each n ≥ 0 we have

µ(TⁿA) = ∫_{TⁿA} 1 dµ = (1/µ(A)) ∫_{TⁿA} P_Tⁿ 1_A dµ − (1/µ(A)) ∫_{TⁿA} (P_Tⁿ 1_A − µ(A)) dµ ≥ (1/µ(A)) ∫_{T⁻ⁿ(TⁿA)} 1_A dµ − (1/µ(A)) ∫_X | P_Tⁿ 1_A − µ(A) | dµ = 1 − (1/µ(A)) ∫_X | P_Tⁿ 1_A − µ(A) | dµ.

By assumption it holds that lim_{n→∞} ∫_X | P_Tⁿ 1_A − µ(A) | dµ = 0, so the exactness of T follows.

Note that the assumption of forward measurability in the previous theorem is only used in one direction.

Exercise 6.2.2. Let T: X → X be a measure preserving transformation on a probability space (X, F, µ). Prove that if T is exact, then T is strongly mixing.

The next result, which is due to V. A. Rohlin and can be found in [55], gives another way to verify whether a transformation is exact.

Proposition 6.2.3. Let (X, F, µ) be a probability space and T: X → X a forward measurable and measure preserving transformation. Let S ⊆ F be a countable collection of sets of positive µ-measure with the following properties.

(i) For every E ∈ F with µ(E) > 0 there are pairwise disjoint sets A_k ∈ S, k ≥ 1, with µ(E ∆ ⋃_{k≥1} A_k) = 0.


(ii) There is a function n: S → ℕ such that µ(T^{n(A)}A) = 1 for each A ∈ S.

(iii) There is a constant γ ≥ 1 such that for each A ∈ S and for all measurable sets E ⊆ A we have

µ(T^{n(A)}E) ≤ γ · µ(E)/µ(A).   (6.4)

Then T is exact.

Proof. Let E ∈ F be arbitrary with µ(E) > 0. We use Proposition 6.2.2 to prove the exactness of T. First note that by the measure preservingness of T we have µ(E) ≤ µ(T⁻¹(TE)) = µ(TE), so the sequence (µ(TⁿE))_{n≥0} is increasing. Let δ > 0. We claim that there must be a set B ∈ S with the property that

µ(E ∩ B) > (1 − δ/γ) µ(B).   (6.5)

To see that this is true, suppose that the opposite holds. Then by (i) there are pairwise disjoint sets A_k ∈ S, k ≥ 1, such that

µ(E) = µ( E ∩ ⋃_{k≥1} A_k ) = Σ_{k≥1} µ(E ∩ A_k) ≤ Σ_{k≥1} (1 − δ/γ) µ(A_k) = (1 − δ/γ) µ( ⋃_{k≥1} A_k ) = (1 − δ/γ) µ(E),

a contradiction. So, let B ∈ S be a set that satisfies (6.5). Then by (6.4)

µ(T^{n(B)}(B \ (E ∩ B))) ≤ γ µ(B \ (E ∩ B))/µ(B) = γ (µ(B) − µ(E ∩ B))/µ(B) < γ (1 − (1 − δ/γ)) = δ.

Since

µ(T^{n(B)}(B \ (E ∩ B))) ≥ µ( T^{n(B)}B \ T^{n(B)}(E ∩ B) ) ≥ µ(T^{n(B)}B) − µ(T^{n(B)}(E ∩ B)),

Figure 6.1. The first three iterations of the β-transformation from Example 6.2.2.

it follows using (ii) that

µ(T^{n(B)}E) ≥ µ(T^{n(B)}(E ∩ B)) ≥ µ(T^{n(B)}B) − µ(T^{n(B)}(B \ (E ∩ B))) > 1 − δ.

The increasingness of the sequence (µ(TⁿE)) then implies that lim_{n→∞} µ(TⁿE) = 1.

Example 6.2.2. Consider the β-transformation Tx = βx (mod 1) defined on the interval [0, 1), where β = (1 + √5)/2 is the golden mean. Recall the density of the invariant measure µ for T that is absolutely continuous with respect to the Lebesgue measure, given in (1.4). We use Proposition 6.2.3 to show that this map is exact. Note that T[0, 1/β) = [0, 1) and T[1/β, 1) = [0, 1/β). This implies that for each n ≥ 1 we can partition the interval [0, 1) into half open intervals J_n, such that for each of these intervals J_n either TⁿJ_n = [0, 1), or TⁿJ_n = [0, 1/β) and T^{n+1}J_n = [0, 1). See Figure 6.1 for the first three iterates of T. Write P_n = {J_n} for the collection of these intervals and let S = ⋃_{n≥0} P_n. If you recall the notion of fundamental intervals from Example 2.3.4 for the Lüroth map, the elements of the collection S are the fundamental intervals of T in this case. We check the three conditions from Proposition 6.2.3.

• For each n, µ(J_n) ≤ ((5 + 3√5)/10) · (1/βⁿ), and since this decreases to 0 as n → ∞, (i) of Proposition 6.2.3 is satisfied.

• For J_n ∈ P_n either µ(TⁿJ_n) = 1, and we put n(J_n) = n, or µ(T^{n+1}J_n) = 1, and we set n(J_n) = n + 1. This gives (ii).

• Finally, let J_n ∈ P_n and a measurable set E ⊆ J_n be given. Then

µ(T^{n(J_n)}E) ≤ ((5 + 3√5)/10) β^{n(J_n)} λ(E),

where λ denotes the one-dimensional Lebesgue measure. Since 1/β ∉ J_n, we have

µ(E)/µ(J_n) = β^{n(J_n)} λ(E),

so we obtain (6.4) with γ = (5 + 3√5)/10, and the exactness of T follows.
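For this map the Perron-Frobenius operator can even be iterated by hand. A density that is constant on [0, 1/β) and on [1/β, 1), with values (a, b), is mapped to the density with values ((a + b)/β, a/β); this two-dimensional reduction is our own computation, based on T[0, 1/β) = [0, 1) and T[1/β, 1) = [0, 1/β). Iterating it from the Lebesgue density f = 1 converges to the invariant density:

```python
# Iterate the Perron-Frobenius operator of the golden-mean beta-
# transformation on two-piece constant densities: value a on [0, 1/beta)
# and value b on [1/beta, 1).
import math

beta = (1 + math.sqrt(5)) / 2
a, b = 1.0, 1.0                       # start from f = 1
for _ in range(200):
    a, b = (a + b) / beta, a / beta

# The limit is the invariant density: (5 + 3*sqrt(5))/10 on [0, 1/beta)
# and (5 + sqrt(5))/10 on [1/beta, 1).
assert abs(a - (5 + 3 * math.sqrt(5)) / 10) < 1e-12
assert abs(b - (5 + math.sqrt(5)) / 10) < 1e-12
# The integral a/beta + b/beta**2 stays equal to 1 (Proposition 6.1.1(iii)).
assert abs(a / beta + b / beta ** 2 - 1) < 1e-12
```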

Using a change of variables formula we can in some cases get a nicer description of the Perron-Frobenius operator. Let (X, F, m, T) be a dynamical system. By non-singularity the measure m ◦ T⁻¹ is absolutely continuous with respect to m. Let φ: X → [0, ∞) be the Radon-Nikodym derivative φ := d(m ◦ T⁻¹)/dm. Then for any function f ∈ L⁰(X, F, m) that satisfies f ◦ T ∈ L¹(X, F, m) it holds that

∫_{T⁻¹A} f ◦ T dm = ∫_A f · φ dm for all A ∈ F.   (6.6)

This formula can be proved by checking the statement for indicator functions and then using the fact that any f can be approximated by simple functions. Note that in case T is measure preserving, we get φ = 1 and we obtain the result from Proposition 1.2.2.

Now assume that T: X → X is an invertible transformation on a measure space (X, F, m), such that the inverse T⁻¹ is also non-singular. Then by (6.6) we have for any A ∈ F that

∫_A P_T f dm = ∫_{T⁻¹A} f dm = ∫_A (f ◦ T⁻¹) · φ dm.

Hence,

P_T f = (f ◦ T⁻¹) · φ   m-a.e.   (6.7)

In case X ⊆ ℝᵈ, F is the Lebesgue σ-algebra and m is the d-dimensional Lebesgue measure, φ is just the determinant of the Jacobian matrix of T.

Example 6.2.3. The baker's transformation T from Exercise 1.3.1 is invertible. The Jacobian matrix of T is given by

( 2    0
  0  1/2 ).

Therefore φ ≡ 1 and by (6.7) the Perron-Frobenius operator becomes

P_T f(x, y) = f(x/2, 2y) 1_{[0,1/2)}(y) + f((x + 1)/2, 2y − 1) 1_{[1/2,1]}(y).


From this we immediately see that P_T 1 = 1, so that Lebesgue measure is invariant for T.

Exercise 6.2.3. Prove the change of variables formula from (6.6).

Exercise 6.2.4. Let (X, F, µ) be a probability space and T : X → X a transformation.

(a) Let ν be a probability measure on (X, F) that is equivalent to µ and let f be the Radon-Nikodym derivative f = dµ/dν. Let P_{T,µ} and P_{T,ν} denote the Perron-Frobenius operators of T with respect to the measures µ and ν, respectively. Prove that for any g ∈ L¹(X, F, µ),

\[
P_{T,\mu}\, g = \frac{P_{T,\nu}(f g)}{f}.
\]

(b) Let ν be a probability measure on (X, F) that is absolutely continuous with respect to µ. Assume that µ is T-invariant and that T is strongly mixing with respect to µ. Prove that

\[
\lim_{n\to\infty} \nu(T^{-n}A) = \mu(A)
\]

holds for any A ∈ F.
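The explicit formula of Example 6.2.3 lends itself to a numerical spot-check; the sketch below (our own illustration, not from the text) evaluates P_T f on a grid and confirms that P_T preserves integrals and fixes the constant function 1:

```python
def PT(f):
    # Perron-Frobenius operator of the baker's transformation (Example 6.2.3)
    def g(x, y):
        if y < 0.5:
            return f(x / 2.0, 2.0 * y)
        return f((x + 1.0) / 2.0, 2.0 * y - 1.0)
    return g

def integral(f, n=300):
    # midpoint rule on the unit square
    h = 1.0 / n
    return sum(f((i + 0.5) * h, (j + 0.5) * h)
               for i in range(n) for j in range(n)) * h * h

f = lambda x, y: x * x + y                 # arbitrary test function
print(integral(f), integral(PT(f)))        # both approximately 5/6
print(PT(lambda x, y: 1.0)(0.3, 0.7))      # P_T 1 = 1
```

The equality of the two integrals is exactly the defining property of the Perron-Frobenius operator, here checked up to quadrature error.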

6.3 PIECEWISE MONOTONE INTERVAL MAPS

Several of our examples concern maps on the interval that have an underlying partition, such that on each partition element the transformation is monotone and smooth. For this type of map the Perron-Frobenius operator can be written in a more manageable way and used to prove the existence of invariant measures that are absolutely continuous with respect to the Lebesgue measure. In this paragraph the state space will be ([0,1], B, λ) with B the Lebesgue σ-algebra and λ the Lebesgue measure. A collection {I_j}_{j∈Ω} of subintervals of [0,1], with Ω ⊆ ℕ, is called an interval partition of [0,1] if for each j, k ∈ Ω it holds that λ(I_j) > 0, λ(I_j ∩ I_k) = 0 if j ≠ k and λ(∪_{j∈Ω} I_j) = 1. We consider transformations in the following family.

Definition 6.3.1. A transformation T : [0,1] → [0,1] is called a piecewise C² monotone interval map if there is an interval partition {I_j}_{j∈Ω} of [0,1] with Ω a finite set, such that

(1) T_j := T|_{I_j} is C² and has a C² extension to the closure of I_j, and

92  A First Course in Ergodic Theory

(2) T′ ≠ 0 on the interior of I_j for any j ∈ Ω.

Such a T is non-singular with respect to λ, so the corresponding Perron-Frobenius operator is well defined.

Exercise 6.3.1. Prove that any piecewise C² monotone interval map is non-singular with respect to λ.

A piecewise C² monotone interval map T has a local inverse on any of the intervals I_j, so the change of variables formula (6.6) yields for any f ∈ L¹([0,1], B, λ) and A ∈ B that

\[
\int_A P_T f \, d\lambda = \sum_{j\in\Omega} \int_{T^{-1}A \cap I_j} f \, d\lambda
= \sum_{j\in\Omega} \int_{A \cap T I_j} \frac{f(T_j^{-1}x)}{|T'(T_j^{-1}x)|} \, d\lambda(x)
= \int_A \sum_{j\in\Omega} \frac{f(T_j^{-1}x)}{|T'(T_j^{-1}x)|} \, 1_{T I_j}(x) \, d\lambda(x).
\]

So, P_T f is given by

\[
P_T f(x) = \sum_{j\in\Omega} \frac{f(T_j^{-1}x)}{|T'(T_j^{-1}x)|} \, 1_{T I_j}(x). \tag{6.8}
\]
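Formula (6.8) translates directly into code. In the sketch below (an illustration; the branch encoding is our own choice) each branch is represented by its local inverse, the derivative T′, and the image interval T I_j, and the operator is tried on the doubling map Tx = 2x (mod 1):

```python
def transfer(branches):
    # P_T from formula (6.8); each branch j is a triple
    # (inverse T_j^{-1}, derivative T', image interval T I_j)
    def PT(f):
        def g(x):
            total = 0.0
            for inv, deriv, (lo, hi) in branches:
                if lo <= x < hi:          # indicator 1_{T I_j}(x)
                    y = inv(x)            # y = T_j^{-1} x
                    total += f(y) / abs(deriv(y))
            return total
        return g
    return PT

# branch data for the doubling map T x = 2x (mod 1)
doubling = [
    (lambda x: x / 2.0,         lambda y: 2.0, (0.0, 1.0)),  # branch on [0, 1/2)
    (lambda x: (x + 1.0) / 2.0, lambda y: 2.0, (0.0, 1.0)),  # branch on [1/2, 1)
]
PT = transfer(doubling)
g = PT(lambda x: 1.0)
print([g(k / 10) for k in range(10)])   # constant 1: Lebesgue measure is invariant
```

For the doubling map both branches have image [0, 1) and |T′| = 2, so P_T 1 = 1/2 + 1/2 = 1, confirming that Lebesgue measure is invariant.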

Example 6.3.1. Consider the logistic map T x = 4x(1 − x) on [0,1], see Figure 6.2(a) for the graph. This is a piecewise C² monotone interval map with interval partition {[0, 1/2], [1/2, 1]}. Note that it does not matter whether we include the endpoints in these intervals or not. By (6.8), for any f ∈ L¹([0,1], B, λ) we can write

\[
P_T f(x) = \frac{1}{4\sqrt{1-x}} \left( f\Big(\frac{1-\sqrt{1-x}}{2}\Big) + f\Big(\frac{1+\sqrt{1-x}}{2}\Big) \right).
\]

Now take for f the function given by f(x) = 1/√(x(1−x)). From the fact that 1 − (1 ± √(1−x))/2 = (1 ∓ √(1−x))/2 we obtain

\[
P_T \Big( \frac{1}{\sqrt{x(1-x)}} \Big)
= \frac{1}{4\sqrt{1-x}} \cdot \frac{2}{\sqrt{\big(\frac12 - \frac12\sqrt{1-x}\big)\big(\frac12 + \frac12\sqrt{1-x}\big)}}
= \frac{1}{2\sqrt{1-x}} \cdot \frac{1}{\sqrt{\frac14 - \frac14(1-x)}}
= \frac{1}{\sqrt{x(1-x)}}.
\]

Since ∫_{[0,1]} 1/√(x(1−x)) dλ(x) = [2 arcsin √x]₀¹ = π, it follows from Proposition 6.1.3 that the measure given by

\[
\mu(A) = \int_A \frac{1}{\pi\sqrt{x(1-x)}} \, d\lambda(x) \quad \text{for all } A \in B
\]

is a T-invariant measure that is absolutely continuous with respect to λ.

Figure 6.2 The logistic map from Example 6.3.1 in (a), the W-shaped map from Example 6.3.3 with a = 3/4, b = 2/3 and r = 1/5 in (b), and the α-continued fraction map with α = 7/10 from Exercise 6.3.3 in (c).
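The fixed-point identity of Example 6.3.1 can also be verified numerically; the following sketch (our own illustration) applies the logistic map's Perron-Frobenius operator to the candidate density h(x) = 1/(π√(x(1−x))) at a few sample points:

```python
import math

def h(x):
    # candidate invariant density from Example 6.3.1
    return 1.0 / (math.pi * math.sqrt(x * (1.0 - x)))

def PT_h(x):
    # Perron-Frobenius operator of the logistic map T x = 4x(1-x) applied to h
    s = math.sqrt(1.0 - x)
    return (h((1.0 - s) / 2.0) + h((1.0 + s) / 2.0)) / (4.0 * s)

samples = [0.1, 0.25, 0.5, 0.73, 0.9]
print([abs(PT_h(x) - h(x)) for x in samples])   # differences at floating-point level
```

The differences vanish up to rounding, in agreement with the algebraic computation in the example.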

Example 6.3.2. For the doubling map T x = 2x (mod 1) from Example 1.3.3 we get for any f ∈ L¹([0,1], B, λ) that

\[
P_T f(x) = \frac12 f\Big(\frac{x}{2}\Big) + \frac12 f\Big(\frac{x+1}{2}\Big).
\]

One immediately sees that the constant function 1 is a fixed point of P_T, so that λ is T-invariant. Note that for each n ≥ 1,

\[
P_T^n f(x) = \frac{1}{2^n} \sum_{k=0}^{2^n-1} f\Big(\frac{k}{2^n} + \frac{x}{2^n}\Big),
\]

i.e., P_T^n f takes the average value of f over all the points {k/2ⁿ + x/2ⁿ : 0 ≤ k ≤ 2ⁿ − 1}. Hence, P_T^n f converges uniformly to the Riemann integral of f as n → ∞. Then by Hölder's Inequality it also holds that

\[
\lim_{n\to\infty} \Big\| P_T^n f - \int_{[0,1]} f \, d\lambda \Big\|_1 = 0,
\]

which by Theorem 6.2.2 implies that T is exact.
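The averaging formula for P_T^n in Example 6.3.2 is easy to test numerically; the sketch below (our own illustration) evaluates the iterates at a fixed point and watches them approach ∫ f dλ:

```python
import math

def PTn(f, n):
    # n-th iterate of the doubling-map Perron-Frobenius operator (Example 6.3.2):
    # average of f over the 2^n points k/2^n + x/2^n
    N = 2 ** n
    return lambda x: sum(f(k / N + x / N) for k in range(N)) / N

f = lambda x: math.sin(2.0 * math.pi * x) ** 2   # integral over [0,1] equals 1/2
for n in (1, 4, 8, 12):
    print(n, PTn(f, n)(0.3))   # converges to 1/2 as n grows
```

Since the sample points become equidistributed, the iterates converge to the Riemann integral of f, exactly as the example asserts.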


The goal of this section is to prove the following theorem from 1973, which is due to A. Lasota and J. A. Yorke [34]. We call a piecewise C² monotone interval map T : [0,1] → [0,1] expanding if inf_{x∈[0,1]} |T′(x)| ≥ c > 1 for some constant c, where the infimum is taken over all points where T′ is defined.

Theorem 6.3.1 (Lasota and Yorke). Let T : [0,1] → [0,1] be a piecewise C² monotone expanding interval map. Then T admits an invariant probability measure that is absolutely continuous with respect to Lebesgue measure and has a density of bounded variation.

The theorem asserts more than just the existence of an invariant probability measure µ that is absolutely continuous with respect to Lebesgue measure. It also states that the density dµ/dλ of this measure is of bounded variation. Let a, b ∈ ℝ with a < b. The total variation of a function f : [a,b] → ℝ is defined by

\[
Var_{[a,b]}(f) = \sup_{a = x_0 < x_1 < \cdots < x_n = b} \; \sum_{i=1}^{n} |f(x_i) - f(x_{i-1})|,
\]

where the supremum is taken over all finite partitions a = x₀ < x₁ < ⋯ < xₙ = b of [a,b]. Write BV([0,1]) for the functions of bounded variation on [0,1], let {I_j}_{1≤j≤N} be the interval partition of T, and set θ(T) = inf |T′| > 1, the infimum again taken over the points where T′ is defined. The key estimate is the Lasota-Yorke inequality of Lemma 6.3.1: there is a constant L(T) > 0 such that

\[
Var_{[0,1]}(P_T f) \le \frac{2}{\theta(T)} \, Var_{[0,1]}(f) + L(T)\|f\|_1 \quad \text{for every } f \in BV([0,1]).
\]

Since T is piecewise C², there exists a K > 0 such that |T″| ≤ K. Hence,

\[
\Big| \frac{d}{dx} \frac{1}{T'(x)} \Big| = \frac{|T''(x)|}{(T'(x))^2} \le \frac{K}{\theta(T)^2}
\]

for all x where the derivatives are defined. From Proposition 12.7.2(ii) applied to g = f and h = 1/|T′| we get

\[
Var_{I_j}\Big(\frac{f}{|T'|}\Big) \le \frac{1}{\theta(T)} Var_{I_j}(f) + \int_{I_j} |f(x)| \Big| \frac{d}{dx}\frac{1}{T'(x)} \Big| \, d\lambda(x)
\le \frac{1}{\theta(T)} Var_{I_j}(f) + \frac{K}{\theta(T)^2} \int_{I_j} |f| \, d\lambda. \tag{6.12}
\]

Set η = min_{1≤j≤N} λ(I_j). Then

\[
\lambda(T I_j) \ge \theta(T)\,\lambda(I_j) \ge \theta(T)\,\eta. \tag{6.13}
\]

Combining (6.11), (6.12) and (6.13) with (6.10) yields

\[
Var_{[0,1]}(P_T f) \le \frac{2}{\theta(T)} \sum_{j=1}^{N} Var_{I_j}(f) + \frac{2}{\theta(T)} \sum_{j=1}^{N} \Big( \frac{K}{\theta(T)} + \frac{1}{\eta} \Big) \int_{I_j} |f| \, d\lambda.
\]

Since (I_j)_{1≤j≤N} is an interval partition, the statement holds with L(T) = (2/θ(T))(K/θ(T) + 1/η).

Proof of Theorem 6.3.1. We first use Lemma 6.3.1 to obtain an appropriate bound on the total variation of any iterate Tⁿ of T. If T is a piecewise C² monotone expanding interval map, then so is Tⁿ for any n. Since θ(Tⁿ) ≥ θ(T)ⁿ we can fix a k such that θ(T)ᵏ > 2. Then from the previous lemma we obtain for Tᵏ and any f ∈ BV([0,1]) that

\[
Var_{[0,1]}(P_T^k f) \le \rho \, Var_{[0,1]}(f) + L(T^k)\|f\|_1, \tag{6.14}
\]

where ρ = 2/θ(T)ᵏ ∈ (0,1) and L(Tᵏ) > 0. Let n > 0 be given and write n = mk + ℓ with 0 ≤ ℓ < k. By Lemma 6.3.1 we know that P_T^j f ∈ BV([0,1]) for any j. Repeated application of (6.14) together with Proposition 6.1.1(iv) and Proposition 6.1.2 gives that

\[
Var_{[0,1]}(P_T^n f) = Var_{[0,1]}\big(P_T^{mk}(P_T^{\ell} f)\big) \le \rho^m \, Var_{[0,1]}(P_T^{\ell} f) + L(T^k)\|f\|_1 \sum_{i=0}^{m-1} \rho^i.
\]

Applying Lemma 6.3.1 to P_T^ℓ f then leads to

\[
\begin{aligned}
Var_{[0,1]}(P_T^n f) &\le \rho^m \Big( \frac{2}{\theta(T^\ell)} Var_{[0,1]}(f) + L(T^\ell)\|f\|_1 \Big) + L(T^k) \sum_{i=0}^{m-1} \rho^i \, \|f\|_1 \\
&\le 2\rho^m \, Var_{[0,1]}(f) + L(T^\ell)\rho^m \|f\|_1 + L(T^k) \sum_{i=0}^{m-1} \rho^i \, \|f\|_1 \\
&\le 2\, Var_{[0,1]}(f) + C\|f\|_1,
\end{aligned}
\]

where we have used that ρ ∈ (0,1) and have set C = (max_{0≤ℓ≤k} L(T^ℓ))/(1 − ρ) > 0, which is independent of n.

Now fix a function f ∈ BV([0,1]) with f ≥ 0 and ∫_{[0,1]} f dλ = 1. Define the sequence of functions (fₙ) by

\[
f_n = \frac{1}{n} \sum_{i=0}^{n-1} P_T^i f.
\]

By the positivity of P_T we find fₙ ≥ 0 for each n, and since P_T preserves the integral also ‖fₙ‖₁ = ∫_{[0,1]} fₙ dλ = 1 for any n, so each fₙ is a probability density as well. Moreover,

\[
Var_{[0,1]}(f_n) \le \frac{1}{n} \sum_{i=0}^{n-1} Var_{[0,1]}(P_T^i f) \le 2\, Var_{[0,1]}(f) + C,
\]

and by Proposition 12.7.2(iii),

\[
\|f_n\|_\infty \le Var_{[0,1]}(f_n) + \int_{[0,1]} |f_n| \, d\lambda \le 2\, Var_{[0,1]}(f) + C + 1.
\]

Hence ‖fₙ‖∞ and Var_{[0,1]}(fₙ) can be bounded uniformly in n. By Helly's First Theorem we find a subsequence (f_{n_j}) of (fₙ) that converges pointwise to some h ∈ BV([0,1]). By Proposition 6.1.1(iv) it holds that

\[
\begin{aligned}
\|P_T h - h\|_1 &\le \|P_T h - P_T f_{n_j}\|_1 + \|P_T f_{n_j} - f_{n_j}\|_1 + \|f_{n_j} - h\|_1 \\
&\le 2\|f_{n_j} - h\|_1 + \Big\| \frac{1}{n_j} \sum_{i=0}^{n_j-1} P_T^{i+1} f - \frac{1}{n_j} \sum_{i=0}^{n_j-1} P_T^{i} f \Big\|_1 \\
&= 2\|f_{n_j} - h\|_1 + \frac{1}{n_j}\|P_T^{n_j} f - f\|_1 \\
&\le 2\|f_{n_j} - h\|_1 + \frac{2}{n_j}.
\end{aligned}
\]

Since sup_{j≥1} ‖f_{n_j}‖∞ < ∞, we can conclude from the Dominated Convergence Theorem that ∫_{[0,1]} h dλ = 1 and that lim_{j→∞} ‖f_{n_j} − h‖₁ = 0. So ‖P_T h − h‖₁ = 0 and hence P_T h = h λ-a.e. Moreover, h ≥ 0 and ∫_{[0,1]} h dλ = 1. Therefore h is the density of an absolutely continuous invariant probability measure for T.

Note that in the proof of Theorem 6.3.1 we started with an arbitrary probability density function f ∈ BV([0,1]) and then found that some subsequence of (fₙ) = ((1/n) Σ_{i=0}^{n−1} P_T^i f) converges to an invariant probability density function for T that is of bounded variation.

Example 6.3.3. Let 1/2 ≤ a, b ≤ 1 and 0 < r < 1/2 be three parameters and define the transformations W_{a,b,r} : [0,1] → [0,1] by

\[
W_{a,b,r}(x) =
\begin{cases}
a\big(1 - \frac{x}{r}\big), & \text{if } 0 \le x \le r,\\[2pt]
\frac{2b}{1-2r}\,(x - r), & \text{if } r < x \le \frac12,\\[2pt]
W_{a,b,r}(1-x), & \text{if } \frac12 < x \le 1.
\end{cases}
\]


See Figure 6.2(b) for a graph. Then W_{a,b,r} is a piecewise C² monotone interval map with partition {[0, r], [r, 1/2], [1/2, 1−r], [1−r, 1]}. Since a, b ≥ 1/2, the map W_{a,b,r} is expanding, so by Theorem 6.3.1 an invariant probability measure that is absolutely continuous with respect to λ exists.

Exercise 6.3.2. (a) Find an invariant probability measure that is absolutely continuous with respect to λ for the map W_{1/2,1/2,r} and prove that W_{1/2,1/2,r} is ergodic for any 0 < r < 1/2.

(b) Verify that the probability density h : [0,1] → ℝ given by h(x) = (2 − 2r) 1_{[0,1/2)}(x) + 2r 1_{[1/2,1)}(x) is a fixed point of P_{W_{1,1/2,r}} for any 0 < r < 1/2.
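The claim of Exercise 6.3.2(b) can be spot-checked numerically. The branch inverses below are our own reading of the definition of W_{a,b,r} with a = 1 and b = 1/2 (outer branches of slope magnitude 1/r mapping onto [0,1], middle branches of slope magnitude 1/(1−2r) mapping onto [0,1/2]); treat that branch structure as an assumption of this sketch:

```python
r = 0.2   # any parameter with 0 < r < 1/2

def h(x):
    # candidate fixed density from Exercise 6.3.2(b)
    return 2.0 - 2.0 * r if x < 0.5 else 2.0 * r

def PW(f):
    # Perron-Frobenius operator of W_{1,1/2,r} assembled via formula (6.8),
    # assuming the branch structure described in the lead-in
    def g(x):
        # outer branches: inverses x -> r(1-x) and x -> 1 - r(1-x), weight r
        total = r * f(r * (1.0 - x)) + r * f(1.0 - r * (1.0 - x))
        if x < 0.5:
            # middle branches only cover [0, 1/2); inverses y and 1-y, weight 1-2r
            y = r + (1.0 - 2.0 * r) * x
            total += (1.0 - 2.0 * r) * (f(y) + f(1.0 - y))
        return total
    return g

g = PW(h)
print([g(x) for x in (0.1, 0.3, 0.6, 0.9)])   # matches h: ≈ 1.6, 1.6, 0.4, 0.4
```

Under these assumptions the operator reproduces h at every sample point, consistent with the exercise.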

Example 6.3.4. Let β ∈ (1, 2) and let T : [0,1] → [0,1] be the β-transformation defined by T x = βx (mod 1) (compare Example 1.3.6). Then T is a piecewise C² monotone interval map and T is expanding. The Perron-Frobenius operator is given by

\[
P_T f(x) = \frac{1}{\beta} f\Big(\frac{x}{\beta}\Big) + \frac{1}{\beta} f\Big(\frac{x+1}{\beta}\Big)\, 1_{[0,\beta-1)}(x)
\]

for any f ∈ L¹([0,1], B, λ). Consider the probability density h : [0,1) → [0,∞) given by

\[
h(x) = \frac{1}{C} \sum_{n \ge 0} \frac{1}{\beta^n}\, 1_{[0,T^n 1)}(x),
\]

where C = ∫_{[0,1)} Σ_{n≥0} β^{−n} 1_{[0,Tⁿ1)} dλ is a normalizing constant. First suppose that x ∈ [0, β − 1). If Tⁿ1 < 1/β, then x/β ∈ [0, Tⁿ1) if and only if x ∈ [0, T^{n+1}1), so that

\[
1_{[0,T^n 1)}\Big(\frac{x}{\beta}\Big) + 1_{[0,T^n 1)}\Big(\frac{x+1}{\beta}\Big) = 1_{[0,T^n 1)}\Big(\frac{x}{\beta}\Big) = 1_{[0,T^{n+1} 1)}(x).
\]

Similarly, if Tⁿ1 ≥ 1/β, then (x+1)/β ∈ [0, Tⁿ1) if and only if x ∈ [0, T^{n+1}1), and

\[
1_{[0,T^n 1)}\Big(\frac{x}{\beta}\Big) + 1_{[0,T^n 1)}\Big(\frac{x+1}{\beta}\Big) = 1 + 1_{[0,T^n 1)}\Big(\frac{x+1}{\beta}\Big) = 1 + 1_{[0,T^{n+1} 1)}(x).
\]

This gives

\[
\begin{aligned}
P_T h(x) &= \frac{1}{\beta} h\Big(\frac{x}{\beta}\Big) + \frac{1}{\beta} h\Big(\frac{x+1}{\beta}\Big) \\
&= \frac{1}{C} \sum_{n\ge 0} \frac{1}{\beta^{n+1}} \Big( 1_{[0,T^n 1)}\Big(\frac{x}{\beta}\Big) + 1_{[0,T^n 1)}\Big(\frac{x+1}{\beta}\Big) \Big) \\
&= \frac{1}{C} \sum_{n\ge 0} \frac{1}{\beta^{n+1}} \Big( 1_{[0,T^{n+1} 1)}(x) + 1_{[\frac{1}{\beta},1)}(T^n 1) \Big) \\
&= \frac{1}{C} \sum_{n\ge 0} \frac{1}{\beta^{n+1}}\, 1_{[0,T^{n+1} 1)}(x) + \frac{1}{C} = h(x),
\end{aligned}
\]

where we have used that the β-expansion of 1 is given by the expression 1 = Σ_{n≥1} β^{−n} 1_{[1/β,1)}(T^{n−1}1). Now assume that x ∈ [β − 1, 1). Then (x+1)/β ∉ [0,1] and it still holds that x/β ∈ [0, Tⁿ1) if and only if x ∈ [0, T^{n+1}1) when Tⁿ1 < 1/β. We obtain

\[
P_T h(x) = \frac{1}{\beta} h\Big(\frac{x}{\beta}\Big) = \frac{1}{C} \sum_{n\ge 0} \frac{1}{\beta^{n+1}}\, 1_{[0,T^n 1)}\Big(\frac{x}{\beta}\Big)
= \frac{1}{C} \sum_{n\ge 0} \frac{1}{\beta^{n+1}} \Big( 1_{[0,T^{n+1} 1)}(x) + 1_{[\frac{1}{\beta},1)}(T^n 1) \Big) = h(x).
\]

So the measure µ on ([0,1), B) given by µ(A) = ∫_A h dλ for each A ∈ B is an absolutely continuous invariant probability measure for T.

Over the years the result of A. Lasota and J. A. Yorke was extended in many different directions. The condition that the number of intervals of monotonicity of T is finite can be relaxed to include also countable partitions (see [57]), and the condition that T needs to be expanding can be relaxed to include a countable number of points x ∈ [0,1] where T′(x) = 1, as long as such points x are not attracting (see e.g. [62, 66]), even though it might then no longer be possible to find an invariant probability measure. Here we just mention without proof (a simplified version of) the result by M. Rychlik on systems with countably many branches. The proof can be found in [57], see also [8, Theorem 5.4.1].

Theorem 6.3.2 (Rychlik). Let I₁, I₂, … be a countable collection of closed intervals with disjoint interiors such that ∪_{n=1}^∞ Iₙ = [0,1]. Let U ⊆ [0,1] be the open set consisting of the union of the interiors of the intervals I₁, I₂, … and S = [0,1] \ U. Let T : [0,1] → [0,1] be a transformation with the following properties:


(1) T|_U is continuous;

(2) for any interval Iₙ the restriction of T to the interior of Iₙ admits an extension to a homeomorphism from Iₙ to a subinterval of [0,1];

(3) the function g given by g(x) = 1/|T′(x)| for x ∈ U and g(x) = 0 for x ∈ S satisfies ‖g‖∞ < 1 and Var_{[0,1]}(g) < ∞.

Then there exists an invariant probability measure for T that is absolutely continuous with respect to λ.

To give an example of how to apply this result, we turn to the Gauss map from Example 1.3.7. This map is one specific member of a family of transformations introduced first by H. Nakada in [45] for α ∈ [1/2, 1] in 1981 and extended to the whole parameter interval [0,1] in 2008 in [39]. For any α ∈ [0,1] the α-continued fraction map T_α : [α−1, α] → [α−1, α] is defined by

\[
T_\alpha x =
\begin{cases}
\Big|\dfrac{1}{x}\Big| - \Big\lfloor \Big|\dfrac{1}{x}\Big| + 1 - \alpha \Big\rfloor, & \text{if } x \ne 0,\\[6pt]
0, & \text{if } x = 0.
\end{cases} \tag{6.15}
\]

We obtain the Gauss map for α = 1. The maps are called continued fraction maps because for each α ∈ [0,1] the map T_α can be used to generate continued fraction expansions of numbers in the interval [α−1, α], in a way very similar to what we described in Example 1.3.7; see Exercise 8.4.2.

Exercise 6.3.3. Use Theorem 6.3.2 to prove that for every α ∈ (0,1] the α-continued fraction map T_α has an invariant probability measure that is absolutely continuous with respect to the Lebesgue measure on [α−1, α].

There is much more that can be said about the Perron-Frobenius operator and the information it gives on the statistical properties of dynamical systems. We refer the interested reader to e.g. [33, 8, 4].

CHAPTER 7

Invariant Measures for Continuous Transformations

7.1 EXISTENCE

Suppose (X, d) is a compact metric space. The topology induced by the metric can be used to put a measure structure on X. Let B be the Borel σ-algebra, i.e., the smallest σ-algebra containing all the open sets. Let M(X) denote the collection of all Borel probability measures on X. There is a natural embedding of the space X in M(X) via the map x ↦ δₓ assigning to each x the Dirac measure δₓ concentrated at x. This implies that M(X) is non-empty. We summarize some of the properties of M(X) here. More details can be found in Section 12.6 from the Appendix.

• M(X) is a convex set, i.e., pµ + (1−p)ν ∈ M(X) whenever µ, ν ∈ M(X) and 0 ≤ p ≤ 1.

• Each element µ ∈ M(X) is regular: for each A ∈ B and each ε > 0 there exist an open set O_ε and a closed set C_ε with C_ε ⊆ A ⊆ O_ε and µ(O_ε \ C_ε) < ε (Theorem 12.6.1).

• A metric structure on M(X) is given by weak convergence: a sequence (µₙ) ⊆ M(X) converges weakly to a measure µ ∈ M(X) if

\[
\lim_{n\to\infty} \int_X f \, d\mu_n = \int_X f \, d\mu
\]

for any continuous f : X → ℂ. The space M(X) is compact under this notion of convergence (Theorem 12.6.4).

On X we consider the dynamics given by a continuous map T : X → X. Since B is generated by the open sets, T is measurable with respect to B. We have already seen some examples that fit this setup.

Example 7.1.1. Let k ≥ 1, X = {0,1,…,k−1}^ℕ or X = {0,1,…,k−1}^ℤ. On X we can consider the metric d defined by

\[
d(x,y) = d\big((x_n),(y_n)\big) = 2^{-\min\{i \ge 1 \,:\, x_i \ne y_i\}} \tag{7.1}
\]

for the one-sided sequences and

\[
d(x,y) = d\big((x_n),(y_n)\big) = 2^{-\min\{|i| \ge 0 \,:\, x_i \ne y_i\}}
\]

for the two-sided ones. This metric induces the product topology on X obtained from taking the discrete topology on the set {0,1,…,k−1}. Open sets are unions of sets of the form {x ∈ X : x_i ∈ U_i for all i}, where U_i ⊆ {0,1,…,k−1} and U_i ≠ {0,1,…,k−1} for only finitely many i. A basis for this topology is provided by the collection S of all sets that specify a finite number of coordinates. As a product of compact spaces, X is compact. Two examples of transformations defined on (X, d) are the left shift from Example 1.3.9 and the binary odometer in case k = 2 (see Example 1.3.11). Both maps are continuous, since the pre-image of any set from S is a union of sets in S. In case X = {0,1}^ℕ and T is the binary odometer, or X = {0,1,…,k−1}^ℤ and T is the left shift, these maps are homeomorphisms.

Example 7.1.2. Recall Arnold's cat map T : 𝕋² → 𝕋² from Example 1.3.8, defined on the two-dimensional torus by an application of the matrix

\[
A = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix},
\]

so T(e^{2πix}, e^{2πiy}) = (e^{2πi(2x+y)}, e^{2πi(x+y)}). The metric on 𝕋² is the Euclidean distance on arcs. We already noticed that T is invertible, and clearly it is continuous, since it is basically a linear transformation. Since the same holds for T⁻¹, T is a homeomorphism.


A continuous map T : X → X on a compact metric space (X, d) induces in a natural way an operator T : M(X) → M(X) given by

\[
(T\mu)(A) = \mu(T^{-1}A), \quad A \in B.
\]

Then T^iµ(A) = µ(T^{−i}A). Using a standard argument, one can easily show that

\[
\int_X f \, d(T\mu) = \int_X f \circ T \, d\mu
\]

for any integrable function f on X. Note that T is measure preserving with respect to µ ∈ M(X) if and only if Tµ = µ. Since X is a compact metric space, measures from M(X) are uniquely determined by how they integrate continuous functions, see Theorem 12.6.2. Hence, µ ∈ M(X) is measure preserving if and only if

\[
\int_X f \, d\mu = \int_X f \circ T \, d\mu
\]

for all continuous functions f on X. Let M(X,T) = {µ ∈ M(X) : Tµ = µ} be the collection of all T-invariant Borel probability measures. The first question we would like to answer is whether M(X,T) ≠ ∅.

Theorem 7.1.1. Let T : X → X be continuous, and (νₙ) a sequence in M(X). Define a new sequence (µₙ) in M(X) by

\[
\mu_n = \frac{1}{n} \sum_{i=0}^{n-1} T^i \nu_n.
\]

Then any weak limit point µ of (µₙ) is an element of M(X,T).

Proof. First note that by the compactness of M(X), the sequence (µₙ) has a convergent subsequence (µ_{n_j}) with limit point µ ∈ M(X). For any

continuous function f on X,

\[
\begin{aligned}
\Big| \int_X f \circ T \, d\mu - \int_X f \, d\mu \Big|
&= \lim_{j\to\infty} \Big| \int_X f \circ T \, d\mu_{n_j} - \int_X f \, d\mu_{n_j} \Big| \\
&= \lim_{j\to\infty} \frac{1}{n_j} \Big| \int_X \sum_{i=0}^{n_j-1} \big( f(T^{i+1}x) - f(T^i x) \big) \, d\nu_{n_j}(x) \Big| \\
&= \lim_{j\to\infty} \frac{1}{n_j} \Big| \int_X \big( f(T^{n_j}x) - f(x) \big) \, d\nu_{n_j}(x) \Big| \\
&\le \lim_{j\to\infty} \frac{2\|f\|_\infty}{n_j} = 0.
\end{aligned}
\]

Therefore ∫_X f dµ = ∫_X f ∘ T dµ and µ ∈ M(X,T).
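Theorem 7.1.1 can be illustrated with exact arithmetic. In the sketch below (our own example, not from the text) we take the doubling map Tx = 2x (mod 1), viewed on the circle, and νₙ = δₓ for the periodic point x = 1/3; the empirical measures µₙ then stabilize, and the limit is visibly T-invariant:

```python
from fractions import Fraction

def T(x):
    # doubling map T x = 2x mod 1; Fraction keeps the orbit exact
    return (2 * x) % 1

x = Fraction(1, 3)
orbit = []
for _ in range(1000):
    orbit.append(x)
    x = T(x)

def mu(f):
    # integral of f against the empirical measure mu_n = (1/n) sum_i delta_{T^i x}
    return sum(f(p) for p in orbit) / len(orbit)

f = lambda p: p * p
print(mu(f), mu(lambda p: f(T(p))))   # both 5/18: the empirical measure is T-invariant
```

Here the orbit is periodic (1/3 ↦ 2/3 ↦ 1/3), so the empirical measure is (δ_{1/3} + δ_{2/3})/2, an invariant measure supported on the orbit; for non-periodic starting points one only gets invariance of weak limit points, as the theorem states.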

For example, to find a T-invariant measure, one can start with the Dirac measure δₓ concentrated at any point x ∈ X and then look at the sequence (µₙ) given by µₙ = (1/n) Σ_{i=0}^{n−1} δ_{T^i x}. From the above theorem any limit point of a convergent subsequence of (µₙ) is an element of M(X,T).

The next result characterizes the ergodic members of M(X,T).

Theorem 7.1.2. Let T be a continuous transformation on a compact metric space X. Then the following hold.

(i) M(X,T) is a compact convex subset of M(X).

(ii) µ ∈ M(X,T) is an extreme point (i.e., µ cannot be written in a non-trivial way as a convex combination of elements of M(X,T)) if and only if T is ergodic with respect to µ.

Proof. (i) Clearly M(X,T) is convex. Now let (µₙ) be a sequence in M(X,T) converging to µ ∈ M(X). We need to show that µ ∈ M(X,T). Since T is continuous, for any continuous function f on X the function f ∘ T is also continuous. Hence,

\[
\int_X f \circ T \, d\mu = \lim_{n\to\infty} \int_X f \circ T \, d\mu_n = \lim_{n\to\infty} \int_X f \, d\mu_n = \int_X f \, d\mu.
\]

Therefore T is measure preserving with respect to µ, and µ ∈ M(X,T).

(ii) Suppose T is ergodic with respect to µ. If µ can be written as µ = pν + (1−p)ρ for some ν, ρ ∈ M(X,T) and some 0 < p ≤ 1, then ν is absolutely continuous with respect to µ. It then follows from Theorem 3.1.2 that ν = µ.

Conversely, suppose that T is not ergodic with respect to µ. Then there exists a measurable set A such that T⁻¹A = A and 0 < µ(A) < 1. Define measures µ₁, µ₂ on X by

\[
\mu_1(B) = \frac{\mu(B \cap A)}{\mu(A)} \quad\text{and}\quad \mu_2(B) = \frac{\mu\big(B \cap (X \setminus A)\big)}{\mu(X \setminus A)}.
\]

Since A and X \ A are T-invariant sets, we have µ₁, µ₂ ∈ M(X,T), and µ₁ ≠ µ₂ since µ₁(A) = 1 while µ₂(A) = 0. Furthermore, for any measurable set B, µ(B) = µ(A)µ₁(B) + (1−µ(A))µ₂(B), i.e., µ is a non-trivial convex combination of elements of M(X,T). Thus, µ is not an extreme point of M(X,T).

Recall that in the case of a general finite measure preserving system (X, F, µ, T), the Pointwise Ergodic Theorem allows us to find for each f ∈ L¹(X, F, µ) a measurable set X_f with µ(X_f) = 1 such that

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu
\]

for each x ∈ X_f. The set X_f might depend on the function f (and does in most cases). When the underlying space X is a compact metric space and T : X → X is continuous, it is possible to find a single set Y of full measure that works for all continuous functions f. The proof uses the fact that the Banach space C(X) of all complex valued continuous functions on X (under the supremum norm ‖f‖∞ = sup_{x∈X} |f(x)|) is separable, i.e., C(X) has a countable dense subset.

Theorem 7.1.3. Let T : X → X be continuous and µ ∈ M(X,T) ergodic. Then there exists a measurable set Y with µ(Y) = 1 and such that

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu
\]

for all x ∈ Y and f ∈ C(X).


Proof. Since C(X) is separable, we can choose a countable dense subset (g_m) in C(X). By the Pointwise Ergodic Theorem, for each m there exists a subset Y_m with µ(Y_m) = 1 and

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} g_m(T^i x) = \int_X g_m \, d\mu
\]

for all x ∈ Y_m. Let Y = ∩_{m=1}^∞ Y_m; then µ(Y) = 1, and

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} g_m(T^i x) = \int_X g_m \, d\mu
\]

for all m and all x ∈ Y. Now let f ∈ C(X), x ∈ Y and ε > 0 be given. Then there is an m such that ‖f − g_m‖∞ < ε/3. Moreover, we can find an N such that for each n ≥ N,

\[
\Big| \frac{1}{n} \sum_{i=0}^{n-1} g_m(T^i x) - \int_X g_m \, d\mu \Big| < \frac{\varepsilon}{3},
\]

and hence,

\[
\begin{aligned}
\Big| \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) - \int_X f \, d\mu \Big|
&\le \frac{1}{n} \sum_{i=0}^{n-1} \big| f(T^i x) - g_m(T^i x) \big| \\
&\quad + \Big| \frac{1}{n} \sum_{i=0}^{n-1} g_m(T^i x) - \int_X g_m \, d\mu \Big| \\
&\quad + \Big| \int_X g_m \, d\mu - \int_X f \, d\mu \Big| < \varepsilon.
\end{aligned}
\]

This gives the result.

Theorem 7.1.4. Let X be a compact metric space and B the Borel σ-algebra on X. Let T : X → X be continuous, and µ ∈ M(X,T). Then T is ergodic with respect to µ if and only if for µ-a.e. x one has

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \delta_{T^i x} = \mu.
\]

Proof. Suppose T is ergodic with respect to µ. Notice that for any f ∈ C(X),

\[
\int_X f \, d(\delta_{T^i x}) = f(T^i x).
\]

Hence by Theorem 7.1.3, there exists a measurable set Y with µ(Y) = 1 such that

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \int_X f \, d(\delta_{T^i x}) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu
\]

for all x ∈ Y and f ∈ C(X). So, for all x ∈ Y the sequence ((1/n) Σ_{i=0}^{n−1} δ_{T^i x}) converges weakly to µ.

Conversely, let Y ∈ B be a set with µ(Y) = 1 and such that (1/n) Σ_{i=0}^{n−1} δ_{T^i x} → µ for all x ∈ Y. Then for any f ∈ C(X) and any h ∈ L¹(X, B, µ) one has for any x ∈ Y,

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x)\, h(x) = h(x) \int_X f \, d\mu.
\]

By the Dominated Convergence Theorem,

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} \int_X f(T^i x)\, h(x) \, d\mu(x) = \int_X h \, d\mu \int_X f \, d\mu. \tag{7.2}
\]

Z

i

Z

f (T x)H(x) dµ(x) = X

Z

H dµ X

f dµ X

for all f ∈ C(X). Let ε > 0, since C(X) is dense in L2 (X, B, µ) (under the L2 -norm),Rthere exists a g ∈ C(X) such that kF − gk2 < ε. This R implies that | F dµ − g dµ| ≤ kF − gk1 < ε. Furthermore, by (7.2) there exists an N so that for n ≥ N one has Z Z Z X 1 n−1 i g(T x)H(x) dµ(x) − H dµ g dµ < ε. X n X X i=0

Thus, for n ≥ N one has using H¨older’s Inequality and Proposition 1.2.3

108  A First Course in Ergodic Theory

that Z

X

X 1 n−1 F (T i x)H(x) dµ(x) − n i=0



Z X

Z

Z

H dµ X

X



F dµ

X 1 n−1 F (T i x) − g(T i x) |H(x)| dµ(x) n i=0

Z Z Z X 1 n−1 i + g(T x)H(x) dµ(x) − H dµ g dµ X n X X i=0 Z Z Z Z + g dµ H dµ − F dµ H dµ X

X

X

X

< εkHk2 + ε + εkHk2 . It follows that X 1 n−1 n→∞ n i=0

Z

lim

F (T i x)H(x) dµ(x) =

X

Z

Z

H dµ X

F dµ X

for all F, H ∈ L²(X, B, µ). Taking F and H to be indicator functions, one gets that T is ergodic.

Exercise 7.1.1. Let X be a compact metric space and T : X → X a homeomorphism. Let x ∈ X be a periodic point of T of period n, i.e., Tⁿx = x and T^j x ≠ x for j < n. Show that if µ ∈ M(X,T) is ergodic and µ({x}) > 0, then µ = (1/n) Σ_{i=0}^{n−1} δ_{T^i x}.

We end this section with a short discussion showing that the assumption of ergodicity is not very restrictive. In case (X, d) is a compact metric space, we have seen that M(X,T) is a compact convex subset of the convex set M(X) and that the extreme points of M(X,T) are precisely the ergodic T-invariant Borel probability measures. Denote this set by M_E(X,T). Any f ∈ C(X) defines a continuous linear functional on M(X) (with respect to the weak topology) by f(µ) = ∫_X f dµ. Now recall Choquet's Theorem, see Theorem 12.6.6. If we simply take Y = M(X,T) and V = M(X) and let µ ∈ M(X,T), then Theorem 12.6.6 gives the existence of a probability measure ν on M(X,T) with the property that for any f ∈ C(X),

\[
\int_X f \, d\mu = \int_{M_E(X,T)} \Big( \int_X f \, d\mu_e \Big) \, d\nu(\mu_e).
\]


In other words, the integral of f with respect to µ can be decomposed into integrals with respect to the ergodic members of M(X,T). This is a special case of a more general theorem, called the Ergodic Decomposition, which we state below. For the proof we refer to e.g. [17] or [63].

Theorem 7.1.5 (Ergodic Decomposition). Let T : X → X be a measure preserving transformation on a Borel probability space (X, F, µ). Then there exists a Borel probability space (Z, G, ν) and an injective measurable map z ↦ µ_z such that

(i) µ_z is a T-invariant and ergodic probability measure on X for ν-a.e. z ∈ Z, and

(ii) µ = ∫_Z µ_z dν(z).

7.2 UNIQUE ERGODICITY AND UNIFORM DISTRIBUTION

Examining Theorem 7.1.3, one sees that for µ ∈ M(X,T) the set Y (associated with µ) can be taken to be T-invariant, since the limit of the ergodic averages is a T-invariant function. So it is feasible that Y^c (which is T-invariant as well) is the support of some other measure ν ∈ M(X,T) such that lim_{n→∞} (1/n) Σ_{i=0}^{n−1} f(T^i x) = ∫_X f dν for all f ∈ C(X) and all x ∈ Y^c. So what can we say in the case X = Y? This leads us to the notion of unique ergodicity, which will be discussed below.

A continuous transformation T : X → X on a compact metric space (X, d) is called uniquely ergodic if there is only one T-invariant probability measure µ on X. In this case M(X,T) = {µ}, and µ is necessarily ergodic, since µ is an extreme point of M(X,T). Recall that if ν ∈ M(X,T) is ergodic, then there exists a measurable subset Y such that ν(Y) = 1 and

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\nu
\]

for all x ∈ Y and all f ∈ C(X). When T is uniquely ergodic we have the following much stronger result.

Theorem 7.2.1. Let T : X → X be a continuous transformation on a compact metric space (X, d). Then the following are equivalent:

(i) For every f ∈ C(X), the sequence ((1/n) Σ_{j=0}^{n−1} f(T^j x)) converges uniformly to a constant.


(ii) For every f ∈ C(X), the sequence ((1/n) Σ_{j=0}^{n−1} f(T^j x)) converges pointwise to a constant.

(iii) There exists a µ ∈ M(X,T) such that for every f ∈ C(X) and all x ∈ X,

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu.
\]

(iv) T is uniquely ergodic.

Proof. (i) ⇒ (ii) is immediate.

(ii) ⇒ (iii) Define L : C(X) → ℂ by

\[
L(f)(x) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x).
\]

By assumption L(f) is a constant function. It is easy to see that L is linear, continuous (|L(f)| ≤ ‖f‖∞), positive and satisfies L(1) = 1. Thus, by the Riesz Representation Theorem (Theorem 12.6.3) there exists a probability measure µ ∈ M(X) such that

\[
L(f) = \int_X f \, d\mu
\]

for all f ∈ C(X). But

\[
L(f \circ T) = \lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^{i+1} x) = L(f).
\]

Hence,

\[
\int_X f \circ T \, d\mu = \int_X f \, d\mu.
\]

Thus µ ∈ M(X,T), and for every f ∈ C(X),

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu
\]

for all x ∈ X.

(iii) ⇒ (iv) Suppose µ ∈ M(X,T) is such that for every f ∈ C(X),

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) = \int_X f \, d\mu
\]


holds for all x ∈ X. Assume ν ∈ M(X,T); we will show that µ = ν. For any f ∈ C(X), since the sequence ((1/n) Σ_{j=0}^{n−1} f(T^j x)) converges pointwise to the constant function ∫_X f dµ, and since each term of the sequence is bounded in absolute value by the constant ‖f‖∞ = sup_{x∈X} |f(x)|, it follows by the Dominated Convergence Theorem that

\[
\lim_{n\to\infty} \int_X \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) \, d\nu(x) = \int_X \Big( \int_X f \, d\mu \Big) d\nu = \int_X f \, d\mu.
\]

But for each n,

\[
\int_X \frac{1}{n} \sum_{i=0}^{n-1} f(T^i x) \, d\nu(x) = \frac{1}{n} \sum_{i=0}^{n-1} \int_X f(T^i x) \, d\nu(x) = \int_X f \, d\nu.
\]

Thus ∫_X f dµ = ∫_X f dν, and µ = ν.

(iv) ⇒ (i) The proof is done by contradiction. Assume M(X,T) = {µ} and suppose h ∈ C(X) is such that the sequence ((1/n) Σ_{j=0}^{n−1} h ∘ T^j) does not converge uniformly to ∫_X h dµ. Then there exists an ε > 0 such that for each M there exist n > M and yₙ ∈ X such that

\[
\Big| \frac{1}{n} \sum_{j=0}^{n-1} h(T^j y_n) - \int_X h \, d\mu \Big| \ge \varepsilon.
\]

Let

\[
\mu_n = \frac{1}{n} \sum_{j=0}^{n-1} \delta_{T^j y_n} = \frac{1}{n} \sum_{j=0}^{n-1} T^j \delta_{y_n}.
\]

Then,

\[
\Big| \int_X h \, d\mu_n - \int_X h \, d\mu \Big| \ge \varepsilon.
\]

Since M(X) is compact, there exists a subsequence (µ_{n_i}) converging to some ν ∈ M(X). Hence,

\[
\Big| \int_X h \, d\nu - \int_X h \, d\mu \Big| \ge \varepsilon.
\]

By Theorem 7.1.1, ν ∈ M(X,T) and by unique ergodicity µ = ν, which is a contradiction.

Example 7.2.1. Let θ ∈ [0,1) \ ℚ and T_θ(x) = x + θ (mod 1) the corresponding irrational rotation. Then T_θ is uniquely ergodic. This is a consequence of the above theorem as well as Weyl's Theorem on uniform distribution: for any Riemann integrable function f on [0,1) and any x ∈ [0,1), one has

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{i=0}^{n-1} f\big(x + i\theta - \lfloor x + i\theta \rfloor\big) = \int_0^1 f(y) \, dy.
\]

We will prove this statement using Theorem 7.2.1(ii). In order to use the results of this section we need to look at the continuous version of T_θ. So consider the probability space (S¹, B, m), where S¹ is the unit circle with the Borel σ-algebra and m is the normalized Haar (or Lebesgue) measure. On it we consider the map R_θ, which is the rotation by e^{2πiθ}, see also Example 1.3.2. Since R_θ is an isometry, it is a homeomorphism. To show that R_θ is uniquely ergodic, first note that the trigonometric polynomials p(x) = Σ_{n=−N}^{N} cₙ e^{2πinx} are dense in C([0,1)) (under the supremum norm), so it is enough to consider continuous functions of the form f(x) = e^{2πikx} for some fixed k. Now for any x and any n,

\[
\frac{1}{n} \sum_{j=0}^{n-1} f(R_\theta^j x) = \frac{1}{n} \sum_{j=0}^{n-1} e^{2\pi i k (x + j\theta)} = e^{2\pi i k x} \, \frac{1}{n} \sum_{j=0}^{n-1} \big( e^{2\pi i k \theta} \big)^j.
\]

Thus,

\[
\frac{1}{n} \sum_{j=0}^{n-1} f(R_\theta^j x) =
\begin{cases}
1, & \text{if } k = 0,\\[4pt]
\dfrac{e^{2\pi i k x}}{n} \cdot \dfrac{e^{2\pi i k n \theta} - 1}{e^{2\pi i k \theta} - 1}, & \text{if } k \ne 0.
\end{cases}
\]

Taking limits we have

\[
\lim_{n\to\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(R_\theta^j x) =
\begin{cases}
1, & \text{if } k = 0,\\
0, & \text{if } k \ne 0,
\end{cases}
\;=\; \int_{S^1} f \, dm.
\]

This shows that R_θ is uniquely ergodic, and the same holds for T_θ.
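The computation of Example 7.2.1 is easily reproduced numerically; in the sketch below θ = √2 − 1 is an arbitrarily chosen irrational (our choice, not from the text), and the Birkhoff averages of f(x) = e^{2πikx} are evaluated directly:

```python
import cmath
import math

theta = math.sqrt(2.0) - 1.0   # a concrete irrational rotation number (our choice)

def birkhoff_avg(k, x, n):
    # (1/n) sum_{j<n} f(R_theta^j x) for the character f(x) = e^{2 pi i k x}
    return sum(cmath.exp(2j * math.pi * k * (x + j * theta)) for j in range(n)) / n

for k in (0, 1, 5):
    print(k, abs(birkhoff_avg(k, 0.1, 100_000)))   # 1 for k = 0, near 0 otherwise
```

For k ≠ 0 the geometric-series bound from the example gives |average| ≤ 2/(n |e^{2πikθ} − 1|), which is what the decay of the printed values reflects.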

Invariant Measures for Continuous Transformations  113

of the first digit sequence S. The asymptotic relative frequency of the digit k is then limn→∞ pkn(n) . We want to identify this limit for each k ∈ {1, 2, . . . , 9}. It looks like an ergodic average, but what is the underlying transformation? We analyze the sequence S further. The first digit of 2i is k if and only if k · 10r ≤ 2i < (k + 1) · 10r for some r ≥ 0. In this case, r + log10 k ≤ i log10 2 < r + log10 (k + 1). This shows that r = bi log10 2c, and log10 k ≤ i log10 2 − bi log10 2c < log10 (k + 1). Let Jk = [log10 k, log10 (k + 1)) and θ = log10 2. Then, θ is irrational and Tθi 0 = i log10 2 − bi log10 2c. Using this, we see that the first digit of 2i is k if and only if Tθi 0 ∈ Jk , and so X pk (n) 1 n−1 1Jk (Tθi 0). = lim n→∞ n→∞ n n i=0

lim

We cannot apply Theorem 7.2.1 directly because the function 1Jk is not continuous, but this problem can be solved easily by squeezing 1Jk between two continuous functions. In fact we need to use the continuous version Rθ of Tθ , but to keep the notation simple we keep using the space ([0, 1), B, λ, Tθ ). Let ε > 0, we can find two continuous functions f1 , f2 R such that f1 ≤ 1Jk ≤ f2 and (f2 − f1 ) dλ < ε. Then, lim sup n→∞

X X 1 n−1 1 n−1 1Jk (Tθi 0) ≤ lim f2 (Tθi 0) n→∞ n n i=0 i=0 Z

=

f2 dλ

< λ(Jk ) + ε   k+1 = log10 + ε, k

114  A First Course in Ergodic Theory

and

\[ \liminf_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{J_k}(T_\theta^i 0) \ge \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} f_1(T_\theta^i 0) = \int f_1 \, d\lambda > \lambda(J_k) - \varepsilon = \log_{10}\Big(\frac{k+1}{k}\Big) - \varepsilon. \]

This shows that

\[ \lim_{n\to\infty} \frac{p_k(n)}{n} = \lim_{n\to\infty} \frac{1}{n}\sum_{i=0}^{n-1} 1_{J_k}(T_\theta^i 0) = \log_{10}\Big(\frac{k+1}{k}\Big). \]
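The limit can be checked empirically. Below is a minimal sketch (plain Python, no external packages; the helper name is ours, not from the text) that tallies the first digits of 2^n and compares the relative frequencies with log_10((k+1)/k):

```python
import math

def first_digit_frequencies(n_terms):
    """Relative frequency of each first decimal digit 1..9 among 2^0, ..., 2^(n_terms-1)."""
    counts = {k: 0 for k in range(1, 10)}
    value = 1
    for _ in range(n_terms):
        counts[int(str(value)[0])] += 1
        value *= 2
    return {k: c / n_terms for k, c in counts.items()}

freqs = first_digit_frequencies(10000)
for k in range(1, 10):
    print(k, round(freqs[k], 4), round(math.log10((k + 1) / k), 4))
```

For 10000 terms the empirical frequencies already lie very close to log_10((k+1)/k), reflecting the unique ergodicity of the rotation by log_10 2.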


Exercise 7.2.1. Let X be a compact metric space, and B the Borel σ-algebra on X. Let T : X → X be a continuous transformation. Let N ≥ 1 and x ∈ X.

(a) Show that T^N x = x if and only if \frac{1}{N}\sum_{i=0}^{N-1} \delta_{T^i x} ∈ M(X, T).

(b) Suppose X = {1, 2, . . . , N} and T i = i + 1 (mod N). Show that T is uniquely ergodic. Determine the unique ergodic measure.

Exercise 7.2.2. Let X be a compact metric space, B the Borel σ-algebra on X and T : X → X a uniquely ergodic continuous transformation. Let µ be the unique ergodic measure and assume that µ(G) > 0 for all non-empty open sets G ⊆ X.

(a) Show that for each non-empty open subset G of X there exists a continuous function f ∈ C(X) and a closed subset F of G of positive µ-measure such that f(x) = 1 for x ∈ F, f(x) > 0 for x ∈ G and f(x) = 0 for x ∈ X \ G.

(b) Show that for each x ∈ X and for every non-empty open set G ⊆ X, there exists an n ≥ 0 such that T^n x ∈ G. Conclude that {T^n x : n ≥ 0} is dense in X.

Exercise 7.2.3. Consider the measure space ([0, 1], B), where B is the Borel σ-algebra on [0, 1]. Let T : [0, 1] → [0, 1] be given by T x = x², and assume that T is measure preserving with respect to some probability measure µ on B. Show that µ = δ_0, the Dirac measure concentrated at the point 0.


Exercise 7.2.4. In Example 1.3.11 the binary odometer T : {0, 1}^N → {0, 1}^N was defined by T(1, 1, 1, . . .) = (0, 0, 0, . . .) and, for n ≥ 1,

T(1, . . . , 1, 0, x_{n+1}, . . .) = (0, . . . , 0, 1, x_{n+1}, . . .)

(with n − 1 leading 1's on the left and n − 1 leading 0's on the right), on the underlying space {0, 1}^N equipped with the σ-algebra C generated by the cylinder sets, and the uniform Bernoulli measure µ_{1/2}. We already know from Example 1.3.11 that T is measure preserving with respect to µ_{1/2}. Prove that T is uniquely ergodic.

7.3 SOME TOPOLOGICAL DYNAMICS

One can study the dynamics of a continuous transformation T : X → X on a compact metric space without referring to an invariant measure. In this section we introduce some concepts from topological dynamics that have clear analogues in the setting of ergodic theory. The lack of a measure means that we cannot have a.e. results; instead the statements will have to hold for all elements of the compact metric space (X, d).

Definition 7.3.1. A continuous transformation T : X → X is called (one-sided) topologically transitive if for some x ∈ X the (forward) orbit {T^n x : n ≥ 0} is dense in X. A homeomorphism T : X → X is called topologically transitive if for some x ∈ X the (two-sided) orbit {T^n x : n ∈ Z} is dense in X.

Write O_T(x) for the appropriate orbit of x under T, so O_T(x) = {T^n x : n ∈ Z} if T is a homeomorphism and O_T(x) = {T^n x : n ≥ 0} if T is continuous and not invertible. The following theorem summarizes some equivalent definitions of topological transitivity. The analogues for non-invertible continuous maps can be found in e.g. [65].

Theorem 7.3.1. Let T : X → X be a homeomorphism of a compact metric space (X, d). Then the following are equivalent.

(i) T is topologically transitive.

(ii) If A ⊆ X is a closed set satisfying T A = A, then either A = X or A has empty interior (i.e., X \ A is dense in X).

(iii) For any pair U, V ⊆ X of non-empty open sets there exists an n ∈ Z such that T^n U ∩ V ≠ ∅.


Proof. Recall that O_T(x) is dense in X if and only if for every non-empty open set U in X, U ∩ O_T(x) ≠ ∅.

(i) ⇒ (ii) Let x ∈ X be such that \overline{O_T(x)} = X, where \overline{O_T(x)} denotes the closure of O_T(x), and let A ⊆ X be a closed set with T A = A. Suppose A does not have empty interior. Then there is a non-empty open set U ⊆ A and a p ∈ Z such that T^p x ∈ U. Since T^n A = A for all n ∈ Z we get O_T(x) ⊆ A. Since A is closed, this gives X = \overline{O_T(x)} ⊆ A, so A = X.

(ii) ⇒ (iii) Let U, V be two non-empty open subsets of X. Write W = ∪_{n∈Z} T^n U. Then W is open, non-empty and T W = W. Hence A = X \ W is closed, T A = A and A ≠ X. By (ii), A has empty interior, so W is dense in X. Thus W ∩ V ≠ ∅, which implies (iii).

(iii) ⇒ (i) By Theorem 12.1.1(iv) there exists a countable basis U = {U_k}_{k≥1} for the topology of X. Then

\[ \{x \in X : \overline{O_T(x)} = X\} = \bigcap_{k=1}^{\infty} \bigcup_{m \in \mathbb{Z}} T^m U_k. \tag{7.3} \]

One inclusion is clear. For the other inclusion note that for any y ∈ X any neighborhood of y contains an element U_n ∈ U and thus also an element T^m x ∈ O_T(x). Now fix a k ≥ 1 and set U = ∪_{m=−∞}^{∞} T^m U_k. Then U is open, non-empty and T U = U. By (iii), for any open non-empty set V there is an n ∈ Z such that U ∩ V = T^n U ∩ V ≠ ∅. So, for each k ≥ 1 the set ∪_{m=−∞}^{∞} T^m U_k is dense in X. From the Baire Category Theorem (see Theorem 12.1.1(vii)) we then see that also the intersection

\[ \bigcap_{k=1}^{\infty} \bigcup_{m \in \mathbb{Z}} T^m U_k \]

is dense in X and thus non-empty. By (7.3) this gives a point y ∈ X with a dense orbit.

A direct consequence of the above theorem is the following corollary.

Corollary 7.3.1. Let T : X → X be a homeomorphism of a compact metric space (X, d). Then T is topologically transitive if and only if there are no two disjoint non-empty T-invariant open sets.

Note that Theorem 7.3.1(ii) resembles Proposition 2.2.1 and Theorem 7.3.1(iii) corresponds to Theorem 2.2.1(iii), so we could view topological transitivity as an analogue of ergodicity. This is also reflected in the following (partial) analogue of Theorem 2.2.2.


Proposition 7.3.1. Let T : X → X be continuous and topologically transitive on a compact metric space (X, d). Then every continuous T-invariant function is constant.

Exercise 7.3.1. Prove Proposition 7.3.1.

Transitivity requires the existence of one dense orbit. Minimality is a much stronger notion.

Definition 7.3.2. A continuous T : X → X on a compact metric space (X, d) is called minimal if the orbit of every x ∈ X under T is dense in X.

Example 7.3.1. Let X = {1, 2, . . . , 9}. The discrete metric given by

\[ d_{\mathrm{disc}}(x, y) = \begin{cases} 0, & \text{if } x = y,\\ 1, & \text{if } x \neq y, \end{cases} \]

induces the discrete topology on X and makes it into a compact metric space. If we now define T : X → X by the cyclic permutation (1 2 · · · 9), i.e., T1 = 2, T2 = 3, . . . , T9 = 1, then it is easy to see that T is a homeomorphism. Moreover, {T^n x : n ∈ Z} = X for any x ∈ X, so T is minimal.

Example 7.3.2. Fix an integer n ≥ 2. Let X = {0} ∪ {1/n^k : k ∈ Z_{≥0}}, equipped with the subspace topology inherited from R. Then X is a compact metric space. Define T : X → X by T x = x/n. Note that T is continuous, but not surjective. Observe that the orbit of 1 is {T^m 1 : m ≥ 0} = X \ {0}, whose closure is X. So 1 has a dense orbit and thus T is topologically transitive. On the other hand the closure of {T^m 0 : m ≥ 0} is {0} ≠ X, so T is not minimal.

Exercise 7.3.2. Let (X, d) be a compact metric space and T : X → X a topologically transitive homeomorphism. Show that if T is an isometry (i.e., d(T x, T y) = d(x, y) for all x, y ∈ X), then T is minimal.

Example 7.3.3. Consider the binary odometer T from Example 1.3.11. In Example 7.1.1 we have seen that T is a homeomorphism, and it is easily checked that T is an isometry. By Exercise 7.3.2, to prove that T is minimal it is enough to show that the point (0, 0, . . .) has a dense orbit.


So let x = (x_1, x_2, . . .) ∈ {0, 1}^N. For k ≥ 1, write n(k) = \sum_{j=1}^{k} 2^{j-1} x_j. Then

T^{n(k)}(0, 0, 0, . . .) = (x_1, . . . , x_k, 0, 0, . . .),

so that d(x, T^{n(k)}(0, 0, . . .)) < 2^{−(k+1)}. This proves that the orbit of (0, 0, 0, . . .) is dense in X, and hence T is a minimal homeomorphism.

We have the following analogue of part of Theorem 7.3.1.

Proposition 7.3.2. Let T : X → X be a homeomorphism on a compact metric space (X, d). Then T is minimal if and only if for any closed A ⊆ X with T A = A it holds that either A = ∅ or A = X.

Proof. Suppose T is minimal and A is a closed set with T A = A. If A ≠ ∅, then let x ∈ A. By the T-invariance of A, O_T(x) ⊆ A. The minimality of T implies that X = \overline{O_T(x)} ⊆ \overline{A} = A. So A = X. For the other direction, let x ∈ X. By the continuity of T, T\overline{O_T(x)} = \overline{T O_T(x)} = \overline{O_T(x)}, so by assumption \overline{O_T(x)} = X or \overline{O_T(x)} = ∅, and since x ∈ \overline{O_T(x)} we get \overline{O_T(x)} = X.

Minimality implies that the dynamics of T cannot be restricted to any subset of X and is in that sense an irreducibility condition. The following result shows that any homeomorphism on a compact metric space can be restricted to a subset on which the map is minimal. A closed T-invariant subset A ⊆ X is called a minimal set for T if the restriction T|_A : A → A is minimal.

Theorem 7.3.2. Any homeomorphism T : X → X on a compact metric space (X, d) has a minimal set.

Proof. Let A = {A ⊆ X : A is closed, non-empty and T A = A}. Then X ∈ A and moreover, for any x ∈ X the set \overline{O_T(x)} ∈ A. So A ≠ ∅. A is partially ordered by set inclusion, and any totally ordered subset K ⊆ A has the finite intersection property: any intersection of finitely many elements from K is non-empty. As a consequence of the compactness of X, any family of closed subsets of X that has the finite intersection property has a non-empty intersection. Hence ⋂_{K∈K} K ≠ ∅, and thus ⋂_{K∈K} K is a lower bound for K. Then by Zorn's Lemma there is a minimal element


M ∈ A. This set M is closed, non-empty and T M = M. Furthermore, M must be a minimal set, since if there exists a point x ∈ M with \overline{O_T(x)} ≠ M, then \overline{O_T(x)} ⊊ M and \overline{O_T(x)} ∈ A, contradicting the minimality of M in A.

Example 7.3.4. Let S^1 denote the unit circle in C and let θ ∈ (0, 1). Use R_θ : S^1 → S^1 to denote the rotation on S^1 with parameter θ, given by R_θ z = e^{2πiθ} z, as in Example 1.3.2; writing z = e^{2πiω} with ω ∈ [0, 1), this reads R_θ z = e^{2πi(θ+ω)}. We have already seen that in case θ is irrational the map R_θ is uniquely ergodic. Now we see that it is minimal as well.

Proposition 7.3.3. Let θ ∈ (0, 1). Then R_θ is minimal if and only if θ is irrational.

Proof. Recall from Exercise 2.3.1 that if θ is rational, then the orbit of any point under R_θ is periodic. Hence R_θ cannot be minimal in this case. Now let θ be irrational. Fix ε > 0 and z ∈ S^1, so z = e^{2πiω} for some ω ∈ [0, 1). Then

R_θ^n z = R_θ^m z ⇔ e^{2πi(ω+nθ)} = e^{2πi(ω+mθ)} ⇔ e^{2πi(n−m)θ} = 1 ⇔ (n − m)θ ∈ Z.

Thus the points R_θ^n z, n ∈ Z, are all distinct, and it follows by compactness that the sequence (R_θ^n z)_{n≥1} has a convergent subsequence. Therefore we can find integers n > m such that d(R_θ^n z, R_θ^m z) < ε, where d is the arc length distance function. Now, since R_θ is distance preserving with respect to this metric, we can set ℓ = n − m to obtain d(R_θ^ℓ z, z) < ε. By continuity R_θ^ℓ maps the connected, closed arc with endpoints z and R_θ^ℓ z of length < ε onto the connected closed arc from R_θ^ℓ z to R_θ^{2ℓ} z of length < ε, and this one onto the arc connecting R_θ^{2ℓ} z to R_θ^{3ℓ} z of length < ε, etc. Since these arcs have positive and equal length, they cover S^1. The result now follows, since the arcs have length smaller than ε, and ε > 0 was arbitrary.

We now give a generalization of Proposition 7.3.3 to higher dimensions. Recall that real numbers x_1, x_2, . . . , x_n are called rationally independent if \sum_{i=1}^{n} k_i x_i = 0 for some integers k_1, . . . , k_n implies that k_1 = · · · = k_n = 0.


Proposition 7.3.4. Let γ = (e^{2πiγ_1}, . . . , e^{2πiγ_n}) ∈ T^n, where T^n is the n-dimensional torus, and let T_γ : T^n → T^n be defined by

T_γ(e^{2πix_1}, . . . , e^{2πix_n}) = (e^{2πi(x_1+γ_1)}, . . . , e^{2πi(x_n+γ_n)}).

Then T_γ is minimal if and only if 1, γ_1, . . . , γ_n are rationally independent.

Proof. First note that T_γ is an isometry, so by Exercise 7.3.2 we can replace minimality by topological transitivity. Assume that 1, γ_1, . . . , γ_n are rationally dependent; then there exist integers k_1, . . . , k_n, not all zero, and an integer m such that \sum_{i=1}^{n} k_i γ_i = m. The function f : T^n → C defined by

\[ f(e^{2\pi i x_1}, \ldots, e^{2\pi i x_n}) = e^{2\pi i \sum_{j=1}^{n} k_j x_j} \]

is obviously non-constant and one can easily check that f ∘ T_γ = f. Proposition 7.3.1 then implies that T_γ is not topologically transitive and hence not minimal.

To prove the converse, assume that 1, γ_1, . . . , γ_n are rationally independent. We will show that T_γ is topologically transitive using Corollary 7.3.1. Let U be a non-empty T_γ-invariant open set. Then the indicator function 1_U is a T_γ-invariant function. Using the Fourier series expansion of 1_U, we have for Lebesgue almost all (e^{2πix_1}, . . . , e^{2πix_n}) ∈ T^n that

\[ 1_U(e^{2\pi i x_1}, \ldots, e^{2\pi i x_n}) = \sum_{k_1, \ldots, k_n} c_{k_1, \ldots, k_n}\, e^{2\pi i \sum_{j=1}^{n} k_j x_j}, \]

and hence

\[ 1_U(T_\gamma(e^{2\pi i x_1}, \ldots, e^{2\pi i x_n})) = \sum_{k_1, \ldots, k_n} c_{k_1, \ldots, k_n}\, e^{2\pi i \sum_{j=1}^{n} k_j \gamma_j}\, e^{2\pi i \sum_{j=1}^{n} k_j x_j}. \]

Since 1_U = 1_U ∘ T_γ, by the uniqueness of the Fourier series we have

\[ c_{k_1, \ldots, k_n}\Big(1 - e^{2\pi i \sum_{j=1}^{n} k_j \gamma_j}\Big) = 0 \quad \text{for all } k_1, \ldots, k_n. \]

Hence either c_{k_1,...,k_n} = 0 or e^{2πi Σ_j k_j γ_j} = 1. By rational independence, the latter can only happen if k_1 = · · · = k_n = 0. This implies that 1_U = c_{0,0,...,0} Lebesgue almost everywhere. Since U is non-empty and open, this shows that 1_U = 1 Lebesgue almost everywhere. Any non-empty open set V has positive Lebesgue measure and thus V ∩ U ≠ ∅. Then Corollary 7.3.1 implies that T_γ is topologically transitive. Since T_γ is an isometry, it is minimal as well.

Also mixing has a topological analogue.


Definition 7.3.3. A continuous map T : X → X on a compact metric space (X, d) is called topologically mixing if for any non-empty open sets U, V ⊆ X there is an N ≥ 0 such that T^n U ∩ V ≠ ∅ for all n ≥ N.

Example 7.3.5. We revisit the Bernoulli shift from Example 7.1.1. We show that the one-sided version with X = {0, 1, . . . , k − 1}^N is topologically mixing, but with some minor changes the proof also works for the two-sided shift. Recall the metric on X from (7.1). An open ball under this metric is a set of the form B(x, 2^{−ℓ}) = {y ∈ X : x_i = y_i for all 1 ≤ i ≤ ℓ}. If we consider two balls B(x, 2^{−ℓ}) and B(y, 2^{−m}) for x, y ∈ X with x ≠ y and ℓ, m ≥ 1, then the set T^n B(x, 2^{−ℓ}) ∩ B(y, 2^{−m}) contains all points z = (z_i) ∈ X satisfying z_i = y_i for all 1 ≤ i ≤ m and z_i = x_{i−n} for all n + 1 ≤ i ≤ n + ℓ. Hence, for any n ≥ m this set is non-empty. Since the collection of open balls forms a basis for the product topology, this implies that T is topologically mixing.

Example 7.3.6. Consider Arnold's cat map as in Example 7.1.2. The matrix A has eigenvalues ℓ = (3 + √5)/2 > 1 and 1/ℓ, with 0 < 1/ℓ < 1. Corresponding eigenvectors are

\[ v_\ell = \begin{pmatrix} \frac{1+\sqrt{5}}{2} \\ 1 \end{pmatrix} \qquad \text{and} \qquad v_{1/\ell} = \begin{pmatrix} \frac{1-\sqrt{5}}{2} \\ 1 \end{pmatrix}. \]

The eigenvectors are perpendicular, since A is symmetric, and T is expanding with a factor ℓ in the direction of v_ℓ. For x ∈ T² the line in T² through x in the direction of v_ℓ is therefore called the unstable manifold of x, denoted by M_u(x). See Figure 7.1 for an illustration. The collection M_u = {M_u(x) : x ∈ T²} is called the unstable foliation of T. Since M_u(T x) = T M_u(x), the unstable foliation is a T-invariant set. In the direction of the other eigenvector v_{1/ℓ}, T is contracting with a factor 1/ℓ, and we can similarly construct the stable manifold M_s(x) for any x ∈ T² and the corresponding stable foliation M_s = {M_s(x) : x ∈ T²}. This is a T-invariant set as well.

We show that T is topologically mixing. Let x ∈ T² and ε > 0. Since 1 and (1 + √5)/2 are rationally independent, by Proposition 7.3.4 the unstable manifold M_u(x) is dense in T². So, if we place open balls B(y, ε) at all


points y ∈ M_u(x), we get an open cover of T² and then by compactness a finite open subcover U. The centers of the balls in U specify a finite number of points in M_u(x). This implies that there exists a bounded segment S ⊆ M_u(x) whose ε-neighborhood {z ∈ T² : ∃ y ∈ S s.t. z ∈ B(y, ε)} covers T². If we take any translate of S, so S + y for any y ∈ T², this again gives a cover of T² (since translations are isometries). Note that S + y ⊆ M_u(x + y). Let D(ε) denote the length of the arc S. We can deduce that for any ε > 0 and any y ∈ T² the ε-neighborhood of any arc of length D(ε) in M_u(y) covers T².

Now let U, V be two non-empty open sets in T². Let B(x, δ) ⊆ U, δ > 0, be a non-empty open ball inside U. Then U contains an arc of length 2δ in M_u(x). Let B(y, ε) ⊆ V with ε > 0 be a closed ball inside V. Choose an N such that ℓ^N · 2δ > D(ε). For any n ≥ N the set T^n U contains an arc of length at least D(ε) from some unstable manifold M_u(z), z ∈ T². From the above we see that an ε-neighborhood of T^n U covers T². So it must intersect B(y, ε), and thus V as well. Hence, T is topologically mixing.

Figure 7.1 The torus T² with part of the unstable manifold M_u(x) indicated by the dashed lines. The arrow points in the direction of the eigenvector v_ℓ.

Finally, for a topological dynamical system there are notions analogous to factor maps and isomorphisms that allow one to carry dynamical properties from one system to another. Definition 7.3.4. Let (X, d), (Y, ρ) be two compact metric spaces and let T : X → X, S : Y → Y be continuous maps.


(i) A surjective, continuous map φ : X → Y satisfying φ ∘ T = S ∘ φ is called a topological semi-conjugacy. If such a map φ exists, then S is called a topological factor of T.

(ii) A homeomorphism φ : X → Y satisfying φ ∘ T = S ∘ φ is called a (topological) conjugacy. If such a map φ exists, then T and S are said to be topologically conjugate.

Proposition 7.3.5. Let (X, d), (Y, ρ) be compact metric spaces and let T : X → X, S : Y → Y be continuous maps with S a topological factor of T. The following hold.

(i) If T is topologically transitive, then S is topologically transitive.

(ii) If T is minimal, then S is minimal.

(iii) If T is topologically mixing, then S is topologically mixing.

Exercise 7.3.3. Prove Proposition 7.3.5.

CHAPTER 8: Continued Fractions

Ergodic theory has been particularly useful in the area of continued fractions, where several proofs were simplified quite a bit by taking the dynamical approach. A regular continued fraction expansion of a number x ∈ [0, 1) is an expression of the form

\[ x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}} \qquad \text{or} \qquad x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}, \]

with a_i ∈ N for all i ≥ 1. The finite expansion occurs for x ∈ Q and the infinite one for x ∈ [0, 1) \ Q. Recall the definition of the Gauss map T : [0, 1) → [0, 1) from Example 1.3.7 by T0 = 0 and

\[ Tx = \frac{1}{x} - \Big\lfloor \frac{1}{x} \Big\rfloor \]

for x ≠ 0. The graph is shown in Figure 1.3(c). In Example 1.3.7 we saw that iterations of T produce for x ∈ [0, 1) expressions of the form

\[ x = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{\ddots + \cfrac{1}{a_n + T^n x}}}}, \]

where the digits a_n, n ≥ 1, are obtained by setting

\[ a_n = a_n(x) = \begin{cases} 1, & \text{if } T^{n-1}x \in \big[\tfrac{1}{2}, 1\big),\\[4pt] k, & \text{if } T^{n-1}x \in \big[\tfrac{1}{k+1}, \tfrac{1}{k}\big), \quad k \ge 2, \end{cases} \]


as long as T^{n−1}x ≠ 0. It is a consequence of Euclid's algorithm that T produces a finite continued fraction expansion for each x ∈ [0, 1) ∩ Q. In this chapter we first prove that also in case x ∈ [0, 1) is irrational this process converges and in the limit produces a regular continued fraction expansion of x. After that we discuss some of the approximation properties of the continued fractions and construct a natural extension for the map T.
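The digit recursion is straightforward to run. A small sketch (plain Python; function names are ours, not the book's) that generates continued fraction digits with the Gauss map, using exact rational arithmetic to avoid floating-point drift:

```python
from fractions import Fraction

def gauss_digits(x, max_digits):
    """First continued fraction digits of x in [0,1), via iterating T x = 1/x - floor(1/x)."""
    digits = []
    for _ in range(max_digits):
        if x == 0:            # rationals eventually reach 0: the expansion is finite
            break
        a = int(1 / x)        # a_n = floor(1/x); exact for Fraction inputs
        digits.append(a)
        x = 1 / x - a         # apply the Gauss map
    return digits

def fold(digits):
    """Evaluate [0; a_1, ..., a_n] from the inside out."""
    value = Fraction(0)
    for a in reversed(digits):
        value = Fraction(1, 1) / (a + value)
    return value

print(gauss_digits(Fraction(16, 113), 10))  # → [7, 16]
```

Folding the digits back recovers the starting point exactly, illustrating the finite expansion for rationals.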

8.1 REGULAR CONTINUED FRACTIONS

For numbers a_1, a_2, a_3, . . . (not necessarily integers), we use [0; a_1, a_2, a_3, . . .] to denote the continued fraction expansion

\[ [0; a_1, a_2, a_3, \ldots] = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \ddots}}}. \]

Fix some irrational x ∈ [0, 1) \ Q and let (a_n)_{n≥1} ∈ N^N be the infinite string of digits produced by the Gauss map, called the partial quotients or continued fraction digits of x. Set for each n ≥ 1,

\[ \upsilon_n = \frac{p_n}{q_n} := [0; a_1, a_2, \ldots, a_n]. \]

These rational numbers are called the convergents of x. The goal of this section is to show that x = lim_{n→∞} p_n/q_n. This is done by studying elementary properties of matrices associated with the continued fraction expansions. Let A ∈ SL_2(Z), that is,

\[ A = \begin{pmatrix} r & p \\ s & q \end{pmatrix}, \]

where r, s, p, q ∈ Z and det A = rq − ps ∈ {±1}. Set C* = C ∪ {∞}. Associated to A is the Möbius (or fractional linear) transformation A : C* → C* defined by

\[ A(z) = \begin{pmatrix} r & p \\ s & q \end{pmatrix}(z) = \frac{rz + p}{sz + q}, \]

where we let 1/∞ = 0 and 1/0 = ∞. If we set for each n ≥ 1

\[ A_n := \begin{pmatrix} 0 & 1 \\ 1 & a_n \end{pmatrix} \qquad \text{and} \qquad M_n := A_1 A_2 \cdots A_n, \tag{8.1} \]

then A_n(z) = \frac{1}{a_n + z} and multiplication of the matrices in M_n corresponds to the composition of the associated Möbius transformations. Note that υ_n = M_n(0). This allows us to find recurrence relations for p_n and q_n. Writing

\[ M_n = \begin{pmatrix} r_n & p_n \\ s_n & q_n \end{pmatrix} \]

for some r_n, s_n, it follows that

\[ \begin{pmatrix} r_n & p_n \\ s_n & q_n \end{pmatrix} = M_n = M_{n-1} A_n = \begin{pmatrix} r_{n-1} & p_{n-1} \\ s_{n-1} & q_{n-1} \end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & a_n \end{pmatrix}. \]

So r_n = p_{n−1} and s_n = q_{n−1}, and we obtain the recurrence relations

\[ \begin{aligned} p_{-1} := 1;\quad p_0 := a_0 = 0;\quad & p_n = a_n p_{n-1} + p_{n-2}, \quad n \ge 1,\\ q_{-1} := 0;\quad q_0 := 1;\quad & q_n = a_n q_{n-1} + q_{n-2}, \quad n \ge 1. \end{aligned} \tag{8.2} \]

From this it immediately follows that p_n(x) = q_{n−1}(Tx) for all n ≥ 0 and x ∈ (0, 1), and that

\[ p_{n-1} q_n - p_n q_{n-1} = \det M_n = (-1)^n, \quad \text{for all } n \ge 1. \tag{8.3} \]

This in turn gives gcd(p_n, q_n) = 1 for all n ≥ 1.
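The recurrences (8.2) and the determinant identity (8.3) are easy to verify numerically. A minimal sketch (plain Python; the helper name is ours):

```python
def convergents(digits):
    """Convergents (p_n, q_n) of [0; a_1, a_2, ...] via the recurrences (8.2)."""
    p_prev, p = 1, 0   # p_{-1}, p_0
    q_prev, q = 0, 1   # q_{-1}, q_0
    out = []
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append((p, q))
    return out

# All digits equal to 1 yields the Fibonacci convergents.
convs = convergents([1] * 10)
print(convs[:4])  # → [(1, 1), (1, 2), (2, 3), (3, 5)]
```

One can check on the output that consecutive convergents satisfy p_{n−1}q_n − p_n q_{n−1} = (−1)^n, in line with (8.3).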

Setting

\[ A_n^* := \begin{pmatrix} 0 & 1 \\ 1 & a_n + T^n x \end{pmatrix}, \]

it follows from x = M_{n−1} A_n^*(0) = [0; a_1, . . . , a_{n−1}, a_n + T^n x] that

\[ x = \frac{p_n + p_{n-1} T^n x}{q_n + q_{n-1} T^n x}, \tag{8.4} \]

i.e., x = Mn (T n x). Combining (8.3) and (8.4) gives x−

pn (−1)n T n x . = qn qn (qn + qn−1 T n x)

(8.5)

In fact, (8.5) yields information about the quality of approximation of the rational number υn = pqnn to the irrational number x. Since T n x ∈ [0, 1), it at once follows that x − pn < 1 , qn qn2

From

1 T nx

for all n ≥ 0.

(8.6)

= an+1 + T n+1 x we get 1 x − pn = . qn qn (qn an+1 + qn T n+1 x + qn−1 )

As can be seen from the recurrence relations in (8.2) the sequence (qn ) is increasing, so one even has



1 pn 1 < x − < , 2qn qn+1 qn qn qn+1

for all n ≥ 1.

(8.7)

Notice that the recurrence relations (8.2) yield that υn − υn−1 =

(−1)n−1 , qn−1 qn

for all n ≥ 1.

(8.8)

This together with (8.5) gives 0 = υ0 < υ2 < υ4 < · · · < x < · · · < υ3 < υ1 < 1.

(8.9)

A natural question to ask now is: "Given a sequence of positive integers (a_n)_{n≥1}, does the limit lim_{n→∞} υ_n exist? And if so, and the limit equals x, do we have that x = [0; a_1, . . . , a_n, . . .]?" The answer to both questions is "yes", as the following proposition shows.

Proposition 8.1.1. Let (a_n)_{n≥1} be a sequence of positive integers, and let the sequence of rational numbers (υ_n)_{n≥1} be given by υ_n := [0; a_1, . . . , a_n] for all n ≥ 1. Then there exists an irrational number x ∈ [0, 1) for which lim_{n→∞} υ_n = x, and we moreover have that x = [0; a_1, a_2, . . . , a_n, . . .].


Proof. Let the sequence (a_n)_{n≥1} be given and use A_n to denote the matrices specified by this sequence as in (8.1). Write υ_0 := 0 and for n ≥ 1, υ_n = p_n/q_n. We saw before that

\[ \begin{pmatrix} p_{n-1} & p_n \\ q_{n-1} & q_n \end{pmatrix} = A_1 \cdots A_n. \]

From (8.8), (8.9) and υ_0 = 0 one sees that

\[ \upsilon_n = \sum_{k=1}^{n} \frac{(-1)^{k-1}}{q_{k-1} q_k}. \]

Hence the Leibniz Test for the convergence of alternating series yields that x = lim_{n→∞} υ_n exists. What is left is to show that a_n = a_n(x) for n ≥ 1, i.e., that (a_n)_{n≥1} is the sequence of partial quotients of x. Since

\[ \upsilon_n = [0; a_1, a_2, \ldots, a_n] = \frac{1}{a_1 + [0; a_2, a_3, \ldots, a_n]} = \frac{1}{a_1 + \upsilon_n^*}, \tag{8.10} \]

where υ_n^* = [0; a_2, a_3, . . . , a_n], it is sufficient to show that ⌊1/x⌋ = a_1. Taking limits as n → ∞ in (8.10) yields

\[ x = \frac{1}{a_1 + x^*}, \]

where x^* = lim_{n→∞} υ_n^*. From 0 < υ_2^* < x^* < υ_3^* < 1 it follows that 1/x = a_1 + x^* ∈ (a_1, a_1 + 1), so ⌊1/x⌋ = a_1.

Exercise 8.1.1. Let

\[ \Delta(a_1, \ldots, a_k) := \{x \in [0, 1) : a_1(x) = a_1, \ldots, a_k(x) = a_k\}, \tag{8.11} \]

where a_j ∈ N for each 1 ≤ j ≤ k. Show that ∆(a_1, a_2, . . . , a_k) is an interval in [0, 1) with endpoints

\[ \frac{p_k}{q_k} \qquad \text{and} \qquad \frac{p_k + p_{k-1}}{q_k + q_{k-1}}, \]

where p_k/q_k = [0; a_1, a_2, . . . , a_k]. Conclude that

\[ \lambda\big(\Delta(a_1, a_2, \ldots, a_k)\big) = \frac{1}{q_k (q_k + q_{k-1})}, \]

where λ is Lebesgue measure on [0, 1). These are the fundamental intervals for the Gauss map T.
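The endpoint and measure formulas can be confirmed with exact arithmetic. A short sketch (plain Python; names ours), assuming the recurrences (8.2):

```python
from fractions import Fraction

def cylinder_endpoints(digits):
    """Endpoints p_k/q_k and (p_k + p_{k-1})/(q_k + q_{k-1}) of Delta(a_1, ..., a_k)."""
    p_prev, p, q_prev, q = 1, 0, 0, 1
    for a in digits:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
    return Fraction(p, q), Fraction(p + p_prev, q + q_prev), q, q_prev

left, right, qk, qk_prev = cylinder_endpoints([2, 3, 1])
print(abs(right - left) == Fraction(1, qk * (qk + qk_prev)))  # → True
```

The interval length agrees exactly with 1/(q_k(q_k + q_{k−1})) for any digit string, since the two endpoints differ by |p_{k−1}q_k − p_k q_{k−1}|/(q_k(q_k + q_{k−1})) and the numerator is 1 by (8.3).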


Exercise 8.1.2. Let a_1, . . . , a_n ∈ N and 0 ≤ a < b ≤ 1. Write ∆_n = ∆(a_1, . . . , a_n) for the fundamental intervals as in (8.11). Show that

\[ T^{-n}[a, b) \cap \Delta_n = \begin{cases} \Big[\dfrac{p_{n-1} a + p_n}{q_{n-1} a + q_n},\; \dfrac{p_{n-1} b + p_n}{q_{n-1} b + q_n}\Big), & \text{if } n \text{ is even},\\[10pt] \Big(\dfrac{p_{n-1} b + p_n}{q_{n-1} b + q_n},\; \dfrac{p_{n-1} a + p_n}{q_{n-1} a + q_n}\Big], & \text{if } n \text{ is odd}. \end{cases} \]

Conclude that

\[ \lambda(T^{-n}[a, b) \cap \Delta_n) = \lambda([a, b))\,\lambda(\Delta_n)\,\frac{q_n (q_{n-1} + q_n)}{(q_{n-1} b + q_n)(q_{n-1} a + q_n)}, \]

where λ is Lebesgue measure on [0, 1).

8.2 ERGODIC PROPERTIES OF THE GAUSS MAP

In Exercise 1.3.4 we saw that the Gauss map is measure preserving with respect to the Gauss measure µ given by

\[ \mu(B) = \int_B \frac{1}{\log 2}\,\frac{1}{1+x}\,d\lambda(x) \tag{8.12} \]

for Borel sets B ⊆ [0, 1). (Here and in the rest of the chapter log refers to the natural logarithm.) Note that the density x ↦ (1/log 2) · 1/(1 + x) is bounded from above and bounded away from 0, so µ and λ are equivalent. The next theorem gives the ergodicity of T with respect to µ.

Theorem 8.2.1. Let T : [0, 1) → [0, 1) be the Gauss map given by Tx = 1/x − ⌊1/x⌋, defined on the measure space ([0, 1), B, µ), where B is the Borel σ-algebra on [0, 1) and µ the Gauss measure from (8.12). Then T is ergodic with respect to µ.

Proof. We prove the theorem by applying Knopp's Lemma, Lemma 2.3.1, to the collection of fundamental intervals. First we estimate the Gauss measure of inverse images of Borel sets. Let I be an interval in [0, 1) with endpoints a < b, and ∆_n = ∆(a_1, . . . , a_n) a fundamental interval of order n. From Exercise 8.1.2, we know that

\[ \lambda(T^{-n} I \cap \Delta_n) = \lambda(I)\,\lambda(\Delta_n)\,\frac{q_n (q_{n-1} + q_n)}{(q_{n-1} b + q_n)(q_{n-1} a + q_n)}. \]


Since q_{n−1} < q_n and 0 ≤ a < b ≤ 1, it follows that

\[ \frac{1}{2} < \frac{q_n}{q_{n-1} + q_n} < \frac{q_n (q_{n-1} + q_n)}{(q_{n-1} b + q_n)(q_{n-1} a + q_n)} < \frac{q_n (q_{n-1} + q_n)}{q_n^2} < 2. \]

Therefore we find that

\[ \tfrac{1}{2}\,\lambda(I)\lambda(\Delta_n) < \lambda(T^{-n} I \cap \Delta_n) < 2\,\lambda(I)\lambda(\Delta_n). \]

Let A be a finite disjoint union of intervals. Since Lebesgue measure is additive one has

\[ \tfrac{1}{2}\,\lambda(A)\lambda(\Delta_n) \le \lambda(T^{-n} A \cap \Delta_n) \le 2\,\lambda(A)\lambda(\Delta_n). \tag{8.13} \]

The collection of finite disjoint unions of intervals generates the Borel σ-algebra. It follows that (8.13) holds for any Borel set A. From (8.12) it is clear that

\[ \frac{1}{2 \log 2}\,\lambda(A) \le \mu(A) \le \frac{1}{\log 2}\,\lambda(A). \tag{8.14} \]

Then by (8.13) and (8.14) one has

\[ \mu(T^{-n} A \cap \Delta_n) \ge \frac{\log 2}{4}\,\mu(A)\mu(\Delta_n). \tag{8.15} \]

Now let S be the collection of all fundamental intervals ∆_n. Since the set of all endpoints of these fundamental intervals is the set of all rationals in [0, 1), condition (a) of Knopp's Lemma is satisfied. Let B ∈ B be a T-invariant set, i.e., T^{−1}B = B, and suppose that µ(B) > 0. It then follows from (8.15) that for every fundamental interval ∆_n

\[ \mu(B \cap \Delta_n) \ge \frac{\log 2}{4}\,\mu(B)\mu(\Delta_n). \]

So condition (b) from Knopp's Lemma is satisfied with γ = (log 2 / 4) µ(B). Thus λ(B) = 1, and by the equivalence of µ and λ also µ(B) = 1. Hence T is ergodic.

Recall Exercise 3.2.2, where we computed the frequency of the even and odd digits in typical Lüroth expansions. We can do a similar computation for the continued fractions generated by the Gauss map.


Exercise 8.2.1. For x ∈ [0, 1], let (a_n(x))_{n≥1} denote the finite or infinite sequence of regular continued fraction digits of x as produced by the Gauss map.

(a) Let E denote the set of x ∈ [0, 1] for which the sequence (a_n(x))_{n≥1} contains only odd integers. Prove that E has zero Lebesgue measure.

(b) Prove that for Lebesgue almost every x ∈ [0, 1] we have

\[ \lim_{n\to\infty} \frac{\#\{1 \le k \le n : a_k(x) \text{ is odd}\}}{n} = \frac{\log \pi}{\log 2} - 1. \]

Hint: Use the identity

\[ \prod_{k \ge 1} \Big(1 + \frac{1}{4k^2 - 1}\Big) = \frac{\pi}{2}. \]

In fact, using Proposition 6.2.3 we can even show that the Gauss map is exact. To see this, first note that the collection S of all fundamental intervals ∆_n satisfies the conditions of the proposition. Consider a Borel set A ⊆ ∆_n for some fundamental interval ∆_n. The n-th iterate T^n is invertible as a map from ∆_n to [0, 1). From (8.4) we know the local inverse is given by

\[ T^{-n} y = \frac{p_n + p_{n-1} y}{q_n + q_{n-1} y}, \]

so by using (8.3) and the change of variables formula (6.6) we get

\[ \lambda(A) = \int_A d\lambda = \int_{T^n A} \frac{1}{(q_n + q_{n-1} y)^2}\,d\lambda(y) \ge \frac{1}{4 q_n^2}\,\lambda(T^n A) \ge \frac{1}{4}\,\frac{1}{q_n (q_n + q_{n-1})}\,\lambda(T^n A) = \frac{1}{4}\,\lambda(\Delta_n)\lambda(T^n A), \]

where the last equality follows from Exercise 8.1.1. Now using (8.14) we obtain

\[ \mu(A) \ge \frac{1}{2 \log 2}\,\lambda(A) \ge \frac{1}{8 \log 2}\,\lambda(T^n A)\lambda(\Delta_n) \ge \frac{\log 2}{8}\,\mu(T^n A)\mu(\Delta_n). \]

So condition (6.4) from Proposition 6.2.3 is satisfied with γ = (log 2)/8.

Using the Pointwise Ergodic Theorem we can give simple proofs of old and famous results of P. Lévy from 1929; see [35].


Proposition 8.2.1 (Lévy). For Lebesgue almost all x ∈ [0, 1) one has

\[ \lim_{n\to\infty} \frac{1}{n} \log q_n = \frac{\pi^2}{12 \log 2}, \tag{8.16} \]

\[ \lim_{n\to\infty} \frac{1}{n} \log \lambda(\Delta_n) = \frac{-\pi^2}{6 \log 2}, \quad \text{and} \tag{8.17} \]

\[ \lim_{n\to\infty} \frac{1}{n} \log \Big| x - \frac{p_n}{q_n} \Big| = \frac{-\pi^2}{6 \log 2}. \tag{8.18} \]

In the proof of the proposition we will use that for each n ≥ 1 the inequality q_n(x) ≥ F_n holds, where F_1, F_2, F_3, . . . is the Fibonacci sequence 1, 1, 2, 3, 5, . . . . To see this, just notice that by the recurrence relations in (8.2), q_n = F_n precisely when a_i = 1 for all 1 ≤ i ≤ n, which gives the smallest possible value of q_n.

Proof of Proposition 8.2.1. By the recurrence relations (8.2), for any irrational x ∈ [0, 1) one has p_n(x) = q_{n−1}(Tx), so that

\[ \frac{1}{q_n(x)} = \frac{p_n(x)}{q_n(x)}\,\frac{p_{n-1}(Tx)}{q_{n-1}(Tx)}\,\frac{p_{n-2}(T^2 x)}{q_{n-2}(T^2 x)} \cdots \frac{p_1(T^{n-1} x)}{q_1(T^{n-1} x)}. \]

Taking logarithms yields

\[ -\log q_n(x) = \log \frac{p_n(x)}{q_n(x)} + \log \frac{p_{n-1}(Tx)}{q_{n-1}(Tx)} + \cdots + \log \frac{p_1(T^{n-1} x)}{q_1(T^{n-1} x)}. \tag{8.19} \]

For any k ∈ N and any irrational x ∈ [0, 1), p_k(x)/q_k(x) is a rational number close to x. Therefore we compare the right-hand side of (8.19) with

\[ \log x + \log Tx + \log T^2 x + \cdots + \log T^{n-1} x. \]

We have

\[ -\log q_n(x) = \log x + \log Tx + \log T^2 x + \cdots + \log T^{n-1} x + R(n, x). \]

In order to estimate the error term R(n, x), we recall from Exercise 8.1.1 that x lies in the interval ∆_n, which has endpoints p_n/q_n and (p_n + p_{n−1})/(q_n + q_{n−1}). Therefore and by Exercise 8.1.1, in case n is even, one has

\[ 0 < \log x - \log \frac{p_n}{q_n} = \frac{1}{\xi}\Big(x - \frac{p_n}{q_n}\Big) \le \frac{1}{p_n/q_n}\,\frac{1}{q_n (q_{n-1} + q_n)} < \frac{1}{q_n} \le \frac{1}{F_n}, \]


where ξ ∈ (p_n/q_n, x) is given by the Mean Value Theorem. A similar argument shows that

\[ 0 < \log \frac{p_n}{q_n} - \log x < \frac{1}{q_n} \le \frac{1}{F_n} \]

in case n is odd. Thus

\[ |R(n, x)| \le \frac{1}{F_n} + \frac{1}{F_{n-1}} + \cdots + \frac{1}{F_1}. \]

The n-th Fibonacci number can be expressed in terms of the golden mean G = (1 + √5)/2 and the small golden mean g = (√5 − 1)/2 = 1/G by

\[ F_n = \frac{G^n + (-1)^{n+1} g^n}{\sqrt{5}}. \]

It follows that F_n ∼ G^n/√5 as n → ∞. Thus 1/F_n + 1/F_{n−1} + · · · + 1/F_1 is the n-th partial sum of a convergent series, and therefore

\[ |R(n, x)| \le \frac{1}{F_n} + \cdots + \frac{1}{F_1} \le \sum_{n \ge 1} \frac{1}{F_n} < \infty. \]

Hence for each x for which

\[ \lim_{n\to\infty} \frac{1}{n}\big(\log x + \log Tx + \log T^2 x + \cdots + \log T^{n-1} x\big) \]

exists, the limit

\[ -\lim_{n\to\infty} \frac{1}{n} \log q_n(x) \]

exists as well, and both limits have the same value. Now lim_{n→∞} (1/n)(log x + log Tx + log T²x + · · · + log T^{n−1}x) is ideally suited for the Pointwise Ergodic Theorem; we only need to check that the conditions of the Pointwise Ergodic Theorem are satisfied and to calculate the integral. This is left as an exercise for the reader. This proves (8.16). From Exercise 8.1.1 we have

\[ \lambda(\Delta_n(a_1, \ldots, a_n)) = \frac{1}{q_n (q_n + q_{n-1})}. \]

Since q_n² < q_n(q_n + q_{n−1}) < 2q_n², this gives

\[ -\log 2 - 2\log q_n < \log \lambda(\Delta_n) < -2 \log q_n. \]

Then (8.17) is obtained by dividing by n and applying (8.16). Finally (8.18) follows from (8.7) and (8.16).
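Lévy's constant π²/(12 log 2) ≈ 1.1866 can be observed numerically, although floating-point iteration of the Gauss map loses precision quickly, so only modest n and a rough comparison are meaningful. A sketch (plain Python; names ours):

```python
import math

def levy_estimate(x, n):
    """Rough estimate of (1/n) log q_n along the Gauss-map orbit of x (floating point)."""
    q_prev, q = 0, 1
    for _ in range(n):
        if x <= 0:                # guard against the orbit hitting 0 in float arithmetic
            break
        a = int(1 / x)            # continued fraction digit
        x = 1 / x - a             # Gauss map step
        q_prev, q = q, a * q + q_prev
    return math.log(q) / n

print(levy_estimate(math.pi - 3, 20))       # roughly 1.1 for this sample point
print(math.pi ** 2 / (12 * math.log(2)))    # ≈ 1.1866
```

Almost every x gives the same limit, but convergence is slow and individual points (like π − 3 here, with its large early digit 292) fluctuate around it.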


Exercise 8.2.2. (a) Show that the geometric mean of the partial quotients a_n = a_n(x) given by the Gauss map satisfies

\[ \lim_{n\to\infty} (a_1 a_2 \cdots a_n)^{1/n} = \prod_{k=1}^{\infty} \Big(1 + \frac{1}{k(k+2)}\Big)^{\frac{\log k}{\log 2}} \quad \lambda\text{-a.e.} \]

(b) Show that the arithmetic mean of the partial quotients satisfies

\[ \lim_{n\to\infty} \frac{a_1 + a_2 + \cdots + a_n}{n} = \infty \quad \text{for } \lambda\text{-a.e. } x \in [0, 1). \]

8.3 THE DOEBLIN-LENSTRA CONJECTURE

A planar and very useful version of a natural extension of the Gauss map was given by S. Ito, H. Nakada and S. Tanaka, see [46, 45].

Theorem 8.3.1 (Ito, Nakada and Tanaka). Let $X=[0,1)\times[0,1]$ and let $\mathcal B^2$ be the collection of Borel sets of $X$. Define the two-dimensional Gauss measure $\nu$ on $(X,\mathcal B^2)$ by
$$\nu(A) = \frac{1}{\log2}\iint_A\frac{1}{(1+xy)^2}\,d\lambda(x)\,d\lambda(y), \quad\text{for all }A\in\mathcal B^2,$$
where $\lambda$ denotes the one-dimensional Lebesgue measure. Finally, let the transformation $\mathcal T:X\to X$ be defined for $(x,y)\in X$ by $\mathcal T(0,y)=(0,y)$ and
$$\mathcal T(x,y) = \left(Tx,\ \frac{1}{\lfloor 1/x\rfloor+y}\right), \quad\text{for }x\ne0. \tag{8.20}$$
Then $(X,\mathcal B^2,\nu,\mathcal T)$ is the natural extension of $([0,1),\mathcal B,\mu,T)$. Furthermore, it is ergodic.

Figure 8.1 illustrates $\mathcal T$.

Proof. We show that $\mathcal T$ is measure preserving with respect to $\nu$ and leave the remainder of the proof that $\mathcal T$ is the natural extension of $T$ as an exercise below. It then follows from the ergodicity of $T$ that $\mathcal T$ is ergodic. The map $\mathcal T$ is $\lambda^2$-a.e. invertible on $X$. For $(x,y)\in\left(\frac{1}{n+1},\frac1n\right)\times[0,1]$, $n\ge1$, let $(\xi,\eta)\in X$ be such that $(\xi,\eta)=\mathcal T(x,y)$. Then, with $c=n=\lfloor 1/x\rfloor$,
$$\xi = \frac1x-c \iff x = \frac{1}{c+\xi}, \qquad\text{and}\qquad \eta = \frac{1}{c+y} \iff y = \frac1\eta-c.$$

136  A First Course in Ergodic Theory

Hence the above coordinate transformation has Jacobian $J$, which satisfies
$$J = \begin{vmatrix}\frac{\partial x}{\partial\xi} & \frac{\partial x}{\partial\eta}\\[1mm] \frac{\partial y}{\partial\xi} & \frac{\partial y}{\partial\eta}\end{vmatrix} = \begin{vmatrix}\frac{-1}{(c+\xi)^2} & 0\\[1mm] 0 & \frac{-1}{\eta^2}\end{vmatrix} = \frac{1}{(c+\xi)^2\eta^2},$$
and therefore, using the change of variables formula (6.6), we find
$$\begin{aligned}\nu(A) &= \frac{1}{\log2}\iint_A\frac{1}{(1+xy)^2}\,d\lambda(x)\,d\lambda(y)\\ &= \frac{1}{\log2}\iint_{\mathcal TA}\frac{1}{\left(1+\frac{1}{c+\xi}\left(\frac1\eta-c\right)\right)^2}\,\frac{1}{(c+\xi)^2\eta^2}\,d\lambda(\xi)\,d\lambda(\eta)\\ &= \frac{1}{\log2}\iint_{\mathcal TA}\frac{1}{(1+\xi\eta)^2}\,d\lambda(\xi)\,d\lambda(\eta) = \nu(\mathcal TA)\end{aligned}$$
for any $A\in\mathcal B^2$. This gives the result.

Exercise 8.3.1. Prove the remainder of Theorem 8.3.1.

Figure 8.1 The natural extension map $\mathcal T$ from Theorem 8.3.1 maps regions on the left to regions of the same shade of grey on the right.

We define for every real number $x\in[0,1)$ and every $n\ge0$ the so-called approximation coefficients $\Theta_n(x)$ by
$$\Theta_n(x) = q_n^2\left|x-\frac{p_n}{q_n}\right|. \tag{8.21}$$
Not only do these coefficients give a measure for the quality of approximation of the continued fraction convergents $\frac{p_n}{q_n}$ to an irrational number $x$, but they also determine the rationals that appear as such convergents.

Continued Fractions  137

It follows from (8.6) that $\Theta_n(x)<1$ for all irrational $x\in[0,1)$ and $n\ge0$. In 1981, H. W. Lenstra conjectured (this conjecture was formulated by W. Doeblin in a slightly different way in [16]) that for each $0\le z\le1$ there is a set $B_z$ of full Lebesgue measure, such that for each $x\in B_z$ one has
$$\lim_{n\to\infty}\frac1n\#\{1\le j\le n : \Theta_j(x)\le z\} = F(z),$$
where $F$ is the distribution function defined by
$$F(z) = \begin{cases}\dfrac{z}{\log2}, & \text{if }0\le z\le\frac12,\\[2mm] \dfrac{1}{\log2}\left(1-z+\log(2z)\right), & \text{if }\frac12\le z\le1.\end{cases} \tag{8.22}$$
In other words: for Lebesgue almost all $x$ the sequence $(\Theta_n(x))_{n\ge1}$ has limiting distribution $F$. The proof of this conjecture uses the Pointwise Ergodic Theorem applied to the natural extension $(X,\mathcal B^2,\nu,\mathcal T)$. To see how we can apply this theorem, we first use (8.5) to rewrite $\Theta_n(x)$ as
$$\Theta_n(x) = q_n^2\left|x-\frac{p_n}{q_n}\right| = \frac{T^nx}{1+\frac{q_{n-1}}{q_n}\cdot T^nx}.$$
By the recurrence relations from (8.2) we have
$$\frac{q_{n-1}}{q_n} = \frac{q_{n-1}}{a_nq_{n-1}+q_{n-2}} = \frac{1}{a_n+\frac{q_{n-2}}{q_{n-1}}},$$
so setting $V_n=V_n(x)=\frac{q_{n-1}}{q_n}$ yields for $n\ge1$ that
$$V_n = \frac{1}{a_n+V_{n-1}} = \cdots = \cfrac{1}{a_n+\cfrac{1}{a_{n-1}+\cfrac{1}{\ddots+\cfrac{1}{a_1}}}} = [0;a_n,a_{n-1},\ldots,a_1].$$

Exercise 8.3.2. (a) Show that for any $x\in[0,1)$ one has $\mathcal T^n(x,0)=(T^nx,V_n)$.
(b) Show that for any $x\in[0,1)$ one has
$$\lim_{n\to\infty}\left(\mathcal T^n(x,0)-\mathcal T^n(x,y)\right) = 0,$$
where the convergence is uniform in $y$.

138  A First Course in Ergodic Theory

From the above, we can interpret $V_n$ as "the past of $x$ at time $n$" (in the same way as $T^nx$ is the "future of $x$ at time $n$"). An immediate consequence of this and (8.5) is that
$$\Theta_n = \Theta_n(x) = \frac{T^nx}{1+V_n(x)T^nx}, \quad n\ge0. \tag{8.23}$$

Exercise 8.3.3. Show that
$$\Theta_{n-1} = \Theta_{n-1}(x) = \frac{V_n(x)}{1+V_n(x)T^nx}, \quad n\ge1. \tag{8.24}$$

An important consequence of this observation and Theorem 8.3.1 is the following theorem by H. Jager from 1986, stating that for Lebesgue almost all $x\in[0,1)$ the two-dimensional sequence $(T^nx,V_n(x))_{n\ge0}$ is distributed over $X$ according to the measure $\nu$. See [25].

Theorem 8.3.2 (Jager). For Lebesgue almost all $x\in[0,1)$ and any Borel set $B\subseteq X$ it holds that
$$\lim_{n\to\infty}\frac1n\#\{0\le j\le n-1 : (T^jx,V_j(x))\in B\} = \nu(B).$$

Proof. Denote by $E$ the subset of numbers $x\in[0,1)$ for which the sequence $(T^nx,V_n(x))_{n\ge0}$ is not distributed according to the measure $\nu$. Since the sequence $(\mathcal T^n(x,0)-\mathcal T^n(x,y))_{n\ge0}$ converges to 0 uniformly in $y$ by Exercise 8.3.2(b), it follows from Exercise 8.3.2(a) that for every pair $(x,y)\in E\times[0,1]$ the sequence $(\mathcal T^n(x,y))_{n\ge0}$ is not distributed according to the measure $\nu$, i.e., there is a Borel set $B\subseteq X$ such that
$$\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\mathbf1_B\left(\mathcal T^j(x,y)\right) \ne \nu(B).$$
Now if $\lambda(E)>0$, then $\lambda^2(E\times[0,1])>0$ and by definition of $\nu$ also $\nu(E\times[0,1])>0$. This would be in conflict with the ergodicity of $\mathcal T$ with respect to $\nu$, obtained in Theorem 8.3.1. Hence, $\lambda(E)=0$.

Lenstra's Conjecture now follows directly from this theorem.

Proof of Lenstra's Conjecture. Let $A_z=\left\{(x,y)\in X : \frac{x}{1+xy}\le z\right\}$. Then $\nu(A_z)=F(z)$, and $\Theta_j(x)\le z \iff \mathcal T^j(x,0)\in A_z$. Furthermore,
$$\frac1n\#\{1\le j\le n : \Theta_j(x)\le z\} = \frac1n\sum_{j=1}^n\mathbf1_{A_z}\left(\mathcal T^j(x,0)\right).$$
Taking limits and using the above theorem we get the required result.



Continued Fractions  139

A corollary of Lenstra's Conjecture is that for Lebesgue almost all $x\in[0,1)$,
$$\lim_{n\to\infty}\frac1n\sum_{j=0}^{n-1}\Theta_j(x) = \frac{1}{4\log2} = 0.360673\ldots. \tag{8.25}$$

Exercise 8.3.4. Use Exercise 8.3.2 and the Pointwise Ergodic Theorem applied to the map $G(x,y)=\frac{x}{1+xy}$ to prove (8.25).
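Identities (8.23) and (8.24) are exact algebraic identities, so they can be checked with exact rational arithmetic along a Gauss-map orbit. The following sketch is an added illustration (not part of the text; the helper name `cf_data` is ad hoc); it also prints an empirical average of the $\Theta_n$ next to the constant $\frac{1}{4\log2}$ from (8.25), without claiming convergence at this small depth:

```python
from fractions import Fraction
import math

def cf_data(x, n):
    """Convergent data p, q and the exact Gauss-map orbit x, Tx, ..., T^n x for x in (0,1)."""
    p, q, orbit = [1, 0], [0, 1], [x]     # p[k+1] = p_k, q[k+1] = q_k
    for _ in range(n):
        a = int(1 / orbit[-1])            # next digit a_k = floor(1/T^{k-1}x)
        p.append(a * p[-1] + p[-2])
        q.append(a * q[-1] + q[-2])
        orbit.append(1 / orbit[-1] - a)
    return p, q, orbit

x = Fraction(math.pi) - 3
N = 8
p, q, orbit = cf_data(x, N)

thetas = []
for n in range(1, N + 1):
    Tn = orbit[n]                          # T^n x, exact
    Vn = Fraction(q[n], q[n + 1])          # V_n = q_{n-1}/q_n
    theta_n = q[n + 1] ** 2 * abs(x - Fraction(p[n + 1], q[n + 1]))
    assert theta_n == Tn / (1 + Vn * Tn)   # identity (8.23), exact equality
    assert theta_n < 1
    theta_prev = q[n] ** 2 * abs(x - Fraction(p[n], q[n]))
    assert theta_prev == Vn / (1 + Vn * Tn)  # identity (8.24), exact equality
    thetas.append(theta_n)

print(float(sum(thetas) / len(thetas)), 1 / (4 * math.log(2)))
```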

8.4 OTHER CONTINUED FRACTION TRANSFORMATIONS

Regular continued fraction expansions are specific examples of the semi-regular continued fraction expansions that were introduced in 1913 by O. Perron; see for example [49]. These are finite or infinite expressions for real numbers of the form
$$x = a_0+\cfrac{\epsilon_0}{a_1+\cfrac{\epsilon_1}{a_2+\cfrac{\ddots}{\ddots+\cfrac{\epsilon_{n-1}}{a_n+\ddots}}}},$$
where $a_0\in\mathbb Z$ and for each $n\ge1$, $\epsilon_{n-1}\in\{-1,1\}$, $a_n\in\mathbb N$ and $a_n+\epsilon_n\ge1$. For convergence reasons one asks that in case the expression is infinite, then $a_n+\epsilon_{n+1}\ge1$ infinitely often. Besides the Gauss map there are many other transformations that produce semi-regular continued fraction expansions by iteration. Several of them are defined based on the properties of the digits involved. Recall from Exercise 8.2.1 that the set of $x\in[0,1]$ that have only odd regular continued fraction digits has Lebesgue measure 0. The even and odd continued fraction transformations produce for each $x\in[-1,1]$ a semi-regular continued fraction expansion with only even and odd digits, respectively. They are given by (see Figures 8.2(a) and (b))
$$T_e:[-1,1]\to[-1,1], \quad x\mapsto\begin{cases}\left|\frac1x\right|-2n, & \text{if }|x|\in\left(\frac{1}{2n+1},\frac{1}{2n-1}\right],\ n\ge1,\\ 0, & \text{if }x=0,\end{cases}$$
and
$$T_o:[-1,1]\to[-1,1], \quad x\mapsto\begin{cases}\left|\frac1x\right|-1, & \text{if }|x|\in\left(\frac12,1\right],\\ \left|\frac1x\right|-(2n+1), & \text{if }|x|\in\left(\frac{1}{2n+2},\frac{1}{2n}\right],\ n\ge1,\\ 0, & \text{if }x=0.\end{cases}$$

Figure 8.2 The even (a) and odd (b) continued fraction maps and the α-continued fraction map (c) for $\alpha=\frac{7}{10}$.
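Each branch of $T_e$ inverts exactly: if $T_e$ applies $x\mapsto\left|\frac1x\right|-2n$, then $x=\pm1/(2n+T_ex)$ with the sign given by the sign of $x$. The sketch below is an added illustration (not part of the text; names are ad hoc) that iterates $T_e$ with exact rationals, checks that every digit is even, and reconstructs the starting point exactly:

```python
from fractions import Fraction
import math

def T_even(x):
    """One step of the even continued fraction map T_e on [-1,1] \ {0}."""
    r = abs(1 / x)
    n = int((r + 1) / 2)        # unique n with 2n - 1 <= 1/|x| < 2n + 1
    digit = 2 * n               # the (even) digit
    eps = 1 if x > 0 else -1    # sign determined by the orientation of the branch
    return r - digit, digit, eps

y = Fraction(math.pi) - 3       # exact rational in the domain, never mapped to 0 in 6 steps
val, expansion = y, []
for _ in range(6):
    val, digit, eps = T_even(val)
    expansion.append((eps, digit))
    assert digit % 2 == 0       # only even digits occur

# reconstruct y exactly: y = eps_0/(a_1 + eps_1/(a_2 + ... + T_e^6 y))
recon = val
for eps, digit in reversed(expansion):
    recon = Fraction(eps) / (digit + recon)
assert recon == y
```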

A particularly useful continued fraction transformation is the nearest integer continued fraction map $T_{1/2}$, which for $x\in\left[-\frac12,\frac12\right)$ chooses as the digit $a_1$ the integer nearest to $\frac{1}{|x|}$. It is defined by
$$T_{1/2}(x) = \begin{cases}\left|\frac1x\right|-\left\lfloor\left|\frac1x\right|+\frac12\right\rfloor, & \text{if }x\ne0,\\ 0, & \text{if }x=0.\end{cases}$$
The continued fraction convergents $\frac{\tilde p_n}{\tilde q_n}$ obtained from $T_{1/2}$ for irrational numbers $x\in\left(0,\frac12\right)$ are a subsequence of the regular continued fraction convergents $\frac{p_n}{q_n}$ produced by the Gauss map for the same $x$ (see [49]) and hence converge to $x$ faster. Recall that the α-continued fraction transformations $\{T_\alpha:[\alpha-1,\alpha]\to[\alpha-1,\alpha]\}_{\alpha\in[0,1]}$ were given by
$$T_\alpha x = \begin{cases}\left|\frac1x\right|-\left\lfloor\left|\frac1x\right|+1-\alpha\right\rfloor, & \text{if }x\ne0,\\ 0, & \text{if }x=0,\end{cases}$$
see also (6.15). Figure 8.2(c) shows the graph of $T_{7/10}$. We can recognize $T_{1/2}$ for $\alpha=\frac12$ and of course the Gauss map for $\alpha=1$.

From all of these maps $\hat T=T_e,T_o,T_\alpha$ one can obtain continued fraction digits of real numbers in the domain of the map by iteration. In each case the procedure is essentially the same as for the Gauss map: if at time $n$ the branch $x\mapsto\left|\frac1x\right|-k$ is applied, then the $n$-th digit $a_n$ will be $k$, and the value of the sign $\epsilon_{n-1}$ is determined by the orientation of this branch. In this way, we can associate to each $x$ that never gets mapped to 0 two sequences $(\epsilon_n)_{n\ge0}\in\{-1,1\}^{\mathbb N}$ and $(a_n)_{n\ge1}\subseteq\mathbb N$ such that for each $n\ge1$,
$$x = \cfrac{\epsilon_0}{a_1+\cfrac{\epsilon_1}{a_2+\cfrac{\ddots}{\ddots+\cfrac{\epsilon_{n-1}}{a_n+\hat T^nx}}}}.$$

The next question is whether this process converges. An essential ingredient in the proof of Proposition 8.1.1 for the Gauss map is that the sequence $(q_n)$ is increasing. This is not generally true for the convergents produced by the other maps. Instead of treating each case separately, we will provide one proof that shows the convergence for all the maps $T_e,T_o,T_\alpha$ simultaneously. This is done by placing the maps $T_e,T_o,T_\alpha$ in a larger framework. The following dynamical system was described in [27]. Let $S:\{0,1\}^{\mathbb N}\to\{0,1\}^{\mathbb N}$ be the left shift. We define the random continued fraction map $K:\{0,1\}^{\mathbb N}\times[-1,1]\to\{0,1\}^{\mathbb N}\times[-1,1]$ by setting for $x=(x_n)_{n\ge1}$ that $K(x,0)=(Sx,0)$ and for $|y|\in\left(\frac{1}{k+1},\frac1k\right]$,
$$K(x,y) = \left(Sx,\ \frac{1}{|y|}-(k+x_1)\right).$$

The transformation $K$ can be interpreted as follows. For each $y\in[-1,1]\setminus\{0\}$ there is a unique $k\ge1$ so that $|y|\in\left(\frac{1}{k+1},\frac1k\right]$. Then both $\frac{1}{|y|}-k$ and $\frac{1}{|y|}-k-1$ lie in $[-1,1]$. The first digit of $x$ determines which of these two points becomes the image of $y$. Under a Bernoulli measure on $\{0,1\}^{\mathbb N}$ this resembles flipping a coin at each time step to determine what will be the next orbit point of $y$, which explains the name of the transformation. For $n=1$, set
$$d_1 = d_1(x,y) = \begin{cases}k+x_1, & \text{if }|y|\in\left(\frac{1}{k+1},\frac1k\right],\\ \infty, & \text{if }y=0,\end{cases}$$
and for $n\ge2$, define $d_n(x,y)=d_1\left(K^{n-1}(x,y)\right)$. Use $\pi:\{0,1\}^{\mathbb N}\times[-1,1]\to[-1,1]$ to denote the projection onto the second coordinate, so $\pi(x,y)=y$, and set $x_0=0$ if $y\ge0$ and $x_0=1$ otherwise. Then the sequence $(d_n)$ is such that we can write
$$\pi\left(K(x,y)\right) = \frac{(-1)^{x_0}}{y}-d_1. \tag{8.26}$$

142  A First Course in Ergodic Theory

For each $n\ge1$ with $\pi(K^n(x,y))\ne0$, we get
$$y = \frac{(-1)^{x_0}}{d_1+\pi\left(K(x,y)\right)} = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\pi\left(K^2(x,y)\right)}} = \cdots = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\cfrac{\ddots}{\ddots+\cfrac{(-1)^{x_{n-1}}}{d_n+\pi\left(K^n(x,y)\right)}}}}.$$
If there is a smallest integer $n$ such that $d_n(x,y)=\infty$ (so $K^{n-1}(x,y)=(S^{n-1}x,0)$), then
$$y = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\cfrac{\ddots}{\ddots+\cfrac{(-1)^{x_{n-2}}}{d_{n-1}}}}}.$$
Suppose that $d_n(x,y)<\infty$ for all $n\ge1$. We would like to show that
$$y = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\cfrac{(-1)^{x_2}}{d_3+\ddots}}}.$$
For each $n\ge1$, let $\frac{\hat p_n}{\hat q_n}$ denote the convergent for $y$ given by the sequence $(d_n(x,y))$, i.e., write
$$\frac{\hat p_n}{\hat q_n} = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\cfrac{\ddots}{\ddots+\cfrac{(-1)^{x_{n-1}}}{d_n}}}}.$$

Following the approach from Section 8.1 we obtain the following relations:
$$\begin{aligned}\hat p_{-1}&=1, & \hat p_0&=0, & \hat p_n&=d_n\hat p_{n-1}+(-1)^{x_{n-1}}\hat p_{n-2},\\ \hat q_{-1}&=0, & \hat q_0&=1, & \hat q_n&=d_n\hat q_{n-1}+(-1)^{x_{n-1}}\hat q_{n-2}.\end{aligned} \tag{8.27}$$

Continued Fractions  143

Using these recurrences, induction easily gives that
$$y = \frac{\hat p_n+\hat p_{n-1}\pi\left(K^n(x,y)\right)}{\hat q_n+\hat q_{n-1}\pi\left(K^n(x,y)\right)} \tag{8.28}$$
and that
$$\hat p_{n-1}\hat q_n-\hat p_n\hat q_{n-1} = (-1)^n(-1)^{x_0+\cdots+x_{n-1}}. \tag{8.29}$$
We will show that although the sequence $(\hat q_n)_{n\ge1}$ is not necessarily increasing, we still have $\lim_{n\to\infty}\frac{1}{\hat q_n}=0$. The following properties of the numbers $\hat q_n$ can easily be proved by induction.

Exercise 8.4.1. (a) Prove that $\hat q_n>0$ for each $n\ge2$ and that moreover, if $\hat q_n\le\hat q_{n-1}$, then $x_{n-1}=1$ and $d_n=1$, so $x_n=0$.
(b) Prove that if $\hat q_n\le\hat q_{n-1}$, then $\hat q_{n-2}<\hat q_{n-1}<\hat q_{n+1}$ and in fact $\hat q_n\ne\hat q_{n-1}$.

Before we can prove that the process converges, we need a lower bound on the $\hat q_n$'s in case $\hat q_n<\hat q_{n-1}$. This is done in the next lemma.

Lemma 8.4.1. Suppose $\hat q_n<\hat q_{n-1}$, so $d_{n-1}>1$.
(i) If $d_{n-1}>2$, then $\hat q_n>\hat q_{n-2}$.
(ii) If $d_{n-1}=2$ and $(x_k,d_k)=(1,2)$ for all $1\le k\le n-1$, then $\hat q_n=1$ and $\hat q_{n-1}=n$.
(iii) Suppose $d_{n-1}=2$ and there is a $1\le k<n-1$ such that $(x_k,d_k)\ne(1,2)$. Let $k$ be the largest such index. Then $\hat q_n>\hat q_{k-1}$.

Proof. From Exercise 8.4.1(a) we see that $\hat q_n<\hat q_{n-1}$ implies that $d_n=1$, $x_n=0$ and $x_{n-1}=1$, so $d_{n-1}>1$. Hence, by the recurrence relations from (8.27), $\hat q_n=\hat q_{n-1}-\hat q_{n-2}$.
(i) If $d_{n-1}>2$, then
$$\hat q_n = d_{n-1}\hat q_{n-2}+(-1)^{x_{n-2}}\hat q_{n-3}-\hat q_{n-2} = (d_{n-1}-1)\hat q_{n-2}+(-1)^{x_{n-2}}\hat q_{n-3} \ge 2\hat q_{n-2}+(-1)^{x_{n-2}}\hat q_{n-3}.$$
If $\hat q_{n-2}>\hat q_{n-3}$, then $\hat q_n\ge2\hat q_{n-2}-\hat q_{n-3}>\hat q_{n-2}$. If $\hat q_{n-2}<\hat q_{n-3}$, then Exercise 8.4.1(a) gives that $x_{n-2}=0$ and hence $\hat q_n\ge2\hat q_{n-2}+\hat q_{n-3}>\hat q_{n-3}>\hat q_{n-2}$.

144  A First Course in Ergodic Theory

For both (ii) and (iii), note that if $d_k=2$ and $x_k=1$ for some $1\le k\le n-1$, then $\hat q_k=2\hat q_{k-1}-\hat q_{k-2}$, so
$$\hat q_k-\hat q_{k-1} = \hat q_{k-1}-\hat q_{k-2}. \tag{8.30}$$
(ii) From (8.30) it follows that $\hat q_n=\hat q_{n-1}-\hat q_{n-2}=\hat q_2-\hat q_1=1$. Moreover, for each $1\le k\le n-1$ it holds that $\hat q_k=2\hat q_{k-1}-\hat q_{k-2}=\hat q_{k-1}+(\hat q_{k-1}-\hat q_{k-2})=\hat q_{k-1}+1$. Hence, $\hat q_{n-1}=n-2+\hat q_1=n$.
(iii) Let $k$ be as given in the lemma, so $(x_k,d_k)\ne(1,2)$ and $(x_j,d_j)=(1,2)$ for $k+1\le j\le n-1$. Then,
$$\hat q_n = \hat q_{n-1}-\hat q_{n-2} = \hat q_{k+1}-\hat q_k = d_{k+1}\hat q_k+(-1)^{x_k}\hat q_{k-1}-\hat q_k = \hat q_k+(-1)^{x_k}\hat q_{k-1}.$$
If $x_k=0$, then $\hat q_n=\hat q_k+\hat q_{k-1}>\hat q_{k-1}$. If $x_k=1$, then $d_k\ge3$ and Exercise 8.4.1(a) implies $\hat q_k>\hat q_{k-1}$. This gives
$$\hat q_n = \hat q_k-\hat q_{k-1} = (d_k-1)\hat q_{k-1}+(-1)^{x_{k-1}}\hat q_{k-2} \ge 2\hat q_{k-1}+(-1)^{x_{k-1}}\hat q_{k-2}.$$
As in the proof of part (i), we now get that if $\hat q_{k-1}>\hat q_{k-2}$, then $\hat q_n>\hat q_{k-1}$, and if $\hat q_{k-1}<\hat q_{k-2}$, then $x_{k-1}=0$ and we also get $\hat q_n>\hat q_{k-1}$.

The following result was obtained in [27].

Proposition 8.4.1. Let $y\in[-1,1]\setminus\mathbb Q$. For each $x\in\{0,1\}^{\mathbb N}$, the digits $d_n(x,y)$ give a continued fraction expansion of $y$.

Proof. For all $n\ge1$ we obtain from (8.28) and (8.29) that
$$\begin{aligned}\left|y-\frac{\hat p_n}{\hat q_n}\right| &= \left|\frac{\hat q_n\left(\hat p_n+\hat p_{n-1}\pi\left(K^n(x,y)\right)\right)-\hat p_n\left(\hat q_n+\hat q_{n-1}\pi\left(K^n(x,y)\right)\right)}{\hat q_n\left(\hat q_n+\hat q_{n-1}\pi\left(K^n(x,y)\right)\right)}\right| = \left|\frac{\pi\left(K^n(x,y)\right)\left(\hat q_n\hat p_{n-1}-\hat p_n\hat q_{n-1}\right)}{\hat q_n\left(\hat q_n+\hat q_{n-1}\pi\left(K^n(x,y)\right)\right)}\right|\\ &\le \frac{1}{\hat q_n\left|\hat q_n+\hat q_{n-1}\pi\left(K^n(x,y)\right)\right|} \le \frac{1}{\hat q_n|\hat q_n-\hat q_{n-1}|} \le \frac{1}{\hat q_n},\end{aligned}$$
where in the last step we have also used Exercise 8.4.1(b). We now show that $\lim_{n\to\infty}\frac{1}{\hat q_n}=0$. Let $\varepsilon>0$. By Exercise 8.4.1(b) there exists a subsequence $(\hat q_{n_k})_{k\ge0}$ such that $\lim_{k\to\infty}\frac{1}{\hat q_{n_k}}=0$ and hence there exists an $N_1$ such that $\frac{1}{\hat q_{N_1}}<\varepsilon$. If there is no $n>N_1$ such that $\hat q_n<\hat q_{n-1}$, then we are done. If there is, let $k$ be the smallest such index. If $(x_{N_1+1},d_{N_1+1})\ne(1,2)$, then by Lemma 8.4.1 $\hat q_k\ge\hat q_{N_1}$, and the same holds for all other $n>N_1$. If $(x_{N_1+1},d_{N_1+1})=(1,2)$, then set $N=k+1$. We then have $\frac{1}{\hat q_N}<\varepsilon$. Moreover, for all $n>N$ we have $\hat q_n>\hat q_{k-1}$, since $(x_k,d_k)=(0,1)$. This shows that the limit exists and is equal to 0. Hence we get
$$y = \cfrac{(-1)^{x_0}}{d_1+\cfrac{(-1)^{x_1}}{d_2+\cfrac{(-1)^{x_2}}{d_3+\ddots}}}$$
for each $y\in[-1,1]\setminus\mathbb Q$. The Gauss map is obtained from $K$ by taking $x=(0,0,0,\ldots)$, and we get the 0-continued fraction map $T_0:[-1,0]\to[-1,0]$ from $x=(1,1,1,\ldots)$. This map is a shifted version of what is sometimes called the Rényi map, see also Exercise 11.1.1. The odd continued fraction expansions can be found inductively by choosing for each $n\ge1$ the coordinate $x_n$ such that $d_n(x,y)$ is odd, and similarly for the even continued fractions.

Exercise 8.4.2. Let $T_\alpha:[\alpha-1,\alpha]\to[\alpha-1,\alpha]$, $\alpha\in[0,1]$, be the α-continued fraction map. For each $y\in[\alpha-1,\alpha]$ that satisfies $T_\alpha^ny\ne0$ for each $n\ge1$, define a sequence of signs $(s_n)\in\{0,1\}^{\mathbb N}$ and a sequence of digits $(d_n)\in\mathbb N^{\mathbb N}$, such that for each $k\ge1$,
$$y = \cfrac{(-1)^{s_0}}{d_1+\cfrac{(-1)^{s_1}}{d_2+\cfrac{\ddots}{\ddots+\cfrac{(-1)^{s_{k-1}}}{d_k+T_\alpha^ky}}}}.$$
Prove that
$$y = \cfrac{(-1)^{s_0}}{d_1+\cfrac{(-1)^{s_1}}{d_2+\cfrac{(-1)^{s_2}}{d_3+\ddots}}}.$$
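The recurrences (8.27) and the identities (8.28) and (8.29) for the random continued fraction map can be verified numerically with exact rational arithmetic. The sketch below is an added illustration (not part of the text); it iterates the second coordinate of $K$ for one fixed bit sequence and checks both identities exactly at every step:

```python
from fractions import Fraction
import math
import random

random.seed(1)
bits = [random.randint(0, 1) for _ in range(8)]  # the symbolic sequence x_1, x_2, ...
y = Fraction(math.pi) - 3                        # exact rational starting point in (0, 1)

e = [0 if y >= 0 else 1]                         # sign exponents: x_0 from the sign of y
p, q = [1, 0], [0, 1]                            # (p_{-1}, p_0) and (q_{-1}, q_0), as in (8.27)
cur = y
for n in range(1, 9):
    k = int(1 / abs(cur))                        # unique k with 1/(k+1) < |cur| <= 1/k
    d = k + bits[n - 1]                          # digit d_n; bits[n-1] plays the role of x_n
    p.append(d * p[-1] + (-1) ** e[n - 1] * p[-2])   # recurrence (8.27)
    q.append(d * q[-1] + (-1) ** e[n - 1] * q[-2])
    cur = 1 / abs(cur) - d                       # second coordinate of K^n(x, y)
    e.append(bits[n - 1])
    # (8.28): y is recovered exactly from the convergents and the K-orbit
    assert y == (p[-1] + p[-2] * cur) / (q[-1] + q[-2] * cur)
    # (8.29): the determinant identity
    assert p[-2] * q[-1] - p[-1] * q[-2] == (-1) ** n * (-1) ** sum(e[:n])
```

Choosing `bits = [0] * 8` reproduces the Gauss map digits and `bits = [1] * 8` the 0-continued fraction map, as remarked in the proof above.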

CHAPTER 9

Entropy

9.1 RANDOMNESS AND INFORMATION

In some of the examples of dynamical systems we saw, we attached to each orbit a sequence of symbols. For example, orbits under the doubling map correspond to binary digit sequences. There is a more general setup for this.

Definition 9.1.1. Suppose $(X,\mathcal F,\mu,T)$ is a dynamical system with $\mu$ a probability measure. A collection of measurable sets $\alpha=\{A_i\}\subseteq\mathcal F$ is called a partition for $T$ if
• $\mu(A_i)>0$ for all $i$,
• $\mu(A_i\cap A_j)=0$ for $i\ne j$, and
• $\mu\left(\bigcup_iA_i\right)=1$.

The elements of a partition are called its atoms. We will only consider finite or countable partitions. As for the doubling map, given a partition $\alpha=\{A_i\}$ of $X$, we can assign to $\mu$-almost all $x\in X$ a digit sequence $(a_n(x))$ by setting $a_n(x)=i$ if $T^{n-1}x\in A_i$, $n\ge1$. Suppose we are in the situation where we cannot observe the action of the dynamical system $(X,\mathcal F,\mu,T)$ directly, but we can only see the consecutive symbols $a_1,a_2,a_3,\ldots$. The information value of this sequence obviously depends on the transformation $T$ and on the chosen partition (the partition $\alpha=\{X\}$ gives no information). We would like to define a non-negative quantity $h_\mu(T)$, called the measure theoretic (or metric) entropy, which in some sense measures the asymptotic average information generated by each application of $T$ and is independent of the chosen partition $\alpha$. We want to define $h_\mu(T)$ in such a way that

148  A First Course in Ergodic Theory

(i) $h_\mu(T)$ reflects the average amount of information gained by an application of $T$, where information gained or uncertainty removed are seen as proportional quantities, and that
(ii) $h_\mu(T)$ is isomorphism invariant, so that isomorphic transformations have equal entropy.

The connection between entropy (that is, randomness, uncertainty) and the transmission of information was first studied by C. Shannon in the famous paper [59] from 1948. He considered sequences of symbols emitted from some data source and fed into a transmission channel. Think for example of Morse code or compact discs. To study the effect of noise on these sequences, the output from the transmission channel was viewed as a stationary stochastic process. Entropy was devised to measure the uncertainty in the value of this outcome. Before we turn to dynamics, the first basic question we need to answer is: how do we quantify the information gained by the occurrence of an event $E$? A possible choice would be $-\log\mu(E)$ (the base of the logarithm is chosen to be 2, but any base would work). To check whether this is a good choice we examine two natural situations. Firstly, if $E$ and $F$ are two independent events, then $-\log\mu(E\cap F)=-\log\mu(E)-\log\mu(F)$, which is exactly what we expect: the information transmitted by $E\cap F$ is the sum of the information transmitted by each one individually. Secondly, if $E$ is a sure event, i.e., $\mu(E)=1$, then $-\log\mu(E)=0$, which is also what we expect: we gain no information by the occurrence of $E$ since we already know that almost surely any outcome is in $E$.

Now that we have a reasonable way to quantify the information gained by the occurrence of an event, how do we quantify the information gained by a stationary process or, in general, a probability measure preserving dynamical system? Let us first start simple and consider an independent process, i.e., a Bernoulli shift. What we see are infinite sequences $x_1,x_2,x_3,\ldots$ from the set of symbols $\{0,1,\ldots,k-1\}$ (or some other finite set of symbols). Suppose that the probability of receiving symbol $i$ at any given time is $p_i$, and that each symbol is transmitted independently of what has been transmitted earlier. Of course each $p_i\ge0$ and $\sum_ip_i=1$. As we have seen in earlier chapters, we view this process as the dynamical system $(X,\mathcal C,\mu,T)$, where $X=\{0,1,\ldots,k-1\}^{\mathbb N}$, $\mathcal C$ the σ-algebra generated by cylinder sets of the form $\{x\in X : x_1=i_1,\ldots,x_n=i_n\}$,

Entropy  149

$\mu$ the product measure assigning to each coordinate probability $p_i$ of seeing the symbol $i$, and $T$ the left shift. We define the entropy of this system by
$$H(p_0,\ldots,p_{k-1}) = h(T) := -\sum_{i=0}^{k-1}p_i\log p_i. \tag{9.1}$$

If we define $-\log p_i$ as the amount of uncertainty in transmitting the symbol $i$, then $H$ is the average amount of information gained (or uncertainty removed) per symbol (notice that $H$ is in fact an expected value). To see why this is an appropriate definition, notice that if the source is degenerate, that is, $p_i=1$ for some $i$ (i.e., the source only transmits the symbol $i$), then $H=0$. In this case we indeed have no randomness. Another reason why this definition is appropriate is that $H$ is maximal if $p_i=\frac1k$ for all $i$, and this agrees with the fact that the source is most random when all the symbols are equiprobable. To see this maximum, consider the function $\Phi:[0,\infty)\to\mathbb R$ defined by
$$\Phi(t) = \begin{cases}0, & \text{if }t=0,\\ -t\log t, & \text{if }0<t\le1.\end{cases}$$
Then $\Phi$ is continuous and concave, and Jensen's Inequality implies that for any $p_0,\ldots,p_{k-1}$ with $p_i\ge0$ and $p_0+\cdots+p_{k-1}=1$,
$$\frac1kH(p_0,\ldots,p_{k-1}) = \frac1k\sum_{i=0}^{k-1}\Phi(p_i) \le \Phi\left(\sum_{i=0}^{k-1}\frac1kp_i\right) = \Phi\left(\frac1k\right) = \frac1k\log k,$$
so $H(p_0,\ldots,p_{k-1})\le\log k$ for all probability vectors $(p_0,\ldots,p_{k-1})$. But
$$H\left(\frac1k,\ldots,\frac1k\right) = \log k,$$
so the maximum value is attained at $\left(\frac1k,\ldots,\frac1k\right)$.
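The bound $H\le\log k$ with equality only at the uniform vector is easy to confirm numerically; a small sketch (added illustration, base-2 logarithms as in the text, with an arbitrary sample of probability vectors):

```python
import math

def H(p):
    """Shannon entropy (base 2) of a probability vector, with the convention 0 log 0 := 0."""
    return -sum(t * math.log2(t) for t in p if t > 0)

k = 4
uniform = [1 / k] * k
assert abs(H(uniform) - math.log2(k)) < 1e-12   # the maximum log k is attained

# any other probability vector on k symbols has entropy at most log k
for p in ([0.5, 0.25, 0.125, 0.125], [1.0, 0, 0, 0], [0.7, 0.1, 0.1, 0.1]):
    assert H(p) <= math.log2(k) + 1e-12
assert H([1.0, 0, 0, 0]) == 0                   # a degenerate source carries no information
```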

In the above example of a Bernoulli shift the symbols were transmitted independently. In general, the symbol generated might depend on what has been received before. In fact, these dependencies are often "built in" to be able to check the transmitted sequence of symbols for errors. Such dependencies must be taken into consideration in the calculation of the average information per symbol. This can be achieved if one replaces the symbols $i$ by blocks of symbols of a particular length. More precisely, for every $n$, let $\mathcal C_n$ be the collection of all possible cylinder sets specifying $n$ coordinates, and define
$$H_n := -\sum_{C\in\mathcal C_n}\mathbb P(C)\log\mathbb P(C).$$
Then $\frac1nH_n$ can be seen as the average information per symbol when a block of length $n$ is transmitted. The entropy of the source is now defined by
$$h := \lim_{n\to\infty}\frac{H_n}{n}. \tag{9.2}$$
The existence of the limit in (9.2) follows from the fact that $H_n$ is a subadditive sequence, i.e., $H_{n+m}\le H_n+H_m$, together with Proposition 9.2.2 below (see also Proposition 9.2.3).
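The subadditivity argument behind (9.2) (Proposition 9.2.2 below) can be illustrated on any concrete subadditive sequence. The sketch below is an added illustration with the arbitrary example $a_n = cn+\sqrt n$, which is subadditive because $\sqrt{\,\cdot\,}$ is; the ratios $a_n/n$ settle down to the infimum $c$:

```python
import math

c = 0.25
a = lambda n: c * n + math.sqrt(n)   # a subadditive sequence: sqrt(n+p) <= sqrt(n) + sqrt(p)

# check subadditivity a_{n+p} <= a_n + a_p on a sample of pairs
for n in range(1, 60):
    for p in range(1, 60):
        assert a(n + p) <= a(n) + a(p) + 1e-12

# a_n / n decreases toward its infimum c, as the subadditivity lemma predicts
ratios = [a(n) / n for n in (10, 100, 10_000, 1_000_000)]
assert all(r1 > r2 for r1, r2 in zip(ratios, ratios[1:]))
assert abs(ratios[-1] - c) < 1e-2
```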

9.2 DEFINITIONS AND PROPERTIES

Consider now, and for the remainder of this chapter, a measure preserving system $(X,\mathcal F,\mu,T)$ with $\mu$ a probability measure. How can one define the entropy of this system similarly to the case of a Bernoulli shift? We return to the setup from the beginning of the chapter. The symbols $\{0,1,\ldots,k-1\}$ compare to a partition $\alpha=\{A_0,A_1,\ldots,A_{k-1}\}$ of $X$, and with each point $x\in X$ we associate an infinite sequence $a_1,a_2,a_3,\ldots$, where $a_i$ is $j$ if and only if $T^{i-1}x\in A_j$. We define the entropy of the partition $\alpha$ by
$$H_\mu(\alpha) := -\sum_{A\in\alpha}\mu(A)\log\mu(A)\in[0,\infty].$$

It is finite if α is a finite partition. Our aim is to define the entropy of the transformation T independently of the partition we choose. First we need a few facts about partitions. We will use the following definition also for collections of sets that are not necessarily partitions. Definition 9.2.1. Let α and β be two collections of subsets of a space X. We say that β is a refinement of α, and write α ≤ β, if for every B ∈ β there exists an A ∈ α such that B ⊆ A. In case the space X is a measure space, it is enough for the inclusion to hold up to sets of measure zero. The collection α ∨ β := {A ∩ B : A ∈ α, B ∈ β} is called the common refinement of α and β.

Entropy  151

Exercise 9.2.1. Let $\alpha$ and $\beta$ be two partitions of the same space $X$.
(a) Show that $\alpha\vee\beta$ and $T^{-1}\alpha:=\{T^{-1}A : A\in\alpha\}$ are both partitions of $X$.
(b) Show that if $\beta$ is finite and $\alpha\le\beta$, then each atom of $\alpha$ is a finite (a.e. disjoint) union of atoms of $\beta$.

Given two partitions $\alpha$ and $\beta$ of $X$, we define the conditional entropy of $\alpha$ given $\beta$ by
$$H_\mu(\alpha|\beta) := -\sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(B)}$$
(under the convention that $0\log0:=0$). The quantity $H_\mu(\alpha|\beta)$ is interpreted as the average uncertainty about which element of the partition $\alpha$ the point $x$ will enter (under $T$) if we already know which element of $\beta$ the point $x$ will enter.

Proposition 9.2.1. Let $\alpha$, $\beta$ and $\gamma$ be partitions of $X$. Then,
(i) $H_\mu(T^{-1}\alpha)=H_{\mu\circ T^{-1}}(\alpha)$, so $H_\mu(T^{-1}\alpha)=H_\mu(\alpha)$ if $T$ is measure preserving for $\mu$;
(ii) $H_\mu(\alpha\vee\beta)=H_\mu(\alpha)+H_\mu(\beta|\alpha)$;
(iii) $H_\mu(\beta|\alpha)\le H_\mu(\beta)$;
(iv) $H_\mu(\alpha\vee\beta)\le H_\mu(\alpha)+H_\mu(\beta)$;
(v) If $\alpha\le\beta$, then $H_\mu(\alpha)\le H_\mu(\beta)$;
(vi) $H_\mu(\alpha\vee\beta|\gamma)=H_\mu(\alpha|\gamma)+H_\mu(\beta|\alpha\vee\gamma)$;
(vii) If $\alpha\le\beta$, then $H_\mu(\gamma|\beta)\le H_\mu(\gamma|\alpha)$;
(viii) If $\alpha\le\beta$, then $H_\mu(\alpha|\beta)=0$;
(ix) We call two partitions $\alpha$ and $\beta$ independent if $\mu(A\cap B)=\mu(A)\mu(B)$ for all $A\in\alpha$, $B\in\beta$. If $\alpha$ and $\beta$ are independent partitions, one has that $H_\mu(\alpha\vee\beta)=H_\mu(\alpha)+H_\mu(\beta)$.

If α and β are independent partitions, one has that Hµ (α ∨ β) = Hµ (α) + Hµ (β).

152  A First Course in Ergodic Theory

Proof. For (ii),
$$\begin{aligned}H_\mu(\alpha\vee\beta) &= -\sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\mu(A\cap B)\\ &= -\sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(A)} - \sum_{A\in\alpha}\sum_{B\in\beta}\mu(A\cap B)\log\mu(A)\\ &= H_\mu(\beta|\alpha)+H_\mu(\alpha).\end{aligned}$$
We now show part (iii), that $H_\mu(\beta|\alpha)\le H_\mu(\beta)$. Recall that the function $\Phi(t)=-t\log t$ for $t\ge0$ is concave. Thus,
$$\begin{aligned}H_\mu(\beta|\alpha) &= -\sum_{B\in\beta}\sum_{A\in\alpha}\mu(A\cap B)\log\frac{\mu(A\cap B)}{\mu(A)}\\ &= -\sum_{B\in\beta}\sum_{A\in\alpha}\mu(A)\frac{\mu(A\cap B)}{\mu(A)}\log\frac{\mu(A\cap B)}{\mu(A)}\\ &= \sum_{B\in\beta}\sum_{A\in\alpha}\mu(A)\,\Phi\!\left(\frac{\mu(A\cap B)}{\mu(A)}\right)\\ &\le \sum_{B\in\beta}\Phi\!\left(\sum_{A\in\alpha}\mu(A)\frac{\mu(A\cap B)}{\mu(A)}\right)\\ &= \sum_{B\in\beta}\Phi(\mu(B)) = H_\mu(\beta).\end{aligned}$$

The proofs of the other parts are left as an exercise.

Exercise 9.2.2. Prove the rest of the properties of Proposition 9.2.1.

Now given a partition $\alpha$ of $X$, consider the partition $\bigvee_{i=0}^{n-1}T^{-i}\alpha$, whose atoms are of the form
$$A_{i_0}\cap T^{-1}A_{i_1}\cap\cdots\cap T^{-(n-1)}A_{i_{n-1}},$$
consisting of all points $x\in X$ with the property that $x\in A_{i_0}$, $Tx\in A_{i_1}$, $\ldots$, $T^{n-1}x\in A_{i_{n-1}}$.

Exercise 9.2.3. Show that if $\alpha$ is a finite partition of $(X,\mathcal F,\mu,T)$, then
$$H_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right) = H_\mu(\alpha)+\sum_{j=1}^{n-1}H_\mu\left(\alpha\,\Big|\,\bigvee_{i=1}^jT^{-i}\alpha\right).$$

Entropy  153

To define the notion of the entropy of a transformation with respect to a partition, we need the following two propositions.

Proposition 9.2.2. If $(a_n)$ is a subadditive sequence of real numbers, i.e., $a_{n+p}\le a_n+a_p$ for all $n,p$, then $\lim_{n\to\infty}\frac{a_n}{n}$ exists.

Proof. Fix any $m>0$. For any $n>0$ one has $n=km+i$ for some $0\le i\le m-1$. By subadditivity it follows that
$$\frac{a_n}{n} = \frac{a_{km+i}}{km+i} \le \frac{a_{km}}{km}+\frac{a_i}{km} \le k\frac{a_m}{km}+\frac{a_i}{km} = \frac{a_m}{m}+\frac{a_i}{km}.$$
Note that if $n\to\infty$, then $k\to\infty$, and so $\limsup_{n\to\infty}\frac{a_n}{n}\le\frac{a_m}{m}$. Since $m$ is arbitrary, one has
$$\limsup_{n\to\infty}\frac{a_n}{n} \le \inf_m\frac{a_m}{m} \le \liminf_{n\to\infty}\frac{a_n}{n}.$$
Therefore $\lim_{n\to\infty}\frac{a_n}{n}$ exists, and equals $\inf_n\frac{a_n}{n}$.

Proposition 9.2.3. Let $\alpha$ be a finite partition of $(X,\mathcal F,\mu,T)$, where $T$ is a measure preserving transformation. Then $\lim_{n\to\infty}\frac1nH_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right)$ exists.

Proof. Let $a_n=H_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right)\ge0$. Then, by Proposition 9.2.1(iv) and (i), we have
$$a_{n+p} = H_\mu\left(\bigvee_{i=0}^{n+p-1}T^{-i}\alpha\right) \le H_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right)+H_\mu\left(\bigvee_{i=n}^{n+p-1}T^{-i}\alpha\right) = a_n+H_\mu\left(\bigvee_{i=0}^{p-1}T^{-i}\alpha\right) = a_n+a_p.$$
Hence, by Proposition 9.2.2,
$$\lim_{n\to\infty}\frac{a_n}{n} = \lim_{n\to\infty}\frac1nH_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right)$$
exists.



154  A First Course in Ergodic Theory

We are now in a position to give the definition of the entropy of the transformation $T$.

Definition 9.2.2. Let $(X,\mathcal F,\mu)$ be a probability space and $T:X\to X$ measure preserving. The entropy of $T$ with respect to the finite partition $\alpha$ is given by
$$h_\mu(\alpha,T) := \lim_{n\to\infty}\frac1nH_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right),$$
where
$$H_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right) = -\sum_{D\in\bigvee_{i=0}^{n-1}T^{-i}\alpha}\mu(D)\log\mu(D).$$
Finally, the measure theoretic or metric entropy of the transformation $T$ is given by
$$h_\mu(T) := \sup_\alpha h_\mu(\alpha,T),$$
where the supremum is taken over all finite partitions of $X$.

The following theorem gives an equivalent definition of metric entropy.

Theorem 9.2.1. Let $(X,\mathcal F,\mu)$ be a probability space and $T:X\to X$ measure preserving. Let $\alpha$ be a finite partition of $X$. The entropy of $T$ with respect to $\alpha$ is also given by
$$h_\mu(\alpha,T) = \lim_{n\to\infty}H_\mu\left(\alpha\,\Big|\,\bigvee_{i=1}^{n-1}T^{-i}\alpha\right).$$

Proof. Notice that the sequence $\left(H_\mu\left(\alpha\,|\,\bigvee_{i=1}^nT^{-i}\alpha\right)\right)_{n\ge1}$ is bounded from below, and is non-increasing by Proposition 9.2.1(vii). Hence $\lim_{n\to\infty}H_\mu\left(\alpha\,|\,\bigvee_{i=1}^nT^{-i}\alpha\right)$ exists. Furthermore,
$$\lim_{n\to\infty}H_\mu\left(\alpha\,\Big|\,\bigvee_{i=1}^nT^{-i}\alpha\right) = \lim_{n\to\infty}\frac1n\sum_{j=1}^nH_\mu\left(\alpha\,\Big|\,\bigvee_{i=1}^jT^{-i}\alpha\right),$$
since the Cesàro averages of a convergent sequence converge to the same limit. From Exercise 9.2.3, we have
$$H_\mu\left(\bigvee_{i=0}^{n-1}T^{-i}\alpha\right) = H_\mu(\alpha)+\sum_{j=1}^{n-1}H_\mu\left(\alpha\,\Big|\,\bigvee_{i=1}^jT^{-i}\alpha\right).$$
Now, dividing by $n$ and taking the limit as $n\to\infty$, one gets the desired result.

Entropy  155

The next theorem shows that entropy can be used to distinguish non-isomorphic dynamical systems.

Theorem 9.2.2. Entropy is an isomorphism invariant.

Proof. Let $(X,\mathcal F,\mu,T)$ and $(Y,\mathcal G,\nu,S)$ be two isomorphic measure preserving systems with $\psi:X\to Y$ the corresponding isomorphism. We need to show that $h_\mu(T)=h_\nu(S)$. Let $\beta=\{B_1,\ldots,B_n\}$ be any finite partition of $Y$; then $\psi^{-1}\beta=\{\psi^{-1}B_1,\ldots,\psi^{-1}B_n\}$ is a partition of $X$. Set $A_i=\psi^{-1}B_i$ for $1\le i\le n$. Since $\psi:X\to Y$ is an isomorphism, we have that $\nu=\mu\circ\psi^{-1}$ and $\psi\circ T=S\circ\psi$, so that for any $n\ge0$ and $B_{i_0},\ldots,B_{i_{n-1}}\in\beta$,
$$\begin{aligned}\nu\left(B_{i_0}\cap S^{-1}B_{i_1}\cap\cdots\cap S^{-(n-1)}B_{i_{n-1}}\right) &= \mu\left(\psi^{-1}B_{i_0}\cap\psi^{-1}S^{-1}B_{i_1}\cap\cdots\cap\psi^{-1}S^{-(n-1)}B_{i_{n-1}}\right)\\ &= \mu\left(\psi^{-1}B_{i_0}\cap T^{-1}\psi^{-1}B_{i_1}\cap\cdots\cap T^{-(n-1)}\psi^{-1}B_{i_{n-1}}\right)\\ &= \mu\left(A_{i_0}\cap T^{-1}A_{i_1}\cap\cdots\cap T^{-(n-1)}A_{i_{n-1}}\right).\end{aligned}$$
Setting $A(n)=A_{i_0}\cap T^{-1}A_{i_1}\cap\cdots\cap T^{-(n-1)}A_{i_{n-1}}$ and $B(n)=B_{i_0}\cap S^{-1}B_{i_1}\cap\cdots\cap S^{-(n-1)}B_{i_{n-1}}$, we find that
$$\begin{aligned}h_\nu(S) &= \sup_\beta h_\nu(\beta,S) = \sup_\beta\lim_{n\to\infty}\frac1nH_\nu\left(\bigvee_{i=0}^{n-1}S^{-i}\beta\right)\\ &= \sup_\beta\lim_{n\to\infty}-\frac1n\sum_{B(n)\in\bigvee_{i=0}^{n-1}S^{-i}\beta}\nu(B(n))\log\nu(B(n))\\ &= \sup_{\psi^{-1}\beta}\lim_{n\to\infty}-\frac1n\sum_{A(n)\in\bigvee_{i=0}^{n-1}T^{-i}\psi^{-1}\beta}\mu(A(n))\log\mu(A(n))\\ &= \sup_{\psi^{-1}\beta}h_\mu(\psi^{-1}\beta,T) \le \sup_\alpha h_\mu(\alpha,T) = h_\mu(T),\end{aligned}$$
where in the last inequality the supremum is taken over all possible finite partitions $\alpha$ of $X$. Thus $h_\nu(S)\le h_\mu(T)$. By symmetry we find $h_\nu(S)=h_\mu(T)$, and the proof is complete.

156  A First Course in Ergodic Theory

The previous result shows that one can check whether two systems are isomorphic by computing the metric entropy. The result only goes in one direction: if two systems have different metric entropies, then they are not isomorphic. To see that the other implication does not necessarily hold, recall the positive and negative β-transformations $T_\beta$ and $S_\beta$, respectively, from Example 5.1.4. In Example 9.4.1 below we see that for any $\beta\in(1,2)$ the map $T_\beta$ has metric entropy $h_{\mu_\beta}(T_\beta)=\log\beta$ for the invariant measure $\mu_\beta$ identified in Example 6.3.4. Exercise 9.4.6 below considers the negative β-transformation $S_\beta$ for the specific value $\hat\beta>1$ satisfying $\hat\beta^3-\hat\beta^2-1=0$. It gives an invariant measure $\nu_{\hat\beta}$ for $S_{\hat\beta}$ that is absolutely continuous with respect to $\lambda$ and has $h_{\nu_{\hat\beta}}(S_{\hat\beta})=\log\hat\beta$. Since $\hat\beta<\frac{1+\sqrt5}{2}$, we know from Example 5.1.4 that the dynamical systems $([0,1],\mathcal B,\mu_{\hat\beta},T_{\hat\beta})$ and $([0,1],\mathcal B,\nu_{\hat\beta},S_{\hat\beta})$ are not isomorphic, even though they have the same metric entropies. In fact, using combined results by F. Hofbauer and G. Keller from [21, 22, 23] one can show that the same is true for any other value $\beta\in\left(1,\frac{1+\sqrt5}{2}\right)$. There is one important class of dynamical systems for which the metric entropy is a complete invariant, namely two-sided Bernoulli shifts. A famous result by D. Ornstein, see [47], states that two two-sided Bernoulli shifts are metrically isomorphic if and only if their metric entropies are equal.

9.3 CALCULATION OF ENTROPY AND EXAMPLES

Calculating the entropy of a transformation directly from the definition does not seem very feasible, since it requires taking the supremum over all finite partitions. However, the entropy of a partition is relatively easy to compute if one has full information about the partition under consideration. So the question is whether it is possible to find a partition $\alpha$ of $X$ for which $h_\mu(\alpha,T)=h_\mu(T)$. Naturally, such a partition contains all the information "transmitted" by $T$. To answer this question we need some notation and definitions. For $\alpha=\{A_1,\ldots,A_N\}$ and $m,n\in\mathbb Z$ with $n\le m$, let
$$\sigma\left(\bigvee_{i=n}^mT^{-i}\alpha\right)$$
be the smallest σ-algebra containing all elements from the partition $\bigvee_{i=n}^mT^{-i}\alpha$. We only consider forward iterates of $T$, i.e., $n\ge0$, in case $T$ is not invertible. We call a partition $\alpha$ a generator with respect to an invertible transformation $T$ if $\sigma\left(\bigvee_{i=-\infty}^\infty T^{-i}\alpha\right)=\mathcal F$, where $\mathcal F$ is the σ-algebra on $X$ and $\sigma\left(\bigvee_{i=-\infty}^\infty T^{-i}\alpha\right)$ is the smallest σ-algebra containing all elements of the partitions $\bigvee_{i=n}^mT^{-i}\alpha$ for all $m,n\in\mathbb Z$ with $n\le m$. If $T$ is non-invertible, then $\alpha$ is said to be a generator if $\sigma\left(\bigvee_{i=0}^\infty T^{-i}\alpha\right)=\mathcal F$. Naturally, this equality is modulo sets of measure zero.

We now state two famous theorems, known as the Kolmogorov-Sinai Theorem and Krieger's Generator Theorem. A first version of the Kolmogorov-Sinai Theorem was provided by A. N. Kolmogorov in his lectures, and the proof for the general case was given by Ya. G. Sinai in [61] from 1959. Not unexpectedly, Krieger's Generator Theorem was proven by W. Krieger, see [32] from 1970. Before we state the theorems, we remark that although we have considered only finite partitions on $X$, all the definitions and results also hold for countable partitions of finite entropy.

Theorem 9.3.1 (Kolmogorov-Sinai Theorem). Let $T:X\to X$ be a measure preserving transformation on the probability space $(X,\mathcal F,\mu)$ and $\alpha$ a finite or countable partition with $H_\mu(\alpha)<\infty$. If $\alpha$ is a generator with respect to $T$, then $h_\mu(T)=h_\mu(\alpha,T)$.

Theorem 9.3.2 (Krieger's Generator Theorem). If $T$ is an ergodic measure preserving transformation with $h_\mu(T)<\infty$, then $T$ has a finite generator.

The proof of Theorem 9.3.1 is given in Exercise 9.3.1 below. For the proof of Theorem 9.3.2, we refer the interested reader to the books [50, 65].

Exercise 9.3.1. Let $(X,\mathcal F,\mu)$ be a probability space and $T:X\to X$ a measure preserving transformation.
(a) Suppose $\alpha$ is a finite partition of $X$. Show that $h_\mu(\alpha,T)=h_\mu\left(\bigvee_{i=1}^nT^{-i}\alpha,T\right)$ for any $n\ge1$.
(b) Let $\alpha$ and $\beta$ be finite partitions. Show that $h_\mu(\beta,T)\le h_\mu(\alpha,T)+H_\mu(\beta|\alpha)$.
(c) Suppose $\alpha$ is a finite generator, i.e., $\sigma\left(\bigvee_{i=0}^\infty T^{-i}\alpha\right)=\mathcal F$. Using parts (a) and (b), show that for any finite partition $\beta$ of $X$ one has $h_\mu(\beta,T)\le h_\mu(\alpha,T)$. Conclude that $h_\mu(\alpha,T)=h_\mu(T)$.

We will use these two theorems to calculate the entropy of a Bernoulli shift.


Example 9.3.1 (Entropy of a Bernoulli Shift). Let $T$ be the left shift on $X = \{0, 1, \ldots, k-1\}^{\mathbb N}$ endowed with the σ-algebra $\mathcal C$ generated by the cylinder sets, and product measure µ giving symbol $i$ probability $p_i$, where $p_0 + p_1 + \cdots + p_{k-1} = 1$. Our aim is to calculate $h_\mu(T)$. To this end we need to find a partition α which generates the σ-algebra $\mathcal C$ under the action of $T$. The natural choice of α is what is known as the time-zero partition $\alpha = \{A_0, \ldots, A_{k-1}\}$, where $A_i := \{x \in X : x_1 = i\}$, $i = 0, \ldots, k-1$. Notice that for all $m \ge 0$, $T^{-m}A_i = \{x \in X : x_{m+1} = i\}$, and
$$\bigcap_{j=0}^{m-1} T^{-j}A_{i_j} = \{x \in X : x_{j+1} = i_j,\ 0 \le j \le m-1\}.$$
In other words, $\bigvee_{i=0}^{m-1} T^{-i}\alpha$ is precisely the collection of cylinder sets of length $m$, and these by definition generate $\mathcal C$. Hence, α is a generating partition, so that
$$h_\mu(T) = h_\mu(\alpha, T) = \lim_{m\to\infty}\frac{1}{m} H_\mu\Big(\bigvee_{i=0}^{m-1} T^{-i}\alpha\Big).$$
First notice that, since µ is a product measure, the partitions $\alpha, T^{-1}\alpha, \ldots, T^{-(m-1)}\alpha$ are all independent, since each partition specifies a different coordinate. So
$$H_\mu\big(\alpha \vee T^{-1}\alpha \vee \cdots \vee T^{-(m-1)}\alpha\big) = H_\mu(\alpha) + H_\mu(T^{-1}\alpha) + \cdots + H_\mu(T^{-(m-1)}\alpha) = mH_\mu(\alpha) = -m\sum_{i=0}^{k-1} p_i\log p_i.$$
Thus,
$$h_\mu(T) = \lim_{m\to\infty}\frac{1}{m}(-m)\sum_{i=0}^{k-1} p_i\log p_i = -\sum_{i=0}^{k-1} p_i\log p_i.$$
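The computation in Example 9.3.1 is easy to reproduce numerically. The sketch below (plain Python; the weight vector is an arbitrary choice, and $\log_2$ is used so that entropy is measured in bits) evaluates $-\sum_i p_i\log_2 p_i$ and checks that the partition into length-$m$ cylinders has entropy $mH_\mu(\alpha)$:

```python
import itertools
import math

def bernoulli_entropy(p):
    """h_mu(T) = -sum_i p_i log2 p_i for the Bernoulli shift with weights p."""
    return -sum(q * math.log2(q) for q in p if q > 0)

def cylinder_measure(word, p):
    """Product measure of the cylinder set fixing the digits in `word`."""
    mu = 1.0
    for i in word:
        mu *= p[i]
    return mu

p = [0.5, 0.25, 0.25]
h = bernoulli_entropy(p)          # 1.5 bits

# H_mu of the partition into cylinders of length m equals m * H_mu(alpha):
m = 4
H_m = -sum(cylinder_measure(w, p) * math.log2(cylinder_measure(w, p))
           for w in itertools.product(range(len(p)), repeat=m))
print(h, H_m / m)                 # 1.5 1.5
```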


Exercise 9.3.2. Let $T$ be the left shift on $X = \{0, 1, \ldots, k-1\}^{\mathbb Z}$ endowed with the σ-algebra $\mathcal C$ generated by the cylinder sets, and the Markov measure µ given by the stochastic matrix $P = (p_{ij})$ and the probability vector $\pi = (\pi_0, \ldots, \pi_{k-1})$ with $\pi P = \pi$. Show that
$$h_\mu(T) = -\sum_{i=0}^{k-1}\sum_{j=0}^{k-1} \pi_i p_{ij}\log p_{ij}.$$
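The formula in Exercise 9.3.2 can be evaluated numerically once the stationary vector π is known. A minimal sketch (power iteration is used to find π; it assumes the chain given by $P$ is irreducible and aperiodic, and the matrices below are arbitrary examples):

```python
import math

def markov_entropy(P, iters=500):
    """Entropy of the Markov shift with row-stochastic matrix P:
    find the stationary vector pi with pi P = pi by power iteration,
    then return -sum_i sum_j pi_i p_ij log2 p_ij."""
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return -sum(pi[i] * P[i][j] * math.log2(P[i][j])
                for i in range(k) for j in range(k) if P[i][j] > 0)

P = [[0.5, 0.5],
     [0.5, 0.5]]           # an i.i.d. fair coin viewed as a Markov chain
print(markov_entropy(P))    # 1.0, matching the Bernoulli(1/2, 1/2) shift
```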

Exercise 9.3.3. Let $(X, \mathcal F, \mu)$ be a probability space and $T : X \to X$ a measure preserving transformation. Let $k > 0$.

(a) Show that for any finite partition α of $X$ one has $h_\mu\big(\bigvee_{i=0}^{k-1} T^{-i}\alpha, T^k\big) = kh_\mu(\alpha, T)$.

(b) Prove that $kh_\mu(T) \le h_\mu(T^k)$.

(c) Prove that $h_\mu(\alpha, T^k) \le kh_\mu(\alpha, T)$.

(d) Prove that $h_\mu(T^k) = kh_\mu(T)$.

Recall from Theorem 12.3.2 that the fact that we are working on a Lebesgue space implies the existence of an increasing sequence of finite partitions $\alpha_1 \le \alpha_2 \le \ldots$ such that $\sigma(\bigcup_n \alpha_n) = \mathcal F$. This allows us to calculate entropy in the following way.

Proposition 9.3.1. If $\alpha_1 \le \alpha_2 \le \ldots$ is an increasing sequence of finite partitions on $(X, \mathcal F, \mu, T)$ such that $\sigma(\alpha_n) \nearrow \mathcal F$, then $h_\mu(T) = \lim_{n\to\infty} h_\mu(\alpha_n, T)$.

Proof. It is enough to show that for any finite partition β, one has $h_\mu(\beta, T) \le \lim_{n\to\infty} h_\mu(\alpha_n, T)$. From Exercise 9.3.1(b), $h_\mu(\beta, T) \le h_\mu(\alpha_n, T) + H_\mu(\beta|\alpha_n)$. Since $\sigma(\alpha_n) \nearrow \mathcal F$, the Martingale Convergence Theorem (see Theorem 12.4.7) together with the Dominated Convergence Theorem gives
$$\lim_{n\to\infty} H_\mu(\beta|\alpha_n) = \lim_{n\to\infty} H_\mu(\beta|\sigma(\alpha_n)) = H_\mu(\beta|\mathcal F) = 0.$$
Thus, $h_\mu(\beta, T) \le \lim_{n\to\infty} h_\mu(\alpha_n, T)$, and hence $h_\mu(T) = \lim_{n\to\infty} h_\mu(\alpha_n, T)$.
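The mechanism behind Proposition 9.3.1 can be watched on a toy example: on a finite space, conditioning a fixed partition β on finer and finer partitions $\alpha_n$ drives $H_\mu(\beta|\alpha_n)$ to zero. A small sketch (the space, measure and partitions below are arbitrary choices):

```python
import math

# Toy check on X = {0,...,15} with uniform mass: as the dyadic partitions
# alpha_n refine toward the full sigma-algebra, H(beta | alpha_n) drops to 0.
X = range(16)
mu = 1 / 16

def H_cond(beta, alpha):
    """H(beta|alpha) = -sum_{A,B} mu(A∩B) log2( mu(A∩B) / mu(A) )."""
    total = 0.0
    for A in alpha:
        for B in beta:
            i = len(A & B) * mu
            if i > 0:
                total -= i * math.log2(i / (len(A) * mu))
    return total

def dyadic(n):      # partition of X into 2^n blocks of equal size
    size = 16 // 2 ** n
    return [set(range(k, k + size)) for k in range(0, 16, size)]

beta = [set(range(0, 5)), set(range(5, 16))]     # an arbitrary partition
vals = [H_cond(beta, dyadic(n)) for n in range(5)]
print(vals)          # decreasing, with last value 0.0
```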

Exercise 9.3.4. Let $T_1$ and $T_2$ be measure preserving transformations of the probability spaces $(X_1, \mathcal F_1, \mu_1)$ and $(X_2, \mathcal F_2, \mu_2)$, respectively. Prove that $h_{\mu_1\times\mu_2}(T_1 \times T_2) = h_{\mu_1}(T_1) + h_{\mu_2}(T_2)$.


9.4 THE SHANNON-MCMILLAN-BREIMAN THEOREM

In the previous sections we have considered only finite partitions on $X$; as mentioned earlier, all the definitions and results continue to hold for countable partitions of finite entropy. Before we state and prove the Shannon-McMillan-Breiman Theorem, we need to introduce the information function associated with a partition. Let $(X, \mathcal F, \mu)$ be a probability space, and $\alpha = \{A_1, A_2, \ldots\}$ a finite or a countable partition of $X$ into measurable sets. For each $x \in X$, let $\alpha(x)$ be the element of α to which $x$ belongs. Then, the information function associated to α is defined to be
$$I_\alpha(x) = -\log\mu(\alpha(x)) = -\sum_{A\in\alpha}\mathbf 1_A(x)\log\mu(A).$$
Note that $I_\alpha$ is only defined up to µ-a.e. equivalence. Also note that
$$\int_X I_\alpha\, d\mu = H_\mu(\alpha).$$
For two finite or countable partitions α and β of $X$, we define the conditional information function of α given β by
$$I_{\alpha|\beta}(x) = -\sum_{B\in\beta}\sum_{A\in\alpha}\mathbf 1_{A\cap B}(x)\log\frac{\mu(A\cap B)}{\mu(B)}.$$
We claim that
$$I_{\alpha|\beta}(x) = -\log E_\mu(\mathbf 1_{\alpha(x)}|\sigma(\beta))(x) = -\sum_{A\in\alpha}\mathbf 1_A(x)\log E_\mu(\mathbf 1_A|\sigma(\beta))(x), \tag{9.3}$$
where $\sigma(\beta)$ is the σ-algebra generated by the finite or countable partition β. This follows from the fact (which is easy to prove using the definition of conditional expectations) that if β is finite or countable, then for any $f \in L^1(X, \mathcal F, \mu)$ one has
$$E_\mu(f|\sigma(\beta)) = \sum_{B\in\beta}\mathbf 1_B\,\frac{1}{\mu(B)}\int_B f\, d\mu.$$
Clearly, $H_\mu(\alpha|\beta) = \int_X I_{\alpha|\beta}\, d\mu$.

Exercise 9.4.1. Let α and β be finite or countable partitions of $X$. Show that $I_{\alpha\vee\beta} = I_\alpha + I_{\beta|\alpha}$.
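The chain rule of Exercise 9.4.1 holds pointwise and can be checked mechanically on a finite space. A small sketch (the masses and partitions below are arbitrary choices, and this is a numerical illustration, not a proof):

```python
import math

# Toy check of the chain rule I_{alpha∨beta} = I_alpha + I_{beta|alpha}
# on an eight-point probability space.
mu = [0.1, 0.2, 0.05, 0.15, 0.1, 0.1, 0.2, 0.1]
alpha = [{0, 1, 2, 3}, {4, 5, 6, 7}]
beta = [{0, 4}, {1, 2, 5}, {3, 6, 7}]

def mass(S):
    return sum(mu[x] for x in S)

def block(part, x):
    return next(B for B in part if x in B)

def info(part, x):                       # I_part(x) = -log2 mu(part(x))
    return -math.log2(mass(block(part, x)))

def cond_info(pa, pb, x):                # I_{pa|pb}(x)
    A, B = block(pa, x), block(pb, x)
    return -math.log2(mass(A & B) / mass(B))

join = [A & B for A in alpha for B in beta if A & B]   # alpha ∨ beta

err = max(abs(info(join, x) - info(alpha, x) - cond_info(beta, alpha, x))
          for x in range(8))
print(err)    # ~1e-16: the identity holds at every point
```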


Now suppose $T : X \to X$ is a measure preserving transformation on $(X, \mathcal F, \mu)$, and let $\alpha = \{A_1, A_2, \ldots\}$ be any countable partition. Since $T^{-1}\alpha = \{T^{-1}A_1, T^{-1}A_2, \ldots\}$ is also a countable partition and $T$ is measure preserving, one has
$$I_{T^{-1}\alpha}(x) = -\sum_{A_i\in\alpha}\mathbf 1_{T^{-1}A_i}(x)\log\mu(T^{-1}A_i) = -\sum_{A_i\in\alpha}\mathbf 1_{A_i}(Tx)\log\mu(A_i) = I_\alpha(Tx). \tag{9.4}$$
Furthermore,
$$\lim_{n\to\infty}\frac{1}{n+1}H_\mu\Big(\bigvee_{i=0}^{n} T^{-i}\alpha\Big) = \lim_{n\to\infty}\frac{1}{n+1}\int_X I_{\bigvee_{i=0}^{n} T^{-i}\alpha}\, d\mu = h_\mu(\alpha, T).$$

The Shannon-McMillan-Breiman Theorem, named after the contributions of C. Shannon ([59], 1948), B. McMillan ([41], 1953) and L. Breiman ([9], 1957 with a correction in 1960), says that if $T$ is ergodic and if α has finite entropy, then in fact the integrand $\frac{1}{n+1}I_{\bigvee_{i=0}^{n} T^{-i}\alpha}$ converges a.e. to $h_\mu(\alpha, T)$. Before we proceed we need the following proposition.

Proposition 9.4.1. Let $\alpha = \{A_1, A_2, \ldots\}$ be a countable partition with finite entropy. For each $n \ge 1$, let $f_n = I_{\alpha|\bigvee_{i=1}^{n} T^{-i}\alpha}$, and let $f^* = \sup_{n\ge1} f_n$. Then, for each $t \ge 0$ and for each $A \in \alpha$,
$$\mu(\{x \in A : f^*(x) > t\}) \le 2^{-t}.$$
Furthermore, $f^* \in L^1(X, \mathcal F, \mu)$.

Proof. Let $t \ge 0$ and $A \in \alpha$. For $n \ge 1$, let
$$f_n^A(x) = -\log E_\mu\Big(\mathbf 1_A\,\Big|\,\sigma\Big(\bigcup_{i=1}^{n} T^{-i}\alpha\Big)\Big)(x),$$
and
$$B_n = \{x \in X : f_1^A(x) \le t, \ldots, f_{n-1}^A(x) \le t, f_n^A(x) > t\}.$$
Notice that for $x \in A$ one has $f_n(x) = f_n^A(x)$, and for $x \in B_n$ one has $E_\mu\big(\mathbf 1_A|\sigma(\bigcup_{i=1}^{n} T^{-i}\alpha)\big)(x) < 2^{-t}$. Since $B_n \in \sigma\big(\bigcup_{i=1}^{n} T^{-i}\alpha\big)$, then
$$\mu(B_n \cap A) = \int_{B_n}\mathbf 1_A\, d\mu = \int_{B_n} E_\mu\Big(\mathbf 1_A\,\Big|\,\sigma\Big(\bigcup_{i=1}^{n} T^{-i}\alpha\Big)\Big)\, d\mu \le \int_{B_n} 2^{-t}\, d\mu(x) = 2^{-t}\mu(B_n).$$


Thus,
$$\mu(\{x \in A : f^*(x) > t\}) = \mu(\{x \in A : f_n(x) > t \text{ for some } n\}) = \mu(\{x \in A : f_n^A(x) > t \text{ for some } n\}) = \mu\Big(\bigcup_{n=1}^{\infty} A\cap B_n\Big) = \sum_{n\ge1}\mu(A\cap B_n) \le 2^{-t}\sum_{n\ge1}\mu(B_n) \le 2^{-t}.$$
We now show that $f^* \in L^1(X, \mathcal F, \mu)$. First notice that $\mu(\{x \in A : f^*(x) > t\}) \le \mu(A)$, hence
$$\mu(\{x \in A : f^*(x) > t\}) \le \min(\mu(A), 2^{-t}).$$
Using Fubini's Theorem and the fact that $f^* \ge 0$, one has
$$\begin{aligned}
\int_X f^*\, d\mu &= \int_0^{\infty}\mu(\{x \in X : f^*(x) > t\})\, dt = \sum_{A\in\alpha}\int_0^{\infty}\mu(\{x \in A : f^*(x) > t\})\, dt\\
&\le \sum_{A\in\alpha}\int_0^{\infty}\min(\mu(A), 2^{-t})\, dt = \sum_{A\in\alpha}\Big(\int_0^{-\log\mu(A)}\mu(A)\, dt + \int_{-\log\mu(A)}^{\infty} 2^{-t}\, dt\Big)\\
&= -\sum_{A\in\alpha}\mu(A)\log\mu(A) + \sum_{A\in\alpha}\frac{\mu(A)}{\ln 2} = H_\mu(\alpha) + \frac{1}{\ln 2} < \infty.
\end{aligned}$$

So far we have defined the notion of conditional information $I_{\alpha|\beta}$ when α and β are countable partitions. We can generalize the definition to the case where α is a countable partition and $\mathcal G$ is a σ-algebra by setting (see (9.3))
$$I_{\alpha|\mathcal G}(x) = -\log E_\mu(\mathbf 1_{\alpha(x)}|\mathcal G)(x).$$
Then
$$I_{\alpha|\sigma(\bigcup_{i=1}^{\infty} T^{-i}\alpha)}(x) = \lim_{n\to\infty} I_{\alpha|\sigma(\bigcup_{i=1}^{n} T^{-i}\alpha)}(x). \tag{9.5}$$

Exercise 9.4.2. Give a proof of (9.5) using the Martingale Convergence Theorem, see Theorem 12.4.7.

Theorem 9.4.1 (Shannon-McMillan-Breiman Theorem). Suppose $T$ is an ergodic measure preserving transformation on a probability space $(X, \mathcal F, \mu)$, and let α be a countable partition with $H_\mu(\alpha) < \infty$. Then,
$$\lim_{n\to\infty}\frac{1}{n+1}I_{\bigvee_{i=0}^{n} T^{-i}\alpha}(x) = h_\mu(\alpha, T) \quad \mu\text{-a.e.}$$

Proof. For each $n \ge 1$, let $f_n(x) = I_{\alpha|\bigvee_{i=1}^{n} T^{-i}\alpha}(x)$. Using (9.4) and the fact that for any two partitions β and γ, $I_{\beta\vee\gamma}(x) = I_\beta(x) + I_{\gamma|\beta}(x)$, we obtain
$$\begin{aligned}
I_{\bigvee_{i=0}^{n} T^{-i}\alpha}(x) &= I_{\bigvee_{i=1}^{n} T^{-i}\alpha}(x) + I_{\alpha|\bigvee_{i=1}^{n} T^{-i}\alpha}(x)\\
&= I_{\bigvee_{i=0}^{n-1} T^{-i}\alpha}(Tx) + f_n(x)\\
&= I_{\bigvee_{i=1}^{n-1} T^{-i}\alpha}(Tx) + I_{\alpha|\bigvee_{i=1}^{n-1} T^{-i}\alpha}(Tx) + f_n(x)\\
&= I_{\bigvee_{i=0}^{n-2} T^{-i}\alpha}(T^2x) + f_{n-1}(Tx) + f_n(x)\\
&\;\;\vdots\\
&= I_\alpha(T^n x) + f_1(T^{n-1}x) + \cdots + f_{n-1}(Tx) + f_n(x).
\end{aligned}$$
Let $f(x) = I_{\alpha|\sigma(\bigcup_{i=1}^{\infty} T^{-i}\alpha)}(x) = \lim_{n\to\infty} f_n(x)$. Notice that $f \in L^1(X, \mathcal F, \mu)$ since $\int_X f\, d\mu = h_\mu(\alpha, T)$. Now letting $f_0 = I_\alpha$, we have
$$\frac{1}{n+1}I_{\bigvee_{i=0}^{n} T^{-i}\alpha}(x) = \frac{1}{n+1}\sum_{k=0}^{n} f_{n-k}(T^k x) = \frac{1}{n+1}\sum_{k=0}^{n} f(T^k x) + \frac{1}{n+1}\sum_{k=0}^{n}(f_{n-k}-f)(T^k x).$$
By the Pointwise Ergodic Theorem,
$$\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n} f(T^k x) = \int_X f\, d\mu = h_\mu(\alpha, T) \quad \mu\text{-a.e.}$$


We now study the sequence $\big(\frac{1}{n+1}\sum_{k=0}^{n}(f_{n-k}-f)(T^k x)\big)$. Let
$$F_N = \sup_{k\ge N}|f_k - f|, \quad\text{and}\quad f^* = \sup_{n\ge1} f_n.$$
Notice that $0 \le F_N \le f^* + f$, hence $F_N \in L^1(X, \mathcal F, \mu)$ and $\lim_{N\to\infty} F_N(x) = 0$ µ-a.e. By the Lebesgue Dominated Convergence Theorem, one has $\lim_{N\to\infty}\int F_N\, d\mu = 0$. Also for any $k$, $|f_{n-k} - f| \le f^* + f$, so that $|f_{n-k} - f| \in L^1(X, \mathcal F, \mu)$ and $\lim_{n\to\infty}|f_{n-k} - f| = 0$ µ-a.e. For any $N \ge 1$ and for all $n \ge N$ one has
$$\begin{aligned}
\frac{1}{n+1}\sum_{k=0}^{n}|f_{n-k}-f|(T^k x) &= \frac{1}{n+1}\sum_{k=0}^{n-N}|f_{n-k}-f|(T^k x) + \frac{1}{n+1}\sum_{k=n-N+1}^{n}|f_{n-k}-f|(T^k x)\\
&\le \frac{1}{n+1}\sum_{k=0}^{n-N} F_N(T^k x) + \frac{1}{n+1}\sum_{k=0}^{N-1}|f_k - f|(T^{n-k}x).
\end{aligned}$$
If we take the limit as $n \to \infty$, then by Exercise 3.1.1 the second term tends to 0 µ-a.e., and by the Pointwise Ergodic Theorem the first term tends to $\int F_N\, d\mu$. Now taking the limit as $N \to \infty$, one sees that
$$\lim_{n\to\infty}\frac{1}{n+1}\sum_{k=0}^{n}|f_{n-k}-f|(T^k x) = 0 \quad \mu\text{-a.e.}$$
Hence,
$$\lim_{n\to\infty}\frac{1}{n+1}I_{\bigvee_{i=0}^{n} T^{-i}\alpha}(x) = h_\mu(\alpha, T) \quad \mu\text{-a.e.}$$

The above theorem can be interpreted as providing an estimate of the size of the atoms of $\bigvee_{i=0}^{n} T^{-i}\alpha$. For $n$ sufficiently large, a typical element $A \in \bigvee_{i=0}^{n} T^{-i}\alpha$ satisfies
$$-\frac{1}{n+1}\log\mu(A) \approx h_\mu(\alpha, T), \quad\text{or}\quad \mu(A) \approx 2^{-(n+1)h_\mu(\alpha, T)}.$$
Furthermore, if α is a generating partition (i.e., $\sigma\big(\bigcup_{i=0}^{\infty} T^{-i}\alpha\big) = \mathcal F$), then in the conclusion of the Shannon-McMillan-Breiman Theorem one can replace $h_\mu(\alpha, T)$ by $h_\mu(T)$.

Exercise 9.4.3. Let $(X, \mathcal F, \mu, T)$ be a measure preserving and ergodic dynamical system on a probability space. Suppose α is a finite or countable partition of $X$ with $H_\mu(\alpha) < \infty$. For $x \in X$, let $\alpha_n(x)$ be the element of the partition $\bigvee_{i=0}^{n-1} T^{-i}\alpha$ that contains $x$. Suppose λ is another probability measure on $(X, \mathcal F)$ for which there are constants $0 < C_1 < C_2$ such that $C_1\lambda(A) < \mu(A) < C_2\lambda(A)$ for all $A \in \mathcal F$. Show that the conclusion of the Shannon-McMillan-Breiman Theorem holds if we replace µ by λ, i.e.,
$$\lim_{n\to\infty} -\frac{\log\lambda(\alpha_n(x))}{n} = h_\mu(\alpha, T) \quad\text{for }\lambda\text{-a.e. } x.$$
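The almost-everywhere convergence in the Shannon-McMillan-Breiman Theorem can be watched numerically. The sketch below (standard library only; the weights 0.7 and 0.3 are an arbitrary choice) samples one typical point of a Bernoulli shift, where $\mu(\alpha_n(x))$ is just the product of the digit probabilities, and compares $-\frac1n\log_2\mu(\alpha_n(x))$ with the entropy:

```python
import math
import random

random.seed(1)
p = (0.7, 0.3)
h = -sum(q * math.log2(q) for q in p)        # h_mu(alpha, T) ≈ 0.8813 bits

# Draw one mu-typical point x of the Bernoulli shift and watch
# -(1/n) log2 mu(alpha_n(x)) settle near the entropy.
n = 50_000
x = random.choices((0, 1), weights=p, k=n)
log_mu = 0.0
for digit in x:
    log_mu += math.log2(p[digit])
estimate = -log_mu / n
print(h, estimate)   # both close to 0.8813
```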

Exercise 9.4.4. Let $T : [0,1) \to [0,1)$ be the β-transformation given by $Tx = \beta x \pmod 1$, where $\beta = \frac{1+\sqrt 5}{2}$ (see also Example 1.3.6). Use the Shannon-McMillan-Breiman Theorem and Exercise 9.4.3 to calculate the entropy $h_\mu(T)$ of $T$ with respect to the invariant measure µ given by $\mu(A) = \int_A g\, d\lambda$, with
$$g(x) = \begin{cases} \dfrac{5+3\sqrt 5}{10}, & \text{if } 0 \le x < \dfrac{\sqrt 5 - 1}{2},\\[2mm] \dfrac{5+\sqrt 5}{10}, & \text{if } \dfrac{\sqrt 5 - 1}{2} \le x < 1. \end{cases}$$

Exercise 9.4.5. Use the Shannon-McMillan-Breiman Theorem, the Kolmogorov-Sinai Theorem and Exercise 9.4.3 to show that if $T$ is the Gauss map and µ is the Gauss measure (see Example 1.3.7 and Chapter 8), then $h_\mu(T) = \dfrac{\pi^2}{6\log 2}$.

As a tool for computing metric entropy we give a version of Rohlin’s Formula. The result is valid for a large class of dynamical systems. Here we limit ourselves to a version for piecewise monotone interval maps that we state and prove below. The transformations we have in mind are the following. Definition 9.4.1. A transformation T : [0, 1) → [0, 1) is called a number theoretic fibered map (NTFM) if it satisfies the following two conditions.


(a) There exists a finite or countable interval partition $\alpha = (A_j)_{j\in D}$ such that $T$ restricted to each atom of α (fundamental interval of $T$) is strictly monotone and continuous. Furthermore, α generates $\mathcal B$ and $H_\mu(\alpha) < \infty$.

(b) There exists a $T$-invariant probability measure µ equivalent to λ for which two constants $c_1, c_2 > 0$ exist such that $c_1 \le \frac{d\mu}{d\lambda} \le c_2$.

The name comes from the fact that iterations of $T$ assign to each point $x \in [0,1)$ a digit sequence $(a_n(x))$ with digits in $D$. Since α generates $\mathcal B$, these sequences are uniquely determined λ-a.e. We refer to the resulting sequence as the $T$-expansion of $x$. Almost all known number expansions on $[0,1)$ are generated by an NTFM. Among them are the base $N$ expansions ($Tx = Nx \pmod 1$, where $N$ is a positive integer), β-expansions ($Tx = \beta x \pmod 1$, where $\beta > 1$ is a real number), regular continued fraction expansions ($Tx = \frac 1x \pmod 1$), Lüroth expansions (see Example 1.3.5) and many others, see also [14].

Theorem 9.4.2 (Rohlin's Formula). Let $T : [0,1) \to [0,1)$ be an NTFM for a partition α. Assume furthermore that for each $j \in D$ the restriction $T_j := T|_{A_j} : A_j \to TA_j$ is a diffeomorphism. Then
$$h_\mu(T) = \int_{[0,1)}\log|T'|\, d\mu.$$

Proof. Since α is a generating partition, by Theorem 9.2.1 in combination with the Kolmogorov-Sinai Theorem,
$$h_\mu(T) = \lim_{n\to\infty} H_\mu\Big(\alpha\,\Big|\,\bigvee_{i=1}^{n} T^{-i}\alpha\Big) = \lim_{n\to\infty}\int_{[0,1)} I_{\alpha|\bigvee_{i=1}^{n} T^{-i}\alpha}\, d\mu.$$
From Proposition 9.4.1 and by setting, as in the proof of the Shannon-McMillan-Breiman Theorem, $f = I_{\alpha|\sigma(\bigcup_{i=1}^{\infty} T^{-i}\alpha)} = I_{\alpha|T^{-1}\mathcal B}$, we obtain using the Dominated Convergence Theorem that
$$h_\mu(T) = \int_{[0,1)} f\, d\mu.$$
Note that
$$f = -\sum_{k\in D}\mathbf 1_{A_k}\log E_\mu(\mathbf 1_{A_k}|T^{-1}\mathcal B). \tag{9.6}$$
From the proof of Theorem 6.2.2 it follows that for any $n \ge 1$ and any $g \in L^1([0,1), \mathcal B, \mu)$,
$$P_{T,\mu}^n g \circ T^n = E_\mu(g|T^{-n}\mathcal B), \tag{9.7}$$


where $P_{T,\mu}$ is the Perron-Frobenius operator of $T$ with respect to µ. Even though an NTFM is not necessarily $C^2$ on the sets $A_j$ and the partition α might be countable, the conditions of the theorem guarantee that the expression for $P_{T,\lambda}$ from (6.8) is still valid for any non-negative integrable function. So we can combine (6.8) with Exercise 6.2.4(a) to obtain for each $j, k \in D$ that
$$E_\mu(\mathbf 1_{A_k}|T^{-1}\mathcal B) = \left(\frac{P_{T,\lambda}\big(\frac{d\mu}{d\lambda}\mathbf 1_{A_k}\big)}{\frac{d\mu}{d\lambda}}\right)\circ T = \frac{1}{\frac{d\mu}{d\lambda}\circ T}\sum_{j\in D}\frac{\big(\frac{d\mu}{d\lambda}\circ T_j^{-1}\circ T\big)\big(\mathbf 1_{A_k}\circ T_j^{-1}\circ T\big)}{|T'\circ T_j^{-1}\circ T|}\,\mathbf 1_{T^{-1}(TA_j)} = \frac{\big(\frac{d\mu}{d\lambda}\circ T_k^{-1}\circ T\big)\big(\mathbf 1_{A_k}\circ T_k^{-1}\circ T\big)}{\big(\frac{d\mu}{d\lambda}\circ T\big)\,|T'\circ T_k^{-1}\circ T|}\,\mathbf 1_{T^{-1}(TA_k)},$$
where $T_j^{-1}$ denotes the local inverse of $T$ on $A_j$. From (9.6) we then see that
$$f = \log\Big(\frac{d\mu}{d\lambda}\circ T\Big) - \log\frac{d\mu}{d\lambda} + \log|T'|,$$
and since µ is $T$-invariant, Proposition 1.2.2 implies the result.

Remark 9.4.1. Note that in the proof of Rohlin's Formula we only used that $\frac{d\mu}{d\lambda} \in L^1([0,1), \mathcal B, \mu)$ and not that $c_1 \le \frac{d\mu}{d\lambda} \le c_2$ as in condition (b) of the definition of an NTFM. This condition is only necessary for Lochs' Theorem in the next section.

Example 9.4.1. Let $\beta \in (1,2)$ and consider the β-transformation $Tx = \beta x \pmod 1$. Set $\alpha = \big\{\big[0, \frac 1\beta\big), \big[\frac 1\beta, 1\big)\big\}$. From $(T^n)'x = \beta^n$ for any $x \in [0,1]$, it follows that $\lambda(I) \le \frac{1}{\beta^n}$ for any $I \in \bigvee_{i=0}^{n-1} T^{-i}\alpha$. This implies that property (a) from the definition of an NTFM holds. Example 6.3.4 contains a formula for the density of an invariant measure µ for $T$ that is equivalent to Lebesgue measure. In fact this density is bounded from above and bounded away from zero. Hence, $T$ has property (b). By Theorem 9.4.2 then
$$h_\mu(T) = \int_{[0,1]}\log|T'|\, d\mu = \log\beta.$$
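For the Gauss map of Exercise 9.4.5 the integral in Rohlin's Formula can be estimated numerically: by the Pointwise Ergodic Theorem, the Birkhoff averages of $\log|T'(x)| = 2\log\frac1x$ along a µ-typical orbit converge to $\int\log|T'|\,d\mu$. The sketch below is a numerical experiment, not a proof; it uses natural logarithms, for which the expected value is $\pi^2/(6\ln 2) \approx 2.3731$, and a floating-point orbit is only pseudo-typical:

```python
import math
import random

# Estimate h_mu(T) = ∫ 2 log(1/x) dmu for the Gauss map T x = 1/x mod 1
# by averaging along one orbit.  Expected value: pi^2 / (6 ln 2) ≈ 2.3731.
random.seed(3)
x = random.random()
n = 200_000
total = 0.0
for _ in range(n):
    if x < 1e-12:                # guard against numerical underflow
        x = random.random()
    total += 2.0 * math.log(1.0 / x)
    y = 1.0 / x
    x = y - math.floor(y)        # the Gauss map
print(total / n, math.pi ** 2 / (6 * math.log(2)))
```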

Exercise 9.4.6. Let $\beta > 1$ satisfy $\beta^3 - \beta^2 - 1 = 0$ and consider the negative β-transformation $T : [0,1] \to [0,1]$ as in Example 5.1.4, so
$$Tx = \begin{cases} 1 - \beta x, & \text{if } 0 \le x < \frac 1\beta,\\ 2 - \beta x, & \text{if } \frac 1\beta \le x \le 1. \end{cases}$$


(a) Prove that $T$ is measure preserving with respect to the measure µ on $[0,1]$ given by
$$\mu(A) = \frac 1C\int_A\Big((\beta-1)\,\mathbf 1_{[0,\,1-\frac{1}{\beta^2})} + \frac 1\beta\,\mathbf 1_{[2-\beta,\,\frac 1\beta)} + \mathbf 1_{[\frac 1\beta,\,1]}\Big)\, d\lambda$$
for each $A \in \mathcal B$, where $C$ is the normalizing constant.

(b) Prove that $h_\mu(T) = \log\beta$.

9.5 LOCHS' THEOREM

In 1964 in [37], G. Lochs compared the decimal and continued fraction expansions of real numbers in $[0,1)$. Let $x \in [0,1)$ be an irrational number, and suppose $x = .d_1d_2\ldots$ is the decimal expansion of $x$ (which is generated by iterating the $\times 10$ map $Sx = 10x \pmod 1$). Suppose further that
$$x = \cfrac{1}{a_1+\cfrac{1}{a_2+\cfrac{1}{a_3+\dotsb}}} = [0; a_1, a_2, \ldots] \tag{9.8}$$
is its regular continued fraction expansion (generated by the Gauss map $Tx = \frac 1x \pmod 1$). Let $y = .d_1d_2\cdots d_n$ be the rational number determined by the first $n$ decimal digits of $x$, and let $z = y + 10^{-n}$. Then $[y, z)$ is the decimal cylinder of order $n$ containing $x$, which we also denote by $B_n(x)$. Now let
$$y = \cfrac{1}{b_1+\cfrac{1}{b_2+\dotsb+\cfrac{1}{b_\ell}}} \qquad\text{and}\qquad z = \cfrac{1}{c_1+\cfrac{1}{c_2+\dotsb+\cfrac{1}{c_k}}}$$
be the continued fraction expansions of $y$ and $z$. Let
$$m(n,x) = \max\{i \le \min\{\ell, k\} : b_j = c_j \text{ for all } j \le i\}. \tag{9.9}$$


In other words, if $B_n(x)$ denotes the decimal cylinder consisting of all points $y$ in $[0,1)$ such that the first $n$ decimal digits of $y$ agree with those of $x$, and if $C_j(x)$ denotes the continued fraction cylinder of order $j$ containing $x$, i.e., $C_j(x)$ is the set of all points in $[0,1)$ such that the first $j$ digits in their continued fraction expansion are the same as those of $x$, then $m(n,x)$ is the largest integer such that $B_n(x) \subset C_{m(n,x)}(x)$. So, $m(n,x)$ is the number of regular continued fraction digits of $x$ that can be determined from knowing the first $n$ decimal digits of $x$. Lochs proved the following theorem. As before, let λ denote the Lebesgue measure on $[0,1)$.

Theorem 9.5.1 (Lochs' Theorem). For λ-a.e. $x \in [0,1)$,
$$\lim_{n\to\infty}\frac{m(n,x)}{n} = \frac{6\log 2\log 10}{\pi^2}.$$
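The quantity $m(n,x)$ from (9.9) is easy to compute exactly with rational arithmetic. The sketch below (standard library only; the pseudo-random digit string stands in for a Lebesgue-typical $x$) follows the definition literally: build the endpoints $y$ and $z$ of the decimal cylinder, expand both into continued fractions by the Euclidean algorithm, and count the common leading digits. The constant $6\ln 2\ln 10/\pi^2 \approx 0.9703$:

```python
import random
from fractions import Fraction

def cf_digits(r):
    """Continued fraction digits of a rational r in (0, 1), computed by
    iterating the Gauss map exactly (the Euclidean algorithm)."""
    out = []
    while r != 0:
        a = r.denominator // r.numerator
        out.append(a)
        r = Fraction(r.denominator - a * r.numerator, r.numerator)
    return out

def lochs_m(decimal_digits, n):
    """m(n, x): common leading continued fraction digits of the endpoints
    y and z of the decimal cylinder of order n, as in (9.9)."""
    y = Fraction(int("".join(map(str, decimal_digits[:n]))), 10 ** n)
    z = y + Fraction(1, 10 ** n)
    by, cz = cf_digits(y), cf_digits(z)
    m = 0
    while m < min(len(by), len(cz)) and by[m] == cz[m]:
        m += 1
    return m

# A pseudo-random (hence Lebesgue-"typical") point x = 0.d1 d2 ...
random.seed(0)
digits = [random.randrange(10) for _ in range(300)]
n = 300
ratio = lochs_m(digits, n) / n
print(ratio)   # close to 6 ln2 ln10 / pi^2 ≈ 0.9703 for typical x
```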

In this section, we will prove a generalization of Lochs' Theorem that allows one to compare any two known expansions of numbers. We show that Lochs' Theorem is true for any two sequences of interval partitions on $[0,1)$ satisfying the conclusion of the Shannon-McMillan-Breiman Theorem. The content of this section as well as the proofs can also be found in [13]. We begin with a few definitions that will be used in the arguments to follow. Let $P$ be an interval partition of $[0,1)$ (see also Section 6.3). For $x \in [0,1)$, we let $P(x)$ denote the interval of $P$ containing $x$.

Definition 9.5.1. Let $P = (P_n)_{n\ge1}$ be a sequence of interval partitions of $[0,1)$ and let $c \ge 0$. We say that $P$ has entropy $c$ a.e. with respect to λ if
$$\lim_{n\to\infty} -\frac{\log\lambda(P_n(x))}{n} = c \quad \lambda\text{-a.e.}$$
Note that we do not assume that each $P_n$ is refined by $P_{n+1}$. Suppose that $P = (P_n)_{n\ge1}$ and $Q = (Q_n)_{n\ge1}$ are two sequences of interval partitions of $[0,1)$. For each $n \in \mathbb N$ and $x \in [0,1)$, define
$$m_{P,Q}(n,x) = \sup\{m : P_n(x) \subseteq Q_m(x)\}.$$
The following result is the main ingredient of the proof of the generalization of Lochs' Theorem that we present below.

Theorem 9.5.2. Let $P = (P_n)_{n\ge1}$ and $Q = (Q_n)_{n\ge1}$ be two sequences of interval partitions of $[0,1)$. Suppose that for some constants $c > 0$ and $d > 0$, $P$ has entropy $c$ a.e. with respect to λ and $Q$ has entropy $d$ a.e. with respect to λ. Then
$$\lim_{n\to\infty}\frac{m_{P,Q}(n,x)}{n} = \frac cd \quad \lambda\text{-a.e.}$$

Proof. First we show that
$$\limsup_{n\to\infty}\frac{m_{P,Q}(n,x)}{n} \le \frac cd \quad \lambda\text{-a.e.}$$
Fix $\varepsilon > 0$. Let $x \in [0,1)$ be a point at which the convergence conditions of the hypotheses are met. Fix $\eta > 0$ so that $\frac{c+\eta}{c-\frac cd\eta} < 1+\varepsilon$. Choose $N$ so that for all $n \ge N$,
$$\lambda(P_n(x)) > 2^{-n(c+\eta)} \quad\text{and}\quad \lambda(Q_n(x)) < 2^{-n(d-\eta)}.$$
Fix $n$ so that $\min\big\{n, \frac cd n\big\} \ge N$, and let $m_0$ denote any integer greater than $(1+\varepsilon)\frac cd n$. By the choice of η,
$$\lambda(P_n(x)) > \lambda(Q_{m_0}(x)),$$
so that $P_n(x)$ is not contained in $Q_{m_0}(x)$. Therefore
$$m_{P,Q}(n,x) \le (1+\varepsilon)\frac cd n,$$
and so
$$\limsup_{n\to\infty}\frac{m_{P,Q}(n,x)}{n} \le (1+\varepsilon)\frac cd \quad \lambda\text{-a.e.}$$
Since $\varepsilon > 0$ was arbitrary, we have the desired result.

Now we show that
$$\liminf_{n\to\infty}\frac{m_{P,Q}(n,x)}{n} \ge \frac cd \quad \lambda\text{-a.e.}$$
Fix $\varepsilon \in (0,1)$. Choose $\eta > 0$ so that $\zeta := \varepsilon c - \eta\big(1 + (1-\varepsilon)\frac cd\big) > 0$. For each $n \in \mathbb N$ let $\bar m(n) = \big\lceil(1-\varepsilon)\frac cd n\big\rceil$. For brevity, for each $n \in \mathbb N$ we call an element of $P_n$ (respectively $Q_n$) $(n,\eta)$-good if
$$\lambda(P_n(x)) < 2^{-n(c-\eta)}$$


(respectively $\lambda(Q_n(x)) > 2^{-n(d+\eta)}$). For each $n \in \mathbb N$, let
$$D_n(\eta) = \big\{x : P_n(x)\ \text{is}\ (n,\eta)\text{-good},\ Q_{\bar m(n)}(x)\ \text{is}\ (\bar m(n),\eta)\text{-good and}\ P_n(x) \not\subseteq Q_{\bar m(n)}(x)\big\}.$$
If $x \in D_n(\eta)$, then $P_n(x)$ contains an endpoint of the $(\bar m(n),\eta)$-good interval $Q_{\bar m(n)}(x)$. By the definition of $D_n(\eta)$ and $\bar m(n)$,
$$\frac{\lambda(P_n(x))}{\lambda(Q_{\bar m(n)}(x))} < 2^{-n\zeta}.$$
Since no more than one atom of $P_n$ can contain a particular endpoint of an atom of $Q_{\bar m(n)}$, we see that
$$\lambda(D_n(\eta)) \le \sum_{x\in D_n(\eta)}\lambda(P_n(x)) \le 2\sum_{x\in D_n(\eta)} 2^{-n\zeta}\lambda(Q_{\bar m(n)}(x)) < 2\cdot 2^{-n\zeta},$$
where the sums run over the distinct atoms involved, and so
$$\sum_{n=1}^{\infty}\lambda(D_n(\eta)) < \infty.$$
By the Borel-Cantelli Lemma, this implies that $\lambda(\{x \in [0,1) : x \in D_n(\eta)\ \text{i.o.}\}) = 0$. Since $\bar m(n)$ goes to infinity as $n$ does, we have shown that for almost every $x \in [0,1)$ there exists an $N \in \mathbb N$ such that for all $n \ge N$, $P_n(x)$ is $(n,\eta)$-good, $Q_{\bar m(n)}(x)$ is $(\bar m(n),\eta)$-good and $x \notin D_n(\eta)$; in other words, $P_n(x) \subset Q_{\bar m(n)}(x)$. Thus, for almost every $x \in [0,1)$ there exists an $N \in \mathbb N$ such that for all $n \ge N$,
$$m_{P,Q}(n,x) \ge \bar m(n) = \Big\lceil(1-\varepsilon)\frac cd n\Big\rceil.$$
This proves that
$$\liminf_{n\to\infty}\frac{m_{P,Q}(n,x)}{n} \ge (1-\varepsilon)\frac cd \quad \lambda\text{-a.e.}$$
Since $\varepsilon > 0$ was arbitrary, we have established the theorem.


The above result allows us to compare any two well-known expansions of numbers. Since the commonly used expansions are usually performed for points in the unit interval, our underlying space will be $([0,1), \mathcal B, \lambda)$, where $\mathcal B$ is the Lebesgue σ-algebra and λ the Lebesgue measure. Recall the definition of an NTFM from Definition 9.4.1. For an NTFM $T$ with corresponding invariant probability measure µ and generating interval partition α, write $\alpha_n = \bigvee_{i=0}^{n-1} T^{-i}\alpha$ for the interval partition into cylinder sets of order $n$, and $\alpha_n(x)$ for the partition element from $\alpha_n$ that contains the point $x$. The fact that each $\alpha_n$ is an interval partition follows from property (a). We have the following theorem, of which Lochs' Theorem is a specific instance.

Theorem 9.5.3. Let $T$ and $S$ be two NTFMs on $[0,1)$ with corresponding invariant probability measures µ and ν and generating interval partitions α and β, respectively. Assume that both $T$ and $S$ are ergodic with respect to Lebesgue measure and that $h_\mu(T), h_\nu(S) > 0$. For $n \ge 1$ and $x \in [0,1)$ let $m(n,x) = \sup\{m \ge 1 : \alpha_n(x) \subseteq \beta_m(x)\}$. Then,
$$\lim_{n\to\infty}\frac{m(n,x)}{n} = \frac{h_\mu(T)}{h_\nu(S)} \quad \lambda\text{-a.e.}$$

Proof. By the assumption that $H_\mu(\alpha), H_\nu(\beta) < \infty$ we can apply the Shannon-McMillan-Breiman Theorem to obtain that
$$\lim_{n\to\infty} -\frac{\log\mu(\alpha_n(x))}{n} = h_\mu(T) \quad \mu\text{-a.e.},$$
and similarly
$$\lim_{n\to\infty} -\frac{\log\nu(\beta_n(x))}{n} = h_\nu(S) \quad \nu\text{-a.e.}$$
Since µ and ν are equivalent to λ with bounded density, these statements also hold for λ-almost every $x \in [0,1)$. This means that the sequences of partitions $(\alpha_n)$ and $(\beta_n)$ satisfy the conditions of Theorem 9.5.2, which gives the result.
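The $T$-expansions compared here can be generated directly from the definition of an NTFM: record the digit, apply $T$, repeat. A minimal sketch for the base-10 map and the (golden-ratio) β-map, with $x = 0.625$ as an arbitrary example point:

```python
import math

def digits(T, digit, x, n):
    """First n digits of the T-expansion of x: record digit(x), apply T."""
    out = []
    for _ in range(n):
        out.append(digit(x))
        x = T(x)
    return out

# Base-10 map S x = 10x mod 1 with digit floor(10x), and the golden-ratio
# beta-map T x = beta*x mod 1 with digit floor(beta*x) in {0, 1}.
beta = (1 + math.sqrt(5)) / 2
x = 0.625
print(digits(lambda t: (10 * t) % 1, lambda t: int(10 * t), x, 4))
print(digits(lambda t: (beta * t) % 1, lambda t: int(beta * t), x, 6))
```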

Exercise 9.5.1. Show that the ×N transformation TN x = N x (mod 1), N ≥ 2 an integer, is an NTFM. From Theorem 9.5.1 one can deduce that typically regular continued fraction digits give us slightly more information on the precise value of a number x than its decimal digits. How large should we choose N so that the information obtained from knowing digits in base N is typically larger than the information given by the same number of regular continued fraction digits?

CHAPTER 10

The Variational Principle

Dynamical systems can have many different invariant probability measures, each with its own metric entropy. Since entropy reflects the average amount of information gained by applications of the transformation $T$, one could wonder whether it is possible to find measures that maximize this amount. The search for such maximal measures is simplified when $T$ is a continuous map on a compact metric space $(X, d)$ and therefore, as in Chapter 7, we choose this setup here. For continuous maps $T : X \to X$ one can define a topological analogue of the metric entropy by replacing the measurable partitions in the definition of metric entropy by open covers. The first section below includes two equivalent definitions of the notion of topological entropy. In the second section we prove the Variational Principle, which establishes a powerful relationship between topological entropy and metric entropy. In the last section we discuss measures for which the metric entropy is as large as possible.

10.1 TOPOLOGICAL ENTROPY

The definition of topological entropy comes in two flavors. Topological entropy was first introduced by R. L. Adler, A. G. Konheim and M. H. McAndrew in 1965 in [2] as an analogue of the successful concept of metric entropy. Their definition is in terms of open covers and is very similar to the definition of metric entropy from Chapter 9. It requires the space $X$ to be a compact metric space. The second (and chronologically later) definition was first investigated by R. Bowen in [7] and E. I. Dinaburg in [15]. It uses $(n, \varepsilon)$-separated and spanning sets and only requires $X$ to be a metric space. To ease the exposition we let $(X, d)$ be a compact metric space in both cases; as we shall see, both concepts then become equivalent.

Let $(X, d)$ be a compact metric space and $T : X \to X$ a continuous map. An open cover of $X$ is a collection α of open subsets of $X$ such that $X \subseteq \bigcup_{A\in\alpha} A$. Recall the definitions of refinement and common refinement of collections of sets from Definition 9.2.1.

Exercise 10.1.1. For a finite collection $(\alpha_i)_{i=1}^{n}$ of open covers of $X$ and a continuous transformation $T : X \to X$ show that
$$\bigvee_{i=1}^{n}\alpha_i = \Big\{\bigcap_{i=1}^{n} A_{j_i} : A_{j_i} \in \alpha_i\Big\}$$
and $T^{-1}\alpha = \{T^{-1}A : A \in \alpha\}$ are again open covers of $X$. Moreover, show that $T^{-1}(\alpha\vee\beta) = T^{-1}(\alpha)\vee T^{-1}(\beta)$ and that $\alpha \le \beta$ implies $T^{-1}\alpha \le T^{-1}\beta$ (c.f. Exercise 9.2.1).

Let α be an open cover of $X$. The diameter of α is given by
$$\mathrm{diam}_d(\alpha) := \sup_{A\in\alpha}\mathrm{diam}_d(A) = \sup_{A\in\alpha}\sup_{x,y\in A} d(x,y).$$
By the compactness of $X$ any open cover has a finite subcover. Let $N(\alpha)$ be the number of sets in a finite subcover of α of minimal cardinality and define the entropy of α to be $H_{top}(\alpha) = \log N(\alpha)$. The following proposition summarizes some easy properties of $H_{top}(\alpha)$. The proof is left as an exercise.

Proposition 10.1.1. Let α be an open cover of $X$. Then the following hold.

(i) $H_{top}(\alpha) \ge 0$, and $H_{top}(\alpha) = 0$ if and only if $N(\alpha) = 1$ if and only if $X \in \alpha$.

(ii) If $\alpha \le \beta$, then $H_{top}(\alpha) \le H_{top}(\beta)$.

(iii) $H_{top}(\alpha\vee\beta) \le H_{top}(\alpha) + H_{top}(\beta)$.

(iv) For $T : X \to X$ continuous we have $H_{top}(T^{-1}\alpha) \le H_{top}(\alpha)$. If $T$ is surjective, then $H_{top}(T^{-1}\alpha) = H_{top}(\alpha)$.


Exercise 10.1.2. Prove Proposition 10.1.1.

The topological entropy of $T$ with respect to the open cover α is then defined as
$$h_{top}(\alpha, T) = \lim_{n\to\infty}\frac 1n H_{top}\Big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Big).$$
The existence of the limit on the right-hand side follows as for the metric entropy in Proposition 9.2.3, using the subadditivity of the sequence $\big(H_{top}\big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\big)\big)$.

Exercise 10.1.3. Prove that $\lim_{n\to\infty}\frac 1n H_{top}\big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\big)$ exists.

Proposition 10.1.2. The topological entropy $h_{top}(\alpha, T)$ of a continuous transformation $T$ with respect to an open cover α satisfies the following properties.

(i) $h_{top}(\alpha, T) \ge 0$.

(ii) If $\alpha \le \beta$, then $h_{top}(\alpha, T) \le h_{top}(\beta, T)$.

(iii) $h_{top}(\alpha, T) \le H_{top}(\alpha)$.

Proof. These statements are easy consequences of Proposition 10.1.1. For example, the third statement follows from Proposition 10.1.1(iii) and (iv):
$$H_{top}\Big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Big) \le \sum_{i=0}^{n-1} H_{top}(T^{-i}\alpha) \le nH_{top}(\alpha).$$

This brings us to the first definition of topological entropy.

Definition 10.1.1. Let $T : X \to X$ be a continuous transformation on a compact metric space $(X, d)$. The topological entropy of $T$ is
$$h_1(T) = \sup_\alpha h_{top}(\alpha, T),$$
where the supremum is taken over all open covers α of $X$.

The next result states that the topological entropy can be obtained from a suitable sequence of open covers.

Lemma 10.1.1. Let $(X, d)$ be a compact metric space. Suppose $(\alpha_n)_{n\ge1}$ is a sequence of open covers of $X$ such that $\lim_{n\to\infty}\mathrm{diam}_d(\alpha_n) = 0$. Then $\lim_{n\to\infty} h_{top}(\alpha_n, T) = h_1(T)$.


Proof. We will only prove the lemma for $h_1(T) < \infty$. Let $\varepsilon > 0$ be arbitrary and let β be an open cover of $X$ such that $h_{top}(\beta, T) > h_1(T) - \varepsilon$. By Theorem 12.1.1(vi) there exists a Lebesgue number $\delta > 0$ for β, and by assumption there exists an $N > 0$ such that $\mathrm{diam}_d(\alpha_n) < \delta$ for $n \ge N$. So if $n \ge N$, then for any $A \in \alpha_n$ there exists a $B \in \beta$ such that $A \subseteq B$. In other words, $\beta \le \alpha_n$. It follows from Proposition 10.1.2(ii) that
$$h_1(T) - \varepsilon < h_{top}(\beta, T) \le h_{top}(\alpha_n, T) \le h_1(T)$$
for all $n \ge N$, and the result follows.

Exercise 10.1.4. Finish the proof of Lemma 10.1.1 by showing that if $h_1(T) = \infty$, then $\lim_{n\to\infty} h_{top}(\alpha_n, T) = \infty$.

Exercise 10.1.5. Let $T : X \to X$ be a continuous transformation on a compact metric space $(X, d)$. For any subset $A \subseteq X$ and any open cover β, let $N(A, \beta)$ denote the minimal cardinality that any subset of β covering $A$ can have.

(a) Fix two open covers α and β of $X$. Prove that
$$N\Big(\bigvee_{i=0}^{n-1} T^{-i}\beta\Big) \le N\Big(\bigvee_{i=0}^{n-1} T^{-i}\alpha\Big)\cdot\max_{A\in\bigvee_{i=0}^{n-1} T^{-i}\alpha} N\Big(A, \bigvee_{i=0}^{n-1} T^{-i}\beta\Big).$$

(b) Deduce that for any open cover α of $X$,
$$h_1(T) \le h_{top}(\alpha, T) + \sup_\beta\lim_{n\to\infty}\frac 1n\log\Big(\max_{A\in\bigvee_{i=0}^{n-1} T^{-i}\alpha} N\Big(A, \bigvee_{i=0}^{n-1} T^{-i}\beta\Big)\Big),$$
where the supremum is taken over all open covers β of $X$.

where the supremum is taken over all open covers of X. Let us now turn to the second definition of topological entropy. This approach was first explored by R. Bowen and E. I. Dinaburg in 1971 and is based on measuring the exponential growth rate of the number of essentially different initial parts of orbits. For each n ≥ 1 define a new metric dn on X by setting dn (x, y) = max d(T i x, T i y). 0≤i≤n−1

Hence, dn measures the maximal distance between the first n elements in the orbits of x and y.


For $n \ge 1$ and $\varepsilon > 0$ we say that a collection α of open subsets of $X$ is $(n, \varepsilon)$-covering if α is an open cover of $X$ and $\mathrm{diam}_{d_n}(A) < \varepsilon$ for each $A \in \alpha$. Let $Co(n, \varepsilon, T)$ be the minimal cardinality that any covering of $X$ by open sets of $d_n$-diameter less than ε can have. The compactness of $X$ implies that $Co(n, \varepsilon, T) < \infty$. Moreover, if $0 < \varepsilon_1 < \varepsilon_2$, then $Co(n, \varepsilon_1, T) \ge Co(n, \varepsilon_2, T)$. So $Co(n, \varepsilon, T)$ decreases in ε.

Lemma 10.1.2. For any $\varepsilon > 0$ the limit $\lim_{n\to\infty}\frac 1n\log Co(n, \varepsilon, T)$ exists and is finite.

Proof. Fix $n, m \ge 1$ and $\varepsilon > 0$. Let α and β be two open covers of $X$ consisting of sets of $d_m$-diameter and $d_n$-diameter smaller than ε and with cardinalities $Co(m, \varepsilon, T)$ and $Co(n, \varepsilon, T)$, respectively. Pick any $A \in \alpha$ and $B \in \beta$. Then, for $x, y \in A\cap T^{-m}B$,
$$d_{m+n}(x, y) = \max_{0\le i\le m+n-1} d(T^i x, T^i y) = \max\Big\{\max_{0\le i\le m-1} d(T^i x, T^i y),\ \max_{m\le j\le m+n-1} d(T^j x, T^j y)\Big\} < \varepsilon.$$
So $\alpha\vee T^{-m}\beta$ is an open cover of $X$ of $d_{m+n}$-diameter less than ε. Moreover, the cardinality of $\alpha\vee T^{-m}\beta$ is at most $Co(m, \varepsilon, T)\cdot Co(n, \varepsilon, T)$. Hence,
$$Co(m+n, \varepsilon, T) \le Co(m, \varepsilon, T)\cdot Co(n, \varepsilon, T).$$
Since log is an increasing function, the sequence $(a_n)$ defined by $a_n = \log Co(n, \varepsilon, T)$ is subadditive and the result follows from Proposition 9.2.2.

From the monotonicity of $Co(n, \varepsilon, T)$ in ε it follows that
$$h_2(T) := \lim_{\varepsilon\downarrow 0}\lim_{n\to\infty}\frac 1n\log Co(n, \varepsilon, T)$$
exists, although it may be infinite. We will take this as our second definition of topological entropy. Before we prove that $h_1(T) = h_2(T)$, we give two alternative formulations of $h_2(T)$. For $n \ge 1$ and $\varepsilon > 0$ a subset $A \subseteq X$ is called $(n, \varepsilon)$-spanning for $X$ if for all $x \in X$ there exists a $y \in A$ such that $d_n(x, y) < \varepsilon$. Let $Sp(n, \varepsilon, T)$ denote the minimal cardinality that any $(n, \varepsilon)$-spanning set can have. The fact that $Sp(n, \varepsilon, T) < \infty$ follows from the compactness of $X$, so this minimal cardinality is well defined. A subset $A \subseteq X$ is called $(n, \varepsilon)$-separated if any $x, y \in A$ with $x \ne y$ satisfy $d_n(x, y) \ge \varepsilon$.
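For a concrete map these cardinalities can be estimated by brute force. The sketch below greedily builds $(n, \varepsilon)$-separated sets for the doubling map $Tx = 2x \pmod 1$ on the circle, whose topological entropy is $\log 2$; the grid size and ε are arbitrary choices, and the greedy count is only a lower bound for the maximal cardinality:

```python
import math

# Greedy (n, eps)-separated sets for T x = 2x mod 1 on the circle with
# d(x, y) = min(|x-y|, 1-|x-y|).  The counts grow like 2^n, so log2 of
# the ratio of successive counts approximates h_top(T) = log 2.

def d(x, y):                       # circle metric on [0, 1)
    t = abs(x - y) % 1.0
    return min(t, 1.0 - t)

def dn(x, y, n):                   # the Bowen metric d_n defined above
    return max(d((2 ** i * x) % 1.0, (2 ** i * y) % 1.0) for i in range(n))

def separated_count(n, eps, grid=2048):
    """Size of a greedily built (n, eps)-separated set of grid points."""
    kept = []
    for k in range(grid):
        x = k / grid
        if all(dn(x, y, n) >= eps for y in kept):
            kept.append(x)
    return len(kept)

eps = 0.1
s4, s5 = separated_count(4, eps), separated_count(5, eps)
print(s4, s5, math.log2(s5 / s4))   # growth rate close to 1 bit per iterate
```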


Figure 10.1: The dots indicate a subset $A \subseteq X$ and the dashed circles $d_1$-balls of radius ε. $A$ is $(1, \varepsilon)$-spanning and $(1, \varepsilon)$-separated.

We define $Se(n, \varepsilon, T)$ to be the maximal cardinality that any $(n, \varepsilon)$-separated set can have. The fact that $Se(n, \varepsilon, T)$ is finite also follows from compactness, see Exercise 10.1.6. Hence, an $(n, \varepsilon)$-separated set of maximal cardinality exists. Both quantities $Sp(n, \varepsilon, T)$ and $Se(n, \varepsilon, T)$ are decreasing in ε. Figure 10.1 illustrates both concepts.

Exercise 10.1.6. Prove that $Se(n, \varepsilon, T) < \infty$.

We can also formulate the above in terms of open balls. If we use $B(x, r) = \{y \in X : d(x, y) < r\}$ to denote an open ball in the metric $d$, then the open ball with centre $x$ and radius $r$ in the metric $d_n$ is given by
$$B_n(x, r) := \bigcap_{i=0}^{n-1} T^{-i}B(T^i x, r). \tag{10.1}$$
Hence, $A$ is $(n, \varepsilon)$-spanning for $X$ if
$$X = \bigcup_{a\in A} B_n(a, \varepsilon),$$
and $A$ is $(n, \varepsilon)$-separated if $A\cap B_n(a, \varepsilon) = \{a\}$ for all $a \in A$. The following relation holds between $Co(n, \varepsilon, T)$, $Sp(n, \varepsilon, T)$ and $Se(n, \varepsilon, T)$.

Lemma 10.1.3. For any $n \in \mathbb Z_{\ge0}$ and $\varepsilon > 0$ it holds that
$$Co(n, 3\varepsilon, T) \le Sp(n, \varepsilon, T) \le Se(n, \varepsilon, T) \le Co(n, \varepsilon, T).$$

The Variational Principle  179

Proof. We will only prove the last two inequalities and leave the first one as an exercise to the reader. Let A be an (n, ε)-separated set of cardinality Se(n, ε, T). Suppose A is not (n, ε)-spanning for X. Then there is some x ∈ X such that d_n(x, a) ≥ ε for all a ∈ A. But then A ∪ {x} is an (n, ε)-separated set of cardinality larger than Se(n, ε, T). This contradiction shows that A is an (n, ε)-spanning set for X. The second inequality now follows since the cardinality of A is at least as large as Sp(n, ε, T). To prove the third inequality, let A be an (n, ε)-separated set of cardinality Se(n, ε, T). Note that if α is an open cover of d_n-diameter less than ε, then no element of α can contain more than one element of A. This holds in particular for an open cover of minimal cardinality, so the third inequality is proved.

Exercise 10.1.7. Finish the proof of the above lemma by proving the first inequality.

From Lemma 10.1.2, the monotonicity of Sp(n, ε, T) and Se(n, ε, T) in ε and Lemma 10.1.3 we obtain that

lim_{ε↓0} lim_{n→∞} (1/n) log Co(n, ε, T) = lim_{ε↓0} liminf_{n→∞} (1/n) log Sp(n, ε, T) = lim_{ε↓0} limsup_{n→∞} (1/n) log Sp(n, ε, T)

and

lim_{ε↓0} lim_{n→∞} (1/n) log Co(n, ε, T) = lim_{ε↓0} liminf_{n→∞} (1/n) log Se(n, ε, T) = lim_{ε↓0} limsup_{n→∞} (1/n) log Se(n, ε, T).

This motivates the following definition.

Definition 10.1.2. The topological entropy of a continuous transformation T : X → X on a compact metric space (X, d) is given by

h_2(T) = lim_{ε↓0} lim_{n→∞} (1/n) log Co(n, ε, T)
  = lim_{ε↓0} liminf_{n→∞} (1/n) log Sp(n, ε, T) = lim_{ε↓0} limsup_{n→∞} (1/n) log Sp(n, ε, T)
  = lim_{ε↓0} liminf_{n→∞} (1/n) log Se(n, ε, T) = lim_{ε↓0} limsup_{n→∞} (1/n) log Se(n, ε, T).
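The counting quantities in this definition can be explored numerically. The sketch below is our own illustration, not from the text: it greedily builds a maximal (n, ε)-separated set for the doubling map T(x) = 2x mod 1 on a finite grid standing in for the circle (the map, the grid size, and the greedy strategy are all our choices). By the argument used to prove Lemma 10.1.3, maximality forces the resulting set to be (n, ε)-spanning as well, and for fixed ε the growth rate (1/n) log #A tends to log 2, the topological entropy of the doubling map.

```python
# Our own illustration: greedily build a maximal (n, eps)-separated set
# for the doubling map T(x) = 2x mod 1 on a finite grid.
import math

def circle_dist(x, y):
    # arc-length distance on [0, 1) viewed as a circle
    d = abs(x - y) % 1.0
    return min(d, 1.0 - d)

def d_n(T, x, y, n):
    # Bowen metric d_n(x, y) = max_{0 <= i <= n-1} d(T^i x, T^i y)
    best = 0.0
    for _ in range(n):
        best = max(best, circle_dist(x, y))
        x, y = T(x), T(y)
    return best

T = lambda x: (2.0 * x) % 1.0
X = [k / 1024 for k in range(1024)]  # finite stand-in for the circle
n, eps = 5, 0.1

A = []  # greedy maximal (n, eps)-separated subset of X
for x in X:
    if all(d_n(T, x, a, n) >= eps for a in A):
        A.append(x)

# Maximality forces A to be (n, eps)-spanning: this is exactly the
# contradiction argument in the proof of Lemma 10.1.3.
assert all(any(d_n(T, x, a, n) < eps for a in A) for x in X)
print(len(A), math.log(len(A)) / n)  # (1/n) log #A tends to log 2 as n grows
```

Running the greedy construction for increasing n (with ε fixed) exhibits the exponential growth rate that Definition 10.1.2 turns into an entropy.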


Note that there seems to be a concealed ambiguity in this definition, since h_2(T) depends on the chosen metric d. Further on we see that from the compactness of (X, d) it follows that any metric that is topologically equivalent to d (induces the same topology) gives the same value for h_2(T). First we show that Definitions 10.1.1 and 10.1.2 of h_1(T) and h_2(T) agree, justifying the use of h_top(T) for both.

Theorem 10.1.1. If T : X → X is a continuous transformation on a compact metric space (X, d), then h_1(T) = h_2(T) =: h_top(T).

Proof. We first show that h_2(T) ≤ h_1(T). Let (α_k) be the sequence of open covers of X defined by α_k = {B(x, 1/(3k)) : x ∈ X}. Then diam_d(α_k) = 2/(3k) → 0 as k → ∞, so by Lemma 10.1.1,

h_1(T) = lim_{k→∞} h_top(α_k, T) = lim_{k→∞} lim_{n→∞} (1/n) log N(∨_{i=0}^{n−1} T^{-i}α_k).

If we take x, y in the same element of ∨_{i=0}^{n−1} T^{-i}α_k, then for each 0 ≤ i ≤ n − 1 the points T^i x and T^i y are in the same ball of radius 1/(3k). Thus d_n(x, y) < 2/(3k) < 1/k, showing that the d_n-diameter of the open cover ∨_{i=0}^{n−1} T^{-i}α_k is less than 1/k. Hence, Co(n, 1/k, T) is upper bounded by the cardinality of any subcover of ∨_{i=0}^{n−1} T^{-i}α_k, so

Co(n, 1/k, T) ≤ N(∨_{i=0}^{n−1} T^{-i}α_k).

Subsequently taking the log, dividing by n, taking the limit for n → ∞ and the limit for k → ∞ gives the result.

To prove that h_1(T) ≤ h_2(T), let (β_k)_{k=1}^{∞} be the sequence of open covers of X with β_k = {B(x, 1/k) : x ∈ X}. Then diam_d(β_k) → 0 as k → ∞. Fix an n ≥ 1 and let A ⊆ X be any (n, 1/k)-spanning set for X with cardinality Sp(n, 1/k, T). For any a ∈ A and any 0 ≤ i ≤ n − 1 the open ball B(T^i a, 1/k) is an element of β_k. Using the notation from (10.1), we have for each a ∈ A that

B_n(a, 1/k) = B(a, 1/k) ∩ T^{-1}B(Ta, 1/k) ∩ ⋯ ∩ T^{-(n−1)}B(T^{n−1}a, 1/k) ∈ ∨_{i=0}^{n−1} T^{-i}β_k.

Since A is (n, 1/k)-spanning, we get

X ⊆ ∪_{a∈A} B_n(a, 1/k),

so {B_n(a, 1/k) : a ∈ A} is a finite subcover of ∨_{i=0}^{n−1} T^{-i}β_k of cardinality Sp(n, 1/k, T). Hence, N(∨_{i=0}^{n−1} T^{-i}β_k) ≤ Sp(n, 1/k, T), which gives

lim_{n→∞} (1/n) log N(∨_{i=0}^{n−1} T^{-i}β_k) ≤ limsup_{n→∞} (1/n) log Sp(n, 1/k, T).

Using Lemma 10.1.1 we obtain

h_1(T) = lim_{k→∞} lim_{n→∞} (1/n) log N(∨_{i=0}^{n−1} T^{-i}β_k) ≤ lim_{k→∞} limsup_{n→∞} (1/n) log Sp(n, 1/k, T) = h_2(T).

Hence, h_1(T) = h_2(T).

The following theorem lists some elementary properties of the topological entropy.

Theorem 10.1.2. Let (X, d) be a compact metric space and T : X → X continuous. Then the following hold.

(i) If a metric ρ generates the same topology on X as d, then h_top(T) is the same under both metrics.

(ii) h_top(T) is a conjugacy invariant.

(iii) For each n ≥ 1 it holds that h_top(T^n) = n h_top(T).

(iv) If T is a homeomorphism, then h_top(T^{-1}) = h_top(T). Consequently, for each n ∈ Z we have h_top(T^n) = |n| h_top(T).

Proof. (i) Since d and ρ induce the same topology on X, both identity maps i : (X, d) → (X, ρ) and j : (X, ρ) → (X, d) are continuous, so by the compactness of X also uniformly continuous. Let ε_1 > 0. Then by the uniform continuity of j there is an ε_2 > 0 such that for all x, y ∈ X, ρ(x, y) < ε_2 ⇒ d(x, y) < ε_1, and then by the uniform continuity of i there is an ε_3 > 0 such that for all x, y ∈ X, d(x, y) < ε_3 ⇒ ρ(x, y) < ε_2. Let A be an (n, ε_2, ρ)-spanning set for T (where we have added the third argument to emphasize the metric). By definition this means that for each x ∈ X there is an a ∈ A, such that

ρ_n(x, a) = max_{0≤i≤n−1} ρ(T^i x, T^i a) < ε_2.


But then

d_n(x, a) = max_{0≤i≤n−1} d(T^i x, T^i a) < ε_1,

so A is an (n, ε_1, d)-spanning set. Hence, Sp(n, ε_1, T, d) ≤ Sp(n, ε_2, T, ρ). Similarly we obtain Sp(n, ε_2, T, ρ) ≤ Sp(n, ε_3, T, d), yielding

liminf_{n→∞} (1/n) log Sp(n, ε_1, T, d) ≤ liminf_{n→∞} (1/n) log Sp(n, ε_2, T, ρ) ≤ liminf_{n→∞} (1/n) log Sp(n, ε_3, T, d).

If we now let ε_3 ↓ 0, ε_2 ↓ 0 and then ε_1 ↓ 0 we get h_top,d(T) ≤ h_top,ρ(T) ≤ h_top,d(T).

(ii) Let (Y, ρ) be a compact metric space and S : Y → Y continuous and topologically conjugate to T with conjugacy ψ : Y → X. Define another metric d̃ on Y by setting d̃(x, y) = d(ψ(x), ψ(y)). We claim that ρ and d̃ induce the same topology on Y. To see this, let y ∈ Y and ε > 0 be arbitrary. Since ψ is a homeomorphism, the image ψ(B_ρ(y, ε)) of the open ball B_ρ(y, ε) is an open set in X and thus contains an open ball B_d(ψ(y), ε̃) for some ε̃ > 0. Then ψ^{-1}(B_d(ψ(y), ε̃)) ⊆ B_ρ(y, ε). Note that

z ∈ ψ^{-1}(B_d(ψ(y), ε̃)) ⇔ d(ψ(z), ψ(y)) < ε̃ ⇔ z ∈ B_d̃(y, ε̃).

Hence, B_d̃(y, ε̃) ⊆ B_ρ(y, ε). Similarly we can show that any open ball B_d̃(y, ε) contains an open ball B_ρ(y, ε̃), which gives the claim. Let x_1, x_2 ∈ X. Fix an n ≥ 1 and set y_1 = ψ^{-1}(x_1), y_2 = ψ^{-1}(x_2). Then for each 0 ≤ i ≤ n − 1,

d(T^i x_1, T^i x_2) = d(T^i ψ(y_1), T^i ψ(y_2)) = d(ψ(S^i y_1), ψ(S^i y_2)) = d̃(S^i y_1, S^i y_2).

This means that a collection α of open subsets of X is an (n, ε, d)-cover of X if and only if the collection ψ^{-1}(α) is an (n, ε, d̃)-cover of Y. Hence, Co(n, ε, T, d) = Co(n, ε, S, d̃) and thus h_top(T) = h_top(S) by part (i).

(iii) Fix an n ≥ 1. Observe that for each m ≥ 1 and x, y ∈ X the metric d_m computed for the map T^n satisfies

max_{0≤i≤m−1} d(T^{ni} x, T^{ni} y) ≤ max_{0≤j≤nm−1} d(T^j x, T^j y) = d_{nm}(x, y).


Therefore Sp(m, ε, T^n) ≤ Sp(nm, ε, T), which implies that

h_top(T^n) = lim_{ε↓0} liminf_{m→∞} (1/m) log Sp(m, ε, T^n)
  ≤ n lim_{ε↓0} limsup_{m→∞} (1/(nm)) log Sp(nm, ε, T)
  ≤ n lim_{ε↓0} limsup_{m→∞} (1/m) log Sp(m, ε, T) = n h_top(T).    (10.2)

For the other inequality, let ε > 0. By the uniform continuity of T on X, and thus of T^i for each 0 ≤ i ≤ n − 1, we can find a δ > 0 such that for all x, y ∈ X with d(x, y) < δ we have d_n(x, y) < ε. Let m ≥ 1 and let A be an (m, δ)-spanning set for T^n. Then, by definition for all x ∈ X there is an a ∈ A such that

max_{0≤i≤m−1} d(T^{ni} x, T^{ni} a) < δ

and by the above,

max_{0≤j≤nm−1} d(T^j x, T^j a) = max_{0≤i≤m−1} max_{0≤k≤n−1} d(T^{ni+k} x, T^{ni+k} a) < ε.

Thus, Sp(mn, ε, T) ≤ Sp(m, δ, T^n) and it follows that h_top(T^n) ≥ n h_top(T).

(iv) Let n ≥ 1 and ε > 0. Let A be an (n, ε)-separated set for T. Then, for any x, y ∈ A with x ≠ y it holds that d_n(x, y) ≥ ε. But then,

max_{0≤i≤n−1} d(T^{-i}(T^{n−1}x), T^{-i}(T^{n−1}y)) = max_{0≤i≤n−1} d(T^i x, T^i y) ≥ ε.

So T^{n−1}A is an (n, ε)-separated set for T^{-1} of the same cardinality. Conversely, every (n, ε)-separated set B for T^{-1} gives the (n, ε)-separated set T^{-(n−1)}B for T. Thus, Se(n, ε, T) = Se(n, ε, T^{-1}) and the first statement follows. The second statement is a direct consequence of (iii).

Exercise 10.1.8. Let (X, d), (Y, ρ) be compact metric spaces and T : X → X and S : Y → Y continuous. Show that h_top(T × S) = h_top(T) + h_top(S). (Hint: first show that the metric d̃ = max{d, ρ} induces the product topology on X × Y.)

Example 10.1.1. Let (X, d) be a compact metric space and T : X → X an isometry, so d(Tx, Ty) = d(x, y) for each x, y ∈ X. Then T is obviously continuous and for each n ≥ 1 we get d_n(x, y) = d(x, y).


Hence, an open cover of X has d_n-diameter less than ε if and only if it has d-diameter less than ε, so Co(n, ε, T) = Co(1, ε, T), giving for each ε > 0 that

lim_{n→∞} (1/n) log Co(n, ε, T) = lim_{n→∞} (1/n) log Co(1, ε, T) = 0.

This yields h_top(T) = 0. In the proof of Proposition 7.3.3 we saw that any rotation R_θ : S¹ → S¹, z ↦ e^{2πiθ}z, θ ∈ [0, 1), is an isometry with respect to the arc length distance function, which induces the topology on S¹. So, h_top(R_θ) = 0 for any θ ∈ [0, 1).

Example 10.1.2. Let X = {0, 1, . . . , k − 1}^Z (or X = {0, . . . , k − 1}^N) and let T : X → X be the left shift. Recall from Example 7.1.1 that T is continuous with respect to the metric d from (7.1). There are more ways to put a metric on X.

Exercise 10.1.9. Define the metric ρ on X by

ρ(x, y) = Σ_{n=−∞}^{∞} |x_n − y_n| / 2^{|n|}.

Prove that d and ρ are topologically equivalent.

Proposition 10.1.3. Let X = {0, 1, . . . , k − 1}^Z or X = {0, . . . , k − 1}^N with the metric d from (7.1) and let T : X → X be the left shift. The topological entropy of T is equal to h_top(T) = log k.

Proof. We will only prove the statement for the one-sided shift on X, since both proofs are similar. Let d be the metric from (7.1). Fix 0 < ε < 1 and any x = (x_i)_{i≥1}, y = (y_i)_{i≥1} ∈ X. Notice that if at least one of the first n symbols of x and y differ, then

d_n(x, y) = max_{0≤i≤n−1} d(T^i x, T^i y) = 1 > ε.

So, the set A_n := {a ∈ X : a_j = 0 for all j > n} is (n, ε)-separated with Se(n, ε, T) ≥ k^n. Hence,

h_top(T) = lim_{ε↓0} limsup_{n→∞} (1/n) log Se(n, ε, T) ≥ log k.

To prove the reverse inequality, take ℓ ≥ 1 such that 2^{−ℓ} < ε. Then A_{n+ℓ} is an (n, ε)-spanning set, since for every x ∈ X there is an a ∈ A_{n+ℓ} for


which the first n + ℓ digits coincide. In other words, d_n(x, a) < 2^{−ℓ} < ε. Therefore,

h_top(T) = lim_{ε↓0} liminf_{n→∞} (1/n) log Sp(n, ε, T) ≤ lim_{ε↓0} lim_{n→∞} ((n + ℓ)/n) log k = log k

and our proof is complete.

As with the metric entropy it is often not so easy to compute the topological entropy of a specific system from the definition. In certain cases there is an easier way. The following theorem is a result by M. Misiurewicz and W. Szlenk from [43] from 1980 for piecewise monotone continuous interval maps. Let T : [0, 1] → [0, 1] be continuous. Then T is called piecewise monotone if there exists a finite interval partition (see also Section 6.3) α of [0, 1] so that the restriction of T to any element of α is monotone. The elements of such an interval partition of smallest cardinality are called the fundamental intervals of T, see Remark 2.3.1. We call the partition itself a fundamental interval partition. Note that a fundamental interval partition is not uniquely defined at the endpoints of the intervals. Note also that if T is a continuous piecewise monotone map, then so is any iterate T^n.

Theorem 10.1.3 (Misiurewicz and Szlenk). Let T : [0, 1] → [0, 1] be a continuous piecewise monotone map. Then

h_top(T) = lim_{n→∞} (1/n) log F_n,

where F_n is the number of fundamental intervals of T^n.

Before we prove the theorem, we prove the following result.

Proposition 10.1.4. Let T : [0, 1] → [0, 1] be a piecewise monotone continuous interval map and α a fundamental interval partition. Then h_top(T) = h_top(α, T).

Proof. Let ι be a covering of [0, 1] by finitely many intervals. Fix some n ≥ 1 and A ∈ ∨_{i=0}^{n−1} T^{-i}α. Then for each 0 ≤ k ≤ n − 1 the map T^k is monotone on A, so A ∩ T^{-k}I is an interval for any I ∈ ι. Hence, if we let E_k denote the collection of all endpoints of the intervals in A ∩ T^{-k}ι, then #E_k ≤ 2#ι. Any set

I_0 ∩ T^{-1}I_1 ∩ ⋯ ∩ T^{-(n−1)}I_{n−1} ∈ ∨_{i=0}^{n−1} T^{-i}ι


is an interval with endpoints in the set ∪_{k=0}^{n−1} E_k, which has cardinality at most 2n#ι. So the collection A ∩ ∨_{i=0}^{n−1} T^{-i}ι specifies at most 2n#ι possible endpoints in A. Since each interval has two endpoints and for each interval there are four possibilities to have the endpoints either open or closed, with this number of endpoints we can form at most 4(2n#ι)² intervals. Hence,

N(A ∩ ∨_{i=0}^{n−1} T^{-i}ι) ≤ 4(2n#ι)².

From this it follows that

lim_{n→∞} (1/n) log max_{A ∈ ∨_{i=0}^{n−1} T^{-i}α} N(A ∩ ∨_{i=0}^{n−1} T^{-i}ι) ≤ lim_{n→∞} (1/n) log(4(2n#ι)²) = 0.

Hence,

sup_ι lim_{n→∞} (1/n) log max_{A ∈ ∨_{i=0}^{n−1} T^{-i}α} N(A ∩ ∨_{i=0}^{n−1} T^{-i}ι) = 0,

where the supremum is taken over all finite covers of [0, 1] by intervals. Since any open subset B ⊆ [0, 1] can be written as a countable disjoint union of open intervals, we can find for each open cover β of [0, 1] a finite cover ι of [0, 1] by intervals such that

max_{A ∈ ∨_{i=0}^{n−1} T^{-i}α} N(A ∩ ∨_{i=0}^{n−1} T^{-i}β) ≤ max_{A ∈ ∨_{i=0}^{n−1} T^{-i}α} N(A ∩ ∨_{i=0}^{n−1} T^{-i}ι).

The result then follows from Exercise 10.1.5.

Proof of Theorem 10.1.3. For each n ≥ 1, let α_n be a fundamental interval partition for T^n. Let m, k ≥ 1 be given. Then the collection T^{-k}α_m ∨ α_k consists of finitely many intervals on which the map T^{m+k} is monotone. Hence,

F_{m+k} ≤ #(T^{-k}α_m ∨ α_k) ≤ F_m · F_k.

This implies that (log F_n) is a subadditive sequence, and hence by Proposition 9.2.2 the limit lim_{n→∞} (1/n) log F_n exists.

Figure 10.2  The first three iterations of the logistic map from Example 10.1.3.

Now fix some n ≥ 1. By Theorem 10.1.2(iii), Proposition 10.1.4 and Lemma 10.1.2(iii) we have

h_top(T) = (1/n) h_top(T^n) = (1/n) h_top(α_n, T^n) ≤ (1/n) H_top(α_n) = (1/n) log F_n.

Hence,

h_top(T) ≤ lim_{n→∞} (1/n) log F_n.

For the other direction, again fix some n ≥ 1. For any k ≥ 1 the collection ∨_{i=0}^{k−1} T^{-ni}α_n is a finite collection of intervals, such that the restriction of T^{nk} to each of these intervals is monotone. Hence, F_{nk} ≤ N(∨_{i=0}^{k−1} T^{-ni}α_n), which by Proposition 10.1.4 and Theorem 10.1.2(iii) leads to

lim_{k→∞} (1/k) log F_k = lim_{k→∞} (1/(nk)) log F_{nk} ≤ lim_{k→∞} (1/(nk)) log N(∨_{i=0}^{k−1} T^{-ni}α_n) = (1/n) h_top(α_n, T^n) = (1/n) h_top(T^n) = h_top(T).

This finishes the proof.

Example 10.1.3. Consider the logistic map T : [0, 1] → [0, 1], x ↦ 4x(1 − x) we saw in Example 6.3.1. See Figure 10.2 for the first three iterations. The fundamental intervals of T are [0, 1/2] and [1/2, 1] and since both intervals are mapped to [0, 1] by T, one immediately sees that T^n has 2^n fundamental intervals. Hence, h_top(T) = log 2.
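Theorem 10.1.3 invites a quick numerical experiment. The sketch below is our own (the grid-based lap counter is an ad hoc approximation, not the book's method): it estimates F_n for the logistic map by counting sign changes of the slope of T^n sampled on a fine grid. The counts follow F_n = 2^n, consistent with h_top(T) = log 2.

```python
# Our own numerical check of the Misiurewicz-Szlenk count for the logistic
# map: approximate F_n, the number of monotone laps of T^n, on a fine grid.
import math

def T(x):
    return 4.0 * x * (1.0 - x)

def lap_count(n, grid=20000):
    # F_n = 1 + number of interior local extrema of T^n
    ys = []
    for k in range(grid + 1):
        x = k / grid
        for _ in range(n):
            x = T(x)
        ys.append(x)
    laps = 1
    for i in range(1, grid):
        if (ys[i] - ys[i - 1]) * (ys[i + 1] - ys[i]) < 0:  # slope changes sign
            laps += 1
    return laps

for n in range(1, 6):
    Fn = lap_count(n)
    print(n, Fn, math.log(Fn) / n)  # F_n = 2^n, so (1/n) log F_n = log 2
```

For maps whose laps shrink faster than the grid resolves, an exact counter would track preimages of the critical points instead of sampling; for T^n with n ≤ 5 the grid above is more than fine enough.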


Exercise 10.1.10. Let a > 1. Compute the topological entropy of the transformation

T : [0, 1] → [0, 1], x ↦ { ax, if 0 ≤ x ≤ 1/a;  (a + 1)/a − x, if 1/a < x ≤ 1. }

10.2 PROOF OF THE VARIATIONAL PRINCIPLE

The Variational Principle is a striking result linking the notions of topological and metric entropy. It was originally obtained by E. I. Dinaburg, T. Goodman and L. Goodwyn, see [15, 18, 19], and states the following: if T : X → X is a continuous transformation on a compact metric space (X, d), then

h_top(T) = sup{h_µ(T) : µ ∈ M(X, T)},

where M(X, T) is the set of all T-invariant Borel probability measures as in Chapter 7. To prove this statement, we will proceed along the shortest and most popular route to victory, provided by M. Misiurewicz in [42] from 1976. The first part of the proof uses only some properties of metric entropy that we have listed in the following lemma. For any collection α of subsets of X and any µ ∈ M(X), let N_µ(α) be the number of sets in α of positive µ-measure. Throughout the section and as before we will use Φ to denote the function

Φ : [0, ∞) → R,  x ↦ −x log x if x ≠ 0, and 0 if x = 0.    (10.3)

Recall that Φ is concave.

Lemma 10.2.1. Let α, β be finite, measurable partitions of X and T a measure preserving transformation on the probability space (X, F, µ). Then the following hold.

(i) H_µ(α) ≤ log(N_µ(α)). Equality holds if and only if µ(A) = 1/N_µ(α) for all A ∈ α with µ(A) > 0.

(ii) If α ≤ β, then h_µ(α, T) ≤ h_µ(β, T).

Proof. (i) Recall that H_µ(α) = Σ_{A∈α} Φ(µ(A)) for the concave function Φ from (10.3). Applying Jensen's Inequality then gives

H_µ(α) ≤ N_µ(α) Φ( (1/N_µ(α)) Σ_{A∈α} µ(A) ) = log(N_µ(α)).
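The bound in Lemma 10.2.1(i) is easy to test numerically. The snippet below is our own sanity check (the sample mass vector is arbitrary): it verifies H_µ(α) ≤ log N_µ(α) and the equality case at the uniform distribution on the atoms of positive mass.

```python
# Our own sanity check of Lemma 10.2.1(i): H_mu(alpha) <= log N_mu(alpha),
# with equality exactly for the uniform distribution on positive-mass atoms.
import math

def entropy(ps):
    # H = sum Phi(p) with Phi(p) = -p log p and Phi(0) = 0, as in (10.3)
    return sum(-p * math.log(p) for p in ps if p > 0)

masses = [0.5, 0.25, 0.125, 0.125, 0.0]   # mu(A) over the atoms of alpha
N = sum(1 for p in masses if p > 0)       # N_mu(alpha) = 4
assert entropy(masses) <= math.log(N) + 1e-12
assert abs(entropy([1 / N] * N) - math.log(N)) < 1e-12  # equality at uniform
print(entropy(masses), math.log(N))
```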


(ii) From α ≤ β we get that for every B ∈ β, there is some A ∈ α such that T^{-i}B ⊆ T^{-i}A for any i ∈ Z_{≥0}. Therefore, for n ≥ 1

∨_{i=0}^{n−1} T^{-i}α ≤ ∨_{i=0}^{n−1} T^{-i}β.

The result now follows from Proposition 9.2.1(v), division by n and taking the limit as n → ∞.

We are now ready to prove the first part of the Variational Principle.

Theorem 10.2.1. Let T : X → X be a continuous transformation on a compact metric space (X, d). Then h_top(T) ≥ sup{h_µ(T) : µ ∈ M(X, T)}.

Proof. We start by taking an arbitrary measure µ ∈ M(X, T) and a finite partition α = {A_1, . . . , A_N} of X into measurable sets. Our goal is to show that h_µ(α, T) < h_top(T) + log 2 + 1 by finding an open cover of X that is close to α. To do so, first pick an 0 < ε < 1/(N log N). By Theorem 12.6.1, µ is regular, so for each i = 1, . . . , N we can find a closed set C_i ⊆ A_i such that µ(A_i \ C_i) < ε. Set C_0 = X \ ∪_{i=1}^{N} C_i and let γ be the partition γ = {C_0, . . . , C_N}. If µ(C_0) > 0, then since C_i ⊆ A_i for each i = 1, . . . , N we get from Jensen's Inequality that

H_µ(α|γ) = Σ_{i=0}^{N} Σ_{j=1}^{N} µ(C_i) Φ( µ(C_i ∩ A_j)/µ(C_i) )
 = µ(C_0) Σ_{j=1}^{N} Φ( µ(C_0 ∩ A_j)/µ(C_0) )
 ≤ N µ(C_0) Φ( (1/N) Σ_{j=1}^{N} µ(C_0 ∩ A_j)/µ(C_0) )
 = µ(C_0) log N < Nε log N < 1.

From Exercise 9.3.1(b) we see that

h_µ(α, T) ≤ h_µ(γ, T) + H_µ(α|γ) < h_µ(γ, T) + 1.    (10.4)

If µ(C_0) = 0, then we have h_µ(α, T) = h_µ(γ, T), which also yields (10.4). For each 1 ≤ i ≤ N the set C_0 ∪ C_i = X \ ∪_{j≠i} C_j is an open set, so the collection β = {C_0 ∪ C_1, . . . , C_0 ∪ C_N} is an open cover of X. For any B ∈ ∨_{i=0}^{n−1} T^{-i}β there are some i_j ∈ {1, . . . , N}, such that B can be written as

W

B = (C0 ∪ Ci0 ) ∩ T −1 (C0 ∪ Ci1 ) ∩ . . . ∩ T −(n−1) (C0 ∪ Cin−1 ) =

n−1 [

\

[

m=0 S⊆{0,1,...,n−1}: #S=m

T

−j

\

C0 ∩

T

−j



Cij .

(10.5)

j6∈S

j∈S

Here the sets S specify the positions in which we choose C_0 instead of C_{i_j}. We see that each element of ∨_{i=0}^{n−1} T^{-i}β can be written as a disjoint union of 2^n elements from ∨_{i=0}^{n−1} T^{-i}γ, some of which might be empty. Recall that N(∨_{i=0}^{n−1} T^{-i}β) denotes the minimal cardinality that any subcover of ∨_{i=0}^{n−1} T^{-i}β can have. Let β̃ be a subcover of ∨_{i=0}^{n−1} T^{-i}β of cardinality N(∨_{i=0}^{n−1} T^{-i}β). Then every element of β̃ contains at most 2^n elements from ∨_{i=0}^{n−1} T^{-i}γ. Moreover, β̃ is a cover of X, so each element from ∨_{i=0}^{n−1} T^{-i}γ is contained in at least one element from β̃. This leads to the estimate

N_µ(∨_{i=0}^{n−1} T^{-i}γ) ≤ 2^n N(∨_{i=0}^{n−1} T^{-i}β).

By Lemma 10.2.1(i) we then obtain that

h_µ(γ, T) = lim_{n→∞} (1/n) H_µ(∨_{i=0}^{n−1} T^{-i}γ) ≤ lim_{n→∞} (1/n) log N_µ(∨_{i=0}^{n−1} T^{-i}γ) ≤ lim_{n→∞} (1/n) log( 2^n N(∨_{i=0}^{n−1} T^{-i}β) ) = h_top(β, T) + log 2,

so that h_µ(α, T) < h_top(β, T) + log 2 + 1 ≤ h_top(T) + log 2 + 1 and taking the supremum over all finite measurable partitions of X gives

h_µ(T) < h_top(T) + log 2 + 1.    (10.6)

We now use the results on the entropies of T^n from Exercise 9.3.3 and Theorem 10.1.2. Any T-invariant measure is automatically T^n-invariant. So (10.6) also holds with T replaced by T^n for any n ≥ 1. Then Exercise 9.3.3(d) and Theorem 10.1.2(iii) lead to

n h_µ(T) = h_µ(T^n) < h_top(T^n) + log 2 + 1 = n h_top(T) + log 2 + 1.

Dividing by n, letting n → ∞ and taking the supremum over all µ ∈ M(X, T) gives the result.


To finish the proof of the Variational Principle the opposite inequality still remains. We need a few lemmas.

Lemma 10.2.2. Let (X, d) be a compact metric space and α a finite measurable partition of X. Then, for any µ, ν ∈ M(X, T) and p ∈ [0, 1] we have

H_{pµ+(1−p)ν}(α) ≥ p H_µ(α) + (1 − p) H_ν(α).

Proof. The concavity of the function Φ from (10.3) gives for any measurable set A that

0 ≤ Φ(pµ(A) + (1 − p)ν(A)) − pΦ(µ(A)) − (1 − p)Φ(ν(A)).

The result now follows easily.

Exercise 10.2.1. (a) Suppose (X, d) is a compact metric space and T : X → X a continuous transformation. Use (the proof of) Lemma 10.2.2 to show that for any µ, ν ∈ M(X, T) and p ∈ [0, 1] we have h_{pµ+(1−p)ν}(T) ≥ p h_µ(T) + (1 − p) h_ν(T).

(b) Improve the result in part (a) by showing that we can replace the inequality by an equality sign, i.e., for any µ, ν ∈ M(X, T) and p ∈ [0, 1] we have h_{pµ+(1−p)ν}(T) = p h_µ(T) + (1 − p) h_ν(T).

Recall that the boundary of a set A is defined by ∂A = Ā \ A°, where Ā denotes the closure and A° the interior of A.

Lemma 10.2.3. Let (X, d) be a compact metric space. The following hold.

(i) Let µ ∈ M(X). For any δ > 0, there is a finite, measurable partition α = {A_1, . . . , A_n} of X such that diam_d(A_j) < δ and µ(∂A_j) = 0 for all j.

(ii) Let T : X → X be continuous and µ ∈ M(X, T). Suppose that for each j = 0, . . . , n − 1 we have A_j ∈ B satisfying µ(∂A_j) = 0. Then µ(∂(∩_{j=0}^{n−1} T^{-j}A_j)) = 0.

Proof. (i) First note that for every x ∈ X and δ > 0 we can find an 0 < η < δ such that the open ball B(x, η) satisfies µ(∂B(x, η)) = 0. To see this, suppose that the opposite is true. Then there exists an x ∈ X and a δ > 0 such that for all 0 < η < δ it holds that µ(∂B(x, η)) > 0. This gives an uncountable collection of disjoint subsets of X with positive measure, contradicting the fact that µ is a probability measure. To prove the statement, fix δ > 0 and for each x ∈ X let 0 < η_x < δ/2 be such that µ(∂B(x, η_x)) = 0. The collection {B(x, η_x) : x ∈ X} forms


an open cover of X, so by compactness there exists a finite subcover which we denote by β = {B_1, . . . , B_n}. Define α by letting A_1 = B_1 and A_j = B_j \ ∪_{k=1}^{j−1} B_k for 1 < j ≤ n. Then α is a partition of X into Borel measurable sets with diam_d(A_j) ≤ diam_d(B_j) < δ and µ(∂A_j) ≤ µ(∪_{i=1}^{n} ∂B_i) = 0.

(ii) Let x ∈ ∂(∩_{j=0}^{n−1} T^{-j}A_j). Then x lies in the closure of ∩_{j=0}^{n−1} T^{-j}A_j, but x ∉ (∩_{j=0}^{n−1} T^{-j}A_j)°. That is, every open neighborhood of x intersects every T^{-j}A_j, but there is a 0 ≤ k ≤ n − 1 for which x ∉ (T^{-k}A_k)°. Hence, x ∈ ∂T^{-k}A_k and by continuity of T it then also follows that x ∈ T^{-k}∂A_k (see Lemma 12.1.1). Hence, ∂(∩_{j=0}^{n−1} T^{-j}A_j) ⊆ ∪_{j=0}^{n−1} T^{-j}∂A_j. The statement follows since µ is T-invariant.

Lemma 10.2.4. Let q, n be integers such that 1 < q < n. Define for 0 ≤ j ≤ q − 1 the numbers a(j) = ⌊(n − j)/q⌋, where ⌊·⌋ denotes the integer part. Then the following hold.

(i) a(0) ≥ a(1) ≥ ⋯ ≥ a(q − 1).

(ii) Fix 0 ≤ j ≤ q − 1 and let S_j = {0, 1, . . . , j − 1, j + a(j)q, j + a(j)q + 1, . . . , n − 1}. Then

{0, 1, . . . , n − 1} = {j + rq + i : 0 ≤ r ≤ a(j) − 1, 0 ≤ i ≤ q − 1} ∪ S_j

and #S_j ≤ 2q.

(iii) For each 0 ≤ j ≤ q − 1, (a(j) − 1)q + j ≤ (⌊(n − j)/q⌋ − 1)q + j ≤ n − q. The numbers in the set {j + rq : 0 ≤ j ≤ q − 1, 0 ≤ r ≤ a(j) − 1} are all distinct and do not exceed n − q.

The number a(j) represents the number of times you have to add q to j to get a number between n − q and n. The proof of these three statements is left to the reader.

To prove the main result of this section we construct a Borel probability measure µ with metric entropy h_µ(T) ≥ liminf_{n→∞} (1/n) log Se(n, ε, T). To do this, we first find a sequence of measures ν_n with

H_{ν_n}(∨_{i=0}^{n−1} T^{-i}α) = log Se(n, ε, T)

for a suitably chosen partition α. Theorem 7.1.1 is then used to obtain a suitable µ ∈ M(X, T) and the appropriate estimate for H_µ then comes from applying Lemma 10.2.4 to the tails S_j of the partition ∨_{i=0}^{n−1} T^{-i}α. Lemma 10.2.3 fills in the remaining technicalities.
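The bookkeeping in Lemma 10.2.4 is finite and can be verified by brute force. The sketch below is our own check for one arbitrary choice of n and q: the blocks {j + rq + i} together with the tail S_j cover {0, ..., n − 1}, the tail has at most 2q elements, and the shifts j + rq are distinct and do not exceed n − q.

```python
# Our own brute-force check of Lemma 10.2.4 for one choice of n and q.
n, q = 23, 5
shifts = set()
for j in range(q):
    a = (n - j) // q                      # a(j) = floor((n - j) / q)
    S_j = set(range(j)) | set(range(j + a * q, n))
    blocks = {j + r * q + i for r in range(a) for i in range(q)}
    assert blocks | S_j == set(range(n))  # part (ii): a covering of {0,...,n-1}
    assert len(S_j) <= 2 * q              # part (ii): tail bound
    for r in range(a):
        assert j + r * q <= n - q         # part (iii): do not exceed n - q
        shifts.add(j + r * q)
assert len(shifts) == sum((n - j) // q for j in range(q))  # all distinct
print("Lemma 10.2.4 verified for n = 23, q = 5")
```

Changing n and q (with 1 < q < n) exercises the same assertions for any other pair.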


Theorem 10.2.2. Let T : X → X be a continuous transformation on a compact metric space (X, d). Then for each ε > 0 there is a measure µ ∈ M(X, T) that satisfies

h_µ(T) ≥ liminf_{n→∞} (1/n) log Se(n, ε, T).

Proof. Fix ε > 0. For each n, let E_n be an (n, ε)-separated set of cardinality Se(n, ε, T). Define ν_n ∈ M(X) by ν_n = (1/Se(n, ε, T)) Σ_{x∈E_n} δ_x, where δ_x is the Dirac measure concentrated at x. Define µ_n ∈ M(X) by µ_n = (1/n) Σ_{i=0}^{n−1} ν_n ∘ T^{-i}. By Theorem 12.6.4, M(X) is compact, hence there exists a subsequence (n_j) such that (µ_{n_j}) converges weakly in M(X) to some µ ∈ M(X). By Theorem 7.1.1, µ ∈ M(X, T). We will show that this measure µ satisfies the statement from the theorem.

By Lemma 10.2.3(i), we can find a measurable partition α = {A_1, . . . , A_k} of X such that diam_d(A_j) < ε and µ(∂A_j) = 0 for all j = 1, . . . , k. Since E_n is (n, ε)-separated, any set A ∈ ∨_{i=0}^{n−1} T^{-i}α can contain at most one element from E_n. Hence, either ν_n(A) = 0 or ν_n(A) = 1/Se(n, ε, T). Since E_n ⊆ ∪_{j=1}^{k} A_j, we see that H_{ν_n}(∨_{i=0}^{n−1} T^{-i}α) = log Se(n, ε, T).

Fix integers q, n with 1 < q < n and define for each 0 ≤ j ≤ q − 1 the numbers a(j) as in Lemma 10.2.4. Fix 0 ≤ j ≤ q − 1. Since

∨_{i=0}^{n−1} T^{-i}α = ( ∨_{r=0}^{a(j)−1} T^{-(rq+j)} ∨_{i=0}^{q−1} T^{-i}α ) ∨ ( ∨_{i∈S_j} T^{-i}α ),

we find using Proposition 9.2.1(i) and (iv) and Lemma 10.2.1(ii) that

log Se(n, ε, T) = H_{ν_n}(∨_{i=0}^{n−1} T^{-i}α)
 ≤ Σ_{r=0}^{a(j)−1} H_{ν_n}(T^{-(rq+j)} ∨_{i=0}^{q−1} T^{-i}α) + Σ_{i∈S_j} H_{ν_n}(T^{-i}α)
 ≤ Σ_{r=0}^{a(j)−1} H_{ν_n ∘ T^{-(rq+j)}}(∨_{i=0}^{q−1} T^{-i}α) + 2q log k.

Now if we sum the above inequality over j and divide both sides by n, we obtain by Lemmas 10.2.4(iii) and 10.2.2 that

(q/n) log Se(n, ε, T) ≤ (1/n) Σ_{ℓ=0}^{n−1} H_{ν_n ∘ T^{-ℓ}}(∨_{i=0}^{q−1} T^{-i}α) + (2q²/n) log k ≤ H_{µ_n}(∨_{i=0}^{q−1} T^{-i}α) + (2q²/n) log k.

By Lemma 10.2.3(ii) each atom A of ∨_{i=0}^{q−1} T^{-i}α has boundary of µ-measure zero. Together with the weak convergence of µ_{n_j} to µ in M(X) this implies that lim_{j→∞} µ_{n_j}(A) = µ(A) (see Theorem 12.6.5). Hence,

lim_{j→∞} ( H_{µ_{n_j}}(∨_{i=0}^{q−1} T^{-i}α) + (2q²/n_j) log k ) = H_µ(∨_{i=0}^{q−1} T^{-i}α),

giving that

liminf_{n→∞} (1/n) log Se(n, ε, T) ≤ (1/q) H_µ(∨_{i=0}^{q−1} T^{-i}α).

Taking the limit as q → ∞ now yields the desired inequality.

Corollary 10.2.1 (Variational Principle). The topological entropy of a continuous transformation T : X → X on a compact metric space (X, d) is given by

h_top(T) = sup{h_µ(T) : µ ∈ M(X, T)}.

Proof. By Theorem 10.2.1 we have h_top(T) ≥ sup{h_µ(T) : µ ∈ M(X, T)}. From Theorem 10.2.2 we get for each ε > 0 the existence of a measure µ = µ_ε ∈ M(X, T) with

sup{h_µ(T) : µ ∈ M(X, T)} ≥ h_{µ_ε}(T) ≥ liminf_{n→∞} (1/n) log Se(n, ε, T).

The other inequality then follows from Definition 10.1.2.

To get a taste of the power of this statement, let us revisit our proof of the invariance of topological entropy under conjugacy given in Theorem 10.1.2(ii). Let (X, d), (Y, ρ) be two compact metric spaces with continuous transformations T : X → X and S : Y → Y and let ψ : X → Y be a conjugacy. Note that µ ∈ M(X, T) if and only if µ ∘ ψ^{-1} ∈ M(Y, S). With these measures the maps T and S are measure preservingly isomorphic according to Definition 5.1.1 and thus by Theorem 9.2.2 we obtain that h_µ(T) = h_{µ∘ψ^{-1}}(S). It then follows from the Variational Principle that h_top(T) = h_top(S).

10.3 MEASURES OF MAXIMAL ENTROPY

The Variational Principle suggests an educated way for choosing a Borel probability measure on X, namely one that maximizes the entropy of T.


Definition 10.3.1. Let (X, d) be a compact metric space and T : X → X continuous. A measure µ ∈ M(X, T) is called a measure of maximal entropy if h_µ(T) = h_top(T). Let M_max(X, T) = {µ ∈ M(X, T) : h_µ(T) = h_top(T)}. If M_max(X, T) = {µ}, then µ is called the unique measure of maximal entropy.

Example 10.3.1. Recall that for any θ ∈ [0, 1) the circle rotation R_θ has topological entropy h_top(R_θ) = 0. We know from Chapter 7 that Haar measure λ is the unique invariant probability measure for R_θ. Since h_λ(R_θ) ≥ 0, it immediately follows from the Variational Principle that λ is the unique measure of maximal entropy for R_θ with h_λ(R_θ) = 0.

Exercise 10.3.1. Consider the set K = {f : [0, 1] → [0, 1] : |f(x) − f(y)| ≤ |x − y| for all x, y ∈ [0, 1]} of Lipschitz continuous functions on [0, 1] with Lipschitz constant 1. Endowed with the uniform distance metric

d(f, g) = sup_{x∈[0,1]} |f(x) − g(x)|

the set K becomes a compact metric space. Let g ∈ K be a bijection. Define the mapping T : K → K by T ∘ f = f ∘ g.

(a) Prove that T ∘ f ∈ K for all f ∈ K.

(b) Prove that h_µ(T) = 0 for all µ ∈ M(K, T).

Measures of maximal entropy are closely connected to (uniquely) ergodic measures, as will become apparent from the following theorem.

Theorem 10.3.1. Let (X, d) be a compact metric space and T : X → X continuous. Then the following hold.

(i) M_max(X, T) is a convex set.

(ii) If h_top(T) < ∞, then the extreme points of M_max(X, T) are precisely the ergodic members of M_max(X, T).

(iii) If h_top(T) = ∞, then M_max(X, T) ≠ ∅. If, moreover, T has a unique measure of maximal entropy, then T is uniquely ergodic.

Proof. (i) Let p ∈ [0, 1] and µ, ν ∈ M_max(X, T). Then, by Exercise 10.2.1,

h_{pµ+(1−p)ν}(T) = p h_µ(T) + (1 − p) h_ν(T) = p h_top(T) + (1 − p) h_top(T) = h_top(T).


Hence, pµ + (1 − p)ν ∈ M_max(X, T).

(ii) Suppose µ ∈ M_max(X, T) is ergodic. Then, by Theorem 7.1.2, µ cannot be written as a non-trivial convex combination of elements of M(X, T). Since M_max(X, T) ⊆ M(X, T), µ is an extreme point of M_max(X, T). Conversely, suppose µ is an extreme point of M_max(X, T) and suppose there is a p ∈ (0, 1) and ν_1, ν_2 ∈ M(X, T) such that µ = pν_1 + (1 − p)ν_2. By Exercise 10.2.1, h_top(T) = h_µ(T) = p h_{ν_1}(T) + (1 − p) h_{ν_2}(T). Since h_{ν_1}(T), h_{ν_2}(T) ≤ h_top(T) < ∞, we must have h_top(T) = h_{ν_1}(T) = h_{ν_2}(T). Thus, ν_1, ν_2 ∈ M_max(X, T). Since µ is an extreme point of M_max(X, T), we must have µ = ν_1 = ν_2. Therefore, µ is also an extreme point of M(X, T) and we conclude that µ is ergodic.

(iii) By the Variational Principle we can find for any n ≥ 0 a measure µ_n ∈ M(X, T) with h_{µ_n}(T) > 2^n. Define µ ∈ M(X, T) by µ = Σ_{n≥1} µ_n/2^n. Then µ = Σ_{n=1}^{N} µ_n/2^n + Σ_{n≥N+1} µ_n/2^n for any N ≥ 1. But then Exercise 10.2.1 implies that

h_µ(T) ≥ Σ_{n=1}^{N} h_{µ_n}(T)/2^n > N.

Since this holds for arbitrary N ≥ 1, we obtain h_µ(T) = h_top(T) = ∞ and µ is a measure of maximal entropy. Now suppose that µ is the unique measure of maximal entropy for T. Then, for any ν ∈ M(X, T), h_{µ/2+ν/2}(T) = (1/2)h_µ(T) + (1/2)h_ν(T) = ∞. Hence, µ = ν and M(X, T) = {µ}.

Corollary 10.3.1. Let (X, d) be a compact metric space and T : X → X continuous. If T has a unique measure of maximal entropy, then it is ergodic. Conversely, if T is uniquely ergodic, then T has a measure of maximal entropy.

Proof. The first statement follows from Theorem 10.3.1(ii) for h_top(T) < ∞ and (iii) for h_top(T) = ∞. If T is uniquely ergodic, then M(X, T) = {µ} for some µ. By the Variational Principle, h_µ(T) = h_top(T), hence µ is a measure of maximal entropy.

Exercise 10.3.2. Let X = {0, 1, . . . , k − 1}^Z and let T : X → X be the left shift on X. Use Proposition 10.1.3 to show that the uniform product measure is the unique measure of maximal entropy for T.
We end this section with a generalization of the above exercise. A homeomorphism T : X → X is called expansive with constant δ > 0 if for all x, y ∈ X with x ≠ y there is a k = k_{x,y} ∈ Z such that d(T^k x, T^k y) > δ.

Example 10.3.2. Let T be the left shift on X = {0, 1, . . . , k − 1}^Z and as in Example 7.1.1 consider the metric d(x, y) = 2^{−min{|i| : x_i ≠ y_i}}. We already saw in Example 7.1.1 that T is a homeomorphism on (X, d). If x, y ∈ X with x ≠ y, then there is an i ∈ Z with x_i ≠ y_i. Then d(T^i x, T^i y) = 1 and hence T is expansive with any constant δ < 1.

Proposition 10.3.1. Every expansive homeomorphism of a compact metric space has a measure of maximal entropy.

Proof. Let T : X → X be an expansive homeomorphism, and let δ > 0 be an expansive constant for T. Fix 0 < ε < δ and let µ ∈ M(X, T) be the measure given by Theorem 10.2.2, so that

  hµ(T) ≥ lim inf_{n→∞} (1/n) log Se(n, ε, T).

We will show that htop(T) = lim inf_{n→∞} (1/n) log Se(n, ε, T), from which it then immediately follows that µ ∈ Mmax(X, T). Pick any 0 < η < ε and n ≥ 1, and let A be an (n, η)-separated set of cardinality Se(n, η, T). By expansiveness we can find for any x, y ∈ A some k = k_{x,y} ∈ Z such that d(T^k x, T^k y) > ε. Since A is finite, ℓ := max{|k_{x,y}| : x, y ∈ A} exists. For any pair of points x, y ∈ T^{−ℓ}A with x ≠ y we have T^ℓ x, T^ℓ y ∈ A, so there is a k ∈ Z with |k| ≤ ℓ such that d(T^{k+ℓ} x, T^{k+ℓ} y) > ε. Hence,

  d_{2ℓ+n}(x, y) ≥ max_{0≤i≤2ℓ} d(T^i x, T^i y) > ε.

This implies that T^{−ℓ}A is a (2ℓ + n, ε)-separated set and #T^{−ℓ}A = Se(n, η, T). Hence, Se(2ℓ + n, ε, T) ≥ Se(n, η, T) and

  lim inf_{n→∞} (1/n) log Se(n, η, T) ≤ lim inf_{n→∞} (1/n) log Se(2ℓ + n, ε, T)
   = lim inf_{n→∞} (1/(2ℓ + n)) log Se(2ℓ + n, ε, T)
   = lim inf_{n→∞} (1/n) log Se(n, ε, T).


Conversely, any (n, ε)-separated set is also (n, η)-separated, so that Se(n, ε, T) ≤ Se(n, η, T). We conclude that

  lim inf_{n→∞} (1/n) log Se(n, η, T) = lim inf_{n→∞} (1/n) log Se(n, ε, T).

Since 0 < η < ε was arbitrary, this shows that

  htop(T) = lim_{η↓0} lim inf_{n→∞} (1/n) log Se(n, η, T) = lim inf_{n→∞} (1/n) log Se(n, ε, T).

From the proof we extract the following corollary.

Corollary 10.3.2. Let T : X → X be an expansive homeomorphism with constant δ. Then for any 0 < ε < δ,

  htop(T) = lim inf_{n→∞} (1/n) log Se(n, ε, T).
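The expansiveness of the shift in Example 10.3.2 is easy to exhibit in code as well. The sketch below (plain Python; two-sided sequences are represented by dicts of their non-background coordinates, a convenience assumption made here for finiteness) computes the witness k = k_{x,y}:

```python
def d(x, y, window=50):
    """Metric d(x, y) = 2^{-min{|i| : x_i != y_i}} on {0,...,k-1}^Z,
    with sequences given as dicts {i: symbol} (absent keys mean symbol 0)."""
    diffs = [abs(i) for i in range(-window, window + 1)
             if x.get(i, 0) != y.get(i, 0)]
    return 0.0 if not diffs else 2.0 ** (-min(diffs))

def shift(x, k):
    """The k-fold left shift: (T^k x)_i = x_{i+k}."""
    return {i - k: s for i, s in x.items()}

# Two sequences that first differ in coordinate i = 7.
x = {0: 1, 7: 2}
y = {0: 1, 7: 1}
assert d(x, y) == 2.0 ** (-7)

# Shifting by k = 7 moves the disagreement to coordinate 0, so
# d(T^7 x, T^7 y) = 1 > delta for any delta < 1: T is expansive.
assert d(shift(x, 7), shift(y, 7)) == 1.0
```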

CHAPTER 11

Infinite Ergodic Theory

In large parts of the book we have only considered measure preserving transformations T : X → X on a probability space (X, F, µ), and several fundamental results we have seen, the Ergodic Theorems in particular, fail when the underlying measure space is infinite. In this chapter we discuss the dynamics of infinite measure systems in more detail. We first consider some more examples and elaborate on the recurrence behavior of infinite measure systems. We then introduce jump and induced transformations. For these transformations one considers the dynamics only at the time steps in which the system visits a well-chosen finite measure subset of the space. Many tools developed for finite measure systems in the previous chapters then become available after all. At the end of the chapter we discuss Ergodic Theorems in relation to infinite measure systems.

11.1 EXAMPLES

In Example 1.3.1 and Example 1.3.4 we already encountered two transformations that are measure preserving with respect to Lebesgue measure on R. Here we introduce two examples of transformations on the unit interval [0, 1] that are non-singular with respect to Lebesgue measure λ, but have an invariant measure that is infinite and absolutely continuous with respect to λ. In contrast to Examples 1.3.1 and 1.3.4, the infiniteness of the measure here is not caused by the domain having infinite Lebesgue measure, but by the presence of a neutral fixed point, i.e., a point c in the space for which T c = c and the derivative of T at c equals 1.


200  A First Course in Ergodic Theory

Example 11.1.1. The Farey map is the transformation T : [0, 1] → [0, 1] defined by

  T x = x/(1 − x), if 0 ≤ x < 1/2,
        (1 − x)/x, if 1/2 ≤ x ≤ 1.

See Figure 11.1(a) for the graph. Note that 0 is a neutral fixed point for T. Let B denote the Lebesgue σ-algebra on [0, 1] and define the measure µ on ([0, 1], B) by

  µ(A) = ∫_A (1/x) dλ(x) for all A ∈ B.   (11.1)

We check that µ is T-invariant. For any b ∈ [0, 1] it holds that µ((0, b)) = ∞ and since T0 = 0, T^{−1}(0, b) contains an interval of the form (0, c). Hence, µ(T^{−1}(0, b)) = ∞ = µ((0, b)). Let 0 < a < b ≤ 1. Then

  T^{−1}(a, b) = (a/(a+1), b/(b+1)) ∪ (1/(b+1), 1/(a+1)),

where the union is disjoint. Hence,

  µ(T^{−1}(a, b)) = log(b/(b+1)) − log(a/(a+1)) + log(1/(a+1)) − log(1/(b+1))
   = log b − log a = µ((a, b)).

The same holds for any other type of interval, so the invariance of µ follows from Theorem 1.2.1.

Example 11.1.2. Define the transformation T : [0, 1] → [0, 1] by

  T x = x/(1 − x), if 0 ≤ x < 1/2,
        2x − 1, if 1/2 ≤ x ≤ 1.

This map equals the Farey map on the interval [0, 1/2). The graph is shown in Figure 11.1(b). Let µ be the measure from (11.1). As in the previous example, for any interval (0, b) with 0 < b ≤ 1 it holds that µ((0, b)) = ∞ = µ(T^{−1}(0, b)). Let 0 < a < b ≤ 1. Then

  µ(T^{−1}(a, b)) = log(b/(b+1)) − log(a/(a+1)) + log((b+1)/2) − log((a+1)/2)
   = log b − log a = µ((a, b)).

Figure 11.1 The graph of the Farey map from Example 11.1.1 in (a), of the transformation from Example 11.1.2 in (b), of the Rényi map from Exercise 11.1.1 in (c) and of an LSV map from Exercise 11.1.2 in (d).

Hence also this transformation T is measure preserving with respect to µ.

In Examples 11.1.1 and 11.1.2, once an orbit lands near the neutral fixed point 0 it will stay close to 0 for a long time. The invariant measure µ from (11.1) reflects the amount of time typical orbits spend in different parts of the state space. It therefore has an accumulation of mass at 0, and in light of that what happens on the interval (1/2, 1] has no significant effect.

Exercise 11.1.1. The Rényi map or backwards continued fraction map T : [0, 1) → [0, 1) is defined by

  T x = x/(1 − x) (mod 1).

The graph is shown in Figure 11.1(c).

(a) Verify that for Lebesgue a.e. x ∈ [0, 1) it holds that T x = S(1 − x), where S is the Gauss map from Example 1.3.7. Also find an invertible map ϕ : [0, 1) → [−1, 0) such that T = ϕ^{−1} ∘ T₀ ∘ ϕ, where T₀ : [−1, 0) → [−1, 0) is the α-continued fraction transformation from Example 6.3.3 with α = 0.

(b) Find an invariant measure for T that is absolutely continuous with respect to the Lebesgue measure.

Exercise 11.1.2. Let α > 0 and define the map Tα : [0, 1) → [0, 1) by

  Tα x = x(1 + 2^α x^α), if 0 ≤ x < 1/2,
         2x − 1, if 1/2 < x < 1.


Verify that the measure µ from (11.1) is not invariant for any LSV map Tα. These maps Tα, now commonly called LSV maps after the article [36] by C. Liverani, B. Saussol and S. Vaienti, are a modification of the so-called Pomeau-Manneville maps, see [53, 52]. Figure 11.1(d) shows the graph of one of them. For every α > 0 the map Tα has a σ-finite invariant measure that is absolutely continuous with respect to Lebesgue measure λ. This is a probability measure for α ∈ (0, 1) and an infinite measure for α ≥ 1.
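The interval computation in Example 11.1.1 is easy to double-check numerically. A small sketch in plain Python, using µ((a, b)) = log b − log a and the two preimage intervals found above:

```python
import math
import random

def mu(a, b):
    """mu((a, b)) = integral of 1/x over (a, b) = log b - log a."""
    return math.log(b) - math.log(a)

def mu_farey_preimage(a, b):
    """mu(T^{-1}(a, b)) for the Farey map: the preimage of (a, b) under
    x/(1-x) is (a/(a+1), b/(b+1)); under (1-x)/x it is (1/(b+1), 1/(a+1))."""
    return mu(a / (a + 1), b / (b + 1)) + mu(1 / (b + 1), 1 / (a + 1))

random.seed(1)
for _ in range(10_000):
    a, b = sorted(random.uniform(1e-6, 1.0) for _ in range(2))
    if a < b:
        # invariance: mu(T^{-1}(a, b)) = mu((a, b))
        assert abs(mu_farey_preimage(a, b) - mu(a, b)) < 1e-9
```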

11.2 CONSERVATIVE AND DISSIPATIVE PART

Recall from the statement of Halmos' Recurrence Theorem that a conservative dynamical system is characterized by the property that for any measurable set B almost all points in B return to B infinitely often. In that sense conservative systems are those infinite measure systems that behave most like finite measure systems. Any infinite measure system can be split into two parts according to its recurrence behavior: the conservative and the dissipative part. To make this precise, we introduce the following notions.

Definition 11.2.1. Let (X, F, µ) be a measure space.
(i) A collection H ⊆ F is called hereditary if C ∈ H, B ⊆ C and B ∈ F imply that B ∈ H.
(ii) A set U ∈ F is called a cover of H if µ(A \ U) = 0 for all A ∈ H.
(iii) A hereditary collection H is said to saturate a set A ∈ F if for all B ∈ F with B ⊆ A and µ(B) > 0 there is a C ∈ H with C ⊆ B and µ(C) > 0.
(iv) A set U ∈ F is called the measurable union of a hereditary collection H if U is a cover of H and H saturates U.

Measurable unions are unique modulo sets of measure zero. To see this, let U, U′ be two measurable unions for the same hereditary collection H and assume µ(U ∆ U′) > 0. Then µ(U′ \ U) > 0 or µ(U \ U′) > 0. Assume with no loss of generality that µ(U \ U′) > 0. Then, since H saturates U, there is a C ∈ H with C ⊆ U \ U′ and µ(C) > 0, and since U′ covers H, µ(C \ U′) = 0. This gives a contradiction. Hence µ(U \ U′) = 0 and by symmetry also µ(U′ \ U) = 0. The existence of a measurable union is given by Proposition 12.2.1 from the Appendix. We denote the measurable union of H by U(H).


Exercise 11.2.1. Recall from Definition 2.1.1 that a set W ∈ F is called wandering for a transformation T if µ(T^{−n}W ∩ T^{−m}W) = 0 for all 0 ≤ n < m. Prove that the collection of wandering sets W_T is hereditary, so that the measurable union U(W_T) exists.

Definition 11.2.2. Let T : X → X be non-singular. The dissipative part of T is D(T) := U(W_T). The conservative part is the complement C(T) = X \ D(T). The partition {C(T), D(T)} of X is called the Hopf decomposition of X.

The set U(W_T) is the smallest measurable set containing all wandering sets, in the sense that if A ∈ F has µ(A ∩ U(W_T)) > 0, then A contains a wandering set W with µ(W) > 0, see Definition 11.2.1(iii). In particular, if µ(U(W_T)) > 0, then there is a set W ∈ W_T with µ(W) > 0 and T is not conservative. On the other hand, if there is a wandering set W for T with µ(W) > 0, then µ(U(W_T)) > 0. So T is conservative if and only if µ(D(T)) = 0. On the other end of the spectrum we have totally dissipative systems.

Definition 11.2.3. A transformation T : X → X on a measure space (X, F, µ) is called totally dissipative if µ(C(T)) = 0.

Example 11.2.1. Recall the dynamical system (R, B, λ, T) from Example 1.3.1, where T is a translation on R by t ∈ R. For t ≠ 0 each of the intervals [tn, t(n+1)), n ∈ Z, is a wandering set for T, so λ([tn, t(n+1)) \ D(T)) = 0. Hence also

  λ(R \ D(T)) = Σ_{n∈Z} λ([tn, t(n+1)) \ D(T)) = 0.

This implies that λ(C(T)) = 0, and thus (R, B, λ, T) is totally dissipative.

Since T^{−1}W_T ⊆ W_T, it follows that µ(T^{−1}D(T) \ D(T)) = 0. This implies that points in the conservative part of the space typically stay within the conservative part, but it makes no claim about what happens on the dissipative part. In case T is invertible and ergodic, we can say more.

Proposition 11.2.1. Let T : X → X be an invertible transformation on a measure space (X, F, µ). Then there is a wandering set W such that

  µ(D(T) ∆ ∪_{n∈Z} T^n W) = 0.


Proof. Let I = {A ∈ F : T^{−1}A = A} be the sub-σ-algebra of F of T-invariant sets. Define the collection

  V = {∪_{n∈Z} T^n W : W ∈ W_T}.

Then obviously V ⊆ I. We show that V is hereditary in I. To this end, let W ∈ W_T and consider A = ∪_{n∈Z} T^n W ∈ V. Let B ⊆ A with B ∈ I. Then B ∩ W is a wandering set and

  ∪_{n∈Z} T^n(B ∩ W) = ∪_{n∈Z} (B ∩ T^n W) = B.

So B ∈ V and V is hereditary. By Proposition 12.2.1 there then exists a countable collection (A_k) of disjoint sets in V, and with that a countable collection (W_k) of disjoint wandering sets, such that

  U(V) = ∪_{k≥1} A_k = ∪_{k≥1} ∪_{n∈Z} T^n W_k.

Note that D(T) = U(W_T) ⊆ U(V). Since the sets ∪_{n∈Z} T^n W_k, k ≥ 1, are all disjoint, for each k ≠ ℓ and all m, n we have

  µ(T^n W_k ∩ T^m W_ℓ) = 0.

Hence, the set W = ∪_{k≥1} W_k is a wandering set and

  U(V) = ∪_{n∈Z} T^n W ⊆ U(W_T).

Exercise 11.2.2. Let T : X → X be an invertible and ergodic transformation on a measure space (X, F, µ). Prove that if µ is non-atomic, then T is conservative.

The next proposition gives a characterization of the conservative part.

Proposition 11.2.2. Let (X, F, µ, T) be a dynamical system with T : X → X measure preserving.
(i) For any f ∈ L¹(X, F, µ) with f ≥ 0 it holds that

  µ({x ∈ X : Σ_{n≥1} f(T^n x) = ∞} \ C(T)) = 0.

(ii) For any f ∈ L¹(X, F, µ) with f > 0 µ-a.e. it holds that

  µ(C(T) ∆ {x ∈ X : Σ_{n≥1} f(T^n x) = ∞}) = 0.

Proof. For (i) let f ∈ L¹(X, F, µ) with f ≥ 0 be given and let W be a wandering set. By the change of variable formula (6.6) together with the fact that T is measure preserving, we have for each k ≥ 1 and each g ∈ L¹(X, F, µ) that

  ∫_W g dµ = ∫_{T^{−k}W} g ∘ T^k dµ.

Thus, for each n ≥ 1 we obtain

  ∫_W Σ_{k=1}^n f ∘ T^k dµ = Σ_{k=0}^{n−1} ∫_W f ∘ T^{n−k} dµ = Σ_{k=0}^{n−1} ∫_{T^{−k}W} f ∘ T^n dµ
   = ∫_{∪_{k=0}^{n−1} T^{−k}W} f ∘ T^n dµ ≤ ∫_X f ∘ T^n dµ = ∫_X f dµ.

Since f ≥ 0 we can apply the Monotone Convergence Theorem to get

  ∫_W Σ_{k≥1} f ∘ T^k dµ ≤ ∫_X f dµ < ∞,

so that µ(W ∩ {x ∈ X : Σ_{n≥1} f(T^n x) = ∞}) = 0. Since W was an arbitrary wandering set, we obtain

  µ(U(W_T) ∩ {x ∈ X : Σ_{n≥1} f(T^n x) = ∞}) = 0,

which gives the result.

To prove (ii), assume that f > 0 µ-a.e. Let A ∈ F ∩ C(T) satisfy µ(A) > 0. Then µ(A ∩ W) = 0 for all wandering sets W. Set E = A ∩ {x ∈ X : f(x) > 0} and define for each k ≥ 1 the set

  A_k = A ∩ {x ∈ X : f(x) > 1/k}.

Then µ(E) = µ(A) > 0 and the sequence (A_k) increases to E. So by continuity of measures we can find a k ≥ 1 such that µ(A_k) > 0, and this implies that f|_{A_k} ≥ (1/k) 1_{A_k}. Since A_k ⊆ A, we can apply Halmos' Recurrence Theorem to A and A_k to get for µ-almost every x ∈ A_k that Σ_{n≥1} 1_{A_k}(T^n x) = ∞. Hence,

  ∫_A Σ_{n≥1} f ∘ T^n dµ ≥ (1/k) ∫_{A_k} Σ_{n≥1} 1_{A_k} ∘ T^n dµ = ∞.


Since this holds for all positive measure sets A ∈ F ∩ C(T), it follows that

  µ(C(T) \ {x ∈ X : Σ_{n≥1} f(T^n x) = ∞}) = 0.

Combining this with (i) then yields (ii).

The following theorem, due to D. Maharam (see [40] from 1964), gives a way to check whether a system is conservative. Recall from Proposition 2.2.2 that in a conservative and ergodic system any set A ∈ F with µ(A) > 0 is visited eventually by µ-a.e. x ∈ X. A set A of finite, positive measure with this property is called a sweep-out set.

Definition 11.2.4. A set A ∈ F is called a sweep-out set if 0 < µ(A) < ∞ and µ(X \ ∪_{n≥0} T^{−n}A) = 0.

As in Remark 2.2.1 it follows that if A is a sweep-out set, then in fact µ-a.e. x ∈ X will visit A infinitely many times under T.

Theorem 11.2.1 (Maharam's Recurrence Theorem). Let (X, F, µ) be a measure space and T : X → X a measure preserving transformation. If there exists a sweep-out set for T, then T is conservative.

Proof. Let A be a sweep-out set for T. Then by Remark 2.2.1,

  µ(X \ {x ∈ X : Σ_{n≥1} 1_A(T^n x) = ∞}) = 0.

From Proposition 11.2.2(i) we obtain that

  µ({x ∈ X : Σ_{n≥1} 1_A(T^n x) = ∞} \ C(T)) = 0.

Hence, µ(X \ C(T)) = 0 and T is conservative.

Example 11.2.2. Recall Boole's transformation from Example 1.3.4 and consider the set A = (−1/√2, 1/√2). Then for Lebesgue measure λ we have λ(A) = √2 > 0. The endpoints of A are in the same period 2 orbit:

  T(−1/√2) = 1/√2 and T(1/√2) = −1/√2.

Define the sequence of points (x_n^+)_{n≥0} ⊆ (0, ∞) by x_0^+ = 1/√2 and T x_n^+ = x_{n−1}^+ for n ≥ 1. Then for each n ≥ 1 it holds that T^n[x_{n−1}^+, x_n^+] = A. We can similarly define a sequence of points (x_n^−)_{n≥0} ⊆ (−∞, 0) and deduce from lim_{n→∞} x_n^+ = ∞ and lim_{n→∞} x_n^− = −∞ that X = ∪_{n≥0} T^{−n}A. So A is a sweep-out set. From Theorem 11.2.1 it follows that T is conservative.

Example 11.2.3. Let T be any of the transformations from Example 11.1.1, Example 11.1.2 or Exercise 11.1.1. Take A = [1/2, 1). Then

  µ(A) = ∫_{[1/2,1)} (1/x) dλ(x) = log 2 ∈ (0, ∞).

Notice that for each n ≥ 2 we have T(1/(n+1)) = 1/n, so T^{n−1}[1/(n+1), 1/n) = A. Hence,

  ∪_{n≥0} T^{−n}A ⊇ ∪_{n≥1} [1/(n+1), 1/n) = (0, 1),

which by Theorem 11.2.1 implies that T is conservative.
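Conservativity can also be observed in simulation. The sketch below (plain Python; it assumes Boole's transformation has the classical form T x = x − 1/x, which is consistent with the period 2 orbit of ±1/√2 above) follows one orbit and counts its returns to the sweep-out set A:

```python
import math

def boole(x):
    """Boole's transformation T x = x - 1/x (the assumed form of Example 1.3.4)."""
    return x - 1.0 / x

a = 1.0 / math.sqrt(2)
x, returns = 0.3, 0  # a point of A; typical orbits behave similarly
for _ in range(200_000):
    if x == 0.0:
        break  # the fixed point 0; not reached by this orbit
    x = boole(x)
    if -a < x < a:
        returns += 1

# The orbit keeps coming back to A = (-1/sqrt(2), 1/sqrt(2)),
# as conservativity (via Maharam's Recurrence Theorem) predicts.
assert returns >= 3
```

Each visit to A near 0 launches a long excursion (an orbit at height y takes roughly y²/2 steps to drift back), so the returns are sparse; but they never stop.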

11.3 INDUCED SYSTEMS

A powerful technique to study infinite measure dynamical systems is inducing. Here the system is restricted to a subset of the state space X and the dynamics is only observed when it passes through this set. To be sure that almost all points return to this part, we take a conservative system as a starting point.

Let (X, F, m) be a measure space and T : X → X a conservative transformation. Fix a set A ∈ F with 0 < m(A) < ∞. For x ∈ A, set

  r(x) := inf{n ≥ 1 : T^n x ∈ A}.

We call r(x) the first return time of x to A. By Corollary 2.1.1 r is finite m-a.e. on A. In the sequel we remove from A the zero measure set of points that do not return to A infinitely often, i.e., the set of points x ∈ A for which there is a k ≥ 1 with r(T^k x) = ∞, and we denote the new set again by A. Consider the σ-algebra F ∩ A on A, which is the restriction of F to A.

Exercise 11.3.1. Show that r is measurable with respect to F ∩ A.

Let m_A be the probability measure on (A, F ∩ A) defined by

  m_A(B) = m(B)/m(A) for B ∈ F ∩ A,


so that (A, F ∩ A, m_A) is a probability space. Define the induced transformation T_A : A → A by

  T_A x = T^{r(x)} x.

What kind of a transformation is T_A?

Exercise 11.3.2. Show that T_A is measurable with respect to F ∩ A.

For k ≥ 1, let

  A_k = {x ∈ A : r(x) = k},   (11.2)

so that A = ∪_{k≥1} A_k. We can use these sets to prove the following.

Proposition 11.3.1. Let (X, F, m) be a measure space and T : X → X a conservative transformation. Then for any A ∈ F with 0 < m(A) < ∞ the induced transformation T_A : A → A is non-singular with respect to the measure m_A.

Proof. Let C ∈ F ∩ A. Then we can write T_A^{−1}C as a countable disjoint union according to the first return times, i.e.,

  T_A^{−1}C = ∪_{k≥1} (A_k ∩ T^{−k}C).   (11.3)

Assume that m_A(C) = 0. Then m(C) = 0 and the non-singularity of T gives

  0 ≤ m(T_A^{−1}C) = Σ_{k≥1} m(A_k ∩ T^{−k}C) ≤ Σ_{k≥1} m(T^{−k}C) = 0.

So, m_A(T_A^{−1}C) = 0. For the other direction, assume m_A(C) > 0. Then m(C) > 0 and by the conservativity of T there is an n ≥ 1 such that m(A ∩ T^{−n}C) ≥ m(C ∩ T^{−n}C) > 0. Let n₀ be the least integer with this property. Since C ⊆ A, there exists a k ≤ n₀ such that m(A_k ∩ T^{−n₀}C) > 0. We claim k = n₀. Suppose k < n₀ and note that A_k ∩ T^{−n₀}C ⊆ T^{−k}(A ∩ T^{−(n₀−k)}C). Hence, m(T^{−k}(A ∩ T^{−(n₀−k)}C)) > 0 and the non-singularity of T gives m(A ∩ T^{−(n₀−k)}C) > 0, contradicting the minimality of n₀. Thus, m(A_{n₀} ∩ T^{−n₀}C) > 0, implying m(T_A^{−1}C) > 0 and then also m_A(T_A^{−1}C) > 0. The non-singularity of T_A follows.

This proposition, together with Exercise 11.3.2, implies that (A, F ∩ A, m_A, T_A) is a dynamical system, which justifies the following definition.


Definition 11.3.1. Let (X, F, m, T) be a dynamical system with T : X → X conservative and let A ∈ F satisfy 0 < m(A) < ∞. Then the dynamical system (A, F ∩ A, m_A, T_A) is called the induced system of (X, F, m, T) on A.

Example 11.3.1. Consider again the transformation T : [0, 1) → [0, 1) from Example 11.1.2 given by

  T x = x/(1 − x), if 0 ≤ x < 1/2,
        2x − 1, if 1/2 ≤ x < 1.

We construct the induced transformation T_A for A = [1/2, 1). Obviously, any x ∈ [3/4, 1) has first return time r(x) = 1. Furthermore, for each n ≥ 2 we have T(1/(n+1)) = 1/n, so T^{n−1}[1/(n+1), 1/n) = A. The point (n+1)/(2n) lies in A and satisfies T((n+1)/(2n)) = 1/n. Hence, using the notation from (11.2),

  A_n = [(n+2)/(2(n+1)), (n+1)/(2n)), n ≥ 1,

or in other words r(x) = n precisely when x ∈ A_n. See Figure 11.2(a). Write T₁ : x ↦ x/(1 − x) for the first branch of T. Then for each n ≥ 1,

  T₁^n x = x/(1 − nx).   (11.4)

Hence, on A_n it holds that

  T_A x = T₁^{n−1}(2x − 1) = (2x − 1)/(n − (n − 1)2x).

The graph of T_A is shown in Figure 11.2(b).

Proposition 11.3.2. Let (X, F, µ, T) be a dynamical system with T : X → X conservative and let A ∈ F satisfy 0 < µ(A) < ∞. Then T_A : A → A is conservative on (A, F ∩ A, µ_A). Moreover, if T is measure preserving with respect to µ, then so is T_A with respect to µ_A.

Proof. Recall the definition of the sets A_k from (11.2). To show that T_A is conservative, let C ∈ F ∩ A. Note that for each x ∈ A, since C ⊆ A,

  Σ_{n≥1} 1_C(T_A^n x) = ∞ if and only if Σ_{n≥1} 1_C(T^n x) = ∞.

T is conservative, so by Corollary 2.1.1

  µ({x ∈ C : Σ_{n≥1} 1_C(T^n x) = ∞}) = µ(C)


Figure 11.2 The graph of the map T from Example 11.3.1 in (a) and of its induced transformation on the interval A = [1/2, 1) in (b).

and hence

  µ_A({x ∈ C : Σ_{n≥1} 1_C(T_A^n x) = ∞}) = µ({x ∈ C : Σ_{n≥1} 1_C(T^n x) = ∞}) / µ(A) = µ_A(C).

Since C ∈ F ∩ A was arbitrary, Corollary 2.1.1 implies that T_A is conservative.

Now assume that T is measure preserving with respect to µ and let C ∈ F ∩ A. Since µ(C) = µ(T^{−1}C), for the measure preservingness of T_A it is enough to show that µ(T_A^{−1}C) = µ(T^{−1}C). For each k ≥ 1 define B_k = {x ∈ A^c : T x, . . . , T^{k−1}x ∉ A, T^k x ∈ A}. Notice that

  T^{−1}A = A_1 ∪ B_1 and T^{−1}B_n = A_{n+1} ∪ B_{n+1}.   (11.5)

See Figure 11.3 for an illustration of where T maps the sets A_k and B_k. From (11.3) we know that

  µ(T_A^{−1}C) = Σ_{n≥1} µ(A_n ∩ T^{−n}C).


Figure 11.3 A tower: the base A with the sets A_k, and the levels T^k A \ ∪_{j=0}^{k−1} T^j A containing the sets B_k.

On the other hand, using (11.5) repeatedly one gets for any n ≥ 1,

  µ(T^{−1}C) = µ(A_1 ∩ T^{−1}C) + µ(B_1 ∩ T^{−1}C)
   = µ(A_1 ∩ T^{−1}C) + µ(T^{−1}(B_1 ∩ T^{−1}C))
   = µ(A_1 ∩ T^{−1}C) + µ(A_2 ∩ T^{−2}C) + µ(B_2 ∩ T^{−2}C)
   ⋮
   = Σ_{k=1}^n µ(A_k ∩ T^{−k}C) + µ(B_n ∩ T^{−n}C).

We claim that lim_{n→∞} µ(B_n) = 0. This follows from the fact that if we replace C by A in the above calculation, we get for each n

  µ(A) = µ(T^{−1}A) = Σ_{k=1}^n µ(A_k) + µ(B_n).

Taking limits, and using the fact that µ(A) = Σ_{k≥1} µ(A_k) as well as µ(A) < ∞, we see that lim_{n→∞} µ(B_n) = 0. Since µ(B_n ∩ T^{−n}C) ≤ µ(B_n), we have lim_{n→∞} µ(B_n ∩ T^{−n}C) = 0. Thus,

  µ(C) = µ(T^{−1}C) = Σ_{n≥1} µ(A_n ∩ T^{−n}C) = µ(T_A^{−1}C).

This gives the result.

Exercise 11.3.3. Let (X, F, µ, T) be a dynamical system and assume


that T : X → X is conservative, measure preserving and invertible. Show that µ_A(C) = µ_A(T_A C) for all C ∈ F ∩ A without using Proposition 11.3.2.

Exercise 11.3.4. Let (X, F, m, T) be a dynamical system. Let T : X → X be conservative and A ∈ F be such that 0 < m(A) < ∞. Prove the following.
(a) If T is ergodic, then T_A is ergodic.
(b) If T_A is ergodic and A is a sweep-out set, then T is ergodic.

Exercise 11.3.5. Consider the rotation from Example 1.3.2, i.e., T x = x + θ (mod 1) for some θ ∈ (0, 1). Assume that θ is irrational. Determine explicitly the induced transformation T_A of T on the interval A = [0, θ).

Exercise 11.3.6. Let G = (1 + √5)/2 be the golden mean, so that G² = G + 1. Consider the set

  X = ([0, 1/G) × [0, 1)) ∪ ([1/G, 1) × [0, 1/G)),

endowed with the product Lebesgue σ-algebra and the normalized Lebesgue measure λ₂. Define the transformation

  T(x, y) = (Gx, y/G), if (x, y) ∈ [0, 1/G) × [0, 1),
            (Gx − 1, (1 + y)/G), if (x, y) ∈ [1/G, 1) × [0, 1/G).

(a) Show that T is measure preserving with respect to λ₂.
(b) Determine explicitly the induced transformation U of T on the set [0, 1) × [0, 1/G).

Now let S : [0, 1)² → [0, 1)² be given by

  S(x, y) = (Gx, y/G), if (x, y) ∈ [0, 1/G) × [0, 1),
            (G²x − G, (G + y)/G²), if (x, y) ∈ [1/G, 1) × [0, 1).

(c) Show that S is measure preserving with respect to Lebesgue measure on [0, 1)².
(d) Show that the map φ : [0, 1)² → [0, 1) × [0, 1/G) given by φ(x, y) = (x, y/G) defines an isomorphism from S to U, where [0, 1) × [0, 1/G) has the induced measure structure.


The next proposition is one of the reasons why inducing is so useful for infinite measure systems.

Proposition 11.3.3. Let (X, F, m, T) be a dynamical system with T : X → X conservative. Let A ∈ F be a sweep-out set. Assume that the induced map T_A preserves some probability measure ν on (A, F ∩ A) that is absolutely continuous with respect to the measure m_A.
(i) The measure µ on (X, F) given by

  µ(B) = Σ_{n≥0} ν(T^{−n}B ∩ {x ∈ A : r(x) > n})   (11.6)

for all B ∈ F is a T-invariant measure on (X, F) and is absolutely continuous with respect to m.
(ii) The system (X, F, µ, T) is conservative.

Proof. For (i) it is straightforward to check that µ is a measure. To prove that it is T-invariant, let B ∈ F be given and write A_k = {x ∈ A : r(x) = k} as before. Then B ∩ A ∈ F ∩ A, so by (11.3)

  ν(B ∩ A) = ν(T_A^{−1}(B ∩ A)) = Σ_{n≥1} ν(A_n ∩ T^{−n}B).

Then

  µ(T^{−1}B) = Σ_{n≥0} ν(T^{−(n+1)}B ∩ {x ∈ A : r(x) > n})
   = Σ_{n≥1} ν(A_n ∩ T^{−n}B) + Σ_{n≥1} ν(T^{−n}B ∩ {x ∈ A : r(x) > n})
   = ν(B ∩ A) + Σ_{n≥1} ν(T^{−n}B ∩ {x ∈ A : r(x) > n}) = µ(B).

So µ is T-invariant. Finally, suppose m(B) = 0. The non-singularity of T then gives m(T^{−n}B) = 0 for all n. Hence m_A(T^{−n}B ∩ {x ∈ A : r(x) > n}) = 0 for all n, and since ν is absolutely continuous with respect to m_A, also ν(T^{−n}B ∩ {x ∈ A : r(x) > n}) = 0 for all n. This shows that µ(B) = 0 and µ is absolutely continuous with respect to m.

To prove (ii), we first note that

  µ(A) = Σ_{n≥0} ν(T^{−n}A ∩ {x ∈ A : r(x) > n})
   = ν(A) + Σ_{n≥1} ν(T^{−n}A ∩ {x ∈ A : r(x) > n}) = ν(A) = 1,

since for n ≥ 1 a point x ∈ A with r(x) > n cannot have T^n x ∈ A.


By T-invariance we have µ(T^{−k}A) = 1 for all k. Since m(X \ ∪_{k≥0} T^{−k}A) = 0, it follows by absolute continuity that µ(X \ ∪_{k≥0} T^{−k}A) = 0, so (X, F, µ) is a σ-finite measure space and A is a sweep-out set with respect to µ. The conservativity of µ then follows from Maharam's Recurrence Theorem.

Under the hypotheses of Proposition 11.3.3 the measure ν equals the induced measure µ_A. To see this, let C ∈ F ∩ A. Then, since C ⊆ A,

  µ_A(C) = µ(C)/µ(A) = Σ_{n≥0} ν(T^{−n}C ∩ {x ∈ A : r(x) > n}) = ν(C).

The proof of Proposition 11.3.3 shows that A is a sweep-out set for µ as well as for m. It then follows from Exercise 11.3.4 that if ν is ergodic, then also µ is ergodic.

Exercise 11.3.7 (Kac's Lemma). Let (X, F, µ, T) be a dynamical system with T : X → X a measure preserving, conservative and ergodic transformation. Let A ∈ F satisfy 0 < µ(A) < ∞. Prove that

  ∫_A r dµ = µ(X).

Conclude that if µ is a probability measure, then r ∈ L¹(A, F ∩ A, µ_A) and

  lim_{n→∞} (1/n) Σ_{i=0}^{n−1} r(T_A^i x) = 1/µ(A)

for µ_A-almost every x.

From the previous exercise we immediately deduce the following.

Corollary 11.3.1. Under the conditions of Proposition 11.3.3, suppose that T is ergodic with respect to the measure µ from (11.6). Then µ is infinite if and only if ∫_A r dµ = ∞.

Example 11.3.2. Define the transformation T : [0, ∞) → [0, ∞) by

  T x = 0, if x = 0,
        1/x − 1, if 0 < x ≤ 1,
        x − 1, if x > 1.

See Figure 11.4(a) for the graph. Let A = [0, 1) and consider the induced


transformation T_A : [0, 1) → [0, 1). Note that r(x) = 1 for all x ∈ (1/2, 1) ∪ {0}. For any n ≥ 2 and x ∈ (1/(n+1), 1/n] we have T x = 1/x − 1 ∈ [n − 1, n) and thus r(x) = n. We then immediately see that T_A becomes a very familiar map: T_A 0 = 0 and T_A x = 1/x (mod 1) for x ≠ 0. The induced map therefore has the Gauss measure (see (1.6)) as an invariant probability measure ν, and we can use Proposition 11.3.3 to obtain an invariant measure µ for T. The ergodicity of T for µ follows immediately from the ergodicity of T_A for ν (see Theorem 8.2.1). Proposition 11.3.3 also gives the conservativity of T with respect to µ. We will compute µ.

It is easily verified that {x ∈ [0, 1) : r(x) > n} = (0, 1/(n+1)] for any n ≥ 1. Moreover, T^n|_{(0, 1/(n+1)]} x = 1/x − n. Let B ⊆ (0, ∞) be an arbitrary Lebesgue set. If B ⊆ (0, 1) it holds that

  µ(B) = ν(B) = (1/log 2) ∫_B 1/(x+1) dλ(x).

If, on the other hand, B ⊆ (1, ∞), then

  µ(B) = Σ_{n≥0} ν(T^{−n}B ∩ (0, 1/(n+1)]) = Σ_{n≥0} ν(1/(B + n)).

The same computation that shows ν is the invariant measure for the Gauss map yields that

  µ(B) = (1/log 2) ∫_B 1/(x+1) dλ(x)   (11.7)

also in this case. So, (11.7) gives µ(B) for any Lebesgue measurable subset B ⊆ [0, ∞). It is now immediately clear that µ([0, ∞)) = ∞, but we could also deduce this from Kac's formula.

Exercise 11.3.8. Show that in the above example,

  ∫_{[0,∞)} r dµ = ∞.
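That the induced map of Example 11.3.2 is the Gauss map can also be confirmed by brute force; a short sketch in plain Python:

```python
import math
import random

def T(x):
    """The map of Example 11.3.2 on [0, infinity)."""
    if x == 0:
        return 0.0
    return 1.0 / x - 1.0 if x <= 1 else x - 1.0

def induced(x):
    """T_A x for A = [0, 1): iterate T until the orbit re-enters [0, 1)."""
    y = T(x)
    while y >= 1.0:
        y = T(y)
    return y

random.seed(2)
for _ in range(10_000):
    x = random.uniform(1e-6, 1.0 - 1e-6)
    gauss = 1.0 / x - math.floor(1.0 / x)  # 1/x (mod 1)
    assert abs(induced(x) - gauss) < 1e-9
```

Note that a point near 0 first jumps to a large height 1/x − 1 and then staircases down one unit at a time, which is exactly the long-excursion mechanism that makes µ infinite.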


Figure 11.4 The graph of the map T from Example 11.3.2 in (a). The dashed lines indicate that the interval (1/3, 1/2] is first mapped to [1, 2) and then back to [0, 1), giving return time 2 on this interval. In (b) the jump transformation J_A from Example 11.4.1.

11.4 JUMP TRANSFORMATIONS

A construction very similar to the induced transformation is the jump transformation, which we introduce next. Let (X, F, m) be a measure space and T : X → X a transformation. Let A ∈ F be a sweep-out set with the property that T A ∈ F and m(X \ T A) = 0. Define the first passage time p : X → N by

  p(x) = 1 + inf{n ≥ 0 : T^n x ∈ A},

so it is 1 plus the number of iterates needed to map x inside A. The jump transformation J_A : X → X is defined by J_A x = T^{p(x)} x. In contrast to an induced transformation, a jump transformation is defined on the whole space X. A jump transformation can be used for systems that are slow on some part of the space X (often the reason that an invariant measure is infinite). The system is then accelerated to the point that it admits a finite invariant measure. This is formulated in the next proposition, whose proof largely parallels that of Proposition 11.3.3.

Proposition 11.4.1. Let (X, F, m, T) be a dynamical system. Let A ∈ F be a sweep-out set with the properties that T A ∈ F and m(X \ T A) =


0. Assume that J_A preserves a probability measure ν on (X, F). Then the measure µ on (X, F) given by

  µ(B) = Σ_{n≥0} ν(T^{−n}B ∩ {x ∈ X : p(x) > n}) for all B ∈ F   (11.8)

is T-invariant. Moreover, if ν is ergodic for J_A, then the measure µ from (11.8) is ergodic for T.

Exercise 11.4.1. Prove Proposition 11.4.1.

Exercise 11.4.2. Let T : X → X be a measure preserving transformation on a measure space (X, F, µ) and let A be a sweep-out set. Prove that µ({x ∈ A : r(x) > k}) = µ({x ∈ X : p(x) = k + 1}) for any k ≥ 0.

Example 11.4.1. Let G = (1 + √5)/2 be the golden mean (so G² = G + 1) and let T be the β-transformation from Example 1.3.6 with β = G. The formula for the invariant density from (1.4) can easily be deduced from Proposition 11.4.1. Let A = [0, 1/G). Then p(x) = 1 for x ∈ A and p(x) = 2 for x ∉ A. So,

  J_A x = Gx, if x ∈ A,
          G²x − G, if x ∉ A.

See Figure 11.4(b) for the graph. One immediately sees that Lebesgue measure λ is invariant for J_A, so by Proposition 11.4.1 the measure µ̂ given by

  µ̂(B) = Σ_{n≥0} λ(T^{−n}B ∩ {x ∈ [0, 1) : p(x) > n})
   = λ(B) + λ(T^{−1}B ∩ [1/G, 1))
   = λ(B) + (1/G) λ(B ∩ [0, 1/G))

for any Lebesgue set B is T-invariant. Note that µ̂([0, 1)) = 1 + 1/G² < ∞, so if we normalize µ̂ we get the invariant probability measure µ for T given by

  µ(B) = (G²/(1 + G²)) λ(B ∩ [1/G, 1)) + (G²/(1 + G²))(1 + 1/G) λ(B ∩ [0, 1/G)),

which is precisely the measure from (1.4).
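The invariance claim in this example can be tested on intervals. The sketch below (plain Python; the normalized density C(1 + 1/G) on [0, 1/G) and C on [1/G, 1) with C = G²/(1 + G²) is our reading of the formula above) checks µ(T^{−1}(a, b)) = µ((a, b)) for the β-transformation T x = Gx (mod 1):

```python
import math

G = (1 + math.sqrt(5)) / 2  # golden mean, G^2 = G + 1

def mu(a, b):
    """mu((a, b)) for the candidate invariant measure: density
    C*(1 + 1/G) on [0, 1/G) and C on [1/G, 1), C = G^2/(1 + G^2)."""
    C = G * G / (1 + G * G)
    lo = max(0.0, min(b, 1 / G) - min(a, 1 / G))  # length of (a,b) below 1/G
    hi = max(0.0, max(b, 1 / G) - max(a, 1 / G))  # length of (a,b) above 1/G
    return C * (1 + 1 / G) * lo + C * hi

def mu_preimage(a, b):
    """mu(T^{-1}(a, b)) for T x = Gx (mod 1): branch x -> Gx contributes
    (a/G, b/G); branch x -> Gx - 1 only covers [0, 1/G), so only the part
    of (a, b) below 1/G has a second preimage ((a+1)/G, (b+1)/G)."""
    total = mu(a / G, b / G)
    a2, b2 = a, min(b, 1 / G)
    if a2 < b2:
        total += mu((a2 + 1) / G, (b2 + 1) / G)
    return total

for i in range(200):
    for j in range(i + 1, 201):
        a, b = i / 200, j / 200
        assert abs(mu_preimage(a, b) - mu(a, b)) < 1e-12
```

The same check with the unweighted length λ in place of µ fails, which is how one sees that plain Lebesgue measure is not T-invariant.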


Example 11.4.2. Consider again the Farey map T : [0,1] → [0,1] from Example 11.1.1 with its σ-finite, infinite invariant measure µ on ([0,1], B) given by

µ(A) = ∫_A (1/x) dλ(x)   for all A ∈ B.

Consider the jump transformation of T to the interval A = (1/2, 1]. Obviously T A = [0, 1). As in Example 11.2.3 for any n ≥ 2,

T^{n−1}(1/(n+1), 1/n] = A.

Hence, p(x) = n for x ∈ (1/(n+1), 1/n] and ⋃_{n≥0} T⁻ⁿA = (0, 1], so A is a sweep-out set with respect to λ. By (11.4) the jump transformation J_A : [0,1] → [0,1] is given on (1/(n+1), 1/n] by

J_A x = (1 − T^{n−1}x) / T^{n−1}x = (1 − nx)/x = 1/x − n.

In other words, J_A is the Gauss map from Example 1.3.7: J_A x = 1/x (mod 1). Let ν denote the Gauss measure from (1.6). One can check that

µ(B) = log 2 · Σ_{n≥0} ν(T⁻ⁿB ∩ {x ∈ [0,1] : p(x) > n})   (11.9)

for all B ∈ B. Since J_A is ergodic, it follows from Proposition 11.4.1 that also T is ergodic.

Exercise 11.4.3. Prove (11.9).
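The identity J_A x = 1/x − n is easy to watch numerically: iterate the Farey map until the orbit has passed through A and compare the result with the Gauss map. A minimal sketch (the function names are ours):

```python
import math

def farey(x):
    """Farey map: x/(1-x) on [0,1/2], (1-x)/x on (1/2,1]."""
    return x / (1 - x) if x <= 0.5 else (1 - x) / x

def jump(x):
    """Jump transformation J_A x = T^{p(x)} x for A = (1/2, 1]."""
    y, p = x, 1
    while y <= 0.5:                  # climb until the orbit enters A
        y = farey(y)
        p += 1
    return farey(y), p               # one final application of T from inside A

for x in [0.37, 0.156, 0.731, 0.249]:
    jx, p = jump(x)
    n = math.floor(1 / x)                    # x lies in (1/(n+1), 1/n]
    assert p == n                            # the jump time is exactly n
    assert abs(jx - (1 / x - n)) < 1e-9      # J_A x = 1/x - n, the Gauss map
```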

11.5 INFINITE ERGODIC THEOREMS

While the Pointwise Ergodic Theorem was one of the main results we discussed, the statement is not true for infinite measure systems. In fact we have the following result.

Theorem 11.5.1. Let (X, F, µ) be a measure space with µ(X) = ∞ and let T : X → X be a conservative, measure preserving and ergodic transformation. Then for all f ∈ L¹(X, F, µ),

lim_{n→∞} (1/n) Σ_{i=0}^{n−1} f(Tⁱx) = 0   µ-a.e.
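For a concrete impression of this phenomenon, one can iterate the Farey map of Example 11.1.1 (conservative, ergodic, with infinite invariant measure dµ = dx/x) and track the Birkhoff averages of f = 1_{(1/2,1]}. A rough numerical sketch, under the assumption that a floating-point pseudo-orbit behaves like a typical point:

```python
def farey(x):
    # Farey map: x/(1-x) on [0,1/2], (1-x)/x on (1/2,1]
    return x / (1 - x) if x <= 0.5 else (1 - x) / x

x, hits, averages = 0.123456789, 0, {}
for n in range(1, 1_000_001):
    hits += x > 0.5                 # f = indicator of A = (1/2, 1]
    x = farey(x)
    if n in (10_000, 1_000_000):
        averages[n] = hits / n      # the Birkhoff average (1/n) S_n f

# the averages drift towards 0, though only at a logarithmic rate
assert 0 < averages[1_000_000] < 0.25
```

The decay is painfully slow: the orbit spends most of its time in long excursions near the indifferent fixed point at 0, which is precisely the mechanism that makes the invariant measure infinite.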

The proof of this result is a consequence of the following more general theorem.


Theorem 11.5.2 (Hopf's Ratio Ergodic Theorem). Let (X, F, µ) be a measure space with µ(X) = ∞ and let T : X → X be a conservative, measure preserving and ergodic transformation. Let f, g ∈ L¹(X, F, µ) with g ≥ 0 and ∫_X g dµ > 0. Then

lim_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} g(Tⁱx)) = (∫_X f dµ) / (∫_X g dµ)   µ-a.e.

Before we prove this theorem, we first show how to derive Theorem 11.5.1 from it.

Proof of Theorem 11.5.1. By the σ-finiteness of the measure space, we can find for each k ≥ 1 a set B_k ∈ F with k ≤ µ(B_k) < ∞. Let f ∈ L¹(X, F, µ) be given and assume without loss of generality that f ≥ 0. We apply Theorem 11.5.2 to the functions f and 1_{B_k}. Note that by Proposition 2.2.2 the set B_k is a sweep-out set, so that for µ-a.e. x ∈ X and all n large enough it holds that Σ_{i=0}^{n−1} 1_{B_k}(Tⁱx) ≥ 1. Therefore, for µ-almost every x ∈ X,

0 ≤ limsup_{n→∞} (1/n) Σ_{i=0}^{n−1} f(Tⁱx) ≤ limsup_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} 1_{B_k}(Tⁱx)) = lim_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} 1_{B_k}(Tⁱx)) = (∫_X f dµ) / µ(B_k).

Let

Y_k = { x ∈ X : limsup_{n→∞} (1/n) Σ_{i=0}^{n−1} f(Tⁱx) > (∫_X f dµ) / µ(B_k) }

denote the exceptional set for B_k and put Y = ⋃_{k≥1} Y_k. Then µ(Y) = 0, and for all x ∈ X \ Y and k ≥ 1,

0 ≤ limsup_{n→∞} (1/n) Σ_{i=0}^{n−1} f(Tⁱx) ≤ (1/k) ∫_X f dµ.

Since f ≥ 0 the statement follows.

The proof of Theorem 11.5.2 we give here is basically the same as the one given for the Pointwise Ergodic Theorem in Chapter 3. We give a sketch of the proof with a description of the necessary modifications.

Sketch of the proof of Theorem 11.5.2. Assume without loss of generality that f ≥ 0. Define

R̄(x) = limsup_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} g(Tⁱx))   and   Ṟ(x) = liminf_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} g(Tⁱx)).


From Proposition 2.2.3 it follows that Σ_{i≥0} g(Tⁱx) = ∞ for µ-almost every x ∈ X, so that R̄ and Ṟ are well defined. Then R̄ and Ṟ are T-invariant functions (this is an exercise), so that by Theorem 2.2.2(ii) R̄ and Ṟ are constant µ-almost everywhere. We claim that to prove the theorem it is enough to prove that

∫_X R̄ g dµ ≤ ∫_X f dµ ≤ ∫_X Ṟ g dµ.   (11.10)

Since R̄ ≥ Ṟ and g ≥ 0, from (11.10) it would follow that

∫_X g · (R̄ − Ṟ) dµ = 0

and hence g · (R̄ − Ṟ) = 0 µ-a.e. Since R̄ and Ṟ are both constant and ∫_X g dµ > 0, we see that R̄ − Ṟ = 0. Hence,

R̄ = Ṟ = lim_{n→∞} (Σ_{i=0}^{n−1} f(Tⁱx)) / (Σ_{i=0}^{n−1} g(Tⁱx)).

The proofs of the inequalities ∫_X R̄ g dµ ≤ ∫_X f dµ and ∫_X f dµ ≤ ∫_X Ṟ g dµ are the same as the corresponding estimates in the proof of Theorem 3.1.1 with some small modifications. We treat the estimate ∫_X R̄ g dµ ≤ ∫_X f dµ and leave the other one to the reader. For any L > 0 the measure ν defined by ν(A) = ∫_A L·g dµ for all A ∈ F is absolutely continuous with respect to µ. So for any given δ > 0 there is a δ′ < δ such that if µ(A) < δ′, then ∫_A L·g dµ < δ. By the definition of limsup, given 0 < ε < 1 and L > 0 there is an M ≥ 1, such that if

X₀ = { x ∈ X : ∃ m ≤ M with (Σ_{i=0}^{m−1} f(Tⁱx)) / (Σ_{i=0}^{m−1} g(Tⁱx)) ≥ min{R̄(x), L}(1 − ε) },

then µ(X \ X₀) < δ′ < δ. The function F is modified to

F(x) = f(x) if x ∈ X₀,   and   F(x) = Lg(x) if x ∉ X₀.

The sequences (aₙ(x)) and (bₙ(x)) are now defined by aₙ(x) = F(Tⁿx) and bₙ(x) = g(Tⁿx) min{R̄(x), L}(1 − ε). The rest of the proof follows the same steps as in Theorem 3.1.1.

Exercise 11.5.1. Prove that the functions R̄ and Ṟ from the proof of Theorem 11.5.2 are T-invariant, and then finish the proof of this theorem.
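Hopf's theorem can be illustrated on the Farey map with its infinite invariant measure dµ = dx/x. Taking f and g to be the indicators of (1/2, 1] and (1/3, 1/2], the ratio of the ergodic sums should tend to µ((1/2,1])/µ((1/3,1/2]) = log 2 / log(3/2) ≈ 1.71, even though each average alone tends to 0. A sketch, under the assumption that a floating-point orbit equidistributes like a typical point:

```python
import math

def farey(x):
    # Farey map: x/(1-x) on [0,1/2], (1-x)/x on (1/2,1]
    return x / (1 - x) if x <= 0.5 else (1 - x) / x

x, sf, sg = 0.123456789, 0, 0
for _ in range(2_000_000):
    if x > 0.5:                    # f = indicator of (1/2, 1]
        sf += 1
    elif x > 1 / 3:                # g = indicator of (1/3, 1/2]
        sg += 1
    x = farey(x)

hopf_limit = math.log(2) / math.log(1.5)   # = mu((1/2,1]) / mu((1/3,1/2])
assert abs(sf / sg - hopf_limit) < 0.15
```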


The statement of Theorem 11.5.1 is bad news, but one could argue that the sequence (1/n)_{n≥1} used in front of the ergodic sums is just converging to 0 too fast, and that the result could maybe be salvaged by replacing this sequence by another sequence (cₙ) with slower convergence to 0. The following result by J. Aaronson, the proof of which can be found in [1, Section 2.4], shows that this will not work.

Theorem 11.5.3 (Aaronson's Theorem). Let (X, F, µ) be a measure space with µ(X) = ∞ and T : X → X a conservative, measure preserving and ergodic transformation. Let (cₙ)_{n≥1} ⊆ (0, ∞) be an arbitrary sequence. Then either

(i) liminf_{n→∞} (1/cₙ) Σ_{i=0}^{n−1} f(Tⁱx) = 0 µ-a.e. for all f ∈ L¹(X, F, µ) with f ≥ 0, or

(ii) there is a subsequence (c_{n_k})_{k≥1} such that lim_{k→∞} (1/c_{n_k}) Σ_{i=0}^{n_k−1} f(Tⁱx) = ∞ µ-a.e. for all f ∈ L¹(X, F, µ) with f ≥ 0.

To end on a positive note, there exist many other tools to describe the long term behavior of infinite measure dynamical systems, but they fall beyond the scope of this book. We refer the reader to e.g. [1] or [29]. Here we give a small example, showing that even with Hopf's Ratio Ergodic Theorem we can still obtain a little bit of information on relative digit frequencies as we did in Section 3.2.

Example 11.5.1. Recall the definition of the Rényi map T : [0,1) → [0,1) by

Tx = x/(1−x)   (mod 1)

from Exercise 11.1.1. It can be used to produce backwards continued fraction expansions of numbers in the interval [0,1) of the form

x = 1 − 1/(b₁ − 1/(b₂ − 1/(b₃ − ⋱))),   bₙ ≥ 2.

The digit sequence (bₙ) is obtained from setting

bₙ = bₙ(x) = k + 1   if T^{n−1}x ∈ [(k−1)/k, k/(k+1)),   k ≥ 1.   (11.11)


Then Tx = 1/(1−x) − b₁ + 1, and rewriting and iterating gives

x = 1 − 1/(b₁ − 1 + Tx) = 1 − 1/(b₁ − 1/(b₂ − 1 + T²x)) = ⋯ = 1 − 1/(b₁ − 1/(b₂ − ⋱ − 1/(bₙ − 1 + Tⁿx)))   (11.12)

for any n ≥ 1. From Exercise 11.1.1(a) we see how T relates to the α-continued fraction map with α = 0. It then follows from Proposition 8.4.1 that the process from (11.12) converges. The measure µ on ([0,1), B) given by (11.1) is invariant for T. It is a straightforward application of Theorem 11.5.1 that for Lebesgue almost all x ∈ [0,1) and all k ≥ 3,

lim_{n→∞} #{1 ≤ j ≤ n : b_j(x) = k} / n = lim_{n→∞} (1/n) Σ_{i=0}^{n−1} 1_{[(k−1)/k, k/(k+1))}(Tⁱx) = 0.

So for Lebesgue almost all x any digit k > 2 occurs with frequency 0 in the digit sequence (bₙ). As a consequence, the frequency of the digit 2 is one. Using Hopf's Ratio Ergodic Theorem we can compare the frequencies of any two digits k, ℓ ≠ 2. For any k ≥ 2 the interval [(k−1)/k, k/(k+1)) has

µ([(k−1)/k, k/(k+1))) = log(k²/(k²−1)).

Hence, for Lebesgue almost all x ∈ [0,1) and any k, ℓ > 2,

lim_{n→∞} #{1 ≤ j ≤ n : b_j(x) = k} / #{1 ≤ j ≤ n : b_j(x) = ℓ} = log(k²/(k²−1)) / log(ℓ²/(ℓ²−1)).

Using this expression it is easy to calculate, for example, that for any k ≥ 3 and any m ≥ 2,

lim_{n→∞} #{1 ≤ j ≤ n : b_j(x) = k} / #{1 ≤ j ≤ n : b_j(x) = mk} = log(k²/(k²−1)) / log(m²k²/(m²k²−1)) ≈ m².

So in typical digit sequences (bₙ) the digit k appears roughly m² times as often as the digit mk.
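These digit statistics can be reproduced in a few lines. The sketch below (our own setup) iterates the Rényi map, reads off digits via (11.11) (so bₙ = ⌊1/(1 − T^{n−1}x)⌋ + 1), and checks that the digit 2 dominates while the counts of the digits 3 and 4 compare like the µ-measures of their defining intervals, as Hopf's theorem predicts; the tolerances are generous since convergence is slow:

```python
import math

def renyi(x):
    """Renyi map T x = x/(1-x) (mod 1)."""
    y = x / (1 - x)
    return y - math.floor(y)

def digit(y):
    # (11.11): b = k+1 when y lies in [(k-1)/k, k/(k+1)), i.e. k = floor(1/(1-y))
    return math.floor(1 / (1 - y)) + 1

def mu(a, b):
    """The infinite invariant measure d mu = dx/x of an interval [a,b)."""
    return math.log(b / a)

N = 1_000_000
x, counts = 0.57721566, {}
for _ in range(N):
    d = digit(x)
    counts[d] = counts.get(d, 0) + 1
    x = renyi(x)

assert counts[2] / N > 0.8                   # digit 2 has full frequency in the limit
assert counts.get(3, 0) > 100 and counts.get(4, 0) > 100
expected = mu(1/2, 2/3) / mu(2/3, 3/4)       # Hopf ratio for digits 3 and 4
assert abs(counts[3] / counts[4] - expected) < 0.4
```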

CHAPTER 12

Appendix

12.1 TOPOLOGY

We will assume that the reader has some familiarity with basic (analytic) topology, but for future reference we start by recalling some concepts on compact metric spaces that are needed in some parts of the book. For any subset A ⊆ X we write A° for the interior of A, Ā for the closure of A and ∂A = Ā \ A° for the boundary of A.

Lemma 12.1.1. Let (X, d) be a compact metric space and T : X → X a continuous function. Then for any subset A of X we have ∂(T⁻¹A) ⊆ T⁻¹(∂A).

Proof. Let y ∈ ∂(T⁻¹A). Then there exists a sequence (xₙ) in T⁻¹A converging to y. By continuity we have Txₙ → Ty with Txₙ ∈ A ⊆ Ā. Since X is compact, there exists a subsequence (x_{n_j}) such that (Tx_{n_j}) converges to some a ∈ Ā. However, (Tx_{n_j}) converges to Ty, so that Ty = a and Ty ∈ Ā = ∂A ∪ A°. Since ∂(T⁻¹A) is the closure of T⁻¹A minus its interior, the closure of T⁻¹A is contained in T⁻¹Ā, and T⁻¹(A°) ⊆ (T⁻¹A)°, we get ∂(T⁻¹A) ⊆ T⁻¹Ā \ T⁻¹(A°). As y ∈ ∂(T⁻¹A), we see that Ty ∉ A°. This implies that Ty ∈ ∂A, that is, y ∈ T⁻¹(∂A).

Theorem 12.1.1. Let X be a compact metric space, Y a topological space and f : X → Y continuous. Then all of the following hold.

(i) X is a Hausdorff space.

(ii) Every closed subspace of X is compact.

(iii) Every compact subspace of X is closed.

(iv) X has a countable basis for its topology.


(v) X is normal, i.e., for any pair of disjoint closed sets A and B of X there are disjoint open sets U, V containing A and B, respectively.

(vi) If O is an open cover of X, then there is a δ > 0 such that every subset A of X with diam(A) < δ is contained in an element of O. We call such a δ > 0 a Lebesgue number for O.

(vii) X is a Baire space, i.e., every intersection of a countable collection of open, dense subsets of X is itself dense.

(viii) f is uniformly continuous.

(ix) If Y is an ordered space, then f attains a maximum and minimum on X.

(x) If A and B are disjoint closed sets in X and [a, b] ⊂ R, then there exists a continuous map g : X → [a, b] such that g(x) = a for all x ∈ A and g(x) = b for all x ∈ B.

The reader may recognize that this theorem contains some of the most important results in analytic topology, most notably the Lebesgue Number Lemma, the Baire Category Theorem, the Extreme Value Theorem and Urysohn's Lemma.

12.2 MEASURE THEORY

We also assume a basic knowledge of measure theory, since the underlying state space of our dynamical systems is always (with the exception of Section 7.3) a measure space. We denote this by (X, F, µ), where F is a σ-algebra on X and µ is a measure on the measurable space (X, F). In this section we quickly recall some basic concepts and list some famous results that are used in the main text, for easy reference. We do not provide proofs of these results, since they can be found in any textbook on measure theory. This section also includes some less well known results, some of which are provided with a proof.

The measure spaces we consider are either a probability space with µ(X) = 1, or a σ-finite, infinite measure space, in which case µ(X) = ∞ and there exists a countable collection (Aₙ)_{n≥1} ⊆ F with µ(Aₙ) < ∞ for all n ≥ 1 and ⋃_{n≥1} Aₙ = X. A set A ∈ F is a µ-null set if µ(A) = 0. A result is said to hold µ-almost everywhere if the set of points for which the result does not hold is a µ-null set. Often the results in this book hold µ-almost everywhere and sets of measure zero can be neglected. Two sets are said to be equal modulo sets of µ-measure zero if they are equal up to adding or subtracting µ-null sets. If µ and ν are two measures on the


same measurable space (X, F), then µ is called absolutely continuous with respect to ν if for each A ∈ F with ν(A) = 0 it holds that also µ(A) = 0. We denote this by µ ≪ ν. The measures µ and ν are called equivalent if they have the same null sets.

Besides σ-algebras there are three other types of collections of subsets of X that can sometimes make life easier. A collection S of subsets of X is called a semi-algebra if it satisfies (i) ∅ ∈ S, (ii) A ∩ B ∈ S whenever A, B ∈ S, and (iii) if A ∈ S, then A^c = ⋃_{i=1}^n E_i is a finite disjoint union of elements of S. An algebra A is a collection of subsets of X satisfying (i) ∅ ∈ A, (ii) if A, B ∈ A, then A ∩ B ∈ A, and finally (iii) if A ∈ A, then A^c ∈ A. Clearly an algebra is a semi-algebra. Furthermore, given a semi-algebra S one can form an algebra by taking all finite disjoint unions of elements of S. We denote this algebra by A(S), and we call it the algebra generated by S. It is in fact the smallest algebra containing S. Likewise, given a semi-algebra S (or an algebra A), the σ-algebra generated by S (or A) is denoted by σ(S) (or σ(A)), and is the smallest σ-algebra containing S (or A).

A monotone class M is a collection of subsets of X with the following two properties:

(i) if E₁ ⊆ E₂ ⊆ … are elements of M, then ⋃_{i=1}^∞ E_i ∈ M,

(ii) if F₁ ⊇ F₂ ⊇ … are elements of M, then ⋂_{i=1}^∞ F_i ∈ M.

The monotone class generated by a collection S of subsets of X is the smallest monotone class containing S. The following theorem relates these concepts and will be used in the proof of Theorem 12.3.1 further on.

Theorem 12.2.1 (Monotone Class Theorem). Let A be an algebra of subsets of X. Then the σ-algebra σ(A) generated by A equals the monotone class generated by A.

The next useful lemma states that any measurable set can be approximated arbitrarily well by an element in a generating algebra.

226  A First Course in Ergodic Theory

Lemma 12.2.1 (Approximation Lemma). Let (X, F, µ) be a finite measure space, and A an algebra generating F. Then, for any A ∈ F and any ε > 0, there exists a C ∈ A such that µ(A∆C) < ε.

Proof. Let D be the collection of all sets A ∈ F satisfying the property that for any ε > 0, there exists a C ∈ A such that µ(A∆C) < ε. We will show that D = F. First note that since X ∈ A, we have X ∈ D. Now let A ∈ D and ε > 0. There exists C ∈ A such that µ(A∆C) < ε. Since C^c ∈ A and A∆C = A^c∆C^c, we have µ(A^c∆C^c) < ε and hence A^c ∈ D. Finally, suppose (Aₙ)ₙ ⊆ D and ε > 0. For each n, there exists Cₙ ∈ A such that µ(Aₙ∆Cₙ) < ε/2^{n+1}. It is easy to check that

⋃_{n=1}^∞ Aₙ ∆ ⋃_{n=1}^∞ Cₙ ⊆ ⋃_{n=1}^∞ (Aₙ∆Cₙ),

so that

µ(⋃_{n=1}^∞ Aₙ ∆ ⋃_{n=1}^∞ Cₙ) ≤ Σ_{n=1}^∞ µ(Aₙ∆Cₙ) < ε/2.

Since A is only closed under finite unions, we do not know at this point whether ⋃_{n=1}^∞ Cₙ is an element of A. To solve this problem, we proceed as follows. First note that ⋂_{n=1}^m Cₙ^c ↓ ⋂_{n=1}^∞ Cₙ^c, hence

µ(⋃_{n=1}^∞ Aₙ ∩ ⋂_{n=1}^∞ Cₙ^c) = lim_{m→∞} µ(⋃_{n=1}^∞ Aₙ ∩ ⋂_{n=1}^m Cₙ^c),

and therefore,

µ(⋃_{n=1}^∞ Aₙ ∆ ⋃_{n=1}^∞ Cₙ) = lim_{m→∞} µ((⋃_{n=1}^∞ Aₙ ∩ ⋂_{n=1}^m Cₙ^c) ∪ (⋂_{n=1}^∞ Aₙ^c ∩ ⋃_{n=1}^∞ Cₙ)).

Hence there exists an m sufficiently large so that

µ((⋃_{n=1}^∞ Aₙ ∩ ⋂_{n=1}^m Cₙ^c) ∪ (⋂_{n=1}^∞ Aₙ^c ∩ ⋃_{n=1}^∞ Cₙ)) < ε.

Since ⋃_{n=1}^∞ Aₙ ∆ ⋃_{n=1}^m Cₙ is contained in the set in the last display and ⋃_{n=1}^m Cₙ ∈ A, this gives ⋃_{n=1}^∞ Aₙ ∈ D. Hence D is a σ-algebra containing A, and since F is the smallest σ-algebra containing A, we conclude that D = F.

A collection G ⊆ F is called hereditary if every measurable subset of an element of G again belongs to G. A set U ∈ F is called a measurable union of G if U covers G, meaning that µ(G \ U) = 0 for every G ∈ G, and G saturates U, meaning that every B ∈ F with B ⊆ U and µ(B) > 0 contains a subset G ∈ G with µ(G) > 0.

Proposition 12.2.1. Let (X, F, µ) be a σ-finite measure space and G ⊆ F a hereditary collection. Then a measurable union U for G exists. In fact, U = ⋃_{k≥1} A_k for a countable collection (A_k) of disjoint sets in G.

Proof. Let {Aₙ}ₙ ⊆ F be a countable collection of pairwise disjoint sets with the properties that 0 < µ(Aₙ) < ∞ for each n and µ(X \ ⋃ₙ Aₙ) = 0. Note that each of the collections G ∩ Aₙ := {G ∩ Aₙ : G ∈ G} is hereditary. We first prove that for each G ∩ Aₙ a measurable union exists that is the countable union of disjoint sets in G ∩ Aₙ. Fix n and let ε₁ = sup{µ(A) : A ∈ G ∩ Aₙ} < ∞. Let A_{n,1} be any set in G ∩ Aₙ with µ(A_{n,1}) ≥ ε₁/2. Then inductively define a sequence (ε_j)_j and a sequence of sets (A_{n,j})_j with the properties that for each j,

ε_j = sup{µ(A) : A ∈ G ∩ Aₙ and A ∩ A_{n,i} = ∅ for all 1 ≤ i < j}


and A_{n,j} ∈ G ∩ Aₙ satisfies µ(A_{n,j}) ≥ ε_j/2 and A_{n,j} ∩ A_{n,i} = ∅ for all 1 ≤ i < j. Note that

Σ_{j≥1} ε_j ≤ Σ_{j≥1} 2µ(A_{n,j}) = 2µ(⋃_j A_{n,j}) ≤ 2µ(Aₙ) < ∞.

Hence, lim_{j→∞} ε_j = 0. We claim that Uₙ := ⋃_j A_{n,j} is a measurable union of G ∩ Aₙ. Suppose Uₙ does not cover G ∩ Aₙ. Then there is a set A ∈ G ∩ Aₙ with µ(A) > 0 and A ∩ A_{n,j} = ∅ for each j. But this would imply that µ(A) ≤ ε_j for each j, contradicting that µ(A) > 0. Hence Uₙ covers G ∩ Aₙ. To see that G ∩ Aₙ saturates Uₙ, let B ∈ F with B ⊆ Uₙ and µ(B) > 0. Then there is a j such that µ(B ∩ A_{n,j}) > 0. Since A_{n,j} ∈ G ∩ Aₙ and G ∩ Aₙ is hereditary, also B ∩ A_{n,j} ∈ G ∩ Aₙ. Thus Uₙ is a measurable union of G ∩ Aₙ.

Now let U = ⋃ₙ Uₙ = ⋃ₙ ⋃_j A_{n,j}. We show that U is a measurable union of G. To see that U covers G, let A ∈ G. Then for each n, A ∩ Aₙ ∈ G ∩ Aₙ. Since Uₙ covers G ∩ Aₙ, µ(A ∩ Aₙ \ Uₙ) = 0. Then

µ(A \ U) = Σₙ µ(A ∩ Aₙ \ U) ≤ Σₙ µ(A ∩ Aₙ \ Uₙ) = 0.

To see that G saturates U, let B ∈ F satisfy B ⊆ U and µ(B) > 0. Then there is an n such that µ(B ∩ Uₙ) > 0. Since G ∩ Aₙ saturates Uₙ, there is a set G ∈ G such that G ∩ Aₙ ⊆ B ∩ Uₙ and µ(G ∩ Aₙ) > 0. Since G ∩ Aₙ ⊆ G and G is hereditary, it follows that G ∩ Aₙ ∈ G. Hence, G saturates U and U is a measurable union of G.

Theorem 12.2.2 (Carathéodory Extension Theorem). Let X be a non-empty set, S a semi-algebra on X and µ₀ : S → [0, ∞] a countably additive function, i.e., µ₀(⋃_{i≥1} A_i) = Σ_{i≥1} µ₀(A_i) for any countable collection (A_i)_{i≥1} ⊆ S of pairwise disjoint sets with ⋃_{i≥1} A_i ∈ S. Then there exists a measure µ on (X, σ(S)) that satisfies µ(A) = µ₀(A) for all A ∈ S. This measure µ is unique if there is a countable sequence (Sₙ)_{n≥1} ⊆ S with µ₀(Sₙ) < ∞ for all n ≥ 1 and ⋃_{n≥1} Sₙ = X.

Note that the uniqueness of µ from the previous theorem holds in particular in case µ₀ : S → [0, 1].

Theorem 12.2.3 (Kolmogorov 0-1 Law). Let (X, F, µ) be a probability space and let (Aₙ) be a sequence of independent sub-σ-algebras of F, so for any n ≥ 1 and any sets A_i ∈ A_i, 1 ≤ i ≤ n, it holds that

µ(A₁ ∩ A₂ ∩ ⋯ ∩ Aₙ) = µ(A₁)µ(A₂) ⋯ µ(Aₙ).




For each n ≥ 1 let Tₙ = σ(⋃_{m≥n} Aₘ). Set T_∞ = ⋂_{n≥1} Tₙ, the tail σ-algebra. Then for any A ∈ T_∞ either µ(A) = 0 or µ(A) = 1.

12.3 LEBESGUE SPACES

A function f : (X₁, F₁) → (X₂, F₂) between two measurable spaces is called measurable if f⁻¹C ∈ F₁ for any C ∈ F₂. We call it measure preserving if moreover the measure is preserved.

Definition 12.3.1. Let (X_i, F_i, µ_i), i = 1, 2, be two measure spaces and φ : X₁ → X₂ measurable. The map φ is said to be measure preserving if µ₁(φ⁻¹A) = µ₂(A) for all A ∈ F₂. Two probability spaces (X_i, F_i, µ_i), i = 1, 2, are called measurably isomorphic if there is a bijection φ : X₁ → X₂ that is measurable and measure preserving up to sets of measure zero.

The definitions of measurability and measure preservingness require one to verify the conditions for all measurable sets. The following theorem states that it is enough to check them on a generating semi-algebra only.

Theorem 12.3.1. Let (X_i, F_i, µ_i) be measure spaces and T : X₁ → X₂ a map. Suppose S₂ is a generating semi-algebra of F₂ that contains an exhausting sequence (Sₙ), i.e., an increasing sequence with X₂ = ⋃_{n=1}^∞ Sₙ. Suppose that for each A ∈ S₂ one has T⁻¹A ∈ F₁ and µ₁(T⁻¹A) = µ₂(A). If furthermore µ₂(Sₙ) = µ₁(T⁻¹Sₙ) < ∞ for all n, then T is measurable and measure preserving.

Proof. Let m ≥ 1 and consider the collection

D_m = { B ∈ F₂ : T⁻¹(B ∩ S_m) ∈ F₁ and µ₁(T⁻¹(B ∩ S_m)) = µ₂(B ∩ S_m) };

then S₂ ⊆ D_m ⊆ F₂. We show that D_m is a monotone class. Let E₁ ⊆ E₂ ⊆ … be elements of D_m, and let E = ⋃_{i=1}^∞ E_i. Then T⁻¹(E ∩ S_m) = ⋃_{i=1}^∞ T⁻¹(E_i ∩ S_m) ∈ F₁, and

µ₁(T⁻¹(E ∩ S_m)) = µ₁(⋃_{n=1}^∞ T⁻¹(Eₙ ∩ S_m)) = lim_{n→∞} µ₁(T⁻¹(Eₙ ∩ S_m)) = lim_{n→∞} µ₂(Eₙ ∩ S_m) = µ₂(⋃_{n=1}^∞ (Eₙ ∩ S_m)) = µ₂(E ∩ S_m).

Thus, E ∈ D_m. A similar proof shows that if B₁ ⊇ B₂ ⊇ … are elements of D_m, then ⋂_{i=1}^∞ B_i ∈ D_m. Hence, D_m is a monotone class containing the algebra A(S₂). By the Monotone Class Theorem, F₂ is the smallest monotone class containing A(S₂), hence F₂ ⊆ D_m. This shows that F₂ = D_m for all m. Now let B ∈ F₂; then B ∈ D_m for all m. This implies that T⁻¹(B ∩ S_m) ∈ F₁ and µ₁(T⁻¹(B ∩ S_m)) = µ₂(B ∩ S_m) for all m. Since the sequence (B ∩ S_m) increases to B, we have T⁻¹B = ⋃_{m=1}^∞ T⁻¹(B ∩ S_m) ∈ F₁ and

µ₁(T⁻¹B) = lim_{m→∞} µ₁(T⁻¹(B ∩ S_m)) = lim_{m→∞} µ₂(B ∩ S_m) = µ₂(⋃_{m=1}^∞ (B ∩ S_m)) = µ₂(B).

This proves that T is measurable and measure preserving.

On a topological space X we usually consider the Borel σ-algebra B, which is the smallest σ-algebra containing all open sets, together with the Lebesgue measure λ. For some purposes it is more convenient to consider the completion of the Borel σ-algebra. A measure space is called complete if any subset of a null set is measurable. The Lebesgue σ-algebra is the completion of B, containing all subsets of Lebesgue null sets as well. Throughout this book, the spaces we are working with are assumed to be standard Lebesgue spaces.

Let (X, F, µ) be a measure space. A set A ∈ F is called an atom if µ(A) > 0 and if each measurable subset B ⊆ A with µ(B) < µ(A) has µ(B) = 0.

Definition 12.3.2. Let (X, F, µ) be a complete finite measure space. It is called a Lebesgue space if we can remove from X an at most countable (possibly empty) set of atoms, such that the remainder (X₀, F ∩ X₀, µ|_{X₀}) is measurably isomorphic to a measure space ([a, b), B([a, b)), λ), where [a, b) ⊆ R is an interval and B([a, b)) and λ are the corresponding Lebesgue σ-algebra and Lebesgue measure. In case (X, F, µ) is an infinite, σ-finite measure space with {Aₙ} ⊆ F such that µ(Aₙ) < ∞ for all n and ⋃_{n≥1} Aₙ = X, we call it a Lebesgue space if each of the restrictions (Aₙ, F ∩ Aₙ, µ|_{Aₙ}) is a Lebesgue space.


In [54], see also [60], the following equivalent characterization of Lebesgue spaces was given.

Theorem 12.3.2 (Rohlin). Let (X, F, µ) be a complete probability space. Then it is a Lebesgue space if and only if there is a countable collection {Bₙ : n ≥ 0} ⊆ F with all of the following properties:

(i) σ({Bₙ : n ≥ 0}) = F;

(ii) there is a full measure set X₀ such that for all distinct x, y ∈ X₀ there is an n ≥ 0 such that either x ∈ Bₙ and y ∉ Bₙ, or x ∉ Bₙ and y ∈ Bₙ;

(iii) the intersection ⋂_{n≥0} Eₙ ≠ ∅, where each Eₙ equals either Bₙ or Bₙ^c.

12.4 LEBESGUE INTEGRATION

For any A ⊆ X we use 1_A : X → C to denote the indicator function, i.e.,

1_A(x) = 1 if x ∈ A,   and   1_A(x) = 0 if x ∉ A.

This is a measurable function if A ∈ F. A simple function is a linear combination of indicator functions of measurable sets, so a function h : X → C of the form h(x) = Σ_{i=1}^n a_i 1_{A_i}(x), where a_i ∈ C and all A_i ∈ F are pairwise disjoint.

Theorem 12.4.1 (Simple Function Approximation Theorem, part I). Let (X, F, µ) be a measure space and f : X → R a measurable function with respect to the Lebesgue σ-algebra on R. Then f is the pointwise limit of a sequence of simple functions.

We denote by L⁰(X, F, µ) the space of all complex valued measurable functions on a σ-finite measure space (X, F, µ). Let

L^p(X, F, µ) = { f ∈ L⁰(X, F, µ) : ∫_X |f|^p dµ < ∞ }.

On L^p(X, F, µ), p ∈ [1, ∞), the L^p-norm ‖·‖_p is defined by

‖f‖_p = (∫_X |f|^p dµ)^{1/p}.

We say that the sequence (fₙ)_{n≥1} ⊆ L^p(X, F, µ) converges in L^p to f if

lim_{n→∞} ∫_X |fₙ − f|^p dµ = 0.


Finally, L^∞(X, F, µ) is the space of essentially bounded measurable functions on (X, F, µ), i.e.,

L^∞(X, F, µ) = { f ∈ L⁰(X, F, µ) : ∃ c > 0 s.t. µ({x ∈ X : |f(x)| ≥ c}) = 0 }.

The L^∞-norm is defined by

‖f‖_∞ = inf{ c > 0 : µ({x ∈ X : |f(x)| ≥ c}) = 0 }.   (12.1)

A sequence (fₙ)_{n≥1} ⊆ L^∞(X, F, µ) converges to f ∈ L^∞(X, F, µ) if lim_{n→∞} ‖fₙ − f‖_∞ = 0. This is equivalent to the statement that there exists a set A ∈ F with µ(X \ A) = 0 such that on A the sequence (fₙ)_{n≥1} converges to f uniformly.

There are several well known convergence results for Lebesgue integration that we will use throughout the book.

Theorem 12.4.2 (Monotone Convergence Theorem). Let (X, F, µ) be a measure space. Let f, fₙ : X → R be non-negative, measurable functions such that fₙ ↑ f pointwise. Then

∫_X f dµ = lim_{n→∞} ∫_X fₙ dµ.

Theorem 12.4.3 (Dominated Convergence Theorem). Let (X, F, µ) be a measure space. Let f, fₙ : X → R be measurable functions such that fₙ → f pointwise. If there is a Lebesgue integrable function g : X → R such that |fₙ| ≤ g for all n, then fₙ and f are Lebesgue integrable as well and

∫_X f dµ = lim_{n→∞} ∫_X fₙ dµ.

Lemma 12.4.1 (Scheffé's Lemma). Let (X, F, µ) be a measure space and (fₙ) a sequence in L¹(X, F, µ) converging µ-a.e. to a function f ∈ L¹(X, F, µ). Then the sequence (fₙ) converges to f in L¹ if and only if lim_{n→∞} ∫_X |fₙ| dµ = ∫_X |f| dµ.

Theorem 12.4.4 (Radon-Nikodym Theorem). Let (X, F) be a measurable space with two σ-finite measures µ and ν, such that µ ≪ ν. Then there exists a measurable function f : X → [0, ∞) such that for any A ∈ F,

µ(A) = ∫_A f dν.

The function f is called the Radon-Nikodym derivative and is denoted by dµ/dν.


Theorem 12.4.5 (Simple Function Approximation Theorem, part II). Let (X, F, µ) be a measure space and f : X → R Lebesgue integrable. Then f is the L¹-limit of a sequence of simple functions.

Another classical result is the following.

Theorem 12.4.6 (Fubini's Theorem). Let (X_i, F_i, µ_i), i = 1, 2, be two σ-finite measure spaces. If f ∈ L¹(X₁ × X₂, F₁ ⊗ F₂, µ₁ × µ₂), then

∫_{X₁} (∫_{X₂} f dµ₂) dµ₁ = ∫_{X₁×X₂} f d(µ₁ × µ₂) = ∫_{X₂} (∫_{X₁} f dµ₁) dµ₂.

We now define the concept of conditional expectation.

Definition 12.4.1. Let (X, F, µ) be a probability space, G a sub-σ-algebra of F and f ∈ L¹(X, F, µ). The conditional expectation of f given G, denoted by E_µ(f|G), is the µ-a.e. unique integrable function satisfying the following two properties:

(i) E_µ(f|G) is G-measurable,

(ii) for any A ∈ G,

∫_A f dµ = ∫_A E_µ(f|G) dµ.

The following version of the Martingale Convergence Theorem is adapted to our setting.

Theorem 12.4.7 (Martingale Convergence Theorem). Let (X, F, µ) be a probability space. Let C₁ ⊆ C₂ ⊆ … be an increasing sequence of σ-algebras, and let C = σ(⋃ₙ Cₙ). If f ∈ L¹(X, F, µ), then

E_µ(f|C) = lim_{n→∞} E_µ(f|Cₙ)

µ-a.e. and in L¹.
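For a finite sub-σ-algebra, the conditional expectation is simply averaging over the cells of the generating partition. The sketch below (a toy setup of ours, not from the text) takes X = [0,1) with Lebesgue measure, f(x) = x², and G generated by the dyadic intervals of level m; property (ii) can then be verified in closed form, and refining the partition illustrates the Martingale Convergence Theorem:

```python
def cell_avg(k, m):
    """Average of f(x) = x^2 over the dyadic cell [k/2^m, (k+1)/2^m)."""
    h = 2.0 ** -m
    a = k * h
    return ((a + h) ** 3 - a ** 3) / (3 * h)

def cond_exp(x, m):
    """E(f | sigma-algebra generated by the level-m dyadic cells), evaluated at x."""
    k = min(int(x * 2 ** m), 2 ** m - 1)
    return cell_avg(k, m)

# property (ii): the integrals of f and E(f|G) over a cell A agree
m, k = 4, 7
h = 2.0 ** -m
int_f = ((k + 1) ** 3 - k ** 3) * h ** 3 / 3      # integral of x^2 over A = [k/2^m, (k+1)/2^m)
int_ce = cond_exp((k + 0.5) * h, m) * h           # E(f|G) is constant on A
assert abs(int_f - int_ce) < 1e-15

# martingale convergence: finer partitions approximate f ever better at a point
x = 0.3
errs = [abs(cond_exp(x, m) - x * x) for m in (2, 5, 8, 11)]
assert all(e2 < e1 for e1, e2 in zip(errs, errs[1:]))
```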

12.5 HILBERT SPACES

For the material treated in this book we need a few basic notions from functional analysis.

Definition 12.5.1. A Banach space X is a complete normed vector space, i.e., it is a vector space with a norm ‖·‖ defined on it such that X is complete with respect to the metric induced by the norm.


Definition 12.5.2. A Hilbert space X is a real or complex vector space with an inner product (·,·) defined on it such that X is complete with respect to the metric induced by the inner product.

If H is a Hilbert space and S is a closed linear subspace of H, then the orthogonal complement S⊥ of S is the space S⊥ = {h ∈ H : (h, g) = 0 for all g ∈ S}.

Theorem 12.5.1 (Decomposition Theorem). Let S be a closed linear subspace of a Hilbert space H. Then for any element h ∈ H there exists a unique element g ∈ S satisfying

inf{‖h − s‖_H : s ∈ S} = ‖h − g‖_H,

where ‖·‖_H denotes the norm induced by the inner product on H. Furthermore, if we define Π : H → S by Π(h) = g, then every element h ∈ H can be written uniquely as h = Π(h) + t, where t ∈ S⊥. The transformation Π is called the orthogonal projection of H onto S.

Let (X, F, µ) be a measure space. For p ≥ 1 the space L^p(X, F, µ) is a Banach space under the L^p-norm. The space L²(X, F, µ) equipped with the inner product (f, g) = ∫_X f ḡ dµ, where ḡ is the complex conjugate of g, is a Hilbert space.

Let V be a normed vector space over C (or R). The (topological) dual space V* is the space of all continuous linear functionals φ : V → C (or R). The Riesz Representation Theorem establishes a relation between a Hilbert space and its dual space.

Theorem 12.5.2 (Riesz Representation Theorem for Hilbert Spaces). Let H be a Hilbert space with inner product (·,·) and let φ ∈ H*. Then there exists an element h ∈ H such that for any g ∈ H, φ(g) = (h, g) and ‖h‖_H = ‖φ‖_{H*}.

Let L : H → G be a linear operator between two Hilbert spaces H and G and use ‖·‖_H and ‖·‖_G to denote the norms on H and G, respectively. Then L is called bounded if there is an M ≥ 0 such that ‖Lh‖_G ≤ M‖h‖_H for all h ∈ H. The operator norm is defined by

‖L‖ = inf{ M ≥ 0 : ‖Lh‖_G / ‖h‖_H ≤ M for all h ∈ H with ‖h‖_H ≠ 0 }.

In case G = H, the operator L is called positive if (Lh, h) ≥ 0 for every h ∈ H. It is an isometry if it preserves distances, i.e., ‖Lh‖_G = ‖h‖_H for all h ∈ H.


If L : H → H is a bounded linear operator from a Hilbert space H to itself, then the adjoint of L is the bounded linear operator L∗ : H → H defined by (Lh, g) = (h, L∗ g) for all h, g ∈ H. Existence and uniqueness of the adjoint follow from Theorem 12.5.2.
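In finite dimensions both the Decomposition Theorem and the adjoint are easy to make concrete. The toy sketch below (our own example) projects a vector of R³ onto a line and checks h = Π(h) + t with t ∈ S⊥, then verifies (Lh, g) = (h, L*g) for a 2×2 matrix, whose adjoint with respect to the Euclidean inner product is the transpose:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# orthogonal projection onto the line S = span{u} in R^3
u = (1.0, 2.0, 2.0)
h = (3.0, 0.0, 1.0)
c = dot(h, u) / dot(u, u)
proj = tuple(c * a for a in u)              # Pi(h), an element of S
t = tuple(a - b for a, b in zip(h, proj))   # h = Pi(h) + t
assert abs(dot(t, u)) < 1e-12               # t is orthogonal to S

# the adjoint of a real matrix operator is its transpose: (Lh, g) = (h, L*g)
L = ((1.0, 2.0), (3.0, 4.0))

def apply(M, v):
    return tuple(dot(row, v) for row in M)

def transpose(M):
    return tuple(zip(*M))

g, w = (5.0, -1.0), (2.0, 7.0)
assert abs(dot(apply(L, w), g) - dot(w, apply(transpose(L), g))) < 1e-12
```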

12.6 BOREL MEASURES ON COMPACT METRIC SPACES

If (X, d) is a compact metric space, then we can say a bit more about the collection of Borel probability measures on X. Let M(X) be the collection of all Borel probability measures on X. This set is non-empty if X ≠ ∅, since for each x ∈ X the Dirac measure δ_x concentrated at x, which is given by

δ_x(A) = 1 if x ∈ A,   and   δ_x(A) = 0 if x ∉ A,

is a Borel probability measure on X. The space M(X) is also convex, i.e., pµ + (1−p)ν ∈ M(X) whenever µ, ν ∈ M(X) and 0 ≤ p ≤ 1.

We denote by C(X) the Banach space of all complex valued continuous functions on X under the supremum norm ‖f‖_∞, see (12.1). Since X is a compact space and f is continuous, we in fact have ‖f‖_∞ = sup_{x∈X} |f(x)|. This implies that a sequence (fₙ) in C(X) converges to f in the supremum norm if and only if (fₙ) converges uniformly to f. Furthermore, C(X) is a separable space, i.e., it contains a countable dense subset.

Theorem 12.6.1. Every member of M(X) is regular, i.e., if µ ∈ M(X), then for all A ∈ B and every ε > 0 there exist an open set O_ε and a closed set C_ε such that C_ε ⊆ A ⊆ O_ε and µ(O_ε \ C_ε) < ε.

Idea of proof. Call a set B ∈ B with the above property a regular set. Let R be the collection of all regular sets B ∈ B. Show that R is a σ-algebra containing all the closed sets.

Corollary 12.6.1. For any A ∈ B and any µ ∈ M(X),

µ(A) = sup{µ(C) : C ⊆ A, C closed} = inf{µ(O) : A ⊆ O, O open}.

Theorem 12.6.2 below shows that a member of M(X) is determined by how it integrates continuous functions.


Theorem 12.6.2. Let µ, ν ∈ M(X). If for all f ∈ C(X)

∫_X f dµ = ∫_X f dν,

then µ = ν.

Proof. Let µ, ν ∈ M(X) be such that ∫_X f dµ = ∫_X f dν for all f ∈ C(X). We want to show that µ(A) = ν(A) for all A ∈ B. By Corollary 12.6.1 it is enough to show that µ(C) = ν(C) for all closed subsets C of X. Let ε > 0. By regularity of the measure ν there exists an open set O_ε such that C ⊆ O_ε and ν(O_ε \ C) < ε. Define f ∈ C(X) by

f(x) = d(x, X \ O_ε) / (d(x, X \ O_ε) + d(x, C)).

Notice that 1_C ≤ f ≤ 1_{O_ε}, thus

µ(C) ≤ ∫_X f dµ = ∫_X f dν ≤ ν(O_ε) ≤ ν(C) + ε.

Using a similar argument, one can show that ν(C) ≤ µ(C) + ε. Therefore µ(C) = ν(C) for all closed sets, and hence for all Borel sets.

This allows us to define a metric structure on M(X) as follows. A sequence (µₙ) in M(X) is said to converge weakly to µ ∈ M(X) if and only if

lim_{n→∞} ∫_X f dµₙ = ∫_X f dµ

for all f ∈ C(X). We will show that under this notion of convergence the space M(X) is compact, but first we need a second version of the Riesz Representation Theorem.

Theorem 12.6.3 (Riesz Representation Theorem for Compact Metric Spaces). Let X be a compact metric space and J : C(X) → C a continuous linear map such that J is a positive operator and J(1) = 1. Then there exists a µ ∈ M(X) such that J(f) = ∫_X f dµ.

Theorem 12.6.4. The space M(X) is compact.

Idea of proof. Let (νₙ) be a sequence in M(X), and choose a countable dense subset (gₙ) of C(X). The sequence (∫_X g₁ dνₙ) is a bounded sequence of complex numbers, hence one can find a subsequence (ν_{1,n}) of (νₙ) such that the sequence (∫_X g₁ dν_{1,n}) is convergent. Now the sequence (∫_X g₂ dν_{1,n}) is bounded, and hence one can find a subsequence (ν_{2,n}) of (ν_{1,n}) such that the sequence (∫_X g₂ dν_{2,n}) is convergent. Notice that (∫_X g₁ dν_{2,n}) is also convergent. We continue in this manner to get for each i a subsequence (ν_{i,n}) of (νₙ) such that for all j ≤ i, (ν_{i,n}) is a subsequence of (ν_{j,n}) and (∫_X g_j dν_{i,n}) converges. Consider the diagonal sequence (ν_{n,n}); then (∫_X g_j dν_{n,n}) converges for all j. Since (gₙ) is dense in C(X), (∫_X f dν_{n,n}) converges for all f ∈ C(X). Now define L : C(X) → C by L(f) = lim_{n→∞} ∫_X f dν_{n,n}. Then L is linear, continuous (|L(f)| ≤ sup_{x∈X} |f(x)| = ‖f‖_∞), positive and L(1) = 1. Thus, by Theorem 12.6.3, there exists a measure ν ∈ M(X) such that L(f) = lim_{n→∞} ∫_X f dν_{n,n} = ∫_X f dν. Therefore lim_{n→∞} ν_{n,n} = ν weakly, and M(X) is compact.

The following result on equivalent definitions for weak convergence of measures is part of a longer list of equivalences sometimes known as the Portmanteau Theorem.

Theorem 12.6.5. Let (X, d) be a metric space with Borel σ-algebra B. Let (µₙ) ⊆ M(X) and µ ∈ M(X). Then (µₙ) converges weakly to µ if and only if lim_{n→∞} µₙ(A) = µ(A) for all sets A ∈ B with µ(∂A) = 0.

The next theorem is an infinite dimensional version of the statement that each element of a compact convex subset of a finite dimensional vector space can be written as a finite convex combination of the extreme points.

Theorem 12.6.6 (Choquet). Let Y be a metrizable compact convex subset of a locally convex space V and let y ∈ Y. Let E ⊆ Y denote the collection of extreme points of Y. Then there is a probability measure ν on Y with ν(Y \ E) = 0 and such that for each continuous linear functional φ on V one has φ(y) = ∫_Y φ dν.
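Weak convergence is easy to observe for the empirical measures µₙ = (1/n) Σ_{k<n} δ_{k/n}, which converge weakly to Lebesgue measure on [0,1] (a standard example, not taken from the text): integrals of continuous test functions converge, and in line with Theorem 12.6.5 so does µₙ(A) for A = [0, 1/2], whose boundary is a Lebesgue null set:

```python
import math

def empirical_integral(f, n):
    """Integrate f against mu_n = (1/n) * sum of Dirac masses at k/n, k = 0..n-1."""
    return sum(f(k / n) for k in range(n)) / n

n = 1000
# integrals of continuous test functions approach the Lebesgue integrals
assert abs(empirical_integral(lambda x: x * x, n) - 1 / 3) < 1e-3
assert abs(empirical_integral(lambda x: math.cos(x), n) - math.sin(1)) < 1e-3
# Portmanteau: mu_n(A) -> lambda(A) for A = [0, 1/2], since mu(boundary of A) = 0
assert abs(empirical_integral(lambda x: 1.0 if x <= 0.5 else 0.0, n) - 0.5) < 2e-3
```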

12.7 FUNCTIONS OF BOUNDED VARIATION

As in Chapter 6 we define the total variation as follows. Let a, b ∈ R with a < b and fix the measure space ([a, b], B, λ), where B is the Lebesgue σ-algebra on the interval [a, b] and λ is the one-dimensional Lebesgue measure. The total variation of a function g : [a, b] → R is defined by

Var_{[a,b]}(g) = sup Σ_{i=1}^n |g(x_i) − g(x_{i−1})|,

where the supremum is taken over all finite partitions a = x₀ < x₁ < ⋯ < xₙ = b of [a, b].
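Since the sum over a partition only grows under refinement, evaluating it on a fine uniform grid already gives a good lower bound for the total variation. A small sketch (assuming g is smooth enough that the grid resolves its intervals of monotonicity):

```python
import math

def variation_on_grid(g, a, b, n):
    """Lower bound for Var_[a,b](g) from the uniform partition with n subintervals."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(abs(g(xs[i + 1]) - g(xs[i])) for i in range(n))

# for monotone g the sum telescopes to |g(b) - g(a)| on any partition
assert abs(variation_on_grid(lambda x: x * x, 0, 1, 1000) - 1.0) < 1e-9

# sin(2*pi*x) on [0,1] goes up 1, down 2, and up 1 again: total variation 4
assert abs(variation_on_grid(lambda x: math.sin(2 * math.pi * x), 0, 1, 10000) - 4.0) < 1e-3
```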