Hyperbolic Dynamics, Fluctuations and Large Deviations 1470411121, 978-1-4704-1112-1, 9781470422660, 1470422662

This volume contains the proceedings of the semester long special program on Hyperbolic Dynamics, Large Deviations and F


English Pages 339 [354] Year 2015

Table of contents :
Content: Introduction / Dmitry Dolgopyat, Yakov Pesin, Mark Pollicott and Luchezar Stoyanov --
The almost Borel structure of diffeomorphisms with some hyperbolicity / Jérôme Buzzi --
Lectures on large deviations in probability and dynamical systems / Yuri Kifer --
Thermodynamic formalism for countable Markov shifts / Omri M. Sarig --
Limit theorems for horocycle flows / Giovanni Forni --
Limit theorems in dynamical systems using the spectral method / Sébastien Gouëzel --
Kinetic limits of dynamical systems / Jens Marklof --
Limit theorems for toral translations / Dmitry Dolgopyat and Bassam Fayad --
Spectral gap properties and limit theorems for some random walks and dynamical systems / Yves Guivarc'h --
The martingale approach after Varadhan and Dolgopyat / Jacopo De Simoi and Carlangelo Liverani --

Volume 89

Hyperbolic Dynamics, Fluctuations and Large Deviations
Special Semester Hyperbolic Dynamics, Large Deviations and Fluctuations, January – June 2013, Centre Interfacultaire Bernoulli, École Polytechnique Fédérale de Lausanne

D. Dolgopyat Y. Pesin M. Pollicott L. Stoyanov Editors

2010 Mathematics Subject Classification. Primary 37D35, 37A60, 60F10.

Library of Congress Cataloging-in-Publication Data
Hyperbolic dynamics, fluctuations, and large deviations / D. Dolgopyat, Y. Pesin, M. Pollicott, L. Stoyanov, editors.
pages cm. – (Proceedings of symposia in pure mathematics ; volume 89)
“Special Semester on Hyperbolic Dynamics, Large Deviations and Fluctuations, January-June, 2013, Centre Interfacultaire Bernoulli, École Polytechnique Fédérale de Lausanne, Switzerland.”
Includes bibliographical references and index.
ISBN 978-1-4704-1112-1 (alk. paper)
1. Differentiable dynamical systems. 2. Statistical physics. 3. Large deviations. 4. Thermodynamics–Mathematics. I. Dolgopyat, Dmitry, 1972– editor. II. Pesin, Ya. B., editor. III. Pollicott, Mark, editor. IV. Stoyanov, Luchezar N., 1954– editor.
QA614.8.H97 2015
515.392–dc23

2014043353

DOI: http://dx.doi.org/10.1090/pspum/089

Copying and reprinting. Individual readers of this publication, and nonprofit libraries acting for them, are permitted to make fair use of the material, such as to copy select pages for use in teaching or research. Permission is granted to quote brief passages from this publication in reviews, provided the customary acknowledgment of the source is given. Republication, systematic copying, or multiple reproduction of any material in this publication is permitted only under license from the American Mathematical Society. Permissions to reuse portions of AMS publication content are handled by Copyright Clearance Center’s RightsLink service. For more information, please visit: http://www.ams.org/rightslink. Send requests for translation rights and licensed reprints to [email protected]. Excluded from these provisions is material for which the author holds copyright. In such cases, requests for permission to reuse or reprint material should be addressed directly to the author(s). Copyright ownership is indicated on the copyright page, or on the lower right-hand corner of the first page of each article within proceedings volumes.

© 2015 by the American Mathematical Society. All rights reserved. The American Mathematical Society retains all rights except those granted to the United States Government. Printed in the United States of America.

The paper used in this book is acid-free and falls within the guidelines established to ensure permanence and durability. Visit the AMS home page at http://www.ams.org/

Contents

Preface

Introduction
Dmitry Dolgopyat, Yakov Pesin, Mark Pollicott, and Luchezar Stoyanov

Large Deviations and Thermodynamical Formalism

The almost Borel structure of diffeomorphisms with some hyperbolicity
Jérôme Buzzi

Lectures on large deviations in probability and dynamical systems
Yuri Kifer

Thermodynamic formalism for countable Markov shifts
Omri M. Sarig

Limit Theorems for Dynamical Systems

Limit theorems for horocycle flows
Giovanni Forni

Limit theorems in dynamical systems using the spectral method
Sébastien Gouëzel

Kinetic limits of dynamical systems
Jens Marklof

Additional Topics

Limit theorems for toral translations
Dmitry Dolgopyat and Bassam Fayad

Spectral gap properties and limit theorems for some random walks and dynamical systems
Yves Guivarc’h

The martingale approach after Varadhan and Dolgopyat
Jacopo De Simoi and Carlangelo Liverani

Preface

The articles presented in this volume are based on lectures and talks given during the semester-long (January–June, 2013) Special Program on Hyperbolic Dynamics, Large Deviations and Fluctuations, which took place at the Centre Interfacultaire Bernoulli, École Polytechnique Fédérale de Lausanne, Switzerland. The broad theme of the Program was the long-term and statistical behavior of dynamical systems. During the last half century, the statistical properties of dynamical systems of many different types have been the focus of much research interest. Moreover, the results of this study have had a profound effect on many areas across mathematics (including geometry, analysis, and number theory), physics (including both equilibrium and non-equilibrium statistical physics), engineering and biology.

The program concentrated on four main research topics.

• large deviations and fluctuations for systems with some hyperbolicity
• limit theorems in hyperbolic dynamics
• fluctuations for systems with zero entropy
• statistical properties of coupled hyperbolic systems

This volume contains all six lectures of the short courses given during the two schools and expanded versions of a number of selected talks given at the two workshops held during the Program. The material presented is primarily directed at researchers and graduate students in the very broad area of dynamical systems and ergodic theory, but will also be of interest to researchers in related areas such as statistical physics, spectral theory and some aspects of number theory and geometry. It may also be profitably read by people who have a keen interest in the applications of statistical methods to other areas of scientific research, and have some background in the physical and mathematical sciences.

The main activities during the Program concentrated on two general topics:

A. Large deviations and thermodynamical formalism. Three Mini-courses related to this topic were delivered during the period 11 - 15 March 2013:
(i) Y. Kifer (Hebrew University of Jerusalem), Large deviations in probability and dynamical systems;
(ii) J. Buzzi (Université Paris Sud), Entropy and measurable classifications of dynamical systems;
(iii) O. Sarig (Weizmann Institute), Thermodynamic formalism for countable Markov shifts.
The Mini-courses were followed by a Workshop held during the period 18 - 22 March 2013, organised by Jérôme Buzzi (Université Paris Sud), Stefano Luzzatto (International Centre for Theoretical Physics), Yakov Pesin (Pennsylvania State University) and Omri Sarig (The Weizmann Institute of Science).

B. Limit theorems for dynamical systems. The following Mini-courses related to this topic were delivered during the period 27 - 31 May 2013:
(i) J. Marklof (University of Bristol), Kinetic limits of dynamical systems;
(ii) G. Forni (University of Maryland), Limit theorems for classical horocycle flows;
(iii) S. Gouëzel (Université de Rennes), Limit theorems in dynamical systems using the spectral method.
These Mini-courses were followed by a Workshop held during the period 3 - 7 June 2013, organised by N. Chernov (University of Alabama at Birmingham), D. Kleinbock (Brandeis University), M. Pollicott (University of Warwick) and R. Sharp (University of Warwick).

Another important part of the Program was the series of Bernoulli Lectures delivered by prominent mathematicians and theoretical physicists. During the Program there were three such lectures:
• Yakov Pesin (Pennsylvania State University), Appearance and genericity of chaos (7 March 2013)
• Giovanni Gallavotti (Rutgers University), Hyperbolic systems and fluctuation theorems (16 May 2013)
• David Ruelle (IHES), Nonequilibrium statistical mechanics: From heat transport to hydrodynamic turbulence (30 May 2013).

The Program brought together mathematicians and scientists with a wide variety of different backgrounds and interests, and at different stages of their careers. The schools and workshops proved to be both enlightening and stimulating and we hope that the lecture notes in this volume will be equally beneficial to anyone who wants to understand more about recent developments in aspects of the statistical properties of dynamical systems.

Dmitry Dolgopyat
Yakov Pesin
Mark Pollicott
Luchezar Stoyanov

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01503

Introduction

During the last 50 years statistical properties of dynamical systems of various types and in many different situations have been a subject of extensive study in statistical mechanics and thermodynamics, ergodic and probability theories, and some areas of mathematical physics. The results of this study have had a profound effect on many areas in mathematics, physics, engineering and biology.

The modern systematic study of statistical properties for hyperbolic dynamical systems really began in the early 1960’s with the fundamental work of Sinai and Ruelle, and later of Bowen, who also laid the foundations of what is now called thermodynamic formalism. Though the methods they developed were of a rather general nature, their applications were restricted mostly to uniformly hyperbolic systems due to the limitations of the techniques available at the time. The class of systems which could be analyzed was significantly broadened in the 1970’s with the work of Pesin, who laid the foundations for the ergodic theory of non-uniformly hyperbolic systems. This was followed by extensive activity in studying and further developing what is now known as “Pesin’s theory”, and it greatly facilitated the study of uniformly hyperbolic systems as well. At present it is widely accepted that uniformly hyperbolic systems are, in general, well understood, although some important problems still remain unsolved. The main activities are in studying non-uniformly hyperbolic systems, hyperbolic systems with singularities, and more generally, systems which possess some kind of hyperbolicity but do not belong to the previously mentioned categories.

One of the important ingredients in the work of Sinai, Ruelle and Bowen was symbolic dynamics, which allowed the use of methods from statistical mechanics to study finite dimensional dynamical systems. The symbolic dynamics for uniformly hyperbolic dynamical systems was developed in the works of Adler and Weiss, Sinai, and Bowen. It allowed the proof of a large number of limit theorems for uniformly hyperbolic dynamical systems as described, for example, in the book of Parry and Pollicott, Zeta functions and the periodic orbit structure of hyperbolic dynamics. More recently there has been significant progress in the development of symbolic dynamics for non-uniformly hyperbolic systems. In particular, the method of Young towers allowed the establishment of limit theorems for many important examples of non-uniformly hyperbolic systems, including Lorenz and Hénon systems. In 2013 Sarig showed that low dimensional systems with positive entropy can be described by topological Markov chains with countable alphabet.

From a longer historical perspective, it is now three centuries since Jakob Bernoulli’s law of large numbers (for independent identically distributed random variables) appeared in his major work Ars Conjectandi, published in 1713, eight years after his death. A commemorative postage stamp was issued in 1994, bearing both his image and the formula for the law (Figure 1). While much of the historical motivation for the original definition of ergodicity came from the pioneering work of Boltzmann in statistical mechanics in the 19th Century and from his formulation of the “Ergodic Hypothesis”, the law of large numbers can be viewed as a precursor to Birkhoff’s Ergodic Theorem for ergodic transformations, from 1931, which is itself a cornerstone for the understanding of statistical properties. It has subsequently been an important problem to establish ergodicity of particular classes of transformations, and to establish more precise and refined extensions of the ergodic theorem including, for example, Central Limit Theorems and Large Deviation Theorems.

Figure 1. The 1994 commemorative stamp issued by the Swiss Post Office.

Generally speaking, the Central Limit Theorem measures the distributions of discrepancies of ergodic averages on a smaller scale. In the late 1950’s Sinai noticed that the behavior of geodesic flows on manifolds of constant negative curvature resembles that of stationary random processes and proved the Central Limit Theorem for them. In the 1970’s and early 1980’s the Central Limit Theorem and related statistical properties were established for quite general uniformly hyperbolic systems (by Ratner, Denker-Philipp and others). In the meantime, in the early 1980’s, Bunimovich and Sinai constructed (countable) Markov partitions for two-dimensional dispersing billiards and derived the Central Limit Theorem for such systems. In the last 20 years various results concerning limit theorems and related statistical properties have been established for some important partially hyperbolic and some non-uniformly hyperbolic systems, including systems with singularities.
Important contributions in this area were made, among others, by Aaronson, Bálint, Chernov, Denker, Dolgopyat, Gouëzel, Holland, Liverani, Melbourne, Nicol, Sarig, Szász, Varjú, and Young.

On the other hand, one could broadly say that the theory of Large Deviations studies the rate of decay of the probabilities of “rare events”. It also has natural relationships with some areas in statistical physics, e.g. thermodynamics and hydrodynamics, and in fact, some of the developments in this theory have been motivated by problems in statistical physics, particularly related to fluctuation theorems. Conversely, the mathematical study of large deviations has made a significant contribution to the understanding of certain phenomena in statistical physics. The first large deviations results of Cramér (1938) and Chernoff (1952) on random variables with finite moment generating functions were followed by a series of significant applications and generalizations in statistics and probability theory, and in the late 1970’s and early 1980’s Donsker and Varadhan extended these results to Markov processes. In hyperbolic dynamics general large deviation principles were obtained by Kifer and Young in the early 1990s. Since then there has been a significant development in the study of large deviation principles. For uniformly hyperbolic (discrete and continuous) dynamical systems this study (at least in its standard setting) was by and large completed in the 1990s. The last 10 years have seen significant activity in obtaining large deviations results for non-uniformly hyperbolic or partially hyperbolic systems, including some systems with singularities. For some of these results the Young tower technique has been used, and since then there have been some substantial new and promising developments in this direction. Following works of Gurevich and Gordin, there has been significant activity in developing the foundations of thermodynamics and large deviations theory for countable Markov shifts and for non-uniformly hyperbolic systems (Aaronson-Denker, Mauldin-Urbański, Pollicott-Sharp, Sarig, Yuri). The Gallavotti-Cohen fluctuation theorem (1995) has been the subject of similar developments. Furthermore, large deviation and fluctuation results have been established for flows as well.

The study of statistical properties of non-uniformly hyperbolic systems is now a well established area of research. By comparison, much less is known in the case of zero entropy systems. Recently significant progress in the analysis of several important examples has been achieved in the works of Bufetov, Flaminio, Forni and Marklof.
In these examples the progress was made by identifying an appropriate renormalization transformation which turned out to be hyperbolic. For example, for Interval Exchange Transformations and area preserving flows on surfaces the renormalization is effected by the Teichmüller flow. The study of the Teichmüller flow has been significantly advanced recently. This includes simplicity of the Lyapunov spectrum (the Kontsevich-Zorich conjecture) proven in the works of Forni (2002) and Avila-Viana (2007), exponential decay of correlations proven by Avila-Gouëzel-Yoccoz (2006), and classification of $SL_2(\mathbb{R})$-invariant measures achieved by Eskin-Mirzakhani (2013). These results have significant implications for the study of flows on surfaces.

The main activities during the Program concentrated on two different aspects of its theme: Large deviations and thermodynamical formalism and Limit theorems for dynamical systems. We now briefly describe how these topics are presented in the lectures appearing in this volume.

0.1. Large deviations and thermodynamical formalism. The lectures of Kifer addressed the general theory of large deviations theorems which have direct applications both for Markov processes and (uniformly) hyperbolic dynamical systems. They started with an explanation of relationships between large deviations, thermodynamic formalism and fractal dimensions and then proceeded to describe large deviations in the conventional framework of averaging along orbits. Finally, more recent results concerning large deviations for some non-uniformly hyperbolic dynamical systems and large deviations for nonconventional averages related to ergodic theorems through subsequences were discussed.

The lectures of Sarig dealt with the underlying tools of symbolic dynamics, thermodynamic formalism and transfer operators which underpin one of the classical approaches to uniformly hyperbolic dynamical systems. They also made connections with theoretical physics through phase transitions. Moreover, Sarig cast his lectures in the more general setting of countable state systems, which play an important role in extending these results to non-uniformly hyperbolic systems. These lectures focused on the construction of Gibbs measures and their underlying dynamical properties. The theory described here plays a key role in the recent work of Sarig on equilibrium states for surface diffeomorphisms.

These lectures dovetailed nicely with the lectures of Buzzi, which concentrated on the Ornstein theory of Bernoulli processes to show that many probabilistic dynamical systems are classified by their entropy and period up to measure-preserving conjugacy. There was also a detailed explanation of how this extends to measurable conjugacies, preserving all or “most” invariant measures, in smooth and symbolic settings, and the role played by measures of maximal entropy.

0.2. Limit Theorems for dynamical systems. The lectures of Forni and Marklof dealt with the dynamical behavior of specific classes of distinctly different dynamical systems. While Forni considered the case of horocycle flows and the associated invariant distributions, Marklof explained the limiting behavior of certain types of dispersive billiards. Since the pioneering work of Maxwell and Boltzmann in the 1860s and 1870s, a major challenge in mathematical physics has been the derivation of macroscopic evolution equations from the fundamental microscopic laws of classical or quantum mechanics.
Macroscopic transport equations lie at the heart of many important physical theories, including fluid dynamics, condensed matter theory and nuclear physics. The rigorous derivation of macroscopic transport equations is thus not only a conceptual exercise that establishes their consistency with the fundamental laws of physics: the possibility of finding deviations and corrections to classical evolution equations makes this subject both intellectually exciting and relevant in practical applications. The lectures of Marklof dealt with billiards models, focusing on his recent work with Strömbergsson.

The study of statistical properties of non-uniformly hyperbolic systems is now a well established area of research. By comparison, much less is known in the case of zero entropy systems. Recently, significant progress in the analysis of several important examples has been achieved in the works of Bufetov, Flaminio, Forni and Marklof. In these examples the progress was made by identifying an appropriate renormalization transformation which turned out to be hyperbolic. In his lectures, Forni described recent joint results with Flaminio and Bufetov on the deviation of ergodic averages and limit distributions of ergodic integrals of smooth functions for horocycle flows on the unit tangent bundle of compact surfaces of constant negative curvature. The classical horocycle flow is a parabolic (zero entropy) renormalizable dynamical system; it is uniquely ergodic and mixing of all orders, with nearly but not quite integrable decay of correlations. It is the simplest parabolic renormalizable system for which the study of deviation of ergodic averages and limit distributions can be carried out in some detail by tools of harmonic analysis, namely those of the theory of unitary representations of the group $PSL(2, \mathbb{R})$. The results should be considered as a model for generalizations to other systems of the same kind such as interval exchange transformations and flows on surfaces (carried out by Bufetov), 2-step nilflows or horospherical foliations of geodesic flows on manifolds of negative curvature.

In his lectures Gouëzel described the spectral approach (using transfer operators) to the Central Limit Theorem and stronger invariance principles, first for hyperbolic and then intermittent systems. The clarity of these lectures was enhanced by concentrating on maps of the interval. While martingale arguments are often convenient to prove limit theorems in dynamical systems, some classes of problems and some classes of systems are not amenable to such arguments. Gouëzel described the so-called Nagaev-Guivarc’h spectral method, which is often fruitful in such situations. Starting from the simplest example (the Central Limit Theorem for interval maps), more recent applications are also described, including the convergence towards stable distributions and the almost sure invariance principle.

0.3. Additional material. The article of Dolgopyat and Fayad extends Dolgopyat’s workshop lecture. It gives a wide ranging survey of limit theorems in the context of toral translations. Despite the simplicity of translations many natural questions remain open. As is the case with other zero entropy systems, available results rely heavily on renormalization, which in the present case is effected by diagonal flows on the space of lattices.

Guivarc’h’s article is an expansion of his lecture on the spectral gap properties for products of random matrices and limit theorems for some affine random walks, which complemented nicely the lectures of Gouëzel.
Given a finitely supported probability measure $p$ on the affine $d$-dimensional Euclidean space $V$, let $q$ denote the projection of $p$ on $GL(V)$ and let $G$ denote the subgroup generated by the support of $q$. Using spectral gap properties for Fourier-type operators associated with $q$ and $p$, he obtained limit theorems for the $p$-random walk on $V$ in two different situations. In the first case, $G$ is dense in the orthogonal group $O(V)$ of $V$ and convolution by $q$ has a spectral gap in $L^2(O(V))$. This allows Guivarc’h to show a local limit theorem in $V$. In the second case $G \subset GL(V)$ is “large” (in a suitable sense), the dominant $q$-Lyapunov exponent is simple, and there exists on $V$ a unique $p$-stationary measure, which has unbounded support. Using spectral gap properties for twisted convolutions by $q$ on projective spaces he shows that it is “homogeneous at infinity”, and he obtains a Fréchet type limit law for the extremes of the $p$-random walk on $V$. Important ingredients in his analysis are local limit theorems and precise large deviation estimates for the random walks on $GL(V)$.

The contribution of De Simoi and Liverani is an amplification of Liverani’s lecture on Fast-slow hyperbolic systems and averaging. The authors introduce a class of fast-slow hyperbolic systems and discuss the martingale method of Dolgopyat to obtain averaging and fluctuations around the average. The presentation is especially pedagogical and explained with the simplest possible examples.
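The large deviation results of Cramér and Chernoff recalled earlier in this Introduction can be made concrete in the simplest case of fair coin tosses. The following is a minimal sketch, not taken from the volume: it assumes $S_n$ is the number of heads in $n$ fair tosses and compares the exact decay rate $-\frac{1}{n}\log P(S_n \ge nx)$ with the classical rate function $I(x) = x\log(2x) + (1-x)\log(2(1-x))$. The function names `rate_function` and `empirical_rate` are illustrative only.

```python
import math


def rate_function(x: float) -> float:
    """Cramér rate function I(x) for the mean of fair coin tosses, 0 < x < 1."""
    return x * math.log(2 * x) + (1 - x) * math.log(2 * (1 - x))


def empirical_rate(n: int, x: float) -> float:
    """Return -(1/n) * log P(S_n >= n*x) for S_n ~ Binomial(n, 1/2).

    The tail is counted exactly with integers, so no floating-point
    underflow occurs even when the probability is astronomically small.
    """
    k_min = math.ceil(n * x)
    tail = sum(math.comb(n, k) for k in range(k_min, n + 1))
    log_prob = math.log(tail) - n * math.log(2)
    return -log_prob / n


if __name__ == "__main__":
    x = 0.6
    print("I(x) =", rate_function(x))
    for n in (100, 500, 2000):
        print(n, empirical_rate(n, x))
```

By the Chernoff bound the empirical rate always lies above $I(x)$, and it approaches $I(x)$ as $n$ grows; the gap is of order $(\log n)/n$, which is why convergence looks slow in the printed values.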

Large Deviations and Thermodynamical Formalism

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01491

The almost Borel structure of diffeomorphisms with some hyperbolicity

Jérôme Buzzi

Abstract. These lectures focus on a recent result of Mike Hochman: an arbitrary standard Borel system can be embedded into a mixing Markov shift with equal entropy, respecting all invariant probability measures, with two exceptions: those carried by periodic orbits and those with maximal entropy. We discuss the corresponding notions of almost Borel embedding, isomorphism, and universality. The main part of this paper is devoted to a self-contained and detailed proof of Hochman’s theorem. We then explain how Katok’s horseshoe theorem can be used to analyze diffeomorphisms with “enough” measures that are hyperbolic in the sense of Pesin theory, in both mixing and non-mixing situations. In the latter setting, new invariants generalizing the measures maximizing the entropy emerge.

Contents
1. Introduction
2. Definitions and background
3. Almost Borel Embedding and Isomorphism
4. Borel version of Krieger’s Embedding Theorem
5. Hochman’s Embedding Theorem
6. Katok’s theorem with periods
7. Diffeomorphisms beyond the mixing case
References

2010 Mathematics Subject Classification. Primary 37A35; Secondary 37D25, 37B10.
Key words and phrases. Dynamical systems, ergodic theory, entropy, Borel isomorphism, factors, universality, Markov shifts, diffeomorphisms, Pesin theory, hyperbolicity.
The author gratefully acknowledges the support of the semester Hyperbolicity, large deviations and fluctuations organized at the Centre Interfacultaire Bernoulli at École polytechnique fédérale de Lausanne where a version of these lectures was delivered.

1. Introduction

Ornstein’s classical theory [30] gave powerful criteria to show that many naturally occurring measure-preserving transformations are isomorphic to Bernoulli schemes and are completely classified by their entropy. Since then, similar classifications by entropy have been obtained in more rigid categories. One of the first such results is the classification by Adler and Marcus [1] of mixing shifts of finite type up to almost topological conjugacy.

The goal of these lectures is to explain a recent, striking extension of this circle of ideas achieved by Mike Hochman [20]. We will give an essentially self-contained proof and some applications to smooth dynamics, mainly based on Katok’s theorem on the approximation of hyperbolic measures by horseshoes [23].

1.1. Hochman’s Theorem. Following Hochman [20], we consider:¹

Definition 1.1. A Borel system $(X, S)$ is a Borel automorphism $S$ of a standard Borel space $X$. We denote by $P_{\mathrm{erg}}(S)$ the set of its aperiodic, ergodic, invariant probability measures. A Borel subset $Y \subset X$ is almost all of $(X, S)$ (no reference measure specified or understood) if it has full measure with respect to all measures in $P_{\mathrm{erg}}(S)$.

¹ We refer to Sec. 2 for notations, definitions and background.

The subject of these lectures is the following theorem.

Theorem 1.2 (Hochman [20]). Let $(\Sigma, \sigma)$ be a mixing Markov shift with entropy $h(\Sigma)$. Then any Borel dynamical system $(X, S)$ such that:

$$(*) \qquad \forall \mu \in P_{\mathrm{erg}}(S) \quad h(S, \mu) < h(\Sigma)$$

has an almost Borel embedding in $\Sigma$, i.e., there is a map $\psi : X' \to \Sigma$ satisfying:
(1) $X'$ is Borel and almost all of $(X, S)$;
(2) $\psi : X' \to \Sigma$ is Borel and one-to-one;
(3) $\psi \circ S = \sigma \circ \psi$ over $X'$.
In other words, $\Sigma$ is almost Borel universal (see Sec. 3) in the class of Borel systems satisfying (*).

We discuss this striking result in Sec. 3. It should be compared to Krieger’s generator theorem [27, 28]:²

Theorem 1.3 (Krieger’s generator theorem). Let $(\Sigma, \sigma)$ be a mixing SFT with entropy $h_{\mathrm{top}}(\Sigma)$. Let $(S, \mu)$ be an ergodic system which is not reduced to a periodic orbit. If $h(S, \mu) < h_{\mathrm{top}}(\Sigma)$ then there is a measurable embedding of a full measure subset of $(S, \mu)$ into $\Sigma$.

² There are many versions of this classical theorem. Another one, perhaps closer to Hochman’s theorem, can be found in [14, chap. 31]: given a mixing SFT $\Sigma$, if $(S, \mu)$ is an aperiodic measure-preserving system and if for almost all ergodic components $\nu$ of $\mu$, $h(S, \nu) < h_{\mathrm{top}}(\Sigma)$, then there is a measure-preserving embedding of $S$ into $\Sigma$ whose image is topologically minimal and uniquely ergodic. It is also interesting to compare with [2] which considers homeomorphisms of manifolds, not Cantor spaces.

Hochman’s theorem realizes all the embeddings given by Krieger’s theorem in a Borel and coherent way (i.e., preserving the injectivity). However this gives a statement of a different kind than Krieger’s: it embeds (under an entropy condition) any Borel system into any mixing Markov shift which is also a Borel system. In Sec. 3, we will see how such a property (called strict universality, see Sec. 3.1) yields a classification up to mutual embedding. In fact Hochman obtains a classification up to isomorphism thanks to a Cantor-Bernstein principle (see Lemma 3.2). The isomorphism is in the following sense:

Definition 1.4. Two Borel systems $(X, S)$, $(Y, T)$ are almost Borel isomorphic if there exists a Borel isomorphism $\psi : X' \to Y'$ such that:
(1) $X'$ is Borel and almost all of $(X, S)$;
(2) $Y'$ is Borel and almost all of $(Y, T)$;
(3) $\psi \circ S = T \circ \psi$ over $X'$.

From these results, Hochman obtained a classification of mixing Markov shifts by their entropy and their possession (or not) of a measure maximizing the entropy (or m.m.e.). He then easily deduced that many systems are isomorphic to Markov shifts if they contain “enough” Markov shifts and “expected” m.m.e. In the rest of Sec. 3, we compare almost Borel isomorphism with related notions.

1.2. A proof. The main part of these lectures is devoted to a self-contained proof of Hochman’s result. We give all necessary definitions and background and assume only basic results like the Kuratowski theorem from descriptive set theory or the Shannon-McMillan-Breiman theorem from ergodic theory. We essentially follow Hochman’s ideas, with only minor technical simplifications or variations (e.g., we do not use Benjamin Weiss’s countable generator theorem or finitary coding, and we use a Borel construction of a Borel subset of given measure).

The first step of Hochman’s proof establishes a Borel version of Krieger’s Theorem 1.3 restricted to embedding into some special mixing SFTs together with a coding of the empirical measure by a binary sequence (in this paper $\mathbb{N} := \{1, 2, 3, \dots\}$).

Theorem 1.5 (See Theorem 4.2). For any Borel system $(X, S)$ and $\epsilon > 0$, there are a mixing SFT $(\Sigma, \sigma)$ with $h(X) < h_{\mathrm{top}}(\Sigma) < h(X) + \epsilon$ and Borel maps $\phi : X' \to \{0, 1\}^{\mathbb{N}}$ and $\psi : X' \to \Sigma$ with $X'$ almost all of $(X, S)$ such that: $\phi \circ S = \phi$, $\psi \circ S = \sigma \circ \psi$, and $(\phi \times \psi) : X' \to \{0, 1\}^{\mathbb{N}} \times \Sigma$ is injective.

The second step recodes $\phi \times \psi$ into an equivariant, injective Borel map $\Psi$ into some special mixing SFTs. This sequence $\Psi(x)$ is built by “splicing” into $\psi(x)$ an equivariant version of $\phi(x)$ (obtained by considering the times of visit to a well-chosen set).

Theorem 1.6 (See Theorem 5.2). For any Borel system $(X, S)$ and $\epsilon > 0$, there are a mixing SFT $\tilde{\Sigma}$ with $h(X) < h_{\mathrm{top}}(\tilde{\Sigma}) < h(X) + \epsilon$ and an almost Borel embedding $\Psi : (X, S) \to (\tilde{\Sigma}, \sigma)$.

To conclude, one embeds the image of $\Psi$ into the given mixing SFT (Lemma 5.6) using markers (Lemma 5.7) and then removes the auxiliary assumptions which simplified the previous steps: generalizing the target system $\Sigma$ from a special SFT to an arbitrary Markov shift and weakening the uniform entropy gap, i.e., the inequality $h(X) < h(\Sigma)$, to the entropy inequality (*).

1.3. Application to smooth dynamics with mixing. In the rest of these lectures, we consider $C^1$-diffeomorphisms $T$ of compact manifolds whose differential $T'(x)$ is a Hölder-continuous function of $x$ (or $C^{1+}$-diffeomorphisms for short).³ We use the classical approximation result of A. Katok [23] (see Thm. 6.3 and Sec. 2.6): any ergodic, invariant probability measure which is hyperbolic (i.e., without zero Lyapunov exponents) can be approximated by horseshoes. In Sec. 6, we recall Katok’s theorem and relate the period of the horseshoe with the periods of the measure, an additional piece of information which is necessary for our purposes.

³ This smoothness assumption is required by the proof of Katok’s theorem which relies on Pesin theory.


Perhaps unexpectedly, Hochman's theorem turns such approximations into isomorphisms. In fact, Katok's theorem provides the embedded mixing SFTs needed to follow Hochman's approach. This shows that diffeomorphisms with "enough" hyperbolic measures are almost Borel isomorphic to Markov shifts up to measures of maximal entropy.

Our first results assume some mixing. Recall that a measure-preserving system (T, μ) is totally ergodic if all its iterates are ergodic and it is p-Bernoulli if it is isomorphic to the product of a Bernoulli system (see Sec. 2.5) and a cyclic permutation on p elements.

Theorem 1.7. Let T be a C^{1+}-diffeomorphism of a compact manifold M. Assume that:
(#) for any h < h_top(T), there is a totally ergodic, hyperbolic invariant measure with entropy > h.
Then,
(1) (M, T) is a disjoint union of Borel subsystems M1 ⊔ M2 such that T|M1 is almost Borel isomorphic to a mixing Markov shift and M2 carries exactly the measures of maximal entropy of T;
(2) (M, T) is almost Borel isomorphic to a Markov shift if and only if:
(§) T has at most countably many m.m.e.'s (i.e., ergodic measures maximizing the entropy) and each one is p-Bernoulli for some (possibly varying) p ≥ 1.

For surface diffeomorphisms, results of Sarig [36] and Berger [3] imply the condition (§) in point (2) of the above theorem and we obtain:

Corollary 1.8. Let T be a C^{1+}-diffeomorphism of a compact surface with positive entropy and a totally ergodic m.m.e. Then T is almost Borel isomorphic to a Markov shift. Moreover, such diffeomorphisms are classified up to almost Borel isomorphism by the following data: their topological entropy and the (possibly zero or infinite) number of their m.m.e.'s that are p-Bernoulli for each p ≥ 1.

Recall the celebrated Hénon family of diffeomorphisms H_{a,b} : R² → R² with H_{a,b}(x, y) = (1 − ax² + y, bx) for (a, b) ∈ R × R*. For 1 < a < 2 and |b| > 0 small enough, there is a relatively compact topological open disk U_{a,b} that carries all invariant measures and satisfies H_{a,b}(U_{a,b}) ⊂ U_{a,b}. Hence these maps are almost Borel isomorphic to C^{1+}-diffeomorphisms of a compact surface. P. Berger has shown:4

Theorem 1.9 (Berger [3]). There is a subset of the previous parameters with positive area such that for any such (a, b), h(H_{a,b}) > 0 and there exists a unique m.m.e. which, moreover, is mixing.

We obtain:

Corollary 1.10. There is a set of positive measure of parameters (a, b) ∈ R² so that each H_{a,b} is almost Borel isomorphic to a mixing Markov shift. More precisely, it is isomorphic to any positive recurrent mixing Markov shift with entropy h(H_{a,b}). In particular, these maps are classified up to almost Borel isomorphism by their entropy.

4 Berger's statements are more precise and more general. Our Corollary 1.10 actually holds for all his "strongly regular Hénon-like maps".

In [13], we considered diffeomorphisms of the type introduced by Bonatti and Viana [4]. As announced at the end of Sec. 1.3 of that paper:


Corollary 1.11. The robustly transitive, non-partially hyperbolic diffeomorphisms obtained in [13] by deformation of Anosov diffeomorphisms satisfy the following stability property. Any C^1-perturbation of such a diffeomorphism is almost Borel isomorphic to the initial Anosov diffeomorphism.

1.4. Application to smooth dynamics without mixing. Finally, in Section 7, we remove the assumption of mixing. We use some general tools developed in [5]. It turns out that one has to take into account entropy, not only globally, but "at given period". The set of periods of μ ∈ P_erg(T) is:

(1.1) per(T, μ) := {p ≥ 1 : ∃f ∈ L²(μ), f ◦ T = e^{2iπ/p} f and f ≠ 0}.
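As a sanity check of definition (1.1), here is a worked toy case. It is an illustration chosen for this note and not an example from the lectures: T is the cyclic permutation of q points and μ is its unique invariant probability measure (the uniform one). The symbols q, f_p and j are introduced only for this computation.

```latex
% Toy example (assumption for illustration): X = Z/qZ, T(j) = j + 1 mod q,
% mu = uniform probability measure on X, which is ergodic for T.
\begin{align*}
f_p(j) &:= e^{2i\pi j/p}, \qquad
f_p \circ T(j) = e^{2i\pi (j+1)/p} = e^{2i\pi/p}\, f_p(j), \\
f_p \text{ is well defined on } \mathbb{Z}/q\mathbb{Z}
  &\iff e^{2i\pi q/p} = 1 \iff p \mid q .
\end{align*}
% Conversely, if f \circ T = e^{2i\pi/p} f with f \neq 0, then
% f = f \circ T^{q} = e^{2i\pi q/p} f, which forces p \mid q. Hence:
\[
\operatorname{per}(T,\mu) = \{\, p \ge 1 : p \mid q \,\},
\qquad \text{with maximum period } q .
\]
```

In this toy case the set of periods is exactly the set of divisors of q, so the measure has a maximum period in the sense discussed next.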

Some measures have a maximum period p, i.e., p is a period and no larger integer is a period. See Sec. 6.1 for some examples and properties. We need to consider the following generalization of m.m.e.'s:

Definition 1.12. Let (M, T) be a Borel system. A measure μ ∈ P_erg(T) is entropy-period-maximal if for any measure ν ∈ P_erg(T) with set of periods per(T, ν) ⊂ per(T, μ), one has h(T, ν) ≤ h(T, μ).

Remark 1.13. It follows from Katok's theorem that, if T is a C^{1+}-diffeomorphism of a compact manifold, any entropy-period-maximal measure μ ∈ P_erg(T) has positive entropy unless it has a zero exponent.

We show:

Theorem 1.14. Let T : M → M be a C^{1+}-diffeomorphism of a compact manifold M. Assume that there is h0 < h_top(T) such that all ergodic measures with entropy > h0 are hyperbolic. Then T is the disjoint union of three Borel subsystems M0 ⊔ M1 ⊔ M2 such that:
(1) M0 carries only measures with entropy < h0;
(2) M1 is almost Borel isomorphic to a Markov shift;
(3) M2 carries only entropy-period-maximal measures of (M, T).

The entropy-hyperbolic condition studied in [12] defines non-empty open sets of C^∞-diffeomorphisms such that the non-hyperbolic measures have entropy ≤ h for some h < h_top(T). Hence an entropy-conjugacy in the sense of [8] (see Sec. 3.2) will be obtained if one can analyze the entropy-period-maximal measures in M2.

Remark 1.15. Sarig [36] has reduced surface diffeomorphisms to factors of Markov shifts which are continuous and "almost finite-to-one". Boyle and the author [5] have shown that these factors are of "Bowen type" and that this implies an almost Borel isomorphism to Markov shifts. Thus, C^{1+}-diffeomorphisms of compact surfaces are Borel isomorphic to Markov shifts up to sets negligible for all measures of nonzero entropy. Alternatively and more in line with these lectures, the Bowen property can be used to analyze the entropy-period-maximal measures [5, Prop. 6.3]. One can then apply the above theorem with h0 = 0 and conclude by an abstract result about Markov shifts [5, Cor. 1.8].

1.5. Some open problems. The application of Hochman's theorem to smooth ergodic theory strengthens in a spectacular way some previous results that dealt only with m.m.e.'s or large entropy measures. It is perhaps even more interesting that it points to new invariants, like the entropy-period-maximal measures. It also raises new questions in smooth dynamics. Let us list three of them.


Measures maximizing the entropy. The m.m.e.'s and entropy-period-maximal measures that appear in the above theorems cannot be analyzed by the techniques of this paper. As explained in the previous remark, these are analyzed in [5] in the case of surface diffeomorphisms using Sarig's symbolic dynamics [36] and a "Bowen property" of the corresponding factors. How far can this be generalized? To higher dimensions or partially hyperbolic systems?

Period and maximal entropy. F. Rodriguez-Hertz, M. Rodriguez-Hertz, Tahzibi and Ures [34] have studied the m.m.e.'s of a class of partially hyperbolic systems. In the generic case, these m.m.e.'s are hyperbolic and periodic-Bernoulli. However, their periods can be larger than 1. Is it possible that measures with smaller entropy have smaller period sets than any m.m.e.? Or does the classification of such partially hyperbolic systems up to almost Borel isomorphism reduce to that of their m.m.e.'s?

Abundance of hyperbolicity. We deduce universality from Katok's horseshoe theorem. Hence we need "enough" hyperbolic measures so that:5
(*) any ergodic measure which is not entropy-period-maximal is dominated (Def. 7.1) by some hyperbolic measure.
The usual tools to perturb Lyapunov exponents away from 0 consider a nice but fixed invariant measure (the volume). Among the partially hyperbolic diffeomorphisms with central dimension 1, are those with "enough" hyperbolic measures in the sense of (*) C^1 or C^2 dense?

Acknowledgment. I wish to thank the referee for careful reading and helpful suggestions that have significantly improved this paper.

2. Definitions and background

We recall some standard facts to make these lectures as self-contained as is reasonable and to fix notations. We also prove some basic facts for which we did not find references.

2.1. Borel spaces. A standard Borel space is a pair (X, B) such that there exists a distance d on X for which (X, d) is complete and separable, and B is the σ-algebra generated by the open sets. One often omits the Borel structure B if it is clear from the context. We refer to [25] for background.

Example 2.1. The following are standard Borel spaces: N; R; N^N; compact metric spaces.

A map between two Borel spaces is a Borel map if the preimage of any Borel set is Borel. A Borel isomorphism between spaces is a Borel map which is invertible with Borel inverse. The following result shows that standard Borel spaces are rather nice spaces.

Theorem 2.2 (Kuratowski). All uncountable standard Borel spaces are isomorphic.

The forward image of a Borel subset by a Borel (even continuous) map is not always Borel. However (see [25, (15.2)]):

Theorem 2.3 (Lusin-Souslin). Let X, Y be standard Borel spaces. If Ψ : X → Y is a Borel injection, then Ψ(X) is a Borel subset of Y and Ψ : X → Ψ(X) is a Borel isomorphism.

5 However, see [32, 33] for measure-preserving universality without hyperbolicity.


More generally, if Ψ^{-1}(y) is countable for each y ∈ Y, then Ψ has a Borel section: Ψ(X) is Borel and there exists a Borel map Φ : Ψ(X) → X such that Ψ ◦ Φ = Id_{Ψ(X)}. The following constructions stay within standard Borel spaces.

Proposition 2.4. Let (X, 𝒳) be a standard Borel space. Then the following are also standard Borel spaces:
(1) the product ∏_{i∈I} X_i of any countable family (X_i)_{i∈I} of standard Borel spaces;
(2) any Borel subset Y ∈ 𝒳 equipped with 𝒴 := {B ∈ 𝒳 : B ⊂ Y};
(3) P(X), the set of Borel probability measures, equipped with the σ-algebra generated by μ ↦ μ(B), B ∈ 𝒳;
(4) P(S), the subset of invariant probability measures;
(5) P_erg(S), the subset of ergodic, invariant probability measures.
Moreover, the above Borel structure of P(X) coincides with that coming from the usual weak star topology.

2.2. Categories of dynamical systems. In this paper, we mainly consider Borel systems, i.e., (X, 𝒳, T) (or simply (X, T)) where T is a Borel automorphism of a standard Borel space (X, 𝒳). A Borel homomorphism between two Borel systems (X, S) and (Y, T) is a Borel map ψ : (X, 𝒳) → (Y, 𝒴) such that ψ ◦ S = T ◦ ψ. A Borel isomorphism between two Borel systems (X, S) and (Y, T) is a Borel homomorphism which is an isomorphism between the Borel spaces. A Borel embedding of one system (X, S) into another (Y, T) is a Borel injective map ψ : (X, 𝒳) → (Y, 𝒴) such that ψ ◦ S = T ◦ ψ.

We turn (X, S) into a measure-preserving (dynamical) system (S, μ) by selecting a measure μ ∈ P(S). A measure-preserving system is aperiodic if the set of periodic points has zero measure. We refer to the first chapters of, e.g., [31] for background on ergodic theory. Recall that P_erg(S) is the set of all aperiodic, ergodic measures (a 'measure' always means an invariant probability measure, unless specified otherwise).

A property holds for almost all x ∈ X if it holds for all x outside an almost null set, i.e., a set of zero measure with respect to every measure in P_erg(S). The complement of an almost null set is said to be almost all of X.

An almost Borel map (or homomorphism) ψ : (X, S) → (Y, T) between Borel systems is a Borel map ψ : X′ → Y defined on an invariant6 Borel subset X′ which is almost all of X. An almost Borel embedding of (X, S) into (Y, T) is an almost Borel map ψ : (X, S) → (Y, T) with an invariant Borel subset X′ which is almost all of X and on which the restriction of ψ is defined everywhere and injective. An almost Borel isomorphism from (X, S) to (Y, T) is an almost Borel map ψ : (X, S) → (Y, T) with invariant Borel subsets X′ ⊂ X, Y′ ⊂ Y, almost all of X and Y respectively, such that ψ : X′ → Y′ is a bijection.

Note that we can characterize these embeddings, resp., isomorphisms, by the existence in the category of almost Borel maps of a left inverse, resp., of an inverse, if we identify almost Borel maps ψ_i : (X, S) → (Y, T), i = 1, 2, whenever there is an invariant Borel subset X′ which is almost all of X and on which both maps are defined and coincide. These characterizations are easily deduced from the Lusin-Souslin theorem.

6 It is not necessary to require invariance since for any Borel subset X′ and measure μ ∈ P_erg(S), μ(X′) = 1 ⇐⇒ μ(⋂_{n∈Z} S^n X′) = 1.

2.3. Entropy of dynamical systems. We refer to chapters 5 and 6 of [31] for the following facts and to [15] for background. We consider a Borel system (X, T).

A partition P is a countable Borel partition of X. The join ⋁_{i∈I} P_i of a family (P_i)_{i∈I} of partitions is the coarsest partition finer than any in the family. In particular, P^{T,n} := ⋁_{k=0}^{n} T^{-k}P. If x ∈ X and P is a partition of X, then P(x) denotes the unique element of P that contains x. For a partition P and a subset I ⊂ Z, the P, I-name of a point x ∈ X is the map w : I → P such that w(i) is the element of P containing T^i x. The P-name is shorthand for the P, Z-name.

The Kolmogorov-Sinai entropy of a measure-preserving system (T, μ) is

h(T, μ) := sup{h(T, P, μ) : P is a finite partition of X}

where h(T, P, μ) := lim_{n→∞} (1/n) H(P^{T,n}, μ) with H(P, μ) := Σ_{A∈P} −μ(A) ln μ(A) ∈ [0, ∞] (0 ln 0 = 0). h(T, μ) is an invariant of measure-preserving isomorphism (i.e., isomorphism of Borel subsystems defined by full measure subsets).

Let P_1, P_2, . . . be a refining sequence of finite partitions (each element of P_{n+1} is contained in an element of P_n). It is generating with respect to (T, μ) if there is X′ ⊂ X with μ(X′) = 1 such that for all x, y ∈ X′,

(∀k ∈ Z ∀n ∈ N, P_n(T^k x) = P_n(T^k y)) ⟹ x = y.

By Sinai's theorem, in this situation, h(T, μ) = lim_{n→∞} h(T, μ, P_n).
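To make the definition of h(T, P, μ) concrete, the following minimal sketch evaluates H(P^{T,n}, μ)/n exactly for one assumed example: the shift on {0, 1}^Z with the product (Bernoulli) measure of parameter p and the partition P into the two time-zero cylinders. The function names are hypothetical and the brute-force enumeration is only meant for small n.

```python
import math
from itertools import product

def partition_entropy(p, n):
    """H(P^{T,n}, mu) for the Bernoulli(p) product measure on {0,1}^Z.

    P is the partition by the symbol at time 0. The join over k = 0..n
    has one atom per word of length n + 1, whose measure is a product.
    """
    probs = {0: p, 1: 1.0 - p}
    total = 0.0
    for word in product((0, 1), repeat=n + 1):
        m = math.prod(probs[s] for s in word)
        if m > 0:  # convention 0 ln 0 = 0
            total -= m * math.log(m)
    return total

def bernoulli_entropy(p):
    """Closed form -p ln p - (1 - p) ln(1 - p)."""
    return -sum(q * math.log(q) for q in (p, 1.0 - p) if q > 0)

# H(P^{T,n}, mu) / n approaches the closed form as n grows.
estimates = [partition_entropy(0.3, n) / n for n in (2, 5, 10)]
```

For a product measure the atoms of P^{T,n} are independent across coordinates, so H(P^{T,n}, μ) equals (n + 1) times the one-symbol entropy and the quotient by n converges to it.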

The following theorem is fundamental to the theory:

Theorem 2.5 (Shannon-McMillan-Breiman). Let (T, μ) be an ergodic measure-preserving map. Let P be a countable Borel partition modulo μ such that H(P, μ) < ∞. Then, for μ-a.e. x,

lim_{n→∞} −(1/n) log μ(P^{T,n}(x)) = h(T, μ, P).

For our purposes, the relevant notion of entropy for a Borel system (no invariant measure being distinguished) is the following.

Definition 2.6. The Borel entropy (or just entropy) of a Borel system (X, T) is:7

h(T) := sup{h(T, μ) : μ ∈ P_erg(T)} ∈ [0, ∞] ∪ {−∞}.

The topological entropy of a continuous map T on a compact metric space is defined as follows. Define the Bowen-Dinaburg balls as B_T(x, ε, n) := {y ∈ X : max_{0≤k<n} d(T^k x, T^k y) < ε} [...] r > 0, X can be covered by finitely many sets of diameter < r. Hence:

Fact 2.8. There exists a sequence of finite partitions P_1, P_2, . . . such that, for all distinct x, y ∈ X, there exists n ≥ 1 with P_n(x) ≠ P_n(y).

For a continuous topological dynamical system T : X → X, one can associate to points x ∈ X (up to an almost null set) an ergodic invariant probability measure, called the empirical measure. We shall use a Borel version of this construction:

Proposition 2.9. Let (X, T) be a Borel system. There is a surjective almost Borel map M : X → P_erg(T) such that, for all Borel P ⊂ P_erg(T) and all μ ∈ P_erg(T),

(2.1) μ(M^{-1}(P)) > 0 ⇐⇒ μ(M^{-1}(P)) = 1 ⇐⇒ μ ∈ P.
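The empirical measure of a point records how often its orbit visits each set. The sketch below computes such visit frequencies for an assumed toy system that is not discussed in the text, the rotation x ↦ x + α mod 1 with α irrational, where Lebesgue measure is the only invariant probability measure. The function name and parameters are hypothetical, and floating-point arithmetic makes this an approximation only.

```python
import math

def visit_frequency(x, alpha, a, b, n):
    """Fraction of times 0 <= k < n at which R^k(x) lies in [a, b),
    where R(y) = y + alpha mod 1 is a circle rotation."""
    hits = 0
    y = x % 1.0
    for _ in range(n):
        if a <= y < b:
            hits += 1
        y = (y + alpha) % 1.0
    return hits / n

# Golden-mean rotation: frequencies should approach the length of [a, b)
# whatever the starting point, since the system is uniquely ergodic.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0
freq_from_0 = visit_frequency(0.0, ALPHA, 0.0, 0.25, 20000)
freq_from_other = visit_frequency(0.7, ALPHA, 0.0, 0.25, 20000)
```

For a system with several ergodic measures, the limiting frequencies depend on the starting point, which is exactly the information that the map M of Proposition 2.9 encodes.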

Proof. By the Kuratowski theorem 2.2, one can assume that X is the Cantor set (T is not necessarily continuous). In particular we can find a generating sequence P_1, P_2, . . . of finite partitions of X into clopen sets such that P_{n+1} is finer than P_n ∨ T^{-1}P_n. Let P∗ be the countable set ⋃_{n≥1} P_n. [0, 1]^{P∗} is a standard Borel space (see Prop. 2.4). We define f : P(X) → [0, 1]^{P∗}, μ ↦ (μ(A))_{A∈P∗}. For each A ∈ P∗, μ ↦ μ(A) is Borel (recall Prop. 2.4), hence f is Borel. As P∗ is generating, f is injective so the Lusin-Souslin theorem 2.3 implies that f(P(X)) is Borel and f^{-1} : f(P(X)) → P(X) is Borel. We define F : X → [0, 1]^{P∗} by

(2.2) F(x) := ( lim sup_{n→∞} (1/n) #{0 ≤ k < n : T^k x ∈ A} )_{A∈P∗}.

It is defined for every x ∈ X, and F : X → [0, 1]^{P∗} is a Borel map. Let

X_1 := {x : lim inf_{n→∞} (1/n) #{0 ≤ k < n : T^k x ∈ A} = lim sup_{n→∞} (1/n) #{0 ≤ k < n : T^k x ∈ A} for all A ∈ P∗}.

Obviously it is a Borel set and, by Birkhoff's ergodic theorem, its complement is a null set. By the choice of P∗ in the compact metric space X, the Carathéodory extension theorem yields, for each x ∈ X_1, some μ ∈ P(X) such that F(x) = f(μ), i.e., F(X_1) ⊂ f(P(X)). Hence M := f^{-1} ◦ F is well-defined and Borel on X_1. Note that for any A ∈ P∗, T^{-1}A is a finite union of elements of P∗, hence μ is invariant. Also Birkhoff's ergodic theorem implies:

(2.3) ∀ν ∈ P_erg(T), ν(M^{-1}({ν})) = 1.

The implication μ ∈ P ⟹ μ(M^{-1}(P)) = 1 of (2.1) follows.

8 Hints: Consider the possible Lipschitz constants of T and the Lebesgue measure as an ergodic measure of T.


We claim that, for all Borel Q ⊂ P(T) and all μ ∈ P(T), if μ(M^{-1}(Q)) > 0, then:

(2.4) there is a positive measure set of ergodic components ν of μ in Q.

Indeed, the hypothesis implies that ν(M^{-1}(Q)) > 0 for a positive measure subset of the ergodic components ν of μ. Then eq. (2.3) implies M^{-1}({ν}) ∩ M^{-1}(Q) ≠ ∅, hence ν ∈ Q, proving (2.4).

Applied to Q = P(T) \ P_erg(T), (2.4) shows by contradiction that M(x) ∈ P_erg(T) outside a null set. Thus M : X \ X1 → P_erg(T) is a well-defined, almost Borel map. Applied to Q = P ⊂ P_erg(T), (2.4) shows that μ(M^{-1}(P)) > 0 ⟹ μ ∈ P, concluding the proof of eq. (2.1). Finally, the surjectivity of M follows from (2.1) with P := {μ} for μ ranging over P_erg(T).

Exercise 2.10. Show that:
(1) if M′ : X → P(X) is another almost Borel map satisfying eq. (2.1), then M = M′ except on a null set;
(2) if (X, S) is a homeomorphism of a metrizable Polish space, then, for all x outside of a null set, the limit9 μ_x := lim_{n→∞} (1/n) Σ_{k=0}^{n−1} δ_{S^k x} exists in the vague topology (generated by the continuous, compactly supported real functions on X) and satisfies μ_x = M(x).

Let P(X) be the set of finite Borel partitions. If A ⊂ X and P is a collection of subsets of X, A  P means that A is a finite union of elements of P. If Q is a collection of subsets of X, then Q  P means that A  P for each A ∈ Q.

Definition 2.11. Let h : P_erg(T) → [0, ∞] and, for any finite Borel partition Q of X, h_Q : P_erg(T) → [0, ∞] be defined as the Kolmogorov entropies h_T(μ) := h(T, μ) and h_{T,Q}(μ) := h(T, μ, Q) (we often omit T from the notation).

Proposition 2.12. For (X, S) a Borel system and P a finite Borel partition, the functions h, h_P : P_erg(S) → [0, ∞] are Borel.

Proof. Let Q∗ be the countable set ⋃_{n≥1} Q^{T,n} and E : P(X) → [0, 1]^{Q∗}, μ ↦ (μ(A))_{A∈Q∗}. Observe that E is Borel as each function μ ↦ μ(A), A ∈ Q∗, is Borel. But H(Q^{T,n}, μ) is a continuous function of E(μ), so the following is Borel: h_Q(μ) = lim_{n→∞} (1/n) H(Q^{T,n}, μ). Finally, h(μ) is Borel since, by Sinai's theorem, it is equal to sup_{n≥1} h_{Q_n}(μ) if (Q_n) is a generating sequence of partitions.

2.5. Shifts. We refer to [19, 26] for background. An alphabet 𝒜 is a countable (possibly finite) set with the discrete topology. Its elements are called symbols. The full shift on 𝒜 is (Σ_𝒜, σ) where Σ_𝒜 := 𝒜^Z with the product topology and the homeomorphism σ : Σ_𝒜 → Σ_𝒜 defined by σ((A_n)_{n∈Z}) = (A_{n+1})_{n∈Z}. The cylinders in Σ_𝒜 are the closed-open subsets: [a_n . . . a_{n+m}]_X := {A ∈ Σ_𝒜 : ∀k = n, . . . , n+m, A_k = a_k}. We also write [abc]_X (with no explicit indices) for [x_0 x_1 x_2]_X with x_0 x_1 x_2 = abc. A word of X (or an X-word) of length n is w ∈ 𝒜^n such that [w]_X ≠ ∅.

9 Recall that δ_x is the probability measure such that δ_x({x}) = 1.


Exercise 2.13. Let Σ be {0, 1}^Z with the shift σ. Let p(0) ∈ [0, 1], p(1) := 1 − p(0) and let ν_p be the unique σ-invariant Borel probability measure on Σ such that ν_p([x_n . . . x_{m−1}]_Σ) = p(x_n)p(x_{n+1}) . . . p(x_{m−1}) for all integers n < m and words x_n . . . x_{m−1} ∈ {0, 1}^{{n,...,m−1}}. Show that {[0]_Σ, [1]_Σ} is a generating partition and that h(σ, ν_p) = −p log p − (1 − p) log(1 − p), where p := p(0).

A subshift is (Σ, σ) where Σ ⊂ Σ_𝒜 is a compact, shift-invariant subset of Σ_𝒜 and σ is the restriction of the previous homeomorphism. A subshift Σ is said to be a Markov shift if there is a directed graph, i.e., a subset E ⊂ 𝒜², such that: A ∈ Σ ⇐⇒ ∀n ∈ Z, (A_n, A_{n+1}) ∈ E. A Markov shift Σ is called irreducible if it can be defined by a strongly connected graph, i.e., such that for any pair (a, b) of vertices, there is a path from a to b. Any Markov shift Σ is equal to a countable union of irreducible Markov shifts (its components) up to an almost null set. The period of an irreducible Markov shift Σ is the greatest common divisor of the periods of all periodic orbits of Σ. A Markov shift is called mixing if it is irreducible and has period 1.

Let Σ be an irreducible Markov shift. According to Gurevič [17], its Borel entropy satisfies:

(2.5) h(Σ) = sup{h(X) : X Markov shift defined by a finite irreducible subgraph}

and, if h(Σ) < ∞, then it has at most one m.m.e. If it has one, Σ is called positive recurrent.

Recall that a p-Bernoulli system is a measure-preserving system isomorphic to the product of the cyclic permutation on p elements and (N^Z, σ, μ^N) for some probability measure μ on N (note that μ may be carried by a single point). A p-Bernoulli system with p ≥ 1 is called periodic-Bernoulli, and simply Bernoulli if p = 1. By a theorem of Gurevič [18], the m.m.e.'s of a Markov shift with finite entropy are, if they exist, Markov measures. It is well-known [22] that they are p-Bernoulli where p coincides with the period of the Markov shift. We recall an immediate consequence of Ornstein's theorem [30]: any two periodic-Bernoulli systems are measure-preservingly isomorphic if and only if they have equal entropy and equal period.

Moreover, for each t ≥ 0 and p ≥ 1, there are irreducible Markov shifts Σ^0_{t,p}, Σ^+_{t,p}, which have Borel entropy t and period p, with respectively zero and one m.m.e. The following exercise details a construction (see also [35]).

Exercise 2.14. Let f(z) = Σ_{n≥1} f_n z^n be a non-identically zero formal power series with nonnegative integer coefficients. f defines a graph by taking, for each n ≥ 1, f_n loops of length n, each with a distinguished vertex 0, and identifying all the distinguished vertices (a petal graph in [19]). This graph defines an irreducible Markov shift Σ_f. Show that it is mixing if the greatest common divisor gcd({n ∈ N : f_n ≠ 0}) is equal to 1. For each n ≥ 1, let t_n := #{x ∈ [0]_{Σ_f} : σ^n(x) = x}. Show that t(z) := 1 + Σ_{n≥1} t_n z^n = (1 − f(z))^{-1}, as formal power series. We recall that h(Σ_f) = lim sup_{n→∞} (1/n) log t_n [17, 26]. Show that exp h(Σ_f) = 1/min(ρ, r) where 0 ≤ ρ ≤ ∞ is the radius of convergence of f and r := inf{0 < t ≤ ρ : f(t) = 1} ∈ ]0, ∞]. From now on, we assume h(Σ_f) < ∞. We recall from [19, 26] that Σ_f has a m.m.e. if and only if f(e^{−h(Σ_f)}) = 1 and Σ_{n≥1} n f_n e^{−nh(Σ_f)} < ∞.


Show that if r < ρ then Σ_f has a m.m.e. For each 1 < Λ < ∞, find two formal power series f and g such that the Markov shifts Σ_f, Σ_g are mixing, with Borel entropy log Λ, and Σ_f is positive recurrent and Σ_g is not. Generalize this to irreducible Markov shifts with a given period and (finite) entropy.

A shift of finite type (or SFT) is a subshift which can be written as Σ_𝒜 \ ⋃_{w∈F, k∈Z} σ^{-k}[w] for some finite alphabet 𝒜 and finite set of words F. It is a one-step SFT if it is a Markov shift defined by a finite graph. SFTs are also characterized as those subshifts topologically conjugate to one-step SFTs (see [29] for background). If X is an irreducible SFT with period p, given any two symbols α, ω, there is an integer n0 and a number C > 1 such that the number N_{α,ω}(n) of X-words of length n whose first symbol is α and last symbol is ω satisfies:

∀n ≥ n0, C^{-1} ≤ N_{α,ω}(n) e^{−nh(X)} ≤ C.
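The loop-counting identity of Exercise 2.14 and the exponential growth bound above can be explored numerically. The sketch below is only an illustration under stated assumptions: f is taken to be a polynomial (so its radius of convergence is infinite), the function names are hypothetical, and the example f(z) = z + z² is chosen here because its petal graph is the golden mean shift.

```python
import math

def loop_counts(f, n_max):
    """Coefficients t_0..t_{n_max} of t(z) = 1 / (1 - f(z)).

    f[k - 1] is the coefficient f_k of z^k. Expanding (1 - f) * t = 1
    gives t_0 = 1 and t_n = sum_{k=1..n} f_k * t_{n-k}.
    """
    t = [1]
    for n in range(1, n_max + 1):
        t.append(sum(f[k - 1] * t[n - k] for k in range(1, min(n, len(f)) + 1)))
    return t

def petal_entropy(f, tol=1e-12):
    """log(1/r), where r in (0, 1] solves f(r) = 1, found by bisection.

    Valid for a nonzero polynomial f with nonnegative integer
    coefficients, for which f is increasing on [0, 1] and f(1) >= 1.
    """
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(c * mid ** k for k, c in enumerate(f, start=1)) < 1.0:
            lo = mid
        else:
            hi = mid
    return -math.log(hi)

# One loop of length 1 and one loop of length 2 at the vertex 0.
f = [1, 1]
t = loop_counts(f, 30)
h = petal_entropy(f)
# t_n * exp(-n h) should stay between two positive constants.
ratios = [t[n] * math.exp(-n * h) for n in range(len(t))]
```

In this example the t_n are Fibonacci numbers and exp(h) is the golden ratio; since t_n counts the closed paths of length n based at 0, the boundedness of the ratios is one instance of the two-sided estimate stated above.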

Exercise 2.15. Find a mixing Markov shift such that, for each n ≥ 1, there is a vertex such that there is no loop of length n based at that vertex. Hint: recall the petal graph and associated shift from Exer. 2.14. What about the upper bound: sup_{n≥1} sup_{α,ω} N_{α,ω}(n) e^{−nh(X)} < ∞?

2.6. Hyperbolicity. See [24] and, more precisely, the supplement by Katok and Mendoza for background on smooth ergodic theory and Pesin theory. Let T be a diffeomorphism of a compact manifold M. For k = 1, . . . , dim M, the kth Lyapunov exponent at x ∈ M is the following value:

λ_k(T, x) := inf_{E^{k−1}} sup_{v ∈ (E^{k−1})^⊥ \ {0}} lim sup_{n→∞} (1/n) log ‖(T^n)′.v‖_{T^n x}

where E^{k−1} ranges over the (k − 1)-dimensional subspaces of the tangent space T_x M, ‖·‖_x, x ∈ M, is an arbitrary Riemannian structure and (E^k)^⊥ := {v ∈ T_x M : ∀w ∈ E^k, v · w = 0}. Obviously λ_k(T, ·) is a Borel function.

An ergodic measure μ of T is said to be (Pesin) hyperbolic if, at μ-a.e. point, there is no zero Lyapunov exponent. Observe that {x ∈ M : M(x) is hyperbolic} is a Borel subset of M. Ruelle's inequality bounds the entropy h(T, μ) from above by the sum of the positive Lyapunov exponents with multiplicity. If T is a surface diffeomorphism, this inequality applied to (T, μ) and (T^{-1}, μ) shows that ergodic measures with positive entropy are hyperbolic.

A horseshoe is an invariant subset of M which is a continuous embedding of an irreducible SFT with positive entropy, and whose ergodic (invariant) measures are hyperbolic measures.10
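For intuition about Lyapunov exponents, the sketch below estimates the top exponent in the simplest assumed setting, which is not an example from the text: a linear map of the plane given by an integer matrix of determinant 1, such as the one inducing the "cat map" on the 2-torus. There the differential is the same matrix at every point, so the limit does not depend on x. The function name and parameters are hypothetical.

```python
import math

def top_lyapunov_exponent(matrix, v=(1.0, 0.0), n=1000):
    """Estimate lim (1/n) log |A^n v| for a 2x2 matrix A.

    The vector is renormalized at each step to avoid overflow; the
    logarithms of the successive norms add up to log |A^n v|.
    """
    (a, b), (c, d) = matrix
    x, y = v
    log_growth = 0.0
    for _ in range(n):
        x, y = a * x + b * y, c * x + d * y
        norm = math.hypot(x, y)
        log_growth += math.log(norm)
        x, y = x / norm, y / norm
    return log_growth / n

# Assumed example: the matrix of the cat map. Its eigenvalues are
# (3 + sqrt 5)/2 and its inverse, so neither exponent vanishes.
CAT = ((2.0, 1.0), (1.0, 1.0))
estimate = top_lyapunov_exponent(CAT)
exact = math.log((3.0 + math.sqrt(5.0)) / 2.0)
```

Since the determinant is 1, the second exponent is the opposite of the first, and every invariant measure of the induced torus map is hyperbolic in the above sense. For a nonlinear diffeomorphism the matrix would have to be replaced by the differential along the orbit, which this sketch does not attempt.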

10 This last property is a simple consequence of being uniformly hyperbolic, see [24].

3. Almost Borel Embedding and Isomorphism

We recall basic facts about almost Borel embedding, isomorphism and universality and then compare almost Borel isomorphism to related notions.


3.1. Universality, Embedding and Isomorphism. A Borel system (X, S) is almost Borel universal11 for a class C of Borel systems if, for every (Y, T) ∈ C, there is an almost Borel embedding of (Y, T) into (X, S). (X, S) is almost Borel strictly universal for C if it is both universal and an element of C.

All classes admit almost Borel universal systems. Indeed, B. Weiss [45] has shown that (N^Z, σ), the full shift over a countable alphabet, is universal with respect to any class: any Borel system has a Borel embedding into it up to periodic orbits and a W-negligible set (see Sec. 3.2). However, not every class admits a strictly universal system. Trivial counter-examples are the class of uniquely ergodic Borel systems or that of systems with finite entropy. Serafin [39] has shown that the class of self-homeomorphisms of compact metric spaces with zero entropy has no strictly universal system.

Now, let B(t) be the class of Borel systems (X, S) such that h(S, μ) < t for all μ ∈ P_erg(S). Hochman's Theorem 1.2 says that any mixing Markov shift Σ with entropy t is almost Borel universal for B(t), or for short, B(t)-universal. Using this remark, it is easy to build a strictly B(t)-universal system for each 0 < t < ∞:

Exercise 3.1. Let Σ be a mixing Markov shift with entropy h(Σ) = t. For 0 ≤ s ≤ t, let Σ_s := (h_Σ ◦ M)^{-1}([0, s[). Prove that it is a Borel subsystem carrying exactly the ergodic measures of Σ with entropy < s. Show that Σ_s is strictly B(s)-universal. Check that Σ is strictly B(t)-universal if it has no m.m.e.

Recall the Cantor-Bernstein theorem of set theory: if each of two sets embeds into the other, then they are in bijection. There is a well-known Borel version of this theorem (see [25, (15.7)]). As observed by Hochman, there is an equivariant version of this theorem:

Lemma 3.2 (Dynamical Cantor-Bernstein [20]). Let (X, S) and (Y, T) be Borel systems. Assume that there are almost Borel embeddings f : (X, S) → (Y, T) and g : (Y, T) → (X, S). Then there is an almost Borel isomorphism h : (X, S) → (Y, T).

Remark 3.3. The previous lemma should be distinguished from the arguments that deduce the Ornstein theorem from a generic Sinai theorem (see [15]). Indeed, the above lemma deals with subsystems and not with factors. Moreover, if (X, S) has a m.m.e., then the existence of an isomorphic measure in (Y, T) is part of the assumption.

Proof. The domains X′ and Y′ of f and g can be assumed to be S- and T-invariant by removing almost null sets. We further remove X∗ := ⋃_{n≥0} (gf)^{-n}(X  ∪ f^{-1}(Y  )) and Y∗, defined symmetrically. We leave it to the reader to check that these are invariant, almost null subsets of (X, S) and (Y, T) and that f(X \ X∗) ⊂ Y \ Y∗ and g(Y \ Y∗) ⊂ X \ X∗. Hence we can assume that we have mutual Borel embeddings of Borel systems. Define inductively two non-increasing sequences of subsets: X_0 = X and X_{n+1} = gf(X_n); Y_0 = Y and Y_{n+1} = fg(Y_n). Let

A = ⋂_{n≥0} X_n ∪ ⋃_{n≥0} (X_n \ g(Y_n)) and B = ⋃_{n≥0} (Y_n \ f(X_n)).

Let us assume that (†) A and B are Borel and that f(A) = Y \ B and g(B) = X \ A.

11 Almost Borel universal systems are 'universal terminal objects' in an obvious category.


We define h : X → Y by setting h(x) = f(x) if x ∈ A and h(x) = g^{-1}(x) otherwise. The claim (†) implies that it is Borel and bijective, and therefore a Borel isomorphism between the spaces by the Lusin-Souslin theorem. Finally observe that S(X) = X and T(Y) = Y imply that A and B are S- and T-invariant, respectively. Hence, using the invariance of A we get: h(S(x)) = f(S(x)) = T(f(x)) = T(h(x)) for x ∈ A and likewise for x ∈ X \ A. The following exercise suffices to conclude.

Exercise 3.4. Prove the claim (†) above. Hint: To analyze A and B one can identify, e.g., X_n \ g(Y_n) as the set of points x ∈ X such that x, g^{-1}x, f^{-1}g^{-1}x, . . . , (f^{-1}g^{-1})^n(x) are well-defined, but not g^{-1} ◦ (f^{-1}g^{-1})^n(x).

Using Ex. 2.14 and 3.1, we obtain:

Corollary 3.5. Given any class of Borel systems, its almost Borel strictly universal systems are pairwise almost Borel isomorphic (if they exist). In particular, any almost Borel strictly B(t)-universal system is almost Borel isomorphic to a mixing Markov shift with entropy t and no m.m.e.

The following lemma of [20] is easy but important:

Lemma 3.6 (Hochman). Let (Y, T) be a Borel system. Let H be the set of numbers 0 ≤ h ≤ ∞ such that (Y, T) is B(h)-universal. Then H = [0, sup H]. In particular, if for every h < h(T), one can embed a mixing SFT of entropy > h into (Y, T), then (Y, T) is B(h(T))-universal.

This allows Hochman [20, Thm 1.6] to analyze many systems (e.g., mixing Markov shifts or natural extensions of β-shifts). This will allow us to use Katok's theorem.

Proof. As any system is B(0)-universal, we can assume sup H > 0. For 0 ≤ s < t, B(t)-universality implies B(s)-universality, hence there is a sequence h_1 := 0 < h_2 < h_3 < . . . converging to sup H such that (Y, T) is B(h_n)-universal for each n ≥ 1. Let (X, S) be in B(sup H). According to Propositions 2.9 and 2.12, the following invariant sets are Borel:

X_n := {x ∈ X : h_n ≤ h(S, M(x)) < h_{n+1}}, n ≥ 1.

Observe that ⋃_{n≥1} X_n is a disjoint union and that it is equal to X up to an almost null set. (Y, T) being B(h_{n+1})-universal, there is an almost Borel embedding ψ_n : X_n → Y for each n ≥ 1. Hence we have an almost Borel homomorphism Ψ : X → Y defined by Ψ|X_n = ψ_n for each n ≥ 1. For any μ ∈ P_erg(T), if μ(Ψ(X_n) ∩ Ψ(X_m)) > 0, then h(T, μ) ∈ [h_n, h_{n+1}[ ∩ [h_m, h_{m+1}[, so m = n. Hence, Ψ : X → Y is an almost Borel embedding and (Y, T) is B(sup H)-universal.

The following is amusing and useful:

Exercise 3.7. Let (X, S) be an almost Borel strictly universal system for B(t) for some t ≥ 0. Let ∅ ≠ I ⊂ R. Show that (X × I, S × Id) is almost Borel isomorphic to (X, S).
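The back-and-forth construction used in the proof of Lemma 3.2 can be tried out on a toy, purely set-theoretic case. The sketch below ignores the dynamics and the Borel structure entirely: X and Y are both the nonnegative integers, and the injections f(x) = 2x and g(y) = 2y + 1 are assumptions chosen for this illustration, as are the function names. A point of X belongs to A when its chain of successive preimages stops in X, and h is then f on A and g^{-1} elsewhere.

```python
def chain_stops_in_X(x):
    """Trace x back through g^{-1}, f^{-1}, g^{-1}, ... and report whether
    the chain stops on the X side (True) or on the Y side (False).

    Here f(x) = 2x and g(y) = 2y + 1, so a point of X has a g-preimage
    iff it is odd, and a point of Y has an f-preimage iff it is even.
    """
    value, in_X = x, True
    while True:
        if in_X:
            if value % 2 == 0:
                return True           # not in g(Y): the chain stops in X
            value, in_X = (value - 1) // 2, False
        else:
            if value % 2 == 1:
                return False          # not in f(X): the chain stops in Y
            value, in_X = value // 2, True

def h(x):
    """Bijection built as in the proof: f on A, the inverse of g elsewhere."""
    return 2 * x if chain_stops_in_X(x) else (x - 1) // 2

# On an initial segment, h is injective and already covers a segment of Y.
values = [h(x) for x in range(400)]
```

In this toy case every chain is finite, so the intersection of the sets X_n plays no role; in the lemma it is included in A, and the substance of the proof is that A and B are Borel and invariant.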

ALMOST BOREL STRUCTURE


3.2. Other partial Borel isomorphisms. We compare almost Borel isomorphism with related notions among Borel systems: Borel isomorphism, Borel isomorphism up to wandering sets and entropy-conjugacy. It is a nice exercise to put all these notions in the following common framework of "partial Borel isomorphisms". Indeed, each of those notions corresponds to a choice of negligible subsets in each Borel system (possibly restricted to some subclass). Then two systems are said to be isomorphic if they each contain Borel isomorphic subsystems whose complement sets are negligible in the chosen sense. The following admissibility conditions ensure that such notions are equivalence relations:

(A0) the empty set is negligible;
(A1) each negligible subset is contained in an invariant negligible subset;
(A2) a finite union of negligible sets is negligible;
(A3) if $A, B \subset X$ are negligible and $\psi : (X \setminus A, S) \to (Y, T)$ is a Borel embedding, then $\psi(B \setminus A)$ is negligible.

Exercise 3.8. Show that (A0)-(A3) imply that the partial isomorphism is an equivalence relation. Check that the relation defined by neglecting invariant subsets Borel isomorphic to countable unions of Markov shifts is not an equivalence relation.

Neglecting wandering sets. Shelah and B. Weiss [40] (see also [44, 45]) have introduced and studied the following notion. A Borel set is $W$-negligible if it is contained in a countable union of wandering sets, i.e., Borel sets $W$ that are disjoint from all their iterates $T^{-n}W$, $n \in \mathbb{Z} \setminus \{0\}$. This definition was motivated by the proof of Poincaré's recurrence theorem in ergodic theory. A further indication of its naturalness is:

Theorem 3.9 (Shelah-B. Weiss [40]). A Borel subset $E \subset X$ is $W$-negligible if and only if it has zero measure with respect to all ergodic conservative Borel probability measures $\mu$ such that $\mu$ and $\mu \circ T$ are equivalent (i.e., have the same sets of zero measure).

This notion is obviously stronger than almost Borel isomorphism. Indeed, according to B. Weiss [44]: 'the true complexity of Borel automorphisms lie in those that have no invariant measure'. He offered the following question [44, p. 397]. For $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, let $R_\alpha : [0, 1[ \to [0, 1[$ be defined by $R_\alpha(x) = x + \alpha \bmod 1$. Fix $F$ a closed subset of $[0, 1[$ with empty interior and positive Lebesgue measure and let $I_\alpha := [0, 1[ \, \setminus \bigcup_{n \in \mathbb{Z}} R_\alpha^n(F)$. $I_\alpha$ has zero Lebesgue measure and is residual. Are all $R_\alpha|_{I_\alpha}$ Borel isomorphic up to $W$-negligible sets? This question is still open.

In the classification of Markov shifts, the problem of strengthening the isomorphism is linked to the relations between ergodic and symbolic classifications. A generalized Perron-Frobenius theorem defines very natural classes of Markov shifts (see [42, 43]) which have dynamical meaning [19, 36, 37]. These classes are invariant under topological conjugacy in the locally compact case and, more generally, under symbolic notions of isomorphisms like the almost isomorphism of [6]. Hochman's theorem implies that some of these distinctions are not invariant under almost Borel isomorphism (strongly positive recurrent among positive recurrent, or null recurrent vs. transient). Would this still be the case for Borel isomorphisms? For Borel isomorphism up to $W$-negligible sets?

JÉRÔME BUZZI


Entropy-conjugacy. We turn to a weaker notion of isomorphism. For many systems with non-uniform hyperbolicity properties, one is often led to distinguish "more tractable" measures, e.g., those that have better hyperbolicity properties. Indeed, some natural constructions will only work for those "better measures". One can therefore focus on invariant probability measures with nonzero entropy by taking as negligible sets those that have zero measure with respect to all ergodic, invariant probability measures with nonzero entropy. This obviously satisfies (A0)-(A3).

A first example can be found in Hofbauer's analysis of piecewise monotone maps¹² of the interval, e.g., $C^1$ maps of $[0, 1]$ into itself with finitely many critical points. Hofbauer [21] built a partial Borel isomorphism (following prior work of Takahashi [41]) and showed that the discarded set had zero measure for all ergodic invariant probability measures with nonzero entropy. Hofbauer then used this isomorphism to analyze entropy maximizing measures and showed that whenever the topological entropy of the interval map is nonzero, there are only finitely many ergodic entropy maximizing measures (and exactly one for maps with a single critical point).

The author generalized such constructions to other settings: $C^\infty$ interval maps with infinite critical set [8], piecewise expanding and affine maps [9, 10] and a class of smooth maps with critical hypersurfaces [11, 12]. In these settings, one is led to focus on measures with large entropy. Indeed, for some of these examples and results, some measures with positive (but small) entropy do not have the "right" properties (for instance their support is contained in hypersurfaces). Therefore, in these studies, one defines a Borel subset $E$ to be entropy-negligible in some Borel system $(X, S)$ if there exists $h < h(S)$ such that $\mu(E) = 0$ for all ergodic measures $\mu$ with $h(S, \mu) > h$. Conditions (A0)-(A3) are again easily checked¹³.
The resulting notion of partial Borel isomorphism is called entropy-conjugacy. We note that, in many cases, analysis up to entropy-conjugacy is a necessary preparation to the almost Borel analysis, which then may follow more or less easily from Hochman's theorem.

In these lectures, we shall be especially concerned with the example of surface diffeomorphisms. Here the powerful construction of Sarig yields representations up to entropy-conjugacy (and finite fibers). More precisely, for each $\chi > 0$, Sarig builds a finite-to-one symbolic cover up to a set negligible for all measures with entropy $> \chi$ (the complement set of a $\chi$-large subset in Sarig's terminology). As announced in the introduction, we shall improve this to a partial Borel conjugacy up to a positive-entropy-negligible set under a mixing assumption (the general case is treated in [5] using different methods).

Remark 3.10. In minimal dimensions (dimension 1 for maps, 2 for diffeomorphisms), Lyapunov exponents rather than entropy seem to be the main phenomenon. Indeed, Bruin [7] has shown that, under a classical distortion condition, the natural partial conjugacy in a variant of Hofbauer's construction exactly discards measures with zero Lyapunov exponents. Similarly, the symbolic dynamics of Sarig discard only measures with (some) zero exponents (by all codings for $\chi > 0$).

¹² More precisely, their natural extensions.
¹³ For this type of partial Borel isomorphism, condition (A2) cannot be strengthened from finite to countable.


4. Borel version of Krieger's Embedding Theorem

Hochman proves the universality of mixing Markov shifts by using a Borel version of Krieger's embedding theorem.

4.1. Statement of the Embedding Theorem. We will first encode the Borel system by the following type of concatenations of words.

Definition 4.1. For any positive integers $s, T, N$, we consider the following sets of symbols and words:
• $\mathcal{A}(s) := \{1, 2, \dots, s, |, \emptyset\}$;
• $\mathcal{S}(s, T) := \{\emptyset^{T-1}\sigma : \sigma = 1, \dots, s\}$ and $\mathcal{T}(s, T) := \{\emptyset^{T-1}|\emptyset^{r} : 0 \le r < T\}$;
• $\mathcal{W}(s, T) := \bigcup_{q \ge 0} \mathcal{W}_q(s, T)$ with $\mathcal{W}_q(s, T) := \{w_1 \dots w_q : w_i \in \mathcal{S}(s, T)\}$;
• $\widetilde{\mathcal{W}}(s, T, N) := \{tw : t \in \mathcal{T}(s, T) \text{ and } w \in \mathcal{W}(s, T) \text{ with } |tw| \ge N\}$.

We define the following invariant sets of sequences:
(1) $\Sigma_*(s, T, N)$ as the infinite concatenations of words from $\widetilde{\mathcal{W}}(s, T, N)$;
(2) $\Sigma(s, T, N)$ as the infinite concatenations of words from $\mathcal{S}(s, T) \cup \mathcal{T}(s, T)$ such that two symbols $|$ are always at least $N$ positions apart.

Remark that $\Sigma_*(s, T, N) \subset \Sigma(s, T, N)$ and that the latter is not compact. Also both sets decrease when $N$ is increased. We will often omit $(s, T, N)$ when they are obvious from the context and so write simply $\Sigma_*$ and $\Sigma$. The core technical result in this section is:

Theorem 4.2. Let $(X, S)$ be a Borel system. For any integers $s, T, N_*$ such that $\log s / T > h(X)$, there are two Borel maps $\phi : X \to \{0, 1\}^{\mathbb{N}}$ and $\psi : X \to \Sigma(s, T, N_*)$ such that:
• $\phi \circ S = \phi$ and $\psi \circ S = \sigma \circ \psi$;
• $(\phi \times \psi) : X \to \{0, 1\}^{\mathbb{N}} \times \Sigma(s, T, N_*)$ is injective.
Moreover, for any $\mu \in \mathbb{P}_{\mathrm{erg}}(S)$, $\mu(\psi^{-1}([\,|\,])) > 0$ and the map $M : X \to \mathbb{P}(S)$ factorizes through $\phi$: $M = \tilde{M} \circ \phi$ for some Borel map $\tilde{M} : \Sigma_2 \to \mathbb{P}(S)$.

The following proof builds on the proof of Krieger's theorem presented in [15, Thm. 4.2.3].

4.2. Coding φ of the measures. The first step in the proof of Theorem 4.2 is the following consequence of Proposition 2.9:

Lemma 4.3. Let $(X, S)$ be a Borel system. There exists a Borel map $\phi : X \to \{0, 1\}^{\mathbb{N}}$ such that, if $M : X \to \mathbb{P}_{\mathrm{erg}}(X)$ is the almost Borel map from Proposition 2.9:
(1) $\phi \circ S = \phi$;
(2) $\phi = \tilde{\phi} \circ M$ for some Borel injective map $\tilde{\phi} : \mathbb{P}(X) \to \{0, 1\}^{\mathbb{N}}$.

Proof. As $\mathbb{P}(X)$ and $\{0, 1\}^{\mathbb{N}}$ are uncountable standard Borel spaces, the Kuratowski theorem gives a Borel injection $\tilde{\phi} : \mathbb{P}(X) \to \{0, 1\}^{\mathbb{N}}$. It suffices to set $\phi := \tilde{\phi} \circ M$. □

4.3. Basic tools for Equivariant coding. The starting point is the Shannon-McMillan-Breiman Theorem 2.5. We rephrase it in a Borel way:


Theorem 4.4 (Shannon-McMillan-Breiman). Let $(X, S)$ be a Borel system with a finite Borel partition $P$. For each $x \in X$ (up to a null set), let $h_P(x) = h(S, M_x, P)$ be the entropy of the empirical measure with respect to the partition $P$. Let
\[ G_P(\epsilon, N) := \{x \in X : \forall n \ge N,\ M_x(P^n(x)) = e^{-(h_P(x) \pm \epsilon)n}\}. \]
Then $h_P$ and $G_P$ are Borel and, for all $\mu \in \mathbb{P}_{\mathrm{erg}}(S)$,
\[ \forall \epsilon > 0, \quad \lim_{N \to \infty} \mu(G_P(\epsilon, N)) = 1. \]
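As a purely numerical illustration of the statement (not part of the argument), the following Python sketch samples a word from a Bernoulli(p) product measure, takes $P$ to be the partition by the zeroth coordinate, and compares $-\frac{1}{n}\log \mu(P^n(x))$ with the entropy. The choice of a Bernoulli shift, the value of `p`, the seed and all function names are assumptions made for this example only.

```python
import math
import random

def bernoulli_entropy(p):
    """Entropy (in nats) of the Bernoulli(p) measure on {0, 1}."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))

def empirical_rate(word, p):
    """-(1/n) log mu([word]) for the Bernoulli(p) product measure mu."""
    ones = sum(word)
    zeros = len(word) - ones
    log_mu = ones * math.log(p) + zeros * math.log(1 - p)
    return -log_mu / len(word)

def sample_word(n, p, seed=0):
    """A pseudo-random word of length n with P(symbol = 1) = p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.3                      # hypothetical parameter
h = bernoulli_entropy(p)
word = sample_word(20000, p)
# Exponential decay rates of the cylinder measure along longer and longer prefixes.
rates = [empirical_rate(word[:n], p) for n in (100, 1000, 20000)]
```

For long prefixes the entries of `rates` should be close to `h`, which is the content of the theorem for this particular ergodic measure; the set $G_P(\epsilon, N)$ collects the points for which this closeness holds uniformly from time $N$ on.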

Exercise 4.5. Check that the above theorem is implied by the classical version, Theorem 2.5.

The following deduces from the above a conditional coding for some good orbit segments. One can first consider the case $Q = \{X\}$.

Corollary 4.6. Let $P, Q$ be two finite Borel partitions and let $N$ be a positive integer. Assume that $P$ is finer than $Q$. Then, for each $n \ge N$, there is a Borel function (we omit the dependence on $\epsilon$) $i_{P,Q,n} : G_P(\epsilon, N) \cap G_Q(\epsilon, N) \to \mathbb{N}$ such that:
(1) $i_{P,Q,n}(x) \le e^{(h_P(x) - h_Q(x) + \epsilon)n}$;
(2) for all $x, y \in G_P(\epsilon, N) \cap G_Q(\epsilon, N)$ belonging to the same element of $Q^n$: $i_{P,Q,n}(x) = i_{P,Q,n}(y) \iff P^n(x) = P^n(y)$.

To get the equivariance of the cutting (assumed in the previous exercise), we use a (Borel version of) Rokhlin towers (see [16, Prop. 7.9]):

Proposition 4.7 (Glasner-Weiss). Let $(X, T)$ be a Borel system. For all $n \ge 1$ and $\delta > 0$, there exists a Borel set $B$ such that:
(i) $B, TB, \dots, T^{n-1}B$ are pairwise disjoint;
(ii) $\forall \mu \in \mathbb{P}_{\mathrm{erg}}(T)$, $\mu\big(\bigcup_{k=0}^{n-1} T^k B\big) > 1 - \delta$.
We say that $B$ is the basis of a tower $\bigcup_{k=0}^{n-1} T^k B$ of height $n$.

Proof. We can assume that there is no periodic point. For any Borel subset $A$, let $\tau_A : X \to \mathbb{N}$ be defined by $\tau_A(x) := \inf\{n \ge 1 : T^n x \in A\}$ and let $O(A) := \bigcup_{p \in \mathbb{Z}} T^p A$.

Claim. For any Borel set $A$ and any integer $0 < n \le \min_{x \in A} \tau_A(x)$, there is a Borel set $B \subset O(A)$ such that $B, TB, \dots, T^{n-1}B$ are pairwise disjoint, $O(B) = O(A)$, and for all $\mu \in \mathbb{P}_{\mathrm{erg}}(T)$,
\[ \mu\Big(O(A) \setminus \bigcup_{j=0}^{n-1} T^j B\Big) \le \frac{n}{\min_{x \in A} \tau_A(x)}. \]

Proof of the Claim. Set $B := \{x \in X : \tau_A(x) \in n\mathbb{N}\}$. Obviously, $B \subset O(A)$. By assumption, for any $x \in A$, $\tau_A(x) \ge n$ so $\tau_A(T^k x) = n$ for some $k \ge 0$: $A \subset O(B)$. $O(A) = O(B)$ follows. For all $0 < j < n$, $\tau_A(T^j B) \subset n\mathbb{N} - j$ so $B \cap T^j B = \emptyset$. Also $O(A) \setminus \bigcup_{j=0}^{n-1} T^j B \subset \bigcup_{j=0}^{n-1} T^j A$, and the frequency of visits to this set is at most $n / \min \tau_A$. The claim follows from the Birkhoff ergodic theorem. □
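The construction in the proof of the Claim can be tried out on a finite toy model. The sketch below is only an illustration under assumptions that are not in the text: the space is the cycle $\mathbb{Z}/40\mathbb{Z}$ with $T(x) = x + 1$ (a periodic system, unlike the aperiodic setting of the proof), the set `A` and the height `n` are arbitrary choices, and the only ergodic measure is the uniform one.

```python
def return_time(x, A, T):
    """First n >= 1 with T^n(x) in A (assumes the forward orbit of x meets A)."""
    n, y = 1, T(x)
    while y not in A:
        n, y = n + 1, T(y)
    return n

def tower_base(X, A, T, n):
    """B := {x : tau_A(x) is a positive multiple of n}, as in the proof of the Claim."""
    return {x for x in X if return_time(x, A, T) % n == 0}

def image(E, T, j):
    """T^j(E) for j >= 0."""
    for _ in range(j):
        E = {T(x) for x in E}
    return E

M = 40                                   # hypothetical size of the cycle
X = set(range(M))
T = lambda x: (x + 1) % M
A = {0, 7, 15, 26}                       # hypothetical set; its return times are 7, 8, 11, 14
n = 5                                    # tower height, at most the minimal return time to A

min_return = min(return_time(x, A, T) for x in A)
B = tower_base(X, A, T, n)
levels = [image(B, T, j) for j in range(n)]      # B, TB, ..., T^{n-1}B
tower = set().union(*levels)
uncovered = X - tower                            # O(A) minus the tower (here O(A) = X)
covering = set().union(*[image(A, T, j) for j in range(n)])
```

In this example the levels are pairwise disjoint, `uncovered` is contained in `covering`, and its uniform measure `len(uncovered) / M` stays below `n / min_return`, in line with the three assertions of the Claim.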


Fix a Polish distance on $X$. Let $P_1, P_2, \dots$ be finer and finer partitions of $X$ with $\max_{A \in P_k} \operatorname{diam}(A) \to 0$. Let $N := [n/\delta] + 1$. Let $A_1, A_2, \dots$ be an enumeration of the elements $A$ of the partitions $P_k$, $k \ge 1$, such that $T^j A \cap A = \emptyset$ for all $0 < j < N$. Set $X_0 := X$. We define inductively subsets $B_k, X_k$ by applying the above Claim to $A_k \cap X_{k-1}$ and $n < \delta N$ to get $B_k \subset X_{k-1}$, and we set $X_k := X_{k-1} \setminus O(B_k)$. Let $B := \bigcup_{k \ge 1} B_k$. It is Borel.

We claim that $\{O(B_k) : k \ge 1\}$ is a partition of $X$. The disjointness is obvious. If there were some $x \in X \setminus \bigcup_{k \ge 1} O(B_k)$, it would belong to all $X_{k-1}$, $k \ge 1$. But $x, \dots, T^{N-1}(x)$ are at a positive distance from each other, so for all $\ell$ large enough, $T^j(P_\ell(x)) \cap P_\ell(x) = \emptyset$ for all $0 < j < N$. Thus there must be some $k \ge 1$ with $x \in A_k \cap X_{k-1}$. This implies $x \in O(B_k) = O(A_k \cap X_{k-1})$, a contradiction.

Finally,
• $B \cap T^j B = \emptyset$ for all $0 < j < n$ is obvious from the construction.
• For any $\mu \in \mathbb{P}_{\mathrm{erg}}(T)$, there is, by ergodicity, some $k \ge 1$ such that $\mu(O(B_k)) = 1$. Hence, $\mu\big(X \setminus \bigcup_{j=0}^{n-1} T^j B\big) = \mu\big(O(B_k) \setminus \bigcup_{j=0}^{n-1} T^j B_k\big)$, which is at most $n/N < \delta$. □

Assumptions. Let $s, T, N_*$ be positive integers as in Theorem 4.2. Let $\epsilon_k$, $k \ge 1$, be positive numbers with
\[ \sum_{k \ge 1} \epsilon_k < \frac{1}{4}\,\delta_h \quad \text{where } \delta_h := \log s / T - h(S) > 0. \tag{4.1} \]

4.4. First level coding. We define the first level of coding $\psi_1$ into $\Sigma_* := \Sigma_*(s, T, N_*)$ such that $\psi_1(x)$ determines the $P_1$-name of almost all points with enough space and flexibility left for the coding of the remaining partitions. We define:
\[ g_1(x) := \frac{h_{P_1}(x) + \epsilon_1}{\log s / T} \tag{4.2} \]
[…] $> \ell(1 - g_1(x))$ free positions.

Proof. (1) is clear. To check (2), remark that the set of 1-free positions in a given 1-interval is the complement of the marker, filling and special positions. Hence their number is:
\[ \ell - (T + r + E_T(\ell g_1(x)) - 3T + T) = ([\ell/T]T + r) - (r + E_T(\ell g_1(x)) - T) = ([\ell/T] - [\ell g_1(x)/T] + 1)T > (\ell/T - [\ell g_1(x)/T])T \ge \ell(1 - g_1(x)). \] □

Recall that a Borel set $B$ is said to be almost completely positive if $\mu(B) > 0$ for any $\mu \in \mathbb{P}_{\mathrm{erg}}(S)$. See Sec. 2.3 for $P$-names.

Proposition 4.10 (Level 1-coding map). There exist a completely positive Borel subset $B_1 \subset X$ and an almost Borel homomorphism $\psi_1$ defined on $X$ and into $\Sigma_*$ with the following properties for almost all $x \in X$:
(1) for all $p \in \mathbb{Z}$, $S^p x \in B_1 \iff p \in S_1(\psi_1(x))$;
(2) the $P_1$-name of $x$ depends only on symbols in 1-filling positions and $M_x$;
(3) the 1-free positions occupy a fraction $> (1 - g_1(x))$ of the interval;
(4) the 1-special positions and the 1-free positions are repetitions of $\emptyset^{T-1}1$;
(5) $\alpha$ as well as any of its 1-modifications is an element of $\Sigma_*$.

We first build the subset $B_1$. We abbreviate $G_{P_k}(\epsilon_k/3, n)$ to $G_k(n)$ for $k, n \ge 1$.

Lemma 4.11. For any invariant almost Borel function $N_0 : X \to \mathbb{N}$, there exists a Borel $B_1 \subset X$ such that for almost all $x \in X$:
(i) $M_x(B_1) > 0$;
(ii) for any $n \ge 1$, $x, S^n x \in B_1$ implies $n \ge N_0(x)$ and $x \in G_1(n)$.

Proof. For $x \in X$, we define $N_1(x)$ to be the smallest integer $n \ge N_0(x)$ such that $M_x(G_1(n)) > 9/10$. Theorem 4.4 ensures that this is well-defined for almost all $x \in X$. Given $n$, $x \mapsto M_x(G_1(n))$ is an almost Borel function, hence so is $N_1$. $N_1(x)$ depends on $x$ only through $M_x$, so it is invariant. We can assume that it is a constant, denoted also $N_1$, by splitting $X$ into countably many invariant Borel subspaces. A subset $B_1$ for the full space will be obtained as the union of the subsets $B_1$ built in each of the subspaces as follows.

Proposition 4.7 with $n = N_1$ and $\delta = 1/10$ gives a Borel subset $B \subset X$ such that, for almost all $x \in X$, $M_x(B \cup \dots \cup S^{N_1 - 1}B) > 9/10$ and $B \cap S^k B \ne \emptyset \implies |k| \ge N_1$, so that:
\[ \frac{8}{10} < M_x\Big(G_1(N_1) \cap \bigcup_{q=0}^{N_1 - 1} S^q B\Big) = \sum_{q=0}^{N_1 - 1} M_x\big(G_1(N_1) \cap S^q B\big). \]
Thus, for almost all $x \in X$, there is an integer $0 \le q < N_1$ such that $M_x(G_1(N_1) \cap S^q B) > \frac{8}{10 N_1}$. This $q$ depends only on $M_x$, so it is an almost Borel and invariant function of $x$. As before, we can assume it to be constant (maybe after splitting $X$). The set $B_1 := G_1(N_1) \cap S^q B$ has the required properties. □

A time of visit of $x \in X$ to $E \subset X$ is an integer $n$ such that $S^n x \in E$.


Proof of Proposition 4.10. Let $N > N_* := [6 \log s / \epsilon_1] + 1$ (so $e^{2\epsilon_1 N/3} > s^4$) be an integer. By definition of $G_1(N)$, for all $\ell \ge N$,
\[ \#\{(P_1)^\ell(x) : x \in G_1(N_*)\} \le \exp\big(\ell(h_{P_1}(x) + \epsilon_1/3)\big) < s^{\ell g_1(x)/T - 4} \le s^{[\ell g_1(x)/T] - 3}. \]
Recall $i_{P,Q,\ell} : G_1(N) \to \mathbb{N}$ from Corollary 4.6: for all $x \in G_1(N)$, $\ell \ge N$, $i_{P_1,\{X\},\ell}(x) \le s^{[\ell g_1(x)/T] - 3}$, so we can set:
\[ c_{1,\ell}(x) = c_{[\ell g_1(x)/T] - 3} \circ i_{P_1,\{X\},\ell}(x) \in \mathcal{W}(s, T) \tag{4.3} \]
where a family of injective maps $c_p : \{1, \dots, s^p\} \to \mathcal{W}_p(s, T)$, $p \ge 1$, has been selected, depending only on $s, T, p$.

We apply Lemma 4.11 to get $B_1$ with minimum return time $> N_0(x) := \max(N_*, 6 \log s / \epsilon_1, T/(1 - g_1(x)) + 1)$. $B_1$ being completely positive, almost all $x$ visit it infinitely many times in the future and in the past. Hence all of $\mathbb{Z}$ is partitioned into (finite) 1-intervals. Finally the length of each interval is at least $N_0(x)$.

To define $\psi_1(x)$ for almost all $x$, we specify $\psi_1(x)|_a^b$ for any two consecutive times $a < b$ of visit to $B_1$. Let $\ell := b - a = qT + r$ with $q \ge 0$, $0 \le r < T$. We set: $\sigma = 1$, $u := [\ell/T] - [\ell g_1(x)/T] + 1$, and:
\[ \psi_1(x)|_a^b = \underbrace{\emptyset^{T-1}|\emptyset^{r}}_{\text{marker}}\ \underbrace{c_{1,\ell}(S^a x)}_{\text{1-filling}}\ \underbrace{\emptyset^{T-1}\sigma}_{\text{1-special}}\ \underbrace{(\emptyset^{T-1}1)^u}_{\text{1-free}} \tag{4.4} \]

To justify the annotations (marker, 1-filling, etc.) compare with Def. 4.8, and observe that $c_{1,\ell}(S^a x)$ and $(\emptyset^{T-1}1)^u$ have the right lengths (use Lemma 4.9). It is now clear that $\psi_1(x) \in \Sigma_*$. The same applies to any level 1-modification of $\psi_1(x)$, which can only replace $\sigma$ by 2 and $(\emptyset^{T-1}1)^u$ by a word from $\mathcal{W}(s, T)$. Claim (5) is proved. The Borel and equivariant character of $\psi_1$ is obvious from its construction. Claims (1), (2), (4) follow, once one observes that the decomposition of each 1-interval into marker, filling, special and free positions is determined by its endpoints. □

4.5. Structure of the coding. We have just seen how to encode orbits with respect to $P_1$. We are going to do it simultaneously with respect to all $P_k$, $k \ge 1$. Obviously we cannot do these encodings independently (and get a finite entropy process), since $h_{P_k}(S, \mu) \to h(S, \mu) > 0$ as $k \to \infty$. Instead we use conditional coding: we assume that $P_1, P_2, \dots$ are finer and finer and remark that we only need to specify which $(P_k, I)$-name occurs given the $(P_{k-1}, I)$-name for a collection of intervals $I$ covering $\mathbb{Z}$. The number of possibilities is bounded by the ratio between the measures of $(P_k, I)$-cylinders and $(P_{k-1}, I)$-cylinders. We will estimate these ratios by the Shannon-McMillan-Breiman theorem in terms of the conditional entropies of $P_k$ with respect to $P_{k-1}$ for all $k \ge 1$.

Of course, the intervals $I$ above have no reason to be uniform in $k \ge 1$ (they cannot be, except in very special cases). To address this, we use nested partitions of $\mathbb{Z}$ into longer and longer intervals. This hierarchical structure will be defined by visits to a sequence of nested, completely positive sets $B_1 \supset B_2 \supset \dots$, generalizing the level 1-coding.

We turn to the details. Recall that positive numbers $\epsilon_1, \epsilon_2, \dots$ have been chosen small enough, see (4.1). We will use the following positive Borel functions, for $k \ge 1$,
\[ g_k(x) := \frac{h_{P_k|P_{k-1}}(S, M_x) + \epsilon_k + \epsilon_{k-1}}{\log s / T}. \]


[Figure 1: a segment of a coding sequence, with the times $n_1, \dots, n_7$ marked as elements of $S_3, S_1, S_1, S_2, S_1, S_2, S_3$ respectively.]

· · · | a1 ↑ b11 | a2 × b12 ↑ c1 | a3 × c2 × · · · | a4 ↑ b21 | a5 × ↑ b22 × | a6 ↑ b3 × · · · | a7 ↑ · · ·

Figure 1. The above is a segment of a coding sequence as in Def. 4.12. The symbols ↑, × or | stand for the blocks $\emptyset^{T-1}2$, $\emptyset^{T-1}1$, or $\emptyset^{T-1}|\emptyset^{t}$, $0 \le t \le T - 1$. The letters a·, b··, c·· stand for words from $\mathcal{W}_q(s, T)$, $q \ge 1$. The filling and special positions are in colors (black, blue, red) corresponding to their level (except for the symbols | which belong to the 1-filling, but are colored according to the maximum $k$ such that their positions belong to $S_k$, and the level 3-free positions which are in green). For instance $n_4 \in S_2 \setminus S_3$ is the left endpoint of a level 2-interval indexed by the word b21 b22 and finishing at $n_6 \in S_2$.

Taking $\epsilon_0 := 0$, $P_0 := \{X\}$, this is compatible with (4.2). These numbers satisfy:
\[ \sum_{k=1}^{K} g_k(x) = \frac{h_{P_K}(S, M_x) + 2\sum_{k=1}^{K-1} \epsilon_k + \epsilon_K}{\log s / T} < 1 \qquad (\forall K \ge 1). \tag{4.5} \]
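For the reader's convenience, here is one way to check (4.5). It is a sketch under two assumptions that are implicit in the text: the partitions are refining, so that the conditional entropies telescope ($h_{P_k|P_{k-1}} = h_{P_k} - h_{P_{k-1}}$ with $P_0 = \{X\}$), and $h_{P_K}(S, M_x) \le h(S)$ for almost all $x$. The bound on the $\epsilon_k$ is the one from (4.1).

```latex
\begin{align*}
\sum_{k=1}^{K} g_k(x)
 &= \frac{\sum_{k=1}^{K} h_{P_k|P_{k-1}}(S, M_x) + \sum_{k=1}^{K} (\epsilon_k + \epsilon_{k-1})}{\log s / T} \\
 &= \frac{h_{P_K}(S, M_x) + 2\sum_{k=1}^{K-1} \epsilon_k + \epsilon_K}{\log s / T}
    && (\epsilon_0 = 0,\ P_0 = \{X\}) \\
 &\le \frac{h(S) + 2\sum_{k \ge 1} \epsilon_k}{\log s / T}
  < \frac{h(S) + \tfrac{1}{2}\delta_h}{\log s / T}
  < \frac{h(S) + \delta_h}{\log s / T} = 1 .
\end{align*}
```

The last equality is just the definition $\delta_h = \log s / T - h(S)$; the factor $\tfrac14$ in (4.1) therefore leaves room to spare.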

Definition 4.12 (Coding Structure). Given positive numbers $g_1, g_2, \dots$ with $\sum_{k \ge 1} g_k < 1$, the coding structure of a sequence $\alpha \in \Sigma_*$ is the following sequence of refining partitions of $\mathbb{Z}$ into intervals. Level 1-intervals and their partitions into marker, 1-filling, 1-special, 1-free positions, and $S_1(\alpha)$, have been defined in Def. 4.8. For $k \ge 2$, the level $k$-structure is defined inductively:
(1) $S_k(\alpha) := \{n \in S_{k-1}(\alpha)$ : the word $\emptyset^{T-1}2$ appears at the level $(k-1)$-special positions inside the level $(k-1)$-interval starting at $n\}$;
(2) $[n, m[$ is a level $k$-interval if $n, m$ are consecutive elements of $S_k(\alpha)$. A level $j$-interval, $j < k$, contained in $[n, m[$ is a level $j$-subinterval. $[n, m[$ is divided into
 (a) filling positions of each $j$-subinterval for $j < k$ and marker positions of each 1-subinterval;
 (b) the first $E_T((m - n)g_k) - T$ level $(k-1)$-free positions, called the level $k$-filling positions;
 (c) the following $T$ level $(k-1)$-free positions, called the level $k$-special positions;
 (d) the remaining $(k-1)$-free positions in $[n, m[$, called the level $k$-free positions.

Definition 4.13. A level $k$-modification of $\alpha \in \Sigma_*$ is a sequence $\tilde{\alpha} \in \mathcal{A}^{\mathbb{Z}}$ such that, for each $k$-interval, the restriction of $\tilde{\alpha}$ to
(i) marker or $j$-filling positions for $j \le k$ or $j$-special positions for $j < k$ coincides with that of $\alpha$;
(ii) $k$-special positions is either $\emptyset^{T-1}1$ or $\emptyset^{T-1}2$;
(iii) $k$-free positions is a word from $\mathcal{W}(s, T)$.

A straightforward induction gives:

Lemma 4.14. If $\beta$ is a $k$-modification of $\alpha$, then $S_k(\beta) = S_k(\alpha)$.

Definition 4.15. A synchronized $T$-block in $[n, m[$ with respect to some $\alpha \in \Sigma_*$ is an interval $[p' - (a + 1)T, p' - aT[ \, \subset \, ]p, p'[$ where $a \in \mathbb{Z}$ and $[p, p'[$ is some 1-interval contained in $[n, m[$.


4.6. Coding Map.

Proposition 4.16 (All-level coding). There exist completely positive Borel sets $B_1 \supset B_2 \supset \dots$ and almost Borel homomorphisms $\psi_1, \psi_2, \dots : X \to \Sigma_*$ with the following properties. For almost all $x \in X$, let $g_k := g_k(x)$ and $\alpha_k := \psi_k(x)$ for each $k \ge 1$ and consider the coding structure from Def. 4.12. For all $j = 1, \dots, k$ and almost all $x \in X$:
(1) $\alpha_k$ is a $j$-modification of $\alpha_j$. More precisely $\alpha_k$ and $\alpha_{k-1}$ only differ at $k$-filling and $(k-1)$-special positions;
(2) $n \in S_j(\alpha_k) \iff S^n x \in B_j$ and $-\inf S_j(\alpha) = \sup S_j(\alpha) = \infty$.
For each $k$-interval $[n, m[$ in $\alpha_k$:
(3) the $(k-1)$-special positions are occupied in $\alpha_k$ by $\emptyset^{T-1}2$ in the first $(k-1)$-subinterval and by $\emptyset^{T-1}1$ in the other $(k-1)$-subintervals;
(4) the restriction of $\alpha_k$ to the $k$-filling positions in $[n, m[$ is the word $c_{k,m-n}(S^n x)$. Moreover, this word determines the $(P_k, [n, m[)$-name of $x$ given its $(P_{k-1}, \mathbb{Z})$-name;
(5) the number of $k$-free positions in $[n, m[$ is greater than $(m - n)(1 - g_1(x) - \dots - g_k(x))$. Moreover the set of those positions is a disjoint union of synchronized $T$-blocks, each one carrying the word $\emptyset^{T-1}1$;
(6) $\alpha_k$ and any of its $k$-modifications belong to $\Sigma_*$.

This somewhat technical statement (useful for its proof by induction) will yield:

Corollary 4.17. There exists an almost Borel homomorphism $\psi : X \to \Sigma_*$ with the following injectivity property. There is a null set $X_0$ such that, for all $x, y \in X \setminus X_0$, if $M_x = M_y$ and $\psi(x) = \psi(y)$, then $x = y$.

Proof of Corollary 4.17. To define $\psi : X \to \Sigma_*$, we restrict to a full set $X_1$ on which all $\psi_k(x)$ are well-defined and belong to $\Sigma_*$. Let $p \in \mathbb{Z}$ and $x \in X_1$. We claim that $k \mapsto \psi_k(x)|_p$ is constant for $k \ge k(x, p)$ for some integer $k(x, p)$. Indeed, $\psi_{k+1}(x)|_p \ne \psi_k(x)|_p$ implies that $p$ is $(k-1)$-special or a $k$-filling. But this can happen only once for a given $p$. Thus we can set $\psi(x)|_p := \psi_{k(x,p)}(x)|_p$. To see that $\psi(x) \in \Sigma_*$, consider any 1-interval $[a, b[$ defined by $\psi_1(x)$. Now, $\psi(x)|_{[a, b[} = \psi_K(x)|_{[a, b[}$ for $K = \max_{a \le p < b} k(x, p)$ […]

\[ N_{k+1}(x) := \min\Big\{ n \ge \max(N \dots) \;:\; \dots > 1 - \tfrac{1}{10} M_x(B_k) \Big\}. \]

By Theorem 4.4, $N_{k+1}$ is finite for almost all $x$. It is invariant, Borel and takes countably many values, hence we can assume it to be constant (similarly as in the proof of Lemma 4.11). Proposition 4.7 for $n = N_{k+1}$ gives a tower with basis $B \subset X$, height $N_{k+1}$ and $M_x$-measure at least $1 - M_x(B_k)/10$. It follows that there is an integer $0 \le q < N_{k+1}$ (depending only on $M_x$) such that
\[ M_x\big(B_k \cap G_{k+1}(N_{k+1}) \cap S^q(B)\big) > \frac{9}{10 N_{k+1}} M_x(B_k) > 0. \]
As before we can assume $q$ to be constant and set: $B_{k+1} := B_k \cap G_{k+1}(N_{k+1}) \cap S^q(B)$. To conclude, observe that $x, S^n x \in B_{k+1}$ implies that $n \ge N_{k+1} \ge N_k$ so $x \in G_{k+1}(N_{k+1}) \cap G_k(N_k) \subset G_{k+1}(n) \cap G_k(n)$, proving eq. (4.6). □

Proof of Proposition 4.16. We will need the following encodings generalizing $c_{1,\ell}$ from eq. (4.3). For $k, \ell \ge 1$, we define $c_{k,\ell} : X \to \mathcal{W}(s, T)$ by:
\[ c_{k,\ell}(x) = c_{[\ell g_k(x)/T] - 1} \circ i_{P_k,P_{k-1},\ell}(x) \qquad (P_0 := \{X\}). \tag{4.7} \]

Here $c_p$, $p \in \mathbb{N}$, is the same as in (4.3). In particular $c_{k,\ell}(x)$ characterizes $(P_k)^\ell(x)$ given $(P_{k-1})^\ell(x)$ if $i_{P_k,P_{k-1},\ell}(x) < s^{[\ell g_k(x)/T] - 1}$. We will also need the sets $B_1 \supset B_2 \supset \dots$ from Lemma 4.18 with parameters:
\[ \bar{N}_k(x) = 3T/\epsilon_k + 1. \tag{4.8} \]
We assume by induction that, for all $1 \le j \le k$, an almost Borel map $\psi_j : X \to \Sigma_*$ has been defined satisfying all the claims (1)-(6) (note that these hold for $k = 1$ by Proposition 4.10). We build $\alpha_{k+1}$ as a $k$-modification of $\alpha_k$.

We call (provisionally) *-intervals the integer intervals $[n, m[$ such that $n < m$ are consecutive times of visit of $x$ to $B_{k+1}$. As $B_{k+1} \subset B_j$, any *-interval is a union of $j$-subintervals for $1 \le j \le k$ (see Claim 2). Inside each *-interval $[n, m[$:
(M1) we write the word $\emptyset^{T-1}2$ into the $k$-special positions in the first $k$-subinterval of the *-interval (and keep $\emptyset^{T-1}1$ in the $k$-special positions in the other $k$-subintervals);
(M2) we write the word $c_{k+1,m-n}(S^n x)$ of length $E_T(g_{k+1}(x)(m - n)) - T$ into the first $k$-free positions in $[n, m[$.

This is possible: for (M1), this is clear; for (M2), we use Claim (5) for level $k$: each $k$-subinterval $I_i$ contains more than $|I_i|(1 - g_1(x) - \dots - g_k(x))$ $k$-free positions. Summing over those subintervals, we see that $[n, m[$ contains at least $(m - n)(1 - g_1(x) - \dots - g_k(x))$ level $k$-free positions. But (M2) requires only (less than) $(m - n)g_{k+1}(x)$, which is less since $\sum_k g_k(x) < 1$.

We now check that this $\alpha_{k+1}$ satisfies the claims for $k + 1$. Claim (1) obviously holds from the construction and the fact that a $k$-modification of a $j$-modification is still a $j$-modification.

Claim (2) holds for $j \le k$ because of the same Claim for $k$. For $j = k + 1$, let $S^n x \in B_{k+1} \subset B_k$. By the induction hypothesis, $n \in S_k(\alpha_k)$, but this is the same as $S_k(\alpha_{k+1})$ by Lemma 4.14. Since $B_{k+1} \subset B_k$, the $k$-interval $I$ starting at $n$ is a subinterval in a unique *-interval. By the point (M1), $\emptyset^{T-1}2$ appears at the $k$-special positions in $I$: $n \in S_{k+1}(\alpha_{k+1})$. The converse follows from the same point (M1). Claim (2) holds. In particular, *-intervals coincide with $(k + 1)$-intervals, see Def. 4.12.

Now, Claim (3) and the first half of (4) are obvious consequences of the modifications (M1)-(M2). Set $x' := S^n x$ and $\ell := m - n$. For the second half of (4) it suffices to show:
\[ i_{P_{k+1},P_k,\ell}(x') < s^{\ell g_{k+1}(x)/T - 2} < s^{[\ell g_{k+1}(x)/T] - 1}. \tag{4.9} \]

By construction, $x', S^\ell x' \in B_{k+1}$ implies $x' \in G_k(\ell) \cap G_{k+1}(\ell)$. Therefore,
\[ \frac{M_x(P_{k+1}^\ell(x'))}{M_x(P_k^\ell(x'))} \ge \frac{\exp\big(-\ell h_{k+1}(x) - \ell\epsilon_{k+1}/3\big)}{\exp\big(-\ell h_k(x) + \ell\epsilon_k/3\big)}. \]
Hence, using Corollary 4.6:
\[ i_{P_{k+1},P_k,\ell}(x') \le \exp\Big(\ell\,\frac{\log s}{T}\cdot\frac{h_{P_{k+1}|P_k}(x) + \epsilon_{k+1}/3 + \epsilon_k/3}{\log s / T}\Big) \le s^{(\ell/T)(g_{k+1}(x) - (2/3)(\epsilon_{k+1} + \epsilon_k))} < s^{\ell g_{k+1}(x)/T - 2}, \]
since $\ell \ge N_{k+1}(x) > 3T/\epsilon_{k+1}$, concluding the proof of (4.9) and thus of Claim (4).

To prove (5), observe that the number of $(k + 1)$-free positions in $[n, m[$ (obtained by subtracting the $(k + 1)$-special and $(k + 1)$-filling positions) is larger than:
\[ (m - n)(1 - g_1(x) - \dots - g_k(x)) - (T + E_T(g_{k+1}(x)(m - n)) - T) > (m - n)(1 - g_1(x) - \dots - g_k(x) - g_{k+1}(x)). \]
Furthermore, as we wrote words from $\mathcal{S}(s, T)$ into the synchronized $T$-blocks constituting the $k$-free positions, the $(k + 1)$-free positions are still a union of $T$-blocks carrying the word $\emptyset^{T-1}1$. Thus (5) holds for $k + 1$.

For Claim (6), we observe that $\alpha_k \in \Sigma_*$ and that the successive changes to produce $\alpha_{k+1}$ only involve replacing synchronized $T$-blocks by elements of $\mathcal{S}(s, T)$. But this operation leaves $\Sigma_*$ unchanged. This also applies to any $(k + 1)$-modification. Claim (6) holds for $k + 1$. The induction is complete. □

4.7. Proof of Theorem 4.2. We collect our results to prove the Borel version of Krieger's embedding theorem. Let $\phi : X \to \{0, 1\}^{\mathbb{N}}$ be the invariant, almost Borel map from Lemma 4.3. As $\log s / T > h(S)$ we can apply Corollary 4.17 and get an equivariant, almost Borel map $\psi : X \to \Sigma_*$. To see that these maps satisfy the claims of Theorem 4.2, it is enough to remark that, for all $x, y$ outside of an almost null set, $\phi(x) = \phi(y)$ implies that $M(x)$ and $M(y)$ are well-defined and equal. Therefore, $(\phi \times \psi)(x) = (\phi \times \psi)(y)$ implies that $M(x) = M(y)$ and $\psi(x) = \psi(y)$ and thus $x = y$.

5. Hochman's Embedding Theorem

We deduce Hochman's Theorem 1.2 from the Borel variant of Krieger's embedding theorem.


5.1. Embedding into $\Sigma'_*(s, T, N)$ with $\frac{1}{T} \log s > h(X)$. We first prove the Embedding Theorem assuming a uniform entropy gap and embedding into a specific class of SFTs.

Definition 5.1. Let $\Sigma'(s, T, N)$, $\Sigma'_*(s, T, N)$, $\mathcal{W}'(s, T, N)$, and $\widetilde{\mathcal{W}}'(s, T, N)$ be defined as $\Sigma(s, T, N)$, $\Sigma_*(s, T, N)$, etc. in Def. 4.1, but replacing $\mathcal{T}(s, T)$ by $\mathcal{T}'(s, T) := \{\emptyset^{T-2}\sigma|\emptyset^{r} : 0 \le r < T \text{ and } \sigma = \emptyset \text{ or } 1\}$.

We often write $\Sigma'_*$ instead of $\Sigma'_*(s, T, N)$. Note that $\Sigma'_*$ is invariant but not closed. It is included in the mixing SFT $\Sigma'(s, T, N)$ with $h_{\mathrm{top}}(\Sigma'(s, T, N)) \le h_{\mathrm{top}}(\Sigma(s, T, N)) + \frac{1}{N} \log 2$.

Theorem 5.2. Given a Borel system $(X, S)$, there exists an almost Borel embedding $\Psi : (X, S) \to \Sigma'_*(s, T, N)$ for all $(s, T, N)$ such that $\frac{1}{T} \log s > h(S)$ and $N$ is large enough.

To deduce this from Theorem 4.2, we will replace the invariant map $\phi : X \to \{0, 1\}^{\mathbb{N}}$ by an equivariant modification of $\psi : X \to \Sigma_*$ obtained by putting symbols 1 just before the symbols $|$ at times of visit to a carefully built Borel subset $B_*$ of the Borel set $\psi^{-1}([\emptyset|])$.

5.2. Borel construction of a set with given measure. The above subset $B_*$ will be defined using the following:

Lemma 5.3. Let $Y$ be a standard Borel space and let $\mathbb{P}'(Y)$ be the set of atomless Borel probability measures on $Y$. Then $\mathbb{P}'(Y)$ is a Borel subset of $\mathbb{P}(Y)$ and there exists a Borel function $F : [0, 1] \times \mathbb{P}'(Y) \times Y \to \{0, 1\}$ such that, for all $t \in [0, 1]$, $\mu \in \mathbb{P}'(Y)$, $\mu(\{y \in Y : F(t, \mu, y) = 1\}) = t$.

Remark 5.4. Hochman uses another idea which avoids such an explicit construction.

Proof. Recall the generating sequence of finite Borel partitions $P_1, P_2, \dots$. To begin with, observe that $\mu \in \mathbb{P}(Y)$ has an atom if and only if there is some $\epsilon > 0$ such that for every $k$, $P_k$ has an element with measure larger than $\epsilon$. As $\mu \mapsto \mu(A)$ is Borel for any Borel $A \subset Y$, $\mathbb{P}'(Y)$ is Borel.

For each $k \ge 1$, let $\sigma(P_k)$ be the finite algebra generated by $P_k$. Endow $\sigma(P_k)$ with some total order. For each $\mu \in \mathbb{P}'(Y)$, $t \in [0, 1]$, let $A_n^{\mu,t} \subset Y$, $n \ge 0$, be the sequence of Borel subsets defined inductively as: $A_0^{\mu,t} = Y$ and, for all $n \ge 1$, $A_n^{\mu,t}$ is the first element $A$ of $\sigma(P_n)$ (for its chosen ordering) such that
(1) $A \subset A_{n-1}^{\mu,t}$;
(2) $\mu(A) \ge t$;
(3) $\mu(A) \le \mu(B)$ for all $B \in \sigma(P_n)$ satisfying (1) and (2).
The consideration of $A = A_{n-1}^{\mu,t}$ and the finiteness of $\sigma(P_n)$ show that $A_n^{\mu,t}$ as above always exists.

As $\mu \mapsto \mu(A)$ is Borel for each $A \in \sigma(P_n)$, there is a finite Borel partition of $[0, 1] \times \mathbb{P}'(Y)$, on each element $E$ of which $A_n^{\mu,t}$ is constant. Hence each $F_n : E \times Y \to \{0, 1\}$, $(t, \mu, x) \mapsto 1_{A_n^{\mu,t}}(x)$, is Borel in $E \times Y \subset [0, 1] \times \mathbb{P}'(Y) \times Y$. Therefore, $F_n : [0, 1] \times \mathbb{P}'(Y) \times Y \to \{0, 1\}$ is Borel. Define $F : [0, 1] \times \mathbb{P}'(Y) \times Y \to \{0, 1\}$ by $F(t, \mu, x) := \inf_{n \ge 1} F_n(t, \mu, x)$. $F$ is Borel as a pointwise limit of Borel functions. Let $A^{\mu,t} := \{x \in Y : F(t, \mu, x) = 1\}$.
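The inductive choice of the sets $A_n^{\mu,t}$ can be made concrete on a toy example. The following sketch is an illustration under assumptions of ours, not the construction used later in the text: $Y = [0, 1)$, $P_n$ is the partition into dyadic intervals of level $n$, $\mu$ is the Bernoulli(p) measure on binary digits (which is atomless), elements of $\sigma(P_n)$ are represented by tuples of cell indices, and the "chosen ordering" is taken to be lexicographic order on these tuples. The search is brute force and only meant for small depth.

```python
from fractions import Fraction
from itertools import combinations

def cell_measure(index, level, p):
    """mu of the dyadic cell [index/2^level, (index+1)/2^level) for the Bernoulli(p) measure."""
    ones = bin(index).count("1")
    return p**ones * (1 - p)**(level - ones)

def next_set(prev_cells, level, p, t):
    """Least-measure union of level-`level` cells inside the previous set with measure >= t.

    Ties are broken by lexicographic order on the tuple of cell indices.
    """
    children = sorted(c for i in prev_cells for c in (2 * i, 2 * i + 1))
    best = None
    for r in range(len(children) + 1):
        for subset in combinations(children, r):
            m = sum((cell_measure(c, level, p) for c in subset), Fraction(0))
            if m >= t and (best is None or (m, subset) < best):
                best = (m, subset)
    return best

def nested_sets(p, t, depth):
    """The pairs (mu(A_n), A_n) for n = 1, ..., depth, starting from A_0 = [0, 1)."""
    cells, out = (0,), []
    for level in range(1, depth + 1):
        m, cells = next_set(cells, level, p, t)
        out.append((m, cells))
    return out

p, t = Fraction(1, 3), Fraction(2, 5)    # hypothetical parameters
sets = nested_sets(p, t, 4)
measures = [m for m, _ in sets]
```

By construction the sets are nested and the values in `measures` are non-increasing and bounded below by `t`; the remainder of the proof shows that, for an atomless measure, they actually decrease to `t`.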


Fix some $(t, \mu) \in [0, 1] \times \mathbb{P}'(Y)$. By dominated convergence, $\mu(F(t, \mu, \cdot)) = \lim_{n \to \infty} \mu(F_n(t, \mu, \cdot)) \ge t$. Assume by contradiction that $\mu(A_n^{\mu,t}) \ge t + \epsilon$ for some $\epsilon > 0$ and all $n \ge 1$. Doob's martingale convergence theorem implies that, for all large $n$, there is $B_n \in P_n$ with $\mu(B_n \cap A^{\mu,t}) > \frac{1}{2}\mu(B_n) > 0$. The measure $\mu$ being atomless, $\mu(B_n) < \epsilon$ for all large $n$. Thus, $A_n^{\mu,t} \setminus B_n \in \sigma(P_n)$ and $t \le \mu(A_n^{\mu,t} \setminus B_n) \le \mu(A_n^{\mu,t}) - \mu(B_n)/2$, contradicting the definition of $A_n^{\mu,t}$. □

5.3. Proof of the restricted theorem. Let $\psi : X \to \Sigma_*$ and $\phi : X \to \{0, 1\}^{\mathbb{N}}$ be the two almost Borel maps from Theorem 4.2. Let $t : X \to [0, 1]$ be the map defined by:
\[ t(x) := \sum_{n \ge 1} 2 \cdot 3^{-n}\, \phi(x)|_n. \]
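The formula for $t$ is the usual embedding of binary sequences into the middle-third Cantor set, so that $\phi(x)$ can be read back from $t(x)$ digit by digit. The sketch below checks this on finite words with exact rational arithmetic; the truncation to words of length 8 and the function names are assumptions of the example.

```python
from fractions import Fraction
from itertools import product

def encode(bits):
    """t = sum_{n >= 1} 2 * 3^{-n} * bits[n-1], computed exactly."""
    return sum((Fraction(2 * b, 3**n) for n, b in enumerate(bits, start=1)), Fraction(0))

def decode(t, length):
    """Recover the first `length` bits from t by reading its ternary digits (each is 0 or 2)."""
    bits = []
    for _ in range(length):
        t *= 3
        digit = int(t)          # integer part: 0 or 2 for values produced by encode
        bits.append(digit // 2)
        t -= digit
    return bits

words = list(product((0, 1), repeat=8))
values = {encode(w) for w in words}    # distinct words give distinct values
```

Decoding works because, after multiplying by 3, the contribution of the remaining digits is strictly less than 1 for finite words, so the integer part is exactly twice the first bit.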

Exercise 5.5. Show that $t : X \to [0,1]$ is almost Borel, injective and invariant.

Lemma 5.3 applied to the Borel space $Y = B_1$ gives a Borel function $F_1 : [0,1] \times P(Y) \times Y \to \{0,1\}$. We define $F : [0,1] \times P_{\mathrm{erg}}(S) \times X \to \{0,1\}$ by
\[ F(t, \mu, x) := \begin{cases} F_1\bigl(t, \mu(\cdot \cap B_1)/\mu(B_1), x\bigr) & \text{if } x \in B_1, \\ 0 & \text{otherwise,} \end{cases} \]
so $\mu(F(t, \mu, \cdot)) = \mu(B_1) \cdot t$. This is well-defined since $B_1$ is completely positive. Let
\[ B_* := \{x \in X : F(t(x), M(x), x) = 1\}. \]
Note that $M_x(B_*) = t(x) M_x(B_1)$ for almost all $x \in X$ since, for any $\mu \in P_{\mathrm{erg}}(S)$ and $\mu$-a.e. $x$, $M(x) = \mu$ and $t(x)$ is equal to some constant $t_*$, hence $\mu(B_*) = \mu(F(t_*, \mu, \cdot))$.

Let $\phi : X \to \{0,1\}^{\mathbb N}$ and $\psi : X \to \Sigma_*$ be given by Theorem 4.2. Let $\psi' : X \to \Sigma_*$ be defined by:
\[ \psi'(x)|_p = \begin{cases} 1 & \text{if } S^{-p+T-2} x \in B_*, \\ \psi(x)|_p & \text{otherwise.} \end{cases} \]
Observe that $\psi'$ is Borel because each coordinate is. It is equivariant because $\psi$ is. Also, $\psi'(x) \in \Sigma_*$ since $\psi(x) \in \Sigma_*$ and the modifications only turn a symbol $\emptyset$ into a symbol $1$ just before a symbol $|$.

It remains to prove the injectivity of $\psi'$. It suffices to show that $\psi'(x)$ determines $t(x)$, since $\phi \times \psi$ is injective by Theorem 4.2. By Birkhoff's ergodic theorem, for almost all $x \in X$, the following limits
\[ \lim_{n\to\infty} \frac1n \#\{0 \le k < n : \psi'(x)|_{k+T-2}^{k+T} = 1|\} \quad\text{and}\quad \lim_{n\to\infty} \frac1n \#\{0 \le k < n : \psi'(x)|_{k+T-1}^{k+T} = |\} \]
exist and are equal to $M_x(B_*) = t(x) M_x(B_1)$ and $M_x(B_1)$. As $M_x(B_1) > 0$, this shows that $\psi'(x)$ determines $t(x)$ and hence $M_x$. Theorem 5.2 follows.

5.4. Embedding into a given mixing SFT. The next step is:

Lemma 5.6. Let $\Sigma$ be a mixing SFT. Let $s \ge 2$, $T \ge 1$ be integers with $\frac1T \log s < h(\Sigma)$. For all large enough $N$, there is an almost Borel embedding of $\Sigma_*(s, T, N)$ into $\Sigma$.

The following is a variant of a standard tool of symbolic dynamics (see [29]):

Lemma 5.7. Let $X$ be a one-step mixing SFT with $h_{top}(X) > 0$. If $h < h_{top}(X)$, there exist a finite $X$-word $w$ defining $X_w := \bigcap_{n \in \mathbb Z} \sigma^{-n}(X \setminus [w]_X)$ and a positive integer $M$ such that:
(1) no two occurrences of $w$ in $X$ can overlap, i.e., for any $0 < k < |w|$, $[w]_X \cap \sigma^{-k}[w]_X = \emptyset$;


(2) for each pair of words $u, v$ in $X_w$ of lengths $\ge |w|$, there are $M$-words $s, t$ on $X$ such that $uswtv$ is a word on $X$ and, for $0 \le k < |uswtv|$, $[uswtv]_X \cap \sigma^{-k}[w]_X \ne \emptyset \implies k = |us|$;
(3) $X_w$ is a mixing SFT;
(4) $h_{top}(X_w) > h$
(we call a word satisfying (1) and (2) a marker).

Proof. As $h_{top}(X) > 0$, $X$ is not reduced to a single periodic orbit and there must exist $a = a_0 \dots a_{p-1}$, $b = b_0 \dots b_{q-1}$ such that, setting $a' := a_{p'} \dots a_{p-1}$ for some $1 \le p' \le p$, $aaba'$ is an $X$-word and $a_0, \dots, a_{p-1}, b_0, \dots, b_{q-1}$ are pairwise distinct symbols. For a positive multiple $m$ of $|ba'a^2|$, let:
\[ w := ba'(ba'a)^m \quad\text{and}\quad \tilde w := ba'(ba'a^2)^{m|ba'a|/|ba'a^2|}. \]
Note, $|w| = |\tilde w|$. We claim that for $m$ large enough, $w$ has the required properties.

We prove (1) by contradiction. We must have: $w_k = w_0 = b_0$, so: $0 < k \le |w| - |ba'a|$ so $ba'a_0 = ba'b_0$. As $a_0 \ne b_0$, this is a contradiction. Similarly,
\[ (*) \qquad [w] \cap \sigma^{-k}[\tilde w] = [\tilde w] \cap \sigma^{-k}[w] = [\tilde w] \cap \sigma^{-k}[\tilde w] = \emptyset \quad\text{for all } 0 < k < |w|. \]
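The non-overlap property (1) can be tested mechanically on a concrete word of the form $w = ba'(ba'a)^m$. The sketch below is only an illustration: the symbols `"0"`, `"1"`, `"2"`, the choice $a = 01$, $a' = 1$, $b = 2$, $m = 3$ and the helper names are assumptions made for the example, not data from the proof.

```python
def self_overlaps(w):
    """Return the shifts 0 < k < len(w) at which w can overlap a copy of itself."""
    n = len(w)
    return [k for k in range(1, n) if w[k:] == w[:n - k]]

def marker(a, a_suffix, b, m):
    """Build the word b a' (b a' a)^m from words given as strings."""
    return b + a_suffix + (b + a_suffix + a) * m

w = marker(a="01", a_suffix="1", b="2", m=3)
print(w, self_overlaps(w))
```

For comparison, a word such as `"abab"` overlaps itself at shift 2, so it could not serve as a marker.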

The mixing of $X$ gives an integer $M_1$ such that for any two symbols $\alpha, \beta$ of $X$ and any $n \ge M_1$ there is an $X$-word $s$ of length $n$ such that $\alpha s \beta$ is an $X$-word. We assume that $m$ is large enough so that $|w| > M := M_1 + 2|a| + |b|$.

To prove Claim (2) let $u, v$ be $X_w$-words of lengths $\ge |w|$. We consider overlaps involving $v$. Note $v \in X_w$ so $k < |uswt|$. Let $j \ge 0$ be the largest integer such that $v_0 \dots v_{j-1} = w_{-j} \dots w_{-1}$.
First case: if $j = 0$, then no overlap is possible so any $v$ as above fulfills the Claim.
Second case: if $j < |a'(ba'a)^m|$ and $v_0 = b_i$, we set $c = a^2 b_0 \dots b_{i-1}$ ($c := a^2$ if $i = 0$) so $cv$ is an $X$-word. Pick a word $\tilde t$ such that $w \tilde t c$ is an $X$-word. The same argument as in (1) shows that $w$ cannot overlap with $tv$ for $t := \tilde t c$.
Third case: if $j < |a'(ba'a)^m|$ and $v_0 = a_i$, this is entirely similar to the above, using $c = a a_0 \dots a_{i-1}$.
Fourth case: if $j \ge |a'(ba'a)^m|$, remark that $v \in X_w$ gives a word $c$ of length $|w| - j \le |b|$ such that $c v_0 \dots v_{j-1} = w$. One concludes as before.
Thus, for all $v$, one can find $t$ excluding any overlap with $k > |usw|$. The range $|us| \le k < |usw|$ is forbidden by Claim (1). Similarly, one can find $s$ depending on $u$ excluding any overlap with $k < |us|$. Claim (2) is proved.

We turn to Claim (3). $X_w$ is defined from $X$ by forbidding a single word, hence it is a (possibly multi-step) SFT. The mixing will follow if, for any two $X_w$-words $u, v$ and integer $n \ge 2M + |w|$, there is a word $t$ of length $n$ such that $utv$ is an $X_w$-word. Claim (2) gives a word $swt$ with length $n$ such that $u(swt)v$ is an $X$-word where $w$ only appears in the obvious place. Now $u(s \tilde w t)v$ is an $X$-word in which $w$ cannot appear: otherwise, by Claim (2) it would overlap $\tilde w$, but this is not possible by the property $(*)$ above.

To prove Claim (4), associate to each $X$-word $u$ of length $n$ an $X_w$-word by replacing each occurrence of $w$ in $u$ with a copy of $\tilde w$ and trimming the first and last $|w|$ symbols. The non-overlapping properties of $w$ and $\tilde w$ with respect to themselves and with respect to each other show that this map is at most $\#A^{2|w|} \cdot 2^{n/|w|}$-to-1. It follows that
\[ h_{top}(X) \le h_{top}(X_w) + \frac{\log 2}{|w|}. \]
For $|w|$ large enough, $\frac{\log 2}{|w|} < h_{top}(X) - h$, and therefore $h_{top}(X_w) > h$.
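The idea behind Claim (4), that forbidding one long word costs little entropy, can be seen numerically in a much simpler situation than the one of the lemma. The sketch below assumes the full shift on two symbols and forbids a run of `L` consecutive 1s (a stand-in chosen for this example, not the marker $w$ of the proof); the function names are hypothetical. The growth rate $\frac1n \log \#\{\text{admissible words of length } n\}$ approaches $\log 2$ as `L` grows.

```python
from math import log

def count_avoiding_run(n, L):
    """Count binary words of length n that do not contain L consecutive 1s."""
    # state[r] = number of admissible words ending in exactly r ones (0 <= r < L)
    state = [1] + [0] * (L - 1)            # start from the empty word
    for _ in range(n):
        # appending 0 resets the run; appending 1 extends it by one
        state = [sum(state)] + state[:-1]
    return sum(state)

def growth_rate(n, L):
    """Finite-n estimate of the topological entropy of the restricted shift."""
    return log(count_avoiding_run(n, L)) / n

for L in (2, 5, 10):
    print(L, round(growth_rate(200, L), 4), "vs log 2 =", round(log(2), 4))
```

With `L = 2` this counts words of the golden mean shift, whose numbers follow the Fibonacci sequence.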

Proof of Lemma 5.6. Lemma 5.7 applied to $\Sigma$ and $h_2$ gives a marker $w$ and an integer $M$. Since $h_{top}(\Sigma_w) > h_2$, there exist numbers $C_1, \epsilon > 0$ and $L_1 < \infty$ such that, for any $\ell \ge L_1$, there exist $N(\ell) \ge C_1 e^{(h_2+\epsilon)\ell}$ distinct $\Sigma_w$-words of length $\ell$: $u^{\ell,1}, \dots, u^{\ell,N(\ell)}$. By Claim (2) of Lemma 5.7, there are pairs $t^{\ell,i}, s^{\ell,i}$ of $\Sigma$-words of length $M$ such that $w s^{\ell,i} u^{\ell,i} t^{\ell,i} w$ is a $\Sigma$-word for $i = 1, \dots, N(\ell)$. There is $N_* < \infty$ such that for all $\ell \ge N_*$, writing $\ell' := \ell - 2M - |w|$, the following words of length $\ell$ are pairwise distinct:
\[ v^{\ell,i} := w\, s^{\ell',i}\, u^{\ell',i}\, t^{\ell',i}, \qquad i = 1, \dots, [e^{h_2 \ell}], \]
as $N(\ell - 2M - |w|) \ge C_1 e^{-h_2(|w|+2M)} e^{\epsilon \ell'} \cdot e^{h_2 \ell}$.

For $N \ge N_*$, we define the embedding $i : \Sigma_*(s, T, N) \to \Sigma$ as follows. For $\alpha \in \Sigma_*(s, T, N_*)$, we define:
\[ \{\dots < p_{-1} < 0 \le p_0 < p_1 < \dots\} := \{p \in \mathbb Z : \alpha|_p^{p+T} = \emptyset^{T-2}\sigma|,\ \sigma = \emptyset, 1\}. \]
For each $n \in \mathbb Z$, the word $\alpha|_{p_n}^{p_{n+1}}$ belongs to $W(s, T)$. The number of words of length $\ell$ in this latter set is at most $2 s^{[\ell/T]-1} \le s^{\ell/T}$. Indeed, if $\ell = qT + r$ with $0 \le r < T$, $q \in \mathbb N \cup \{0\}$, then the first $T + r$ symbols are $\emptyset^{T-2}\sigma|\emptyset^r$ with $\sigma = \emptyset$ or $1$ and the remainder is a concatenation of words from $S(s, T)$. For each $\ell \ge N$, we fix an enumeration
\[ \{w_1, \dots, w_{n(\ell)}\} = W(s, T) \cap A^\ell \]
and we define $i(\alpha) := \beta$ where $\beta|_{p_n}^{p_{n+1}} := v^{p_{n+1}-p_n,\, j}$ if $\alpha|_{p_n}^{p_{n+1}} = w_j$. Obviously the map $i$ is Borel and equivariant. To see that $i : \Sigma_*(s, T, N) \to \Sigma$ is injective, it suffices to see that the marker $w$ occurs in $i(\alpha)$ exactly at positions $p_n$, $n \in \mathbb Z$. This follows from Claim (2) of Lemma 5.7.
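The counting bound $2s^{[\ell/T]-1} \le s^{\ell/T}$ from the proof of Lemma 5.6 is a one-line check, written out here as a routine verification under the standing assumption $s \ge 2$ (and $\ell \ge T$, so that the exponent $[\ell/T]-1$ is nonnegative):

```latex
\[
2\, s^{[\ell/T]-1} \;\le\; s \cdot s^{[\ell/T]-1} \;=\; s^{[\ell/T]} \;\le\; s^{\ell/T},
\qquad\text{since } 2 \le s \text{ and } [\ell/T] \le \ell/T .
\]
```

In other words, the number of possible blocks between consecutive marked positions grows at exponential rate at most $\frac1T \log s$ in the block length $\ell$.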

5.5. General case. We now prove Hochman's embedding Theorem 1.2. Recall the class $B(h)$ of Borel systems from Sec. 3.1.

Proof of Theorem 1.2. By Lemma 3.6, it suffices to show that, for any $h < h(\Sigma)$, any Borel system $(X, S)$ from $B(h)$ has an almost Borel embedding into $\Sigma$. By eq. (2.5), there is a mixing SFT $\bar\Sigma \subset \Sigma$ with $h_{top}(\bar\Sigma) > h$. Fix integers $s \ge 2$, $T \ge 1$ such that $h < \frac1T \log s < h_{top}(\bar\Sigma)$. Theorem 5.2 gives an almost Borel embedding $\psi : X \to \Sigma_*(s, T, N)$ for all large $N$. Lemma 5.6 gives an almost Borel embedding $i : \Sigma_*(s, T, N) \to \bar\Sigma$, for all large $N$. Thus, picking one such large integer $N$, $\Psi := i \circ \psi : X \to \Sigma$ is an almost Borel embedding.

6. Katok's theorem with periods

In this section, we recall the classical theorem by Katok approximating nontrivial hyperbolic measures by horseshoes and supplement it by relating the periods of the measures and those of the horseshoes. In particular a mixing hyperbolic measure is approximated by mixing horseshoes. Combining this with Hochman's Theorem 1.2, we will obtain Theorem 1.7 and then Corollaries 1.8, 1.10, and 1.11.


6.1. Periods of an ergodic system. Recall the notion of periods and maximal period of an ergodic system from (1.1).

Exercise 6.1. An integer $p \ge 1$ is a period of $(T, \mu)$ if and only if there exist Borel subsets $X_0, \dots, X_{p-1}$ such that $T X_i = X_{i+1}$ (here $X_p := X_0$) and $\{X_0, \dots, X_{p-1}\}$ is a partition of $X$ (both assertions modulo $\mu$). Check that $(T, \mu)$ totally ergodic implies that only 1 is a period. Prove the converse.

Exercise 6.2. Check that any positive divisor of a period of $(T, \mu)$ is again a period of $(T, \mu)$. Give examples of ergodic measure-preserving systems $(T, \mu)$, the set of periods of which coincides with (1) $\{2^n : n \in \mathbb N \cup \{0\}\}$; (2) $\{2^p 3^q : p, q \in \mathbb N \cup \{0\}, q \le 10\}$. Characterize the sets of periods of ergodic systems among the subsets of $\mathbb N$.

6.2. Katok's theorem. Recall the definition of a horseshoe from Sec. 2.6 as an embedding of some SFT.

Theorem 6.3. Let $f$ be a $C^{1+}$-diffeomorphism of a compact manifold. Let $\mu$ be an ergodic, aperiodic, hyperbolic invariant probability measure and let $\epsilon > 0$. Then there exists a horseshoe embedding an SFT $S$ (see footnote 14) such that
(1) $h_{top}(S) > h(f, \mu) - \epsilon$;
(2) the period of $S$ is a period of $(f, \mu)$.
Moreover, if $\mu$ is totally ergodic, then the SFT $S$ can be chosen to be mixing.

This theorem is due to Katok ([23] for the existence of a horseshoe, [24] for the approximation in entropy), except for the new additional information (2) about the period of the horseshoe, which we now explain.

Let $(S, \mu)$ be an ergodic measure-preserving system. Let $A$ be a Borel set with $\mu(A) > 0$ and let $\tau_A : X \to \{1, 2, \dots, \infty\}$ be the time to enter $A$:
\[ \tau_A(x) := \inf\{n \ge 1 : S^n x \in A\}. \]

Lemma 6.4. If $\tau_A(A) \subset p\mathbb N$, then $p$ is a period of $(S, \mu)$. If $(S, \mu)$ has a period $q$, then any set $A$ of positive measure contains a subset $B$ of positive measure such that the greatest common divisor $\gcd(\tau_B(B))$ is a multiple of $q$ and a period of $(S, \mu)$. In particular, if $q$ is the maximum period of $(S, \mu)$ then $q = \gcd(\tau_B(B))$.

Proof. For $0 \le i < p$, let $A_i := \{x \in X : \tau_A(x) \in -i + p\mathbb Z\}$. $\mathcal A := \{A_0, \dots, A_{p-1}\}$ is a partition of $X$ modulo $\mu$, since $\tau_A$ is finite $\mu$-a.e. on $X$ by ergodicity. If $0 \le i < p - 1$, then $S(A_i) \subset X \setminus A$ so $\tau_A(Sx) = \tau_A(x) - 1$ for all $x \in A_i$ and $S(A_i) = A_{i+1}$. Now, $S(A_{p-1}) \subset A_0 \cup A$, but by assumption $A \subset A_0$, so $S(A_{p-1}) = A_0$. Thus, $\mathcal A$ is a $p$-cyclically moving partition modulo $\mu$ and $p$ is a period of $\mu$. The first claim of the Lemma is proved.

Let $q$ be a period of $(S, \mu)$, so there exists a $q$-cyclic partition modulo $\mu$, i.e., a Borel set $X_0 \subset X$ such that $S^q(X_0) = X_0$ and $X_0, S(X_0), \dots, S^{q-1}X_0$ are disjoint with union of measure 1. For some $0 \le i < q$, $B := A \cap S^i X_0$ has positive measure. Obviously $\tau_B(B) \subset q\mathbb N \cup \{\infty\}$. By removing points that do not return infinitely often to $B$ (a $\mu$-negligible subset), we exclude the infinite value. Let $p := \gcd(\tau_B(B))$. Obviously, $p$ is a multiple of $q$ and $\tau_B(B) \subset p\mathbb N$, so $p$ is a period by the first part of this lemma. The second claim of the Lemma is proved. The last claim is now immediate.

14 Note that even if $h(f, \mu) = 0$, $S$ must have positive entropy since it is a horseshoe.
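Lemma 6.4 can be illustrated on the simplest ergodic system with nontrivial periods. The sketch below assumes the rotation $x \mapsto x + 1$ on $\mathbb Z/N$ with the uniform measure, whose periods are the divisors of $N$; this finite model and the function names are choices made for the example only. It computes the return times $\tau_A(x)$, $x \in A$, and their greatest common divisor.

```python
from functools import reduce
from math import gcd

def return_times(N, A):
    """First-return times tau_A(x), x in A, for the rotation x -> x + 1 on Z/N."""
    A = set(A)
    times = []
    for x in sorted(A):
        n = 1
        while (x + n) % N not in A:        # iterate until the orbit re-enters A
            n += 1
        times.append(n)
    return times

def return_gcd(N, A):
    """gcd of the return times of A to itself."""
    return reduce(gcd, return_times(N, A))

print(return_times(12, {0, 4, 8}), return_gcd(12, {0, 4, 8}))
print(return_times(12, {0, 1}), return_gcd(12, {0, 1}))
```

For $A = \{0, 4, 8\}$ all return times are multiples of 4, and 4 is indeed a period of the rotation on $\mathbb Z/12$; for $A = \{0, 1\}$ the gcd is 1, which gives no information.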


Proof of Claim (2) in Theorem 6.3. We consider Theorem S.5.9 and its proof in [24, pp. 698-700]. We will use the notations from this text. The horseshoe is constructed by considering a Pesin set $\Lambda_\delta$ (a possibly non-invariant compact set with good hyperbolicity estimates) and finding a large enough set $D_m$ of well-separated points that return after $m$ iterates to the Pesin set, near to themselves, for arbitrarily large times $m$. The horseshoe is then constructed as the set of orbits that shadow arbitrary concatenations of the previously mentioned orbit segments. In this way we have an SFT $\Sigma$ defined by those concatenations and a continuous factor map $\pi : \Sigma \to X$ defined by this shadowing. According to [24, pp. 698-700], $\pi(\Sigma)$ is the announced horseshoe with entropy $\log \#D_n / n$.

We concentrate on the case $h(f, \mu) > 0$ as the following arguments are easily adapted to the situation where $\mu$ is only assumed to be aperiodic (and then $\#D_n \ge 2$ is enough). The lower bound (close to $\exp nh(S, \mu)$) for $\#D_n$ is obtained from a formula for the entropy also established in [23]. This formula shows that if $\mu$ is an ergodic measure, $\delta > 0$, and if $N(n, \epsilon, \delta)$ is the minimal number of Bowen-Dinaburg $(\epsilon, n)$-balls needed to cover a set of measure $\ge 1 - \delta$, then
\[ h(S, \mu) = \lim_{\epsilon \to 0} \liminf_{n \to \infty} \frac1n \log N(n, \epsilon, \delta). \]
We apply this not to the Pesin set $\Lambda_\delta$ but to a subset with positive measure. Lemma 6.4 shows that one can find a set of positive measure $B \subset \Lambda_\delta$ such that $p := \gcd(\tau_B(B))$ is a period of $(f, \mu)$ (the maximum period of $\mu$ if it exists). Applying Katok's construction, we get an SFT with a large period $m$. By its construction, $B$ contains finitely many points whose return times have greatest common divisor $p$. We add them to the previously mentioned set $D_m$. The entropy of the image of the resulting SFT, $\Sigma$, can only increase. To conclude, we use the following result on continuous factors of SFTs from [5]: $\Sigma$ contains another SFT $\Sigma'$ with the same period and $h_{top}(f, \Sigma')$ arbitrarily close to $h(f, \pi(\Sigma))$ such that $\pi|\Sigma'$ is injective.

6.3. Diffeomorphisms with hyperbolicity and mixing. We prove Theorem 1.7 about diffeomorphisms with hyperbolicity and mixing and then its Corollaries.

Proof of Theorem 1.7. We assume $h_{top}(T) > 0$, since otherwise point (1) holds with $M_1 = \emptyset$ and $M_2 = M$ and point (2) is then trivial (the periodic-Bernoulli systems of zero entropy are the finite cyclic permutations).

Recall from Propositions 2.9 and 2.12 the Borel maps $M : X \to P_{\mathrm{erg}}(T)$ and $h : P(T) \to \mathbb R$. Hence $M_2 := (h \circ M)^{-1}(\{h_{top}(T)\})$ is Borel and carries exactly the m.m.e.'s. We are going to show that $M_1 := M \setminus M_2$ is strictly universal for $B(h_{top}(T))$ (see Sec. 3.1). By Corollary 3.5, this will imply that $(M_1, T)$ is almost Borel isomorphic to a mixing Markov shift, say $\Sigma_0$, without m.m.e. Such a Markov shift exists by Ex. 2.14.

Obviously $(M_1, T) \in B(h_{top}(T))$. By Lemma 3.6, it suffices to show that $M_1$ is $B(h)$-universal for $h < h_{top}(T)$, arbitrarily close. For such $h$, let $\mu \in P_{\mathrm{erg}}(T)$ be totally ergodic and hyperbolic with $h(T, \mu) > h$ as in assumption (#). Theorem 6.3 gives a mixing horseshoe $H \subset M$ with $h_{top}(T|H) > h$. By Theorem 1.2, $H$ is $B(h)$-universal. This completes the proof of point (1).


We turn to point (2). First, (§) is necessary since any ergodic measure of a Markov shift is carried by one of its countably many irreducible components, each irreducible component carries at most one m.m.e., and this measure, if it exists, is periodic-Bernoulli.

Conversely, we build an isomorphic Markov shift assuming (§). For each $p \ge 1$, let $E_p$ be the (empty, finite or countably infinite) set of m.m.e.'s that are $p$-Bernoulli. By Ex. 2.14, one can find $\Sigma_p$, a positive recurrent (see footnote 15) Markov shift with period $p$ and entropy $h_{top}(T)$, and we set $\hat\Sigma_p := \{x \in \Sigma_p : h(\sigma, M(x)) = h_{top}(T)\}$. We claim that $T$ is almost Borel isomorphic to the Markov shift:
\[ \Sigma = \Sigma_0 \sqcup \bigsqcup_{p \ge 1} \Sigma_p \times (E_p, \mathrm{Id}), \]
where $\Sigma_0$ is the mixing Markov shift introduced above. Note that $\Sigma_0$ is strictly $B(h_{top}(T))$-universal. Indeed, $\Sigma_0$ is almost Borel isomorphic to $\Sigma_0 \times (\mathbb N, \mathrm{Id})$ (see Ex. 3.7) and $\Sigma_0 \sqcup (\Sigma_p \setminus \hat\Sigma_p)$ is almost Borel isomorphic to $\Sigma_0$. Hence ($\equiv$ representing almost Borel isomorphisms),
\begin{align*}
\Sigma &\equiv \Sigma_0 \times \mathbb N \sqcup \bigsqcup_{p \ge 1} \Bigl[ (\Sigma_p \setminus \hat\Sigma_p) \times E_p \sqcup \hat\Sigma_p \times E_p \Bigr] \\
&\equiv \Sigma_0 \sqcup \bigsqcup_{p \ge 1} \Bigl[ \Sigma_0 \times E_p \sqcup \hat\Sigma_p \times E_p \Bigr] \\
&\equiv \Sigma_0 \sqcup \bigsqcup_{p \ge 1} \hat\Sigma_p \times E_p \equiv M_1 \sqcup M_2 \equiv T,
\end{align*}
using Ornstein theory in the step before the last.

Proof of Corollary 1.8. Let $T$ be a $C^{1+}$-diffeomorphism of a compact surface with $h_{top}(T) > 0$ and a mixing m.m.e. $\mu_*$. Ruelle's inequality implies that $\mu_*$ is hyperbolic (see Sec. 2.6). Also the m.m.e.'s are periodic-Bernoulli and countably many according to Sarig [38]. Thus Theorem 1.7 shows that $T$ is almost Borel isomorphic to a Markov shift. The data in Corollary 1.8 are clearly invariant. For the converse, observe that these data determine the Markov shift $\Sigma$ built in the proof of Theorem 1.7.

Proof of Corollary 1.10. Let $T = H_{a,b}$ be a Hénon-like map for a good parameter $(a, b)$ in the sense of Berger [3]. According to that work, $T$ has a unique m.m.e. which is Bernoulli and hyperbolic. Restricting to a bounded, open forward invariant set, we can apply Corollary 1.8. We obtain an almost Borel isomorphism to a positive-recurrent, mixing Markov shift with entropy $h(T)$.

Proof of Corollary 1.11. According to Theorem 1.2 of [13], these diffeomorphisms are entropy-conjugate to the initial Anosov diffeomorphism. Hence they have a unique m.m.e. and this m.m.e. is Bernoulli. This m.m.e. is also hyperbolic by the proof of the injectivity of the factor map denoted by $\pi$ at the end of Sec. 6.2 of [13]. One concludes as in the proof of Corollary 1.10.

15 This means that $\Sigma_p$ has some (and therefore exactly one) m.m.e., see Sec. 2.5.


7. Diffeomorphisms beyond the mixing case

In this section we extend the analysis of diffeomorphisms of Theorem 1.7, still relying only on Katok's and Hochman's theorems. We use the notion of a union-entropy-period universal part from [5] (which we will not re-prove here).

7.1. The universal part. Recall from Sec. 6.1 the set of periods $\mathrm{per}(S, \mu)$ of an ergodic system $(S, \mu)$.

Definition 7.1. A measure $\nu \in P_{\mathrm{erg}}(S)$ entropy-period dominates a measure $\mu \in P_{\mathrm{erg}}(S)$ if (1) $\mathrm{per}(S, \nu) \subset \mathrm{per}(S, \mu)$; and (2) $h(S, \nu) > h(S, \mu)$.

Theorem 7.2. Any $C^{1+}$-diffeomorphism $T$ of a compact manifold $M$ contains a Borel subsystem $M_1$ which:
• carries all ergodic measures which are entropy-period dominated by some hyperbolic measure;
• is almost Borel isomorphic to a Markov shift.

Proof. Recall the following from [5]. Any Borel system such as $(M, T)$ has a subsystem $(M_U, T_U)$ called its union-entropy-period universal part and a sequence $u_T : \mathbb N \to [0, \infty]$ called its universality sequence such that ($\Sigma^0_{t,p}$ denotes any irreducible Markov shift with entropy $t$ and period $p$ and no m.m.e.):
\[ u_T(p) := \sup\{t > 0 : \exists\, \Sigma^0_{t,p} \text{ that almost Borel embeds into } (M, T)\}, \]
with the following properties:
• $(M_U, T_U)$ is almost Borel isomorphic to a Markov shift;
• it carries all $\mu \in P_{\mathrm{erg}}(T)$ such that $h(T, \mu) < u_T(p)$ for some $p \in \mathrm{per}(T, \mu)$.
We set $M_1 := M_U$. Now, let $\mu \in P_{\mathrm{erg}}(T)$ be entropy-period dominated by a hyperbolic ergodic measure $\nu$. Applying Katok's Theorem 6.3 to $(T, \nu)$ yields a continuous embedding into $(M, T)$ of some irreducible SFT $\Sigma$ with period $p \in \mathrm{per}(T, \mu)$ and $h_{top}(\Sigma) > h(T, \mu)$. It follows that $h(T, \mu) < u_T(p)$. Thus $\mu(M_1) = 1$.

7.2. Proof of Theorem 1.14. The statement is in terms of the entropy-period-maximal measures, which generalize m.m.e.'s (see Def. 1.12). (Note that $\mu$ is entropy-period-maximal if and only if it is not dominated by any other measure.)

Exercise 7.3. Let $\Sigma^+_{t,p}$ be an irreducible Markov shift which has period $p$, entropy $t$ and is positive recurrent. Determine the entropy-period-maximal measures of $\Sigma = \Sigma^+_{1,1} \cup \Sigma^+_{2,2} \cup \Sigma^+_{2,3} \cup \Sigma^+_{6,4}$.

Proof of Remark 1.13. This is a direct consequence of Katok's theorem: any aperiodic hyperbolic ergodic measure with zero entropy is dominated by another measure which is hyperbolic with positive entropy.

We deduce Theorem 1.14 from Theorem 7.2. The latter yields a Borel subsystem $M_1$, almost Borel isomorphic to a Markov shift. Propositions 2.9 and 2.12 show that $M_* := \{x \in M : h(T, M(x)) < h_0\}$ is a Borel subset such that $\mu(M_*) = 1$ if and only if $h(T, \mu) < h_0$. We set $M_0 := M_* \setminus M_1$ and $M_2 := M \setminus (M_0 \cup M_1)$. Obviously $M = M_0 \sqcup M_1 \sqcup M_2$ is an invariant Borel partition of $M$ and claims (1) and (2) are clear.


Let μ ∈ Perg (T ) with μ(M2 ) = 1. As μ(M∗ ) = 0, h(T, μ) ≥ h0 . If μ is dominated by some measure ν, then h(T, ν) > h(T, μ) ≥ h0 hence ν is hyperbolic, contradicting μ(M1 ) = 0. Therefore μ is not dominated by any measure: it is entropy-period-maximal. Theorem 1.14 is proved.

References [1] R. L. Adler and B. Marcus, Topological entropy and equivalence of dynamical systems, Mem. Amer. Math. Soc. 20 (1979), no. 219, iv+84, DOI 10.1090/memo/0219. MR533691 (83h:28027) [2] F. B´ eguin, S. Crovisier, and F. Le Roux, Construction of curious minimal uniquely ergodic homeomorphisms on manifolds: the Denjoy-Rees technique (English, with English ´ and French summaries), Ann. Sci. Ecole Norm. Sup. (4) 40 (2007), no. 2, 251–308, DOI 10.1016/j.ansens.2007.01.001. MR2339286 (2008m:37013) [3] P. Berger, Properties of the maximal entropy measure and geometry of H´ enon attractors, preprint arXiv:1202.2822 [math.DS]. [4] C. Bonatti and M. Viana, SRB measures for partially hyperbolic systems whose central direction is mostly contracting, Israel J. Math. 115 (2000), 157–193, DOI 10.1007/BF02810585. MR1749677 (2001j:37063a) [5] M. Boyle and J. Buzzi, The almost Borel structure of surface diffeomorphisms, Markov shifts and their factors, arXiv:1409.7330, submitted. [6] M. Boyle, J. Buzzi, and R. G´ omez, Almost isomorphism for countable state Markov shifts, J. Reine Angew. Math. 592 (2006), 23–47, DOI 10.1515/CRELLE.2006.021. MR2222728 (2006m:37011) [7] H. Bruin, Induced maps, Markov extensions and invariant measures in one-dimensional dynamics, Comm. Math. Phys. 168 (1995), no. 3, 571–580. MR1328254 (96m:58134) [8] J. Buzzi, Intrinsic ergodicity of smooth interval maps, Israel J. Math. 100 (1997), 125–161, DOI 10.1007/BF02773637. MR1469107 (99g:58071) [9] J. Buzzi, Intrinsic ergodicity of affine maps in [0, 1]d , Monatsh. Math. 124 (1997), no. 2, 97–118, DOI 10.1007/BF01300614. MR1462857 (98g:58098) [10] J. Buzzi, Thermodynamical formalism for piecewise invertible maps: absolutely continuous invariant measures as equilibrium states (English, with English and French summaries), Smooth ergodic theory and its applications (Seattle, WA, 1999), Proc. Sympos. Pure Math., vol. 69, Amer. Math. Soc., Providence, RI, 2001, pp. 749–783, DOI 10.1090/pspum/069/1858553. 
MR1858553 (2003a:37042) [11] J. Buzzi, Ergodicit´ e intrins` eque de produits fibr´ es d’applications chaotiques unidimensionelles (French, with English and French summaries), Bull. Soc. Math. France 126 (1998), no. 1, 51–77. MR1651381 (99g:58072) [12] J. Buzzi, Dimensional entropies and semi-uniform hyperbolicity in: New Trends in Mathematical Physics: Selected contributions of the XVth International Congress on Mathematical Physics, V. Sidoravicius (ed.) (also as arXiv:1102.0612 [math.DS]). [13] J. Buzzi and T. Fisher, Entropic stability beyond partial hyperbolicity, J. Mod. Dyn. 7 (2013), no. 4, 527–552, DOI 10.3934/jmd.2013.7.527. MR3177771 [14] M. Denker, C. Grillenberger, and K. Sigmund, Ergodic theory on compact spaces, Lecture Notes in Mathematics, Vol. 527, Springer-Verlag, Berlin-New York, 1976. MR0457675 (56 #15879) [15] T. Downarowicz, Entropy in dynamical systems, New Mathematical Monographs, vol. 18, Cambridge University Press, Cambridge, 2011. MR2809170 (2012k:37001) [16] E. Glasner and B. Weiss, On the interplay between measurable and topological dynamics, Handbook of dynamical systems. Vol. 1B, Elsevier B. V., Amsterdam, 2006, pp. 597–648, DOI 10.1016/S1874-575X(06)80035-4. MR2186250 (2006i:37005) [17] B. M. Gureviˇ c, Topological entropy of a countable Markov chain (Russian), Dokl. Akad. Nauk SSSR 187 (1969), 715–718. MR0263162 (41 #7767) [18] B. M. Gureviˇ c, Shift entropy and Markov measures in the space of paths of a countable graph (Russian), Dokl. Akad. Nauk SSSR 192 (1970), 963–965. MR0268356 (42 #3254)


[19] B. M. Gurevich and S. V. Savchenko, Thermodynamic formalism for symbolic Markov chains with a countable number of states (Russian), Uspekhi Mat. Nauk 53 (1998), no. 2(320), 3– 106, DOI 10.1070/rm1998v053n02ABEH000017; English transl., Russian Math. Surveys 53 (1998), no. 2, 245–344. MR1639451 (2000c:28028) [20] M. Hochman, Isomorphism and embedding of Borel systems on full sets, Acta Appl. Math. 126 (2013), 187–201, DOI 10.1007/s10440-013-9813-8. MR3077948 [21] F. Hofbauer, On intrinsic ergodicity of piecewise monotonic transformations with positive entropy, Israel J. Math. 34 (1979), no. 3, 213–237 (1980), DOI 10.1007/BF02760884. MR570882 (82c:28039a) [22] S. Ito, H. Murata, and H. Totoki, Remarks on the isomorphism theorems for weak Bernoulli transformations in general case, Publ. Res. Inst. Math. Sci. 7 (1971/72), 541–580. MR0310195 (46 #9297) [23] A. Katok, Lyapunov exponents, entropy and periodic orbits for diffeomorphisms, Inst. Hautes ´ Etudes Sci. Publ. Math. 51 (1980), 137–173. MR573822 (81i:28022) [24] A. Katok and B. Hasseblatt, An introduction to the Modern Theory of Dynamical Systems, Cambridge University Press, 1985. [25] A. S. Kechris, Classical descriptive set theory, Graduate Texts in Mathematics, vol. 156, Springer-Verlag, New York, 1995. MR1321597 (96e:03057) [26] B. P. Kitchens, Symbolic dynamics, Universitext, Springer-Verlag, Berlin, 1998. One-sided, two-sided and countable state Markov shifts. MR1484730 (98k:58079) [27] W. Krieger, On entropy and generators of measure-preserving transformations, Trans. Amer. Math. Soc. 149 (1970), 453–464. MR0259068 (41 #3710) [28] W. Krieger, On the subsystems of topological Markov chains, Ergodic Theory Dynam. Systems 2 (1982), no. 2, 195–202 (1983), DOI 10.1017/S0143385700001516. MR693975 (85b:28020) [29] D. Lind and B. Marcus, An introduction to symbolic dynamics and coding, Cambridge University Press, Cambridge, 1995. MR1369092 (97a:58050) [30] D. 
Ornstein, Bernoulli shifts with the same entropy are isomorphic, Advances in Math. 4 (1970), 337–352 (1970). MR0257322 (41 #1973) [31] K. Petersen, Ergodic theory, Cambridge Studies in Advanced Mathematics, vol. 2, Cambridge University Press, Cambridge, 1983. MR833286 (87i:28002) [32] A. Quas and T. Soo, Weak mixing suspension flows over shifts of finite type are universal, J. Mod. Dyn. 6 (2012), no. 4, 427–449. MR3008405 [33] A. Quas and T. Soo, Ergodic universality of some topological dynamical systems, arXiv:1208.3501 [math.DS]. [34] Rodriguez H. F., Rodriguez H. M. A., A. Tahzibi, and R. Ures, Maximizing measures for partially hyperbolic systems with compact center leaves, Ergodic Theory Dynam. Systems 32 (2012), no. 2, 825–839, DOI 10.1017/S0143385711000757. MR2901373 [35] I. A. Salama, Topological entropy and recurrence of countable chains, Pacific J. Math. 134 (1988), no. 2, 325–341. MR961239 (90d:54076) [36] O. M. Sarig, Symbolic dynamics for surface diffeomorphisms with positive entropy, J. Amer. Math. Soc. 26 (2013), no. 2, 341–426, DOI 10.1090/S0894-0347-2012-00758-9. MR3011417 [37] O. M. Sarig, Thermodynamic formalism for null recurrent potentials, Israel J. Math. 121 (2001), 285–311, DOI 10.1007/BF02802508. MR1818392 (2001m:37059) [38] O. M. Sarig, Bernoulli equilibrium states for surface diffeomorphisms, J. Mod. Dyn. 5 (2011), no. 3, 593–608, DOI 10.3934/jmd.2011.5.593. MR2854097 (2012k:37071) [39] J. Serafin, Non-existence of a universal zero-entropy system, Israel J. Math. 194 (2013), no. 1, 349–358, DOI 10.1007/s11856-011-0219-x. MR3047074 [40] S. Shelah and B. Weiss, Measurable recurrence and quasi-invariant measures, Israel J. Math. 43 (1982), no. 2, 154–160, DOI 10.1007/BF02761726. MR689974 (84d:28025) [41] Y. Takahashi, Isomorphisms of β-automorphisms to Markov automorphisms, Osaka J. Math. 10 (1973), 175–184. MR0340552 (49 #5304) [42] D. Vere-Jones, Geometric ergodicity in denumerable Markov chains, Quart. J. Math. Oxford Ser. 
(2) 13 (1962), 7–28. MR0141160 (25 #4571) [43] D. Vere-Jones, Ergodic properties of nonnegative matrices. I, Pacific J. Math. 22 (1967), 361–386. MR0214145 (35 #4996)


[44] B. Weiss, Measurable dynamics, Conference in modern analysis and probability (New Haven, Conn., 1982), Contemp. Math., vol. 26, Amer. Math. Soc., Providence, RI, 1984, pp. 395–421, DOI 10.1090/conm/026/737417. MR737417 (85j:28027)
[45] B. Weiss, Countable generators in dynamics—universal minimal models, Measure and measurable dynamics (Rochester, NY, 1987), Contemp. Math., vol. 94, Amer. Math. Soc., Providence, RI, 1989, pp. 321–326, DOI 10.1090/conm/094/1013000. MR1013000 (90g:28025)

Laboratoire de Mathématiques d'Orsay (CNRS & UMR 8628), Bât. 425, Université Paris-Sud, 91405 Orsay France
E-mail address: [email protected]

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01489

Lectures on large deviations in probability and dynamical systems Yuri Kifer Abstract. These lectures contain general large deviations results for stochastic processes and dynamical systems, large deviations in averaging and large deviations for nonconventional sums and they demonstrate, in particular, the analogy between large deviations results for certain stochastic processes and dynamical systems.

2010 Mathematics Subject Classification. Primary 60F10; Secondary 37D35, 34C29, 37C45.
Key words and phrases. Large deviations, thermodynamic formalism, averaging, limit theorems, nonconventional sums, Hausdorff dimension, hyperbolic dynamical systems.
Partially supported by the ISF grant no. 82/10.
© 2015 American Mathematical Society

Introduction

Large deviations became an important area of probability during the last forty years and more recently many large deviations results were obtained for various classes of dynamical systems, as well. Large deviations theory was born in probability and it usually studies the asymptotic behavior as $n \to \infty$ of probabilities $P\{\frac1n \sum_{k=1}^n X_k \in I\}$ for a sequence $X_1, X_2, \dots$ of random variables and various sets $I$. If a kind of law of large numbers $\frac1n \sum_{k=1}^n X_k \to \bar X$ as $n \to \infty$ holds true and $\bar X \notin I$ for a closed interval $I$, then the above probabilities tend to zero. Sometimes it is possible to show that the convergence is exponentially fast, and then the large deviations theory comes into the picture, its prime goal being to describe the corresponding exponent.

In the dynamical systems framework the large deviations theory is supposed to describe deviations from the Birkhoff ergodic theorem. Namely, let $F$ be a transformation preserving an ergodic measure $\mu$, $g$ be a $\mu$-integrable function and $S_n(x) = \sum_{k=0}^{n-1} g(F^k x)$. Then $\frac1n S_n(x) \to \bar g = \int g\,d\mu$ for $\mu$-almost all $x$, and so if $I$ is a closed interval which does not contain $\bar g$ then $\mu\{x : \frac1n S_n(x) \in I\} \to 0$ as $n \to \infty$. We can consider also occupational measures $\zeta_n^x = \frac1n \sum_{k=0}^{n-1} \delta_{F^k x}$, where $\delta_z$ is the unit mass at $z$, which for $\mu$-almost all $x$ converge weakly as $n \to \infty$ to $\mu$. If $\mu$ does not belong to a closed set $K$ then $\mu\{x : \zeta_n^x \in K\} \to 0$ as $n \to \infty$. When the above convergence is exponentially fast, the theory of large deviations comes into the picture and tries to describe the corresponding exponent.

This article consists of three lectures given by the author at the Bernoulli center, EPFL, Lausanne in March 2013 in the framework of the program "Hyperbolic dynamics, large deviations and fluctuations". The first lecture deals with general results on large deviations and on their applications to stochastic processes, statistical mechanics and dynamical systems. It is based on papers [12], [19], [21], [22], [24] and among other relevant papers we cite also [4], [7], [8], and [16]. The second lecture deals with large deviations in averaging and it is based on papers [20], [23], [26], [27], [28] and among other relevant works we note [1], [25], [5], [10], [11], [13], [17], [18] and [35]. Finally, the third lecture deals with the new topic of large deviations for nonconventional sums and related questions and it is based on papers [29], [30], [31], [32] and among other related papers we refer the reader to [2], [3], [33], [6], [9], [14], [15] and [34]. We observe also that sections below correspond to our lectures given at EPFL while subsections correspond to lectures' slides.

1. General large deviations theorems and their applications

1.1. Large deviations: definitions. Let $X$ be a compact metric space, $P(X)$ be the space of probability measures on $X$ with the topology of weak convergence, $(\Omega_t, \mathcal F_t, P_t)$ be a family of probability spaces where $t \in \mathbb Z$ or $t \in \mathbb R$, $\zeta^t : \Omega_t \to P(X)$, $\zeta^t(\omega) = \zeta^t_\omega$ be a family of measurable maps, and $r(t) \to \infty$ as $t \to \infty$.

Upper large deviation bound:
\[ \limsup_{t \to \infty} \frac{1}{r(t)} \log P_t\{\zeta^t \in K\} \le -\inf\{I(\nu) : \nu \in K\}, \quad \forall K \subset P(X) \text{ closed}. \]

Lower large deviation bound: lim inf (1/r(t)) log Pt {ζ t ∈ G} ≥ − inf {I(ν) : ν ∈ G}, ∀G ⊂ P(X) open. t→∞

Here I is a lower semi continuous convex functional to be identified. General fact (Donsker-Varadhan):    If both LD bounds hold true then I(μ) = supV ∈C(X) V dμ − Q(V ) if μ ∈ P(X) where

 Q(V ) = lim (1/r(t)) log t→∞

 exp(r(t)

V (x)dζωt )dPt (ω).
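As a minimal numerical sketch of the duality above (an illustration of ours, not taken from the lectures), consider the simplest assumed setting: i.i.d. random variables on the two-point space $X = \{0, 1\}$ with $P\{X_k = 1\} = p$, $\zeta^n$ their empirical measure and $r(n) = n$. Then $Q(V) = \log((1-p)e^{V(0)} + pe^{V(1)})$, and since adding a constant to $V$ changes $\int V\,d\mu - Q(V)$ by nothing, we may take $V = (0, v)$. The helper names below are hypothetical. The supremum over $v$ is approximated on a grid and compared with the relative entropy of $\mu$ with respect to the law of $X_1$, which is the known value of the supremum in this case.

```python
import math


def log_moment_limit(v, p):
    """Q(V) for V = (0, v): log E exp(V(X_1)) with P{X_1 = 1} = p."""
    return math.log((1.0 - p) + p * math.exp(v))


def rate_by_duality(mu1, p, v_min=-20.0, v_max=20.0, steps=40001):
    """Grid approximation of sup_v (mu1 * v - Q(V)), i.e. of I(mu) with mu{1} = mu1."""
    best = -math.inf
    for k in range(steps):
        v = v_min + (v_max - v_min) * k / (steps - 1)
        best = max(best, mu1 * v - log_moment_limit(v, p))
    return best


def relative_entropy(mu1, p):
    """Relative entropy of (1 - mu1, mu1) with respect to (1 - p, p)."""
    total = 0.0
    for m, q in ((mu1, p), (1.0 - mu1, 1.0 - p)):
        if m > 0.0:
            total += m * math.log(m / q)
    return total


if __name__ == "__main__":
    for mu1 in (0.5, 0.7, 0.9):
        print(mu1, rate_by_duality(mu1, 0.5), relative_entropy(mu1, 0.5))
```

The grid bounds and step are arbitrary choices; for $\mu$ close to a point mass the maximizing $v$ leaves any fixed window, so the window would have to be widened.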

1.2. Applications to occupational measures. It is usually called the 2nd level of large deviations if we apply the above to
$$\zeta^t_\omega = \frac1t\sum_{s=0}^{t-1}\delta_{Y_s(\omega)} \quad\text{or}\quad \zeta^t_\omega = \frac1t\int_0^t\delta_{Y_s(\omega)}\,ds,$$
where $Y_s$ is a stochastic process, in particular, a dynamical system $Y_s(\omega) = F^s\omega$. For statistical mechanics applications we can consider
$$\zeta^a_\omega = \frac{1}{|D(a)|}\sum_{q \in D(a)}\delta_{\theta_q\omega}$$
where $X = \Omega$ is a compact subset of $Q^{\mathbb Z^d}$, $Q$ is a finite set, $(\theta_q\omega)_m = \omega_{m+q}$, $\omega = (\omega_m,\ m \in \mathbb Z^d)$, $D(a) = \{m : 0 \le m_i < a_i,\ i = 1, \dots, d\}$.
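To make the discrete time occupational measure concrete, here is a small sketch under assumptions of our own: $Y_s$ is a two-state Markov chain with a transition matrix chosen only for illustration, and the function names are hypothetical. The code builds $\zeta^t_\omega = \frac1t\sum_{s=0}^{t-1}\delta_{Y_s(\omega)}$ as a dictionary of weights and integrates a function against it.

```python
import random


def sample_chain(transition, start, n, rng):
    """Return Y_0, ..., Y_{n-1} for a finite Markov chain given as nested dicts."""
    path, state = [], start
    for _ in range(n):
        path.append(state)
        r, acc = rng.random(), 0.0
        for nxt, prob in transition[state].items():
            acc += prob
            if r < acc:
                break
        state = nxt  # falls back to the last listed state if rounding leaves r >= acc
    return path


def occupational_measure(path):
    """zeta^t = (1/t) * sum_{s<t} delta_{Y_s}, returned as a dict of weights."""
    t = len(path)
    weights = {}
    for y in path:
        weights[y] = weights.get(y, 0.0) + 1.0 / t
    return weights


def integrate(v, zeta):
    """Integral of a function v (a dict state -> value) against zeta."""
    return sum(v[y] * w for y, w in zeta.items())


if __name__ == "__main__":
    P = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}  # illustrative transition matrix
    zeta = occupational_measure(sample_chain(P, 0, 20000, random.Random(0)))
    print(zeta)
```

For this chain the stationary distribution is $(5/6, 1/6)$, so for long paths the weights of $\zeta^t_\omega$ are expected to be close to these values; the large deviations bounds quantify how unlikely it is that they are not.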

LARGE DEVIATIONS


1.3. A general theorem on large deviations.

1.1. Theorem. Suppose that the limit
$$Q(V) = \lim_{t\to\infty}\frac{1}{r(t)}\log\int\exp\Big(r(t)\int V(x)\,d\zeta^t_\omega(x)\Big)dP_t(\omega)$$
exists for any $V$ from the space $C(X)$ of continuous functions on $X$. Then the upper large deviations bound holds true with
$$I(\mu) = \sup_{V \in C(X)}\Big(\int V\,d\mu - Q(V)\Big) \quad\text{if } \mu \in \mathcal P(X)$$
and $I(\mu) = \infty$ otherwise. If, in addition, there exists a sequence $V_1, V_2, \dots \in C(X)$ such that $\overline{\operatorname{span}}\{V_i,\ i \ge 1\} = C(X)$, and for any $n$, any numbers $\beta_1, \dots, \beta_n$, and every function $V = \beta_1V_1 + \dots + \beta_nV_n$ there exists a unique measure $\mu_V \in \mathcal P(X)$ satisfying
$$Q(V) = \int V\,d\mu_V - I(\mu_V) = \sup_{\mu \in \mathcal P(X)}\Big(\int V\,d\mu - I(\mu)\Big),$$
then the lower large deviation bound holds true as well.

Convex analysis fact: The uniqueness of a maximizing measure $\mu_V$ for $V = V(\beta_1, \dots, \beta_n)$ is equivalent to differentiability of $Q(V)$ in $\beta = (\beta_1, \dots, \beta_n)$.

1.4. Contraction principle. Let $\Psi : \mathcal P(X) \to Y$ be continuous, $Y$ be a Hausdorff space, $\zeta^t$ be as in the general theorem above. Then $P\{\Psi\zeta^t \in U\} = P\{\zeta^t \in \Psi^{-1}U\}$ for $U \subset Y$ and $P \in \mathcal P(X)$. If LD holds true for $\zeta^t$ with the rate functional $I$ then LD holds true for $\Psi\zeta^t$ with the rate $J(y) = \inf\{I(\nu) : \nu \in \Psi^{-1}y\}$. For occupational measures $\zeta^t_\omega = \frac1t\int_0^t\delta_{F^s\omega}\,ds$, $\omega \in X$, or $\zeta^t_\omega = \frac1t\sum_{s=0}^{t-1}\delta_{F^s\omega}$ and $\Psi\nu = \int g\,d\nu$ for some fixed continuous $g$ we obtain LD for Cesàro averages $\frac1t\int_0^t g\circ F^s\,ds$ or $\frac1t\sum_{s=0}^{t-1}g\circ F^s$ with the rate function $J(y) = \inf\{I(\nu) : \int g\,d\nu = y\}$, which is called the 1st level of LD. It can be obtained directly: if the limit
$$Q(\lambda) = \lim_{t\to\infty}\frac1t\log\int\exp\Big(\lambda\sum_{s=0}^{t-1}g\circ F^s\Big)dP$$
exists and is differentiable in $\lambda$ then LD for $\frac1t\sum_{s=0}^{t-1}g\circ F^s$ holds true with the rate $J(y) = \sup_\lambda(\lambda y - Q(\lambda))$ (and similarly for $\frac1t\int_0^t g\circ F^s\,ds$).

1.5. Applications of the general theorem to dynamical systems. Let $f : X \to X$ be a continuous map of a compact metric space $X$ and $B_x(\delta, n) = \{y : \max_{0\le i\le n}\operatorname{dist}(f^ix, f^iy) \le \delta\}$.

1.2. Proposition. Suppose that $m \in \mathcal P(X)$ and for some $\varphi \in C(X)$ and for all $n$, $\delta > 0$, $x \in X$ one has
$$(A_\delta(n))^{-1} \le m(B_x(\delta, n))\exp\Big(-\sum_{i=0}^{n-1}\varphi(f^ix)\Big) \le A_\delta(n)$$
where $A_\delta(n) > 0$ satisfies
$$\lim_{n\to\infty}\frac1n\log A_\delta(n) = 0.$$


Then for any $V \in C(X)$,
$$\lim_{n\to\infty}\frac1n\log\int_X\exp\Big(n\int V\,d\zeta^n_x\Big)dm(x) = \Pi(\varphi + V)$$
where $\Pi$ is the topological pressure and
$$n\int V\,d\zeta^n_x = \sum_{i=0}^{n-1}V(f^ix).$$
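The topological pressure can be computed explicitly in simple cases. The sketch below relies on a standard fact that is assumed here rather than stated in the text: for a subshift of finite type with a primitive 0-1 matrix $\Gamma = (\gamma_{ij})$ and a potential $\psi$ depending only on the coordinate $x_0$, the pressure $\Pi(\psi)$ equals the logarithm of the spectral radius of the matrix $(e^{\psi(i)}\gamma_{ij})$, and it is also the exponential growth rate of the sums of $\exp(\psi(x_0) + \dots + \psi(x_{n-1}))$ over admissible words. The function names are hypothetical and the golden mean shift is chosen only as an example.

```python
import math


def pressure_estimate(gamma, psi, n):
    """(1/n) log of the sum of exp(psi(x_0) + ... + psi(x_{n-1})) over admissible words."""
    s = len(gamma)
    weights = [math.exp(psi[i]) for i in range(s)]
    v = weights[:]  # words of length 1, indexed by their first symbol
    log_scale = 0.0
    for _ in range(n - 1):
        # put one more symbol i in front of every admissible word starting with j
        v = [weights[i] * sum(gamma[i][j] * v[j] for j in range(s)) for i in range(s)]
        norm = sum(v)
        v = [x / norm for x in v]  # renormalize to avoid overflow
        log_scale += math.log(norm)
    return (log_scale + math.log(sum(v))) / n


def golden_mean_pressure(a, b):
    """Log of the largest root of t^2 - e^a t - e^(a+b) = 0 for gamma = [[1, 1], [1, 0]]."""
    ea = math.exp(a)
    return math.log(0.5 * (ea + math.sqrt(ea * ea + 4.0 * math.exp(a + b))))


if __name__ == "__main__":
    gamma = [[1, 1], [1, 0]]
    for n in (10, 100, 1000):
        print(n, pressure_estimate(gamma, [0.3, -0.2], n), golden_mean_pressure(0.3, -0.2))
```

With $\psi = 0$ the same computation gives the topological entropy, here the logarithm of the golden ratio; the finite-$n$ values differ from the limit by a term of order $1/n$.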

1.6. Proposition 1.2 implies upper bound of LD. Indeed, by the variational principle,
$$Q(V) = \Pi(\varphi + V) = \sup_{\mu\ f\text{-invariant}}\Big(\int V\,d\mu + \int\varphi\,d\mu + h_\mu(f)\Big) = \sup_{\mu \in \mathcal P(X)}\Big(\int V\,d\mu - I(\mu)\Big)$$
where
$$I(\mu) = \begin{cases} -\int\varphi\,d\mu - h_\mu(f) & \text{if } \mu \text{ is } f\text{-invariant},\\ \infty & \text{otherwise.}\end{cases}$$
By Proposition 1.2, $0 = \Pi(\varphi) \ge \int\varphi\,d\mu + h_\mu(f)$, and so $I(\mu) \ge 0$. If $h_\mu(f)$ is upper semicontinuous in $\mu$ then, since it is also affine in $\mu$, $I(\mu)$ is convex and lower semicontinuous and we have the upper large deviation bound with the rate $I(\mu)$. Thus if $\zeta^n_x = \frac1n\sum_{i=0}^{n-1}\delta_{f^ix}$ then
$$\limsup_{n\to\infty}n^{-1}\log m\{x : \zeta^n_x \in K\} \le -\inf_{\mu \in K}I(\mu)$$
for any closed $K \subset \mathcal P(X)$.

1.7. Examples: subshifts of finite type, expanding and hyperbolic transformations.

• Subshift of finite type: Here $X \subset \{1, \dots, s\}^{\mathbb N}$, $(fx)_i = x_{i+1}$, $x = (x_0, x_1, \dots)$, is the left shift, $X = \{x = (x_0, x_1, \dots) : \gamma_{x_ix_{i+1}} = 1\ \forall i \ge 0\}$, where $\Gamma = (\gamma_{ij},\ i, j = 1, \dots, s)$ is an $s \times s$ matrix with 0 and 1 entries. Define the metric $d(x, \tilde x) = \exp(-\min\{i \ge 0 : x_i \ne \tilde x_i\})$. Then $B_x(\delta, n) = [x_0, \dots, x_n] = \{\tilde x \in X : \tilde x_i = x_i\ \forall i \le n\}$ provided $\delta = e^{-1}$. If $m$ is a Gibbs measure with a continuous potential $\varphi$ then the conditions of Proposition 1.2 are satisfied, and so the upper LD bound holds true. If $\varphi$ is Hölder continuous then by uniqueness of equilibrium states (maximizing measures in the variational principle, see [4]) we see from Theorem 1.1 that the lower LD bound holds true as well.

• Expanding and Axiom A transformations: Here we can take as $m$ not only Gibbs measures but, in view of the Bowen-Ruelle volume lemma, also the normalized Riemannian volume. The function $\varphi$ will be here the differential expanding coefficient (on the unstable subbundle): $\varphi(x) = -\ln|\mathrm{Jac}^u_xf|$.

1.8. Large deviations from equidistribution. Let $f : X \to X$ be as in either of the examples above. Let $O_n$ be the set of all periodic points of $f$ with period $n$, $O = \cup_nO_n$, and $\tau_x$ be the least period of $x \in O$. Set $\zeta_x = \tau_x^{-1}\sum_{i=1}^{\tau_x}\delta_{f^ix}$ and $\mu_n = N_n^{-1}\sum_{x \in O_n}\zeta_x$ where $N_n = \#O_n$. Bowen showed that $\mu_n \Rightarrow \mu_{\max}$ as $n \to \infty$, where $\mu_{\max}$ is the measure of maximal entropy, and he called this result the equidistribution of periodic orbits. For $\Gamma \subset O$ let $\nu_n(\Gamma) = N_n^{-1}\#(\Gamma \cap O_n)$.


Then for any $g \in C(M)$ (Bowen),
$$\lim_{n\to\infty}n^{-1}\log\int_O\exp\Big(n\int g\,d\zeta_x\Big)d\nu_n(x) = \lim_{n\to\infty}n^{-1}\log\Big(N_n^{-1}\sum_{x \in O_n}\exp\Big(\sum_{i=0}^{n-1}g(f^ix)\Big)\Big) = -h_{top}(f) + \Pi(g) = \sup_{\mu \in \mathcal P(M)}\Big(\int g\,d\mu - \tilde I(\mu)\Big)$$
where $\tilde I(\mu) = h_{top} - h_\mu$ if $\mu$ is $f$-invariant and $\tilde I(\mu) = \infty$ otherwise. Theorem 1.1 and the uniqueness of equilibrium states for Hölder continuous functions (see [4]) yield
$$\limsup_{n\to\infty}n^{-1}\log\nu_n\{x \in O : \zeta_x \in K\} \le -\inf\{\tilde I(\mu) : \mu \in K\} \quad\text{and}$$
$$\liminf_{n\to\infty}n^{-1}\log\nu_n\{x \in O : \zeta_x \in G\} \ge -\inf\{\tilde I(\mu) : \mu \in G\}$$
for any closed $K \subset \mathcal P(M)$ and open $G \subset \mathcal P(M)$.

1.9. Large deviations from equidistribution in continuous time. Let $f^t : M \to M$ be a group of homeomorphisms of a compact metric space. Let $CO$ be the set of all closed (periodic) orbits and $CO_\delta(t) \subset CO$ be the set of closed orbits with some period in the interval $[t - \delta, t + \delta]$. Let $\tau(\gamma)$ denote the least period of $\gamma \in CO$. For $x \in \gamma$ set
$$\zeta_\gamma = (\tau(\gamma))^{-1}\int_0^{\tau(\gamma)}\delta_{f^sx}\,ds$$
in the continuous time case and
$$\zeta_\gamma = (\tau(\gamma))^{-1}\sum_{i=1}^{\tau(\gamma)}\delta_{f^ix}$$
in the discrete time case. Let $\mu_{t,\delta} = N_{t,\delta}^{-1}\sum_{\gamma \in CO_\delta(t)}\zeta_\gamma$ where $N_{t,\delta} = \#\{CO_\delta(t)\}$ is the number of elements in $CO_\delta(t)$. Bowen showed that under general conditions of expansiveness and specification $\mu_{t,\delta}$ converges weakly as $t \to \infty$ to the measure $\mu_{\max}$ with maximal entropy for $f^t$. For $\Gamma \subset CO$ set $\nu_{t,\delta}(\Gamma) = N_{t,\delta}^{-1}\#\{\Gamma \cap CO_\delta(t)\}$. Then $\lim_{t\to\infty}\nu_{t,\delta}\{\gamma \in CO : \zeta_\gamma \notin U_{\mu_{\max}}\} = 0$ for any neighborhood $U_{\mu_{\max}}$ of $\mu_{\max}$. A more precise statement is obtained via LD. Namely, for any $\delta > 0$ small enough,
$$\limsup_{t\to\infty}t^{-1}\log\nu_{t,\delta}\{\gamma \in CO : \zeta_\gamma \in K\} \le -\inf\{I(\mu) : \mu \in K\}$$

for any closed $K \subset \mathcal P(M)$, while for any open $G \subset \mathcal P(M)$,
$$\liminf_{t\to\infty}t^{-1}\log\nu_{t,\delta}\{\gamma \in CO : \zeta_\gamma \in G\} \ge -\inf\{I(\mu) : \mu \in G\}$$
where $I(\mu) = h_{top}(f^1) - h_\mu(f^1)$ if $\mu \in \mathcal P(M)$ is $f^t$-invariant and $= \infty$ otherwise. In particular, this yields bounds of large deviations from the equidistribution for closed geodesics on negatively curved manifolds.

1.10. LD for path counting in graphs. Let $\Gamma$ be a finite directed graph with vertices $V = \{1, \dots, m\}$ and an adjacency matrix $B = (b_{ij})_{1\le i,j\le m}$, i.e. $b_{ij} = 1$ iff an arrow goes from $i$ to $j$. Now the space of paths of length $n$ has the form
$$X_B(n) = \{x = (x_0, x_1, \dots, x_n) : x_i \in V\ \forall i = 0, \dots, n \text{ and } b_{x_ix_{i+1}} = 1\ \forall i = 0, \dots, n-1\}$$
with $X_B = X_B(\infty) \subset V^{\mathbb N}$ taken with the product topology. Let $\Pi_n(a, b)$ be the set of all $(x_0, x_1, \dots, x_n) \in X_B(n)$ with $x_0 = a$ and $x_n = b$, $\Pi_n = \cup_a\Pi_n(a, a)$ and


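$\zeta^n_x = \frac1n\sum_{i=0}^{n-1}\delta_{f^ix}$ where $f$ is the left shift. Then for any continuous function $g$ on $X_B$,
$$\lim_{n\to\infty}\frac1n\log\int\exp\Big(n\int g\,d\zeta^n_x\Big)d\eta_n(x) = \lim_{n\to\infty}\frac1n\log\Big(|\Pi_n(a, b)|^{-1}\sum_{\alpha \in \Pi_n(a,b)}e^{S_ng(x_\alpha)}\Big) = \sup_{\nu \in \mathcal I_B}\Big(\int g\,d\nu - (h_{top}(f) - h_\nu(f))\Big)$$
where $\eta_n(Y) = |\Pi_n(a, b)|^{-1}|Y \cap \Pi_n(a, b)|$, $x_\alpha \in$ cylinder $C_\alpha$, and $\mathcal I_B \subset \mathcal P(X_B)$ is the space of shift invariant measures. Then for any $a, b \in V$,
$$\limsup_{n\to\infty}\frac1n\log\Big(|\Pi_n(a, b)|^{-1}|\{\alpha \in \Pi_n(a, b) : \zeta^n_{x_\alpha} \in K\}|\Big) \le -\inf_{\nu \in K}I(\nu)$$
for each closed $K \subset \mathcal P(X_B)$, while for each open $U \subset \mathcal P(X_B)$,
$$\liminf_{n\to\infty}\frac1n\log\Big(|\Pi_n(a, b)|^{-1}|\{\alpha \in \Pi_n(a, b) : \zeta^n_{x_\alpha} \in U\}|\Big) \ge -\inf_{\nu \in U}I(\nu)$$
where $x_\alpha$, $\alpha = (\alpha_0, \dots, \alpha_n)$, is in the cylinder $C_\alpha$, $I(\nu) = h_{top}(f) - h_\nu(f)$ if $\nu \in \mathcal I_B$ and $I(\nu) = \infty$ otherwise. This remains true for $\Pi_n$ in place of $\Pi_n(a, b)$.

1.11. Applications to Markov chains. Here $Y_n$ is a Markov chain on a compact $X$ with transition probabilities $P(x, dy)$ satisfying Doeblin's condition and $T_Vg(x) = \int e^{V(y)}g(y)P(x, dy)$. Then
$$\log\lambda_V = \lim_{n\to\infty}n^{-1}\log T_V^n\mathbf 1(x) = \lim_{n\to\infty}n^{-1}\log E_x\exp\Big(\sum_{k=1}^nV(Y_k)\Big)$$
where $\lambda_V$ is the principal eigenvalue of $T_V$. Knowing that $\log\lambda_V$ satisfies the Donsker-Varadhan variational formula and the uniqueness of the maximizing measure $\mu_V$ in this formula for each $V \in C(X)$, we derive for $\zeta^n_\omega = n^{-1}\sum_{k=1}^n\delta_{Y_k(\omega)}$ and $x \in X$,
$$\limsup_{n\to\infty}n^{-1}\log P_x\{\omega : \zeta^n_\omega \in K\} \le -\inf\{I(\mu) : \mu \in K\} \quad\text{and}$$
$$\liminf_{n\to\infty}n^{-1}\log P_x\{\omega : \zeta^n_\omega \in G\} \ge -\inf\{I(\mu) : \mu \in G\}$$
for any closed $K \subset \mathcal P(X)$ and open $G \subset \mathcal P(X)$, where
$$I(\mu) = -\inf_{u>0}\int\log\frac{Pu}{u}\,d\mu, \qquad Pu(x) = \int u(y)P(x, dy).$$

1.12. First applications to averaging. We consider a difference equation
$$x^\varepsilon_{k+1} - x^\varepsilon_k = \varepsilon B(x^\varepsilon_k, F^k\omega), \qquad x^\varepsilon_0 = x \in \mathbb R^d,$$
where $B(x, \omega)$ is a bounded vector field on $\mathbb R^d \times \Omega$ which is Lipschitz in $x$ and continuous in $\omega$, and $F : \Omega \to \Omega$ is continuous. First, we study LD on the 2nd level, namely, for
$$\zeta^{\varepsilon,T} = \zeta^{\varepsilon,T}_\omega = T^{-1}\int_0^T\delta_{(s,\,F^{[s\varepsilon^{-1}]}\omega)}\,ds$$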

which is a probability measure on $[0, T] \times \Omega$. By Theorem 1.1 we have to consider the limit
$$Q_F(V) = \lim_{\varepsilon\to0}\varepsilon\log\int_\Omega\exp\Big(\varepsilon^{-1}T^{-1}\int_0^TV(t, F^{[t\varepsilon^{-1}]}\omega)\,dt\Big)dP(\omega)$$


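for any $V_t(\omega) = V(t, \omega) \in C([0, T] \times \Omega)$. For the same classes of dynamical systems for which we obtained LD above we get that
$$Q_F(V) = \int_0^T\Pi(\varphi + T^{-1}V_t)\,dt = \sup_{\eta \in \mathcal P([0,T]\times\Omega)}\Big(\int V\,d\eta - I_{0T}(\eta)\Big)$$
where $\varphi$ is the potential of the Gibbs measure $P$ (or $P$ is the normalized volume on $\Omega$), $I_{0T}(\eta) = \int_0^TI(\eta_t)\,dt$ if $d\eta = T^{-1}d\eta_t\,dt$ and $= \infty$ otherwise, while $I(\eta_t) = -\int\varphi\,d\eta_t - h_{\eta_t}(F)$ if $\eta_t$ is $F$-invariant and $= \infty$ otherwise.

1.13. LD for paths in averaging. 1. We obtain LD for time changed sequences $z^\varepsilon_t = x^\varepsilon_{[t/\varepsilon]}$. Let $w^\varepsilon_t(\omega) = x + \int_0^tB(w^\varepsilon_s(\omega), F^{[s/\varepsilon]}\omega)\,ds$. Then we get by induction that for some $C > 0$,
$$\max_{0\le k\le T\varepsilon^{-1}}|z^\varepsilon_{k\varepsilon} - w^\varepsilon_{k\varepsilon}| \le C2^{CT}\varepsilon$$
and so we can deal with $w^\varepsilon$ in place of $z^\varepsilon$. Next, we apply a contraction principle argument. On the subspace $\mathcal M \subset \mathcal P([0, T] \times \Omega)$ of measures $\mu$ such that $d\mu = T^{-1}d\mu_t\,dt$, $\mu_t \in \mathcal P(\Omega)$, $t \in [0, T]$, define the map $\Psi_x : \mathcal M \to C_{0T}(\mathbb R^d)$, $x \in \mathbb R^d$, by $\varphi = \Psi_x\mu$ with
$$\varphi_t = x + \int_0^t\int_\Omega B(\varphi_s, \omega)\,d\mu_s(\omega)\,ds$$
which is well defined since $B$ is Lipschitz continuous in the first variable, where $C_{0T}(\mathbb R^d) = \{\varphi : \varphi_t \in \mathbb R^d \text{ continuous in } t \in [0, T]\}$. Then $\Psi$ is a continuous map if one takes the topology of weak convergence on $\mathcal P([0, T] \times \Omega)$ and the metric
$$\rho_{0T}(\varphi, \tilde\varphi) = \sup_{0\le t\le T}\operatorname{dist}(\varphi_t, \tilde\varphi_t)$$
on $C_{0T}(\mathbb R^d)$.

1.14. LD for paths in averaging. 2. Clearly, $w^\varepsilon = \Psi_x\zeta^{\varepsilon,T}$ and we obtain the large deviations bounds for $w^\varepsilon$ with the rate functional
$$S_{0T}(\varphi) = \inf_\eta\{I_{0T}(\eta) : \Psi_x\eta = \varphi\}.$$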

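This functional can be written also in the following form:
$$S_{0T}(\varphi) = \int_0^T\inf\{I(\mu) : d\varphi_t/dt = \bar B_\mu(\varphi_t),\ \mu \in \mathcal P(\Omega)\}\,dt$$
where $\bar B_\mu(x) = \int B(x, y)\,d\mu(y)$. If we set $\Phi^a_{0T}(x) = \{\varphi \in C_{0T}(\mathbb R^d) : \varphi_0 = x,\ S_{0T}(\varphi) \le a\}$ then these large deviations bounds can be written in the form: for any $a, \beta, \lambda > 0$, each $\delta > 0$ small enough, and every $\varphi \in C_{0T}(\mathbb R^d)$, $\varphi_0 = x$, there exists $\varepsilon_0 > 0$ such that for all $\varepsilon \in (0, \varepsilon_0)$,
$$P\{\rho_{0T}(w^\varepsilon, \varphi) < \beta,\ w^\varepsilon_0 = x\} \ge \exp\Big(-\frac1\varepsilon(S_{0T}(\varphi) + \lambda)\Big)$$
and
$$P\{\rho_{0T}(w^\varepsilon, \Phi^a_{0T}(x)) \ge \beta,\ w^\varepsilon_0 = x\} \le \exp\Big(-\frac1\varepsilon(a - \lambda)\Big).$$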


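1.15. Another approach to the lower bound. This is used without uniqueness of equilibrium states as, for instance, in the case of $\mathbb Z^d$-actions, $d > 1$. Let $(\Omega, \mathcal B)$ be a measurable space and $\mathcal P(\Omega)$ be the space of probability measures defined on $\mathcal B$. Suppose that $\mathcal F \subset \mathcal B$ and $\nu, \mu \in \mathcal P(\Omega)$. Define the Kullback-Leibler information by
$$H^{\mathcal F}(\nu|\mu) = \mu\big(p^{\mathcal F}_{\nu,\mu}\log p^{\mathcal F}_{\nu,\mu}\big)$$
if $\nu \prec \mu$ on $\mathcal F$ with a Radon-Nikodym derivative $p^{\mathcal F}_{\nu,\mu}$, and $H^{\mathcal F}(\nu|\mu) = \infty$ otherwise. Let $\zeta^n : \Omega \to \mathcal P(\Omega)$, $n = 1, 2, \dots$, be a sequence of measurable maps where $\mathcal P(\Omega)$ is taken with some measurable structure.

1.3. Proposition. Suppose that there exists a measurable set $U \subset \mathcal P(\Omega)$ and a sequence of $\sigma$-algebras $\mathcal F_n \subset \mathcal B$, $n = 1, 2, \dots$, such that $\{\omega : \zeta^n_\omega \in U\} \in \mathcal F_n$ for all $n = 1, 2, \dots$ and
$$\lim_{n\to\infty}\nu\{\zeta^n \in U\} = 1.$$
If $r(n) \to \infty$ as $n \to \infty$ and
$$h = \limsup_{n\to\infty}(r(n))^{-1}H^{\mathcal F_n}(\nu|\mu)$$
then
$$\liminf_{n\to\infty}(r(n))^{-1}\log\mu\{\zeta^n \in U\} \ge -h.$$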

1.16. Scheme of application of Proposition 1.3. Let $f : X \to X$, $\nu$ be an $f$-invariant ergodic measure and $\zeta^n_x = n^{-1}\sum_{i=0}^{n-1}\delta_{f^ix}$. Then by Birkhoff's ergodic theorem $\nu\{x : \zeta^n_x \in U\} \to 1$ as $n \to \infty$ for any open $U \ni \nu$. Let $\xi$ be a finite generating partition and $\mathcal F_n$ be the finite algebra generated by $\xi^n = \bigvee_{k=0}^{n-1}f^{-k}\xi$. Then
$$H^{\mathcal F_n}(\nu|\mu) = \sum_{A \in \xi^n}\nu(A)\log\nu(A) - \sum_{A \in \xi^n}\nu(A)\log\mu(A).$$
If
$$C^{-1}\exp\Big(\sum_{i=0}^{n-1}\varphi(f^ix)\Big) \le \mu(A) \le C\exp\Big(\sum_{i=0}^{n-1}\varphi(f^ix)\Big)$$
for some constant $C > 0$, any $x \in A$, each $A \in \xi^n$ and every integer $n > 0$ (such a measure is called a Gibbs one, see [4]) for a certain continuous function $\varphi$ then
$$\lim_{n\to\infty}n^{-1}H^{\mathcal F_n}(\nu|\mu) = -h_\nu(f) - \int\varphi\,d\nu.$$
If, in addition, for any $f$-invariant measure $\nu$ there exists a sequence of ergodic $f$-invariant measures $\nu_n$ which converge weakly to $\nu$ and $h_{\nu_n}(f) \to h_\nu(f)$ as $n \to \infty$, then it follows from Proposition 1.3 that for any open $U \subset \mathcal P(X)$,
$$\liminf_{n\to\infty}n^{-1}\log\mu\{x : \zeta^n_x \in U\} \ge -\inf\{I(\nu) : \nu \in U\}$$
with $I(\nu)$ as before, i.e. $I(\nu) = -\int\varphi\,d\nu - h_\nu(f)$ if $\nu$ is $f$-invariant.
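The entropy formula in this scheme can be checked by brute force in the simplest case. The sketch below assumes, for illustration only, that both $\nu$ and $\mu$ are Bernoulli (product) measures on a full shift and that $\xi$ is the partition by the zeroth coordinate; then $\mu$ satisfies the Gibbs inequalities with $C = 1$ and $\varphi(x) = \log\mu_{x_0}$, and $n^{-1}H^{\mathcal F_n}(\nu|\mu)$ coincides with $-h_\nu(f) - \int\varphi\,d\nu$ for every $n$. The function names are hypothetical.

```python
import itertools
import math


def cylinder_relative_entropy(nu, mu, n):
    """H^{F_n}(nu|mu) summed over all cylinders of length n for product measures."""
    total = 0.0
    for word in itertools.product(range(len(nu)), repeat=n):
        nu_a = math.prod(nu[i] for i in word)
        mu_a = math.prod(mu[i] for i in word)
        if nu_a > 0.0:
            total += nu_a * (math.log(nu_a) - math.log(mu_a))
    return total


def limit_value(nu, mu):
    """-h_nu(f) - integral of phi d(nu) with phi(x) = log mu_{x_0}."""
    entropy = -sum(p * math.log(p) for p in nu if p > 0.0)
    integral_phi = sum(p * math.log(q) for p, q in zip(nu, mu))
    return -entropy - integral_phi


if __name__ == "__main__":
    nu, mu = (0.3, 0.7), (0.5, 0.5)
    for n in (1, 4, 8):
        print(n, cylinder_relative_entropy(nu, mu, n) / n, limit_value(nu, mu))
```

The enumeration grows exponentially in $n$, so this is only meant for small $n$; for genuinely dependent Gibbs measures the identity holds only in the limit.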


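1.17. Subshifts of finite type for $\mathbb Z^d$-actions. Here $Q$ is a finite alphabet (spins), $Q^{\mathbb Z^d}$ is considered with the product topology: it is the space of all maps $\omega : \mathbb Z^d \to Q$ (configurations), $\theta_m$, $m \in \mathbb Z^d$, are the shifts of $Q^{\mathbb Z^d}$, $(\theta_m\omega)_n = \omega_{n+m}$, where $\omega_k \in Q$ is the value of $\omega$ on $k$. Let $\Omega$ be a closed $\theta_m$-invariant subset of $Q^{\mathbb Z^d}$ of permissible configurations; $(\Omega, \theta)$ is called a subshift. It is a subshift of finite type if there exist a finite set (window) $F \subset \mathbb Z^d$ and $\Xi \subset Q^F$ such that
$$\Omega = \Omega_{(F,\Xi)} = \big\{\omega \in Q^{\mathbb Z^d} : (\theta_m\omega)_F \in \Xi \text{ for every } m \in \mathbb Z^d\big\},$$
where $\omega_R$ is the restriction of $\omega \in Q^{\mathbb Z^d}$ to $R \subset \mathbb Z^d$. We say that a weak specification property (in the sense of dynamical systems) holds true if there exists $N$ such that for any subsets $R_i \subset \mathbb Z^d$ which are $N$ apart and for any $\xi_i \in \Omega_{R_i}$ (this is the restriction of $\Omega$ to $R_i$ which gives permissible configurations on $R_i$) one can find $\omega \in \Omega$ such that $\omega_{R_i} = \xi_i$. A shift-invariant interaction potential is by definition a family $\Phi = \{\Phi_\Lambda\}$, where $\Phi_\Lambda : \Omega_\Lambda \to \mathbb R$ is defined for all $\Lambda \in \mathcal A$ and $\mathcal A$ is the collection of all nonempty finite sets, assuming
$$\|\Phi\| = \sum_{\Lambda:\,0 \in \Lambda \in \mathcal A}|\Phi_\Lambda| < \infty, \quad\text{where } |\Phi_\Lambda| = \sup_{\xi \in \Omega_\Lambda}|\Phi_\Lambda(\xi)|,$$
and $\Phi_{\Lambda-m}(\theta_m\xi) = \Phi_\Lambda(\xi)$ $\forall\Lambda \in \mathcal A$, $\forall\xi \in \Omega_\Lambda$.

1.18. More statistical mechanics notions. The following expressions
$$U^\Phi_\Lambda(\xi) = \sum_{X \subset \Lambda}\Phi_X(\xi_X), \qquad U^\Phi_{\Lambda,\eta}(\xi) = \sum_{X \in \mathcal A:\,X\cap\Lambda \ne \emptyset}\Phi_X((\xi\vee\eta)_X), \quad \xi\vee\eta \in \Omega,$$
$$Z^\Phi_\Lambda = \sum_{\xi \in \Omega_\Lambda}\exp\big(-U^\Phi_\Lambda(\xi)\big), \qquad Z^\Phi_\Lambda(\eta) = \sum_{\xi \in \Omega_\Lambda:\,\xi\vee\eta \in \Omega}\exp\big(-U^\Phi_{\Lambda,\eta}(\xi)\big)$$
are called energy (potential) and partition functions, respectively. A Gibbs measure on $\Omega$ is a measure $\mu \in \mathcal P(\Omega)$ satisfying
$$\mu\big(\Xi_\Lambda(\xi)\,|\,\mathcal B_{\Lambda^c}\big)(\eta) = \big(Z^\Phi_\Lambda(\eta)\big)^{-1}\exp\big(-U^\Phi_{\Lambda,\eta}(\xi)\big)$$
for all $\xi \in \Omega_\Lambda$, $\eta \in \Omega_{\Lambda^c}$, $\xi\vee\eta \in \Omega$, and all $\Lambda \in \mathcal A$, where $\Xi_\Lambda(\xi) = \{\omega \in \Omega : \omega_\Lambda = \xi\}$, $\Lambda^c = \mathbb Z^d \setminus \Lambda$ and $\mathcal B_{\Lambda^c}$ is the Borel $\sigma$-algebra on $\Omega_{\Lambda^c}$. Next we define the limit in the sense of van Hove, writing $\Lambda_\gamma \nearrow \infty$, where $\gamma$ belongs to a directed set $\Gamma$, and considering limits along $\Gamma$ assuming
$$\lim_{\gamma \in \Gamma}|\Lambda_\gamma| = \infty \quad\text{and}\quad \lim_{\gamma \in \Gamma}\frac{|(\Lambda_\gamma + a)\setminus\Lambda_\gamma|}{|\Lambda_\gamma|} = 0 \quad\text{for each } a \in \mathbb Z^d.$$

1.19. LD for $\mathbb Z^d$-actions. Set $\zeta^\Lambda_\omega = |\Lambda|^{-1}\sum_{m \in \Lambda}\delta_{\theta_m\omega}$.

1.4. Theorem. Let $(\Omega, \theta)$ be a subshift of finite type satisfying the weak specification. Then for any interaction $\Phi$ as above and any Gibbs measure $\mu$,
$$\limsup_{\Lambda\nearrow\infty}|\Lambda|^{-1}\log\mu\{\omega : \zeta^\Lambda_\omega \in K\} \le -\inf_{\nu \in K}I^\Phi(\nu)$$
for all closed $K \subset \mathcal P(\Omega)$, while for any open $G \subset \mathcal P(\Omega)$,
$$\liminf_{\Lambda\nearrow\infty}|\Lambda|^{-1}\log\mu\{\omega : \zeta^\Lambda_\omega \in G\} \ge -\inf_{\nu \in G}I^\Phi(\nu)$$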


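where $I^\Phi(\nu) = \Pi(A_\Phi) - \int A_\Phi\,d\nu - h_\nu$ if $\nu \in \mathcal P(\Omega)$ is shift invariant and $= \infty$ otherwise. Here $A_\Phi(\omega) = -\sum\{|R|^{-1}\Phi_R(\omega_R) : R \subset \mathbb Z^d \text{ is finite and } 0 \in R\}$,
$$\Pi(A_\Phi) = \lim_{\Lambda\nearrow\infty}\frac{\log Z^\Phi_\Lambda}{|\Lambda|} = \lim_{\Lambda\nearrow\infty}|\Lambda|^{-1}\log\Big(\sum_{\xi \in \Omega_\Lambda}\exp\Big(\sum_{m \in \Lambda}A_\Phi(\theta_m\omega^\xi)\Big)\Big), \quad \forall\omega^\xi \in \Xi_\Lambda(\xi),$$
is the pressure and $h_\nu = \lim_{\Lambda\nearrow\infty}|\Lambda|^{-1}H_\Lambda(\nu)$ with
$$H_\Lambda(\nu) = -\sum_{\xi \in \Omega_\Lambda}\nu(\Xi_\Lambda(\xi))\log\nu(\Xi_\Lambda(\xi))$$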

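being the (mean) entropy.

1.20. Application: LD for configurations counting. Let, again, $(\Omega, \theta)$, $\Omega = \Omega_{(F,\Xi)}$, be a subshift of finite type. For $a = (a_1, \dots, a_d) \in \mathbb Z^d$, $a_i > 0$, $1 \le i \le d$, set $\Lambda(a) = \{i \in \mathbb Z^d : 0 \le i_k < a_k,\ 1 \le k \le d\}$, and we write $a \to \infty$ if $a_1, \dots, a_d \to \infty$. Let also $\mathbb Z^d(a)$ be the subgroup of $\mathbb Z^d$ generated by $(a_1, 0, \dots, 0), \dots, (0, \dots, 0, a_d)$. The set of $a$-periodic points is defined by $\Pi_a = \{\omega \in \Omega : \mathbb Z^d(a)\omega = \omega\}$, i.e. $\theta_m\omega = \omega$ for all $m \in \mathbb Z^d(a)$. We say that the weak specification holds true if there exists $N > 0$ such that for all $R_i \subset \mathbb Z^d$ which are $N$ apart and all permissible configurations $\xi_i$ on $R_i$ there exists $\omega \in \Omega$ such that $\omega_{R_i} = \xi_i$. The strong specification takes place if there exists $N > 0$ such that for all $R_i \subset \Lambda(a)$ which are $N$ apart and for all permissible configurations $\xi_i$ on $R_i$ there exists $\omega \in \Pi_a$ such that $\omega_{R_i} = \xi_i$. Set $\zeta^a_\omega = \zeta^{\Lambda(a)}_\omega$ where, again, $\zeta^\Lambda_\omega = |\Lambda|^{-1}\sum_{m \in \Lambda}\delta_{\theta_m\omega}$. Define $\nu_a(\Gamma) = |\Pi_a|^{-1}|\Gamma \cap \Pi_a|$, $\Gamma \subset \Omega$, which is the uniform distribution on $\Pi_a$.

1.5. Theorem. Suppose that $(\Omega, \theta)$ is a subshift of finite type satisfying the strong specification. Then for any closed $K \subset \mathcal P(\Omega)$ and for any open $G \subset \mathcal P(\Omega)$,
$$\limsup_{a\to\infty}|\Lambda(a)|^{-1}\log\nu_a\{\omega : \zeta^a_\omega \in K\} \le -\inf_{\eta \in K}J(\eta) \quad\text{and}$$
$$\liminf_{a\to\infty}|\Lambda(a)|^{-1}\log\nu_a\{\omega : \zeta^a_\omega \in G\} \ge -\inf_{\eta \in G}J(\eta)$$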

where $J(\eta) = h_{top} - h_\eta$ if $\eta$ is shift invariant and $= \infty$ otherwise, and $h_{top} = \sup\{h_\eta : \eta \text{ is shift invariant}\}$ is the topological entropy of the subshift.

1.21. LD for Benedicks–Carleson quadratic maps by Chung and Takahasi. Let $X = [-1, 1]$ and $f_a : X \to X$ where $f_ax = 1 - ax^2$ for $0 < a < 2$. Suppose that for $\lambda = \frac{9}{10}\log 2$ and $\alpha = \frac{1}{100}$ the following conditions are satisfied:
• $f = f_a$ with $a$ close enough to 2; $f$ is topologically mixing on $[f^20, f0]$;
• $|(f^n)'(f0)| \ge e^{\lambda n}$ $\forall n \ge 0$; $|f^n0| \ge e^{-\alpha\sqrt n}$ $\forall n \ge 1$.
Then $f$ has an absolutely continuous invariant probability measure $\mu$ and the set of parameters $a$ satisfying the above has positive Lebesgue measure. Set $\delta^n_x = \frac1n\sum_{i=0}^{n-1}\delta_{f^ix}$, $\lambda(\nu) = \int\log|f'|\,d\nu$ and let $h(\nu)$ be the entropy of an $f$-invariant $\nu \in \mathcal P(X)$.

1.6. Theorem. Let $F(\nu) = h(\nu) - \lambda(\nu)$ if $\nu \in \mathcal P(X)$ is $f$-invariant and $= -\infty$ otherwise, and define $I(\nu) = -\inf_G\sup\{F(\eta) : \eta \in G\}$, where the infimum is over all open neighborhoods $G$ of $\nu$ (lower semicontinuous regularization of $-F$). Then
$$\liminf_{n\to\infty}\frac1n\log\mu\{x \in X : \delta^n_x \in G\} \ge -\inf\{I(\nu) : \nu \in G\} \quad\text{and}$$
$$\limsup_{n\to\infty}\frac1n\log\mu\{x \in X : \delta^n_x \in K\} \le -\inf\{I(\nu) : \nu \in K\}$$


for any open $G \subset \mathcal P(X)$ and closed $K \subset \mathcal P(X)$ (with respect to the topology of weak convergence). By the general Donsker-Varadhan result:
$$\lim_{n\to\infty}\frac1n\log\int\exp\Big(\sum_{i=0}^{n-1}\varphi\circ f^i\Big)d\mu = \sup_{\nu \in \mathcal P(X)}\Big(\int\varphi\,d\nu - I(\nu)\Big) \quad \forall\varphi \in C(X).$$

2. Large deviations in averaging

2.1. Averaging setup: continuous time. We consider a system of differential equations for $X^\varepsilon = X^\varepsilon_{x,y}$ and $Y^\varepsilon = Y^\varepsilon_{x,y}$,
$$\frac{dX^\varepsilon(t)}{dt} = \varepsilon B(X^\varepsilon(t), Y^\varepsilon(t)), \qquad \frac{dY^\varepsilon(t)}{dt} = b(X^\varepsilon(t), Y^\varepsilon(t))$$
with initial conditions $X^\varepsilon(0) = x$, $Y^\varepsilon(0) = y$ on the product $\mathbb R^d \times M$, where $M$ is a compact $n_M$-dimensional $C^2$ Riemannian manifold and $B(x, y)$, $b(x, y)$ are smooth in $x, y$ families of bounded vector fields on $\mathbb R^d$ and on $M$, respectively, so that $y$ serves as a parameter for $B$ and $x$ for $b$. The solutions of the above equations determine the flow of diffeomorphisms $\Phi^t_\varepsilon$ on $\mathbb R^d \times M$ acting by $\Phi^t_\varepsilon(x, y) = (X^\varepsilon_{x,y}(t), Y^\varepsilon_{x,y}(t))$. Taking $\varepsilon = 0$ we arrive at the flow $\Phi^t = \Phi^t_0$ acting by $\Phi^t(x, y) = (x, F^t_xy)$ where $F^t_x$ is another family of flows given by $F^t_xy = Y_{x,y}(t)$ with $Y = Y_{x,y} = Y^0_{x,y}$ being the solution of
$$\frac{dY(t)}{dt} = b(x, Y(t)), \qquad Y(0) = y.$$
It is natural to view the flow $\Phi^t$ as describing an idealized physical system where the parameters $x = (x_1, \dots, x_d)$ are assumed to be constants (integrals) of motion, while the perturbed flow $\Phi^t_\varepsilon$ is regarded as describing a real system where the evolution of these parameters is also taken into consideration.

2.2. Averaging setup: discrete time. In the discrete time case we deal with difference equations for sequences $X^\varepsilon(n) = X^\varepsilon_{x,y}(n)$ and $Y^\varepsilon(n) = Y^\varepsilon_{x,y}(n)$, $n = 0, 1, \dots$, so that
$$X^\varepsilon(n+1) - X^\varepsilon(n) = \varepsilon B(X^\varepsilon(n), Y^\varepsilon(n)), \qquad Y^\varepsilon(n+1) = F_{X^\varepsilon(n)}Y^\varepsilon(n), \qquad X^\varepsilon(0) = x,\ Y^\varepsilon(0) = y,$$
where $B : \mathcal X \times M \to \mathbb R^d$ is Lipschitz in both variables and the maps $F_x : M \to M$ are smooth and depend smoothly on the parameter $x \in \mathbb R^d$. Introduce the map
$$\Phi_\varepsilon(x, y) = (X^\varepsilon_{x,y}(1), Y^\varepsilon_{x,y}(1)) = (x + \varepsilon B(x, y), F_xy).$$
This setup is viewed as a perturbation of the map $\Phi(x, y) = (x, F_xy)$ describing an ideal system where the parameters $x \in \mathbb R^d$ do not change. Assuming that $F_x$, $x \in \mathbb R^d$, are $C^2$ depending on $x$ families of either $C^2$ expanding transformations or $C^2$ Axiom A diffeomorphisms in a neighborhood of an attractor $\Lambda_x$, we will derive large deviations estimates for the difference $X^\varepsilon_{x,y}(n) - \bar X^\varepsilon_x(n)$ where $\bar X^\varepsilon = \bar X^\varepsilon_x$ solves the equation
$$\frac{d\bar X^\varepsilon(t)}{dt} = \varepsilon\bar B(\bar X^\varepsilon(t)), \qquad \bar X^\varepsilon(0) = x,$$
where $\bar B(x) = \int B(x, y)\,d\mu^{SRB}_x(y)$ and $\mu^{SRB}_x$ is the corresponding Sinai-Ruelle-Bowen (SRB) invariant measure of $F_x$ on $\Lambda_x$. The discrete time results are obtained,


essentially, by simplifications of the corresponding arguments in the continuous time case.

2.3. Averaging principle. Assume that the limit
$$\bar B(x) = \bar B_y(x) = \lim_{T\to\infty}T^{-1}\int_0^TB(x, F^t_xy)\,dt$$
exists and it is the same for "many" $y$'s. For instance, suppose that $\mu_x$ is an ergodic invariant measure of the flow $F^t_x$; then this limit exists for $\mu_x$-almost all $y$ and is equal to
$$\bar B(x) = \bar B_{\mu_x}(x) = \int B(x, y)\,d\mu_x(y).$$
If $b(x, y)$ does not, in fact, depend on $x$ then $F^t_x = F^t$ and $\mu_x = \mu$ are also independent of $x$ and we arrive at the classical uncoupled setup, while the above general setup we call fully coupled. In the uncoupled case Lipschitz continuity of $B$ already implies that $\bar B(x)$ is also Lipschitz continuous in $x$, and so there exists a unique solution $\bar X^\varepsilon = \bar X^\varepsilon_x$ of the averaged equation
$$\frac{d\bar X^\varepsilon(t)}{dt} = \varepsilon\bar B(\bar X^\varepsilon(t)), \qquad \bar X^\varepsilon(0) = x.$$
In this case the standard averaging principle says that for $\mu$-almost all $y$,
$$\lim_{\varepsilon\to0}\sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)| = 0.$$

In the fully coupled case the averaging principle in the form above usually does not hold true, both for nearly integrable Hamiltonian systems in the presence of resonances and for hyperbolic systems (Anosov flows depending on parameters).

2.4. Nonconvergence examples in the fully coupled case. The 1st example could be viewed as a perturbation of circle rotations but describes also the motion of a pendulum with a small friction, and it has the form
$$\frac{dX^\varepsilon(t)}{dt} = \varepsilon\big(4 + 8\sin Y^\varepsilon(t) - X^\varepsilon(t)\big), \qquad \frac{dY^\varepsilon(t)}{dt} = X^\varepsilon(t)$$
with the corresponding averaged equation $\frac{d\bar X^\varepsilon(t)}{dt} = \varepsilon(4 - \bar X^\varepsilon(t))$. Due to the "capture into resonance", if $X^\varepsilon(0) = \bar X^\varepsilon(0) = x \in (-2, -1)$ then
$$\limsup_{\varepsilon\to0}|X^\varepsilon_{x,y}(1/\varepsilon) - \bar X^\varepsilon_x(1/\varepsilon)| > 3/2 \quad \forall y.$$

The 2nd example is determined by the system of difference equations for sequences $X^\varepsilon(n) = X^\varepsilon_{x,y}(n) \in \mathbb R$ and $Y^\varepsilon(n) = Y^\varepsilon_{x,y}(n) \in \mathbb R$, $n = 0, 1, \dots$, such that
$$X^\varepsilon(n+1) - X^\varepsilon(n) = \varepsilon\sin(Y^\varepsilon(n)), \qquad X^\varepsilon(0) = x,$$
$$Y^\varepsilon(n+1) = 2Y^\varepsilon(n) + X^\varepsilon(n) + c\xi_n \pmod{2\pi}, \qquad Y^\varepsilon(0) = y,$$
where $\{\xi_n\}_{n=0}^\infty$ is an arbitrary sequence with $\sup_n|\xi_n| \le 1$ and $c \ge 0$ is a small number. Here $F_xy = F_{x,n}y = 2y + x + c\xi_n \pmod{2\pi}$ and all $F_{x,n}$ act on the circle $\mathbb T^1$ (of length $2\pi$) preserving the Lebesgue measure there, and since $\int_0^{2\pi}\sin y\,dy = 0$ we obtain that the averaged motion stays forever at the initial point. Then for any $x \in \mathbb R$ and $\xi = \{\xi_n\}_{n=0}^\infty$ as above there exists $\Gamma_{x,\xi}$ with full Lebesgue measure on the circle $\mathbb T^1$ such that for each $y \in \Gamma_{x,\xi}$,
$$\limsup_{\varepsilon\to0}\max_{0\le n\le1/\varepsilon}|X^\varepsilon_{x,y}(n) - x| \ge \delta > 0,$$
where $\delta$ does not depend on $x$, $y$, $\xi$ and $\varepsilon$.
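The second example is easy to iterate numerically. The following is only a rough sketch under assumptions of our own: the function name and the way the perturbing sequence $\xi_n$ is passed are hypothetical, and floating-point arithmetic merely approximates the doubling dynamics, so the output illustrates the quantity $\max_{0\le n\le1/\varepsilon}|X^\varepsilon_{x,y}(n) - x|$ rather than proving anything about its limit.

```python
import math


def max_slow_deviation(x0, y0, eps, c=0.0, xi=None):
    """max over 0 <= n <= 1/eps of |X(n) - x0| for the second example."""
    n_steps = int(round(1.0 / eps))
    x, y = x0, y0
    worst = 0.0
    for n in range(n_steps):
        xi_n = xi(n) if xi is not None else 0.0
        x_next = x + eps * math.sin(y)                       # slow variable
        y = (2.0 * y + x + c * xi_n) % (2.0 * math.pi)       # fast variable uses X(n)
        x = x_next
        worst = max(worst, abs(x - x0))
    return worst


if __name__ == "__main__":
    for eps in (1e-2, 1e-3, 1e-4):
        print(eps, max_slow_deviation(0.3, 1.0, eps))
```

Since each step moves the slow variable by at most $\varepsilon$, the returned value never exceeds 1; the statement in the text concerns whether it stays bounded away from 0 along a sequence $\varepsilon \to 0$.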


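2.5. Averaging principle in the fully coupled case. Set
$$\mathcal X_t = \{x \in \mathcal X : X^\varepsilon_{x,y}(s) \in \mathcal X,\ \bar X^\varepsilon_x(s) \in \mathcal X \text{ for all } y \in M \text{ and } s \in [0, t/\varepsilon]\}$$
and
$$E_\varepsilon(t, \delta) = \Big\{(x, y) \in \mathcal X_t \times M : \Big|\frac1t\int_0^tB(x, Y^\varepsilon_{x,y}(u))\,du - \bar B(x)\Big| > \delta\Big\}.$$

2.1. Theorem. Suppose that the vector fields $b$ and $B$ are Lipschitz continuous and bounded and that $\bar B(x) = \int B(x, y)\,d\mu_x(y)$ is Lipschitz as well. Then
$$\lim_{\varepsilon\to0}\int_{\mathcal X_T}\int_M\sup_{0\le t\le T/\varepsilon}|X^\varepsilon_{x,y}(t) - \bar X^\varepsilon_x(t)|\,d\mu(x, y) = 0, \qquad d\mu(x, y) = d\mu_x(y)\,d\eta(x),$$
if and only if there exists an integer valued function $n = n(\varepsilon) \to \infty$ as $\varepsilon \to 0$ such that for any $\delta > 0$,
$$\lim_{\varepsilon\to0}\ \max_{0\le j\,\dots}\ \mu\{(\mathcal X_T \times M) \cap \Phi_\varepsilon^{-jt(\varepsilon)}E_\varepsilon(t(\varepsilon), \delta)\} = 0, \ \dots$$

… $0$ such that $\Lambda_x \subset W$, $F^t_x\bar W \subset W$ $\forall t \ge t_0$, and $\cap_{t>0}F^t_xW = \Lambda_x$ $\forall x \in \bar{\mathcal X}$.

Let $J^u_x(t, y)$ be the absolute value of the Jacobian of the linear map $DF^t_x(y) : \Gamma^u_x(y) \to \Gamma^u_x(F^t_xy)$ with respect to the Riemannian inner products and set
$$\varphi^u_x(y) = -\frac{dJ^u_x(t, y)}{dt}\Big|_{t=0},$$
which is a Hölder continuous function.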

2.7. LD functionals. Denote by $\mathcal M_x$ the space of $F^t_x$-invariant probability measures on $\Lambda_x$; then we have the variational principle for the topological pressure $\Pi_x(\psi) = \sup_{\mu \in \mathcal M_x}\big(\int\psi\,d\mu + h_\mu(F^1_x)\big)$. If $\Pi_x(\varphi^u_x + q) = \int(\varphi^u_x + q)\,d\mu^q_x + h_{\mu^q_x}(F^1_x)$ then an $F^t_x$-invariant measure $\mu^q_x$ on $\Lambda_x$ is called the equilibrium state for $\varphi^u_x + q$, while $\mu^0_x = \mu^{SRB}_x$ is called the Sinai–Ruelle–Bowen (SRB) measure. Since $\Lambda_x$ are attractors, $\Pi_x(\varphi^u_x) = 0$. For any probability measure $\nu$ on $\bar W$ set $I_x(\nu) = -\int\varphi^u_x\,d\nu - h_\nu(F^1_x)$ if $\nu \in \mathcal M_x$ and $= \infty$ otherwise. Then $\Pi_x(\varphi^u_x + q) = \sup_\nu\big(\int q\,d\nu - I_x(\nu)\big)$. The functional $I_x(\nu)$


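is lower semicontinuous in $\nu$ and it is also convex (and affine on $\mathcal M_x$), and so by the duality theorem $I_x(\nu) = \sup_{q \in C(M)}\big(\int q\,d\nu - \Pi_x(\varphi^u_x + q)\big)$. Set
$$L(x, x', \alpha) = \inf\Big\{I_x(\nu) : \int B(x', y)\,d\nu(y) = \alpha\Big\}$$
if there exists $\nu \in \mathcal M_x$ satisfying the condition in brackets and $L(x, x', \alpha) = \infty$ otherwise. Let $C_{0T}$ be the space of continuous curves $\gamma = \gamma_t$, $t \in [0, T]$, in $\mathcal X$. For each absolutely continuous $\gamma \in C_{0T}$ denote by $\dot\gamma_t$ its velocity. Now set
$$S_{0T}(\gamma) = \int_0^TL(\gamma_t, \gamma_t, \dot\gamma_t)\,dt = \int_0^T\inf\{I_{\gamma_t}(\nu) : \dot\gamma_t = \bar B_\nu(\gamma_t),\ \nu \in \mathcal M_{\gamma_t}\}\,dt,$$
where $\bar B_\nu(x) = \int B(x, y)\,d\nu(y)$, provided for Lebesgue almost all $t \in [0, T]$ there exists $\nu_t \in \mathcal M_{\gamma_t}$ for which $\dot\gamma_t = \bar B_{\nu_t}(\gamma_t)$, and $S_{0T}(\gamma) = \infty$ otherwise.

2.8. Basic LD theorem in averaging. Let $\gamma^u_t$ be the unique solution of the equation $\dot\gamma^u_t = \bar B(\gamma^u_t)$, $\gamma^u_0 = x$, where $\bar B(z) = \bar B_{\mu^{SRB}_z}(z)$. Define the uniform metric on $C_{0T}$ by $r_{0T}(\gamma, \tilde\gamma) = \sup_{0\le t\le T}|\gamma_t - \tilde\gamma_t|$ for any $\gamma, \tilde\gamma \in C_{0T}$. Set $\Psi^a_{0T}(x) = \{\gamma \in C_{0T} : \gamma_0 = x,\ S_{0T}(\gamma) \le a\}$. Then $S_{0T}$ is a lower semicontinuous functional on $C_{0T}$ with respect to the metric $r_{0T}$, and so $\Psi^a_{0T}(x)$ is a closed set. Set $\mathcal X_t = \{x \in \mathcal X : X^\varepsilon_{x,y}(s) \in \mathcal X \text{ and } \bar X^\varepsilon_x(s) \in \mathcal X \text{ for all } y \in \bar W,\ s \in [0, t/\varepsilon],\ \varepsilon > 0\}$. Clearly, $\mathcal X_t \supset \{x \in \mathcal X : \inf_{z \in \partial\mathcal X}|x - z| \ge 2Kt\}$.

2.3. Theorem. If $Z^\varepsilon_{x,y}(t) = X^\varepsilon_{x,y}(t/\varepsilon)$ then for any $x \in \mathcal X_T$, $a, \delta, \lambda > 0$ and every $\gamma \in C_{0T}$, $\gamma_0 = x$, there exists $\varepsilon_0 = \varepsilon_0(x, \gamma, a, \delta, \lambda) > 0$ such that for $\varepsilon < \varepsilon_0$,
$$m\{y \in W : r_{0T}(Z^\varepsilon_{x,y}, \gamma) < \delta\} \ge \exp\Big(-\frac1\varepsilon(S_{0T}(\gamma) + \lambda)\Big) \quad\text{and}$$
$$m\{y \in W : r_{0T}(Z^\varepsilon_{x,y}, \Psi^a_{0T}(x)) \ge \delta\} \le \exp\Big(-\frac1\varepsilon(a - \lambda)\Big)$$
where, recall, $m$ is the normalized Riemannian volume on $M$. The functional $S_{0T}(\gamma)$ for $\gamma \in C_{0T}$ is finite if and only if $\dot\gamma_t = \bar B_{\nu_t}(\gamma_t)$, $\nu_t \in \mathcal M_{\gamma_t}$, for Lebesgue almost all $t \in [0, T]$. Finally, $S_{0T}(\gamma)$ achieves its minimum 0 only on $\gamma^u$.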

2.9. Exits from a domain for bounded times. Let $V \subset \mathcal X$ be a connected open set and put $\tau^\varepsilon_{x,y}(V) = \inf\{t \ge 0 : Z^\varepsilon_{x,y}(t) \notin V\}$, where we take $\tau^\varepsilon_{x,y}(V) = \infty$ if $X^\varepsilon_{x,y}(t) \in V$ for all $t \ge 0$.

2.4. Corollary. Under the conditions of the above theorem, for any $T > 0$ and $x \in V$,
$$\lim_{\varepsilon\to0}\varepsilon\log m\{y \in W : \tau^\varepsilon_{x,y}(V) < T\} = -\inf\{S_{0t}(\gamma) : \gamma \in C_{0T},\ t \in [0, T],\ \gamma_0 = x,\ \gamma_t \notin V\}.$$

Next, we will study the "very long", i.e. exponential in $1/\varepsilon$, time "adiabatic" behaviour of the slow motion, which usually cannot be described in the traditional theory of averaging, where only perturbations of integrable Hamiltonian systems are considered and these are not chaotic enough. Namely, we will describe such long time behavior of $Z^\varepsilon$ in terms of the function
$$R(x, z) = \inf_{t\ge0,\ \gamma \in C_{0t}}\{S_{0t}(\gamma) : \gamma_0 = x,\ \gamma_t = z\}$$
under various assumptions on the averaged motion $\bar Z$. Observe that $R$ satisfies the triangle inequality $R(x_1, x_2) + R(x_2, x_3) \ge R(x_1, x_3)$ for any $x_1, x_2, x_3 \in \mathcal X$ and it


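determines a semi metric on $\mathcal X$ which measures "the difficulty" for the slow motion to move from point to point in terms of the functional $S$.

2.10. S-compacts. Introduce the averaged flow $\Psi^t$ on $\mathcal X_t$ by
$$\frac{d\Psi^tx}{dt} = \bar B(\Psi^tx), \qquad x \in \mathcal X_t,$$
where, recall, $\bar B(z) = \bar B_{\mu^{SRB}_z}(z)$ and $\bar B_\nu(z) = \int B(z, y)\,d\nu(y)$ for any probability measure $\nu$ on $M$. Call a $\Psi^t$-invariant compact set $O \subset \mathcal X$ an S-compact if for every $\eta > 0$ there exist $T_\eta \ge 0$ and an open $U_\eta \supset O$ such that whenever $x \in O$ and $z \in U_\eta$ there are $t \in [0, T_\eta]$ and $\gamma \in C_{0t}$ with $\gamma_0 = x$, $\gamma_t = z$ and $S_{0t}(\gamma) \le \eta$. Then $R(x, z) = 0$ for any $x, z$ in an S-compact $O$ and $R(x, z) \equiv$ const when $z \in \mathcal X$ is fixed and $x$ runs over $O$. A vector field $B$ on $\mathcal X \times M$ is called complete at $x \in \mathcal X$ if the convex set of vectors $\{\beta\bar B_\nu(x) : \beta \in [0, 1],\ \nu \in \mathcal M_x\}$ contains an open neighborhood of the origin in $\mathbb R^d$. It turns out that if $O \subset \mathcal X$ is a compact $\Psi^t$-invariant set such that $B$ is complete at each $x \in O$ and either $O$ contains a dense orbit of the flow $\Psi^t$ (i.e. $\Psi^t$ is topologically transitive on $O$) or $R(x, z) = 0$ for any $x, z \in O$, then $O$ is an S-compact. Moreover, $O$ is an S-compact if $B$ is complete only at some point of $O$ and the flow $\Psi^t$ on $O$ is minimal, i.e. the $\Psi^t$-orbits of all points are dense in $O$. A compact $\Psi^t$-invariant set $O \subset \mathcal X$ is called an attractor for $\Psi^t$ if there exist an open $U \supset O$ and $t_U > 0$ such that $\Psi^{t_U}\bar U \subset U$ and $\lim_{t\to\infty}\operatorname{dist}(\Psi^tz, O) = 0$ for all $z \in U$. The set $V = \{z \in \mathcal X : \lim_{t\to\infty}\operatorname{dist}(\Psi^tz, O) = 0\}$, which is clearly open, is called the basin of $O$. An attractor which is an S-compact is called an S-attractor.

2.11. Exits from a domain for very long times.

2.5. Theorem. Let $O \subset \mathcal X$ be an S-attractor whose basin contains the closure $\bar V$ of a connected open set $V$ with a piecewise smooth boundary $\partial V$ such that $\bar V \subset \mathcal X$. Assume that for each $z \in \partial V$ there exist $\Delta = \Delta(z) > 0$ and an $F^t_z$-invariant probability measure $\nu = \nu_z$ on $\Lambda_z$ such that $z + s\bar B(z) \in V$ but $z + s\bar B_\nu(z) \in \mathbb R^d \setminus \bar V$ $\forall s \in (0, \Delta]$, i.e. $\bar B(z) \ne 0$, $\bar B_\nu(z) \ne 0$ and the former vector points out into the interior while the latter into the exterior of $V$. Set $R_\partial(z) = \inf\{R(z, \tilde z) : \tilde z \in \partial V\}$ and $\partial_{\min}(z) = \{\tilde z \in \partial V : R(z, \tilde z) = R_\partial(z)\}$. Then $R_\partial(z) \equiv R_\partial$ and $\partial_{\min}(z) \equiv \partial_{\min}$ $\forall z \in O$, $R_\partial(x) \le R_\partial$ $\forall x \in V$ and
$$\lim_{\varepsilon\to0}\varepsilon\log\int_W\tau^\varepsilon_{x,y}(V)\,dm(y) = R_\partial > 0 \quad \forall x \in V.$$
Next, $\forall\alpha > 0$ $\exists\lambda(\alpha) = \lambda(x, \alpha) > 0$ such that $\forall\varepsilon > 0$ small,
$$m\big\{y \in W : e^{(R_\partial-\alpha)/\varepsilon} > \tau^\varepsilon_{x,y}(V) \text{ or } \tau^\varepsilon_{x,y}(V) > e^{(R_\partial+\alpha)/\varepsilon}\big\} \le e^{-\lambda(\alpha)/\varepsilon}$$
and
$$\lim_{\varepsilon\to0}m\big\{y \in W : \operatorname{dist}\big(Z^\varepsilon_{x,y}(\tau^\varepsilon_{x,y}(V)), \partial_{\min}\big) \ge \delta\big\} = 0 \quad \forall x \in V,\ \delta > 0,$$
provided $R_\partial < \infty$, and the latter holds true if and only if for some $T > 0$ there exists $\gamma \in C_{0T}$, $\gamma_0 \in O$, $\gamma_T \in \partial V$, such that $\dot\gamma_t = \bar B_{\nu_t}(\gamma_t)$ for Lebesgue almost all $t \in [0, T]$ with $\nu_t \in \mathcal M_{\gamma_t}$.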


2.12. Exit from a domain: picture.


2.13. Several S-attractors. Next, we want to describe transitions of the slow motion $Z^\varepsilon$ between basins of attractors of the averaged flow $\Psi^t$.

2.6. Assumption. The family $\{F^t_x,\ t \le 1,\ x \in \mathbb{R}^d\}$ is a compact set of diffeomorphisms in the $C^2$ topology, $\|B(x,y)\|_{C^2(\mathbb{R}^d\times M)} \le K$ for some $K > 0$ independent of $x, y$, and there exists $r_0 > 0$ such that $\langle x, B(x,y)\rangle \le -K^{-1}$ for any $y \in W$ and $|x| \ge r_0$.

Suppose that the ω-limit set of $\Psi^t$ is compact and it consists of a finite number of S-attractors $O_1, \dots, O_\ell$ whose basins $V_1, \dots, V_\ell$ have piecewise smooth boundaries $\partial V_1, \dots, \partial V_\ell$ and of the remaining part which is contained in $\cup_{1\le j\le \ell}\, \partial V_j$. Assume also that for any $z \in \cap_{1\le i\le k}\, \partial V_{j_i}$, $k \le \ell$, there exist $s_0 = s_0(z) > 0$ and $F^t_z$-invariant measures $\nu_1, \dots, \nu_k$ such that $z + s\bar B_{\nu_i}(z) \in V_{j_i}$ for all $s \in (0, s_0]$ and $i = 1, \dots, k$, i.e. $\bar B_{\nu_i}(z) \ne 0$ and it points into the interior of $V_{j_i}$, which means that from any boundary point it is possible to go to any adjacent basin along a curve with an arbitrarily small S-functional. Let $\delta > 0$ be so small that each $U_\delta(O_i) = \{z \in X : \mathrm{dist}(z, O_i) < \delta\}$ is contained with its closure in the basin $V_i$. For any $x \in V_i$ set

$$\tau^\varepsilon_{x,y}(i) = \inf\big\{t \ge 0 : Z^\varepsilon_{x,y}(t) \in \cup_{j\ne i}\, U_\delta(O_j)\big\}.$$

2.14. Rare (adiabatic) transitions between S-attractors.

2.7. Theorem. The function $R_{ij}(x) = \inf_{z\in V_j} R(x,z)$ takes on the same value $R_{ij}$ for all $x \in O_i$, $i \ne j$. Let $R_i = \min_{j\ne i,\, j\le \ell} R_{ij}$. Then

$$\lim_{\varepsilon\to 0} \varepsilon \log \int_W \tau^\varepsilon_{x,y}(i)\,dm(y) = R_i > 0 \quad \forall\, x \in V_i$$

and for every $\alpha > 0$ there exists $\lambda(\alpha) = \lambda(x,\alpha) > 0$ such that for all small $\varepsilon > 0$,

$$m\big\{y \in W : e^{(R_i-\alpha)/\varepsilon} > \tau^\varepsilon_{x,y}(i) \ \text{or}\ \tau^\varepsilon_{x,y}(i) > e^{(R_i+\alpha)/\varepsilon}\big\} \le e^{-\lambda(\alpha)/\varepsilon}.$$

Suppose that for all $i$, $B$ is complete on $\partial V_i$ and the restriction of the ω-limit set of $\Psi^t$ to $\partial V_i$ consists of a finite number of S-compacts. Assume also that for any $i$ there is a unique $\iota(i) \le \ell$, $\iota(i) \ne i$, such that $R_i = R_{i\iota(i)}$. Define $\iota_0(i) = i$, $\tau^\varepsilon_v(i,1) = \tau^\varepsilon_v(i)$ and recursively, $\iota_k(i) = \iota(\iota_{k-1}(i))$ and

$$\tau^\varepsilon_v(i,k) = \tau^\varepsilon_v(i,k-1) + \tau^\varepsilon_{v_\varepsilon(k-1)}\big(j(v_\varepsilon(k-1))\big),$$

where $v_\varepsilon(k) = \Phi_\varepsilon^{\varepsilon^{-1}\tau^\varepsilon_v(i,k)} v$ and $j((x,y)) = j$ if $x \in V_j$. Then for every $x \in V_i$ and $\alpha > 0$ there exists $\lambda = \lambda(x) > 0$ such that for all $n \in \mathbb{N}$,

$$m\big\{y \in W : Z^\varepsilon_{x,y}(\tau^\varepsilon_{x,y}(i,k)) \notin V_{\iota_k(i)} \ \text{for some}\ k \le n\big\} \le n e^{-\lambda/\varepsilon}.$$
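The combinatorial part of Theorem 2.7 (the exponents $R_i$, the successor map $\iota$ and the itinerary $\iota_k(i)$) can be illustrated by a minimal Python sketch. The matrix of values $R_{ij}$ below is purely hypothetical, chosen only for illustration, and the function names are not taken from the lectures.

```python
def transition_structure(R):
    """Given R[i][j] = R_ij for i != j (diagonal ignored), return the lists
    (R_i, iota), where R_i = min_{j != i} R_ij and iota[i] is its minimiser.
    Raises ValueError when the minimiser is not unique, since the theorem
    assumes uniqueness of iota(i)."""
    n = len(R)
    rates, iota = [], []
    for i in range(n):
        candidates = [(R[i][j], j) for j in range(n) if j != i]
        best = min(value for value, _ in candidates)
        argmins = [j for value, j in candidates if value == best]
        if len(argmins) != 1:
            raise ValueError(f"iota({i}) is not unique")
        rates.append(best)
        iota.append(argmins[0])
    return rates, iota


def itinerary(iota, i, n_steps):
    """Return [iota_0(i), iota_1(i), ..., iota_{n_steps}(i)]."""
    path = [i]
    for _ in range(n_steps):
        path.append(iota[path[-1]])
    return path


# Hypothetical exponents R_ij for three attractors (indices 0, 1, 2).
R_example = [
    [0.0, 1.0, 3.0],
    [0.5, 0.0, 2.0],
    [4.0, 1.5, 0.0],
]
rates, iota = transition_structure(R_example)
print("R_i  =", rates)
print("iota =", iota)
print("itinerary from attractor 2:", itinerary(iota, 2, 4))
```

With probability close to one the slow motion follows this itinerary, spending a time of order $e^{R_i/\varepsilon}$ in the basin of each visited attractor.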


2.15. Several S-attractors: picture.


2.16. Discrete time case, Markov fast motions. Similar results hold true for difference equations where fast motions satisfy

2.8. Assumption. The family $F_x = \Phi(x,\cdot)$ consists of $C^2$-diffeomorphisms or endomorphisms of a compact $n_M$-dimensional Riemannian manifold $M$ with uniform $C^2$ dependence on the parameter $x$ belonging to a relatively compact open connected set $X \subset \mathbb{R}^d$. All $F_x$, $x \in \bar X$, are either expanding maps of $M$ or diffeomorphisms possessing basic hyperbolic attractors $\Lambda_x$ with (uniform in $x$) hyperbolic splittings and one open set $W \subset M$ which contains all $\Lambda_x$ but is contained in their basins.

Similar results also hold true when fast motions are Markov processes satisfying Doeblin conditions with the rate functional $I$ given by the Donsker-Varadhan formula. In the continuous time take $X^\varepsilon(t), Y^\varepsilon(t)$ determined by an ordinary differential equation for the slow motion $X^\varepsilon$ together with a nondegenerate stochastic differential equation with coefficients dependent on the slow $x$-variable. In the discrete time we can start with a parametric family of Markov chains $Y_{x,y}(n)$, $n \ge 0$, $Y_{x,y}(0) = y$ on a compact $C^2$ Riemannian manifold $M$ with transition probabilities $P^x(y,\Gamma) = P^x_y\{Y_{x,y}(1) \in \Gamma\}$ having positive $C^1$ densities $p^x(y,z) = P^x(y,dz)/m(dz)$ where $m$ is the volume. Now, define $X^\varepsilon(n)$ and $Y^\varepsilon(n)$ by adding to the difference equation for the slow motion $X^\varepsilon$ another equation

$$P\{Y^\varepsilon(n+1) \in \Gamma \mid X^\varepsilon(n) = x,\ Y^\varepsilon(n) = y\} = P^x(y,\Gamma).$$

2.17. Computer simulations: two examples. In both examples $F_x y = 3y + x \pmod 1$ where $x \in \mathbb{R}^1$ and $y \in [0,1]$, so that $F_x$ are expanding maps of the circle $\mathbb{T}^1$. Next, $B(x,y) = x(x^2-4)(1-x)(a+x) + 50\sin 2\pi y$ where $a = 1$ in the 1st example and $a = 1.5$ in the 2nd example. Thus,

$$X^\varepsilon(n+1) = X^\varepsilon(n) + \varepsilon B(X^\varepsilon(n), Y^\varepsilon(n)), \qquad Y^\varepsilon(n+1) = 3Y^\varepsilon(n) + X^\varepsilon(n) \pmod 1$$

and we have maps $\Phi_\varepsilon : \mathbb{R}^1 \times \mathbb{T}^1 \to \mathbb{R}^1 \times \mathbb{T}^1$ defined by

$$\Phi_\varepsilon(x,y) = \big(x + \varepsilon(x(x^2-4)(1-x)(a+x) + 50\sin 2\pi y),\ 3y + x \pmod 1\big).$$

All maps $F_x$ preserve the normalized Lebesgue measure Leb on $\mathbb{T}^1$ which is the SRB measure $\mu^{SRB}_x$ for each $F_x$ in this case. The averaged equation for $\bar Z(t) = \bar X^\varepsilon(t/\varepsilon)$ has here the form

$$\frac{d\bar Z(t)}{dt} = \bar B(\bar Z(t)),$$

where $\bar B(x) = x(x^2-4)(1-x)(a+x)$. When $a = 1$ the one dimensional vector field $\bar B(x)$ has 3 attracting fixed points $O_1 = 2$, $O_2 = 0$, $O_3 = -2$ and two repelling fixed points $1$ and $-1$. When $a = 1.5$ it has the same attracting fixed points but one of the two repelling fixed points moves from $-1$ to $-3/2$, making the basin of $-2$ smaller while the left interval of the basin of $0$ becomes larger, so it is more difficult to exit from there to the left than to the right.

2.18. Discussion.

• We plot below histograms of a single orbit of the slow motion $X^\varepsilon_{x,y}(n)$, $n = 0, 1, 2, \dots, 10^9$ with $\varepsilon = 10^{-3}$ and the initial values $x = 0$, $y = 0.001$ in the 1st case and $x = -2$, $y = 0.001$ in the 2nd case (though these histograms do not depend essentially on initial points).


• In order to verify that $B$ is complete at the fixed points $-2, -1, 0, 1, 2$ of the averaged system in the 1st example we observe that at these points $F_x$ coincides with the map $y \to 3y \pmod 1$, and so we can take the periodic orbits $\{1/8, 3/8\}$ and $\{5/8, 7/8\}$ of the latter and notice that the average of $\sin 2\pi y$ along the former is $1/\sqrt 2$ and along the latter $-1/\sqrt 2$, which yields completeness of $B$ at zeros of $\bar B$. For the 2nd example it remains to verify completeness only for $x = -3/2$, which follows since $\sin 2\pi y$ equals $1$ and $-1$ at the two fixed points $1/4$ and $3/4$ of $F_{-3/2}$, respectively.

• By the theorem the transitions between $O_1$, $O_2$ and $O_3$ are determined by $R_{ij}$, $i, j = 1, 2, 3$, obtained via the functionals $S_{0t}(\gamma)$. Even here they are not easy to compute since this leads to complicated nontraditional variational problems, though the functionals $I_x(\nu)$ are given now by the simple formula $I_x(\nu) = \ln 3 - h_\nu(F^1_x)$ if $\nu$ is $F_x$-invariant and $= \infty$ otherwise, while the set of $F_x$-invariant measures can be reasonably described since all $F_x$'s are conjugate to the simple map $y \to 3y \pmod 1$.

• It turns out that we obtain "exactly" the same histograms when we replace $Y^\varepsilon(n+1) = 3Y^\varepsilon(n) + X^\varepsilon(n) \pmod 1$ by $Y^\varepsilon(n+1) = Y^\varepsilon(n) + X^\varepsilon(n) + \xi_{n+1} \pmod 1$ where $\xi_1, \xi_2, \dots$ are i.i.d. random variables on $[0,1]$ with a Lebesgue equivalent distribution.
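The claims of 2.17 and 2.18 about fixed points of the averaged field and about averages of $\sin 2\pi y$ along periodic orbits can be checked numerically. The following Python sketch is only an illustration under simplifying assumptions: the function names are ours, stability is decided by a finite-difference slope, and the simulated orbit is far shorter than the $10^9$ steps used for the histograms in the text.

```python
import math


def B_bar(x, a):
    """Averaged vector field x(x^2 - 4)(1 - x)(a + x)."""
    return x * (x * x - 4.0) * (1.0 - x) * (a + x)


def classify_fixed_points(a, h=1e-6):
    """Label each zero of B_bar by the sign of a centred difference slope."""
    labels = {}
    for root in sorted({0.0, 2.0, -2.0, 1.0, -a}):
        slope = (B_bar(root + h, a) - B_bar(root - h, a)) / (2.0 * h)
        labels[root] = "attracting" if slope < 0 else "repelling"
    return labels


def orbit_average(y0, x, period):
    """Average of sin(2*pi*y) along `period` steps of y -> 3y + x (mod 1)."""
    y, total = y0, 0.0
    for _ in range(period):
        total += math.sin(2.0 * math.pi * y)
        y = (3.0 * y + x) % 1.0
    return total / period


def simulate(a, x0, y0, eps, n_steps):
    """Iterate the map Phi_eps and return the slow coordinates X(1..n_steps)."""
    x, y, xs = x0, y0, []
    for _ in range(n_steps):
        # Both coordinates are updated from the old pair (x, y).
        x, y = (x + eps * (B_bar(x, a) + 50.0 * math.sin(2.0 * math.pi * y)),
                (3.0 * y + x) % 1.0)
        xs.append(x)
    return xs


print(classify_fixed_points(1.0))
print(classify_fixed_points(1.5))
print(orbit_average(1 / 8, 0.0, 2), orbit_average(5 / 8, 0.0, 2))
print(orbit_average(1 / 4, -1.5, 1), orbit_average(3 / 4, -1.5, 1))
xs = simulate(a=1.0, x0=0.0, y0=0.001, eps=1e-3, n_steps=20000)
print(min(xs), max(xs))
```

A histogram of `xs` over a much longer run is what the pictures in 2.19 and 2.20 display; in pure Python such a run is slow, so a compiled or vectorized implementation would be needed to approach the orbit length quoted above.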


2.19. Symmetrical basins case.

2.20. Asymmetrical basins case.

[Figure: histogram of the slow motion; horizontal axis from −3 to 4, vertical axis from 0 to 2.5.]

2.21. Remarks on the proof of Theorems 2.3, 2.5, 2.7. In the non coupled situation

$$\frac{dX^\varepsilon(t)}{dt} = \varepsilon B(X^\varepsilon(t), F^t y)$$

the basic large deviations theorem in averaging is not difficult. Then it suffices to show that for any continuous function $q_t(y) = q(t,y)$,

$$\lim_{\varepsilon\to 0} \varepsilon \log \int \exp\Big(\varepsilon^{-1}\int_0^T q_t(F^{t/\varepsilon} y)\,dt\Big)\,dm(y) = \int_0^T \Pi(\varphi^u + q_t)\,dt$$

which can be done by splitting the interval $[0,T]$ into small subintervals. In the fully coupled case one needs more complicated technical tools, in particular, a version of the general large deviations bounds when the usual assumptions hold true with errors. This allows approximate decoupling on small time intervals since the flows $F^t_x$ change slowly in time as $x$ is the slow variable, but one has to be careful since for hyperbolic flows errors accumulate exponentially in time. The above results concerning behavior on time scales exponential in $1/\varepsilon$ are not easy already in the non coupled situation. Actually, in the non coupled probabilistic situation when the fast motion is a nondegenerate continuous time Markov chain with finitely many states this is easier and it was done by Freidlin. In the dynamical systems setup this can be done by a kind of rough Markov property argument for unstable disks.

2.22. Rough Markov property. The argument illustrated in the picture on the next page proceeds in the following way:

1) Cover an unstable disk by very small disks so that for some time (long for fast variables but short for slow variables) all solutions starting on each small disk stay close to each other, so that the properties we are interested in (exit from a domain or transition to the basin of another S-compact) hold or do not hold simultaneously for all points in the small disk. The disks are chosen in such a way that disks with the same centers and half the radius are disjoint.

2) Apply approximate LD on the original disk, estimating volumes of small disks satisfying our conditions and taking into account that, by the construction, the sum of volumes of the appropriate small disks gives the measures of the sets needed for LD up to multiplicative constants which disappear in LD limits.

3) Iterate the small disks for the time chosen above so that they become of bounded size and repeat the procedure.


2.23. Rough Markov property: picture.


3. Nonconventional large deviations and related topics

3.1. Nonconventional ergodic theorems. Nonconventional ergodic theorems study the limits of expressions

$$\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^{N} T^{q_1(n)} f_1 \cdots T^{q_\ell(n)} f_\ell$$

where $T$ is an ergodic (or weakly mixing) measure $\mu$ preserving transformation, $f_i$'s are bounded measurable functions and $q_i$'s are polynomials taking on integer values on the integers (Bergelson, Furstenberg, Weiss derived $L^2$-convergence in various situations while Assani obtained an almost sure convergence under additional conditions). Recently such results were extended to the continuous time, i.e. to expressions

$$\frac{1}{T}\int_0^T T^{q_1(t)} f_1 \cdots T^{q_\ell(t)} f_\ell\,dt$$

where $T^s$ is now an ergodic measure preserving flow (Potts, Bergelson-Leibman-Moreira). If we take $f_j$ to be the indicators $I_{\Gamma_j}$ of some sets $\Gamma_j$ then we obtain an application to multiple recurrence since then the above sum gives the number of events $\{T^{q_i(n)} x \in \Gamma_i,\ i = 1, \dots, \ell\}$ for $n \le N$.

3.2. Nonconventional limit theorems. The next natural step was to study central limit theorem type results for such expressions, which we did together with Varadhan. Namely, we obtained the functional central limit theorem for

$$\frac{1}{\sqrt N}\sum_{n=1}^{[Nt]} \big(B(\xi(q_1(n)), \dots, \xi(q_\ell(n))) - \bar B\big) \quad\text{and}\quad \frac{1}{\sqrt N}\int_0^{[Nt]} \big(B(\xi(q_1(s)), \dots, \xi(q_\ell(s))) - \bar B\big)\,ds$$

where $\{\xi(n), n \ge 0\}$ (or $\{\xi(t), t \ge 0\}$) is a sufficiently fast mixing vector valued stochastic process with mild stationarity properties satisfying certain moment conditions, $B$ is Hölder continuous, $\bar B = \int B\,d(\mu \times \cdots \times \mu)$, $\xi(t)$ has distribution $\mu$, $q_j(t) = jt$ for $j \le k$ and $q_j(t)$, $j > k$, satisfy some fast growth conditions. In the discrete time case the results are readily applicable to fast mixing dynamical systems (subshifts of finite type, hyperbolic and expanding transformations etc.) with $\xi(m) = \xi(m,x) = T^m x$. For appropriate flows such as hyperbolic ones these results have not yet been proven. Warning: summands in nonconventional sums are usually strongly dependent.

3.3. Nonconventional large deviations. Large deviations (1st level) in our situation are supposed to give estimates as $N \to \infty$ for probabilities

$$\ln P\Big\{\frac{1}{N}\sum_{n=1}^{N} B(\xi(q_1(n)), \dots, \xi(q_\ell(n))) \in I\Big\}$$

where $I$ is an interval (closed for an upper bound, open for a lower bound). The asymptotic here is supposed to be of the form $-N \inf_{x\in I} J(x)$ where $J(x) \ge 0$ is to be found. A similar problem arises in the continuous time case with the integral in place of the sum. Together with Varadhan we derived such estimates in some cases. In the dynamical systems case we consider expressions of the form

$$\ln \mu\Big\{x : \frac{1}{N}\sum_{n=1}^{N} B(T^{q_1(n)} x, \dots, T^{q_\ell(n)} x) \in I\Big\}$$


for an appropriate measure $\mu$ (say, the normalized Riemannian volume in the hyperbolic and expanding transformations cases), and, again, the sum is replaced by the integral in the case of flows.

3.4. Nonconventional large deviations: 1st level, discrete time. General fact: if the limit

$$Q(\lambda) = Q(B,\lambda) = \lim_{N\to\infty} \frac{1}{N} \ln \int \exp\Big(\lambda \sum_{n=1}^{N} B(\xi(q_1(n)), \dots, \xi(q_\ell(n)))\Big)\,dP$$

exists for any $\lambda$ and it is differentiable in $\lambda$ then $J(x) = \sup_\lambda (x\lambda - Q(\lambda))$ is the rate function of large deviations estimates, i.e.

$$\limsup_{N\to\infty} \frac{1}{N} \ln P\Big\{\frac{1}{N}\sum_{n=1}^{N} B(\xi(q_1(n)), \dots, \xi(q_\ell(n))) \in K\Big\} \le -\inf_{x\in K} J(x)$$

for any closed set $K \subset \mathbb{R}$, while for any open set $U \subset \mathbb{R}$,

$$\liminf_{N\to\infty} \frac{1}{N} \ln P\Big\{\frac{1}{N}\sum_{n=1}^{N} B(\xi(q_1(n)), \dots, \xi(q_\ell(n))) \in U\Big\} \ge -\inf_{x\in U} J(x).$$

We will explain how to deal with such limits in our nonconventional setup.

3.5. Markov chain case: statement.

3.1. Theorem. Let $\xi(1), \xi(2), \dots$ be a Markov chain on a space $M$ having a transition density $p(j,x,y)$ with respect to some probability measure $\nu$ which for some $j_0$ satisfies $0 < \delta \le p(j_0,x,y) \le \delta^{-1} < \infty$ for all $x, y \in M$. Let $W_\lambda(x_1, \dots, x_\ell)$ be a measurable function on $(-\infty,\infty) \times M^\ell$ which is bounded (in the $x$ variables) and has a bounded derivative in $\lambda$, and let $q_j$, $j = 1, \dots, \ell$, be positive integer valued increasing functions such that $q_1(n) = n$, $q_j(n+1) - q_j(n) \to \infty$ as $n \to \infty$ for all $j \ge 2$ and $q_j(n-1) \ge q_{j-1}(n)$ for all $n \ge n_0$ and $j \ge 2$. Then the limit

$$Q(W_\lambda) = \lim_{N\to\infty} \frac{1}{N} \ln \int \exp\Big(\sum_{n=1}^{N} W_\lambda(\xi(q_1(n)), \dots, \xi(q_\ell(n)))\Big)\,dP$$

exists and it is differentiable in $\lambda$. In fact, $Q(W_\lambda) = \ln(\mathrm{specrad}\, R_\lambda)$ where $R_\lambda g(x) = E_x\big(g(\xi(1)) \hat W_\lambda(\xi(1))\big)$,

$$\hat W_\lambda(x) = \int \cdots \int \exp\big(W_\lambda(x, x_2, \dots, x_\ell)\big)\,d\mu(x_2) \cdots d\mu(x_\ell)$$

and $\mu$ is the unique invariant measure of the Markov chain.

3.6. Dynamical systems case: statement.

3.2. Theorem. Let $W_\lambda$ and $q_j$'s be as before and let $T$ be a $C^2$ expanding endomorphism or an Axiom A (in particular, Anosov) diffeomorphism in a small neighborhood $O$ of an attractor on a compact Riemannian manifold $M$. Let $\Gamma = M$ in the case of an expanding endomorphism or an Anosov diffeomorphism and $\Gamma = O$ in the Axiom A case. If $\nu$ is the normalized Riemannian volume then the limit

$$Q(W_\lambda) = \lim_{N\to\infty} \frac{1}{N} \ln \int_\Gamma \exp\Big(\sum_{n=1}^{N} W_\lambda(T^{q_1(n)} x, \dots, T^{q_\ell(n)} x)\Big)\,d\nu(x)$$

exists and it is differentiable in $\lambda$. Moreover,

$$Q(W_\lambda) = \Pi\big(\ln \hat W_\lambda + \varphi\big)$$

where $\Pi$ is the topological pressure for $T$, $\varphi$ is minus the logarithm of the differential expanding coefficient on unstable leaves and $\hat W_\lambda$ is as in the previous theorem with $\mu = \mu_{SRB}$ being the Sinai-Ruelle-Bowen measure. A similar result holds true when $T$ is a topologically mixing subshift of finite type with $\nu = \mu$ being a Gibbs measure with a potential $\varphi$.

3.7. Reduction to linear $q_i$'s ($k = \ell$). Let $q_i(n) = in$ for $i = 1, \dots, k \le \ell$ and $q_j(n+1) - q_j(n) \to \infty$ as $n \to \infty$ and $q_j(n-1) \ge q_{j-1}(n)$ for all $j > k$ and $n \ge n_0$.

3.3. Proposition. Let $V(x_1, \dots, x_\ell)$ be a bounded continuous function and let the dynamical system $\{T^n\}$ be as before (though here only ψ-mixing suffices). Then

$$\lim_{N\to\infty} \frac{1}{N}\Big(\ln \int \exp\Big(\sum_{n=1}^{N} V(T^{q_1(n)} x, \dots, T^{q_\ell(n)} x)\Big)\,d\nu(x) - \ln \int \exp\Big(\sum_{n=1}^{N} V^{(k)}(T^{n} x, T^{2n} x, \dots, T^{kn} x)\Big)\,d\nu(x)\Big) = 0$$

where for each $m < \ell$,

$$V^{(m)}(x_1, \dots, x_m) = \ln \int_M \cdots \int_M \exp\big(V(x_1, \dots, x_m, x_{m+1}, \dots, x_\ell)\big)\,d\mu(x_{m+1}) \cdots d\mu(x_\ell)$$

and $V^{(\ell)} = V$. The same result holds true if we replace $T^n x$ by a Markov chain $\xi(n)$ satisfying conditions as before. If $k = 1$ this reduces the problem to the standard (conventional) situation. For $k > 1$ the problem becomes complicated and we consider next the case of i.i.d. $\xi(n)$'s.

3.8. The i.i.d. case. We obtain LD for

$$S_N(F) = \sum_{n=1}^{N} F(\xi(n), \xi(2n), \dots, \xi(kn))$$

where $\xi(n)$, $n \ge 1$, are i.i.d. random variables (vectors) with a compact support $M$. Let $r_1, \dots, r_m \ge 2$ be all primes not exceeding $k$. Set $A_n = \{a \le n : a \text{ is relatively prime with } r_1, \dots, r_m\}$ and $B_\eta(a) = \{b \le \eta : b = a r_1^{d_1} r_2^{d_2} \cdots r_m^{d_m} \text{ for some nonnegative integers } d_1, \dots, d_m\}$. Now for any bounded measurable function $V$ on $M^k$ write

$$S_N(V) = \sum_{a\in A_N} S_{N,a}(V) \quad\text{with}\quad S_{N,a}(V) = \sum_{b\in B_N(a)} V(\xi(b), \xi(2b), \dots, \xi(kb)).$$

Observe that $S_{N,a}(V)$, $a \in A_N$, are independent, since every index $jb$ with $j \le k$ and $b \in B_N(a)$ is again $a$ times a product of powers of $r_1, \dots, r_m$, so that different $a$'s involve disjoint sets of the variables $\xi(n)$.

3.4. Theorem. For any continuous function $V$ on $M^k$ the limit

$$Q(V) = \lim_{N\to\infty} \frac{1}{N} \ln E \exp\Big(\sum_{n=1}^{N} V(\xi(n), \xi(2n), \dots, \xi(kn))\Big) = \lim_{N\to\infty} \frac{1}{N} \sum_{a\in A_N} \ln E \exp S_{N,a}(V)$$

exists and the functional $Q(V)$ is convex and lower semi-continuous. If $V = V_\lambda$ depends on a parameter $\lambda$ and has a bounded derivative in $\lambda$ then $Q(V_\lambda)$ is also differentiable in $\lambda$. Thus taking $V_\lambda = \lambda F$ we obtain that here also for $k \ge 2$ both LD bounds hold true with the rate functional $J$ being the Fenchel-Legendre transform $J(u) = \sup_\lambda (\lambda u - Q(\lambda F))$ of $Q$.
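The decomposition of $\{1, \dots, N\}$ into the sets $B_N(a)$, $a \in A_N$, and the disjointness of the index sets behind the independence of the sums $S_{N,a}(V)$, can be verified by brute force. The sketch below is a small sanity check with illustrative values $N = 200$, $k = 4$; the helper names are ours.

```python
def primes_up_to(k):
    """All primes r with 2 <= r <= k (trial division, fine for small k)."""
    return [p for p in range(2, k + 1)
            if all(p % d for d in range(2, int(p ** 0.5) + 1))]


def A(n, primes):
    """Integers a <= n relatively prime with every prime in `primes`."""
    return [a for a in range(1, n + 1) if all(a % r for r in primes)]


def B(eta, a, primes):
    """All b <= eta of the form a * r_1^d_1 * ... * r_m^d_m."""
    found = {a} if a <= eta else set()
    frontier = list(found)
    while frontier:
        b = frontier.pop()
        for r in primes:
            c = b * r
            if c <= eta and c not in found:
                found.add(c)
                frontier.append(c)
    return found


def index_set(N, a, k, primes):
    """Indices n such that xi(n) enters S_{N,a}(V)."""
    return {j * b for b in B(N, a, primes) for j in range(1, k + 1)}


N, k = 200, 4
primes = primes_up_to(k)
blocks = {a: B(N, a, primes) for a in A(N, primes)}
indices = {a: index_set(N, a, k, primes) for a in blocks}

covered = set().union(*blocks.values())
print("primes:", primes)
print("blocks cover 1..N:", covered == set(range(1, N + 1)))
print("blocks are disjoint:", sum(map(len, blocks.values())) == len(covered))
print("index sets are disjoint:",
      sum(map(len, indices.values())) == len(set().union(*indices.values())))
```

Since the $\xi(n)$ are independent and the index sets are pairwise disjoint, the sums $S_{N,a}(V)$ are functions of disjoint families of independent variables.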


3.9. Discussion. As a model application of Theorem 3.4 we can consider digits $\xi(n) = \xi(n,\omega)$, $n \ge 1$, of base $M$ expansions $\omega = \sum_{n=1}^{\infty} \xi(n,\omega) M^{-n}$, $\xi(n,\omega) \in \{0, 1, \dots, M-1\}$, of numbers $\omega \in [0,1)$, which are i.i.d. random variables on the probability space $([0,1), \mathcal{B}, P)$ where $\mathcal{B}$ is the Borel σ-algebra and $P$ is the Lebesgue measure. Take, for instance, $V(x_1, \dots, x_k) = \delta_{\alpha_1 x_1} \delta_{\alpha_2 x_2} \cdots \delta_{\alpha_k x_k}$ for some $\alpha_1, \dots, \alpha_k \in \{0, 1, \dots, M-1\}$ with $\delta_{ij} = 1$ if $i = j$ and $= 0$ otherwise. Then the above theorem provides large deviations estimates for the number

$$n_{\alpha_1,\dots,\alpha_k}(N,\omega) = \#\{n \le N : \xi(n,\omega) = \alpha_1,\ \xi(2n,\omega) = \alpha_2,\ \dots,\ \xi(kn,\omega) = \alpha_k\} = \sum_{n=1}^{N} V(\xi(n,\omega), \dots, \xi(kn,\omega)).$$

The same setup can be reformulated in the following way. Consider infinite sequences of letters (colors, spins, etc.) taken out of an alphabet of size $M$. Let $n_{\alpha_1,\dots,\alpha_k}(N)$ be the number of arithmetic progressions of length $k$ with both the first term and the difference equal to $n \le N$ and having the letter (color, spin, etc.) $\alpha_i$ at the place $i = 1, 2, \dots, k$. Then the above theorem yields large deviations bounds for $n_{\alpha_1,\dots,\alpha_k}(N)$ as $N \to \infty$ considered as a random variable on the space of sequences of letters with any product probability measure, in particular, with the uniform probability measure which assigns the same weight to each combination of $n$ consecutive letters (i.e. to each cylinder set of length $n$) for all $n = 1, 2, \dots$. Observe that a particular case of Theorem 3.4 was obtained in [9].

3.10. A glimpse at the proof of Theorem 3.4. Write $B_N(a) = \{b \in B(a) : b \le N\}$ where

$$B(a) = \{b \ge 1 : b = a r_1^{d_1} r_2^{d_2} \cdots r_m^{d_m} \text{ for some nonnegative integers } d_1, \dots, d_m\}.$$

Then $Z_N(V) = \prod_{a\in A_N} Z_{N,a}(V)$ where, recall, $A_n = \{a \le n : a \text{ relatively prime with } r_1, \dots, r_m\}$ and

$$Z_{\eta,a}(V) = E \exp\Big(\sum_{b\in B_\eta(a)} V(\xi(b), \xi(2b), \dots, \xi(kb))\Big).$$

A crucial point here is that $Z_{N,a}(V)$ is determined only by $|B_N(a)|$ and not by $N$ and $a$ themselves. Set $\hat B_\eta(a) = B_\eta(a) \cup \{n : n = l n' \text{ for some } n' \in B_\eta(a) \text{ and } l = 2, 3, \dots, k\}$. Then we can write

$$Z_{\eta,a}(V) = \int \cdots \int \exp\Big(\sum_{b\in B_\eta(a)} V(x_b, x_{2b}, \dots, x_{kb})\Big) \prod_{b'\in \hat B_\eta(a)} d\mu(x_{b'}).$$

It is easy to see from here that $Z_{\eta,a}(V) = Z_{\eta/a,1}(V)$ for any $\eta > 0$ and an integer $a \ge 2$ relatively prime with $r_1, \dots, r_m$. Set

$$D(\rho) = \Big\{n = (n_1, \dots, n_m) \in \mathbb{Z}^m : n_i \ge 0,\ i = 1, \dots, m, \text{ and } \sum_{i=1}^{m} n_i \ln r_i \le \rho\Big\};$$

then $D(\ln(N/a))$ is in one-to-one correspondence $(n_1, \dots, n_m) \leftrightarrow a r_1^{n_1} \cdots r_m^{n_m}$ with $B_N(a)$.
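The correspondence between $D(\ln(N/a))$ and $B_N(a)$, and the fact that $|B_N(a)|$ depends on $N$ and $a$ only through $N/a$, can be checked directly. In the sketch below (function names are ours) the condition $\sum_i n_i \ln r_i \le \ln(N/a)$ is tested in the equivalent integer form $a \prod_i r_i^{n_i} \le N$, an implementation choice made to avoid floating point rounding at the boundary.

```python
def exponent_set(N, a, primes):
    """Exponent vectors (n_1, ..., n_m) with a * prod r_i^n_i <= N,
    i.e. the set D(ln(N/a)) written without logarithms."""
    vectors = []

    def extend(prefix, value, idx):
        # Invariant: value = a * (product over chosen exponents) <= N.
        if idx == len(primes):
            vectors.append(tuple(prefix))
            return
        r, e = primes[idx], 0
        while value <= N:
            extend(prefix + [e], value, idx + 1)
            value *= r
            e += 1

    if a <= N:
        extend([], a, 0)
    return vectors


def image(a, vector, primes):
    """The integer a * r_1^n_1 * ... * r_m^n_m."""
    b = a
    for r, e in zip(primes, vector):
        b *= r ** e
    return b


def is_smooth(n, primes):
    """True if n is a product of powers of the given primes."""
    for r in primes:
        while n % r == 0:
            n //= r
    return n == 1


def B_direct(N, a, primes):
    """B_N(a) computed straight from its definition."""
    return {b for b in range(a, N + 1, a) if is_smooth(b // a, primes)}


primes = [2, 3]           # all primes not exceeding k = 3 or k = 4
N, a = 500, 5             # a is relatively prime with 2 and 3
vectors = exponent_set(N, a, primes)
images = {image(a, v, primes) for v in vectors}

print("bijection onto B_N(a):",
      len(images) == len(vectors) and images == B_direct(N, a, primes))
print("|B_N(a)| =", len(images), " |B_{N/a}(1)| =", len(B_direct(N // a, 1, primes)))
```

The second printed line compares $|B_N(a)|$ with $|B_{N/a}(1)|$, which is the counting counterpart of the identity $Z_{\eta,a}(V) = Z_{\eta/a,1}(V)$.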


3.11. A glimpse at the proof continues. Let $l = |B_N(a)|$ and set $R_l(V) = Z_{N,a}(V)$ since the latter depends only on $l$ (and, of course, on $V$). Denote $\rho_{\min}(l) = \inf\{\rho \ge 0 : |D(\rho)| = l\}$ and $\rho_{\max}(l) = \sup\{\rho \ge 0 : |D(\rho)| = l\}$ which is well defined for each integer $l \ge 1$ and $\rho_{\max}(l) > \rho_{\min}(l) \ge (l^{1/m} - 1)\ln 2$. Set $A_N^{(l)} = \{a \in A_N : |B_N(a)| = l\}$. Then a computation shows (by a kind of inclusion-exclusion argument) that

$$\lim_{N\to\infty} \frac{1}{N} |A_N^{(l)}| = \big(e^{-\rho_{\min}(l)} - e^{-\rho_{\max}(l)}\big)\, r$$

where

$$r = 1 - \frac{1}{2} - \frac{1}{3} + \frac{1}{2\cdot 3} - \frac{1}{5} + \frac{1}{2\cdot 5} + \frac{1}{3\cdot 5} - \frac{1}{2\cdot 3\cdot 5} + \cdots + (-1)^m \frac{1}{r_1 \cdot r_2 \cdots r_m}.$$

It follows that

$$\frac{1}{N} \ln Z_N(V) = \frac{1}{N} \sum_{a\in A_N} \ln Z_{N,a}(V) = \frac{1}{N} \sum_{1 \le l \le (1 + \frac{1}{\ln 2} \ln \frac{N}{a})^m} |A_N^{(l)}| \ln R_l(V) \longrightarrow r \sum_{l=1}^{\infty} \big(e^{-\rho_{\min}(l)} - e^{-\rho_{\max}(l)}\big) \ln R_l(V) \quad\text{as } N \to \infty$$

and the last series converges absolutely. If $V = V_\lambda$ depends on $\lambda$ and its derivative in $\lambda$ exists and is bounded by $\tilde C$ then each $\ln R_l(V_\lambda)$ is differentiable in $\lambda$ with a derivative bounded by $\tilde C l$. Hence, the whole above series is differentiable in $\lambda$ and the assertion of Theorem 3.4 follows.

3.12. Nonconventional large deviations: 2nd level, discrete time. We consider here occupational measures on $M^\ell$,

$$\frac{1}{N} \sum_{n=1}^{N} \delta_{(\xi(n),\, \xi(q_2(n)),\, \dots,\, \xi(q_\ell(n)))}$$

where $\delta_a$ is the Dirac measure and $\xi(n)$, $n \ge 0$, is a Markov chain on a compact space $M$ satisfying the Doeblin condition and having an invariant measure $\mu$. For a continuous $W$ on $M^\ell$ let $\hat W$ be as before. Let, again,

$$Q(W) = \lim_{N\to\infty} \frac{1}{N} \ln \int \exp\Big(\sum_{n=1}^{N} W(\xi(q_1(n)), \dots, \xi(q_\ell(n)))\Big)\,dP;$$

then by the Donsker-Varadhan formula

$$Q(W) = \sup_{\nu\in\mathcal{P}(M)} \Big(\int \ln \hat W(x)\,d\nu(x) - \hat I(\nu)\Big)$$

where $\hat I(\nu) = -\inf_{u\in C_+(M)} \int_M \ln \frac{Pu}{u}\,d\nu$ and $P$ is the transition operator of $\xi(n)$.
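For a finite state chain the quantity $Q(W)$ can be computed from the spectral radius formula of Theorem 3.1, which gives a concrete feeling for the objects entering the Donsker-Varadhan formula above. The sketch below assumes $\ell = 2$, a strictly positive transition matrix and plain power iteration; the matrices are hypothetical examples and the function names are ours.

```python
import math


def stationary(P, iters=2000):
    """Invariant probability vector mu = mu P by repeated multiplication."""
    n = len(P)
    mu = [1.0 / n] * n
    for _ in range(iters):
        mu = [sum(mu[x] * P[x][y] for x in range(n)) for y in range(n)]
    return mu


def W_hat(W, mu):
    """W_hat(x) = sum_z exp(W(x, z)) mu(z), the case ell = 2."""
    n = len(mu)
    return [sum(math.exp(W[x][z]) * mu[z] for z in range(n)) for x in range(n)]


def spectral_radius(R, iters=2000):
    """Perron eigenvalue of a positive matrix via power iteration."""
    n = len(R)
    v, lam = [1.0] * n, 0.0
    for _ in range(iters):
        w = [sum(R[x][y] * v[y] for y in range(n)) for x in range(n)]
        lam = max(w)              # v is kept normalised to max-norm one
        v = [c / lam for c in w]
    return lam


def Q(P, W):
    """ln specrad of (R g)(x) = sum_y P(x, y) g(y) W_hat(y)."""
    n = len(P)
    what = W_hat(W, stationary(P))
    R = [[P[x][y] * what[y] for y in range(n)] for x in range(n)]
    return math.log(spectral_radius(R))


P_chain = [[0.9, 0.1], [0.2, 0.8]]      # hypothetical two state chain
W_const = [[0.3, 0.3], [0.3, 0.3]]      # constant W gives Q = 0.3
P_iid = [[0.3, 0.7], [0.3, 0.7]]        # i.i.d. case: identical rows
W_example = [[0.0, 1.0], [2.0, -1.0]]

print(Q(P_chain, W_const))
print(Q(P_iid, W_example))
print(Q(P_chain, W_example))
```

In the i.i.d. case the operator has rank one and $Q(W)$ reduces to $\ln \int\!\!\int e^{W}\,d\mu\,d\mu$, which gives a simple consistency check.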

3.13. 2nd level nonconventional large deviations: continued. Next, let $Y_n^{(i)}$, $i = 2, \dots, \ell$; $n = 0, 1, 2, \dots$ be i.i.d. $M$-valued random variables with the distribution $\mu$, all of them independent of the Markov chain $\xi(n)$, $n \ge 0$. Then it is easy to see that

$$\lim_{N\to\infty} \frac{1}{N} \ln E \exp\Big(\sum_{n=1}^{N} W(\xi(n), Y_n^{(2)}, \dots, Y_n^{(\ell)})\Big) = Q(W).$$

Indeed, let $\mathcal{F}_\xi$ be the σ-algebra generated by the Markov chain $\xi(n)$, $n \ge 0$. Then

$$E \exp\Big(\sum_{n=1}^{N} W(\xi(n), Y_n^{(2)}, \dots, Y_n^{(\ell)})\Big) = E\Big(E\Big(\exp\Big(\sum_{n=1}^{N} W(\xi(n), Y_n^{(2)}, \dots, Y_n^{(\ell)})\Big) \Big| \mathcal{F}_\xi\Big)\Big) = E \exp\Big(\sum_{n=1}^{N} \ln \hat W(\xi(n))\Big)$$

implying the above formula.

3.14. 2nd level nonconventional large deviations: conclusion. But now we have the standard situation for the Markov chain $(\xi(n), Y_n^{(2)}, \dots, Y_n^{(\ell)})$, $n \ge 0$, and so

$$Q(W) = \sup_{\nu\in\mathcal{P}(M\times\cdots\times M)} \Big(\int W(x_1, x_2, \dots, x_\ell)\,d\nu(x_1, \dots, x_\ell) - I(\nu)\Big)$$

where

$$I(\nu) = -\inf_{u\in C_+(M\times\cdots\times M)} \int_{M\times\cdots\times M} \ln \frac{E_{x_1} \int u(\xi(1), y_2, \dots, y_\ell)\,d\mu(y_2) \cdots d\mu(y_\ell)}{u(x_1, \dots, x_\ell)}\,d\nu(x_1, \dots, x_\ell).$$

It is known here that there exists a unique $\nu = \nu_W$ on which the supremum above is attained and it follows from the standard theory that $I(\nu)$ is the rate functional for the 2nd level of large deviations for both occupational measures

$$\frac{1}{N} \sum_{n=1}^{N} \delta_{(\xi(n),\, Y_n^{(2)},\, \dots,\, Y_n^{(\ell)})} \quad\text{and}\quad \frac{1}{N} \sum_{n=1}^{N} \delta_{(\xi(n),\, \xi(q_2(n)),\, \dots,\, \xi(q_\ell(n)))}.$$

3.15. Nonconventional LD in the averaging setup. Nonconventional LD theorems above and their continuous time counterparts can be extended to the corresponding averaging setups in the discrete and continuous time cases

$$X^\varepsilon(n+1) = X^\varepsilon(n) + \varepsilon B(X^\varepsilon(n), \xi(q_1(n)), \dots, \xi(q_\ell(n))) \quad\text{and}\quad \frac{dX^\varepsilon(t)}{dt} = \varepsilon B(X^\varepsilon(t), \xi(q_1(t)), \dots, \xi(q_\ell(t))).$$

First, we define the averaged vector field

$$\bar B_\nu(x) = \int B(x, \xi_1, \xi_2, \dots, \xi_\ell)\,d\nu(\xi_1)\,d\mu(\xi_2) \cdots d\mu(\xi_\ell)$$

where $\mu$ is the unique invariant measure in the Markov chain or diffusion cases and $\mu$ is the SRB measure in the hyperbolic dynamical systems case. Next, for each a.c. curve $\gamma_t$, $t \in [0,T]$, we define the functional

$$S_{0T}(\gamma) = \int_0^T \inf\{I(\nu) : \dot\gamma_t = \bar B_\nu(\gamma_t),\ \nu \text{ is } T\text{-invariant}\}\,dt$$

where $T$ in "$T$-invariant" is a transformation (dynamical systems case) and $I(\nu)$ is the 2nd level LD functional on measures which appeared many times in these lectures.


3.16. LD in the averaging setup: statement.

3.5. Theorem. Set $Z^\varepsilon(t) = X^\varepsilon([t/\varepsilon])$ or $Z^\varepsilon(t) = X^\varepsilon(t/\varepsilon)$ in the discrete or continuous time case, respectively. Then for any $a, \delta, \lambda > 0$ and every continuous $\gamma_t$, $t \in [0,T]$, $\gamma_0 = x$, there exists $\varepsilon_0 > 0$ such that for $\varepsilon < \varepsilon_0$,

$$P\{\cdot : \rho_{0,T}(Z^\varepsilon_{x,\cdot}, \gamma) < \delta\} \ge \exp\Big\{-\frac{1}{\varepsilon}(S_{0,T}(\gamma) + \lambda)\Big\} \quad\text{and}\quad P\{\cdot : \rho_{0,T}(Z^\varepsilon_{x,\cdot}, \Phi^a_{0,T}(x)) \ge \delta\} \le \exp\Big\{-\frac{1}{\varepsilon}(a - \lambda)\Big\}$$

where $P$ is the probability ($\cdot = \omega$) in the Markov processes case, $P$ is the normalized Riemannian volume ($\cdot = y$, $\xi(t) = T^t y$) in the hyperbolic dynamical systems case, $\rho_{0,T}$ is the uniform distance and $\Phi^a_{0,T}(x) = \{\gamma : \gamma_0 = x,\ S_{0,T}(\gamma) \le a\}$.

3.17. LD in averaging: a glimpse at the proof. The main part of the proof is to show that for any function $W_t(\xi_1, \dots, \xi_\ell)$ continuous on $[0,T] \times M^\ell$,

$$\lim_{\varepsilon\to 0} \varepsilon \ln \int \exp\Big(\varepsilon^{-1} \int_0^T W_t(\xi(q_1(t/\varepsilon)), \dots, \xi(q_\ell(t/\varepsilon)))\,dt\Big)\,dP = \int_0^T Q(\hat W_t)\,dt$$

where, recall, $\hat W_\lambda(x) = \int \cdots \int \exp\big(W_\lambda(x, x_2, \dots, x_\ell)\big)\,d\mu(x_2) \cdots d\mu(x_\ell)$. Here, again, $P$ is the probability in the Markov processes case and $P$ is the normalized Riemannian volume in the hyperbolic dynamical systems case where $\xi(s) = T^s y$ and the integration is then in $y$. In the discrete time case we replace $t/\varepsilon$ by $[t/\varepsilon]$. When the limit above is established we obtain the theorem above via some general arguments in large deviations. It is easy to see that the limit above would follow if we could show that for any continuous functions $W_i(\xi_1, \dots, \xi_\ell)$, $i = 1, \dots, k$, and for any numbers $0 = t_0 < t_1 < \cdots < t_{k-1} < t_k = T$,

$$\lim_{\varepsilon\to 0} \varepsilon \ln \int \exp\Big(\varepsilon^{-1} \sum_{i=1}^{k} \int_{t_{i-1}}^{t_i} W_i(\xi(q_1(t/\varepsilon)), \dots, \xi(q_\ell(t/\varepsilon)))\,dt\Big)\,dP = \sum_{i=1}^{k} (t_i - t_{i-1})\, Q(\hat W_i).$$

3.18. Nonconventional multifractal formalism. It is known that the multifractal formalism is related to large deviations and though this connection will not be used here I will discuss the corresponding problems. Recall that the multifractal formalism deals with computations of Hausdorff dimensions of sets having the form

$$\Big\{x : \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} f(T^n x) = \rho\Big\}.$$

In the nonconventional setup the multifractal spectrum was introduced in [30] as a range of Hausdorff dimensions of more general sets

$$G_\rho = \Big\{x : \lim_{N\to\infty} \frac{1}{N} \sum_{n=1}^{N} F(f_1(T^{q_1(n)} x), \dots, f_\ell(T^{q_\ell(n)} x)) = \rho\Big\}. \tag{3.1}$$

By the nonconventional ergodic theorem if $\mu$ is $T$-invariant mixing and $\rho = \int \cdots \int F(f_1(x_1), \dots, f_\ell(x_\ell))\,d\mu(x_1) \cdots d\mu(x_\ell)$ then $\mu(G_\rho) = 1$ while otherwise $\mu(G_\rho) = 0$, and it is natural (if $\mu \sim$ Leb) to inquire about the Hausdorff dimension of $G_\rho$.


3.19. Frequencies of words with gaps: preliminaries. Instead of this general problem we consider a more specific question about Hausdorff dimensions of sets of numbers with prescribed frequencies of specific combinations of digits in $m$-expansions. Namely, for any $x \in [0,1]$ and an integer $m > 1$ write

$$x = \sum_{i=1}^{\infty} \frac{a_{i-1}(x)}{m^i} \quad\text{where } a_j(x) \in \{0, 1, \dots, m-1\},\ j = 0, 1, \dots$$

allowing zero tails of expansions but not tails consisting of all $(m-1)$'s. This convention affects only a countable number of points, and so it does not influence computation of the Hausdorff dimensions. Denote by $\mathcal{A}_\ell = \{0, 1, \dots, m-1\}^\ell$ the set of all $\ell$-words. For each $x \in [0,1]$ and an $\ell$-word $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_\ell) \in \mathcal{A}_\ell$ define

$$N_\alpha(x,n) = \#\{k > 0,\ k \le n : (a_{q_1(k)}(x), \dots, a_{q_\ell(k)}(x)) = \alpha\}$$

where $\#\Gamma$ denotes the number of elements in the set $\Gamma$. For each probability vector $p = (p_\alpha, \alpha \in \mathcal{A}_\ell) \in \mathbb{R}^{m^\ell}$, $\sum_{\alpha\in\mathcal{A}_\ell} p_\alpha = 1$, define

$$U_p = \Big\{x \in (0,1) : \lim_{n\to\infty} \frac{1}{n} N_\alpha(x,n) = p_\alpha \text{ for all } \alpha \in \mathcal{A}_\ell\Big\}.$$

3.20. Frequencies of words with gaps: statement. We want to deal with the question of computation of the Hausdorff dimension $HD(U_p)$ of $U_p$. When $\ell = 1$ and $q_1(k) = k$ we arrive at the classical question studied by Eggleston via combinatorial means and by Billingsley via ergodic theory. In order to relate the limit of $n^{-1} N_\alpha(x,n)$ to the nonconventional ergodic theorem define the transformation $Tx = \{mx\}$ where $\{\cdot\}$ denotes the fractional part. Identifying 0 and 1 we can view $T$ as an expanding map of the circle. Now $a_i(x) = a_0(T^i x)$ and if $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_\ell) \in \mathcal{A}_\ell$ and $\Gamma_j = \{x : a_0(x) = j\}$ then

$$N_\alpha(x,n) = \sum_{k=1}^{n} I_{\Gamma_{\alpha_1}}(T^{q_1(k)} x)\, I_{\Gamma_{\alpha_2}}(T^{q_2(k)} x) \cdots I_{\Gamma_{\alpha_\ell}}(T^{q_\ell(k)} x).$$

3.6. Theorem. Suppose that $q_1(k) = k$ for all $k$ and there exists a probability vector $r = (r_0, r_1, \dots, r_{m-1})$ such that $p_\alpha = \prod_{i=1}^{\ell} r_{\alpha_i}$ for any $\alpha = (\alpha_1, \dots, \alpha_\ell) \in \mathcal{A}_\ell$. Then for $p = (p_\alpha, \alpha \in \mathcal{A}_\ell)$,

$$HD(U_p) = \frac{-\sum_{j=0}^{m-1} r_j \ln r_j}{\ln m}$$

with the convention $0 \ln 0 = 0$.

3.21. Frequencies of words with gaps: remarks.

• For any $T$-invariant probability measure $\mu$ on $[0,1]$ with sufficient mixing properties it follows that $\mu$-almost everywhere

$$\lim_{n\to\infty} \frac{1}{n} N_\alpha(x,n) = \prod_{i=1}^{\ell} \mu(\Gamma_{\alpha_i}).$$

Hence, if $p = (p_\alpha, \alpha \in \mathcal{A}_\ell)$ and there exists no probability vector $r = (r_0, r_1, \dots, r_{m-1})$ such that $p_\alpha = \prod_{i=1}^{\ell} r_{\alpha_i}$ then $\mu(U_p) = 0$ for any $\mu$ as above, and so such $\mu$ cannot be used for computation of the Hausdorff dimension of $U_p$ (by one of the methods where measures are involved), which complicates the study in this case.


• Results of this type can be extended to digits of continued fraction expansions, though in this case only estimates of the Hausdorff dimension, rather than precise formulas, can be obtained.

3.22. Further remarks. In order to compare the sets $G_\rho$ and $U_p$ above consider $\Omega = A^{\mathbb{N}} = \{\omega = (\omega_0, \omega_1, \dots) : \omega_i \in A\}$ where $A = \{a_1, a_2, \dots, a_m\}$ is a finite alphabet. Let $T : \Omega \to \Omega$, $(T\omega)_i = \omega_{i+1}$, be the left shift and set $b(\omega) = \omega_0$. For each $\omega$ and an $\ell$-word $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_\ell) \in A^\ell$ define

$$N_\alpha(\omega,n) = \#\{k > 0,\ k \le n : (b(T^{q_1(k)} \omega), b(T^{q_2(k)} \omega), \dots, b(T^{q_\ell(k)} \omega)) = \alpha\}.$$

For each probability vector $p = (p_\alpha, \alpha \in A^\ell)$ define $U_p = \{\omega : \lim_{n\to\infty} \frac{1}{n} N_\alpha(\omega,n) = p_\alpha\ \forall \alpha \in A^\ell\}$. If we put a natural metric on the sequence space $\Omega$ then the Hausdorff dimension $HD(U_p)$ of $U_p$ can be obtained in the same way as in the above theorem under the same conditions on $p_\alpha$. Next, let $F$ be a function on $A^\ell$; then

$$G_\rho = \Big\{\omega : \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} F(b(T^{q_1(k)} \omega), \dots, b(T^{q_\ell(k)} \omega)) = \rho\Big\} = \Big\{\omega : \lim_{n\to\infty} \frac{1}{n} \sum_{\alpha=(\alpha_1,\dots,\alpha_\ell)\in A^\ell} N_\alpha(\omega,n) F(\alpha_1, \dots, \alpha_\ell) = \rho\Big\}.$$

Hence,

$$G_\rho = \cup_p \Big\{U_p : \sum_{\alpha=(\alpha_1,\dots,\alpha_\ell)\in A^\ell} p_\alpha F(\alpha_1, \dots, \alpha_\ell) = \rho\Big\}.$$
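The dimension formula of Theorem 3.6, which applies to the sets $U_p$ whose frequency vectors have the product form $p_\alpha = \prod_i r_{\alpha_i}$, is easy to evaluate. The short Python sketch below (function names are ours, digits are indexed $0, \dots, m-1$) builds the product vector and computes the entropy ratio.

```python
import math
from itertools import product


def product_frequencies(r, ell):
    """p_alpha = r[alpha_1] * ... * r[alpha_ell] for every ell-word alpha."""
    digits = range(len(r))
    return {alpha: math.prod(r[i] for i in alpha)
            for alpha in product(digits, repeat=ell)}


def hausdorff_dimension(r):
    """-sum_j r_j ln r_j / ln m with the convention 0 ln 0 = 0."""
    m = len(r)
    entropy = -sum(x * math.log(x) for x in r if x > 0)
    return entropy / math.log(m)


r_uniform = [0.5, 0.5]
r_biased = [0.25, 0.75]

p = product_frequencies(r_biased, ell=2)
print(p)
print(hausdorff_dimension(r_uniform))
print(hausdorff_dimension(r_biased))
```

Note that the value depends only on the digit distribution $r$ and not on $\ell$ or on the functions $q_2, \dots, q_\ell$, which is one of the notable features of the theorem.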

3.23. Other related results. In some particular cases Hausdorff dimensions of sets of the type $G_\rho$ were found recently.

• Peres and Solomyak [34] computed the Hausdorff dimensions of the sets

$$A_\rho = \Big\{x = (x_1, x_2, \dots),\ x_i \in \{0,1\} : \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} x_k x_{2k} = \rho\Big\}.$$

• Fan, Schmeling and Wu [15] computed the Hausdorff dimensions of the sets

$$E(\rho) = \Big\{x = (x_1, x_2, \dots),\ x_i \in \{0, 1, \dots, m-1\} : \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} \varphi(x_k, x_{qk}, \dots, x_{q^{\ell-1}k}) = \rho\Big\}$$

where $q \ge 2$ is an integer. The latter paper relies on certain thermodynamic formalism type constructions corresponding to some nonlinear transfer operator.

References

[1] D. V. Anosov, Averaging in systems of ordinary differential equations with rapidly oscillating solutions (Russian), Izv. Akad. Nauk SSSR Ser. Mat. 24 (1960), 721–742. MR0126592 (23 #A3888)
[2] I. Assani, Multiple recurrence and almost sure convergence for weakly mixing dynamical systems, Israel J. Math. 103 (1998), 111–124, DOI 10.1007/BF02762270. MR1613556 (99f:28021)
[3] V. Bergelson, Weakly mixing PET, Ergodic Theory Dynam. Systems 7 (1987), no. 3, 337–349, DOI 10.1017/S0143385700004090. MR912373 (89g:28022)
[4] R. Bowen, Equilibrium states and the ergodic theory of Anosov diffeomorphisms, Lecture Notes in Mathematics, Vol. 470, Springer-Verlag, Berlin-New York, 1975. MR0442989 (56 #1364)
[5] V. Bakhtin and Y. Kifer, Nonconvergence examples in averaging, Geometric and probabilistic structures in dynamics, Contemp. Math., vol. 469, Amer. Math. Soc., Providence, RI, 2008, pp. 1–17, DOI 10.1090/conm/469/09158. MR2478463 (2010c:37122)


[6] V. Bergelson, A. Leibman, and C. G. Moreira, From discrete- to continuous-time ergodic theorems, Ergodic Theory Dynam. Systems 32 (2012), no. 2, 383–426, DOI 10.1017/S0143385711000848. MR2901353
[7] Y.-M. Chung, Large deviations on Markov towers, Nonlinearity 24 (2011), no. 4, 1229–1252, DOI 10.1088/0951-7715/24/4/011. MR2776118
[8] Y.-M. Chung and H. Takahasi, Large deviation principle for Benedicks-Carleson quadratic maps, Comm. Math. Phys. 315 (2012), no. 3, 803–826, DOI 10.1007/s00220-012-1540-x. MR2981814
[9] G. Carinci, J.-R. Chazottes, C. Giardinà, and F. Redig, Nonconventional averages along arithmetic progressions and lattice spin systems, Indag. Math. (N.S.) 23 (2012), no. 3, 589–602, DOI 10.1016/j.indag.2012.05.010. MR2948646
[10] D. Dolgopyat, Limit theorems for partially hyperbolic systems, Trans. Amer. Math. Soc. 356 (2004), no. 4, 1637–1689 (electronic), DOI 10.1090/S0002-9947-03-03335-X. MR2034323 (2005k:37053)
[11] D. Dolgopyat, Averaging and invariant measures (English, with English and Russian summaries), Mosc. Math. J. 5 (2005), no. 3, 537–576, 742. MR2241812 (2007i:37063)
[12] A. Eizenberg, Y. Kifer, and B. Weiss, Large deviations for $\mathbb{Z}^d$-actions, Comm. Math. Phys. 164 (1994), no. 3, 433–454. MR1291239 (95k:60071)
[13] M. I. Freĭdlin, The averaging principle and theorems on large deviations (Russian), Uspekhi Mat. Nauk 33 (1978), no. 5(203), 107–160, 238. MR511884 (81k:60059)
[14] H. Furstenberg, Nonconventional ergodic averages, The legacy of John von Neumann (Hempstead, NY, 1988), Proc. Sympos. Pure Math., vol. 50, Amer. Math. Soc., Providence, RI, 1990, pp. 43–56, DOI 10.1090/pspum/050/1067751. MR1067751 (91i:28011)
[15] A.-H. Fan, J. Schmeling, and M. Wu, The multifractal spectra of V-statistics, Further developments in fractals and related fields, Trends Math., Birkhäuser/Springer, New York, 2013, pp. 135–151, DOI 10.1007/978-0-8176-8400-6_7. MR3184191
[16] M. Kesseböhmer, Large deviation for weak Gibbs measures and multifractal spectra, Nonlinearity 14 (2001), no. 2, 395–409, DOI 10.1088/0951-7715/14/2/312. MR1819804 (2002a:60037)
[17] R. Z. Hasminskiĭ, Stochastic processes defined by differential equations with a small parameter (Russian, with English summary), Teor. Verojatnost. i Primenen 11 (1966), 240–259. MR0203788 (34 #3636)
[18] R. Z. Khasminskii, A limit theorem for solutions of differential equations with random right-hand side, Th. Probab. Appl. 11 (1966), 390–406.
[19] Y. Kifer, Large deviations in dynamical systems and stochastic processes, Trans. Amer. Math. Soc. 321 (1990), no. 2, 505–524, DOI 10.2307/2001571. MR1025756 (91e:60091)
[20] Y. Kifer, Averaging in dynamical systems and large deviations, Invent. Math. 110 (1992), no. 2, 337–370, DOI 10.1007/BF01231336. MR1185587 (93m:60118)
[21] Y. Kifer, Topics on large deviations and Random Perturbations, Univ. of N. Carolina, Chapel Hill, 1993.
[22] Y. Kifer, Large deviations, averaging and periodic orbits of dynamical systems, Comm. Math. Phys. 162 (1994), no. 1, 33–46. MR1272765 (95b:58091)
[23] Y. Kifer, Limit theorems in averaging for dynamical systems, Ergodic Theory Dynam. Systems 15 (1995), no. 6, 1143–1172, DOI 10.1017/S0143385700009834. MR1366312 (97c:58087)
[24] Y. Kifer, Large deviations for paths and configurations counting, Ergodic theory of $\mathbb{Z}^d$ actions (Warwick, 1993), London Math. Soc. Lecture Note Ser., vol. 228, Cambridge Univ. Press, Cambridge, 1996, pp. 415–432, DOI 10.1017/CBO9780511662812.018. MR1411231 (97i:60032)
[25] Y. Kifer, Some recent advances in averaging, Modern dynamical systems and applications, Cambridge Univ. Press, Cambridge, 2004, pp. 385–403. MR2093312 (2005h:37067)
[26] Y. Kifer, Averaging principle for fully coupled dynamical systems and large deviations, Ergodic Theory Dynam. Systems 24 (2004), no. 3, 847–871, DOI 10.1017/S014338570400001X. MR2062922 (2005f:37060)

LARGE DEVIATIONS

79

[27] Y. Kifer, Another proof of the averaging principle for fully coupled dynamical systems with hyperbolic fast motions, Discrete Contin. Dyn. Syst. 13 (2005), no. 5, 1187–1201, DOI 10.3934/dcds.2005.13.1187. MR2166264 (2006d:37045) [28] Y. Kifer, Large deviations and adiabatic transitions for dynamical systems and Markov processes in fully coupled averaging, Mem. Amer. Math. Soc. 201 (2009), no. 944, viii+129, DOI 10.1090/memo/0944. MR2547839 (2011c:60089) [29] Y. Kifer, Nonconventional limit theorems, Probab. Theory Related Fields 148 (2010), no. 1-2, 71–106, DOI 10.1007/s00440-009-0223-9. MR2653222 (2011k:60076) [30] Y. Kifer, A nonconventional strong law of large numbers and fractal dimensions of some multiple recurrence sets, Stoch. Dyn. 12 (2012), no. 3, 1150023, 21, DOI 10.1142/S0219493711500237. MR2926580 [31] Y. Kifer and S. R. S. Varadhan, Nonconventional limit theorems in discrete and continuous time via martingales, Ann. Probab. 42 (2014), no. 2, 649–688, DOI 10.1214/12-AOP796. MR3178470 [32] Y. Kifer and S. R. S. Varadhan, Nonconventional large deviations theorems, Probab. Theory Related Fields 158 (2014), no. 1-2, 197–224, DOI 10.1007/s00440-013-0481-4. MR3152784 [33] A. Potts, Multiple ergodic averages for flows and an application, Illinois J. Math. 55 (2011), no. 2, 589–621 (2012). MR3020698 [34] Y. Peres and B. Solomyak, Dimension spectrum for a nonconventional ergodic average, Real Anal. Exchange 37 (2011/12), no. 2, 375–387. MR3080599 [35] J. A. Sanders, F. Verhulst, and J. Murdock, Averaging methods in nonlinear dynamical systems, 2nd ed., Applied Mathematical Sciences, vol. 59, Springer, New York, 2007. MR2316999 (2008h:34001) Institute of Mathematics, Hebrew University, Jerusalem, Israel Institute of Mathematics, The Hebrew University, Jerusalem 91904, Israel E-mail address: [email protected]

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01485

Thermodynamic formalism for countable Markov shifts

Omri M. Sarig

Abstract. We give an overview of the thermodynamic formalism for countable Markov shifts, and indicate applications to surface diffeomorphisms.

1. What is “thermodynamic formalism”?

1.1. Introduction. The ergodic properties of a dynamical system depend on the choice of invariant measure. If a dynamical system has several different ergodic invariant measures, which is often the case, then the question arises which measure to choose to analyze the system.

Physicists working in statistical mechanics encounter a similar problem. Thermodynamic quantities are weighted averages of microscopically defined quantities. Which weighting scheme should be used? Gibbs and Boltzmann found the weighting schemes (“ensembles”) which reproduce empirical observations. Sinai and Ruelle imported these schemes to dynamics.¹ They showed that some of the most important invariant measures in smooth ergodic theory can be viewed as analogues of the Gibbs canonical ensemble, and that certain laws in statistical physics translate to mathematical theorems on the properties of these measures. Sinai called such measures Gibbs measures, and Ruelle called the entire program thermodynamic formalism.

The word “formalism” is appropriate, since only the formal aspects of equilibrium statistical physics are important. The physics itself does not play a role. True enough, thermodynamic formalism studies “Gibbs measures” associated to “interaction potentials.” But the theory usually makes no assumptions on the functional form of the potential, which is where the physical content lies. As a result, the theory applies to many problems in geometric measure theory, Riemannian geometry, and number theory, whose natural “interaction potentials” have nothing to do with real physical interactions. The beauty of thermodynamic formalism is that its language allows for making conjectures (which often turn out to be correct) in contexts far removed from physics, by following analogies with the physical world.

2010 Mathematics Subject Classification. Primary 37D35; Secondary 37C05, 37C30, 37C35.
Key words and phrases. Thermodynamic formalism, topological Markov shifts, phase transitions.
The author was supported by the European Science Foundation, ERC-2009-StG 239885.
¹ Notes and historical references are collected at the end of each section.
© 2015 American Mathematical Society


In this section we will review some of the basic concepts of equilibrium statistical physics and explain how to translate them to the language of ergodic theory.

1.2. The canonical ensemble. As a rough approximation, thermodynamics is a collection of empirical laws which govern the behavior of large systems (e.g. a litre of gas) on the large (“macroscopic”) scale, and statistical physics is a theoretical attempt to derive these laws from the assumption that matter is made from molecules which follow the laws of mechanics (quantum or classical).

The basic idea is to model the thermodynamic quantities $q$ of the system by functions $q(x_1(t),\dots,x_N(t))$ of the individual states $x_i(t)$ of the molecules which constitute it (“microscopic state”). Because of the enormous number of particles ($N \sim 10^{24}$), the chaotic nature of their motion, and the fact that $q(\cdot)$ is usually not very sensitive to changes in an exponentially small fraction of its coordinates, one expects $q(x_1(t),\dots,x_N(t))$ to fluctuate near a constant. Statistical mechanics interprets this constant as the thermodynamic quantity $q$ we measure in the lab.

To find $q$, we need to average $q(x_1(t),\dots,x_N(t))$ in time. A central working assumption (the “ergodic hypothesis”) is that the average $\frac{1}{T}\int_0^T q(x_1(t),\dots,x_N(t))\,dt$ can be approximated as $T\to\infty$ by the space average $\int q(x_1,\dots,x_N)\,d\mu(x_1,\dots,x_N)$ over all possible microscopic states $(x_1,\dots,x_N)$, with respect to some measure $\mu$ on the space of configurations. The question is which measure to use.

One of the important choices is called the canonical ensemble. Imagine a small closed non-thermally isolated gas container (“system A”) sitting in a large open room (“system B”). We assume that A can exchange energy with B; A cannot exchange particles with B; B is at equilibrium; and B is so large that the energy it exchanges with A does not affect its thermodynamic properties.

Assume for simplicity that the list of possible microscopic states $\xi = (x_1,\dots,x_N)$ of the container is finite. Gibbs’s rule for the probabilistic weight of state $\xi$ at equilibrium is
\[ \Pr(\xi) := \frac{1}{Z(\beta)}\, e^{-\beta U(\xi)} \tag{1.1} \]
where $U(\xi)$ := total energy of system A when in state $\xi$, $\beta := \frac{1}{k_B T}$ where $k_B$ is a physical constant, $T$ is the temperature of B, and $Z(\beta) := \sum_{\xi} e^{-\beta U(\xi)}$ is the normalizing constant, called the partition function. This probability distribution is called the canonical ensemble.

In more complicated situations, e.g. when the container is open and can also exchange particles with the room, the function $U(\xi)$ is replaced by a more complicated expression which leads to another weighting scheme, called the grand canonical ensemble. The details are not important. What is important is the form Pr(state) ∝ exp[−parametric function of the state], where the choice of function and parameters depends on the physics of the problem.

If the space of all possible configurations $\xi$ is uncountable, then (1.1) does not make sense, because we cannot perform the normalization. In such cases, we have to use other methods to define the canonical ensemble. Mathematical physicists have come up with several ways to do this, which we now review. For the sake of concreteness we limit ourselves to a particular example: the one-sided one-dimensional lattice gas model.
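For a finite state space, Gibbs's rule (1.1) can be evaluated directly. The following is a minimal sketch, assuming a hypothetical three-state system whose energies are chosen only for illustration; the function name `canonical_ensemble` is not from the text.

```python
import math

def canonical_ensemble(energies, beta):
    """Return (probabilities, Z) with Pr(xi) = exp(-beta*U(xi)) / Z(beta)."""
    weights = [math.exp(-beta * u) for u in energies]
    z = sum(weights)  # partition function Z(beta)
    return [w / z for w in weights], z

# Hypothetical three-state system; the energies are illustrative only.
energies = [0.0, 1.0, 2.0]
for beta in (0.0, 1.0, 5.0):
    probs, z = canonical_ensemble(energies, beta)
    print(f"beta={beta}: Z={z:.4f}, Pr={[round(p, 4) for p in probs]}")
```

At $\beta = 0$ (infinite temperature) all states are equally likely, and as $\beta$ grows the weight concentrates on the states of lowest energy.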


1.3. One-dimensional lattice gas. In this model, we have a one-sided one-dimensional array of sites indexed by 0, 1, 2, etc. Each site can be either empty (“0”) or full (“1”). Thus the space of all possible microscopic states is $X := \{(x_0,x_1,x_2,\dots) : x_i = 0 \text{ or } 1\}$.

Suppose the particles at the full sites interact with each other and/or with an external force field. This gives each (occupied) site “potential energy”. Let $U(x) = U(x_0|x_1,x_2,x_3,\dots)$ denote the energy content of site zero due to its interaction with the world around it. One can think of $U(x)$ as minus the energy required to “break” site zero and push it to infinity. Breaking sites $1,\dots,n-1$ successively, we find that the energy content of the first $n$ sites due to their interaction with the world is $U(x)+U(\sigma x)+\cdots+U(\sigma^{n-1}x)$, where $\sigma : X\to X$ is the left shift map defined by $\sigma(x_0,x_1,x_2,\dots) = (x_1,x_2,x_3,\dots)$. The “total energy” is then $\sum_{k\ge 0}U(\sigma^k x)$. Now try to apply (1.1):
\[ \Pr(x) = \frac{\text{“}\exp[-\beta\sum_{k\ge 0}U(\sigma^k x)]\text{”}}{\text{“}\sum_{y\in X}\exp[-\beta\sum_{k\ge 0}U(\sigma^k y)]\text{”}}. \]
The problems are immediately apparent: the expressions within quotation marks are not necessarily finite, and the sum in the denominator has uncountably many terms. Mathematical physicists came up with several alternatives to (1.1) which do make sense. We will review these alternatives, paying special attention to the heuristics which motivate them.

1.4. Dobrushin-Lanford-Ruelle states. Instead of giving a formula for Pr directly, we will give formulas for the conditional probabilities of $x_0,\dots,x_{n-1}$ given $(x_n,x_{n+1},\dots)$. Write $z_n^m = (z_n,z_{n+1},\dots,z_m)$ and $x = (y_0^{n-1},x_n^\infty)$. (1.1) and the identity
\[ \sum_{k\ge 0}U[\sigma^k(y_0^{n-1},x_n^\infty)] = \sum_{k=0}^{n-1}U(y_k^{n-1},x_n^\infty) + \sum_{k\ge n}U(\sigma^k x) \]
give
\[ \frac{\text{“}\Pr(y_0^{n-1}|x_n^\infty)\text{”}}{\text{“}\Pr(z_0^{n-1}|x_n^\infty)\text{”}} = \frac{\text{“}\frac{1}{Z(\beta)}\exp\bigl[-\beta\sum_{k=0}^{n-1}U(y_k^{n-1},x_n^\infty)\bigr]\exp\bigl[-\beta\sum_{k=n}^{\infty}U(\sigma^k x)\bigr]\text{”}}{\text{“}\frac{1}{Z(\beta)}\exp\bigl[-\beta\sum_{k=0}^{n-1}U(z_k^{n-1},x_n^\infty)\bigr]\exp\bigl[-\beta\sum_{k=n}^{\infty}U(\sigma^k x)\bigr]\text{”}} \]
(terms within quotation marks are not well-defined)
\[ = \frac{\exp\bigl[-\beta\sum_{k=0}^{n-1}U(y_k^{n-1},x_n^\infty)\bigr]}{\exp\bigl[-\beta\sum_{k=0}^{n-1}U(z_k^{n-1},x_n^\infty)\bigr]}. \]
The meaningless quantities cancelled out. Fixing $z_0^{n-1}$ and varying $y_0^{n-1}$ we get “$\Pr(y_0^{n-1}|x_n^\infty)$” $\propto \exp[-\beta\sum_{k=0}^{n-1}(U\circ\sigma^k)(y_0^{n-1},x_n^\infty)]$. Now there is no problem to normalize, because $(y_0,\dots,y_{n-1})$ ranges over a finite set, with $2^n$ elements.

To make this precise, let $\mathcal{B}$ denote the smallest σ–algebra which makes the coordinate functions $X_i(x_0,x_1,x_2,\dots) = x_i$ measurable. $\mathcal{B}$ is the Borel σ–algebra for the metric $d(x,y) = \exp[-\min\{i\ge 0 : x_i \ne y_i\}]$, and $\sigma^{-n}(\mathcal{B})$ is generated by $X_i$, $i\ge n$. Given a Borel probability measure $m$, we define
\[ m(x_0,\dots,x_{n-1}|x_n,x_{n+1},\dots) := E_m\bigl(1_{[x_0,\dots,x_{n-1}]}\,\big|\,\sigma^{-n}(\mathcal{B})\bigr)(x). \tag{1.2} \]


$E_m$ is the conditional expectation, and $1_{[x_0,\dots,x_{n-1}]}$ is the indicator function of $[x_0,\dots,x_{n-1}] := \{y : y_i = x_i\ (i = 0,\dots,n-1)\}$. The right hand side of (1.2) is an $L^1$ element, not a function, and is only defined a.e.

Definition 1.1. A Borel probability measure $m$ on $(X,\mathcal{B})$ is called a Dobrushin–Lanford–Ruelle (DLR) state for the potential $U$ at inverse temperature $\beta$, if
\[ m(x_0,\dots,x_{n-1}|x_n,x_{n+1},\dots) = \frac{1}{Z_n(\beta,x_n^\infty)}\exp\Bigl[-\beta\sum_{i=0}^{n-1}U(\sigma^i(x))\Bigr] \quad\text{for $m$–a.e. } x\in X, \]
where
\[ Z_n(\beta,x_n^\infty) := \sum_{y\in\sigma^{-n}\{\sigma^n(x)\}}\exp\Bigl[-\beta\sum_{i=0}^{n-1}U(\sigma^i(y))\Bigr]. \]
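The right hand side of the equation in Definition 1.1 is a finite computation once the tail $x_n^\infty$ is fixed. The sketch below evaluates it for a hypothetical nearest-neighbour potential $U(x) = x_0(\text{field} - \text{coupling}\cdot x_1)$; this potential, its parameters, and the function names are assumptions made for illustration and do not come from the text.

```python
import itertools
import math

def site_energy(x0, x1, coupling=1.0, field=0.5):
    """Hypothetical U(x) depending only on (x0, x1): an occupied site costs
    `field` and gains `coupling` if its right neighbour is occupied too."""
    return x0 * (field - coupling * x1)

def dlr_conditional(n, x_n, beta):
    """Conditional law of (y_0..y_{n-1}) given the tail, for a nearest-neighbour U.
    Only the first tail symbol x_n enters, because U looks one site to the right."""
    weights = {}
    for word in itertools.product((0, 1), repeat=n):
        seq = word + (x_n,)
        energy = sum(site_energy(seq[i], seq[i + 1]) for i in range(n))
        weights[word] = math.exp(-beta * energy)
    z_n = sum(weights.values())  # Z_n(beta, x_n^infty): a sum over 2^n words
    return {word: w / z_n for word, w in weights.items()}

law = dlr_conditional(n=3, x_n=1, beta=2.0)
for word, prob in sorted(law.items()):
    print(word, round(prob, 4))
```

Because the normalization runs over only $2^n$ words, none of the divergent quantities from the heuristic derivation appear.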

The equations for $m(x_0,\dots,x_{n-1}|x_n,x_{n+1},\dots)$ are called the DLR equations. It is not clear that they can be solved, or that if a solution exists then it is unique. We will discuss these questions later.

The DLR equations appear quite naturally in ergodic theory, even in situations which have nothing to do with physics. To explain how, we need the following definitions. We continue to work in the context of the left shift $\sigma$ on $X = \{0,1\}^{\mathbb{N}}$, postponing the discussion of the general case to later sections.

A Borel measure $\nu$ on $X$ is called non-singular, if for every Borel set $E$, $\nu[\sigma^{-1}(E)] = 0 \Leftrightarrow \nu(E) = 0$. Let $\nu\circ\sigma$ denote the measure $(\nu\circ\sigma)(E) := \nu[\sigma(E\cap[0])] + \nu[\sigma(E\cap[1])]$, where $[a] := \{x : x_0 = a\}$. Notice that $(\nu\circ\sigma)(E)\ge\nu[\sigma(E)]$, with strict inequality possible due to the non-invertibility of $\sigma$. It is easy to see that if $\nu$ is non-singular, then $\nu\ll\nu\circ\sigma$, and therefore the Radon-Nikodym derivative $\frac{d\nu}{d\nu\circ\sigma}$ is well–defined. We call $\frac{d\nu}{d\nu\circ\sigma}$ the Jacobian of $\nu$ w.r.t. $\sigma$.²

Suppose $\varphi : X\to\mathbb{R}$ is Borel. A Borel probability measure $\nu$ on $X$ is called a conformal measure for $\varphi$ if it is non-singular, and if there is a constant $\lambda$ s.t.
\[ \frac{d\nu}{d\nu\circ\sigma} = \lambda^{-1}\exp\varphi, \quad \nu\circ\sigma\text{–almost everywhere.} \]
Every non-singular measure $\nu$ is conformal for $\varphi := \ln\frac{d\nu}{d\nu\circ\sigma}$. The next theorem shows that every non-singular measure is a DLR state for some suitable (measurable) “interaction potential.”

Theorem 1.2. Any conformal measure for $\varphi := -\beta U$ is a DLR state with potential $U$ at inverse temperature $\beta$.

Proof. Given two words $a, b\in\{0,1\}^n$, the holonomy map $\vartheta_{a,b} : [a]\to[b]$ is the bijection $\vartheta_{a,b}(a,x_n^\infty) = (b,x_n^\infty)$. It is standard to check that if $\frac{d\nu}{d\nu\circ\sigma} = \lambda^{-1}\exp\varphi$, then $\frac{d\nu\circ\vartheta_{a,b}}{d\nu} = \exp\Phi(ax_n^\infty, bx_n^\infty)$, where $\Phi(u,v) := \sum_{k=0}^{\infty}[\varphi(\sigma^k v) - \varphi(\sigma^k u)]$. $\Phi(u,v)$ makes sense for all pairs $(u,v)$ s.t. $\sigma^n(u) = \sigma^n(v)$ for some $n$.

² If $T : [0,1]\to[0,1]$ is an expanding piecewise smooth interval map with two full branches, and $\nu$ is the symbolic coding of Lebesgue’s measure, then $\frac{d\nu}{d\nu\circ\sigma}$ is the symbolic coding of $1/|T'|$.


By the martingale convergence theorem, for all words $a$ and $b$ of length $n$,
\[ \frac{\nu(b_0,\dots,b_{n-1}|x_n,x_{n+1},\dots)}{\nu(a_0,\dots,a_{n-1}|x_n,x_{n+1},\dots)} = \lim_{k\to\infty}\frac{\nu[b,x_n^{n+k}]}{\nu[a,x_n^{n+k}]} = \lim_{k\to\infty}\frac{\nu\circ\vartheta_{a,b}[a,x_n^{n+k}]}{\nu[a,x_n^{n+k}]} \]
\[ = \lim_{k\to\infty}\frac{1}{\nu[a,x_n^{n+k}]}\int_{[a,x_n^{n+k}]}\exp\Phi(ax_n^\infty,bx_n^\infty)\,d\nu \]
\[ = \lim_{k\to\infty}E_\nu\bigl(\exp\Phi(ax_n^\infty,bx_n^\infty)\,\big|\,x_0^{n+k}\bigr)(ax_n^\infty) = \exp\Phi(ax_n^\infty,bx_n^\infty) \quad\text{a.s.} \]
Treating $a$ as a constant and $b$ as a variable, we get $\nu(b|x_n^\infty)\propto\exp\sum_{i=0}^{n-1}\varphi[\sigma^i(bx_n^\infty)]$, whence the DLR equations. □

1.5. Thermodynamic limits. Another approach for making sense of (1.1) is to “approximate” $X$ from within by countable or finite $X_n\subset X$, define the canonical ensembles for $X_n$, and pass to the limit. One popular approximation scheme is to impose “periodic boundary conditions”: $X_n^{per} := \{x\in X : \sigma^n(x) = x\}$. Another common approach is to impose a “fixed boundary condition”: $X_n = X_n^x := \sigma^{-n}\{x\} = \{y\in X : y_i = x_{i-n}\ (i\ge n)\}$. We focus on the latter approach, because it leads more quickly to one of the basic tools of the trade, Ruelle’s operator.

If we apply (1.1) blindly to $X_n$ then we get the following formula (expressions within quotation marks are not well-defined):
\[ \Pr\nolimits_{X_n}(y) = \frac{\text{“}\exp[-\beta\sum_{i=0}^{\infty}U(\sigma^i(y))]\text{”}\,\delta_x(\sigma^n y)}{\text{“}\sum_{\sigma^n(z)=x}\exp[-\beta\sum_{i=0}^{\infty}U(\sigma^i(z))]\text{”}} \qquad (\delta_x = \text{Dirac’s delta function}) \]
\[ = \frac{\exp[-\beta\sum_{i=0}^{n-1}U(\sigma^i(y))]\,\delta_x(\sigma^n y)\cdot\text{“}\exp[-\beta\sum_{i=0}^{\infty}U(\sigma^i(x))]\text{”}}{\sum_{\sigma^n(z)=x}\exp[-\beta\sum_{i=0}^{n-1}U(\sigma^i(z))]\cdot\text{“}\exp[-\beta\sum_{i=0}^{\infty}U(\sigma^i(x))]\text{”}} \]
\[ = \frac{\exp[-\beta\sum_{i=0}^{n-1}U(\sigma^i(y))]\,\delta_x(\sigma^n y)}{\sum_{\sigma^n(z)=x}\exp[-\beta\sum_{i=0}^{n-1}U(\sigma^i(z))]}. \]
The idea is now to let $n\to\infty$ and look for weak-star limit points.

The measures $\Pr_{X_n}$ can be expressed very efficiently using Ruelle’s operator $L_\varphi : C(X)\to C(X)$,
\[ (L_\varphi f)(x) = \sum_{\sigma(y)=x}e^{\varphi(y)}f(y), \tag{1.3} \]
where $\varphi := -\beta U$. To do this, let $\varphi_n := \varphi + \varphi\circ\sigma + \cdots + \varphi\circ\sigma^{n-1}$, then it is not difficult to see that $(L_\varphi^n f)(x) = \sum_{\sigma^n y = x}e^{\varphi_n(y)}f(y)$. Thus for every $f : X\to\mathbb{R}$,
\[ \int f\,d\Pr\nolimits_{X_n} = \frac{\sum_{\sigma^n(y)=x}e^{\varphi_n(y)}f(y)}{\sum_{\sigma^n(y)=x}e^{\varphi_n(y)}} = \frac{(L_\varphi^n f)(x)}{(L_\varphi^n 1)(x)}. \]
Equivalently, $\Pr_{X_n} = (L_\varphi^n)^*\delta_x/(L_\varphi^n 1)(x)$.

Definition 1.3. A thermodynamic limit with potential $U$, inverse temperature $\beta$ and boundary condition $x$ is a weak-star limit point of $\Pr_{X_n} = (L_\varphi^n)^*\delta_x/(L_\varphi^n 1)(x)$ as $n\to\infty$. Here $\varphi = -\beta U$.
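The identity $\int f\,d\Pr_{X_n} = (L_\varphi^n f)(x)/(L_\varphi^n 1)(x)$ can be checked numerically on the full shift $\{0,1\}^{\mathbb{N}}$. The sketch below assumes a hypothetical potential $\varphi(y) = -\beta\,y_0(0.5 - y_1)$ that depends on two coordinates only, and represents the boundary point $x$ by a finite tuple, which is enough because only its first coordinate is ever read. All names are illustrative.

```python
import itertools
import math

BETA = 1.0

def phi(y):
    """Hypothetical potential phi = -beta*U with U depending on (y0, y1) only."""
    return -BETA * y[0] * (0.5 - y[1])

def ruelle(f):
    """L_phi f on the full shift {0,1}^N: sum over the two preimages (a, x) of x."""
    return lambda x: sum(math.exp(phi((a,) + x)) * f((a,) + x) for a in (0, 1))

def ruelle_power(f, n):
    """Apply L_phi n times."""
    for _ in range(n):
        f = ruelle(f)
    return f

def expectation_pr_xn(f, x, n):
    """Integral of f against Pr_{X_n}, by direct enumeration of sigma^{-n}{x}."""
    num = den = 0.0
    for word in itertools.product((0, 1), repeat=n):
        y = word + x
        weight = math.exp(sum(phi(y[k:]) for k in range(n)))  # exp(phi_n(y))
        num += weight * f(y)
        den += weight
    return num / den

x = (0, 1, 1, 0)                 # finite stand-in for the boundary point x
f = lambda y: float(y[0])        # occupation of site 0
n = 5
lhs = expectation_pr_xn(f, x, n)
rhs = ruelle_power(f, n)(x) / ruelle_power(lambda y: 1.0, n)(x)
print(lhs, rhs)                  # the two values agree up to rounding
```

The recursive form is the useful one in practice: it replaces an enumeration of $\sigma^{-n}\{x\}$ by $n$ applications of a single linear operator.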


To understand the weak-star limit points of $\Pr_{X_n}$, one needs to understand the asymptotic behavior of $L_\varphi^n$ as $n\to\infty$. Again we see the relevance to dynamics: Averaging operators of the form (1.3) appear naturally in dynamics as transfer operators, or dual operators for nonsingular measures. The asymptotic behavior of their powers contains information on the ergodic and stochastic properties of the dynamical system, and any general tool for determining this behavior has potential applications to dynamics.

1.6. Equilibrium measures. A completely different approach to the canonical ensemble is to characterize the probability distribution (1.1) as a solution to a variational problem, and then hope that the variational problem is well–posed when the configuration space is uncountable. This can be done.

Let $(U_1,\dots,U_N)$ be a vector, representing the energies of a system which can only occupy a finite number of states (1 to $N$). The canonical ensemble distribution of this system $p = (p_1,\dots,p_N)$, $p_i = e^{-\beta U_i}/\sum_j e^{-\beta U_j}$, turns out to be the unique minimizer of the quantity
\[ F = \sum_{i=1}^{N}p_iU_i - \frac{1}{\beta}H(p), \quad\text{where } H(p) = -\sum_{i=1}^{N}p_i\ln p_i. \]
$F$ is called the average Helmholtz free energy, $\sum p_iU_i$ is called the average energy of $p$, and $H(p)$ is called the entropy.³ The canonical ensemble can thus be thought of as the probability distribution which minimizes the average free energy.

Ruelle had the idea of using a similar principle for the lattice gas model. Worried by the fact that the total energy $\sum_{k\ge 0}U\circ\sigma^k$ diverges at many configurations, we will replace the total free energy by the free energy per site, defined for a Borel probability measure $\mu$ by the following limit, when it exists:
\[ \lim_{n\to\infty}\frac{1}{n}\Biggl(\int\sum_{k=0}^{n-1}U(\sigma^kx)\,d\mu - \frac{1}{\beta}\Bigl(-\sum_{a\in\{0,1\}^n}\mu[a]\ln\mu[a]\Bigr)\Biggr). \]
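The finite-state variational characterization stated earlier in this subsection is easy to test numerically. The sketch below uses illustrative energies and an illustrative $\beta$ (neither is from the text), evaluates $F$ at the canonical distribution, compares it with $-\frac{1}{\beta}\ln Z(\beta)$, and checks that randomly sampled probability vectors never do better.

```python
import math
import random

def gibbs(energies, beta):
    """Canonical distribution p_i = exp(-beta*U_i)/Z and the partition function Z."""
    weights = [math.exp(-beta * u) for u in energies]
    z = sum(weights)
    return [w / z for w in weights], z

def free_energy(p, energies, beta):
    """F(p) = sum_i p_i U_i - H(p)/beta, with H computed using natural logarithms."""
    avg_energy = sum(pi * ui for pi, ui in zip(p, energies))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0)
    return avg_energy - entropy / beta

energies, beta = [0.0, 0.7, 1.3, 2.0], 1.5   # illustrative values only
p_star, z = gibbs(energies, beta)
f_star = free_energy(p_star, energies, beta)
print(f_star, -math.log(z) / beta)            # the two numbers agree

rng = random.Random(0)
for _ in range(1000):
    raw = [rng.random() for _ in energies]
    p = [r / sum(raw) for r in raw]
    assert free_energy(p, energies, beta) >= f_star - 1e-12
```

The gap $F(p) - F(p^*)$ equals $\frac{1}{\beta}$ times the relative entropy of $p$ with respect to $p^*$, which is why the minimizer is unique.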

If $\mu$ is shift invariant and $U$ is absolutely integrable w.r.t. $\mu$, then the limit does exist, and it equals $\int U\,d\mu - \frac{1}{\beta}h_\mu(\sigma)$ where $h_\mu(\sigma)$ is the metric entropy of $\mu$. Thus it makes sense to look for the shift invariant measures $\mu$ which minimize $\int U\,d\mu - \frac{1}{\beta}h_\mu(\sigma)$. The custom in dynamics is to pull the negative constant $-\frac{1}{\beta}$ out, and look for measures which maximize $h_\mu(\sigma) + \int\varphi\,d\mu$, where $\varphi := -\beta U$.

Definition 1.4. An equilibrium measure for $U$ at inverse temperature $\beta$ is a shift invariant measure $\mu$ which maximizes $h_\mu(\sigma) + \int\varphi\,d\mu$ for $\varphi = -\beta U$, over all invariant Borel measures.

Notice that equilibrium measures are shift invariant by definition. This is not always the case for DLR states or thermodynamic limits. However, we shall see later that for countable Markov shifts, equilibrium measures only differ from DLR states or thermodynamic limits by a positive density function.

³ Notice that we used $\ln$, not $\log_2$, to define $H(p)$.

1.7. Gibbs measures in the sense of Bowen. This definition, due to Bowen, originated in dynamics rather than mathematical physics:


Definition 1.5. Let $\varphi := -\beta U$. A Gibbs measure in the sense of Bowen is a shift invariant Borel probability measure $m$ s.t. for some $M > 1$ and $P\in\mathbb{R}$, for every finite word $a = (a_0,\dots,a_{n-1})$,
\[ M^{-1} \le \frac{m[a_0,\dots,a_{n-1}]}{\exp\bigl(\sum_{k=0}^{n-1}\varphi(\sigma^kx) - nP\bigr)} \le M \quad\text{for all } x\in[a]. \tag{1.4} \]
This is a strong uniform way of saying that $m[a]\asymp\frac{1}{Z_n}\exp\bigl(-\beta\sum_{k=0}^{n-1}U(\sigma^kx)\bigr)$, where $Z_n = \exp(nP)$.

It can be shown that for the lattice gas model, if $U$ is Hölder continuous, then every equilibrium measure is a Gibbs measure in the sense of Bowen. This extends to topological Markov shifts with a finite alphabet. But when the alphabet is infinite, Gibbs measures in the sense of Bowen cannot exist unless the underlying shift satisfies a very strong combinatorial property, called the big images and preimages property, see Theorem 5.9 below. Therefore while this definition is central to the thermodynamic formalism of subshifts of finite type, it is less important to the thermodynamic formalism of countable Markov shifts.
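A simple instance of (1.4) is a stationary Markov measure on $\{0,1\}^{\mathbb{N}}$ with potential $\varphi(x) = \ln P_{x_0x_1}$ and constant $P = 0$ (here $P_{st}$ denotes a transition probability, not the constant in (1.4)). In that case the ratio in (1.4) reduces to $\pi_{a_0}/P_{a_{n-1}x_n}$, which is bounded independently of $n$. The sketch below checks this by enumeration; the transition matrix is a hypothetical example, not one taken from the text.

```python
import itertools
import math

P = [[0.7, 0.3], [0.4, 0.6]]   # hypothetical transition probabilities
PI = [4 / 7, 3 / 7]            # stationary vector of P (PI P = PI)

def cylinder_measure(word):
    """m[a_0..a_{n-1}] for the stationary Markov measure."""
    m = PI[word[0]]
    for s, t in zip(word, word[1:]):
        m *= P[s][t]
    return m

def gibbs_ratio_range(n):
    """Min and max of m[a] / exp(phi_n(x)) over words a of length n and x in [a],
    with phi(x) = ln P[x0][x1] and the constant in (1.4) equal to 0."""
    ratios = []
    for word in itertools.product((0, 1), repeat=n):
        for x_n in (0, 1):                     # phi_n(x) sees x only through x_n
            seq = word + (x_n,)
            phi_n = sum(math.log(P[seq[k]][seq[k + 1]]) for k in range(n))
            ratios.append(cylinder_measure(word) / math.exp(phi_n))
    return min(ratios), max(ratios)

for n in range(2, 9):
    print(n, gibbs_ratio_range(n))
```

The printed bounds do not change with $n$, so any $M$ larger than the maximum ratio and the reciprocal of the minimum ratio works in (1.4) for this example.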

1.8. Plan of the survey. We have given several non-equivalent definitions for the canonical ensemble in the case of the lattice gas model $\{0,1\}^{\mathbb{N}}$: DLR states, thermodynamic limits, equilibrium measures, and Gibbs measures in the sense of Bowen. Do these measures exist? Are they unique? What are their ergodic properties? What are their stochastic properties? Do the different approaches to the canonical ensemble lead to the same measures?

We will discuss these questions in the case when $\{0,1\}^{\mathbb{N}}$ is replaced by a general topological Markov shift. The interest in such questions is that topological Markov shifts appear as symbolic models for smooth dynamical systems, and measures with smooth Jacobians (which as we saw above can be viewed as DLR states for smooth potentials) appear naturally in dynamical contexts. The interest in countable state topological Markov shifts is that they appear in the non-uniformly hyperbolic setup, see §8.1. At the end of the survey we will mention applications to surface diffeomorphisms.

1.9. Notes and references. The different approaches to the canonical ensemble originate in the mathematical theory of the foundations of equilibrium statistical physics. Equilibrium measures were introduced by Ruelle [Rue67]. DLR measures were introduced by Dobrushin [Dob68a], [Dob68b], [Dob68c] and Lanford & Ruelle [LR69]. What we call “thermodynamic limits” are called in the literature “Gibbs states” or “Gibbs measures.” DLR measures are also called “Gibbsian random fields.” We chose different terminology, to avoid confusion.

Ruelle gave a complete analysis of the one-dimensional lattice gas model in [Rue68]. Around the same time, Sinai constructed Markov partitions for Anosov diffeomorphisms [Sin68a], [Sin68b], and showed that the resulting symbolic models are simple generalizations of the lattice gas models of statistical physics. This discovery linked hyperbolic dynamics to mathematical statistical physics.

Sinai suggested using the thermodynamic approach to study Anosov diffeomorphisms [Sin72]. He showed that some of the natural invariant measures in hyperbolic dynamics can be viewed as certain types of thermodynamic limits. Bowen [Bow75] and Ruelle [Rue76] explained how equilibrium measures appear naturally in the theory of Axiom A diffeomorphisms. Bowen showed that in this case,


equilibrium measures are also what we call “Gibbs measures in the sense of Bowen” [Bow75].

Since then the thermodynamic formalism has been applied to a wide variety of dynamical problems. Here is a partial list of excellent references to thermodynamic formalism, with indications of some of the applications they discuss: [Bow75] (Anosov diffeomorphisms), [Rue78] (equilibrium statistical physics), [PP90] (Anosov flows), [Zin96] (complex dynamics), [Kel98] (interval maps), [MU03] and [PU10] (geometric measure theory).

2. Topological Markov shifts (TMS)

2.1. One sided topological Markov shifts. Suppose $G = G(V,E)$ is a directed graph with a finite or countable collection of vertices $V$ and edges $E\subset V\times V$. We always assume that every vertex $v$ has at least one in-coming edge $a\to v$ and at least one out-going edge $v\to b$. The notation $a\to b$ means that $(a,b)\in E$, and the notation $a\xrightarrow{n}b$ means that there are vertices $\xi_1,\dots,\xi_{n-1}$ s.t. $a\to\xi_1\to\cdots\to\xi_{n-1}\to b$. In this case we say that $a$ connects to $b$ in $n$ steps, and that $(a,\xi_1,\dots,\xi_{n-1},b)$ is admissible.

Definition 2.1. The (one-sided) topological Markov shift (TMS) associated to $G$ is the set $\Sigma^+(G) := \{(x_0,x_1,\dots)\in V^{\mathbb{N}_0} : x_i\to x_{i+1}\text{ for all } i\}$, together with the metric $d(x,y) := \exp[-\min\{n : x_n\ne y_n\}]$ and the action of the left shift map $\sigma(x_0,x_1,x_2,\dots) = (x_1,x_2,\dots)$. $V$ is called the alphabet, and elements of $V$ are called states. The matrix $(t_{ab})_{V\times V}$, $t_{ab} = 1$ when $a\to b$ and $t_{ab} = 0$ when $a\not\to b$, is called the transition matrix.

$(\Sigma^+(G),d)$ is a complete and separable metric space. Its topology is generated by the cylinders $[a_0,\dots,a_{n-1}] := \{x\in\Sigma^+(G) : x_i = a_i\ (i = 0,\dots,n-1)\}$. The cylinders are open and closed. $\Sigma^+(G)$ is compact iff $V$ is finite. When $V$ is infinite, $\Sigma^+(G)$ is not compact, and sometimes not even locally compact. It is easy to check that $\Sigma^+(G)$ is locally compact iff for every $a\in V$, $\#\{b\in V : a\to b\} < \infty$. Compact TMS are also called subshifts of finite type. Non-compact TMS are also called countable Markov shifts (“countable” relates to $V$, not to $\Sigma^+(G)$).

Definition 2.2. A topological Markov shift $\Sigma^+(G)$ is called topologically transitive, if for all $a,b\in V$, $\exists n$ s.t. $a\xrightarrow{n}b$, and topologically mixing if for all $a,b\in V$, $\exists N = N(a,b)$ s.t. $a\xrightarrow{n}b$ for all $n\ge N$.

It is routine to show that $\Sigma^+(G)$ is topologically transitive iff $\sigma$ is topologically transitive (i.e. for all non-empty open sets $U,V$ there exists $n > 0$ s.t. $U\cap\sigma^{-n}(V)\ne\emptyset$). Similarly, $\Sigma^+(G)$ is topologically mixing iff $\sigma$ is topologically mixing (i.e. for all non-empty open sets $U,V$ there exists $N = N(U,V) > 0$ s.t. $U\cap\sigma^{-n}(V)\ne\emptyset$ for all $n > N$).

Remark 2.3. Let $A$ denote the transition matrix of $G$. If $|V| < \infty$, then topological mixing is equivalent to the existence of $N$ s.t. all the entries of $A^N$ are positive. But when $V$ is infinite, topological mixing is strictly weaker than this.

The following two theorems allow in many cases to reduce the discussion of general TMS to the topologically transitive or even topologically mixing case:


Theorem 2.4. Suppose $\mu$ is a shift invariant ergodic measure on $\Sigma^+(G)$. Let $G_\mu = G(V_\mu,E_\mu)$ be the subgraph of $G$ with vertices $V_\mu := \{a\in V : \mu[a]\ne 0\}$ and edges $E_\mu := \{(a,b)\in E : \mu[a,b]\ne 0\}$, then $\mu$ is supported inside $\Sigma^+(G_\mu)$, and $\Sigma^+(G_\mu)$ is topologically transitive.

Theorem 2.5 (Spectral decomposition). Let $\Sigma^+(G)$ denote a topologically transitive TMS. There exists a natural number $p$, called the period, such that $\Sigma^+(G) = X_0\uplus\cdots\uplus X_{p-1}$, $\sigma(X_i) = X_{i+1\,(\mathrm{mod}\ p)}$, and s.t. $\sigma^p : X_i\to X_i$ is topologically conjugate to a topologically mixing TMS for all $i$.

The period is given by $p := \gcd\{n : a\xrightarrow{n}a\}$ for some (all) states $a$.

2.2. Regularity of functions. The modulus of continuity of $\varphi : \Sigma^+(G)\to\mathbb{R}$ is captured by the decay of the variations of $\varphi$:
\[ \mathrm{var}_n\varphi := \sup\{\varphi(x) - \varphi(y) : x_i = y_i\ (i = 0,\dots,n-1)\}. \]
A function $\varphi : \Sigma^+(G)\to\mathbb{R}$ is called
(1) Markovian, if $\mathrm{var}_2\varphi = 0$. In this case $\varphi(x) = \varphi(x_0,x_1)$.
(2) Weakly Hölder continuous, if $\exists A > 0$, $\theta\in(0,1)$ s.t. $\forall n\ge 2$, $\mathrm{var}_n\varphi\le A\theta^n$. If in addition $\mathrm{var}_1\varphi < \infty$ then we say that $\varphi$ is locally Hölder. If in addition $\varphi$ is bounded, then we say that $\varphi$ is Hölder. The number $\theta$ is called a Hölder exponent of $\varphi$.
(3) A function with summable variations, if $\sum_{n\ge 2}\mathrm{var}_n\varphi < \infty$.

Every Markovian function is weakly Hölder continuous, and every weakly Hölder continuous function has summable variations. None of these conditions implies that $\varphi$ is bounded, and it could even happen that $\mathrm{var}_1\varphi = \infty$.⁴

The essence of the summable variations condition is the following estimate:

Lemma 2.6. Let $\varphi_n := \varphi + \varphi\circ\sigma + \cdots + \varphi\circ\sigma^{n-1}$. For every $n\ge 1$ and every admissible word $a = (a_0,\dots,a_{n-1})$, if $x,y\in\sigma[a_{n-1}]$ and $x_0^{m-1} = y_0^{m-1}$, then
\[ |\varphi_n(ax) - \varphi_n(ay)| \le \sum_{k\ge m+1}\mathrm{var}_k\varphi. \]
In particular, if $\varphi$ has summable variations, then $\sup_{n\ge 1}[\mathrm{var}_{n+k}\varphi_n]\xrightarrow[k\to\infty]{}0$.
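Before moving on, the combinatorial notions of §2.1 (transitivity, mixing, and the period of Theorem 2.5) can be made concrete for a finite graph given by its transition matrix. The sketch below is only for the finite case; it assumes that loop lengths up to $3|V|$ suffice to recover the period of a transitive finite graph, and it uses the fact that a finite transitive graph is mixing exactly when its period is 1. The function names and the two example graphs are illustrative.

```python
from math import gcd

def step(A, frontier):
    """Vertices reachable in one more step from the set `frontier`."""
    return {b for a in frontier for b in range(len(A)) if A[a][b]}

def is_transitive(A):
    """For all a, b there is n >= 1 with a ->^n b (finite graph)."""
    size = len(A)
    for a in range(size):
        seen, frontier = set(), {a}
        for _ in range(size):
            frontier = step(A, frontier)
            seen |= frontier
        if len(seen) < size:
            return False
    return True

def period(A, a=0):
    """gcd of the lengths n <= 3*|V| of loops a ->^n a; assumed to be enough
    to recover gcd{n : a ->^n a} when the finite graph is transitive."""
    p, frontier = 0, {a}
    for length in range(1, 3 * len(A) + 1):
        frontier = step(A, frontier)
        if a in frontier:
            p = gcd(p, length)
    return p

def is_mixing(A):
    """Finite case only: transitive and period 1."""
    return is_transitive(A) and period(A) == 1

golden_mean = [[1, 1], [1, 0]]                    # the word 11 is forbidden
three_cycle = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]   # 0 -> 1 -> 2 -> 0
print(is_transitive(golden_mean), period(golden_mean), is_mixing(golden_mean))
print(is_transitive(three_cycle), period(three_cycle), is_mixing(three_cycle))
```

For the three-cycle the shift permutes three pieces $X_0, X_1, X_2$ cyclically, which is the decomposition described in Theorem 2.5. None of this finite-graph reasoning carries over automatically to infinite alphabets, as Remark 2.3 warns.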

2.3. Two-sided topological Markov shifts. Suppose $G = G(V,E)$ is a finite or countable directed graph, such that every vertex $a$ has at least one outgoing edge and at least one in-coming edge.

Definition 2.7. The two-sided topological Markov shift associated to $G$ is the set $\Sigma(G) := \{x\in V^{\mathbb{Z}} : x_i\to x_{i+1}\text{ for all } i\in\mathbb{Z}\}$, together with the metric $d(x,y) := \exp[-\min\{|n| : x_n\ne y_n\}]$ and the action of the left-shift map $\sigma(x)_i = x_{i+1}$.

The conditions for the compactness, topological transitivity, and topological mixing of $\Sigma(G)$ are the same as for $\Sigma^+(G)$. Local compactness is different: a two-sided TMS is locally compact iff for every $a$, $\#\{(u,v) : u\to a\to v\} < \infty$. Cylinders are also slightly more complicated because of the need to keep track of the left-most coordinate of the constraint. We will use the notation
\[ {}_m[a_0,\dots,a_{n-1}] := \{x\in\Sigma(G) : x_{m+i} = a_i\ (i = 0,\dots,n-1)\}. \]

⁴ It is important to include unbounded functions in the discussion, because for non-compact TMS with infinite topological entropy, only unbounded functions can have finite pressure.


Every shift invariant Borel probability measure $\mu$ on $\Sigma(G)$ defines a shift invariant Borel probability measure $\mu^+$ on $\Sigma^+(G)$ by $\mu^+([a]) := \mu({}_0[a])$ and the Carathéodory extension procedure. Conversely, every shift invariant Borel probability measure $\mu^+$ on $\Sigma^+(G)$ determines a unique shift invariant measure $\mu$ on $\Sigma(G)$ through the equations $\mu({}_m[a]) := \mu^+[a]$. This is an instance of the “natural extension” procedure.

Every $\varphi^+ : \Sigma^+(G)\to\mathbb{R}$ defines a function $\varphi$ on $\Sigma(G)$ by $\varphi(x) := \varphi^+(x_0,x_1,\dots)$. The converse is not true, but the following is enough for most applications.

Theorem 2.8. Suppose $\varphi : \Sigma(G)\to\mathbb{R}$ has summable variations, then there exists $\varphi^+ : \Sigma^+(G)\to\mathbb{R}$ with summable variations and a bounded continuous function $h : \Sigma(G)\to\mathbb{R}$ s.t. for every $x\in\Sigma(G)$, $\varphi(x) + h(x) - h(\sigma x) = \varphi^+(x_0,x_1,\dots)$.

Most of the thermodynamic formalism is invariant under addition of coboundaries: what works for $\varphi$, works for $\varphi + h - h\circ\sigma$. Therefore this theorem allows one to reduce problems on two-sided shifts to problems for one-sided shifts. Henceforth, unless stated otherwise, all topological Markov shifts are one-sided.

2.4. Notes and references. For the material in §2.1, see chapter 4 in [Aar97]. Theorem 2.8 was first proved for compact Markov shifts and Hölder continuous potentials by Sinai [Sin72], see also [Bow75]. For functions with summable variations, see [CQ98] (compact TMS) and [Dao13] (non-compact TMS).

3. Conformal measures and their ergodic properties

3.1. Conformal measures and DLR states. Let $X := \Sigma^+(G)$ denote a topological Markov shift with set of states $V$. Given a non-singular σ–finite Borel measure $\nu$ on $X$, let $\nu\circ\sigma$ denote the measure $(\nu\circ\sigma)(E) = \sum_{a\in V}\nu[\sigma(E\cap[a])]$. It is easy to verify that for all non-negative Borel functions $f$, $\int f\,d\nu\circ\sigma = \sum_{a\in V}\int_{\sigma[a]}f(ax)\,d\nu(x)$. Also, $\nu\ll\nu\circ\sigma$.

Definition 3.1. The Jacobian of $\nu$ is $g_\nu := \frac{d\nu}{d\nu\circ\sigma}$.

Definition 3.2. A non-singular Borel measure $\nu$, which is finite and positive on cylinders, is called conformal for a function $\varphi$, if there is a constant $\lambda$ s.t. $g_\nu = \lambda^{-1}\exp\varphi$, $\nu\circ\sigma$–a.e.

There is a useful characterization of conformal measures in terms of Ruelle’s operator $L_\varphi f = \sum_{\sigma(y)=x}e^{\varphi(y)}f(y)$:

Theorem 3.3. $\nu$ is conformal for $\varphi$ iff $\nu$ is an eigenmeasure of $L_\varphi$: $\exists\lambda > 0$ s.t. $L_\varphi^*\nu = \lambda\nu$, i.e. for all non-negative measurable functions $f$, $\int L_\varphi f\,d\nu = \lambda\int f\,d\nu$.

This follows immediately from the identity $\int f\,d\nu\circ\sigma = \sum_{a\in V}\int_{\sigma[a]}f(ax)\,d\nu(x)$.

Every conformal measure $\nu$ s.t. $\nu(X) = 1$ satisfies the DLR equations
\[ \nu(x_0,\dots,x_{n-1}|x_n,x_{n+1},x_{n+2},\dots) = \frac{\exp\varphi_n(x)}{\sum_{y\in\sigma^{-n}\{\sigma^nx\}}\exp\varphi_n(y)} \quad\nu\text{–a.e.} \]
where $\varphi_n = \varphi + \varphi\circ\sigma + \cdots + \varphi\circ\sigma^{n-1}$. The proof is the same as in the case $X = \{0,1\}^{\mathbb{N}}$, which was discussed in the first section.
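For a finite alphabet and a Markovian potential $\varphi(x) = \varphi(x_0,x_1)$, the eigenmeasure equation of Theorem 3.3 reduces to linear algebra. Applying $\int L_\varphi f\,d\nu = \lambda\int f\,d\nu$ to indicators of cylinders gives $\lambda\,\nu[a] = \sum_b M_{ab}\,\nu[b]$ with $M_{ab} := t_{ab}e^{\varphi(a,b)}$, and $\lambda\,\nu[a_0,\dots,a_{n-1}] = e^{\varphi(a_0,a_1)}\nu[a_1,\dots,a_{n-1}]$ for admissible words with $n\ge 2$. The sketch below solves these relations by power iteration; it assumes a primitive matrix (finite, topologically mixing case), and the potential values are hypothetical.

```python
import math

def perron(M, iterations=500):
    """Power iteration for the leading eigenpair M v = lam v, with sum(v) = 1.
    Assumes M is a primitive non-negative matrix."""
    n = len(M)
    v = [1.0 / n] * n
    lam = 0.0
    for _ in range(iterations):
        w = [sum(M[a][b] * v[b] for b in range(n)) for a in range(n)]
        lam = sum(w)                      # valid because sum(v) == 1
        v = [entry / lam for entry in w]
    return lam, v

T = [[1, 1], [1, 0]]                      # golden mean shift: the word 11 is forbidden
PHI = [[0.2, -0.5], [0.1, 0.0]]           # hypothetical Markovian potential phi(a, b)
M = [[T[a][b] * math.exp(PHI[a][b]) for b in range(2)] for a in range(2)]
lam, v = perron(M)

def nu(word):
    """nu[a_0..a_{n-1}] = lam^{-(n-1)} * prod_i M[a_i][a_{i+1}] * v[a_{n-1}]."""
    weight = v[word[-1]]
    for s, t in zip(word, word[1:]):
        weight *= M[s][t] / lam
    return weight

print(lam, v)
print(nu((0, 1, 0)), nu((0, 1, 0, 0)) + nu((0, 1, 0, 1)))  # additivity on cylinders
```

Inadmissible words automatically receive measure zero because the corresponding entry of $M$ vanishes.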


It follows that any non-singular Borel probability measure is a DLR state for some “potential” φ (equal to − log gν ). What is special about the measures studied in the thermodynamic formalism, is that their “potential” is not just any Borel function, but a function with good regularity properties. 3.2. Ergodic properties when the log Jacobian is regular. Recall the following general properties of non-singular measurable maps T on a σ–finite standard measure space (Ω, F , ν): (1) Conservativity: All wandering sets have measure zero. A wandering set is a measurable set W s.t. {T −k W }k≥0 are pairwise disjoint. (2) Ergodicity: Every T –invariant set E satisfies ν(E) = 0 or ν(Ω \ E) = 0. An invariant set is a set E s.t. T −1 E = E. (3) Exactness: Every tail set E satisfies ν(E) = 0or ν(Ω \ E) = 0. A tail set is a set which belongs to the tail σ–algebra n≥0 T −n F . This implies mixing. For invertible probability preserving maps, we also have the following property: (4) Bernoulli property: Measure theoretic isomorphism to a map of the form σ : S Z → S Z with |S| ≤ ℵ0 and the measure m(m [am , . . . , an ]) = pam · · · pan , where p is a fixed probability vector. Theorem 3.4. Let X be a topologically transitive TMS and suppose ν is a conformal measure with Jacobian gν = λ−1 eφ , where φ has summable variations. If X is compact (equivalently the number of states is finite), then (1) ν is conservative and ergodic. (2) There is a positive continuous function h s.t. dm = hdν is an invariant probability measure. (3) If X is topologically mixing, then ν and m are exact. (4) If X is topologically mixing, the natural extension of (X, m, σ) is Bernoulli. When the number of states is infinite and X is not compact, the situation is more complicated. The simplest example of the new phenomena possible in this case is a transient countable Markov chain: The associated Markov measure is conformal with a Markovian potential, but the measure is not conservative. 
Given a function $\phi : X \to \mathbb{R}$ and a state (vertex) $a$, let
$$Z_n(\phi, a) := \sum_{\sigma^n(x)=x} e^{\phi_n(x)} 1_{[a]}(x), \quad\text{where } \phi_n = \phi + \phi\circ\sigma + \cdots + \phi\circ\sigma^{n-1}.$$

Theorem 3.5. Let $X$ denote a topologically transitive TMS and suppose $\nu$ is a conformal measure with Jacobian $g_\nu = \lambda^{-1}e^{\phi}$, where $\phi$ has summable variations.
(1) If $\sum_{n=0}^{\infty} \lambda^{-n}Z_n(\phi, a) = \infty$ for some $a$, then $\sum_{n=0}^{\infty} \lambda^{-n}Z_n(\phi, a) = \infty$ for all $a$.
(2) $\nu$ is conservative iff $\sum_{n\ge 0} \lambda^{-n}Z_n(\phi, a) = \infty$ for some $a$. In this case $\nu$ is ergodic, and in the topologically mixing case exact.
(3) If $\sum_{n\ge 0} \lambda^{-n}Z_n(\phi, a) = \infty$ for some $a$, then there is a positive continuous function $h$ s.t. $dm = h\,d\nu$ is shift invariant. It is possible that $\int h\,d\nu = \infty$.
(4) If $\sum_{n\ge 0} \lambda^{-n}Z_n(\phi, a) = \infty$, and the measure $m$ in (3) is a probability measure, and $X$ is topologically mixing, then the natural extension of $(X, \mathcal{B}, m, \sigma)$ is Bernoulli.
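The sums $Z_n(\phi, a)$ are easy to compute by hand in the simplest setting. The following minimal sketch assumes a finite graph and a Markovian potential $\phi(x) = f(x_0, x_1)$; the graph, the weights `F`, and all function names are illustrative choices, not taken from the text. Under these assumptions $Z_n(\phi, a)$ should coincide with the $(a, a)$ entry of $M^n$, where $M_{ij} = e^{f(i,j)}$ on allowed edges, and the code compares that with a brute-force sum over periodic points.

```python
import itertools
import math

# Hypothetical example data: the "golden mean" graph on states {0, 1}
# (the transition 1 -> 1 is forbidden) with an arbitrary edge weight f.
STATES = (0, 1)
F = {(0, 0): -0.3, (0, 1): 0.2, (1, 0): -0.1}  # f(x0, x1) on allowed edges


def z_bruteforce(f, states, a, n):
    """Sum exp(phi_n(x)) over points x = sigma^n(x) with x_0 = a."""
    total = 0.0
    for tail in itertools.product(states, repeat=n - 1):
        cycle = (a,) + tail + (a,)           # x_0, ..., x_{n-1}, x_n = x_0
        steps = list(zip(cycle, cycle[1:]))  # the n transitions of the orbit
        if all(step in f for step in steps):
            total += math.exp(sum(f[step] for step in steps))
    return total


def mat_mul(A, B):
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def z_matrix(f, states, a, n):
    """The (a, a) entry of M^n, where M[i][j] = exp(f(i, j)) on edges."""
    M = [[math.exp(f[(i, j)]) if (i, j) in f else 0.0 for j in states]
         for i in states]
    power = M
    for _ in range(n - 1):
        power = mat_mul(power, M)
    return power[states.index(a)][states.index(a)]
```

For instance, `z_bruteforce(F, STATES, 0, 5)` and `z_matrix(F, STATES, 0, 5)` should agree up to rounding. In the finite-state case the series $\sum_n \lambda^{-n}Z_n(\phi, a)$ always diverges when $\lambda$ is the leading eigenvalue of $M$; the theorem above concerns the infinite-state case, which this sketch does not cover.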


O. SARIG

3.3. Existence of conformal measures.

Theorem 3.6. Suppose $X$ is a topologically transitive TMS, and $\phi : X \to \mathbb{R}$ is continuous. If $X$ is compact (equivalently, the number of states is finite), then $\phi$ has a finite conformal measure.

As noted by Ruelle, the theorem can be proved by considering the action of the continuous map $T(\nu) := (L_\phi^*\nu)/\int L_\phi 1\,d\nu$ on the convex weak-star compact set $P(X) := \{$shift invariant probability measures on $X\}$. The Schauder-Tychonoff theorem provides a fixed point $T(\nu) = \nu$, this fixed point is an eigenmeasure of $L_\phi$, and eigenmeasures of $L_\phi$ are conformal measures by Theorem 3.3. This argument fails when $X$ is not compact, because in this case $P(X)$ may lose its compactness. The following theorem summarizes our knowledge on the non-compact case:

Theorem 3.7. Suppose $X$ is a topologically transitive TMS, and $\phi : X \to \mathbb{R}$ has summable variations. Then $\phi$ has a conservative (possibly infinite) conformal measure $\nu$ s.t. $g_\nu = \lambda^{-1}e^{\phi}$ iff
(1) $\limsup_{n\to\infty} \frac{1}{n}\ln Z_n(\phi, a) = \ln\lambda$ for some (whence every) $a$, and
(2) $\sum_{n\ge 0} \lambda^{-n}Z_n(\phi, a) = \infty$.

Infinite conformal measures do appear. To see how, suppose $\nu$ is a finite conformal measure with Jacobian $\lambda^{-1}\exp\phi$. For every positive $h : X \to \mathbb{R}$, $d\nu' := h\,d\nu$ has Jacobian $\lambda^{-1}\exp\phi'$ where $\phi' := \phi + \ln h - (\ln h)\circ\sigma$. It is not difficult to construct a continuous $h$ s.t. $\log h - \log h\circ\sigma$ has summable variations, but $\int h\,d\nu = \infty$. For such $h$, $\phi'$ has an infinite conformal measure $\nu' = h\,d\nu$.

3.4. Uniqueness of conformal measures. This depends on the regularity of $\phi$, as can be seen from the following theorems.

Theorem 3.8. Suppose $X$ is a topologically transitive topological Markov shift and $\phi : X \to \mathbb{R}$ has summable variations. Then $\phi$ can have at most one conservative conformal measure.

In the special case $X = \{0, 1\}^{\mathbb{N}}$, more is known. A g–function is a continuous $g : X \to (0, 1)$ s.t. $\sum_{\sigma y = x} g(y) = 1$. It can be shown that a conformal measure with continuous Jacobian is shift invariant iff its Jacobian is a g–function. Conformal measures for a g–function are also called g–measures.

Theorem 3.9. Suppose $X = \{0, 1\}^{\mathbb{N}}$ and $g : X \to \mathbb{R}$ is a g–function.
(1) If $\sum (\mathrm{var}_n \ln g)^2 < \infty$, then $\phi = \ln g$ has a unique conformal measure.
(2) For every $\epsilon > 0$ there is a g–function $g$ satisfying $\sum (\mathrm{var}_n \ln g)^{2+\epsilon} < \infty$, s.t. $\phi = \ln g$ has more than one conformal measure.

3.5. Notes and references. Theorem 3.4 (1)–(3) is from [ADU93], and part (4) is from [Sar11], see also [Dao13],[JÖP12]. In the compact case, these results are older: ergodicity, existence of acip (and whence conservativity) can be traced to [Rén57], and the Bernoulli property of the natural extension of the acip (whence exactness) is due to Bowen [Bow75a], see also [Gal73]. The existence of conformal measures for compact topological Markov shifts is due to Ruelle [Rue68],[Rue76]. Theorem 3.7 on the non-compact case follows from results in [Sar01b] (the proof there is stated for locally Hölder potentials, but works verbatim under summable variations).


The uniqueness of conformal measures with potentials with summable variations is due to Bowen [Bow75] and Ruelle [Rue76] in the finite alphabet case. In the infinite alphabet case, it can be deduced from the fact that if such measures are conservative, then they are ergodic [ADU93]. The concept of g–measures is due to Keane [Kea72], and its importance to thermodynamic formalism is explained in [Led74] and [Wal75a]. Examples of continuous g–functions with several g–measures are given in [BK93]. Part (1) of Theorem 3.9 is due to Johansson & Öberg [JÖ03], see [JÖP12] for generalizations. Part (2) is due to Berger, Hoffman, and Sidoravicius [BHS03].

4. Ruelle’s operator, thermodynamic limits, and modes of recurrence

4.1. Thermodynamic limits and Ruelle’s operator. When we discussed the lattice gas model $X = \{0, 1\}^{\mathbb{N}}$ with the interaction $U$, we defined thermodynamic limits as the weak-star limit points of $\frac{1}{Z_n(x)}\sum_{\sigma^n(y)=x} e^{\phi_n(y)}\delta_y$, where $\phi = -\beta U$, $x$ is a boundary condition, and $Z_n(x) := \sum_{\sigma^n(y)=x} e^{\phi_n(y)}$. This definition extends without change to general compact TMS. But in the non-compact case, the number of states is infinite, and $Z_n(x)$ may diverge. It is tempting to restrict the discussion to $\phi = -\beta U$ for which $Z_n(x) < \infty$, but this is too strong for some purposes. It is better to use the following weaker restriction:

Definition 4.1. Suppose $X$ is a TMS. A continuous function $\phi : X \to \mathbb{R}$ is called admissible if $\sum_{\sigma^n(y)=x} e^{\phi_n(y)}1_{[a]}(y) < \infty$ for all $n$, $a$, and $x$. As always $\phi_n = \phi + \phi\circ\sigma + \cdots + \phi\circ\sigma^{n-1}$.

Definition 4.2. Suppose $X$ is a TMS and $\phi := -\beta U$ is admissible. A thermodynamic limit with potential $U$, inverse temperature $\beta$ and boundary condition $x \in X$ is any $\sigma$–finite $\nu$ which is finite on cylinders, such that for some $n_k \uparrow \infty$, for every finite union of partition sets $F$ and for every cylinder $[a]$,
$$\frac{1}{Z_{n_k}(F, x)}\sum_{\sigma^{n_k}(y)=x} e^{\phi_{n_k}(y)}1_{F\cap[a]}(y) \xrightarrow[k\to\infty]{} \frac{\nu(F\cap[a])}{\nu(F)}, \quad\text{where } Z_n(F, x) = \sum_{\sigma^n(y)=x} e^{\phi_n(y)}1_F(y).$$

We rewrite this in terms of Ruelle’s operator $(L_\phi f)(x) = \sum_{\sigma(y)=x} e^{\phi(y)}f(y)$. We are being intentionally vague as to the space on which this “operator” acts. As we shall see in later sections, it is useful to consider the action of $L_\phi$ on different spaces, depending on the case at hand.

A formal calculation shows that $(L_\phi^n f)(x) = \sum_{\sigma^n(y)=x} e^{\phi_n(y)}f(y)$. Therefore, $\nu$ is a thermodynamic limit with boundary condition $x$ iff there is a subsequence $n_k \to \infty$ s.t. for any cylinder $[a]$ and every finite union of partition sets $F$, $(L_\phi^{n_k}1_{[a]})(x)/(L_\phi^{n_k}1_F)(x) \to \nu[a]/\nu(F)$ as $k \to \infty$. The analysis of thermodynamic limits reduces to the study of the asymptotic behavior of $L_\phi^n f$ as $n \to \infty$ for “sufficiently many” functions $f$.


4.2. Gurevich pressure. The first step in the analysis of $L_\phi^n f$ as $n \to \infty$ is to understand what happens to $\frac{1}{n}\ln L_\phi^n f$ as $n \to \infty$. Suppose $X$ is a topologically mixing TMS and $\phi : X \to \mathbb{R}$ has summable variations. Given a state $a$, let $Z_n(\phi, a) := \sum_{\sigma^n(x)=x} e^{\phi_n(x)}1_{[a]}(x)$.

Theorem 4.3. The limit $P_G(\phi) := \lim_{n\to\infty}\frac{1}{n}\ln Z_n(\phi, a)$ exists for all states $a$, and is independent of the choice of $a$. If $\|L_\phi 1\|_\infty < \infty$, then $P_G(\phi) < \infty$.

Theorem 4.4. If $P_G(\phi) < \infty$ then $\phi$ is admissible, and for every $f$ continuous, non-negative, not identically equal to zero, and supported inside a finite union of partition sets, $\lim_{n\to\infty}\frac{1}{n}\ln(L_\phi^n f)(x) = P_G(\phi)$ for all $x$.
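As a small illustration of the limit defining $P_G(\phi)$, consider the case $\phi \equiv 0$ on a finite graph, where $Z_n(0, a)$ simply counts the closed paths of length $n$ based at $a$. The sketch below uses the golden mean graph as a hypothetical example (it is not discussed in the text) and compares $\frac{1}{n}\ln Z_n(0, a)$ with the logarithm of the leading eigenvalue of the adjacency matrix, for both states.

```python
import math

A = [[1, 1],
     [1, 0]]  # adjacency matrix of the golden mean graph (hypothetical example)


def mat_mul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def closed_paths(adjacency, a, n):
    """Z_n(0, a): the number of points x = sigma^n(x) with x_0 = a."""
    power = adjacency
    for _ in range(n - 1):
        power = mat_mul(power, adjacency)
    return power[a][a]  # exact integer arithmetic


def pressure_estimate(adjacency, a, n):
    """The finite-n approximation (1/n) ln Z_n(0, a) of the Gurevich pressure."""
    return math.log(closed_paths(adjacency, a, n)) / n


GOLDEN_RATIO = (1 + math.sqrt(5)) / 2  # leading eigenvalue of A
```

Calling `pressure_estimate(A, 0, 400)` and `pressure_estimate(A, 1, 400)` gives two values close to `math.log(GOLDEN_RATIO)`, consistent with the independence of the limit from the state $a$. The convergence is slow (the error is of order $1/n$), so this is only a sanity check, not a practical way to compute pressure.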

Definition 4.5. $P_G(\phi)$ is called the Gurevich pressure of $\phi$.

$P_G(\phi)$ plays a central role in the thermodynamic formalism. In section 5 we will see that $P_G(\phi) = \sup\{h_\mu(\sigma) + \int\phi\,d\mu\}$ (sup over all shift invariant measures), and in sections 6 and 7 we will see that the behavior of the functions $t \mapsto P_G(\phi + t\psi)$ characterizes the equilibrium measure of $\phi$, and some of its statistical properties. The following theorems provide additional information on $P_G(\phi)$:

Theorem 4.6. Let $X$ be a topologically mixing TMS, and $\phi, \psi$ be functions with summable variations. Then
(1) $P_G(\phi + c) = P_G(\phi) + c$ for every constant $c$
(2) $\phi \le \psi \Rightarrow P_G(\phi) \le P_G(\psi)$
(3) $P_G(t\phi + (1-t)\psi) \le tP_G(\phi) + (1-t)P_G(\psi)$ for all $0 < t < 1$
(4) If $\phi - \psi = f - f\circ\sigma$, then $P_G(\phi) = P_G(\psi)$

Theorem 4.7. Let $X = \Sigma^+(G)$ be a topologically mixing TMS associated to a directed graph $G$. If $\phi$ has summable variations, then $P_G(\phi) = \sup\{P_G(\phi|_{\Sigma^+(G')})\}$, where the supremum ranges over all finite subgraphs $G' \subset G$.

4.3. Generalized Ruelle’s Perron-Frobenius Theorem. Suppose $X$ is a topologically mixing TMS associated to a graph $G = G(V, E)$. Before we tackle the asymptotic behavior of $L_\phi^n$ for general admissible functions with summable variations, we consider the special case when $L_\phi 1 = 1$ and $\phi$ is Markovian. This means that $\phi(x) = f(x_0, x_1)$ for some function $f : E \to \mathbb{R}$.

Let $G'$ denote the graph obtained from $G$ by reversing the direction of its edges: $G' = G(V, E')$ where $E' = \{(b, a) : (a, b) \in E\}$. Since $L_\phi 1 = 1$,
$$P = (p_{ab})_{V\times V}, \qquad p_{ab} := \begin{cases} e^{f(b,a)} & (a, b) \in E' \\ 0 & \text{otherwise} \end{cases}$$
is a stochastic matrix, compatible with $G'$. Reversing the edges guarantees the following identity: If $[b] = [b_0, \dots, b_{m-1}]$ and $x_0 = a$, then
$$(L_\phi^{n+m}1_{[b]})(x) = p^{(n)}_{ab_{m-1}}\cdot p_{b_{m-1}b_{m-2}}\cdots p_{b_1 b_0},$$
where $p^{(n)}_{ab_{m-1}}$ is the $(a, b_{m-1})$–entry of the matrix $P^n$:
$$p^{(n)}_{ab_{m-1}} = \sum_{\xi_1,\dots,\xi_{n-1}\in V} p_{a\xi_1}p_{\xi_1\xi_2}\cdots p_{\xi_{n-2}\xi_{n-1}}p_{\xi_{n-1}b_{m-1}}.$$
Equivalently, $p^{(n)}_{ab_{m-1}} = \Pr(X_n = b_{m-1} \mid X_0 = a)$, where $\{X_n\}_{n\ge 0}$ is the Markov chain on $G'$ with transition probabilities $\Pr(a \to b) = p_{ab}$.


Mixing countable state Markov chains fall into three classes: positive recurrent chains, null recurrent chains, and transient chains. To describe these cases, let
$${}_a p^{(n)}_{aa} := \sum_{\xi_1,\dots,\xi_{n-1}\in V\setminus\{a\}} p_{a\xi_1}p_{\xi_1\xi_2}\cdots p_{\xi_{n-2}\xi_{n-1}}p_{\xi_{n-1}a}$$
(the probability that if the chain starts at $a$, then it will return to $a$ for the first time at time $n$).

Positive recurrent behavior: $p^{(n)}_{ab} \to p_b$ as $n \to \infty$, where $\pi = (p_b)_{b\in V}$ is a stationary probability vector ($\pi P = \pi$). If we start a positive recurrent Markov chain at some state $a$, then it will return to $a$ infinitely many times with full probability, and the average time till the first return to $a$ is finite. The frequency of visits to $a$ is positive. A Markov chain is positive recurrent iff $\sum_{n=0}^{\infty} p^{(n)}_{aa} = \infty$ and $\sum_{n=1}^{\infty} n\cdot{}_a p^{(n)}_{aa} < \infty$.

Null recurrent behavior: $p^{(n)}_{ab} \to 0$ as $n \to \infty$ but $\sum_{n>0} p^{(n)}_{ab} = \infty$, and there is a sequence $a_n \uparrow \infty$, $a_n = o(n)$ s.t. for all states $b$, $\frac{1}{a_n}\sum_{k=0}^{n-1} p^{(k)}_{ab} \to p_b$ as $n \to \infty$, where $\pi = (p_b)_{b\in V}$ is a stationary positive vector s.t. $\sum_b p_b = \infty$. If we start a null recurrent Markov chain at a state $a$, then it will return to $a$ infinitely many times with full probability, but the average time till the first return to $a$ is infinite. In this case the asymptotic frequency of visits to $a$ is zero. Null recurrence happens iff $\sum_{n=0}^{\infty} p^{(n)}_{aa} = \infty$ and $\sum_{n=1}^{\infty} n\cdot{}_a p^{(n)}_{aa} = \infty$.

Transient behavior: $\sum_{n>0} p^{(n)}_{ab} < \infty$. A transient Markov chain started at a state $a$ has positive probability never to return to $a$. Transience is characterized by the condition $\sum_{n=0}^{\infty} p^{(n)}_{aa} < \infty$.

All finite state Markov chains are positive recurrent. For examples of null recurrence and transience, consider the random walk on $\mathbb{Z}^d$. This is the Markov chain with set of states $\mathbb{Z}^d$, allowed transitions $v \to v \pm e_i$, $e_i = (\delta_{ik})_{k=1}^{d}$, and transition probabilities $1/2d$. The random walk on $\mathbb{Z}^d$ is null recurrent when $d = 1, 2$, and transient when $d \ge 3$.

The discussion above provides a full description of the asymptotic behavior of $L_\phi^k 1_{[b]}$ as $k \to \infty$ in the case when $\phi$ is Markovian, and $L_\phi 1 = 1$. The theory extends to general potentials with summable variations:

Definition 4.8. Suppose $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is a function with summable variations and finite Gurevich pressure $P_G(\phi)$. For every state $a$, let $\varphi_a(x) := 1_{[a]}(x)\inf\{n \ge 1 : x_n = a\}$ and set
$$Z_n(\phi, a) := \sum_{\sigma^n(x)=x} e^{\phi_n(x)}1_{[a]}(x) \quad\text{and}\quad Z_n^*(\phi, a) := \sum_{\sigma^n(x)=x} e^{\phi_n(x)}1_{[\varphi_a = n]}(x).$$
Let $\lambda := \exp P_G(\phi)$, and fix some state $a$. We say that
(1) $\phi$ is positive recurrent, if $\sum \lambda^{-n}Z_n(\phi, a) = \infty$ and $\sum n\lambda^{-n}Z_n^*(\phi, a) < \infty$,
(2) $\phi$ is null recurrent, if $\sum \lambda^{-n}Z_n(\phi, a) = \infty$ and $\sum n\lambda^{-n}Z_n^*(\phi, a) = \infty$,
(3) $\phi$ is transient, if $\sum \lambda^{-n}Z_n(\phi, a) < \infty$.
It can be shown that these definitions do not depend on the choice of $a$.

$Z_n(\phi, a)$ and $Z_n^*(\phi, a)$ generalize $p^{(n)}_{aa}$ and ${}_a p^{(n)}_{aa}$: Summing over $n$–periodic $x \in [a]$ is the same as summing over all paths $a \to x_1 \to \cdots \to x_{n-1} \to a$, and summing over $n$–periodic $x \in [\varphi_a = n]$ is the same as summing over all paths $a \to \xi_1 \to \cdots \to \xi_{n-1} \to a$ such that $\xi_i \ne a$ for all $i$.
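The two series used to classify recurrence can be explored numerically for the simple random walk on $\mathbb{Z}$ (the case $d = 1$ above). The sketch below is an illustration under stated assumptions: it uses the classical closed form $p^{(2m)}_{00} = \binom{2m}{m}/4^m$, and it recovers the first-return probabilities ${}_0 p^{(n)}_{00}$ from the standard renewal equation $p^{(n)}_{00} = \sum_{k=1}^{n} {}_0 p^{(k)}_{00}\,p^{(n-k)}_{00}$, which is not stated in the text. The function names are hypothetical.

```python
def return_probabilities(n_max):
    """u[n] = p_00^(n) for the simple random walk on Z (zero for odd n)."""
    u = [0.0] * (n_max + 1)
    u[0] = 1.0
    for n in range(2, n_max + 1, 2):
        u[n] = u[n - 2] * (n - 1) / n  # C(n, n/2) / 2^n, computed recursively
    return u


def first_return_probabilities(u):
    """Solve the renewal equation u[n] = sum_{k=1..n} f[k] * u[n-k] for f."""
    f = [0.0] * len(u)
    for n in range(1, len(u)):
        f[n] = u[n] - sum(f[k] * u[n - k] for k in range(1, n))
    return f


u = return_probabilities(1000)
f = first_return_probabilities(u)

total_returns = sum(u)                                    # partial sum of p_00^(n)
return_probability = sum(f)                               # partial sum of first returns
mean_return_time = sum(n * f[n] for n in range(len(f)))   # partial sum of n * first returns
```

With 1000 steps, `return_probability` is close to 1 while `total_returns` and `mean_return_time` keep growing as the cutoff increases, which is the pattern one expects for null recurrence. Partial sums can only suggest divergence; they do not prove it.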


With this dictionary in mind, Definition 4.8 is a translation of the characterization of the modes of recurrence of a Markov chain in terms of $p^{(n)}_{aa}$ and ${}_a p^{(n)}_{aa}$.

Theorem 4.9 (Generalized Ruelle’s Perron-Frobenius Theorem). Let $X$ be a topologically mixing topological Markov shift, and let $\phi : X \to \mathbb{R}$ be a function with summable variations and finite Gurevich pressure.
(1) $\phi$ is positive recurrent iff there are $\lambda > 0$, a positive continuous function $h$, and a conservative measure $\nu$ which is finite and positive on cylinders, s.t. $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$, and $\int h\,d\nu = 1$. In this case $\lambda = \exp P_G(\phi)$ and for every cylinder $[a]$, $\lambda^{-n}L_\phi^n 1_{[a]} \to h\nu[a]$ as $n \to \infty$, uniformly on compacts.
(2) $\phi$ is null recurrent iff there are $\lambda > 0$, a positive continuous function $h$, and a conservative measure $\nu$ which is finite and positive on cylinders, s.t. $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$, and $\int h\,d\nu = \infty$. In this case $\lambda = \exp P_G(\phi)$ and for every cylinder $[a]$, $\lambda^{-n}L_\phi^n 1_{[a]} \to 0$ as $n \to \infty$, uniformly on compacts. There is a sequence $a_n \uparrow \infty$, $a_n = o(n)$ s.t. for all cylinders $[a]$,
$$\frac{1}{a_n}\sum_{k=0}^{n-1}\lambda^{-k}L_\phi^k 1_{[a]} \xrightarrow[n\to\infty]{} h\nu[a] \quad\text{uniformly on compacts.}$$
(3) $\phi$ is transient iff there is no conservative measure $\nu$, which is finite and positive on cylinders, such that $L_\phi^*\nu = \lambda\nu$ for some $\lambda > 0$. In this case $\sum_n e^{-nP_G(\phi)}L_\phi^n 1_{[a]} < \infty$ pointwise for every cylinder $[a]$.
If $X$ has finitely many states, every $\phi$ with summable variations is positive recurrent.

Theorem 4.10. Under the assumptions of the previous theorem, if $\phi$ is positive recurrent, then there is a unique thermodynamic limit up to normalization, equal to $\nu$. If in addition $\nu(X) < \infty$, then $\nu/\nu(X)$ is a DLR state.

Proof. That $\nu$ is the thermodynamic limit follows from the expression of this property in terms of the Ruelle operator. Now suppose $\nu(X) = 1$. To see that $\nu$ is a DLR state for $\phi$, we show that the Jacobian of $\nu$ equals $\lambda^{-1}\exp\phi$. For every non-negative measurable $f : X \to \mathbb{R}$,
$$\int f\lambda^{-1}e^{\phi}\,d\nu\circ\sigma = \sum_{a\in V}\int_{\sigma[a]} f(ax)\lambda^{-1}e^{\phi(ax)}\,d\nu(x) = \lambda^{-1}\int\sum_{a\in V} 1_{\sigma[a]}(x)e^{\phi(ax)}f(ax)\,d\nu(x) = \lambda^{-1}\int (L_\phi f)\,d\nu = \int f\,d\nu,$$
because $L_\phi^*\nu = \lambda\nu$. Thus $\lambda^{-1}e^{\phi} = \frac{d\nu}{d\nu\circ\sigma}$ a.e., and $\nu$ is conformal for $\phi$. As we saw in section 3, conformal probability measures are DLR states. □

Definition 4.11. Suppose $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is a positive recurrent function with finite Gurevich pressure and summable variations. The measure $dm = h\,d\nu$, where $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$, and $\int h\,d\nu = 1$, is called the Ruelle-Perron-Frobenius (RPF) measure of $\phi$.

This is a shift invariant measure, because for every measurable non-negative function $f$, $L_\phi[(f\circ\sigma)h] = \lambda f h$, so $\int (f\circ\sigma)h\,d\nu = \lambda^{-1}\int L_\phi[(f\circ\sigma)h]\,d\nu = \int f h\,d\nu$. We will see in the next section that $h\,d\nu$ is the solution to the variational problem, whenever such a solution exists.


4.4. Notes and references. The Gurevich pressure first appeared in [Gur69], in the special case φ ≡ 0. Gurevich extended the theory to general Markovian potentials in [Gur78]. In these works, the formula in Theorem 4.7 was used as the definition, and what we presented as a definition was proved as a theorem. The extension of the theory to non-Markovian potentials was done in [Sar99]. There, Theorems 4.3 and 4.7 are proved for locally Hölder potentials, but the same proof works for potentials with summable variations.

The Generalized Ruelle’s Perron-Frobenius Theorem (Theorem 4.9) has a more complicated history. When φ is Markovian, $L_\phi$ preserves the space $W = \mathrm{span}\{1_{[a]} : a \in V\}$, and $L_\phi : W \to W$ is encoded by a non-negative $|V| \times |V|$ matrix. If $|V| < \infty$, then Theorem 4.9 reduces to the classical Perron–Frobenius Theorem. If $V$ is countable, then Theorem 4.9 follows from Vere-Jones’s generalization of the Perron-Frobenius theorem to countable positive matrices [VJ67],[VJ68], see also [Gur78],[GS98]. A central idea in Vere-Jones’s papers is to model the analysis on the theory of countable state Markov chains, see e.g. [Chu60].

When φ is not Markovian and $X$ is a compact TMS, Theorem 4.9 is due to Ruelle in [Rue68],[Rue76] under the assumption that φ is Hölder. Bowen extended the theorem to functions with summable variations, and gave it the name “Ruelle’s Perron-Frobenius Theorem” [Bow75]. Walters extended the theorem to even larger classes of regularity [Wal01], and Pollicott extended it to complex valued potentials [Pol86]. All these results are for compact TMS, where only positive recurrent behavior is possible.

The generalization of Ruelle’s Perron-Frobenius Theorem to non-compact topological Markov shifts was done in [Sar01b]. The proof is written there for weakly Hölder functions, but works verbatim under the weaker summable variations condition. For even weaker regularity, see [Dao13].

5. Pressure, equilibrium measures, and Gibbs measures in the sense of Bowen

5.1. The variational problem. Let $T$ be a continuous map on a complete metric separable space $(Y, d)$, and suppose $\phi : Y \to \mathbb{R}$ is Borel measurable.

Definition 5.1. The variational pressure of $\phi$ is $\sup\{h_\mu(T) + \int\phi\,d\mu\}$, where the supremum ranges over all $T$–invariant Borel probability measures for which $\int\phi\,d\mu$ makes sense and $h_\mu(T) + \int\phi\,d\mu \ne \infty - \infty$. The measures which attain the supremum are called equilibrium measures (for $\phi$).

The variational problem is to find, for a given $\phi$, the variational pressure of $\phi$, and to determine its equilibrium measures. We will focus on TMS.

Remark 5.2. The word “pressure” is a relic of the first papers on the subject by Ruelle, which treated lattice gas models. Actually, the role of the “pressure” in thermodynamic formalism is much closer to the role of (minus) the free energy in thermodynamics, see Theorems 6.5, 7.4–7.6 and the discussion preceding them.

5.2. The variational pressure and the Gurevich pressure. How does one calculate the variational pressure? Recall that we defined the Gurevich pressure of a function $\phi$ with summable variations to be
$$P_G(\phi) = \lim_{n\to\infty}\frac{1}{n}\ln Z_n(\phi, a), \quad\text{where } Z_n(\phi, a) := \sum_{\sigma^n(x)=x} e^{\phi_n(x)}1_{[a]}(x).$$

Theorem 5.3 (Variational Principle). If $X$ is a topologically mixing TMS and $\phi : X \to \mathbb{R}$ has summable variations, then
$$P_G(\phi) = \sup\Big\{h_\mu(\sigma) + \int\phi\,d\mu\Big\},$$
where the supremum ranges over all shift invariant Borel probability measures $\mu$ s.t. $\phi$ is $\mu$–integrable, and $(h_\mu(\sigma), \int\phi\,d\mu) \ne (\infty, -\infty)$.

In particular, if $P_G(\phi) < \infty$ and $\phi$ is positive recurrent, then $\exp\sup\{h_\mu(\sigma) + \int\phi\,d\mu\}$ is an eigenvalue of Ruelle’s operator.

5.3. Equilibrium measures. Our next task is to find the equilibrium measures of $\phi$: the measures which bring $\sup\{h_\mu(\sigma) + \int\phi\,d\mu\}$ to a maximum. Recall the definition of the Ruelle-Perron-Frobenius (RPF) measure: $dm = h\,d\nu$ where $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$, $\int h\,d\nu = 1$. This measure exists whenever $\phi$ is a positive recurrent function with summable variations and finite Gurevich pressure.

Theorem 5.4. Suppose $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is positive recurrent, with summable variations, and finite Gurevich pressure. If the RPF measure of $\phi$ has finite entropy, then it is an equilibrium measure for $\phi$.

This is always the case when $X$ has finitely many states. For an example of an RPF measure with infinite entropy, take $X = \mathbb{N}^{\mathbb{N}}$ and $\phi(x) = \ln p_{x_0}$, where $p = (p_k)_{k\in\mathbb{N}}$ is a probability vector with infinite entropy. In this case $P_G(\phi) = 0$, $h \equiv 1$, $\nu$ = Bernoulli measure with probability vector $p$, so the RPF measure is $\nu$, and $h_\nu(\sigma) = -\sum p_i\ln p_i = \infty$, $\int\phi\,d\nu = \sum p_i\ln p_i = -\infty$. We see that $h_\nu(\sigma) + \int\phi\,d\nu = \infty - \infty$, and is meaningless.

Theorem 5.5. Suppose $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ has summable variations, $\sup\phi < \infty$, and $P_G(\phi) < \infty$. Then
(1) $\phi$ has at most one equilibrium measure.
(2) This equilibrium measure, if it exists, is the RPF measure of $\phi$.
(3) In particular, if $\phi$ has an equilibrium measure, then $\phi$ is positive recurrent, and the RPF measure of $\phi$ has finite entropy.

Theorem 5.6. If $X$ is a topologically mixing TMS and $\phi : X \to \mathbb{R}$ has summable variations, $\sup\phi < \infty$ and $P_G(\phi) < \infty$, then every equilibrium measure $m$ of $\phi$ is exact, $h_m(\sigma) = -\int\log\frac{dm}{dm\circ\sigma}\,dm$, and the natural extension of $m$ is Bernoulli.

5.4. Gibbs measures in the sense of Bowen. Recall that a shift invariant probability measure $m$ is called a Gibbs measure (in the sense of Bowen) if there are constants $M > 1$ and $P \in \mathbb{R}$ s.t. for every cylinder
$$(5.1)\qquad M^{-1} \le \frac{m[a_0, \dots, a_{n-1}]}{\exp\big(\sum_{k=0}^{n-1}\phi(\sigma^k x) - nP\big)} \le M \quad\text{for all } x \in [a_0, \dots, a_{n-1}].$$

Theorem 5.7. Suppose $X$ is a topologically mixing TMS. If $X$ is compact (i.e. the number of states is finite), then every $\phi : X \to \mathbb{R}$ has a unique Gibbs measure in the sense of Bowen. This measure is also the equilibrium measure of $\phi$, and is equal to the RPF measure of $\phi$. The $P$ in (5.1) equals $P_G(\phi)$.
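The Gibbs inequality (5.1) can be checked directly in a small compact example. The sketch below assumes $\phi \equiv 0$ on the golden mean shift (a hypothetical example, not from the text), where the measure in question is the Parry measure: a Markov measure built from the leading eigenvalue and eigenvector of the adjacency matrix. All names are illustrative. With $\phi \equiv 0$ and $P = \ln\lambda$, the ratio in (5.1) reduces to $m[a_0, \dots, a_{n-1}]\cdot\lambda^n$.

```python
import itertools
import math

LAM = (1 + math.sqrt(5)) / 2        # exp(P_G(0)) for the golden mean shift
ALLOWED = {(0, 0), (0, 1), (1, 0)}  # hypothetical example graph (1 -> 1 forbidden)
R = (LAM, 1.0)                      # right eigenvector of the adjacency matrix
PI = (LAM ** 2 / (LAM ** 2 + 1), 1 / (LAM ** 2 + 1))         # stationary vector
TRANS = {(i, j): R[j] / (LAM * R[i]) for (i, j) in ALLOWED}  # Parry transitions


def cylinder_measure(word):
    """Parry measure of the cylinder [a_0, ..., a_{n-1}] (0 if not admissible)."""
    measure = PI[word[0]]
    for step in zip(word, word[1:]):
        measure *= TRANS.get(step, 0.0)
    return measure


def gibbs_ratios(n):
    """m[a] / exp(-n P) over admissible words of length n, with phi = 0."""
    ratios = []
    for word in itertools.product((0, 1), repeat=n):
        measure = cylinder_measure(word)
        if measure > 0:
            ratios.append(measure * LAM ** n)
    return ratios
```

For every word length tried, the values returned by `gibbs_ratios(n)` stay inside a fixed interval such as $(1/3, 3)$, so $M = 3$ works in (5.1) for this example. The bound does not depend on $n$ because the product of transition probabilities telescopes to a quantity depending only on the first and last symbol.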


When the number of states is infinite, the situation is more complicated: Gibbs measures can only exist on some TMS, but not on others. The combinatorial property which distinguishes the TMS which carry Gibbs measures is the following:

Definition 5.8. A topological Markov shift is said to have the big images and pre-images (BIP) property, if there is a finite collection of states $b_1, \dots, b_N$ s.t. for every state $a$ there are $i, j$ s.t. $a \to b_i$ and $b_j \to a$.

Every compact topological Markov shift has the BIP property (take $\{b_1, \dots, b_N\}$ to be the full collection of states), but many non-compact TMS do not.

Theorem 5.9. Let $X$ be a topologically mixing TMS. A function $\phi : X \to \mathbb{R}$ with summable variations possesses a Gibbs measure in the sense of Bowen iff the following three conditions hold:
(1) $P_G(\phi) < \infty$
(2) $\mathrm{var}_1\phi < \infty$
(3) $X$ has the big images and pre-images property.
In this case, $\phi$ is positive recurrent, the Gibbs measure $m$ equals the RPF measure of $\phi$, the $P$ in (5.1) equals $P_G(\phi)$, and the natural extension of $m$ is Bernoulli.

5.5. Notes and references. The variational principle for non-compact TMS (Theorem 5.3) was proved in [Sar99] under the additional assumption that $\sup\phi < \infty$. This assumption was removed in [IJT13]. Theorems 5.4, 5.5 on equilibrium measures are from [BS03]. The following important cases were done before:
(1) Compact TMS, $\phi \equiv 0$ (Parry [Par64]). This was the first calculation of the measure of maximal entropy for a dynamical system.
(2) Compact TMS, $\phi$ Hölder continuous (Ruelle [Rue67],[Rue76] and Bowen [Bow75b]). These were the first papers to define topological pressure and equilibrium measures for dynamical systems.
(3) Non-compact TMS, $\phi$ Markovian (Gurevich [Gur69],[Gur70],[Gur78],[Gur84]). These were the first papers to treat the non-compact case.
See also [MU01],[FFY02],[Yur03a],[Zar85] and references therein.

The formula for the entropy of an equilibrium measure in Theorem 5.6 can be found in [Led74] for compact Markov shifts and in [BS03] for non-compact TMS (see also the appendix to [CS09]). The rest of the theorem follows from the results for conformal measures in §3. See the end of that section for further references.

Bowen introduced his notion of a Gibbs measure in [Bow75], and showed that such measures always exist for compact TMS. The ergodic and stochastic properties of Gibbs measures for non-compact TMS were studied by Aaronson and Denker in [AD01], and Aaronson, Denker & Urbański in [ADU93]. The BIP condition and Theorem 5.9 are from [Sar03]. The direction “BIP⇒existence” in Theorem 5.9 follows from earlier work of Mauldin & Urbański [MU01]. For generalizations of Bowen’s definition, see [Yur00],[Yur03b] and references therein.

Various authors considered the variational problem for general dynamical systems. Walters defined the topological pressure $P_{top}(\cdot)$ for continuous maps on general compact metric spaces, and showed that $P_{top}(\phi) = \sup\{h_\mu(T) + \int\phi\,d\mu\}$ for general continuous functions $\phi$ [Wal75b]. Bowen gave an alternative formula for $P_{top}(\phi)$ in terms of the metric structure of $Y$ [Bow75]. For the special case $\phi \equiv 0$, see the earlier works [AKM65],[Goo69],[Goo71]. The general non-compact case is still not understood, except when $\phi$ has a “nice” continuous extension to some “nice” compactification of $X$, see [Wal78], [Zar85], [GS98]. In the absence of such conditions, various possible definitions of $P_{top}(\phi)$ have been suggested [Bow73],[PP84], which provide upper bounds for $\sup\{h_\mu(T) + \int\phi\,d\mu\}$, see also [Tho11]. For definitions of pressure in complex dynamics, see [PRLS04].

We turn to equilibrium measures. Upper-semi-continuous functions attain their maximum over compact sets. So, if a continuous map $T$ on a compact metric space $X$ has the property that the entropy map $\mu \mapsto h_\mu(T)$ is upper-semi-continuous with respect to the weak-star topology, then every continuous potential on $X$ has an equilibrium measure. Sufficient conditions for the upper-semi-continuity of the entropy map were given by Misiurewicz [Mis76] and Newhouse [New89], see §8.1.

6. Strong positive recurrence and spectral gap

6.1. Spectral Gap Property. Ruelle’s operator $(L_\phi f)(x) = \sum_{\sigma(y)=x} e^{\phi(y)}f(y)$ has played a central role in our discussion of the variational problem and the thermodynamic limit. In this section we discuss a technical property of $L_\phi$ which provides detailed information on $L_\phi^n$ as $n \to \infty$.

Let $X$ be a topologically mixing TMS, and let $\phi : X \to \mathbb{R}$ be a weakly Hölder continuous function with finite Gurevich pressure (recall that weak Hölder continuity means that for some $0 < \theta < 1$, $\mathrm{var}_n\phi \le A\theta^n$ for all $n \ge 2$). Let
$$\mathrm{dom}(L_\phi) := \Big\{f : X \to \mathbb{R} : \sum_{\sigma(y)=x} e^{\phi(y)}f(y) \text{ converges absolutely for all } x\Big\}.$$

Definition 6.1. We say that $\phi$ has the spectral gap property (SGP), if there is a Banach space $\mathcal{L}$ of continuous functions on $X$ s.t.:
(1) $\mathcal{L}$ is “rich”:
(a) $\mathcal{L} \subset \mathrm{dom}(L_\phi)$ and $\mathcal{L} \supset \{1_{[a]} : [a] \ne \emptyset\}$
(b) $f \in \mathcal{L} \Longrightarrow |f| \in \mathcal{L}$ and $\||f|\|_{\mathcal{L}} \le \|f\|_{\mathcal{L}}$
(c) $\mathcal{L}$–convergence implies uniform convergence on cylinders.
(2) $L_\phi : \mathcal{L} \to \mathcal{L}$ has spectral gap:
(a) $L_\phi(\mathcal{L}) \subset \mathcal{L}$ and $L_\phi : \mathcal{L} \to \mathcal{L}$ is bounded
(b) $L_\phi = \lambda P + N$ where $\lambda = \exp P_G(\phi)$, $PN = NP = 0$, $P^2 = P$, $\dim\mathrm{Im}P = 1$, and the spectral radius of $N$ is less than $\lambda$.
(3) If $g : X \to \mathbb{R}$ is weakly Hölder continuous and bounded, then $L_{\phi+zg} : \mathcal{L} \to \mathcal{L}$ is bounded, and $z \mapsto L_{\phi+zg}$ is holomorphic on some complex neighborhood $U$ of zero: For all $z_0 \in U$, $\lim_{h\to 0}\frac{1}{h}(L_{\phi+(z_0+h)g} - L_{\phi+z_0 g})$ exists in the operator norm.

Property (2)(b) is an algebraic way of saying that $\lambda = e^{P_G(\phi)}$ is a simple eigenvalue of $L_\phi : \mathcal{L} \to \mathcal{L}$, and that the remainder of the spectrum lies in $\{z : |z| \le \rho\}$ where $\rho$ (the spectral radius of $N$) is strictly smaller than $\lambda$. The relations $NP = PN = 0$ and $P^2 = P$ imply that $P$ is the eigenprojection of $\lambda$, and $\lambda^{-n}L_\phi^n = \lambda^{-n}(\lambda P + N)^n = P + \lambda^{-n}N^n$. Since $\rho < \lambda$, $\|\lambda^{-n}L_\phi^n - P\| = \lambda^{-n}\|N^n\| \to 0$ exponentially fast as $n \to \infty$. Thus (2)(b) implies that $\lambda^{-n}L_\phi^n$ converges exponentially fast in norm to $P$.
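The decomposition $L = \lambda P + N$ and the resulting exponential convergence can be seen in a finite-dimensional toy model, where the operator is replaced by a non-negative matrix. The sketch below uses a hypothetical symmetric $2\times 2$ matrix, for which the eigenprojection can be written down explicitly; it is an analogy under these assumptions, not the Banach space construction of the text. It records the distance between $\lambda^{-n}M^n$ and $P$ and lets one compare successive errors with the ratio $\rho/\lambda$.

```python
import math

M = [[1.0, 1.0],
     [1.0, 0.0]]                  # hypothetical 2x2 example
LAM = (1 + math.sqrt(5)) / 2      # leading eigenvalue of M
RHO = 1 / LAM                     # modulus of the other eigenvalue, -1/LAM

# Eigenprojection P = r r^T / <r, r>; M is symmetric, so the left and right
# eigenvectors coincide: r = (LAM, 1).
R = [LAM, 1.0]
NORM = sum(v * v for v in R)
P = [[R[i] * R[j] / NORM for j in range(2)] for i in range(2)]


def mat_mul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def gap_errors(n_max):
    """Entrywise sup-distance between LAM^{-n} M^n and P, for n = 1..n_max."""
    errors, power = [], [row[:] for row in M]
    for n in range(1, n_max + 1):
        errors.append(max(abs(power[i][j] / LAM ** n - P[i][j])
                          for i in range(2) for j in range(2)))
        power = mat_mul(power, M)
    return errors
```

In this example each entry of `gap_errors(15)` is smaller than the previous one by the factor `RHO / LAM`, which is the finite-dimensional counterpart of the estimate $\|\lambda^{-n}L_\phi^n - P\| = \lambda^{-n}\|N^n\|$.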


Property (3) is saying that $L_{\phi+zg}$ is an “analytic perturbation” of $L_\phi$. Perturbation theory for linear operators says that isolated simple eigenvalues (such as $\lambda$) survive analytic perturbations, and vary analytically. Property (3) allows us to apply this theory to $P_G(\phi + zg) = \ln\lambda(L_{\phi+zg})$, for $|z|$ small.

6.2. Implications of the spectral gap property. Throughout this section we make the following assumptions: $X$ is a topologically mixing TMS, $\phi$ is weakly Hölder continuous, $P_G(\phi) < \infty$, and $\phi$ has the spectral gap property. Let $L_\phi = \lambda P + N$ be the decomposition given by the SGP.

Theorem 6.2 (Stable Positive Recurrence). $\phi$ is positive recurrent, and $P$ takes the form $Pf = h\int f\,d\nu$, where $L_\phi h = \lambda h$, $L_\phi^*\nu = \lambda\nu$, and $\int h\,d\nu = 1$. Moreover, for every weakly Hölder continuous bounded function $g : X \to \mathbb{R}$, there is an $\epsilon > 0$ s.t. $\phi + tg$ is positive recurrent for all $|t| < \epsilon$.

Theorem 6.3 (Exponential decay of correlations). Let $m$ denote the RPF measure of $\phi$. There exists $0 < \kappa < 1$ such that for every $f : X \to \mathbb{R}$ bounded Hölder continuous and $g \in L^\infty(m)$ there is a constant $C(f, g)$ s.t.
$$\Big|\int f(g\circ\sigma^n)\,dm - \int f\,dm\int g\,dm\Big| \le C(f, g)\kappa^n \quad\text{for all } n \ge 1.$$

Theorem 6.4 (Central Limit Theorem). Let $m$ denote the RPF measure of $\phi$, and suppose $g : X \to \mathbb{R}$ is a bounded Hölder continuous function. If $\int g\,dm = 0$ and $g$ cannot be put in the form $\varphi - \varphi\circ\sigma$ with $\varphi$ continuous, then there is a positive constant $\sigma_\phi(g)$ s.t. for every $t \in \mathbb{R}$
$$m\Big\{x \in X : \frac{1}{\sqrt{n}}\sum_{k=0}^{n-1} g(\sigma^k(x)) < t\Big\} \xrightarrow[n\to\infty]{} \frac{1}{\sqrt{2\pi\sigma_\phi^2(g)}}\int_{-\infty}^{t} e^{-s^2/2\sigma_\phi^2(g)}\,ds.$$

The identity $P_G(\phi) = \sup\{h_\mu(\sigma) + \int\phi\,d\mu\}$ represents $P_G(\phi)$ as a dynamical analogue of (minus) the Helmholtz free energy $F$. The next theorem is a translation of a well-known property of the free energy into the language of dynamics.

First some background from thermodynamics. The functional dependence of $F$ on its natural parameters completely characterizes the thermodynamic state in equilibrium: All thermodynamic quantities can be written as first order partial derivatives of $F$ with respect to a suitable parameter. If first-order partial derivatives of $F$ are thermodynamic quantities, then second-order partial derivatives of $F$ equal the rate of change of thermodynamic quantities when an external parameter is changed. Such quantities are called “linear response functions.” The “linear response theorem” relates the linear response functions to the fluctuations of the corresponding microscopically defined quantities. The following theorem shows that something similar happens in the world of dynamical systems:

Theorem 6.5 (Derivatives of Pressure). Let $m$ denote the RPF measure of $\phi$, and suppose $g : X \to \mathbb{R}$ is a bounded Hölder continuous function. Then $t \mapsto P_G(\phi + tg)$ is analytic on a neighborhood of zero, and
$$\frac{d}{dt}\Big|_{t=0} P_G(\phi + tg) = \int g\,dm, \qquad \frac{d^2}{dt^2}\Big|_{t=0} P_G(\phi + tg) = \sigma_\phi^2(g),$$
where $\sigma_\phi(g)$ is given by the previous theorem.
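The derivative formulas can be tested numerically in the simplest case, a full shift on finitely many symbols with $\phi(x) = \ln p_{x_0}$ and $g(x) = g_{x_0}$. Under these assumptions (the vectors below are hypothetical choices) a direct count of periodic points gives $Z_n(\phi + tg, a) = p_a e^{tg_a}\big(\sum_i p_i e^{tg_i}\big)^{n-1}$, hence $P_G(\phi + tg) = \ln\sum_i p_i e^{tg_i}$; the RPF measure is the Bernoulli measure with vector $p$, for which the limiting variance of the normalized Birkhoff sums of $g$ minus its mean is the ordinary variance of $g$. The sketch compares finite-difference derivatives of the pressure with the mean and variance.

```python
import math

P_VEC = (0.2, 0.3, 0.5)   # hypothetical probability vector, phi(x) = ln p_{x_0}
G_VEC = (1.0, -2.0, 0.5)  # hypothetical observable, g(x) = g_{x_0}


def pressure(t):
    """P_G(phi + t g) for the full shift on three symbols."""
    return math.log(sum(p * math.exp(t * g) for p, g in zip(P_VEC, G_VEC)))


def derivatives_at_zero(step=1e-4):
    """Central finite differences for the first and second derivative at t = 0."""
    first = (pressure(step) - pressure(-step)) / (2 * step)
    second = (pressure(step) - 2 * pressure(0.0) + pressure(-step)) / step ** 2
    return first, second


MEAN = sum(p * g for p, g in zip(P_VEC, G_VEC))
VARIANCE = sum(p * (g - MEAN) ** 2 for p, g in zip(P_VEC, G_VEC))
```

The pair returned by `derivatives_at_zero()` should agree with `(MEAN, VARIANCE)` up to discretization error. The second derivative is unchanged when a constant is added to $g$, which is why the variance is taken around the mean even though Theorem 6.4 is stated for $\int g\,dm = 0$.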


Thus the functional $\phi \mapsto P_G(\phi)$ completely characterizes the RPF measure $m$ through its directional derivatives, and it relates the second order directional derivatives of $P_G(\cdot)$ to the fluctuations of the Birkhoff sums with respect to this measure. We see that $P_G(\cdot)$ behaves like a dynamical “free energy.”

6.3. Strong positive recurrence. We give a necessary and sufficient condition for the spectral gap property, and then we discuss the prevalence of this condition. The condition relies on the induction procedure, which we now explain.

Suppose $X$ is a topologically mixing TMS, and $a$ is some state. The induced system on $[a]$ is $\sigma_a : X_a \to X_a$, where $X_a := \{x \in X : x_0 = a, x_i = a \text{ infinitely often}\}$ and $\sigma_a(x) := \sigma^{\varphi_a(x)}(x)$, where $\varphi_a(x) := \min\{n \ge 1 : x_n = a\}$. The resulting transformation can be given the structure of a TMS as follows: Let $\overline{S} := \{[a, \xi_1, \dots, \xi_{n-1}, a] : n \ge 1, \xi_1, \dots, \xi_{n-1} \ne a\}\setminus\{\emptyset\}$ and let $\overline{\sigma} : \overline{X} \to \overline{X}$ denote the left shift on $\overline{X} = (\overline{S})^{\mathbb{N}}$. Then $\overline{\sigma} : \overline{X} \to \overline{X}$ is topologically conjugate to $\sigma_a : X_a \to X_a$. The conjugacy $\pi : \overline{X} \to X_a$ is given by
$$\pi([a, \xi^0, a], [a, \xi^1, a], [a, \xi^2, a], \cdots) := (a, \xi^0, a, \xi^1, a, \xi^2, a, \dots).$$
Functions $\phi : X \to \mathbb{R}$ can also be induced: The induced potential $\overline{\phi} : \overline{X} \to \mathbb{R}$ is
$$\overline{\phi} := \Big(\sum_{i=0}^{\varphi_a - 1}\phi\circ\sigma^i\Big)\circ\pi.$$
If $\phi$ is weakly Hölder continuous, then $\overline{\phi}$ is locally Hölder continuous (Lemma 2.6). Notice that if we induce $\phi$ on $[a]$, then for all $p \in \mathbb{R}$, $\overline{\phi + p} = \overline{\phi} + p\varphi_a \ge \overline{\phi} + p$.

Definition 6.6. Suppose $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is weakly Hölder continuous with finite Gurevich pressure. The $a$–discriminant of $\phi$ is $\Delta_a[\phi] := \sup\{P_G(\overline{\phi + p}) : p \in \mathbb{R} \text{ s.t. } P_G(\overline{\phi + p}) < \infty\}$.

Theorem 6.7 (Discriminant Theorem). Let $X$ be a topologically mixing TMS, and suppose $\phi : X \to \mathbb{R}$ is a weakly Hölder continuous function s.t. $P_G(\phi) < \infty$. For every state $a$,
(1) $\Delta_a[\phi] > 0$ iff $\phi$ is positive recurrent with the spectral gap property,
(2) $\Delta_a[\phi] = 0$ iff $\phi$ is null recurrent or $\phi$ is positive recurrent without the spectral gap property,
(3) $\Delta_a[\phi] < 0$ iff $\phi$ is transient.
In case (1) we call $\phi$ strongly positive recurrent.

Suppose $\phi$ is strongly positive recurrent. We will describe a Banach space $\mathcal{L}$ where $L_\phi$ acts with spectral gap. We need the following observation:

Lemma 6.8. If $\phi$ is weakly Hölder continuous, positive recurrent, and with finite Gurevich pressure, then $P_G(\phi) = 0 \Leftrightarrow P_G(\overline{\phi}) = 0$.

Let $\phi$ be a weakly Hölder continuous potential with finite Gurevich pressure. Assume without loss of generality that $P_G(\phi) = 0$ (otherwise work with $\phi - P_G(\phi)$). By weak Hölder continuity, there is $0 < \theta < 1$ (which we now fix once and for all) s.t. $\mathrm{var}_n\phi \le \mathrm{const}\cdot\theta^n$ for all $n \ge 2$. By strong positive recurrence, there is a state $a$ s.t. $\Delta_a[\phi] > 0$, therefore there exists $\epsilon > 0$ small s.t. $0 < P_G(\overline{\phi + 2\epsilon}) < \infty$. By the lemma, $P_G(\phi) = 0 \Rightarrow P_G(\overline{\phi}) = 0$. Since $\epsilon \mapsto P_G(\overline{\phi + \epsilon})$ is convex, it is continuous, and one can choose $\epsilon$ small enough so that $0 < \theta e^p < 1$ for $p := P_G(\overline{\phi + \epsilon})$.


Let $\psi := \phi + \epsilon - p1_{[a]}$, then $P_G(\overline{\psi}) = P_G(\overline{\phi + \epsilon}) - p = 0$. Also, $P_G(\overline{\psi + \epsilon}) < \infty$ and $P_G(\overline{\psi + \epsilon}) \ge P_G(\overline{\psi}) + \epsilon > 0$. Consequently, $\Delta_a[\psi] > 0$, whence $\psi$ is (strongly) positive recurrent. By the lemma, $P_G(\psi) = 0$, and by the generalized Ruelle's Perron-Frobenius theorem, there is a continuous function $h_0$ s.t. $L_\psi h_0 = h_0$. Using the weak Hölder continuity of $\psi$ and the convergence $L_\psi^n 1_{[a]} \xrightarrow[n\to\infty]{} \nu[a] h_0$, it is easy to check that $\mathrm{var}_1(\ln h_0) < \infty$. This allows us to define
\[ h_0[b] := \sup_{[b]} h_0 \]
for all states $b$. We will use these numbers to modulate the size and smoothness of the elements of our Banach space on the partition sets $[b]$.

Given $x, y \in X$, let $t(x, y) := \min\{n \ge 0 : x_n \ne y_n\}$, and $s_a(x, y) := \#\{0 \le i \le t(x, y) - 1 : x_i = y_i = a\}$. Define for a function $f : X \to \mathbb{C}$,
\[ \|f\|_{\mathcal{L}} := \sup_b \frac{1}{h_0[b]} \Big[ \sup_{[b]} |f| + \sup\big\{ |f(x) - f(y)|/\theta^{s_a(x,y)} : x, y \in [b],\ x \ne y \big\} \Big]. \]

Theorem 6.9. $L_\phi$ acts with the spectral gap property on $(\mathcal{L}, \|\cdot\|_{\mathcal{L}})$, where $\mathcal{L} := \{f : X \to \mathbb{C} : \|f\|_{\mathcal{L}} < \infty\}$.

We finish this section with two useful facts on the discriminant. The first is a consequence of the obvious fact that if $\psi = 1_{[a]}$, then the induced version of $\psi$ on $[a]$ is $\overline{\psi} \equiv 1$. Using this observation it is easy to see that
\[ \Delta_a[\phi + t1_{[a]}] = \Delta_a[\phi] + t. \tag{6.1} \]

In particular, if $\Delta_a[\phi] < \infty$, then the one-parameter family of potentials $\phi + t1_{[a]}$ is transient for some parameters and recurrent for others.

The second fact we would like to mention is a useful estimate for $\Delta_a[\phi]$. Recall that $Z_n^*(\phi, a) = \sum_{\sigma^n(x) = x} e^{\phi_n(x)} 1_{[\varphi_a = n]}(x)$.

Theorem 6.10. Suppose $X$ is a topologically mixing TMS and $\phi$ has summable variations and finite Gurevich pressure. For every $a$, either $\Delta_a[\phi] = \infty$ or
\[ \Big| \Delta_a[\phi] - \ln \sum_{n=1}^{\infty} R^n Z_n^*(\phi, a) \Big| \le \sum_{n=2}^{\infty} \mathrm{var}_n \phi, \]
where $R$ is the radius of convergence of the series $\sum t^n Z_n^*(\phi, a)$.

In particular, if $\sum_{n=1}^{\infty} t^n Z_n^*(\phi, a)$ diverges at its radius of convergence, then $\phi$ has the spectral gap property. Since it is rather common for a positive power series to diverge at its radius of convergence, it seems likely that the spectral gap property happens "often." In the next section we will investigate this further.

6.4. How common is spectral gap? We mention some particular cases when the spectral gap property holds, and then discuss the general case.

Theorem 6.11. Suppose $X$ is a topologically mixing TMS with finitely many states, then every weakly Hölder continuous potential has the spectral gap property. In this case one can use the Banach space
\[ \mathcal{L} = \big\{ f : X \to \mathbb{C} : \|f\| := \max |f| + \sup |f(x) - f(y)|/\theta^{t(x,y)} < \infty \big\}, \]
where $\theta$ is a Hölder exponent of $\phi$.
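For a Markovian potential the first-return partition functions $Z_n^*(\phi, a)$ recalled before Theorem 6.10 can be computed by a short recursion. The sketch below assumes a finite alphabet and a potential of the form $\phi(x) = \ln M[x_0][x_1]$ for a non-negative matrix `M` (zero entries are forbidden transitions); the function name and the golden mean example are illustrative choices, not part of the text.

```python
import math


def first_return_partition_functions(M, a, n_max):
    """Return [Z_1^*, ..., Z_{n_max}^*] for the potential phi(x) = ln M[x_0][x_1].

    Z_n^* sums the weights M[a][x_1] * M[x_1][x_2] * ... * M[x_{n-1}][a] over
    loops a -> x_1 -> ... -> x_{n-1} -> a whose intermediate states avoid a.
    """
    others = [j for j in range(len(M)) if j != a]
    Z = [M[a][a]]                     # n = 1: the loop a -> a
    u = {j: M[a][j] for j in others}  # weights of paths a -> j that avoid a
    for _ in range(2, n_max + 1):
        Z.append(sum(u[i] * M[i][a] for i in others))
        u = {j: sum(u[i] * M[i][j] for i in others) for j in others}
    return Z


# Assumed example: zero potential on the golden mean shift (no word "11").
golden_mean = [[1, 1], [1, 0]]
Z = first_return_partition_functions(golden_mean, a=0, n_max=4)

# Evaluate the first-return series at t = exp(-P_G), where P_G = ln(lam) here.
lam = (1 + math.sqrt(5)) / 2
renewal_sum = sum(z * lam ** -(n + 1) for n, z in enumerate(Z))
print(Z, renewal_sum)
```

In this example only the loops $a \to a$ and $a \to b \to a$ contribute, so the series $\sum t^n Z_n^*$ is the polynomial $t + t^2$, and it takes the value 1 at $t = 1/\lambda$.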


Recall that a topologically mixing topological Markov shift has the big images and pre-images (BIP) property, if there is a finite collection of states $b_1, \ldots, b_N$ s.t. for every state $a$ there are $i, j$ s.t. $b_i \to a$ and $a \to b_j$.

Theorem 6.12. Suppose $X$ is a topologically mixing TMS with the BIP property, then every weakly Hölder continuous potential $\phi$ with finite Gurevich pressure and such that $\mathrm{var}_1 \phi < \infty$ has the spectral gap property. In this case one can use the Banach space
\[ \mathcal{L} = \Big\{ f : X \to \mathbb{C} : \|f\| := \sup |f| + \sup_{B \in \beta}\ \sup_{x, y \in B} |f(x) - f(y)|/\theta^{t(x,y)} < \infty \Big\}, \]
where $\theta$ is the Hölder exponent of $\phi$ and $\beta$ is the smallest partition whose $\sigma$-algebra contains $\{\sigma[a] : a \text{ is a state}\}$.

Next we characterize the topological Markov shifts for which every weakly Hölder continuous potential with finite Gurevich pressure has the spectral gap property (including potentials for which $\mathrm{var}_1 \phi = \infty$).

Definition 6.13. Let $G = G(V, E)$ be a directed graph. A subset $F \subset V$ is called a uniform Rome if there exists $N$ such that every path of length $N$ in $G$ contains at least one vertex in $F$ ("all roads lead to Rome, in less than $N$ steps").

Theorem 6.14. A topologically mixing TMS $X$ has the property that all weakly Hölder continuous potentials on $X$ have the spectral gap property, iff its associated graph has a finite uniform Rome.

Notice that while every finite graph has a finite uniform Rome (equal to the full set of vertices), this property is rare for infinite graphs. Graphs with "infinite rays" $a_1 \to a_2 \to a_3 \to \cdots$ with $a_i$ distinct do not have finite uniform Romes. Thus Theorem 6.14 says that while for all compact TMS all "reasonable" potentials have spectral gap, in the non-compact case, barring very pathological combinatorial scenarios, there will exist some potentials without the spectral gap property.

Next we try to determine the topological "size" of the set of strongly positive recurrent potentials within
\[ \Phi := \{\phi : X \to \mathbb{R} : \phi \text{ is weakly Hölder continuous, and } P_G(\phi) < \infty\}. \]
Several topologies come to mind. To define them efficiently, we fix $\omega = (\omega_n)_{n \ge 1}$, where $0 \le \omega_n \le \infty$, and let $\|f\|_\omega := \sup |f| + \sum_{n=2}^{\infty} \omega_n \mathrm{var}_n(f)$, where $0 \cdot \infty := 0$.

Definition 6.15. The $\omega$-topology on $\Phi$ is the topology generated by the basic $\omega$-neighborhoods $V(\phi, \epsilon) := \{\phi' \in \Phi : \|\phi - \phi'\|_\omega < \epsilon\}$.

Theorem 6.16. The set of $\phi \in \Phi$ with the spectral gap property is open and dense in $\Phi$ with respect to every $\omega$-topology. In particular, it is open in the sup-norm topology $\omega = (0, 0, 0, \ldots)$, and dense in the $\theta$-Hölder topology $\omega = (0, \theta^{-1}, \theta^{-2}, \theta^{-3}, \ldots)$, $0 < \theta < 1$.
We see that "most" potentials have spectral gap. But, because of Theorem 6.14, except in very strange cases (finite uniform Rome), some potentials do not. We will see in the next section that breakdown of the spectral gap property opens the way to critical phenomena similar to what one observes in a thermodynamic system undergoing a phase transition. Viewed from this perspective, Theorems 6.14 and 6.16 are in the spirit of the physical intuition that thermodynamic systems exhibit critical behavior for some configuration of external parameters, but that the set of the "critical parameters" where this happens is small.

6.5. Notes and references. For references on spectral gaps and their dynamical implications, see [PP90], [Aar97], [Bal00] or [HH01]. Proofs and references to the results in §6.2 in the specific case of TMS can be found in [CS09]. For a discussion of the connection between the central limit theorem and the derivatives of the pressure, see [Rue78]. For other stochastic implications of spectral gap such as almost sure invariance principles and local limit theorems see [RE83], [DP84], [GH88], [AD01], [Gou10a], [Gou10b]. Gurevich and Savchenko characterized the stability of positive recurrence for Markovian potentials in [GS98]. The discriminant was introduced in [Sar01a], as a tool for characterizing stable positive recurrence for potentials with summable variations. The discriminant theorem (Theorem 6.7) is taken from [Sar01a], except for the equivalence $\Delta > 0 \Leftrightarrow$ spectral gap property, which was shown in [CS09]. Lemma 6.8 and Theorem 6.10 are from [Sar01a]. The construction of Banach spaces with spectral gap for averaging operators similar to $L_\phi$ has a long history, starting with the paper of Doeblin & Fortet [DF37]. Theorem 6.11 is due to Ruelle [Rue67], [Rue76], see also [PP90]. Theorem 6.12 is due to Aaronson and Denker [AD01]. The Banach space which demonstrates the spectral gap property for general strongly positive recurrent potentials was constructed in [CS09], and was motivated by Young [You98]. The uniform Rome condition and Theorem 6.14 are due to Cyr [Cyr11]. Theorem 6.16 on the genericity of the spectral gap property is shown in [CS09].

7. Absence of spectral gap and critical phenomena

7.1. Changes in mode of recurrence. Throughout this section we assume that $X$ is a topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is a positive recurrent weakly Hölder continuous function with finite Gurevich pressure but without the spectral gap property. By the discriminant theorem, such potentials have zero discriminant, and by (6.1) $\Delta_a[\phi + t1_{[a]}] = t$. It follows that $\phi + t1_{[a]}$ is transient for all $t < 0$, and positive recurrent (with the spectral gap property) for $t > 0$. Thus the one-parameter family $\phi + t1_{[a]}$ exhibits a change in the mode of recurrence.

There are examples of TMS $X$ and potentials $\phi$ s.t. the one-parameter family $\{\beta\phi\}_{\beta > 0}$ changes its mode of recurrence infinitely many times, or stays "stuck" in the "critical" phase $\Delta_a = 0$ on a full interval of parameters. I am not aware of any restrictions on the possible behavior in the general case.

Changes in mode of recurrence can often result in non-analyticity for the pressure function. The following theorem gives the mechanism. Recall the definition and notation for the induced potential from the previous section.

Theorem 7.1. Suppose $X$ is a topologically mixing TMS, $\phi$ has summable variations and finite Gurevich pressure, and fix some state $a$.
(1) The equation $P_G(\overline{\phi + p}) = 0$ has a unique solution $p = p(\phi)$ if $\Delta_a[\phi] \ge 0$ and no solution if $\Delta_a[\phi] < 0$.
(2) If $\Delta_a[\phi] \ge 0$, then $P_G(\phi) = -p(\phi)$.
(3) If $\Delta_a[\phi] < 0$, then $P_G(\phi) = \limsup_{n \to \infty} \frac{1}{n} \log Z_n^*(\phi, a)$.


Thus when a one-parameter family $\phi_t$ changes its mode of recurrence from transient to recurrent, the formula for $P_G(\phi_t)$ changes from $\limsup_{n \to \infty} \frac{1}{n} \log Z_n^*(\phi_t, a)$ to $-p(\phi_t)$. This can result in lack of analyticity for $t \mapsto P_G(\phi_t)$.

7.2. Statistical implications of non-analyticity of the pressure. Recall that the Gurevich pressure is an analogue of the free energy, and the free energy $F$ is a "thermodynamic potential": Its first partial derivatives are thermodynamic quantities. If $F$ is not differentiable in some direction, then some thermodynamic quantity (a partial derivative) is discontinuous. This is called a "first order phase transition." If the free energy is differentiable, but one of its second order partial derivatives blows up, then some linear response function explodes. This is called a "second order phase transition." In both cases, the precise type of singularity carries information on the behavior at the phase transition. It is natural to look for a similar theory in the world of dynamical systems.

First some preparations from probability theory. Let $Y_n$ denote a sequence of real random variables, possibly on different probability spaces $(\Omega_n, \mathcal{F}_n, \mathrm{Pr}_n)$.

Definition 7.2. $Y_n/n$ converges exponentially in distribution to $y_0 \in \mathbb{R}$, if for every $\epsilon > 0$ there is some $I(\epsilon) > 0$ s.t. $\mathrm{Pr}_n[|Y_n/n - y_0| > \epsilon] \le e^{-I(\epsilon)n}$ for all $n$ large enough. In this case we write $Y_n/n \xrightarrow[n \to \infty]{\exp} y_0$.

Theorem 7.3. Suppose $E(e^{tY_n}) < \infty$ for all $n \in \mathbb{N}$, $t \in \mathbb{R}$, and assume that the limit $F(t) := \lim_{n \to \infty} \frac{1}{n} \ln E(e^{tY_n})$ exists and is finite for all $t \in \mathbb{R}$, then $Y_n/n \xrightarrow[n \to \infty]{\exp} y_0$ iff $F(t)$ is differentiable at zero and $F'(0) = y_0$.
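A standard toy case may help fix ideas for Theorem 7.3. The sketch below assumes $Y_n$ is a sum of $n$ independent Bernoulli($q$) variables, an example chosen here for illustration and not taken from the text. In that case $E(e^{tY_n}) = (1 - q + qe^t)^n$, so $F(t) = \ln(1 - q + qe^t)$ and $F'(0) = q$. The function names are hypothetical.

```python
import math


def free_energy(t: float, q: float) -> float:
    """F(t) = lim (1/n) ln E(exp(t Y_n)) for Y_n ~ Binomial(n, q)."""
    return math.log(1.0 - q + q * math.exp(t))


def derivative_at_zero(q: float, step: float = 1e-6) -> float:
    """Central difference approximation of F'(0)."""
    return (free_energy(step, q) - free_energy(-step, q)) / (2.0 * step)


def deviation_probability(n: int, q: float, y0: float, eps: float) -> float:
    """Exact value of Pr[|Y_n / n - y0| > eps] for Y_n ~ Binomial(n, q)."""
    return sum(math.comb(n, k) * q ** k * (1.0 - q) ** (n - k)
               for k in range(n + 1) if abs(k / n - y0) > eps)


q = 0.3
y0 = derivative_at_zero(q)  # approximately q
for n in (50, 200):
    print(n, deviation_probability(n, q, q, 0.1))
```

The printed probabilities shrink quickly as $n$ grows, in line with the exponential convergence of $Y_n/n$ to $y_0 = F'(0)$.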

Here is an application to our context. Suppose $X$ is topologically mixing, and $\phi : X \to \mathbb{R}$ has summable variations and finite Gurevich pressure. Fix a state $a$ and consider the sequence of measures defining the thermodynamic limit with boundary condition $x$, conditioned on $[a]$:
\[ \mathrm{Pr}_n = \frac{\sum_{\sigma^n(y) = x} e^{\phi_n(y)} 1_{[a]}(y) \delta_y}{(L_\phi^n 1_{[a]})(x)}. \tag{7.1} \]
Next we fix a bounded $\psi : X \to \mathbb{R}$ with summable variations, and consider the distribution of $Y_n := \psi_n = \sum_{k=0}^{n-1} \psi \circ \sigma^k$ with respect to $\mathrm{Pr}_n$. Observe that $E(e^{tY_n}) = (L_{\phi + t\psi}^n 1_{[a]})(x)/(L_\phi^n 1_{[a]})(x)$, which is finite for all $t$ because $\psi$ is bounded and $\phi$ is admissible. By Lemma 4.4,
\[ F(t) := \lim_{n \to \infty} \frac{1}{n} \ln E(e^{tY_n}) = P_G(\phi + t\psi) - P_G(\phi). \]
We obtain the following interpretation of differentiability for the pressure function:

Theorem 7.4. If $t \mapsto P_G(\phi + t\psi)$ is differentiable at zero, then $\frac{1}{n} Y_n$ converges exponentially in distribution to a constant $c$, and if $t \mapsto P_G(\phi + t\psi)$ is not differentiable at zero, then it doesn't. The value of $c$ is $\frac{d}{dt}\big|_{t=0} P_G(\phi + t\psi)$.

Next we consider a case when $t \mapsto P_G(\phi + t\psi)$ is differentiable at zero, but not twice differentiable there. In the physics literature, it is often assumed that the singularity is a power law singularity, and we will discuss the implications of this assumption.

We specify our assumptions more precisely. A positive Borel function $L : (a, \infty) \to (0, \infty)$ is called slowly varying (s.v.) (at infinity), if $L(st)/L(t) \xrightarrow[t \to \infty]{} 1$ for all $s > 0$. A sequence $\ell(n)$ is slowly varying, if $L(x) := \ell(\lfloor x \rfloor)$ is s.v. at infinity. Typical examples include $L(t) = \mathrm{const.}$, $L(t) = \ln t$, $L(t) = 1/\ln t$ etc.

We will consider the situation when $P_G(\phi + t\psi) = a + ct + t^\alpha L(1/t)$ with $L(t)$ s.v. at infinity. We'll take $\alpha > 1$ to guarantee differentiability at zero, and $\alpha < 2$ to guarantee that the second derivative has a singularity at $0^+$. Our results are simplest to state in the special case when $\psi := 1_{[a]}$ and $P_G(0) < \infty$, although these assumptions can be significantly weakened. See the end of the section for references to more general results.

Let $X_a := \{x \in [a] : x_n = a \text{ infinitely often}\}$, and define for every $x \in X_a$, $\varphi_a(x) := 1_{[a]}(x) \inf\{n \ge 1 : x_n = a\}$ and $\sigma_a(x) = \sigma^{\varphi_a(x)}(x)$. By the Kac formula, $\int \varphi_a \, d\mu = 1$ for every ergodic shift invariant probability measure $\mu$ s.t. $\mu[a] \ne 0$.

Theorem 7.5. Suppose $X$ is a topologically mixing TMS s.t. $P_G(0) < \infty$ and let $\phi$ be a locally Hölder continuous function s.t. $\sup \phi < \infty$ and with an equilibrium measure $\mu_\phi$. The following are equivalent for every state $a$ and $1 < \alpha < 2$:
(1) $P_G(\phi + t1_{[a]}) = P_G(\phi) + ct + t^\alpha L(1/t)$ as $t \to 0^+$, with $L$ s.v. at infinity.
(2) $\frac{1}{B_n} \Big( \sum_{k=0}^{n-1} \varphi_a \circ \sigma_a^k - cn \Big) \xrightarrow[n \to \infty]{} cG_\alpha$ where $B_n = n^{1/\alpha} \ell(n)$ and $\ell$ is s.v., $c = \frac{1}{\mu_\phi[a]} \int_{[a]} \varphi_a \, d\mu_\phi$, and $G_\alpha$ is the stable law s.t. $\int e^{-s\xi} G_\alpha(d\xi) = e^{s^\alpha}$.
(3) $\mu_\phi[\varphi_a(x) > t] = [1 + o(1)] \frac{\mu_\phi[a]^{-\alpha}}{|\Gamma(1 - \alpha)|} t^{-\alpha} L(t)$ as $t \to \infty$.

Thus the theorem expresses a power law singularity for $P_G(\phi + t1_{[a]})$ in terms of abnormal fluctuations and heavy tails for the gaps between appearances of $a$.

Next we consider the effect of singular expansion on the decay of correlations. The covariance of two functions $f, g \in L^2(\mu)$ is $\mathrm{Cov}_\mu(f, g) := \int fg \, d\mu - \int f \, d\mu \int g \, d\mu$.

Theorem 7.6. Under the assumptions of Theorem 7.5, if $P_G(\phi + t1_{[a]}) = P_G(\phi) + ct + t^\alpha L(1/t)$ as $t \to 0^+$, where $1 < \alpha < 2$ and $L$ is s.v. at infinity, then
\[ \mathrm{Cov}_{\mu_\phi}(f_1, f_2 \circ \sigma^n) = \frac{1 + o(1)}{\mu_\phi[a]^\alpha \Gamma(2 - \alpha)} \cdot \frac{L(n)}{n^{\alpha - 1}} \int f_1 \, d\mu_\phi \int f_2 \, d\mu_\phi \]
for all $f_1, f_2$ weakly Hölder continuous s.t. $\mathrm{var}_1 f_i < \infty$, $f_i$ are supported inside a finite union of cylinders, and $\int f_i \, d\mu_\phi > 0$.

The theorem follows from a general asymptotic formula for the decay of correlations, discussed in the following section.

7.3. Asymptotic behavior of $L_\phi^n$ in the absence of spectral gap. Suppose $X$ is a general topologically mixing TMS, and $\phi : X \to \mathbb{R}$ is a positive recurrent weakly Hölder continuous potential with exponent $\theta$ and finite Gurevich pressure $P_G(\phi) = \ln \lambda$. Let $\mu_\phi := h \, d\nu$ denote the RPF measure of $\phi$. Fix a state $a$ and define a Banach algebra of functions on $[a]$ by [...] $|f(x) - f(y)|$ [...]

Theorem 7.7. If $\mu_\phi[\varphi_a > n] = O(n^{-\beta})$ with $\beta > 1$, then for every $f \in L_a$ there are functions $\epsilon_n \in L_a$ s.t.
\[ \lambda^{-n} L_\phi^n(f 1_{[a]}) = h \int_{[a]} f \, d\nu + h \sum_{k=n+1}^{\infty} \mu_\phi[\varphi_a > k] \int_{[a]} f \, d\nu + \epsilon_n \quad \text{on } [a], \]
where $\|\epsilon_n\|_{L_a} = O(n^{-\beta})$ when $\beta > 2$, $\|\epsilon_n\|_{L_a} = O(n^{-2} \ln n)$ when $\beta = 2$, and $\|\epsilon_n\|_{L_a} = O(n^{-2(\beta - 1)})$ when $1 < \beta < 2$.

The following theorem follows immediately, once we recall that $\ln h$ is locally Hölder continuous and therefore $h, h^{-1} \in L_a$.

Theorem 7.8. If $\mu_\phi[\varphi_a > n] = O(1/n^\beta)$ for some $\beta > 1$, then there exists $\theta' \in (0, 1)$ such that for all $f, g : X \to \mathbb{R}$ bounded, Hölder continuous with exponent $\theta'$, and supported inside $[a]$ there is a constant $C = C(f, g)$ s.t.
\[ \Big| \mathrm{Cov}_{\mu_\phi}(f, g \circ \sigma^n) - \Big( \sum_{k=n+1}^{\infty} \mu_\phi[\varphi_a > k] \Big) \int f \, d\mu_\phi \int g \, d\mu_\phi \Big| \le C F_\beta(n), \]
where $F_\beta(n) = n^{-\beta}$ if $\beta > 2$, $F_\beta(n) = n^{-2} \ln n$ if $\beta = 2$, and $F_\beta(n) = n^{-2(\beta - 1)}$ if $1 < \beta < 2$. In particular, if $C_1 n^{-\beta} \le \mu_\phi[\varphi_a > n] \le C_2 n^{-\beta}$, then
\[ \mathrm{Cov}_{\mu_\phi}(f, g \circ \sigma^n) \sim \Big( \sum_{k=n+1}^{\infty} \mu_\phi[\varphi_a > k] \Big) \int f \, d\mu_\phi \int g \, d\mu_\phi \quad \text{as } n \to \infty. \]
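To see why the error term in Theorem 7.8 is of lower order than the tail sum, one can compare the two numerically. The sketch below makes a simplifying assumption that is not in the text: the tail $\mu_\phi[\varphi_a > k]$ is replaced by the model tail $k^{-\beta}$, whose sum over $k > n$ is approximated by the integral $n^{1-\beta}/(\beta - 1)$. The function names are hypothetical.

```python
import math


def error_rate(n: int, beta: float) -> float:
    """F_beta(n) from Theorem 7.8, defined for beta > 1."""
    if beta <= 1:
        raise ValueError("beta must be larger than 1")
    if beta > 2:
        return n ** (-beta)
    if beta == 2:
        return n ** (-2) * math.log(n)
    return n ** (-2 * (beta - 1))


def model_tail_sum(n: int, beta: float) -> float:
    """Integral approximation of sum_{k > n} k^(-beta) (assumed model tail)."""
    return n ** (1 - beta) / (beta - 1)


def relative_error(n: int, beta: float) -> float:
    """Ratio of the error rate to the leading term in the model case."""
    return error_rate(n, beta) / model_tail_sum(n, beta)


for beta in (1.5, 2, 3):
    print(beta, [relative_error(n, beta) for n in (10, 100, 1000)])
```

In each of the three regimes the ratio tends to zero, which is what makes the asymptotic equivalence at the end of Theorem 7.8 meaningful when $\int f \, d\mu_\phi \int g \, d\mu_\phi \ne 0$.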

7.4. Notes and references. Theorem 7.1 is from [Sar01a]. There, one can also find examples of one-parameter families $\{\beta\phi\}_{\beta > 0}$ with complicated changes in mode of recurrence. For other examples of dynamical systems undergoing a "phase transition", see [FF70], [Hof77], [Wan89], [PS92], [Lop93], [BK98], [Sar00], [Hu08], [PZ06] (this is a partial list). For a description of the critical phenomena encountered in thermodynamic systems, see e.g. [Sta71]. For a discussion of exponential convergence in distribution and a proof of Theorem 7.3, see [Ell06]. For the theory of slowly varying functions see [BGT89]. Theorem 7.5 is implicit in [Sar06] (see the discussion on pages 635–636). That paper also contains versions of Theorems 7.5 and 7.6 for a more general class of $\psi$ and $X$. Theorems 7.7 and 7.8 are due to Gouëzel [Gou04], and improve earlier results in [Sar02]. For other results on subexponential decay of correlations which apply to topological Markov shifts, see [You99] and [Pol00]. We focused in this section on stochastic implications of singular pressure functions. For consequences in multifractal analysis and geometric measure theory, see [PW99], [Iom05], [Iom10], [BI11] and references therein.

8. Application to surface diffeomorphisms

8.1. Symbolic dynamics for surface diffeomorphisms. Suppose $f : M \to M$ is a $C^{1+\epsilon}$ diffeomorphism on a compact smooth orientable surface. Assume that $f$ possesses invariant measures with positive entropy.

Theorem 8.1. For every $\delta > 0$, no matter how small, there exists a locally compact two-sided TMS $\sigma : \Sigma \to \Sigma$ with alphabet $S$, and there exists a Hölder continuous map $\pi : \Sigma \to M$ with the following properties:
(1) $\pi \circ \sigma = f \circ \pi$.
(2) $\pi(\Sigma)$ has full measure for every ergodic $f$-invariant $\mu$ s.t. $h_\mu(f) > \delta$.
(3) $\pi : \Sigma^\# \to M$ is finite-to-one (but perhaps not bounded-to-one), where $\Sigma^\# = \{x : \{x_i\}_{i<0}, \{x_i\}_{i>0} \text{ have constant subsequences}\}$.
(4) Moreover, $\exists C_{ab}$ ($a, b \in S$) s.t. $\#\{x \in \Sigma^\# : \pi(x) = p\} < C_{ab}$ for every $p \in M$ s.t. $p = \pi(x)$ where $x_i = a$ for infinitely many $i < 0$ and $x_i = b$ for infinitely many $i > 0$.
(5) For every ergodic $f$-invariant measure $\mu$ on $M$ s.t. $h_\mu(f) > \delta$, there exists a shift invariant ergodic measure $\widehat{\mu}$ on $\Sigma$ s.t. $\mu = \widehat{\mu} \circ \pi^{-1}$ and $h_{\widehat{\mu}}(\sigma) = h_\mu(f)$.

The theorem allows us to reduce some questions on the thermodynamic formalism for $f : M \to M$ to questions on the thermodynamic formalism for $\sigma : \Sigma \to \Sigma$. We explore some of the applications in the following sections.

8.2. Growth of periodic points. Let $f : M \to M$ denote a $C^{1+\epsilon}$ surface diffeomorphism on a compact surface $M$. A periodic point is a point $p \in M$ s.t. $f^n(p) = p$ for some $n \ge 1$. Let $\mathrm{Per}_n(f) := \#\{p \in M : f^n(p) = p\}$.

Recall that the topological entropy of a surface diffeomorphism equals $h = \sup h_\mu(f)$, where the supremum is taken over all $f$-invariant probability measures. By Kushnirenko's Theorem, the supremum is finite. Measures which attain the maximum are called measures of maximal entropy.

Theorem 8.2. Suppose $f$ has positive topological entropy $h$. If $f$ possesses a measure of maximal entropy, then there are $p \in \mathbb{N}$ and $C > 0$ s.t. $\mathrm{Per}_n(f) \ge Ce^{hn}$ for all $n > 0$ divisible by $p$.

Remark 8.3. The condition that $f$ possesses a measure of maximal entropy is always fulfilled when $f$ is $C^\infty$, because of the following theorem of S. Newhouse. Let $\mathcal{M}(f)$ denote the collection of $f$-invariant Borel probability measures $\mu$, equipped with the weak-star topology, then $\mathcal{M}(f)$ is compact and

Theorem 8.4. If $f$ is $C^\infty$, then $\mu_n \xrightarrow[n \to \infty]{w^*} \mu \Rightarrow h_\mu(f) \ge \limsup_{n \to \infty} h_{\mu_n}(f)$.

Proof of Theorem 8.2. Fix $0 < \delta < h$, and let $\Sigma$ denote the two-sided topological Markov shift given by Theorem 8.1. Since every $f$-invariant probability measure on $M$ with entropy larger than $\delta$ lifts to a shift invariant probability measure on $\Sigma$ with the same entropy,
\[ \sup\{h_{\widehat{\mu}}(\sigma) : \widehat{\mu} \text{ is a shift invariant probability measure on } \Sigma\} \ge h. \]
Every shift invariant measure $\widehat{\mu}$ projects to an $f$-invariant measure $\mu := \widehat{\mu} \circ \pi^{-1}$. The factor map $\pi : (\Sigma, \widehat{\mu}) \to (M, \mu)$ is finite-to-one on $\Sigma^\#$. Since $\Sigma^\#$ has full measure (Poincaré Recurrence Theorem), and finite-to-one factors preserve entropy, $h_{\widehat{\mu}}(\sigma) = h_\mu(f) \le h$. It follows that
\[ \sup\{h_{\widehat{\mu}}(\sigma) : \widehat{\mu} \text{ is a shift invariant probability measure on } \Sigma\} \le h. \]
Thus $\sup\{h_{\widehat{\mu}}(\sigma) : \widehat{\mu} \text{ is a shift invariant probability measure on } \Sigma\} = h$. This argument also shows that $f : M \to M$ has a measure of maximal entropy iff $\sigma : \Sigma \to \Sigma$ has a measure of maximal entropy, and the entropy is the same.


Now let $\Sigma^+$ denote the one-sided TMS corresponding to $\Sigma$: $\Sigma^+ = \{(x_0, x_1, \ldots) : x \in \Sigma\}$. Abusing notation, we denote the left shift map on $\Sigma^+$ by $\sigma$. Every shift invariant measure on $\Sigma$ defines a shift invariant measure on $\Sigma^+$ with the same entropy, and every shift invariant measure on $\Sigma^+$ arises this way (because of the natural extension construction). It follows that $\sigma : \Sigma^+ \to \Sigma^+$ also possesses a measure of maximal entropy, with entropy $h$.

If $\sigma : \Sigma^+ \to \Sigma^+$ possesses a measure of maximal entropy, then $\sigma : \Sigma^+ \to \Sigma^+$ possesses an ergodic measure of maximal entropy. This is because the entropy map is affine, so a.e. ergodic component of a measure of maximal entropy is an ergodic measure of maximal entropy.

Let $m_0$ denote an ergodic measure of maximal entropy for $\sigma : \Sigma^+ \to \Sigma^+$. By the discussion in §2.1, $m_0$ is carried by a topologically transitive topological Markov shift $\Sigma_0^+ \subset \Sigma^+$. By Theorem 2.5, there is a positive integer $p$ and a decomposition $\Sigma_0^+ = \biguplus_{i=0}^{p-1} X_i$ s.t. $\sigma(X_i) = X_{(i+1) \bmod p}$ and s.t. $\sigma^p : X_i \to X_i$ is topologically conjugate to a topologically mixing topological Markov shift. This shift must also possess a measure of maximal entropy, with entropy $ph$. Denote this topological Markov shift by $T : X \to X$.

Applying Theorems 5.3 and 5.5 to $T$, we find that the zero potential on $X$ is positive recurrent, and its Gurevich pressure equals $ph$. By the generalized Ruelle's Perron-Frobenius Theorem $e^{-nph} L_0^n 1_{[a]} \xrightarrow[n \to \infty]{}$ positive constant, where $(L_0 f)(x) = \sum_{Ty = x} f(y)$.

Notice that if $x \in X$ starts with $a$, then $L_0^n 1_{[a]}(x)$ equals the number of admissible words $(a, \xi_1, \ldots, \xi_{n-1}, a)$. Equivalently,
\[ (L_0^n 1_{[a]})(x) = \#\{z \in X : z_0 = a,\ T^n z = z\} = \#\{y \in \Sigma_0^+ : y_0 = a,\ \sigma^{np}(y) = y\}. \]
Thus for some positive $C$ and all $n$ large, $\#\{y \in \Sigma_0^+ : y_0 = a,\ \sigma^{np}(y) = y\} \ge Ce^{nph}$, whence also $\#\{y \in \Sigma : y_0 = a,\ \sigma^{np}(y) = y\} \ge Ce^{nph}$ for the two-sided shift $\Sigma$.

Every periodic sequence in $\Sigma$ projects to a periodic point of $f$, and the map $y \mapsto \pi(y)$ is at worst $C_{aa}$-to-one on the collection of sequences which contain the symbol $a$ infinitely many times in the past and in the future. It follows that $\mathrm{Per}_{np}(f) \ge (C/C_{aa}) e^{nph}$ for all $n$ large. $\square$
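The counting step of the proof can be tried out on a finite-state example, where the number of sequences fixed by $\sigma^n$ is the trace of the $n$-th power of the transition matrix. The sketch below uses the golden mean shift (an assumed example, not taken from the text), whose topological entropy is $\ln\frac{1+\sqrt{5}}{2}$; the helper names are hypothetical.

```python
import math


def mat_mul(A, B):
    """Product of two square integer matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def fixed_points_of_power(A, n):
    """Number of sequences x in the TMS with sigma^n(x) = x, i.e. trace(A^n)."""
    P = A
    for _ in range(n - 1):
        P = mat_mul(P, A)
    return sum(P[i][i] for i in range(len(A)))


# Golden mean shift: the transition 1 -> 1 is forbidden.
A = [[1, 1], [1, 0]]
h = math.log((1 + math.sqrt(5)) / 2)  # its topological entropy

counts = [fixed_points_of_power(A, n) for n in range(1, 6)]
ratio = fixed_points_of_power(A, 20) / math.exp(20 * h)
print(counts, ratio)
```

Here the counts are Lucas numbers and the ratio to $e^{nh}$ approaches 1, a compact-case instance of the lower bound $Ce^{nh}$ that the proof establishes for the countable-state coding.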

8.3. Ergodic properties of equilibrium measures. Let $f : M \to M$ denote a $C^{1+\epsilon}$ surface diffeomorphism, and suppose $\phi : M \to \mathbb{R}$ is continuous. An equilibrium measure for $\phi$ is a measure that maximizes $h_\mu(f) + \int \phi \, d\mu$. Such measures always exist when $f$ is $C^\infty$, because of Newhouse's Theorem 8.4. In this section we will determine the ergodic theoretic structure of ergodic equilibrium measures of Hölder continuous potentials, subject to the assumption that their entropy is positive.

Theorem 8.5. If $\phi : M \to \mathbb{R}$ is Hölder continuous, then every ergodic equilibrium measure of $\phi$ which has positive entropy is measure theoretically isomorphic to the product of a Bernoulli scheme and a finite rotation. (A finite rotation is a map of the form $x \mapsto x + 1 \bmod p$ on $\{0, 1, 2, \ldots, p - 1\}$.)

Sketch of proof. Suppose $\mu$ is an ergodic equilibrium measure for $\phi$, and $\mu$ has positive entropy. Fix $0 < \delta < h_\mu(f)$, and construct the two-sided topological Markov shift $\Sigma$ corresponding to $\delta$. Let $\widehat{\mu}$ denote an ergodic lift of $\mu$ to $\Sigma$: $\mu = \widehat{\mu} \circ \pi^{-1}$. Arguing as in the previous section, one can see that $\widehat{\mu}$ is an ergodic equilibrium measure for the lifted potential $\widehat{\phi} := \phi \circ \pi : \Sigma \to \mathbb{R}$.

Since $\phi : M \to \mathbb{R}$ and $\pi : \Sigma \to M$ are Hölder, $\widehat{\phi} : \Sigma \to \mathbb{R}$ is Hölder. By Theorem 2.8 there is a bounded continuous function $h$ s.t. $\widehat{\phi}(x) + h(x) - h(\sigma x) = \psi(x_0, x_1, \ldots)$, where $\psi$ is a function with summable variations on $\Sigma^+$, the one-sided version of $\Sigma$ ($\psi$ can actually be chosen to be weakly Hölder and bounded). Since $h$ is bounded, $\widehat{\mu}$ is also the equilibrium measure of $\widehat{\phi} + h - h \circ \sigma$. So $\widehat{\mu}^+$, the measure $\widehat{\mu}$ induces on $\Sigma^+$, is the equilibrium measure of $\psi$.

Since $\widehat{\mu}$ is ergodic, $\widehat{\mu}^+$ is ergodic, and therefore $\widehat{\mu}^+$ is carried by a topologically transitive topological Markov shift $\Sigma_0^+$.

Assume for simplicity that $\Sigma_0^+$ is actually topologically mixing. Since $\widehat{\mu}^+$ is the equilibrium measure of $\psi$ and $\psi$ is Hölder, the natural extension of $\widehat{\mu}^+$ is Bernoulli. So $\widehat{\mu}$ is Bernoulli. A theorem of Ornstein says that factors of Bernoulli schemes are Bernoulli, so $\mu = \widehat{\mu} \circ \pi^{-1}$ has the Bernoulli property, and we proved the theorem with $p = 1$.

In general, $\Sigma_0^+$ is not topologically mixing. In such cases one appeals to the spectral decomposition to show that $\Sigma_0^+$ splits into a cycle of $p$ mutually disjoint pieces so that the restriction of $\sigma^p$ to each piece is topologically mixing. Then one argues as above to show that $\sigma^p$ is Bernoulli on each piece. Once this is shown the theorem follows in a standard way. $\square$

8.4. Notes and references. Theorem 8.1 is proved in [Sar13]. Earlier examples of symbolic codings of diffeomorphisms by TMS include the Smale Horseshoe [Sma65], hyperbolic automorphisms of $\mathbb{T}^2$ [AW67], [AW70], Anosov diffeomorphisms [Sin68a], [Sin68b], and Axiom A diffeomorphisms [Bow70]. Sinai's paper [Sin72], Bowen's monograph [Bow75], and Ruelle's book [Rue78] were particularly influential in positioning symbolic dynamics and topological Markov shifts as central tools for studying smooth dynamical systems. Theorem 8.2 is from [Sar13]. Earlier, Katok showed $\limsup \frac{1}{n} \ln \mathrm{Per}_n(f) \ge h$ [Kat80]. Katok's bound does not require the existence of a measure of maximal entropy. In the case of uniformly hyperbolic diffeomorphisms, it is also true that $\mathrm{Per}_n(f) \le C' e^{nh}$ and much more can be said on the periodic points of $f$, see [Bow71], [PP90]. But for general diffeomorphisms, $\mathrm{Per}_n(f)$ could grow superexponentially, see [Kal00]. Theorem 8.4 is due to Newhouse [New89]. Theorem 8.5 is from [Sar11]. In the case of Anosov and Axiom A diffeomorphisms, the result is due to Ratner [Rat74] and Bowen [Bow75a]. Pesin proved the Bernoulli property for smooth invariant measures with positive entropy for general smooth surface diffeomorphisms [Pes77], and Ledrappier did this for SRB measures [Led84]. The thermodynamic formalism of countable Markov shifts has been used in a similar way to prove similar results for other dynamical systems, such as interval maps, multi-dimensional beta-transformations, and piecewise affine homeomorphisms, see [Buz97], [Buz05], [Buz09], [Buz10] and references therein.


References

[Aar97] Jon Aaronson, An introduction to infinite ergodic theory, Mathematical Surveys and Monographs, vol. 50, American Mathematical Society, Providence, RI, 1997. MR1450400 (99d:28025)
[AD01] Jon Aaronson and Manfred Denker, Local limit theorems for partial sums of stationary sequences generated by Gibbs-Markov maps, Stoch. Dyn. 1 (2001), no. 2, 193–237, DOI 10.1142/S0219493701000114. MR1840194 (2002h:37014)
[ADU93] Jon Aaronson, Manfred Denker, and Mariusz Urbański, Ergodic theory for Markov fibred systems and parabolic rational maps, Trans. Amer. Math. Soc. 337 (1993), no. 2, 495–548, DOI 10.2307/2154231. MR1107025 (94g:58116)
[AKM65] R. L. Adler, A. G. Konheim, and M. H. McAndrew, Topological entropy, Trans. Amer. Math. Soc. 114 (1965), 309–319. MR0175106 (30 #5291)
[AW67] R. L. Adler and B. Weiss, Entropy, a complete metric invariant for automorphisms of the torus, Proc. Nat. Acad. Sci. U.S.A. 57 (1967), 1573–1576. MR0212156 (35 #3031)
[AW70] Roy L. Adler and Benjamin Weiss, Similarity of automorphisms of the torus, Memoirs of the American Mathematical Society, No. 98, American Mathematical Society, Providence, R.I., 1970. MR0257315 (41 #1966)
[Bal00] Viviane Baladi, Positive transfer operators and decay of correlations, Advanced Series in Nonlinear Dynamics, vol. 16, World Scientific Publishing Co., Inc., River Edge, NJ, 2000. MR1793194 (2001k:37035)
[BGT89] N. H. Bingham, C. M. Goldie, and J. L. Teugels, Regular variation, Encyclopedia of Mathematics and its Applications, vol. 27, Cambridge University Press, Cambridge, 1989. MR1015093 (90i:26003)
[BHS03] Noam Berger, Christofer Hoffman, and Vladas Sidoravicius, Nonuniqueness for specifications in $\ell^{2+\epsilon}$, arXiv:math/0312344, 2003.
[BI11] Luis Barreira and Godofredo Iommi, Multifractal analysis and phase transitions for hyperbolic and parabolic horseshoes, Israel J. Math. 181 (2011), 347–379, DOI 10.1007/s11856-011-0013-9. MR2773047 (2012c:37057)
[BK93] Maury Bramson and Steven Kalikow, Nonuniqueness in g-functions, Israel J. Math. 84 (1993), no. 1-2, 153–160, DOI 10.1007/BF02761697. MR1244665 (94h:28011)
[BK98] Henk Bruin and Gerhard Keller, Equilibrium states for S-unimodal maps, Ergodic Theory Dynam. Systems 18 (1998), no. 4, 765–789, DOI 10.1017/S0143385798108337. MR1645373 (2000g:37039)
[Bow70] Rufus Bowen, Markov partitions and minimal sets for Axiom A diffeomorphisms, Amer. J. Math. 92 (1970), 907–918. MR0277002 (43 #2739)
[Bow71] Rufus Bowen, Periodic points and measures for Axiom A diffeomorphisms, Trans. Amer. Math. Soc. 154 (1971), 377–397. MR0282372 (43 #8084)
[Bow73] Rufus Bowen, Topological entropy for noncompact sets, Trans. Amer. Math. Soc. 184 (1973), 125–136. MR0338317 (49 #3082)
[Bow75] Rufus Bowen, Equilibrium states and the ergodic theory of Anosov diffeomorphisms, Lecture Notes in Mathematics, Vol. 470, Springer-Verlag, Berlin-New York, 1975. MR0442989 (56 #1364)
[Bow75a] Rufus Bowen, Bernoulli equilibrium states for Axiom A diffeomorphisms, Math. Systems Theory 8 (1974/75), no. 4, 289–294. MR0387539 (52 #8379)
[Bow75b] Rufus Bowen, Some systems with unique equilibrium states, Math. Systems Theory 8 (1974/75), no. 3, 193–202. MR0399413 (53 #3257)
[BS03] Jérôme Buzzi and Omri Sarig, Uniqueness of equilibrium measures for countable Markov shifts and multidimensional piecewise expanding maps, Ergodic Theory Dynam. Systems 23 (2003), no. 5, 1383–1400, DOI 10.1017/S0143385703000087. MR2018604 (2004k:37053)
[Buz97] Jérôme Buzzi, Intrinsic ergodicity of smooth interval maps, Israel J. Math. 100 (1997), 125–161, DOI 10.1007/BF02773637. MR1469107 (99g:58071)
[Buz05] Jérôme Buzzi, Subshifts of quasi-finite type, Invent. Math. 159 (2005), no. 2, 369–406, DOI 10.1007/s00222-004-0392-1. MR2116278 (2005i:37013)
[Buz09] Jérôme Buzzi, Maximal entropy measures for piecewise affine surface homeomorphisms, Ergodic Theory Dynam. Systems 29 (2009), no. 6, 1723–1763, DOI 10.1017/S0143385708000953. MR2563090 (2010i:37013)


[Buz10] Jérôme Buzzi, Puzzles of quasi-finite type, zeta functions and symbolic dynamics for multi-dimensional maps (English, with English and French summaries), Ann. Inst. Fourier (Grenoble) 60 (2010), no. 3, 801–852. MR2680817 (2011f:37026)
[Chu60] Kai Lai Chung, Markov chains with stationary transition probabilities, Die Grundlehren der mathematischen Wissenschaften, Bd. 104, Springer-Verlag, Berlin-Göttingen-Heidelberg, 1960. MR0116388 (22 #7176)
[CQ98] Zaqueu Coelho and Anthony N. Quas, Criteria for d-continuity, Trans. Amer. Math. Soc. 350 (1998), no. 8, 3257–3268, DOI 10.1090/S0002-9947-98-01923-0. MR1422894 (99d:28028)
[CS09] Van Cyr and Omri Sarig, Spectral gap and transience for Ruelle operators on countable Markov shifts, Comm. Math. Phys. 292 (2009), no. 3, 637–666, DOI 10.1007/s00220-009-0891-4. MR2551790 (2011f:37041)
[Cyr11] Van Cyr, Countable Markov shifts with transient potentials, Proc. Lond. Math. Soc. (3) 103 (2011), no. 6, 923–949, DOI 10.1112/plms/pdr011. MR2861747
[Dao13] Yair Daon, Bernoullicity of equilibrium measures on countable Markov shifts, Discrete Contin. Dyn. Syst. 33 (2013), no. 9, 4003–4015, DOI 10.3934/dcds.2013.33.4003. MR3038050
[DF37] Wolfgang Doeblin and Robert Fortet, Sur des chaînes à liaisons complètes (French), Bull. Soc. Math. France 65 (1937), 132–148. MR1505076
[Dob68a] R. L. Dobrušin, Description of a random field by means of conditional probabilities and conditions for its regularity (Russian, with English summary), Teor. Verojatnost. i Primenen 13 (1968), 201–229. MR0231434 (37 #6989)
[Dob68b] R. L. Dobrušin, Gibbsian random fields for lattice systems with pairwise interactions (Russian), Funkcional. Anal. i Priložen. 2 (1968), no. 4, 31–43. MR0250630 (40 #3862)
[Dob68c] E. M. Semenov, A new interpolation theorem (Russian), Funkcional. Anal. i Priložen 2 (1968), no. 2, 68–80. MR0236694 (38 #4989)
[DP84] Manfred Denker and Walter Philipp, Approximation by Brownian motion for Gibbs measures and flows under a function, Ergodic Theory Dynam. Systems 4 (1984), no. 4, 541–552, DOI 10.1017/S0143385700002637. MR779712 (86g:28025)
[Ell06] Richard S. Ellis, Entropy, large deviations, and statistical mechanics, Classics in Mathematics, Springer-Verlag, Berlin, 2006. Reprint of the 1985 original. MR2189669 (2006m:82002)
[FF70] M. E. Fisher and B. U. Felderhof, Phase transition in one-dimensional cluster-interaction fluids: IA. Thermodynamics, IB. Critical behavior, II. Simple logarithmic model, Ann. Phys. 58 (1970), 177–280.
[FFY02] Doris Fiebig, Ulf-Rainer Fiebig, and Michiko Yuri, Pressure and equilibrium states for countable state Markov shifts, Israel J. Math. 131 (2002), 221–257, DOI 10.1007/BF02785859. MR1942310 (2004b:37055)
[Gal73] G. Gallavotti, Ising model and Bernoulli schemes in one dimension, Comm. Math. Phys. 32 (1973), 183–190. MR0356801 (50 #9270)
[GH88] Y. Guivarc'h and J. Hardy, Théorèmes limites pour une classe de chaînes de Markov et applications aux difféomorphismes d'Anosov (French, with English summary), Ann. Inst. H. Poincaré Probab. Statist. 24 (1988), no. 1, 73–98. MR937957 (89m:60080)
[Goo69] L. Wayne Goodwyn, Topological entropy bounds measure-theoretic entropy, Proc. Amer. Math. Soc. 23 (1969), 679–688. MR0247030 (40 #299)
[Goo71] T. N. T. Goodman, Relating topological entropy and measure entropy, Bull. London Math. Soc. 3 (1971), 176–180. MR0289746 (44 #6934)
[Gou04] Sébastien Gouëzel, Sharp polynomial estimates for the decay of correlations, Israel J. Math. 139 (2004), 29–65, DOI 10.1007/BF02787541. MR2041223 (2004m:37011)
[Gou10a] Sébastien Gouëzel, Almost sure invariance principle for dynamical systems by spectral methods, Ann. Probab. 38 (2010), no. 4, 1639–1671, DOI 10.1214/10-AOP525. MR2663640 (2011h:60073)
[Gou10b] Sébastien Gouëzel, Characterization of weak convergence of Birkhoff sums for Gibbs-Markov maps, Israel J. Math. 180 (2010), 1–41, DOI 10.1007/s11856-010-0092-z. MR2735054 (2011k:37010)
[GS98] B. M. Gurevich and S. V. Savchenko, Thermodynamic formalism for symbolic Markov chains with a countable number of states (Russian), Uspekhi Mat. Nauk 53 (1998),

114

[Gur69] [Gur70] [Gur78]

[Gur84]

[HH01]

[Hof77] [Hu08]

[IJT13] [Iom05]

[Iom10]

¨ [JO03]

¨ [JOP12]

[Kal00]

[Kat80] [Kea72] [Kel98]

[Led74]

[Led84] [Lop93]

[LR69]

[Mis76]

O. SARIG

no. 2(320), 3–106, DOI 10.1070/rm1998v053n02ABEH000017; English transl., Russian Math. Surveys 53 (1998), no. 2, 245–344. MR1639451 (2000c:28028) B. M. Gureviˇ c, Topological entropy of a countable Markov chain (Russian), Dokl. Akad. Nauk SSSR 187 (1969), 715–718. MR0263162 (41 #7767) B. M. Gureviˇ c, Shift entropy and Markov measures in the space of paths of a countable graph (Russian), Dokl. Akad. Nauk SSSR 192 (1970), 963–965. MR0268356 (42 #3254) B. M. Gureviˇ c, A variational principle for one-dimensional latticed Gibbs random fields with a countable number of states (Russian), Dokl. Akad. Nauk SSSR 241 (1978), no. 4, 749–752. MR0501441 (58 #18795) B. M. Gurevich, A variational characterization of one-dimensional countable state Gibbs random fields, Z. Wahrsch. Verw. Gebiete 68 (1984), no. 2, 205–242, DOI 10.1007/BF00531778. MR767801 (86g:60065) Hubert Hennion and Lo¨ıc Herv´e, Limit theorems for Markov chains and stochastic properties of dynamical systems by quasi-compactness, Lecture Notes in Mathematics, vol. 1766, Springer-Verlag, Berlin, 2001. MR1862393 (2002h:60146) Franz Hofbauer, Examples for the nonuniqueness of the equilibrium state, Trans. Amer. Math. Soc. 228 (1977), no. 223–241. MR0435352 (55 #8312) Huyi Hu, Equilibriums of some non-H¨ older potentials, Trans. Amer. Math. Soc. 360 (2008), no. 4, 2153–2190, DOI 10.1090/S0002-9947-07-04412-1. MR2366978 (2009m:37061) Godofredo Iommi, Thomas Jordan, and Mike Todd. Recurrence and transience for suspension flows. Israel J. of Math (to appear), 2013. Godofredo Iommi, Multifractal analysis for countable Markov shifts, Ergodic Theory Dynam. Systems 25 (2005), no. 6, 1881–1907, DOI 10.1017/S0143385705000350. MR2183299 (2006g:37007) Godofredo Iommi, Multifractal analysis of the Lyapunov exponent for the backward continued fraction map, Ergodic Theory Dynam. Systems 30 (2010), no. 1, 211–232, DOI 10.1017/S0143385708001090. 
MR2586352 (2012d:37066) ¨ Anders Johansson and Anders Oberg, Square summability of variations of g-functions and uniqueness of g-measures, Math. Res. Lett. 10 (2003), no. 5-6, 587–601, DOI 10.4310/MRL.2003.v10.n5.a3. MR2024717 (2004m:37003) ¨ Anders Johansson, Anders Oberg, and Mark Pollicott, Unique Bernoulli g-measures, J. Eur. Math. Soc. (JEMS) 14 (2012), no. 5, 1599–1615, DOI 10.4171/JEMS/342. MR2966661 Vadim Yu. Kaloshin, Generic diffeomorphisms with superexponential growth of number of periodic orbits, Comm. Math. Phys. 211 (2000), no. 1, 253–271, DOI 10.1007/s002200050811. MR1757015 (2001e:37035) A. Katok, Lyapunov exponents, entropy and periodic orbits for diffeomorphisms, Inst. ´ Hautes Etudes Sci. Publ. Math. 51 (1980), 137–173. MR573822 (81i:28022) Michael Keane, Strongly mixing g-measures, Invent. Math. 16 (1972), 309–324. MR0310193 (46 #9295) Gerhard Keller, Equilibrium states in ergodic theory, London Mathematical Society Student Texts, vol. 42, Cambridge University Press, Cambridge, 1998. MR1618769 (99e:28022) Fran¸cois Ledrappier, Principe variationnel et syst` emes dynamiques symboliques, Z. Wahrscheinlichkeitstheorie und Verw. Gebiete 30 (1974), 185–202. MR0404584 (53 #8384) ´ F. Ledrappier. Propri´ et´ es ergodiques des mesures de Sina¨ı. Inst. Hautes Etudes Sci. Publ. Math., (59):163–188, 1984. A. O. Lopes, The zeta function, nondifferentiability of pressure, and the critical exponent of transition, Adv. Math. 101 (1993), no. 2, 133–165, DOI 10.1006/aima.1993.1045. MR1242602 (94m:58183) O. E. Lanford III and D. Ruelle, Observables at infinity and states with short range correlations in statistical mechanics, Comm. Math. Phys. 13 (1969), 194–215. MR0256687 (41 #1343) Michal Misiurewicz, Topological conditional entropy, Studia Math. 55 (1976), no. 2, 175–200. MR0415587 (54 #3672)

THERMODYNAMIC FORMALISM FOR COUNTABLE MARKOV SHIFTS

115

R. Daniel Mauldin and Mariusz Urba´ nski, Gibbs states on the symbolic space over an infinite alphabet, Israel J. Math. 125 (2001), 93–130, DOI 10.1007/BF02773377. MR1853808 (2002k:37048) [MU03] R. Daniel Mauldin and Mariusz Urba´ nski, Graph directed Markov systems, Cambridge Tracts in Mathematics, vol. 148, Cambridge University Press, Cambridge, 2003. Geometry and dynamics of limit sets. MR2003772 (2006e:37036) [New89] Sheldon E. Newhouse, Continuity properties of entropy, Ann. of Math. (2) 129 (1989), no. 2, 215–235, DOI 10.2307/1971492. MR986792 (90f:58108) [Par64] William Parry, Intrinsic Markov chains, Trans. Amer. Math. Soc. 112 (1964), 55–66. MR0161372 (28 #4579) [Pes77] Ja. B. Pesin, Characteristic Ljapunov exponents, and smooth ergodic theory (Russian), Uspehi Mat. Nauk 32 (1977), no. 4 (196), 55–112, 287. MR0466791 (57 #6667) [Pol86] Mark Pollicott. Meromorphic extensions of generalised zeta functions. Invent. Math., 85(1):147–164, 1986. [Pol00] Mark Pollicott, Rates of mixing for potentials of summable variation, Trans. Amer. Math. Soc. 352 (2000), no. 2, 843–853, DOI 10.1090/S0002-9947-99-02382-X. MR1621698 (2000c:37005) [PP84] Ya. B. Pesin and B. S. Pitskel, Topological pressure and the variational principle for noncompact sets (Russian, with English summary), Funktsional. Anal. i Prilozhen. 18 (1984), no. 4, 50–63, 96. MR775933 (86i:28031) [PP90] William Parry and Mark Pollicott, Zeta functions and the periodic orbit structure of hyperbolic dynamics (English, with French summary), Ast´erisque 187-188 (1990), 268. MR1085356 (92f:58141) [PRLS04] Feliks Przytycki, Juan Rivera-Letelier, and Stanislav Smirnov, Equality of pressures for rational functions, Ergodic Theory Dynam. Systems 24 (2004), no. 3, 891–914, DOI 10.1017/S0143385703000385. MR2062924 (2005e:37103) [PS92] Thomas Prellberg and Joseph Slawny, Maps of intervals with indifferent fixed points: thermodynamic formalism and phase transitions, J. Statist. Phys. 66 (1992), no. 
1-2, 503–514, DOI 10.1007/BF01060077. MR1149493 (93g:58085) [PU10] Feliks Przytycki and Mariusz Urba´ nski, Conformal fractals: ergodic theory methods, London Mathematical Society Lecture Note Series, vol. 371, Cambridge University Press, Cambridge, 2010. MR2656475 (2011g:37002) [PW99] Mark Pollicott and Howard Weiss, Multifractal analysis of Lyapunov exponent for continued fraction and Manneville-Pomeau transformations and applications to Diophantine approximation, Comm. Math. Phys. 207 (1999), no. 1, 145–171, DOI 10.1007/s002200050722. MR1724859 (2000j:37027) [PZ06] Yakov Pesin and Ke Zhang, Phase transitions for uniformly expanding maps, J. Stat. Phys. 122 (2006), no. 6, 1095–1110, DOI 10.1007/s10955-005-9005-7. MR2219529 (2007f:37043) [Rat74] M. Ratner, Anosov flows with Gibbs measures are also Bernoullian, Israel J. Math. 17 (1974), 380–391. MR0374387 (51 #10587) [RE83] J. Rousseau-Egele, Un th´ eor` eme de la limite locale pour une classe de transformations dilatantes et monotones par morceaux (French, with English summary), Ann. Probab. 11 (1983), no. 3, 772–788. MR704569 (84m:60032) [R´ en57] A. R´ enyi, Representations for real numbers and their ergodic properties, Acta Math. Acad. Sci. Hungar 8 (1957), 477–493. MR0097374 (20 #3843) [Rue67] D. Ruelle, A variational formulation of equilibrium statistical mechanics and the Gibbs phase rule, Comm. Math. Phys. 5 (1967), 324–329. MR0217610 (36 #699) [Rue68] D. Ruelle, Statistical mechanics of a one-dimensional lattice gas, Comm. Math. Phys. 9 (1968), 267–278. MR0234697 (38 #3013) [Rue76] David Ruelle, A measure associated with axiom-A attractors, Amer. J. Math. 98 (1976), no. 3, 619–654. MR0415683 (54 #3763) [Rue78] David Ruelle, Thermodynamic formalism, Encyclopedia of Mathematics and its Applications, vol. 5, Addison-Wesley Publishing Co., Reading, Mass., 1978. The mathematical structures of classical equilibrium statistical mechanics; With a foreword by Giovanni Gallavotti and Gian-Carlo Rota. 
MR511655 (80g:82017) [MU01]

116

O. SARIG

Omri M. Sarig, Thermodynamic formalism for countable Markov shifts, Ergodic Theory Dynam. Systems 19 (1999), no. 6, 1565–1593, DOI 10.1017/S0143385799146820. MR1738951 (2000m:37009) [Sar00] Omri M. Sarig, On an example with a non-analytic topological pressure (English, with English and French summaries), C. R. Acad. Sci. Paris S´er. I Math. 330 (2000), no. 4, 311–315, DOI 10.1016/S0764-4442(00)00189-0. MR1753300 (2000m:37020) [Sar01a] Omri M. Sarig, Phase transitions for countable Markov shifts, Comm. Math. Phys. 217 (2001), no. 3, 555–577, DOI 10.1007/s002200100367. MR1822107 (2002b:37040) [Sar01b] Omri M. Sarig, Thermodynamic formalism for null recurrent potentials, Israel J. Math. 121 (2001), 285–311, DOI 10.1007/BF02802508. MR1818392 (2001m:37059) [Sar02] Omri Sarig, Subexponential decay of correlations, Invent. Math. 150 (2002), no. 3, 629–653, DOI 10.1007/s00222-002-0248-5. MR1946554 (2004e:37010) [Sar03] Omri Sarig, Existence of Gibbs measures for countable Markov shifts, Proc. Amer. Math. Soc. 131 (2003), no. 6, 1751–1758 (electronic), DOI 10.1090/S0002-9939-0306927-2. MR1955261 (2004b:37056) [Sar06] Omri Sarig, Continuous phase transitions for dynamical systems, Comm. Math. Phys. 267 (2006), no. 3, 631–667, DOI 10.1007/s00220-006-0072-7. MR2249785 (2007i:37067) [Sar11] Omri M. Sarig, Bernoulli equilibrium states for surface diffeomorphisms, J. Mod. Dyn. 5 (2011), no. 3, 593–608, DOI 10.3934/jmd.2011.5.593. MR2854097 (2012k:37071) [Sar13] Omri M. Sarig, Symbolic dynamics for surface diffeomorphisms with positive entropy, J. Amer. Math. Soc. 26 (2013), no. 2, 341–426, DOI 10.1090/S0894-0347-2012-00758-9. MR3011417 [Sin68a] Ja. G. Sina˘ı, Construction of Markov partitionings (Russian), Funkcional. Anal. i Priloˇ zen. 2 (1968), no. 3, 70–80 (Loose errata). MR0250352 (40 #3591) [Sin68b] Ja. G. Sina˘ı, Markov partitions and U-diffeomorphisms (Russian), Funkcional. Anal. i Priloˇ zen 2 (1968), no. 1, 64–89. MR0233038 (38 #1361) [Sin72] Ja. G. 
Sina˘ı, Gibbs measures in ergodic theory (Russian), Uspehi Mat. Nauk 27 (1972), no. 4(166), 21–64. MR0399421 (53 #3265) [Sma65] Stephen Smale, Diffeomorphisms with many periodic points, Differential and Combinatorial Topology (A Symposium in Honor of Marston Morse), Princeton Univ. Press, Princeton, N.J., 1965, pp. 63–80. MR0182020 (31 #6244) [Sta71] E. Eugene Stanley. Introduction to phase transitions and critical phenomena. The International series of monographs on physics. Oxford Univ. Press, New York and Oxford, 1971. [Tho11] Daniel J. Thompson, A thermodynamic definition of topological pressure for noncompact sets, Ergodic Theory Dynam. Systems 31 (2011), no. 2, 527–547, DOI 10.1017/S0143385709001151. MR2776387 (2012c:37028) [VJ67] D. Vere-Jones, Ergodic properties of nonnegative matrices. I, Pacific J. Math. 22 (1967), 361–386. MR0214145 (35 #4996) [VJ68] D. Vere-Jones, Ergodic properties of nonnegative matrices. II, Pacific J. Math. 26 (1968), 601–620. MR0236745 (38 #5039) [Wal75a] Peter Walters, Ruelle’s operator theorem and g-measures, Trans. Amer. Math. Soc. 214 (1975), 375–387. MR0412389 (54 #515) [Wal75b] Peter Walters, A variational principle for the pressure of continuous transformations, Amer. J. Math. 97 (1975), no. 4, 937–971. MR0390180 (52 #11006) [Wal78] Peter Walters, Invariant measures and equilibrium states for some mappings which expand distances, Trans. Amer. Math. Soc. 236 (1978), 121–153. MR0466493 (57 #6371) [Wal01] Peter Walters, Convergence of the Ruelle operator for a function satisfying Bowen’s condition, Trans. Amer. Math. Soc. 353 (2001), no. 1, 327–347 (electronic), DOI 10.1090/S0002-9947-00-02656-8. MR1783787 (2001g:37029) [Wan89] X.-J. Wang. Abnormal fluctuations and thermodynamic phase transitions in dynamical systems. Phys. Rev. A, 39:3214–3217, 1989. [You98] Lai-Sang Young, Statistical properties of dynamical systems with some hyperbolicity, Ann. of Math. (2) 147 (1998), no. 3, 585–650, DOI 10.2307/120960. 
MR1637655 (99h:58140) [You99] Lai-Sang Young, Recurrence times and rates of mixing, Israel J. Math. 110 (1999), 153–188, DOI 10.1007/BF02808180. MR1750438 (2001j:37062) [Sar99]

THERMODYNAMIC FORMALISM FOR COUNTABLE MARKOV SHIFTS

117

Michiko Yuri, Weak Gibbs measures for certain non-hyperbolic systems, Ergodic Theory Dynam. Systems 20 (2000), no. 5, 1495–1518, DOI 10.1017/S014338570000081X. MR1786726 (2002d:37011) [Yur03a] Michiko Yuri, Thermodynamic formalism for countable to one Markov systems, Trans. Amer. Math. Soc. 355 (2003), no. 7, 2949–2971 (electronic), DOI 10.1090/S0002-994703-03269-0. MR1975407 (2004g:37012) [Yur03b] Michiko Yuri, Weak Gibbs measures for intermittent systems and weakly Gibbsian states in statistical mechanics, Comm. Math. Phys. 241 (2003), no. 2-3, 453–466. MR2013806 (2005a:37045) [Zar85] A. S. Zargaryan, A variational principle for a symbolic Markov chain with a countable number of states (Russian), Probability theory, theory of random processes, and functional analysis (Russian) (Moscow, 1984), Moskov. Gos. Univ., Moscow, 1985, pp. 24– 26. MR1032013 [Zin96] Michel Zinsmeister, Formalisme thermodynamique et syst` emes dynamiques holomorphes (French, with English and French summaries), Panoramas et Synth`eses [Panoramas and Syntheses], vol. 4, Soci´ et´ e Math´ ematique de France, Paris, 1996. MR1462079 (98k:58150) [Yur00]

Faculty of Mathematics and Computer Science, Weizmann Institute of Science, 234 Herzl Street, Rehovot 7610001, Israel E-mail address: [email protected]

Limit Theorems for Dynamical Systems

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01484

Limit theorems for horocycle flows

Giovanni Forni

Abstract. In this paper we describe joint results with A. Bufetov and L. Flaminio on the deviation of ergodic averages and limit distributions of ergodic integrals of smooth functions for horocycle flows on the unit tangent bundle of compact surfaces of constant negative curvature. The classical horocycle flow is a parabolic (zero entropy) renormalizable dynamical system: it is uniquely ergodic and mixing of all orders, with nearly but not quite integrable decay of correlations. It is the simplest renormalizable, in fact self-similar, parabolic system for which the study of the asymptotic behavior of ergodic integrals of smooth functions and their limit distributions can be carried out in some detail by tools of harmonic analysis, namely the theory of unitary representations of the group SL(2, R). It therefore provides a model for other analogous dynamical systems (translation flows, Heisenberg nilflows, horospherical foliations in higher dimensions).

Contents
1. Introduction and summary of results
1.1. Horocycle flows
1.2. Smooth coboundaries and invariant distributions
1.3. Asymptotics of ergodic integrals
1.4. Limit distributions
2. The cohomological equation
2.1. Reduction to irreducible components
2.2. Bases of irreducible components
2.3. Difference equations
2.4. Unitary models
3. Finitely additive measures
3.1. Definitions and properties
3.2. Basic currents and invariant distributions
3.3. Dynamical projections of rectifiable arcs
3.4. Asymptotic formulas and limit theorems
References

2010 Mathematics Subject Classification. Primary 37A17, 60F99; Secondary 43A85, 46F99, 58A25.
Key words and phrases. Horocycle flows, ergodic averages, limit distributions, basic currents.
© 2015 American Mathematical Society


1. Introduction and summary of results

The classical horocycle flow is a fundamental example of a unipotent, parabolic (non-hyperbolic) flow. Its dynamical properties have been studied in great detail. For compact hyperbolic surfaces, it is known that the flow is minimal [27], uniquely ergodic [19], has Lebesgue spectrum and is therefore strongly mixing [42], in fact mixing of all orders [39], and has zero entropy [26]. Its finer ergodic and rigidity properties, as well as the rate of mixing, were investigated by M. Ratner in a series of papers [44], [45], [46], [47] (for results on the rate of mixing of the geodesic as well as horocycle flows see also the paper by C. Moore [40]). In joint work with L. Flaminio [14], the author has proved precise bounds on ergodic integrals of smooth functions.

In the case of finite-volume, non-compact surfaces, the horocycle flow is not uniquely ergodic and the classification of invariant measures is due to Dani [12]. The asymptotic behavior of averages along closed horocycles has been studied by D. Zagier [55], P. Sarnak [51], D. Hejhal [28] and more recently in [14] and by A. Strömbergsson [52]. Horocycle flows on general geometrically finite surfaces have been studied by M. Burger [10].

The dynamics of horocycle flows on hyperbolic surfaces of infinite volume which are "$\mathbb{Z}^d$-covers" of finite volume surfaces has been studied by F. Ledrappier and O. Sarig [33], [34], [35], [36], O. Sarig [48], [49] and O. Sarig and B. Schapira [50]. The above-mentioned work is devoted to the classification of Radon invariant measures, to questions of unique ergodicity and to the classification of generic points of the ratio ergodic theorem for horocycle flows on "$\mathbb{Z}^d$-covers" of compact and finite area surfaces. The fluctuations of ergodic integrals of horocycle flows are analyzed in [34] for "$\mathbb{Z}^d$-covers" of compact surfaces and in [36] for "$\mathbb{Z}^d$-covers" of general finite area surfaces.

In these lectures¹ we outline joint work with L. Flaminio [14] and A. Bufetov [9] on the asymptotics and limit distributions of ergodic averages for classical horocycle flows on compact hyperbolic surfaces. The results which we will describe in these lectures should be considered as a model for the theory of limit distributions of ergodic integrals for other (renormalizable) parabolic systems. In particular, limit theorems for Interval Exchange Transformations and flows on surfaces or, more generally, suspension flows over Vershik automorphisms were proved by A. Bufetov in the papers [7] and [8] and have inspired our joint analogous study of horocycles presented below; for 2-step nilflows and skew-translations on 2-dimensional tori there are recent results of J. Griffin and J. Marklof [24]; finally, for horospherical foliations of geodesic flows on manifolds of negative curvature, limit theorems are not yet known in higher dimensions.

¹ This paper is derived from lectures on the asymptotics and on limit distributions of ergodic integrals for classical horocycle flows on compact surfaces given by the author at the School on Limit Theorems in Dynamical Systems held at the Bernoulli Center of the École Polytechnique Fédérale de Lausanne, Switzerland, in May 2013. The author would like to thank the organizers of the school, D. Dolgopyat, Y. Pesin, M. Pollicott and L. Stoyanov, for the opportunity to present his work there and for the pleasant and stimulating environment provided by the school.

1.1. Horocycle flows. Let $M$ be a compact hyperbolic surface, that is, the quotient of the hyperbolic plane $\mathbb{H}$ under the action of a co-compact Fuchsian group $\Gamma$ (a discrete group of isometries of the hyperbolic metric). In the Poincaré model, geodesics in $\mathbb{H}$ are arcs of circles perpendicular to the circle at infinity, while horocycles are circles tangent to it. There are natural lifts of geodesic and horocycle arcs to the unit tangent bundle $SM$ of the hyperbolic surface. For geodesic arcs such a lift is given by the tangent distribution of unit tangent vectors, while for horocycle arcs it is given by the normal distribution. Notice that for every horocycle arc there are two possible lifts. By definition, geodesic arcs in the unit tangent bundle are orbits of the geodesic flow, while horocycle arcs are orbits of the stable or unstable horocycle flow. In fact, arcs of the stable/unstable horocycle flow are stable/unstable manifolds for the geodesic flow, which is a fundamental example of an Anosov flow.

The geometric picture sketched above can be made precise in terms of homogeneous spaces and flows. In fact, up to the choice of a base point, the unit tangent bundle $S(\mathbb{H})$ can be identified with the group $PSL(2,\mathbb{R}) := SL(2,\mathbb{R})/\{I, -I\}$ of Möbius transformations which preserve the circle at infinity, hence the unit tangent bundle $SM$ can be identified with the (compact) quotient $\Gamma\backslash PSL(2,\mathbb{R})$. In the following we will consider the lift of horocycle flows to the double cover $S_\Gamma := \Gamma\backslash SL(2,\mathbb{R})$ of the unit tangent bundle $SM$, that is, we will consider the asymptotics of ergodic averages and limit distributions for $SL(2,\mathbb{R})$ unipotent flows on compact quotients.

The Lie algebra $\mathfrak{sl}(2,\mathbb{R})$ of the Lie group $SL(2,\mathbb{R})$ is the Lie algebra of $2 \times 2$ real matrices with zero trace. It has generators $\{X, Y, \Theta\}$ given by the formulas

$$ X = \begin{pmatrix} 1/2 & 0 \\ 0 & -1/2 \end{pmatrix}, \qquad Y = \begin{pmatrix} 0 & -1/2 \\ -1/2 & 0 \end{pmatrix} \qquad \text{and} \qquad \Theta = \begin{pmatrix} 0 & 1/2 \\ -1/2 & 0 \end{pmatrix}. \tag{1} $$

The above generators satisfy the commutation relations

$$ [X, Y] = -\Theta, \qquad [\Theta, X] = Y \qquad \text{and} \qquad [\Theta, Y] = -X. \tag{2} $$

The above vector fields are respectively the generators of the geodesic flow, of the perpendicular geodesic flow and of the circle subgroup (that is, the maximal compact subgroup) of $SL(2,\mathbb{R})$. The vector fields

$$ U := -Y + \Theta = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix} \qquad \text{and} \qquad V := -Y - \Theta = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \tag{3} $$

are the generators of the stable and unstable horocycle flows; in fact, the following commutation relations can be derived:

$$ [X, U] = U \qquad \text{and} \qquad [X, V] = -V. \tag{4} $$
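The commutation relations (2) and (4) can be checked directly from the matrices in (1) and (3). The following is a minimal sketch in Python with exact rational arithmetic; the helper names (`mul`, `add`, `neg`, `bracket`) are illustrative choices and do not come from the text.

```python
from fractions import Fraction


def mul(a, b):
    """Product of two 2x2 matrices given as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def add(a, b):
    """Entrywise sum of two 2x2 matrices."""
    return tuple(tuple(a[i][j] + b[i][j] for j in range(2)) for i in range(2))


def neg(a):
    """Entrywise negation of a 2x2 matrix."""
    return tuple(tuple(-x for x in row) for row in a)


def bracket(a, b):
    """Matrix commutator [a, b] = ab - ba."""
    return add(mul(a, b), neg(mul(b, a)))


half = Fraction(1, 2)

# Generators from formula (1).
X = ((half, 0), (0, -half))
Y = ((0, -half), (-half, 0))
THETA = ((0, half), (-half, 0))

# Horocycle generators from formula (3).
U = add(neg(Y), THETA)
V = add(neg(Y), neg(THETA))

# Each entry is True when the corresponding relation in (2) or (4) holds.
checks = {
    "[X, Y] = -Theta": bracket(X, Y) == neg(THETA),
    "[Theta, X] = Y": bracket(THETA, X) == Y,
    "[Theta, Y] = -X": bracket(THETA, Y) == neg(X),
    "[X, U] = U": bracket(X, U) == U,
    "[X, V] = -V": bracket(X, V) == neg(V),
}
```

Exact fractions are used instead of floating point so that the identities are tested as equalities rather than up to rounding error.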

The geodesic flow $\{g_t\}$, the stable horocycle flow $\{h^U_t\}$ and the unstable horocycle flow $\{h^V_t\}$ on $PSL(2,\mathbb{R})$ are therefore given by right multiplication by the following one-parameter groups:

$$ g_t := \begin{pmatrix} e^{t/2} & 0 \\ 0 & e^{-t/2} \end{pmatrix}, \qquad h^U_t := \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} \qquad \text{and} \qquad h^V_t := \begin{pmatrix} 1 & 0 \\ t & 1 \end{pmatrix}. \tag{5} $$

Since the quotient $S_\Gamma = \Gamma\backslash SL(2,\mathbb{R})$ is defined as a quotient under the action of the group $\Gamma$ on the left, the above one-parameter groups induce well-defined flows on the quotient. By the above definitions, horocycle flows are unipotent flows, hence they are parabolic flows. By the commutation relations in formula (4), the geodesic flow is an Anosov flow with invariant (stable and unstable) foliations given by the orbits of the horocycle flows. We remark that since the action of the above flows is by multiplication on the right, the flow $\{h^U_t\}$ is the stable horocycle flow, while the flow $\{h^V_t\}$ is the unstable horocycle flow; in fact, by the commutation relations it follows that, for all $x \in S_\Gamma$ and for all $s, t \in \mathbb{R}$,

$$ g_s \circ h^U_t(x) = \Gamma x \exp(tU)\exp(sX) = \Gamma x \exp(sX)\exp(e^{-s}tU) = h^U_{e^{-s}t} \circ g_s(x), $$

$$ g_s \circ h^V_t(x) = \Gamma x \exp(tV)\exp(sX) = \Gamma x \exp(sX)\exp(e^{s}tV) = h^V_{e^{s}t} \circ g_s(x). $$

1.2. Smooth coboundaries and invariant distributions. Our approach to the problem of limit distributions is, as we have mentioned, to prove sharp asymptotic formulas for ergodic integrals. The asymptotics of ergodic integrals is derived from the scaling of invariant distributions for the horocycle flow under the geodesic flow and from bounds on solutions of the cohomological equation. Let us recall that a distribution $D \in \mathcal{E}'(S_\Gamma)$ (in the sense of S. L. Sobolev and L. Schwartz) is a continuous linear functional on the space $C^\infty(S_\Gamma)$ of infinitely differentiable functions. An invariant distribution for the horocycle flow $\{h^U_t\}$ is any distribution $D \in \mathcal{E}'(S_\Gamma)$ such that the Lie derivative $\mathcal{L}_U D$ of the distribution in the direction of the flow vanishes:

$$ \mathcal{L}_U D = 0 \ \text{ in } \mathcal{E}'(S_\Gamma), \quad \text{that is,} \quad D(Uf) = 0 \ \text{ for all } f \in C^\infty(S_\Gamma). \tag{6} $$

The invariant volume (in general, any invariant measure) for a flow is an invariant distribution. By unique ergodicity, up to multiplicative constants the unique horocycle-invariant distribution which is a measure is given by integration with respect to the volume. The existence of horocycle-invariant distributions which are not (signed) measures was first proved by S. Greenfield and N. Wallach [25]. Horocycle-invariant distributions that are distributional eigenvectors for the geodesic flow are related to conical distributions in the sense of Helgason for the horocycle foliations of the Poincaré plane, that is, of the symmetric space $SL(2,\mathbb{R})/SO(2,\mathbb{R})$. In fact, Helgason conical distributions are defined (see for instance [29]) for horospherical foliations of symmetric spaces of non-compact type of any connected semisimple Lie group with finite center.

The importance of invariant distributions in the study of ergodic integrals of smooth functions derives from the immediate fact that they are obstructions to solving the cohomological equation. For the stable horocycle flow $\{h^U_t\}$ the cohomological equation for a function $f \in L^2(S_\Gamma)$ is the following linear PDE in the unknown $u \in L^2(S_\Gamma)$ for the Lie derivative operator:

$$ \mathcal{L}_U u = f. \tag{7} $$

If the above equation is satisfied, then the function $f \in L^2(S_\Gamma)$ is called a coboundary with transfer function $u \in L^2(S_\Gamma)$. It is clear from the above definitions that if a function $f \in C^\infty(S_\Gamma)$ is a coboundary with transfer function $u \in C^\infty(S_\Gamma)$, then $D(f) = 0$ for all horocycle-invariant distributions $D \in \mathcal{E}'(S_\Gamma)$. If the converse holds, the flow is stable in the sense of A. Katok. By the Gottschalk-Hedlund argument (in fact, by the fundamental theorem of calculus), ergodic integrals of coboundaries with bounded transfer functions are bounded for all times. This suggests that in some sense the growth of ergodic averages, at least for a stable flow, is carried by invariant distributions.

In [14] it was proved that the horocycle flows are stable and a complete description of the space of invariant distributions was given in terms of the Fourier analysis of the homogeneous space $S_\Gamma = \Gamma\backslash SL(2,\mathbb{R})$. We recall that the space $L^2(S_\Gamma)$ of complex-valued square-integrable functions has a splitting into components which are irreducible for the right action of the group $SL(2,\mathbb{R})$ on $S_\Gamma$. By the theory of irreducible unitary representations of the group $SL(2,\mathbb{R})$ (see [5]), each irreducible unitary representation is a subspace of an eigenspace of the center of the enveloping algebra of $SL(2,\mathbb{R})$, which is generated by the Casimir operator. The Casimir operator is the second order differential operator given by the formula

$$ \square := -X^2 - Y^2 + \Theta^2. \tag{8} $$

Each eigenspace of the Casimir operator is a direct sum or a direct integral of $SL(2,\mathbb{R})$-irreducible components. In our setting, since $M$ is assumed to be compact, that is, since $\Gamma$ is a co-compact lattice, every eigenvalue of the Casimir operator has finite multiplicity, in the sense that the corresponding eigenspace is a direct sum of finitely many $SL(2,\mathbb{R})$-irreducible components. Let $\sigma(\square)$ denote the spectrum of the Casimir operator with each eigenvalue repeated according to its multiplicity. The following splitting into $SL(2,\mathbb{R})$-irreducible components therefore holds:

$$ L^2(S_\Gamma) = \bigoplus_{\mu \in \sigma(\square)} H_\mu. \tag{9} $$

We remark that the restriction of the Casimir operator on $S_\Gamma$ to the space of $\Theta$-invariant functions projects to the hyperbolic Laplacian $\Delta_M$ on the hyperbolic surface $M$. Let $\sigma(\Delta_M)$ denote the spectrum of the hyperbolic Laplacian with each eigenvalue repeated according to its multiplicity. The spectrum of the Casimir operator on the double cover $S_\Gamma = \Gamma\backslash SL(2,\mathbb{R})$ of the unit tangent bundle $SM$ can be described as follows:

$$ \sigma(\square) = \sigma(\Delta_M) \cup \bigcup_{n \in \mathbb{N}\setminus\{0\}} \{\mu \in \sigma(\square) \mid \mu = -n^2 + n\}. \tag{10} $$

For each $\mu \in \sigma(\square)$ we have

$$ \bigoplus_{\mu_i = \mu} H_{\mu_i} = \{f \in L^2(S_\Gamma) \mid \square f = \mu f\}. $$

Non-trivial irreducible unitary representations of $SL(2,\mathbb{R})$ are of three types [5]: representations of the principal series ($\mu \geq 1/4$), representations of the complementary series ($0 < \mu < 1/4$) and representations of the discrete series ($\mu \leq 0$). Every irreducible representation of the principal and complementary series is generated by the lift of a unique normalized eigenfunction of the Laplacian operator on the quotient $\Gamma\backslash\mathbb{H}$ of eigenvalue $\mu \geq 1/4$ (principal series) or $0 < \mu < 1/4$ (complementary series). Irreducible components of the discrete series are orthogonal to the subspace of $\Theta$-invariant functions.

We now state the main results on invariant distributions and on the cohomological equation for horocycle flows. Let us denote by $W^s(S_\Gamma)$ the Sobolev space of order $s \in \mathbb{R}$ and let $W^s(H_\mu) := W^s(S_\Gamma) \cap H_\mu$ for every $\mu \in \sigma(\square)$. We recall that Sobolev spaces can be defined in terms of the (positive) Laplace operator (on $S_\Gamma$)

$$ \Delta_{S_\Gamma} = -X^2 - Y^2 - \Theta^2 = \square - 2\Theta^2. $$

In fact, the Sobolev space $W^s(S_\Gamma)$ is the domain of the operator $(I + \Delta_{S_\Gamma})^{s/2}$ endowed with the graph norm, that is,

$$ W^s(S_\Gamma) := \{f \in L^2(S_\Gamma) \mid (I + \Delta_{S_\Gamma})^{s/2} f \in L^2(S_\Gamma)\}. $$


The splitting in formula (9) induces a splitting of Sobolev spaces. For every s ∈ R, we have 1 W s (Hμ ) . (11) W s (SΓ ) = μ∈σ() 

Let IU (SΓ ) ⊂ E (SΓ ) be the space of all invariant distributions for the flow {hU t } and for every s > 0, let IU−s (SΓ ) := IU (SΓ ) ∩ W −s (SΓ ) . The structure of the space of horocycle-invariant distributions is described below. For every μ ∈ σ(), let E  (Hμ ) ⊂ E  (SΓ ) be the space of all distributions supported on C ∞ (Hμ ) := C ∞ (SΓ ) ∩ Hμ . For every μ ∈ σ() and every s > 0, let IU−s (Hμ ) := IU−s (SΓ ) ∩ E  (Hμ ) . √ Theorem 1. [14] For every μ ∈ R, let ν = 1 − 4μ. The following holds. For μ ∈ R+ \ {1/4} (principal and complementary series), the space IU (Hμ ) has a basis {Dμ+ , Dμ− } ⊂ IU−s (Hμ ), for every s > (1 + Re(ν))/2, such that 1±ν ± Dμ . 2 For μ = 1/4 (exceptional value of the principal series), the space IU (Hμ ) has a basis {Dμ+ , Dμ− } ⊂ IU−s (Hμ ), for every s > 1/2, such that  +    + 1 1 0 Dμ Dμ LX =− . Dμ− Dμ− 2 1 1 LX Dμ± = −

For μ = −n2 + n < 0, n ∈ N \ {0} (discrete series), the space IU (Hμ ) has a basis {Dμ+ } ⊂ IU−s (Hμ ), for every s > (1 + Re(ν))/2 = n, such that 1+ν + Dμ = −nDμ+ . 2 We remark that since horocycle flows for compact surfaces are uniquely ergodic, they have no signed invariant measures (of finite total mass) besides scalar multiples of the phase space volume. Even for non-compact, finite area surfaces there are few invariant measures, supported on periodic orbits. By Theorem 1 the picture is completely different for the action of the horocyle flow on the larger space of distributions: the space of all invariant distributions has countable dimension. The construction of horocycle-invariant distributions is based on Fourier analysis and no dynamical construction or representation of horocycle-invariant distributions is known. Equivalent conjectures of Greenfield-Wallach [25] and A. Katok (see for instance [30]) state that the only smooth stable flows with a one-dimensional space of invariant distributions are flows smoothly conjugated to Diophantine linear flows on tori. This conjecture is fully proved only in dimension 2 and 3, while in higher dimension there is an interesting partial result by F. and J. Rodriguez Hertz (see [18] and references therein). In all non-toral examples of smooth stable uniquely ergodic flows for which invariant distributions could be constructed (for instance area-preserving flows on higher genus surfaces [17], horocyle flows [14] and nilflows [15]) the space of all invariant distributions has countable dimension. In all such cases the construction is based on harmonic analysis. These results have been recently extended to all non-toral homogenous flows [16]. For general nonhomogeneous flows on compact manifolds the problem of existence of a space of LX Dμ+ = −


invariant distributions of higher or countable dimension is wide open. Recently A. Avila, A. Kocsard and B. Fayad [1] have constructed examples of uniquely ergodic smooth flows on manifolds not homeomorphic to a torus with a space of invariant distributions reduced to the line of invariant measures.

Another approach to invariant distributions for horocycle flows is based on the Patterson-Sullivan point of view. In fact, by decomposing lifts to the universal cover with respect to the geodesic flow, the stable/unstable horocycle flow and the boundary circle, it is possible to establish a one-to-one correspondence between horocycle-invariant distributions and $\Gamma$-invariant conformal (or simply $\Gamma$-conformal) distributions on the boundary circle. These in turn can be obtained as "boundary values" of bounded $\Gamma$-invariant eigenfunctions of the Laplace-Beltrami operator on the hyperbolic plane (see for instance [43]) or of holomorphic and anti-holomorphic forms. By this construction and by a result of J.-P. Otal [41], S. Cosentino [11] has proved that horocycle-invariant distributions are in fact Hölder (that is, they are continuous on the appropriate spaces of Hölder functions) with Hölder exponent equal to their Sobolev order. Since measures are just positive distributions, it follows in particular that for compact surfaces the horocycle flow is uniquely ergodic, as the only positive harmonic functions on a compact surface are the positive constant functions. However, the Laplace-Beltrami operator has a countable basis of eigenfunctions (and the space of holomorphic forms also has countable dimension), hence the space of all horocycle-invariant distributions has countable dimension. It is also clear from the above discussion that whenever the Laplace-Beltrami operator on a (non-compact) hyperbolic surface admits non-constant positive (unbounded) eigenfunctions, then the space of all horocycle-invariant distributions contains horocycle-invariant Radon measures. In fact, all horocycle-invariant Radon measures which are quasi-invariant under the action of the geodesic flow can be described in this way [33], [35]. In the case of Abelian covers of compact surfaces these measures were constructed by M. Babillot and F. Ledrappier [2].

The following theorem states that horocycle flows are stable (in the sense of A. Katok) and that the Green operator (the solution operator) satisfies Sobolev estimates (with a loss of $1 + \epsilon$ derivatives).

Theorem 2. [14] Let $f \in W^s(S\Gamma)$ for $s > 1$. If $D(f) = 0$ for all $D \in I_U^{-s}(S\Gamma)$, then there exists a unique function $u \in L^2(S\Gamma)$ such that $\mathcal{L}_U u = f$. In fact, the function $u$ belongs to $W^t(S\Gamma)$ for every $t < s - 1$. In addition, for every $s > 1$ and every $t < s - 1$ there exists a constant $C_{s,t} > 0$ such that, for all $f \in W^s(S\Gamma)$ such that $D(f) = 0$ for all $D \in I_U^{-s}(S\Gamma)$, we have
$$\|u\|_t \le C_{s,t} \|f\|_s .$$

Our approach to the asymptotics of ergodic integrals is, not surprisingly, based on renormalization under the action of the geodesic flow. Theorem 1 completely describes the action of the geodesic flow (in fact, of its generator) on the space of horocycle-invariant distributions. This point of view is related to the proof of exponential decay of correlations for Anosov diffeomorphisms and flows by the methods of Liverani [37], Gouëzel-Liverani [23], Baladi-Tsujii [4], Baladi-Gouëzel [3] (see for instance, the recent work [22] by P. Giulietti, C. Liverani and M. Pollicott on


zeta functions and references therein). In fact, since the orbit foliation of the stable/unstable horocycle flow is the stable/unstable foliation for the geodesic flow, which is a uniformly hyperbolic (Anosov) flow, polynomial bounds on the equidistribution of long horocycles establish a strong form of exponential decay of correlations for the geodesic flow. In fact, our analysis of the cohomological equation and invariant distributions for horocycle flows does imply exponential decay of correlations for the geodesic flow. The proof of exponential decay of correlations for geodesic flows on compact surfaces of negative curvature goes back to D. Dolgopyat’s thesis [13]. Dolgopyat’s work was later generalized by C. Liverani [37] by an approach based on the spectral analysis of the transfer operator of the flow on appropriate spaces of currents. In this approach exponential decay of correlations follows from bounds on the radius of the essential spectrum of the transfer operator. Our analysis shows that for surfaces of constant curvature, the spectrum of the transfer operator is entirely pure-point on a space of horocycle-invariant distributions, that is on the space of distributions perpendicular to the space of smooth coboundaries for the horocycle flow. In particular, the transfer operator of the geodesic flow has (countable) pure point spectrum on a space of currents which is sufficiently large to include all horocycle arcs. We remark that these results are proved without recourse to the spectral theory of unbounded operators, since in our special case it is possible to give an explicit description of the distributional eigenvectors by Fourier analysis, that is, by the theory of unitary representations of the group SL(2, R), as we will outline below in section 2. In the general case, it is known by Liverani’s result [37] that geodesic flows on compact manifolds of negative curvature have exponential decay of correlations (and more is known for instance from [22]). 
It is therefore natural to ask whether by our methods it would be possible to derive results on the limit distributions of horospheres (stable and unstable foliations). There are several difficulties to overcome to achieve such a goal. In particular, to the best of the author's knowledge it is not known under what conditions the spectrum of the transfer operator has a pure point component and, even in that case, available methods are based on spaces of currents which do not seem to include bounded subsets (with smooth boundary) of horospheres. Presumably, a generalization to higher dimension is possible in the constant curvature case, along the lines explained below for the case of surfaces, assuming that it is possible to carry out the analysis of the related higher dimensional cohomological equation for horospheres by methods of representation theory.

1.3. Asymptotics of ergodic integrals. We state below our main result on the asymptotics of ergodic averages for smooth functions. It states that ergodic integrals can be well-approximated by certain Hölder cocycles which can be decomposed with respect to a basis of fundamental Hölder cocycles. The fundamental Hölder cocycles have a key exact scaling property under the action of the geodesic flow. Our results on limit distributions are immediate consequences of our approximation theorem and of the exact scaling properties of the fundamental Hölder cocycles. For every $\mu \in \sigma(\Delta_M)$, there exist fundamental Hölder cocycles $\beta^\pm_\mu : S\Gamma \times \mathbb{R} \to \mathbb{C}$ for the horocycle flow $\{h^U_t\}$. The following theorem summarizes their properties.

Theorem 3. [9] For every $\mu \in \sigma(\Delta_M)$, there exists $\beta^\pm_\mu : S\Gamma \times \mathbb{R} \to \mathbb{C}$ with the following properties.


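(1) (Cocycle property) For all $x \in S\Gamma$ and all $S, T \in \mathbb{R}$,
$$\beta^\pm_\mu(x, S + T) = \beta^\pm_\mu(h^U_S(x), T) + \beta^\pm_\mu(x, S) ;$$

(2) (Geodesic scaling) For $\mu \neq 1/4$, for all $x \in S\Gamma$ and for all $t, T \in \mathbb{R}$,
$$\beta^\pm_\mu(g_{-t} x, T e^t) = \exp\Big(\frac{1 \mp \nu}{2}\, t\Big)\, \beta^\pm_\mu(x, T) ,$$
and for $\mu = 1/4$ ($\nu = 0$), for all $x \in S\Gamma$ and for all $t, T \in \mathbb{R}$,
$$\begin{pmatrix} \beta^+_\mu(g_{-t} x, T e^t) \\ \beta^-_\mu(g_{-t} x, T e^t) \end{pmatrix} = \exp\Big(\frac{t}{2}\Big) \begin{pmatrix} 1 & -t/2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta^+_\mu(x, T) \\ \beta^-_\mu(x, T) \end{pmatrix} ;$$

(3) (Hölder property) There exists a constant $C > 0$ such that, for all $(x, T) \in S\Gamma \times \mathbb{R}$, for all $\mu \neq 1/4$,
$$|\beta^\pm_\mu(x, T)| \le C |T|^{\frac{1 \mp \mathrm{Re}\,\nu}{2}} ,$$
and for $\mu = 1/4$ ($\nu = 0$),
$$|\beta^+_{1/4}(x, T)| \le C |T|^{\frac{1}{2}} (1 + |\log T|) , \qquad |\beta^-_{1/4}(x, T)| \le C |T|^{\frac{1}{2}} ;$$

(4) (Orthogonality) For any $T \in \mathbb{R}$, the bounded functions $\beta^\pm_\mu(\cdot, T)$ belong to the irreducible component $H_\mu \subset L^2(S\Gamma)$.

For every zero average $f \in W^s(S\Gamma)$ ($s > 11/2$), let
$$\beta_f := \sum_{\mu \in \sigma(\Delta_M)} D^+_\mu(f) \beta^+_\mu + D^-_\mu(f) \beta^-_\mu . \tag{12}$$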

It follows from Theorem 3 and from the decay properties of the sequence of coefficients $\{D^\pm_\mu(f) \mid \mu \in \sigma(\Delta_M)\}$ that the series in formula (12) converges and defines a Hölder cocycle with the following property. There exist a constant $C_f > 0$ and numbers $0 < \alpha_f^- < \alpha_f^+ < 1$ such that, for all $(x, T) \in S\Gamma \times \mathbb{R}$,
$$|\beta_f(x, T)| \le C_f \max\big\{ |T|^{\alpha_f^-}, |T|^{\alpha_f^+} \big\} . \tag{13}$$
We are now ready to state our approximation result.

Theorem 4. [9] Let $s > 11/2$. For every zero average function $f \in W^s(S\Gamma)$, the following approximation result holds: there exists a constant $C_s > 0$ such that
$$\Big| \int_0^T f \circ h^U_t(x)\, dt - \beta_f(x, T) \Big| \le C_s \|f\|_s \,(1 + \max\{\log T, 0\}) . \tag{14}$$
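The scaling law in Theorem 3 (2) can be sanity-checked numerically. The following is a minimal sketch, not part of the original argument: it assumes the scaling matrix at $\mu = 1/4$ has the form $e^{t/2}\begin{pmatrix} 1 & -t/2 \\ 0 & 1 \end{pmatrix}$ and checks that these matrices compose like a one-parameter group, as they must since rescaling by time $s$ and then by time $t$ is rescaling by time $s + t$. It also evaluates the Hölder exponents $(1 \mp \mathrm{Re}\,\nu)/2$ of Theorem 3 (3). The function names and the sample Casimir parameters are hypothetical choices for illustration.

```python
import cmath
import math


def scaling_matrix(t):
    """Return exp(t/2) * [[1, -t/2], [0, 1]], the assumed scaling matrix at mu = 1/4."""
    e = math.exp(t / 2.0)
    return [[e, -e * t / 2.0], [0.0, e]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def holder_exponent(mu, sign):
    """Exponent (1 -/+ Re nu)/2 for beta^+ (sign=+1) or beta^- (sign=-1), mu != 1/4."""
    nu = cmath.sqrt(1 - 4 * mu)  # nu is real for mu < 1/4, purely imaginary for mu > 1/4
    return (1 - sign * nu.real) / 2
```

For a complementary-series parameter the two exponents differ and straddle $1/2$, while for every principal-series parameter both equal $1/2$, which is the square-root growth one would also see under a central limit theorem.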

Roughly speaking, the Hölder cocycles $\beta^\pm_\mu$ are constructed as follows. For every $(x, T) \in S\Gamma \times \mathbb{R}^+$, the complex number $\beta^\pm_\mu(x, T)$ is a dynamically defined component of the distribution defined by integration along the orbit segment $\gamma_{x,T} = \{h^U_t(x) \mid 0 \le t \le T\}$ in the direction of the invariant distribution $D^\pm_\mu \in \mathcal{E}'(S\Gamma)$. The core of the argument consists in proving that the dynamically defined projection is well-defined by the following construction. The distribution given by the horocycle arc $\gamma_{x,T}$ is pushed under the action of the geodesic flow for a time $t > 0$. Its orthogonal components in the direction of the invariant distributions $D^\pm_\mu$ with respect to a Sobolev norm


converge after normalization to $\beta^\pm_\mu(x, T)$ as $t$ gets large. We remark that the distribution $\gamma_{x,T}$ belongs to $W^{-s}(S\Gamma)$ for all $s > 3/2$ by the Sobolev embedding theorem, while by Theorem 1 the distributions $D^\pm_\mu$ belong to $W^{-s}(S\Gamma)$ for all $s > 1$.

Finally, we would like to point out that the above results lead to a complete description of the asymptotics and of the limit distributions of ergodic averages for all smooth functions with the exception of finitely many functions. In fact, for any smooth function $f \in W^s(S\Gamma)$ ($s > 11/2$) supported on the irreducible components of the principal or complementary series, Theorem 4 can be stated in the following stronger form: for all $(x, T) \in S\Gamma \times \mathbb{R}$,
$$\Big| \int_0^T f \circ h^U_t(x)\, dt - \beta_f(x, T) \Big| \le C_s \|f\|_s . \tag{15}$$

In other words, the logarithmic error term comes from the discrete series. However, if the function $f \in W^s(S\Gamma)$ ($s > 1$) is supported on an irreducible component of the discrete series with $\mu = -n^2 + n < 0$ ($n \ge 2$), then by Theorem 2, the function $f$ is a coboundary with at least square-integrable transfer function. In fact, by Theorem 1 the invariant distribution $D^+_\mu$ belongs to $W^{-s}(H_\mu)$ for $s > n$ and it can be proved that $D^+_\mu$ does not belong to $W^{-s}(H_\mu)$ for $s < n$. Since ergodic integrals of coboundaries are completely described (by the fundamental theorem of calculus) in terms of the transfer function, only smooth functions supported on irreducible components of type $H_0$ ($\mu = 0$, $n = 1$) which are not coboundaries are not included in the above asymptotics. Up to coboundaries, there are only finitely many such functions, since irreducible components of type $H_0$ appear with finite multiplicity and there is a unique (normalized) invariant distribution in every component. It was proved in [14] that ergodic integrals for smooth functions supported on irreducible components of type $H_0$ grow at most logarithmically (hence the error term in Theorem 4), but no asymptotics was established in this case. Results on the limit distributions of ergodic integrals for such functions can be derived from work of F. Ledrappier and O. Sarig [34] on fluctuations of ergodic integrals for horocycle flows on "$\mathbb{Z}^d$-covers" of compact surfaces (we owe this observation to D. Dolgopyat).

1.4. Limit distributions. In these lectures we will describe the limit distributions of the ergodic integrals of smooth functions for classical horocycle flows on compact surfaces. Let $f \in C^0(S\Gamma)$ be any continuous function of zero average. By the unique ergodicity of the (stable) horocycle flow it follows that
$$\frac{1}{T} \int_0^T f \circ h^U_t \, dt \to 0$$
uniformly on $S\Gamma$ as $T > 0$ gets large. We are interested in sharp bounds on the ergodic integrals of sufficiently smooth functions, on the growth of their variance (that is, of their $L^2$ norm) and on the existence and properties of their limits in distribution as $T > 0$ gets large. Results on the growth of the variance and on limit distributions will be derived in our case from sharp asymptotics. We recall that for hyperbolic (Anosov) flows (such as the geodesic flow) the answer to these questions is derived along the lines of the theory of independent identically distributed random variables. According to this theory, the speed of convergence of ergodic integrals is given by the Law of the Iterated Logarithm (LIL) and limit distributions are described by the Central Limit Theorem (CLT). Our


work on the horocycle flow was in part motivated by a question of Y. Sinai who asked whether the central limit theorem holds in this case. We recall that the CLT holds for a flow $(\phi_t)$ on a probability space $(\Omega, \mu)$ for a zero average function $f \in L^2(\Omega, \mu)$ if in the sense of probability distributions
$$\frac{1}{T^{1/2}} \int_0^T f \circ \phi_t \, dt \to N(\sigma_f, 0) ,$$
where $N(\sigma_f, 0)$ is a Gaussian (normal) distribution on the real line of mean zero and variance $\sigma_f > 0$. The CLT in particular implies a universal square-root growth rate of the variance with time.

The main result of our analysis is that the CLT fails for horocycle flows for several reasons. Namely, the growth of the variance may follow a power-law with a different exponent (which depends on the hyperbolic surface and on the function) and limit distributions are always compactly supported. We prove that limit distributions exist for generic functions on hyperbolic surfaces with small eigenvalues of the Laplacian, but we are unable to decide whether there exist functions for which the limit distribution fails to exist. We do prove that for smooth functions in a subspace which has in general finite codimension, except when the hyperbolic surface has no small eigenvalues, the random variables $E_t(f, T)$ are asymptotic to a linear flow on a torus (of finite or infinite dimension) in the space of random variables. The latter statement is an indication that the limit distribution may fail to be unique. However, we are not able to exclude that all random variables in the limit torus have identical distribution.

We conclude this introduction with the statements of our main results on limit distributions of ergodic integrals. In the following, given $f \in L^2(S\Gamma)$ with zero average, we will denote for all $t > 0$ and for all $T \in [0, 1]$,
$$E_t(f, T) := \frac{\int_0^{e^t T} f \circ h^U_\tau \, d\tau}{\big\| \int_0^{e^t T} f \circ h^U_\tau \, d\tau \big\|_0} . \tag{16}$$
The function $E_t(f, T)$ denotes the ergodic integral of the function $f \in L^2(S\Gamma)$, as a function on $S\Gamma$, after normalization of its variance, that is, of its $L^2$ norm. We are interested in the existence and properties of limit points of the probability distributions on the real line defined as follows: for any $a < b$, let
$$M_t(f, T)(a, b) = \mathrm{vol}\{x \in S\Gamma \mid a < E_t(f, T)(x) < b\} . \tag{17}$$

Let $f$ be a smooth function with non-zero orthogonal projection onto irreducible components of the complementary series. Let
$$f = \sum_{\mu \in \sigma(\Box)} f_\mu$$
denote the decomposition of $f$ with respect to a splitting of the space $L^2(S\Gamma)$ into irreducible components. Let
$$\mu_f := \min\{\mu \in \sigma(\Box) \setminus \{0\} \mid f_\mu \neq 0\}$$
and let $H_1, \dots, H_k \subset L^2(S\Gamma)$ be all the irreducible components of Casimir parameters $\mu_1 = \dots = \mu_k = \mu_f$. Let $\{D_1^\pm, \dots, D_k^\pm\}$ denote the basis of distributional eigenvectors of the geodesic flow of the space of invariant distributions for the horocycle flow


supported on $\mathcal{D}'(H_1) \oplus \dots \oplus \mathcal{D}'(H_k)$. Let $\{\beta_1^\pm, \dots, \beta_k^\pm\}$ be the corresponding cocycles for the horocycle flow. For every $T \in \mathbb{R}$, let $P_{cp}(f, T)$ denote the probability distribution on the real line of the random variable on $S\Gamma$
$$\frac{\sum_{i=1}^k D_i^-(f)\, \beta_i^-(\cdot, T)}{\Big( \sum_{i=1}^k |D_i^-(f)|^2 \, \|\beta_i^-\|_0^2 \Big)^{1/2}} .$$
By Theorem 4 on the asymptotics of ergodic integrals and by the orthogonality of cocycles, the above function is bounded, non-constant and has zero average on $S\Gamma$. The probability measure $P_{cp}(f, T)$ is therefore non-atomic and has compact support on the real line. Let $d_{LP}$ denote the Lévy-Prohorov metric on the space of probability measures on the real line. We recall that on any separable metric space, hence, in particular, on the real line, the Lévy-Prohorov metric induces the weak$^*$ topology on the space of probability measures (see, e.g. [6]).

Theorem 5. [9] There exists a constant $\alpha > 0$ depending only on the surface $M$ such that the following holds. For any $s > 11/2$ there exists a constant $C_s > 0$ depending only on $s$ such that the following holds. Let $f \in W^s(S\Gamma)$ be any real-valued function of zero average such that the Casimir parameter $\mu_f \in (0, 1/4)$ and $(D_1^-(f), \dots, D_k^-(f)) \neq (0, \dots, 0)$. Let $\nu_f := \sqrt{1 - 4\mu_f} \in (0, 1)$. Then

(1) For all $T \in [0, 1]$ and all $t > 0$, we have
$$\left| \frac{\big\| \int_0^{T e^t} f \circ h^U_\tau(x)\, d\tau \big\|_0}{e^{\frac{1 + \nu_f}{2} t} \Big( \sum_{i=1}^k |D_i^-(f)|^2 \, \|\beta_i^-\|_0^2 \Big)^{1/2}} - 1 \right| \le C_s \|f\|_s \, e^{-\alpha t} .$$

(2) For any $T \in [0, 1]$ we have the convergence in distribution $M_t(f, T) \to P_{cp}(f, T)$ as $t \to \infty$, with the following estimate that holds for all $t > 0$ uniformly in $T \in [0, 1]$:
$$d_{LP}\big(M_t(f, T), P_{cp}(f, T)\big) \le C_s \|f\|_s \, e^{-\alpha t} .$$

Now we prove that, for sufficiently smooth functions supported on irreducible components of the principal series, normalized ergodic integrals converge in distribution on $S\Gamma$ to a quasi-periodic motion on an infinite-dimensional torus. Let $\{\mu_n\}$ be the sequence of Casimir parameters in the interval $(1/4, +\infty)$ (listed with multiplicities). The isotypical components of the decomposition of $L^2(S\Gamma)$ into irreducible representations, being eigenspaces of the Casimir operator, are closed under complex conjugation. It follows that there exists an orthogonal decomposition of $L^2(S\Gamma)$ into irreducible components each closed under complex conjugation. Let $\{D^\pm_{\mu_n}\}$ denote the corresponding sequence of horocycle-invariant distributions, and let $\{\beta^\pm_{\mu_n}\}$ be the sequence of additive Hölder cocycles described in Theorem 3. By the characterization of the distributions $\{D^\pm_{\mu_n}\}$ as distributional eigenvectors of the geodesic flow and by the construction of the cocycles $\{\beta^\pm_{\mu_n}\}$, it follows that, for all $n \in \mathbb{N}$,
$$D^-_{\mu_n} = \overline{D^+_{\mu_n}} \quad \text{and} \quad \beta^-_{\mu_n} = \overline{\beta^+_{\mu_n}} . \tag{18}$$


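For any $s > 11/2$, let $f \in W^s(S\Gamma)$ be a real-valued function supported on irreducible components of the principal series. By definition and by formula (18), the cocycle $\beta_f : S\Gamma \times \mathbb{R} \to \mathbb{C}$ is also real-valued, and from (12) we have
$$\beta_f(x, T) = \mathrm{Re}\Big[ \sum_{n \in \mathbb{N}} D^+_{\mu_n}(f)\, \beta^+_{\mu_n}(x, T) \Big] , \quad \text{for all } (x, T) \in S\Gamma \times \mathbb{R} .$$

Let $\mathbb{T}^\infty := (\mathbb{R}/2\pi\mathbb{Z})^\infty$ be the infinite-dimensional torus endowed with the product topology. We will consider the linear flow on $\mathbb{T}^\infty$ with frequency vector $\upsilon/2 \in \mathbb{R}^\infty$ defined as follows. For all $n \in \mathbb{N}$, let $\upsilon_n := \sqrt{4\mu_n - 1} \in \mathbb{R}^+$, and let
$$\upsilon := (\upsilon_1, \dots, \upsilon_n, \dots) .$$
For any real-valued function $f \in W^s(S\Gamma)$ supported on irreducible unitary components of the principal series and for all $\theta \in \mathbb{T}^\infty$, let
$$\beta_f(\theta, x, T) := \mathrm{Re}\Big[ \sum_{n \in \mathbb{N}} D^+_{\mu_n}(f)\, e^{i\theta_n} \beta^+_{\mu_n}(x, T) \Big] , \quad \text{for all } (x, T) \in S\Gamma \times \mathbb{R} .$$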
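To make the linear flow concrete, here is a small sketch restricted to finitely many coordinates of the torus. It assumes only the formula $\upsilon_n = \sqrt{4\mu_n - 1}$ from the text; the function names and the sample Casimir parameters used to exercise it are hypothetical and do not come from any particular surface.

```python
import math


def frequencies(casimir_params):
    """upsilon_n = sqrt(4 * mu_n - 1) for principal-series parameters mu_n > 1/4."""
    ups = []
    for mu in casimir_params:
        if mu <= 0.25:
            raise ValueError("the principal series (away from mu = 1/4) requires mu > 1/4")
        ups.append(math.sqrt(4.0 * mu - 1.0))
    return ups


def torus_point(casimir_params, t):
    """Point upsilon * t / 2 on the truncated torus (R / 2 pi Z)^N at time t."""
    return [(u * t / 2.0) % (2.0 * math.pi) for u in frequencies(casimir_params)]
```

Each coordinate rotates at constant angular speed $\upsilon_n / 2$, which matches the modulus-one factor in the geodesic scaling of Theorem 3 when $\nu = i\upsilon_n$ is purely imaginary. Whether the orbit is dense or periodic depends on the rational relations among the $\upsilon_n$, which this sketch does not examine.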

For $\theta \in \mathbb{T}^\infty$ and $T > 0$, let $P_{pr}(f, \theta, T)$ be the probability distribution of the random variable given by the formula
$$\frac{\beta_f(\theta, x, T)}{\|\beta_f(\theta, \cdot, T)\|_0} , \quad \text{for all } x \in S\Gamma . \tag{19}$$
Since the random variables $\beta_f(\theta, \cdot, T)$ on $S\Gamma$ are non-constant and bounded, the probability distributions $P_{pr}(f, \theta, T)$ are non-atomic compactly supported measures on the real line (uniformly with respect to $T \in [0, 1]$). Our main result on the asymptotics of distributions of normalized ergodic integrals for real-valued functions supported on the principal series is the following.

Theorem 6. [9] For any $s > 11/2$ there exists a constant $C_s > 0$ such that the following holds for any real-valued function $f \in W^s(S\Gamma)$ supported on the irreducible components of the principal series such that $\{D^+_{\mu_n}(f)\} \neq 0$ in $\ell^1(\mathbb{N}, \mathbb{C})$.

(1) For all $T \in [0, 1]$ and all $t > 0$, we have
$$\left| \frac{\big\| \int_0^{T e^t} f \circ h^U_\tau(x)\, d\tau \big\|_0}{e^{\frac{t}{2}} \, \|\beta_f(\frac{\upsilon t}{2}, \cdot, T)\|_0} - 1 \right| \le C_s \|f\|_s \, e^{-\frac{t}{2}} .$$

(2) For all $T \in [0, 1]$ and all $(x, t) \in S\Gamma \times \mathbb{R}^+$, we have
$$\left| E_t(f, T)(x) - \frac{\beta_f(\frac{\upsilon t}{2}, x, T)}{\|\beta_f(\frac{\upsilon t}{2}, \cdot, T)\|_0} \right| \le C_s \|f\|_s \, e^{-\frac{t}{2}} .$$

(3) For all $T \in [0, 1]$ and all $t > 0$, we have
$$d_{LP}\Big( M_t(f, T), P_{pr}\big(f, \tfrac{\upsilon t}{2}, T\big) \Big) \le C_s \|f\|_s \, e^{-\frac{t}{2}} .$$

The above theorem implies that for real-valued functions supported on the principal series limit distributions exist along sequences of times such that the orbit of the toral translation of frequency $\upsilon/2 \in \mathbb{R}^\infty$ on the infinite torus $\mathbb{T}^\infty$ converges. We conjecture that the limit does not exist otherwise. In fact, for generic surfaces the limit can exist if and only if the probability distributions $P_{pr}(f, \theta, T)$ are constant on $\mathbb{T}^\infty$. This seems highly unlikely as it leads to significant symmetry restrictions on the probability distributions for the cocycles $\{\beta^\pm_{\mu_n}(x, T)\}$ (see Corollary 1.5 in [9]). In the unlikely case that the distributions $P_{pr}(f, \theta, T)$ are constant on $\mathbb{T}^\infty$ we


prove that they would be independent of the function for all functions supported on the principal series which are not coboundaries (see Corollary 1.4 in [9]).

2. The cohomological equation

In this section we outline the proof of Theorems 1 and 2, which describe solutions of the cohomological equation and invariant distributions for the horocycle flow, following [14]. Our analysis is based on the theory of unitary representations for the group $SL(2, \mathbb{R})$ developed for instance in [5], [20] and [21]. This section contains the core analytic results which underlie all of our results on the asymptotics of ergodic integrals and on limit distributions of horocycle flows.

2.1. Reduction to irreducible components. As a first step, we will reduce the study to a single non-trivial irreducible component $H_\mu$ of Casimir parameter $\mu \in \sigma(\Box)$. We remark that Theorem 1 on invariant distributions already describes the space $I_U(H_\mu)$ of horocycle-invariant distributions supported on a single non-trivial irreducible component. Solutions of the cohomological equation for a single non-trivial irreducible component are given by the following result.

Theorem 7. [14] Let $\nu = \sqrt{1 - 4\mu}$. For every $s > (1 + \mathrm{Re}(\nu))/2$ and for every $t < s - 1$, there exists a constant $C(\mu, s, t) > 0$ such that the following holds. For any $f \in W^s(H_\mu)$ such that $D(f) = 0$ for all $D \in I_U(H_\mu)$, there exists a solution $u \in H_\mu$ of the cohomological equation $\mathcal{L}_U u = f$ such that $u$ belongs to $W^t(H_\mu)$ for all $t < s - 1$ and satisfies the following bound:
$$\|u\|_t \le C(\mu, s, t) \|f\|_s .$$

Notice that the above theorem is an abstract result on irreducible representations of the group $SL(2, \mathbb{R})$ on a Hilbert space $H$. In applications to general unitary representations, to construct the solution of the cohomological equation $\mathcal{L}_U u = f$, we decompose the given vector $f$ into irreducible components $f_\mu \in H_\mu$, that is,
$$f = \sum_{\mu \in \sigma(\Box)} f_\mu ,$$
then for each $\mu \in \sigma(\Box)$ we consider the solution $u_\mu \in H_\mu$ of the cohomological equation $\mathcal{L}_U u_\mu = f_\mu$ and write the formal solution
$$u := \sum_{\mu \in \sigma(\Box)} u_\mu . \tag{20}$$
Notice that the splitting into irreducible components is by definition invariant under the unitary action of $SL(2, \mathbb{R})$ and orthogonal, hence $f \in W^s(H)$ if and only if $f_\mu \in W^s(H_\mu)$ for all $\mu \in \sigma(\Box)$; in fact
$$\|f\|_s^2 = \sum_{\mu \in \sigma(\Box)} \|f_\mu\|_s^2 .$$
It follows that under the assumption that $f \in W^s(H)$ for $s > 1$, by Theorem 7 the solutions $u_\mu \in W^t(H_\mu)$ for $t < s - 1$ exist for all $\mu \in \sigma(\Box)$. It only remains to prove that the series in formula (20) converges in $W^t(H)$. The key to the solution of this "gluing" problem lies in the Sobolev estimates given in


Theorem 7. In fact, assume that such estimates are uniform with respect to the representation parameter, that is, assume that
$$C(s, t) = \sup_{\mu \in \sigma(\Box)} C(\mu, s, t) < +\infty . \tag{21}$$
By Theorem 7 and by orthogonality of the splitting, the following estimate then holds:
$$\|u\|_t^2 = \sum_{\mu \in \sigma(\Box)} \|u_\mu\|_t^2 \le C(s, t)^2 \sum_{\mu \in \sigma(\Box)} \|f_\mu\|_s^2 = C(s, t)^2 \|f\|_s^2 .$$
It follows from the proof of Theorem 7, which will be outlined below, that the uniform estimate given by formula (21) holds if and only if the given unitary representation of the group $SL(2, \mathbb{R})$ on the Hilbert space $H$ has a spectral gap, that is, the trivial representation is simple and it is isolated, in the sense that the bottom of the positive part of the Casimir spectrum is strictly positive:
$$\mu_0 := \inf\{\mu \mid \mu \in \sigma(\Box) \cap \mathbb{R}^+\} > 0 . \tag{22}$$

In our situation, for representations of $SL(2, \mathbb{R})$ (in fact, of $PSL(2, \mathbb{R})$) on the Hilbert space $H := L^2(S\Gamma)$ of square-integrable functions on the double cover $S\Gamma$ of the unit tangent bundle $SM$ of a hyperbolic surface $M$, the bottom of the positive part of the Casimir spectrum coincides with the bottom of the spectrum of the Laplacian on the surface:
$$\mu_0 = \inf\{\mu \mid \mu \in \sigma(\Delta_M) \cap \mathbb{R}^+\} .$$
Theorem 2 (for compact hyperbolic surfaces or more generally for hyperbolic surfaces of finite total area) then follows from Theorem 7 by the above argument.

We now proceed to outline the proof of Theorem 1 and Theorem 7. In the case of $SL(2, \mathbb{R})$ a rather elegant proof, based only on the structure of irreducible representations, which entirely avoids working with concrete models, is possible. In fact, this was the original approach followed in [14]. This structural approach has proved hard to generalize to other cases or to analogous problems, for which unitary models of irreducible representations seem more effective. For this reason, after an outline of the original abstract approach, we will briefly outline a different approach based on unitary models.

2.2. Bases of irreducible components. The operators
$$\eta_+ = X - iY \quad \text{and} \quad \eta_- = X + iY$$
of Cauchy-Riemann type play the role of creation and annihilation operators respectively for the spectrum of the circle action of generator $\Theta$ (that is, the action of the maximal compact subgroup). In fact, the following commutation relations hold:
$$[-i\Theta, \eta_\pm] = \pm \eta_\pm .$$
It follows immediately from the above commutation relations that for every smooth vector $v_n$ such that $\Theta v_n = i n v_n$ we have
$$\Theta(\eta_\pm v_n) = i(n \pm 1)(\eta_\pm v_n) .$$
Irreducible components $H_\mu$ of the principal and complementary series ($\mu > 0$) are characterized as follows: there exists a $\Theta$-invariant smooth vector $v_0 \in C^\infty(H_\mu)$. Since $v_0 \in H_\mu$, by definition $\Box v_0 = \mu v_0$, and since $\Theta v_0 = 0$, it follows that
$$-(X^2 + Y^2) v_0 = \Box v_0 = \mu v_0 .$$


Since by $\Theta$-invariance the function $v_0$ is the lift to $S\Gamma$ of a smooth function $\bar{v}_0$ on the surface $M$, it follows that
$$\Delta_M \bar{v}_0 = -(X^2 + Y^2) \bar{v}_0 = \mu \bar{v}_0 ,$$
that is, the vector $v_0$ is the lift of an eigenfunction of the Laplace-Beltrami operator on the hyperbolic surface. It can be proved that such a vector is unique up to multiplicative scalars in every irreducible component $H_\mu$ of the principal and complementary series. In addition, the system
$$\{(\eta_-)^{|k|} v_0 \mid k < 0\} \cup \{v_0\} \cup \{(\eta_+)^k v_0 \mid k > 0\} \tag{23}$$

is an orthogonal basis of $H_\mu$ of eigenvectors of the circle action.

Irreducible components $H_\mu$ of the discrete series ($\mu \le 0$) come in two flavors: the holomorphic and the anti-holomorphic discrete series. Since the anti-holomorphic components are unitarily equivalent to the holomorphic ones by complex conjugation, it will be sufficient to describe the holomorphic discrete series. Let $H_\mu$ be an irreducible component of the holomorphic discrete series of Casimir parameter $\mu = -n^2 + n \le 0$. There exists a holomorphic $\Theta$-eigenfunction $v_n \in C^\infty(H_\mu)$, that is, a smooth vector $v_n$ such that
$$\Theta v_n = i n v_n \quad \text{and} \quad \eta_- v_n = 0 .$$
This conclusion follows from the remark that, if no $\Theta$-invariant vector belongs to $H_\mu$, then since $\eta_\pm$ are creation/annihilation operators either there exists a non-zero vector $v^+ \in C^\infty(H_\mu)$ and $n_+ > 0$ such that
$$\Theta v^+ = -i n_+ v^+ \quad \text{and} \quad \eta_+ v^+ = 0 ,$$
or there exists a non-zero vector $v^- \in C^\infty(H_\mu)$ and $n_- > 0$ such that
$$\Theta v^- = i n_- v^- \quad \text{and} \quad \eta_- v^- = 0 .$$
In either case it is possible to compute the Casimir parameter from the spectral parameters $n_\pm$ for the circle action by the formulas
$$-\eta_- \eta_+ = -X^2 - Y^2 - i\Theta = \Box - \Theta^2 - i\Theta , \qquad -\eta_+ \eta_- = -X^2 - Y^2 + i\Theta = \Box - \Theta^2 + i\Theta . \tag{24}$$

For irreducible components of the holomorphic discrete series the system
$$\{v_n\} \cup \{(\eta_+)^k v_n \mid k > 0\} \tag{25}$$
is an orthogonal basis of $H_\mu$ of eigenvectors of the circle action. A similar construction holds for irreducible components of the anti-holomorphic discrete series.

For every Casimir parameter $\mu \in \mathbb{R}$, let $\nu = \sqrt{1 - 4\mu}$. For the principal series ($\mu \ge 1/4$) the parameter $\nu \in i\mathbb{R}$, for the complementary series $\nu \in (0, 1)$ and for the discrete series $\nu = 2n - 1 \ge 1$, whenever $\mu = -n^2 + n$ for $n \in \mathbb{N} \setminus \{0\}$. In [14] we introduced for convenience in calculations the following modified bases. For principal and complementary series irreducible representations ($\mu > 0$), we let
$$u_0 = \frac{v_0}{\|v_0\|_0} \quad \text{and} \quad u_k = \begin{cases} c_k (\eta_+ u_{k-1}) \ \text{ with } c_k = \dfrac{2}{2k - 1 + \nu} & \text{for } k > 0 ; \\[2mm] c_k (\eta_- u_{k+1}) \ \text{ with } c_k = \dfrac{2}{-2k - 1 + \nu} & \text{for } k < 0 . \end{cases} \tag{26}$$


For discrete series holomorphic irreducible representations ($\mu = -n^2 + n$), we let
$$u_n = \frac{v_n}{\|v_n\|_0} \quad \text{and} \quad u_k = c_k (\eta_+ u_{k-1}) \ \text{ with } c_k = \frac{2}{2k - 1 + \nu} \ \text{ for } k > n . \tag{27}$$

The anti-holomorphic case is isomorphic by complex conjugation. In the following we will often leave implicit that the basis $\{u_k\}$ is indexed by the set $\{k \mid k \in \mathbb{Z}\}$ of all integers for the case of the principal and complementary series ($\mu > 0$), while it is indexed by the set $\{k \in \mathbb{Z} \mid k \ge n\}$ for irreducible components of the holomorphic discrete series of Casimir parameter $\mu = -n^2 + n \le 0$. We note that the basis $\{u_k\}$ is orthogonal, but not orthonormal in general. The Hilbert space norms of the vectors of the basis can be computed explicitly. For irreducible components of the principal and complementary series, let
$$\Pi_{\nu,k} = \prod_{i=1}^{k} \frac{2i - 1 - \nu}{2i - 1 + \nu} , \quad \text{for all } k > 0 ;$$
for irreducible components of the holomorphic discrete series of Casimir parameter $\mu = -n^2 + n \le 0$, let
$$\Pi_{\nu,k} = \prod_{i=n+1}^{k} \frac{2i - 1 - \nu}{2i - 1 + \nu} , \quad \text{for all } k > n .$$
By elementary calculations and estimates,
$$\|u_k\|_0^2 = |\Pi_{\nu,|k|}| \approx (1 + k^2)^{-\frac{\mathrm{Re}(\nu)}{2}} . \tag{28}$$
Notice that by the above formula the basis $\{u_k\}$ is orthonormal in $H_\mu$ if and only if $H_\mu$ belongs to the principal series. The Sobolev norms of the vectors of the basis can also be computed explicitly. In fact, the basis $\{u_k\}$ is made of joint eigenvectors of the Casimir operator and of the circle action, which implies that it is a basis of eigenvectors of the Laplacian on $S\Gamma$:
$$\Delta_{S\Gamma} u_k := -(X^2 + Y^2 + \Theta^2) u_k = (\Box - 2\Theta^2) u_k = (\mu + 2k^2) u_k .$$
It follows that the basis $\{u_k\}$ is orthogonal in $W^s(H_\mu)$ for all $s \ge 0$ and that the Sobolev norms of its vectors are given by the formula:
$$\|u_k\|_s^2 = (1 + \mu + 2k^2)^s \, |\Pi_{\nu,|k|}| \approx (1 + k^2)^{s - \frac{\mathrm{Re}(\nu)}{2}} . \tag{29}$$
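The norm formulas (28) and (29) are easy to probe numerically. The sketch below is an illustration under the definitions just given, with hypothetical function names. It evaluates the product $\Pi_{\nu,k}$ directly, confirms that its modulus is one for a purely imaginary $\nu$ (principal series), and compares it with the power law $k^{-\nu}$ for a real $\nu \in (0, 1)$ (complementary series).

```python
import cmath


def nu_from_mu(mu):
    """nu = sqrt(1 - 4 mu): real for mu < 1/4, purely imaginary for mu > 1/4."""
    return cmath.sqrt(1 - 4 * mu)


def pi_product(nu, k, start=1):
    """Product over i = start..k of (2i - 1 - nu) / (2i - 1 + nu).

    Use start=1 for the principal and complementary series and start=n+1
    for the holomorphic discrete series with nu = 2n - 1; with start=1 the
    factor i = n would vanish in the discrete case.
    """
    p = 1 + 0j
    for i in range(start, k + 1):
        p *= (2 * i - 1 - nu) / (2 * i - 1 + nu)
    return p


def sobolev_norm_sq(mu, k, s):
    """(1 + mu + 2 k^2)^s * |Pi_{nu,|k|}|, as in formula (29), for mu > 0."""
    nu = nu_from_mu(mu)
    return (1 + mu + 2 * k * k) ** s * abs(pi_product(nu, abs(k)))
```

A quick way to see the power law is to compare `abs(pi_product(nu, k)) * k ** nu.real` at two large values of `k`; for real `nu` the two numbers should nearly agree.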

The above estimates for the Sobolev norms of the basis vectors allow us to derive precise results on the Sobolev regularity of invariant distributions and of solutions of the cohomological equation.

2.3. Difference equations. We prove below that the Lie derivative operator along the horocycle flow is given in coordinates with respect to the basis $\{u_k\}$ by a second order difference operator. We begin by computing the action of the Lie derivative operator on the vectors of the basis introduced above.

Lemma 1. The following formula holds:
$$\mathcal{L}_U u_k = -i\, \frac{2k + 1 + \nu}{4}\, u_{k+1} + i k\, u_k - i\, \frac{2k - 1 - \nu}{4}\, u_{k-1} .$$


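Proof. By formula (3) we have
$$U = -Y + \Theta = \Theta - \frac{i}{2} (\eta_+ - \eta_-) .$$
By construction we also have
$$\Theta u_k = i k u_k \quad \text{and} \quad \eta_+ u_k = c_{k+1}^{-1} u_{k+1} = \frac{2k + 1 + \nu}{2}\, u_{k+1} .$$
It remains to compute as follows, taking into account formula (24) and the identity $\mu = (1 - \nu^2)/4$:
$$\begin{aligned} \eta_- u_k = c_k\, \eta_- \eta_+ u_{k-1} &= c_k (-\Box + \Theta^2 + i\Theta) u_{k-1} \\ &= c_k \big( -\mu - (k-1)^2 - (k-1) \big) u_{k-1} \\ &= c_k \Big( \frac{\nu^2 - 1}{4} - (k-1)^2 - (k-1) \Big) u_{k-1} \\ &= -\frac{2k - 1 - \nu}{2}\, u_{k-1} . \end{aligned}$$
The formula in the statement follows immediately.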
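The last step of the computation above is a short algebraic identity. As a hedged numerical check (a sketch with hypothetical function names, not a replacement for the algebra), the two sides can be compared for a few values of $k > 0$ and of $\nu$, using $c_k = 2/(2k - 1 + \nu)$ from (26).

```python
def eta_minus_coefficient(nu, k):
    """c_k * ((nu^2 - 1)/4 - (k-1)^2 - (k-1)) with c_k = 2 / (2k - 1 + nu), k > 0."""
    c_k = 2 / (2 * k - 1 + nu)
    return c_k * ((nu * nu - 1) / 4 - (k - 1) ** 2 - (k - 1))


def closed_form(nu, k):
    """The claimed simplification -(2k - 1 - nu) / 2."""
    return -(2 * k - 1 - nu) / 2


def max_discrepancy(nus, ks):
    """Largest absolute difference between the two expressions over the given samples."""
    return max(abs(eta_minus_coefficient(nu, k) - closed_form(nu, k))
               for nu in nus for k in ks)
```

The identity rests on the factorization $\frac{\nu^2 - (2k-1)^2}{4} = -\frac{(2k - 1 - \nu)(2k - 1 + \nu)}{4}$, after which the factor $2k - 1 + \nu$ cancels against $c_k$.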



The equation for the Fourier coefficients of invariant distributions can be immediately derived from Lemma 1. In fact, let $D \in \mathcal{E}'(H_\mu)$ be any distribution supported on a single non-trivial irreducible component $H_\mu$. Let $\hat{D} := (\hat{D}_k)$ be the sequence of the Fourier coefficients of the distribution $D \in \mathcal{E}'(H_\mu)$ with respect to the orthogonal basis $\{u_k\}$ of the irreducible component $H_\mu$:
$$\hat{D}_k := D(u_k) .$$

Lemma 2. The distributional equation $\mathcal{L}_U D = 0$ is equivalent to the following second order difference equation for the sequence $\hat{D} = (\hat{D}_k)$:
$$(L_\nu \hat{D})_k := -i\, \frac{2k + 1 + \nu}{4}\, \hat{D}_{k+1} + i k\, \hat{D}_k - i\, \frac{2k - 1 - \nu}{4}\, \hat{D}_{k-1} = 0 .$$

Proof. Since $\{u_k\}$ is an orthogonal basis of all Sobolev spaces $W^s(H_\mu)$, the equation $\mathcal{L}_U D = 0$ is equivalent to the following sequence of equations:
$$\mathcal{L}_U D(u_k) = -D(\mathcal{L}_U u_k) = 0 .$$
The statement then follows immediately from Lemma 1.



We are now ready to describe the space of invariant distributions supported on non-trivial irreducible components.

Proof of Theorem 1. It is clear from Lemma 2 that the space $I_U(H_\mu)$ has dimension at most two for an irreducible component $H_\mu$. This is because the Fourier coefficients of invariant distributions satisfy a two-sided second order difference equation. For the discrete series the space of solutions has in fact dimension at most one, since for $\mu = -n^2 + n$ ($\nu = 2n - 1$) the difference equation in Lemma 2 implies
$$(L_\nu \hat D)_n = -in\,\hat D_{n+1} + in\,\hat D_n = 0 , \quad\text{hence}\quad \hat D_{n+1} = \hat D_n .$$
This condition lowers by one the dimension of the space of formal solutions. The construction of formal solutions can be carried out as follows. By inspection the sequence
$$\hat D^+_k = 1 , \quad\text{for all } k \in \mathbb{Z} ,$$


is always a formal solution of the difference equation $L_\nu \hat D = 0$. For an irreducible component $H_\mu$ of the holomorphic discrete series of parameter $\mu = -n^2 + n \le 0$, we consider the restriction of the above solution to the half-infinite interval $k \ge n$. By the estimates (29) on Sobolev norms of the basis vectors, it follows that the above formal solution in fact defines an invariant distribution
$$D^+_\mu \in W^{-s}(H_\mu) \quad\text{for every } s > \frac{1 + \mathrm{Re}(\nu)}{2} .$$
For irreducible components of the principal and complementary series another solution of the difference equation can be found by reduction of the order (based on the above solution): for $\mu \neq 1/4$ ($\nu \neq 0$) a second independent solution is given by the formula
$$D^-_0 = 1 , \qquad D^-_k = \Pi_{\nu,|k|} = \prod_{i=1}^{|k|} \frac{2i - 1 - \nu}{2i - 1 + \nu} , \quad\text{for all } k \in \mathbb{Z} \setminus \{0\} ;$$
for $\mu = 1/4$ ($\nu = 0$) a second independent solution is given by the formula
$$D^-_0 = 0 , \qquad D^-_k = \sum_{i=1}^{|k|} \frac{1}{2i - 1} , \quad\text{for all } k \in \mathbb{Z} \setminus \{0\} .$$
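These formal solutions can be tested directly against the difference equation of Lemma 2. The sketch below is an illustration under stated assumptions: it divides the equation by the common factor $-i$, so that only real coefficients appear, and it restricts to rational real $\nu$ so that the arithmetic is exact. The function names are hypothetical.

```python
from fractions import Fraction


def pi_prod(nu, m):
    """Pi_{nu,m} = prod_{i=1}^{m} (2i - 1 - nu) / (2i - 1 + nu), for an integer m >= 0."""
    nu = Fraction(nu)
    value = Fraction(1)
    for i in range(1, m + 1):
        value *= (2 * i - 1 - nu) / (2 * i - 1 + nu)
    return value


def odd_harmonic(m):
    """sum_{i=1}^{m} 1 / (2i - 1); equals 0 for m = 0."""
    return sum((Fraction(1, 2 * i - 1) for i in range(1, m + 1)), Fraction(0))


def residual(nu, seq, k):
    """i * (L_nu D)_k, that is, the difference equation with the factor -i removed."""
    nu = Fraction(nu)
    return ((2 * k + 1 + nu) / 4 * seq(k + 1)
            - k * seq(k)
            + (2 * k - 1 - nu) / 4 * seq(k - 1))


def solves_equation(nu, seq, k_range=range(-25, 26)):
    """True if the sequence satisfies the equation exactly on the given window."""
    return all(residual(nu, seq, k) == 0 for k in k_range)


if __name__ == "__main__":
    nu = Fraction(1, 3)
    print(solves_equation(nu, lambda k: Fraction(1)))            # D^+
    print(solves_equation(nu, lambda k: pi_prod(nu, abs(k))))    # D^- for nu != 0
    print(solves_equation(0, lambda k: odd_harmonic(abs(k))))    # D^- for nu = 0
```

As a negative control, the $\nu = 0$ sequence should fail the equation when $\nu \neq 0$, which is why the two cases are stated separately in the text.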

In both cases by the estimates (29) on Sobolev norms of the basis vectors, it follows that the above formal solution in fact defines an invariant distribution
$$D^-_\mu \in W^{-s}(H_\mu) \quad\text{for every } s > \frac{1 - \mathrm{Re}(\nu)}{2} .$$
It remains to compute the action of the Lie derivative operator along the geodesic flow on the above basis of horocycle-invariant distributions. By Lemma 1 and by the commutation formula $X = [\Theta, U]$ we have
$$L_X u_k = \frac{2k + 1 + \nu}{4}\, u_{k+1} - \frac{2k - 1 - \nu}{4}\, u_{k-1} .$$
Direct calculation based on the above formula leads to the conclusions stated in Theorem 1. For instance, for irreducible representations of the complementary series and of the principal series with $\mu \neq 1/4$, we have
$$\begin{aligned}
L_X D^+_\mu(u_k) = -D^+_\mu(L_X u_k) &= -\frac{2k + 1 + \nu}{4}\, D^+_\mu(u_{k+1}) + \frac{2k - 1 - \nu}{4}\, D^+_\mu(u_{k-1}) \\
&= -\frac{2k + 1 + \nu}{4} + \frac{2k - 1 - \nu}{4} = -\frac{1 + \nu}{2} = -\frac{1 + \nu}{2}\, D^+_\mu(u_k) ; \\
L_X D^-_\mu(u_k) = -D^-_\mu(L_X u_k) &= -\frac{2k + 1 + \nu}{4}\, D^-_\mu(u_{k+1}) + \frac{2k - 1 - \nu}{4}\, D^-_\mu(u_{k-1}) \\
&= -\frac{2k + 1 + \nu}{4}\, \Pi_{\nu,|k+1|} + \frac{2k - 1 - \nu}{4}\, \Pi_{\nu,|k-1|} = -\frac{1 - \nu}{2}\, D^-_\mu(u_k) .
\end{aligned}$$
The computations in the case $\mu = 1/4$ and in the case of irreducible representations of the discrete series are similar. □
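The eigenvalue relations for $D^\pm_\mu$ under $L_X$ can be confirmed coefficient by coefficient. The following sketch assumes $\mu \neq 1/4$ and a rational real $\nu$ (so that exact fractions can be used), with $D^+_\mu(u_k) = 1$ and $D^-_\mu(u_k) = \Pi_{\nu,|k|}$ as above. The helper names are hypothetical.

```python
from fractions import Fraction


def pi_prod(nu, m):
    """Pi_{nu,m} = prod_{i=1}^{m} (2i - 1 - nu) / (2i - 1 + nu), for an integer m >= 0."""
    value = Fraction(1)
    for i in range(1, m + 1):
        value *= (2 * i - 1 - nu) / (2 * i - 1 + nu)
    return value


def lx_defect(nu, seq, eigenvalue, k):
    """(L_X D)(u_k) - eigenvalue * D(u_k), using (L_X D)(u_k) = -D(L_X u_k)."""
    lx_value = (-(2 * k + 1 + nu) / 4 * seq(k + 1)
                + (2 * k - 1 - nu) / 4 * seq(k - 1))
    return lx_value - eigenvalue * seq(k)


def is_eigenvector(nu, seq, eigenvalue, k_range=range(-25, 26)):
    """True if the eigenvalue relation holds exactly for every k in the window."""
    return all(lx_defect(nu, seq, eigenvalue, k) == 0 for k in k_range)


if __name__ == "__main__":
    nu = Fraction(1, 3)
    print(is_eigenvector(nu, lambda k: Fraction(1), -(1 + nu) / 2))
    print(is_eigenvector(nu, lambda k: pi_prod(nu, abs(k)), -(1 - nu) / 2))
```

Both relations hold for negative $k$ as well, because $\Pi_{\nu,|k|}$ depends only on $|k|$ and the two coefficients exchange roles under $k \mapsto -k$.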

The construction of solutions of the cohomological equation in any irreducible component Hμ is based on solving a second order non-homogeneous difference equation dual to that for invariant distributions. Let fˆ := (fˆk ) denote the sequence of the Fourier coefficients of a vector f ∈ Hμ with respect to the orthogonal basis {uk } of Hμ . The following holds:


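Lemma 3. The cohomological equation $L_U u = f$ is equivalent to the following second order difference equation for the sequence $\hat u := (\hat u_k)$ of the Fourier coefficients of $u$:
$$(L^*_\nu \hat u)_k := -i\,\frac{2k + 1 - \nu}{4}\, \hat u_{k+1} + ik\, \hat u_k - i\,\frac{2k - 1 + \nu}{4}\, \hat u_{k-1} = \hat f_k .$$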

The construction of solutions of the cohomological equation follows by standard methods in the theory of difference equations.

Proof of Theorem 7. There is a standard construction of the Green operator for the dual operator $L^*_\nu$ based on a basis of its kernel. In our problem, it is immediate to see that $L^*_\nu = L_{-\nu}$, hence the kernel of the operator $L^*_\nu$ can be described in terms of the solutions found in the construction of invariant distributions. In other terms, once we know the invariant distributions there is a procedure to write down the Green operator for the difference equation associated to the cohomological equation. The Green operator (or fundamental solution) for the operator $L^*_\nu$ is the operator with kernel $G_\nu(k, \ell)$ which satisfies the equation
$$\big(L^*_\nu G_\nu(\cdot, \ell)\big)_k = \delta_{k\ell} .$$
It is immediate to verify from the definition of a Green operator that a formal solution $\hat u := (\hat u_k)$ of the cohomological equation $L^*_\nu \hat u = \hat f$ is given by the formula
$$\hat u_k = \sum_{\ell \in \mathbb{Z}} G_\nu(k, \ell)\, \hat f_\ell .$$

The construction of the Green operator is based on the following basis for the kernel of the operator $L^*_\nu$. By taking into account that $L^*_\nu = L_{-\nu}$ and by the formulas for invariant distributions given above, it follows that the kernel of $L^*_\nu$ is generated by the following sequences. For irreducible unitary representations $H_\mu$ of any type a first vector of the kernel is given by the formula
$$f^{(1)}_k = \Pi_{-\nu,|k|} = \Pi_{\nu,|k|}^{-1} , \quad\text{for all } k \in \mathbb{Z} .$$
For an irreducible component $H_\mu$ of the holomorphic discrete series of parameter $\mu = -n^2 + n \le 0$, we consider the restriction of the above solution to the half-infinite interval $k \ge n$. For irreducible unitary representations $H_\mu$ of the principal or complementary series with $\mu \neq 1/4$ ($\nu \neq 0$), a second solution is given by the formula
$$f^{(2)}_k = 1 , \quad\text{for all } k \in \mathbb{Z} .$$
For irreducible unitary representations $H_\mu$ of the principal series with $\mu = 1/4$ ($\nu = 0$) a second solution is given by the formula:
$$f^{(2)}_0 = 0 \quad\text{and}\quad f^{(2)}_k = \sum_{i=1}^{|k|} \frac{1}{2i - 1} , \quad\text{for all } k \in \mathbb{Z} \setminus \{0\} .$$
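The kernel sequences above determine a one-sided Green kernel. As a numerical sanity check (a sketch, not part of the original proof), the code below takes the candidate kernel $G(k,\ell) = \frac{2i}{\nu}\big(\Pi_{\nu,|\ell|}/\Pi_{\nu,|k|} - 1\big)$ for $k > \ell$ and $0$ otherwise, which in the variable $k$ is a combination of $f^{(1)}$ and $f^{(2)}$, and tests the defining relation $(L^*_\nu G(\cdot,\ell))_k = \delta_{k\ell}$ on a finite window. It assumes that $\nu \neq 0$ is not an odd integer (complementary series, or principal series with $\mu \neq 1/4$). The function names are hypothetical.

```python
def pi_prod(nu, m):
    """Pi_{nu,m} = prod_{i=1}^{m} (2i - 1 - nu) / (2i - 1 + nu); nu may be complex."""
    value = 1
    for i in range(1, m + 1):
        value *= (2 * i - 1 - nu) / (2 * i - 1 + nu)
    return value


def green(nu, k, l):
    """Candidate one-sided Green kernel: zero for k <= l."""
    if k <= l:
        return 0
    return (2j / nu) * (pi_prod(nu, abs(l)) / pi_prod(nu, abs(k)) - 1)


def dual_operator(nu, seq, k):
    """(L*_nu seq)_k as in Lemma 3, where seq is a function of the integer index."""
    return (-1j * (2 * k + 1 - nu) / 4 * seq(k + 1)
            + 1j * k * seq(k)
            - 1j * (2 * k - 1 + nu) / 4 * seq(k - 1))


def max_defect(nu, window=8):
    """Largest deviation of (L*_nu G(., l))_k from the Kronecker delta on the window."""
    worst = 0.0
    for l in range(-window, window + 1):
        for k in range(-window, window + 1):
            value = dual_operator(nu, lambda j: green(nu, j, l), k)
            target = 1 if k == l else 0
            worst = max(worst, abs(value - target))
    return worst


if __name__ == "__main__":
    print(max_defect(0.5))    # complementary series example
    print(max_defect(0.7j))   # principal series example
```

A defect at the level of rounding error indicates that the prefactor $2i/\nu$ is the normalization that produces a unit jump at $k = \ell$.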

By a standard construction the Green operator is given by the following formulas: for irreducible representations of the principal and complementary series with $\mu \neq 1/4$ ($\nu \neq 0$):
$$G_\nu(k, \ell) := \begin{cases} \dfrac{2i}{\nu} \left( \dfrac{\Pi_{\nu,|\ell|}}{\Pi_{\nu,|k|}} - 1 \right) , & k > \ell , \\[2mm] 0 , & k \le \ell ; \end{cases} \tag{30}$$
for irreducible representations of the principal series with $\mu = 1/4$ ($\nu = 0$):
$$G_0(k, \ell) := \begin{cases} -4i \left( \displaystyle\sum_{j=1}^{|\ell|} \frac{1}{2j - 1} - \sum_{j=1}^{|k|} \frac{1}{2j - 1} \right) , & k > \ell , \\[2mm] 0 , & k \le \ell ; \end{cases} \tag{31}$$
finally, for irreducible representations of the holomorphic discrete series:
$$G_\nu(k, \ell) := \begin{cases} -\dfrac{2i}{\nu}\, \dfrac{\Pi_{\nu,|\ell|}}{\Pi_{\nu,|k|}} , & n \le k < \ell , \\[2mm] -\dfrac{2i}{\nu} , & n \le \ell \le k . \end{cases} \tag{32}$$
We remark that the Green operators above are uniquely determined by the condition that the solution sequence $G_\nu \hat f$ has finite support whenever the sequence $\hat f$ has finite support and belongs to the kernel of all invariant distributions. From the above formulas (30)-(32) for the Green operator and from formula (29), by elementary calculations it is possible to derive bounds on the norms of the Green operators, in particular restricted to the kernel of invariant distributions.

For irreducible unitary representations $H_\mu$ of the principal and complementary series, for all $s > (1 + \mathrm{Re}(\nu))/2$ and for all $t < s - 1$, there exists a constant $C_{s,t} > 0$ such that, if $f \in W^s(H_\mu)$ and $D^+_\mu(f) = D^-_\mu(f) = 0$, then the function $G_\nu f$ belongs to $W^t(H_\mu)$ and
$$\|G_\nu f\|_t \le \frac{C_{s,t}}{1 - \mathrm{Re}(\nu)^2}\, \|f\|_s .$$
Notice that, as anticipated in section 2.1, the above Sobolev estimates are uniform with respect to the representation parameter as long as $\nu \in (0, 1)$ is bounded away from 1 or, equivalently, as long as $\mu > 0$ is bounded away from zero. A spectral gap condition is therefore needed to be able to 'glue' solutions of the cohomological equation for representations in the complementary series.

For irreducible unitary representations $H_\mu$ of the holomorphic discrete series of parameter $\mu = -n^2 + n$ the following holds. Let $f \in W^s(H_\mu)$. For $s > 1 - n$ and $t < \min(s - 1, n - 1)$, the function $G_\nu f$ belongs to $W^t(H_\mu)$ unconditionally; for $s > n$ and $t < s - 1$, under the condition $D^+_\mu(f) = 0$, the function $G_\nu f$ belongs to $W^t(H_\mu)$. In both cases, there exists a constant $C_{s,t} > 0$ such that
$$\|G_\nu f\|_t \le \frac{C_{s,t}}{\nu^{s-t}}\, \|f\|_s .$$
The above Sobolev estimates for the Green operator of the difference equation complete the construction of solutions of the cohomological equation. In fact, the Green operators are well-defined on functions with finitely supported Fourier series which belong to the kernel of invariant distributions, and can be extended by continuity. □

Notice that for the discrete series, with the exception of irreducible components of Casimir parameter $\mu = 0$ ($n = 1$), the above argument implies that every function $f \in W^s(H_\mu)$ with $s > 1$ is a coboundary with square integrable transfer function. It follows that ergodic integrals of smooth functions which belong to most components of the discrete series are completely described by the value of the transfer function along horocycle orbits. In the analysis presented in these lectures, which focuses on the power law behavior of ergodic integrals, all discrete series components will contribute to a logarithmic error term in the asymptotics. For smooth functions supported on the exceptional components of the discrete series, which are given by closed 1-forms on the base surface, a Central Limit Theorem with respect to


logarithmic time can be derived from the analysis carried out by F. Ledrappier and O. Sarig [34] of the fluctuations of ergodic integrals of horocycle flows for "Z-covers" of compact hyperbolic surfaces (we owe this observation to D. Dolgopyat).

2.4. Unitary models. In this section we briefly outline an equivalent approach to the cohomological equation for horocycle flows based on unitary models of irreducible representations. The abstract approach of the previous sections in fact does not generalize easily to higher rank semisimple groups or to other cohomological equations (for instance, the cohomological equation for horocycle maps, see [53]).

Unitary models for irreducible unitary representations of the principal and complementary series can be described as follows. Up to unitary equivalence, for irreducible representations of the principal series ($\mu \ge 1/4$, $\nu \in i\mathbb{R}$) the representation space is $H_\mu = L^2(\mathbb{R}, dx)$, while for irreducible representations of the complementary series ($0 < \mu < 1/4$, $0 < \nu < 1$) the representation space $H_\mu$ is the space of complex-valued functions on $\mathbb{R}$ endowed with the inner product
$$\langle f, g \rangle_\mu := \iint_{\mathbb{R}^2} \frac{f(x)\,\overline{g(y)}}{|x - y|^{1 - \nu}}\, dx\, dy .$$
In both cases the action of $SL(2, \mathbb{R})$ can be written as follows:
$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} f(x) = (cx + a)^{-(1+\nu)}\, f\!\left( \frac{dx + b}{cx + a} \right) . \tag{33}$$
By the formulas (5), it follows that the generators $X$ of the geodesic flow, $U$ and $V$ of the horocycle flows are given in the above unitary models by the following formulas:
$$X = -x \frac{d}{dx} - \frac{1 + \nu}{2} , \qquad U = \frac{d}{dx} , \qquad V = -x^2 \frac{d}{dx} - (1 + \nu)x .$$
Since the horocycle flow $\{h^U_t\}$ is given in representation by the group of translations, the only invariant linear functional on the Schwartz space of rapidly decaying smooth functions is the Lebesgue measure. However, smooth vectors for the representation are not in general in the Schwartz space. The space of smooth vectors can be described as follows.

The generators $Y$ of the orthogonal geodesic flow and $\Theta$ of the circle action are given by the formulas
$$Y = -\frac{1 - x^2}{2} \frac{d}{dx} + \frac{1 + \nu}{2}\, x , \qquad \Theta = \frac{1 + x^2}{2} \frac{d}{dx} + \frac{1 + \nu}{2}\, x .$$
By direct calculations the $\Theta$-invariant functions $u_0 \in H_\mu$ can be found by solving the first order O.D.E. $\Theta u_0 = 0$; the full basis $\{u_n\}$ can then be constructed, for instance, by the action of the creation/annihilation operators: there exists a normalization constant $C_\nu > 0$ such that
$$u_0(x) = \frac{C_\nu}{(1 + x^2)^{\frac{1+\nu}{2}}} , \qquad u_n(x) = \left( \frac{ix + 1}{ix - 1} \right)^{n} u_0(x) = e^{2in(\arctan x + \frac{\pi}{2})}\, u_0(x) .$$
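The two expressions for $u_n$ and the eigenvalue relation $\Theta u_n = in\,u_n$ can be checked numerically. The sketch below makes several assumptions for illustration only: the normalization constant is set to $C_\nu = 1$, the circle generator is taken to be $\Theta = \frac{1+x^2}{2}\frac{d}{dx} + \frac{1+\nu}{2}x$, and the derivative is approximated by a central finite difference. The function names are hypothetical.

```python
import cmath
import math


def u0(x, nu):
    """Theta-invariant function (1 + x^2)^(-(1 + nu)/2), with C_nu = 1 assumed."""
    return (1 + x * x) ** (-(1 + nu) / 2)


def u_rational(x, nu, n):
    """u_n written as ((ix + 1)/(ix - 1))^n u_0."""
    return ((1j * x + 1) / (1j * x - 1)) ** n * u0(x, nu)


def u_polar(x, nu, n):
    """u_n written as exp(2 i n (arctan x + pi/2)) u_0."""
    return cmath.exp(2j * n * (math.atan(x) + math.pi / 2)) * u0(x, nu)


def theta(f, x, nu, h=1e-5):
    """Apply Theta = ((1 + x^2)/2) d/dx + ((1 + nu)/2) x, using a central difference."""
    derivative = (f(x + h) - f(x - h)) / (2 * h)
    return (1 + x * x) / 2 * derivative + (1 + nu) / 2 * x * f(x)


def eigen_defect(x, nu, n):
    """|Theta u_n - i n u_n| at the point x; small if u_n is an eigenfunction."""
    f = lambda y: u_polar(y, nu, n)
    return abs(theta(f, x, nu) - 1j * n * f(x))


if __name__ == "__main__":
    for nu in (0.5, 0.6j):
        for n in (-2, 0, 1, 3):
            gap = abs(u_rational(1.7, nu, n) - u_polar(1.7, nu, n))
            print(nu, n, gap, eigen_defect(1.7, nu, n))
```

The agreement of the two formulas reflects the identity $(ix+1)/(ix-1) = -e^{2i\arctan x}$, and the eigenvalue $in$ matches the relation $\Theta u_k = ik\,u_k$ used in Section 2.3.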

We recall that the basis $\{u_n\}$ is a basis of eigenfunctions for the circle action. In fact, the following formula holds: for all $\phi \in C^1([-\pi/2, \pi/2])$,
$$\Theta\big(\phi(\arctan x)\,u_0(x)\big) = \frac{1}{2}\, \phi'(\arctan x)\, u_0(x) . \tag{34}$$
The space of smooth vectors for the unitary representation (33) can be characterized as the space of functions with Fourier coefficients with respect to the above basis


decaying faster than any polynomial. Equivalently, by the above formula a function $f \in H_\mu$ belongs to $C^\infty(H_\mu)$ if and only if there exists a smooth periodic function $\phi \in C^\infty([-\pi/2, \pi/2])$ such that
$$f(x) = \phi(\arctan x)\, u_0(x) . \tag{35}$$
It follows that the linear functional
$$D^+_\mu(f) = \lim_{x \to \pm\infty} \frac{f(x)}{u_0(x)} = \lim_{\theta \to \pm\pi/2} \phi(\theta)$$
is well-defined and continuous on the space of smooth vectors. A straightforward calculation shows that the distribution $D^+_\mu$ is invariant under the horocycle flow, that is, for all smooth vectors $f \in C^\infty(H_\mu)$,
$$D^+_\mu(L_U f) = \lim_{x \to \pm\infty} \frac{1}{u_0(x)}\, \frac{df}{dx}(x) = 0 .$$
Notice that the Fourier coefficients of the distribution $D^+_\mu$ are indeed
$$D^+_\mu(u_n) = \lim_{x \to \pm\infty} e^{2in(\arctan x + \frac{\pi}{2})} = 1 , \quad\text{for all } n \in \mathbb{Z} .$$

The horocycle-invariant distribution $D^-_\mu$ can be written as follows. For any smooth vector $f \in C^\infty(H_\mu)$, the vector $f - D^+_\mu(f)\,u_0$ is represented by a function in the space $L^1(\mathbb{R}, dx)$. In fact, for any smooth vector $f \in C^\infty(H_\mu)$, if $D^+_\mu(f) = 0$, then $f$ belongs to $L^1(H_\mu)$. It follows that
$$\alpha_\mu(f) := \int_{\mathbb{R}} \left[ f(x) - D^+_\mu(f)\, u_0(x) \right] dx$$
is a well-defined continuous linear functional on the space of smooth vectors. It can be verified that in the case $\mu \neq 1/4$ ($\nu \neq 0$)
$$D^-_\mu(f) := D^+_\mu(f) - \frac{2\nu}{1 + \nu}\, \frac{\alpha_\mu(f)}{\alpha_\mu(u_1)} .$$
In fact, it is immediate to verify that the distribution $\alpha_\mu$ is horocycle-invariant, hence the above definition also gives a horocycle-invariant distribution. It is therefore enough to verify that
$$\begin{aligned}
D^+_\mu(u_0) - \frac{2\nu}{1 + \nu}\, \frac{\alpha_\mu(u_0)}{\alpha_\mu(u_1)} &= D^+_\mu(u_0) = 1 , \\
D^+_\mu(u_1) - \frac{2\nu}{1 + \nu}\, \frac{\alpha_\mu(u_1)}{\alpha_\mu(u_1)} &= 1 - \frac{2\nu}{1 + \nu} = \frac{1 - \nu}{1 + \nu} .
\end{aligned}$$
Similarly it can be verified that in the case $\mu = 1/4$ ($\nu = 0$),
$$D^-_\mu(f) := \frac{\alpha_\mu(f)}{\alpha_\mu(u_1)} .$$

Finally, the solution of the cohomological equation is given by the Green operator. For all smooth functions $f \in C^\infty(H_\mu)$ such that $D^+_\mu(f) = D^-_\mu(f) = 0$ the Green operator $G_\mu$ is defined by the formula:
$$G_\mu f(x) := \int_{-\infty}^{x} f(t)\, dt = -\int_{x}^{+\infty} f(t)\, dt .$$

The above integrals are absolutely convergent and coincide under the condition that Dμ+ (f ) = Dμ− (f ) = 0. In fact, in that case the smooth function f is integrable and


has zero average; moreover, for any $s > 1$ there exists a constant $C_s > 0$ such that the following estimate holds
$$|f(x)| \le C_s\, \frac{\|f\|_s}{(1 + x^2)^{(2s+1)/4}} , \quad\text{for all } x \in \mathbb{R} , \tag{36}$$
which by the above definition of the Green operator implies that $G_\mu f \in H_\mu$ since
$$|G_\mu f(x)| \le C_s\, \frac{\|f\|_s}{(1 + x^2)^{(2s-1)/4}} , \quad\text{for all } x \in \mathbb{R} .$$

The estimate in formula (36) can be derived from formulas (34) and (35). In fact, under the hypothesis that $f \in W^s(H_\mu)$, for $s > 1/2$, by the Sobolev embedding theorem, the function $\phi$ is Hölder of class $C^{s-1/2}$ in the ordinary sense, hence there exists a constant $C > 0$ such that
$$\left| f(x) - \phi(\pm\tfrac{\pi}{2})\, u_0(x) \right| \le C \|f\|_s \left| \arctan x \mp \tfrac{\pi}{2} \right|^{s - 1/2} |u_0(x)| \le \frac{C \|f\|_s}{(1 + x^2)^{(2s+1)/4}} .$$
Estimates for higher order Sobolev norms can be derived by elementary calculations. Although elementary, such calculations are not entirely straightforward, especially if one is interested in the optimal loss of Sobolev regularity for solutions (which as we have seen is $1 + \epsilon$ for any $\epsilon > 0$). For the discrete series there are similar formulas on unitary models given by spaces of square-integrable functions of the upper half-plane (or the unit disc) with respect to the appropriate area forms (see for instance [53], §2).

3. Finitely additive measures

In this section we outline the construction of the Hölder cocycles mentioned in the introduction and the proof of the related asymptotics of ergodic integrals stated in Theorem 4. The Hölder cocycles are constructed by evaluation on horocycle orbit segments of finitely additive measures defined on general rectifiable arcs. Hölder estimates however are only proved for a subclass of rectifiable arcs, called the admissible rectifiable arcs, which includes all horocycle orbit segments. The notion of admissible arcs is as follows. A rectifiable arc is admissible if it can be projected onto a rectifiable arc contained in a single leaf of the weak stable foliation of the geodesic flow, that is, the foliation tangent to the integrable (involutive) distribution $\{X, U\} \subset T(S\Gamma)$, by holonomy along the unstable horocycle foliation, that is, along the orbit foliation of the unstable horocycle flow $\{h^V_t\}$.

3.1. Definitions and properties. Following [9], we introduce a class $\hat B_V(S\Gamma)$ of finitely additive measures on admissible rectifiable arcs on the double cover $S\Gamma$ of the unit tangent bundle $SM$ of a compact hyperbolic surface.

We will adopt the following notation. Let $\{\hat X, \hat U, \hat V\}$ be the frame of the cotangent bundle $T^*(S\Gamma)$ dual to the frame $\{X, U, V\}$ of the tangent bundle $T(S\Gamma)$. The elements of the frame $\{\hat X, \hat U, \hat V\}$ are smooth (non-closed) differential 1-forms on $S\Gamma$. Let $|\hat X|$, $|\hat U|$ and $|\hat V|$ denote the 1-dimensional measures on rectifiable arcs given by integration of the forms $\hat X$, $\hat U$ and $\hat V$ respectively.

Definition 1. Let $\hat B_V(S\Gamma)$ be the space of all functionals $\hat\beta$ which to every rectifiable arc $\gamma \subset S\Gamma$ assign a complex number $\hat\beta(\gamma)$ so that the following holds:


(1) (Additive property) For any decomposition $\gamma = \gamma_1 + \gamma_2$ into subarcs,
$$\hat\beta(\gamma) = \hat\beta(\gamma_1) + \hat\beta(\gamma_2) ;$$
(2) (Weak unstable vanishing) For all $\gamma$ tangent to the weak unstable foliation, that is, to the foliation tangent to the integrable (involutive) distribution $\{X, V\} \subset T(S\Gamma)$,
$$\hat\beta(\gamma) = 0 ;$$
(3) (Unstable horocycle invariance) For all $t \in \mathbb{R}$,
$$\hat\beta(h^V_t \gamma) = \hat\beta(\gamma) ;$$
(4) (Hölder property) There exist an exponent $\alpha \in (0, 1)$ and a constant $C > 0$ such that for all admissible rectifiable arcs $\gamma$ satisfying $\int_\gamma |\hat U| \le 1$ we have
$$|\hat\beta(\gamma)| \le C \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| \int_\gamma |\hat V| \right) \left( \int_\gamma |\hat U| \right)^{\alpha} .$$

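We will outline the construction of a special basis of the space $\hat B_V(S\Gamma)$.

Theorem 8. For any Casimir parameter $\mu > 0$ there exist two independent (normalized) finitely-additive measures $\hat\beta^\pm_\mu$ such that the following holds. For all rectifiable arcs $\gamma$ in $S\Gamma$ the following properties hold:
(1) (Additive property) For any decomposition $\gamma = \gamma_1 + \gamma_2$ into subarcs,
$$\hat\beta^\pm_\mu(\gamma) = \hat\beta^\pm_\mu(\gamma_1) + \hat\beta^\pm_\mu(\gamma_2) ;$$
(2) (Geodesic scaling) For all $t \in \mathbb{R}$ and for $\mu \neq 1/4$,
$$\hat\beta^\pm_\mu(g_{-t}\gamma) = \exp\!\left( \frac{1 \mp \nu}{2}\, t \right) \hat\beta^\pm_\mu(\gamma) ,$$
while for $\mu = 1/4$ ($\nu = 0$),
$$\begin{pmatrix} \hat\beta^+_{1/4}(g_{-t}\gamma) \\ \hat\beta^-_{1/4}(g_{-t}\gamma) \end{pmatrix} = \exp\!\left( \frac{t}{2} \right) \begin{pmatrix} 1 & -\frac{t}{2} \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \hat\beta^+_{1/4}(\gamma) \\ \hat\beta^-_{1/4}(\gamma) \end{pmatrix} ;$$
(3) (Unstable horocycle invariance) For all $t \in \mathbb{R}$,
$$\hat\beta^\pm_\mu(h^V_t \gamma) = \hat\beta^\pm_\mu(\gamma) ;$$
(4) (Hölder property) There exists a constant $C > 0$ such that, for all admissible rectifiable arcs $\gamma \subset S\Gamma$, for all $\mu \neq 1/4$,
$$|\hat\beta^\pm_\mu(\gamma)| \le C \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| \int_\gamma |\hat V| \right) \left( \int_\gamma |\hat U| \right)^{\frac{1 \mp \mathrm{Re}\,\nu}{2}} \tag{37}$$
and, for $\mu = 1/4$ ($\nu = 0$),
$$\begin{aligned}
|\hat\beta^+_{1/4}(\gamma)| &\le C \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| \int_\gamma |\hat V| \right) \left( \int_\gamma |\hat U| \right)^{\frac{1}{2}+} , \\
|\hat\beta^-_{1/4}(\gamma)| &\le C \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| \int_\gamma |\hat V| \right) \left( \int_\gamma |\hat U| \right)^{\frac{1}{2}} .
\end{aligned} \tag{38}$$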


Notation: In the above formula (38), for any $L > 0$, we set
$$L^{\frac{1}{2}+} = L^{\frac{1}{2}}\, (1 + |\log L|) .$$
It follows immediately from the Hölder and the geodesic scaling properties that for all Casimir parameters $\mu > 0$ and for any rectifiable arc $\gamma_{wu}$ contained in a single weak unstable manifold of the geodesic flow
$$\hat\beta^+_\mu(\gamma_{wu}) = \hat\beta^-_\mu(\gamma_{wu}) = 0 .$$
In particular, for all Casimir parameters $\mu > 0$ the finitely additive measures $\hat\beta^+_\mu$ and $\hat\beta^-_\mu$ belong to the space $\hat B_V(S\Gamma)$ introduced above.

The restriction of a finitely-additive measure $\hat\beta \in \hat B_V(S\Gamma)$ to stable horocycle arcs is a Hölder cocycle $\beta$ for the stable horocycle flow $\{h^U_t\}$. In particular, for any $(x, T) \in S\Gamma \times \mathbb{R}$ let $\gamma_U(x, T)$ denote the stable horocycle arc
$$\gamma_U(x, T) := \{ h^U_t(x) \mid t \in [0, T] \} .$$
For every Casimir parameter $\mu > 0$, the cocycles $\beta^\pm_\mu$ are defined as follows:
$$\beta^\pm_\mu(x, T) = \hat\beta^\pm_\mu(\gamma_U(x, T)) , \quad\text{for all } (x, T) \in S\Gamma \times \mathbb{R} .$$

By this definition, Theorem 3 of the Introduction follows immediately, with the exception of the orthogonality property, from the above Theorem 8. The orthogonality property follows from the asymptotic theorem for ergodic integrals.

The proof of Theorem 8 proceeds according to the following plan. Rectifiable arcs act by integration on smooth 1-forms, hence they define currents (of dimension 1 and degree 2) in the sense of de Rham. We describe a space of special currents, that is, the space of the so-called basic currents for the horocycle foliations. These currents are in one-to-one correspondence with horocycle-invariant distributions. By the analysis of horocycle-invariant distributions carried out in section 2, we easily derive that the action of the geodesic flow on the space of basic currents can be diagonalized (with the exception of at most finitely many $2 \times 2$ Jordan blocks coming from irreducible components of the principal series with Casimir parameter $\mu = 1/4$). The values $\hat\beta^\pm_\mu(\gamma)$ on a rectifiable arc $\gamma$ of the finitely-additive measures described above in Theorem 8 are then given by dynamically defined components of $\gamma$, as a current, in the directions of the elements of a Jordan basis $\{B^\pm_\mu\}$ for the action of the geodesic flow on the space of basic currents.

3.2. Basic currents and invariant distributions. We recall the definition of a basic current for a foliation. A current $C$ of dimension $d$ (and degree $n - d$) on a smooth manifold $N$ of dimension $n \ge 1$ is a continuous linear functional on the space of smooth $d$-forms endowed with the smooth topology. We recall the definition of the contraction and Lie derivative operators in the direction of a vector field on currents. Such operators are naturally defined on forms and are defined on currents by duality, that is, for any smooth vector field $W$ on $N$, we have, for all smooth forms $\alpha$ of degree $d + 1$ on $N$,
$$\imath_W C(\alpha) = (-1)^{d+1}\, C(\imath_W \alpha) \quad\text{and}\quad L_W C(\alpha) = -C(L_W \alpha) .$$

Definition 2. Let $F$ be a smooth foliation on a manifold $N$ of dimension $n \ge 1$. A basic current $C$ for the foliation $F$ is a current such that
$$\imath_W C = L_W C = 0 , \quad\text{for all vector fields } W \text{ tangent to } F .$$


We are interested in basic currents of dimension 1 (and degree 2) for the stable horocycle foliation. Let $\Omega^1(S\Gamma)$ denote the space of all smooth 1-forms and let $\Omega_1(S\Gamma)$ denote the dual space, that is, the space of all currents of dimension 1 (and degree 2). Let $B_U(S\Gamma) \subset \Omega_1(S\Gamma)$ denote the subspace of basic currents for the stable horocycle foliation. We also introduce Sobolev spaces of currents. For every $s \ge 0$, let $W^s(S\Gamma)$ denote as above the standard Sobolev space of functions and let $\Omega^s_1(S\Gamma)$ denote the Sobolev space of 1-forms on $S\Gamma$ defined as follows:
$$\lambda = \lambda_X \hat X + \lambda_U \hat U + \lambda_V \hat V \in \Omega^s_1(S\Gamma) \iff (\lambda_X, \lambda_U, \lambda_V) \in [W^s(S\Gamma)]^3 .$$
Let $W^{-s}(S\Gamma)$ and $\Omega^{-s}_1(S\Gamma)$ denote the dual spaces of the (Hilbert) spaces $W^s(S\Gamma)$ and $\Omega^s_1(S\Gamma)$ respectively. Let $B^{-s}_U(S\Gamma) \subset \Omega^{-s}_1(S\Gamma)$ denote the subspace of basic currents, that is,
$$B^{-s}_U(S\Gamma) = B_U(S\Gamma) \cap \Omega^{-s}_1(S\Gamma) .$$
Let $\omega = \hat X \wedge \hat U \wedge \hat V$ be the volume form on the homogeneous manifold $S\Gamma$. Let
$$\eta_X = \imath_X \omega = \hat U \wedge \hat V , \qquad \eta_U = \imath_U \omega = -\hat X \wedge \hat V , \qquad \eta_V = \imath_V \omega = \hat X \wedge \hat U .$$

Lemma 4. A current $C$ of dimension 1 (and degree 2) on $S\Gamma$ is basic for the stable horocycle foliation if and only if there exists a horocycle-invariant distribution $D$ such that $C = D\eta_U$. The map $D \to D\eta_U$ defines an isomorphism from the space $I_U(S\Gamma)$ of horocycle-invariant distributions onto the space $B_U(S\Gamma)$ of basic currents for the horocycle foliations; moreover, for any $s \ge 0$ such a map restricts to an isomorphism from the Sobolev space $I^{-s}_U(S\Gamma) \subset I_U(S\Gamma)$ onto the space $B^{-s}_U(S\Gamma) \subset B_U(S\Gamma)$.

Proof. The 2-forms $\eta_X$, $\eta_U$, $\eta_V$ form a frame of the bundle of smooth 2-forms on the 3-dimensional manifold $S\Gamma$, hence for any current $C$ of dimension 1 (and degree 2) there exist distributions $D_X$, $D_U$ and $D_V$ such that
$$C = D_X \eta_X + D_U \eta_U + D_V \eta_V .$$
It follows that $\imath_U C = D_X \hat V - D_V \hat X$ equals 0 if and only if $D_X = D_V = 0$. Finally, since the form $\eta_U$ is by definition invariant under the stable horocycle flow,
$$L_U(D\eta_U) = (L_U D)\eta_U + D(L_U \eta_U) = (L_U D)\eta_U , \tag{39}$$
that is, a current of the form $D\eta_U$ is horocycle-invariant if and only if the distribution $D$ is. It follows that the map $D \to D\eta_U$ is an isomorphism from the space $I_U(S\Gamma)$ of all horocycle-invariant distributions to the space $B_U(S\Gamma)$ of all basic currents. By the above definition of the Sobolev norms, such an isomorphism restricts to all Sobolev spaces as claimed. The argument is therefore complete. □

By the above Lemma 4 and by Theorem 1 we have a complete description of the space $B_U(S\Gamma)$ of basic currents for the stable horocycle foliation and of the action of the geodesic flow on the space of basic currents. The Sobolev spaces of forms, currents and basic currents have orthogonal direct sum decompositions into irreducible components. In particular, we have
$$\Omega_1(S\Gamma) = \bigoplus_{\mu \in \sigma(\Box)} \Omega_1(H_\mu) \quad\text{and}\quad \Omega^{-s}_1(S\Gamma) = \bigoplus_{\mu \in \sigma(\Box)} \Omega^{-s}_1(H_\mu) ;$$
$$B_U(S\Gamma) = \bigoplus_{\mu \in \sigma(\Box)} B_U(H_\mu) \quad\text{and}\quad B^{-s}_U(S\Gamma) = \bigoplus_{\mu \in \sigma(\Box)} B^{-s}_U(H_\mu) .$$


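Lemma 5. For any Casimir parameter $\mu \in \mathbb{R}^+ \setminus \{1/4\}$, the space $B_U(H_\mu)$ has a basis $\{B^+_\mu, B^-_\mu\} \subset B^{-s}_U(H_\mu)$, for any $s > (1 + \mathrm{Re}(\nu))/2$, such that
$$L_X B^\pm_\mu = \frac{1 \mp \nu}{2}\, B^\pm_\mu . \tag{40}$$
For $\mu = 1/4$ ($\nu = 0$), the space $B_U(H_\mu)$ has a basis $\{B^+_\mu, B^-_\mu\} \subset B^{-s}_U(H_\mu)$, for any $s > 1/2$, such that
$$L_X \begin{pmatrix} B^+_{1/4} \\ B^-_{1/4} \end{pmatrix} = \frac{1}{2} \begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} B^+_{1/4} \\ B^-_{1/4} \end{pmatrix} . \tag{41}$$
For $\mu = -n^2 + n < 0$ ($\nu = 2n - 1$), the space $B_U(H_\mu)$ has a basis $\{B_\mu\} \subset B^{-s}_U(H_\mu)$, for any $s > (1 + \mathrm{Re}(\nu))/2 = n$, such that
$$L_X B_\mu = \frac{1 - \nu}{2}\, B_\mu = (1 - n) B_\mu . \tag{42}$$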
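The Jordan block in (41) can be exponentiated explicitly. The sketch below is an illustration, not part of the original text: it compares a truncated Taylor series for $\exp(tJ)$, where $J = \frac{1}{2}\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}$, with the closed form $e^{t/2}\begin{pmatrix} 1 & 0 \\ -t/2 & 1 \end{pmatrix}$, which follows from $J = \frac{1}{2}I + N$ with $N^2 = 0$. The function names and the number of series terms are arbitrary choices.

```python
import math

# Matrix of L_X in the basis (B^+, B^-), as in formula (41).
J = [[0.5, 0.0], [-0.5, 0.5]]


def mat_mul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(a, t, terms=60):
    """Truncated Taylor series for exp(t * a)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for n in range(1, terms):
        scaled = [[t * entry / n for entry in row] for row in a]
        term = mat_mul(term, scaled)
        result = [[result[i][j] + term[i][j] for j in range(2)] for i in range(2)]
    return result


def closed_form(t):
    """e^{t/2} * [[1, 0], [-t/2, 1]]."""
    e = math.exp(t / 2)
    return [[e, 0.0], [-t / 2 * e, e]]


def max_gap(t):
    """Largest entrywise difference between the series and the closed form."""
    series, exact = mat_exp(J, t), closed_form(t)
    return max(abs(series[i][j] - exact[i][j]) for i in range(2) for j in range(2))


if __name__ == "__main__":
    for t in (-2.0, 0.5, 3.0):
        print(t, max_gap(t))
```

The transpose of the closed form, $e^{t/2}\begin{pmatrix} 1 & -t/2 \\ 0 & 1 \end{pmatrix}$, is the matrix that acts on coefficient vectors with respect to the basis, which is consistent with the geodesic scaling law stated for $\mu = 1/4$ in Theorem 8.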

Proof. By Theorem 1, for any $\mu \in \mathbb{R}^+$, the space $I^{-s}_U(H_\mu)$ has complex dimension 2. Let $\{D^+_\mu, D^-_\mu\}$ be a basis of eigenvectors for the action of the geodesic flow, when $\mu \neq 1/4$, and a Jordan basis, when $\mu = 1/4$. For any $\mu \le 0$ the space $I^{-s}_U(H_\mu)$ has complex dimension 1. Let $\{D^+_\mu\}$ be a basis, which consists of eigenvectors for the action of the geodesic flow. For every Casimir parameter $\mu \in \mathbb{R}$, let
$$B^\pm_\mu := D^\pm_\mu\, \eta_U .$$
By Lemma 4, the system $\{B^+_\mu, B^-_\mu\}$ is a basis of the space $B^{-s}_U(H_\mu)$ for any $s > (1 + \mathrm{Re}(\nu))/2$, when $\mu > 0$, and $\{B^+_\mu\}$ is a basis of the space $B^{-s}_U(H_\mu)$ for any $s > (1 + \nu)/2 = n$, when $\mu = -n^2 + n \le 0$. Since $U$ is the generator of the stable horocycle flow, by the commutation relation $[X, U] = U$, the form $\eta_U$ is an eigenvector with eigenvalue 1 for the action of the geodesic flow on 1-forms, that is, $L_X \eta_U = \eta_U$, hence by the Leibniz rule,
$$L_X B^\pm_\mu = (L_X D^\pm_\mu)\eta_U + D^\pm_\mu \eta_U = (L_X D^\pm_\mu + D^\pm_\mu)\eta_U .$$
The formulas (40), (41) and (42) for the action of the geodesic flow on basic currents then follow immediately from the corresponding formulas of Theorem 1 for the action on the basis $\{D^\pm_\mu\}$ of horocycle-invariant distributions. □

3.3. Dynamical projections of rectifiable arcs. In this section we outline the construction of finitely additive measures on rectifiable arcs. The values of these measures on a rectifiable arc will be defined as dynamically defined components of the arc, viewed as a current, in the direction of the elements of the basis of the space of basic currents constructed above. For any $s \ge 0$ and for every Casimir parameter $\mu \in \mathbb{R}$, let
$$P^{-s}_\mu : \Omega^{-s}_1(S\Gamma) \to \Omega^{-s}_1(H_\mu)$$
denote the orthogonal projection. By the Sobolev embedding theorem, any rectifiable arc $\gamma$ defines a current of dimension 1 (and degree 2), denoted by the same symbol, which belongs to the dual Sobolev space $\Omega^{-s}_1(S\Gamma)$ for any $s > 3/2$. By the orthogonal decomposition
$$\Omega^{-s}_1(H_\mu) = B^{-s}_U(H_\mu) \oplus^{\perp} B^{-s}_U(H_\mu)^{\perp} , \tag{43}$$
there exist complex numbers $\hat\alpha^\pm_{\mu,-s}(\gamma)$ and a current $R^{-s}_\mu(\gamma) \in B^{-s}_U(H_\mu)^{\perp}$ such that the following identities hold:
$$P^{-s}_\mu(\gamma) = \hat\alpha^+_{\mu,-s}(\gamma) B^+_\mu + \hat\alpha^-_{\mu,-s}(\gamma) B^-_\mu + R^{-s}_\mu(\gamma) . \tag{44}$$


In fact, let $B^{-s}_\mu : \Omega^{-s}_1(S\Gamma) \to B^{-s}_U(H_\mu)$ denote the orthogonal projection on the (closed) subspace of basic currents supported on the irreducible component $H_\mu$ and let
$$R^{-s}_\mu = P^{-s}_\mu - B^{-s}_\mu : \Omega^{-s}_1(S\Gamma) \to B^{-s}_U(H_\mu)^{\perp}$$
denote the complementary projection. By construction there is a decomposition
$$P^{-s}_\mu(\gamma) = B^{-s}_\mu(\gamma) + R^{-s}_\mu(\gamma) . \tag{45}$$
Finally, by the characterization of the space $B^{-s}_U(H_\mu)$ of basic currents given in Lemma 5, there exist complex numbers $\hat\alpha^\pm_{\mu,-s}(\gamma)$ such that
$$B^{-s}_\mu(\gamma) = \hat\alpha^+_{\mu,-s}(\gamma) B^+_\mu + \hat\alpha^-_{\mu,-s}(\gamma) B^-_\mu .$$

Dynamical projections of rectifiable arcs are then defined as follows. Let $g^*_t \gamma$ denote the action of the geodesic flow map $g_t$ on $S\Gamma$ on the current $\gamma$ by pull-back. Let $\hat\alpha^\pm_{\mu,-s}(g^*_t \gamma)$ denote the components of the current $g^*_t \gamma$ along the basic currents $B^\pm_\mu$ defined by formula (44). For any Casimir parameter $\mu > 0$ and for any sufficiently large $s > 3/2$, the complex numbers $\hat\alpha^\pm_{\mu,-s}(g^*_t \gamma)$ converge after the appropriate normalization. The value $\hat\beta^\pm_\mu(\gamma)$ of the finitely additive measures $\hat\beta^\pm_\mu$ on the rectifiable arc $\gamma \subset S\Gamma$ is defined to be such a limit, which does not depend on $s \gg 3/2$. In more precise terms we can state the following result.

Theorem 9. For any rectifiable arc $\gamma \subset S\Gamma$ the following holds. For any Casimir parameter $\mu \in \mathbb{R}^+ \setminus \{1/4\}$ the following limits exist and do not depend on $s > 9/2$:
$$\hat\beta^\pm_\mu(\gamma) := \lim_{t \to +\infty} \frac{\hat\alpha^\pm_{\mu,-s}(g^*_t \gamma)}{\exp\!\left( \frac{1 \mp \nu}{2}\, t \right)} ; \tag{46}$$
for $\mu = 1/4$ the limits below exist and do not depend on $s > 9/2$:
$$\begin{aligned}
\hat\beta^+_{1/4}(\gamma) &:= \lim_{t \to +\infty} \frac{\left( \hat\alpha^+_{1/4,-s} + \frac{t}{2}\, \hat\alpha^-_{1/4,-s} \right)(g^*_t \gamma)}{\exp\!\left( \frac{t}{2} \right)} , \\
\hat\beta^-_{1/4}(\gamma) &:= \lim_{t \to +\infty} \frac{\hat\alpha^-_{1/4,-s}(g^*_t \gamma)}{\exp\!\left( \frac{t}{2} \right)} .
\end{aligned} \tag{47}$$

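The convergence in the limits (46), (47) is exponential in the following precise sense. For all $t > 0$, let us introduce the rescaled weak unstable length
$$|\gamma|_{XV,t} := \int_\gamma |\hat X| + e^{-t} \int_\gamma |\hat V| . \tag{48}$$
There exists a constant $C_s > 0$ such that, for any $\mu \neq 1/4$,
$$\left| \hat\beta^\pm_\mu(\gamma) - \frac{\hat\alpha^\pm_{\mu,-s}(g^*_t \gamma)}{\exp\!\left( \frac{1 \mp \nu}{2}\, t \right)} \right| \le \frac{C_s}{\|B^\pm_\mu\|_{-s}}\, \frac{1 + |\gamma|_{XV,t}}{\exp\!\left( \frac{1 \mp \nu}{2}\, t \right)} , \tag{49}$$
while for $\mu = 1/4$,
$$\begin{aligned}
\left| \hat\beta^+_{1/4}(\gamma) - \frac{\left( \hat\alpha^+_{1/4,-s} + \frac{t}{2}\, \hat\alpha^-_{1/4,-s} \right)(g^*_t \gamma)}{\exp\!\left( \frac{t}{2} \right)} \right| &\le \frac{C_s\, (1 + t)\, (1 + |\gamma|_{XV,t})}{\exp\!\left( \frac{t}{2} \right) \|B^\pm_\mu\|_{-s}} , \\
\left| \hat\beta^-_{1/4}(\gamma) - \frac{\hat\alpha^-_{1/4,-s}(g^*_t \gamma)}{\exp\!\left( \frac{t}{2} \right)} \right| &\le \frac{C_s\, (1 + t)\, (1 + |\gamma|_{XV,t})}{\exp\!\left( \frac{t}{2} \right) \|B^\pm_\mu\|_{-s}} .
\end{aligned} \tag{50}$$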


For all Casimir parameters $\mu \in \mathbb{R}^+$, the following bound holds:
$$|\hat\beta^\pm_\mu(\gamma)| \le \frac{C_s}{\|B^\pm_\mu\|_{-s}} \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| + \int_\gamma |\hat V| \right) . \tag{51}$$
We outline below the proof of the above theorem. We remark that by Lemma 5 basic currents are (generalized) eigenvectors for the action of the geodesic flow on the spaces of Sobolev currents. If the orthogonal splitting in formula (43) were invariant (which it is not), the proof would be completely straightforward. In fact, for instance for $\mu \in \mathbb{R}^+ \setminus \{1/4\}$ we would simply have, for all $t \in \mathbb{R}$,
$$\hat\alpha^\pm_{\mu,-s}(g^*_t \gamma) = \exp\!\left( \frac{1 \mp \nu}{2}\, t \right) \hat\alpha^\pm_{\mu,-s}(\gamma) .$$
Since the Sobolev inner products, and so the orthogonal splitting (43), are not invariant under the action of the geodesic flow, the above identity does not necessarily hold. However, we will prove that it holds asymptotically. The main technical ingredient of the argument consists of bounds on the "remainder terms" $R^{-s}_\mu(\gamma)$ in terms of the lengths of the arc $\gamma$ along the weak unstable leaf, which are uniform with respect to the length of $\gamma$ along the stable horocycle. Such bounds are derived from the analysis of the cohomological equation carried out in section 2. In fact, the following lemma holds.

Lemma 6. For any $s \ge r > 7/2$, for any rectifiable arc $\gamma$ in $S\Gamma$ and for any irreducible component $H_\mu \subset L^2(S\Gamma)$ of Casimir parameter $\mu \in \mathbb{R} \setminus \{0\}$, the current $R^{-s}_\mu(\gamma) \in \Omega^{-s}_1(H_\mu)$ has a unique continuous extension $R^{-s,-r}_\mu(\gamma) \in \Omega^{-r}_1(H_\mu)$ and the following uniform bound holds. There exists a constant $C_{r,s} > 0$ such that
$$\|R^{-s,-r}_\mu(\gamma)\|_{-r} \le C_{r,s} \left( 1 + \int_\gamma |\hat X| + \int_\gamma |\hat V| \right) .$$

(52)

λμ = λμ + λμ

and

λμ 2s = λμ 2s + λμ 2s .

−s −s (Hμ )]⊥ = Ker [BU (Hμ )⊥ ], it follows that Since [Ker BU

(53)

−s  −s  −s   R−s μ (γ)[λμ ] = Rμ (γ)[λμ ] + Rμ (γ)[λμ ] = Rμ (γ)[λμ ] = γ(λμ ) .

We are therefore led to estimate the current γ on 1-forms which belong to the kernel of all basic currents. There exists functions aμ , bμ and cμ ∈ W s (Hμ ) such that we can write ˆ + bμ U ˆ + cμ Vˆ . λμ = aμ X −s Since λμ ∈ Ker BU (Hμ ), by the characterization of basic currents in terms of invariant distributions given in Lemma 4, it follows that

bμ ∈ Ker IU−s (Hμ ) , hence, by Theorem 2, for all s ≥ r > 1 and t < r − 1 there exists a constant Cr,t > 0 such that the following holds: the cohomological equation LU uμ = bμ has a (unique) solution uμ ∈ W t (Hμ ) which satisfies the Sobolev bounds (54)

uμ t ≤ Cr,t bμ r .

LIMIT THEOREMS FOR HOROCYCLE FLOWS

151

By a straightforward calculation (expressing the exterior differential in terms of the ˆ U ˆ and Vˆ ), we have Lie derivative LX , LU and LV and of the dual form X, ˆ + (cμ − LV uμ )Vˆ + duμ . λμ = (aμ − LX uμ )X By the Sobolev embedding theorem, for any t > 5/2 there exists Ct > 0 such that uμ C 0 (SΓ ) + LX uμ C 0 (SΓ ) + LV uμ C 0 (SΓ ) ≤ Ct uμ t , hence, again by the Sobolev embdedding theorem and by the estimate in formula (54) for s ≥ r > 7/2 there exists a constant Cr > 0 such that    ˆ + |Vˆ |)λμ r . (55) |γ(λμ )| = | λμ | ≤ Cs (1 + |X| γ

γ

γ

Note that the key point of the estimate in formula (55) is that since the boundary of any rectifiable arc is a difference of Dirac masses, the integral of an exact form along a rectifiable arc of arbitrary length is bounded only in terms of the uniform norm of its primitive function. Finally, from formulas (52), (53) and (55) we derive that for any s ≥ r > 7/2 there exists a constant Cr > 0 such that, for all 1-forms λ ∈ Ωs1 (Hμ ) we have   ˆ (γ)[λ ]| ≤ C (1 + | X| + |Vˆ |)λμ r . |R−s μ r μ γ

γ

We have thus proved that for any s ≥ r > 7/2 the current R−s μ (γ) can be extended −r −s,−r by continuity to a linear functional Rμ (γ) on Ω1 (Hμ ) which satisfies the stated Sobolev bounds. The argument is therefore complete.  Proof of Theorem 9. Since the orthogonal projections Pμ−s : W −s (SΓ ) → W (Hμ ) commute with the geodesic flow (in fact, with the action of the full group SL(2, R)), for t, τ ∈ R we have −s

∗ Pμ−s (gt+τ γ) = gτ∗ Pμ−s (gt∗ γ) .

By the decomposition in formula (45) it follows that
\[ B^{-s}_\mu(g^*_{t+\tau}\gamma) = g^*_\tau B^{-s}_\mu(g^*_t\gamma) + B^{-s}_\mu\big(g^*_\tau R^{-s}_\mu(g^*_t\gamma)\big). \]
By taking the derivative of the above formula at $\tau = 0$ we have the following ordinary differential equation:
\[ \frac{d}{dt} B^{-s}_\mu(g^*_t\gamma) = L_X B^{-s}_\mu(g^*_t\gamma) + B^{-s}_\mu\big(L_X R^{-s}_\mu(g^*_t\gamma)\big). \tag{56} \]
Let us now write the distribution $B^{-s}_\mu(L_X R^{-s}_\mu(g^*_t\gamma))$ as a linear combination of the vectors of the basis $\{B^+_\mu, B^-_\mu\}$ of the subspace $\mathcal{B}^{-s}_U(H_\mu)$ of basic currents, that is,
\[ B^{-s}_\mu\big(L_X R^{-s}_\mu(g^*_t\gamma)\big) = \rho^+_\mu(g^*_t\gamma)\,B^+_\mu + \rho^-_\mu(g^*_t\gamma)\,B^-_\mu. \]
By Lemma 6 there exists a constant $C_s > 0$ such that
\[ |\rho^+_\mu(g^*_t\gamma)| + |\rho^-_\mu(g^*_t\gamma)| \le C_s\Big(1 + \int_\gamma |\hat X| + e^{-t}\int_\gamma |\hat V|\Big). \tag{57} \]
Note that the above estimate also requires a proof that the distortion of the (non-orthogonal) basis $\{B^+_\mu, B^-_\mu\} \subset \mathcal{B}^{-s}_U(H_\mu)$ is uniformly bounded with respect to the Casimir parameter on every closed subset of $\mathbb{R}^+ \setminus \{1/4\}$ (see [9], Lemma 2.3).


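By writing the above differential equation (56) in coordinates with respect to the basis $\{B^+_\mu, B^-_\mu\}$ of the space $\mathcal{B}^{-s}_U(H_\mu)$ and by Lemma 5, we derive the following differential equations: for $\mu \in \mathbb{R}^+ \setminus \{1/4\}$,
\[ \frac{d}{dt}\hat\alpha^\pm_{\mu,-s}(g^*_t\gamma) = \frac{1\mp\nu}{2}\,\hat\alpha^\pm_{\mu,-s}(g^*_t\gamma) + \rho^\pm_{\mu,-s}(g^*_t\gamma); \tag{58} \]
for $\mu = 1/4$, we have instead
\[ \frac{d}{dt}\begin{pmatrix} \hat\alpha^+_{\mu,-s}(g^*_t\gamma) \\ \hat\alpha^-_{\mu,-s}(g^*_t\gamma) \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} \hat\alpha^+_{\mu,-s}(g^*_t\gamma) \\ \hat\alpha^-_{\mu,-s}(g^*_t\gamma) \end{pmatrix} + \begin{pmatrix} \rho^+_{\mu,-s}(g^*_t\gamma) \\ \rho^-_{\mu,-s}(g^*_t\gamma) \end{pmatrix}. \tag{59} \]
By writing down solutions of the above O.D.E.'s we conclude that, for $\mu \neq 1/4$,
\[ \frac{\hat\alpha^\pm_{\mu,-s}(g^*_t\gamma)}{\exp(\frac{1\mp\nu}{2}t)} = \hat\alpha^\pm_{\mu,-s}(\gamma) + \int_0^t \rho^\pm_{\mu,-s}(g^*_\tau\gamma)\,e^{-\frac{1\mp\nu}{2}\tau}\,d\tau, \tag{60} \]
while for $\mu = 1/4$, after some elementary calculations,
\[ \begin{aligned} \frac{\hat\alpha^+_{1/4,-s}(g^*_t\gamma) + \frac{t}{2}\,\hat\alpha^-_{1/4,-s}(g^*_t\gamma)}{\exp(\frac{t}{2})} &= \hat\alpha^+_{1/4,-s}(\gamma) + \int_0^t \Big[\rho^+_{1/4,-s}(g^*_\tau\gamma) + \frac{\tau}{2}\,\rho^-_{1/4,-s}(g^*_\tau\gamma)\Big]e^{-\tau/2}\,d\tau; \\ \frac{\hat\alpha^-_{1/4,-s}(g^*_t\gamma)}{\exp(\frac{t}{2})} &= \hat\alpha^-_{1/4,-s}(\gamma) + \int_0^t \rho^-_{1/4,-s}(g^*_\tau\gamma)\,e^{-\tau/2}\,d\tau. \end{aligned} \tag{61} \]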
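The step from the scalar equation (58) to its solution (60) is the variation of constants formula. As a hedged numerical sanity check (not part of the original argument), the following Python sketch integrates a scalar equation of the same shape, $\dot\alpha = a\alpha + \rho(t)$, and compares the rescaled solution with the integral expression. The coefficient `a`, the forcing term `rho`, the initial value and the helper names `rk4_solve` and `duhamel_rhs` are all illustrative assumptions; they merely stand in for $(1\mp\nu)/2$, $\rho^\pm_{\mu,-s}(g^*_\tau\gamma)$ and $\hat\alpha^\pm_{\mu,-s}(\gamma)$.

```python
import math


def rk4_solve(a, rho, alpha0, t_end, steps=2000):
    """Integrate d(alpha)/dt = a*alpha + rho(t) on [0, t_end] with classical RK4."""
    h = t_end / steps
    t, alpha = 0.0, alpha0

    def f(s, y):
        return a * y + rho(s)

    for _ in range(steps):
        k1 = f(t, alpha)
        k2 = f(t + h / 2, alpha + h * k1 / 2)
        k3 = f(t + h / 2, alpha + h * k2 / 2)
        k4 = f(t + h, alpha + h * k3)
        alpha += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += h
    return alpha


def duhamel_rhs(a, rho, alpha0, t_end, steps=2000):
    """alpha0 + int_0^t rho(tau) e^{-a tau} dtau, by composite Simpson (steps must be even)."""
    h = t_end / steps

    def g(tau):
        return rho(tau) * math.exp(-a * tau)

    total = g(0.0) + g(t_end)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return alpha0 + total * h / 3


if __name__ == "__main__":
    a = 0.25                                   # hypothetical value of (1 - nu)/2
    rho = lambda tau: math.exp(-tau) * math.cos(tau)  # hypothetical bounded forcing
    t_end = 5.0
    lhs = rk4_solve(a, rho, 1.0, t_end) / math.exp(a * t_end)
    rhs = duhamel_rhs(a, rho, 1.0, t_end)
    print(lhs, rhs)
```

The two printed numbers should agree up to discretization error, which is the content of (60) for this toy forcing term.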

By the bounds in formula (57) the integrals in formulas (60) and (61) are absolutely convergent (as $t \to +\infty$) and are absolutely and uniformly bounded in terms of the transverse length of the rectifiable arc $\gamma$ in $S\Gamma$. The argument is concluded. □

All the properties of the finitely additive measures $\hat\beta^\pm_\mu$ stated in Theorem 8 follow readily from Theorem 9.

Proof of Theorem 8 (outline). The additive property follows from the linearity of the projections, of the normalization and of taking limits. The geodesic scaling property follows formally from the definitions. In fact, this is exactly the point of dynamically defined projections: once they are well-defined, the scaling properties are automatic.

The unstable horocycle invariance is based on the observation that for every admissible rectifiable arc $\gamma$, it is possible to project $\gamma$ along the strong stable foliation (that is, along the stable horocycle) to a rectifiable arc $\gamma_{ws}$ contained in a single weak stable leaf so that
\[ \hat\beta^\pm_\mu(\gamma) = \hat\beta^\pm_\mu(\gamma_{ws}). \]
The above identity holds since $\gamma$ and $\gamma_{ws}$ are contained in the boundary of a 2-dimensional disc $D(\gamma, \gamma_{ws})$ such that $g_{-t}D(\gamma, \gamma_{ws})$ has bounded area for all $t > 0$. Finally, for every $t \in \mathbb{R}$ and every admissible arc $\gamma$, the arcs $\gamma$ and $h^V_t\gamma$ have by definition a common weak stable projection.

The Hölder property can be proved for any weak stable arc, that is, for any arc $\gamma_{ws}$ contained in a single weak stable leaf. For a general admissible arc $\gamma$ the argument then proceeds to estimate the length of a weak stable projection $\gamma_{ws}$ in the geodesic direction in terms of the lengths of $\gamma$, hence the general Hölder estimate is derived from the weak stable case. □


The unstable horocycle invariance and the weak unstable vanishing of the finitely additive measures constructed above lead to an interesting duality between the stable and unstable horocycle flow. Namely, finitely additive measures for the stable horocycle flow, as defined by the properties listed in Definition 1, can be written in terms of invariant distributions for the unstable horocycle flow. This result in turn leads on the one hand to a complete classification of the abstractly defined space $\hat{\mathcal{B}}_V(S\Gamma)$ (see [9], Theorem 1.8), on the other hand it helps to establish a direct equivalence between finitely additive measures in $\hat{\mathcal{B}}_V(S\Gamma)$ and $\Gamma$-invariant conformal distributions on the boundary of the Poincaré disc (see [9], Theorem 1.9). We outline below the main construction, but leave out the details of the classification theorem and the relation to $\Gamma$-invariant conformal distributions.

Any finitely additive measure $\hat\beta \in \hat{\mathcal{B}}_V(S\Gamma)$ can be regarded as a current of dimension 2 (and degree 1). In fact, any smooth 2-form $\eta$ induces a 2-dimensional signed measure on $S\Gamma$, hence it is possible to define the tensor product measure $\hat\beta \otimes \eta$ as a signed finitely additive measure on $S\Gamma$. The Riemann integral
\[ \langle \hat\beta, \eta \rangle := \int_{S\Gamma} \hat\beta \otimes \eta \]
is therefore well-defined and yields a continuous linear map on the space $\Omega^\infty_2(S\Gamma)$ of smooth 2-forms on $S\Gamma$, that is, a current of dimension 2 (and degree 1). The following result holds:

Corollary 1. For any $\hat\beta \in \hat{\mathcal{B}}_V(S\Gamma)$ there exists a Hölder distribution $\hat D_\beta$, invariant under the unstable horocycle flow $\{h^V_t\}$, such that
\[ \hat\beta = \hat D_\beta \wedge \hat U. \tag{62} \]

Proof. By the weak unstable vanishing property (in Definition 1)
\[ \imath_X \imath_V \hat\beta = 0, \]
hence there exists a distribution $\hat D_\beta$ such that formula (62) holds. We note that the above statement follows from the fact that currents of dimension 2 (and degree 1) can be written as linear combinations with distributional coefficients of the frame $\{\hat X, \hat U, \hat V\}$ of the space of 1-forms (viewed as currents of dimension 2).

By the Hölder property (in Definition 1) it follows that $\hat D_\beta$ is a continuous functional on a space of Hölder continuous functions.

It remains to be proved that the distribution $\hat D_\beta$ is invariant under the unstable horocycle flow. This is done by the following calculation. By the unstable horocycle invariance property (in Definition 1) we have
\[ 0 = L_V \hat\beta = L_V \hat D_\beta \wedge \hat U + \hat D_\beta \wedge L_V \hat U. \]
A direct calculation based on the commutation relations yields
\[ L_V \hat U = \imath_V d\hat U = -\imath_V(\hat X \wedge \hat U) = 0, \]
hence by the above identities we conclude that $L_V \hat D_\beta = 0$, that is, the distribution $\hat D_\beta$ is invariant under the unstable horocycle flow. □

By the classification of horocycle-invariant distributions (see Theorem 1, stated for the unstable horocycle flow) and by the above construction of a basis of $\hat{\mathcal{B}}_V(S\Gamma)$ (see Theorem 8) it is straightforward to prove that the map $\hat\beta \to \hat D_\beta$ is injective and surjective onto the subspace of all distributions, invariant with respect to the unstable horocycle flow, supported on irreducible unitary representations of the principal and complementary series. Thus we can completely describe all the elements of $\hat{\mathcal{B}}_V(S\Gamma)$ in terms of horocycle-invariant distributions. Finally we note that by inverting formula (62) we can also write
\[ \hat D_\beta = \hat X \otimes \hat\beta \otimes \hat V. \]
It follows that all horocycle-invariant distributions can be written in terms of products of smooth measures in the geodesic and horocycle direction and a finitely additive Hölder measure in the direction of the complementary horocycle. In S. Cosentino's approach [11] to horocycle-invariant distributions, which follows the Patterson-Sullivan point of view, a similar formula holds for the lift of invariant distributions to the Poincaré disc, with the finitely additive Hölder measures in the direction of the complementary horocycle replaced by $\Gamma$-invariant conformal Hölder distributions on the boundary of the Poincaré disc. It is therefore possible to derive a relation between finitely additive Hölder measures on horocycles and $\Gamma$-invariant conformal Hölder distributions (see [9], Theorem 1.9).

3.4. Asymptotic formulas and limit theorems. In section 3.3 we have proved the existence of dynamical projections of rectifiable arcs along basic currents in each irreducible unitary representation of the principal and complementary series. In this section we glue together these projections to construct a 'global' dynamically defined projection on the space of basic currents. We then derive asymptotic results for ergodic integrals and from there results on limit distributions.

For any $s > 0$, let $\mathcal{B}^{-s}_+(S\Gamma)$ be the (closed) subspace of basic currents for the stable horocycle foliation supported on irreducible unitary representations of the principal and complementary series and let $B^{-s}_+ : \Omega^{-s}_1(S\Gamma) \to \mathcal{B}^{-s}_+(S\Gamma)$ be the orthogonal projection.
The following result is an immediate corollary of Theorem 8. However, a stronger regularity assumption is needed to ensure that the series which appear in summing up the contributions from different irreducible components are convergent.

Corollary 2. For any $r > 9/2$, for any $s > r + 1$ and for any rectifiable arc $\gamma \subset S\Gamma$, the limit
\[ \hat B(\gamma) := \lim_{t \to +\infty} (g^*_{-t} \circ B^{-r}_+ \circ g^*_t)(\gamma) \in \Omega^{-s}_1(S\Gamma) \]
exists and is equal to a uniquely determined basic current for the stable horocycle foliation. In fact, there exists a basis $\{B^\pm_\mu\} \subset \mathcal{B}_U(S\Gamma)$ of (generalized) eigenvectors for the action of the geodesic flow on the space of basic currents such that
\[ \hat B(\gamma) = \sum_{\mu \in \sigma(\Box) \cap \mathbb{R}^+} \hat\beta^+_\mu(\gamma)\,B^+_\mu + \hat\beta^-_\mu(\gamma)\,B^-_\mu. \]
For all rectifiable arcs $\gamma$ in $S\Gamma$ the following properties hold:
(1) (Additive property) For any decomposition $\gamma = \gamma_1 + \gamma_2$ into subarcs, $\hat B(\gamma) = \hat B(\gamma_1) + \hat B(\gamma_2)$;
(2) (Weak unstable vanishing) For all $\gamma$ tangent to the weak unstable foliation, $\hat B(\gamma) = 0$.
(3) (Unstable horocycle invariance) For all $t \in \mathbb{R}$, $\hat B(h^V_t\gamma) = \hat B(\gamma)$.
(4) (Hölder property) There exist exponents $\alpha^\pm \in (0, 1)$ and a constant $C > 0$ such that, for all admissible rectifiable arcs $\gamma \subset S\Gamma$, we have
\[ \|\hat B(\gamma)\|_{-s} \le C\Big(1 + \int_\gamma |\hat X| + \int_\gamma |\hat U| \int_\gamma |\hat V|\Big) \max_{\alpha \in \{\alpha^+, \alpha^-\}} \Big(\int_\gamma |\hat U|\Big)^{\alpha}. \]

For any sufficiently smooth 1-form $\lambda \in \Omega_1(S\Gamma)$, let $\hat\beta_\lambda$ be the finitely additive functional defined on rectifiable arcs $\gamma \subset S\Gamma$ as
\[ \hat\beta_\lambda(\gamma) := \langle \hat B(\gamma), \lambda \rangle. \tag{63} \]
It follows from Corollary 2 that $\hat\beta_\lambda \in \hat{\mathcal{B}}_V(S\Gamma)$. In particular, for any sufficiently smooth complex-valued function $f$ on $S\Gamma$, let $\hat\beta_f \in \hat{\mathcal{B}}_V(S\Gamma)$ be the finitely additive functional $\hat\beta_{f\hat U}$, that is, for any rectifiable arc $\gamma \subset S\Gamma$,
\[ \hat\beta_f(\gamma) = \hat\beta_{f\hat U}(\gamma) = \langle \hat B(\gamma), f\hat U \rangle. \tag{64} \]
By Corollary 2 and the identification between basic currents and invariant distributions given by Lemma 4 the finitely additive measure $\hat\beta_f$ has the expansion:
\[ \hat\beta_f = \sum_{\mu \in \sigma(\Box) \cap \mathbb{R}^+} D^+_\mu(f)\,\hat\beta^+_\mu + D^-_\mu(f)\,\hat\beta^-_\mu. \tag{65} \]
The following asymptotic theorem follows immediately from the exponential estimates in Theorem 8, in particular from formulas (49) and (50).

Theorem 10. For any $s > 11/2$ there exists a constant $C_s > 0$ such that for every rectifiable arc $\gamma \subset S\Gamma$ and for all 1-forms $\lambda \in \Omega^s_1(S\Gamma)$ supported on irreducible components of the principal and complementary series, we have
\[ \Big|\int_\gamma \lambda - \hat\beta_\lambda(\gamma)\Big| \le C_s\,\|\lambda\|_s\Big(1 + \int_\gamma |\hat X| + \int_\gamma |\hat V|\Big). \]
In particular, for all functions $f \in W^s(S\Gamma)$ supported on irreducible components of the principal and complementary series, we have
\[ \Big|\int_\gamma f\hat U - \hat\beta_f(\gamma)\Big| \le C_s\,\|f\|_s\Big(1 + \int_\gamma |\hat X| + \int_\gamma |\hat V|\Big). \]

By the results of [14] it is possible to derive a logarithmic upper bound on the uniform norm of the ergodic integrals along horocycle orbits of functions supported on irreducible components of the discrete series. In the particular case of arcs of horocycle orbits, Theorem 10 therefore implies an asymptotic result on ergodic integrals along horocycle orbits, stated as Theorem 4 in the Introduction. This asymptotic result is the cornerstone of all our results on limit distributions.

We conclude our lectures by explaining our limit theorems for each irreducible component. As we shall see, our limit theorems in each irreducible component are derived from the asymptotic theorem and from the geodesic scaling property of the cocycles along the horocycle flow.

Proof of the limit theorems for the complementary series (Theorem 5). Let us assume that our hyperbolic surface admits complementary series, that is, the spectrum of the Laplace operator has an eigenvalue in the open interval $(0, 1/4)$. Let $s > 11/2$ and consider smooth functions with non-trivial projection on the complementary series components. Let $\mu_f \in (0, 1/4)$ be the smallest Casimir parameter appearing (non-trivially) in the decomposition of a zero-average function $f \in C^\infty(S\Gamma)$. Let $\nu_f := \sqrt{1 - 4\mu_f}$. Let $H_1, \dots, H_k \subset L^2(S\Gamma)$ be the collection of all irreducible components of Casimir parameters $\mu_1 = \dots = \mu_k = \mu_f$, let $\{D^\pm_1, \dots, D^\pm_k\}$ be the basis of eigenvectors of the geodesic flow of the space of invariant distributions supported on $W^s(H_1 \oplus \dots \oplus H_k)$ and let $\beta^\pm_1, \dots, \beta^\pm_k : S\Gamma \times \mathbb{R} \to \mathbb{C}$ be the corresponding cocycles for the horocycle flow. The main step in the proof is the following approximation result.

Lemma 7. There exists $\alpha > 0$ such that the following holds. For every $s > 11/2$ there exists a constant $C_s > 0$ such that, for every function $f \in W^s(S\Gamma)$ of zero average, for all $(x, T) \in S\Gamma \times \mathbb{R}$ and $t > 0$,
\[ \Big| \frac{1}{e^{\frac{1+\nu_f}{2}t}} \int_0^{Te^t} f \circ h^U_\tau(x)\,d\tau - \sum_{i=1}^k D^-_i(f)\,\beta^-_i(g_t x, T) \Big| \le C_s\,\|f\|_s\,e^{-\alpha t}. \]

Proof. By the asymptotic theorem (Theorem 10) in the special case of horocycle arcs, there exists $\nu'_f < \nu_f$ such that
\[ \Big| \int_0^{Te^t} f \circ h^U_\tau(x)\,d\tau - \sum_{i=1}^k D^-_i(f)\,\beta^-_i(x, e^t T) \Big| \le C_s\,\|f\|_s\,e^{\frac{1+\nu'_f}{2}t}. \]
By the geodesic scaling property of cocycles (see Theorem 3) we also have, for all $i \in \{1, \dots, k\}$ and for all $T$, $t > 0$,
\[ \beta^-_i(x, e^t T) = e^{\frac{1+\nu_f}{2}t}\,\beta^-_i(g_t x, T). \]
Dividing the first estimate by $e^{\frac{1+\nu_f}{2}t}$ and inserting the scaling identity gives the statement with $\alpha = (\nu_f - \nu'_f)/2$. The argument is therefore complete. □

The cocycles $\beta^-_1, \dots, \beta^-_k$ (in fact, all cocycles $\beta^\pm_\mu$) have zero average but are not identically zero on $S\Gamma$. It follows that, for all $i \in \{1, \dots, k\}$, we have
\[ \|\beta^-_i(\cdot, T)\|_0 \neq 0, \quad \text{for all } T \in \mathbb{R} \setminus \{0\}. \]

By the orthogonality property of Theorem 3, the random variables $\beta^-_1(\cdot, T), \dots, \beta^-_k(\cdot, T)$ are orthogonal (uncorrelated). By Lemma 7, for any $s > 11/2$ and for any function $f \in W^s(S\Gamma)$ of zero average such that $(D^-_1(f), \dots, D^-_k(f)) \neq (0, \dots, 0)$, we have
\[ \left| \frac{\big\| \int_0^{Te^t} f \circ h^U_\tau(x)\,d\tau \big\|_0}{e^{\frac{1+\nu_f}{2}t}\,\big( \sum_{i=1}^k |D^-_i(f)|^2\,\|\beta^-_i(\cdot, T)\|_0^2 \big)^{1/2}} - 1 \right| \le C_s\,e^{-\alpha t}. \tag{66} \]
Finally, by Lemma 7 and by definition of the Lévy-Prohorov metric [6], for all $T \in [0, 1]$ and all $t > 0$, we have
\[ d_{LP}\big(M_t(f, T), P_{cp}(f, T)\big) \le C_s\,\|f\|_s\,e^{-\alpha t}. \tag{67} \]
□

We turn next to limit theorems for functions supported on the principal series. We prove our main theorem (Theorem 6) on the asymptotics of probability distributions of normalized ergodic integrals.

Proof of the limit theorems for the principal series (Theorem 6). By construction, for any Casimir parameter $\mu > 1/4$,
\[ D^-_\mu = \overline{D^+_\mu} \quad\text{and}\quad \beta^-_\mu = \overline{\beta^+_\mu}. \]
It follows that for any real-valued function $f \in W^s(H_\mu)$,
\[ \beta_f(x, T) = \operatorname{Re}[D^+_\mu(f)\,\beta^+_\mu(x, T)], \quad \text{for all } (x, T) \in S\Gamma \times \mathbb{R}. \]
Let $\{\mu_n\}$ be the sequence of Casimir parameters in the interval $(1/4, +\infty)$ (listed with multiplicities), let $\{D^\pm_{\mu_n}\}$ be the sequence of normalized horocycle-invariant distributions and let $\{\beta^\pm_{\mu_n}\}$ be the corresponding sequence of additive Hölder cocycles. For any sequence $z \in \ell^1(\mathbb{N}, \mathbb{C})$, let $\beta_z : S\Gamma \times \mathbb{R} \to \mathbb{R}$ be the Hölder additive cocycle for the horocycle flow defined as follows:
\[ \beta_z := \operatorname{Re}\Big[\sum_{n \in \mathbb{N}} z_n\,\beta^+_{\mu_n}\Big] = \frac{1}{2}\sum_{n \in \mathbb{N}} \big(z_n\,\beta^+_{\mu_n} + \bar z_n\,\beta^-_{\mu_n}\big). \tag{68} \]
It follows from Theorem 3, in particular from the uniform bound on additive cocycles given in the Hölder property, that the series in formula (68) is convergent for any $z \in \ell^1(\mathbb{N}, \mathbb{C})$, hence the additive cocycle $\beta_z$ is well-defined. By the orthogonality property of the system $\{\beta^+_{\mu_n}\}$ of additive cocycles, it follows that, for any $z \in \ell^1(\mathbb{N}, \mathbb{C}) \setminus \{0\}$, the zero-average function $\beta_z(\cdot, T)$ is non-constant, hence
\[ \|\beta_z(\cdot, T)\|_0 \neq 0, \quad \text{for all } T > 0. \]
Let $s > 11/2$. For any real-valued function $f \in W^s(S\Gamma)$ supported on irreducible components of the principal series, we have
\[ \beta_f(x, T) = \operatorname{Re}\Big[\sum_{n \in \mathbb{N}} D^+_{\mu_n}(f)\,\beta^+_{\mu_n}(x, T)\Big], \quad \text{for all } (x, T) \in S\Gamma \times \mathbb{R}. \]

Theorem 6 then follows from the following approximation result.

Lemma 8. For all $n \in \mathbb{N}$, let $\upsilon_n := \sqrt{4\mu_n - 1} \in \mathbb{R}^+$. For every $s > 11/2$ there exists a constant $C_s > 0$ such that, for any real-valued function $f \in W^s(S\Gamma)$ supported on irreducible components of the principal series, for all $(x, T) \in S\Gamma \times \mathbb{R}$ and $t > 0$, we have
\[ \Big| \frac{1}{e^{t/2}} \int_0^{Te^t} f \circ h^U_\tau(x)\,d\tau - \operatorname{Re}\Big[\sum_{n \in \mathbb{N}} D^+_{\mu_n}(f)\,e^{\frac{i\upsilon_n t}{2}}\,\beta^+_{\mu_n}(g_t x, T)\Big] \Big| \le C_s\,\|f\|_s\,e^{-t/2}. \]
The above lemma follows from the approximation theorem (see Theorem 10) and from the geodesic scaling properties of the cocycles (see Theorem 3) by an argument similar to that given in the proof of Lemma 7. The proof of Theorem 6 is therefore complete. □

By Theorem 6, for real-valued functions supported on the principal series, limit distributions exist along time sequences such that the orbit of the toral translation of frequency $\upsilon/2 \in \mathbb{R}^\infty$ on the infinite torus $\mathbb{T}^\infty$ converges. Conjecturally the limit does not exist otherwise. However, for no function and no time sequence are we able to prove that the limit distribution fails to exist. Nevertheless, as a straightforward consequence of Theorem 6, we are able to derive constraints on limit distributions under the assumption (unlikely, in our opinion) that they exist for all smooth functions supported on irreducible components of the principal series. In fact, it is proved in [9] that in case the limit distribution exists for all smooth functions, then the limit is largely independent of the function (Corollary 1.4) and the joint distribution of the cocycles has rotational symmetries (Corollary 1.5).

References
[1] A. Avila, B. Fayad & A. Kocsard, On manifolds supporting distributionally uniquely ergodic diffeomorphisms, preprint (arXiv:1211.1519).
[2] M. Babillot and F. Ledrappier, Geodesic paths and horocycle flow on abelian covers, Lie groups and ergodic theory (Mumbai, 1996), Tata Inst. Fund. Res. Stud. Math., vol. 14, Tata Inst. Fund. Res., Bombay, 1998, pp. 1–32. MR1699356 (2000e:37029)
[3] V. Baladi and S. Gouëzel, Banach spaces for piecewise cone-hyperbolic maps, J. Mod. Dyn. 4 (2010), no. 1, 91–137, DOI 10.3934/jmd.2010.4.91. MR2643889 (2012f:37077)
[4] V. Baladi and M. Tsujii, Anisotropic Hölder and Sobolev spaces for hyperbolic diffeomorphisms (English, with English and French summaries), Ann. Inst. Fourier (Grenoble) 57 (2007), no. 1, 127–154. MR2313087 (2008d:37034)
[5] V. Bargmann, Irreducible unitary representations of the Lorentz group, Ann. of Math. (2) 48 (1947), 568–640. MR0021942 (9,133a)
[6] P. Billingsley, Convergence of probability measures, 2nd ed., Wiley Series in Probability and Statistics: Probability and Statistics, John Wiley & Sons, Inc., New York, 1999. A Wiley-Interscience Publication. MR1700749 (2000e:60008)
[7] A. I. Bufetov, Limit theorems for translation flows, Ann. of Math. (2) 179 (2014), no. 2, 431–499, DOI 10.4007/annals.2014.179.2.2. MR3152940
[8] A. I. Bufetov, Limit theorems for special flows over Vershik transformations (Russian, with Russian summary), Uspekhi Mat. Nauk 68 (2013), no. 5(413), 3–80; English transl., Russian Math. Surveys 68 (2013), no. 5, 789–860. MR3155159
[9] A. I. Bufetov, Limit theorems for translation flows, Ann. of Math. (2) 179 (2014), no. 2, 431–499, DOI 10.4007/annals.2014.179.2.2. MR3152940
[10] M. Burger, Horocycle flow on geometrically finite surfaces, Duke Math. J. 61 (1990), no. 3, 779–803, DOI 10.1215/S0012-7094-90-06129-0. MR1084459 (91k:58102)
[11] S. Cosentino, A note on Hölder regularity of invariant distributions for horocycle flows, Nonlinearity 18 (2005), no. 6, 2715–2726, DOI 10.1088/0951-7715/18/6/015. MR2176954 (2006g:37052)
[12] S. G. Dani, Invariant measures and minimal sets of horospherical flows, Invent. Math. 64 (1981), no. 2, 357–385, DOI 10.1007/BF01389173. MR629475 (83c:22009)
[13] D. Dolgopyat, On decay of correlations in Anosov flows, Ann. of Math. (2) 147 (1998), no. 2, 357–390, DOI 10.2307/121012. MR1626749 (99g:58073)
[14] L. Flaminio and G. Forni, Invariant distributions and time averages for horocycle flows, Duke Math. J. 119 (2003), no. 3, 465–526, DOI 10.1215/S0012-7094-03-11932-8. MR2003124 (2004g:37039)
[15] L. Flaminio and G. Forni, On the cohomological equation for nilflows, J. Mod. Dyn. 1 (2007), no. 1, 37–60. MR2261071 (2008h:37003)
[16] L. Flaminio, G. Forni & F. Rodriguez Hertz, Invariant Distributions for homogeneous flows, preprint (arXiv:1303.7074).
[17] G. Forni, Solutions of the cohomological equation for area-preserving flows on compact surfaces of higher genus, Ann. of Math. (2) 146 (1997), no. 2, 295–344, DOI 10.2307/2952464. MR1477760 (99d:58102)
[18] G. Forni, On the Greenfield-Wallach and Katok conjectures in dimension three, Geometric and probabilistic structures in dynamics, Contemp. Math., vol. 469, Amer. Math. Soc., Providence, RI, 2008, pp. 197–213, DOI 10.1090/conm/469/09167. MR2478471 (2011b:37030)
[19] H. Furstenberg, The unique ergodicity of the horocycle flow, Recent advances in topological dynamics (Proc. Conf., Yale Univ., New Haven, Conn., 1972; in honor of Gustav Arnold Hedlund), Springer, Berlin, 1973, pp. 95–115. Lecture Notes in Math., Vol. 318. MR0393339 (52 #14149)
[20] I. M. Gelfand and S. V. Fomin, Unitary representations of Lie groups and geodesic flows on surfaces of constant negative curvature (Russian), Doklady Akad. Nauk SSSR (N.S.) 76 (1951), 771–774. MR0044756 (13,473e)
[21] I. M. Gelfand and M. Neumark, Unitary representations of the Lorentz group, Acad. Sci. USSR. J. Phys. 10 (1946), 93–94. MR0017282 (8,132b)
[22] P. Giulietti, C. Liverani, and M. Pollicott, Anosov flows and dynamical zeta functions, Ann. of Math. (2) 178 (2013), no. 2, 687–773, DOI 10.4007/annals.2013.178.2.6. MR3071508
[23] S. Gouëzel and C. Liverani, Banach spaces adapted to Anosov systems, Ergodic Theory Dynam. Systems 26 (2006), no. 1, 189–217, DOI 10.1017/S0143385705000374. MR2201945 (2007h:37037)
[24] J. Griffin & J. Marklof, Limit theorems for skew-translations, preprint, 12 pages (available at www.maths.bris.ac.uk/ majm/bib/skew.pdf).
[25] S. J. Greenfield and N. R. Wallach, Globally hypoelliptic vector fields, Topology 12 (1973), 247–254. MR0320502 (47 #9039)
[26] B. M. Gurevič, The entropy of horocycle flows (Russian), Dokl. Akad. Nauk SSSR 136 (1961), 768–770. MR0132410 (24 #A2255)
[27] G. A. Hedlund, Fuchsian groups and transitive horocycles, Duke Math. J. 2 (1936), no. 3, 530–542, DOI 10.1215/S0012-7094-36-00246-6. MR1545946
[28] D. A. Hejhal, On the uniform equidistribution of long closed horocycles, Asian J. Math. 4 (2000), no. 4, 839–853. Loo-Keng Hua: a great mathematician of the twentieth century. MR1870662 (2002j:11049)
[29] S. Helgason, A duality for symmetric spaces with applications to group representations, Advances in Math. 5 (1970), 1–154 (1970). MR0263988 (41 #8587)
[30] A. Katok, Cocycles, cohomology and combinatorial constructions in ergodic theory, Smooth ergodic theory and its applications (Seattle, WA, 1999), Proc. Sympos. Pure Math., vol. 69, Amer. Math. Soc., Providence, RI, 2001, pp. 107–173, DOI 10.1090/pspum/069/1858535. In collaboration with E. A. Robinson, Jr. MR1858535 (2003a:37010)
[31] A. Katok and J.-P. Thouvenot, Spectral properties and combinatorial constructions in ergodic theory, Handbook of dynamical systems. Vol. 1B, Elsevier B. V., Amsterdam, 2006, pp. 649–743, DOI 10.1016/S1874-575X(06)80036-6. MR2186251 (2006k:37002)
[32] A. G. Kušnirenko, Spectral properties of certain dynamical systems with polynomial dispersal (Russian, with English summary), Vestnik Moskov. Univ. Ser. I Mat. Meh. 29 (1974), no. 1, 101–108. Collection of articles dedicated to the memory of Ivan Georgievič Petrovskiĭ. (Russian). MR0353369 (50 #5853)
[33] F. Ledrappier and O. Sarig, Invariant measures for the horocycle flow on periodic hyperbolic surfaces, Electron. Res. Announc. Amer. Math. Soc. 11 (2005), 89–94 (electronic), DOI 10.1090/S1079-6762-05-00151-4. MR2183007 (2007b:37060)
[34] F. Ledrappier and O. Sarig, Unique ergodicity for non-uniquely ergodic horocycle flows, Discrete Contin. Dyn. Syst. 16 (2006), no. 2, 411–433, DOI 10.3934/dcds.2006.16.411. MR2226490 (2007f:37047)
[35] F. Ledrappier and O. Sarig, Invariant measures for the horocycle flow on periodic hyperbolic surfaces, Israel J. Math. 160 (2007), 281–315, DOI 10.1007/s11856-007-0064-0. MR2342499 (2008i:37061)
[36] F. Ledrappier and O. Sarig, Fluctuations of ergodic sums for horocycle flows on $\mathbb{Z}^d$-covers of finite volume surfaces, Discrete Contin. Dyn. Syst. 22 (2008), no. 1-2, 247–325, DOI 10.3934/dcds.2008.22.247. MR2410958 (2009k:37074)
[37] C. Liverani, On contact Anosov flows, Ann. of Math. (2) 159 (2004), no. 3, 1275–1312, DOI 10.4007/annals.2004.159.1275. MR2113022 (2005k:37048)
[38] B. Marcus, Unique ergodicity of the horocycle flow: variable negative curvature case, Israel J. Math. 21 (1975), no. 2-3, 133–144. Conference on Ergodic Theory and Topological Dynamics (Kibbutz Lavi, 1974). MR0407902 (53 #11672)
[39] B. Marcus, Ergodic properties of horocycle flows for surfaces of negative curvature, Ann. of Math. (2) 105 (1977), no. 1, 81–105. MR0458496 (56 #16696)
[40] C. C. Moore, Exponential decay of correlation coefficients for geodesic flows, Group representations, ergodic theory, operator algebras, and mathematical physics (Berkeley, Calif., 1984), Math. Sci. Res. Inst. Publ., vol. 6, Springer, New York, 1987, pp. 163–181, DOI 10.1007/978-1-4612-4722-7_6. MR880376 (89d:58102)
[41] J.-P. Otal, Sur les fonctions propres du laplacien du disque hyperbolique (French, with English and French summaries), C. R. Acad. Sci. Paris Sér. I Math. 327 (1998), no. 2, 161–166, DOI 10.1016/S0764-4442(98)80081-5. MR1645108 (99e:35161)
[42] O. S. Parasyuk, Flows of horocycles on surfaces of constant negative curvature (Russian), Uspehi Matem. Nauk (N.S.) 8 (1953), no. 3(55), 125–126. MR0058883 (15,442c)
[43] M. Pollicott, Distributions at infinity for Riemann surfaces, Dynamical systems and ergodic theory (Warsaw, 1986), Banach Center Publ., vol. 23, PWN, Warsaw, 1989, pp. 91–100. MR1102704 (92e:58162)
[44] M. Ratner, Factors of horocycle flows, Ergodic Theory Dynam. Systems 2 (1982), no. 3-4, 465–489 (1983), DOI 10.1017/S0143385700001723. MR721735 (86a:58076)
[45] M. Ratner, Rigidity of horocycle flows, Ann. of Math. (2) 115 (1982), no. 3, 597–614, DOI 10.2307/2007014. MR657240 (84e:58062)
[46] M. Ratner, Horocycle flows, joinings and rigidity of products, Ann. of Math. (2) 118 (1983), no. 2, 277–313, DOI 10.2307/2007030. MR717825 (85k:58063)
[47] M. Ratner, The rate of mixing for geodesic and horocycle flows, Ergodic Theory Dynam. Systems 7 (1987), no. 2, 267–288, DOI 10.1017/S0143385700004004. MR896798 (88j:58103)
[48] O. Sarig, The horocyclic flow and the Laplacian on hyperbolic surfaces of infinite genus, Geom. Funct. Anal. 19 (2010), no. 6, 1757–1812, DOI 10.1007/s00039-010-0048-9. MR2594621 (2011i:37042)
[49] O. Sarig, Unique ergodicity for infinite measures, Proc. Inter. Congress Math., Hyderabad (2010).
[50] O. Sarig & B. Schapira, The Generic Points for the Horocycle Flow on a Class of Hyperbolic Surfaces with Infinite Genus, IMRN, 2008, doi: 10.1093/imrn/rnn086, 37 pp.
[51] P. Sarnak, Asymptotic behavior of periodic orbits of the horocycle flow and Eisenstein series, Comm. Pure Appl. Math. 34 (1981), no. 6, 719–739, DOI 10.1002/cpa.3160340602. MR634284 (83m:58060)
[52] A. Strömbergsson, On the uniform equidistribution of long closed horocycles, Duke Math. J. 123 (2004), no. 3, 507–547, DOI 10.1215/S0012-7094-04-12334-6. MR2068968 (2005f:11105)
[53] J. Tanis, The cohomological equation and invariant distributions for horocycle maps, Ergodic Theory Dynam. Systems 34 (2014), no. 1, 299–340, DOI 10.1017/etds.2012.125. MR3163034
[54] A. Venkatesh, Sparse equidistribution problems, period bounds and subconvexity, Ann. of Math. (2) 172 (2010), no. 2, 989–1094, DOI 10.4007/annals.2010.172.989. MR2680486 (2012k:11061)
[55] D. Zagier, Eisenstein series and the Riemann zeta function, Automorphic forms, representation theory and arithmetic (Bombay, 1979), Tata Inst. Fund. Res. Studies in Math., vol. 10, Tata Inst. Fundamental Res., Bombay, 1981, pp. 275–301. MR633666 (83j:10027)

Department of Mathematics, University of Maryland, College Park, Maryland

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01487

Limit theorems in dynamical systems using the spectral method S´ebastien Gou¨ezel Abstract. There are numerous techniques to prove probabilistic limit theorems for dynamical systems. These notes are devoted to one of these methods, the Nagaev-Guivarc’h spectral method, which extends to dynamical systems the usual proof of the central limit theorem relying on characteristic functions. We start with the simplest example (expanding maps of the interval), where everything is elementary. We then consider more recent (and more involved) applications of this method, on the one hand to get the convergence to stable laws in intermittent maps, on the other hand to obtain precise results on the almost sure approximation by a Brownian motion.

1. Introduction There are many ways to prove the central limit theorem for square-integrable independent random variables, each of them having various advantages (and weaknesses) that make them generalizable to different situations. While the most versatile approaches are probably those relying on martingale arguments, this text is devoted to the approach that is generally used in first-year probability courses, relying on characteristic functions. Our goal in this text is to illustrate its effectiveness to prove the central limit theorem, or more general limit theorems, in deterministic dynamical systems. This powerful method, which we will call the Nagaev-Guivarc’h spectral method, was devised by Nagaev to study Markov chains [Nag57, Nag61], and reinvented by Guivarc’h for dynamical systems [RE83, GH88]. An excellent reference on this method is [HH01]. We start with the classical proof of the central limit theorem (see for instance [Fel66, Theorem XV.5.1]), that we will revisit later in dynamical situations. 2 Theorem 1.1. Let Xi be a sequence √ of i.i.d. centered random variables in L . Write Sn = X1 + · · · + Xn . Then Sn / n converges in distribution to a Gaussian random variable N (0, σ 2 ), where σ 2 = E(Xi2 ).

Proof. Thanks to the independence, we can compute the characteristic function of $S_n$: for any real $t$,

(1.1) $E(e^{itS_n}) = \lambda(t)^n,$

2010 Mathematics Subject Classification. Primary 37A50; Secondary 37A30, 37D25. Key words and phrases. Transfer operator, spectral method, intermittent map, almost sure invariance principle. © 2015 Sébastien Gouëzel


where $\lambda(t) = E(e^{itX_1})$. Since $X_1$ is square integrable, one has for small $t$ the asymptotics

(1.2) $\lambda(t) = 1 + itE(X_1) - \frac{t^2}{2}E(X_1^2) + o(t^2) = 1 - \sigma^2 t^2/2 + o(t^2).$

Combining those two equations, we get for every fixed $t \in \mathbb{R}$

$E(e^{itS_n/\sqrt{n}}) = \left(1 - \sigma^2 t^2/(2n) + o(1/n)\right)^n \to e^{-\sigma^2 t^2/2}.$

The function $e^{-\sigma^2 t^2/2}$ is the characteristic function of the Gaussian distribution $\mathcal{N}(0, \sigma^2)$. As pointwise convergence of characteristic functions implies convergence in distribution (this is Lévy’s theorem, see [Fel66, Theorem XV.3.2]), the result follows. □
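The convergence used in this proof can be observed numerically in a case where $\lambda(t)$ is explicit. The following is a minimal sketch, assuming (as an illustrative choice that is not part of the text) that the $X_i$ are i.i.d. signs $\pm 1$ with probability $1/2$ each, so that $\lambda(t) = \cos t$ and $\sigma^2 = 1$; the function names are hypothetical.

```python
import math


def char_fn_normalized_sum(t: float, n: int) -> float:
    """E(exp(i t S_n / sqrt(n))) for S_n a sum of n i.i.d. signs +/-1.

    Each sign has characteristic function lambda(t) = cos(t), so by
    independence the characteristic function of S_n is cos(t)**n, as in (1.1).
    """
    return math.cos(t / math.sqrt(n)) ** n


def gaussian_char_fn(t: float, sigma2: float = 1.0) -> float:
    """Characteristic function of the Gaussian N(0, sigma2)."""
    return math.exp(-sigma2 * t * t / 2.0)


if __name__ == "__main__":
    t = 1.5
    for n in (10, 100, 1000, 10000):
        err = abs(char_fn_normalized_sum(t, n) - gaussian_char_fn(t))
        print(f"n={n:6d}  |error|={err:.2e}")
```

For a fixed $t$, the printed error should shrink as $n$ grows, which is the pointwise convergence of characteristic functions that Lévy’s theorem turns into convergence in distribution.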

There are two main points in the above proof:
• First, in (1.1), one uses independence to write $E(e^{itS_n})$ as $\lambda(t)^n$, for some function $\lambda(t)$ (here, it is simply $E(e^{itX_1})$).
• Then, in (1.2), one finds the asymptotic behavior of $\lambda(t)$ for small $t$.
The convergence in distribution of $S_n/\sqrt{n}$ follows. One can abstract the above proof, to get the following statement:

Theorem 1.2 (Nagaev-Guivarc’h method, naive version). Let $X_1, X_2, \dots$ be a sequence of real random variables. Write $S_n$ for their partial sums. Assume that there exist $\delta > 0$ and functions $c(t)$, $\lambda(t)$ and $d_n(t)$, defined on $[-\delta, \delta]$, such that for all $t \in [-\delta, \delta]$ and all $n \in \mathbb{N}$,

(1.3) $E(e^{itS_n}) = c(t)\lambda(t)^n + d_n(t).$

Moreover, assume that:
(1) there exist $A$ and $\sigma^2$ in $\mathbb{C}$ such that $\lambda(t) = \exp(iAt - \sigma^2 t^2/2 + o(t^2))$ when $t \to 0$;
(2) the function $c$ is continuous at 0;
(3) the quantity $\|d_n\|_{L^\infty[-\delta,\delta]}$ tends to 0 when $n$ tends to infinity.
Then $A \in \mathbb{R}$, $\sigma^2 \ge 0$, and $(S_n - nA)/\sqrt{n}$ converges to a Gaussian distribution $\mathcal{N}(0, \sigma^2)$ when $n$ tends to infinity.

Proof. The proof is essentially the same. First, taking $t = 0$, one gets $c(0)\lambda(0)^n = 1 - d_n(0) \to 1$, hence $c(0) = 1$ and $\lambda(0) = 1$. Then, one uses the asymptotic expansion of $\lambda$ to show that $E(e^{it(S_n - nA)/n})$ tends to 1. By Lévy’s theorem, $S_n/n - A$ converges in distribution to 0. As $S_n$ is real, it follows that $A \in \mathbb{R}$. Using again the asymptotic expansion of $\lambda$, we show that $E(e^{it(S_n - nA)/\sqrt{n}})$ converges to the function $e^{-\sigma^2 t^2/2}$. By Lévy’s theorem again, this has to be the characteristic function of a real random variable, showing that $\sigma^2 \ge 0$. □

As such, the method does not seem of great interest: in concrete non-independent situations, how could one effectively construct functions $c$, $\lambda$ and $d_n$ that satisfy (1.3)? The main insight is that, in various contexts, these quantities will come for free from spectral arguments: in particular, $\lambda(t)$ will correspond to an eigenvalue of an operator depending on $t$. Once spectral theory tools are available, one can prove the expansion (1.3) in several situations, that we will review in this paper. As we explained above, this easily implies the central limit theorem. However, it has much deeper consequences:


almost any proof in probability theory that relies on characteristic functions can be adapted to this setting. In a dynamical system setting, one can therefore prove, among other results (under suitable assumptions):
• the Berry-Esseen theorem, i.e., control the speed of convergence in the central limit theorem (see [GH88] or Theorem 3.7 below). This kind of estimate can sometimes also be proved using martingale arguments. However, with the Nagaev-Guivarc’h method, one can characterize the functions for which there is a given speed of convergence in the central limit theorem [Gou10b].
• the local limit theorem, i.e., control the probability that $S_n$ belongs to a given interval of size of order 1 (while the central limit theorem deals with intervals of size of order $\sqrt{n}$). See for instance [GH88]. Note that this kind of fine control is impossible to prove using martingale arguments.
• that the local limit theorem can be deduced from the central limit theorem [Her05].
• the convergence to other limit laws, for instance stable laws, see [AD01]. These results will be explained in Section 4, together with applications to intermittent maps.
• the vector-valued almost sure invariance principle [Gou10a], see Section 5 below.
One expects that virtually any result known for i.i.d. sequences should hold for dependent random variables if one can prove that (1.3) holds.

In this article, we concentrate on the applications of the Nagaev-Guivarc’h method to dynamical systems. Let us note, however, that the method applies equally well to Markov chains, and that most of the above statements have counterparts for Markov chains. In this respect, see [HP10] and references therein.

In the next section, we will give the precise statements of spectral theory that we will use throughout this paper. Then, in Section 3, we will give full self-contained proofs of how the method can be used in the simplest situation, expanding maps of the interval (in this situation, there are numerous proofs of the central limit theorem, so this section is really meant as an illustration of how the method works, not of its full power). Sections 4 and 5 are then devoted to more complicated applications, for which the Nagaev-Guivarc’h method gives the best known results.

2. A bit of spectral theory

In this paragraph, we give a brief overview of spectral theory, or more precisely of the spectral theory we will need. A major reference on this topic is [Kat66].

Let $(B, \|\cdot\|)$ be a complex Banach space, and let $L$ be a continuous linear operator on $B$ (i.e., there exists a constant $C$ such that $\|Lu\| \le C\|u\|$ for all $u \in B$). The spectrum of $L$, denoted by $\sigma(L)$, is the set of complex numbers $z$ such that $zI - L$ is not invertible.

Example 2.1 (An example to keep in mind). Let $B$ be the set of continuous functions $f : [0, 1] \to \mathbb{C}$ with $f(0) = 0$. Define an operator $L : B \to B$ by $Lf(x) = \int_0^x f(y)\,dy$. It is a continuous linear operator. If $Lf = 0$, then $f = 0$ (just differentiate), hence $L$ is injective. However, it is not surjective, since its image is made of $C^1$ functions. Hence, $0 \in \sigma(L)$. For any $z \ne 0$, the operator $zI - L$ is invertible (one should just solve a differential equation to invert it), hence


$\sigma(L) = \{0\}$. Note that $L$ has no nonzero eigenfunction at all. This shows that the behavior of operators can be very different from the finite-dimensional situation.

Let $z$ be an isolated point in $\sigma(L)$. The corresponding spectral projection $\Pi_z$ is defined by $\Pi_z := \frac{1}{2i\pi}\int_{\mathcal{C}} (wI - L)^{-1}\,dw$, where $\mathcal{C}$ is a small circle around $z$. This definition is independent of the choice of $\mathcal{C}$, by holomorphy of $w \mapsto (wI - L)^{-1}$ outside of $\sigma(L)$. When $B$ is finite-dimensional, it is easy to check (for instance by putting $L$ in upper-triangular form and using the Cauchy formula for integrals) that $\Pi_z$ is the projection on the generalized eigenspace associated to $z$, with kernel the direct sum of the other generalized eigenspaces. In infinite dimension, an analogous result is true: the operator $\Pi_z$ is a projection, its image and kernel are invariant under $L$, and the spectrum of the restriction of $L$ to the image is $\{z\}$, while the spectrum of the restriction of $L$ to the kernel is $\sigma(L) - \{z\}$. See [Kat66, Theorem III.6.17].

We say that $z \in \mathbb{C}$ is an isolated eigenvalue of finite multiplicity of $L$ if $z$ is an isolated point of $\sigma(L)$, and the range of $\Pi_z$ is finite-dimensional. In this case, $\ker(zI - L)^j$ is independent of $j$ for large enough $j$, and coincides with $\operatorname{Im} \Pi_z$. Note that the converse is not true: in Example 2.1, we have $\ker(0I - L)^j = \{0\}$ for all $j$, but 0 is not an isolated eigenvalue of finite multiplicity (the spectral projection $\Pi_0$ is the identity). If $z$ is an isolated eigenvalue of finite multiplicity of an operator $L$, its multiplicity is the dimension of the range of $\Pi_z$.

We denote by $\sigma_{ess}(L)$ the essential spectrum of $L$, i.e., the set of points in $\sigma(L)$ that are not isolated eigenvalues of finite multiplicity. One can think of $L$ as a finite matrix outside of $\sigma_{ess}(L)$, and as something more complicated on $\sigma_{ess}(L)$. This will make it possible to understand the asymptotics of $L^n$ if the dominating elements of the spectrum (i.e., the points in $\sigma(L)$ with large modulus) do not belong to $\sigma_{ess}(L)$.
We say that such an operator is quasi-compact, or that it has a spectral gap. The spectral radius $r(L)$ of $L$ is $\sup\{|z| : z \in \sigma(L)\}$, and the essential spectral radius $r_{ess}(L)$ is $\sup\{|z| : z \in \sigma_{ess}(L)\}$. An operator is quasi-compact if $r_{ess}(L) < r(L)$. These quantities can be computed as follows: $r(L) = \inf_{n>0} \|L^n\|^{1/n}$, and

(2.1) $r_{ess}(L) = \inf \|L^n - K\|^{1/n},$

where the infimum is over all integers $n > 0$ and all compact operators $K$. In particular, we get that the essential spectral radius of a compact operator vanishes. This corresponds to the classical fact that the spectrum of a compact operator is a sequence of eigenvalues of finite multiplicity tending to 0. The above formula also shows that the essential spectral radius is not altered by the addition of a compact operator. Intuitively, adding a compact operator only amounts to adding (or perturbing) isolated eigenvalues of finite multiplicity.

To estimate the essential spectral radius of an operator, the formula (2.1) is not very convenient, since one should exhibit good compact operators to use it. A more efficient technique relies on inequalities named in the literature after Doeblin-Fortet or after Lasota-Yorke, that we will call DFLY inequalities. This technique is formalized in the following lemma, essentially due to [Hen93] (the following formulation can be found in [BGK07]).

Lemma 2.2. Consider a continuous linear operator $L$ on a complex Banach space $B$. Let $M > 0$. Suppose that, for some $n > 0$ and for all $x \in B$,

(2.2) $\|L^n x\| \le M^n \|x\| + \|x\|_w,$


where $\|\cdot\|_w$ is a seminorm on $B$ such that the unit ball of $B$ (for $\|\cdot\|$) is relatively compact for $\|\cdot\|_w$. Then $r_{ess}(L) \le M$.

If one could decompose $L$ as a sum of two operators $L_1 + L_2$ bounded respectively by the first and second term of (2.2), the operator $L_1$ would have a spectral radius (and therefore an essential spectral radius) at most $M$, while $L_2$ would be compact. Hence, (2.1) would imply that the essential spectral radius of $L$ would be at most $M$, giving the claim of the lemma. This heuristic argument motivates the lemma, but the rigorous proof is completely different.

We will also need to describe the evolution of the spectrum when one perturbs operators. Consider a family of operators $t \mapsto L_t$, depending continuously on $t$ (i.e., $\|L_t - L_s\|$ tends to 0 when $t$ tends to $s$). There are situations where the spectrum varies discontinuously: one can for instance have $\sigma(L_0)$ equal to the unit disk in $\mathbb{C}$, and $\sigma(L_t)$ equal to the unit circle for $t \ne 0$. However, the situation is better for eigenvalues of finite multiplicity (where everything can be reduced to finite-dimensional arguments), and even better for eigenvalues of multiplicity one. We only give a precise statement for this case, since this is what we will need later on, see [Kat66, IV.3.6 and Theorem VII.1.8] for more general statements.

Proposition 2.3. Let $z_0$ be an isolated eigenvalue of multiplicity one of an operator $L_0$. Then any operator $L$ close enough to $L_0$ has a unique eigenvalue $z(L)$ close to $z_0$. Moreover, if $t \mapsto L_t$ is a family that depends on $t$ in a $C^0$, or $C^k$, or analytic way, then the eigenvalue $z_t$ and the corresponding eigenprojection $\Pi_t$ and eigenvector $\xi_t$ depend on $t$ in the same way. Assume moreover that the rest of the spectrum of $L_0$ is contained in a disk of strictly smaller radius $B(0, |z_0| - \varepsilon)$. Write $Q_t = (I - \Pi_t)L_t$ for the part of $L_t$ corresponding to $\sigma(L_t) - \{z_t\}$, so that $L_t = z_t \Pi_t + Q_t$. For any $r > |z_0| - \varepsilon$, these operators satisfy $\|Q_t^n\| \le C r^n$, for small enough $t$ and for all $n \in \mathbb{N}$, where the constant $C$ is independent of $t$ and $n$.

We can now reformulate the Nagaev-Guivarc’h spectral method of Theorem 1.2 in a spectral setting:

Theorem 2.4 (Nagaev-Guivarc’h method, spectral version). Let $X_1, X_2, \dots$ be a sequence of real random variables, with partial sums denoted by $S_n$. Assume that there exist a complex Banach space $B$ and a family of operators $L_t$ acting on $B$ (for $|t| \le \delta$) and $\xi \in B$, $\nu \in B^*$ such that:
(1) coding: for all $n \in \mathbb{N}$, for all $|t| \le \delta$, $E(e^{itS_n}) = \langle \nu, L_t^n \xi \rangle$.
(2) spectral description: $r_{ess}(L_0) < 1$, and $L_0$ has a single eigenvalue of modulus $\ge 1$, located at 1. It is an isolated eigenvalue, of multiplicity one.
(3) regularity: the family $t \mapsto L_t$ is $C^2$.
Then there exist $A \in \mathbb{R}$ and $\sigma^2 \ge 0$ such that $(S_n - nA)/\sqrt{n}$ converges in distribution to a Gaussian $\mathcal{N}(0, \sigma^2)$.

Proof. The first part of Proposition 2.3 ensures that, for small enough $t$, the operator $L_t$ has a unique eigenvalue $\lambda(t)$ close to 1. Moreover, the second part of this proposition ensures that the rest of the spectrum of $L_t$ is contained in a disk of radius $r < 1$ (with uniform bounds for the iterates of the restricted operators). We get

$L_t^n = \lambda(t)^n \Pi_t + Q_t^n,$


where $\Pi_t$ is the eigenprojection of $L_t$ corresponding to the eigenvalue $\lambda(t)$, and $Q_t = (I - \Pi_t)L_t$ satisfies $\|Q_t^n\| \le C r^n$. It follows that $E(e^{itS_n}) = \lambda(t)^n \langle \nu, \Pi_t \xi \rangle + \langle \nu, Q_t^n \xi \rangle$. This is a decomposition of the form $\lambda(t)^n c(t) + d_n(t)$, where $d_n$ tends to 0 with $n$, and $c$ is continuous at 0. To apply Theorem 1.2, it remains to see that the function $\lambda(t)$ has an asymptotic expansion of the form $e^{iAt - \sigma^2 t^2/2 + o(t^2)}$ for small $t$. This follows from the fact that this function is $C^2$, by Proposition 2.3 and the regularity assumption. □

In a lot of applications of the above theorem, the space $B$ will be a space of functions, $\xi$ will be the function 1, and $\nu$ the integration with respect to a fixed measure. In particular, the sequence $X_n$ belongs to the space $B$ (see for instance the illustration with one-dimensional expanding maps, in Section 3). However, there are more exotic applications, such as [BGK07]. In this article, the method is applied to a lattice of coupled expanding maps: the Banach space $B$ is a set of “projective limits of sequences of compatible measures on increasing sequences of boxes” (it is therefore very far from a space of functions, or even from a space of distributions). This justifies the seemingly abstract formulation of Theorem 2.4, where we have not insisted that the Banach space $B$ should be related in any sense to the sequence $X_1, X_2, \dots$, except for the relation $E(e^{itS_n}) = \langle \nu, L_t^n \xi \rangle$.
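The coding assumption and the resulting decomposition can be tested by hand in a finite-dimensional toy case. The following is a minimal sketch, assuming a hypothetical two-state Markov chain (the transition matrix `P`, the observable `F` and the initial distribution `MU` are illustrative choices, not taken from the text), with $B = \mathbb{C}^2$, $\xi = (1, 1)$, $\nu$ the initial distribution, $S_n = f(X_1) + \dots + f(X_n)$, and $L_t = P D_t$ where $D_t$ is the diagonal matrix with entries $e^{itf}$.

```python
import cmath
from itertools import product

P = [[0.9, 0.1], [0.2, 0.8]]  # transition matrix (hypothetical toy chain)
F = [1.0, -1.0]               # observable f on the two states
MU = [0.5, 0.5]               # initial distribution, playing the role of nu


def twisted(t):
    """Matrix of L_t = P D_t, where D_t = diag(exp(i t f))."""
    return [[P[a][b] * cmath.exp(1j * t * F[b]) for b in range(2)]
            for a in range(2)]


def char_fn_spectral(t, n):
    """<nu, L_t^n xi> with xi = (1, 1) and nu = MU."""
    lt = twisted(t)
    v = [1.0, 1.0]
    for _ in range(n):
        v = [sum(lt[a][b] * v[b] for b in range(2)) for a in range(2)]
    return sum(MU[a] * v[a] for a in range(2))


def char_fn_brute_force(t, n):
    """E(exp(i t S_n)) computed by summing over all paths (X_0, ..., X_n)."""
    total = 0.0
    for path in product(range(2), repeat=n + 1):
        prob = MU[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= P[a][b]
        s_n = sum(F[x] for x in path[1:])
        total += prob * cmath.exp(1j * t * s_n)
    return total


def leading_eigenvalue(t):
    """Eigenvalue lambda(t) of the 2x2 matrix L_t that is closest to 1."""
    m = twisted(t)
    tr = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    disc = cmath.sqrt(tr * tr - 4 * det)
    return min(((tr + disc) / 2, (tr - disc) / 2), key=lambda z: abs(z - 1))


if __name__ == "__main__":
    print(abs(char_fn_spectral(0.3, 5) - char_fn_brute_force(0.3, 5)))
    print(char_fn_spectral(0.1, 31) / char_fn_spectral(0.1, 30),
          leading_eigenvalue(0.1))
```

In this sketch the first printed number checks the coding identity, and the second line compares the ratio of two consecutive characteristic functions with $\lambda(t)$: since $d_n$ is exponentially small, this ratio should be close to the leading eigenvalue for moderately large $n$.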

3. The Nagaev-Guivarc’h spectral method for expanding maps of the interval

In this paragraph, we illustrate how Theorem 2.4 can be applied to prove a central limit theorem, in one of the simplest possible situations: uniformly expanding maps of the interval which are piecewise onto. We consider a map $T : I \to I$, where $I = [0, 1]$ is written as the union of two disjoint intervals $I_1 = [0, a)$ and $I_2 = [a, 1]$. We assume that the restriction of $T$ to $I_i$ admits a $C^2$ extension to $\bar{I}_i$, which is a diffeomorphism between $\bar{I}_i$ and $I$, and satisfies $|T'| \ge \alpha > 1$. Since the boundaries will not play any role in what follows, we will abusively write $I_i$ instead of $\bar{I}_i$.

Theorem 3.1. Let $f : I \to \mathbb{R}$ be a $C^1$ function. Write $S_n f = \sum_{k=0}^{n-1} f \circ T^k$. There exist $A \in \mathbb{R}$ and $\sigma^2 \ge 0$ such that, on the probability space $(I, \mathrm{Leb})$, the sequence $(S_n f - nA)/\sqrt{n}$ converges in distribution to a Gaussian $\mathcal{N}(0, \sigma^2)$.

There are a lot of comments to be made about this theorem, various methods of proof, various extensions, and so on. Since the main emphasis here is to show a simple application of the spectral method, we first start with the proof, and will give the comments afterwards.

To apply Theorem 2.4, we should introduce an operator $L = L_0$ acting on a Banach space, related to the composition with $T$, and with good spectral properties (it should be quasi-compact). The first idea is to use the composition with $T$ (also called the Koopman operator) but there are difficulties to do so, since $f \circ T^n$ is usually much wilder than $f$ (for instance, its derivative can be as large as $C\alpha^n \to \infty$). These difficulties can be resolved by working in distribution spaces, but it is more elementary (and, often, more efficient) to consider the dual of the Koopman operator, and work with smooth functions. Let $h_i : I \to I_i$ be the inverse of $T_{|I_i}$. We define an


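operator $L$ by

(3.1) $Lu(x) = \sum_i h_i'(x)\, u(h_i x) = \sum_{T(y) = x} \frac{1}{T'(y)}\, u(y).$

This operator satisfies

(3.2) $\int Lu \cdot v \, d\mathrm{Leb} = \sum_i \int_I h_i'(x)\, u(h_i x)\, v(x) \, dx = \sum_i \int_{I_i} u(y)\, v(Ty) \, dy = \int u \cdot v \circ T \, d\mathrm{Leb}.$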

Hence, $L$ is the adjoint of the Koopman operator, as desired. It is called the transfer operator, or the Ruelle-Perron-Frobenius operator.

Let us now define perturbed transfer operators, as follows. Let $f$ be a $C^1$ function for which we want to prove a central limit theorem. We define $L_t u = L(e^{itf} u)$.

Lemma 3.2. The operators $L_t$ satisfy the identity

$\int L_t^n u \cdot v \, d\mathrm{Leb} = \int u \cdot e^{itS_n f} \cdot v \circ T^n \, d\mathrm{Leb}.$
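Before turning to the proof, the identity of Lemma 3.2 (and the duality (3.2), which is the case $t = 0$, $n = 1$) can be checked numerically on an explicit example. The following is a minimal sketch, assuming a piecewise linear map with break point $a = 0.3$ and the observable $f(x) = \sin(2x)$; both choices, the quadrature rule and all function names are hypothetical and only serve the illustration.

```python
import cmath
import math

A = 0.3  # break point a of the partition (hypothetical choice)


def T(x):
    """Piecewise linear expanding map with two onto branches."""
    return x / A if x < A else (x - A) / (1.0 - A)


# Inverse branches h_i together with their (constant) derivatives h_i'.
BRANCHES = ((lambda x: A * x, A), (lambda x: A + (1.0 - A) * x, 1.0 - A))


def f(x):
    return math.sin(2.0 * x)


def perturbed_transfer(u, t):
    """Return the function L_t u = L(exp(i t f) u), with L as in (3.1)."""
    def lt_u(x):
        return sum(dh * cmath.exp(1j * t * f(h(x))) * u(h(x))
                   for h, dh in BRANCHES)
    return lt_u


def integrate(g, n_points=4000):
    """Midpoint rule on [0, 1]."""
    return sum(g((j + 0.5) / n_points) for j in range(n_points)) / n_points


def lhs(u, v, t, n):
    """Integral of (L_t^n u) * v with respect to Lebesgue measure."""
    w = u
    for _ in range(n):
        w = perturbed_transfer(w, t)
    return integrate(lambda x: w(x) * v(x))


def rhs(u, v, t, n):
    """Integral of u * exp(i t S_n f) * (v o T^n)."""
    def integrand(x):
        s, y = 0.0, x
        for _ in range(n):
            s += f(y)
            y = T(y)
        return u(x) * cmath.exp(1j * t * s) * v(y)
    return integrate(integrand)


if __name__ == "__main__":
    u = lambda x: 1.0 + x * x
    v = lambda x: math.cos(3.0 * x)
    print(abs(lhs(u, v, 0.7, 3) - rhs(u, v, 0.7, 3)))
```

The two sides should agree up to the quadrature error. The grid size is chosen so that the discontinuities of $v \circ T^n$ (for $n \le 3$) fall on cell boundaries, which keeps the midpoint rule accurate.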



Proof. We start from the right hand side, and use the duality property (3.2):

$\int u \cdot e^{itS_n f} \cdot v \circ T^n \, d\mathrm{Leb} = \int (u e^{itf}) \cdot (e^{itS_{n-1} f}\, v \circ T^{n-1}) \circ T \, d\mathrm{Leb} = \int L(u e^{itf}) \cdot e^{itS_{n-1} f}\, v \circ T^{n-1} \, d\mathrm{Leb} = \int L_t(u) \cdot e^{itS_{n-1} f}\, v \circ T^{n-1} \, d\mathrm{Leb}.$

Therefore, an induction gives the result of the lemma. □

In particular, $E(e^{itS_n f}) = \int L_t^n 1 \cdot 1 \, d\mathrm{Leb}$. This shows that the coding assumption of Theorem 2.4 is satisfied, taking for $\xi$ the constant function 1, and for $\nu$ the integration with respect to Lebesgue measure. Note that we have not yet specified the Banach space $B$ on which the operators $L_t$ will act. One could try to use the spaces $L^\infty$ or $L^2$, but there would be no quasi-compactness. We will rather use $B = C^1$. Note that $t \mapsto e^{itf}$ is analytic from $\mathbb{R}$ to $C^1$ (just use the series expansion $e^{itf} = \sum (itf)^k/k!$), hence $t \mapsto L_t$ is analytic, and in particular $C^2$.

As is often the case, the only difficulty to apply Theorem 2.4 is the quasi-compactness assumption. This is where the assumption of uniform expansion $|T'| \ge \alpha > 1$ will play a role (without this assumption, all we have said until now remains true, but the theorem is false even if $T'(x_0) = 1$ at a single point $x_0$, as we will see later in Section 4). The main point is that the iterates of $T$ have a small distortion, i.e., if a set has relatively small measure then its images also have relatively small measure. This is the content of the following technical lemma. For $\mathbf{i} = (i_1, \dots, i_n) \in \{1, 2\}^n$, let $h_{\mathbf{i}} = h_{i_1} \circ \dots \circ h_{i_n}$. These functions are the inverse branches of $T^n$.

Lemma 3.3. There exists $C$ such that, for all $n$, for all $\mathbf{i}$ of length $n$, $|h_{\mathbf{i}}''(x)| \le C h_{\mathbf{i}}'(x)$.


This lemma shows that $(\log h_{\mathbf{i}}')'$ is uniformly bounded. Hence, $|\log h_{\mathbf{i}}'(x) - \log h_{\mathbf{i}}'(y)| \le C|x - y|$. In particular, $h_{\mathbf{i}}'(x)/h_{\mathbf{i}}'(y)$ is bounded away from zero and infinity. This implies that $\mathrm{Leb}(h_{\mathbf{i}} U)/\mathrm{Leb}(h_{\mathbf{i}} V)$ is equal to $\mathrm{Leb}(U)/\mathrm{Leb}(V)$ up to a uniform multiplicative constant. This justifies the claim that the lemma proves a bounded distortion property of $T^n$.

Proof. Writing the formula for the derivative of a composition, and taking the logarithm, we get

$\log(h_{\mathbf{i}}') = \sum_{k=1}^{n} (\log h_{i_k}') \circ h_{i_{k+1}} \circ \dots \circ h_{i_n}.$

We differentiate again this equality, getting an expression for $h_{\mathbf{i}}''/h_{\mathbf{i}}'$. On the right hand side, the derivative of the $k$-th term has a factor $h_{i_k}''/h_{i_k}'$ (bounded by a constant $C$), multiplied by derivatives of the functions $h_{i_j}$ for $j > k$. Each of these is $\le \alpha^{-1} < 1$, hence we get

$|h_{\mathbf{i}}''/h_{\mathbf{i}}'| \le \sum_{k=1}^{n} C\alpha^{-(n-k)} \le C'.$ □

Corollary 3.4. There exists a constant $C$ such that, for all $n \in \mathbb{N}$, for all $C^1$ function $u$,

$\|L^n u\|_{C^1} \le C\alpha^{-n} \|u\|_{C^1} + C\|u\|_{C^0}.$

Proof. We have $L^n u = \sum h_{\mathbf{i}}'(x)\, u \circ h_{\mathbf{i}}(x)$, where the sum is over all $\mathbf{i}$ of length $n$. Writing $I_{\mathbf{i}} = h_{\mathbf{i}}(I)$, the bounded distortion lemma implies that $|h_{\mathbf{i}}'(x)| \le C\, \mathrm{Leb}(I_{\mathbf{i}})$. Therefore,

$|L^n u(x)| \le C \sum \mathrm{Leb}(I_{\mathbf{i}}) \|u\|_{C^0} = C\|u\|_{C^0},$

as $\sum \mathrm{Leb}(I_{\mathbf{i}}) = \mathrm{Leb}(I) = 1$. Let us now control the derivative of $L^n u$, i.e.,

$\sum h_{\mathbf{i}}''(x)\, u \circ h_{\mathbf{i}}(x) + \sum h_{\mathbf{i}}'(x)\, u' \circ h_{\mathbf{i}}(x)\, h_{\mathbf{i}}'(x).$

In the first term, $|h_{\mathbf{i}}''| \le C|h_{\mathbf{i}}'|$ by Lemma 3.3. Hence, this term is bounded by $C\|u\|_{C^0}$ as above. In the second term, we bound the last $h_{\mathbf{i}}'(x)$ by $\alpha^{-n}$, and the remaining part is bounded by $C\|u'\|_{C^0}$ as above. □

The above corollary is a DFLY inequality. Together with Lemma 2.2 and the compactness of the inclusion of $C^1$ in $C^0$, this implies that the essential spectral radius of $L$ acting on $C^1$ is $\le \alpha^{-1} < 1$. Let us now control the outer spectrum of $L$.

Lemma 3.5. The operator $L$ acting on $C^1$ has a simple eigenvalue at 1, and no other eigenvalue of modulus $\ge 1$.

With this lemma, we can conclude the proof of Theorem 3.1 by applying Theorem 2.4 (the spectral version of the Nagaev-Guivarc’h argument). Indeed, we have checked all its assumptions: the coding is proved in Lemma 3.2, the spectral description follows from Lemma 3.5, and the smoothness of $t \mapsto L_t$ is trivial since this family is analytic as we explained above. It remains to prove Lemma 3.5.


Proof of Lemma 3.5. The iterates of $L$ have a uniformly bounded norm, by Corollary 3.4. This shows that $L$ has no eigenvalue of modulus $> 1$. Moreover, if there are eigenvalues of modulus 1, they are semisimple (i.e., there are no Jordan blocks). To conclude, one should control these (finitely many) eigenvalues and the corresponding eigenfunctions. For $\rho$ of modulus 1, denote by $\Pi_\rho$ the corresponding spectral projection (it vanishes if $\rho$ is not an eigenvalue of $L$). We have

$\frac{1}{n} \sum_{k=0}^{n-1} L^k 1 = \sum_\rho \left( \frac{1}{n} \sum_{k=0}^{n-1} \rho^k \right) \Pi_\rho 1 + O(1/n)$

(where the sum over $\rho$ involves only finitely many terms, the eigenvalues of modulus 1 of $L$). This converges in $C^1$ to $\Pi_1 1$ (since for $\rho \ne 1$ the sequence $\sum_{k=0}^{n-1} \rho^k$ is bounded). By bounded distortion, the function $L^k 1$ takes its values in an interval $[C^{-1}, C]$ for some $C > 0$. It follows that the function $\varphi = \Pi_1 1$ also takes its values in $[C^{-1}, C]$. In particular, $\Pi_1$ is not zero, i.e., 1 is an eigenvalue, and $\varphi$ is a corresponding eigenfunction.

It remains to see that any other eigenfunction is in fact proportional to $\varphi$. Consider an eigenfunction $\psi$, for an eigenvalue $\rho$ of modulus 1. Consider $x$ such that $M = |\psi(x)/\varphi(x)|$ is maximal (since $\varphi \ge C^{-1}$, this is well defined). Then

$M\varphi(x) = |\psi(x)| = |L^n \psi(x)| = \left| \sum h_{\mathbf{i}}'(x)\, \psi(h_{\mathbf{i}} x) \right| \le \sum h_{\mathbf{i}}'(x)\, |\psi|(h_{\mathbf{i}} x) \le \sum h_{\mathbf{i}}'(x)\, M\varphi(h_{\mathbf{i}} x) = M L^n \varphi(x) = M\varphi(x).$

It follows that all these inequalities are equalities. In particular, all the complex numbers $\psi(h_{\mathbf{i}} x)$ have the same phase, and moreover $|\psi|(h_{\mathbf{i}} x) = M\varphi(h_{\mathbf{i}} x)$. This shows that $\psi/\varphi$ is constant on $T^{-n}(x)$. As this set becomes more and more dense when $n$ tends to infinity, this shows that $\psi/\varphi$ is constant, as desired. □

Let us give some comments about Theorem 3.1 and its proof:
(1) In the course of the proof, we have constructed an eigenfunction $\varphi$ of the transfer operator $L$. One can check that $\varphi$ is the density of an invariant probability measure $m$ (i.e., it satisfies $m(T^{-1}B) = m(B)$), called the SRB measure, or physical measure, of the system. Moreover, this measure $m$ is ergodic (i.e., if $T^{-1}B = B$, then $m(B) \in \{0, 1\}$): this follows from the fact that the eigenvalue 1 of $L$ is simple.
(2) One can identify the quantities $A$ and $\sigma^2$ in the statement of the theorem: one has $A = \int f \, dm$, and writing $\tilde{f} = f - \int f \, dm$,

(3.3) $\sigma^2 = \int \tilde{f}^2 \, dm + 2 \sum_{k=1}^{\infty} \int \tilde{f} \cdot \tilde{f} \circ T^k \, dm.$

Moreover, this series is converging, since the correlations $\int \tilde{f} \cdot \tilde{f} \circ T^k \, dm$ tend to zero exponentially fast. This again follows from the spectral description of $L$ (here, what matters is that, apart from the eigenvalue 1, the spectrum of $L$ is contained in a disk of radius $< 1$).
(3) The previous items show that the natural measure for the system is not Lebesgue measure, but $m$. Hence, a central limit theorem on $(I, m)$ would be more natural than on $(I, \mathrm{Leb})$. It turns out that these statements are equivalent, see Theorem 4.9 below.
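Formula (3.3) can be evaluated explicitly in a simple case. The following is a minimal sketch, assuming the doubling map $T(x) = 2x \bmod 1$ (two linear onto branches, so that $m = \mathrm{Leb}$ and $\varphi = 1$) and the observable $f(x) = x$; these choices, the truncation of the series and the quadrature grid are illustrative assumptions. For this example a direct computation with binary digits gives $\int \tilde{f} \cdot \tilde{f} \circ T^k \, dm = 2^{-k}/12$, hence $A = 1/2$ and $\sigma^2 = 1/12 + 2 \cdot (1/12) = 1/4$.

```python
N_POINTS = 2 ** 14  # dyadic grid, so that x -> 2^k x mod 1 is computed exactly


def correlation(k, n_points=N_POINTS):
    """Midpoint approximation of the integral of f~ * (f~ o T^k) dLeb.

    Here T is the doubling map and f~(x) = x - 1/2.
    """
    total = 0.0
    for j in range(n_points):
        x = (j + 0.5) / n_points
        total += (x - 0.5) * (((2 ** k) * x) % 1.0 - 0.5)
    return total / n_points


def variance(k_max=10):
    """Truncation of the series (3.3) after k_max correlation terms."""
    return correlation(0) + 2.0 * sum(correlation(k) for k in range(1, k_max + 1))


if __name__ == "__main__":
    for k in range(4):
        print(k, correlation(k), 2.0 ** (-k) / 12.0)
    print("sigma^2 ~", variance())
```

The printed correlations decay geometrically, in line with the exponential decay stated in comment (2), and the truncated sum should be close to $1/4$.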


We have written the proof for $B$ the space of $C^1$ functions, hence it only applies if $f$ is $C^1$. However, essentially the same proof works for Hölder functions (the operator $L$ is still quasi-compact on the space of Hölder functions). More importantly, the proof also applies to much more general dynamics if they retain uniform expansion. Let us give a (non-exhaustive) list of examples that can be treated using such methods. In most cases, the difficulty is to find a space on which the unperturbed transfer operator $L$ has a spectral gap.
• Subshifts of finite type with a Gibbs measure [GH88, PP90].
• Anosov or Axiom A systems. They can be reduced to subshifts of finite type using coding, but one may also work directly with spaces of distributions [BT07, GL08] (these distributions should be smooth in the unstable direction and dual of smooth in the stable direction).
• Piecewise expanding maps in dimension 1, using spaces of bounded variation functions or variations around this idea [HK82, Kel85].
• Piecewise expanding maps in higher dimension, if the expansion dominates the local complexity in the sense of [Buz97]. See for instance [Sau00].
• Lattices of weakly coupled expanding maps [BGK07].
• Non-uniformly hyperbolic maps for which the time to wait before seeing the hyperbolicity has exponentially small tails (including for instance billiards, or unimodal maps of the interval of Collet-Eckmann type, or some Hénon maps), see [You98].
• Billiard maps [DZ11].
• Time-one maps of contact Anosov flows [Tsu10, Tsu12]. Since there is no hyperbolicity in the direction of the flow, the mixing in this direction should come from a different mechanism related to the non-integrability coming from the contact structure.

To conclude this section, let us describe a strengthening of Theorem 3.1, by estimating the speed of convergence in the central limit theorem:

Theorem 3.6. Under the assumptions of Theorem 3.1, assume moreover that $\sigma^2 > 0$. Then there exists $C > 0$ such that, for any $n > 0$, for any real interval $J$,

$\left| \mu\{x : (S_n f(x) - nA)/\sqrt{n} \in J\} - P(\mathcal{N}(0, \sigma^2) \in J) \right| \le C/\sqrt{n}.$

One can show that, for the interval maps under consideration, the condition $\sigma^2 > 0$ is equivalent to the fact that $f$ is not a coboundary, i.e., it cannot be written as $f = g - g \circ T + c$, where $c$ is a scalar and $g$ is a $C^1$ function.

This theorem is not at all specific to interval maps, it is a general consequence of the spectral method, more precisely of a version of the Nagaev-Guivarc’h theorem that mimics the statement (and the proof) of the classical Berry-Esseen theorem in the independent case:

Theorem 3.7 (Nagaev-Guivarc’h method, Berry-Esseen version). Under the assumptions of Theorem 2.4, assume moreover that $t \mapsto L_t$ is $C^3$ and $\sigma^2 > 0$. There exists $C > 0$ such that, for any $n > 0$, for any real interval $J$,

$\left| P((S_n - nA)/\sqrt{n} \in J) - P(\mathcal{N}(0, \sigma^2) \in J) \right| \le C/\sqrt{n}.$

The $C^3$ assumption corresponds to the fact that, in probability theory, the Berry-Esseen theorem is true for random variables in $L^3$. This theorem readily implies Theorem 3.6 since $t \mapsto L_t$ is analytic in this case.
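The rate $C/\sqrt{n}$ can be observed exactly in the independent case that these statements mimic. The following is a minimal sketch, assuming i.i.d. signs $\pm 1$ (so $A = 0$ and $\sigma^2 = 1$) and restricting to half-lines $J = (-\infty, x]$, i.e., to the Kolmogorov distance; these are illustrative choices, and the function names are hypothetical.

```python
import math


def normal_cdf(x):
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_distance(n):
    """sup_x |P(S_n / sqrt(n) <= x) - Phi(x)| for S_n a sum of n i.i.d. signs.

    The law of S_n is binomial, so the supremum is attained at one of its
    atoms, either from the left or from the right.
    """
    scale = 2 ** n
    cdf_left = 0.0  # P(S_n < current atom)
    worst = 0.0
    for k in range(n + 1):  # k = number of +1 signs, S_n = 2k - n
        x = (2 * k - n) / math.sqrt(n)
        cdf_right = cdf_left + math.comb(n, k) / scale
        phi = normal_cdf(x)
        worst = max(worst, abs(cdf_left - phi), abs(cdf_right - phi))
        cdf_left = cdf_right
    return worst


if __name__ == "__main__":
    for n in (10, 100, 1000):
        d = kolmogorov_distance(n)
        print(f"n={n:5d}  distance={d:.5f}  sqrt(n)*distance={math.sqrt(n) * d:.4f}")
```

In this example the rescaled quantity $\sqrt{n}$ times the distance stays bounded (and does not tend to 0), which shows that the order $1/\sqrt{n}$ cannot be improved in general.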


Proof of Theorem 3.7. Let $X$ be any real random variable, and $Y$ be Gaussian with variance $\sigma^2 > 0$. The Berry-Esseen lemma (see for instance [Fel66, Lemma XVI.3.2]) ensures that, for any real interval $J$, for any $T > 0$,

$|P(X \in J) - P(Y \in J)| \le C \int_0^T \frac{|E(e^{itX}) - e^{-\sigma^2 t^2/2}|}{t} \, dt + C/T.$

We will use this inequality with $T = \varepsilon\sqrt{n}$ (for some suitably small $\varepsilon \le \delta$) and $X = (S_n - nA)/\sqrt{n}$. To conclude, it suffices to show that

(3.4) $\int_0^{\varepsilon\sqrt{n}} \frac{|E(e^{it(S_n - nA)/\sqrt{n}}) - e^{-\sigma^2 t^2/2}|}{t} \, dt \le \frac{C}{\sqrt{n}}.$

By assumption, we have $E(e^{it(S_n - nA)/\sqrt{n}}) = e^{-it\sqrt{n}A} \langle \nu, L_{t/\sqrt{n}}^n \xi \rangle$. We have to deal in a special way with the interval $t \in [0, 1/n]$, since the factor $1/t$ in (3.4) is not integrable at 0. In this case, we use a crude estimate:

$\left| E(e^{it(S_n - nA)/\sqrt{n}}) - e^{-it\sqrt{n}A} \right| = \left| \langle \nu, L_{t/\sqrt{n}}^n \xi \rangle - 1 \right| = \left| \langle \nu, L_{t/\sqrt{n}}^n \xi \rangle - \langle \nu, L_0^n \xi \rangle \right| = \left| \left\langle \nu, \sum_{k=0}^{n-1} L_{t/\sqrt{n}}^{n-k-1} (L_{t/\sqrt{n}} - L_0) L_0^k \xi \right\rangle \right|.$

The iterates of the operator $L_0$ have uniformly bounded norm, by assumption. The same holds for $L_{t/\sqrt{n}}$ since its dominating eigenvalue is bounded in modulus by 1. Finally, $\|L_{t/\sqrt{n}} - L_0\| \le Ct/\sqrt{n}$. It follows that the above quantity is bounded by $n \cdot Ct/\sqrt{n}$. Integrating over $t$, we obtain

$\int_0^{1/n} \frac{|E(e^{it(S_n - nA)/\sqrt{n}}) - e^{-\sigma^2 t^2/2}|}{t} \, dt \le \int_0^{1/n} \frac{|E(e^{it(S_n - nA)/\sqrt{n}}) - e^{-it\sqrt{n}A}| + |e^{-it\sqrt{n}A} - 1| + |1 - e^{-\sigma^2 t^2/2}|}{t} \, dt \le \int_0^{1/n} \frac{n \cdot Ct/\sqrt{n} + t\sqrt{n}|A| + \sigma^2 t^2/2}{t} \, dt \le \frac{C}{\sqrt{n}}.$

Now, we deal with the remaining interval $t \in [1/n, \varepsilon\sqrt{n}]$. The spectral decomposition of $L_t$ gives a decomposition $E(e^{itS_n}) = \lambda(t)^n c(t) + d_n(t)$, where $c$ is a $C^1$ function with $c(0) = 1$, and $d_n$ is exponentially small (see the proof of Theorem 2.4). Moreover, $\lambda$ is $C^3$ at 0 (since this is the case of $L_t$, and the eigenvalue is as smooth as the operator by Proposition 2.3), and it has an asymptotic expansion $\lambda(t) = e^{iAt - \sigma^2 t^2/2 + t^3 h(t)}$ where $h(t) = O(1)$. We will use the shorthand $\tilde{\lambda}(t) = e^{-iAt}\lambda(t) = e^{-\sigma^2 t^2/2 + t^3 h(t)}$, designed so that $E(e^{it(S_n - nA)}) = \tilde{\lambda}(t)^n c(t) + e^{-itnA} d_n(t)$.

The contribution of $d_n$ to the integral (3.4) is exponentially small (this is why we had to restrict to $t \ge 1/n$). The remaining quantity to be estimated is

$\int_{1/n}^{\varepsilon\sqrt{n}} \frac{|\tilde{\lambda}(t/\sqrt{n})^n c(t/\sqrt{n}) - e^{-\sigma^2 t^2/2}|}{t} \, dt \le \int_{1/n}^{\varepsilon\sqrt{n}} |\tilde{\lambda}(t/\sqrt{n})^n| \, \frac{|c(t/\sqrt{n}) - 1|}{t} \, dt + \int_{1/n}^{\varepsilon\sqrt{n}} \frac{|\tilde{\lambda}(t/\sqrt{n})^n - e^{-\sigma^2 t^2/2}|}{t} \, dt.$


In the first term, we have $|c(t/\sqrt{n}) - 1| \le Ct/\sqrt{n}$ since $c$ is $C^1$ with $c(0) = 1$. Moreover, $|\tilde{\lambda}(u)| \le e^{-\sigma^2 u^2/4}$ if $u$ is small enough, thanks to its expansion $\tilde{\lambda}(u) = e^{-\sigma^2 u^2/2 + u^3 h(u)}$ with $h(u) = O(1)$. Hence, if $\varepsilon$ is small enough, this term is $\le C \int_{\mathbb{R}} e^{-\sigma^2 t^2/4} \, dt / \sqrt{n} = O(1/\sqrt{n})$.

For the second term, using the inequality $|e^x - 1| \le |x| e^{|x|}$, we have

$\left| \tilde{\lambda}(t/\sqrt{n})^n - e^{-\sigma^2 t^2/2} \right| = e^{-\sigma^2 t^2/2} \left| e^{t^3 h(t/\sqrt{n})/\sqrt{n}} - 1 \right| \le e^{-\sigma^2 t^2/2} \, e^{t^2 |h(t/\sqrt{n})| \cdot t/\sqrt{n}} \cdot t^3 |h(t/\sqrt{n})|/\sqrt{n} \le e^{-\sigma^2 t^2/2} \, e^{t^2 \|h\|_\infty \varepsilon} \cdot t^3 \|h\|_\infty/\sqrt{n}.$

If $\varepsilon$ is small enough so that $\|h\|_\infty \varepsilon < \sigma^2/4$, this is bounded by $Ct^3 e^{-\sigma^2 t^2/4}/\sqrt{n}$. Dividing by $t$ and integrating, we get that the contribution of this term is also $O(1/\sqrt{n})$, as desired. □

Remark 3.8. In this proof, we used the spectral decomposition of $L_t^n$ as $\lambda(t)^n \Pi_t + Q_t^n$, which gives $E(e^{itS_n}) = \lambda(t)^n c(t) + d_n(t)$ where $c(t) = \langle \nu, \Pi_t \xi \rangle$, and $d_n(t) = \langle \nu, Q_t^n \xi \rangle$ is exponentially small as $\|Q_t^n\|$ is itself exponentially small. We claim that, in fact, $|d_n(t)| \le C|t| r^n$ for some $r < 1$, with an additional factor $t$ which makes it possible to avoid the special treatment of the interval $[0, 1/n]$ in the above proof. To show this claim, let us first note that, for any $n \in \mathbb{N}$,

$1 = E(e^{i0S_n}) = \langle \nu, L_0^n \xi \rangle = \langle \nu, \Pi_0 \xi \rangle + \langle \nu, Q_0^n \xi \rangle.$

Letting $n$ tend to infinity, we get $\langle \nu, \Pi_0 \xi \rangle = 1$. With the above equation, this gives for any $n$ that $\langle \nu, Q_0^n \xi \rangle = 0$. Finally,

$d_n(t) = \langle \nu, Q_t^n \xi \rangle = \langle \nu, (Q_t^n - Q_0^n)\xi \rangle = \sum_{k=0}^{n-1} \langle \nu, Q_t^k (Q_t - Q_0) Q_0^{n-k-1} \xi \rangle.$

As $\|Q_t^k\|$ and $\|Q_0^{n-k-1}\|$ are both exponentially small and $\|Q_t - Q_0\| = O(t)$, this gives the desired conclusion.

4. Stable limit distributions for intermittent maps

In the previous section, we have described what happens to uniformly expanding maps: there is so much chaos that the sequence $f, f \circ T, \dots$ almost behaves like an independent sequence, and satisfies a central limit theorem. However, if $T$ is less chaotic, one might expect a different behavior. As we can guess from the formula (3.3) for $\sigma^2$, the critical parameter is the speed of decay of correlations: if $f$ has zero average for a given invariant measure, how fast does $\int f \cdot f \circ T^n$ tend to 0? If this sequence is summable, the formula (3.3) makes sense and one expects a central limit theorem (martingale methods make this intuition precise: if the correlations are summable and some technical conditions are satisfied, then a central limit theorem holds, see for instance [Liv96]). However, if the correlations are not summable, one would expect a different behavior.

This paragraph is devoted to such an example, so-called intermittent maps. An intermittent map is an expanding map of the interval, with uniform expansion except for a fixed point $x_0$ where $T'(x_0) = 1$. This implies that a point close to $x_0$ takes a long time to drift away from $x_0$, so that the dynamics does not look chaotic for a long time and one expects mixing to be rather slow. The


precise behavior depends on the fine asymptotics of $T$ close to $x_0$: if we have $T(x_0 + h) = x_0 + h + ch^{1+\gamma}(1 + o(1))$, then the exponent $\gamma$ dictates almost everything. While the results we will explain hold in a much wider setting (see [Zwe98]), we will for simplicity only consider an explicit family of maps, introduced by Liverani, Saussol and Vaienti in [LSV99] as a modification of the classical Pomeau-Manneville maps [PM80]. Given $\gamma > 0$, the map $T_\gamma : [0, 1] \to [0, 1]$ is defined by
\[
T_\gamma(x) = \begin{cases} x(1 + 2^\gamma x^\gamma) & \text{if } x \le 1/2, \\ 2x - 1 & \text{if } x > 1/2. \end{cases}
\]
It has two branches, which are both onto, and is uniformly expanding away from 0. For $\gamma < 1$, it admits an absolutely continuous invariant probability measure $m$ (while, for $\gamma \ge 1$, the corresponding measure is infinite), whose density is Lipschitz on any compact subset of $(0, 1]$, and grows like $c/x^\gamma$ for small $x$. It is also known that the correlations for $m$ decay exactly like $c/n^{1/\gamma - 1}$ (see for instance [You99, Sar02, Gou04b]). Hence, we expect a central limit theorem for $\gamma < 1/2$, and a different behavior for $\gamma \ge 1/2$. This is indeed the case:

Theorem 4.1. Let $f : I \to \mathbb{R}$ be a $C^1$ function. Write $A = \int f \, dm$.
(1) If $\gamma < 1/2$, there exists $\sigma^2 \ge 0$ such that $(S_n f - nA)/\sqrt{n}$ converges in distribution to a Gaussian random variable $N(0, \sigma^2)$.
(2) If $\gamma \in (1/2, 1)$ and $f$ is generic in the sense that $f(0) \ne \int f \, dm$, there exists a stable law $W$ of index $1/\gamma$ such that $(S_n f - nA)/n^\gamma$ tends in distribution to $W$.
(3) If $\gamma \in (1/2, 1)$ and $f(0) = \int f \, dm$, there exists $\sigma^2 \ge 0$ such that $(S_n f - nA)/\sqrt{n}$ converges in distribution to a Gaussian random variable $N(0, \sigma^2)$.

The convergence in distribution in the theorem holds in both probability spaces $(I, m)$ and $(I, \mathrm{Leb})$. We will concentrate on the stable law case: there are numerous proofs for the central limit theorem, while the only available proof for the stable law case relies on the Nagaev-Guivarc'h spectral method.
Contrary to the previous section, we will not give a complete proof of the result, but rather a detailed sketch of the argument. Note that the behavior is also known for $\gamma = 1/2$: there is always convergence to a Gaussian distribution, but with a normalization $1/\sqrt{n \log n}$ if $f(0) \ne \int f \, dm$, and $1/\sqrt{n}$ otherwise (see [Gou04a]). The proofs are very similar, so we will not say more about this case.

Remark 4.2. Refining the method of proof below, one can also obtain a speed of convergence towards the limit distribution, as in Theorem 3.6. See [Gou05] for the case $\gamma < 1/2$ (convergence to a Gaussian distribution), and [GM14] for $\gamma > 1/2$ (convergence to stable laws).

4.1. Stable laws. Stable laws are the probability distributions on $\mathbb{R}$ answering the following question: if $X_1, X_2, \dots$ is a sequence of i.i.d. random variables, with partial sums $S_n$, what are the possible limits of $(S_n - A_n)/B_n$ (where $A_n$ and $B_n$ are suitable normalizing sequences)?

Definition 4.3. A probability distribution on $\mathbb{R}$ (which is not a Dirac measure) is stable if it arises as such a limit.


For instance, a Gaussian distribution is stable (just take $X_i$ to be in $L^2$), but there are other examples:

Example 4.4. Assume that the $X_i$ are i.i.d., bounded from below and satisfy $P(X_i \ge z) \sim z^{-\beta}$ when $z \to \infty$, for some $\beta \in (1, 2)$. Then, for $t > 0$, a simple computation shows that we have $E(e^{itX_i}) = 1 + itA + ct^\beta(1 + o(1))$ when $t \to 0$, where $A = E(X_i)$ and $c$ is a complex number. It follows that, for any $t > 0$,
\[
\begin{aligned}
E\left( e^{it \frac{S_n - nA}{n^{1/\beta}}} \right) &= E\left( e^{itX_i/n^{1/\beta}} \right)^n e^{-nitA/n^{1/\beta}} \\
&= \exp\left( itA/n^{1/\beta} + ct^\beta/n \, (1 + o(1)) \right)^n e^{-nitA/n^{1/\beta}} \to e^{ct^\beta}.
\end{aligned}
\]
Since our random variables are real valued, their characteristic functions satisfy $E(e^{-itY}) = \overline{E(e^{itY})}$. Hence, convergence of the characteristic functions for $t \ge 0$ implies the same convergence for all $t \in \mathbb{R}$. This shows that the characteristic function of $(S_n - nA)/n^{1/\beta}$ converges pointwise to a continuous function, and therefore that this sequence converges in distribution. The limit $W$ is a stable law by definition, with an explicit characteristic function (in particular, it is not Gaussian). One checks that it satisfies $P(W > z) \sim c' z^{-\beta}$ and $P(W < -z) = o(z^{-\beta})$ when $z \to \infty$, for some $c' > 0$.

This example is significant: all non-Gaussian stable laws have the same kind of behavior, characterized by an exponent $\beta \in (0, 2)$ called the index of the stable law. Their tails are heavy, of the order $z^{-\beta}$. Their characteristic functions are explicit, of the form $E(e^{itW}) = \exp(itA + ct^\beta)$ for $t > 0$ (for some $A \in \mathbb{R}$ and $c \in \mathbb{C}$). The complex number $c$ is related to a skewness parameter, in $[0, 1]$, parameterizing the balance between the tails at $+\infty$ and $-\infty$. The above example is called totally asymmetric since the tails at $-\infty$ are negligible with respect to the tails at $+\infty$, corresponding to a skewness parameter 1. The precise classification, due to Lévy, can for instance be found in [Fel66, Chapter XVII]. It will not be important for us, since we will only encounter the totally asymmetric stable laws that we have described in the above example.
Let us explain heuristically why stable laws show up for intermittent maps. Consider a function $f$ with $\int f \, dm = 0$ (so that the growth of $S_n f$ is typically sublinear) and $f(0) > 0$. If $x$ is very close to 0, so are $T(x)$, and $T(T(x))$, and so on. In particular, the Birkhoff sums $S_n f(x)$ will grow linearly, like $nf(0)$, until $T^n x$ is far away from 0. The quantities one adds to the Birkhoff sums until one regains independence behave like $\varphi \cdot f(0)$, where $\varphi$ is the time to drift away from 0. If the fixed point 0 of $T$ is very neutral (i.e., if $\gamma$ is large), then $\varphi$ has heavy tails, and the Birkhoff sums really behave like the addition of random variables with heavy tails, just as in Example 4.4.

The rigorous proof of Theorem 4.1 follows this intuition. It is done in two main steps, that we will explain in more detail in the following paragraphs:
(1) To regain some independence, we replace long excursions close to 0 (in which there are strong correlations) by a single step: in dynamical terms, this is an inducing process. We will show the convergence to a stable law for the induced map.
(2) Then, we need to go back from the induced map to the original map, using general arguments. Since there is no completely general exposition of this argument in the literature, we will give all details here.


4.2. Stable limits for the induced map. Let $Y = (1/2, 1]$. The first return time to $Y$ of $x \in Y$ is $\varphi_Y(x) = \inf\{n \ge 1 : T^n x \in Y\}$. The combinatorics of $T$ ensure that all points come back to $Y$. We define the induced map (or first return map) $T_Y : Y \to Y$ by $T_Y(x) = T^{\varphi_Y(x)}(x)$. Note that $T_Y$ is uniformly expanding, i.e., $T_Y' \ge \alpha > 1$, since $T' = 2$ on $(1/2, 1]$.

Let $x_0 = 1$, and $x_{n+1} = T^{-1} x_n \cap [0, 1/2]$. The sequence $x_n$ converges to 0. One checks that $x_n \sim c/n^{1/\gamma}$ and $x_n - x_{n+1} \sim c'/n^{1+1/\gamma}$. The points in $[x_{n+1}, x_n]$ are precisely those that take $n$ iterates to reach $Y$. Let $y_n = T^{-1}(x_n) \cap Y$. Then the interval $Y_n = [y_n, y_{n-1}]$ comes back to $Y$ precisely in $n$ steps, i.e., $\varphi_Y = n$ on $Y_n$. Moreover, $T_Y$ has full branches, i.e., $T_Y(Y_n) = Y$ for all $n \ge 1$.

Take a $C^1$ function $f$ for which we want to prove a limit theorem, with $\int f \, dm = 0$. Define on $Y$ the induced function
\[
f_Y(x) = \sum_{k=0}^{\varphi_Y(x) - 1} f(T^k x).
\]
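The objects just defined can be explored numerically. The following is a minimal sketch, not part of the original argument: the function names `lsv_map`, `first_return_time` and `induced_sum` are hypothetical, and floating-point orbits are only a rough stand-in for the exact dynamics. It computes $T_\gamma$, the return time $\varphi_Y$ and the induced function $f_Y$ for a point of $Y$.

```python
def lsv_map(x: float, gamma: float) -> float:
    """Map T_gamma: x(1 + 2^gamma x^gamma) on [0, 1/2], 2x - 1 on (1/2, 1]."""
    if x <= 0.5:
        return x * (1.0 + (2.0 * x) ** gamma)
    return 2.0 * x - 1.0


def first_return_time(x: float, gamma: float, max_iter: int = 10**6) -> int:
    """phi_Y(x) = inf{n >= 1 : T^n x in Y} with Y = (1/2, 1], for x in Y."""
    n = 1
    y = lsv_map(x, gamma)
    while y <= 0.5:  # still outside Y
        y = lsv_map(y, gamma)
        n += 1
        if n > max_iter:
            raise RuntimeError("no return to Y within max_iter iterations")
    return n


def induced_sum(f, x: float, gamma: float) -> float:
    """f_Y(x) = sum of f(T^k x) for 0 <= k < phi_Y(x)."""
    total = 0.0
    y = x
    for _ in range(first_return_time(x, gamma)):
        total += f(y)
        y = lsv_map(y, gamma)
    return total


if __name__ == "__main__":
    gamma = 0.75
    for x in (0.9, 0.7, 0.5 + 1e-3, 0.5 + 1e-6):
        print(x, first_return_time(x, gamma), induced_sum(lambda u: 1.0, x, gamma))
```

With the constant function $f = 1$, `induced_sum` simply returns the return time. Points of $Y$ just above $1/2$ land close to the neutral fixed point 0 and need many iterates to come back, which is the mechanism behind the heavy tails of $f_Y$.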

The Birkhoff sums of $f_Y$ for the map $T_Y$ (that we will denote by $S_n^Y f_Y$) form a subsequence of the Birkhoff sums of $f$ for the map $T$, corresponding precisely to those times where $T^n x$ comes back to $Y$. To prove a limit theorem for $S_n f$, the idea will be to first prove a limit theorem for $S_n^Y f_Y$ using the Nagaev-Guivarc'h spectral method (this step is due to Aaronson and Denker [AD01]) and then deduce a result for $S_n f$ (using general arguments of [MT04] and [Zwe07]). The original argument for Theorem 4.1, given in [Gou04a], relies only on the spectral method and is more complicated (but it allows extensions that are not available by more elementary methods, for instance to the local limit theorem [Gou05]).

Proposition 4.5. Let $\gamma \in (1/2, 1)$. Consider a $C^1$ function $f$ with $\int f \, dm = 0$ and $f(0) > 0$. There exists a stable law $W$ of index $1/\gamma$ such that $S_n^Y f_Y / n^\gamma$ converges in distribution (on the probability space $(Y, \mathrm{Leb}_{|Y}/\mathrm{Leb}(Y))$) towards $W$.

This proposition is proved using the spectral method. We first need to understand the distribution of $f_Y$. On the interval $Y_n$ (which has a measure $\sim C/n^{1+1/\gamma}$), this function is equal to $nf(0) + o(n)$: indeed, $f_Y(x)$ is the sum $\sum_{k=0}^{n-1} f(T^k x)$, and among the $T^k x$ most are very close to 0. Writing $\mathrm{Leb}_Y = \mathrm{Leb}_{|Y}/\mathrm{Leb}(Y)$, we obtain:
\[
\mathrm{Leb}_Y\{x : f_Y(x) \ge z\} \sim \sum_{n = z/f(0)}^{\infty} \mathrm{Leb}_Y(Y_n) \sim \sum_{n = z/f(0)}^{\infty} C/n^{1+1/\gamma} \sim C'/z^{1/\gamma}. \tag{4.1}
\]
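The last equivalence in (4.1) is a comparison between a sum and an integral: $\sum_{n \ge n_0} n^{-(1+1/\gamma)}$ behaves like $\gamma \, n_0^{-1/\gamma}$, so that with $n_0 = z/f(0)$ one gets $C' = C \gamma f(0)^{1/\gamma}$. The sketch below checks this numerically; the names `tail_sum` and `predicted_tail`, and the truncation parameter `cutoff`, are illustrative assumptions and not notation from the text.

```python
def tail_sum(n0: int, gamma: float, cutoff: int = 200_000) -> float:
    """Approximate the sum over n >= n0 of n^{-(1 + 1/gamma)}.

    Terms with n < cutoff are summed directly; the remaining tail is
    replaced by the integral of x^{-(1 + 1/gamma)} over [cutoff, infinity).
    """
    s = 1.0 + 1.0 / gamma
    head = sum(n ** (-s) for n in range(n0, cutoff))
    rest = gamma * cutoff ** (-1.0 / gamma)
    return head + rest


def predicted_tail(n0: int, gamma: float) -> float:
    """Leading-order prediction gamma * n0^{-1/gamma} for the same sum."""
    return gamma * n0 ** (-1.0 / gamma)


if __name__ == "__main__":
    gamma = 0.75
    for n0 in (10, 100, 1000):
        print(n0, tail_sum(n0, gamma) / predicted_tail(n0, gamma))
```

The printed ratios approach 1 as `n0` grows, in line with the heavy tail of order $z^{-1/\gamma}$.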

This shows that the function $f_Y$ has heavy tails, of the order $1/z^{1/\gamma}$, just as in Example 4.4. We expect that the map $T_Y$, being uniformly expanding, will give enough independence to ensure that $S_n^Y f_Y$ behaves like a sum of independent random variables. In this way, the convergence to a stable law would follow from Example 4.4.

We define a transfer operator $\mathcal{L}$ for the map $T_Y$ as in Section 3, in (3.1). Our map $T_Y$ has infinitely many branches, but it is uniformly expanding so that most arguments of Section 3 still work: one checks that $\mathcal{L}$, acting on the space $\mathcal{B}$ of $C^1$


functions on $Y$, is a quasi-compact operator, and that it has a unique eigenvalue of modulus 1, at 1. Moreover, this eigenvalue is simple. The only difficulty is to check the distortion lemma 3.3, even for the branches of $T_Y$, since there are infinitely many of them. This is an elementary computation, see [You99] (one can also give a proof using the fact that the branches of $T$ have a negative Schwarzian derivative).

Let us define perturbed operators $\mathcal{L}_t$, for small $t$, by $\mathcal{L}_t u = \mathcal{L}(e^{itf_Y} u)$. By definition, they satisfy as in Section 3 (see Lemma 3.2) the identity
\[
E(e^{itS_n^Y f_Y}) = \int \mathcal{L}_t^n 1 \, d\mathrm{Leb}_Y.
\]
Since the function $f_Y$ is unbounded, $e^{itf_Y} u$ is in general not $C^1$ even if $u$ is. However, the operator $\mathcal{L}$ has an additional averaging effect (and it gives a small weight to regions where the derivative of $f_Y$ is large). One can therefore check that $\mathcal{L}_t u$ is $C^1$ if $u$ is $C^1$, i.e., $\mathcal{L}_t$ maps $\mathcal{B}$ to itself. Even more, the same computation gives that
\[
\|\mathcal{L}_t - \mathcal{L}\|_{\mathcal{B} \to \mathcal{B}} \le C|t|. \tag{4.2}
\]
We are almost in a situation to apply Theorem 2.4. There is one assumption of this theorem which is not satisfied, of course (otherwise, $S_n^Y f_Y$ would satisfy a central limit theorem, while we expect it to converge, suitably renormalized, to a stable law): the family $t \mapsto \mathcal{L}_t$ is continuous by (4.2), but it is not $C^2$. Copying the first steps of the proof of Theorem 2.4, we obtain the following. Denote by $\lambda(t)$ the dominating eigenvalue of $\mathcal{L}_t$, then
\[
E(e^{itS_n^Y f_Y}) = c(t)\lambda(t)^n + d_n(t), \tag{4.3}
\]
where $c(t)$ tends to 1 when $t \to 0$, and $\|d_n\|_{L^\infty([-\delta,\delta])} \to 0$. However, we do not know the expansion of $\lambda(t)$ for small $t$. If we could prove that this expansion is similar to the expansion of $E(e^{itf_Y})$, then we could follow the argument in Example 4.4 and get the desired limit theorem (this is the essence of the Nagaev-Guivarc'h spectral method: follow the proofs of the independent case).

It remains to study $\lambda(t)$. We do it by hand, instead of relying on an abstract argument such as smoothness in Theorem 2.4. Denote by $\tilde\xi_t = \Pi_t(1)$ an eigenfunction of $\mathcal{L}_t$ for its dominating eigenvalue $\lambda(t)$ (where $\Pi_t$ denotes the corresponding spectral projection). By Proposition 2.3, $t \mapsto \Pi_t$ is as smooth as $t \mapsto \mathcal{L}_t$. With (4.2), we get $\|\tilde\xi_t - \tilde\xi_0\|_{C^1} = O(|t|)$. The function $\tilde\xi_0$ is the density of the invariant measure; it is positive and therefore satisfies $\int \tilde\xi_0 > 0$. We deduce that $\int \tilde\xi_t \ne 0$ for small $t$. Hence, we can define a normalized eigenfunction $\xi_t = \tilde\xi_t / \int \tilde\xi_t \, d\mathrm{Leb}_Y$. It still satisfies $\|\xi_t - \xi_0\|_{C^1} = O(|t|)$. Integrating the equation $\lambda(t)\xi_t = \mathcal{L}_t \xi_t$, we get:
\[
\begin{aligned}
\lambda(t) = \int \mathcal{L}_t \xi_t \, d\mathrm{Leb}_Y &= \int \mathcal{L}_t(\xi_t - \xi_0) \, d\mathrm{Leb}_Y + \int \mathcal{L}_t \xi_0 \, d\mathrm{Leb}_Y \\
&= \int (\mathcal{L}_t - \mathcal{L}_0)(\xi_t - \xi_0) \, d\mathrm{Leb}_Y + \int \mathcal{L}_t \xi_0 \, d\mathrm{Leb}_Y,
\end{aligned}
\]
where we have inserted $\mathcal{L}_0(\xi_t - \xi_0)$ in the second line since its integral vanishes by our normalization choice for $\xi_t$. In the first term, $\|\mathcal{L}_t - \mathcal{L}_0\| = O(|t|)$ and $\|\xi_t - \xi_0\| = O(|t|)$, hence this term is $O(t^2)$. In the second term,
\[
\int \mathcal{L}_t \xi_0 \, d\mathrm{Leb}_Y = \int \mathcal{L}(e^{itf_Y} \xi_0) \, d\mathrm{Leb}_Y = \int e^{itf_Y} \xi_0 \, d\mathrm{Leb}_Y = \int e^{itf_Y} \, dm_Y,
\]


where $m_Y$ is the measure with density $\xi_0$ (it is the normalized restriction of the absolutely continuous invariant measure to $Y$). We have proved that
\[
\lambda(t) = E_{m_Y}(e^{itf_Y}) + O(t^2). \tag{4.4}
\]
The function $f_Y$ has heavy tails, as we explained in (4.1) (there, we did the computation for Lebesgue measure, but the density $\xi_0$ of $m_Y$ is continuous and the heavy tails all come from small neighborhoods of $1/2$, so the same estimate follows for $m_Y$). Hence, the asymptotic expansion in Example 4.4 gives, for $t > 0$,
\[
E_{m_Y}(e^{itf_Y}) = 1 + itE_{m_Y}(f_Y) + ct^{1/\gamma}(1 + o(1)).
\]
Since $f_Y$ is the function induced by $f$ on $Y$, we have
\[
E_{m_Y}(f_Y) = \frac{1}{m(Y)} E_m(f) = 0.
\]
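The identity $E_{m_Y}(f_Y) = E_m(f)/m(Y)$ holds for any ergodic measure-preserving system, and it can be checked exactly on a toy model. The sketch below is an illustration under stated assumptions, not the intermittent map itself: the system is a cyclic permutation of a finite set with the uniform measure (which is ergodic because there is a single cycle), and the names `induced_function` and `kac_check` are hypothetical.

```python
from fractions import Fraction


def induced_function(perm, Y, f):
    """Return {y: f_Y(y)}: the sum of f along the orbit of y until its first return to Y."""
    f_Y = {}
    for y in Y:
        total, x = f[y], perm[y]
        while x not in Y:
            total += f[x]
            x = perm[x]
        f_Y[y] = total
    return f_Y


def kac_check(perm, Y, f):
    """Return the pair (E_{m_Y}(f_Y), E_m(f) / m(Y)) as exact fractions."""
    f_Y = induced_function(perm, Y, f)
    mean_induced = Fraction(sum(f_Y.values()), len(Y))
    mean_f = Fraction(sum(f), len(perm))
    measure_Y = Fraction(len(Y), len(perm))
    return mean_induced, mean_f / measure_Y


if __name__ == "__main__":
    perm = [(i + 3) % 10 for i in range(10)]  # one cycle, since gcd(3, 10) = 1
    Y = {0, 1, 4}
    f = [i * i for i in range(10)]
    print(kac_check(perm, Y, f))
```

The two returned values coincide because the excursions out of $Y$ partition the whole space, so summing $f_Y$ over $Y$ is the same as summing $f$ over $X$. In particular, a function with zero average induces a function with zero average.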

Since $1/\gamma < 2$, the term $O(t^2)$ in (4.4) is negligible with respect to $t^{1/\gamma}$. We get, for $t > 0$, $\lambda(t) = 1 + ct^{1/\gamma}(1 + o(1))$. We can then use (4.3) and repeat the computation in Example 4.4 to deduce that $S_n^Y f_Y/n^\gamma$ converges to a stable law of index $1/\gamma$. This proves Proposition 4.5.

4.3. Inducing limit theorems. In this paragraph, we explain how (under suitable assumptions) limit theorems for an induced map imply limit theorems for the original map. This statement is inspired by [MT04] and [CG07].

Definition 4.6. A continuous function $L : \mathbb{R}_+^* \to \mathbb{R}_+^*$ is slowly varying if, for any $\lambda > 0$, $L(\lambda x)/L(x) \to 1$ when $x \to \infty$. A function $f$ is regularly varying with index $d$ if it can be written as $x^d L(x)$ where $L$ is slowly varying. A sequence $a_n$ is regularly varying with index $d$ if there exists a function $f$, regularly varying with index $d$, such that $a_n = f(n)$.

Definition 4.7. A family of real random variables $(X_i)_{i \in I}$ is tight if there is no possible loss of mass at infinity, i.e., for all $\varepsilon > 0$, there exists $M > 0$ such that $P(|X_i| \ge M) < \varepsilon$ for all $i \in I$.

For instance, a sequence of random variables which converges in distribution is tight.

Theorem 4.8. Let $T : X \to X$ be an ergodic probability preserving map, let $\alpha(n)$ and $B_n$ be two sequences of integers which are regularly varying with positive indexes, let $A_n \in \mathbb{R}$, and let $Y \subset X$ be a subset with positive measure. We will denote by $m_Y := m_{|Y}/m(Y)$ the induced probability measure. Let $\varphi : Y \to \mathbb{N}^*$ be the first return time to $Y$ for $T$, and $T_Y = T^\varphi : Y \to Y$ the induced map. Consider a measurable function $f : X \to \mathbb{R}$ and define $f_Y : Y \to \mathbb{R}$ by $f_Y = \sum_{k=0}^{\varphi - 1} f \circ T^k$. Let us define the sequence of Birkhoff sums $S_n^Y f_Y = \sum_{k=0}^{n-1} f_Y \circ T_Y^k$. Assume that $(S_n^Y f_Y - A_n)/B_n$ converges in distribution (with respect to $m_Y$) to a random variable $W$. Additionally, assume either that
\[
\frac{S_n^Y \varphi - n/m(Y)}{\alpha(n)} \text{ tends in probability to } 0 \quad \text{and} \quad \max_{0 \le k \le \alpha(n)} |S_k^Y f_Y|/B_n \text{ is tight} \tag{4.5}
\]
or
\[
\frac{S_n^Y \varphi - n/m(Y)}{\alpha(n)} \text{ is tight} \quad \text{and} \quad \max_{0 \le k \le \alpha(n)} |S_k^Y f_Y|/B_n \text{ tends in probability to } 0. \tag{4.6}
\]
Then $(S_n f - A_{\lfloor nm(Y) \rfloor})/B_{\lfloor nm(Y) \rfloor}$ converges in distribution (with respect to $m$) to $W$.


The intuition behind this theorem is the following. For $x \in Y$, one can write $S_N f(x) = S^Y_{n(x,N)} f_Y(x) + R_N(x)$, where $n(x, N)$ is the number of visits of $x$ to $Y$ until time $N$, and $R_N(x)$ is a remainder term corresponding to the last excursion. By Birkhoff's theorem, $n(x, N)$ is close to $Nm(Y)$. Hence, we may write
\[
\begin{aligned}
\frac{S_N f(x) - A_{Nm(Y)}}{B_{Nm(Y)}} &= \frac{S^Y_{Nm(Y)} f_Y(x) - A_{Nm(Y)}}{B_{Nm(Y)}} \\
&\quad + \frac{S^Y_{n(x,N) - Nm(Y)} f_Y(T_Y^{Nm(Y)} x)}{B_{Nm(Y)}} + \frac{R_N(x)}{B_{Nm(Y)}}.
\end{aligned}
\]
The term on the right of the first line converges in distribution (with respect to $m_Y$) to $W$, by assumption. To conclude, we should show that the remaining two terms are small. For the last one, this follows from a general argument. For the middle one, one should control the growth of the Birkhoff sums of $f_Y$, on a time scale given by the fluctuations of $S_n^Y \varphi$ around its average $n/m(Y)$. This is exactly the role of the assumptions (4.5) or (4.6), showing that such a good control holds for the time scale $\alpha(n)$. To complete the argument, one should finally understand how the above convergence in distribution with respect to $m_Y$ yields a corresponding convergence with respect to $m$.

The precise technical implementation of the above idea to prove Theorem 4.8 is given in Subsection 4.6, relying on intermediate results of independent interest that are described in Subsections 4.4 and 4.5. Before this, we explain how Theorem 4.1 follows from the above theorem.

Proof of Theorem 4.1. We concentrate on Case (2), i.e., the case of stable laws, and indicate briefly at the end the modifications for the other cases. Let $\gamma \in (1/2, 1)$, and consider a $C^1$ function $f$ with $\int f \, dm = 0$ and $f(0) > 0$. By Proposition 4.5, the sequence $S_n^Y f_Y/n^\gamma$ converges in distribution to a stable law (for the measure $\mathrm{Leb}_Y$ or for the measure $m_Y$, this is equivalent according to Theorem 4.9 below). Denote by $\varphi$ the first return time to $Y$. The function $\varphi - 1/m(Y)$ is the induced function on $Y$ of the function $g$ equal to $1 - 1/m(Y)$ on $Y$, and 1 outside of $Y$. Hence, Proposition 4.5 shows that $(S_n^Y \varphi - n/m(Y))/n^\gamma$ converges in distribution to a stable law. (Strictly speaking, $g$ is not $C^1$, but it is $C^1$ on $[0, 1/2]$ and $[1/2, 1]$, which is sufficient for the proposition.)

We will apply Theorem 4.8 (more precisely (4.6)) with $\alpha(n) = B_n = n^\gamma$. We have just shown that $(S_n^Y \varphi - n/m(Y))/n^\gamma$ is tight. To conclude, we should show that $\max_{0 \le k \le N} |S_k^Y f_Y|/N$ tends in probability to 0 (where $N = n^\gamma$).
It suffices to show that $S_k^Y f_Y/k$ tends almost surely to 0. This follows from Birkhoff's ergodic theorem since $f_Y$ is integrable, with vanishing integral. This concludes the proof of the theorem in the stable law case.

To handle the other cases, one should show that the induced function $f_Y$ satisfies a central limit theorem for $T_Y$, and then induce back to $T$. When $\gamma < 1/2$, or $\gamma > 1/2$ and $f(0) = 0 = \int f \, dm$, a simple computation shows that the induced function is square integrable. Hence, one can for instance apply Theorem 2.4 to obtain a central limit theorem for $f_Y$, or rely on the martingale arguments of [Liv96]. To go back to the original system, one may again apply Theorem 4.8.


4.4. Limit theorems do not depend on the reference measure. The following theorem has been proved by Eagleson [Eag76] and popularized by Zweimüller (in much more general contexts, [Zwe07]). It is instrumental in the proof of Theorem 4.8.

Theorem 4.9. Let $T : X \to X$ be a transformation preserving an ergodic probability measure $m$. Let $f : X \to \mathbb{R}$ be measurable, let $A_n \in \mathbb{R}$, let $B_n$ tend to $\infty$ and let $m'$ be an absolutely continuous probability measure. Then $(S_n f - A_n)/B_n$ converges in distribution to a random variable $W$ with respect to $m$ if and only if it satisfies the same convergence with respect to $m'$.

Proof. For the proof, let us write $M(n, g, \varphi) = \int g((S_n f - A_n)/B_n) \varphi \, dm$, where $g$ is a bounded Lipschitz function and $\varphi$ is an integrable function. We claim that
\[
M(n, g, \varphi) - M(n, g, \varphi \circ T) \to 0 \quad \text{when } n \to \infty. \tag{4.7}
\]
Let us first prove this assuming that $\varphi$ is bounded. Since $g$ is Lipschitz and bounded, it satisfies $|g(a) - g(b)| \le C \min(1, |a - b|)$. Therefore,
\[
\begin{aligned}
|M(n, g, \varphi) - M(n, g, \varphi \circ T)| &= \left| \int \left[ g\left( \frac{S_n f(Tx) - A_n}{B_n} \right) - g\left( \frac{S_n f(x) - A_n}{B_n} \right) \right] \varphi(Tx) \, dm(x) \right| \\
&\le C \int \min(1, |S_n f(x) - S_n f(Tx)|/B_n) \, dm \\
&= C \int \min(1, |f(x) - f(T^n x)|/B_n) \, dm \\
&\le C \int \left( \min(1, |f|/B_n) + \min(1, |f| \circ T^n/B_n) \right) dm \\
&= 2C \int \min(1, |f|/B_n) \, dm.
\end{aligned}
\]
This quantity converges to 0 when $n \to \infty$, since $B_n \to \infty$. This proves (4.7) for bounded $\varphi$. In general, we have
\[
|M(n, g, \psi)| \le \|g\|_\infty \|\psi\|_{L^1(m)}. \tag{4.8}
\]
Hence, the general case of (4.7) follows by writing $\varphi = \varphi_1 + \varphi_2$ with $\varphi_1$ bounded and $\|\varphi_2\|_{L^1(m)} \le \varepsilon$: we obtain $\limsup |M(n, g, \varphi) - M(n, g, \varphi \circ T)| \le 2\varepsilon \|g\|_\infty$.

Assume now that $(S_n f - A_n)/B_n$ converges in distribution with respect to $m'$ towards $W$. Write $dm' = \varphi \, dm$ with $\varphi$ integrable (and of integral 1). Let $g$ be a bounded Lipschitz function. Then $M(n, g, \varphi) \to E(g(W))$, hence $M(n, g, \varphi \circ T^k) \to E(g(W))$ for any fixed $k$, by (4.7). Therefore, $M(n, g, S_k \varphi/k) \to E(g(W))$. Let $\varepsilon > 0$, and choose $k$ large enough so that $\|S_k \varphi/k - 1\|_{L^1(m)} \le \varepsilon$ (this is possible by Birkhoff's theorem, since $m$ is ergodic). Then
\[
\limsup |M(n, g, 1) - E(g(W))| \le \limsup |M(n, g, 1) - M(n, g, S_k \varphi/k)| + \limsup |M(n, g, S_k \varphi/k) - E(g(W))|.
\]
The first term is at most $\varepsilon \|g\|_\infty$ by (4.8), while the second one is 0. Hence, $M(n, g, 1)$ converges to $E(g(W))$. This proves the convergence of $(S_n f - A_n)/B_n$ to $W$ with respect to $m$.


Conversely, if $(S_n f - A_n)/B_n$ converges to $W$ with respect to $m$, the convergence with respect to $m'$ follows in the same way:
\[
\begin{aligned}
\limsup |M(n, g, \varphi) - E(g(W))| &\le \limsup |M(n, g, \varphi) - M(n, g, S_k \varphi/k)| \\
&\quad + \limsup |M(n, g, S_k \varphi/k) - M(n, g, 1)| \\
&\quad + \limsup |M(n, g, 1) - E(g(W))|.
\end{aligned}
\]
The third term tends to 0 by assumption, the first one tends to 0 by (4.7), and the second one is at most $C \|S_k \varphi/k - 1\|_{L^1(m)}$, which can be made arbitrarily small by choosing $k$ large enough.

4.5. Limit theorems do not depend on random indices. In this paragraph, we show that random time changes, if they do not deviate too much from linearity, do not change the validity of limit theorems. Again, this statement is instrumental in the proof of Theorem 4.8.

Theorem 4.10. Let $T : X \to X$ be a probability preserving map, and let $\alpha(n)$ and $B_n$ be two sequences of integers which are regularly varying with positive indexes. Let also $A_n \in \mathbb{R}$. Let $f : X \to \mathbb{R}$ be a measurable function such that $(S_n f - A_n)/B_n$ converges in distribution to a random variable $W$. Let also $t_1, t_2, \dots$ be a sequence of integer valued measurable functions on $X$, and let $c > 0$. Assume that either
\[
\frac{t_n - cn}{\alpha(n)} \text{ tends in probability to } 0 \quad \text{and} \quad \max_{0 \le k \le \alpha(n)} |S_k f|/B_n \text{ is tight} \tag{4.9}
\]
or
\[
\frac{t_n - cn}{\alpha(n)} \text{ is tight} \quad \text{and} \quad \max_{0 \le k \le \alpha(n)} |S_k f|/B_n \text{ tends in probability to } 0. \tag{4.10}
\]
Then the sequence $(S_{t_n} f - A_{\lfloor cn \rfloor})/B_{\lfloor cn \rfloor}$ converges in distribution to $W$.

As in Theorem 4.8, the assumptions (4.9) or (4.10) say that $t_n$ deviates from its average behavior $cn$ roughly on an order $\alpha(n)$, which is sufficiently small so that the Birkhoff sums of $f$ do not vary too much on this time scale.

Proof. We will show that, under (4.9) or (4.10), there exists a sequence $\beta(n)$ of integers such that
\[
|t_n - cn|/\beta(n) \quad \text{and} \quad \max_{0 \le k \le 2\beta(n)} |S_k f|/B_n \quad \text{both tend in probability to } 0. \tag{4.11}
\]
Let us show how it implies the theorem. It is sufficient to prove that, for any $\varepsilon > 0$,
\[
m\left\{ x : \left| \frac{S_{t_n(x)} f - S_{\lfloor cn \rfloor} f}{B_{\lfloor cn \rfloor}} \right| \ge \varepsilon \right\} \to 0.
\]
Abusing notations, we will omit the integer parts. The measure of the set in the last equation is bounded by
\[
m\{|t_n - cn| \ge \beta(n)\} + m\{\exists i \in [\gamma(n), \beta(n)], \ |S_{cn+i} f - S_{cn} f| \ge \varepsilon B_{cn}\},
\]
where $\gamma(n) = -\min(cn, \beta(n))$. The measure of the first set tends to 0 by (4.11). If $x$ belongs to the second set, then either $|S_{cn} f - S_{cn+\gamma(n)} f| \ge \varepsilon B_{cn}/2$ or $|S_{cn+i} f - S_{cn+\gamma(n)} f| \ge \varepsilon B_{cn}/2$. In both cases, $\max_{0 \le k \le 2\beta(n)} |S_k f|(T^{cn+\gamma(n)} x) \ge \varepsilon B_{cn}/2$. Since $B_n$ is regularly varying, there exists $C$ such that $B_{cn}/2 \ge CB_n$. Hence, the measure of the second set is bounded by $m\{\max_{0 \le k \le 2\beta(n)} |S_k f| \ge C\varepsilon B_n\}$, which also tends to 0 by (4.11).

To conclude the proof, it is therefore sufficient to construct $\beta(n)$ satisfying (4.11).


Lemma 4.11. Let $Y_n$ be a sequence of real random variables tending in probability to 0. There exists a non-decreasing sequence $A(n) \to \infty$ such that $A(n)Y_n$ still tends in probability to 0.

Proof. For $k > 0$, let $N(k)$ be such that, for $n \ge N(k)$, $P(|Y_n| > 1/k^2) \le 1/k$. We can also assume that $N(k + 1) > N(k)$. Define $A$ by $A(n) = k$ when $N(k) \le n < N(k + 1)$; this sequence tends to infinity. Consider $k \in \mathbb{N}$, and $n \ge N(k)$. Let $\ell \ge k$ be such that $N(\ell) \le n < N(\ell + 1)$. Then
\[
P(A(n)|Y_n| > 1/k) \le P(A(n)|Y_n| > 1/\ell) = P(|Y_n| > 1/\ell^2) \le 1/\ell = 1/A(n).
\]
Hence, $P(A(n)|Y_n| > 1/k)$ tends to 0 for any $k$.
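The construction in this proof is explicit enough to be carried out on an example. The sketch below is only an illustration under assumptions that are not in the text: $Y_n$ is taken to be exponentially distributed with mean $1/n$, so that $P(Y_n > s) = e^{-ns}$, and the thresholds $N(k) = \lceil k^2 \log k \rceil + k$ are one admissible choice. The names `example_thresholds`, `make_A` and `tail` are hypothetical.

```python
import math
from bisect import bisect_right


def example_thresholds(k_max: int):
    """N(1), ..., N(k_max), strictly increasing, with exp(-n / k^2) <= 1/k for n >= N(k)."""
    return [math.ceil(k * k * math.log(k)) + k for k in range(1, k_max + 1)]


def make_A(thresholds):
    """Return A with A(n) = k when N(k) <= n < N(k + 1), and A(n) = 0 when n < N(1)."""
    def A(n: int) -> int:
        # Number of thresholds N(k) that are <= n.
        return bisect_right(thresholds, n)
    return A


def tail(n: int, s: float) -> float:
    """P(Y_n > s) for the assumed exponential variable with mean 1/n."""
    return math.exp(-n * s)


if __name__ == "__main__":
    N = example_thresholds(30)
    A = make_A(N)
    k = 3
    for n in (N[k - 1], 200, 1000):
        # P(A(n) Y_n > 1/k) compared with the bound 1/A(n) from the proof.
        print(n, A(n), tail(n, 1.0 / (k * A(n))), 1.0 / A(n))
```

For each fixed $k$, the third printed column stays below the fourth, which itself tends to 0, mirroring the chain of inequalities in the proof.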



Lemma 4.12. Let $B_n$ be a regularly varying sequence with positive index, and let $Y_n$ be a sequence of real random variables such that $Y_n/B_n$ converges in probability to 0. Then there exists a non-decreasing sequence $\varphi(n) = o(n)$ such that $Y_n/B_{\varphi(n)}$ still converges in probability to 0. We can also ensure $\varphi(n + 1) \le 2\varphi(n)$ for any $n$, and $\varphi(n) \to \infty$.

Proof. Applying the previous lemma to $Y_n/B_n$, we obtain a non-decreasing sequence $A(n)$ tending to infinity such that $A(n)Y_n/B_n$ converges in probability to 0. Replacing $A(n)$ with $\min(A(n), \log n)$ if necessary, we can assume $A(n) = O(\log n)$. Write $B_n = n^d L(n)$ where $L$ is slowly varying. Let $\varphi(n)$ be the integer part of $n/A(n)^{1/(2d)}$. It satisfies the inequality $\varphi(n + 1) \le 2\varphi(n)$ since $A$ is non-decreasing, tends to infinity since $A(n) = O(\log n)$, and
\[
\frac{Y_n}{B_{\varphi(n)}} = \frac{A(n)Y_n}{B_n} \cdot \frac{B_n}{A(n)B_{\varphi(n)}}.
\]
The first factor tends to 0 in probability, while the second one is equivalent to
\[
\frac{n^d L(n)}{A(n)\,(n^d/A(n)^{1/2})\,L(n/A(n)^{1/(2d)})}.
\]
By Potter's bounds [BGT87, Theorem 1.5.6], for any $\varepsilon > 0$, there exists $C > 0$ such that $L(n)/L(n/A(n)^{1/(2d)}) \le CA(n)^\varepsilon$. Taking $\varepsilon = 1/4$, we obtain that the last expression is bounded by $C/A(n)^{1/4}$, and therefore tends to 0.

We can now prove (4.11). Assume first (4.9). Applying the previous lemma to $Y_n = t_n - cn$, we obtain a non-decreasing sequence $\varphi(n) = o(n)$ such that $(t_n - cn)/\alpha(\varphi(n)) \to 0$. Let $\beta(n) = \alpha(\varphi(n))/2$, then
\[
\max_{0 \le k \le 2\beta(n)} |S_k f|/B_n = \frac{B_{\varphi(n)}}{B_n} \max_{0 \le k \le \alpha(\varphi(n))} |S_k f|/B_{\varphi(n)}. \tag{4.12}
\]
The factor $B_{\varphi(n)}/B_n$ tends to 0 since $\varphi(n) = o(n)$ and $B_n$ is regularly varying with positive index. The second factor is tight by assumption. Hence, (4.12) tends in probability to 0, as desired.

Assume now (4.10). Applying the previous lemma to $Y_n = \max_{0 \le k \le \alpha(n)} |S_k f|$, we obtain a non-decreasing sequence $\varphi(n) = o(n)$ such that
\[
\max_{0 \le k \le \alpha(n)} |S_k f|/B_{\varphi(n)} \text{ tends in probability to } 0. \tag{4.13}
\]
Let $\psi(n)$ be the smallest integer $p$ such that $\varphi(p) \ge n/2$. Then $\varphi(\psi(n)) \ge n/2$. Moreover, $\varphi(\psi(n) - 1) < n/2$, hence $\varphi(\psi(n)) < n$ by the inequality $\varphi(k + 1) \le 2\varphi(k)$.


Therefore, $B_n \ge C^{-1} B_{\varphi(\psi(n))}$ since the sequence $B_n$, being regularly varying with positive index, is increasing up to a constant multiplicative factor. Let $\beta(n) = \alpha(\psi(n))/2$, we get
\[
\max_{0 \le k \le 2\beta(n)} |S_k f|/B_n = \max_{0 \le k \le \alpha(\psi(n))} |S_k f|/B_n \le C \max_{0 \le k \le \alpha(\psi(n))} |S_k f|/B_{\varphi(\psi(n))}.
\]

This converges to 0 in probability by (4.13). Since $\varphi(\psi(n)) \ge n/2$ and $\varphi(k) = o(k)$, we have $n = o(\psi(n))$. Since $\alpha$ is regularly varying with positive index, this yields $\alpha(n) = o(\beta(n))$. In particular, the tightness of $(t_n - cn)/\alpha(n)$ implies the convergence to 0 of $(t_n - cn)/\beta(n)$. We have proved (4.11) as desired.

4.6. Proof of Theorem 4.8. In this paragraph, we prove Theorem 4.8. Going to the natural extension, we can without loss of generality assume that $T$ is invertible. Abusing notations, we will write $B_{nm(Y)}$ instead of $B_{\lfloor nm(Y) \rfloor}$. We will prove that $(S_n f - A_{nm(Y)})/B_{nm(Y)}$ converges to $W$ in distribution with respect to $m_Y$: this will imply the desired result by Theorem 4.9, since $m_Y$ is absolutely continuous with respect to $m$.

For $x \in Y$ and $N \in \mathbb{N}$, let $n(x, N) = \mathrm{Card}\{1 \le i < N : T^i x \in Y\}$ denote the number of visits of $x$ to $Y$. By construction, it satisfies
\[
n(x, N) \ge k \iff S_k^Y \varphi(x) < N.
\]
Define also a function $H$ on $X$ by $H(x) = \sum_{k=1}^{\psi(x)} f(T^{-k} x)$, where $\psi(x) = \inf\{n \ge 1 : T^{-n} x \in Y\}$. By construction, for $x \in Y$,
\[
S_N f(x) = S^Y_{n(x,N)} f_Y(x) + H(T^N x). \tag{4.14}
\]
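The decomposition (4.14) is purely combinatorial, so it can be verified exactly on a finite invertible system. The sketch below is a toy check under assumptions that are not in the text: $T$ is a cyclic permutation of a finite set (so every backward orbit meets $Y$), and all function names are hypothetical.

```python
def birkhoff_sum(perm, f, x, n):
    """S_n f(x) = f(x) + f(Tx) + ... + f(T^{n-1} x)."""
    total = 0
    for _ in range(n):
        total += f[x]
        x = perm[x]
    return total


def induced_step(perm, Y, f, y):
    """One step of the induced map: return (T_Y(y), f_Y(y)) for y in Y."""
    total, x = f[y], perm[y]
    while x not in Y:
        total += f[x]
        x = perm[x]
    return x, total


def visits(perm, Y, x, N):
    """n(x, N) = Card{1 <= i < N : T^i x in Y}."""
    count = 0
    for _ in range(1, N):
        x = perm[x]
        if x in Y:
            count += 1
    return count


def H(perm, Y, f, z):
    """H(z) = sum of f(T^{-k} z) for 1 <= k <= psi(z), psi(z) the first backward hitting time of Y."""
    inverse = [0] * len(perm)
    for i, j in enumerate(perm):
        inverse[j] = i
    x = inverse[z]
    total = f[x]
    while x not in Y:
        x = inverse[x]
        total += f[x]
    return total


def check_decomposition(perm, Y, f, x, N):
    """Return True if S_N f(x) = S^Y_{n(x,N)} f_Y(x) + H(T^N x) for the given x in Y."""
    induced_part, y = 0, x
    for _ in range(visits(perm, Y, x, N)):
        y, value = induced_step(perm, Y, f, y)
        induced_part += value
    z = x
    for _ in range(N):
        z = perm[z]
    return birkhoff_sum(perm, f, x, N) == induced_part + H(perm, Y, f, z)


if __name__ == "__main__":
    perm = [(i + 3) % 10 for i in range(10)]
    Y = {0, 1, 4}
    f = [i * i - 5 for i in range(10)]
    print(all(check_decomposition(perm, Y, f, x, N) for x in Y for N in range(1, 40)))
```

The induced part accounts for the complete excursions that finish before time $N$, while $H(T^N x)$ collects the values of $f$ along the last, unfinished excursion, read backwards from time $N$.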

Moreover, $H \circ T^N/B_{Nm(Y)}$ converges to 0 in distribution on $X$ (since the measure is invariant and $B_n$ tends to infinity), and therefore on $Y$. To prove the theorem, it is therefore sufficient to show that
\[
\frac{S^Y_{n(x,N)} f_Y - A_{Nm(Y)}}{B_{Nm(Y)}} \to W.
\]
This will follow from Theorem 4.10 if we can check its assumptions (4.9) or (4.10) for $t_N(x) = n(x, N)$ and $c = m(Y)$. The assumptions concerning $S_k^Y f_Y$ are already contained in (4.5) or (4.6) respectively, so we only have to check the assumptions about $t_N$.

Birkhoff's theorem ensures that $n(x, N) = Nm(Y) + o(N)$ for almost every $x$. Therefore, along any subsequence $N_k$ for which $\alpha(N_k) \ge \delta N_k$ with $\delta > 0$, we get that $|n(x, N_k) - N_k m(Y)|/\alpha(N_k)$ converges in probability to 0, and there is nothing left to prove. Thus, it is sufficient to consider only values of $N$ along which $\alpha(N)/N \to 0$.

For any $a > 0$, the characterization of $n(x, N)$ in terms of $S_k^Y \varphi$ given above yields
\[
\begin{aligned}
m_Y\left\{ \frac{n(x, N) - Nm(Y)}{\alpha(N)} \ge a \right\} &= m_Y\left\{ S^Y_{Nm(Y) + \alpha(N)a} \varphi < N \right\} \\
&= m_Y\left\{ \frac{S^Y_{Nm(Y) + \alpha(N)a} \varphi - (Nm(Y) + \alpha(N)a)/m(Y)}{\alpha(Nm(Y) + \alpha(N)a)} < -\frac{a}{m(Y)} \cdot \frac{\alpha(N)}{\alpha(Nm(Y) + \alpha(N)a)} \right\}.
\end{aligned}
\]
Since $\alpha(N) = o(N)$ and $\alpha$ is regularly varying, the ratio $\alpha(N)/\alpha(Nm(Y) + \alpha(N)a)$ is bounded from below by a constant $C > 0$. Hence,
\[
m_Y\left\{ \frac{n(x, N) - Nm(Y)}{\alpha(N)} \ge a \right\} \le m_Y\left\{ \frac{S^Y_{p(N)} \varphi - p(N)/m(Y)}{\alpha(p(N))} < -Ca \right\} \tag{4.15}
\]
for some integer $p(N)$ which tends to infinity with $N$.

Let us now study $m_Y\left\{ \frac{n(x,N) - Nm(Y)}{\alpha(N)} < -a \right\}$. Using again $\alpha(N) = o(N)$, we get $Nm(Y) - \alpha(N)a \ge Nm(Y)/2 > 0$ for large enough $N$. Hence,
\[
\begin{aligned}
m_Y\left\{ \frac{n(x, N) - Nm(Y)}{\alpha(N)} < -a \right\} &= m_Y\left\{ S^Y_{Nm(Y) - \alpha(N)a} \varphi \ge N \right\} \\
&= m_Y\left\{ \frac{S^Y_{Nm(Y) - \alpha(N)a} \varphi - (Nm(Y) - \alpha(N)a)/m(Y)}{\alpha(Nm(Y) - \alpha(N)a)} \ge \frac{a}{m(Y)} \cdot \frac{\alpha(N)}{\alpha(Nm(Y) - \alpha(N)a)} \right\}.
\end{aligned}
\]
Since $\frac{\alpha(N)}{\alpha(Nm(Y) - \alpha(N)a)} \ge C > 0$, we obtain
\[
m_Y\left\{ \frac{n(x, N) - Nm(Y)}{\alpha(N)} < -a \right\} \le m_Y\left\{ \frac{S^Y_{q(N)} \varphi - q(N)/m(Y)}{\alpha(q(N))} \ge Ca \right\}, \tag{4.16}
\]
for some $q(N)$ tending to infinity with $N$.

The equations (4.15) and (4.16) together show that the tightness (resp. the convergence in probability to 0) of $(S_n^Y \varphi - n/m(Y))/\alpha(n)$ implies the tightness (resp. the convergence in probability to 0) of $(n(x, N) - Nm(Y))/\alpha(N)$. We can therefore apply Theorem 4.10, to conclude the proof.

5. The almost-sure invariance principle

In this section, we describe another application of the Nagaev-Guivarc'h method to prove a limit theorem for dynamical systems, the almost sure invariance principle. This limit theorem ensures that the trajectories of a process can be coupled with a Brownian motion so that, almost surely, the difference between the processes is negligible with respect to their size. The precise definition is the following (we formulate it for vector-valued observables since the Nagaev-Guivarc'h method applies directly to this case, while a lot of martingale-based arguments are restricted to real-valued observables).

Definition 5.1. Let $\rho \in (0, 1/2]$ and let $\Sigma^2$ be a (possibly degenerate) $d \times d$ symmetric nonnegative matrix. A random process $(X_0, X_1, \dots)$ taking its values in $\mathbb{R}^d$ satisfies the almost sure invariance principle with error rate $o(n^\rho)$ and limiting covariance $\Sigma^2$ if there exist a probability space $\Omega$ and two processes $(X_0^*, X_1^*, \dots)$ and $(B_0, B_1, \dots)$ on $\Omega$ such that:
(1) The processes $(X_0, X_1, \dots)$ and $(X_0^*, X_1^*, \dots)$ have the same distribution.
(2) The random variables $B_0, B_1, \dots$ are i.i.d. and distributed like $N(0, \Sigma^2)$.


(3) Almost surely in Ω, # #n−1 n−1 #  ## # ∗ X − B # = o(nρ ). # # # =0

=0

This property implies the central limit theorem, but it is much more precise (for instance, it readily gives the law of the iterated logarithm, describing the almost sure growth rate of the partial sums $\sum_{\ell=0}^{n-1} X_\ell$). Such a result is well known for sums of i.i.d. random variables, but it is delicate: the optimal error rate ($O(n^{1/p})$ for random variables in $L^p$) was only proved for real-valued random variables in 1975 for $p \ge 4$ [KMT75] (the case $p < 4$ is easier, and can be handled with different methods, for instance using Skorokhod embedding). For $\mathbb{R}^d$-valued random variables, the result has been proved even more recently [Ein89, Zaĭ98], and is really difficult.

The almost sure invariance principle can be proved for some dynamical systems using martingale or approximation arguments, if there is a well behaved underlying filtration (see for instance [HK82, DP84, MN09]). In this section, we describe the results of [Gou10a] relying on the Nagaev-Guivarc’h spectral method, which give better error bounds and apply to a whole range of dynamical systems (for instance those described at the end of Section 3). To simplify notations, we write $tx$ instead of $\langle t, x \rangle$ for $t \in \mathbb{R}^d$ and $x \in \mathbb{R}^d$. The main theorem is the following.

Theorem 5.2. Let $X_1, X_2, \dots$ be a centered stationary sequence of $\mathbb{R}^d$-valued random variables, with partial sums denoted by $S_n$. Assume that there exist a complex Banach space $B$, a family of operators $L_t$ acting on $B$ (for $t \in \mathbb{R}^d$ with $|t| \le \delta$) and $\xi \in B$, $\nu \in B^*$ such that:
(1) strong coding: for all $n \in \mathbb{N}$, for all $t_0, \dots, t_{n-1} \in \mathbb{R}^d$ with $|t_i| \le \delta$,
$$E\bigl(e^{i \sum_{\ell=0}^{n-1} t_\ell X_\ell}\bigr) = \langle \nu, L_{t_{n-1}} L_{t_{n-2}} \cdots L_{t_1} L_{t_0} \xi \rangle.$$

(2) spectral description: $r_{\mathrm{ess}}(L_0) < 1$, and $L_0$ has a single eigenvalue of modulus $\ge 1$, located at 1. It is an isolated eigenvalue, of multiplicity one.
(3) weak regularity: there exists $C > 0$ such that for all $|t| \le \delta$ and all $n \in \mathbb{N}$, we have $\|L_t^n\|_{B \to B} \le C$.
(4) there exists $p > 2$ such that $\|X_i\|_{L^p} \le C$.
Then there exists a matrix $\Sigma^2$ such that $S_n/\sqrt{n}$ converges in distribution to $N(0, \Sigma^2)$. Moreover, the process $(X_\ell)_{\ell \in \mathbb{N}}$ satisfies the almost sure invariance principle with error rate $o(n^\rho)$ for all $\rho$ with
$$\rho > \frac{p}{4p - 4} = \frac{1}{4} + \frac{1}{4p - 4}.$$
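As a small numerical illustration of the threshold in Theorem 5.2, the following sketch evaluates $p/(4p-4)$ in exact arithmetic and checks that it agrees with $1/4 + 1/(4p-4)$. The function name `asip_rate_threshold` is ours and purely illustrative; it is not taken from [Gou10a].

```python
from fractions import Fraction


def asip_rate_threshold(p):
    """Return p/(4p-4), the infimum of admissible error exponents rho (needs p > 2)."""
    p = Fraction(p)
    if p <= 2:
        raise ValueError("Theorem 5.2 assumes p > 2")
    return p / (4 * p - 4)


for p in (3, 4, 10, 1000):
    rho = asip_rate_threshold(p)
    # The same quantity written as 1/4 + 1/(4p-4).
    assert rho == Fraction(1, 4) + 1 / (4 * Fraction(p) - 4)
    print(p, rho, float(rho))
```

The threshold decreases towards 1/4 as $p$ grows, which matches the limiting case $p = \infty$.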

When $p = \infty$, the condition on the error rate becomes $\rho > 1/4$. This bound is rather good (in particular dimension-independent), although it is much weaker than the result for i.i.d. random variables (one should be able to take any $\rho > 0$). In really non-independent situations, all the methods seem to be stuck at this 1/4 boundary, except for the very recent paper [BLW14]. This theorem should be compared to the usual Nagaev-Guivarc’h result giving the central limit theorem, Theorem 2.4.
• The coding assumption (1) in Theorem 5.2 is stronger (instead of looking at only one Fourier parameter, one should be able to change it with time),


but this is minor since operators that satisfy the usual coding property of Theorem 2.4 usually also satisfy the strong coding property. For instance, one readily checks that the proof of Lemma 3.2 also gives the strong coding property for dynamical systems.
• The spectral assumption (2) on $L_0$ is the same.
• The regularity assumption (3) is considerably weaker in Theorem 5.2: since it does not require any continuity, the operator $L_t$ for $t \ne 0$ may not have a dominating eigenvalue $\lambda(t)$. In particular, the usual strategy of the Nagaev-Guivarc’h theory (reduce everything to the study of the dominating eigenvalue) cannot work!
• The price to pay for the very weak condition (3) is an additional condition (4) of $L^p$ boundedness for some $p > 2$. This condition is not needed in Theorem 2.4 (and, indeed, significant applications are to merely $L^2$ functions). However, it is natural for the almost sure invariance principle since it is necessary even for i.i.d. random variables.

Remark 5.3. If $t \mapsto L_t$ is continuous, then the weak regularity condition (3) is satisfied. Indeed, in this case, Proposition 2.3 shows that one can write for small $t$ a decomposition $L_t^n = \lambda(t)^n \Pi_t + Q_t^n$ where $\Pi_t$ is the eigenprojection corresponding to the dominating eigenvalue $\lambda(t)$ of $L_t$, and $Q_t = (I - \Pi_t) L_t$ corresponds to the rest of the spectrum of $L_t$. It satisfies $\|Q_t^n\| \le C r^n$ for some $r < 1$. To prove the condition (3), it is therefore sufficient to show that $|\lambda(t)| \le 1$ for small $t$. We have
$$E\bigl(e^{i t \sum_{\ell=0}^{n-1} X_\ell}\bigr) = \langle \nu, L_t^n \xi \rangle = \lambda(t)^n \langle \nu, \Pi_t \xi \rangle + \langle \nu, Q_t^n \xi \rangle = \lambda(t)^n \langle \nu, \Pi_t \xi \rangle + O(r^n). \tag{5.1}$$

When $t \to 0$, by continuity, the quantity $\langle \nu, \Pi_t \xi \rangle$ converges to $\langle \nu, \Pi_0 \xi \rangle = 1$. In particular, if $t$ is small enough, $\langle \nu, \Pi_t \xi \rangle \ne 0$. As the right hand side of (5.1) should remain bounded by 1 in modulus, this gives $|\lambda(t)| \le 1$ as desired.

The $L^p$ condition (4) hints at the fact that the proof of the theorem is not spectral or dynamical, but rather probabilistic in spirit. (In this sense, although the statement looks very similar to Theorem 2.4, the above theorem is not a genuine application of the spectral method.) Indeed, Theorem 5.2 will follow from a corresponding statement relying only on a technical decorrelation condition that we now describe. This condition ensures that the characteristic function of the process we consider is close enough to that of an independent process. The condition, denoted by (H), is the following: there exist $\delta > 0$ and $C, c > 0$ such that, for all $n, m > 0$, for all $b_1 < b_2 < \dots < b_{n+m+1}$, for all $k > 0$ and for all $t_1, \dots, t_{n+m} \in \mathbb{R}^d$ with $|t_j| \le \delta$, we have
$$\Bigl| E\Bigl(e^{i \sum_{j=1}^{n} t_j \bigl(\sum_{\ell=b_j}^{b_{j+1}-1} X_\ell\bigr) + i \sum_{j=n+1}^{n+m} t_j \bigl(\sum_{\ell=b_j+k}^{b_{j+1}+k-1} X_\ell\bigr)}\Bigr) - E\Bigl(e^{i \sum_{j=1}^{n} t_j \bigl(\sum_{\ell=b_j}^{b_{j+1}-1} X_\ell\bigr)}\Bigr) \cdot E\Bigl(e^{i \sum_{j=n+1}^{n+m} t_j \bigl(\sum_{\ell=b_j+k}^{b_{j+1}+k-1} X_\ell\bigr)}\Bigr) \Bigr| \le C \bigl(1 + \max |b_{j+1} - b_j|\bigr)^{C(n+m)} e^{-ck}. \tag{H}$$
This assumption means that, if we group the random variables in $n + m$ blocks, then a gap of size $k$ between two blocks yields characteristic functions that are


exponentially close (in terms of $k$) to independent characteristic functions, with an error that is, for each block, polynomial in terms of the size of the block. This control is only required for Fourier parameters close to 0. This condition is of course true for independent random variables. Its main interest is that it is also satisfied under the assumptions of Theorem 5.2:

Lemma 5.4. Under the assumptions of Theorem 5.2, the property (H) is satisfied.

Proof. Write $\Pi_0$ for the eigenprojection associated to the dominating eigenvalue 1 of $L_0$. Then
$$\begin{aligned}
E\Bigl(&e^{i \sum_{j=1}^{n} t_j \bigl(\sum_{\ell=b_j}^{b_{j+1}-1} X_\ell\bigr) + i \sum_{j=n+1}^{n+m} t_j \bigl(\sum_{\ell=b_j+k}^{b_{j+1}+k-1} X_\ell\bigr)}\Bigr) \\
&= \bigl\langle \nu, L_{t_{n+m}}^{b_{n+m+1}-b_{n+m}} \cdots L_{t_{n+1}}^{b_{n+2}-b_{n+1}} L_0^k L_{t_n}^{b_{n+1}-b_n} \cdots L_{t_1}^{b_2-b_1} L_0^{b_1} \xi \bigr\rangle \\
&= \bigl\langle \nu, L_{t_{n+m}}^{b_{n+m+1}-b_{n+m}} \cdots L_{t_{n+1}}^{b_{n+2}-b_{n+1}} (L_0^k - \Pi_0) L_{t_n}^{b_{n+1}-b_n} \cdots L_{t_1}^{b_2-b_1} L_0^{b_1} \xi \bigr\rangle \\
&\quad + \bigl\langle \nu, L_{t_{n+m}}^{b_{n+m+1}-b_{n+m}} \cdots L_{t_{n+1}}^{b_{n+2}-b_{n+1}} \Pi_0 L_{t_n}^{b_{n+1}-b_n} \cdots L_{t_1}^{b_2-b_1} L_0^{b_1} \xi \bigr\rangle.
\end{aligned}$$
The operators $L_{t_i}$ all satisfy $\|L_{t_i}^j\| \le C$ by the weak regularity assumption. As $\|L_0^k - \Pi_0\| \le C r^k$, we deduce that the term on the penultimate line is bounded by $C^{n+m} r^k$, which is compatible with (H). On the last line, the projection $\Pi_0$ decouples both parts, hence one proves that this line is equal to
$$E\Bigl(e^{i \sum_{j=1}^{n} t_j \bigl(\sum_{\ell=b_j}^{b_{j+1}-1} X_\ell\bigr)}\Bigr) \cdot E\Bigl(e^{i \sum_{j=n+1}^{n+m} t_j \bigl(\sum_{\ell=b_j+k}^{b_{j+1}+k-1} X_\ell\bigr)}\Bigr) + O\bigl(C^m r^{b_{n+1}+k}\bigr).$$
This proves (H). □



The main probabilistic result is the following.

Theorem 5.5. Let $(X_0, X_1, \dots)$ be a centered stationary sequence of $\mathbb{R}^d$-valued random variables, in $L^p$ for some $p > 2$, satisfying (H). Then
(1) The covariance matrix $\operatorname{cov}\bigl(\sum_{\ell=0}^{n-1} X_\ell\bigr)/n$ converges to a limiting matrix $\Sigma^2$.
(2) The sequence $\sum_{\ell=0}^{n-1} X_\ell / \sqrt{n}$ converges in distribution to $N(0, \Sigma^2)$.
(3) The process $(X_0, X_1, \dots)$ satisfies the almost sure invariance principle with limiting covariance $\Sigma^2$ and error rate $o(n^\rho)$ for any $\rho > p/(4p - 4)$.

Theorem 5.2 readily follows from this theorem together with Lemma 5.4. Note that Theorem 5.5 admits a non-stationary version, implying a non-stationary version of Theorem 5.2 (see [Gou10a] for details). For simplicity, we stick to the stationary version in these notes.

5.1. Tools for the proof of Theorem 5.5. The proof of Theorem 5.5 only uses probabilistic tools (note that there is no dynamical system involved in the statement of the result, and no spectral assumption). It relies in a crucial way on the notion of coupling. Given $Z_1 : \Omega_1 \to E_1$ and $Z_2 : \Omega_2 \to E_2$ two random variables on two (possibly different) probability spaces, a coupling between $Z_1$ and $Z_2$ is a way to associate those random variables, generally in order to show that they are close in a suitable sense. Formally, a coupling between $Z_1$ and $Z_2$ is given by a probability space


$\Omega$ and two random variables $Z_1' : \Omega \to E_1$ and $Z_2' : \Omega \to E_2$ such that $Z_i'$ is distributed like $Z_i$. By considering the distribution of $(Z_1', Z_2')$ in $E_1 \times E_2$, one can without loss of generality take $\Omega = E_1 \times E_2$, and $Z_1'$ and $Z_2'$ respectively the first and second projection. If $E_1 = E_2$, one often tries to ensure that $Z_1'$ and $Z_2'$ are close, for instance by minimizing $\|Z_1' - Z_2'\|_{L^p}$ for some $p$, or $P(Z_1' \ne Z_2')$.

Definition 5.6. Let $P$, $Q$ be two probability measures on a metric space. Their Prokhorov distance $\pi(P, Q)$ is the smallest $\varepsilon > 0$ such that $P(B) \le \varepsilon + Q(B^\varepsilon)$ for any Borel set $B$, where $B^\varepsilon$ denotes the $\varepsilon$-neighborhood of $B$.

This distance is symmetric, although this is not completely trivial from the definition. Its interest is that it makes it possible to construct good couplings, thanks to the following Dudley-Strassen theorem [Bil99, Theorem 6.9].

Theorem 5.7. Let $X$ and $Y$ be two random variables taking their values in a metric space, with respective distributions $P_X$ and $P_Y$. If $\pi(P_X, P_Y) < c$, there exists a coupling between $X$ and $Y$ such that $P(d(X, Y) > c) < c$.

To construct good couplings during the proof of Theorem 5.5, we will only have information about the characteristic functions of the processes under study. Hence, it will be important to estimate the Prokhorov distance just in terms of characteristic functions. This is done in the following lemma. Let $d > 0$ and $N > 0$. We consider $\mathbb{R}^{dN}$ with the norm
$$|(x_1, \dots, x_N)|_N = \sup_{1 \le i \le N} |x_i|,$$
where $|x|$ is the Euclidean norm of $x \in \mathbb{R}^d$.

Lemma 5.8. There exists a constant $C(d)$ with the following property. Let $F$ and $G$ be two probability measures on $\mathbb{R}^{dN}$, with characteristic functions $\varphi$ and $\gamma$. For all $T > 0$,
$$\pi(F, G) \le \sum_{j=1}^{N} F(|x_j| > T) + \bigl[C(d) T^{d/2}\bigr]^N \Bigl(\int_{\mathbb{R}^{dN}} |\varphi - \gamma|^2\Bigr)^{1/2}.$$

Proof. Using an approximation argument, we may assume that $F$ and $G$ have densities $f$ and $g$. For any Borel set $B$,
$$\begin{aligned}
F(B) - G(B) &\le F(B \cap \{\max |x_j| \le T\}) + F(\max |x_j| > T) - G(B \cap \{\max |x_j| \le T\}) \\
&\le \int_{|x_1|, \dots, |x_N| \le T} |f - g| + \sum_{j=1}^{N} F(|x_j| > T).
\end{aligned}$$
As a consequence, $\pi(F, G)$ is bounded by the last line of this equation. To conclude, it suffices to estimate $\int_{|x_1|, \dots, |x_N| \le T} |f - g|$. We have
$$\int_{|x_1|, \dots, |x_N| \le T} |f - g| \le \|f - g\|_{L^2} \, \|1_{|x_1|, \dots, |x_N| \le T}\|_{L^2} = \|\varphi - \gamma\|_{L^2} (CT)^{dN/2},$$
since the Fourier transform is an isometry on $L^2$, up to a multiplicative factor $(2\pi)^{dN/2}$. □

We illustrate the usefulness of the above tools with the following proposition.


Proposition 5.9. Let $(X_0, X_1, \dots)$ be a centered process, bounded in $L^p$ for some $p > 2$, satisfying (H). For every $\eta > 0$, there exists $C > 0$ such that, for all $m, n \in \mathbb{N}$,
$$\Bigl\| \sum_{\ell=m}^{m+n-1} X_\ell \Bigr\|_{L^{p-\eta}} \le C n^{1/2}.$$

This kind of moment estimate is classical for a large class of weakly dependent processes. The interest of the proposition is that, here, those bounds are proved assuming only the assumption (H) on characteristic functions, which apparently does not give any information on moments.

Proof. The crucial step is to obtain an $L^2$ bound. The bound in $L^{p-\eta}$ follows using the same techniques and the Rosenthal inequality for sums of independent random variables [Ros70].

Write $u_n = \max_{m \in \mathbb{N}} \bigl\| \sum_{\ell=m}^{m+n-1} X_\ell \bigr\|_{L^2}^2$. We will show that $u_{a+b}$ is essentially bounded by $u_a + u_b$, which gives the result inductively. To prove this, we decompose the sum $\sum_{\ell=m}^{m+a+b-1} X_\ell$ as $Y_1 + Y_2 + Y_3$ where $Y_1 = \sum_{\ell=m}^{m+a-1} X_\ell$ and $Y_3 = \sum_{\ell=m+a+b^\alpha}^{m+a+b-1} X_\ell$, while $Y_2$ is the sum over the central interval. If $\alpha$ is small enough, $Y_2$ is negligible and can safely be ignored. Let $Y_1'$ and $Y_3'$ be two independent copies of $Y_1$ and $Y_3$. The assumption (H) ensures that the characteristic function of $(Y_1, Y_3)$ is very close to that of $(Y_1', Y_3')$, for Fourier parameters close enough to 0. Then, we regularize things to eliminate large Fourier parameters, as follows. Let $U$ be a fixed random variable in $L^p$, whose characteristic function is supported in $\{|t| \le \delta\}$, and denote by $U_1, U_3, U_1', U_3'$ four independent copies of $U$. Then the characteristic function of $(Y_1 + U_1, Y_3 + U_3)$ is everywhere close to that of $(Y_1' + U_1', Y_3' + U_3')$. Lemma 5.8 makes it possible to couple those random variables, with a difference that is very small on a very big part of the space. Forgetting about the $U$ variables (that are bounded in $L^p$ independently of $m, n$), we obtain
$$\|Y_1 + Y_3\|_{L^2} \le \|Y_1 - Y_1'\|_{L^2} + \|Y_3 - Y_3'\|_{L^2} + \|Y_1' + Y_3'\|_{L^2}.$$
The first two terms are small, while the last one is equal to $(\|Y_1'\|_{L^2}^2 + \|Y_3'\|_{L^2}^2)^{1/2}$ by independence, and is therefore bounded by $(u_a + u_b)^{1/2}$. We have obtained the inequality $u_{a+b} \le u_a + u_b + r(a, b)$, where $r(a, b)$ is small enough not to be a serious problem. □

5.2. Sketch of proof of Theorem 5.5. We will now give some ideas of the proof of Theorem 5.5. Let us consider a process $(X_0, X_1, \dots)$ satisfying the assumptions of this theorem. The first step of the proof is to show the convergence of $\operatorname{cov}\bigl(\sum_{\ell=0}^{n-1} X_\ell\bigr)/n$ to some matrix $\Sigma^2$. This is the same kind of argument as in the above moment control (reduction to the independent situation by a coupling argument, and careful control of the error), so we will not say more about it. From this point on, we will also assume that $\Sigma^2$ is non-degenerate (the degenerate case should be handled differently; it is in fact easier). Our goal is to prove the almost sure invariance principle (from which the central limit theorem follows), with a good control on the error term.

The strategy of the proof is very classical: we will use small blocks of variables that we will throw away, replacing them with gaps, and big blocks that we will couple with independent copies (the gaps giving enough independence, thanks to the assumption (H)). The almost sure invariance principle for independent variables


will then be applied, giving the desired result. One should be careful enough so that the small blocks are small enough to be negligible (this will be proved using moment estimates such as Proposition 5.9), and large enough to give enough independence. This method is usually implemented using blocks of polynomial size. In our case, we obtain better results by using a Cantor-like triadic approach, as follows. First, we write $\mathbb{N}$ as the union of the intervals $[2^n, 2^{n+1})$. In each of these intervals, we put a small (but not too small) block in the middle, then a smaller block in the middle of the two newly created intervals, then an even smaller block in the middle of the four remaining intervals, and so on. The interest of this construction is that, to obtain $n$ well separated big blocks, the classical argument uses small blocks whose lengths add up to $n^2$, while here their lengths only add up to $n$, making the final estimates better.

The precise construction depends on two parameters $\beta \in (0, 1)$ and $\varepsilon < 1 - \beta$. Let $f = f(n) = \lfloor \beta n \rfloor$. We decompose $[2^n, 2^{n+1})$ as a union of $F = 2^f$ intervals $(I_{n,j})$ [...]

[...] 0, $\delta > 0$ is the probability (i.e. the $\lambda$-measure of the set of $w' \in B_1^{d-1}$) that

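(i) $\Bigl( Z_{\mathrm{caps}}\bigl(\tfrac{\xi}{r^{d-1}}, r\bigr) + r\bigl(\sqrt{1 - \|b'\|^2}, w'\bigr) \Bigr) \cap (\mathcal{P} - y) S(w') = \emptyset$,
(ii) $\Bigl( \bigl(\tfrac{\xi}{r^{d-1}} + r\sqrt{1 - \|b'\|^2} + r\sqrt{1 - \|b\|^2},\; r(w' - w)\bigr) + \bigl[0, \tfrac{\varepsilon}{r^{d-1}}\bigr) \times B_{r\delta}^{d-1} \Bigr) \cap (\mathcal{P} - y) S(w') \ne \emptyset$,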

cf. Figure 5. We assume here and in the following that the probability that there are two or more elements of $(\mathcal{P} - y) S(w')$ in a set of small diameter is small. (This can be proved for all examples discussed in these lectures.) Note that for
$$D(r) = \begin{pmatrix} r^{d-1} & 0 \\ {}^t 0 & r^{-1} 1_{d-1} \end{pmatrix} \tag{9.3}$$
we have
$$Z\Bigl(\frac{\xi}{r^{d-1}}, r\Bigr) D(r) = Z(\xi, 1) \tag{9.4}$$
and
$$Z_{\mathrm{caps}}\Bigl(\frac{\xi}{r^{d-1}}, r\Bigr) D(r) \xrightarrow[r \to 0]{} Z(\xi, 1). \tag{9.5}$$
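The rescaling identity (9.4) can be checked pointwise. The sketch below is only an illustration under an assumption: it takes $Z(\ell, r)$ to be the open cylinder $(0, \ell) \times B_r^{d-1}$ (the analogue of (10.1) with $\Sigma = B_1^{d-1}$), and the helper names `apply_D` and `in_cylinder` are ours.

```python
import math


def apply_D(point, r):
    """Right-multiply the row vector (x1, x_perp) by D(r) = diag(r^(d-1), 1/r, ..., 1/r)."""
    d = len(point)
    return (point[0] * r ** (d - 1),) + tuple(x / r for x in point[1:])


def in_cylinder(point, length, radius):
    """Membership in the assumed cylinder Z(length, radius) = (0, length) x B_radius^(d-1)."""
    return 0 < point[0] < length and math.hypot(*point[1:]) < radius


d, r, xi = 3, 0.1, 2.0
inside = (100.0, 0.05, 0.02)   # lies in Z(xi / r^(d-1), r)
outside = (100.0, 0.2, 0.0)    # transverse part too large

for p in (inside, outside):
    before = in_cylinder(p, xi / r ** (d - 1), r)
    after = in_cylinder(apply_D(p, r), xi, 1.0)
    print(p, before, after)    # membership is preserved by D(r)

# D(r) has determinant r^(d-1) * r^-(d-1) = 1, so it preserves volume.
print(math.isclose(r ** (d - 1) * (1 / r) ** (d - 1), 1.0))
```

The long thin cylinder of length $\xi/r^{d-1}$ and radius $r$ is mapped to a cylinder of length $\xi$ and radius 1, which is why the limit $r \to 0$ can be taken on a fixed test set.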

Up to a small error (as $r \to 0$) we can therefore replace (i), (ii) by
(i) $Z(\xi, 1) \cap \Theta_r(w') = \emptyset$,
(ii) $\bigl([0, \varepsilon) \times B_\delta^{d-1} + (\xi, -w)\bigr) \cap \Theta_r(w') \ne \emptyset$,
where
$$\Theta_r(w') = (\mathcal{P} - y) S(w') D(r) - (0, w') \tag{9.6}$$
defines a sequence (in $r > 0$) of random point processes for $w'$ distributed according to $\lambda$. The main objective is to show that (for every fixed $y \in \mathcal{P}$, or if this is not possible, for suitably random $y$) there is a random process $\Theta$ such that
$$\Theta_r(w') \xrightarrow[r \to 0]{} \Theta - (0, w') \tag{9.7}$$
in distribution (we in fact only require convergence for two-dimensional distributions), where $\Theta$ and the random variable $w'$ are independent. The limit process $\Theta$ can then be used to find a formula for the transition kernel $k(\omega', \xi, \omega)$ with
$$\omega' = \omega'(s', \dots), \qquad \omega = \omega(b, \dots). \tag{9.8}$$

The requirement to keep track of other “hidden” variables (“. . . ”) will depend on the particular choice of scatterer configuration P, since the limit process Θ can in principle depend on y. We will illustrate this strategy with a few natural examples in Sections 11–19. Let us first discuss what the above argument looks like in the case of kicked Hamiltonians.


Figure 6. Inter-collision flight for a kicked Hamiltonian. The exclusion zone is the cylinder without caps described in the text.

10. Renormalisation of the transition kernel for kicked Hamiltonians

Recall that in this setting the scattering is described by the matrix $S(w)$ as defined in (6.7), and the impact parameter $w$ is random according to an absolutely continuous probability measure $\lambda$ on the cross section $\Sigma$. We now use the cylinder
$$Z_\Sigma(\ell, r) = (0, \ell) \times r\Sigma \tag{10.1}$$
to account for the different scattering cross section, see Figure 6. Following the same argument as in Section 9, we find that the probability of hitting the next scatterer at a time in $\bigl[\tfrac{\xi}{r^{d-1}}, \tfrac{\xi + \varepsilon}{r^{d-1}}\bigr)$ with impact parameter in $w + B_{r\delta}^{d-1}$ equals (this time no approximation is needed) the probability that
(i) $Z_\Sigma(\xi, 1) \cap \Theta_r(w') = \emptyset$,
(ii) $\bigl([0, \varepsilon) \times B_\delta^{d-1} + (\xi, -w)\bigr) \cap \Theta_r(w') \ne \emptyset$,
with the random process
$$\Theta_r(w') = (\mathcal{P} - y) S(w') D(r) - (0, w'). \tag{10.2}$$
The question is, as before, does
$$\Theta_r(w') \xrightarrow[r \to 0]{} \Theta - (0, w') \tag{10.3}$$

hold with $\Theta$ and $w'$ independently distributed?

11. Poisson process

The Poisson process $\Theta$ in $\mathbb{R}^d$ with intensity $n$ is characterised by the property that for any collection of bounded, pairwise disjoint Borel sets $B_1, \dots, B_k$ and integers $r_1, \dots, r_k \ge 0$ we have the probability
$$P\bigl(\#(\Theta \cap B_1) = r_1, \dots, \#(\Theta \cap B_k) = r_k\bigr) = \prod_{i=1}^{k} \frac{(n \operatorname{vol} B_i)^{r_i}}{r_i!} \, e^{-n \operatorname{vol} B_i}. \tag{11.1}$$
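Formula (11.1) is a product of Poisson point probabilities, one per set. The following sketch evaluates its right-hand side for given volumes and counts; the function name `poisson_count_probability` and the sample numbers are ours, chosen only for illustration.

```python
import math


def poisson_count_probability(intensity, volumes, counts):
    """Right-hand side of (11.1) for pairwise disjoint sets with the given volumes."""
    prob = 1.0
    for vol, r in zip(volumes, counts):
        mean = intensity * vol  # expected number of points in the set
        prob *= mean ** r / math.factorial(r) * math.exp(-mean)
    return prob


n = 2.0                 # intensity of the process
volumes = [0.5, 1.5]    # volumes of two disjoint test sets

# Probability that the first set is empty and the second contains exactly one point.
print(poisson_count_probability(n, volumes, [0, 1]))

# Sanity check: the probabilities over all count pairs sum to 1 (up to truncation).
total = sum(poisson_count_probability(n, volumes, [a, b])
            for a in range(40) for b in range(40))
print(total)
```

The case "first set empty, second set contains one point" is the pattern of events (i) and (ii) used for the transition kernel.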

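If the limit process $\Theta$ in Section 9 or 10 is a Poisson process (conditioned so that $0 \in \Theta$), then the probability of the events (i) and (ii) is (set $\Sigma = B_1^{d-1}$ in the case of the Lorentz gas)
$$e^{-n \operatorname{vol} Z_\Sigma(\xi, 1)} \bigl(1 - e^{-n \varepsilon \operatorname{vol} B_\delta^{d-1}}\bigr) = e^{-n \operatorname{vol} Z_\Sigma(\xi, 1)} \, n \varepsilon \operatorname{vol} B_\delta^{d-1} + (\text{lower order terms in } \varepsilon, \delta) \tag{11.2}$$
and thus, with $\varepsilon \to d\xi$, $\operatorname{vol} B_\delta^{d-1} \to db$, we find for the limit transition kernel
$$k(\omega', \xi, \omega) = \bar{\xi}^{-1} e^{-\xi/\bar{\xi}}, \qquad \bar{\xi} := \frac{1}{n\sigma}, \tag{11.3}$$
with
$$\omega := b, \qquad d(\omega) := \sigma^{-1} \, db \tag{11.4}$$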

and the total scattering cross section $\sigma = \operatorname{vol} \Sigma$. This is the same kernel as in the example (4.22) leading to the linear Boltzmann equation. It follows from a standard probabilistic argument (cf. [5]) that, if $\mathcal{P}$ is a typical realisation of a Poisson process, then $\Theta_r \to \Theta - (0, w')$ with $\Theta$ Poisson (again conditioned to $0 \in \Theta$). Let us now discuss examples of scatterer configurations for which $\Theta$ is not Poisson and for which the linear Boltzmann equation fails.

12. Point processes and homogeneous spaces

In the following we will consider homogeneous spaces of the form
$$\Gamma \backslash G = \{\Gamma g : g \in G\} \tag{12.1}$$

where $G$ is a connected Lie group and $\Gamma$ is a lattice in $G$. Lattice means $\Gamma$ is a discrete subgroup of $G$ such that there exists a fundamental domain $F_\Gamma$ of $\Gamma$ in $G$ with finite left Haar measure $\mu$. It is a general property of the left Haar measure that it is, up to a character, also right invariant. That is, there exists a character $\chi : G \to \mathbb{R}$ such that, for any measurable set $B$ and any $g \in G$, we have $\mu(Bg) = \chi(g)\mu(B)$. It is an easy exercise to show that every fundamental domain of $\Gamma$ in $G$ has the same finite left Haar measure. Since $F_\Gamma g$ is a fundamental domain for every $g \in G$, we see that $\mu(F_\Gamma g) = \mu(F_\Gamma)$ and hence $\chi = 1$. This means $G$ is unimodular, i.e., left and right Haar measure coincide. We normalise the Haar measure so that $\mu(F_\Gamma) = 1$, and also denote by $\mu$ the push-forward of Haar measure to $\Gamma \backslash G$, which is the unique $G$-invariant (under right multiplication) probability measure on $\Gamma \backslash G$.

The plan is now to consider point patterns $\mathcal{P}$ whose corresponding point processes $\Theta_r$ and $\Theta$ can be parametrised by a homogeneous space of the above type. The convergence of $\Theta_r$ will follow from the ergodic theory for subgroups of $G$ acting on $\Gamma \backslash G$ by right translation. We begin developing our theory with the simplest example for $\Gamma \backslash G$: the space of lattices.

13. The space of lattices

The opposite extreme of a random scatterer configuration is a perfectly periodic point set $\mathcal{P}$. We begin by assuming that $\mathcal{P}$ is a Euclidean lattice $\mathcal{L}$. We fix the covolume of $\mathcal{L}$ (the volume of its fundamental domain) to be $n^{-1}$, so that the


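asymptotic density of $\mathcal{P} = \mathcal{L}$ is $n$, as assumed in Section 2. Every such lattice can be written as
$$\mathcal{L} = n^{-1/d} \mathbb{Z}^d M \tag{13.1}$$
for some $M \in \mathrm{SL}(d, \mathbb{R})$. Since the stabiliser of $\mathbb{Z}^d$ under right multiplication by $G = \mathrm{SL}(d, \mathbb{R})$ is the subgroup $\Gamma = \mathrm{SL}(d, \mathbb{Z})$, one can show that there is a bijection
$$\Gamma \backslash G \xrightarrow{\ \sim\ } \{\text{Euclidean lattices of covolume } n^{-1}\}, \qquad \Gamma M \mapsto n^{-1/d} \mathbb{Z}^d M. \tag{13.2}$$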

To find the inverse map, note that, for any (positively oriented) basis $a_1, \dots, a_d$ of $\mathcal{L}$, the matrix $M = n^{1/d} ({}^t a_1, \dots, {}^t a_d)$ is in $\mathrm{SL}(d, \mathbb{R})$; the substitution $M \mapsto \gamma M$, $\gamma \in \Gamma$, corresponds to a base change of $\mathcal{L}$. It is a well known fact (due to Minkowski) that $\Gamma = \mathrm{SL}(d, \mathbb{Z})$ is a lattice in $G = \mathrm{SL}(d, \mathbb{R})$. We can use the $G$-invariant probability measure $\mu$ on $\Gamma \backslash G$ to define a random point process in $\mathbb{R}^d$ by setting
$$\Theta = n^{-1/d} \mathbb{Z}^d M \tag{13.3}$$
with $M$ random in $\Gamma \backslash G$ according to $\mu$ and the above identification (13.2) of $\Gamma \backslash G$ and the space of lattices. Alternatively, we could also take $M$ in (13.3) to be random according to Haar measure in a fundamental domain $F_\Gamma$ in $G$. The following theorem says that, for any fixed $\mathcal{P} = \mathcal{L}$ as above,
$$\Theta_r \to \Theta - (0, w') \tag{13.4}$$
where $\Theta$ and $w'$ are independent, as required.

Theorem 13.1 ([21]). Let $\lambda$ be an absolutely continuous probability measure on $\Sigma$, $A_1, \dots, A_k \subset \mathbb{R}^d$ bounded with boundary of measure zero and $r_1, \dots, r_k \in \mathbb{Z}_{\ge 0}$. Then
$$\lambda\bigl(\#(\Theta_r \cap A_i) = r_i\bigr) \xrightarrow[r \to 0]{} \int_\Sigma \mu\bigl(\#((\Theta - (0, w')) \cap A_i) = r_i\bigr) \, d\lambda(w'). \tag{13.5}$$

This theorem is in turn a consequence of the following equidistribution theorem on $\Gamma \backslash G$. The matrix $S(w)$ is as defined in (2.5) or (6.7), respectively. (In the former, take $\Sigma = B_1^{d-1}$.)

Theorem 13.2 ([21]). For any $M \in \Gamma \backslash G$, any bounded continuous $f : \Sigma \times \Gamma \backslash G \to \mathbb{R}$ and any absolutely continuous probability measure $\lambda$ on $\Sigma$,
$$\int_\Sigma f\bigl(w', M S(w') D(r)\bigr) \, d\lambda(w') \xrightarrow[r \to 0]{} \int_\Sigma \int_{\Gamma \backslash G} f(w', M) \, d\mu(M) \, d\lambda(w'). \tag{13.6}$$

This states the equidistribution of large spheres (for $S(w)$ as in (2.5)), or expanding horospheres (for $S(w)$ as in (6.7)). Statement (13.5) in Theorem 13.1 follows from Theorem 13.2 by choosing in (13.6) as test function $f$ the characteristic function of the set
$$\bigl\{ (w', M) \in \Sigma \times \Gamma \backslash G : \#\bigl((n^{-1/d} \mathbb{Z}^d M - (0, w')) \cap A_i\bigr) = r_i \bigr\}. \tag{13.7}$$
Now, this characteristic function is of course not continuous, but one can show that (13.7) has boundary of measure zero, and thus the characteristic function can be approximated sufficiently well by continuous functions. Details of this technical argument can be found in [21], Sections 5 and 6.


We will later (in Section 18) return to the justification of statements of the above type, in fact, in a more general form (Theorem 18.1). Let us first see how Theorem 13.1 yields an expression for the transition kernel $k(\omega', \xi, \omega)$.

14. The transition kernel for lattices

To complete the programme of Section 9 (for the Lorentz gas) and Section 10 (for kicked Hamiltonians) we apply Theorem 13.1 for $k = 2$, $r_1 = 0$, $r_2 = 1$ and the test sets
$$A_1 = Z_\Sigma(\xi, 1), \qquad A_2 = \bigl([0, \varepsilon) \times B_\delta^{d-1}\bigr) + (\xi, -w) \tag{14.1}$$
with $\varepsilon, \delta$ small. Note that the condition
$$\#\bigl((\Theta - (0, w')) \cap A_2\bigr) = 1 \tag{14.2}$$
means that
$$\#\bigl((\Theta - (\xi, w' - w)) \cap ([0, \varepsilon) \times B_\delta^{d-1})\bigr) = 1, \tag{14.3}$$

i.e., our random lattice $\Theta$ must have a lattice point near $(\xi, w' - w)$. In order to characterise lattices with this property, set $X = \Gamma \backslash G$ and define the subspace
$$X(y) = \{M \in X : y \in \mathbb{Z}^d M\} \tag{14.4}$$
for a given $y \in \mathbb{R}^d$. In other words, $X(y)$ is the subspace of all lattices (of covolume one) that contain $y$. In [21] we construct a probability measure $\nu_y$ on $X(y)$ so that
$$d\mu(M) = d\nu_y(M) \, dy. \tag{14.5}$$
(Since the spaces $X(y)$ are not disjoint, formula (14.5) is only valid for Borel sets $E \subset X$, $U \subset \mathbb{R}^d \setminus \{0\}$ so that for all $y_1 \ne y_2 \in U$ we have $X(y_1) \cap X(y_2) \cap E = \emptyset$. See [21, Prop. 7.3] for details.) The decomposition (14.5) shows that the transition kernel is, with $\omega' = s' = (0, w')$, $\omega = b = (0, w)$, $d(\omega) = \sigma^{-1} dw$ and $\bar{\xi} = \frac{1}{n\sigma}$,
$$k(\omega', \xi, \omega) = \bar{\xi}^{-1} \nu_y\bigl(\bigl\{M \in X(y) : n^{-1/d} \mathbb{Z}^d M \cap (Z_\Sigma(\xi, 1) + (0, w')) = \emptyset\bigr\}\bigr) \tag{14.6}$$
where $y = n^{1/d}(\xi, w' - w)$. For an explicit description of the $\nu_y$-measure of the above set, see [23], Section 2.2. We note that $\nu_y$ is invariant under $G$, in the sense that for any Borel set $E \subset X(y)$ we have for all $g \in G$
$$\nu_y(E) = \nu_{yg}(Eg).$$

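(14.7)

In particular for
$$g = \begin{pmatrix} n^{1-1/d} & 0 \\ {}^t 0 & n^{-1/d} 1_{d-1} \end{pmatrix} \in G \tag{14.8}$$
we see that $yg = (n\xi, w' - w)$, and furthermore
$$n^{-1/d} \mathbb{Z}^d M \cap (Z_\Sigma(\xi, 1) + (0, w'))g = \emptyset \tag{14.9}$$
is equivalent to
$$\mathbb{Z}^d M \cap (Z_\Sigma(n\xi, 1) + (0, w')) = \emptyset. \tag{14.10}$$
Hence we obtain the formula
$$k(\omega', \xi, \omega) = \bar{\xi}^{-1} \nu_y\bigl(\bigl\{M \in X(y) : \mathbb{Z}^d M \cap (Z_\Sigma(n\xi, 1) + (0, w')) = \emptyset\bigr\}\bigr) \tag{14.11}$$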


with $y = (n\xi, w' - w)$. The above scaling property must of course hold a priori, as it corresponds to a simple rescaling of length units. In dimension $d = 2$, when $\Sigma = B_1^1 = (-1, 1)$, eq. (14.11) can be used to calculate an explicit formula for the transition kernel. We have [20] (with $\sigma = 2$)
$$k(\omega', \xi, \omega) = \frac{12 n}{\pi^2} \, \Upsilon\Bigl(1 + \frac{(n\xi)^{-1} - \max(|w|, |w'|) - 1}{|w - w'|}\Bigr) \tag{14.12}$$
with
$$\Upsilon(x) = \begin{cases} 0 & \text{if } x \le 0 \\ x & \text{if } 0 < x < 1 \\ 1 & \text{if } 1 \le x. \end{cases} \tag{14.13}$$
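To make the two-dimensional formula concrete, here is a minimal sketch that evaluates (14.12) and (14.13) as reconstructed above. The function names are ours, and the treatment of the degenerate case $w = w'$ (taking the limit of the quotient) is an assumption of this sketch rather than something stated in the text.

```python
import math


def upsilon(x):
    """The cut-off function of (14.13)."""
    if x <= 0:
        return 0.0
    return x if x < 1 else 1.0


def lattice_kernel_2d(xi, w, w_prime, n=1.0):
    """Transition kernel (14.12) for d = 2 and Sigma = (-1, 1), with |w|, |w'| < 1."""
    numerator = 1.0 / (n * xi) - max(abs(w), abs(w_prime)) - 1.0
    diff = abs(w - w_prime)
    if diff == 0.0:
        # Assumption of this sketch: interpret the quotient by its limit as w' -> w.
        x = math.inf if numerator >= 0 else -math.inf
    else:
        x = 1.0 + numerator / diff
    return 12.0 * n / math.pi ** 2 * upsilon(x)


print(lattice_kernel_2d(0.01, 0.3, -0.2))   # short flights: the constant 12 n / pi^2
print(lattice_kernel_2d(0.8, 0.5, -0.5))    # intermediate regime
print(lattice_kernel_2d(10.0, 0.3, -0.2))   # long flights: the kernel vanishes
```

The expression is visibly symmetric in $w$ and $w'$, in line with the symmetry property of the kernel, and it has compact support in $\xi$ for fixed $w \ne w'$.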

For independent derivations of Formula (14.12) that do not employ eq. (14.11), see the papers by Caglioti and Golse [9, 10] and Bykovskii and Ustinov [8]. There are no such formulas in higher dimension, although (14.11) can be used to extract information to obtain asymptotics for $\xi \to 0$ and $\xi \to \infty$, cf. [23]. We have in particular in the case $\Sigma = B_1^{d-1}$
$$\frac{1 - 2^{d-1} \bar{\xi}^{-1} \xi}{\zeta(d) \bar{\xi}} \le k(\omega', \xi, \omega) \le \frac{1}{\zeta(d) \bar{\xi}}, \tag{14.14}$$
and so for small $\xi$ this implies $k(\omega', \xi, \omega) = \zeta(d)^{-1} \bar{\xi}^{-1} + O(\xi)$. Here $\zeta(d)$ is the Riemann zeta function and $\zeta(d)^{-1}$ is the relative density of primitive lattice points in $\mathbb{Z}^d$. (The primitive lattice points are those points in $\mathbb{Z}^d$ that are visible from the origin.) Compare (14.14) with the result for the Poisson process (Section 11):
$$k_{\mathrm{Poisson}}(\omega', \xi, \omega) = \bar{\xi}^{-1} e^{-\xi/\bar{\xi}} = \bar{\xi}^{-1} - \bar{\xi}^{-2} \xi + O(\xi^2). \tag{14.15}$$

The asymptotics of $k(\omega', \xi, \omega)$ for large $\xi$ is more complicated to state (see [23] for what we know). In the next section we will provide tail estimates for the distribution of free path lengths.

To check the symmetry property $k(\omega, \xi, \omega') = k(\omega', \xi, \omega)$ in (4.11), note that in (14.11) we can replace the term $(0, w')$ by $(0, w') - y = (-n\xi, w)$ since $y \in \mathbb{Z}^d M$. This, upon reflecting all sets at the origin, yields
$$k(\omega', \xi, \omega) = \bar{\xi}^{-1} \nu_y\bigl(\bigl\{M \in X(y) : \mathbb{Z}^d M \cap (Z_{-\Sigma}(n\xi, 1) + (0, -w)) = \emptyset\bigr\}\bigr). \tag{14.16}$$
Reflecting at the “horizontal” axis $\mathbb{R} e_1$ yields
$$y = (\xi, w' - w) \mapsto (\xi, w - w'), \tag{14.17}$$
$$Z_{-\Sigma}(n\xi, 1) \mapsto Z_\Sigma(n\xi, 1) \tag{14.18}$$
and
$$(0, -w) \mapsto (0, w). \tag{14.19}$$
The measure $\nu_y$ is preserved under this map and hence we see that indeed $k(\omega, \xi, \omega') = k(\omega', \xi, \omega)$.


15. The distribution of free path lengths

The limit distribution of the free path lengths (between consecutive collisions) can be obtained from the transition kernel via the formula (4.19). As for the transition kernel, explicit formulas are only known in two dimensions, and were first computed heuristically by Dahlqvist [11], with subsequent rigorous proof by Boca and Zaharescu [2]. These formulas can also be derived via (4.19) from (14.12), see [20]. For arbitrary dimension $d \ge 2$, the expression (14.6) is used in [23] to obtain the following tail asymptotics for $\Sigma = B_1^{d-1}$:
$$\Phi_0(\xi) = \frac{1}{\bar{\xi} \zeta(d)} + O(\xi) \qquad (\xi > 0), \tag{15.1}$$
$$\Phi_0(\xi) \sim \frac{C}{\xi^3} \qquad (\xi \to \infty) \tag{15.2}$$
with the constant
$$C = \frac{2^{2-d}}{d(d+1) n^2 \zeta(d)}. \tag{15.3}$$
These asymptotics sharpen earlier upper and lower bounds by Bourgain, Golse and Wennberg [6, 16]. Note that (15.2) implies that the density $\Phi_0(\xi)$ has no second moment. This fact is used in [27] to prove a superdiffusive central limit theorem for the periodic Lorentz gas. The first moment is of course, by our normalisation, the mean free path length
$$\int_0^\infty \xi \, \Phi_0(\xi) \, d\xi = \bar{\xi}. \tag{15.4}$$
Compare these results with the Poisson case:
$$\Phi_{0,\mathrm{Poisson}}(\xi) = \bar{\xi}^{-1} e^{-\xi/\bar{\xi}}. \tag{15.5}$$
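The contrast between the two tails can be made quantitative. The sketch below is a heuristic illustration only: it computes the constant $C$ of (15.3), integrates $\xi^2 \cdot C\xi^{-3}$ over $[1, R]$ as a stand-in for the large-$\xi$ part of the second moment (the asymptotics (15.2) only hold as $\xi \to \infty$, so the lower limit 1 is an arbitrary choice of ours), and compares with the finite second moment of the exponential density (15.5). All function names are ours.

```python
import math


def zeta(s, terms=100000):
    """Riemann zeta for real s > 1: partial sum plus an integral estimate of the tail."""
    partial = sum(k ** -s for k in range(1, terms + 1))
    return partial + terms ** (1 - s) / (s - 1)


def lattice_tail_constant(d, n=1.0):
    """The constant C of (15.3)."""
    return 2.0 ** (2 - d) / (d * (d + 1) * n ** 2 * zeta(d))


def truncated_second_moment_lattice(d, R, n=1.0):
    """Integral of xi^2 * C / xi^3 over [1, R], which equals C * log(R)."""
    return lattice_tail_constant(d, n) * math.log(R)


def second_moment_poisson(mean_free_path):
    """Second moment of the exponential density (15.5)."""
    return 2.0 * mean_free_path ** 2


print(lattice_tail_constant(2), 1 / math.pi ** 2)   # for d = 2, n = 1: C = 1 / pi^2
for R in (1e2, 1e4, 1e8):
    print(R, truncated_second_moment_lattice(2, R))  # grows like log R, without bound
print(second_moment_poisson(0.5))                    # finite in the Poisson case
```

The logarithmic growth of the truncated integral is the numerical face of the statement that $\Phi_0$ has no second moment.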

The main difference between the two is clearly the exponential vs. power-law tail.

16. Cut-and-project sets

We will now significantly generalise the above discussion by allowing $\mathcal{P}$ to be a cut-and-project set, following closely the presentation in [24]. Examples include the classic quasicrystals (such as the vertex set of a Penrose tiling), and locally finite periodic point sets. A cut-and-project set $\mathcal{P} \subset \mathbb{R}^d$ is defined as follows. For $m \ge 0$, $n = d + m$, let
$$\pi : \mathbb{R}^n \to \mathbb{R}^d, \tag{16.1}$$
$$\pi_{\mathrm{int}} : \mathbb{R}^n \to \mathbb{R}^m \tag{16.2}$$
be the orthogonal projections of $\mathbb{R}^n = \mathbb{R}^d \times \mathbb{R}^m$ onto the first and second factor, respectively. $\mathbb{R}^d$ will be called the physical space, and $\mathbb{R}^m$ the internal space. Let $\mathcal{L} \subset \mathbb{R}^n$ be a lattice of full rank. Then the closure
$$\mathcal{A} := \overline{\pi_{\mathrm{int}}(\mathcal{L})} \subset \mathbb{R}^m \tag{16.3}$$
is an abelian subgroup. We denote by $\mathcal{A}^0$ the connected component of $\mathcal{A}$ containing 0. $\mathcal{A}^0$ is a linear subspace of $\mathbb{R}^m$ of dimension $m_1$. We find vectors $a_1, \dots, a_{m_2}$ ($m = m_1 + m_2$) so that
$$\mathcal{A} = \mathcal{A}^0 \oplus \mathbb{Z}\pi(a_1) \oplus \dots \oplus \mathbb{Z}\pi(a_{m_2}). \tag{16.4}$$


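The Haar measure of $\mathcal{A}$ is denoted by $\mu_{\mathcal{A}}$ and normalised so that $\mu_{\mathcal{A}}|_{\mathcal{A}^0}$ is the standard Lebesgue measure on $\mathcal{A}^0$. For $\mathcal{V} := \mathbb{R}^d \times \mathcal{A}^0$, we note that $\mathcal{L} \cap \mathcal{V}$ is a full rank lattice in $\mathcal{V}$. For $\mathcal{W} \subset \mathcal{A}$ with non-empty interior, we call
$$\mathcal{P} = \mathcal{P}(\mathcal{W}, \mathcal{L}) = \{\pi(\ell) : \ell \in \mathcal{L}, \ \pi_{\mathrm{int}}(\ell) \in \mathcal{W}\} \tag{16.5}$$
a cut-and-project set. $\mathcal{W}$ is called the window set. If the boundary of the window set has $\mu_{\mathcal{A}}$-measure zero, we say $\mathcal{P}(\mathcal{W}, \mathcal{L})$ is regular. We will furthermore assume that $\mathcal{W}$ and $\mathcal{L}$ are chosen so that the map
$$\pi_{\mathcal{W}} : \{\ell \in \mathcal{L} : \pi_{\mathrm{int}}(\ell) \in \mathcal{W}\} \to \mathcal{P} \tag{16.6}$$
is bijective. This is to avoid coincidences in $\mathcal{P}$. It follows from Weyl equidistribution that such $\mathcal{P}$ have density
$$n = \frac{\mu_{\mathcal{A}}(\mathcal{W})}{\operatorname{vol}(\mathcal{V}/(\mathcal{L} \cap \mathcal{V}))}. \tag{16.7}$$
In the case of lattices, we were able to assume, by the translational symmetry of the lattice, that the scatterer location $y$ from which we launch our particle is without loss of generality $y = 0$. This is of course not the case for a general cut-and-project set considered now. Note however that for $y \in \mathcal{P}$ there is $\ell \in \mathcal{L}$ such that $y = \pi(\ell)$ and
$$\mathcal{P}(\mathcal{W}, \mathcal{L}) - y = \mathcal{P}(\mathcal{W} - y_{\mathrm{int}}, \mathcal{L}), \qquad y_{\mathrm{int}} := \pi_{\mathrm{int}}(\ell). \tag{16.8}$$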

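By adjusting $\mathcal{W}$ in this way, we may therefore assume in the following that $y = 0$; but keep in mind that the limit process depends on the choice of $\mathcal{W}$ and therefore on the scatterer location $y$.

17. Spaces of cut-and-project sets

The aim is now to describe the “closure” (in a suitable sense) of the orbit of $\mathcal{P}$ under the $\mathrm{SL}(d, \mathbb{R})$-action and construct a probability measure on it. This will yield, as we shall see, our limit random process $\Theta$. Set $G = \mathrm{SL}(n, \mathbb{R})$, $\Gamma = \mathrm{SL}(n, \mathbb{Z})$ and define the embedding (for any $g \in G$)
$$\varphi_g : \mathrm{SL}(d, \mathbb{R}) \to G, \qquad A \mapsto g \begin{pmatrix} A & 0_{d \times m} \\ 0_{m \times d} & 1_m \end{pmatrix} g^{-1}. \tag{17.1}$$
Since $\mathrm{SL}(d, \mathbb{R})$ is generated by unipotent subgroups, Ratner’s theorems [30] imply that there is a (unique) closed connected subgroup $H_g \le G$ such that:
(i) $\Gamma \cap H_g$ is a lattice in $H_g$.
(ii) $\varphi_g(\mathrm{SL}(d, \mathbb{R})) \subset H_g$.
(iii) The closure of $\Gamma \backslash \Gamma \varphi_g(\mathrm{SL}(d, \mathbb{R}))$ is $\Gamma \backslash \Gamma H_g$.
We will call $H_g$ a Ratner subgroup. We denote the unique $H_g$-invariant probability measure on $\Gamma \backslash \Gamma H_g$ by $\mu_{H_g} = \mu_g$. Note that $\Gamma \backslash \Gamma H_g$ is isomorphic to the homogeneous space $(\Gamma \cap H_g) \backslash H_g$. Pick $g \in G$, $\delta > 0$ such that $\mathcal{L} = \delta^{1/n} \mathbb{Z}^n g$. Then one can show [24, Prop. 3.5] that
$$\pi_{\mathrm{int}}(\delta^{1/n} \mathbb{Z}^n h g) \subset \mathcal{A} \quad \text{for all } h \in H_g, \tag{17.2}$$
$$\overline{\pi_{\mathrm{int}}(\delta^{1/n} \mathbb{Z}^n h g)} = \mathcal{A} \quad \text{for } \mu_g\text{-almost all } h \in H_g. \tag{17.3}$$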


The image of the map
$$\Gamma \backslash \Gamma H_g \to \{\text{point sets in } \mathbb{R}^d\}, \qquad h \mapsto \mathcal{P}(\mathcal{W}, \delta^{1/n} \mathbb{Z}^n h g) \tag{17.4}$$
defines a space of cut-and-project sets, and the push-forward of $\mu_g$ equips it with a probability measure. We have thus defined a random point process $\Theta$ in $\mathbb{R}^d$, which is $\mathrm{SL}(d, \mathbb{R})$ invariant, and whose typical realisation is a cut-and-project set with window $\mathcal{W}$ and internal space $\mathcal{A}$. As we will see, this process is precisely the limit process $\Theta$ we are looking for.

18. Equidistribution

The following equidistribution theorems generalise Theorem 13.2 stated earlier. They are a consequence of Ratner’s measure classification theorems [30], and in particular follow from a theorem of Shah [31, Thm. 1.4] on the equidistribution of translates of unipotent orbits. Recall that the matrix $S(w)$ is as defined in (2.5) or (6.7), respectively.

Theorem 18.1 ([24]). Fix $g \in G$. For any bounded continuous $f : \Sigma \times \Gamma \backslash \Gamma H_g \to \mathbb{R}$ and any absolutely continuous probability measure $\lambda$ on $\Sigma$,
$$\int_\Sigma f\bigl(w', \varphi_g(S(w') D(r))\bigr) \, d\lambda(w') \xrightarrow[r \to 0]{} \int_\Sigma \int_{\Gamma \backslash \Gamma H_g} f(w', h) \, d\mu_g(h) \, d\lambda(w'). \tag{18.1}$$

Note that this theorem reduces to the statement of Theorem 13.2 when $n = d$. As in the case of the space of lattices (Section 13), Theorem 18.1 is the key tool to prove:

Theorem 18.2 ([24]). If $\Theta_r$ is the sequence of random point processes (as defined in Section 9 resp. 10) corresponding to a regular cut-and-project set $\mathcal{P} = \mathcal{P}(\mathcal{L}, \mathcal{W})$ and scatterer location $y \in \mathcal{P}$, then

$$\Theta_r \xrightarrow[r\to 0]{} \Theta - (0, w) \tag{18.2}$$

where $\Theta$ is the random point process for the cut-and-project set $\mathcal{P}(\mathcal{L}, \mathcal{W} - y_{\mathrm{int}})$ (as constructed in Section 17) and $w$ is randomly distributed according to $\lambda$. The convergence in (18.2) is understood in the sense of Theorem 13.1, i.e., all test sets $A_i$ are assumed to be bounded and have boundary of measure zero.

Theorem 18.2 implies that the limit transition kernel $k(\omega', \xi, \omega)$ is a function of

$$\omega' = (s, y_{\mathrm{int}}), \qquad \omega = (b, \eta_{\mathrm{int}}), \tag{18.3}$$

where $y_{\mathrm{int}}$ and $\eta_{\mathrm{int}}$ are the "internal" coordinates of the scatterers $\eta$, $y$ involved in the consecutive collisions, as defined in (16.8). For further details see the forthcoming paper [26].

Let us now sketch the main ideas in the proof of Theorem 18.2. By a standard approximation argument (see [21, p. 1973] for details), where we localise $\lambda$ near any given fixed $w_0$, we may reduce the proof to test functions $f$ that are independent of the first variable $w$. We thus need to establish

$$\int_\Sigma f(\varphi_g(S(w)D(r)))\, d\lambda(w) \xrightarrow[r\to 0]{} \int_{\Gamma\backslash\Gamma H_g} f(h)\, d\mu_g(h). \tag{18.4}$$

Let us take $S(w)$ as defined in (6.7), i.e.,

$$S(w) = n(\partial W(w)), \qquad n(x) = \begin{pmatrix} 1 & x \\ {}^t 0 & 1_{d-1} \end{pmatrix}. \tag{18.5}$$

The case of $S(w)$ as defined in (2.5) is analogous and discussed in detail in [21, 24]. The variable substitution (recall (5.12))

$$w \mapsto x = -\partial W(w) \tag{18.6}$$

yields for the left hand side of (18.4),

$$\int_\Sigma f(\varphi_g(S(w)D(r)))\, d\lambda(w) = \int f(\varphi_g(n(x)D(r)))\, d\tilde\lambda(x), \tag{18.7}$$

where $\tilde\lambda$ is still absolutely continuous in view of (5.13) and our assumption on the invertibility of the scattering map. Eq. (18.7) defines a sequence of Borel probability measures $\rho_r$ on $\Gamma\backslash G$ via the linear functional

$$\rho_r[f] := \int f(\varphi_g(n(x)D(r)))\, d\tilde\lambda(x). \tag{18.8}$$

This sequence can be shown to be tight, which implies that any sequence of $\rho_r$ contains a convergent subsequence $\rho_{r_i}$. The question is: What are the possible weak limits $\rho_{r_i} \to \rho$? The first crucial observation is that, because $\varphi_g(n(x)D(r))$ is a sequence of expanding horospheres, any limit must be invariant under the horospherical subgroup. That is,

$$\rho[f \circ \varphi_g(n(x))] = \rho[f] \tag{18.9}$$

for all $x \in \mathbb{R}^{d-1}$. Ratner's theorem [30] gives a complete classification of such measures (in a significantly more general setting). In particular, the ergodic components are unique $H$-invariant probability measures $\mu_H$ supported on embedded homogeneous spaces of the form $\Gamma\backslash\Gamma H$ where $H$ is a closed connected subgroup of $G$ such that $\Gamma \cap H$ is a lattice in $H$. Shah [31] showed (again in a more general setting) that for expanding translates any limit $\rho$ is in fact invariant under the group $\varphi_g(\mathrm{SL}(d,\mathbb{R}))$, and $\rho = \mu_H$ where $H$ is uniquely determined by the fact that $\Gamma\backslash\Gamma H$ is the closure of the orbit $\Gamma\backslash\Gamma\varphi_g(\mathrm{SL}(d,\mathbb{R}))$. The uniqueness of the limit implies that any subsequence converges, and thus $\rho_r \to \mu_H$ as $r \to 0$.

19. Examples of cut-and-project sets and their SL(d, R)-closures

The first obvious example is when $\mathcal{P}$ is a Euclidean lattice $\mathcal{L}$ as studied in Sections 13–15. In this case $m = 0$, $G = \mathrm{SL}(d,\mathbb{R})$, $\Gamma = \mathrm{SL}(d,\mathbb{Z})$ and the $\mathrm{SL}(d,\mathbb{R})$-closure of $\mathcal{L}$ is the space of lattices $\Gamma\backslash G$; that is, $H_g = G$ for any $g$. This observation generalises as follows.

Proposition 19.1 ([24]). If $m < d$ and $\mathcal{L} = \mathbb{Z}^n g$ is chosen such that $\pi|_{\mathcal{L}}$ is injective, then $H_g = G = \mathrm{SL}(n,\mathbb{R})$.

The conditions of this proposition are for instance satisfied in the example studied by Wennberg [37], where

$$\mathcal{P} = \mathcal{Q} \times \mathbb{Z} \subset \mathbb{R}^2 \tag{19.1}$$

and $\mathcal{Q}$ is the one-dimensional cut-and-project set ("Fibonacci quasicrystal")

$$\mathcal{Q} = \left\{ \frac{j}{\sqrt{1+\tau^2}} + \frac{1}{\tau\sqrt{1+\tau^2}} \left\| \frac{j}{\tau} \right\| : j \in \mathbb{Z} \right\}, \tag{19.2}$$

where $\tau = \frac{1+\sqrt{5}}{2}$ (the golden ratio) and $\|\cdot\|$ is the distance to the nearest integer. It is an instructive exercise to understand why $\mathcal{Q}$ and $\mathcal{P}$ are cut-and-project sets as defined in Section 18. As we shall now see, there are counterexamples to the claim in Proposition 19.1 for $m \ge d$.

Probably the most prominent class of quasicrystals are those constructed from algebraic number fields. The Penrose tilings fall into this class. Let us briefly sketch how such quasicrystals can be obtained as cut-and-project sets. Let

• $K$ be a totally real number field of degree $N \ge 2$ over $\mathbb{Q}$,
• $\mathcal{O}_K$ the ring of integers of $K$, and
• $\pi_1 = \mathrm{id}, \pi_2, \ldots, \pi_N$ the distinct embeddings $K \to \mathbb{R}$.

We also use $\pi_i$ to denote the component-wise embeddings

$$\pi_i : K^d \to \mathbb{R}^d, \qquad x \mapsto (\pi_i(x_1), \ldots, \pi_i(x_d)), \tag{19.3}$$

and similarly for the entry-wise embeddings of $d \times d$ matrices,

$$\pi_i : M_d(K) \to M_d(\mathbb{R}). \tag{19.4}$$

Now consider the lattice

$$\mathcal{L} = \{(x, \pi_2(x), \ldots, \pi_N(x)) : x \in \mathcal{O}_K^d\} \tag{19.5}$$

in $\mathbb{R}^{Nd}$. This is a lattice of full rank. The dimension of the internal space is $m = (N-1)d$. It is a fact of "basic" number theory [36] that $\mathcal{A} := \overline{\pi_{\mathrm{int}}(\mathcal{L})} = \mathbb{R}^m$, so that $\mathcal{V} = \mathbb{R}^{Nd}$. To complete the discussion of this set-up, we need to work out $H_g$ for $g \in G$ and $\delta > 0$ so that

$$\mathcal{L} = \delta^{1/Nd}\, \mathbb{Z}^{Nd} g. \tag{19.6}$$

(In fact, $\delta = |D_K|^{d/2}$ where $D_K$ is the discriminant of $K$.) The answer is given by the following lemma.

Lemma 19.2 ([24]). For $g$ as above,

$$H_g = g\, \mathrm{SL}(d,\mathbb{R})^N g^{-1} \tag{19.7}$$

and

$$\Gamma \cap H_g = g\, \mathrm{SL}(d, \mathcal{O}_K)\, g^{-1}. \tag{19.8}$$

The group $\mathrm{SL}(d, \mathcal{O}_K)$ is called the Hilbert modular group. The proof of the above lemma is written out in Section 2.2.1 of [24]. For a detailed account on how the Penrose tilings fit into this setting, see Section 2.5 of [24].

A further example of a cut-and-project set is a finite union of translates of a given cut-and-project set. This is explained in Section 2.3 of [24]. Let us here discuss the special case of periodic Delone sets, i.e., finite unions of translates of a given lattice $\mathcal{L}_0$ of full rank in $\mathbb{R}^d$. An example of such a set is the honeycomb lattice, which in the context of the Boltzmann-Grad limit of the Lorentz gas was recently studied by Boca et al. [3, 4] with different techniques. The scatterer configuration $\mathcal{P}$ we are now interested in is the union of $m$ copies of the same lattice $\mathcal{L}_0$ translated by $t_1, \ldots, t_m \in \mathbb{R}^d$,

$$\mathcal{P} = \bigcup_{j=1}^{m} (t_j + \mathcal{L}_0). \tag{19.9}$$


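We assume that the $t_j$ are chosen in such a way that the above union is disjoint. Let us now show that $\mathcal{P}$ can be realised as a cut-and-project set $\mathcal{P}(\mathcal{L}, \mathcal{W})$. Let

$$\mathcal{L} = (\mathcal{L}_0 \times \{0\}) + \sum_{j=1}^{m} \mathbb{Z}\,(t_j, e_j) \subset \mathbb{R}^n, \tag{19.10}$$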

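where $0 \in \mathbb{R}^m$ and $e_1, \ldots, e_m$ are the standard basis vectors in $\mathbb{R}^m$. The set $\mathcal{L}$ is evidently a lattice of full rank in $\mathbb{R}^n$. Note that

$$\pi_{\mathrm{int}}(\mathcal{L}) = \sum_{j=1}^{m} \mathbb{Z}\, e_j = \mathbb{Z}^m, \tag{19.11}$$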

and therefore the closure of this set is $\mathcal{A} = \mathbb{Z}^m$ with connected component $\mathcal{A}_0 = \{0\}$. It follows that for the window set

$$\mathcal{W} = \bigcup_{j=1}^{m} \{e_j\} \subset \mathcal{A} \tag{19.12}$$

we indeed have

$$\mathcal{P}(\mathcal{L}, \mathcal{W}) = \bigcup_{j=1}^{m} (t_j + \mathcal{L}_0). \tag{19.13}$$
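The identification of a union of lattice translates with a cut-and-project set can be checked on a finite piece by direct enumeration. The following is a minimal sketch, not taken from [24]: the function names, the choice $\mathcal{L}_0 = \mathbb{Z}^2$, the two translates and the coefficient range are all illustrative assumptions, and exact rational arithmetic is used so that set equality is meaningful.

```python
from fractions import Fraction as F
from itertools import product


def integer_combinations(generators, coeffs):
    """Return all sums k_1*g_1 + ... + k_r*g_r with each k_i in coeffs."""
    dim = len(generators[0])
    points = set()
    for ks in product(coeffs, repeat=len(generators)):
        points.add(tuple(sum(k * g[c] for k, g in zip(ks, generators))
                         for c in range(dim)))
    return points


def unit_vector(j, m):
    """Standard basis vector e_j of R^m (0-based index j)."""
    return tuple(F(int(i == j)) for i in range(m))


def cut_and_project(basis0, translates, coeffs):
    """Finite piece of P(L, W) for the lattice L of (19.10) and window W of (19.12)."""
    d, m = len(basis0[0]), len(translates)
    zero_m = (F(0),) * m
    # Generators of L = (L0 x {0}) + sum_j Z (t_j, e_j) in R^(d+m).
    gens = [tuple(a) + zero_m for a in basis0]
    gens += [tuple(t) + unit_vector(j, m) for j, t in enumerate(translates)]
    window = {unit_vector(j, m) for j in range(m)}
    # Keep the physical part of every lattice point whose internal part lies in W.
    return {p[:d] for p in integer_combinations(gens, coeffs) if p[d:] in window}


def union_of_translates(basis0, translates, coeffs):
    """Finite piece of the union of the sets t_j + L0, cf. (19.9)."""
    base = integer_combinations([tuple(a) for a in basis0], coeffs)
    return {tuple(x + s for x, s in zip(p, t)) for p in base for t in translates}


# Hypothetical data: L0 = Z^2 with two translates (not taken from the text).
basis0 = [(F(1), F(0)), (F(0), F(1))]
translates = [(F(0), F(0)), (F(1, 2), F(1, 3))]
coeffs = range(-2, 3)

projected = cut_and_project(basis0, translates, coeffs)
direct = union_of_translates(basis0, translates, coeffs)
```

With these choices `projected == direct` holds, because a lattice point has internal part $e_j$ exactly when its coefficient vector along the generators $(t_i, e_i)$ is $e_j$. The coefficient range must contain 0 and 1 for the comparison to be fair.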

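Let us now determine $H_g$ in this setting. (Note that the injectivity assumption of Proposition 19.1 is now violated.) Take $g_0 \in \mathrm{SL}(d,\mathbb{R})$ so that $\mathcal{L}_0 = n_0^{-1/d}\, \mathbb{Z}^d g_0$, where $n_0$ is the density of $\mathcal{L}_0$. Set

$$T = \begin{pmatrix} t_1 \\ \vdots \\ t_m \end{pmatrix} \in M_{m\times d}(\mathbb{R}). \tag{19.14}$$

We then have

$$\mathcal{L} = n_0^{-1/n}\, \mathbb{Z}^n g, \tag{19.15}$$

for

$$g = n_0^{1/n} \begin{pmatrix} n_0^{-1/d} g_0 & 0 \\ T & 1_m \end{pmatrix} \in \mathrm{SL}(n,\mathbb{R}). \tag{19.16}$$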
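As a consistency check on the normalisation of $g$, here is a routine computation (a sketch that is not taken from [24]; it assumes $n = d + m$, as the ambient space $\mathbb{R}^n$ of (19.10) suggests, and writes $(k, \ell)$ for a row vector with $k \in \mathbb{Z}^d$, $\ell \in \mathbb{Z}^m$):

```latex
\det g = \bigl(n_0^{1/n}\bigr)^{n} \, \det\bigl(n_0^{-1/d} g_0\bigr) \, \det(1_m)
       = n_0 \cdot n_0^{-1} \det g_0 = 1,
\qquad
n_0^{-1/n} \, (k, \ell) \, g = \bigl( n_0^{-1/d} k g_0 + \ell T, \; \ell \bigr).
```

The first identity uses that the matrix is block lower triangular. The second shows that $n_0^{-1/n}\mathbb{Z}^n g$ consists of the points $(x + \sum_j \ell_j t_j, \ell)$ with $x \in \mathcal{L}_0$, which is the lattice described in (19.10).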

If $\mathcal{L}_0$ and $t_1, \ldots, t_m$ are not rationally related, we have the following result for the Ratner subgroup $H_g$.

Lemma 19.3 ([24]). Let $a_1, \ldots, a_d$ be a basis of $\mathcal{L}_0$ so that the vectors $a_1, \ldots, a_d, t_1, \ldots, t_m$ are linearly independent over $\mathbb{Q}$. Then

$$H_g = \left\{ \begin{pmatrix} h & 0 \\ u & 1_m \end{pmatrix} : h \in \mathrm{SL}(d,\mathbb{R}),\ u \in M_{m\times d}(\mathbb{R}) \right\}. \tag{19.17}$$

The Ratner subgroups that appear in the case of rational translates $t_j$ are discussed in [24], Section 2.3.1.


20. Conclusions

We have studied several classes of scatterer configurations $\mathcal{P}$, for which the dynamics of the Lorentz gas converges in the Boltzmann-Grad limit to a random flight process. This process is described by a generalised linear Boltzmann equation. Its transition kernel $k(\omega', \xi, \omega)$ can be obtained from a random point process $\Theta$, which is the limit of a sequence of dilated, randomly rotated copies of $\mathcal{P}$ as in (9.7). If $\mathcal{P}$ is sufficiently generic (e.g., a typical realisation of a Poisson point process), then the limit process $\Theta$ is a Poisson point process and the generalised Boltzmann equation reduces to the classical linear Boltzmann equation. If, on the other hand, $\mathcal{P}$ is a Euclidean lattice or a quasicrystal, then the limit process $\Theta$ is given by the distribution of random lattices, and the generalised Boltzmann equation does not reduce to the classical case. A striking feature in this case is that the distribution of free path lengths has a power-law tail and no second moment. This is very different from the Poisson case, where the path length distribution is exponential. By taking unions of incommensurate Euclidean lattices, one can in fact obtain path length distributions with any integer power law [25].

It is an exciting challenge to try to characterise all point processes $\Theta$ that arise as a limit in (9.7) or (10.3) for general scatterer configurations $\mathcal{P}$. The limit processes we have encountered in these lectures have a common feature: they are invariant under the action of $\mathrm{SL}(d,\mathbb{R})$. That is, the probability of finding a given number of points in each of the test sets $B_1, \ldots, B_k$ is the same as for the test sets $B_1 M, \ldots, B_k M$, for any $M \in \mathrm{SL}(d,\mathbb{R})$. Although it is not obvious that all limit processes $\Theta$ have this property, I expect that almost all will. A natural objective is therefore:

$$\text{Classify all } \mathrm{SL}(d,\mathbb{R})\text{-invariant point processes in } \mathbb{R}^d. \tag{20.1}$$

This seems an extremely hard problem already in dimension $d = 2$, when one looks beyond Ratner's setting of point processes coming from homogeneous spaces (Section 12). Recent breakthroughs include the papers by McMullen [28] and Eskin and Mirzakhani [13] on the $\mathrm{SL}(2,\mathbb{R})$-action on moduli space, which may be mapped to an $\mathrm{SL}(2,\mathbb{R})$-invariant point process in $\mathbb{R}^2$ by analogous arguments as in Section 12, see [35] for details.

The origin in $\mathbb{R}^d$ (which represents the current scatterer location) is a fixed point of the $\mathrm{SL}(d,\mathbb{R})$-action and hence plays a special role. It is natural to focus on those processes $\Theta$ that are independent of the choice of origin. To this end, consider an $\mathrm{ASL}(d,\mathbb{R})$-invariant process $\widetilde\Theta$, where $\mathrm{ASL}(d,\mathbb{R})$ is the group generated by $\mathrm{SL}(d,\mathbb{R})$ and the group of translations of $\mathbb{R}^d$. We then obtain the desired $\mathrm{SL}(d,\mathbb{R})$-invariant process $\Theta$ by conditioning $\widetilde\Theta$ to contain the origin. The goal now seems a little easier:

$$\text{Classify all } \mathrm{ASL}(d,\mathbb{R})\text{-invariant point processes in } \mathbb{R}^d. \tag{20.2}$$

The limit processes $\Theta$ we have discussed in these lectures fall into this more restricted class. This is evident for the Poisson process, but less obvious in the case of Euclidean lattices and cut-and-project sets. Here the process $\widetilde\Theta$ is constructed via the space of affine lattices, see [21, 24] for details.

I should point out that the limit processes $\Theta$ in (9.7) or (10.3) do not necessarily have to be $\mathrm{SL}(d,\mathbb{R})$-invariant (in the sense defined above), when $\mathcal{P}$ is dependent on $r$. Assume for instance that the scatterer locations $\mathcal{P}$ are no longer fixed, but oscillate around their equilibrium position at a given point set $\mathcal{P}_0$, where the amplitude of oscillation is on the same scale as the scattering radius. We may model this by assuming that the position of the scatterer is a random variable $y + r\xi_y$, where $y \in \mathcal{P}_0$, and $\{\xi_y : y \in \mathbb{R}^d\}$ is a random field of independent, identically distributed random vectors $\xi_y \in \mathbb{R}^d$ with rotation-invariant distribution. Denote by $\xi_{y\perp} \in \mathbb{R}^{d-1}$ the orthogonal projection onto the plane perpendicular to $e_1$. The renormalisation approach in Sections 9 and 10 shows that the limit random process is given by

$$\Theta = \{ y + \xi_{y\perp} - \xi_{0\perp} : y \in \Theta_0 \}, \tag{20.3}$$

where $\Theta_0$ is the limit process corresponding to the fixed configuration $\mathcal{P}_0$. If $\mathcal{P}_0$ is given by a Euclidean lattice or cut-and-project set, then $\Theta_0$ is $\mathrm{SL}(d,\mathbb{R})$-invariant but $\Theta$ is not.

Acknowledgements

These lecture notes were prepared for the summer schools "Limit theorems for Dynamical Systems" at the Bernoulli Center, EPFL Lausanne, 27–31 May 2013, "Current Topics in Mathematical Physics" at CIRM Marseille, 1–7 September 2013, and the Masterclass "Randomness, Classical and Quantum" at the University of Copenhagen, 4–8 November 2013. I thank the organizers for the invitation to speak at these events, and the participants for stimulating discussions and feedback. Much of the material in these notes is based on joint work with Andreas Strömbergsson, and I would like to thank him for the long-standing collaboration. I am grateful to Daniel El-Baz, Jory Griffin, Andreas Strömbergsson, Jim Tseng and Ilya Vinogradov for their comments on a first draft of these notes. The research leading to the results presented here has received funding from the European Research Council under the European Union's Seventh Framework Programme (FP/2007-2013) / ERC Grant Agreement n. 291147. I furthermore gratefully acknowledge support from a Royal Society Wolfson Research Merit Award.

References

[1] P. M. Bleher, Statistical properties of two-dimensional periodic Lorentz gas with infinite horizon, J. Statist. Phys. 66 (1992), no. 1-2, 315–373, DOI 10.1007/BF01060071. MR1149489 (93b:82052)
[2] F. P. Boca and A. Zaharescu, The distribution of the free path lengths in the periodic two-dimensional Lorentz gas in the small-scatterer limit, Comm. Math. Phys. 269 (2007), no. 2, 425–471, DOI 10.1007/s00220-006-0137-7. MR2274553 (2008f:37013)
[3] Florin P. Boca, Distribution of the linear flow length in a honeycomb in the small-scatterer limit, New York J. Math. 16 (2010), 651–735. MR2763082 (2012c:37067)
[4] Florin P. Boca and Radu N. Gologan, On the distribution of the free path length of the linear flow in a honeycomb (English, with English and French summaries), Ann. Inst. Fourier (Grenoble) 59 (2009), no. 3, 1043–1075. MR2543662 (2010h:37075)
[5] C. Boldrighini, L. A. Bunimovich, and Ya. G. Sinaĭ, On the Boltzmann equation for the Lorentz gas, J. Statist. Phys. 32 (1983), no. 3, 477–501, DOI 10.1007/BF01008951. MR725107 (85h:82025)
[6] J. Bourgain, F. Golse, and B. Wennberg, On the distribution of free path lengths for the periodic Lorentz gas, Comm. Math. Phys. 190 (1998), no. 3, 491–508, DOI 10.1007/s002200050249. MR1600299 (98k:82146)
[7] L. A. Bunimovich and Ya. G. Sinaĭ, Statistical properties of Lorentz gas with periodic configuration of scatterers, Comm. Math. Phys. 78 (1980/81), no. 4, 479–497. MR606459 (82m:82007)


[8] V. A. Bykovskiĭ and A. V. Ustinov, The statistics of particle trajectories in the nonhomogeneous Sinaĭ problem for a two-dimensional lattice (Russian, with Russian summary), Izv. Ross. Akad. Nauk Ser. Mat. 73 (2009), no. 4, 17–36, DOI 10.1070/IM2009v073n04ABEH002461; English transl., Izv. Math. 73 (2009), no. 4, 669–688. MR2583964 (2011a:82020)
[9] Emanuele Caglioti and François Golse, The Boltzmann-Grad limit of the periodic Lorentz gas in two space dimensions (English, with English and French summaries), C. R. Math. Acad. Sci. Paris 346 (2008), no. 7-8, 477–482, DOI 10.1016/j.crma.2008.01.016. MR2417573 (2009d:82126)
[10] Emanuele Caglioti and François Golse, On the Boltzmann-Grad limit for the two dimensional periodic Lorentz gas, J. Stat. Phys. 141 (2010), no. 2, 264–317, DOI 10.1007/s10955-010-0046-1. MR2726643 (2011k:82079)
[11] Per Dahlqvist, The Lyapunov exponent in the Sinai billiard in the small scatterer limit, Nonlinearity 10 (1997), no. 1, 159–173, DOI 10.1088/0951-7715/10/1/011. MR1430746 (98a:58102)
[12] R. J. DiPerna and P.-L. Lions, On the Cauchy problem for Boltzmann equations: global existence and weak stability, Ann. of Math. (2) 130 (1989), no. 2, 321–366, DOI 10.2307/1971423. MR1014927 (90k:82045)
[13] A. Eskin and M. Mirzakhani, Invariant and stationary measures for the SL(2,R) action on Moduli space, arXiv:1302.3320.
[14] Isabelle Gallagher, Laure Saint-Raymond, and Benjamin Texier, From Newton to Boltzmann: hard spheres and short-range potentials, Zurich Lectures in Advanced Mathematics, European Mathematical Society (EMS), Zürich, 2013. MR3157048
[15] G. Gallavotti, Divergences and approach to equilibrium in the Lorentz and the wind-tree-models, Physical Review 185 (1969), 308–322.
[16] François Golse and Bernt Wennberg, On the distribution of free path lengths for the periodic Lorentz gas. II (English, with English and French summaries), M2AN Math. Model. Numer. Anal. 34 (2000), no. 6, 1151–1163, DOI 10.1051/m2an:2000121. MR1812731 (2002a:82092)
[17] Oscar E. Lanford III, Time evolution of large classical systems, Dynamical systems, theory and applications (Rencontres, Battelle Res. Inst., Seattle, Wash., 1974), Springer, Berlin, 1975, pp. 1–111. Lecture Notes in Phys., Vol. 38. MR0479206 (57 #18653)
[18] A. J. Lichtenberg and M. A. Lieberman, Regular and chaotic dynamics, 2nd ed., Applied Mathematical Sciences, vol. 38, Springer-Verlag, New York, 1992. MR1169466 (93c:58071)
[19] H. Lorentz, Le mouvement des électrons dans les métaux, Arch. Néerl. 10 (1905), 336–371.
[20] Jens Marklof and Andreas Strömbergsson, Kinetic transport in the two-dimensional periodic Lorentz gas, Nonlinearity 21 (2008), no. 7, 1413–1422, DOI 10.1088/0951-7715/21/7/001. MR2425325 (2009j:82029)
[21] Jens Marklof and Andreas Strömbergsson, The distribution of free path lengths in the periodic Lorentz gas and related lattice point problems, Ann. of Math. (2) 172 (2010), no. 3, 1949–2033, DOI 10.4007/annals.2010.172.1949. MR2726104 (2012b:37103)
[22] Jens Marklof and Andreas Strömbergsson, The Boltzmann-Grad limit of the periodic Lorentz gas, Ann. of Math. (2) 174 (2011), no. 1, 225–298, DOI 10.4007/annals.2011.174.1.7. MR2811599 (2012g:37075)
[23] Jens Marklof and Andreas Strömbergsson, The periodic Lorentz gas in the Boltzmann-Grad limit: asymptotic estimates, Geom. Funct. Anal. 21 (2011), no. 3, 560–647, DOI 10.1007/s00039-011-0116-9. MR2810859 (2012h:37019)
[24] Jens Marklof and Andreas Strömbergsson, Free path lengths in quasicrystals, Comm. Math. Phys. 330 (2014), no. 2, 723–755, DOI 10.1007/s00220-014-2011-3. MR3223485
[25] Jens Marklof and Andreas Strömbergsson, Power-law distributions for the free path length in Lorentz gases, J. Stat. Phys. 155 (2014), no. 6, 1072–1086, DOI 10.1007/s10955-014-0935-9. MR3207729
[26] Jens Marklof and Andreas Strömbergsson, Kinetic transport in quasicrystals, in preparation.
[27] Jens Marklof and Bálint Tóth, Superdiffusion in the periodic Lorentz gas, arXiv:1403.6024.
[28] Curtis T. McMullen, Dynamics of SL2(R) over moduli space in genus two, Ann. of Math. (2) 165 (2007), no. 2, 397–456, DOI 10.4007/annals.2007.165.397. MR2299738 (2008k:32035)
[29] Mario Pulvirenti, Chiara Saffirio, and Sergio Simonella, On the validity of the Boltzmann equation for short range potentials, Rev. Math. Phys. 26 (2014), no. 2, 1450001, 64, DOI 10.1142/S0129055X14500019. MR3190204


[30] Marina Ratner, On Raghunathan's measure conjecture, Ann. of Math. (2) 134 (1991), no. 3, 545–607, DOI 10.2307/2944357. MR1135878 (93a:22009)
[31] Nimish A. Shah, Limit distributions of expanding translates of certain orbits on homogeneous spaces, Proc. Indian Acad. Sci. Math. Sci. 106 (1996), no. 2, 105–125, DOI 10.1007/BF02837164. MR1403756 (98b:22024)
[32] H. Spohn, The Lorentz process converges to a random flight process, Comm. Math. Phys. 60 (1978), no. 3, 277–290. MR0496299 (58 #14861)
[33] Rob Sturman, Julio M. Ottino, and Stephen Wiggins, The mathematical foundations of mixing, Cambridge Monographs on Applied and Computational Mathematics, vol. 22, Cambridge University Press, Cambridge, 2006. The linked twist map as a paradigm in applications: micro to macro, fluids to solids. MR2265644 (2009k:37187)
[34] Domokos Szász and Tamás Varjú, Limit laws and recurrence for the planar Lorentz process with infinite horizon, J. Stat. Phys. 129 (2007), no. 1, 59–80, DOI 10.1007/s10955-007-9367-0. MR2349520 (2008k:37087)
[35] William A. Veech, Siegel measures, Ann. of Math. (2) 148 (1998), no. 3, 895–944, DOI 10.2307/121033. MR1670061 (2000k:37028)
[36] André Weil, Basic number theory, Die Grundlehren der mathematischen Wissenschaften, Band 144, Springer-Verlag New York, Inc., New York, 1967. MR0234930 (38 #3244)
[37] Bernt Wennberg, Free path lengths in quasi crystals, J. Stat. Phys. 147 (2012), no. 5, 981–990, DOI 10.1007/s10955-012-0500-3. MR2946632

School of Mathematics, University of Bristol, Bristol BS8 1TW, United Kingdom
E-mail address: [email protected]

Additional Topics

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01492

Limit theorems for toral translations

Dmitry Dolgopyat and Bassam Fayad

Abstract. We review known results about the distribution of orbits of toral translations and present open questions.

Contents
1. Introduction
2. Ergodic sums of smooth functions with singularities
2.1. Smooth observables
2.2. Observables with singularities
3. Ergodic sums of characteristic functions. Discrepancies
3.1. The maximal discrepancy
3.2. Limit laws for the discrepancy as α is random
4. Poisson regime
5. Poisson processes
6. Uniform distribution on the space of lattices
7. Ideas of the Proofs
7.1. The Poisson regime
7.2. Application of the Poisson regime theorems to the ergodic sums of smooth functions with singularities
7.3. Limit theorems for discrepancy
8. Shrinking targets
8.1. Dynamical Borel-Cantelli lemmas for translations
8.2. On the distribution of hits
8.3. Proofs outlines
9. Skew products. Random walks
9.1. Basic properties
9.2. Recurrence
9.3. Ergodicity
9.4. Rate of recurrence
10. Special flows
10.1. Ergodic integrals
10.2. Smooth time change
10.3. Time change with singularities
10.4. Mixing properties
10.5. An application
11. Higher dimensional actions
11.1. Linear forms
11.2. Cut-and-project sets
References

2010 Mathematics Subject Classification. Primary 11K06, 37C55; Secondary 37A17, 37A45.
The authors thank Mariusz Lemańczyk, Jens Marklof, Jean Paul Thouvenot, and Ilya Vinogradov for useful discussions. The research of DD was partially supported by the NSF. The research of BF was partially supported by ANR-11-BS01-0004.
© 2015 Dmitry Dolgopyat and Bassam Fayad

1. Introduction

One of the surprising discoveries of dynamical systems theory is that many deterministic systems with non-zero Lyapunov exponents satisfy the same limit theorems as the sums of independent random variables. Much less is known for the zero exponent case where only a few examples have been analyzed. In this survey we consider the extreme case of toral translations where each map not only has zero exponents but is actually an isometry. These systems were studied extensively due to their relations to number theory, to the theory of integrable systems and to geometry. Surprisingly many natural questions are still open. We review known results as well as the methods to obtain them and present a list of open problems. Given a vast amount of work on this subject, it is impossible to provide a comprehensive treatment in this short survey. Therefore we treat the topics closest to our research interests in more detail while some other subjects are mentioned only briefly. Still we hope to provide the reader with the flavor of the subject and introduce some tools useful in the study of toral translations, most notably, various renormalization techniques.

Let $X = \mathbb{T}^d$, $\mu$ be the Haar measure on $X$ and $T_\alpha(x) = x + \alpha$. The most basic question in smooth ergodic theory is the behavior of ergodic sums. Given a map $T$ and a zero mean observable $A(\cdot)$ let

$$A_N(x) = \sum_{n=0}^{N-1} A(T^n x). \tag{1}$$

If there is no ambiguity, we may write $A_N$ for $A_N(x)$. Conversely we may use the notation $A_N(\alpha, x)$ to indicate that the underlying map is the translation of vector $\alpha$. The uniform distribution of the orbit of $x$ by $T$ is characterized by the convergence to 0 of $A_N(x)/N$. In the case of toral translations $T_\alpha$ with irrational frequency vector $\alpha$ the uniform distribution holds for all points $x$. The study of the ergodic sums is then useful to quantify the rate of uniform distribution of the Kronecker sequence $n\alpha \bmod 1$ as we will see in Section 3 where discrepancy functions are discussed.

The question about the distribution of ergodic sums is analogous to the Central Limit Theorem in probability theory. One can also consider analogues of other classical probabilistic results. In this survey we treat two such questions. In Section 4 we consider the so-called Poisson regime where (1) is replaced by $\sum_{n=0}^{N-1} \chi_{C_N}(T_\alpha^n x)$ and the sets $C_N$ are scaled in such a way that only a finite number of terms are non-zero for typical $x$. Such sums appear in several questions in mathematical physics, including quantum chaos [91] and the Boltzmann-Grad limit of several mechanical systems [93]. They also describe the resonances in the study of ergodic sums for toral translations as we will see in Section 7. In Section 8 we consider Borel-Cantelli type questions where one takes a sequence of shrinking sets and studies the number of times a typical orbit hits those sets. These questions are intimately related to some classical problems in the theory of Diophantine approximations.

The ergodic sums above toral translations also appear in natural dynamical systems such as skew products, cylindrical cascades and special flows. Discrete time systems related to ergodic sums over translations are treated in Section 9 while flows are treated in Section 10. These systems give additional motivation to study the ergodic sums (1) for smooth functions having singularities of various types: power, fractional power, logarithmic, and so on. Ergodic sums for functions with singularities are discussed in Section 2. Finally in Section 11 we present the results related to the action of several translations at the same time.

Notations. We say that a vector $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{R}^d$ is irrational if $\{1, \alpha_1, \ldots, \alpha_d\}$ are linearly independent over $\mathbb{Q}$. For $x \in \mathbb{R}^d$, we use the notation $\{x\} := (x_1, \ldots, x_d) \bmod 1$. We denote by $\|x\|$ the closest signed distance of some $x \in \mathbb{R}$ to the integers. Assuming that $d \in \mathbb{N}$ is fixed, for $\sigma > 0$ we denote by $D(\sigma) \subset \mathbb{R}^d$ the set of Diophantine vectors with exponent $\sigma$, that is

$$D(\sigma) = \{\alpha : \exists C\ \forall k \in \mathbb{Z}^d - 0,\ m \in \mathbb{Z}: \ |(k, \alpha) - m| \ge C|k|^{-d-\sigma}\}. \tag{2}$$
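Condition (2) can be probed numerically in the simplest case. The sketch below is an illustration under stated assumptions rather than part of the survey: it takes $d = 1$ and $\sigma = 0$, where (2) asks that $k\,\|k\alpha\|$ stay above a positive constant for all $k \ge 1$, it uses the unsigned distance to the nearest integer, and it picks $\alpha = (\sqrt5 - 1)/2$ and a cutoff `k_max` as hypothetical test data. A finite scan of this kind can suggest a value of $C$ but cannot prove membership in $D(0)$.

```python
import math


def dist_to_nearest_integer(x):
    """Unsigned distance from the real number x to the nearest integer."""
    return abs(x - round(x))


def constant_type_lower_bound(alpha, k_max):
    """Return the minimum over 1 <= k <= k_max of k * ||k * alpha||.

    For d = 1 and sigma = 0, condition (2) asks this quantity to stay
    above a positive constant C for every k_max.
    """
    return min(k * dist_to_nearest_integer(k * alpha)
               for k in range(1, k_max + 1))


golden = (math.sqrt(5) - 1) / 2  # illustrative choice of alpha
bound = constant_type_lower_bound(golden, 10_000)
```

For this choice of $\alpha$ the scan stays above 0.38, with the smallest value attained at $k = 1$.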

Let us recall that $D(\sigma)$ has full measure if $\sigma > 0$, while $D(0)$ is an uncountable set of zero measure and $D(\sigma)$ is empty for $\sigma < 0$. The set $D(0)$ is called the set of constant type vectors or badly approximable vectors. An irrational vector $\alpha$ that is not Diophantine for any $\sigma > 0$ is called Liouville.

We denote by $C$ the standard Cauchy random variable with density $\frac{1}{\pi(1+x^2)}$. A normal random variable with zero mean and variance $D^2$ will be denoted by $N(D^2)$. Thus $N(D^2)$ has density $\frac{1}{\sqrt{2\pi}\,D} e^{-x^2/2D^2}$. We will write simply $N$ for $N(1)$. Next, $P(X, \mu)$ will denote the Poisson process on $X$ with measure $\mu$ (we refer the reader to Section 5 for the definition and basic properties of Poisson processes).

2. Ergodic sums of smooth functions with singularities

2.1. Smooth observables. For toral translations, the ergodic sums of smooth observables are well understood. Namely if $A$ is sufficiently smooth with zero mean then for almost all $\alpha$, $A$ is a coboundary, that is, there exists $B(x, \alpha)$ such that

$$A(x) = B(x + \alpha, \alpha) - B(x, \alpha). \tag{3}$$

Namely if $A(x) = \sum_{k \ne 0} a_k e^{2\pi i (k,x)}$ then

$$B(x, \alpha) = \sum_{k \ne 0} b_k e^{2\pi i (k,x)} \quad \text{where} \quad b_k = \frac{a_k}{e^{i 2\pi (k,\alpha)} - 1}.$$

The above series converges in $L^2$ provided $\alpha \in D(\sigma)$ and

$$A \in H^\sigma = \Big\{A : \sum_k \big|a_k |k|^{(\sigma+d)}\big|^2 < \infty\Big\}.$$
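Summing (3) along an orbit telescopes, so the ergodic sum of a coboundary equals a difference of two values of $B$. The sketch below checks this numerically in dimension $d = 1$ for a trigonometric polynomial; the coefficients, the angle $\alpha$, the base point and the number of terms are hypothetical test data chosen for illustration, and the helper names are not from the text.

```python
import cmath
import math


def observable(coeffs):
    """A(x) = sum_k a_k exp(2 pi i k x) for a finite dict {k: a_k} with k != 0."""
    return lambda x: sum(a * cmath.exp(2j * math.pi * k * x)
                         for k, a in coeffs.items())


def transfer_function(coeffs, alpha):
    """B(x) = sum_k b_k exp(2 pi i k x) with b_k = a_k / (exp(2 pi i k alpha) - 1)."""
    b = {k: a / (cmath.exp(2j * math.pi * k * alpha) - 1)
         for k, a in coeffs.items()}
    return observable(b)


def ergodic_sum(A, alpha, x, N):
    """A_N(x) = sum_{n=0}^{N-1} A(x + n*alpha), as in (1) with T = T_alpha."""
    return sum(A(x + n * alpha) for n in range(N))


# Illustrative data (not from the text): a real trigonometric polynomial on T^1.
coeffs = {1: 1.0, -1: 1.0, 3: 0.5j, -3: -0.5j}
alpha = (math.sqrt(5) - 1) / 2
x, N = 0.3, 200

A = observable(coeffs)
B = transfer_function(coeffs, alpha)
gap = abs(ergodic_sum(A, alpha, x, N) - (B(x + N * alpha) - B(x)))
```

Up to floating-point rounding, `gap` vanishes. For observables with infinitely many Fourier modes the same check would need a truncation, and the small denominators $e^{2\pi i (k,\alpha)} - 1$ are where the Diophantine condition enters.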


Note that (3) implies that $A_N(x) = B(x + N\alpha, \alpha) - B(x, \alpha)$, giving a complete description of the behavior of ergodic sums for almost all $\alpha$. In particular we have

Corollary 1. If $\alpha$ is uniformly distributed on $\mathbb{T}^d$ then $A_N(x)$ has a limiting distribution as $N \to \infty$, namely $A_N \Rightarrow B(y, \alpha) - B(x, \alpha)$ where $(y, \alpha)$ is uniformly distributed on $\mathbb{T}^d \times \mathbb{T}^d$.

Proof. We need to show that as $N \to \infty$ the random vector $(\alpha, N\alpha)$ converges in distribution to a vector whose two coordinates are independent and uniformly distributed on $\mathbb{T}^d$. To this end it suffices to check that if $\varphi(x, y)$ is a smooth function on $\mathbb{T}^d \times \mathbb{T}^d$ then

$$\lim_{N\to\infty} \int_{\mathbb{T}^d} \varphi(\alpha, N\alpha)\, d\alpha = \int_{\mathbb{T}^d \times \mathbb{T}^d} \varphi(\alpha, \beta)\, d\alpha\, d\beta,$$

but this is easily established by considering the Fourier series of $\varphi$.
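The Fourier argument can be spelled out as follows (a standard computation, written here as a sketch and not quoted from the survey). For a single character with frequencies $k, l \in \mathbb{Z}^d$:

```latex
\varphi(\alpha,\beta) = e^{2\pi i ((k,\alpha) + (l,\beta))}
\quad\Longrightarrow\quad
\int_{\mathbb{T}^d} \varphi(\alpha, N\alpha)\, d\alpha
= \int_{\mathbb{T}^d} e^{2\pi i (k + N l,\, \alpha)}\, d\alpha
= \begin{cases} 1, & k + N l = 0, \\ 0, & \text{otherwise.} \end{cases}
```

If $l \ne 0$ then $k + Nl \ne 0$ as soon as $N$ exceeds the largest coordinate of $k$ in absolute value, and if $l = 0$ the integral equals 1 only for $k = 0$. Hence the limit is 1 for $(k, l) = (0, 0)$ and 0 otherwise, which agrees with $\int_{\mathbb{T}^d\times\mathbb{T}^d} \varphi\, d\alpha\, d\beta$. The general case follows by approximating a smooth $\varphi$ uniformly by trigonometric polynomials.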



We will see in Section 10 how our understanding of ergodic sums for smooth functions can be used to derive ergodic properties of area preserving flows on $\mathbb{T}^2$ without fixed points. On the other hand there are many open questions related to the case when the observable $A$ is not smooth enough for (3) to hold. Below we mention several classes of interesting observables.

2.2. Observables with singularities. Special flows above circle rotations and under ceiling functions that are smooth except for some singularities naturally appear in the study of conservative flows on surfaces with fixed points. Another motivation for studying ergodic sums for functions with singularities is the case of meromorphic functions, whose sums appear in questions related to both number theory [48] and ergodic theory [106].

2.2.1. Observables with logarithmic singularities. In the study of conservative flows on surfaces, non degenerate saddle singularities are responsible for logarithmic singularities of the ceiling function. Ceiling functions with logarithmic singularities also appear in the study of multi-valued Hamiltonians on the two torus. In [3], Arnold investigated such flows and showed that the torus decomposes into cells that are filled up by periodic orbits and one open ergodic component. On this component, the flow can be represented as a special flow over an interval exchange map of the circle and under a ceiling function that is smooth except for some logarithmic singularities. The singularities can be asymmetric since the coefficient in front of the logarithm is twice as big on one side of the singularity as the one on the other side, due to the existence of homoclinic loops (see Figure 1). More motivations for studying functions with logarithmic singularities as well as some numerical results for rotation numbers of bounded type are presented in [69].
A natural question is to understand the fluctuations of the ergodic sums for these functions as the frequency $\alpha$ of the underlying rotation is random as well as the base point $x$. Since Fourier coefficients of the symmetric logarithm function have the asymptotics similar to that of the indicator function of an interval one may expect that the results about the latter that we will discuss in Section 3 can be extended to the former.

Figure 1. Multivalued Hamiltonian flow. Note that the orbits passing to the left of the saddle spend approximately twice as much time as the orbits passing to the right of the saddle and starting at the same distance from the separatrix since they pass near the saddle twice.

Question 1. Suppose that $A$ is smooth away from a finite set of points $x_1, x_2, \ldots, x_k$ and near $x_j$, $A(x) = a_j^{\pm} \ln|x - x_j| + r_j(x)$ where the $-$ sign is taken if $x < x_j$, the $+$ sign is taken if $x > x_j$ and $r_j$ are smooth functions. What can be said about the distribution of $A_N(\alpha, x)/\ln N$ as $x$ and $\alpha$ are random?

2.2.2. Observables with power like singularities. When considering conservative flows on surfaces with degenerate saddles one is led to study the ergodic sums of observables with integrable power like singularities (more discussion of these flows will be given in Section 10). Special flows above irrational rotations of the circle under such ceiling functions are called Kocergin flows.

The study of ergodic sums for smooth ergodic flows with nondegenerate hyperbolic singular points on surfaces of genus $p \ge 2$ shows that these flows are in general not mixing (see Section 10). A contrario Kocergin showed that special flows above irrational rotations and under ceiling functions with integrable power like singularities are always mixing. This is due to the important deceleration next to the singularity that is responsible for a shear along orbits that separates the points sufficiently to produce mixing. In other words, the mixing is due to large oscillations of the ergodic sums. In this note we will be frequently interested in the distribution properties of these sums.
One may also consider the case of non-integrable power singularities since they naturally appear in problems of ergodic theory and number theory. The following result answers a question of [48].

Theorem 2. ([120]) If $A$ has one simple pole on $\mathbb{T}^1$ and $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^2$ then $\frac{A_N}{N}$ has a limiting distribution as $N \to \infty$.


The function $A$ in Theorem 2 has a symmetric singularity of the form $1/x$ that is the source of cancellations in the ergodic sums.

Question 2. What happens for an asymmetric singularity of the type $1/|x|$?

Question 3. What happens in the quenched setting where $\alpha$ is fixed?

We now present several generalizations of Theorem 2.

Theorem 3. Let
\[ A = \tilde{A}(x) + \frac{c_- \chi_{x < x_0} + c_+ \chi_{x > x_0}}{|x - x_0|^a} \]
where $\tilde{A}$ is smooth and $a > 1$.
(a) If $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^2$ then $\frac{A_N}{N^a}$ converges in distribution.
(b) For almost every $x$ fixed, if $\alpha$ is uniformly distributed on $\mathbb{T}$ then $\frac{A_N(\alpha, x)}{N^a}$ converges to the same limit as in part (a).

Theorem 4. [87] If $A$ has zero mean and is smooth except for a singularity at 0 of type $|x|^{-a}$, $a \in (0, 1)$, then $A_N/N^a$ converges in distribution.

The proof of Theorem 4 is inspired by the proof of Theorem 10 of Section 3 which will be presented in Section 7.3. Marklof proved in [92] that if $\alpha \in D(\sigma)$ with $\sigma < (1 - a)/a$, then for $A$ as in Theorem 4, $A_N(\alpha, \alpha)/N \to 0$.

Question 4. What happens for other angles $\alpha$ and other types of singularities, including the non-integrable ones for which the ergodic theorem does not necessarily hold?

Another natural generalization of Theorem 2 is to consider meromorphic functions. Let $A$ be such a function with highest pole of order $m$. Thus $A$ can be written as
\[ A(x) = \sum_{j=1}^{r} \frac{c_j}{(x - x_j)^m} + \tilde{A}(x) \]
where the highest pole of $\tilde{A}$ has order at most $m - 1$.

Theorem 5. (a) Let $A$ be fixed and let $\alpha$ be distributed according to a smooth density on $\mathbb{T}$. Then for any $x \in \mathbb{T}$, $\frac{A_N(\alpha, x)}{N^m}$ has a limiting distribution as $N \to \infty$.
(b) Let $\tilde{A}, c_1, \dots, c_r$ be fixed while $(\alpha, x, x_1, \dots, x_r)$ are distributed according to a smooth density on $\mathbb{T}^{r+2}$. Then $\frac{A_N(\alpha, x)}{N^m}$ has a limiting distribution as $N \to \infty$.
(c) If $(x_1, x_2, \dots, x_r)$ is a fixed irrational vector then for almost every $x \in \mathbb{T}$ the limit distribution in part (a) is the same as the limit distribution in part (b).

Proofs of Theorems 3 and 5 are sketched in Section 7. It will be apparent from the proof of Theorem 5 that the limit distribution in part (a) is not the same for all $x_1, x_2, \dots, x_r$. For example, the choice $x_j = j x_1$ leads to an exceptional distribution, since a close approach to $x_1$ and $x_2$ by the orbit of $x$ should be followed by a close approach to $x_j$ for $j \ge 3$.
We will see that this phenomenon appears in many limit theorems (see e.g. Theorem 9, Theorem 25 and Question 52, Theorem 38 and Question 38, as well as [93]).

Question 5. What can be said about more general meromorphic functions such as $\sin 2\pi x/(\sin 2\pi x + 3 \cos 2\pi y)$ on $\mathbb{T}^d$ with $d > 1$?


3. Ergodic sums of characteristic functions. Discrepancies

The case where $A = \chi_\Omega$ is a classical subject in number theory. Define the discrepancy function
\[ D_N(\Omega, \alpha, x) = \sum_{n=0}^{N-1} \chi_\Omega(x + n\alpha) - N \frac{\mathrm{Vol}(\Omega)}{\mathrm{Vol}(\mathbb{T}^d)}. \]
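As a minimal numerical sketch of this definition, assuming $d = 1$ and $\Omega = [a, b) \subset \mathbb{T}$, one can count the visits of the orbit directly. The function name `discrepancy` and its parameters are illustrative choices, not notation from the text.

```python
def discrepancy(N, alpha, x, a, b):
    """Return D_N([a, b), alpha, x) for the rotation by alpha on R/Z.

    Assumes 0 <= a < b <= 1, so that Vol(Omega) = b - a and Vol(T) = 1.
    """
    # Count the n in {0, ..., N-1} whose orbit point x + n*alpha lies in [a, b).
    hits = sum(1 for n in range(N) if a <= (x + n * alpha) % 1.0 < b)
    return hits - N * (b - a)


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2  # an irrational frequency of bounded type
    for N in (10, 100, 1000, 10000):
        # D_N / N should tend to 0 (uniform distribution of the orbit).
        print(N, discrepancy(N, golden, 0.0, 0.0, 0.5) / N)
```

The printed ratios illustrate the convergence $D_N/N \to 0$ discussed next; floating point arithmetic limits the reliability of the count for very large $N$.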

Uniform distribution of the sequence $x + k\alpha$ on $\mathbb{T}^d$ is equivalent to the fact that, for regular sets $\Omega$, $D_N(\Omega, \alpha, x)/N \to 0$ as $N \to \infty$. A step further in the description of the uniform distribution is the study of the rate of convergence to 0 of $D_N(\Omega, \alpha, x)/N$. In $d = 1$ it is known that if $\alpha \in \mathbb{T} - \mathbb{Q}$ is fixed, the discrepancy $D_N(\Omega, \alpha, x)/N$ displays an oscillatory behavior according to the position of $N$ with respect to the denominators of the best rational approximations of $\alpha$. A great deal of work in Diophantine approximation has been done on estimating the discrepancy function in relation with the arithmetic properties of $\alpha \in \mathbb{T}$, and more generally for $\alpha \in \mathbb{T}^d$.

3.1. The maximal discrepancy. Let
\[ D_N(\alpha) = \sup_{\Omega \in B} D_N(\Omega, \alpha, 0) \tag{4} \]
where the supremum is taken over all sets $\Omega$ in some natural class of sets $B$, for example balls or boxes (products of intervals). The case of (straight) boxes was extensively studied, and growth properties of the sequence $D_N(\alpha)$ were obtained with a special emphasis on their relations with the Diophantine approximation properties of $\alpha$. In particular, following earlier advances of [48, 64, 75, 99, 117] and others, [8] proves

Theorem 6. Let
\[ D_N(\alpha) = \sup_{\Omega - \text{box}} D_N(\Omega, \alpha, 0). \]
Then for any positive increasing function $\varphi$ we have
\[ \sum_n \varphi(n)^{-1} < \infty \iff \frac{D_N(\alpha)}{(\ln N)^d \varphi(\ln \ln N)} \text{ is bounded for almost every } \alpha \in \mathbb{T}^d. \tag{5} \]

In dimension $d = 1$, this result is the content of Khinchine theorems obtained in the early 1920's [64], and it follows easily from well-known results from the metrical theory of continued fractions (see for example the introduction of [8]). The higher dimensional case is significantly more difficult and the cited bound was only obtained in the 1990s. The bound in (5) focuses on how bad the discrepancy can become along a subsequence of $N$, for a fixed $\alpha$ in a full measure set. In a sense, it deals with the worst case scenario and does not capture the oscillations of the discrepancy. On the other hand, the restriction on $\alpha$ is necessary, since given any $\varepsilon_n \to 0$ it is easy to see that for $\alpha \in \mathbb{T}$ sufficiently Liouville, the discrepancy (relative to intervals) can be as bad as $N_n \varepsilon_n$ along a suitable sequence $N_n$ (large multiples of Liouville denominators). For $d = 1$, it is not hard to see, using continued fractions, that for any $\alpha$: $\limsup \frac{D_N(\alpha)}{\ln N} > 0$, $\liminf D_N(\alpha) \le C$; and for $\alpha \in D(0)$, $\limsup \frac{D_N(\alpha)}{\ln N} < +\infty$. The study of higher dimensional counterparts to these results raises several interesting questions.


Question 6. Is it true that $\limsup \frac{D_N(\alpha)}{\ln^d N} > 0$ for all $\alpha \in \mathbb{T}^d$?

Question 7. Is it true that there exists $\alpha$ such that $\limsup \frac{D_N(\alpha)}{\ln^d N} < +\infty$?

Question 8. What can one say about $\liminf \frac{D_N(\alpha)}{a_N}$ for a.e. $\alpha$, where $a_N$ is an adequately chosen normalization? for every $\alpha$?

Question 9. Same questions as Questions 6–8 when boxes are replaced by balls.

Question 10. Same questions as Questions 6–8 for the isotropic discrepancy, when boxes are replaced by the class of all convex sets [79].

3.2. Limit laws for the discrepancy as $\alpha$ is random. In this survey, we will mostly concentrate on the distribution of the discrepancy function as $\alpha$ is random. The above discussion naturally raises the following question.

Question 11. Let $\alpha$ be uniformly distributed on $\mathbb{T}^d$. Is it true that $\frac{D_N(\alpha)}{\ln^d N}$ converges in distribution as $N \to \infty$?

Why do we need to take $\alpha$ random? The answer is that for fixed $\alpha$ the discrepancy does not have a limit distribution. For example for $d = 1$ the Denjoy-Koksma inequality says that
\[ \left| A_{q_n} - q_n \int A(x)\,dx \right| \le 2V \]
where $q_n$ is the denominator of the $n$-th convergent of $\alpha$ and $V$ denotes the total variation of $A$. In particular $D_{q_n}(I, \alpha, x)$ can take at most 3 values. In higher dimensions one can show that if $\Omega$ is either a box or any other convex set then for almost all $\alpha$ and almost all tori, when $x$ is random the variable $\frac{D_N(\Omega, \mathbb{R}^d/L, \alpha, \cdot)}{a_N}$ does not converge to a non-trivial limiting distribution for any choice of $a_N = a_N(\alpha, L)$ (see the discussion in the introduction of [29]).

Question 12. Is this true for all $\alpha$, $L$?

Question 13. Study the distributions which can appear as weak limits of $\frac{D_N(\Omega, \alpha, \cdot)}{a_N}$, in particular their relation with number theoretic properties of $\alpha$.

Let us consider the case $d = 1$ (so the sets of interest are intervals and we will write $I$ instead of $\Omega$). It is easy to see that all limit distributions are atomic for all $I$ iff $\alpha \in \mathbb{Q}$.

Question 14. Is it true that all limit distributions are either atomic or Gaussian for almost all $I$ iff $\alpha$ is of bounded type?

Evidence for the affirmative answer is contained in the following results.

Theorem 7. ([55]) If $\alpha \notin \mathbb{Q}$ and $I = [0, 1/2]$ then there is a sequence $N_j$ such that $\frac{D_{N_j}(I, \alpha, \cdot)}{j}$ converges to $\mathfrak{N}$.

Instead of considering subsequences, it is possible to randomize $N$.
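The Denjoy-Koksma bound quoted above can be checked numerically. The following is a minimal sketch, assuming $d = 1$, the golden mean frequency and the interval $I = [0, l)$, whose indicator has total variation $V = 2$ on the circle, so that the inequality gives $|D_{q_n}(I, \alpha, x)| \le 2V = 4$. The helper names `convergent_denominators` and `interval_discrepancy` are hypothetical, and the continued fraction expansion is computed in floating point, so only the first few denominators are reliable.

```python
import math


def convergent_denominators(alpha, count):
    """Denominators q_1, ..., q_count of the convergents of alpha.

    Computed with floats, hence trustworthy only for small `count`.
    """
    q_prev, q = 0, 1              # q_{-1} = 0, q_0 = 1
    t = alpha % 1.0
    out = []
    for _ in range(count):
        if t == 0.0:
            break                 # alpha is (numerically) rational
        t = 1.0 / t
        a = math.floor(t)         # next partial quotient
        t -= a
        q_prev, q = q, a * q + q_prev   # q_k = a_k q_{k-1} + q_{k-2}
        out.append(q)
    return out


def interval_discrepancy(N, alpha, x, length):
    """D_N([0, length), alpha, x) for the circle rotation by alpha."""
    hits = sum(1 for n in range(N) if (x + n * alpha) % 1.0 < length)
    return hits - N * length


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    for q in convergent_denominators(golden, 10):
        # Along the denominators q_n the discrepancy stays bounded.
        print(q, interval_discrepancy(q, golden, 0.1234, 0.3))
```

Evaluating the discrepancy at times other than the $q_n$ shows larger values, which is the oscillation that prevents a limit distribution for fixed $\alpha$.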


Theorem 8. Let $\alpha$ be a quadratic surd.
(a) ([10]) If $(x, a, l)$ is uniformly distributed on $\mathbb{T}^3$ then $\frac{D_{[aN]}([0, l], \alpha, x)}{\sqrt{\ln N}}$ converges to $\mathfrak{N}(\sigma^2)$ for some $\sigma^2 \ne 0$.
(b) ([11]) If $M$ is uniformly distributed on $[1, N]$ and $l$ is rational then there are constants $C(\alpha, l)$, $\sigma(\alpha, l)$ such that $\frac{D_M([0, l], \alpha, 0) - C(\alpha, l) \ln N}{\sqrt{\ln N}}$ converges to $\mathfrak{N}(\sigma^2(\alpha, l))$.

Note that even though we have normalized the discrepancy by subtracting the expected value, an additional normalization is required in Theorem 8(b). The reason for this is explained at the end of Section 10.5.

So if one wants to have a unique limit distribution for all $N$ one needs to allow random $\alpha$. The case when $d = 1$ was studied by Kesten. Define
\[ V(u, v, w) = \sum_{k=1}^{\infty} \frac{\sin(2\pi u) \sin(2\pi v) \sin(2\pi w)}{k^2}. \]
If $(r, q)$ are positive integers let
\[ \theta(r, q) = \frac{\mathrm{Card}(j : 0 \le j \le q - 1 : \gcd(j, r, q) = 1)}{\mathrm{Card}(j, k : 0 \le j, k \le q - 1 : \gcd(j, k, q) = 1)}. \]

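Finally let
\[ c(r) = \begin{cases} \dfrac{\pi^3}{12} \left[ \displaystyle\sum_{r=0}^{q-1} \theta(p, q) \int_0^1 \int_0^1 V\!\left(u, \frac{rp}{q}, v\right) du\,dv \right]^{-1} & \text{if } r = \frac{p}{q} \text{ and } \gcd(p, q) = 1, \\[2ex] \dfrac{\pi^3}{12} \left[ \displaystyle\int_0^1 \int_0^1 \int_0^1 V(u, r, v)\,du\,dr\,dv \right]^{-1} & \text{if } r \text{ is irrational.} \end{cases} \]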

Theorem 9. ([61,62]) If $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^2$ then $\frac{D_N([0, l], \alpha, x)}{c(l) \ln N}$ converges to the Cauchy distribution $\mathfrak{C}$.
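A rough Monte Carlo sketch of the setting of Theorem 9 is given below. It draws $(\alpha, x)$ uniformly and records $D_N([0, l), \alpha, x)/\ln N$. The constant $c(l)$ is deliberately omitted, so the samples would approximate the limit only up to a scale factor. The function name, the sample sizes and the seed are illustrative assumptions. Since a Cauchy law has no mean, quantiles rather than averages are the natural summary.

```python
import math
import random


def sample_normalized_discrepancy(N, length, samples, seed=0):
    """Samples of D_N([0, length), alpha, x) / ln N with (alpha, x) uniform on T^2."""
    rng = random.Random(seed)
    out = []
    for _ in range(samples):
        alpha, x = rng.random(), rng.random()
        hits = sum(1 for n in range(N) if (x + n * alpha) % 1.0 < length)
        out.append((hits - N * length) / math.log(N))
    return out


if __name__ == "__main__":
    data = sorted(sample_normalized_discrepancy(2000, 0.3, 400))
    # Empirical quartiles; heavy tails show up as a few very large values.
    print("quartiles:", data[100], data[200], data[300])
    print("extremes:", data[0], data[-1])
```

Repeating the experiment for several values of $N$ gives a crude check that the quartiles stabilize under the $\ln N$ normalization.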

Note that the normalizing factor is discontinuous as a function of the length of the interval at rational values.

A natural question is to extend Theorem 9 to higher dimensions. The first issue is to decide which sets $\Omega$ to consider instead of intervals. It appears that a quite flexible assumption is that $\Omega$ is semialgebraic, that is, it is defined by a finite number of algebraic inequalities.

Question 15. Suppose that $\Omega$ is semialgebraic then there is a sequence $a_N = a_N(\Omega)$ such that for a random translation of a random torus $\frac{D_N(\Omega, \mathbb{R}^d/L, \alpha, x)}{a_N}$ converges in distribution as $N \to \infty$.

By random translation of a random torus, we mean a translation of random angle $\alpha$ on a torus $\mathbb{R}^d/L$ where $L = A\mathbb{Z}^d$ and the triple $(\alpha, x, A)$ has a smooth density on $\mathbb{T}^d \times \mathbb{T}^d \times GL(\mathbb{R}, d)$. Notice that, compared to Kesten's result of Theorem 9, Question 15 allows for additional randomness, namely, the torus is random. In particular, for $d = 1$, the study of the discrepancy of visits to $[0, l]$ on the torus $\mathbb{R}/\mathbb{Z}$ is equivalent to the study of the discrepancy of visits to $[0, 1]$ on the torus $\mathbb{R}/(l^{-1}\mathbb{Z})$. Thus the purpose of the extra randomness is to avoid the irregular dependence on parameters observed in Theorem 9 (cf. also [109, 110]).


So far Question 15 has been answered for two classes of sets: strictly convex sets and (tilted) boxes, which include the two natural counterparts to intervals in higher dimension, namely balls and boxes.

Given a convex body $\Omega$, we consider the family $\Omega_r$ of bodies obtained from $\Omega$ by rescaling it with a ratio $r > 0$ (we apply to $\Omega$ the homothety centered at the origin with scale $r$). We suppose $r < r_0$ so that the rescaled bodies can fit inside the unit cube of $\mathbb{R}^d$. We define
\[ D_N(\Omega, r, \alpha, x) = \sum_{n=0}^{N-1} \chi_{\Omega_r}(x + n\alpha) - N\,\mathrm{Vol}(\Omega_r). \tag{6} \]

Theorem 10. ([28]) If $(r, \alpha, x)$ is uniformly distributed on $X = [a, b] \times \mathbb{T}^d \times \mathbb{T}^d$ then $\frac{D_N(\Omega, r, \alpha, x)}{r^{(d-1)/2} N^{(d-1)/2d}}$ has a limit distribution as $N \to \infty$.

The form of the limiting distribution is given in Theorem 18 in Section 7.

In the case of boxes we recover the same limit distribution as in Kesten but with a higher power of the logarithm in the normalization.

Theorem 11. ([29]) In the context of Question 15, if $\Omega$ is a box, then $\frac{D_N}{c \ln^d N}$ converges to $\mathfrak{C}$ as $N \to \infty$.

Alternatively, one can consider tilted boxes, namely: for $u = (u_1, \dots, u_d)$ with $0 < u_i < 1/2$ for every $i$, we define a cube on the $d$-torus by $C_u = [-u_1, u_1] \times \dots \times [-u_d, u_d]$. Let $\eta > 0$ and let $M C_u$ be the image of $C_u$ by a matrix $M \in SL(d, \mathbb{R})$ such that
\[ M = (a_{ij}) \in G_\eta = \{ |a_{i,i} - 1| < \eta \text{ for every } i \text{ and } |a_{i,j}| < \eta \text{ for every } j \ne i \}. \]
For a point $x \in \mathbb{T}^d$ and a translation frequency vector $\alpha \in \mathbb{T}^d$ we denote $\xi = (u, M, \alpha, x)$ and define the following discrepancy function
\[ D_N(\xi) = \#\{1 \le m \le N : (x + m\alpha) \bmod 1 \in M C_u\} - 2^d \left(\Pi_i u_i\right) N. \]
Fix $d$ segments $[v_i, w_i]$ such that $0 < v_i < w_i < 1/2$ for all $i = 1, \dots, d$. Let
\[ X = \{ (u, \alpha, x, (a_{i,j})) \in [v_1, w_1] \times \dots \times [v_d, w_d] \times \mathbb{T}^{2d} \times G_\eta \}. \tag{7} \]
We denote by $P$ the normalized restriction of the Lebesgue $\times$ Haar measure on $X$. Then, the precise statement of Theorem 11 is

Theorem 12. ([29]) Let $\rho = \frac{1}{d!} \left(\frac{2}{\pi}\right)^{2d+2}$. If $\xi$ is distributed according to $P$ then $\frac{D_N(\xi)}{\rho (\ln N)^d}$ converges to $\mathfrak{C}$ as $N \to \infty$.

Question 16. Are Theorems 10–12 valid if (a) $L$ is fixed; (b) $x$ is fixed?

Question 17. Describe large deviations for $D_N$. That is, given $b_N \gg a_N$ where $a_N$ is the same as in Question 15, study the asymptotics of $P(D_N \ge b_N)$. One can study this question in the annealed setting when all variables are random or in the quenched setting where some of them are fixed.

Question 18. Does a local limit theorem hold? That is, is it true that given a finite interval $J$ we have
\[ \lim_{N \to \infty} a_N P(D_N \in J) = c|J|? \]


4. Poisson regime

The results presented in the last section deal with the so called CLT regime. This is the regime where the target set $\Omega$ is macroscopic (having volume of order 1), so that, if $T$ were sufficiently mixing, one would get the Central Limit Theorem for the ergodic sums of $\chi_\Omega$. In this section we discuss the Poisson (microscopic) regime, that is, we let $\Omega = \Omega_N$ shrink so that $E(D_N(\Omega_N, \alpha, x))$ is constant. In this case, the sum in the discrepancy consists of a large number of terms, each of which vanishes with probability close to 1, so that typically only finitely many terms are non-zero.

Theorem 13. ([88]) Suppose that $\Omega$ is a bounded set whose boundary has zero measure. If $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^d \times \mathbb{T}^d$ then both $D_N(N^{-1/d}\Omega, \alpha, x)$ and $D_N(N^{-1/d}\Omega, \alpha, 0)$ converge in distribution.

Note that in this case the result is less sensitive to the shape of $\Omega$ than in the case of sets of unit size. We will see later (Theorem 16 in Section 7) that one can also handle several sets at the same time.

Corollary 14. If $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^d \times \mathbb{T}^d$ then the following random variables have limit distributions
(a) $N^{1/d} \min_{0 \le n \,\dots} d(x + n\alpha, \bar{x})$ where $\bar{x}$ is a given point in $\mathbb{T}^d$; …

… 1, ln N i

Question 23. What happens if in Theorem 19 $\ln^d N$ is replaced by $\ln^a N$ with $a \in (0, d)$?

Question 24. Is Theorem 19 still valid if the distribution of $\xi$ is concentrated on a submanifold of $X$? For example, one can take $\alpha = (s, s^2)$.

A special case of Question 24 is when the matrix $(a_{i,j})$ is fixed equal to Identity. This case is directly related to Question 16(a).

The proof of Theorem 19 proceeds by the martingale approach (see [26, 27]), which requires good mixing properties in the future conditioned on the past. In the present setting, to apply this method it suffices to prove that most orbits of certain unipotent subgroups are equidistributed at a polynomial rate.
Under the conditions of Theorem 19 one can assume (after an easy reduction) that the initial point has a smooth density with respect to Haar measure. Then the required equidistribution follows easily from polynomial mixing of the unipotent flows. In the setting of Question 24 (as well as Question 53 in Section 11) the initial point is chosen from a positive codimension submanifold so one cannot use the mixing argument. The problem of estimating the rate of equidistribution for unipotent orbits starting from submanifolds interpolates between the problem of taking a random initial condition with smooth density, which is solved, and the problem of taking a fixed initial condition, which seems very hard.


8. Shrinking targets

Another classical result in probability theory is the Borel-Cantelli Lemma which says that if $A_j$ are independent sets and $\sum_j P(A_j) = \infty$ then $P$-almost every point belongs to infinitely many sets. A yet stronger conclusion is given by the strong Borel-Cantelli Lemma claiming that the number of $A_j$ which happen up to time $N$ is asymptotic to $\sum_{j=1}^{N} P(A_j)$.

In the context of ergodic dynamical systems $(T, X, \mu)$, the law of large numbers is reflected in the Birkhoff theorem on the almost sure convergence of the ergodic means associated to a measurable observable, for example the characteristic function of a measurable set $A \subset X$. In a similar fashion one can study the so called dynamical Borel-Cantelli properties of the system $(X, T, \mu)$ by considering, instead of a fixed set $A$, a sequence of "target" sets $A_j \subset X$ such that $\sum_j \mu(A_j) = \infty$. We then say that the dynamical Borel-Cantelli property is satisfied by $\{A_j\}$ if for almost every $x$, $T^j(x)$ belongs to $A_j$ for infinitely many $j$.

In the context of a dynamical system $(T, X, \mu)$ on a metric space $X$ it is natural to assume that the sets in question have nice geometric structure, since it is always possible for any dynamical system (with a non atomic invariant measure) to construct sets with divergent sum of measures that are missed after a certain iterate by the orbits of almost every point [21, Proposition 1.6]. The simplest assumption is that the sets be balls. The dynamical Borel-Cantelli property for balls is a common feature for deterministic systems displaying hyperbolicity features (see [27, 51, 108] and references therein). Due to strong correlations among iterates of a toral translation the dynamical Borel-Cantelli properties are more delicate in the quasi-periodic context.

8.1. Dynamical Borel-Cantelli lemmas for translations.
For toral translations one also needs to assume that the sets are nested since otherwise one can take $A_j \subset (A_0 + j\alpha)$ for some fixed set $A_0$, ensuring that the points from the complement of $A_0$ do not visit any $A_j$ at time $j$. This motivates the following definition (see [21, 38, 51]). Given $T : (X, \mu) \to (X, \mu)$ let $V_N(x, y) = \sum_{n=1}^{N} \chi_{B(y, r_n)}(T^n x)$. We say that $T$ has the shrinking target property (STP) if for any $y$, $\{r_n\}$ such that
\[ \sum_n \mu(B(y, r_n)) = \infty, \]
it holds that $V_N(x, y) \to \infty$ for almost all $x$, i.e. the target sequence $(B(y, r_n))$ satisfies the Borel-Cantelli property for $T$. We say that $T$ has the monotone shrinking target property (MSTP) if for any $y$, $\{r_n\}$ such that $\sum_n \mu(B(r_n)) = \infty$ and $r_n$ is non-increasing, $V_N(x, y) \to \infty$ for almost all $x$.

In the case of translations, we can always assume without loss of generality that $y = 0$ (replace $x$ by $x - y$). We then use the notation $V_N(x)$ for $V_N(x, y)$. We also use the notation $B(r)$ for the ball $B(0, r)$. Another interesting choice is to take $y = x$, in which case we study the rate of return rather than the rate of approach to 0. Note that $V_N(x, x)$ does not depend on $x$ and so the number of close returns depends only on $\alpha$. We shall write $U_N(\alpha) = \sum_{n=0}^{N-1} \chi_{B(r_n)}(T^n 0)$.

The following is a straightforward consequence of the fact that toral translations are isometries.

Theorem 20. ([38]) Toral translations do not have STP.
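The hitting count $V_N(x, y)$ is easy to compute for a circle rotation. The sketch below assumes $d = 1$ and $T x = x + \alpha$, with the radii supplied as a function `radius(n)`. The names `circle_dist`, `V` and `radius` are illustrative and not from the text.

```python
def circle_dist(t):
    """Distance from t to the nearest integer, i.e. to 0 in R/Z."""
    t %= 1.0
    return min(t, 1.0 - t)


def V(N, alpha, x, radius, y=0.0):
    """V_N(x, y) = sum_{n=1}^{N} chi_{B(y, r_n)}(x + n*alpha), with r_n = radius(n)."""
    return sum(1 for n in range(1, N + 1)
               if circle_dist(x + n * alpha - y) < radius(n))


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    r = lambda n: 0.5 / n          # a monotone sequence with divergent sum
    # Approaches to 0 from a generic point versus returns (y = x).
    print(V(200, golden, 0.3, r))
    print(V(200, golden, 0.3, r, y=0.3), V(200, golden, 0.0, r))
```

Taking `y = x` gives the return count, which depends on $\alpha$ only because a translation is an isometry; the last line of the usage example compares two base points for this reason.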


It turns out that the following Diophantine condition is relevant to this problem. Let
\[ D^*(\sigma) = \{\alpha : \forall k \in \mathbb{Z} - 0,\ \max_{i \in [1, d]} \|k\alpha_i\| \ge C|k|^{-(1+\sigma)/d}\}. \tag{24} \]
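Condition (24) can be probed numerically over a finite range of $k$. The sketch below is only a heuristic, since the condition concerns all $k$. It assumes that $\|\cdot\|$ denotes the distance to the nearest integer and returns the largest constant $C$ for which the inequality holds for $1 \le k \le K$ (negative $k$ give the same values). The names `dist_to_int` and `diophantine_constant` are hypothetical.

```python
def dist_to_int(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))


def diophantine_constant(alpha, sigma, K):
    """min over 1 <= k <= K of k^{(1+sigma)/d} * max_i ||k * alpha_i||.

    `alpha` is a sequence of d real numbers (a vector in T^d).
    """
    d = len(alpha)
    return min(k ** ((1 + sigma) / d) * max(dist_to_int(k * a) for a in alpha)
               for k in range(1, K + 1))


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    # Bounded away from 0: consistent with the golden mean lying in D*(0).
    print(diophantine_constant([golden], 0.0, 1000))
    # A rational frequency fails the condition already for small k.
    print(diophantine_constant([0.5], 0.0, 1000))
```

A value that stays bounded away from zero as $K$ grows is consistent with membership in $D^*(\sigma)$, while a value decaying to zero suggests the opposite.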

Theorem 21. ([80]) A toral translation $T_\alpha$ has the MSTP iff $\alpha \in D^*(0)$.

A simple proof of Theorem 21 can be found in [38]. Recall that $D^*(0)$ has zero Lebesgue measure. Hence, the latter result shows that one has to further restrict the targets if one wants typical translations to display the dynamical Borel-Cantelli property relative to these targets. One possible restriction on the targets is to impose a certain growth rate on the sum of their volumes. This actually allows one to further distinguish among distinct Diophantine classes, as is shown in the following result. We say that $T$ has s(M)STP if for any $\{r_n\}$ such that $\sum_n r_n^{ds} = \infty$ (and $r_n$ is non-increasing) $V_N(x) \to \infty$ for almost all $x$. We then have the following.

Theorem 22. ([124]) a) If $\alpha \notin D^*(sd - d)$, then the toral translation $T_\alpha$ does not have the s-MSTP.
b) A circle rotation $T_\alpha$ has the s-MSTP iff $\alpha \in D^*(s - 1)$.

Question 25. Is it true that the toral translation $T_\alpha$ has the s-MSTP iff $\alpha \in D^*(sd - d)$?

Another possible direction is to study specific sequences, asking for example that $r_n = cn^{-\gamma}$, or that $n r_n^d$ be decreasing, in which case the sequence $r_n$ is called a Khinchin sequence. The case $r_n = cn^{-1/d}$ in dimension $d$ is very particular, but important. Indeed a vector $\alpha \in \mathbb{T}^d$ is said to be badly approximable if, for some $c > 0$, $\lim_{N \to \infty} U_N(\alpha, \{cn^{-1/d}\}) < \infty$. It is known that the set of badly approximable vectors has zero measure. By contrast, vectors $\alpha$ such that $\lim_{N \to \infty} U_N(\alpha, \{cn^{-(1/d + \varepsilon)}\}) = \infty$ for some $\varepsilon > 0$ are called very well approximated, or VWA. The obvious direction of the Borel-Cantelli lemma implies that almost every $\alpha \in \mathbb{T}^d$ is not very well approximated (cf. [19, Chap. VII]). The latter facts are particular cases of a more general result, the Khintchine–Groshev theorem on Diophantine approximation, which gives a very detailed description of the sequences such that $U_N(\alpha, \{r_n\})$ diverges for almost all $\alpha$.
We refer the reader to [13] for a nice discussion of that theorem and its extensions, and to Section 11.1 below. Khinchin sequences are also much more likely to display the BC property than general sequences. For example, compare Theorem 23(b) below with Theorem 21, which shows that the set of vectors having MSTP has zero measure.

If a shrinking target property holds it is natural to investigate the asymptotics of the number of target hits. This makes the following definition natural. We say that a given sequence of targets $\{A_n\}$ is sBC or strong Borel-Cantelli for $(T, X, \mu)$ if for almost every $x$
\[ \lim_{N \to \infty} \frac{\sum_{n=1}^{N} \chi_{A_n}(T^n x)}{\sum_{n=1}^{N} \mu(A_n)} = 1. \]

Theorem 23. [20] (a) For every $\alpha$ such that its convergents satisfy $a_n \le Cn^{7/6}$ the sequence $\{B(\frac{c}{n})\}$ is sBC for $T_\alpha$.
(b) For almost every $\alpha \in \mathbb{T}$, any Khinchin sequence is sBC for $T_\alpha$.


(c) For any $\alpha \in D(1)$, and any decreasing sequence $\{r_n\}$ such that $\sum r_n = \infty$, $\{B(r_n)\}$ is sBC for $T_\alpha$.
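The sBC ratio can be estimated numerically for the targets $B(c/n)$ on the circle. The sketch below assumes $d = 1$ and $0 < c < 1/2$, so that $\mu(B(c/n)) = 2c/n$; the function name `sbc_ratio` is illustrative. Since the sum of the measures grows only like $\ln N$, convergence of the ratio, when it holds, is very slow, and a finite computation is at best suggestive.

```python
def sbc_ratio(N, alpha, x, c):
    """Hits of B(c/n) by x + n*alpha for n = 1..N, divided by sum of 2c/n.

    Assumes 0 < c < 1/2 so that each ball is a proper arc of length 2c/n.
    """
    hits = 0
    expected = 0.0
    for n in range(1, N + 1):
        r = c / n
        t = (x + n * alpha) % 1.0
        if min(t, 1.0 - t) < r:    # circle distance from x + n*alpha to 0
            hits += 1
        expected += 2.0 * r        # measure of the n-th target
    return hits / expected


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    for N in (10 ** 3, 10 ** 4, 10 ** 5):
        print(N, sbc_ratio(N, golden, 0.2718, 0.25))
```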

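Observe that the condition in (a) has full measure. On the other hand, it is not hard to see that if $a_n(\alpha) \sim n^{2+\varepsilon}$ for every $n$ then the sequence $(B(\frac{1}{n}))$ does not have the sBC for $T_\alpha$. Indeed, if
\[ x \in \left[\frac{k}{q_n} - \frac{1}{2nq_n}, \frac{k}{q_n} + \frac{1}{2nq_n}\right] \]
then since $\|q_n\alpha\| \le \frac{1}{q_{n+1}} \le \frac{2}{n^{2+\varepsilon} q_n}$ and $\ln q_n \le Cn \ln n$
\[ \sum_{l=q_n}^{n^{1+\varepsilon/2} q_n} \chi_{B(\frac{1}{n})}(x + l\alpha) \ge n^{1+\varepsilon/2} \gg \sum_{l=1}^{n^{1+\varepsilon/2} q_n} \frac{1}{n}. \]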

But it is easy to see that a.e. $x$ belongs to infinitely many intervals of the form $\left[\frac{k}{q_n} - \frac{1}{2nq_n}, \frac{k}{q_n} + \frac{1}{2nq_n}\right]$.

In higher dimensions, it was proved in [117] that

Theorem 24. If $\sum_n r_n^d = \infty$ then for almost every vector $\alpha \in \mathbb{T}^d$, the sequence $(B(r_n))$ is sBC for the translation $T_\alpha$.

8.2. On the distribution of hits. Theorems 23 and 24 motivate the study of the error terms
\[ \Delta_N(c, \alpha, x) = V_N(\alpha, x) - \sum_{n=1}^{N} \mathrm{Vol}(B_{r_n}) \quad \text{and} \quad \bar{\Delta}_N(c, \alpha) = U_N(\alpha) - \sum_{n=1}^{N} \mathrm{Vol}(B_{r_n}). \]
One can for example try to give lower and upper asymptotic bounds on the growth of $\Delta_N$ as a function of the arithmetic properties of $\alpha$ in the spirit of the Khintchine-Beck Theorem 6 and Questions 6–8. Here we will be interested in the distribution of $\Delta_N(c, \alpha, x)$ after adequate normalization when $\alpha$ or $x$ or $(\alpha, x)$ are random.

Theorem 25. ([9,90]) Let $r_n = cn^{-1/d}$. Suppose that $x$ is uniformly distributed on $\mathbb{T}^d$. For any $c > 0$, if $\alpha \in D^*(0)$, there is a constant $K$ such that all limit points of $\frac{\Delta_N(c, \alpha, x)}{\sqrt{\ln N}}$ are $\mathfrak{N}(\sigma^2)$ with $\sigma^2 \le K$.

In the case of random $(\alpha, x)$ we have

Theorem 26. ([30]) Let $r_n = cn^{-1/d}$. There is $\Sigma(c, d) > 0$ such that if $(\alpha, x)$ is uniformly distributed on $\mathbb{T}^d \times \mathbb{T}^d$ then $\frac{\Delta_N(c, \alpha, x)}{\sqrt{\ln N}}$ converges to $\mathfrak{N}(\Sigma(c, d))$.

There is an analogous statement for the return times.

Theorem 27. ([30, 107, 114]) Let $r_n = cn^{-1/d}$. There is $\bar{\Sigma}(c, d) > 0$ such that if $\alpha$ is uniformly distributed on $\mathbb{T}^d$ then $\frac{\bar{\Delta}_N(c, \alpha)}{\sqrt{b_N}}$ converges in distribution to $\mathfrak{N}(\bar{\Sigma}(c, d))$ where
\[ b_N = \begin{cases} \ln N \ln \ln N & \text{if } d = 1, \\ \ln N & \text{if } d \ge 2. \end{cases} \]


The case d = 1 was obtained in [107, Theorem 3.1.1 on page 44] (see also [114]), based on the metric theory of the continued fractions. In fact, one can handle more general sequences. Namely, let φ(k) satisfy the following conditions  (i) φ(k) 1 0, but k φ(k) = +∞, 2n n ≤C (ii) There exists 0 < δ < 1/2 such that k=1 φ(k) δ k=1 φ(k) k 2n n 2 (iii) k=1 φ (k) ≤ C k=1 φ(k). n) Theorem 28. ([44]) If rn = φ(ln and α is uniformly distributed on T then n ¯ N (c, α)  Δ k) 2 converges in distribution to N(Σ(c)) where F (n) = nk=1 φ(ln . k F (n) ln F (n)

The higher dimensional case is obtained via ergodic theory of homogeneous flows and martingale methods in [30].

Question 26. Study the limiting distribution of $U_N$ and $V_N$ in case $r_n = \frac{c}{n^\gamma}$ with $\gamma < \frac{1}{d}$.

Question 27. Do Theorems 24, 26 and 27 hold when the random vector $\alpha$ is taken from a proper submanifold of $\mathbb{T}^d$, for example $\alpha = (s, s^2, \dots, s^d)$?

One motivation for this question comes from Diophantine approximation on manifolds (see [13] and references therein), another is a multidimensional extension of Kesten's Theorem (cf. Question 24).

8.3. Proofs outlines. First we sketch a proof of Theorem 24 in case $r_n = cn^{-1/d}$. Consider the number $N_m(\alpha, x)$ of solutions to
\[ x + n\alpha \in B(0, cn^{-1/d}), \qquad e^m < n < e^{m+1}. \]
The argument used to prove Theorem 16 shows that $N_m(\alpha, x) = f(g_m(\Lambda_\alpha \mathbb{Z}^{d+1} + x))$ where $f$ is the function on the space of affine lattices given by
\[ f(L) = \sum_{v \in L} \chi_{B(0, c) \times [1, e]}(v). \]
Thus
\[ \sum_{m=1}^{\ln N} N_m(\alpha, x) \sim \sum_{m=1}^{\ln N} f(g_m(\Lambda_\alpha \mathbb{Z}^{d+1} + x)) \tag{25} \]
and Theorem 24 for $r_n = cn^{-1/d}$ reduces to the study of ergodic sums (25) under the assumption that the initial condition has a density on $n_+(\alpha, x)$. In fact, a standard argument allows one to reduce the problem to the case when the initial condition has a density on the space of lattices. Namely, it is not difficult to check that the ergodic sums of $f$ do not change much if we move in the stable or neutral direction in the space of lattices. After this reduction, the sBC property follows from the Ergodic Theorem.

The relation (25) also allows one to reduce Theorem 26 to a Central Limit Theorem for ergodic sums of $g_m$ which can be proven, for example, by a martingale argument (see [81]). We refer the reader to [26] for a nice introduction to the martingale approach to limit theorems for dynamical systems.
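The decomposition of the hitting count into the exponential blocks $e^m < n < e^{m+1}$ used above can be sketched directly, without any reference to the lattice reformulation. The code assumes $d = 1$, so the target at time $n$ is $B(0, c/n)$, and the names `hits_in_block` and `hits_direct` are illustrative.

```python
import math


def is_hit(n, alpha, x, c):
    """True if x + n*alpha lies in B(0, c/n) on the circle R/Z."""
    t = (x + n * alpha) % 1.0
    return min(t, 1.0 - t) < c / n


def hits_in_block(m, alpha, x, c):
    """N_m(alpha, x): number of integers n with e^m < n < e^(m+1) that are hits."""
    lo = math.floor(math.exp(m)) + 1       # smallest integer > e^m
    hi = math.ceil(math.exp(m + 1)) - 1    # largest integer < e^(m+1)
    return sum(1 for n in range(lo, hi + 1) if is_hit(n, alpha, x, c))


def hits_direct(n_min, n_max, alpha, x, c):
    """Number of hits with n_min <= n <= n_max, counted without blocks."""
    return sum(1 for n in range(n_min, n_max + 1) if is_hit(n, alpha, x, c))


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    blocks = [hits_in_block(m, golden, 0.3, 0.4) for m in range(1, 9)]
    print(blocks, sum(blocks))
```

Each block has bounded expected size (the sum of $2c/n$ over a block is about $2c$), which is why the block counts behave like the values of a fixed function along an orbit in the argument sketched above.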


The proof of Theorem 27 is similar but one needs to work with lattices centered at 0 rather than affine lattices. In particular, the non-standard normalization in case $d = 1$ is explained by the fact that $f$ in this case is not in $L^2$ and the main contribution comes from the region where $f$ is large (in fact, the analysis is similar to [46, Section 4]).

9. Skew products. Random walks.

9.1. Basic properties. The properties of ergodic sums along toral translations are crucial to the study of some classes of dynamical systems, such as skew products or special flows. In this section we consider the skew products. Special flows are the subject of Section 10.

Skew products above $T_\alpha$ will be denoted $S_{\alpha,A} : \mathbb{T}^d \times \mathbb{T}^r \to \mathbb{T}^d \times \mathbb{T}^r$. They are given by $S_{\alpha,A}(x, y) = (x + \alpha, y + A(x) \bmod 1)$. Cylindrical cascades above $T_\alpha$ will be denoted $W_{\alpha,A} : \mathbb{T}^d \times \mathbb{R}^r \to \mathbb{T}^d \times \mathbb{R}^r$. They are given by $W_{\alpha,A}(x, y) = (x + \alpha, y + A(x))$. Note that
\[ W_{\alpha,A}^N(x, y) = (x + N\alpha, y + A_N(x)) \]
(the same formula holds for $S_{\alpha,A}$ but the second coordinate has to be taken mod 1). If $A$ takes integer values then $W_{\alpha,A}$ preserves $\mathbb{T}^d \times \mathbb{Z}^r$ and it is natural to restrict the dynamics to this subset. Thus cylindrical cascades define random walks on $\mathbb{R}^r$ or $\mathbb{Z}^r$ driven by the translation $T_\alpha$.

If $\alpha$ is Diophantine and $A$ is smooth then the so called linear cohomological equation similar to (3)
\[ A(x) - \int_{\mathbb{T}^d} A(u)\,du = -B(x + \alpha) + B(x) \tag{26} \]
has a smooth solution $B$, thus $S_{\alpha,A}$ and $W_{\alpha,A}$ are respectively smoothly conjugated to the translations $S_{\alpha, \int_{\mathbb{T}^d} A}$ and $W_{\alpha, \int_{\mathbb{T}^d} A}$ via the conjugacy $(x, y) \to (x, y - B(x))$. Hence the ergodic properties of the skew products and the cascades with smooth $A$ are interesting to study only in the Liouville case.

The following is a convenient ergodicity criterion for skew products.

Proposition 4. [78] $S_{\alpha,A}$ is ergodic iff for any $\lambda \in \mathbb{Z}^r - \{0\}$, $\langle \lambda, A \rangle$ is not a measurable multiplicative coboundary above $T_\alpha$, that is, iff there does not exist $\lambda \in \mathbb{Z}^r - \{0\}$ and a measurable solution $\psi : \mathbb{T}^d \to \mathbb{C}$ to
\[ e^{i2\pi \langle \lambda, A(x) \rangle} = \psi(x + \alpha)/\psi(x). \tag{27} \]

This ergodicity criterion can be simply derived from the observation that the spaces $V_\lambda$ of functions of the form
\[ \varphi(x) e^{i2\pi \langle \lambda, y \rangle} \tag{28} \]
are invariant under $S_{\alpha,A}$. It then follows that the existence or nonexistence of an invariant function $\varphi$ by $S_{\alpha,A}$ is determined by the existence or nonexistence of a solution to (27). We refer the reader to Section 10 for further discussion concerning (27).

When $A$ is not a linear coboundary, i.e. (26) does not have a solution, it is very likely and often easy to prove that (27) does not have a solution either. For example, it suffices to show that the sums $A_{N_n}$ do not concentrate on a subgroup of lower dimension for a sequence $N_n$ such that $T_\alpha^{N_n} \to \mathrm{Id}$. Indeed, if a solution to


(27) exists then $|\psi|$ is constant by ergodicity of the base translation. Therefore by the Lebesgue Dominated Convergence Theorem
\[ \lim_{n \to \infty} \int_{\mathbb{T}^d} e^{i2\pi \langle \lambda, A_{N_n}(x) \rangle}\,dx = \lim_{n \to \infty} \int_{\mathbb{T}^d} \psi(x + N_n\alpha)/\psi(x)\,dx = 1 \]
which means that $A_{N_n}(x)$ is concentrated near the set $\{u \in \mathbb{R}^r : \langle \lambda, u \rangle \in \mathbb{Z}\}$.

In particular it was shown, in [35], that for every Liouville translation vector $\alpha \in \mathbb{R}^d$, the generic smooth function $A$ does not admit a solution to (27) for any $\lambda \in \mathbb{R}^d - \{0\}$. Hence the generic smooth skew product above a Liouville translation is ergodic (cf. Section 9.3 and Theorem 42 in Section 10).

It is known that ergodic skew products $S_{\alpha,A}$ are actually uniquely ergodic (see [100]). On the other hand, skew products above translations are never weak mixing since they have the translation as a factor. However, the same ideas as the ones used to prove ergodicity of the skew products often prove that all eigenfunctions come from that factor (see [42, 45, 58, 59, 128]).

If one considers skew products on $\mathbb{T} \times \mathbb{T}$ with smooth increasing functions on $(0, 1)$ having a jump discontinuity at 0 then the corresponding skew product will even be mixing in the fibers, that is, the correlations between functions that depend only on the fiber coordinate tend to 0. A classical example is given by the skew shift $(x, y) \to (x + \alpha, y + x)$. The mixing in the fibers can be easily derived from the invariance of $V_\lambda$ defined by (28) and the fact that, by the Ergodic Theorem, $\frac{\partial A_n}{\partial x} = \left(\frac{\partial A}{\partial x}\right)_n \to +\infty$. A similar phenomenon can occur for analytic skew products that are homotopic to identity but over higher dimensional tori $\mathbb{T}^d \times \mathbb{T} \ni (x, y) \to (x + \alpha, y + \varphi(x))$, with $\alpha$ and $\varphi$ as in Theorem 47 below (see [37]). This mechanism can also be used to establish ergodicity of cylindrical cascades (see [102]). A fast decay of correlations in the fibers can be responsible for the existence of non trivial invariant distributions for these skew products similarly to what occurs for the skew shift $(x, y) \to (x + \alpha, x + y)$ (see [60]).

The deviations of ergodic sums for skew products, that is the behavior of the sums
\[ \sum_{n=0}^{N-1} B(S_{\alpha,A}^n(x, z)) - N \int_{\mathbb{T}^d} \int_{\mathbb{T}^r} B(x, z)\,dx\,dz \]
is poorly understood. The only cases where some results are available have significant extra symmetry [41, 60, 91].

9.2. Recurrence. Our next topic is cylindrical cascades. As was mentioned above, they are sometimes called deterministic random walks. So the first question one can ask is whether the walk is recurrent (that is, $A_N$ returns to some bounded region infinitely many times) or transient. We will assume in this section that $A$ has zero mean since otherwise $W_{\alpha,A}$ is transient by the ergodic theorem. If $r = 1$ this condition is also sufficient. In fact, the next result is valid for skew products over arbitrary ergodic transformations (in fact, there is a multidimensional version of this result, see Theorem 32).

Theorem 29. ([5]) If $r = 1$, $A$ is integrable and has zero mean then $W$ is recurrent.
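The cylindrical cascades of Section 9.1 are simple to iterate numerically. The sketch below assumes $d = r = 1$ and uses the zero-mean observable $A(x) = \cos 2\pi x$, an illustrative choice. For this $A$ the ergodic sums are bounded by $1/|\sin \pi\alpha|$ (it is a trigonometric polynomial, hence a coboundary for irrational $\alpha$), so recurrence is trivially visible; the Liouville examples discussed in the text are far less tame. The function names are hypothetical.

```python
import math


def A(x):
    """A zero-mean observable on the circle (an illustrative choice)."""
    return math.cos(2 * math.pi * x)


def cascade_step(alpha, x, y):
    """One step of W_{alpha,A}: (x, y) -> (x + alpha mod 1, y + A(x))."""
    return (x + alpha) % 1.0, y + A(x)


def cascade_heights(alpha, x, N):
    """Return [A_1(x), ..., A_N(x)], the fiber coordinates of W^n(x, 0)."""
    y, out = 0.0, []
    for _ in range(N):
        x, y = cascade_step(alpha, x, y)
        out.append(y)
    return out


if __name__ == "__main__":
    golden = (5 ** 0.5 - 1) / 2
    heights = cascade_heights(golden, 0.1, 10000)
    # Number of times the walk is found in the bounded region [-0.5, 0.5].
    print(sum(1 for h in heights if abs(h) <= 0.5))
    print(max(abs(h) for h in heights))
```

The fiber coordinate after $N$ steps equals the ergodic sum $A_N(x)$, in agreement with the formula $W^N_{\alpha,A}(x, y) = (x + N\alpha, y + A_N(x))$.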


9.2.1. Recurrence and the Denjoy-Koksma Property. Next we note that if the base dimension $d = 1$ and $A$ has bounded variation then $W$ is recurrent for all $r$ and for all $\alpha \in \mathbb{R} - \mathbb{Q}$ due to the Denjoy-Koksma inequality stating that

(29)  $\max_{x \in \mathbb{T}} \left| A_{q_n}(x) - q_n \int_{\mathbb{T}} A(y)\,dy \right| \le 2V$

for every denominator $q_n$ of the convergents of $\alpha$, where $V$ is the total variation of $A$. More generally we say that $A$ (not necessarily of zero mean) has the Denjoy-Koksma property (DKP) if there exist constants $C, \delta > 0$ and a sequence $n_k \to \infty$ such that

(30)  $P\left( \left| A_{n_k} - n_k \int_{\mathbb{T}^d} A(y)\,dy \right| \le C \right) \ge \delta.$

We say that $A$ has the strong Denjoy-Koksma property (sDKP) if (30) holds with $\delta = 1$. Note that if DKP holds and $A$ has zero mean then the set of points where $\liminf_{n \to \infty} |A_n| \le C$ has positive measure and so by ergodicity of the base map $W_{\alpha,A}$ is recurrent. Later, we will also see how the DKP can be very helpful in proving ergodicity of the cylindrical cascades as well as weak mixing of special flows.

The situation with DKP for translations on higher dimensional tori is delicate. Of course it holds for almost all $\alpha$ and for every smooth function by the existence of smooth solutions to the linear cohomological equation (3). But the DKP also holds above most translations even from a topological point of view.

Theorem 30. ([36]) There is a residual set of vectors $\alpha \in \mathbb{R}^d$ such that DKP holds above $T_\alpha$ for every function of class $C^4$.

In fact, it is non-trivial to construct rotation vectors and smooth functions that do not have the DKP. The first construction is due to Yoccoz and it actually provides examples of non recurrent analytic cascades.

Theorem 31. ([129, Appendix]) For $d = 2$ there exists an uncountable dense set $Y$ of translation vectors and a real analytic function $A : \mathbb{T}^2 \to \mathbb{C}$ with zero mean such that $W$ is not recurrent.

Denote the translation vector by $(\alpha', \alpha'')$. The main ingredient in the construction of [129] is that the denominators $q'_n$ and $q''_n$ of the convergents of $\alpha'$ and $\alpha''$ alternate, and more precisely, they are such that the sequence $\ldots, q'_n, q''_n, q'_{n+1}, q''_{n+1}, \ldots$ increases exponentially. We will see later that the same construction can be used to create examples of mixing special flows with an analytic ceiling function.

Let $Y$ be the set of couples $(\alpha', \alpha'') \in \mathbb{R}^2 - \mathbb{Q}^2$ whose sequences of best approximations $q'_n$ and $q''_n$ satisfy, for any $n \ge n_0(\alpha', \alpha'')$,

$q''_n \ge e^{3 q'_n}, \qquad q'_{n+1} \ge e^{3 q''_n}.$

Then [129] constructs a real analytic function $A : \mathbb{T}^2 \to \mathbb{C}$ with zero integral such that for almost every $(x, y) \in \mathbb{T}^2$, $|A_n(x, y)| \to \infty$, hence $W_{\alpha,A}$ is not recurrent. Note that the set $Y$ as defined above is uncountable and dense in $\mathbb{R}^2$.
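The Denjoy-Koksma inequality (29) for circle rotations can be checked numerically. The following is a minimal sketch, not part of the original text: the test function `tent`, the grid size and the helper names are illustrative assumptions. It computes the denominators of the convergents of α from its continued fraction expansion and evaluates the largest deviation of the ergodic sums $A_{q_n}$ over a grid of starting points, for a zero-mean function of total variation $V = 1$.

```python
import math


def convergent_denominators(alpha: float, count: int) -> list[int]:
    """Denominators q_0, q_1, ... of the continued fraction convergents of alpha.

    Floating point limits the accuracy, so only the first few terms are reliable.
    """
    denominators = []
    q_prev, q_curr = 1, 0  # q_{-2} = 1, q_{-1} = 0
    x = alpha
    for _ in range(count):
        a = math.floor(x)
        q_prev, q_curr = q_curr, a * q_curr + q_prev
        denominators.append(q_curr)
        frac = x - a
        if frac == 0:
            break
        x = 1.0 / frac
    return denominators


def tent(x: float) -> float:
    """Hypothetical test function: zero mean on the circle, total variation V = 1."""
    return abs((x % 1.0) - 0.5) - 0.25


def max_deviation(alpha: float, q: int, grid: int = 500) -> float:
    """Largest |A_q(x)| over a grid of starting points x (tent has zero mean)."""
    best = 0.0
    for j in range(grid):
        x = j / grid
        total = sum(tent(x + k * alpha) for k in range(q))
        best = max(best, abs(total))
    return best


if __name__ == "__main__":
    golden = (math.sqrt(5) - 1) / 2
    for q in convergent_denominators(golden, 12):
        print(q, max_deviation(golden, q))
```

For a denominator $q_n$ the printed deviations should stay below $2V = 2$, in agreement with (29), whereas for general $N$ no such uniform bound is available.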


9.2.2. Indicator functions. Now we specify the study of $W_{\alpha,A}$ to the case where $A = (\chi_{\Omega_j} - \mathrm{Vol}(\Omega_j))_{j=1,\ldots,r}$ where $\Omega_j \subset X = \mathbb{T}^d$ are regular sets. If $d > 1$ then the DKP does not seem to be well adapted for proving recurrence in this case (see Questions 6–10).

Question 28. Show that DKP does not hold when $d > 1$ and $A = (\chi_{\Omega_j} - \mathrm{Vol}(\Omega_j))_{j=1,\ldots,r}$ and the $\Omega_j \subset X$ are balls or boxes.

There is however another criterion for recurrence which is valid for arbitrary skew products.

Theorem 32. Given a sequence $\delta_n = o(n^{1/r})$ the following holds.
(a) ([22]) Consider the map $T : X \to X$ preserving a measure $\mu$. Let $W(x, y) = (Tx, y + A(x))$. If there exists a sequence $k_n$ such that $\lim_{n \to \infty} \mu(x : |A_{k_n}(x)| \le \delta_n) = 1$ then $W$ is recurrent.
(b) Consider a parametric family of maps $T_\alpha : X \to X$, $\alpha \in \mathcal{A}$. Assume that $T_\alpha$ preserves a measure $\mu_\alpha$. Let $(\alpha, x)$ be distributed according to a measure $\lambda$ on $\mathcal{A} \times X$ such that $d\lambda = d\nu(\alpha)\,d\mu_\alpha(x)$ for some measure $\nu$ on $\mathcal{A}$. If $\frac{1}{\delta_N} \sum_{n=0}^{N-1} A(T^n_\alpha x)$ has a limiting distribution as $N \to \infty$ then $W_{\alpha,A}$ is recurrent for $\nu$-almost all $\alpha$.

Note that $T$ is not required to be ergodic. On the other hand if $T$ is ergodic, $r = 1$ and $A$ has zero mean, then by the Ergodic Theorem $\mu(|A_n/n| > \varepsilon) \to 0$ for any $\varepsilon$ so one can take $k_n = n$ and $\delta_n = \varepsilon_n n$ where $\varepsilon_n \to 0$ sufficiently slowly. Therefore Theorem 32 implies Theorem 29.

Proof. (a) Suppose $B$ is a wandering set (that is, the sets $W^k B$ are disjoint) of positive measure which is contained in $\{|z| < C\}$. Let $B_n = \{(x, z) \in B : |A_{k_n}(x)| \le \delta_n\}$. Then $\mu(B_n) \to \mu(B)$ as $n \to \infty$ so for large $n$

$\mu\left(\cup_{1 \le i \le n} W^{k_i}(B_{k_i})\right) \ge n \frac{\mu(B)}{2}.$

On the other hand, by assumption $W^{k_i}(B) \subset E_i := \{|y| \le 2C + \delta_i\} \subset E_n$ if $i \in [1, n]$. Hence $\mu\left(\cup_{1 \le i \le n} W^{k_i}(B_{k_i})\right) \le \delta_n^r = o(n)$, a contradiction.
(b) follows from (a) applied to the map $T$ on $(\mathcal{A} \times X) \times \mathbb{R}^r$ given by $T(\alpha, x, y) = (\alpha, W_{\alpha,A}(x, y))$.

Combining Theorems 10 and 32(b) we obtain

Corollary 33. If $\{\Omega_j\}_{j=1,\ldots,r}$ are real analytic and strictly convex and $\frac{d-1}{2d} < \frac{1}{r}$ then $W$ is recurrent for almost all $\alpha$.


0, it holds that for almost every $\alpha \in \mathbb{R}^d$, $\|A_n\|_2 = O(n^\gamma)$ where $A = \chi_\Omega - \mathrm{Vol}(\Omega)$, the sums are considered above the translation $T_\alpha$ and the $L^2$ norm is considered with respect to the Haar measure on $\mathbb{T}^d$. In the case of boxes, the latter naturally follows from the power log control given by Beck's Theorem (see Section 3.1). The proof of part (b) proceeds by extending the method of [129] discussed in Section 9.2.1 to the case of indicator functions.

Question 30. Is it true that for a generic choice of $\Omega_j$ as in Corollary 33, $W$ is transient for almost all $\alpha$ when $\frac{d-1}{2d} > \frac{1}{r}$?

An affirmative answer to Question 18 (Local Limit Theorem) would give evidence that the answer to Question 30 may be affirmative, due to the Borel-Cantelli Lemma. (More precisely, to answer Question 30 we need a joint Local Limit Theorem for ergodic sums of indicators of several sets.)

Question 31. Let $\alpha$ be as in Theorem 34 (a) or Question 33. Does there exist $x$ such that $\lim_{N \to \infty} \|A_N(x)\| = \infty$?

Note that this is only possible if $d > 1$ due to the Denjoy-Koksma inequality. On the other hand in any dimension one can have orbits which stay in a half space. Such orbits have been studied extensively (see [103] and the references therein).

Another case where recurrence is not easy to establish is that of skew products over circle rotations with functions having a singularity such as the examples discussed in Section 2. We will come back to this question in the next section.

9.3. Ergodicity. Next we discuss the ergodicity of cylindrical cascades. Here one has to overcome both problems of recurrence discussed in Section 9.2 and issues of non-arithmeticity appearing in the study of ergodicity of $S_{\alpha,A}$. The ergodicity of $W_{\alpha,A}$ is usually established using the fact that the sums $A_{N_n}$ are increasingly well distributed on $\mathbb{R}^r$ when considered above any small scale balls in the base and for some rigidity sequence $N_n$, i.e. such that $N_n \alpha \to 0$.
More precisely, usual methods of proving their ergodicity take into consideration a sequence of distributions

(31)  $(A_{n_k})_*(\mu), \quad k \ge 1$

along some rigidity sequence $\{n_k\}$ as probability measures on $\bar{\mathbb{R}}^r$ where $\bar{\mathbb{R}}$ is the one-point compactification of $\mathbb{R}$. As shown in [85] each point in the topological support of a limit measure of (31) is a so called essential value for $W_{\alpha,A}$. Following [115], $a \in \mathbb{R}^r$ is called an essential value of $A$ if for each $B \subset \mathbb{T}^d$ of positive measure and for each $\varepsilon > 0$ there exists $N \in \mathbb{Z}$ such that $\mu(B \cap T^{-N} B \cap [|A_N(\cdot) - a| < \varepsilon]) > 0$. Denote by $E(A)$ the set of essential values of $A$. The essential value criterion can be stated as follows.

Theorem 35. ([115],[1]) (a) $E(A)$ is a closed subgroup of $\mathbb{R}^r$. (b) $E(A) = \mathbb{R}^r$ iff $W_{\alpha,A}$ is ergodic. (c) If $A$ is integer valued and $E(A) = \mathbb{Z}^r$ then $W_{\alpha,A}$ is ergodic on $\mathbb{T}^d \times \mathbb{Z}^r$.


Hence if the supports of the probability measures in (31) are increasingly dense on $\mathbb{R}^r$ then $W_{\alpha,A}$ is ergodic. The case where $d = r = 1$ is the most studied although there are still some open questions in this context. For $d = r = 1$ ergodicity is often proved using the Denjoy-Koksma Property. Indeed, if $A$ is not cohomologous to a constant then $A_N - N \int A$ are not bounded. Let $q_n$ be a best denominator for the base rotation. Pick $K_n$ which is large but not too large. Then $K_n q_n$ is still a rigidity time for the translation but $A_{K_n q_n}$ have sufficiently large albeit controlled oscillations to yield that a given value $a$ in the fibers is indeed an essential value. This method is actually well adapted to $A$ whose Fourier transform satisfies $\hat{A}(n) = O(1/|n|)$, since they display a DKP (see [84]). Examples of such functions are functions of bounded variation and functions smooth everywhere except for a log symmetric singularity. Ergodicity also holds in general for characteristic functions of intervals.

Theorem 36.
(a) [36] If $\alpha$ is Liouville, there is a residual set of smooth functions $A$ with zero integral such that the skew product $W_{\alpha,A}$ is ergodic.
(b) ([43]) If $A$ has a symmetric logarithmic singularity then $W_{\alpha,A}$ is ergodic for all irrational $\alpha$.
(c) ([24]) If $A = \chi_{[0,1/2]} - \chi_{[1/2,1]}$ and $\alpha$ is irrational then $W_{\alpha,A}$ is ergodic on $\mathbb{T}^1 \times \mathbb{Z}$.
(d) ([97]) If $A = \chi_{[0,\beta]} - \beta$ then $W_{\alpha,A}$ is ergodic iff $1$, $\alpha$ and $\beta$ are rationally independent.
(e) ([102]) If $A$ is piecewise absolutely continuous, $\int_{\mathbb{T}^1} A(x)\,dx = 0$, $A'$ is Riemann integrable and $\int_{\mathbb{T}^1} A'(x)\,dx \ne 0$ then $W_{\alpha,A}$ is ergodic for all $\alpha \in \mathbb{R} - \mathbb{Q}$.
(f) ([23]) If $A = (A_1, \ldots, A_r) : \mathbb{T} \to \mathbb{R}^r$ with $A_j = \sum_i c_{j,i} \chi_{I_{j,i}} - \beta_j$, where $I_{j,i}$ is a finite family of intervals, $c_{j,i} \in \mathbb{Z}$ and $\beta_j$ is such that $\int_{\mathbb{T}} A_j(x)\,dx = 0$, and if the sequence $(\{q_n \beta_1\}, \ldots, \{q_n \beta_r\})$ is equidistributed on $\mathbb{T}^r$ as $n \to \infty$, where $q_n$ is the sequence of denominators of $\alpha$, then $W_{\alpha,A}$ is ergodic.

In the case $r = 1$, it is sufficient for the ergodicity of $W_{\alpha,A}$ that $(\{q_n \beta\})$ has infinitely many accumulation points. For further results on the ergodicity of cascades defined over circle rotations with step functions as in (f), we refer to the recent work [25].

The proofs of (a) and (b) are based on DKP and progressive divergence of the sums as explained above. (c)–(e) are treated differently since the ergodic sums take discrete values. For example, the proof of (f) in the case $r = 1$ is based on the fact that $A_{q_n}$ is bounded by DKP and then the hypothesis on $\{q_n \beta\}$ implies that the set of essential values is not discrete, hence it is all of $\mathbb{R}$, and the ergodicity follows.

The cases of slower decay of the Fourier coefficients of $A$ are more difficult to handle. We have nevertheless a positive result in the particular situation of log singularities.

Theorem 37. [39] If $A$ has an (asymmetric) logarithmic singularity then $W_{\alpha,A}$ is ergodic for almost every $\alpha$.

The delicate point in Theorem 37 is that the DKP does not hold. Indeed, it was shown in [119] that the special flow above $T_\alpha$ and under a function that has an asymmetric log singularity is mixing for a.e. $\alpha$. But, as we will see in the next


section, mixing of the special flow is not compatible with the DKP. By contrast, special flows under functions with symmetric logarithmic singularities are not mixing [72, 84] because of the DKP. In the proof of Theorem 37, one first shows that the DKP (30) holds if the constant $\delta$ is replaced by a sequence $\delta_n$ which decays sufficiently slowly and then uses this to push through the standard techniques under appropriate arithmetic conditions. The case of general angles for the base rotation and the case of stronger singularities are harder and all questions are still open.

Question 32. Are there examples of ergodic cylindrical cascades with smooth functions having power like singularities?

Conversely, we may ask the following

Question 33. Are there examples of non ergodic cylindrical cascades with smooth functions having non symmetric logarithmic or (integrable) power singularities?

The study of ergodicity when $d > 1$ and $r > 1$ is trickier, essentially because of the absence of DKP. For smooth observables, only the Liouville frequencies are interesting. The ergodic sums above such frequencies tend to stretch at least along a subsequence of integers. This stretch usually occurs gradually and independently in all the coordinates of $A$, hence a positive answer to the following question is expected.

Question 34. Show that for any Liouville vector $\alpha$, there is a residual set of smooth functions $A$ with zero integral such that the skew product $W_{\alpha,A}$ is ergodic.

As we discussed in the proof of Theorem 37, the cylindrical cascade on $\mathbb{T} \times \mathbb{R}$ with a function $A$ having an asymmetric logarithmic singularity is ergodic for almost every $\alpha$ although the ergodic sums $A_N$ above $T_\alpha$ concentrate at infinity as $N \to \infty$. The slow divergence of these sums, which is comparable to $\ln N$ (see Question 1), plays a role in the proof of ergodicity. The logarithmic control of the discrepancy relative to a polyhedron (see Theorems 6, 11 and 34) motivates the following question.

Question 35.
Is it true that for (almost) every polyhedron $\Omega_j \subset \mathbb{T}^d$, $j = 1, \ldots, r$, and $A = (\chi_{\Omega_j} - \mathrm{Vol}(\Omega_j))_{j=1,\ldots,r}$, the cascades $W_{\alpha,A}$ are ergodic?

We note that the answer is unknown even for boxes with $d = 2$ and $r = 1$.

9.4. Rate of recurrence. Section 9.3 described several situations where $W_{\alpha,A}$ is ergodic. However for infinite measure preserving transformations the (ratio) ergodic theorem does not specify the growth of ergodic sums. Rather it shows that for any $L^1$ functions $B_1(x, y)$, $B_2(x, y)$ with $B_2 > 0$ we have

(32)  $\frac{\sum_{n=0}^{N-1} B_1(W^n_{\alpha,A}(x, y))}{\sum_{n=0}^{N-1} B_2(W^n_{\alpha,A}(x, y))} \to \frac{\iint B_1(x, y)\,dx\,dy}{\iint B_2(x, y)\,dx\,dy}.$

In fact ([1]) there is no sequence $a_N$ such that

(33)  $\frac{\sum_{n=0}^{N-1} B_1(W^n_{\alpha,A}(x, y))}{a_N}$

converges to 1 almost surely. On the other hand, one can try to find $a_N$ such that (33) converges in distribution. By (32) it suffices to do it for one fixed function $B$. For example one can take $B = \chi_{B(0,1)}$. This motivates the following question.

Question 36. Let $\alpha$ be as in Theorem 34 (a) or Question 33. How often is $\|W^N\| \le R$?

So far this question has been answered only in a special case. Namely, let $d = r = 1$ and

$Z_N(x) = \sum_{n=0}^{N-1} \left[ \chi_{[0,1/2]}(x + n\alpha) - 1/2 \right].$

Denote $L_N = \mathrm{Card}(n \le N : Z_n = 0)$.

Theorem 38. [6] If $\alpha$ is a quadratic surd then there exists a constant $c = c(\alpha)$ such that $\frac{\sqrt{\ln N}}{cN} L_N$ converges to $e^{-\mathcal{N}^2/2}$.

Similar results have been previously obtained by Ledrappier-Sarig for abelian covers of compact hyperbolic surfaces ([82]). The fact that the correct normalization is $N/\sqrt{\ln N}$ was established in [2].

Question 37. Extend Theorem 38 to the case when $1/2$ is replaced by (a) any rational number; (b) any irrational number (in which case one needs to replace $\{A_N = 0\}$ by $\{|A_N| \le 1\}$).

Question 38. What happens for typical $\alpha$?

Note that in contrast to Theorem 38, $L_N \sim \frac{N}{\ln N}$ (rather than $L_N \sim \frac{N}{\sqrt{\ln N}}$) is expected in view of Kesten's Theorem 9. Ideas of the proof of Theorem 38 will be described in Section 10.5.
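The counting function $L_N$ of visits to zero can be explored numerically. The sketch below is illustrative only and not part of the original text: the function name, the choice of α and of the starting point are assumptions, and the half-open interval $[0, 1/2)$ is used in place of $[0, 1/2]$, which changes the sums only on a set of measure zero. To avoid half-integers the code tracks $2 Z_n$, which moves by $\pm 1$ at each step, and counts the indices $1 \le n \le N$ with $Z_n = 0$.

```python
import math


def zero_visits(alpha: float, x: float, n_steps: int) -> int:
    """Count the indices 1 <= n <= n_steps with Z_n(x) = 0, where
    Z_n(x) = sum_{k<n} (chi_[0,1/2)(x + k*alpha) - 1/2)."""
    walk = 0  # stores 2 * Z_n, an integer moving by +1 or -1
    visits = 0
    for k in range(n_steps):
        point = (x + k * alpha) % 1.0
        walk += 1 if point < 0.5 else -1
        if walk == 0:
            visits += 1
    return visits


if __name__ == "__main__":
    alpha = math.sqrt(2) - 1  # a quadratic surd, as in Theorem 38
    for n_steps in (10**3, 10**4, 10**5):
        print(n_steps, zero_visits(alpha, 0.1, n_steps))
```

Since $2 Z_n$ has the parity of $n$, visits to zero can only occur at even times, so the count never exceeds $N/2$. Averaging the output over many starting points $x$ gives an empirical view of the distribution addressed by Theorem 38.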

10. Special flows.

10.1. Ergodic integrals. In this section we consider special flows above $T_\alpha$ which will be denoted $T^t_{\alpha,A}$. Here $A(\cdot) > 0$ is called the ceiling function and the flow is given by

$\mathbb{T}^d \times \mathbb{R}/\sim \; \to \; \mathbb{T}^d \times \mathbb{R}/\sim, \qquad (x, s) \mapsto (x, s + t),$

where $\sim$ is the identification

(34)  $(x, s + A(x)) \sim (T_\alpha(x), s).$

Equivalently the flow is defined for $t + s \ge 0$ by $T^t(x, s) = (x + n\alpha, t + s - A_n(x))$ where $n$ is the unique integer such that

(35)  $A_n(x) \le t + s < A_{n+1}(x).$
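The description of the flow through (35) translates directly into a procedure: add $t$ to the height and, while the height exceeds the ceiling, subtract the ceiling and rotate the base point. The following minimal sketch is illustrative and not part of the original text; the rotation number `ALPHA` and the smooth ceiling function are hypothetical choices with $d = 1$.

```python
import math

ALPHA = (math.sqrt(5) - 1) / 2  # hypothetical rotation number


def ceiling(x: float) -> float:
    """Hypothetical smooth positive ceiling function A(x)."""
    return 1.0 + 0.5 * math.cos(2 * math.pi * x)


def special_flow(x: float, s: float, t: float) -> tuple[float, float]:
    """Image of (x, s), with 0 <= s < A(x), under the special flow at time t >= 0.

    Implements T^t(x, s) = (x + n*alpha, t + s - A_n(x)), where n is the
    unique integer with A_n(x) <= t + s < A_{n+1}(x), as in (35).
    """
    height = s + t
    while height >= ceiling(x):
        height -= ceiling(x)   # pass through the roof over x ...
        x = (x + ALPHA) % 1.0  # ... and land on the base at T_alpha(x)
    return x, height


if __name__ == "__main__":
    print(special_flow(0.3, 0.2, 12.5))
```

A quick consistency check is the flow property: applying the map for time $t_1$ and then for time $t_2$ should agree, up to rounding, with applying it once for time $t_1 + t_2$.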

Since $T_\alpha$ preserves a unique probability measure $\mu$, the special flow preserves a unique probability measure, namely the normalized product of $\mu$ on the base and the Lebesgue measure on the fibers. Special flows above ergodic maps are always ergodic for the product measure constructed as above. The interesting feature of special flows is that they can be more "chaotic" than the base map, displaying properties such as weak mixing or mixing even if the base map does not have them. Actually any map of a very wide


class of zero entropy measure theoretic transformations, so called Loosely Bernoulli maps, are isomorphic to sections of special flows above any irrational rotation of the circle with a continuous ceiling function (see [98]).

If $A = \beta$ is constant then $T_{\alpha,A}$ is the linear flow on $\mathbb{T}^{d+1}$ with frequency vector $(\alpha, \beta)$. Thus special flows $T^t_{\alpha,A}$ can be viewed as time changes of translation flows on $\mathbb{T}^{d+1}$. In particular, if we consider the linear flow on $\mathbb{T}^{d+1}$ and multiply the velocity vector by a smooth non-zero function $\varphi$ we get a special flow with a smooth ceiling function $A$.

10.2. Smooth time change. We recall that a translation flow frequency $v \in \mathbb{R}^d$ is said to be Diophantine if there exist $\sigma, C > 0$ such that $|(k, v)| \ge C|k|^{-\sigma}$ for every $k \in \mathbb{Z}^d - \{0\}$. Hence a translation vector $(1, v) \in \mathbb{R}^{d+1}$ is Diophantine (homogeneous Diophantine or Diophantine in the sense of flows) if and only if $v$ is a Diophantine vector in the sense of (2).

Theorem 39. [76] Smooth non vanishing time changes of translation flows with a Diophantine frequency vector are conjugated to translation flows.

Proof. Let $v$ be a constant vector field on $\mathbb{T}^{d+1}$. We suppose WLOG that $v = (1, \alpha)$. Let $u(x)$ be a smooth function on the torus and $\dot{x} = u(x)v$. Then, making a change of variables $y = T_v^{\varphi(x)}(x)$ we obtain the equation $\dot{y} = (\varphi + \partial_v \varphi)(y)\,u(y)\,v$. The equation for $y$ is linear if $\varphi + \partial_v \varphi = \frac{c}{u}$. Passing to Fourier series, this equation can be solved if $c = \left( \int \frac{dx}{u(x)} \right)^{-1} \int \varphi(x)\,dx$ and $v$ is such that $|1 + (k, v)| \ge C|k|^{-\sigma}$ for every $k \in \mathbb{Z}^{d+1}$, which is equivalent to $\alpha$ being Diophantine as in (2).

One can also see this fact at the level of the special flow $T_{\alpha,A}$ associated to $\dot{x} = u(x)v$. Namely, making a change of variables $(y, s) = T^{B(x,t)}_{\alpha,A}(x, t)$ transforms $T_{\alpha,A}$ to $T_{\alpha,D}$ with $D(x) = A(x) + B(x + \alpha, 0) - B(x, 0)$ so one can make the LHS constant provided $\alpha$ is Diophantine.
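The reduction of the ceiling function to a constant can be made explicit for a trigonometric polynomial over a circle rotation. The sketch below is illustrative and not part of the original text: the coefficients, the rotation number and the function names are assumptions, and the dependence of $B$ on the fiber coordinate is ignored. Writing $A(x) = \sum_k a_k e^{2\pi i k x}$, the choice $b_k = -a_k / (e^{2\pi i k \alpha} - 1)$ for $k \ne 0$ gives $A(x) + B(x + \alpha) - B(x) = a_0$. The denominators are the small divisors that the Diophantine condition controls when $A$ has infinitely many nonzero coefficients.

```python
import cmath
import math


def solve_coboundary(coeffs: dict[int, complex], alpha: float) -> dict[int, complex]:
    """Fourier coefficients b_k of B solving A + B(. + alpha) - B = a_0.

    coeffs maps a frequency k to the Fourier coefficient a_k of A.
    """
    return {
        k: -a / (cmath.exp(2j * math.pi * k * alpha) - 1)
        for k, a in coeffs.items()
        if k != 0
    }


def evaluate(coeffs: dict[int, complex], x: float) -> complex:
    """Value at x of the trigonometric polynomial with the given coefficients."""
    return sum(a * cmath.exp(2j * math.pi * k * x) for k, a in coeffs.items())


if __name__ == "__main__":
    # Hypothetical real-valued ceiling function: 2 + 0.6 cos(2 pi x) - 0.2 sin(6 pi x)
    a = {0: 2.0, 1: 0.3, -1: 0.3, 3: 0.1j, -3: -0.1j}
    alpha = math.sqrt(2) - 1
    b = solve_coboundary(a, alpha)
    for x in (0.0, 0.17, 0.5, 0.93):
        print(x, evaluate(a, x) + evaluate(b, x + alpha) - evaluate(b, x))
```

Each printed value should equal the mean $a_0 = 2$ of the ceiling function up to rounding, so the new ceiling function $D$ is constant.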
Finally, the similarity between linear and nonlinear flows in the Diophantine case is also reflected in (35) since for Diophantine vectors $\alpha$, $A_n = n \int A(x)\,dx + O(1)$.

An interesting question is that of deviations of ergodic sums above time changed linear flows. In fact, the case of linear flows is already non trivial and can be studied by the methods described in Section 7.3. More precisely, as for translations the interesting case occurs when the function under consideration has singularities, for example, for indicator functions. Namely, given a set $\Omega$ let

(36)  $D_\Omega(r, v, x, T) = \int_0^T \chi_{\Omega_r}(T^t_v x)\,dt - T\,\mathrm{Vol}(\Omega_r)$

where $T^t_v$ denotes the linear flow with velocity $v$. We assume that $(x, v, r)$ are distributed according to a smooth density.

Theorem 40. ([28, 29]) Suppose that $\Omega$ is analytic and strictly convex.
(a) If $d = 2$ then $D_\Omega(r, v, x, T)$ converges in distribution.
(b) If $d = 3$ then $\frac{D_\Omega(r, v, x, T)}{\ln T}$ converges to a Cauchy distribution.
(c) If $d \ge 4$ then $\frac{D_\Omega(r, v, x, T)}{r^{\frac{d-1}{2}}\, T^{\frac{d-3}{2(d-1)}}}$ has a limiting distribution as $T \to \infty$.

(d) For any $d \in \mathbb{N}$, if $\Omega$ is a box then $D_\Omega(r, v, x, T)$ converges in distribution.

The proof of Theorem 40 is similar to the proofs of Theorems 10 and 11 and Corollary 1.

Corollary 41. Theorem 40 remains valid for time changes $T_{u(x)v}$ where $u(x)$ is a fixed smooth positive function and $v$ is random as in Theorem 40.

Proof. To fix our ideas let us consider the case where $\Omega$ is analytic and strictly convex. Note that $T^t_{uv} x = T_v^{\tau(x,t)} x$ where, by the above discussion, the function $\tau$ satisfies

$\tau(t, x) = at + \varepsilon(t, x, v) \quad \text{where} \quad a = \left( \int \frac{dx}{u(x)} \right)^{-1}$

and $\varepsilon(t, x, v)$ is bounded for almost all $v$ uniformly in $x$ and $t$. Accordingly it suffices to see how much time is spent inside $\Omega_r$ for the linear segment of length $at$. Next, if the linear flow stays inside $\Omega_r$ during the time $[t_1, t_2]$ then the time spent in $\Omega_r$ by the orbit of $T^t_{uv}$ equals $\int_{t_1}^{t_2} \frac{dt}{u(x(t))}$. Thus we need to control the following integral for the linear flow:

$\tilde{D}_\Omega(r, v, x, T) = \int_0^T \frac{\chi_{\Omega_r}(T^t_v x)}{u(T^t_v x)}\,dt - T \int_{\Omega_r} \frac{dx}{u(x)}.$

However the Fourier transform of $\frac{\chi_{\Omega_r}(x)}{u(x)}$ has similar asymptotics at infinity as the Fourier transform of $\chi_{\Omega_r}(x)$ (see [123]) so the proof of the Corollary proceeds in the same way as the proof of Theorem 40 in [28].

Up to now, we were interested in smooth time changes of linear flows with typical frequencies. We will further discuss smooth time changes for special frequencies in Section 10.4 devoted to mixing properties.

10.3. Time change with singularities. If the time changing function of an irrational flow has zeroes then the ceiling function of the corresponding special flow has poles. In this case the smooth invariant measure is infinite. In the case of a unique singularity, the time changed flow is uniquely ergodic, with the Dirac mass at the singularity as the unique invariant probability measure:

Proposition 5. Consider a flow $T^t$ given by a smooth time change of an irrational linear flow obtained by multiplying the constant vector field by a function which is smooth and non zero everywhere except for one point $x_0$. Then for any continuous function $b$ and any $x$

$\lim_{t \to \infty} \frac{1}{t} \int_0^t b(T^u x)\,du = b(x_0).$

Proof. To simplify the notation we assume that the time change preserves the orientation of the flow. We use the representation as a special flow $T_{\alpha,A}$ with $A$ having a pole. It suffices to prove this statement in the case where $b$ equals 0 in a small neighborhood of $x_0$. In that case we have

(37)  $\int_0^t b(T^u_{\alpha,A}(x, s))\,du = B_{n(t)}(x) + O(1)$

where $B(x) = \int_0^{A(x)} b(x, s)\,ds$ and $n(t)$ is defined by (35). If $b$ vanishes in a small neighborhood of $x_0$ then $B$ is bounded and so $|B_{n(t)}| \le C n(t)$. Therefore it suffices to show that $\frac{n(t)}{t} \to 0$, which is equivalent to $\frac{A_n}{n} \to \infty$. Let $\tilde{A}$ be a continuous function which is less than or equal to $A$ everywhere. Then

$\liminf_{n \to \infty} \frac{A_n}{n} \ge \lim_{n \to \infty} \frac{\tilde{A}_n}{n} = \int \tilde{A}(x)\,dx.$

Since $\int A(x)\,dx = \infty$ we can make $\int \tilde{A}(x)\,dx$ as large as we wish, proving our claim.

Question 39. In the setting of Proposition 5 describe the deviations of ergodic integrals from $b(x_0)$.

Question 40. Consider the case where the time change has a finite number $m$ of zeroes $x_1, x_2, \ldots, x_m$. In that case all limit measures are of the form $\sum_{j=1}^m p_j \delta_{x_j}$. Which $p_j$ describe the behavior of Lebesgue-typical points?

In view of the relation (37) these questions are intimately related to Theorems 3 and 5 and Questions 2, 4 and 5 from Section 2.

Figure 2. The Kochergin flow is topologically equivalent to the area preserving flow shown in Figure 1 with the separatrix loop removed. The rest point is responsible for the shear along the orbits.

If one is interested in flows with singularities that preserve a finite nonatomic measure then the simplest example can be obtained by plugging (by smooth surgery) into the phase space of the minimal two dimensional linear flow an isolated singularity coming from a Hamiltonian flow in $\mathbb{R}^2$ (see Figure 2). The so called Kochergin flows thus obtained preserve, besides the Dirac measure at the singularity, a measure that is equivalent to Lebesgue measure [71]. As explained in Section 2, Kochergin flows model smooth area preserving flows on $\mathbb{T}^2$. These flows still have $\mathbb{T}$ as a global section with a minimal rotation for the return map, but the slowing down near the fixed point produces a singularity for the return time function above the last point where the section intersects the incoming separatrix of the fixed point. The strength of the singularity depends on how abruptly the linear flow is slowed down in the neighborhood of the fixed point. A mild slowing down, or mild shear, is typically represented by the logarithm while stronger singularities such as $x^{-a}$, $a \in (0, 1)$, are also possible. Powerlike singularities appear naturally in the study of area preserving flows with degenerate fixed points. We shall see below


that dynamical properties of the special flows are quite different for logarithmic and power like singularities.

Question 41. What can be said about the deviations of the ergodic sums above Kochergin flows?

10.4. Mixing properties. We give first a classical criterion for weak mixing of special flows. Its proof is similar to the proof of the ergodicity criterion for skew products given by Proposition 4.

Proposition 6. ([126]) $T_{\alpha,A}$ is weak mixing iff for any $\lambda \in \mathbb{R}^*$, there are no measurable solutions to the multiplicative cohomological equation

(38)  $e^{i 2\pi \lambda A(x)} = \psi(x + \alpha)/\psi(x).$

Indeed if $h(x, t)$ is an eigenfunction then for almost all $x$ the function $h(x, t) e^{-i 2\pi \lambda t}$ takes the same value $\psi(x)$ for almost all $t$. Then (38) follows from the identification (34).

Theorem 42. ([35]) If the vector $\alpha \in \mathbb{R}^d$ is not $\beta$-Diophantine then there exists a dense $G_\delta$ set, for the $C^{\beta+d}$ topology, of functions $\varphi \in C^{\beta+d}(\mathbb{T}^d, \mathbb{R}^*_+)$, such that the special flow constructed over $T_\alpha$ with the ceiling function $\varphi$ is weak mixing.

This result is optimal since smooth time changes of linear flows with Diophantine vectors $\alpha$ are smoothly conjugated to the linear flow and, hence, are not weak mixing.

Mixing of special flows is more delicate to establish since one needs to have uniform distribution on increasingly large scales in $\mathbb{R}_+$ of the sums $A_N$ for all integers $N \to \infty$, and this above arbitrarily small sets of the base space. Indeed mixing of special flows above non mixing base dynamics is in general proved as follows: if the ergodic sums $A_N$ become, as $N \to \infty$, uniformly stretched (well distributed inside large intervals of $\mathbb{R}_+$) above small sets, the image by the special flow at a large time $T$ of these small sets decomposes into long strips that are well distributed in the fibers due to uniform stretch and well distributed in projection on the base because of ergodicity of the base dynamics (see Figure 3).

The delicate point however is to have uniform stretch for all integers $N \to \infty$. In particular the following result has been essentially proven in [70].

Theorem 43. If $A$ has DKP then $T^t_{\alpha,A}$ is not mixing.

Proof. If $A$ has the DKP then there is a set $\Omega$ of positive measure on which (30) holds for a positive density of $n_k$. By passing to a subsequence we can find a set $I$ of positive measure, a sequence $\{t_k\}$ and a vector $\beta$ such that on $\Omega$, $|A_{n_k} - t_k| < C$ and $\alpha n_k \to \beta$. Pick a small $\eta$ and let

$\Omega_i = \cup_{0 \le t \le \eta} T^t_{\alpha,A}[I \times \{0\}], \qquad \Omega_f = \cup_{0 \le |t| \le C + \eta} T^t_{\alpha,A}[(I + \beta) \times \{0\}].$

By decreasing $I$ if necessary we obtain that those sets have measures strictly between 0 and 1. On the other hand it is not difficult to see from the definition of the special flow that $\mu(T^{t_k}_{\alpha,A} \Omega_i \cap \Omega_f) \to \mu(\Omega_i)$, contradicting the mixing property.

In particular the flows with ceiling functions $A$ of bounded variation or functions with symmetric log singularities are not mixing. In fact, since the sDKP holds for any minimal circle diffeomorphism, it follows from (35) and (37) that any smooth flow on $\mathbb{T}^2$ without cycles or fixed points is not topologically mixing. We leave this as an exercise for the reader.


Figure 3. Mixing mechanism for special flows: the image of a rectangle is a union of long narrow strips which fill densely the phase space.

The first positive result about mixing of special flows is obtained in [71].

Theorem 44. If $\alpha \in \mathbb{R} - \mathbb{Q}$ and $A$ has (integrable) power singularities then $T_{\alpha,A}$ is mixing.

The reason why the case of power singularities is easier than the logarithmic case (corresponding to non-degenerate flows on $\mathbb{T}^2$) is the following. The standard approach for obtaining the stretching of ergodic sums is to control $\frac{\partial A_n}{\partial x} = \left(\frac{\partial A}{\partial x}\right)_n$. For $A$ as in Theorem 44, $\frac{\partial A}{\partial x}$ has singularities of the type $x^{-a}$ with $a > 1$. In this case the main contribution to ergodic sums comes from the closest encounter with the singularity (cf. Theorem 3), making the control of the stretch easier. Moreover, the strength of the singularity allows one to obtain speed of mixing estimates.

Theorem 45. ([34]) If $\alpha$ is Diophantine and $A$ has an (integrable) power singularity then $T^t_{\alpha,A}$ is power mixing. More precisely, there exists a constant $\beta = \beta(\alpha)$ such that if $R_1, R_2$ are rectangles in $\mathbb{T} \times \mathbb{R}$ then

(39)  $\left| \mu(R_1 \cap T^t R_2) - \mu(R_1)\mu(R_2) \right| \le C t^{-\beta}.$

The exponent $\beta$ in [34] seems to be non optimal.

Question 42. For $\alpha$ Diophantine find the asymptotics of the LHS of (39).

It is interesting to surpass the threshold $\beta = 1/2$. In particular, one would like to answer the following question.

Question 43. [83] Can a smooth area preserving flow on $\mathbb{T}^2$ have Lebesgue spectrum?

On the other hand for logarithmic singularities there might be cancelations in ergodic sums of $\frac{\partial A}{\partial x}$, making the question of mixing more tricky.


Theorem 46. Let $A$ be as in Question 1.
(a) ([71]) If $\sum_j a_j^+ = \sum_j a_j^-$ then $T^t_{\alpha,A}$ is not mixing for any $\alpha \in \mathbb{R} - \mathbb{Q}$.
(b) ([72, 119]) If $\sum_j a_j^+ \ne \sum_j a_j^-$ then $T^t_{\alpha,A}$ is mixing for almost every $\alpha \in \mathbb{R} - \mathbb{Q}$.
(c) ([73]) If $a_j^+ - a_j^-$ has the same sign for all $j$ then $T^t_{\alpha,A}$ is mixing for each $\alpha \in \mathbb{R} - \mathbb{Q}$.

Question 44. ([74]) Does the condition that $\sum_j a_j^+ \ne \sum_j a_j^-$ imply that $T^t_{\alpha,A}$ is mixing for every $\alpha \in \mathbb{R} - \mathbb{Q}$?

Question 45. ([74]) Under the conditions of Theorems 44 and 46 is $T^t_{\alpha,A}$ mixing of all orders?

In higher dimensions much less is known. Note that for smooth ceiling functions Theorems 30, 39 and 43 preclude mixing for a set of rotation vectors of full measure that also contains a residual set. The following was shown in [36]. Recall the definition of the set $Y$ used in Theorem 31. Define the following real analytic complex valued function on $\mathbb{T}^2$:

$A(x, y) = \sum_{k=2}^{\infty} \frac{e^{i 2\pi k x}}{e^k} + \sum_{k=2}^{\infty} \frac{e^{i 2\pi k y}}{e^k}.$

Theorem 47. For any $(\alpha', \alpha'') \in Y$, the special flow constructed over the translation $T_{\alpha',\alpha''}$ on $\mathbb{T}^2$, with the ceiling function $1 + \mathrm{Re}\,A$, is mixing.

Because of the disposition of the best approximations of $\alpha'$ and $\alpha''$ the ergodic sums $\varphi_m$ of the function $\varphi$, for any $m$ sufficiently large, will always be stretching (i.e. have big derivatives) in one or the other of the two directions, $x$ or $y$, depending on whether $m$ is far from $q'_n$ or far from $q''_n$. This stretch will increase when $m$ goes to infinity. So when time goes from 0 to $t$, $t$ large, the image of a small typical interval $J$ from the basis $\mathbb{T}^2$ (depending on $t$ the intervals should be taken along the $x$ or the $y$ axis) will be more and more distorted and stretched in the fibers' direction, until the image of $J$ at time $t$ consists of a lot of almost vertical curves whose projection on the basis lies along a piece of a trajectory under the translation $T_{\alpha',\alpha''}$. By unique ergodicity these projections become more and more uniformly distributed, and so will $T^t(J)$. For each $t$, and except for increasingly small subsets of it (as a function of $t$), we will be able to cover the basis with such "typical" intervals. Besides, what is true for $J$ on the basis is true for $T^s(J)$ at any height $s$ on the fibers. So applying the Fubini Theorem in two directions, first along the other direction on the basis (for a time $t$ all typical intervals are in the same direction), and second along the fibers, we will obtain the asymptotic uniform distribution of any measurable subset, which is, by definition, the mixing property.

Question 46. Are the flows obtained in Theorem 47 mixing of all orders?

Question 47. For which vectors $\alpha \in \mathbb{R}^d$ do there exist special flows above $T_\alpha$ with smooth functions $A$ such that $T_{\alpha,A}$ is mixing?

The foregoing discussion demonstrates that both ergodicity of cylindrical cascades and mixing of special flows require a detailed analysis of ergodic sums (1).
However, the estimates needed in those two cases are quite different and somewhat conflicting. Namely, for ergodicity we need to bound from below the probability that ergodic sums hit certain intervals, while for mixing one needs to rule out too


much concentration. For this reason it is difficult to construct functions $A$ such that $W_{\alpha,A}$ is ergodic while $T^t_{\alpha,c+A}$ is mixing. In fact, so far this has only been achieved for smooth functions with asymmetric logarithmic singularities. However, it seems that in higher dimensions there is more flexibility so such examples should be more common.

Question 48. Is it true that for (almost) every polyhedron $\Omega \subset \mathbb{T}^d$, $d \ge 2$, and almost every $a > 0$, and almost every $\alpha \in \mathbb{T}^d$, the special flow above $\alpha$ and under the function $a + \chi_\Omega$ is mixing?

Note that a positive answer to both this question and Question 35 will give a large class of interesting examples where ergodicity of $W_{\alpha,A}$ and mixing for $T_{\alpha,c+A}$ (for any $c$ such that $c + A > 0$) hold simultaneously.

Question 49. Answer Questions 35 and 48 in the case $\Omega$ is a strictly convex analytic set.

10.5. An application. Here we show how the geometry of special flows above cylindrical cascades can be used to study the ergodic sums.

Figure 4. Staircase surfaces. The sides marked by the same symbol are identified.

Proof of Theorem 38. The proof uses the properties of the staircase surface $St$ shown in Figure 4. The staircase is an infinite pile of $2 \times 1$ rectangles so that the left bottom corner of the next rectangle is attached to the center of the top of the previous one. The sides which differ by two units in either the horizontal or the vertical direction are identified. We number all the rectangles from $-\infty$ to $+\infty$ as shown in Figure 4. There is a translational symmetry given by $G(x, y) = (x + 1, y + 1)$ and $St/G$ is a torus. We shall use coordinates $\bar{p} = (p, z)$ on the staircase where $p$


are coordinates on the torus, which is identified with rectangle zero, and $z \in \mathbb{Z}$ is the index of the rectangle. Thus we have $(p, z) = G^z(p, 0)$. The key step in the proof is an observation of [53] that St is a Veech surface. Namely, given $A \in SL_2(\mathbb{Z})$ such that $A \equiv I \bmod 2$ there exists a unique automorphism $\phi_A$ of St which commutes with $G$, fixes the singularities of St, has derivative $A$ at the non-singular points and has drift 0. That is, in our coordinates

$$\phi_A(p, z) = (Ap,\ z + \tau(p)) \tag{40}$$

and the drift condition means that $\int_{\mathbb{T}^2} \tau(p)\,dp = 0$.

Figure 5. Poincaré map for a linear flow on the staircase. Orbits starting from [1/2, 1] go up while orbits starting from [0, 1/2) have to go down due to the gluing conditions.

Consider the linear flow on St with slope $\theta$ which is locally given by $T^t(x, y) = (x + t\cos\theta,\ y + t\sin\theta)$. Let $\Pi$ be the union of the top sides of the rectangles in St. We identify $\Pi$ with $\mathbb{T} \times \mathbb{Z}$ using the map $\eta : \mathbb{T} \times \mathbb{Z} \to \Pi$ such that $\eta(x, z)$ is the point on the top side of rectangle $z$ at the distance $2x$ from the left corner. It is easy to check (see Figure 5) that under this identification the Poincaré map for $T^t$ takes the form

$$(x, z) \mapsto \big(x + \alpha,\ z + \chi_{[1/2,1]}(x) - \chi_{[0,1/2)}(x)\big) \quad\text{where}\quad \alpha = \frac{\tan\theta + 1}{2}.$$

Now suppose that $\alpha$, and hence $\tan\theta$, is a quadratic surd. By Lagrange's theorem there is $A \in SL_2(\mathbb{Z})$ such that

$$A \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} = \lambda \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}.$$

By replacing $A$ by $A^k$ for a suitable (positive or negative) $k$ we may assume that $A \equiv I \bmod 2$ and that $\lambda < 1$. Let $\Gamma_N(x)$ be the ray starting from $\eta(x, 0)$ having slope $\theta$ and length $N/\sin\theta$.

$$L_N = \frac{\mathrm{mes}(\bar p \in \Gamma_N(x) : z(\bar p) = 0)}{\mathrm{length}(\Gamma_N(x))} = \frac{\mathrm{mes}(\bar q \in \tilde\Gamma : z(\phi_A^{-m}\bar q) = 0)}{\mathrm{length}(\tilde\Gamma)} = P_x\big(z(\phi_A^{-m}\bar q) = 0\big)$$

where $\tilde\Gamma = \phi_A^m \Gamma_N(x)$ and $P_x$ is computed under the assumption that $\bar q$ is uniformly distributed on $\tilde\Gamma$. Choose $m$ to be the smallest number such that $\mathrm{length}(\phi_A^m \Gamma_N(x)) = \lambda^m \frac{N}{\sin\theta} \le 1$. Note that $m \approx \frac{\ln N}{|\ln\lambda|}$. By our choice of $m$, $\tilde\Gamma$ is either contained in a single rectangle or intersects two of them. Let us consider the first case; the second one is similar. So we assume that $\tilde\Gamma$ is in the rectangle with index $a$ so that $\bar q = (q, a)$. Due to (40), $z(\phi_A^{-m}\bar q) = a - \sum_{j=1}^m \tau(\phi_A^{-j} q)$. Thus

$$P_x\big(z(\phi_A^{-m}\bar q) = 0\big) = P_x\Big(\sum_{j=1}^m \tau(\phi_A^{-j} q) = a\Big).$$
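The cylindrical cascade behind this argument can be explored numerically. The following is a minimal sketch, not part of the original proof: it iterates the Poincaré map $(x, z) \mapsto (x + \alpha, z + \chi_{[1/2,1]}(x) - \chi_{[0,1/2)}(x))$ and records how often the $z$-coordinate returns to 0. The function names, the choice of the golden-mean rotation number and the starting point are illustrative assumptions.

```python
import math


def ergodic_sums(alpha, x, n_steps):
    """Return the z-coordinates z_1, ..., z_n of the orbit of (x, 0).

    At step k the current base point is x + k*alpha (mod 1); z moves up by 1
    when that point lies in [1/2, 1) and down by 1 when it lies in [0, 1/2).
    """
    sums = []
    z = 0
    for k in range(n_steps):
        point = (x + k * alpha) % 1.0
        z += 1 if point >= 0.5 else -1
        sums.append(z)
    return sums


def fraction_of_visits_to_zero(alpha, x, n_steps):
    """Proportion of times n <= n_steps at which the z-coordinate equals 0."""
    sums = ergodic_sums(alpha, x, n_steps)
    return sum(1 for s in sums if s == 0) / n_steps


if __name__ == "__main__":
    # Hypothetical parameters: golden-mean rotation (a quadratic surd), x = 0.1.
    alpha = (math.sqrt(5) - 1) / 2
    for n_steps in (10**3, 10**4):
        print(n_steps, fraction_of_visits_to_zero(alpha, 0.1, n_steps))
```

Floating-point evaluation of $x + k\alpha \bmod 1$ limits the number of steps for which the orbit is reliable, so the sketch is only meant for moderate values of `n_steps`.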

Now we apply the Local Limit Theorem for linear toral automorphisms (see [101, Section 4] or [46]) which says that there is a constant $\sigma^2$ such that

$$P_x\Big(\sum_{j=1}^m \tau(\phi_A^{-j} q) = a\Big) \approx \frac{1}{\sqrt{2\pi m}\,\sigma}\, e^{-a^2/2\sigma^2 m}.$$

It remains to note that

$$a(x) = \sum_{j=0}^{m-1} \tau(\phi_A^j \eta(x, 0))$$

so applying the Central Limit Theorem for linear toral automorphisms we see that if $x$ is uniformly distributed on $\mathbb{T}^1$ then $\frac{a(x)}{\sqrt m}$ is approximately normal with zero mean and variance $\sigma^2$. □

Next we discuss the proof of Theorem 8(b) in case $l = \frac12$. The proof proceeds the same way as the proof of Theorem 38 with the following changes.

(I) Instead of estimating the probability that $z(\phi_A^{-m}\bar q) = 0$ we need to estimate the probability that $z(\phi_A^{-m}\bar q)$ belongs to an interval of length $\sqrt m$, so we use the Central Limit Theorem instead of the Local Limit Theorem.

(II) Instead of taking $x$ random we take $x$ fixed at the origin. Note that the origin is fixed by $A$ so $\tau(\phi_A^m(0, 0)) = Cm$. (More precisely, $\tau$ is multivalued at the origin since it belongs to several rectangles, so by $\tau(\phi_A^m(0, 0))$ we mean the limit of $\tau(\phi_A^m(\bar p))$ as $\bar p$ approaches the origin inside $\Gamma_N(0)$.)

11. Higher dimensional actions

Question 50. Generalize the results presented in Sections 2-10 to higher dimensional actions.

The orbits of commuting shifts $T^n x = x + \sum_{j=1}^q n_j \alpha_j$ are much less studied than their one-dimensional counterparts. We expect that some of the results of Sections 2-10 admit straightforward extensions while in other cases significant new ideas will be necessary. Below we discuss two areas of research where multidimensional actions appear naturally.

11.1. Linear forms. Statements about orbits of a single translation can be interpreted as results about the joint distribution of fractional parts of inhomogeneous linear forms of one variable evaluated over $\mathbb{Z}$. From the point of view of Number Theory it is natural to study linear forms of several variables evaluated over $\mathbb{Z}^d$. Let

$$l_i(n) = x_i + \sum_{j=1}^q \alpha_{ij} n_j, \quad i = 1 \dots d.$$


Thus it is of interest to study the discrepancy

$$D_N(\Omega, \alpha, x) = \mathrm{Card}\big(0 \le n_j < N,\ j = 1, \dots, q : (\{l_1(n)\}, \dots, \{l_d(n)\}) \in \Omega\big) - N^q\,\mathrm{Vol}(\Omega).$$

The latter problem is a classical subject in Number Theory, and there are several important results related to it. In particular, the Poisson regime is well understood ([88]). The following result generalizes Theorem 16 and can be proven by a similar argument.

Theorem 48. Let $(\alpha, x)$ be uniformly distributed on $\mathbb{T}^{d(q+1)}$. Then the distribution of

$$\mathrm{Card}\Big(n : \frac{n}{N} \in \Sigma \text{ and } (\{l_1(n)\}, \dots, \{l_d(n)\}) \in N^{-q/d}\,\Omega\Big)$$

converges as $N \to \infty$ to

$$\mathcal{N}(\Omega, \Sigma) := \mathrm{Card}\big(e \in L,\ e = (x, y) : x(e) \in \Omega,\ y(e) \in \Sigma\big)$$

where $L$ is a random affine lattice in $\mathbb{R}^{d+q}$.

Thus the Poisson regime for the rotations exhibits more regular behavior compared to standard Poisson processes. However, when the number of rotations becomes large the limiting distribution approaches the Poisson one. Namely, the following is a special case of the result proven in [125].

Theorem 49. If $\Sigma_q$ are unit cubes in $\mathbb{R}^q$ then $\Omega \to \mathcal{N}(\Omega, \Sigma_q)$ converge as $q \to \infty$ to the Poisson measure $\mu(\Omega) = \mathrm{Card}(P \cap \Omega)$ where $P$ is a Poisson process on $\mathbb{R}^d$ with constant intensity.

Next we present extensions of Theorems 25, 24, 26 and 27 to the context of homogeneous and inhomogeneous linear forms. Let again $l_i(n) = x_i + \sum_{j=1}^q \alpha_{ij} n_j$, $i = 1 \dots d$. Consider

$$V_N(\alpha, x, c) = \mathrm{Card}\big(0 \le n_i < N : (\{l_1(n)\}, \dots, \{l_d(n)\}) \in B(c|n|^{-q/d})\big).$$

More generally, given a function $\psi : \mathbb{R}_+ \to \mathbb{R}_+$ define

$$V_N^\psi(\alpha, x) = \mathrm{Card}\big(0 \le n_i < N : (\{l_1(n)\}, \dots, \{l_d(n)\}) \in B(\psi(|n|))\big).$$

We also let $U_N(\alpha, c) = V_N(\alpha, 0, c)$ and $U_N^\psi(\alpha) = V_N^\psi(\alpha, 0)$ be the quantities measuring the rate of recurrence. In particular we call the matrix $\alpha$ badly approximable if there exists $c > 0$ such that $U_N(\alpha, c)$ is bounded. On the other hand, if $U_N^\psi(\alpha) \to \infty$ where $\psi(r) = r^{-(d/q+\varepsilon)}$ then $\alpha$ is called very well approximable (VWA).

The following result is known as the Khinchine–Groshev Theorem. Almost sure is considered relative to Lebesgue measure on the space of matrices $\alpha \in \mathbb{T}^{dq}$.

Theorem 50. [7, 14, 31, 47, 64, 116]
(a) If $\sum_{n \in \mathbb{Z}^q} \psi^d(|n|) < \infty$ then $U_N^\psi$ is bounded almost surely.
(b) If $\sum_{n \in \mathbb{Z}^q} \psi^d(|n|) = +\infty$ and either $\psi$ is decreasing or $dq > 1$ then $\lim_{N\to\infty} U_N^\psi(\alpha) = +\infty$ almost surely.
(c) For $d = q = 1$ there exists $\psi$ such that $\sum_{n \in \mathbb{Z}} \psi(|n|) = +\infty$ but $U_N^\psi$ is bounded almost surely.
(d) If $\psi$ is decreasing and $\sum_{n \in \mathbb{Z}^q} \psi^d(|n|) = +\infty$ then $V_N^\psi(\alpha, x) \to \infty$ almost surely.
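For small parameters the counting function $V_N^\psi(\alpha, x)$ can be evaluated by brute force. The sketch below is illustrative only and rests on assumptions that the text does not fix: $B(r)$ is taken to be the sup-norm ball of radius $r$ around 0 in $\mathbb{T}^d$, $|n|$ is the sup-norm of $n$, and the index $n = 0$ is skipped because $\psi$ need not be defined there. The name `count_hits` is hypothetical.

```python
import itertools


def dist_to_nearest_integer(t):
    """Distance from the real number t to the nearest integer."""
    return abs(t - round(t))


def count_hits(alpha, x, N, psi):
    """Brute-force count of n in {0, ..., N-1}^q, n != 0, such that every
    linear form l_i(n) = x_i + sum_j alpha[i][j] * n[j] is within psi(|n|)
    of an integer.

    alpha is a d x q matrix given as a list of rows, x a list of length d.
    Assumptions: sup-norm ball on the torus and sup-norm |n|.
    """
    d = len(alpha)
    q = len(alpha[0])
    hits = 0
    for n in itertools.product(range(N), repeat=q):
        size = max(n)
        if size == 0:
            continue  # psi(|0|) is not used
        radius = psi(size)
        forms = (x[i] + sum(alpha[i][j] * n[j] for j in range(q)) for i in range(d))
        if all(dist_to_nearest_integer(value) <= radius for value in forms):
            hits += 1
    return hits


if __name__ == "__main__":
    # Hypothetical example with d = q = 1: golden-mean rotation, psi(r) = 1/r.
    golden = (5 ** 0.5 - 1) / 2
    for N in (10, 100, 1000):
        print(N, count_hits([[golden]], [0.0], N, lambda r: 1.0 / r))
```

The cost grows like $N^q$, so this is usable only for small $q$.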


In particular, both badly approximable and very well approximable $\alpha$s have zero measure. When the number of hits is infinite, it is natural to consider the question of the sBC property.

Theorem 51. [116] (a) For almost all $\alpha$

$$U_N^\psi(\alpha) = E(U_N^\psi) + O\Big(\sqrt{\Gamma(N)\ln^3 \Gamma(N)}\Big)$$

where

$$\Gamma(N) = \sum_{|n| \le N} \psi(|n|)^d\, D(\gcd(n_1 \dots n_q))$$

and $D$ denotes the number of divisors.
(b) $\Gamma(N) \le C\,E(U_N^\psi)$ if either $q > 3$ or $q = 2$ and $n\psi^2(n)$ is decreasing.
(c) If $q = 1$ and $\psi(n)$ is decreasing then for each $\delta$

$$U_N^\psi(\alpha) = E(U_N^\psi) + O\Big(\sqrt{\tilde\Gamma(N)\,E(U_N^\psi)}\ \ln^{2+\delta}\big(E(U_N^\psi)\big)\Big)$$

where

$$\tilde\Gamma(N) = \sum_{n=1}^N \frac{\psi(n)}{n}.$$

Question 51. Does a formula similar to that of Theorem 51 hold for $V^\psi$? Some partial results are obtained in [117].

It follows from the same arguments as the proof of Theorem 24 sketched in Section 8.3 that the sBC property holds for $\psi(r) = r^{-(d/q)}$ for almost every $(\alpha, x)$, that is

$$\lim_{N \to \infty} \frac{V_N(\alpha, x)}{E(V_N(\alpha, x))} = 1.$$

For badly approximable $\alpha$ we have the following.

Theorem 52. [90] Let $x$ be uniformly distributed on $\mathbb{T}^d$. If $\alpha$ is badly approximable, there exists a constant $K$ such that all limit points of $\frac{V_N - E(V_N)}{\sqrt{\ln N}}$ are normal random variables with zero mean and variance $\sigma^2$ where $0 \le \sigma^2 \le K$.

Question 52. (a) Show that there exists a constant $\bar\sigma^2 > 0$ such that for almost all $\alpha$, $\frac{V_N - E(V_N)}{\sqrt{\ln N}}$ converges to $\mathcal{N}(\bar\sigma^2)$.
(b) Does there exist $\alpha$ such that $\liminf_{N \to \infty} \frac{V_N - E(V_N)}{\sqrt{\ln N}} = 0$ (that is, $\sqrt{\ln N}$ is not a correct normalization for such $\alpha$)?

For random $\alpha$ we have the following.

Theorem 53. ([30]) There exists $\sigma$ such that if $\alpha_1, \dots, \alpha_r$ and $x_1, \dots, x_d$ are randomly distributed on $\mathbb{T}^{dr+d}$ then $\frac{V_N - E(V_N)}{\sqrt{\ln N}}$ converges in distribution to a normal random variable with zero mean and variance $\sigma^2$. A similar convergence holds if $d + r > 2$ and $(x_1, \dots, x_r) = (0, \dots, 0)$ and only the $\alpha_i$'s are random.

Still there are many open questions. We provide several examples.

Question 53. Extend Theorems 10 and 11 to the case $q > 1$.


We note that in the case of Theorem 11, even the case $d = 1$ seems quite difficult. One can attack this question using the method of [29] but it runs into the problem of lack of parameters described after Question 24.

Question 54. Let $l, \hat l : \mathbb{R}^d \to \mathbb{R}$ be linear forms with random coefficients, and let $Q : \mathbb{R}^d \to \mathbb{R}$ be a positive definite quadratic form. Investigate limit theorems, after adequate renormalization, for the number of solutions to
(a) $\{l(n)\}\,Q(n) \le c$, $|n| \le N$;
(b) $\{l(n)\}\,|\hat l(n)| \le c$, $|n| \le N$;
(c) $|l(n)\,Q(n)| \le c$, $|n| \le N$;
(d) $|l(n)\,\hat l(n)| < c$, $|n| \le N$.

While (a) and (b) have an obvious interpretation as shrinking target problems for toral translations, such an interpretation for (c) and (d) is less straightforward. Consider for example (c). Let $l(n) = \sum_{j=1}^q \alpha_j n_j$. Dividing the distribution of $\alpha$ into thin slices we may assume that $\alpha_q$ is almost constant. If $\alpha_q \approx a$ then we can compare our problem with

$$\Big|\sum_{j=1}^{q-1} \tilde\alpha_j n_j + n_q\Big|\, Q(n) < \tilde c \quad\text{where}\quad \tilde\alpha_j = \alpha_j/a,\ \tilde c = c/a.$$

Since $|l(n)|$ should be small we must have $\big|\sum_{j=1}^{q-1} \tilde\alpha_j n_j + n_q\big| = \big\{\sum_{j=1}^{q} \tilde\alpha_j n_j\big\}$, in which case $Q(n_1, \dots, n_{q-1}, n_q)$ is well approximated by

$$Q\Big(n_1, \dots, n_{q-1}, -\sum_{j=1}^{q-1} \tilde\alpha_j n_j\Big)$$

so we have a shrinking target problem in lower dimensions. In fact, as we saw in Section 7, typically the proof proceeds in the opposite direction by getting rid of the fractional part at the expense of increasing dimension, since problems (c) and (d) have more symmetry and so should be easier to analyze. We note that part (d) deals with a degenerate quadratic form. The case of nondegenerate forms is discussed in [32, Sections 5 and 6].

11.2. Cut-and-project sets. Cut-and-project sets are used in the physics literature to model quasicrystals. To define them we need the following data: a lattice $L$ in $\mathbb{R}^d$, a decomposition $\mathbb{R}^d = E_1 \oplus E_2$ and a compact set (a window) $W \subset E_2$. Let $P_1$ and $P_2$ be the projections to $E_1$ and $E_2$ respectively. The cut-and-project set is defined by

$$P = \{P_1(e),\ e \in L \text{ and } P_2(e) \in W\}.$$

We suppose in the following discussion that $\overline{E_1 + L} = \mathbb{R}^d$ and $L \cap E_2 = \emptyset$. Then $P$ is a discrete subset of $E_1$ sharing many properties of lattices but having a more complicated structure. Note that the limiting distributions in Theorems 16 and 48 are described in terms of cut-and-project sets. We refer the reader to [93, Sections 16 and 17] for more discussion of cut-and-project sets. Here we only mention the fact that such sets have an asymptotic density. Let $P_R = \{t \in P : |t| \le R\}$.

Theorem 54. Suppose that $W$ is an open subset of $E_2$ with a piecewise smooth boundary. Then

$$\lim_{R \to \infty} \frac{\mathrm{Card}(P_R)}{\mathrm{Vol}(B(0, R))} = \frac{\mathrm{Vol}(W)}{\mathrm{covol}(L)}\ \frac{\mathrm{Vol}_{\mathbb{R}^d}}{\mathrm{Vol}_{E_1}\mathrm{Vol}_{E_2}}.$$
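The density statement of Theorem 54 can be checked numerically in the simplest case. The following is a minimal sketch under assumptions not taken from the text: $d = 2$, $L = \mathbb{Z}^2$ (so $\mathrm{covol}(L) = 1$), $E_1$ is the line spanned by $(\cos\theta, \sin\theta)$ with irrational slope, $E_2$ is its orthogonal complement with the induced length measures (so the factor $\mathrm{Vol}_{\mathbb{R}^d}/(\mathrm{Vol}_{E_1}\mathrm{Vol}_{E_2})$ equals 1), and the window is an open interval. In that setting the predicted density of $P$ in $E_1$ is the length of the window.

```python
import math


def cut_and_project(theta, window, R):
    """Points of the cut-and-project set in [-R, R] for L = Z^2.

    E1 is spanned by (cos theta, sin theta), E2 by (-sin theta, cos theta),
    and window = (lo, hi) is an open interval in the E2 coordinate.
    Requires cos(theta) > 0.
    """
    c, s = math.cos(theta), math.sin(theta)
    lo, hi = window
    # For a lattice point e = (i, j): i = p1*c - p2*s, hence |i| <= |p1| + |p2|.
    bound = int(R + max(abs(lo), abs(hi))) + 2
    points = []
    for i in range(-bound, bound + 1):
        # P2(e) = -i*s + j*c must lie in (lo, hi); this restricts j to a short range.
        j_min = math.floor((lo + i * s) / c)
        j_max = math.ceil((hi + i * s) / c)
        for j in range(j_min, j_max + 1):
            p1 = i * c + j * s
            p2 = -i * s + j * c
            if lo < p2 < hi and abs(p1) <= R:
                points.append(p1)
    return sorted(points)


if __name__ == "__main__":
    # Hypothetical parameters: golden-mean slope and window (0, 1).
    theta = math.atan(2 / (1 + math.sqrt(5)))
    R = 500.0
    points = cut_and_project(theta, (0.0, 1.0), R)
    print(len(points) / (2 * R))  # compare with the window length 1.0
```

Here $\mathrm{Vol}(B(0, R)) = 2R$ because $E_1$ is one-dimensional.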


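Proof. (Following [52]). Note that $t \in P$ iff there exists $e \in L$ such that $-t + e \in W$, that is $-t \in W \bmod L$. Consider the action of $E_1$ on $\mathbb{R}^d/L$ given by $T^t(x) = x + t$. Then $\mathrm{Card}(P_R)$ is the number of intersections of the orbit of the origin of size $R$ with $W$. Pick a small $\delta$ and let $W_\delta = \{W + t,\ |t| \le \delta\}$. Then $W_\delta$ is a subset of $\mathbb{R}^d/L$ and for small $\delta$

$$\mathrm{Vol}(W_\delta) = \mathrm{Vol}(W)\,\mathrm{Vol}(B(0, \delta))\,\frac{\mathrm{Vol}_{\mathbb{R}^d}}{\mathrm{Vol}_{E_1}\mathrm{Vol}_{E_2}}. \tag{41}$$

Next,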



(42) |t| 0

$$|\varphi|_\infty = \sup_x |\varphi(x)|, \qquad [\varphi]_\varepsilon = \sup_{x,y} \frac{|\varphi(x) - \varphi(y)|}{d^\varepsilon(x, y)}$$

and we write $H^\varepsilon(X) \subset C(X)$ for the space of $\varepsilon$-Hölder functions, i.e. $\varphi \in H^\varepsilon(X)$ if and only if $[\varphi]_\varepsilon < \infty$. The space $H^\varepsilon(X)$ is normed by $\|\varphi\| = |\varphi|_\infty + [\varphi]_\varepsilon$ and then it becomes a Banach space. We will say that a bounded operator $Q$ on $C(X)$ satisfies condition (D-F) if
a) The powers $Q^n$ of $Q$ acting on $C(X)$ are uniformly bounded in norm $|\ |_\infty$.
b) There exist numbers $\rho \in [0, 1[$, $r \in \mathbb{N}$, $C \ge 0$ such that for any $\varphi \in H^\varepsilon(X)$

$$[Q^r \varphi]_\varepsilon \le \rho[\varphi]_\varepsilon + C|\varphi|_\infty.$$

The functional inequality in b) will be called the (D-F) inequality. Then we have the

Theorem 1.1. Assume that $(X, d)$ is compact and $Q$ satisfies condition (D-F) on $H^\varepsilon(X)$. Then $Q$ has only finitely many unimodular spectral values and they are isolated in the spectrum of $Q$. The corresponding characteristic subspaces have finite dimension and are equal to the corresponding eigenspaces.

For more general and more precise statements see [12], [34], [35]. If $Q$ is a Markov operator on $X$, hence in particular $Q1 = 1$, we say that $Q$ is irreducible if the equation $Q\varphi = e^{i\theta}\varphi$ ($\theta \in \mathbb{R}$) implies $e^{i\theta} = 1$, $\varphi = \mathrm{const}$. If the Markov operator $Q$ on $X$ is irreducible and satisfies condition (D-F) then $m$ is the unique $Q$-invariant measure and the projection $m \otimes 1$ on $\mathbb{C}1$ satisfies $\lim_{n \to \infty} \|Q^n - m \otimes 1\| = 0$. More precisely, we have the spectral decomposition $Q = m \otimes 1 + R$ where $R(m \otimes 1) = (m \otimes 1)R = 0$ and the spectral radius of $R$ is less than 1. We write $H_0^\varepsilon(X) = \{\varphi \in H^\varepsilon(X)\ ;\ m(\varphi) = 0\}$ and we observe that $H^\varepsilon(X) = \mathbb{C}1 \oplus H_0^\varepsilon(X)$. Under condition D-F, $Q$-ergodicity of $m$ is equivalent to simplicity of the eigenvalue 1. Also, the support $\mathrm{supp}(m)$ of $m$ is the smallest closed $Q$-invariant subset of $X$. In general it is smaller than $X$, but we can always restrict $Q$ to $\mathrm{supp}(m)$. For the operator $Q^{it}$ we have $[Q^{it}\varphi]_\varepsilon \le \rho[\varphi]_\varepsilon + C_t|\varphi|_\infty$ where

$$\rho = \sup_{x,y} \int p(x, g)\,\frac{d^\varepsilon(g.x, g.y)}{d^\varepsilon(x, y)}\,d\mu(g),$$

$$C_t = \sup_{x,y} \int \frac{|e^{itf(x,g)}p(x, g) - e^{itf(y,g)}p(y, g)|}{d^\varepsilon(x, y)}\,d\mu(g).$$

The corresponding quantities for $(Q^{it})^n$ are conveniently expressed with $E_x$; in particular $\rho$ is replaced by:

$$\rho(n) = \sup_{x,y} \left| E_x\left( \frac{d^\varepsilon(s_n.x, s_n.y)}{d^\varepsilon(x, y)} \right) \right|.$$

In particular if the functions $x \to p(x, g)$, $x \to f(x, g)$ are uniformly Hölder then $C_t < \infty$. Also, if $\lim_{n \to \infty} \rho(n) < 1$, then condition D-F is satisfied; hence in this case we are able to develop a detailed study of $S_n(x, \omega)$. We assume now that $Q$ is an irreducible Markov operator which satisfies condition D-F and $f(., g)$ is uniformly Hölder. We follow closely [25].

Definition 1.2. We will say that the function $f(x, g)$ on $X \times S$ is non degenerate (resp. is aperiodic) if $f$ is not additively (resp. multiplicatively) cohomologous to a constant on $\mathrm{supp}(m)$. Here $f$ is called additively (resp. multiplicatively) cohomologous to a constant if for some $u \in H^\varepsilon(X)$, $c \in \mathbb{R}$ (resp. $u \in H^\varepsilon(X)$, $|u| = 1$, $t \in \mathbb{R} \setminus \{0\}$, $\theta \in \mathbb{R}$) and any $g \in \mathrm{supp}(\mu)$, $x \in \mathrm{supp}(m)$ we have

$$f(x, g) = u(g.x) - u(x) + c \quad (\text{resp. } e^{itf(x,g)} = e^{i\theta}u(g.x)u^{-1}(x)).$$

We assume that the integral $\int f(x, g)\,dm(x)\,d\mu(g)$ is finite and we denote this integral by $\gamma(f)$. The function $v(x) = \int f(x, g)\,d\mu(g) - \gamma(f)$ satisfies $m(v) = 0$, hence the equation $u - Qu = v$ has a unique solution $u$ with $u \in H_0^\varepsilon(X)$. If $\int f^2(x, g)\,dm(x)\,d\mu(g)$ is finite we can define

$$\sigma^2 = \sigma^2(f) = \int \big(f(x, g) - \gamma(f) + u(x) - u(g.x)\big)^2\,dm(x)\,d\mu(g)$$

and non degeneracy of $f$ can be expressed as $\sigma^2(f) > 0$. Also we see that for $t$ small, by perturbation theory on $H^\varepsilon(X)$, the operator $Q^{it}$ has a simple dominant eigenvalue $k(f, it) = k(it)$ which is isolated in the spectrum of $Q^{it}$ and satisfies $|k(it)| = \lim_{n \to \infty} \|(Q^{it})^n\|^{1/n} < 1$. If $\int f^2(x, g)\,dm(x)\,d\mu(g) < \infty$ the second derivative of $k(it)$ at zero exists and $k'(f, 0) = i\gamma(f)$, $k''(f, 0) = -(\sigma^2(f) + \gamma^2(f))$. The non-degeneracy (resp. aperiodicity) of $f$ can be characterized as follows:

Proposition 1.3. With the above notations, the following conditions are equivalent
a) $f$ is degenerate (resp. not aperiodic).
b) For any $t \in \mathbb{R}$ (resp. for some $t \ne 0$), there exist $\theta \in \mathbb{R}$ and $\varphi_t \in H^\varepsilon(\mathrm{supp}(m))$ such that $Q^{it}\varphi_t = e^{i\theta}\varphi_t$ on $\mathrm{supp}(m)$.

The following simple Lemma will be useful in the proof of the local limit theorem below.

Lemma 1.4. With the above notation assume $\int f^2(x, g)\,dm(x)\,d\mu(g) < \infty$, $\gamma(f) = 0$, and $f$ is non degenerate. Then for any $\varphi \in H^\varepsilon(X)$, $t \in \mathbb{R}$, the sequence $(Q^{it/\sqrt n})^n \varphi$ converges to $e^{-t^2\sigma^2/2}\,m(\varphi)$. There exist $A > 0$, $\varepsilon_0 > 0$ such that for $|t/\sqrt n| < \varepsilon_0$ the sequence $\|(Q^{it/\sqrt n})^n\|$ is bounded by $2e^{-At^2}$.

Proof: Since $Q$ is irreducible and satisfies condition (D-F), we have for $t$ small, by perturbation in $H^\varepsilon(X)$, $Q^{it}\varphi = k(it)\,m_t(\varphi)\,e_t + R(it)\varphi$ where $m_t \otimes e_t$ is the projection operator on the line $\mathbb{C}e_t$, $(m_t \otimes e_t)R(it) = R(it)(m_t \otimes e_t) = 0$, and the spectral radius of $R(it)$ is less than $|k(it)|$. Since $\gamma(f) = 0$ and $\int f^2(x, g)\,dm(x)\,d\mu(g) < \infty$ we have $k(it) = 1 - \sigma^2t^2/2 + o(t^2)$, hence $|k(it)| < 1$ for $t$ small non zero and

$$\lim_{n \to \infty} k^n(it/\sqrt n) = \lim_{n \to \infty} (1 - \sigma^2t^2/2n)^n = e^{-\sigma^2t^2/2}.$$

Also, for some $C \in [0, |k(it)|[$, $C < 1$ we have $(Q^{it})^n = k^n(it)\,m_t \otimes e_t + R^n(it)$ with $\|(R(it))^n\| \le C^n$, $\lim_{t \to 0} m_t \otimes e_t = m \otimes 1$. This gives the first conclusion. The asymptotic expansion of $k(it)$ gives $|k(it)| \le e^{-\sigma^2t^2/4}$ for $|t| \le \varepsilon_0$ with $\varepsilon_0 > 0$, hence $|k^n(it/\sqrt n)| \le e^{-\sigma^2t^2/4}$. If $|t/\sqrt n| \le \varepsilon_0$ we have $C^n \le \exp(t^2\varepsilon_0^{-2}\,\mathrm{Log}\,C)$. Since $\|(Q^{it/\sqrt n})^n\| \le |k^n(it/\sqrt n)| + \|R^n(it/\sqrt n)\|$ we have $\|(Q^{it/\sqrt n})^n\| \le 2e^{-At^2}$ with $A = \inf(\sigma^2/4,\ \varepsilon_0^{-2}|\mathrm{Log}\,C|)$. □
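The limit $k^n(it/\sqrt n) \to e^{-\sigma^2t^2/2}$ used in this proof can be observed numerically in the simplest situation. The sketch below is an illustration under an assumption not made in the text: $X$ is reduced to a single point and $f$ is uniformly distributed on $[-1, 1]$, so that $Q^{it}$ is multiplication by the characteristic function $k(it) = \sin t / t$, with $\gamma(f) = 0$ and $\sigma^2 = 1/3$. The names `k`, `rescaled_power` and `gaussian_limit` are hypothetical.

```python
import math

SIGMA2 = 1.0 / 3.0  # variance of the uniform law on [-1, 1]


def k(t):
    """Characteristic function E[exp(i t f)] for f uniform on [-1, 1]."""
    return 1.0 if t == 0 else math.sin(t) / t


def rescaled_power(t, n):
    """The quantity k(i t / sqrt(n))^n appearing in the proof of Lemma 1.4."""
    return k(t / math.sqrt(n)) ** n


def gaussian_limit(t):
    """The limit exp(-sigma^2 t^2 / 2)."""
    return math.exp(-SIGMA2 * t * t / 2)


if __name__ == "__main__":
    for n in (10, 100, 10000):
        print(n, rescaled_power(1.0, n), gaussian_limit(1.0))
```

In this special case the bound $|k^n(it/\sqrt n)| \le e^{-\sigma^2t^2/4}$ can also be checked directly on a grid of values of $t$ with $|t/\sqrt n| \le 1$.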

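Theorem 1.5. Assume that the Markov kernel $Q$ on $H^\varepsilon(X)$ is irreducible, $Q^{it}$ satisfies condition (D-F) for any $t \in \mathbb{R}$, the function $f$ on $X \times S$ is aperiodic with $\gamma(f) = 0$, and $\int f^2(x, g)\,dm(x)\,d\mu(g) < \infty$. Then for any $\varphi \in H^\varepsilon(X)$ and arbitrary interval $I \subset \mathbb{R}$:

$$\lim_{n \to \infty} \sigma\sqrt{2\pi n}\ E_x\big(\varphi(s_n.x)\,1_I(S_n(x, .))\big) = m(\varphi)\,\ell(I),$$

where $\ell$ is Lebesgue measure on $\mathbb{R}$.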


Proof: As in [6] we consider the following sequence of Radon measures $\ell_x^n$ on $X \times \mathbb{R}$ defined by $\ell_x^n(\varphi \otimes u) = \sigma\sqrt{2\pi n}\ E_x(\varphi(s_n.x)\,u(S_n(x, .)))$ and we need to show the weak convergence of $\ell_x^n$ to $m \otimes \ell$. Hence it suffices to show that $\lim_{n \to \infty} \ell_x^n(\varphi \otimes u) = m(\varphi)\,\ell(u)$ for any continuous $u \in L^1(\mathbb{R})$ with compactly supported Fourier transform $\hat u$ (see [6]). The Fourier inversion formula gives with $\varepsilon > 0$:

$$E_x\big(\varphi(s_n.x)\,u(S_n(x, .))\big) = (2\pi)^{-1} \int_{-\infty}^{\infty} (Q^{it})^n \varphi(x)\,\hat u(t)\,dt,$$

$$\ell_x^n(\varphi \otimes u) = \sigma\sqrt n\,(2\pi)^{-1/2} \int_{|t| \ge \varepsilon} (Q^{it})^n \varphi(x)\,\hat u(t)\,dt + \sigma(2\pi)^{-1/2} \int_{-\varepsilon\sqrt n}^{\varepsilon\sqrt n} (Q^{it/\sqrt n})^n \varphi(x)\,\hat u(t/\sqrt n)\,dt.$$

Since $f$ is aperiodic, Proposition 1.3 gives that $Q^{it}$ has no unimodular eigenvalue if $t \ne 0$. Since $Q^{it}$ satisfies condition (D-F) it follows that the spectral radius of $Q^{it}$ satisfies $r(Q^{it}) < 1$. Hence the first term in the right hand side has limit zero. On the other hand, the above lemma shows that for $\varepsilon$ small $(Q^{it/\sqrt n})^n \varphi(x)\,1_{[-\varepsilon\sqrt n, \varepsilon\sqrt n]}(t)$ satisfies dominated convergence with limit $e^{-t^2\sigma^2/2}\,m(\varphi)$. The convergence of the integral in the second term to

$$m(\varphi) \int_{-\infty}^{\infty} e^{-\sigma^2t^2/2}\,\hat u(0)\,dt = \sigma^{-1}\sqrt{2\pi}\ m(\varphi)\,\ell(u)$$

follows, and multiplying by $\sigma(2\pi)^{-1/2}$ gives the limit $m(\varphi)\,\ell(u)$. □

Remark 1.6. Condition (D-F) for $Q^{it}$ ($t \ne 0$) and aperiodicity of $f$ are satisfied if $Q^{it}$ has spectral radius less than one and $f$ is non degenerate.

Clearly the same proof in the $L^2$ context with $L_0^2(X) = \{\varphi \in L^2(X)\ ;\ m(\varphi) = 0\}$ gives

Theorem 1.7. Assume that the Markov kernel $Q$ is irreducible and its spectral radius $r(Q_0)$ on $L_0^2(X)$ satisfies $r(Q_0) < 1$. Assume the Borel function $f(x, g)$ on $X \times S$ satisfies $\gamma(f) = 0$, $\sigma^2(f) > 0$, and the operator $Q^{it}$ on $L^2(X)$ has spectral radius less than 1 for any $t \ne 0$. Then for $\varphi \in L^2(X)$ and arbitrary interval $I \subset \mathbb{R}$:

$$\lim_{n \to \infty} \sigma\sqrt{2\pi n}\ E\big(\varphi(s_n.x)\,1_I(S_n(x, .))\big) = m(\varphi)\,\ell(I).$$

For various examples where the conditions of the theorem are satisfied, see [10]. Clearly the hypothesis and the result are stable under perturbation of $Q$ in the operator norm and this property gives new examples. We observe that if $r(Q_0) < 1$, Proposition 3.4 below allows one to verify that the spectral radius of $Q^{it}$ is less than 1 in various situations. Then the spectral arguments developed in [25] can also be used for the proofs of various limit theorems, under a Hilbert space spectral hypothesis on $Q$. The condition $r(Q_0) < 1$ of the theorem is called “strong mixing” of $Q$ in [45]. For a study of the central limit theorem in a similar general framework, we refer to [45]. The classical “Doeblin's condition” is stronger than “strong mixing”. More precisely “Doeblin's condition” is equivalent to the fact that the spectral radius of $Q_0$ in $L_0^1(m)$ is less than 1.


2. Law of large numbers and spectral gap properties for products of random matrices.

In this section we describe some properties of products of random matrices which are essential for the proofs of limit theorems in sections 3, 4 below. In section 3 we will show the local limit theorem and large deviations asymptotics in a certain range for products of random matrices. These results will also play an important role in section 4, in the study of extreme values for affine random walks. We need to give a brief exposition of known results on products of random matrices. We begin by describing some notation.

Let $\tilde G = GL(V)$ be the full linear group of the Euclidean vector space $V = \mathbb{R}^d$ ($d \ge 2$), $G = SL(V)$ the special linear group of $V$, $P(V)$ (resp. $S(V)$) the projective space (resp. unit sphere) of $V$. The scalar product on $V$ will be written $\langle x, y\rangle$ if $x, y \in V$. The corresponding norm of $x \in V$ (resp. $g \in \tilde G$) will be written $|x|$ (resp. $|g|$). For $g \in \tilde G$ we write $\gamma(g) = \sup(|g|, |g^{-1}|)$. If $x \in V$ we will denote by $\bar x$ (resp. $\tilde x$) its projection on $P(V)$ (resp. $S(V)$). Also for $x \in P(V)$ we will write $\tilde x \in S(V)$ for one of its corresponding representatives in $S(V)$. The action of $g$ on $x \in P(V)$ (resp. $y \in S(V)$) will be written $g.x$ (resp. $g.y$). We will use on $S(V)$ (resp. $P(V)$) the distance $\tilde\delta(\tilde x, \tilde y) = |\tilde x - \tilde y|$ (resp. $\bar\delta(x, y) = \inf(\tilde\delta(\tilde x, \tilde y), \tilde\delta(-\tilde x, \tilde y))$). We will also consider the distance $\delta$ on $P(V)$ given by $\delta(x, y) = |x \wedge y| = \sin(x, y)$ which is equivalent to $\bar\delta$. We note that the natural norm $|x \wedge y|$ on the wedge product $\Lambda^2 V$ satisfies $|x|^2|y|^2 = |x \wedge y|^2 + |\langle x, y\rangle|^2$. For $s \in \mathbb{R}$ we write $\bar s = \inf(s, 1)$, $\hat s = \sup(s, 1)$. For a subset $E$ of a group we write $\langle E\rangle$ (resp. $[E]$) for the closed subgroup (resp. semigroup) generated by $E$. If $g \in \tilde G$ has a simple dominant eigenvalue $\lambda_g$ we will write $V = \mathbb{R}v_g \oplus V_g^<$ where $v_g$ is a dominant eigenvector and $V_g^<$ the corresponding supplementary hyperplane. In this case we will say that $g$ is proximal and we denote by $g^+ \in P(V)$ the attracting fixed point defined by $v_g$. For a semigroup $\Gamma \subset \tilde G$ we will write $\Gamma^{prox}$ for the subset of proximal elements in $\Gamma$. The closure of $\{g^+\ ;\ g \in \Gamma^{prox}\}$ will play an important role below in the discussion of aperiodicity for the function $f(x, g) = \mathrm{Log}|g\tilde x|$ with $g \in \tilde G$, $x \in P(V)$ (see Corollary 2.12). It will be called the limit set of $\Gamma$. The dual space of $V$ will be denoted $V^*$ and the dual map of $g$ by $g^*$. We will say that a semigroup $\Gamma \subset \tilde G$ satisfies condition (I-P) if
a) No finite union of proper subspaces is $\Gamma$-invariant (strong irreducibility of $\Gamma$).
b) $\Gamma$ contains a proximal element.

It follows from [18] that condition (I-P) for $\Gamma$, $\langle\Gamma\rangle$ and the Zariski closure $Zc(\Gamma)$ of $\Gamma$ in $\tilde G$ are all equivalent. Since $Zc(\Gamma)$ is a Lie subgroup of $\tilde G$ with a finite number of connected components, condition (I-P) is easily verified. If the field $\mathbb{R}$ is replaced by the local field $\mathbb{Q}_p$, the corresponding (I-P) condition is useful (see [29]). If $\Gamma$ acts on a locally compact metric space $X$ and if $\lambda$ (resp. $\nu$) is a positive measure on $\Gamma$ (resp. $X$) we define the convolution $\lambda * \nu$ by $(\lambda * \nu)(\varphi) = \int \varphi(\gamma.x)\,d\lambda(\gamma)\,d\nu(x)$ where $\varphi \in C(X)$ is compactly supported. We will denote by $M^1(E)$ the space of probabilities on a Polish space $E$. In particular if $\mu \in M^1(\Gamma)$ we will write $\mu^n$ for its convolution power of order $n$. Also we will write $\mathbb{P} = \mu^{\otimes \mathbb{N}}$ for the corresponding probability on $\Gamma^{\mathbb{N}}$. The product of the random matrices $g_k \in \tilde G$ ($1 \le k \le n$) will be written $S_n = g_n \cdots g_1$. The spectral radius of a bounded linear operator $U$ of a Banach space into itself will be denoted $r(U) = \lim_{n \to \infty} |U^n|^{1/n}$.


We begin by considering products of unimodular random matrices; we follow ([4], [16], [22], [30]) with a few changes. We start with the following (see [22], [52]).

Theorem 2.1. Let $V = \mathbb{R}^d$ ($d \ge 2$), $G = SL(V)$, $\mu \in M^1(G)$ be such that the subgroup $\langle \mathrm{supp}(\mu)\rangle$ is non-compact and strongly irreducible, and denote by $P_\mu$ the convolution operator defined by $\mu$ on the $G$-space $V \setminus \{0\}$. Let $L^2(V)$ be the Hilbert space on $V$ with respect to Lebesgue measure and $r_V(\mu)$ the spectral radius of $P_\mu$ acting on $L^2(V)$. Then $r_V(\mu) < 1$.

We have the following corollaries.

Corollary 2.2. With $\mu$ as in the theorem and $\int \mathrm{Log}|g|\,d\mu(g) < \infty$, the sequence $\frac1n \int \mathrm{Log}|g|\,d\mu^n(g)$ converges to a positive number:

$$\lim_{n \to \infty} \frac1n \int \mathrm{Log}|g|\,d\mu^n(g) \ge -\frac1d\,\mathrm{Log}\,r_V(\mu) > 0.$$

Corollary 2.3. Assume $\mathrm{supp}(\mu)$ satisfies condition (I-P), and denote by $\bar P_\mu$ the operator on $P(V)$ defined by convolution with $\mu$. Then $\bar P_\mu$ has a unique invariant probability measure $\nu$. If $\int \mathrm{Log}|g|\,d\mu(g)$ is finite, the sequence of functions on $P(V)$, $\frac1n \int \mathrm{Log}|gx|\,d\mu^n(g)$, converges uniformly to $L_\mu = \int \mathrm{Log}|gx|\,d\mu(g)\,d\nu(x) > 0$.

This is a weak form of the well known Furstenberg's result giving exponential growth for products of unimodular random matrices: $\lim_{n \to \infty} \frac1n \mathrm{Log}|g_n \cdots g_1| = L_\mu > 0$ $\mathbb{P}$-a.e. (see [16]). This fact follows also from the above and the subadditive ergodic theorem.

Corollary 2.4. We consider the $d$-dimensional torus $\mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d$ ($d > 1$), the action of a semigroup $\Gamma \subset SL(d, \mathbb{Z})$ on $\mathbb{T}^d$ by automorphisms, and a probability measure $\mu \in M^1(\Gamma)$ such that the action of the group $\langle \mathrm{supp}(\mu)\rangle$ on $\mathbb{R}^d$ is strongly irreducible. Then the spectral radius $r_0(\mu)$ of the corresponding convolution action on $L_0^2(\mathbb{T}^d, m)$, where $m$ is Lebesgue measure, satisfies $r_0(\mu) < 1$.

We observe that this corollary (see [15]) is a very special case of a general result of automorphism-actions on a compact nilmanifold $N/\Delta$ used below: the action of $\Gamma \subset \mathrm{Aut}(N/\Delta)$ on $L_0^2(N/\Delta)$ does not contain weakly the identity representation if and only if there is no $\Gamma$-equivariant torus factor $T$ of $N/\Delta$ such that the corresponding quotient group acting on $T$ is virtually abelian (see [1]).

Corollary 2.5. With the hypothesis of Corollary 2.3 we assume that for some $\tau > 0$, $\int |g|^\tau\,d\mu(g)$ is finite. Then there exists $\varepsilon_0 > 0$ such that for any $\varepsilon \in ]0, \varepsilon_0]$:

$$\lim_{n \to \infty} \Big( \sup_{|x| = 1} \int |gx|^{-\varepsilon}\,d\mu^n(g) \Big)^{1/n} < 1.$$

Then in particular if $\dim V = 2$, and $2\varepsilon \in ]0, \varepsilon_0]$ one has:

$$\rho(\varepsilon) = \lim_{n \to \infty} \Big( \sup_{\bar x, \bar y \in P(V)} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)}\,d\mu^n(g) \Big)^{1/n} < 1,$$

and the spectral radius of $\bar P_\mu$ on $H_0^\varepsilon(P(V)) = \{\varphi \in H^\varepsilon(P(V))\ ;\ \nu(\varphi) = 0\}$ is at most $\rho(\varepsilon) < 1$.
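The growth rate $L_\mu$ of Corollary 2.3 and Furstenberg's theorem can be estimated by simulation. The following is a minimal sketch, not taken from the text: it follows a unit vector under a random product $g_n \cdots g_1$ and averages the logarithms of the successive norm ratios, which gives $\frac1n \mathrm{Log}|g_n \cdots g_1 x|$. The two measures used are illustrative choices. The first is supported on two unipotent matrices generating a non-compact, strongly irreducible subgroup of $SL(2, \mathbb{R})$. The second is the non-strongly-irreducible measure $\frac12(\delta_a + \delta_b)$ with $a$ diagonal and $b$ a rotation by a quarter turn, for which the growth rate vanishes.

```python
import math
import random


def apply(g, v):
    """Apply the 2 x 2 matrix g (a tuple of rows) to the vector v."""
    return (g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1])


def growth_rate(matrices, n_steps, seed=0, start=(1.0, 0.0)):
    """Estimate (1/n) Log |g_n ... g_1 x| for i.i.d. uniform choices among
    `matrices`, renormalizing the vector at every step to avoid overflow."""
    rng = random.Random(seed)
    v = start
    total = 0.0
    for _ in range(n_steps):
        v = apply(rng.choice(matrices), v)
        norm = math.hypot(v[0], v[1])
        total += math.log(norm)
        v = (v[0] / norm, v[1] / norm)
    return total / n_steps


# Illustrative measures (assumptions, not from the text).
IRREDUCIBLE = [((1.0, 1.0), (0.0, 1.0)), ((1.0, 0.0), (1.0, 1.0))]
REDUCIBLE = [((2.0, 0.0), (0.0, 0.5)), ((0.0, 1.0), (-1.0, 0.0))]

if __name__ == "__main__":
    print("strongly irreducible:", growth_rate(IRREDUCIBLE, 20000))
    print("not strongly irreducible:", growth_rate(REDUCIBLE, 20000))
```

A single trajectory only gives a statistical estimate, so the printed values fluctuate with the seed and with `n_steps`.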


Remark 2.6. a) As the proof below shows, we can replace in the statement of the theorem $V$ by $\mathbb{Z}^d$, i.e. $SL(V)$ by $SL(d, \mathbb{Z})$; then $r_V(\mu)$ is replaced by the spectral radius of the convolution by $\mu$ on $\mathbb{Z}^d \setminus \{0\}$, less than 1 under the same condition.
b) Strong irreducibility is a necessary hypothesis for the validity of the theorem: if $\mu = \frac12(\delta_a + \delta_b)$ with $a = \begin{pmatrix} \lambda & 0 \\ 0 & 1/\lambda \end{pmatrix}$, $b = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$, $\lambda > 1$, we get with standard notations on $P^1(\mathbb{R}^2)$:

$$\nu = \frac12(\delta_{\bar e_1} + \delta_{\bar e_2}), \qquad L_\mu = \lim_{n \to \infty} \frac1n \int \mathrm{Log}|g|\,d\mu^n(g) = 0.$$

c) If $d = 2$, condition (I-P) for $[\mathrm{supp}(\mu)]$ is equivalent to strong irreducibility and non-compactness of $\langle \mathrm{supp}(\mu)\rangle$.

For the proof of the theorem we need the lemma (see [16]):

Lemma 2.7. Assume that a semigroup $\Gamma \subset SL(V)$ is strongly irreducible and non-compact. Then there is no $\Gamma$-invariant probability measure on $P(V)$.

Proof of Lemma 2.7 Assume $\nu \in M^1(P(V))$ satisfies $g.\nu = \nu$ for every $g \in \Gamma$. Since $\Gamma$ is unbounded there exists $g_n \in \Gamma$ with $\lim_{n \to \infty} |g_n| = \infty$. We denote $u_n = |g_n|^{-1}g_n$ and we observe that $|\det u_n| = |g_n|^{-d}$ converges to zero. Since $|u_n| = 1$, we can extract a convergent subsequence in $\mathrm{End}\,V$, again denoted $u_n$, and we have $\lim_{n \to \infty} u_n = u$, with $|u| = 1$, $\det u = 0$.

We write $W = \mathrm{Ker}\,u$, $W' = \mathrm{Im}\,u$ and we denote by $\bar W$ and $\bar W'$ the corresponding projective subspaces, with $\bar W^c = P(V) \setminus \bar W$. We decompose $\nu$ as $\nu = \nu_1 + \nu_2$ where $\nu_1$ (resp. $\nu_2$) is the restriction of $\nu$ to $\bar W$ (resp. $\bar W^c$) and we observe that $\nu = \lim_{n \to \infty} g_n.\nu = \lim_{n \to \infty} g_n.\nu_1 + u.\nu_2$ where $u.\nu_2$ is supported by $\bar W'$. Since $P(V)$ is compact we can assume, taking a subsequence, that $g_n.\nu_1$ converges to $\nu_1'$ which is supported on the subspace $\bar W_1 = \lim_{n \to \infty} g_n.\bar W$. Since $g.\nu = \nu$ it follows that $\nu(\bar W_1 \cup \bar W') = 1$. We consider the set $\Phi$ of subsets $F$ of $P(V)$ which are finite unions of proper subspaces with $\nu(F) = 1$. Since any decreasing sequence in $\Phi$ is finite, $\Phi$ has a least element $F_0$. Since $g.\nu = \nu$, we have $g.F_0 = F_0$ for any $g \in \Gamma$, hence $\Gamma$ is not strongly irreducible. This gives the required contradiction. □

Proof of Theorem 2.1 Assume $r_V(\mu) = 1$ and let $z = e^{i\theta}$, $|z| = 1$ be a spectral value of $P_\mu = P$. Then either $\lim_{n \to \infty} \|P\varphi_n - z\varphi_n\|_2 = 0$ for some sequence $\varphi_n \in L^2(V)$ with $\|\varphi_n\|_2 = 1$, or $\mathrm{Im}(P - zI)$ is not dense in $L^2(V)$. In the second case, duality gives $zP^*\varphi = \varphi$ for some $\varphi \in L^2(V)$, $\varphi \ne 0$. Since $SL(V)$ preserves Lebesgue measure on $V$, $P$ is a contraction of $L^2(V)$, hence the fixed points of $\bar z P$ and $zP^*$ in $L^2(V)$ are the same, i.e. $P\varphi = z\varphi$. Hence it suffices to consider the first case. Then the condition $\lim_{n \to \infty} \|P\varphi_n - z\varphi_n\|_2 = 0$ is equivalent to $\lim_{n \to \infty} \langle P\varphi_n, z\varphi_n\rangle = 1$. Since $|P\varphi_n| \le P|\varphi_n|$ and $\langle P|\varphi_n|, |\varphi_n|\rangle \le 1$ we get $\lim_{n \to \infty} \langle P|\varphi_n|, |\varphi_n|\rangle = 1$, i.e.

$$\lim_{n \to \infty} \int \langle |\varphi_n| \circ g, |\varphi_n|\rangle\,d\mu(g) = 1.$$

Since $\langle |\varphi_n| \circ g, |\varphi_n|\rangle \le 1$ it follows that, taking a subsequence, $\lim_{n \to \infty} \langle |\varphi_n| \circ g, |\varphi_n|\rangle = 1$ $\mu$-a.e., i.e. $\lim_{n \to \infty} \||\varphi_n| \circ g - |\varphi_n|\|_2 = 0$ $\mu$-a.e. On the other hand we have

$$\||\varphi_n|^2 \circ g - |\varphi_n|^2\|_1 \le \||\varphi_n| \circ g - |\varphi_n|\|_2\ \||\varphi_n| \circ g + |\varphi_n|\|_2 \le 2\,\||\varphi_n| \circ g - |\varphi_n|\|_2,$$


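hence $\lim_{n \to \infty} \||\varphi_n|^2 \circ g - |\varphi_n|^2\|_1 = 0$ $\mu$-a.e. We consider the probability $\nu_n$ on $V \setminus \{0\}$ defined by $\nu_n = |\varphi_n|^2\,\ell$, where $\ell$ is Lebesgue measure on $V$, and its projection $\bar\nu_n$ on $P(V)$. Then we have in variation norm $\lim_{n \to \infty} \|g\nu_n - \nu_n\| = 0$, hence $\lim_{n \to \infty} \|g.\bar\nu_n - \bar\nu_n\| = 0$ $\mu$-a.e., and also $\lim_{n \to \infty} g.\bar\nu_n - \bar\nu_n = 0$ weakly for any $g \in \mathrm{supp}(\mu)$. Taking a weak limit $\nu$ of a subsequence $\bar\nu_{n_k}$ we get $g.\nu = \nu$ for any $g \in \mathrm{supp}(\mu)$, hence $\nu$ is $\langle \mathrm{supp}(\mu)\rangle$-invariant. From the lemma this is impossible because $\Gamma = \langle \mathrm{supp}(\mu)\rangle$ is non-compact and strongly irreducible. □

Proof of Corollary 2.2 We have $1 = \det g \le |g|^d$, hence $|g| \ge 1$ for $g \in G$, and $\infty > \int \mathrm{Log}|g|\,d\mu^n(g) = u_n \ge 0$. Also by subadditivity of $\mathrm{Log}|g|$: $u_{m+n} \le u_m + u_n$. It follows that $n^{-1}u_n \le u_1 < \infty$ and

$$\lim_{n \to \infty} n^{-1}u_n = \inf_{n \ge 1} n^{-1}u_n = c \ge 0.$$

For $\varepsilon > 0$ we consider the $L^2$ functions $f_\varepsilon$ and $1_C$ on $V$ given by: $f_\varepsilon(v) = \inf(1, |v|^{-d-\varepsilon})$, $C = \{1 \le |v| \le 2\}$. The theorem gives:

$$\limsup_{n \to \infty} |\langle P_\mu^n f_\varepsilon, 1_C\rangle|^{1/n} \le r_V(\mu) < 1.$$

But:

$$\langle P_\mu^n f_\varepsilon, 1_C\rangle \ge \int_C |gv|^{-d-\varepsilon}\,d\mu^n(g)\,dv \ge 2^{-d-\varepsilon}\,\ell(C) \int |g|^{-d-\varepsilon}\,d\mu^n(g),$$

$$\mathrm{Log}\langle P_\mu^n f_\varepsilon, 1_C\rangle \ge -(d + \varepsilon)u_n + \mathrm{Log}(2^{-d-\varepsilon}\,\ell(C)).$$

It follows that $(d + \varepsilon) \lim_{n \to \infty} n^{-1}u_n \ge -\mathrm{Log}\,r_V(\mu)$. Hence $c \ge -d^{-1}\,\mathrm{Log}\,r_V(\mu) > 0$. □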

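Proof of Corollary 2.3 The uniqueness of the $\bar P_\mu$-invariant measure $\nu \in M^1(P(V))$ will be shown later (Theorem 2.11). This uniqueness implies the uniform convergence of the sequence $\frac1n \sum_{k=0}^{n-1} \bar P_\mu^k \varphi$ to $\nu(\varphi)$, for any $\varphi \in C(P(V))$. We can write:

$$\int \mathrm{Log}|gx|\,d\mu^n(g) = \int \mathrm{Log}|g_n g_{n-1} \cdots g_1 x|\,d\mathbb{P}(\omega) = \sum_{k=1}^n \int \mathrm{Log}|g_k(g_{k-1} \cdots g_1.x)|\,d\mathbb{P}(\omega)$$

$$= \sum_{k=1}^n \int \mathrm{Log}|g(h.x)|\,d\mu(g)\,d\mu^{k-1}(h) = \sum_{k=1}^n \bar P_\mu^{k-1}\varphi(x) = \sum_{k=0}^{n-1} \bar P_\mu^k \varphi(x)$$

with $\varphi(x) = \int \mathrm{Log}|gx|\,d\mu(g)$. The condition $\int \mathrm{Log}|g|\,d\mu(g) < \infty$ implies the continuity of $\varphi$ on $P(V)$. From above we get the uniform convergence of $\frac1n \int \mathrm{Log}|gx|\,d\mu^n(g)$ to $\int \varphi(x)\,d\nu(x)$. The positivity of the limit follows from 2.2. □

Proof of Corollary 2.4 We observe that $m$ is preserved by any $\gamma \in SL(d, \mathbb{Z})$ and we can write $L^2(\mathbb{T}^d) = L^2(\mathbb{T}^d, m)$ as $L^2(\mathbb{T}^d) = \mathbb{C}1 \oplus L_0^2(\mathbb{T}^d)$. For any $\varphi \in L^2(\mathbb{T}^d)$ we have $T_\mu\varphi(x) = \mu * \varphi(x) = \sum_\gamma \varphi(\gamma^{-1}x)\,\mu(\gamma)$. We observe that the Fourier transform of $\varphi$, i.e. $\hat\varphi(k) = \langle \varphi, e_k\rangle$ where $e_k(x) = e^{2i\pi\langle k, x\rangle}$ and $k \in \mathbb{Z}^d$, gives an isometry $\varphi \to \hat\varphi$ from $L^2(\mathbb{T}^d)$ to $\ell^2(\mathbb{Z}^d)$, in which $L_0^2(\mathbb{T}^d)$ corresponds to $\ell^2(\mathbb{Z}^d \setminus \{0\})$. Also for $g \in SL(d, \mathbb{Z})$, we have $\widehat{\varphi \circ g} = \hat\varphi \circ g^*$. Clearly the stated hypothesis on $\langle \mathrm{supp}(\mu)\rangle$ carries over to $\langle \mathrm{supp}(\mu)\rangle^*$. If $\langle \mathrm{supp}(\mu)\rangle$ is strongly irreducible on $\mathbb{R}^d$, then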


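$\langle \mathrm{supp}(\mu) \rangle \subset SL(d, \mathbb{Z})$ is infinite, hence non-compact. Hence we can use Theorem 2.1 and Remark 2.6 to deduce that the spectral radius $r_0(\mu)$ of $T_\mu$ on $L^2_0(\mathbb{T}^d)$ satisfies $r_0(\mu) < 1$. $\square$

Proof of Corollary 2.5 We denote $u_n(\varepsilon) = \sup_{|x|=1} \int |gx|^{-\varepsilon} \, d\mu^n(g)$. Since $|gx| \ge |g^{-1}|^{-1} \ge |g|^{-(d-1)}$ the integral is finite for $\varepsilon(d-1) \le \tau$. We have $u_{m+n}(\varepsilon) \le u_m(\varepsilon) u_n(\varepsilon)$ because:
\[ \int |gx|^{-\varepsilon} \, d\mu^{m+n}(g) = \int\!\!\int |g(h.x)|^{-\varepsilon} |hx|^{-\varepsilon} \, d\mu^m(g) \, d\mu^n(h) \le \sup_{|y|=1} \int |gy|^{-\varepsilon} \, d\mu^m(g) \int |hx|^{-\varepsilon} \, d\mu^n(h) \le u_m(\varepsilon) u_n(\varepsilon). \]
Hence $\lim_{n\to\infty} |u_n(\varepsilon)|^{1/n} = \inf_n (u_n(\varepsilon))^{1/n}$. It follows that it suffices to show $u_n(\varepsilon) < 1$ for some $n \ge 1$. We write $|gx|^{-\varepsilon} = \exp(-\varepsilon \mathrm{Log}|gx|)$ and use the inequality $e^u \le 1 + u + u^2 e^{|u|}$, which gives
\[ |gx|^{-\varepsilon} \le 1 - \varepsilon \mathrm{Log}|gx| + (\varepsilon^2 \mathrm{Log}^2|g|) \, |g|^{(d-1)\varepsilon}, \]
\[ \int |gx|^{-\varepsilon} \, d\mu^n(g) \le 1 - \varepsilon \int \mathrm{Log}|gx| \, d\mu^n(g) + \varepsilon^2 \int |g|^{(d-1)\varepsilon} \mathrm{Log}^2|g| \, d\mu^n(g). \]
The last integral is bounded by $\left( \int (\mathrm{Log}|g|)^4 \, d\mu^n(g) \right)^{1/2} \left( \int |g|^{2\varepsilon(d-1)} \, d\mu^n(g) \right)^{1/2}$, hence is finite for any $n$ if $2\varepsilon(d-1) \le \tau$. Using Corollary 2.3, we fix $n$ such that $\int \mathrm{Log}|gx| \, d\mu^n(g) \ge c > 0$, hence
\[ \liminf_{\varepsilon \to 0^+} \frac{1 - u_n(\varepsilon)}{\varepsilon} \ge c > 0, \]
i.e. $u_n(\varepsilon) < 1$ for $\varepsilon \le \varepsilon_0$ small.

We observe that $\delta(\bar x, \bar y) = |x \wedge y|$, where $|x| = |y| = 1$, is also a distance on $P(V)$ which is equivalent to $\bar\delta$ since $\delta(\bar x, \bar y) = |\sin(x, y)|$. Also
\[ \delta(g.\bar x, g.\bar y) = \frac{\delta(\bar x, \bar y)}{|gx| \, |gy|} \]
since $g \in SL(V)$ with $\dim V = 2$. It follows:
\[ \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^n(g) = \int \frac{1}{|gx|^\varepsilon |gy|^\varepsilon} \, d\mu^n(g) \le \left( \int |gx|^{-2\varepsilon} \, d\mu^n(g) \right)^{1/2} \left( \int |gy|^{-2\varepsilon} \, d\mu^n(g) \right)^{1/2} \le \sup_{|x|=1} \int |gx|^{-2\varepsilon} \, d\mu^n(g). \]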
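The identity $\delta(g.\bar x, g.\bar y) = \delta(\bar x, \bar y) / (|gx||gy|)$ used in the last display relies on $\det g = 1$ in dimension 2, where $|gx \wedge gy| = |\det g| \, |x \wedge y|$. A quick numerical sanity check is sketched below; the sampler `random_sl2` and the other helper names are hypothetical and serve only this illustration.

```python
import math
import random

def apply(g, v):
    """Matrix-vector product for a 2x2 matrix g."""
    return (g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1])

def norm(v):
    return math.hypot(v[0], v[1])

def delta(x, y):
    """Distance between the lines spanned by x and y: |x ^ y| / (|x| |y|)."""
    return abs(x[0] * y[1] - x[1] * y[0]) / (norm(x) * norm(y))

def random_sl2(rng):
    """Hypothetical sampler of a matrix with determinant 1 (not Haar measure)."""
    a = rng.uniform(0.5, 2.0)
    b = rng.uniform(-2.0, 2.0)
    c = rng.uniform(-2.0, 2.0)
    return ((a, b), (c, (1.0 + b * c) / a))

def random_unit(rng):
    t = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(t), math.sin(t))

rng = random.Random(0)
max_err = 0.0
for _ in range(10000):
    g, x, y = random_sl2(rng), random_unit(rng), random_unit(rng)
    gx, gy = apply(g, x), apply(g, y)
    lhs = delta(gx, gy)                           # delta(g.x, g.y)
    rhs = delta(x, y) / (norm(gx) * norm(gy))     # delta(x, y) / (|gx| |gy|)
    max_err = max(max_err, abs(lhs - rhs))
print(max_err)
```

The printed discrepancy should be at the level of floating-point rounding; in dimension $d > 2$ the identity fails in general, which is why the extension to higher dimension goes through the cocycle $\sigma$.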

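The inequality $\rho(\varepsilon) < 1$ follows if $2\varepsilon \le \varepsilon_0$.

For any $\varphi \in H^\varepsilon(P(V))$ we have $\varphi(x) - \nu(\varphi) = \int (\varphi(x) - \varphi(y)) \, d\nu(y)$. It follows that $|\varphi - \nu(\varphi)|_\infty \le [\varphi]_\varepsilon$, hence $|\overline{P}_\mu^{\,n} \varphi|_\infty \le [\overline{P}_\mu^{\,n} \varphi]_\varepsilon$ if $\varphi \in H^\varepsilon_0(P(V))$, and the norms $\varphi \to \|\varphi\|$, $\varphi \to [\varphi]_\varepsilon$ on $H^\varepsilon_0(P(V))$ are equivalent. Also
\[ [\overline{P}_\mu^{\,n} \varphi]_\varepsilon \le [\varphi]_\varepsilon \sup_{\bar x, \bar y \in P(V)} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^n(g). \]
Since $|\overline{P}_\mu^{\,n} \varphi|_\infty \le [\overline{P}_\mu^{\,n} \varphi]_\varepsilon$ on $H^\varepsilon_0(P(V))$ we get
\[ \|\overline{P}_\mu^{\,n}\| \le 2 \sup_{\bar x, \bar y \in P(V)} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^n(g) \quad \text{on } H^\varepsilon_0(P(V)). \]
The last conclusion follows from the above inequality $\rho(\varepsilon) < 1$. $\square$

Corollary 2.5 has a natural extension to any dimension $d \ge 2$. The proof is based on a basic inequality between the two first Lyapunov exponents associated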


with $\mu \in M^1(GL(V))$ (see [18], [24]). We consider the ergodic dynamical system $(\Omega, \theta, \pi)$ where $\Omega = \widetilde{G}^{\mathbb{N}}$, $\theta$ is the shift, $\pi$ is a $\theta$-invariant probability, $f$ a $GL(d, \mathbb{R})$-valued function on $\Omega$ which satisfies $\int (\mathrm{Log}^+|f(\omega)| + \mathrm{Log}^+|f^{-1}(\omega)|) \, d\pi(\omega) < \infty$, and we write $S_n(\omega) = f(\theta^{n-1}\omega) \cdots f(\omega)$ for $n \ge 0$. Then the top Lyapunov exponent $L^1_f$ is defined by
\[ L^1_f = \lim_{n\to\infty} \frac{1}{n} \int \mathrm{Log}|S_n(\omega)| \, d\pi(\omega). \]
Using the wedge products $\Lambda^k V$ we define the other Lyapunov exponents $L^k_f$ by:
\[ L^1_f + \cdots + L^k_f = \lim_{n\to\infty} \frac{1}{n} \int \mathrm{Log}|\Lambda^k S_n(\omega)| \, d\pi(\omega). \]
Here we consider the two first Lyapunov exponents $L^1_\mu$, $L^2_\mu$ for products of independent random matrices with law $\mu$, i.e. $\pi = \mu^{\otimes \mathbb{N}} = \mathbb{P}$. We have the following result (see [24]).

Theorem 2.8. Let $\mu \in M^1(GL(d, \mathbb{R}))$ be such that $\mathrm{Log}\,\gamma(g)$ is $\mu$-integrable and the semigroup $[\mathrm{supp}(\mu)]$ of $GL(d, \mathbb{R})$ satisfies condition (I-P). Then $L^1_\mu > L^2_\mu$.

Condition (I-P) is satisfied, for example, if the projection of the group $\langle \mathrm{supp}(\mu) \rangle$ on $SL(d, \mathbb{R})$ is Zariski-dense in $SL(d, \mathbb{R})$ (see [18]). The geometrical meaning of the theorem is that for any $x, y \in V \setminus \{0\}$, the angle between the random vectors $S_n(\omega)x$, $S_n(\omega)y$ decreases $\mathbb{P}$-a.e. to zero at exponential speed $\exp(L^2_\mu - L^1_\mu) < 1$. The proof of this fact is based on a martingale argument due to H. Furstenberg and given below (see Theorem 2.11): if $\nu$ is a $\overline{P}_\mu$-invariant probability on $P(V)$ then the sequence of random measures $g_1 \cdots g_n . \nu$ on $P(V)$ converges weakly to a Dirac measure. The uniqueness of $\nu$ as a $\overline{P}_\mu$-invariant measure follows (see below). A useful algebraic tool for passing from ergodic properties of $\overline{P}_\mu$ to Lyapunov exponents is the following cocycle $\sigma$: if $\xi \in T^1$ is a unit tangent vector to $P(V)$ defined by $x \in V$ and $x \wedge y \in \wedge^2(V)$,
\[ \sigma(g, \xi) = \frac{|g(x \wedge y)| \, |x|^2}{|gx|^2 \, |x \wedge y|}. \]
Then, using the above convergence, one shows that, under condition (I-P) (see [24]):
\[ \lim_{n\to\infty} \frac{1}{n} \sup_{\xi \in T^1} \int \mathrm{Log}\,\sigma(g, \xi) \, d\mu^n(g) < 0. \]
The extension of Corollary 2.5 is as follows:

Theorem 2.9. Assume that $\mu \in M^1(\widetilde{G})$ satisfies $\int \gamma^\tau(g) \, d\mu(g) < \infty$ for some $\tau > 0$ and the semigroup $[\mathrm{supp}(\mu)]$ satisfies (I-P). Then for $\varepsilon$ sufficiently small one has:
\[ \rho(\varepsilon) = \lim_{n\to\infty} \left( \sup_{\bar x, \bar y} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^n(g) \right)^{1/n} < 1. \]
In particular the spectral radius of $\overline{P}_\mu$ acting on $H^\varepsilon_0(P(V)) = \{\varphi \in H^\varepsilon(P(V)) \,;\, \nu(\varphi) = 0\}$ is at most $\rho(\varepsilon) < 1$.

In order to deduce limit theorems for products of random matrices (see [42]), we are led to consider the "Fourier operators" $\overline{P}^{\,it}_\mu$ on $H^\varepsilon(P(V))$ defined by
\[ \overline{P}^{\,it}_\mu \varphi(\bar x) = \int |gx|^{it} \varphi(g.\bar x) \, d\mu(g). \]
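The geometrical meaning of Theorem 2.8 stated above (exponential decay of the angle between $S_n(\omega)x$ and $S_n(\omega)y$) can be observed on a toy example. The following sketch makes several assumptions that are not in the text: the law $\mu$ is uniform on two integer matrices of $SL(2, \mathbb{Z})$, the starting vectors are the canonical basis, and the names `apply` and `log_delta` are hypothetical. In $SL(2, \mathbb{R})$ one has $L^1_\mu + L^2_\mu = 0$, so the expected decay rate is $L^2_\mu - L^1_\mu = -2L^1_\mu$. Integer arithmetic keeps the wedge product exact.

```python
import math
import random

# Hypothetical example law mu: uniform on two matrices of SL(2, Z).
A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))

def apply(g, v):
    """Exact integer matrix-vector product."""
    return (g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1])

def log_delta(x, y):
    """Log of |x ^ y| / (|x| |y|), with the wedge product computed exactly."""
    wedge = abs(x[0] * y[1] - x[1] * y[0])
    return math.log(wedge) - math.log(math.hypot(*x)) - math.log(math.hypot(*y))

rng = random.Random(0)
x, y = (1, 0), (0, 1)
n = 400
for _ in range(n):
    g = rng.choice((A, B))
    x, y = apply(g, x), apply(g, y)     # S_n = g_n ... g_1 acts on the left

rate = log_delta(x, y) / n              # empirical exponent of the angle decay
growth = math.log(math.hypot(*x)) / n   # empirical value of (1/n) Log|S_n x|
print(rate, growth)
```

On a single trajectory `rate` should be close to `-2 * growth`, consistent with the speed $\exp(L^2_\mu - L^1_\mu)$; this is a one-path experiment, not an estimate with error bars.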


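Corollary 2.10. We denote for any $k \ge 0$, $\varepsilon > 0$:
\[ C_k(\varepsilon) = \int |g|^\varepsilon \, d\mu^k(g), \qquad \rho_k(\varepsilon) = \sup_{\bar x, \bar y \in P(V)} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^k(g). \]
Then, for any $t \in \mathbb{R}$, $\varphi \in H^\varepsilon(P(V))$, $k \ge 0$:
\[ [(\overline{P}^{\,it}_\mu)^k \varphi]_\varepsilon \le \rho_k(\varepsilon) [\varphi]_\varepsilon + |\varepsilon^{-1} t| \, C_k(\varepsilon) |\varphi|_\infty. \]
In particular for $\varepsilon$ small and $k$ large the operator $\overline{P}^{\,it}_\mu$ satisfies the (D-F) inequality
\[ [(\overline{P}^{\,it}_\mu)^k \varphi]_\varepsilon \le \rho'(\varepsilon) [\varphi]_\varepsilon + |\varepsilon^{-1} t| \, C'(\varepsilon) |\varphi|_\infty \]
with $\rho'(\varepsilon) < 1$, $C'(\varepsilon) < \infty$.

Proof of Theorem 2.9 We have, if $|x| = |y| = 1$,
\[ \frac{\delta(g.\bar x, g.\bar y)}{\delta(\bar x, \bar y)} = \frac{|gx \wedge gy|}{|gx| \, |gy| \, |x \wedge y|}. \]
Hence, by the Schwarz inequality,
\[ \left( \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^n(g) \right)^2 \le \int \left( \frac{|gx \wedge gy|}{|gx|^2 |x \wedge y|} \right)^{\varepsilon} d\mu^n(g) \int \left( \frac{|gx \wedge gy|}{|gy|^2 |x \wedge y|} \right)^{\varepsilon} d\mu^n(g). \]
Hence it suffices to show
\[ \lim_{n\to\infty} \left( \sup_{\xi \in T^1} \int \sigma^\varepsilon(g, \xi) \, d\mu^n(g) \right)^{1/n} < 1. \]
The proof is the same as for Corollary 2.5, with $\sigma^\varepsilon(g, \xi)$ in place of $|gx|^{-\varepsilon}$, once we have $\lim_{n\to\infty} \frac{1}{n} \sup_{\xi \in T^1} \int \mathrm{Log}\,\sigma(g, \xi) \, d\mu^n(g) < 0$ which, as explained above, follows from $L^1_\mu > L^2_\mu$. The spectral gap assertion on $H^\varepsilon_0(P(V))$ is proved as in Corollary 2.5. $\square$

Proof of Corollary 2.10 We use the following mean value inequality for $|x| = |y| = 1$, $t \in \mathbb{R}$, $\varepsilon \in ]0, 1]$:
\[ \big| |gx|^{it} - |gy|^{it} \big| \le |\varepsilon^{-1} t| \, |g|^\varepsilon |x - y|^\varepsilon. \]
Hence, with $\bar x, \bar y \in P(V)$, $|x| = |y| = 1$:
\[ \big| |gx|^{it} - |gy|^{it} \big| \le |\varepsilon^{-1} t| \, |g|^\varepsilon \, \bar\delta^\varepsilon(\bar x, \bar y), \]
\[ \big| |gx|^{it} \varphi(g.\bar x) - |gy|^{it} \varphi(g.\bar y) \big| \le |\varphi|_\infty \big| |gx|^{it} - |gy|^{it} \big| + [\varphi]_\varepsilon \, \bar\delta^\varepsilon(g.\bar x, g.\bar y). \]
The first inequality follows by integration with respect to $\mu^k$. Let $\varepsilon > 0$ with $0 < 2\varepsilon < \varepsilon_0$ as in Corollary 2.5. Since
\[ \lim_{k\to\infty} \left( \sup_{\bar x, \bar y} \int \frac{\delta^\varepsilon(g.\bar x, g.\bar y)}{\delta^\varepsilon(\bar x, \bar y)} \, d\mu^k(g) \right)^{1/k} < 1, \]
we can choose $k > 0$ with $\rho'(\varepsilon) = \rho_k(\varepsilon) < 1$. Taking $\varepsilon$ with $\varepsilon < \tau$, we have $C_k(\varepsilon) < \infty$. Hence the (D-F) inequality follows. $\square$

Theorem 2.11. Assume $\mu \in M^1(\widetilde{G})$, $\nu \in M^1(P(V))$ is $\overline{P}_\mu$-invariant and $[\mathrm{supp}(\mu)]$ satisfies condition (I-P). Then the sequence $\sigma_n.\nu = g_1 g_2 \cdots g_n . \nu$ converges $\mathbb{P}$-a.e. to a Dirac measure $\delta_{z(\omega)}$. In particular $\nu$ is unique as a $\overline{P}_\mu$-invariant probability and gives zero mass to any proper subspace of $P(V)$.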
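The contraction of $\sigma_n = g_1 g_2 \cdots g_n$ towards a single direction in Theorem 2.11 can be visualised on a toy example. The sketch below is only an illustration under assumptions not found in the text: $\mu$ is uniform on two nonnegative matrices of $SL(2, \mathbb{Z})$, and instead of a measure $\nu$ we follow a few hypothetical probe directions in the positive quadrant. Note the right multiplication, which matches $\sigma_n = g_1 \cdots g_n$.

```python
import math
import random

# Hypothetical example law mu: uniform on two nonnegative matrices of SL(2, Z).
A = ((1, 1), (0, 1))
B = ((1, 0), (1, 1))

def matmul(m, n):
    """Exact integer product m n of two 2x2 matrices."""
    return (
        (m[0][0] * n[0][0] + m[0][1] * n[1][0], m[0][0] * n[0][1] + m[0][1] * n[1][1]),
        (m[1][0] * n[0][0] + m[1][1] * n[1][0], m[1][0] * n[0][1] + m[1][1] * n[1][1]),
    )

def direction(m, v):
    """Angle of the line spanned by m v (here always in [0, pi/2])."""
    w = (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])
    return math.atan2(w[1], w[0])

rng = random.Random(2)
probes = [(1, 0), (0, 1), (1, 1), (3, 1), (1, 2)]   # directions in the positive quadrant
sigma = ((1, 0), (0, 1))
spreads = []
for n in range(1, 61):
    sigma = matmul(sigma, rng.choice((A, B)))       # sigma_n = g_1 ... g_n
    angles = [direction(sigma, v) for v in probes]
    spreads.append(max(angles) - min(angles))
print(spreads[0], spreads[-1])
```

Because the images of the positive quadrant under $\sigma_n$ are nested, the angular spread of the probes is nonincreasing in $n$, and on a typical trajectory it shrinks exponentially fast, in line with the convergence of $\sigma_n.\nu$ to a Dirac measure.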


The following corollaries will play an essential role in the proofs of the local limit theorem for $\mathrm{Log}|S_n(\omega)x|$.

Corollary 2.12. Let $\Delta$ be a closed subsemigroup of $\widetilde{G}$ which satisfies condition (I-P). Then the action of $\Delta$ on $P(V)$ has a unique minimal subset $\Lambda(\Delta)$ and $\Lambda(\Delta)$ is not contained in a countable union of proper subspaces. Furthermore $\Lambda(\Delta)$ is the closure of the set of attractive points of $\Delta$.

The set $\Lambda(\Delta)$ is called the limit set of $\Delta$.

Corollary 2.13. With the notations and hypothesis of Corollary 2.12 and $d \ge 2$, let $A \subset \Delta$ with $[A] = \Delta$, and assume that $\varphi \in C(P(V))$ satisfies, for some $t \in \mathbb{R}$, $\theta \in [0, 2\pi[$ and any $\gamma \in A$, $\bar x \in \Lambda(\Delta)$,
\[ |\gamma x|^{it} \varphi(\gamma.\bar x) = e^{i\theta} \varphi(\bar x). \]
Then $t = 0$, $e^{i\theta} = 1$, $\varphi = $ const.

This corollary (see Proposition 3 in [24]) is useful in the verification of the aperiodicity conditions in the proofs of the local limit theorems. In view of its essential role in limit theorems we give a detailed proof below.

Corollary 2.14. Assume $\Gamma \subset GL(V)$ is a subsemigroup which satisfies (I-P), $d \ge 2$ and define $S_\Gamma = \{\mathrm{Log}|\lambda_g| \,;\, g \in \Delta^{prox}\}$. Then $S_\Gamma$ generates a dense subgroup of the additive group $\mathbb{R}$.

This property plays an essential role in the study of renewal theorems for products of random matrices (see [31]), and also in some geometric situations (see [7]). For the proof of Theorem 2.11 we use the following lemmas.

Lemma 2.15. Let $\varphi \in C(P(V))$ and denote $f_n(\omega) = \sigma_n.\nu(\varphi)$, $f^r_n(\omega, \eta) = \sigma_n \eta_1 \cdots \eta_r . \nu(\varphi)$ for $(\omega, \eta) \in \Omega \times \Omega$. Then for $r$ fixed the sequence $f^r_n(\omega, \eta) - f_n(\omega)$ converges $\mathbb{P} \times \mathbb{P}$-a.e. to zero.

Lemma 2.16. For any proper subspace $W$ of $P(V)$ one has $\nu(W) = 0$.

Proof of Lemma 2.15 Since $g \to g.\nu(\varphi)$ is $\mu$-harmonic, $f_n(\omega)$ is a martingale and we have for any $r \ge 0$:
\[ \mathbb{E}(f_{n+r} - f_n)^2 = \mathbb{E}(f_{n+r}^2) - \mathbb{E}(f_n^2). \]
Hence for any $N > 0$, $\sum_{n=1}^{N} \mathbb{E}(f_{n+r}^2 - f_n^2) \le 2r |\varphi|_\infty^2$. Then the series $\sum_{n=1}^{\infty} \mathbb{E}(f_{n+r}^2 - f_n^2)$ is convergent. Also:
\[ \mathbb{E}(f_{n+r}^2 - f_n^2) = \int |f^r_n(\omega, \eta) - f_n(\omega)|^2 \, d(\mathbb{P} \otimes \mathbb{P})(\omega, \eta). \]
Hence the series $\sum_{n=1}^{\infty} |f^r_n(\omega, \eta) - f_n(\omega)|^2$ is $\mathbb{P} \otimes \mathbb{P}$-a.e. convergent. In particular for $r$ fixed and $\mathbb{P}$-a.e. $\eta = (\eta_1, \cdots, \eta_r)$:
\[ \lim_{n\to\infty} |f^r_n(\omega, \eta) - f_n(\omega)| = 0. \qquad \square \]

Proof of Lemma 2.16 We consider the set $\Phi$ of proper projective subspaces $H$ of $P(V)$ of minimal dimension $r$ such that $\nu(H) > 0$. Clearly if $H, H' \in \Phi$


with $H \ne H'$, then $\nu(H \cap H') = 0$. It follows that for any $\varepsilon > 0$ the set $\Phi_\varepsilon = \{H \in \Phi \,;\, \nu(H) \ge \varepsilon\}$ is finite. In particular, there exists $H_0 \in \Phi$ with $\nu(H_0) = \sup\{\nu(H) \,;\, H \in \Phi\}$ and the set $\Phi' = \{H \in \Phi \,;\, \nu(H) = \nu(H_0)\}$ is finite. Since $g^{-1}H_0$ and $H_0$ have the same dimension $r$, the equation $\nu(H_0) = \int (g.\nu)(H_0) \, d\mu(g)$ implies $\nu(g^{-1}H_0) = \nu(H_0)$ for $\mu$-almost every $g$. Hence $g^{-1}H_0 \in \Phi'$, $g^{-1}\Phi' \subset \Phi'$ for any $g \in \mathrm{supp}(\mu)$. Since $\Phi'$ is finite this contradicts strong irreducibility of $[\mathrm{supp}\,\mu]$. Hence $\Phi = \emptyset$. $\square$

Proof of Theorem 2.11 Using Lemma 2.15 we know that the sequence $f_n(\omega)$ is a bounded martingale. The martingale convergence theorem implies the $\mathbb{P}$-a.e. convergence of $f_n(\omega)$. Since $C(P(V))$ is separable, we get the weak convergence of $\sigma_n.\nu$ to the random probability $\nu_\omega$. We denote by $\Omega' \subset \Omega$ the set of $\omega$ such that $\sigma_n.\nu$ converges weakly and such that the convergence in Lemma 2.15 for any $\varphi \in C(P(V))$ takes place with $\mathbb{P} \otimes \mathbb{P}$-probability 1. Then $\mathbb{P}(\Omega') = 1$ and for any $\omega \in \Omega'$, and for $\mathbb{P}$-a.e. $\eta$:
\[ \lim_{n\to\infty} \sigma_n \eta_1 \cdots \eta_r . \nu = \lim_{n\to\infty} \sigma_n . \nu. \]
Let $n_k(\omega)$ be a subsequence such that $|\sigma_{n_k}|^{-1} \sigma_{n_k}$ converges to a linear map $\tau_\omega$ with kernel $H_\omega$, hence the quasi projective map $\tau_\omega$ is defined and continuous on $P(V) \setminus H_\omega$. Then by Lemma 2.16, $\nu$ and $\eta_1 \cdots \eta_r . \nu$ give zero mass to any proper subspace, hence $\tau_\omega(\eta_1 \cdots \eta_r . \nu) = \tau_\omega . \nu = \nu_\omega$. This equality is valid for a dense subset of $(\mathrm{supp}(\mu))^{\otimes r}$, hence for any $\gamma \in (\mathrm{supp}(\mu))^r$: $\tau_\omega(\gamma.\nu) = \tau_\omega.\nu$. Then for any $\gamma$ in $[\mathrm{supp}(\mu)]$ we have $\tau_\omega.(\gamma.\nu) = \nu_\omega = \tau_\omega.\nu$. Let $\gamma_1$ be a proximal element of $[\mathrm{supp}(\mu)]$ with $\gamma_1^+ = x$, and $\gamma_0 \in [\mathrm{supp}(\mu)]$ with $\gamma_0.x \notin H_\omega$ ($\gamma_0$ exists by strong irreducibility of $[\mathrm{supp}(\mu)]$, see [19]). Then taking $\gamma = \gamma_0 \gamma_1^n \in [\mathrm{supp}(\mu)]$ we get $\lim_{n\to\infty} \gamma_0 \gamma_1^n . \nu = \gamma_0 . \delta_x$. The continuity of $x \to$

τω .x outside Hω gives finally : τω .ν = νω = τω (γ0 .δx ) = δτω γ0 .x . It follows that for ω ∈ Ω , νω is a Dirac measure δz(ω) and τω (P(V ) \ Hω ) = z(ω). The martingale convergence theorem implies that the P-law of z(ω) is ν. Since τω .ν = δz(ω) and τω is independant of ν, this gives the uniqueness of ν.  9 be such that [supp(μ)] = Δ, Proof of Corollary 2.12 Let μ ∈ M 1 (G), 1 and ν ∈ M (P(V )) be the unique P μ -invariant probability. If F is a closed Δinvariant subset of P(V ), then Kakutani’s fixed point theorem implies that there exists a P μ -invariant probability η ∈ M 1 (F ) ; the theorem gives η = ν, hence F ⊃ supp(η) = supp(ν). The first assertion follows. If supp(ν) is contained in a countable union of subspaces, then at least one of them has positive measure, which contradicts Theorem 2.11. The last assertion follows from the minimality of Λ(Δ).  ˜ Proof of Corollary 2.13 Let Λ(Δ) be the inverse image of Λ(Δ) in V \ {0} ˜ and let ϕ˜ be the function on Λ(Δ) defined by ϕ(v) ˜ = ϕ(¯ v )|v|it where we can assume ˜ hence t ≥ 0. Then the relation satisfied by ϕ can be written as ϕ(γv) ˜ = eiθ ϕ(v), irθ r 2πt ˜ if δ ∈ A , r ∈ N. Assume t > 0, hence ρ = e > 1. We have ϕ(δv) ˜ = e ϕ(v) k also ϕ(λv) ˜ = |λ|it ϕ(v) ˜ and in particular ϕ(±ρ ˜ v) = ϕ(v), ˜ k ∈ Z. We observe that, since Δ acts minimally on Λ(Δ) and ϕ is continuous, the invariance condition on ϕ implies |ϕ| =const, hence we can assume |ϕ| = 1. It follows that the condition ˜ ϕ(λv) ˜ = ϕ(v) ˜ for some v ∈ Λ(Δ) is equivalent to λ = ±ρk (k ∈ Z). We define −1 ˜ Λc ⊂ Λ(Δ) by Λc = ϕ˜ (c) where c is a value of ϕ˜ hence Λc = ±ρZ Λc = φ.


It follows that Λc is closed as well as its projection in Λ(Δ). We note that for β ∈ R, one has Λceiβ = exp(βt−1 )Λc , γ(Λc ) ⊂ Λceiθ for γ ∈ A, hence δ(Λc ) ⊂ exp(rθt−1 )Λc if δ ∈ Ar . Then the above projection of Λc in Λ(Δ) is Δ-invariant, hence equal to Λ(Δ) by minimality. Let γk be sequence in Δ and nk ∈ Z such that u = lim ρ−nk γk = 0. Then, since γk (Λc ) ⊂ Λceiθk and ±ρZ Λc = Λc , we have also k→∞

u(Λc ) ⊂ Λceiα U {0} with eiα = lim eiθk , passing if necessary to a subsequence. k→∞

Let δ ∈ Ar be a proximal element and γk = δ k , hence Keru = W = {0} has codimension 1 and Imu = Rvδ for some vδ = 0. From the above we get u(Λc ) ⊂ ±ρZ vδ U {0}. Since u−1 ({vδ }) = a + W with a = 0, we get Λc \ W ⊂ ±ρZ (a + W ), hence Λc has a countable projection in V /W . Let Fc be the family of subspaces W  in V such that Λc has a countable projection in V /W  . We observe that the condition Λceiβ = exp(βt−1 )Λc gives Fceiβ = Fc for β ∈ R. Also if W1 , W2 ∈ Fc , then W1 ∩ W2 ∈ F ; this follows since V /W1 ∩ W2 is a subspace of V /W1 × V /W2 . Then Fc has a minimum element Wc = ∩ W  . Since Fc = Fceiθ , we have W ∈Fc

Wc = Wceiθ , hence the condition γΛc ⊂ Λceiθ for γ ∈ A implies γ(Wc ) ⊂ Wc . Since the projection of Λc in P(V ) is Λ(Δ) and, since dimV ≥ 2, we know from the theorem that Λ(Δ) is uncountable, it follows that Wc = {0}. This contradicts irreducibility of Δ. Hence we have t = 0, ϕ(x) = eiθ ϕ(γ.x) for any x ∈ Λ(Δ), γ ∈ A. Taking γ proximal and x = γ + we get eiθ = 1 since |ϕ(x)| = 1, hence ϕ =const by minimality of the Δ-action on Λ(Δ).  We need the following lemmas, the first of them is well known (see [5] p 90–94, Lemma 2.16). The second one is basic. Lemma 2.17. Let A be a finite set, Ω the compact metric space AN and θ the n−1

shift on Ω. For a function ϕ on Ω we denote Sn (ω) = Σ ϕ ◦ θ k (ω). Suppose ϕ k=0

is H¨ older continuous and for any periodic point ω of period p, Sp ϕ(ω) ∈ Z. Then older function ψ on Ω such there exists a H¨ older Z-valued function ϕ on Ω and a H¨  that ϕ = ϕ + ψ − ψ ◦ θ. If for any p-periodic ω we have Sp ϕ(ω) = 0, then ϕ is of the form ψ − ψ ◦ θ with ψ H¨ older on Ω. / Vh< . Lemma 2.18. Assume g, h ∈ GL(V ) are such that h is proximal and g.h+ ∈ Then for n = 2p large, g hn is proximal and: lim (g hn )+ = g.h+ ,

n→∞

lim Vg 1 and the semigroup generated by supp(μ) satisfies condition (I-P). Then for any s ∈ Iμ , there exists a unique probability measure ν s on P(V ), a unique positive continuous function es ∈ C(P(V )) with ν s (es ) = 1, Pμs ν s = k(s)ν s , Pμs es = k(s)es . For s ∈ Iμ , if |g|s γ δ (g)dμ(g) < ∞ for some δ > 0, then for ε sufficiently small we have on H ε (P(V )) : Pμs = k(s)(ν s ⊗ es + U s ) where ν s ⊗es is the projection on Ces defined by ϕ → ν s (ϕ)es and U s is an operator with spectral radius less than 1 which satisfies U s (ν s ⊗ es ) = (ν s ⊗ es )U s = 0. Furthermore the function k(s) is analytic, strictly convex on ]0, s∞ [ and the function s → ν s ⊗ es from ]0, s∞ [ to End(H ε (P(V )) is analytic. If t = Imz = 0, the spectral radius of Pμz is less than k(s). The proof follows the same lines as in Theorem 2.9. We give below corresponding indications. The following lemmas and Propositions are basic. and Lemma 2.21. Assume σ ∈ M 1 (P(V )) is not supported by an hyperplane  s > 0. There there exists Cs (σ) > 0 such that for any u ∈ End V : |ux|s dσ(x) ≥ Cs (σ)|u|s Proof: By homogeneity it suffices to show the above inequality if |u| = 1.  The function u → |ux|s dσ(x) is continuous on the unit sphere Σ of End V . SinceΣ is compact, this function reaches its infimum Cs (σ) at u0 ∈ Σ. If Cs (σ) = 0, then |u0 x|dσ(x) = 0, hence supp(σ) ⊂ Ker u0 , which contradicts the hypothesis on σ.  Lemma 2.22. If s ∈ Iμ , there exists σ ∈ M 1 (P(V )), k > 0 such that Pμs σ = kσ. We have k(s) = k and σ is not supported on a hyperplane. Furthermore for any n ∈ N:   |g|s dμn (g) ≥ kn (s) ≥ Cs (σ)

|g|s dμn (g)


Proof: We consider the non linear operator P/s on M 1 (P(V )) defined by P/s σ = Since s ∈ Iμ this operator is continuous on the compact convex set M (P(V )) endowed with the weak topology, hence Schauder-Tychonov theorem implies the existence of k > 0 and σ ∈ M 1 (P(V )) with P/s σ = σ, Pμs σ = kσ, hence k = (Pμs σ)(1).  For such a σ the equation kσ(ϕ) = ϕ(g.x)|g x ˜|s dμ(g)dσ(x) implies g.x ∈ supp(σ), μ − a.e if x ∈ supp(σ), hence for any g ∈ supp(μ) : g.supp(σ) ⊂ supp(σ). Then the projective subspace W generated by supp(σ) is [supp(μ)]-invariant and if  W is proper this contradicts condition (I-P). Since k ≤ ( |g|s dμn (g))1/n the two sided inequality for k follows from Lemma 2.21, hence k = k(s).  (Pμs σ(1))−1 P s σ. 1

The dual version σ  of the measure σ considered above is useful to construct a as follows.

Pμs -eigenfunction

Lemma 2.23. Assume σ  ∈ M 1 (P(V ∗ )) and k > 0 satisfy ∗ P sμ σ  = k σ  . Then k = k(s) and the function es0 on P(V ) defined by:  x, y˜.|s d σ  (y), es0 (x) = |-˜ 

older where |˜ x| = |˜ y | = 1, satisfies Pμs es0 = k(s)es0 . Furthermore es0 is positive and H¨ of order s¯ = inf(s, 1)   Proof: Since |g ∗ | = |g|, we have |g|s dμn (g) = |g ∗ |s dμn (g), hence using the dual version of Lemma 2.22 we get k = k(s). Also:  x, g ∗ .˜ y .|s |g ∗ y˜|s dσ  (y), ∗ P sμ σ  = k σ  |g x ˜|s es0 (g.x) = |-˜  x, y˜.|s dσ  (˜ hence, Pμs es0 (x) = k |-˜ y ) = k(s)es0 (x). If es0 (x) = 0 for some x, then   -˜ x, y˜. = 0 σ − a.e, hence supp(σ ) is contained in the hyperplane of P(V ∗ ) orthogonal to x. Since [supp(μ)] satisfies (I-P), this contradicts the dual version of Lemma older property of es0 follows from the inequality 2.22. Hence es0 is positive. The H¨ s s s¯ |a − b | ≤ s/|a − b| where a, b ∈ [0, 1].  ˜|s es0 (g.x)(es0 (x))−1 , q s (x, g) = q1s (x, g) For s ∈ Iμ we define qns (x, g) = k−n (s)|g x and we observe that Lemma 2.23 implies qns (x, g)dμn(g) = 1. Also we consider the Markov operator Qs on P(V ) defined by Qs ϕ(x) = ϕ(g.x)q s (x, g)dμ(g) hence  s n (Q ) ϕ(x) = ϕ(g.x)qns (x, g)dμn (g). For z = s + it, t ∈ R, s ∈ Iμ we write:  ˜|it dμ(g). (Qz ϕ)(x) = ϕ(g.x)q s (x, g)|g x Then we have the following simple lemmas which allow to control the iterates (Qs )n . Lemma 2.24. If f ∈ C(P(V )) is real valued and satisfies Qs f ≤ f , then f is constant on Λ([suppμ]) and equal to its infimum on P(V ). Lemma 2.25. For any g ∈ SL(V ) x, y ∈ V with |x| = |y| = 1: ˜ ˜ y), ||gx|s − |gy|s | ≤ (s + 1)|g|s δ˜s¯(x, y), δ(g.x, g.y) ≤ 2|g||gx|−1 δ(x, |q s (x, g) − q s (y, g)| ≤ bs |g|s k−1 (s)δ˜s¯(x, y) with bs > 0.


To go further, in analogy with the case s = 0 in Theorem 2.9, we need to / N , θ is the shift and Qs is consider a dynamical system (Ω, θ, Qs ) where Ω = G a Markov measure defined as follows. We denote by π s a Qs -invariant extremal probability on P(V ), we observe that with 9 ⊗n , Sn (ω) = gn · · · g1 ∈ G 9 ω = (g1 , · · · , gn ) ∈ Ωn = G we have



n

qns (x, Sn (ω)) = Π q s (sk−1 .x, gk ), k=1

qns (x, Sn (ω))dμ⊗n (ω) = 1.

Definition 2.26. We denote by Qsx ∈ M 1 (Ω) the limit of the projective system  of probabilities qns (x, .)μ⊗n on Ωn , and we write Qs = Qsx dπ s (x). θ on Ω, we write Es for the We observe that Qs is invariant under  s the shift s s corresponding expectation : E (ϕ) = Ex (ϕ)dπ (x) where Esx is the expectation with respect to Qsx . Since π s is extremal, it follows that Qs is θ-ergodic. Using the above Lemmas it is easy to show the following. Proposition 2.27. There exists a constant b > 0 such that for any x, y ∈ P(V ) the total variation measure of Qsx −Qsy is bounded by bδ s¯(x, y)Qs . For any x ∈ P(V ), Qsx is equivalent to Qs . Using the Markov operator ∗ Qs on P(V ∗ ), given by  ∗ s Q ϕ(x) = ∗ q s (x, g)ϕ(g ∗ .x)dμ(g) where ∗ q s is the dual analogue of q s , the corresponding ∗ Qs -invariant measure ∗ s π and function ∗ es0 on P(V ∗ ), one obtains a ∗ P sμ -eigenmeasure by the formula ∗ s ν = (∗ es0 )−1 ∗ π s . Then the positive kernel x → νxs from P(V ) to P(V ∗ ) given  s s ∗ s ν is continuous in variation and satisfies g ∗ .νg.x q s (x, g)dμ(g) = by νxs = |x,.| es0 (x) s ∗ s s νx , hence Sn .νSn .x is a bounded martingale with respect to Qx and the natural filtration. Then, in analogy with Theorem 2.11 we have the following. Proposition 2.28. The martingale Sn∗ .νSs n .x converges Qsx − a.e to a Dirac measure δzs∗ (ω) , the law of zs∗ (ω) is ∗ π s , ∗ π s is proper, g1∗ .zs∗ (θω) = z ∗ (ω) Qs − a.e and if < x, z ∗ (ω) > = 0, then lim σ(Sn , ξ) = 0 Qs − a.e. In particular, ∗ π s is the n→∞

unique ∗ Qs -invariant probability.

Then we deduce the following (see [31]). Theorem 2.29. The Lyapunov exponents L1μ (s), L2μ (s) of Sn (ω) satisfy L1μ (s) > L2μ (s). One has, using the same arguments, the analogues of Theorem 2.9, Corollary 2.10. (See [31], Theorem 3.17).  ˜ satisfies |g|s γ δ (g)dμ(g) < ∞ for Theorem 2.30. Assume that μ ∈ M 1 (G) some δ > 0, s ∈ Iμ and [suppμ] satisfies (I-P). Then for ε sufficiently small one has   ε 1/n δ (Sn .x, Sn .y) ρs (ε) = lim sup Esx 1, the semigroup [supp(μ)] satisfies condition (I-P) and for some δ > 0 γ δ (g)dμ(g) < ∞. Then there exists σ > 0 such that for any ϕ ∈ C(P(V )), x ∈ P(V ) and ψ compactly supported on R and Riemann integrable, we have:  √ lim σ 2πn ϕ(g.x) ψ(Log|gx| − nL1μ )dμn (g) = ν(ϕ)(ψ). n→∞

Remark 3.2. In [19] the central limit theorem for the diagonal component of Sn in the Cartan decompostion of SL(V ) is proved, under the condition of Zariski density of [supp(μ)]. In ([29], Theorem 3) this result is extended to a semi-simple group. The Gaussian law has full dimension and the proof is based on the fact that the set {λγ ; γ ∈ [supp(μ)]prox } is large a property which is stronger than the density property stated in corollary 2.14. This allows one to get a multidimensional local limit theorem which extends Theorem 3.1, under the condition that [supp(μ)] has a Zariski dense projection in SL(V ). Using the operators Qs+it (s ≥ 0, t ∈ R) and the probability Qsx of section 2 instead of P μ , Qit μ and P we get in the same way, using Theorem 2.30 and the ”anomalous” centering by L1μ (s) for s > 0: Theorem 3.3.  Assume d > 1, [supp(μ)] satisfies condition (I-P) and for some δ > 0, s ∈ Iμ , |g|s γ δ (g)dμ(g) is finite. Then there exists σ s > 0 such that for any ϕ ∈ C(P(V )), x ∈ P(V ) and any compactly supported function ψ on R which is Riemann integrable we have  √ lim σ s 2πn kn (s) |gx|s ψ[Log|gx| − n L1μ (s)] ϕ(g.x)dμn (g) n→∞

= (ψ)es (x)ν s ((es )−1 ϕ).


This gives precise large deviations for the "anomalous mean values" $\gamma_s = L^1_\mu(s)$ of $\mathrm{Log}|S_n(\omega)x|$. This corresponds to estimating the probability that $\mathrm{Log}|S_n(\omega)x|$ belongs to an interval of fixed length centered at $n\gamma_s$, and gives exponential speed of convergence. One can show, using Theorem 2.9 and perturbation theory, that there exists $s_1 < 0$ such that Theorem 3.3 remains valid for $s \in ]s_1, s_\infty[$. We recall that $L^1_\mu(s) = k'(s) k^{-1}(s)$, hence the strict convexity of $\mathrm{Log}\,k(s)$ implies that for $s \ge s_1$, $L^1_\mu(s)$ is increasing and takes only once any value between $L^1_\mu(s_1)$ and $\lim_{s \to s_\infty} L^1_\mu(s)$. However, in general for $s < 0$, the function $k(s)$ as defined above loses its analyticity at a negative value of $s$, hence the analogue of Theorem 3.3 is no longer valid for any $s \in \mathbb{R}$ and a general $\mu$. For an analogous statement see [43].

We consider now the following map $\theta$ of the circle $\mathbb{T} = \{|z| = 1\} \subset \mathbb{C}$ defined by $\theta(z) = z^2$ and the $\mathbb{C}$-valued function $f(z) = z$. We are interested in the asymptotic properties of the Birkhoff sums
\[ S_n(z) = \sum_{k=0}^{n-1} z^{2^k} = \sum_{k=0}^{n-1} f(\theta^k z). \]
Following [23], we sketch the proof of the following local limit theorem in $\mathbb{C}$:

Theorem 3.4. There exists a positive number $\sigma > 0$ such that, for any bounded Borel set $B \subset \mathbb{C}$ with negligible boundary:
\[ \lim_{n\to\infty} 2\pi n \sigma \, m\{z \in \mathbb{T} \,;\, S_n(z) \in B\} = \ell(B) \]
where $\ell(B)$ is the Lebesgue measure of $B$. In particular for $m$-a.e. $z \in \mathbb{T}$ the sums $S_n(z)$ are dense in $\mathbb{C}$.

Proof: We transfer the situation to the interval $I = [0, 1]$, the map $\theta(x) = \{2x\}$ and the function $f(x) = e^{2i\pi x} = (\cos 2\pi x, \sin 2\pi x)$, replacing $m$ by Lebesgue measure $m$ on $I$. Then the adjoint $Q$ of $\theta$ with respect to $m$ is given by $Q\varphi(x) = \frac{1}{2}(\varphi(ax) + \varphi(bx))$ where $ax = \frac{1}{2}x$, $bx = \frac{1}{2}(x+1)$. We write $x_k = a_k \cdots a_1 x$ ($a_i = a, b$) for a trajectory of the Markov chain on $I$ defined by $Q$ and $\overline{S}_n(x) = \sum_{k=0}^{n-1} f(x_k)$. By stationarity the laws of $S_n(z)$ and $\overline{S}_n(x)$ are the same, hence it suffices to show:
\[ \lim_{n\to\infty} 2\pi n \sigma \, m\{x \in I \,;\, \overline{S}_n(x) \in B\} = \ell(B). \]
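The passage from the doubling map to the Markov chain with inverse branches $a$, $b$ is also convenient numerically: iterating $x \mapsto \{2x\}$ in floating point destroys one bit per step, while the contractions $a$ and $b$ are stable. The sketch below, with the hypothetical helper name `bar_S`, samples $\overline{S}_n(x)$ along this chain and estimates $\mathbb{E}|\overline{S}_n|^2 / n$. Since the functions $z^{2^k}$ are orthonormal in $L^2(\mathbb{T}, m)$, this ratio equals 1 exactly, which gives a simple consistency check; the code does not estimate the constant $\sigma$ of Theorem 3.4.

```python
import cmath
import math
import random

def bar_S(n, rng):
    """One sample of bar S_n(x) = sum_{k<n} f(x_k) with f(x) = exp(2 i pi x)."""
    x = rng.random()                      # x_0 distributed according to Lebesgue measure
    total = 0j
    for _ in range(n):
        total += cmath.exp(2j * math.pi * x)
        # apply a: x -> x/2 or b: x -> (x+1)/2, each with probability 1/2
        x = 0.5 * (x + rng.getrandbits(1))
    return total

rng = random.Random(1)
n, samples = 200, 2000
second_moment = sum(abs(bar_S(n, rng)) ** 2 for _ in range(samples)) / (samples * n)
print(second_moment)
```

A histogram of the real and imaginary parts of `bar_S(n, rng) / math.sqrt(n)` would be the natural next experiment for the Gaussian behaviour underlying the local limit theorem.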

As in [23] the recurrence and density properties of $S_n(z)$ follow. The function $f$ is $\mathbb{R}^2$-valued and we admit the multidimensional extension of Theorem 1.5. Here aperiodicity of $f$ means that for $\lambda \in \mathbb{R}^2 \setminus \{0\}$, $\alpha \in \mathbb{R}$ the equation
\[ e^{2i\pi \langle \lambda, f \rangle} = e^{2i\pi\alpha} (u \circ a) u^{-1} = e^{2i\pi\alpha} (u \circ b) u^{-1} \qquad (1) \]
has no Hölder solution $u$. We write here $Q^{i\lambda} \varphi = Q(e^{2i\pi \langle \lambda, f \rangle} \varphi)$. The Lipschitz coefficients of $\varphi$, $Q\varphi$, $Q^{i\lambda}\varphi$ satisfy
\[ [Q\varphi]_1 \le \tfrac{1}{2} [\varphi]_1, \qquad [Q^{i\lambda} \varphi]_1 \le \tfrac{1}{2} [\varphi]_1 + 2\pi |\lambda| \, |\varphi|_\infty, \]
hence condition (D-F) is satisfied by $Q$, $Q^{i\lambda}$ on $H^1(I)$, $C(I)$. Clearly $Q$ is ergodic as is $\theta$. Then, the local limit theorem will follow from Theorem 1.7 if $f$ is aperiodic. Since $f(0) = f(1)$, from equation (1) we get $u(0) = u(1)$, hence we can write $u = e^{2i\pi v}$ where $v$ is a Hölder 1-periodic function on $\mathbb{R}$. Then, using (1), the


aperiodicity of $f$ is not valid if there exist $\alpha \in \mathbb{R}$, such a $v$ and a $\mathbb{Z}$-valued function $k(x)$ with $\langle \lambda, f \rangle(2x) = \alpha + v(x) - v(2x) + k(x)$. By continuity we get that $k(x)$ is independent of $x$. Integrating with respect to $m$, we get $\langle \lambda, f \rangle(2x) = v(x) - v(2x)$. Now with $\lambda = (a, b) \ne (0, 0)$ we write $\langle \lambda, f \rangle(x) = a \cos 2\pi x + b \sin 2\pi x$, and we use the Fourier series expansion of $v$, $v(x) = \sum_{k=-\infty}^{\infty} c_k e^{4i\pi k x}$. Then the above equation gives $a = c_1$, $b = c_{-1}$, $c_0 = 0$, $c_{2k} = c_k$, $c_{2k+1} = 0$. Hence $c_{2^k}$ or $c_{-2^k}$ is equal to $a$ or $b$, which contradicts the fact that $v$ is square integrable on $I$. $\square$

For the use of the Perron-Frobenius operator and spectral gaps on Sobolev spaces in the more general situation of holomorphic endomorphisms of the complex projective space $P^d(\mathbb{C})$ see [9] and ([11], Theorem 1.83).

The following is useful in dealing with $L^2$-spectral gaps in the two situations which follow, i.e. motion groups and nilmanifolds, but also in the general context of "strong mixing". For other examples see [10] where the following was proved in a slightly different form.

Proposition 3.5. Let $(X, m)$ be a probability space, $\Gamma$ a countable group acting on $X$ by measure preserving transformations, $A$ a generating set for $\Gamma$, $\mu \in M^1(\Gamma)$ a probability measure on $\Gamma$ with $\mathrm{supp}(\mu) = A$. Let $c(a, x) = c_a(x)$ ($a \in A$, $x \in X$) be a measurable function with $|c(a, x)| = 1$, $Q$ (resp. $Q_c$) the operator on $L^2(X, m) = L^2(X)$ defined by $Q\varphi = \sum_{a \in A} \mu(a) \varphi \circ a$ (resp. $Q_c \varphi = \sum_{a \in A} \mu(a) c_a \varphi \circ a$). Assume that

the restriction Q0 of Q to L20 (X) satisfies r(Q0 ) < 1. Then one has r(Qc ) = 1 if and only if there exists a measurable function α(x) with value in [0, 2π[ and θ ∈ [0, 2π[such that for any a ∈ A c(a, x) = ei(α(ax)−α(x)+θ) m − a.e. In particular any unimodular spectral value of Qc is an eigenvalue of Qc . Proof: The proof begins like the proof of Theorem 2.1. Assume r(Qc ) = 1 and let eiθ be a spectral value of Qc . Then, either the subspace Im(eiθ −Qc ) is not dense in L2 (X) or there exists ϕn ∈ L2 (X) with ϕn 2 = 1 and lim Qc ϕn −eiθ ϕn 2 = 1. n→∞

In the first case there exists ϕ ∈ L2 (X) with eiθ (Qc )∗ ϕ = ϕ. Since e−iθ Qc is a contraction of L2 (X), ϕ is also a fixed point of e−iθ Qc , hence Qc ϕ = eiθ ϕ and it suffices to consider the second case . We have: 0 ≤ Qc ϕn − eiθ ϕn 2 = Qc ϕn 22 + ϕn 22 − 2Re-Qc ϕn , eiθ ϕn ., 0 ≤ 2 − 2Re-Qc ϕn , eiθ ϕn .. Then the condition lim Qc ϕn − eiθ ϕn  = 0 is equivalent to lim Re-Qc ϕn , eiθ ϕn . n→∞

n→∞

= 1 hence to lim -Qc ϕn , eiθ ϕn . = 1. Since ϕn 2 = 1 and |c(a, x)| = 1 we have: n→∞

|Qϕ|(x) ≤ Q|ϕ|(x), 0 ≤ |-Qc ϕn , eiθn ϕn .| ≤< Q|ϕn |, |ϕn | >≤ 1. It follows lim -Q|ϕn |, |ϕn |. = 1, i.e lim Q|ϕn | − |ϕn | 2 = 0. n→∞

n→∞

We can write |ϕn | = -|ϕn |, 1. + ψn where ψn := |ϕn | − -|ϕn |, 1.1 ∈ L20 (X), hence lim (Q − I)ψn 2 = 0 and -|ϕn |, 1. ≤ ϕn 2 = 1, 1 = -ϕn , 1.2 + ψn 22 . n→∞

Since r(Q0 ) < 1, the restriction of Q − I to L20 (X) is invertible, hence the condition lim (Q−I)ψn 2 = 0 implies lim ψn 2 = 0. Hence lim  |ϕn |−12 = n→∞

n→∞

n→∞

0. On the other hand, the condition lim -Qc ϕn , eiθ ϕn . = 1 can be written as n→∞


lim Σ -ca ϕn ◦ a, eiθ ϕn .μ(a) = 1 with |-ca ϕn ◦ a, eiθ ϕn .| ≤ 1 for any a ∈ A. It

n→∞ a∈A

follows lim ca ϕn ◦ a − eiθ ϕn 2 = 0. n→∞

We write ϕn = |ϕn |eiαn with αn (x) ∈ [0, 2π[, hence: ca ϕn ◦ a − eiθ ϕn = ca eiαn ◦a (|ϕn ◦ a| − 1) − eiθ (|ϕn | − 1)eiαn + (ca eiαn ◦a − ei(αn +θ) ). From above we get lim ca ei(αn ◦a−αn −θ) − 12 = 0, therefore for a subsequence n→∞ nk : ca (x)e−iθ = lim e−i(αnk (ax)−αnk (x) , m − a.e. k→∞

Hence for any γ ∈ -A. = Γ, σ(γ, x) = lim e−i(αnk (γx)−αnk (x)) exists m − a.e and k→∞

is a T-valued cocycle. On the other hand, the spectral gap property r(Q0 ) < 1 implies the strong ergodicity (see [47] Proposition 2.10) of the action of Γ on X, hence proposition 2.3 of [47] gives the existence of a measurable function α on X such that: σ(γ, x) = cγ (x)e−i|γ|θ = e−i(α(γx)−α(x)) , m − a.e with |γ| equal to the length of γ in A. The result follows.  Remark 3.6. The proof of [47] shows that, up to unimodular coefficients, the sequence ϕn converges in measure to an eigenfunction ϕ of Qc with eigenvalue eiθ . We write the group of complex motions of Cd in semi-direct product form Md = SU (d)αCd , we denote by μ a finitely supported probability on Md , and by μ its projection on SU (d) ; let B = supp(μ), A = supp(μ). If A generates a dense subgroup of SU (d), i.e -A. = SU (d), and if the elements of A have algebraic coefficients in some base of Cd , then it is known (see [3]) that for d > 1 the convolution operator ρ0 (μ) by μ on L20 (S2d−1 ) satisfies r(ρ0 (μ)) < 1. For d = 1, the problem of equidistribution of μn ∗δv (v ∈ C) was considered by D.A Kazdhan ([36]). ¯ defines a random walk on T, which has no spectral In this case M2 is solvable and μ gap in L2 (T), in general ; however M2 is non-abelian and unitary representations are still useful (see [21], [53]) and allow one to obtain equidistribution. For d > 1 the analogous problem remained unsolved for a long time, and the corresponding question was asked by G.-A Margulis. Using the result of [3] we give an affirmative answer (see [10]). We note that the same question has been recently solved in full generality in [51] i.e without using the spectral gap property in SU (d). Our method here is of general interest in the larger context of Markov operators with a Hilbert space spectral condition, i.e Q strongly mixing in the sense of [45]. Theorem 3.7. With the above notations and d > 1 assume that -B. = Md and A ⊂ SU (d) consists of matrices with algebraic coefficients. 
Then there exists σ(μ) > 0 such that for any continuous compactly supported function ϕ on Cd , any v ∈ Cd , lim σ(μ)(2πn)d (μn ∗ δv )(ϕ) = (ϕ) n→∞

where  is Lebesgue measure on Cd 3 R2d . Proof: We recall some notations from unitary representations of Md , where Cd is with V = R2d . The Fourier transform of ϕ ∈ L1 (V ) is given by ϕ(x) / =  indentified 2iπx,v ϕ(v)e d(v). For x = tu with t > 0, u ∈ S2d−1 we write ϕ (u) = ϕ(tu) / so t  that the Fourier inversion formula for ϕ gives ϕ(v) = ϕt (u)e−2iπtu,v d(x). If


$h \in M_d$ is given by $hv = av + b$ where $a \in SU(d)$, $b \in V$, we get

$$\varphi(h^{-1}v) = \int \varphi_t(a^{-1}u)\, e^{2i\pi t\langle u,b\rangle}\, e^{-2i\pi t\langle u,v\rangle}\, d(x),$$

so that the function $\varphi \circ h^{-1}$ has $t$-Fourier component given by $(\rho_t(h)\varphi_t)(u) = \varphi_t(a^{-1}u)\, e^{2i\pi t\langle u,b\rangle}$. For $t > 0$ the map $h \mapsto \rho_t(h)$ is a unitary representation of $M_d$ in the Hilbert space $L^2(S^{2d-1})$, which is irreducible. We denote by $Q$ (resp. $Q^{it}$) the operator on $L^2(S^{2d-1})$ given by

$$Q\varphi(u) = \sum_{a\in A} \varphi(a^{-1}u)\,\mu(a) \qquad \Big(\text{resp. } Q^{it}\varphi(u) = \sum_{h\in B} e^{2i\pi t\langle u,b\rangle}\,\varphi(a^{-1}u)\,\mu(h)\Big),$$

so that $Q^{it} = \rho_t(\mu)$ and $\check{\mu}^n * \delta_v(\varphi) = \int ((Q^{it})^n\varphi_t)(u)\, e^{-2i\pi t\langle u,v\rangle}\, d(x)$. With the notation of Proposition 3.4 we have $X = S^{2d-1}$, $Q^{it} = Q_c$ where $c(u,h) = e^{2i\pi t\langle u,b\rangle}$. In view of Theorem 1.7, in order to prove the statement, it suffices to show that the restriction $Q_0$ of $Q$ to $L^2_0(S^{2d-1})$ satisfies $r(Q_0) < 1$, and $r(Q^{it}) < 1$ if $t \neq 0$. Since $\langle A\rangle = SU(d)$ and $A$ has algebraic coefficients, the first assertion follows from [3]. Using strict convexity in $L^2(S^{2d-1})$, the second assertion follows from Proposition 3.4 if there is no function $\varphi_t \in L^2(S^{2d-1})$ such that for some $\theta \in \mathbb{R}$, $t > 0$, and any $h \in B$: $\rho_t(h^{-1})\varphi_t = e^{i\theta}\varphi_t$. But this relation implies that the group $\rho_t(\langle B\rangle) = \rho_t(M_d)$ has a one dimensional invariant subspace; since the representation $\rho_t$ of $M_d$ is irreducible, this is impossible. $\square$

Let $X = N/\Delta$ be a compact nilmanifold, $H$ a subgroup of the automorphism group $\mathrm{Aut}(X)$ of $X$, $m$ the Haar measure on $X$. It is known (see [1]) that, if there is no $N$-equivariant torus factor $T$ of $X$ such that the projection of $H$ in $\mathrm{Aut}(X)$ is virtually abelian, then the natural representation $\rho_0$ of $H$ in $L^2_0(X)$ does not weakly contain the identity representation. It follows that if $A \subset H$ generates $H$ and $\mu \in M^1(A)$ is aperiodic, i.e. $\mathrm{supp}(\mu)$ is not contained in a coset of a proper subgroup of $H$, then the corresponding Markov operator $Q_0 = \rho_0(\mu)$ satisfies $r(Q_0) < 1$. We recall that a finitely generated group $\Gamma$ is said to be virtually abelian if it contains a normal subgroup $\mathbb{Z}^r$ such that the quotient group $\Gamma/\mathbb{Z}^r$ is finite. We consider the random walk on $X$ defined by $\mu$ and we denote by $x_n = a_n \cdots a_1 x$ ($a_i \in \mathrm{supp}\,\mu$, $x \in X$) its trajectories, where $x$ is distributed according to $m$. Let $f$ be an $\mathbb{R}$-valued function on $X$ with $m(f) = 0$, $m(f^2) < \infty$, and let us consider the Birkhoff sums $S_n(x,\omega) = \sum_{k=0}^{n-1} f(x_k)$ with respect to the product measure $P_m = m \otimes P$ on $X \times H^{\mathbb{N}}$. We denote by $E_m$ the corresponding expectation symbol. Then Theorem 1.7 gives:

Theorem 3.8. With the above notations, assume that the group $H = \langle \mathrm{supp}(\mu)\rangle \subset \mathrm{Aut}\, X$ is ergodic on $X$, that $\mu$ is aperiodic on $H$ and there is no $N$-equivariant torus factor of $X$ such that the corresponding projection of $H$ is virtually abelian. Let $f \in L^2_0(X)$ be such that the equation $e^{itf(x)} = e^{i\theta}$ is not satisfied $m$-a.e. for any $(t,\theta) \in \mathbb{R}^2 \setminus \{0\}$. Then there exists $\sigma(\mu,f) > 0$ such that for any $\varphi \in L^2(X)$ and any $\psi$ continuous with compact support on $\mathbb{R}$:

$$\lim_{n\to\infty} \sigma(\mu,f)\sqrt{2\pi n}\; E_m\big(\varphi(x_n)\psi(S_n)\big) = m(\varphi)\,\ell(\psi).$$

We observe that there are natural subgroups of automorphisms of $X$ without any hyperbolic element which satisfy the above hypothesis (see [10]).


Proof of Theorem 3.8. As in the above example we use Theorem 1.7. The condition $r(Q_0) < 1$ follows from [1]. Here we have

$$Q^{it}\varphi(x) = \sum_{a\in A} \varphi(ax)\, e^{itf(ax)}\,\mu(a).$$

Hence the condition $r(Q^{it}) < 1$ for $t \neq 0$ from Proposition 3.4 will be satisfied if there is no $t \neq 0$, $\varphi_t \in L^2(X)$, $\theta \in \mathbb{R}$ such that, for any $a \in \mathrm{supp}(\mu)$ and $m$-a.e.:

$$e^{itf(ax)} = e^{i\theta}\, \frac{\varphi_t(ax)}{\varphi_t(x)}.$$

This equation implies $e^{itf(ax)}\varphi_t^{-1}(ax) = e^{i\theta}\varphi_t^{-1}(x)$, hence $e^{itf(ax)}\varphi_t^{-1}(ax)$ is independent of $a \in \mathrm{supp}\,\mu$. Since $(\mathrm{supp}(\mu))(\mathrm{supp}(\mu))^{-1}$ generates $H$ and $H$ is ergodic, we get $e^{itf(x)}\varphi_t^{-1}(x) = c$, $m$-a.e., with $|c| = 1$. It follows that $e^{i\theta}\varphi_t^{-1}(x) = c$, hence $\varphi_t(x) = c^{-1}e^{i\theta}$ and $e^{itf(x)} = e^{i\theta}$, $m$-a.e., which is excluded. $\square$

4. Extreme value theory for affine random walks.

We consider the affine group $H$ of the vector space $V = \mathbb{R}^d$, a probability $\lambda$ on $H$ with projection $\mu$ on $G = GL(V)$. We denote by $\widehat{P}$ the product measure $\lambda^{\otimes\mathbb{N}}$ on $\widehat{\Omega} = H^{\mathbb{N}}$, we consider the dynamical system $(\widehat{\Omega}, \widehat{\theta}, \widehat{P})$ where $\widehat{\theta}$ is the shift on $\widehat{\Omega}$, and we study some limit theorems for the stochastic recursion $x_{n+1} = g_{n+1}x_n + b_{n+1}$, $x_0 = x$, where $(g_n, b_n)$ are i.i.d. random variables with law $\lambda$. Under hypothesis (C) below there exists a unique $\lambda$-invariant probability $\rho$ on $V$. We recall below the "homogeneity at infinity" property of $\rho$ (see [31], [32]), first observed in [38] in a weak form, and we describe its consequences for extreme value theory of the process $(x_n)_{n\in\mathbb{N}}$. The proofs of homogeneity are given here in a restricted setting and are based on the properties of the operators $P^s_\mu$ already studied in section 2. Here we consider also the case $V = \mathbb{R}$; then we say that a probability $\mu$ on $\mathbb{R}^*$ is arithmetic if the semigroup $[\mathrm{supp}(\mu)]$ is contained in a subgroup $\pm a^{\mathbb{Z}}$ of $\mathbb{R}^*$ ($a > 0$). If $d \geq 1$, our main hypothesis will be the existence of $\rho$, condition (I-P) for $[\mathrm{supp}(\mu)]$ and the existence of $g, g'$ in $[\mathrm{supp}(\mu)]$ with $r(g) < 1$, $r(g') > 1$. More precisely condition (C) is the following.

a) The semigroup $[\mathrm{supp}(\lambda)]$ has no fixed point in $V$.
b) There exists $\alpha > 0$ with $k(\alpha) = \lim_{n\to\infty} \left(\int |g|^{\alpha}\, d\mu^n(g)\right)^{1/n} = 1$.
c) For some $\delta > 0$ we have: $\int \left(|g|^{\alpha}\gamma^{\delta}(g) + |b|^{\alpha+\delta}\right) d\lambda(h) < \infty$.
d) If $d > 1$, $[\mathrm{supp}(\mu)]$ satisfies condition (I-P) and if $d = 1$, $[\mathrm{supp}(\mu)]$ is non-arithmetic.

Condition (C) implies that $(\mathrm{Log}\, k)'(0) = \lim_{n\to\infty} \frac{1}{n}\mathrm{Log}|S_n| < 0$. We observe that $x_n - S_n x$ has the same law as $\sum_{k=1}^{n} g_1 \cdots g_{k-1} b_k$, i.e. as the partial sums of the convergent series $R = \sum_{k=1}^{\infty} g_1 \cdots g_{k-1} b_k$. Since $\lim_{n\to\infty} S_n x = 0$, we have the equation: $R(\omega) = g_1 R(\widehat{\theta}\omega) + b_1$, hence the law $\rho$ of $R$ is the unique $\lambda$-invariant probability. We observe that the above formula for $R$ implies $\int |x|^s\, d\rho(x) < \infty$ for $s < \alpha$. Property b) of condition (C) implies the unboundedness of $\mathrm{supp}(\rho)$. We denote by $\eta \mapsto t.\eta$ the extension of the dilation $v \mapsto tv$ to measures on $V$, by $\ell^s$ the measure $\frac{dt}{t^{s+1}}$ on $\mathbb{R}_+$, by $\widetilde{\Lambda}([\mathrm{supp}(\mu)])$ the inverse image in $S(V)$ of the limit set $\Lambda([\mathrm{supp}(\mu)]) \subset P(V)$ considered in section 2. Then we have the following "homogeneity property at infinity" for $\rho$, which is basic for the whole development of extreme value theory (see [41]):

Theorem 4.1. Under condition (C), the $\lambda$-invariant probability $\rho$ satisfies:

$$\lim_{t\to 0_+} t^{-\alpha}(t.\rho) = c\, \widetilde{\sigma}^{\alpha} \otimes \ell^{\alpha}$$

where $c > 0$, $\widetilde{\sigma}^{\alpha}$ is a probability measure on $\widetilde{\Lambda}([\mathrm{supp}(\mu)])$ and $\widetilde{\sigma}^{\alpha} \otimes \ell^{\alpha}$ is a $\mu$-harmonic measure supported by $\mathbb{R}\widetilde{\Lambda}([\mathrm{supp}\,\mu])$. If $[\mathrm{supp}\,\mu]$ has no proper convex invariant cone then $\widetilde{\sigma}^{\alpha}$ is equal to $\widetilde{\nu}^{\alpha}$, the unique symmetric measure on $S(V)$ with projection $\nu^{\alpha}$ on $P(V)$. If not, $\widetilde{\sigma}^{\alpha}$ is a convex combination of the two extremal $\widetilde{P}^{\alpha}$-invariant probabilities on $S(V)$ with projection $\nu^{\alpha}$ on $P(V)$.

We observe that, in various situations, "homogeneity at infinity" of solutions of the corresponding functional equations can be deduced from the above theorem. This is the case (see [8]) in the study of the multidimensional "smoothing process" considered by Durrett-Liggett in the context of statistical mechanics. Below, we assume condition (C) and we will restrict ourselves to the case where $[\mathrm{supp}(\mu)]$ has no proper convex invariant cone. The proof of Theorem 4.1 in the general case is long (see [31]). Here, with $H_v^+ = \{x \in V ;\ \langle x, v\rangle \geq 1\}$ ($v \in V \setminus \{0\}$), we show only the convergence of $t^{-\alpha}(t.\rho)(H_v^+)$ under a reinforced hypothesis, and we deduce the homogeneity at infinity of $\rho$ if $\alpha \notin \mathbb{N}$. The proof of the positivity of $c$ in the general case is based on the well known Kac's return lemma and the method of ladder indices expounded in [14] for the case of random walks on the line; here we will give an analytic argument in case $d = 1$ only. If $d \geq 2$ we observe that the above homogeneity of $\rho$ is stable under perturbation of $\lambda$ in the weak topology; it follows that convergence to Fréchet's law for affine random walks is robust if $d \geq 2$. This is also the case for the convergence to stable laws associated with affine random walks on $\mathbb{R}^d$ (see [17]).

In the special case considered above, the operator $\widetilde{P}^z_\mu$ defined by $\widetilde{P}^z_\mu\varphi(x) = \int |gx|^z \varphi(g.x)\, d\mu(g)$ ($z = s + it$, $s \geq 0$, $t \in \mathbb{R}$) on the unit sphere $S(V)$ has spectral properties similar to those of $P^z_\mu$ in section 3. In particular, there exists a unique $\widetilde{P}^s_\mu$-eigenmeasure (resp. eigenfunction) on $S(V)$: $\widetilde{P}^s_\mu\widetilde{\nu}^s = k(s)\widetilde{\nu}^s$, $\widetilde{P}^s_\mu\widetilde{e}^s = k(s)\widetilde{e}^s$, and $\widetilde{\nu}^s$ (resp. $\widetilde{e}^s$) is the symmetric lift of $\nu^s$ (resp. $e^s$) to $S(V)$. The proof of Theorem 4.1 is based on a lemma and a proposition as follows.

Lemma 4.2. There exists an open set $D \subset \mathbb{C}$ which contains the set $\{\mathrm{Re}\, z \in\, ]0,\alpha]\}$ such that $(I - \widetilde{P}^z_\mu)^{-1}$ is meromorphic in $D$ with a unique simple pole at $z = \alpha$. Furthermore:

$$\lim_{z\to\alpha} (\alpha - z)(I - \widetilde{P}^z_\mu)^{-1} = k'(\alpha)^{-1}\,(\widetilde{\nu}^{\alpha} \otimes \widetilde{e}^{\alpha}).$$

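Proof: If $d = 1$, the lemma follows from elementary arguments and non-arithmeticity of $[\mathrm{supp}(\mu)]$. We assume now $d > 1$. From the analogue of Theorem 2.20 in case $S(V)$ it follows that for some $\varepsilon > 0$ and any $z$ with $|z - \alpha| < \varepsilon$, there exists a holomorphic function $k(z)$ such that $k(z)$ is a dominant simple eigenvalue of $\widetilde{P}^z_\mu$ with $k(z) = 1 + k'(\alpha)(z - \alpha) + o(z - \alpha)$. Since $k'(\alpha) \neq 0$, we have $k(z) \neq 1$ for $z \neq \alpha$ and $|z - \alpha|$ small. Also for $|z - \alpha|$ small we have in $H^{\varepsilon}(S(V))$ the decomposition $\widetilde{P}^z_\mu = k(z)\,\widetilde{\nu}^z \otimes \widetilde{e}^z + U(z)$ where $\widetilde{\nu}^z \otimes \widetilde{e}^z$ is a projection on $\mathbb{C}\widetilde{e}^z$, $U(z)$ satisfies $U(z)(\widetilde{\nu}^z \otimes \widetilde{e}^z) = (\widetilde{\nu}^z \otimes \widetilde{e}^z)U(z) = 0$, $r(U(z)) < 1$, and $\widetilde{\nu}^z \otimes \widetilde{e}^z$, $U(z)$ depend holomorphically on $z$. We write:

$$I - \widetilde{P}^z_\mu = (1 - k(z))(\widetilde{\nu}^z \otimes \widetilde{e}^z) + p^z(I - U(z)),$$

where $p^z = I - \widetilde{\nu}^z \otimes \widetilde{e}^z$. It follows that $(I - \widetilde{P}^z_\mu)^{-1} = (1 - k(z))^{-1}(\widetilde{\nu}^z \otimes \widetilde{e}^z) + p^z(I - U(z))^{-1}$, hence $\lim_{z\to\alpha} (\alpha - z)(I - \widetilde{P}^z_\mu)^{-1} = k'(\alpha)^{-1}(\widetilde{\nu}^{\alpha} \otimes \widetilde{e}^{\alpha})$ and $(I - \widetilde{P}^z_\mu)^{-1}$ is meromorphic in a disk $D_0$ centered at $\alpha$ with radius $\varepsilon' \leq \varepsilon$, with unique pole at $z = \alpha$. For $z = \alpha + it$ with $|t| \geq \varepsilon'$, from the last assertion in Theorem 2.20 we get that there exists a disk $D_t$ centered at $\alpha + it$ such that $r(\widetilde{P}^z_\mu) < 1$ for $z \in D_t$, hence $(I - \widetilde{P}^z_\mu)^{-1}$ is a bounded operator depending holomorphically on $z$ for $z \in D_t$. If $\mathrm{Re}\, z \in\, ]0,\alpha[$, then $r(\widetilde{P}^z_\mu) \leq r(\widetilde{P}^s_\mu) < 1$, hence $I - \widetilde{P}^z_\mu$ is invertible and the function $z \mapsto (I - \widetilde{P}^z_\mu)^{-1}$ is holomorphic in the domain $\mathrm{Re}\, z \in\, ]0,\alpha[$. Then the open set $\left(\bigcup_t D_t\right) \cup \{\mathrm{Re}\, z \in\, ]0,\alpha[\}$, with $t = 0$ or $|t| \geq \varepsilon'$, satisfies the conditions of the lemma and the formula for $\lim_{z\to\alpha} (\alpha - z)(I - \widetilde{P}^z_\mu)^{-1}$ is valid. $\square$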



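We note that $R = \sum_{k=1}^{\infty} g_1 \cdots g_{k-1} b_k$ satisfies the equation: $R = g_1 R \circ \widehat{\theta} + B$ with $B = b_1$. We denote for $v \in V^*$ and $\mathrm{Re}\, z = s \in\, ]0,\alpha[$:

$$f_z(v) = \widehat{E}\big(\langle R, v\rangle_+^z\big), \qquad d_z(v) = \widehat{E}\big(\langle R, v\rangle_+^z - \langle R - B, v\rangle_+^z\big).$$

Then we have the:

Proposition 4.3. We have the convergence:

$$\lim_{t\to 0_+} t^{-\alpha}(t.\rho)(H_u^+) = c\,(\widetilde{\nu}^{\alpha} \otimes \ell^{\alpha})(H_u^+) = C(u) = \alpha^{-1}\lim_{s\to\alpha_-} (\alpha - s) f_s(u)$$

where $|u| = 1$, $c \geq 0$ and $\widetilde{\nu}^{\alpha}$ satisfies $\mu * (\widetilde{\nu}^{\alpha} \otimes \ell^{\alpha}) = \widetilde{\nu}^{\alpha} \otimes \ell^{\alpha}$. We have $C(u) = (\alpha k'(\alpha))^{-1}\; {}^{*}\widetilde{\nu}^{\alpha}(\bar{d}_{\alpha})\; {}^{*}\widetilde{e}^{\alpha}(u)$ where $\bar{d}_{\alpha}$ is the restriction of $d_{\alpha}$ to $S(V)$.

Proof: We write the above functional equation in the form $R - B = g_1 R \circ \widehat{\theta}$, hence for $\mathrm{Re}\, z = s \in\, ]0,\alpha[$ we have $\langle R - B, v\rangle_+^z = \langle R \circ \widehat{\theta}, g_1^* v\rangle_+^z$, ${}^{*}P^z_\mu f_z = f_z^1$, with $f_z^1(v) = \widehat{E}\langle R - B, v\rangle_+^z$, i.e. $(I - {}^{*}P^z_\mu) f_z = d_z$. By homogeneity of $f_z$, $d_z$ and writing $\bar{f}_z$, $\bar{d}_z$ for the restrictions of $f_z$, $d_z$ to $S(V)$ we get: $(I - {}^{*}\widetilde{P}^z_\mu)\bar{f}_z = \bar{d}_z$. The function $\bar{d}_z(u)$ is dominated by:

$$\varepsilon'(z)\,\widehat{E}(|B|^s) + \varepsilon(z)\,\widehat{E}\big(|B|\,(|B| + \langle R, u\rangle_+)^{s-1}\big)$$

where $\varepsilon(z) = |z|\,1_{[1,\infty[}(s)$, $\varepsilon'(z) = 1_{[0,1]}(s)$. Hence, using the Hölder inequality and the moment hypothesis, we get that for $u$ fixed the function $z \mapsto \bar{d}_z(u)$ is holomorphic in the domain $\mathrm{Re}\, z \in\, ]0,\alpha+\delta[$; on the other hand the above lemma shows the meromorphicity of $z \mapsto (I - {}^{*}\widetilde{P}^z_\mu)^{-1}$ in an open set $D$ which contains $\{\mathrm{Re}\, z \in\, ]0,\alpha]\}$. The above estimation of $\bar{d}_z(u)$ shows that the same property is valid for $\bar{f}_z = (I - {}^{*}\widetilde{P}^z_\mu)^{-1}(\bar{d}_z)$. If we denote by $\rho_u$ the law of $\langle R, u\rangle_+$, we have $\bar{f}_s(u) = \int x^s\, d\rho_u(x)$, hence $\bar{f}_s(u)$ is the Mellin transform of the positive measure $\rho_u$. Then, since $(I - {}^{*}\widetilde{P}^z_\mu)^{-1}$ has a simple pole at $z = \alpha$, we can apply the Wiener-Ikehara theorem (see [54]) to $\bar{f}_s(u)$, $\rho_u$ and obtain the tail of $\rho_u$ in the form:

$$\lim_{t\to\infty} t^{\alpha}\rho_u(t,\infty) = \lim_{s\to\alpha_-} \alpha^{-1}(\alpha - s)\bar{f}_s(u).$$

Hence, using Lemma 4.2:

$$\lim_{s\to\alpha_-} (\alpha - s)\bar{f}_s(u) = k'(\alpha)^{-1}\; {}^{*}\widetilde{\nu}^{\alpha}(\bar{d}_{\alpha})\; {}^{*}\widetilde{e}^{\alpha}(u).$$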


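Using the formula for ${}^{*}\widetilde{e}^{\alpha}(u)$ given in Lemma 2.23, we get:

$$\lim_{s\to\alpha_-} (\alpha - s)\bar{f}_s(u) = \alpha C(u) = \alpha c\,(\widetilde{\nu}^{\alpha} \otimes \ell^{\alpha})(H_u^+)$$

with $c \geq 0$, $C(u) \geq 0$ and $\mu * (\widetilde{\nu}^{\alpha} \otimes \ell^{\alpha}) = \widetilde{\nu}^{\alpha} \otimes \ell^{\alpha}$. $\square$

For $d = 1$ and some additional hypothesis we show $\lim_{t\to\infty} t^{\alpha}\rho(t,\infty) > 0$, using an analytic argument sketched in [27]. This argument uses a lemma of E. Landau (see [54]) which says the following: if the Mellin transform of a positive measure $\rho$ supported on $[0,\infty[$ can be extended holomorphically to a neighbourhood of $\alpha \in \mathbb{R}_+$ containing $[\alpha, \alpha+\varepsilon[$, then we have $\int_0^{\infty} x^s\, d\rho(x) < \infty$ if $s < \alpha + \varepsilon$. Then we have the:

Proposition 4.4. Assume $d = 1$, $\lambda$ satisfies condition (C), has compact support and $\mathrm{supp}(\mu) \subset \mathbb{R}_+$. Then if $\mathrm{supp}(\rho) \cap \mathbb{R}_+$ is unbounded, we have:

$$\lim_{t\to\infty} t^{\alpha}\rho(t,\infty) = c_+ > 0.$$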



Proof: We know that $R = \sum_{k=0}^{\infty} g_1 \cdots g_k b_{k+1}$ satisfies the equation: $R - b_1 = g_1 R \circ \widehat{\theta}$. Since $g_1$ is positive here we get $(R - b_1)_+ = g_1 R_+ \circ \widehat{\theta}$. With $f(z) = \widehat{E}(R_+^z)$, $f_1(z) = \widehat{E}((R - b_1)_+^z)$, $d(z) = f(z) - f_1(z)$, we get for $s < \alpha$:

$$(E_0)\quad k(z)f(z) = f_1(z), \qquad (E)\quad (1 - k(z))f(z) = d(z).$$

Here we assume $\mathrm{supp}(\mu) \subset\, ]0,K[$, $\mathrm{supp}(\lambda) \subset\, ]0,K] \times [-B,B]$, for some positive $K$ and $B$, hence the functions $k(z) = \widehat{E}(g_1^z)$, $\widehat{E}(|b_1|^z)$ are holomorphic for $\mathrm{Re}\, z > 0$. Using the mean value theorem we get $|R_+^z - (R - b_1)_+^z| \leq B^s + |z|B(R_+ + B)^{s-1}$, hence $|d(z)| < \infty$ for $s < \alpha + 1$. Also, since $k(\alpha) = 1$ and $\mu$ is non-arithmetic we have $|k(\alpha + it)| < 1$ for $t \neq 0$. Then equation (E) implies that $f(z)$ extends meromorphically to an open set $U \supset \{0 < \mathrm{Re}\, z \leq \alpha\}$, with possibly a unique simple pole at $z = \alpha$. Then, as in the proof of Proposition 4.3 we get $\lim_{t\to\infty} t^{\alpha}\rho(t,\infty) = \alpha^{-1}\lim_{s\to\alpha} (\alpha - s)f(s) = A$. If $A = 0$, since $k(s) > 1$ for $s > \alpha$, $f(z)$ is holomorphic in a neighbourhood of $]0,\alpha+1[$, hence the Landau Lemma quoted above implies that $f(z)$ is holomorphic for $0 < \mathrm{Re}\, z < \alpha + 1$ and $\int_0^{\infty} x^s\, d\rho(x) < \infty$ for $s < \alpha + 1$. Since $k(s) > 1$ for $s > \alpha$, the function $(1 - k(z))^{-1}$ is holomorphic in a neighbourhood of $\mathbb{R}_+$. Then as above, the Landau Lemma and equation (E) imply that the Mellin transform $f(z)$ is holomorphic for $\mathrm{Re}\, z > 0$, and $f(s) = \int_0^{\infty} x^s\, d\rho(x) < \infty$ for $s \geq 0$. Now equation $(E_0)$ gives for $s \geq 1$:

$$k^{1/s}(s)\, f^{1/s}(s) = \widehat{E}\big((R - b_1)_+^s\big)^{1/s} \leq \widehat{E}(R_+^s)^{1/s} + B.$$

Hence $(k^{1/s}(s) - 1)\, f^{1/s}(s) \leq B$. Since $1 < \lim_{s\to\infty} k^{1/s}(s) = K_1 \leq K$, it follows:

$$\lim_{s\to\infty} f^{1/s}(s) \leq B(K_1 - 1)^{-1} < \infty.$$

Hence we have: $\mathrm{supp}(\rho) \subset\, ]-\infty, B(K_1 - 1)^{-1}]$, which contradicts the hypothesis that $\mathrm{supp}(\rho) \cap \mathbb{R}_+$ is unbounded. $\square$
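To make the setting of Proposition 4.4 concrete, the following is a minimal numerical sketch for $d = 1$, not taken from the text. It assumes a hypothetical finitely supported law of positive multipliers ($g = 1/2$ with probability $0.8$, $g = 3$ with probability $0.2$) and the constant translation $b = 1$; all function names are illustrative. The multipliers are chosen so that $\mathrm{Log}\,3/\mathrm{Log}\,2$ is irrational (non-arithmetic case) and $E(\mathrm{Log}\, g) < 0$. The sketch solves $k(\alpha) = E(g^{\alpha}) = 1$ by bisection and iterates the recursion $x_{n+1} = g_{n+1}x_n + b_{n+1}$.

```python
import random

def moment_k(s, gs, probs):
    """k(s) = E(g^s) for a finitely supported law of positive multipliers g."""
    return sum(p * g ** s for g, p in zip(gs, probs))

def tail_exponent(gs, probs, lo=1e-9, hi=50.0, tol=1e-12):
    """Solve k(alpha) = 1 for alpha > 0 by bisection.

    Assumes E(log g) < 0 and P(g > 1) > 0, so that the convex function k
    is below 1 just to the right of 0 and above 1 for large arguments.
    """
    assert moment_k(lo, gs, probs) < 1.0 < moment_k(hi, gs, probs)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if moment_k(mid, gs, probs) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simulate(n, gs, probs, b=1.0, x0=0.0, seed=0):
    """Iterate x_{k+1} = g_{k+1} x_k + b with i.i.d. multipliers (illustrative)."""
    rng = random.Random(seed)
    x, path = x0, []
    for _ in range(n):
        g = rng.choices(gs, weights=probs)[0]
        x = g * x + b
        path.append(x)
    return path

# Hypothetical example law: k(1) = 0.8 * 0.5 + 0.2 * 3 = 1, so alpha = 1.
GS = [0.5, 3.0]
PROBS = [0.8, 0.2]
ALPHA = tail_exponent(GS, PROBS)
```

In this example the stationary law is supported in $[1, \infty[$ and is unbounded, so under the hypotheses of Proposition 4.4 its tail should decay like $c_+ t^{-\alpha}$ with $\alpha = 1$; the simulation is only a way to look at sample paths, not a verification of the constant $c_+$.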


Proof of Theorem 4.1. We consider only the case where $[\mathrm{supp}(\mu)]$ has no proper convex invariant cone and $\alpha \notin \mathbb{N}$. Then using Proposition 4.3, and the properties of Radon transforms of positive measures studied in [2], we get the homogeneity at infinity result for $\rho$ if $\alpha \notin \mathbb{N}$. For $\alpha \in \mathbb{N}$ we use radial Fourier analysis (see [31]). $\square$

Theorem 4.1 allows one to describe the asymptotic behaviour of large values of $x_n$, following the framework of extreme value theory (see [40], [41]). We denote $M_n = \sup\{|x_i| ;\ 1 \leq i \leq n\}$. We recall that Fréchet's law with parameters $p$, $\alpha$ is the probability measure $\Phi_{\alpha,p}$ on $\mathbb{R}_+$ given by $\Phi_{\alpha,p}(0,t) = e^{-pt^{-\alpha}}$. For i.i.d. random variables $x_k$ with law $\nu$ such that $\lim_{t\to\infty} t^{\alpha}\nu(t,\infty) = c > 0$, it was observed by M. Fréchet (see [41]) that the law of $n^{-1/\alpha}M_n$ converges to $\Phi_{\alpha,c}$. For affine random walks it can be shown (see [31]) that Fréchet's law is still valid but with a parameter $0 < p < c$, an inequality due to the clustering of extreme values, which corresponds to the mixing with speed properties of the weakly dependent sequence $x_n$ (see Lemma 4.7 below). For analogous results on different homogeneous spaces see [39].

Proposition 4.5. With the notations and hypothesis of Theorem 4.1, the law of $n^{-1/\alpha}M_n$ converges to the Fréchet law $\Phi_{\alpha,p}$ with $p < c$.

As observed in [44] in the case of geodesic flow on the modular surface, the famous Sullivan's logarithm law (see [50]) is a simple consequence of such a result. Here we have the:

Corollary 4.6. For any $v \in V$, $x_0 = v$, we have the $\widehat{P}$-a.e. convergence:

$$\limsup_{n\to\infty} \frac{\mathrm{Log}|x_n|}{\mathrm{Log}\, n} = \frac{1}{\alpha}.$$
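The classical i.i.d. statement recalled above can be checked in closed form. The sketch below is illustrative only and covers the i.i.d. case, not the affine random walk: it assumes a Pareto law with tail $\nu(s,\infty) = c\,s^{-\alpha}$ for $s \geq c^{1/\alpha}$, for which the law of $n^{-1/\alpha}M_n$ is $(1 - c\,t^{-\alpha}/n)^n$, and compares it with $\Phi_{\alpha,c}(0,t) = e^{-ct^{-\alpha}}$. The function names are hypothetical.

```python
import math

def frechet_cdf(t, alpha, p):
    """Phi_{alpha,p}(0, t) = exp(-p * t^(-alpha)) for t > 0."""
    return math.exp(-p * t ** (-alpha))

def iid_max_cdf(t, n, alpha, c):
    """Exact P(n^(-1/alpha) * M_n <= t) for n i.i.d. Pareto variables.

    Assumption: each variable has tail nu(s, inf) = c * s^(-alpha)
    for s >= c^(1/alpha) (and tail 1 below that threshold).
    """
    s = t * n ** (1.0 / alpha)
    tail = min(1.0, c * s ** (-alpha))
    return (1.0 - tail) ** n
```

For a fixed $t$, a smaller parameter $p < c$ gives a larger value $\Phi_{\alpha,p}(0,t) > \Phi_{\alpha,c}(0,t)$, i.e. stochastically smaller normalised maxima, which is consistent with the clustering interpretation of the inequality $p < c$ in Proposition 4.5.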

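The proofs of extreme value properties for the sequence $x_n$ are based on the nice spectral properties of the family of operators $P^{f,t}$ defined below. Let $f$ be a Lipschitz function with compact support on $[0,1] \times V$. For $\chi > 0$ and $\varphi \in C(V)$, we write $|\varphi|_{\chi} = \sup_v \frac{|\varphi(v)|}{(1+|v|)^{\chi}}$. For $\varepsilon \in\, ]0,1[$, $\lambda \geq 0$ with $\lambda + \varepsilon < \chi$ we write

$$[\varphi]_{\varepsilon,\lambda} = \sup_{x \neq y} \frac{|\varphi(x) - \varphi(y)|}{|x - y|^{\varepsilon}(1+|x|)^{\lambda}(1+|y|)^{\lambda}},$$

we define the space $H_{\chi,\varepsilon,\lambda}$ by $|\varphi|_{\chi} + [\varphi]_{\varepsilon,\lambda} < \infty$ and we norm $H_{\chi,\varepsilon,\lambda}$ by: $\|\varphi\| = |\varphi|_{\chi} + [\varphi]_{\varepsilon,\lambda}$. Then $H_{\chi,\varepsilon,\lambda}$ endowed with this norm is a Banach space and for suitable values of $\chi, \varepsilon, \lambda$ the "Laplace operator" $P^{f,t}$ defined by $P^{f,t}\varphi(v) = \int e^{-f(t,hv)}\varphi(hv)\, d\lambda(h)$ acts continuously on it. The use of the spectral properties of the operators $P^{f,t}$ allows one to prove multiple mixing properties of the sequence $x_n$, in particular Lemma 4.7 below. We consider the Markov probability $\widehat{P}_{\rho}$ on the product space $V^{\mathbb{N}\cup\{0\}}$, associated with the affine random walk defined by $\lambda$ and the invariant probability $\rho$. We denote $r_n = [n^s]$ with $0 < s < \alpha$ and $f_{j,n}(v,\omega) = \exp\big(-f(n^{-1}j,\ n^{-1/\alpha}\langle v, x_j\rangle)\big)$ ($1 \leq j \leq n$).

Lemma 4.7. With $r_n$, $f_{j,n}$ as above, $k_n = [n r_n^{-1}]$, we have:

$$\lim_{n\to\infty} \left| \widehat{P}_{\rho}\Big(\prod_{j=1}^{n} f_{j,n}\Big) - \prod_{k=1}^{k_n} \widehat{P}_{\rho}\Big(\prod_{j=1+(k-1)r_n}^{k r_n} f_{j,n}\Big) \right| = 0.$$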


For the use of complex versions of the operators $P^{f,t}$ in the proofs of limit theorems, in particular for spectral gap properties and convergence to stable laws of Birkhoff sums $\sum_{i=1}^{n} x_i$, after normalisation, see [17], [28].

References

[1] B. Bekka and Y. Guivarc'h, On the spectral theory of groups of affine transformations on compact nilmanifolds, arXiv:1106.2623.
[2] J. Boman and F. Lindskog, Support theorems for the Radon transform and Cramér-Wold theorems, J. Theoret. Probab. 22 (2009), no. 3, 683–710, DOI 10.1007/s10959-008-0151-0. MR2530109 (2010m:60055)
[3] J. Bourgain and A. Gamburd, Spectral gaps in SU(d) (English, with English and French summaries), C. R. Math. Acad. Sci. Paris 348 (2010), no. 11-12, 609–611, DOI 10.1016/j.crma.2010.04.024. MR2652483 (2011m:11029)
[4] P. Bougerol and J. Lacroix, Products of random matrices with applications to Schrödinger operators, Progress in Probability and Statistics, vol. 8, Birkhäuser Boston, Inc., Boston, MA, 1985. MR886674 (88f:60013)
[5] R. Bowen, Equilibrium states and the ergodic theory of Anosov diffeomorphisms, Lecture Notes in Mathematics, Vol. 470, Springer-Verlag, Berlin-New York, 1975. MR0442989 (56 #1364)
[6] L. Breiman, Probability, Addison-Wesley Publishing Company, Reading, Mass.-London-Don Mills, Ont., 1968. MR0229267 (37 #4841)
[7] A. Broise, F. Dal'bo, and M. Peigné, Études spectrales d'opérateurs de transfert et applications, Astérisque 238, S.M.F. (1996).
[8] D. Buraczewski, E. Damek, Y. Guivarc'h, and S. Mentemeier, On multidimensional Mandelbrot cascades, J. Difference Equ. Appl. 20 (2014), no. 11, 1523–1567, DOI 10.1080/10236198.2014.950259. MR3268907
[9] S. Cantat and S. Le Borgne, Théorème limite central pour les endomorphismes holomorphes et les correspondances modulaires (French), Int. Math. Res. Not. 56 (2005), 3479–3510, DOI 10.1155/IMRN.2005.3479. MR2200586 (2007c:60024)
[10] J.-P. Conze and Y. Guivarc'h, Ergodicity of group actions and spectral gap, applications to random walks and Markov shifts, Discrete Contin. Dyn. Syst. 33 (2013), no. 9, 4239–4269, DOI 10.3934/dcds.2013.33.4239. MR3038061
[11] T.-C. Dinh and N. Sibony, Dynamics in several complex variables: endomorphisms of projective spaces and polynomial-like mappings, Holomorphic dynamical systems, Lecture Notes in Math., vol. 1998, Springer, Berlin, 2010, pp. 165–294, DOI 10.1007/978-3-642-13171-4_4. MR2648690 (2011h:32019)
[12] W. Doeblin and R. Fortet, Sur des chaînes à liaisons complètes (French), Bull. Soc. Math. France 65 (1937), 132–148. MR1505076
[13] D. Dolgopyat and I. Goldsheid, Quenched limit theorems for nearest neighbour random walks in 1D random environment, Comm. Math. Phys. 315 (2012), no. 1, 241–277, DOI 10.1007/s00220-012-1539-3. MR2966946
[14] W. Feller, An introduction to probability theory and its applications. Vol. II., Second edition, John Wiley & Sons, Inc., New York-London-Sydney, 1971. MR0270403 (42 #5292)
[15] A. Furman and Y. Shalom, Sharp ergodic theorems for group actions and strong ergodicity, Ergodic Theory Dynam. Systems 19 (1999), no. 4, 1037–1061, DOI 10.1017/S0143385799133881. MR1709429 (2000i:37001)
[16] H. Furstenberg, Boundary theory and stochastic processes on homogeneous spaces, Harmonic analysis on homogeneous spaces (Proc. Sympos. Pure Math., Vol. XXVI, Williams Coll., Williamstown, Mass., 1972), Amer. Math. Soc., Providence, R.I., 1973, pp. 193–229. MR0352328 (50 #4815)
[17] Z. Gao, Y. Guivarc'h, and E. Le Page, Spectral gap properties and convergence to stable laws for affine random walks, arXiv:1108.3146 (2011).
[18] I. Ya. Goldsheĭd and G. A. Margulis, Lyapunov exponents of a product of random matrices (Russian), Uspekhi Mat. Nauk 44 (1989), no. 5(269), 13–60, DOI 10.1070/RM1989v044n05ABEH002214; English transl., Russian Math. Surveys 44 (1989), no. 5, 11–71. MR1040268 (91j:60014)
[19] I. Ya. Goldsheid and Y. Guivarc'h, Zariski closure and the dimension of the Gaussian law of the product of random matrices. I, Probab. Theory Related Fields 105 (1996), no. 1, 109–142, DOI 10.1007/BF01192073. MR1389734 (97f:60019)
[20] S. Gouëzel, Berry-Esseen theorem and local limit theorem for non uniformly expanding maps (English, with English and French summaries), Ann. Inst. H. Poincaré Probab. Statist. 41 (2005), no. 6, 997–1024, DOI 10.1016/j.anihpb.2004.09.002. MR2172207 (2007b:60071)
[21] Y. Guivarc'h, Équirépartition dans les espaces homogènes (French), Théorie ergodique (Actes Journées Ergodiques, Rennes, 1973/1974), Springer, Berlin, 1976, pp. 131–142. Lecture Notes in Math., Vol. 532. MR0480860 (58 #1009)
[22] Y. Guivarc'h, Quelques propriétés asymptotiques des produits de matrices aléatoires (French), Eighth Saint Flour Probability Summer School, 1978 (Saint Flour, 1978), Lecture Notes in Math., vol. 774, Springer, Berlin, 1980, pp. 177–250. MR590627 (82b:60006)
[23] Y. Guivarc'h, Application d'un théorème limite local à la transience et à la récurrence de marches de Markov (French), Théorie du potentiel (Orsay, 1983), Lecture Notes in Math., vol. 1096, Springer, Berlin, 1984, pp. 301–332, DOI 10.1007/BFb0100117. MR890364 (88h:60124)
[24] Y. Guivarc'h and A. Raugi, Products of random matrices: convergence theorems, Random matrices and their applications (Brunswick, Maine, 1984), Contemp. Math., vol. 50, Amer. Math. Soc., Providence, RI, 1986, pp. 31–54, DOI 10.1090/conm/050/841080. MR841080 (87m:60024)
[25] Y. Guivarc'h and J. Hardy, Théorèmes limites pour une classe de chaînes de Markov et applications aux difféomorphismes d'Anosov (French, with English summary), Ann. Inst. H. Poincaré Probab. Statist. 24 (1988), no. 1, 73–98. MR937957 (89m:60080)
[26] Y. Guivarc'h, Limit theorems for random walks and products of random matrices, Probability measures on groups: recent directions and trends, Tata Inst. Fund. Res., Mumbai, 2006, pp. 255–330. MR2213480 (2007i:60006)
[27] Y. Guivarc'h, Heavy tail properties of stationary solutions of multidimensional stochastic recursions, Dynamics & stochastics, IMS Lecture Notes Monogr. Ser., vol. 48, Inst. Math. Statist., Beachwood, OH, 2006, pp. 85–99, DOI 10.1214/lnms/1196285811. MR2306191 (2008h:60261)
[28] Y. Guivarc'h and E. Le Page, On spectral properties of a family of transfer operators and convergence to stable laws for affine random walks, Ergodic Theory Dynam. Systems 28 (2008), no. 2, 423–446. MR2408386 (2009b:60051)
[29] Y. Guivarc'h, On the spectrum of a large subgroup of a semisimple group, J. Mod. Dyn. 2 (2008), no. 1, 15–42, DOI 10.3934/jmd.2008.2.15. MR2366228 (2009f:22006)
[30] Y. Guivarc'h, On contraction properties for products of Markov driven random matrices (English, with English and Ukrainian summaries), Zh. Mat. Fiz. Anal. Geom. 4 (2008), no. 4, 457–489, 573. MR2485240 (2010a:37012)
[31] Y. Guivarc'h and E. Le Page, Spectral gap properties and asymptotics of stationary measures for affine random walks, arXiv:1204.6004v4 (2014).
[32] Y. Guivarc'h and E. Le Page, Homogeneity at infinity of stationary solutions of multivariate affine stochastic recursions, Random matrices and iterated random functions, Springer Proc. Math. Stat., vol. 53, Springer, Heidelberg, 2013, pp. 119–135, DOI 10.1007/978-3-642-38806-4_6. MR3095196
[33] Y. Guivarc'h and E. Le Page, Asymptotique des valeurs extrêmes pour les marches aléatoires affines (French, with English and French summaries), C. R. Math. Acad. Sci. Paris 351 (2013), no. 17-18, 703–705, DOI 10.1016/j.crma.2013.09.017. MR3124330
[34] H. Hennion, Sur un théorème spectral et son application aux noyaux lipchitziens (French, with English summary), Proc. Amer. Math. Soc. 118 (1993), no. 2, 627–634, DOI 10.2307/2160348. MR1129880 (93g:60141)
[35] C. T. Ionescu Tulcea and G. Marinescu, Théorie ergodique pour des classes d'opérations non complètement continues (French), Ann. of Math. (2) 52 (1950), 140–147. MR0037469 (12,266g)
[36] D. A. Každan, Uniform distribution on a plane (Russian), Trudy Moskov. Mat. Obšč. 14 (1965), 299–305. MR0193187 (33 #1408)
[37] G. Keller, Un théorème de la limite centrale pour une classe de transformations monotones par morceaux (French, with English summary), C. R. Acad. Sci. Paris Sér. A-B 291 (1980), no. 2, A155–A158. MR605005 (82i:60042)
[38] H. Kesten, Random difference equations and renewal theory for products of random matrices, Acta Math. 131 (1973), 207–248. MR0440724 (55 #13595)
[39] M. S. Kirsebom, Extreme value theory for random walks on homogeneous spaces, Discrete Contin. Dyn. Syst. 34 (2014), no. 11, 4689–4717, DOI 10.3934/dcds.2014.34.4689. MR3223825
[40] C. Klüppelberg and S. Pergamenchtchikov, Extremal behaviour of models with multivariate random recurrence representation, Stochastic Process. Appl. 117 (2007), no. 4, 432–456, DOI 10.1016/j.spa.2006.09.001. MR2305380 (2008j:60125)
[41] M. R. Leadbetter, G. Lindgren, and H. Rootzén, Extremes and related properties of random sequences and processes, Springer Series in Statistics, Springer-Verlag, New York-Berlin, 1983. MR691492 (84h:60050)
[42] E. Le Page, Théorèmes limites pour les produits de matrices aléatoires (French), Probability measures on groups (Oberwolfach, 1981), Lecture Notes in Math., vol. 928, Springer, Berlin-New York, 1982, pp. 258–303. MR669072 (84d:60012)
[43] A. V. Lëtchikov, Products of unimodular independent random matrices (Russian), Uspekhi Mat. Nauk 51 (1996), no. 1(307), 51–100, DOI 10.1070/RM1996v051n01ABEH002735; English transl., Russian Math. Surveys 51 (1996), no. 1, 49–96. MR1392671 (97k:60020)
[44] M. Pollicott, Limiting distributions for geodesics excursions on the modular surface, Spectral analysis in geometry and number theory, Contemp. Math., vol. 484, Amer. Math. Soc., Providence, RI, 2009, pp. 177–185, DOI 10.1090/conm/484/09474. MR1500147 (2010c:37074)
[45] M. Rosenblatt, Markov processes. Structure and asymptotic behavior, Springer-Verlag, New York-Heidelberg, 1971. Die Grundlehren der mathematischen Wissenschaften, Band 184. MR0329037 (48 #7379)
[46] J. Rousseau-Egele, Un théorème de la limite locale pour une classe de transformations dilatantes et monotones par morceaux (French, with English summary), Ann. Probab. 11 (1983), no. 3, 772–788. MR704569 (84m:60032)
[47] K. Schmidt, Asymptotically invariant sequences and an action of SL(2, Z) on the 2-sphere, Israel J. Math. 37 (1980), no. 3, 193–208, DOI 10.1007/BF02760961. MR599454 (82e:28023a)
[48] Ja. G. Sinaĭ, Gibbs measures in ergodic theory (Russian), Uspehi Mat. Nauk 27 (1972), no. 4(166), 21–64. MR0399421 (53 #3265)
[49] F. Spitzer, Principles of random walk, 2nd ed., Springer-Verlag, New York-Heidelberg, 1976. Graduate Texts in Mathematics, Vol. 34. MR0388547 (52 #9383)
[50] D. Sullivan, Disjoint spheres, approximation by imaginary quadratic numbers, and the logarithm law for geodesics, Acta Math. 149 (1982), no. 3-4, 215–237, DOI 10.1007/BF02392354. MR688349 (84j:58097)
[51] P. P. Varjú, Random walks in Euclidean spaces, arXiv:1205.3399.
[52] A. D. Vircer, Matrix and operator random products (Russian, with English summary), Teor. Veroyatnost. i Primenen. 24 (1979), no. 2, 361–370. MR532448 (80h:60018)
[53] Ya. B. Vorobets, On the actions of finitely generated groups and semigroups on a plane by means of isometries (Russian, with Russian summary), Mat. Zametki 75 (2004), no. 4, 523–548, DOI 10.1023/B:MATN.0000023332.36682.51; English transl., Math. Notes 75 (2004), no. 3-4, 489–512. MR2068284 (2005e:28024)
[54] D. V. Widder, The Laplace Transform, Princeton Mathematical Series, v. 6, Princeton University Press, Princeton, N. J., 1941. MR0005923 (3,232d)

Proceedings of Symposia in Pure Mathematics Volume 89, 2015 http://dx.doi.org/10.1090/pspum/089/01490

The martingale approach after Varadhan and Dolgopyat

Jacopo De Simoi and Carlangelo Liverani

Abstract. We present, in the simplest possible form, the so-called martingale problem strategy to establish limit theorems. The presentation is specially adapted to problems arising in partially hyperbolic dynamical systems. We will discuss a simple partially hyperbolic example with fast-slow variables and use the martingale method to prove an averaging theorem and study fluctuations from the average. The emphasis is on ideas rather than on results. Also, no effort whatsoever is made to review the vast literature of the field.

Contents
1. Introduction
2. Preliminaries and results
3. Standard Pairs
4. Conditioning
5. Averaging (the Law of Large Numbers)
6. A recap of what we have done so far
7. Fluctuations (the Central Limit Theorem)
Appendix A. Geometry
Appendix B. Shadowing
Appendix C. Martingales, operators and Itō's calculus
References

C.L. thanks Konstantin Khanin for pointing out (a long time ago, during the 2011 Thematic Program on Dynamics and Transport in Disordered Systems at the Fields Institute) the need for this note and forcing him to start thinking about it. We thank Mikko Stenlund and Denis Volk for several comments on a preliminary version. We thank Dmitry Dolgopyat, Alexey Korepanov, Zemer Kosloff, Ian Melbourne, Mark Pollicott and the anonymous referees for providing several suggestions which improved the readability of the current version. We thank the Centre Bernoulli, Lausanne, Switzerland, where part of these notes was written. Both authors have been partially supported by the ERC Grant MALADY (ERC AdG 246953). J.D.S. acknowledges partial NSERC support.

© 2015 American Mathematical Society


1. Introduction

In this note¹ we purport to explain in the simplest possible terms a strategy to investigate the statistical properties of dynamical systems put forward by Dmitry Dolgopyat [3]. It should be remarked that Dolgopyat has adapted to the field of Dynamical Systems a scheme developed by Srinivasa Varadhan and collaborators, first for the study of stochastic processes arising from a diffusion [13], then for the study of limit theorems (e.g. the hydrodynamics limit), starting with the pioneering [7], and large deviations, e.g. [4].² The adaptation is highly non trivial, as in the case of Dynamical Systems two basic tools commonly used in probability (conditioning and Itō calculus) are missing. The lesson of Dolgopyat is that such tools can be recovered nevertheless, provided one looks at the problem in the right way. Rather than making an abstract exposition, we prefer a hands-on presentation. Hence, we will illustrate the method by discussing a super simple (but highly non trivial) example. The presentation is especially aimed at readers in the field of Dynamical Systems. Thus probabilists could find the exposition at times excessively detailed and/or redundant and at other times a bit too fast.

1.1. Fast-Slow partially hyperbolic systems.

We are interested in studying fast-slow systems in which the fast variable undergoes a strongly chaotic motion. Namely, let $M$, $S$ be two compact Riemannian manifolds, let $X = M \times S$ be the configuration space of our systems and let $m_{\mathrm{Leb}}$ be the Riemannian measure on $M$. For simplicity, we consider only the case in which $S = \mathbb{T}^d$ for some $d \in \mathbb{N}$. We consider a map $F_0 \in C^r(X,X)$, $r \geq 3$, defined by

$$F_0(x,\theta) = (f(x,\theta), \theta)$$

where the maps $f(\cdot,\theta)$ are uniformly hyperbolic for every $\theta$. If we consider a small perturbation of $F_0$ we note that the perturbation of $f$ still yields a uniformly hyperbolic system, by structural stability. Thus such a perturbation can be subsumed in the original maps. Hence, it suffices to study families of maps of the form

$$F_{\varepsilon}(x,\theta) = (f(x,\theta), \theta + \varepsilon\omega(x,\theta))$$

with $\varepsilon \in (0,\varepsilon_0)$, for some $\varepsilon_0$ small enough, and $\omega \in C^r$. Such systems are called fast-slow, since the variable $\theta$, the slow variable, needs a time at least $\varepsilon^{-1}$ to change substantially. The basic question is: what are the statistical properties of $F_{\varepsilon}$? The answer to such a question is at the end of a long road that starts with the attempt to understand the dynamics for times of order $\varepsilon^{-1}$. In this note we will concentrate on such a preliminary problem and will describe how to overcome the first obstacles along the path we would like to walk.

¹ A first, preliminary, version of this note was prepared by the second author for a mini course at the conference Beyond Uniform Hyperbolicity in Bedlewo, Poland, held at the end of May 2013, which, ultimately, he could not attend. The note was then extended and presented during the semester Hyperbolic dynamics, large deviations and fluctuations held at the Bernoulli Centre, Lausanne, January–June 2013.
² It should be noted that the above has no pretension of being an exact historical reconstruction; it just describes the way we learned this material. Indeed, some of the relevant ideas were previously present. See, e.g., the reference [9] pointed out to us by Sergei Kuksin.
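The time scale $\varepsilon^{-1}$ for the slow variable can be seen in a toy computation. The sketch below is only an illustration under stated assumptions: the fast map $f(x,\theta) = 3x + 0.1\sin(2\pi x) + \theta \bmod 1$ (an expanding circle map with $\partial_x f \geq 3 - 0.2\pi > 2$) and the function $\omega(x,\theta) = \sin(2\pi x) + 0.5\cos(2\pi\theta)$ are hypothetical choices, not taken from the text, and $\theta$ is tracked as a lift to $\mathbb{R}$ so that its displacement can be measured.

```python
import math

def f(x, theta):
    """Hypothetical fast map on the circle; d/dx f >= 3 - 0.2*pi > 2."""
    return (3.0 * x + 0.1 * math.sin(2.0 * math.pi * x) + theta) % 1.0

def omega(x, theta):
    """Hypothetical smooth function driving the slow variable; |omega| <= 1.5."""
    return math.sin(2.0 * math.pi * x) + 0.5 * math.cos(2.0 * math.pi * theta)

def step(x, theta, eps):
    """One iterate of F_eps(x, theta) = (f(x, theta), theta + eps * omega(x, theta))."""
    return f(x, theta), theta + eps * omega(x, theta)

def orbit(x0, theta0, eps, n):
    """Return the lifted slow coordinates theta_0, ..., theta_n along an orbit."""
    x, theta = x0, theta0
    thetas = [theta]
    for _ in range(n):
        x, theta = step(x, theta, eps)
        thetas.append(theta)
    return thetas
```

Since $|\omega| \leq 1.5$ in this example, after $k$ iterates the slow variable has moved by at most $1.5\,\varepsilon k$, so a displacement of order one indeed requires $k$ of order $\varepsilon^{-1}$; for $\varepsilon = 0$ the slow variable is exactly conserved.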

MARTINGALES


1.2. The unperturbed system: ε = 0. The statistical properties of the system are well understood in the case $\varepsilon = 0$. In such a case $\theta$ is an invariant of motion, while for every $\theta$ the map $f(\cdot, \theta)$ has strong statistical properties. We will need such properties in the following discussion, which is predicated on the idea that, for times long but much shorter than $\varepsilon^{-1}$, on the one hand $\theta$ remains almost constant while, on the other hand, its change depends essentially on the behavior of an ergodic sum with respect to the fixed dynamics $f(\cdot, \theta)$. It is not obvious which exact general properties are necessary to prove the type of results we are interested in. Yet, let us give an idea of the situation by listing the main properties that we will need, and use, in the following.
(1) The maps $f(\cdot, \theta)$ admit a unique SRB (Sinai–Ruelle–Bowen) measure $m_\theta$.
(2) The measure $m_\theta$, when seen as an element of the dual space $C^1(M, \mathbb{R})'$, is differentiable in $\theta$.
(3) There exist $C_0, \alpha > 0$ such that, for each $g, h \in C^1(M, \mathbb{R})$, we have³
\[ |m_{\mathrm{Leb}}(h \cdot g \circ f^n(\cdot, \theta)) - m_\theta(g)\, m_{\mathrm{Leb}}(h)| \le C_0 e^{-\alpha n} \|h\|_{B_1} \|g\|_{B_2}, \]
\[ |m_\theta(h \cdot g \circ f^n(\cdot, \theta)) - m_\theta(g)\, m_\theta(h)| \le C_0 e^{-\alpha n} \|h\|_{B_1} \|g\|_{B_2}, \]
where $B_1, B_2$ are appropriate Banach spaces.⁴
The above properties hold for a wide class of uniformly hyperbolic systems [1, 2, 5, 6]; yet here, to further simplify the exposition, we assume that $M = \mathbb{T}^1$ and
\[ \partial_x f \ge \lambda > 2. \tag{1.1} \]

Then an SRB measure is just a measure absolutely continuous with respect to Lebesgue, and all the above properties are well known with the choices $B_1 = C^1$ and $B_2 = C^0$, or $B_1 = BV$ and $B_2 = L^1$ (see [10] for a fast and elementary exposition or [1] for a more complete discussion).

Remark 1.1. For the wondering reader: in all the following arguments the case of a higher dimensional expanding map can be treated in almost exactly the same way (apart from a slightly heavier notation).⁵ On the contrary, the case of a hyperbolic map is a bit more complex (although the logic of the argument remains exactly the same) due to the different definition of standard pairs necessary to handle the stable direction. See [3] for details.

Remark 1.2. Note that in the following we do not require or use the exact knowledge of the spectrum of the transfer operator.⁶ Yet, a detailed understanding of the transfer operator might be necessary in order to obtain sharper results.

³ We remark that a slower decay of correlations could suffice, but let us keep things simple.
⁴ The exact required properties for the Banach spaces vary depending on the context. In the context that we are going to consider nothing much is needed. Yet, in general, it could be helpful to have properties that allow one to treat multiple correlations automatically: let $\{g_1, g_2, g_3\} \subset C^1$, then
\[ m_{\mathrm{Leb}}(g_1 \cdot (g_2 \circ f^n \cdot g_3) \circ f^m) = m_{\mathrm{Leb}}(g_1)\, m_\theta(g_2 \circ f^n \cdot g_3) + O\big(e^{-\alpha m} \|g_1\|_{B_1} \|g_2 \circ f^n \cdot g_3\|_{B_2}\big). \]
Thus, in order to have automatically decay of multiple correlations we need, at least, $\|g_2 \circ f^n\|_{B_2} \le C_\# \|g_2\|_{B_2}$, which is false, for example, for the $C^1$ norm.
⁵ Simply, the support of a standard pair will be a ball rather than a segment.
⁶ The transfer operator $\mathcal{L}_\theta$ is simply the adjoint of the dynamics, i.e. $\mathcal{L}_\theta \mu(g) = \mu(g \circ f(\cdot, \theta))$, when acting on an appropriate class of measures.


JACOPO DE SIMOI AND CARLANGELO LIVERANI

It follows that the dynamical system $(X, F_0)$ has uncountably many SRB measures: all the measures of the form $\mu(\varphi) = \int \varphi(x, \theta)\, m_\theta(dx)\, \nu(d\theta)$ for an arbitrary measure $\nu$. The ergodic measures are the ones in which $\nu$ is a point mass. The system is partially hyperbolic and has a central foliation. Indeed, the $f(\cdot, \theta)$ are all topologically conjugate by structural stability of expanding maps [8]. Let $h(\cdot, \theta)$ be the map conjugating $f(\cdot, 0)$ with $f(\cdot, \theta)$, that is $h(f(x, 0), \theta) = f(h(x, \theta), \theta)$. Thus the foliation $W^c_x = \{(h(x, \theta), \theta)\}_{\theta \in S}$ is invariant under $F_0$ and consists of points that stay, more or less, always at the same distance, hence it is a center foliation. Note however that, since in general $h$ is only a Hölder continuous function (see [8]), the foliation is very irregular and, typically, not absolutely continuous.

In conclusion, the map $F_0$ has rather poor statistical properties and a not very intuitive description as a partially hyperbolic system. It is then not surprising that its perturbations form a very rich universe to explore, and already the study of the behavior of the dynamics for times of order $\varepsilon^{-1}$ (a time long enough so that the variable $\theta$ has a non trivial evolution, but far too short to investigate the statistical properties of $F_\varepsilon$) is interesting and non trivial.

2. Preliminaries and results

Let $\mu_0$ be a probability measure on $X$. Let us define $(x_n, \theta_n) = F_\varepsilon^n(x, \theta)$; then $(x_n, \theta_n)$ are random variables⁷ if $(x_0, \theta_0)$ are distributed according to $\mu_0$.⁸ It is natural to define the polygonalization⁹
\[ \Theta_\varepsilon(t) = \theta_{\lfloor \varepsilon^{-1} t \rfloor} + (t - \varepsilon \lfloor \varepsilon^{-1} t \rfloor)\big(\theta_{\lfloor \varepsilon^{-1} t \rfloor + 1} - \theta_{\lfloor \varepsilon^{-1} t \rfloor}\big), \qquad t \in [0, T]. \tag{2.1} \]

Note that $\Theta_\varepsilon$ is a random variable on $X$ with values in $C^0([0, T], S)$. Also, note the time rescaling, done so that one expects non trivial paths.

It is often convenient to consider random variables defined directly on the space $C^0([0, T], S)$ rather than $X$. Let us discuss the set up from such a point of view. The space $C^0([0, T], S)$ endowed with the uniform topology is a separable metric space. We can then view $C^0([0, T], S)$ as a probability space equipped with the Borel σ-algebra. It turns out that such a σ-algebra is the minimal σ-algebra containing the open sets $\bigcap_{i=1}^n \{\vartheta \in C^0([0, T], S) \mid \vartheta(t_i) \in U_i\}$ for each $\{t_i\} \subset [0, T]$ and open sets $U_i \subset S$ [13, Section 1.3]. Since $\Theta_\varepsilon$ can be viewed as a continuous map from $X$ to $C^0([0, T], S)$, the measure $\mu_0$ induces naturally a measure $\mathbb{P}^\varepsilon$ on $C^0([0, T], S)$: $\mathbb{P}^\varepsilon = (\Theta_\varepsilon)_* \mu_0$.¹⁰ Also, for each $t \in [0, T]$ let $\Theta(t) \in C^0(C^0([0, T], S), S)$ be the random variable defined by $\Theta(t, \vartheta) = \vartheta(t)$, for each $\vartheta \in C^0([0, T], S)$. Next, for each $A \in C^0(C^0([0, T], S), \mathbb{R})$, we will write $\mathbb{E}^\varepsilon(A)$ for the expectation with respect to $\mathbb{P}^\varepsilon$. For $A \in C^0(S, \mathbb{R})$ and $t \in [0, T]$, $\mathbb{E}^\varepsilon(A \circ \Theta(t)) = \mathbb{E}^\varepsilon(A(\Theta(t)))$ is the expectation of the function $\vartheta \mapsto A(\vartheta(t))$, $\vartheta \in C^0([0, T], S)$.

To continue, a more detailed discussion concerning the initial conditions is called for. Note that not all measures are reasonable as initial conditions. Just think of the possibility to start with initial conditions given by a point mass, hence killing any trace of randomness. The best one can reasonably do is to fix the slow variable and leave the randomness only in the fast one. Thus we will consider measures $\mu_0$ of the following type: for each $\varphi \in C^0(X, \mathbb{R})$, $\mu_0(\varphi) = \int \varphi(x, \theta_0) h(x)\, dx$ for some $\theta_0 \in S$ and $h \in C^1(M, \mathbb{R}_+)$.

⁷ Recall that a random variable is a measurable function from a probability space to a measurable space.
⁸ That is, the probability space is $X$ equipped with the Borel σ-algebra, $\mu_0$ is the probability measure and $(x_n, \theta_n)$ are functions of $(x, \theta) \in X$.
⁹ Since we interpolate between close points the procedure is uniquely defined on the torus.
¹⁰ Given a measurable map $T : X \to Y$ between measurable spaces and a measure $P$ on $X$, $T_* P$ is a measure on $Y$ defined by $T_* P(A) = P(T^{-1}(A))$ for each measurable set $A \subset Y$.

Our first problem is to understand $\lim_{\varepsilon \to 0} \mathbb{P}^\varepsilon$. After some necessary preliminaries, in Section 5 we will prove the following result.

Theorem 2.1. The measures $\{\mathbb{P}^\varepsilon\}$ have a weak limit $\mathbb{P}$; moreover $\mathbb{P}$ is a measure supported on the trajectory determined by the O.D.E.
\[ \dot{\bar\Theta} = \bar\omega(\bar\Theta), \qquad \bar\Theta(0) = \theta_0, \tag{2.2} \]
where $\bar\omega(\theta) = \int_M \omega(x, \theta)\, m_\theta(dx)$.
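As a purely illustrative sketch of Theorem 2.1 (it is not part of the original argument), the following Python code iterates a toy fast-slow map with d = 1 and compares the slow variable, at the times t = εk, with an Euler approximation of the averaged equation (2.2). The choices f(x, θ) = 3x + θ mod 1 and ω(x, θ) = −sin(2πθ) + cos(2πx), the initial point and the value of ε are all assumptions made here for convenience: this f satisfies (1.1) and preserves Lebesgue measure for every θ, so that m_θ is Lebesgue and ω̄(θ) = −sin(2πθ).

```python
import math

def f(x, theta):
    # Hypothetical fast map: expanding (slope 3 > 2) and Lebesgue-invariant for every theta.
    return (3.0 * x + theta) % 1.0

def omega(x, theta):
    # Hypothetical slow vector field; its average in x is -sin(2*pi*theta).
    return -math.sin(2 * math.pi * theta) + math.cos(2 * math.pi * x)

def omega_bar(theta):
    # Average of omega(., theta) with respect to Lebesgue measure.
    return -math.sin(2 * math.pi * theta)

def slow_orbit(eps, n_steps, x0, theta0):
    """Return [theta_0, ..., theta_n] along the orbit of F_eps (theta lifted to R)."""
    x, theta = x0, theta0
    thetas = [theta]
    for _ in range(n_steps):
        # Both components are updated from the current point (x_n, theta_n).
        x, theta = f(x, theta), theta + eps * omega(x, theta)
        thetas.append(theta)
    return thetas

def averaged_orbit(eps, n_steps, theta0):
    """Euler scheme with step eps for the averaged ODE (2.2)."""
    theta = theta0
    thetas = [theta]
    for _ in range(n_steps):
        theta += eps * omega_bar(theta)
        thetas.append(theta)
    return thetas

eps = 1e-4
n_steps = 10_000  # rescaled time horizon T = eps * n_steps = 1
slow = slow_orbit(eps, n_steps, x0=0.1234, theta0=0.2)
avg = averaged_orbit(eps, n_steps, theta0=0.2)
gap = max(abs(a - b) for a, b in zip(slow, avg))
print(f"max_k |theta_k - averaged(eps k)| = {gap:.4f}")
```

The printed gap is expected to be small compared with the size of the trajectory and to shrink as ε decreases; a single orbit is of course only a sanity check, not a verification of weak convergence.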

The above theorem specifies in which sense the random variable $\Theta_\varepsilon$ converges to the average dynamics described by equation (2.2). The next natural question is how fast the convergence takes place. To this end it is natural to consider, for each $t \in [0, T]$,
\[ \zeta_\varepsilon(t) = \varepsilon^{-\frac12}\big[\Theta_\varepsilon(t) - \bar\Theta(t)\big]. \]
Note that $\zeta_\varepsilon$ is a random variable on $X$ with values in $C^0([0, T], \mathbb{R}^d)$ which describes the fluctuations around the average.¹¹ Let $\widetilde{\mathbb{P}}^\varepsilon$ be the path measure describing $\zeta_\varepsilon$ when $(x_0, \theta_0)$ are distributed according to the measure $\mu_0$. That is, $\widetilde{\mathbb{P}}^\varepsilon = (\zeta_\varepsilon)_* \mu_0$. Our second task, and the last in this note, will be to understand the limit behavior of $\widetilde{\mathbb{P}}^\varepsilon$, hence of the fluctuations around the average. Section 7 will be devoted to proving the following result.

Theorem 2.2. The measures $\{\widetilde{\mathbb{P}}^\varepsilon\}$ have a weak limit $\widetilde{\mathbb{P}}$. Moreover, $\widetilde{\mathbb{P}}$ is the measure of the zero average Gaussian process defined by the Stochastic Differential Equation (SDE)
\[ d\zeta = D\bar\omega(\bar\Theta)\,\zeta\, dt + \sigma(\bar\Theta)\, dB, \qquad \zeta(0) = 0, \tag{2.3} \]
where $B$ is the $\mathbb{R}^d$ dimensional standard Brownian motion and the diffusion coefficient $\sigma$ is given by¹²
\[ \sigma(\theta)^2 = m_\theta\big(\hat\omega(\cdot, \theta) \otimes \hat\omega(\cdot, \theta)\big) + \sum_{m=1}^{\infty} m_\theta\big(\hat\omega(f_\theta^m(\cdot), \theta) \otimes \hat\omega(\cdot, \theta)\big) + \sum_{m=1}^{\infty} m_\theta\big(\hat\omega(\cdot, \theta) \otimes \hat\omega(f_\theta^m(\cdot), \theta)\big), \tag{2.4} \]
where $\hat\omega = \omega - \bar\omega$ and we have used the notation $f_\theta(x) = f(x, \theta)$. In addition, $\sigma^2$ is symmetric and non-negative, hence $\sigma$ is uniquely defined as a symmetric non-negative definite matrix. Finally, $\sigma(\theta)$ is strictly positive, unless $\hat\omega(\cdot, \theta)$ is a coboundary for $f_\theta$.

¹¹ Here we are using that $S = \mathbb{T}^d$ can be lifted to its universal cover $\mathbb{R}^d$.
¹² In our notation, for any measure $\mu$ and vectors $v, w$, $\mu(v \otimes w)$ is a matrix with entries $\mu(v_i w_j)$.
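To make formula (2.4) concrete, here is a small numerical sketch for d = 1 under assumptions that are ours and not the paper's: the fast map is f(x) = 3x mod 1 for every θ (so that m_θ is Lebesgue measure), and the two zero-average observables `generic` and `coboundary` are hypothetical examples. The series is truncated at a finite lag, and the integrals are computed by the midpoint rule, which is exact (up to rounding) for the trigonometric polynomials appearing here.

```python
import math

N = 4096  # number of midpoint nodes on [0, 1)

def lebesgue_mean(func):
    # Midpoint rule on the circle; exact for trigonometric polynomials of degree < N.
    return sum(func((j + 0.5) / N) for j in range(N)) / N

def sigma2(omega_hat, max_lag=5):
    """Truncated version of (2.4) for d = 1 and f(x) = 3x mod 1 (assumed example)."""
    total = lebesgue_mean(lambda x: omega_hat(x) ** 2)
    for m in range(1, max_lag + 1):
        # For d = 1 the two series in (2.4) coincide, hence the factor 2.
        corr = lebesgue_mean(lambda x: omega_hat((3 ** m * x) % 1.0) * omega_hat(x))
        total += 2.0 * corr
    return total

def generic(x):
    # Zero-average observable whose values at x and f(x) are correlated.
    return math.cos(2 * math.pi * x) + math.cos(6 * math.pi * x)

def coboundary(x):
    # g(f(x)) - g(x) with g(x) = cos(2 pi x): a coboundary for f.
    return math.cos(6 * math.pi * x) - math.cos(2 * math.pi * x)

print(sigma2(generic), sigma2(coboundary))
```

Under these assumptions the two printed values should be 2 and 0 up to rounding errors, in line with the last statement of Theorem 2.2: the variance vanishes for the coboundary and is strictly positive otherwise.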


Remark 2.3. Note that, setting $\psi(\lambda, t) = \mathbb{E}(e^{i\langle \lambda, \zeta(t) \rangle})$, equation (2.3) implies, by Itô's formula, that
\[ \partial_t \psi = \langle \lambda, D\bar\omega\, \partial_\lambda \psi \rangle - \frac12 \langle \lambda, \sigma^2 \lambda \rangle \psi, \qquad \psi(\lambda, 0) = 1, \]
which implies that $\psi$ is a zero mean Gaussian. In turn, this implies that $\zeta$ is a zero mean Gaussian process; see the proof of Proposition 7.6 for more details.

Remark 2.4. It is interesting to notice that equation (2.3) with $\sigma \equiv 0$ is just the equation for the evolution of an infinitesimal displacement of the initial condition, that is the linearised equation along an orbit of the averaged deterministic system. This is rather natural since, in the time scale we are considering, the fluctuations around the deterministic trajectory are very small.

Remark 2.5. Note that the condition that ensures that the diffusion coefficient $\sigma$ is non zero can be constructively checked by finding periodic orbits with different averages.

Having stated our goals, let us begin with a first, very simple, result.

Lemma 2.6. The measures $\{\mathbb{P}^\varepsilon\}$ are tight.

Proof. By (2.1) it follows that the path $\Theta_\varepsilon$ is made of segments of length $\varepsilon$ and maximal slope $\|\omega\|_{L^\infty}$, thus for all $h > 0$,¹³
\[ \|\Theta_\varepsilon(t + h) - \Theta_\varepsilon(t)\| \le C_\# h + \varepsilon \sum_{k = \lceil \varepsilon^{-1} t \rceil}^{\lfloor \varepsilon^{-1}(t + h) \rfloor - 1} \|\omega(x_k, \theta_k)\| \le C_\# h. \]
Thus the measures $\mathbb{P}^\varepsilon$ are all supported on a set of uniformly Lipschitz functions, that is a compact set. □

The above means that there exist converging subsequences $\{\mathbb{P}^{\varepsilon_j}\}$. Our next step is to identify the set of accumulation points. An obstacle that we face immediately is the impossibility of using some typical probabilistic tools, in particular conditioning with respect to the past and Itô's formula. In fact, even if the initial condition is random, the dynamics is still deterministic, hence conditioning with respect to the past seems hopeless as it might kill all the randomness at later times.

To solve the first problem it is therefore necessary to devise a systematic way to use the strong dependence on the initial condition (typical of hyperbolic systems) to show that the dynamics, in some sense, forgets the past. One way of doing this effectively is to use standard pairs, introduced in the next section, thereby slightly enlarging our allowed initial conditions. Exactly how this solves the conditioning problem will be explained in Section 4. The lack of Itô's formula will be overcome by taking the point of view of the Martingale problem to define the solution of a SDE. To explain what this means in the present context is the goal of the present note, but see Appendix C for a brief comment on this issue in the simple case of an SDE. We will come back to the problem of studying the accumulation points of $\{\mathbb{P}^\varepsilon\}$ after having settled the issue of conditioning.

¹³ The reader should be aware that we use the notation $C_\#$ to designate a generic constant (depending only on $f$ and $\omega$) whose numerical value can change from one occurrence to the next, even in the same line.


3. Standard Pairs

Let us fix $\delta > 0$ small enough, and $D > 0$ large enough, to be specified later; for $c_1 > 0$ consider the set of functions
\[ \Sigma_{c_1} = \{G \in C^2([a, b], S) \mid a, b \in \mathbb{T}^1,\ b - a \in [\delta/2, \delta],\ \|G'\|_{C^0} \le \varepsilon c_1,\ \|G''\|_{C^0} \le \varepsilon D c_1\}. \]
Let us associate to each $G \in \Sigma_{c_1}$ the map $\mathbb{G} \in C^2([a, b], X)$ defined by $\mathbb{G}(x) = (x, G(x))$, whose image is a curve (the graph of $G$) which will be called a standard curve. For $c_2 > 0$ large enough, let us define the set of $c_2$-standard probability densities on the standard curve as
\[ D_{c_2}(G) = \left\{\rho \in C^1([a, b], \mathbb{R}_+) \ \middle|\ \int_a^b \rho(x)\, dx = 1,\ \left\|\frac{\rho'}{\rho}\right\|_{C^0} \le c_2\right\}. \]
A standard pair $\ell$ is given by $\ell = (\mathbb{G}, \rho)$ where $G \in \Sigma_{c_1}$ and $\rho \in D_{c_2}(G)$. Let $\mathbb{L}_\varepsilon$ be the collection of all standard pairs for a given $\varepsilon > 0$. A standard pair $\ell = (\mathbb{G}, \rho)$ induces a probability measure $\mu_\ell$ on $X = \mathbb{T}^{d+1}$ defined as follows: for any continuous function $g$ on $X$ let
\[ \mu_\ell(g) := \int_a^b g(x, G(x))\, \rho(x)\, dx. \]

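We define¹⁴ a standard family $\mathfrak{L} = (\mathcal{A}, \nu, \{\ell_j\}_{j \in \mathcal{A}})$, where $\mathcal{A} \subset \mathbb{N}$ and $\nu$ is a probability measure on $\mathcal{A}$; i.e. we associate to each standard pair $\ell_j$ a positive weight $\nu(\{j\})$ so that $\sum_{j \in \mathcal{A}} \nu(\{j\}) = 1$. For the following we will use also the notation $\nu_{\ell_j} = \nu(\{j\})$ for each $j \in \mathcal{A}$ and we will write $\ell \in \mathfrak{L}$ if $\ell = \ell_j$ for some $j \in \mathcal{A}$. A standard family $\mathfrak{L}$ naturally induces a probability measure $\mu_{\mathfrak{L}}$ on $X$ defined as follows: for any measurable function $g$ on $X$ let
\[ \mu_{\mathfrak{L}}(g) := \sum_{\ell \in \mathfrak{L}} \nu_\ell\, \mu_\ell(g). \]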
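The definitions above translate directly into a small data structure. The following Python sketch, for d = 1, is only meant to fix ideas: the class name `StandardPair`, the helper `family_integrate`, the quadrature rule and the numerical values are assumptions introduced here, and the code does not check the regularity constraints on G and ρ that enter the definition.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StandardPair:
    a: float                        # left endpoint of the base interval
    b: float                        # right endpoint of the base interval
    G: Callable[[float], float]     # slow coordinate as a function of x
    rho: Callable[[float], float]   # probability density on [a, b]

    def integrate(self, g: Callable[[float, float], float], nodes: int = 2000) -> float:
        """Approximate mu_ell(g) = int_a^b g(x, G(x)) rho(x) dx by the midpoint rule."""
        h = (self.b - self.a) / nodes
        total = 0.0
        for j in range(nodes):
            x = self.a + (j + 0.5) * h
            total += g(x, self.G(x)) * self.rho(x)
        return total * h

def family_integrate(family: List[Tuple[float, StandardPair]], g) -> float:
    """mu_L(g): weighted sum of mu_ell(g) over the (weight, pair) entries of the family."""
    return sum(weight * pair.integrate(g) for weight, pair in family)

# Two flat standard pairs (G' = 0) with uniform densities on intervals of length 0.1.
left = StandardPair(0.0, 0.1, G=lambda x: 0.3, rho=lambda x: 10.0)
right = StandardPair(0.1, 0.2, G=lambda x: 0.3, rho=lambda x: 10.0)
family = [(0.5, left), (0.5, right)]

print(family_integrate(family, lambda x, theta: 1.0))    # total mass
print(family_integrate(family, lambda x, theta: x))      # mean of the fast variable
print(family_integrate(family, lambda x, theta: theta))  # mean of the slow variable
```

In this toy family the three printed numbers approximate 1, 0.1 and 0.3 respectively, i.e. the family is a probability measure concentrated on the horizontal segment θ = 0.3.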

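Let us denote by $\sim$ the equivalence relation induced by the above correspondence, i.e. we let $\mathfrak{L} \sim \mathfrak{L}'$ if and only if $\mu_{\mathfrak{L}} = \mu_{\mathfrak{L}'}$.

Proposition 3.1 (Invariance). There exist $\delta$ and $D$ such that, for any $c_1, c_2$ sufficiently large, and $\varepsilon$ sufficiently small, for any standard family $\mathfrak{L}$, the measure $F_{\varepsilon *} \mu_{\mathfrak{L}}$ can be decomposed in standard pairs, i.e. there exists a standard family $\mathfrak{L}'$ such that $F_{\varepsilon *} \mu_{\mathfrak{L}} = \mu_{\mathfrak{L}'}$. We say that $\mathfrak{L}'$ is a standard decomposition of $F_{\varepsilon *} \mu_{\mathfrak{L}}$.

Proof. For simplicity, let us assume that $\mathfrak{L}$ is given by a single standard pair $\ell$; the general case does not require any additional ideas and is left to the reader. By definition, for any measurable function $g$:
\[ F_{\varepsilon *} \mu_\ell(g) = \mu_\ell(g \circ F_\varepsilon) = \int_a^b g\big(f(x, G(x)),\ G(x) + \varepsilon\omega(x, G(x))\big) \cdot \rho(x)\, dx. \]
It is then natural to introduce the map $f_G : [a, b] \to \mathbb{T}^1$ defined by $f_G(x) = f \circ \mathbb{G}(x)$. Note that, by assumption (1.1), $f_G' \ge \lambda - \varepsilon c_1 \|\partial_\theta f\|_{C^0} > 3/2$ provided that $\varepsilon$ is small enough (depending on how large $c_1$ is). Hence all the $f_G$ are expanding maps; moreover they are invertible if $\delta$ has been chosen small enough. In addition, for any sufficiently smooth function $A$ on $X$, it is trivial to check that, by the definition of standard curve, if $\varepsilon$ is small enough (once again depending on $c_1$)¹⁵
\[ \|(A \circ \mathbb{G})'\|_{C^0} \le \|dA\|_{C^0} + \varepsilon \|dA\|_{C^0} c_1, \tag{3.1a} \]
\[ \|(A \circ \mathbb{G})''\|_{C^0} \le 2\|dA\|_{C^1} + \varepsilon \|dA\|_{C^0} D c_1. \tag{3.1b} \]
Then, fix a partition (mod 0) $[f_G(a), f_G(b)] = \bigcup_{j=1}^m [a_j, b_j]$, with $b_j - a_j \in [\delta/2, \delta]$ and $b_j = a_{j+1}$; moreover let $\varphi_j(x) = f_G^{-1}(x)$ for $x \in [a_j, b_j]$ and define
\[ G_j(x) = G \circ \varphi_j(x) + \varepsilon\omega(\varphi_j(x), G \circ \varphi_j(x)); \qquad \tilde\rho_j(x) = \rho \circ \varphi_j(x)\, \varphi_j'(x). \]
By a change of variables we can thus write:
\[ F_{\varepsilon *} \mu_\ell(g) = \sum_{j=1}^m \int_{a_j}^{b_j} \tilde\rho_j(x)\, g(x, G_j(x))\, dx. \tag{3.2} \]
Observe that, by immediate differentiation we obtain, for $\varphi_j$:
\[ \varphi_j' = \frac{1}{f_G'} \circ \varphi_j, \qquad \varphi_j'' = -\frac{f_G''}{f_G'^3} \circ \varphi_j. \tag{3.3} \]
Let $\omega_G = \omega \circ \mathbb{G}$ and $\bar G = G + \varepsilon\omega_G$. Differentiating the definitions of $G_j$ and $\tilde\rho_j$ and using (3.3) yields
\[ G_j' = \frac{\bar G'}{f_G'} \circ \varphi_j, \qquad G_j'' = \frac{\bar G''}{f_G'^2} \circ \varphi_j - G_j' \cdot \frac{f_G''}{f_G'^2} \circ \varphi_j, \tag{3.4} \]
and similarly
\[ \frac{\tilde\rho_j'}{\tilde\rho_j} = \frac{\rho'}{\rho \cdot f_G'} \circ \varphi_j - \frac{f_G''}{f_G'^2} \circ \varphi_j. \tag{3.5} \]
Using the above equations it is possible to conclude our proof: first of all, using (3.4), the definition of $\bar G$ and equations (3.1) we obtain, for small enough $\varepsilon$:
\[ \|G_j'\| \le \left\|\frac{G' + \varepsilon\omega_G'}{f_G'}\right\| \le \frac23 (1 + C_\# \varepsilon)\varepsilon c_1 + C_\# \varepsilon \le \frac34 \varepsilon c_1 + C_\# \varepsilon \le \varepsilon c_1, \]
provided that $c_1$ is large enough; then:
\[ \|G_j''\| \le \left\|\frac{G'' + \varepsilon\omega_G''}{f_G'^2}\right\| + C_\#(1 + \varepsilon D c_1)\varepsilon c_1 \le \frac34 \varepsilon D c_1 + \varepsilon C_\# c_1 + \varepsilon C_\# \le \varepsilon D c_1, \]
provided $c_1$ and $D$ are sufficiently large. Likewise, using (3.1) together with (3.5) we obtain
\[ \left\|\frac{\tilde\rho_j'}{\tilde\rho_j}\right\| \le \frac23 c_2 + C_\#(1 + D c_1) \le c_2, \]
provided that $c_2$ is large enough. This concludes our proof: it suffices to define the family $\mathfrak{L}'$ given by $(\mathcal{A}, \nu, \{\ell_j\}_{j \in \mathcal{A}})$, where $\mathcal{A} = \{1, \dots, m\}$, $\nu(\{j\}) = \int_{a_j}^{b_j} \tilde\rho_j$, $\rho_j = \nu(\{j\})^{-1} \tilde\rho_j$ and $\ell_j = (\mathbb{G}_j, \rho_j)$. Our previous estimates imply that the $(\mathbb{G}_j, \rho_j)$ are standard pairs; note moreover that (3.2) implies $\sum_{\tilde\ell \in \mathfrak{L}'} \nu_{\tilde\ell} = 1$, thus $\mathfrak{L}'$ is a standard family. Then we can rewrite (3.2) as follows:
\[ F_{\varepsilon *} \mu_\ell(g) = \sum_{\tilde\ell \in \mathfrak{L}'} \nu_{\tilde\ell}\, \mu_{\tilde\ell}(g) = \mu_{\mathfrak{L}'}(g). \]
□

¹⁴ This is not the most general definition of standard family, yet it suffices for our purposes.
¹⁵ Given a function $A$, by $dA$ we mean the differential.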

Remark 3.2. Given a standard pair $\ell = (\mathbb{G}, \rho)$, we will interpret $(x_k, \theta_k)$ as random variables defined as $(x_k, \theta_k) = F_\varepsilon^k(x, G(x))$, where $x$ is distributed according to $\rho$.

4. Conditioning

In probability, conditioning is one of the most basic techniques and one would like to use it freely when dealing with random variables. Yet, as already mentioned, conditioning seems unnatural when dealing with deterministic systems. The use of standard pairs provides a very efficient solution to this conundrum. The basic idea is that one can apply repeatedly Proposition 3.1 to obtain at each time a family of standard pairs and then “condition” by specifying to which standard pair the random variable belongs at a given time.¹⁶

Note that if $\ell$ is a standard pair with $G' = 0$, then it belongs to $\mathbb{L}_\varepsilon$ for all $\varepsilon > 0$. In the following, abusing notation, we will use $\ell$ also to designate a family $\{\ell_\varepsilon\}$, $\ell_\varepsilon \in \mathbb{L}_\varepsilon$, that weakly converges to a standard pair $\ell \in \bigcap_{\varepsilon > 0} \mathbb{L}_\varepsilon$. For every standard pair $\ell$ we let $\mathbb{P}_\ell^\varepsilon$ be the induced measure in path space and $\mathbb{E}_\ell^\varepsilon$ the associated expectation.

Before continuing, let us recall and state a bit of notation: for each $t \in [0, T]$ recall that the random variable $\Theta(t) \in C^0(C^0([0, T], S), S)$ is defined by $\Theta(t, \vartheta) = \vartheta(t)$, for all $\vartheta \in C^0([0, T], S)$. Also we will need the filtration of σ-algebras $\mathcal{F}_t$ defined as the smallest σ-algebra for which all the functions $\{\Theta(s) : s \le t\}$ are measurable. Last, we consider the shift $\tau_s : C^0([0, T], S) \to C^0([0, T - s], S)$ defined by $\tau_s(\vartheta)(t) = \vartheta(t + s)$. Note that $\Theta(t) \circ \tau_s = \Theta(t + s)$. Also, it is helpful to keep in mind that, for all $A \in C^0(S, \mathbb{R})$, we have¹⁷
\[ \mathbb{E}_\ell^\varepsilon(A(\Theta(t + k\varepsilon))) = \mu_\ell(A(\Theta_\varepsilon(t + k\varepsilon))) = \mu_\ell(A(\Theta_\varepsilon(t) \circ F_\varepsilon^k)). \]
Our goal is to compute, in some reasonable way, expectations of a function of $\Theta(t + s)$ conditioned to $\mathcal{F}_t$, notwithstanding the above mentioned problems due to the fact that the dynamics is deterministic. Obviously, we can hope to obtain a result only in the limit $\varepsilon \to 0$.
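Note that we can always reduce to the case in which the conditional expectation is zero by subtracting an appropriate function, thus it suffices to analyze such a case. The basic fact that we will use is the following.

Lemma 4.1. Let $t \in [0, T]$ and $A$ be a continuous bounded random variable on $C^0([0, t], S)$ with values in $\mathbb{R}$. If we have
\[ \lim_{\varepsilon \to 0} \sup_{\ell \in \mathbb{L}_\varepsilon} |\mathbb{E}_\ell^\varepsilon(A)| = 0, \]
then, for each $s \in [0, T - t]$, standard pair $\ell$, uniformly bounded continuous functions $\{B_i\}_{i=1}^m$, $B_i : S \to \mathbb{R}$, and times $\{t_1 < \dots < t_m\} \subset [0, s)$,
\[ \lim_{\varepsilon \to 0} \mathbb{E}_\ell^\varepsilon\Big(\prod_{i=1}^m B_i(\Theta(t_i)) \cdot A \circ \tau_s\Big) = 0. \]

Proof. The quantity we want to study can be written as
\[ \mu_\ell\Big(\prod_{i=1}^m B_i(\Theta_\varepsilon(t_i)) \cdot A(\tau_s(\Theta_\varepsilon))\Big). \]
To simplify our notation, let $k_i = \lfloor t_i \varepsilon^{-1} \rfloor$ and $k_{m+1} = \lfloor s \varepsilon^{-1} \rfloor$. Also, for every standard pair $\tilde\ell$, let $\mathfrak{L}_{i, \tilde\ell}$ denote an arbitrary standard decomposition of $(F_\varepsilon^{k_{i+1} - k_i})_* \mu_{\tilde\ell}$ and define $\theta^*_\ell = \mu_\ell(\theta) = \int_{a_\ell}^{b_\ell} \rho_\ell(x) G_\ell(x)\, dx$. Then, by Proposition 3.1,
\[ \mu_\ell\Big(\prod_{i=1}^m B_i(\Theta_\varepsilon(t_i)) \cdot A(\tau_s(\Theta_\varepsilon))\Big) = \mu_\ell\Big(\prod_{i=1}^m B_i(\Theta_\varepsilon(t_i)) \cdot A\big(\tau_{s - \varepsilon k_{m+1}}(\Theta_\varepsilon \circ F_\varepsilon^{k_{m+1}})\big)\Big) \]
\[ = \sum_{\ell_1 \in \mathfrak{L}_{1, \ell}} \cdots \sum_{\ell_{m+1} \in \mathfrak{L}_{m, \ell_m}} \Big[\prod_{i=1}^m \nu_{\ell_i} B_i(\theta^*_{\ell_i})\Big] \nu_{\ell_{m+1}}\, \mu_{\ell_{m+1}}(A(\Theta_\varepsilon)) + o(1) \]
\[ = \sum_{\ell_1 \in \mathfrak{L}_{1, \ell}} \cdots \sum_{\ell_{m+1} \in \mathfrak{L}_{m, \ell_m}} \Big[\prod_{i=1}^m \nu_{\ell_i} B_i(\theta^*_{\ell_i})\Big] \nu_{\ell_{m+1}}\, \mathbb{E}^\varepsilon_{\ell_{m+1}}(A) + o(1), \]
where $\lim_{\varepsilon \to 0} o(1) = 0$. The lemma readily follows. □

Lemma 4.1 implies that, calling $\mathbb{P}_\ell$ an accumulation point of $\mathbb{P}_\ell^\varepsilon$, we have¹⁸
\[ \mathbb{E}_\ell\Big(\prod_{i=1}^m B_i(\Theta(t_i)) \cdot A \circ \tau_s\Big) = 0. \tag{4.1} \]

¹⁶ Note that the set of standard pairs does not form a σ-algebra, so to turn the above into a precise statement would be a bit cumbersome. We thus prefer to follow a slightly different strategy, although the substance is unchanged.
¹⁷ To be really precise, maybe one should write, e.g., $\mathbb{E}_\ell^\varepsilon(A \circ \Theta(t + k\varepsilon))$, but we conform to the above more intuitive notation.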

This solves the conditioning problem thanks to the following

Lemma 4.2. Property (4.1) is equivalent to $\mathbb{E}_\ell(A \circ \tau_s \mid \mathcal{F}_{s_1}) = 0$, for all $s_1 \le s$.

Proof. Note that the statement of the Lemma immediately implies (4.1); we thus worry only about the other direction. If the lemma were not true then there would exist a positive measure set of the form
\[ K = \bigcap_{i=0}^{\infty} \{\vartheta(t_i) \in K_i\}, \]
where $\{K_i\}$ is a collection of compact sets in $S$, and $t_i < s$, on which the conditional expectation is strictly positive (or strictly negative, which can be treated in exactly the same way). For some arbitrary $\delta > 0$, consider open sets $U_i \supset K_i$ such that $\mathbb{P}_\ell(\{\vartheta(t_i) \in U_i \setminus K_i\}) \le \delta 2^{-i}$. Also, let $B_{\delta, i}$ be a continuous function such that $B_{\delta, i}(\vartheta) = 1$ for $\vartheta \in K_i$ and $B_{\delta, i}(\vartheta) = 0$ for $\vartheta \notin U_i$. Then
\[ 0 < \mathbb{E}_\ell(\mathbb{1}_K\, A \circ \tau_s) = \lim_{n \to \infty} \mathbb{E}_\ell\Big(\prod_{i=1}^n B_{\delta, i}(\Theta(t_i)) \cdot A \circ \tau_s\Big) + C_\# \delta = C_\# \delta, \]
which yields a contradiction by the arbitrariness of $\delta$. □

¹⁸ By $\mathbb{E}_\ell$ we mean the expectation with respect to $\mathbb{P}_\ell$.

In other words, we have recovered the possibility of conditioning with respect to the past after the limit $\varepsilon \to 0$.

5. Averaging (the Law of Large Numbers)

We are now ready to provide the proof of Theorem 2.1. The proof consists of several steps; we first illustrate the global strategy while momentarily postponing the proof of the single steps.

Proof of Theorem 2.1. As already mentioned we will prove the theorem for a larger class of initial conditions: any initial condition determined by a sequence of standard pairs. Note that for any fixed flat standard pair $\ell$, i.e. $G_\ell(x) = \theta$, we have an initial condition of the type assumed in the statement of the Theorem. Given a standard pair $\ell$ let $\{\mathbb{P}_\ell^\varepsilon\}$ be the associated measures in path space (the latter measures being determined, as explained at the beginning of Section 2, by the standard pair $\ell$ and (2.1)). We have already seen in Lemma 2.6 that the set $\{\mathbb{P}_\ell^\varepsilon\}$ is tight. Next we will prove in Lemma 5.1 that, for each $A \in C^2(S, \mathbb{R})$, we have
\[ \lim_{\varepsilon \to 0} \sup_{\ell \in \mathbb{L}_\varepsilon} \left|\mathbb{E}_\ell^\varepsilon\Big(A(\Theta(t)) - A(\Theta(0)) - \int_0^t \langle \bar\omega(\Theta(\tau)), \nabla A(\Theta(\tau)) \rangle\, d\tau\Big)\right| = 0. \tag{5.1} \]
Accordingly, it is natural to consider the random variables $\mathcal{A}(t)$ defined by
\[ \mathcal{A}(t, \vartheta) = A(\vartheta(t)) - A(\vartheta(0)) - \int_0^t \langle \bar\omega(\vartheta(\tau)), \nabla A(\vartheta(\tau)) \rangle\, d\tau, \]
for each $t \in [0, T]$ and $\vartheta \in C^0([0, T], S)$, and the first order differential operator
\[ \mathcal{L} A = \langle \bar\omega, \nabla A \rangle. \tag{5.2} \]
Then equation (5.1), together with Lemmata 4.1 and 4.2, means that each accumulation point $\mathbb{P}_\ell$ of $\{\mathbb{P}_\ell^\varepsilon\}$ satisfies, for all $s \in [0, T]$ and $t \in [0, T - s]$,
\[ \mathbb{E}_\ell(\mathcal{A}(t) \circ \tau_s \mid \mathcal{F}_s) = \mathbb{E}_\ell\Big(A(\Theta(t + s)) - A(\Theta(s)) - \int_s^{t+s} \mathcal{L} A(\Theta(\tau))\, d\tau \ \Big|\ \mathcal{F}_s\Big) = 0; \tag{5.3} \]
this is the simplest possible version of the Martingale Problem. Indeed it implies that, for all $\theta$, $A$ and standard pair $\ell$ such that $G_\ell(x) = \theta$,
\[ M(t) = A(\Theta(t)) - A(\Theta(0)) - \int_0^t \mathcal{L} A(\Theta(s))\, ds \]
is a martingale with respect to the measure $\mathbb{P}_\theta$ and the filtration $\mathcal{F}_t$ (i.e., for each $0 \le s \le t \le T$, $\mathbb{E}_\theta(M(t) \mid \mathcal{F}_s) = M(s)$).¹⁹ Finally we will show in Lemma 5.2 that there is a unique measure that has such a property: the measure supported on the unique solution of equation (2.2). This concludes the proof of the theorem. □

In the rest of this section we provide the missing proofs.

¹⁹ We use $\mathbb{P}_\theta$ to designate any measure $\mathbb{P}_\ell$ with $G_\ell(x) = \theta$.


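5.1. Differentiating with respect to time. Let us start with the proof of (5.1).

Lemma 5.1. For each $A \in C^2(S, \mathbb{R})$ we have²⁰
\[ \lim_{\varepsilon \to 0} \sup_{\ell \in \mathbb{L}_\varepsilon} \left|\mathbb{E}_\ell^\varepsilon\Big(A(\Theta(t)) - A(\Theta(0)) - \int_0^t \langle \bar\omega(\Theta(s)), \nabla A(\Theta(s)) \rangle\, ds\Big)\right| = 0, \]
where (we recall) $\bar\omega(\theta) = m_\theta(\omega(\cdot, \theta))$ and $m_\theta$ is the unique SRB measure of $f(\cdot, \theta)$.

Proof. We will use the notation of Appendix B. Given a standard pair $\ell$ let $\rho = \rho_\ell$, $\theta^*_\ell = \mu_\ell(\theta)$ and $f_*(x) = f(x, \theta^*_\ell)$. Then, by Lemmata B.1 and B.2, we can write, for $n \le C\varepsilon^{-\frac12}$,²¹
\[ \mu_\ell(A(\theta_n)) = \int_a^b \rho(x)\, A\Big(\theta_0 + \varepsilon \sum_{k=0}^{n-1} \omega(x_k, \theta_k)\Big)\, dx \]
\[ = \int_a^b \rho(x)\, A\Big(\theta^*_\ell + \varepsilon \sum_{k=0}^{n-1} \omega(x_k, \theta^*_\ell)\Big)\, dx + O(\varepsilon^2 n^2 + \varepsilon) \]
\[ = \int_a^b \rho(x) A(\theta^*_\ell)\, dx + \varepsilon \sum_{k=0}^{n-1} \int_a^b \rho(x) \langle \nabla A(\theta^*_\ell), \omega(x_k, \theta^*_\ell) \rangle\, dx + O(\varepsilon) \]
\[ = \int_a^b \rho(x) A(G_\ell(x))\, dx + \varepsilon \sum_{k=0}^{n-1} \int_a^b \rho(x) \langle \nabla A(\theta^*_\ell), \omega(f_*^k \circ Y_n(x), \theta^*_\ell) \rangle\, dx + O(\varepsilon) \]
\[ = \mu_\ell(A(\theta_0)) + \varepsilon \sum_{k=0}^{n-1} \int_{\mathbb{T}^1} \tilde\rho_n(x) \langle \nabla A(\theta^*_\ell), \omega(f_*^k(x), \theta^*_\ell) \rangle\, dx + O(\varepsilon), \]
where $\tilde\rho_n(x) = \Big[\frac{\chi_{[a,b]}\, \rho}{Y_n'}\Big] \circ Y_n^{-1}(x)$. Note that $\int_{\mathbb{T}^1} \tilde\rho_n = 1$ but, unfortunately, $\|\tilde\rho_n\|_{BV}$ may be enormous. Thus, we cannot estimate the integral in the above expression by naively using decay of correlations. Yet, equation (B.3) implies $|Y_n' - 1| \le C_\# \varepsilon n^2$. Moreover, $\bar\rho_n = (\chi_{[a,b]}\, \rho) \circ Y_n^{-1}$ has uniformly bounded variation.²² Accordingly, by the decay of correlations and the $C^1$ dependence of the invariant measure on $\theta$ (see Section 1.2) we have
\[ \int_{\mathbb{T}^1} \tilde\rho_n(x) \langle \nabla A(\theta^*_\ell), \omega(f_*^k(x), \theta^*_\ell) \rangle\, dx = \int_{\mathbb{T}^1} \bar\rho_n(x) \langle \nabla A(\theta^*_\ell), \omega(f_*^k(x), \theta^*_\ell) \rangle\, dx + O(\varepsilon n^2) \]
\[ = m_{\mathrm{Leb}}(\tilde\rho_n)\, m_{\theta^*_\ell}\big(\langle \nabla A(\theta^*_\ell), \omega(\cdot, \theta^*_\ell) \rangle\big) + O(\varepsilon n^2 + e^{-c_\# k}) \]
\[ = \mu_\ell\big(\langle \nabla A(\theta_0), \bar\omega(\theta_0) \rangle\big) + O(\varepsilon n^2 + e^{-c_\# k}). \]
Hence,
\[ \mu_\ell(A(\theta_n)) = \mu_\ell\big(A(\theta_0) + \varepsilon n \langle \nabla A(\theta_0), \bar\omega(\theta_0) \rangle\big) + O(n^3 \varepsilon^2 + \varepsilon). \tag{5.4} \]

²⁰ See Lemma C.1 for the relation between the differentiability mentioned in the title of the subsection and the present integral formula.
²¹ By $O(\varepsilon^a n^b)$ we mean a quantity bounded by $C_\# \varepsilon^a n^b$, where $C_\#$ does not depend on $\ell$.
²² Indeed, for all $\varphi \in C^1$ with $|\varphi|_\infty \le 1$, $\int \bar\rho\, \varphi' = \int_a^b \rho \cdot \varphi' \circ Y \cdot Y' = \int_a^b \rho\, (\varphi \circ Y)' \le \|\rho\|_{BV}$.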


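Finally, we choose $n = \lceil \varepsilon^{-\frac13} \rceil$ and set $h = \varepsilon n$. We define inductively standard families such that $\mathfrak{L}_{\ell_0} = \{\ell\}$ and for each standard pair $\ell_{i+1} \in \mathfrak{L}_{\ell_i}$ the family $\mathfrak{L}_{\ell_{i+1}}$ is a standard decomposition of the measure $(F_\varepsilon^n)_* \mu_{\ell_{i+1}}$. Thus, setting $m = \lceil t \varepsilon^{-\frac23} \rceil - 1$, recalling equation (5.4) and using repeatedly Proposition 3.1,
\[ \mathbb{E}_\ell^\varepsilon(A(\Theta(t))) = \mu_\ell(A(\theta_{t\varepsilon^{-1}})) = \mu_\ell(A(\theta_0)) + \sum_{k=0}^{m-1} \mu_\ell\big(A(\theta_{\varepsilon^{-1}(k+1)h}) - A(\theta_{\varepsilon^{-1}kh})\big) \]
\[ = \mu_\ell(A(\theta_0)) + \sum_{k=0}^{m-1} \sum_{\ell_1 \in \mathfrak{L}_{\ell_0}} \cdots \sum_{\ell_{k-1} \in \mathfrak{L}_{\ell_{k-2}}} \prod_{j=1}^{k-1} \nu_{\ell_j} \Big[\mu_{\ell_{k-1}}\big(\varepsilon^{\frac23} \langle \nabla A(\theta_0), \bar\omega(\theta_0) \rangle\big) + O(\varepsilon)\Big] \]
\[ = \mathbb{E}_\ell^\varepsilon\Big(A(\Theta(0)) + \sum_{k=0}^{m-1} \langle \nabla A(\Theta(kh)), \bar\omega(\Theta(kh)) \rangle\, h\Big) + O(\varepsilon^{\frac13} t) \]
\[ = \mathbb{E}_\ell^\varepsilon\Big(A(\Theta(0)) + \int_0^t \langle \nabla A(\Theta(s)), \bar\omega(\Theta(s)) \rangle\, ds\Big) + O(\varepsilon^{\frac13} t). \]
The lemma follows by taking the limit $\varepsilon \to 0$. □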
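The block length of order $\varepsilon^{-1/3}$ used in the proof balances two error terms, and the bookkeeping can be sketched as follows. By (5.4) each block of n iterates contributes an error O(n³ε² + ε), and there are about t/(εn) blocks, so the accumulated error is O(t(n²ε + 1/n)). Writing n = ε^(−a), this is of order ε^min(1−2a, a). The short Python check below (the function name and the grid of exponents are our own illustrative choices) scans the exponents allowed by the constraint n ≤ Cε^(−1/2) and finds the best one.

```python
from fractions import Fraction

def total_error_exponent(a):
    """Exponent e such that the accumulated error is O(eps**e) when n = eps**(-a).

    Each block of length n contributes O(n^3 eps^2 + eps) by (5.4), and there are
    about t / (eps n) blocks, so the total is O(n^2 eps + 1/n) = O(eps**min(1 - 2a, a)).
    """
    return min(1 - 2 * a, a)

# Scan exponents a in (0, 1/2], the range allowed by n <= C eps^(-1/2).
candidates = [Fraction(k, 60) for k in range(1, 31)]
best = max(candidates, key=total_error_exponent)
print(best, total_error_exponent(best))
```

The maximum is attained at a = 1/3, with error exponent 1/3, which matches the $O(\varepsilon^{1/3} t)$ remainder in the last display of the proof.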



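5.2. The Martingale Problem at work. First of all let us finally specify precisely what we mean by the martingale problem.

Definition 1 (Martingale Problem). Given a Riemannian manifold $S$, a linear operator $\mathcal{L} : D(\mathcal{L}) \subset C^0(S, \mathbb{R}^d) \to C^0(S, \mathbb{R}^d)$, a set of measures $\mathbb{P}_y$, $y \in S$, on $C^0([0, T], S)$ and a filtration $\mathcal{F}_t$, we say that $\{\mathbb{P}_y\}$ satisfies the martingale problem if, for each function $A \in D(\mathcal{L})$,
\[ \mathbb{P}_y(\{z(0) = y\}) = 1, \]
\[ M(t, z) := A(z(t)) - A(z(0)) - \int_0^t \mathcal{L} A(z(s))\, ds \ \text{ is an } \mathcal{F}_t\text{-martingale under all } \mathbb{P}_y. \]

We can now prove the last announced result.

Lemma 5.2. If $\bar\omega$ is Lipschitz, then the martingale problem determined by (5.2) has a unique solution consisting of the measures supported on the solutions of the ODE
\[ \dot{\bar\Theta} = \bar\omega(\bar\Theta), \qquad \bar\Theta(0) = y. \tag{5.5} \]

Proof. Let $\bar\Theta$ be the solution of (5.5) with initial condition $y \in \mathbb{T}^d$ and $\mathbb{P}_y$ the probability measure in the martingale problem. The idea is to compute
\[ \frac{d}{dt} \mathbb{E}_y(\|\Theta(t) - \bar\Theta(t)\|^2) = \frac{d}{dt} \mathbb{E}_y(\langle \Theta(t), \Theta(t) \rangle) - 2\langle \bar\omega(\bar\Theta(t)), \mathbb{E}_y(\Theta(t)) \rangle - 2\Big\langle \bar\Theta(t), \frac{d}{dt} \mathbb{E}_y(\Theta(t)) \Big\rangle + 2\langle \bar\omega(\bar\Theta(t)), \bar\Theta(t) \rangle. \]
To continue we use Lemma C.1 where, in the first term, $A(\theta) = \|\theta\|^2$, in the third $A(\theta) = \theta_i$, and the generator in (5.3) is given by $\mathcal{L} A(\theta) = \langle \nabla A(\theta), \bar\omega(\theta) \rangle$:
\[ \frac{d}{dt} \mathbb{E}_y(\|\Theta(t) - \bar\Theta(t)\|^2) = 2\mathbb{E}_y(\langle \Theta(t), \bar\omega(\Theta(t)) \rangle) - 2\langle \bar\omega(\bar\Theta(t)), \mathbb{E}_y(\Theta(t)) \rangle + 2\mathbb{E}_y(\langle \bar\Theta(t), \bar\omega(\bar\Theta(t)) \rangle) - 2\langle \bar\Theta(t), \mathbb{E}_y(\bar\omega(\Theta(t))) \rangle \]
\[ = 2\mathbb{E}_y\big(\langle \Theta(t) - \bar\Theta(t), \bar\omega(\Theta(t)) - \bar\omega(\bar\Theta(t)) \rangle\big). \]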


By the Lipschitz property of $\bar\omega$ (let $C_L$ be the Lipschitz constant), using the Schwarz inequality and integrating we have
\[ \mathbb{E}_y(\|\Theta(t) - \bar\Theta(t)\|^2) \le 2 C_L \int_0^t \mathbb{E}_y(\|\Theta(s) - \bar\Theta(s)\|^2)\, ds, \]
which, by Gronwall's inequality, implies that $\mathbb{P}_y(\{\bar\Theta\}) = 1$. □
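For the deterministic limit the martingale of Definition 1 is trivial: along the solution of (5.5) the quantity M(t) vanishes identically, which is just the chain rule. The following Python sketch checks this numerically for d = 1; the vector field ω̄(θ) = −sin(2πθ), the test function A(θ) = cos(2πθ), the initial condition and the numerical schemes (classical Runge–Kutta for the ODE, trapezoid rule for the integral) are all assumptions chosen here for illustration.

```python
import math

def omega_bar(theta):
    # Hypothetical Lipschitz averaged vector field on the circle.
    return -math.sin(2 * math.pi * theta)

def A(theta):
    # Hypothetical smooth test function.
    return math.cos(2 * math.pi * theta)

def generator_A(theta):
    # (L A)(theta) = <omega_bar(theta), grad A(theta)>, cf. (5.2).
    return omega_bar(theta) * (-2 * math.pi * math.sin(2 * math.pi * theta))

def rk4_path(y, T, steps):
    """Solve d(Theta)/dt = omega_bar(Theta), Theta(0) = y, with classical RK4."""
    h = T / steps
    path = [y]
    for _ in range(steps):
        k1 = omega_bar(y)
        k2 = omega_bar(y + 0.5 * h * k1)
        k3 = omega_bar(y + 0.5 * h * k2)
        k4 = omega_bar(y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        path.append(y)
    return path, h

def martingale_value(path, h):
    """M(T) = A(z(T)) - A(z(0)) - int_0^T (L A)(z(s)) ds, integral by the trapezoid rule."""
    values = [generator_A(z) for z in path]
    integral = h * (sum(values) - 0.5 * (values[0] + values[-1]))
    return A(path[-1]) - A(path[0]) - integral

path, h = rk4_path(y=0.2, T=1.0, steps=1000)
print(abs(martingale_value(path, h)))
```

The printed value is zero up to discretization error. The content of Lemma 5.2 is the converse: any measure for which M(t) is a martingale for all admissible A must be concentrated on this single path.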



6. A recap of what we have done so far

We have just seen that the martingale method (in Dolgopyat's version) consists of four steps:
(1) Identify a suitable class of measures on path space which allow one to handle the conditioning problem (in our case: the ones coming from standard pairs).
(2) Prove tightness for such measures (in our case: they are supported on uniformly Lipschitz functions).
(3) Identify an equation characterizing the accumulation points (in our case: an ODE).
(4) Prove uniqueness of the limit equation in the martingale sense.
The beauty of the previous scheme is that it can be easily adapted to a variety of problems. To convince the reader of this fact we proceed further and apply it to obtain more refined information on the behavior of the system.

7. Fluctuations (the Central Limit Theorem)

It is possible to study the limit behavior of $\zeta_\varepsilon$ using the strategy summarized in Section 6, even though now the story becomes technically more involved. Let us discuss the situation a bit more in detail. Let $\widetilde{\mathbb{P}}_\ell^\varepsilon$ be the path measure describing $\zeta_\varepsilon$ when $(x_0, \theta_0)$ are distributed according to the standard pair $\ell$.²³ Note that $\widetilde{\mathbb{P}}_\ell^\varepsilon = (\zeta_\varepsilon)_* \mu_\ell$. Again, we provide a proof of the claimed results based on some facts that will be proven in later sections.

Proof of Theorem 2.2. First of all, the sequence of measures $\widetilde{\mathbb{P}}_\ell^\varepsilon$ is tight, which will be proven in Proposition 7.1. Next, by Proposition 7.4, we have that
\[ \lim_{\varepsilon \to 0} \sup_{\ell \in \mathbb{L}_\varepsilon} \left|\widetilde{\mathbb{E}}_\ell^\varepsilon\Big(A(\zeta(t)) - A(\zeta(0)) - \int_0^t \mathcal{L}_s A(\zeta(s))\, ds\Big)\right| = 0, \tag{7.1} \]
where
\[ (\mathcal{L}_s A)(\zeta) = \langle \nabla A(\zeta), D\bar\omega(\bar\Theta(s))\zeta \rangle + \frac12 \sum_{i,j=1}^d [\sigma^2(\bar\Theta(s))]_{i,j}\, \partial_{\zeta_i} \partial_{\zeta_j} A(\zeta), \tag{7.2} \]
with diffusion coefficient $\sigma^2$ given by (2.4). In the following we will often write, slightly abusing notation, $\sigma(t)$ for $\sigma(\bar\Theta(t))$.

²³ As already explained, here we allow $\ell$ to stand also for a family $\{\ell_\varepsilon\}$ which weakly converges to $\ell$. In particular, this means that $\bar\Theta$ is also a random variable, as it depends on the initial condition $\theta_0$.


We can then use equation (7.1) and Lemma 4.1 followed by Lemma 4.2 to obtain that  t A(ζ(t)) − A(ζ(0)) − Ls A(ζ(s))ds 0

9 of the measures P 9ε with respect is a martingale under any accumulation point P  9 = 0}) = 1. In Proposition 7.6 we will prove that to the filtration Ft with P({ζ(0) 9 9ε = P. such a problem has a unique solution thereby showing that limε→0 P  Note that the time dependent operator Ls is a second order operator, this means that the accumulation points of ζε do not satisfy a deterministic equation, 9 is equal in law but rather a stochastic one. Indeed our last task is to show that P to the measure determined by the stochastic differential equation dζ = Dω ¯ ◦ Θ(t) ζdt + σdB

(7.3)

ζ(0) = 0

where B is a standard Rd dimensional Brownian motion. Note that the above equation is well defined in consequence of Lemma 7.5 which shows that the matrix σ 2 is symmetric and non negative, hence σ = σ T is well defined and strictly positive if ω ˆ is not a coboundary (see Lemma 7.5). To conclude it suffices to show that the probability measure describing the solution of (7.3) satisfies the martingale problem.24 It follows from It¯o’s calculus, indeed if ζ is the solution of (7.3) and o’s formula reads A ∈ C r , then It¯  1 dA(ζ) = ∂ζi A(ζ)dζi + ∂ζi ∂ζj A(ζ)σik σjk dt. 2 i i,j,k

Integrating it from s to t and taking the conditional expectation we have    t # # E A(ζ(t)) − A(ζ(s)) − Lτ A(ζ(τ ))dτ Fs = 0. s

See Appendix C for more details on the relation between the martingale problem and the theory of SDEs, and on how this allows one to dispense with Itô's formula altogether. We have thus seen that the measure determined by (7.3) satisfies the martingale problem; hence it must agree with $\widetilde{P}$, since $\widetilde{P}$ is the unique solution of the martingale problem. The proof of the Theorem is concluded by noticing that (7.3) defines a zero mean Gaussian process (see the end of the proof of Proposition 7.6).

7.1. Tightness.

Proposition 7.1. For every standard pair $\ell$, the measures $\{\widetilde{P}^{\varepsilon}_{\ell}\}_{\varepsilon>0}$ are tight.

Proof. Now the proof of tightness is less obvious, since the paths have a Lipschitz constant that explodes. Luckily, there exists a convenient criterion for tightness: the Kolmogorov criterion [15, Remark A.5].

Theorem 7.2 (Kolmogorov). Given a sequence of measures $P_\varepsilon$ on $C^0([0,T],\mathbb{R})$, if there exist $\alpha, \beta, C > 0$ such that

\[
\mathbb{E}_\varepsilon(|z(t) - z(s)|^{\beta}) \le C|t-s|^{1+\alpha}
\]

for all $t, s \in [0,T]$ and the distribution of $z(0)$ is tight, then $\{P_\varepsilon\}$ is tight.

Footnote 24. We do not prove that such a solution exists, as this is a standard result in probability [15].
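As a point of orientation for the exponents in Theorem 7.2, the following minimal sketch evaluates absolute moments of a centered Gaussian increment, which is the behaviour the text expects from the fluctuations. It uses the standard closed form for $E|X|^\beta$ with $X \sim N(0,h)$; the function names are hypothetical and chosen only for this illustration. It suggests why $\beta = 2$ gives the borderline exponent $1$ (so $\alpha = 0$ is not allowed), while $\beta = 4$ gives exponent $2 = 1 + \alpha$ with $\alpha = 1$.

```python
import math


def gaussian_abs_moment(beta: float, h: float) -> float:
    """E|X|^beta for X ~ N(0, h), from the closed form
    (2h)^(beta/2) * Gamma((beta + 1) / 2) / sqrt(pi)."""
    return (2.0 * h) ** (beta / 2.0) * math.gamma((beta + 1.0) / 2.0) / math.sqrt(math.pi)


def kolmogorov_alpha(beta: float) -> float:
    """For a Brownian increment, E|B(t) - B(s)|^beta scales like |t - s|^(beta / 2).
    Writing beta / 2 = 1 + alpha gives the alpha of the Kolmogorov criterion."""
    return beta / 2.0 - 1.0


if __name__ == "__main__":
    h = 0.01  # hypothetical time increment |t - s|
    for beta in (2, 3, 4):
        print(beta, gaussian_abs_moment(beta, h), kolmogorov_alpha(beta))
```

Only exponents with `kolmogorov_alpha(beta) > 0` are usable in the criterion, which is consistent with the text moving from the second to the fourth moment.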


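Note that $\zeta_\varepsilon(0) = 0$. Of course, it is easier to apply the above criterion with $\beta \in \mathbb{N}$. It is reasonable to expect that the fluctuations behave like a Brownian motion, so the variance should be finite. To verify this let us compute first the case $\beta = 2$. Note that, setting $\hat\omega(x,\theta) = \omega(x,\theta) - \bar\omega(\theta)$,

\[
\begin{aligned}
\zeta_\varepsilon(t) &= \sqrt{\varepsilon}\sum_{k=0}^{\varepsilon^{-1}t-1}\big[\omega(x_k,\theta_k) - \bar\omega(\Theta(\varepsilon k))\big] + O(\sqrt{\varepsilon})\\
&= \sqrt{\varepsilon}\sum_{k=0}^{\varepsilon^{-1}t-1}\big[\hat\omega(x_k,\theta_k) + \bar\omega(\theta_k) - \bar\omega(\Theta(\varepsilon k))\big] + O(\sqrt{\varepsilon})\\
&= \sqrt{\varepsilon}\sum_{k=0}^{\varepsilon^{-1}t-1}\big[\hat\omega(x_k,\theta_k) + \sqrt{\varepsilon}\,D\bar\omega(\Theta(\varepsilon k))\,\zeta_\varepsilon(k\varepsilon)\big] + O(\sqrt{\varepsilon})
 + \sum_{k=0}^{\varepsilon^{-1}t-1} O\big(\varepsilon^{3/2}\|\zeta_\varepsilon(\varepsilon k)\|^2\big).
\end{aligned}
\tag{7.4}
\]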

We start with a basic result.

Lemma 7.3. For each standard pair $\ell$ and $k, l \in \{0, \dots, \varepsilon^{-1}\}$, $k \ge l$, we have

\[
\mu_\ell\Big(\Big\|\sum_{j=l}^{k}\hat\omega(x_j,\theta_j)\Big\|^2\Big) \le C_\#(k-l).
\]

The proof of the above Lemma is postponed to the end of the section. Let us see how it can be profitably used. Note that, for $t = \varepsilon k$, $s = \varepsilon l$,

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}(\|\zeta(t)-\zeta(s)\|^2) \le C_\#|t-s| + C_\#|t-s|\,\varepsilon\sum_{j=l}^{k}\mu_\ell(\|\zeta_\varepsilon(\varepsilon j)\|^2) + C_\#\varepsilon,
\tag{7.5}
\]

where we have used Lemma 7.3 and the trivial estimate $\|\zeta_\varepsilon\| \le C_\#\varepsilon^{-1/2}$. If we use the above with $s = 0$ and define $M_k = \sup_{j\le k}\mu_\ell(|\zeta_\varepsilon(\varepsilon j)|^2)$ we have

\[
M_k \le C_\#\varepsilon k + C_\# k^2\varepsilon^2 M_k.
\]

Thus there exists $C > 0$ such that, if $k \le C\varepsilon^{-1}$, we have $M_k \le C_\#\varepsilon k$. Hence, we can substitute such an estimate in (7.5) and obtain

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}(\|\zeta(t)-\zeta(s)\|^2) \le C_\#|t-s| + C_\#\varepsilon.
\tag{7.6}
\]

Since the estimate for $|t-s| \le C_\#\varepsilon$ is trivial, we have the bound

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}(\|\zeta(t)-\zeta(s)\|^2) \le C_\#|t-s|.
\]

This is interesting but, unfortunately, it does not suffice to apply the Kolmogorov criterion. The next step could be to compute for $\beta = 3$. This has the well known disadvantage of being an odd function of the path, and hence one has to deal with the absolute value. Due to this, it turns out to be more convenient to consider directly the case $\beta = 4$. This can be done in complete analogy with the above computation, by first generalizing the result of Lemma 7.3 to higher moments. Doing so we obtain

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}(\|\zeta(t)-\zeta(s)\|^4) \le C_\#|t-s|^2,
\tag{7.7}
\]


which concludes the proof of the proposition. Indeed, the proof of Lemma 7.3 explains how to treat correlations. Multiple correlations can be treated similarly, and one can thus show that they do not contribute to the leading term. Thus the computation becomes similar to (although much more involved than) the case of the sum of independent zero mean random variables $X_i$ (where no correlations are present), that is

\[
E\Big(\Big[\sum_{i=l}^{k}X_i\Big]^4\Big) = \sum_{i_1,\dots,i_4=l}^{k}E(X_{i_1}X_{i_2}X_{i_3}X_{i_4}) = 3\sum_{\substack{i,j=l\\ i\neq j}}^{k}E(X_i^2X_j^2) + \sum_{i=l}^{k}E(X_i^4) = O((k-l)^2),
\]

since only the terms in which every index appears at least twice have non zero expectation.
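The counting behind the fourth moment of a sum of independent zero mean variables can be checked exactly in a toy case. The sketch below assumes the $X_i$ are independent random signs ($\pm 1$ with probability $1/2$ each), which is an assumption made only for this illustration and is not the dynamical setting of the text; the function names are hypothetical. It counts the index tuples $(i_1,\dots,i_4)$ that survive in the expansion and compares the count with the exact fourth moment, which grows quadratically in the number of terms.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import comb


def paired_tuples(n: int) -> int:
    """Count tuples (i1, i2, i3, i4) in {0..n-1}^4 in which every index
    occurs an even number of times. For independent random signs these are
    exactly the tuples with E(X_i1 X_i2 X_i3 X_i4) = 1; all others give 0."""
    count = 0
    for idx in product(range(n), repeat=4):
        if all(m % 2 == 0 for m in Counter(idx).values()):
            count += 1
    return count


def fourth_moment_rademacher(n: int) -> Fraction:
    """Exact E[(X_1 + ... + X_n)^4] for independent signs, using that the
    sum equals 2*j - n when exactly j of the signs are +1."""
    total = sum(comb(n, j) * (2 * j - n) ** 4 for j in range(n + 1))
    return Fraction(total, 2 ** n)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, paired_tuples(n), fourth_moment_rademacher(n), 3 * n * n - 2 * n)
```

Both quantities agree with $3n^2 - 2n$, so the fourth moment is $O(n^2)$ even though the expansion contains $n^4$ terms.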

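For future use let us record that, by equation (7.7) and the Young inequality,

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}(\|\zeta(t)-\zeta(s)\|^3) \le C_\#|t-s|^{3/2}.
\tag{7.8}
\]

We still owe the reader the

Proof of Lemma 7.3. The proof starts with a direct computation (see footnote 25):

\[
\begin{aligned}
\mu_\ell\Big(\Big|\sum_{j=l}^{k}\hat\omega(x_j,\theta_j)\Big|^2\Big)
&\le \sum_{j=l}^{k}\mu_\ell\big(\hat\omega(x_j,\theta_j)^2\big) + 2\sum_{j=l}^{k}\sum_{r=j+1}^{k}\mu_\ell\big(\hat\omega(x_j,\theta_j)\hat\omega(x_r,\theta_r)\big)\\
&\le C_\#|k-l| + 2\sum_{j=l}^{k}\sum_{r=j+1}^{k}\mu_\ell\big(\hat\omega(x_j,\theta_j)\hat\omega(x_r,\theta_r)\big).
\end{aligned}
\]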

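To compute the last correlation, remember Proposition 3.1. We can thus call $\mathcal{L}_j$ the standard family associated to $(F_\varepsilon^j)_*\mu_\ell$ and, for $r \ge j$, we write

\[
\begin{aligned}
\mu_\ell\big(\hat\omega(x_j,\theta_j)\hat\omega(x_r,\theta_r)\big)
&= \sum_{\ell_1\in\mathcal{L}_j}\nu_{\ell_1}\,\mu_{\ell_1}\big(\hat\omega(x_0,\theta_0)\hat\omega(x_{r-j},\theta_{r-j})\big)\\
&= \sum_{\ell_1\in\mathcal{L}_j}\nu_{\ell_1}\int_{a_{\ell_1}}^{b_{\ell_1}}\rho_{\ell_1}(x)\,\hat\omega(x,G_{\ell_1}(x))\,\hat\omega(x_{r-j},\theta_{r-j}).
\end{aligned}
\]

We would like to argue as in the proof of Lemma 5.1 and try to reduce the problem to

\[
\begin{aligned}
\int_{a_{\ell_1}}^{b_{\ell_1}}\rho_{\ell_1}(x)\,\hat\omega(x,\theta^*_{\ell_1})\,\hat\omega(x_{r-j},\theta^*_{\ell_1})
&= \int_{a_{\ell_1}}^{b_{\ell_1}}\rho_{\ell_1}(x)\,\hat\omega(x,\theta^*_{\ell_1})\,\hat\omega(f_*^{r-j}(Y_{r-j}(x)),\theta^*_{\ell_1})\\
&= \int_{\mathbb{T}^1}\tilde\rho(x)\,\hat\omega(Y_{r-j}^{-1}(x),\theta^*_{\ell_1})\,\hat\omega(f_*^{r-j}(x),\theta^*_{\ell_1}),
\end{aligned}
\]

but then the mistake that we would make substituting $\tilde\rho$ with $\bar\rho$ is too big for our current purposes. It is thus necessary to be more subtle. The idea is to write $\rho_{\ell_1}(x)\hat\omega(x,G_{\ell_1}(x)) = \alpha_1\hat\rho_1(x) + \alpha_2\hat\rho_2(x)$, where $\hat\rho_1, \hat\rho_2$ are standard densities (see footnote 26). Note that $\alpha_1, \alpha_2$ are uniformly bounded. Next, let us fix $L > 0$ to be chosen later and assume $r - j \ge L$. Since $\ell_{1,i} = (G, \hat\rho_i)$ are standard pairs, by construction, calling $\mathcal{L}_{\ell_{1,i}} = (F_\varepsilon^{r-j-L})_*\mu_{\ell_{1,i}}$ we have

\[
\begin{aligned}
\int_{a_{\ell_1}}^{b_{\ell_1}}\hat\rho_i(x)\,\hat\omega(x_{r-j},\theta_{r-j})
&= \sum_{\ell_2\in\mathcal{L}_{\ell_{1,i}}}\nu_{\ell_2}\int_{a_{\ell_2}}^{b_{\ell_2}}\rho_{\ell_2}(x)\,\hat\omega(f_*^{L}(Y_L(x)),\theta^*_{\ell_2}) + O(\varepsilon L)\\
&= \sum_{\ell_2\in\mathcal{L}_{\ell_{1,i}}}\nu_{\ell_2}\int_{\mathbb{T}^1}\tilde\rho(x)\,\hat\omega(f_*^{L}(x),\theta^*_{\ell_2}) + O(\varepsilon L)\\
&= \sum_{\ell_2\in\mathcal{L}_{\ell_{1,i}}}\nu_{\ell_2}\int_{\mathbb{T}^1}\bar\rho(x)\,\hat\omega(f_*^{L}(x),\theta^*_{\ell_2}) + O(\varepsilon L^2)\\
&= O(e^{-c_\# L} + \varepsilon L^2),
\end{aligned}
\]

due to the decay of correlations for the map $f_*$ and the fact that $\hat\omega(\cdot,\theta^*_{\ell_2})$ is a zero average function for the invariant measure of $f_*$. By the above we have

\[
\mu_\ell\Big(\Big|\sum_{j=l}^{k}\hat\omega(x_j,\theta_j)\Big|^2\Big) \le C_\#\sum_{j=l}^{k}\Big\{\big[e^{-c_\# L} + \varepsilon L^2\big](k-j) + 1 + \varepsilon L^3\Big\},
\]

which yields the result by choosing $L = c\log(k-j)$ for $c$ large enough.

Footnote 25. To simplify notation we do the computation in the case $d = 1$; the general case is identical.

Footnote 26. In fact, it would be more convenient to define standard pairs with signed (actually even complex) measures, but let us keep it simple.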



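7.2. Differentiating with respect to time (poor man's Itô's formula).

Proposition 7.4. For every standard pair $\ell$ and $A \in C^3(S,\mathbb{R})$ we have

\[
\lim_{\varepsilon\to 0}\,\sup_{\ell\in\mathcal{L}_\varepsilon}\Big|\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\Big(A(\zeta(t)) - A(\zeta(0)) - \int_0^t L_sA(\zeta(s))\,ds\Big)\Big| = 0.
\]

Proof. As in Lemma 5.1, the idea is to fix $h \in (0,1)$ to be chosen later, and compute

\[
\begin{aligned}
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(A(\zeta(t+h)) - A(\zeta(t))\big)
&= \widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(t)),\zeta(t+h)-\zeta(t)\rangle\big)\\
&\quad + \frac12\,\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle(D^2A)(\zeta(t))(\zeta(t+h)-\zeta(t)),\zeta(t+h)-\zeta(t)\rangle\big) + O(h^{3/2}),
\end{aligned}
\tag{7.9}
\]

where we have used (7.8). Unfortunately, this time the computation is a bit lengthy and rather boring; yet it basically does not contain any new idea, it is just a brute force computation.

Let us start computing the last term of (7.9). Setting $\zeta^h(t) = \zeta(t+h) - \zeta(t)$ and $\Omega^h = \sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\hat\omega(x_k,\theta_k)$, by equations (7.4) and using the trivial estimate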


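$\|\zeta_\varepsilon(t)\| \le C_\#\varepsilon^{-1/2}$, we have

\[
\begin{aligned}
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle(D^2A)(\zeta(t))\zeta^h(t),\zeta^h(t)\rangle\big)
&= \varepsilon\sum_{k,j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle(D^2A)(\zeta_\varepsilon(t))\hat\omega(x_k,\theta_k),\hat\omega(x_j,\theta_j)\rangle\big)\\
&\quad + O\Big(\varepsilon^{3/2}\sum_{j=\varepsilon^{-1}t}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\Omega^h\|\,\|\zeta_\varepsilon(j\varepsilon)\|\big)\Big) + O\big(\varepsilon\,\mu_\ell(\|\Omega^h\|)\big)\\
&\quad + O\Big(\varepsilon^{2}\sum_{k,j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\zeta_\varepsilon(k\varepsilon)\|\,\|\zeta_\varepsilon(j\varepsilon)\|\big)\Big)\\
&\quad + O\Big(\varepsilon^{3/2}\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\zeta_\varepsilon(k\varepsilon)\|\big) + \varepsilon^{5/2}\sum_{k,j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\zeta_\varepsilon(k\varepsilon)\|\,\|\zeta_\varepsilon(j\varepsilon)\|^2\big) + \varepsilon\Big)\\
&\quad + O\Big(\varepsilon^{2}\sum_{j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\Omega^h\|\,\|\zeta_\varepsilon(j\varepsilon)\|^2\big) + \varepsilon^{2}\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\zeta_\varepsilon(k\varepsilon)\|^2\big)\\
&\qquad\quad + \varepsilon^{3}\sum_{k,j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\|\zeta_\varepsilon(k\varepsilon)\|^2\,\|\zeta_\varepsilon(j\varepsilon)\|^2\big)\Big).
\end{aligned}
\]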

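Observe that (7.6), (7.8) and (7.7) yield

\[
\mu_\ell(\|\zeta_\varepsilon(k\varepsilon)\|^m) = \mu_\ell(\|\zeta_\varepsilon(k\varepsilon) - \zeta_\varepsilon(0)\|^m) \le C_\#(\varepsilon k)^{m/2} \le C_\#
\]

for $m \in \{1,2,3,4\}$ and $k \le C_\#\varepsilon^{-1}$. We can now use Lemma 7.3 together with the Schwarz inequality to obtain

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle(D^2A)(\zeta(t))\zeta^h(t),\zeta^h(t)\rangle\big) = \varepsilon\sum_{k,j=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle(D^2A)(\zeta_\varepsilon(t))\hat\omega(x_k,\theta_k),\hat\omega(x_j,\theta_j)\rangle\big) + O(\sqrt{\varepsilon}\,h + h^2 + \varepsilon).
\tag{7.10}
\]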

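Next, we must perform a similar analysis on the first term of equation (7.9).

\[
\begin{aligned}
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(t)),\zeta^h(t)\rangle\big)
&= \sqrt{\varepsilon}\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),\hat\omega(x_k,\theta_k)\rangle\big)\\
&\quad + \varepsilon\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(\varepsilon k))\zeta_\varepsilon(\varepsilon k)\rangle\big) + O(\sqrt{\varepsilon}).
\end{aligned}
\tag{7.11}
\]

To estimate the term in the second line of (7.11) we have to use again (7.4):

\[
\begin{aligned}
\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(\varepsilon k))\zeta_\varepsilon(\varepsilon k)\rangle\big)
&= h\varepsilon^{-1}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(t))\zeta_\varepsilon(t)\rangle\big) + O(\varepsilon^{-1}h^2 + \varepsilon^{-1/2}h)\\
&\quad + \sqrt{\varepsilon}\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\sum_{j=t\varepsilon^{-1}}^{k}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(t))\hat\omega(x_j,\theta_j)\rangle\big).
\end{aligned}
\]

To compute the last term in the above equation, let $\mathcal{L}_\ell$ be the standard family generated by $\ell$ at time $\varepsilon^{-1}t$; then, setting $\alpha_\varepsilon(\theta,t) = \nabla A(\varepsilon^{-1/2}(\theta - \Theta(t)))$ and $\hat j = j - t\varepsilon^{-1}$, we can write

\[
\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(t))\hat\omega(x_j,\theta_j)\rangle\big) = \sum_{\ell_1\in\mathcal{L}_\ell}\sum_{r,s=1}^{d}\nu_{\ell_1}\,\mu_{\ell_1}\big(\alpha_\varepsilon(\theta_0,t)_r\,D\bar\omega(\Theta(t))_{r,s}\,\hat\omega(x_{\hat j},\theta_{\hat j})_s\big).
\]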

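Next, notice that for every $r$, the signed measure $\mu_{\ell_1,r}(\phi) = \mu_{\ell_1}(\alpha_\varepsilon(\theta_0,t)_r\,\phi)$ has density $\rho_{\ell_1}\alpha_\varepsilon(G_{\ell_1}(x),t)_r$, whose derivative is uniformly bounded in $\varepsilon, t$. We can then write $\mu_{\ell_1,r}$ as a linear combination of two standard pairs $\ell_{1,i}$. Finally, given $L \in \mathbb{N}$, if $\hat j \ge L$, we can consider the standard families $\mathcal{L}_{\ell_{1,i}}$ generated by $\ell_{1,i}$ at time $\hat j - L$ and write, arguing as in the proof of Lemma 7.3,

\[
\begin{aligned}
\mu_{\ell_{1,i}}\big(\hat\omega(x_{\hat j},\theta_{\hat j})_s\big)
&= \sum_{\ell_2\in\mathcal{L}_{\ell_{1,i}}}\nu_{\ell_2}\,\mu_{\ell_2}\big(\hat\omega(x_L,\theta_L)_s\big)\\
&= \sum_{\ell_2\in\mathcal{L}_{\ell_{1,i}}}\nu_{\ell_2}\int_{a_{\ell_2}}^{b_{\ell_2}}\rho_{\ell_2}(x)\,\hat\omega\big(f^{L}_{\theta^*_{\ell_2}}(x),\theta^*_{\ell_2}\big)_s + O(\varepsilon L^2)\\
&= O(e^{-C_\# L} + \varepsilon L^2).
\end{aligned}
\]

Collecting all the above estimates yields

\[
\begin{aligned}
\varepsilon\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(\varepsilon k))\zeta_\varepsilon(\varepsilon k)\rangle\big)
&= h\,\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),D\bar\omega(\Theta(t))\zeta_\varepsilon(t)\rangle\big) + O(h^2 + \varepsilon^{1/2}h)\\
&\quad + O\big(h^2\varepsilon^{1/2}L^2 + h^2\varepsilon^{-1/2}e^{-C_\# L} + \varepsilon^{1/2}Lh\big).
\end{aligned}
\tag{7.12}
\]

To deal with the second term in the first line of equation (7.11) we argue as before:

\[
\begin{aligned}
\sum_{k=t\varepsilon^{-1}}^{(t+h)\varepsilon^{-1}}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),\hat\omega(x_k,\theta_k)\rangle\big)
&= \sum_{k=t\varepsilon^{-1}}^{t\varepsilon^{-1}+L}\mu_\ell\big(\langle\nabla A(\zeta_\varepsilon(t)),\hat\omega(x_k,\theta_k)\rangle\big) + O(hL^2 + \varepsilon^{-1}he^{-C_\# L})\\
&= O(L + hL^2 + \varepsilon^{-1}he^{-C_\# L}).
\end{aligned}
\]

Collecting the above computations and remembering (7.4) we obtain

\[
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(t)),\zeta^h(t)\rangle\big) = h\,\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(t)),D\bar\omega(\Theta(t))\zeta(t)\rangle\big) + O(h^2 + L\sqrt{\varepsilon} + h\sqrt{\varepsilon}L^2),
\tag{7.13}
\]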

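provided $L$ is chosen in the interval $[C_*\ln\varepsilon^{-1}, \varepsilon^{-1/4}]$ with $C_* > 0$ large enough.

To conclude we must compute the term on the right hand side of the first line of equation (7.10). Consider first the case $|j - k| > L$. Suppose $k > j$, the other case being equal; then, letting $\mathcal{L}_\ell$ be the standard family generated by $\ell$ at time $\varepsilon^{-1}t$, and setting $\hat k = k - \varepsilon^{-1}t$, $\hat j = j - \varepsilon^{-1}t$, $B(x,\theta,t) = (D^2A)(\varepsilon^{-1/2}(\theta - \Theta(t)))$, we have

\[
\mu_\ell\big(\langle(D^2A)(\zeta_\varepsilon(t))\hat\omega(x_k,\theta_k),\hat\omega(x_j,\theta_j)\rangle\big) = \sum_{\ell_1\in\mathcal{L}_\ell}\nu_{\ell_1}\,\mu_{\ell_1}\big(\langle B(x_0,\theta_0,t)\hat\omega(x_{\hat k},\theta_{\hat k}),\hat\omega(x_{\hat j},\theta_{\hat j})\rangle\big).
\]

Note that the signed measure $\hat\mu_{\ell_1,r,s}(g) = \mu_{\ell_1}(B_{r,s}\,g)$ has a density with uniformly bounded derivative given by $\hat\rho_{\ell_1,r,s} = \rho_{\ell_1}B(x,G_{\ell_1}(x),t)_{r,s}$. Such a density can then be written as a linear combination of standard densities, $\hat\rho_{\ell_1,r,s} = \alpha_{1,\ell_1,r,s}\,\rho_{1,\ell_1,r,s} + \alpha_{2,\ell_1,r,s}\,\rho_{2,\ell_1,r,s}$, with uniformly bounded coefficients $\alpha_{i,\ell_1,r,s}$. We can then use the same trick at time $j$ and then at time $k - L$ and obtain that the quantity we are interested in can be written as a linear combination of quantities of the type

\[
\begin{aligned}
\mu_{\ell_3,r,s}\big(\hat\omega_s(x_L,\theta_L)\big)
&= \mu_{\ell_3,r,s}\big(\hat\omega_s(x_L,\theta^*_{\ell_3})\big) + O(L\varepsilon)\\
&= \int_{a}^{b}\tilde\rho_{r,s}\,\hat\omega_s\big(f^{L}_{\theta^*_{\ell_3}}(x),\theta^*_{\ell_3}\big) + O(L^2\varepsilon)\\
&= O(e^{-C_\# L} + L^2\varepsilon),
\end{aligned}
\]

where we argued as in the proof of Lemma 7.3. Thus the total contribution of all such terms is of order $L^2h^2 + \varepsilon^{-1}e^{-C_\# L}h^2$. Next, the terms such that $|k - j| \le L$ but $j \le \varepsilon^{-1}t + L$ give a total contribution of order $L^2\varepsilon$, while to estimate the other terms it is convenient to proceed as before but stop at the time $j - L$. Setting $\tilde k = k - j + L$ we obtain terms of the form

\[
\mu_{\ell_2,r,s}\big(\hat\omega_s(x_{\tilde k},\theta_{\tilde k})\,\hat\omega_r(x_L,\theta_L)\big) = \Gamma_{k-j}(\theta^*_{\ell_2})_{r,s} + O(e^{-C_\# L} + L^2\varepsilon),
\]

where

\[
\Gamma_k(\theta) = \int_S\hat\omega(f^k_\theta(x),\theta)\otimes\hat\omega(x,\theta)\,m_\theta(dx).
\]

The case $j > k$ yields the same results but with $\Gamma^*_j$. Remembering the smooth dependence of the covariance on the parameter $\theta$ (see [10]), substituting the result of the above computation in (7.10) and then (7.10) and (7.13) in (7.9), we finally have

\[
\begin{aligned}
\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(A(\zeta(t+h)) - A(\zeta(t))\big)
&= h\,\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(t)),D\bar\omega(\Theta(t))\zeta(t)\rangle\big) + h\,\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\operatorname{Tr}(\sigma^2(\Theta(t))D^2A(\zeta(t)))\big)\\
&\quad + O(L\sqrt{\varepsilon} + hL^2\sqrt{\varepsilon} + h^2L^2)\\
&= \int_t^{t+h}\Big[\widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\langle\nabla A(\zeta(s)),D\bar\omega(\Theta(s))\zeta(s)\rangle\big) + \widetilde{\mathbb{E}}^{\varepsilon}_{\ell}\big(\operatorname{Tr}(\sigma^2(\Theta(s))D^2A(\zeta(s)))\big)\Big]ds\\
&\quad + O(L\sqrt{\varepsilon} + hL^2\sqrt{\varepsilon} + h^{3/2} + h^2L^2).
\end{aligned}
\]

The proposition follows by summing the $h^{-1}t$ terms in the interval $[0,t]$ and by choosing $L = \varepsilon^{-1/100}$ and $h = \varepsilon^{1/3}$.

In the previous proposition the expression $\sigma^2$ just stands for a specified matrix: we did not prove that such a matrix is positive definite, and hence that it has a well defined square root $\sigma$, nor do we have much understanding of the properties of such a $\sigma$ (provided it exists). To clarify this is our next task.

Lemma 7.5. The matrices $\sigma^2(s)$, $s \in [0,T]$, are symmetric and non negative, hence they have a unique real symmetric square root $\sigma(s)$. In addition, if, for each $v \in \mathbb{R}^d$, $\langle v,\bar\omega\rangle$ is not a smooth coboundary, then there exists $c > 0$ such that $\sigma(s) \ge c$.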


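Proof. For each $v \in \mathbb{R}^d$ a direct computation shows that

\[
\begin{aligned}
\lim_{n\to\infty}\frac1n\,m_\theta\Big(\Big[\sum_{k=0}^{n-1}\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\Big]^2\Big)
&= \lim_{n\to\infty}\frac1n\sum_{k,j=0}^{n-1}m_\theta\big(\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\langle v,\omega(f^j_\theta(\cdot),\theta)\rangle\big)\\
&= m_\theta(\langle v,\omega(\cdot,\theta)\rangle^2) + \lim_{n\to\infty}\frac2n\sum_{k=1}^{n-1}(n-k)\,m_\theta\big(\langle\omega(\cdot,\theta),v\rangle\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\big)\\
&= m_\theta(\langle v,\omega(\cdot,\theta)\rangle^2) + 2\sum_{k=1}^{\infty}m_\theta\big(\langle\omega(\cdot,\theta),v\rangle\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\big) = \langle v,\sigma(\theta)^2v\rangle.
\end{aligned}
\]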
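The series for $\langle v,\sigma(\theta)^2v\rangle$ can be evaluated explicitly in a toy case. The sketch below is only an illustration under stated assumptions that are not taken from the text: dimension $d = 1$, the doubling map $f(x) = 2x \bmod 1$ (which preserves Lebesgue measure) in place of $f_\theta$, and the zero average observable $\cos(2\pi x) + \cos(4\pi x)$ in place of $\langle v,\omega(\cdot,\theta)\rangle$. All names are hypothetical. The integrals are computed by Riemann sums on a dyadic grid, which are exact (up to rounding) for the trigonometric polynomials involved.

```python
import math


def correlation(obs, k: int, n_grid: int = 4096) -> float:
    """Riemann sum for the integral of obs(x) * obs(f^k(x)) over [0, 1),
    where f(x) = 2x mod 1. On the grid x = j / n_grid the iterate f^k(x)
    is computed exactly with integer arithmetic."""
    total = 0.0
    for j in range(n_grid):
        x = j / n_grid
        y = (2 ** k * j % n_grid) / n_grid
        total += obs(x) * obs(y)
    return total / n_grid


def green_kubo_variance(obs, k_max: int = 8) -> float:
    """Truncation of m(obs^2) + 2 * sum_{k >= 1} m(obs * obs o f^k)."""
    return correlation(obs, 0) + 2.0 * sum(correlation(obs, k) for k in range(1, k_max + 1))


def omega(x: float) -> float:
    """Hypothetical zero average observable on the circle."""
    return math.cos(2.0 * math.pi * x) + math.cos(4.0 * math.pi * x)


if __name__ == "__main__":
    print([round(correlation(omega, k), 12) for k in range(4)])
    print(green_kubo_variance(omega))
```

In this example only the lags $k = 0$ and $k = 1$ contribute, and the sum is $1 + 2\cdot\tfrac12 = 2$. This agrees with the fact that the observable equals $2\cos(2\pi x)$ plus the coboundary $\cos(4\pi x) - \cos(2\pi x)$, and coboundaries do not change the asymptotic variance.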

This implies that $\sigma(\theta)^2 \ge 0$ and, since it is symmetric, there exists a unique symmetric and non-negative square root $\sigma(\theta)$. On the other hand, if $\langle v,\sigma^2(\theta)v\rangle = 0$, then, by the decay of correlations and the above equation, we have

\[
\begin{aligned}
m_\theta\Big(\Big[\sum_{k=0}^{n-1}\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\Big]^2\Big)
&= n\,m_\theta(\langle v,\omega(\cdot,\theta)\rangle^2) + 2n\sum_{k=1}^{n-1}m_\theta\big(\langle v,\omega(\cdot,\theta)\rangle\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\big) + O(1)\\
&= -2n\sum_{k=n}^{\infty}m_\theta\big(\langle v,\omega(\cdot,\theta)\rangle\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle\big) + O(1) = O(1).
\end{aligned}
\]

Thus the $L^2$ norm of $\phi_n = \sum_{k=0}^{n-1}\langle v,\omega(f^k_\theta(\cdot),\theta)\rangle$ is uniformly bounded. Hence there exists a weakly convergent subsequence. Let $\phi \in L^2$ be an accumulation point; then for each $\varphi \in C^1$ we have

\[
m_\theta(\phi\circ f_\theta\,\varphi) = \lim_{k\to\infty}m_\theta(\phi_{n_k}\circ f_\theta\,\varphi) = m_\theta(\phi\varphi) - m_\theta(\langle v,\omega(\cdot,\theta)\rangle\varphi).
\]

That is, $\langle v,\omega(x,\theta)\rangle = \phi(x) - \phi\circ f_\theta(x)$. In other words, $\langle v,\omega(x,\theta)\rangle$ is an $L^2$ coboundary. Since the Livšic Theorem [11] states that the solution of the cohomological equation must be smooth, we have $\phi \in C^1$.

7.3. Uniqueness of the Martingale Problem.

We are left with the task of proving the uniqueness of the martingale problem. Note that in the present case the operator depends explicitly on time. Thus, if we want to set the initial condition at a time $s \neq 0$, we need to slightly generalise the definition of martingale problem. To avoid this, for simplicity, here we consider only initial conditions at time zero, which suffice for our purposes. In fact, we will consider only the initial condition $\zeta(0) = 0$, since it is the only one we are interested in. We have then the same definition of the martingale problem as in Definition 1, apart from the fact that $L$ is replaced by $L_s$ and $y = 0$.

Since the operators $L_s$ are second order operators, we could use well known results. Indeed, there exists a deep theory due to Stroock and Varadhan that establishes the uniqueness of the martingale problem for a wide class of second order operators [13]. Yet, our case is especially simple because the coefficients of the higher order part of the differential operator depend only on time and not on


$\zeta$. In this case it is possible to modify a simple proof of the uniqueness that works when all the coefficients depend only on time, [13, Lemma 6.1.4]. We provide here the argument for the reader's convenience.

Proposition 7.6. The martingale problem associated to the operators $L_s$ in Proposition 7.4 has a unique solution.

Proof. As already noticed, $L_t$, defined in (7.2), depends on $\zeta$ only via the coefficient of the first order part. It is then natural to try to change measure so that such a dependence is eliminated and we obtain a martingale problem with respect to an operator with all coefficients depending only on time; then one can conclude arguing as in [13, Lemma 6.1.4]. Such a reduction is routinely done in probability via the Cameron-Martin-Girsanov formula. Yet, given the simple situation at hand, one can proceed in a much more naive manner. Let $S(t) : [0,T] \to M_d$, $M_d$ being the space of $d\times d$ matrices, be generated by the differential equation

\[
\dot S(t) = -D\omega(\Theta(t))S(t), \qquad S(0) = \mathbb{1}.
\]

Note that, setting $\varsigma(t) = \det S(t)$ and $B(t) = D\omega(\Theta(t))$, we have

\[
\dot\varsigma(t) = -\operatorname{tr}(B(t))\,\varsigma(t), \qquad \varsigma(0) = 1.
\]

The above implies that $S(t)$ is invertible. Define the map $\mathcal{S} \in C^0(C^0([0,T],\mathbb{R}^d), C^0([0,T],\mathbb{R}^d))$ by $[\mathcal{S}\zeta](t) = S(t)\zeta(t)$ and set $P = \mathcal{S}_*\widetilde{P}$. Note that the map $\mathcal{S}$ is invertible. Finally, we define the operator

\[
\widehat{L}_t = \frac12\sum_{i,j}[\widehat{\Sigma}(t)^2]_{i,j}\,\partial_{\zeta_i}\partial_{\zeta_j},
\]

where $\widehat{\Sigma}(t)^2 = S(t)\sigma(t)^2S(t)^*$, $\sigma(t) = \sigma(\Theta(t))$, as mentioned after (7.2). Let us verify that $P$ satisfies the martingale problem with respect to the operators $\widehat{L}_t$. By Lemma C.1 we have

\[
\begin{aligned}
\frac{d}{dt}\mathbb{E}(A(\zeta(t)) \mid \mathcal{F}_s)
&= \frac{d}{dt}\widetilde{\mathbb{E}}(A(S(t)\zeta(t)) \mid \mathcal{F}_s)\\
&= \widetilde{\mathbb{E}}\big(\dot S(t)\nabla A(S(t)\zeta(t)) + L_tA(S(t)\zeta(t)) \mid \mathcal{F}_s\big)\\
&= \frac12\,\widetilde{\mathbb{E}}\Big(\sum_{i,j,k,l}\sigma^2(t)_{i,j}\,\partial_{\zeta_k}\partial_{\zeta_l}A(S(t)\zeta(t))\,S(t)_{k,i}S(t)_{l,j}\;\Big|\;\mathcal{F}_s\Big)\\
&= \mathbb{E}(\widehat{L}_tA(\zeta(t)) \mid \mathcal{F}_s).
\end{aligned}
\]

Thus the claim follows by Lemma C.1 again.

Accordingly, if we prove that the above martingale problem has a unique solution, then $P$ is uniquely determined, which, in turn, determines uniquely $\widetilde{P}$, concluding the proof. Let us define the function $B \in C^1(\mathbb{R}^{2d+1},\mathbb{R})$ by

\[
B(t,\zeta,\lambda) = e^{\langle\lambda,\zeta\rangle - \frac12\int_s^t\langle\lambda,\widehat{\Sigma}(\tau)^2\lambda\rangle\,d\tau};
\]

then Lemma C.1 implies

\[
\frac{d}{dt}\mathbb{E}(B(t,\zeta(t),\lambda) \mid \mathcal{F}_s) = \mathbb{E}\Big(-\frac12\langle\lambda,\widehat{\Sigma}(t)^2\lambda\rangle B(t,\zeta(t),\lambda) + \widehat{L}_tB(t,\zeta(t),\lambda)\;\Big|\;\mathcal{F}_s\Big) = 0.
\]


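Hence

\[
\mathbb{E}\big(e^{\langle\lambda,\zeta(t)\rangle} \mid \mathcal{F}_s\big) = e^{\langle\lambda,\zeta(s)\rangle + \frac12\int_s^t\langle\lambda,\widehat{\Sigma}(\tau)^2\lambda\rangle\,d\tau}.
\]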
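The identity above is the moment generating function of a Gaussian increment whose variance is the time integral of $\widehat{\Sigma}(\tau)^2$. The following sketch checks this numerically in the simplest case, under assumptions made only for illustration: dimension one, $\zeta(s) = 0$, and a hypothetical variance profile $\widehat{\Sigma}(\tau)^2 = 1 + \tau$. It compares a quadrature of $e^{\lambda x}$ against the $N(0,v)$ density with $e^{\lambda^2v/2}$, where $v = \int_s^t\widehat{\Sigma}(\tau)^2\,d\tau$. All function names are hypothetical.

```python
import math


def integrate(f, a: float, b: float, n: int = 20000) -> float:
    """Composite trapezoid rule on [a, b] with n subintervals."""
    h = (b - a) / n
    inner = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + inner + 0.5 * f(b))


def sigma_hat_sq(tau: float) -> float:
    """Hypothetical time dependent variance density."""
    return 1.0 + tau


def integrated_variance(s: float, t: float) -> float:
    """v = integral of sigma_hat_sq over [s, t]."""
    return integrate(sigma_hat_sq, s, t)


def gaussian_mgf(lam: float, v: float) -> float:
    """E[exp(lam * X)] for X ~ N(0, v), by quadrature against the density.
    The window is centered generously around the shifted peak at lam * v."""
    radius = 12.0 * math.sqrt(v) + abs(lam) * v

    def integrand(x: float) -> float:
        return math.exp(lam * x - x * x / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

    return integrate(integrand, -radius, radius)


if __name__ == "__main__":
    v = integrated_variance(0.0, 1.0)
    lam = 0.7
    print(gaussian_mgf(lam, v), math.exp(0.5 * lam * lam * v))
```

The two printed numbers should agree to high accuracy, which is the one dimensional, single increment instance of the formula used to pin down the finite dimensional distributions.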

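From this it follows that the finite dimensional distributions are uniquely determined. Indeed, for each $n \in \mathbb{N}$, $\{\lambda_i\}_{i=1}^n$ and $0 \le t_1 < \dots < t_n$ we have

\[
\begin{aligned}
\mathbb{E}\Big(e^{\sum_{i=1}^{n}\langle\lambda_i,\zeta(t_i)\rangle}\Big)
&= \mathbb{E}\Big(e^{\sum_{i=1}^{n-1}\langle\lambda_i,\zeta(t_i)\rangle}\,\mathbb{E}\big(e^{\langle\lambda_n,\zeta(t_n)\rangle} \mid \mathcal{F}_{t_{n-1}}\big)\Big)\\
&= \mathbb{E}\Big(e^{\sum_{i=1}^{n-2}\langle\lambda_i,\zeta(t_i)\rangle + \langle\lambda_{n-1}+\lambda_n,\zeta(t_{n-1})\rangle}\Big)\,e^{\frac12\int_{t_{n-1}}^{t_n}\langle\lambda_n,\widehat{\Sigma}(\tau)^2\lambda_n\rangle\,d\tau}\\
&= e^{\frac12\int_0^{t_n}\big\langle\sum_{i=n(\tau)}^{n}\lambda_i,\;\widehat{\Sigma}(\tau)^2\sum_{i=n(\tau)}^{n}\lambda_i\big\rangle\,d\tau},
\end{aligned}
\]

where $n(\tau) = \inf\{m \mid t_m \ge \tau\}$. This concludes the Lemma, since it implies that the measure is uniquely determined on the sets that generate the $\sigma$-algebra (see footnote 27). Note that we have also proven that the process is a zero mean Gaussian process; this, after translating back to the original measure, generalises Remark 2.3.

Appendix A. Geometry

For $c > 0$, consider the cones $C_c = \{(\xi,\eta) \in \mathbb{R}^2 : |\eta| \le \varepsilon c|\xi|\}$. Note that

\[
dF_\varepsilon = \begin{pmatrix}\partial_xf & \partial_\theta f\\ \varepsilon\partial_x\omega & 1 + \varepsilon\partial_\theta\omega\end{pmatrix}.
\]

Thus if $(1,\varepsilon u) \in C_c$,

\[
\begin{aligned}
d_pF_\varepsilon(1,\varepsilon u) &= \big(\partial_xf(p) + \varepsilon u\,\partial_\theta f(p),\; \varepsilon\partial_x\omega(p) + \varepsilon u + \varepsilon^2u\,\partial_\theta\omega(p)\big)\\
&= \partial_xf(p)\Big(1 + \varepsilon\frac{\partial_\theta f(p)}{\partial_xf(p)}u\Big)\cdot(1,\varepsilon\Xi_p(u)),
\end{aligned}
\]

where

\[
\Xi_p(u) = \frac{\partial_x\omega(p) + (1 + \varepsilon\partial_\theta\omega(p))u}{\partial_xf(p) + \varepsilon\partial_\theta f(p)u}.
\tag{A.1}
\]

Thus the vector $(1,\varepsilon u)$ is mapped to the vector $(1,\varepsilon\Xi_p(u))$. Thus, letting $K = \max\{\|\partial_x\omega\|_\infty, \|\partial_\theta\omega\|_\infty, \|\partial_\theta f\|_\infty\}$, we have, for $|u| \le c$ and assuming $K\varepsilon c \le 1$,

\[
|\Xi_p(u)| \le \frac{K + (1 + \varepsilon K)c}{\lambda - \varepsilon Kc} \le \frac{K + 1 + c}{\lambda - 1}.
\]

Thus, if we choose $c \in \big[\frac{K+1}{\lambda-2}, (\varepsilon K)^{-1}\big]$, we have that $d_pF_\varepsilon(C_c) \subset C_c$. Since this implies that $d_pF_\varepsilon^{-1}\,\complement C_c \subset \complement C_c$, we have that the complementary cone $\complement C_{K\varepsilon^{-1}}$ is invariant under $dF_\varepsilon^{-1}$. From now on we fix $c = \frac{K+1}{\lambda-2}$.

Hence, for any $p \in \mathbb{T}^{1+d}$ and $n \in \mathbb{N}$, we can define the quantities $v_n, u_n, s_n, r_n$ as follows:

\[
d_pF_\varepsilon^n(1,0) = v_n(1,\varepsilon u_n), \qquad d_pF_\varepsilon^n(s_n,1) = r_n(0,1),
\tag{A.2}
\]

with $|u_n| \le c$ and $|s_n| \le K$. For each $n$ the slope field $s_n$ is smooth, therefore integrable; given any small $\Delta > 0$ and $p = (x,\theta) \in \mathbb{T}^{1+d}$, define $W^c_n(p,\Delta)$, the local $n$-step central manifold of size $\Delta$, as the connected component containing $p$ of the intersection with the strip $\{|\theta' - \theta| < \Delta\}$ of the integral curve of $(s_n,1)$ passing through $p$.

Footnote 27. See the discussion at the beginning of Section 2.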


Notice that, by definition, $d_pF_\varepsilon(s_n(p),1) = \frac{r_n}{r_{n-1}}(s_{n-1}(F_\varepsilon p),1)$; thus, by definition, there exists a constant $b$ such that

\[
\exp(-b\varepsilon) \le \frac{r_n}{r_{n-1}} \le \exp(b\varepsilon).
\tag{A.3}
\]

Furthermore, define $\Gamma_n = \prod_{k=0}^{n-1}\partial_xf\circ F_\varepsilon^k$, and let

\[
a = c\,\Big\|\frac{\partial_\theta f}{\partial_xf}\Big\|_\infty.
\tag{A.4}
\]

Clearly,

\[
\Gamma_n\exp(-a\varepsilon n) \le v_n \le \Gamma_n\exp(a\varepsilon n).
\tag{A.5}
\]

Appendix B. Shadowing

In this section we provide a simple quantitative version of shadowing that is needed in the argument. Let $(x_k,\theta_k) = F_\varepsilon^k(x,\theta)$ with $k \in \{0,\dots,n\}$. We assume that $\theta$ belongs to the range of a standard pair $\ell$ (i.e., $\theta = G_\ell(x)$ for some $x \in [a,b]$). Let $\theta^* \in S$ be such that $\|\theta^* - \theta\| \le \varepsilon$ and set $f_*(x) = f(x,\theta^*)$. Let us denote with $\pi_x : X \to S$ the canonical projection on the $x$ coordinate; then, for any $s \in [0,1]$, let

\[
H_n(x,z,s) = \pi_xF^n_{s\varepsilon}\big(x,\theta^* + s(G_\ell(x) - \theta^*)\big) - f_*^n(z).
\]

Note that $H_n(x,x,0) = 0$; in addition, for any $x$, $z$ and $s \in [0,1]$,

\[
\partial_zH_n(x,z,s) = -(f_*^n)'(z).
\]

Accordingly, by the Implicit Function Theorem, for any $n \in \mathbb{N}$ and $s \in [0,1]$ there exists $Y_n(x,s)$ such that (see footnote 28) $H_n(x,Y_n(x,s),s) = 0$; from now on $Y_n(x)$ stands for $Y_n(x,1)$. Note that, setting $x^*_k = f_*^k(Y_n(x))$, by construction $x^*_n = x_n$. Observe moreover that

\[
\partial_xY_n = (f_*^n)'(z)^{-1}\,d(\pi_xF_\varepsilon^n) = \frac{(1 - G_\ell's_n)v_n}{(f_*^n)'\circ Y_n},
\tag{B.1}
\]

where we have used the notations introduced in equation (A.2). Recalling (A.5) and by the cone condition we have

\[
e^{-c_\#\varepsilon n}\prod_{k=0}^{n-1}\frac{\partial_xf(x_k,\theta_k)}{f_*'(x^*_k)} \le \Big|\frac{(1 - G_\ell's_n)v_n}{(f_*^n)'}\Big| \le e^{c_\#\varepsilon n}\prod_{k=0}^{n-1}\frac{\partial_xf(x_k,\theta_k)}{f_*'(x^*_k)}.
\tag{B.2}
\]

Next, we want to estimate to which degree $x^*_k$ shadows the true trajectory.

Lemma B.1. There exists $C > 0$ such that, for each $k \le n < C\varepsilon^{-1/2}$, we have

\[
\|\theta_k - \theta^*\| \le C_\#\varepsilon k, \qquad |x_k - x^*_k| \le C_\#\varepsilon k.
\]

Footnote 28. The Implicit Function Theorem allows one to define $Y_n(x,s)$ in a neighborhood of $s = 0$; in fact we claim that this neighborhood necessarily contains $[0,1]$. Otherwise, there would exist $\bar s \in (0,1)$ and $\bar x$ so that $Y_n$ is defined at $(\bar x,\bar s)$ but not at $(\bar x,s)$ with $s > \bar s$. We then could apply the Implicit Function Theorem at the point $(\bar x,Y_n(\bar x,\bar s),\bar s)$ and obtain, by uniqueness, an extension of the previous function $Y_n$ to a larger neighborhood of $s = 0$, which contradicts our assumption.


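Proof. Observe that

\[
\theta_k = \varepsilon\sum_{j=0}^{k-1}\omega(x_j,\theta_j) + \theta_0,
\]

thus $\|\theta_k - \theta^*\| \le C_\#\varepsilon k$. Accordingly, let us set (see footnote 29) $\xi_k = x^*_k - x_k$; then, by the mean value theorem,

\[
|\xi_{k+1}| = |\partial_xf\cdot\xi_k + \partial_\theta f\cdot(\theta_k - \theta^*)| \ge \lambda|\xi_k| - C_\#\varepsilon k.
\]

Since, by definition, $\xi_n = 0$, we can proceed by backward induction, which yields

\[
|\xi_k| \le \sum_{j=k}^{n-1}\lambda^{-j+k}C_\#\varepsilon j \le C_\#\varepsilon\sum_{j=0}^{\infty}\lambda^{-j}(j+k) \le C_\#\varepsilon k.
\]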
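The last inequality in the backward induction rests on the convergence of a geometric type series. As a minimal sketch, assuming a hypothetical expansion rate $\lambda = 2.5$ (any $\lambda > 1$ works for the series itself), the code below compares a partial sum of $\sum_{j\ge 0}\lambda^{-j}(j+k)$ with its closed form $k\frac{\lambda}{\lambda-1} + \frac{\lambda}{(\lambda-1)^2}$ and checks that it is bounded by a constant times $k$ for $k \ge 1$. The function names are introduced only for this illustration.

```python
def shadow_series(lam: float, k: int, terms: int = 200) -> float:
    """Partial sum of sum_{j >= 0} lam^(-j) * (j + k); the neglected tail is
    far below double precision for lam = 2.5 and 200 terms."""
    return sum(lam ** (-j) * (j + k) for j in range(terms))


def closed_form(lam: float, k: int) -> float:
    """k * sum lam^(-j) + sum j * lam^(-j), using the standard geometric sums."""
    return k * lam / (lam - 1.0) + lam / (lam - 1.0) ** 2


def bound_constant(lam: float) -> float:
    """A constant C with series <= C * k for every k >= 1."""
    return lam / (lam - 1.0) + lam / (lam - 1.0) ** 2


if __name__ == "__main__":
    lam = 2.5  # hypothetical expansion rate
    for k in (1, 5, 25):
        print(k, shadow_series(lam, k), closed_form(lam, k), bound_constant(lam) * k)
```

The constant depends only on $\lambda$, which is why it can be absorbed into $C_\#$ in the statement of Lemma B.1.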

Lemma B.2. There exists $C > 0$ such that, for each $n \le C\varepsilon^{-1/2}$,

\[
e^{-c_\#\varepsilon n^2} \le |Y_n'| \le e^{c_\#\varepsilon n^2}.
\tag{B.3}
\]

In particular, $Y_n$ is invertible with uniformly bounded derivative.

Proof. Let us prove the upper bound, the lower bound being similar. By equations (B.1), (B.2) and Lemma B.1 we have

\[
|Y_n'| \le e^{c_\#\varepsilon n}\,e^{\sum_{k=0}^{n-1}[\ln\partial_xf(x_k,\theta_k) - \ln f_*'(x^*_k)]} \le e^{c_\#\varepsilon n}\,e^{c_\#\sum_{k=0}^{n-1}\varepsilon k}.
\]

Appendix C. Martingales, operators and Itô's calculus

Suppose that $L_t \in L(C^r(\mathbb{R}^d,\mathbb{R}), C^0(\mathbb{R}^d,\mathbb{R}))$, $t \in \mathbb{R}$, is a one parameter family of bounded linear operators that depends continuously on $t$ (see footnote 30). Also suppose that $P$ is a measure on $C^0([0,T],\mathbb{R}^d)$ and let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the variables $\{z(s)\}_{s\le t}$ (see footnote 31).

Lemma C.1. The two properties below are equivalent:

(1) For all $A \in C^1(\mathbb{R}^{d+1},\mathbb{R})$ such that, for all $t \in \mathbb{R}$, $A(t,\cdot) \in C^r(\mathbb{R}^d,\mathbb{R})$, and for all times $s, t \in [0,T]$, $s < t$, the function $g(t) = \mathbb{E}(A(t,z(t)) \mid \mathcal{F}_s)$ is differentiable and $g'(t) = \mathbb{E}(\partial_tA(t,z(t)) + L_tA(t,z(t)) \mid \mathcal{F}_s)$.

(2) For all $A \in C^r(\mathbb{R}^d,\mathbb{R})$, $M(t) = A(z(t)) - A(z(0)) - \int_0^tL_sA(z(s))\,ds$ is a martingale with respect to $\mathcal{F}_t$.

Proof. Let us start with (1) ⇒ (2). Let us fix $t \in [0,T]$; then for each $s \in [0,t]$ let us define the random variables $B(s)$ by

\[
B(s,z) = A(z(t)) - A(z(s)) - \int_s^tL_\tau A(z(\tau))\,d\tau.
\]

Footnote 29. Here, as we have already done before, we are using the fact that we can lift $\mathbb{T}^1$ to the universal covering $\mathbb{R}$.

Footnote 30. Here the $C^r$ are thought of as Banach spaces, hence consist of bounded functions. A more general setting can be discussed by introducing the concept of a local martingale.

Footnote 31. At this point the reader is supposed to be familiar with the intended meaning: for all $\vartheta \in C^0([0,T],\mathbb{R}^d)$, $[z(s)](\vartheta) = z(\vartheta,s) = \vartheta(s)$.


Clearly, for each $z \in C^0$, $B(s,z)$ is continuous in $s$, and $B(t,z) = 0$. Hence, for all $\tau \in (s,t]$, by Fubini we have (see footnote 32)

\[
\begin{aligned}
\frac{d}{d\tau}\mathbb{E}(B(\tau) \mid \mathcal{F}_s)
&= -\frac{d}{d\tau}\mathbb{E}(A(z(\tau)) \mid \mathcal{F}_s) - \frac{d}{d\tau}\int_\tau^t\mathbb{E}(L_rA(z(r)) \mid \mathcal{F}_s)\,dr\\
&= \mathbb{E}(-L_\tau A(z(\tau)) + L_\tau A(z(\tau)) \mid \mathcal{F}_s) = 0.
\end{aligned}
\]

Thus, since $B$ is bounded, by the Lebesgue dominated convergence theorem, we have

\[
0 = \mathbb{E}(B(t) \mid \mathcal{F}_s) = \lim_{\tau\to s}\mathbb{E}(B(\tau) \mid \mathcal{F}_s) = \mathbb{E}(B(s) \mid \mathcal{F}_s).
\]

This implies

\[
\mathbb{E}(M(t) \mid \mathcal{F}_s) = \mathbb{E}(B(s) \mid \mathcal{F}_s) + M(s) = M(s),
\]

as required. Next, let us check (2) ⇒ (1). For each $h > 0$ we have

\[
\begin{aligned}
\mathbb{E}(A(t+h,z(t+h)) - A(t,z(t)) \mid \mathcal{F}_s)
&= \mathbb{E}((\partial_tA)(t,z(t+h)) \mid \mathcal{F}_s)\,h + o(h)\\
&\quad + \mathbb{E}\Big(M(t+h) - M(t) + \int_t^{t+h}L_\tau A(t,z(\tau))\,d\tau\;\Big|\;\mathcal{F}_s\Big).
\end{aligned}
\]

Since $M$ is a martingale, $\mathbb{E}(M(t+h) - M(t) \mid \mathcal{F}_s) = 0$. The lemma follows by the Lebesgue dominated convergence theorem.

The above is rather general; to say more it is necessary to specify other properties of the family of operators $L_s$. A case of particular interest arises for second order differential operators like (7.2). Namely, suppose that

\[
(L_sA)(z) = \sum_ia(z,s)_i\,\partial_{z_i}A(z) + \frac12\sum_{i,j=1}^{d}[\sigma^2(z,s)]_{i,j}\,\partial_{z_i}\partial_{z_j}A(z),
\]

where, for simplicity, we assume $a$, $\sigma$ to be smooth and bounded and $\sigma_{ij} = \sigma_{ji}$. Clearly, (7.2) is a special case of the above. In such a case it turns out that a strict connection can be established between $L_s$ and the Stochastic Differential Equation

\[
dz = a\,dt + \sigma\,dB,
\tag{C.1}
\]

where $B$ is the standard Brownian motion. The solution of (C.1) can be defined in various ways. One possibility is to define it as the solution of the martingale problem [13]; another is to use stochastic integrals [15, Theorem 6.1]. The latter, more traditional, approach leads to Itô's formula, which reads, for each bounded continuous function $A$ of $t$ and $z$, [15, page 91],

\[
\begin{aligned}
A(z(t),t) - A(z(0),0)
&= \int_0^t\partial_sA(z(s),s)\,ds + \sum_i\int_0^ta(z(s),s)_i\,\partial_{z_i}A(z(s),s)\,ds\\
&\quad + \frac12\sum_{i,j}\int_0^t[\sigma^2(z(s),s)]_{i,j}\,\partial_{z_i}\partial_{z_j}A(z(s),s)\,ds\\
&\quad + \sum_{i,j}\int_0^t\sigma_{ij}(z(s),s)\,\partial_{z_j}A(z(s),s)\,dB_i(s),
\end{aligned}
\]

Footnote 32. If uncomfortable about applying Fubini to conditional expectations, then have a look at [15, Theorem 4.7].


where the last is a stochastic integral [15, Theorem 5.3]. This formula is often written in the more impressionistic form

\[
dA = \partial_tA\,dt + a\,\partial_zA\,dt + \sigma\,\partial_zA\,dB + \frac12\sigma^2\partial_z^2A\,dt = \partial_tA\,dt + \sigma\,\partial_zA\,dB + L_tA\,dt.
\]

Taking the expectation with respect to $\mathbb{E}(\cdot \mid \mathcal{F}_s)$ we obtain exactly condition (1) of Lemma C.1; hence the solution satisfies the martingale problem.

Remark C.2. Note that, if one defines the solution of (C.1) as the solution of the associated martingale problem, then one can dispense with Itô's calculus altogether. This is an important observation in our present context, in which the fluctuations come from a deterministic problem rather than from a Brownian motion, and hence a direct application of Itô's formula is not possible.

References

[1] Viviane Baladi, Positive transfer operators and decay of correlations, Advanced Series in Nonlinear Dynamics, vol. 16, World Scientific Publishing Co., Inc., River Edge, NJ, 2000. MR1793194 (2001k:37035)

[2] Viviane Baladi and Masato Tsujii, Anisotropic Hölder and Sobolev spaces for hyperbolic diffeomorphisms (English, with English and French summaries), Ann. Inst. Fourier (Grenoble) 57 (2007), no. 1, 127–154. MR2313087 (2008d:37034)

[3] Dmitry Dolgopyat, Averaging and invariant measures (English, with English and Russian summaries), Mosc. Math. J. 5 (2005), no. 3, 537–576, 742. MR2241812 (2007i:37063)

[4] M. D. Donsker and S. R. S. Varadhan, Large deviations for noninteracting infinite-particle systems, J. Statist. Phys. 46 (1987), no. 5-6, 1195–1232, DOI 10.1007/BF01011162. MR893138 (88i:60047)

[5] Sébastien Gouëzel and Carlangelo Liverani, Banach spaces adapted to Anosov systems, Ergodic Theory Dynam. Systems 26 (2006), no. 1, 189–217, DOI 10.1017/S0143385705000374. MR2201945 (2007h:37037)

[6] Sébastien Gouëzel and Carlangelo Liverani, Compact locally maximal hyperbolic sets for smooth maps: fine statistical properties, J. Differential Geom. 79 (2008), no. 3, 433–477. MR2433929 (2010a:37050)

[7] M. Z. Guo, G. C. Papanicolaou, and S. R. S. Varadhan, Nonlinear diffusion limit for a system with nearest neighbor interactions, Comm. Math. Phys. 118 (1988), no. 1, 31–59. MR954674 (89m:60255)

[8] Anatole Katok and Boris Hasselblatt, Introduction to the modern theory of dynamical systems, Encyclopedia of Mathematics and its Applications, vol. 54, Cambridge University Press, Cambridge, 1995. With a supplementary chapter by Katok and Leonardo Mendoza. MR1326374 (96c:58055)

[9] R. Z. Hasminskiĭ, On the principle of averaging the Itô's stochastic differential equations (Russian, with Czech summary), Kybernetika (Prague) 4 (1968), 260–279. MR0260052 (41 #4681)

[10] Carlangelo Liverani, Invariant measures and their properties. A functional analytic point of view, Dynamical systems. Part II, Pubbl. Cent. Ric. Mat. Ennio Giorgi, Scuola Norm. Sup., Pisa, 2003, pp. 185–237. MR2071241 (2005d:37045)

[11] A. N. Livšic, Cohomology of dynamical systems (Russian), Izv. Akad. Nauk SSSR Ser. Mat. 36 (1972), 1296–1320. MR0334287 (48 #12606)

[12] Charles Pugh and Michael Shub, Stably ergodic dynamical systems and partial hyperbolicity, J. Complexity 13 (1997), no. 1, 125–179, DOI 10.1006/jcom.1997.0437. MR1449765 (98e:58110)

[13] Daniel W. Stroock and S. R. Srinivasa Varadhan, Multidimensional diffusion processes, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 233, Springer-Verlag, Berlin-New York, 1979. MR532498 (81f:60108)

[14] S. R. S. Varadhan, Probability theory, Courant Lecture Notes in Mathematics, vol. 7, New York University, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2001. MR1852999 (2003a:60001)

[15] S. R. S. Varadhan, Stochastic processes, Courant Lecture Notes in Mathematics, vol. 16, Courant Institute of Mathematical Sciences, New York; American Mathematical Society, Providence, RI, 2007. MR2354349 (2009c:60207)

Department of Mathematics, University of Toronto, 40 St George St. Toronto, Ontario M5S 2E4
E-mail address: [email protected]
URL: http://www.math.utoronto.ca/jacopods

Dipartimento di Matematica, II Università di Roma (Tor Vergata), Via della Ricerca Scientifica, 00133 Roma, Italy
E-mail address: [email protected]
