Lyapunov exponents of linear cocycles : continuity via large deviations 978-94-6239-123-9, 9462391238, 978-94-6239-124-6

The aim of this monograph is to present a general method of proving continuity of Lyapunov exponents of linear cocycles.


English Pages 263 [271] Year 2016


Table of contents :
Front Matter....Pages i-xiii
Introduction....Pages 1-21
Estimates on Grassmann Manifolds....Pages 23-79
Abstract Continuity of Lyapunov Exponents....Pages 81-111
The Oseledets Filtration and Decomposition....Pages 113-160
Large Deviations for Random Cocycles....Pages 161-205
Large Deviations for Quasi-Periodic Cocycles....Pages 207-246
Further Related Problems....Pages 247-260
Back Matter....Pages 261-263


Atlantis Studies in Dynamical Systems Series Editors: H. Broer · B. Hasselblatt

Pedro Duarte Silvius Klein

Lyapunov Exponents of Linear Cocycles Continuity via Large Deviations · Volume 3

Atlantis Studies in Dynamical Systems Volume 3

Series editors Henk Broer, Groningen, The Netherlands Boris Hasselblatt, Medford, USA

The “Atlantis Studies in Dynamical Systems” publishes monographs in the area of dynamical systems, written by leading experts in the field and useful for both students and researchers. Books with a theoretical nature will be published alongside books emphasizing applications.

More information about this series at http://www.atlantis-press.com


Pedro Duarte Faculdade de Ciências Universidade de Lisboa Lisbon Portugal

Silvius Klein Department of Mathematical Sciences Norwegian University of Science and Technology (NTNU) Trondheim Norway

Atlantis Studies in Dynamical Systems
ISBN 978-94-6239-123-9
ISBN 978-94-6239-124-6 (eBook)
DOI 10.2991/978-94-6239-124-6

Library of Congress Control Number: 2016933219

© Atlantis Press and the author(s) 2016

This book, or any parts thereof, may not be reproduced for commercial purposes in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system known or to be invented, without prior permission from the Publisher.

Printed on acid-free paper

In memory of João Santos Guerreiro and Ricardo Mañé, professors whose friendship and intelligence I miss
Pedro Duarte

To Florin Popovici and Șerban Strătilă who taught me to seek and to appreciate good mathematical exposition
Silvius Klein

Preface

The aim of this monograph is to present a general method of proving continuity of the Lyapunov exponents (LE) of linear cocycles. The method consists of an inductive procedure that establishes continuity of relevant quantities for finite, larger and larger number of iterates of the system. This leads to continuity of the limit quantities, the LE. The inductive procedure is based upon a deterministic result on the composition of a long chain of linear maps called the Avalanche Principle (AP). A geometric approach is used to derive a general version of this principle. The main assumption required by this method is the availability of appropriate large deviation type (LDT) estimates for quantities related to the iterates of the base and fiber dynamics associated with the linear cocycle. Crucial for our approach is the uniformity in the data of these estimates. We derive such LDT estimates for various models of random cocycles (over Bernoulli and Markov systems) and quasi-periodic cocycles (defined by one or multivariable torus translations). The random model, treated under an irreducibility assumption, uses an existing functional analytic approach which we adapt so that it provides the required uniformity of the estimates. The quasi-periodic model uses harmonic analysis and it involves the study of (pluri) subharmonic functions. This method has its origins in a paper of M. Goldstein and W. Schlag which proves continuity of the Lyapunov exponent for the one-parameter family of quasi-periodic Schrödinger cocycles, assuming a uniform lower bound on the exponent. This is where the first version of the Avalanche Principle appeared, along with the use and proof of the relevant LDT estimate. The present work expands upon their approach in both depth and breadth. Moreover, it reduces the general problem of proving continuity of the LE to one of a different nature—proving LDT estimates. 
This may be treated independently and by means specific to the underlying base dynamics of the cocycle. Our geometric approach to the AP also gives rise to a mechanism for studying the most expanding singular direction of the composition of a long chain of linear maps. This allows us to obtain a new proof of the Multiplicative Ergodic


Theorem of Oseledets. Moreover, assuming the availability of the same LDT estimates, this extension of the AP leads to continuity properties of the Oseledets filtration and decomposition. Most of the results presented in this research monograph are new. We assume the reader to have a certain degree of familiarity with basic dynamical systems and ergodic theory notions. The relevant concepts and definitions needed for the formulation of the main results are introduced in Chap. 1. While each subsequent chapter is to some extent self-contained and it may be read independently of the rest, all the arguments in this work are based upon the results in Chaps. 2 and 3. Besides the formulation and the proof of the AP, Chap. 2 contains Lipschitz estimates on certain Grassmann geometrical quantities that are crucial in Chap. 4, where we study the Oseledets filtration and decomposition and their continuity properties. In Chap. 3 we establish the abstract continuity theorem (ACT) of the LE and some other related technical results. In Chaps. 5 and 6, under appropriate assumptions, we derive the relevant LDT estimates for random and respectively quasi-periodic cocycles. The general results in Chaps. 3 and 4 are then applicable to these models, and they imply continuity properties of the LE and of the Oseledets filtration and decomposition for the corresponding spaces of cocycles. Our work concludes in Chap. 7 with a list of related open problems, some of which may be treated using the methods described in this monograph. The first author was supported by National Funding from FCT—Fundação para a Ciência e a Tecnologia, under the project: UID/MAT/04561/2013. The second author was supported by the Norwegian Research Council project no. 213638, “Discrete Models in Mathematical Analysis”. 
Both authors are grateful to the Faculty of Sciences of the University of Lisbon (FCUL) and to the Norwegian University of Science and Technology (NTNU) for the support received and for facilitating their collaboration on this monograph. We would like to thank José Pedro Gaivão and Wilhelm Schlag for reading through parts of the manuscript. And last but not least, many thanks to Teresa, Zé, Jaime, Daniel and Jaqueline for their understanding. Lisbon Trondheim January 2016

Pedro Duarte Silvius Klein

Contents

1 Introduction
  1.1 Prologue
  1.2 The Main Concepts
  1.3 The Continuity Problem
  1.4 Large Deviations Type Estimates
  1.5 Summary of Results
  1.6 Literature Review
  References

2 Estimates on Grassmann Manifolds
  2.1 Grassmann Geometry
    2.1.1 Projective Spaces
    2.1.2 Exterior Algebra
    2.1.3 Grassmann Manifolds
    2.1.4 Flag Manifolds
  2.2 Singular Value Geometry
    2.2.1 Singular Value Decomposition
    2.2.2 Gaps and Most Expanding Directions
    2.2.3 Angles and Expansion
  2.3 Lipschitz Estimates
    2.3.1 Projective Action
    2.3.2 Operations on Flag Manifolds
    2.3.3 Dependence on the Linear Map
  2.4 Avalanche Principle
    2.4.1 Contractive Shadowing
    2.4.2 Statement and Proof of the AP
    2.4.3 Consequences of the AP
  References

3 Abstract Continuity of Lyapunov Exponents
  3.1 Definitions, the Abstract Setup and Statement
    3.1.1 Cocycles and Observables
    3.1.2 Large Deviations Type Estimates
    3.1.3 Abstract Continuity Theorem of the Lyapunov Exponents
  3.2 Upper Semicontinuity of the Top Lyapunov Exponent
  3.3 Finite Scale Continuity
  3.4 The Inductive Step Procedure
  3.5 General Continuity Theorem
  3.6 Modulus of Continuity
  References

4 The Oseledets Filtration and Decomposition
  4.1 Introduction and Statements
  4.2 The Ergodic Theorems
    4.2.1 Review of Grassmann Geometry Concepts and Notations
    4.2.2 The Ergodic Theorems of Birkhoff and Kingman
    4.2.3 The Multiplicative Ergodic Theorem
  4.3 Abstract Continuity Theorem of the Oseledets Filtration
    4.3.1 Continuity of the Most Expanding Direction
    4.3.2 Spaces of Measurable Filtrations and Decompositions
    4.3.3 Continuity of the Oseledets Filtration
    4.3.4 Continuity of the Oseledets Decomposition
  References

5 Large Deviations for Random Cocycles
  5.1 Introduction and Statements
    5.1.1 Description of the Model
    5.1.2 The Spectral Method
    5.1.3 Literature Review
  5.2 An Abstract Setting
    5.2.1 The Assumptions
    5.2.2 An Abstract Theorem
  5.3 The Proof of LDT Estimates
    5.3.1 Base LDT Estimates
    5.3.2 Fiber LDT Estimates
  5.4 Deriving Continuity of the Lyapunov Exponents
    5.4.1 Proof of the Continuity
    5.4.2 Some Generalizations
    5.4.3 Method Limitations
  References

6 Large Deviations for Quasi-Periodic Cocycles
  6.1 Introduction and Statements
    6.1.1 Description of the Model
    6.1.2 Literature Review
  6.2 Estimates on Unbounded Pluri-Subharmonic Functions
    6.2.1 The Uniform Łojasiewicz Inequality
    6.2.2 Uniform L^2-Bounds on Analytic Functions
    6.2.3 Estimates on Unbounded Subharmonic Functions
    6.2.4 Base LDT Estimates for Pluri-Subharmonic Observables
  6.3 The Proof of the Fiber LDT Estimate
    6.3.1 Uniform Measurements on the Cocycle
    6.3.2 The Nearly Almost Invariance Property
    6.3.3 The Statement and Proof of the LDT
  6.4 Deriving Continuity of the Lyapunov Exponents
  6.5 Refinements in the One-Variable Case
  References

7 Further Related Problems
  7.1 Limitations and Counterexamples
  7.2 Some Connections to Mathematical Physics
  7.3 Continuity for Other Spaces of Cocycles
  7.4 Continuity with Respect to Other Parameters
  References

Index

Acronyms

ACT   Abstract continuity theorem
AP    Avalanche principle
IDS   Integrated density of states
LDT   Large deviation type
LE    Lyapunov exponent
MET   Multiplicative ergodic theorem
MPDS  Measure preserving dynamical system
MPT   Measure preserving transformation

Chapter 1

Introduction

1.1 Prologue

We are about to offer the reader a distilled version of the entire monograph. All the main actors will be briefly introduced, along with their principal roles.

We begin with the avalanche principle, first formulated for $\mathrm{SL}(2, \mathbb{R})$ matrices in [14]. The AP is a deterministic statement relating the norm growth of a long chain of matrices to the corresponding product of matrix norms. In order to state our general version of this principle, let us introduce some notations.

Given a matrix $g \in \mathrm{Mat}(m, \mathbb{R})$, we denote its singular values by
\[ s_1(g) \ge s_2(g) \ge \cdots \ge s_m(g) \ge 0 . \]
Moreover, let $\mathrm{gr}(g) = \frac{s_1(g)}{s_2(g)} \ge 1$ represent the gap ratio between its first two singular values. When $\mathrm{gr}(g) > 1$, the first singular value $s_1(g)$ is simple, so the corresponding most expanding (singular) direction $v(g) \in \mathbb{P}(\mathbb{R}^m)$ is well defined.

Given a sequence of matrices $g_0, g_1, \dots, g_{n-1} \in \mathrm{Mat}(m, \mathbb{R})$, we define its expansion rift as
\[ \rho(g_0, g_1, \dots, g_{n-1}) := \frac{\|g_{n-1} \cdots g_1 g_0\|}{\|g_{n-1}\| \cdots \|g_1\| \, \|g_0\|} . \]

The angle between two matrices $g, g' \in \mathrm{Mat}(m, \mathbb{R})$, denoted by $\alpha(g, g')$, is the sine of the angle between the $g$-image of $v(g)$ and the orthogonal complement of $v(g')$. Hence $\alpha(g, g') > \varepsilon$ means that the $g$-image of the most expanding direction of $g$ is reasonably well aligned with the most expanding direction of $g'$. Moreover, if $\mathrm{gr}(g)$ and $\mathrm{gr}(g')$ are large enough, then
\[ \alpha(g, g') \approx \rho(g, g') = \frac{\|g' g\|}{\|g'\| \, \|g\|} . \]

The avalanche principle can be formulated as follows. Let $0 < \kappa \ll \varepsilon^2$. Given a sequence of matrices $g_0, g_1, \dots, g_{n-1} \in \mathrm{Mat}(m, \mathbb{R})$, if for all indices $i$
\[ \text{(angles)} \quad \frac{\|g_i \, g_{i-1}\|}{\|g_i\| \, \|g_{i-1}\|} \ge \varepsilon , \qquad \text{(gaps)} \quad \mathrm{gr}(g_i) \ge \kappa^{-1} , \]

© Atlantis Press and the author(s) 2016 P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_1


then the expansion rift behaves almost multiplicatively:
\[ \rho(g_0, \dots, g_{n-1}) \asymp \rho(g_0, g_1) \cdots \rho(g_{n-2}, g_{n-1}) , \]
or equivalently
\[ \|g_{n-1} \cdots g_1 g_0\| \asymp \frac{\|g_1 g_0\| \cdots \|g_{n-1} g_{n-2}\|}{\|g_1\| \cdots \|g_{n-2}\|} . \tag{1.1} \]

Moreover, the most expanding direction of the matrix product $g_{n-1} \cdots g_1 g_0$ is close to the most expanding direction of the first matrix $g_0$, that is:
\[ d\big(v(g_{n-1} \cdots g_1 g_0), \, v(g_0)\big) \lesssim \frac{\kappa}{\varepsilon} . \tag{1.2} \]
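The almost multiplicative behavior in (1.1) can be observed numerically. The following is a minimal Python sketch for 2 × 2 matrices, not the authors' construction: the helper names (`singular_values`, `gap_ratio`, `rift`, `hyperbolic`) and the test chain $g_i = R(\theta_i)\,\mathrm{diag}(\lambda, 1/\lambda)$ with $\lambda = 10$ are assumptions chosen only for illustration. For this chain the gap ratios equal $\lambda^2 = 100$ and the pairwise rifts stay close to $\cos\theta_{i-1}$, so both hypotheses of the AP hold with room to spare.

```python
import math

def matmul(a, b):
    """Product a*b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def singular_values(g):
    """Singular values s1 >= s2 >= 0 of a 2x2 matrix, in closed form."""
    t = sum(x * x for row in g for x in row)       # s1^2 + s2^2
    d = g[0][0] * g[1][1] - g[0][1] * g[1][0]      # |det g| = s1 * s2
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    s1 = math.sqrt((t + disc) / 2.0)
    s2 = abs(d) / s1 if s1 > 0.0 else 0.0
    return s1, s2

def op_norm(g):
    """Operator norm = largest singular value."""
    return singular_values(g)[0]

def gap_ratio(g):
    """gr(g) = s1(g) / s2(g)."""
    s1, s2 = singular_values(g)
    return math.inf if s2 == 0.0 else s1 / s2

def rift(chain):
    """Expansion rift ||g_{n-1} ... g_0|| / (||g_{n-1}|| ... ||g_0||)."""
    prod = chain[0]
    for g in chain[1:]:
        prod = matmul(g, prod)                     # later matrices act on the left
    return op_norm(prod) / math.prod(op_norm(g) for g in chain)

def hyperbolic(theta, lam):
    """Hypothetical test matrix: rotation by theta times diag(lam, 1/lam)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * lam, -s / lam], [s * lam, c / lam]]

chain = [hyperbolic(0.3 + 0.05 * i, 10.0) for i in range(6)]
pair_rifts = [rift([chain[i - 1], chain[i]]) for i in range(1, len(chain))]

log_lhs = math.log(rift(chain))                    # log rho(g_0, ..., g_{n-1})
log_rhs = sum(math.log(r) for r in pair_rifts)     # sum of log rho(g_{i-1}, g_i)
print("smallest gap ratio :", min(gap_ratio(g) for g in chain))
print("smallest pair rift :", min(pair_rifts))
print("log-discrepancy    :", log_lhs - log_rhs)
```

The printed discrepancy is the logarithmic error in (1.1); in this example it should be small compared with the individual terms, which are of size $|\log \cos\theta_i|$.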

Let $(X, \mu, T)$ be an ergodic dynamical system, and let $A : X \to \mathrm{Mat}(m, \mathbb{R})$ be a linear cocycle over this base dynamics. We denote by $A^{(n)}(x)$ its $n$th iterate and refer to it as a block of length $n$. By Kingman’s ergodic theorem, the Lyapunov exponents of $A$, denoted by $L_k(A)$, can be characterized as the almost everywhere limits
\[ L_k(A) = \lim_{n \to \infty} \frac{1}{n} \log s_k(A^{(n)}(x)) . \]

The maximal Lyapunov exponent L 1 (A) corresponds to the largest singular value growth rate of the iterates of A. Many of the arguments in this monograph involve an inductive procedure, where we take a large block A(n) (x), corresponding to a time scale n, and divide it up into smaller blocks of approximately the same lengths. We transfer measurements from the smaller blocks to the larger ones and repeat this procedure (Fig. 1.1).
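To make the limit defining the Lyapunov exponents concrete, here is a minimal Python sketch that evaluates the finite scale quantities $\frac1n \log s_1(A^{(n)}(x))$ and $\frac1n \log s_2(A^{(n)}(x))$ for a 2 × 2 cocycle. It is an illustration under stated assumptions, not a method from the text: the function names are hypothetical, every $A(x)$ is assumed invertible (so that $s_1 s_2 = |\det|$ can be used to recover $s_2$), and the running product is rescaled at each step purely to avoid floating point overflow.

```python
import math

def matmul(a, b):
    """Product a*b of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def top_singular_value(g):
    """Largest singular value of a 2x2 matrix, in closed form."""
    t = sum(x * x for row in g for x in row)
    d = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return math.sqrt((t + disc) / 2.0)

def finite_scale_exponents(A, T, x, n):
    """Return ((1/n) log s_1, (1/n) log s_2) of A^(n)(x) = A(T^{n-1}x) ... A(x).

    Assumes every A(x) is invertible.
    """
    prod = [[1.0, 0.0], [0.0, 1.0]]
    log_scale = 0.0        # log of the factors divided out so far
    log_abs_det = 0.0      # log |det A^(n)(x)| = log s_1 + log s_2
    for _ in range(n):
        a = A(x)
        log_abs_det += math.log(abs(a[0][0] * a[1][1] - a[0][1] * a[1][0]))
        prod = matmul(a, prod)
        scale = max(abs(v) for row in prod for v in row)
        prod = [[v / scale for v in row] for row in prod]
        log_scale += math.log(scale)
        x = T(x)
    log_s1 = log_scale + math.log(top_singular_value(prod))
    log_s2 = log_abs_det - log_s1
    return log_s1 / n, log_s2 / n

def rotation(x):
    """Hypothetical base dynamics: circle rotation by the golden mean."""
    return (x + (math.sqrt(5) - 1) / 2) % 1.0

def constant_cocycle(x):
    """Constant cocycle diag(2, 1/2); its exponents are log 2 and -log 2."""
    return [[2.0, 0.0], [0.0, 0.5]]

l1, l2 = finite_scale_exponents(constant_cocycle, rotation, 0.0, 50)
print(l1, l2)
```

For a constant diagonal cocycle the finite scale values already equal the exponents; for a genuinely varying cocycle one would compare the output at several scales $n$ and several phases $x$.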

Fig. 1.1 Blocks of different time scales in an inductive procedure


This inductive procedure is based on the AP, where the individual matrices $g_i$ correspond to the smaller blocks and the product matrix $g_{n-1} \cdots g_1 g_0$ corresponds to the larger block. We illustrate this approach with a sketch of a new proof of the Oseledets Multiplicative Ergodic Theorem.

To start off, we show that if $L_1(A) > L_2(A)$, then the most expanding direction $v(A^{(n)}(x))$ of the iterates $A^{(n)}(x)$ of the cocycle is well defined and it converges for almost every phase $x$. To do that, it is enough to prove that except for a set of phases of arbitrarily small measure, and for $n \gg n_0 \gg 1$, we have
\[ d\big(v(A^{(n)}(x)), \, v(A^{(n_0)}(x))\big) \ll 1 . \tag{1.3} \]

By Kingman’s ergodic theorem, for a.e. $x \in X$ we have
\[ \lim_{n, m \to +\infty} \frac{1}{n+m} \log \frac{\|A^{(n+m)}(x)\|}{\|A^{(n)}(x)\| \, \|A^{(m)}(T^n x)\|} = L_1(A) - L_1(A) = 0 , \]
\[ \lim_{n \to +\infty} \frac{1}{n} \log \mathrm{gr}(A^{(n)}(x)) = L_1(A) - L_2(A) > 0 . \]

This means that asymptotically, the gap dominates the angle almost surely. Hence for any large enough finite time scale, this will happen outside a set $B$ of phases with arbitrarily small measure. Using Birkhoff’s ergodic theorem, we have that for long enough orbits, the iterates $T^m x$ of a base point $x$ visit the set $X \setminus B$ frequently, hence these visits can be chosen to be well distributed throughout the orbit. We refer to such times $m$ as avalanche times of the base point $x$.

Therefore, if we consider a long block of length $n \gg 1$, we may divide it up into smaller blocks, bounded between consecutive avalanche times, with lengths of order $n_0' \asymp n \, \mu(B)$. Two consecutive blocks of this kind have the form $A^{(m'' - m)}(T^m x)$ and $A^{(m - m')}(T^{m'} x)$, where $m', m, m''$ are consecutive avalanche times (Fig. 1.2). By construction, the gap and angle assumptions of the AP hold uniformly for all such consecutive blocks. The AP is then applicable, and (1.2) implies
\[ d\big(v(A^{(n)}(x)), \, v(A^{(n_0')}(x))\big) \ll 1 . \]

Fig. 1.2 Consecutive blocks between avalanche times


If $n_0' \asymp n_0$, then (1.3) is proven. Otherwise, $n_0' \gg n_0$, and we repeat the procedure above by further dividing up the long block $A^{(n_0')}(x)$ of length $n_0'$, and so on. The errors accumulated at each step form a convergent series whose sum is arbitrarily small, provided we choose the initial scale $n_0$ to be large enough. We may now define the $\mu$-almost everywhere limit
\[ v^{(\infty)}(A)(x) = \lim_{n \to \infty} v(A^{(n)}(x)) , \]
and refer to it as the most expanding direction of $A$.

For simplicity, let us assume that the Lyapunov exponents of $A$ have the following gap pattern:
\[ L_1(A) = L_2(A) > L_3(A) > L_4(A) = \cdots = L_m(A) . \]
By taking exterior powers we can extend the notion of most expanding direction of $A$ and introduce the most expanding flag of $A$ as the measurable filtration $V_2(x) \subset V_3(x) \subset \mathbb{R}^m$, where for $i = 2, 3$, the subspace $V_i(x)$ has dimension $i$ and it contains the $i$ most expanding directions of $A$. The orthogonal complements $V_3(x)^{\perp} \subset V_2(x)^{\perp} \subset \mathbb{R}^m$ of these subspaces contain the least expanding directions of the cocycle $A$. They determine another filtration which is $(T, A)$-invariant and which turns out to be exactly the Oseledets filtration.

More could be said about both Lyapunov exponents and the Oseledets filtration if instead of the almost sure convergence of $\frac{1}{n} \log \|A^{(n)}(x)\|$ to $L_1(A)$ provided by Kingman’s ergodic theorem, a stronger, quantitative type of convergence in measure were available. Enter the large deviation type estimates, i.e. estimates of the form
\[ \mu \Big\{ x \in X : \Big| \frac{1}{n} \log \|A^{(n)}(x)\| - L_1(A) \Big| > \varepsilon \Big\} < \iota(n, \varepsilon) \tag{1.4} \]

where $\iota(n, \varepsilon) \to 0$ fast as $n \to \infty$. Using conclusion (1.1) of the AP in the inductive procedure described before, and after taking logarithms, we get that the finite scale quantity $\frac{1}{n} \log \|A^{(n)}(x)\|$ is close to certain Birkhoff averages of similar quantities at scales of order $n_0 \ll n$, off of a very small set of phases $B$. Moreover, (1.4) gives us a precise quantitative estimate on the exceptional set $B$. This in turn leads to a quantitative description of the convergence of the finite scale quantities (Lyapunov exponents and Oseledets filtrations) to their limits, which implies a modulus (speed) of convergence. If the LDT estimates (1.4) hold uniformly in a neighborhood of the cocycle $A$, then the speed of convergence to the LE and the Oseledets filtration are also uniform in a neighborhood of $A$. Since for a fixed initial finite scale, these quantities behave continuously, we can transfer the continuity property to the limits.
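The left-hand side of an estimate of the form (1.4) can be probed empirically. The sketch below is a rough numerical experiment under several assumptions, none of which comes from the text: the base dynamics is a circle rotation by the golden mean, the cocycle `A` is a hypothetical Schrödinger-type $\mathrm{SL}(2, \mathbb{R})$ family, the measure $\mu$ is replaced by a uniform grid of phases, and the unknown exponent $L_1(A)$ is replaced by the grid average of the finite scale quantity at the same scale $n$. It therefore only indicates how one might look at the deviation set, not how an LDT estimate is proved.

```python
import math

OMEGA = (math.sqrt(5) - 1) / 2      # hypothetical frequency (golden mean)

def T(x):
    """Circle rotation x -> x + omega (mod 1)."""
    return (x + OMEGA) % 1.0

def A(x, coupling=2.0, energy=0.0):
    """Hypothetical Schrodinger-type SL(2, R) cocycle, for illustration only."""
    return [[energy - 2.0 * coupling * math.cos(2.0 * math.pi * x), -1.0],
            [1.0, 0.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(g):
    """Largest singular value of a 2x2 matrix, in closed form."""
    t = sum(v * v for row in g for v in row)
    d = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return math.sqrt((t + disc) / 2.0)

def log_norm_block(x, n):
    """log ||A^(n)(x)||, rescaling at each step to avoid overflow."""
    prod = [[1.0, 0.0], [0.0, 1.0]]
    log_scale = 0.0
    for _ in range(n):
        prod = matmul(A(x), prod)
        scale = max(abs(v) for row in prod for v in row)
        prod = [[v / scale for v in row] for row in prod]
        log_scale += math.log(scale)
        x = T(x)
    return log_scale + math.log(op_norm(prod))

def deviation_measure(n, eps, reference, phases):
    """Fraction of sampled phases with |(1/n) log||A^(n)(x)|| - reference| > eps."""
    values = [log_norm_block(x, n) / n for x in phases]
    return sum(1 for u in values if abs(u - reference) > eps) / len(phases)

phases = [k / 200 for k in range(200)]   # uniform grid standing in for mu
n = 50
# Grid average at scale n, used here as a stand-in for L_1(A).
reference = sum(log_norm_block(x, n) for x in phases) / (n * len(phases))
fractions = {eps: deviation_measure(n, eps, reference, phases)
             for eps in (0.05, 0.1, 0.2)}
print(reference, fractions)
```

Repeating the experiment for increasing $n$ gives a crude picture of how quickly the exceptional set shrinks for this particular example.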


Assuming good behavior on average of the relevant quantities at an initial scale, the role of the LDT estimates in this scheme is to provide a large enough set of phases for which a successful, long enough run of the AP can be guaranteed. This then leads to a good behavior at the next scale for many phases, hence on average.

1.2 The Main Concepts

Given a probability space $(X, \mathcal{F}, \mu)$, a measure preserving transformation is an $\mathcal{F}$-measurable map $T : X \to X$ such that
\[ \mu(T^{-1}(A)) = \mu(A) \quad \text{for all } A \in \mathcal{F} . \]

A measure preserving dynamical system (MPDS) is any triple $(X, \mu, T)$ where $(X, \mu)$ is a probability space (the $\sigma$-field $\mathcal{F}$ is implicit to $X$) and $T : X \to X$ is a measure preserving transformation. We refer to elements of $X$ as phases. The sequence of iterates $\{T^n x\}_{n \ge 0}$ is called the orbit of the phase $x$.

Definition 1.1 We say that $(X, \mu, T)$ is ergodic if there is no $T$-invariant measurable set $A = T^{-1}(A)$ such that $0 < \mu(A) < 1$.

Definition 1.2 We say that $(X, \mu, T)$ is mixing when for all $A, B \in \mathcal{F}$,
\[ \lim_{n \to +\infty} \mu(A \cap T^{-n}(B)) = \mu(A) \, \mu(B) . \]

Mixing MPDS are always ergodic, but the converse is not true in general.

The $d$-torus $\mathbb{T}^d = (\mathbb{R}/\mathbb{Z})^d$ with its normalized Haar measure $\mu$ on the $\sigma$-field $\mathcal{F}$ of Borel sets determines a probability space $(\mathbb{T}^d, \mathcal{F}, \mu)$. We mention a few classes of MPDS on the torus.

Example 1.1 (toral translations) Given $\omega \in \mathbb{R}^d$, the translation map $T : \mathbb{T}^d \to \mathbb{T}^d$, $T x := x + \omega \pmod 1$, preserves the Haar measure $\mu$. This MPDS is ergodic iff the components of $\omega$ are rationally independent. Toral translations are never mixing.

Example 1.2 (toral endomorphisms) Given a matrix $M \in \mathrm{GL}(d, \mathbb{Z})$, the endomorphism $T : \mathbb{T}^d \to \mathbb{T}^d$, $T x := M x \pmod 1$, preserves the Haar measure $\mu$. The endomorphism $T$ is ergodic iff the spectrum of $M$ does not contain any root of unity. Ergodic toral automorphisms are always mixing.

The composition of a toral endomorphism with a translation is called an affine endomorphism. This provides another class of MPDS on the torus. See [32] for the characterization of the ergodic properties of affine endomorphisms.

Let $\Sigma$ be a compact metric space and consider the space of sequences $X = \Sigma^{\mathbb{Z}}$. The (two-sided) shift is the homeomorphism $T : X \to X$ defined by $T x := \{x_{n+1}\}_{n \in \mathbb{Z}}$ for $x = \{x_n\}_{n \in \mathbb{Z}}$.


Example 1.3 (Bernoulli shifts) Given a probability measure $\mu \in \mathrm{Prob}(\Sigma)$, the shift $T : X \to X$ preserves the product probability measure $\mu^{\mathbb{Z}}$. The MPDS $(X, \mu^{\mathbb{Z}}, T)$ is called a Bernoulli shift. Bernoulli shifts are ergodic and mixing.

Let $\Sigma$ be a compact metric space and let $\mathcal{F}$ be its Borel $\sigma$-field. A Markov or stochastic kernel on $\Sigma$ is a function $K : \Sigma \times \mathcal{F} \to [0, 1]$ such that
(1) for every $x \in \Sigma$, $K(x, \cdot) \in \mathrm{Prob}(\Sigma)$ is a probability measure in $\Sigma$,
(2) for every $A \in \mathcal{F}$, the function $x \mapsto K(x, A)$ is $\mathcal{F}$-measurable.

The iterated Markov kernels are defined recursively, setting
(a) $K^1 = K$,
(b) $K^{n+1}(x, A) = \int_{\Sigma} K^n(y, A) \, K(x, dy)$, for all $n \ge 1$.

Each power $K^n$ is itself a Markov kernel on $\Sigma$. A probability measure $\mu$ on $\Sigma$ is called $K$-stationary if for all $A \in \mathcal{F}$,
\[ \mu(A) = \int K(x, A) \, \mu(dx) . \]

Given a stochastic kernel $K$ on $\Sigma$ and a $K$-stationary probability measure $\mu$, there exists a unique probability $\mathbb{P}$ on the sequence space $X = \Sigma^{\mathbb{Z}}$ such that the stochastic process $\{e_n : X \to \Sigma\}_{n \ge 0}$, $e_n(x) := x_n$, has initial distribution $\mu$ and transition kernel $K$, i.e., for all $x \in \Sigma$ and $A \in \mathcal{F}$,
1. $\mathbb{P}[\, e_0 \in A \,] = \mu(A)$,
2. $\mathbb{P}[\, e_n \in A \mid e_{n-1} = x \,] = K(x, A)$.

Example 1.4 (Markov shifts) The two-sided shift $T : X \to X$ preserves the probability measure $\mathbb{P}$. The MPDS $(X, \mathbb{P}, T)$ is called a Markov shift. When $\Sigma$ is finite, the Markov shift is ergodic iff for all $x, y \in \Sigma$ there exists $m = m(x, y) \in \mathbb{N}$ such that $K^m(x, y) > 0$, and it is mixing iff there exists $m \in \mathbb{N}$ such that for all $x, y \in \Sigma$, $K^m(x, y) > 0$.

Definition 1.3 We say that $(K, \mu)$ is strongly mixing if there are constants $C < \infty$ and $0 < \rho < 1$ such that for every $f \in L^{\infty}(\Sigma)$, all $x \in \Sigma$ and $n \in \mathbb{N}$,
\[ \Big| \int_{\Sigma} f(y) \, K^n(x, dy) - \int_{\Sigma} f(y) \, \mu(dy) \Big| \le C \, \rho^n \, \|f\|_{\infty} . \]
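When $\Sigma$ is finite, the notions above reduce to matrix computations: a kernel is a row-stochastic matrix, the recursion for $K^{n+1}$ is a matrix product, and stationarity reads $\mu K = \mu$. The following minimal Python sketch illustrates this under that finiteness assumption. The function names and the two example kernels are hypothetical, and the bound $(d-1)^2 + 1$ on the power to test for mixing is Wielandt's classical bound for primitive matrices, brought in here as an outside fact rather than something stated in the text.

```python
def compose(K, L):
    """Kernel (K L)(x, a) = sum_y K(x, y) L(y, a) on a finite state space."""
    d = len(K)
    return [[sum(K[x][y] * L[y][a] for y in range(d)) for a in range(d)]
            for x in range(d)]

def kernel_power(K, n):
    """Iterated kernel: K^1 = K and K^{n+1}(x, a) = sum_y K(x, y) K^n(y, a)."""
    P = K
    for _ in range(n - 1):
        P = compose(K, P)
    return P

def is_stationary(mu, K, tol=1e-12):
    """Check mu(a) = sum_x K(x, a) mu(x) for every state a."""
    d = len(K)
    return all(abs(sum(mu[x] * K[x][a] for x in range(d)) - mu[a]) <= tol
               for a in range(d))

def support_power(K, m):
    """Boolean matrix whose (x, y) entry is True iff K^m(x, y) > 0."""
    d = len(K)
    S = [[K[x][y] > 0 for y in range(d)] for x in range(d)]
    P = S
    for _ in range(m - 1):
        P = [[any(S[x][z] and P[z][y] for z in range(d)) for y in range(d)]
             for x in range(d)]
    return P

def is_ergodic(K):
    """For all x, y some K^m(x, y) > 0; powers m <= d are enough to test."""
    d = len(K)
    powers = [support_power(K, m) for m in range(1, d + 1)]
    return all(any(P[x][y] for P in powers)
               for x in range(d) for y in range(d))

def is_mixing(K):
    """Some single K^m is entrywise positive; m <= (d-1)^2 + 1 is enough to test."""
    d = len(K)
    return any(all(all(row) for row in support_power(K, m))
               for m in range(1, (d - 1) ** 2 + 2))

K = [[0.9, 0.1], [0.5, 0.5]]          # hypothetical two-state kernel
mu = [5.0 / 6.0, 1.0 / 6.0]           # its stationary measure
flip = [[0.0, 1.0], [1.0, 0.0]]       # deterministic alternation between states

K20 = kernel_power(K, 20)
distance_to_mu = max(abs(K20[x][a] - mu[a]) for x in range(2) for a in range(2))
print(is_stationary(mu, K), is_ergodic(flip), is_mixing(flip), distance_to_mu)
```

The kernel `flip` is ergodic but not mixing, while for `K` every row of $K^n$ converges geometrically to $\mu$, which is the finite-state form of the strong mixing inequality in Definition 1.3.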

If $(K, \mu)$ is strongly mixing then the Markov shift $(X, \mathbb{P}, T)$ is mixing. When $\Sigma$ is finite, this strong mixing property of $(K, \mu)$ is actually equivalent to the mixing property of $(X, \mathbb{P}, T)$. These relations are discussed in Chap. 5.

Given a probability space $(X, \mu)$, we denote by $L^1(X, \mu)$ the space of measurable functions $\varphi : X \to \mathbb{C}$ with finite first moment
\[ \mathbb{E}_{\mu}(|\varphi|) := \int_X |\varphi| \, d\mu < +\infty . \]
These functions will be called observables. A simplified version of Birkhoff’s ergodic theorem reads as follows:


Theorem 1.1 Given an ergodic MPDS $(X, \mu, T)$, for any observable $\varphi$ and for $\mu$-almost every point $x \in X$,
\[ \lim_{n \to +\infty} \frac{1}{n} \sum_{j=0}^{n-1} \varphi(T^j x) = \int_X \varphi \, d\mu . \]
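As a quick numerical illustration of Theorem 1.1, the following minimal Python sketch compares a Birkhoff average with the corresponding space average. The choices made here are assumptions for the example only: the base dynamics is the circle rotation by the golden mean (an ergodic toral translation with $d = 1$), the observable is $\varphi(x) = \cos(2\pi x)$, whose integral over the circle is $0$, and the starting phase is arbitrary.

```python
import math

OMEGA = (math.sqrt(5) - 1) / 2      # irrational frequency (golden mean)

def T(x):
    """Circle rotation x -> x + omega (mod 1)."""
    return (x + OMEGA) % 1.0

def phi(x):
    """Observable with space average 0 with respect to Lebesgue measure."""
    return math.cos(2.0 * math.pi * x)

def birkhoff_average(observable, transformation, x, n):
    """(1/n) * sum_{j=0}^{n-1} observable(T^j x)."""
    total = 0.0
    for _ in range(n):
        total += observable(x)
        x = transformation(x)
    return total / n

average = birkhoff_average(phi, T, 0.1, 10_000)
print(average)   # should be close to the space average 0
```

For this particular observable the sum is a geometric series, so the time average differs from $0$ by at most $1 / (n \, |\sin(\pi\omega)|)$; general observables converge without such an explicit rate.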

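Kingman’s Ergodic Theorem below generalizes Birkhoff’s ergodic theorem.

Theorem 1.2 Let $(X, \mu, T)$ be an ergodic MPDS. Given a sequence of measurable functions $f_n : X \to [-\infty, \infty]$ such that $f_1^{+} \in L^1(X, \mu)$ and
\[ f_{n+m} \le f_n + f_m \circ T^n \quad \text{for all } n, m \ge 0 , \]
the sequence $\{\int f_n \, d\mu\}_{n \ge 0}$ is sub-additive, i.e.,
\[ \int f_{n+m} \, d\mu \le \int f_n \, d\mu + \int f_m \, d\mu \quad \text{for all } n, m \ge 0 , \]
and for $\mu$-a.e. $x \in X$, we have
\[ \lim_{n \to \infty} \frac{1}{n} f_n(x) = \lim_{n \to \infty} \frac{1}{n} \int f_n \, d\mu = \inf_{n \ge 1} \frac{1}{n} \int f_n \, d\mu \in [-\infty, +\infty) . \]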

Let $\mathrm{Gr}(\mathbb{R}^m)$ denote the set of all linear subspaces of $\mathbb{R}^m$. This space has a natural topology (see Chap. 2). It is a compact Riemannian manifold, called the Grassmannian of the Euclidean space $\mathbb{R}^m$. For each $0 \le k \le m$ we define
\[ \mathrm{Gr}_k(\mathbb{R}^m) := \{ E \in \mathrm{Gr}(\mathbb{R}^m) : \dim E = k \} . \]
These subsets are precisely the connected components of the Grassmannian $\mathrm{Gr}(\mathbb{R}^m)$.

A subset $\mathcal{B} \subseteq X \times \mathbb{R}^m$ is called a measurable bundle over $X$ when there exists a measurable function $E : X \to \mathrm{Gr}(\mathbb{R}^m)$ such that
\[ \mathcal{B} = \{ (x, v) \in X \times \mathbb{R}^m : v \in E(x) \} . \]
When $E : X \to \mathrm{Gr}(\mathbb{R}^m)$ takes values in $\mathrm{Gr}_k(\mathbb{R}^m)$ we say that $\mathcal{B}$ has constant dimension $k$. The set $X$ is called the base of $\mathcal{B}$, and the linear subspace $\mathcal{B}(x) := E(x)$ is called the fiber at the base point $x \in X$. To each measurable bundle $\mathcal{B}$ we associate the projection onto the base $\pi_{\mathcal{B}} : \mathcal{B} \to X$, $\pi_{\mathcal{B}}(x, v) = x$.

The set $\mathcal{B} = X \times \mathbb{R}^m$ is a measurable bundle over $X$, called a trivial bundle. Given two measurable bundles $\mathcal{B}$ and $\mathcal{B}'$ over $X$, we say that $\mathcal{B}$ is a sub-bundle of $\mathcal{B}'$ when $\mathcal{B} \subset \mathcal{B}'$. By definition, every measurable bundle is a sub-bundle of some trivial bundle.

Consider an MPDS $(X, \mu, T)$ and a measurable bundle $\mathcal{B}$ over $X$.


Definition 1.4 We call linear cocycle on $\mathcal{B}$ over $(X, \mu, T)$ any measurable map $F_A : \mathcal{B} \to \mathcal{B}$ defined by a measurable family of linear maps $A(x) : \mathcal{B}(x) \to \mathcal{B}(T x)$ through the expression $F_A(x, v) := (T x, A(x) v)$. We call the linear maps $A(x)$ the fiber action and the transformation $T$ the base dynamics of the linear cocycle $F_A$. We identify $F_A$ with the pair $(T, A)$, or simply with $A$ when $T$ is fixed.

When $\mathcal{B} = X \times \mathbb{R}^m$ is a trivial bundle, a linear cocycle on $\mathcal{B}$ over $(X, \mu, T)$ is determined by a measurable function $A : X \to \mathrm{Mat}(m, \mathbb{R})$. Except for the more technical purposes in Chap. 4, we will always consider linear cocycles on trivial bundles. A cocycle $A$ is said to be $\mu$-integrable if
\[ \int_X \log^{+} \|A(x)\| \, \mu(dx) < +\infty . \]

Given a cocycle $A$ on $X \times \mathbb{R}^m$ over an MPDS $(X, \mu, T)$, a sub-bundle $\mathcal{B} \subset X \times \mathbb{R}^m$ is called $A$-invariant when for $\mu$-almost every $x \in X$, $A(x) \, \mathcal{B}(x) \subset \mathcal{B}(T x)$. When $\mathcal{B}$ is $A$-invariant, the restriction of $F_A : X \times \mathbb{R}^m \to X \times \mathbb{R}^m$ to $\mathcal{B}$ is a linear cocycle on $\mathcal{B}$ over $(X, \mu, T)$.

The forward iterates $F_A^n$ of a linear cocycle $F_A : \mathcal{B} \to \mathcal{B}$ are given by $F_A^n(x, v) = (T^n x, A^{(n)}(x) v)$, where
\[ A^{(n)}(x) := A(T^{n-1} x) \cdots A(T x) \, A(x) \quad (n \in \mathbb{N}) . \]
When the base map $T : X \to X$ is invertible, we define backward iterates, even if the linear maps $A(x)$ are non-invertible, setting
\[ A^{(-n)}(x) := A^{(n)}(T^{-n} x)^{+} \quad (n \in \mathbb{N}) , \]
where $M^{+}$ denotes the pseudo-inverse of a linear map $M$ (see Definition 4.10).

Definition 1.5 A quasi-periodic cocycle is any cocycle $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$ over an ergodic torus translation $T : \mathbb{T}^d \to \mathbb{T}^d$. If $T x := x + \omega \pmod 1$ then $\omega \in \mathbb{R}^d$ is called the frequency vector of the cocycle.

Definition 1.6 Let $\Sigma$ be a compact metric space and let $\mu \in \mathrm{Prob}(\Sigma)$ be a probability measure on $\Sigma$. A random Bernoulli cocycle is any cocycle $A : X \to \mathrm{Mat}(m, \mathbb{R})$ over the Bernoulli shift $(X, \mu^{\mathbb{Z}}, T)$, where $X = \Sigma^{\mathbb{Z}}$ is the space of sequences in $\Sigma$, and the function $A$ depends only on the first coordinate $x_0$.

Definition 1.7 Let $\Sigma$ be a compact metric space, $K$ a Markov kernel on $\Sigma$, $\mu$ a $K$-stationary probability measure on $\Sigma$ and let $\mathbb{P}$ be the associated probability measure on the space of sequences $X = \Sigma^{\mathbb{Z}}$. A random Markov cocycle is any cocycle $A : X \to \mathrm{Mat}(m, \mathbb{R})$ over the Markov shift $(X, \mathbb{P}, T)$, where the function $A$ depends only on the coordinates $x_0$ and $x_1$.
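The forward iterates defined above can be checked directly on a small example. The following minimal Python sketch builds a quasi-periodic cocycle over a circle rotation and verifies three facts: that iterating the skew-product $F_A$ agrees with applying the block $A^{(n)}(x)$, that blocks satisfy $A^{(n+m)}(x) = A^{(m)}(T^n x) \, A^{(n)}(x)$, and that norms are consequently sub-multiplicative along the orbit (which is the sub-additivity hypothesis of Kingman's theorem for $f_n = \log \|A^{(n)}\|$). The frequency, the matrix-valued function `A`, and all function names are hypothetical choices made only for this illustration.

```python
import math

OMEGA = math.sqrt(2) - 1            # hypothetical irrational frequency

def T(x):
    """Base dynamics: circle rotation x -> x + omega (mod 1)."""
    return (x + OMEGA) % 1.0

def A(x):
    """Hypothetical quasi-periodic cocycle with values in Mat(2, R)."""
    return [[2.0 + math.cos(2.0 * math.pi * x), 1.0],
            [math.sin(2.0 * math.pi * x), 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply(g, v):
    return [g[0][0] * v[0] + g[0][1] * v[1], g[1][0] * v[0] + g[1][1] * v[1]]

def op_norm(g):
    """Largest singular value of a 2x2 matrix, in closed form."""
    t = sum(u * u for row in g for u in row)
    d = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return math.sqrt((t + disc) / 2.0)

def iterate_base(x, n):
    """T^n x."""
    for _ in range(n):
        x = T(x)
    return x

def block(x, n):
    """A^(n)(x) = A(T^{n-1} x) ... A(T x) A(x)."""
    prod = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        prod = matmul(A(x), prod)   # the newest factor acts on the left
        x = T(x)
    return prod

def skew_product(x, v):
    """F_A(x, v) = (T x, A(x) v)."""
    return T(x), apply(A(x), v)

x0, v0, n, m = 0.2, [1.0, -0.5], 7, 5

# n steps of the skew-product versus one application of the block A^(n)(x0).
x, v = x0, v0
for _ in range(n):
    x, v = skew_product(x, v)
direct = apply(block(x0, n), v0)

# Block identity A^(n+m)(x0) = A^(m)(T^n x0) A^(n)(x0).
long_block = block(x0, n + m)
first, second = block(x0, n), block(iterate_base(x0, n), m)
glued = matmul(second, first)
print(v, direct)
print(op_norm(long_block), op_norm(first) * op_norm(second))
```

The second printed line shows the norm of the long block next to the product of the norms of its two sub-blocks; the first never exceeds the second.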


Consider a measurable bundle B of constant dimension m. A measurable filtration of B is any finite sequence of sub-bundles B = B1 ⊃ B2 ⊃ · · · ⊃ Bk , with constant decreasing dimensions. Given a linear cocycle FA : B → B, we say that a measurable filtration of B is A-invariant if all the sub-bundles of the filtration are A-invariant. The first Multiplicative Ergodic Theorem (MET) of Oseledets [23] describes the fiber asymptotic behavior of a linear cocycle over a possibly non-invertible ergodic MPDS. Theorem 1.3 (MET I) Let (X, μ, T ) be an ergodic MPDS and FA : B → B be a μ-integrable linear cocycle. There exist numbers λ1 > λ2 > · · · > λk ≥ −∞ and a measurable filtration B = B1 ⊃ B2 ⊃ · · · ⊃ Bk ⊃ Bk+1 = X × {0} invariant under the cocycle such that for μ-almost every x ∈ X and every v ∈ B j (x)\B j+1 (x), 1 logA(n) (x) v = λ j . n→+∞ n lim

The second MET of Oseledets [23] gives a more precise description of the fiber asymptotic behavior of a linear cocycle over an invertible ergodic MPDS.

Theorem 1.4 (MET II) Let $(X, \mu, T)$ be an invertible and ergodic MPDS, and let $F_A : B \to B$ be a $\mu$-integrable linear cocycle. Then there exist numbers $\lambda_1 > \lambda_2 > \cdots > \lambda_{k+1} \ge -\infty$ and a family of measurable sub-bundles $B_j \subset B$, $1 \le j \le k+1$, such that for $\mu$-almost every $x \in X$,
(a) $B(x) = \oplus_{j=1}^{k+1} B_j(x)$,
(b) each sub-bundle $B_j$ is $A$-invariant,
(c) for every $v \in B_j(x) \setminus \{0\}$, $\lim_{n \to \pm\infty} \frac{1}{n} \log \|A^{(n)}(x)\,v\| = \lambda_j$,
(d) $\lim_{n \to \pm\infty} \frac{1}{n} \log \bigl| \sin \angle_{\min}\bigl( \oplus_{j \le l} B_j(T^n x),\, \oplus_{j > l} B_j(T^n x) \bigr) \bigr| = 0$, for any $l = 2, \ldots, k$.

Let $(X, \mu, T)$ be an ergodic MPDS. The numbers $\lambda_j$ in both Oseledets theorems are called the distinct Lyapunov exponents of the linear cocycle. Because norms behave sub-multiplicatively with matrix products, the sequence of functions $f_n(x) := \log \|A^{(n)}(x)\|$ is sub-additive, in the sense that

\[ f_{n+m} \le f_n + f_m \circ T^n \quad \text{for all } n, m \in \mathbb{N}. \]

Hence, by Kingman's ergodic theorem the following limit exists for $\mu$-a.e. $x \in X$,

\[ L_1(A) := \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x)\| . \]

This limit is exactly the first Lyapunov exponent $\lambda_1$ in both versions of the MET. Using exterior powers of the cocycle $A$, the remaining Lyapunov exponents can be expressed in terms of similar limits. Again by Kingman's ergodic theorem the following limits exist for all $1 \le j \le m$ and $\mu$-a.e. $x \in X$,

\[ \Lambda_j(A) = \lim_{n \to +\infty} \frac{1}{n} \log \|\wedge_j A^{(n)}(x)\| = \lim_{n \to +\infty} \frac{1}{n} \sum_{k=1}^{j} \log s_k(A^{(n)}(x)) , \]

where $\wedge_j g$ denotes the $j$th exterior power of $g \in \mathrm{Mat}(m, \mathbb{R})$. The Lyapunov exponents of the cocycle $A$ can then be characterized by

\[ L_j(A) = \Lambda_j(A) - \Lambda_{j-1}(A) = \lim_{n \to +\infty} \frac{1}{n} \log s_j(A^{(n)}(x)) , \]

with the convention that $\Lambda_0(A) = 0$. The Lyapunov spectrum of a cocycle $A$ is the sequence of its Lyapunov exponents: $L_1(A) \ge L_2(A) \ge \cdots \ge L_m(A) \ge -\infty$. These numbers are called the repeated Lyapunov exponents of $A$.

We define $L_1^{(n)}(A) := \int_X \frac{1}{n} \log \|A^{(n)}\| \, d\mu$ and call it the (maximal) finite scale Lyapunov exponent of $A$. By Kingman's ergodic theorem, $L_1(A) = \lim_{n \to \infty} L_1^{(n)}(A)$.

All previous concepts correspond to basic notions in the theory of linear cocycles. The following concepts are specifically targeted to the statement of the main results of this monograph.

Definition 1.8 A space of measurable cocycles $\mathcal{C}$ is a collection $\{(\mathcal{C}_m, \mathrm{dist})\}_{m \in \mathbb{N}}$ of metric spaces such that:
(1) Each $\mathcal{C}_m$ is a space of $\mathcal{F}$-measurable functions $A : X \to \mathrm{Mat}_m(\mathbb{R})$ with uniformly bounded norm, i.e., $\|A\| \in L^\infty(X, \mu)$.
(2) For all $A \in \mathcal{C}_m$ and $1 \le k \le m$, we have $\wedge_k A \in \mathcal{C}_{\binom{m}{k}}$, and the map $\mathcal{C}_m \to \mathcal{C}_{\binom{m}{k}}$, $A \mapsto \wedge_k A$, is locally Lipschitz.
(3) For all $A, B \in \mathcal{C}_m$, $\mathrm{dist}(B, A) \ge \|B - A\|_{L^\infty}$.
By $A \in \mathcal{C}$ we mean that $A \in \mathcal{C}_m$ for some $m \in \mathbb{N}$.

Definition 1.9 Let $1 < p \le \infty$. A cocycle $A \in \mathcal{C}$ is called $L^p$-bounded if there is $C < \infty$, which we call its $L^p$-bound, such that for all $n \ge 1$ we have

\[ \Bigl\| \frac{1}{n} \log \|A^{(n)}\| \Bigr\|_{L^p} < C . \]

A cocycle $A \in \mathcal{C}_m$ is called uniformly $L^p$-bounded if there are $\delta = \delta(A) > 0$ and $C = C(A) < \infty$ such that for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ and all $n \ge 1$ we have

\[ \Bigl\| \frac{1}{n} \log \|B^{(n)}\| \Bigr\|_{L^p} < C . \]

Given a cocycle $A \in \mathcal{C}$ and an integer $N \in \mathbb{N}$, denote by $\mathcal{F}_N(A)$ the lattice (w.r.t. union and intersection) generated by the sets $\{x \in X : \|A^{(n)}(x)\| \le c\}$ or $\{x \in X : \|A^{(n)}(x)\| \ge c\}$, where $c \ge 0$ and $0 \le n \le N$.
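The relation $L_j(A) = \Lambda_j(A) - \Lambda_{j-1}(A)$ and the finite scale quantities introduced above can be explored numerically in dimension $m = 2$, where $\|\wedge_2 g\| = |\det g|$. The sketch below is an illustration under stated assumptions: the matrices are invertible and $2 \times 2$, the Frobenius norm replaces the operator norm (the two are comparable, so the limits as $n \to \infty$ agree), the estimate is computed along a single orbit rather than integrated over $X$, and all function names are hypothetical.

```python
import math
import random


def mat_mul(a, b):
    """Product a b of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def frobenius(a):
    return math.sqrt(sum(x * x for row in a for x in row))


def det(a):
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]


def finite_scale_exponents(matrices):
    """Return estimates (L1, L2) at scale n = len(matrices) along one orbit.

    matrices[j] plays the role of A(T^j x), so that
    A^(n)(x) = matrices[n-1] ... matrices[0]. Assumes every matrix is invertible.
    """
    n = len(matrices)
    log_norm = 0.0  # accumulates log ||A^(n)(x)|| (Frobenius norm)
    log_det = 0.0   # accumulates log |det A^(n)(x)| = log ||wedge_2 A^(n)(x)||
    prod = ((1.0, 0.0), (0.0, 1.0))
    for a in matrices:
        prod = mat_mul(a, prod)  # newest factor on the left
        scale = frobenius(prod)
        # Renormalize to avoid overflow; the logs of the scales telescope.
        prod = tuple(tuple(x / scale for x in row) for row in prod)
        log_norm += math.log(scale)
        log_det += math.log(abs(det(a)))
    lambda1 = log_norm / n  # approximates Lambda_1 = L_1
    lambda2 = log_det / n   # approximates Lambda_2 = L_1 + L_2
    return lambda1, lambda2 - lambda1


# Usage: a random Bernoulli sequence drawn from two assumed SL(2, R) matrices.
random.seed(0)
hyperbolic = ((2.0, 0.0), (0.0, 0.5))
theta = 0.7
rotation = ((math.cos(theta), -math.sin(theta)), (math.sin(theta), math.cos(theta)))
sample = [random.choice([hyperbolic, rotation]) for _ in range(2000)]
l1, l2 = finite_scale_exponents(sample)
print(l1, l2)  # l1 + l2 is close to 0 because every factor has determinant 1
```

Renormalizing the running product at each step keeps the entries bounded; since the logarithms of the scaling factors telescope, their sum equals $\log \|A^{(n)}(x)\|$ up to rounding.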


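Let $\Xi$ be a set of observables $\xi : X \to \mathbb{R}$.

Definition 1.10 We say that $\Xi$ and $A$ are compatible if for every integer $N \in \mathbb{N}$, for every set $F \in \mathcal{F}_N(A)$ and for every $\varepsilon > 0$, there is an observable $\xi \in \Xi$ such that

\[ \mathbf{1}_F \le \xi \quad \text{and} \quad \int_X \xi \, d\mu \le \mu(F) + \varepsilon . \tag{1.5} \]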

Definition 1.11 Let $A \in \mathcal{C}_m$ and $1 \le k \le m$ be such that $L_k(A) > L_{k+1}(A)$. We define $E_k^-(A) : X \to \mathrm{Gr}_{m-k}(\mathbb{R}^m)$ as the measurable component of the Oseledets filtration of $A$ corresponding to the Lyapunov exponents $\le L_{k+1}(A)$. If $T$ is invertible, we define $E_k^+(A) : X \to \mathrm{Gr}_k(\mathbb{R}^m)$ to be the direct sum of the components of the Oseledets decomposition corresponding to Lyapunov exponents $\ge L_k(A)$.

1.3 The Continuity Problem

Let $X$ be a phase space and let $\mathcal{F}$ be a σ-algebra on $X$. A linear cocycle is determined by a probability measure $\mu$ on $(X, \mathcal{F})$, an ergodic transformation $T : X \to X$ that preserves $\mu$, and a fiber action given by a measurable function $A : X \to \mathrm{Mat}(m, \mathbb{R})$ for some $m \ge 1$. Therefore, the Lyapunov exponents (as well as the Oseledets filtration and decomposition) may be regarded as functions of any of these input data, individually or jointly. Understanding the behavior of the LE as the input data is perturbed is considered a difficult problem for most systems. Lyapunov exponents are known to be upper semicontinuous functions in very general settings. However, their continuity depends on the space of cocycles and its topology, and in some sense is rare (see [5]), unless strong regularity assumptions on the cocycles are made.

In this monograph, the probability measure $\mu$ and the transformation $T$ are fixed, and we identify a linear cocycle with the function $A : X \to \mathrm{Mat}(m, \mathbb{R})$ defining its fiber action. We study the continuity properties of the Lyapunov exponents and of the Oseledets filtration and decomposition as the function $A$ varies in an appropriate space which is endowed with a topology at least as fine as the uniform topology. The Oseledets filtration and decomposition are in general only measurable in the phase $x$. Their continuity as $A$ varies is then understood in an average (in $x$) sense, or in probability. Moreover, we are interested in obtaining quantitative continuity properties, i.e. an explicit modulus of continuity.

We establish such continuity properties for any systems satisfying appropriate large deviation type estimates, which are defined in the next section. Our main applications are to various types of random and quasi-periodic cocycles, for which the relevant LDT estimates will be derived. A related theme is that of joint continuity in various input data, e.g. in translation vector (which determines the base dynamics) and fiber action function for quasi-periodic cocycles, or in probability measure and fiber action function for i.i.d. random cocycles. We will not study these types of problems in this monograph, but we will discuss them briefly in the last chapter.

1.4 Large Deviations Type Estimates

In probability theory and harmonic analysis there are several inequalities describing the deviation of a function from its mean. The most basic result of this kind is Chebyshev's inequality. We formulate it in its exponential form. For any $t, \lambda > 0$ and for any random variable $X$,

\[ \mathbb{P}\bigl[\, |X - \mathbb{E}(X)| \ge \lambda \,\bigr] \le e^{-\lambda t}\, \mathbb{E}\bigl[ e^{t |X - \mathbb{E}(X)|} \bigr] . \tag{1.6} \]

A fundamental result in harmonic analysis, concerning functions of bounded mean oscillation (BMO), is John-Nirenberg's inequality. Given $f \in L^1(\mathbb{T})$ let

\[ \|f\|_{\mathrm{BMO}} := \sup_I \bigl\langle\, |f - \langle f \rangle_I| \,\bigr\rangle_I , \]

where the sup is taken over all intervals $I \subset \mathbb{T}$ and $\langle f \rangle_I = \frac{1}{|I|} \int_I f$. There is a universal constant $c > 0$ such that if $\|f\|_{\mathrm{BMO}} < +\infty$, then for all $\lambda > 0$,

\[ \bigl| \{ x \in \mathbb{T} : |f - \langle f \rangle_{\mathbb{T}}| \ge \lambda \} \bigr| \le e^{-c \lambda / \|f\|_{\mathrm{BMO}}} . \tag{1.7} \]

Let $X_0, X_1, X_2, \ldots$ be a real valued random process and denote by $S_n = \sum_{j=0}^{n-1} X_j$ the corresponding sum process. Tail events of this process correspond to the deviation of its averages $\frac{1}{n} S_n$ from their means $\mathbb{E}(\frac{1}{n} S_n)$. There are several types of large deviation inequalities describing tail events, such as Chernoff bounds (see [28]), which we formulate for a random i.i.d. process $\{X_n\}$:

\[ \mathbb{P}\Bigl[\, \Bigl| \frac{1}{n} S_n - \mu \Bigr| \ge \lambda \,\Bigr] < C \max\bigl\{ e^{-(c \lambda^2 / \sigma^2)\, n},\; e^{-(\lambda / K)\, n} \bigr\} \tag{1.8} \]

for some universal constants $C < \infty$ and $c > 0$, where $\mu = \mathbb{E}(X_0)$, $\sigma^2 = \mathrm{var}(X_0)$ and $K = \|X_0\|_\infty$.

The asymptotic behavior of tail events forms the subject of the theory of large deviations (see [25]). A classical result in this theory is the following theorem due to H. Cramér.

Theorem 1.5 If the random process $\{X_n\}$ is i.i.d. with mean $\mu = \mathbb{E}(X_0)$ and finite moment generating function $M(t) := \mathbb{E}[e^{t X_0}] < +\infty$ for all $t > 0$, then

\[ \lim_{n \to +\infty} \frac{1}{n} \log \mathbb{P}\Bigl[\, \Bigl| \frac{1}{n} S_n - \mu \Bigr| > \varepsilon \,\Bigr] = -I(\varepsilon) , \]

where $I(\varepsilon) := \sup_{t > 0} \bigl( t\, \varepsilon - \log M(t) + t\, \mu \bigr)$ is called the rate function.
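To make the rate function of Theorem 1.5 concrete, here is a minimal sketch for the hypothetical example of Bernoulli($p$) coin flips, which is not discussed in the text. It evaluates $I(\varepsilon) = \sup_{t>0} (t\varepsilon - \log M(t) + t\mu)$ numerically and compares it with the known closed form of this Legendre transform, the relative entropy of Bernoulli($p + \varepsilon$) with respect to Bernoulli($p$). The function names, the search interval `t_max` and the use of ternary search are assumptions of this illustration.

```python
import math


def log_mgf(t, p):
    """log M(t) for X_0 ~ Bernoulli(p), where M(t) = E[exp(t X_0)]."""
    return math.log(1.0 - p + p * math.exp(t))


def rate_numeric(eps, p, t_max=50.0, iterations=200):
    """Approximate I(eps) = sup_{t>0} (t*eps - log M(t) + t*mu) by ternary search.

    The maximized function is concave in t (log M is convex), so ternary
    search on [0, t_max] applies. Assumes 0 < eps < 1 - p.
    """
    mu = p  # mean of a Bernoulli(p) variable

    def g(t):
        return t * (eps + mu) - log_mgf(t, p)

    lo, hi = 0.0, t_max
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g(0.5 * (lo + hi))


def rate_closed_form(eps, p):
    """Relative entropy of Bernoulli(p + eps) with respect to Bernoulli(p)."""
    a = p + eps
    return a * math.log(a / p) + (1.0 - a) * math.log((1.0 - a) / (1.0 - p))


print(rate_numeric(0.2, 0.5), rate_closed_form(0.2, 0.5))
```

For a fair coin the upper and lower tails are symmetric, so the same rate governs the two-sided deviation event in the theorem.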


We now give a general formulation of the large deviation principle (see [25]). Given an increasing sequence of integers $\{r_n\}$ and a lower semi-continuous function $I : \mathbb{R} \to [0, +\infty)$, we say that the random process $\{X_n\}$ satisfies a large deviation principle with normalizing sequence $\{r_n\}$ and rate function $I$, if for any closed set $F \subset \mathbb{R}$,

\[ \limsup_{n \to +\infty} \frac{1}{r_n} \log \mathbb{P}\Bigl[ \frac{1}{n} S_n \in F \Bigr] \le - \inf_{x \in F} I(x) , \]

and for any open set $G \subset \mathbb{R}$,

\[ \liminf_{n \to +\infty} \frac{1}{r_n} \log \mathbb{P}\Bigl[ \frac{1}{n} S_n \in G \Bigr] \ge - \inf_{x \in G} I(x) . \]

We note that the large deviation principle holds under the assumptions of Theorem 1.5 with $r_n = n$ and the rate function specified in that theorem. Other large deviation principles, including for Markov processes, can be found in [30].

Given a dynamical system $(X, \mu, T)$, any observable $\xi : X \to \mathbb{R}$ determines the random process $X_n = \xi \circ T^n$. Let $\langle \xi \rangle = \int_X \xi \, d\mu$ be the mean of this random process, i.e., the space average of the observable, and let $S_n \xi := \sum_{j=0}^{n-1} \xi \circ T^j$ be the corresponding sum process, i.e., the usual Birkhoff sums. There are many results regarding large deviations for dynamical systems (see for instance [17, 21, 26, 33]).

Given a linear cocycle $A : X \to \mathrm{Mat}(m, \mathbb{R})$ over the dynamical system $(X, \mu, T)$, we say that $A$ satisfies a large deviation principle with rate function $I(\varepsilon)$ if

\[ \lim_{n \to +\infty} \frac{1}{n} \log \mu \Bigl\{ x \in X : \Bigl| \frac{1}{n} \log \|A^{(n)}(x)\| - L_1(A) \Bigr| > \varepsilon \Bigr\} = -I(\varepsilon) . \]

Many large deviation principles, as well as other limit theorems, have been developed both in the context of random Bernoulli cocycles [8, 20, 29], and that of random Markov cocycles [7]. The large deviation principle for additive processes (Birkhoff sums of some given observable), and respectively for sub-additive processes (products of matrix valued random processes), are asymptotic results. Our study of continuity properties of Lyapunov exponents of linear cocycles does not require asymptotic statements, but only good upper bounds on the measure of tail events, for the random processes given by the base and fiber dynamics. We call these bounds large deviation type (LDT) estimates.

To describe these LDT estimates we introduce the following formalism. From now on, $\varepsilon, \iota : (0, \infty) \to (0, \infty)$ will represent functions that describe, respectively, the size of the deviation from the mean and the measure of the deviation set.

Definition 1.12 Let $1 < p \le \infty$ and let $\mathcal{E}$ and $\mathcal{I}$ be two families of functions. A set $\mathcal{P} = \mathcal{P}(p)$ of LDT parameters (with constant $p$) is a collection of triplets in $\mathbb{N} \times \mathcal{E} \times \mathcal{I}$ such that
(1) The set $\mathcal{E}$ consists of non-increasing functions $\varepsilon : (0, \infty) \to (0, \infty)$.
(2) The set $\mathcal{I}$ is a convex cone consisting of continuous functions $\iota : (0, \infty) \to (0, \infty)$ such that for every $\iota \in \mathcal{I}$,
    (a) $\iota(t)$ is strictly decreasing and $\lim_{t \to +\infty} \iota(t) = 0$,
    (b) $e^{-c_0 t} < \iota(t) < t^{-10}$ as $t \to \infty$, for some constant $c_0 > 0$,
    (c) $\lim_{t \to \infty} \frac{\phi_\iota(2t)}{\phi_\iota(t)} < 2$, where $\phi_\iota(t)$ denotes the inverse of $t \mapsto \psi_\iota(t) := t\, \iota(t)^{-\frac{p-1}{2p-1}}$.
(3) For all $\varepsilon > 0$ there exists $p = (n_0, \varepsilon, \iota) \in \mathcal{P}$ such that $\varepsilon(n_0) \le \varepsilon$.

We will use the notation $\varepsilon_n := \varepsilon(n)$ and $\iota_n := \iota(n)$ for integers $n$. Condition (2)(b) says that any $\iota(t) \in \mathcal{I}$ decreases to 0, as $t \to \infty$, at least like a power and at most like an exponential. Condition (2)(c) imposes a not too fast growth on the inverse function $\phi_\iota(t)$.

In our applications of the main results, the constant $p = 2$ or $p = \infty$ and the set of LDT parameters is $\mathcal{P} = \mathbb{N} \times \mathcal{E} \times \mathcal{I}$, where $\mathcal{E}$ and $\mathcal{I}$ are sets of functions of one of the following types.

Example 1.5 The set $\mathcal{E}$ consists either of constant functions $\varepsilon(t) \equiv \varepsilon$, with $0 < \varepsilon < 1$, or else of powers $\varepsilon(t) \equiv C\, t^{-a}$ for some $C, a > 0$. The set $\mathcal{I}$ consists of functions of one of the following kinds:
(a) exponentials $\iota(t) \equiv M e^{-c\, t}$,
(b) sub-exponentials $\iota(t) \equiv M e^{-c\, t^{b}}$,
(c) nearly exponentials $\iota(t) \equiv M e^{-c\, t / (\log t)^{b}}$,
where $M < \infty$ and $c, b > 0$.

We now define the base and fiber LDT estimates.

Definition 1.13 An observable $\xi : X \to \mathbb{R}$ satisfies a base-LDT estimate w.r.t. a space of parameters $\mathcal{P}$ if for every $\varepsilon > 0$ there is $(n_0, \varepsilon, \iota) \in \mathcal{P}$ such that for all $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and

\[ \mu \Bigl\{ x \in X : \Bigl| \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) - \int_X \xi \, d\mu \Bigr| > \varepsilon_n \Bigr\} < \iota_n . \]

Definition 1.14 A measurable cocycle $A : X \to \mathrm{Mat}(m, \mathbb{R})$ satisfies a fiber-LDT estimate w.r.t. a space of parameters $\mathcal{P}$ if for every $\varepsilon > 0$ there is $(n_0, \varepsilon, \iota) \in \mathcal{P}$ such that for all $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and

\[ \mu \Bigl\{ x \in X : \Bigl| \frac{1}{n} \log \|A^{(n)}(x)\| - L_1^{(n)}(A) \Bigr| > \varepsilon_n \Bigr\} < \iota_n . \]
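The base-LDT estimate of Definition 1.13 can be probed numerically by estimating the measure of the deviation set on a grid. The following sketch is an illustration under assumptions: the system is a circle translation with the golden mean as frequency, the observable is $\xi(x) = \cos(2\pi x)$ (whose space average is 0), and the Lebesgue measure of the deviation set is approximated by the fraction of grid points at which the deviation exceeds `eps`. None of these choices comes from the text, and the function names are hypothetical.

```python
import math

OMEGA = (math.sqrt(5) - 1) / 2  # assumed frequency (golden mean)


def xi(x):
    """Assumed observable; its integral over the circle is 0."""
    return math.cos(2 * math.pi * x)


def birkhoff_average(x, n):
    """(1/n) * sum_{j<n} xi(T^j x) for the translation T x = x + OMEGA (mod 1)."""
    return sum(xi((x + j * OMEGA) % 1.0) for j in range(n)) / n


def deviation_measure(n, eps, grid=400):
    """Fraction of grid points x with |Birkhoff average - space average| > eps."""
    bad = sum(
        1
        for i in range(grid)
        if abs(birkhoff_average((i + 0.5) / grid, n)) > eps
    )
    return bad / grid


print(deviation_measure(10, 0.01), deviation_measure(1000, 0.01))
```

For this particular observable the Birkhoff sums form a geometric sum, so the averages decay uniformly like $1/n$ and the estimated deviation set becomes empty at large scales. General observables need the harmonic analysis tools mentioned in the text; this example only illustrates the shape of the estimate.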


We use LDT estimates to prove continuity of the LE as functions of the cocycle, where the space of cocycles is endowed with a distance. For this we need a stronger form of the fiber-LDT, one that is uniform in a neighborhood of the cocycle, in the sense that the estimate above holds with the same LDT parameter for all nearby cocycles.

Definition 1.15 A measurable cocycle $A$ satisfies a uniform fiber-LDT if for all $\varepsilon > 0$ there are $\delta > 0$ and $(n_0, \varepsilon, \iota) \in \mathcal{P}$ such that if $B$ is a measurable cocycle with $\mathrm{dist}(B, A) < \delta$ and if $n \ge n_0$, then $\varepsilon_n \le \varepsilon$ and

\[ \mu \Bigl\{ x \in X : \Bigl| \frac{1}{n} \log \|B^{(n)}(x)\| - L_1^{(n)}(B) \Bigr| > \varepsilon_n \Bigr\} < \iota_n . \]

We finish by remarking that the known large deviation principles in the context of random cocycles (see [7, 8, 15]) do not provide uniform fiber LDT estimates. Hence they cannot be used directly in our scheme for proving continuity of Lyapunov exponents. However, the spectral theory approach used to prove them is enough to derive uniform fiber LDT estimates. Proving base and fiber (uniform) LDT estimates for quasi-periodic cocycles uses harmonic analysis and potential theory tools, along with the arithmetic properties of the torus translation.

1.5 Summary of Results

The main result of this monograph is an Abstract Continuity Theorem (ACT).

Theorem 1.6 Consider an ergodic MPDS $(X, \mu, T)$, a space of measurable cocycles $\mathcal{C}$, a set of observables $\Xi$, a constant $1 < p \le \infty$, a set of LDT parameters $\mathcal{P} = \mathcal{P}(p)$ with corresponding spaces of deviation functions $\mathcal{E}$, $\mathcal{I}$, and assume the following:
1. $\Xi$ is compatible with every cocycle $A \in \mathcal{C}$.
2. Every observable $\xi \in \Xi$ satisfies a base-LDT w.r.t. $\mathcal{P}$.
3. Every $A \in \mathcal{C}$ with $L_1(A) > -\infty$ is uniformly $L^p$-bounded.
4. Every cocycle $A \in \mathcal{C}$ with $L_1(A) > L_2(A)$ satisfies a uniform fiber-LDT w.r.t. $\mathcal{P}$.

Then every Lyapunov exponent $L_k : \mathcal{C}_m \to [-\infty, \infty)$, $1 \le k \le m$, is continuous. Moreover, if for $A \in \mathcal{C}_m$ and $1 \le k \le m$ we have $L_k(A) > L_{k+1}(A)$, then there exists a neighborhood $\mathcal{V}$ of $A$, and a modulus of continuity $\omega(h) := \bigl[ \iota\bigl( c \log \frac{1}{h} \bigr) \bigr]^{\frac{p-1}{2p-1}}$ with $\iota = \iota(A) \in \mathcal{I}$ and $c = c(A) > 0$ such that:
(a) the map $\mathcal{V} \ni B \mapsto \Lambda_k(B) = (L_1 + \cdots + L_k)(B) \in \mathbb{R}$ has modulus of continuity $\omega$, i.e., $|\Lambda_k(B_1) - \Lambda_k(B_2)| \le \omega(\mathrm{dist}(B_1, B_2))$ for all $B_1, B_2 \in \mathcal{V}$;


(b) if $T : X \to X$ is invertible then for some $\alpha = \alpha(A) > 0$ and for all $B_1, B_2 \in \mathcal{V}$,

\[ \mu \bigl\{ x \in X : d\bigl( E_k^{\pm}(B_1)(x), E_k^{\pm}(B_2)(x) \bigr) > \mathrm{dist}(B_1, B_2)^{\alpha} \bigr\} < \omega(\mathrm{dist}(B_1, B_2)) , \]

where $d$ refers to the distance on the Grassmann manifold.

This theorem is proved in Chaps. 3 and 4. Item (a) and the general continuity statement are proved in Theorem 3.1. Item (b) follows from Theorems 4.7 and 4.8.

We present two applications of the ACT, to random and to quasi-periodic cocycles. For each of these models we specify the space of cocycles $\mathcal{C} = \{(\mathcal{C}_m, \mathrm{dist})\}_{m \in \mathbb{N}}$ and the sets of deviation functions $\mathcal{E}$ and $\mathcal{I}$.

We begin with the random Markov application of the ACT, studied in Chap. 5. Let $(X = \Sigma^{\mathbb{Z}}, \mathbb{P}, T)$ be a Markov shift, where $\Sigma$ is a compact metric space, and the probability $\mathbb{P}$ on $X$ is determined by a Markov kernel $K$ on $\Sigma$ and by a $K$-stationary probability measure $\mu$. Define the space $\mathcal{B}_m^\infty(K)$ of measurable functions $A : \Sigma \times \Sigma \to \mathrm{GL}(m, \mathbb{R})$ such that $A$ and $A^{-1}$ are uniformly bounded, i.e., $\|A\|_\infty < +\infty$ and $\|A^{-1}\|_\infty < +\infty$. We endow this space with the metric $d_\infty(A, B) = \|A - B\|_\infty$. Each $A \in \mathcal{B}_m^\infty(K)$ determines the random Markov cocycle $\hat{A} : X \to \mathrm{GL}(m, \mathbb{R})$, defined by $\hat{A}(x) := A(x_0, x_1)$. We will make the notational identification $\hat{A} = A$.

Definition 1.16 A measurable section $V : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$ is called $A$-invariant when $A(x_{n-1}, x_n)\, V(x_{n-1}) = V(x_n)$ for $\mathbb{P}_\mu$-a.e. $x = (x_n)_{n \in \mathbb{Z}}$.

Assuming $(K, \mu)$ is strongly mixing, the ergodicity of this Markov kernel implies that the subspaces $V(x)$ have constant dimension $\mu$-a.e., denoted by $\dim(V)$. We say that this family is proper if $0 < \dim(V) < m$. We now introduce the concepts of irreducible and totally irreducible cocycle.

Definition 1.17 A cocycle $A \in \mathcal{B}_m^\infty(K)$ is called irreducible w.r.t. $(K, \mu)$ if it admits no measurable proper $A$-invariant section $V : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$. A cocycle $A \in \mathcal{B}_m^\infty(K)$ is called totally irreducible w.r.t. $(K, \mu)$ if the exterior powers $\wedge_k A$ are irreducible for all $1 \le k \le m - 1$.

We denote by $\mathcal{I}_m^\infty(K)$ the subspace of totally irreducible cocycles in $\mathcal{B}_m^\infty(K)$. This set is open (see Proposition 5.3). In this application we consider the family $\mathcal{C} = \{(\mathcal{I}_m^\infty(K), d_\infty)\}_{m \in \mathbb{N}}$ as the space of measurable cocycles, the set $\mathcal{E}$ of constant deviation functions $\varepsilon(t) \equiv \varepsilon$, with $0 < \varepsilon < 1$, and the set of exponential functions $\mathcal{I} = \{ \iota(t) \equiv M e^{-c\, t} : M < \infty,\ c > 0 \}$ to measure the deviation sets. We formulate the continuity statements, which follow from Theorem 5.1.

Theorem 1.7 If $(K, \mu)$ is strongly mixing then all conclusions of Theorem 1.6 hold on the space of measurable cocycles $\mathcal{C} = \{(\mathcal{I}_m^\infty(K), d_\infty)\}_{m \in \mathbb{N}}$ with a Hölder modulus of continuity.


The main ingredient in this application is the following fiber uniform LDT estimate of exponential type (see Theorem 5.3).

Theorem 1.8 Given a Markov kernel $K$ on the compact metric space $\Sigma$, a $K$-stationary measure $\mu$ and $A \in \mathcal{B}_m^\infty(K)$, assume
(1) $(K, \mu)$ is strongly mixing,
(2) $A$ is irreducible,
(3) $L_1(A) > L_2(A)$.
Then there exists a neighborhood $\mathcal{V}$ of $A$ in $\mathcal{B}_m^\infty(K)$ and there exist $C < \infty$, $k > 0$ and $\varepsilon_0 > 0$ such that for all $0 < \varepsilon < \varepsilon_0$, $B \in \mathcal{V}$ and $n \in \mathbb{N}$,

\[ \mathbb{P}_\mu \Bigl[\, \Bigl| \frac{1}{n} \log \|B^{(n)}\| - L_1(B) \Bigr| > \varepsilon \,\Bigr] \le C\, e^{-k\, \varepsilon^2 n} . \]

In Chap. 6 we apply the ACT to the space of quasi-periodic cocycles. Let $(\mathbb{T}^d, \mu, T)$ be a torus translation with frequency vector $\omega \in \mathbb{R}^d$. We warn the reader that the letter $\omega$ denotes different quantities, hence its meaning should be inferred from the context. The letter $\mu$ denotes the Haar measure on $\mathbb{T}^d$.

Let $\mathcal{A}_r^d := \mathcal{A}_r \times \cdots \times \mathcal{A}_r$, where $\mathcal{A}_r$ is the strip of width $2r$ around $\mathbb{T}$. We denote by $C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ the space of real analytic functions $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$ with a holomorphic extension to $\mathcal{A}_r^d$, which are continuous up to the boundary. Endowed with the norm $\|A\|_r := \sup_{z \in \mathcal{A}_r^d} \|A(z)\|$, this becomes a Banach space.

Consider the space of measurable cocycles $\mathcal{C} = \{(\mathcal{C}_m, \mathrm{dist})\}_{m \in \mathbb{N}}$, where $\mathcal{C}_m$ is the set of cocycles $A \in C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $\det[A(x)] \not\equiv 0$, which we call non-identically singular cocycles. We consider the following distance on this space: $\mathrm{dist}(A, B) := \|A - B\|_r$. Let $\mathcal{E}$ be the set of powers $\varepsilon(t) \equiv C\, t^{-a}$ for some $C, a > 0$, and $\mathcal{I}$ be the set of sub-exponential functions $\iota(t) \equiv M e^{-c\, t^{b}}$, with $M < \infty$, $c > 0$ and $0 < b < 1$.

Definition 1.18 We call weak-Hölder any modulus of continuity of the form

\[ \omega(h) := M e^{-c\, (\log(1/h))^{b}} \]

for some constants $M$, $c$ and $b$ as above. Note that if $b = 1$ then we have $\omega(h) = M h^{c}$, which corresponds to the usual modulus of Hölder continuity.

Fix any positive number $\delta_0 > 0$. Given $t > 0$, we denote by $\mathrm{DC}_t$ the set of frequency vectors $\omega \in \mathbb{R}^d$ satisfying the following Diophantine condition:

\[ \|k \cdot \omega\| \ge \frac{t}{|k|^{d + \delta_0}} \quad \text{for all } k \in \mathbb{Z}^d \setminus \{0\} , \]

where for any real number $x$ we write $\|x\| := \min_{k \in \mathbb{Z}} |x - k|$. We formulate the continuity statement, which follows from Theorem 6.1.


Theorem 1.9 Consider the above space of measurable cocycles $\mathcal{C}$ consisting of non-identically singular analytic quasi-periodic cocycles $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$ over a torus translation $(\mathbb{T}^d, \mu, T)$ with frequency vector $\omega \in \mathbb{R}^d$. If $\omega \in \mathrm{DC}_t$ for some $t > 0$ then all conclusions of Theorem 1.6 hold on the space of measurable cocycles $\mathcal{C}$ with a weak-Hölder modulus of continuity.

As in the random setting, the key ingredient here is the following fiber uniform LDT estimate of sub-exponential type, proven in Theorem 6.2.

Theorem 1.10 Given $A \in C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $\det[A(x)] \not\equiv 0$ and $\omega \in \mathrm{DC}_t$, there are constants $\delta = \delta(A) > 0$, $k_0 = k_0(A) \in \mathbb{N}$, $C = C(A, r) < \infty$, $a = a(d) > 0$ and $b = b(d) > 0$ such that if $\|B - A\|_r \le \delta$ and $n \ge n_0 := t^{-2} k_0$, then

\[ \mu \Bigl\{ x \in \mathbb{T}^d : \Bigl| \frac{1}{n} \log \|B^{(n)}(x)\| - L_1^{(n)}(B) \Bigr| > C\, n^{-a} \Bigr\} < e^{-n^{b}} . \]
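The Diophantine condition $\mathrm{DC}_t$ used in Theorems 1.9 and 1.10 can be tested numerically on a finite range of integer vectors $k$. The sketch below is only an illustration: a finite check can refute membership in $\mathrm{DC}_t$ but can never prove it, the choice of the sup norm for $|k|$ is an assumption (the text does not specify the norm here), and the function names are hypothetical.

```python
import itertools
import math


def dist_to_integers(x):
    """||x|| := min over integers k of |x - k|."""
    return abs(x - round(x))


def satisfies_dc(omega, t, delta0, k_max):
    """Check ||k . omega|| >= t / |k|^(d + delta0) for all 0 < |k| <= k_max.

    omega is a tuple of d real numbers; |k| is taken to be the sup norm
    (an assumption of this sketch).
    """
    d = len(omega)
    for k in itertools.product(range(-k_max, k_max + 1), repeat=d):
        size = max(abs(kj) for kj in k)
        if size == 0:
            continue  # the condition excludes k = 0
        inner = sum(kj * wj for kj, wj in zip(k, omega))
        if dist_to_integers(inner) < t / size ** (d + delta0):
            return False
    return True


golden = (math.sqrt(5) - 1) / 2
print(satisfies_dc((golden,), t=0.3, delta0=0.1, k_max=2000))
print(satisfies_dc((0.5,), t=0.3, delta0=0.1, k_max=2000))
```

The golden mean passes the check on this range, while the rational frequency $1/2$ fails already at $k = 2$, where $k \cdot \omega$ is an integer.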

1.6 Literature Review

The continuity of Lyapunov exponents for spaces of cocycles with low regularity was studied by Arbieto and Bochi [1], Bessa and Vilarinho [4] and by others. Since the results in this monograph are closer to the high regularity regime, we will focus our review on these types of models.

There is a large amount of work dedicated to the case of analytic, quasi-periodic cocycles. Classic results on the subject are due to Goldstein and Schlag [14] and to Bourgain and Jitomirskaya [10]. These results refer to Schrödinger cocycles, as defined in Sect. 7.2, and continuity is understood relative to the energy parameter and/or the frequency. In [10], the authors prove joint continuity in the energy E and the frequency ω, at all points (E, ω) with ω irrational. In [14], for the one frequency case, assuming a strong Diophantine condition on the frequency, the authors prove a sharp fiber LDT estimate and establish the Avalanche Principle (AP) for SL(2, R) matrices. Based on these ingredients, they develop an inductive procedure that leads to Hölder continuity of the (top) Lyapunov exponent as a function of the energy E, under the assumption of a positive lower bound on the Lyapunov exponent. A similar approach is applied to the multifrequency Diophantine torus translation case, leading to weak-Hölder continuity of the Lyapunov exponent, the weaker modulus of continuity being due to a weaker version of the fiber LDT estimate available in this case. Extensions of the ideas and results in [14] to other related models were obtained in [11, 18, 19]. Bourgain proved in [9] joint continuity in energy and frequency for the multifrequency torus translation model.

A higher dimensional version of the AP, along with a higher dimensional version of the result in [14], were obtained in [27] for Schrödinger-like cocycles, under the restrictive assumption that all Lyapunov exponents are simple. It was also indicated in [27] that this method is in some sense modular, a statement that motivated in part our current work.

With motivations that are both intrinsic and related to mathematical physics problems (e.g. spectral properties of Jacobi-type operators), the study of continuity properties of the Lyapunov exponents has been extended from Schrödinger cocycles to more general ones, including higher dimensional cocycles and/or cocycles with singularities. Each extension comes with significant technical challenges, requiring new methods. C. Marx and S. Jitomirskaya proved joint continuity in energy and frequency (one frequency case) for Mat(2, C)-valued analytic cocycles (see [16] and references therein). Using a different approach, A. Ávila, S. Jitomirskaya, C. Sadel extended this result to multidimensional (i.e. Mat(m, C)-valued) analytic cocycles (see [2]). We note that both results mentioned above ([2, 16]), which treat one frequency torus translations, rely crucially on the convexity of the top Lyapunov exponent of the complexified cocycle as a function of the imaginary variable, by first establishing continuity away from the torus. This approach immediately breaks down in the multifrequency case.

Our work in [12] presents a geometric, conceptual approach to the AP, which allows us to generalize it to higher dimensions, namely to blocks of GL(m, R) matrices, and further (see Chap. 2), to any blocks of nonzero matrices in Mat(m, R). We use this general AP in [12] to prove Hölder (or weak-Hölder for multifrequency translations) continuity of the Lyapunov exponents of GL(m, R)-valued analytic cocycles in a neighborhood of a cocycle with simple Lyapunov exponents. Moreover, continuity of all Lyapunov exponents (but without a modulus of continuity) holds everywhere, regardless of the multiplicity of the Lyapunov exponents. Our next goal was to handle cocycles with singularities (i.e. not necessarily GL(m, R)-valued), which, as explained in Chap. 6, is especially delicate in the multifrequency case. While unlike in [2, 16], we do require Diophantine translations, and the translation frequency is fixed, our method applies equally to translations on the one or the higher dimensional torus.

At the other end of the range of ergodic behavior of the base dynamics, namely the random case, continuity results for linear cocycles over Bernoulli shifts in the generic case go back to Furstenberg and Kifer [13]. Le Page proved in [24] Hölder continuity of the top Lyapunov exponent for a one-parameter family of cocycles over the Bernoulli shift, under irreducibility and contraction assumptions, which are assumed to hold uniformly throughout this family. We are not aware of any generalization of this theorem of Le Page to irreducible cocycles over strongly mixing Markov shifts. Compared with [24], our result provides continuity of all Lyapunov exponents, regardless of the gaps in the Lyapunov spectrum, and it holds in the space of all irreducible cocycles, not just for one-parameter families. It is also more general since we consider cocycles over mixing Markov shifts, and not just over Bernoulli shifts. Moreover, we assume that our cocycles are locally constant, i.e. they depend on a finite number of coordinates, and not just on one coordinate.


Bocker-Neto and Viana [6] proved continuity of the Lyapunov exponents for two-dimensional cocycles over Bernoulli shifts without any irreducibility assumptions. This result does not provide a modulus of continuity. A higher dimensional version of this result was announced by A. Ávila, A. Eskin and M. Viana (see the monograph [31]). An extension of results from [6] to a particular type of cocycles over Markov systems (particular in the sense that the cocycle still depends on one coordinate, as in the Bernoulli case) was obtained in [22]. Other related results were recently obtained in [3]. We note, for the interested reader, that a general one-stop reference for continuity results for random cocycles is Viana’s monograph [31].

References 1. A. Arbieto, J. Bochi, L p -generic cocycles have one-point Lyapunov spectrum. Stoch. Dyn. 3(1), 73–81 (2003). MR 1971187 (2004a:37063) 2. A. Ávila, S. Jitomirskaya, C. Sadel, Complex one-frequency cocycles. J. Eur. Math. Soc. (JEMS) 16(9), 1915–1935 (2014). MR 3273312 3. L. Backes, A.W. Brown, C. Butler, Continuity of Lyapunov exponents for cocycles with invariant holonomies, preprint (2015), 1–34 4. M. Bessa, H. Vilarinho, Fine properties of L p -cocycles which allow abundance of simple and trivial spectrum. J. Differ. Equ. 256(7), 2337–2367 (2014). MR 3160445 5. J. Bochi, Genericity of zero Lyapunov exponents. Ergodic Theor. Dynam. Syst. 22 (2002)(6), 1667–1696. MR 1944399 (2003m:37035) 6. C. Bocker-Neto, M. Viana, Continuity of Lyapunov exponents for random 2d matrices, preprint, to appear in Ergodic Theory and Dynamical Systems (2010), 1–38 7. P. Bougerol, Théorèmes limite pour les systèmes linéaires à coefficients markoviens. Probab. Theor. Relat. Fields 78(2), 193–221 (1988). MR 945109 (89i:60122) 8. P. Bougerol, J. Lacroix, Products of random matrices with applications to Schrödinger operators, in Progress in Probability and Statistics, vol. 8 (Birkhäuser Boston Inc, Boston, MA, 1985). MR 886674 (88f:60013) 9. J. Bourgain, Positivity and continuity of the Lyapounov exponent for shifts on Td with arbitrary frequency vector and real analytic potential. J. Anal. Math. 96, 313–355 (2005). MR 2177191 (2006i:47064) 10. J. Bourgain, S. Jitomirskaya, Continuity of the Lyapunov exponent for quasiperiodic operators with analytic potential. J. Statist. Phys. 108(5–6), 1203–1218 (2002). Dedicated to David Ruelle and Yasha Sinai on the occasion of their 65th birthdays. MR 1933451 (2004c:47073) 11. J. Bourgain, M. Goldstein, W. Schlag, Anderson localization for Schrödinger operators on Z with potentials given by the skew-shift. Comm. Math. Phys. 220(3), 583–621 (2001). MR 1843776 (2002g:81026) 12. P. Duarte, S. 
Klein, Continuity of the Lyapunov exponents for quasiperiodic cocycles. Comm. Math. Phys. 332(3), 1113–1166 (2014). MR 3262622 13. H. Furstenberg, Y. Kifer, Random matrix products and measures on projective spaces. Isr. J. Math. 46(1–2), 12–32 (1983). MR 727020 (85i:22010) 14. M. Goldstein, W. Schlag, Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. Math. (2) 154(1), 155–203 (2001). MR 1847592 (2002h:82055) 15. H. Hennion, L. Hervé, Limit Theorems for Markov chains and Stochastic Properties of Dynamical Systems by Quasi-Compactness, vol. 1766, Lecture notes in mathematics (Springer, Berlin, 2001)


16. S. Jitomirskaya, C.A. Marx, Analytic quasi-perodic cocycles with singularities and the Lyapunov exponent of extended Harper’s model. Comm. Math. Phys. 316(1), 237–267 (2012). MR 2989459 17. Y. Kifer, Large deviations in dynamical systems and stochastic processes. Trans. Am. Math. Soc. 321(2), 505–524 (1990). MR 1025756 (91e:60091) 18. S. Klein, Anderson localization for the discrete one-dimensional quasi-periodic Schrödinger operator with potential defined by a Gevrey-class function. J. Funct. Anal. 218(2), 255–292 (2005). MR 2108112 (2005m:82070) 19. S. Klein, Localization for quasiperiodic Schrödinger operators with multivariable Gevrey potential functions. J. Spectr. Theor. 4, 1–53 (2014) 20. É. Le Page, Théorèmes limites pour les produits de matrices aléatoires, Probability measures on groups (Oberwolfach, 1981), Lecture notes in mathematics, vol. 928 (Springer, Berlin-New York, 1982), pp. 258–303. MR 669072 (84d:60012) 21. A.O. Lopes, Entropy and large deviation, Nonlinearity 3(2), 527–546 (1990). MR 1054587 (91m:58092) 22. E. Malheiro, M. Viana, Lyapunov exponents of linear cocycles over Markov shifts, preprint (2014), 1–25 23. V.I. Oseledec, A multiplicative ergodic theorem. Characteristic Ljapunov, exponents of dynamical systems. Trudy Moskov. Mat. Obšˇc. 19, 179–210 (1968). MR 0240280 (39 #1629) 24. É. Le Page, Régularité du plus grand exposant caractéristique des produits de matrices aléatoires indépendantes et applications. Annales de l’institut Henri Poincaré (B) Probabilités et Statistiques 25, no. 2, 109–142 (1989) (fre) 25. F. Rassoul-Agha, T. Seppäläinen, A course on large deviations with an introduction to Gibbs measures. Graduate Studies in Mathematics, vol. 162 (American Mathematical Society, Providence, RI, 2015). MR 3309619 26. L. Rey-Bellet, L.-S. Young, Large deviations in non-uniformly hyperbolic dynamical systems. Ergodic Theor. Dynam. Syst. 28(2), 587–612 (2008). MR 2408394 (2009c:37029) 27. W. 
Schlag, Regularity and convergence rates for the Lyapunov exponents of linear cocycles. J. Mod. Dyn. 7(4), 619–637 (2013). MR 3177775 28. T. Tao, Topics in random matrix theory. Graduate Studies in Mathematics, vol. 132 (American Mathematical Society, Providence, RI, 2012). MR 2906465 (2012k:60023) 29. V.N. Tutubalin, Limit theorems for a product of random matrices. Teor. Verojatnost. i Primenen. 10, 19–32 (1965). MR 0175169 (30 #5354) 30. S.R.S. Varadhan, Large deviations and applications. École d’Été de Probabilités de Saint-Flour XV-XVII, 1985–87, Lecture Notes in Mathematics, vol. 1362 (Springer, Berlin, 1988), pp. 1–49. MR 983371 (89m:60068) 31. M. Viana, Lectures on Lyapunov exponents. Cambridge Studies in Advanced Mathematics (Cambridge University Press, 2014) 32. P. Walters, An introduction to ergodic theory. Graduate Texts in Mathematics, vol. 79 (Springer, New York, 1982). MR 648108 (84e:28017) 33. L-S. Young, Large deviations in dynamical systems. Trans. Am. Math. Soc. 318(2), 525–543 (1990). MR 975689 (90g:58069)

Chapter 2

Estimates on Grassmann Manifolds

Abstract The main result of this chapter, called the Avalanche Principle (AP), relates the expansion of a long product of matrices with the product of expansions of the individual matrices. This principle was introduced by M. Goldstein and W. Schlag in the context of SL(2, C) matrices. Besides extending the AP to matrices of arbitrary dimension and possibly non-invertible, the geometric approach we use here provides a relation between the most expanding (singular) directions of such a long product of matrices and the corresponding singular directions of the first and last matrices in the product. The AP along with other estimates on the action of matrices on Grassmann manifolds will play a fundamental role in the next chapters, when we establish the continuity of the LE and of the Oseledets decomposition.

2.1 Grassmann Geometry

Grassmann geometry is the geometric study of manifolds of linear subspaces of a Euclidean space and of the action of linear groups (and algebras) on them. Its foundations were laid in the masterpiece ‘Die lineale Ausdehnungslehre’ of Hermann Grassmann, whose genius is still not fully understood, as explained in the survey [2].

2.1.1 Projective Spaces

The projective space is the simplest compact model to study the action of a linear map. Given an n-dimensional Euclidean space $V$, consider the equivalence relation defined on $V \setminus \{0\}$ by $u \equiv v$ if and only if $u = \lambda v$ for some $\lambda \neq 0$. For $v \in V \setminus \{0\}$, the set $\hat v := \{\lambda v : \lambda \in \mathbb{R} \setminus \{0\}\}$ is the equivalence class of the vector $v$ relative to this relation. The projective space of $V$ is the quotient $\mathbb{P}(V) := \{\hat v : v \in V \setminus \{0\}\}$ of $V \setminus \{0\}$ by this equivalence relation. It is a compact topological space when endowed with the quotient topology.

© Atlantis Press and the author(s) 2016 P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_2


The unit sphere $\mathbb{S}(V) := \{v \in V : \|v\| = 1\}$ is a compact Riemannian manifold of constant curvature 1 and diameter $\pi$. The natural projection $\hat\pi : \mathbb{S}(V) \to \mathbb{P}(V)$, $\hat\pi(v) = \hat v$, is a (double) covering map. Hence the projective space $\mathbb{P}(V)$ has a natural smooth Riemannian structure for which the covering map $\hat\pi$ is a local isometry. Thus $\mathbb{P}(V)$ is a compact Riemannian manifold with constant curvature 1 and diameter $\pi/2$.

Given a linear map $g \in \mathcal{L}(V)$ define $\mathbb{P}(g) := \{\hat v \in \mathbb{P}(V) : g\,v \neq 0\}$. We refer to the map $\varphi_g : \mathbb{P}(g) \subset \mathbb{P}(V) \to \mathbb{P}(V)$, $\varphi_g(\hat v) := \hat\pi\bigl(\frac{g\,v}{\|g\,v\|}\bigr)$, as the projective action of $g$ on $\mathbb{P}(V)$. If $g$ is invertible then $\varphi_g : \mathbb{P}(V) \to \mathbb{P}(V)$ is a diffeomorphism with inverse $\varphi_{g^{-1}} : \mathbb{P}(V) \to \mathbb{P}(V)$. Through these maps, the group $\mathrm{GL}(V)$, of all linear automorphisms on $V$, acts transitively on the projective space $\mathbb{P}(V)$.

We will consider three different metrics on the projective space $\mathbb{P}(V)$. The Riemannian distance, $\rho$, measures the length of an arc connecting two points on the sphere. More precisely, given $u, v \in \mathbb{S}(V)$,

$$\rho(\hat u, \hat v) := \min\{\angle(u, v),\ \angle(u, -v)\}. \tag{2.1}$$

The second metric, $d$, corresponds to the Euclidean distance. More precisely, given $u, v \in \mathbb{S}(V)$,

$$d(\hat u, \hat v) := \min\{\|u - v\|,\ \|u + v\|\} \tag{2.2}$$

measures the smallest chord of the arcs between $u$ and $v$ and between $u$ and $-v$. The third metric, $\delta$, measures the sine of the arc between two points on the sphere. More precisely, given $u, v \in \mathbb{S}(V)$,

$$\delta(\hat u, \hat v) := \frac{\|u \wedge v\|}{\|u\|\,\|v\|} = \sin(\angle(u, v)). \tag{2.3}$$

The fact that $\delta$ is a metric on $\mathbb{P}(V)$ follows from the sine addition law, which implies that $\sin(\theta + \theta') \le \sin\theta + \sin\theta'$, for all $\theta, \theta' \in [0, \frac{\pi}{2}]$.

These three distances are equivalent. For all $\hat u, \hat v \in \mathbb{P}(V)$,

$$\delta(\hat u, \hat v) = \sin\rho(\hat u, \hat v) \quad\text{and}\quad d(\hat u, \hat v) = \operatorname{chord}\rho(\hat u, \hat v). \tag{2.4}$$

The inequalities

$$\frac{2\theta}{\pi} \le \sin\theta \le \operatorname{chord}\theta = 2\sin(\theta/2) \le \theta \qquad \forall\, 0 \le \theta \le \frac{\pi}{2}$$

imply that

$$\frac{2}{\pi}\,\rho(\hat u, \hat v) \le \delta(\hat u, \hat v) \le d(\hat u, \hat v) \le \rho(\hat u, \hat v). \tag{2.5}$$

Because of (2.4), these three metrics determine the same group of isometries on the projective space.
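The relations (2.4) and (2.5) are easy to check numerically. The following is a minimal sketch, not part of the original text: the function names `rho`, `d` and `delta` are chosen here to mirror the three metrics, vectors are plain Python lists, and $\delta$ is computed as $\sqrt{1 - \langle u, v\rangle^2}$ for unit vectors, which equals $\|u \wedge v\|$.

```python
import math

def normalize(v):
    """Return v scaled to unit Euclidean norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def angle(u, v):
    """Angle in [0, pi] between two unit vectors."""
    c = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding

def rho(u, v):
    """Riemannian distance (2.1) between the projective points of u and v."""
    u, v = normalize(u), normalize(v)
    return min(angle(u, v), angle(u, [-x for x in v]))

def d(u, v):
    """Chord distance (2.2)."""
    u, v = normalize(u), normalize(v)
    minus = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    plus = math.sqrt(sum((a + b) ** 2 for a, b in zip(u, v)))
    return min(minus, plus)

def delta(u, v):
    """Sine distance (2.3), via sqrt(1 - <u, v>^2) for unit vectors."""
    u, v = normalize(u), normalize(v)
    c = sum(a * b for a, b in zip(u, v))
    return math.sqrt(max(0.0, 1.0 - c * c))
```

For instance, with `u = [1, 0, 0]` and `v = [1, 1, 0]` the three functions return values satisfying `delta == sin(rho)`, `d == 2*sin(rho/2)` and the chain of inequalities (2.5), up to rounding.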


2.1.2 Exterior Algebra

Exterior Algebra was introduced by H. Grassmann in the ‘Ausdehnungslehre’. We present here an informal description of some of its properties. See the book of Sternberg [8] for a rigorous treatment of the subject.

Let $V$ be a finite n-dimensional Euclidean space. Given $k$ vectors $v_1, \dots, v_k \in V$, their kth exterior product is a formal skew-symmetric product $v_1 \wedge \dots \wedge v_k$, in the sense that for any permutation $\sigma = (\sigma_1, \dots, \sigma_k) \in S_k$,

$$v_{\sigma_1} \wedge \dots \wedge v_{\sigma_k} = (-1)^{\operatorname{sgn}(\sigma)}\, v_1 \wedge \dots \wedge v_k.$$

These formal products are elements of an anti-commutative and associative graded algebra $(\wedge_* V, +, \wedge)$, called the exterior algebra of $V$. Formal products $v_1 \wedge \dots \wedge v_k$ are called simple k-vectors of $V$. The kth exterior power of $V$, denoted by $\wedge_k V$, is the linear span of all simple k-vectors of $V$. Elements of $\wedge_k V$ are called k-vectors.

An easy consequence of this formal definition is that $v_1 \wedge \dots \wedge v_k = 0$ if and only if $v_1, \dots, v_k$ are linearly dependent. Another simple consequence is that given two bases $\{v_1, \dots, v_k\}$ and $\{w_1, \dots, w_k\}$ of the same k-dimensional linear subspace of $V$, if for some real matrix $A = (a_{ij})$ we have $w_i = \sum_{j=1}^k a_{ij} v_j$ for all $i = 1, \dots, k$, then $w_1 \wedge \dots \wedge w_k = (\det A)\, v_1 \wedge \dots \wedge v_k$. More generally, two families $\{v_1, \dots, v_k\}$ and $\{w_1, \dots, w_k\}$ of linearly independent vectors span the same k-dimensional subspace if and only if for some real number $\lambda \neq 0$, $w_1 \wedge \dots \wedge w_k = \lambda\, v_1 \wedge \dots \wedge v_k$. Hence we identify the line spanned by a simple k-vector $v = v_1 \wedge \dots \wedge v_k$, i.e., the projective point $\hat v \in \mathbb{P}(\wedge_k V)$ determined by $v$, with the k-dimensional subspace spanned by the vectors $\{v_1, \dots, v_k\}$, denoted hereafter by $\langle v_1 \wedge \dots \wedge v_k \rangle$.

The subspaces $\wedge_k V$ induce the grading structure of the exterior algebra $\wedge_* V$, i.e., we have the direct sum decomposition $\wedge_* V = \bigoplus_{k=0}^{\dim V} \wedge_k V$ with $(\wedge_k V) \wedge (\wedge_{k'} V) \subset \wedge_{k+k'} V$ for all $0 \le k, k' \le \dim V$. Geometrically, the exterior product operation $\wedge : \wedge_k V \times \wedge_{k'} V \to \wedge_{k+k'} V$ corresponds to the algebraic sum of linear subspaces, in the sense that given families $\{v_1, \dots, v_k\}$ and $\{w_1, \dots, w_{k'}\}$ of linearly independent vectors such that $\langle v_1 \wedge \dots \wedge v_k \rangle \cap \langle w_1 \wedge \dots \wedge w_{k'} \rangle = 0$, then

$$\langle v_1 \wedge \dots \wedge v_k \wedge w_1 \wedge \dots \wedge w_{k'} \rangle = \langle v_1 \wedge \dots \wedge v_k \rangle + \langle w_1 \wedge \dots \wedge w_{k'} \rangle.$$

Let $\Lambda^n_k$ be the set of all k-subsets $I = \{i_1, \dots, i_k\} \subset \{1, \dots, n\}$, with $i_1 < \dots < i_k$, and order it lexicographically. Given a basis $\{e_1, \dots, e_n\}$ of $V$, define for each $I \in \Lambda^n_k$ the kth exterior product $e_I = e_{i_1} \wedge \dots \wedge e_{i_k}$. Then the ordered family $\{e_I : I \in \Lambda^n_k\}$ is a basis of $\wedge_k V$. In particular $\dim \wedge_k V = \binom{n}{k}$.

The exterior algebra $\wedge_* V$ inherits a Euclidean structure from $V$. More precisely, there is a unique inner product on $\wedge_* V$ such that for any orthonormal basis


$\{e_1, \dots, e_n\}$ of $V$, the family $\{e_I : I \in \Lambda^n_k,\ 0 \le k \le n\}$ is an orthonormal basis of the exterior algebra $\wedge_* V$.

Given vectors $v_1, \dots, v_k \in V$ let us call parallelepiped generated by these vectors the set

$$P(v_1, \dots, v_k) := \Bigl\{ \sum_{j=1}^k t_j v_j : t_j \in [0, 1],\ j = 1, \dots, k \Bigr\}.$$

Interestingly, the norm of the simple k-vector $v_1 \wedge \dots \wedge v_k$ is equal to the k-dimensional volume of the parallelepiped generated by its factors $v_j$. More precisely,

$$\|v_1 \wedge \dots \wedge v_k\| = \mathrm{Vol}_k(P(v_1, \dots, v_k)), \tag{2.6}$$

where $\mathrm{Vol}_k$ stands for the k-dimensional Hausdorff measure. To explain this fact first notice that if the vectors $v_1, \dots, v_k$ are pairwise orthogonal then

$$\frac{\|v_1 \wedge \dots \wedge v_k\|}{\|v_1\| \cdots \|v_k\|} = \Bigl\| \frac{v_1}{\|v_1\|} \wedge \dots \wedge \frac{v_k}{\|v_k\|} \Bigr\| = 1$$

because the vectors $\{v_j / \|v_j\| : j = 1, \dots, k\}$ are orthonormal. This shows that $\|v_1 \wedge \dots \wedge v_k\| = \|v_1\| \cdots \|v_k\|$ and establishes (2.6) in this case. In general we use the Gram-Schmidt orthogonalization method, defining recursively $v_1' = v_1$ and

$$v_j' = v_j - \sum_{i=1}^{j-1} \frac{\langle v_j, v_i' \rangle}{\|v_i'\|^2}\, v_i' \quad \text{for } j = 2, \dots, k.$$

At each step, when we replace $v_j$ by $v_j'$, both wedge products and k-volumes are preserved. Hence $v_1 \wedge \dots \wedge v_k = v_1' \wedge \dots \wedge v_k'$ and

$$\|v_1 \wedge \dots \wedge v_k\| = \|v_1' \wedge \dots \wedge v_k'\| = \|v_1'\| \cdots \|v_k'\| = \mathrm{Vol}_k(P(v_1', \dots, v_k')) = \mathrm{Vol}_k(P(v_1, \dots, v_k)).$$

Formula (2.6) also implies that for any simple vectors $e = e_1 \wedge \dots \wedge e_r$ and $f = f_1 \wedge \dots \wedge f_s$ in $\wedge_* V$,

$$\|e \wedge f\| \le \|e\|\,\|f\|. \tag{2.7}$$

Moreover, equality holds if and only if $\langle e_i, f_j \rangle = 0$ for all $i = 1, \dots, r$ and $j = 1, \dots, s$.

A simple k-vector $v_1 \wedge \dots \wedge v_k$ of norm one is called a unit k-vector. From the previous considerations the correspondence $v_1 \wedge \dots \wedge v_k \mapsto \langle v_1 \wedge \dots \wedge v_k \rangle$ is one-to-one, between the set of unit k-vectors in $\wedge_k V$ and the set of oriented k-dimensional linear subspaces of $V$. In particular, if $V$ is an oriented Euclidean space then the 1-dimensional space $\wedge_n V$ has a canonical unit n-vector, denoted by $\omega$, and called the volume element of $\wedge_n V$. In this case there is a unique operator, called the Hodge star operator, $* : \wedge_* V \to \wedge_* V$ defined by

$$v \wedge (*w) = \langle v, w \rangle\, \omega, \quad \text{for all } v, w \in \wedge_* V.$$

The Hodge star operator maps $\wedge_k V$ isomorphically, and isometrically, onto $\wedge_{n-k} V$, for all $0 \le k \le n$. Geometrically it corresponds to the orthogonal complement operation on linear subspaces, i.e., for any simple k-vector,

$$\langle *(v_1 \wedge \dots \wedge v_k) \rangle = \langle v_1 \wedge \dots \wedge v_k \rangle^\perp.$$

A dual product operation $\vee : \wedge_* V \times \wedge_* V \to \wedge_* V$ can be defined by

$$v \vee w := *((*v) \wedge (*w)), \quad \text{for all } v, w \in \wedge_* V.$$

This operation maps $\wedge_k V \times \wedge_{k'} V$ to $\wedge_{k+k'-n} V$, and describes the intersection operation on linear subspaces, in the sense that given families $\{v_1, \dots, v_k\}$ and $\{w_1, \dots, w_{k'}\}$ of linearly independent vectors with $\langle v_1 \wedge \dots \wedge v_k \rangle + \langle w_1 \wedge \dots \wedge w_{k'} \rangle = V$, then

$$\langle (v_1 \wedge \dots \wedge v_k) \vee (w_1 \wedge \dots \wedge w_{k'}) \rangle = \langle v_1 \wedge \dots \wedge v_k \rangle \cap \langle w_1 \wedge \dots \wedge w_{k'} \rangle.$$

The geometric meaning of the $\vee$-operation reduces by duality to that of the sum $\wedge$-operation and the complement $*$-operation.

Any linear map $g : V \to V$ induces a linear map $\wedge_k g : \wedge_k V \to \wedge_k V$, called the kth exterior power of $g$, such that for all $v_1, \dots, v_k \in V$,

$$\wedge_k g\,(v_1 \wedge \dots \wedge v_k) = g(v_1) \wedge \dots \wedge g(v_k).$$

This construction is functorial in the sense that for all linear maps $g, g' : V \to V$,

$$\wedge_k \mathrm{id}_V = \mathrm{id}_{\wedge_k V}, \quad \wedge_k (g' \circ g) = \wedge_k g' \circ \wedge_k g \quad \text{and} \quad \wedge_k g^* = (\wedge_k g)^*,$$

where $g^* : V \to V$ denotes the adjoint operator. A clear consequence of these properties is that if $g : V \to V$ is an orthogonal automorphism, i.e., $g^* \circ g = \mathrm{id}_V$, then so is $\wedge_k g : \wedge_k V \to \wedge_k V$.

Consider a matrix $A \in \mathrm{Mat}(n, \mathbb{R})$. Given $I, J \in \Lambda^n_k$, we denote by $A_{I \times J}$ the square sub-matrix of $A$ indexed in $I \times J$. If a linear map $g : V \to V$ is represented by $A$ relative to a basis $\{e_1, \dots, e_n\}$, then the kth exterior power $\wedge_k g : \wedge_k V \to \wedge_k V$ is represented by the matrix $\wedge_k A := (\det A_{I \times J})_{I, J}$ relative to the basis $\{e_I : I \in \Lambda^n_k\}$. The matrix $\wedge_k A$ is called the kth exterior power of $A$. Obviously, matrix exterior powers satisfy the same functorial properties as linear maps, i.e., for all $A, A' \in \mathrm{Mat}(n, \mathbb{R})$,

$$\wedge_k I_n = I_{\binom{n}{k}}, \quad \wedge_k (A' A) = (\wedge_k A')(\wedge_k A) \quad \text{and} \quad \wedge_k A^* = (\wedge_k A)^*,$$

where $A^*$ denotes the transpose matrix of $A$.


Let $n = \dim V$ and $\{e_i : i = 1, \dots, n\}$ be an eigen-basis of a linear endomorphism $g : V \to V$ with eigenvalues $\{\lambda_i : i = 1, \dots, n\}$, i.e., $g\,e_i = \lambda_i e_i$ for all $i = 1, \dots, n$. Then the family $\{e_I : I \in \Lambda^n_k\}$ is an eigen-basis of $\wedge_k g : \wedge_k V \to \wedge_k V$ with eigenvalues $\lambda_I = \lambda_{i_1} \lambda_{i_2} \cdots \lambda_{i_k}$, $I = \{i_1, \dots, i_k\} \in \Lambda^n_k$. In other words, $(\wedge_k g)\, e_I = \lambda_I e_I$ for all $I \in \Lambda^n_k$.
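The matrix exterior power $(\det A_{I \times J})_{I,J}$ can be computed directly from its definition for small matrices. The sketch below is illustrative only and not part of the original text: the helper names (`det`, `exterior_power`, `matmul`, `transpose`) are chosen here, determinants use Laplace expansion (adequate only for small sizes), and index sets are 0-based and ordered lexicographically.

```python
from itertools import combinations

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 0:
        return 1  # determinant of the empty matrix
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def exterior_power(A, k):
    """k-th exterior power of a square matrix: the matrix of k x k minors det(A_{I x J})."""
    n = len(A)
    index_sets = list(combinations(range(n), k))  # lexicographic order
    return [[det([[A[i][j] for j in J] for i in I]) for J in index_sets]
            for I in index_sets]

def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(A[i][l] * B[l][j] for l in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]
```

With integer entries the functorial identities can be checked exactly, for example `exterior_power(matmul(A, B), 2) == matmul(exterior_power(A, 2), exterior_power(B, 2))`, and for a diagonal matrix the result is again diagonal with the products $\lambda_I$ on the diagonal.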

2.1.3 Grassmann Manifolds

Grassmannians, like projective spaces, are compact Riemannian manifolds which stage the action of linear maps. For each $0 \le k \le n$, the Grassmannian $\mathrm{Gr}_k(V)$ is the space of all k-dimensional linear subspaces of $V$. Notice that the projective space $\mathbb{P}(V)$ and the Grassmannian $\mathrm{Gr}_1(V)$ are the same object if we identify each point $\hat v \in \mathbb{P}(V)$ with the line $\langle v \rangle = \{\lambda v : \lambda \in \mathbb{R}\}$. The full Grassmannian $\mathrm{Gr}(V)$ is the union of all Grassmannians $\mathrm{Gr}_k(V)$ with $0 \le k \le n$.

Denote by $\mathcal{L}(V)$ the algebra of linear endomorphisms on $V$, and consider the map $\pi : \mathrm{Gr}(V) \to \mathcal{L}(V)$, $E \mapsto \pi_E$, that assigns the orthogonal projection $\pi_E$ onto $E$ to each subspace $E \in \mathrm{Gr}(V)$. This map is one-to-one, and we endow $\mathrm{Gr}(V)$ with the unique topology that makes the map $\pi : \mathrm{Gr}(V) \to \pi(\mathrm{Gr}(V))$ a homeomorphism. With it, $\mathrm{Gr}(V)$ becomes a compact space, and each Grassmannian $\mathrm{Gr}_k(V)$ is a closed connected subspace of $\mathrm{Gr}(V)$. The group $\mathrm{GL}(V)$ acts transitively on each Grassmannian. The action of $\mathrm{GL}(V)$ on $\mathrm{Gr}_k(V)$ is given by $\cdot : \mathrm{GL}(V) \times \mathrm{Gr}_k(V) \to \mathrm{Gr}_k(V)$, $(g, E) \mapsto g\,E$. The special orthogonal group $\mathrm{SO}(V)$, of orientation preserving orthogonal automorphisms, acts transitively on Grassmannians too. All Grassmannians are compact homogeneous spaces.

For each $0 \le k \le n$, the Plücker embedding is the map $\Psi : \mathrm{Gr}_k(V) \to \mathbb{P}(\wedge_k V)$ that to each subspace $E$ in $\mathrm{Gr}_k(V)$ assigns the projective point $\hat v \in \mathbb{P}(\wedge_k V)$, where $v = v_1 \wedge \dots \wedge v_k$ is any simple k-vector formed as exterior product of a basis $\{v_1, \dots, v_k\}$ of $E$. This map is one-to-one and equivariant, i.e., for all $g \in \mathrm{GL}(V)$ and $E \in \mathrm{Gr}_k(V)$,

$$\Psi(g\,E) = \varphi_{\wedge_k g}\, \Psi(E). \tag{2.8}$$

We will consider the metrics $\rho, d, \delta : \mathrm{Gr}_k(V) \times \mathrm{Gr}_k(V) \to [0, +\infty)$ defined for any given $E, F \in \mathrm{Gr}_k(V)$ by

$$\rho(E, F) := \rho(\Psi(E), \Psi(F)), \tag{2.9}$$
$$d(E, F) := d(\Psi(E), \Psi(F)), \tag{2.10}$$
$$\delta(E, F) := \delta(\Psi(E), \Psi(F)), \tag{2.11}$$

which assign diameter $\frac{\pi}{2}$, $\sqrt{2}$ and 1, respectively, to the manifold $\mathrm{Gr}_k(V)$. These distances are preserved by the action of orthogonal linear maps in $\mathrm{SO}(V)$.
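The distances (2.9)-(2.11) can be evaluated without forming k-vectors explicitly. The sketch below is an illustration rather than the authors' procedure: it relies on the standard identity $\langle v_1 \wedge \dots \wedge v_k, w_1 \wedge \dots \wedge w_k \rangle = \det(\langle v_i, w_j \rangle)$ for the inner product on $\wedge_k V$, and the function names are hypothetical. Subspaces are given by arbitrary bases, stored as lists of row vectors.

```python
import math

def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    n = len(M)
    if n == 0:
        return 1.0
    if n == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([r[:j] + r[j + 1:] for r in M[1:]])
               for j in range(n))

def gram(A, B):
    """Matrix of inner products <a_i, b_j> between two families of vectors."""
    return [[sum(x * y for x, y in zip(a, b)) for b in B] for a in A]

def plucker_cosine(E, F):
    """|<e, f>| for unit k-vectors e, f representing span(E) and span(F)."""
    num = abs(det(gram(E, F)))
    return num / math.sqrt(det(gram(E, E)) * det(gram(F, F)))

def grassmann_distances(E, F):
    """Return (rho, d, delta) between the subspaces spanned by the rows of E and F."""
    c = min(1.0, plucker_cosine(E, F))  # clamp against rounding
    r = math.acos(c)
    return r, 2 * math.sin(r / 2), math.sqrt(max(0.0, 1.0 - c * c))
```

For two planes in $\mathbb{R}^3$ the value of $\rho$ is the dihedral angle, and two planes whose Plücker vectors are orthogonal realize the diameters $\frac{\pi}{2}$, $\sqrt{2}$ and 1 quoted above.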


Given $k, k' \ge 0$ such that $k + k' \ge n = \dim V$, the intersection of subspaces is an operation $\cap : \mathrm{Gr}_{k,k'}(\cap) \subset \mathrm{Gr}_k(V) \times \mathrm{Gr}_{k'}(V) \to \mathrm{Gr}_{k+k'-n}(V)$ where:

Definition 2.1 The domain is defined by $\mathrm{Gr}_{k,k'}(\cap) := \{(E, E') \in \mathrm{Gr}_k(V) \times \mathrm{Gr}_{k'}(V) : E + E' = V\}$.

Similarly, given $k, k' \ge 0$ such that $k + k' \le n = \dim V$, the algebraic sum of subspaces is an operation $+ : \mathrm{Gr}_{k,k'}(+) \subset \mathrm{Gr}_k(V) \times \mathrm{Gr}_{k'}(V) \to \mathrm{Gr}_{k+k'}(V)$ where:

Definition 2.2 The domain is defined by $\mathrm{Gr}_{k,k'}(+) := \{(E, E') \in \mathrm{Gr}_k(V) \times \mathrm{Gr}_{k'}(V) : E \cap E' = \{0\}\}$.

The considerations in Sect. 2.1.2 show that the Plücker embedding satisfies the following relations:

Proposition 2.1 Given $E \in \mathrm{Gr}_k(V)$, $E' \in \mathrm{Gr}_{k'}(V)$, consider unit vectors $v \in \Psi(E)$ and $v' \in \Psi(E')$.
(a) If $(E, E') \in \mathrm{Gr}_{k,k'}(\cap)$ then $\Psi(E \cap E') = \widehat{v \vee v'}$.
(b) If $(E, E') \in \mathrm{Gr}_{k,k'}(+)$ then $\Psi(E + E') = \widehat{v \wedge v'}$.

A duality between sums and intersections stems from these facts.

Proposition 2.2 The orthogonal complement operation $E \mapsto E^\perp$ is a d-isometric involution on $\mathrm{Gr}(V)$ which maps $\mathrm{Gr}_{k,k'}(+)$ to $\mathrm{Gr}_{n-k,n-k'}(\cap)$ and satisfies for all $(E, E') \in \mathrm{Gr}_{k,k'}(+)$, $(E + E')^\perp = (E^\perp) \cap (E')^\perp$.

The composition semigroup $\mathcal{L}(V)$ has two partial actions on Grassmannians, called the push-forward action and the pull-back action. Before introducing them, a couple of facts are needed.

Definition 2.3 Given $g \in \mathcal{L}(V)$, we denote by $\mathrm{K}_g := \{v \in V : g\,v = 0\}$ the kernel of $g$, and by $\mathrm{R}_g := \{g\,v : v \in V\}$ the range of $g$.

Lemma 2.1 Given $g \in \mathcal{L}(V)$ and $E \in \mathrm{Gr}(V)$,
1. if $E \cap \mathrm{K}_g = \{0\}$ then the linear map $g|_E : E \to g(E)$ is an isomorphism, and in particular $\dim g(E) = \dim E$.
2. if $E + \mathrm{R}_g = V$ then the linear map $g^*|_{E^\perp} : E^\perp \to g^{-1}(E)^\perp$ is an isomorphism, and in particular $\dim g^{-1}(E) = \dim E$.

Proof The first statement is obvious because if $E \cap \mathrm{K}_g = \{0\}$ then $\mathrm{K}_{g|_E} = \{0\}$. If $E + \mathrm{R}_g = V$ then, since $\mathrm{K}_{g^*} = (\mathrm{R}_g)^\perp$, we have $E^\perp \cap \mathrm{K}_{g^*} = E^\perp \cap (\mathrm{R}_g)^\perp = (E + \mathrm{R}_g)^\perp = \{0\}$. Hence by 1, the linear map $g^*|_{E^\perp} : E^\perp \to g^*(E^\perp)$ is an isomorphism. It is now enough to remark that $g^*(E^\perp) = g^{-1}(E)^\perp$. In fact, the inclusion $g^*(E^\perp) \subset g^{-1}(E)^\perp$ is clear. Since $g^*|_{E^\perp}$ is injective, $\dim g^*(E^\perp) = \dim(E^\perp)$. On the other hand, by the transversality condition, $g^{-1}(E)$ has dimension

$$\dim g^{-1}(E) = \dim\bigl( (g|_{(\mathrm{K}_g)^\perp})^{-1}(E \cap \mathrm{R}_g) \bigr) + \dim(\mathrm{K}_g) = \dim(E \cap \mathrm{R}_g) + \dim(\mathrm{K}_g) = \dim(E) + \dim(\mathrm{R}_g) - n + \dim(\mathrm{K}_g) = \dim(E).$$

Hence both $g^*(E^\perp)$ and $g^{-1}(E)^\perp$ have dimension equal to $\dim(E^\perp)$, and the equality follows. $\square$

Given $g \in \mathcal{L}(V)$ and $k \ge 0$ such that $k + \dim(\mathrm{K}_g) \le n = \dim V$, the push-forward by $g$ is the map $\varphi_g : \mathrm{Gr}_k(g) \subset \mathrm{Gr}_k(V) \to \mathrm{Gr}_k(V)$, $E \mapsto g\,E$, where:

Definition 2.4 The domain is defined by $\mathrm{Gr}_k(g) := \{E \in \mathrm{Gr}_k(V) : E \cap \mathrm{K}_g = \{0\}\}$.

We warn the reader that the notation $\varphi_g$ is used for both the projective and the Grassmannian actions of $g \in \mathcal{L}(V)$.

Similarly, given $k \ge 0$ such that $k + \dim(\mathrm{R}_g) \ge n = \dim V$, the pull-back by $g$ is the map $\varphi_{g^{-1}} : \mathrm{Gr}_k(g^{-1}) \subset \mathrm{Gr}_k(V) \to \mathrm{Gr}_k(V)$, $E \mapsto g^{-1} E$, where:

Definition 2.5 The domain is defined by $\mathrm{Gr}_k(g^{-1}) := \{E \in \mathrm{Gr}_k(V) : E + \mathrm{R}_g = V\}$.

From the proof of Lemma 2.1 we obtain a duality between push-forwards and pull-backs which can be expressed as follows.

Proposition 2.3 Given $g \in \mathcal{L}(V)$ and $k \ge 0$ such that $k + \dim(\mathrm{R}_g) \ge n = \dim V$, we have $\mathrm{Gr}_k(g^{-1}) = \mathrm{Gr}_{n-k}(g^*)^\perp$ and for all $E \in \mathrm{Gr}_k(g^{-1})$, $(g^{-1} E)^\perp = g^*(E^\perp)$.

In Sect. 2.3 we derive a modulus of Lipschitz continuity, w.r.t. the metric $\delta$, for the sum and intersection operations.
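The duality of Proposition 2.3 is transparent in the hyperplane case $k = n - 1$. The short computation below is an illustration added here, not taken from the text; the unit normal $\nu$ is notation introduced only for this example.

```latex
% Hyperplane case of Proposition 2.3; \nu is a unit normal vector (our notation).
\begin{align*}
&\text{Let } E = \nu^\perp \text{ with } \|\nu\| = 1. \text{ Then }
  E + \mathrm{R}_g = V \iff \mathrm{R}_g \not\subset \nu^\perp \iff g^*\nu \neq 0
  \iff \langle \nu \rangle \cap \mathrm{K}_{g^*} = \{0\}, \\
&g^{-1}E = \{ v \in V : \langle \nu, g\,v \rangle = 0 \}
         = \{ v \in V : \langle g^*\nu, v \rangle = 0 \}
         = (g^*\nu)^\perp, \\
&(g^{-1}E)^\perp = \langle g^*\nu \rangle = g^*(\langle \nu \rangle) = g^*(E^\perp).
\end{align*}
```

The first line is the statement $E \in \mathrm{Gr}_{n-1}(g^{-1}) \Leftrightarrow E^\perp \in \mathrm{Gr}_1(g^*)$, and the last line is the identity $(g^{-1}E)^\perp = g^*(E^\perp)$ in this special case.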

2.1.4 Flag Manifolds

Let $V$ be a finite n-dimensional Euclidean space. Any strictly increasing sequence of linear subspaces $F_1 \subset F_2 \subset \dots \subset F_k \subset V$ is called a flag in the Euclidean space $V$. Formally, flags are denoted as lists $F = (F_1, \dots, F_k)$. The sequence $\tau = (\tau_1, \dots, \tau_k)$ of dimensions $\tau_j = \dim F_j$ is called the signature of the flag $F$. The integer $k$ is called the length of the flag $F$, and the length of the signature $\tau$. Let $\mathcal{F}(V)$ be the set of all flags in $V$, and define $\mathcal{F}_\tau(V)$ to be the space of flags with a given signature $\tau$. Two special cases of flag spaces are the projective space $\mathbb{P}(V) = \mathcal{F}_\tau(V)$, when $\tau = (1)$, and the Grassmannian $\mathrm{Gr}_k(V) = \mathcal{F}_\tau(V)$, when $\tau = (k)$.

The general linear group $\mathrm{GL}(V)$ acts naturally on $\mathcal{F}(V)$. Given $g \in \mathrm{GL}(V)$ the action of $g$ on $\mathcal{F}_\tau(V)$ is given by the map $\varphi_g : \mathcal{F}_\tau(V) \to \mathcal{F}_\tau(V)$, $\varphi_g F = (g F_1, \dots, g F_k)$. The special orthogonal subgroup $\mathrm{SO}(V) \subset \mathrm{GL}(V)$ acts transitively on $\mathcal{F}_\tau(V)$. Hence, all flag manifolds $\mathcal{F}_\tau(V)$ are compact homogeneous spaces. Each of them is a compact connected Riemannian manifold where the group $\mathrm{SO}(V)$ acts by isometries. Since $\mathcal{F}_\tau(V) \subset \mathrm{Gr}_{\tau_1}(V) \times \mathrm{Gr}_{\tau_2}(V) \times \dots \times \mathrm{Gr}_{\tau_k}(V)$, the product distances

$$\rho_\tau(F, F') = \max_{1 \le j \le k} \rho(F_j, F_j') \tag{2.12}$$
$$d_\tau(F, F') = \max_{1 \le j \le k} d(F_j, F_j') \tag{2.13}$$
$$\delta_\tau(F, F') = \max_{1 \le j \le k} \delta(F_j, F_j') \tag{2.14}$$

are equivalent to the Riemannian distance on $\mathcal{F}_\tau(V)$. With these metrics, the flag manifold $\mathcal{F}_\tau(V)$ has diameter $\frac{\pi}{2}$, $\sqrt{2}$ and 1, respectively. The group $\mathrm{SO}(V)$ acts isometrically on $\mathcal{F}_\tau(V)$ with respect to these distances.

Given a signature $\tau = (\tau_1, \dots, \tau_k)$, if $n = \dim V$, we define $\tau^\perp := (n - \tau_k, \dots, n - \tau_1)$. When $\tau = (\tau_1, \dots, \tau_k)$ we will write $\tau^\perp = (\tau_1^\perp, \dots, \tau_k^\perp)$, where $\tau_j^\perp = n - \tau_{k+1-j}$.

Definition 2.6 Given a flag $F = (F_1, \dots, F_k) \in \mathcal{F}_\tau(V)$, its orthogonal complement is the $\tau^\perp$-flag $F^\perp := (F_k^\perp, \dots, F_1^\perp)$.

The map $\cdot^\perp : \mathcal{F}(V) \to \mathcal{F}(V)$ is an isometric involution on $\mathcal{F}(V)$, mapping $\mathcal{F}_\tau(V)$ onto $\mathcal{F}_{\tau^\perp}(V)$. The involution character, $(F^\perp)^\perp = F$ for all $F \in \mathcal{F}(V)$, is clear. As explained in Sect. 2.1.2, the Hodge star operator $* : \wedge_k V \to \wedge_{n-k} V$ is an isometry between these Euclidean spaces. By choice of metrics on the Grassmannians, see (2.10), the Plücker embeddings are isometries. Finally, the Plücker embedding conjugates the orthogonal complement map $\cdot^\perp : \mathrm{Gr}_k(V) \to \mathrm{Gr}_{n-k}(V)$ with the Hodge star operator. Hence for each $0 \le k \le n$, the map $\cdot^\perp : \mathrm{Gr}_k(V) \to \mathrm{Gr}_{n-k}(V)$ is an isometry. The analogous conclusion for flags follows from the definition of the distance $d_\tau$.

Given $g \in \mathcal{L}(V)$ and a signature $\tau$ such that $\tau_i + \dim(\mathrm{K}_g) \le n$ for all $i$, the push-forward by $g$ on flags is the map $\varphi_g : \mathcal{F}_\tau(g) \subset \mathcal{F}_\tau(V) \to \mathcal{F}_\tau(V)$, $\varphi_g F := (g F_1, \dots, g F_k)$, where:

Definition 2.7 The domain of $\varphi_g$ is defined by $\mathcal{F}_\tau(g) := \{F \in \mathcal{F}_\tau(V) : F_k \cap \mathrm{K}_g = \{0\}\}$.


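Similarly, given a signature $\tau$ such that $\tau_i + \dim(\mathrm{R}_g) \ge n$ for all $i$, the pull-back by $g$ on flags is the map $\varphi_{g^{-1}} : \mathcal{F}_\tau(g^{-1}) \subset \mathcal{F}_\tau(V) \to \mathcal{F}_\tau(V)$, $\varphi_{g^{-1}} F := (g^{-1} F_1, \dots, g^{-1} F_k)$, where:

Definition 2.8 The domain of $\varphi_{g^{-1}}$ is defined by $\mathcal{F}_\tau(g^{-1}) := \{F \in \mathcal{F}_\tau(V) : F_1 + \mathrm{R}_g = V\}$.

The duality between push-forwards and pull-backs is expressed as follows.

Proposition 2.4 Given $g \in \mathcal{L}(V)$, $\mathcal{F}_\tau(g^{-1}) = \mathcal{F}_{\tau^\perp}(g^*)^\perp$ and for all $F \in \mathcal{F}_\tau(g^{-1})$, $(\varphi_{g^{-1}} F)^\perp = \varphi_{g^*}(F^\perp)$.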
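The bookkeeping behind the complementary signature $\tau^\perp = (n - \tau_k, \dots, n - \tau_1)$ is easy to get wrong by one index, so a tiny sketch may help. The function name `perp_signature` is hypothetical and chosen for this illustration only.

```python
def perp_signature(tau, n):
    """Signature of the orthogonal complement flag: (n - tau_k, ..., n - tau_1)."""
    if any(a >= b for a, b in zip(tau, tau[1:])):
        raise ValueError("a signature must be strictly increasing")
    return tuple(n - t for t in reversed(tau))
```

For example, in dimension $n = 4$ the signature $(1, 2)$ has complement $(2, 3)$, and applying the operation twice returns the original signature, in agreement with $(F^\perp)^\perp = F$.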

2.2 Singular Value Geometry

Singular value geometry refers here to the geometry of the singular value decomposition (SVD) of a linear endomorphism $g : V \to V$ on some Euclidean space $V$. It also refers to some geometric properties of the action of $g$ on Grassmannians and flag manifolds related to the singular value decomposition of $g$.

2.2.1 Singular Value Decomposition

Let $V$ be a Euclidean space of dimension $n$.

Definition 2.9 Given $g \in \mathcal{L}(V)$, the singular values of $g$ are the square roots of the eigenvalues of the quadratic form $Q_g : V \to \mathbb{R}$, $Q_g(v) = \|g\,v\|^2 = \langle g v, g v \rangle$, i.e., the eigenvalues of the positive semi-definite self-adjoint operator $\sqrt{g^* g}$.

Given $g \in \mathcal{L}(V)$, let $s_1(g) \ge s_2(g) \ge \dots \ge s_n(g) \ge 0$ denote the sorted singular values of $g$. The adjoint $g^*$ has the same singular values as $g$ because the operators $\sqrt{g^* g}$ and $\sqrt{g\,g^*}$ are conjugate. The largest singular value, $s_1(g)$, is the square root of the maximum value of $Q_g$ over the unit sphere, i.e., $s_1(g) = \max_{\|v\|=1} \|g\,v\| = \|g\|$ is the operator norm of $g$. Likewise, the least singular value, $s_n(g)$, is the square root of the minimum value of $Q_g$ over the unit sphere, i.e., $s_n(g) = \min_{\|v\|=1} \|g\,v\|$. This number, also denoted by $m(g)$, is called the least expansion of $g$. If $g$ is invertible then $m(g) = \|g^{-1}\|^{-1}$, while otherwise $m(g) = 0$.
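In dimension two the singular values can be written in closed form, which gives a quick way to test the characterizations above. The following sketch is illustrative and not from the text: the function names are chosen here, matrices are pairs of rows, and the formula uses that $g^* g$ has trace $\sum g_{ij}^2$ and determinant $(\det g)^2$.

```python
import math

def singular_values_2x2(g):
    """Singular values s1 >= s2 of a real 2x2 matrix from the eigenvalues of g^T g."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d      # trace of g^T g
    det = a * d - b * c                    # det(g^T g) = det^2
    disc = math.sqrt(max(0.0, t * t - 4 * det * det))
    return math.sqrt((t + disc) / 2), math.sqrt(max(0.0, (t - disc) / 2))

def sampled_extremes(g, n=20000):
    """Approximate max and min of |g v| over unit vectors by sampling a half circle."""
    (a, b), (c, d) = g
    norms = []
    for i in range(n):
        t = math.pi * i / n
        x, y = math.cos(t), math.sin(t)
        norms.append(math.hypot(a * x + b * y, c * x + d * y))
    return max(norms), min(norms)

def inverse_2x2(g):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]
```

For an invertible `g`, the largest singular value of `inverse_2x2(g)` is the reciprocal of the smallest singular value of `g`, which is the identity $m(g) = \|g^{-1}\|^{-1}$.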


Definition 2.10 The eigenvectors of the quadratic form $Q_g$, i.e., of the positive semi-definite self-adjoint operator $\sqrt{g^* g}$, are called the singular vectors of $g$.

By the spectral theory of self-adjoint operators, for any $g \in \mathcal{L}(V)$ there exists an orthonormal basis consisting of singular vectors of $g$.

Proposition 2.5 Given $g \in \mathcal{L}(V)$, let $v \in V$ be such that $g^* g\,v = \lambda^2 v$ with $\lambda \ge 0$ and $\|v\| = 1$, i.e., $v$ is a unit singular vector of $g$ with singular value $\lambda$. Then there exists a unit vector $w \in V$ such that
(a) $g\,v = \lambda\,w$,
(b) $g\,g^* w = \lambda^2 w$, i.e., $w$ is a singular vector of $g^*$.

Proof Let $v \in V$ be a unit singular vector of $g$. Then $g^* g\,v = \lambda^2 v$ with $\lambda \ge 0$ and

$$\lambda^2 = \lambda^2 \langle v, v \rangle = \langle g^* g\,v, v \rangle = \|g\,v\|^2,$$

which implies that $\lambda = \|g\,v\|$. Since $(g\,g^*)(g\,v) = g\,(g^* g)\,v = \lambda^2 g\,v$, if $\lambda \neq 0$ then setting $w = g\,v / \|g\,v\| = \lambda^{-1} g\,v$, we have $(g\,g^*)\,w = \lambda^2 w$, which proves that $w$ is a singular vector of $g^*$. By definition $g\,v = \lambda\,w$. When $\lambda = 0$, take $w$ to be any unit vector in $\mathrm{K}_{g^*}$. Notice that $\dim(\mathrm{K}_g) = \dim(\mathrm{K}_{g^*})$. In this case $v$ and $w$ are singular vectors of $g$ and $g^*$, respectively, such that $g\,v = 0 = \lambda\,w$. $\square$

By the previous proposition, given $g \in \mathcal{L}(V)$ there exist two orthonormal singular vector bases of $V$, $\{v_1(g), \dots, v_n(g)\}$ and $\{v_1(g^*), \dots, v_n(g^*)\}$ for $g$ and $g^*$, respectively, such that

$$g\,v_j(g) = s_j(g)\,v_j(g^*) \quad \text{for all } 1 \le j \le n.$$

Denote by $D_g$ the diagonal matrix with diagonal entries $s_j(g)$, $1 \le j \le n$, seen as an operator $D_g \in \mathcal{L}(\mathbb{R}^n)$. Define the linear maps $U_g, U_{g^*} : \mathbb{R}^n \to V$ by $U_g(e_j) = v_j(g)$ and $U_{g^*}(e_j) = v_j(g^*)$, for all $1 \le j \le n$, where the $e_j$ are the vectors of the canonical basis in $\mathbb{R}^n$. By construction $U_g$ and $U_{g^*}$ are isometries and the following decomposition holds

$$g = U_{g^*}\, D_g\, (U_g)^*,$$

known as the singular value decomposition (SVD) of $g$.

We say that $g$ has a simple singular spectrum if its $n$ singular values are all distinct. When $g$ has simple singular spectrum, the singular vectors $v_j(g)$ and $v_j(g^*)$ above are uniquely determined up to a sign, and in particular they determine well-defined projective points $\hat v_j(g), \hat v_j(g^*) \in \mathbb{P}(V)$.

Definition 2.11 Given $g \in \mathcal{L}(V)$, we call singular basis of $g$ any orthonormal basis $\{v_1, \dots, v_n\}$ of $V$ formed by singular vectors of $g$ ordered in such a way that $\|g\,v_i\| = s_i(g)$ for all $i = 1, \dots, n$.

Given $g \in \mathcal{L}(V)$, consider singular bases $\{v_1, \dots, v_n\}$ and $\{v_1^*, \dots, v_n^*\}$ for $g$ and $g^*$, respectively, such that

$$g\,v_j = s_j\,v_j^* \quad \text{with } s_j = s_j(g) \quad \text{for all } 1 \le j \le n.$$


For any $I = \{i_1, \dots, i_k\} \in \Lambda^n_k$ we have

$$(\wedge_k g)(v_{i_1} \wedge \dots \wedge v_{i_k}) = (s_{i_1} \cdots s_{i_k})\,(v_{i_1}^* \wedge \dots \wedge v_{i_k}^*).$$

Therefore, by the considerations at the end of Sect. 2.1.2, the families of k-vectors $\{v_I = v_{i_1} \wedge \dots \wedge v_{i_k} : I \in \Lambda^n_k\}$ and $\{v_I^* = v_{i_1}^* \wedge \dots \wedge v_{i_k}^* : I \in \Lambda^n_k\}$ form two singular bases for $\wedge_k g$ and $\wedge_k g^*$, respectively, while the products $s_I = s_{i_1} \cdots s_{i_k}$ are the singular values of both $\wedge_k g$ and $\wedge_k g^*$.

Proposition 2.6 For any $1 \le k \le \dim V$, $\|\wedge_k g\| = s_1(g) \cdots s_k(g)$.

Proof The maximum product $s_I$ is attained when $I = \{1, \dots, k\} \in \Lambda^n_k$. Hence $\|\wedge_k g\| = s_1 \cdots s_k$. $\square$

The volume expansion factor of a linear map $g : V \to V'$ between two Euclidean spaces $V$ and $V'$ is defined by

$$\det\nolimits_+(g) := \sqrt{\det(g^* g)}.$$

This name is justified by the following fact.

Proposition 2.7 Given a linear map $g : V \to V'$ between Euclidean spaces, with $n = \dim V$, for any Borel set $B \subset V$, $\mathrm{Vol}_n(g(B)) = \det_+(g)\,\mathrm{Vol}_n(B)$, where $\mathrm{Vol}_n$ denotes the n-dimensional Hausdorff measure.

Proof Let $\{v_1, \dots, v_n\}$ be any basis of $V$ and consider the parallelepiped $B = P(v_1, \dots, v_n)$. By Proposition 2.9 below and formula (2.6),

$$\mathrm{Vol}_n(g(B)) = \|(g v_1) \wedge \dots \wedge (g v_n)\| = \|(\wedge_n g)(v_1 \wedge \dots \wedge v_n)\| = \|\wedge_n g\|\,\|v_1 \wedge \dots \wedge v_n\| = \det\nolimits_+(g)\,\mathrm{Vol}_n(B).$$

On the third step we have used the fact that $\wedge_n V$ has dimension 1. $\square$

Because of this property the volume expansion factor behaves multiplicatively.

Proposition 2.8 Given Euclidean spaces $V$, $V'$ and $V''$, if $g : V \to V'$ is an isomorphism and $g' : V' \to V''$ any linear map then $\det_+(g' \circ g) = \det_+(g')\,\det_+(g)$.

Proposition 2.9 Let $V$ and $V'$ be Euclidean spaces with $n = \dim V \le \dim V'$. Then for any linear map $g : V \to V'$,

$$\det\nolimits_+(g) = s_1(g) \cdots s_n(g) = \|\wedge_n g\|.$$

Proof The squares $s_i^2 = s_i(g)^2$ ($1 \le i \le n$) are the eigenvalues of $g^* g$. $\square$

Next proposition provides a method to compute the volume expansion factor.

Proposition 2.10 Let $g : V \to V'$ be a linear map between Euclidean spaces. Given orthonormal bases $\{v_i : i = 1, \dots, n\}$ of $V$ and $\{v_i' : i = 1, \dots, n\}$ of the range $g\,V$,

$$\det\nolimits_+(g) = \bigl| \det\bigl( \langle g v_i, v_j' \rangle \bigr)_{i,j} \bigr|.$$

Proof The matrix $A \in \mathrm{Mat}(n, \mathbb{R})$ with entries $a_{ij} = \langle g v_i, v_j' \rangle$ represents the linear map $g$ in the given orthonormal bases. Consider the isometries $U : \mathbb{R}^n \to V$ and $U' : \mathbb{R}^n \to V'$ respectively defined by $U e_i = v_i$ and $U' e_i = v_i'$ for all $i = 1, \dots, n$. Then $g = U' A\, U^*$ and

$$\det\nolimits_+(g)^2 = \det(g^* g) = \det(U A^* A\, U^*) = \det(A^* A) = \det(A)^2.$$

This proves that $\det_+(g) = |\det A|$. $\square$
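For a map $g : \mathbb{R}^2 \to \mathbb{R}^3$ the volume expansion factor is the area of the parallelogram spanned by the two columns of its matrix, which by (2.6) is also the norm of their cross product. The sketch below checks this and the multiplicativity of Proposition 2.8 on one example; it is an illustration with hypothetical function names, and the matrix is stored as a list of three rows of length two.

```python
import math

def columns(g):
    """Columns of a 3x2 matrix given as a list of rows."""
    return [row[0] for row in g], [row[1] for row in g]

def det_plus(g):
    """Volume expansion factor sqrt(det(g^T g)) of a 3x2 matrix."""
    a, b = columns(g)
    aa = sum(x * x for x in a)
    bb = sum(x * x for x in b)
    ab = sum(x * y for x, y in zip(a, b))
    return math.sqrt(max(0.0, aa * bb - ab * ab))

def parallelogram_area(g):
    """Area spanned by the columns, computed as the norm of their cross product."""
    a, b = columns(g)
    cx = a[1] * b[2] - a[2] * b[1]
    cy = a[2] * b[0] - a[0] * b[2]
    cz = a[0] * b[1] - a[1] * b[0]
    return math.sqrt(cx * cx + cy * cy + cz * cz)

def compose(g, h):
    """Matrix of g o h for a 3x2 matrix g and a 2x2 matrix h."""
    return [[sum(row[l] * h[l][j] for l in range(2)) for j in range(2)] for row in g]
```

If `h` is an invertible 2x2 matrix, then `det_plus(compose(g, h))` agrees with `det_plus(g) * abs(det h)`, since $\det_+(h) = |\det h|$ for a square matrix.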



2.2.2 Gaps and Most Expanding Directions

Consider a linear map $g \in \mathcal{L}(V)$ and a number $1 \le k < \dim V$.

Definition 2.12 The kth gap ratio of $g$ is defined to be

$$\mathrm{gr}_k(g) := \frac{s_k(g)}{s_{k+1}(g)} \ge 1.$$

We will also write $\mathrm{gr}(g)$ instead of $\mathrm{gr}_1(g)$.

Definition 2.13 We say that $g$ has a first singular gap when $\mathrm{gr}(g) > 1$. More generally, we say that $g$ has a k singular gap when $\mathrm{gr}_k(g) > 1$.

In some occasions it is convenient to work with the inverse quantity, denoted by

$$\sigma_k(g) := \mathrm{gr}_k(g)^{-1} \le 1. \tag{2.15}$$

Proposition 2.11 For any $1 \le k < \dim V$,

$$\mathrm{gr}_k(g) = \frac{\|\wedge_k g\|^2}{\|\wedge_{k-1} g\|\,\|\wedge_{k+1} g\|} = \mathrm{gr}_1(\wedge_k g).$$

Proof The first equality follows from Proposition 2.6. The two first singular values of $\wedge_k g$ are $s_1(\wedge_k g) = s_1(g) \cdots s_{k-1}(g)\,s_k(g)$ and $s_2(\wedge_k g) = s_1(g) \cdots s_{k-1}(g)\,s_{k+1}(g)$. Hence

$$\mathrm{gr}_1(\wedge_k g) = \frac{s_1(\wedge_k g)}{s_2(\wedge_k g)} = \frac{s_k(g)}{s_{k+1}(g)} = \mathrm{gr}_k(g). \qquad \square$$
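A small worked example makes Proposition 2.11 concrete. It is added here for illustration and uses the convention $\|\wedge_0 g\| = 1$, which is an assumption of this example rather than something stated in the text.

```latex
% Worked example: g = diag(4, 2, 1) on R^3, so s_1 = 4, s_2 = 2, s_3 = 1.
\begin{align*}
\|\wedge_0 g\| &= 1, \quad \|\wedge_1 g\| = 4, \quad \|\wedge_2 g\| = 4 \cdot 2 = 8, \quad
\|\wedge_3 g\| = 4 \cdot 2 \cdot 1 = 8, \\
\mathrm{gr}_1(g) &= \frac{\|\wedge_1 g\|^2}{\|\wedge_0 g\|\,\|\wedge_2 g\|} = \frac{16}{1 \cdot 8} = 2 = \frac{s_1}{s_2}, \\
\mathrm{gr}_2(g) &= \frac{\|\wedge_2 g\|^2}{\|\wedge_1 g\|\,\|\wedge_3 g\|} = \frac{64}{4 \cdot 8} = 2 = \frac{s_2}{s_3}, \\
\wedge_2 g &= \mathrm{diag}(8, 4, 2) \ \text{in the basis } \{e_1 \wedge e_2,\ e_1 \wedge e_3,\ e_2 \wedge e_3\},
\quad\text{so}\quad \mathrm{gr}_1(\wedge_2 g) = \frac{8}{4} = 2 = \mathrm{gr}_2(g).
\end{align*}
```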



Given $g \in \mathcal{L}(V)$, if $\mathrm{gr}(g) > 1$ then the singular value $s_1(g) = \|g\|$ is simple.

Definition 2.14 In this case we denote by $\overline{v}(g) \in \mathbb{P}(V)$ the associated singular direction, and refer to it as the g-most expanding direction.

By definition we have

$$\varphi_g\, \overline{v}(g) = \overline{v}(g^*). \tag{2.16}$$

More generally, given $1 \le k < \dim V$, we have:

Definition 2.15 If $\mathrm{gr}_k(g) > 1$ we define the g-most expanding k-subspace to be $\overline{v}_k(g) := \Psi^{-1}\bigl( \overline{v}(\wedge_k g) \bigr)$, where $\Psi$ stands for the Plücker embedding defined in Sect. 2.1.3.

The subspace $\overline{v}_k(g)$ is the direct sum of all singular directions associated with the singular values $s_1(g), \dots, s_k(g)$. We have

$$\varphi_g\, \overline{v}_k(g) = \overline{v}_k(g^*). \tag{2.17}$$

Analogously, let $n = \dim V$ and assume $\mathrm{gr}_{n-k}(g) > 1$.

Definition 2.16 We define the g-least expanding k-subspace as $\underline{v}_k(g) := \overline{v}_{n-k}(g)^\perp$.

The subspace $\underline{v}_k(g)$ is the direct sum of all singular directions associated with the singular values $s_{n-k+1}(g), \dots, s_n(g)$. Again we have

$$\varphi_g\, \underline{v}_k(g) = \underline{v}_k(g^*). \tag{2.18}$$

Let $\tau = (\tau_1, \dots, \tau_k)$ be a signature with $1 \le \tau_1 < \dots < \tau_k < \dim V$.

Definition 2.17 We define the $\tau$-gap ratio of $g$ to be

$$\mathrm{gr}_\tau(g) := \min_{1 \le j \le k} \mathrm{gr}_{\tau_j}(g).$$

When $\mathrm{gr}_\tau(g) > 1$ we say that $g$ has a $\tau$-gap pattern. Note that $\mathrm{gr}_\tau(g) > 1$ means that $g$ has a $\tau_j$ singular gap for $1 \le j \le k$.

Recall that $\mathcal{F}_\tau(V)$ denotes the space of all $\tau$-flags, i.e., flags $F = (F_1, \dots, F_k)$ such that $\dim(F_j) = \tau_j$ for $j = 1, \dots, k$.


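Definition 2.18 If $\mathrm{gr}_\tau(g) > 1$ then the most expanding $\tau$-flag is $\overline{v}_\tau(g) := (\overline{v}_{\tau_1}(g), \dots, \overline{v}_{\tau_k}(g)) \in \mathcal{F}_\tau(V)$.

Given $g \in \mathcal{L}(V)$ the domain of its push-forward action on $\mathcal{F}_\tau(V)$ is

Definition 2.19 $\mathcal{F}_\tau(g) := \{F \in \mathcal{F}_\tau(V) : F_k \cap \mathrm{K}_g = \{0\}\}$.

The push-forward of a flag $F \in \mathcal{F}_\tau(g)$ by $g$ is $\varphi_g F = g\,F := (g F_1, \dots, g F_k)$.

Proposition 2.12 Given $g \in \mathcal{L}(V)$ such that $\mathrm{gr}_\tau(g) > 1$, the push-forward induces a map $\varphi_g : \mathcal{F}_\tau(g) \to \mathcal{F}_\tau(g^*)$ such that $\varphi_g\, \overline{v}_\tau(g) = \overline{v}_\tau(g^*)$.

Proof Given $F \in \mathcal{F}_\tau(g)$, we have $F_j \cap \mathrm{K}_g = \{0\}$ for all $j = 1, \dots, k$. Hence $\dim g F_j = \dim F_j = \tau_j$ for all $j$, which proves that $\varphi_g F \in \mathcal{F}_\tau(V)$. To check that $\varphi_g F \in \mathcal{F}_\tau(g^*)$ we need to show that $g F_k \cap \mathrm{K}_{g^*} = \{0\}$. Assume $g\,v \in \mathrm{K}_{g^*}$, with $v \in F_k$, and let us see that $g\,v = 0$. By assumption $g^* g\,v = 0$, which implies $(g\,g^*)\,g\,v = 0$. Since the self-adjoint map $g\,g^*$ induces an automorphism on $\mathrm{R}_g$, we conclude that $g\,v = 0$. The second statement follows from (2.17). $\square$

Given $g \in \mathcal{L}(V)$, the domain of its pull-back action on $\mathcal{F}_\tau(V)$ is

Definition 2.20 $\mathcal{F}_\tau^{-1}(g) := \{F \in \mathcal{F}_\tau(V) : F_1 + \mathrm{R}_g = V\}$.

The pull-back of a flag $F \in \mathcal{F}_\tau^{-1}(g)$ by $g$ is $\varphi_{g^{-1}} F = g^{-1} F := (g^{-1} F_1, \dots, g^{-1} F_k)$.

Definition 2.21 If $\mathrm{gr}_{\tau^\perp}(g) > 1$ the least expanding $\tau$-flag is $\underline{v}_\tau(g) := (\underline{v}_{\tau_1}(g), \dots, \underline{v}_{\tau_k}(g)) \in \mathcal{F}_\tau(V)$.

Proposition 2.13 If $\mathrm{gr}_\tau(g) > 1$ then $\underline{v}_{\tau^\perp}(g) = \overline{v}_\tau(g)^\perp$.

Proof Let $\{v_1, \dots, v_n\}$ be a singular basis of $g$. Since this basis is orthonormal, $\underline{v}_{n-k}(g) = \langle v_{k+1}, \dots, v_n \rangle = \langle v_1, \dots, v_k \rangle^\perp = \overline{v}_k(g)^\perp$. Hence

$$\underline{v}_{\tau^\perp}(g) = (\underline{v}_{n-\tau_k}(g), \dots, \underline{v}_{n-\tau_1}(g)) = (\overline{v}_{\tau_1}(g), \dots, \overline{v}_{\tau_k}(g))^\perp = \overline{v}_\tau(g)^\perp. \qquad \square$$

Proposition 2.14 Given $g \in \mathcal{L}(V)$ such that $\mathrm{gr}_{\tau^\perp}(g) > 1$, the pull-back induces a map $\varphi_{g^{-1}} : \mathcal{F}_\tau^{-1}(g) \to \mathcal{F}_\tau^{-1}(g^*)$ such that $\varphi_{g^{-1}}\, \underline{v}_\tau(g) = \underline{v}_\tau(g^*)$.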


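Proof Given $F \in \mathcal{F}_\tau^{-1}(g)$, we have $F_j + \mathrm{R}_g = V$ for all $j = 1, \dots, k$. Hence $\dim g^{-1} F_j = \dim F_j = \tau_j$ for all $j$, which proves that $\varphi_{g^{-1}} F \in \mathcal{F}_\tau(V)$. To check that $\varphi_{g^{-1}} F \in \mathcal{F}_\tau^{-1}(g^*)$ just notice that $g^{-1} F_1 + \mathrm{R}_{g^*} \supseteq \mathrm{K}_g + \mathrm{K}_g^\perp = V$. The second statement follows from (2.18) and Proposition 2.13. $\square$

We end this section proving that the orthogonal complement involution conjugates the push-forward action by $g \in \mathcal{L}(V)$ with the pull-back action by the adjoint map $g^*$.

Proposition 2.15 Given $g \in \mathcal{L}(V)$ such that $\mathrm{gr}_{\tau^\perp}(g) > 1$, the action of $\varphi_{g^{-1}}$ on $\mathcal{F}_\tau(V)$ is conjugated to the action of $\varphi_{g^*}$ on $\mathcal{F}_{\tau^\perp}(V)$ by the orthogonal complement involution. More precisely, we have $\mathcal{F}_\tau^{-1}(g) = \mathcal{F}_{\tau^\perp}(g^*)^\perp$ and $\mathcal{F}_\tau^{-1}(g^*) = \mathcal{F}_{\tau^\perp}(g)^\perp$, and the following diagram commutes

$$\begin{array}{ccc}
\mathcal{F}_{\tau^\perp}(g^*) & \xrightarrow{\ \varphi_{g^*}\ } & \mathcal{F}_{\tau^\perp}(g) \\
\big\downarrow {\scriptstyle \cdot^\perp} & & \big\downarrow {\scriptstyle \cdot^\perp} \\
\mathcal{F}_\tau^{-1}(g) & \xrightarrow{\ \varphi_{g^{-1}}\ } & \mathcal{F}_\tau^{-1}(g^*)
\end{array}$$

Proof To see that $\mathcal{F}_\tau^{-1}(g) = \mathcal{F}_{\tau^\perp}(g^*)^\perp$, notice that the following equivalences hold:

$$F \in \mathcal{F}_\tau^{-1}(g) \ \Leftrightarrow\ F_1 + \mathrm{R}_g = V \ \Leftrightarrow\ F_1^\perp \cap \mathrm{K}_{g^*} = \{0\} \ \Leftrightarrow\ F^\perp \in \mathcal{F}_{\tau^\perp}(g^*).$$

Exchanging the roles of $g$ and $g^*$ we obtain the relation $\mathcal{F}_\tau^{-1}(g^*) = \mathcal{F}_{\tau^\perp}(g)^\perp$. Finally, notice that it is enough to prove the diagram’s commutativity at the Grassmannian level. For that use Proposition 2.3. $\square$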

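2.2.3 Angles and Expansion

Throughout this section let $\hat p, \hat q \in \mathbb{P}(V)$, and $p \in \hat p$, $q \in \hat q$ denote representative vectors. The projective distance $\delta(\hat p, \hat q)$ was defined by

$$\delta(\hat p, \hat q) := \sqrt{1 - \frac{\langle p, q \rangle^2}{\|p\|^2\,\|q\|^2}} = \frac{\|p \wedge q\|}{\|p\|\,\|q\|} = \sin\rho(\hat p, \hat q).$$

We also define the minimum distance between any two subspaces $E, F \in \mathrm{Gr}(V)$,

$$\delta_{\min}(E, F) := \min_{u \in E \setminus \{0\},\ v \in F \setminus \{0\}} \delta(\hat u, \hat v), \tag{2.19}$$

and the Hausdorff distance between subspaces $E, F \in \mathrm{Gr}_k(V)$,

$$\delta_H(E, F) := \max\Bigl\{ \max_{u \in E \setminus \{0\}} \delta_{\min}(\hat u, F),\ \max_{v \in F \setminus \{0\}} \delta_{\min}(\hat v, E) \Bigr\}.$$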

Given a unit vector v ∈ V , v = 1, denote by πv , πv⊥ : V → V the orthogonal projections πv (x) := v, x v, respectively πv⊥ (x) := x − v, x v. Lemma 2.2 Given u, v ∈ V non-collinear with u = v = 1, denote by P the plane spanned by u and v. Then πv − πu is a self-adjoint endomorphism, K(πv − πu ) = P⊥ ,

πv − πu : P → P is anti-conformal with similarity factor

the restriction

sin ∠(u, v) , (d) πv⊥ − πu⊥  = πv − πu  = δ(ˆu, vˆ ).

(a) (b) (c)

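Proof Item (a) follows because orthogonal projections are self-adjoint operators. Given $w \in P^\perp$, we have $\pi_u(w) = \pi_v(w) = 0$, which implies $w \in \mathrm{K}(\pi_u - \pi_v)$. Hence $P^\perp \subset \mathrm{K}(\pi_u - \pi_v)$. Since $u$ and $v$ are non-collinear, $\pi_u - \pi_v$ has rank 2. Thus $\mathrm{K}(\pi_u - \pi_v) = P^\perp$, which proves (b).

For (c) we may assume that $V = \mathbb{R}^2$ and consider $u = (u_1, u_2)$, $v = (v_1, v_2)$, with $u_1^2 + u_2^2 = v_1^2 + v_2^2 = 1$. The projections $\pi_u$ and $\pi_v$ are represented by the matrices

$$U = \begin{pmatrix} u_1^2 & u_1 u_2 \\ u_1 u_2 & u_2^2 \end{pmatrix} \quad \text{and} \quad V = \begin{pmatrix} v_1^2 & v_1 v_2 \\ v_1 v_2 & v_2^2 \end{pmatrix}$$

w.r.t. the canonical basis. Hence $\pi_v - \pi_u$ is given by

$$V - U = \begin{pmatrix} v_1^2 - u_1^2 & v_1 v_2 - u_1 u_2 \\ v_1 v_2 - u_1 u_2 & v_2^2 - u_2^2 \end{pmatrix} = \begin{pmatrix} \beta & \alpha \\ \alpha & -\beta \end{pmatrix},$$

where $\alpha = v_1 v_2 - u_1 u_2$ and $\beta = v_1^2 - u_1^2 = -(v_2^2 - u_2^2)$. This proves that the restriction of $\pi_v - \pi_u$ to the plane $P$ is anti-conformal. The similarity factor of this map is

$$\|\pi_v - \pi_u\| = \|\pi_v(u) - u\| = \|\pi_v^\perp(u)\| = \sin \angle(u, v).$$

Finally, since $u - \langle v, u\rangle\, v \perp v$,

$$\|\pi_v^\perp - \pi_u^\perp\|^2 = \|\pi_v - \pi_u\|^2 = \|\pi_v^\perp(u)\|^2 = \|u - \langle v, u\rangle\, v\|^2 = \|u \wedge v\|^2 = \delta(\hat u, \hat v)^2.$$

□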
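The planar computation in the proof of Lemma 2.2 can be checked numerically. The following is a minimal sketch, assuming the parametrization $u = (\cos a, \sin a)$, $v = (\cos b, \sin b)$; the function names are illustrative and not part of the text. It verifies that $V - U$ has the trace-free symmetric form with entries $\beta, \alpha, \alpha, -\beta$ and that $\sqrt{\alpha^2 + \beta^2}$ equals $|\sin(b - a)|$.

```python
import math

def projection_difference(a, b):
    """Matrix of pi_v - pi_u for u = (cos a, sin a), v = (cos b, sin b)."""
    u1, u2 = math.cos(a), math.sin(a)
    v1, v2 = math.cos(b), math.sin(b)
    beta = v1 * v1 - u1 * u1
    alpha = v1 * v2 - u1 * u2
    return [[beta, alpha], [alpha, v2 * v2 - u2 * u2]]

def check_lemma_2_2(a, b, tol=1e-12):
    m = projection_difference(a, b)
    beta, alpha = m[0][0], m[0][1]
    # The lower-right entry must equal -beta (trace-free matrix).
    trace_free = abs(m[1][1] + beta) < tol
    # The similarity factor sqrt(alpha^2 + beta^2) must equal |sin(b - a)|.
    factor = math.hypot(alpha, beta)
    return trace_free and abs(factor - abs(math.sin(b - a))) < tol

angles = [0.1 * k for k in range(1, 30)]
all_ok = all(check_lemma_2_2(a, b) for a in angles for b in angles if a != b)
```

A matrix of this form is a reflection composed with a scaling by $\sqrt{\alpha^2 + \beta^2}$, which is what anti-conformal means here.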



Lemma 2.3 Let $V$ be a Euclidean space of even dimension $2k$ and let $E, F \in \mathrm{Gr}_k(V)$ be subspaces such that $V = E \oplus F$. Then the linear map $\pi_E - \pi_F$ admits an invariant decomposition $V = P_1 \oplus \cdots \oplus P_k$ into pairwise orthogonal planes $P_j$ such that

(1) each $P_j$ is invariant under $\pi_E$ and $\pi_F$,
(2) $P_j = E_j \oplus F_j$, where $E_j = E \cap P_j$, $F_j = F \cap P_j$ and $\dim E_j = \dim F_j = 1$,
(3) $(\pi_E - \pi_F)|_{P_j} : P_j \to P_j$ is anti-conformal.

Proof Choose unit vectors $u_0 \in E$ and $v_0 \in F$ such that $\angle(u_0, v_0) = \max\{\angle(u, v) : u \in E \setminus \{0\},\ v \in F \setminus \{0\}\}$. Then the function $f(u) = \|u - v_0\|^2$ defined over the unit sphere in $E$ attains its maximum value at $u_0$. By the method of Lagrange multipliers, $\pi_E(u_0 - v_0)$ is collinear with $u_0$, which implies that $\pi_E(v_0)$ is also collinear with $u_0$. Therefore $\pi_E(v_0) = \langle u_0, v_0\rangle\, u_0$. By a similar argument, $\pi_F(u_0) = \langle u_0, v_0\rangle\, v_0$. The plane $P$ spanned by the vectors $u_0$ and $v_0$ is invariant under both projections $\pi_E$ and $\pi_F$. Hence, by Lemma 2.2 the restriction $\pi_E - \pi_F : P \to P$ is anti-conformal. Now the orthogonal complement $P^\perp$ is also invariant under $\pi_E$, $\pi_F$ and $\pi_E - \pi_F$. Defining $E_0 = E \cap P^\perp$ and $F_0 = F \cap P^\perp$, we have $P^\perp = E_0 \oplus F_0$ and $\pi_E - \pi_F = \pi_{E_0} - \pi_{F_0}$ over $P^\perp$, where $\pi_{E_0}$ and $\pi_{F_0}$ denote orthogonal projections on $P^\perp$. The claim of this lemma follows proceeding inductively with $\pi_{E_0} - \pi_{F_0}$. □

Definition 2.22 Given $E, F \in \mathrm{Gr}(V)$, we denote by $\pi_F : V \to V$ the orthogonal projection onto $F$, and by $\pi_{E,F} : E \to F$ the restriction of $\pi_F$ to $E$.

Proposition 2.16 Given $E, F \in \mathrm{Gr}_k(V)$,

(a) $\delta(E, F) = \sqrt{1 - \det_+(\pi_{E,F})^2} = \sqrt{1 - \det_+(\pi_{F,E})^2}$,
(b) $\delta_H(E, F) = \|\pi_{E,F^\perp}\| = \|\pi_{F,E^\perp}\| = \|\pi_E - \pi_F\|$,
(c) $\delta_H(E, F) \le \delta(E, F)$.

Proof Consider the unit $k$-vectors $e = \Psi(E)$ and $f = \Psi(F)$. For (a) notice first that $\delta(E, F) = \delta(e, f) = \sqrt{1 - \langle e, f\rangle^2}$. Since the exterior power $\wedge_k \pi_{F,E} : \wedge_k F \to \wedge_k E$ is also an orthogonal projection we have

$$|\langle e, f\rangle| = |\langle e, \wedge_k \pi_{F,E}(f)\rangle| = \|\wedge_k \pi_{F,E}\| = \det\nolimits_+(\pi_{F,E}).$$

Take an orthogonal reflexion $g \in O(V)$ such that $g(F) = E$ and $g(E) = F$. We have $g^{-1}(E^\perp) = F^\perp$ and $\pi_{E,F^\perp} = g^{-1} \circ \pi_{F,E^\perp} \circ g$. Therefore $\|\pi_{E,F^\perp}\| = \|\pi_{F,E^\perp}\|$. We have $\delta_H(E, F) = \|\pi_{E,F^\perp}\|$ because for any unit vector $u \in \hat u$, with $\hat u \in \mathbb{P}(E)$,

$$\|\pi_{E,F^\perp}(u)\| = \min_{v \in F \setminus \{0\}} \delta(\hat u, \hat v).$$

To finish (b) we still have to prove that $\|\pi_E - \pi_F\| = \|\pi_{E,F^\perp}\|$. Restricting our attention to the subspace $V_0 = (E \cap (E \cap F)^\perp) \oplus (F \cap (E \cap F)^\perp)$, because $\pi_E - \pi_F$ vanishes on $V_0^\perp$ we can assume that $V = E \oplus F$. In particular $\dim V = 2k$. Consider the orthogonal invariant decomposition of Lemma 2.3. It is enough to check that the relation $\|\pi_E - \pi_F\| = \|\pi_{E,F^\perp}\|$ holds on each plane $P_j$. Therefore we may as well assume that $k = 1$. Notice that over the subspace $E$ we have $\pi_E - \pi_F = \pi_{E,F^\perp}$. Since the linear map $\pi_E - \pi_F$ is anti-conformal, the norm $\|\pi_E - \pi_F\|$ is attained along $E$, which implies that $\|\pi_E - \pi_F\| = \|\pi_{E,F^\perp}\|$. This proves item (b).

Since $\pi_{E,F}$ is an orthogonal projection all its singular values are in the range $[0, 1]$. Hence, for any unit vector $u \in E$, $\|\pi_{E,F}(u)\| \ge m(\pi_{E,F}) \ge \det_+(\pi_{E,F})$. Thus

$$\|\pi_{E,F^\perp}(u)\|^2 = 1 - \|\pi_{E,F}(u)\|^2 \le 1 - \det\nolimits_+(\pi_{E,F})^2.$$

Item (c) follows taking the maximum over all unit vectors $u \in E$. □



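The following complementary quantity to the distance $\delta(\hat p, \hat q)$ plays a special role in the sequel.

Definition 2.23 The α-angle between $\hat p$ and $\hat q$ is defined to be

$$\alpha(\hat p, \hat q) := \frac{|\langle p, q\rangle|}{\|p\|\,\|q\|} = \cos \rho(\hat p, \hat q).$$

In order to give a geometric meaning to this angle we define the projective orthogonal hyperplane of $\hat p \in \mathbb{P}(V)$ as

$$\Sigma(\hat p) := \{\hat x \in \mathbb{P}(V) : \langle x, p\rangle = 0 \ \text{for } x \in \hat x\}.$$

The number $\alpha(\hat p, \hat q)$ is the sine of the minimum angle between $\hat p$ and $\Sigma(\hat q)$. As in (2.19), given a subspace $F \subset V$ we write

$$\rho_{\min}(\hat p, F) := \min_{q \in F \setminus \{0\}} \rho(\hat p, \hat q).$$

Proposition 2.17 For any $\hat p, \hat q \in \mathbb{P}(V)$,

$$\alpha(\hat p, \hat q) = \sin \rho_{\min}(\hat p, \Sigma(\hat q)) = \delta_{\min}(\hat p, \Sigma(\hat q)), \tag{2.20}$$

$$\alpha(\hat p, \hat q) = 0 \ \Leftrightarrow\ \delta(\hat p, \hat q) = 1 \ \Leftrightarrow\ p \perp q. \tag{2.21}$$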
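The quantities $\delta$ and $\alpha$ are complementary in the sense that $\delta^2 + \alpha^2 = 1$, which is what (2.21) expresses at the extreme values. A small sketch in plain Python, assuming vectors are given as tuples of floats (the helper names are illustrative only):

```python
import math

def inner(p, q):
    return sum(x * y for x, y in zip(p, q))

def norm(p):
    return math.sqrt(inner(p, p))

def alpha(p, q):
    """alpha-angle |<p,q>| / (|p| |q|) between the lines spanned by p and q."""
    return abs(inner(p, q)) / (norm(p) * norm(q))

def delta(p, q):
    """Projective sine-distance sqrt(1 - <p,q>^2 / (|p|^2 |q|^2))."""
    c = alpha(p, q)
    return math.sqrt(max(0.0, 1.0 - c * c))

p = (1.0, 2.0, 2.0)
q = (2.0, -1.0, 0.0)   # orthogonal to p
r = (3.0, 6.0, 6.0)    # collinear with p
s = (1.0, 1.0, 0.0)    # generic direction
```

Both functions depend only on the lines spanned by the arguments, so rescaling `p` or `q` by a non-zero factor leaves the values unchanged.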

These concepts extend naturally to Grassmannians and flag manifolds.

Definition 2.24 Given $E, F \in \mathrm{Gr}_k(V)$, we define the α-angle between them

$$\alpha(E, F) = \alpha_k(E, F) := \alpha(\Psi(E), \Psi(F)),$$

where $\Psi : \mathrm{Gr}_k(V) \to \mathbb{P}(\wedge_k V)$ denotes the Plücker embedding (see Sect. 2.1.3).

Definition 2.25 We say that two $k$-subspaces $E, F \in \mathrm{Gr}_k(V)$ are orthogonal, and we write $E \perp F$, iff $\alpha(E, F) = 0$.

The Grassmannian orthogonal hyperplane of $F$ is defined as

$$\Sigma(F) := \{E \in \mathrm{Gr}_k(V) : \alpha(E, F) = 0\}.$$

As before, the number $\alpha(E, F)$ equals the sine of the minimum angle between $E$ and $\Sigma(F)$.

Proposition 2.18 For any $E, F \in \mathrm{Gr}_k(V)$,

$$\alpha(E, F) = \sin \rho_{\min}(E, \Sigma(F)) = \delta_{\min}(E, \Sigma(F)).$$


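Next we characterize the angle $\alpha(E, F)$. Consider the notation of Definition 2.22.

Proposition 2.19 Given $E, F \in \mathrm{Gr}_k(V)$,

(a) $\alpha(E, F) = \alpha(E^\perp, F^\perp)$,
(b) $\alpha(E, F) = \det_+(\pi_{E,F}) = \det_+(\pi_{F,E})$,
(c) $E \perp F$ iff there exists a pair $(e, f)$ of unit vectors such that $e \in E \cap F^\perp$ and $f \in F \cap E^\perp$,
(d) $\alpha(E, F) \le \|\pi_{E,F}\| = \sqrt{1 - \delta_{\min}(E, F^\perp)^2}$.

Proof Given $E, F \in \mathrm{Gr}_k(V)$, take orthonormal bases $\{u_1, \ldots, u_k\}$ and $\{v_1, \ldots, v_k\}$ of $E$ and $F$, respectively, and consider the associated unit $k$-vectors $u = u_1 \wedge \cdots \wedge u_k$ and $v = v_1 \wedge \cdots \wedge v_k$, so that $u \in \Psi(E)$ and $v \in \Psi(F)$. Using the Hodge star operator we obtain unit vectors $\ast u \in \Psi(E^\perp)$ and $\ast v \in \Psi(F^\perp)$. Hence

$$\alpha(E^\perp, F^\perp) = |\langle \ast u, \ast v\rangle| = |\langle u, v\rangle| = \alpha(E, F),$$

which proves (a). Also

$$\alpha(E, F) := |\langle u_1 \wedge \cdots \wedge u_k,\ v_1 \wedge \cdots \wedge v_k\rangle| = \left| \det \begin{pmatrix} \langle u_1, v_1\rangle & \langle u_1, v_2\rangle & \cdots & \langle u_1, v_k\rangle \\ \langle u_2, v_1\rangle & \langle u_2, v_2\rangle & \cdots & \langle u_2, v_k\rangle \\ \vdots & \vdots & \ddots & \vdots \\ \langle u_k, v_1\rangle & \langle u_k, v_2\rangle & \cdots & \langle u_k, v_k\rangle \end{pmatrix} \right| = \det\nolimits_+(\pi_{E,F}).$$

For the second equality above write $u_i = w_i + \sum_{j=1}^k \langle u_i, v_j\rangle\, v_j$ with $w_i \in F^\perp$ and use the anti-symmetry of the exterior product. For the third equality remark that the matrix with entries $\langle u_i, v_j\rangle$ represents $\pi_{E,F}$ w.r.t. the given orthonormal bases. By symmetry, $\alpha(E, F) = \det_+(\pi_{F,E})$. This proves (b). From these relations,

$$\alpha(E, F) = 0 \ \Leftrightarrow\ \mathrm{K}(\pi_{E,F}) \ne \{0\} \ \Leftrightarrow\ \mathrm{K}(\pi_{F,E}) \ne \{0\},$$

which explains (c). Finally, because all singular values of $\pi_{E,F}$ are in $[0, 1]$,

$$\alpha(E, F) = \det\nolimits_+(\pi_{E,F}) \le \|\pi_{E,F}\| = \max_{u \in E,\ \|u\| = 1} \|\pi_{E,F}(u)\| = \max_{u \in E,\ \|u\| = 1} \sqrt{1 - \|\pi_{E,F^\perp}(u)\|^2} = \sqrt{1 - \delta_{\min}(E, F^\perp)^2},$$

which proves (d). □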
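Items (a) and (b) of Proposition 2.19 can be illustrated for planes in $\mathbb{R}^3$, where the orthogonal complement of a plane is its normal line. The sketch below is only an illustration under that assumption: planes are given by two spanning vectors, `alpha_planes` evaluates the Gram determinant $|\det(\langle u_i, v_j\rangle)|$ on orthonormal bases, and `alpha_normals` evaluates the α-angle of the normal lines. All names are hypothetical helpers.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def orthonormal_basis(a, b):
    """Gram-Schmidt on two linearly independent vectors of R^3."""
    e1 = unit(a)
    c = dot(b, e1)
    e2 = unit(tuple(y - c * x for x, y in zip(e1, b)))
    return e1, e2

def alpha_planes(E, F):
    """|det(<u_i, v_j>)| for orthonormal bases (u_i) of E and (v_j) of F."""
    (u1, u2), (v1, v2) = orthonormal_basis(*E), orthonormal_basis(*F)
    return abs(dot(u1, v1) * dot(u2, v2) - dot(u1, v2) * dot(u2, v1))

def alpha_normals(E, F):
    """alpha-angle between the normal lines, i.e. between E-perp and F-perp."""
    return abs(dot(unit(cross(*E)), unit(cross(*F))))

E = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))    # the xy-plane
F = ((1.0, 0.0, 0.0), (0.0, 1.0, 1.0))    # the xy-plane tilted by 45 degrees
G = ((1.0, 2.0, 0.5), (-1.0, 0.3, 2.0))   # a generic plane
```

For `E` and `F` both functions return $\cos 45^\circ$, in agreement with $\alpha(E, F) = \alpha(E^\perp, F^\perp)$.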




Next we extend α-angles to flags. Consider a signature $\tau$ of length $k$.

Definition 2.26 Given flags $F, G \in \mathcal{F}_\tau(V)$, define

$$\alpha(F, G) = \alpha_\tau(F, G) := \min_{1 \le j \le k} \alpha(F_j, G_j).$$

Definition 2.27 We say that two $\tau$-flags $F, G \in \mathcal{F}_\tau(V)$ are orthogonal, and we write $F \perp G$, if $F_j \perp G_j$ for some $j = 1, \ldots, k$.

Comparing the two definitions, for all $F, G \in \mathcal{F}_\tau(V)$

$$\alpha_\tau(F, G) = 0 \ \Leftrightarrow\ G \perp F.$$

Hence, the orthogonal flag hyperplane of $F$ is defined as

$$\Sigma(F) := \{G \in \mathcal{F}_\tau(V) : \alpha(G, F) = 0\}.$$

As in the previous cases, the number $\alpha_\tau(F, G)$ equals the sine of the minimum angle between $F$ and $\Sigma(G)$.

Proposition 2.20 For any $F, G \in \mathcal{F}_\tau(V)$,

$$\alpha(F, G) = \sin \rho_{\min}(F, \Sigma(G)) = \delta_{\min}(F, \Sigma(G)).$$

Consider a sequence of linear maps $g_0, g_1, \ldots, g_{n-1} \in \mathcal{L}(V)$. The following quantities, called expansion rifts, measure the break of expansion in the composition $g_{n-1} \cdots g_1\, g_0$ of the maps $g_j$.

Definition 2.28 The first expansion rift of the sequence above is the number

$$\rho(g_0, g_1, \ldots, g_{n-1}) := \frac{\|g_{n-1} \cdots g_1\, g_0\|}{\|g_{n-1}\| \cdots \|g_1\|\, \|g_0\|} \in [0, 1].$$

Given $1 \le k \le \dim V$, the $k$th expansion rift is

$$\rho_k(g_0, g_1, \ldots, g_{n-1}) := \rho(\wedge_k g_0, \wedge_k g_1, \ldots, \wedge_k g_{n-1}).$$

Given a signature $\tau = (\tau_1, \ldots, \tau_k)$, the $\tau$-expansion rift is defined as

$$\rho_\tau(g_0, g_1, \ldots, g_{n-1}) := \min_{1 \le j \le k} \rho_{\tau_j}(g_0, g_1, \ldots, g_{n-1}).$$

The key concept of this section is that of angle between linear maps. The quantity $\alpha(g, g')$, for instance, is the sine of the angle between $\varphi_g(\mathfrak{v}(g)) = \mathfrak{v}(g^\ast)$ and $\Sigma(\mathfrak{v}(g'))$. As we will see, this angle is a lower bound on the expansion rift of two linear maps $g$ and $g'$.


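Definition 2.29 Given $g, g' \in \mathcal{L}(V)$, we define

$$\begin{aligned}
\alpha(g, g') &:= \alpha(\mathfrak{v}(g^\ast), \mathfrak{v}(g')) && \text{if } g \text{ and } g' \text{ have a first gap ratio,} \\
\alpha_k(g, g') &:= \alpha(\mathfrak{v}_k(g^\ast), \mathfrak{v}_k(g')) && \text{if } g \text{ and } g' \text{ have a } k \text{ gap ratio,} \\
\alpha_\tau(g, g') &:= \alpha(\mathfrak{v}_\tau(g^\ast), \mathfrak{v}_\tau(g')) && \text{if } g \text{ and } g' \text{ have a } \tau \text{ gap pattern.}
\end{aligned}$$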

The following exotic operation is introduced to obtain an upper bound on the expansion rift $\rho(g, g')$. Consider the algebraic operation

$$a \oplus b := a + b - a\, b$$

on the set $[0, 1]$. Clearly $([0, 1], \oplus)$ is a commutative semigroup isomorphic to $([0, 1], \cdot)$. In fact, the transformation $\Phi : ([0, 1], \oplus) \to ([0, 1], \cdot)$, $\Phi(x) := 1 - x$, is a semigroup isomorphism. We summarize some properties of this operation.

Proposition 2.21 For any $a, b, c \in [0, 1]$,

(1) $0 \oplus a = a$,
(2) $1 \oplus a = 1$,
(3) $a \oplus b = (1 - b)\, a + b = (1 - a)\, b + a$,
(4) $a \oplus b < 1 \ \Leftrightarrow\ a < 1$ and $b < 1$,
(5) $a \le b \ \Rightarrow\ a \oplus c \le b \oplus c$,
(6) $b > 0 \ \Rightarrow\ (a\, b^{-1} \oplus c)\, b \le a \oplus c$,
(7) $a\, c + b \sqrt{1 - a^2} \sqrt{1 - c^2} \le \sqrt{a^2 \oplus b^2}$.

Proof Items (1)–(6) are left as exercises. For the last item consider the function $f : [0, 1] \to [0, 1]$ defined by $f(c) := a\, c + b \sqrt{1 - a^2} \sqrt{1 - c^2}$. A simple computation shows that

$$f'(c) = a - \frac{b\, c \sqrt{1 - a^2}}{\sqrt{1 - c^2}}.$$

The derivative $f'$ has a zero at $c = a / \sqrt{a^2 \oplus b^2}$, and one can check that this zero is a global maximum of $f$. Since $f(a / \sqrt{a^2 \oplus b^2}) = \sqrt{a^2 \oplus b^2}$, item (7) follows. □

Definition 2.30 Given $g, g' \in \mathcal{L}(V)$ with $\tau$-gap patterns, the upper $\tau$-angle between $g$ and $g'$ is defined to be

$$\beta_\tau(g, g') := \sqrt{\mathrm{gr}_\tau(g)^{-2} \oplus \alpha_\tau(g, g')^2 \oplus \mathrm{gr}_\tau(g')^{-2}}.$$

We will write $\beta_k(g, g')$ when $\tau = (k)$, and $\beta(g, g')$ when $\tau = (1)$.

The next proposition relates norm expansion by the linear map $g$, and distance contraction by the projective map $\varphi_g$, with angles and gap ratios.

Proposition 2.22 Given $g \in \mathcal{L}(V)$ with $\sigma(g) < 1$, a point $\hat w \in \mathbb{P}(V)$ and a unit vector $w \in \hat w$,

(a) $\alpha(\hat w, \mathfrak{v}(g))\, \|g\| \le \|g\, w\| \le \|g\| \sqrt{\alpha(\hat w, \mathfrak{v}(g))^2 \oplus \sigma(g)^2}$,
(b) $\delta(\varphi_g(\hat w), \mathfrak{v}(g^\ast)) = \delta(\varphi_g(\hat w), \varphi_g(\mathfrak{v}(g))) \le \dfrac{\sigma(g)\, \delta(\hat w, \mathfrak{v}(g))}{\alpha(\hat w, \mathfrak{v}(g))}$.


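Proof Let us write $\alpha = \alpha(\hat w, \mathfrak{v}(g))$ and $\sigma = \sigma(g)$. Take a unit vector $v \in \mathfrak{v}(g)$ such that $\angle(v, w)$ is non obtuse. Then $w = \alpha\, v + u$ with $u \perp v$ and $\|u\| = \sqrt{1 - \alpha^2}$. Choosing a unit vector $v^\ast \in \mathfrak{v}(g^\ast)$, we have $g\, w = \alpha \|g\|\, v^\ast + g\, u$ with $g\, u \perp v^\ast$ and $\|g\, u\| \le \sqrt{1 - \alpha^2}\, s_2(g) = \sqrt{1 - \alpha^2}\, \sigma \|g\|$. We define the number $0 \le \kappa \le \sigma$ so that $\|g\, u\| = \sqrt{1 - \alpha^2}\, \kappa \|g\|$. Hence

$$\alpha^2 \|g\|^2 \le \alpha^2 \|g\|^2 + \|g\, u\|^2 = \|g\, w\|^2,$$

and also

$$\|g\, w\|^2 = \alpha^2 \|g\|^2 + \|g\, u\|^2 = \|g\|^2 \left( \alpha^2 + (1 - \alpha^2)\, \kappa^2 \right) = \|g\|^2 \left( \alpha^2 \oplus \kappa^2 \right) \le \|g\|^2 \left( \alpha^2 \oplus \sigma^2 \right),$$

which proves (a). Item (b) follows from

$$\delta\big(\varphi_g(\hat w), \mathfrak{v}(g^\ast)\big) = \frac{\|g\, v \wedge g\, w\|}{\|g\, v\|\, \|g\, w\|} = \frac{\|g\, v \wedge g\, u\|}{\|g\|\, \|g\, w\|} = \frac{\|v^\ast \wedge g\, u\|}{\|g\, w\|} = \frac{\|g\, u\|}{\|g\, w\|} \le \frac{\sigma \sqrt{1 - \alpha^2}\, \|g\|}{\alpha\, \|g\|} = \frac{\sigma\, \delta(\hat w, \mathfrak{v}(g))}{\alpha}.$$

□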
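The operation $\oplus$ used in Propositions 2.21 and 2.22 is easy to experiment with. The following sketch checks, on a finite grid of values in $[0, 1]$, the isomorphism $\Phi(x) = 1 - x$, the monotonicity in item (5) and the inequality in item (7), together with equality in (7) at the critical point $c = a / \sqrt{a^2 \oplus b^2}$. The grid and the tolerance are arbitrary choices made for this illustration.

```python
import math

def oplus(a, b):
    """The operation a (+) b := a + b - a*b on [0, 1]."""
    return a + b - a * b

grid = [k / 10 for k in range(11)]
tol = 1e-12

# Phi(x) = 1 - x turns (+) into ordinary multiplication.
iso_ok = all(abs((1 - oplus(a, b)) - (1 - a) * (1 - b)) < tol
             for a in grid for b in grid)

# Item (5): monotonicity in the first argument.
mono_ok = all(oplus(a, c) <= oplus(b, c) + tol
              for a in grid for b in grid for c in grid if a <= b)

def lhs7(a, b, c):
    """Left-hand side of item (7)."""
    return a * c + b * math.sqrt(1 - a * a) * math.sqrt(1 - c * c)

# Item (7): a*c + b*sqrt(1-a^2)*sqrt(1-c^2) <= sqrt(a^2 (+) b^2).
item7_ok = all(lhs7(a, b, c) <= math.sqrt(oplus(a * a, b * b)) + tol
               for a in grid for b in grid for c in grid)

# Equality in (7) at the critical point c = a / sqrt(a^2 (+) b^2).
a, b = 0.6, 0.5
c_star = a / math.sqrt(oplus(a * a, b * b))
gap = math.sqrt(oplus(a * a, b * b)) - lhs7(a, b, c_star)
```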



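Next proposition relates the expansion rift $\rho(g, g')$ with the angle $\alpha(g, g')$ and the upper angle $\beta(g, g')$.

Proposition 2.23 Given $g, g' \in \mathcal{L}(V)$ with a (1)-gap pattern,

$$\alpha(g, g') \le \frac{\|g'\, g\|}{\|g'\|\, \|g\|} \le \beta(g, g').$$

Proof Let $\alpha := \alpha(g, g') = \alpha(\mathfrak{v}(g^\ast), \mathfrak{v}(g'))$ and take unit vectors $v \in \mathfrak{v}(g)$, $v^\ast \in \mathfrak{v}(g^\ast)$ and $v' \in \mathfrak{v}(g')$ such that $\langle v^\ast, v'\rangle = \alpha > 0$ and $g\, v = \|g\|\, v^\ast$. Since $\varphi_g(\mathfrak{v}(g)) = \mathfrak{v}(g^\ast)$, $w = \frac{g\, v}{\|g\, v\|}$ is a unit vector in $\hat w = \mathfrak{v}(g^\ast)$. Hence, applying Proposition 2.22(a) to $g'$ and $\hat w$, we get

$$\alpha(g, g')\, \|g'\| = \alpha(\hat w, \mathfrak{v}(g'))\, \|g'\| \le \left\| g'\, \frac{g\, v}{\|g\, v\|} \right\| \le \frac{\|g'\, g\|}{\|g\|},$$

which proves the first inequality.

For the second inequality, consider any $\hat w \in \mathbb{P}(g)$ and a unit vector $w \in \hat w$ such that $a := \langle w, v\rangle = \alpha(\hat w, \mathfrak{v}(g)) \ge 0$. Then $w = a\, v + \sqrt{1 - a^2}\, u$, where $u$ is a unit vector orthogonal to $v$. It follows that $g\, w = a \|g\|\, v^\ast + \sqrt{1 - a^2}\, g\, u$ with $g\, u \perp v^\ast$, and $\|g\, u\| = \kappa \|g\|$ for some $0 \le \kappa \le \sigma(g)$. Therefore

$$\frac{\|g\, w\|^2}{\|g\|^2} = a^2 + (1 - a^2)\, \kappa^2 = a^2 \oplus \kappa^2$$

and

$$\frac{g\, w}{\|g\, w\|} = \frac{a}{\sqrt{a^2 \oplus \kappa^2}}\, v^\ast + \frac{\sqrt{1 - a^2}}{\sqrt{a^2 \oplus \kappa^2}}\, \frac{g\, u}{\|g\|}.$$

The vector $v'$ can be written as $v' = \alpha\, v^\ast + w'$ with $w' \perp v^\ast$ and $\|w'\| = \sqrt{1 - \alpha^2}$. Set now $b := \alpha(\varphi_g(\hat w), \mathfrak{v}(g'))$. Then

$$\begin{aligned}
b = \left| \Big\langle \frac{g\, w}{\|g\, w\|}, v' \Big\rangle \right|
&\le \frac{\alpha\, a}{\sqrt{a^2 \oplus \kappa^2}} + \frac{\sqrt{1 - a^2}}{\sqrt{a^2 \oplus \kappa^2}}\, \frac{|\langle g\, u, v'\rangle|}{\|g\|} \\
&\le \frac{\alpha\, a}{\sqrt{a^2 \oplus \kappa^2}} + \frac{\kappa \sqrt{1 - a^2}}{\sqrt{a^2 \oplus \kappa^2}} \left| \Big\langle \frac{g\, u}{\|g\, u\|}, w' \Big\rangle \right| \\
&\le \frac{\alpha\, a}{\sqrt{a^2 \oplus \kappa^2}} + \frac{\kappa \sqrt{1 - a^2}}{\sqrt{a^2 \oplus \kappa^2}}\, \|w'\| \\
&= \frac{\alpha\, a}{\sqrt{a^2 \oplus \kappa^2}} + \frac{\kappa \sqrt{1 - a^2} \sqrt{1 - \alpha^2}}{\sqrt{a^2 \oplus \kappa^2}} \le \frac{\sqrt{\alpha^2 \oplus \kappa^2}}{\sqrt{a^2 \oplus \kappa^2}}.
\end{aligned}$$

We use Proposition 2.21(7) on the last inequality. Finally, by Proposition 2.22(a) applied to $g' \in \mathcal{L}(V)$ and the unit vector $g\, w / \|g\, w\| \in \varphi_g(\hat w)$,

$$\begin{aligned}
\|g'\, g\, w\| &\le \|g'\| \sqrt{b^2 \oplus \sigma(g')^2}\ \|g\, w\| \\
&\le \|g'\|\, \|g\| \sqrt{b^2 \oplus \sigma(g')^2} \sqrt{a^2 \oplus \kappa^2} \\
&\le \|g'\|\, \|g\| \sqrt{\kappa^2 \oplus \alpha^2 \oplus \sigma(g')^2} \\
&\le \beta(g, g')\, \|g'\|\, \|g\|,
\end{aligned}$$

where on the two last inequalities we use items (6) and (5) of Proposition 2.21. □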
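The two-sided bound of Proposition 2.23 can be tested on $2 \times 2$ matrices, where singular values and most expanding directions have closed forms. The sketch below is an illustration only: it assumes $V = \mathbb{R}^2$ with matrices stored as nested lists, takes the adjoint to be the transpose, and uses $\sigma(g) = s_2(g)/s_1(g)$. The helper names and the sample pair `g`, `gp` are choices made for this example, not notation from the text.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[0][0], A[1][0]], [A[0][1], A[1][1]]]

def singular_values(A):
    """Singular values s1 >= s2 of a real 2x2 matrix."""
    t = sum(x * x for row in A for x in row)           # trace of A^T A
    d = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) ** 2   # det of A^T A
    disc = math.sqrt(max(0.0, t * t - 4 * d))
    return math.sqrt((t + disc) / 2), math.sqrt(max(0.0, (t - disc) / 2))

def top_eigenvector(S):
    """Unit eigenvector for the largest eigenvalue of a symmetric 2x2 matrix."""
    theta = 0.5 * math.atan2(2 * S[0][1], S[0][0] - S[1][1])
    return (math.cos(theta), math.sin(theta))

def most_expanding(A):
    """Unit vector spanning the most expanding direction of A."""
    return top_eigenvector(matmul(transpose(A), A))

def oplus(a, b):
    return a + b - a * b

def rift_bounds(g, gp):
    """Return (alpha, rift, beta) for the pair (g, g') of Proposition 2.23."""
    v_star = most_expanding(transpose(g))   # most expanding direction of g*
    v_prime = most_expanding(gp)
    alpha = abs(v_star[0] * v_prime[0] + v_star[1] * v_prime[1])
    s1, s2 = singular_values(g)
    t1, t2 = singular_values(gp)
    rift = singular_values(matmul(gp, g))[0] / (t1 * s1)
    sigma, sigma_p = s2 / s1, t2 / t1
    beta = math.sqrt(oplus(oplus(sigma ** 2, alpha ** 2), sigma_p ** 2))
    return alpha, rift, beta

theta = math.pi / 3
g = [[2.0, 0.0], [0.0, 0.5]]                       # sigma(g) = 0.25
gp = [[math.cos(theta), -math.sin(theta)],
      [0.2 * math.sin(theta), 0.2 * math.cos(theta)]]  # diag(1, 0.2) times a rotation
alpha, rift, beta = rift_bounds(g, gp)
```

For this pair the angle is $\alpha = \cos 60^\circ = 0.5$ and $\beta^2 = 0.0625 \oplus 0.25 \oplus 0.04 = 0.325$, and the computed rift lies between $\alpha$ and $\beta$.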



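Corollary 2.1 Given $g, g' \in \mathcal{L}(V)$ with a $(k)$-gap pattern,

$$\alpha_k(g, g') \le \frac{\|\wedge_k (g'\, g)\|}{\|\wedge_k g'\|\, \|\wedge_k g\|} \le \beta_k(g, g').$$

Proof Apply Proposition 2.23 to the composition $(\wedge_k g')\, (\wedge_k g)$. Notice that by Definition 2.15, the Plücker embedding satisfies $\Psi(\mathfrak{v}_k(g)) = \mathfrak{v}(\wedge_k g)$. Hence

$$\alpha_k(g, g') = \alpha(\mathfrak{v}_k(g^\ast), \mathfrak{v}_k(g')) = \alpha(\mathfrak{v}(\wedge_k g^\ast), \mathfrak{v}(\wedge_k g')) = \alpha(\wedge_k g, \wedge_k g').$$

□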



The next results show how close the bounds $\alpha(g, g')$ and $\beta(g, g')$ are to each other and to the rift $\rho(g, g')$.


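Lemma 2.4 Given $g, g' \in \mathcal{L}(V)$ with (1)-gap patterns,

$$1 \le \frac{\beta(g, g')}{\alpha(g, g')} \le \sqrt{1 + \frac{\mathrm{gr}(g)^{-2} \oplus \mathrm{gr}(g')^{-2}}{\alpha(g, g')^2}}.$$

Proof Writing $\alpha = \alpha(g, g')$, $\kappa = \mathrm{gr}(g)^{-1}$ and $\kappa' = \mathrm{gr}(g')^{-1}$, just notice that

$$\frac{\sqrt{\kappa^2 \oplus \alpha^2 \oplus (\kappa')^2}}{\alpha} \le \sqrt{\frac{\alpha^2 + (\kappa^2 \oplus (\kappa')^2)}{\alpha^2}} = \sqrt{1 + \frac{\kappa^2 \oplus (\kappa')^2}{\alpha^2}}.$$

□

Proposition 2.24 Given $g, g' \in \mathcal{L}(V)$ with a (1)-gap pattern

$$\alpha(g, g') \ge \rho(g, g') \sqrt{1 - \frac{\mathrm{gr}(g)^{-2} + \mathrm{gr}(g')^{-2}}{\rho(g, g')^2}}.$$

Proof By Proposition 2.23

$$\rho(g, g')^2 \le \beta(g, g')^2 \le \alpha(g, g')^2 + \sigma(g)^2 + \sigma(g')^2,$$

which implies the claimed inequality. □

These inequalities then imply the following more general fact.

Proposition 2.25 Given $g_0, g_1, \ldots, g_{n-1} \in \mathcal{L}(V)$, if for all $1 \le i \le n - 1$ the linear maps $g_i$ and $g^{(i)} = g_{i-1} \cdots g_0$ have (1)-gap patterns, then

$$\prod_{i=1}^{n-1} \alpha(g^{(i)}, g_i) \le \frac{\|g_{n-1} \cdots g_1\, g_0\|}{\|g_{n-1}\| \cdots \|g_1\|\, \|g_0\|} \le \prod_{i=1}^{n-1} \beta(g^{(i)}, g_i).$$

Proof By definition $g^{(n)} = g_{n-1} \cdots g_1\, g_0$, and by convention $g^{(0)} = \mathrm{id}_V$. Hence $\|g_{n-1} \cdots g_1\, g_0\| = \prod_{i=0}^{n-1} \frac{\|g^{(i+1)}\|}{\|g^{(i)}\|}$. This implies that

$$\frac{\|g_{n-1} \cdots g_1\, g_0\|}{\|g_{n-1}\| \cdots \|g_1\|\, \|g_0\|} = \left( \prod_{i=0}^{n-1} \frac{1}{\|g_i\|} \right) \left( \prod_{i=0}^{n-1} \frac{\|g^{(i+1)}\|}{\|g^{(i)}\|} \right) = \prod_{i=0}^{n-1} \frac{\|g_i\, g^{(i)}\|}{\|g_i\|\, \|g^{(i)}\|}.$$

The factor with $i = 0$ equals 1. It is now enough to apply Proposition 2.23 to each of the remaining factors. □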
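The telescoping identity behind Proposition 2.25 can be confirmed numerically. The sketch below, which again assumes $2 \times 2$ real matrices stored as nested lists and uses illustrative helper names, multiplies the pairwise ratios $\|g_i\, g^{(i)}\| / (\|g_i\|\, \|g^{(i)}\|)$ and compares the product with the expansion rift of the whole sequence computed directly.

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(A):
    """Operator norm (largest singular value) of a real 2x2 matrix."""
    t = sum(x * x for row in A for x in row)
    d = (A[0][0] * A[1][1] - A[0][1] * A[1][0]) ** 2
    return math.sqrt((t + math.sqrt(max(0.0, t * t - 4 * d))) / 2)

def telescoping_ratio(maps):
    """Product over i of |g_i g^(i)| / (|g_i| |g^(i)|), with g^(0) = id."""
    partial = [[1.0, 0.0], [0.0, 1.0]]
    product = 1.0
    for g in maps:
        nxt = matmul(g, partial)       # g^(i+1) = g_i g^(i)
        product *= op_norm(nxt) / (op_norm(g) * op_norm(partial))
        partial = nxt
    return product, partial

maps = [[[2.0, 1.0], [0.0, 0.5]],
        [[1.0, 0.0], [3.0, 0.2]],
        [[0.3, -1.0], [1.0, 0.4]]]
product, total = telescoping_ratio(maps)
direct = op_norm(total) / math.prod(op_norm(g) for g in maps)
```

The identity holds for any sequence of non-zero maps; the gap pattern hypotheses are only needed to bound each factor by the angles $\alpha$ and $\beta$.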




2.3 Lipschitz Estimates

In this section we will derive some inequalities describing quantities such as the contracting behavior of a linear endomorphism on the projective space, the Lipschitz dependence of a projective action on the acting linear endomorphism, the continuity of most expanding directions as functions of a linear map, and the Lipschitz modulus of continuity for sum and intersection operations on flag manifolds. Except for Propositions 2.28 and 2.29, the content of this section will be only used in Chaps. 4 and 5.

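2.3.1 Projective Action

Proposition 2.26 Given $p, q \in V \setminus \{0\}$,

$$\left\| \frac{p}{\|p\|} - \frac{q}{\|q\|} \right\| \le \max\left\{ \frac{1}{\|p\|}, \frac{1}{\|q\|} \right\} \|p - q\|.$$

Proof Given two vectors $u, v \in V$ with $\|u\| \ge \|v\| = 1$ we have

$$\left\| \frac{u}{\|u\|} - \frac{v}{\|v\|} \right\| \le \|u - v\|.$$

Assume for instance that $\|p\| \ge \|q\|$, so that $\max\{\|p\|^{-1}, \|q\|^{-1}\} = \|q\|^{-1}$. Applying the previous inequality with $u = \frac{p}{\|q\|}$ and $v = \frac{q}{\|q\|}$, we get

$$\left\| \frac{p}{\|p\|} - \frac{q}{\|q\|} \right\| = \left\| \frac{u}{\|u\|} - \frac{v}{\|v\|} \right\| \le \|u - v\| = \left\| \frac{p}{\|q\|} - \frac{q}{\|q\|} \right\| = \|q\|^{-1} \|p - q\| = \max\{\|p\|^{-1}, \|q\|^{-1}\}\, \|p - q\|.$$

□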
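A quick numerical illustration of Proposition 2.26, assuming vectors are stored as tuples of floats (the sample vectors and helper names are arbitrary choices for this sketch):

```python
import math

def norm(p):
    return math.sqrt(sum(x * x for x in p))

def normalize(p):
    n = norm(p)
    return tuple(x / n for x in p)

def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def normalization_gap(p, q):
    """Return (lhs, rhs) of the inequality in Proposition 2.26."""
    lhs = norm(sub(normalize(p), normalize(q)))
    rhs = max(1 / norm(p), 1 / norm(q)) * norm(sub(p, q))
    return lhs, rhs

samples = [((3.0, 4.0), (1.0, 0.0)),
           ((0.5, 0.0, 0.2), (2.0, -1.0, 1.0)),
           ((1.0, 1.0), (-1.0, -1.0))]   # antipodal pair of equal norm: equality case
results = [normalization_gap(p, q) for p, q in samples]
```

The last sample shows that the constant cannot be improved: for antipodal vectors of equal norm both sides are equal to 2.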



Given a linear map $g \in \mathcal{L}(V)$, the projective action of $g$ is given by the map $\varphi_g : \mathbb{P}(g) \to \mathbb{P}(g^\ast)$, $\varphi_g(\hat p) := \widehat{g\, p}$. For any non collinear vectors $p, q \in V$ with $\|p\| = \|q\| = 1$, define

$$v_p(q) := \frac{q - \langle p, q\rangle\, p}{\|q - \langle p, q\rangle\, p\|}.$$

This is the normalized unit vector of the orthogonal projection of $q$ onto $p^\perp$.


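Proposition 2.27 Given $g \in \mathcal{L}(V)$, and points $\hat p \ne \hat q$ in $\mathbb{P}(V)$,

$$\frac{\delta(\varphi_g(\hat p), \varphi_g(\hat q))}{\delta(\hat p, \hat q)} = \frac{\|g\, p \wedge g\, v_p(q)\|}{\|g\, p\|\, \|g\, q\|}.$$

Proof Let $p \in \hat p$ and $q \in \hat q$ be unit vectors such that $\theta = \angle(p, q) \in [0, \frac{\pi}{2}]$. We can write $q = (\cos \theta)\, p + (\sin \theta)\, v_p(q)$. Hence

$$\delta(\hat p, \hat q) = \|p \wedge q\| = (\sin \theta)\, \|p \wedge v_p(q)\| = \sin \theta,$$

and

$$\delta(\varphi_g(\hat p), \varphi_g(\hat q)) = \frac{\|g\, p \wedge g\, q\|}{\|g\, p\|\, \|g\, q\|} = (\sin \theta)\, \frac{\|g\, p \wedge g\, v_p(q)\|}{\|g\, p\|\, \|g\, q\|}.$$

□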



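Given a point $\hat p \in \mathbb{P}(V)$, we identify the tangent space to the projective space at $\hat p$ as $T_{\hat p} \mathbb{P}(V) = p^\perp$, for any representative $p \in \hat p$.

Proposition 2.28 Given $g \in \mathcal{L}(V)$, $\hat x \in \mathbb{P}(g)$, and a representative $x \in \hat x$, the derivative of the map $\varphi_g : \mathbb{P}(g) \to \mathbb{P}(g^\ast)$ at $\hat x$ is given by

$$(D\varphi_g)_{\hat x}\, v = \frac{g\, v - \big\langle \frac{g\, x}{\|g\, x\|},\, g\, v \big\rangle\, \frac{g\, x}{\|g\, x\|}}{\|g\, x\|} = \frac{1}{\|g\, x\|}\, \pi^\perp_{g x / \|g x\|}(g\, v).$$

Proof The sphere $S(V) := \{v \in V : \|v\| = 1\}$ is a double covering space of $\mathbb{P}(V)$, whose covering map is the canonical projection $\hat\pi : S(V) \to \mathbb{P}(V)$. With the identification $T_{\hat p} \mathbb{P}(V) = p^\perp$, the derivative of $\hat\pi$, $D\hat\pi_x : T_x S(V) \to T_{\hat x} \mathbb{P}(V)$, is the identity linear map. The map $\varphi_g$ lifts to the map defined on the sphere by $\tilde\varphi_g(x) := \frac{g\, x}{\|g\, x\|}$. Hence we can identify the derivatives $(D\varphi_g)_{\hat x}$ and $(D\tilde\varphi_g)_x$. A simple calculation leads to the explicit expression above for $(D\tilde\varphi_g)_x\, v$. □

We will use the following closed ball notation

$$B^{(d)}(\hat p, r) := \{\hat x \in \mathbb{P}(V) : d(\hat x, \hat p) \le r\},$$

where the superscript emphasizes the distance in question. Given a projective map $f : X \subset \mathbb{P}(V) \to \mathbb{P}(V)$, we denote by $\mathrm{Lip}_d(f)$ the least Lipschitz constant of $f$ with respect to the distance $d$. Next proposition refers to the projective metrics $\delta$ and $\rho$ defined in Sect. 2.1.1.

Proposition 2.29 Given $0 < \kappa < 1$ and $g \in \mathcal{L}(V)$ such that $\mathrm{gr}(g) \ge \kappa^{-1}$,

(1) $\varphi_g\big(B^{(\delta)}(\mathfrak{v}(g), r)\big) \subset B^{(\delta)}\big(\mathfrak{v}(g^\ast), \kappa\, r / \sqrt{1 - r^2}\big)$, for any $0 < r < 1$,
(2) $\varphi_g\big(B^{(\rho)}(\mathfrak{v}(g), a)\big) \subset B^{(\rho)}\big(\mathfrak{v}(g^\ast), \kappa \tan a\big)$, for any $0 < a < \frac{\pi}{2}$,
(3) $\mathrm{Lip}_\rho\big(\varphi_g|_{B^{(\delta)}(\mathfrak{v}(g), r)}\big) \le \kappa\, \dfrac{r + \sqrt{1 - r^2}}{1 - r^2}$, for any $0 < r < 1$.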


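Proof Item (1) of this proposition follows from Proposition 2.22(b), because $\delta(\hat w, \mathfrak{v}(g)) < r$ implies

$$\alpha(\hat w, \mathfrak{v}(g)) = \sqrt{1 - \delta(\hat w, \mathfrak{v}(g))^2} \ge \sqrt{1 - r^2}.$$

Item (2) reduces to (1), because we have $\delta(\hat u, \hat v) = \sin \rho(\hat u, \hat v)$, which implies that $B^{(\rho)}(\hat v, a) = B^{(\delta)}(\hat v, \sin a)$.

To prove (3), take unit vectors $v \in \mathfrak{v}(g)$ and $v^\ast \in \mathfrak{v}(g^\ast)$ such that $g\, v = \|g\|\, v^\ast$. Because $v$ is a $g$-most expanding vector, $\|\pi^\perp_{v^\ast} \circ g\| = \|g \circ \pi^\perp_v\| \le s_2(g) \le \kappa \|g\|$. Given $\hat x$ such that $\delta(\hat x, \mathfrak{v}(g)) < r$, and a unit vector $x \in \hat x$, by Proposition 2.22(a)

$$\frac{\|g\|}{\|g\, x\|} \le \frac{1}{\alpha(\hat x, \mathfrak{v}(g))} \le \frac{1}{\sqrt{1 - r^2}}.$$

Using item (b) of the same proposition we get

$$\delta(\varphi_g(\hat x), \mathfrak{v}(g^\ast)) \le \frac{\sigma(g)\, \delta(\hat x, \mathfrak{v}(g))}{\alpha(\hat x, \mathfrak{v}(g))} \le \frac{\kappa\, r}{\sqrt{1 - r^2}}.$$

By Proposition 2.28 we have

$$(D\varphi_g)_x\, v = \frac{1}{\|g\, x\|}\, \pi^\perp_{v^\ast}(g\, v) + \frac{1}{\|g\, x\|} \left( \pi^\perp_{\tilde\varphi_g(x)} - \pi^\perp_{v^\ast} \right)(g\, v).$$

Thus, by Lemma 2.2(d),

$$\|(D\varphi_g)_x\| \le \frac{\kappa \|g\|}{\|g\, x\|} + \delta(\varphi_g(\hat x), \mathfrak{v}(g^\ast))\, \frac{\|g\|}{\|g\, x\|} \le \frac{\kappa}{\sqrt{1 - r^2}} + \frac{\kappa\, r}{1 - r^2} = \frac{\kappa\, (r + \sqrt{1 - r^2})}{1 - r^2}.$$

Since $B^{(\delta)}(\mathfrak{v}(g), r)$ is a convex Riemannian disk, by the mean value theorem $\varphi_g|_{B^{(\delta)}(\mathfrak{v}(g), r)}$ has Lipschitz constant $\le \frac{\kappa\, (r + \sqrt{1 - r^2})}{1 - r^2}$ with respect to distance $\rho$. □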
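Item (1) of Proposition 2.29 can be visualised in the simplest case. The sketch below assumes $V = \mathbb{R}^2$ and $g = \mathrm{diag}(1, \kappa)$, so that $\mathrm{gr}(g) = \kappa^{-1}$ and both $\mathfrak{v}(g)$ and $\mathfrak{v}(g^\ast)$ are the line through $e_1$. It samples points of the $\delta$-ball of radius $r$ around that line and checks that their images stay within distance $\kappa\, r / \sqrt{1 - r^2}$. The sampling resolution and parameter values are arbitrary.

```python
import math

def image_distance(kappa, t):
    """delta(phi_g(x), e1) for g = diag(1, kappa) and x = (cos t, sin t)."""
    y1, y2 = math.cos(t), kappa * math.sin(t)
    return abs(y2) / math.hypot(y1, y2)

def ball_is_contracted(kappa, r, steps=200):
    """Check item (1) on sample points of the delta-ball of radius r around e1."""
    bound = kappa * r / math.sqrt(1 - r * r)
    t_max = math.asin(r)   # delta((cos t, sin t), e1) = |sin t| <= r
    return all(
        image_distance(kappa, -t_max + 2 * t_max * k / steps) <= bound + 1e-12
        for k in range(steps + 1)
    )

checks = [ball_is_contracted(kappa, r)
          for kappa in (0.1, 0.5, 0.9)
          for r in (0.1, 0.5, 0.9)]
```

In this diagonal case the image of the point at angle $t$ is at distance $\kappa\, |\sin t| / \sqrt{\cos^2 t + \kappa^2 \sin^2 t} \le \kappa\, |\tan t|$, which is where the bound comes from.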

2.3.2 Operations on Flag Manifolds

As before let $V$ be a finite $n$-dimensional Euclidean space. Recall that the Grassmann manifold $\mathrm{Gr}_k(V)$ identifies through the Plücker embedding with a submanifold of $\mathbb{P}(\wedge_k V)$. Up to a sign, $E \in \mathrm{Gr}_k(V)$ is identified with the unit $k$-vector $e = e_1 \wedge \cdots \wedge e_k$ associated to any orthonormal basis $\{e_1, \ldots, e_k\}$ of $E$. Recall that the Grassmann distance (2.10) on $\mathrm{Gr}_k(V)$ can be characterized by

$$d(E_1, E_2) := \min\{\|e_1 - e_2\|, \|e_1 + e_2\|\},$$

where $e_j$ is a unit $k$-vector of $E_j$, for $j = 1, 2$.


Definition 2.31 Given $E, F \in \mathrm{Gr}(V)$, we say that $E$ and $F$ are (∩) transversal if $E + F = V$. Analogously, we say that $E$ and $F$ are (+) transversal if $E \cap F = \{0\}$.

The following numbers quantify the transversality of two linear subspaces.

Definition 2.32 Given $E \in \mathrm{Gr}_r(V)$ and $F \in \mathrm{Gr}_s(V)$, consider a unit $r$-vector $e$ of $E$, a unit $s$-vector $f$ of $F$, a unit $(n - r)$-vector $e^\perp$ of $E^\perp$ and a unit $(n - s)$-vector $f^\perp$ of $F^\perp$. We define

$$\theta_+(E, F) := \|e \wedge f\|, \qquad \theta_\cap(E, F) := \|e^\perp \wedge f^\perp\|.$$

Since the chosen unit vectors are unique up to a sign, these quantities are well-defined.

Remark 2.1 If $r + s > n$ then $\theta_+(E, F) = 0$. Similarly, if $r + s < n$ then $\theta_\cap(E, F) = 0$.

Remark 2.2 Given $E, F \in \mathrm{Gr}(V)$, $\theta_\cap(E, F) = \theta_+(E^\perp, F^\perp)$.

Next proposition establishes a Lipschitz modulus of continuity for the sum and intersection operations on Grassmannians in terms of the previous quantities.

Proposition 2.30 Given $r, s \in \mathbb{N}$ and $E, E' \in \mathrm{Gr}_r(V)$, $F, F' \in \mathrm{Gr}_s(V)$,

(1) $d(E + F, E' + F') \le \max\left\{ \dfrac{1}{\theta_+(E, F)}, \dfrac{1}{\theta_+(E', F')} \right\} \big( d(E, E') + d(F, F') \big)$,
(2) $d(E \cap F, E' \cap F') \le \max\left\{ \dfrac{1}{\theta_\cap(E, F)}, \dfrac{1}{\theta_\cap(E', F')} \right\} \big( d(E, E') + d(F, F') \big)$.

Proof (1) Consider unit $r$-vectors $e$ and $e'$ representing the subspaces $E$ and $E'$ respectively. Consider also unit $s$-vectors $f$ and $f'$ representing the subspaces $F$ and $F'$ respectively. By Proposition 2.26

$$\begin{aligned}
d(E + F, E' + F') &= \left\| \frac{e \wedge f}{\|e \wedge f\|} - \frac{e' \wedge f'}{\|e' \wedge f'\|} \right\| \le K\, \|e \wedge f - e' \wedge f'\| \\
&\le K \left( \|e \wedge (f - f')\| + \|(e - e') \wedge f'\| \right) \\
&\le K \left( \|e - e'\| + \|f - f'\| \right),
\end{aligned}$$

where $K = \max\{\|e \wedge f\|^{-1}, \|e' \wedge f'\|^{-1}\} = \max\{\theta_+(E, F)^{-1}, \theta_+(E', F')^{-1}\}$.

(2) reduces to (1) by duality (see Proposition 2.2). □

Next proposition gives an alternative characterization of the transversality measurements $\theta_+(E, F)$ and $\theta_\cap(E, F)$. Let, as before, $\pi_E : V \to E$ denote the orthogonal projection onto a subspace $E \subset V$, and define the restriction $\pi_{E,F} := \pi_F|_E : E \to F$.


Proposition 2.31 Given $E \in \mathrm{Gr}_r(V)$ and $F \in \mathrm{Gr}_s(V)$,

(1) $\theta_+(E, F) = \det_+(\pi_{E,F^\perp}) = \det_+(\pi_{F,E^\perp})$,
(2) $\theta_\cap(E, F) = \det_+(\pi_{E^\perp,F}) = \det_+(\pi_{F^\perp,E})$.

Proof Notice that $E \cap F = \mathrm{K}(\pi_{E,F^\perp}) = \mathrm{K}(\pi_{F,E^\perp})$. If $E \cap F \ne \{0\}$ then the three terms in (1) vanish. Otherwise $\pi_{E,F^\perp}$ and $\pi_{F,E^\perp}$ are isomorphisms onto their ranges $\mathrm{R}(\pi_{E,F^\perp}) = F^\perp \cap (E + F)$ and $\mathrm{R}(\pi_{F,E^\perp}) = E^\perp \cap (E + F)$. Take an orthonormal basis $\{f_1, \ldots, f_s, f_{s+1}, \ldots, f_{s+r}, \ldots, f_n\}$ such that $\{f_1, \ldots, f_s\}$ spans $F$ and the family of vectors $\{f_1, \ldots, f_s, f_{s+1}, \ldots, f_{s+r}\}$ spans $E + F$. Consider the unit $s$-vector $f = f_1 \wedge \cdots \wedge f_s$ of $F$, and a unit $r$-vector $e = e_1 \wedge \cdots \wedge e_r$ of $E$. Hence $\{f_{s+1}, \ldots, f_{s+r}\}$ is a basis of $\mathrm{R}(\pi_{E,F^\perp})$ and

$$\begin{aligned}
\theta_+(E, F) &= \|(e_1 \wedge \cdots \wedge e_r) \wedge (f_1 \wedge \cdots \wedge f_s)\| \\
&= \|\pi_{E,F^\perp}(e_1) \wedge \cdots \wedge \pi_{E,F^\perp}(e_r) \wedge f_1 \wedge \cdots \wedge f_s\| \\
&= \det\nolimits_+(\pi_{E,F^\perp})\, \|f_{s+1} \wedge \cdots \wedge f_{s+r} \wedge f_1 \wedge \cdots \wedge f_s\| \\
&= \det\nolimits_+(\pi_{E,F^\perp}).
\end{aligned}$$

Reversing the roles of $E$ and $F$, and because $\|e \wedge f\|$ is symmetric in $e$ and $f$, we obtain $\theta_+(E, F) = \det_+(\pi_{F,E^\perp})$, which proves (1). By duality and Remark 2.2, item (2) reduces to (1). □

The measurement on the (∩) transversality admits the following lower bound in terms of the angle in Definition 2.24.

Proposition 2.32 Given $E \in \mathrm{Gr}_r(V)$ and $F \in \mathrm{Gr}_s(V)$, if $E + F = V$ then

$$\theta_\cap(E, F) \ge \alpha_r(E, E \cap F + F^\perp).$$

Proof Combining Lemmas 2.5 and 2.6 below we have

$$\theta_\cap(E, F) \ge \theta_\cap(E, F \cap (E \cap F)^\perp) = \alpha_r(E, (F \cap (E \cap F)^\perp)^\perp) = \alpha_r(E, (E \cap F) + F^\perp).$$

□

Lemma 2.5 Given $E \in \mathrm{Gr}_r(V)$, $E' \in \mathrm{Gr}_{r'}(V)$ and $F \in \mathrm{Gr}_s(V)$ such that $r + s \ge n$ and $E \subseteq E'$ then $\theta_\cap(E', F) \ge \theta_\cap(E, F)$.

Proof Because $E \subset E'$, we have $\pi_{F^\perp,E} = \pi_{E',E} \circ \pi_{F^\perp,E'}$. Hence by Proposition 2.8

$$\theta_\cap(E, F) = \det\nolimits_+(\pi_{F^\perp,E}) = \det\nolimits_+(\pi_{\pi_{E'}(F^\perp),E})\, \det\nolimits_+(\pi_{F^\perp,E'}) \le \det\nolimits_+(\pi_{F^\perp,E'}) = \theta_\cap(E', F),$$

where $\det_+(\pi_{\pi_{E'}(F^\perp),E}) \le 1$ because $\|\pi_E\| \le 1$. □

Lemma 2.6 Given $E, E' \in \mathrm{Gr}_r(V)$, $\theta_\cap(E', E^\perp) = \alpha_r(E', E)$.




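Proof Given orthonormal bases $\{v_1, \ldots, v_r\}$ of $E$, and $\{v'_1, \ldots, v'_r\}$ of $E'$,

$$\begin{aligned}
\theta_\cap(E', E^\perp) = \det\nolimits_+(\pi_{E',E}) &= |\langle \wedge_r \pi_{E',E}(v'_1 \wedge \cdots \wedge v'_r),\ v_1 \wedge \cdots \wedge v_r \rangle| \\
&= |\langle \pi_E(v'_1) \wedge \cdots \wedge \pi_E(v'_r),\ v_1 \wedge \cdots \wedge v_r \rangle| \\
&= |\langle v'_1 \wedge \cdots \wedge v'_r,\ v_1 \wedge \cdots \wedge v_r \rangle| = \alpha_r(E, E').
\end{aligned}$$

□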



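Next proposition gives a modulus of lower semi-continuity for the transversality measurement $\theta_\cap$.

Proposition 2.33 Given $E, E_0 \in \mathrm{Gr}_r(V)$ and $F, F_0 \in \mathrm{Gr}_s(V)$,

$$\theta_\cap(E, F) \ge \theta_\cap(E_0, F_0) - d(E, E_0) - d(F, F_0).$$

Proof Consider unit vectors $e \in \Psi(E^\perp)$, $f \in \Psi(F^\perp)$, $e_0 \in \Psi(E_0^\perp)$ and $f_0 \in \Psi(F_0^\perp)$, chosen so that $d(E, E_0) = d(E^\perp, E_0^\perp) = \|e - e_0\|$ and $d(F, F_0) = d(F^\perp, F_0^\perp) = \|f - f_0\|$. Hence

$$\begin{aligned}
\theta_\cap(E, F) = \|e \wedge f\| &\ge \|e_0 \wedge f_0\| - \|e \wedge f - e_0 \wedge f_0\| \\
&\ge \theta_\cap(E_0, F_0) - \|e \wedge (f - f_0)\| - \|(e - e_0) \wedge f_0\| \\
&\ge \theta_\cap(E_0, F_0) - \|f - f_0\| - \|e - e_0\| \\
&\ge \theta_\cap(E_0, F_0) - d(F, F_0) - d(E, E_0).
\end{aligned}$$

□

Next proposition refines inequality (2.7).

Proposition 2.34 Given $E, F \in \mathrm{Gr}_k(V)$, and families of vectors $\{u_1, \ldots, u_k\} \subset E$ and $\{u_{k+1}, \ldots, u_{k+i}\} \subset F^\perp$ with $1 \le i \le m - k$,

(a) $\|u_1 \wedge \cdots \wedge u_k \wedge u_{k+1} \wedge \cdots \wedge u_{k+i}\| \le \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\|$,
(b) $\|u_1 \wedge \cdots \wedge u_k \wedge u_{k+1} \wedge \cdots \wedge u_{k+i}\| \ge \alpha(E, F)\, \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\|$.

Proof Since $\pi_{F^\perp,E^\perp}$ is an orthogonal projection, all its singular values are in $[0, 1]$. Thus, because $\det_+(\pi_{F^\perp,E^\perp})$ is the product of all singular values, while $m(\wedge_i \pi_{F^\perp,E^\perp})$ is the product of the $i$ smallest singular values, we have

$$\det\nolimits_+(\pi_{F^\perp,E^\perp}) \le m(\wedge_i \pi_{F^\perp,E^\perp}) \le \|\wedge_i \pi_{F^\perp,E^\perp}\| \le 1.$$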


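Hence

$$\begin{aligned}
\|u_1 \wedge \cdots \wedge u_k \wedge u_{k+1} \wedge \cdots \wedge u_{k+i}\|
&= \|u_1 \wedge \cdots \wedge u_k \wedge \pi_{F^\perp,E^\perp}(u_{k+1}) \wedge \cdots \wedge \pi_{F^\perp,E^\perp}(u_{k+i})\| \\
&= \|u_1 \wedge \cdots \wedge u_k\|\, \|\pi_{F^\perp,E^\perp}(u_{k+1}) \wedge \cdots \wedge \pi_{F^\perp,E^\perp}(u_{k+i})\| \\
&\le \|\wedge_i \pi_{F^\perp,E^\perp}\|\, \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\| \\
&\le \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\|,
\end{aligned}$$

which proves (a). By Proposition 2.19 we have

$$\alpha(E, F) = \alpha(F^\perp, E^\perp) = \det\nolimits_+(\pi_{F^\perp,E^\perp}) \le m(\wedge_i \pi_{F^\perp,E^\perp}).$$

Thus

$$\begin{aligned}
\|u_1 \wedge \cdots \wedge u_k \wedge u_{k+1} \wedge \cdots \wedge u_{k+i}\|
&= \|u_1 \wedge \cdots \wedge u_k \wedge \pi_{F^\perp,E^\perp}(u_{k+1}) \wedge \cdots \wedge \pi_{F^\perp,E^\perp}(u_{k+i})\| \\
&= \|u_1 \wedge \cdots \wedge u_k\|\, \|\pi_{F^\perp,E^\perp}(u_{k+1}) \wedge \cdots \wedge \pi_{F^\perp,E^\perp}(u_{k+i})\| \\
&\ge m(\wedge_i \pi_{F^\perp,E^\perp})\, \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\| \\
&\ge \alpha(E, F)\, \|u_1 \wedge \cdots \wedge u_k\|\, \|u_{k+1} \wedge \cdots \wedge u_{k+i}\|,
\end{aligned}$$

which proves (b). □

The angle $\alpha$ is a Lipschitz continuous function.

Proposition 2.35 Given $u, u', v, v' \in \mathbb{P}(V)$,

$$|\alpha(u, v) - \alpha(u', v')| \le d(u, u') + d(v, v').$$

Proof Exercise. □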

The intersection of complementary flags satisfying the appropriate transversality conditions determines a decomposition of the Euclidean space $V$. We end this section by defining this operation and proving a modulus of continuity for it.

Consider a signature $\tau = (\tau_1, \ldots, \tau_k)$ of length $k$ with $\tau_k < \dim V$. We make the convention that $\tau_0 = 0$ and $\tau_{k+1} = \dim V$.

Definition 2.33 A $\tau$-decomposition is a family of linear subspaces $E_\cdot = \{E_i\}_{1 \le i \le k+1}$ in $\mathrm{Gr}(V)$ such that $V = \oplus_{i=1}^{k+1} E_i$ and $\dim E_i = \tau_i - \tau_{i-1}$ for all $1 \le i \le k + 1$.

Let $\mathcal{D}_\tau(V)$ denote the space of all $\tau$-decompositions, which we endow with the following metric

$$d_\tau(E_\cdot, E'_\cdot) = \max_{1 \le i \le k+1} d_{\tau_i - \tau_{i-1}}(E_i, E'_i),$$

where $d_{\tau_i - \tau_{i-1}}$ stands for the distance (2.10) in $\mathrm{Gr}_{\tau_i - \tau_{i-1}}(V)$.

Given two flags $F \in \mathcal{F}_\tau(V)$ and $F' \in \mathcal{F}_{\tau^\perp}(V)$, we will define a decomposition, denoted by $F \sqcap F'$, formed by intersecting the components of these flags. For that we introduce the following measurement.


Definition 2.34 Given two flags $F \in \mathcal{F}_\tau(V)$ and $F' \in \mathcal{F}_{\tau^\perp}(V)$, let

$$\theta_\sqcap(F, F') := \min_{1 \le i \le k} \theta_\cap(F_i, F'_{k-i+1}).$$

Notice that $\dim F_i = \tau_i$ and $\dim F'_{k-i+1} = \tau^\perp_{k-i+1} = \dim V - \tau_i$, i.e., the subspaces $F_i$ and $F'_{k-i+1}$ have complementary dimensions. We will refer to this quantity as the transversality measurement between the flags $F$ and $F'$.

In the next proposition we complete $F$ and $F'$ to full flags of length $k + 1$ setting $F_{k+1} = F'_{k+1} = V$. Set also $\tau_0 = 0$ and $\tau_{k+1} = \dim V$.

Proposition 2.36 If $\theta_\sqcap(F, F') > 0$ then the following is a direct sum decomposition in the space $\mathcal{D}_\tau(V)$,

$$V = \bigoplus_{i=1}^{k+1} F_i \cap F'_{k-i+2},$$

with $\dim(F_i \cap F'_{k-i+2}) = \tau_i - \tau_{i-1}$ for all $1 \le i \le k + 1$.

Proof Since the subspaces $F_i$ and $F'_{k-i+1}$ have complementary dimensions, the relation $\theta_\cap(F_i, F'_{k-i+1}) > 0$ implies that

$$V = F_i \oplus F'_{k-i+1}. \tag{2.22}$$

By Lemma 2.5, $\theta_\cap(F_i, F'_{k-i+2}) \ge \theta_\cap(F_i, F'_{k-i+1}) > 0$. Therefore $F_i + F'_{k-i+2} = V$ and

$$\dim(F_i \cap F'_{k-i+2}) = \tau_i + \tau^\perp_{k-i+2} - \dim V = \tau_i + (\dim V - \tau_{i-1}) - \dim V = \tau_i - \tau_{i-1}.$$

We prove by finite induction in $i = 1, \ldots, k + 1$ that

$$F_i = \bigoplus_{j \le i} F_j \cap F'_{k-j+2}. \tag{2.23}$$

Since $F_{k+1} = V$ the proposition follows from this relation at $i = k + 1$. For $i = 1$, (2.23) reduces to $F_1 = F_1 \cap V$. The induction step follows from

$$F_{i+1} = F_i \oplus \left( F_{i+1} \cap F'_{k-i+1} \right).$$

Since the following dimensions add up

$$\dim F_{i+1} = \tau_{i+1} = \tau_i + (\tau_{i+1} - \tau_i) = \dim F_i + \dim(F_{i+1} \cap F'_{k-i+1}),$$


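it is enough to see that

$$F_i \cap \left( F_{i+1} \cap F'_{k-i+1} \right) = F_i \cap F'_{k-i+1} = \{0\},$$

which holds because of (2.22). □

Hence, by the previous proposition we can define:

Definition 2.35 Given flags $F \in \mathcal{F}_\tau(V)$ and $F' \in \mathcal{F}_{\tau^\perp}(V)$ such that $\theta_\sqcap(F, F') > 0$ we define $F \sqcap F' := \{F_i \cap F'_{k-i+2}\}_{1 \le i \le k+1}$ and call it the intersection decomposition of the flags $F$ and $F'$.

Next proposition provides a modulus of lower semi-continuity for the transversality measurement $\theta_\sqcap$.

Proposition 2.37 Given $F, F_0 \in \mathcal{F}_\tau(V)$ and $F', F'_0 \in \mathcal{F}_{\tau^\perp}(V)$,

$$\theta_\sqcap(F, F') \ge \theta_\sqcap(F_0, F'_0) - d_\tau(F, F_0) - d_{\tau^\perp}(F', F'_0).$$

Proof Apply Proposition 2.33 at each subspace of the $\tau$-decompositions. □

The modulus of continuity for the intersection map $\sqcap : \mathcal{F}_\tau(V) \times \mathcal{F}_{\tau^\perp}(V) \to \mathcal{D}_\tau(V)$ is established below.

Proposition 2.38 Given flags $F_1, F_2 \in \mathcal{F}_\tau(V)$ and $F'_1, F'_2 \in \mathcal{F}_{\tau^\perp}(V)$,

$$d_\tau(F_1 \sqcap F'_1, F_2 \sqcap F'_2) \le \max\left\{ \frac{1}{\theta_\sqcap(F_1, F'_1)}, \frac{1}{\theta_\sqcap(F_2, F'_2)} \right\} \big( d_\tau(F_1, F_2) + d_{\tau^\perp}(F'_1, F'_2) \big).$$

Proof Apply Proposition 2.30 at each subspace of the $\tau$-decompositions. □

Given two linear maps $g_0, g_1 \in \mathcal{L}(V)$ with $\tau$-gap ratios such that $\alpha_\tau(g_0, g_1) > 0$, they determine a $\tau$-decomposition of $V$ as intersection of the image by $\varphi_{g_0}$ of the most expanding $\tau$-flag for $g_0$ with the least expanding $\tau^\perp$-flag for $g_1$ (see Definitions 2.18 and 2.21). The corresponding intersection transversality measurement is bounded from below by the angle $\alpha_\tau(g_0, g_1)$.

Proposition 2.39 Given $g_0, g_1 \in \mathcal{L}(V)$, if $\mathrm{gr}_\tau(g_0) > 1$ and $\mathrm{gr}_\tau(g_1) > 1$ then

$$\theta_\sqcap(\mathfrak{v}_\tau(g_0^\ast), \underline{\mathfrak{v}}_{\tau^\perp}(g_1)) \ge \alpha_\tau(g_0, g_1).$$

In particular, if $\alpha_\tau(g_0, g_1) > 0$ then the flags $\mathfrak{v}_\tau(g_0^\ast)$ and $\underline{\mathfrak{v}}_{\tau^\perp}(g_1)$ determine the decomposition $\mathfrak{v}_\tau(g_0^\ast) \sqcap \underline{\mathfrak{v}}_{\tau^\perp}(g_1) \in \mathcal{D}_\tau(V)$. Here $\underline{\mathfrak{v}}$ denotes the least expanding flag.

Proof Let $n = \dim V$. Consider the flags $F = \mathfrak{v}_\tau(g_0^\ast)$ and $F' = \underline{\mathfrak{v}}_{\tau^\perp}(g_1)$. We have $F_i = \mathfrak{v}_{\tau_i}(g_0^\ast)$ and $F'_{k-i+1} = \underline{\mathfrak{v}}_{\tau^\perp_{k-i+1}}(g_1) = \underline{\mathfrak{v}}_{n - \tau_i}(g_1) = \mathfrak{v}_{\tau_i}(g_1)^\perp$. Hence by Lemma 2.6,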


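$$\theta_\cap(F_i, F'_{k-i+1}) = \theta_\cap(\mathfrak{v}_{\tau_i}(g_0^\ast), \mathfrak{v}_{\tau_i}(g_1)^\perp) = \alpha_{\tau_i}(\mathfrak{v}_{\tau_i}(g_0^\ast), \mathfrak{v}_{\tau_i}(g_1)) = \alpha_{\tau_i}(g_0, g_1),$$

and taking the minimum, $\theta_\sqcap(F, F') \ge \alpha_\tau(g_0, g_1)$. □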



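2.3.3 Dependence on the Linear Map

We establish a modulus of Lipschitz continuity for the most expanding direction of a linear endomorphism with a first singular gap. For any $0 < \kappa < 1$, consider the set $\mathcal{L}_\kappa := \{g \in \mathcal{L}(V) : \mathrm{gr}(g) \ge \frac{1}{\kappa}\}$. We denote by $\mathfrak{v} : \mathcal{L}_\kappa \to \mathbb{P}(V)$ the map that assigns the $g$-most expanding direction to each $g \in \mathcal{L}_\kappa$. The relative distance between linear maps $g, g' \in \mathcal{L}(V) \setminus \{0\}$ is defined as

$$d_{\mathrm{rel}}(g, g') := \frac{\|g - g'\|}{\max\{\|g\|, \|g'\|\}}.$$

Notice that this relative distance is not a metric. It does not satisfy the triangle inequality. We introduce it just to lighten the notation.

Proposition 2.40 The map $\mathfrak{v} : \mathcal{L}_\kappa \to \mathbb{P}(V)$ is locally Lipschitz. More precisely, given $0 < \kappa < 1$ there exists $\varepsilon_0 = \varepsilon_0(\kappa) > 0$, which increases as $\kappa$ decreases, such that for any $g_1, g_2 \in \mathcal{L}_\kappa$ satisfying $d_{\mathrm{rel}}(g_1, g_2) \le \varepsilon_0$,

$$d(\mathfrak{v}(g_1), \mathfrak{v}(g_2)) \le \frac{16}{1 - \kappa^2}\, d_{\mathrm{rel}}(g_1, g_2).$$

Proof Let $g \in \mathcal{L}_\kappa$ and $\lambda > 0$. The singular values (resp. singular vectors) of $g$ are the eigenvalues (resp. eigenvectors) of $\sqrt{g^\ast g}$. Hence $s_j(\lambda\, g) = \lambda\, s_j(g)$, for all $j$. We also have $\mathfrak{v}(\lambda g) = \mathfrak{v}(g)$ and $\mathrm{gr}(\lambda\, g) = \mathrm{gr}(g)$. Consider the subspace $\mathcal{L}_\kappa(1) := \{g \in \mathcal{L}_\kappa : \|g\| = 1\}$. The projection $g \mapsto g / \|g\|$ takes $\mathcal{L}_\kappa$ to $\mathcal{L}_\kappa(1)$. It also satisfies $\mathfrak{v}(g / \|g\|) = \mathfrak{v}(g)$ and

$$\left\| \frac{g_1}{\|g_1\|} - \frac{g_2}{\|g_2\|} \right\| \le 2\, d_{\mathrm{rel}}(g_1, g_2).$$

Hence we can focus our attention on the restricted map $\mathfrak{v} : \mathcal{L}_\kappa(1) \to \mathbb{P}(V)$. Let $\mathcal{L}^+_\kappa(1)$ denote the subspace of $g \in \mathcal{L}_\kappa(1)$ such that $g = g^\ast \ge 0$, i.e., $g$ is positive semi-definite. Given $g \in \mathcal{L}_\kappa(1)$, we have $\|g^\ast g\| = 1 = \|g\|$, $\mathrm{gr}(g^\ast g) = \mathrm{gr}(g)^2$ and $\mathfrak{v}(g^\ast g) = \mathfrak{v}(g)$. Also, for all $g_1, g_2 \in \mathcal{L}_\kappa(1)$,

$$\|g_1^\ast g_1 - g_2^\ast g_2\| \le \|g_1^\ast\|\, \|g_1 - g_2\| + \|g_1^\ast - g_2^\ast\|\, \|g_2\| = (\|g_1^\ast\| + \|g_2\|)\, \|g_1 - g_2\| \le 2\, \|g_1 - g_2\|.$$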


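Hence, the mapping $g \mapsto g^\ast g$ takes $\mathcal{L}_\kappa(1)$ to $\mathcal{L}^+_{\kappa^2}(1)$ and has Lipschitz constant 2. Therefore, it is enough to prove that the restricted map $\mathfrak{v} : \mathcal{L}^+_{\kappa^2}(1) \to \mathbb{P}(V)$ has (locally) Lipschitz constant $4\, (1 - \kappa^2)^{-1}$.

Let $\delta_0$ be a small positive number and take $0 < \varepsilon_0 \ll \frac{\delta_0}{4}$. The size of $\delta_0$ will be fixed throughout the rest of the proof according to necessity. Take $h_1, h_2 \in \mathcal{L}^+_{\kappa^2}(1)$ such that $\|h_1 - h_2\| < \varepsilon_0$ and set $\hat p_0 := \mathfrak{v}(h_1)$. By Proposition 2.29 we have

$$\varphi_{h_1}\big(B(\hat p_0, \delta_0)\big) \subset B\left(\hat p_0, \frac{\kappa^2\, \delta_0}{\sqrt{1 - \delta_0^2}}\right) \subset B(\hat p_0, \delta_0),$$

where all balls refer to the projective sine-metric $\delta$ defined in (2.3). The second inclusion holds if $\delta_0$ is chosen small enough. Take any $\hat p \in B(\hat p_0, \delta_0)$ and choose unit vectors $p \in \hat p$ and $p_0 \in \hat p_0$ such that $\langle p, p_0\rangle > 0$. Then $p = \langle p, p_0\rangle\, p_0 + w$, with $w \in p_0^\perp$, $h_1(p_0) = p_0$ and $h_1(w) \in p_0^\perp$. Hence

$$\|h_1(p)\| = \|\langle p, p_0\rangle\, p_0 + h_1(w)\| \ge \langle p, p_0\rangle = \sqrt{1 - \|p \wedge p_0\|^2} \ge \sqrt{1 - \delta_0^2} \ge 1/2,$$

and again, assuming $\delta_0$ is small,

$$\|h_2(p)\| \ge \|h_1(p)\| - \|h_1 - h_2\| \ge \sqrt{1 - \delta_0^2} - \varepsilon_0 \ge 1/2.$$

Thus, by Lemma 2.9 below, for all $\hat p \in B(\hat p_0, \delta_0)$,

$$d(\varphi_{h_1}(\hat p), \varphi_{h_2}(\hat p)) \le 2\, \|h_1 - h_2\|.$$

Choosing $\varepsilon_0$ small enough, $\frac{\kappa^2\, \delta_0}{\sqrt{1 - \delta_0^2}} + 2\, \varepsilon_0 < \delta_0$. This implies that

$$\varphi_{h_2}\big(B(\hat p_0, \delta_0)\big) \subset B(\hat p_0, \delta_0).$$

By Proposition 2.29 we know that $T_1 = \varphi_{h_1}|_{B(\hat p_0, \delta_0)}$ has Lipschitz constant $\kappa' = \kappa^2\, \frac{\delta_0 + \sqrt{1 - \delta_0^2}}{1 - \delta_0^2} \approx \kappa^2$, and assuming $\delta_0$ is small enough we have $\frac{1}{1 - \kappa'} \le \frac{2}{1 - \kappa^2}$. Notice that although the Lipschitz constant in this proposition refers to the Riemannian metric $\rho$, since the ratio $\mathrm{Lip}_\delta(T_1) / \mathrm{Lip}_\rho(T_1)$ approaches 1 as $\delta_0$ tends to 0, we can assume that $\mathrm{Lip}_\delta(T_1) \le \kappa'$. Thus, by Lemma 2.7 below applied to $T_1$ and $T_2 = \varphi_{h_2}|_{B(\hat p_0, \delta_0)}$, we have $d(T_1, T_2) \le 2\, \|h_1 - h_2\|$ and

$$d(\mathfrak{v}(h_1), \mathfrak{v}(h_2)) \le \frac{1}{1 - \kappa'}\, d(T_1, T_2) \le \frac{4}{1 - \kappa^2}\, \|h_1 - h_2\|.$$

□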




Lemma 2.7 Let $(X, d)$ be a complete metric space, $T_1 : X \to X$ a Lipschitz contraction with $\mathrm{Lip}(T_1) < \kappa < 1$, $x_1^* = T_1(x_1^*)$ a fixed point, and $T_2 : X \to X$ any other map with a fixed point $x_2^* = T_2(x_2^*)$. Then
\[ d(x_1^*, x_2^*) \le \frac{1}{1-\kappa}\, d(T_1, T_2), \]
where $d(T_1, T_2) := \sup_{x \in X} d(T_1(x), T_2(x))$.

Proof
\[ d(x_1^*, x_2^*) = d(T_1(x_1^*), T_2(x_2^*)) \le d(T_1(x_1^*), T_1(x_2^*)) + d(T_1(x_2^*), T_2(x_2^*)) \le \kappa\, d(x_1^*, x_2^*) + d(T_1, T_2), \]
which implies that
\[ d(x_1^*, x_2^*) \le \frac{1}{1-\kappa}\, d(T_1, T_2). \]
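The bound of Lemma 2.7 can be checked numerically on a toy example. The sketch below is only an illustration under assumptions that are not in the text: the space is the real line, and the maps `T1`, `T2`, the helper `fixed_point` and the value `kappa = 0.51` are hypothetical choices.

```python
import math

def fixed_point(T, x0=0.0, iterations=200):
    """Approximate the fixed point of a contraction T by iterating from x0."""
    x = x0
    for _ in range(iterations):
        x = T(x)
    return x

# Hypothetical example on X = R with the usual distance.
T1 = lambda x: 0.5 * x + 1.0                      # Lip(T1) = 0.5, fixed point 2
T2 = lambda x: 0.5 * x + 1.0 + 0.1 * math.sin(x)  # a perturbation of T1

kappa = 0.51        # any number with Lip(T1) < kappa < 1
dist_T1_T2 = 0.1    # sup_x |T1(x) - T2(x)| = sup_x |0.1 sin x| = 0.1

x1 = fixed_point(T1)
x2 = fixed_point(T2)
bound = dist_T1_T2 / (1.0 - kappa)

# The distance between the fixed points should not exceed the bound.
print(abs(x1 - x2), "<=", bound)
```

Note that only `T1` needs to be a contraction for the lemma; `T2` is a contraction here merely so that its fixed point can be found by iteration.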



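Lemma 2.8 Given $g_1, g_2 \in \mathcal{L}(V)$, for any $1 \le i \le \dim V$,
\[ \| \wedge_i g_1 - \wedge_i g_2 \| \le i\, \max\{1, \|g_1\|, \|g_2\|\}^{i-1}\, \|g_1 - g_2\|. \]

Proof Given any unit $i$-vector $v_1 \wedge \cdots \wedge v_i \in \wedge_i V$, determined by an orthonormal family of vectors $\{v_1, \ldots, v_i\}$,
\begin{align*}
\| (\wedge_i g_1)(v_1 \wedge \cdots \wedge v_i) - (\wedge_i g_2)(v_1 \wedge \cdots \wedge v_i) \|
&= \| (g_1 v_1) \wedge \cdots \wedge (g_1 v_i) - (g_2 v_1) \wedge \cdots \wedge (g_2 v_i) \| \\
&\le \sum_{j=1}^{i} \| (g_1 v_1) \wedge \cdots \wedge (g_1 v_{j-1}) \wedge (g_1 v_j - g_2 v_j) \wedge (g_2 v_{j+1}) \wedge \cdots \wedge (g_2 v_i) \| \\
&\le \sum_{j=1}^{i} \|g_1\|^{j-1}\, \|g_2\|^{i-j}\, \|g_1 v_j - g_2 v_j\| \\
&\le i\, \max\{1, \|g_1\|, \|g_2\|\}^{i-1}\, \|g_1 - g_2\|.
\end{align*}

Given a dimension $1 \le l \le \dim V$ and $0 < \kappa < 1$, consider the set
\[ \mathcal{L}_{l,\kappa} := \{ g \in \mathcal{L}(V) : \mathrm{gr}_l(g) \ge \kappa^{-1} \}, \]
and define
\[ C_l(g_1, g_2) := \frac{l\, \max\{1, \|g_1\|, \|g_2\|\}^{l-1}}{\max\{ \|\wedge_l g_1\|, \|\wedge_l g_2\| \}}. \]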
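Lemma 2.8 can be sanity-checked in the smallest nontrivial case. The sketch below assumes $V = \mathbb{R}^2$ and $i = 2$, where $\wedge_2 g$ acts on the one-dimensional space $\wedge_2 \mathbb{R}^2$ as multiplication by $\det g$, so the left-hand side is $|\det g_1 - \det g_2|$. The helper names and the random sampling are illustrative choices, not part of the text.

```python
import math
import random

def op_norm(g):
    """Operator norm (largest singular value) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d        # s1^2 + s2^2
    det = a * d - b * c                      # +/- s1 * s2
    disc = max(t * t - 4.0 * det * det, 0.0)
    return math.sqrt((t + math.sqrt(disc)) / 2.0)

def det2(g):
    return g[0][0] * g[1][1] - g[0][1] * g[1][0]

def sub(g, h):
    return [[g[i][j] - h[i][j] for j in range(2)] for i in range(2)]

rng = random.Random(0)

def random_matrix():
    return [[rng.uniform(-3.0, 3.0) for _ in range(2)] for _ in range(2)]

worst = 0.0  # largest observed ratio (left-hand side) / (right-hand side)
for _ in range(1000):
    g1, g2 = random_matrix(), random_matrix()
    lhs = abs(det2(g1) - det2(g2))           # norm of the difference of second exterior powers
    rhs = 2 * max(1.0, op_norm(g1), op_norm(g2)) * op_norm(sub(g1, g2))
    worst = max(worst, lhs / rhs)

print(worst)  # the lemma predicts a value not exceeding 1
```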

Corollary 2.2 The map $v : \mathcal{L}_{l,\kappa} \to \mathrm{Gr}_l(V)$ is locally Lipschitz. More precisely, given $0 < \kappa < 1$ there exists $\varepsilon_0 > 0$ such that for any $g_1, g_2 \in \mathcal{L}_{l,\kappa}$ such that $\|g_1 - g_2\| \le \varepsilon_0\, C_l(g_1, g_2)^{-1}$, we have
\[ d(v_l(g_1), v_l(g_2)) \le \frac{16}{1-\kappa^2}\, C_l(g_1, g_2)\, \|g_1 - g_2\|. \]

Proof By Lemma 2.8, $d_{\mathrm{rel}}(\wedge_l g_1, \wedge_l g_2) \le C_l(g_1, g_2)\, \|g_1 - g_2\|$. Apply Proposition 2.40 to the linear maps $\wedge_l g_j : \wedge_l V \to \wedge_l V$, $j = 1, 2$.

Given $g \in \mathcal{L}(V)$ having $k$ and $k + r$ gap ratios, if a subspace $E \in \mathrm{Gr}_k(V)$ is close to the $g$-most expanding subspace $v_k(g)$ then the restriction $g|_{E^\perp}$ has an $r$-gap ratio and the most expanding $r$-dimensional subspace of $g|_{E^\perp}$ is close to the intersection of $v_{k+r}(g)$ with $E^\perp$. The next proposition expresses this fact in a quantitative way.

Proposition 2.41 Given $0 < \kappa < \frac{1}{2}$ and integers $1 \le k < k + r \le \dim V$, there exists $\delta_0 > 0$ such that for all $g \in \mathcal{L}(V)$ and $E \in \mathrm{Gr}_k(V)$, if
(a) $\sigma_k(g) < \kappa$ and $\sigma_{k+r}(g) < \kappa$,
(b) $\delta(E, v_k(g)) < \delta_0$,
then
(1) $\sigma_r(g|_{E^\perp}) \le 2\, \kappa$,
(2) $\delta\big( v_r(g|_{E^\perp}),\, v_{k+r}(g) \cap E^\perp \big) \le \dfrac{20\, r}{1 - 4\kappa^2}\, \delta(E, v_k(g))$.

Proof Consider the compact space $K_r = \{ h \in \mathcal{L}(V) : \|h\| \le 1 \text{ and } \sigma_r(h) \le \kappa \}$. By uniform continuity of $\sigma_r$ on $K_r$ there exists $\delta_0 > 0$ such that for all $h \in \mathcal{L}(V)$ if there exists $h_0 \in K_r$ with $\|h - h_0\| < \delta_0$ then $\sigma_r(h) \le 2\, \kappa$. Recall that $\pi_F$ denotes the orthogonal projection onto a linear subspace $F \subset V$. Given $g \in \mathcal{L}(V)$ such that (a) holds, consider the map $h = \frac{g}{\|g\|} \circ \pi_{v_k(g)^\perp}$. We have $h \in K_r$ because $\sigma_r(h) = \sigma_r(g \circ \pi_{v_k(g)^\perp}) = \sigma_{k+r}(g) < \kappa$. Given $E \in \mathrm{Gr}_k(V)$ such that (b) holds, we define $h_E = \frac{g}{\|g\|} \circ \pi_{E^\perp}$. Then by items (b) and (c) of Proposition 2.16
\[ \|h - h_E\| \le \|\pi_{v_k(g)^\perp} - \pi_{E^\perp}\| \le \delta(v_k(g)^\perp, E^\perp) = \delta(E, v_k(g)) < \delta_0, \]
which implies that $\sigma_r(g|_{E^\perp}) = \sigma_r(h_E) \le 2\, \kappa$, and hence proves (1).


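To prove item (2) we use the triangle inequality
\begin{align*}
\delta\big( v_r(g|_{E^\perp}),\, v_{k+r}(g) \cap E^\perp \big)
&\le \delta(v_r(h_E), v_r(h)) + \delta\big( v_r(h),\, v_{k+r}(g) \cap v_k(g)^\perp \big) + \delta\big( v_{k+r}(g) \cap v_k(g)^\perp,\, v_{k+r}(g) \cap E^\perp \big) \\
&\le \Big( \frac{16\, r}{1 - 4\kappa^2} + 0 + 2 \Big)\, \delta(E, v_k(g)) \\
&\le \frac{20\, r}{1 - 4\kappa^2}\, \delta(E, v_k(g)).
\end{align*}
By Corollary 2.2, with $C_r(h_E, h) = r$, we get a bound on $\delta(v_r(h_E), v_r(h))$. The second distance is zero because $v_r(h) = v_{k+r}(g) \cap v_k(g)^\perp$. Finally we use item (2) of Proposition 2.30 to derive a bound on the third distance. Notice that although the conclusion of Proposition 2.30 is stated in terms of the distance $d$, the ratio between the metrics $d$ and $\delta$ is very close to 1 when $\delta_0$ is small. Finally notice that $v_k(g) \subset v_{k+r}(g)$ implies $\theta_\cap(v_{k+r}(g), v_k(g)^\perp) = 1$.

Lemma 2.9 Given $g_1, g_2 \in \mathcal{L}(V)$, $\hat p \in \mathbb{P}(g_1) \cap \mathbb{P}(g_2)$ and any unit vector $p \in \hat p$,
\[ d(\varphi_{g_1}(\hat p), \varphi_{g_2}(\hat p)) \le \max\Big\{ \frac{1}{\|g_1 p\|}, \frac{1}{\|g_2 p\|} \Big\}\, \|g_1 - g_2\|. \]

Proof Applying Proposition 2.26 to the non-zero vectors $g_1 p$ and $g_2 p$, we get
\begin{align*}
d(\varphi_{g_1}(\hat p), \varphi_{g_2}(\hat p))
&\le \Big\| \frac{g_1 p}{\|g_1 p\|} - \frac{g_2 p}{\|g_2 p\|} \Big\| \\
&\le \max\{ \|g_1 p\|^{-1}, \|g_2 p\|^{-1} \}\, \|g_1 p - g_2 p\| \\
&\le \max\{ \|g_1 p\|^{-1}, \|g_2 p\|^{-1} \}\, \|g_1 - g_2\|.
\end{align*}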
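The middle step in the proof of Lemma 2.9 is an inequality between normalized vectors, $\| u/\|u\| - v/\|v\| \| \le \max\{\|u\|^{-1}, \|v\|^{-1}\}\, \|u - v\|$. Proposition 2.26 is not reproduced in this section, so reading it as this vector inequality is an assumption. The sketch below tests the inequality on random vectors in $\mathbb{R}^2$; all names are illustrative.

```python
import math
import random

def norm(u):
    return math.hypot(u[0], u[1])

def normalize(u):
    n = norm(u)
    return (u[0] / n, u[1] / n)

rng = random.Random(1)
worst = 0.0  # largest observed ratio (left-hand side) / (right-hand side)
for _ in range(1000):
    u = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
    v = (rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0))
    nu, nv = normalize(u), normalize(v)
    lhs = norm((nu[0] - nv[0], nu[1] - nv[1]))            # distance between unit vectors
    rhs = max(1.0 / norm(u), 1.0 / norm(v)) * norm((u[0] - v[0], u[1] - v[1]))
    worst = max(worst, lhs / rhs)

print(worst)  # expected not to exceed 1
```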



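The final four lemmas of this section apply to invertible linear maps in $\mathrm{GL}(V)$. They express the continuity of the map $g \mapsto \varphi_g$ with values in the space of Lipschitz or Hölder continuous maps on the projective space. These facts will be needed in Chap. 5.

Lemma 2.10 Given $g_1, g_2 \in \mathrm{GL}(V)$, and $\hat p \ne \hat q$ in $\mathbb{P}(V)$,
\[ \left| \frac{\delta(\varphi_{g_1}(\hat p), \varphi_{g_1}(\hat q))}{\delta(\hat p, \hat q)} - \frac{\delta(\varphi_{g_2}(\hat p), \varphi_{g_2}(\hat q))}{\delta(\hat p, \hat q)} \right| \le C(g_1, g_2)\, \|g_1 - g_2\|, \]
where $C(g_1, g_2) := \big( \|g_1^{-1}\|^2 + \|g_2\|^2\, \|g_1^{-1}\|^2\, \|g_2^{-1}\|^2 \big)\, (\|g_1\| + \|g_2\|)$.

Proof Given unit vectors $p \in \hat p$ and $q \in \hat q$, by Proposition 2.27
\begin{align*}
\left| \frac{\delta(\varphi_{g_1}(\hat p), \varphi_{g_1}(\hat q))}{\delta(\hat p, \hat q)} - \frac{\delta(\varphi_{g_2}(\hat p), \varphi_{g_2}(\hat q))}{\delta(\hat p, \hat q)} \right|
&= \left| \frac{\|g_1 p \wedge g_1 v_p(q)\|}{\|g_1 p\|\, \|g_1 q\|} - \frac{\|g_2 p \wedge g_2 v_p(q)\|}{\|g_2 p\|\, \|g_2 q\|} \right| \\
&\le \frac{\|g_1 p \wedge g_1 v_p(q) - g_2 p \wedge g_2 v_p(q)\|}{\|g_1 p\|\, \|g_1 q\|}
 + \left| \frac{1}{\|g_1 p\|\, \|g_1 q\|} - \frac{1}{\|g_2 p\|\, \|g_2 q\|} \right| \|g_2 p \wedge g_2 v_p(q)\| \\
&\le \|g_1^{-1}\|^2\, \|g_1 p \wedge (g_1 v_p(q) - g_2 v_p(q))\| + \|g_1^{-1}\|^2\, \|(g_1 p - g_2 p) \wedge g_2 v_p(q)\| \\
&\quad + \|g_1^{-1}\|^2\, \|g_2^{-1}\|^2\, \big( \|g_1 p\|\, \|g_1 q - g_2 q\| + \|g_2 q\|\, \|g_1 p - g_2 p\| \big)\, \|g_2\|^2 \\
&\le \|g_1^{-1}\|^2\, (\|g_1\| + \|g_2\|)\, \|g_1 - g_2\| + \|g_2\|^2\, \|g_1^{-1}\|^2\, \|g_2^{-1}\|^2\, (\|g_1\| + \|g_2\|)\, \|g_1 - g_2\| \\
&= \big( \|g_1^{-1}\|^2 + \|g_2\|^2\, \|g_1^{-1}\|^2\, \|g_2^{-1}\|^2 \big)\, (\|g_1\| + \|g_2\|)\, \|g_1 - g_2\|.
\end{align*}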



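Lemma 2.11 Given $g \in \mathrm{GL}(V)$ and $\hat p \ne \hat q$ in $\mathbb{P}(V)$,
\[ \frac{1}{\|g\|^2\, \|g^{-1}\|^2} \le \frac{\delta(\varphi_g(\hat p), \varphi_g(\hat q))}{\delta(\hat p, \hat q)} \le \|g\|^2\, \|g^{-1}\|^2. \]

Proof Given $\hat p \ne \hat q$ in $\mathbb{P}(V)$ consider unit vectors $p \in \hat p$, $q \in \hat q$ and set $v = v_p(q)$. We have $\|p\| = \|q\| = \|v\| = 1$ and $\langle p, v \rangle = 0$. This last relation implies $\|p \wedge v\| = 1$. Hence
\[ \|g p \wedge g v\| = \|(\wedge_2 g)(p \wedge v)\| \ge \|(\wedge_2 g)^{-1}\|^{-1} \ge \|g^{-1}\|^{-2}. \]
Analogously $\|g p \wedge g v\| = \|(\wedge_2 g)(p \wedge v)\| \le \|\wedge_2 g\| \le \|g\|^2$. We also have
\[ \|g^{-1}\|^{-2} \le \|g p\|\, \|g q\| \le \|g\|^2. \]
To finish the proof combine these inequalities with Proposition 2.27.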
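The two-sided bound of Lemma 2.11 can be probed numerically for $V = \mathbb{R}^2$. The sketch below assumes that $\delta$ is the sine metric $\delta(\hat p, \hat q) = \|p \wedge q\| / (\|p\|\, \|q\|)$ and uses $\|g\| = s_1(g)$, $\|g^{-1}\| = 1/s_2(g)$; the sampling scheme and helper names are hypothetical.

```python
import math
import random

def singular_values(g):
    """Singular values s1 >= s2 of a 2x2 real matrix."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d      # s1^2 + s2^2
    det = abs(a * d - b * c)               # s1 * s2
    s1 = math.sqrt((t + math.sqrt(max(t * t - 4.0 * det * det, 0.0))) / 2.0)
    return s1, det / s1

def apply(g, p):
    return (g[0][0] * p[0] + g[0][1] * p[1], g[1][0] * p[0] + g[1][1] * p[1])

def sine_distance(p, q):
    """delta(p^, q^) = |p ^ q| / (|p| |q|) for non-zero vectors in R^2."""
    return abs(p[0] * q[1] - p[1] * q[0]) / (math.hypot(*p) * math.hypot(*q))

rng = random.Random(4)
checked, violations = 0, 0
for _ in range(2000):
    g = [[rng.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(2)]
    p = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    q = (rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0))
    s1, s2 = singular_values(g)
    if s2 < 1e-3 or sine_distance(p, q) < 1e-6:
        continue  # skip nearly singular maps and nearly equal projective points
    bound = (s1 / s2) ** 2                 # ||g||^2 ||g^{-1}||^2
    ratio = sine_distance(apply(g, p), apply(g, q)) / sine_distance(p, q)
    checked += 1
    if not ((1.0 / bound) * (1 - 1e-9) <= ratio <= bound * (1 + 1e-9)):
        violations += 1

print(checked, violations)
```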



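Given $g \in \mathrm{GL}(V)$, we define $\ell(g) := \max\{ \log \|g\|, \log \|g^{-1}\| \}$.

Lemma 2.12 For every $g \in \mathrm{GL}(V)$ and $\hat p \ne \hat q$ in $\mathbb{P}(V)$,
\[ -4\, \ell(g) \le \log \Big[ \frac{\delta(\varphi_g(\hat p), \varphi_g(\hat q))}{\delta(\hat p, \hat q)} \Big] \le 4\, \ell(g). \tag{2.24} \]

Proof It follows from Lemma 2.11.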


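Lemma 2.13 Given $g_1, g_2 \in \mathrm{GL}(V)$, $0 < \alpha \le 1$ and $\hat p \ne \hat q$ in $\mathbb{P}(V)$,
\[ \left| \Big[ \frac{\delta(\varphi_{g_1}(\hat p), \varphi_{g_1}(\hat q))}{\delta(\hat p, \hat q)} \Big]^\alpha - \Big[ \frac{\delta(\varphi_{g_2}(\hat p), \varphi_{g_2}(\hat q))}{\delta(\hat p, \hat q)} \Big]^\alpha \right| \le C_1(g_1, g_2)\, \|g_1 - g_2\|, \]
where $C_1(g_1, g_2) = \alpha\, \max\{ \|g_1\|\, \|g_1^{-1}\|, \|g_2\|\, \|g_2^{-1}\| \}^{2(1-\alpha)}\, C(g_1, g_2)$, and $C(g_1, g_2)$ stands for the constant in Lemma 2.10.

Proof Setting $\Delta_1 := \dfrac{\delta(\varphi_{g_1} \hat p, \varphi_{g_1} \hat q)}{\delta(\hat p, \hat q)}$ and $\Delta_2 := \dfrac{\delta(\varphi_{g_2} \hat p, \varphi_{g_2} \hat q)}{\delta(\hat p, \hat q)}$, from Lemmas 2.10 and 2.11 we get
\begin{align*}
|\Delta_1^\alpha - \Delta_2^\alpha|
&\le \alpha\, \max\{ \Delta_1^{\alpha-1}, \Delta_2^{\alpha-1} \}\, |\Delta_1 - \Delta_2| \\
&\le \alpha\, \max\{ \|g_1\|\, \|g_1^{-1}\|, \|g_2\|\, \|g_2^{-1}\| \}^{2(1-\alpha)}\, |\Delta_1 - \Delta_2| \\
&\le \alpha\, \max\{ \|g_1\|\, \|g_1^{-1}\|, \|g_2\|\, \|g_2^{-1}\| \}^{2(1-\alpha)}\, C(g_1, g_2)\, \|g_1 - g_2\|.
\end{align*}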
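The first inequality in the proof of Lemma 2.13 is the scalar mean value estimate $|a^\alpha - b^\alpha| \le \alpha\, \max\{a^{\alpha-1}, b^{\alpha-1}\}\, |a - b|$ for $a, b > 0$ and $0 < \alpha \le 1$. A small numerical probe, with sampling ranges chosen only for illustration:

```python
import random

rng = random.Random(2)
worst = 0.0  # largest observed ratio (left-hand side) / (right-hand side)
for _ in range(1000):
    alpha = rng.uniform(0.05, 1.0)
    a = rng.uniform(0.1, 10.0)
    b = rng.uniform(0.1, 10.0)
    if abs(a - b) < 1e-6:
        continue  # avoid cancellation noise when a and b nearly coincide
    lhs = abs(a ** alpha - b ** alpha)
    # t -> alpha * t**(alpha - 1) is decreasing, so its maximum on the
    # interval between a and b is attained at the smaller endpoint.
    rhs = alpha * max(a ** (alpha - 1.0), b ** (alpha - 1.0)) * abs(a - b)
    worst = max(worst, lhs / rhs)

print(worst)  # expected not to exceed 1
```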



2.4 Avalanche Principle

Consider a long chain of $n$ linear maps $g_0 : V_0 \to V_1$, $g_1 : V_1 \to V_2$, etc., between Euclidean spaces $V_i$ of the same dimension $m$. The AP relates the expansion $\|g_{n-1} \ldots g_1 g_0\|$ of the composition $g_{n-1} \ldots g_1 g_0$ with the product of the individual expansions $\|g_{n-1}\| \ldots \|g_1\|\, \|g_0\|$. Given two quantities $M_n$ and $N_n$ depending on a large number $n \in \mathbb{N}$, we say in rough terms that they are $\varepsilon$-asymptotic, and write $M_n \overset{\varepsilon}{\asymp} N_n$, when $e^{-n \varepsilon} \le M_n / N_n \le e^{n \varepsilon}$. In general it is not true that $\|g_{n-1} \ldots g_1 g_0\| \overset{\varepsilon}{\asymp} \|g_{n-1}\| \ldots \|g_1\|\, \|g_0\|$ for some small $\varepsilon > 0$, unless some atypically sharp alignment of the singular directions of the linear maps $g_j$ occurs. Given the chain of linear maps $g_0, g_1, \ldots, g_{n-1}$, its rift
\[ \rho(g_0, \ldots, g_{n-1}) := \frac{\|g_{n-1} \ldots g_0\|}{\|g_{n-1}\| \ldots \|g_0\|} \in [0, 1] \]
measures the break of expansion in the composition $g_{n-1} \ldots g_1 g_0$. The AP says that given any such chain $g_0, g_1, \ldots, g_{n-1}$, where the gap ratio of each map $g_j$ is large, and the rift of any pair of consecutive maps is never too small, the rift of the composition behaves multiplicatively, in the sense that for some small number $\varepsilon > 0$,
\[ \rho(g_0, g_1, \ldots, g_{n-1}) \overset{\varepsilon}{\asymp} \rho(g_0, g_1)\, \rho(g_1, g_2) \ldots \rho(g_{n-2}, g_{n-1}), \]
or, equivalently,
\[ \frac{\|g_{n-1} \ldots g_1 g_0\|\; \|g_1\| \ldots \|g_{n-2}\|}{\|g_1 g_0\| \ldots \|g_{n-1} g_{n-2}\|} \overset{\varepsilon}{\asymp} 1. \]
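To make the notion of rift concrete, here is a minimal sketch for chains of $2 \times 2$ real matrices. The function names and the two sample chains are hypothetical: in the first chain the singular directions are aligned and the expansions multiply, in the second they are orthogonal and the expansion is lost.

```python
import math

def matmul(g, h):
    return [[sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def op_norm(g):
    """Operator norm (largest singular value) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt((t + math.sqrt(max(t * t - 4.0 * det * det, 0.0))) / 2.0)

def rift(chain):
    """rho(g_0, ..., g_{n-1}) = ||g_{n-1} ... g_0|| / (||g_{n-1}|| ... ||g_0||)."""
    product = chain[0]
    for g in chain[1:]:
        product = matmul(g, product)       # later maps act on the left
    return op_norm(product) / math.prod(op_norm(g) for g in chain)

expanding = [[10.0, 0.0], [0.0, 0.1]]      # stretches e1, shrinks e2
swapped = [[0.1, 0.0], [0.0, 10.0]]        # stretches e2, shrinks e1

print(rift([expanding, expanding]))        # close to 1: expansions multiply
print(rift([expanding, swapped]))          # close to 0.01: expansion is lost
```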

The AP was introduced by Goldstein and Schlag [6, Proposition 2.2] as a technique to obtain Hölder continuity of the integrated density of states for quasi-periodic

Schrödinger cocycles. In its original version, the AP applies to chains of unimodular matrices in SL(2, C), and the length of the chain is assumed to be less than some lower bound on the norms of the matrices. Note that for unimodular matrices, the gap ratio and the norm are two equivalent measurements. Still in this unimodular setting, for matrices in SL(2, R), Bourgain and Jitomirskaya [4, Lemma 5] relaxed the constraint on the length of the chain of matrices, and later Bourgain [3, Lemma 2.6] removed it, at the cost of slightly weakening the conclusion of the AP. Later, Schlag [7, Lemma 1] generalized the AP to invertible matrices in GL(m, C). Recently, C. Sadel has shared with the authors an earlier draft of [1], containing his version of the AP for GL(m, C) matrices. Both of these higher dimensional APs assume some bound on the length of the chains of matrices. A higher dimensional AP without this assumption was proven by the authors [5, Theorem 3.1] for invertible real matrices. We present here a more general AP, which holds for (possibly non-invertible) matrices in Mat(m, R). As a by-product of the geometric approach used in the proof, we also obtain a quantitative control on the most expanding directions of the matrix product, something essential in the proof of the continuity of the Oseledets decomposition.

2.4.1 Contractive Shadowing

Here we prove a shadowing lemma saying that under some conditions, a loose pseudo-orbit of a chain of contracting maps is shadowed by a true orbit of the mapping sequence. In particular, a closed pseudo-orbit is shadowed by a periodic orbit of the mapping chain. Given a metric space $(X, d)$, denote the closed $\varepsilon$-ball around $x \in X$ by $B(x, \varepsilon) := \{ z \in X : d(z, x) \le \varepsilon \}$. Given an open set $X^0 \subset X$, define $X^0(\varepsilon) := \{ x \in X^0 : d(x, \partial X^0) \ge \varepsilon \}$, where $\partial X^0$ denotes the topological boundary of $X^0$ in $(X, d)$.

Lemma 2.14 (shadowing lemma) Consider $\varepsilon > 0$ and $0 < \delta < \kappa < 1$ such that $\delta/(1-\kappa) < \varepsilon < 1/2$. Given a family $\{(X_j, d_j)\}_{0 \le j \le n}$ of compact metric spaces with diameter 1, a chain of continuous mappings $\{ g_j : X_j^0 \to X_{j+1} \}_{0 \le j \le n-1}$ defined on open sets $X_j^0 \subset X_j$, and a sequence of points $x_j \in X_j$, assume that for every $0 \le j \le n - 1$:
(a) $x_j \in X_j^0$ and $d(x_j, \partial X_j^0) = 1$,
(b) $g_j$ has Lipschitz constant $\le \kappa$ on $X_j^0(\varepsilon)$,
(c) $g_j(x_j) \in X_{j+1}^0(2\, \varepsilon)$,
(d) $g_j(X_j^0(\varepsilon)) \subset B(g_j(x_j), \delta)$.
Then, setting $g^{(n)} := g_{n-1} \circ \cdots \circ g_1 \circ g_0$, the following hold:
(1) the composition $g^{(n)}$ is defined on $B(x_0, \varepsilon)$ and $\mathrm{Lip}(g^{(n)}|_{B(x_0, \varepsilon)}) \le \kappa^n$,
(2) $d\big( g_{n-1}(x_{n-1}),\, g^{(n)}(x_0) \big) \le \dfrac{\delta}{1-\kappa}$,
(3) if $x_0 = g_{n-1}(x_{n-1})$ then $g^{(n)}(B(x_0, \varepsilon)) \subset B(x_0, \varepsilon)$ and there is a point $x^* \in B(x_0, \varepsilon)$ such that $g^{(n)}(x^*) = x^*$ and $d(x_0, x^*) \le \dfrac{\delta}{(1-\kappa)(1-\kappa^n)}$.

Proof The proof's inductive scheme is better understood with the help of Fig. 2.1 (see also Fig. 2.2), where we set $z_j^i := (g_{j-1} \circ \cdots \circ g_{i+1} \circ g_i)(x_i)$ for $i \le j \le n$, with the convention that this composition is the identity when $i = j$. Of course we have to prove that all points $z_j^i$ are well-defined. The boxed expressions represent upper bounds on the distance between the points respectively above and below the box. The $i$th row represents the orbit of $x_i \in X_i$ by the chain of mappings $\{g_j\}_{j \ge i}$. All points in the $j$th column belong to the space $X_j$. To explain the last upper bound at the bottom of each column, first notice that $z_i^i = x_i$. By (a), $z_i^{i-1} = g_{i-1}(x_{i-1})$ is well-defined, and by (c), $z_i^{i-1} \in X_i^0(2\, \varepsilon) \subset X_i^0(\varepsilon)$. Likewise $z_{i-1}^{i-2} \in X_{i-1}^0(\varepsilon)$, and $z_i^{i-2} = g_{i-1}(g_{i-2}(x_{i-2}))$ is well-defined. Then by (d) we have
\[ d(z_i^{i-1}, z_i^{i-2}) = d\big( g_{i-1}(x_{i-1}),\, g_{i-1}(g_{i-2}(x_{i-2})) \big) \le \delta. \tag{2.25} \]

Fig. 2.1 Family of orbits for the chain of mappings $\{ g_j : X_j^0 \to X_{j+1} \}_j$ (triangular diagram: the $i$th row is the orbit $z_i^i \to z_{i+1}^i \to \cdots \to z_n^i$ of $x_i$, and the boxed bound between the consecutive points $z_j^{i-1}$ and $z_j^i$ of the $j$th column is $\kappa^{j-i-1} \delta$)

Fig. 2.2 Shadowing property for a chain of contractive mappings

All other bounds are obtained applying (b) inductively. More precisely, we prove by induction in the column index $j$ that
(i) all points $z_j^i$ in the $j$th column are well-defined and belong to $X_j^0(\varepsilon)$,
(ii) distances between consecutive points in the column $j$ are bounded by the expressions in Fig. 2.1, i.e., for all $1 \le i \le j - 1$,
\[ d(z_j^{i-1}, z_j^i) \le \kappa^{j-i-1}\, \delta. \tag{2.26} \]
The initial inductive steps, $j = 0, 1, 2$, follow from (a), (c) and (2.25). Assume now that the points $z_j^i$ in the $j$th column satisfy (i) and (ii). Then their images $z_{j+1}^i = g_j(z_j^i)$ are well-defined. By (b) we have for all $1 \le i \le j - 1$,
\[ d(z_{j+1}^{i-1}, z_{j+1}^i) = d(g_j(z_j^{i-1}), g_j(z_j^i)) \le \kappa\, d(z_j^{i-1}, z_j^i) \le \kappa^{j-i}\, \delta. \]
Together with (2.25) this proves (ii) for the column $j + 1$. To prove (i) consider any $1 \le i \le j$. By (c) and the triangle inequality,
\begin{align*}
d(z_{j+1}^i, \partial X_{j+1}^0)
&\ge d(z_{j+1}^j, \partial X_{j+1}^0) - d(z_{j+1}^j, z_{j+1}^i) \\
&\ge d(g_j(x_j), \partial X_{j+1}^0) - \sum_{l=i+1}^{j} d(z_{j+1}^{l-1}, z_{j+1}^l) \\
&\ge 2\, \varepsilon - \sum_{l=i+1}^{j} \kappa^{j-l}\, \delta \ge 2\, \varepsilon - \frac{\delta}{1-\kappa} \ge \varepsilon.
\end{align*}

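This proves (i) for the column $j + 1$, and concludes the induction. Conclusion (1) follows from (b) and the following claim, to be proved by induction in $i$. For every $i = 0, 1, \ldots, n - 1$, $g^{(i)}(B(x_0, \varepsilon)) \subset X_i^0(\varepsilon)$, where $g^{(i)} = g_{i-1} \circ \cdots \circ g_0$. Consider first the case $i = 0$. Given $x \in B(x_0, \varepsilon)$,
\[ d(x, \partial X_0^0) \ge d(x_0, \partial X_0^0) - d(x, x_0) \ge 1 - \varepsilon > \varepsilon. \]
This implies that $d(g_0(x), g_0(x_0)) \le \kappa\, d(x, x_0)$. Thus
\begin{align*}
d(g_0(x), \partial X_1^0)
&\ge d(g_0(x_0), \partial X_1^0) - d(g_0(x_0), g_0(x)) \\
&\ge 2\, \varepsilon - d(g_0(x_0), g_0(x)) \\
&\ge 2\, \varepsilon - \kappa\, d(x_0, x) \ge 2\, \varepsilon - \kappa\, \varepsilon > \varepsilon,
\end{align*}
which proves that $g_0(B(x_0, \varepsilon)) \subset X_1^0(\varepsilon)$. Assume now that for every $l \le i - 1$,
\[ (g_l \circ \cdots \circ g_0)(B(x_0, \varepsilon)) \subset X_{l+1}^0(\varepsilon). \]
By (b), $g^{(i)}$ acts as a $\kappa^i$ contraction on $B(x_0, \varepsilon)$ and $g^{(i)}(B(x_0, \varepsilon)) \subset X_i^0(\varepsilon)$. Thus for every $x \in B(x_0, \varepsilon)$,
\begin{align*}
d(g^{(i+1)}(x), \partial X_{i+1}^0)
&\ge d(g_i(x_i), \partial X_{i+1}^0) - d(g_i(x_i), g^{(i+1)}(x)) \\
&\ge 2\, \varepsilon - d(z_{i+1}^i, z_{i+1}^0) - d(z_{i+1}^0, g^{(i+1)}(x)) \\
&\ge 2\, \varepsilon - \sum_{l=0}^{i-1} d(z_{i+1}^{l+1}, z_{i+1}^l) - d(g^{(i+1)}(x_0), g^{(i+1)}(x)) \\
&\ge 2\, \varepsilon - (\delta + \kappa\, \delta + \cdots + \kappa^{i-1} \delta) - \kappa^i\, d(x_0, x) \\
&\ge 2\, \varepsilon - (\delta + \kappa\, \delta + \cdots + \kappa^{i-1} \delta) - \kappa^i\, \varepsilon \\
&\ge 2\, \varepsilon - (1-\kappa)\, \varepsilon\, (1 + \kappa + \cdots + \kappa^{i-1}) - \kappa^i\, \varepsilon = \varepsilon,
\end{align*}
which proves that $g^{(i+1)}(B(x_0, \varepsilon)) \subset X_{i+1}^0(\varepsilon)$, and establishes the claim above. Thus $g^{(n)}$ is well-defined on $B(x_0, \varepsilon)$, and, because of assumption (b), $g^{(n)}$ is a $\kappa^n$ Lipschitz contraction on this ball. This proves (1). Item (2) follows by (2.26). In fact
\[ d(g_{n-1}(x_{n-1}), g^{(n)}(x_0)) = d(z_n^{n-1}, z_n^0) \le \sum_{l=1}^{n-1} d(z_n^l, z_n^{l-1}) \le \sum_{l=1}^{n-1} \kappa^{n-l-1}\, \delta \le \frac{\delta}{1-\kappa}. \]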

Finally we prove (3). Assume $x_0 = g_{n-1}(x_{n-1})$. It is enough to see that $g^{(n)}(B(x_0, \varepsilon)) \subset B(x_0, \varepsilon)$, because by (1) $g^{(n)}$ acts as a $\kappa^n$-contraction in the closed ball $B(x_0, \varepsilon)$. The conclusion on the existence of a fixed point, as well as the proximity bound, follow from the classical fixed point theorem for Lipschitz contractions. Given $x \in B(x_0, \varepsilon)$, we know from the previous calculation that
\[ d(x_0, g^{(n)}(x_0)) < \delta + \kappa\, \delta + \cdots + \kappa^{n-2} \delta. \]
Hence
\begin{align*}
d(g^{(n)}(x), x_0)
&\le d(g^{(n)}(x), g^{(n)}(x_0)) + d(g^{(n)}(x_0), x_0) \\
&\le \kappa^{n-1}\, d(x, x_0) + \delta + \kappa\, \delta + \cdots + \kappa^{n-2} \delta \\
&\le \delta + \kappa\, \delta + \cdots + \kappa^{n-2} \delta + \kappa^{n-1}\, \varepsilon \\
&\le (1-\kappa)\, \varepsilon\, (1 + \kappa + \cdots + \kappa^{n-2}) + \kappa^{n-1}\, \varepsilon \\
&= (1-\kappa)\, \varepsilon\, \frac{1 - \kappa^{n-1}}{1-\kappa} + \kappa^{n-1}\, \varepsilon = \varepsilon.
\end{align*}
Thus $g^{(n)}(x) \in B(x_0, \varepsilon)$, which proves that $g^{(n)}(B(x_0, \varepsilon)) \subset B(x_0, \varepsilon)$.



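2.4.2 Statement and Proof of the AP

In the statement and proof of the AP we will use the notation introduced in Sect. 2.2.3. Given a chain of linear mappings $\{ g_j : V_j \to V_{j+1} \}_{0 \le j \le n-1}$ we denote the composition of the first $i$ maps by $g^{(i)} := g_{i-1} \ldots g_1 g_0$. Throughout this chapter, $a \lesssim b$ will stand for $a \le C\, b$ for some absolute constant $C$.

Theorem 2.1 (Avalanche Principle) There exists a constant $c > 0$ such that given $0 < \varepsilon < 1$, $0 < \kappa \le c\, \varepsilon^2$ and a chain of linear mappings $\{ g_j : V_j \to V_{j+1} \}_{0 \le j \le n-1}$ between Euclidean spaces $V_j$, if
(a) $\sigma(g_i) \le \kappa$, for $0 \le i \le n - 1$, and
(b) $\alpha(g_{i-1}, g_i) \ge \varepsilon$, for $1 \le i \le n - 1$,
then
(1) $d(v(g^{(n)}), v(g_0)) \lesssim \kappa\, \varepsilon^{-1}$,
(2) $d(v(g^{(n)*}), v(g_{n-1}^*)) \lesssim \kappa\, \varepsilon^{-1}$,
(3) $\sigma(g^{(n)}) \lesssim \kappa\, \Big( \dfrac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^{n-1}$,
(4) $\Big| \log \|g^{(n)}\| + \sum_{i=1}^{n-2} \log \|g_i\| - \sum_{i=1}^{n-1} \log \|g_i\, g_{i-1}\| \Big| \lesssim n\, \dfrac{\kappa}{\varepsilon^2}$.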

Remark 2.3 (On the assumptions) Assumption (a) says that the (first) gap ratio of each $g_j$ is large, $\mathrm{gr}(g_j) \ge \kappa^{-1}$. Given (a), by Propositions 2.23 and 2.24, assumption (b) is equivalent to a condition on the rift, $\rho(g_{j-1}, g_j) \gtrsim \varepsilon$ for all $j = 1, \ldots, n - 1$.

Remark 2.4 (On the conclusions) Conclusions (1) and (2) say that the most expanding direction $v(g^{(n)})$ of the product $g^{(n)}$, and its image $\varphi_{g^{(n)}} v(g^{(n)})$, are respectively $\kappa/\varepsilon$-close to the most expanding direction $v(g_0)$ of $g_0$, and to the image $\varphi_{g_{n-1}} v(g_{n-1})$ of the most expanding direction of $g_{n-1}$. Conclusion (3) says that the composition map $g^{(n)}$ has a large gap ratio. Finally, conclusion (4) is equivalent to
\[ e^{-n\, C\, \kappa\, \varepsilon^{-2}} \le \frac{\|g_{n-1} \ldots g_1 g_0\|\; \|g_1\| \ldots \|g_{n-2}\|}{\|g_1 g_0\| \ldots \|g_{n-1} g_{n-2}\|} \le e^{n\, C\, \kappa\, \varepsilon^{-2}}, \]
for some universal constant $C > 0$. These inequalities describe the asymptotic almost multiplicative behavior of the rifts
\[ \rho(g_0, g_1, \ldots, g_{n-1}) \overset{C \kappa / \varepsilon^2}{\asymp} \rho(g_0, g_1)\, \rho(g_1, g_2) \ldots \rho(g_{n-2}, g_{n-1}). \]

Proof The strategy of the proof is to look at the contracting action of linear mappings $g_j$ on the projective space. For each $j = 0, 1, \ldots, n$ consider the compact metric space $X_j = \mathbb{P}(V_j)$ with the normalized Riemannian distance, $d(\hat u, \hat v) = \frac{2}{\pi}\, \rho(\hat u, \hat v)$. The reader should be warned of the notational similarity between this projective metric and the one defined in (2.2). We do not refer to the metric (2.2) in this proof. However, the distance in the statement of the AP can be understood as any of the four equivalent projective distances $\delta$, $d$, $\rho$ or $d$. For $0 \le j < n$ define
\[ X_j^0 := \{ \hat v \in X_j : \alpha(\hat v, v(g_j)) > 0 \}, \qquad Y_j^0 := \{ \hat v \in X_j : \alpha(\hat v, v(g_{j-1}^*)) > 0 \}. \]

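The domain of the projective map $\varphi_{g_j} : \mathbb{P}(g_j) \subset X_j \to X_{j+1}$ clearly contains the open set $X_j^0$. Analogously, the domain of $\varphi_{g_{j-1}^*} : \mathbb{P}(g_{j-1}^*) \subset X_j \to X_{j-1}$ contains $Y_j^0$. We will apply Lemma 2.14 to chains of projective maps formed by the mappings $\varphi_{g_j} : X_j^0 \to X_{j+1}$ and their adjoints $\varphi_{g_{j-1}^*} : Y_j^0 \to X_{j-1}$.

Take positive numbers $\varepsilon$ and $\kappa$ such that $0 < \kappa \ll \varepsilon^2$, let $r := \sqrt{1 - \varepsilon^2/4}$, and define the following input parameters for the application of Lemma 2.14,
\begin{align*}
\varepsilon_{\mathrm{sh}} &:= \frac{1}{\pi} \arcsin \varepsilon, \\
\kappa_{\mathrm{sh}} &:= \kappa\, \frac{r + \sqrt{1-r^2}}{1-r^2} \lesssim \frac{4\, \kappa}{\varepsilon^2}, \\
\delta_{\mathrm{sh}} &:= \frac{\kappa\, r}{\sqrt{1-r^2}} \lesssim \frac{2\, \kappa}{\varepsilon}.
\end{align*}
A simple calculation shows that there exists $0 < c < 1$ such that for any $0 < \varepsilon < 1$ and $0 < \kappa \le c\, \varepsilon^2$, the pre-conditions $0 < \delta_{\mathrm{sh}} < \kappa_{\mathrm{sh}} < 1$ and $\frac{\delta_{\mathrm{sh}}}{1-\kappa_{\mathrm{sh}}} < \varepsilon_{\mathrm{sh}} < 1/2$ of the shadowing lemma are satisfied.

Define $x_j = v(g_j)$ and $x_j^* = v(g_{j-1}^*)$. This lemma is going to be applied to the following chains of maps and sequences of points
(A) $\varphi_{g_0}, \ldots, \varphi_{g_{n-1}}, \varphi_{g_{n-1}^*}, \ldots, \varphi_{g_0^*}$, with points $x_0, \ldots, x_{n-1}, x_n^*, \ldots, x_1^*$,
(B) $\varphi_{g_{n-1}^*}, \ldots, \varphi_{g_0^*}, \varphi_{g_0}, \ldots, \varphi_{g_{n-1}}$, with points $x_n^*, \ldots, x_1^*, x_0, \ldots, x_{n-1}$,
from which we will infer the conclusions (1) and (2). Let us check now that assumptions (a)–(d) of Lemma 2.14 hold in both cases (A) and (B).

By definition $\partial X_j^0 := \{ \hat v \in X_j : \alpha(\hat v, x_j) = 0 \} = \{ \hat v \in X_j : \hat v \perp x_j \}$. Hence, if $\hat v \in \partial X_j^0$ then $d(x_j, \hat v) = 1$, which proves that $d(x_j, \partial X_j^0) = 1$. Analogously, $\partial Y_j^0 = \{ \hat v \in X_j : \hat v \perp x_j^* \}$ and $d(x_j^*, \partial Y_j^0) = 1$. Therefore assumption (a) holds.

By definition of $X_j^0(\varepsilon)$,
\begin{align*}
\hat v \in X_j^0(\varepsilon)
&\Leftrightarrow d(\hat v, \partial X_j^0) \ge \varepsilon \Leftrightarrow \rho(\hat v, \partial X_j^0) \ge \frac{\pi}{2}\, \varepsilon \\
&\Leftrightarrow \delta(\hat v, \partial X_j^0) = \alpha(\hat v, x_j) \ge \sin\Big( \frac{\pi}{2}\, \varepsilon \Big) \\
&\Leftrightarrow \delta(\hat v, x_j) \le \cos\Big( \frac{\pi}{2}\, \varepsilon \Big).
\end{align*}
Similarly, by definition of $Y_j^0(\varepsilon)$,
\[ \hat v \in Y_j^0(\varepsilon) \Leftrightarrow \delta(\hat v, x_j^*) \le \cos\Big( \frac{\pi}{2}\, \varepsilon \Big). \]
Thus, because
\[ \cos\Big( \frac{\pi}{2}\, \varepsilon_{\mathrm{sh}} \Big) = \cos\Big( \frac{1}{2} \arcsin \varepsilon \Big) \le \sqrt{1 - \frac{\varepsilon^2}{4}} = r, \]
we have $X_j^0(\varepsilon_{\mathrm{sh}}) \subset B^{(\delta)}(x_j, r)$ and $Y_j^0(\varepsilon_{\mathrm{sh}}) \subset B^{(\delta)}(x_j^*, r)$, and assumption (b) holds by Proposition 2.29 (3).

By the gap assumption,
\[ \alpha(\varphi_{g_j}(x_j), x_{j+1}) = \alpha(v(g_j^*), v(g_{j+1})) = \alpha(g_j, g_{j+1}) \ge \varepsilon. \]
Therefore
\begin{align*}
d(\varphi_{g_j}(x_j), \partial X_{j+1}^0)
&= \frac{2}{\pi} \arcsin \delta(\varphi_{g_j}(x_j), \partial X_{j+1}^0) = \frac{2}{\pi} \arcsin \alpha(\varphi_{g_j}(x_j), x_{j+1}) \\
&\ge \frac{2}{\pi} \arcsin \varepsilon = 2\, \varepsilon_{\mathrm{sh}}.
\end{align*}
Similarly, by the gap assumption,
\[ \alpha(\varphi_{g_{j-1}^*}(x_j^*), x_{j-1}^*) = \alpha(v(g_{j-1}), v(g_{j-2}^*)) = \alpha(g_{j-1}^*, g_{j-2}^*) = \alpha(g_{j-2}, g_{j-1}) \ge \varepsilon, \]
and in the same way we infer that
\[ d(\varphi_{g_{j-1}^*}(x_j^*), \partial Y_{j-1}^0) \ge \frac{2}{\pi} \arcsin \varepsilon = 2\, \varepsilon_{\mathrm{sh}}. \]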

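This proves that (c) of the shadowing lemma holds. Notice that in both cases (A) and (B), the assumption (c) holds trivially for the middle points, because $\varphi_{g_{n-1}}(x_{n-1}) = x_n^* \in Y_n^0(2\, \varepsilon_{\mathrm{sh}})$ and $\varphi_{g_0^*}(x_1^*) = x_0 \in X_0^0(2\, \varepsilon_{\mathrm{sh}})$.

It was proved above that $X_j^0(\varepsilon_{\mathrm{sh}}) \subset B^{(\delta)}(x_j, r)$ and $Y_j^0(\varepsilon_{\mathrm{sh}}) \subset B^{(\delta)}(x_j^*, r)$. By (2.5) we have $d(\hat u, \hat v) \le \delta(\hat u, \hat v)$. Thus by Proposition 2.29 (1),
\[ \varphi_{g_j}(X_j^0(\varepsilon_{\mathrm{sh}})) \subset B^{(\delta)}(x_{j+1}^*, \delta_{\mathrm{sh}}) \subset B^{(d)}(x_{j+1}^*, \delta_{\mathrm{sh}}) \quad \text{with } x_{j+1}^* = \varphi_{g_j}(x_j), \]
and analogously,
\[ \varphi_{g_{j-1}^*}(Y_j^0(\varepsilon_{\mathrm{sh}})) \subset B^{(\delta)}(x_{j-1}, \delta_{\mathrm{sh}}) \subset B^{(d)}(x_{j-1}, \delta_{\mathrm{sh}}) \quad \text{with } x_{j-1} = \varphi_{g_{j-1}^*}(x_j^*). \]
Hence, (d) of Lemma 2.14 holds. Therefore, because $\varphi_{g_0^*}(x_1^*) = x_0$ and $\varphi_{g_{n-1}}(x_{n-1}) = x_n^*$, conclusion (3) of Lemma 2.14 holds for both chains (A) and (B). The projective points $v(g^{(n)})$ and $v(g^{(n)*})$ are the unique fixed points of the chains of mappings (A) and (B), respectively. Hence, by the shadowing lemma both distances $d(x_0, v(g^{(n)}))$ and $d(x_n^*, v(g^{(n)*}))$ are bounded above by
\[ \frac{\delta_{\mathrm{sh}}}{(1-\kappa_{\mathrm{sh}})\, (1-\kappa_{\mathrm{sh}}^{2n})} \lesssim \delta_{\mathrm{sh}} \lesssim \frac{\kappa}{\varepsilon}. \]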

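This proves conclusions (1) and (2) of the AP. From Proposition 2.28 we infer that for any $g \in \mathcal{L}(V)$,
\[ \| (D\varphi_g)_{v(g)} \| = \frac{s_2(g)}{\|g\|} = \sigma(g). \]
Hence, by conclusion (1) of the shadowing lemma
\[ \sigma(g^{(n)}) = \| (D\varphi_{g^{(n)}})_{v(g^{(n)})} \| \le \mathrm{Lip}\big( \varphi_{g^{(n)}}|_{B(v(g_0), \varepsilon_{\mathrm{sh}})} \big) \le (\kappa_{\mathrm{sh}})^n \le \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^n. \]
On the other hand, by (1) the distance from $v(g^{(n)})$ to $v(g_0)$ is of order $\kappa\, \varepsilon^{-1} \ll \varepsilon$ and
\[ \mathrm{Lip}\big( \varphi_{g_0}|_{B(v(g_0), \kappa \varepsilon^{-1})} \big) \lesssim \| (D\varphi_{g_0})_{v(g_0)} \| = \sigma(g_0) \le \kappa. \]
Therefore
\[ \sigma(g^{(n)}) \lesssim \kappa\, (\kappa_{\mathrm{sh}})^{n-1} \le \kappa\, \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^{n-1}, \]
which proves conclusion (3) of the AP.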


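Before proving (4), notice that applying (3) to the chain of linear maps $g_0, \ldots, g_{i-1}$ we get that $g^{(i)} := g_{i-1} \ldots g_0$ has a first gap ratio for all $i = 1, \ldots, n$. We claim that
\[ \big| \alpha(g^{(i)}, g_i) - \alpha(g_{i-1}, g_i) \big| \lesssim \kappa\, \varepsilon^{-1}. \tag{2.27} \]
By (2) of the AP, applied to the chain of linear maps $g_0, \ldots, g_{i-1}$,
\[ d(v(g^{(i)*}), v(g_{i-1}^*)) \le \frac{\delta_{\mathrm{sh}}}{(1-\kappa_{\mathrm{sh}})\, (1-\kappa_{\mathrm{sh}}^{2i})} \lesssim \kappa\, \varepsilon^{-1}. \]
Hence, by Proposition 2.35
\begin{align*}
\big| \alpha(g^{(i)}, g_i) - \alpha(g_{i-1}, g_i) \big|
&= \big| \alpha(v(g^{(i)*}), v(g_i)) - \alpha(v(g_{i-1}^*), v(g_i)) \big| \\
&\le d(v(g^{(i)*}), v(g_{i-1}^*)) \lesssim \kappa\, \varepsilon^{-1}.
\end{align*}
For any $i$, the logarithm of any ratio between the four factors $\alpha(g^{(i)}, g_i)$, $\beta(g^{(i)}, g_i)$, $\alpha(g_{i-1}, g_i)$ and $\beta(g_{i-1}, g_i)$ is of order $\kappa\, \varepsilon^{-2}$. In fact, by (2.27)
\[ \Big| \log \frac{\alpha(g^{(i)}, g_i)}{\alpha(g_{i-1}, g_i)} \Big| \lesssim \frac{1}{\varepsilon}\, \big| \alpha(g^{(i)}, g_i) - \alpha(g_{i-1}, g_i) \big| \lesssim \kappa\, \varepsilon^{-2}. \]
By hypothesis (a), $\sigma(g_i) \le \kappa$. From conclusion (3) we also have $\sigma(g^{(i)}) < \kappa$, provided we make the constant $c$ small enough. Hence by Lemma 2.4,
\[ \Big| \log \frac{\beta(g_{i-1}, g_i)}{\alpha(g_{i-1}, g_i)} \Big| \lesssim \frac{\kappa^2}{\varepsilon^2} \quad \text{and} \quad \Big| \log \frac{\beta(g^{(i)}, g_i)}{\alpha(g^{(i)}, g_i)} \Big| \lesssim \frac{\kappa^2}{\varepsilon^2}. \]
Since $\kappa^2 \varepsilon^{-2} \lesssim \kappa\, \varepsilon^{-2}$, the logarithms of the other ratios between the factors above are all $\lesssim \kappa\, \varepsilon^{-2}$. Thus, for some universal constant $C > 0$, each of these ratios is inside the interval $[e^{-C \kappa \varepsilon^{-2}}, e^{C \kappa \varepsilon^{-2}}]$. Finally, applying Proposition 2.25 to the rifts $\rho(g_0, \ldots, g_{n-1})$, $\rho(g_0, g_1)$, $\rho(g_1, g_2)$, etc., we have
\[ e^{-n\, C\, \kappa\, \varepsilon^{-2}} \le \prod_{i=1}^{n-1} \frac{\alpha(g^{(i)}, g_i)}{\beta(g_{i-1}, g_i)} \le \frac{\rho(g_0, \ldots, g_{n-1})}{\prod_{i=1}^{n-1} \rho(g_{i-1}, g_i)} \le \prod_{i=1}^{n-1} \frac{\beta(g^{(i)}, g_i)}{\alpha(g_{i-1}, g_i)} \le e^{n\, C\, \kappa\, \varepsilon^{-2}}, \]
which by Remark 2.4 is equivalent to (4).

The next proposition is a practical reformulation of the Avalanche Principle.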




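Proposition 2.42 There exists $c > 0$ such that given $0 < \varepsilon < 1$, $0 < \kappa \le c\, \varepsilon^2$ and $g_0, g_1, \ldots, g_{n-1} \in \mathrm{Mat}(m, \mathbb{R})$, if
(gaps) $\mathrm{gr}(g_i) > \dfrac{1}{\kappa}$ for all $0 \le i \le n - 1$,
(angles) $\dfrac{\|g_i\, g_{i-1}\|}{\|g_i\|\, \|g_{i-1}\|} > \varepsilon$ for all $1 \le i \le n - 1$,
then
\[ \max\big\{ d(v(g^{(n)*}), v(g_{n-1}^*)),\; d(v(g^{(n)}), v(g_0)) \big\} \lesssim \kappa\, \varepsilon^{-1}, \]
\[ \Big| \log \|g^{(n)}\| + \sum_{i=1}^{n-2} \log \|g_i\| - \sum_{i=1}^{n-1} \log \|g_i\, g_{i-1}\| \Big| \lesssim n\, \frac{\kappa}{\varepsilon^2}. \]

Proof Consider the constant $c > 0$ in Theorem 2.1, let $c' := c\, (1 - 2\, c^2)$ and assume $0 < \kappa \le c'\, \varepsilon^2$. Assumption (gaps) here is equivalent to assumption (a) of Theorem 2.1. By Proposition 2.24, the assumption (angles) here implies
\[ \alpha(g_{i-1}, g_i) \ge \rho(g_{i-1}, g_i)\, \sqrt{1 - \frac{2\, \kappa^2}{\rho(g_{i-1}, g_i)^2}} \ge \varepsilon\, \sqrt{1 - \frac{2\, \kappa^2}{\varepsilon^2}} \ge \varepsilon\, \sqrt{1 - 2\, c^2 \varepsilon^2} =: \varepsilon'. \]
Since $0 < \kappa \le c'\, \varepsilon^2$, and $c'\, \varepsilon^2 \le c\, (1 - 2\, c^2 \varepsilon^2)\, \varepsilon^2 = c\, (\varepsilon')^2$, we have $0 < \kappa \le c\, (\varepsilon')^2$. Thus, because $\varepsilon' \asymp \varepsilon$, this proposition follows from conclusions (1), (2) and (4) of Theorem 2.1.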
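The logarithmic estimate of Proposition 2.42 can be observed on a toy chain. The sketch below is an illustration only: it assumes $2 \times 2$ matrices of the hypothetical form $g_i = R(\theta_i)\, \mathrm{diag}(\lambda, \lambda^{-1})$ with $\lambda = 100$ and random angles $|\theta_i| \le 0.8$, so that $\sigma(g_i) = \lambda^{-2}$ and the rift of consecutive pairs stays above $\cos 0.8$. The universal constants in the proposition are not specified in the text, so the code only reports the size of the deviation next to $n \kappa / \varepsilon^2$ and does not verify the statement.

```python
import math
import random

def matmul(g, h):
    return [[sum(g[i][k] * h[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def op_norm(g):
    """Operator norm (largest singular value) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    t = a * a + b * b + c * c + d * d
    det = a * d - b * c
    return math.sqrt((t + math.sqrt(max(t * t - 4.0 * det * det, 0.0))) / 2.0)

def rotation(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

lam = 100.0                 # singular values of each g_i are lam and 1/lam
kappa = lam ** -2           # hence sigma(g_i) = s2 / s1 = kappa
n = 20
rng = random.Random(3)
D = [[lam, 0.0], [0.0, 1.0 / lam]]
chain = [matmul(rotation(rng.uniform(-0.8, 0.8)), D) for _ in range(n)]

norms = [op_norm(g) for g in chain]
pair_norms = [op_norm(matmul(chain[i], chain[i - 1])) for i in range(1, n)]
# Smallest rift of consecutive pairs, playing the role of epsilon in (angles).
eps = min(pair_norms[i - 1] / (norms[i] * norms[i - 1]) for i in range(1, n))

product = chain[0]
for g in chain[1:]:
    product = matmul(g, product)   # g^(n) = g_{n-1} ... g_0

deviation = abs(
    math.log(op_norm(product))
    + sum(math.log(norms[i]) for i in range(1, n - 1))
    - sum(math.log(pn) for pn in pair_norms)
)
print(eps, deviation, n * kappa / eps ** 2)
```

Replacing the angle range by one that allows $\theta_i$ close to $\pi/2$ makes some consecutive rifts tiny, and the deviation is then no longer controlled by this kind of estimate.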

2.4.3 Consequences of the AP

Given a chain of linear maps $\{ g_j : V_j \to V_{j+1} \}_{0 \le j \le n-1}$ between Euclidean spaces $V_j$, and integers $0 \le i < j \le n$ we define $g^{(j,i)} := g_{j-1} \circ \cdots \circ g_{i+1} \circ g_i$. With this notation the following relation holds for $0 \le i < k < j \le n$, $g^{(j,i)} = g^{(j,k)} \circ g^{(k,i)}$. The next proposition states, in a quantified way, that the most expanding directions $v(g^{(n,i)}) \in \mathbb{P}(V_i)$ are almost invariant under the adjoints of the chain mappings.


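Proposition 2.43 Under the assumptions of Theorem 2.1, where $0 < \kappa \ll \varepsilon^2$,
\[ d\big( \varphi_{g_i^*} v(g^{(n,i+1)}),\, v(g^{(n,i)}) \big) \lesssim \frac{\kappa}{\varepsilon}\, \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^{n-i}. \]

Proof Consider $\kappa$, $\varepsilon$, $\kappa_{\mathrm{sh}}$ and $\varepsilon_{\mathrm{sh}}$ as in Theorem 2.1. From the proof of item (3) of the AP, applied to the chain of mappings $g_{n-1}^*, \ldots, g_i^*$, we conclude that the composition $g^{(n,i)*} = g_i^* \circ \cdots \circ g_{n-1}^*$ is a $(\kappa_{\mathrm{sh}})^{n-i}$-Lipschitz contraction on the ball $B(v(g_{n-1}^*), \varepsilon_{\mathrm{sh}})$. On the other hand, by (2) of the AP we have $d(v(g^{(n,i+1)*}), v(g_{n-1}^*)) \lesssim \kappa\, \varepsilon^{-1}$ and $d(v(g_{n-1}^*), v(g^{(n,i)*})) \lesssim \kappa\, \varepsilon^{-1}$. Since $\kappa\, \varepsilon^{-1} \ll \varepsilon \asymp \varepsilon_{\mathrm{sh}}$, both projective points $v(g^{(n,i)*})$ and $v(g^{(n,i+1)*})$ belong to the ball $B(v(g_{n-1}^*), \varepsilon_{\mathrm{sh}})$. Thus,
\begin{align*}
d\big( \varphi_{g_i^*} v(g^{(n,i+1)}),\, v(g^{(n,i)}) \big)
&= d\big( \varphi_{g_i^*} \circ \varphi_{g^{(n,i+1)*}}\, v(g^{(n,i+1)*}),\; \varphi_{g^{(n,i)*}}\, v(g^{(n,i)*}) \big) \\
&= d\big( \varphi_{g^{(n,i)*}}\, v(g^{(n,i+1)*}),\; \varphi_{g^{(n,i)*}}\, v(g^{(n,i)*}) \big) \\
&\le (\kappa_{\mathrm{sh}})^{n-i}\, d\big( v(g^{(n,i+1)*}),\, v(g^{(n,i)*}) \big) \\
&\le \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^{n-i} \Big[ d\big( v(g^{(n,i+1)*}),\, v(g_{n-1}^*) \big) + d\big( v(g_{n-1}^*),\, v(g^{(n,i)*}) \big) \Big] \\
&\lesssim \frac{2\, \kappa}{\varepsilon}\, \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^{n-i},
\end{align*}
which proves the proposition.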

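Most expanding directions and norms of products of chains of matrices under an application of the AP admit the following modulus of continuity.

Proposition 2.44 Let $c > 0$ be the universal constant in Theorem 2.1. Given numbers $0 < \varepsilon < 1$ and $0 < \kappa < c\, \varepsilon^2$, and given two chains of matrices $g_0, \ldots, g_{n-1}$ and $g_0', \ldots, g_{n-1}'$ in $\mathrm{Mat}(m, \mathbb{R})$, both satisfying the assumptions of the AP for the given parameters $\kappa$ and $\varepsilon$, if $d_{\mathrm{rel}}(g_i, g_i') < \delta$ for all $i = 0, 1, \ldots, n - 1$, then
(a) $d\big( v(g_{n-1} \ldots g_0),\, v(g_{n-1}' \ldots g_0') \big) \lesssim \dfrac{\kappa}{\varepsilon} + 8\, \delta$,
(b) $\Big| \log \dfrac{\|g_{n-1} \ldots g_0\|}{\|g_{n-1}' \ldots g_0'\|} \Big| \lesssim n\, \Big( \dfrac{\kappa}{\varepsilon^2} + \dfrac{\delta}{\varepsilon} \Big)$.

Proof Item (a) follows from conclusion (1) of Theorem 2.1, and Proposition 2.40,
\begin{align*}
d\big( v(g_{n-1} \ldots g_0),\, v(g_{n-1}' \ldots g_0') \big)
&\le d\big( v(g_{n-1} \ldots g_0),\, v(g_0) \big) + d(v(g_0), v(g_0')) + d\big( v(g_0'),\, v(g_{n-1}' \ldots g_0') \big) \\
&\lesssim 2\, \frac{\kappa}{\varepsilon} + \frac{16\, \delta}{1-\kappa^2} \lesssim \frac{\kappa}{\varepsilon} + 8\, \delta.
\end{align*}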

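Assuming $\|g_i\| \ge \|g_i'\|$, we have
\[ \frac{\|g_i\|}{\|g_i'\|} \le 1 + \frac{\|g_i - g_i'\|}{\|g_i'\|} \le 1 + \frac{\|g_i\|}{\|g_i'\|}\, d_{\mathrm{rel}}(g_i, g_i') \le 1 + \delta\, \frac{\|g_i\|}{\|g_i'\|}, \]
which implies
\[ \frac{\|g_i\|}{\|g_i'\|} \le \frac{1}{1-\delta}. \]
Because the case $\|g_i\| \le \|g_i'\|$ is analogous, we conclude that
\[ \Big| \log \frac{\|g_i\|}{\|g_i'\|} \Big| \le \log \frac{1}{1-\delta} \le \frac{\delta}{1-\delta} \lesssim \delta. \]
Since the two chains of matrices satisfy the assumptions of the AP we have
\[ \frac{\|g_i\, g_{i-1}\|}{\|g_i\|\, \|g_{i-1}\|} \ge \alpha(g_{i-1}, g_i) \ge \varepsilon \quad \text{and} \quad \frac{\|g_i'\, g_{i-1}'\|}{\|g_i'\|\, \|g_{i-1}'\|} \ge \alpha(g_{i-1}', g_i') \ge \varepsilon. \]
A simple calculation gives
\begin{align*}
d_{\mathrm{rel}}(g_i\, g_{i-1},\, g_i'\, g_{i-1}')
&\le \frac{\|g_i'\|\, \|g_{i-1}'\|}{\|g_i'\, g_{i-1}'\|}\, \max\Big\{ 1, \frac{\|g_i\|}{\|g_i'\|} \Big\}\, d_{\mathrm{rel}}(g_i, g_i') \\
&\quad + \frac{\|g_i\|\, \|g_{i-1}\|}{\|g_i\, g_{i-1}\|}\, \max\Big\{ 1, \frac{\|g_{i-1}'\|}{\|g_{i-1}\|} \Big\}\, d_{\mathrm{rel}}(g_{i-1}, g_{i-1}') \\
&\le \frac{2}{(1-\delta)^2}\, \frac{\delta}{\varepsilon} \lesssim \frac{\delta}{\varepsilon}.
\end{align*}
Therefore, arguing as above,
\[ \Big| \log \frac{\|g_i\, g_{i-1}\|}{\|g_i'\, g_{i-1}'\|} \Big| \lesssim \frac{\delta}{\varepsilon}. \]
Hence, by conclusion (4) of the AP we have
\begin{align*}
\Big| \log \frac{\|g_{n-1} \ldots g_0\|}{\|g_{n-1}' \ldots g_0'\|} \Big|
&\le \Big| \log \frac{\|g_{n-1} \ldots g_0\|\; \|g_1\| \ldots \|g_{n-2}\|}{\|g_1 g_0\| \ldots \|g_{n-1} g_{n-2}\|} \Big|
 + \Big| \log \frac{\|g_1' g_0'\| \ldots \|g_{n-1}' g_{n-2}'\|}{\|g_{n-1}' \ldots g_0'\|\; \|g_1'\| \ldots \|g_{n-2}'\|} \Big| \\
&\quad + \sum_{i=1}^{n-2} \Big| \log \frac{\|g_i\|}{\|g_i'\|} \Big| + \sum_{i=1}^{n-1} \Big| \log \frac{\|g_i\, g_{i-1}\|}{\|g_i'\, g_{i-1}'\|} \Big| \\
&\lesssim 2\, n\, \frac{\kappa}{\varepsilon^2} + (n - 2)\, \delta + (n - 1)\, \frac{\delta}{\varepsilon} \lesssim n\, \Big( \frac{\kappa}{\varepsilon^2} + \frac{\delta}{\varepsilon} \Big),
\end{align*}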

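which proves (b).

The next proposition is a flag version of the AP. Let $\tau = (\tau_1, \ldots, \tau_k)$ be a signature with $0 < \tau_1 < \tau_2 < \cdots < \tau_k < m$. We call $\tau$-block product any of the functions $\pi_{\tau,j} : \mathrm{Mat}(m, \mathbb{R}) \to \mathbb{R}$,
\[ \pi_{\tau,j}(g) := s_{\tau_{j-1}+1}(g) \ldots s_{\tau_j}(g), \qquad 1 \le j \le k, \]
where by convention $\tau_0 = 0$. A $\tau$-singular value product, abbreviated $\tau$-s.v.p., is any product of distinct $\tau$-block products. By definition, $\tau$-block products are $\tau$-singular value products. Other examples of $\tau$-singular value products are the functions $p_{\tau_j}(g) = s_1(g) \ldots s_{\tau_j}(g) = \|\wedge_{\tau_j} g\|$. Note that for every $1 \le j \le k$ we have:
\[ \pi_{\tau,j}(g) = \frac{p_{\tau_j}(g)}{p_{\tau_{j-1}}(g)}, \]
and $p_{\tau_j}(g) = \pi_{\tau,1}(g) \ldots \pi_{\tau,j}(g)$.

Proposition 2.45 (Flag AP) Let $c > 0$ be the universal constant in Theorem 2.1. Given numbers $0 < \varepsilon < 1$, $0 < \kappa \le c\, \varepsilon^2$ and a chain of matrices $g_j \in \mathrm{Mat}(m, \mathbb{R})$, with $j = 0, 1, \ldots, n - 1$, if
(a) $\sigma_\tau(g_i) \le \kappa$, for $0 \le i \le n - 1$, and
(b) $\alpha_\tau(g_{i-1}, g_i) \ge \varepsilon$, for $1 \le i \le n - 1$,
then
(1) $d(v_\tau(g^{(n)*}), v_\tau(g_{n-1}^*)) \lesssim \kappa\, \varepsilon^{-1}$,
(2) $d(v_\tau(g^{(n)}), v_\tau(g_0)) \lesssim \kappa\, \varepsilon^{-1}$,
(3) $\sigma_\tau(g^{(n)}) \le \Big( \dfrac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^n$,
(4) for any $\tau$-s.v.p. function $\pi$,
\[ \Big| \log \pi(g^{(n)}) + \sum_{i=1}^{n-2} \log \pi(g_i) - \sum_{i=1}^{n-1} \log \pi(g_i\, g_{i-1}) \Big| \lesssim n\, \frac{\kappa}{\varepsilon^2}. \]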
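The relations between $\tau$-block products $\pi_{\tau,j}$ and the products $p_{\tau_j}$ can be spelled out on a list of singular values. In the sketch below the list `s`, the signature `tau` and the function names are hypothetical, and the singular values are taken as given rather than computed from a matrix.

```python
import math

def block_products(s, tau):
    """pi_{tau,j} = s_{tau_{j-1}+1} * ... * s_{tau_j}, with the convention tau_0 = 0."""
    bounds = (0,) + tuple(tau)
    return [math.prod(s[bounds[j - 1]:bounds[j]]) for j in range(1, len(bounds))]

def p(s, t):
    """p_t = s_1 * ... * s_t, the norm of the t-th exterior power."""
    return math.prod(s[:t])

s = [8.0, 4.0, 2.0, 1.0, 0.5]   # sorted singular values of some 5 x 5 matrix
tau = (1, 3)                    # a signature with 0 < 1 < 3 < 5

blocks = block_products(s, tau)
print(blocks)                   # [8.0, 8.0]: pi_{tau,1} = s1, pi_{tau,2} = s2 * s3
print(p(s, 1), p(s, 3))         # 8.0 64.0

# pi_{tau,2} = p_{tau_2} / p_{tau_1} and p_{tau_2} = pi_{tau,1} * pi_{tau,2}
print(blocks[1] == p(s, 3) / p(s, 1), p(s, 3) == blocks[0] * blocks[1])
```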

Proof For each $j = 1, \ldots, k$, consider the chain of matrices $\wedge_{\tau_j} g_0, \wedge_{\tau_j} g_1, \ldots, \wedge_{\tau_j} g_{n-1}$. Assumptions (a) and (b) here imply the corresponding assumptions of Theorem 2.1 for all these chains of exterior power matrices. Hence, by (2) of the AP
\begin{align*}
d(v_{\tau_j}(g^{(n)*}), v_{\tau_j}(g_{n-1}^*))
&= d\big( \Psi(v_{\tau_j}(g^{(n)*})),\, \Psi(v_{\tau_j}(g_{n-1}^*)) \big) \\
&= d\big( v(\wedge_{\tau_j} g^{(n)*}),\, v(\wedge_{\tau_j} g_{n-1}^*) \big) \lesssim \kappa\, \varepsilon^{-1}.
\end{align*}
Thus, taking the maximum in $j$ we get $d(v_\tau(g^{(n)*}), v_\tau(g_{n-1}^*)) \lesssim \kappa\, \varepsilon^{-1}$, which proves (1). Conclusion (2) follows in the same way. Similarly, from (3) of Theorem 2.1, we infer the corresponding conclusion here
\[ \sigma_\tau(g^{(n)}) = \max_{1 \le j \le k} \sigma_{\tau_j}(g^{(n)}) = \max_{1 \le j \le k} \sigma(\wedge_{\tau_j} g^{(n)}) \le \Big( \frac{\kappa\, (4 + 2\, \varepsilon)}{\varepsilon^2} \Big)^n. \]
Let us now prove (4). For the $\tau$-s.v.p. $\pi(g) = p_{\tau_j}(g) = \|\wedge_{\tau_j} g\|$ conclusion (4) is a consequence of the corresponding conclusion of Theorem 2.1. For the $\tau$-block product $\pi = \pi_{\tau,j}$, since $\log \pi(g) = \log \|\wedge_{\tau_j} g\| - \log \|\wedge_{\tau_{j-1}} g\|$, conclusion (4) follows again from Theorem 2.1 (4). Finally, since any $\tau$-s.v.p. is a finite product of $\tau$-block products we can reduce (4) to the previous case.

We finish this section with a version of the AP for complex matrices. The singular values of a complex matrix $g \in \mathrm{Mat}(m, \mathbb{C})$ are defined to be the eigenvalues of the positive semi-definite hermitian matrix $\sqrt{g^* g}$, where $g^*$ stands for the transjugate of $g$, i.e., the conjugate transpose of $g$. Similarly, the singular vectors of $g$ are defined as the eigenvectors of $g^* g$. The sorted singular values of $g \in \mathrm{Mat}(m, \mathbb{C})$ are denoted by $s_1(g) \ge s_2(g) \ge \cdots \ge s_m(g)$. The top singular value of $g$ coincides with its norm, $s_1(g) = \|g\|$. The (first) gap ratio of $g$ is the quotient $\sigma(g) := s_2(g)/s_1(g) \le 1$. We say that $g \in \mathrm{Mat}(m, \mathbb{C})$ has a (first) gap ratio when $\sigma(g) < 1$. When this happens the complex eigenspace
\[ \{ v \in \mathbb{C}^m : g^* g\, v = \|g\|^2\, v \} = \{ v \in \mathbb{C}^m : \|g\, v\| = \|g\|\, \|v\| \} \]
has complex dimension one and determines a point in $\mathbb{P}(\mathbb{C}^m)$, denoted by $v(g)$ and referred to as the $g$-most expanding direction. Given points $\hat v, \hat u \in \mathbb{P}(\mathbb{C}^m)$, we set
\[ \alpha(\hat v, \hat u) := \frac{|\langle v, u \rangle|}{\|v\|\, \|u\|} \quad \text{where } v \in \hat v,\ u \in \hat u. \tag{2.28} \]


Given $g, g' \in \mathrm{Mat}(m, \mathbb{C})$, both with (first) gap ratios, we define the angle between $g$ and $g'$ to be $\alpha(g, g') := \alpha(v(g^*), v(g'))$. With these definitions, the real version of the AP leads in a straightforward manner to a slightly weaker complex version, stated and proved below. However, adapting the original proof to the complex case, replacing each real concept by its complex analog, would lead to the same stronger estimates as in Theorem 2.1.

Proposition 2.46 (Complex AP) Let $c > 0$ be the universal constant in Theorem 2.1. Given numbers $0 < \varepsilon < 1$, $0 < \kappa \le c\, \varepsilon^4$ and a chain of matrices $g_j \in \mathrm{Mat}(m, \mathbb{C})$, with $j = 0, 1, \ldots, n - 1$, if
(a) $\sigma(g_i) \le \kappa$, for $0 \le i \le n - 1$, and
(b) $\alpha(g_{i-1}, g_i) \ge \varepsilon$, for $1 \le i \le n - 1$,
then
(1) $d(v(g^{(n)*}), v(g_{n-1}^*)) \lesssim \kappa\, \varepsilon^{-2}$,
(2) $d(v(g^{(n)}), v(g_0)) \lesssim \kappa\, \varepsilon^{-2}$,
(3) $\sigma(g^{(n)}) \le \Big( \dfrac{\kappa\, (4 + 2\, \varepsilon^2)}{\varepsilon^4} \Big)^n$,
(4) $\Big| \log \|g^{(n)}\| + \sum_{i=1}^{n-2} \log \|g_i\| - \sum_{i=1}^{n-1} \log \|g_i\, g_{i-1}\| \Big| \lesssim n\, \dfrac{\kappa}{\varepsilon^4}$.

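Proof Make the identification $\mathbb{C}^m \equiv \mathbb{R}^{2m}$, and given $g \in \mathrm{Mat}(m, \mathbb{C})$ denote by $g^{\mathbb{R}} \in \mathrm{Mat}(2m, \mathbb{R})$ the matrix representing the linear operator $g : \mathbb{R}^{2m} \to \mathbb{R}^{2m}$ in the canonical basis. We make explicit the relationship between the gap ratios and angles of the complex matrices $g, g' \in \mathrm{Mat}(m, \mathbb{C})$ and the gap ratios and angles of their real analogues $g^{\mathbb{R}}$ and $(g')^{\mathbb{R}}$. Given $g \in \mathrm{Mat}(m, \mathbb{C})$, for each eigenvalue $\lambda$ of $g$, the matrix $g^{\mathbb{R}}$ has a corresponding pair of eigenvalues $\lambda, \bar\lambda$. Since $g \mapsto g^{\mathbb{R}}$ is a $C^*$-algebra homomorphism, we have $(g^* g)^{\mathbb{R}} = (g^{\mathbb{R}})^* (g^{\mathbb{R}})$. Therefore, for all $i = 1, \ldots, m$, $s_i(g) = s_{2i-1}(g^{\mathbb{R}}) = s_{2i}(g^{\mathbb{R}})$. In particular, considering the signature $\tau = (2)$,
\[ \sigma_{(2)}(g^{\mathbb{R}}) = \frac{s_3(g^{\mathbb{R}})}{s_1(g^{\mathbb{R}})} = \frac{s_2(g)}{s_1(g)} = \sigma(g). \tag{2.29} \]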

The $g$-most expanding direction $v(g) \in \mathbb{P}(\mathbb{C}^m)$ is a complex line which we can identify with the real 2-plane $v_{(2)}(g^{\mathbb{R}})$. This identification, $v(g) \equiv v_{(2)}(g^{\mathbb{R}})$, comes from a natural isometric embedding $\mathbb{P}(\mathbb{C}^m) \hookrightarrow \mathrm{Gr}_2(\mathbb{R}^{2m})$. Consider two points $\hat v, \hat u \in \mathbb{P}(\mathbb{C}^m)$ and take unit vectors $v \in \hat v$ and $u \in \hat u$. Denote by $U, V \subset \mathbb{C}^m$ the complex lines spanned by these vectors, which are planes in $\mathrm{Gr}_2(\mathbb{R}^{2m})$. Consider the complex orthogonal projection onto the complex line $V$, $\pi_{u,v} : U \to V$, defined by $\pi_{u,v}(x) := \langle x, v \rangle\, v$. By (2.28) we have $\alpha(\hat v, \hat u) = \|\pi_{u,v}\|$. On the other hand, since the adjoints $\pi_{u,v}^* : V \to U$ of $\pi_{u,v}$, both as a complex and as a real linear map, coincide, it follows that $\pi_{u,v} = \pi_{U,V}$ is the restriction to $U$ of the (real) orthogonal projection onto the 2-plane $V$. Thus, by Proposition 2.19(b),
\[ \alpha_2(U, V) = \sqrt{\det\nolimits_{\mathbb{R}}(\pi_{u,v}^*\, \pi_{u,v})} = \det\nolimits_{\mathbb{C}}(\pi_{u,v}^*\, \pi_{u,v}) = \|\pi_{u,v}\|^2 = \alpha(\hat v, \hat u)^2. \]
In particular,
\[ \alpha_{(2)}(g^{\mathbb{R}}, (g')^{\mathbb{R}}) = \alpha_{(2)}\big(v((g^{\mathbb{R}})^*), v((g')^{\mathbb{R}})\big) = \alpha(v(g^*), v(g'))^2 = \alpha(g, g')^2. \tag{2.30} \]
Take $\kappa, \varepsilon > 0$ such that $\kappa < c\, \varepsilon^4$, $0 < \varepsilon < 1$, and consider a chain of matrices $g_j \in \mathrm{Mat}(m, \mathbb{C})$, $j = 0, 1, \ldots, n-1$, satisfying the assumptions (a) and (b) of the complex AP. By (2.29) and (2.30), the assumptions (a) and (b) of Proposition 2.45 hold for the chain of real matrices $g_j^{\mathbb{R}} \in \mathrm{Mat}(2m, \mathbb{R})$, $j = 0, 1, \ldots, n-1$, with parameters $\kappa$ and $\varepsilon^2$, and with $\tau = (2)$. Therefore conclusions (1)–(4) of the complex AP follow from the corresponding conclusions of Proposition 2.45. In conclusion (4) we use the (2)-singular value product $\pi(g) := \|g\|^2 = \|\wedge_2\, g^{\mathbb{R}}\|$. □

References

1. A. Ávila, S. Jitomirskaya, C. Sadel, Complex one-frequency cocycles. J. Eur. Math. Soc. (JEMS) 16(9), 1915–1935 (2014). MR 3273312
2. M. Barnabei, A. Brini, G.-C. Rota, On the exterior calculus of invariant theory. J. Algebra 96(1), 120–160 (1985). MR 808845 (87j:05002)
3. J. Bourgain, Positivity and continuity of the Lyapounov exponent for shifts on T^d with arbitrary frequency vector and real analytic potential. J. Anal. Math. 96, 313–355 (2005). MR 2177191 (2006i:47064)
4. J. Bourgain, S. Jitomirskaya, Continuity of the Lyapunov exponent for quasiperiodic operators with analytic potential. J. Statist. Phys. 108(5–6), 1203–1218 (2002), Dedicated to David Ruelle and Yasha Sinai on the occasion of their 65th birthdays. MR 1933451 (2004c:47073)
5. P. Duarte, S. Klein, Continuity of the Lyapunov exponents for quasiperiodic cocycles. Comm. Math. Phys. 332(3), 1113–1166 (2014). MR 3262622
6. M. Goldstein, W. Schlag, Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. Math. (2) 154(1), 155–203 (2001). MR 1847592 (2002h:82055)
7. W. Schlag, Regularity and convergence rates for the Lyapunov exponents of linear cocycles. J. Mod. Dyn. 7(4), 619–637 (2013). MR 3177775
8. S. Sternberg, Lectures on Differential Geometry (Prentice-Hall Inc, Englewood Cliffs, 1964). MR 0193578 (33 #1797)

Chapter 3

Abstract Continuity of Lyapunov Exponents

Abstract We devise an abstract, modular scheme to prove continuity of the Lyapunov exponents for a general class of linear cocycles. The main assumption is the availability of appropriate large deviation type (LDT) estimates which are uniform in the data. We provide a modulus of continuity that depends explicitly on the sharpness of the LDT estimate. Our method uses an inductive procedure based on the deterministic, general Avalanche Principle from the previous chapter. The main advantage of this approach, besides the fact that it provides quantitative estimates, is its versatility, as it applies to quasi-periodic cocycles (one and multivariable torus translations), to random cocycles (i.i.d. and Markov systems) and to any other types of base dynamics as long as appropriate LDT estimates are satisfied. Moreover, compared to other available quantitative results for quasi-periodic or random cocycles, this method allows for weaker assumptions.

3.1 Definitions, the Abstract Setup and Statement

An ergodic measure preserving dynamical system (MPDS) $(X, \mathcal{F}, \mu, T)$ consists of a set $X$, a σ-algebra $\mathcal{F}$, a probability measure $\mu$ on $(X, \mathcal{F})$ and a transformation $T : X \to X$ which is ergodic and measure preserving. Two important classes of ergodic dynamical systems are the shift over a stochastic process (i.e. a Bernoulli shift or a Markov shift) and the torus translation by an incommensurable frequency vector.

A linear cocycle over an ergodic system $(X, \mathcal{F}, \mu, T)$ is a skew-product map on the vector bundle $X \times \mathbb{R}^m$ given by
\[ X \times \mathbb{R}^m \ni (x, v) \mapsto (T x, A(x) v) \in X \times \mathbb{R}^m, \]
where $A : X \to \mathrm{Mat}(m, \mathbb{R})$ is a measurable function. Hence $T$ is the base dynamics while $A$ defines the fiber action. Since the base dynamics will be fixed, we identify the cocycle with just its fiber action $A$.

© Atlantis Press and the author(s) 2016 P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_3


The iterates of the cocycle are $(T^n x, A^{(n)}(x)\, v)$, where
\[ A^{(n)}(x) = A(T^{n-1} x) \cdots A(T x) \, A(x). \]
Given an ergodic system $(X, \mathcal{F}, \mu, T)$, we introduce the main actors, a space of cocycles and a set of observables, and we describe the main assumptions on them: certain uniform large deviation type (LDT) estimates and a uniform $L^p$-boundedness. Then we formulate an abstract criterion for the continuity of the corresponding Lyapunov exponents.

3.1.1 Cocycles and Observables

Definition 3.1 A space of measurable cocycles $\mathcal{C}$ is any class of matrix valued functions $A : X \to \mathrm{Mat}(m, \mathbb{R})$, where $m \in \mathbb{N}$ is not fixed, such that every $A : X \to \mathrm{Mat}(m, \mathbb{R})$ in $\mathcal{C}$ has the following properties:
1. $A$ is $\mathcal{F}$-measurable,
2. $\|A\| \in L^\infty(\mu)$,
3. The exterior powers $\wedge_k A : X \to \mathrm{Mat}(\binom{m}{k}, \mathbb{R})$ are in $\mathcal{C}$, for $k \le m$.

Each subspace $\mathcal{C}_m := \{ A \in \mathcal{C} \mid A : X \to \mathrm{Mat}(m, \mathbb{R}) \}$ is a priori endowed with a distance $\mathrm{dist} : \mathcal{C}_m \times \mathcal{C}_m \to [0, +\infty)$ which is at least as fine as the $L^\infty$ distance, i.e. for all $A, B \in \mathcal{C}_m$ we have $\mathrm{dist}(B, A) \ge \|B - A\|_{L^\infty}$. We assume a correlation between the distances on each of these subspaces, in the sense that the map $\mathcal{C}_m \ni A \mapsto \wedge_k A \in \mathcal{C}_{\binom{m}{k}}$ is locally Lipschitz.

Let $A \in \mathcal{C}$ be a measurable cocycle. Since $\|A\| \in L^\infty$, we have $\log^+ \|A\| \in L^1$, hence Furstenberg-Kesten's theorem (the non-invertible, one-sided case, see Chap. 3 in [1]) applies. In particular, if we denote
\[ L_1^{(n)}(A) := \int_X \frac{1}{n} \log \|A^{(n)}(x)\| \, \mu(dx), \tag{3.1} \]
then as $n \to \infty$, $L_1^{(n)}(A) \to L_1(A)$ (the maximal Lyapunov exponent). We call $L_1^{(n)}(A)$ finite scale (maximal) Lyapunov exponents.
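The iterates $A^{(n)}(x)$ and the finite scale exponent (3.1) can be explored numerically. The following is a minimal sketch for $m = 2$, not part of the original text: the helper names (`mat_mul`, `op_norm`, `iterate`, `finite_scale_exponent`), the golden-mean rotation and the Schrödinger-type cocycle with its parameters are all hypothetical choices made for illustration, and the integral over $X = [0, 1)$ is only approximated by a Riemann sum.

```python
import math

def mat_mul(a, b):
    """Product a*b of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def op_norm(g):
    """Operator norm (top singular value) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    # s1^2 and s2^2 are the two roots of z^2 - frob2 * z + det^2 = 0
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2.0)

def iterate(T, A, x, n):
    """Return (T^n x, A^(n)(x)), where A^(n)(x) = A(T^{n-1} x) ... A(T x) A(x)."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        prod = mat_mul(A(x), prod)  # the newest factor multiplies on the left
        x = T(x)
    return x, prod

def finite_scale_exponent(T, A, n, samples):
    """Riemann-sum approximation of L_1^(n)(A) for Lebesgue measure on [0, 1)."""
    total = 0.0
    for i in range(samples):
        _, prod = iterate(T, A, i / samples, n)
        total += math.log(op_norm(prod)) / n
    return total / samples

# Hypothetical example data: rotation by the golden mean and a
# Schrodinger-type cocycle with made-up coupling constant 3.
OMEGA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    return (x + OMEGA) % 1.0

def schrodinger(x):
    return ((3.0 * math.cos(2.0 * math.pi * x), -1.0), (1.0, 0.0))
```

For instance, `finite_scale_exponent(rotation, schrodinger, 20, 64)` gives a rough value of $L_1^{(20)}$ for this toy cocycle; for large $n$ the matrix entries overflow, so a serious computation would renormalise the product at each step.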

We will need stronger integrability assumptions on the measurable functions $\frac{1}{n} \log \|A^{(n)}(x)\|$. Let $1 \le p \le \infty$. For simplicity of notation, later on we may assume that $p = 2$.

Definition 3.2 A cocycle $A \in \mathcal{C}$ is called $L^p$-bounded if there is $C < \infty$, which we call its $L^p$-bound, such that for all $n \ge 1$ we have:
\[ \Big\| \frac{1}{n} \log \|A^{(n)}\| \Big\|_{L^p} < C. \tag{3.2} \]

Definition 3.3 A cocycle $A \in \mathcal{C}_m$ is called uniformly $L^p$-bounded if there are $\delta = \delta(A) > 0$ and $C = C(A) < \infty$ such that for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ and for all $n \ge 1$ we have:
\[ \Big\| \frac{1}{n} \log \|B^{(n)}\| \Big\|_{L^p} < C. \tag{3.3} \]

It is not difficult to show that if a cocycle $A \in \mathcal{C}$ satisfies the bounds $\big\| \log \|A^{\pm 1}\| \big\|_{L^p} \le C < \infty$, then for all $n \in \mathbb{Z}$ we have
\[ \big\| \log \|A^{(n)}\| \big\|_{L^p} \le C\, |n|. \tag{3.4} \]
Hence if we assume that
\[ \log \|A^{\pm 1}\| \in L^p \tag{3.5} \]
holds for all cocycles $A \in \mathcal{C}$, and if we endow $\mathcal{C}_m$ with the distance given by
\[ \mathrm{dist}_p(A, B) := \|A - B\|_{L^\infty} + \big\| \log \|A^{-1}\| - \log \|B^{-1}\| \big\|_{L^p} \]
when $1 \le p < \infty$, and by $\mathrm{dist}_\infty(A, B) := \|A - B\|_{L^\infty}$ when $p = \infty$, then every cocycle $A \in \mathcal{C}$ is uniformly $L^p$-bounded.

In the applications we have in this book, the uniform $L^p$-boundedness is automatic. For instance, in the case of random cocycles, we assume from the beginning the integrability condition (3.5), which implies uniform $L^p$-boundedness. However, we want our scheme to be applicable also to cocycles that are very singular (i.e. non-invertible everywhere), which is the case of a forthcoming paper on quasi-periodic cocycles, and that is why we make the weaker, uniform $L^p$-boundedness assumption.


Given a cocycle $A \in \mathcal{C}$ and an integer $N \in \mathbb{N}$, denote by $\mathcal{F}_N(A)$ the lattice (w.r.t. union and intersection) generated by the sets $\{x \in X : \|A^{(n)}(x)\| \le c\}$ or $\{x \in X : \|A^{(n)}(x)\| \ge c\}$ where $c \ge 0$ and $0 \le n \le N$. Let $\Xi$ be a set of measurable functions $\xi : X \to \mathbb{R}$, which we call observables.

Definition 3.4 We say that $\Xi$ and $A$ are compatible if for every integer $N \in \mathbb{N}$, for every set $F \in \mathcal{F}_N(A)$ and for every $\varepsilon > 0$, there is an observable $\xi \in \Xi$ such that:
\[ \mathbb{1}_F \le \xi \quad \text{and} \quad \int_X \xi \, d\mu \le \mu(F) + \varepsilon. \tag{3.6} \]

3.1.2 Large Deviations Type Estimates

As mentioned in the introduction, the main tools in our results are some appropriate large deviations type (LDT) estimates for the given dynamical systems (meaning the base and the fiber dynamics). An LDT estimate for the base dynamics says that given an observable $\xi : X \to \mathbb{R}$, we have
\[ \mu \Big\{ x \in X : \Big| \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) - \int_X \xi \, d\mu \Big| > \varepsilon \Big\} < \iota(n, \varepsilon), \]
where $\varepsilon = o(1)$ and $\iota(n, \varepsilon) \to 0$ (as $n \to \infty$) represent, respectively, the size of the deviation from the mean and the measure of the deviation set. The above inequality should hold for all integers $n \ge n_0(\xi, \varepsilon)$.

In classical probability theory, when dealing with i.i.d. random variables, large deviations are precise asymptotic statements, and the measure of the deviation set decays exponentially. For our purposes, and for the given dynamical systems, we need slightly different types of estimates (not as precise, but for all iterates of the system and satisfying some uniformity properties). Moreover, in some of our applications (e.g. to certain types of quasi-periodic cocycles), the available decay of the measure of the deviation set is not exponential in the number of iterates, but slower than exponential. This is the motivation behind the following formalism.

Fix a constant $1 < p \le \infty$. From now on, $\varepsilon, \iota : (0, \infty) \to (0, \infty)$ will represent functions that describe, respectively, the size of the deviation from the mean and the measure of the deviation set. We assume that the deviation size functions $\varepsilon(t)$ are nonincreasing. We assume that the deviation set measure functions $\iota(t)$ are continuous and strictly decreasing to 0, as $t \to \infty$, at least like a power and at most like an exponential; in other words we assume that:
\[ \log t \lesssim \log \frac{1}{\iota(t)} \lesssim t \quad \text{as } t \to \infty. \]
Denote by $\phi_\iota(t)$ the inverse of the map $t \mapsto \psi_\iota(t) := t \, \iota(t)^{-\frac{p-1}{2p-1}}$. We then also assume that the increasing function $\phi_\iota(t)$ does not grow too fast, or more precisely that:
\[ \lim_{t \to \infty} \frac{\phi_\iota(2t)}{\phi_\iota(t)} < 2. \]
We will use the notation $\varepsilon_n := \varepsilon(n)$ and $\iota_n := \iota(n)$ for integers $n$. In the applications we have thus far, the constant $p = 2$ or $p = \infty$, the deviation size functions are either constant functions $\varepsilon(t) \equiv \varepsilon$ for some $0 < \varepsilon \ll 1$ or powers $\varepsilon(t) \equiv t^{-a}$ for some $a > 0$, while the deviation set measure functions are exponentials $\iota(t) \equiv e^{-c\, t}$, sub-exponentials $\iota(t) \equiv e^{-c\, t^b}$ or nearly-exponentials $\iota(t) \equiv e^{-c\, t/(\log t)^b}$ for some $c, b > 0$.

Let $\mathcal{E}$ and $\mathcal{I}$ be some spaces of functions, with $\mathcal{E}$ containing deviation size functions $\varepsilon(t)$ and $\mathcal{I}$ containing deviation set measure functions $\iota(t)$. We assume that $\mathcal{I}$ is a convex cone, i.e., the functions $a \cdot \iota(t)$ and $\iota_1(t) + \iota_2(t)$ belong to $\mathcal{I}$ for any $a > 0$ and $\iota_1, \iota_2 \in \mathcal{I}$. Let $\mathcal{P} = \mathcal{P}(p)$ be a set of triplets $p = (n_0, \varepsilon, \iota)$, where $n_0 \in \mathbb{N}$, $\varepsilon \in \mathcal{E}$ and $\iota \in \mathcal{I}$. An element $p \in \mathcal{P}$ is called an LDT parameter. Our set of LDT parameters $\mathcal{P}$ should satisfy the condition: for all $\varepsilon > 0$ there is $p = p(\varepsilon) = (n_0, \varepsilon, \iota) \in \mathcal{P}$ such that $\varepsilon_{n_0} \le \varepsilon$, so $\mathcal{P}$ contains LDT parameters with arbitrarily small deviation size functions.

We now define the base and fiber LDT estimates, which are relative to given spaces of deviation functions $\mathcal{E}$, $\mathcal{I}$ and set of parameters $\mathcal{P}$.

Definition 3.5 An observable $\xi : X \to \mathbb{R}$ satisfies a base-LDT estimate if for every $\varepsilon > 0$ there is $p = p(\xi, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that for all $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and
\[ \mu \Big\{ x \in X : \Big| \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) - \int_X \xi \, d\mu \Big| > \varepsilon_n \Big\} < \iota_n. \tag{3.7} \]

Definition 3.6 A measurable cocycle $A \in \mathcal{C}$ satisfies a fiber-LDT estimate if for every $\varepsilon > 0$ there is $p = p(A, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that for all $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and
\[ \mu \Big\{ x \in X : \Big| \frac{1}{n} \log \|A^{(n)}(x)\| - L_1^{(n)}(A) \Big| > \varepsilon_n \Big\} < \iota_n. \tag{3.8} \]

We will need a stronger form of the fiber-LDT, one that is uniform in a neighborhood of the cocycle, in the sense that estimate (3.8) holds with the same LDT parameter for all nearby cocycles.

Definition 3.7 A measurable cocycle $A \in \mathcal{C}_m$ satisfies a uniform fiber-LDT if for all $\varepsilon > 0$ there are $\delta = \delta(A, \varepsilon) > 0$ and $p = p(A, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that if $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ and if $n \ge n_0$ then $\varepsilon_n \le \varepsilon$ and
\[ \mu \Big\{ x \in X : \Big| \frac{1}{n} \log \|B^{(n)}(x)\| - L_1^{(n)}(B) \Big| > \varepsilon_n \Big\} < \iota_n. \tag{3.9} \]
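To make the shape of a base-LDT estimate concrete, here is a small sketch for an example that is ours rather than the text's: the Bernoulli shift on fair coin tosses with the observable $\xi(x) = x_0 \in \{0, 1\}$. The measure of the deviation set is computed exactly from binomial coefficients and compared with Hoeffding's inequality, which supplies a constant deviation size $\varepsilon(t) \equiv \varepsilon$ and an exponential deviation set measure function $\iota(t) = 2 e^{-2 \varepsilon^2 t}$. The function names are hypothetical.

```python
import math

def deviation_measure(n, eps):
    """Exact measure of { |S_n / n - 1/2| > eps } for n fair coin tosses."""
    total = 0
    for k in range(n + 1):
        if abs(k / n - 0.5) > eps:
            total += math.comb(n, k)  # number of sequences with k heads
    return total / 2 ** n

def hoeffding_iota(n, eps):
    """Exponential bound iota_n = 2 exp(-2 n eps^2) from Hoeffding's inequality."""
    return 2.0 * math.exp(-2.0 * n * eps * eps)
```

For example, `deviation_measure(200, 0.1)` can be compared with `hoeffding_iota(200, 0.1)`; the bound holds for every $n \ge 1$, so in this toy case one may take $n_0 = 1$.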


3.1.3 Abstract Continuity Theorem of the Lyapunov Exponents

We are ready to formulate the main result of this chapter.

Theorem 3.1 Consider an ergodic MPDS $(X, \mathcal{F}, \mu, T)$, a space of measurable cocycles $\mathcal{C}$, a set of observables $\Xi$, a constant $1 < p \le \infty$, a set of LDT parameters $\mathcal{P} = \mathcal{P}(p)$ with corresponding spaces of deviation functions $\mathcal{E}$, $\mathcal{I}$ and assume the following:
1. $\Xi$ is compatible with every cocycle $A \in \mathcal{C}$.
2. Every observable $\xi \in \Xi$ satisfies a base-LDT.
3. Every $A \in \mathcal{C}$ with $L_1(A) > -\infty$ is uniformly $L^p$-bounded.
4. Every cocycle $A \in \mathcal{C}$ for which $L_1(A) > L_2(A)$ satisfies a uniform fiber-LDT.

Then all Lyapunov exponents $L_k : \mathcal{C}_m \to [-\infty, \infty)$, $1 \le k \le m$, $m \in \mathbb{N}$, are continuous functions of the cocycle. Moreover, given $A \in \mathcal{C}_m$ and $1 \le k \le m$, if $L_k(A) > L_{k+1}(A)$, then locally near $A$ the map $L_1 + L_2 + \cdots + L_k$ has a modulus of continuity
\[ \omega(h) := \Big[ \iota \Big( c \log \frac{1}{h} \Big) \Big]^{\frac{p-1}{2p-1}} \]
for some $\iota = \iota(A) \in \mathcal{I}$ and $c = c(A) > 0$.

The proof of the abstract continuity theorem (ACT) of the Lyapunov exponents will be finalized in Sects. 3.5 and 3.6. In Sect. 3.2 we prove that the upper semicontinuity of the maximal Lyapunov exponent holds uniformly in cocycle and phase, for a large set of phases. While this result is interesting in itself, in our scheme it ensures that in an inductive procedure based on the AP, the gap condition holds. The inductive procedure, which is a type of multiscale analysis leading to the proof of continuity of the Lyapunov exponents, is described in Sect. 3.3 (the base step) and Sect. 3.4 (the inductive step). We note that the use of the nearly uniform upper semicontinuity of the maximal Lyapunov exponent from Sect. 3.2 represents a major point of difference between our inductive procedure and the one employed by Goldstein and Schlag in [4] or by Schlag in [7]. It is also what allows us to treat, within this scheme, random models (see Chap. 5) and (in a future work) identically singular quasi-periodic models.

3.2 Upper Semicontinuity of the Top Lyapunov Exponent

Given are an ergodic system $(X, \mathcal{F}, \mu, T)$, a space of measurable cocycles $\mathcal{C}$, a set of observables $\Xi$ and a set of LDT parameters $\mathcal{P}$ with corresponding spaces of deviation functions $\mathcal{E}$ and $\mathcal{I}$. It is well known that the top Lyapunov exponent is upper semicontinuous as a function of the cocycle. Our argument requires a much more precise version of the upper semicontinuity, one that is uniform in the number $n$ of iterates and in the phase $x$. Such results are available, see [3, 5], and they are based on a stopping time argument used by Katznelson and Weiss [6] in their proofs of Birkhoff's and Kingman's ergodic theorems. However, the results in [3, 5] require unique ergodicity of the system, a property that Bernoulli and Markov shifts do not satisfy. By replacing unique ergodicity with a weaker property, namely that a base-LDT holds for a large enough set of observables (which we show later to hold for Markov shifts), we obtain a (weaker) version of the uniform upper semicontinuity in [5], one which holds for a large enough set of phases.

Proposition 3.1 (nearly uniform upper semicontinuity) Let $A \in \mathcal{C}_m$ be a measurable cocycle such that $\Xi$ and $A$ are compatible and every observable $\xi \in \Xi$ satisfies a base-LDT with corresponding LDT parameter in $\mathcal{P}$.

(i) Assume that $L_1(A) > -\infty$ and that $A$ is $L^1$-bounded. For every $\varepsilon > 0$, there are $\delta = \delta(A, \varepsilon) > 0$, $n_0 = n_0(A, \varepsilon) \in \mathbb{N}$ and $\iota = \iota(A, \varepsilon) \in \mathcal{I}$, such that if $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, and if $n \ge n_0$, then the upper bound
\[ \frac{1}{n} \log \|B^{(n)}(x)\| \le L_1(A) + \varepsilon \tag{3.10} \]

holds for all $x$ outside of a set of measure $< \iota_n$. Up to a zero measure set, the exceptional set depends only on $A, \varepsilon$.

(ii) Assume that $L_1(A) = -\infty$. For every $t < \infty$, there are $\delta = \delta(A, t) > 0$, $n_0 = n_0(A, t) \in \mathbb{N}$ and $\iota = \iota(A, t) \in \mathcal{I}$, such that if $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, and if $n \ge n_0$, then the upper bound
\[ \frac{1}{n} \log \|B^{(n)}(x)\| \le -t \tag{3.11} \]
holds for all $x$ outside of a set of measure $< \iota_n$. Up to a zero measure set, the exceptional set depends only on $A, t$.

Proof Throughout this proof, $C$ will stand for a positive, finite, large enough constant that depends only on the cocycle $A$, and which may change slightly from one estimate to another. If $B \in \mathcal{C}$ is at some small distance from $A$, then it will be close enough to $A$ in the $L^\infty$ distance as well, hence we will assume that for $\mu$-a.e. $x \in X$ we have $\|B(x)\| < C$. Moreover, in the case (i), when $L_1(A)$ is finite, since we also assume $A$ to be $L^1$-bounded, we may choose the constant $C$ such that for all $n \ge 1$ we have $\big\| \frac{1}{n} \log \|A^{(n)}\| \big\|_{L^1} < C$ and hence also $|L_1(A)| < C$.

The proofs for each of the two cases are similar, but the argument will differ in some parts. We first present the case $L_1(A) > -\infty$ in detail, then indicate how to modify the argument for the case $L_1(A) = -\infty$.


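(i) Fix $\varepsilon > 0$. By Kingman's subadditive ergodic theorem,
\[ \lim_{n \to \infty} \frac{1}{n} \log \|A^{(n)}(x)\| = L_1(A) \quad \text{for } \mu\text{-a.e. } x, \]
hence the number
\[ n(x) := \min \Big\{ n \ge 1 : \frac{1}{n} \log \|A^{(n)}(x)\| < L_1(A) + \varepsilon \Big\} \tag{3.12} \]
is defined for $\mu$-a.e. $x \in X$. For every integer $N$, let
\[ \mathcal{U}_N := \{ x : n(x) \le N \} = \bigcup_{n=1}^{N} \Big\{ x : \frac{1}{n} \log \|A^{(n)}(x)\| < L_1(A) + \varepsilon \Big\}. \]
Then $\mathcal{U}_N^{\complement} \in \mathcal{F}_N(A)$, $\mathcal{U}_N \subset \mathcal{U}_{N+1}$ and $\cup_N \, \mathcal{U}_N$ has full measure. Therefore, there is $N = N(\varepsilon, A)$ such that $\mu(\mathcal{U}_N^{\complement}) < \varepsilon$. We fix this integer $N$ for the rest of the proof and denote the set $\mathcal{U} = \mathcal{U}(\varepsilon, A) := \mathcal{U}_N$. Therefore, $\mathcal{U}^{\complement} \in \mathcal{F}_N(A)$, $\mu(\mathcal{U}^{\complement}) < \varepsilon$ and we have: if $x \in \mathcal{U}$ then $1 \le n(x) \le N$ and
\[ \log \|A^{(n(x))}(x)\| \le n(x)\, L_1(A) + n(x)\, \varepsilon. \tag{3.13} \]
Next we will bound from above $\log \|B^{(n)}(x)\|$ by $\log \|A^{(n)}(x)\| + o(1)$ for all cocycles $B$ with $\mathrm{dist}(B, A) < \delta$, where $\delta$ will be chosen later, for all $1 \le n \le N$ and for a large set of phases $x \in X$.

Since $A$ is $L^1$-bounded, $\log \|A^{(n)}\| \in L^1(X, \mu)$, so $A^{(n)}(x) \ne 0$ for $\mu$-a.e. $x \in X$. Moreover, if $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ (where $\delta \ll 1$ is chosen below), we have $\|B(x) - A(x)\| < \delta$ and $\|B(x)\| < C$ for $\mu$-a.e. $x \in X$. Then for $x$ outside a null set and for $1 \le n \le N$, we have:
\begin{align*}
\log \|B^{(n)}(x)\| - \log \|A^{(n)}(x)\| &= \log \frac{\|B^{(n)}(x)\|}{\|A^{(n)}(x)\|} \le \log \Big[ \frac{\|B^{(n)}(x) - A^{(n)}(x)\|}{\|A^{(n)}(x)\|} + 1 \Big] \\
&\le \frac{\|B^{(n)}(x) - A^{(n)}(x)\|}{\|A^{(n)}(x)\|} \le n\, C^{n-1}\, \delta \cdot \frac{1}{\|A^{(n)}(x)\|} \le N\, C^{N-1}\, \delta \cdot \frac{1}{\|A^{(n)}(x)\|}.
\end{align*}
Hence
\[ \log \|B^{(n)}(x)\| \le \log \|A^{(n)}(x)\| + \delta\, N\, C^{N-1}\, \frac{1}{\|A^{(n)}(x)\|} \tag{3.14} \]
for all $x$ outside a zero measure set and for all $1 \le n \le N$.

Let $t := e^{-N^2 C/\varepsilon}$. Consider the set $\mathcal{V} := \bigcap_{n=1}^{N} \{ x : \|A^{(n)}(x)\| > t \}$. Clearly $\mathcal{V}^{\complement} \in \mathcal{F}_N(A)$, and we will show that $\mathcal{V}^{\complement}$ has measure at most $\varepsilon$.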
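The step $\|B^{(n)}(x) - A^{(n)}(x)\| \le n\, C^{n-1}\, \delta$ used in deriving (3.14) is the usual telescoping estimate for products of matrices whose norms are bounded by $C$ and whose factors differ by at most $\delta$ in norm. A minimal numerical sketch for $2 \times 2$ matrices follows; the helper names and the randomly generated test matrices are hypothetical and only illustrate the inequality.

```python
import math
import random

def mat_mul(a, b):
    """Product a*b of two 2x2 matrices stored as nested tuples."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )

def mat_sub(a, b):
    """Entrywise difference a - b of two 2x2 matrices."""
    return tuple(tuple(a[i][j] - b[i][j] for j in range(2)) for i in range(2))

def op_norm(g):
    """Operator norm (top singular value) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = max(frob2 * frob2 - 4.0 * det * det, 0.0)
    return math.sqrt((frob2 + math.sqrt(disc)) / 2.0)

def product(mats):
    """g_{n-1} ... g_1 g_0 for mats = [g_0, ..., g_{n-1}]."""
    prod = ((1.0, 0.0), (0.0, 1.0))
    for g in mats:
        prod = mat_mul(g, prod)
    return prod

def telescoping_check(As, Bs):
    """Return (||B^(n) - A^(n)||, n C^(n-1) delta) for two chains of equal length."""
    n = len(As)
    C = max(max(op_norm(a), op_norm(b)) for a, b in zip(As, Bs))
    delta = max(op_norm(mat_sub(b, a)) for a, b in zip(As, Bs))
    lhs = op_norm(mat_sub(product(Bs), product(As)))
    return lhs, n * C ** (n - 1) * delta

def random_chain(rng, n, scale):
    """n matrices with independent entries uniform in [-scale, scale]."""
    return [tuple(tuple(rng.uniform(-scale, scale) for _ in range(2)) for _ in range(2))
            for _ in range(n)]
```

Here $C$ and $\delta$ are computed from the two chains themselves, so the first returned value never exceeds the second up to rounding.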

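If for some $1 \le n \le N$ and $x \in X$ we have $\|A^{(n)}(x)\| \le t$ $(< 1)$, then
\[ \Big| \frac{1}{n} \log \|A^{(n)}(x)\| \Big| \ge \frac{\log 1/t}{n}, \]
hence
\[ \mathcal{V}^{\complement} \subset \bigcup_{n=1}^{N} \Big\{ x : \Big| \frac{1}{n} \log \|A^{(n)}(x)\| \Big| \ge \frac{\log 1/t}{n} \Big\}. \]
Since $A$ is $L^1$-bounded, there is $C = C(A) < \infty$ such that for all $n \ge 1$
\[ \Big\| \frac{1}{n} \log \|A^{(n)}\| \Big\|_{L^1} < C. \]
Then by Chebyshev's inequality,
\[ \mu \Big\{ x : \Big| \frac{1}{n} \log \|A^{(n)}(x)\| \Big| \ge \frac{\log 1/t}{n} \Big\} < \frac{C n}{\log 1/t} \le \frac{C N}{\log 1/t} = \frac{\varepsilon}{N}. \]
Therefore,
\[ \mu(\mathcal{V}^{\complement}) < N \, \varepsilon / N = \varepsilon, \]
and if $1 \le n \le N$ then for $\mu$-a.e. $x \in \mathcal{V}$ we have
\[ \log \|B^{(n)}(x)\| \le \log \|A^{(n)}(x)\| + \delta\, N\, C^{N-1}\, e^{N^2 C/\varepsilon} < \log \|A^{(n)}(x)\| + \varepsilon, \]
provided we choose $\delta < \delta(\varepsilon, C, N) = \delta(\varepsilon, A)$ small enough.

Let $\mathcal{O} := \mathcal{U} \cap \mathcal{V}$. Then $\mathcal{O}^{\complement} \in \mathcal{F}_N(A)$ and $\mu(\mathcal{O}^{\complement}) < 2\varepsilon$. We conclude that for $\mu$-almost every $x \in \mathcal{O}$, we have:
\[ \log \|B^{(n(x))}(x)\| \le n(x)\, L_1(A) + n(x)\, 2\varepsilon. \tag{3.15} \]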

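Let $n_0 = n_0(\varepsilon, A) := \frac{C N}{\varepsilon}$. Fix $x \in X$ and define inductively for all $k \ge 1$ the sequence of phases $x_k = x_k(x) \in X$ and the sequence of integers $n_k = n_k(x) \in \mathbb{N}$ as follows:
\begin{align*}
x_1 &= x, & n_1 &= \begin{cases} n(x_1) & \text{if } x_1 \in \mathcal{O} \\ 1 & \text{if } x_1 \notin \mathcal{O} \end{cases} \\
x_2 &= T^{n_1} x_1, & n_2 &= \begin{cases} n(x_2) & \text{if } x_2 \in \mathcal{O} \\ 1 & \text{if } x_2 \notin \mathcal{O} \end{cases} \\
&\;\;\vdots \\
x_{k+1} &= T^{n_k} x_k, & n_{k+1} &= \begin{cases} n(x_{k+1}) & \text{if } x_{k+1} \in \mathcal{O} \\ 1 & \text{if } x_{k+1} \notin \mathcal{O}. \end{cases}
\end{align*}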
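The inductive rule above is a greedy splitting of an orbit segment into blocks: a block of length $n(x_k)$ when the current phase lies in the good set, and a block of length 1 otherwise. The sketch below implements this rule for an arbitrary map; the function name `decompose` and its arguments (`stop_time` for $x \mapsto n(x)$, `in_good_set` for membership in $\mathcal{O}$) are hypothetical, and the integer toy system in the usage note is made up purely to exercise the code.

```python
def decompose(x, n, T, stop_time, in_good_set, N):
    """Split n as n_1 + ... + n_p + m, with 0 <= m < n_{p+1}, following the rule
    n_k = stop_time(x_k) if x_k is in the good set, else 1, and x_{k+1} = T^{n_k} x_k.
    Returns the list of pairs (x_k, n_k) for k = 1..p together with the remainder m."""
    blocks = []
    used = 0
    while True:
        step = stop_time(x) if in_good_set(x) else 1
        assert 1 <= step <= N, "stopping times on the good set must lie in [1, N]"
        if used + step > n:       # the next block would overshoot n
            break
        blocks.append((x, step))
        for _ in range(step):     # move to x_{k+1} = T^{n_k} x_k
            x = T(x)
        used += step
    return blocks, n - used
```

With the toy data `T = lambda x: (x + 1) % 7`, `stop_time = lambda x: x % 3 + 1`, `in_good_set = lambda x: x != 0` and `N = 3`, calling `decompose(2, 10, T, stop_time, in_good_set, 3)` returns the blocks `[(2, 3), (5, 3), (1, 2), (3, 1)]` and the remainder `1`.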


Note that for all $k \ge 1$, $x_{k+1} = T^{n_k + \cdots + n_1} x$ and $1 \le n_k \le N$. For any $n \ge n_0$ $(> N \ge n_1)$, there is $p \ge 1$ such that $n_1 + \cdots + n_p \le n < n_1 + \cdots + n_p + n_{p+1}$, so $n = n_1 + \cdots + n_p + m$, where $0 \le m < n_{p+1} \le N$.

For any cocycle $B$ such that $\mathrm{dist}(B, A) < \delta$, let $b_n(x) := \log \|B^{(n)}(x)\|$. Then clearly for $\mu$-a.e. $x \in X$ we have $b_n(x) \le n\, C$, where $C$ is a constant that depends on $A$, and $b_n(x)$ is a sub-additive process, meaning:
\[ b_{n+m}(x) \le b_n(x) + b_m(T^n x) \]
for all $n, m \ge 1$ and for $\mu$-almost every $x \in X$. Using this sub-additivity and the definition of $x_k(x)$, $n_k(x)$, we have:
\[ \log \|B^{(n)}(x)\| = b_n(x) = b_{n_1 + \cdots + n_p + m}(x) \le \sum_{k=1}^{p} b_{n_k}(x_k) + b_m(x_{p+1}). \tag{3.16} \]
We estimate each term separately. Each estimate is valid for $x$ outside of a null set. For the last term we use the trivial bound:
\[ b_m(x_{p+1}) \le m\, C < N\, C. \tag{3.17} \]
For every $1 \le k \le p$ we have:
• Either $x_k \in \mathcal{O}$, so $n_k = n(x_k)$, in which case, using (3.15) we get: $b_{n_k}(x_k) = \log \|B^{(n(x_k))}(x_k)\| \le n(x_k)\, L_1(A) + 2\varepsilon\, n(x_k) = (L_1(A) + 2\varepsilon)\, n_k$.
• Or $x_k \notin \mathcal{O}$, so $n_k = 1$, in which case $b_{n_k}(x_k) = \log \|B(x_k)\| \le C$.
Therefore,
\begin{align*}
b_{n_k}(x_k) &= b_{n_k}(x_k)\, \mathbb{1}_{\mathcal{O}}(x_k) + b_{n_k}(x_k)\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) \\
&\le (L_1(A) + 2\varepsilon)\, n_k\, \mathbb{1}_{\mathcal{O}}(x_k) + C\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) \\
&= (L_1(A) + 2\varepsilon)\, n_k - (L_1(A) + 2\varepsilon)\, n_k\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) + C\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) \\
&= (L_1(A) + 2\varepsilon)\, n_k - (L_1(A) + 2\varepsilon)\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) + C\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k),
\end{align*}
where in the last equality we used the fact that $n_k = 1$ when $x_k \in \mathcal{O}^{\complement}$.

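We conclude:
\[ b_{n_k}(x_k) \le (L_1(A) + 2\varepsilon)\, n_k + (C - L_1(A) - 2\varepsilon)\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) < (L_1(A) + 2\varepsilon)\, n_k + 2C\, \mathbb{1}_{\mathcal{O}^{\complement}}(x_k). \tag{3.18} \]
We add up (3.17) and (3.18) for all $1 \le k \le p$, and then use (3.16) to get:
\begin{align*}
\log \|B^{(n)}(x)\| &\le (n_1 + \cdots + n_p)\, (L_1(A) + 2\varepsilon) + 2C \sum_{k=1}^{p} \mathbb{1}_{\mathcal{O}^{\complement}}(x_k) + C N \\
&\le n\, (L_1(A) + 2\varepsilon) + 2C \sum_{j=0}^{n-1} \mathbb{1}_{\mathcal{O}^{\complement}}(T^j x) + C N.
\end{align*}
Divide both sides by $n$ to conclude that for $\mu$-a.e. $x \in X$ and for all $n \ge n_0$,
\[ \frac{1}{n} \log \|B^{(n)}(x)\| \le L_1(A) + 2\varepsilon + 2C\, \frac{1}{n} \sum_{j=0}^{n-1} \mathbb{1}_{\mathcal{O}^{\complement}}(T^j x) + \frac{C N}{n}. \tag{3.19} \]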

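Since $n \ge n_0$, we have $\frac{C N}{n} \le \varepsilon$, so all that is left is to estimate the Birkhoff average above. We use the compatibility condition. Since $\mathcal{O}^{\complement} \in \mathcal{F}_N(A)$, there is an observable $\xi = \xi(A, \varepsilon) \in \Xi$ such that $\mathbb{1}_{\mathcal{O}^{\complement}} \le \xi$ and $\int_X \xi \, d\mu < \mu(\mathcal{O}^{\complement}) + \varepsilon < 3\varepsilon$. Then, applying the base-LDT to $\xi$, there is $p = p(\xi, \varepsilon) = p(A, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that for $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and
\[ \frac{1}{n} \sum_{j=0}^{n-1} \mathbb{1}_{\mathcal{O}^{\complement}}(T^j x) \le \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) < \int_X \xi \, d\mu + \varepsilon_n < 4\varepsilon, \]
provided we choose $x$ outside a set of measure $\iota_n$. This ends the proof in the case $L_1(A) > -\infty$.

(ii) The case $L_1(A) = -\infty$. Let $t$ be large enough, say $t > C + 1$. We apply again Kingman's subadditive theorem and for $\mu$-a.e. $x \in X$, define the integers
\[ n(x) := \min \Big\{ n \ge 1 : \frac{1}{n} \log \|A^{(n)}(x)\| < -2t \Big\}. \tag{3.20} \]
The sets $\mathcal{U}_N$ are defined as before. Fix $N = N(A, t)$, then $\mathcal{U} = \mathcal{U}(A, t) = \mathcal{U}_N$ so that $\mu(\mathcal{U}^{\complement}) < 1/t$. Furthermore, $\mathcal{U}^{\complement} \in \mathcal{F}_N(A)$ and if $x \in \mathcal{U}$ then $1 \le n(x) \le N$ and
\[ \log \|A^{(n(x))}(x)\| \le -2t\, n(x). \tag{3.21} \]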

We show that (3.21) holds also for cocycles B in a neighborhood of A. This is where the argument differs from the case L 1 (A) > −∞.

Let $0 < \delta < \dfrac{e^{-2Nt}}{t\, N\, C^{N-1}}$, and let $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, so $\|B(x) - A(x)\| < \delta$ for $\mu$-a.e. $x \in X$. Then clearly, for any $1 \le m \le N$ and for $\mu$-a.e. $x \in X$ we have: $\|B^{(m)}(x) - A^{(m)}(x)\| < m\, C^{m-1}\, \delta \le N\, C^{N-1}\, \delta$
−∞. The rest of the proof then follows exactly the same pattern as when $L_1(A) > -\infty$, the role of $L_1(A) + 2\varepsilon$ being now played by $-2t + 1/t$, while the small set $\mathcal{O}^{\complement}$ is simply $\mathcal{U}^{\complement}$, since there was no extra small set excluded when deriving (3.25). □

Remark 3.1 Note that since our cocycles are in $L^\infty$, Proposition 3.1 above also implies the upper semicontinuity of the top Lyapunov exponent as a function of the cocycle. If $L_1(A) = -\infty$, this in particular implies the continuity of $L_1$ at $A$; moreover, since $L_1(A) \ge L_2(A) \ge \cdots \ge L_m(A)$, this also implies the continuity at $A$ of each Lyapunov exponent. Therefore, from now on, we may assume that $L_1(A) > -\infty$.


Remark 3.2 If $(X, \mu, T)$ is uniquely ergodic (e.g. an ergodic torus translation), we may choose $\Xi$ to be the set of all continuous functions on $X$. If the cocycle is continuous, then using Urysohn's lemma, it is easy to verify that the compatibility condition between $\Xi$ and $A$ holds. For uniquely ergodic systems, the convergence in Birkhoff's ergodic theorem is uniform in the phase for all continuous observables. Hence the base-LDT estimate of Definition 3.5 holds automatically for all $\xi \in \Xi$, with deviation measure function $\iota(t) \equiv 0$. It follows that if $(X, \mu, T)$ is uniquely ergodic, then the statements in Proposition 3.1 above hold for a.e. $x$. Because the cocycle is continuous, these statements hold for all phases $x$. Hence we recover the corresponding result in [5].

The main application of Proposition 3.1 is the following lemma, which we use repeatedly throughout the inductive argument. It gives us a lower bound on the gap between the first two singular values of the iterates of a cocycle, thus ensuring the gap condition in the Avalanche Principle.

Throughout this chapter, if $A \in \mathcal{C}$ is such that $L_1(A) > L_2(A) \ge -\infty$, then $\kappa(A)$ denotes the gap between the first two LE, i.e. $\kappa(A) := L_1(A) - L_2(A) > 0$ when $L_2(A) > -\infty$, while if $L_2(A) = -\infty$ then $\kappa(A)$ is a fixed, large enough finite constant. We are now under the assumptions of the abstract continuity Theorem 3.1.

Lemma 3.1 Let $A \in \mathcal{C}_m$ be a cocycle for which $L_1(A) > L_2(A)$ and let $\varepsilon > 0$. There are $\delta_0 = \delta_0(A, \varepsilon) > 0$, $n_0 = n_0(A, \varepsilon) \in \mathbb{N}$ and $\iota = \iota(A, \varepsilon) \in \mathcal{I}$ such that for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta_0$ and for all $n \ge n_0$, if
\[ \big| L_1^{(n)}(B) - L_1^{(n)}(A) \big| < \theta, \tag{3.26} \]
then for all phases $x$ outside a set of measure $< \iota_n$ we have:
\[ \frac{1}{n} \log \mathrm{gr}(B^{(n)}(x)) > \kappa(A) - 2\theta - 3\varepsilon. \tag{3.27} \]
Moreover,
\[ L_1^{(n)}(B) - L_2^{(n)}(B) > (\kappa(A) - 2\theta - 3\varepsilon)\, (1 - \iota_n). \tag{3.28} \]

Proof Fix $\varepsilon > 0$. If $L_2(A) = -\infty$, let $t = t(A) := -2 L_1(A) + \kappa(A)$.

Since $L_1(A) > L_2(A)$, the cocycle $A$ satisfies a uniform fiber-LDT with a parameter $p = p(A, \varepsilon) \in \mathcal{P}$ and in a neighborhood around $A$ of size $\delta(A, \varepsilon) > 0$. The compatibility condition holds for all cocycles in $\mathcal{C}$, hence also for $\wedge_2 A$. Note that $L_1(\wedge_2 A) = L_1(A) + L_2(A)$, hence $L_1(\wedge_2 A) > -\infty$ iff $L_2(A) > -\infty$. The nearly uniform upper semicontinuity of the top LE (Proposition 3.1) can then be applied to $\wedge_2 A$, and it gives parameters $\delta > 0$, $\iota \in \mathcal{I}$, $n_0 \in \mathbb{N}$ that define the range of validity of (3.10) and (3.11) respectively. These parameters depend on $A$ and $\varepsilon$ when $L_2(A) > -\infty$ and only on $A$ when $L_2(A) = -\infty$.

Pick $\delta = \delta(A, \varepsilon) > 0$, $n_0 = n_0(A, \varepsilon) \in \mathbb{N}$, $\iota = \iota(A, \varepsilon) \in \mathcal{I}$ such that both the uniform fiber-LDT and Proposition 3.1 apply for all cocycles $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, for all $n \ge n_0$ and for all $x$ outside a set of measure $< \iota_n$. Fix such $B, n, x$.

For any matrix $g \in \mathrm{Mat}(m, \mathbb{R})$ we have
\[ \mathrm{gr}(g) = \frac{s_1(g)}{s_2(g)} = \frac{\|g\|^2}{\|\wedge_2\, g\|} \in [1, \infty]. \tag{3.29} \]
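Identity (3.29) rests on $\|g\| = s_1(g)$ and $\|\wedge_2\, g\| = s_1(g)\, s_2(g)$. For $m = 2$ this is easy to check by hand, since $\wedge_2\, g$ is then the $1 \times 1$ matrix $(\det g)$. The sketch below, with hypothetical function names and restricted to $2 \times 2$ real matrices, computes the gap ratio both ways.

```python
import math

def singular_values(g):
    """Sorted singular values (s1 >= s2) of a 2x2 real matrix."""
    (a, b), (c, d) = g
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    # s1^2 and s2^2 are the roots of z^2 - frob2 * z + det^2 = 0
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    s1 = math.sqrt((frob2 + disc) / 2.0)
    s2 = math.sqrt(max((frob2 - disc) / 2.0, 0.0))
    return s1, s2

def gap_ratio(g):
    """gr(g) = s1(g) / s2(g), with value infinity when s2(g) = 0."""
    s1, s2 = singular_values(g)
    return math.inf if s2 == 0.0 else s1 / s2

def gap_ratio_via_wedge(g):
    """gr(g) = ||g||^2 / ||wedge_2 g||; for m = 2, ||wedge_2 g|| = |det g|."""
    (a, b), (c, d) = g
    s1, _ = singular_values(g)
    wedge_norm = abs(a * d - b * c)
    return math.inf if wedge_norm == 0.0 else s1 * s1 / wedge_norm
```

Note that the gap ratio $\sigma(g) = s_2(g)/s_1(g)$ used in Chapter 2 is the reciprocal of $\mathrm{gr}(g)$.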

From (3.29) we get
\[ \frac{1}{n} \log \mathrm{gr}(B^{(n)}(x)) = 2\, \frac{1}{n} \log \|B^{(n)}(x)\| - \frac{1}{n} \log \|\wedge_2\, B^{(n)}(x)\|. \tag{3.30} \]
The uniform fiber-LDT gives a lower bound on the first term on the right hand side of (3.30):
\[ \frac{1}{n} \log \|B^{(n)}(x)\| > L_1^{(n)}(B) - \varepsilon_n \ge L_1^{(n)}(B) - \varepsilon. \]
Moreover, from assumption (3.26) we have
\[ L_1^{(n)}(B) > L_1^{(n)}(A) - \theta \ge L_1(A) - \theta, \]
hence
\[ \frac{1}{n} \log \|B^{(n)}(x)\| > L_1(A) - \theta - \varepsilon. \tag{3.31} \]
Proposition 3.1 applied to $\wedge_2 A$ will give an upper bound on $\frac{1}{n} \log \|\wedge_2\, B^{(n)}(x)\|$. If $L_2(A) > -\infty$, so $L_1(\wedge_2 A) > -\infty$, from part (i) of Proposition 3.1 we get
\[ \frac{1}{n} \log \|\wedge_2\, B^{(n)}(x)\| < L_1(\wedge_2 A) + \varepsilon = L_1(A) + L_2(A) + \varepsilon. \tag{3.32} \]
Combine (3.30)–(3.32) to conclude that for all chosen $B, n, x$ we have:
\[ \frac{1}{n} \log \mathrm{gr}(B^{(n)}(x)) > \kappa(A) - 2\theta - 3\varepsilon, \]
which proves (3.27). Integrating in $x$ we derive (3.28).

Now if $L_2(A) = -\infty$, so $L_1(\wedge_2 A) = -\infty$, use part (ii) of Proposition 3.1 to get
\[ \frac{1}{n} \log \|\wedge_2\, B^{(n)}(x)\| < -t = 2 L_1(A) - \kappa(A). \tag{3.33} \]
Combine (3.30), (3.31) and (3.33) and get (3.27) in this case as well. Then (3.28) follows as above. □


For the rest of this chapter, we are given an ergodic MPDS (X, F, μ, T ), a space of measurable cocycles C , a set of observables Ξ and a set of LDT parameters P with corresponding spaces of deviation functions E and I. We assume the compatibility condition in Definition 3.4 between Ξ and any cocycle A ∈ C , the base-LDT for any observable ξ ∈ Ξ , the uniform L p -boundedness condition (put p = 2 to simplify notations) on any cocycle A ∈ C with L 1 (A) > −∞ and the uniform fiber-LDT for any cocycle A ∈ C with L 1 (A) > L 2 (A). These LDT estimates hold for parameters p ∈ P.

3.3 Finite Scale Continuity

We show that the finite scale Lyapunov exponents have a continuous behavior if the scale is fixed. We are not able to prove actual continuity of these finite scale quantities, unless we make some restrictions on the space of cocycles. However, this continuous behavior at finite scale is sufficient for our purposes, as the inductive procedure described in the next section leads to the actual continuity of the limit quantities (the LE) as the scale goes to infinity.

Proposition 3.2 (finite scale uniform continuity) Let $A \in \mathcal{C}_m$ be a cocycle for which $L_1(A) > L_2(A)$. There are $\delta_0 = \delta_0(A) > 0$, $n_{01} = n_{01}(A)$, $C_1 = C_1(A) > 0$ and $\iota = \iota(A) \in \mathcal{I}$ such that for any two cocycles $B_1, B_2 \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) \le \delta_0$ where $i = 1, 2$, if $n \ge n_{01}$ and $\mathrm{dist}(B_1, B_2) < e^{-C_1 n}$, then
\[ \big| L_1^{(n)}(B_1) - L_1^{(n)}(B_2) \big| < \iota_n^{1/2}. \tag{3.34} \]

Proof Let $\varepsilon_0 := \kappa(A)/10 > 0$. Since $L_1(A) > L_2(A)$, the uniform fiber-LDT and Lemma 3.1 hold for $A, \varepsilon_0$. Choose parameters $p = p(A) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$ and $\delta_0 = \delta_0(A) > 0$ such that $\varepsilon_{n_0} \le \varepsilon_0$ and Lemma 3.1 and the fiber-LDT hold for all cocycles $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) \le \delta_0$ and for all $n \ge n_0$. Let $C_0 = C_0(A) > 0$ be such that for all such cocycles $B$ we have $\|B\|_{L^\infty} \le e^{C_0}$ and for all $n \ge 1$,
\[ \big| L_1^{(n)}(B) \big| \le \Big\| \frac{1}{n} \log \|B^{(n)}(x)\| \Big\|_{L^2} \le C_0. \]
Pick $C_1 > 2 C_0 + \varepsilon_0$ and $n_{01} \ge n_0$ such that $e^{-C_1 n_{01}} < \delta_0$. Let $n \ge n_{01}$ and $B_i \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) \le \delta_0$ $(i = 1, 2)$ be arbitrary but fixed. Assume that $\mathrm{dist}(B_1, B_2) < e^{-C_1 n}$. Apply the fiber-LDT to each $B_i$ and conclude that for all $x$ outside a set $\mathcal{B}_n^i$, with $\mu(\mathcal{B}_n^i) < \iota_n$, we have:
\[ \frac{1}{n} \log \|B_i^{(n)}(x)\| > L_1^{(n)}(B_i) - \varepsilon_n \ge -C_0 - \varepsilon_0. \tag{3.35} \]

96

3 Abstract Continuity of Lyapunov Exponents

Let Bn = B1n ∪ B2n , so μ(Bn ) < 2 ιn and if x ∈ Bn then we have: B1(n) (x), B2(n) (x) > e−(C0 +ε0 ) n .

(3.36)

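Moreover, for $\mu$-a.e. $x \notin \mathcal{B}_n$ and all $0 \le j \le n-1$ we also have:

\[ \|B_1(T^j x) - B_2(T^j x)\| \le \mathrm{dist}(B_1, B_2) < e^{-C_1 n}. \]

Therefore, for $\mu$-a.e. $x \notin \mathcal{B}_n$ we get:

\begin{align*}
\Bigl| \frac{1}{n} \log \|B_1^{(n)}(x)\| - \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr|
&= \frac{1}{n} \Bigl| \log \frac{\|B_1^{(n)}(x)\|}{\|B_2^{(n)}(x)\|} \Bigr| \\
&\le \frac{1}{n} \, \frac{\|B_1^{(n)}(x) - B_2^{(n)}(x)\|}{\min\{\|B_1^{(n)}(x)\|, \|B_2^{(n)}(x)\|\}} \\
&\le \frac{1}{n} \, e^{(C_0+\varepsilon_0)\, n} \sum_{j=0}^{n-1} \|B_2^{(n-j-1)}(T^{j+1} x)\| \, \|B_1(T^j x) - B_2(T^j x)\| \, \|B_1^{(j)}(x)\| \\
&\le \frac{1}{n} \, e^{(C_0+\varepsilon_0)\, n} \sum_{j=0}^{n-1} e^{C_0 (n-j-1)} \, e^{-C_1 n} \, e^{C_0 j} \le e^{-n\,(C_1 - 2C_0 - \varepsilon_0)}.
\end{align*}

Integrating in $x$ we conclude:

\[ \int_{\mathcal{B}_n^{\complement}} \Bigl| \frac{1}{n} \log \|B_1^{(n)}(x)\| - \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr| \, \mu(dx) < e^{-n\,(C_1 - 2C_0 - \varepsilon_0)}. \tag{3.37} \]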

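By Cauchy-Schwarz we have

\begin{align*}
\int_{\mathcal{B}_n} \Bigl| \frac{1}{n} \log \|B_1^{(n)}(x)\| - \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr| \, \mu(dx)
\le{}& \Bigl\| \frac{1}{n} \log \|B_1^{(n)}(x)\| \Bigr\|_{L^2} \cdot \mu(\mathcal{B}_n)^{1/2} \\
&+ \Bigl\| \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr\|_{L^2} \cdot \mu(\mathcal{B}_n)^{1/2},
\end{align*}

hence

\[ \int_{\mathcal{B}_n} \Bigl| \frac{1}{n} \log \|B_1^{(n)}(x)\| - \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr| \, \mu(dx) \lesssim C_0 \, \iota_n^{1/2}. \tag{3.38} \]

Since $\iota \in \mathcal{I}$ decays at most exponentially, we may assume, by choosing $C_1$ large, that $e^{-n\,(C_1 - 2C_0 - \varepsilon_0)} < \iota_n^{1/2}$, so (3.37) and (3.38) imply

\[ \bigl| L_1^{(n)}(B_1) - L_1^{(n)}(B_2) \bigr| \le \int_X \Bigl| \frac{1}{n} \log \|B_1^{(n)}(x)\| - \frac{1}{n} \log \|B_2^{(n)}(x)\| \Bigr| \, \mu(dx) < \iota_n^{1/2}, \]

which proves (3.34). □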

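The key algebraic step in the proof of Proposition 3.2 is the telescoping expansion of the difference of two matrix products. The following sketch checks that identity exactly on integer matrices. The lists `b1` and `b2` are hypothetical data: their j-th entries stand in for $B_1(T^j x)$ and $B_2(T^j x)$ along one orbit, and all helper names are illustrative.

```python
# Exact check of the telescoping identity
#   B1^(n)(x) - B2^(n)(x)
#     = sum_{j=0}^{n-1} B2^(n-j-1)(T^{j+1} x) [B1(T^j x) - B2(T^j x)] B1^(j)(x).
# Assumption: 2x2 integer matrices, so that equality can be tested exactly.

IDENTITY = [[1, 0], [0, 1]]


def matmul(a, b):
    """Product a * b of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matadd(a, b, sign=1):
    """Entrywise a + sign * b."""
    return [[a[i][j] + sign * b[i][j] for j in range(2)] for i in range(2)]


def iterate(factors):
    """Return factors[-1] * ... * factors[0]; the empty product is the identity."""
    result = IDENTITY
    for f in factors:
        result = matmul(f, result)
    return result


def telescoping_sum(b1, b2):
    """Right hand side of the identity, summed over j = 0, ..., n - 1."""
    total = [[0, 0], [0, 0]]
    for j in range(len(b1)):
        head = iterate(b2[j + 1:])            # B2^(n-j-1)(T^{j+1} x)
        tail = iterate(b1[:j])                # B1^(j)(x)
        diff = matadd(b1[j], b2[j], sign=-1)  # B1(T^j x) - B2(T^j x)
        total = matadd(total, matmul(matmul(head, diff), tail))
    return total


# Hypothetical orbit data.
b1 = [[[2, 1], [1, 1]], [[1, 3], [0, 1]], [[1, 0], [2, 1]], [[3, 1], [2, 1]]]
b2 = [[[2, 1], [1, 2]], [[1, 3], [1, 1]], [[1, 1], [2, 1]], [[3, 1], [2, 2]]]

lhs = matadd(iterate(b1), iterate(b2), sign=-1)
rhs = telescoping_sum(b1, b2)
print(lhs == rhs)
```

Bounding each summand by a product of norms is what produces the factor $e^{C_0(n-j-1)} e^{-C_1 n} e^{C_0 j}$ in the proof.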

3.4 The Inductive Step Procedure

In this section we derive the main technical result used to prove our continuity theorem, an inductive tool based on the avalanche principle (Proposition 2.42), the uniform fiber-LDT in Definition 3.7 and the nearly uniform upper semicontinuity Proposition 3.1. All estimates involving two consecutive scales $n_0$, $n_1$ of the inductive procedure will carry errors of order at most $\frac{n_0}{n_1}$. We begin with a simple lemma which shows that we may always assume that $n_1$ is a multiple of $n_0$, otherwise an extra error term of the same order is accrued.

Lemma 3.2 Let $A \in \mathcal{C}$ be an $L^1$-bounded cocycle, and let $C$ be its $L^1$-bound. If $n_0, n_1, n, r \in \mathbb{N}$ are such that $n_1 = n \cdot n_0 + r$ and $0 \le r \le n_0$, then

\[ -2C \, \frac{n_0}{n_1} + L_1^{((n+1)\, n_0)}(A) \le L_1^{(n_1)}(A) \le L_1^{(n\, n_0)}(A) + 2C \, \frac{n_0}{n_1}. \tag{3.39} \]

Proof From the $L^1$-boundedness assumption on $A$, for all $m \ge 1$, $\bigl| L_1^{(m)}(A) \bigr| \le C$. Since $n_1 = n \cdot n_0 + r$ and $r \ge 0$, we have $A^{(n_1)}(x) = A^{(r)}(T^{n\, n_0} x) \cdot A^{(n\, n_0)}(x)$, hence $\|A^{(n_1)}(x)\| \le \|A^{(r)}(T^{n\, n_0} x)\| \, \|A^{(n\, n_0)}(x)\|$. Taking logarithms, dividing by $n_1$ then integrating in $x$ we get:

\[ L_1^{(n_1)}(A) \le \frac{n\, n_0}{n_1} \, L_1^{(n\, n_0)}(A) + \frac{r}{n_1} \, L_1^{(r)}(A). \]

This implies

\[ L_1^{(n_1)}(A) - L_1^{(n\, n_0)}(A) \le \frac{r}{n_1} \, \bigl[ L_1^{(r)}(A) - L_1^{(n\, n_0)}(A) \bigr] \le 2C \, \frac{r}{n_1} \le 2C \, \frac{n_0}{n_1}, \]

which proves the right hand side of (3.39).

Now write $(n+1)\, n_0 = n_1 + q$, where $q = n_0 - r$, so $0 \le q \le n_0$. Then $A^{((n+1)\, n_0)}(x) = A^{(q)}(T^{n_1} x) \cdot A^{(n_1)}(x)$, so $\|A^{((n+1)\, n_0)}(x)\| \le \|A^{(q)}(T^{n_1} x)\| \, \|A^{(n_1)}(x)\|$. Taking logarithms, dividing by $(n+1)\, n_0$ then integrating in $x$ we get:

\[ L_1^{((n+1)\, n_0)}(A) \le \frac{n_1}{(n+1)\, n_0} \, L_1^{(n_1)}(A) + \frac{q}{(n+1)\, n_0} \, L_1^{(q)}(A). \]

This implies

\[ L_1^{((n+1)\, n_0)}(A) - L_1^{(n_1)}(A) \le \frac{q}{(n+1)\, n_0} \, \bigl[ L_1^{(q)}(A) - L_1^{(n_1)}(A) \bigr] \le 2C \, \frac{1}{n+1} \le 2C \, \frac{n_0}{n_1}, \]

where the last inequality uses $n_1 \le (n+1)\, n_0$. This proves the left hand side of (3.39). □
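As a sanity check on (3.39), the sketch below evaluates the finite scale exponents exactly (up to floating point) in a toy model. Everything about the model is an assumption made for illustration: the base is the cyclic rotation $T(x) = x + 1$ on $\mathbb{Z}/3$ with uniform measure, the cocycle takes three fixed invertible 2x2 values, and the constant `C_BOUND` is a pointwise bound that in particular dominates $|L_1^{(m)}(A)|$ for every $m$.

```python
import math

# Hypothetical toy cocycle over X = Z/3, T(x) = x + 1, uniform measure.
COCYCLE = [
    [[2.0, 1.0], [1.0, 1.0]],
    [[1.0, 0.5], [0.0, 1.0]],
    [[0.5, 0.0], [1.0, 2.0]],
]
P = len(COCYCLE)


def matmul(a, b):
    """Product a * b of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def op_norm(m):
    """Largest singular value of a real 2x2 matrix."""
    t = sum(v * v for row in m for v in row)    # s1^2 + s2^2
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]   # +- s1 * s2
    disc = math.sqrt(max(t * t - 4.0 * d * d, 0.0))
    return math.sqrt((t + disc) / 2.0)


def inverse_norm(m):
    """Norm of the inverse, 1 / s2 = s1 / |det|."""
    det = abs(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    return op_norm(m) / det


def finite_scale_exponent(n):
    """L_1^(n)(A): average over x of (1/n) log ||A^(n)(x)||."""
    total = 0.0
    for x in range(P):
        prod = [[1.0, 0.0], [0.0, 1.0]]
        for j in range(n):
            prod = matmul(COCYCLE[(x + j) % P], prod)
        total += math.log(op_norm(prod))
    return total / (P * n)


# Pointwise bound on |log ||A|| | and |log ||A^{-1}|| |, hence on |L_1^(m)(A)|.
C_BOUND = max(max(abs(math.log(op_norm(m))), abs(math.log(inverse_norm(m))))
              for m in COCYCLE)


def lemma_3_2_holds(n0, n1):
    """Check both inequalities in (3.39) for n1 = n * n0 + r, n1 >= n0."""
    n, _r = divmod(n1, n0)
    err = 2.0 * C_BOUND * n0 / n1
    middle = finite_scale_exponent(n1)
    lower = finite_scale_exponent((n + 1) * n0) - err
    upper = finite_scale_exponent(n * n0) + err
    return lower <= middle <= upper


checks = [lemma_3_2_holds(n0, n1) for n0 in (2, 3, 5) for n1 in range(n0, 40)]
print(all(checks))
```

The error term $2C\, n_0 / n_1$ is generous in this example; the point is only that replacing $n_1$ by a nearby multiple of $n_0$ costs an error of that order.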




Lemma 3.3 Let $B \in \mathcal{C}$ be a cocycle satisfying a fiber-LDT with parameter $p = (n_0, \varepsilon, \iota) \in \mathcal{P}$. Let $m_1, m_2, n \in \mathbb{N}$ and $\eta > 0$ be such that $m_i \ge n \ge n_0$ for $i = 1, 2$ and

\[ \bigl| L_1^{(m_2+m_1)}(B) - L_1^{(m_i)}(B) \bigr| < \eta. \]

Then

\[ \frac{\|B^{(m_2+m_1)}(x)\|}{\|B^{(m_2)}(T^{m_1} x)\| \, \|B^{(m_1)}(x)\|} > e^{-(m_1+m_2)(\eta + 2\varepsilon_n)} \tag{3.40} \]

for all $x$ outside a set of measure $< 3\iota_n$.

Proof Applying (one side inequality in) the fiber-LDT to the cocycle $B$ at scale $m_2 + m_1$, for all $x$ outside a set of measure $< \iota_{m_2+m_1} < \iota_n$, we have:

\[ \frac{1}{m_2+m_1} \log \|B^{(m_2+m_1)}(x)\| > L_1^{(m_2+m_1)}(B) - \varepsilon_{m_2+m_1} \ge L_1^{(m_2+m_1)}(B) - \varepsilon_n, \]

hence

\[ \|B^{(m_2+m_1)}(x)\| > e^{(m_1+m_2)\, L_1^{(m_1+m_2)}(B) - (m_2+m_1)\, \varepsilon_n}. \tag{3.41} \]

Applying (the other side inequality in) the fiber-LDT to the cocycle $B$ at scales $m_2$, $m_1$, for all $x$ outside a set of measure $< \iota_{m_2} + \iota_{m_1} < 2\iota_n$, we have:

\begin{align*}
\frac{1}{m_1} \log \|B^{(m_1)}(x)\| &< L_1^{(m_1)}(B) + \varepsilon_{m_1} < L_1^{(m_1)}(B) + \varepsilon_n, \\
\frac{1}{m_2} \log \|B^{(m_2)}(T^{m_1} x)\| &< L_1^{(m_2)}(B) + \varepsilon_{m_2} < L_1^{(m_2)}(B) + \varepsilon_n.
\end{align*}

Thus

\[ \|B^{(m_1)}(x)\| < e^{m_1 L_1^{(m_1)}(B) + m_1 \varepsilon_n}, \tag{3.42} \]

\[ \|B^{(m_2)}(T^{m_1} x)\| < e^{m_2 L_1^{(m_2)}(B) + m_2 \varepsilon_n}. \tag{3.43} \]

Combining (3.41)–(3.43), for $x$ outside a set of measure $< 3\iota_n$ we get:

\begin{align*}
\frac{\|B^{(m_2+m_1)}(x)\|}{\|B^{(m_2)}(T^{m_1} x)\| \, \|B^{(m_1)}(x)\|}
&> e^{m_1 (L_1^{(m_2+m_1)}(B) - L_1^{(m_1)}(B)) + m_2 (L_1^{(m_2+m_1)}(B) - L_1^{(m_2)}(B)) - 2(m_2+m_1)\, \varepsilon_n} \\
&> e^{-(m_1+m_2)(\eta + 2\varepsilon_n)},
\end{align*}

which proves the lemma. □




Remark 3.3 We note that all that is needed in the proof of (3.40) is the availability of the fiber-LDT estimate precisely at scales $m_1$, $m_2$ and $m_1 + m_2$, and not at all scales $n \ge n_0$. This is of course irrelevant here, but it may be helpful in other contexts, when the (full) fiber-LDT estimate is not available a priori.

Proposition 3.3 (inductive step procedure) Let $A \in \mathcal{C}_m$ be a measurable cocycle such that $L_1(A) > L_2(A)$. Fix $0 < \varepsilon < \kappa(A)/20$. There are $C = C(A) > 0$, $\delta = \delta(A, \varepsilon) > 0$, $n_{00} = n_{00}(A, \varepsilon) \in \mathbb{N}$, $\iota = \iota(A, \varepsilon) \in \mathcal{I}$ such that for any $n_0 \ge n_{00}$, if the inequalities

\[ \text{(a)} \quad L_1^{(n_0)}(B) - L_1^{(2n_0)}(B) < \eta_0 \tag{3.44} \]

\[ \text{(b)} \quad \bigl| L_1^{(n_0)}(B) - L_1^{(n_0)}(A) \bigr| < \theta_0 \tag{3.45} \]

hold for a cocycle $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, and if the positive numbers $\eta_0$, $\theta_0$ satisfy

\[ 2\theta_0 + 4\eta_0 < \kappa(A) - 12\varepsilon, \tag{3.46} \]

then for any integer $n_1$ such that

\[ n_0^{1+} \le n_1 \le n_0 \cdot \iota_{n_0}^{-1/3}, \tag{3.47} \]

we have:

\[ \bigl| L_1^{(n_1)}(B) + L_1^{(n_0)}(B) - 2 L_1^{(2n_0)}(B) \bigr| < C \, \frac{n_0}{n_1}. \tag{3.48} \]

Furthermore,

\[ \text{(a++)} \quad L_1^{(n_1)}(B) - L_1^{(2n_1)}(B) < \eta_1 \tag{3.49} \]

\[ \text{(b++)} \quad \bigl| L_1^{(n_1)}(B) - L_1^{(n_1)}(A) \bigr| < \theta_1, \tag{3.50} \]

where

\[ \theta_1 = \theta_0 + 4\eta_0 + C \, \frac{n_0}{n_1}, \tag{3.51} \]

\[ \eta_1 = C \, \frac{n_0}{n_1}. \tag{3.52} \]

Proof Since $L_1(A) > L_2(A)$, the cocycle $A$ satisfies a uniform fiber-LDT. Moreover, Lemma 3.1 also applies. Pick $\delta = \delta(A, \varepsilon) > 0$, $n_{00} = n_{00}(A, \varepsilon) \in \mathbb{N}$, $\iota = \iota(A, \varepsilon) \in \mathcal{I}$ such that both the uniform fiber-LDT and Lemma 3.1 apply for all cocycles $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, for all $n \ge n_{00}$ and for all $x$ outside a set of measure $< \iota_n$. Let $n_0 \ge n_{00}$ and assume (3.44) and (3.45) hold for a cocycle $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$. Assume moreover that the uniform $L^2$-bound in Definition 3.3 (with $p = 2$) applies to all such cocycles.


From (3.44), applying Lemma 3.3 (with $m_1 = m_2 = n = n_0$), we have:

\[ \frac{\|B^{(2n_0)}(x)\|}{\|B^{(n_0)}(T^{n_0} x)\| \, \|B^{(n_0)}(x)\|} > e^{-n_0 (2\eta_0 + 4\varepsilon_{n_0})} \ge e^{-n_0 (2\eta_0 + 4\varepsilon)} =: \varepsilon_{ap} \tag{3.53} \]

for all $x$ outside a set of measure $< 3\iota_{n_0}$. This estimate will ensure that the angles condition in the avalanche principle (Proposition 2.42) holds.

Moreover, due to the assumption (3.45), applying Lemma 3.1, for $x$ outside a set of measure $< \iota_{n_0}$, we have:

\[ \mathrm{gr}(B^{(n_0)}(x)) > e^{n_0 (\kappa(A) - 2\theta_0 - 3\varepsilon)} =: \frac{1}{\kappa_{ap}}, \tag{3.54} \]

which will ensure that the gaps condition in the avalanche principle also holds.

Let $\mathcal{B}_{n_0}$ be the union of the exceptional sets in (3.53) and (3.54). To simplify notations, replace the deviation set measure function $\iota$ by $4\iota$, so we may assume $\mu(\mathcal{B}_{n_0}) < \iota_{n_0}$ (we will tacitly do this throughout the paper).

Let $n_1$ be an integer such that $n_0^{1+} \le n_1 \le n_0 \cdot \iota_{n_0}^{-1/3}$. Since $\iota(t)$ decreases at least like $t^{-c}$ (as $t \to \infty$) for some $c > 0$, and since $\iota$ depends on $\varepsilon$ and $A$, $n_{00}$ might need to be chosen larger, depending on $\varepsilon$ and $A$, so that if $n_0 \ge n_{00}$ then the integer interval $[n_0^{1+}, \, n_0 \cdot \iota_{n_0}^{-1/3}]$ is large enough.

Moreover, due to Lemma 3.2 we may assume that $n_1 = n \cdot n_0$ for some $n \in \mathbb{N}$. To see this, note that once (3.48) is proven for scales that are multiples of $n_0$, in particular for the scales $n_1 = n\, n_0$ and $n_1 = (n+1)\, n_0$, then using (3.39) we derive (3.48) for any scale $n_1$ such that $n\, n_0 \le n_1 \le (n+1)\, n_0$. Furthermore, (3.49) and (3.50) will be derived directly from (3.48).

For every $0 \le i \le n-1$ define $g_i = g_i(x) := B^{(n_0)}(T^{i\, n_0} x)$. Then clearly $g^{(n)} = B^{(n_1)}(x)$ and $g_i \, g_{i-1} = B^{(2n_0)}(T^{(i-1) n_0} x)$ for all $1 \le i \le n-1$.

Let $\bar{\mathcal{B}}_{n_0} := \bigcup_{i=0}^{n-1} T^{-i\, n_0} \mathcal{B}_{n_0}$, so $\mu(\bar{\mathcal{B}}_{n_0}) < n \, \iota_{n_0}$ and if $x \notin \bar{\mathcal{B}}_{n_0}$ then

\[ \mathrm{gr}(g_i) > \frac{1}{\kappa_{ap}} \quad \text{for all } 0 \le i \le n-1, \]

\[ \frac{\|g_i \, g_{i-1}\|}{\|g_i\| \, \|g_{i-1}\|} > \varepsilon_{ap} \quad \text{for all } 1 \le i \le n-1. \]

Note also that condition (3.46) implies $\kappa_{ap} \ll \varepsilon_{ap}^2$. Therefore, we can apply the avalanche principle (Proposition 2.42) and obtain:

\[ \Bigl| \log \|g^{(n)}\| + \sum_{i=1}^{n-2} \log \|g_i\| - \sum_{i=1}^{n-1} \log \|g_i \, g_{i-1}\| \Bigr| \lesssim n \cdot \frac{\kappa_{ap}}{\varepsilon_{ap}^2}. \]


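Note that

\[ \frac{\kappa_{ap}}{\varepsilon_{ap}^2} = e^{-n_0 (\kappa(A) - 4\eta_0 - 2\theta_0 - 11\varepsilon)} < e^{-\varepsilon\, n_0}. \]

Since the deviation set measure functions $\iota \in \mathcal{I}$ decay at most exponentially fast, we may assume that $e^{-\varepsilon t} \le \iota(t)$ for $t \ge n_{00}$. Hence we have $\frac{\kappa_{ap}}{\varepsilon_{ap}^2} < \iota_{n_0}$.

The AP applied to our data implies that for all $x \notin \bar{\mathcal{B}}_{n_0}$, where $\mu(\bar{\mathcal{B}}_{n_0}) < n \, \iota_{n_0}$,

\[ \Bigl| \log \|B^{(n_1)}(x)\| + \sum_{i=1}^{n-2} \log \|B^{(n_0)}(T^{i n_0} x)\| - \sum_{i=1}^{n-1} \log \|B^{(2n_0)}(T^{(i-1) n_0} x)\| \Bigr| \lesssim n \, \iota_{n_0}. \tag{3.55} \]

Divide both sides of (3.55) by $n_1 = n \cdot n_0$ to get that for all $x \notin \bar{\mathcal{B}}_{n_0}$ we have

\[ \Bigl| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| + \frac{1}{n} \sum_{i=1}^{n-2} \frac{1}{n_0} \log \|B^{(n_0)}(T^{i n_0} x)\| - \frac{2}{n} \sum_{i=1}^{n-1} \frac{1}{2n_0} \log \|B^{(2n_0)}(T^{(i-1) n_0} x)\| \Bigr| \lesssim \iota_{n_0}. \]

Denote by $f(x)$ the function inside the absolute value on the left hand side of the estimate above, so $|f(x)| \lesssim \iota_{n_0}$ for all $x \notin \bar{\mathcal{B}}_{n_0}$. Clearly

\[ \int_X f(x) \, \mu(dx) = L_1^{(n_1)}(B) + \frac{n-2}{n} \, L_1^{(n_0)}(B) - \frac{2(n-1)}{n} \, L_1^{(2n_0)}(B). \]

Using Cauchy-Schwarz and the $L^2$-boundedness assumption we have

\begin{align*}
\int_X |f(x)| \, \mu(dx) &= \int_{\bar{\mathcal{B}}_{n_0}^{\complement}} |f(x)| \, \mu(dx) + \int_{\bar{\mathcal{B}}_{n_0}} |f(x)| \, \mu(dx) \\
&\lesssim \iota_{n_0} + \|f\|_{L^2} \, \|\mathbb{1}_{\bar{\mathcal{B}}_{n_0}}\|_{L^2} \le \iota_{n_0} + C(A) \, (\mu(\bar{\mathcal{B}}_{n_0}))^{1/2} \\
&\le \iota_{n_0} + C(A) \, (n \, \iota_{n_0})^{1/2} \le C \, \frac{n_0}{n_1},
\end{align*}

where the last inequality follows from (3.47). Therefore,

\[ \Bigl| L_1^{(n_1)}(B) + \frac{n-2}{n} \, L_1^{(n_0)}(B) - \frac{2(n-1)}{n} \, L_1^{(2n_0)}(B) \Bigr| \le \int_X |f(x)| \, \mu(dx) < C \, \frac{n_0}{n_1}. \]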


The term on the left hand side of the above inequality can be written in the form

\[ \Bigl| L_1^{(n_1)}(B) + L_1^{(n_0)}(B) - 2 L_1^{(2n_0)}(B) - \frac{2}{n} \, \bigl[ L_1^{(n_0)}(B) - L_1^{(2n_0)}(B) \bigr] \Bigr|, \]

hence we conclude:

\[ \bigl| L_1^{(n_1)}(B) + L_1^{(n_0)}(B) - 2 L_1^{(2n_0)}(B) \bigr| < C \, \frac{n_0}{n_1} + 2 \, \frac{n_0}{n_1} \, \bigl[ L_1^{(n_0)}(B) - L_1^{(2n_0)}(B) \bigr] \]

[…]

3.5 General Continuity Theorem

[…] $L_1(A) > L_2(A)$. Then the map $\mathcal{C}_m \ni B \mapsto L_1(B)$ is continuous at $A$ and the map $\mathcal{C}_m \ni B \mapsto L_1(B) - L_2(B)$ is lower semicontinuous at $A$.

Proof Let $0 < \varepsilon < \kappa(A)/100$ be arbitrary but fixed. Since $L_1^{(n)}(A) \to L_1(A)$ as $n \to \infty$, there is $n_{02} = n_{02}(A, \varepsilon) \in \mathbb{N}$ such that for all $n \ge n_{02}$ we have

\[ L_1^{(n)}(A) - L_1^{(2n)}(A) < \varepsilon. \tag{3.58} \]

We will apply the inductive step Proposition 3.3 repeatedly. We first choose the relevant parameters (which will depend on $A$ and $\varepsilon$) so that both the inductive step Proposition 3.3 and the finite scale continuity Proposition 3.2 apply. The latter will ensure that the assumptions (3.44)–(3.46) of the inductive step Proposition 3.3 are satisfied for a large enough scale $n_0 = n_0(A, \varepsilon)$, so we can start running the inductive argument with that scale.

Let $\iota \in \mathcal{I}$ be the sum of the corresponding deviation measure functions in the inductive step Proposition 3.3 and the finite scale continuity Proposition 3.2. Let $\delta_0$ be less than the size of the neighborhood of $A \in \mathcal{C}_m$ from the inductive step Proposition 3.3 and from the finite scale continuity Proposition 3.2 respectively. Let $C_1$ be the constant in the finite scale continuity Proposition 3.2 and let $C$ be the constant in the inductive step Proposition 3.3. Finally, let the scale $n_0$ be greater than the thresholds $n_{00}$ from the inductive step Proposition 3.3, $n_{01}$ from the finite scale continuity Proposition 3.2 and $n_{02}$ from (3.58) above. Moreover, assume $n_0$ to be large enough so that $e^{-C_1 2 n_0} < \delta_0$, $\iota_{n_0}^{1/2} < \varepsilon$, $n^{0+} \le \iota_n^{-1/3}$ for $n \ge n_0$, and $C \, n_0^{-0-} \lesssim \varepsilon$.

Let $\delta := e^{-C_1 2 n_0}$ and let $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$. Since $\delta = e^{-C_1 2 n_0} < e^{-C_1 n_0}$, we can apply the finite scale continuity Proposition 3.2 (with $B_2 = B$ and $B_1 = A$) at scales $2n_0$ and $n_0$ and get:

\[ \bigl| L_1^{(n_0)}(B) - L_1^{(n_0)}(A) \bigr| < \iota_{n_0}^{1/2} =: \theta_0 < \varepsilon, \tag{3.59} \]

\[ \bigl| L_1^{(2n_0)}(B) - L_1^{(2n_0)}(A) \bigr| < \iota_{2n_0}^{1/2} < \iota_{n_0}^{1/2} = \theta_0. \tag{3.60} \]

Then (3.58)–(3.60) imply

\[ L_1^{(n_0)}(B) - L_1^{(2n_0)}(B) < 2\, \iota_{n_0}^{1/2} + \varepsilon =: \eta_0 < 3\varepsilon. \tag{3.61} \]

The inequality (3.59) also implies the assumption (3.26) in Lemma 3.1 for $n = n_0$, hence by (3.28) we have

\[ L_1^{(n_0)}(B) - L_2^{(n_0)}(B) > (\kappa(A) - 2\theta_0 - 3\varepsilon) \cdot (1 - \iota_{n_0}). \tag{3.62} \]

The inequalities (3.61) and (3.59) imply, respectively, (3.44) and (3.45), that is, the assumptions (a) and (b) in the inductive step Proposition 3.3. Moreover, $2\theta_0 + 4\eta_0 < 2\varepsilon + 12\varepsilon = 14\varepsilon < \kappa(A) - 12\varepsilon$, so the condition (3.46) between parameters is also satisfied. We can apply the inductive step Proposition 3.3 and conclude that for $n_1 \asymp n_0^{1+}$ we have:

\[ L_1^{(n_1)}(B) - L_1^{(2n_1)}(B) < \eta_1, \tag{3.63} \]

\[ \bigl| L_1^{(n_1)}(B) - L_1^{(n_1)}(A) \bigr| < \theta_1, \tag{3.64} \]


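where

\[ \theta_1 = \theta_0 + 4\eta_0 + C \, \frac{n_0}{n_1}, \tag{3.65} \]

\[ \eta_1 = C \, \frac{n_0}{n_1}. \tag{3.66} \]

Again, the inequality (3.64) implies the assumption (3.26) in Lemma 3.1 for $n = n_1$, hence by (3.28) we have

\[ L_1^{(n_1)}(B) - L_2^{(n_1)}(B) > (\kappa(A) - 2\theta_1 - 3\varepsilon) \cdot (1 - \iota_{n_1}). \tag{3.67} \]

Furthermore, (3.63) and (3.64) ensure that the assumptions (a) and (b) of the inductive step Proposition 3.3 hold at scale $n_1$. Moreover,

\[ 2\theta_1 + 4\eta_1 = (2\theta_0 + 8\eta_0) + 6C \, \frac{n_0}{n_1} < (2\varepsilon + 24\varepsilon) + \varepsilon < \kappa(A) - 12\varepsilon, \]

hence the inductive step Proposition 3.3 applies again, and for $n_2 \asymp n_1^{1+}$ we have:

\[ L_1^{(n_2)}(B) - L_1^{(2n_2)}(B) < \eta_2, \tag{3.68} \]

\[ \bigl| L_1^{(n_2)}(B) - L_1^{(n_2)}(A) \bigr| < \theta_2, \tag{3.69} \]

where

\[ \theta_2 = \theta_1 + 4\eta_1 + C \, \frac{n_1}{n_2}, \tag{3.70} \]

\[ \eta_2 = C \, \frac{n_1}{n_2}. \tag{3.71} \]

Note that

\[ 2\theta_2 + 4\eta_2 = (2\theta_0 + 8\eta_0) + \Bigl[ 10C \, \frac{n_0}{n_1} + 6C \, \frac{n_1}{n_2} \Bigr] < (2\varepsilon + 24\varepsilon) + \varepsilon < \kappa(A) - 12\varepsilon. \]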

It is now clear how we continue this procedure. Going from step $k$ to step $k+1$, we choose a scale $n_{k+1} \asymp n_k^{1+}$ and from Lemma 3.1 we get

\[ L_1^{(n_k)}(B) - L_2^{(n_k)}(B) > (\kappa(A) - 2\theta_k - 3\varepsilon) \cdot (1 - \iota_{n_k}), \tag{3.72} \]

while from Proposition 3.3 we get

\[ L_1^{(n_{k+1})}(B) - L_1^{(2n_{k+1})}(B) < \eta_{k+1}, \tag{3.73} \]

\[ \bigl| L_1^{(n_{k+1})}(B) - L_1^{(n_{k+1})}(A) \bigr| < \theta_{k+1}, \tag{3.74} \]

where

\[ \eta_{k+1} = C \, \frac{n_k}{n_{k+1}} \tag{3.75} \]

and

\begin{align*}
\theta_{k+1} &= \theta_k + 4\eta_k + C \, \frac{n_k}{n_{k+1}} \\
&= (\theta_0 + 4\eta_0) + 5C \sum_{i=0}^{k-1} \frac{n_i}{n_{i+1}} + C \, \frac{n_k}{n_{k+1}} \\
&< (\theta_0 + 4\eta_0) + 5C \sum_{i=0}^{\infty} \frac{n_i}{n_{i+1}} \\
&< (\theta_0 + 4\eta_0) + 10C \, n_0^{-0-} < (\varepsilon + 12\varepsilon) + 10\varepsilon = 23\varepsilon.
\end{align*}

Hence

\[ \theta_{k+1} < 23\varepsilon. \tag{3.76} \]

Moreover

\begin{align*}
2\theta_{k+1} + 4\eta_{k+1} &= (2\theta_0 + 8\eta_0) + 10C \sum_{i=0}^{k-1} \frac{n_i}{n_{i+1}} + 6C \, \frac{n_k}{n_{k+1}} \\
&< (2\theta_0 + 8\eta_0) + 10C \sum_{i=0}^{\infty} \frac{n_i}{n_{i+1}} \\
&< (2\theta_0 + 8\eta_0) + 20C \, n_0^{-0-} < (2\varepsilon + 24\varepsilon) + 20\varepsilon = 46\varepsilon,
\end{align*}

so $2\theta_{k+1} + 4\eta_{k+1} < \kappa(A) - 12\varepsilon$, ensuring that the inductive process runs indefinitely.

Now take the limit as $k \to \infty$ in (3.74), and using (3.76) we have

\[ \bigl| L_1(B) - L_1(A) \bigr| \le 23\varepsilon, \]

which proves the continuity at $A$ of the top Lyapunov exponent $L_1$.

Moreover, taking the limit as $k \to \infty$ in (3.72), and using again (3.76) we have

\[ L_1(B) - L_2(B) \ge \kappa(A) - 46\varepsilon - 3\varepsilon > L_1(A) - L_2(A) - 50\varepsilon, \]

which proves the lower semicontinuity at $A$ of the gap between the first two LE. □

Note that estimate (3.74) in the proof of Theorem 3.2 says that if the cocycle $B$ is close enough to $A$, then

\[ \bigl| L_1^{(n)}(B) - L_1^{(n)}(A) \bigr| \lesssim \varepsilon \tag{3.77} \]

holds for an increasing sequence of scales $n = n_{k+1}$, $k \ge 0$. A slight modification of the argument shows that (3.77) holds in fact for all large enough scales $n$.
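The bounds on $\theta_{k+1}$ in the proof rest on the fact that the ratios $n_i / n_{i+1}$ along the scales $n_{i+1} \asymp n_i^{1+}$ decay so fast that their sum is comparable to the first term. The sketch below illustrates this with a concrete exponent; the choice $n_{i+1} = n_i^{1+a}$ with $a = 0.5$ and $n_0 = 100$ is an assumption made only for illustration (the text leaves the exponent in $n^{1+}$ unspecified). In that case $n_i / n_{i+1} = n_i^{-a}$ and consecutive terms shrink by a factor of at most $n_0^{-a^2}$, so the sum is at most $2\, n_0^{-a}$ once $n_0^{a^2} \ge 2$.

```python
def scale_ratio_sum(n0, a, steps):
    """Sum of n_i / n_{i+1} along the scales n_{i+1} = n_i ** (1 + a)."""
    total, scale = 0.0, float(n0)
    for _ in range(steps):
        next_scale = scale ** (1.0 + a)
        total += scale / next_scale   # equals scale ** (-a)
        scale = next_scale
    return total


# Hypothetical parameters: initial scale 100 and exponent 1 + a = 1.5.
N0, A_EXP = 100, 0.5
ratio_sum = scale_ratio_sum(N0, A_EXP, steps=6)
first_term = N0 ** (-A_EXP)
bound = 2.0 * first_term

print(ratio_sum, bound)
```

This is the mechanism behind the step from $5C \sum_i n_i / n_{i+1}$ to $10C\, n_0^{-0-}$ in the estimate preceding (3.76).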


Indeed, it is enough to first ensure that the base step of the inductive procedure, i.e. the estimates

\[ \bigl| L_1^{(n_0)}(B) - L_1^{(n_0)}(A) \bigr| < \iota_{n_0}^{1/2} =: \theta_0 < \varepsilon, \]

\[ \bigl| L_1^{(2n_0)}(B) - L_1^{(2n_0)}(A) \bigr| < \iota_{2n_0}^{1/2} < \iota_{n_0}^{1/2} = \theta_0, \]

holds not just for a single scale $n_0$, but for a whole (finite) interval of scales $\mathcal{N}_0 = [n_{00}, e^{n_{00}}] =: [n_0^-, n_0^+]$, where $n_{00}$ is greater than the applicability threshold of various estimates (e.g. uniform fiber-LDT, finite scale continuity etc.).

Let $\psi(t) := t^{1+}$ and define inductively the intervals of scales $\mathcal{N}_1 = [\psi(n_0^-), \psi(n_0^+)] =: [n_1^-, n_1^+]$, $\mathcal{N}_{k+1} = [\psi(n_k^-), \psi(n_k^+)] =: [n_{k+1}^-, n_{k+1}^+]$ for all $k \ge 0$.

It follows that if $n = n_1 \in \mathcal{N}_1$, then $n = n_1 \asymp \psi(n_0) = n_0^{1+}$ for some $n_0 \in \mathcal{N}_0$, and so (3.63) and (3.64) hold for all $n_1 \in \mathcal{N}_1$. Continuing inductively, for every $k \ge 1$, if $n = n_{k+1} \in \mathcal{N}_{k+1}$, then there is $n_k \in \mathcal{N}_k$ such that $n = n_{k+1} \asymp \psi(n_k) = n_k^{1+}$ and then (3.73) and (3.74) hold as well.

The intervals $\mathcal{N}_0$ and $\mathcal{N}_1$ overlap because

\[ n_1^- = \psi(n_0^-) = n_{00}^{1+} < e^{n_{00}} = n_0^+. \]

Then since $\psi$ is increasing and $\mathcal{N}_{k+1} \asymp \psi(\mathcal{N}_k)$, the intervals $\mathcal{N}_k$ and $\mathcal{N}_{k+1}$ will overlap for all $k \ge 0$. Therefore, if $n \ge n_1^-$ then $n \in \mathcal{N}_{k+1}$ for some $k \ge 0$ and so (3.77) holds.

This means, moreover, that we may apply Lemma 3.1 at all such scales and conclude that for all $x$ outside a set of measure $< \iota_n$,

\[ \frac{1}{n} \log \mathrm{gr}(B^{(n)}(x)) > \kappa(A) - 5\varepsilon. \]

We conclude that the following uniform, finite scale statement holds.

Lemma 3.4 Given a cocycle $A \in \mathcal{C}_m$ with $L_1(A) > L_2(A)$ and $0 < \varepsilon < \kappa(A)/100$, there are $\delta = \delta(A, \varepsilon) > 0$, $n_0 = n_0(A, \varepsilon) \in \mathbb{N}$ and $\iota = \iota(A, \varepsilon)$ such that for all $n \ge n_0$ and for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ we have:

\[ \bigl| L_1^{(n)}(B) - L_1^{(n)}(A) \bigr| < \varepsilon, \tag{3.78} \]

\[ \frac{1}{n} \log \mathrm{gr}(B^{(n)}(x)) > \kappa(A) - 5\varepsilon, \tag{3.79} \]

for all $x$ outside a set of measure $< \iota_n$.

Corollary 3.1 For all $m \ge 1$, and for all $1 \le k \le m$, the Lyapunov exponents $L_k : \mathcal{C}_m \to [-\infty, \infty)$ are continuous functions.

Proof Let $A \in \mathcal{C}$ be a measurable cocycle. If $L_1(A) > L_2(A)$, then we can conclude, from Theorem 3.2 above, that $L_1$ is continuous at $A$ and that $L_1 - L_2$ is lower semicontinuous at $A$.


If $A$ has a different gap pattern, by taking appropriate exterior powers, we can always reduce the problem to one where there is a gap between the first two Lyapunov exponents. For instance, if $L_1(A) = L_2(A) > L_3(A) \ge \dots \ge L_m(A)$, consider instead the cocycle $\wedge_2 A$. Clearly $L_1(\wedge_2 A) = L_1(A) + L_2(A)$ and $L_2(\wedge_2 A) = L_1(A) + L_3(A)$, hence $L_1(\wedge_2 A) - L_2(\wedge_2 A) = L_2(A) - L_3(A) > 0$. Then there is a gap between the first two Lyapunov exponents of $\wedge_2 A$. This implies, using Theorem 3.2 for $\wedge_2 A$, that the block $L_1 + L_2$ is continuous and the gap $L_2 - L_3$ is lower semicontinuous at $A$.

This argument shows that given a cocycle $A \in \mathcal{C}$ with any gap pattern, the corresponding Lyapunov blocks are all continuous at $A$, while the corresponding gaps are lower semicontinuous at $A$. Moreover, the general assumptions made on the space of cocycles ensure that the map

\[ \mathcal{C}_m \ni B \mapsto L_1(B) + \dots + L_m(B) = \int_X \log \bigl| \det [B(x)] \bigr| \, \mu(dx) \]

is continuous everywhere. It is then a simple exercise (see Lemma 6.1 and Theorem 6.2 in [2] for its solution) to see that this is all that is needed to conclude continuity of each individual Lyapunov exponent, irrespective of any gap pattern. □
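The reduction by exterior powers can be made concrete in coordinates: in the standard basis $e_i \wedge e_j$ ($i < j$), the matrix of $\wedge_2 g$ consists of the 2x2 minors of $g$. The sketch below, with hypothetical integer matrices and illustrative function names, checks two facts used implicitly above: the exterior square is multiplicative (Cauchy-Binet), so $\wedge_2 A$ is again a cocycle, and for a diagonal matrix its entries are the pairwise products of the diagonal entries.

```python
from itertools import combinations


def matmul(a, b):
    """Product a * b of two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def exterior_square(m):
    """Matrix of 2x2 minors of m, rows and columns indexed by pairs i < j."""
    pairs = list(combinations(range(len(m)), 2))
    return [[m[i][k] * m[j][l] - m[i][l] * m[j][k] for (k, l) in pairs]
            for (i, j) in pairs]


# Hypothetical 3x3 integer matrices, so that equalities are exact.
a = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]
b = [[1, 2, 0], [0, 1, 5], [3, 0, 1]]

# Cauchy-Binet: the exterior square of a product is the product of the squares.
multiplicative = exterior_square(matmul(a, b)) == matmul(exterior_square(a),
                                                         exterior_square(b))

# Constant diagonal cocycle with singular values 8 > 4 > 2.
diag = [[8, 0, 0], [0, 4, 0], [0, 0, 2]]
diag_square = exterior_square(diag)

print(multiplicative, diag_square)
```

For the constant cocycle `diag`, the exponents in base 2 are 3, 2, 1, and those of its exterior square are 5, 4, 3, matching $L_1(\wedge_2 A) = L_1(A) + L_2(A)$ and $L_2(\wedge_2 A) = L_1(A) + L_3(A)$.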

3.6 Modulus of Continuity

The following proposition, which is also interesting in itself, will be the main ingredient in obtaining the modulus of continuity of the top Lyapunov exponent. It gives the rate of convergence of the finite scale exponents $L_1^{(n)}(B)$ to the top Lyapunov exponent $L_1(B)$ and it gives an estimate on the proximity of these finite scale exponents at different scales. These estimates are uniform in a neighborhood of a cocycle $A \in \mathcal{C}_m$ for which $L_1(A) > L_2(A)$, and they depend on a deviation measure function $\iota = \iota(A) \in \mathcal{I}$, which will be fixed in the beginning.

Define the map $\psi(t) = \psi_\iota(t) := t \cdot [\iota(t)]^{-1/3}$, and let $\phi = \phi_\iota$ be its inverse. Moreover, for every integer $n \in \mathbb{N}$, denote $n{+}{+} := \lfloor \psi(n) \rfloor = \lfloor n \, \iota_n^{-1/3} \rfloor$ and $n{-}{-} := \lfloor \phi(n) \rfloor$, so $(n{+}{+}){-}{-} \asymp n$.

These estimates will be obtained by applying repeatedly the inductive step Proposition 3.3. In order to obtain the sharpest possible estimate, when going from one scale to the next, we will make the greatest possible jump, which is why we have defined the "next scale" $n{+}{+}$ above as $\lfloor n \, \iota_n^{-1/3} \rfloor$.

Proposition 3.4 (uniform speed of convergence) Let $A \in \mathcal{C}_m$ be a measurable cocycle for which $L_1(A) > L_2(A)$. There are $\delta = \delta(A) > 0$, $C = C(A)$, $n_{00} = n_{00}(A) \in \mathbb{N}$, $\iota = \iota(A) \in \mathcal{I}$ such that, with the above notations, for all $n \ge n_{00}$ and for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ we have:

\[ L_1^{(n)}(B) - L_1(B) < C \, \frac{\phi(n)}{n} \le \iota_{n--}^{1/3}, \tag{3.80} \]

\[ \bigl| L_1^{(n++)}(B) + L_1^{(n)}(B) - 2 L_1^{(2n)}(B) \bigr| < C \, \frac{n}{n{+}{+}} \le \iota_n^{1/3}. \tag{3.81} \]
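To get a feel for the maps $\psi$, $\phi$ and the next scale $n{+}{+}$, the sketch below evaluates them for one concrete deviation measure function. The exponential form $\iota(t) = e^{-at}$ and the value $a = 0.3$ are assumptions chosen for illustration only; the inverse $\phi$ is computed by bisection, which is a convenience of this sketch and not part of the argument.

```python
import math

A_RATE = 0.3  # hypothetical decay rate of the deviation measure function


def iota(t):
    """Hypothetical deviation measure function iota(t) = exp(-a t)."""
    return math.exp(-A_RATE * t)


def psi(t):
    """psi(t) = t * iota(t) ** (-1/3), the map defining the next scale."""
    return t * iota(t) ** (-1.0 / 3.0)


def phi(s):
    """Inverse of psi on [0, infinity), found by bisection (psi is increasing)."""
    lo, hi = 0.0, max(s, 1.0)   # psi(hi) >= hi >= s, so the root lies in [lo, hi]
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if psi(mid) < s:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


n = 30
n_plus_plus = math.floor(psi(n))      # the next scale n++
round_trip = phi(psi(float(n)))       # should recover n
ratio = n / psi(n)                    # equals iota(n) ** (1/3)

print(n_plus_plus, round_trip, ratio)
```

With these choices the jump from $n$ to $n{+}{+}$ is exponential in $n$, and the ratio $n / \psi(n)$ is exactly the quantity $\iota_n^{1/3}$ appearing on the right hand side of (3.81).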

Proof To prove (3.80) it is enough to show, under similar constraints on $B$ and $n$, and for some function $\iota = \iota(A) \in \mathcal{I}$, that

\[ L_1^{(n)}(B) - L_1^{(2n)}(B) < C \, \frac{\phi(n)}{n}. \tag{3.82} \]

This would imply, for all $k \ge 0$,

\[ L_1^{(2^k n)}(B) - L_1^{(2^{k+1} n)}(B) < C \, \frac{\phi(2^k n)}{2^k n}. \tag{3.83} \]

[…] $\delta_0 > 0$, $C_1 > 0$ and $C > 0$ be as in the beginning of the proof of Theorem 3.2.

Pick $n_0^- \in \mathbb{N}$ large enough that for $n \ge n_0^-$ the inductive step Proposition 3.3 and the finite scale continuity Proposition 3.2 apply, and that $L_1^{(n)}(A) - L_1^{(2n)}(A) < \varepsilon_0$.

Assume also $n_0^-$ to be large enough that $e^{-C_1 2 n_0^-} < \delta_0$, $\iota_{n_0^-}^{1/2} < \varepsilon_0$, $C \, (n_0^-)^{-0-} < \varepsilon_0$, $n^{0+} \le \iota_n^{-1/3}$ for $n \ge n_0^-$, and, since $\iota$ decays at most exponentially, we may also assume that $n_0^- \, \iota_{n_0^-}^{-1/2} < e^{n_0^-}$.

Now set $n_0^+ := e^{n_0^-}$, $\mathcal{N}_0 := [n_0^-, n_0^+]$ and $\delta := e^{-C_1 2 n_0^+}$.

The assumptions above ensure that for all cocycles $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$, and for all $n_0 \in \mathcal{N}_0$, since $\delta = e^{-C_1 2 n_0^+} \le e^{-C_1 2 n_0} < e^{-C_1 n_0}$, the finite scale continuity Proposition 3.2 applies at scales $2n_0$, $n_0$. This implies, as in the proof of Theorem 3.2, the assumptions in the inductive step Proposition 3.3 for every $n_0 \in \mathcal{N}_0$.

Let $n_1^- := \psi(n_0^-)$, $n_1^+ := \psi(n_0^+)$ and $\mathcal{N}_1 := [n_1^-, n_1^+] \asymp \psi(\mathcal{N}_0)$.

We may assume that for every $n_1 \in \mathcal{N}_1$ there is $n_0 \in \mathcal{N}_0$ such that $n_1 = \lfloor n_0 \, \iota_{n_0}^{-1/3} \rfloor \asymp n_0 \, \iota_{n_0}^{-1/3}$ $(= \psi(n_0))$ (this is because by Lemma 3.2, the estimates involving scales $n_1 \in \mathcal{N}_1$ which are not divisible by $n_0$ will only carry an additional error of order $\frac{n_0}{n_1}$).

We apply the inductive step Proposition 3.3 and obtain:

\[ L_1^{(n_1)}(B) - L_1^{(2n_1)}(B) < C \, \frac{n_0}{n_1} \asymp C \, \frac{\phi(n_1)}{n_1}, \]

\[ \bigl| L_1^{(n_1)}(B) + L_1^{(n_0)}(B) - 2 L_1^{(2n_0)}(B) \bigr| < C \, \frac{n_0}{n_1} \asymp C \, \iota_{n_0}^{1/3}. \]

(Since $n_1 \asymp \psi(n_0)$, we have $\phi(n_1) \asymp n_0$, as $\phi$ is the inverse of $\psi$.)

The procedure continues in the same way, with intervals of scales defined inductively by $\mathcal{N}_{k+1} = [n_{k+1}^-, n_{k+1}^+] \asymp \psi(\mathcal{N}_k)$ for all $k \ge 0$. Again, each two consecutive intervals of scales $\mathcal{N}_k$ and $\mathcal{N}_{k+1}$ overlap. Therefore, if $n \ge n_1^-$, then $n = n_{k+1} \in \mathcal{N}_{k+1}$ for some $k \ge 0$, so there is $n_k \in \mathcal{N}_k$ such that $n_{k+1} = \lfloor n_k \, \iota_{n_k}^{-1/3} \rfloor \asymp n_k{+}{+}$. We then have:

\[ L_1^{(n_{k+1})}(B) - L_1^{(2n_{k+1})}(B) < C \, \frac{n_k}{n_{k+1}} \asymp C \, \frac{\phi(n_{k+1})}{n_{k+1}}, \]

\[ \bigl| L_1^{(n_{k+1})}(B) + L_1^{(n_k)}(B) - 2 L_1^{(2n_k)}(B) \bigr| < C \, \frac{n_k}{n_{k+1}} \asymp C \, \iota_{n_k}^{1/3}, \]

which completes the proof. □

The following theorem shows that locally near any cocycle $A \in \mathcal{C}_m$ for which $L_1(A) > L_2(A)$, the top Lyapunov exponent has a modulus of continuity given by a map that depends explicitly on a deviation measure function $\iota$, hence on the strength of the large deviation type estimates satisfied by the dynamical system.

Theorem 3.3 (modulus of continuity) Let $A \in \mathcal{C}_m$ be a measurable cocycle for which $L_1(A) > L_2(A)$. There are $\delta = \delta(A) > 0$, $\iota = \iota(A) \in \mathcal{I}$ and $c = c(A) > 0$ such that if we define the modulus of continuity function $\omega(h) := [\iota(c \log \frac{1}{h})]^{1/3}$, then for any cocycles $B_i \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) < \delta$, where $i = 1, 2$, we have:

\[ \bigl| L_1(B_1) - L_1(B_2) \bigr| \le \omega(\mathrm{dist}(B_1, B_2)). \tag{3.84} \]

More generally, if for some $1 \le k \le m$ the cocycle $A$ has the Lyapunov spectrum gap $L_k(A) > L_{k+1}(A)$, then the map $\Lambda_k := L_1 + \dots + L_k$ satisfies

\[ \bigl| \Lambda_k(B_1) - \Lambda_k(B_2) \bigr| \le \omega(\mathrm{dist}(B_1, B_2)). \tag{3.85} \]

Proof Choose parameters $\delta_0 = \delta_0(A) > 0$, $n_{00} = n_{00}(A) \in \mathbb{N}$ and $\iota = \iota(A) \in \mathcal{I}$ such that both the finite scale uniform continuity Proposition 3.2 and the uniform speed of convergence Proposition 3.4 apply with deviation measure function $\iota$ for all cocycles $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta_0$ and for all $n \ge n_{00}$. Let $C_1 = C_1(A) > 0$ be the constant from Proposition 3.2.

Set $\delta := \min\{\delta_0, \frac{1}{2} e^{-C_1 4 n_{00}}\}$. Let $B_i \in \mathcal{C}_m$ be measurable cocycles with $\mathrm{dist}(B_i, A) < \delta$ ($i = 1, 2$) and put $\mathrm{dist}(B_1, B_2) =: h$ $(< 2\delta \le e^{-C_1 4 n_{00}})$.

Set $n := \lfloor \frac{1}{2C_1} \log(1/h) \rfloor \in \mathbb{N}$. Then $e^{-C_1 4n} \le h \le e^{-C_1 2n}$, so $\mathrm{dist}(B_1, B_2) = h \le e^{-C_1 2n}$ and $n \ge n_{00}$.

All of this preparation shows that we can apply the finite scale uniform continuity Proposition 3.2 to $B_1$, $B_2$ at scales $n$ and $2n$ and get:

\[ \bigl| L_1^{(n)}(B_1) - L_1^{(n)}(B_2) \bigr| < \iota_n^{1/2} < \iota_n^{1/3}, \tag{3.86} \]

\[ \bigl| L_1^{(2n)}(B_1) - L_1^{(2n)}(B_2) \bigr| < \iota_{2n}^{1/2} < \iota_n^{1/2} < \iota_n^{1/3}. \tag{3.87} \]

Since $\mathrm{dist}(B_i, A) < \delta \le \delta_0$ and $n \ge n_{00}$, we can also apply the uniform speed of convergence Proposition 3.4 to $B_i$ ($i = 1, 2$) at scale $n$ and have:

\[ L_1^{(n++)}(B_i) - L_1(B_i) < \iota_n^{1/3}, \tag{3.88} \]

\[ \bigl| L_1^{(n++)}(B_i) + L_1^{(n)}(B_i) - 2 L_1^{(2n)}(B_i) \bigr| < \iota_n^{1/3}. \tag{3.89} \]

Combining (3.86)–(3.89) we conclude:

\[ \bigl| L_1(B_1) - L_1(B_2) \bigr| \lesssim \iota_n^{1/3} \le \Bigl[ \iota\Bigl( \frac{1}{4C_1} \log(1/h) \Bigr) \Bigr]^{1/3} =: \omega(h) = \omega(\mathrm{dist}(B_1, B_2)), \]

where the last inequality holds because $\iota$ is decreasing and $n \ge \frac{1}{4C_1} \log(1/h)$, by the lower bound $e^{-C_1 4n} \le h$.

The more general assertion of the theorem follows by simply taking exterior powers. Indeed, the cocycle $\wedge_k A$ has the property

\[ L_1(\wedge_k A) = (L_1 + \dots + L_{k-1} + L_k)(A) > (L_1 + \dots + L_{k-1} + L_{k+1})(A) = L_2(\wedge_k A), \]

hence (3.85) follows from (3.84) applied to $\wedge_k A$. □
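The formula $\omega(h) = [\iota(c \log \frac{1}{h})]^{1/3}$ turns the decay rate of $\iota$ directly into a regularity class. The sketch below evaluates it for two hypothetical deviation measure functions, chosen only for illustration: an exponential one, $\iota(t) = e^{-at}$, for which a direct computation gives the Hölder modulus $\omega(h) = h^{ac/3}$, and a sub-exponential one, $\iota(t) = e^{-a t^b}$ with $0 < b < 1$, which gives a weaker modulus. The numerical values of `a`, `b` and `c` are assumptions.

```python
import math


def modulus(h, iota, c):
    """omega(h) = [iota(c * log(1/h))] ** (1/3) for 0 < h < 1."""
    return iota(c * math.log(1.0 / h)) ** (1.0 / 3.0)


def exponential_iota(a):
    """Hypothetical deviation measure function iota(t) = exp(-a t)."""
    return lambda t: math.exp(-a * t)


def subexponential_iota(a, b):
    """Hypothetical deviation measure function iota(t) = exp(-a t^b)."""
    return lambda t: math.exp(-a * t ** b)


a, b, c = 0.6, 0.5, 0.5
holder_exponent = a * c / 3.0   # omega(h) = h ** holder_exponent in this case

scales = (1e-2, 1e-4, 1e-8)
holder_values = [modulus(h, exponential_iota(a), c) for h in scales]
weak_values = [modulus(h, subexponential_iota(a, b), c) for h in scales]

print(holder_values, weak_values)
```

In both cases $\omega(h) \to 0$ as $h \to 0$, but only the exponential choice yields a power of $h$; this is the sense in which the modulus of continuity reflects the strength of the available LDT estimates.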



Remark 3.4 If instead of uniform $L^2$-boundedness we have uniform $L^p$-boundedness (for some $p > 1$), then the range $n_1 \le n_0 \, \iota_{n_0}^{-1/3}$ in (3.47) should be replaced by $n_1 \le n_0 \, \iota_{n_0}^{-(p-1)/(2p-1)}$. This is required at some point in the proof of the inductive step Proposition 3.3, where instead of Cauchy-Schwarz we apply Hölder's inequality. Later, this range of values for the scale $n_1$ leads to the modulus of continuity stated in the abstract continuity Theorem 3.1.

References

1. L. Arnold, Random Dynamical Systems, Springer Monographs in Mathematics (Springer, Berlin, 1998). MR 1723992 (2000m:37087)
2. P. Duarte, S. Klein, Continuity of the Lyapunov exponents for quasiperiodic cocycles. Commun. Math. Phys. 332(3), 1113–1166 (2014). MR 3262622
3. A. Furman, On the multiplicative ergodic theorem for uniquely ergodic systems. Ann. Inst. H. Poincaré Probab. Stat. 33(6), 797–815 (1997). MR 1484541 (98i:28018)
4. M. Goldstein, W. Schlag, Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. Math. 154(1–2), 155–203 (2001). MR 1847592 (2002h:82055)
5. S. Jitomirskaya, R. Mavi, Continuity of the measure of the spectrum for quasiperiodic Schrödinger operators with rough potentials. Commun. Math. Phys. 325(2), 585–601 (2014). MR 3148097
6. Y. Katznelson, B. Weiss, A simple proof of some ergodic theorems. Israel J. Math. 42(4), 291–296 (1982). MR 682312 (84i:28020)
7. W. Schlag, Regularity and convergence rates for the Lyapunov exponents of linear cocycles. J. Mod. Dyn. 7(4), 619–637 (2013). MR 3177775

Chapter 4

The Oseledets Filtration and Decomposition

Abstract In this chapter we obtain a new proof of the classical Multiplicative Ergodic Theorem of V. Oseledets, using the Avalanche Principle. Furthermore, we establish the continuity of the Oseledets filtration and decomposition as functions of the cocycle, assuming the availability in the space of cocycles of appropriate uniform large deviation type estimates. The same assumptions lead in the previous chapter to an abstract continuity theorem of the Lyapunov exponents. This result and other technical estimates derived there, along with the inductive scheme based on the Avalanche Principle are the main ingredients of the arguments in this chapter.

© Atlantis Press and the author(s) 2016. P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_4

4.1 Introduction and Statements

Let $(X, \mu, T)$ be an ergodic dynamical system and let $A : X \to \mathrm{Mat}(m, \mathbb{R})$ be a measurable function defining a linear cocycle on the bundle space $X \times \mathbb{R}^m$ by

\[ X \times \mathbb{R}^m \ni (x, v) \mapsto (Tx, A(x)v) \in X \times \mathbb{R}^m. \]

In his 1968 paper [12] in the Transactions of the Moscow Mathematical Society, V. Oseledets proved his now famous Multiplicative Ergodic Theorem. Assuming the integrability of the cocycle, this theorem proves the existence of a measurable and $(T, A)$-invariant filtration of the fiber

\[ \{0\} = F_{k+1}(x) \subsetneq F_k(x) \subsetneq \dots \subsetneq F_2(x) \subsetneq F_1(x) = \mathbb{R}^m, \]

and the existence of a sequence $\lambda_1 > \lambda_2 > \dots > \lambda_k \ge -\infty$, such that for $\mu$-a.e. phase $x \in X$ and for every vector $v \in F_j(x) \setminus F_{j+1}(x)$,

\[ \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x)\, v\| = \lambda_j. \]

The numbers $\lambda_1, \lambda_2, \dots, \lambda_k$, measuring the rate of expansion of the cocycle along the invariant Oseledets subspaces, are the distinct Lyapunov exponents. The repeated Lyapunov exponents $L_1(A) \ge L_2(A) \ge \dots \ge L_m(A)$ are defined by the Furstenberg-Kesten (or Kingman's sub-additive ergodic) theorem.

Making further assumptions (e.g. the base dynamics and the fiber action are invertible), there is a measurable and $(T, A)$-invariant decomposition (also called splitting) into subspaces $\mathbb{R}^m = \oplus_{j=1}^{k+1} E_j(x)$, such that for $\mu$-a.e. $x \in X$ and for every $v \in E_j(x) \setminus \{0\}$, $\lim_{n \to \pm\infty} \frac{1}{n} \log \|A^{(n)}(x)\, v\| = \lambda_j$.

There are several methods of proving the multiplicative ergodic theorem. One approach is due to Raghunathan in [14] who in particular has shown the existence of the following limits of symmetric matrices

\[ \lim_{n \to \infty} \bigl( A^{(n)}(x)^* \, A^{(n)}(x) \bigr)^{1/2n} \quad \text{and} \quad \lim_{n \to \infty} \bigl( A^{(n)}(T^{-n} x) \, A^{(n)}(T^{-n} x)^* \bigr)^{1/2n}. \tag{4.1} \]
We also mention the proofs of Gol'dsheid and Margulis in [5] (for a detailed presentation of this proof see [1]), Mañé (see his monograph [10]), Walters [19] as well as variants of these proofs by Viana (see his recent monograph [17]) or Bochi (see the lecture notes [3] on his web page). Many extensions of this theorem are available, including those of Gol'dsheid and Margulis in [5], Kaĭmanovich in [7] or Ruelle in [15, 16].

In this chapter we give a new proof of the multiplicative ergodic theorem, which is based upon the AP. More precisely, we use the estimate in the AP on the distance between the most expanding direction of a product of matrices and the most expanding direction of the first term in the product. We assume the base dynamics to be invertible. However, the existence of the Oseledets filtration for non-invertible base dynamics can be reduced to the invertible case by a natural extension construction (see Sect. 1.3 in [13]). The Oseledets decomposition is usually obtained under the assumption that both the base dynamics and the fiber action are invertible. Our proof does not require invertibility of the fiber action.

We construct the Oseledets filtration as the $\mu$-a.e. limit as $n \to \infty$ of filtrations corresponding to the singular value decomposition of the iterates $A^{(n)}(x)$ of the cocycle $A$. The convergence of these (finite scale) filtrations follows from our extension of the AP concerning estimates on the distance between most expanding singular directions of products of matrices. The assumptions of the AP are ensured by Kingman's ergodic theorem, which provides $\mu$-a.e. convergence to the Lyapunov exponents of certain quantities related to the iterates $A^{(n)}(x)$ of the cocycle. Our proof of the Oseledets theorem uses a concept of avalanche times when the assumptions of the AP are satisfied. It was pointed out to us that this notion is related to that of good times, introduced recently by Gouëzel and Karlsson in [6].
We note that the AP allows us to derive quantitative estimates that seem to have no conceptual correspondence in [6]. Finally, we mention that our method of proving Oseledets theorem may also be used to establish (4.1). If a quantitative version of the convergence in Kingman’s ergodic theorem is available, that is, if our system satisfies fiber large deviation type (LDT) estimates,


then we establish a rate of convergence of the finite scale filtrations to the Oseledets filtration. Moreover, if the LDT is uniform in the cocycle, we derive continuity of the Oseledets filtration as a function of the cocycle, in an appropriate average sense. The argument is again inductive and based upon the AP, whose assumptions are shown to hold off of small sets of phases related to the exceptional sets in the LDT estimates. We construct the subspaces of the Oseledets decomposition of the cocycle A as intersections between components of the orthogonal complements of the filtration of A and components of the filtration of the adjoint cocycle. The continuity of the Oseledets decomposition (under the same assumption of having uniform LDT estimates) is derived using a similar scheme as the one employed for the continuity of the filtration. However, this needs to be combined with a careful analysis of the Lipschitz behavior of the intersection of vector subspaces, which we obtained in Chap. 2. A precise formulation of the continuity of the Oseledets filtration and decomposition as functions of the cocycle requires some preparation. We introduce (see the preamble to Sect. 4.3) a general topological space of measurable cocycles. We then allow perturbations of a given cocycle within the whole space. We define spaces of measurable filtrations and decompositions and endow them with appropriate topologies (see Sect. 4.3.2). In the case of higher dimensional (i.e. Mat(m, R)-valued, with m > 2) cocycles, as we perturb a given cocycle, the dimensions of the corresponding subspaces of its Oseledets filtration or decomposition may change. We define some natural projections / restrictions of these filtrations / decompositions, which will allow us to formulate and to prove stronger continuity results. In Sect. 4.3.1 we establish the continuity of the most expanding direction, in Sect. 4.3.3 that of the Oseledets filtration, and finally in Sect. 
4.3.4 we obtain the continuity of the Oseledets decomposition. We note that, as with the Lyapunov exponents, our continuity results are quantitative.

To give an idea of these continuity results, we formulate here a simplified, particular version of our results in Sect. 4.3. Let $(X, \mu, T)$ be an ergodic dynamical system with $T$ invertible. Let $\mathcal{C}_m$ be a space of measurable cocycles $A : X \to \mathrm{Mat}(m, \mathbb{R})$, endowed with a distance (dist) at least as fine as the $L^\infty$-distance. We make the following assumptions:

i. The base dynamics satisfies an LDT estimate for a rich enough (relative to $\mathcal{C}_m$) set of observables.
ii. Every cocycle $A \in \mathcal{C}_m$ satisfies a uniform (relative to dist) integrability condition.
iii. Every cocycle $A \in \mathcal{C}_m$ with $L_1(A) > L_2(A)$ satisfies a fiber LDT which is uniform in a neighborhood of $A$.

If $A \in \mathcal{C}_m$ is such that $L_1(A) > L_2(A)$, then its Oseledets decomposition contains a one dimensional subspace $E_1(A)(x)$ corresponding to the maximal Lyapunov exponent $L_1(A)$. This defines (after identifying one dimensional subspaces with points in the projective space $\mathbb{P}(\mathbb{R}^m)$) a measurable function $E_1(A) : X \to \mathbb{P}(\mathbb{R}^m)$. By the continuity of the Lyapunov exponents established in Chap. 3, if $A \in \mathcal{C}_m$ is such that $L_1(A) > L_2(A)$, then for any nearby cocycle $B$ we have $L_1(B) > L_2(B)$. Hence $E_1(B)$ is well defined as well, and we will prove the following.

Theorem 4.1 With the settings and assumptions described above, if $A \in \mathcal{C}_m$ with $L_1(A) > L_2(A)$, then locally near $A$ the map $\mathcal{C}_m \ni B \mapsto E_1(B) \in L^1(X, \mathbb{P}(\mathbb{R}^m))$ is continuous, with a modulus of continuity depending explicitly on the parameters of the LDT estimates. In fact, a more precise pointwise statement holds. There are constants $\delta > 0$, $\alpha > 0$ and a modulus of continuity function $\omega(h)$, all dependent only on $A$, such that for any cocycles $B_i$, $i = 1, 2$, with $\mathrm{dist}(B_i, A) < \delta$,
\[ \mu \{ x \in X : d(E_1(B_1)(x), E_1(B_2)(x)) > \mathrm{dist}(B_1, B_2)^{\alpha} \} < \omega(\mathrm{dist}(B_1, B_2)), \]
where as $h \to 0$, $\omega(h) \to 0$ at a rate that depends explicitly on the LDT estimates.

This result (and the more general ones in Sect. 4.3) applies both to random (i.i.d. or Markov) irreducible cocycles and to quasi-periodic cocycles, since LDT estimates will be established for these models (see Chaps. 5 and 6).

Continuity of the Oseledets decomposition for $\mathrm{GL}(2, \mathbb{C})$-valued random i.i.d. cocycles was obtained by Bocker-Neto and Viana in [4]. Their result is not quantitative, but it requires no generic assumptions (such as irreducibility) on the space of cocycles. Another related result was recently obtained in [2]. A different type of continuity property, namely stability of the Lyapunov exponents and of the Oseledets decomposition under random perturbations of a fixed cocycle, was studied in [9, 11].

4.2 The Ergodic Theorems

We formulate the ergodic theorems of Birkhoff and Kingman, then define the Lyapunov exponents (LE) of a linear cocycle over a measurable bundle. We obtain a new proof of the multiplicative ergodic theorem of Oseledets using the Avalanche Principle (AP).

4.2.1 Review of Grassmann Geometry Concepts and Notations

A sequence of integers $\tau = (\tau_1, \ldots, \tau_k)$ with $1 \le \tau_1 < \tau_2 < \cdots < \tau_k < m$ is called a signature. We make the convention that $\tau_0 = 0$ and $\tau_{k+1} = m$.


Let $s_1(g) \ge s_2(g) \ge \cdots \ge s_m(g) \ge 0$ denote the ordered (repeated) singular values of a matrix $g \in \mathrm{Mat}(m, \mathbb{R})$. We say that $g$ has a singular spectrum with a $\tau$-gap pattern, or shortly that it has a $\tau$-gap pattern, when $s_{\tau_j}(g) > s_{\tau_j + 1}(g)$ for all $j = 1, \ldots, k$. We say that it has an exact $\tau$-gap pattern when furthermore $s_{\tau_j + 1}(g) = s_{\tau_{j+1}}(g)$ for all $j = 0, 1, \ldots, k$.

Analogously, let $L_1(A) \ge L_2(A) \ge \cdots \ge L_m(A) \ge -\infty$ denote the ordered (repeated) Lyapunov exponents of a linear cocycle $A$. We say that $A$ has a Lyapunov spectrum with a $\tau$-gap pattern, or shortly that it has a $\tau$-gap pattern, when $L_{\tau_j}(A) > L_{\tau_j + 1}(A)$ for all $j = 1, \ldots, k$. We say that it has an exact $\tau$-gap pattern when furthermore $L_{\tau_j + 1}(A) = L_{\tau_{j+1}}(A)$ for all $j = 0, 1, \ldots, k$.

Given a matrix $g \in \mathrm{Mat}(m, \mathbb{R})$ with singular value gap ratio $\mathrm{gr}(g) := \frac{s_1(g)}{s_2(g)} > 1$, its most expanding direction is the point $v(g) \in \mathbb{P}(\mathbb{R}^m)$ determined by any singular vector of $g$ associated to the first singular value $s_1(g) = \|g\|$.

More generally, if $1 \le k \le m$ is such that $\mathrm{gr}_k(g) := \frac{s_k(g)}{s_{k+1}(g)} > 1$, the most expanding $k$-subspace is the $k$-dimensional vector subspace $v_k(g)$ spanned by the singular vectors of $g$ associated to the first $k$ singular values of $g$.

Finally, when $g$ has a $\tau$-gap pattern, hence $\mathrm{gr}_\tau(g) := \min_{1 \le j \le k} \mathrm{gr}_{\tau_j}(g) > 1$, we define the most expanding $\tau$-flag $v_\tau(g) := (v_{\tau_1}(g), \ldots, v_{\tau_k}(g)) \in \mathcal{F}_\tau(\mathbb{R}^m)$.

A $\tau$-flag in $\mathbb{R}^m$ is any finite strictly increasing sequence $F = (F_1, \ldots, F_k)$ of vector subspaces $F_1 \subset F_2 \subset \ldots \subset F_k \subset \mathbb{R}^m$ such that $\dim F_j = \tau_j$ for all $j = 1, \ldots, k$. The space of all $\tau$-flags in $\mathbb{R}^m$ is denoted here by $\mathcal{F}_\tau(\mathbb{R}^m)$.

The orthogonal complement $F^\perp$ of a flag $F = (F_1, \ldots, F_k)$ is the flag $F^\perp = (F_k^\perp, \ldots, F_1^\perp)$ of its orthogonal complements, which has the complementary signature $\tau^\perp = (m - \tau_k, \ldots, m - \tau_1)$.

A $\tau$-decomposition of $\mathbb{R}^m$ is a family $E_\cdot = \{E_j\}_{1 \le j \le k+1}$ of vector subspaces such that $\mathbb{R}^m = \oplus_{j=1}^{k+1} E_j$, and $\dim E_j = \tau_j - \tau_{j-1}$ for all $j = 1, \ldots, k + 1$. We denote by $\mathcal{D}_\tau(\mathbb{R}^m)$ the space of all $\tau$-decompositions of the Euclidean space $\mathbb{R}^m$.

Given two flags $F \in \mathcal{F}_\tau(\mathbb{R}^m)$ and $F' \in \mathcal{F}_{\tau^\perp}(\mathbb{R}^m)$, of complementary signatures, the quantity $\theta_\sqcap(F, F')$ measures the transversality between each subspace $F_j$ in $F$ and the corresponding subspace $F'_{k-j+1}$ in $F'$. When $\theta_\sqcap(F, F') > 0$ all these pairs $(F_j, F'_{k-j+1})$ of subspaces have a transversal intersection and the following family of subspaces $F \sqcap F' = \{F_j \cap F'_{k-j+2}\}_{1 \le j \le k+1}$ is a $\tau$-decomposition (see Proposition 2.36 in Chap. 2).

Table 4.1 summarizes these notations.

Table 4.1 Table of notations

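Concept                   Takes values in                            Defined in
$v(g)$                    $\mathbb{P}(\mathbb{R}^m)$                 2.14
$v_k(g)$                  $\mathrm{Gr}_k(\mathbb{R}^m)$              2.15
$v_\tau(g)$               $\mathcal{F}_\tau(\mathbb{R}^m)$           2.18
$F^\perp$                 $\mathcal{F}_{\tau^\perp}(\mathbb{R}^m)$   2.6
$F \sqcap F'$             $\mathcal{D}_\tau(\mathbb{R}^m)$           2.35
$\theta_\sqcap(F, F')$    $\mathbb{R}$                               2.34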
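The notions of gap ratio and most expanding direction can be made concrete in dimension two. The following is a minimal sketch, assuming real 2x2 matrices given as nested lists; the function names and the conventions for degenerate matrices are illustrative choices, not notation from the text. It uses the facts that $s_1^2 + s_2^2$ is the squared Frobenius norm, that $s_1 s_2 = |\det g|$, and that $v(g)$ is spanned by the top eigenvector of $g^T g$.

```python
import math

def singular_values_2x2(g):
    """Return (s1, s2), the ordered singular values of a real 2x2 matrix."""
    (a, b), (c, d) = g
    frob2 = a * a + b * b + c * c + d * d      # equals s1^2 + s2^2
    det = a * d - b * c                        # |det| equals s1 * s2
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    s1 = math.sqrt((frob2 + disc) / 2.0)
    s2 = abs(det) / s1 if s1 > 0.0 else 0.0
    return s1, s2

def gap_ratio(g):
    """gr(g) = s1(g) / s2(g); the values for degenerate g are assumed conventions."""
    s1, s2 = singular_values_2x2(g)
    if s1 == 0.0:
        return 1.0                             # zero matrix: no gap
    return math.inf if s2 == 0.0 else s1 / s2

def most_expanding_direction(g):
    """Unit vector spanning v(g), or None when gr(g) > 1 fails."""
    if not gap_ratio(g) > 1.0:
        return None
    (a, b), (c, d) = g
    # g^T g = [[p, q], [q, r]]; its top eigenvector is the top right singular vector.
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    theta = 0.5 * math.atan2(2.0 * q, p - r)
    return (math.cos(theta), math.sin(theta))

g = [[0.0, 2.0], [1.0, 0.0]]
print(singular_values_2x2(g), gap_ratio(g), most_expanding_direction(g))
```

For this sample matrix the second coordinate axis is stretched by the factor $s_1(g) = 2$, so the returned direction is (up to rounding and sign) the second basis vector.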


4.2.2 The Ergodic Theorems of Birkhoff and Kingman

The proofs of Birkhoff's pointwise ergodic theorem and Kingman's ergodic theorem can be found in most monographs covering topics in ergodic theory (see for instance [17, 18]). It is also worth mentioning in this context the simple proofs by Katznelson and Weiss [8]. The method in [8] is based on a stopping time argument, an instance of which will appear in our proof of the MET in Sect. 4.2.3, and was also used in Chap. 3 to establish a type of uniform upper semicontinuity of the maximal LE.

Theorem 4.2 (Birkhoff's ergodic theorem) Let $(X, \mu, T)$ be an ergodic dynamical system, and let $\xi \in L^1(X, \mu)$ be an observable. Then
\[ \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) \to \int_X \xi(x)\, \mu(dx) \quad \text{for } \mu\text{-a.e. } x \in X. \]

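As a numerical illustration of Birkhoff's theorem, here is a minimal sketch under assumptions that are not part of the text: the base system is the rotation of the circle by the golden mean, and the observable is $\xi(x) = \cos^2(2\pi x)$, whose space average is $1/2$. The helper name `birkhoff_average` is hypothetical.

```python
import math

def birkhoff_average(xi, T, x, n):
    """Time average (1/n) * sum_{j<n} xi(T^j x) along the orbit of x."""
    total = 0.0
    for _ in range(n):
        total += xi(x)
        x = T(x)
    return total / n

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0               # irrational rotation number

def rotation(x):
    return (x + ALPHA) % 1.0                       # ergodic w.r.t. Lebesgue measure

def xi(x):
    return math.cos(2.0 * math.pi * x) ** 2        # space average equals 1/2

avg = birkhoff_average(xi, rotation, 0.1, 10_000)
print(avg)                                         # close to 0.5
```

For this particular system the convergence holds at every starting point, not just almost every one, because the observable is continuous and the rotation is uniquely ergodic; the theorem itself only asserts convergence almost everywhere.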
A sequence of numbers $\{a_n\}_{n \ge 0}$ in $[-\infty, +\infty)$ is called sub-additive if
\[ a_{n+m} \le a_n + a_m \quad \text{for all } n, m \ge 0. \]

Lemma 4.1 (Fekete's Subadditive Lemma) Given a sub-additive sequence $\{a_n\}_{n \ge 0}$ the following limit exists
\[ \lim_{n \to \infty} \frac{a_n}{n} = \inf_{n \ge 1} \frac{a_n}{n} \in [-\infty, +\infty). \]
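Fekete's lemma can be checked on a simple example. The sketch below assumes the sample sequence $a_n = n + \sqrt{n}$ (not taken from the text), which is sub-additive because $\sqrt{n+m} \le \sqrt{n} + \sqrt{m}$; here $a_n / n = 1 + 1/\sqrt{n}$ decreases to its infimum $1$.

```python
import math

def a(n):
    """Sample sub-additive sequence a_n = n + sqrt(n) (illustrative assumption)."""
    return n + math.sqrt(n)

def is_subadditive(seq, bound, tol=1e-12):
    """Check seq(n + m) <= seq(n) + seq(m) for all 0 <= n, m < bound."""
    return all(seq(n + m) <= seq(n) + seq(m) + tol
               for n in range(bound) for m in range(bound))

ratios = [a(n) / n for n in range(1, 2001)]        # a_n / n for n = 1, ..., 2000
print(is_subadditive(a, 50), ratios[0], ratios[-1])
```

The printed ratios start at 2 and approach 1 from above, in agreement with the limit being equal to the infimum.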

Theorem 4.3 (Kingman's Ergodic Theorem) Let $(X, \mu, T)$ be an ergodic dynamical system. Given a sequence of measurable functions $f_n : X \to [-\infty, +\infty)$ such that $f_1^+ \in L^1(X, \mu)$ and
\[ f_{n+m} \le f_n + f_m \circ T^n \quad \text{for all } n, m \ge 0, \tag{4.2} \]
the sequence $\{\int f_n \, d\mu\}_{n \ge 0}$ is sub-additive, and for $\mu$-a.e. $x \in X$, $\frac{1}{n} f_n(x)$ converges to
\[ \lim_{n \to \infty} \frac{1}{n} \int f_n \, d\mu = \inf_{n \ge 1} \frac{1}{n} \int f_n \, d\mu \in [-\infty, +\infty). \]

Let B ⊆ X ×Rm be a measurable bundle determined by some measurable function E : X → Gr(Rm ). This means that B = { (x, v) : x ∈ X, v ∈ E(x) }. We denote by B(x) the fiber over the base point x and note that as a set, it coincides with the subspace E(x).


Definition 4.1 A linear cocycle on $\mathcal{B}$ over a measure preserving dynamical system $(X, \mu, T)$ is a measurable map $F_A : \mathcal{B} \to \mathcal{B}$, defined by a measurable family of linear maps $A(x) : E(x) \to E(Tx)$, $F_A(x, v) := (Tx, A(x) v)$. We identify $F_A$ with the pair $(T, A)$ or simply with $A$.

Definition 4.2 A cocycle $A$ is said to be $\mu$-integrable if
\[ \int_X \log^+ \|A(x)\| \, d\mu(x) < +\infty. \]

Proposition 4.1 Given a $\mu$-integrable cocycle $A$, for $\mu$ almost every $x \in X$,
\[ L_1(A) := \lim_{n \to \infty} \frac{1}{n} \log \|A^{(n)}(x)\|. \]

The number $L_1(A)$ is called the first Lyapunov exponent of $A$.

Proof The sequence of functions $f_n(x) = \log \|A^{(n)}(x)\|$ satisfies the sub-additivity property (4.2) and $f_1^+ = \log^+ \|A\| \in L^1(X, \mu)$. The conclusion follows by Theorem 4.3. □

Proposition 4.2 If $A$ is a $\mu$-integrable cocycle then the following limit exists for any $1 \le i \le m$ and $\mu$-a.e. $x \in X$,
\[ L_i(A) := \lim_{n \to \infty} \frac{1}{n} \log s_i(A^{(n)}(x)). \tag{4.3} \]
The number $L_i(A) \in [-\infty, +\infty)$ is called the $i$th Lyapunov exponent of $A$. Moreover, for all $2 \le i \le m$,
\[ L_1(\wedge_i A) = L_i(A) + L_1(\wedge_{i-1} A). \tag{4.4} \]

Proof Consider the exterior power cocycles $\wedge_i A$ where $1 \le i \le m + 1$. Since
\[ \log \|\wedge_i A\| \le i \log \|A\|, \]
the integrability condition $\int_X \log^+ \|A\| \, d\mu < +\infty$ for $A$ implies that all cocycles $\wedge_i A$ are also $\mu$-integrable. Because $\wedge_{m+1} A(x) \equiv 0$ we have $L_1(\wedge_{m+1} A) = -\infty$. Let $k$ be the first integer $1 \le k \le m + 1$ such that $L_1(\wedge_k A) = -\infty$. Then $L_1(\wedge_{k-1} A) > L_1(\wedge_k A) = -\infty$. By Proposition 2.6 we have for $1 \le i \le k$,
\[ s_i(A^{(n)}(x)) = \frac{\|\wedge_i A^{(n)}(x)\|}{\|\wedge_{i-1} A^{(n)}(x)\|}. \]
Notice that $\wedge_{i-1} A^{(n)}(x)$ is eventually non-zero because $L_1(\wedge_{i-1} A) > -\infty$. Hence, taking logarithms and applying Kingman's theorem, the limit (4.3) exists and the relation (4.4) holds. Notice that for $i = k$ we get
\[ L_k(A) = L_1(\wedge_k A) - L_1(\wedge_{k-1} A) = -\infty. \]
For $k \le i \le m$, since $s_i(A^{(n)}(x)) \le s_k(A^{(n)}(x))$, by comparison we infer that $L_i(A) = -\infty$ as well. □

Corollary 4.1 If $\int_X \log^+ \|A(x)\| \, d\mu(x) < +\infty$ then for $\mu$-a.e. $x \in X$, and $1 \le i \le m$,
\[ \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_i A^{(n)}(x)\| = L_1(A) + \cdots + L_i(A). \]

Proof Apply Proposition 4.2, using (4.4) inductively. □
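For two-dimensional cocycles, Proposition 4.1 and Corollary 4.1 suggest a simple way to approximate both exponents at a finite scale $n$: $L_1$ from the growth of $\|A^{(n)}(x)\|$ and $L_1 + L_2$ from the growth of $\|\wedge_2 A^{(n)}(x)\| = |\det A^{(n)}(x)|$. The following is a rough sketch under these assumptions; the function names and the renormalization step (used only to avoid overflow) are illustrative and not part of the text.

```python
import math

def matmul(g, h):
    """Product g * h of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = g
    (e, f), (p, q) = h
    return ((a * e + b * p, a * f + b * q), (c * e + d * p, c * f + d * q))

def op_norm(g):
    """Operator norm (largest singular value) of a 2x2 matrix."""
    (a, b), (c, d) = g
    frob2 = a * a + b * b + c * c + d * d
    det = a * d - b * c
    disc = math.sqrt(max(frob2 * frob2 - 4.0 * det * det, 0.0))
    return math.sqrt((frob2 + disc) / 2.0)

def finite_scale_exponents(A, T, x, n):
    """Return ((1/n) log s1, (1/n) log s2) of A^(n)(x) = A(T^{n-1}x) ... A(x)."""
    log_norm, log_det = 0.0, 0.0
    g = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        a = A(x)
        det = abs(a[0][0] * a[1][1] - a[0][1] * a[1][0])
        log_det += math.log(det) if det > 0.0 else -math.inf
        g = matmul(a, g)
        s = op_norm(g)
        if s == 0.0:
            return -math.inf, -math.inf
        log_norm += math.log(s)                    # the norms multiply up to ||A^(n)(x)||
        g = tuple(tuple(entry / s for entry in row) for row in g)
        x = T(x)
    l1 = log_norm / n
    return l1, log_det / n - l1                    # s1 * s2 = |det|

# Constant cocycle over any base map: the exponents are the logs of the eigenvalue moduli.
l1, l2 = finite_scale_exponents(lambda x: ((2.0, 1.0), (1.0, 1.0)), lambda x: x, 0.0, 50)
print(l1, l2)
```

For the constant symmetric matrix used here, $L_1 = \log\frac{3 + \sqrt{5}}{2}$ and $L_2 = -L_1$, which gives a quick sanity check of the routine.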



4.2.3 The Multiplicative Ergodic Theorem

Throughout this section let $T : X \to X$ be an ergodic invertible measure preserving transformation on a probability space $(X, \mathcal{F}, \mu)$. Consider a measurable bundle $\mathcal{B} \subseteq X \times \mathbb{R}^m$ determined by some measurable function $E : X \to \mathrm{Gr}(\mathbb{R}^m)$ and a $\mu$-integrable linear cocycle $F_A : \mathcal{B} \to \mathcal{B}$, defined by a measurable family of linear maps $A(x) : E(x) \to E(Tx)$. Given a vector $v \in E(x)$ we define the Lyapunov exponents along $v$
\[ \lambda_A(x, v) := \limsup_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x) v\|, \qquad \lambda_A^-(x, v) := \liminf_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x) v\|. \]
Note that $\lambda_A(x, 0) = -\infty$. These functions satisfy the following properties:

Proposition 4.3 For every $x \in X$, given vectors $v, v' \in E(x)$,
(a) $\lambda_A(x, v) \le L_1(A)$,
(b) $\lambda_A(x, c\, v) = \lambda_A(x, v)$ if $c \ne 0$,
(c) $\lambda_A(x, v + v') \le \max\{\lambda_A(x, v), \lambda_A(x, v')\}$,
(d) if $\lambda_A(x, v') < \lambda_A^-(x, v) = \lambda_A(x, v)$ then $\lambda_A^-(x, v + v') = \lambda_A(x, v + v') = \lambda_A(x, v)$,
(e) $\lambda_A(x, v) = \lambda_A(Tx, A(x) v)$.

Proof Item (a) follows from the inequality $\|A^{(n)}(x) v\| \le \|A^{(n)}(x)\| \, \|v\|$. Item (b) is a straightforward consequence of the definition. Item (c) follows from the inequality
\begin{align*}
\log \|A^{(n)}(x)(v + v')\| &\le \log \left( \|A^{(n)}(x) v\| + \|A^{(n)}(x) v'\| \right) \\
&\le \log \left( 2 \max\{\|A^{(n)}(x) v\|, \|A^{(n)}(x) v'\|\} \right) \\
&= \log 2 + \max\{\log \|A^{(n)}(x) v\|, \log \|A^{(n)}(x) v'\|\}.
\end{align*}


Item (d) follows from the inequality
\[ \|A^{(n)}(x) v\| \left| 1 - \frac{\|A^{(n)}(x) v'\|}{\|A^{(n)}(x) v\|} \right| \le \|A^{(n)}(x)(v + v')\| \le \|A^{(n)}(x) v\| \left( 1 + \frac{\|A^{(n)}(x) v'\|}{\|A^{(n)}(x) v\|} \right) \]
and the fact that $\limsup_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x) v'\| < \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x) v\|$ implies that the ratio $\|A^{(n)}(x) v'\| / \|A^{(n)}(x) v\|$ converges geometrically to 0. Finally, item (e) follows from the identity $A^{(n)}(x) v = A^{(n-1)}(Tx)(A(x) v)$. □

Given a real number $\lambda \in \mathbb{R}$, the set $F_\lambda(x) := \{ v \in E(x) : \lambda_A(x, v) \le \lambda \}$ is a linear subspace of $E(x)$, because of items (b) and (c) of the previous proposition. This family of subspaces determines a finite filtration (flag)
\[ \{0\} \subsetneq F_{\lambda_1}(x) \subsetneq F_{\lambda_2}(x) \subsetneq \cdots \subsetneq F_{\lambda_k}(x) \subsetneq F_{\lambda_{k+1}}(x) = E(x) \]
which by item (e) is invariant in the sense that $A(x) F_\lambda(x) \subseteq F_\lambda(Tx)$, for all $x \in X$. The multiplicative ergodic theorem (MET) gives a precise description of this filtration and its relation with the Lyapunov exponents.

Assume that $A$ is $\mu$-integrable and $L_1(A) > L_2(A)$. The following proposition is about the existence of a measurable function $v^{(\infty)}(A) : X \to \mathbb{P}(\mathbb{R}^m)$ which gives the most expanding direction of the cocycle $A$. For each $n \in \mathbb{N}$ we define a partial function
\[ v^{(n)}(A)(x) := \begin{cases} v(A^{(n)}(x)) & \text{if } \mathrm{gr}(A^{(n)}(x)) > 1, \\ \text{undefined} & \text{otherwise.} \end{cases} \]

Definition 4.3 Let $(Y, d)$ be a metric space. We say that a sequence of partial functions $f_n : D_n \subseteq X \to Y$ is $\mu$ almost everywhere Cauchy if given $\varepsilon > 0$ there exist a measurable set $B \subseteq X$ with $\mu(B) < \varepsilon$ and $n_0 \in \mathbb{N}$ such that for all $n \ge n_0$, the function $f_n(x)$ is well-defined on $X \setminus B$, i.e., $X \setminus B \subseteq D_n$, and the sequence $\{f_n(x)\}_{n \ge n_0}$ is Cauchy for every $x \notin B$.

Proposition 4.4 Let $A$ be a $\mu$-integrable cocycle such that $L_1(A) > L_2(A)$. The sequence of (partial) functions $v^{(n)}(A)$ from $X$ to $\mathbb{P}(\mathbb{R}^m)$ is $\mu$ almost everywhere Cauchy. In particular, it converges $\mu$ almost everywhere to a (total) measurable function $v^{(\infty)}(A) : X \to \mathbb{P}(\mathbb{R}^m)$. Moreover, for $\mu$-a.e. $x \in X$,
\[ \limsup_{n \to +\infty} \frac{1}{n} \log d(v^{(n)}(A)(x), v^{(\infty)}(A)(x)) \le L_2(A) - L_1(A) < 0. \]
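The exponential convergence in Proposition 4.4 can be observed on a toy example. The sketch below makes several assumptions that are not in the text: the cocycle is the constant matrix with rows $(2, 1)$ and $(0, 1)$, for which $L_1 = \log 2$ and $L_2 = 0$; the limit direction is the diagonal (angle $\pi/4$), which a direct computation confirms for this matrix; and the projective distance is taken to be the sine of the angle between lines.

```python
import math

def matmul(g, h):
    """Product g * h of two 2x2 matrices given as nested tuples."""
    (a, b), (c, d) = g
    (e, f), (p, q) = h
    return ((a * e + b * p, a * f + b * q), (c * e + d * p, c * f + d * q))

def expanding_angle(g):
    """Angle of the top right singular vector of g (top eigenvector of g^T g)."""
    (a, b), (c, d) = g
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    return 0.5 * math.atan2(2.0 * q, p - r)

A = ((2.0, 1.0), (0.0, 1.0))       # constant cocycle with L1 = log 2 > L2 = 0
LIMIT_ANGLE = math.pi / 4.0        # direction of v^(infinity) for this particular A

def distance_to_limit(n):
    """Sine-of-angle distance between v(A^n) and the limit direction."""
    g = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        g = matmul(A, g)
    return abs(math.sin(expanding_angle(g) - LIMIT_ANGLE))

for n in (10, 20, 30):
    d = distance_to_limit(n)
    print(n, d, math.log(d) / n)   # the last column approaches L2 - L1 = -log 2
```

In this example the rate $L_2(A) - L_1(A)$ is attained, so the bound of the proposition cannot be improved in general without further assumptions.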

This proposition will be proved using the Avalanche Principle.

Lemma 4.2 Given $\varepsilon > 0$ there exists $r \in \mathbb{N}$ such that for any $n, n_0 \in \mathbb{N}$ with $n_0 \ge r$ and $n \ge r\, n_0$ there is a sequence of integers $\{m_i\}_{i \ge 0}$ for which
(a) $m_0 = n_0$,
(b) $m_k = n$ for some $k \ge 1$, and
(c) $|m_i - 2 m_{i-1}| < \varepsilon\, m_i$ for all $i \ge 1$.

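Proof Choose $k \ge 1$ such that $2^k \le n/n_0 < 2^{k+1}$, and define $\theta = \frac{1}{k} \log_2(n/n_0) - 1$, so that $0 \le \theta < \frac{1}{k}$. The sequence $m_i := \lfloor n_0\, 2^{(1+\theta) i} \rfloor$ satisfies (a) and (b). From
\[ \frac{m_i}{2^{1+\theta}} - 1 \le n_0\, 2^{(1+\theta)(i-1)} - 1 < m_{i-1} \le n_0\, 2^{(1+\theta)(i-1)} < \frac{m_i + 1}{2^{1+\theta}} \qquad (i \ge 1) \]
we obtain, multiplying by $2/m_i$,
\[ \left| \frac{m_i - 2 m_{i-1}}{m_i} \right| = \left| 1 - \frac{2 m_{i-1}}{m_i} \right| \le \left| 1 - \frac{1}{2^\theta} \right| + \left| \frac{1}{2^\theta} - \frac{2 m_{i-1}}{m_i} \right| \le \frac{\log 2}{k} + \frac{2}{m_i}. \]
Notice that by the mean value theorem
\[ \left| 1 - \frac{1}{2^\theta} \right| \le 1 - \frac{1}{2^{1/k}} = \frac{2^{1/k} - 1}{2^{1/k}} \le \frac{\log 2}{k}. \]
Item (c) follows choosing $r = 2^l$ where $l \in \mathbb{N}$ is such that $\frac{\log 2}{l} + \frac{1}{2^{l+1}} < \varepsilon$. □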
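The construction in this proof is explicit and can be run directly. The following sketch follows the formula $m_i = \lfloor n_0\, 2^{(1+\theta) i} \rfloor$, rewritten as $\lfloor n_0 (n/n_0)^{i/k} \rfloor$; the function name and the small guard `1e-9` against floating point rounding at the endpoints are assumptions added for illustration.

```python
import math

def doubling_sequence(n0, n):
    """Return (k, theta, [m_0, ..., m_k]) with m_0 = n0 and m_k = n."""
    ratio = n / n0
    k = math.floor(math.log2(ratio))               # 2^k <= n/n0 < 2^(k+1)
    if k < 1:
        raise ValueError("need n >= 2 * n0")
    theta = math.log2(ratio) / k - 1.0             # 0 <= theta < 1/k
    # n0 * 2^((1 + theta) * i) equals n0 * (n/n0)^(i/k); the guard protects the floor.
    ms = [math.floor(n0 * ratio ** (i / k) + 1e-9) for i in range(k + 1)]
    return k, theta, ms

k, theta, ms = doubling_sequence(64, 6400)
print(k, theta, ms)
# Relative defects |m_i - 2 m_{i-1}| / m_i, to be compared with log(2)/k + 2/m_i.
print([abs(ms[i] - 2 * ms[i - 1]) / ms[i] for i in range(1, len(ms))])
```

With $n_0 = 64$ and $n = 6400$ one gets $k = 6$, and every relative defect stays below the bound $\frac{\log 2}{k} + \frac{2}{m_i}$ derived in the proof.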



Definition 4.4 Given $\varepsilon > 0$, we call an $\varepsilon$-doubling sequence any sequence $\{m_i\}_{i \ge 0}$ of integers such that $|m_i - 2 m_{i-1}| < \varepsilon\, m_i$ for all $i \ge 1$.

Lemma 4.3 Given $\varepsilon > 0$ small enough and a measurable set $\Omega \subset X$ such that $\mu(\Omega) > 1 - \varepsilon/4$ there is a measurable subset $\Omega_0 \subseteq \Omega$ with $\mu(\Omega_0) > 1 - \varepsilon$ and there are integers $n_0 \ge r$ such that
(a) For each $x \in \Omega_0$ there is an $\varepsilon$-doubling sequence $\{m_i\}_{i \ge 0}$ satisfying $m_0 = n_0$ and $T^{m_i} x \in \Omega$ for all $i \ge 0$;
(b) For all $x \in \Omega_0$ and $n \ge r\, n_0$, there is an $\varepsilon$-doubling sequence $\{m_i\}_{i \ge 0}$ satisfying $m_0 = n_0$, $m_k = n$ for some $k \ge 1$, and $T^{m_i} x \in \Omega$ for all $0 \le i < k$.

Proof By Birkhoff's ergodic theorem, for $\mu$-a.e. $x \in X$,
\[ \lim_{m \to \infty} \frac{1}{m} \# \{ 0 \le j \le m - 1 : T^j x \notin \Omega \} = \mu(X \setminus \Omega) < \frac{\varepsilon}{4}. \tag{4.5} \]
Given a phase $x$, if we denote by $m(x)$ the first integer such that the inequality
\[ \frac{1}{m} \# \{ 0 \le j \le m - 1 : T^j x \notin \Omega \} < \frac{\varepsilon}{4} \]
holds for all $m \ge m(x)$, then by (4.5), $m(x)$ is defined for $\mu$-a.e. $x \in X$. For every integer $n$, let $U_n := \{x \in \Omega : m(x) \le n\}$. Since $U_n \subset U_{n+1}$ and $\cup_n U_n$ has full (relative) measure in $\Omega$, there is $n_0 = n_0(\varepsilon)$ such that $\mu(\Omega \setminus U_{n_0}) < \varepsilon/2$.


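Note that if $x \in U_{n_0}$, then
\[ \# \{ 0 \le j \le m - 1 : T^j x \notin \Omega \} < \frac{\varepsilon\, m}{4} \quad \text{for all } m \ge n_0. \tag{4.6} \]
We set $\Omega_0 := U_{n_0} \cap T^{-n_0}(\Omega)$. Then $\mu(X \setminus \Omega_0) \le \mu(X \setminus \Omega) + \mu(\Omega \setminus U_{n_0}) + \mu(X \setminus T^{-n_0}(\Omega)) < \varepsilon$, and if $x \in \Omega_0$ then (4.6) holds and $T^{n_0} x \in \Omega$.

To prove (a), take $x \in \Omega_0$ and consider the sequence $a_i := 2^i n_0$. For each $i \ge 1$, applying (4.6) with $m = a_i$, there is an integer $m_i$ in the range $(1 - \varepsilon/4)\, a_i \le m_i \le a_i$ such that $T^{m_i} x \in \Omega$. A straightforward computation shows that $\{m_i\}_{i \ge 0}$ is an $\varepsilon'$-doubling sequence with $\varepsilon' = \frac{\varepsilon/4}{1 - \varepsilon/4} < \varepsilon$.

Finally, to prove (b), we use Lemma 4.2 to get an integer $r = r(\varepsilon)$ such that if $n \ge r\, n_0$ and $n_0 \ge r$ then there is an $\frac{\varepsilon}{4}$-doubling sequence $\{m_i\}_{i \ge 0}$ with $m_0 = n_0$ and $m_k = n$ for some index $k \ge 1$. Given $x \in \Omega_0$, by the frequency bound (4.6) applied with $m = m_i + \frac{\varepsilon\, m_i}{6}$, for each $1 \le i \le k - 1$ there is $m_i' \in \mathbb{N}$ such that $T^{m_i'} x \in \Omega$ and $|m_i' - m_i| < \varepsilon\, m_i / 6$. Setting $m_0' = m_0 = n_0$ and $m_k' = m_k = n$, the sequence $\{m_i'\}_{i \ge 0}$ satisfies (b), and a simple calculation shows that it is $\varepsilon$-doubling. □

The next proposition says that for any given $\varepsilon > 0$ there is a measurable set of phases $\Omega_0$ with $\mu(\Omega_0) > 1 - \varepsilon$, such that if $x \in \Omega_0$ then there exists an $\varepsilon$-doubling sequence of avalanche times, that is, times where the assumptions of the AP hold.

Proposition 4.5 Let $A$ be a $\mu$-integrable cocycle such that $L_1(A) > L_2(A)$. Given $0 < \kappa < L_1(A) - L_2(A)$ and $0 < \varepsilon \ll \kappa$, there exist integers $n_0 \ge r \ge 1$ and a measurable set $\Omega_0 \subset X$ with $\mu(\Omega_0) > 1 - \varepsilon$ such that for any $x \in \Omega_0$ and for any $n \ge r\, n_0$ there exists an $\varepsilon$-doubling sequence $\{m_0, \ldots, m_k\}$ satisfying the following properties: $m_0 = n_0$, $m_k = n$ and for all $0 \le i < k$,
(1) $\mathrm{gr}(A^{(m_i)}(x)) \ge e^{m_i (\kappa - 2\varepsilon)}$ and $\mathrm{gr}(A^{(m_{i+1} - m_i)}(T^{m_i} x)) \ge e^{m_i (\kappa - 2\varepsilon)(1 - \varepsilon)/(1 + \varepsilon)}$,
(2) $\dfrac{\|A^{(m_{i+1})}(x)\|}{\|A^{(m_{i+1} - m_i)}(T^{m_i} x)\| \, \|A^{(m_i)}(x)\|} \ge e^{-5 m_i \varepsilon}$.
Moreover, for each $x \in \Omega_0$ there exists an $\varepsilon$-doubling sequence $\{m_i\}_{i \ge 0}$ with $m_0 = n_0$ such that (1) and (2) hold for all $i \ge 0$.

Proof The following limits exist for $\mu$-a.e. $x \in X$,
\begin{align*}
\lim_{n \to \infty} \frac{1}{n} \log \|A^{(n)}(x)\| &= L_1(A), \\
\lim_{n \to \infty} \frac{1}{n} \log \|\wedge_2 A^{(n)}(x)\| &= L_1(A) + L_2(A) < 2 L_1(A) - \kappa.
\end{align*}
Take $0 < \varepsilon \ll \kappa$ small. For any $n_0 \in \mathbb{N}$ consider the measurable set $\Omega_{n_0}(\varepsilon)$ of phases $x \in X$ such that for all $n \ge \frac{n_0}{2}$ we have
\[ e^{n (L_1(A) - \varepsilon)} \le \|A^{(n)}(x)\| \le e^{n (L_1(A) + \varepsilon)} \quad \text{and} \quad \|\wedge_2 A^{(n)}(x)\| \le e^{n (2 L_1(A) - \kappa)}. \tag{4.7} \]
The almost sure convergences above imply that
\[ \lim_{n \to +\infty} \mu(\Omega_n(\varepsilon)) = 1. \]
We assume that $n_0$ is chosen large enough so that also $\mu(X \setminus \Omega_{n_0}(\varepsilon)) < \varepsilon/2$. Setting $\Omega := \Omega_{n_0}(\varepsilon)$, by Lemma 4.3, there exist integers $r$ and $n_0' \ge \max\{n_0, r\}$, and a measurable subset $\Omega_0 \subset \Omega$ such that for all $x \in \Omega_0$ and $n \ge r\, n_0'$, there is an $\varepsilon$-doubling sequence $\{m_i\}_{0 \le i \le k}$ satisfying $m_0 = n_0'$, $m_k = n$, and $T^{m_i} x \in \Omega_{n_0}(\varepsilon)$ for all $0 \le i < k$. Item (1) follows from the fact that if $x \in \Omega_{n_0}(\varepsilon)$ then by (4.7)
\[ \mathrm{gr}(A^{(n)}(x)) = \frac{\|A^{(n)}(x)\|^2}{\|\wedge_2 A^{(n)}(x)\|} \ge e^{n (\kappa - 2\varepsilon)} \quad \text{for all } n \ge \frac{n_0}{2}. \]
Applying the estimate above with $n := m_i$ yields the first inequality in item (1), while the second follows by putting $n := m_{i+1} - m_i$. Note that the $\varepsilon$-doubling condition implies that $m_{i+1} - m_i > \frac{1 - \varepsilon}{1 + \varepsilon}\, m_i \ge \frac{1}{2} n_0$.

For item (2) we use again (4.7). Since $x, T^{m_i} x \in \Omega_{n_0}(\varepsilon)$,
\[ \frac{\|A^{(m_{i+1})}(x)\|}{\|A^{(m_i)}(x)\| \, \|A^{(m_{i+1} - m_i)}(T^{m_i} x)\|} \ge e^{-2 \varepsilon\, m_{i+1}} \ge e^{-5 \varepsilon\, m_i}. \]
This completes the proof. □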

Proof (of Proposition 4.4) Given $0 < \varepsilon \ll \kappa < L_1(A) - L_2(A)$, consider the integers $n_0 \ge r \ge 1$ and the measurable set $\Omega_0 \subseteq X$ provided by Proposition 4.5. Given $x \in \Omega_0$ and $n \ge n_0\, r$, by this proposition there is an $\varepsilon$-doubling sequence $\{m_0, m_1, \ldots, m_k\}$ such that $m_0 = n_0$, $m_k = n$ and both gap and angle conditions (1) and (2) hold for all $0 \le i < k$. We apply the avalanche principle to the sequence of two matrices $g_0 = A^{(m_i)}(x)$ and $g_1 = A^{(m_{i+1} - m_i)}(T^{m_i} x)$, with $g_1 g_0 = A^{(m_{i+1})}(x)$. The key parameters in this application of the AP are
\[ \kappa_{ap} = e^{-m_i (\kappa - 2\varepsilon)(1 - \varepsilon)/(1 + \varepsilon)} < e^{-m_i (\kappa - 2\varepsilon)/2} \quad \text{and} \quad \varepsilon_{ap} = e^{-5 \varepsilon\, m_i}, \]
for which we have $\frac{\kappa_{ap}}{\varepsilon_{ap}^2} < e^{-m_i (\kappa/2 - 11 \varepsilon)} \ll 1$. Conclusions (1) and (2) of Proposition 4.5 imply the gap and angle conditions of the AP. Therefore, by Proposition 2.42,
\[ d(v(A^{(m_i)}(x)), v(A^{(m_{i+1})}(x))) \lesssim e^{-m_i \theta} \]
[…], while the $\varepsilon$-doubling condition gives $m_{i+1} \ge \frac{3}{2} m_i$. Hence
\begin{align*}
d(v(A^{(n_0)}(x)), v(A^{(n)}(x))) &= d(v(A^{(m_0)}(x)), v(A^{(m_k)}(x))) \\
&\le \sum_{i=0}^{k-1} d(v(A^{(m_i)}(x)), v(A^{(m_{i+1})}(x))) \\
&\lesssim \sum_{i=0}^{k-1} e^{-m_i \theta} \le \sum_{i=0}^{k-1} e^{-(3/2)^i n_0 \theta} \lesssim e^{-n_0 \theta}.
\end{align*}
Taking $n_0$ large enough, the bound $e^{-n_0 \theta}$ becomes arbitrarily small. This proves that the sequence $\{v(A^{(n)}(x))\}_{n \ge n_0}$ is Cauchy. Moreover, passing to the limit as $n \to +\infty$,
\[ d(v(A^{(n_0)}(x)), v^{(\infty)}(A)(x)) \lesssim e^{-n_0 \theta}. \]
Therefore, as $n = n_0$ is arbitrary,
\[ \limsup_{n \to +\infty} \frac{1}{n} \log d(v^{(n)}(A)(x), v^{(\infty)}(A)(x)) \le -\theta. \]
Finally, since $\varepsilon > 0$ can be taken arbitrarily small, and $\kappa$ can be taken arbitrarily close to $L_1(A) - L_2(A)$, we conclude that
\[ \limsup_{n \to +\infty} \frac{1}{n} \log d(v^{(n)}(A)(x), v^{(\infty)}(A)(x)) \le L_2(A) - L_1(A). \]
□


Given $0 \le k \le m$, we define a sequence of partial functions $v_k^{(n)}(A)$ on $X$ taking values in $\mathrm{Gr}_k(\mathbb{R}^m)$,
\[ v_k^{(n)}(A)(x) := \begin{cases} v_k(A^{(n)}(x)) & \text{if } \mathrm{gr}_k(A^{(n)}(x)) > 1, \\ \text{undefined} & \text{otherwise.} \end{cases} \]
The next proposition asserts the convergence of this sequence of partial functions. The limit function $v_k^{(\infty)}(A)$ is called the most expanding $k$-plane of the cocycle $A$.

Proposition 4.6 If $L_k(A) > L_{k+1}(A)$ then the sequence of partial functions $v_k^{(n)}(A)$ from $X$ to $\mathrm{Gr}_k(\mathbb{R}^m)$ is almost everywhere Cauchy. In particular, it converges $\mu$ almost everywhere to a (total) measurable function $v_k^{(\infty)}(A) : X \to \mathrm{Gr}_k(\mathbb{R}^m)$. Moreover, for $\mu$-a.e. $x \in X$,
\[ \limsup_{n \to +\infty} \frac{1}{n} \log d(v_k^{(n)}(A)(x), v_k^{(\infty)}(A)(x)) \le L_{k+1}(A) - L_k(A) < 0. \]

Proof Apply Proposition 4.4 to the cocycle $\wedge_k A$. □

Definition 4.5 Given a $\mu$-integrable cocycle $A$, we say that $x \in X$ is a $\mu$-regular point if whenever $L_j(A) > L_{j+1}(A)$, we have
\begin{align*}
\lim_{n \to +\infty} \frac{1}{n} \log \|\wedge_j A^{(n)}(x)\| &= L_1(\wedge_j A), \\
\limsup_{n \to +\infty} \frac{1}{n} \log d\left( v_j^{(n)}(A)(x), v_j^{(\infty)}(A)(x) \right) &\le L_{j+1}(A) - L_j(A).
\end{align*}

Proposition 4.7 The set of $\mu$-regular points of a cocycle has full $\mu$-measure.

Proof By Proposition 2.11, $\mathrm{gr}_1(\wedge_j A^{(n)}(x)) = \mathrm{gr}_j(A^{(n)}(x))$. By Definition 2.15 we have $\Psi\left( v_j(A^{(n)}(x)) \right) = v(\wedge_j A^{(n)}(x))$, where $\Psi$ stands for the Plücker embedding. To finish apply Corollary 4.1 and Proposition 4.6. □

We recall Definition 2.11 from Sect. 2.2. Given a linear map $A : V \to V'$ between Euclidean spaces $V$ and $V'$ of dimension $m$, we call singular basis of $A$ any orthonormal basis $\{v_j\}_{1 \le j \le m}$ of $V$ consisting of singular vectors $v_j$ of $A$ such that $\|A v_j\| = s_j(A)$ for all $j = 1, \ldots, m$. Notice that for every $1 \le k \le m$, the unit $k$-vector $v_1 \wedge \cdots \wedge v_k$ is a most expanding vector of $\wedge_k A$.

Proposition 4.8 Consider a $\mu$-integrable cocycle $A$, and a $\mu$-regular point $x \in X$. If $\gamma = L_1(A) = \cdots = L_k(A) > L_{k+1}(A)$ then for any $v \in v_k^{(\infty)}(A)(x) \setminus \{0\}$, we have
\[ \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x)\, v\| = \gamma. \]
In particular, $\lambda_A(x, v) = \lambda_A^-(x, v) = \gamma$.

Proof Consider a singular basis $\{v_{1,n}, \ldots, v_{m,n}\}$ for the linear map $A^{(n)}(x)$. Let $\{v_1, \ldots, v_k\} \subset v_k^{(\infty)}(A)(x)$ be an orthonormal family obtained as limit of the sequence $\{v_{1,n_s}, \ldots, v_{k,n_s}\}$, for some subsequence of integers $n_s$. Let $w_n = v_{1,n} \wedge \ldots \wedge v_{k,n}$ and $w = v_1 \wedge \ldots \wedge v_k$. Because $x$ is $\mu$-regular, $\hat{w}_n \to \hat{w}$. Thus, possibly changing the sign of $v_{1,n}$, we may assume that $w_n \to w$ as $n \to +\infty$. Therefore,
\begin{align*}
\left| \frac{\|\wedge_k A^{(n)}(x)\, w\|}{\|\wedge_k A^{(n)}(x)\, w_n\|} - 1 \right| &= \frac{\left| \|\wedge_k A^{(n)}(x)\, w\| - \|\wedge_k A^{(n)}(x)\, w_n\| \right|}{\|\wedge_k A^{(n)}(x)\|} \\
&\le \frac{\|\wedge_k A^{(n)}(x)\, w - \wedge_k A^{(n)}(x)\, w_n\|}{\|\wedge_k A^{(n)}(x)\|} \le \|w - w_n\| \to 0.
\end{align*}
Because of this, and since
\[ \|\wedge_k A^{(n)}(x)\, w\| \le \prod_{j=1}^{k} \|A^{(n)}(x)\, v_j\|, \]
we have
\begin{align*}
k \gamma = L_1(A) + \cdots + L_k(A) &= \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_k A^{(n)}(x)\| \\
&= \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_k A^{(n)}(x)\, w_n\| = \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_k A^{(n)}(x)\, w\| \\
&\le \liminf_{n \to \infty} \frac{1}{n} \sum_{j=1}^{k} \log \|A^{(n)}(x)\, v_j\| \le \limsup_{n \to \infty} \frac{1}{n} \sum_{j=1}^{k} \log \|A^{(n)}(x)\, v_j\| \\
&\le \lim_{n \to \infty} \frac{1}{n} \sum_{j=1}^{k} \log \|A^{(n)}(x)\| = k \gamma.
\end{align*}

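Consider now $c_{i,n} := \frac{1}{n} \log \|A^{(n)}(x)\| - \frac{1}{n} \log \|A^{(n)}(x)\, v_i\|$ which satisfies
\[ 0 \le c_{i,n} \le \sum_{j=1}^{k} c_{j,n} = \frac{k}{n} \log \|A^{(n)}(x)\| - \frac{1}{n} \sum_{j=1}^{k} \log \|A^{(n)}(x)\, v_j\|. \]
Since the right-hand side above converges to 0, $\lim_{n \to +\infty} c_{j,n} = 0$ for all $j = 1, \ldots, k$. Equivalently,
\[ \lambda_A(x, v_j) = \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x)\, v_j\| = \gamma. \]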

Now, given $v \in v_k^{(\infty)}(A)(x) \setminus \{0\}$, assume, by contradiction, that there exists a sequence $n_s \to +\infty$ such that
\[ \lim_{s \to +\infty} \frac{1}{n_s} \log \|A^{(n_s)}(x)\, v\| < \gamma. \tag{4.8} \]
Possibly changing the limit points $v_j$, and extracting a subsequence of $n_s$, we may assume that $v_{j,n_s} \to v_j$ as $s \to +\infty$, for all $1 \le j \le k$. Pick any $j$ such that $\langle v, v_j \rangle \ne 0$. Since the vectors $A^{(n_s)}(x)\, v_{j,n_s}$ are pairwise orthogonal,
\begin{align*}
\|A^{(n_s)}(x)\, v\|^2 &= \sum_{j=1}^{k} \langle v, v_{j,n_s} \rangle^2 \, \|A^{(n_s)}(x)\, v_{j,n_s}\|^2 \\
&= \sum_{j=1}^{k} \langle v, v_{j,n_s} \rangle^2 \, s_j(A^{(n_s)}(x))^2 \ge \langle v, v_{j,n_s} \rangle^2 \, s_j(A^{(n_s)}(x))^2.
\end{align*}
Hence, taking logarithms, dividing by $n_s$ and passing to the limit we get
\[ \lim_{s \to +\infty} \frac{1}{n_s} \log \|A^{(n_s)}(x)\, v\| \ge \lim_{s \to +\infty} \frac{1}{n_s} \log s_j(A^{(n_s)}(x)) = L_j(A) = \gamma, \]
which contradicts (4.8). This shows that
\[ \lambda_A(x, v) = \lim_{n \to +\infty} \frac{1}{n} \log \|A^{(n)}(x)\, v\| = \gamma, \]
and concludes the proof. □

Definition 4.6 The adjoint of a cocycle $(T, A)$ is the map $F_{A^*} : \mathcal{B} \to \mathcal{B}$, defined by $F_{A^*}(x, v) = (T^{-1} x, A(T^{-1} x)^* v)$. This cocycle is denoted by $(T^{-1}, A^*)$ or by $A^*$.

Remark 4.1 The adjoint cocycle satisfies for any $n \in \mathbb{N}$ and $x \in X$, $(A^*)^{(n)}(x) = A^{(n)}(T^{-n} x)^*$.

Proposition 4.9 If $A$ is $\mu$-integrable then the adjoint cocycle $A^*$ is also $\mu$-integrable. Moreover, the cocycle $A$ and its adjoint $A^*$ have the same Lyapunov exponents, $L_i(A) = L_i(A^*)$ for all $i = 1, \ldots, m$.

Proof The integrability of $A^*$ follows from the relation $\|A\| = \|A^*\|$. The second statement is a consequence of a linear operator and its adjoint sharing the same singular values. In fact, by Proposition 4.2 and the previous remark
\begin{align*}
L_i(A) &= \lim_{n \to +\infty} \frac{1}{n} \int_X \log s_i(A^{(n)}(x)) \, d\mu(x) \\
&= \lim_{n \to +\infty} \frac{1}{n} \int_X \log s_i(A^{(n)}(T^{-n} x)) \, d\mu(x) \\
&= \lim_{n \to +\infty} \frac{1}{n} \int_X \log s_i((A^*)^{(n)}(x)) \, d\mu(x) = L_i(A^*).
\end{align*}
□
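The identity of Remark 4.1 is easy to test numerically. The sketch below assumes a hypothetical quasi-periodic cocycle over the golden mean rotation (the specific matrix function `A` is made up for illustration) and compares the $n$th iterate of the adjoint cocycle, computed from Definition 4.6, with the transpose of $A^{(n)}(T^{-n} x)$.

```python
import math

ALPHA = (math.sqrt(5.0) - 1.0) / 2.0
IDENTITY = ((1.0, 0.0), (0.0, 1.0))

def T(x, n=1):
    """n-th power of the circle rotation by ALPHA (n may be negative)."""
    return (x + n * ALPHA) % 1.0

def A(x):
    """Hypothetical matrix-valued function defining the cocycle (illustration only)."""
    t = 2.0 * math.pi * x
    return ((2.0 + math.cos(t), 1.0), (math.sin(t), 1.0))

def matmul(g, h):
    (a, b), (c, d) = g
    (e, f), (p, q) = h
    return ((a * e + b * p, a * f + b * q), (c * e + d * p, c * f + d * q))

def transpose(g):
    return ((g[0][0], g[1][0]), (g[0][1], g[1][1]))

def iterate(x, n):
    """A^(n)(x) = A(T^{n-1} x) ... A(T x) A(x)."""
    g = IDENTITY
    for _ in range(n):
        g = matmul(A(x), g)
        x = T(x)
    return g

def adjoint_iterate(x, n):
    """n-th iterate of the adjoint cocycle, whose fiber map at x is A(T^{-1} x)^*."""
    g = IDENTITY
    for _ in range(n):
        x = T(x, -1)                       # the base map of the adjoint is T^{-1}
        g = matmul(transpose(A(x)), g)
    return g

lhs = adjoint_iterate(0.3, 6)
rhs = transpose(iterate(T(0.3, -6), 6))
print(lhs)
print(rhs)
```

The two printed matrices agree up to floating point rounding, which reflects the fact that taking adjoints reverses the order of a product.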



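Lemma 4.4 If $L_1(A) > L_2(A)$ then for $\mu$-almost every $x \in X$,
\[ \alpha\left( v^{(\infty)}(A^*)(x), v^{(\infty)}(A)(x) \right) > 0. \]

Proof Take $0 < \varepsilon \ll \kappa := L_1(A) - L_2(A)$, and consider the measurable set $\Omega_0$ and the order $n_0 \in \mathbb{N}$ provided by Proposition 4.5. For $x \in \Omega_0$ let $\{m_i\}_i$ be an $\varepsilon$-doubling sequence of avalanche times. Then for all $i \ge 0$,
\[ \alpha\left( A^{(m_i)}(x), A^{(m_{i+1} - m_i)}(T^{m_i} x) \right) \gtrsim \frac{\|A^{(m_{i+1})}(x)\|}{\|A^{(m_i)}(x)\| \, \|A^{(m_{i+1} - m_i)}(T^{m_i} x)\|} \ge e^{-5 m_i \varepsilon}. \]
Define $\Omega_i := T^{m_i} \Omega_0$. We have $\mu(X \setminus \Omega_i) < \varepsilon$, and for all $i \ge 0$ and $x \in \Omega_i$,
\begin{align*}
\alpha\left( v^{(m_i)}(A^*)(x), v^{(m_{i+1} - m_i)}(A)(x) \right) &= \alpha\left( v(A^{(m_i)}(T^{-m_i} x)^*), v(A^{(m_{i+1} - m_i)}(x)) \right) \\
&= \alpha\left( A^{(m_i)}(T^{-m_i} x), A^{(m_{i+1} - m_i)}(x) \right) \gtrsim e^{-5 m_i \varepsilon}.
\end{align*}
Notice that by Proposition 4.4, for $i$ large, the distances
\[ d(v^{(m_i)}(A^*)(x), v^{(\infty)}(A^*)(x)) \quad \text{and} \quad d(v^{(m_{i+1} - m_i)}(A)(x), v^{(\infty)}(A)(x)) \]
are much smaller than $e^{-5 m_i \varepsilon}$. Hence on the set $\Omega_i$
\[ \alpha\left( v^{(\infty)}(A^*)(x), v^{(\infty)}(A)(x) \right) \gtrsim e^{-5 m_i \varepsilon} > 0. \]
Since $\Omega_i$ has measure $\mu(\Omega_i) > 1 - \varepsilon$ with arbitrary $\varepsilon$, this proves the lemma. □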



Definition 4.7 Given a measurable sub-bundle $\hat{v} : X \to \mathbb{P}(\mathbb{R}^m)$, we call unit measurable section of $\hat{v}$ any measurable function $v : X \to \mathbb{R}^m$ such that $\|v(x)\| = 1$ and $v(x) \in \hat{v}(x)$ for $\mu$-a.e. $x \in X$.

Let us abbreviate $\hat{v}(x) := v^{(\infty)}(A)(x)$, $\hat{v}_n(x) := v^{(n)}(A)(x)$, $\hat{v}^*(x) := v^{(\infty)}(A^*)(x)$ and $\hat{v}_n^*(x) := v^{(n)}(A^*)(x)$. Moreover, let us respectively denote by $v(x)$, $v_n(x)$, $v^*(x)$ and $v_n^*(x)$ unit measurable sections of $\hat{v}(x)$, $\hat{v}_n(x)$, $\hat{v}^*(x)$ and $\hat{v}_n^*(x)$.

Lemma 4.5 Assume $L_1(A) > L_2(A)$ and let $v : X \to \mathbb{R}^m$ be a unit measurable section of $v^{(\infty)}(A)$. Then $A(x)^* v(Tx) \ne 0$ for $\mu$-almost every $x \in X$.

Proof By Lemma 4.4, $\alpha(\hat{v}(Tx), \hat{v}^*(Tx)) > 0$ for $\mu$-a.e. $x \in X$. By Proposition 4.4 applied to the adjoint cocycle $A^*$, for $\mu$-a.e. $x \in X$ and all large enough $n \ge 1$,
\begin{align*}
\alpha_0 := \alpha(\hat{v}(Tx), v(A^{(n)}(T^{-n+1} x)^*)) &= \alpha(\hat{v}(Tx), v((A^*)^{(n)}(Tx))) \\
&= \alpha(\hat{v}(Tx), v^{(n)}(A^*)(Tx)) = \alpha(\hat{v}(Tx), \hat{v}_n^*(Tx)) > 0.
\end{align*}
Hence by item (a) of Proposition 2.22
\[ \|A^{(n)}(T^{-n+1} x)^* v(Tx)\| \ge \alpha_0 \, \|A^{(n)}(T^{-n+1} x)^*\| > 0. \]
Finally, since $A^{(n)}(T^{-n+1} x)^* v(Tx) = A^{(n-1)}(T^{-n+1} x)^* A(x)^* v(Tx)$, we infer that $A(x)^* v(Tx) \ne 0$. □



From now on, given a matrix $A(x)$ and a projective or Grassmannian point $\hat{v}$ we will abbreviate $\varphi_{A(x)} \hat{v}$ and $\varphi_{A(x)^{-1}} \hat{v}$ writing respectively $A(x)\, \hat{v}$ and $A(x)^{-1} \hat{v}$. With this notation, from (2.16) we obtain
\[ A^{(n)}(x)\, v(A^{(n)}(x)) = v(A^{(n)}(x)^*) \quad \text{and} \quad A^{(n)}(x)^* v(A^{(n)}(x)^*) = v(A^{(n)}(x)). \tag{4.9} \]
The following proposition establishes the invariance of the most expanding sub-bundles $v_k^{(\infty)}(A)$.


Proposition 4.10 If $L_k(A) > L_{k+1}(A)$ then for $\mu$-a.e. $x \in X$,
(a) $A(x)^* [v_k^{(\infty)}(A)(Tx)] = v_k^{(\infty)}(A)(x)$,
(b) $A(x)^{-1} [v_k^{(\infty)}(A)(Tx)^\perp] = v_k^{(\infty)}(A)(x)^\perp$.

Proof By Proposition 2.15, (b) reduces to (a). Working with exterior powers we can reduce (a) to the case $k = 1$, i.e., (a) reduces to the identity $A(x)^* \hat{v}(Tx) = \hat{v}(x)$. By Proposition 4.4 and (4.9),
\[ \hat{v}(x) \approx \hat{v}_n(x) = v(A^{(n)}(x)) = A^{(n)}(x)^* v(A^{(n)}(x)^*) = A^{(n)}(x)^* \hat{v}_n^*(T^n x) \]
and analogously
\[ \hat{v}(Tx) \approx \hat{v}_{n-1}(Tx) = A^{(n-1)}(Tx)^* \hat{v}_{n-1}^*(T^n x). \]
Hence
\begin{align*}
A(x)^* \hat{v}(Tx) &\approx A(x)^* \hat{v}_{n-1}(Tx) = A^{(n)}(x)^* \hat{v}_{n-1}^*(T^n x) \\
&\approx A^{(n)}(x)^* \hat{v}_n^*(T^n x) = \hat{v}_n(x) \approx \hat{v}(x).
\end{align*}
Item (a) follows from taking limits in these proximity relations.

On the first occurrence of $\approx$ we use the continuity of the projective action of $A(x)^*$ and Lemma 4.5. This lemma asserts that $A(x)^* v(Tx) \ne 0$ for $\mu$-a.e. $x \in X$ and any unit measurable section $v$ of $\hat{v}$. By Proposition 2.26 we have
\[ d\left( A(x)^* \hat{v}(Tx), A(x)^* \hat{v}_{n-1}(Tx) \right) \le C \, d\left( \hat{v}(Tx), \hat{v}_{n-1}(Tx) \right) \to 0 \]
where $C$ is any constant larger than $\|A(x)^* v(Tx)\|^{-1}$.

On the second occurrence of $\approx$, take $0 < \kappa < L_1(A) - L_2(A)$, $0 < \varepsilon \ll \kappa$ arbitrarily small and, by Egorov's theorem, a measurable subset $E \subset X$ such that $\hat{v}_n^*$ converges uniformly to $\hat{v}^*$ on $E$. Then choose a sequence of times $n \in \mathbb{N}$ such that $T^n x \in E$ and $\mathrm{gr}(A^{(n)}(x)^*) = \mathrm{gr}(A^{(n)}(x)) \ge e^{n \kappa}$. Because of this large gap ratio, $A^{(n)}(x)^*$ acts as a strong contraction in a neighborhood of $\hat{v}_n^*(T^n x)$. But for $T^n x \in E$, $\hat{v}_n^*(T^n x)$ and $\hat{v}_{n-1}^*(T^n x)$ are both very close to $\hat{v}^*(T^n x)$, and hence close to each other. Thus
\[ \delta\left( A(x)^* \hat{v}_{n-1}(Tx), A(x)^* \hat{v}_n(Tx) \right) \lesssim \delta(\hat{v}_{n-1}(Tx), \hat{v}_n(Tx)) \]
converges to 0 as $n \to +\infty$.

On the last occurrence of $\approx$ we apply Proposition 4.4. □



Lemma 4.6 Given a measurable function $f : X \to \mathbb{R}$ such that $f - f \circ T \in L^1(X, \mu)$, for $\mu$-a.e. $x \in X$,
\[ \lim_{n \to +\infty} \frac{1}{n} f(T^n x) = 0. \]

Proof First prove that $\liminf_{n \to +\infty} \frac{1}{n} f(T^n x) = 0$ using Poincaré's recurrence theorem. Then notice that
\[ \frac{1}{n} f(T^n x) = \frac{1}{n} f(x) - \frac{1}{n} \sum_{j=0}^{n-1} (f - f \circ T)(T^j x), \]
and apply Birkhoff's ergodic theorem. □

Lemma 4.7 Let $T : X \to X$ be an ergodic MPT on a probability space $(X, \mu)$, and let $f : X \to (0, +\infty)$ be a measurable non-integrable function. Then for $\mu$-a.e. $x \in X$,
\[ \lim_{n \to +\infty} \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) = +\infty. \]

Proof Defining the truncations $f_n = \min\{f, n\}$, by Lebesgue's monotone convergence theorem
\[ \lim_{n \to +\infty} \int_X f_n \, d\mu = \int_X f \, d\mu = +\infty. \]
For each $n \in \mathbb{N}$, since $f_n$ is $\mu$-integrable there is a full measure set $B_n \subseteq X$ such that for all $x \in B_n$, $\lim_{m \to +\infty} \frac{1}{m} \sum_{j=0}^{m-1} f_n(T^j x) = \int_X f_n \, d\mu$. Thus $B = \cap_{n \in \mathbb{N}} B_n$ is also a full measure set. Given $x \in B$ and $L > 0$, consider $p \in \mathbb{N}$ such that $\int_X f_p \, d\mu > L$. Since $x \in B_p$, there is an order $n_0 = n_0(x) > p$ such that for $n \ge n_0$
\[ \frac{1}{n} \sum_{j=0}^{n-1} f(T^j x) \ge \frac{1}{n} \sum_{j=0}^{n-1} f_p(T^j x) \ge L, \]
which proves the lemma. □

Proposition 4.11 Assume $L_1(A) > L_2(A)$ and let $v, v^* : X \to \mathbb{R}^m$ be unit measurable sections of $v^{(\infty)}(A)$ and $v^{(\infty)}(A^*)$, respectively. Then the functions $\log \|A\, v^*\|$ and $\log \|(A \circ T^{-1})^* v\|$ are $\mu$-integrable, and
\[ \int_X \log \|A(x)\, v^*(x)\| \, d\mu(x) = \int_X \log \|A(T^{-1} x)^* v(x)\| \, d\mu(x) = L_1(A). \]

Proof Because the cocycles $A$ and $A^*$ play symmetric roles, it is enough to prove the $\mu$-integrability of the function $\log \|A\, v^*\|$. Applying Proposition 4.10 to $A^*$, we see that $A(x)\, v^*(x) = \pm \|A(x)\, v^*(x)\| \, v^*(Tx)$. From this invariance relation, for $\mu$-a.e. $x \in X$,
\[ \log \|A^{(n)}(x)\, v^*(x)\| = \sum_{j=0}^{n-1} \log \|A(T^j x)\, v^*(T^j x)\|. \]


Let $v_n : X \to \mathbb{R}^m$ be a unit measurable section of $v^{(n)}(A)$. For notational simplicity we will also write $\hat{v}^*(x) := v^{(\infty)}(A^*)(x)$ and $\hat{v}_n(x) := v^{(n)}(A)(x)$. By item (a) of Proposition 2.22,
\[ \|A^{(n)}(x)\, v^*(x)\| \ge \alpha\left( \hat{v}^*(x), \hat{v}_n(x) \right) \|A^{(n)}(x)\|, \]
and hence
\[ \frac{1}{n} \log \|A^{(n)}(x)\| + \frac{1}{n} \log \alpha\left( \hat{v}^*(x), \hat{v}_n(x) \right) \le \frac{1}{n} \log \|A^{(n)}(x)\, v^*(x)\| \le \frac{1}{n} \log \|A^{(n)}(x)\|. \]
By Proposition 4.1, $\frac{1}{n} \log \|A^{(n)}(x)\|$ converges to $L_1(A)$ almost surely. By Lemma 4.4, $\alpha\left( \hat{v}^*(x), \hat{v}(x) \right) > 0$, and hence $\frac{1}{n} \log \alpha(\hat{v}^*(x), \hat{v}_n(x))$ converges to zero. Thus, for $\mu$-almost every $x \in X$,
\[ \lim_{n \to \infty} \frac{1}{n} \sum_{j=0}^{n-1} \log \|A(T^j x)\, v^*(T^j x)\| = \lim_{n \to \infty} \frac{1}{n} \log \|A^{(n)}(x)\, v^*(x)\| = L_1(A). \]
The function $\log \|A(x)\, v^*(x)\|$ is bounded from above by the $\mu$-integrable function $\log^+ \|A(x)\|$. Hence, $h(x) := \log^+ \|A(x)\| - \log \|A(x)\, v^*(x)\|$ is a non-negative measurable function whose Birkhoff averages converge $\mu$-almost everywhere to $\int \log^+ \|A\| \, d\mu - L_1(A)$. By Lemma 4.7 it follows that $h \in L^1(X, \mu)$, which implies that $\log \|A\, v^*\| \in L^1(X, \mu)$. Thus, by Birkhoff's theorem, $\int_X \log \|A(x)\, v^*(x)\| \, d\mu = L_1(A)$. □

Proposition 4.12 Assume $L_1(A) > L_2(A)$. Then for $\mu$-a.e. $x \in X$,
(a) $\displaystyle \lim_{n \to +\infty} \frac{1}{n} \log \alpha\left( v^{(\infty)}(A^*)(T^n x), v^{(\infty)}(A)(T^n x) \right) = 0$.
(b) $\displaystyle \limsup_{n \to +\infty} \frac{1}{n} \log \alpha\left( A^{(n)}(x)\, v^{(\infty)}(A)(x), v^{(\infty)}(A)(T^n x) \right) = 0$.

Proof Take unit measurable sections $v, v^* : X \to \mathbb{R}^m$ of $v^{(\infty)}(A)$ and $v^{(\infty)}(A^*)$, respectively, and as before let us write $\hat{v}(x) := v^{(\infty)}(A)(x)$ and $\hat{v}^*(x) := v^{(\infty)}(A^*)(x)$. Consider the function $f(x) := \log \alpha(\hat{v}^*(x), \hat{v}(x))$. By Lemma 4.6, for (a) it is enough to prove that $f - f \circ T \in L^1(\mu)$. By Proposition 4.10 we have
\begin{align*}
f(x) - f(Tx) &= \log \frac{\alpha(\hat{v}^*(x), \hat{v}(x))}{\alpha(\hat{v}^*(Tx), \hat{v}(Tx))} = \log \frac{\alpha(\hat{v}^*(x), A(x)^* \hat{v}(Tx))}{\alpha(A(x)\, \hat{v}^*(x), \hat{v}(Tx))} \\
&= \log \left( \frac{|\langle v^*(x), A(x)^* v(Tx) \rangle|}{\|A(x)^* v(Tx)\|} \cdot \frac{\|A(x)\, v^*(x)\|}{|\langle A(x)\, v^*(x), v(Tx) \rangle|} \right) \\
&= \log \|A(x)\, v^*(x)\| - \log \|A(x)^* v(Tx)\|.
\end{align*}
By Proposition 4.11, $\log \|A\, v^*\| \in L^1(X, \mu)$, and $\log \|A^* (v \circ T)\| \in L^1(X, \mu)$. Hence by Lemma 4.6 this implies (a).


As before, we use the notation $\hat v_n$ and $\hat v_n^*$ for the sub-bundles $\mathfrak{v}^{(n)}(A)$ and $\mathfrak{v}^{(n)}(A^*)$, respectively. Since $A^{(n)}(x)\,\hat v_n(x) = \hat v_n^*(T^n x)$, by Proposition 2.35 we have
\[ \alpha\big(A^{(n)}(x)\,\hat v(x), \hat v(T^n x)\big) \ge \alpha\big(\hat v^*(T^n x), \hat v(T^n x)\big) - d\big(\hat v^*(T^n x), \hat v_n^*(T^n x)\big) - d\big(A^{(n)}(x)\,\hat v_n(x), A^{(n)}(x)\,\hat v(x)\big). \]
Now take $0 < \kappa < L_1(A) - L_2(A)$ and $0 < \varepsilon \ll \kappa$ arbitrarily small. By item (a), for all large enough $n$ we have $\alpha(\hat v^*(T^n x), \hat v(T^n x)) \ge e^{-n\varepsilon}$. Because $A^{(n)}(x)$ has a large gap ratio as $n$ grows, it acts as a strong contraction in a neighborhood of $\hat v_n^*(x)$. Hence by Proposition 4.4, for all $n$ large enough,
\[ d\big(A^{(n)}(x)\,\hat v_n(x), A^{(n)}(x)\,\hat v(x)\big) \ll d\big(\hat v_n(x), \hat v(x)\big) \le e^{-n(\kappa-\varepsilon)}. \]
We cannot guarantee that the second distance $d(\hat v^*(T^n x), \hat v_n^*(T^n x))$ converges to $0$ $\mu$-almost everywhere, but since $\hat v_n^*$ converges almost surely to $\hat v^*$, with the speed provided by Proposition 4.4, for $\mu$-a.e. $x \in X$ there is a sequence of times $\{n_i\}_i$ such that
\[ d\big(\hat v^*(T^{n_i} x), \hat v_{n_i}^*(T^{n_i} x)\big) \le e^{-n_i(\kappa-\varepsilon)} \quad \forall\, i. \]
Thus, taking logarithms and dividing by $n$, (b) follows.



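Proposition 4.13 Given $x \in X$ and unit vectors $v_k \in \wedge_k E(x)$ and $v_r \in \wedge_r E(x)$,
\[ \lambda_{\wedge_{k+r}A}(x, v_k\wedge v_r) \le \lambda_{\wedge_k A}(x, v_k) + \lambda_{\wedge_r A}(x, v_r), \]
\[ \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r) \le \lambda^{-}_{\wedge_k A}(x, v_k) + \lambda^{-}_{\wedge_r A}(x, v_r). \]

Proof By item (a) of Proposition 2.34,
\[ \log\|\wedge_{k+r}A^{(n)}(x)\,(v_k\wedge v_r)\| \le \log\|\wedge_k A^{(n)}(x)\,v_k\| + \log\|\wedge_r A^{(n)}(x)\,v_r\|. \]
Hence, dividing by $n$ and passing to the limit, the inequalities follow.

Proposition 4.14 Assume $L_k(A) > L_{k+1}(A)$. Given unit vectors $v_k \in \wedge_k\big[\mathfrak{v}_k^{(\infty)}(A)(x)\big]$ and $v_r \in \wedge_r\big[\mathfrak{v}_k^{(\infty)}(A)(x)^\perp\big]$,
\[ \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r) = \lambda_{\wedge_k A}(x, v_k) + \lambda^{-}_{\wedge_r A}(x, v_r). \]
Moreover, if $\lambda^{-}_{\wedge_r A}(x, v_r) = \lambda_{\wedge_r A}(x, v_r)$ then $\lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r) = \lambda_{\wedge_{k+r}A}(x, v_k\wedge v_r)$.

Proof Because $\wedge_k[\mathfrak{v}_k^{(\infty)}(A)]$ has dimension one, $v_k$ is a limit point of the sequence of most expanding vectors for $\wedge_k A^{(n)}(x)$. Hence, by Proposition 4.8 we have $\lambda^{-}_{\wedge_k A}(x, v_k) = \lambda_{\wedge_k A}(x, v_k)$.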


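In view of Proposition 4.13, it is enough to prove that
\[ \lambda_{\wedge_k A}(x, v_k) + \lambda^{-}_{\wedge_r A}(x, v_r) \le \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r). \]
By Proposition 4.10 we have $\wedge_r A^{(n)}(x)\,v_r \in \wedge_r\big[\mathfrak{v}_k^{(\infty)}(A)(T^n x)^\perp\big]$. Hence by Proposition 2.34 (b),
\[ \|\wedge_k A^{(n)}(x)\,v_k\|\;\|\wedge_r A^{(n)}(x)\,v_r\| \le \alpha_n(x)^{-1}\,\|\wedge_{k+r}A^{(n)}(x)\,(v_k\wedge v_r)\|, \]
where
\[ \alpha_n(x) := \alpha_k\big(A^{(n)}(x)\,\mathfrak{v}_k^{(\infty)}(A)(x),\; \mathfrak{v}_k^{(\infty)}(A)(T^n x)\big). \]
Therefore, by Proposition 4.12 (b),
\[ \begin{aligned} \lambda_{\wedge_k A}(x, v_k) + \lambda^{-}_{\wedge_r A}(x, v_r) &= \liminf_{n\to+\infty} \frac{1}{n}\log\big(\|\wedge_k A^{(n)}(x)\,v_k\|\,\|\wedge_r A^{(n)}(x)\,v_r\|\big) \\ &\le \liminf_{n\to+\infty} \frac{1}{n}\log\|\wedge_{k+r}A^{(n)}(x)\,(v_k\wedge v_r)\| + \liminf_{n\to+\infty} \frac{1}{n}\log\alpha_n(x)^{-1} \\ &\le \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r) - \limsup_{n\to+\infty} \frac{1}{n}\log\alpha_n(x) = \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r). \end{aligned} \]
Assume now that $\lambda^{-}_{\wedge_r A}(x, v_r) = \lambda_{\wedge_r A}(x, v_r)$. Combining Proposition 4.13 with the previous inequality,
\[ \lambda_{\wedge_{k+r}A}(x, v_k\wedge v_r) \le \lambda_{\wedge_k A}(x, v_k) + \lambda_{\wedge_r A}(x, v_r) \le \lambda_{\wedge_k A}(x, v_k) + \lambda^{-}_{\wedge_r A}(x, v_r) \le \lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r), \]
which implies that $\lambda^{-}_{\wedge_{k+r}A}(x, v_k\wedge v_r) = \lambda_{\wedge_{k+r}A}(x, v_k\wedge v_r)$.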



Definition 4.8 Given a $\mu$-regular point $x \in X$ of a cocycle $A$, we call limit singular basis of the fiber $E(x)$ any orthonormal basis $\{u_1,\dots,u_m\}$ of $E(x)$ obtained as a limit point of a sequence of singular bases $\{u_{1,n},\dots,u_{m,n}\}$ of $A^{(n)}(x)$.

Lemma 4.8 Let $\{u_1,\dots,u_m\}$ be a limit singular basis of $E(x)$ at some $\mu$-regular point $x \in X$. Then for all $i = 1,\dots,m$,
\[ \lambda^{-}_{\wedge_i A}(x, u_1\wedge\dots\wedge u_i) = \lambda_{\wedge_i A}(x, u_1\wedge\dots\wedge u_i) = L_1(\wedge_i A). \]


Proof Let $\{u_{1,n},\dots,u_{m,n}\}$ be a singular basis of $A^{(n)}(x)$, and $\{u_1,\dots,u_m\}$ a corresponding limit singular basis for the cocycle $A$. Choose $k$ such that $L_1(\wedge_i A) = \dots = L_k(\wedge_i A) > L_{k+1}(\wedge_i A)$. Since $u_1\wedge\dots\wedge u_i$ is a limit point of $u_{1,n}\wedge\dots\wedge u_{i,n}$, which is a sequence of vectors in $\mathfrak{v}_k^{(n)}(\wedge_i A)(x)$, we infer that $u_1\wedge\dots\wedge u_i \in \mathfrak{v}_k^{(\infty)}(\wedge_i A)(x)$. The conclusion then follows by applying Proposition 4.8 to the cocycle $\wedge_i A$.

Proposition 4.15 Consider a cocycle $A$ such that $L_k(A) > L_{k+1}(A)$. Then
\[ L_i\big(A|_{\mathfrak{v}_k^\perp}\big) = L_{i+k}(A) \quad \text{for any } 1 \le i \le m-k, \]
where $A|_{\mathfrak{v}_k^\perp}$ stands for the restriction of $A$ to the invariant bundle $\mathfrak{v}_k^{(\infty)}(A)^\perp$.

Proof It is enough to see that
\[ L_1(\wedge_k A) + L_1\big(\wedge_i A|_{\mathfrak{v}_k^\perp}\big) = L_1(\wedge_{i+k} A). \tag{4.10} \]
In fact, from (4.10), using Corollary 4.1,
\[ L_1(A) + \dots + L_k(A) + L_1\big(A|_{\mathfrak{v}_k^\perp}\big) + \dots + L_i\big(A|_{\mathfrak{v}_k^\perp}\big) = L_1(A) + \dots + L_{i+k}(A). \]
Therefore, the conclusion follows by subtracting these identities for consecutive indices $i$ and $i-1$.

Let us prove (4.10). This identity reduces to a pair of inequalities. We will use Propositions 4.13 and 4.14 to establish each of these inequalities. Fix a $\mu$-regular point $x \in X$, and consider a limit singular basis $\{u_1,\dots,u_m\}$ of the fiber $E(x)$. By Lemma 4.8,
\[ \begin{aligned} L_1(\wedge_{i+k} A) &= \lambda_{\wedge_{i+k}A}(x, u_1\wedge\dots\wedge u_{k+i}) \\ &\le \lambda_{\wedge_k A}(x, u_1\wedge\dots\wedge u_k) + \lambda_{\wedge_i A}(x, u_{k+1}\wedge\dots\wedge u_{k+i}) \\ &\le L_1(\wedge_k A) + L_1\big(\wedge_i A|_{\mathfrak{v}_k^\perp}\big). \end{aligned} \]
On the last step we use that $u_{k+1}\wedge\dots\wedge u_{k+i}$ is a non-zero vector in the fiber of the bundle $\wedge_i\big[\mathfrak{v}_k^{(\infty)}(A)^\perp\big]$. For the converse inequality, choose an orthonormal basis $\{u_1,\dots,u_k\}$ of $\mathfrak{v}_k^{(\infty)}(A)(x)$ and extend it with a limit singular basis $\{u_{k+1},\dots,u_m\}$ for the cocycle $A|_{\mathfrak{v}_k^\perp}$. By Lemma 4.8 applied to the cocycle $A|_{\mathfrak{v}_k^\perp}$ we get
\[ \lambda^{-}_{\wedge_i A}(x, u_{k+1}\wedge\dots\wedge u_{k+i}) = \lambda_{\wedge_i A}(x, u_{k+1}\wedge\dots\wedge u_{k+i}) = L_1\big(\wedge_i A|_{\mathfrak{v}_k^\perp}\big). \]


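Hence, by Propositions 4.14 and 4.8,
\[ \begin{aligned} L_1(\wedge_{i+k} A) &\ge \lambda_{\wedge_{k+i}A}(x, u_1\wedge\dots\wedge u_k\wedge u_{k+1}\wedge\dots\wedge u_{k+i}) \\ &= \lambda_{\wedge_k A}(x, u_1\wedge\dots\wedge u_k) + \lambda_{\wedge_i A}(x, u_{k+1}\wedge\dots\wedge u_{k+i}) \\ &= L_1(\wedge_k A) + L_1\big(\wedge_i A|_{\mathfrak{v}_k^\perp}\big). \end{aligned} \]
Putting together these two inequalities we prove (4.10).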



Proposition 4.16 Consider integers $1 \le k < k+r \le m$ such that $L_k(A) > L_{k+1}(A) = \dots = L_{k+r}(A) > L_{k+r+1}(A)$. Then for $\mu$-almost every $x \in X$,
\[ \mathfrak{v}_r^{(\infty)}\big(A|_{\mathfrak{v}_k^\perp}\big)(x) = \mathfrak{v}_k^{(\infty)}(A)(x)^\perp \cap \mathfrak{v}_{k+r}^{(\infty)}(A)(x). \]
In particular, for every non-zero vector $v$ in the fiber over $x$ of this sub-bundle,
\[ \lambda^{-}_A(x, v) = \lambda_A(x, v) = \lim_{n\to+\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v\| = L_{k+1}(A). \]

Proof The stated relation is a simple application of Proposition 2.41 to the matrices $g = A^{(n)}(x)$. Notice that for a generic point $x \in X$ these matrices have exponentially large gap ratios $\mathrm{gr}_k(A^{(n)}(x))$ and $\mathrm{gr}_{k+r}(A^{(n)}(x))$. By this proposition,
\[ \delta\Big(\mathfrak{v}_r\big(A^{(n)}(x)|_{\mathfrak{v}_k^\perp}\big),\; \mathfrak{v}_{k+r}^{(n)}(A)(x) \cap \mathfrak{v}_k^{(\infty)}(A)(x)^\perp\Big) \lesssim \delta\big(\mathfrak{v}_k^{(n)}(A)(x),\, \mathfrak{v}_k^{(\infty)}(A)(x)\big), \]
where the right-hand side converges to zero. Hence the proposition follows by taking the limit as $n$ tends to $+\infty$. The last statement is a consequence of Proposition 4.8.

Given a signature $\tau = (\tau_1,\dots,\tau_k)$, we define a sequence of partial functions $\mathfrak{v}_\tau^{(n)}(A)$ on $X$ taking values in $\mathscr{F}_\tau(\mathbb{R}^m)$,
\[ \mathfrak{v}_\tau^{(n)}(A)(x) := \begin{cases} \mathfrak{v}_\tau(A^{(n)}(x)) & \text{if } \mathrm{gr}_\tau(A^{(n)}(x)) > 1, \\ \text{undefined} & \text{otherwise,} \end{cases} \]
whose convergence is established below. The almost sure limit of this sequence of functions, denoted by $\mathfrak{v}_\tau^{(\infty)}(A)$, will be called the most expanding $\tau$-flag of $A$. We say that the Lyapunov spectrum of a cocycle $A$ has a $\tau$-gap pattern when
\[ L_{\tau_j}(A) > L_{\tau_j+1}(A) \quad \text{for all } 1 \le j < k. \]

The size of these gaps is measured by gapτ (A) := min Lτj (A) − Lτj +1 (A). 1≤j λ2 (A) > · · · > λk (A) > λk+1 (A) ≥ −∞.

Proposition 4.17 If the Lyapunov spectrum of $A$ has a $\tau$-gap pattern, then the sequence of partial functions $\mathfrak{v}_\tau^{(n)}(A)$ from $X$ to $\mathscr{F}_\tau(\mathbb{R}^m)$ is almost everywhere Cauchy. In particular, it converges $\mu$-almost everywhere to a (total) measurable function $\mathfrak{v}_\tau^{(\infty)}(A) : X \to \mathscr{F}_\tau(\mathbb{R}^m)$. Moreover, for $\mu$-a.e. $x \in X$,
\[ \limsup_{n\to+\infty} \frac{1}{n}\log d\big(\mathfrak{v}_\tau^{(n)}(A)(x),\, \mathfrak{v}_\tau^{(\infty)}(A)(x)\big) \le -\mathrm{gap}_\tau(A) < 0. \]

Proof Apply Proposition 4.6 at the dimensions i = τj , with j = 1, . . . , k.



We are now able to state and prove the Oseledets Multiplicative Ergodic Theorem, which has two versions, one on the existence of the Oseledets filtration, and the other on the existence of the Oseledets decomposition.

Theorem 4.4 (Oseledets I) Let $T : X \to X$ be an invertible ergodic MPT of a probability space $(X, \mathcal{F}, \mu)$, and let $F_A : B \to B$ be a $\mu$-integrable linear cocycle on a measurable bundle $B \subseteq X \times \mathbb{R}^m$. Then there exist $\lambda_1 > \lambda_2 > \dots > \lambda_k \ge -\infty$ and a family of measurable functions $F_j : X \to \mathrm{Gr}(\mathbb{R}^m)$, $1 \le j \le k$, such that for $\mu$-almost every $x \in X$,
(a) $A(x)\,F_j(x) \subseteq F_j(Tx)$ for $j = 1,\dots,k$,
(b) $\{0\} = F_{k+1}(x) \subsetneq F_k(x) \subsetneq \dots \subsetneq F_2(x) \subsetneq F_1(x) = B(x)$,
(c) for every $v \in F_j(x) \setminus F_{j+1}(x)$, $\lim_{n\to+\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v\| = \lambda_j$.

Proof Assume the cocycle $A$ has a Lyapunov spectrum with exact gap pattern $\tau = (\tau_1,\dots,\tau_{k-1})$, where $0 = \tau_0 < \tau_1 < \dots < \tau_{k-1} < \tau_k = \dim E$, and $E = E(x)$ denotes the fiber of $B$. Set by convention $\mathfrak{v}_{\tau_0}^{(\infty)}(A) = \{0\}$ and $\mathfrak{v}_{\tau_k}^{(\infty)}(A) = E(x)$. Define $F_j(x) := \mathfrak{v}_{\tau_{j-1}}^{(\infty)}(A)(x)^\perp$ for $j = 1,\dots,k+1$, so that $\dim F_j(x) = \dim E - \tau_{j-1}$. This implies (b). The invariance (a) follows from Proposition 4.10.

To shorten notation let us respectively write $\mathfrak{v}_k$ and $\mathfrak{v}_k^\perp$ instead of $\mathfrak{v}_k^{(\infty)}(A)(x)$ and $\mathfrak{v}_k^{(\infty)}(A)(x)^\perp$. Given $v \in F_j \setminus F_{j+1} = \mathfrak{v}_{\tau_{j-1}}^\perp \setminus \mathfrak{v}_{\tau_j}^\perp$, consider the orthogonal decomposition


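$v = u + v'$, with $u \in \mathfrak{v}_{\tau_{j-1}}^\perp \cap \mathfrak{v}_{\tau_j}$, $u \ne 0$, and $v' \in \mathfrak{v}_{\tau_j}^\perp$. By Proposition 4.16 the non-zero vector $u$ is in the fiber of
\[ \mathfrak{v}_{\tau_j-\tau_{j-1}}^{(\infty)}\big(A|_{\mathfrak{v}_{\tau_{j-1}}^\perp}\big) = \mathfrak{v}_{\tau_{j-1}}^{(\infty)}(A)^\perp \cap \mathfrak{v}_{\tau_j}^{(\infty)}(A), \]
and
\[ \lambda^{-}_A(x, u) = \lambda_A(x, u) = L_{\tau_{j-1}+1}(A) = L_{\tau_j}(A) = \lambda_j(A). \]
Analogously, and using Proposition 4.15,
\[ \lambda_A(x, v') \le L_1\big(A|_{\mathfrak{v}_{\tau_j}^\perp}\big) = L_{\tau_j+1}(A) = L_{\tau_{j+1}}(A) = \lambda_{j+1}(A) < \lambda_j(A). \]
Finally, applying item (d) of Proposition 4.3 we infer that
\[ \lambda^{-}_A(x, v) = \lambda_A(x, v) = \lambda_A(x, u + v') = \lambda_A(x, u) = \lambda_j(A). \]
This proves (c).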
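
The index bookkeeping in this proof can be checked on a small example. The following is a minimal sketch rather than part of the original argument: the function names are hypothetical, the Lyapunov exponents are assumed to be supplied as a non-increasing list, and equality of exponents is tested exactly, which is only sensible for hand-picked values.

```python
def exact_gap_pattern(exponents):
    """Indices j (1-based) with L_j > L_{j+1}, for a non-increasing list L_1 >= ... >= L_m."""
    return tuple(j + 1 for j in range(len(exponents) - 1)
                 if exponents[j] > exponents[j + 1])


def filtration_dimensions(tau, m):
    """Dimensions of F_1, F_2, ..., {0}, using dim F_j = m - tau_{j-1} with tau_0 = 0.

    The signature is extended by m, which plays the role of the last index in the proof.
    """
    extended = (0,) + tuple(tau) + (m,)
    return [m - t for t in extended]


def distinct_exponents(exponents, tau):
    """The distinct values lambda_j = L_{tau_j}, with the signature extended by m."""
    m = len(exponents)
    return [exponents[t - 1] for t in tuple(tau) + (m,)]


# Hypothetical spectrum with L_1 = L_2 > L_3 > L_4 = L_5, so m = 5.
spectrum = [1.0, 1.0, 0.5, -0.2, -0.2]
tau = exact_gap_pattern(spectrum)
print(tau)                                  # (2, 3)
print(filtration_dimensions(tau, 5))        # [5, 3, 2, 0]
print(distinct_exponents(spectrum, tau))    # [1.0, 0.5, -0.2]
```

In this example the filtration has three non-trivial subspaces of dimensions 5, 3 and 2, one for each distinct exponent, followed by the zero subspace.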

Recall that $R_g$ and $K_g$ denote respectively the range and the kernel of any given linear map $g : V \to V'$.

Definition 4.9 Given a linear map $g : V \to V'$ between Euclidean spaces $V$ and $V'$, its pseudo inverse $g^+ : V' \to V$ is the composition $g^+ := (g|_{K_g^\perp})^{-1} \circ \pi_{R_g}$ of the orthogonal projection $\pi_{R_g} : V' \to R_g$ with the inverse of $g|_{K_g^\perp} : K_g^\perp \to R_g$.

Lemma 4.9 For any linear map $g : V \to V'$, and integer $0 \le k \le \dim V$, $\wedge_k(g^+) = (\wedge_k g)^+$.

Proof We make use of three functorial properties of exterior powers which can be easily checked: (1) $\wedge_k R_g = R_{\wedge_k g}$, (2) $\wedge_k(g|_E) = \wedge_k g|_{\wedge_k E}$ and (3) $(\wedge_k g)^{-1} = \wedge_k(g^{-1})$. Thus $\wedge_k(K_g^\perp) = \wedge_k(R_{g^*}) = R_{\wedge_k g^*} = K_{\wedge_k g}^\perp$, and
\[ \wedge_k(g^+) = \wedge_k(g|_{K_g^\perp})^{-1} \circ \wedge_k \pi_{R_g} = (\wedge_k g|_{\wedge_k K_g^\perp})^{-1} \circ \pi_{\wedge_k R_g} = (\wedge_k g|_{K_{\wedge_k g}^\perp})^{-1} \circ \pi_{R_{\wedge_k g}} = (\wedge_k g)^+. \]
These cumbersome algebraic calculations have a natural geometric meaning.

Definition 4.10 Given a cocycle $A : X \to \mathrm{Mat}(m, \mathbb{R})$, we define for $n > 0$
\[ A^{(-n)}(x) := A^{(n)}(T^{-n} x)^+. \]
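
Definition 4.9 and Lemma 4.9 can be made concrete in the simplest possible setting. The sketch below is only an illustration under a strong assumption: the map $g$ is diagonal in the standard basis, so its kernel and range are coordinate subspaces and the pseudo inverse can be written down directly. The helper names are hypothetical.

```python
from itertools import combinations


def pseudo_inverse_diagonal(diagonal):
    """Pseudo inverse of g = diag(d_1, ..., d_m), returned as a list of diagonal entries.

    The kernel of g is spanned by the e_i with d_i == 0 and the range by the e_i
    with d_i != 0, so g^+ inverts the non-zero entries and kills the other coordinates.
    """
    return [1.0 / d if d != 0 else 0.0 for d in diagonal]


def exterior_power_diagonal(diagonal, k):
    """Diagonal entries of the k-th exterior power in the basis e_{i1} ^ ... ^ e_{ik}."""
    entries = []
    for indices in combinations(range(len(diagonal)), k):
        product = 1.0
        for i in indices:
            product *= diagonal[i]
        entries.append(product)
    return entries


g = [2.0, 0.5, 0.0]                                   # a singular diagonal map on R^3
g_plus = pseudo_inverse_diagonal(g)
print(g_plus)                                         # [0.5, 2.0, 0.0]

# g g^+ g = g: the defining behaviour of a pseudo inverse on the range of g.
print([a * b * c for a, b, c in zip(g, g_plus, g)])   # [2.0, 0.5, 0.0]

# Lemma 4.9 for k = 2: the exterior power of g^+ equals the pseudo inverse of the exterior power.
print(exterior_power_diagonal(g_plus, 2))                            # [1.0, 0.0, 0.0]
print(pseudo_inverse_diagonal(exterior_power_diagonal(g, 2)))        # [1.0, 0.0, 0.0]
```

For a general matrix one would first pass to a singular value decomposition, which reduces the computation to this diagonal case.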




When the cocycle takes invertible values, i.e., $A : X \to \mathrm{GL}(m, \mathbb{R})$, the backward iterates $A^{(-n)}(x)$ correspond to forward iterates by the inverse cocycle $(T^{-1}, A^{-1})$.

Theorem 4.5 (Oseledets II) Let $T : X \to X$ be an invertible ergodic MPT of a probability space $(X, \mathcal{F}, \mu)$, and let $F_A : B \to B$ be a $\mu$-integrable linear cocycle on a measurable bundle $B \subseteq X \times \mathbb{R}^m$. Then there exist $\lambda_1 > \lambda_2 > \dots > \lambda_{k+1} \ge -\infty$ and a family of measurable functions $E_j : X \to \mathrm{Gr}(\mathbb{R}^m)$, $1 \le j \le k+1$, such that for $\mu$-almost every $x \in X$,
(a) $B(x) = \oplus_{j=1}^{k+1} E_j(x)$,
(b) $A(x)\,E_j(x) = E_j(Tx)$ for $j = 1,\dots,k$, and $A(x)\,E_{k+1}(x) \subseteq E_{k+1}(Tx)$,
(c) for every $v \in E_j(x) \setminus \{0\}$, $\lim_{n\to\pm\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v\| = \lambda_j$,
(d) $\lim_{n\to\pm\infty} \frac{1}{n}\log\sin\angle_{\min}\big(\oplus_{j\le l} E_j(T^n x),\, \oplus_{j>l} E_j(T^n x)\big) = 0$, for any $l = 2,\dots,k$.

Proof Assume that $A$ has a Lyapunov spectrum with exact gap pattern $\tau = (\tau_1,\dots,\tau_k)$, where $0 = \tau_0 < \tau_1 < \dots < \tau_k < \tau_{k+1} = \dim E$, and $E = E(x)$ denotes the fiber of $B$. Set by convention $\mathfrak{v}_{\tau_0}^{(\infty)}(A) = \{0\}$ and $\mathfrak{v}_{\tau_{k+1}}^{(\infty)}(A) = E(x)$. Define
\[ E_j(x) := \mathfrak{v}_{\tau_j}^{(\infty)}(A^*)(x) \cap \mathfrak{v}_{\tau_{j-1}}^{(\infty)}(A)(x)^\perp \quad \text{for } j = 1,\dots,k+1. \]
By Proposition 4.10, both sub-bundles $\mathfrak{v}_{\tau_j}^{(\infty)}(A^*)$ and $\mathfrak{v}_{\tau_{j-1}}^{(\infty)}(A)^\perp$ are $A$-invariant, and hence the same is true about the intersection. This proves (b).

For (a) consider the flag valued measurable functions
\[ \mathfrak{v}_\tau^*(x) = \big(\mathfrak{v}_{\tau_1}^{(\infty)}(A^*)(x),\dots,\mathfrak{v}_{\tau_k}^{(\infty)}(A^*)(x)\big) \in \mathscr{F}_\tau(E(x)), \]
\[ \mathfrak{v}_\tau^\perp(x) = \big(\mathfrak{v}_{\tau_k}^{(\infty)}(A)(x)^\perp,\dots,\mathfrak{v}_{\tau_1}^{(\infty)}(A)(x)^\perp\big) \in \mathscr{F}_{\tau^\perp}(E(x)). \]

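According to Definition 2.35 we have $\{E_j(x)\}_{1\le j\le k+1} = \mathfrak{v}_\tau^*(x) \sqcap \mathfrak{v}_\tau^\perp(x)$. Thus, in view of Proposition 2.36 it is now enough to see that $\theta_\sqcap(\mathfrak{v}_\tau^*(x), \mathfrak{v}_\tau^\perp(x)) > 0$ for $\mu$-a.e. $x \in X$. But by Definition 2.34 and Lemma 2.6,
\[ \begin{aligned} \theta_\sqcap(\mathfrak{v}_\tau^*, \mathfrak{v}_\tau^\perp) &= \min_{1\le i\le k} \theta_\cap\big(\mathfrak{v}_{\tau_i}^{(\infty)}(A^*),\, \mathfrak{v}_{\tau_i}^{(\infty)}(A)^\perp\big) \\ &= \min_{1\le i\le k} \alpha_{\tau_i}\big(\mathfrak{v}_{\tau_i}^{(\infty)}(A^*),\, \mathfrak{v}_{\tau_i}^{(\infty)}(A)\big) \\ &= \min_{1\le i\le k} \alpha\big(\mathfrak{v}^{(\infty)}(\wedge_{\tau_i} A^*),\, \mathfrak{v}^{(\infty)}(\wedge_{\tau_i} A)\big) > 0. \end{aligned} \]
The final positivity follows from Lemma 4.4. This proves (a), or in other words that $\{E_j(x)\}_{1\le j\le k+1}$ is a direct sum decomposition of $E(x)$ with $\dim E_j(x) = \tau_j - \tau_{j-1}$, for all $j = 1,\dots,k+1$.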


We prove (c) through several reductions. Consider first the case $j = 1$ and $\tau_1 = 1$. In this case $\tau_0 = 0$ and the intersection sub-bundle $E_j$ is the 1-dimensional $A$-invariant bundle $\mathfrak{v}^{(\infty)}(A^*)$. Let $v^* : X \to \mathbb{P}(\mathbb{R}^m)$ be a unit measurable section of this bundle. By invariance we have $A(x)\,v^*(x) = \pm\|A(x)\,v^*(x)\|\,v^*(Tx)$ for $\mu$-a.e. $x \in X$, and hence
\[ \log\|A^{(n)}(x)\,v^*(x)\| = \sum_{j=0}^{n-1} \log\|A(T^j x)\,v^*(T^j x)\|. \]
By Proposition 4.11, the function $\log\|A\,v^*\|$ is $\mu$-integrable with $\int \log\|A\,v^*\|\,d\mu = L_1(A)$. Therefore, by Birkhoff's ergodic theorem we have
\[ \lim_{n\to+\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v^*(x)\| = L_1(A). \]

On the other hand, since $T$ is invertible, the Birkhoff averages
\[ \frac{1}{n}\log\|A^{(n)}(T^{-n} x)\,v^*(T^{-n} x)\| = \frac{1}{n}\sum_{j=0}^{n-1} \log\|A(T^{-j} x)\,v^*(T^{-j} x)\| \]
also converge $\mu$-almost everywhere to $L_1(A)$. Now, inverting the relation
\[ A^{(n)}(T^{-n} x)\,v^*(T^{-n} x) = \pm\|A^{(n)}(T^{-n} x)\,v^*(T^{-n} x)\|\,v^*(x), \]
we get
\[ A^{(n)}(T^{-n} x)^+\,v^*(x) = \pm\|A^{(n)}(T^{-n} x)\,v^*(T^{-n} x)\|^{-1}\,v^*(T^{-n} x), \]
so that
\[ \log\|A^{(-n)}(x)\,v^*(x)\| = -\log\|A^{(n)}(T^{-n} x)\,v^*(T^{-n} x)\|. \]
Thus
\[ \lim_{n\to\infty} \frac{1}{n}\log\|A^{(-n)}(x)\,v^*(x)\| = -L_1(A). \]

Next consider the case $j = 1$ and $r = \tau_1 > 1$. In this case $\tau_0 = 0$ and the intersection sub-bundle $E_j$ is the $r$-dimensional $A$-invariant bundle $\mathfrak{v}_r^{(\infty)}(A^*)$. Given a unit vector $v_1$ in $\mathfrak{v}_r^{(\infty)}(A^*)$, include it in some orthonormal basis $\{v_1,\dots,v_r\}$ of $\mathfrak{v}_r^{(\infty)}(A^*)$ and take the unit $r$-vector $w = v_1\wedge\dots\wedge v_r$. Applying the previous case to the cocycle $\wedge_r A$ and $w \in \mathfrak{v}^{(\infty)}(\wedge_r A)$, we conclude that
\[ \lim_{n\to\pm\infty} \frac{1}{n}\log\|\wedge_r A^{(n)}(x)\,(v_1\wedge\dots\wedge v_r)\| = L_1(\wedge_r A) = r\,L_1(A). \]


By Proposition 4.13 we have
\[ \log\|\wedge_r A^{(n)}(x)\,(v_1\wedge\dots\wedge v_r)\| \le \sum_{i=1}^{r} \log\|A^{(n)}(x)\,v_i\| \le r\log\|A^{(n)}(x)\|, \]
and since, after division by $n$, both the upper and the lower bounds of this sum converge to $r\,L_1(A)$ as $n \to \pm\infty$, we conclude that for all $i = 1,\dots,r$, and in particular for $i = 1$,
\[ \lim_{n\to\pm\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v_i\| = L_1(A). \]

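Finally consider the general case, where $2 \le j \le k$. By Proposition 4.16,
\[ E_j = \mathfrak{v}_{\tau_j}^{(\infty)}(A^*) \cap \mathfrak{v}_{\tau_{j-1}}^{(\infty)}(A)^\perp = \mathfrak{v}_{\tau_j-\tau_{j-1}}^{(\infty)}\big(A|_{\mathfrak{v}_{\tau_{j-1}}^\perp}\big). \]
We denote this $A$-invariant sub-bundle by $B_j$. Given a non-zero vector $v \in B_j(x)$, applying the previous case to the restricted cocycle $A|_{B_j}$, by Proposition 4.15,
\[ \lim_{n\to\pm\infty} \frac{1}{n}\log\|A^{(n)}(x)\,v\| = \lim_{n\to\pm\infty} \frac{1}{n}\log\|(A|_{B_j})^{(n)}(x)\,v\| = L_{\tau_j-\tau_{j-1}}(A|_{B_j}) = L_{\tau_j}(A) = \lambda_j(A). \]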

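This proves (c).

For the last item, (d), notice that $\oplus_{j\le l} E_j = \mathfrak{v}_{\tau_l}^{(\infty)}(A^*)$, while $\oplus_{j>l} E_j = \mathfrak{v}_{\tau_l}^{(\infty)}(A)^\perp$. Hence by Proposition 2.19 (d),
\[ \begin{aligned} \sin\angle_{\min}\big(\oplus_{j\le l} E_j(x),\, \oplus_{j>l} E_j(x)\big) &= \sin\angle_{\min}\big(\mathfrak{v}_{\tau_l}^{(\infty)}(A^*),\, \mathfrak{v}_{\tau_l}^{(\infty)}(A)^\perp\big) \\ &\ge \alpha_{\tau_l}\big(\mathfrak{v}_{\tau_l}^{(\infty)}(A^*),\, \mathfrak{v}_{\tau_l}^{(\infty)}(A)\big) \\ &= \alpha\big(\mathfrak{v}^{(\infty)}(\wedge_{\tau_l} A^*),\, \mathfrak{v}^{(\infty)}(\wedge_{\tau_l} A)\big). \end{aligned} \]
Thus item (d) follows by Proposition 4.12 (a).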



4.3 Abstract Continuity Theorem of the Oseledets Filtration

Given an ergodic system satisfying base and uniform fiber LDT estimates, we prove that the Oseledets filtration and decomposition vary continuously with the cocycle in an $L^1$ sense. We begin by proving the continuity of the most expanding direction, as it contains the main ingredients of our argument. We then define the space of measurable filtrations and endow it with an appropriate topology. Using the construction of the Oseledets filtration and decomposition in Sect. 4.2.3, we deduce the continuity


of these two quantities from that of the most expanding direction. This is obtained via some Grassmann geometrical considerations established in Chap. 2.

Throughout this section we will be under the assumptions of the ACT (Theorem 3.1 in Chap. 3). That is, given an ergodic MPDS $(X, \mu, T)$, a space of measurable cocycles $\mathcal{C}$, a set of observables $\Xi$, a set of LDT parameters $\mathcal{P}$, we assume the following:

1. $\Xi$ is compatible with every cocycle $A \in \mathcal{C}$.
2. Every observable $\xi \in \Xi$ satisfies a base-LDT estimate: for every $\varepsilon > 0$ there is $p = p(\xi, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that for all $n \ge n_0$ we have $\varepsilon_n \le \varepsilon$ and
\[ \mu\Big\{x \in X : \Big|\frac{1}{n}\sum_{j=0}^{n-1} \xi(T^j x) - \int_X \xi\,d\mu\Big| > \varepsilon_n\Big\} < \iota_n. \tag{4.11} \]
3. Every cocycle with finite maximal Lyapunov exponent is uniformly $L^p$-bounded, where $1 < p \le \infty$. For simplicity of notation we let $p = 2$.
4. Every cocycle $A \in \mathcal{C}$ such that $L_1(A) > L_2(A)$ satisfies a uniform fiber-LDT estimate: for all $\varepsilon > 0$ there are $\delta = \delta(A, \varepsilon) > 0$ and $p = p(A, \varepsilon) \in \mathcal{P}$, $p = (n_0, \varepsilon, \iota)$, such that if $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ and if $n \ge n_0$ then $\varepsilon_n \le \varepsilon$ and
\[ \mu\Big\{x \in X : \Big|\frac{1}{n}\log\|B^{(n)}(x)\| - L_1^{(n)}(B)\Big| > \varepsilon_n\Big\} < \iota_n. \tag{4.12} \]

As before, $\varepsilon_n := \varepsilon(n)$, $\iota_n := \iota(n)$, where $\varepsilon, \iota : (0, \infty) \to (0, \infty)$ are such that the deviation size functions $\varepsilon(t)$ are non-increasing, while the deviation set measure functions $\iota(t)$ are continuous and strictly decreasing to $0$, as $t \to \infty$, at least like a power and at most like an exponential. The latter constraint is just for convenience, so we can write, whenever needed, $e^{-c_0 n} + \iota_n \lesssim \iota_n$. Moreover, it will be convenient in this section to assume a stronger decay at infinity of $\iota(t)$, which holds in all our applications. Hence we will assume that $e^{-c_0 t} < \iota(t) < t^{-10}$ as $t \to \infty$, where $c_0 > 0$ is some fixed constant.

4.3.1 Continuity of the Most Expanding Direction

We employ the Lipschitz estimates on Grassmann manifolds and the avalanche principle derived in Chap. 2. Recall from Sect. 4.2.3 that the most expanding direction of the $n$th iterate of a cocycle $A \in \mathcal{C}$ defines a partial function
\[ \mathfrak{v}^{(n)}(A)(x) := \begin{cases} \mathfrak{v}(A^{(n)}(x)) & \text{if } \mathrm{gr}(A^{(n)}(x)) > 1, \\ \text{undefined} & \text{otherwise.} \end{cases} \]
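
The partial function above can be computed explicitly for $2 \times 2$ matrices. The following is a minimal sketch under stated assumptions: the gap ratio is taken to be the ratio of the two singular values, the most expanding direction is taken to be the top eigenvector of $g^T g$ (a unit vector defined up to sign), and the function names and the sample cocycle are hypothetical.

```python
import math


def mat_mul(p, q):
    """Product p q of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def iterate(cocycle, base_map, x, n):
    """Forward iterate A^(n)(x) = A(T^{n-1} x) ... A(T x) A(x)."""
    product = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        product = mat_mul(cocycle(x), product)
        x = base_map(x)
    return product


def most_expanding_direction(g):
    """Unit vector spanning the most expanding direction of g, or None if there is no gap."""
    a = g[0][0] ** 2 + g[1][0] ** 2              # entries of the symmetric matrix g^T g
    b = g[0][0] * g[0][1] + g[1][0] * g[1][1]
    d = g[0][1] ** 2 + g[1][1] ** 2
    root = math.hypot(a - d, 2.0 * b)            # difference of the two eigenvalues of g^T g
    if root == 0.0:                              # equal singular values: undefined
        return None
    top = 0.5 * (a + d + root)                   # largest eigenvalue, the square of s_1(g)
    if b != 0.0:
        vec = (top - d, b)                       # eigenvector of g^T g for the eigenvalue top
    else:
        vec = (1.0, 0.0) if a > d else (0.0, 1.0)
    norm = math.hypot(vec[0], vec[1])
    return (vec[0] / norm, vec[1] / norm)


# Hypothetical example: a constant diagonal cocycle over a circle rotation.
rotation = lambda x: (x + 0.3) % 1.0
constant = lambda x: [[2.0, 0.0], [0.0, 0.5]]
print(most_expanding_direction(iterate(constant, rotation, 0.1, 5)))   # (1.0, 0.0)
print(most_expanding_direction([[1.0, 0.0], [0.0, 1.0]]))              # None
```

Returning `None` mirrors the "undefined" branch of the definition: the identity matrix has no gap between its singular values.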

By Proposition 4.4, as $n \to \infty$ the functions $\mathfrak{v}^{(n)}(A)(x)$ converge $\mu$-a.e. to a measurable function $\mathfrak{v}^{(\infty)}(A) : X \to \mathbb{P}(\mathbb{R}^m)$. Let $L^1(X, \mathbb{P}(\mathbb{R}^m))$ be the space of all Borel measurable functions $F : X \to \mathbb{P}(\mathbb{R}^m)$. Consider the distance
\[ \mathrm{dist}(F_1, F_2) := \int_X d(F_1(x), F_2(x))\,\mu(dx), \]
where the quantity under the integral sign refers to the distance between points in the projective space $\mathbb{P}(\mathbb{R}^m)$. Clearly all the functions $\mathfrak{v}^{(n)}(A)$ are in $L^1(X, \mathbb{P}(\mathbb{R}^m))$, and by dominated convergence we have that as $n \to \infty$,
\[ \mathfrak{v}^{(n)}(A) \to \mathfrak{v}^{(\infty)}(A) \quad \text{in } L^1(X, \mathbb{P}(\mathbb{R}^m)). \tag{4.13} \]

We will prove that if $L_1(A) > L_2(A)$, then locally near $A$, the map $B \mapsto \mathfrak{v}^{(\infty)}(B)$ is continuous with a modulus of continuity depending on the LDT parameter. We do so by deriving a quantitative version of the convergence in (4.13), which moreover is somewhat uniform in phase and cocycle. This more precise convergence comes as a consequence of the availability of the LDT estimates for our system, as the exceptional sets of phases in the domain of applicability of the avalanche principle can be precisely (and uniformly in the cocycle) measured.

Fix a cocycle $A \in \mathcal{C}$ such that $L_1(A) > L_2(A)$. Let $\kappa(A) := L_1(A) - L_2(A) > 0$ if $L_2(A) > -\infty$ or else let $\kappa(A)$ be a large enough constant. Fix $\varepsilon_0 := \kappa(A)/100$.

What follows is a bookkeeping of various exceptional sets related to notions from Chap. 3. They will eventually define and measure the exceptional sets in the domain of applicability of the avalanche principle (AP) in Proposition 2.42, for certain sequences of iterates of a cocycle $B$ in a small neighborhood of $A$. They depend on $A$ and $\varepsilon_0$, hence only on $A$.

Pick for the rest of this subsection $\delta = \delta(A) > 0$, $n_0 = n_0(A) \in \mathbb{N}$, $\iota = \iota(A) \in \mathcal{I}$ such that, by Lemma 3.4 in Chap. 3 we have: for all $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$,
\[ \mathrm{gr}(B^{(n)}(x)) > e^{n\,\kappa(A)/2} =: \frac{1}{\kappa_n} \quad (> 1) \tag{4.14} \]
holds for all $n \ge n_0$ and for all $x$ outside a set of measure $< \iota_n$, and
\[ \big|L_1^{(n)}(B) - L_1^{(m)}(A)\big| < \kappa(A)/50 \tag{4.15} \]
holds for all $m, n \ge n_0$.


As before, (4.14) will ensure the gap condition in the AP, while (4.15) via Lemma 3.3 will ensure the angle condition.

Fix a cocycle $B$ with $\mathrm{dist}(B, A) < \delta$. We will define, for all scales $n \ge n_0$, the exceptional sets outside which the AP can be applied for various block lengths and configurations of block components.

The exceptional set in the nearly uniform upper semicontinuity of the maximal Lyapunov exponent (Proposition 3.1 in Chap. 3) depends on $A$ and $\varepsilon_0$, hence only on $A$, and we denote it by $\mathcal{B}_n^{\mathrm{usc}}(A)$. Its measure is $\mu[\mathcal{B}_n^{\mathrm{usc}}(A)] < \iota_n$. Let
\[ \mathcal{B}_n^{\mathrm{ldt}}(B) := \Big\{x \in X : \Big|\frac{1}{n}\log\|B^{(n)}(x)\| - L_1^{(n)}(B)\Big| > \varepsilon_0\Big\} \]
be the exceptional set in the uniform fiber-LDT estimate, so $\mu[\mathcal{B}_n^{\mathrm{ldt}}(B)] < \iota_n$. Let
\[ \mathcal{B}_n^{g}(B) := \mathcal{B}_n^{\mathrm{ldt}}(B) \cup \mathcal{B}_n^{\mathrm{usc}}(\wedge_2 A). \]
A simple inspection of the proof of Lemma 3.1 in Chap. 3 shows that $\mathcal{B}_n^{g}(B)$ is contained in the exceptional set in (4.14), and its measure satisfies $\mu[\mathcal{B}_n^{g}(B)] \lesssim \iota_n$. Note also that (4.14) ensures that $\mathfrak{v}(B^{(n)}(x))$ is defined (since there is a gap between the two largest singular values).

Moreover, a simple inspection of the proof of Lemma 3.3 in Chap. 3, combined with (4.15), shows that for $2n \ge m_1, m_2 \ge n \ge n_0$ the bound
\[ \frac{\|B^{(m_2+m_1)}(x)\|}{\|B^{(m_2)}(T^{m_1} x)\|\,\|B^{(m_1)}(x)\|} > e^{-(m_1+m_2)\,\kappa(A)/20} > e^{-n\,\kappa(A)/5} =: \varepsilon_n \tag{4.16} \]
holds provided that
\[ x \notin \mathcal{B}_{m_2+m_1}^{\mathrm{ldt}}(B) \cup \mathcal{B}_{m_1}^{\mathrm{ldt}}(B) \cup T^{-m_1}\mathcal{B}_{m_2}^{\mathrm{ldt}}(B). \]
Note from (4.14) and (4.16) that $\kappa_n/\varepsilon_n^2 = e^{-n\,\kappa(A)/10} < \iota_n \ll 1$, hence the condition on $\kappa$ and $\varepsilon$ from the AP is satisfied. The bound on the distance between most expanding directions in the conclusion of the AP is
\[ \frac{\kappa_n}{\varepsilon_n} = e^{-3n\,\kappa(A)/10} < \iota_n. \tag{4.17} \]
When using the AP, we will always apply (4.16) to configurations for which $n$ is fixed and $m_1 = n$ while $n \le m_2 \le 2n$. This motivates defining
\[ \mathcal{B}_n^{a}(B) := \bigcup_{n \le m \le 3n} \big[\mathcal{B}_m^{\mathrm{ldt}}(B) \cup T^{-n}\mathcal{B}_m^{\mathrm{ldt}}(B)\big]. \]
Clearly $\mu[\mathcal{B}_n^{a}(B)] \lesssim n\,\iota_n$, and if $x \notin \mathcal{B}_n^{a}(B)$, then the angle condition will be ensured for block components of the kind indicated above.


Let g a Bga n (B) := Bn (B) ∪ Bn (B), ga

ga

/ Bn (B), both the gap and the angle conditions hold so μ [Bn (B)]  n ιn , and if x ∈ for appropriate block components at scale n. −1/3 Let n0 ≥ n0 and 2n0 ≤ n1 ≤ ιn0 . If we define 

Bap n0 (B) :=

T −in0 Bga n0 (B),

(4.18)

−1/3

0≤i L2 (A), there are parameters δ, n0 , ι depending only on A so that for any cocycle B ∈ Cm with dist(B, A) < δ and for any scale k ≥ n0 , there are exceptional ap 1/2  sets Bk (B) and Bk (B) of measure < ιk such that −1/3

ap

1. If n0 ≥ n0 , 2n0 ≤ n1 ≤ ιn0 and if x ∈ / Bn0 (B), then the AP with parameters κn0 , εn0 can be applied to a block B(n1 ) (x) of length n1 whose components have lengths n0 , except possibly for the last, whose length is between n0 and 2n0 . / Bn (B), then for any scales n0 , n1 such that n0 ≥ n and 2. If n ≥ n0 and if x ∈ −1/3 2n0 ≤ n1 ≤ ιn0 , the AP with parameters κn0 , εn0 can be applied to a block B(n1 ) (x) of length n1 whose components have lengths  n0 . The following results are now easy to phrase and to prove. −1/3

Lemma 4.10 Let n1 , n0 ∈ N such that n0 ≥ n0 and 2n0 ≤ n1 ≤ ιn0 then κn d(v(B(n1 ) (x)), v(B(n0 ) (x))) < 0 . εn0

ap

. If x ∈ / Bn0 (B) (4.20)


Proof Consider the block $B^{(n_1)}(x)$ and break it down into $n-1$ many blocks of length $n_0$ each, and a remaining block of length $m$ with $n_0 \le m < 2n_0$. In other words, write $n_1 = (n-1)\,n_0 + m$, for some $n_0 \le m < 2n_0$, and define
\[ g_i = g_i(x) := B^{(n_0)}(T^{i n_0} x) \quad \text{for } 0 \le i \le n-2, \qquad g_{n-1} = g_{n-1}(x) := B^{(m)}(T^{(n-1) n_0} x). \]
Then $g^{(n)} = g_{n-1}\cdots g_1\,g_0 = B^{(n_1)}(x)$ and $g_i\,g_{i-1} = B^{(2n_0)}(T^{(i-1) n_0} x)$ for $1 \le i \le n-2$, while $g_{n-1}\,g_{n-2} = B^{(m+n_0)}(T^{(n-2) n_0} x)$.

Since $x \notin \mathcal{B}_{n_0}^{\mathrm{ap}}(B)$, we are in the setting described in Remark 4.2, hence the AP in the form described in Proposition 2.42 in Chap. 2 applies and we have:
\[ d\big(\mathfrak{v}(B^{(n_1)}(x)),\, \mathfrak{v}(B^{(n_0)}(x))\big) = d\big(\mathfrak{v}(g^{(n)}),\, \mathfrak{v}(g_0)\big) \lesssim \frac{\kappa_{n_0}}{\varepsilon_{n_0}}. \]
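
The block decomposition used in this proof is elementary integer arithmetic and can be sketched as follows. This is an illustration only: the function names are hypothetical, and the code records just the block lengths, their starting times and the lengths of adjacent pairs, not the matrices themselves.

```python
def split_into_blocks(n1, n0):
    """Lengths of the n blocks: n - 1 blocks of length n0, then one of length m in [n0, 2*n0)."""
    if n0 <= 0 or n1 < 2 * n0:
        raise ValueError("need n0 >= 1 and n1 >= 2 * n0")
    n = n1 // n0                      # number of blocks
    m = n1 - (n - 1) * n0             # length of the last block
    return [n0] * (n - 1) + [m]


def block_start_times(lengths):
    """Starting times i * n0 at which the block g_i reads the orbit of x."""
    starts, total = [], 0
    for length in lengths:
        starts.append(total)
        total += length
    return starts


def adjacent_pair_lengths(lengths):
    """Lengths of the products g_i g_{i-1}, the iterates entering the angle condition."""
    return [lengths[i] + lengths[i - 1] for i in range(1, len(lengths))]


lengths = split_into_blocks(23, 5)
print(lengths)                          # [5, 5, 5, 8]
print(block_start_times(lengths))       # [0, 5, 10, 15]
print(adjacent_pair_lengths(lengths))   # [10, 10, 13]
```

All adjacent pairs have length $2n_0$ except the last one, whose length $m + n_0$ lies between $2n_0$ and $3n_0$, which matches the range of scales covered by the exceptional sets defined earlier.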



Lemma 4.11 For all n0 ≥ n0 , m ≥ n02+ , if x ∈ / Bn0 (B) then d(v(B(m) (x)), v(B(n0 ) (x))) 

κn0 . εn0

(4.21)

Proof Fix 0 < c  1. Let ψ(t) := t 2 . Define inductively the following intervals of scales N0 := [n01+c , n03+c ] ⊂ −1/3 [2n0 , ιn0 ], N1 := ψ(N0 ) = [n02+2c , n06+2c ] and for all k ≥ 0, Nk+1 := ψ(Nk ). These intervals overlap, so they cover up all integers ≥ n01+c . Let m ≥ n02+2c . Then there is k ≥ 0 such that m = mk+1 ∈ Nk+1 , and so mk+1  mk2 for some mk ∈ Nk . In fact, there is a backward “orbit” of integers m0 ∈ N0 , m1 ∈ N1 , . . . , mk ∈ Nk such that mj+1  mj2 . / Bn0 (B), by (4.19) we For any 0 ≤ j ≤ k, since mj ≥ m0 ≥ n0 , and since x ∈ ap −1/3 have x ∈ / Bmj (B). Moreover, since mj+1  mj2 , we have 2mj ≤ mj+1 ≤ ιmj , hence Lemma 4.10 is applicable to the scales mj , mj+1 and we get d(v(B(mj+1 ) (x)), v(B(mj ) (x)))
$\frac{1}{\kappa_n} > 1$, so in particular $\mathfrak{v}(B_i^{(n)}(x))$ are well defined. Moreover, the fiber-LDT estimate applies to $B_i$ and we have
\[ \frac{1}{n}\log\|B_i^{(n)}(x)\| > L_1^{(n)}(B_i) - \varepsilon_0 > L_1^{(n)}(A) - \frac{\kappa(A)}{50} - \frac{\kappa(A)}{100} > -C_0, \]
where we used (4.15) in the estimate above, and $C_0 = C_0(A) < \infty$. Then $\|g_i\| = \|B_i^{(n)}(x)\| > e^{-C_0 n}$. Since for every $x$, $\|B_i(x)\| < C(A) < \infty$, by possibly increasing $C_0$, we may also assume that $\|g_i\| = \|B_i^{(n)}(x)\| < e^{C_0 n}$. Moreover, assuming $\mathrm{dist}(B_1, B_2) < e^{-C_1 n}$, with $C_1$ to be chosen later,
\[ \|g_1 - g_2\| = \|B_1^{(n)}(x) - B_2^{(n)}(x)\| \le n\,e^{C_0(n-1)}\,\mathrm{dist}(B_1, B_2) < e^{-(C_1 - 2C_0)\,n}. \]
If we choose $C_1 > 3C_0 + C_2$, then
\[ d_{\mathrm{rel}}(g_1, g_2) := \frac{\|g_1 - g_2\|}{\max\{\|g_1\|, \|g_2\|\}} \le \frac{e^{-(C_1 - 2C_0)\,n}}{e^{-C_0 n}} < e^{-C_2 n} \ll 1. \]
Proposition 2.40 in Chap. 2 applies, and we conclude:
\[ d(\mathfrak{v}(g_1), \mathfrak{v}(g_2)) \le \frac{16}{1 - \kappa_n^2}\,d_{\mathrm{rel}}(g_1, g_2) \lesssim e^{-C_2 n}. \]
This proves (4.26), while (4.27) follows by integration in $x$.



We are now ready to formulate and to prove the continuity of the most expanding direction.

Theorem 4.6 Let $A \in \mathcal{C}_m$ with $L_1(A) > L_2(A)$. There are $\delta > 0$, $\iota \in \mathcal{I}$, $c > 0$, $\alpha > 0$, all depending only on $A$, such that for any cocycles $B_1, B_2 \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) < \delta$, where $i = 1, 2$, we have:
\[ \mu\big\{x \in X : d\big(\mathfrak{v}^{(\infty)}(B_1)(x),\, \mathfrak{v}^{(\infty)}(B_2)(x)\big) > \mathrm{dist}(B_1, B_2)^\alpha\big\} < \omega_\iota(\mathrm{dist}(B_1, B_2)), \]
where $\omega_\iota(h) := [\iota(c\log(1/h))]^{1/2}$ is a modulus of continuity function, and clearly $\omega_\iota(h) \to 0$ as $h \to 0$. Moreover,
\[ \mathrm{dist}\big(\mathfrak{v}^{(\infty)}(B_1),\, \mathfrak{v}^{(\infty)}(B_2)\big) < \omega_\iota(\mathrm{dist}(B_1, B_2)). \tag{4.28} \]

Proof Fix any $C_2 > 0$ and let $C_1$ be the constant in Proposition 4.19. Put $\mathrm{dist}(B_1, B_2) =: h$ and choose $n \in \mathbb{N}$ such that $h \asymp e^{-C_1 n}$. Since $h \le 2\delta$ and $n \asymp \frac{1}{C_1}\log\frac{1}{h}$, by taking $\delta$ small enough we may assume that $n \ge n_0$. Apply Proposition 4.19 to get that for $x$ outside a set of measure $< \iota_n$,
\[ d\big(\mathfrak{v}(B_1^{(n)}(x)),\, \mathfrak{v}(B_2^{(n)}(x))\big) < e^{-C_2 n}. \tag{4.29} \]
Now apply Proposition 4.18 to $B = B_i$, $i = 1, 2$, to get that for $x$ outside a set of measure $< \iota_n^{1/2}$,
\[ d\big(\mathfrak{v}(B_i^{(n)}(x)),\, \mathfrak{v}^{(\infty)}(B_i)(x)\big) < e^{-3n\,\kappa(A)/10}. \tag{4.30} \]
Combine (4.29) and (4.30) to conclude that for $x$ outside a set of measure $\lesssim \iota_n^{1/2}$ and for $c_0 < \min\{C_2, 3\kappa(A)/10\}$, we have
\[ d\big(\mathfrak{v}^{(\infty)}(B_1)(x),\, \mathfrak{v}^{(\infty)}(B_2)(x)\big) < e^{-c_0 n} = h^\alpha, \]
where $\alpha = c_0/C_1$. This proves the pointwise estimate. To prove (4.28), simply integrate in $x$ and take into account the fact that since the large deviation measure function $\iota$ decays at most exponentially, the corresponding modulus of continuity function $\omega_\iota$ will decay at most like a power of $h$, so we may assume $h^\alpha < \omega_\iota(h)$ as $h \to 0$.
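
The choice of scale in this proof and the resulting modulus of continuity can be illustrated numerically. The sketch below rests on assumptions that are not in the text: the constants are arbitrary sample values, `sample_iota` is a hypothetical deviation measure function chosen only because it lies between an exponential and $t^{-10}$ for large $t$, and the scale $n$ is taken to be the integer part of $\log(1/h)/C_1$.

```python
import math


def scale_for_distance(h, c1):
    """Largest integer n with exp(-c1 * n) >= h, so that h is comparable to exp(-c1 * n)."""
    return math.floor(math.log(1.0 / h) / c1)


def holder_exponent(c0, c1):
    """Exponent alpha = c0 / c1, for which exp(-c0 * n) = h ** alpha when h = exp(-c1 * n)."""
    return c0 / c1


def modulus_of_continuity(iota, c, h):
    """omega_iota(h) = [iota(c * log(1/h))] ** (1/2)."""
    return math.sqrt(iota(c * math.log(1.0 / h)))


def sample_iota(t):
    """Hypothetical deviation measure function with sub-exponential decay."""
    return math.exp(-math.sqrt(t))


print(scale_for_distance(1e-9, 2.0))                    # 10
h = math.exp(-2.0 * 7)                                   # h = exp(-C1 * n) with C1 = 2, n = 7
print(h ** holder_exponent(0.5, 2.0), math.exp(-0.5 * 7))
print(modulus_of_continuity(sample_iota, 1.0, 1e-3))
print(modulus_of_continuity(sample_iota, 1.0, 1e-6))
```

With this sample $\iota$ the modulus decays more slowly than any power of $h$, which is the regime where the statement is weaker than Hölder continuity; an exponentially decaying $\iota$ would give a power of $h$ instead.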

4.3.2 Spaces of Measurable Filtrations and Decompositions

We introduce a space of measurable filtrations, i.e. a space of functions from the phase space to the set of all flags. Thus the Oseledets filtration of a linear cocycle is an element of this space. We endow the space of measurable filtrations with a natural topology. Similarly, we define a space of measurable decompositions.

We start with an example that will motivate the formalism below. Let $A$ be a linear cocycle with exact gap pattern say $\tau = (2, 3)$, that is, $L_1(A) = L_2(A) > L_3(A) > L_4(A) = \dots = L_m(A)$. The Oseledets filtration of $A$ is a $\tau^\perp = (m-3, m-2)$-flag
\[ \{0\} = F_4(A)(x) \subsetneq F_3(A)(x) \subsetneq F_2(A)(x) \subsetneq F_1(A)(x) = \mathbb{R}^m, \]


for $\mu$-a.e. $x \in X$, thus defining a measurable function $F(A) : X \to \mathscr{F}_{\tau^\perp}(\mathbb{R}^m)$. The growth rate of the iterates of $A$ along vectors in $F_3(A)(x)$ is $L_4(A)$ or less, the growth rate along vectors in $F_2(A)(x)$ is $L_3(A)$ or less and the growth rate along vectors in $F_1(A)(x)$ is $L_1(A)$ or less.

By the continuity of the Lyapunov exponents of a linear cocycle (which holds under the assumptions in this section), if $B$ is a small perturbation of $A$, then $L_1(B) \ge L_2(B) > L_3(B) > L_4(B) \ge \dots \ge L_m(B)$, meaning that $B$ will still have a $\tau = (2, 3)$ gap pattern. However, this might not be its exact gap pattern, as we could have $L_1(B) > L_2(B)$, leading to a finer gap pattern, say $\tau' = (1, 2, 3)$. If $\tau'$ were the exact gap pattern of $B$, then its Oseledets filtration would be a $\tau'^\perp = (m-3, m-2, m-1)$-flag
\[ \{0\} = F_5(B)(x) \subsetneq F_4(B)(x) \subsetneq F_3(B)(x) \subsetneq F_2(B)(x) \subsetneq F_1(B)(x) = \mathbb{R}^m, \]
for $\mu$-a.e. $x \in X$, thus defining a measurable function $F(B) : X \to \mathscr{F}_{\tau'^\perp}(\mathbb{R}^m)$. The subspaces $F_4(B)(x)$, $F_3(B)(x)$, $F_2(B)(x)$ and $F_1(B)(x)$ correspond to the Lyapunov exponents $L_4(B)$, $L_3(B)$, $L_2(B)$ and $L_1(B)$ respectively.

In order to compare the Oseledets filtration of $B$ with that of $A$, we would need to "forget" the extra subspace $F_2(B)(x)$ corresponding to the Lyapunov exponent $L_2(B)$, which appears precisely because the gap pattern $\tau'$ of $B$ is finer than that of $A$. In other words, we consider the projection $F^\tau(B)$ of the Oseledets filtration $F(B)$ to the space of coarser $\tau^\perp = (m-3, m-2)$-flag valued filtrations
\[ \{0\} = F_5(B)(x) \subsetneq F_4(B)(x) \subsetneq F_3(B)(x) \subsetneq F_1(B)(x) = \mathbb{R}^m. \]
Now $F(A)(x)$ and $F^\tau(B)(x)$ are both $\tau^\perp$-flags, and we may define a distance between them component-wise (as points in the same Grassmann manifold). The distance between the measurable filtrations $F(A)$ and $F^\tau(B)$ as functions on $X$ will be the space average of the pointwise distances.
Furthermore, the Oseledets decomposition $E_\cdot(A)$ of the cocycle $A$ with exact $\tau = (2, 3)$ gap pattern consists of a 2-dimensional subspace $E_1(A)(x)$ corresponding to $L_1(A) = L_2(A)$, a one dimensional subspace $E_2(A)(x)$ corresponding to $L_3(A)$, and an $(m-3)$-dimensional subspace $E_3(A)(x)$ corresponding to the remaining (and equal) Lyapunov exponents. If a small perturbation $B$ of $A$ has (as above) the finer $\tau' = (1, 2, 3)$ gap pattern, then its Oseledets decomposition will consist of subspaces $E_1(B)(x)$ (one dimensional, corresponding to $L_1(B)$), $E_2(B)(x)$ (one dimensional, corresponding to $L_2(B)$), $E_3(B)(x)$ (one dimensional, corresponding to $L_3(B)$) and the subspace $E_4(B)(x)$ ($(m-3)$-dimensional, corresponding to the remaining Lyapunov exponents). In order to compare the Oseledets decompositions of $A$ and $B$, we would have to "patch up" the first two Oseledets subspaces for $B$. In other words, we will consider the natural restriction $E_\cdot^\tau(B)$ of the Oseledets decomposition $E_\cdot(B)$, consisting of the subspaces $E_1(B) \oplus E_2(B)$, $E_3(B)$, $E_4(B)$.


We make the obvious observation that for two dimensional (i.e. Mat(2, R)-valued) cocycles, or for cocycles of any dimension with simple Lyapunov spectrum, these projection/restriction of the filtration/decomposition are not needed. Let us now formally define the space of measurable filtrations. Given two signatures τ = (τ1 , . . . , τk ) and τ  = (τ1 , . . . , τk  ), we say that τ refines τ  , and write τ ≥ τ  , if {τ1 , . . . , τk } ⊇ {τ1 , . . . , τk  }. Given τ ≥ τ  , there is a natural projection ρτ,τ  : Fτ (Rm ) → Fτ  (Rm ), defined by ρτ,τ  (F) = ρτ,τ  (F1 , . . . , Fk ) := (Fi1 , . . . , Fik ), where τj = τij for j = 1, . . . , k  . With respect to the following normalized distance on the flag manifold Fτ (Rm ) (see (2.13) in Chap. 2), dτ (F, F  ) = max d(Fj , Fj ) 1≤j≤k

these projections are Lipschitz, with Lipschitz constant 1.

Let $(X, \mathcal{F}, \mu)$ be a probability space and let $T : X \to X$ be an ergodic measure preserving transformation. We call measurable filtration of $\mathbb{R}^m$ any mod 0 equivalence class of an $\mathcal{F}$-measurable function $F : X \to \mathfrak{F}(\mathbb{R}^m)$. Two functions $F, F' : X \to \mathfrak{F}(\mathbb{R}^m)$ are said to be equivalent mod 0 when $F(x) = F'(x)$ for $\mu$-a.e. $x \in X$. From now on we will identify each mod 0 equivalence class with any of its representative measurable functions.

Given any measurable filtration $F$ of $\mathbb{R}^m$, let $\tau(F)(x)$ denote the signature of the flag $F(x)$. We say that $F$ has a $T$-invariant signature if $\tau(F)(x) = \tau(F)(Tx)$ for $\mu$-a.e. $x \in X$. If this is the case, then by the ergodicity of $(T, \mu)$ the function $\tau(F)(x)$ is constant $\mu$-a.e. Define $\mathfrak{F}(X, \mathbb{R}^m)$ to be the space of mod 0 equivalence classes of measurable filtrations with a $T$-invariant signature, which is a constant that we denote by $\tau(F)$. We say that $F$ has a $\tau$-pattern when $\tau(F) \ge \tau$.

Given a signature $\tau$, let us define $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ to be the subspace of measurable filtrations in $\mathfrak{F}(X, \mathbb{R}^m)$ with a $\tau$-pattern. By definition $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m) \subseteq \mathfrak{F}_{\supset\tau'}(X, \mathbb{R}^m)$ whenever $\tau \ge \tau'$. Given $F \in \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$, the function $F^\tau(x) := \rho_{\tau(F),\tau}(F(x))$ determines a measurable filtration with constant signature $\tau$, which will be referred to as the $\tau$-restriction of $F$. We endow $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ with the following distance
\[ \mathrm{dist}_\tau(F, F') := \int_X d_\tau\big(F^\tau(x), (F')^\tau(x)\big)\, d\mu(x). \tag{4.31} \]
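The refinement relation on signatures and the projection $\rho_{\tau,\tau'}$ are purely combinatorial, so they can be illustrated with a small sketch. The function names below (`refines`, `project_flag`, `flag_distance`) are hypothetical and not taken from the text; subspaces are represented by opaque placeholders, and the componentwise distances are simply assumed to be given.

```python
from typing import List, Sequence


def refines(tau: Sequence[int], tau_prime: Sequence[int]) -> bool:
    """Return True when tau >= tau', i.e. {tau'_j} is a subset of {tau_i}."""
    return set(tau_prime) <= set(tau)


def project_flag(flag: Sequence[object], tau: Sequence[int],
                 tau_prime: Sequence[int]) -> List[object]:
    """Apply rho_{tau,tau'}: keep the components F_{i_j} with tau_{i_j} = tau'_j."""
    if len(flag) != len(tau):
        raise ValueError("flag must have one component per entry of tau")
    if not refines(tau, tau_prime):
        raise ValueError("tau does not refine tau_prime")
    position = {dim: i for i, dim in enumerate(tau)}
    return [flag[position[dim]] for dim in tau_prime]


def flag_distance(component_distances: Sequence[float]) -> float:
    """d_tau as the maximum of the componentwise distances d(F_j, F'_j)."""
    return max(component_distances)
```

Since the projection only discards components, the maximum of the remaining componentwise distances cannot increase, which is the Lipschitz constant 1 property stated above.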


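Finally, we endow the space $\mathfrak{F}(X, \mathbb{R}^m)$ of all measurable filtrations of $\mathbb{R}^m$ with the topology determined by the following neighborhood bases,
\[ V_{\delta,\tau}(F) := \{ F' \in \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m) : \mathrm{dist}_\tau(F, F') < \delta \}, \]
where $\delta > 0$, $F \in \mathfrak{F}(X, \mathbb{R}^m)$ and $\tau = \tau(F)$. We note that this topology is not metrizable.

Proposition 4.20 Let $\mathcal{C}$ be a topological space. A map $F : \mathcal{C} \to \mathfrak{F}(X, \mathbb{R}^m)$ is continuous w.r.t. this topology if and only if for all $A \in \mathcal{C}$ such that $F(A)$ has a $\tau$-pattern, there exists a neighborhood $U \subset \mathcal{C}$ of $A$ such that $F(U) \subseteq \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ and the $\tau$-restricted function $F^\tau|_U : U \to \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$, $B \mapsto F^\tau(B)$, is continuous w.r.t. the distance $\mathrm{dist}_\tau$ defined above.

Proof Assume first that $F : \mathcal{C} \to \mathfrak{F}(X, \mathbb{R}^m)$ is continuous and take $A \in \mathcal{C}$ such that $F(A) \in \mathfrak{F}(X, \mathbb{R}^m)$. Consider the neighborhood $V_{\delta,\tau}(F(A))$ of $F(A)$, where $\delta > 0$. By continuity of $F$, there exists a neighborhood $U \subset \mathcal{C}$ of $A$ such that $F(U) \subset V_{\delta,\tau}(F(A)) \subset \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$. By definition of the topology in $\mathfrak{F}(X, \mathbb{R}^m)$, the set $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ is open in $\mathfrak{F}(X, \mathbb{R}^m)$, and the projection $\rho_\tau : \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m) \to \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$, $\rho_\tau(F) = F^\tau$, is continuous. The restriction $F^\tau|_U : U \to \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$, $B \mapsto F^\tau(B)$, is continuous because it coincides with the composition $\rho_\tau \circ F$. The converse statement is a direct consequence of the definition. □

Recall that a $\tau$-decomposition is a family of linear subspaces $E_\cdot = \{E_i\}_{1 \le i \le k+1}$ in $\mathrm{Gr}(\mathbb{R}^m)$ such that $\mathbb{R}^m = \oplus_{i=1}^{k+1} E_i$ and $\dim E_i = \tau_i - \tau_{i-1}$ for all $1 \le i \le k+1$. In Chap. 2, we have denoted by $\mathfrak{D}_\tau(\mathbb{R}^m)$ the space of all $\tau$-decompositions of $\mathbb{R}^m$. Given $\tau \ge \tau'$, there is a natural projection $\rho_{\tau,\tau'} : \mathfrak{D}_\tau(\mathbb{R}^m) \to \mathfrak{D}_{\tau'}(\mathbb{R}^m)$, defined by $\rho_{\tau,\tau'}(E_\cdot) = \rho_{\tau,\tau'}(E_1, \ldots, E_{k+1}) := (E'_1, \ldots, E'_{k'+1})$, where $E'_j = \oplus_{i_j \le l}$ […] $> 0$, $E_\cdot \in \mathfrak{D}(X, \mathbb{R}^m)$ and $\tau = \tau(E_\cdot)$. Again, this topology is not metrizable, and a similar characterization of the continuity of a map $E_\cdot : \mathcal{C} \to \mathfrak{D}(X, \mathbb{R}^m)$ holds.

Proposition 4.21 Let $\mathcal{C}$ be a topological space. A map $E_\cdot : \mathcal{C} \to \mathfrak{D}(X, \mathbb{R}^m)$ is continuous w.r.t. this topology if and only if for all $A \in \mathcal{C}$ such that $E_\cdot(A)$ has a $\tau$-pattern, there exists a neighborhood $U \subset \mathcal{C}$ of $A$ such that $E_\cdot(U) \subseteq \mathfrak{D}_{\supset\tau}(X, \mathbb{R}^m)$ and the $\tau$-restricted function $E^\tau_\cdot|_U : U \to \mathfrak{D}_{\supset\tau}(X, \mathbb{R}^m)$, $B \mapsto E^\tau_\cdot(B)$, is continuous w.r.t. the distance $\mathrm{dist}_\tau$ defined above.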

4.3.3 Continuity of the Oseledets Filtration

We denote by $F(A)$ the Oseledets filtration of a cocycle $A \in \mathcal{C}_m$. If $A$ has a $\tau$ gap pattern, by the continuity of the Lyapunov exponents, any nearby cocycle $B$ has the same or a finer gap pattern $\tau' \ge \tau$. Let $F^\tau(B)$ denote the projection of the Oseledets filtration of $B$ to the space $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ of measurable filtrations with a $\tau$-pattern. We are now ready to phrase and to prove the continuity of the Oseledets filtration.

Theorem 4.7 Let $A \in \mathcal{C}_m$ be a cocycle with a $\tau$ gap pattern. Then locally near $A$, the map
\[ \mathcal{C}_m \ni B \mapsto F^\tau(B) \in \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m) \]


is continuous with a modulus of continuity $\omega(h) := [\iota(c \log \frac{1}{h})]^{1/2}$ for some constant $c = c(A) > 0$ and for some deviation measure function $\iota = \iota(A)$ from the corresponding set of LDT parameters. In fact, a stronger pointwise estimate holds:
\[ \mu\{x \in X : d(F^\tau(B_1)(x), F^\tau(B_2)(x)) > \mathrm{dist}(B_1, B_2)^\alpha\} < \omega(\mathrm{dist}(B_1, B_2)), \]
for any $B_1, B_2 \in \mathcal{C}_m$ in a neighborhood of $A$, and for some $\alpha = \alpha(A) > 0$. Moreover, the map $\mathcal{C}_m \ni A \mapsto F(A) \in \mathfrak{F}(X, \mathbb{R}^m)$ is continuous everywhere.

Proof Since $A$ has a $\tau = (\tau_1, \ldots, \tau_k)$ gap pattern, $L_{\tau_j}(A) > L_{\tau_j + 1}(A)$ for all indices $j$, so $L_1(\wedge_{\tau_j} A) > L_2(\wedge_{\tau_j} A)$. We may then apply the continuity of the most expanding direction in Theorem 4.6 to $\wedge_{\tau_j} A$ and obtain that
\[ \mathcal{C}_m \ni B \mapsto \wedge_{\tau_j} B \mapsto v^{(\infty)}(\wedge_{\tau_j} B) \in L^1(X, \mathbb{P}(\wedge_{\tau_j} \mathbb{R}^m)) \]
is continuous at $A$, with a modulus of continuity of the form $\omega(h) = [\iota(c \log \frac{1}{h})]^{1/2}$. A similar pointwise estimate holds as well.

The Oseledets filtration of $A$ was obtained in the proof of the Oseledets Theorem 4.4 as $F(A)(x) = v^{(\infty)}_\tau(A)(x)^\perp$, where
\[ v^{(\infty)}_\tau(A)(x) = \big(v^{(\infty)}_{\tau_1}(A)(x), \ldots, v^{(\infty)}_{\tau_k}(A)(x)\big), \quad\text{and}\quad v^{(\infty)}_{\tau_j}(A)(x) = \Psi^{-1}\big(v^{(\infty)}(\wedge_{\tau_j} A)(x)\big). \]
Moreover, since for any nearby cocycle $B$ we clearly have
\[ F^\tau(B)(x) = \big(\Psi^{-1}(v^{(\infty)}(\wedge_{\tau_1} B)(x)), \ldots, \Psi^{-1}(v^{(\infty)}(\wedge_{\tau_k} B)(x))\big)^\perp, \]
the first two assertions follow from the continuity of the most expanding direction and the fact that the Plücker embedding $\Psi$ and the orthogonal complement $\perp$ are isometries. The third assertion is an immediate consequence of Proposition 4.20. □

4.3.4 Continuity of the Oseledets Decomposition

We denote by $E_\cdot(A)$ the Oseledets decomposition of a cocycle $A \in \mathcal{C}_m$. Assume that $A$ has a $\tau = (\tau_1, \ldots, \tau_k)$ gap pattern. By the construction in the proof of Theorem 4.5, we have
\[ E_\cdot(A)(x) = v^{(\infty)}_\tau(A^*)(x) \sqcap v^{(\infty)}_\tau(A)(x)^\perp. \]


By the continuity of the Lyapunov exponents, any nearby cocycle $B$ has the same or a finer gap pattern $\tau' \ge \tau$. Let $E^\tau_\cdot(B)(x)$ denote the $\tau$-restriction of $E_\cdot(B)(x)$ to the space of decompositions with signature $\tau$. Clearly we have
\[ E^\tau_\cdot(B)(x) = v^{(\infty)}_\tau(B^*)(x) \sqcap v^{(\infty)}_\tau(B)(x)^\perp. \]
We may immediately conclude from Sect. 4.3.3, or directly from the continuity of the most expanding direction derived in Sect. 4.3.1, that the maps
\[ B \mapsto v^{(\infty)}_\tau(B^*) \quad\text{and}\quad B \mapsto v^{(\infty)}_\tau(B)^\perp \]

are continuous in a neighborhood of $A$, with an appropriate modulus of continuity. However, this does not automatically guarantee the continuity of the intersection. Indeed, by Proposition 2.38 in Chap. 2, the intersection map $\sqcap : \mathfrak{F}_\tau(V) \times \mathfrak{F}_{\tau^\perp}(V) \to \mathfrak{D}_\tau(V)$ is locally Lipschitz, but with a Lipschitz constant that depends on the transversality measurement of the subspaces, which may blow up for some phases $x$. That is why we need to control these transversality measurements at finite scale first.

We will employ a scheme similar to the one used to establish the continuity of the most expanding direction in Sect. 4.3.1. Recall from Sect. 4.2.3 the $n$th scale partial functions $v^{(n)}_\tau(B)$ on $X$ taking values in $\mathfrak{F}_\tau(\mathbb{R}^m)$,
\[ v^{(n)}_\tau(B)(x) := \begin{cases} v_\tau(B^{(n)}(x)) & \text{if } \mathrm{gr}_\tau(B^{(n)}(x)) > 1, \\ \text{undefined} & \text{otherwise,} \end{cases} \]
where
\[ v_\tau(B^{(n)}(x)) = \big(v_{\tau_1}(B^{(n)}(x)), \ldots, v_{\tau_k}(B^{(n)}(x))\big) = \big(\Psi^{-1}(v(\wedge_{\tau_1} B^{(n)}(x))), \ldots, \Psi^{-1}(v(\wedge_{\tau_k} B^{(n)}(x)))\big). \]
Consider the exceptional sets defined in Sect. 4.3.1 for each dimension $\tau_j$, that is, define
\[ \mathcal{B}_n(B) := \bigcup_{1 \le j \le k} \mathcal{B}_n(\wedge_{\tau_j} B). \]
Redefine $\kappa(A) := \min_{1 \le j \le k} \kappa(\wedge_{\tau_j} A)$, which subsequently determines $\kappa_n$ and $\varepsilon_n$ as in (4.14) and (4.16). Since $A$ has a $\tau = (\tau_1, \ldots, \tau_k)$ gap pattern, the estimates on the most expanding direction, namely Remark 4.2 and Propositions 4.18 and 4.19, are applicable to $\wedge_{\tau_j} B$, $1 \le j \le k$. We summarize the relevant results in the following remark.

Remark 4.3 There are parameters $\delta$, $n_0$ and $\iota$, depending only on $A$, such that the following hold for all cocycles $B$ with $\mathrm{dist}(B, A) < \delta$ and for all scales $n \ge n_0$.


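1. $v_\tau(B^{(n)}(x))$ is well defined for all phases $x \notin \mathcal{B}_n(B)$. Moreover, for all such $x$ we have
\[ \mathrm{gr}_\tau(B^{(n)}(x)) = \min_{1 \le j \le k} \mathrm{gr}(\wedge_{\tau_j} B^{(n)}(x)) > \frac{1}{\kappa_n}, \]
\[ \alpha_\tau(B^{(n)}(T^{-n}x), B^{(n)}(x)) \gtrsim \min_{1 \le j \le k} \frac{\|\wedge_{\tau_j} B^{(2n)}(x)\|}{\|\wedge_{\tau_j} B^{(n)}(T^{-n}x)\| \, \|\wedge_{\tau_j} B^{(n)}(x)\|} > \varepsilon_n. \]

2. The sequence of partial functions $v^{(n)}_\tau(B)$ converges $\mu$-a.e., as $n \to \infty$, to a function $v^{(\infty)}_\tau(B) : X \to \mathfrak{F}_\tau(\mathbb{R}^m)$.

3. For all phases $x \notin \mathcal{B}_n(B)$, we have the following rate of convergence:
\[ d_\tau\big(v^{(n)}_\tau(B)(x), v^{(\infty)}_\tau(B)(x)\big) < \frac{\kappa_n}{\varepsilon_n}. \tag{4.33} \]

4. The partial functions $v^{(n)}_\tau(B)$ satisfy the following finite scale uniform continuity property. Given $C_2 > 0$, there is $C_1 = C_1(A, C_2) < \infty$ such that for any cocycles $B_i \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) < \delta$ for $i = 1, 2$, if $\mathrm{dist}(B_1, B_2) < e^{-C_1 n}$, then for $x$ outside a set of measure $< \iota_n$ we have:
\[ d_\tau\big(v^{(n)}_\tau(B_1)(x), v^{(n)}_\tau(B_2)(x)\big) < e^{-C_2 n}. \tag{4.34} \]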

Proof The statements in item 1 above follow from (4.14) and (4.16) applied to $\wedge_{\tau_j} B$ for $1 \le j \le k$. Each component of the flag $v_\tau(B^{(n)}(x))$ converges, for $\mu$-a.e. $x \in X$, by Proposition 4.4 and the fact that $B$ has the $\tau$ gap pattern. The rate of convergence in item 3 is a consequence of Proposition 4.18 applied in each component of the flag $v_\tau(B^{(n)}(x))$, that is, applied to the exterior powers $\wedge_{\tau_j} B$ for $1 \le j \le k$. The same argument holds for item 4. □

Remark 4.4 Since $A$ has the $\tau$ gap pattern, so does $A^*$. Therefore, by possibly doubling the size of the exceptional set, we may assume that the rate of convergence (4.33) holds for both $B$ and $B^*$. The same applies to the finite scale continuity (4.34).

We define a finite scale decomposition which will be shown to converge to the ($\tau$-restricted) Oseledets decomposition. Consider the partial function on $X$ taking values in $\mathfrak{D}_\tau(\mathbb{R}^m)$ and defined by
\[ E^{(n)}_\cdot(B)(x) := v^{(n)}_\tau(B^*)(x) \sqcap v^{(n)}_\tau(B)(x)^\perp \]
if $\mathrm{gr}_\tau(B^{(n)}(x)) > 1$ and $\theta_\sqcap\big(v^{(n)}_\tau(B^*)(x), v^{(n)}_\tau(B)(x)^\perp\big) > 0$; otherwise it is undefined.


Clearly this map is well defined for all $x \notin \mathcal{B}_n(B)$. We begin by establishing a lower bound on the transversality measurement for the flags defining this finite scale decomposition.

Lemma 4.12 For all $x \notin \mathcal{B}_n(B)$ and $n \ge n_0$ we have
\[ \theta_\sqcap\big(v^{(n)}_\tau(B^*)(x), v^{(n)}_\tau(B)(x)^\perp\big) \ge \varepsilon_n. \tag{4.35} \]

Proof This lower bound follows easily from Proposition 2.39 in Chap. 2 and the second inequality in item 1 of Remark 4.3:
\[ \begin{aligned} \theta_\sqcap\big(v^{(n)}_\tau(B^*)(x), v^{(n)}_\tau(B)(x)^\perp\big) &= \theta_\sqcap\big(v_\tau((B^*)^{(n)}(x)), v_\tau(B^{(n)}(x))^\perp\big) \\ &= \theta_\sqcap\big(v_\tau(B^{(n)}(T^{-n}x)^*), v_\tau(B^{(n)}(x))^\perp\big) \\ &\ge \alpha_\tau(B^{(n)}(T^{-n}x), B^{(n)}(x)) \ge \varepsilon_n. \end{aligned} \]
□



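Next we establish the convergence to $E^\tau_\cdot(B)$ of the finite scale decomposition introduced above.

Proposition 4.22 (speed of convergence) For all $x \notin \mathcal{B}_n(B)$ and $n \ge n_0$ we have
\[ d\big(E^{(n)}_\cdot(B)(x), E^\tau_\cdot(B)(x)\big) < \frac{\kappa_n}{\varepsilon_n^2}. \tag{4.36} \]

Proof Fix the phase $x$ and the scale $n$. For simplicity of notation let
\[ F := v^{(\infty)}_\tau(B^*)(x) \in \mathfrak{F}_\tau(\mathbb{R}^m), \qquad F' := v^{(\infty)}_\tau(B)(x)^\perp \in \mathfrak{F}_{\tau^\perp}(\mathbb{R}^m), \]
\[ F_0 := v^{(n)}_\tau(B^*)(x) \in \mathfrak{F}_\tau(\mathbb{R}^m), \qquad F'_0 := v^{(n)}_\tau(B)(x)^\perp \in \mathfrak{F}_{\tau^\perp}(\mathbb{R}^m). \]
With these notations we have $E^\tau_\cdot(B)(x) = F \sqcap F'$ and $E^{(n)}_\cdot(B)(x) = F_0 \sqcap F'_0$. By Proposition 2.38 in Chap. 2, we have
\[ d\big(E^{(n)}_\cdot(B)(x), E^\tau_\cdot(B)(x)\big) = d(F_0 \sqcap F'_0, F \sqcap F') \le \max\Big\{ \frac{1}{\theta_\sqcap(F_0, F'_0)}, \frac{1}{\theta_\sqcap(F, F')} \Big\} \big(d_\tau(F_0, F) + d_{\tau^\perp}(F'_0, F')\big). \tag{4.37} \]
Applying (4.33) to $B^*$ we get:
\[ d_\tau(F_0, F) = d_\tau\big(v^{(n)}_\tau(B^*)(x), v^{(\infty)}_\tau(B^*)(x)\big) < \frac{\kappa_n}{\varepsilon_n}, \tag{4.38} \]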


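while applying (4.33) to $B$ and using the fact that the orthogonal complement $\perp$ is an isometry, we get:
\[ d_{\tau^\perp}(F'_0, F') = d_{\tau^\perp}\big(v^{(n)}_\tau(B)(x)^\perp, v^{(\infty)}_\tau(B)(x)^\perp\big) = d_\tau\big(v^{(n)}_\tau(B)(x), v^{(\infty)}_\tau(B)(x)\big) < \frac{\kappa_n}{\varepsilon_n}. \tag{4.39} \]
By Lemma 4.12 we have
\[ \theta_\sqcap(F_0, F'_0) = \theta_\sqcap\big(v^{(n)}_\tau(B^*)(x), v^{(n)}_\tau(B)(x)^\perp\big) \ge \varepsilon_n, \tag{4.40} \]
and by Lemma 2.37 in Chap. 2 combined with (4.38) and (4.39) we have:
\[ \theta_\sqcap(F, F') \ge \theta_\sqcap(F_0, F'_0) - d_\tau(F, F_0) - d_{\tau^\perp}(F', F'_0) \ge \varepsilon_n - \frac{\kappa_n}{\varepsilon_n} - \frac{\kappa_n}{\varepsilon_n} \gtrsim \varepsilon_n. \tag{4.41} \]
We conclude by combining (4.37)–(4.41). □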

Remark 4.5 The proposition above shows in particular that the partially defined finite scale decompositions $E^{(n)}_\cdot(B)(x)$ converge for $\mu$-a.e. $x \in X$ to the $\tau$-restriction $E^\tau_\cdot(B)(x)$ of the Oseledets decomposition of $B$.

Proposition 4.23 (finite scale continuity) There are constants $C_1 = C_1(A) < \infty$ and $C_3 = C_3(A) > 0$ such that for any cocycles $B_i \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) < \delta$ for $i = 1, 2$, if $\mathrm{dist}(B_1, B_2) < e^{-C_1 n}$, then for $x$ outside a set of measure $< \iota_n$ and $n \ge n_0$ we have:
\[ d\big(E^{(n)}_\cdot(B_1)(x), E^{(n)}_\cdot(B_2)(x)\big) < e^{-C_3 n}. \tag{4.42} \]

Proof Let $C_2 > \kappa(A)/2$. We apply item 4 of Remark 4.3. There is $C_1 = C_1(A)$ such that for any cocycles $B_i \in \mathcal{C}_m$ with $\mathrm{dist}(B_i, A) < \delta$ for $i = 1, 2$, there is a set of phases of measure $< \iota_n$ such that outside of that set, (4.34) holds for both $B_1, B_2$ and $B_1^*, B_2^*$. Fix such a phase $x$, and to simplify notations, for $i = 1, 2$ let
\[ F_i := v^{(n)}_\tau(B_i^*)(x), \qquad F'_i := v^{(n)}_\tau(B_i)(x)^\perp, \]
hence $E^{(n)}_\cdot(B_i)(x) = F_i \sqcap F'_i$. By Proposition 2.38 in Chap. 2, we have
\[ d\big(E^{(n)}_\cdot(B_1)(x), E^{(n)}_\cdot(B_2)(x)\big) = d(F_1 \sqcap F'_1, F_2 \sqcap F'_2) \le \max\Big\{ \frac{1}{\theta_\sqcap(F_1, F'_1)}, \frac{1}{\theta_\sqcap(F_2, F'_2)} \Big\} \big(d_\tau(F_1, F_2) + d_{\tau^\perp}(F'_1, F'_2)\big). \tag{4.43} \]


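Applying (4.34) to $B_1^*, B_2^*$ we get
\[ d_\tau(F_1, F_2) = d_\tau\big(v^{(n)}_\tau(B_1^*)(x), v^{(n)}_\tau(B_2^*)(x)\big) < e^{-C_2 n}, \tag{4.44} \]
and applying (4.34) to $B_1, B_2$ we get
\[ d_{\tau^\perp}(F'_1, F'_2) = d_{\tau^\perp}\big(v^{(n)}_\tau(B_1)(x)^\perp, v^{(n)}_\tau(B_2)(x)^\perp\big) = d_\tau\big(v^{(n)}_\tau(B_1)(x), v^{(n)}_\tau(B_2)(x)\big) < e^{-C_2 n}. \tag{4.45} \]
By Lemma 4.12 we have, for $i = 1, 2$:
\[ \theta_\sqcap(F_i, F'_i) = \theta_\sqcap\big(v^{(n)}_\tau(B_i^*)(x), v^{(n)}_\tau(B_i)(x)^\perp\big) \ge \varepsilon_n = e^{-n \kappa(A)/5}. \tag{4.46} \]
Combining (4.43)–(4.46) we conclude:
\[ d\big(E^{(n)}_\cdot(B_1)(x), E^{(n)}_\cdot(B_2)(x)\big) \lesssim e^{n \kappa(A)/5} e^{-C_2 n} < e^{-C_3 n}, \]
for an appropriate constant $C_3$, which proves the proposition. □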



We are now ready to formulate the ACT for the Oseledets decomposition.

Theorem 4.8 Let $A \in \mathcal{C}_m$ be a cocycle with a $\tau$ gap pattern. Then locally near $A$, the map
\[ \mathcal{C}_m \ni B \mapsto E^\tau_\cdot(B) \in \mathfrak{D}_\tau(X, \mathbb{R}^m) \]
is continuous with a modulus of continuity $\omega(h) := [\iota(c \log \frac{1}{h})]^{1/2}$ for some constant $c = c(A) > 0$ and for some deviation measure function $\iota = \iota(A)$ from the corresponding set of LDT parameters. In fact, a stronger pointwise estimate holds:
\[ \mu\{x \in X : d(E^\tau_\cdot(B_1)(x), E^\tau_\cdot(B_2)(x)) > \mathrm{dist}(B_1, B_2)^\alpha\} < \omega(\mathrm{dist}(B_1, B_2)), \]
for any $B_1, B_2 \in \mathcal{C}_m$ in a neighborhood of $A$, and for some $\alpha = \alpha(A) > 0$. Moreover, the map $\mathcal{C}_m \ni A \mapsto E_\cdot(A) \in \mathfrak{D}(X, \mathbb{R}^m)$ is continuous everywhere.

Proof The first two assertions are derived from the speed of convergence in Proposition 4.22 and the finite scale continuity in Proposition 4.23 in exactly the same way we derived the continuity of the most expanding direction in Theorem 4.6. The third assertion is an immediate consequence of Proposition 4.21. □


References

1. L. Arnold, Random Dynamical Systems. Springer Monographs in Mathematics (Springer, Berlin, 1998). MR 1723992 (2000m:37087)
2. L. Backes, A note on the continuity of Oseledets subspaces for fiber-bunched cocycles (2015), pp. 1–6
3. J. Bochi, The multiplicative ergodic theorem of Oseledets. http://www.mat.uc.cl/jairo.bochi/docs/oseledets.pdf (2008) (online lecture notes)
4. C. Bocker-Neto, M. Viana, Continuity of Lyapunov exponents for random 2D matrices (2010), pp. 1–38 (to appear in Ergodic Theory and Dynamical Systems)
5. I. Ya. Gol'dsheĭd, G.A. Margulis, Lyapunov exponents of a product of random matrices. Uspekhi Mat. Nauk 44, 5(269), 13–60 (1989). MR 1040268 (91j:60014)
6. S. Gouëzel, A. Karlsson, Subadditive and multiplicative ergodic theorems (2015), pp. 1–20
7. V.A. Kaĭmanovich, Lyapunov exponents, symmetric spaces and a multiplicative ergodic theorem for semisimple Lie groups, Zap. Nauchn. Sem. Leningrad. Otdel. Mat. Inst. Steklov. (LOMI) 164 (1987) no. Differentsialnaya Geom. Gruppy Li i Mekh. IX, 29–46, 196–197. MR 947327 (89m:22006)
8. Y. Katznelson, B. Weiss, A simple proof of some ergodic theorems. Israel J. Math. 42(4), 291–296 (1982). MR 682312 (84i:28020)
9. F. Ledrappier, L.-S. Young, Stability of Lyapunov exponents. Ergodic Theory Dyn. Syst. 11(3), 469–484 (1991). MR 1125884 (92i:58096)
10. R. Mañé, Ergodic theory and differentiable dynamics, in Ergebnisse der Mathematik und ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)], vol. 8 (Springer-Verlag, Berlin, 1987) (Translated from the Portuguese by Silvio Levy. MR 889254 (88c:58040))
11. G. Ochs, Stability of Oseledets spaces is equivalent to stability of Lyapunov exponents. Dyn. Stab. Syst. 14(2), 183–201 (1999). MR 1698361 (2000g:37067)
12. V.I. Oseledec, A multiplicative ergodic theorem. Characteristic Ljapunov, exponents of dynamical systems. Trudy Moskov. Mat. Obšč. 19, 179–210 (1968). MR 0240280 (39 #1629)
13. K. Petersen, Ergodic Theory, Cambridge Studies in Advanced Mathematics, vol. 2 (Cambridge University Press, Cambridge, 1983). MR 833286 (87i:28002)
14. M.S. Raghunathan, A proof of Oseledec's multiplicative ergodic theorem. Israel J. Math. 32(4), 356–362 (1979). MR 571089 (81f:60016)
15. D. Ruelle, Ergodic theory of differentiable dynamical systems. Inst. Hautes Études Sci. Publ. Math. (50), 27–58 (1979). MR 556581 (81f:58031)
16. D. Ruelle, Characteristic exponents and invariant manifolds in Hilbert space. Ann. Math. 115(2), 243–290 (1982). MR 647807 (83j:58097)
17. M. Viana, Lectures on Lyapunov Exponents (Cambridge University Press, Studies in Advanced Mathematics, Cambridge, 2014)
18. P. Walters, An introduction to ergodic theory, in Graduate Texts in Mathematics, vol. 79 (Springer, New York, 1982). MR 648108 (84e:28017)
19. P. Walters, A dynamical proof of the multiplicative ergodic theorem. Trans. Am. Math. Soc. 335(1), 245–257 (1993). MR 1073779 (93c:28016)

Chapter 5

Large Deviations for Random Cocycles

Abstract In this chapter we prove the continuity of all Lyapunov exponents, as well as the continuity of the Oseledets decomposition for a class of irreducible cocycles over strongly mixing Markov shifts. Moreover, gaps in the Lyapunov spectrum lead to a Hölder modulus of continuity for these quantities. This result is an application of the abstract continuity theorems obtained in previous chapters, and generalizes a theorem of E. Le Page on the Hölder continuity of the maximal LE for one-parameter families of strongly irreducible and contracting cocycles over a Bernoulli shift.

5.1 Introduction and Statements

We introduce a class of locally constant cocycles over Markov shifts satisfying a strong mixing property. We formulate the results on the continuity of their Lyapunov exponents and Oseledets splitting. To prove them we need base and fiber LDT estimates, which are obtained through a spectral method of S.V. Nagaev that we briefly explain here. We finish the section with a short review of the literature on large deviations and other limit theorems for random linear cocycles.

5.1.1 Description of the Model

Let $\Sigma$ be a compact metric space and let $\mathcal{F}$ be its Borel $\sigma$-field.

Definition 5.1 A Markov kernel is a function $K : \Sigma \times \mathcal{F} \to [0, 1]$ such that
(1) for every $x \in \Sigma$, $E \mapsto K(x, E)$ is a probability measure on $\Sigma$, also denoted by $K(x)$,
(2) for every $E \in \mathcal{F}$, the function $x \mapsto K(x, E)$ is $\mathcal{F}$-measurable.

© Atlantis Press and the author(s) 2016 P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_5


The iterated Markov kernels are defined recursively, setting
(a) $K^1 = K$,
(b) $K^{n+1}(x, E) = \int_\Sigma K^n(y, E)\, K(x, dy)$, for all $n \ge 1$.
Each power $K^n$ is itself a Markov kernel on $(\Sigma, \mathcal{F})$. A probability measure $\mu$ on $(\Sigma, \mathcal{F})$ is called $K$-stationary if for all $E \in \mathcal{F}$,
\[ \mu(E) = \int K(x, E)\, \mu(dx). \]
A set $E \in \mathcal{F}$ is said to be $K$-invariant when $K(x, E) = 1$ for all $x \in E$ and $K(x, E) = 0$ for all $x \in \Sigma \setminus E$. A $K$-stationary measure $\mu$ is called ergodic when there is no $K$-invariant set $E \in \mathcal{F}$ such that $0 < \mu(E) < 1$. As usual, ergodic measures are the extremal points in the convex set of $K$-stationary measures.

Definition 5.2 A Markov system is a pair $(K, \mu)$, where $K$ is a Markov kernel on $(\Sigma, \mathcal{F})$ and $\mu$ is a $K$-stationary probability measure.

Let $(K, \mu)$ be a Markov system. There is a canonical construction, due to Kolmogorov, of a probability space $(X, \mathscr{F}, \mathbb{P}_\mu)$ and a Markov stochastic process $\{e_n : X \to \Sigma\}_{n \ge 0}$ with initial distribution $\mu$ and transition kernel $K$, i.e., for all $x \in \Sigma$ and $E \in \mathcal{F}$,
1. $\mathbb{P}_\mu[\, e_0 \in E \,] = \mu(E)$,
2. $\mathbb{P}_\mu[\, e_n \in E \mid e_{n-1} = x \,] = K(x, E)$.

We briefly outline this construction. Elements in $\Sigma$ are called states. Consider the space $X^+ = \Sigma^{\mathbb{N}}$ of state sequences $x = (x_n)_{n \in \mathbb{N}}$, with $x_n \in \Sigma$ for all $n \in \mathbb{N}$, and let $\mathscr{F}^+$ be the product $\sigma$-field $\mathscr{F}^+ = \mathcal{F}^{\mathbb{N}}$ generated by the $\mathcal{F}$-cylinders, i.e., generated by sets of the form
\[ C(E_0, \ldots, E_m) := \{ x \in X^+ : x_j \in E_j, \text{ for } 0 \le j \le m \}, \]
where $E_0, \ldots, E_m \in \mathcal{F}$ are measurable sets. The (topological) product space $X^+$ is compact and metrizable. The $\sigma$-field $\mathscr{F}^+$ coincides with the Borel $\sigma$-field of the compact space $X^+$.

Definition 5.3 Given any probability measure $\theta$ on $(\Sigma, \mathcal{F})$, the following expression determines a pre-measure
\[ \mathbb{P}^+_\theta[C(E_0, \ldots, E_m)] := \int_{E_m} \cdots \int_{E_0} \theta(dx_0) \prod_{j=1}^{m} K(x_{j-1}, dx_j) \]
on the semi-algebra of $\mathcal{F}$-cylinders. By Carathéodory's extension theorem this pre-measure extends to a unique probability measure $\mathbb{P}^+_\theta$ on $(X^+, \mathscr{F}^+)$.
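For a finite state space the integrals in Definition 5.3 reduce to finite sums, which makes the pre-measure easy to compute. The following is a minimal sketch under that assumption; the two-state kernel `K`, the measure `mu` and the function names are illustrative choices, not taken from the text.

```python
from fractions import Fraction

# Hypothetical two-state kernel: K[x][y] = K(x, {y}); each row sums to 1.
K = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
# Candidate stationary measure for this kernel.
mu = [Fraction(1, 3), Fraction(2, 3)]


def is_stationary(measure, kernel):
    """Check mu(E) = sum_x K(x, E) mu(x) on singletons E = {y}."""
    n = len(measure)
    return all(sum(measure[x] * kernel[x][y] for x in range(n)) == measure[y]
               for y in range(n))


def cylinder_measure(theta, kernel, sets):
    """P_theta^+[C(E_0, ..., E_m)]: total mass of paths with x_j in E_j."""
    # weights[x] = mass of admissible paths currently ending at state x
    weights = {x: theta[x] for x in sets[0]}
    for E in sets[1:]:
        weights = {y: sum(w * kernel[x][y] for x, w in weights.items())
                   for y in E}
    return sum(weights.values())
```

With a stationary initial distribution, the cylinder that only constrains the coordinate $x_1$ has the same measure as the one constraining $x_0$, in line with the stationarity of the process $\{e_n\}$.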


It follows from this definition that the sequence of random variables $e_n : X^+ \to \Sigma$, defined by $e_n(x) := x_n$ for $x = (x_n)_{n \in \mathbb{N}}$, is a Markov chain with initial distribution $\theta$ and transition kernel $K$ w.r.t. the probability space $(X^+, \mathscr{F}^+, \mathbb{P}^+_\theta)$. It also follows that the process $\{e_n\}_{n \ge 0}$ is stationary w.r.t. $(X^+, \mathscr{F}^+, \mathbb{P}^+_\theta)$ if and only if $\theta$ is a $K$-stationary measure.

Consider now the space $X = \Sigma^{\mathbb{Z}}$ of bi-infinite state sequences $x = (x_n)_{n \in \mathbb{Z}}$, with $x_n \in \Sigma$ for all $n \in \mathbb{Z}$, and let $\mathscr{F}$ be the product $\sigma$-field $\mathscr{F} = \mathcal{F}^{\mathbb{Z}}$ generated by the $\mathcal{F}$-cylinders in $X$. Again the topological product space $X$ is both metrizable and compact, and the $\sigma$-field $\mathscr{F}$ is the Borel $\sigma$-field on the compact metric space $X$. There is a canonical projection $\pi : X \to X^+$, defined by $\pi(x_n)_{n \in \mathbb{Z}} = (x_n)_{n \in \mathbb{N}}$, relating these two spaces.

Markov systems are probabilistic evolutionary models, which can also be studied in dynamical terms. For that we introduce the shift mappings.

Definition 5.4 The one-sided shift is the map $T : X^+ \to X^+$, $T(x_n)_{n \ge 0} = (x_{n+1})_{n \ge 0}$, while the two-sided shift is the map $T : X \to X$, $T(x_n)_{n \in \mathbb{Z}} = (x_{n+1})_{n \in \mathbb{Z}}$.

The map $T : X^+ \to X^+$ is continuous, and hence $\mathscr{F}^+$-measurable. It also preserves the measure $\mathbb{P}^+_\mu$, i.e., $T_* \mathbb{P}^+_\mu = \mathbb{P}^+_\mu$. Moreover, the Markov process $\{e_n\}_{n \ge 0}$ on $(X^+, \mathscr{F}^+, \mathbb{P}^+_\mu)$ is dynamically generated by the observable $e_0$ in the sense that $e_n = e_0 \circ T^n$, for all $n \ge 0$.

The two-sided shift $T : X \to X$ is a homeomorphism, and hence $\mathscr{F}$-bimeasurable. The projection $\pi : X \to X^+$ semi-conjugates the two shifts. The two-sided shift is the natural extension of the one-sided shift. According to this construction (see [13]), there is a unique probability measure $\mathbb{P}_\mu$ on $(X, \mathscr{F})$ such that $T_* \mathbb{P}_\mu = \mathbb{P}_\mu$ and $\pi_* \mathbb{P}_\mu = \mathbb{P}^+_\mu$. We will refer to the measures $\mathbb{P}^+_\mu$ and $\mathbb{P}_\mu$ as the Kolmogorov extensions of the Markov system $(K, \mu)$.

The expected value of a random variable $\xi : X^+ \to \mathbb{R}$ w.r.t. a probability measure $\theta$ on $(\Sigma, \mathcal{F})$ is denoted by
\[ \mathbb{E}_\theta(\xi) := \int_{X^+} \xi \, d\mathbb{P}^+_\theta. \]
When $\theta = \delta_x$ is a Dirac measure we will write $\mathbb{E}_x$ instead of $\mathbb{E}_{\delta_x}$.

Definition 5.5 Given a Markov system $(K, \mu)$, let $\mathbb{P}_\mu$ be the Kolmogorov extension of $(K, \mu)$ on $X = \Sigma^{\mathbb{Z}}$. The dynamical system $(X, \mathbb{P}_\mu, T)$ is called a Markov shift.

Let $(L^\infty(\Sigma), \|\cdot\|_\infty)$ denote the Banach algebra of complex bounded $\mathcal{F}$-measurable functions with the sup norm $\|f\|_\infty = \sup_{x \in \Sigma} |f(x)|$. The following concept corresponds to condition (A1) in [1].

Definition 5.6 We say that a Markov system $(K, \mu)$ is strongly mixing if there are constants $C > 0$ and $0 < \rho < 1$ such that for every $f \in L^\infty(\Sigma)$, all $x \in \Sigma$ and $n \in \mathbb{N}$,
\[ \Big| \int_\Sigma f(y)\, K^n(x, dy) - \int_\Sigma f(y)\, \mu(dy) \Big| \le C \rho^n \|f\|_\infty. \]
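The strong mixing condition can be checked by hand on a finite state space, where the supremum over $\|f\|_\infty \le 1$ of the left-hand side equals $\sum_y |K^n(x, y) - \mu(y)|$. The sketch below does this for a hypothetical two-state kernel (an illustrative choice, not an example from the text), using exact rational arithmetic and the recursion for $K^{n+1}$ given above.

```python
from fractions import Fraction

# Hypothetical two-state kernel and its stationary measure (illustration only).
K = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]
mu = [Fraction(1, 3), Fraction(2, 3)]


def kernel_power(kernel, n):
    """K^n via the recursion K^{n+1}(x, y) = sum_z K(x, z) K^n(z, y)."""
    size = len(kernel)
    power = [row[:] for row in kernel]
    for _ in range(n - 1):
        power = [[sum(kernel[x][z] * power[z][y] for z in range(size))
                  for y in range(size)] for x in range(size)]
    return power


def mixing_defect(kernel, measure, n):
    """sup_x sum_y |K^n(x, y) - mu(y)|: the worst case over ||f||_inf <= 1."""
    power = kernel_power(kernel, n)
    size = len(measure)
    return max(sum(abs(power[x][y] - measure[y]) for y in range(size))
               for x in range(size))
```

For this particular kernel the second eigenvalue is $1/4$, so the defect decays exactly like $(4/3)(1/4)^n$; that is, Definition 5.6 holds with $C = 4/3$ and $\rho = 1/4$.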

It follows from the definition of strong mixing that:

Proposition 5.1 If the Markov system $(K, \mu)$ is strongly mixing then the Markov shift $(X, \mathbb{P}_\mu, T)$ is a mixing dynamical system.

Proof Consider a bounded measurable observable $f : X \to \mathbb{R}$ depending only on the coordinates $x_0, \ldots, x_p$, and write $f(x) = f(x_0, \ldots, x_p)$. Let $g(x) = g(x_{-q}, \ldots, x_{-1})$ be another bounded measurable observable depending only on the coordinates $x_{-q}, \ldots, x_{-1}$ with $q \in \mathbb{N}$. Denote by $\{e_n\}_{n \in \mathbb{Z}}$ the Markov process on $(X, \mathbb{P}_\mu)$ with common distribution $\mu$ and transition kernel $K$. By the strong mixing property
\[ \mathbb{E}_{x_0}[f(e_n, \ldots, e_{n+p})] = \int_\Sigma \cdots \int_\Sigma f(x_n, \ldots, x_{n+p})\, K^n(x_0, dx_n) \prod_{j=n}^{n+p-1} K(x_j, dx_{j+1}) \]
converges uniformly (in $x_0$) to
\[ \int_\Sigma \cdots \int_\Sigma f(x_n, \ldots, x_{n+p})\, \mu(dx_n) \prod_{j=n}^{n+p-1} K(x_j, dx_{j+1}) = \mathbb{E}_\mu(f). \]
Hence
\[ \begin{aligned} \mathbb{E}_\mu[(f \circ T^n)\, g] &= \mathbb{E}_\mu[g(e_{-q}, \ldots, e_{-1})\, f(e_n, \ldots, e_{n+p})] \\ &= \int_\Sigma \cdots \int_\Sigma g(x_{-q}, \ldots, x_{-1})\, \mathbb{E}_{x_0}[f(e_n, \ldots, e_{n+p})]\, \mu(dx_{-q}) \prod_{j=-q}^{-1} K(x_j, dx_{j+1}) \end{aligned} \]
converges to
\[ \mathbb{E}_\mu(f) \int_\Sigma \cdots \int_\Sigma g(x_{-q}, \ldots, x_{-1})\, \mu(dx_{-q}) \prod_{j=-q}^{-1} K(x_j, dx_{j+1}) = \mathbb{E}_\mu(f)\, \mathbb{E}_\mu(g). \]

The mixing property of the shift $(X, \mathbb{P}_\mu, T)$ follows by applying the previous argument to the indicator functions of any two cylinders, because the $\sigma$-algebra of cylinders generates the Borel $\sigma$-field of $X$. □

Examples of strongly mixing Markov systems arise naturally from Markov kernels satisfying the Doeblin condition (see [3]). We say that $K$ satisfies the Doeblin condition if there is a positive finite measure $\rho$ on $(\Sigma, \mathcal{F})$ and some $\varepsilon > 0$ such that for all $x \in \Sigma$ and $E \in \mathcal{F}$,
\[ K(x, E) \ge 1 - \varepsilon \;\Rightarrow\; \rho(E) \ge \varepsilon. \]

Given $E \in \mathcal{F}$, define $L^\infty(E) := \{ f \in L^\infty(\Sigma) : f|_{\Sigma \setminus E} \equiv 0 \}$, which is a closed Banach sub-algebra of $(L^\infty(\Sigma), \|\cdot\|_\infty)$.

Proposition 5.2 Let $K$ be a Markov kernel on $(\Sigma, \mathcal{F})$. If $K$ satisfies the Doeblin condition then there are sets $\Sigma_1, \ldots, \Sigma_m$ in $\mathcal{F}$ and probability measures $\nu_1, \ldots, \nu_m$ on $\Sigma$ such that for all $i, j = 1, \ldots, m$,
1. $\Sigma_i \cap \Sigma_j = \emptyset$ when $i \ne j$,
2. $\Sigma_i$ is $K$-forward invariant, i.e., $K(x, \Sigma_i) = 1$ for $x \in \Sigma_i$,
3. $\nu_i$ is $K$-stationary and ergodic with $\nu_i(\Sigma_j) = \delta_{ij}$,
4. $\lim_{n \to +\infty} K^n(x, \Sigma_1 \cup \cdots \cup \Sigma_m) = 1$, with geometric uniform speed of convergence, for all $x \in \Sigma$,
5. $\nu(\Sigma_1 \cup \cdots \cup \Sigma_m) = 1$, for every $K$-stationary probability $\nu$.

Moreover, for every $1 \le i \le m$ there is an integer $p_i \in \mathbb{N}$ and measurable sets $\Sigma_{i,1}, \ldots, \Sigma_{i,p_i} \in \mathcal{F}$ such that
1. $\{\Sigma_{i,1}, \ldots, \Sigma_{i,p_i}\}$ is a partition of $\Sigma_i$,
2. $K(x, \Sigma_{i,j+1}) = 1$ for $x \in \Sigma_{i,j}$ and $1 \le j \le p_i$, with $\Sigma_{i,p_i+1} = \Sigma_{i,1}$,
3. the Markov system $(\Sigma_{i,j}, K^{p_i})$ is strongly mixing for all $1 \le j \le p_i$.

Proof See [3, Sect. V-5]. □

Let $(K, \mu)$ be a Markov system. We introduce a space of measurable functions $A : \Sigma \times \Sigma \to \mathrm{GL}(m, \mathbb{R})$.

Definition 5.7 The space $\mathcal{B}^\infty_m(K)$ consists of all functions $A : \Sigma \times \Sigma \to \mathrm{GL}(m, \mathbb{R})$ such that $A$ and $A^{-1}$ are both measurable and uniformly bounded. On this space we consider the metric $d_\infty(A, B) = \|A - B\|_\infty$.

Definition 5.8 The function $A \in \mathcal{B}^\infty_m(K)$ determines a linear cocycle $F_A : X \times \mathbb{R}^m \to X \times \mathbb{R}^m$ over the Markov shift $(X, \mathbb{P}_\mu, T)$, defined by
\[ F_A(x, v) := (Tx, A(x)\, v), \]
where we identify $A$ with the function $A : X \to \mathrm{GL}(m, \mathbb{R})$, $A(x) := A(x_0, x_1)$, for $x = (x_n)_{n \in \mathbb{Z}} \in X$. These will be referred to as random Markov cocycles. The iterates of $F_A$ are the maps $F_A^n : X \times \mathbb{R}^m \to X \times \mathbb{R}^m$, $F_A^n(x, v) = (T^n x, A^{(n)}(x)\, v)$,


with $A^{(n)} : X \to \mathrm{GL}(m, \mathbb{R})$ defined for all $x = (x_n)_{n \in \mathbb{Z}}$ by
\[ A^{(n)}(x) := A(x_{n-1}, x_n) \cdots A(x_1, x_2)\, A(x_0, x_1). \]
The cocycle $F_A$ is determined by the data $(K, \mu, A)$, and identified by the function $A$ whenever the Markov system $(K, \mu)$ is fixed.

Definition 5.9 Let $\mathrm{Gr}(\mathbb{R}^m)$ denote the Grassmann manifold of the Euclidean space $\mathbb{R}^m$. An $\mathcal{F}$-measurable section $V : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$ is called $A$-invariant when $A(x_{n-1}, x_n)\, V(x_{n-1}) = V(x_n)$ for $\mathbb{P}_\mu$-a.e. $x = (x_n)_{n \in \mathbb{Z}}$. Assuming $(K, \mu)$ is strongly mixing, the ergodicity of this Markov kernel implies that the subspaces $V(x)$ have constant dimension $\mu$-a.e., denoted by $\dim(V)$. We say that this family is proper if $0 < \dim(V) < m$.

Next we introduce the concepts of irreducible and totally irreducible cocycle (see Definition 2.7 in [1]).

Definition 5.10 A cocycle $A \in \mathcal{B}^\infty_m(K)$ is called irreducible w.r.t. $(K, \mu)$ if it admits no measurable proper $A$-invariant section $V : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$. A cocycle $A \in \mathcal{B}^\infty_m(K)$ is called totally irreducible w.r.t. $(K, \mu)$ if the exterior powers $\wedge_k A$ are irreducible for all $1 \le k \le m - 1$. We denote by $\mathcal{I}^\infty_m(K)$ the subspace of totally irreducible cocycles in $\mathcal{B}^\infty_m(K)$.

Proposition 5.3 The subspace $\mathcal{I}^\infty_m(K)$ is open in $\mathcal{B}^\infty_m(K)$.

Proof A cocycle $A \in \mathcal{B}^\infty_m(K)$ is reducible (i.e. not irreducible) if it admits a measurable proper $A$-invariant section $V : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$. It is enough to prove that the set of reducible cocycles is closed. Let $A_k \to A$ be a convergent sequence of reducible cocycles in $\mathcal{B}^\infty_m(K)$, and let $V_k : \Sigma \to \mathrm{Gr}(\mathbb{R}^m)$ be a measurable proper $A_k$-invariant section. We will prove that $A$ is also reducible.

Assume that $(\Sigma, \mu)$ is a complete probability space. Let $\Omega \subset X$ be a Borel measurable set with $\mathbb{P}_\mu(\Omega) = 1$ such that for all $k \ge 1$, all $x = (x_n)_{n \in \mathbb{Z}} \in \Omega$ and $n \in \mathbb{Z}$, $A_k(x_{n-1}, x_n)\, V_k(x_{n-1}) = V_k(x_n)$. Fix any point $s_0 \in \Sigma$. Extracting a subsequence we may assume that $V_k(s_0)$ converges to $V_0 \in \mathrm{Gr}(\mathbb{R}^m)$ as $k$ tends to $\infty$. Consider then the set
\[ E := \{ s \in \Sigma : \exists\, x \in \Omega,\ n \in \mathbb{N} \text{ such that } x_0 = s_0 \text{ and } x_n = s \}. \]
In general $E$ may fail to be a Borel set, but it is an analytic set in the sense of descriptive set theory (see [8, Definition 14.1 and Exercise 14.3]). By [8, Theorem 21.10] this set is universally measurable, and in particular it is measurable w.r.t. $\mu$. Hence, because of the strong mixing property,


\[ \mu(\Sigma \setminus E) = \lim_{n \to \infty} \mathbb{E}_{s_0}[\mathbb{1}_{\Sigma \setminus E}(e_n)] = \lim_{n \to \infty} \mathbb{P}_{s_0}\{ x \in \Omega : e_n(x) \in \Sigma \setminus E \} = \lim_{n \to \infty} \mathbb{P}_{s_0}(\emptyset) = 0, \]
which proves that for $\mu$-a.e. $s \in \Sigma$ there exists a sequence $x \in \Omega$ such that $x_0 = s_0$ and $x_n = s$ for some $n \in \mathbb{N}$. Then
\[ V_k(s) = A_k(x_{n-1}, x_n) \cdots A_k(x_1, x_2)\, A_k(x_0, x_1)\, V_k(s_0), \]
which implies that the sequence $V_k(s)$ converges to $A(x_{n-1}, x_n) \cdots A(x_1, x_2)\, A(x_0, x_1)\, V_0$ when $k \to \infty$. Thus, $V_k(s)$ converges for $\mu$-a.e. $s \in \Sigma$, and the limit function $V(s) = \lim_{k \to \infty} V_k(s)$ is a measurable and proper $A$-invariant section, with the same dimension as the sections $V_k$. This proves that the cocycle $A$ is reducible. □

For the reader's convenience we briefly recall some definitions and notations regarding the Lyapunov exponents, Oseledets filtrations and decompositions of a cocycle $A$ in any space of cocycles $\mathcal{C}_m$. The ergodic theorem of Kingman allows us to define the Lyapunov exponents $L_j(A)$ with $1 \le j \le m$ as $L_j(A) := \Lambda_j(A) - \Lambda_{j-1}(A)$, where
\[ \Lambda_j(A) := \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_j A^{(n)}(x)\| \quad \text{for } \mu\text{-a.e. } x \in X. \]
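The definition of $L_j$ through the quantities $\Lambda_j$ can be tried out numerically in dimension $m = 2$, where $\wedge_2 A^{(n)}$ is just the determinant. The sketch below is a finite-scale approximation only, and makes two assumptions that are not in the text: the matrices along the orbit are passed in as an explicit list, and the Frobenius norm is used in place of the operator norm (any two norms give the same limit). The name `lyapunov_exponents_2d` is hypothetical.

```python
import math


def matmul2(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[a[0][0] * b[0][0] + a[0][1] * b[1][0],
             a[0][0] * b[0][1] + a[0][1] * b[1][1]],
            [a[1][0] * b[0][0] + a[1][1] * b[1][0],
             a[1][0] * b[0][1] + a[1][1] * b[1][1]]]


def frobenius(a):
    """Frobenius norm of a 2x2 matrix."""
    return math.sqrt(sum(v * v for row in a for v in row))


def lyapunov_exponents_2d(matrices):
    """Finite-scale estimates of (L_1, L_2) from the product A_n ... A_1."""
    n = len(matrices)
    product = [[1.0, 0.0], [0.0, 1.0]]
    log_norm = 0.0  # accumulated log of the norm of the running product
    log_det = 0.0   # accumulated log |det|, i.e. log of the norm of wedge_2
    for a in matrices:
        product = matmul2(a, product)
        scale = frobenius(product)
        # Renormalise to avoid overflow; the scale is kept in log_norm.
        product = [[v / scale for v in row] for row in product]
        log_norm += math.log(scale)
        log_det += math.log(abs(a[0][0] * a[1][1] - a[0][1] * a[1][0]))
    lambda_1 = log_norm / n
    lambda_2 = log_det / n
    # L_1 = Lambda_1 and L_2 = Lambda_2 - Lambda_1.
    return lambda_1, lambda_2 - lambda_1
```

For the constant cocycle with value $\mathrm{diag}(2, 1/2)$ the estimates approach $\log 2$ and $-\log 2$, as expected.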

Let $\tau = (1 \le \tau_1 < \cdots < \tau_k < m)$ be a signature. If $A \in \mathcal{C}_m$ has a $\tau$-gap pattern, i.e., $L_{\tau_j}(A) > L_{\tau_j + 1}(A)$ for all $j$, we define the Lyapunov $\tau$-block
\[ \Lambda_\tau(A) := (\Lambda_{\tau_1}(A), \ldots, \Lambda_{\tau_k}(A)) \in \mathbb{R}^k. \]
A flag of $\mathbb{R}^m$ is any increasing sequence of linear subspaces. The corresponding sequence of dimensions is called its signature. A measurable filtration is a measurable function on $X$, taking values in the space of flags of $\mathbb{R}^m$ with almost sure constant signature. We denote by $\mathfrak{F}(X, \mathbb{R}^m)$ the space of measurable filtrations. Note that the Oseledets filtration of $A$, which we denote by $F(A)$, is an element of this space. We denote by $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ the subset of measurable filtrations with a signature $\tau$ or finer. If $F \in \mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ there is a natural projection $F^\tau$ with signature $\tau$, obtained from $F$ by simply 'forgetting' some of its components. This space is endowed with the following pseudo-metric
\[ \mathrm{dist}_\tau(F, F') := \int_X d_\tau\big(F^\tau(x), (F')^\tau(x)\big)\, \mu(dx), \]
where $d_\tau$ refers to the metric on the $\tau$-flag manifold. On the space $\mathfrak{F}(X, \mathbb{R}^m)$ we consider the coarsest topology that makes the sets $\mathfrak{F}_{\supset\tau}(X, \mathbb{R}^m)$ open, and the pseudo-metrics $\mathrm{dist}_\tau$ continuous.

A decomposition of $\mathbb{R}^m$ is a sequence of linear subspaces $\{E_j\}_{1 \le j \le k+1}$ whose direct sum is $\mathbb{R}^m$. This determines the flag $E_1 \subset E_1 \oplus E_2 \subset \cdots \subset E_1 \oplus \cdots \oplus E_k$, whose signature $\tau$ also designates the signature of the decomposition.


A measurable decomposition is a measurable function on $X$, taking values in the space of decompositions of $\mathbb{R}^m$ with almost sure constant signature. We denote by $\mathfrak{D}(X, \mathbb{R}^m)$ the space of measurable decompositions. Note that the Oseledets decomposition of $A$, which we denote by $E_\cdot(A)$, is an element of this space. We denote by $\mathfrak{D}_{\supset\tau}(X, \mathbb{R}^m)$ the subset of measurable decompositions with a signature $\tau$ or finer. If $E_\cdot \in \mathfrak{D}_{\supset\tau}(X, \mathbb{R}^m)$ there is a natural restriction $E^\tau_\cdot$ with signature $\tau$, obtained from $E_\cdot$ by simply 'patching up' the appropriate components. This space is endowed with the following pseudo-metric
\[ \mathrm{dist}_\tau(E_\cdot, E'_\cdot) := \int_X d_\tau\big(E^\tau_\cdot(x), (E'_\cdot)^\tau(x)\big)\, \mu(dx), \]

where $d_\tau$ refers to the metric on the manifold of $\tau$-decompositions. On the space $\mathcal{D}(X, \mathbb{R}^m)$ we consider the coarsest topology that makes the sets $\mathcal{D}_{\supset\tau}(X, \mathbb{R}^m)$ open, and the pseudo-metrics $\mathrm{dist}_\tau$ continuous.

We are ready to state a general result on the continuity of the LE, the Oseledets filtration and the Oseledets decomposition for irreducible Markov cocycles.

Theorem 5.1 Let $(K, \mu)$ be a strongly mixing Markov system and let $m \ge 1$. Then all Lyapunov exponents $L_j : \mathcal{I}_m^\infty(K) \to \mathbb{R}$, with $1 \le j \le m$, the Oseledets filtration $F : \mathcal{I}_m^\infty(K) \to \mathcal{F}(X, \mathbb{R}^m)$, and the Oseledets decomposition $E_\cdot : \mathcal{I}_m^\infty(K) \to \mathcal{D}(X, \mathbb{R}^m)$, are continuous functions of the cocycle $A \in \mathcal{I}_m^\infty(K)$. Moreover, if $A \in \mathcal{I}_m^\infty(K)$ has a $\tau$-gap pattern then the functions $\Lambda^\tau$, $F^\tau$ and $E_\cdot^\tau$ are Hölder continuous in a neighborhood of $A$.

These theorems are proved in Sect. 5.4.1. They are applications of Theorem 3.1 in Chap. 3, and Theorems 4.7 and 4.8 in Chap. 4. The main ingredients in these applications are two theorems on base and fiber uniform LDT estimates of exponential type that we now formulate.

We begin with the base LDT theorem. Consider the metric $d : X \times X \to [0, 1]$,
\[
d(x, x') := 2^{-\inf\{\, |k| \,:\, k \in \mathbb{Z},\ x_k \neq x'_k \,\}},
\]
for all $x = (x_k)_{k \in \mathbb{Z}}$ and $x' = (x'_k)_{k \in \mathbb{Z}}$ in $X$.

Remark 5.1 Notice that unless $\Sigma$ is finite, $(X, d)$ is not compact.

Given $k \in \mathbb{N}$, $\alpha > 0$ and $f \in L^\infty(X)$ define
\[
\begin{aligned}
v_k(f) &:= \sup\{\, |f(x) - f(y)| \,:\, d(x, y) \le 2^{-k} \,\}, \\
v_\alpha(f) &:= \sup\{\, 2^{\alpha k}\, v_k(f) \,:\, k \in \mathbb{N} \,\}, \\
\|f\|_\alpha &:= \|f\|_\infty + v_\alpha(f), \\
H_\alpha(X) &:= \{\, f \in L^\infty(X) \,:\, v_\alpha(f) < +\infty \,\}.
\end{aligned}
\tag{5.1}
\]
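The metric $d$ and the Hölder quotient behind $v_\alpha$ can be tried out on truncated sequences. The following is only a minimal sketch under stated assumptions: sequences are stored as dictionaries on a finite window $[-W, W]$ (coordinates outside the window are ignored), and the names `seq_dist`, `holder_quotient` and the observable `f` are hypothetical helpers introduced for illustration, not objects from the text.

```python
def seq_dist(x, y, window):
    """Distance 2^(-inf{|k| : x_k != y_k}) between two-sided sequences.

    x and y are dicts mapping each integer k in [-window, window] to a symbol.
    Coordinates outside the window are not inspected (an assumption of this
    sketch), so agreement on the whole window is reported as distance 0.
    """
    mismatches = [abs(k) for k in range(-window, window + 1) if x[k] != y[k]]
    return 2.0 ** (-min(mismatches)) if mismatches else 0.0


def holder_quotient(f, x, y, alpha, window):
    """|f(x) - f(y)| / d(x, y)^alpha, the quantity whose supremum is v_alpha(f)."""
    return abs(f(x) - f(y)) / seq_dist(x, y, window) ** alpha


W = 5
x = {k: 0 for k in range(-W, W + 1)}
y = dict(x)
y[3] = 1    # disagreement on the positive side at k = 3
y[-2] = 1   # disagreement on the negative side at k = -2


def f(s):
    """A future independent observable: it reads only coordinates k <= 0."""
    return s[0] + 0.5 * s[-1] + 0.25 * s[-2]


print(seq_dist(x, y, W))                 # smallest |k| with a mismatch is 2
print(holder_quotient(f, x, y, 1.0, W))
```

Because `f` depends only on the coordinates $k = 0, -1, -2$, changing `y[3]` alone would leave `f(y)` unchanged, which is the sense of future independence used in Definition 5.11.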

The last set, $H_\alpha(X)$, is the space of Hölder continuous functions with exponent $\alpha$ w.r.t. the distance $d$ on $X$. In fact it follows easily from the definition that
\[
v_\alpha(f) = \sup_{x \neq x'} \frac{|f(x) - f(x')|}{d(x, x')^\alpha}.
\]
From now on we denote by $\mathbf{1}$ the constant function 1.

Proposition 5.4 For all $0 \le \alpha \le 1$, $(H_\alpha(X), \|\cdot\|_\alpha)$ is a Banach algebra with unit element $\mathbf{1}$, and also a lattice.

Proof To see that $(H_\alpha(X), \|\cdot\|_\alpha)$ is a normed algebra with unity, it is enough to verify the following inequalities:
\[
\begin{aligned}
v_k(f g) &\le \|f\|_\infty\, v_k(g) + \|g\|_\infty\, v_k(f), \\
v_\alpha(f g) &\le \|f\|_\infty\, v_\alpha(g) + \|g\|_\infty\, v_\alpha(f).
\end{aligned}
\]
They imply that
\[
\|f g\|_\alpha \le \|f\|_\alpha\, \|g\|_\alpha,
\]
and clearly $\|\mathbf{1}\|_\alpha = \|\mathbf{1}\|_\infty + v_\alpha(\mathbf{1}) = 1 + 0 = 1$. The proof that $(H_\alpha(X), \|\cdot\|_\alpha)$ is a lattice and a Banach space is left as an exercise.

Definition 5.11 We say that $f : X \to \mathbb{C}$ is future independent if $f(x) = f(y)$ for any $x, y \in X$ such that $x_k = y_k$ for all $k \le 0$. Define the space
\[
H_\alpha(X^-) := \{\, f \in H_\alpha(X) \,:\, f \text{ is future independent} \,\}.
\tag{5.2}
\]

The space $H_\alpha(X^-)$ is a closed sub-algebra of $H_\alpha(X)$, and hence a unital Banach algebra itself. Denote by $\mathcal{F}^-$ the sub $\sigma$-field of $\mathcal{F}$ generated by cylinders in non-positive coordinates. With this terminology, the subspace $H_\alpha(X^-)$ consists of all $\mathcal{F}^-$-measurable functions in $H_\alpha(X)$. The base LDT theorem below is proved in Sect. 5.3.1.

Theorem 5.2 Let $(K, \mu)$ be a strongly mixing Markov system. For any $0 < \alpha \le 1$ and $\xi \in H_\alpha(X^-)$ there exist $C = C(\xi) < \infty$, $k = k(\xi) > 0$ and $\varepsilon_0 = \varepsilon_0(\xi) > 0$ such that for all $0 < \varepsilon < \varepsilon_0$, $x \in \Sigma$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_\mu\Big[\, \Big| \frac{1}{n} \sum_{j=0}^{n-1} \xi \circ T^j - \mathbb{E}_\mu(\xi) \Big| > \varepsilon \,\Big] \le C\, e^{-k\, \varepsilon^2\, n}.
\]
Moreover, the constants $C$, $k$ and $\varepsilon_0$ depend only on $K$ and $\|\xi\|_\alpha$, and if $K$ is fixed then they are uniform in $\xi$ ranging over any bounded set in $H_\alpha(X^-)$.

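The fiber LDT theorem, proved in Sect. 5.3.2, has the following statement.

Theorem 5.3 Given a Markov system $(K, \mu)$ and $A \in \mathcal{B}_m^\infty(K)$, assume
(1) $(K, \mu)$ is strongly mixing,
(2) $A$ is irreducible,
(3) $L_1(A) > L_2(A)$.
Then there exists a neighborhood $\mathcal{V}$ of $A$ in $\mathcal{B}_m^\infty(K)$ and there exist $C = C(A) < \infty$, $k = k(A) > 0$ and $\varepsilon_0 = \varepsilon_0(A) > 0$ such that for all $0 < \varepsilon < \varepsilon_0$, $B \in \mathcal{V}$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_\mu\Big[\, \Big| \frac{1}{n} \log \|B^{(n)}\| - L_1(B) \Big| > \varepsilon \,\Big] \le C\, e^{-k\, \varepsilon^2\, n}.
\]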

5.1.2 The Spectral Method

Consider a Markov system $(K, \mu)$ on a compact metric space $\Sigma$. Given some $\mathcal{F}$-measurable observable $\xi : \Sigma \to \mathbb{R}$, let $\hat\xi : X^+ \to \mathbb{R}$ be the $\mathcal{F}^+$-measurable function $\hat\xi(x) = \xi(x_0)$.

Given $x \in \Sigma$, let $\mathbb{P}_x^+$ denote the probability on the measurable space $(X^+, \mathcal{F}^+)$ that makes $\{e_n : X^+ \to \Sigma\}_{n \ge 0}$ a Markov process with transition kernel $K$ and initial distribution with point mass $\delta_x$ (see Definition 5.3). Then $\{\hat\xi \circ T^n\}_{n \ge 0}$ is also a Markov process on $(X^+, \mathcal{F}^+, \mathbb{P}_x^+)$.

Definition 5.12 We call sum process of the observable $\xi : \Sigma \to \mathbb{R}$ the following sequence of random variables $\{S_n(\xi)\}_{n \ge 0}$ on $(X^+, \mathcal{F}^+)$,
\[
S_n(\xi)(x) := \sum_{j=0}^{n-1} \hat\xi \circ T^j(x) = \sum_{j=0}^{n-1} \xi(x_j).
\]

Definition 5.13 An observed Markov system on $(\Sigma, \mathcal{F})$ is a triple $(K, \mu, \xi)$ where $(K, \mu)$ is a Markov system on $(\Sigma, \mathcal{F})$, and $\xi : \Sigma \to \mathbb{R}$ is an $\mathcal{F}$-measurable function.

Definition 5.14 We say that $\xi$ satisfies LDT estimates of exponential type if there exist positive constants $C$, $k$ and $\varepsilon_0$ such that for all $n \in \mathbb{N}$, $0 < \varepsilon < \varepsilon_0$ and $x \in \Sigma$,
\[
\mathbb{P}_x^+\Big[\, \Big\{ y \in X^+ \,:\, \Big| \frac{1}{n} S_n(\xi)(y) - \mathbb{E}_\mu(\xi) \Big| > \varepsilon \Big\} \,\Big] \le C\, e^{-n\, k\, \varepsilon^2}.
\]

Given a class X of observed Markov systems (K , μ, ξ ) on a given measurable space (Σ, F), we say that X satisfies uniform LDT estimates of exponential type if there exist positive constants C, k and ε0 such that for every observed Markov system (K , μ, ξ ) ∈ X, the observable ξ satisfies LDT estimates of exponential type with the same constants C, k and ε0 .

Definition 5.15 Let $\eta : X^+ \to \mathbb{R}$ be a random variable on $(X^+, \mathcal{F}^+)$. The function $c(\eta, x, \cdot) : \mathbb{R} \to \mathbb{R}$, $c(\eta, x, t) := \log \mathbb{E}_x[e^{t \eta}]$, is called the second characteristic function of $\eta$, also known as the cumulant generating function of $\eta$ (see [11]).

Proposition 5.5 Let $\eta : X^+ \to \mathbb{R}$ be an $\mathcal{F}^+$-measurable random variable. Assume there exist $a, M > 0$ such that for all $x \in \Sigma$, $\max\{ \mathbb{E}_x[e^{a |\eta|}],\ \mathbb{E}_x[\, |\eta|\, e^{a |\eta|} ] \} \le M$. Then the cumulant generating function $c(\eta, x, \cdot)$ satisfies
(1) $c(\eta, x, t)$ is well-defined and analytic for $t \in (-a, a)$,
(2) $c(\eta, x, 0) = 0$,
(3) $\frac{dc}{dt}(\eta, x, 0) = \mathbb{E}_x(\eta)$,
(4) $c(\eta, x, t) \ge t\, \mathbb{E}_x(\eta)$, for all $t \in (-a, a)$,
(5) the function $c(\eta, x, \cdot) : (-a, a) \to \mathbb{R}$, $t \mapsto c(\eta, x, t)$, is convex.

Proof For (1) notice that the assumptions imply that the parametric integral $\mathbb{E}_x(e^{z \eta})$ and its formal derivative $\mathbb{E}_x(e^{z \eta}\, \eta)$ are well-defined continuous functions on the disk $|z| < a$. Since $c(\eta, x, 0) = \log \mathbb{E}_x(\mathbf{1}) = \log 1 = 0$, (2) follows. Property (3) holds because $\frac{dc}{dt}(\eta, x, 0) = \mathbb{E}_x(\eta\, \mathbf{1}) / \mathbb{E}_x(\mathbf{1}) = \mathbb{E}_x(\eta)$. The convexity (5) follows by Hölder's inequality, with conjugate exponents $p = 1/s$ and $q = 1/(1 - s)$, where $0 < s < 1$. In fact, for all $t_1, t_2 \in (-a, a)$,
\[
\begin{aligned}
c(\eta, x, s\, t_1 + (1 - s)\, t_2) &= \log \mathbb{E}_x\big[ (e^{t_1 \eta})^{s}\, (e^{t_2 \eta})^{1 - s} \big] \\
&\le \log \Big( \mathbb{E}_x[e^{t_1 \eta}]^{s}\ \mathbb{E}_x[e^{t_2 \eta}]^{1 - s} \Big) \\
&= s\, c(\eta, x, t_1) + (1 - s)\, c(\eta, x, t_2).
\end{aligned}
\]
Finally, (2), (3) and (5) imply (4).
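The properties in Proposition 5.5 can be checked numerically on a toy case. This is a minimal sketch, assuming a random variable with finitely many values (so all exponential moments exist); the values, probabilities and the helper name `cgf` are illustrative assumptions, not data from the text.

```python
import math


def cgf(values, probs, t):
    """Cumulant generating function c(t) = log E[exp(t * eta)] of a finite random variable."""
    return math.log(sum(p * math.exp(t * v) for v, p in zip(values, probs)))


# Hypothetical finite distribution used only for illustration.
values = [-1.0, 0.5, 2.0]
probs = [0.5, 0.3, 0.2]
mean = sum(p * v for v, p in zip(values, probs))

ts = [-1.0 + 0.1 * i for i in range(21)]          # grid on [-1, 1]
c = [cgf(values, probs, t) for t in ts]

# (2) c(0) = 0
c_at_zero = cgf(values, probs, 0.0)
# (3) c'(0) = E(eta), approximated by a central difference
step = 1e-4
slope_at_zero = (cgf(values, probs, step) - cgf(values, probs, -step)) / (2 * step)
# (4) c(t) >= t * E(eta) on the grid
lower_bound_ok = all(ct >= t * mean - 1e-12 for t, ct in zip(ts, c))
# (5) midpoint convexity on the grid
convex_ok = all(c[i] <= (c[i - 1] + c[i + 1]) / 2 + 1e-12 for i in range(1, len(c) - 1))

print(c_at_zero, slope_at_zero, mean, lower_bound_ok, convex_ok)
```

The grid checks only sample the inequalities; they illustrate items (2) to (5) rather than prove them.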



Given an observable $\xi : \Sigma \to \mathbb{R}$, the function $c_n(\xi, x, \cdot) : \mathbb{R} \to \mathbb{R}$ defined by $c_n(\xi, x, t) := \log \mathbb{E}_x[e^{t S_n(\xi)}]$ is the cumulant generating function of $S_n(\xi)$. Under general conditions, e.g., if $\xi$ is bounded, this function is analytic in $\mathbb{C}$, or at least analytic in a neighbourhood of 0. Let us write $D_a(0) = \{\, z \in \mathbb{C} : |z| < a \,\}$.

Definition 5.16 We call limit cumulant generating function of the process $\{S_n(\xi)\}_{n \ge 0}$ any function $c(\xi, \cdot) : D_a(0) \to \mathbb{C}$ such that there exist a constant $C > 0$ and a numeric sequence $\{\delta_n\}_{n \ge 0}$ for which the following properties hold:
(1) $c_n(\xi, \cdot)$ is well defined and analytic on $D_a(0)$, for all $n \in \mathbb{N}$,
(2) $| n\, c(\xi, z) - c_n(\xi, x, z) | \le C\, |z| + \delta_n$, for all $n \in \mathbb{N}$, $z \in D_a(0)$ and $x \in \Sigma$,
(3) $\lim_{n \to +\infty} \delta_n = 0$.

Before discussing why they exist, let us draw some conclusions from the existence of limit cumulant generating functions.

Proposition 5.6 Given an $\mathcal{F}$-measurable observable $\xi : \Sigma \to \mathbb{R}$, let $c(\xi, z)$ be a limit cumulant generating function of the process $\{S_n(\xi)\}_{n \ge 0}$ on $D_a(0)$. Then
(1) $z \mapsto c(\xi, z)$ is analytic on $D_a(0)$,
(2) $c(\xi, 0) = 0$,
(3) $\frac{dc}{dt}(\xi, 0) = \mathbb{E}_\mu(\hat\xi)$,
(4) $c(\xi, t) \ge t\, \mathbb{E}_\mu(\hat\xi)$, for all $t \in (-a, a)$,
(5) the function $c(\xi, \cdot) : (-a, a) \to \mathbb{R}$, $t \mapsto c(\xi, t)$, is convex.

Proof The function $c(\xi, z)$ is analytic on $D_a(0)$ because it is the uniform limit of the sequence of analytic functions $\frac{1}{n} c_n(\xi, x, z)$. This proves (1). Item (2) follows directly from Proposition 5.5(2). Consider now the sequence of analytic functions
\[
\hat c_n(\xi, z) := \frac{1}{n} \int_\Sigma c_n(\xi, x, z)\, d\mu(x).
\]
Then, since $\mu$ is $K$-stationary,
\[
\frac{d \hat c_n}{dt}(\xi, 0) = \frac{1}{n} \int_\Sigma \mathbb{E}_x[S_n(\xi)]\, d\mu(x) = \mathbb{E}_\mu(\hat\xi).
\]
Taking the limit, identity (3) holds. Since convexity is a closed property, (5) follows from Proposition 5.5(5). Finally, (2), (3) and (5) imply (4).



The next proposition relates the existence of a limit cumulant generating function for the process $\{S_n(\xi)\}_{n \ge 0}$ with LDT estimates of exponential type for $\xi$.

Proposition 5.7 Let $\xi : \Sigma \to \mathbb{R}$ be an $\mathcal{F}$-measurable observable and let $c(\xi, z)$ be a limit cumulant generating function of the process $\{S_n(\xi)\}_{n \ge 0}$ on $D_a(0)$. Given $h > \frac{d^2 c}{dt^2}(\xi, 0)$, there exist $C, \varepsilon_0 > 0$ such that for all $n \in \mathbb{N}$, $x \in \Sigma$ and $0 < \varepsilon < \varepsilon_0$,
\[
\mathbb{P}_x^+\Big[\, \Big| \frac{1}{n} S_n(\xi) - \mathbb{E}_\mu(\hat\xi) \Big| > \varepsilon \,\Big] \le C\, e^{-n\, \frac{\varepsilon^2}{2 h}}.
\]
In other words, $\xi$ satisfies LDT estimates of exponential type.

Proof Let us abbreviate $c(t) = c(\xi, t)$. We can assume that $c'(0) = \mathbb{E}_\mu(\hat\xi) = 0$. Otherwise we would work with $\xi' = \xi - \mathbb{E}_\mu(\xi)\, \mathbf{1}$, for which $\mathbb{E}_\mu(\hat\xi') = 0$. Notice that the normalized process $\{S_n(\xi')\}_{n \ge 0}$ admits the limit cumulant generating function $c(\xi', t) = c(t) - t\, \mathbb{E}_\mu(\xi) = c(t) - t\, c'(0)$.

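Since $h > c''(0)$, we can choose $0 < t_0 < a$ such that for all $t \in (-t_0, t_0)$,
\[
0 \le c(t) \le \frac{h\, t^2}{2}.
\]
By Chebyshev's inequality and item (2) of Definition 5.16, for all $0 < t < t_0$, $x \in \Sigma$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_x^+[\, S_n(\xi) > n \varepsilon \,] \le e^{-t n \varepsilon}\, \mathbb{E}_x[e^{t S_n(\xi)}] = e^{-t n \varepsilon + c_n(\xi, x, t)} \le e^{C t + \delta_n}\, e^{-n (t \varepsilon - c(t))} \le \frac{C_0}{2}\, e^{-n \left( t \varepsilon - \frac{h t^2}{2} \right)},
\]
where $C_0$ is a constant such that $C_0 \ge 2\, e^{C t_0 + \delta_n}$ for all $n \in \mathbb{N}$.

Given $0 < \varepsilon < \varepsilon_0 := h\, t_0$, pick $t = \frac{\varepsilon}{h} \in (0, t_0)$. This choice of $t$ minimizes the function $g(t) = e^{-\left( t \varepsilon - \frac{h t^2}{2} \right)}$. For this value of $t$ we obtain
\[
\mathbb{P}_x^+[\, S_n(\xi) > n \varepsilon \,] \le \frac{C_0}{2}\, e^{-\frac{\varepsilon^2}{2 h}\, n}.
\]
We can derive the same conclusion for $-\xi$, because $c(\xi, -t)$ is a limit cumulant generating function of the process $\{S_n(-\xi)\}_{n \ge 0}$,
\[
\mathbb{P}_x^+[\, S_n(\xi) < -n \varepsilon \,] = \mathbb{P}_x^+[\, S_n(-\xi) > n \varepsilon \,] \le \frac{C_0}{2}\, e^{-\frac{\varepsilon^2}{2 h}\, n}.
\]
Thus, for all $x \in \Sigma$, $0 < \varepsilon < \varepsilon_0$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_x^+[\, |S_n(\xi)| > n \varepsilon \,] \le C_0\, e^{-\frac{\varepsilon^2}{2 h}\, n}.
\]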
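The shape of the bound in Proposition 5.7 can be compared with an exactly computable case. The sketch below is an illustration under a simplifying assumption: the observable is a fair $\pm 1$ sign drawn independently at each step (a trivial Markov system chosen here, not one from the text), for which $c(t) = \log\cosh t$, $c''(0) = 1$ and $c(t) \le t^2/2$. The function names are hypothetical.

```python
import math


def exact_upper_tail(n, eps):
    """P[S_n > n * eps] for S_n a sum of n independent fair +-1 signs (exact count)."""
    count = sum(math.comb(n, k) for k in range(n + 1) if 2 * k - n > n * eps)
    return count / 2 ** n


def chernoff_exponent(t, eps, h):
    """The exponent t * eps - h * t^2 / 2 appearing in the proof of Proposition 5.7."""
    return t * eps - h * t * t / 2.0


h = 1.1      # any h > c''(0) = 1 for this toy observable
eps = 0.2
t_star = eps / h   # the choice of t made in the proof

# Compare the exact tail with exp(-n * eps^2 / (2h)) for a few sample sizes.
rows = [(n, exact_upper_tail(n, eps), math.exp(-n * eps ** 2 / (2 * h)))
        for n in (10, 50, 200)]
for n, exact, bound in rows:
    print(n, exact, bound)

# t_star maximizes the exponent, equivalently minimizes g(t) = exp(-exponent).
grid = [0.01 * i for i in range(1, 60)]
best_on_grid = max(chernoff_exponent(t, eps, h) for t in grid)
print(chernoff_exponent(t_star, eps, h), best_on_grid)
```

In this toy case the constant can be taken equal to 1; in the general statement the constant $C_0$ absorbs the error terms $C\,|t| + \delta_n$ from Definition 5.16.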



Remark 5.2 To obtain a sharp upper bound on the rate function for the large deviations of the process $S_n(\xi)$ we should have used the Legendre transform of the convex function $c(t) - t\, c'(0)$. Here, because we do not care about sharp estimates, but mainly to avoid dealing with the degenerate case where $c(t)$ is not strictly convex, we have replaced $c(t) - t\, c'(0)$ with its upper bound $\frac{h\, t^2}{2}$ on the small neighborhood $(-t_0, t_0)$, which is always strictly convex.

Consider now a topological space $\mathfrak{X}$ of observed Markov systems $(K, \mu, \xi)$, on a given measurable space $(\Sigma, \mathcal{F})$. Denote by $\mathcal{H}(D_a(0))$ the Banach space of analytic functions $f : D_a(0) \to \mathbb{C}$ with a continuous extension up to its boundary. Endow this space with the usual max norm $\|f\|_\infty = \max_{|z| \le a} |f(z)|$.

Corollary 5.1 Assume there is a continuous map $c : \mathfrak{X} \to \mathcal{H}(D_a(0))$ such that
(a) for each $(K, \mu, \xi) \in \mathfrak{X}$, the function $c(\xi, z) := c(K, \mu, \xi)(z)$ is a limit cumulant generating function of the process $\{S_n(\xi)\}_{n \ge 0}$ on $D_a(0)$,
(b) the parameters $C$ and $\delta_n$ in Definition 5.16 can be chosen uniformly in $\mathfrak{X}$.
Then
(1) For each $(K, \mu, \xi) \in \mathfrak{X}$ there exists a neighborhood $\mathcal{V}$ in $\mathfrak{X}$ such that $\mathcal{V}$ satisfies uniform LDT estimates of exponential type.
(2) If there exists $h > 0$ such that $\frac{d^2}{dt^2} c(\xi, 0) < h$ for all $(K, \mu, \xi) \in \mathfrak{X}$ then $\mathfrak{X}$ satisfies uniform LDT estimates of exponential type.

Proof Given $(K_0, \mu_0, \xi_0) \in \mathfrak{X}$, let $c_0(t) := c(K_0, \mu_0, \xi_0)(t)$, and take $h > c_0''(0)$. By continuity of $c : \mathfrak{X} \to \mathcal{H}(D_a(0))$ there exist a neighborhood $\mathcal{V}$ of $(K_0, \mu_0, \xi_0)$ in $\mathfrak{X}$ and $t_0 > 0$ such that for any $(K, \mu, \xi) \in \mathcal{V}$, the function $c(\xi, z) := c(K, \mu, \xi)(z)$ satisfies, for all $t \in (-t_0, t_0)$,
\[
c(\xi, t) - t\, \frac{dc}{dt}(\xi, 0) < \frac{h\, t^2}{2}.
\]

The argument used to prove Proposition 5.7 shows that $\mathcal{V}$ satisfies uniform LDT estimates of exponential type.

The strategy to meet the assumptions of Corollary 5.1, i.e., to prove the existence of a limit cumulant generating function for the process $\{S_n(\xi)\}_{n \ge 0}$, is a spectral method that we describe now. Define a family of Laplace-Markov operators
\[
(Q_t f)(x) = (Q_{K, \xi, t} f)(x) := \int_\Sigma f(y)\, e^{t\, \xi(y)}\, K(x, dy),
\]
on some appropriate Banach space $\mathcal{B}$, embedded in $L^\infty(\Sigma, \mathcal{F})$, and containing the constant functions. Notice that by definition $(Q_t \mathbf{1})(x) = \mathbb{E}_x[e^{t\, \hat\xi}]$. Hence, iterating this relation we obtain the following formula for the moment generating function of $S_n(\xi)$: for all $x \in \Sigma$ and $n \in \mathbb{N}$,
\[
\mathbb{E}_x[e^{t S_n(\xi)}] = (Q_t^n \mathbf{1})(x).
\]
For $t = 0$, the operator $Q_0 : \mathcal{B} \to \mathcal{B}$ is a Markov operator. In particular it is a positive operator which fixes the constant functions, e.g., $Q_0 \mathbf{1} = \mathbf{1}$, and whose spectrum is contained in the closed unit disk.

The key ingredient to estimate the moment generating function $\mathbb{E}_x[e^{t S_n(\xi)}]$ via this spectral approach is the assumption that the operator $Q_0 : \mathcal{B} \to \mathcal{B}$ is quasi-compact and simple. This means that the eigenvalue 1 of $Q_0$ is simple and there exists a spectral gap separating this eigenvalue from the rest of the spectrum inside the open unit disk. Under this hypothesis, $Q_t$ is a positive operator, whenever defined, and there exists a unique eigenfunction $v(t) \in \mathcal{B}$ such that $Q_t v(t) = \lambda(t)\, v(t)$, normalized by $\mathbb{E}_\mu[v(t)] = 1$, and corresponding to a positive eigenvalue $\lambda(t)$ of $Q_t$. Hence, because the functions $t \mapsto \lambda(t)$ and $t \mapsto v(t)$ are continuous in $t$ (in fact analytic), we have
\[
\mathbb{E}_\mu[e^{t S_n(\xi)}] = \int (Q_t^n \mathbf{1})\, d\mu \approx \int Q_t^n v(t)\, d\mu = \int \lambda(t)^n\, v(t)\, d\mu = \lambda(t)^n.
\]

From this relation we infer that c(t) = log λ(t) is a limit cumulant generating function for the process Sn (ξ ). Therefore, by Proposition 5.7, ξ satisfies LDT estimates of exponential type. To obtain uniform LDT estimates, through Corollary 5.1, we assume some weak continuous dependence of the family of operators t → Q K ,ξ,t on the observed Markov system (K , μ, ξ ), which implies that the eigenvalue function λ(t) ∈ H(Da (0)) also depends continuously on (K , μ, ξ ).
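The heuristic $c(t) = \log \lambda(t)$ can be observed on a finite state space, where $Q_t$ is a matrix. This is a sketch under assumptions: a hypothetical two-state kernel `K` and observable `xi` are chosen for illustration, $Q_t$ is applied in the form $(Q_t f)(x) = \sum_y f(y)\, e^{t \xi(y)} K(x, y)$ as in the displayed definition, and the leading eigenvalue of the $2 \times 2$ matrix is computed from its trace and determinant.

```python
import math

# Hypothetical two-state Markov kernel and observable (illustration only).
K = [[0.9, 0.1],
     [0.4, 0.6]]
xi = [1.0, -1.0]


def laplace_markov(t):
    """Matrix of Q_t: entry (x, y) is K(x, y) * exp(t * xi(y))."""
    return [[K[x][y] * math.exp(t * xi[y]) for y in range(2)] for x in range(2)]


def apply(Q, f):
    """(Q f)(x) = sum_y Q(x, y) f(y)."""
    return [sum(Q[x][y] * f[y] for y in range(2)) for x in range(2)]


def top_eigenvalue(Q):
    """Largest eigenvalue of a 2x2 matrix with real spectrum (trace/determinant formula)."""
    tr = Q[0][0] + Q[1][1]
    det = Q[0][0] * Q[1][1] - Q[0][1] * Q[1][0]
    return (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0


def log_moment(t, n, x):
    """log (Q_t^n 1)(x), the logarithm of the iterated operator applied to the constant 1."""
    Q, f = laplace_markov(t), [1.0, 1.0]
    for _ in range(n):
        f = apply(Q, f)
    return math.log(f[x])


def gap(t, n, x):
    """log (Q_t^n 1)(x) - n log lambda(t); it should stay bounded as n grows."""
    return log_moment(t, n, x) - n * math.log(top_eigenvalue(laplace_markov(t)))


t = 0.3
print(top_eigenvalue(laplace_markov(0.0)))   # the Markov operator Q_0 has top eigenvalue 1
print(gap(t, 50, 0), gap(t, 100, 0))         # the gap stabilizes instead of growing with n
```

The stabilization of `gap` reflects the bound $|n\, c(\xi, z) - c_n(\xi, x, z)| \le C |z| + \delta_n$ required in Definition 5.16.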

5.1.3 Literature Review

We mention briefly some of the origins of this subject. One is the proof by H. Furstenberg and H. Kesten of a law of large numbers for random i.i.d. products of matrices [4], which was later abstracted by Furstenberg to a seminal theory on random products in semisimple Lie groups [5]. In this context, a first central limit theorem was proved by Tutubalin in [16]. Since its origin, the scope of Furstenberg's theory has been greatly extended by many contributions (see for instance [6, 14]).

Another source is a central limit theorem of S.V. Nagaev for stationary Markov chains (see [12]). In his approach Nagaev uses the spectral properties of a quasi-compact Markov operator acting on some space of bounded measurable functions. This method was used by E. Le Page to obtain more general central limit theorems, as well as a large deviation principle for random i.i.d. products of matrices [10]. Later P. Bougerol extended Le Page's approach, proving similar results for Markov type random products of matrices (see [1]).

The book of Bougerol and Lacroix [2], on random i.i.d. products of matrices, is an excellent introduction to the subject of [1, 10]. More recently, the book of Hennion and Hervé [7] describes a powerful abstract setting where the method of Nagaev can be applied to derive limit theorems. It contains several applications, including to dynamical systems and linear cocycles, that illustrate the method.

5.2 An Abstract Setting

In this section we specialize an abstract setting in [7], from which we derive an abstract theorem on the existence of uniform LDT estimates for Markov processes.

5.2.1 The Assumptions

Let $B$ be a Banach space and let $\mathcal{L}(B)$ denote the Banach algebra of bounded linear operators $T : B \to B$. Given $T \in \mathcal{L}(B)$, we denote its spectrum by $\sigma(T)$, and its spectral radius by
\[
\rho(T) = \lim_{n \to +\infty} \|T^n\|^{1/n} = \inf_{n \ge 1} \|T^n\|^{1/n}.
\]

Definition 5.17 The operator $T$ is called quasi-compact if there is a $T$-invariant decomposition $B = F \oplus H$ such that $\dim F < +\infty$ and the spectral radius of $T|_H$ is (strictly) less than the absolute value $|\lambda|$ of any eigenvalue $\lambda$ of $T|_F$. $T$ is called quasi-compact and simple when furthermore $\dim F = 1$. In this case $\sigma(T|_F)$ consists of a single simple eigenvalue referred to as the maximal eigenvalue of $T$.

Consider a Markov system $(K, \mu)$ on a compact metric space $\Sigma$.

Definition 5.18 The following linear operator is called a Markov operator
\[
(Q f)(x) = (Q_K f)(x) := \int_\Sigma f(y)\, K(x, dy).
\]
This operator acts on $\mathcal{F}$-measurable functions on $\Sigma$, mapping $L^p$ functions to $L^p$ functions, for any $1 \le p \le \infty$. It is easy to verify that for all $n \ge 1$ and $f \in L^\infty(\Sigma)$,
\[
(Q_K^n f)(x) = \int_\Sigma f(y)\, K^n(x, dy).
\]
Also, a probability measure $\mu$ on $(\Sigma, \mathcal{F})$ is $K$-stationary if and only if
\[
\int_\Sigma (Q_K f)\, d\mu = \int_\Sigma f\, d\mu \quad \text{for all } f \in L^\infty(\Sigma).
\]
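On a finite state space the Markov operator is multiplication by a stochastic matrix, and the properties just listed can be checked directly. The following sketch assumes a hypothetical doubly stochastic $3 \times 3$ kernel, chosen because its stationary measure is then the uniform one; all names are illustrative.

```python
# Hypothetical doubly stochastic kernel; its stationary measure is uniform.
K = [[0.5, 0.3, 0.2],
     [0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5]]
mu = [1.0 / 3.0] * 3


def markov_apply(kernel, f):
    """(Q f)(x) = sum_y f(y) K(x, y)."""
    return [sum(kernel[x][y] * f[y] for y in range(len(f))) for x in range(len(kernel))]


def integrate(f, measure):
    """Integral of f against a probability vector."""
    return sum(m * v for m, v in zip(measure, f))


def kernel_power(kernel, n):
    """n-step kernel K^n (n >= 1) by repeated matrix multiplication."""
    size = len(kernel)
    power = kernel
    for _ in range(n - 1):
        power = [[sum(power[x][k] * kernel[k][y] for k in range(size))
                  for y in range(size)] for x in range(size)]
    return power


f = [2.0, -1.0, 0.5]
Qf = markov_apply(K, f)

# Iterating Q three times agrees with integrating against the 3-step kernel.
Q3f = markov_apply(K, markov_apply(K, Qf))
K3f = markov_apply(kernel_power(K, 3), f)

print(markov_apply(K, [1.0, 1.0, 1.0]))       # Q fixes the constant function
print(integrate(Qf, mu), integrate(f, mu))    # stationarity of mu
print(max(abs(v) for v in Qf), max(abs(v) for v in f))  # sup-norm does not increase
```

The last line illustrates the contraction property in the sup norm, which holds because each row of the kernel is a probability vector.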

We shall write $Q$ instead of $Q_K$ when the kernel $K$ is fixed. The operator $Q : L^\infty(\Sigma) \to L^\infty(\Sigma)$ satisfies the following.

Proposition 5.8 For any $f \in L^\infty(\Sigma)$,
(a) $Q \mathbf{1} = \mathbf{1}$,
(b) $\int_\Sigma Q f\, d\mu = \int_\Sigma f\, d\mu$,
(c) $\|Q f\|_\infty \le \|f\|_\infty$.
In particular if $H_0 = \{\, f \in L^\infty(\Sigma) : \int_\Sigma f\, d\mu = 0 \,\}$ then $L^\infty(\Sigma) = \mathbb{R}\, \mathbf{1} \oplus H_0$ is a $Q$-invariant decomposition and $\rho(Q|_{H_0}) \le 1$.

Proof Since $\int_\Sigma K(x, dy) = 1$, items (a) and (c) follow. Item (b) is a consequence of $\mu$ being a $K$-stationary probability measure.

Definition 5.19 The following linear operator is called a Laplace-Markov operator
\[
(Q_\xi f)(x) = (Q_{K, \xi} f)(x) := \int_\Sigma f(y)\, e^{\xi(y)}\, K(x, dy).
\]

It also operates on $\mathcal{F}$-measurable functions on $\Sigma$, but the domain of $Q_\xi$ depends also on the observable $\xi$.

Proposition 5.9 Given a Markov kernel $K$ on $(\Sigma, \mathcal{F})$ the following are equivalent:
(a) there is a $K$-stationary measure $\mu$ such that $(K, \mu)$ is strongly mixing,
(b) $Q_K : L^\infty(\Sigma) \to L^\infty(\Sigma)$ is quasi-compact and simple.

Proof If $(K, \mu)$ is strongly mixing, by Definition 5.6 there exist constants $C > 0$ and $0 < \rho < 1$ such that for all $f \in L^\infty(\Sigma)$,
\[
\| (Q_K)^n f - \langle f, \mu \rangle\, \mathbf{1} \|_\infty \le C\, \rho^n\, \|f\|_\infty.
\]
Defining $H_0 = \{\, f \in L^\infty(\Sigma) : \langle f, \mu \rangle = 0 \,\}$, since $(Q_K)^* \mu = \mu$, this subspace is $Q_K$-invariant. Thus, we have a $Q_K$-invariant decomposition $L^\infty(\Sigma) = \langle \mathbf{1} \rangle \oplus H_0$ such that $\| (Q_K)^n|_{H_0} \| \le C\, \rho^n$. This implies that $\rho(Q_K|_{H_0}) \le \rho < 1$.

Conversely, if $Q_K : L^\infty(\Sigma) \to L^\infty(\Sigma)$ is quasi-compact and simple, there exists a $Q_K$-invariant decomposition $L^\infty(\Sigma) = \langle \mathbf{1} \rangle \oplus H_0$ such that $\rho(Q_K|_{H_0}) < 1$. By the Hahn-Banach Theorem there is a bounded linear functional $\Lambda : L^\infty(\Sigma) \to \mathbb{R}$ such that $\Lambda(\mathbf{1}) = 1$, and $\Lambda(f) = 0$ for all $f \in H_0$. We claim that $\Lambda$ is a positive functional, i.e., $\Lambda(f) \ge 0$ whenever $f \ge 0$. Take any function $f \in L^\infty(\Sigma)$ such that $f \ge 0$, and write $f = c\, \mathbf{1} + h$ with $h \in H_0$. Since $Q_K$ is a positive operator we have
\[
c\, \mathbf{1} = \lim_{n \to +\infty} \big( c\, \mathbf{1} + (Q_K)^n h \big) = \lim_{n \to +\infty} (Q_K)^n f \ge 0,
\]
which implies that $c = \Lambda(f) \ge 0$. Hence $\Lambda$ is positive. By the Riesz-Markov-Kakutani Theorem there is a probability measure $\mu$ on $\Sigma$ such that $\Lambda(f) = \int_\Sigma f\, d\mu$, for all $f \in L^\infty(\Sigma)$. Let us prove that $\mu$ is $K$-stationary. Given $f \in L^\infty(\Sigma)$, write $f = c\, \mathbf{1} + h$, with $h \in H_0$. Hence $Q_K f = c\, \mathbf{1} + Q_K h$, with $Q_K h \in H_0$, and
\[
\int_\Sigma (Q_K f)\, d\mu = \Lambda(Q_K f) = c = \Lambda(f) = \int_\Sigma f\, d\mu.
\]
This proves that $\mu$ is $K$-stationary. Now, because $H_0$ is the kernel of $\Lambda : L^\infty(\Sigma) \to \mathbb{R}$, we get that for all $f \in L^\infty(\Sigma)$, $f \in H_0 \Leftrightarrow \langle f, \mu \rangle = 0$. Thus $f - \langle f, \mu \rangle\, \mathbf{1} \in H_0$, and taking $\rho(Q_K|_{H_0}) \le \rho < 1$, there is a constant $C > 0$ such that
\[
\| (Q_K)^n f - \langle f, \mu \rangle\, \mathbf{1} \|_\infty = \| (Q_K)^n [ f - \langle f, \mu \rangle\, \mathbf{1} ] \|_\infty \le C\, \rho^n\, \| f - \langle f, \mu \rangle\, \mathbf{1} \|_\infty \le 2\, C\, \rho^n\, \|f\|_\infty.
\]
This proves that $(K, \mu)$ is strongly mixing.
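The geometric decay in the strong mixing condition is easy to observe for a two-state chain, where the second eigenvalue of the kernel is $1 - a - b$. This is a sketch with hypothetical transition probabilities `a` and `b`; for this particular family one has $Q^n f - \langle f, \mu \rangle \mathbf{1} = (1 - a - b)^n (f - \langle f, \mu \rangle \mathbf{1})$, so the constant $2$ below plays the role of $2C$ with $C = 1$.

```python
# Hypothetical two-state chain: a = P(0 -> 1), b = P(1 -> 0).
a, b = 0.1, 0.4
K = [[1.0 - a, a],
     [b, 1.0 - b]]
mu = [b / (a + b), a / (a + b)]     # stationary probability vector
rho = abs(1.0 - a - b)              # modulus of the second eigenvalue


def markov_apply(f):
    """(Q f)(x) = sum_y f(y) K(x, y)."""
    return [K[x][0] * f[0] + K[x][1] * f[1] for x in range(2)]


def deviation(f, n):
    """Sup-norm of Q^n f - <f, mu> 1."""
    mean = mu[0] * f[0] + mu[1] * f[1]
    g = list(f)
    for _ in range(n):
        g = markov_apply(g)
    return max(abs(v - mean) for v in g)


f = [3.0, -1.0]
sup_f = max(abs(v) for v in f)
for n in (1, 5, 10, 20):
    print(n, deviation(f, n), 2.0 * rho ** n * sup_f)
```

Each printed deviation sits below the corresponding bound $2 \rho^n \|f\|_\infty$, and both shrink by the factor $\rho$ at every step.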



We now discuss a setting, consisting of the assumptions (B1)–(B7) and (A1)–(A4) below, where an abstract LDT theorem is proved, and from which Theorems 5.2 and 5.3 will be deduced. The context here specializes a more general setting in [7].

Let $(\mathfrak{X}, \mathrm{dist})$ be a metric space of observed Markov systems $(K, \mu, \xi)$ over the compact metric space $(\Sigma, d)$. Besides $\mathfrak{X}$, this setting consists of a scale of complex Banach algebras $(B_\alpha, \|\cdot\|_\alpha)$ indexed in $\alpha \in [0, 1]$, where each $B_\alpha$ is a space of bounded Borel measurable functions on $\Sigma$. We assume that there exist seminorms $v_\alpha : B_\alpha \to [0, +\infty)$ such that for all $0 \le \alpha \le 1$,
(B1) $\|f\|_\alpha = v_\alpha(f) + \|f\|_\infty$, for all $f \in B_\alpha$,
(B2) $B_0 = L^\infty(\Sigma)$, and $\|\cdot\|_0$ is equivalent to $\|\cdot\|_\infty$,
(B3) $B_\alpha$ is a lattice, i.e., if $f \in B_\alpha$ then $\bar f, |f| \in B_\alpha$,
(B4) $B_\alpha$ is a Banach algebra with unity $\mathbf{1} \in B_\alpha$ and $v_\alpha(\mathbf{1}) = 0$.

Assume also that this family is a scale of normed spaces in the sense that (see [9]) for all $0 \le \alpha_0 < \alpha_1 < \alpha_2 \le 1$,
(B5) $B_{\alpha_2} \subset B_{\alpha_1} \subset B_{\alpha_0}$,
(B6) $v_{\alpha_0}(f) \le v_{\alpha_1}(f) \le v_{\alpha_2}(f)$, for all $f \in B_{\alpha_2}$,
(B7) $v_{\alpha_1}(f) \le v_{\alpha_0}(f)^{\frac{\alpha_2 - \alpha_1}{\alpha_2 - \alpha_0}}\ v_{\alpha_2}(f)^{\frac{\alpha_1 - \alpha_0}{\alpha_2 - \alpha_0}}$, for all $f \in B_{\alpha_2}$.

The next proposition shows that an example of a scale of Banach algebras satisfying (B1)–(B7) is given by the spaces of $\alpha$-Hölder continuous functions on $(\Sigma, d)$. The norms on these spaces are defined as follows: for all $\alpha \in (0, 1]$ and $f \in L^\infty(\Sigma)$, let
\[
\|f\|_\alpha := v_\alpha(f) + \|f\|_\infty, \quad \text{with} \quad v_\alpha(f) := \sup_{\substack{x, y \in \Sigma \\ x \neq y}} \frac{|f(x) - f(y)|}{d(x, y)^\alpha}.
\]

Proposition 5.10 If $(\Sigma, d)$ has diameter $\le 1$ then the family of spaces $H_\alpha(\Sigma) := \{\, f \in L^\infty(\Sigma) : v_\alpha(f) < +\infty \,\}$, $\alpha \in [0, 1]$, satisfies (B1)–(B7).

Proof (B1) holds by definition of the Hölder norm $\|\cdot\|_\alpha$. For (B2) notice that $v_0(f)$ measures the oscillation of $f$, and hence $v_0(f) \le 2\, \|f\|_\infty$. Property (B3) is obvious. Item (B4) follows from the inequality
\[
v_\alpha(f g) \le \|f\|_\infty\, v_\alpha(g) + \|g\|_\infty\, v_\alpha(f),
\]
that holds for all $f, g \in L^\infty(\Sigma)$. The monotonicity properties (B5) and (B6) are straightforward to check. The function $\alpha \mapsto \log v_\alpha(f)$ is convex. In fact, given $\alpha_1, \alpha_2, s \in [0, 1]$,
\[
\begin{aligned}
\log v_{s \alpha_1 + (1 - s) \alpha_2}(f) &= \log \sup_{x \neq y} \frac{|f(x) - f(y)|^{s + (1 - s)}}{d(x, y)^{s \alpha_1 + (1 - s) \alpha_2}} \\
&\le \log \left[ \left( \sup_{x \neq y} \frac{|f(x) - f(y)|}{d(x, y)^{\alpha_1}} \right)^{s} \left( \sup_{x \neq y} \frac{|f(x) - f(y)|}{d(x, y)^{\alpha_2}} \right)^{1 - s} \right] \\
&= s\, \log v_{\alpha_1}(f) + (1 - s)\, \log v_{\alpha_2}(f).
\end{aligned}
\]
Item (B7) follows from this convexity.
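The interpolation inequality (B7) and the monotonicity (B6) can be tested on a finite metric space, where the suprema become maxima over finitely many pairs. The sketch below makes assumptions chosen only for illustration: the space is a grid of points in $[0, 1]$ (diameter 1) with the usual distance, the function is the square root, and the exponents `a0 < a1 < a2` are arbitrary.

```python
import math

# Hypothetical finite metric space: 11 equally spaced points of [0, 1].
points = [i / 10 for i in range(11)]


def holder_seminorm(f, alpha):
    """v_alpha(f) = max over x != y of |f(x) - f(y)| / |x - y|^alpha on the grid."""
    return max(abs(f(x) - f(y)) / abs(x - y) ** alpha
               for i, x in enumerate(points) for y in points[i + 1:])


f = math.sqrt
a0, a1, a2 = 0.2, 0.5, 1.0
v0, v1, v2 = (holder_seminorm(f, a) for a in (a0, a1, a2))

# Right-hand side of (B7): geometric interpolation between v0 and v2.
interpolated = v0 ** ((a2 - a1) / (a2 - a0)) * v2 ** ((a1 - a0) / (a2 - a0))

print(v0, v1, v2)          # (B6): non-decreasing in alpha since the diameter is <= 1
print(v1, interpolated)    # (B7): v1 is bounded by the interpolated value
```

On a finite set the same Hölder-type argument as in the proof applies verbatim, so the inequality holds up to rounding error.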

We now adopt a second set of assumptions that rule the action of the Markov and Laplace-Markov operators associated to observed Markov systems $(K, \mu, \xi) \in \mathfrak{X}$ on the Banach algebras $B_\alpha$. Assume there exists an interval $[\alpha_1, \alpha_0] \subset (0, 1]$ with $\alpha_1 < \frac{\alpha_0}{2}$ such that for all $\alpha \in [\alpha_1, \alpha_0]$ the following properties hold:

(A1) $(K, \mu, -\xi) \in \mathfrak{X}$ whenever $(K, \mu, \xi) \in \mathfrak{X}$.

(A2) The Markov operators $Q_K : B_\alpha \to B_\alpha$ are uniformly quasi-compact and simple. More precisely, there exist constants $C > 0$ and $0 < \sigma < 1$ such that for all $(K, \mu, \xi) \in \mathfrak{X}$ and $f \in B_\alpha$,
\[
\| Q_K^n f - \langle f, \mu \rangle\, \mathbf{1} \|_\alpha \le C\, \sigma^n\, \|f\|_\alpha.
\]

(A3) The operators $Q_{K, z \xi}$ act continuously on the Banach algebras $B_\alpha$, uniformly in $(K, \mu, \xi) \in \mathfrak{X}$ and $z$ small. More precisely, we assume there are constants $b > 0$ and $M > 0$ such that for $i = 0, 1, 2$, $|z| < b$ and $f \in B_\alpha$,
\[
Q_{K, z \xi}(f\, \xi^i) \in B_\alpha \quad \text{and} \quad \| Q_{K, z \xi}(f\, \xi^i) \|_\alpha \le M\, \|f\|_\alpha.
\]

(A4) The family of functions $\mathfrak{X} \ni (K, \mu, \xi) \mapsto Q_{K, z \xi}$, indexed in $|z| \le b$, is Hölder equi-continuous in the sense that there exists $0 < \theta \le 1$ such that for all $|z| \le b$, $f \in B_\alpha$ and $(K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \in \mathfrak{X}$,
\[
\| Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \|_\infty \le M\, \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^\theta.
\]

The interval $[\alpha_1, \alpha_0]$ will be called the range of the scale of Banach algebras. In the fiber LDT theorem we will need to take $\alpha_0$ small enough to have contraction in (A2), but at the same time we need $\alpha_1$ bounded away from 0 to have uniformity in this contraction. The need for the condition $\alpha_1 < \frac{\alpha_0}{2}$ is explained below (see Remark 5.3).

The positive constants $C$, $\sigma$, $M$, $b$ and $\theta$ above will be called the setting constants. Examples of contexts satisfying all assumptions (B1)–(B7) and (A1)–(A4) are provided by the applications in Sects. 5.3.1 and 5.3.2.

The symmetry assumption (A1) allows us to reduce deviations below average to deviations above average, thus shortening the arguments.

(A2) is the main assumption: all Markov operators $Q_K : B_\alpha \to B_\alpha$ are quasi-compact and simple, uniformly in $(K, \mu, \xi) \in \mathfrak{X}$. This will imply that, by possibly decreasing $b$, all Laplace-Markov operators $Q_{K, z \xi} : B_\alpha \to B_\alpha$ are also quasi-compact and simple, uniformly in $(K, \mu, \xi) \in \mathfrak{X}$ and $|z| < b$.

(A3) is a regularity assumption. The operators $Q_{K, z \xi}$ act continuously on $B_\alpha$, uniformly in $(K, \mu, \xi) \in \mathfrak{X}$ and $|z| < b$. Moreover, it implies that $D_b \ni z \mapsto Q_{K, z \xi} \in \mathcal{L}(B_\alpha)$ is an analytic function.

Finally, (A4) implies that the function $(K, \mu, \xi) \mapsto \lambda_{K, \xi}(z)$ is uniformly Hölder continuous. Here $\lambda_{K, \xi}(z)$ denotes the maximal eigenvalue of $Q_{K, z \xi}$. These facts follow from the propositions stated and proved in the rest of this subsection.

Hypothesis (A3) implies that $Q_{K, z \xi} \in \mathcal{L}(B_\alpha)$, for all $z \in D_b$. In particular the function $Q_{K, * \xi} : D_b \to \mathcal{L}(B_\alpha)$, $z \mapsto Q_{K, z \xi}$, is well-defined, for every $(K, \mu, \xi) \in \mathfrak{X}$. The next proposition establishes its analyticity.

Proposition 5.11 The function $Q_{K, * \xi} : D_b \to \mathcal{L}(B_\alpha)$ is analytic and if $f \in B_\alpha$ then
\[
\frac{d}{dz}\, Q_{K, z \xi}(f) = Q_{K, z \xi}(f\, \xi)
\]
for all $(K, \mu, \xi) \in \mathfrak{X}$, and $\alpha_1 \le \alpha \le \alpha_0$.

Proof Given $b \in \mathbb{R}$, for all $z, z_0 \in \mathbb{C}$,
\[
\frac{e^{z b} - e^{z_0 b}}{z - z_0} - b\, e^{z_0 b} = \int_{z_0}^{z} b^2\, e^{\zeta b}\, \frac{z - \zeta}{z - z_0}\, d\zeta.
\]

This is the first order Taylor remainder formula for $h(z) = e^{b z}$ at $z = z_0$. To shorten notation we write $Q_z$ for $Q_{K, z \xi}$. Replacing $b$ by $\xi(y)$, multiplying by $f(y)\, K(x, dy)$ and integrating over $\Sigma$ we get
\[
\frac{Q_z f - Q_{z_0} f}{z - z_0} - Q_{z_0}(f\, \xi) = \int_{z_0}^{z} Q_\zeta(f\, \xi^2)\, \frac{z - \zeta}{z - z_0}\, d\zeta.
\]

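Hence, by (A3), for all $z \in D_b$,
\[
\begin{aligned}
\left\| \frac{Q_z f - Q_{z_0} f}{z - z_0} - Q_{z_0}(f\, \xi) \right\|_\alpha &\le \int_{z_0}^{z} \| Q_\zeta(f\, \xi^2) \|_\alpha\, \left| \frac{z - \zeta}{z - z_0} \right|\, |d\zeta| \\
&\le M\, \|f\|_\alpha\, |z - z_0|,
\end{aligned}
\]
which proves that the following limit exists in $\mathcal{L}(B_\alpha)$,
\[
\lim_{z \to z_0} \frac{Q_z - Q_{z_0}}{z - z_0} = Q_{z_0}(\xi\, \cdot).
\]
Notice that (A3) also implies the operator $Q_{z_0}(\xi\, \cdot)(f) := Q_{z_0}(\xi f)$ is in $\mathcal{L}(B_\alpha)$.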



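The next proposition focuses on the quasi-compactness and simplicity of $Q_z = Q_{K, z \xi}$, and it is proved using arguments in [1, 10].

Proposition 5.12 Consider a metric space $\mathfrak{X}$ of observed Markov systems satisfying (A1)–(A4) in the range $[\alpha_1, \alpha_0] \subset (0, 1]$ with setting constants $C$, $\sigma$, $M$, $b$ and $\theta$. Given $\varepsilon > 0$ there exist $C', M' > 0$ and $0 < b_0 < b$ such that the following statement holds: for all $(K, \mu, \xi) \in \mathfrak{X}$, $z \in D_{b_0}$ and $\alpha_1 \le \alpha \le \alpha_0$ there exist: a one dimensional subspace $E_z = E_{K, z \xi} \subset B_\alpha$, a hyperplane $H_z = H_{K, z \xi} \subset B_\alpha$, a number $\lambda(z) = \lambda_{K, \xi}(z) \in \mathbb{C}$, and a linear map $P_z = P_{K, z \xi} \in \mathcal{L}(B_\alpha)$ such that
(1) $B_\alpha = E_z \oplus H_z$ is a $Q_z$-invariant decomposition,
(2) $P_z$ is a projection onto $E_z$, parallel to $H_z$,
(3) $Q_z \circ P_z = P_z \circ Q_z = \lambda(z)\, P_z$,
(4) $Q_z f = \lambda(z)\, f$ for all $f \in E_z$,
(5) $z \mapsto \lambda(z)$ is analytic in a neighborhood of $D_{b_0}$,
(6) $|\lambda(z)| \ge 1 - \varepsilon$.
Furthermore, for all $f \in B_\alpha$,
(7) $\| Q_z^n f - \lambda(z)^n\, P_z f \|_\alpha \le C'\, (\sigma + \varepsilon)^n\, \|f\|_\alpha$,
(8) $\| P_z f \|_\alpha \le C'\, \|f\|_\alpha$,
(9) $\| P_z f - P_0 f \|_\alpha \le C'\, |z|\, \|f\|_\alpha$,
and for all $z \in D_{b_0}$ and $(K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \in \mathfrak{X}$,
(10) $| \lambda_{K_1, \xi_1}(z) - \lambda_{K_2, \xi_2}(z) | \le M'\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}$.

Given $(K, \mu, \xi) \in \mathfrak{X}$, define the operators
\[
P_z = P_{K, z \xi} := \frac{1}{2 \pi i} \int_{\Gamma_1} R_z(w)\, dw, \tag{5.3}
\]
\[
L_z = L_{K, z \xi} := \frac{1}{2 \pi i} \int_{\Gamma_1} w\, R_z(w)\, dw, \tag{5.4}
\]
\[
N_z = N_{K, z \xi} := \frac{1}{2 \pi i} \int_{\Gamma_0} w\, R_z(w)\, dw, \tag{5.5}
\]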

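where $\Gamma_0$ and $\Gamma_1$ are the positively oriented circles
\[
\Gamma_0 = \Big\{\, w \in \mathbb{C} : |w| = \frac{1 + 2 \sigma}{3} \,\Big\}, \qquad \Gamma_1 = \Big\{\, w \in \mathbb{C} : |w - 1| = \frac{1 - \sigma}{3} \,\Big\},
\]
and $R_z(w) = R_{K, z \xi}(w)$ stands for the resolvent of $Q_{K, z \xi}$,
\[
R_z(w) := \big( w\, I - Q_{K, z \xi} \big)^{-1}.
\]

Lemma 5.1 Given a Banach space $(B, \|\cdot\|)$ and linear operators $T, T_0 \in \mathcal{L}(B)$, if $T_0$ is invertible with $\|T_0^{-1}\| \le C$ and $\|T - T_0\| \le \varepsilon < C^{-1}$ then
1. $T$ is invertible, with $\|T^{-1}\| \le \dfrac{C}{1 - C \varepsilon}$,
2. $\|T^{-1} - T_0^{-1}\| \le \dfrac{C^2}{1 - C \varepsilon}\, \|T - T_0\|$.

Proof Since $T^{-1} = \sum_{n=0}^{\infty} (-1)^n\, \big( T_0^{-1} (T - T_0) \big)^n\, T_0^{-1}$, we have
\[
\|T^{-1}\| \le \sum_{n=0}^{\infty} \|T_0^{-1}\|^{n+1}\, \|T - T_0\|^n = \frac{\|T_0^{-1}\|}{1 - \|T_0^{-1}\|\, \|T - T_0\|} \le \frac{C}{1 - C \varepsilon}.
\]
For (2) use the formula $T^{-1} - T_0^{-1} = -T^{-1}\, (T - T_0)\, T_0^{-1}$.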
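Lemma 5.1 can be sanity-checked on $2 \times 2$ matrices. The following is a minimal sketch under assumptions: the operator norm is taken to be the maximum absolute row sum (a submultiplicative norm, so the lemma applies), and the matrices `T0` and `E` are arbitrary illustrative choices.

```python
def norm_inf(A):
    """Operator norm induced by the sup norm: maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in A)


def inverse_2x2(A):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det, A[0][0] / det]]


def subtract(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


# Hypothetical data: an invertible T0 and a small perturbation E, with T = T0 + E.
T0 = [[2.0, 0.0], [0.0, 4.0]]
E = [[0.1, -0.2], [0.3, 0.1]]
T = [[T0[i][j] + E[i][j] for j in range(2)] for i in range(2)]

C = norm_inf(inverse_2x2(T0))       # ||T0^{-1}||, equal to 0.5 here
eps = norm_inf(E)                   # ||T - T0||, equal to 0.4 here, below 1/C = 2

bound_inverse = C / (1.0 - C * eps)                 # item 1 of Lemma 5.1
bound_difference = C * C / (1.0 - C * eps) * eps    # item 2 of Lemma 5.1

print(norm_inf(inverse_2x2(T)), bound_inverse)
print(norm_inf(subtract(inverse_2x2(T), inverse_2x2(T0))), bound_difference)
```

In the proof of Lemma 5.2 the same estimate is applied with $T = w I - Q_z$ and $T_0 = w I - Q_0$, where the perturbation size is controlled by $C_0\, b_0$.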



Lemma 5.2 There exist constants $C_0 > 0$ and $0 < b_0 < b$, depending only on $C$, $M$, $\sigma$ and $b$, such that for $(K, \mu, \xi) \in \mathfrak{X}$, $z \in D_{b_0}$, and for any of the five operators $T_z = Q_{K, z \xi}$, $L_z$, $N_z$, $P_z$, and $R_z(w)$ with $w \notin \mathrm{int}(\Gamma_0) \cup \mathrm{int}(\Gamma_1)$,
1. $\|T_z\| \le C_0$,
2. $\|T_z - T_0\| \le C_0\, |z|$.

Proof By the spectral decomposition theorem (see [15, Chap. XI]) applied to the operator $Q_0$, $L_0 = P_0$ is the projection $P_0 f = \langle f, \mu \rangle\, \mathbf{1}$ and $N_0 = Q_0 - P_0$. Notice also that $Q_0^n = P_0 + N_0^n$ for all $n \ge 1$. Hence $\|L_0\| = \|P_0\| = 1$, $\|N_0\| \le C \sigma$, and
\[
\|Q_0\| = \|L_0 + N_0\| \le 1 + C \sigma.
\]
We now go through the given operators, one at a time. Assume $0 < b_0 < b$ is small and take $z \in D_{b_0}$. For $Q_{K, z \xi}$, item (1) follows from assumption (A3), taking $C_0 := M$, while (2) follows from (A3) and Proposition 5.11 with the same constant. For the operator $R_z(w)$, we have
\[
\begin{aligned}
R_0(w) = w^{-1} (I - w^{-1} Q_0)^{-1} &= w^{-1} \sum_{n=0}^{\infty} \frac{Q_0^n}{w^n} \\
&= w^{-1} \sum_{n=0}^{\infty} \frac{P_0}{w^n} + w^{-1} \sum_{n=0}^{\infty} \frac{N_0^n}{w^n} = \frac{P_0}{w - 1} + \sum_{n=0}^{\infty} \frac{N_0^n}{w^{n+1}}.
\end{aligned}
\]
Notice also that $w \notin \mathrm{int}(\Gamma_0) \cup \mathrm{int}(\Gamma_1)$ implies $|w - 1| \ge \frac{1 - \sigma}{3}$ and $|w| \ge \frac{1 + 2 \sigma}{3}$, and hence
\[
\begin{aligned}
\|R_0(w)\| &\le \frac{\|P_0\|}{|w - 1|} + \frac{C}{|w|} \sum_{n=0}^{\infty} \left( \frac{\sigma}{|w|} \right)^n \\
&\le \frac{3}{1 - \sigma} + \frac{3 C}{1 + 2 \sigma} \sum_{n=0}^{\infty} \left( \frac{3 \sigma}{1 + 2 \sigma} \right)^n = \frac{3 + 3 C}{1 - \sigma} =: C_1.
\end{aligned}
\]
Therefore, applying Lemma 5.1 to $w I - Q_z$ and $w I - Q_0$, item (1) holds with $C_2 := \frac{C_1}{1 - C_1 C_0 b_0}$, while (2) holds with $C_3 := \frac{C_1^2\, C_0}{1 - C_1 C_0 b_0}$. Of course we have to pick $0 < b_0 < b$ small enough to make sure the denominators in constants $C_2$ and $C_3$ are both positive. For the remaining operators $P_z$, $L_z$ and $N_z$ we use the integral formulas (5.3)–(5.5) to reduce to the previous case, using the same constants $C_2$ and $C_3$ as before.

Proof (of Proposition 5.12) By Lemma 5.2 for all $|z| < b$ and $w \notin \mathrm{int}(\Gamma_0) \cup \mathrm{int}(\Gamma_1)$, the operator norm $\|R_z(w)\|$ is uniformly bounded. This implies that the spectrum $\Sigma_z$ of $Q_{K, z \xi}$ is contained in $\mathrm{int}(\Gamma_0) \cup \mathrm{int}(\Gamma_1)$, and hence we can write $\Sigma_z = \Sigma_z^0 \cup \Sigma_z^1$ with $\Sigma_z^i \subset \mathrm{int}(\Gamma_i)$, for $i = 0, 1$. By the spectral theory of bounded operators on Banach spaces (see [15, Chap. XI]) if we denote by $H_z$ and $E_z$ the subspaces of $B_\alpha$, respectively associated to the spectral components $\Sigma_z^0$ and $\Sigma_z^1$, then for all $z \in D_{b_0}$, with $b_0 > 0$ small enough,
(a) the operators $Q_z$, $P_z$, $L_z$ and $N_z$ commute,
(b) $L_z f = Q_z f \in E_z$, for all $f \in E_z$,
(c) $N_z f = Q_z f \in H_z$, for all $f \in H_z$,
(d) $Q_z = L_z + N_z$,
(e) $B_\alpha = E_z \oplus H_z$,
(f) $P_z$ is the projection to $E_z$ parallel to $H_z$.

For $z = 0$, the condition (A2) implies that the operator $Q_0|_{B_\alpha}$ is quasi-compact and simple, with spectrum $\Sigma_0^0 \subset D_\sigma$ and $\Sigma_0^1 = \{1\}$. Since 1 is a simple eigenvalue, $E_0 = \langle \mathbf{1} \rangle$ is the space of constant functions. The operator $Q_0$ leaves invariant the subspace of functions with zero average and acts on it as a contraction with spectral radius $\le \sigma$. Hence we must have $H_0 = \{\, f \in B_\alpha : \int f\, d\mu = 0 \,\}$. Thus for all $f \in B_\alpha$, $P_0 f = (\int f\, d\mu)\, \mathbf{1}$ and $N_0 f = Q_0 f - (\int f\, d\mu)\, \mathbf{1}$.

Since 1 is a simple eigenvalue of $Q_0$, a continuity argument implies that $\Sigma_z^1$ is a singleton, i.e., $\Sigma_z^1 = \{\lambda(z)\}$, for all $z \in D_b$. It follows easily that $\dim(E_z) = 1$, and $\lambda(z) = \langle L_z \mathbf{1}, \mu \rangle / \langle P_z \mathbf{1}, \mu \rangle$. By perturbation theory, and Proposition 5.11, the function $\lambda : D_{b_0} \to \mathbb{C}$ is analytic. Hence, to finish the proof of Proposition 5.12, it is now enough to establish items (6)–(10).

Take $0 < b_0 < b$ according to Lemma 5.2. Fixing a reference probability measure $\mu_0$ on $\Sigma$, we can write, for all $z \in D_b$,
\[
\lambda_{K, \mu, \xi}(z) = \frac{\langle L_{K, z \xi} \mathbf{1}, \mu_0 \rangle}{\langle P_{K, z \xi} \mathbf{1}, \mu_0 \rangle}. \tag{5.6}
\]

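Notice that by Lemma 5.2, for all $(K, \mu, \xi) \in \mathfrak{X}$,
\[
| \langle P_{K, z \xi} \mathbf{1}, \mu_0 \rangle | \ge 1 - \| P_{K, z \xi} \mathbf{1} - P_{K, 0} \mathbf{1} \|_\alpha \ge 1 - C_0\, b_0.
\]
Hence, for all $z \in D_{b_0}$,
\[
\begin{aligned}
| \lambda_{K, \mu, \xi}(z) - 1 | &\le \left| \frac{\langle L_{K, z \xi} \mathbf{1}, \mu_0 \rangle}{\langle P_{K, z \xi} \mathbf{1}, \mu_0 \rangle} - \frac{\langle L_{K, 0} \mathbf{1}, \mu_0 \rangle}{\langle P_{K, 0} \mathbf{1}, \mu_0 \rangle} \right| \\
&\le \frac{| \langle L_{K, z \xi} \mathbf{1} - L_{K, 0} \mathbf{1}, \mu_0 \rangle |}{1 - C_0 b_0} + \frac{C_0\, | \langle P_{K, z \xi} \mathbf{1} - P_{K, 0} \mathbf{1}, \mu_0 \rangle |}{(1 - C_0 b_0)^2} \\
&\le \frac{C_0 b_0}{1 - C_0 b_0} + \frac{C_0^2\, b_0}{(1 - C_0 b_0)^2} = O(b_0).
\end{aligned}
\]
Thus, given $\varepsilon > 0$ we can make $b_0 > 0$ small enough so that for all $(K, \mu, \xi) \in \mathfrak{X}$, and all $z \in D_{b_0}$, $| \lambda_{K, \mu, \xi}(z) - 1 | < \varepsilon$. This implies (6).

To prove (7), choose $p \in \mathbb{N}$ such that $C\, \sigma^p \le (\sigma + \frac{\varepsilon}{2})^p$, and make $b_0 > 0$ small enough so that
\[
p\, C_0^p\, b_0 < (\sigma + \varepsilon)^p - \Big( \sigma + \frac{\varepsilon}{2} \Big)^p = O(\varepsilon).
\]
We then have
\[
\begin{aligned}
\| N_z^p \| \le \| N_0^p \| + \| N_z^p - N_0^p \| &\le C\, \sigma^p + p\, C_0^{p-1}\, \| N_z - N_0 \| \le C\, \sigma^p + p\, C_0^p\, b_0 \\
&< C\, \sigma^p + (\sigma + \varepsilon)^p - \Big( \sigma + \frac{\varepsilon}{2} \Big)^p \le (\sigma + \varepsilon)^p.
\end{aligned}
\]
It follows that for all $n \in \mathbb{N}$, $\| N_z^n \| \le C_0\, (\sigma + \varepsilon)^n$. This proves (7) with $C' = C_0$. Items (8) and (9) follow from Lemma 5.2.

To prove item (10), we claim that for all $(K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \in \mathfrak{X}$, $z \in D_{b_0}$, $2 \alpha_1 \le \alpha \le \alpha_0$, and $f \in B_\alpha$,
\[
v_{\frac{\alpha}{2}}\big( Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \big) \lesssim \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}. \tag{5.7}
\]
In fact by (B7), (B2) and (A4), we have
\[
\begin{aligned}
v_{\frac{\alpha}{2}}\big( Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \big) &\le v_0\big( Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \big)^{\frac{1}{2}}\ v_\alpha\big( Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \big)^{\frac{1}{2}} \\
&\lesssim \| Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \|_\infty^{\frac{1}{2}}\ v_\alpha\big( Q_{K_1, z \xi_1} f - Q_{K_2, z \xi_2} f \big)^{\frac{1}{2}} \\
&\lesssim \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}.
\end{aligned}
\]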

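Equation (5.7) implies, for all $(K_1, \mu_1, \xi_1)$, $(K_2, \mu_2, \xi_2)$, $z$, $\alpha$ and $f$ as above, and all $w \notin \mathrm{int}(\Gamma_0) \cup \mathrm{int}(\Gamma_1)$,
\[
v_{\frac{\alpha}{2}}\big( R_{K_1, z \xi_1}(w) f - R_{K_2, z \xi_2}(w) f \big) \lesssim \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}. \tag{5.8}
\]
This follows from (5.7), Lemma 5.2, and the algebraic relation
\[
R_{K_1, z \xi_1}(w) - R_{K_2, z \xi_2}(w) = R_{K_1, z \xi_1}(w) \circ \big( Q_{K_1, z \xi_1} - Q_{K_2, z \xi_2} \big) \circ R_{K_2, z \xi_2}(w).
\]
Thus, integrating (5.3) and (5.4), we obtain
\[
\begin{aligned}
\| P_{K_1, z \xi_1} f - P_{K_2, z \xi_2} f \|_{\frac{\alpha}{2}} &\lesssim \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}, \\
\| L_{K_1, z \xi_1} f - L_{K_2, z \xi_2} f \|_{\frac{\alpha}{2}} &\lesssim \|f\|_\alpha\ \mathrm{dist}\big( (K_1, \mu_1, \xi_1), (K_2, \mu_2, \xi_2) \big)^{\frac{\theta}{2}}.
\end{aligned}
\]
Finally, (10) follows from the previous inequalities and (5.6).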



Remark 5.3 The condition $\alpha_1 < \frac{\alpha_0}{2}$ and the assumption (A4) are only needed to prove item (10) of Proposition 5.12.

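5.2.2 An Abstract Theorem

In this subsection we state and prove an abstract LDT theorem. Let $(B_\alpha, \|\cdot\|_\alpha)_{\alpha \in [0, 1]}$ be a scale of Banach algebras satisfying (B1)–(B7). Assume $\mathfrak{X}$ is a metric space of observed Markov systems for which assumptions (A1)–(A4) hold. Take $0 < b_0 < b$ according to Proposition 5.12. Given $(K, \mu, \xi) \in \mathfrak{X}$, let
\[
c_{K, \xi}(z) := \log \lambda_{K, \xi}(z), \tag{5.9}
\]
where $\lambda_{K, \xi}(z)$ denotes the maximal eigenvalue of $Q_{K, z \xi}$.

Theorem 5.4 Given $(K_0, \mu_0, \xi_0) \in \mathfrak{X}$ and $h > (c_{K_0, \xi_0})''(0)$, there exist a neighborhood $\mathcal{V}$ of $(K_0, \mu_0, \xi_0)$ in $\mathfrak{X}$, $C > 0$ and $\varepsilon_0 > 0$ such that for all $(K, \mu, \xi) \in \mathcal{V}$, $0 < \varepsilon < \varepsilon_0$, $x \in \Sigma$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_x^+\Big[\, \Big| \frac{1}{n} S_n(\xi) - \mathbb{E}_\mu(\xi) \Big| \ge \varepsilon \,\Big] \le C\, e^{-\frac{\varepsilon^2}{2 h}\, n}. \tag{5.10}
\]

Remark 5.4 Averaging in $x$ w.r.t. $\mu$ we get, for all $0 < \varepsilon < \varepsilon_0$, $(K, \mu, \xi) \in \mathcal{V}$ and $n \in \mathbb{N}$,
\[
\mathbb{P}_\mu^+\Big[\, \Big| \frac{1}{n} S_n(\xi) - \mathbb{E}_\mu(\xi) \Big| \ge \varepsilon \,\Big] \le C\, e^{-\frac{\varepsilon^2}{2 h}\, n}.
\]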

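Lemma 5.3 For all $(K,\mu,\xi) \in \mathfrak{X}$, $n \in \mathbb{N}$, $z \in D_{b_0}$ and $x \in \Sigma$,

\[ ((Q_{K,z\xi})^n \mathbf{1})(x) = \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] = \int_{X^+} e^{z S_n(\xi)} \, d\mathbb{P}^+_x . \]

In particular, for all $z \in D_{b_0}$,

\[ \mathbb{E}_\mu\left( (Q_{K,z\xi})^n \mathbf{1} \right) = \mathbb{E}_\mu\left[ e^{z S_n(\xi)} \right]. \]

Proof In fact,

\[ ((Q_{K,z\xi})^n \mathbf{1})(x_0) = \int_{\Sigma^n} e^{z \sum_{j=1}^{n} \xi(x_j)} \prod_{j=0}^{n-1} K(x_j, dx_{j+1}) = \mathbb{E}_{x_0}\left[ e^{z S_n(\xi)} \right]. \]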

We obtain the second identity averaging this relation in x0 w.r.t. μ.
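The identity of Lemma 5.3 can be checked by hand on a finite state space. The following is only a minimal sketch under assumptions that are not in the text: a hypothetical 3-state transition matrix `K`, a hypothetical observable `xi`, a real parameter `z`, and the convention $(Q_{K,z\xi} f)(x) = \sum_y K(x,y)\, e^{z\xi(y)} f(y)$ with $S_n(\xi) = \xi(x_1) + \dots + \xi(x_n)$, as in the proof above. It compares the iterated operator with a brute-force sum over all paths.

```python
import itertools
import math

# Hypothetical 3-state chain: K[x][y] is the transition probability x -> y.
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
xi = [1.0, -0.5, 2.0]   # hypothetical observable on the three states
z = 0.3                 # real parameter (the lemma also allows complex z)

def laplace_markov(f):
    """Apply Q_{K, z xi}: (Q f)(x) = sum_y K(x, y) * exp(z * xi(y)) * f(y)."""
    return [sum(K[x][y] * math.exp(z * xi[y]) * f[y] for y in range(3))
            for x in range(3)]

def iterate_on_one(n):
    """Return (Q_{K, z xi})^n applied to the constant function 1."""
    f = [1.0, 1.0, 1.0]
    for _ in range(n):
        f = laplace_markov(f)
    return f

def expectation_by_paths(x0, n):
    """E_{x0}[exp(z * S_n)], with S_n = xi(x_1) + ... + xi(x_n), summed over all paths."""
    total = 0.0
    for path in itertools.product(range(3), repeat=n):
        prob, s, prev = 1.0, 0.0, x0
        for state in path:
            prob *= K[prev][state]
            s += xi[state]
            prev = state
        total += prob * math.exp(z * s)
    return total

n = 5
lhs = iterate_on_one(n)
rhs = [expectation_by_paths(x0, n) for x0 in range(3)]
```

The two lists `lhs` and `rhs` should agree up to rounding, one entry per starting state $x_0$.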



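The next proposition shows that $c_{K,\xi}(z)$, defined in (5.9), is a limit cumulant generating function of the process $\{S_n(\xi)\}_{n \ge 0}$. Moreover it says that the parameters $C$ and $\delta_n$ in Definition 5.16 can be chosen uniformly in $\mathfrak{X}$.

Proposition 5.13 There exist $C_1 > 0$ and a sequence $\delta_n$ converging geometrically to 0 such that for all $(K,\mu,\xi) \in \mathfrak{X}$, $z \in D_{b_0}(0)$, $x \in \Sigma$ and $n \in \mathbb{N}$,

\[ \left| n \log \lambda_{K,\xi}(z) - \log \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] \right| \le C_1 |z| + \delta_n . \]

Proof We will use the notations in Proposition 5.12, choosing $\varepsilon > 0$ small enough so that $\sigma + \varepsilon < 1 - \varepsilon$. By Lemma 5.3, $(Q_z^n \mathbf{1})(x) = \mathbb{E}_x[e^{z S_n(\xi)}]$. By Lemma 5.2 there exists $B > 0$ such that for all $z \in D_{b_0}(0)$, $\|P_z - I\|_\alpha \le B |z|$. Hence

\begin{align*}
\left| \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] - \lambda_{K,\xi}(z)^n \right| &\le \left| (Q_z^n \mathbf{1})(x) - \lambda_{K,\xi}(z)^n \right| \\
&\le \| Q_z^n \mathbf{1} - \lambda_{K,\xi}(z)^n P_z \mathbf{1} \|_\alpha + |\lambda_{K,\xi}(z)|^n \, \| \mathbf{1} - P_z \mathbf{1} \|_\alpha \\
&= \| N_z^n \mathbf{1} \|_\alpha + |\lambda_{K,\xi}(z)|^n \, \| \mathbf{1} - P_z \mathbf{1} \|_\alpha \\
&\le C (\sigma + \varepsilon)^n + B |z| \, |\lambda_{K,\xi}(z)|^n .
\end{align*}

Thus

\begin{align*}
\left| \log \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] - n \log \lambda_{K,\xi}(z) \right| &= \left| \log \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] - \log \lambda_{K,\xi}(z)^n \right| \\
&\le \frac{\left| \mathbb{E}_x\left[ e^{z S_n(\xi)} \right] - \lambda_{K,\xi}(z)^n \right|}{\min\{ |\lambda_{K,\xi}(z)|^n, \, |\mathbb{E}_x[ e^{z S_n(\xi)} ]| \}} \\
&\le \frac{B |z| \, |\lambda_{K,\xi}(z)|^n + C (\sigma + \varepsilon)^n}{(1 - B |z|) \, |\lambda_{K,\xi}(z)|^n - C (\sigma + \varepsilon)^n} \\
&\le \frac{B |z| + \delta_n}{1 - B |z| - \delta_n} \le 2 (B |z| + \delta_n),
\end{align*}

where

\[ \delta_n := C \, \frac{(\sigma + \varepsilon)^n}{|\lambda_{K,\xi}(z)|^n} \le C \left( \frac{\sigma + \varepsilon}{1 - \varepsilon} \right)^n \]

converges geometrically to zero.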
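A small numerical illustration of Proposition 5.13, offered as a sketch only: the 3-state matrix `K`, the observable `xi`, the real parameter `z` and the power-iteration helper `top_eigenvalue` are all hypothetical choices, not objects from the text. For a positive matrix the maximal eigenvalue $\lambda(z)$ of the Laplace-Markov operator can be approximated by power iteration, and the gap $n \log \lambda(z) - \log \mathbb{E}_x[e^{z S_n(\xi)}]$ is then seen to stay bounded and to stabilise as $n$ grows.

```python
import math

# Hypothetical 3-state chain and observable (assumptions for illustration only).
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]
xi = [1.0, -0.5, 2.0]
z = 0.3

# Matrix of the Laplace-Markov operator: M[x][y] = K(x, y) * exp(z * xi(y)).
M = [[K[x][y] * math.exp(z * xi[y]) for y in range(3)] for x in range(3)]

def apply(matrix, f):
    """Matrix-vector product (the operator applied to the function f)."""
    return [sum(matrix[x][y] * f[y] for y in range(3)) for x in range(3)]

def top_eigenvalue(matrix, steps=200):
    """Power iteration for the maximal eigenvalue of a positive matrix."""
    f = [1.0, 1.0, 1.0]
    lam = 1.0
    for _ in range(steps):
        g = apply(matrix, f)
        lam = max(g)              # f is normalised so that max(f) == 1
        f = [v / lam for v in g]
    return lam

lam = top_eigenvalue(M)

def gap(n, x):
    """n * log(lambda(z)) - log E_x[exp(z * S_n)], using Lemma 5.3 for the expectation."""
    f = [1.0, 1.0, 1.0]
    for _ in range(n):
        f = apply(M, f)
    return n * math.log(lam) - math.log(f[x])

gaps = {n: gap(n, 0) for n in (5, 10, 20, 40)}
```

In this toy setting the values in `gaps` settle quickly, which is the finite-dimensional shadow of the geometric sequence $\delta_n$.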

Proof (of Theorem 5.4) Combine Proposition 5.13 with Corollary 5.1.

 

5.3 The Proof of LDT Estimates

Here we prove the base-LDT and uniform fiber-LDT estimates for irreducible cocycles over mixing Markov shifts. These results follow from the abstract Theorem 5.4.

5.3.1 Base LDT Estimates

To deduce Theorem 5.2 from Theorem 5.4 we specify the data $(B_\alpha, \|\cdot\|_\alpha)$ and $\mathfrak{X}$, and check the validity of the assumptions (B1)–(B7) and (A1)–(A4). Consider a strongly mixing Markov system $(K,\mu)$ on the compact metric space $\Sigma$. Let $X^- = \Sigma^{\mathbb{Z}_0^-}$ be the space of sequences in $\Sigma$ indexed in the set $\mathbb{Z}_0^-$ of non-positive integers. Since $\mathbb{Z}_0^-$ is countable, the product $X^-$ is a compact metrizable topological space. We denote by $\mathcal{F}$ its Borel $\sigma$-field. The kernel $K$ on $\Sigma$ induces another Markov kernel $\tilde{K}$ on $X^-$ defined by

\[ \tilde{K}_{(\dots, x_{-1}, x_0)} := \int_\Sigma \delta_{(\dots, x_{-1}, x_0, x_1)} \, K(x_0, dx_1). \]

Let $\mathbb{P}^-_\mu$ denote the Kolmogorov extension of $(K,\mu)$, which is also the unique $\tilde{K}$-stationary measure. Theorem 5.4 will be applied to the Markov system $(\tilde{K}, \mathbb{P}^-_\mu)$. Consider the spaces $H_\alpha(X^-)$ introduced in Definition (5.11). Its functions can be regarded as measurable functions on $X^-$.

Proposition 5.14 The family of spaces $H_\alpha(X^-)$ forms a scale of Banach algebras satisfying (B1)–(B7).

Proof This follows essentially from Proposition 5.10. The metric space $(X^-, d)$ has diameter 1 but it is not compact (see Remark 5.1). Hence, formally, this proposition is not a direct consequence of Proposition 5.10. Properties (B1), (B3) and (B4) follow from Proposition 5.4. For $\alpha = 0$, the seminorm $v_0$ measures the variation of $f$. Hence $H_0(X^-) = L^\infty(X^-)$, while the norm $\|\cdot\|_0$ is equivalent to $\|\cdot\|_\infty$. This proves (B2). The remaining properties, (B5)–(B7), can be proved as in Proposition 5.10.


Fix $0 < \alpha_0 \le 1$ and $0 < L < +\infty$ and consider the space $\mathfrak{X}$ of observed Markov systems $(\tilde{K}, \mathbb{P}^-_\mu, \xi)$ over the fixed Markov system $(\tilde{K}, \mathbb{P}^-_\mu)$, with $\xi \in H_{\alpha_0}(X^-)$ and $\|\xi\|_{\alpha_0} \le L$. This space is identified with a subspace of $H_{\alpha_0}(X^-)$ and endowed with the corresponding norm distance.

Proposition 5.15 The space $\mathfrak{X}$ of observed Markov systems satisfies (A1)–(A4).

The kernel $\tilde{K}$ determines the Markov operator $Q_{\tilde{K}} : L^\infty(X^-) \to L^\infty(X^-)$,

\[ (Q_{\tilde{K}} f)(\dots, x_{-1}, x_0) := \int_\Sigma f(\dots, x_{-1}, x_0, x_1) \, K(x_0, dx_1). \]

This operator acts continuously on $H_\alpha(X^-)$. Its iterates are given by

\[ (Q_{\tilde{K}}^n f)(\dots, x_{-1}, x_0) = \int_{\Sigma^n} f(\dots, x_0, x_1, \dots, x_n) \prod_{j=0}^{n-1} K(x_j, dx_{j+1}). \tag{5.11} \]

Proposition 5.16 For all $f \in H_\alpha(X^-)$ and $n \in \mathbb{N}$,
(1) $\|(Q_{\tilde{K}})^n f\|_\infty \le \|f\|_\infty$,
(2) $v_\alpha((Q_{\tilde{K}})^n f) \le \max\{ 2 \|(Q_{\tilde{K}})^n f\|_\infty, \, 2^{-n\alpha} v_\alpha(f) \}$.

Proof We shall write $Q = Q_{\tilde{K}}$. The first inequality follows from (c) of Proposition 5.8. For the second, notice that if $k \ge 1$ then $v_k(Q^n f) \le v_{k+n}(f)$. Indeed, for $x = (x_n)_{n \le 0}$ and $x' = (x'_n)_{n \le 0}$ in $X^-$ such that $d(x, x') \le 2^{-k}$ with $k \ge 1$, we have $x_0 = x'_0$. Thus

\begin{align*}
\left| (Q^n f)(\dots, x_{-1}, x_0) - (Q^n f)(\dots, x'_{-1}, x'_0) \right| &\le \int_{\Sigma^n} \left| f(\dots, x_0, x_1, \dots, x_n) - f(\dots, x'_0, x_1, \dots, x_n) \right| \prod_{j=0}^{n-1} K(x_j, dx_{j+1}) \\
&\le v_{k+n}(f) \int_{\Sigma^n} \prod_{j=0}^{n-1} K(x_j, dx_{j+1}) = v_{k+n}(f),
\end{align*}

and taking the sup in $x, x' \in X^-$ such that $d(x, x') \le 2^{-k}$, the inequality $v_k(Q^n f) \le v_{k+n}(f)$ follows. Hence, for $k \ge 1$,

\[ 2^{\alpha k} v_k(Q^n f) \le 2^{-n\alpha} \left( 2^{\alpha(k+n)} v_{k+n}(f) \right) \le 2^{-n\alpha} v_\alpha(f). \]

For $k = 0$ notice that $v_0(Q^n f)$ is the variation of $Q^n f$. Thus $v_0(Q^n f) \le 2 \|Q^n f\|_\infty$. Taking the sup in $k \in \mathbb{N}$, item (2) follows.

The next proposition shows that $\mathfrak{X}$ satisfies (A2) with range $[\alpha_1, \alpha]$ for any given $0 < \alpha_1 \le \alpha$. The setting constants $C > 0$ and $0 < \sigma < 1$ depend on the number $\alpha_1$.


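Proposition 5.17 If $(K,\mu)$ is strongly mixing, then given $0 < \alpha_1 < \alpha_0$ there are constants $C > 0$ and $0 < \sigma < 1$ such that for all $\alpha_1 \le \alpha \le \alpha_0$, $Q_{\tilde{K}} : H_\alpha(X^-) \to H_\alpha(X^-)$ is quasi-compact and simple with spectral constants $C$ and $\sigma$, i.e., for all $f \in H_\alpha(X^-)$,

\[ \| (Q_{\tilde{K}})^n f - \langle f, \mathbb{P}^-_\mu \rangle \mathbf{1} \|_\alpha \le C \sigma^n \|f\|_\alpha . \]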

Proof Given a function $f \in H_\alpha(X^-)$, denote by $f_k : X^- \to \mathbb{C}$ the following function

\[ f_k(\dots, x_0) := \int_{X^-} f(\dots, x_{-k}, \dots, x_0) \, d\mathbb{P}^-_\mu(\dots, x_{-k}). \]

Note that if $\mathcal{F}^-_k$ is the sub $\sigma$-field of $\mathcal{F}^-$ generated by the cylinders in the coordinates $x_{-k+1}, \dots, x_{-1}, x_0$, we have $f_k = \mathbb{E}^-_\mu(f \,|\, \mathcal{F}^-_k)$, and in particular $\mathbb{E}^-_\mu(f_k) = \mathbb{E}^-_\mu(f)$, for all $k \in \mathbb{N}$. By definition of $f_k$,

\[ \| Q^n (f - f_k) \|_\infty \le \| f - f_k \|_\infty \le v_k(f) \le 2^{-\alpha k} v_\alpha(f). \tag{5.12} \]

Because $(K,\mu)$ is strongly mixing, there are constants $C > 0$ and $0 < \rho < 1$ such that for any function $h \in L^\infty(\Sigma)$ with $\int_\Sigma h \, d\mu = 0$,

\[ \left| \int_\Sigma h(y) \, K^n(x, dy) \right| \le C \rho^n \|h\|_\infty . \]

Now, if $h \in L^\infty(X^-)$ is a function which depends only on the first coordinate $x_0$ then by (5.11) $Q^n h$ also depends only on the first coordinate, and it is given by

\[ (Q^n h)(\dots, x_0) = \int_\Sigma h(y) \, K^n(x_0, dy). \]

Hence, if $h$ has zero average, i.e., $\mathbb{E}^-_\mu(h) = 0$, then

\[ \| Q^n h \|_\infty \le C \rho^n \|h\|_\infty . \tag{5.13} \]
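The decay (5.13) is easy to observe on a finite state space. The sketch below rests on assumptions outside the text: a hypothetical 3-state transition matrix `K` with all entries positive, a hypothetical function `g`, and explicit constants $C = 2$, $\rho = 1 - \sum_y \min_x K(x,y)$ coming from the classical Doeblin (Dobrushin coefficient) bound for such matrices. These constants are chosen for this toy example only and are not the constants of the text.

```python
# Hypothetical 3-state chain with positive entries (assumption for illustration).
K = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.4, 0.4, 0.2]]

def apply_Q(f):
    """Markov operator: (Q f)(x) = sum_y K(x, y) * f(y)."""
    return [sum(K[x][y] * f[y] for y in range(3)) for x in range(3)]

def stationary(steps=500):
    """Approximate the stationary vector mu = mu K by iterating from the uniform vector."""
    mu = [1.0 / 3] * 3
    for _ in range(steps):
        mu = [sum(mu[x] * K[x][y] for x in range(3)) for y in range(3)]
    return mu

mu = stationary()

# Centre a hypothetical function g so that h has zero mu-average.
g = [1.0, -2.0, 0.5]
mean = sum(m * v for m, v in zip(mu, g))
h = [v - mean for v in g]
sup_h = max(abs(v) for v in h)

# Doeblin-type constants for this particular matrix (about 0.4 here).
rho = 1.0 - sum(min(K[x][y] for x in range(3)) for y in range(3))
C = 2.0

# norms[n - 1] holds the sup norm of Q^n h for n = 1, ..., 20.
norms = []
f = h
for n in range(1, 21):
    f = apply_Q(f)
    norms.append(max(abs(v) for v in f))
```

Each entry `norms[n - 1]` should lie below `C * rho**n * sup_h`, which is the finite-state analogue of (5.13).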

We claim that $h = Q^k(f_k - \mathbb{E}^-_\mu(f)\, \mathbf{1})$ is a function with zero average that depends only on the first coordinate. The first part of the claim follows because $Q$ preserves averages (see Proposition 5.8) and, as remarked above, $\mathbb{E}^-_\mu(f_k) = \mathbb{E}^-_\mu(f)$. For the second part notice two things: first, $Q$ preserves functions that depend only on the first coordinate $x_0$; second, $Q$ maps a function $f$ that depends only on the coordinates $x_{-k}, \dots, x_{-1}, x_0$ to a function that depends only on the coordinates $x_{-k+1}, \dots, x_{-1}, x_0$; in other words, $Q f$ loses the dependence on $x_{-k}$. Therefore, from (5.13),

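\begin{align*}
\| Q^n (f_k - \mathbb{E}^-_\mu(f)\, \mathbf{1}) \|_\infty &= \| Q^{n-k} h \|_\infty \le C \rho^{n-k} \|h\|_\infty \\
&\le C \rho^{n-k} \| Q^k (f_k - \mathbb{E}^-_\mu(f)\, \mathbf{1}) \|_\infty \tag{5.14} \\
&\le C \rho^{n-k} \| f_k - \mathbb{E}^-_\mu(f)\, \mathbf{1} \|_\infty \le 2 C \rho^{n-k} \|f\|_\infty .
\end{align*}

Setting $\sigma = \max\{ 2^{-\alpha_1/2}, \sqrt{\rho} \}$ we have $0 < \sigma < 1$. From the inequalities (5.12) and (5.14), with $k = n/2$, we have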

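\begin{align*}
\| Q^n f - \mathbb{E}^-_\mu(f)\, \mathbf{1} \|_\infty &\le \| Q^n (f - f_k) \|_\infty + \| Q^n (f_k - \mathbb{E}^-_\mu(f)\, \mathbf{1}) \|_\infty \\
&\le 2^{-\alpha \frac{n}{2}} v_\alpha(f) + 2 C \rho^{\frac{n}{2}} \|f\|_\infty \le \sigma^n v_\alpha(f) + 2 C \sigma^n \|f\|_\infty .
\end{align*}

On the other hand, by item (2) of Proposition 5.16,

\begin{align*}
v_\alpha(Q^n f - \mathbb{E}^-_\mu(f)\, \mathbf{1}) &= v_\alpha(Q^n (f - \mathbb{E}^-_\mu(f)\, \mathbf{1})) \\
&\le \max\{ \| Q^n f - \mathbb{E}^-_\mu(f)\, \mathbf{1} \|_\infty , \, 2^{-n\alpha} v_\alpha(f) \} \\
&\le \max\{ \sigma^n v_\alpha(f) + 2 C \sigma^n \|f\|_\infty , \, \sigma^{2n} v_\alpha(f) \} = \sigma^n v_\alpha(f) + 2 C \sigma^n \|f\|_\infty .
\end{align*}

Thus, for all $f \in H_\alpha(X^-)$,

\[ \| Q^n f - \mathbb{E}^-_\mu(f)\, \mathbf{1} \|_\alpha \le 4 C \sigma^n \|f\|_\alpha , \]

which proves the proposition.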



Proof (of Proposition 5.15) Property (A1) is obvious. Proposition 5.17 proves (A2). By Proposition 5.16 the operator $Q_{\tilde{K}} : H_\alpha(X^-) \to H_\alpha(X^-)$ is bounded with norm $\|Q_{\tilde{K}}\| \le 2$. Since $H_\alpha(X^-)$ is a Banach algebra, given $0 < \alpha \le \alpha_0$ and $\xi \in \mathfrak{X} \equiv H_{\alpha_0}(X^-) \subset H_\alpha(X^-)$, the multiplication operator $D_\xi : H_\alpha(X^-) \to H_\alpha(X^-)$, $D_\xi f := \xi f$, is uniformly bounded for $\xi \in \mathfrak{X}$. Thus, because $Q_{K,\xi} = Q_K \circ D_{e^\xi}$, the map $Q_{K,*} : H_\alpha(X^-) \to \mathcal{L}(H_\alpha(X^-))$, $\xi \mapsto Q_{K,\xi}$, is analytic. These considerations imply (A3). A simple computation, using that $(H_\alpha(X^-), \|\cdot\|_\alpha)$ is a Banach algebra, shows that for all $f \in H_\alpha(X^-)$ and $\xi_1, \xi_2 \in \mathfrak{X}$,

\[ \| Q_{K,\xi_1} f - Q_{K,\xi_2} f \|_\alpha \le 2 e^L \, \|\xi_1 - \xi_2\|_\alpha \, \|f\|_\alpha , \]

which implies (A4).



Proof (of Theorem 5.2) By Theorem 5.4 every $\xi \in \mathfrak{X}$ admits a neighborhood satisfying uniform LDT estimates of exponential type. However, as stated in Theorem 5.2, we want to see that these LDT estimates hold uniformly for all observed Markov systems in $\mathfrak{X}$. For $\delta > 0$ small, denote by $B_\delta(0)$ the $\delta$-ball around the origin in $H_\alpha(X^-)$, and consider the analytic function $\hat{\lambda}_K : B_\delta(0) \to \mathbb{C}$, $\xi \mapsto \hat{\lambda}_K(\xi)$ = maximal eigenvalue of $Q_{K,\xi}$. Decreasing $\delta$ we can assume that $\hat{\lambda}_K$ is bounded. Choose $b_0 > 0$ small, such that $b_0 L < \delta$, and notice that $\lambda_{K,\xi}(z) = \hat{\lambda}_K(z \xi)$ for all $z \in D_{b_0}(0)$. Hence the family of analytic functions $\{\lambda_{K,\xi}(z)\}_{\xi \in \mathfrak{X}}$ is uniformly bounded over $D_{b_0}(0)$. Shrinking $b_0$ even more, the derivatives $\lambda'_{K,\xi}(z)$ and $\lambda''_{K,\xi}(z)$ are also bounded. Thus, there exists $h > 0$ such that $(c_{K,\xi})''(t) < h$ for all $t \in [-b_0, b_0]$ and $\xi \in \mathfrak{X}$. By Proposition 5.13, conclusion (2) of Corollary 5.1 and Remark 5.4, there are constants $\varepsilon_0$ and $C > 0$ such that for all $\xi \in \mathfrak{X}$, $0 < \varepsilon < \varepsilon_0$ and all $n \in \mathbb{N}$,

\[ \mathbb{P}^-_\mu \left[ \left| \frac{1}{n} S_n(\xi) - \mathbb{E}^-_\mu(\xi) \right| \ge \varepsilon \right] \le C \, e^{-\frac{\varepsilon^2}{2h} n} . \]

Consider the natural (measure preserving) projection $\pi : X \to X^-$. Since

\[ \pi^{-1} \left[ \left| \frac{1}{n} S_n(\xi) - \mathbb{E}^-_\mu(\xi) \right| \ge \varepsilon \right] = \left\{ x \in X : \left| \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j(x)) - \int_X \xi \, d\mathbb{P}_\mu \right| \ge \varepsilon \right\}, \]

all observables $\xi \in \mathfrak{X}$ satisfy a uniform base-LDT estimate. The constants $\delta$, $b_0$, $h$, $\varepsilon_0$ and $C$ depend on $L$ and $\hat{\lambda}_K$, i.e., on $L$ and $K$.



5.3.2 Fiber LDT Estimates

In this section we use Theorem 5.4 to establish the fiber LDT Theorem 5.3. First we specify the data $(B_\alpha, \|\cdot\|_\alpha)$ and the metric space $\mathfrak{X}$. Then we check that assumptions (B1)–(B7) and (A1)–(A4) hold. Consider the space $\mathcal{B}^\infty_m(K)$ of random cocycles over a Markov system $(K,\mu)$. For each cocycle $A \in \mathcal{B}^\infty_m(K)$ we define a Markov kernel on $\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m)$ by

\[ K_A(x, y, \hat{p}) := \int_\Sigma \delta_{(y, z, A(y,z)\hat{p})} \, K(y, dz). \tag{5.15} \]

We will see (cf. Corollary 5.2) that, under the assumptions of Theorem 5.3, this kernel admits a unique $K_A$-stationary probability measure $\mu_A$ in $\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m)$. For each $A \in \mathcal{B}^\infty_m(K)$ consider the observable $\xi_A : \Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m) \to \mathbb{R}$,

\[ \xi_A(x, y, \hat{p}) := \log \| A(x, y)\, p \| , \quad \text{for any unit vector } p \in \hat{p}. \tag{5.16} \]


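We can now introduce the metric space of observed Markov systems

\[ \mathfrak{X} := \{ (K_A, \mu_A, \pm\xi_A) : A \in \mathcal{B}^\infty_m(K), \ A \text{ irreducible}, \ L_1(A) > L_2(A) \}. \]

This space is identified with a subspace of $\mathcal{B}^\infty_m(K)$, and endowed with the distance

\[ \mathrm{dist}((K_A, \mu_A, \xi_A), (K_B, \mu_B, \xi_B)) := d_\infty(A, B). \]

Proposition 5.18 The space $\mathfrak{X}$ of observed Markov systems satisfies (A1)–(A4).

Next we define the scale of Banach algebras. Recall the following projective distance (see (2.3))

\[ \delta(\hat{p}, \hat{q}) := \frac{\| p \wedge q \|}{\|p\| \, \|q\|} , \]

where $p \in \hat{p}$ and $q \in \hat{q}$. Given $0 \le \alpha \le 1$ and $f \in L^\infty(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$, let

\begin{align*}
\|f\|_\alpha &:= v_\alpha(f) + \|f\|_\infty , \tag{5.17} \\
v_\alpha(f) &:= \sup_{\substack{x, y \in \Sigma \\ \hat{p} \ne \hat{q}}} \frac{| f(x, y, \hat{p}) - f(x, y, \hat{q}) |}{\delta(\hat{p}, \hat{q})^\alpha} . \tag{5.18}
\end{align*}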
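The projective distance recalled above can be computed without building the exterior product, since $\|p \wedge q\|^2 = \|p\|^2 \|q\|^2 - \langle p, q \rangle^2$ (the Gram determinant), so that $\delta$ is the sine of the angle between the two lines. The sketch below uses this identity; the helper names `wedge_norm`, `proj_dist`, `apply` and the matrix `g` are hypothetical and serve only as an illustration.

```python
import math

def wedge_norm(p, q):
    """Norm of p ^ q via the Gram determinant ||p||^2 ||q||^2 - <p, q>^2."""
    pp = sum(a * a for a in p)
    qq = sum(b * b for b in q)
    pq = sum(a * b for a, b in zip(p, q))
    return math.sqrt(max(pp * qq - pq * pq, 0.0))

def norm(p):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in p))

def proj_dist(p, q):
    """Projective distance delta(p, q) = ||p ^ q|| / (||p|| ||q||)."""
    return wedge_norm(p, q) / (norm(p) * norm(q))

def apply(g, p):
    """Matrix-vector product g p."""
    return [sum(row[j] * p[j] for j in range(len(p))) for row in g]

# A hypothetical diagonal matrix pulling directions towards the first axis.
g = [[2.0, 0.0],
     [0.0, 0.5]]
p, q = [1.0, 0.1], [1.0, 0.2]

# Ratio delta(g p, g q) / delta(p, q): the kind of quantity averaged in (5.20) below.
ratio = proj_dist(apply(g, p), apply(g, q)) / proj_dist(p, q)
```

For these two nearby directions the ratio is below 1, reflecting the contraction of the projective action near the most expanded direction.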

For simplicity of notation from now on we will simply write $p$ for both the vector and the corresponding projective point. Consider the normed space $H_\alpha(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$ of all functions $f \in L^\infty(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$ such that $v_\alpha(f) < +\infty$, endowed with the norm (5.17).

Proposition 5.19 The family of spaces $H_\alpha(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$ is a scale of Banach algebras satisfying (B1)–(B7).

Proof (B1) holds by definition of the Hölder norm $\|\cdot\|_\alpha$. For (B2) notice that $v_0(f)$ measures the maximum oscillation of $f$ on the projective fibers, and hence $v_0(f) \le 2 \|f\|_\infty$. Property (B3) is obvious. Assumption (B4) is a consequence of the inequality

\[ v_\alpha(f g) \le \|f\|_\infty \, v_\alpha(g) + \|g\|_\infty \, v_\alpha(f), \qquad f, g \in L^\infty(\Sigma). \]

The monotonicity properties (B5) and (B6) are straightforward to check. The assumption (B7) follows from the convexity of the function $\alpha \mapsto \log v_\alpha(f)$, whose proof is analogous to that of Proposition 5.10.

Definition 5.20 We define $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ to be the subspace of functions $f(x, y, p)$ in $H_\alpha(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$ that do not depend on the first coordinate $x$. This subspace is clearly a closed sub-algebra of $H_\alpha(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$.


Proposition 5.20 The family $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ is a scale of Banach sub-algebras satisfying (B1)–(B7).

Given $A \in \mathcal{B}^\infty_m(K)$, consider the linear transformation $Q_A : L^\infty(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m)) \to L^\infty(\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m))$ defined by

\[ (Q_A f)(x, y, p) := \int_\Sigma f(y, z, A(y, z)\, p) \, K(y, dz). \tag{5.19} \]

This is the Markov operator associated with the kernel (5.15). Assumption (A1) follows from the definition of $\mathfrak{X}$. Since $(Q_A f)(x, y, p)$ does not depend on the coordinate $x$, the Markov operator $Q_A$ leaves invariant the subspace of functions $f(x, y, p)$ that are constant in $x$. Next, we are going to see that $Q_A$ acts invariantly on the subspace $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$. Given $A \in \mathcal{B}^\infty_m(K)$ and $0 < \alpha \le 1$, define for all $n \in \mathbb{N}$,

\[ \kappa^n_\alpha(A) := \sup_{x \in \Sigma, \ p \ne q} \mathbb{E}_x \left[ \left( \frac{\delta(A^{(n)} p, A^{(n)} q)}{\delta(p, q)} \right)^\alpha \right] \in [0, +\infty]. \tag{5.20} \]

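Lemma 5.4 Let $A, B \in \mathcal{B}^\infty_m(K)$ and $n \in \mathbb{N}$.
(a) $\| A^{(\pm n)} \|_\infty \le \max\{ \|A\|_\infty, \|A^{-1}\|_\infty \}^n$.
(b) $\| A^{(n)} - B^{(n)} \|_\infty \le n \, \max\{ \|A\|_\infty, \|B\|_\infty \}^{n-1} \, \|A - B\|_\infty$.

Proof Item (a) is straightforward. To prove (b), we use the formula

\[ A^{(n)} - B^{(n)} = \sum_{j=0}^{n-1} (A^{(j)} \circ T^{n-j}) \, (A \circ T^{n-1-j} - B \circ T^{n-1-j}) \, B^{(n-1-j)} . \]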
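The telescoping formula in the proof of Lemma 5.4(b) can be verified numerically along a single orbit. The sketch below is an illustration under assumptions not made in the text: the cocycle values along the orbit are replaced by two hypothetical finite lists `A_seq`, `B_seq` of $2 \times 2$ matrices (with `A_seq[j]` standing for $A \circ T^j$ at a fixed point), and the operator norm is computed by the closed $2 \times 2$ formula for the largest singular value.

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def sub(a, b):
    return [[a[i][j] - b[i][j] for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def opnorm(m):
    """Largest singular value of a 2x2 matrix."""
    t = sum(m[i][j] ** 2 for i in range(2) for j in range(2))
    d = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return math.sqrt((t + math.sqrt(max(t * t - 4 * d * d, 0.0))) / 2)

def product(seq):
    """Ordered product seq[-1] ... seq[1] seq[0]; the empty product is the identity."""
    out = [[1.0, 0.0], [0.0, 1.0]]
    for m in seq:
        out = matmul(m, out)
    return out

# Hypothetical cocycle values along one orbit, and a small perturbation of them.
A_seq = [[[2.0, 1.0], [0.0, 0.5]],
         [[1.0, 0.0], [0.3, 1.0]],
         [[0.5, -1.0], [1.0, 0.5]],
         [[1.5, 0.2], [0.1, 0.8]]]
B_seq = [add(a, [[0.01 * (j + 1), -0.01], [0.005, 0.02]]) for j, a in enumerate(A_seq)]
n = len(A_seq)

# Telescoping sum: head of A-factors, one difference, tail of B-factors.
tele = [[0.0, 0.0], [0.0, 0.0]]
for j in range(n):
    head = product(A_seq[n - j:])          # A^{(j)} composed with T^{n-j}
    mid = sub(A_seq[n - 1 - j], B_seq[n - 1 - j])
    tail = product(B_seq[:n - 1 - j])      # B^{(n-1-j)}
    tele = add(tele, matmul(matmul(head, mid), tail))

diff = sub(product(A_seq), product(B_seq))
identity_error = opnorm(sub(diff, tele))

sup_norm = max(max(opnorm(a) for a in A_seq), max(opnorm(b) for b in B_seq))
sup_diff = max(opnorm(sub(a, b)) for a, b in zip(A_seq, B_seq))
lhs = opnorm(diff)
rhs = n * sup_norm ** (n - 1) * sup_diff
```

Here `identity_error` measures the failure of the telescoping identity (it should be at rounding level), while `lhs` and `rhs` are the two sides of item (b) restricted to this orbit.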

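The following lemma highlights the importance of the quantity (5.20).

Lemma 5.5 Given $A \in \mathcal{B}^\infty_m(K)$, $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ and $n \in \mathbb{N}$,

\[ v_\alpha(Q^n_A f) \le \kappa^n_\alpha(A) \, v_\alpha(f). \]

Proof For any $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ and $(x_0, p) \in \Sigma \times \mathbb{P}(\mathbb{R}^m)$,

\begin{align*}
(Q^n_A f)(x_0, p) &= \int_{\Sigma^n} f(x_n, A(x_{n-1}, x_n) \cdots A(x_0, x_1)\, p) \prod_{j=0}^{n-1} K(x_j, dx_{j+1}) \\
&= \mathbb{E}_{x_0}\left[ f(e_n, A^{(n)} p) \right].
\end{align*}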




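Hence

\begin{align*}
v_\alpha(Q^n_A f) &= \sup_{x \in \Sigma, \ p \ne q} \frac{\left| \mathbb{E}_x\left[ f(e_n, A^{(n)} p) - f(e_n, A^{(n)} q) \right] \right|}{\delta(p, q)^\alpha} \\
&\le \sup_{x \in \Sigma, \ p \ne q} \frac{\mathbb{E}_x\left[ \left| f(e_n, A^{(n)} p) - f(e_n, A^{(n)} q) \right| \right]}{\delta(p, q)^\alpha} \\
&\le v_\alpha(f) \sup_{x \in \Sigma, \ p \ne q} \mathbb{E}_x \left[ \left( \frac{\delta(A^{(n)} p, A^{(n)} q)}{\delta(p, q)} \right)^\alpha \right] = v_\alpha(f) \, \kappa^n_\alpha(A).
\end{align*}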



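Lemma 5.6 The sequence $\{\kappa^n_\alpha(A)\}_{n \ge 0}$ is sub-multiplicative, i.e.,

\[ \kappa^{n+\ell}_\alpha(A) \le \kappa^n_\alpha(A) \, \kappa^\ell_\alpha(A) \quad \text{for } n, \ell \in \mathbb{N}. \]

In particular,

\[ \lim_{n \to +\infty} \kappa^n_\alpha(A)^{1/n} = \inf\{ \kappa^n_\alpha(A)^{1/n} : n \in \mathbb{N} \}. \]

Proof Let us write $M_n = A^{(n)}$. Given $x \in \Sigma$ and $p \ne q$ in $\mathbb{P}(\mathbb{R}^m)$,

\begin{align*}
\mathbb{E}_x \left[ \left( \frac{\delta(M_{n+m} p, M_{n+m} q)}{\delta(p, q)} \right)^\alpha \right]
&\le \mathbb{E}_x \left[ \left( \frac{\delta((M_n \circ T^m) M_m p, (M_n \circ T^m) M_m q)}{\delta(M_m p, M_m q)} \right)^\alpha \left( \frac{\delta(M_m p, M_m q)}{\delta(p, q)} \right)^\alpha \right] \\
&\le \mathbb{E}_x \left[ \left( \frac{\delta(M_m p, M_m q)}{\delta(p, q)} \right)^\alpha \right] \sup_{p' \ne q'} \mathbb{E}_{K^m(x, \cdot)} \left[ \left( \frac{\delta(M_n p', M_n q')}{\delta(p', q')} \right)^\alpha \right] \le \kappa^m_\alpha \, \kappa^n_\alpha ,
\end{align*}

and taking the sup we get $\kappa^{n+m}_\alpha \le \kappa^n_\alpha \, \kappa^m_\alpha$.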



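These constants become finite provided $\alpha$ is small enough.

Lemma 5.7 Given $A \in \mathcal{B}^\infty_m(K)$ and $n \in \mathbb{N}$, for all $0 < \alpha \le \frac{1}{4n}$,

\[ \kappa^n_\alpha(A) \le \max\{ \|A\|_\infty, \|A^{-1}\|_\infty \}. \]

Proof We write as before $M_n = A^{(n)}$. Recall that given $M \in \mathrm{GL}(m, \mathbb{R})$, the quantity $\ell(M) := \max\{ \log \|M\|, \log \|M^{-1}\| \}$ is sub-additive, in the sense that for any matrices $M_1, M_2 \in \mathrm{GL}(m, \mathbb{R})$, $\ell(M_1 M_2) \le \ell(M_1) + \ell(M_2)$. By Lemma 2.12, given $x \in \Sigma$ and $p \ne q$ in $\mathbb{P}(\mathbb{R}^m)$,

\[ \mathbb{E}_x \left[ \left( \frac{\delta(M_n p, M_n q)}{\delta(p, q)} \right)^\alpha \right] = \mathbb{E}_x \left[ \exp\left( \alpha \log \frac{\delta(M_n p, M_n q)}{\delta(p, q)} \right) \right] \le \mathbb{E}_x \left[ e^{4 \alpha \ell(M_n)} \right]. \]

If $0 < \alpha \le \frac{1}{4n}$, setting $c := \max\{ \log \|A\|_\infty, \log \|A^{-1}\|_\infty \}$,

\[ \mathbb{E}_x \left[ e^{4 \alpha \ell(M_n)} \right] \le e^{4 n \alpha c} \le e^c = \max\{ \|A\|_\infty, \|A^{-1}\|_\infty \}. \]

Hence, taking the sup in $x$ and $p \ne q$ we obtain $\kappa^n_\alpha \le \max\{ \|A\|_\infty, \|A^{-1}\|_\infty \}$.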



By the previous lemmas the operator $Q_A$ leaves the subspace $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ invariant, for all small enough $\alpha > 0$. To prove that $Q_A$ is quasi-compact and simple all hypotheses of Theorem 5.3 are essential. The irreducibility and gap assumptions are used in the following lemmas.

Lemma 5.8 Given $A \in \mathcal{B}^\infty_m(K)$ such that $(K_A, \mu_A, \xi_A) \in \mathfrak{X}$,

\[ \lim_{n \to +\infty} \frac{1}{n} \mathbb{E}_x\left( \log \| A^{(n)} p \| \right) = L_1(A), \]

with uniform convergence in $(x, p) \in \Sigma \times \mathbb{P}(\mathbb{R}^m)$.

Proof See Lemma 3.1 in [1].



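Lemma 5.9 Given $A \in \mathcal{B}^\infty_m(K)$ such that $(K_A, \mu_A, \xi_A) \in \mathfrak{X}$, there exists $n \in \mathbb{N}$ such that for all $x \in \Sigma$ and $p \ne q$ in $\mathbb{P}(\mathbb{R}^m)$,

\[ \mathbb{E}_x \left[ \log \frac{\delta(A^{(n)} p, A^{(n)} q)}{\delta(p, q)} \right] \le -1. \]

Proof We write $M_n = A^{(n)}$. Given $x \in \Sigma$ and $p \ne q$ in $\mathbb{P}(\mathbb{R}^m)$,

\begin{align*}
\frac{1}{n} \mathbb{E}_x \left[ \log \frac{\delta(M_n p, M_n q)}{\delta(p, q)} \right]
&\le \frac{1}{n} \mathbb{E}_x \left[ \log \frac{\| (M_n p) \wedge (M_n q) \| \, \|p\| \, \|q\|}{\| M_n p \| \, \| M_n q \| \, \| p \wedge q \|} \right] \\
&\le \frac{1}{n} \mathbb{E}_x \left[ \log \left( \frac{\| (M_n p) \wedge (M_n q) \|}{\| p \wedge q \|} \cdot \frac{\|p\| \, \|q\|}{\| M_n p \| \, \| M_n q \|} \right) \right] \\
&\le \frac{1}{n} \mathbb{E}_x \left[ \log \| \wedge_2 A^{(n)} \| \right] - \frac{1}{n} \mathbb{E}_x \left[ \log \| A^{(n)} p \| \right] - \frac{1}{n} \mathbb{E}_x \left[ \log \| A^{(n)} q \| \right],
\end{align*}

and the right hand side converges to $L_1 + L_2 - 2 L_1 = L_2 - L_1 < 0$. By Lemma 5.8, we have

\[ \limsup_{n \to +\infty} \ \sup_{x \in \Sigma, \ p \ne q} \frac{1}{n} \mathbb{E}_x \left[ \log \frac{\delta(M_n p, M_n q)}{\delta(p, q)} \right] \le L_2 - L_1 < 0. \]

Hence taking $n$ large enough such that $n (L_2 - L_1) < -1$ the lemma follows.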



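Proposition 5.21 Given $A \in \mathcal{B}^\infty_m(K)$ such that $(K_A, \mu_A, \xi_A) \in \mathfrak{X}$, there exists a neighborhood $V$ of $A$ in $\mathcal{B}^\infty_m(K)$, and there are constants $0 < \alpha_1 < \frac{\alpha_0}{2} < \alpha_0$, $C > 0$ and $0 < \sigma < 1$ such that $v_\alpha(Q^n_B f) \le C \sigma^n v_\alpha(f)$, for all $B \in V$, $\alpha \in [\alpha_1, \alpha_0]$, $n \in \mathbb{N}$ and $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$.

Proof We begin by deriving a modulus of continuity for $B \mapsto \kappa^n_\alpha(B)$. Fix a neighborhood $V$ of $A$ in $\mathcal{B}^\infty_m(K)$ such that for all $B \in V$, $\|B\|_\infty \le C$ and $\|B^{-1}\|_\infty \le C$. By Lemma 5.4, $\|B^{(\pm n)}\|_\infty \le C^n$ for all $B \in V$ and $n \in \mathbb{N}$. Thus, by Lemmas 2.13 and 5.4(b), there exists a polynomial expression $C(g_1, g_2)$, with degree $< 11$ in the variables $\|g_1\|$, $\|g_2\|$, $\|g_1^{-1}\|$ and $\|g_2^{-1}\|$, such that

\begin{align*}
\left| \kappa^n_\alpha(A) - \kappa^n_\alpha(B) \right|
&\le \sup_{x \in \Sigma, \ p \ne q} \mathbb{E}_x \left[ \left| \left( \frac{\delta(A^{(n)} p, A^{(n)} q)}{\delta(p, q)} \right)^\alpha - \left( \frac{\delta(B^{(n)} p, B^{(n)} q)}{\delta(p, q)} \right)^\alpha \right| \right] \\
&\le \alpha \, C(A^{(n)}, B^{(n)}) \, \| A^{(n)} - B^{(n)} \|_\infty \le \alpha \, C^{11 n} \, \| A^{(n)} - B^{(n)} \|_\infty \\
&\le \alpha \, n \, C^{12 n - 1} \, \| A - B \|_\infty .
\end{align*}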

Let $M_n = A^{(n)}$ and note that $\ell(M_n) \le n \log C$. We claim that $\kappa^{n_0}_{\alpha_0}(A) < 1$ for some $n_0 \in \mathbb{N}$ and $0 < \alpha_0 \le 1$ small enough. We will make use of the following inequality

\[ e^x \le 1 + x + \frac{x^2}{2} e^{|x|} . \]

Choose $n_0 \in \mathbb{N}$ as given by Lemma 5.9. For all $x \in \Sigma$, $p \ne q$ in $\mathbb{P}(\mathbb{R}^m)$, writing $\Delta := \delta(M_{n_0} p, M_{n_0} q) / \delta(p, q)$,

\begin{align*}
\mathbb{E}_x \left[ \Delta^\alpha \right] &= \mathbb{E}_x \left[ \exp( \alpha \log \Delta ) \right] \\
&\le \mathbb{E}_x \left[ 1 + \alpha \log \Delta + \frac{\alpha^2}{2} \log^2 \Delta \; e^{\alpha |\log \Delta|} \right] \\
&\le 1 - \alpha + \frac{\alpha^2}{2} \, \mathbb{E}_x \left[ 16 \, \ell(M_{n_0})^2 \exp( 4 \alpha \, \ell(M_{n_0}) ) \right] \le 1 - \alpha + O(\alpha^2).
\end{align*}

The last inequality follows because $\mathbb{E}_x [ 16 \, \ell(M_{n_0})^2 \exp( 4 \alpha \, \ell(M_{n_0}) ) ]$ is finite and uniformly bounded in $x$ and $0 < \alpha \le 1$ by the constant $16 \, n_0^2 (\log C)^2 C^{4 n_0 \alpha}$. Taking $\alpha > 0$ sufficiently small the right-hand side above becomes less than 1, which implies that $\kappa^{n_0}_\alpha(A) < 1$. Hence, we can choose $0 < \alpha_1 < \frac{\alpha_0}{2}$ and $0 < \rho < 1$ such that for all $\alpha_1 \le \alpha \le \alpha_0$, $\kappa^{n_0}_\alpha(A) \le \rho$.

Next, we extend this inequality to all cocycles $B \in V$. Pick $\rho' \in (\rho, 1)$ and choose $\delta > 0$ such that $\alpha_0 \, n_0 \, C^{12 n_0 - 1} \delta < \rho' - \rho$. Make the neighborhood $V$ small enough so that $\|A - B\|_\infty < \delta$ for all $B \in V$. Then, using the modulus of continuity for $\kappa^n_\alpha(B)$, for all $B \in V$ and $\alpha_1 \le \alpha \le \alpha_0$,

\[ \left| \kappa^{n_0}_\alpha(A) - \kappa^{n_0}_\alpha(B) \right| < \rho' - \rho , \]

which implies

\[ \kappa^{n_0}_\alpha(B) \le \kappa^{n_0}_\alpha(A) + \left| \kappa^{n_0}_\alpha(A) - \kappa^{n_0}_\alpha(B) \right| < \rho' . \]

By Lemma 5.7, $\kappa^j_\alpha(B) \le C$ for all $B \in V$, $0 < \alpha \le \frac{1}{4 n_0}$ and $0 \le j \le n_0$. Shrinking if necessary the constants $\alpha_1$ and $\alpha_0$ above, we may assume that $\alpha_0 \le \frac{1}{4 n_0}$. Thus, because the sequence $\{\kappa^n_\alpha(B)\}_{n \ge 0}$ is sub-multiplicative, letting $\sigma = (\rho')^{1/n_0}$ we have $\kappa^n_\alpha(B) \le C' \sigma^n$ for all $B \in V$, $n \in \mathbb{N}$ and $\alpha_1 \le \alpha \le \alpha_0$, where $C' = C'(\alpha_1, C) < \infty$. The proposition follows then from the inequality in Lemma 5.5.

The next proposition implies (A2).

Proposition 5.22 Given $A \in \mathcal{B}^\infty_m(K)$ such that $(K_A, \mu_A, \xi_A) \in \mathfrak{X}$, there exist a neighborhood $V$ of $A$ in $\mathcal{B}^\infty_m(K)$, a range $0 < \alpha_1 < \frac{\alpha_0}{2} < \alpha_0 \le 1$ and there are constants $C > 0$ and $0 < \sigma < 1$ such that for all $B \in V$, $\alpha \in [\alpha_1, \alpha_0]$ and $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$,

\[ \| Q^n_B f - \langle f, \mu_B \rangle \mathbf{1} \|_\alpha \le C \sigma^n \|f\|_\alpha . \]

Proof The argument below is an adaptation of the proof of Theorem 3.7 in [1]. Take the neighborhood $V$, and the constants $0 < \alpha_1 < \frac{\alpha_0}{2}$, $C > 0$ and $0 < \sigma < 1$ given by Proposition 5.21. Enlarging the constants $C > 0$ and $0 < \sigma < 1$ we can assume that the conditions of Definition 5.6 are also satisfied with $\rho = \sigma$. By Lemma 5.5, given $B \in V$ and any $K_B$-stationary measure $\nu_B$,

\[ v_\alpha(Q^n_B f - \langle f, \nu_B \rangle \mathbf{1}) = v_\alpha(Q^n_B f) \le v_\alpha(f) \, \kappa^n_\alpha(B) \le C \sigma^n \|f\|_\alpha . \]

Hence it is now enough to prove that

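\[ \| Q^n_B f - \langle f, \nu_B \rangle \mathbf{1} \|_\infty \le C \sigma^n \|f\|_\alpha . \]

We define four families of transformations

\[ T^{(i)}_{B,n,m} : L^\infty(\Sigma \times \mathbb{P}(\mathbb{R}^m)) \to L^\infty(\Sigma \times \mathbb{P}(\mathbb{R}^m)), \quad i = 0, 1, 2, 3, \]

depending on $B \in V$, and $n \ge m$, $n, m \in \mathbb{N}$, which act continuously on the scale of Banach spaces $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ with $0 < \alpha \le \alpha_0$.

\begin{align*}
(T^{(0)}_{B,n} f)(x, p) &:= (Q^n_B f)(x, p) = \mathbb{E}_x\left[ f(e_n, B^{(n)} p) \right], \\
(T^{(1)}_{B,n,m} f)(x, p) &:= \mathbb{E}_x\left[ f(e_n, (B^{(m)} \circ T^{n-m})\, p) \right], \\
(T^{(2)}_{B,m} f)(x, p) &:= \mathbb{E}_\mu\left[ f(e_m, B^{(m)} p) \right].
\end{align*}

$T^{(2)}_{B,m}$ maps $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ onto the space $H_\alpha(\mathbb{P}(\mathbb{R}^m))$ of $\alpha$-Hölder continuous functions, constant in $x$. In particular $T^{(2)}_{B,m} : H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m)) \to C(\mathbb{P}(\mathbb{R}^m))$ is a compact transformation.

\[ (T^{(3)}_B f)(x, p) := \int f \, d\nu_B , \quad \text{where } \nu_B \text{ is any } K_B\text{-stationary measure.} \]

$T^{(3)}_B$ maps $L^\infty(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ onto the space of constant functions. In particular the linear transformation $T^{(3)}_B : H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m)) \to C(\mathbb{P}(\mathbb{R}^m))$ has rank 1. We claim that for all $B \in V$ and all $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ with $0 < \alpha \le \alpha_0$, for all $n, m \in \mathbb{N}$ with $n \ge m$, and all $(x, q) \in \Sigma \times \mathbb{P}(\mathbb{R}^m)$,

(1) $\left| (T^{(0)}_{B,n} f)(x, q) - (T^{(1)}_{B,n,m} f)(x, q) \right| \le C \sigma^m \|f\|_\alpha$.
(2) $\left| (T^{(1)}_{B,n,m} f)(x, q) - (T^{(2)}_{B,m} f)(q) \right| \le C \sigma^{n-m} \|f\|_\alpha$.
(3) $\left| (T^{(2)}_{B,m} f)(q) - (T^{(2)}_{B,n} f)(q) \right| \le C \sigma^m \|f\|_\alpha$.

Let us finish the proof before proving these three claims. Setting $n = 2m$ in (1) and (2), and $n = \ell$ in (3), for all $B \in V$ and $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ with $0 < \alpha \le \alpha_0$,

\[ \| Q^{2m}_B f - T^{(2)}_{B,\ell} f \|_\infty \le 3 C \sigma^m \|f\|_\alpha . \tag{5.21} \]

The sequence $\{ T^{(2)}_{B,\ell} f \}_{\ell \ge 0}$ is relatively compact in $C(\mathbb{P}(\mathbb{R}^m))$. Hence, the set $S_f$ of its limit points in $(C(\mathbb{P}(\mathbb{R}^m)), \|\cdot\|_\infty)$ is non-empty. Take any $g \in S_f$ and any $K_B$-stationary probability measure $\nu_B$. We claim that $g = \langle f, \nu_B \rangle \mathbf{1}$. From (5.21) we have for all $m \in \mathbb{N}$,

\[ \| Q^{2m}_B f - g \|_\infty \le 3 C \sigma^m \|f\|_\alpha . \]

On the other hand, since $v_\alpha(Q^{2m}_B f) \le C \sigma^{2m} \|f\|_\alpha$, we get $v_\alpha(g) = 0$, which implies that $g$ is constant. But $\langle Q^{2m}_B f, \nu_B \rangle = \langle f, (Q^{2m}_B)^* \nu_B \rangle = \langle f, \nu_B \rangle$ implies that $\langle g, \nu_B \rangle = \langle f, \nu_B \rangle$. Therefore $g = \langle f, \nu_B \rangle \mathbf{1}$, and also

\[ \| Q^{2m}_B f - \langle f, \nu_B \rangle \mathbf{1} \|_\infty \le 3 C \sigma^m \|f\|_\alpha \quad \forall \, m \in \mathbb{N}. \]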

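This establishes the claim of the proposition. To finish, we still have to prove the three claims.

Claim (1): Let $\mathcal{F}_n$ denote the sub $\sigma$-field generated by the random variables $\{e_j : j \ge n\}$. Note that for any random variable $f : X \to \mathbb{C}$, $\mathbb{E}_x(f) = \mathbb{E}_x\left[ \mathbb{E}_{e_n}(f \,|\, \mathcal{F}_n) \right]$. Then, using this fact we have

\begin{align*}
\left| (T^{(0)}_{B,n} f - T^{(1)}_{B,n,m} f)(x, q) \right|
&= \left| \mathbb{E}_x\left[ f(e_n, B^{(n)} q) - f(e_n, (B^{(m)} \circ T^{n-m})\, q) \right] \right| \\
&\le \mathbb{E}_x\left| f(e_n, (B^{(m)} \circ T^{n-m}) B^{(n-m)} q) - f(e_n, (B^{(m)} \circ T^{n-m})\, q) \right| \\
&\le \|f\|_\alpha \, \mathbb{E}_x\left[ \delta\left( (B^{(m)} \circ T^{n-m}) B^{(n-m)} q, \, (B^{(m)} \circ T^{n-m})\, q \right)^\alpha \right] \\
&= \|f\|_\alpha \, \mathbb{E}_x\left[ \mathbb{E}_{e_{n-m}}\left( \delta\left( (B^{(m)} \circ T^{n-m}) B^{(n-m)} q, \, (B^{(m)} \circ T^{n-m})\, q \right)^\alpha \,\middle|\, \mathcal{F}_{n-m} \right) \right] \\
&\le \|f\|_\alpha \sup_{x, p, q} \mathbb{E}_x\left[ \delta\left( B^{(m)} p, B^{(m)} q \right)^\alpha \right] \\
&\le \|f\|_\alpha \sup_{x, \ p \ne q} \mathbb{E}_x\left[ \left( \frac{\delta(B^{(m)} p, B^{(m)} q)}{\delta(p, q)} \right)^\alpha \right] = \|f\|_\alpha \, \kappa^m_\alpha(B) \le \|f\|_\alpha \, C \sigma^m .
\end{align*}

Claim (2): Defining $\varphi_{m,q}(x) := \mathbb{E}_x\left[ f(e_m, B^{(m)} q) \right]$, because $(K, \mu)$ is strongly mixing on $L^\infty(\Sigma)$ we have

\begin{align*}
\left| (T^{(1)}_{B,n,m} f - T^{(2)}_{B,m} f)(x, q) \right|
&= \left| \mathbb{E}_x\left[ f(e_n, (B^{(m)} \circ T^{n-m})\, q) \right] - \mathbb{E}_\mu\left[ f(e_m, B^{(m)} q) \right] \right| \\
&= \left| \mathbb{E}_x\left[ \mathbb{E}_{e_{n-m}}\left[ f(e_n, (B^{(m)} \circ T^{n-m})\, q) \right] \right] - \mathbb{E}_\mu\left[ f(e_m, B^{(m)} q) \right] \right| \\
&= \left| \mathbb{E}_x\left[ \varphi_{m,q}(e_{n-m}) \right] - \int \varphi_{m,q} \, d\mu \right| = \left| (Q^{n-m}_K \varphi_{m,q})(x) - \langle \varphi_{m,q}, \mathbf{1} \rangle \right| \le C \sigma^{n-m} .
\end{align*}

Claim (3): Because $\mu$ is $K$-stationary,

\begin{align*}
\left| (T^{(2)}_{B,m} f)(q) - (T^{(2)}_{B,n} f)(q) \right|
&= \left| \mathbb{E}_\mu\left[ f(e_m, B^{(m)} q) \right] - \mathbb{E}_\mu\left[ f(e_n, B^{(n)} q) \right] \right| \\
&= \left| \mathbb{E}_\mu\left[ f(e_m, B^{(m)} q) \right] - \mathbb{E}_\mu\left[ f(e_m, B^{(m)} B^{(n-m)} q) \right] \right| \\
&\le \mathbb{E}_\mu\left| f(e_m, B^{(m)} q) - f(e_m, B^{(m)} B^{(n-m)} q) \right| \\
&\le \|f\|_\alpha \, \mathbb{E}_\mu\left[ \delta(B^{(m)} q, B^{(m)} B^{(n-m)} q)^\alpha \right] \\
&\le \|f\|_\alpha \, \mathbb{E}_\mu\left[ \mathbb{E}_{e_{n-m}}\left[ \delta(B^{(m)} q, B^{(m)} B^{(n-m)} q)^\alpha \right] \right] \\
&\le \|f\|_\alpha \sup_{x, p, q} \mathbb{E}_x\left[ \delta(B^{(m)} q, B^{(m)} p)^\alpha \right] \\
&\le \|f\|_\alpha \sup_{x, \ p \ne q} \mathbb{E}_x\left[ \left( \frac{\delta(B^{(m)} q, B^{(m)} p)}{\delta(q, p)} \right)^\alpha \right] = \|f\|_\alpha \, \kappa^m_\alpha(B) \le \|f\|_\alpha \, C \sigma^m .
\end{align*}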

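Corollary 5.2 Given $A \in \mathcal{B}^\infty_m(K)$ such that $(K_A, \mu_A, \xi_A) \in \mathfrak{X}$, the kernel $K_A$ on the product space $\Sigma \times \Sigma \times \mathbb{P}(\mathbb{R}^m)$ has a unique stationary measure.

Proof In the proof of Proposition 5.22 we have shown that given a function $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$, if we denote by $S_f$ the set of limit points of $\{ T^{(2)}_{B,\ell} f \}_{\ell \ge 0}$, then $S_f = \{ \langle f, \nu_B \rangle \mathbf{1} \}$ for any $K_B$-stationary measure $\nu_B$. Hence, given any other $K_B$-stationary measure $\mu_B$, and $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$, we have $\langle f, \nu_B \rangle = \langle f, \mu_B \rangle$. Since $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ is dense in $L^\infty(\Sigma \times \mathbb{P}(\mathbb{R}^m))$, it follows that $\nu_B = \mu_B$.

The Laplace-Markov operator $Q_{A,z}$ of the observed Markov system $(K_A, \mu_A, \xi_A)$ is given by (writing $y'$ for the integration variable, to avoid a clash with the complex parameter $z$)

\[ (Q_{A,z} f)(x, y, p) = \int_\Sigma f(y, y', A(y, y')\, p) \, \| A(y, y')\, p \|^z \, K(y, dy'). \]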


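Proof (of Proposition 5.18) Assumption (A1) follows from the definition of $\mathfrak{X}$. Like the Markov operator $Q_A$ defined in (5.19), the Laplace-Markov operator $Q_{A,z}$ leaves invariant the subspaces $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$, for all small enough $\alpha > 0$. Choose $0 < \alpha_1 < \alpha_0 \le 1$ according to Proposition 5.22. Assumption (A2) is a consequence of this proposition. Assumption (A3) is automatically satisfied because $\|A\|_\infty < \infty$ and $\|A^{-1}\|_\infty < \infty$, which imply that $\xi_A \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ for all $\alpha > 0$. Note that $Q_{A,z} = Q_A \circ D_{e^{z \xi_A}}$, where $D_{e^{z \xi_A}}$ denotes the multiplication operator by $e^{z \xi_A}$. This is a bounded operator because $H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$ is a Banach algebra containing the function $e^{z \xi_A}$. Finally the next lemma proves (A4).

Lemma 5.10 Given $A, B \in \mathcal{B}^\infty_m(K)$ and $b > 0$, there is a constant $C_2 > 0$ such that for all $f \in H_\alpha(\Sigma \times \mathbb{P}(\mathbb{R}^m))$, and all $z \in \mathbb{C}$ such that $\mathrm{Re}\, z \le b$,

\[ \| Q_{A,z} f - Q_{B,z} f \|_\infty \le C_2 \, d_\infty(A, B)^\alpha \, \|f\|_\alpha . \]

Moreover, $C_2$ is bounded on a neighborhood of $A$.

Proof A simple computation shows that for all $z \in \mathbb{C}$ with $\mathrm{Re}\, z \le b$, and all $A, B \in \mathrm{GL}(d, \mathbb{R})$,

\[ \left| \|A p\|^z - \|B p\|^z \right| \le b \, \max\{ \|A\|^{b-1}, \|B\|^{b-1} \} \, \|A - B\| . \]

Hence

\begin{align*}
\left| (Q_{A,z} f - Q_{B,z} f)(x, p) \right|
&\le \mathbb{E}_x\left| \|A p\|^z f(e_1, A p) - \|B p\|^z f(e_1, B p) \right| \\
&\le \|f\|_\infty \, \mathbb{E}_x\left| \|A p\|^z - \|B p\|^z \right| + \|B\|^b_\infty \, \mathbb{E}_x\left| f(e_1, A p) - f(e_1, B p) \right| \\
&\le b \, \max\{ \|A\|^{b-1}_\infty, \|B\|^{b-1}_\infty \} \, \|A - B\|_\infty \, \|f\|_\infty + \|B\|^b_\infty \, v_\alpha(f) \, \mathbb{E}_x\left[ \delta(A p, B p)^\alpha \right] \\
&\le b \, \max\{ \|A\|^{b-1}_\infty, \|B\|^{b-1}_\infty \} \, \|A - B\|_\infty \, \|f\|_\infty + \|B\|^b_\infty \, v_\alpha(f) \, \|A - B\|^\alpha_\infty \\
&\le C_2 \, \|f\|_\alpha \, d_\infty(A, B)^\alpha ,
\end{align*}

where $C_2 = \max\{ \|B\|^b_\infty, \ b \|B\|^{b-1}_\infty, \ b \|A\|^{b-1}_\infty \}$.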



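Proof (of Theorem 5.3) By Theorem 5.4, there exists a neighborhood $V \subset \mathfrak{X}$ of $(K_A, \mu_A, \xi_A)$, which we identify with a neighborhood of $A \in \mathcal{B}^\infty_m(K)$, and there are constants $\varepsilon_0, C, h > 0$ such that for all $B \in V$, $0 < \varepsilon < \varepsilon_0$, $(x, p) \in \Sigma \times \mathbb{P}(\mathbb{R}^m)$ and $n \in \mathbb{N}$,

\[ \mathbb{P}_x \left[ \left| \frac{1}{n} \log \| B^{(n)} p \| - L_1(B, \mu) \right| \ge \varepsilon \right] \le C \, e^{-\frac{\varepsilon^2}{2h} n} . \]

Integrating w.r.t. $\mu$ we get for all $p \in \mathbb{P}(\mathbb{R}^m)$,

\[ \mathbb{P}_\mu \left[ \left| \frac{1}{n} \log \| B^{(n)} p \| - L_1(B, \mu) \right| \ge \varepsilon \right] \le C \, e^{-\frac{\varepsilon^2}{2h} n} . \]

Choose the canonical basis $\{e_1, \dots, e_m\}$ of $\mathbb{R}^m$ and consider the following norm $\|\cdot\|'$ on the space of matrices $\mathrm{Mat}(m, \mathbb{R})$, $\|M\|' := \max_{1 \le j \le m} \|M e_j\|$. Since this norm is equivalent to the operator norm, for all $B \in V$, $p \in \mathbb{P}(\mathbb{R}^m)$ and $n \in \mathbb{N}$,

\[ \| B^{(n)} p \| \le \| B^{(n)} \| \lesssim \| B^{(n)} \|' = \max_{1 \le j \le m} \| B^{(n)} e_j \| . \]

Thus a simple comparison of the deviation sets gives

\[ \mathbb{P}_\mu \left[ \left| \frac{1}{n} \log \| B^{(n)} \| - L_1(B, \mu) \right| \ge \varepsilon \right] \lesssim e^{-\frac{\varepsilon^2}{2h} n} \]

for all $B \in V$, $0 < \varepsilon < \varepsilon_0$ and $n \in \mathbb{N}$.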



5.4 Deriving Continuity of the Lyapunov Exponents

In this last section we use the LDT estimates (Theorems 5.2 and 5.3) to derive the continuity of the Lyapunov exponents and of the Oseledets filtration/decomposition. We give some simple generalizations of the continuity results and explain the method's limitations regarding the continuity of the LE in the reducible case.

5.4.1 Proof of the Continuity

For the reader's convenience, we briefly review the relevant definitions as we explain how Theorems 3.1, 4.7 and 4.8 are applicable to the context of this chapter.

Proof (of Theorem 5.1) Let $(K, \mu)$ be a strongly mixing Markov system, and consider the associated Markov shift $(X, \mathbb{P}_\mu, T)$. The collection $\mathcal{C} = \{ (\mathcal{I}^\infty_m(K), d_\infty) \}_{m \in \mathbb{N}}$ of totally irreducible cocycles over $(X, \mathbb{P}_\mu, T)$ is a space of measurable cocycles in the sense of Definition 3.1. Consider the space of LDT parameters $\mathcal{P} = \mathbb{N} \times \mathcal{E} \times \mathcal{I}$, where $\mathcal{E}$ is the set of constant deviation functions $\varepsilon(t) \equiv \varepsilon$, $0 < \varepsilon < 1$, and we use the set of exponential functions $\mathcal{I} = \{ \iota(t) \equiv M e^{-c t} : M < \infty, \ c > 0 \}$ to measure the deviation sets. Define $\Xi$ to be the set of observables $\xi : X \to \mathbb{R}$ which depend only on finitely many coordinates. Finally, take $p = \infty$. We now check the four assumptions of Theorems 3.1, 4.7 and 4.8.

Given a cocycle $A \in \mathcal{I}^\infty_m(K)$ and $N \in \mathbb{N}$, let $\mathcal{F}_N(A)$ be the lattice generated by the sets $\{ x \in X : \|A^{(n)}(x)\| \le c \}$ and $\{ x \in X : \|A^{(n)}(x)\| \ge c \}$. Recall that $\Xi$ is said to be compatible with a cocycle $A \in \mathcal{B}^\infty_m(K)$ when for any $N \in \mathbb{N}$, $F \in \mathcal{F}_N(A)$, $\varepsilon > 0$, there is $\xi \in \Xi$ such that $\mathbf{1}_F \le \xi$ and $\int_X \xi \, d\mu \le \mu(F) + \varepsilon$. In our setting, the set $\Xi$ is compatible with all cocycles $A \in \mathcal{B}^\infty_m(K)$ because for any set $F \in \mathcal{F}_N(A)$ its indicator function $\mathbf{1}_F$ depends only on finitely many coordinates, i.e., $\mathbf{1}_F \in \Xi$. Hence the compatibility inequality holds with $\varepsilon = 0$.

Recall that $\xi \in \Xi$ is said to satisfy a base-LDT estimate if for every $\varepsilon > 0$ there is an LDT parameter $(n_0, \varepsilon, \iota) \in \mathcal{P}$ such that for all $n \ge n_0$ we have $\varepsilon(n) \le \varepsilon$ and

\[ \mu \left\{ x \in X : \left| \frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) - \int_X \xi \, d\mu \right| > \varepsilon(n) \right\} < \iota(n). \]

Given an observable $\xi \in \Xi$ there exists $p \in \mathbb{N}$ such that $\xi \circ T^p$ depends only on negative coordinates, i.e., coordinates $x_j$ with $-p \le j \le 0$. This implies that $\xi \circ T^p \in H_\alpha(X^-)$. By Theorem 5.2, the observable $\xi \circ T^p$ satisfies a base-LDT estimate w.r.t. $\mathcal{P}$. Since $\frac{1}{n} \left( S_n(\xi) - S_n(\xi \circ T^p) \right)$ converges uniformly to zero as $n \to \infty$, it follows that $\xi$ satisfies base-LDT estimates too.

Recall that a cocycle $A$ is said to be uniformly $L^p$-bounded if there is a constant $C = C(A) < \infty$ such that

\[ \left\| \frac{1}{n} \log \| B^{(n)} \| \right\|_{L^p} < C \]

for all cocycles $B$ that are close enough to $A$ (in the given topology on the space of cocycles) and for all scales $n \ge 1$. In our setting the $L^p$-boundedness is automatic because $p = \infty$ and the cocycle functions $B$ and $B^{-1}$ are bounded.

Recall that a cocycle $A$ is said to satisfy a uniform fiber-LDT if for every $\varepsilon > 0$ there are $\delta > 0$ and an LDT parameter $(n_0, \varepsilon, \iota) \in \mathcal{P}$ such that for any cocycle $B$ with $\mathrm{dist}(B, A) < \delta$ and for all $n \ge n_0$, we have $\varepsilon(n) \le \varepsilon$ and

\[ \mu \left\{ x \in X : \left| \frac{1}{n} \log \| B^{(n)}(x) \| - L^{(n)}_1(B) \right| > \varepsilon(n) \right\} < \iota(n). \]

Given $A \in \mathcal{I}^\infty_m(K)$ such that $L_1(A) > L_2(A)$, by Theorem 5.3 the cocycle $A$ satisfies uniform fiber-LDT estimates w.r.t. $\mathcal{P}$. A simple computation shows that the modulus of continuity associated to the choice of deviation function sets $\mathcal{E}$ and $\mathcal{I}$ above corresponds to Hölder continuity. Hence, this theorem follows from Theorems 3.1, 4.7 and 4.8.

5.4.2 Some Generalizations

Consider a compact metric space $\Sigma$. A Markov kernel of order $p \in \mathbb{N}$ on $\Sigma$ is a map $K : \Sigma^p \to \mathrm{Prob}(\Sigma)$ that assigns a probability measure $K(x_0, \dots, x_{p-1}, dy)$ on $\Sigma$ to each tuple $(x_0, \dots, x_{p-1}) \in \Sigma^p$. The concept of Markov kernel in Definition 5.1 corresponds to a Markov kernel of order $p = 1$.


Any Markov kernel $K$ of order $p$ on $\Sigma$ determines the following Markov kernel $\hat{K}$ of order 1 on the product space $\Sigma^p$,

\[ \hat{K}_{(x_0, \dots, x_{p-1})} := \int_\Sigma \delta_{(x_1, \dots, x_p)} \, K(x_0, \dots, x_{p-1}, dx_p). \]
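On a finite alphabet the passage from an order $p$ kernel to the order 1 kernel $\hat{K}$ is a simple re-indexing. The sketch below assumes a hypothetical two-letter alphabet, order $p = 2$, and made-up transition probabilities `K2`; the function name `lift_to_order_one` is likewise an illustrative choice. The tuple $(x_0, x_1)$ moves to $(x_1, x_2)$ with probability $K(x_0, x_1, \{x_2\})$, and every other transition has probability zero.

```python
import itertools

SIGMA = (0, 1)   # hypothetical two-letter alphabet
P_ORDER = 2      # order p of the kernel

# Hypothetical order-2 kernel: K2[(x0, x1)][y] is the probability of y after (x0, x1).
K2 = {
    (0, 0): {0: 0.9, 1: 0.1},
    (0, 1): {0: 0.5, 1: 0.5},
    (1, 0): {0: 0.3, 1: 0.7},
    (1, 1): {0: 0.2, 1: 0.8},
}

def lift_to_order_one(kernel):
    """Build the order-1 kernel on Sigma^p: (x_0, ..., x_{p-1}) -> (x_1, ..., x_p)."""
    states = list(itertools.product(SIGMA, repeat=P_ORDER))
    lifted = {s: {t: 0.0 for t in states} for s in states}
    for s in states:
        for y, prob in kernel[s].items():
            t = s[1:] + (y,)      # drop x_0 and append the new symbol
            lifted[s][t] += prob
    return lifted

K_hat = lift_to_order_one(K2)
```

Each row of `K_hat` is again a probability vector, supported on the tuples that overlap correctly with the current one.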

A probability measure $\mu$ on $\Sigma^p$ is said to be $K$-stationary when it is $\hat{K}$-stationary. We call a Markov system of order $p$ any pair $(K, \mu)$, where $K$ is a Markov kernel of order $p$ on $\Sigma$, and $\mu$ is a $K$-stationary probability on $\Sigma^p$. We say that $(K, \mu)$ is strongly irreducible when $(\hat{K}, \mu)$ is a strongly irreducible Markov system on $\Sigma^p$. Given a Markov system $(K, \mu)$ of order $p$, let $\hat{\mathbb{P}}_\mu$ denote the Kolmogorov extension of $(K, \mu)$ on the space of sequences $\hat{X} := (\Sigma^p)^{\mathbb{Z}}$. Then, letting $\hat{T} : \hat{X} \to \hat{X}$ denote the shift homeomorphism, the triple $(\hat{X}, \hat{\mathbb{P}}_\mu, \hat{T})$ is a Markov shift. Let $X := \Sigma^{\mathbb{Z}}$ and consider the maps

\begin{align*}
\psi : X \to \hat{X}, \quad & \psi \{x_n\}_{n \in \mathbb{Z}} = \{ (x_n, \dots, x_{n+p-1}) \}_{n \in \mathbb{Z}}, \\
\pi : \hat{X} \to X, \quad & \pi \{ (x_{0,n}, \dots, x_{p-1,n}) \}_{n \in \mathbb{Z}} = \{ x_{0,n} \}_{n \in \mathbb{Z}},
\end{align*}

which satisfy $\pi \circ \psi = \mathrm{id}_X$. Defining $\mathbb{P}_\mu := \pi_* \hat{\mathbb{P}}_\mu$, these maps are bimeasurable isomorphisms conjugating the shifts on $(\hat{X}, \hat{\mathbb{P}}_\mu)$ and $(X, \mathbb{P}_\mu)$, where the measure $\mathbb{P}_\mu$ is invariant under the shift $T : X \to X$. The triple $(X, \mathbb{P}_\mu, T)$ is called a Markov shift of order $p$.

Consider now the space $\mathcal{B}^\infty_m(K, p)$ of measurable functions $A : X \to \mathrm{GL}(m, \mathbb{R})$ which depend only on the coordinates $(x_0, \dots, x_p) \in \Sigma^{p+1}$ with $\|A\|_\infty < \infty$ and $\|A^{-1}\|_\infty < \infty$. Note that the iterates of $A$ are

\[ A^{(n)}(x) = A(x_{n-1}, \dots, x_{n-1+p}) \cdots A(x_1, \dots, x_{1+p}) \, A(x_0, \dots, x_p). \]

We identify $\mathcal{B}^\infty_m(K, p)$ as a space of functions $A : \Sigma^{p+1} \to \mathrm{GL}(m, \mathbb{R})$. Each such function determines a locally constant cocycle over the Markov shift $(X, \mathbb{P}_\mu, T)$. Given $A \in \mathcal{B}^\infty_m(K, p)$, we define $\hat{A} : \Sigma^p \times \Sigma^p \to \mathrm{GL}(m, \mathbb{R})$,

\[ \hat{A}\left( (x_0, \dots, x_{p-1}), (y_0, \dots, y_{p-1}) \right) := A(x_0, \dots, x_{p-1}, y_{p-1}). \]

Identifying $\hat{A}$ with a function $\hat{A} : \hat{X} \to \mathrm{GL}(m, \mathbb{R})$ we have $\hat{A} \circ \psi = A$. Hence the cocycles $(\hat{T}, \hat{A})$ and $(T, A)$ are conjugated. The cocycle $(T, A)$ over the Markov shift $(X, \mathbb{P}_\mu, T)$ will be called a random Markov cocycle of order $p$. Define $\mathcal{I}^\infty_m(K, p)$ to be the subspace of totally irreducible cocycles $A \in \mathcal{B}^\infty_m(K, p)$, i.e., the subspace of cocycles $A$ such that $\hat{A}$ is totally irreducible over $(\hat{X}, \hat{\mathbb{P}}_\mu, \hat{T})$. From these considerations and Theorem 5.1 we obtain the following result.


Theorem 5.5 Let $(K, \mu)$ be a strongly mixing Markov system of order $p \in \mathbb{N}$. Then all Lyapunov exponents $L_j : \mathcal{I}_m^\infty(K, p) \to \mathbb{R}$, with $1 \le j \le m$, the Oseledets filtration $F : \mathcal{I}_m^\infty(K, p) \to \mathfrak{F}(X, \mathbb{R}^m)$, and the Oseledets decomposition $E_\cdot : \mathcal{I}_m^\infty(K, p) \to \mathfrak{D}(X, \mathbb{R}^m)$, are continuous functions of the cocycle $A \in \mathcal{I}_m^\infty(K, p)$. Moreover, if $A \in \mathcal{I}_m^\infty(K, p)$ has a $\tau$-gap pattern then the functions $\Lambda^\tau$, $F^\tau$ and $E_\cdot^\tau$ are Hölder continuous in a neighborhood of $A$.

In particular, all conclusions above on the continuity of the LE, the Oseledets filtration, and the Oseledets decomposition, apply to irreducible and locally constant cocycles over strongly mixing Markov and Bernoulli shifts.

The abstract setting developed in Sect. 5.2 is general enough to deal with a cocycle having singularities, i.e., points $x \in X$ where the matrix $A(x)$ is singular. Consider the family of spaces $\mathcal{B}_m^a(K)$, with $0 < a < \infty$, consisting of all bounded measurable functions $A : \Sigma \times \Sigma \to \mathrm{GL}(m, \mathbb{R})$ such that for some $C > 0$ and all $x \in \Sigma$,

$$\eta_a^A(x) := \int_\Sigma \|A(x, y)^{-1}\|^a \, K(x, dy) \le C.$$

Equip this space with the distance $d_a(A, B) := \|A - B\|_\infty + \|\eta_a^A - \eta_a^B\|_\infty$. The collection $\mathcal{C} = \{(\mathcal{B}_m^a(K), d_a)\}_{m \in \mathbb{N}}$ is not a space of measurable cocycles, because item 3 of Definition 3.1 fails. However, both the uniform fiber-LDT estimates and the continuity statements about the LE can be extended to the spaces $\mathcal{I}_m^a(K)$ of totally irreducible cocycles in $\mathcal{B}_m^a(K)$. More precisely, it can be proved that Theorem 5.3 holds for all $a \ge 4$, and Theorem 5.1 holds for all $a \ge 4m$.

5.4.3 Method Limitations We need the irreducibility assumption in order to prove uniform fiber LDT estimates in Theorem 5.3. The proof exploits the fact that for irreducible cocycles there is some Banach algebra of measurable functions, independent of the cocycle, where the associated Laplace-Markov operators act as quasi-compact and simple operators (see Proposition 5.22). For reducible cocycles this fact may still be true, and it could eventually lead to fiber LDT estimates. However, the Banach algebra would have to be tailored to the cocycle, and hence the scheme of proof presented here would not provide the required uniformity.


Chapter 6

Large Deviations for Quasi-Periodic Cocycles

Abstract We derive large deviations type estimates for linear cocycles over an ergodic multifrequency torus translation. These models are called quasi-periodic cocycles. We make the following assumptions on the model: the translation vector satisfies a generic Diophantine condition, and the fiber action is given by a matrix valued analytic function of several variables which is not identically singular. The LDT estimates obtained here depend on some uniform measurements on the cocycle. The general results derived in the previous chapters regarding the continuity properties of the Lyapunov exponents and of the Oseledets filtration and decompositions are then applicable. In particular we obtain local weak-Hölder continuity of these quantities in the presence of gaps in the Lyapunov spectrum. The main new feature of this work is allowing a cocycle depending on several variables to have singularities, i.e. points of non invertibility. This requires a careful analysis of the set of zeros of certain analytic functions of several variables and of the singularities (i.e. negative infinity values) of pluri-subharmonic functions related to the iterates of the cocycle. A refinement of this method in the one variable case leads to a stronger LDT estimate and in turn to a stronger, nearly-Hölder modulus of continuity of the LE, Oseledets filtration and Oseledets decomposition.

6.1 Introduction and Statements We introduce the quasi-periodic cocycles model and describe our assumptions on it. We then formulate the main statements and relate them to recent results for similar models.

6.1.1 Description of the Model

Let $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ be the one variable torus, which we may regard as the unit circle in the complex plane. We use the notation $e(x) = e^{2\pi i x}$ and in fact we write $f(x)$ instead of $f(e(x))$ whenever $f$ is a function on $\mathbb{T}$.

© Atlantis Press and the author(s) 2016. P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_6


Throughout this chapter, $a \lesssim b$ will stand for $a \le C b$ for some context-universal constant $C$, which may be discarded from subsequent estimates.

Let $\mathbb{T}^d = (\mathbb{R}/\mathbb{Z})^d$ be the torus with $d \ge 1$ variables. We denote by $|\cdot|$ the Haar measure on $\mathbb{T}^d$. Let $T x = x + \omega$ be the translation on the torus $\mathbb{T}^d$ by the vector $\omega = (\omega_1, \omega_2, \ldots, \omega_d)$. We assume that $1, \omega_1, \omega_2, \ldots, \omega_d$ are rationally independent, hence $T$ is ergodic. The map $T$ defines the base dynamics and it is assumed fixed.

Let $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$ be a matrix valued real analytic function. The pair $(T, A)$, acting on the vector bundle $\mathbb{T}^d \times \mathbb{R}^m$ by $(x, v) \mapsto (T x, A(x) v)$, is called an analytic, quasi-periodic linear cocycle (quasi-periodic because of the base dynamics, linear due to the linear fiber action, and analytic due to the analytic dependence on the base point). As before, the frequency vector $\omega$ will be fixed, hence we identify the cocycle with its fiber action given by the function $A(x)$.

In order to be able to treat occurrences of small denominators, the translation vector will be assumed to satisfy a generic Diophantine condition:

$$\|k \cdot \omega\| \ge \frac{t}{|k|^{d + \delta_0}} \tag{6.1}$$

for some $t > 0$, $\delta_0 > 0$ and for all $k \in \mathbb{Z}^d \setminus \{0\}$, where for any real number $x$ we denote $\|x\| := \min_{k \in \mathbb{Z}} |x - k|$. Note that if $\delta_0$ is fixed and if for every $t > 0$, $\mathrm{DC}_t$ denotes the set of frequency vectors satisfying the condition (6.1) above, then the set $\mathrm{DC} := \bigcup_{t > 0} \mathrm{DC}_t$ has full measure.

Since $A(x)$ is analytic on $\mathbb{T}^d$, it has an extension $A(z)$ to $\mathcal{A}_r^d = \mathcal{A}_r \times \cdots \times \mathcal{A}_r$, where $\mathcal{A}_r = \{z \in \mathbb{C} : 1 - r < |z| < 1 + r\}$ is the annulus of width $2r$. Note that the iterates $A^{(n)}(x) := A(x + (n-1)\omega) \cdots A(x + \omega) A(x)$ of the cocycle are also analytic functions on $\mathcal{A}_r^d$.

We denote by $C_r^\omega(\mathbb{T}^d, \mathbb{R})$ the Banach space of real valued analytic functions on $\mathcal{A}_r^d$ with continuous extension up to the boundary and norm $\|f\|_r := \sup_{z \in \mathcal{A}_r^d} |f(z)|$. For every integer $m \ge 1$, let $C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ be the vector space of matrix valued analytic functions on $\mathcal{A}_r^d$, with continuous extension up to the boundary. Endowed with the norm $\|A\|_r := \sup_{z \in \mathcal{A}_r^d} \|A(z)\|$, it is a Banach space.

In a previous work (see [7]) we studied $\mathrm{GL}(m, \mathbb{R})$-valued analytic cocycles. Here we will allow our cocycles to have singularities (i.e. points of non-invertibility), as long as they are not identically singular. Let us then define $\mathcal{C}_m$ to be the set of cocycles $A \in C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $\det[A(x)] \not\equiv 0$. This condition implies in particular that all LE are finite. The set $\mathcal{C}_m$ is open in $C_r^\omega(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ and we let $\mathrm{dist}(A, B) := \|A - B\|_r$ be the induced distance on it. The collection $\mathcal{C} := \{(\mathcal{C}_m, \mathrm{dist})\}_{m \ge 1}$ is the space of cocycles for our quasi-periodic model.

For the reader's convenience we briefly recall some definitions and notations regarding the Lyapunov exponents, Oseledets filtrations and decompositions of a cocycle $A$ in a space of cocycles $\mathcal{C}_m$.


The ergodic theorem of Kingman allows us to define the Lyapunov exponents $L_j(A)$ with $1 \le j \le m$ as $L_j(A) := \Lambda_j(A) - \Lambda_{j-1}(A)$, where

$$\Lambda_j(A) := \lim_{n \to \infty} \frac{1}{n} \log \|\wedge_j A^{(n)}(x)\| \quad \text{for a.e. } x \in \mathbb{T}^d.$$
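The finite-scale quantity $\frac{1}{n} \log \|A^{(n)}(x)\|$ behind this definition can be explored numerically. The sketch below is an illustration only, under several assumptions that are not part of the text: it treats $d = 1$ and $m = 2$, uses the Frobenius norm in place of the operator norm (equivalent norms give the same limit), and tests on a hypothetical Schrödinger-type cocycle with the golden-mean frequency. The names `finite_scale_exponent` and `schrodinger` are made up for the example.

```python
import math

def matmul2(P, Q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def finite_scale_exponent(A, omega, x, n):
    """Return (1/n) log ||A^(n)(x)||_F for the iterate
    A^(n)(x) = A(x + (n-1)*omega) ... A(x + omega) A(x)."""
    prod = [[1.0, 0.0], [0.0, 1.0]]
    log_norm = 0.0
    for j in range(n):
        prod = matmul2(A((x + j * omega) % 1.0), prod)
        s = math.sqrt(sum(e * e for row in prod for e in row))
        log_norm += math.log(s)   # raises ValueError if the product vanishes
        prod = [[e / s for e in row] for row in prod]  # renormalise to avoid overflow
    return log_norm / n

GOLDEN = (math.sqrt(5.0) - 1.0) / 2.0   # assumed frequency for the illustration

def schrodinger(x, coupling=2.0, energy=0.0):
    """Hypothetical SL(2, R)-valued example cocycle."""
    return [[energy - 2.0 * coupling * math.cos(2.0 * math.pi * x), -1.0],
            [1.0, 0.0]]

estimate = finite_scale_exponent(schrodinger, GOLDEN, 0.1, 2000)
```

The renormalisation step keeps a running sum of logarithms, which telescopes to $\log \|A^{(n)}(x)\|_F$. For a cocycle with singularities the product may vanish at some point, in which case the logarithm is $-\infty$ and this sketch simply raises an error.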

Let $\tau = (1 \le \tau_1 < \cdots < \tau_k < m)$ be a signature. If $A \in \mathcal{C}_m$ has a $\tau$-gap pattern, i.e. $L_{\tau_j}(A) > L_{\tau_j + 1}(A)$ for all $j$, we define the Lyapunov $\tau$-block $\Lambda^\tau(A) := (\Lambda_{\tau_1}(A), \ldots, \Lambda_{\tau_k}(A)) \in \mathbb{R}^k$.

A flag of $\mathbb{R}^m$ is any increasing sequence of linear subspaces. The corresponding sequence of dimensions is called its signature. A measurable filtration is a measurable function on $\mathbb{T}^d$, taking values in the space of flags of $\mathbb{R}^m$ with almost sure constant signature. We denote by $\mathfrak{F}(\mathbb{T}^d, \mathbb{R}^m)$ the space of measurable filtrations. Note that the Oseledets filtration of $A$, which we denote by $F(A)$, is an element of this space. We denote by $\mathfrak{F}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ the subset of measurable filtrations with a signature $\tau$ or finer. If $F \in \mathfrak{F}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ then there is a natural projection $F^\tau$ with signature $\tau$, obtained from $F$ by simply 'forgetting' some of its components. This space is endowed with the following pseudo-metric

$$\mathrm{dist}_\tau(F, F') := \int_{\mathbb{T}^d} d_\tau\big(F^\tau(x), (F')^\tau(x)\big) \, dx,$$

where $d_\tau$ refers to the metric on the $\tau$-flag manifold. On the space $\mathfrak{F}(\mathbb{T}^d, \mathbb{R}^m)$ we consider the coarsest topology that makes the sets $\mathfrak{F}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ open, and the pseudo-metrics $\mathrm{dist}_\tau$ continuous.

A decomposition of $\mathbb{R}^m$ is a sequence of linear subspaces $\{E_j\}_{1 \le j \le k+1}$ whose direct sum is $\mathbb{R}^m$. This determines the flag $E_1 \subset E_1 \oplus E_2 \subset \cdots \subset E_1 \oplus \cdots \oplus E_k$, whose signature $\tau$ also designates the signature of the decomposition. A measurable decomposition is a measurable function on $\mathbb{T}^d$, taking values in the space of decompositions of $\mathbb{R}^m$ with almost sure constant signature. We denote by $\mathfrak{D}(\mathbb{T}^d, \mathbb{R}^m)$ the space of measurable decompositions. Note that the Oseledets decomposition of $A$, which we denote by $E_\cdot(A)$, is an element of this space. We denote by $\mathfrak{D}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ the subset of measurable decompositions with a signature $\tau$ or finer. If $E_\cdot \in \mathfrak{D}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ then there is a natural restriction $E_\cdot^\tau$ with signature $\tau$, obtained from $E_\cdot$ by simply 'patching up' the appropriate components. This space is endowed with the following pseudo-metric

$$\mathrm{dist}_\tau(E_\cdot, E_\cdot') := \int_{\mathbb{T}^d} d_\tau\big(E_\cdot^\tau(x), (E_\cdot')^\tau(x)\big) \, dx,$$

where $d_\tau$ refers to the metric on the manifold of $\tau$-decompositions.


On the space $\mathfrak{D}(\mathbb{T}^d, \mathbb{R}^m)$ we consider the coarsest topology that makes the sets $\mathfrak{D}_{\supset \tau}(\mathbb{T}^d, \mathbb{R}^m)$ open, and the pseudo-metrics $\mathrm{dist}_\tau$ continuous.

We are ready to state a general result on the continuity of the LE, the Oseledets filtration and the Oseledets decomposition for quasi-periodic cocycles.

Theorem 6.1 Assume that the translation $\omega \in \mathrm{DC}_t$ for some $t > 0$, and let $m \ge 1$. Then all Lyapunov exponents $L_j : \mathcal{C}_m \to \mathbb{R}$, with $1 \le j \le m$, the Oseledets filtration $F : \mathcal{C}_m \to \mathfrak{F}(\mathbb{T}^d, \mathbb{R}^m)$, and the Oseledets decomposition $E_\cdot : \mathcal{C}_m \to \mathfrak{D}(\mathbb{T}^d, \mathbb{R}^m)$, are continuous functions of the cocycle $A \in \mathcal{C}_m$. Moreover, if $A \in \mathcal{C}_m$ has a $\tau$-gap pattern then the functions $\Lambda^\tau$, $F^\tau$ and $E_\cdot^\tau$ are weak-Hölder continuous in a neighborhood of $A$. In the one-variable case $d = 1$, and for translations that satisfy a slightly stronger (but still generic) Diophantine condition, we obtain a stronger modulus of continuity (see Sect. 6.5).

Remark 6.1 The gap pattern hypothesis in Theorem 6.1, which for $\mathrm{SL}(2, \mathbb{R})$-valued cocycles simply means positivity of the top LE, is a necessary assumption for obtaining a modulus of continuity (see Chap. 8 in J. Bourgain's monograph [3] and the preamble to Sect. 2 in Ávila's paper [1]). Moreover, according to a private conversation of the second author with Qi Zhou, it seems that even in the presence of a gap pattern, an arithmetic condition on the frequency $\omega$ is also a necessary assumption for proving Hölder or weak-Hölder continuity of the LE.

6.1.2 Literature Review

A more detailed review of related results was given in Sect. 1.6 of the introductory Chap. 1. The reader may also consult the recent surveys [6, 14]. Here we focus mainly on recent work on quasi-periodic models and on how it differs from our results. In some sense, the strongest result on continuity of the Lyapunov exponents for quasi-periodic cocycles in the one-frequency translation ($d = 1$) case is due to Ávila et al. [2]. The space of cocycles considered in this paper is the whole $C_r^\omega(\mathbb{T}^1, \mathrm{Mat}(m, \mathbb{C}))$ (hence identically singular cocycles are not excluded) and the authors prove joint continuity in cocycle and frequency at all points $(A, \omega)$ with $\omega$ irrational. A previous work of Jitomirskaya and Marx [13] established a similar result for $\mathrm{Mat}(2, \mathbb{C})$-valued cocycles, using a different approach. We note here that both approaches rely crucially on the convexity of the top Lyapunov exponent of the complexified cocycle, as a function of the imaginary variable, by establishing first continuity away from the torus. This method immediately breaks down in the several variables ($d > 1$) case which we treat here. The approaches of [2, 13] are independent of any arithmetic constraints on the translation frequency $\omega$ and they do not use large deviations. However, the results are not quantitative, in the sense that they do not provide any modulus of continuity of the Lyapunov exponents. All available quantitative results, from the classic result of


Goldstein and Schlag [9] to more recent results such as [7, 20, 21, 23] or our current work, use some type of large deviations, whose derivation depends upon imposing appropriate arithmetic conditions on ω. We note that in the (more particular) context of Schrödinger cocycles, joint continuity in the energy parameter and the frequency translation was proven for the one variable d = 1 case by Bourgain and Jitomirskaya [5] and for the several variables d > 1 case by Bourgain [4]. Both papers used weaker versions of large deviation estimates, proven under weak arithmetic (i.e. restricted Diophantine) conditions on ω, although eventually the results were made independent of any such restrictions. Continuity properties of the Lyapunov exponents were also established for certain non-analytic quasi-periodic models (see [15, 16, 22]). In this chapter we are dealing with both a base dynamics given by a translation on the several variables torus and a fiber action which has singularities. Our approach is based in an essential way upon establishing certain uniform estimates on analytic functions of several variables and on pluri subharmonic functions. The issue of singularity is especially delicate for several variables functions. One obstacle, for instance, is the fact that an analytic function of several variables may vanish identically along hyperplanes, while not being globally identically zero. Related to this, a pluri subharmonic function may be −∞ along hyperplanes, while not being globally −∞. A crucial tool in our analysis is Theorem 6.3, which shows that the obstacle described above for an analytic function can be removed with an appropriate change of coordinates. Another crucial tool in our analysis is the observation that for the pluri subharmonic functions corresponding to iterates of a cocycle, while they may have singularities as described above, these singularities can be captured by certain analytic functions. 
Most of the work in this chapter is devoted to proving a uniform LDT estimate for iterates of the cocycle. Uniform LDT estimates for cocycles with singularities were obtained before by Jitomirskaya and Marx [12] for $\mathrm{Mat}(2, \mathbb{C})$-valued cocycles. Again, the approach in [12] is one-variable specific. Let us now phrase the large deviation type estimate obtained in this chapter.

Theorem 6.2 Given $A \in \mathcal{C}_m$ and $\omega \in \mathrm{DC}_t$, there are constants $\delta = \delta(A) > 0$, $n_0 = n_0(A, t) \in \mathbb{N}$, $C = C(A) < \infty$, $a = a(d) > 0$ and $b = b(d) > 0$ such that if $\|B - A\|_r \le \delta$ and $n \ge n_0$ then

$$\Big| \Big\{ x \in \mathbb{T}^d : \Big| \frac{1}{n} \log \|B^{(n)}(x)\| - L_1^{(n)}(B) \Big| > C n^{-a} \Big\} \Big| < e^{-n^b}. \tag{6.2}$$

Once the above LDT is established, we simply need to verify that we are in the context of the abstract continuity Theorem 1.6 in Chap. 3. We note that the above LDT estimate is of independent interest. Such estimates have been widely used in the study of discrete quasi-periodic Schrödinger operators, to establish positivity of Lyapunov exponents, estimates on Green’s functions,


continuity of the integrated density of states and spectral properties (such as Anderson localization) for such operators (see Bourgain's monograph [3], see also [15, 16] and references therein). The LDTs proven here, along with the other technical analytic tools, may then prove useful for future projects on topics in mathematical physics related to larger classes of discrete, quasi-periodic operators. The rest of this chapter is organized as follows. In Sect. 6.2 we prove general uniform estimates on analytic and pluri-subharmonic functions. These abstract results are then applied in Sect. 6.3 to quantities related to cocycle iterates, leading to the proof of the LDT. In Sect. 6.4 we explain how our system satisfies the assumptions of the general criterion in Chap. 3. Finally, in Sect. 6.5 we show that in the one-variable case, the LDT proven in Sect. 6.3 can be used in an inductive argument that eventually leads to a sharper LDT, and in turn, to a stronger modulus of continuity for the Lyapunov exponents.

6.2 Estimates on Unbounded Pluri-Subharmonic Functions

In this section we derive certain uniform estimates on analytic functions of several variables and on pluri-subharmonic functions. These estimates are of a general nature, and they will be applied in the next section to quantities related to iterates of analytic cocycles. Uniformity is understood relative to some measurements which are stable under small perturbations of the functions being measured. The main technical difficulties in establishing these estimates are related to the non-trivial nature of the zeros of an analytic function of several variables, and correspondingly, to the unboundedness of the pluri-subharmonic functions we study.

6.2.1 The Uniform Łojasiewicz Inequality

Throughout this chapter, a quantitative description of quasi-analyticity, the Łojasiewicz inequality, will play a crucial role. We make the observation that this property is uniform in a small neighborhood of such a function.

Lemma 6.1 Let $f \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ be such that $f \not\equiv 0$. Then there are constants $\delta = \delta(f) > 0$, $S = S(f) < \infty$ and $b = b(f) > 0$ such that if $g \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ with $\|g - f\|_r < \delta$ then

$$\big| \{x \in \mathbb{T}^d : |g(x)| < t\} \big| < S \, t^b \quad \text{for all } t > 0. \tag{6.3}$$

Proof We may assume that $f(x)$ is not constant, otherwise $f(x) \equiv C$ with $|C| > 0$ and (6.3) is then obvious.


Łojasiewicz inequality (6.3) for a fixed, non-constant analytic function $f(x)$ of several variables has been established for instance in [9] (see Lemma 11.4), and for smooth, transversal functions in [15] for $d = 1$ (see Lemma 5.3) and in [16] for $d > 1$ (see Theorem 5.1). Moreover, the constants $S$ and $b$ in [15, 16] depend explicitly on some measurements of $f$, and it is easy to see that these measurements are uniform.

Assume for simplicity that $d = 2$, although a similar argument holds for any $d \ge 1$. Then recall from [16] that a smooth function $f(x)$ is called transversal if for any point $x \in \mathbb{T}^2$ there is a multi-index $\alpha = (\alpha_1, \alpha_2) \in \mathbb{N}^2$, $\alpha \ne (0, 0)$, such that the corresponding partial derivative is non-zero: $\partial^\alpha f(x) \ne 0$. Clearly, non-constant analytic functions are smooth and transversal. By Lemma 5.1 in [16], for such a function $f$, there are $m = m(f) = (m_1, m_2) \in \mathbb{N}^2$, $m \ne (0, 0)$, and $c = c(f) > 0$ such that for any $x \in \mathbb{T}^2$ we have

$$|\partial^\alpha f(x)| \ge c \tag{6.4}$$

for some multi-index $\alpha = (\alpha_1, \alpha_2)$ with $\alpha_1 \le m_1$, $\alpha_2 \le m_2$. Let

$$A(f) := \max \{ |\partial^\alpha f(x)| : x \in \mathbb{T}^d, \ \alpha = (\alpha_1, \alpha_2), \ \alpha_1 \le m_1 + 1, \ \alpha_2 \le m_2 + 1 \}. \tag{6.5}$$

Theorem 5.1 in [16] says that

$$\big| \{x \in \mathbb{T}^2 : |f(x)| < t\} \big| < S \, t^b \quad \text{for all } t > 0,$$

where, according to the last line of its proof (see also Remark 5.1), $S = S(f) \sim A(f) \cdot m(f)$ and $b = b(f) = \frac{1}{3 m(f)}$.

Therefore, in order to obtain the uniform estimate (6.3), all we need to show is that the above constants $m = m(f)$, $c = c(f)$, $A = A(f)$ depend uniformly on the function $f$. Indeed, let $g \in C_r^\omega(\mathbb{T}^2, \mathbb{R})$ be such that $\|g - f\|_r < \delta$. By analyticity, for some constant $B = B(f)$ depending only on $\alpha$ and $r$, $\|\partial^\alpha g - \partial^\alpha f\|_0 \le B \delta$, hence if $\alpha = (\alpha_1, \alpha_2)$ with $\alpha_1 \le m_1 + 1$, $\alpha_2 \le m_2 + 1$ and $m = (m_1, m_2) = m(f)$ from above, then

$$\|\partial^\alpha g - \partial^\alpha f\|_0 \le B(f) \, \delta = \frac{c(f)}{2},$$

provided $\delta = \delta(f) := \frac{c(f)}{2 B(f)}$. From (6.4) we conclude that if $\|g - f\|_r < \delta$, then for every $x \in \mathbb{T}^2$

$$|\partial^\alpha g(x)| \ge \frac{c}{2}$$

holds for some multi-index $\alpha = (\alpha_1, \alpha_2)$ with $\alpha_1 \le m_1$, $\alpha_2 \le m_2$.


Moreover, for such functions $g$, the upper bound $A(g)$ satisfies

$$A(g) \le A(f) + \frac{c}{2} \sim A(f),$$

which concludes the proof of the lemma. □
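A one-variable toy case may help to see what the uniformity in Lemma 6.1 means. The sketch below is an assumption-laden illustration, not part of the argument: it takes $f(x) = \cos(2\pi x)$, for which $|\{x : |f(x)| < t\}| = \frac{2}{\pi} \arcsin t \le t$ when $0 < t \le 1$, and the hypothetical perturbation $g(x) = f(x) + 0.01 \sin(4\pi x) = f(x)\,(1 + 0.02 \sin(2\pi x))$, so that $|g| \ge 0.98\,|f|$ and the same exponent $b = 1$ works with a slightly larger $S$. The function name `sublevel_measure` and the grid size are choices made for the example.

```python
import math

def sublevel_measure(f, t, grid=20000):
    """Midpoint-grid estimate of the measure of {x in [0, 1) : |f(x)| < t}."""
    hits = sum(1 for i in range(grid) if abs(f((i + 0.5) / grid)) < t)
    return hits / grid

def f(x):
    return math.cos(2.0 * math.pi * x)

def g(x):
    # Hypothetical small analytic perturbation of f.
    return math.cos(2.0 * math.pi * x) + 0.01 * math.sin(4.0 * math.pi * x)

for t in (0.01, 0.1, 0.5):
    exact_f = 2.0 / math.pi * math.asin(t)
    print(t, sublevel_measure(f, t), exact_f, sublevel_measure(g, t))
```

In this example the pair $(S, b) = (1.05, 1)$ serves both $f$ and $g$, which is the kind of stability under perturbation that the lemma asserts in general.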

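Lemma 6.2 Let $f$ be a bounded function satisfying the Łojasiewicz inequality with constants $S$, $b$:

$$\big| \{x \in \mathbb{T}^d : |f(x)| < t\} \big| < S \, t^b \quad \text{for all } t > 0. \tag{6.6}$$

Then

$$\big\| \log |f| \big\|_{L^2(\mathbb{T}^d)} \le C, \tag{6.7}$$

where $C$ is a finite explicit constant depending only on $\|f\|_\infty$, $S$ and $b$, i.e. on some measurements of $f$.

Proof The argument is straightforward. From (6.6), the set $\{x : f(x) = 0\}$ has zero measure. Split the rest of the phase space into

$$E := \{x : |f(x)| \ge 1\}, \qquad E_n := \Big\{x : \frac{1}{2^{n+1}} \le |f(x)| \le \frac{1}{2^n}\Big\} \quad \text{for all } n \ge 0.$$

If $x \in E$, then $\big|\log |f(x)|\big| \le \big|\log \|f\|_\infty\big| < \infty$. If $x \in E_n$, then $\big|\log |f(x)|\big| \lesssim n + 1$. Moreover, from (6.6), $|E_n| < S \big(\frac{1}{2^n}\big)^b = S \big(\frac{1}{2^b}\big)^n$. Then

$$\big\| \log |f| \big\|_{L^2(\mathbb{T}^d)}^2 = \int_E \big|\log |f|\big|^2 + \sum_{n=0}^{\infty} \int_{E_n} \big|\log |f|\big|^2 \le \big|\log \|f\|_\infty\big|^2 + S \sum_{n=0}^{\infty} \frac{(n+1)^2}{(2^b)^n} = \big|\log \|f\|_\infty\big|^2 + S \, \frac{2^{2b} (2^b + 1)}{(2^b - 1)^3}.$$

We then conclude:

$$\big\| \log |f| \big\|_{L^2(\mathbb{T}^d)} \lesssim \big|\log \|f\|_\infty\big| + S \, (2^b - 1)^{-3/2}.$$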
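The closed form used in the last step of the proof is the identity $\sum_{n \ge 0} (n+1)^2 q^n = \frac{1+q}{(1-q)^3}$ evaluated at $q = 2^{-b}$. As a sanity check only, the following sketch compares a long partial sum with the closed form for a few values of $b$; the function names and the truncation length are arbitrary choices made for the example.

```python
def partial_sum(b, terms=2000):
    """Partial sum of sum_{n>=0} (n+1)^2 / (2^b)^n."""
    q = 2.0 ** (-b)
    return sum((n + 1) ** 2 * q ** n for n in range(terms))

def closed_form(b):
    """The value 2^(2b) (2^b + 1) / (2^b - 1)^3 appearing in the proof."""
    B = 2.0 ** b
    return B * B * (B + 1.0) / (B - 1.0) ** 3

for b in (0.1, 0.5, 1.0, 3.0):
    print(b, partial_sum(b), closed_form(b))
```

For $b = 1$ both expressions equal 12. The closed form blows up like $(2^b - 1)^{-3}$ as $b \to 0$, which is where the factor $(2^b - 1)^{-3/2}$ in the final bound comes from.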



Remark 6.2 The previous two lemmas imply that if $f \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$, $f \not\equiv 0$, then there are constants $\delta = \delta(f)$, $C = C(f) < \infty$ such that

$$\big\| \log |g| \big\|_{L^2(\mathbb{T}^d)} \le C$$

holds for any $g \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ with $\|g - f\|_r < \delta$.

6.2.2 Uniform $L^2$-Bounds on Analytic Functions

Consider the following norm on measurable functions $f : \mathbb{T}^d \to \mathbb{R}$:

$$||| f ||| := \sup_{1 \le j \le d} \ \sup_{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d \in \mathbb{T}} \Big( \int_{\mathbb{T}} \big| f(x_1, \ldots, x_{j-1}, t, x_{j+1}, \ldots, x_d) \big|^2 \, dt \Big)^{1/2}.$$

We say that a measurable function $f : \mathbb{T}^d \to \mathbb{R}$ is uniformly separately $L^2$-bounded if $||| f ||| < +\infty$.

Lemma 6.3 Given a translation $T$ on $\mathbb{T}^d$, for any measurable function $f : \mathbb{T}^d \to \mathbb{R}$, $||| f \circ T ||| = ||| f |||$.

Proof Just use (in each variable) the translation invariance of the Lebesgue measure on $\mathbb{T}$. □

Definition 6.1 We say that a function $f : \mathbb{T}^d \to \mathbb{R}$ vanishes along an axis if there are $1 \le j \le d$ and $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d \in \mathbb{T}$ so that $f(x_1, \ldots, x_{j-1}, t, x_{j+1}, \ldots, x_d) = 0$ for every $t \in \mathbb{T}$.

Lemma 6.4 Given an analytic function $f : \mathbb{T}^d \to \mathbb{R}$, if $f$ does not vanish along any axis then $\log |f|$ is uniformly separately $L^2$-bounded.

Proof The assumption implies that for all $1 \le j \le d$ and for all $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d \in \mathbb{T}$, the analytic function

$$\varphi_{j; x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d}(t) := f(x_1, \ldots, x_{j-1}, t, x_{j+1}, \ldots, x_d)$$

is not identically zero. Since clearly for all $1 \le j \le d$, the set of functions $\{\varphi_{j; x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d}(t) : x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d \in \mathbb{T}\}$ is compact, applying Remark 6.2 (with $d = 1$) to the one-variable functions above, we conclude that there is $C = C(f) < \infty$ such that $\big\| \log |\varphi_{j; x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d}| \big\|_{L^2} < C$, which shows that $\log |f|$ is uniformly separately $L^2$-bounded.

We note that in fact more can be shown, namely that for all $1 \le j \le d$, the function $H_j : \mathbb{T}^{d-1} \to \mathbb{R}$, $H_j(x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d) := \big\| \log |\varphi_{j; x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d}| \big\|_{L^2}$ is continuous, hence it has a maximum value, which leads to the same conclusion. □


Theorem 6.3 For any analytic function $f \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ with $f \not\equiv 0$, there are $\delta = \delta(f, r) > 0$, $C = C(f, r) < \infty$ and there is a matrix $M \in \mathrm{SL}(d, \mathbb{Z})$ such that for any $g \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ with $\|f - g\|_r < \delta$, and for any $x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d \in \mathbb{T}$ with $1 \le j \le d$,

$$\big\| \log |g \circ M(x_1, \ldots, x_{j-1}, \cdot\,, x_{j+1}, \ldots, x_d)| \big\|_{L^2_{x_j}(\mathbb{T})} \le C.$$

In other words, up to some linear change of coordinates in the torus $\mathbb{T}^d$, the functions $\log |g|$ with $g$ near $f$ are uniformly separately $L^2$-bounded.

Given $\delta > 0$, define $\Sigma_\delta$ to be the set of all $k \in \mathbb{Z}^d$ such that any geodesic circle parallel to the vector through $k$ is $\delta$-dense in $\mathbb{T}^d$.

Proposition 6.1 Given $\delta > 0$, there is a matrix $M \in \mathrm{SL}(d, \mathbb{Z})$ such that every column of $M$ is in $\Sigma_\delta$.

Proof Take any matrix $M \in \mathrm{SL}(d, \mathbb{Z})$ with non-negative entries which is primitive, and has a characteristic polynomial $p_M(\lambda) = \det(M - \lambda I)$ irreducible over $\mathbb{Z}$ (see Lemma 6.5). Then $M$ has a dominant eigenvector $\omega \in \mathrm{int}(\mathbb{R}^d_+)$. Consider the canonical projection $\pi : \mathbb{R}^d \to \mathbb{T}^d$ and define $H = \overline{\pi(\langle \omega \rangle)}$ (topological closure in $\mathbb{T}^d$), where $\langle \omega \rangle = \{ t \omega : t \in \mathbb{R} \}$. Define also $\mathfrak{h} = \pi^{-1}(H)$. The set $H$ is a compact connected subgroup of $\mathbb{T}^d$, while $\mathfrak{h}$ is a linear subspace of $\mathbb{R}^d$, the Lie algebra of $H$. The group $H$ is invariant under the torus automorphism $\phi_M : \mathbb{T}^d \to \mathbb{T}^d$, $\phi_M(x) = M x \ (\mathrm{mod}\ \mathbb{Z}^d)$, and hence (by restriction and quotient) the map $\phi_M$ induces the toral automorphisms $\phi_H : H \to H$ and $\tilde\phi_H : \mathbb{T}^d / H \to \mathbb{T}^d / H$. Thus $M \mathfrak{h} = \mathfrak{h}$, and the linear maps of these toral automorphisms at the level of Lie algebras are $\Phi_{\mathfrak{h}} : \mathfrak{h} \to \mathfrak{h}$, $\Phi_{\mathfrak{h}}(x) = M x$, and $\tilde\Phi_{\mathfrak{h}} : \mathbb{R}^d / \mathfrak{h} \to \mathbb{R}^d / \mathfrak{h}$, $\tilde\Phi_{\mathfrak{h}}(x + \mathfrak{h}) = M x + \mathfrak{h}$. The characteristic polynomials of these linear automorphisms have integer coefficients because they are associated with toral automorphisms. Finally, because $\mathfrak{h}$ is invariant under $M$, the characteristic polynomial $p_M(\lambda)$ factors as the product of the characteristic polynomials of $\Phi_{\mathfrak{h}}$ and $\tilde\Phi_{\mathfrak{h}}$. Since the polynomial $p_M(\lambda)$ is irreducible, we conclude that $H = \mathbb{T}^d$, which implies that the line spanned by $\omega$ is dense in $\mathbb{T}^d$. Because $M$ is primitive, the lines spanned by the columns of $M^n$ approach the line spanned by $\omega$ as $n \to +\infty$. Hence for $n$ large enough, every column of $M^n$ lies in $\Sigma_\delta$. □

Consider the following family of matrices in $\mathrm{SL}(d, \mathbb{Z})$:

$$M_d = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 1 & 0 & \cdots & 0 & 1 \\ 0 & 1 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 1 & 0 \\ 0 & 0 & 0 & \cdots & 1 & 1 \end{pmatrix} \ \text{if } d > 2, \qquad M_2 = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}. \tag{6.8}$$
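The algebraic properties of the family (6.8) asserted in Lemma 6.5 (the formula for the characteristic polynomial, determinant one, primitivity) can be checked by exact integer arithmetic for small $d$. The sketch below is a numerical sanity check, not a proof, and it does not test irreducibility. It assumes the reading of (6.8) as the identity, plus the cyclic permutation matrix with ones below the diagonal and in the top-right corner, plus one extra entry in row 2, column $d$; the helper names are hypothetical.

```python
def build_M(d):
    """Matrix M_d under the assumed reading of (6.8); M_2 = [[1, 1], [1, 2]]."""
    M = [[0] * d for _ in range(d)]
    for i in range(d):
        M[i][i] += 1              # identity
        M[i][(i - 1) % d] += 1    # cyclic permutation matrix P
    M[1][d - 1] += 1              # extra entry in row 2, column d
    return M

def det_int(A):
    """Exact integer determinant by fraction-free (Bareiss) elimination."""
    A = [row[:] for row in A]
    n, sign, prev = len(A), 1, 1
    for k in range(n - 1):
        if A[k][k] == 0:
            swap = next((i for i in range(k + 1, n) if A[i][k] != 0), None)
            if swap is None:
                return 0
            A[k], A[swap] = A[swap], A[k]
            sign = -sign
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                A[i][j] = (A[i][j] * A[k][k] - A[i][k] * A[k][j]) // prev
        prev = A[k][k]
    return sign * A[n - 1][n - 1]

def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def check_lemma_6_5(d):
    """Return (char. polynomial formula holds, det(M_d), all entries of M_d^d > 0)."""
    M = build_M(d)
    # A degree-d polynomial identity checked at 2d + 1 integer points.
    charpoly_ok = all(
        det_int([[M[i][j] - (lam if i == j else 0) for j in range(d)]
                 for i in range(d)]) == (-1) ** d * ((lam - 1) ** d - lam)
        for lam in range(-d, d + 1))
    power = M
    for _ in range(d - 1):
        power = matmul(power, M)
    primitive = all(e > 0 for row in power for e in row)
    return charpoly_ok, det_int(M), primitive
```

Calling `check_lemma_6_5(d)` for small values of `d` returns a triple whose entries correspond, in order, to the three properties listed above.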


Lemma 6.5 The matrix $M_d$ is primitive and its characteristic polynomial is irreducible over $\mathbb{Z}$.

Proof Computing the characteristic polynomial of the matrix $M_d$ with the Laplace determinant rule, we obtain

$$p_d(\lambda) = \det(M_d - \lambda I) = (-1)^d \big( (\lambda - 1)^d - \lambda \big).$$

In particular, $\det(M_d) = p_d(0) = (-1)^d (-1)^d = 1$. Considering the permutation matrix

$$P = \begin{pmatrix} 0 & 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 1 & 0 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & 0 & 0 \\ 0 & 0 & 0 & \cdots & 1 & 0 \end{pmatrix}$$

we have $M_d \ge I + P$, where the partial order $\ge$ refers to component-wise comparison of the entries of the two matrices. Hence

$$(M_d)^d \ge (I + P)^d = \sum_{j=0}^{d} \binom{d}{j} P^j,$$

and since the right-hand side has all entries positive, it follows that $M_d$ is primitive. Writing $\mu = \lambda - 1$, we get $(\lambda - 1)^d - \lambda = \mu^d - \mu - 1$. Hence the irreducibility of $p_d(\lambda)$ is equivalent to that of $\mu^d - \mu - 1$, which was established to hold for every $d \ge 2$ by Selmer (see Theorem 1 in [19]). □

Proof (of Theorem 6.3) Let $f : \mathbb{T}^d \to \mathbb{R}$ be an analytic function such that $f \not\equiv 0$. Take constants $c$ and $C$ such that $0 < c < \|f\|_\infty$ and $\|Df\|_\infty < C < +\infty$. Let $\delta > 0$ be such that for every $g \in C_r^\omega(\mathbb{T}^d, \mathbb{R})$ with $\|g - f\|_r < \delta$, one still has $0 < c < \|g\|_\infty$ and $\|Dg\|_\infty < C$. Choose $\delta > 0$ such that $C \delta < c$, and pick a matrix $M \in \mathrm{SL}(d, \mathbb{Z})$ such that every column of $M$ lies in $\Sigma_\delta$. Then any axis $\gamma$ of $\mathbb{T}^d$ along the coordinate system defined by $M$ has homotopy type in $\Sigma_\delta$. We cannot have $g|_\gamma \equiv 0$, because the $\delta$-density of $\gamma$ implies the contradiction $c < \|g\|_\infty \le C \delta < c$. [...]

[...] $r > 0$, if the disk $D(z, r) \subset \Omega$, then we have

$$\int_0^r \frac{\mu(D(z, t))}{t} \, dt = \int_0^1 u(z + r e(\theta)) \, d\theta - u(z).$$

Since clearly

$$\int_0^r \frac{\mu(D(z, t))}{t} \, dt \ge \int_{r/2}^r \frac{\mu(D(z, t))}{t} \, dt \gtrsim \mu(D(z, r/2)),$$

we conclude that

$$\mu(D(z, r/2)) \lesssim C. \tag{6.9}$$


Therefore, we obtain the following measurement on the total Riesz mass of $u$: $\mu(\Omega_1) \lesssim C(\Omega, \Omega_1) \, C$, where $C$ is the bound on $u(z)$ and $C(\Omega, \Omega_1)$ is a constant that depends on how the subdomain $\Omega_1$ is covered by disks contained in $\Omega$. An argument that uses the Poisson-Jensen representation formula (see Sect. 3.7 in [11]) leads to a similar bound on the $L^\infty$-norm (on a slightly smaller compactly contained subdomain $\Omega_2$) for the harmonic part $h(z)$ in the Riesz representation of $u(z)$. We conclude that if $|u(z)| \le C$ on $\Omega$, then

$$\mu(\Omega_1) + \|h\|_{L^\infty(\Omega_2)} \le C(\Omega, \Omega_1, \Omega_2) \, C. \tag{6.10}$$

We call the estimate in (6.10) a uniform measurement on the bounded subharmonic function $u(z)$, since it only depends on its bound and on its domain. It is precisely this measurement that determines the parameters in the base-LDT estimates for the observable $u(x)$.

This chapter requires similar estimates for subharmonic functions that are unbounded from below. A uniform measurement like (6.10) on the Riesz mass and on the harmonic part of such a subharmonic function was obtained by M. Goldstein and W. Schlag (see Lemma 2.2 in [10]). The derivation is based on the Poisson-Jensen formula and on considerations that involve Green's functions. The result in [10] is formulated for functions $u : \Omega \to \mathbb{R}$. It holds, however, also for $u : \Omega \to [-\infty, \infty)$, as long as $u \not\equiv -\infty$. That is because the only requirements for the applicability of the Poisson-Jensen formula (see Sect. 3.7 in [11]) are $u \not\equiv -\infty$ and some assumptions on the boundary of $\Omega$. We formulate the aforementioned result in [10] for a subharmonic function on an annulus, as this is the context of our model.

Lemma 6.6 Let $u : \mathcal{A}_r \to [-\infty, \infty)$ be a subharmonic function, and let

$$u(z) = \int_{\mathcal{A}_{r/2}} \log |z - \zeta| \, d\mu(\zeta) + h(z)$$

be its Riesz representation on the smaller annulus $\mathcal{A}_{r/2}$. Assume that

$$\sup_{z \in \mathcal{A}_r} u(z) - \sup_{z \in \mathcal{A}_{r/2}} u(z) \le C. \tag{6.11}$$

Then

$$\mu(\mathcal{A}_{r/2}) + \|h\|_{L^\infty(\mathcal{A}_{r/4})} \le C_r \, C, \tag{6.12}$$

where $C_r$ is a constant that depends on the width $r$ of the annulus.

Remark 6.5 Lemma 6.6 above says that in order to obtain a uniform measurement like (6.10) for a subharmonic function $u(z)$, it is enough to have an upper bound everywhere and a lower bound at some point. Clearly assumption (6.11) is implied (up to doubling the constant) by

$$\sup_{z \in \mathcal{A}_r} u(z) + \|u\|_{L^2(\mathbb{T})} \le C. \tag{6.13}$$

Remark 6.6 We comment on the order of magnitude of the constant $C_r\, C$ in (6.12), as $r \to 0$. In the bounded case $|u(z)| \le C$ for all $z \in \mathcal{A}_r$, from (6.9) and the fact that $\mathcal{A}_{r/2}$ can be covered by $O(\frac{1}{r})$ many disks of radius $O(r)$, it follows that the total Riesz mass of $u$ is of order $\frac{1}{r} C$ or less. The $L^\infty$ bound on $h$ will be of the same order, showing that $C_r\, C \lesssim \frac{1}{r} C$. In the unbounded case, under the assumption (6.11), an inspection of the proof of Lemma 2.2 in [10] shows that the constant $C_r$ in (6.12) depends only on the annulus $\mathcal{A}_r$, via certain estimates on its Green's function and an argument involving Harnack's inequality. These considerations lead to an estimate on $C_r$ which is exponential in $\frac{1}{r}$ as $r \to 0$. One can show, via some calculations involving elliptic integrals, that this estimate on the order of $C_r$ cannot be significantly improved, unless of course (6.11) is strengthened.

We note, however, that throughout this chapter, the width $r$ of the annulus $\mathcal{A}_r$ is fixed. We do perform a change of coordinates of the multivariable torus $\mathbb{T}^d$, which in turn affects the size of the domain of the relevant subharmonic functions. However, this change of coordinates is performed only once. Hence for all intents and purposes, the constant $C_r$ in this chapter may be treated as a universal constant, and so the uniform measurement on $u(z)$ given by (6.12) depends only on the bound in (6.11) or in (6.13).

We formulate the crucial estimates on a subharmonic function $u(z)$ which are needed in the proof of the LDT: a rate of decay of the Fourier coefficients of $u(x)$ and an estimate on its BMO norm derived under an appropriate splitting assumption. The reader may consult [18] or [8] for background on the relevant harmonic analysis topics.

Let $u : \mathcal{A}_r \to [-\infty, \infty)$ be a subharmonic function, and let
\[ u(z) = \int_{\mathcal{A}_{r/2}} \log|z - \zeta|\, d\mu(\zeta) + h(z) \]
be its Riesz representation on the smaller annulus $\mathcal{A}_{r/2}$. Assume that
\[ \mu(\mathcal{A}_{r/2}) + \|h\|_{L^\infty(\mathcal{A}_{r/4})} + \|\partial_x h\|_{L^\infty(\mathbb{T})} \le S. \tag{6.14} \]

Lemma 6.7 Under the assumptions above, the following estimates on the Fourier coefficients of $u(x)$ as a function on $\mathbb{T}$ hold:
\[ |\hat{u}(k)| \lesssim S\, \frac{1}{|k|} \quad \text{for all } k \in \mathbb{Z},\ k \ne 0. \tag{6.15} \]


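Lemma 6.8 Let $u(z)$ be a subharmonic function satisfying (6.14). Assume moreover that there is a splitting $u = u_0 + u_1$ with $\|u_0\|_{L^\infty(\mathbb{T})} < \varepsilon_0$ and $\|u_1\|_{L^1(\mathbb{T})} < \varepsilon_1$. Then $u(x)$ has the following BMO bound:
\[ \|u\|_{BMO(\mathbb{T})} \lesssim \varepsilon_0 + (S\, \varepsilon_1)^{1/2}. \tag{6.16} \]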

We make some comments regarding the proofs of these lemmas. These types of results are available for bounded subharmonic functions, see Chap. 4 in Bourgain's monograph [3] or Sect. 1 in [4]. A careful inspection of their proofs shows that the boundedness of the subharmonic function $u(z)$ is not strictly necessary: it is only used to derive estimates on the Riesz mass $\mu(\mathcal{A}_{r/2})$ and on the $L^\infty$-norm of the harmonic part $h$ in the Riesz representation of $u$, that is, to derive the uniform measurement (6.10) on the annulus. Moreover, the resulting constants appearing in the estimates on the Fourier coefficients and on the BMO norm of $u(x)$ depend precisely on this uniform measurement on $u$ and on the bound on the derivative of $h$, that is, on $\|\partial_x h\|_{L^\infty(\mathbb{T})}$. Therefore, the bound in (6.14) can be substituted for the boundedness of $u(z)$, and Lemmas 6.7 and 6.8 are proven along the same lines as their counterparts in [3, 4].

Remark 6.7 Let $u : \mathcal{A}_r \to [-\infty, \infty)$ be a subharmonic function and assume that (6.13) holds, that is,
\[ \sup_{z \in \mathcal{A}_r} u(z) + \|u\|_{L^2(\mathbb{T})} \le C. \tag{6.17} \]
Then from Lemma 6.6 we have $\mu(\mathcal{A}_{r/2}) + \|h\|_{L^\infty(\mathcal{A}_{r/4})} \le C_r\, C$. Using the Poisson integral representation for harmonic functions and scaling, it is easy to see that since $\|h\|_{L^\infty(\mathcal{A}_{r/4})} \le C_r\, C$, then $\|\partial_x h\|_{L^\infty(\mathbb{T})} \lesssim \frac{C_r}{r}\, C$. We conclude that
\[ \mu(\mathcal{A}_{r/2}) + \|h\|_{L^\infty(\mathcal{A}_{r/4})} + \|\partial_x h\|_{L^\infty(\mathbb{T})} \lesssim \frac{C_r}{r}\, C, \]
or in other words, the assumption (6.14) above holds with $S = O\bigl(\frac{C_r}{r}\, C\bigr)$. We also note that since the annulus $\mathcal{A}_r$ will be fixed throughout the chapter, the uniform measurement in (6.14) will depend only on the bound $C$ in (6.17). That is, we may write
\[ \mu(\mathcal{A}_{r/2}) + \|h\|_{L^\infty(\mathcal{A}_{r/4})} + \|\partial_x h\|_{L^\infty(\mathbb{T})} \lesssim C. \]

The following result is an immediate consequence of Lemma 6.8 and it shows that a weak a-priori concentration inequality for a subharmonic function can always be boosted to a stronger estimate. We use the notation $\langle u \rangle$ to denote the space average $\int_{\mathbb{T}} u(x)\,dx$ of the function $u(x)$ on $\mathbb{T}$.


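Lemma 6.9 Let $u(z)$ be a subharmonic function such that the bound (6.14) holds. If $u(x)$ satisfies the weak a-priori estimate
\[ \bigl|\{x \in \mathbb{T} : |u(x) - \langle u \rangle| > \varepsilon_0\}\bigr| < \varepsilon_1 \tag{6.18} \]
with $\varepsilon_1 \le \varepsilon_0^4$, then for some absolute constant $c > 0$,
\[ \bigl|\{x \in \mathbb{T} : |u(x) - \langle u \rangle| > \varepsilon_0^{1/2}\}\bigr| < e^{-c\,\bigl(\varepsilon_0^{1/2} + S\, \varepsilon_1^{1/4}\, \varepsilon_0^{-1/2}\bigr)^{-1}} < e^{-c\, S^{-1} \varepsilon_0^{-1/2}}. \tag{6.19} \]

Proof We may of course assume that $\langle u \rangle = 0$. Denote the set in (6.18) by $\mathcal{B}$, so $|\mathcal{B}| < \varepsilon_1$. Then $u = u_0 + u_1$ with $u_0 := u \cdot 1_{\mathcal{B}^{\complement}}$ and $u_1 := u \cdot 1_{\mathcal{B}}$. Clearly $\|u_0\|_{L^\infty(\mathbb{T})} \le \varepsilon_0$. Since (6.14) holds, we may apply Lemma 6.7, and (6.15) implies $\|u\|_{L^2(\mathbb{T})} \lesssim S$, so
\[ \|u_1\|_{L^1(\mathbb{T})} \le \|u\|_{L^2(\mathbb{T})} \cdot \|1_{\mathcal{B}}\|_{L^2(\mathbb{T})} \lesssim S\, \varepsilon_1^{1/2}. \]
Lemma 6.8 applies, and we have the BMO bound
\[ \|u\|_{BMO(\mathbb{T})} \lesssim \varepsilon_0 + S\, \varepsilon_1^{1/4}. \]
The conclusion follows directly from the John-Nirenberg inequality (see [18]).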
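The exponent bookkeeping in this proof can be checked numerically. The following is a minimal sketch, not part of the original argument: it sets all implicit constants hidden in the symbol ≲ equal to 1 (an assumption), takes the John-Nirenberg threshold to be $\varepsilon_0^{1/2}$, and uses hypothetical function names. It confirms that the ratio threshold/BMO bound equals the exponent written in (6.19), and that under $\varepsilon_1 \le \varepsilon_0^4$ it is at least $\varepsilon_0^{-1/2}/(1+S)$, which is comparable to $S^{-1}\varepsilon_0^{-1/2}$ when $S \ge 1$.

```python
def bmo_bound(eps0: float, eps1: float, S: float) -> float:
    """BMO bound eps0 + S * eps1^(1/4) from the proof (implicit constant set to 1)."""
    return eps0 + S * eps1 ** 0.25


def john_nirenberg_exponent(eps0: float, eps1: float, S: float) -> float:
    """Threshold eps0^(1/2) divided by the BMO bound."""
    return eps0 ** 0.5 / bmo_bound(eps0, eps1, S)


def exponent_as_in_6_19(eps0: float, eps1: float, S: float) -> float:
    """The same quantity written as in (6.19)."""
    return 1.0 / (eps0 ** 0.5 + S * eps1 ** 0.25 * eps0 ** -0.5)


def simplified_exponent(eps0: float, S: float) -> float:
    """Lower bound eps0^(-1/2) / (1 + S), valid when eps1 <= eps0^4 and eps0 <= 1."""
    return eps0 ** -0.5 / (1.0 + S)


if __name__ == "__main__":
    for eps0 in (1e-1, 1e-2, 1e-4):
        for S in (1.0, 10.0):
            eps1 = eps0 ** 4
            print(eps0, S,
                  john_nirenberg_exponent(eps0, eps1, S),
                  simplified_exponent(eps0, S))
```

The smaller $\varepsilon_0$ is, the larger the exponent, which is the sense in which the weak estimate (6.18) is boosted.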



6.2.4 Base LDT Estimates for Pluri-Subharmonic Observables

The Birkhoff ergodic theorem implies that if the translation by $\omega$ is ergodic, then for any observable $\xi \in L^1(\mathbb{T}^d)$, as $n \to \infty$ the Birkhoff average
\[ \frac{1}{n} \sum_{j=0}^{n-1} \xi(x + j\omega) \to \langle \xi \rangle \quad \text{for a.e. } x \in \mathbb{T}^d. \]
Moreover, if $\xi$ is continuous, then the convergence above is uniform in $x$.

In order to establish fiber-LDT estimates, we need a quantitative version of this convergence, one which applies to observables that admit a pluri-subharmonic extension. Moreover, the parameters describing this quantitative convergence should depend explicitly on a certain uniform measurement of such observables. We formulate and prove this quantitative version of the Birkhoff ergodic theorem for pluri-subharmonic functions that are unbounded from below but otherwise satisfy some bounds on average. A similar result for bounded pluri-subharmonic functions was formulated and proven in [16] (see also [3, 4] for results in the same spirit). We begin with some general considerations on pluri-subharmonic functions.


Definition 6.3 Let $\Omega$ be a domain in $\mathbb{C}^d$. A function $u : \Omega \to [-\infty, \infty)$ is called pluri-subharmonic if it is upper semicontinuous and its restriction to any complex line is subharmonic.

It follows from the definition above that the composition of a pluri-subharmonic function with a linear function is pluri-subharmonic as well. Moreover, a pluri-subharmonic function is subharmonic in each variable. If $f(z)$ is holomorphic (i.e. analytic in each variable), then $u(z) = \log|f(z)|$ is pluri-subharmonic. Moreover, if $A(z)$ is a holomorphic matrix-valued function, then $u(z) = \log\|A(z)\|$ is pluri-subharmonic.

We note two important differences between subharmonic and pluri-subharmonic functions. Firstly, the zero set of an analytic function of one variable is discrete, which is of course not the case for analytic functions of several variables. Correspondingly, if $u(z)$ is pluri-subharmonic, then the set $\{z : u(z) = -\infty\}$ can be quite complex, i.e. a variety of co-dimension 1 in $\mathbb{C}^d$. In particular, this set may contain hyperplanes or lines parallel to the Euclidean coordinate axes. Secondly, the Riesz representation Theorem 6.4, which is an important tool in the study of subharmonic functions, is not available for pluri-subharmonic functions.

Crucial to proving a quantitative Birkhoff ergodic theorem for observables on $\mathbb{T}$ with subharmonic extension to $\mathcal{A}_r$ is having the decay (6.15) on its Fourier coefficients from Lemma 6.7 and the boosting of a concentration inequality in Lemma 6.9. Similar results for a pluri-subharmonic function $u(z)$ on $\mathcal{A}_r^d$ can be obtained through a slicing argument, provided we may apply Lemmas 6.7 and 6.9 in each variable, and with the same measurements. More precisely, this type of argument works if the measurement (6.14) applies uniformly for all subharmonic functions
\[ \mathcal{A}_r \ni z_i \mapsto u(z_1, \ldots, z_{i-1}, z_i, z_{i+1}, \ldots, z_d), \]
where $1 \le i \le d$ and $z_1, \ldots, z_{i-1}, z_{i+1}, \ldots, z_d \in \mathcal{A}_r$.

Of course this would be automatic if the pluri-subharmonic function $u(z)$ were bounded. A weaker assumption is for (6.13) to hold uniformly in each variable. In other words, we require that $u(z)$ be bounded from above on $\mathcal{A}_r^d$ and that its $L^2$-norm in each variable be bounded as well. The latter assumption is equivalent to having a bound on $|||u|||$ (see Sect. 6.2.2 for the meaning of this norm). To summarize, the assumption we make in this section on a pluri-subharmonic function $u : \mathcal{A}_r^d \to [-\infty, \infty)$ is that for some constant $C < \infty$ we have
\[ \sup_{z \in \mathcal{A}_r^d} u(z) + |||u||| \le C. \tag{6.20} \]

We now state the analogue of the boosting Lemma 6.9 for several variables. For simplicity we consider $d = 2$ variables, but a similar result, proven the same way, holds for any number $d$ of variables. The meaning of the constant $C_r$ below was given in Sect. 6.2.3 (see Remark 6.6). We remind the reader that $C_r$ depends only on the annulus $\mathcal{A}_r$, hence only on $r$.


Lemma 6.10 Let $u(z)$ be a pluri-subharmonic function on $\mathcal{A}_r^2$ such that the uniform measurement (6.20) holds for some constant $C < \infty$, and let $S := \frac{C_r}{r}\, C$. If $u(x)$ satisfies the weak a-priori estimate
\[ \bigl|\{x \in \mathbb{T}^2 : |u(x) - \langle u \rangle| > \varepsilon_0\}\bigr| < \varepsilon_1 \tag{6.21} \]
with $\varepsilon_1 \le \varepsilon_0^8$, then for some absolute constant $c > 0$,
\[ \bigl|\{x \in \mathbb{T}^2 : |u(x) - \langle u \rangle| > \varepsilon_0^{1/4}\}\bigr| < e^{-c\,\bigl(\varepsilon_0^{1/4} + S\, \varepsilon_1^{1/8}\, \varepsilon_0^{-1/2}\bigr)^{-1}} < e^{-c\, S^{-1} \varepsilon_0^{-1/4}}. \tag{6.22} \]
For d variables, replace the powers 1/4 by 1/2d etc.

A similar result was proven in [3] (see Lemma 4.12) for bounded pluri-subharmonic functions, using a slicing argument and the corresponding one variable result. The reader may verify that the argument in [3] can be employed as long as the one variable result, i.e. Lemma 6.9, can be applied uniformly to the functions $u(\cdot, z_2)$, $u(z_1, \cdot)$, for all $z_1, z_2 \in \mathcal{A}_r$, and provided these functions are also uniformly bounded in, say, $L^2(\mathbb{T})$. These conditions are of course ensured by the assumption (6.20).

We are now ready to formulate the main result of this section, a quantitative Birkhoff ergodic theorem.

Theorem 6.5 Let $u : \mathcal{A}_r^d \to [-\infty, \infty)$ be a pluri-subharmonic function satisfying
\[ \sup_{z \in \mathcal{A}_r^d} u(z) + |||u||| \le C. \tag{6.23} \]
Let $\omega \in DC_t$ and put $n_0 := t^{-2}$. There is $a = a(d) > 0$ so that for all $n \ge n_0$,
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{n} \sum_{j=0}^{n-1} u(x + j\omega) - \langle u \rangle\Bigr| > S\, n^{-a}\Bigr\}\Bigr| < e^{-c\, n^{a}}, \tag{6.24} \]
where $S = O\bigl(\frac{C_r}{r}\, C\bigr)$ and $c = O(1)$.

Proof We prove this statement for $d = 2$ variables, but the same argument holds for any number of variables. We follow the same strategy used in the proof of Proposition 4.1 in [16], as the main ingredients of the argument depend only on having a uniform decay on the Fourier coefficients of $u$ (separately in each variable) and on the boosting Lemma 6.10, both of which are ensured by the assumption (6.23). For the reader's convenience, we present the complete argument here.

The assumption (6.23) implies that the bound (6.17) holds in each variable and with the same constant $C$, i.e. for all subharmonic functions $u(\cdot, z_2)$, $u(z_1, \cdot)$ with $z_1, z_2 \in \mathcal{A}_r$. Hence by Remark 6.7, the bound (6.14) with $S = O\bigl(\frac{C_r}{r}\, C\bigr)$ holds for all these functions as well.


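This ensures that we can apply Lemma 6.7 on the decay of the Fourier coefficients in each variable and obtain
\[ \sup_{x_2 \in \mathbb{T}} |\hat{u}(l_1, x_2)| \le S \cdot \frac{1}{|l_1|} \quad \text{and} \quad \sup_{x_1 \in \mathbb{T}} |\hat{u}(x_1, l_2)| \le S \cdot \frac{1}{|l_2|} \tag{6.25} \]
for all $l = (l_1, l_2) \in \mathbb{Z}^2$ with $l_1 \ne 0$, $l_2 \ne 0$.

Expand $u(x) = u(x_1, x_2)$ into a Fourier series
\[ u(x_1, x_2) = \langle u \rangle + \sum_{\substack{(l_1, l_2) \in \mathbb{Z}^2 \\ (l_1, l_2) \ne (0,0)}} \hat{u}(l_1, l_2) \cdot e((l_1, l_2) \cdot (x_1, x_2)). \]
Then the Birkhoff averages have the form
\begin{align*}
\frac{1}{n} \sum_{j=0}^{n-1} u((x_1, x_2) + j(\omega_1, \omega_2))
&= \langle u \rangle + \sum_{\substack{(l_1, l_2) \in \mathbb{Z}^2 \\ (l_1, l_2) \ne (0,0)}} \hat{u}(l_1, l_2) \cdot e((l_1, l_2) \cdot (x_1, x_2)) \cdot \Bigl(\frac{1}{n} \sum_{j=0}^{n-1} e(j\,(l_1, l_2) \cdot (\omega_1, \omega_2))\Bigr) \\
&= \langle u \rangle + \sum_{\substack{(l_1, l_2) \in \mathbb{Z}^2 \\ (l_1, l_2) \ne (0,0)}} \hat{u}(l_1, l_2) \cdot e((l_1, l_2) \cdot (x_1, x_2)) \cdot K_n((l_1, l_2) \cdot (\omega_1, \omega_2)),
\end{align*}
where we denoted by $K_n(y)$ the Fejér kernel on $\mathbb{T}$:
\[ K_n(y) = \frac{1}{n} \sum_{j=0}^{n-1} e(jy) = \frac{1}{n}\, \frac{1 - e(ny)}{1 - e(y)}. \]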
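As an illustration, the kernel identity and the multiplier form of the Birkhoff averages can be checked numerically for a single Fourier mode in one variable. This is a minimal sketch under stated assumptions: $e(y) = e^{2\pi i y}$, the quantity $\|y\|$ is taken to be the distance from $y$ to the nearest integer, and the golden mean frequency and all function names are illustrative choices that do not come from the text. It also compares $|K_n(y)|$ with the elementary quantity $\min\{1, 1/(n\|y\|)\}$.

```python
import cmath
import math


def e(y: float) -> complex:
    """The character e(y) = exp(2*pi*i*y)."""
    return cmath.exp(2j * math.pi * y)


def dist_to_integers(y: float) -> float:
    """Distance from y to the nearest integer (assumed meaning of ||y||)."""
    return abs(y - round(y))


def kernel_sum(n: int, y: float) -> complex:
    """K_n(y) computed directly as the average of e(j*y), j = 0..n-1."""
    return sum(e(j * y) for j in range(n)) / n


def kernel_closed(n: int, y: float) -> complex:
    """K_n(y) via the geometric-sum formula; requires y not an integer."""
    return (1 - e(n * y)) / (n * (1 - e(y)))


def birkhoff_average(f, x: float, omega: float, n: int) -> complex:
    """Average of f along the orbit x, x + omega, ..., x + (n-1)*omega."""
    return sum(f(x + j * omega) for j in range(n)) / n


OMEGA = (math.sqrt(5) - 1) / 2  # illustrative frequency, not taken from the text

if __name__ == "__main__":
    y = 3 * OMEGA
    for n in (10, 100, 1000):
        bound = min(1.0, 1.0 / (n * dist_to_integers(y)))
        print(n, abs(kernel_sum(n, y)), bound)
```

For the mode $f(x) = e(lx)$ the Birkhoff average is exactly $e(lx)\,K_n(l\omega)$, which is the one-variable instance of the expansion above.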

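Clearly $K_n(y)$ has the following bound:
\[ |K_n(y)| \le \min\Bigl\{1, \frac{1}{n\,\|y\|}\Bigr\}, \]
where $\|y\|$ was defined in (6.1). We then have:
\[ \Bigl\| \frac{1}{n} \sum_{j=0}^{n-1} u((x_1, x_2) + j(\omega_1, \omega_2)) - \langle u \rangle \Bigr\|_{L^2(\mathbb{T}^2)}^2 = \sum_{\substack{(l_1, l_2) \in \mathbb{Z}^2 \\ (l_1, l_2) \ne (0,0)}} |\hat{u}(l_1, l_2)|^2 \cdot |K_n((l_1, l_2) \cdot (\omega_1, \omega_2))|^2 \]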

(6.26)




\[ = \sum_{1 \le |l_1| + |l_2| \cdots} |\hat{u}(l_1, l_2)|^2 \cdot |K_n((l_1, l_2) \cdot (\omega_1, \omega_2))|^2 \ \cdots \]

[…] $0$. There are constants $\delta = \delta(f, r) > 0$, $a = a(d) > 0$, $k_0 = k_0(f, r, d) \in \mathbb{N}$ and $S = S(f, r, C) < \infty$ such that if $u : \mathcal{A}_r^d \to [-\infty, \infty)$ is a pluri-subharmonic function satisfying the bounds
\[ -C + \frac{1}{m} \sum_{j=0}^{m-1} \log|g(z + j\omega)| \le u(z) \le C \quad \text{for all } z \in \mathcal{A}_r^d, \tag{6.28} \]
for some $g \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R})$ with $\|g - f\|_r < \delta$ and for some $m \ge 1$, then
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{n} \sum_{j=0}^{n-1} u(x + j\omega) - \langle u \rangle\Bigr| > S\, n^{-a}\Bigr\}\Bigr| < e^{-n^{a}} \tag{6.29} \]
holds for all $n \ge n_0 := t^{-2}\, k_0$.

Proof By Theorem 6.3, since $f \not\equiv 0$, there is $M \in \mathrm{SL}(d, \mathbb{Z})$ such that in the new coordinates $x' = Mx$, the analytic function $f(x')$ does not vanish identically along any hyperplane, and in fact, $\log|f(x')|$ has the property that its $L^2$-norms separately in each variable are uniformly bounded, or in other words that $|||\log|f(x')||||$ is bounded. Moreover, this holds uniformly in a neighborhood of $f$.

We note that the conclusion (6.29) of the theorem is coordinate agnostic. Indeed, if $M \in \mathrm{SL}(d, \mathbb{Z})$ then $M^{-1}$ preserves the Haar measure on $\mathbb{T}^d$, while if $\omega \in DC_t$ then $\omega' = M^{-1}\omega \in DC_{t'}$ for $t' = t / \|M\|_1^{d+1}$, hence the Diophantine condition is preserved up to a constant. Furthermore, since $M$ is linear, $u \circ M(z)$ is also pluri-subharmonic, and its domain contains $\mathcal{A}_{r'}^d$, where $r' \sim r / \|M\|$. That is because the linear map induced by $M$ expands the imaginary direction, hence the width of the domain of $u \circ M(z)$ is proportionally smaller. However, $\|M\|$ is a fixed constant depending upon $M$, hence only upon $f$.

Hence it is enough to prove (6.29) for $u(x)$ replaced by $u \circ M(x)$ and for $\omega$ replaced by $M^{-1}\omega$. Moreover, the assumption (6.28) holds for $u \circ M(x)$ provided we replace $g(x)$ by $g \circ M(x)$, which is still $\delta$-close to $f \circ M(x)$. Therefore, with no loss of generality, we may assume that for $\delta = \delta(f, r)$ small enough, and for some finite constant $C_0 = C_0(f, r)$ we have:
\[ \text{if } g \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R}) \text{ with } \|g - f\|_r < \delta, \text{ then } |||\log|g|\,||| \le C_0. \tag{6.30} \]


From (6.28) we have:
\[ |u(x)| \le 2C + \Bigl|\frac{1}{m} \sum_{j=0}^{m-1} \log|g(x + j\omega)|\Bigr| \quad \text{for all } x \in \mathbb{T}^d. \]
Then using Lemma 6.3, $|||u||| \le 2C + |||\log|g|\,||| \le 2C + C_0$, hence
\[ \sup_{z \in \mathcal{A}_r^d} u(z) + |||u||| \le 3C + C_0. \]
This shows that the assumption (6.23) in Theorem 6.5 is now (after a change of coordinates) satisfied. We apply Theorem 6.5 to conclude that (6.29) holds with $S = S(f, r, C) = O\bigl(\frac{C_r}{r}\,(3C + C_0)\bigr)$ and for all $n \ge n_0$, where $n_0 := t^{-2}\, k_0$. We choose $k_0 = k_0(f, r, d)$ so that $k_0 \ge \|M\|_1^{2(d+1)}$ and so that the constant $c = O(1)$ in (6.24) is absorbed.

6.3 The Proof of the Fiber LDT Estimate

Given any cocycle $A \in \mathcal{C}_m$, we derive some uniform measurements on $A$ which will allow us to apply the base LDT estimate in Proposition 6.2 to all the iterates of any cocycle $B$ near $A$, in a uniform way. This, combined with an almost invariance under the base dynamics property for the iterates of the cocycle, will lead to the proof of the uniform fiber LDT estimate.

6.3.1 Uniform Measurements on the Cocycle

We introduce some notations. For a cocycle $A \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$, let $f_A(z) := \det[A(z)]$. Then clearly $f_A \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R})$. Moreover, for every scale $n \ge 1$, let
\[ u_A^{(n)}(z) := \frac{1}{n} \log\|A^{(n)}(z)\|. \]


Note that due to the analyticity of the cocycle $A(z)$, the functions $u_A^{(n)}(z)$ are pluri-subharmonic on $\mathcal{A}_r^d$. This property is crucial in establishing the fiber LDT estimate. We denote the space averages of these functions by
\[ L_1^{(n)}(A) = \int_{\mathbb{T}^d} u_A^{(n)}(x)\, dx = \int_{\mathbb{T}^d} \frac{1}{n} \log\|A^{(n)}(x)\|\, dx. \]

The following proposition introduces some locally uniform measurements on a cocycle. It shows that the above functions are bounded in the $L^2$-norm, uniformly in the scale, and uniformly in a neighborhood of a given non identically singular cocycle. It also shows that the failure of the above functions to be bounded in the $L^\infty$-norm is captured by Birkhoff averages of a one dimensional cocycle.

Proposition 6.3 Given a cocycle $A \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $f_A = \det[A] \not\equiv 0$, there are constants $\delta = \delta(A) > 0$ and $C = C(A) < \infty$, such that for any cocycle $B \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$, if $\|B - A\|_r < \delta$, then
\[ f_B = \det[B] \not\equiv 0, \tag{6.31} \]
\[ \bigl\|\log|f_B|\bigr\|_{L^2(\mathbb{T}^d)} \le C, \tag{6.32} \]
and for all $n \ge 1$ we have
\[ -C + \frac{1}{n} \sum_{i=0}^{n-1} \log|f_B(T^i z)| \le u_B^{(n)}(z) \le C, \tag{6.33} \]
\[ \|u_B^{(n)}\|_{L^2(\mathbb{T}^d)} \le C. \tag{6.34} \]

Proof Clearly the map $C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R})) \ni B \mapsto f_B = \det[B] \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R})$ is locally Lipschitz, hence we can choose $\delta = \delta(A) > 0$ small enough such that if $\|B - A\|_r < \delta$, then the analytic function $g := f_B$ satisfies the Łojasiewicz inequality (6.3) with the same constants $S = S(f_A)$, $b = b(f_A)$ as $f := f_A$ (see Lemma 6.1). In particular, $f_B \not\equiv 0$ and from Remark 6.2 we have the uniform $L^2$-bound:
\[ \bigl\|\log|f_B|\bigr\|_{L^2(\mathbb{T}^d)} \le C(\|f_A\|_\infty, S, b) \le C_2(A). \]
The upper bound in (6.33) is clear: if $\|B - A\|_r < \delta$, then $\|B\|_r \sim \|A\|_r$, and since for every $z \in \mathcal{A}_r^d$ we have
\[ \|B^{(n)}(z)\| \le \prod_{i=0}^{n-1} \|B(T^i z)\| \le \|B\|_r^{n}, \]
we conclude that
\[ u_B^{(n)}(z) = \frac{1}{n} \log\|B^{(n)}(z)\| \le \log\|B\|_r < C_1(A). \]


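To establish the lower bound, we use Cramer's rule: $\det[B(z)] \cdot I = B(z) \cdot \mathrm{adj}(B(z))$. Hence
\begin{align*}
\prod_{i=0}^{n-1} f_B(T^i z) \cdot I &= \prod_{i=0}^{n-1} \det[B(T^i z)] \cdot I \\
&= B(T^{n-1} z) \cdots B(z) \cdot \mathrm{adj}(B(z)) \cdots \mathrm{adj}(B(T^{n-1} z)) \\
&= B^{(n)}(z) \cdot \mathrm{adj}(B(z)) \cdots \mathrm{adj}(B(T^{n-1} z)).
\end{align*}
Clearly
\[ \|\mathrm{adj}(B(z))\| \lesssim \|B(z)\|^{m-1} \le \|B\|_r^{m-1} \quad \text{for all } z. \]
This implies, for some $C_1 = C_1(A) \sim \bigl|\log\|A\|_r\bigr|$,
\[ u_B^{(n)}(z) = \frac{1}{n} \log\|B^{(n)}(z)\| \ge -C_1 + \frac{1}{n} \sum_{i=0}^{n-1} \log|f_B(T^i z)|, \]
which establishes (6.33). Moreover, (6.33) also implies that for all $x \in \mathbb{T}^d$,
\[ \bigl|u_B^{(n)}(x)\bigr| \le 2C_1 + \Bigl|\frac{1}{n} \sum_{i=0}^{n-1} \log|f_B(T^i x)|\Bigr|. \]
Hence by (6.32) and the measure invariance of the translation $T$,
\[ \|u_B^{(n)}\|_{L^2(\mathbb{T}^d)} \le 2C_1 + \bigl\|\log|f_B|\bigr\|_{L^2(\mathbb{T}^d)} \lesssim C_1 + C_2. \]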
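The product identity obtained from Cramer's rule can be verified exactly on small integer matrices. The sketch below is illustrative only: the matrices stand in for the values $B(z), B(Tz), \ldots, B(T^{n-1}z)$ along one orbit, the helper names are hypothetical, and the determinant is computed by cofactor expansion, which is adequate only for small sizes. Note that the identity does not require invertibility, which is the point for cocycles with singularities.

```python
def minor(M, i, j):
    """Matrix M with row i and column j removed."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]


def det(M):
    """Determinant by cofactor expansion along the first row (exact for ints)."""
    if not M:
        return 1  # determinant of the empty matrix
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))


def adj(M):
    """Adjugate: transpose of the cofactor matrix, so that M * adj(M) = det(M) * I."""
    m = len(M)
    return [[(-1) ** (i + j) * det(minor(M, j, i)) for j in range(m)] for i in range(m)]


def matmul(A, B):
    m = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(m)) for j in range(m)] for i in range(m)]


def identity(m):
    return [[1 if i == j else 0 for j in range(m)] for i in range(m)]


def cocycle_product(mats):
    """B^(n) = B_{n-1} ... B_1 B_0 for mats = [B_0, ..., B_{n-1}]."""
    P = identity(len(mats[0]))
    for B in mats:
        P = matmul(B, P)
    return P


def adjugate_product(mats):
    """adj(B_0) adj(B_1) ... adj(B_{n-1})."""
    P = identity(len(mats[0]))
    for B in mats:
        P = matmul(P, adj(B))
    return P


# Illustrative values along an orbit; the last list also contains a singular matrix.
mats = [[[1, 2], [3, 4]], [[0, 1], [1, 1]], [[2, 0], [1, 3]]]
singular_mats = mats + [[[1, 2], [2, 4]]]

if __name__ == "__main__":
    lhs = matmul(cocycle_product(mats), adjugate_product(mats))
    print(lhs)
```

Taking norms in this identity and using the bound on the adjugates is what yields the lower bound in (6.33).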



6.3.2 The Nearly Almost Invariance Property

For $\mathrm{GL}(m, \mathbb{R})$-valued cocycles, the functions $u_A^{(n)}(x)$ are almost invariant under the base dynamics $T$, in the sense that for all $x \in \mathbb{T}^d$,
\[ \bigl|u_A^{(n)}(x) - u_A^{(n)}(Tx)\bigr| \le C\, \frac{1}{n}. \]
For a cocycle that has singularities but is not identically singular, we establish this property off of an exponentially small set of phases. It is in fact crucial to obtain a bound on the measure of this exceptional set of phases, uniform in the cocycle.


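Proposition 6.4 Let $A \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ be such that $\det[A(x)] \not\equiv 0$. Then there are constants $\delta = \delta(A) > 0$ and $C = C(A) < \infty$, such that for any $a \in (0, 1)$, if $B \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $\|B - A\|_r < \delta$, then
\[ \bigl|u_B^{(n)}(x) - u_B^{(n)}(Tx)\bigr| \le C\, \frac{1}{n^{a}} \tag{6.35} \]
holds for all $n \ge 1$ and for all $x \notin \mathcal{B}_n$, where $|\mathcal{B}_n| < e^{-n^{1-a}}$. The exceptional set $\mathcal{B}_n$ may depend on the cocycle $B$, but its measure does not.

Proof For any cocycle $B \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$, let $f_B(x) := \det[B(x)] \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R})$. Then if $\|B - A\|_r < \delta$ we have $\|f_B - f_A\|_r < C\delta$, where $C = C(A) > 0$. Using Lemma 6.1, there are constants $\delta, C, b > 0$, all depending only on $A$, such that if $\|B - A\|_r < \delta$ then
\[ \bigl|\{x \in \mathbb{T}^d : |f_B(x)| < t\}\bigr| < C\, t^{b} \quad \text{for all } t > 0. \tag{6.36} \]
In particular (or by Fubini), the set $Z_0(B) := \{x \in \mathbb{T}^d : f_B(x) = 0\}$ has zero measure, and so does $\cup_{n \ge 0} T^{-n} Z_0(B) =: Z(B)$. Hence if $x \notin Z(B)$, then $B(T^n x)$ is invertible for all $n \ge 0$ and we can write:
\begin{align*}
\frac{1}{n} \log\|B^{(n)}(x)\| - \frac{1}{n} \log\|B^{(n)}(Tx)\|
&= \frac{1}{n} \log \frac{\|B(T^n x)^{-1} \cdot [B(T^n x) \cdot B(T^{n-1} x) \cdots B(Tx)] \cdot B(x)\|}{\|B(T^n x) \cdot B(T^{n-1} x) \cdots B(Tx)\|} \\
&\le \frac{1}{n} \log\bigl[\|B(T^n x)^{-1}\| \cdot \|B(x)\|\bigr] \le \frac{C}{n} + \frac{1}{n} \log\|B(T^n x)^{-1}\|.
\end{align*}
The last inequality follows from $\|B(x)\| \le \|B - A\|_r + \|A\|_r < \delta + \|A\|_r \sim \|A\|_r$, thus $\log\|B(x)\| \le C(A)$ for all $x \in \mathbb{T}^d$. Similarly,
\begin{align*}
\frac{1}{n} \log\|B^{(n)}(Tx)\| - \frac{1}{n} \log\|B^{(n)}(x)\|
&= \frac{1}{n} \log \frac{\|B(T^n x) \cdot [B(T^{n-1} x) \cdots B(Tx) \cdot B(x)] \cdot B(x)^{-1}\|}{\|B(T^{n-1} x) \cdots B(x)\|} \\
&\le \frac{1}{n} \log\bigl[\|B(T^n x)\| \cdot \|B(x)^{-1}\|\bigr] \le \frac{C}{n} + \frac{1}{n} \log\|B(x)^{-1}\|.
\end{align*}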


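We conclude that if $x \notin Z(B)$, then
\[ \Bigl|\frac{1}{n} \log\|B^{(n)}(x)\| - \frac{1}{n} \log\|B^{(n)}(Tx)\|\Bigr| < \frac{C}{n} + \frac{1}{n} \log\|B(T^n x)^{-1}\| + \frac{1}{n} \log\|B(x)^{-1}\|. \tag{6.37} \]
Therefore, in order to prove (6.35), we need to obtain a (uniform in $B$) upper bound for $\|B(x)^{-1}\|$, where $x$ is outside an exponentially small set. Upper bounds on the norm of the inverse of a matrix are obtained from lower bounds on the determinant via Cramer's rule. Indeed, if $x \notin Z(B)$ then
\[ \|B(x)^{-1}\| = \|\mathrm{adj}(B(x))\| \cdot \frac{1}{|\det[B(x)]|} \le C_1 \cdot \frac{1}{|f_B(x)|}, \]
which holds because $\|\mathrm{adj}(B(x))\| \lesssim \|B(x)\|^{m-1} \le \|B\|_r^{m-1} \le C_1(A)$.

Fix $a \in (0, 1)$ and apply (6.36) with
\[ t := \Bigl(\frac{1}{C}\Bigr)^{1/b} e^{-(1/b)\, n^{1-a}}. \]
Then there is a set $\mathcal{B}_n \subset \mathbb{T}^d$ with $|\mathcal{B}_n| < C\, t^{b} = e^{-n^{1-a}}$, such that if $x \notin \mathcal{B}_n$ then $|f_B(x)| \ge t$. Thus
\[ \|B(x)^{-1}\| \le C_1\, \frac{1}{t} = C_1\, C^{1/b}\, e^{(1/b)\, n^{1-a}} = C_2(A)\, e^{(1/b)\, n^{1-a}}, \]
hence
\[ \frac{1}{n} \log\|B(x)^{-1}\| \le \frac{\log C_2}{n} + \frac{1}{b}\, \frac{n^{1-a}}{n} \le C_3(A)\, \frac{1}{n^{a}}, \]
which, combined with (6.37), proves the proposition.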
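The choice of the threshold $t$ in this proof is a short computation that can be checked numerically. The sketch below uses hypothetical sample values for the constants $C$, $C_1$, $b$ and hypothetical function names; it only verifies the arithmetic, namely that $C t^b = e^{-n^{1-a}}$ and that $\frac{1}{n}\log(C_1/t)$ is at most $C_3 / n^a$ with the illustrative choice $C_3 = \max(\log C_2, 0) + 1/b$.

```python
import math


def threshold(n: int, a: float, b: float, C: float) -> float:
    """t = (1/C)^(1/b) * exp(-(1/b) * n^(1-a)), as chosen in the proof."""
    return (1.0 / C) ** (1.0 / b) * math.exp(-(n ** (1.0 - a)) / b)


def measure_bound(n: int, a: float, b: float, C: float) -> float:
    """C * t^b, the bound from (6.36) on the measure of the exceptional set."""
    return C * threshold(n, a, b, C) ** b


def target_measure(n: int, a: float) -> float:
    """exp(-n^(1-a)), the measure bound claimed in Proposition 6.4."""
    return math.exp(-(n ** (1.0 - a)))


def inverse_norm_term(n: int, a: float, b: float, C: float, C1: float) -> float:
    """(1/n) * log(C1 / t): bound on (1/n) log ||B(x)^{-1}|| off the exceptional set."""
    return math.log(C1 / threshold(n, a, b, C)) / n


def c3(b: float, C: float, C1: float) -> float:
    """Illustrative constant C3 = max(log C2, 0) + 1/b, with C2 = C1 * C^(1/b)."""
    C2 = C1 * C ** (1.0 / b)
    return max(math.log(C2), 0.0) + 1.0 / b


if __name__ == "__main__":
    a, b, C, C1 = 0.5, 0.5, 3.0, 5.0  # sample values, not taken from the text
    for n in (1, 10, 100, 1000):
        print(n, measure_bound(n, a, b, C), inverse_norm_term(n, a, b, C, C1))
```

The trade-off is visible in the parameter $a$: a larger $a$ gives a better invariance bound $n^{-a}$ but a larger exceptional set $e^{-n^{1-a}}$.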



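6.3.3 The Statement and Proof of the LDT

Theorem 6.6 Given $A \in C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R}))$ with $\det[A(x)] \not\equiv 0$ and $\omega \in DC_t$, there are constants $\delta = \delta(A) > 0$, $k_0 = k_0(A) \in \mathbb{N}$, $C = C(A, r) < \infty$, $a = a(d) > 0$ and $b = b(d) > 0$ such that if $\|B - A\|_r \le \delta$ and $n \ge n_0 := t^{-2}\, k_0$, then
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{n} \log\|B^{(n)}(x)\| - L_1^{(n)}(B)\Bigr| > C\, n^{-a}\Bigr\}\Bigr| < e^{-n^{b}}. \tag{6.38} \]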


Proof Using Proposition 6.3, there are constants $\delta = \delta(A)$, $C = C(A)$, such that if $B$ is a cocycle near $A$, that is, $\|B - A\|_r < \delta$, then for all scales $n \ge 1$
\[ -C + \frac{1}{n} \sum_{i=0}^{n-1} \log|f_B(T^i z)| \le u_B^{(n)}(z) \le C. \]
We apply Proposition 6.2 with $f = f_A$, $C = C(A)$, so the dependence of the constants on the data will be: $\delta = \delta(f_A, r) = \delta(A)$, $a = a(d)$, $k_0 = k_0(f_A, r, d) = k_0(A)$ and $S = S(f_A, r, C) = S(A)$.

Since the map $C^{\omega}_r(\mathbb{T}^d, \mathrm{Mat}(m, \mathbb{R})) \ni B \mapsto f_B \in C^{\omega}_r(\mathbb{T}^d, \mathbb{R})$ is locally Lipschitz, by possibly decreasing $\delta$, we may assume that whenever $\|B - A\|_r < \delta$ we have that $\|f_B - f_A\|_r$ is small enough that the pluri-subharmonic function $u(z) = u_B^{(n)}(z)$ satisfies the assumption (6.28) with $g = f_B$. Hence Proposition 6.2 applied to our context says that for all $R \ge n_0$ we have:
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{R} \sum_{j=0}^{R-1} u_B^{(n)}(x + j\omega) - \langle u_B^{(n)} \rangle\Bigr| > S\, R^{-a}\Bigr\}\Bigr| < e^{-c\, R^{a}}. \tag{6.39} \]
From the nearly almost invariance property given by Proposition 6.4, after possibly decreasing $\delta$, and for a constant $C' = C'(A) < \infty$, we have that if $\|B - A\|_r < \delta$, then
\[ \bigl|u_B^{(n)}(x) - u_B^{(n)}(Tx)\bigr| \le C'\, \frac{1}{n^{a}} \]
for all $n \ge 1$ and for all $x \notin \mathcal{B}_n$, where $|\mathcal{B}_n| < e^{-n^{1-a}}$.

Let $\bar{\mathcal{B}}_n := \cup_{i=0}^{R-1} T^{-i} \mathcal{B}_n$. Hence $|\bar{\mathcal{B}}_n| \le R\, e^{-n^{1-a}}$ and if $x \notin \bar{\mathcal{B}}_n$ then
\[ \Bigl|u_B^{(n)}(x) - \frac{1}{R} \sum_{j=0}^{R-1} u_B^{(n)}(x + j\omega)\Bigr| \le C'\, \frac{R}{n^{a}}. \tag{6.40} \]
Pick $R \ll n^{a}$, say $R = \lfloor n^{a/(a+1)} \rfloor$, and let $C = 2(C' + S)$. The conclusion (6.38) of the theorem then follows from (6.39) and (6.40) for some easily computable choice of the new parameter $a$ and the parameter $b$.

Remark 6.8 What determines all constants in the LDT estimate above are precisely some measurements on the function $f_A(x) := \det[A(x)]$ and the parameter $t$ of the Diophantine condition $DC_t$ on the frequency $\omega$. We also note that unlike in the case of random cocycles, the fiber-LDT estimate above was proven without assuming the existence of a gap between the first two Lyapunov exponents. In particular, using exterior powers, we can derive an LDT estimate for every Lyapunov exponent.
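The "easily computable" choice of the new exponent at the end of the proof of Theorem 6.6 can be sketched as follows. This is an illustration under assumptions, not the authors' computation: it takes $R$ to be the integer part of $n^{a/(a+1)}$, restricts to $0 < a \le 1$, and uses hypothetical names. Combining (6.39) and (6.40), off the exceptional sets the deviation is at most $C' R / n^{a} + S R^{-a}$, and both terms are of order $n^{-a^2/(a+1)}$, which suggests the new exponent $a^2/(a+1)$ and explains the factor 2 in the constant $2(C' + S)$.

```python
import math


def choose_R(n: int, a: float) -> int:
    """Integer part of n^(a/(a+1)), never smaller than 1."""
    return max(1, math.floor(n ** (a / (a + 1.0))))


def deviation(n: int, a: float, C_prime: float, S: float) -> float:
    """C' * R / n^a + S * R^(-a): the sum of the bounds in (6.40) and (6.39)."""
    R = choose_R(n, a)
    return C_prime * R / n ** a + S * R ** (-a)


def new_exponent(a: float) -> float:
    """Exponent a^2 / (a + 1) obtained by balancing the two terms."""
    return a * a / (a + 1.0)


def claimed_bound(n: int, a: float, C_prime: float, S: float) -> float:
    """2 * (C' + S) * n^(-a^2/(a+1))."""
    return 2.0 * (C_prime + S) * n ** (-new_exponent(a))


if __name__ == "__main__":
    C_prime, S = 1.0, 4.0  # sample constants, not taken from the text
    for a in (0.25, 0.5, 1.0):
        for n in (10, 10 ** 3, 10 ** 6):
            print(a, n, deviation(n, a, C_prime, S), claimed_bound(n, a, C_prime, S))
```

The measure side is handled separately: the excluded sets have measure at most $e^{-c R^{a}} + R\, e^{-n^{1-a}}$, which determines the parameter $b$; the requirement $R \ge n_0$ is not modelled in this sketch.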


6.4 Deriving Continuity of the Lyapunov Exponents

To prove the continuity of the LE, we use the abstract criterion given by Theorem 3.1, which was formulated and proven in Chap. 3. Section 3.1 contains all the relevant definitions and the precise formulation of this criterion. Under the same assumptions, in Chap. 4 we obtained abstract criteria for the continuity of the Oseledets filtration (Theorem 4.7) and of the Oseledets decomposition (Theorem 4.8). For the reader's convenience, we briefly review the relevant definitions. We then explain how Theorems 3.1, 4.7 and 4.8 are applicable to the context of this chapter.

Proof (of Theorem 6.1) The torus $\mathbb{T}^d$ together with the $\sigma$-algebra of Borel sets, the Haar measure and the translation by a rationally independent vector $\omega$ form an ergodic MPDS. Let $\mathcal{C} = \bigcup_{m \ge 1} \mathcal{C}_m$ be the space of analytic, not identically singular cocycles over this ergodic system defined in Sect. 6.1.1.

We say that a cocycle $A$ is uniformly $L^2$-bounded if there are constants $C = C(A) < \infty$ and $\delta = \delta(A) > 0$ such that
\[ \Bigl\|\frac{1}{n} \log\|B^{(n)}\|\Bigr\|_{L^2(\mathbb{T}^d)} < C \]
for all cocycles $B$ that are close enough to $A$ (in the given topology on the space of cocycles) and for all scales $n \ge 1$. Estimate (6.34) in Proposition 6.3 shows that all cocycles in $\mathcal{C}$ are uniformly $L^2$-bounded.

Given a cocycle $A \in \mathcal{C}_m$ and $N \in \mathbb{N}$, note that the sets $\{x \in \mathbb{T}^d : \|A^{(n)}(x)\| \le c\}$ or $\{x \in \mathbb{T}^d : \|A^{(n)}(x)\| \ge c\}$ for some $1 \le n \le N$ and $c > 0$ are closed, so the lattice $\mathcal{F}_N(A)$ generated by them consists only of closed sets.

We say that a set $\Xi$ of observables and a cocycle $A \in \mathcal{C}$ are compatible, if for any $N \in \mathbb{N}$, $F \in \mathcal{F}_N(A)$, $\varepsilon > 0$, there is $\xi \in \Xi$ such that $1_F \le \xi$ and $\int_{\mathbb{T}^d} \xi\, dx \le |F| + \varepsilon$.

Let $\Xi := C^0(\mathbb{T}^d)$ be the set of all continuous observables $\xi : \mathbb{T}^d \to \mathbb{R}$. By the regularity of the Borel measure, there is an open set $U \supseteq F$ such that $|U| \le |F| + \varepsilon$. By Urysohn's lemma, there is a continuous function $\xi \in \Xi$ such that $0 \le \xi \le 1$, $\xi \equiv 1$ on $F$ and $\xi \equiv 0$ on $U^{\complement}$. Then
\[ 1_F \le \xi \quad \text{and} \quad \int_{\mathbb{T}^d} \xi\, dx \le |U| \le |F| + \varepsilon, \]
which shows that $\Xi$ is compatible with every cocycle in $\mathcal{C}$, a property we call the compatibility condition.

We call deviation size function any non-increasing map $\underline{\varepsilon} : (0, \infty) \to (0, \infty)$, and deviation measure function any sufficiently fast decreasing function $\underline{\iota} : (0, \infty) \to (0, \infty)$. A triplet $(n_0, \underline{\varepsilon}, \underline{\iota})$, where $n_0 \in \mathbb{N}$ and $\underline{\varepsilon}, \underline{\iota}$ are deviation size and measure functions, is called an LDT parameter, while any set $\mathcal{P}$ containing such triplets is called a set of LDT parameters.


We say that an observable $\xi \in \Xi$ satisfies a base-LDT estimate with parameter space $\mathcal{P}$, if for every $\varepsilon > 0$ there is an LDT parameter $(n_0, \underline{\varepsilon}, \underline{\iota}) \in \mathcal{P}$ such that for all $n \ge n_0$ we have $\underline{\varepsilon}(n) \le \varepsilon$ and
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{n} \sum_{j=0}^{n-1} \xi(T^j x) - \int_{\mathbb{T}^d} \xi\, dx\Bigr| > \underline{\varepsilon}(n)\Bigr\}\Bigr| < \underline{\iota}(n). \]
A torus translation by a rationally independent vector $\omega$ is uniquely ergodic, hence the convergence in Birkhoff's ergodic theorem is uniform for continuous observables. This shows that the base-LDT estimate holds trivially for $\xi \in \Xi$, with $\underline{\varepsilon} \equiv \varepsilon$ and $\underline{\iota} \equiv 0$.

Finally, a cocycle $A \in \mathcal{C}_m$ is said to satisfy a uniform fiber-LDT with parameter space $\mathcal{P}$, if for every $\varepsilon > 0$ there are $\delta > 0$ and an LDT parameter $(n_0, \underline{\varepsilon}, \underline{\iota}) \in \mathcal{P}$ which may only depend upon $A$ and $\varepsilon$, such that for any $B \in \mathcal{C}_m$ with $\mathrm{dist}(B, A) < \delta$ and for all $n \ge n_0$, we have $\underline{\varepsilon}(n) \le \varepsilon$ and
\[ \Bigl|\Bigl\{x \in \mathbb{T}^d : \Bigl|\frac{1}{n} \log\|B^{(n)}(x)\| - L_1^{(n)}(B)\Bigr| > \underline{\varepsilon}(n)\Bigr\}\Bigr| < \underline{\iota}(n). \]
Theorem 6.6 shows that a uniform fiber-LDT estimate holds for all cocycles in $\mathcal{C}$, with the parameter space $\mathcal{P}$ being the set of all triplets $(n_0, \underline{\varepsilon}, \underline{\iota})$ with $n_0 \in \mathbb{N}$, $\underline{\varepsilon}(t) \equiv C\, t^{-a}$ and $\underline{\iota}(t) \equiv e^{-t^{b}}$, where the constants $n_0, a, b$ are explicitly described in terms of some measurements of $A$ and the Diophantine condition on $\omega$.

Theorem 3.1 in Chap. 3 says that given an ergodic system, a space of cocycles, a set of observables and a set of LDT parameters, if the compatibility condition, the uniform $L^2$-boundedness, the base-LDT and the uniform fiber-LDT estimates hold, then all Lyapunov exponents are continuous. Moreover, if $A \in \mathcal{C}_m$ has a Lyapunov spectrum gap, i.e. for some $1 \le k \le m$, $L_k(A) > L_{k+1}(A)$, then locally near $A$, the map $B \mapsto \Lambda_k(B) = (L_1 + \cdots + L_k)(B)$ has a modulus of continuity $\omega(h) := [\underline{\iota}(c \log \frac{1}{h})]^{1/3}$, for some $c > 0$ and some deviation measure function $\underline{\iota}$ corresponding to an LDT parameter in $\mathcal{P}$.

Given that the deviation measure functions obtained in this chapter have the form $\underline{\iota}(t) \equiv e^{-t^{b}}$, the modulus of continuity is $\omega(h) = e^{-c\,[\log(1/h)]^{b}}$, i.e. we obtain weak-Hölder continuity.
The statements on the Oseledets filtration and decomposition follow directly from the corresponding general criteria. 

6.5 Refinements in the One-Variable Case

Let us now consider the case of a one-variable torus translation by a frequency $\omega$ that satisfies a stronger Diophantine condition, namely:


\[ \|k\omega\| \ge \frac{t}{|k|\, (\log|k|)^{1+}} \tag{6.41} \]
for some $t > 0$ and for all $k \in \mathbb{Z} \setminus \{0\}$.

Using the already established fiber-LDT estimate in Theorem 6.6 and the corresponding continuity result for Lyapunov exponents, we derive a sharper fiber-LDT estimate under the additional assumption that the cocycle $A$ has the property $L_1(A) > L_2(A)$. This in turn leads to a stronger modulus of continuity of the LE, the Oseledets filtration and the Oseledets decomposition.

We note that our argument for proving this sharper fiber-LDT does require the gap condition $L_1(A) > L_2(A)$, since it depends essentially on the Avalanche Principle proven in Chap. 2. In fact, the argument requires the full strength of the continuity results in Chap. 3, specifically the finite scale uniform estimates in Lemma 3.4.

We remind the reader of the particular estimate in the general AP which we use here (see Chap. 2, specifically Proposition 2.42 in Sect. 2.4). We recall the notation $\mathrm{gr}(g) = \frac{s_1(g)}{s_2(g)} \in [1, \infty]$ for the ratio of the first two singular values of a matrix $g \in \mathrm{Mat}(m, \mathbb{R})$.

Proposition 6.5 There exists $c > 0$ such that given $0 < \varepsilon < 1$, $0 < \kappa \le c\, \varepsilon^2$ and $g_0, g_1, \ldots, g_{n-1} \in \mathrm{Mat}(m, \mathbb{R})$, if
\[ \mathrm{gr}(g_i) > \frac{1}{\kappa} \quad \text{for all } 0 \le i \le n-1, \]
\[ \frac{\|g_i\, g_{i-1}\|}{\|g_i\|\, \|g_{i-1}\|} > \varepsilon \quad \text{for all } 1 \le i \le n-1, \]
then
\[ \Bigl|\log\|g^{(n)}\| + \sum_{i=1}^{n-2} \log\|g_i\| - \sum_{i=1}^{n-1} \log\|g_i\, g_{i-1}\|\Bigr| \lesssim n \cdot \frac{\kappa}{\varepsilon^2}. \]

The argument we are about to present works only for one-variable translations because it requires the following sharp version of the quantitative Birkhoff ergodic Theorem 6.5, and this sharp version is only available in the one-variable setting.

Proposition 6.6 Let $\omega \in \mathbb{T}$ satisfy the strong Diophantine condition (6.41) for some $t > 0$ and let $u : \mathcal{A}_r \to [-\infty, \infty)$ be a subharmonic function satisfying the bound
\[ \sup_{z \in \mathcal{A}_r} u(z) + \|u\|_{L^2(\mathbb{T})} \le C. \]
There are constants $c_1, c_2 > 0$ that depend only on $C$ and $r$ and there is $n_0 \in \mathbb{N}$ that depends on $t$ such that for all $\varepsilon > 0$ and $n \ge n_0$ we have:
\[ \Bigl|\Bigl\{x \in \mathbb{T} : \Bigl|\frac{1}{n} \sum_{j=0}^{n-1} u(x + j\omega) - \langle u \rangle\Bigr| > \varepsilon\Bigr\}\Bigr| < e^{-c_1\, \varepsilon\, n + c_2\, (\log n)^4}. \]


This result was proven in [9] (see Theorem 3.8) for bounded subharmonic functions. It remains valid in our setting, by the same argument following Lemma 6.8. We can now phrase the sharper fiber-LDT in the one-variable case.

Theorem 6.7 Let $A \in C^{\omega}_r(\mathbb{T}, \mathrm{Mat}_m(\mathbb{R}))$ with $\det[A(x)] \not\equiv 0$, and let $\omega \in \mathbb{T}$ be a frequency satisfying the strong Diophantine condition (6.41). Assume that $L_1(A) > L_2(A)$ and let $\varepsilon > 0$. There are $\delta = \delta(A) > 0$, $p = p(A) < \infty$, $\bar{n}_1 = \bar{n}_1(A, \omega, \varepsilon) \in \mathbb{N}$ such that for all $B \in C^{\omega}_r(\mathbb{T}, \mathrm{Mat}_m(\mathbb{R}))$ with $\|B - A\|_r < \delta$ and for all $n \ge \bar{n}_1$ we have:
\[ \Bigl|\Bigl\{x \in \mathbb{T} : \Bigl|\frac{1}{n} \log\|B^{(n)}(x)\| - L_1^{(n)}(B)\Bigr| > \varepsilon\Bigr\}\Bigr| < e^{-\varepsilon\, n / (\log n)^{p}}. \]

Before starting the proof of this sharper fiber-LDT, we note that it cannot be obtained along the same lines as Theorem 6.6, by using the sharper estimate in Proposition 6.6, precisely because the nearly almost invariance property (Proposition 6.4) is too weak, as a consequence of the singularities of the cocycle.

Proof We first explain the mechanics of the proof, then we detail the argument, which bears some similarities with an argument used in [10].

We already have a version of the fiber-LDT estimate in Theorem 6.6, where the deviation functions $\underline{\varepsilon}(t) \equiv C\, t^{-a}$, $\underline{\iota}(t) \equiv e^{-t^{b}}$ are both relatively coarse. Using this LDT at a scale $n_0$ and the avalanche principle in Proposition 6.5, we derive an LDT estimate at a larger scale $n_1 \asymp e^{n_0^{b_1}}$ (where $0 < b_1 < b$), which has a much sharper deviation size function $\underline{\varepsilon}(t)$, but a very coarse deviation measure function $\underline{\iota}(t)$. This will lead, via Lemma 6.8, to BMO estimates, which with the help of John-Nirenberg's inequality will prove the desired stronger LDT estimate.

Let $\gamma := L_1(A) - L_2(A) > 0$. Let $\bar{n}_0 \in \mathbb{N}$, $\delta > 0$ be such that the fiber-LDT estimate in Theorem 6.6 holds for all cocycles $B \in C^{\omega}_r(\mathbb{T}, \mathrm{Mat}_m(\mathbb{R}))$ with $\|B - A\|_r < \delta$ and for all $n \ge \bar{n}_0$. Moreover, the Lyapunov exponents are already known to be continuous, and in fact, the more precise finite scale uniform estimates from Lemma 3.4 in Chap. 3 hold as well. Therefore, we can choose $\bar{n}_0$ and $\delta$ so that the following conditions hold for all $B \in C^{\omega}_r(\mathbb{T}, \mathrm{Mat}_m(\mathbb{R}))$ with $\|B - A\|_r < \delta$. Firstly,
\[ \frac{1}{m} \log \mathrm{gr}(B^{(m)}(x)) \ge \gamma/2 \tag{6.42} \]

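holds for all $x$ outside a set of measure […]

\[ \frac{\|g_i\, g_{i-1}\|}{\|g_i\|\,\|g_{i-1}\|} = \frac{\|B^{(m_i + m_{i-1})}(T^{q_{i-1}} x)\|}{\|B^{(m_i)}(T^{m_{i-1}} T^{q_{i-1}} x)\|\;\|B^{(m_{i-1})}(T^{q_{i-1}} x)\|} > \varepsilon, \tag{6.47} \]

while from (6.45), for a similar set,

\[ \mathrm{gr}(g_i) = \mathrm{gr}(B^{(m_i)}(T^{q_i} x)) > \frac{1}{\kappa}. \tag{6.48} \]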


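By excluding a set of measure $< n\, e^{-n_0^{b}} < e^{-\frac{1}{2} n_0^{b}}$, the estimates (6.47) and (6.48) will hold for all indices $i$, so the avalanche principle of Proposition 6.5 applies and we have:

\[ \Big| \log \|g^{(n)}\| + \sum_{i=1}^{n-2} \log \|g_i\| - \sum_{i=1}^{n-1} \log \|g_i\, g_{i-1}\| \Big| \lesssim n \cdot \frac{\kappa}{\varepsilon^{2}} < n\, e^{-n_0 \gamma/10}, \]

which becomes

\[ \Big| \log \|B^{(n_1)}(x)\| + \sum_{i=1}^{n-2} \log \|B^{(m_i)}(T^{q_i} x)\| - \sum_{i=1}^{n-1} \log \|B^{(m_i + m_{i-1})}(T^{q_{i-1}} x)\| \Big| < n\, e^{-n_0 \gamma/10}. \tag{6.49} \]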

We will compute an average of (6.49) to establish a relation between the finite scale Lyapunov exponents $L_1^{(n_1)}(B)$, $L_1^{(n_0)}(B)$ and $L_1^{(2n_0)}(B)$ (see formula (6.55) below). In this average, the first and the last terms appearing in the second sum of (6.49) will be discarded. Let us now explain why these two terms are negligible.

First note that if $m \asymp n_0$, then applying (6.38), we have that for $x$ outside a set of measure $< e^{-n_0^{b}}$,

\[ \frac{1}{m} \log \|B^{(m)}(x)\| > L_1^{(m)}(B) - m^{-a} > -C, \]

for some $C = C(A) < \infty$, where the lower bound on $L_1^{(m)}(B)$ follows from (6.34). Moreover, since by (6.33) we have that for all $x \in \mathbb{T}$,

\[ \frac{1}{m} \log \|B^{(m)}(x)\| \le C, \]

we conclude that for $x$ outside a set of measure $< e^{-n_0^{b}}$, we have

\[ \big| \log \|B^{(m)}(x)\| \big| \le C\, m \lesssim C\, n_0. \]

This estimate clearly applies to the function $x \mapsto \log \|B^{(m_2 + m_1)}(x)\|$ and to the function $x \mapsto \log \|B^{(m_{n-1} + m_{n-2})}(T^{q_{n-2}} x)\|$, which represent the terms corresponding to $i = 1$ and $i = n - 1$ in the second sum in (6.49).

Let $v(x) := \log \|B^{(m_2 + m_1)}(x)\| + \log \|B^{(m_{n-1} + m_{n-2})}(T^{q_{n-2}} x)\|$. Then for $x$ outside a set of measure $\lesssim e^{-n_0^{b}}$, the function $v(x)$ has the bound $|v(x)| \lesssim C\, n_0$.

Since $n_1 \asymp e^{n_0^{b_1}}$, with $b_1 < 1$, estimate (6.49) then implies that for all $x$ outside a set of measure $\lesssim e^{-\frac{1}{2} n_0^{b}}$ we have:

\[ \Big| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| + \frac{1}{n_1} \sum_{i=1}^{n-2} \log \|B^{(m_i)}(T^{q_i} x)\| - \frac{1}{n_1} \sum_{i=2}^{n-2} \log \|B^{(m_i + m_{i-1})}(T^{q_{i-1}} x)\| \Big| < \frac{n}{n_1}\, e^{-n_0 \gamma/10} + C\, \frac{n_0}{n_1} \lesssim \frac{n_0}{n_1}. \tag{6.50} \]

We are now ready to average (6.50) over some specific choices of the integers $m_0, m_1, \ldots, m_{n-1}$. The goal is to apply Proposition 6.6 to $u_B^{(m)}(z) = \frac{1}{m} \log \|B^{(m)}(z)\|$ for $m = n_0$ and $m = 2n_0$, which will allow us to replace the two sums in (6.50) by $L_1^{(n_0)}(B)$ and $L_1^{(2n_0)}(B)$ respectively.

The reason we do not simply choose all integers $m_i$ to be $n_0$, but instead consider $n_0$-many configurations and then average, is that otherwise the translates $T^{q_i} x = x + q_i\, \omega$ would only be by multiples of $n_0$, i.e. $q_i = i\, n_0$. However, Proposition 6.6 involves all translations $T^{j} x = x + j \omega$, $0 \le j < n_1$, and not just the translations by $i\, n_0\, \omega$, $0 \le i \le n - 1$.

Firstly, pick all integers $m_i$ to be $n_0$ except for the last one, $m_{n-1}$, which is chosen such that $2 n_0 \le m_{n-1} \le 3 n_0$. Note that in this case $q_0 = 0$, $q_i = i\, n_0$ for $1 \le i \le n - 1$, hence we are getting the translates by multiples of $n_0$.

Secondly, increase by 1 the size of the first block, decrease by 1 the size of the last block, and keep all the other blocks the same. In other words, let $m_0 = n_0 + 1$, $m_1 = m_2 = \cdots = m_{n-2} = n_0$, and let $m_{n-1} \asymp n_0$ be chosen so that all integers $m_i$ add up to $n_1$. In this case $q_0 = 0$, $q_i = i\, n_0 + 1$ for $1 \le i \le n - 1$, so we are getting the translates by multiples of $n_0$ plus 1.

Continue to increase the first block by 1, decrease the last by 1 and keep the rest the same, for $n_0$ steps. In other words, for each $0 \le j \le n_0 - 1$, choose the following integers: $m_0 = n_0 + j$, $m_1 = m_2 = \cdots = m_{n-2} = n_0$ and $m_{n-1} \asymp n_0$, so that they all add up to $n_1$. Then $q_0 = 0$ and $q_i = i\, n_0 + j$.

Apply (6.50) for each of these $n_0$ configurations of integers $m_i$, $0 \le i \le n - 1$, add up all the estimates and divide by $n_0$ to get:

\[ \Big| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| + \frac{1}{n_1} \sum_{j=0}^{n_0 - 1} \sum_{i=1}^{n-2} \frac{1}{n_0} \log \|B^{(n_0)}(T^{i n_0 + j} x)\| - \frac{2}{n_1} \sum_{j=0}^{n_0 - 1} \sum_{i=2}^{n-2} \frac{1}{2 n_0} \log \|B^{(2 n_0)}(T^{(i-1) n_0 + j} x)\| \Big| < C\, \frac{n_0}{n_1} \tag{6.51} \]

for all $x$ outside a set of measure $< n_0\, e^{-\frac{1}{2} n_0^{b}}$. Estimate (6.51) can be written as:

\[ \Big| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| + \frac{1}{n_1} \sum_{k = n_0}^{(n-1) n_0 - 1} \frac{1}{n_0} \log \|B^{(n_0)}(T^{k} x)\| - \frac{2}{n_1} \sum_{k = n_0}^{(n-2) n_0 - 1} \frac{1}{2 n_0} \log \|B^{(2 n_0)}(T^{k} x)\| \Big| < C\, \frac{n_0}{n_1} \lesssim \frac{(\log n_1)^{1/b_1}}{n_1} \tag{6.52} \]

for all $x$ outside a set of measure $< e^{-\frac{1}{3} n_0^{b}}$.

Due to the uniform estimates (6.33), (6.34) in Proposition 6.3, Proposition 6.6 can be applied to the functions

\[ u_B^{(n_0)}(z) = \frac{1}{n_0} \log \|B^{(n_0)}(z)\| \quad \text{and} \quad u_B^{(2 n_0)}(z) = \frac{1}{2 n_0} \log \|B^{(2 n_0)}(z)\|. \]
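The passage from the double sums in (6.51) to the single sums in (6.52) rests on a counting fact: as $j$ runs over $0, \ldots, n_0 - 1$ and $i$ over the block indices, the shifts $i\, n_0 + j$ hit every integer in the stated range exactly once. The following Python sketch checks this for small sample values; the function names and the sample values of $n_0$ and $n$ are illustrative choices, not part of the proof.

```python
def first_sum_shifts(n0, n):
    """Shifts i*n0 + j from the first double sum in (6.51):
    1 <= i <= n-2 and 0 <= j <= n0-1."""
    return [i * n0 + j for j in range(n0) for i in range(1, n - 1)]


def second_sum_shifts(n0, n):
    """Shifts (i-1)*n0 + j from the second double sum in (6.51):
    2 <= i <= n-2 and 0 <= j <= n0-1."""
    return [(i - 1) * n0 + j for j in range(n0) for i in range(2, n - 1)]


def covers_exactly_once(shifts, start, stop):
    """True if `shifts` is a rearrangement of start, start+1, ..., stop-1."""
    return sorted(shifts) == list(range(start, stop))


if __name__ == "__main__":
    n0, n = 7, 12  # arbitrary small sample values
    # First sum in (6.52): k = n0, ..., (n-1)*n0 - 1
    print(covers_exactly_once(first_sum_shifts(n0, n), n0, (n - 1) * n0))
    # Second sum in (6.52): k = n0, ..., (n-2)*n0 - 1
    print(covers_exactly_once(second_sum_shifts(n0, n), n0, (n - 2) * n0))
```

In particular the two sums contain $(n-2)\, n_0$ and $(n-3)\, n_0$ translates respectively.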

Let $\varepsilon \asymp \frac{(\log n_1)^{1/b_1}}{n_1}$, where we may of course assume that $1/b_1 > 1$. Moreover, since the numbers $(n-2)\, n_0$ and respectively $(n-3)\, n_0$ of translates in (6.52) are $\asymp n_1$, then up to an additional error of order $\frac{n_0}{n_1}$, using Proposition 6.6 we obtain:

\[ \Big| \frac{1}{n_1} \sum_{k = n_0}^{(n-1) n_0 - 1} \frac{1}{n_0} \log \|B^{(n_0)}(T^{k} x)\| - L_1^{(n_0)}(B) \Big| < \varepsilon \asymp \frac{(\log n_1)^{1/b_1}}{n_1} \tag{6.53} \]

for $x$ outside a set of measure $< e^{-c_1 \varepsilon n_1 + c_2 (\log n_1)^{4}} < e^{-c_3 (\log n_1)^{1/b_1}}$. Similarly we have:

\[ \Big| \frac{1}{n_1} \sum_{k = n_0}^{(n-2) n_0 - 1} \frac{1}{2 n_0} \log \|B^{(2 n_0)}(T^{k} x)\| - L_1^{(2 n_0)}(B) \Big| < \frac{(\log n_1)^{1/b_1}}{n_1} \tag{6.54} \]

for $x$ outside a set of measure $< e^{-c_3 (\log n_1)^{1/b_1}}$.

Now integrate (6.52) to get:

\[ \Big| L_1^{(n_1)}(B) + L_1^{(n_0)}(B) - 2 L_1^{(2 n_0)}(B) \Big| < C\, \frac{n_0}{n_1} + C\, e^{-\frac{1}{3} n_0^{b}} \lesssim \frac{(\log n_1)^{1/b_1}}{n_1}. \tag{6.55} \]

Combine (6.52), (6.53), (6.54), (6.55) to conclude that for $x$ outside a set of measure $\lesssim e^{-c_3 (\log n_1)^{1/b_1}}$, we have

\[ \Big| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| - L_1^{(n_1)}(B) \Big| \lesssim \frac{(\log n_1)^{1/b_1}}{n_1}. \tag{6.56} \]

Let

\[ \varepsilon_0 \asymp \frac{(\log n_1)^{1/b_1}}{n_1}, \qquad \varepsilon_1 \asymp e^{-c_3 (\log n_1)^{1/b_1}} < \varepsilon_0^{8} \]

and let

\[ u(z) = u_B^{(n_1)}(z) = \frac{1}{n_1} \log \|B^{(n_1)}(z)\|. \]

Then $u(z)$ is subharmonic on $\mathcal{A}_r$, and by Proposition 6.3, for some $C = C(A) < \infty$ we have the bound

\[ \sup_{z \in \mathcal{A}_r} u(z) + \|u\|_{L^2(\mathbb{T})} \le C, \]

which via Remark 6.7 implies (6.14) with a constant $S$ determined by $r$ and $C$. Moreover, from (6.56),

\[ \big| \{ x \in \mathbb{T} : |u(x) - \langle u \rangle| > \varepsilon_0 \} \big| < \varepsilon_1. \]

All the assumptions of Lemma 6.8 are then satisfied and we conclude that

\[ \|u\|_{BMO(\mathbb{T})} \lesssim \varepsilon_0 + (S\, \varepsilon_1)^{1/2} \lesssim \varepsilon_0, \]

provided $n_1$ is large enough, depending on $A$, so that $\varepsilon_1$ absorbs the constant $S$. Therefore,

\[ \|u\|_{BMO(\mathbb{T})} \lesssim \varepsilon_0 \asymp \frac{(\log n_1)^{1/b_1}}{n_1}, \]

and so John-Nirenberg's inequality implies that for all $\varepsilon > 0$

\[ \big| \{ x \in \mathbb{T} : |u(x) - \langle u \rangle| > \varepsilon \} \big| < e^{-c_0\, \varepsilon / \|u\|_{BMO(\mathbb{T})}}. \]

Writing the last estimate in terms of iterates of our cocycle we have:

\[ \Big| \Big\{ x \in \mathbb{T} : \Big| \frac{1}{n_1} \log \|B^{(n_1)}(x)\| - L_1^{(n_1)}(B) \Big| > \varepsilon \Big\} \Big| < e^{-c_0\, \varepsilon\, n_1 / (\log n_1)^{1/b_1}}. \tag{6.57} \]

We ran this argument starting at any scale $n_0 \ge \bar{n}_0$ and we obtained (6.57) for $n_1 \asymp e^{n_0^{b_1}}$. Therefore, if $\bar{n}_1 \asymp e^{\bar{n}_0^{b_1}}$, then the estimate (6.57) holds for all $n_1 \ge \bar{n}_1$, which proves our theorem. □

Remark 6.9 Using the terminology in Sect. 6.4, we have shown that for a one-variable torus translation by a (strongly) Diophantine frequency, a uniform fiber-LDT holds with deviation measure function

\[ \iota(t) \equiv e^{-c\, t / [\log t]^{b}}, \]

for some $c, b > 0$.

We may then conclude, as in the proof of Theorem 6.1, that if $A \in \mathcal{C}_m$ has a $\tau$-gap pattern, then in a neighborhood of $A$ the functions $\Lambda^{\tau}$, $F^{\tau}$ and $E^{\tau}_{\cdot}$ have the modulus of continuity

\[ \omega(h) := e^{-c\, [\log 1/h] / [\log \log 1/h]^{b}}. \]

That is, in the presence of gaps in the Lyapunov spectrum, the LE, the Oseledets filtration and the Oseledets decomposition are locally nearly-Hölder continuous.

References

1. A. Ávila, Global theory of one-frequency Schrödinger operators. Acta Math. 215(1), 1–54 (2015). MR 3413976
2. A. Ávila, S. Jitomirskaya, C. Sadel, Complex one-frequency cocycles. J. Eur. Math. Soc. (JEMS) 16(9), 1915–1935 (2014). MR 3273312
3. J. Bourgain, Green’s function estimates for lattice Schrödinger operators and applications. Annals of Mathematics Studies, vol. 158 (Princeton University Press, Princeton, NJ, 2005). MR 2100420 (2005j:35184)
4. J. Bourgain, Positivity and continuity of the Lyapounov exponent for shifts on Td with arbitrary frequency vector and real analytic potential. J. Anal. Math. 96, 313–355 (2005). MR 2177191 (2006i:47064)
5. J. Bourgain, S. Jitomirskaya, Continuity of the Lyapunov exponent for quasiperiodic operators with analytic potential. J. Stat. Phys. 108(5–6), 1203–1218 (2002). Dedicated to David Ruelle and Yasha Sinai on the occasion of their 65th birthdays. MR 1933451 (2004c:47073)
6. D. Damanik, Schrödinger operators with dynamically defined potentials: a survey (2015), 1–80 (to appear in Ergodic Theory and Dynamical Systems) (preprint)
7. P. Duarte, S. Klein, Continuity of the Lyapunov exponents for quasiperiodic cocycles. Comm. Math. Phys. 332(3), 1113–1166 (2014). MR 3262622
8. J. Duoandikoetxea, Fourier analysis. Graduate Studies in Mathematics, vol. 29 (American Mathematical Society, Providence, RI, 2001). Translated and revised from the 1995 Spanish original by David Cruz-Uribe. MR 1800316 (2001k:42001)
9. M. Goldstein, W. Schlag, Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. Math. (2) 154(1), 155–203 (2001). MR 1847592 (2002h:82055)
10. M. Goldstein, W. Schlag, Fine properties of the integrated density of states and a quantitative separation property of the Dirichlet eigenvalues. Geom. Funct. Anal. 18(3), 755–869 (2008). MR 2438997 (2010h:47063)
11. W.K. Hayman, P.B. Kennedy, Subharmonic functions. Vol. I. London Mathematical Society Monographs, No. 9 (Academic Press [Harcourt Brace Jovanovich, Publishers], London, 1976). MR 0460672 (57 #665)
12. S. Jitomirskaya, C.A. Marx, Continuity of the Lyapunov exponent for analytic quasi-periodic cocycles with singularities. J. Fixed Point Theory Appl. 10(1), 129–146 (2011). MR 2825743 (2012h:37095)
13. S. Jitomirskaya, C.A. Marx, Analytic quasi-perodic cocycles with singularities and the Lyapunov exponent of extended Harper’s model. Comm. Math. Phys. 316(1), 237–267 (2012). MR 2989459
14. S. Jitomirskaya, C.A. Marx, Dynamics and spectral theory of quasi-periodic Schrödinger-type operators (2015), 1–44 (to appear in Ergodic Theory and Dynamical Systems) (preprint)
15. S. Klein, Anderson localization for the discrete one-dimensional quasi-periodic Schrödinger operator with potential defined by a Gevrey-class function. J. Funct. Anal. 218(2), 255–292 (2005). MR 2108112 (2005m:82070)
16. S. Klein, Localization for quasiperiodic Schrödinger operators with multivariable Gevrey potential functions. J. Spectr. Theory 4, 1–53 (2014)
17. B.Ya. Levin, Lectures on entire functions. Translations of Mathematical Monographs, vol. 150 (American Mathematical Society, Providence, RI, 1996). In collaboration with and with a preface by Yu. Lyubarskii, M. Sodin, V. Tkachenko. Translated from the Russian manuscript by Tkachenko. MR 1400006 (97j:30001)
18. C. Muscalu, W. Schlag, Classical and multilinear harmonic analysis, vol. I. Cambridge Studies in Advanced Mathematics, vol. 137 (Cambridge University Press, Cambridge, 2013). MR 3052498
19. E.S. Selmer, On the irreducibility of certain trinomials. Math. Scand. 4, 287–302 (1956). MR 0085223 (19,7f)
20. K. Tao, Continuity of Lyapunov exponent for analytic quasi-periodic cocycles on higher-dimensional torus. Front. Math. China 7(3), 521–542 (2012). MR 2915794
21. K. Tao, Hölder continuity of Lyapunov exponent for quasi-periodic Jacobi operators. Bull. Soc. Math. France 142(4), 635–671 (2014). MR 3306872
22. Y. Wang, Z. Zhang, Uniform positivity and continuity of Lyapunov exponents for a class of C 2 quasiperiodic Schrödinger cocycles. J. Funct. Anal. 268(9), 2525–2585 (2015). MR 3325529
23. J. You, S. Zhang, Hölder continuity of the Lyapunov exponent for analytic quasiperiodic Schrödinger cocycle with weak Liouville frequency. Ergodic Theory Dynam. Syst. 34(4), 1395–1408 (2014). MR 3227161

Chapter 7

Further Related Problems

Abstract We describe limitations, counterexamples and extensions of the topics presented in this monograph. We outline some connections with the spectral theory of discrete Schrödinger operators with ergodic potentials. We formulate a few related open problems, some of which may be studied using similar methods.

© Atlantis Press and the author(s) 2016. P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6_7

7.1 Limitations and Counterexamples

An intrinsic limitation of the inductive method used to prove the continuity of the Lyapunov exponents is that it can never lead to a better than Hölder modulus of continuity. That is because the modulus of continuity depends on the speed of convergence of the finite to the infinite scale Lyapunov exponents. This speed of convergence is in turn dependent upon the leap in the inductive procedure from one scale to the next, which is determined by the strength of the large deviation type estimates. But the exceptional sets in these estimates cannot be better than exponentially small, leading, at best, to Hölder modulus of continuity. However, as we explain below, in general this is optimal.

Indeed, in the quasi-periodic, one variable setting, consider the almost Mathieu cocycle

\[ A_{\lambda,E}(x) := \begin{pmatrix} \lambda \cos(x) - E & -1 \\ 1 & 0 \end{pmatrix}, \]

where $\lambda$, $E$ are real parameters, and the translation is by a Diophantine number. Bourgain has shown in [9] that for large enough $\lambda$, as a function of $E$, the Lyapunov exponent of $A_{\lambda,E}$ is $\alpha$-Hölder continuous for any $\alpha < \frac{1}{2}$. Moreover, due to the presence of gaps in the spectrum of the associated almost Mathieu operator, the Hölder exponent $\frac{1}{2}$ is optimal (see [9, 10]).

In the random setting, more precisely for random Schrödinger cocycles, B. Halperin has provided the following family of cocycles with arbitrarily small Hölder exponent. Consider the probability measure $\mu_{a,b}(E) := \frac{1}{2}\, \delta_{X_E} + \frac{1}{2}\, \delta_{Y_E}$, where $X_E$ and $Y_E$ are the $\mathrm{SL}(2, \mathbb{R})$-matrices

\[ X_E = \begin{pmatrix} a - E & -1 \\ 1 & 0 \end{pmatrix}, \qquad Y_E = \begin{pmatrix} b - E & -1 \\ 1 & 0 \end{pmatrix}. \]

Each of these measures determines a random irreducible cocycle over a full shift in two symbols. Halperin has proven (see [33, Appendix 3]) that the Hölder exponent in the modulus of continuity for the Lyapunov exponent as a function of the parameter $E$ is less than or equal to

\[ \alpha(a, b) = \frac{2 \log 2}{\operatorname{arccosh}\big(1 + \frac{1}{2} |a - b|\big)}, \]

which tends to 0 as $|a - b| \to +\infty$.

The previous limitations on the modulus of continuity of the Lyapunov exponents also apply to the continuity of the Oseledets filtration. Assuming the cocycle $A$ has an exact gap pattern $\tau = (\tau_1, \ldots, \tau_k)$, if the Oseledets filtration has a certain modulus of continuity around $A$ then, because the Oseledets filtration $F = (F_1, \ldots, F_k)$ is characterized by $F_j = v^{(\infty)}_{\tau_j}(A)^{\perp}$, the most expanding directions $v^{(\infty)}_{\tau_j}(A) = \Psi^{-1}\big(v^{(\infty)}(\wedge_{\tau_j} A)\big) \in \mathrm{Gr}_{\tau_j}(\mathbb{R}^m)$ have the same modulus of continuity. Recall that $\Psi$ denotes the Plücker embedding of $\mathrm{Gr}_k(\mathbb{R}^m)$ into $\mathbb{P}(\wedge_k \mathbb{R}^m)$. Therefore, in order to understand how the limitations on the continuity of the Lyapunov exponents imply similar limitations on the continuity of the Oseledets filtration, it is enough to see that Hölder continuity of the most expanding direction $v^{(\infty)}(A)$ implies Hölder continuity of the first Lyapunov exponent $L_1(A)$.

To be more precise, consider the space $\mathcal{C}_m$ of all bounded measurable cocycles $A : X \to \mathrm{GL}(m, \mathbb{R})$, with bounded inverse, endowed with a metric such that

\[ \mathrm{dist}(A, B) \ge \sup_{x \in X} \|A(x) - B(x)\|. \]

Recall that $L^1(X, \mathbb{P}(\mathbb{R}^m))$ denotes the space of measurable functions $F : X \to \mathbb{P}(\mathbb{R}^m)$, where two functions are identified when they differ over a zero measure set. This is a complete metric space with the distance

\[ d(F, G) := \int_X d(F(x), G(x))\, \mu(dx). \]

Proposition 7.1 Given $A \in \mathcal{C}_m$ such that $L_1(A) > L_2(A)$, if $v^{(\infty)} : \mathcal{C}_m \to L^1(X, \mathbb{P}(\mathbb{R}^m))$ is locally $\alpha$-Hölder around $A$ then the first Lyapunov exponent $L_1 : \mathcal{C}_m \to \mathbb{R}$ is also locally $\alpha$-Hölder in the same neighborhood of $A$.

Proof Consider a unit measurable section $v_A : X \to \mathbb{R}^m$ of $v^{(\infty)}(A) : X \to \mathbb{P}(\mathbb{R}^m)$. By Proposition 4.11,

\[ L_1(A) = \int_X \log \|A(T^{-1} x)^{*}\, v_A(x)\|\, \mu(dx). \]

This formula implies the asserted relation between the modulus of continuity of $v^{(\infty)}$ and $L_1$. □
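The last step of the proof can be spelled out as follows. This is only a sketch under two assumptions that are not stated in the text: that all cocycles $B$ in the neighborhood satisfy a common bound $\|B\|_\infty, \|B^{-1}\|_\infty \le M$, and that the integral formula of Proposition 4.11 is available for each such $B$ (in particular, that $v^{(\infty)}(B)$ is defined). The constants $M$, $K$ and $C$ are placeholders introduced here for illustration.

For matrices $g_1, g_2$ with $\|g_i\|, \|g_i^{-1}\| \le M$ and unit vectors $v_1, v_2$, the quantity $\log \|g^{*} v\|$ depends only on the projective point $\hat{v}$ and, since $\|g^{*} v\| \ge M^{-1}$,

\[ \big| \log \|g_1^{*} v_1\| - \log \|g_2^{*} v_2\| \big| \le K \big( \|g_1 - g_2\| + d(\hat{v}_1, \hat{v}_2) \big), \qquad K = K(M). \]

Integrating this bound with $g_i = B_i(T^{-1} x)$ and $v_i = v_{B_i}(x)$ gives, for $B_1, B_2$ in the neighborhood of $A$,

\[ |L_1(B_1) - L_1(B_2)| \le K \Big( \sup_{x \in X} \|B_1(x) - B_2(x)\| + d\big(v^{(\infty)}(B_1), v^{(\infty)}(B_2)\big) \Big) \le K \big( \mathrm{dist}(B_1, B_2) + C\, \mathrm{dist}(B_1, B_2)^{\alpha} \big), \]

where the second inequality uses the assumed local $\alpha$-Hölder bound $d(v^{(\infty)}(B_1), v^{(\infty)}(B_2)) \le C\, \mathrm{dist}(B_1, B_2)^{\alpha}$. When $\mathrm{dist}(B_1, B_2) \le 1$ the right-hand side is at most $K (1 + C)\, \mathrm{dist}(B_1, B_2)^{\alpha}$, which is the claimed Hölder estimate.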

Therefore, the almost Mathieu example for quasi-periodic cocycles, and the Halperin example for random cocycles, show that Hölder is the optimal modulus of continuity one can expect for the Oseledets filtration in these contexts.

More serious limitations are provided by the examples where the Lyapunov exponents are known to be discontinuous. By Kingman’s ergodic theorem, the first Lyapunov exponent of a cocycle is the infimum of the corresponding finite scale Lyapunov exponents. Since in general the finite scale Lyapunov exponents are continuous, the first Lyapunov exponent is an upper semi-continuous function of the cocycle.

The Bochi-Mañé dichotomy says that for a residual set of volume preserving $C^1$-diffeomorphisms, the systems are either Anosov (uniformly hyperbolic) or else have zero Lyapunov exponents. This result was first announced by R. Mañé in the 1980s, and later proved by Bochi [6] in the context of area preserving diffeomorphisms. Bochi and Viana [7] have generalized this dichotomy to higher dimensional diffeomorphisms, including both the cases of symplectic and volume preserving diffeomorphisms. This type of dichotomy provides residual sets of continuous $\mathrm{SL}(2, \mathbb{R})$-cocycles which are either uniformly hyperbolic or else have zero Lyapunov exponents [5].

Consider an ergodic automorphism $T : X \to X$ on a probability space $(X, \mu)$, where $X$ is a compact metric space. Let $\mathcal{C}$ be the class of continuous cocycles $A : X \to \mathrm{SL}(2, \mathbb{R})$ with $L_1(A, \mu) > 0$ but which are not uniformly hyperbolic. By the previous dichotomy, the Lyapunov exponent $L_1 : \mathcal{C} \to \mathbb{R}$, $A \mapsto L_1(A, \mu)$ is discontinuous everywhere.

In the context of quasi-periodic cocycles Wang and You [37] have recently provided examples, for any $0 \le l \le \infty$, of class $C^{l}$ quasi-periodic $\mathrm{SL}(2, \mathbb{R})$-cocycles where the first Lyapunov exponent $L_1 : C^{l}(\mathbb{T}^1, \mathrm{SL}(2, \mathbb{R})) \to \mathbb{R}$ is discontinuous. A previous work of Young [38] had signaled the ubiquity of elliptic behavior for quasi-periodic cocycles $A : \mathbb{T} \to \mathrm{SL}(2, \mathbb{R})$ over a rotation on the torus $\mathbb{T}$, which is a key ingredient to produce the discontinuities in [37].

As a last example, in the context of random cocycles, consider the following family of probability measures $\mu_\theta := \theta\, \delta_X + (1 - \theta)\, \delta_Y$, where $0 \le \theta \le 1$, and $X$ and $Y$ are the $\mathrm{SL}(2, \mathbb{R})$-matrices

\[ X = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}, \qquad Y = \begin{pmatrix} 2 & 0 \\ 0 & 2^{-1} \end{pmatrix}. \tag{7.1} \]

The probability measure $\mu_\theta$ determines a random cocycle over a full shift in two symbols. Let us denote its first Lyapunov exponent by $L_1(\mu_\theta)$. We leave as an exercise for the reader to check that $L_1(\mu_0) = \log 2$ while $L_1(\mu_\theta) = 0$ for all $0 < \theta \le 1$. This shows that $\theta = 0$ is a discontinuity point of the function $\theta \mapsto L_1(\mu_\theta)$.

In spite of these limitations on the continuity of the Lyapunov exponents, there are positive results which enhance the regularity of the Lyapunov exponents at the cost of restricting the space of cocycles.

In the context of random cocycles over Bernoulli shifts of finite type, Peres [32] has proven the analyticity of the first Lyapunov exponent as a function of the probability vector (Bernoulli measure). More precisely, given matrices $X_0, X_1, \ldots, X_k \in \mathrm{Mat}(m, \mathbb{R})$ and a probability vector $p = (p_0, \ldots, p_k)$ in the $k$-dimensional simplex $\Delta^k$, Y. Peres considers the measure $\mu_p := \sum_{j=0}^{k} p_j\, \delta_{X_j}$, and proves that the function $p \mapsto L_1(\mu_p)$ is analytic in the interior of the simplex $\Delta^k$. The negative example (7.1) justifies the need for excluding the boundary of the simplex.

In the same context of random cocycles over Bernoulli shifts, Le Page [31] has proven the smoothness of the first Lyapunov exponent for cocycles determined by probability measures which are absolutely continuous with respect to the group’s Haar measure. In the same spirit we mention the work of Bourgain [12, Theorem 3], [13, Sect. 4] on the smoothness of the Lyapunov exponents for certain random Schrödinger $\mathrm{SL}(2, \mathbb{R})$ cocycles (e.g. Anderson-Bernoulli). Finally, there is the work of Ávila [1] who proves the analyticity of the first Lyapunov exponent on the strata of some stratification of the space of analytic quasi-periodic Schrödinger cocycles over an irrational rotation of the circle $\mathbb{T}$.

These results indicate that the regularity of the Lyapunov exponents is dependent upon the choice of the space of cocycles.
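The discontinuity in example (7.1) can be explored numerically. The Python sketch below is not part of the original argument: it estimates the finite scale quantity $\frac{1}{n} \log \|M_n\|$ for a random product $M_n$ of the matrices $X$ and $Y$ drawn independently according to $\mu_\theta$. It tracks both columns of the product with renormalization and returns the larger growth rate, which differs from $\frac{1}{n} \log \|M_n\|$ by at most $\frac{\log 2}{2n}$. The function names, the sample size and the seed are illustrative choices.

```python
import math
import random

X = ((0.0, -1.0), (1.0, 0.0))  # rotation by 90 degrees, from (7.1)
Y = ((2.0, 0.0), (0.0, 0.5))   # diagonal hyperbolic matrix, from (7.1)


def apply(g, v):
    """Apply the 2x2 matrix g to the vector v."""
    return (g[0][0] * v[0] + g[0][1] * v[1],
            g[1][0] * v[0] + g[1][1] * v[1])


def finite_scale_exponent(theta, n, seed=0):
    """Estimate (1/n) log ||g_n ... g_1||, where each g_k is X with
    probability theta and Y with probability 1 - theta, independently.

    Both columns of the product are tracked and renormalized at every
    step; the larger of the two column growth rates is returned.
    """
    rng = random.Random(seed)
    columns = [(1.0, 0.0), (0.0, 1.0)]
    log_growth = [0.0, 0.0]
    for _ in range(n):
        g = X if rng.random() < theta else Y
        for j in range(2):
            w = apply(g, columns[j])
            r = math.hypot(w[0], w[1])
            log_growth[j] += math.log(r)
            columns[j] = (w[0] / r, w[1] / r)
    return max(log_growth) / n


if __name__ == "__main__":
    for theta in (0.0, 0.1, 0.5, 1.0):
        print(theta, finite_scale_exponent(theta, 20000))
```

For $\theta = 0$ the estimate equals $\log 2$ up to rounding, while for $\theta > 0$ it should be small and shrink as $n$ grows, in line with the exercise. A finite simulation only suggests the limit and does not replace the proof.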

7.2 Some Connections to Mathematical Physics

We outline some connections to the spectral theory of discrete Schrödinger operators. These types of operators describe the Hamiltonian of a quantum particle on the integer lattice. We specialize to ergodic operators, whose potentials are dynamically defined. We then describe some related problems for the more general case of block Jacobi operators.

Let $(X, \mu, T)$ be an ergodic dynamical system. A discrete ergodic Schrödinger operator is an operator $H_\lambda(x)$ on $l^2(\mathbb{Z}) \ni \psi = \{\psi_n\}_{n \in \mathbb{Z}}$, defined by

\[ [H_\lambda(x)\, \psi]_n := -(\psi_{n+1} + \psi_{n-1}) + \lambda\, v_n(x)\, \psi_n, \tag{7.2} \]

where

i. $\lambda \ne 0$ is a coupling constant encoding the disorder of the system, and
ii. for every lattice point $n \in \mathbb{Z}$, the potential $v_n(x)$ is given by $v_n(x) = f(T^n x)$, for some bounded and measurable potential (or sampling) function $f : X \to \mathbb{R}$.

Due to the ergodicity of the system, the spectral properties of the family of operators $\{H_\lambda(x) : x \in X\}$ are independent of $x$ almost surely. Consider the Schrödinger (i.e. eigenvalue) equation

\[ H_\lambda(x)\, \psi = E\, \psi, \tag{7.3} \]

for some energy (i.e. eigenvalue) $E \in \mathbb{R}$ and state (i.e. eigenvector) $\psi = \{\psi_n\}_{n \in \mathbb{Z}}$.

Define the associated Schrödinger cocycle as the cocycle $(T, A_{\lambda,E})$, where

\[ A_{\lambda,E}(x) := \begin{pmatrix} \lambda f(x) - E & -1 \\ 1 & 0 \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}). \]

Note that the Schrödinger equation (7.3) is a second order finite difference equation. An easy calculation shows that its formal solutions are given by

\[ \begin{pmatrix} \psi_{n+1} \\ \psi_n \end{pmatrix} = A^{(n+1)}_{\lambda,E}(x) \cdot \begin{pmatrix} \psi_0 \\ \psi_{-1} \end{pmatrix}, \]

where $A^{(n)}_{\lambda,E}(x)$ are the iterates of $A_{\lambda,E}(x)$, for all $n \in \mathbb{N}$.

The Lyapunov exponents of a Schrödinger cocycle are regarded as functions of the energy parameter $E$ (or, when the disorder $\lambda$ varies, as functions of $\lambda$ and $E$). Note that since $A_{\lambda,E}(x) \in \mathrm{SL}(2, \mathbb{R})$, we have $L_2(A_{\lambda,E}) = -L_1(A_{\lambda,E})$. In spectral theory it is common to use the notation $L(E)$ (or $L(\lambda, E)$) for the first Lyapunov exponent, and to refer to it as the Lyapunov exponent.

There are direct connections between the LE of a Schrödinger cocycle and the spectral properties of the corresponding operator. We refer the reader to the recent survey paper [20] by Damanik, and only mention here some examples of such connections. By Johnson’s theorem, $x$-almost surely we have that if $L(E) = 0$ then $E$ is in the spectrum of $H_\lambda(x)$, and if $E$ is in the spectrum of $H_\lambda(x)$ but $L(E) > 0$, then $A_{\lambda,E}$ is not uniformly hyperbolic.

More directly relevant to the topic of this monograph, the quantitative continuity properties (such as Hölder continuity) of the LE can be transferred to those of the integrated density of states (IDS) of $H_\lambda(x)$. The IDS represents the limiting distribution of the eigenvalues of the family of operators $\{H_\lambda(x) : x \in X\}$. Its Hölder continuity is a crucial ingredient in obtaining localization properties for random Schrödinger operators.

Let us define the IDS and describe its relationship with the LE. Denote by $P_n$ the coordinate restriction operator to $\{1, 2, \ldots, n\} \subset \mathbb{Z}$, and let $H^{(n)}_\lambda(x) := P_n\, H_\lambda(x)\, P_n^{*}$. By ergodicity, the following limit exists for $\mu$-a.e. $x \in X$:

\[ N_\lambda(E) := \lim_{n \to \infty} \frac{1}{n}\, \# \Big( (-\infty, E] \cap \text{Spectrum of } H^{(n)}_\lambda(x) \Big). \]
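The counting function in this definition is easy to evaluate for a finite truncation $H^{(n)}_\lambda(x)$, which is a symmetric tridiagonal matrix with diagonal entries $\lambda\, v_k(x)$ and off-diagonal entries $-1$. The Python sketch below is an illustration only: it uses the classical Sturm sequence (pivot sign) count for symmetric tridiagonal matrices, and the sample potential $f(x) = \cos(2\pi x)$ with $T x = x + \omega$ is an assumed example, as are all function names.

```python
import math


def count_eigenvalues_below(diagonal, energy):
    """Number of eigenvalues strictly below `energy` of the symmetric
    tridiagonal matrix with the given diagonal and off-diagonal entries -1.

    By Sylvester's law of inertia this equals the number of negative
    pivots q_k, with q_1 = d_1 - E and q_k = d_k - E - 1/q_{k-1}.
    """
    count = 0
    q = None
    for d in diagonal:
        q = (d - energy) if q is None else (d - energy) - 1.0 / q
        if q == 0.0:
            q = 1e-300  # nudge an exact zero pivot to avoid dividing by zero
        if q < 0.0:
            count += 1
    return count


def finite_volume_ids(potential, coupling, energy):
    """Fraction of eigenvalues of the n x n truncation lying below `energy`.

    This counts eigenvalues < E rather than <= E; the two agree unless
    E is itself an eigenvalue of the truncation.
    """
    diagonal = [coupling * v for v in potential]
    return count_eigenvalues_below(diagonal, energy) / len(potential)


def sample_potential(n, x=0.0, omega=(math.sqrt(5.0) - 1.0) / 2.0):
    """Assumed example: v_k = cos(2 pi (x + k omega)), k = 1, ..., n."""
    return [math.cos(2.0 * math.pi * (x + k * omega)) for k in range(1, n + 1)]


if __name__ == "__main__":
    v = sample_potential(2000)
    for energy in (-3.0, -1.0, 0.0, 1.0, 3.0):
        print(energy, finite_volume_ids(v, 2.0, energy))
```

The printed values are nondecreasing in the energy, as a distribution function should be. The finite-volume fraction only approximates $N_\lambda(E)$.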

The function $E \mapsto N_\lambda(E)$ is called the integrated density of states of the family of ergodic operators $\{H_\lambda(x) : x \in X\}$. The Thouless formula relates the LE and the IDS essentially via the Hilbert transform:

\[ L(E) = \int_{\mathbb{R}} \log |E - E'|\; dN(E'). \]
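As a sanity check of this formula, consider the free case $\lambda = 0$, an assumption made here only because both sides are explicit. The cocycle is then the constant matrix with rows $(-E, -1)$ and $(1, 0)$, and for $|E| > 2$ its Lyapunov exponent is $\operatorname{arccosh}(|E|/2)$. The $n \times n$ truncation has eigenvalues $-2 \cos\big(\frac{j \pi}{n+1}\big)$, $1 \le j \le n$, a standard fact about the discrete Laplacian. The Python sketch below compares the three quantities; the function names are illustrative.

```python
import math


def lyapunov_exponent_free(energy, n):
    """(1/n) log ||A^n v|| for the constant cocycle A = [[-E, -1], [1, 0]]
    and v = (1, 0), with renormalization at every step."""
    v = (1.0, 0.0)
    total = 0.0
    for _ in range(n):
        w = (-energy * v[0] - v[1], v[0])
        r = math.hypot(w[0], w[1])
        total += math.log(r)
        v = (w[0] / r, w[1] / r)
    return total / n


def thouless_average_free(energy, n):
    """(1/n) sum_j log|E - E_j| over the eigenvalues
    E_j = -2 cos(j pi / (n + 1)) of the free n x n truncation."""
    return sum(
        math.log(abs(energy + 2.0 * math.cos(j * math.pi / (n + 1))))
        for j in range(1, n + 1)
    ) / n


if __name__ == "__main__":
    energy, n = 3.0, 2000
    print(math.acosh(abs(energy) / 2.0))      # closed form for |E| > 2
    print(lyapunov_exponent_free(energy, n))  # transfer matrix growth
    print(thouless_average_free(energy, n))   # eigenvalue average
```

All three numbers should agree up to an error of order $1/n$. The eigenvalue average is the finite-volume version of the integral against $dN$.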

The IDS is known to be log-Hölder continuous in a very general setting (see [19]). A stronger modulus of continuity may be derived from that of the LE using the Thouless formula. We refer the reader to Lemma 10.3 in [24], which shows that any singular integral operator (including the Hilbert transform) preserves the modulus of continuity of a function, provided that modulus of continuity satisfies certain conditions. These conditions hold for all of the examples that showed up in our applications, e.g. for Hölder, weak-Hölder and nearly-Hölder continuity.

Schrödinger cocycles over a Bernoulli shift with nontrivial probability distribution are automatically irreducible (see [20]), hence by Furstenberg’s theorem they have positive (maximal) LE for all coupling constants $\lambda \ne 0$. Quasi-periodic Schrödinger cocycles (with analytic sampling function) have positive LE for large enough coupling constant $\lambda \gg 1$ by the Sorets-Spencer theorem and its extensions (see [10]). Therefore, under these conditions, Le Page’s theorem (in the i.i.d. random case) and the Goldstein-Schlag theorem (in the quasi-periodic case with Diophantine translations) provide Hölder (or weak-Hölder for the multifrequency torus translation) continuity of the LE. This in turn implies Hölder (or weak-Hölder, respectively) continuity for the IDS of the corresponding Schrödinger operator.

The results outlined above concerning random and quasi-periodic Schrödinger operators have been known for some time. We now describe a more general and not as well studied model, that of ergodic block Jacobi operators (also called strip or band lattice operators). These types of operators describe the Hamiltonian of a quantum particle on a band lattice of the form $\mathbb{Z} \times S$, where $S$ may be a finite subset of any integer lattice or a finite graph. To simplify matters, let $S = \{1, \ldots, l\}$. Then the block Jacobi operator acts on $l^2(\mathbb{Z} \times \{1, \ldots, l\}, \mathbb{R}) \simeq l^2(\mathbb{Z}, \mathbb{R}^l)$, that is, on square summable sequences of vectors $\vec{\psi} = \{\vec{\psi}_n\}_{n \in \mathbb{Z}}$, by

\[ [H_\lambda(x)\, \vec{\psi}]_n := -\big( W_{n+1}(x)\, \vec{\psi}_{n+1} + W_n^{t}(x)\, \vec{\psi}_{n-1} \big) + \lambda\, V_n(x)\, \vec{\psi}_n, \tag{7.4} \]

where we consider

i. an underlying ergodic base dynamics $(X, \mu, T)$ (e.g. random or quasi-periodic);
ii. $W_n(x) = W(T^n x)$ for a bounded measurable ‘weight’ function $W : X \to \mathrm{Mat}(l, \mathbb{R})$;
iii. $V_n(x) = F(T^n x)$, for a bounded measurable ‘potential’ function $F : X \to \mathrm{Mat}(l, \mathbb{R})$.

These types of operators have been studied in [14, 23, 26, 33, 35] and in other papers, mostly for more particular models (e.g. with $W(x) \equiv I$ or with $l = 2$). In [22] we obtained a Sorets-Spencer type theorem for the operator (7.4) over a one frequency torus translation.

A recent result of Chapman and Stolz [18] provides a Thouless-type formula (relating the LE and the IDS) which is applicable in the full generality of the operator (7.4). Assuming that $C_W := \int_X \log |\det W(x)|\, \mu(dx) > -\infty$, their result states that

\[ (L_1 + \cdots + L_l)(E) = l \int_{\mathbb{R}} \log |E - E'|\; dN(E') - C_W, \tag{7.5} \]

where $L_1, \ldots, L_l$ are the first $l$ (i.e. the non-negative) Lyapunov exponents of (7.4).

These considerations suggest that a strategy similar to the one used to establish Hölder continuity of the IDS for Schrödinger operators may be applicable to block Jacobi operators like (7.4) as well.

Question 1 Give sufficient conditions under which the IDS of block Jacobi operators over random and quasi-periodic dynamics are Hölder continuous.

Note that the cocycle associated with the eigenvalue equation corresponding to the operator (7.4) is more complex than the Schrödinger cocycles: it is higher dimensional, i.e. $\mathrm{Mat}(m, \mathbb{R})$-valued ($m = 2l$), and it may have singularities. However, it fits the general setting of this monograph. This leads to continuity of the LE (with no restrictions) and to a modulus of continuity (under appropriate conditions) of the LE as functions of various input data: the energy $E$ (most importantly), the coupling constant $\lambda$ and even the potential function $F(x)$ or the weight $W(x)$. The issue of course is to determine those appropriate conditions that ensure the existence of relevant spectral gaps and thus a modulus of continuity of the LE (or rather of the Lyapunov spectrum blocks).

Question 2 Study the spectral properties (e.g. localization in the appropriate regime) of the operator (7.4) with random or quasi-periodic underlying dynamics.

We note that some of the crucial ingredients used in establishing localization for Schrödinger operators are already available or likely to follow from the results in this monograph. Indeed, the quasi-periodic model satisfies LDT estimates, while in the random case, a positive answer to Question 1 would provide the Hölder continuity of the IDS.

7.3 Continuity for Other Spaces of Cocycles

We discuss some problems regarding the continuity of the LE for larger spaces of cocycles and for different types of base dynamics.

In Chap. 6 we established weak-Hölder continuity of the Lyapunov exponents and of the Oseledets filtration/decomposition for quasi-periodic, analytic cocycles with singularities. We assumed the translation vector to be Diophantine. We restricted the problem to the space of non identically singular cocycles. This restriction will be removed in a project which is currently underway. The approach used there is much more technically involved than the one used in Chap. 6. However, this extension will have significant consequences regarding sharp lower bounds for Lyapunov exponents or simplicity of the Lyapunov spectrum.

The results just mentioned above, on cocycles with singularities, apply to translations on the torus $\mathbb{T}^d$ of any number $d$ of variables.

Let us focus now on the one variable case $d = 1$. In Sect. 6.5 we obtained a stronger result, nearly-Hölder continuity. In [21], for cocycles without singularities, we obtained Hölder continuity. The authors of [35] obtained Hölder continuity for the case of $2 \times 2$ Jacobi cocycles. This suggests one could expect Hölder continuity in the setting of Chap. 6 as well.

Furthermore, a very delicate, related question concerns finding the optimal, or at least an explicit expression for the Hölder exponent $\alpha$. This problem was studied for almost Mathieu cocycles (see [9], where $\alpha = \frac{1}{2}-$), for Schrödinger cocycles given by a fixed trigonometric polynomial of degree $k$ (see [25], where $\alpha = \frac{1}{2k}-$) and also for more general Jacobi cocycles with trigonometric polynomial (or analytic function) entries (see [35], where $\alpha$ is similarly explicit in terms of certain characteristics of these entries).

These considerations suggest the following questions.

Question 3 Consider the space of $m \times m$ analytic, quasi-periodic cocycles with singularities. Assume a generic arithmetic condition on the translation. Assume, say, simplicity of the Lyapunov spectrum.

(a) In the one variable case $d = 1$, are the Lyapunov exponents Hölder, instead of just nearly-Hölder continuous?
(b) If so, can the Hölder exponent be explicitly given in terms of some characteristics of the cocycle entries?
(c) For the several variables $d > 1$ model, can the modulus of continuity of the Lyapunov exponents be improved from weak-Hölder to Hölder?

This last question seems to be very hard even in the case of the Schrödinger cocycles, which are $\mathrm{SL}(2, \mathbb{R})$-valued.

In Chap. 5 we established the Hölder continuity of the maximal Lyapunov exponent in the space of irreducible random cocycles, in a neighborhood of a cocycle with a gap between its first two Lyapunov exponents. The same result applies to the Oseledets filtration and decomposition.

In [8] (see also [36]), the authors establish continuity of the Lyapunov exponents and of the Oseledets decomposition for $\mathrm{GL}(2, \mathbb{C})$-valued i.i.d. random cocycles, without any generic assumptions (such as irreducibility) on the cocycle. A similar result concerning Lyapunov exponents was obtained in [30] for cocycles over Markov shifts that depend only on one coordinate. Other related results were recently obtained in [3, 4] for fiber-bunched cocycles. A higher dimensional version of the result in [8] for Lyapunov exponents was also announced by A. Ávila, A. Eskin and M. Viana. None of these results provide a modulus of continuity, as the proofs proceed by contradiction.

These considerations lead to the following natural questions.

Question 4 Consider even the simplest random model, that of cocycles depending on one coordinate, over a finite alphabet Bernoulli shift.


(a) Let $A$ be a reducible random cocycle whose Lyapunov exponents have a certain gap pattern, say simple Lyapunov spectrum. Are the Lyapunov exponents (and the Oseledets filtration/decomposition) still Hölder, or at least weakly-Hölder continuous locally near $A$?
(b) In the absence of gaps between Lyapunov exponents, what can be said about the modulus of continuity of the Lyapunov exponents, even assuming irreducibility?

We note in connection with item (b) above that for both random and quasi-periodic cocycles, all available quantitative results (including the ones in this book) depend on the existence of a gap between Lyapunov exponents. One should then perhaps begin the study of this problem by considering a simple $\mathrm{SL}(2, \mathbb{R})$-valued cocycle $A$ with zero Lyapunov exponents and seeing if one could find a sequence $A_k \to A$ for which $L_1(A_k) \to 0$ at an arbitrarily slow rate.

A testing example for such continuity properties at a cocycle which is both reducible and has zero Lyapunov exponents (hence no Lyapunov spectrum gaps) is the following. Let $A_\varepsilon$ be the cocycle obtained by choosing with equal probability either $X_\varepsilon$ or $Y$, where

$$ X_\varepsilon := \begin{pmatrix} \sin\varepsilon & -\cos\varepsilon \\ \cos\varepsilon & \sin\varepsilon \end{pmatrix} \quad \text{and} \quad Y := \begin{pmatrix} 2 & 0 \\ 0 & \frac{1}{2} \end{pmatrix}. $$

We know from [8] that $\varepsilon \mapsto L(A_\varepsilon)$ is continuous near 0. What is the modulus of continuity of this map?

In Chaps. 5 and 6 we applied the abstract continuity theorem of the Lyapunov exponents derived in Chap. 3 to various models of random and quasi-periodic cocycles, thus extending previous quantitative continuity results to more general spaces of cocycles. In fact, most available results regarding quantitative continuity in the cocycle, including results referring to other types of base dynamics, do fit our scheme, as we explain below.

Consider the Schrödinger equation

$$ -(\psi_{n+1} + \psi_{n-1}) + \lambda f(T^n x)\, \psi_n = E\, \psi_n \tag{7.6} $$

associated to a discrete, one dimensional Schrödinger operator with dynamically defined potential, as described in Sect. 7.2. Let $(A_{\lambda,f,E}, T)$ be the associated Schrödinger cocycle, where

$$ A_{\lambda,f,E}(x) := \begin{pmatrix} \lambda f(x) - E & -1 \\ 1 & 0 \end{pmatrix} \in \mathrm{SL}(2, \mathbb{R}). $$

We consider different types of base dynamics $T$ as follows.

i. Let $T$ be the skew translation on $\mathbb{T}^2$ given by $T(x_1, x_2) := (x_1 + x_2, x_2 + \omega)$,


where $\omega \in \mathbb{T}$ satisfies a Diophantine condition. Assume also that $f(x_1, x_2)$ is a non-constant real-analytic function on $\mathbb{T}^2$. J. Bourgain, M. Goldstein and W. Schlag proved (see [16]) that for every $\varepsilon > 0$, there are $\lambda_0 = \lambda_0(\varepsilon, f)$ and a set of frequencies $\Omega_\varepsilon$ with $|\Omega_\varepsilon| > 1 - \varepsilon$, such that for all $\omega \in \Omega_\varepsilon$ and for all $\lambda \ge \lambda_0$, the map $E \mapsto L_1(A_{\lambda,f,E})$ is weakly-Hölder. That is, as a function of the energy parameter $E$, the Lyapunov exponent is weakly-Hölder continuous. We note that the assumption $\lambda \gg 1$ ensures the positivity of the first LE, and hence the gap between the Lyapunov exponents.

ii. Let $T$ be an expanding map of the torus, i.e. either the doubling map $T x := 2x \bmod 1$ on $\mathbb{T}$ or a hyperbolic toral automorphism $T x := M x \bmod 1$ on $\mathbb{T}^2$, for some $M \in \mathrm{SL}(2, \mathbb{Z})$. Assume that $f(x)$ is a $C^1$-function on $\mathbb{T}$ with $\int_{\mathbb{T}} f = 0$. J. Bourgain and W. Schlag proved (see [17]) that for an appropriate range $I$ of energies and for small enough $\lambda$, the map $I \ni E \mapsto L_1(A_{\lambda,f,E})$ is Hölder continuous. The argument depends on the positivity of the first Lyapunov exponent, which is ensured by the small $\lambda$ assumption.

iii. Let $T$ be either a Diophantine translation on $\mathbb{T}^d$, $d \ge 1$, or the skew translation defined above. Assume that $f$ is a Gevrey-class function, that is, for all multi-indices $m \in \mathbb{N}^d$ we have

$$ \sup_{x \in \mathbb{T}^d} |\partial^m f(x)| \le M K^{|m|} (m!)^s \tag{7.7} $$

for some constants $M, K > 0$ and $s \ge 1$ called the order of the Gevrey-class. Denote by $G^s(\mathbb{T}^d)$ the set of all Gevrey-class functions of order $s$, and note that $s = 1$ represents the analytic class, while as $s$ increases, $G^s(\mathbb{T}^d)$ becomes larger. S. Klein proved (see [28, 29]) that for large $\lambda$, the map $E \mapsto L_1(A_{\lambda,f,E})$ is weakly-Hölder, provided the potential (i.e. sampling) function $f$ satisfies a generic transversality condition. That transversality condition is automatically satisfied by non-constant analytic functions. A similar result holds even without a transversality assumption on $f$, but the order $s$ of the Gevrey-class is restricted in that case.

In all of these papers, LDT estimates are derived (and then used to prove continuity of the LE). Given the availability of these LDT estimates, our abstract continuity Theorem 3.1 applies as well. Indeed, consider the space of cocycles $\mathcal{C}(f, \lambda) := \{A_{\lambda,f,E} : E \in I\}$ for a given function $f$, coupling constant $\lambda$ and appropriate interval of energies $I$. Then if $f$ is as above (i.e. real-analytic or Gevrey-class), there is $\lambda_0$ such that $\mathcal{C}(f, \lambda)$ satisfies the assumptions of the ACT, provided $\lambda \ge \lambda_0$ for the translation and skew translation and $\lambda \le \lambda_0$ for the expanding map base dynamics.
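As a numerical companion to the Schrödinger cocycles $A_{\lambda,f,E}$ over a translation or a skew translation, the following Python sketch estimates the growth rate $\frac{1}{n} \log \|A^{(n)}(x)\, v\|$ of a single vector, which is a lower bound for the finite scale exponent $\frac{1}{n} \log \|A^{(n)}(x)\|$. It is only an illustration under stated assumptions and not a method taken from the works cited above: the function names, the sample potential $f(x) = \cos(2\pi x_1)$, the golden mean frequency and the number of iterates are all hypothetical choices. The doubling map of item ii is left out because its orbits degenerate in floating point arithmetic (a double-precision point typically reaches 0 within a few dozen doublings).

```python
import math

def schrodinger_matrix(lam, f_val, energy):
    """Rows of A_{lam,f,E}(x) = [[lam*f(x) - E, -1], [1, 0]]."""
    return ((lam * f_val - energy, -1.0), (1.0, 0.0))

def translation(omega):
    """Base map x -> x + omega on the 1-torus; points are 1-tuples."""
    return lambda x: ((x[0] + omega) % 1.0,)

def skew_translation(omega):
    """Base map (x1, x2) -> (x1 + x2, x2 + omega) on the 2-torus."""
    return lambda x: ((x[0] + x[1]) % 1.0, (x[1] + omega) % 1.0)

def vector_growth_rate(base_map, f, lam, energy, x0, n):
    """Return (1/n) log |A(T^{n-1} x0) ... A(x0) v| for v = (1, 0).

    The vector is renormalised at every step to avoid overflow, and the
    logarithms of the normalising factors are accumulated.
    """
    x = x0
    v = (1.0, 0.0)
    total = 0.0
    for _ in range(n):
        (a, b), (c, d) = schrodinger_matrix(lam, f(x), energy)
        w = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        norm = math.hypot(w[0], w[1])  # nonzero since det A = 1
        total += math.log(norm)
        v = (w[0] / norm, w[1] / norm)
        x = base_map(x)
    return total / n

# Hypothetical sample data: cosine potential, golden mean frequency.
golden = (math.sqrt(5) - 1) / 2
cosine = lambda x: math.cos(2 * math.pi * x[0])

for energy in (0.0, 0.5, 1.0):
    rate = vector_growth_rate(skew_translation(golden), cosine,
                              10.0, energy, (0.1, 0.2), 5000)
    print(f"E = {energy:.1f}: growth rate ~ {rate:.4f}")
```

Scanning the energy $E$ on a fine grid with such a routine gives a rough picture of the map $E \mapsto L_1(A_{\lambda,f,E})$, although a finite scale computation of this kind cannot by itself reveal a modulus of continuity.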


These considerations lead to some natural questions.

Question 5 Let $T$ be as above (i.e. a skew translation or an expanding map of the torus) and assume that the cocycle is analytic. Study the continuity problem of the LE for Schrödinger cocycles $A_{\lambda,f,E}$ without restrictions on $\lambda$. In other words, if for fixed $\lambda$ and for some $E_0$ we have $L_1(A_{\lambda,f,E_0}) > 0$, is the map $E \mapsto L_1(A_{\lambda,f,E})$ (weakly-) Hölder continuous near $E_0$? What about the continuity in $(\lambda, E)$?

Question 6 Instead of Schrödinger, hence $\mathrm{SL}(2, \mathbb{R})$-valued cocycles, consider the more general case of, say, $\mathrm{GL}(m, \mathbb{R})$-valued cocycles over the same type of base dynamics. Are the LE still (weakly-) Hölder in a neighborhood of a cocycle with simple Lyapunov spectrum?

Regarding the last two questions above, we note that when $T$ is the skew translation, we could derive the fiber LDT estimates needed for the ACT in an inductive procedure, if only an initial scale LDT estimate were available. Getting the initial scale LDT estimate is precisely where the large $\lambda$ assumption is used in the case of Schrödinger cocycles, and we do not have an argument for it otherwise.

Let $T$ be a translation on $\mathbb{T}$ by a well chosen frequency $\omega$. As the work [37] of Y. Wang and J. You shows, unlike in the analytic case, continuity of the LE does not hold in the larger space of $C^l$, $1 \le l \le \infty$, cocycles. More precisely, assume $\omega$ of bounded type and let

$$ A_f(x) := \begin{pmatrix} f(x) & -1 \\ 1 & 0 \end{pmatrix}. $$

Then the discontinuity of the LE from [37] occurs in the space $\mathcal{C} := \{A_f : f \in C^l(\mathbb{T})\}$, at a point $A_{f_0}$ where $f_0$ is some non-transversal function.

As mentioned above, continuity of the LE for the one-parameter family of Schrödinger cocycles $\{A_{\lambda,f,E} : E \in \mathbb{R}\}$ (hence $f$ is fixed) was obtained for $f$ in any Gevrey-class (which is intermediate between analytic and $C^\infty$) under a transversality assumption. If, moreover, the order of the Gevrey-class is restricted, the transversality condition is not necessary. All of these suggest the following questions.
Question 7 What is responsible for the continuity of the LE for quasi-periodic cocycles: the regularity of the function, the transversality condition, or the choice of the topology on the space of cocycles? What is the regularity class threshold between positive and negative continuity results?

Question 8 Study the continuity of the LE for Gevrey-class, $\mathrm{GL}(m, \mathbb{R})$-valued cocycles over a translation or a skew translation. We note, as with Question 6, that the main obstacle for proving fiber LDT estimates inductively is again the initial step.

7.4 Continuity with Respect to Other Parameters

In this last section we address a few problems on the continuity of Lyapunov exponents with respect to the frequency vector (for quasi-periodic cocycles), and to the transition probability measure (for random cocycles).


In the context of quasi-periodic cocycles, the bundle map is determined by a translation vector $\omega \in \mathbb{R}^d$ as well as by a function $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$, and hence it is natural to ask about joint continuity in the pair $(\omega, A)$.

In the case of analytic quasi-periodic Schrödinger cocycles, Bourgain and Jitomirskaya [15] obtained joint continuity in frequency and energy $(\omega, E)$ at all points with $\omega \in \mathbb{T}$ irrational. Later Bourgain [11] extended this result to quasi-periodic cocycles on $\mathbb{T}^d$ with $d > 1$. In a much broader context, Jitomirskaya and Marx [27] obtained joint continuity in $(\omega, A)$ for irrational $\omega$ and analytic cocycles $A : \mathbb{T} \to \mathrm{Mat}(2, \mathbb{C})$. A generalization to cocycles of arbitrary dimension follows from the work of Ávila et al. [2]. All of these results are non-quantitative.

Recently, Tao [34] proved weak-Hölder joint continuity of the LE in $(\omega, A)$, for $\omega \in \mathrm{DC}$ and $A : \mathbb{T}^d \to \mathrm{Mat}(2, \mathbb{R})$ analytic, non-identically singular and with a fixed determinant. Here $\mathrm{DC}$ denotes any full measure set of vectors $\omega \in \mathbb{R}^d$ satisfying a Diophantine condition with a fixed exponent but with varying constant.

Question 9 Consider the space $\mathcal{C}_m$ of analytic functions $A : \mathbb{T}^d \to \mathrm{Mat}(m, \mathbb{R})$ which are non-identically singular. Is there some quantitative joint continuity of the LE in $(\omega, A) \in \mathrm{DC} \times \mathcal{C}_m$, assuming for instance the Lyapunov spectrum to be simple?

Question 10 Consider the space $\mathcal{C}_m$ of analytic functions $A : \mathbb{T}^d \to \mathrm{GL}(m, \mathbb{R})$. Are the LE jointly continuous in $(\omega, A) \in \{\text{rationally independent vectors}\} \times \mathcal{C}_m$?

In the random setting, Markov cocycles are determined by a stochastic kernel $K$ on some compact space of symbols $\Sigma$ and a measurable function $X \ni x \mapsto A(x) \in \mathrm{Mat}(m, \mathbb{R})$ on the space of sequences $X = \Sigma^{\mathbb{Z}}$, which depends only on two coordinates $x_0$ and $x_1$. The base dynamics for these cocycles is a Markov shift on the space of sequences $X$.
Bernoulli cocycles correspond to the special case where the transition probabilities $K(x, \cdot) = \mu$ are constant in $x$, and the matrix valued function $A(x)$ depends only on the coordinate $x_0$. In this setting it is natural to ask about joint continuity of the LE in $(K, A)$ (for Markov cocycles) or in $(\mu, A)$ (for Bernoulli cocycles).

In the context of Bernoulli cocycles, Bocker-Neto and Viana [8] proved joint continuity of the LE in $(\mu, A)$ for $\mu \in \mathrm{Prob}(\Sigma)$ and $A : \Sigma^{\mathbb{Z}} \to \mathrm{GL}(2, \mathbb{R})$ depending only on the first coordinate. The generalization of this result to cocycles of arbitrary dimension has been announced by A. Ávila, A. Eskin and M. Viana (see the introduction of the book [36]). Finally we mention the work of E. Malheiro and M. Viana [30], in the context of random Markov cocycles, who prove joint continuity of the LE in $(K, A)$, where $K$ is a stochastic matrix in some finite space of symbols $\Sigma$, and $A : \Sigma^{\mathbb{Z}} \to \mathrm{GL}(2, \mathbb{R})$ depends only on the first coordinate $x_0$. These results provide no modulus of continuity.

Question 11 Consider the space $\mathcal{C}_m$ of measurable functions $A : \Sigma^{\mathbb{Z}} \to \mathrm{GL}(m, \mathbb{R})$ which depend only on the first coordinate $x_0$. Is there some quantitative joint continuity of the LE in $(\mu, A) \in \mathrm{Prob}(\Sigma) \times \mathcal{C}_m$, assuming for instance that the Lyapunov spectrum is simple and the cocycle is irreducible?
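To make the dependence on the pair $(\mu, A)$ concrete for a Bernoulli cocycle over a finite alphabet, here is a minimal Monte Carlo sketch in Python. It reuses the two matrices $X_\varepsilon$ and $Y$ of the testing example from Sect. 7.3 and lets the probability vector $\mu$ vary. The function names, the starting vector, the seed and the sample sizes are hypothetical choices made for illustration, and the quantity computed is the growth rate of one vector along one random product, which only approximates the top Lyapunov exponent.

```python
import math
import random

def x_eps(eps):
    """The matrix X_eps = [[sin eps, -cos eps], [cos eps, sin eps]]."""
    return ((math.sin(eps), -math.cos(eps)), (math.cos(eps), math.sin(eps)))

# The diagonal matrix Y = diag(2, 1/2).
Y = ((2.0, 0.0), (0.0, 0.5))

def bernoulli_growth_rate(matrices, probs, n, seed=0):
    """Return (1/n) log |M_n ... M_1 v| for i.i.d. matrices M_j.

    Each M_j is drawn from `matrices` with the probabilities `probs`
    (the vector mu); v is a fixed unit vector away from both axes.
    """
    rng = random.Random(seed)
    v = (math.cos(1.0), math.sin(1.0))
    total = 0.0
    for _ in range(n):
        (a, b), (c, d) = rng.choices(matrices, weights=probs)[0]
        w = (a * v[0] + b * v[1], c * v[0] + d * v[1])
        norm = math.hypot(w[0], w[1])  # nonzero: both matrices are invertible
        total += math.log(norm)
        v = (w[0] / norm, w[1] / norm)
    return total / n

# Vary both the matrix parameter eps and the probability p of choosing X_eps.
for eps in (0.0, 0.05, 0.2):
    for p in (0.3, 0.5):
        rate = bernoulli_growth_rate([x_eps(eps), Y], [p, 1.0 - p], 50000)
        print(f"eps = {eps:.2f}, p = {p:.1f}: growth rate ~ {rate:.4f}")
```

Statistical fluctuations of such an estimate decay slowly in the number of iterates, so an experiment of this kind can suggest how the exponent behaves near $\varepsilon = 0$ but cannot settle the modulus of continuity.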


Question 12 Consider the space $\mathcal{C}_m$ of measurable functions $A : \Sigma^{\mathbb{Z}} \to \mathrm{GL}(m, \mathbb{R})$ which depend only on the first two coordinates $x_0$ and $x_1$. For some appropriate topology on the space $\mathcal{M}(\Sigma)$ of stochastic kernels on $\Sigma$, is there some quantitative joint continuity of the LE in $(K, A) \in \mathcal{M}(\Sigma) \times \mathcal{C}_m$, assuming for instance that the Lyapunov spectrum is simple and the cocycle is irreducible?

Some of these problems will be addressed in future projects. We believe that the work in this monograph can be easily adapted to obtain positive answers to Questions 1, 9, 11 and 12.

References

1. A. Ávila, Global theory of one-frequency Schrödinger operators. Acta Math. 215(1), 1–54 (2015). MR 3413976
2. A. Ávila, S. Jitomirskaya, C. Sadel, Complex one-frequency cocycles. J. Eur. Math. Soc. (JEMS) 16(9), 1915–1935 (2014). MR 3273312
3. L. Backes, A note on the continuity of Oseledets subspaces for fiber-bunched cocycles (2015), 1–6 (preprint)
4. L. Backes, A.W. Brown, C. Butler, Continuity of Lyapunov exponents for cocycles with invariant holonomies (2015), 1–34 (preprint)
5. J. Bochi, Discontinuity of the Lyapunov exponent for non-hyperbolic cocycles (1999), 1–14 (preprint)
6. J. Bochi, Genericity of zero Lyapunov exponents. Ergodic Theor. Dynam. Syst. 22(6), 1667–1696 (2002). MR 1944399 (2003m:37035)
7. J. Bochi, M. Viana, Uniform (projective) hyperbolicity or no hyperbolicity: a dichotomy for generic conservative maps. Ann. Inst. H. Poincaré Anal. Non Linéaire 19(1), 113–123 (2002). MR 1902547 (2003f:37040)
8. C. Bocker-Neto, M. Viana, Continuity of Lyapunov exponents for random 2d matrices (2010), 1–38 (to appear in Ergodic Theory and Dynamical Systems) (preprint)
9. J. Bourgain, Hölder regularity of integrated density of states for the almost Mathieu operator in a perturbative regime. Lett. Math. Phys. 51(2), 83–118 (2000). MR 1774640 (2003a:47072)
10. J. Bourgain, Green's function estimates for lattice Schrödinger operators and applications. Annals of Mathematics Studies, vol. 158 (Princeton University Press, Princeton, NJ, 2005). MR 2100420 (2005j:35184)
11. J. Bourgain, Positivity and continuity of the Lyapounov exponent for shifts on $\mathbb{T}^d$ with arbitrary frequency vector and real analytic potential. J. Anal. Math. 96, 313–355 (2005). MR 2177191 (2006i:47064)
12. J. Bourgain, On the Furstenberg measure and density of states for the Anderson-Bernoulli model at small disorder. J. Anal. Math. 117, 273–295 (2012). MR 2944098
13. J. Bourgain, An application of group expansion to the Anderson-Bernoulli model. Geom. Funct. Anal. 24(1), 49–62 (2014). MR 3177377
14. J. Bourgain, S. Jitomirskaya, Anderson localization for the band model. Geometric Aspects of Functional Analysis, Lecture Notes in Mathematics, vol. 1745 (Springer, Berlin, 2000), pp. 67–79. MR 1796713 (2002d:81053)
15. J. Bourgain, S. Jitomirskaya, Continuity of the Lyapunov exponent for quasiperiodic operators with analytic potential. J. Statist. Phys. 108(5–6), 1203–1218 (2002). Dedicated to David Ruelle and Yasha Sinai on the occasion of their 65th birthdays
16. J. Bourgain, M. Goldstein, W. Schlag, Anderson localization for Schrödinger operators on $\mathbb{Z}$ with potentials given by the skew-shift. Comm. Math. Phys. 220(3), 583–621 (2001). MR 1843776 (2002g:81026)
17. J. Bourgain, W. Schlag, Anderson localization for Schrödinger operators on $\mathbb{Z}$ with strongly mixing potentials. Comm. Math. Phys. 215(1), 143–175 (2000). MR 1800921 (2002d:81054)
18. J. Chapman, G. Stolz, Localization for random block operators related to the XY spin chain. Ann. Henri Poincaré 16(2), 405–435 (2015). MR 3302603
19. W. Craig, B. Simon, Log Hölder continuity of the integrated density of states for stochastic Jacobi matrices. Comm. Math. Phys. 90(2), 207–218 (1983). MR 714434 (85k:47012)
20. D. Damanik, Schrödinger operators with dynamically defined potentials: a survey (2015), 1–80 (to appear in Ergodic Theory and Dynamical Systems) (preprint)
21. P. Duarte, S. Klein, Continuity of the Lyapunov exponents for quasiperiodic cocycles. Comm. Math. Phys. 332(3), 1113–1166 (2014). MR 3262622
22. P. Duarte, S. Klein, Positive Lyapunov exponents for higher dimensional quasiperiodic cocycles. Comm. Math. Phys. 332(1), 189–219 (2014). MR 3253702
23. I.Ya. Gol'dsheĭd, E. Sorets, Lyapunov exponents of the Schrödinger equation with quasi-periodic potential on a strip. Comm. Math. Phys. 145(3), 507–513 (1992). MR 1162358 (93f:39007)
24. M. Goldstein, W. Schlag, Hölder continuity of the integrated density of states for quasi-periodic Schrödinger equations and averages of shifts of subharmonic functions. Ann. of Math. (2) 154(1), 155–203 (2001). MR 1847592 (2002h:82055)
25. M. Goldstein, W. Schlag, Fine properties of the integrated density of states and a quantitative separation property of the Dirichlet eigenvalues. Geom. Funct. Anal. 18(3), 755–869 (2008). MR 2438997 (2010h:47063)
26. A. Haro, J. Puig, A Thouless formula and Aubry duality for long-range Schrödinger skew-products. Nonlinearity 26(5), 1163–1187 (2013). MR 3043377
27. S. Jitomirskaya, C.A. Marx, Analytic quasi-perodic cocycles with singularities and the Lyapunov exponent of extended Harper's model. Comm. Math. Phys. 316(1), 237–267 (2012). MR 2989459
28. S. Klein, Anderson localization for the discrete one-dimensional quasi-periodic Schrödinger operator with potential defined by a Gevrey-class function. J. Funct. Anal. 218(2), 255–292 (2005). MR 2108112 (2005m:82070)
29. S. Klein, Localization for quasiperiodic Schrödinger operators with multivariable Gevrey potential functions. J. Spectr. Theor. 4, 1–53 (2014)
30. E.C. Malheiro, M. Viana, Lyapunov exponents of linear cocycles over Markov shifts. Stoch. Dyn. 15(3), 1550020, 27 (2015). MR 3349975
31. É. Le Page, Régularité du plus grand exposant caractéristique des produits de matrices aléatoires indépendantes et applications. Annales de l'institut Henri Poincaré (B) Probabilités et Statistiques 25(2), 109–142 (1989) (in French)
32. Y. Peres, Analytic dependence of Lyapunov exponents on transition probabilities. Lyapunov Exponents (Oberwolfach, 1990), Lecture Notes in Mathematics, vol. 1486 (Springer, Berlin, 1991), pp. 64–80. MR 1178947 (94c:60116)
33. B. Simon, M. Taylor, Harmonic analysis on SL(2, R) and smoothness of the density of states in the one-dimensional Anderson model. Comm. Math. Phys. 101(1), 1–19 (1985). MR 814540 (87i:82087)
34. K. Tao, Continuity of Lyapunov exponent for analytic quasi-periodic cocycles on higher-dimensional torus. Front. Math. China 7(3), 521–542 (2012). MR 2915794
35. K. Tao, M. Voda, Hölder continuity of the integrated density of states for quasi-periodic Jacobi operators (2015), 1–19 (preprint)
36. M. Viana, Lectures on Lyapunov Exponents. Cambridge Studies in Advanced Mathematics (Cambridge University Press, 2014)
37. Y. Wang, J. You, Examples of discontinuity of Lyapunov exponent in smooth quasiperiodic cocycles. Duke Math. J. 162(13), 2363–2412 (2013). MR 3127804
38. L.-S. Young, Lyapunov exponents for some quasi-periodic cocycles. Ergodic Theor. Dynam. Syst. 17(2), 483–504 (1997). MR 1444065 (98c:58123)


© Atlantis Press and the author(s) 2016 P. Duarte and S. Klein, Lyapunov Exponents of Linear Cocycles, Atlantis Studies in Dynamical Systems 3, DOI 10.2991/978-94-6239-124-6
