

CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS 155 Editorial Board B. BOLLOBÁS, W. FULTON, A. KATOK, F. KIRWAN, P. SARNAK, B. SIMON, B. TOTARO

MARTINGALES IN BANACH SPACES

This book is focused on the major applications of martingales to the geometry of Banach spaces, but a substantial discussion of harmonic analysis in Banach space valued Hardy spaces is also presented. Exciting links between super-reflexivity and some metric spaces related to computer science are covered, as is an outline of the recently developed theory of non-commutative martingales, which has natural connections with quantum physics and quantum information theory. Requiring few prerequisites and providing fully detailed proofs for the main results, this self-contained study is accessible to graduate students with basic knowledge of real and complex analysis and functional analysis. Chapters can be read independently, each building from introductory notes, and the diversity of topics included also means this book can serve as the basis for a variety of graduate courses.

Gilles Pisier was a professor at the University of Paris VI from 1981 to 2010 and has been Emeritus Professor since then. He has been a distinguished professor and holder of the Owen Chair in Mathematics at Texas A&M University since 1985. His international prizes include the Salem Prize in harmonic analysis (1979), the Ostrowski Prize (1997), and the Stefan Banach Medal (2001). He is a member of the Paris Académie des sciences, a Foreign Member of the Polish and Indian Academies of Science, and a Fellow of both the IMS and the AMS. He is also the author of several books, notably The Volume of Convex Bodies and Banach Space Geometry (1989) and Introduction to Operator Space Theory (2002), both published by Cambridge University Press.

CAMBRIDGE STUDIES IN ADVANCED MATHEMATICS Editorial Board: B. Bollobás, W. Fulton, A. Katok, F. Kirwan, P. Sarnak, B. Simon, B. Totaro All the titles listed below can be obtained from good booksellers or from Cambridge University Press. For a complete series listing, visit: www.cambridge.org/mathematics. Already published 116 D. Applebaum Lévy processes and stochastic calculus (2nd Edition) 117 T. Szamuely Galois groups and fundamental groups 118 G. W. Anderson, A. Guionnet & O. Zeitouni An introduction to random matrices 119 C. Perez-Garcia & W. H. Schikhof Locally convex spaces over non-Archimedean valued fields 120 P. K. Friz & N. B. Victoir Multidimensional stochastic processes as rough paths 121 T. Ceccherini-Silberstein, F. Scarabotti & F. Tolli Representation theory of the symmetric groups 122 S. Kalikow & R. McCutcheon An outline of ergodic theory 123 G. F. Lawler & V. Limic Random walk: A modern introduction 124 K. Lux & H. Pahlings Representations of groups 125 K. S. Kedlaya p-adic differential equations 126 R. Beals & R. Wong Special functions 127 E. de Faria & W. de Melo Mathematical aspects of quantum field theory 128 A. Terras Zeta functions of graphs 129 D. Goldfeld & J. Hundley Automorphic representations and L-functions for the general linear group, I 130 D. Goldfeld & J. Hundley Automorphic representations and L-functions for the general linear group, II 131 D. A. Craven The theory of fusion systems 132 J. Väänänen Models and games 133 G. Malle & D. Testerman Linear algebraic groups and finite groups of Lie type 134 P. Li Geometric analysis 135 F. Maggi Sets of finite perimeter and geometric variational problems 136 M. Brodmann & R. Y. Sharp Local cohomology (2nd Edition) 137 C. Muscalu & W. Schlag Classical and multilinear harmonic analysis, I 138 C. Muscalu & W. Schlag Classical and multilinear harmonic analysis, II 139 B. Helffer Spectral theory and its applications 140 R. Pemantle & M. C. Wilson Analytic combinatorics in several variables 141 B. Branner & N. Fagella Quasiconformal surgery in holomorphic dynamics 142 R. M. Dudley Uniform central limit theorems (2nd Edition) 143 T. Leinster Basic category theory 144 I. Arzhantsev, U. Derenthal, J. Hausen & A. Laface Cox rings 145 M. Viana Lectures on Lyapunov exponents 146 J.-H. Evertse & K. Gy˝ory Unit equations in Diophantine number theory 147 A. Prasad Representation theory 148 S. R. Garcia, J. Mashreghi & W. T. Ross Introduction to model spaces and their operators 149 C. Godsil & K. Meagher Erd˝os–Ko–Rado theorems: Algebraic approaches 150 P. Mattila Fourier analysis and Hausdorff dimension 151 M. Viana & K. Oliveira Foundations of ergodic theory 152 V. I. Paulsen & M. Raghupathi An introduction to the theory of reproducing kernel Hilbert spaces 153 R. Beals & R. Wong Special functions and orthogonal polynomials (2nd Edition) 154 V. Jurdjevic Optimal control and geometry: Integrable systems 155 G. Pisier Martingales in Banach Spaces

Martingales in Banach Spaces GILLES PISIER Texas A&M University

University Printing House, Cambridge CB2 8BS, United Kingdom Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning and research at the highest international levels of excellence. www.cambridge.org Information on this title: www.cambridge.org/9781107137240

© Gilles Pisier 2016 This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press. First published 2016 Printed in the United Kingdom by Clays, St Ives plc A catalogue record for this publication is available from the British Library ISBN 978-1-107-13724-0 Hardback Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication, and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.

Contents

Introduction
Description of the contents

1 Banach space valued martingales
1.1 Banach space valued Lp-spaces
1.2 Banach space valued conditional expectation
1.3 Martingales: basic properties
1.4 Examples of filtrations
1.5 Stopping times
1.6 Almost sure convergence: Maximal inequalities
1.7 Independent increments
1.8 Phillips's theorem
1.9 Reverse martingales
1.10 Continuous time*
1.11 Notes and remarks

2 Radon-Nikodým property
2.1 Vector measures
2.2 Martingales, dentability and the Radon-Nikodým property
2.3 The dual of Lp(B)
2.4 Generalizations of Lp(B)
2.5 The Krein-Milman property
2.6 Edgar's Choquet theorem
2.7 The Lewis-Stegall theorem
2.8 Notes and remarks

3 Harmonic functions and RNP
3.1 Harmonicity and the Poisson kernel
3.2 The hp spaces of harmonic functions on D
3.3 Non-tangential maximal inequalities: boundary behaviour
3.4 Harmonic functions and RNP
3.5 Brownian martingales*
3.6 Notes and remarks

4 Analytic functions and ARNP
4.1 Subharmonic functions
4.2 Outer functions and Hp(D)
4.3 Banach space valued Hp-spaces for 0 < p ≤ ∞
4.4 Analytic Radon-Nikodým property
4.5 Hardy martingales and Brownian motion*
4.6 B-valued hp and Hp over the half-plane U*
4.7 Further complements*
4.8 Notes and remarks

5 The UMD property for Banach spaces
5.1 Martingale transforms (scalar case): Burkholder's inequalities
5.2 Square functions for B-valued martingales: Kahane's inequalities
5.3 Definition of UMD
5.4 Gundy's decomposition
5.5 Extrapolation
5.6 The UMD1 property: Burgess Davis decomposition
5.7 Examples: UMD implies super-RNP
5.8 Dyadic UMD implies UMD
5.9 The Burkholder-Rosenthal inequality
5.10 Stein inequalities in UMD spaces
5.11 Burkholder's geometric characterization of UMD space
5.12 Appendix: hypercontractivity on {−1, 1}
5.13 Appendix: Hölder-Minkowski inequality
5.14 Appendix: basic facts on weak-Lp
5.15 Appendix: reverse Hölder principle
5.16 Appendix: Marcinkiewicz theorem
5.17 Appendix: exponential inequalities and growth of Lp-norms
5.18 Notes and remarks

6 The Hilbert transform and UMD Banach spaces
6.1 Hilbert transform: HT spaces
6.2 Bourgain's transference theorem: HT implies UMD
6.3 UMD implies HT
6.4 UMD implies HT (with stochastic integrals)*
6.5 Littlewood-Paley inequalities in UMD spaces
6.6 The Walsh system Hilbert transform
6.7 Analytic UMD property*
6.8 UMD operators*
6.9 Notes and remarks

7 Banach space valued H1 and BMO
7.1 Banach space valued H1 and BMO: Fefferman's duality theorem
7.2 Atomic B-valued H1
7.3 H1, BMO and atoms for martingales
7.4 Regular filtrations
7.5 From dyadic BMO to classical BMO
7.6 Notes and remarks

8 Interpolation methods (complex and real)
8.1 The unit strip
8.2 The complex interpolation method
8.3 Duality for the complex method
8.4 The real interpolation method
8.5 Real interpolation between Lp-spaces
8.6 The K-functional for (L1(B0), L∞(B1))
8.7 Real interpolation between vector valued Lp-spaces
8.8 Duality for the real method
8.9 Reiteration for the real method
8.10 Comparing the real and complex methods
8.11 Symmetric and self-dual interpolation pairs
8.12 Notes and remarks

9 The strong p-variation of scalar valued martingales
9.1 Notes and remarks

10 Uniformly convex Banach space valued martingales
10.1 Uniform convexity
10.2 Uniform smoothness
10.3 Uniform convexity and smoothness of Lp
10.4 Type, cotype and UMD
10.5 Square function inequalities in q-uniformly convex and p-uniformly smooth spaces
10.6 Strong p-variation, uniform convexity and smoothness
10.7 Notes and remarks

11 Super-reflexivity
11.1 Finite representability and super-properties
11.2 Super-reflexivity and inequalities for basic sequences
11.3 Uniformly non-square and J-convex spaces
11.4 Super-reflexivity and uniform convexity
11.5 Strong law of large numbers and super-reflexivity
11.6 Complex interpolation: θ-Hilbertian spaces
11.7 Complex analogues of uniform convexity*
11.8 Appendix: ultrafilters, ultraproducts
11.9 Notes and remarks

12 Interpolation between strong p-variation spaces
12.1 The spaces vp(B), Wp(B) and Wp,q(B)
12.2 Duality and quasi-reflexivity
12.3 The intermediate spaces up(B) and vp(B)
12.4 Lq-spaces with values in vp and Wp
12.5 Some applications
12.6 K-functional for (vr(B), ℓ∞(B))
12.7 Strong p-variation in approximation theory
12.8 Notes and remarks

13 Martingales and metric spaces
13.1 Exponential inequalities
13.2 Concentration of measure
13.3 Metric characterization of super-reflexivity: trees
13.4 Another metric characterization of super-reflexivity: diamonds
13.5 Markov type p and uniform smoothness
13.6 Notes and remarks

14 An invitation to martingales in non-commutative Lp-spaces*
14.1 Non-commutative probability space
14.2 Non-commutative Lp-spaces
14.3 Conditional expectations: non-commutative martingales
14.4 Examples
14.5 Non-commutative Khintchin inequalities
14.6 Non-commutative Burkholder inequalities
14.7 Non-commutative martingale transforms
14.8 Non-commutative maximal inequalities
14.9 Martingales in operator spaces
14.10 Notes and remarks

Bibliography
Index

Introduction

Martingales (with discrete time) lie at the centre of this book. They are known to have major applications to virtually every corner of probability theory. Our central theme is their applications to the geometry of Banach spaces. We should emphasize that we do not assume any knowledge about scalar valued martingales. Actually, the beginning of this book gives a self-contained introduction to the basic martingale convergence theorems, for which the use of the norm of a vector valued random variable instead of the modulus of a scalar one makes little difference. Only when we consider the 'boundedness implies convergence' phenomenon does it start to matter. Indeed, this requires the Banach space B to have the Radon-Nikodým property (RNP). But even at this point, the reader who wishes to concentrate on the scalar case could simply assume that B is finite-dimensional and disregard all the infinite-dimensional technical points. The structure of the proofs remains pertinent if one does so. In fact, it may be good advice for a beginner to do a first reading in this way.

One could argue similarly about the property of 'unconditionality of martingale differences' (UMD): although perhaps the presence of a Banach space norm is more disturbing there, our reader could assume at first reading that B is a Hilbert space, thus getting rid of a number of technicalities to which one can return later.

A major feature of the UMD property is its equivalence to the boundedness of the Hilbert transform (HT). Thus we include a substantial excursion in (Banach space valued) harmonic analysis to explain this. Actually, connections with harmonic analysis abound in this book, as we include a rather detailed exposition of the boundary behaviour of B-valued harmonic (resp. analytic) functions in connection with the RNP (resp. analytic RNP) of the Banach space B. We introduce the corresponding B-valued Hardy spaces in analogy with their probabilistic counterparts. We are partly motivated


by the important role they play in operator theory, when one takes for B the space of bounded operators (or the Schatten p-class) on a Hilbert space. Hardy spaces are closely linked with martingales via Brownian motion: indeed, for any B-valued bounded harmonic (resp. analytic) function u on the unit disc D, the composition (u(W_{t∧T}))_{t>0} of u with Brownian motion stopped before it exits D is an example of a continuous B-valued martingale, and its boundary behaviour depends in general on whether B has the RNP (resp. analytic RNP). We describe this connection with Brownian motion in detail, but we refrain from going too far on that road, remaining faithful to our discrete time emphasis. However, we include short sections summarizing just what is necessary to understand the connections with Brownian martingales in the Banach valued context, together with pointers to the relevant literature.

In general, the sections that are a bit far off our main goals are marked by an asterisk. For instance, we describe in §7.1 the Banach space valued version of Fefferman's duality theorem between H1 and BMO. While this is not really part of martingale theory, the interplay with martingales, both historically and heuristically, is so obvious that we felt we had to include it. The asterisked sections could be kept for a second reading.

In addition to the RN and UMD properties, our third main theme is super-reflexivity and its connections with uniform convexity and smoothness. Roughly, we relate the geometric properties of a Banach space B with the study of the p-variation
$$S_p(f) = \Big(\sum_{n=1}^{\infty} \|f_n - f_{n-1}\|_B^p\Big)^{1/p}$$

of B-valued martingales (f_n). Depending on whether S_p(f) ∈ L_p is necessary or sufficient for the convergence of (f_n) in L_p(B), we can find an equivalent norm on B with modulus of uniform convexity (resp. smoothness) 'at least as good as' the function t ↦ t^p. We also consider the strong p-variation
$$V_p(f) = \sup_{0=n(0)<n(1)<n(2)<\cdots} \Big(\sum_{k\ge 1} \|f_{n(k)} - f_{n(k-1)}\|_B^p\Big)^{1/p}.$$

[…] $\inf_{\varepsilon>0} \delta(\varepsilon)\,\varepsilon^{-q} > 0$ (resp. $\sup_{t>0} \rho(t)\,t^{-p} < \infty$). In that case we say that the space is q-uniformly convex (resp. p-uniformly smooth). The proof also uses inequalities going back to Gurarii, James and Lindenstrauss on monotone basic sequences. We apply the latter to martingale difference sequences viewed as monotone basic sequences in L_p(B). Our treatment of uniform smoothness in §10.2 runs parallel to that of uniform convexity in §10.1. In §10.3 we estimate the moduli of uniform convexity and smoothness of L_p for 1 < p < ∞. In particular, L_p is p-uniformly convex if 2 ≤ p < ∞ and p-uniformly smooth if 1 < p ≤ 2.
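The moduli mentioned here can be estimated directly in small examples. As a concrete illustration (not taken from the book), the following sketch approximates the modulus of uniform convexity δ(ε) = inf{1 − ‖(x+y)/2‖ : ‖x‖ = ‖y‖ = 1, ‖x−y‖ ≥ ε} of the two-dimensional ℓ_p spaces by brute-force sampling of the unit sphere; for p ≥ 2 the numbers are consistent with the power-type behaviour δ(ε) ≍ ε^p alluded to above. The grid resolution is arbitrary and the output is only an approximation.

```python
import numpy as np

def modulus_of_convexity(p, eps, n_grid=800):
    """Crude estimate of delta(eps) for X = (R^2, ||.||_p), by sampling the unit sphere."""
    theta = np.linspace(0, 2 * np.pi, n_grid, endpoint=False)
    pts = np.stack([np.cos(theta), np.sin(theta)], axis=1)
    pts /= np.linalg.norm(pts, ord=p, axis=1, keepdims=True)   # points on the unit sphere of ell_p^2
    best = 1.0
    for x in pts:
        dist = np.linalg.norm(pts - x, ord=p, axis=1)
        far = pts[dist >= eps]                                  # partners y with ||x - y|| >= eps
        if len(far):
            best = min(best, 1 - np.linalg.norm((far + x) / 2, ord=p, axis=1).max())
    return best

for p in (2, 3, 4):
    print(p, [round(modulus_of_convexity(p, e), 4) for e in (0.5, 1.0, 1.5)])
```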


In §10.5 we prove analogues of Burkholder's inequalities, but with the square function now replaced by
$$S_p(f) = \Big(\|f_0\|_B^p + \sum_{n=1}^{\infty} \|f_n - f_{n-1}\|_B^p\Big)^{1/p}.$$

Unfortunately, the results are now only one sided: if B satisfies (8) (resp. (9)), then Sq ( f )r is dominated by (resp. S p ( f )r dominates)  f Lr (B) for all 1 < r < ∞, but here p ≤ 2 ≤ q and the case p = q is reduced to the Hilbert space case. In §10.6 we return to the strong p-variation and prove analogous results to the preceding ones, but this time with Wq ( f ) and Wp ( f ) in place of Sq ( f ) and S p ( f ) and 1 < p < 2 < q < ∞. The technique here is similar to that used for the scalar case in Chapter 9. Chapter 11 is devoted to super-reflexivity. A Banach space B is superreflexive if every space that is finitely representable in B is reflexive. In §11.1 we introduce finite representability and general super-properties in connection with ultraproducts. We include some background about the latter in an appendix to Chapter 11. In §11.2 we concentrate on super-P when P is either ‘reflexivity’ or the RNP. We prove that super-reflexivity is equivalent to the super-RNP (see Theorem 11.11). We give (see Theorem 11.10) a fundamental characterization of reflexivity, from which one can also derive easily (see Theorem 11.22) one of super-reflexivity. As in the preceding chapter, we replace B by L2 (B) and view martingale difference sequences as monotone basic sequences in L2 (B). Then we deduce the martingale inequalities from those satisfied by general basic sequences in super-reflexive spaces. For that purpose, we review a number of results about basic sequences that are not always directly related to our approach. For instance, we prove the classical fact that a Banach space with a basis is reflexive iff the basis is both boundedly complete and shrinking. While we do not directly use this, it should help the reader understand why super-reflexivity   implies inequalities of the form either ( xn q )1/q ≤ C xn  for q < ∞ or    xn  ≤ ( xn  p )1/p for p > 1 (see (iv) and (v) in Theorem 11.22). Indeed, they can be interpreted as a strong form of ‘boundedly complete’ for the first one and of ‘shrinking’ for the second one. In §11.3 we show that uniformly non-square Banach spaces are reflexive, and hence automatically super-reflexive (see Theorem 11.24 and Corollary 11.26). More generally, we go on to prove that B is super-reflexive iff it is J-convex, or equivalently iff it is J-(n, ε) convex for some n ≥ 2 and some ε > 0. We say that B is J-(n, ε) convex if, for any n-tuple (x1 , . . . , xn ) in the unit ball of B,


there is an integer j = 1, . . . , n such that
$$\Big\| \sum_{i<j} x_i \;-\; \sum_{i\ge j} x_i \Big\| \le n(1-\varepsilon).$$
When n = 2, we recover the notion of 'uniformly non-square'. The implication super-reflexive ⇒ J-convex is rather easy to derive (as we do in Corollary 11.34) from the fundamental reflexivity criterion stated as Theorem 11.10. The converse implication (due to James) is much more delicate. We prove it following essentially the Brunel-Sucheston approach ([122]), which in our opinion is much easier to grasp. This construction shows that a non-super-reflexive (or merely non-reflexive) space B contains very extreme finite-dimensional structures that constitute obstructions to either reflexivity or the RNP. For instance, any such B admits a space B̂ finitely representable in B for which there is a dyadic martingale (f_n) with values in the unit ball of B̂ such that
$$\forall n \ge 1 \qquad \|f_n - f_{n-1}\|_{\hat B} \equiv 1.$$
Thus the unit ball of B̂ contains an extremely sparsely separated infinite dyadic tree. (See Remark 1.35 for concrete examples of such trees.)

In §11.4 we finally connect super-reflexivity and uniform convexity. We prove that B is super-reflexive iff it is isomorphic to either a uniformly convex space, or a uniformly smooth one, or a uniformly non-square one. By the preceding Chapter 10, we already know that the renormings can be achieved with moduli of convexity and smoothness of 'power type'. Using interpolation (see Proposition 11.44), we can even obtain a renorming that is both p-uniformly smooth and q-uniformly convex for some 1 < p, q < ∞, but it is still open whether this holds with the optimal choice of p > 1 and q < ∞. To end Chapter 11, we give a characterization of super-reflexivity by the validity of a version of the strong law of large numbers for B-valued martingales.

In §11.6 we discuss the stability of super-reflexivity (as well as uniform convexity and smoothness) by interpolation. We also show that a Banach lattice is super-reflexive iff it is isomorphic, for some θ > 0, to an interpolated space (B0, B1)θ where B1 is a Hilbert space (and B0 is arbitrary). Such spaces are called θ-Hilbertian.

In §11.7 we discuss the various complex analogues of uniform convexity: when restricted to analytic (or Hardy, or PSH) martingales, the martingale inequalities characterizing p-uniform convexity lead to several variants of uniform convexity, where roughly convex functions are replaced by plurisubharmonic ones. Of course, this subject is connected to the analytic RNP and analytic UMD, but many questions remain unanswered.


In Chapter 12 we study the real interpolation spaces (v_1, ℓ_∞)_{θ,q} between the space v_1 of sequences with bounded variation and the space ℓ_∞ of bounded sequences. Explicitly, v_1 (resp. ℓ_∞) is the space of scalar sequences (x_n) such that $\sum_{1}^{\infty}|x_n - x_{n-1}| < \infty$ (resp. $\sup|x_n| < \infty$), equipped with its natural norm. The inclusion v_1 → ℓ_∞ plays a major part (perhaps behind the scene) in our treatment of (super-)reflexivity in Chapter 11. Indeed, by the fundamental Theorem 11.10, B is non-reflexive iff the inclusion J : v_1 → ℓ_∞ factors through B, i.e. it admits a factorization
$$v_1 \xrightarrow{\;a\;} B \xrightarrow{\;b\;} \ell_\infty,$$
with bounded linear maps a, b such that J = ba. The remarkable work of James on J-convexity (described in Chapter 11) left open an important point: whether any Banach space B such that ℓ_1^n is not finitely representable in B (i.e. is not almost isometrically embeddable in B) must be reflexive. James proved that the answer is yes if n = 2, but for n > 2, this remained open until James himself settled it [281] by a counter-example for n = 3 (see also [283] for simplifications). In the theory of type (and cotype), it is the same to say that, for some n ≥ 2, B does not contain ℓ_1^n almost isometrically or to say that B has type p for some p > 1 (see the survey [347]). Moreover, type p can be equivalently defined by an inequality analogous to that of p-uniform smoothness, but only for martingales with independent increments. Thus it is natural to wonder whether the strongest notion of 'type p', namely type 2, implies reflexivity. In another tour de force, James [282] proved that it is not so. His example is rather complicated. However, it turns out that the real interpolation spaces W_{p,q} = (v_1, ℓ_∞)_{θ,q} (1 < p, q < ∞, 1 − θ = 1/p) provide very nice examples of the same kind. Thus, following [399], we prove in Corollary 12.19 that W_{p,q} has exactly the same type and cotype exponents as the Lorentz space ℓ_{p,q} = (ℓ_1, ℓ_∞)_{θ,q} as long as p ≠ 2, although W_{p,q} is not reflexive since it lies between v_1 and ℓ_∞. The singularity at p = 2 is necessary since (unlike ℓ_2 = ℓ_{2,2}) the space W_{2,2}, being non-reflexive, cannot have both type 2 and cotype 2, since that would force it to be isomorphic to Hilbert space. A key idea is to consider similarly the B-valued spaces W_{p,q}(B) = (v_1(B), ℓ_∞(B))_{θ,q}, where B can be an arbitrary Banach space. When p = q, we set W_p(B) = W_{p,p}(B). We can derive the type and cotype of W_{p,q} in two ways. The first one proves that the vector valued spaces W_{p,q}(L_r) satisfy the same kind of 'Hölder-Minkowski' inequality as the Lorentz spaces ℓ_{p,q}, with the only exception of


p = r. This is the substance of Corollary 12.18 (and Corollary 12.27): we have the inclusion
$$W_p(L_r) \subset L_r(W_p) \quad \text{if} \quad r > p,$$

and the reverse inclusion if p < r. Another way to prove this (see Remark 12.28) goes through estimates of the K-functional for the pairs (v 1 , ∞ ) and (v r , ∞ ) for 1 < r < ∞ (see Lemma 12.24). Indeed, by the reiteration theorem, we may identify (v 1 , ∞ )θ,q and (v r , ∞ )θ ,q if 1 − θ = (1 − θ )/r, and similarly in the Banach space valued case (see Theorem 12.25), we have (v 1 (B), ∞ (B))θ,q = (v r (B), ∞ (B))θ ,q . We also use reiteration in Theorem 12.14 to describe the space (v r , ∞ )θ,q for 0 < r < 1. In the final Theorem 12.31, we give an alternate description of W p = W p,p , which should convince the reader that it is a very natural space (this is closely connected to ‘splines’ in approximation theory). The description  is as follows: a sequence x = (xn )n belongs to W p iff N SN (x) p < ∞, where SN (x) is the distance in ∞ of x from the subspace of all sequences (yn ) such that card{n | |yn − yn−1 | = 0} ≤ N. In Chapter 12 we also include a discussion of the classical James space (usually denoted by J), which we denote by v 20 . The spaces W p,q are in many ways similar to the James space; in particular, if 1 < p, q < ∞, they are of co-dimension 1 in their bidual (see Remark 12.8). In §13.1 and §13.2 we present applications of a certain exponential inequality (due to Azuma) to concentration of measure for the symmetric groups and for the Hamming cube. In Chapter 13 we give two characterizations of super-reflexive Banach spaces by properties of the underlying metric spaces. The relevant properties involve finite metric spaces. Given a sequence T = (Tn , dn ) of finite metric spaces, we say that the sequence T embeds Lipschitz uniformly in a metric space (T, d) if, n ⊂ T and bijective mappings fn : Tn → for some constant C, there are subsets T  Tn with Lipschitz norms satisfying sup  fn Lip  fn−1 Lip < ∞. n

Consider for instance the case when Tn is a finite dyadic tree restricted to its first 1 + 2 + · · · + 2n = 2n+1 − 1 points viewed as a graph and equipped with the usual geodesic distance. In Theorem 13.10 we prove following [138] that a Banach space B is super-reflexive iff it does not contain the sequence of these dyadic trees Lipschitz uniformly. More recently (cf. [289]), it was proved that


the trees can be replaced in this result by the ‘diamond graphs’. We describe the analogous characterization with diamond graphs in §13.4. In §13.5 we discuss several non-linear notions of ‘type p’ for metric spaces, notably the notion of Markov type p, and we prove the recent result from [357] that p-uniformly smooth implies Markov type p. The proof uses martingale inequalities for martingales naturally associated to Markov chains on finite state spaces. In the final chapter, Chapter 14, we briefly outline the recent developments of ‘non-commutative’ martingale inequalities initiated in [401]. This is the subject of a second volume titled Martingales in Non-Commutative L p -Spaces to follow the present one (to be published on the author’s web page). We only give a glimpse of what this is about. There the probability measure is replaced by a standard normalized trace τ on a von Neumann algebra A, the filtration becomes an increasing sequence (An ) of von Neumann subalgebras of A, the space L p (τ ) is now defined as the completion of A equipped with the norm x → (τ (|x| p ))1/p and the conditional expectation En with respect to An is the orthogonal projection from L2 (τ ) onto the closure of An in L2 (τ ). Although there is no analogue of the maximal function (in fact, there are no functions at all !), it turns out that there is a satisfactory non-commutative analogue (by duality) of Doob’s inequality (see [292]). Moreover, Gundy’s decomposition can be extended to this framework (see [373, 410, 411]). Thus one obtains a version of Burkholder’s martingale transform inequalities. In other words, martingale difference sequences in non-commutative L p -spaces (1 < p < ∞) are unconditional. This implies in particular that the latter spaces are UMD as Banach spaces. The BurkholderRosenthal and Stein inequalities all have natural generalizations to this setting.

1 Banach space valued martingales

We start by recalling the definition and basic properties of conditional expectations. Let (Ω, A, P) be a probability space and let B ⊂ A be a σ-subalgebra. On L_2(Ω, A, P), the conditional expectation E^B can be defined simply as the orthogonal projection onto the subspace L_2(Ω, B, P). One then shows that it extends to a positive contraction on L_p(Ω, A, P) for all 1 ≤ p ≤ ∞, taking values in L_p(Ω, B, P). The resulting operator E^B : L_p(Ω, A, P) → L_p(Ω, B, P) is a linear projection and is characterized by the property
$$\forall f \in L_p(\Omega,\mathcal{A},\mathbb{P})\quad \forall h \in L_\infty(\Omega,\mathcal{B},\mathbb{P})\qquad \mathbb{E}^{\mathcal{B}}(hf) = h\,\mathbb{E}^{\mathcal{B}}(f). \tag{1.1}$$

Here ‘positive’ really means positivity preserving, i.e. for any f ∈ L p , f ≥ 0 ⇒ T ( f ) ≥ 0. As usual, we often abbreviate ‘almost everywhere’ by ‘a.e.’ and ‘almost surely’ by ‘a.s.’ We will now consider conditional expectation operators on Banach space valued L p -spaces.
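Before passing to the vector valued case, here is a minimal numerical sketch (ours, not the book's) of the scalar situation on a finite probability space: the conditional expectation with respect to a sub-σ-algebra generated by a partition is just averaging over its atoms, it is a projection, it is orthogonal in L_2, and it satisfies the module property (1.1). The partition and the test functions below are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8                                    # Omega = {0, ..., 7} with uniform probability
P = np.full(n, 1 / n)
atoms = [[0, 1, 2], [3, 4], [5, 6, 7]]   # partition generating the sub-sigma-algebra B

def cond_exp(f):
    """E^B f: on each atom of B, replace f by its P-average over that atom."""
    g = np.empty_like(f, dtype=float)
    for A in atoms:
        g[A] = np.dot(P[A], f[A]) / P[A].sum()
    return g

f = rng.normal(size=n)
h = np.array([2.0, 2.0, 2.0, -1.0, -1.0, 5.0, 5.0, 5.0])   # B-measurable: constant on atoms

# Module ("pull-out") property (1.1): E^B(h f) = h E^B(f)
assert np.allclose(cond_exp(h * f), h * cond_exp(f))
# E^B is a projection: applying it twice changes nothing
assert np.allclose(cond_exp(cond_exp(f)), cond_exp(f))
# Orthogonality in L2: f - E^B f is orthogonal to every B-measurable function
assert abs(np.dot(P, (f - cond_exp(f)) * h)) < 1e-12
print("conditional expectation checks passed")
```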

1.1 Banach space valued Lp-spaces

Let (Ω, A, m) be a measure space. Let B be a Banach space. We will denote by F(B) the space of all measurable simple functions, i.e. the functions f : Ω → B for which there is a partition of Ω, say, Ω = A_1 ∪ · · · ∪ A_N with A_k ∈ A, and elements b_k ∈ B such that
$$\forall \omega \in \Omega \qquad f(\omega) = \sum_{k=1}^{N} 1_{A_k}(\omega)\, b_k. \tag{1.2}$$
Equivalently, F(B) is the space of all measurable functions f : Ω → B taking only finitely many values.


Definition 1.1. We will say that a function f : Ω → B is Bochner measurable if there is a sequence (f_n) in F(B) tending to f pointwise. Let 1 ≤ p ≤ ∞. We denote by L_p(Ω, A, m; B) the space of (equivalence classes of) Bochner measurable functions f : Ω → B such that ∫‖f‖_B^p dm < ∞ for p < ∞, and ess sup‖f(·)‖_B < ∞ for p = ∞. As usual, two functions that are equal a.e. are identified. We equip this space with the norm
$$\|f\|_{L_p(B)} = \Big(\int \|f\|_B^p \, dm\Big)^{1/p} \quad \text{for } p < \infty, \qquad \|f\|_{L_\infty(B)} = \operatorname{ess\,sup} \|f(\cdot)\|_B.$$
Note that for 0 < p < ∞ we have
$$\int \|f\|_B^p \, dm = \int_0^{\infty} p\, t^{p-1}\, m(\{\|f\|_B > t\})\, dt. \tag{1.3}$$

p t})dt. (1.3)  f Bp dm = 0

Of course, this definition of L p (B) coincides with the usual one in the scalar valued case, i.e. if B = R (or C). In that case, we often denote simply by L p (, A, m) (or sometimes L p (m), or even L p ) the resulting space of scalar valued functions. For brevity, we will often write simply L p (m; B), or, if there is no risk of confusion, simply L p (B), instead of L p (, A, m; B). Given ϕ1 , . . . , ϕN ∈ L p and b1 , . . . , bN ∈ B, we can define a function  f :  → B in L p (B) by setting f (ω) = N1 ϕk (ω)bk . We will denote this func tion by N1 ϕk ⊗ bk and by L p ⊗ B the subspace of L p (B) formed of all such functions. Proposition 1.2. Let 1 ≤ p < ∞. Each of the subspaces F (B) ∩ L p (B) and L p ⊗ B ⊂ L p (B) is dense in L p (B). More generally, for any subspace V ⊂ L p dense in L p , V ⊗ B is dense in L p (B). Proof. Consider f ∈ L p (B). Let fn ∈ F (B) be such that fn → f pointwise. Then  fn (·)B →  f (·)B pointwise, so that if we set gn (ω) = fn (ω) 1{ fn  ε}) → 0 ∀ε > 0. When m() = ∞, assuming m σ -finite, let w ∈ L1 (m) be such that w > 0 a.e. so that the measure m = w · m is finite and equivalent to m. We may obviously identify L0 (m; B) with L0 (m ; B). We then define the topological vector space structure of L0 (m; B) by transplanting the one just defined on L0 (m ; B). Then fn tends to f in this topology iff, for any measurable subset A ⊂  with m(A) < ∞, the restriction of fn to A converges in measure to f|A in the preceding sense (this shows in particular that this is independent of the choice of w). We then say that fn tends to f ‘in measure’ (or ‘in probability’ if m() = 1). It is sufficient (resp. necessary) for this that fn tends to f almost everywhere (resp. that ( fn ) admits a subsequence converging a.e. to f ). In conclusion, for any σ -finite measure space, and any 0 < p ≤ ∞, we have a continuous injective linear embedding L p (m; B) ⊂ L0 (m; B). All these facts are easily checked.
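As a small sanity check of these definitions (not part of the text), the following sketch computes the L_p(B) norm of a simple R²-valued function directly and then via the distribution formula (1.3); the atoms, probabilities and values are arbitrary.

```python
import numpy as np

p = 3.0
P = np.array([0.2, 0.5, 0.3])                        # m(A_1), m(A_2), m(A_3): a probability partition
b = np.array([[1.0, 2.0], [0.0, -1.5], [3.0, 4.0]])  # values b_k in B = R^2 with the Euclidean norm
norms = np.linalg.norm(b, axis=1)                    # ||f||_B equals norms[k] on the atom A_k

lhs = np.dot(P, norms ** p)                          # direct computation of the integral of ||f||_B^p

t = np.linspace(0.0, norms.max(), 20001)[1:]         # right-hand side of (1.3), crude Riemann sum
dt = t[1] - t[0]
tail = ((norms[None, :] > t[:, None]) * P).sum(axis=1)   # m({||f||_B > t})
rhs = (p * t ** (p - 1) * tail).sum() * dt

print(round(lhs, 4), round(rhs, 4))                  # the two values agree up to discretization error
```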


1.2 Banach space valued conditional expectation In particular, Proposition 1.6 applied with p = q, is valid for T = EB . For any f in L1 (, A, P; B), we denote again simply by EB ( f ) the function T ⊗ IB ( f ) B for T = E . Proposition 1.10. Let (, A, P) be a probability space. The conditional expectation f → EB ( f ) is an operator of norm 1 on L p (B) for all 1 ≤ p ≤ ∞, satisfying EB (h f ) = hEB ( f ). (1.11)

∀ f ∈ L p (, A, P; B) ∀h ∈ L∞ (, B, P)

The operator EB is a norm 1 projection from L p (, A, P; B) to L p (, B, P; B), viewed as a subspace of L p (, A, P; B). Proof. By (1.1), Proposition 1.6 applied to T = EB with p = q produces an operator P of norm 1 on L p (, A, P; B), such that P(hx) = hP(x) for any x ∈ L p (A, P) ⊗ B and h ∈ L∞ (, B, P) and also such that P(x) = x for any x ∈ L p (B, P) ⊗ B. Therefore, by density, P(hx) = hP(x) holds for any x ∈ L p (, A, P; B). Also P(x) = x holds for any x ∈ L p (B, P) ⊗ B ⊂ L p (, A, P; B). Since we may clearly identify the latter subspace with L p (, B, P; B), the statement follows. Remark 1.11. Let 1 ≤ p, p ≤ ∞ such that p−1 + p −1 = 1. Let f ∈ L p (, A, P; B). Note that g = EB ( f ) ∈ L p (, B, P; B) satisfies the following:   ∀E ∈ B gdP = f dP. (1.12) E

E

In particular, E(g) = E( f ). More generally,   ∀h ∈ L p (, B, P) hgdP = h f dP, and also

 ∀h ∈ L p (, A, P)

 hgdP =

(EB h)gdP.

(1.13)

(1.14)

Indeed (among many other ways to verify these equalities), they are easy to check by ‘scalarization’, by a simple deduction from the scalar case. More precisely, for each of these identities, we have to check the equality of two vectors in B, say, x = y, and this is the same as ξ (x) = ξ (y) for all ξ in B∗ . But we know by (1.7) that ξ , g(·) = EB ξ , f , so to check (1.12)–(1.14), it suffices to check them for B = R (or B = C in the case of complex scalars) and for all scalar valued f . But in that case, they are easy to deduce from the basic (1.1). Indeed, by the density of L∞ in L p , (1.1) remains valid when h ∈ L p , so we


derive (1.12) and (1.13) from it. To check (1.14), inverting the roles of g and h, we find by (1.12) with E =  (and with f replaced by hg) that    hgdP = EB (hg)dP = (EB h)gdP. In the sequel, we will refer to the preceding way to check a vector valued functional identity as the scalarization principle. We will sometimes use the following generalization of (1.8). Proposition 1.12. Let f ∈ L1 (, A, P; B). For any continuous convex function ϕ : B → R such that ϕ( f ) ∈ L1 , we have ϕ(EB ( f )) ≤ EB ϕ( f ).

(1.15)

EB ( f ) ≤ EB  f .

(1.16)

In particular,

Proof. We may clearly assume B separable. Recall that a real valued convex function on B is continuous iff it is locally bounded above ([82, p. 93]). By Hahn-Banach, we have ϕ(x) = supa∈C a(x), where C is the collection of all continuous real valued affine functions a on B such that a ≤ ϕ. By the separability of B, we can find a countable subcollection C such that ϕ(x) = supa∈C f (x). By (1.7), we have a(EB ( f )) = EB (a( f )) for any a ∈ B∗ , and hence, also for any affine continuous function a on B (since this is obvious for constant functions). Thus we have a(EB f ) = EB a( f ) ≤ EB ϕ( f ) for any a ∈ C , and hence, taking the sup over a ∈ C , we find (1.15). Remark 1.13. To conform with the tradition, we assumed that P is a probability in Corollary 1.10. However, essentially the same results remain valid if we merely assume that the restriction of P to B is σ -finite (or that L∞ ∩ L1 is weak*-dense in L∞ both for B and A). Indeed, we can still define EB on L2 (, A, P) as the orthogonal projection onto L2 (, B, P), and although it is rarely used, it is still true that the resulting operator on L p (, A, P) preserves positivity and extends to an operator of norm 1 on L p (, A, P; B) for all 1 ≤ p ≤ ∞, satisfying (1.11), (1.13) and (1.14). See Remark 6.20 for an illustration. Remark. By classical results due to Ron Douglas [213] and T. Ando [102], conditional expectations can be characterized as the only norm 1 projections on L p (1 ≤ p = 2 < ∞) that preserve the constant function 1. This is also true for p = 2 if one restricts to positivity-preserving operators.
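The contractivity (1.16) is easy to test numerically. Here is a minimal sketch (not from the book) on a finite probability space with an R²-valued variable and the Euclidean norm: the inequality ‖E^B(f)‖ ≤ E^B‖f‖ holds atom by atom. All data below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
P = np.full(6, 1 / 6)                       # uniform probability on 6 points
atoms = [[0, 1], [2, 3, 4], [5]]            # atoms of the sub-sigma-algebra B

def cond_exp(values):
    """E^B for an R^d-valued random variable given as an (n, d) array: average over each atom."""
    out = np.empty_like(values, dtype=float)
    for A in atoms:
        out[A] = np.average(values[A], axis=0, weights=P[A])
    return out

f = rng.normal(size=(6, 2))                 # an R^2-valued random variable
lhs = np.linalg.norm(cond_exp(f), axis=1)   # ||E^B(f)||, a B-measurable scalar variable
rhs = cond_exp(np.linalg.norm(f, axis=1)[:, None])[:, 0]   # E^B(||f||)
assert np.all(lhs <= rhs + 1e-12)           # inequality (1.16), pointwise
print(np.round(lhs, 3), np.round(rhs, 3))
```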

1.3 Martingales: basic properties


1.3 Martingales: basic properties Let B be a Banach space. Let (, A, P) be a probability space. A sequence (Mn )n≥0 in L1 (, A, P; B) is called a martingale if there exists an increasing sequence of σ -subalgebras A0 ⊂ A1 ⊂ · · · ⊂ An ⊂ · · · ⊂ A (this is called ‘a filtration’) such that for each n ≥ 0, Mn is An -measurable and satisfies Mn = EAn (Mn+1 ).

(1.17)

This implies, of course, that ∀n < m

Mn = EAn Mm .

In particular, if (Mn ) is a B-valued martingale, (1.12) yields   ∀n ≤ m ∀A ∈ An Mn dP = Mm dP. A

(1.18)

A

A sequence of random variables (Mn ) is called ‘adapted to the filtration’ (An )n≥0 if Mn is An -measurable for each n ≥ 0. Note that the martingale property Mn = EAn (Mn+1 ) automatically implies that (Mn ) is adapted to (An ). Of course, the minimal choice of a filtration to which (Mn ) is adapted is simply the filtration Mn = σ (M0 , M1 , . . . , Mn ). Moreover, if (Mn ) is a martingale in the preceding sense with respect to some filtration (An ), then it is a fortiori a martingale with respect to (Mn ). Indeed, we have obviously Mn ⊂ An for all n, and hence applying EMn to both sides of (1.17) implies Mn = EMn (EAn Mn+1 ) = EMn Mn+1 . An adapted sequence of random variables (Mn ) is called ‘predictable’ if Mn is An−1 -measurable for each n ≥ 1. Of course, the predictable sequences of interest to us will not be martingales, since predictable martingales must form a constant sequence. We will also need the definition of a submartingale. A sequence (Mn )n≥0 of real valued random variables in L1 is called a submartingale if there are σ -subalgebras An , as previously, such that Mn is An -measurable and ∀n ≥ 0

Mn ≤ EAn Mn+1 .

This implies, of course, that ∀n < m

Mn ≤ EAn Mm .
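For the simplest concrete example, here is a minimal check (not from the book) that the partial sums of independent signs form a martingale for the filtration they generate: on each atom determined by the first n signs, averaging M_{n+1} over the remaining signs returns M_n. Everything is enumerated exhaustively, so the verification is exact.

```python
import itertools
import numpy as np

N = 6
omega = np.array(list(itertools.product([-1, 1], repeat=N)))  # all 2^N equally likely sign sequences
M = np.cumsum(omega, axis=1)                                  # column n-1 holds M_n = eps_1 + ... + eps_n

for n in range(1, N):
    for prefix in itertools.product([-1, 1], repeat=n):
        atom = np.all(omega[:, :n] == prefix, axis=1)         # an atom of A_n = sigma(eps_1, ..., eps_n)
        # E[M_{n+1} | A_n] on this atom is the plain average of M_{n+1} over the atom,
        # and it must equal the (constant) value of M_n there.
        assert np.isclose(M[atom, n].mean(), M[atom, n - 1][0])

print("martingale property verified for M_n = eps_1 + ... + eps_n")
```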

For example, if (Mn ) is a B-valued martingale in L1 (B), then, for any continuous convex function ϕ : B → R such that ϕ(Mn ) ∈ L1 for all n, the sequence (ϕ(Mn )) is a submartingale. Indeed, by (1.15), we have ϕ(Mn ) ≤ EAn ϕ(Mn+1 ).

(1.19)


A fortiori, taking the expectation of both sides, we have (for future reference) Eϕ(Mn ) ≤ Eϕ(Mn+1 ).

(1.20)

More generally, if I is any partially ordered set, then a collection (Mi )i∈I in L1 (, P; B) is called a martingale (indexed by I) if there are σ -subalgebras Ai ⊂ A such that Ai ⊂ A j whenever i < j and Mi = EAi M j . In particular, when the index set is I = {0, −1, −2, . . .}, the corresponding sequence is usually called a ‘reverse martingale’. The following convergence theorem is fundamental. Theorem 1.14. Let (An ) be a fixed increasing sequence of σ -subalgebras of A.  Let A∞ be the σ -algebra generated by n≥0 An . Let 1 ≤ p < ∞ and consider M in L p (, P; B). Let us define Mn = EAn (M). Then (Mn )n≥0 is a martingale such that Mn → EA∞ (M) in L p (, P; B) when n → ∞. Proof. Note that since An ⊂ An+1 , we have EAn EAn+1 = EAn , and similarly, EAn EA∞ = EAn . Replacing M by EA∞ M, we can assume without loss of generality that M is A∞ -measurable. We will use the following fact: the union  n L p (, An , P; B) is densein L p (, A∞ , P; B). Indeed, let C be the class of all sets A such that 1A ∈ n L∞ (, An , P), where the closure is meant in  L p (, P) (recall p < ∞). Clearly C ⊃ n≥0 An and C is a σ -algebra, hence C ⊃ A∞ . This gives the scalar case version of the preceding fact. Now, any f in L p (, A∞ , P; B) can be approximated (by definition of the spaces L p (B))  by functions of the form n1 1Ai xi with xi ∈ B and Ai ∈ A∞ . But since 1Ai ∈   n L∞ (, An , P), we clearly have f ∈ n L p (, An , P; B), as announced. We can now prove Theorem 1.14. Let ε > 0. By the preceding fact, there is an integer k and g in L p (, Ak , P; B) such that M − g p < ε. We have then g = EAn g for all n ≥ k, hence ∀n ≥ k

Mn − M = EAn (M − g) + g − M

and, finally, Mn − M p ≤ EAn (M − g) p + g − M p ≤ 2ε. This completes the proof. Definition 1.15. Let B be a Banach space and let (Mn )n≥0 be a sequence in L1 (, A, P; B). We will say that (Mn ) is uniformly integrable if the sequence of non-negative random variables (Mn (·))n≥0 is uniformly integrable. More

1.3 Martingales: basic properties


precisely, this means that (Mn ) is bounded in L1 and that, for any ε > 0, there is a δ > 0 such that  ∀A ∈ A P(A) < δ ⇒ sup Mn  < ε. n≥0

A

In the scalar valued case, it is well known that a subset of L1 is weakly relatively compact iff it is uniformly integrable. Corollary 1.16. In the scalar case (or the finite-dimensional case), every martingale that is bounded in L p for some 1 ≤ p < ∞ and is uniformly integrable if p = 1 is actually convergent in L p to a limit M∞ such that Mn = EAn M∞ for all n ≥ 0. Proof. Let (Mnk ) be a subsequence converging weakly to a limit, which we denote by M∞ . Clearly M∞ ∈ L p (, A∞ , P), and we have ∀A ∈ An ,   M∞ dP = lim Mnk dP, A

but whenever nk ≥ n, we have erty. Hence

 A

A

Mnk dP =

 A

Mn dP by the martingale prop



∀A ∈ An

M∞ dP = A

Mn dP, A

which forces Mn = EAn M∞ . We then conclude by Theorem 1.14 that Mn → M∞ in L p -norm. Note that, conversely, any martingale that converges in L1 is clearly uniformly integrable. Remark 1.17. Fix 1 ≤ p < ∞. Let I be a directed set, with order denoted simply by ≤. This means that, for any pair i, j in I, there is k ∈ I such that i ≤ k and j ≤ k. Let (Ai ) be a family of σ -algebras directed by inclusion (i.e. we have Ai ⊂ A j whenever i ≤ j). The extension of the notion of martingale is obvious: a collection of random variables ( fi )i∈I in L p (B) will be called a martingale if fi = EAi ( f j ) holds whenever i ≤ j. The resulting net converges in L p (B) iff, for any increasing sequence i1 ≤ · · · ≤ in ≤ in+1 ≤ · · · , the (usual sense) martingale ( fin ) converges in L p (B). Indeed, this merelyfollows from  the  A metrizability of L p (B) ! More precisely, if we assume that σ i = A, i∈I then, for any f in L p (, A, P; B), the directed net (EAi f )i∈I converges to f in L p (B). Indeed, this net must satisfy the Cauchy criterion, because otherwise we would be able for some δ > 0 to construct (by induction) an increasing sequence i(1) ≤ i(2) ≤ · · · in I such that EAi(k) f − EAi(k−1) f L p (B) > δ for all


k > 1, and this would then contradict Theorem 1.14. Thus EAi f converges to  a limit F in L p (B), and hence, for any set A ⊂  in j∈I A j , we must have    f = lim EAi f = F. 

A



i

A

A

Since the equality A f = A F must remain true on the σ -algebra generated by  A j , we conclude that f = F, thus completing the proof that EAi f → f in L p (B).

1.4 Examples of filtrations The most classical example of filtration is the one associated to a sequence of independent (real valued) random variables (Yn )n≥1 on a probability space (, A, P). Let An = σ (Y1 , . . . , Yn ) for all n ≥ 1 and A0 = {φ, }. In that case, a sequence of random variables ( fn )n≥0 is adapted to the filtration (An )n≥0 iff f0 is constant and, for each n ≥ 1, fn depends only on Y1 , . . . , Yn , i.e. there is a (Borel-measurable) function Fn on Rn such that fn = Fn (Y1 , . . . , Yn ). The martingale condition can then be written as  Fn (Y1 , . . . , Yn ) = Fn+1 (Y1 , . . . , Yn , y) dPn+1 (y), ∀n ≥ 0 where Pn+1 is the probability distribution (or ‘the law’) of Yn+1 . An equivalent but more ‘intrinsic’ model arises when one considers  = RN∗  equipped with the product probability measure P = n≥1 Pn . If one denotes by Y = (Yn )n≥1 a generic point in , the random variable Y → Yn appears as the nth coordinate, and Y → Fn (Y ) is An -measurable iff Fn (Y ) depends only on the n first coordinates of Y . The dyadic filtration (Dn )n≥0 on  = {−1, 1}N∗ is the fundamental example of this kind: here we denote by εn :  → {−1, 1}

(n = 1, 2, . . .)

the nth coordinate, we equip  with the probability measure P = ⊗(δ1 + δ−1 )/2, and we set Dn = σ (ε1 , . . . , εn ), D0 = {φ, }. We denote by D (and sometimes by D∞ ) the σ -algebra generated by ∪n Dn . Clearly the variables (εn ) are independent and take the values ±1 with equal probability 1/2. Note that Dn admits exactly 2n atoms and, moreover, dim L2 (, Dn , P) = 2n .  For any finite subset A ⊂ [1, 2, . . .], let wA = n∈A εn (up to indexation, these

1.4 Examples of filtrations


are called Walsh functions) with the convention wφ ≡ 1. It is easy to check that {wA | A ⊂ [1, . . . , n]} (resp. {wA | |A| < ∞}) is an orthonormal basis of L2 (, Dn , P) (resp. L2 (, D, P)). Given a Banach space B, a B-valued martingale fn :  → B adapted to the dyadic filtration (Dn ) is characterized by the property that ∀n ≥ 1

( fn − fn−1 )(ε1 , . . . , εn ) = εn ϕn−1 (ε1 , . . . , εn−1 ),

where ϕn−1 depends only on ε1 , . . . , εn−1 . We leave the easy verification of this to the reader. Of course, the preceding remarks remain valid if one works with any sequence of ±1 valued independent random variables (εn ) such that P(εn = ±1) = 1/2 on an ‘abstract’ probability space (, P). In classical analysis it is customary to use the Rademacher functions (rn )n≥1 on the Lebesgue interval ([0, 1], dt ) instead of (εn ). We need some notation to introduce these together with the Haar system. Given an interval I ⊂ R, we divide I into parts of equal length, and we denote by I + and I − , respectively, the left and right half of I. Note that we do not specify whether the endpoints belong to I since the latter are negligible for the Lebesgue measure on [0, 1] (or [0, 1[ or R). Actually, in the Lebesgue measure context, whenever convenient, we will identify [0, 1] with [0, 1) or (0, 1). Let hI = 1I + − 1I − . We denote I1 (1) = [0, 1), I2 (1) = [0, 12 ), I2 (2) = [ 21 , 1) and, more generally,   k−1 k , In (k) = 2n−1 2n−1 √ n−1 for k = 1, 2, . . . , 2 (n ≥ 1). We then set h ≡ 1, h = h , h = 2 hI2 (1) , 1 2 I (1) 3 1 √ h4 = 2 hI2 (2) and, more generally, ∀n ≥ 1 ∀k = 1, . . . , 2n−1

h2n−1 +k = |In (k)|−1/2 hIn (k) .

Note that hn 2 = 1 for all n ≥ 1. The Rademacher function rn can be defined, for each n ≥ 1, by rn (t ) = sign(sin(2n πt )) (in some books this same function is denoted rn−1 ). Equivalently, 2n−1 hIn (k) . rn = k=1

Then the sequence (rn )n≥1 has the same distribution on ([0, 1], dt ) as the sequence (εn )n≥1 on (, P). Let An = σ (r1 , . . . , rn ). Then An is generated by the 2n -atoms {In+1 (k) | 1 ≤ k ≤ 2n }, each having length 2−n . The dimension of

14

Banach space valued martingales

L2 ([0, 1], An ) is 2n and the functions {h1 , . . . , h2n } (resp. {hn | n ≥ 1}) form an orthonormal basis of L2 ([0, 1], An ) (resp. L2 ([0, 1])). The Haar filtration (Bn )n≥1 on [0, 1] is defined by Bn = σ (h1 , . . . , hn ) so that we have σ (h1 , . . . , h2n ) = σ (r1 , . . . , rn ) or, equivalently, B2n = An for all n ≥ 1 (note that here B1 is trivial). It is easy to check that Bn is an atomic σ -algebra with exactly n atoms. Since the conditional expectation EBn is the orthogonal projection from L2 to L2 (Bn ), we have, for any f in L2 ([0, 1]), n ∀n ≥ 1 EBn f =  f , hk hk , 1

and hence for all n ≥ 2, EBn f − EBn−1 f =  f , hn hn .

(1.21)

More generally, for any B-valued martingale ( fn )n≥∅ adapted to (Bn )n≥1 , we have ∀n ≥ 2

fn − fn−1 = hn xn

for some sequence (xn ) in B. The Haar functions are in some sense the first example of wavelets (see e.g. [75, 98]). Indeed, if we set h = 1[0, 1 ) − 1[ 1 ,1) 2

2

(this is the same as the function previously denoted by h2 ), then the system of functions   m (1.22) 2 2 h((t + k)2m ) | k, m ∈ Z is an orthonormal basis of L2 (R). Note that the constant function 1 can be omitted since it is not in L2 (R). This is sometimes called the Haar wavelet. See §6.3 for an illustration of this connection. In the system (1.22), the sequence {hn | n ≥ 2} coincides with the subsystem formed of all functions in (1.22) with support included in [0, 1]. Remark 1.18. In the dyadic filtration, each atom of Dn−1 is split into half to give rise to two atoms of Dn with equal probability. Consider now on (, P) a filtration (Tn ) where each atom of Tn−1 is partitioned into at most m(n) disjoint sets in Tn of unequal probability. We assume that the partition obtained by partitioning all the atoms of Tn−1 generates Tn . Assuming that T0 = {φ, }, we find that the number of atoms of Tn is at most m(1) · · · m(n). If one wishes to, one can realize this filtration on [0, 1] simply by partitioning it into at most

1.4 Examples of filtrations

15

m(1) intervals in one-to-one correspondence with the partition of  that generates T1 . Then one continues in the same way for each interval in the resulting partition. We would like to point out to the reader that the resulting picture is a very close approximation of the general case. In fact, let B be any Banach space and let ( fn ) be a B-valued martingale with respect to an arbitrary filtration (An ). If ( fn ) is formed of simple functions, then ( fn ) is actually adapted to a filtration (Tn ) of the kind just described. Indeed, we may replace An by Tn = σ ( f0 , . . . , fn ), since this obviously preserves the martingale property, and then for each n, let Tn−1 ⊂ Bn be the finite subset that is the range of ( f0 , . . . , fn−1 ). For each x ∈ Tn−1 we set Ax = {ω | ( f0 , . . . , fn−1 )(ω) = x}, with the convention that negligible sets are eliminated, so that P(Ax ) > 0. Then (Ax )x∈Tn−1 forms a finite partition of , generating Tn−1 , and each Ax is partitioned into the sets Ax ∩ { fn = y}

y ∈ fn ().

If we denote by m(n) a common upper bound, valid for all x with P(Ax ) > 0 for the number of y ∈ fn () with P(Ax ∩ { fn = y}) > 0, then we find that (Tn ) is of the kind just described. Note, however, that the number of atoms of Tn of which Ax is the union (i.e. the number of such ys) depends on x. The dyadic case is the special case of (Tn ) when m(n) = 2 for all n and the partitions are into sets of equal probability. When f0 is constant equal to, say, xφ , one can picture the values of the martingale as the vertices of a rooted tree with root at xφ , in such a way that the vertices in the n th level of the tree are the values of fn and each vertex is in the convex hull of its ‘children’. Indeed, the martingale property reduces to ∀x ∈ Tn−1

x=

 (x,y)∈Tn

P(Ax ∩ { fn = y}) y. P(Ax )

In the dyadic case, the tree is regular of degree 2, and in the case of (Tn ) just described, the degree is at most m(n) for the vertices of the (n − 1)th generation. As we will show in Lemma 5.41, the martingales formed of simple functions just described are sort of ‘dense’ in the set of all martingales. Remark 1.19. We can refine (Tn ) just like we did earlier for (An ) with the Haar filtration (Bk ). There is a filtration (Ck ) and 0 = K(0) < K(1) < K(2) < · · · such that CK(n) = Tn for any n ≥ 0 and such that, for each k ≥ 0, Ck+1 has at most one more atom than Ck .

16

Banach space valued martingales

More precisely, assuming that T0 is trivial, i.e. it has a single atom, and that T1 has k(1) + 1 atoms (k(1) ≥ 1), enumerated as A1 , . . . , Ak(1) , Ak(1)+1 , we define the σ -algebras C1 , . . . , Ck(1) such that T0 = C0 ⊂ C1 ⊂ · · · ⊂ Ck(1) = T1 as follows. For 1 ≤ j ≤ k(1), we define the σ -algebra C j as the one generated by A1 , . . . , A j . Thus the list of its ( j + 1) atoms is A1 , . . . , A j ,  \ (A1 ∪ · · · ∪ A j ). Now assume that, for each n > 0, Tn has k(n) + 1 atoms (k(n) ≥ 1) that are not atoms of Tn−1 . Let K(n) = k(1) + · · · + k(n). By an inductive construction, this procedure leads to a filtration (Ck )k≥0 such that CK(n) = Tn for any n > 0. Indeed, assuming we already have constructed CK(n) , we enumerate the atoms of Tn+1 that are not atoms of Tn as A¯ 1 , . . . , A¯ k(n+1) , A¯ k(n+1)+1 and for each j such that K(n) < j ≤ K(n) + k(n + 1), we define C j as generated by CK(n) and A¯ 1 , . . . , A¯ j . The resulting filtration (Ck )k≥0 has the property that Ck+1 has at most one more atom than Ck for any k ≥ 0. In the dyadic case, for the Haar filtration, we set B1 trivial, so the correspondence with the present construction should be shifted by 1. Moreover, we did not make sure (although this is easy to do) that the filtration (Ck )k≥0 is strictly increasing, as in the Haar case for (Bk )k≥1 . Remark 1.20. We will see in Chapters 3 and 4 more examples of discrete or continuous filtrations related to the boundary behaviour of harmonic and analytic functions. In Chapter 6, we will consider a filtration (An )n∈Z on R equipped with Lebesgue’s measure. Each An is generated by a partition of R into intervals of length 2n (n ∈ Z). Remark 1.21. In [258], the following beautiful example appears. Let T = R/2π Z, equipped with its normalized Haar measure m. Let q > 1 be a prime number. For each integer n ≥ 0, consider on T the σ -algebra F−n generated by all the measurable functions on T with period q−n . Note that F0 is simply the whole Borel σ -algebra on T. The conditional expectation with respect to F−n on L1 (T) can be described simply like this: for any f ∈ L1 (T) with formal  Fourier series k∈Z fˆ(k)eikt the function EF−n ( f ) has a formal Fourier series  supported by the set of ks that are divisible by qn , namely k∈Z,qn |k fˆ(k)eikt . Thus, if fˆ is finitely supported, we have EF−n ( f ) =

 k∈Z, qn |k

fˆ(k)eikt .

1.5 Stopping times We can also write EF−n ( f )(t ) = q−n

qn j=1

17

f (t + q−n j).

Note that F−∞ · · · ⊂ F−n−1 ⊂ F−n ⊂ · · · ⊂ F0 , where F−∞ = ∩n≥0 F−n is the trivial σ -algebra on T. This example illustrates well the notion of reverse martingale described in §1.9. Let (xn ) be a finite sequence in a Banach space  j B. Let f0 = j x j eiq t . Then the sequence formed by the reverse partial sums  j f−n = x j eiq t (1.23) j≥n

forms a martingale with respect to the filtration (F−n ) with martingale differn ences f−n − f−n−1 = xn eiq t .

1.5 Stopping times To handle the a.s. convergence of martingales, we will need (as usual) the appropriate maximal inequalities. In the martingale case, these are Doob’s inequalities. Their proof uses stopping times, which are a basic tool in martingale theory. Given an increasing sequence (An )n≥0 of σ -subalgebras on , a random variable T :  → N ∪ {∞} is called a stopping time if ∀n ≥ 0

{T ≤ n} ∈ An ,

∀n ≥ 0

{T = n} ∈ An .

or equivalently if

If T < ∞ a.s., then T is called a finite stopping time. For any martingale (Mn ) and any stopping time T we will denote by MT the random variable ω → MT (ω) (ω). Proposition 1.22. For any martingale (Mn )n≥0 in L1 (B) relative to (An )n≥0 and for every stopping time T , (Mn∧T )n≥0 is a martingale on the same filtration.  Proof. Observe that Mn∧T = k 1) ). Then we can write MT = dk 1{k≤T } and since {k ≤ T } ∈ Ak−1 the martingale property implies EAn (dk 1{k≤T } ) = 0 for all k ≥ n. It fol lows that EAn MT = kN} (MN − MT ) and hence MT ∧N − MT  ≤ 1{N t



otherwise.

Then T is a stopping time relative to the sequence of σ -algebras (A k ) defined by A k = Ak∧n . Since Mk > t on the set {T = k}, we have    ∗ tP{Mn > t} = tP{T ≤ n} = t P{T = k} ≤ Mk , k≤n

whence by (1.31) ≤

k≤n

{T =k}







k≤n

{T =k}

Mn =

{T ≤n}

Mn .

This proves (1.29). To prove (1.30), we use an extrapolation trick. Recalling (1.3), if Mn∗ ≥ 0, we have  ∞ pt p−1 P{Mn∗ > t}dt EMn∗p = 0   ∞ p−2 pt Mn dP dt ≤ 0



 =

{Mn∗ >t}

Mn∗

Mn

  pt p−2 dt dP =

0

p Mn (Mn∗ ) p−1 dP, p−1

hence by Hölder’s inequality, ≤ p Mn  p (Mn∗ ) p−1  p = p Mn  p (EMn∗p ) so that after division by (EMn∗p )

p−1 p

p−1 p

,

, we obtain (1.30).

The following inequality is known as the Burkholder-Davis-Gundy inequality. It is dual to Doob’s maximal inequality. Indeed, by (1.30), we have for any x in L p (EAn x)L p (∞ ) =  supn |EAn x| p ≤ p x p .

(1.32)

Therefore it is natural to expect a dual inequality involving an ‘adjoint mapping’ from L p (1 ) to L p , as follows: Theorem 1.26. Let (θn )n≥0 be an arbitrary family of random variables. Then, for any 1 ≤ p < ∞,









|θn | . |EAn θn | ≤ p

(1.33)

p

p

1.6 Almost sure convergence: Maximal inequalities In particular, if θn ≥ 0,

21











θn . EAn θn ≤ p

p

p

Proof. Since |EAn θn | ≤ EAn |θn |, it suffices to prove this assuming

 θn ≥ 0. In that case, consider f ≥ 0 in L p with  f  p = 1 such that EAn θn p =  A  E n θn , f . Then    EAn θn , f = θn , EAn f 





θn  supn EAn f  p ; ≤

p

hence, by Doob’s inequality,





θn . ≤ p

p

Remark 1.27. Note that it is crucial for the validity of Theorems 1.25 and 1.26 that the conditional expectations be totally ordered, as in a filtration. However, as we will now see, in some cases we can go beyond that. Let (A1n )n≥0 , (A2n )n≥0 , . . . , (Adn )n≥0 be a d-tuple of (a priori mutually unrelated) filtrations on a probability space (, A, P). Let Id = Nd and for all i = (n(1), . . . , n(d)) let Ei = EAn(1) EAn(2) · · · EAn(d) . 1

2

d

(1.34)

Then, by a simple iteration argument, we find that for any 1 < p ≤ ∞ and any x in L p we have  supi∈Id |Ei x|  p ≤ (p )d x p . A similar iteration holds for the dual to Doob’s inequality: for any family (xi )i∈Id in L p we have











|Ei xi | ≤ (p )d

|xi | .

p

p

To illustrate this (following [145]), consider a dyadic rooted tree T , i.e. the points of T are finite sequences ξ = (ξ1 , . . . , ξk ) with ξ j ∈ {0, 1} and there is also a root (or origin) denoted by ξφ . We introduce a partial order on T in the natural way, i.e. ξφ is ≤ any element and then we set (ξ1 , . . . , ξk ) ≤ (ξ1 , . . . , ξ j ) if k ≤ j and (ξ1 , . . . , ξk ) = (ξ1 , . . . , ξk ). In other words, ξ ≤ ξ if ξ is on the same ‘branch’ as ξ , but ‘after’ ξ . This is clearly not totally ordered since two points situated on disjoint branches are incomparable. Nevertheless, as observed in [145], we have the

22

Banach space valued martingales

following: Consider a family {εξ | ξ ∈ T } of independent random variables and for any ξ in T let Aξ = σ ({εη | η ≤ ξ }, and let Eξ = EAξ . We have then for any 1 < p ≤ ∞ and any x in L p  supξ ∈T |Eξ x|  p ≤ (p )3 x p . The idea is that Eξ is actually of the form (1.34) with d = 3, see [145] for full details. Remark 1.28. Let B be a Banach space and let (Mn )n≥0 be a B-valued martingale. Then the random variables Zn defined by Zn (ω) = Mn (ω)B form a submartingale. Indeed, by (1.8) we have for every k and every f in L1 (, P; B) a.s.

EAk ( f ) ≤ EAk ( f B )

(1.35)

hence taking f = Mn with k ≤ n we obtain Mk  ≤ EAk (Mn ), which shows that (Zn ) is a submartingale. In particular, by (1.31), we have for any A in Ak E(1A Mk ) ≤ E(1A Mn ).

(1.36)

As a consequence, we can apply Doob’s inequality to the submartingale (Zn ), and we obtain the following. Corollary 1.29. Let (Mn ) be a martingale with values in an arbitrary Banach space B. Then supt>0 tP{supn≥0 Mn  > t} ≤ supn≥0 Mn L1 (B)

(1.37)

and for all 1 < p < ∞  supn≥0 Mn  p ≤ p supn≥0 Mn L p (B) .

(1.38)

We can now prove the martingale convergence theorem. Theorem 1.30. Let 1 ≤ p < ∞. Let B be an arbitrary Banach space. Consider f in L p (, A, P; B), and let Mn = EAn ( f ) be the associated martingale. Then Mn converges a.s. to EA∞ ( f ). Therefore, if a martingale (Mn ) converges in L p (, P; B) to a limit M∞ , it necessarily converges a.s. to this limit, and we have Mn = EAn M∞ for all n ≥ 0. Proof. The proof is based on a general principle, going back to Banach, that allows us to deduce almost sure convergence results from suitable maximal

1.6 Almost sure convergence: Maximal inequalities

23

inequalities. By Theorem 1.14, we know that EAn ( f ) converges in L p (B) to M∞ = EA∞ ( f ). Fix ε > 0 and choose k so that supn≥k Mn − Mk L p (B) < ε. We will apply (1.37) and (1.38) to the martingale (Mn )n≥0 defined by Mn = Mn − Mk if n ≥ k

and

Mn = 0 if n ≤ k.

We have in the case 1 < p < ∞  supn≥k Mn − Mk  p ≤ p ε and in the case p = 1 supt>0 tP{supn≥k Mn − Mk  > t} ≤ ε. Therefore, if we define pointwise  = limk→∞ supn,m≥k Mn − Mm , we have  = inf supn,m≥k Mn − Mm  ≤ 2 supn≥k Mn − Mk . k≥0

Thus we find  p ≤ 2p ε and, in the case p = 1, supt>0 tP{ > 2t} ≤ ε, which implies (since ε > 0 is arbitrary) that  = 0 a.s., and hence by the Cauchy criterion that (Mn ) converges a.s. Since Mn → M∞ in L p (B) we have necessarily Mn → M∞ a.s. Note that if a martingale Mn tends to a limit M∞ in L p (B) then necessarily Mn = EAn (M∞ ). Indeed, Mn = EAn Mm for all m ≥ n and by continuity of EAn we have EAn Mm → EAn M∞ in L p (B) so that Mn = EAn M∞ as announced. This settles the last assertion. Corollary 1.31. Every scalar valued martingale (Mn )n≥0 that is bounded in L p for some p > 1 (resp. uniformly integrable) must converge a.s. and in L p (resp. L1 ). Proof. By Corollary 1.16, if (Mn )n≥0 is bounded in L p for some p > 1 (resp. uniformly integrable), then Mn converges in L p (resp. L1 ), and by Theorem 1.30, the a.s. convergence is then automatic. The following useful lemma illustrates the use of stopping times as a way to properly ‘truncate’ a martingale. Lemma 1.32. Let (Mn )n≥0 be a martingale bounded in L1 (, A, P; B) where B is an arbitrary Banach space. Fix t > 0 and let  inf{n ≥ 0 | Mn  > t} if sup Mn  > t T = ∞ otherwise. Then E(MT 1{T t ). More generally, we have n

Theorem 1.40. Let (Yn ) be a sequence of B-valued Borel measurable functions on (, A, P), such that, for any choice of signs ξn = ±1, the sequence (ξnYn )  has the same distribution as (Yn ). Let fn = n0 Yk . We have then: ∀t > 0

P(sup  fn  > t ) ≤ 2 lim sup P( fn  > t )

(1.42)

∀p > 0

E sup  fn  ≤ 2 lim sup E fn  .

(1.43)

p

p

If fn converges to a limit f∞ in probability (i.e. P{ fn − f∞  > ε} → 0 for any ε > 0), then it actually converges a.s. In particular, if fn converges in L p (p > 0), then it automatically converges a.s. Finally, if fn converges a.s. to a limit f∞ , we have ∀t > 0

P(sup  fn  > t ) ≤ 2P( f∞  > t ).

(1.44)

More generally for any Borel convex subset K ⊂ B, we have P(∪n { fn ∈ K}) ≤ 2P({ f∞ ∈ K}). Proof. Let us first assume that fn converges a.s. to a limit f∞ . We start by the last assertion. Observe that for each n, f∞ = fn + ( f∞ − fn ) has the same distribution as fn − ( f∞ − fn ). Let T be the stopping time T = inf{n ≥ 0 | fn ∈ K}. Then ∞ P(∪n { fn ∈ K}) = P{T < ∞} = P(T = n). 0

Note that by the convexity of K, fn + ( f∞ − fn ) ∈ K and fn − ( f∞ − fn ) ∈ K imply fn ∈ K, and hence {T = n} implies that either f∞ ∈ K or fn − ( f∞ − fn ) ∈ K. Therefore P(T = n) ≤ P(T = n, f∞ ∈ K) + P(T = n, fn − ( f∞ − fn ) ∈ K). Choosing the signs ξk = 1 ∀k ≤ n and ξk = −1 for k > n, we see that P(T = n, fn − ( f∞ − fn ) ∈ K) = P(T = n, f∞ ∈ K)

1.7 Independent increments hence we conclude P(∪n { fn ∈ K}) =

∞ 0

P(T = n) ≤ 2

∞ 0

29

P(T = n, f∞ ∈ K)

≤ 2P( f∞ ∈ K). Replacing K by the ball of radius t, we obtain (1.44) assuming the a.s. convergence of { fn }. In particular (replacing Yn by zero for all n > N) this yields P(sup  fn  > t ) ≤ 2P( fN  > t ).

(1.45)

n≤N

Taking the supremum  ∞ over all N’s we obtain the announced inequality (1.42). Using E fn  p = 0 pt p−1 P( fn  > t ), we deduce from (1.45) that E sup  fn  p ≤ 2E fN  p n≤N

from which (1.43) follows immediately. If fn converges in L p (B) to a limit, say f∞ , we have for any fixed k E sup  fn − fk  p ≤ 2 sup E fn − fk  p . n>k

n>k

The a.s. convergence is then proved exactly as earlier for Theorem 1.30. More generally, if fn converges in probability, we have, for any fixed k and t > 0, by (1.42) P(sup  fn − fk  > t ) ≤ 2 sup P( fn − fk  > t ) n>k

n>k

from which we deduce similarly that fn converges a.s. Corollary 1.41. Let (Yn ) be independent variables in L1 (B) with mean zero  (i.e. EYn = 0) and let fn = n0 Yk as before. Then, for any p ≥ 1, we have  sup  fn  p ≤ 21+1/p sup  fn L p (B) . Proof. Let (Yn )n be an independent copy of the sequence (Yn ), let  Yn = Yn − Yn  n and fn = 0 Yn . Note that ( Yn ) are independent and symmetric. By (1.43), we have fn  p, E sup   fn  p ≤ 2 sup E  but now if p ≥ 1, we have by convexity E sup  fn  p = E sup  fn − E fn  p ≤ E sup  fn − fn  p ≤ 2 sup E fn − fn  p ≤ 2 sup E( fn  +  fn ) p ≤ 2 p (E sup  fn  p + E sup  fn  p ) = 2 p+1 E sup  fn  p .

30

Banach space valued martingales

Corollary 1.42. For a series of independent B-valued random variables, convergence in probability implies almost sure convergence.  Proof. Let fn = n0 Yk , with (Yk ) independent. Let (Yk ) be an independent copy  of the sequence (Yk ) and let fn = n0 Yk . Then the variables (Yk − Yk ) are independent and symmetric. If fn converges in probability (when n → ∞), then obviously fn and hence fn − fn also does. By the preceding theorem, fn − fn converges a.s. therefore we can choose fixed values xn = fn (ω0 ) such that fn − xn converges a.s. A fortiori, fn − xn converges in probability, and since fn also does, the difference fn − ( fn − xn ) = xn also does, which means that (xn ) is convergent in B. Thus the a.s. convergence of fn − xn implies that of fn . Remark. Let B be any Banach space. Any B-valued martingale (Mn ) bounded in L1 (B) that converges in probability must converge a.s. Indeed, if it converges in probability there is a subsequence (Mn(k) ) that converges a.s., and we will show that a.s. convergence of the whole sequence follows. Fix t > 0 and consider (Mn∧T ) as in Lemma 1.32. The a.s. converging subsequence (Mn(k)∧T ) must converge in L1 (B) to a limit f since by (1.40) the convergence is dominated. By Proposition 1.22 we have Mn∧T = EAn (Mn(k)∧T ) for any n ≤ n(k) and hence by the continuity of EAn we must have Mn∧T = EAn f . By Theorem 1.30 Mn∧T → f a.s. This shows that (Mn ) converges a.s. on the set {T = ∞} = {sup Mn  ≤ t}. But by Doob’s inequality, we can choose t > 0 large enough so that {sup Mn  > t} is arbitrarily small, and hence we conclude that (Mn ) converges a.s. We now come to a very useful classical result due to Ito and Nisio [272]. Theorem 1.43. Consider a separable Banach space B. Let D ⊂ B∗ be a countable subset that is norming, i.e. such that, for x in B, we have x = supξ ∈D |ξ (x)|. Let ( fn ) be as in Theorem 1.40. Assume that ω-a.s. there is f (ω) in B such that ξ ( f (ω)) = lim ξ ( fn (ω)) for any ξ in D. Then lim  fn (ω) − f (ω) = 0 a.s. Proof. Exceptionally, we will use here the results of the next section. We start by the observation that f is Borel measurable. Indeed, our assumptions on D and f imply that for any x0 ∈ B and any R > 0 the set { f − x0  < R} = {ω | supξ ∈D |ξ ( f (ω)) − ξ (x0 )| < R} is measurable (because the fn ’s are assumed Borel measurable). Thus, f −1 (β ) is measurable for any open ball β, and since B is separable, f must be Borel measurable. Hence by Corollary 1.46, for any ε > 0 there is K ⊂ B convex compact such that P( f ∈ / K) < ε.

1.8 Phillips’s theorem

31

By the proof of the preceding statement, since fn − ( f − fn ) clearly has the same distribution as f for any n, we have   { fn ∈ / K} ≤ 2P( f ∈ / K) < 2ε. P n

Since ε > 0 is arbitrary, it follows that { fn (ω)|n ≥ 0} is relatively compact ω-a.s. But then the norm topology coincides with the σ (B, D)-topology on σ (B,D)

{ fn (ω)|n ≥ 0}, and since fn (ω)−−−−→ f (ω), we conclude as announced that fn (ω) → f (ω) in norm ω-a.s. Remark. Here is a nice application of the preceding statement to random Fourier series on T = R/2π Z. Consider complex numbers {an | n ∈ Z}, and let (εn ) be an independent sequence of symmetric ±1 valued variables as usual. Let us consider the formal Fourier series  εn an eint . n∈Z

We wish to know when this series is almost surely the Fourier series of a (random) function in B, where B is either L p ([0, 2π ], dt ) (1 ≤ p < ∞) or C([0, 2π ]). So the question is, when is it true that for almost all choices of signs (εn (ω)), there is a function fω ∈ B such that ∀n ∈ Z

fω (n) = εn (ω)an ?

The preceding result shows that this holds iff the sequence of partial sums  SN = |n|≤N εn an eint converges a.s. in norm in B. (Just take for D the intersection of the unit ball of B∗ with the set of all finite Fourier series, with, say, rational coefficients. To verify that it is norming, one may use the Fejer kernel.) Actually, the norm convergence holds in any reordering of the summation. Thus we find that norm convergence of the partial sums (in any order) is automatic, which is in sharp contrast with the case of ordinary Fourier series of (non-random) continuous functions.

1.8 Phillips’s theorem Let (, A, P) be a σ -finite measure space. A function f :  → B is called scalarly measurable if for every ξ in B∗ the scalar valued function ξ ( f ) is measurable. Now assume B separable. As is well known, the unit ball of B∗ is weak∗ compact and metrizable, and hence separable. Therefore, there is a dense countable subset D ⊂ B∗ such that ∀x ∈ B

x = supξ ∈D |ξ (x)|.

(1.46)

32

Banach space valued martingales

Moreover B∗ is σ (B∗ , B)-separable. By (1.46) if f :  → B is scalarly measurable, then  f B is measurable. Similarly, for every x in B, the function  f − xB is measurable. The following application of Theorems 1.34 and 1.36 is a classical result (due to Phillips) in measure theory on Banach spaces. Theorem 1.44. Let (, A, P) be σ -finite. Assume that B is a separable Banach space. Then every scalarly measurable function f :  → B is Bochner measurable. Proof. Let D be as in (1.46). Since  f (·)B is a finite (measurable) func tion, we have  = n { f B ≤ n}; therefore we can easily reduce the proof to the case when  f (·)B is bounded, and in particular, we can assume  f B ∈ L1 (, A, P). Moreover, we can easily reduce to the case of a probability space (, A, P) and also assume that A is the σ -algebra generated by (ξ ( f ))ξ ∈D . Hence we can assume A countably generated. It is then easy to check that there is an increasing sequence (An ) of finite σ -subalgebras of A such that the union  n≥0 An generates A. For each n ≥ 0, we can then define an An -measurable simple function fn :  → B∗∗ by setting ξ , fn (ω) = EAn (ξ , f ).

(1.47) ∗∗

Indeed, for any atom A of An , and any ω in A we set fn (ω) = x where x∗∗ ∈ B∗∗ is defined by x∗∗ (ξ ) = E(1A ξ ( f (ω)). Clearly, (1.47) holds and ( fn ) is a martingale, so that ( fn )n≥0 is a submartingale. Similarly, for every x in B, ( fn − x) is a martingale and hence ( fn − x) is a submartingale. By Theorem 1.36,  fn − x converges a.s. for every x in B. We first claim that lim  fn − x ≤  f − x a.s. (see also the next remark). Indeed note that by (1.47) we have for any ξ in the unit ball of B∗ |ξ ( fn − x)| = |EAn (ξ ( f − x))| ≤ EAn  f − x, hence, taking the sup over all such ξ (and recalling that fn − x is a simple function), we find  fn − x ≤ EAn  f − x, but by Theorem 1.30 (scalar case), lim supn→∞ EAn  f − x =  f − x; therefore a.s. as announced, ∀x ∈ B

lim supn→∞  fn − x ≤  f − x.

(1.48)

This is enough to conclude. Indeed, let D0 be a dense countable subset of B and let A be the set of all ω in  such that lim supn→∞  fn (ω) − x ≤  f (ω) − x

1.8 Phillips’s theorem

33

for all x in D0 . Then (since D0 is countable) A ∈ A and

P(A) = 1.

But when a sequence of points xn in B∗∗ satisfies for some b in B for all x in a dense subset of B, lim supn→∞ xn − x ≤ b − x,

(1.49)

it is very easy to check that xn must tend to b when n → ∞. (Indeed, this inequality (1.49) remains true by density for all x in B, hence we simply take x = b!) In particular, for any ω in A we must have limn→∞  fn (ω) − f (ω) = 0, which shows that f is Bochner measurable as a B∗∗ valued function. Finally, the next lemma shows that f is Bochner measurable as a B-valued function (and hence a posteriori, the variables fn actually take their values in B). Lemma 1.45. Let B be a closed subspace of a Banach space X. Let f be an X valued Bochner measurable function. If the values of f are actually (almost all) in B, then f is Bochner measurable as a B-valued function. Proof. For any point x in X, let us choose a point x˜ ∈ B such that x − x ˜ ≤ 2 dist(x, B). Let ( fn ) be a sequence of X valued (measurable) simple functions tending pointwise to f . Let gn (ω) = f˜n (ω). Note that the ‘steps’ of gn are the same as those of fn . Then,  fn (ω) − gn (ω) ≤ 2 dist( fn (ω), B) ≤ 2 fn (ω) − f (ω) → 0. Therefore (gn ) is a sequence of B-valued (measurable) simple functions tending pointwise to f . Remark. Actually, there is always equality in (1.48). Indeed, let {ξ1 , ξ2 , . . .} be an enumeration of the set D. Note that, for each fixed k, the variables sup j≤k |ξ j ( fn − f )| tend to 0 when n → ∞ a.s. and in L1 , since this is true for each single martingale (ξ ( fn ))n≥0 by the martingale convergence theorem. Hence E f − x = supk E sup j≤k |ξ j ( f − x)|

(1.50)

≤ supk E lim supn→∞ sup j≤k |ξ j ( fn − x)| ≤ E lim supn→∞  fn − x, but since  fn − x converges a.s., we have by Fatou’s lemma E lim sup  fn − x = E lim inf  fn − x ≤ lim inf E fn − x, (1.51)

34

Banach space valued martingales

and by (1.48), ≤ E f − x.

(1.52)

The chain of inequalities (1.50), (1.51), (1.52) and (1.48) shows that E lim  fn − x = E f − x but lim  fn − x ≤  f − x . This forces limn→∞  fn − x =  f − x a.s. Corollary 1.46. Let f :  → B be a scalarly measurable (in particular a Borel measurable) function with values in a separable Banach space, then for any ε > 0, there is a convex compact subset K ⊂ B such that P{ f ∈ K} < ε.

 Proof. Consider εk > 0 tending to zero and such that k εk < ε. By the theorem, there is a sequence of simple functions ( fn ) tending to f . Now since  fn − f  → 0 a.s. for any k > 0, there is n(k) large enough so that P{ fn(k) − f  > εk } < εk .

(1.53)

Let Tk be the (finite) set of values of fn(k) , and let T be the set of points b ∈ B such that d(b, Tk ) ≤ εk for all k. Clearly T is relatively compact (since for any δ > 0 it admits a finite δ-net, namely Tk for any k such that εk < δ).  By (1.53), we have P{d( f , Tk ) > εk } < εk , and hence P{ f ∈ T } ≤ k εk < ε. To conclude, we simply let K be the closure of the convex hull of T . Remark. When the conclusion of the preceding corollary holds, one usually says that the distribution of f (i.e. the image measure of P under f ) is a Radon measure on B, or that it is ‘tight’. What the corollary says is that this is automatic in the separable case.

1.9 Reverse martingales We will prove here the following. Theorem 1.47. Let B be an arbitrary Banach space. Let (, A, P) be a probability space and let A0 ⊃ A−1 ⊃ A−2 ⊃ · · · be a (this time decreasing) ! sequence of σ -subalgebras of A. Let A−∞ = n≥0 A−n . Then for any f in L p (, A, P; B), with 1 ≤ p < ∞, the reverse martingale (EA−n ( f ))n≥0 converges to EA−∞ ( f ) a.s. and in L p (B). We first check the convergence in L p (B). Since the operators (EA−n )n≥0 are equicontinuous on L p (B) it suffices to check this for f in a dense subset of  L p (B). In particular, it suffices to consider f of the form f = n1 ϕi xi with ϕi

1.9 Reverse martingales

35

an indicator function and xi in B. Since ϕi ∈ L2 (, P), we have (by classical Hilbert space theory) EA−n ϕi → EA−∞ ϕi in L2 (, P) when n → ∞. (Note that L2 (, A−∞ , P) is the intersection of the family (L2 (, A−n , P))n≥0 .) Observe that  f − g p ≤  f − g2 if p ≤ 2 and  f − g pp ≤ 2 p−2  f − g22 if  f ∞ ≤ 1, g∞ ≤ 1 and p > 2. Using this, we obtain that, a fortiori, EA−n f → EA−∞ f in L p (B) for every f of the earlier form, and hence for every f in L p (B). We now turn to a.s. convergence. We first replace f by f˜ = f − EA−∞ ( f ) so that we can assume EA−n ( f ) → 0 in L p (B) and a fortiori in L1 (B). Let fn = EA−n f . Now fix n > 0 and k > 0 and consider the (ordinary sense) martingale " f−n−k+ j for j = 0, 1, . . . , k, Mj = f−n if j ≥ k. Then, by Doob’s inequality (1.37) applied to (M j ), we have for all t > 0 tP{supn≤m≤n+k  f−m  > t} ≤ E f−n ; therefore tP{supm≥n  f−m  > t} ≤ E f−n , and since E f−n  → 0 when n → ∞, we have supm≥n  f−m  → 0 a.s., or  equivalently f−n → 0 a.s. when n → ∞. As a corollary, we have the following classical application to the strong law of large numbers. Corollary 1.48. Let ϕ1 , . . . , ϕn be a sequence of independent, identically distributed random variables in L1 (, A, P; B). Let Sn = ϕ1 + · · · + ϕn . Then Sn /n → Eϕ1 a.s. and in L1 (B). Proof. Let A−n be the σ -algebra generated by (Sn , Sn+1 , . . .). We claim that Sn /n = EA−n (ϕ1 ). Indeed, for every k ≤ n, since the exchange of ϕ1 and ϕk preserves Sn , Sn+1 , . . . , we have EA−n (ϕk ) = EA−n (ϕ1 ). Therefore, averaging the preceding equality over k ≤ n, we obtain 1 EA−n (ϕ1 ) = EA−n (ϕk ) = EA−n (Sn /n) = Sn /n. 1≤k≤n n Hence (Sn /n)n≥1 is a reverse martingale satisfying the assumptions of the preceding theorem (we may take say A0 = A−1 ); therefore 1n Sn → EA−∞ (ϕ1 ) a.s. ! and in L1 (B). Finally, let T = n≥0 σ {ϕn , ϕn+1 , . . .} be the tail σ -algebra. By the zero-one law, T is trivial. The limit of Sn /n is clearly T -measurable, hence it must be equal to a constant c, but then E(Sn /n) → c, so c = E(ϕ1 ).

36

Banach space valued martingales

1.10 Continuous time* Although martingales with continuous time play a central role in probability theory (via Brownian motion and stochastic integrals), in the Banach space valued case they have not been so essential up to now. So, although we occasionally discuss the continuous case, we will mostly avoid developing that direction. Actually, in the context of martingale inequalities (our main interest in this book), the discrete time case is almost always the main point. Furthermore, in many situations, once the discrete case is understood, the continuous case follows by rather routine techniques, which are identical in the scalar and Banach valued case. The latter techniques are now very well described in several books, to which we refer the reader, e.g. [18, 22, 61–64, 81, 90]. In this section we briefly outline some of the basic points of martingales with continuous parameter analogous to what we saw for the basic convergence results proved in the discrete case in the preceding chapter. Let (, A, P) be a probability space. Let I ⊂ R be a time interval. Given a filtration (At ) indexed by I of σ -subalgebras of A, we say that (Mt )t∈I is a martingale if (as usual) it is a martingale relative to the ordered set I. Of course if I contains its right endpoint denoted by b then Mt = EAt Mb for all t ∈ I. We will say that a collection of B-valued random variables (Mt )t∈I is separable if there is a countable subset Q ⊂ I (in practice Q will be the set of rational numbers in I) and  ⊂  in A with P( ) = 1 such that for any ω ∈  and any t ∈ I there is a sequence (tn ) in Q such that tn → t and Mt (ω) = limn→∞ Mtn (ω). (The usual definition is slightly more general.) This holds in particular if the paths t → Mt (ω) are continuous on I (or merely either left or right continuous assuming, say, that I is open). This implies for instance that for any open subinterval J ⊂ I, we have a.s. sup Mt  = sup Mt  and sup Mt − Ms  = sup Mt − Ms , t∈J

t∈J∩Q

t,s∈J

t,s∈J∩Q

thus proving that suprema of this kind are measurable, even though each is a priori defined as an uncountable supremum of measurable functions. It is important to emphasize that for the definition of separability to make sense, we must be given (Mt ) as a collection of measurable functions and not just as equivalent classes modulo a.s. equality. While the distinction is irrelevant in the discrete time case, it is important in the continuous case. When I = [0, ∞), we denote by A∞ the smallest σ -algebra containing all the σ -algebras At . Without loss of generality, we may assume that A = A∞ . The basic martingale convergence Theorem 1.14 is easy to extend.

1.10 Continuous time*

37

Theorem 1.49. Assume I = [0, ∞). Let 1 ≤ p < ∞, let B be any Banach space and let M ∈ L p (, P, A; B). We have lim EAt M = EA∞ M

t→∞

a.s

in L p (B). Moreover, if (Mt ) is a separable martingale such that Mt = EAt M for any t > 0, then limt→∞ Mt = EA∞ M almost surely. Proof. The set I is a particular instance of directed set, as considered in Remark 1.17. Thus the convergence in L p (B) has already been established there. Let M∞ = EA∞ M. Assume first p > 1. We claim that for any 0 ≤ s < ∞ we have  supt>s Mt − Ms B  p ≤ p supt>s Mt − Ms L p (B) .

(1.54)

Indeed, if the sup on the left-hand side is restricted to a countable subset {t ∈ Q | t > s}, then this is easy to derive from Doob’s inequality (1.38) (applied to arbitrary finite subsets of this set). But now since (Mt ) is separable, there is  ⊂  with P( ) = 1 and Q countable such that for any ω ∈  we have supt>s Mt (ω) − Ms (ω)B = supt>s,t∈Q Mt (ω) − Ms (ω)B . This proves our claim. Similarly, if p = 1, we obtain supc>0 P{supt>s Mt − Ms B > c} ≤ supt>s Mt − Ms L1 (B) . Then the proof can be completed exactly as we did for Theorem 1.30 by showing that lims→∞ supt>s Mt − Ms  = 0 a.s. Remark 1.50. It is instructive to observe that Doob’s maximal inequalities for directed martingales can be exploited effectively to justify (1.54), as follows. Let us denote by Zs∗ the lattice supremum of the family {Mt − Ms B | t > s} in the sense of Remark 1.39. By the latter remark, we have for any p > 1 Zs∗  p ≤ p supt>s Mt − Ms L p (B) . By definition, Zs∗ is the smallest Z such that Z ∈ L p and Mt − Ms B ≤ Z a.s. for any t > s. But by our separability assumption, the variable Z = Zs∗ defined by Zs∗ (ω) = sup Mt (ω) − Ms (ω)B = sup Mt (ω) − Ms (ω)B t>s,t∈Q

∀ω ∈ 

t>s,t∈Q

satisfies this and is obviously minimal. Thus we must have Zs∗ = Zs∗ a.s., and we obtain (1.54). A similar argument can be used for p = 1. The definitions and the basic properties of stopping times and of the algebra AT can be extended without any problem: we say that a function :  → [0, ∞] is a stopping time if {T ≤ t} ∈ At for any t ∈ [0, ∞), and by definition A ∈ AT if A ∩ {T ≤ t} ∈ At for any t ∈ [0, ∞).

38

Banach space valued martingales

Since we will be mainly interested in Brownian martingales in §3.5 and §4.5, far from seeking maximal generality (for that we refer e.g. to [18, 64] or [81]) we allow ourselves rather strong assumptions, which are satisfied in the Brownian case. We will often assume, as is the case for Brownian motion, that the paths t → Mt (ω) are continuous for almost all ω. If, in addition, we also assume t → Mt locally bounded on I in L p (B) for some p > 1, then by dominated convergence (by Doob’s inequality), t → Mt is continuous from I to L p (B). We will now describe how the continuous time can be discretized in order to apply the discrete time theory that we just saw. The following basic facts are easy to verify at this stage: If (Mt ) is a B-valued martingale with respect to (At ), then (Mt B ) is a non-negative submartingale. Assuming that t → Mt is separable and bounded in L p (, A, P; B), then Doob’s inequality (1.38) clearly extends: For any 1 < p < ∞ and any 0 < R < ∞ we have  sup Mt B  p ≤ p MR B  p ,

(1.55)

0 0. It will be convenient to consider the following assumption: (A)R have

The paths t → Mt (ω) are (uniformly) continuous on [0, R] and we E sup Mt  < ∞.

(1.56)

0≤t≤R

Remark 1.51. By dominated convergence, if the paths are a.s. (uniformly) continuous on [0, R] (here a.s. means except for a negligible measurable subset of ), we have necessarily lim E

δ→0

sup

0≤s,t≤R, |t−s|≤δ

Mt − Ms  = 0.

We start with an elementary discretization of stopping times. Lemma 1.52. Fix R > 0. Then, for any stopping time T (with respect to (At )), there is a sequence of stopping times Sn ≥ T taking only finitely many values in [0, R] ∪ {∞} such that almost surely T ∧ R = lim Sn ∧ R. Proof. Let 0 = t0 < t1 < t2 < · · · tN−1 < tN = R be such that |t j − t j−1 | ≤ δ for all j. Let Q = {t0 , . . . , tN }. We define S(ω) = inf{q ∈ Q | q ≥ T (ω)}, and we set S(ω) = ∞ if T > R. This means that if 0 < S(ω) ≤ R then S(ω) is equal to the unique tk such that tk−1 < T (ω) ≤ tk . Then S is a stopping time with

1.10 Continuous time*

39

respect to the filtration (At ) with S ≥ T , and S is a simple function (admitting ∞ as one of its values) with values in Q ∪ {∞}, such that |S ∧ R − T ∧ R| ≤ δ. Choosing, say, δ = 1/n, the lemma follows. The next result is an extension of Proposition 1.22. Lemma 1.53. Let T be a stopping time with respect to (At ). Fix R > 0. Let B be a Banach space and let (Mt ) be a martingale (with respect to (At )) satisfying (A)R . Then, for any 0 ≤ s < t ≤ R we have a.s. EAs MT ∧t = MT ∧s .

(1.57)

Proof. Let us first assume that T takes only finitely many values included in the set [0, R] ∪ {∞}. Let Q = {t0 , . . . , tN }, be a finite set containing {s, t} and also containing all the possible finite values of T . We may assume 0 = t0 < t1 < t2 < · · · < tN−1 < tN = R. The idea of the proof is very simple: We consider the martingale (Mt )t∈Q adapted to (At )t∈Q and we view T as a stopping time relative to the latter filtration and taking values in Q ∪ {∞}. Then Proposition 1.22 yields (1.57). More precisely, for any 0 ≤ j ≤ N, we set B j = At j

and

f j = Mt j

and also B j = AR and f j = MR for all j > N. Then it is easy to check that ( f j ) is a martingale adapted to the filtration (B j ). With this new filtration, the time T corresponds to the variable θ :  → N defined as follows. We set θ = j on the set {T = t j } and θ = N on the set {T = ∞}. Since Q contains all the finite values of T , this defines θ on , and it is easy to check that θ is now a stopping time relative to the filtration (B j ). By Proposition 1.22 applied to the martingale ( f j ) we have for any 0 ≤ j ≤ k ≤ N EB j fθ∧k = fθ∧ j .

(1.58)

In particular, this holds for j, k such that t j = s and tk = t. In that case B j = At j = As , fθ∧ j = MT ∧s and fθ∧k = MT ∧t . Thus (1.58) coincides with (1.57) in this case. This proves (1.57) under our initial assumption on T . To prove the general case, we use a sequence Sn as in Lemma 1.52. By the first part of the proof we have for any n EAs MSn ∧t = MSn ∧s .

(1.59)

But now since the paths are continuous we know that MSn ∧s → MT ∧s and MSn ∧t → MT ∧t a.s. when n → ∞, and moreover by dominated convergence (recall (1.56)) the convergence holds also in L1 (B). Thus, passing to the limit in n, we see that (1.59) yields (1.57).

40

Banach space valued martingales

Remark. In the 1970s M. Métivier and J. Pellaumail (see [61, 62] for details) strongly advocated the idea that stochastic integrals should be viewed as vector measures with values in the topological vector space L0 (resp. L0 (B)) of measurable functions with values in R (resp. a Banach space B). Equipped with convergence in measure the latter space is metrizable but not locally convex. The relevant vector measures are defined on the σ -algebra P of ‘predictable subsets’ of [0, ∞) × , associated to a filtration (At )t>0 on . The σ -algebra P is defined as the smallest one containing all subsets of [0, ∞) ×  of the form [S, T ] = {(t, ω) | S(ω) < t ≤ T (ω)}, where S, T are stopping times such that S ≤ T , as well as all subsets of the form {0} × A with A ∈ A0 . This is the same as the σ -algebra generated by all real valued adapted processes with continuous paths. Moreover, P contains all the elementary processes of the form f (t, ω) = 1]t0 ,t1 ] (t )g(ω) with g bounded and At0 -measurable (0 < t0 < t1 < ∞). For instance, given a martingale (Mt ) in L1 (, P), we can define a vector measure μM on P by setting first μM (]S, T ]) = MT − MS ,

μM ({0} × A) = 0

and then extending this to all of P. In particular, if f is an elementary process as previously, we have  f dμM = (Mt1 − Mt0 )g.  This allows to make sense of FdμM when F is a linear combination of elementary processes and one then denotes simply   FdM = FdμM . In this way one defines the stochastic integral with respect to M of any bounded predictable process supported on a bounded interval. Actually it suffices for μM to extend to such a vector measure on P that M be a so-called ‘semi-martingale’ (roughly this is the sum of a martingale and a process with bounded variation) and, quite remarkably, it was proved (by Dellacherie-Mokobodzki-Meyer and Bichteler) that conversely only semi-martingales give rise to an L0 valued vector measure (among all processes with sufficiently regular paths, called ‘càdlàg’). This result is presented in detail in [62], to which we refer the reader for the precise definition of a semi-martingale.

1.11 Notes and remarks

41

1.11 Notes and remarks Among the many classical books on Probability that influenced us, we mention [11, 22], see also [35]. As for martingales, the references that considerably influenced us are [26, 34, 71] and the papers [165, 175]. Martingales were considered long before Doob (in particular by Paul Lévy) but he is the one who invented the name and proved their basic almost sure convergence properties using what is now called Doob’s maximal inequality. In Theorem 1.40, we slightly digress and concentrate on a particular sort of martingale, those that are partial sums of series of independent random vectors. In the symmetric case, it turns out that the maximal inequalities (and the associated almost sure convergence) hold for ‘martingales’ bounded in L p (B) for p < 1. Our presentation of this is inspired by Kahane’s book [44]. See Hoffmann-Jørgensen’s [265] for more on this theme. We also include an important useful result (Theorem 1.43) due to Ito and Nisio [272] showing that, for such sums, even a very mild looking kind of convergence automatically guarantees norm convergence. Although this looks at first rather eccentric, conditional expectations and martingales may be considered with respect to certain complex measures, see [190] for more details. This chapter ends with a brief presentation of continuous time martingales. As already mentioned in the text, many excellent books are now available such as e.g. [18, 61–64, 81, 90]. We should warn the reader that we use a stronger notion of ‘separability’ for a random process than in most textbooks, such as [63], but the present one suffices for our needs.

2 Radon-Nikodým property

2.1 Vector measures To introduce the Radon-Nikodým property (in short RNP), we will need to briefly review the basic theory of vector measures. Let B be a Banach space. Let (, A) be a measure space. Every σ -additive map μ : A → B will be called a (B-valued) vector measure. We will say that μ is bounded if there is a finite positive measure ν on (, A) such that ∀A ∈ A

μ(A) ≤ ν(A).

(2.1)

When this holds, it is easy to show that there is a minimal choice of the measure ν. Indeed, for all A in A let |μ|(A) = sup{μ(Ai )}, where the supremum runs over all decompositions of A as a disjoint union A = ∪Ai of finitely many sets in A. Using the triangle inequality, one checks that |μ| is an additive set function. By (2.1) |μ| must be σ -additive and finite. Clearly, when (2.1) holds, we have |μ| ≤ ν. We define the ‘total variation norm’ of μ as follows: μ = inf{ν() | ν ∈ M(, A), ν ≥ |μ|}, or equivalently μ = |μ|(). We will denote by M(, A) the Banach space of all bounded complex valued measures on (, A), and by M+ (, A) the subset of all positive bounded measures. We will denote by M(, A; B) the space of all bounded B-valued 42

2.1 Vector measures

43

measures μ on (, A). When equipped with the preceding norm, it is a Banach space. Let μ ∈ M(, A; B) and ν ∈ M+ (, A). We will write |μ|  ν if |μ| is absolutely continuous (or equivalently admits a density) with respect to ν. This happens iff there is a positive function w ∈ L1 (, A, ν) such that |μ| ≤ w.ν or equivalently such that ∀A ∈ A

 μ(A) ≤

wdν. A

Recapitulating, we may state: Proposition 2.1. A vector measure μ is bounded in the preceding sense iff its total variation is finite, the total variation being defined as  n μ(Ai ) V (μ) = sup 1  where the sup runs over all measurable partitions  = ni=1 Ai of . Thus, if μ is bounded, we have V (μ) = |μ|().  n μ(Ai ) , where Proof. Assuming V (μ) < ∞, let ∀A ∈ A ν(A) = sup 1  the sup runs over all measurable partitions A = ni=1 Ai of A. Then ν is a σ -additive finite positive measure on A, and satisfies (2.1). Thus μ is bounded in the earlier sense (and of course ν is nothing but |μ|). The converse is obvious. Remark. Given ν ∈ M(, A) and f ∈ L1 (, A, ν; B), we can define a vector measure μ denoted by f .ν (or sometimes simply by f ν) by  f dν. ∀A ∈ A μ(A) = A

It is easy to check that μ = f .ν implies |μ| = F.ν where F (ω) =  f (ω)B ,

(2.2)

 f .νM(,A;B) =  f L1 (,A,ν;B) .

(2.3)

and therefore

Indeed, by Jensen’s inequality, we clearly have  ∀A ∈ A μ(A) ≤  f dν, A

hence |μ| ≤ F.ν. To  prove the converse, let  > 0 and let g be a B-valued simple function such that A  f − gdν < . We can clearly assume that g is supported

44

Radon-Nikodým property

by A, so that we can write g = of A. We have

n 1

1Ai xi , with xi ∈ B and Ai is a disjoint partition







μ(Ai ) − ν(Ai )xi  =  ( f − g)dν

≤  f − gdν < ; Ai

hence

A

 gdν = ν(Ai )xi  ≤ μ(Ai ) + , A

and finally,



  f dν ≤

A

gdν +  ≤ μ(Ai ) + 2, A

which implies

  f dν ≤ |μ|(A) + 2. A

This completes the proof of (2.2). Proposition 2.2. Consider a bounded B-valued vector measure μ on (, A). There is a unique continuous linear map [μ] : L1 (, A, |μ|) → B such that [μ](1A ) = μ(A) for all  A in A. By convention, for any g in L1 (, A, |μ|) we will denote [μ](g) = gdμ. With this notation, we have







gdμ ≤ |g|d|μ|. (2.4)



Proof. Let V be the set of all linear combination of characteristic functions, i.e.  g = n1 ci 1Ai with ci ∈ C and with Ai ∈ A mutually disjoint then we can define ∀g ∈ V

[μ](g) =

n 

ci μ(Ai ).

1

This definition is clearly unambiguous and satisfies by the triangle inequality    [μ](g)B ≤ |ci | μ(Ai ) ≤ |ci ||μ|(Ai ) = |g|d|μ|. Thus [μ] is continuous on the dense linear subspace V ⊂ L1 (, A, |μ|), hence [μ] has a unique extension to the whole of L1 (, A, |μ|). We note if |μ| ≤ w.ν ∀g ∈ L∞ (, A, ν)



 



gdμ ≤ |g|wdν.



(2.5)

B

For future reference, we include here an elementary ‘regularization’ lemma concerning the case when |μ| is a finite measure on T or R (or any locally

2.1 Vector measures

45

compact Abelian group) absolutely continuous with respect to the HaarLebesgue measure m. Lemma 2.3. Let B be a Banach space. Let (, A) be either T or R equipped with the Borel σ -algebra and let μ ∈ M(, A; B) be such that |μ| is absolutely continuous with respect to m. Then for any ϕ ∈ L1 (, A, m) ∩ L∞ (, A, m) we define μ ∗ ϕ ∈ M(, A) by setting for any Borel set A   μ ∗ ϕ(A) = ( ϕ(y)1A (x + y)dm(y))dμ(x). Then there is a bounded continuous function f ∈ L1 (, A, m; B) such that for any Borel set A  f dm. μ ∗ ϕ(A) = A

Proof. Let ϕ(x) ˇ = ϕ(−x) and ϕˇy (x) = ϕ(y − x). Let g = 1A . By the translation invariance of m and a straightforward generalization of Fubini’s theorem to this setting, we have   μ ∗ ϕ(A) = ( ϕ(y)1A (x + y)dm(y))dμ(x)   = ( 1A (y)ϕˇy (x)dm(y))dμ(x)  = f (y)dm(y) 

A

where f (y) = ϕˇy dμ. We claim that y → ϕˇy is a continuous function from  to L1 (|μ|). Using this, by (2.4) f is a continuous and bounded B-valued function and moreover    f dm ≤ |ϕˇy |d|μ|dm(y) ≤ |μ|()ϕL1 (dm) < ∞. To verify the claim, let w be the density of |μ| relative to m, so that |μ| = w.m. By a classical result, for any ϕ ∈ L1 (, A, m) the mapping y → ϕˇy is  a continuous function from  to L1 (m). Let ε > 0, let c be such that w>c wdm < ε. We have    |ϕˇy − ϕˇy ||d|μ| = |ϕˇy − ϕˇy |wdm ≤ c |ϕˇy − ϕˇy |dm + 2ϕ ˇ ∞ε  ˇ ∞ ε. Since ε is arbitrary, the and hence lim sup|y−y |→0 |ϕˇy − ϕˇy ||d|μ| ≤ 2ϕ latter lim sup = 0, proving the claim. We will use very little from the theory of vector measures, but for more information we refer the interested reader to [21].

46

Radon-Nikodým property

2.2 Martingales, dentability and the Radon-Nikodým property Definition. A Banach space B is said to have the Radon-Nikodým property (in short RNP) if for every measure space (, A), for every σ -finite positive measure ν on (, A) and for every B-valued measure μ in M(, A; B) such that |μ|  ν, there is a function f in L1 (, A, ν; B) such that μ = f .ν i.e. such that  f dν. ∀A ∈ A μ(A) = A

Remark 2.4. For the preceding property to hold, it suffices to know the conclusion (i.e. the existence of a RN derivative f ) whenever ν is a finite measure and we have |μ| ≤ ν. In fact it all boils down to the case when ν = |μ|. Indeed, if we assume that this holds in the latter case, let ν be σ -finite such that |μ|  ν. We have then (scalar RN theorem) |μ| = w.ν for some w ∈ L1 (ν), so if we find f in L1 (|μ|; B) such that μ = f .|μ|, we have by (2.2) |μ| =  f .|μ| and hence  f  = 1 |μ|-a.s. and therefore if f = w f we have μ = f .ν and f ∈ L1 (ν; B). We will need the concept of a δ-separated tree. Definitions. Let δ > 0. A martingale (Mn )n≥0 in L1 (, A, P; B) will be called δ-separated if (i) M0 is constant, (ii) Each Mn takes only finitely many values, (iii) ∀n ≥ 1, ∀ω ∈  Mn (ω) − Mn−1 (ω) ≥ δ. Moreover, the set S = {Mn (ω) | n ≥ 0, ω ∈ } of all possible values of such a martingale will be called a δ-separated tree. If the martingale is dyadic, i.e. it is adapted to the standard dyadic filtration on {−1, 1}N described in §1.4, then we will call it a δ-separated dyadic tree. • •

• • • •

M0 • •







• •

• • Figure 2.1. A δ-separated dyadic tree (M0 , M1 , M2 , M3 ).

2.2 Martingales, dentability and the RNP

47

Remark (Warning). Some authors use the term δ-bush for what we call a δseparated tree (e.g. [5, p. 111]). Some authors use the term δ-separated tree for what we call a δ-separated dyadic tree. Another perhaps more intuitive description of a δ-separated tree is as a collection of points {xi | i ∈ I} indexed by the set of nodes of a tree-like structure that starts at some origin (0) then separates into N1 branches, which we denote by (0, 1), (0, 2), . . . , (0, N1 ), then each branch itself splits into a finite number of branches, etc. in such a way that each point xi is a convex combination of its immediate successors, and all these successors are at distance at least δ from xi . We will also need another more geometric notion.

ε x D

Figure 2.2. Dentable set.

Definition. Let B be a Banach space. A subset D ⊂ B is called dentable if for any ε > 0 there is a point x in D such that x∈ / conv(D\B(x, ε)) where conv denotes the closure of the convex hull, and where B(x, ε) = {y ∈ B | y − x < ε}. By a slice of a convex set C (in a real Banach space B) we mean the nonempty intersection of C with an open half-space determined by a closed hyperplane. Equivalently, S ⊂ C ⊂ B is a slice of C if there is ξ ∈ B∗ and α > 0 such that S = {x ∈ C | ξ (x) > α} = φ. Remark 2.5. Let D ⊂ B be a bounded subset and let C be the closed convex hull of D. If C is dentable, then D is dentable. Moreover, C is dentable iff C admits slices of arbitrarily small diameter. Note in particular that the dentability of all closed bounded convex sets implies that of all bounded sets. Indeed, the presence of slices of small diameter clearly implies dentability. Conversely, if C is dentable, then for any ε > 0 there is a point x in C that does not belong to the

48

Radon-Nikodým property

closed convex hull of C \ B(x, ε), and hence by Hahn-Banach separation, there is a slice of C containing x and included in B(x, ε), therefore with diameter less than 2ε. Now if C = conv(D), then this slice must contain a point in D, exhibiting that D itself is dentable. The following beautiful theorem gives a geometric sufficient condition for the RNP. We will see shortly that it is also necessary. Theorem 2.6. If every bounded subset of a Banach space B is dentable, then B has the RNP. Proof. Let (, A, m) be a finite measure space and let μ : A → B be a bounded vector measure such that |μ|  m. We will show that μ admits a RadonNikodým derivative in L1 (, A, m; B). As explained in Remark 2.4 we may as well assume that m is finite and |μ| ≤ m. Now assume |μ| ≤ m and for every μ(A) and let A in A let xA = m(A) CA = {xβ | β ∈ A, β ⊂ A, m(β ) > 0}. Note that xA  ≤ 1 for all A in A, so that the sets CA are bounded. We will show that if every set CA is dentable then the measure admits a Radon-Nikodým derivative f in L1 (, A, m; B). Step 1: We first claim that if C is dentable then ∀ε > 0 ∃A ∈ A with m(A) > 0 such that diam(CA ) ≤ 2ε. This (as well as the third) step is proved by an exhaustion argument. Suppose that this does not hold, then ∃ε > 0 such that every A with m(A) > 0 satisfies diam(CA ) > 2ε. In particular, for any x in B, A contains a subset β with m(β ) > 0 such that x − xβ  > ε. Then, consider a fixed measurable A with m(A) > 0 and let (βn ) be a maximal collection of disjoint measurable subsets of A with positive measure such that xA − xβn  > ε. (Note that since m(βn ) > 0 and the sets are disjoint, such a maximal collection must be at most countable.) By our   assumption, we must have A = βn , otherwise we could take A = A \ βn and find a subset β of A that would contradict the maximality of the family  (βn ). But now if A = βn , we have xA = (m(βn )/m(A))xβn

and

xA − xβn  > ε.

Since we can do this for every A ⊂  with m(A) > 0 this means that for some ε > 0, every point x of C lies in the closed convex hull of points in C \ B(x, ε), in other words this means that C is not dentable, which is the

2.2 Martingales, dentability and the RNP

49

announced contradiction. This proves the preceding claim and completes step 1. Working with CA instead of C , we immediately obtain Step 2: ∀ε > 0 ∀A ∈ A with m(A) > 0 ∃A ⊂ A with m(A ) > 0 such that diam(CA ) ≤ 2ε. Step 3: We use a second exhaustion argument. Let ε > 0 be arbitrary and let (An ) be a maximal collection of disjoint measurable subsets of  with m(An ) > 0 such that diam(CAn ) ≤ 2ε. We claim that, up to a negligible set, we have   necessarily  = An . Indeed if not, we could take A =  − ( An ) in step 2 and find A ⊂ A contradicting the maximality of the family (An ). Thus  =  An . Now let gε = 1An xAn . Clearly, gε ∈ L1 (, m; B) and we have μ − gε .mM(,A;B) ≤ 2εm().

(2.6)

Indeed, for every A in A with m(A) > 0  μ(A) − gε dm = m(A ∩ An )[xA∩An − xAn ] A

hence







μ(A) − gε dm ≤ m(A ∩ An )xA∩A − xA  n n



A

≤ m(A)(2ε), which implies (2.6). This shows that μ belongs to the closure in M(, A, B) of the set of all measures of the form f .m for some f in L1 (, A; B), and since this set is closed by (2.3) we conclude that μ itself is of this form. Perhaps, a more concrete way  to say the same thing is to say that if fn = g2−n then f = f0 + n≥1 fn − fn−1 is in L1 (, m; B) and we have μ = f .m. (Indeed, note that (2.6) (with (2.3) and the triangle inequality) implies  fn − fn−1 L1 (B) ≤ 6.2−n m().) To expand on Theorem 2.6, the following simple lemma will be useful. Lemma 2.7. Fix ε > 0. Let D ⊂ B be a subset such that ∀x ∈ D

x ∈ conv(D\B(x, ε))

(2.7)

 = D + B(0, ε/2) satisfies then the enlarged subset D  ∀x ∈ D

 x ∈ conv(D\B(x, ε/2)).

(2.8)

 x = x + y with x ∈ D and y < ε/2. Choose Proof. Consider x in D, δ > 0 small enough so that δ + y < ε/2. By (2.7) there are positive numbers α1 , . . . , αn with αi = 1 and x1 , . . . , xn ∈ D such that xi − x  ≥ ε and

50

Radon-Nikodým property

x − αi xi  < δ. Hence x = αi xi + z with z < δ. We can write x = x +  since z + y ≤ z + y < y = αi (xi + z + y). Note that xi + z + y ∈ D ε/2 and moreover x − (xi + z + y) = x − xi − z ≥ x − xi  − z ≥ ε − δ ≥ ε/2. Hence we conclude that (2.8) holds. Remark 2.8. In the preceding proof, if we wish, we may obviously choose our coefficients (αi ) in Q or even of the form αi = 2−N ki for some N large enough  with ki ∈ N such that ki = 2N . We now come to a very important result which incorporates the converse to Theorem 2.6. Theorem 2.9. Fix 1 < p < ∞. The following properties of a Banach space B are equivalent: (i) B has the RNP. (ii) Every uniformly integrable B-valued martingale converges a.s. and in L1 (B). (iii) Every B-valued martingale bounded in L1 (B) converges a.s. (iv) Every B-valued martingale bounded in L p (B) converges a.s. and in L p (B). (v) For every δ > 0, B does not contain a bounded δ-separated tree. (vi) Every bounded subset of B is dentable. Proof. (i) ⇒ (ii). Assume (i). Let (, A, P) be a probability space and let (An )n≥0 be an increasing sequence of σ -subalgebras. Let us assume A = A∞ for simplicity. Let (Mn ) be a B-valued uniformly integrable martingale adapted to (An )n≥0 . We can associate to it a vector measure μ as follows. For any A in A = A∞ , we define  Mn dP. (2.9) μ(A) = lim n→∞ A

We will show that this indeed makes sense and defines a bounded vector  measure. Note that if A ∈ Ak then by (1.18) for all n ≥ k A Mn dP = A Mk dP, so that the limit in (2.9) is actually stationary. Thus, (2.9) is well defined  when A ∈ n≥0 An . Since (Mn ) is uniformly integrable, ∀ε > 0 ∃δ > 0 such that P(A) < δ ⇒ μ(A) < ε. Using this, it is easy to check that μ extends to a σ -additive vector measure on A∞ . Indeed, note that, by (1.14), we have

2.2 Martingales, dentability and the RNP

51

E(Mn 1A ) = E(Mn EAn (1A )). Thus the limit in (2.9) is the same as lim E(Mn EAn (1A )).

(2.10)

n→∞

To check that this definition makes sense, note that if ϕn = EAn (1A ), then E(Mn ϕn ) − E(Mm ϕm ) = E(Mm (ϕn − ϕm )),

∀n < m

(2.11)

and ϕn → 1A in L1 by Theorem 1.14. But by the uniform integrability (since |ϕn − ϕm | ≤ 2) we also must have E(Mm (ϕn − ϕm )) → 0 when n, m → ∞. Indeed, we can write for any t > 0  Mm  + tE|ϕn − ϕm |, E(Mm (ϕn − ϕm )) ≤ 2 sup m

Mm >t

 so that lim supn,m→∞ E(Mm (ϕn − ϕm )) ≤ 2 supm Mm >t Mm  and hence must vanish by the uniform integrability. Thus by (2.11) we conclude that the limit in (2.9) exists by the Cauchy criterion. By Theorem 1.36, the submartingale Mn  converges in L1 to a limit w in L1 . By (2.9) and Jensen’s inequality, we have  μ(A) ≤ lim E(Mn 1A ) = wdP, n→∞

A

from which the σ -additivity follows easily (the additivity is obvious by (2.9)). Note that for all A in A  (2.12) |μ|(A) ≤ wdP. A

Indeed, for all A1 , . . . , Am in A disjoint with A = ∪Ai  m m    μ(Ai ) ≤ wdP = wdP 1

1

Ai

A

and taking the supremum of the left-hand side, we obtain Claim 2.12. This shows |μ|   P. By our assumption (i), there is f in L1 (, A, P; B) such that μ(A) = A f dP for all A in A. Recall that for any k ≥ 0 and for any A in Ak we have by (1.18) ∀n ≥ k

E(Mn 1A ) = E(Mk 1A )

hence by (2.9) μ(A) = E(Mk 1A ) for any A in Ak . Therefore we must have   ∀k ≥ 0 ∀A ∈ Ak f dP = Mk dP A

A

52

Radon-Nikodým property

or equivalently, since this property characterizes EAk ( f ) (see the remarks after (1.7)) Mk = EAk ( f ). Hence by Theorems 1.14 and 1.30, (Mn ) converges to f a.s. and in L1 (B). This completes the proof of (i) ⇒ (ii). (ii) ⇒ (iii). This follows from Proposition 1.33. (iii) ⇒ (iv) is obvious. We give in what follows a direct proof that (iv) implies (i). (iv) ⇒ (v) is clear, indeed a bounded δ-separated tree is the range of a uniformly bounded martingale (Mn ), which converges nowhere since Mn − Mn−1  ≥ δ everywhere. (vi) ⇒ (i) is Theorem 2.6, so it only remains to prove (v) ⇒ (vi). Assume that (vi) fails. We will show that (v) must also fail. Let D ⊂ B be a  in Lemma 2.7, we can bounded non-dentable subset. Replacing D by the set D assume that there is a number δ > 0 such that ∀x ∈ D

x ∈ conv(D − B(x, δ)).

We will then construct a δ-separated tree inside D. Let (, A, P) be the Lebesgue interval. We pick an arbitrary point x0 in D and let M0 ≡ x0 on  = [0, 1]. Then since x0 ∈ conv(D − B(x0 , δ)) ∃α1 > 0, . . . , αn > 0 with

n 

αi = 1 ∃x1 , . . . , xn ∈ D

1

such that x0 =

n 

αi xi

and

xi − x0  ≥ δ.

(2.13)

1

We can find in  disjoint subsets A1 , . . . , An such that P(Ai ) = αi and ∪Ai = . We then let A0 be the trivial σ -algebra and let A1 be the σ -algebra generated by A1 , . . . , An . Then we define M1 (ω) = xi if ω ∈ Ai . Clearly (2.13) implies EA0 M1 = M0 and M1 − M0  ≥ δ everywhere. Since each point xi is in D, we can continue in this way and represent each xi as a convex combination analogous to (2.13). This will give M2 , M3 , etc. We skip the details of the obvious induction argument. This yields a δseparated martingale and hence a δ-separated tree. This completes the proof of (v) ⇒ (vi) and hence of Theorem 2.9. Finally, as promised, let us give a direct argument for (iv) ⇒ (i). Assume (iv) and let μ be a B-valued vector measure such that |μ|  ν where ν is as in the definition of the RNP. Then, by the classical RN theorem, there is a scalar density w such that |μ| = w.ν, thus it suffices to produce a RN density for μ with respect to |μ|, so that, replacing ν by |μ| and normalizing, we may as well

2.2 Martingales, dentability and the RNP

53

assume that we have a probability P such that ∀A ∈ A

μ(A) ≤ P(A).

Then for any finite σ -subalgebra B ⊂ A, generated by a finite partition A1 , . . . , AN of , we consider the B-measurable (simple) function fB :  → B that is equal to μ(A j )P(A j )−1 on each atom A j of B. It is then easy to check that { fB | B ⊂ A, |B| < ∞} is a martingale indexed by the directed set of all such B’s. By Remark 1.17, if (iv) holds then the resulting net converges in L p (B), and a fortiori in L1 (B) to a limit f ∈ L1 (B). By the continuity of EC , for each fixed finite C, EC ( fB ) → EC ( f ) in L1 (B), and EC ( fB ) = fC when C ⊂ B, therefore we must have EC ( f ) = fC for any finite C. Applying this to an arbitrary A ∈ A, taking for C the σ -subalgebra generated by A (and its complement), we obtain (recall that fC is constant on A, equal to μ(A)P(A)−1 ) E(1A f ) = E(1A fC ) = P(A) × μ(A)P(A)−1 = μ(A), so that we conclude that f .P = μ, i.e. we obtain (i). Remark. If the preceding property (vi) is weakened by considering only dyadic trees (i.e. martingales relative to the standard dyadic filtration on say [0, 1]), or k-regular trees, then it does not imply the RNP: Indeed, by [143] there is a Banach space B (isometric to a subspace of L1 ) that does not contain any bounded δ-separated dyadic tree, but that fails the RNP. Actually, that same paper shows that for any given sequence (K(n)) of integers, there is a Banach space B failing the RNP but not containing any δ-separated tree relative to a filtration such that |An | ≤ K(n) for all n. Corollary 2.10. If for some 1 ≤ p ≤ ∞ every B-valued martingale bounded in L p (B) converges a.s. then the same property holds for all 1 ≤ p ≤ ∞. Remark 2.11. Note that for 1 < p < ∞, if a B-valued martingale (Mn ) is bounded in L p (B) and converges a.s. to a limit f , then it automatically also converges to f in L p (B). Indeed, by the maximal inequalities (1.38) the convergence of Mn − f  p to zero is dominated, hence by Lebesgue’s theorem  Mn − f  p dP → 0. Corollary 2.12. The RNP is separably determined, that is to say: if every separable subspace of a Banach space B has the RNP, then B also has it. Proof. This follows from Theorem 2.9 by observing that a B-valued martingale in L1 (B) must ‘live’ in a separable subspace of B. Alternately, note that any δseparated tree is included in a separable subspace.

54

Radon-Nikodým property

Corollary 2.13. If a Banach space B satisfies either one of the properties (ii)– (v) in Theorem 2.9 for martingales adapted to the standard dyadic filtration on [0,1], then B has the RNP. Proof. It is easy to see by a suitable approximation (see Remark 2.8) that if B contains a bounded δ-separated tree, then it contains one defined on a subsequence {Ank | k ≥ 1}, (n1 < n2 < · · · ) of the dyadic filtration (An ) in [0,1]. This yields the desired conclusion. Corollary 2.14. If a Banach space B satisfies the property in Definition 2.2 when (, ν) is the Lebesgue interval ([0, 1], dt ), then B has the RNP. Corollary 2.15. Any reflexive Banach space and any separable dual have the RNP. Proof. Since the RNP is separably determined by Corollary 2.12, it suffices to prove that separable duals have the RNP. So assume B = X ∗ , and that B is separable. Note that X is necessarily separable too and the closed unit ball of B is a metrizable compact set for σ (X ∗ , X ). Let {Mn } be a martingale with values in the latter unit ball. For any ω, let f (ω) be a cluster point for σ (X ∗ , X ) of {Mn (ω) | n ≥ 0}. Let D ⊂ X be a countable dense subset of the unit ball of X. For any d in D, the bounded scalar martingale Mn , d converges almost surely to a limit that has to be equal to  f (ω), d. Hence since D is countable, there is  ⊂  with P( ) = 1 such that ∀ ω ∈ 

∀d ∈D

Mn (ω), d →  f (ω), d. σ (X ∗ ,D)

In other words we have Mn (ω)−−−−−−−→ f (ω) or equivalently (since we are σ (X ∗ ,X )

in the unit ball of B) Mn (ω)−−−−−→ f (ω) for any ω in  . Notice that we did not discuss the measurability of f yet. But now we know that ω →  f (ω), x is measurable for any x in X, hence since X is separable for any b ∈ B, ω → b − f (ω) = supx∈D |b − f (ω), x| is measurable, so f −1 (β ) = {ω | f (ω) ∈ β} is measurable for any open (or closed) ball β ⊂ X ∗ , and finally since X ∗ is separable, for any open set U ⊂ X ∗ , the set f −1 (U ) must be measurable, so f is Borel measurable. We claim that this implies that f is Bochner measurable. This (and the desired conclusion) follows from Phillips’s theorem (see §1.8). Alternatively we can conclude the proof by the same trick as in §1.8, as follows. For any b in B we have b − Mn  =

sup d∈D,d≤1

|b − Mn , d| =

sup d∈D,d≤1

|En b − f , d| ≤ En b − f 

2.2 Martingales, dentability and the RNP

55 a.s.

(note that ω → b − f (ω) is bounded and measurable, so that En b − f  −→ b − f ). Therefore, lim supn b − Mn ≤b − f  a.s.We can assume that this holds on the same set of probability one for all b in a countable dense subset of B, hence actually for all b in B. But then taking b = f (ω) we have for almost all ω, lim supn→∞  f (ω) − Mn (ω) = 0. Thus we conclude by Theorem 2.9 that B has the RNP. Remark. The examples of divergent martingales described in Remark 1.35 show that the separable Banach spaces L1 ([0, 1]) and c0 fail the RNP. Remark. The RNP is clearly stable by passing to subspaces but obviously not to quotients Indeed, 1 , being a separable dual, has the RNP but any separable space (e.g. c0 ) is a quotient of it. Remark 2.16. The notion of ‘quasi-martingale’ is useful to work with random sequences that are obtained by perturbation of a martingale. An adapted sequence (Fn )n≥0 in L1 (B) is said to be a quasi-martingale if ∞ 

En−1 (Fn − Fn−1 )L1 (B) < ∞.

1

Given such a sequence, let fn = Fn −

n 1

Ek−1 (Fk − Fk−1 ),

so that denoting dfn = fn − fn−1 we can write dfn = dFn − En−1 (dFn ). Clearly ( fn ) is then a martingale and for all m < n we have pointwise  Ek−1 (Fk − Fk−1 )B ( fn − fm ) − (Fn − Fm )B ≤ m t F (s) − F (t ) = f (x)dx, t

and then the B-valued version of the classical Lebesgue differentiation theorem ensures that F (x) exists and F (x) = f (x) at a.e. x in (a, b). The B-valued version of Lebesgue’s differentiation theorem can be deduced from the HardyLittlewood maximal inequality in much the same way that we deduced the martingale convergence theorem from Doob’s inequality. See §3.3 and Remark 3.9 for more details. Conversely, the a.e. differentiability of all B-valued Lipschitz functions F implies the RNP by Corollary 2.14. The following complements the panorama of the interplay between martingale convergence and Radon-Nikodým theorems. This statement is valid for general Banach spaces, but we should emphasize for the reader that the ω-a.s. convergence of the variables ω →  fn (ω) is considerably weaker than that of the sequence ( fn (ω)) itself. The latter requires the RNP by Theorem 2.9.

2.3 The dual of L p (B)

57

Proposition. Let B be an arbitrary Banach space. Consider μ ∈ M(, A; B) such that |μ| = w.P where P is a probability measure on (, A) and w ∈ L1 (, A, P). Let (An )n≥0 be a filtration such that A∞ = A, and such that, for each n, μ|An admits a RN density fn in L1 (, A, P; B) (for instance this is automatic if An is finite or atomic). Then  fn  → w a.s. Proof. By Proposition 2.1, for each fixed ε > 0 we can find unit vectors ξ1 , . . . , ξN in B∗ such that the vector measure μN : A → N∞ defined by μN (A) =  (ξ j (μ(A))) j≤N satisfies |μN |() > |μ|() − ε = 1 − ε. Assume |μ|() = w dP = 1 for simplicity. Note that |μ|An | ≤ |μ||An = wn .P where wn = EAn w. Therefore  fn  ≤ wn . By the martingale convergence Theorem 1.14, wn → w a.s. and in L1 , and hence lim sup  fn  ≤ w and



a.e.

 lim sup  fn  ≤ w = 1. We claim that   lim inf n  fn  ≥ lim inf n sup j≤N |ξ j ( fn )| = |μN |() > 1 − ε.

Indeed, being finite-dimensional, N∞ has the RNP and hence μN = ϕN .P for some ϕN in L1 (; A, P; N∞ ). This implies (by (2.2)) |μN | = ϕN .P. Clearly EAn ϕN = (ξ j ( fn )) j≤N and hence when n → ∞ sup j≤N |ξ j ( fn )| → ϕN  Thus

a.s. and in L1 .

 E lim inf n sup j≤N |ξ j ( fn )| =

ϕN dP = |μN |() > 1 − ε,

proving the preceding claim. Using this claim,  we conclude easily: We have lim inf  fn  ≤ lim sup  fn  ≤ w but lim inf  fn dP > w dP − ε, so we obtain lim inf  fn  = lim sup  fn  = w a.e.

2.3 The dual of L p (B) Notation. By analogy with the Hardy space case, let us denote by h p (, (An )n≥0 , P; B)

58

Radon-Nikodým property

the (Banach) space of all B-valued martingales M = {Mn | n ≥ 0} that are bounded in L p (B), equipped with the norm M = sup Mn L p (B) . n≥0

Remark 2.18. Note that, by Theorem 1.14, the mapping f → {En ( f ) | n ≥ 0} defines an isometric embedding of L p (, A∞ , P; B) into h p (, (An )n≥0 , P; B). Remark 2.19. Let 1 < p < ∞. With this notation, Theorem 2.9 says that B has the RNP iff h p (, (An )n≥0 , P; B) = L p (, A∞ , P; B). We now turn to the identification of the dual of L p (B). Let p be the conjugate exponent such that p−1 + p −1 = 1. Suppose that we are given a filtration A0 ⊂ · · · An ⊂ An+1 ⊂ · · · of finite σ -subalgebras and let us assume A = A∞ . Let L p (B) = L p (, A, P; B) with 1 ≤ p < ∞. Let ϕ be a bounded linear form on L p (B). By restriction to L p (, An , P; B), ϕ defines a linear form ϕn in L p (, An , P; B)∗ . But, since An is finite, we have L p (, An , P; B)∗ = L p (, An , P; B∗ ) isometrically, hence ϕn corresponds to an element Mn in L p (, An , P; B∗ ). Moreover, since ϕn is the restriction of ϕn+1 it is easy to see that Mn = En (Mn+1 ), i.e. that {Mn } is a B∗ valued martingale. Moreover, we have sup Mn L p (B∗ ) = ϕL p (B)∗ . n

Proposition 2.20. If 1 ≤ p < ∞, the preceding correspondence ϕ → (Mn )n≥0 is an isometric isomorphism from L p (, A, P; B)∗ to the space h p (, (An )n≥0 , P; B∗ ). Proof. Indeed, it is easy to see conversely that given any martingale {Mn } in the unit ball of h p (, (An )n≥0 , P; B∗ ), Mn defines an element ϕn in L p (, An , P; B)∗ so that ϕn+1 extends ϕn , and ϕn  ≤ 1. Hence by density of the union of the spaces L p (, An , P; B) in L p (B), we can extend the ϕn ’s to a (unique) functional ϕ in L p (B)∗ with ϕ ≤ 1. Thus, it is easy to check that the correspondence is one-to-one and isometric.

2.3 The dual of L p (B)

59

Remark 2.21. By Remark 2.18, we have an isometric embedding L p (, A∞ , P; B∗ ) ⊂ L p (, A∞ , P; B)∗ . Theorem 2.22. A dual space B∗ has the RNP iff for any countably generated probability space and any 1 ≤ p < ∞ we have (isometrically) L p (, A, P; B)∗ = L p (, A, P; B∗ ). Moreover for B∗ to have the RNP it suffices that this holds for some 1 ≤ p < ∞ and for the Lebesgue interval. Proof. If A is countably generated we can assume A = A∞ with A∞ associated to a filtration of finite σ -algebras (An ), as earlier. Then Theorem 2.22 follows from Proposition 2.20 and Remark 2.19. The second assertion follows from Corollary 2.13. Remark 2.23. The extension of the duality theorem to the case when P is replaced by a σ -finite measure m is straightforward. Indeed, in that case, we can find a disjoint measurable partition of  into sets n of finite measure, so that L p (, m; B) =  p ({L p (n , mn ; B)}) (where mn denotes the restriction of m to n ) and we are reduced to the finite case. Incidentally, the same idea allows to extend the duality to the case when the space L p (, m; B) can be identified with a direct sum  p ({L p (i , mi ; B)}i∈I ) over a set I of arbitrary cardinal, with each mi finite. Using this one could also remove the σ -finiteness assumption but we will never need this level of generality. By Corollary 2.15 we have Corollary 2.24. If B is reflexive, then L p (, A, P; B) is also reflexive for any 1 < p < ∞. Remark. Of course the preceding isometric duality holds for any dual space B∗ when the measure space is discrete (i.e. atomic). Remark. The preceding theorem does remain valid for p = 1. Note however that, if dim(B) = ∞, the B-valued simple functions are, in general, not dense in the space L∞ (, A, P; B) (see, however, Lemma 2.41). This is in sharp contrast with the finite-dimensional case, where they are dense because the unit ball, being compact, admits a finite ε-net for any ε > 0. We wish to emphasize that we defined the space L∞ (, A, P; B) (in Bochner’s sense) as the space of

60

Radon-Nikodým property

B-valued Bochner-measurable functions f (see §1.8) such that  f (.)B is in L∞ , equipped with its natural norm. This definition makes sense for any measure space (, A, P), and, with it, the preceding theorem does hold for p = 1.

2.4 Generalizations of L p (B) We now turn to a slightly different description of h p (, (An )n≥0 , P; B) that is more convenient in applications. Let (, A, m) be a finite measure space. (The σ -finite case can be treated as in Remark 2.23.) We denote by  p (, A, m; B), or simply by  p (m; B), or even simply  p (B) when there is no risk of confusion, the space of all vector measures μ : A → B such that |μ|  m and such is in L p (, A, m). We equip it with the norm: that the density d|μ| dm



d|μ|

μ p (m;B) =

.

dm

L p (m) It is not hard to check that this is a Banach space. Let μ ∈  p (B) so that |μ| = w.m with w p = μ p (B) < ∞. Then for any scalar simple function f = 1Ak fk in L p (Ak disjoint in A, fk ∈ C), if we denote   f dμ = μ(Ak ) fk , we have obviously



  



f dμ ≤ μ(A )| f | ≤ w dm| f | = | f |w dm k k k



Ak

≤  f  p w p . Therefore, we can extend the integral f → Iμ : L p (m) → B such that



f dμ to a continuous linear map

Iμ  ≤ w p . (2.14)   For convenience, we still denote Iμ ( f ) by f dμ. Note however that f dμ ∈ B. Remark 2.25. More generally, if B ⊂ A is a σ -subalgebra, the analogue of the conditional expectation of μ is simply its restriction μ|B to B. It is easy to check that the density of |μ|B | (with respect to |m|B |) is ≤ EB w. Thus conditional expectations define operators of norm 1 on  p (B) for any 1 ≤ p ≤ ∞. If (, A, m) is a countably generated probability space and if (An )n≥0 is a filtration of finite σ -subalgebras of A whose union generates A, then if p > 1,  p (B) can be identified with the space h p (, (An )n≥0 , m; B). Indeed, it is not

2.4 Generalizations of L p (B)

61

hard to check that for any martingale ( fn ) bounded in L p (B), the set function μ  defined on n≥0 An by the (stationary) limit  μ(A) = lim fk dm (2.15) k→∞ A

extends to a vector measure in  p (B) with μ p (B) = supn≥0  fn  p . Conversely, any vector measure μ defines such a martingale by simply restricting μ to An for each n ≥ 0. See the proof of Theorem 2.9. Thus we have proved: Proposition 2.26. The correspondence ( fn ) → μ defined by (2.15) is an isometric isomorphism from h p (, (An )n≥0 , m; B) to  p (, A∞ , m; B) for any p > 1. We leave the proof as an exercise. Remark 2.27. When p = 1, the space 1 (, A∞ , m; B) can be identified with the subspace of h1 (, (An )n≥0 , m; B) formed of all martingales such that ( fn B ) is uniformly integrable. Consider now f in L p (, m; B) and μ in  p (B∗ ). Assume first that f is a  B-valued simple function i.e. f = 1Ak fk with (Ak ) as before but now with fk  in B. We denote f · dμ = μ(Ak ), fk  ∈ C. Again we find    f · dμ ≤ μ(Ak ) fk  ≤ |μ|(Ak ) fk    ≤  f d|μ| =  f wdm ≤ w p  f L p (B) . Therefore, we may extend this integral to the whole of L p (B), so that we may define a ‘duality relation’ by setting  μ, f  = f · dμ. ∀ f ∈ L p (B) ∀μ ∈  p (B∗ ) Proposition 2.28. Let 1 ≤ p < ∞ and let p be conjugate to p, i.e. p−1 + p −1 = 1. Then, in the duality described earlier, we have an isometric identity L p (, A, m; B)∗ =  p (, A, m; B∗ ). Proof. We may as well assume m() = 1. If A is countably generated, then the proof follows from Propositions 2.20 and 2.26. The general case is similar using Remark 1.17. It is natural to wonder whether there is a alternate description of the dual of L p (B) as a space of B∗ valued functions. As the next statement shows, this is possible, but at the cost of a weakening of the notion of measurability.

62

Radon-Nikodým property

A function f :  → B∗ will be called weak∗ scalarly measurable if for every b in B the scalar valued function  f (.), b is measurable. Assume B separable. Let us denote by  p (, A, m; B∗ ) the space of (equivalence classes of) weak∗ scalarly measurable functions f :  → B∗ such that the function ω →  f (ω)B∗ (which is measurable since B is assumed separable) is in L p . We equip this space with the obvious norm   f  = (  f (ω)Bp∗ )1/p . We have then Theorem 2.29. Assume B separable. Then for any 1 ≤ p ≤ ∞, we have (isometrically)  p (, A, m; B∗ ) =  p (, A, m; B∗ ). Proof. We assume as before that A is generated by a filtration (An )n≥0 of finite algebras. We may clearly reduce to the case when m = P for some probability P. Assume first p > 1. By Proposition 2.20, it suffices to show how to identify h p (, (An )n≥0 , P; B∗ ) with  p (, A, P; B∗ ). Consider a martingale ( fn ) in h p (, (An )n≥0 , P; B∗ ). By the maximal inequality, ( fn ) is bounded a.s. and hence a.s. weak∗ compact. Let f (ω) be a weak∗ cluster point of ( fn ). Then for any fixed b ∈ B, the scalar martingale  fn (.), b converges a.s. Its limit must necessarily be equal to  f (.), b. This shows that f is weak∗ scalarly measurable. Let D be a countable dense subset of the unit ball of B. Since D is countable, and  f (.), b = limn→∞  fn (.), b for any b ∈ D, we have a.s.  f  = sup | f , b| ≤ lim  fn  b∈D

n→∞

and hence by Fatou’s lemma  f  p ≤ ( fn )h p . Note that for any b ∈ D, we have  fn (.), b = En ( f (.), b).

(2.16)

Conversely, consider now f ∈  p (, A, P; B∗ ). Fix n. Let A be an atom of  An . Then b → P(A)−1 A b, f  is a continuous linear form on B with norm ≤ P(A)−1 A  f B∗ . Let us denote it by fA . Let fn be the B∗ valued function that is equal to fA on each atom A ∈ An . We have clearly En (b, f ) = b, fn  and hence En (b, fn+1 ) = b, fn  for any b in D. Since D separates points, this

2.5 The Krein-Milman property

63

shows that ( fn ) is a martingale, and moreover  fn  = supb∈D |b, fn | ≤ En  f . It follows that  ( fn )h p ≤ (  f  p )1/p =  f  p . By (2.16), the correspondences ( fn ) → f and f → ( fn ) are inverses of each other. This shows that ( fn ) → f is an isometric isomorphism from h p to  p if p > 1. When p = 1, we use Remark 2.27. By the uniform integrability, (2.16) still holds so the same proof is valid. Corollary 2.30. Assume B separable. Then B∗ has the RNP iff any weak* scalarly measurable function f with values in B∗ is Bochner measurable. Proof. If B∗ has the RNP, we know by Theorem 2.22, that ∞ (B∗ ) = L∞ (B∗ ). In other words any bounded weak* scalarly measurable function is Bochner measurable. By truncation, the unbounded case follows. Conversely, if the latter holds we have ∞ (B∗ ) = L∞ (B∗ ) and B∗ has the RNP by Theorems 2.22 and 2.29.

2.5 The Krein-Milman property Throughout this section, let B be a Banach space over R. Recall that a point x in a convex set C ⊂ B is called extreme in C if whenever x lies inside a segment S = {θ y + (1 − θ )z | 0 < θ < 1} with endpoints y, z in C, then we must have y = z = x. Equivalently C\{x} is convex. See [21] and [10] for more information. Definition. We will say that a Banach space B has the Krein-Milman property (in short KMP) if every closed bounded convex set in B is the closed convex hull of its extreme points. We will show that RNP ⇒ KMP. The converse remains a well-known important open problem (although it is known that RNP is equivalent to a stronger form of the KMP; see later). We will use the following beautiful fundamental result due to Bishop and Phelps. Theorem 2.31 (Bishop-Phelps). Let C ⊂ B be a closed bounded convex subset of a (real) Banach space B. Then the set of functionals in B∗ that attain their supremum on C is dense in B∗ . We will need a preliminary simple lemma.

64

Radon-Nikodým property

Lemma 2.32. Let f , g ∈ B∗ be such that  f  = g = 1. Let 0 < ε < 1 and t > 0. Let C( f , t ) = {x ∈ B | x ≤ t f (x)}. (i) Assume that f (y) = 0 and y ≤ 1 imply |g(y)| ≤ ε/2. Then either  f − g ≤ ε or  f + g ≤ ε. (ii) Assume that t > 1 + 2/ε and that g is non-negative on C( f , t ). Then  f − g ≤ ε. Proof. (i) Let g1 ∈ B∗ be a Hahn Banach extension of the restriction of g to ker( f ) such that g1  ≤ ε/2. Then g − g1 = a f for some a ∈ R. Since f , g have norm 1 we must have 1 − ε/2 ≤ |a| ≤ 1 + ε/2, and hence either |a − 1| ≤ ε/2 or |a + 1| ≤ ε/2. Also g ± f = (a ± 1) f + g1 , and hence g ± f  ≤ |a ± 1| + ε/2. This yields (i). (ii) Assume f (y) = 0 and y ≤ 1. Let z = (2/ε)y. Since  f  = 1 there is x with x = 1 such that f (x) ≥ (1 + 2/ε)/t and f (x) > ε. Then x ± z ≤ 1 + 2/ε ≤ t f (x) = t f (x ± z), and hence x ± z ∈ C( f , t ). By our assumption on g, we have g(x ± z) ≥ 0, and hence |g(y)| = (ε/2)|g(z)| ≤ (ε/2)g(x) ≤ ε/2. By (i) we obtain  f − g ≤ ε since the other case is ruled out by our choice of x. Indeed,  f + g ≤ ε would imply f (x) + g(x) ≤ ε and hence g(x) ≤ ε − f (x) < 0. Proof of Theorem 2.31. Let t > 0 and let f ∈ B∗ with  f  = 1. We claim that there is a point x0 ∈ C such that C ∩ (x0 + C( f , t )) = {x0 }. Taking this for granted, let us complete the proof. Let 0 < ε < 1 and t > 1 + 2/ε. Note that C( f , t ) is a closed convex cone. Since t > 1, C( f , t ) has a non-empty interior denoted by U, which, by a well-known fact, is dense in C( f , t ). Note that x0 + U and C are disjoint (because 0 ∈ U). By Hahn-Banach separation (between the convex open set x0 + U and C) there is a 0 = g ∈ B∗ and α ∈ R such that g ≤ α on C and g ≥ α on x0 + C( f , t ). We must have g(x0 ) = α = sup{g(x) | x ∈ C}, and after normalization, we may assume g = 1. Then g ≥ α on x0 + C( f , t ) implies g ≥ 0 on C( f , t ). Therefore, by (ii) in Lemma 2.32,  f − g ≤ ε and g attains its supremum on C at x0 . This proves the Theorem. We now prove the claim. We will use Zorn’s lemma. We introduce the partial order on C defined by the convex cone C( f , t ): For x, y ∈ C we declare that x ≥ y if x − y ∈ C( f , t ). Note that this means x − y ≤ t( f (x) − f (y)). We will show that there is a maximal element in C. Let {xi } be a totally ordered subset. Since { f (xi )} is bounded in R (because C is bounded), it admits a convergent subnet and since xi − x j  ≤ t( f (xi ) − f (x j )) if xi ≥ x j , the subnet

2.5 The Krein-Milman property

65

(xi ) is Cauchy and hence converges to some limit x ∈ C (because C is closed). Clearly we have in the limit x − x j  ≤ t( f (x) − f (x j )), or equivalently x ≥ x j for all j. Thus, the order is inductive and we may apply Zorn’s lemma: There is a maximal element x0 ∈ C, which means ∀x ∈ C

x0 − x ≤ t( f (x0 ) − f (x)).

(2.17)

Then if x ∈ C ∩ (x0 + C( f , t )) we have x − x0 ∈ C( f , t ), which means x − x0  ≤ t f (x − x0 ) and hence by (2.17) x − x0  = 0. This shows that x0 is the only point in C ∩ (x0 + C( f , t )). Remark 2.33. Our goal in the next series of observations is to show that any slice of a closed bounded convex set C contains a (non-void) face. (i) Let x∗ ∈ B∗ be a functional attaining its supremum on C, so that if α = sup{x∗ (b) | b ∈ C}, the set F = {b ∈ C | x∗ (b) = α} is non-void. We will say that F is a face of C. We need to observe that a face enjoys the following property: If a point in F is inside the segment joining two points in C, then this segment must entirely lie in F. (ii) In particular, any extreme point of F is an extreme point of C. (iii) Now assume that we have been able to produce a decreasing sequence of sets · · · ⊂ Fn ⊂ Fn−1 ⊂ · · · F0 = C such that Fn is a face of Fn−1 for any n ≥ 1 and the diameter of Fn tends to zero. Then, by the Cauchy criterion, the intersection of the Fn ’s contains exactly one point x0 in C. We claim that x0 is an extreme point of C. Indeed, if x0 sits inside a segment S joining two points in C, then by (i) we have S ⊂ F1 , hence (since F2 is a face in F1 and x0 ∈ F2 ) S ⊂ F2 and so on. Hence S ⊂ ∩Fn = {x0 }, which shows that x0 is extreme in C. (iv) Assume that every closed bounded convex subset C ⊂ B has at least one extreme point. Then B has the KMP. To see this, let C1 ⊂ C be the closed convex hull of the extreme points of C. We claim that we must have C1 = C. Indeed, otherwise there is x in C\C1 and by Hahn-Banach there is x∗ ∗ in B∗ such that x|C < β and x∗ (x) > β. Assume first that this functional 1 achieves its supremum α = sup{x∗ (b) | b ∈ C}. This case is easier. Note α > β. Then let F = {b ∈ C | x∗ (b) = α}, so that F is a face of C disjoint from C1 . But now F is another non-void closed bounded convex set that, according to our assumption, must have an extreme point. By (ii) this point is also extreme in C, but this contradicts the fact that F is disjoint from C1 . In general, x∗ may fail to achieve its norm, but we can use the BishopPhelps theorem (Theorem 2.31) to replace x∗ by a small perturbation of

66

Radon-Nikodým property itself that will play the same role in the preceding argument. Indeed, by Theorem 2.31, for any ε > 0 there is y∗ in B∗ with x∗ − y∗  < ε that achieves its sup on C. We may assume b ≤ r for any b in C. Let γ = sup{y∗ (b) | b ∈ C} and note that γ > α − rε; and hence y∗ (b) = γ implies x∗ (b) > α − 2rε. Hence if ε is chosen so that α − 2rε > β,  = {b ∈ C | y∗ (b) = γ } is included in {b | x∗ (b) > β} we are sure that F  hence is disjoint from C1 . We now repeat the preceding argument: F must have an extreme point, by (ii) it is extreme in C hence must be in  ∩ C1 = ∅. This proves the claim, and hence B C1 , but this contradicts F has the KMP. (v) The preceding argument establishes the following general fact: let S be a slice of C, i.e. we assume given x∗ in B∗ and a number β so that S = {b ∈ C | x∗ (b) > β}, then if S is non-void it must contain a (non-void) face of C.

Theorem 2.34. The RNP implies the KMP. Proof. Assume B has the RNP. Let C ⊂ B be a bounded closed convex subset. Then by Theorem 2.9, C is dentable. So for any ε > 0, there is x in C such that x ∈ / conv(C\B(x, ε)). By Hahn-Banach separation, there is x∗ in B∗ and a number β such that the slice S = {b ∈ C | x∗ (b) > β} contains x and is disjoint from C\B(x, ε). In particular, we have b − x ≤ ε for any b in S, so the diameter of S is ≤ 2ε. By Remark 2.33 (v), S must contain a face F1 of C, a fortiori of diameter ≤ 2ε. Now we can repeat this procedure on F1 : we find that F1 admits a face F2 of arbitrary small diameter, then F2 also admits a face of small diameter, and so on. Thus, adjusting ε > 0 at each step, we find a sequence of (non-void) sets · · · ⊂ Fn+1 ⊂ Fn ⊂ · · · ⊂ F1 ⊂ F0 = C such that Fn+1 is a face of Fn and diam (Fn ) < 2−n . Then, by Remark 2.33 (iii), the intersection of {Fn } contains an extreme point of C. By Remark 2.33 (iv), we conclude that B has the KMP. Let C ⊂ B be a convex set. A point x in C is called ‘exposed’ if there is a functional x∗ such that x∗ (x) = sup{x∗ (b) | b ∈ C} and x is the only point of C satisfying this. (Equivalently, if the singleton {x} is a face of C.) The point x is called ‘strongly exposed’ if the functional x∗ can be chosen such that, in addition, the diameter of the slice {b ∈ C | x∗ (b) > sup x∗ − ε} C

tends to zero when ε → 0. Clearly, the existence of such a point implies that C is dentable. More precisely, if C is the closed convex hull of a bounded

2.6 Edgar’s Choquet theorem

67

set D, then D is dentable because every slice of C contains a point in D (see Remark 2.5). We will say that B has the ‘strong KMP’ if every closed bounded convex subset C ⊂ B is the closed convex hull of its strongly exposed points. It is clear (by (vi) ⇒ (i) in Theorem 2.9) that the strong KMP implies the RNP. That the converse also holds is a beautiful and deep result due to Phelps, for the proof of which we refer to [381]: Theorem 2.35. The RNP is equivalent to the strong KMP.

2.6 Edgar’s Choquet theorem Our goal is the following version due to Edgar [216] of Choquet’s classical theorem for compact convex subsets of (metrizable) locally convex spaces for which we refer to [16]. We follow the very simple proof outlined in [247]. Theorem 2.36. Let C be a bounded, separable, closed convex subset of a Banach space B with the RNP. For any z ∈ C, there is a Bochner-measurable function z :  → C on a probability space (, P) taking values in the set ex(C) of extreme points of C and such that z = E(z ). The proof is a beautiful application of martingale ideas, but there is a catch: it uses more sophisticated measure theory than used in most of this book, in particular universally measurable sets and von Neumann’s lifting theorem, so we start by a review of these topics. Recall that a subset A ⊂ X of a topological space X is called universally measurable if it is measurable with respect to any probability measure m on the Borel σ -algebra of X, denoted by BX . This means that for any ε > 0, there are Borel subsets A1 , A2 such that A1 ⊂ A ⊂ A2 and m(A2 \ A1 ) < ε. Clearly these sets form a σ -algebra, that we denote by U (BX ). If we are given a probability P on BX and A ∈ U (BX ), taking suitable unions and intersections we find Borel subsets A1 , A2 (depending on P) such that A1 ⊂ A ⊂ A2 and P(A2 \ A1 ) = 0. Therefore P obviously extends to a probability on U (BX ) (for such a set A we set P(A) = P(A1 ) = P(A2 )). A mapping ρ : X → Y between topological spaces will be called universally measurable if it is measurable when X, Y are equipped respectively with U (BX ), U (BY ). A topological space is called Polish if it admits a metric (compatible with the topology) with which it becomes a complete separable metric space, or

68

Radon-Nikodým property

equivalently if it is homeomorphic to a complete separable metric space. We will use the basic fact that a continuous image of a Polish space is universally measurable to prove the following lemma. Lemma 2.37. For any C as in the preceding Theorem, the set of extreme points ex(C) is universally measurable. Proof. Clearly C and hence C × C is Polish. Let g : C × C → C be defined by g(x, y) = (x + y)/2. Let εn > 0 be any sequence tending to 0. Let Fn = {(x, y) ∈ C × C | x − y ≥ εn }. Clearly Fn is closed and hence Polish, and C \ ex(C) = ∪n g(Fn ). By the basic fact just recalled, C \ ex(C) (or equivalently ex(C)) is universally measurable. In our context, von Neumann’s classical selection theorem implies in particular the following fact that we admit without proof (see e.g. [337] for a detailed proof). Lemma 2.38. For C and g as earlier, let denote the diagonal in C × C. Then there is a universally measurable lifting ρ : C \ ex(C) → (C × C) \ , i.e. we have (‘lifting property’) g(ρ(x)) = x for any x ∈ C \ ex(C). Proof of Theorem 2.36. Let I = {1, 2, . . .}. Let  = {−1, 1}I with P equal to the product of (δ1 + δ−1 )/2. We denote by B (resp. Bn ) the σ -algebra generated by the coordinates ω → ωk with k ∈ I (resp. with 1 ≤ k ≤ n), and let B0 denote the trivial σ -algebra. Then we ‘complete B’ by setting A = U (B). Fix a point z ∈ C. We will define a dyadic martingale ( fn ) indexed by the set I, such that f0 = z and formed of C valued Bochner measurable functions with fn ∈ L1 (, Bn , P). Since C is bounded, this martingale will be bounded in L1 (B), and by the RNP of B (and in some sense merely of C) it will converge in L1 (B) and a.s. Moreover we will adjust fn to make sure that f (ω) = limn→∞ fn (ω) takes its values in ex(C) for a.a. ω. The idea is very simple: if z ∈ ex(C) we are done, so assume z ∈ ex(C), then we can write z = (x + y)/2 with y − x = 0, x, y ∈ C. This gives us f1 : we set f1 (ω) = x if ω1 = 1 and f1 (ω) = y if ω1 = −1. Then we repeat the same operation on each point x, y to define the next variable f2 and so on. More precisely, assuming we have defined a dyadic C valued martingale f0 , . . . , fn relative to B0 , . . . , Bn we define fn+1 like this: either fn (ω) ∈ ex(C) in which case nothing needs to be done so we set fn+1 (ω) = fn (ω), or f (ω) ∈ C \ ex(C) and then we may write fn+1 (ω) = (x(ω) + y(ω))/2

2.6 Edgar’s Choquet theorem

69

with x(ω) = y(ω) both Bn -measurable. We then set fn+1 (ω) = x(ω) if ωn+1 = 1 and fn+1 (ω) = y(ω) if ωn+1 = −1. We can rewrite fn+1 as fn+1 (ω) = fn (ω) + ωn+1 (x(ω) − y(ω))/2. Note that at this point the measurable selection problem is irrelevant: since fn takes at most 2n values, we can lift them one by one. Up to now, the construction is simple minded. There is however a difficulty. If no further constraint is imposed on the choices of x(ω), y(ω), there is no reason for the eventual limit f (ω) to lie in the extreme points. Edgar’s original proof remedies this by repeating the construction ‘trans-finitely’ using martingales indexed by the set of countable ordinals instead of our set I and proving that one of the variables thus produced will be with values in ex(C) before the index reaches the first uncountable ordinal. The alternate proof in [247] uses a trick to accelerate the convergence. The trick is to define for any z ∈ C δ(z) = sup{x − y/2 | x, y ∈ C, z = (x + y)/2} and to select x(ω), y(ω) in the preceding construction such that x(ω) − y(ω)/2 > δ( fn (ω)) − δn where δn > 0 is any sequence (fixed in advance) such that δn → 0 when n → ∞. Since fn+1 (ω) − fn (ω) = ±(x − y)/2, this gives us  fn+1 − fn  > δ( fn ) − δn .

(2.18)

Then let f = lim fn . Let us now repeat the martingale step but now applied to f . The idea is to prove that we cannot continue further. We define (universally) measurable functions y, z :  → C as follows. Either f (ω) ∈ ex(C) and then we do nothing: we set y(ω) = z(ω) = f (ω), or f (ω) ∈ C \ ex(C) and then we may write f (ω) = (x(ω) + y(ω))/2 with x(ω) = y(ω) defined by ρ( f (ω)) = (x(ω), y(ω)). By Lemma 2.38 (here a measurable selection is really needed), x and y are universally measurable. To show that f (ω) ∈ ex(C) a.s. it suffices to show that x = y a.s. or equivalently that E(1A (x − y)) = 0 for any measurable A ∈ A, with P(A) > 0. Let us fix such a set A. The key lies in the following (we write En instead of EBn ) Claim: En ((x − y)1A ) ≤ δ( fn ) a.s. Let xˆ = 1\A f + 1A x

and

yˆ = 1\A f + 1A y.

70

Radon-Nikodým property

These A-measurable functions take their values in C and we have f = (xˆ + y)/2. ˆ Therefore fn = En f = (En xˆ + En y)/2, ˆ ˆ ≤ δ( fn ) and our claim and by the very definition of δ(.) we have En xˆ − En y/2 follows. Now by (2.18) we have E fn+1 − fn  + δn > Eδ( fn ) and hence, since fn converges in L1 (B), we must have Eδ( fn ) → 0. Therefore, by the preceding claim EEn ((x − y)1A ) → 0. But by Jensen (see (1.4)) E(En ((x − y)1A )) ≤ EEn ((x − y)1A ) and hence this tends to 0 but in fact E(En ((x − y)1A )) = E((x − y)1A ), so we conclude E((x − y)1A ) = 0. Since this holds for any A, we have x = y a.s. proving that f ∈ ex(C) a.s. Of course this statement is relative to A equipped with the obvious extension of P to A. More precisely, we just proved that there are subsets A1 , A2 ∈ B with A1 ⊂ { f ∈ ex(C)} ⊂ A2 such that P(A1 ) = P(A2 ) = 1. Thus if we define z as equal to f = lim fn on A1 and, say, equal to 0 on  \ A1 , then z satisfies the required properties.

2.7 The Lewis-Stegall theorem Let μ : A → B be a bounded vector measure and let m be a finite measure on A such that for some constant C ∀A ∈ A

μ(A) ≤ Cm(A).

(2.19)

Let uμ : L1 (m) → B be the bounded linear map defined by  ∀g ∈ L1 (m) uμ (g) = g dμ. 

It is easy to verify that uμ is bounded with uμ : L1 (m) → B = sup{m(A)−1 μ(A) | A ∈ A m(A) > 0}. (2.20)

2.7 The Lewis-Stegall theorem

71

Moreover, if μ admits a RN density ψ ∈ L1 (m; B) so that μ = ψ.m, then (2.19) implies ψ ∈ L∞ (m; B) with ψL∞ (m;B) = uμ : L1 (m) → B.

(2.21)

Indeed, (2.19) implies that for any closed convex subset β of {x ∈ B | x > C}  the set A = {ψ ∈ β} must satisfy m(A) = 0 (because otherwise m(A)−1 A ψdm ∈ β, which contradicts (2.19)), and since we may assume B separable, this implies m({ψ > C}) = 0 and hence we find ψL∞ (m;B) ≤ uμ : L1 (m) → B. The converse is obvious by (1.4). We say that a linear operator u : X → Y between Banach spaces factors through 1 if there is a factorization of u of the form w

v

u : X −→ 1 −→ Y with bounded operators v, w. We denote by 1 (X, Y ) the space of all such maps u : X → Y . Moreover, we define γ1 (u) = inf{v w} where the infimum runs over all possible such factorizations. The space 1 (X, Y ) equipped  with this norm is easily seen to be a Banach space (Hint: if u = un with  γ1 (un ) < ∞, we may factorize un as un = v n wn with v n  = wn  ≤ (1 + εn )γ1 (un )1/2 , then one can factor u through the 1 -space that is the 1 -direct sum of the 1 -spaces through which the un ’s factor). The reason why we introduce this notion is the following result due to Lewis and Stegall. Theorem 2.39. Let μ be a vector measure satisfying (2.19) so that uμ : L1 (m) → B is bounded. Then μ admits a Radon Nikodým density in L1 (m; B) (actually in L∞ (B)) iff uμ factors through 1 . We will use two lemmas. Lemma 2.40. Consider ψ ∈ L∞ (, A, m; B) taking only countably many values. Let μ = ψ.m be the associated vector measure. Then uμ ∈ 1 (L1 (m), B) and γ1 (uμ ) = ψL∞ (B) . Proof. By assumption, we have a countable partition of  into sets (An )n≥0 in A such that ψ is constant on each An , say ψ (ω) = xnfor all ω  in An . We define w : L1 (m) → 1 and v : 1 → B by w(g) = An g dm n≥0 and  v ((αn )) = αn xn . Then obviously w ≤ 1 and v ≤ sup xn  ≤ ψL∞ (B) . Since uμ = vw we obtain γ1 (uμ ) ≤ ψL∞ (B) . The converse follows from (2.21).

72

Radon-Nikodým property

Lemma 2.41. Assume B separable. The subspace of L∞ (, A, m; B) formed of functions with countable range (i.e. ‘simple functions’ but with countably many values) is dense in L∞ (, A, m; B). Proof. Let f ∈ L∞ (m; B). Since B is separable, for any ε > 0 there is a countable partition of B into Borel sets (Bεn ) of diameter at most ε. Let Aεn = f −1 (Bεn ), so that {Aεn | n ≥ 0} is a countable partition of  in A. Let xnε = m(Aεn )−1

 Aεn

f dm and

fε =

 n

1Aεn xnε .

Note that fε is the conditional expectation of f with respect to the σ -algebra generated by {Aεn | n ≥ 0}. Since Bεn has diameter at most ε, we have  f (ω) − f (ω ) ≤ ε for a.a. ω, ω ∈ Aεn and hence  f (ω) − xnε  ≤ ε for a.a. ω ∈ Aεn . Therefore we have  f − fε L∞ (B) ≤ ε. Proof of Theorem 2.39. If uμ factors through 1 , then μ = v μ for some μ (still satisfying (2.19)). Since 1 (I) has the 1 (I) valued vector measure  RNP,  μ and hence a fortiori v μ admits a RN derivative in L1 (m, B). Conversely, assume μ = f .m for some f in L1 (m; B). Then we may as well assume B separable. By (2.21) our assumption (2.19) implies  f L∞ (B) ≤ C. By Lemma 2.41 for any ε > 0 there is fε ∈ L∞ (B) with countable range such  that  f − fε L∞ (B) ≤ ε. Now taking ε(n) = 2−n (say) we find f = ∞ 0 ψn with ψ0 = fε(0) and ψn = fε(n) − fε(n−1) for any n ≥ 1 so that ∞ 0

ψn L∞ (B) < ∞

and each ψn is a countably valued function in L∞ (B). Let un : L1 (m) → B be the linear operator associated to the vector measure μn = ψn .m. Clearly  u = un . Therefore (since 1 (L1 (m), B) is a Banach space) we conclude by Lemma 2.40 that u factors through 1 . Remark. Note that for any f in L1 (, A, m; B) there is a countably generated σ -subalgebra B ⊂ A and a separable subspace B1 ⊂ B such that f ∈ L1 (, B, m; B1 ). Indeed, we already noticed that we can replace B by a separable subspace B1 . Then, let B be the σ -subalgebra generated by f (i.e. the one formed by the sets f −1 (β ) where β runs over all Borel subsets of B1 ). Then, B is countably generated since f is Bochner-measurable and trivially f ∈ L1 (, B, m; B). This implies that, if μ = f .m, we have uμ = uμ EB .

2.8 Notes and remarks

73

2.8 Notes and remarks For vector measures and Radon-Nikodým theorems, a basic reference is [21]. A more recent, much more advanced, but highly recommended reading is Bourgain’s Lecture Notes on the RNP [8]. For more on the convexity issues related to the Bishop-Phelps theorem, see [20]. For martingales in the Banach space valued case, the first main reference is Chatterji’s paper [178] where the equivalence of (i), (ii), (iii) and (iv) in Theorem 2.9 is proved. The statements numbered from 2.10 to 2.22 all follow from Chatterji’s result but some of them were probably known before (see in particular [271]). Rieffel introduced dentability and proved that it suffices for the RNP. The converse is (based on work by Maynard) due to Davis-Phelps and Huff independently. The Lewis-Stegall theorem in §2.7 comes from [329]. Theorem 2.34 is due to Joram Lindenstrauss and Theorem 2.35 to Phelps [381]. See [21] for a more detailed history of the RNP and more precise references. In §2.6 we present Edgar’s theorem (improving Theorem 2.34) that the RNP implies a Choquet representation theorem using Ghoussoub and Maurey’s simpler proof from [247] (see also [416]). We refer to [314] for more illustrations of the use of Banach valued martingales. Martingales have been used repeatedly as a tool to differentiate measures. See Naor and Tao’s [358] for a recent illustration of this. Our presentation of the RNP is limited to the basic facts. We will now briefly survey additional material. Charles Stegall [428] proved the following beautiful characterization of duals with the RNP: Stegall’s Theorem ([428]). Let B be a separable Banach space. Then B∗ has the RNP iff it is separable. More generally, a dual space B∗ has the RNP iff for any separable subspace X ⊂ B, the dual X ∗ is separable. In the 1980s, a lot of work was devoted (notably at the impulse of H.P. Rosenthal and Bourgain) to ‘semi-embeddings’. A Banach space X is said to semiembed in another one Y if there is an injective linear mapping u : X → Y such that the image of the closed unit ball of X is closed in B (and such a u is then called a semi-embedding). The relevance of this notion lies in Proposition 2.42. If X is separable and semi-embeds in a space Y with the RNP, then X has the RNP. Proof. One way to prove this is to consider a martingale ( fn ) with values in the closed unit ball BX of X. Let u : X → Y be a semi-embedding. If Y has

74

Radon-Nikodým property

RNP then the martingale gn = u( fn ) converges in Y to a limit g∞ such that g∞ (·) ∈ u(BX ) = u(BX ). Let now f (ω) = u−1 (g∞ (ω)). We will show that f is Borel measurable. Let U be any open set in X. By separability, there is a sequence {βn } of closed balls in X such that U = ∪βn . Then {ω | f (ω) ∈ U} = ∪n {ω | g∞ (ω) ∈ u(βn )} but since u(βn ) is closed and g∞ measurable we find that f −1 (U ) is measurable. This shows that f is Borel measurable. By Phillips’s theorem, f is Bochner measurable. Now, since gn = En (g∞ ) = En (u( f )) = u(En ( f )) we have fn = u−1 (gn ) = En ( f ), and hence fn converges to f a.s. This shows that X has the RNP (clearly one could use a vector measure instead of a martingale and obtain the RNP a bit more directly). We refer to [144] for work on semi-embeddings. More generally, an injective linear map u : X → Y is called a Gδ -embedding if the image of any closed bounded subset of X is a Gδ -subset of Y . We refer to [241, 244, 245, 247, 248] for Ghoussoub and Maurey’s work on Gδ -embeddings. To give the flavor of this work, let us quote the main result of [245]: A separable Banach space X has the RNP iff there is a Gδ -embedding u : X → 2 such that u(BX ) is a countable intersection of open sets with convex complements. The proof of Proposition 2.42 shows that the RNP is stable under Gδ -embedding. As mentioned in the text, it is a famous open problem whether the KMP implies the RNP. It was proved for dual spaces by Huff and Morris using the theorem of Stegall [428], see [10, p. 91], and also for Banach lattices by Bourgain and Talagrand ([10, p. 423]). See also Chu’s paper [181] for preduals of von Neumann algebras. Schachermayer [419] proved that it is true for Banach spaces isomorphic to their square. See also [420–423, 435] for related work in the same direction. We should mention that one can define the RNP for subsets of Banach spaces. One can then show that weakly compact sets are RNP sets. See [10, 77] for more on RNP sets. Many results from this chapter have immediate extensions to RNP operators: we say that an operator u : B1 → B2 between two Banach spaces has the RNP if it transforms B1 valued bounded martingales into a.s. convergent B2 valued ones. With this terminology, B has the RNP iff the identity on B is an RNP operator. See Ghoussoub and Johnson’s [242] for an example of an RNP operator that does not factor through an RNP space. This is in sharp contrast with the

2.8 Notes and remarks

75

situation for weakly compact operators, which, by a classical result from [203], do factor through reflexive spaces. A Banach space X is called an Asplund space if every continuous convex function defined on a (non-empty) convex open subset D ⊂ E is Fréchet differentiable on a dense Gδ -subset of D. Stegall [429] proved that X is Asplund iff X ∗ has the RNP. We refer the reader to [77] for more information in this direction. A metric characterization of the RNP based on Theorem 2.9 (see Chapter 13 for more on this theme) appears in [371]. In [179], Cheeger and Kleiner give a characterization of the RNP (for separable spaces) as those that can be represented as inverse limits of finitedimensional spaces with what they call the ‘determining property’. In [180], they establish differentiability almost everywhere for Lipschitz maps from a certain class of metric spaces (called PI spaces) to Banach spaces with the Radon-Nikodým property (RNP).

3 Harmonic functions and RNP

In contrast with the next chapter, we will mostly work in the present one with real Banach spaces (unless specified otherwise). Note however that the complex case is included since any complex space may be viewed a fortiori as a real one (but not conversely). We will denote by D the open unit disc in C. Its boundary ∂D, the unit circle, will be often identified with the compact group T = R/2π Z( R/Z) by the usual map taking t ∈ T to eit . From now on in this book, we reserve the notation m for the normalized Haar (or Lebesgue) measure on T (or ∂D), correspondingly identified with the measure dt/2π . Possible exceptions to this rule will be clear from the context. In the present chapter, we focus on functions on D, and postpone the discussion of harmonic functions on the upper half-plane to §4.6.

3.1 Harmonicity and the Poisson kernel Let V ⊂ C be an open subset of the complex plane. A function u : V → R is called harmonic if it is C2 and if u = 0. If V is simply connected (for instance if V is an open disc), every real valued harmonic function u : V → R is the real part of an analytic function F : V → C. This implies in particular that every harmonic function is C∞ . Also, just like the analytic ones, harmonic functions satisfy the mean value theorem, as follows. We say that a continuous function u : V → R satisfies the mean value property if for every z in V and every r > 0 such that B(z, r) ⊂ V , we have (recall dm(t ) = dt/2π )  u(z) = 76

u(z + reit ) dm(t ).

(3.1)

3.1 Harmonicity and the Poisson kernel Integrating in polar coordinates, it is easy to check that this implies  dadb . u(z) = u(a + ib) |B(z, r)| B(z,r)

77

(3.2)

The mean value property is closely connected to martingales, and in fact, as we will see in §3.5, fundamental examples of martingales are produced by composing a harmonic function with Brownian motion. It is well known that the mean value property characterizes harmonicity. Actually, an even weaker formulation is enough, as follows. We say that u has the weak mean value property if for every z in V , there is a sequence of radii rn > 0 with rn → 0 such that B(z, rn ) ⊂ V and  (3.3) ∀n u(z) = u(z + rn eit ) dm(t ). It is known that this implies that u is harmonic (and hence has the ordinary mean value property). Another classical property of harmonic functions is the maximum principle, which says that if K ⊂ V is compact then supz∈K |u(z)| is attained on the boundary of K. (Of course, the case K = D¯ follows immediately from the variant of the mean value formula appearing as (3.9).) Now assume D ⊂ V . The mean value property gives us  u(0) = u(eit )dm(t ). (3.4) In fact, there is also a formula that gives u(z) as an average of its boundary values on D, this is given by the classical Poisson kernel. We have for all z = reiθ in D  u(z) = u(eit )Pr (θ − t )dm(t ), (3.5) where Pr (s) =

 1 − r2 = r|n| eins . 2 n∈Z 1 − 2r cos s + r

We will use the notation Pz (t ) = Pr (θ − t ) =

 n≥0

zn e−int +

 n 0 and bn = 12 a¯−n for n < 0. ur (θ ) = u(reiθ ).

By (3.6) we have Pr (n) = r|n| , by (3.11) ur (n) = bn r|n| and f (n) = bn . Hence (3.5) boils down to the observation that ur (n) = Pr (n) f (n).

∀n ∈ Z

Here (as often) we use the classical fact that two L1 -functions (or more generally two measures) coincide as soon as their Fourier transforms are equal. Let us record in passing that Pr is a continuous function on T with Pr ∞ =

1+r 1−r

(0 ≤ r < 1)

and also that if 0 ≤ r, s < 1 Pr ∗ Ps = Prs .

(3.12)

3.1 Harmonicity and the Poisson kernel

79

We will use repeatedly the basic inequalities valid for any f ∈ L p (T, m) and any μ ∈ M(T), if 1 ≤ p ≤ ∞ and 0 ≤ r < 1 Pr ∗ f  p ≤  f  p

(3.13)

Pr ∗ μM ≤ μM .

(3.14)

These are easy to check using the fact that Pr 1 = 1. Indeed, Pr ∗ f (resp. Pr ∗ μ) appears as an average of translates of f (resp. μ), so this follows immediately from the convexity of the norm in L p (resp. M). The symmetry of the Poisson kernel, i.e. the fact that Pr (t ) = Pr (−t ), also plays an important role in the sequel. For instance, using Fubini’s theorem, this symmetry immediately implies that, if 1 ≤ p ≤ ∞, 1/p + 1/q = 1 and 0 ≤ r < 1, we have   g(Pr ∗ f )dm = (Pr ∗ g) f dm, (3.15) ∀ f ∈ L p (T, m) ∀g ∈ Lq (T, m) T

T

 ∀μ ∈ M(T) ∀ f ∈ C(T)



T

(Pr ∗ f )dμ =

T

(Pr ∗ μ) f dm. (3.16)

Classically, the Poisson kernel is the fundamental tool to solve the Dirichlet problem in the disc, i.e. the problem of finding a harmonic extension inside the disc of a function defined on the boundary. The extension process is given by the ‘Poisson integral’ of a function, defined as follows. Consider f in L1 (T, m) real valued, we define, for all z = reiθ in D  u(z) = f (t )Pr (θ − t )dm(t ) = Pr ∗ f (θ ) (3.17) or equivalently

 u(z) =

f (t )Pz (dt ).

(3.18)

More generally, given a real measure μ on T, we can define  u(z) = Pr (θ − t )μ(dt ) = Pr ∗ μ(θ ) or

 u(z) =

Pz (t )μ(dt ).

(3.19)

Note that the series appearing in (3.7) converge uniformly (and even absolutely) on T, hence   (3.20) zn μ(n) + z¯n μ(n). u(z) = n≥0

n 1 from this result. For observe that if 0 < w ≤ 1, we have |w ± t|2 ≤ |1 ± tw|2 for any 0 < t < 1, and hence 2 1 * * w ± p−1 ≤ 1 + p−1 w ; q−1 q−1 so we have ⎧ q q ⎫1/q * * p−1 p−1 ⎪ ⎨ w + q−1 ⎬ + w − q−1 ⎪ ⎪ ⎩

2

⎪ ⎭



⎧ q q ⎫1/q * * p−1 p−1 ⎪ ⎪ ⎨ 1 + q−1 w + 1 − q−1 w ⎬ ⎪ ⎩ "



⎪ ⎭

2 |1 + w| p + |1 − w| p 2

#1/p .

Now dividing through both sides of this equation by a factor w and setting y = 1/w, we obtain the preceding inequality for the case y > 1. Thus we need

5.13 Appendix: Hölder-Minkowski inequality only to show the inequality ⎧ q q ⎫ * * p−1 p−1 ⎪ ⎪ ⎨ 1 + q−1 y + 1 − q−1 y ⎬ ⎪ ⎩

⎪ ⎭

2

" ≤

|1 + y| p + |1 − y| p 2

207

#1/p

for the restricted case 0 < y ≤ 1 and 1 < p ≤ q ≤ 2. With the use of the binomial expansion, this inequality is equivalent to     k % p/q ∞  p  m q p−1 y2k , y2k ≤ q−1 k=0 2k k=0 2k     and for 1 < p ≤ q ≤ 2 the binomial coefficients 2kp and 2kq are both positive, and in addition      p−1 k p q p ≤ . q 2k q−1 2k Using the elementary result that for 0 < λ ≤ 1 and x > 0 (1 + x)λ ≤ 1 + λx, we have . / p/q    ∞  q   p − 1 k q p − 1 k 2k p ∞ 2k 1+ y ≤1+ y k=1 2k k=1 2k q−1 q q−1 ∞  p  y2k . ≤1+ k=1 2k

5.13 Appendix: Hölder-Minkowski inequality For further reference, we wish to review here a classical set of inequalities usually referred to as ‘the Hölder-Minkowski inequality’. Let 0 < q ≤ p ≤ ∞ and let (, A, m) be any measure space. Consider a sequence (xn ) in L p (, A, m). Then

 1/q

1/q 

 q q



|x | x  . (5.107) n n p



p

Indeed, this is an easy consequence of the fact (since p/q > 1) that L p/q is a normed space. In particular, when q = 1 we find

 



|xn | ≤ xn  p

p

208

The UMD property for Banach spaces

that is but the triangle inequality in L p . If 0 < p ≤ q ≤ ∞, the inequality is reversed: we have

 1/q

1/q 



q q



|x | x  . (5.108) n n p



p

In particular, when q = ∞, we find simply the obvious inequality



sup |xn | ≥ sup xn  p . n n p One way to check (5.108) is to set r = q/p, r = r(r − 1)−1 and yn = |xn | p . Then (5.108) is the same as

 1/r

1/r 



r r



|y | ≥  y n n 1



1

that is easy to derive from  1/r    |yn |r = sup αn |yn | αn ≥ 0 |αn |r ≤ 1 . Indeed, we find  

|yn |r

1/r

≥  sup

|αn |r ≤1

 

|αn | |yn | =



yn r1

1/r

.

In its simplest form (5.107) and (5.108) reduce to: ∀x, y ∈ L p (|x|q + |y|q )1/q  p ≤ (xqp + yqp )1/q

if

p≥q

(|x|q + |y|q )1/q  p ≥ (x||qp + yqp )1/q

if

p ≤ q.

It is easy to see that actually the preceding inequalities imply conversely (5.107) and (5.108), by iteration. In the opposite direction, one can easily deduce from (5.107) and (5.108) the following refinements of (5.107) and (5.108). Let ( , A , m ) be another measure space. Consider a measurable function F :  ×  → R. Then (5.107) and (5.108) become FL p (m;Lq (m )) ≤ FLq (m ;L p (m))

if

p≥q

(5.109)

FL p (m;Lq (m )) ≥ FLq (m ;L p (m))

if

p ≤ q.

(5.110)

The latter two inequalities reduce to the same one if one exchanges p and q: We have a norm 1 inclusion Lq (m ; L p (m)) ⊂ L p (m; Lq (m )) when p ≥ q. Essentially the same proof as for (5.107) and (5.108) establishes (5.109) and (5.110). One can also deduce the latter from (5.107) and (5.108) by a simple argument, approximating integrals by finite sums. Note that (5.107) and (5.108)  correspond to  = N equipped with the counting measure m = δn .

5.14 Appendix: basic facts on weak-L p

209

5.14 Appendix: basic facts on weak-L p Let (, m) be a measure space. Let 0 < p < ∞. For any Z ∈ L0 (, A, m), let  1/p Z p,∞ = supt>0 t p m({|Z| > t}) . The following inequality is immediate ∀x, y ∈ L0 (, A, m)

|x| ∨ |y| p,∞ ≤ (x pp,∞ + y pp,∞ )1/p .

More generally, for any sequence xn ∈ L0 (, A, m)  xn  pp,∞ )1/p .  supn |xn | p,∞ ≤ ( n

(5.111)

A fortiori there is a constant c such that x + y p,∞ ≤ c(x p,∞ + y p,∞ ). We denote by L p,∞ (, A, m) (or briefly L p,∞ if there is no ambiguity) the space of those Z such that Z p,∞ < ∞, and we equip it with the quasi-norm Z → Z p,∞ . The space L p,∞ is called weak-L p because it contains L p and we have obviously ∀Z ∈ L p

Z p,∞ ≤ Z p .

Moreover, if m is a probability (or is finite) then L p,∞ ⊂ Lq for any 0 < q < p, and there is a positive constant c = c(p, q) such that Zq ≤ cZ p,∞ .

(5.112)

Indeed, this is easy to check using (1.3) (replace p by q in (1.3)). More generally, given a Banach (or quasi-Banach) space B we denote by L p,∞ (, A, m; B) (or simply L p,∞ (B)) the space of those f ∈ L0 (, m; B) such that ω →  f (ω)B is in L p,∞ and we equip it with the quasi-norm  1/p  f L p,∞ (B) = supt>0 t p m{ f B > t} . When p = 1 (arguably the most important case) and also when 0 < p < 1, the quasi-norm Z → Z p,∞ is not equivalent to a norm (unless L0 (, A, m) is finite-dimensional). But it is so when p > 1: Proposition 5.72. Assume p > 1 and let p = p/(p − 1). Then for any Z ∈ L p,∞ we have " #  −1/p |Z|dm ≤ p Z p,∞ . (5.113) Z p,∞ ≤ sup m(E ) E∈A

E

In particular, Z → Z p,∞ is equivalent to a norm, the middle term in (5.113). Proof. Since this will be proved in §8.5 in the wider context of Lorentz spaces, we only indicate a quick direct proof. The lower bound in (5.113) is obtained by

210

The UMD property for Banach spaces

∞  choosing E = {|Z| > t}. For the upper bound, we use E |Z|dm = 0 m({|Z| > t} ∩ E )dt. Then, assuming Z p,∞ ≤ 1, we may write for any s > 0  ∞  1 1−p s , |Z|dm ≤ sm(E ) + m({|Z| > t} ∩ E )dt ≤ sm(E ) + p − 1 s E  and with the optimal choice of s, this yields E |Z|dm ≤ p m(E )1/p . Corollary 5.73. Let 0 < r < 1. Then for any sequence (xn ) in L1,∞ we have

 1/r

 1/r

 r −1/r r



|x x | ≤ (1 − r)  . (5.114) n n 1,∞







1,∞

More generally, let ( , A , m ) be another measure space. Then for any measurable function S on the product  ×  , we have

 1/r

1/r 





r −1/r r |S(·, ω )| dm (ω ) ≤ (1 − r) . S(·, ω )1,∞ dm (ω )





1,∞

(5.115) Proof. Let p = 1/r. Note that for any Z we have Z p,∞ = |Z| p 1/p 1,∞ . Let yn =    |xn |r . By (5.113) we have  |yn | p,∞ ≤ p yn  p,∞ , but  |yn | p,∞ =    ( |xn |r )1/r r1,∞ and yn  p,∞ = xn r1,∞ , and (5.114) follows. When S is a step function, (5.115) is essentially the same as (5.114). The general case follows by a routine approximation argument. Remark. In analogy with the Hölder-Minkowski inequality (5.109), the proof of (5.115) (with p/r in place of 1/r) actually shows that, for any 0 < r < p < ∞, we have a bounded inclusion Lr (m ; L p,∞ (m)) ⊂ L p,∞ (m; Lr (m )). Corollary 5.74. Let 0 < r < 1. Then for any finite sequence (xn ) in L1,∞ we have  



r 1/r



2 1/2

−1/r εn xn

|xn | )

E

≤ (1 − r) , (5.116) Ar ( 1,∞

1,∞

where Ar is the constant appearing in Khintchin’s inequality (5.7).  Proof. We apply (5.115) with ( , m ) = ( , ν) and S(·, ω ) = εn (ω )xn (·). Then, using (5.7), (5.116) follows.

5.15 Appendix: reverse Hölder principle The classical Hölder inequality implies that for any measurable function Z ≥ 0 on a probability space and any 0 < q < p < ∞ we have Zq ≤ Z p . By the ‘reverse Hölder principle’ we mean the following two statements

5.15 Appendix: reverse Hölder principle

211

(closely related to [164]) in which the behaviour of Z in Lq controls conversely its belonging to weak-L p . Our first principle corresponds roughly to the case q = 0. Proposition 5.75. Let 0 < p < ∞. For any 0 < δ < 1 and any R > 0 there is a constant Cp (δ, R) such that the following holds. Consider a random variable Z ≥ 0 and a sequence (Z (n) )n≥0 of independent copies of Z. We have then 0 1 supN≥1 P supn≤N N −1/p Z (n) > R ≤ δ ⇒ Z p,∞ ≤ Cp (δ, R). (5.117) Proof. Assume P{N −1/p supn≤N Z (n) > R} ≤ δ for all N ≥ 1. By independence of Z (1) , Z (2) , . . . we have 0 1 P supn≤N Z (n) ≤ RN 1/p = (P{Z ≤ RN 1/p })N , therefore P{Z ≤ RN 1/p } ≥ (1 − δ)1/N and hence P{Z > RN 1/p } ≤ 1 − (1 − δ)1/N ≤ c1 (δ)N −1 . Consider t > 0 and N ≥ 1 such that RN 1/p < t ≤ R(N + 1)1/p . We have P{Z > t} ≤ c1 (δ)N −1 ≤ c2 (δ, R)t −p . Since we trivially have P{Z > t} ≤ 1 if t ≤ R, we obtain as announced Z p,∞ ≤ (max{R, c2 (δ, R)})1/p . Corollary 5.76. For any 0 < q < p < ∞ there is a constant R(p, q) such that for any Z as in Proposition 5.75 we have Z p,∞ ≤ R(p, q) supN≥1 N − p supn≤N Z (n) q . 1

(5.118)

Proof. By homogeneity we may assume supN≥1 N −1/p supn≤N Z (n) q ≤ 1. Then P{N −1/p supn≤N Z (n) > δ −1/q } ≤ δ, so by Proposition 5.75 with R = δ −1/q and (say) δ = 1/2 we obtain (5.118). Remark. Conversely by (5.111) and (5.112) supN≥1 N −1/p supn≤N Z (n) q ≤ c supN≥1 N −1/p supn≤N Z (n)  p,∞ ≤ cZ p,∞ . Thus, using q = 1, we find an alternate proof that Z → Z p,∞ is equivalent to a norm when p > 1. The following Banach space valued version of the ‘principle’ is very useful. Let B be an arbitrary Banach space and let f :  → B be a B-valued random variable. We will denote again by f (1) , f (2) , . . . a sequence of independent copies of the variable f . Proposition 5.77. For any 1 ≤ q < p < ∞ there is a constant R (p, q) such that any f in Lq (B) with E( f ) = 0 satisfies  f L p,∞ (B) ≤ R (p, q) supN≥1 N −1/p  f (1) + · · · + f (N ) Lq (B) . Moreover, this also holds for 0 < q < p < ∞ if we assume f symmetric.

212

The UMD property for Banach spaces

Proof. Assume q ≥ 1 and N −1/p  f (1) + · · · + f (N ) Lq (B) ≤ 1 for all N ≥ 1. By Corollary 1.41 we have





sup N −1/p  f (1) + · · · + f (n) B ≤ 21+1/q



1≤n≤N

q

and hence by the triangle inequality





sup N −1/p  f (n) B ≤ 22+1/q .



1≤n≤N

q

Therefore we conclude by Corollary 5.76 applied to Z(·) =  f (·)B . If 0 < q < 1 and f is symmetric, the same argument works but using (1.43).

5.16 Appendix: Marcinkiewicz theorem In the next statement, it will be convenient to use the following terminology. Let X, Y be Banach spaces, let (, m), ( , m ) be measure spaces and let T : L p (m; X ) → L0 (m ; Y ) be a linear operator. We say that T is of weak type (p, p) with constant C if we have for any f in L p (m; X ) T f  p,∞ = (supt>0 t p m (T f  > t ))1/p ≤ C f L p (X ) . We say that T is of strong type (p, p) if it bounded from L p (X ) to L p (Y ). We invoke on numerous occasions the following famous classical result due to Marcinkiewicz. Although we will prove a more general result in Chapter 8, we include a quick direct proof here for the convenience of the reader, in case he/she is reluctant to go into general interpolation theory. Theorem 5.78 (Marcinkiewicz). Let 0 < p0 < p1 ≤ ∞. In the preceding situation, assume that T is both of weak type (p0 , p0 ) with constant C0 and of weak type (p1 , p1 ) with constant C1 . Then for any 0 < θ < 1, T −1 −1 is of strong type (pθ , pθ ) with p−1 θ = (1 − θ )p0 + θ p1 , and moreover we have T : L pθ (X ) → L pθ (Y ) ≤ K(p0 , p1 , p)C01−θ C1θ where K(p0 , p1 , p) is a constant depending only on p0 , p1 , p. Proof. Let f ∈ L p0 (X ) ∩ L p1 (X ). Consider a decomposition f = f0 + f1 with f0 = f · 1{ f >γ λ}

and

f1 = f · 1{ f ≤γ λ} ,

5.16 Appendix: Marcinkiewicz theorem

213

where γ > 0 and λ > 0 are fixed. We have by our assumptions  −1 p0 m (T ( f0 ) > λ) ≤ (C0 λ )  f  p0 dm { f >γ λ}  −1 p1  f  p1 dm m (T ( f1 ) > λ) ≤ (C1 λ ) { f ≤γ λ}

hence since T ( f ) ≤ T ( f0 ) + T ( f1 )   m (T ( f ) > 2λ) ≤ C0p0 λ−p0  f  p0 dm + C1p1 λ−p1  f >γ λ

 f ≤γ λ

 f  p1 dm.

(5.119) Let p = pθ . If we now multiply (5.119) by 2 p pλ p−1 and integrate with respect to λ, using  λ p−p0 −1 dλ = (p − p0 )−1 ( f /γ ) p−p0 { f >γ λ}

and

 { f ≤γ λ}

λ p−p1 −1 dλ = (p1 − p)−1 ( f /γ ) p−p1 ,

we find    2 p pC0p0 γ p0 −p 2 p pC1p1 γ p1 −p  f  p dm + T ( f ) p dm ≤  f  p dm. p − p0 p1 − p Hence, we obtain the estimate 2p1/pC0p0 /p γ (p0 −p)/p 2p1/pC1p1 /p γ (p1 −p)/p + , T : L p (X ) → L p (Y ) ≤ (p − p0 )1/p (p1 − p)1/p so that choosing γ so that C0p0 γ p0 −p = C1p1 γ p1 −p we finally find the announced result with K(p0 , p1 , p) = 2p1/p (p − p0 )−1/p + 2p1/p (p1 − p)−1/p .

(5.120)

Remark 5.79. It is fairly obvious and well known that the preceding proof remains valid for ‘sublinear’ operators. Indeed, all that we need for the operator T is the pointwise inequalities T ( f0 + f1 )B ≤ T ( f0 )B + T ( f1 )B for any pair f0 , f1 in L p0 (X ) ∩ L p1 (X ), and also the positive homogeneity, i.e. ∀λ ≥ 0, ∀ f ∈ L p0 (X ) ∩ L p1 (X ) T (λ f )B = λT ( f )B .

214

The UMD property for Banach spaces

5.17 Appendix: exponential inequalities and growth of L p-norms In many cases the growth of the L p -norms of a function when p → ∞ can be advantageously reformulated in terms of its exponential integrability. This is made precise by the following elementary and well-known lemma. Lemma 5.80. Fix a number a > 0. The following properties of a random variable f ≥ 0 are equivalent: (i) sup p≥1 p−1/a  f  p < ∞ (ii) There is a number t such that E exp | f /t|a ≤ e. Moreover, let  f exp La = inf{t ≥ 0 | E exp | f /t|a ≤ e}. There is a constant C such that for any f ≥ 0 we have C−1 sup p≥1 p−1/a  f  p ≤  f exp La ≤ C sup p≥1 p−1/a  f  p . Proof. Assume that the supremum in (i) is ≤ 1. Then ∞ ∞ E exp | f /t|a = 1 + E| f /t|an (n!)−1 ≤ 1 + (an)nt −an (n!)−1 1

1

hence by Stirling’s formula for some constant C ∞ ∞ (an)nt −an n−n en = 1 + C (at −a e)n ≤ 1+C 1

1

from which it becomes clear (since 1 < e) that (i) implies (ii). Conversely, if (ii) holds we have a fortiori for all n ≥ 1 a (n!)−1  f /tan an ≤ E exp | f /t| ≤ e

and hence 1

1

1

1

1

 f an ≤ e an (n!) an t ≤ e a n a t = (an) a t(e/a)1/a , which gives  f  p ≤ p1/at(e/a)1/a for the values p = an, n = 1, 2, . . . . One can then easily interpolate (using Hölder’s inequality) to obtain (i). The last assertion is now a simple recapitulation left to the reader.

5.18 Notes and remarks The inequalities (5.1) and (5.2) were obtained in a 1966 paper by Burkholder. We refer the reader to the classical papers [165] and [175] for more on this. See also the book [34]. The best constant β p in (5.1) is equal to p∗ − 1 where p∗ = max{p, p } and 1 < p < ∞. It is also the best constant when we restrict

5.18 Notes and remarks

215

to constant multipliers (ϕn ). Thus the unconditionality constant of the Haar system in L p is also equal to p∗ − 1, and we have Cp (L p ) = p∗ − 1. n  p ≤ (p∗ − 1) sup Mn  p holds for any More generally, the inequality sup M pair of martingales with values in a Hilbert space H such that 0 H ≤ M0 H M

and

n − M n−1 H ≤ Mn − Mn−1 H M

(pointwise) for all n ≥ 1. These are called ‘differentially subordinate’ by Burkholder. Of course this implies that for any B with dim(B) ≥ 1 we have Cp (B) ≥ p∗ − 1, with equality in the Hilbert space case. Incidentally, the equality C2 (B) = 1 implies that B is isometric to Hilbert space. Indeed, this implies that the Gaussian (or Rademacher) K-convexity constant as defined in [79, p. 20] is also = 1, then [79, Theorem 3.11] implies the desired isometry, for any finite-dimensional subspace, which is enough. For (5.8) the best constants are also partially known: We have for any 1 < p0 λP{ f˜n  > λ} ≤ 2 supn  fn L1 (B) . √ The best constant β1 in (5.10) is equal to e. This is due to Cox [191]. As noted by Burkholder, by Kwapie´n’s theorem in [315], the differentially subordinate extensions of inequalities such as (5.1) or (5.2) in the Banach valued case can hold only if the space is isomorphic to a Hilbert space. For all the preceding sharp inequalities we refer the reader to [165, 170, 173, 174], to [34] and also to Os˛ekowski’s more recent book [72]. We

216

The UMD property for Banach spaces

recommend Burkholder’s surveys [172, 174], and the collection of his selected works in [14]. The best constants in the Khintchin inequalities are known: see [260, 430]. Szarek [430] proved that A1 = 2−1/2 . More generally, let γ p be the L p -norm of a standard Gaussian distribution (with mean zero and variance 1). It is well known that  √ 1/p 0 < p < ∞. γ p = 21/2 ((p + 1)/2)/ π = 1.87 . . . be the unique solution in the interval ]1, 2[ of the equation √ = γ p (or explicitly ((p + 1)/2) = π /2), then Haagerup (see [260])

Let p0 1/2−1/p

2 proved:

A p = 21/2−1/p

0 < p ≤ p0 ,

(5.122)

Ap = γp

p0 ≤ p ≤ 2,

(5.123)

Bp = γp

2 ≤ p < ∞.

(5.124)

The lower bounds A p ≥ max{γ p , 21/2−1/p } for p ≤ 2 and B p ≥ γ p for p ≥ 2 are easy exercises (by the Central Limit Theorem). For Kahane’s inequalities, some of the optimal constants are also known, in 1 1 particular (see [318]), if 0 < p ≤ 1 ≤ q ≤ 2, we have K(p, q) = 2 p − q . Kahane’s inequalities follow from the results in the first edition of [44]. The idea to derive them from the 2-point hypercontractive inequality is due to C. Borell. See [110, 256, 336] for generalizations. The property UMD was introduced by B. Maurey and the author (see [343]), together with the author’s observation that Burkholder’s ideas could be extended to show that UMD p ⇔ UMDq for any 1 < p, q < ∞ (Theorem 5.13). It was also noted initially that UMD p implies super-reflexivity (and a fortiori reflexivity), but not conversely. See Chapter 11 for more on this. Proposition 5.12 appears in [167, Theorem 2.2]. The Gundy decomposition appearing in Theorem 5.15 comes from [257]. The extrapolation principle appearing in §5.5 (sometimes called ‘good λinequality’) is based on the early ideas of Burkholder and Gundy ([175]), but our presentation in Lemma 5.23 was influenced by the refinements from [325]. As for Lemma 5.26 and the other statements in §5.5, our source is Burkholder in [165, 174]. §5.6 is a simple adaptation to the B-valued case of Burgess Davis classical results from [201]. The examples presented in §5.7 are sort of ‘folkloric’. §5.8 is due to B. Maurey [343]. Note that if a UMD Banach space B has an unconditional basis (en ), then the functions t → hk (t )en , form an unconditional

5.18 Notes and remarks

217

basis for L p ([0, 1]; B). We leave the proof as an exercise (hint: use Remark 10.53). Conversely, Aldous proved that if L p ([0, 1]; B) has an unconditional basis then B is necessarily UMD, see [100] for details. The Burkholder-Rosenthal inequality in Theorem 5.50 appears in [165]. It was preceded by Rosenthal’s paper [415] from which Corollaries 5.54 to 5.56 are extracted. §5.10 is due to Bourgain [137], but the original Stein inequality comes from [85]. §5.11 is motivated by Burkholder’s characterization of UMD spaces in terms of ζ -convexity (Theorem 5.64), for which we refer to [167, 169, 171, 173]. Theorem 5.69 is due to Kalton, Konyagin and Vesely [302]. Its proof is somewhat similar to the renorming of super-reflexive spaces proved earlier in [387] but presented later on in this volume (see Chapters 10 and 11). In sharp contrast, Burkholder’s proof that UMD implies zeta convexity is based on the weak-type (1,1) bound (5.31). An excellent detailed presentation of Burkholder’s method is given in Os˛ekowski’s book [72].

6 The Hilbert transform and UMD Banach spaces

6.1 Hilbert transform: HT spaces The close connection between martingale transforms and the Hilbert transform or more general ‘singular integrals’ was noticed very early on. In this context, the classical Calderón-Zygmund (CZ in short) decomposition is a fundamental tool (see e.g. [86]). (We saw its martingale counterpart, the Gundy decomposition, in the preceding chapter.) Thus it is not surprising that in the Banach space valued case, the two kinds of transforms are bounded for exactly the same class of Banach spaces. The goal of this chapter is the proof of this equivalence. Definition 6.1. Let 1 < p < ∞. Let m denote the normalized Haar measure on the torus T. A Banach space B is called HT p if there is a constant C such that for any finitely supported function x : Z → B we have 31/p 2 31/p 2    zn xn − zn xn  p dm(z) ≤C  zn xn  p dm(z) .  n>0

We will denote

n0

(6.1)

n 0 we have obviously ˜ = −i#(zn )) and for n = 0 we must have u˜ = 0. From u(z) ˜ = (zn ) (resp. u(z) this it is easy to deduce that f˜ = H T f . Since H T is bounded on L2 (T), this shows that u → u˜ is bounded on h2 (D), since the latter can be identified with L2 (T; R). When f is smooth enough, in particular if it is a trigonometric polynomial, we can rewrite f˜(θ ) as a ‘principal value integral’, i.e. we have  θ −t ) f (t )dm(t ), (6.7) cot( f˜(θ ) = lim ε→0 |θ−t|>ε 2 which is traditionally written f˜(θ ) = p.v.



 cot

θ −t 2

 f (t )dm(t ),

to emphasize that the integral is not absolutely convergent. More generally, we will verify (6.7) for any f such that, for some constants c and δ > 0, we have | f (t ) − f (θ )| ≤ c|t − θ |δ for all θ , t ∈ T. Indeed, since the

6.1 Hilbert transform: HT spaces

221

 )dm(t ) = 0 and hence we have for any θ ∈ T integrand is odd, |θ−t|>ε cot( θ−t 2       θ −t θ −t f (t )dm(t ) = ( f (t ) − f (θ ))dm(t ), cot cot 2 2 |θ−t|>ε |θ−t|>ε and now since | cot( θ−t )( f (t ) − f (θ ))| ∈ O(|θ − t|δ−1 ) we have absolute con2 vergence and the limit in (6.7) exists for any θ ∈ T. Furthermore, again the oddness of the integrand implies  2r sin(θ − t ) ( f (t ) − f (θ ))dm(t ) u(re ˜ iθ ) = 1 − 2r cos(θ − t ) + r2 and hence by dominated convergence    θ −t iθ ˜f (θ ) = lim u(re ( f (t ) − f (θ ))dm(t ), ˜ ) = cot r→1 2    which is the same as limε→0 |θ−t|>ε cot θ−t f (t )dm(t ). This proves (6.7) (and 2 also that the radial limits exist for any θ ) for any such f . Moreover f˜ is bounded. Let g ∈ L1 (T, m). We have      θ −t ( f (t ) − f (θ ))dm(t )dm(θ ), g(θ ) f˜(θ )dm(θ ) = g(θ ) cot 2 and hence we note for further reference that if f , g are supported on disjoint compact sets (so that g(θ ) f (θ ) vanishes identically), this implies      θ −t g(θ ) f (t )dm(t )dm(θ ), g(θ ) f˜(θ )dm(θ ) = cot 2 which we view as expressing that the transformation f → f˜ admits cot( θ−t ) as 2 its kernel. Given a real Banach space BR , we may view it as embedded isometrically in a complex one B such that B = BR + iBR . Then we can define the conjugate function u˜ for any u ∈ h p (D; B) that is a BR valued polynomial of the form (3.33), and similarly for f˜. With this definition it becomes clear, by Theorem 3.4, that B is HTp (1 < p < ∞) iff the conjugation u → u˜ extends to a bounded linear map on  h p (D; B). We will return to the theme of conjugate functions in §7.1. A parallel notion of Hilbert transform and property HT can be introduced with T and D replaced by R and the upper half-plane U, but the resulting property HT is equivalent with the same constant. Since this equivalence will be needed in the sequel, we wish to prove this now, based on classical ideas. We will denote by H R the Hilbert transform on R. This is defined as the operator on L2 (R) that acts by multiplication of the Fourier transform by the function ∀x ∈ R

ϕ(x) = −i sign(x).

222

Hilbert transform and UMD Banach spaces

This means that for any test function f ∈ L2 (R) with Fourier transform  f (y) = f (x)e−ixy dx, the image g = H R ( f ) is determined by the identity g = ϕ f.

(6.8)

When f is smooth enough, say if it is C∞ with compact support, it is well known that we can rewrite H R f (x) as a ‘principal value integral’, i.e. we have at any point x   1 1 1 1 R f (t )dt = p.v. f (t )dt. (6.9) H f (x) = lim π ε→0 |x−t|>ε x − t π x−t 1 We may then observe that since the kernel x−t 1[−1,1] (x − t ) is odd, as earlier for the circle, we have   1 1 f (t )dt = ( f (t ) − f (x)1[−1,1] (x − t ))dt x − t x − t |x−t|>ε |x−t|>ε

and hence, since f is differentiable at x, we can write H R f (x) as an absolutely convergent integral, namely  1 1 ( f (t ) − f (x)1[−1,1] (x − t ))dt. H R f (x) = π x−t Then if g is a bounded measurable function with compact support we have    1 1 g(x)H R f (x)dx = ( f (t ) − f (x)1[−1,1] (x − t ))dtdx, g(x) π x−t and if we assume that f and g are supported on disjoint compact sets, so that f (x)g(x) = 0 for all x, we find    1 1 g(x) f (t )dtdx, (6.10) g(x)H R f (x)dx = π x−t and this integral is absolutely convergent. This identity (6.10) holds for any pair f , g of C∞ functions with disjoint compact supports but, by density, it clearly remains true for any pair f , g ∈ L2 (R) supported on disjoint compact sets. This 1 is the kernel associated to the (singular integral) expresses the fact that π1 x−t operator H R . Remark 6.4. Let K(x, t ) be a measurable function on R × R bounded on {(x, t ) | |x − t| > δ} for any δ > 0. By a repeated application of Lebesgue’s

6.1 Hilbert transform: HT spaces

223

Theorem (3.35) we have for almost all (x , t ) ∈ R × R  x +δ1  t +δ2 1 1 K(x, t )dxdt. K(x , t ) = limδ1 →0 limδ2 →0 2δ1 2δ2 x −δ1 t −δ2 Therefore, if thereis an operator  T : L2 (R) → L2 (R) admitting K as its kernel, i.e. such that gT ( f ) = g(x)K(x, t ) f (t )dxdt for any pair f , g ∈ L2 (R) supported on disjoint compact sets, then K is uniquely determined by T on the complement of a negligible set. (Note that the diagonal is negligible in R × R.) We will use the following very simple characterization of the Hilbert transform on R. For any r > 0 let us denote by Dr : L2 (R) → L2 (R) the (isometric) dilation operator Dr defined by Dr f (x) = f (x/r)r−1/2 . Let us denote by S : L2 (R) → L2 (R) the (isometric) symmetry defined by S f (x) = f (−x). Theorem 6.5. Let T : L2 (R) → L2 (R) be an operator on L2 (R) commuting with all translations, with all dilations {Dr | r > 0} and such that T S = −T S. Then T is a multiple of the Hilbert transform H R . Proof. Since T commutes with translations, it must be a ‘convolution’ or equivalently there is an associated multiplier ϕ in L∞ (R) such that T4f = ϕ fˆ. Note that for any r > 0 ∀y ∈ R

5 ˆ D r f (y) = f (ry)r

ˆ ˆ   so that D r T f (y) = f (ry)ϕ(ry)r and T Dr f (y) = ϕ(y) f (ry)r. Thus if T Dr = Dr T , we must have ϕ(ry) = ϕ(y) for all r > 0, which clearly implies that ϕ is constant (almost everywhere) both on (0, ∞) and on (−∞, 0). In particular, after a negligible correction, we may assume ϕ = ϕ(1) (resp. ϕ = ϕ(−1)) 4f (y) = fˆ(−y), T S = −T S implies on (0, ∞) (resp. (−∞, 0)). Finally, since S ϕ(−y) = −ϕ(y). Thus we conclude that ϕ(y) = ϕ(1)sign(y) = c · (−i sign(y)) with c = iϕ(1), completing the proof. Remark 6.6. Assuming more regularity of the kernel, a second proof can be derived from the following observations. Let (x, t ) → K(x, t ) be a function defined for x = t. Suppose that K(x, t ) = K(x + s, t + s) for any s ∈ R

224

Hilbert transform and UMD Banach spaces

(invariance by translation), that K(x, t ) = ρ −1 K(x/ρ, t/ρ) for any ρ > 0 (invariance by dilation) and K(x, t ) = −K(t, x). This implies that K(x, t ) = (x − t )−1 K(1, 0). Indeed, taking ρ = x − t, we have K(x, t ) = K(x − t, 0) = (x − t )−1 K(1, 0) when x > t, and again K(x, t ) = −K(t, x) = (x − t )−1 K(1, 0) when x < t. We will say that a Banach space B is HT p on R with constant C if the operator H R ⊗ IdB extends to a bounded operator on L p (R; B) with norm = C. We will show that this holds iff B is HT p on T. That HT p on R implies HT p on T follows easily from the fact that the respective kernels have the same singularity at x = 0, namely we have cot 2x ≈ 2x when x → 0, see Remark 6.9 for more details. We will use the following basic fact in §6.3. Proposition 6.7. If a Banach space B is HT p on R with constant C then B is HT p (on T) with constant CHT p (B) ≤ C. Proof. Since the reduction to this case is easy, we assume B finite-dimensional.   Let f = eint xn ∈ L p (T; B) and ξ = eint yn ∈ L p (T; B∗ ) be trigonometric polynomials. Since   ξ (−t ), (H T f )(t )dm(t ) = yn , xn ϕ(n) , it suffices to show that  yn , xn ϕ(n) ≤ C f L p (T;B) ξ L p (T;B∗ ) .

(6.11)

Let ε > 0. Consider the Gaussian kernel on R 

gε (x) = (2π ε)−1/2 e−x

2

/2ε

.

Then gε (x) = e−εx /2 , gε (x)dx = 1, gε ∗ gε = gε+ε (∀ε > 0). Recall the Fourier inversion and self-duality formulae 2

∀ ψ, χ ∈ L2 (R) ψ (x) = 2π ψ (−x) and   ψ (x)χ (−x)dx = 2π ψ (x)χ (x)dx.

(6.12)

We now wish to replace f and ξ by (2π -periodic) functions of a real variable (instead of functions on T), so we denote by f and ξ the corresponding functions, defined by:   e−inx xn and ξ (x) = e−inx yn . ∀x ∈ R f (x) = Note that x → e−inx is the Fourier transform of the Dirac measure δn .

6.1 Hilbert transform: HT spaces

225

5 5 and χε = (1/2π )ξ g1/p Then, the functions ψε = (1/2π ) f g1/p ε admit Fourier ε transforms respectively  4ε = f ∗ g1/p = xn δn ∗ g1/p ψ ε ε χε = ξ ∗ g1/p ε =



ξm δm ∗ g1/p ε

and satisfy (note the special reverse choice of p, p here)  χε (−x), H R (ψε )(x)dx ≤ Cψε L (R;B) χε L (R;B∗ ) . p p

(6.13)

We plan to show that when ε → 0, (6.13) tends to (6.11). By (6.12), we have   χε (−x), H R (ψε )(x)dx = (1/2π ) ϕ(y)χε (y), ψ 4ε (y)dy . (6.14) Note



5 p |g1/p ε (y)| dy =

 √ 6 2 p/2 p/2 2π ε(p ) e−pp εy /2 dy = 2π (p ) / pp .

By the form of the Gaussian kernel, since f is 2π -periodic, we have   1/p p p 5  f  |gε | dy − lim ε→0



dy  f (y) 2π p

0



5 1/p p |gε (y)| dy = 0.

Therefore

lim ψε L p (R;B) = (2π )−1/p

ε→0

6

6 p ( pp )−1/p  f L p (T;B)

and similarly √ 6 lim χε L p (R;B∗ ) = (2π )−1/p p( pp )−1/p ξ L p (T;B∗ ) ,

ε→0

which together gives us 2π lim ψε L p (R;B) χε L p (R;B∗ ) =  f L p (T;B) ξ L p (T;B∗ ) . ε→0

As for the other side, we have   4ε (y)dy = xn , ym aεn,m ϕ(y)χε (y), ψ n,m

with aεn,m =





1/p ϕ(y)(δm ∗ g1/p ε )(y)(δn ∗ gε )(y)dy,

226

Hilbert transform and UMD Banach spaces

but, distinguishing the cases n = m and n = m, we have (easy verification)  ε lim an,n = lim ϕ(y + n)gε (y)dy = ϕ(n) ∀n ∈ Z ε→0 ε→0  1/p lim |aεn,m | ≤ ϕ∞ lim g1/p ε (y + n − m)gε (y) = 0 ∀n = m ∈ Z. ε→0

ε→0

Thus, taking the limit of (6.13) when ε → 0, and recalling (6.14) we obtain (6.11). For the sake of completeness, we now choose, among many available possibilities, a quick argument to show that the converse is also true and the constants in Proposition 6.7 are actually equal. Proposition 6.8. If a Banach space B is HT p (on T) then it is HT p on R with constant C ≤ CHT p (B). ) (resp. Proof. The idea of the proof is that the kernel KT (t, θ ) = (2π )−1 cot( t−θ 2 1 ) of H T (resp. H R ) are such that KR (t, θ ) = π −1 t−θ lim sKT (st, sθ ) = KR (t, θ ).

s→0

Heuristically, the real line behaves like a circle of radius 1/s → ∞. Again we may assume B finite-dimensional. Let R > 0. Let f , g ∈ L2 (R) with support in [−R, R]. Then for any 0 < r < π /R we define a function r f ∈ L2 (T, m)) by setting ∀t ∈ [−π , π ]

r

f (eit ) = Dr f (t ) = f (t/r)r−1/2 .

Note that [−rR, rR] ⊂ [−π , π ] and hence  r f 2L2 (T,m) = (1/2π ) f 2L2 (R) . We then consider the operator T defined by  r −1 2π H T s f , s gds/s T f , g = lim (Log r) r→0,U

0

where the limit is with respect to an utrafilter U refining the net of the convergence r → 0. It is easy to check that the assumptions of Theorem 6.5 hold so that T is a multiple of H R . We claim that actually T = H R . Indeed, if we assume that f , g are supported in disjoint compact sets, the same is true for s f , s g for s > 0 small enough and hence    t −θ f (t/s)g(θ /s)dtdθ /s 2π H T s f , s g = (2π )−1 cot 2 R    s(t − θ ) f (t )g(θ )dtdθ = (2π )−1 s cot 2 R  1 f (t )g(θ )dtdθ when s → 0. → π −1 t −θ R

6.1 Hilbert transform: HT spaces

227

Thus, by (6.9), we find T f , g = H R f , g, proving our claim that T = H R . Now for f ∈ L p (R; B) and g ∈ L p (R; B∗ ), since (T ⊗ Id) f , g is a limit of averages of 2π H T s f , s g and since when s is small enough 

s f L p (T;m,B) 

s gL p (T,m;B∗ )

= (2π )−1  f L p (R;B) gL p (R;B∗ )

we find |(T ⊗ Id) f , g| ≤ CHT p 2π lim sup 

s f L p (T;m,B) 

s→0

s gL p (T,m;B∗ )

= CHT p  f L p (R;B) gL p (R;B∗ ) . Thus, we obtain as announced H R ⊗ IdB(L p (R;B)) ≤ CHT p . Remark 6.9. Let B be any Banach space and let 1 ≤ p ≤ ∞. Let G = T (or G = R or any other locally compact Abelian group equipped with Haar measure). Then for any f ∈ L p (G; B) and any ϕ ∈ L1 (G) the convolution f ∗ ϕ is in L p (G; B) and we have  f ∗ ϕL p (G;B) ≤  f L p (G;B) ϕL1 (G) . In other words, the convolution by any ϕ ∈ L1 (G) defines a bounded operator on L p (G; B). This is easy to check using Jensen’s inequality (1.4). As a consequence, if the singular kernels of two ‘principal value’ convolution operators T1 , T2 such as (6.7) on T (or (6.9) on R) differ by a function ϕ ∈ L1 (G), then T1 , T2 will be bounded on L p (G; B) for the same class of Banach spaces B. We can use this idea to give a simpler proof that if B is HT p on R then it is also HT p on T. Indeed, by restricting H R to functions supported on [−π , π ], if B is HT p on R, it is easy to see that the ‘principal value’ integral  π 1 1 T1 ( f )(θ ) = p.v. f (t )dt π −π θ − t is bounded on L p ([−π , π ]; B), or equivalently on L p (T; B). But since   1 θ 11 − cot ∈ L1 ([−π , π ], dθ ) ϕ(θ ) = πθ 2π 2 π 1 we conclude that T2 ( f )(θ ) = 2π p.v. −π cot( θ−t ) f (t )dt is also bounded on 2 L p (T; B), and hence B is HT p on T.

228

Hilbert transform and UMD Banach spaces

6.2 Bourgain’s transference theorem: HT implies UMD The main result of this section is due to J. Bourgain [134]. It allows for transplanting certain Fourier multipliers from T to TN . Let G = TN and let  = Z(N) ˆ identified with be its dual group (one often denotes this by  = Gˆ or G = ) (N) the set of all the continuous characters on G. Here Z denotes the set of all  sequences of integers (nk )k≥0 in ZN such that ∞ 0 |nk | < ∞. Then, to any such n = (nk )k≥0 is associated the function γn : TN → T defined by  ∀z = (zk ) ∈ TN γn (z) = znk k . k≥0

This gives us a one-to-one correspondence between Z(N) and the multiplicative group formed of all continuous characters, which is also a group isomorphism. Then {γn | n ∈ Z(N) } forms an orthonormal basis of L2 (G; μ), where μ is the normalized Haar measure on G. The set  = Z(N) can be totally ordered by the (reverse) lexicographic order, which can be defined like this: a non-zero element n = (nk )k≥0 is > 0 iff there is an integer K such that nK > 0 and nk = 0 ∀ k > K. Equipped with this order structure,  becomes an ordered group for which we can develop Harmonic Analysis on the model of the group Z. This is entirely classical. See [84, Chapter 8] for details. In particular, we can define the space H p (G) as the closed span in L p (G, μ) of the characters in + = {n > 0} ∪ {0}. Consider n = (nk )k≥0 and m = (mk )k≥0 in . We set n, m =

∞ 

nk mk .

k=0

We will show that several Fourier multipliers can be transplanted from Z to . The combinatorial key behind Bourgain’s theorem is the following useful fact. Lemma 6.10. For any finite subset A ⊂  there is an m ∈  such that ∀n ∈ A \ {0}

sign(n, m) = sign(n).

where sign(n) = ± depending whether n > 0 or n < 0 in  (just like in Z). More generally, for any sequence of signs (εk )k≥0 there is m ∈  such that ∀n ∈ A \ {0} sign((εk nk ), m) = sign((εk nk )). Proof. The second assertion follows from the first one applied to A = {(εk nk ) | n ∈ A}, so it suffices to prove the first one.

6.2 HT implies UMD

229

Let WK = {n ∈  | nK = 0, n j = 0 ∀ j > K} be the set of ‘words’ of length exactly K + 1. The idea is simply that for any n ∈ WK we have limmK →∞ sign(n, m) = sign(nK ).∞ = sign(n).∞. By induction on K we will prove that for any A ⊂ W0 ∪ · · · ∪ WK there is m ∈ W0 ∪ · · · ∪ WK such that ∀n ∈ A

sign(n, m) = sign(n).

(6.15)

The case K = 0 is trivial. Assume that we have proved this for any A ⊂ W0 ∪ · · · ∪ WK−1 and let us prove it for A ⊂ W0 ∪ · · · ∪ WK . By the induction hypothesis there is (m0 , . . . , mK−1 , 0, . . .) such that sign(n, m) = sign(n) for any n ∈ A ∩ (W0 ∪ · · · ∪ WK−1 ). Consider now any n ∈ A ∩ WK so that nK = 0. Then choosing mK > 0 large enough ensures that sign(n, m) = sign(n), and since A is finite, we can achieve this for any n ∈ A ∩ WK . Then m = (m0 , . . . , mK−1 , mK , 0, . . .) satisfies (6.15), completing the induction argument. Remark. A look at the preceding proof shows that if we are given in advance any (large) number N we can find m ∈  such that moreover |n, m| > N for any n ∈ A. Let ZA,N be the set of suitable ms. Since all these sets ZA,N are nonvoid, they generate a filter Z (or a net) on  such that for any non-zero n in  we have limm, n = sign(n) · ∞ m,Z

where sign(n) = ± depending whether n > 0 or n < 0. This net has a fortiori the following property: for any integer k and any n(1) < n(2) < · · · < n(k) in , there is an m such that, in Z, we have m, n(1) < m, n(2) < · · · < m, n(k). Given a bounded function ϕ :  → C on a discrete group  (we refer to ϕ as a ‘multiplier’), we will always denote by Mϕ : L2 () → L2 () the corresponding multiplier operator on L2 () defined by Mϕ γ = ϕ(γ )γ for any γ ∈ . Here G =  is the (compact) dual of  equipped with its normalized Haar measure. When it is bounded, we denote by Mϕ : L p (G; B) → L p (G; B) the operator that extends Mϕ ⊗ IdB . As usual, we denote L p (G, μ; B) simply by L p (G; B). The next result is easy to obtain by a classical transference argument. Lemma 6.11. Let B be a Banach space. Let 1 ≤ p < ∞. Let ϕ : Z → C be a multiplier. For any m in  we define the multiplier ϕm :  → C by ∀n∈

ϕm (n) = ϕ(n, m).

230

Hilbert transform and UMD Banach spaces

We have then: Mϕm : L p (G; B) → L p (G; B) ≤ Mϕ : L p (T, B) → L p (T, B). Proof. By homogeneity we may assume Mϕ : L p (T, B) → L p (T, B) = 1. By Fubini we also have Mϕ : L p (T; L p (G; B)) → L p (T; L p (G; B)) = 1. Fix m ∈ . For any z = (zk ) ∈  and any w ∈ T we denote w.z = (w mk zk )k≥0 . To any f ∈ L p (G; B) we associate F ∈ L p (T, L p (G; B)) defined by F (w)(z) = f (w.z). We may assume that f is a finite sum of the form f = γn (w.z) = w n,m γn (z), and hence



n xn γn .

Note that

Mϕ F (w)(z) = Mϕm f (w.z). We have then by the translation invariance of the L p (G; B)-norm Mϕm f L p (G;B) = Mϕ FL p (T;L p (G;B)) ≤ FL p (T;L p (G;B)) =  f L p (G;B) , which means that Mϕm : L p (G; B) → L p (G; B) ≤ 1. Remark 6.12. An analogous result holds for multipliers of weak-type (1,1). We have: Mϕm : L1 (G; B) → L1,∞ (G; B) ≤ Mϕ : L1 (T, B) → L1,∞ (T, B). Indeed, we can use the elementary fact that for any function f on a product of measure spaces such as G × T we have   f L1,∞ (G×T) ≤  f (x, ·)L1,∞ (T) μ(dx). To state the main point of this section, we need some specific notation. We will apply Lemma 6.11 to the multiplier ϕ : Z → C defined by ϕ(0) = 0 and ϕ(n) = sign(n). Up to a factor i, The corresponding operator on L2 (T) is the Hilbert transform H T , namely Mϕ = iH T . We equip the space (G, μG ) with the filtration (Ak )k≥0 defined by Ak = σ (z0 , z1 , . . . , zk ) where zk : G → T denotes the k-th coordinate on G. Given a Banach space B, any element f in L1 (G; B) be written as a convergent  series f = k≥0 dfk of martingale differences with respect to this filtration. A martingale ( fn )n≥0 is called a Hardy martingale if for any n ≥ 0 the variable

6.2 HT implies UMD

231

fn depends ‘analytically’ on the ‘last’ variable zn ; equivalently this means that each fn belongs to the closure of H 1 (G) ⊗ B in L1 (G; B). We will denote by Hk : L2 (G) → L2 (G) the Hilbert transform acting on the k-th variable only. So that if L2 (G) is iden L2 (T) then Hk corresponds to the transformation tified with k≥0

I ⊗ ··· ⊗ I ⊗ H ⊗ I ⊗ ··· with H T sitting at the coordinate of index k. We can now state Bourgain’s transference theorem: Theorem 6.13. Let 1 < p < ∞. Let B be any Banach space such that the Hilbert transform H T is bounded on L p (T, B). Recall CHT p (B) = Then for any choice of signs ε = ±1 and for any H T : L p (B) → L p (B). k   dfk in L p (G; B) we have finite martingale f = f dμ + k≥0





k≥0



εk Hk dfk

L p (B)

≤ CHT p (B) f L p (B) .

(6.16)

Proof. We may assume that the Fourier transform of f is supported on a finite subset A ⊂ . Recall that we set ϕ(0) = 0 and ϕ(n) = sign(n) for any n ∈ Z. Let us denote simply εn = (εk nk ) for any n ∈  and any ε = (εk ) ∈ {−1, 1}N . By Lemma 6.10, we can find m ∈  such that for any n ∈ A we have ϕεm (n) = sign(εn). More precisely, if n = (nk )k≥0 with nK = 0 and nk = 0 ∀ k > K, we have ϕεm (n) = εK sign(nK ). Thus ϕεm (0) = 0 and Mϕεm [dfK ] = iεK HK dfK . Thus the theorem now follows from Lemma 6.11, since Mϕ = iH T . Corollary 6.14. In the situation of Theorem 6.13, let P+ denote the orthogonal projection from L2 (G) onto the subspace H 2 [G] formed of all f such that the associated martingale ( fn ) is Hardy. Then for any finite sum f = f dμ +  dfk in L p (G; B) we have k≥0

(P+ ⊗ IdB )( f )L p (B) ≤ (2−1CHT p (B) + 1) f L p (B) .

232

Hilbert transform and UMD Banach spaces

In other words, if we denote by H˜ p [G; B] ⊂ L p (G; B) the closure of H p [G] ⊗ B in L p (G; B), then P+ is a bounded linear projection from L p (G; B) to H˜ p (G; B) of norm at most 2−1CHT p (B) + 1. Proof. Just like on T (see (6.4)), this follows from the identity   f dμ + 2−1 (1 + iHk )dfk P+ ( f ) =   −1 =2 Hk dfk , f dμ + 2−1 f + 2−1 i from which we deduce P+ ( f )L p (B) ≤ (2−1CHT p (B) + 1) f L p (B) . Corollary 6.15. In the situation of Theorem 6.13, let A ⊂ N be a subset such that for each n in A, the variable dfn is analytic in the variable zn . We have then for all f in L p (B)







dfn

≤ CHT

p (B) f L p (B) .



n∈A

L p (B)

Proof. Note that iHk dfk = dfk for all k in A. By Theorem 6.13 we have





 



dfk ± iHk dfk

≤ CHT

p (B) f L p (B) .



k∈A

k∈A /

L p (B)

Hence the first part follows from the triangle inequality. The next statement is the result for which Bourgain invented the preceding transference ‘trick’. Corollary 6.16. For any Banach space B and any 1 < p < ∞ we have 2 Cp (B) ≤ CHT p (B) .

In particular, HT implies UMD. Proof. We may restrict consideration to f ∈ L p (B) with vanishing mean so that  f0 = 0. Let g = k≥0 εk Hk dfk . By (6.16) we have gL p (B) ≤ CHT p (B) f L p (B) . By (6.16) applied to g, with all the signs εk equal to 1, we find





Hk dgk

≤ CHT

p (B)gL p (B) . k≥0

L p (B)

But since Hk dgk = εk Hk2 dfk = −εk dfk for any k ≥ 0, (we can include k = 0  2 because f0 = g0 = 0) we obtain  k≥0 εk dfk L p (B) ≤ CHT p (B)  f L p (B) .

6.3 UMD implies HT

233

In the sequel, it will be useful for us to avoid squaring the constant p. This is possible if one restricts the UMD property to Hardy martingales. More precisely, if we denote by Cap (B) the smallest constant C such that (5.21) holds for any B-valued Hardy martingale f = ( fn )n≥0 in L p (G; B), then we have Cap (B) ≤ CHT p (B). Indeed, when ( fn ) is a Hardy martingale, Hk (dfk ) = dfk , hence this follows immediately from Theorem 6.13. See §6.7 for more on this. However, the following seems to be still open. Problem 6.17. Is there an absolute constant K such that Cp (B) ≤ KCHT p (B) for any 1 < p < ∞ and any B?

6.3 UMD implies HT The main result of this section is: Theorem 6.18 ([169]). Let 1 < p < ∞ and let B be a UMD p Banach space. Then the Hilbert transform on R is bounded on L p (R; B). Combined with Corollary 6.16 this yields: Theorem 6.19. UMD and HT are equivalent properties for Banach spaces. The traditional way to prove that UMD implies HT is to use Brownian motion as in the next section (see [169] in turn fundamentally based on [176]). We will use instead an idea in [378]. For that purpose, we first need to introduce some specific notation. Let I = [0, 1). We denote D00,1 = {k + [0, 1) | k ∈ Z}. Then D00,1 constitutes a disjoint partition of R into intervals of equal measure 1 with 0 as origin (i.e. 0 is the endpoint of some interval). Similarly, we denote for each n in Z Dn0,1 = {2n k + [0, 2n ) | k ∈ Z}, so that Dn0,1 is a partition of R into intervals of equal measure 2n admitting 0 as origin. Equivalently, Dn0,1 is obtained from D00,1 by a dilation of ratio 2n . We set  Dn0,1 D0,1 = n∈Z

234

Hilbert transform and UMD Banach spaces

for each I in D0,1 , we denote by I+ the left half of I and by I− the right half, so that, for instance, if I = [0, 1) we have I+ = [0, 12 ) and I− = [ 12 , 1). (Note that actually the endpoints are irrelevant because they are negligible for all present measure theoretic purposes.) Remark 6.20. Let Cn be the σ -algebra on R generated by the partition Dn0,1 . The family (Cn )n∈Z gives us an example of a filtration on an infinite measure space, namely (R, dx), for which the conditional expectation makes perfectly good sense (see Remark 1.13). Thus martingale theory tells us that for any f ∈ L p (R, dx) (1 ≤ p < ∞) the sequence fn = ECn f is a martingale indexed by Z such that fn → f in L p when n → +∞, and also (reverse martingale) fn → 0 in L p when n → −∞ (because 0 is the only constant function in L p ). But actually, although rather elegant, this viewpoint is not so important because the main issues we address are all local, and locally R can be treated as a bounded interval and hence ‘at the bottom’ as a probability space. We will denote ∀I ∈ D0,1

hI = |I|−1/2 (1I+ − 1I− ),

so that hI is positive on I+ and negative on I− (curiously this rather natural convention is opposite to the one made in [378]). It is well known that {hI | I ∈ D0,1 } forms an orthonormal basis of L2 (R). The fundamental operator that we will study is the linear operator T 0,1 : L2 (R) → L2 (R) defined as follows:  ∀ f ∈ L2 (R) T 0,1 ( f ) =  f , hI 2−1/2 (hI+ − hI− ). Note that T 0,1 hI = 2−1/2 (hI+ − hI− ) and it is easy to see that the family {2−1/2 (hI+ − hI− ) | I ∈ D0,1 } is orthonormal in L2 (R). Therefore, T 0,1 is an isometry on L2 (R) into itself. We will first show that if (and only if) B is UMD then T 0,1 is bounded on L2 (R; B). This is rather easy: Lemma 6.21. For each 1 < p < ∞, T 0,1 is bounded on L p (R). More generally, if B is UMD p then T 0,1 is bounded on L p (R; B). Proof. Consider f in L2 (R; B). We may assume (by density) that f is a sum  f = I∈D0,1 xI hI with only finitely many non-zero xI s. We define T 0,1 ( f ) =  xI T 0,1 (hI ). Note: according to our previous notation (see Proposition 1.6) 0,1 ( f ), but for simplicity we choose to abuse the we should denote this by T7 notation here.

6.3 UMD implies HT

235

 Let us first assume that f = I⊂I0 xI hI where I0 = [0, 1). Then with an  appropriate reordering the series f = xI hI appears as a sum of dyadic martingale differences, to which the Burkholder inequalities can be applied. Assume B = C. Then the square function S( f ) of the corresponding martingale is given by   |xI |2 |hI |2 = |I|−1 |xI |2 1I . S( f )2 = I I  Let g = T 0,1 ( f ) = xI 2−1/2 (hI+ − hI− ). Note that g is also a sum of dyadic martingale differences and since |2−1/2 (hI+ − hI− )| = |hI |

(6.17)

we have the following pointwise equality: S(g)2 = S( f )2 . By Theorem 5.21 we have g p ≤ b p S(g) p = b p S( f ) p ≤ b p a p  f  p , which proves that T 0,1 : L p (I0 ) → L p (I0 ) ≤ a p b p . Now, if B is UMD p and xI ∈ B, we can argue similarly: indeed, by (6.17) and by (5.19) we have the following pointwise equality R({xI 2−1/2 (hI+ − hI− )}) = R({xI hI }), and hence by (5.23) we have gL p (B) = T 0,1 ( f )L p (B) ≤ C1C2  f L p (B) .  Now in the general case, when f = xI hI with only finitely many non-zero xI s, we may clearly decompose f as a sum f = f 1 + f 2 so that there are inter vals I 1 ⊂ (−∞, 0] and I 2 ⊂ [0, +∞) in D0,1 such that f 1 = I⊂I 1 xI hI and  f 2 = I⊂I 2 xI hI . Since we have  f Lpp (B) =  f 1 Lpp (B) +  f 2 Lpp (B) , it suffices to majorize separately T 0,1 ( f 1 )L p (B) and T 0,1 ( f 2 )L p (B) . But for each of them we may argue exactly as before, replacing I0 in the preceding by I 1 or I 2 . We will use Theorem 6.5. Recall that, for any r > 0, we denote by Dr : L2 (R) → L2 (R) the (isometric) dilation operator Dr defined by Dr f (x) = f (x/r)r−1/2 . We now introduce the family of intervals Dα,r obtained from D0,1 by dilating the intervals in D00,1 from measure 1 to measure r and translating the origin from 0 to α. In other words the intervals in Dα,r are the images of those in D0,1 under the transformation x → rx + α. They are of length r2n (n ∈ Z). Clearly the collection {hI | I ∈ Dα,r } is still an orthonormal basis in L2 (R). Let us denote T α,r the operator we obtain from T 0,1 after we replace D0,1

236

Hilbert transform and UMD Banach spaces

by Dα,r . More precisely T α,r ( f ) =

 I∈Dα,r

 f , hI (hI+ − hI− )2−1/2 .

Obviously, T α,r is an isometry on L2 (R). Let G be the group R × R+ where R is equipped with addition and R+ = {r > 0} is equipped with multiplication. The integral of a (nice enough) function F : (α, r) → F (α, r) with respect to Haar measure on G is given by  ∞ ∞ F (α, r) dαdr/r. −∞

0

But since it is an infinite measure, we need to use instead an invariant mean  on G. By definition this is a translation invariant positive linear form on L∞ (G) such that (1) = 1. So  looks like a probability but it is not unique and it defines only an additive (and not σ -additive) set function on the Borel subsets of R. Nevertheless, for any F in L∞ (G), abusing the notation, we will denote (F ) by  F → F (α, r) d(α, r). Note that G being commutative is amenable. We will use a specific choice of  as follows: we choose non-trivial ultrafilters U on R and V on R+ and we set (actually, we will show that, for the specific F to which we apply this, the true limits exist so that we can avoid using ultrafilters) 3   R 2  a dr 1 1 F (α, r) dα lim F (α, r)d(α, r) = lim . R→∞ 2 Log R R−1 a→∞ 2a −a r U V

(6.18) From now on, the invariant mean  is defined by (6.18). This choice guarantees a supplementary invariance under symmetries as follows:   F (α, r)d(α, r) = F (−α, r)d(α, r). (6.19) We then define the operator T on L2 (R) by setting  ∀ f , g ∈ L2 (R) T ( f ), g = T α,r ( f ), g d(α, r).

(6.20)

Note that, since (α, r) → T α,r ( f ), g is in L∞ (G), this does make sense. Let us denote by λ(t ) : L2 (R) → L2 (R) the (unitary) operator of translation by t. Theorem 6.22. The operator T is a non-zero multiple of the Hilbert transform on L2 (R).

6.3 UMD implies HT

237

First part of the proof. We will use Theorem 6.5. Given I in D0,1 , we denote by I α,r the interval obtained from I after first dilation of ratio r (from 0) and then translation by α, so that |I α,r | = r|I|. Clearly I → I α,r is a (1-1)-correspondence. Moreover, we have obviously hI α,r = λ(α)Dr hI and similarly with I+ and I− in place of I. We have  T α,r ( f ) =  f , hI α,r (hI−α,r − hI+α,r )2−1/2 (6.21) 0,1 I∈D

and hence −1 T α,r = (λ(α)Dr )T 0,1 (λ(a)Dr )∗ = λ(α)Dr T 0,1 D−1 r λ(α) .

(6.22)

By the (translation) invariance of , it follows immediately from this that T = λ(t )T λ(t )−1 for any t in R. Note that λ(α)Dr = Dr λ(α/r), so that T α,r = Dr λ(α/r)T 0,1 λ(α/r)−1 D−1 r .

(6.23) D−1 r

Therefore, again by the invariance of  we must have T = Dr T for any r > 0. Finally, observe that, for any I in D0,1 , we have hI (−t ) = −h−I (t ) but also (−I)+ = −I− and (−I)− = −I+ , so that T 0,1 ShI = −ST 0,1 hI . Therefore we have T 0,1 S = −ST 0,1 and a fortiori T α,r S = −ST −α,r for any (α, r) in G. Thus, by (6.19) we must also have T S = −ST . By Theorem 6.5, it follows that T is a multiple of the Hilbert transform on R. But there remains a crucial point: to check that T = 0 ! This turns out to be delicate. We do not know a simple proof of this sticky point, avoiding the calculations of the second part of the proof, which require first a more detailed study of the kernel of T α,r . To complete the proof, we will essentially reproduce the argument from Stefanie Petermichl’s [378], just inserting a few more details to ease the reader’s task. The pictures (reproducing those from [378]) are crucial to understand what goes on. We will compute the kernel of T by averaging the kernels of T α,r as in (6.20). Remark 6.23. We will use the following well-known elementary fact (related to the theory of almost periodic functions). Let f : R → R be bounded and measurable. Assume that there are r > 0 and c ≥ 0 such that for any integers N ≥ 0 and m ≥ 1 sup | f (α) − f (rm2N + α)| ≤ c2−N . α∈R

(6.24)

R Then the averages MR ( f ) = (2R)−1 −R f (α)dα converge when R → ∞. Of course, a fortiori, this holds if f is a periodic function with period r (in which case (6.24) holds with c = 0).

238

Hilbert transform and UMD Banach spaces

Indeed, fix ε > 0 and N such that c2−N < ε. Let P = r2N , so that we have supα∈R | f (α) − f (mP + α)| < ε for any m ≥ 1. We have for any k ≥ 1 k−1   1  P 1 P |MkP ( f ) − f (α)dα| ≤ (2k)−1 ( f (mP + α) − f (α))dα ≤ ε. P 0 P 0 m=−k Moreover, for any R ≥ P such that kP ≤ R < (k + 1)P we have |MR ( f ) −

kP P MkP ( f )| ≤ supα∈R | f (α)|, R R

which implies 

1 lim sup |MR ( f ) − P R→∞

P

f (α)dα| ≤ ε,

0

from which the announced convergence follows easily by the Cauchy criterion. Recall that for each α, r we defined a dyadic shift operator T α,r by  (T α,r f )(x) = ( f , hI )(hI+ (x) − hI− (x)). I∈Dα,r

√ Its L2 operator norm is 2 and its representing kernel, defined for any x = t, is  hI (t )(hI+ (x) − hI− (x)). (6.25) K α,r (x, t ) = I∈Dα,r

Let Dnα,r ⊂ Dα,r denote the subcollection formed of the intervals of length r2 that form a partition of R. Note that Dα,r = ∪n∈Z Dnα,r . Note also that α,r , and hence Dα,2r = Dα,r . Let Dnα,2r = Dn+1  Knα,r (x, t ) = hI (t )(hI+ (x) − hI− (x)), α,r n

I∈Dn

so that (for any x = t), K α,r =

 n∈Z

Knα,r .

Lemma 6.24. The convergence of the sum (6.25) is uniform for |x − t| ≥ δ for every δ > 0. For x = t, let  a 1 r Kn (x, t ) = lim K α,r (x, t ) dα, (6.26) a→∞ 2a −a n 1 K (x, t ) = lim a→∞ 2a



a

r

−a

K α,r (x, t ) dα,

(6.27)

6.3 UMD implies HT and also 1 R→∞ 2 Log R



K(x, t ) = lim

R 1/R

K r (x, t ) dr r .

239

(6.28)

These 3 limits exist pointwise and the convergence is bounded for |x − t| ≥ δ for every δ > 0. In addition, each of these is a function of x − t. Moreover,  K r (x, t ) = Knr (x, t ). (6.29) n∈Z

Proof. For any x ∈ R, let In (x) be the unique interval containing x in Dnα,r . We have √ |hI (t )(hI+ (x) − hI− (x))| = 1t∈I 1x∈I 2|I|−1 . In particular, |Knα,r (x, t )| ≤



2(r2n )−1 .

(6.30)

Note that In (x) = In (t ) ⇒ |x − t| ≤ |In (x)| = |In (t )| = r2n and hence ∀α ∈ R and ∀r > 0 we have √  √ |hI (t )(hI+ (x) − hI− (x))| = 2|In (x)|−1 1In (x)=In (t ) ≤ 2 2/|x − t|. I∈Dα,r

n∈Z

In particular, the sum converges absolutely and uniformly for |x − t| ≥ δ for every δ > 0, and we have √ (6.31) |K α,r (x, t )| ≤ 2 2/|x − t|. The existence of the limits is due to either the periodicity or the ‘almost periodicity’ in α and the (multiplicative) periodicity in r. More precisely, note that the sum defining Knα,r (x, t ) is finite and periodic in α (with period r2n ). From this it is easy to show (see Remark 6.23) that the limit (6.26) exists. Furthern more, for any fixed integer m ∈ Z, we have Dnα,r = Dnα+rm2 ,r , and a fortiori N N Dnα,r = Dnα+rm2 ,r for any N ≥ n, and consequently Knα,r = Knα+rm2 ,r if n ≤ N. Therefore  N N |Knα,r − Knα+rm2 ,r |, |K α,r − K α+rm2 ,r | ≤ n>N

so that by (6.30)

√  N |K α,r − K α+rm2 ,r | ≤ 2 2

n>N

√ (r2n )−1 = 2 2(r2N )−1 .

By Remark 6.23, this implies that the limit (6.27) exists for any r and any (x, t ), and by (6.31) we have √ |K r (x, t )| ≤ 2 2/|x − t|. (6.32)

240

Hilbert transform and UMD Banach spaces

Since Dα,2r = Dα,r , we have K r (x, t ) = K 2r (x, t ). Note that the change of variable r = exp s gives us a periodic function in s, so that Remark 6.23 again implies that the limit in R exists in (6.28). Note that for any β ∈ R we have Knα,r (x − β, t − β ) = Knα+β,r (x, t ). This allows us to use the invariance property of the ‘mean’ (or ‘Banach limit’) defining (6.26). Indeed, from this and (6.30), we deduce that Knr (x − β, t − β ) = Knr (x, t ), and hence Knr (x, t ) = Knr (x − t, 0). Thus (6.26) depends only on x − t and consequently also for (6.27) and (6.28). Recall that Knα,r (x, t ) = 0 whenever 2n < |x − t|/r. Thus by (6.30), it is clear that (6.29) holds. Lemma 6.25. We have for any x = t K(x, t ) = (x − t )−1 K(1, 0). Moreover, the function (x, t ) → K(x, t ) (defined for x = t) is the kernel of the operator T in the following sense: For any pair of functions f , g ∈ L2 (R) assumed bounded and supported respectively in two disjoint compact sets we have  T f , g = K(x, t ) f (t )g(x)dtdx. (6.33) Proof. Recall that the intervals in Dαρ,rρ are the images of those in Dα,r under the transformation x → ρx. From this and (6.25), it is easy to check that K α,r (x/ρ, t/ρ) = ρK αρ,rρ (x, t ) for any ρ > 0. Averaging first in α and then in r as in (6.27) and (6.28) yields first K r (x/ρ, t/ρ) = ρK rρ (x, t ) and then K(x/ρ, t/ρ) = ρK(x, t ). By Remark 6.6 we conclude that K(x, t ) = (x − t )−1 K(1, 0). Going back to (6.20), it is now clear that K(x, t ) is the kernel of T . Indeed, assuming f , g bounded on R and (x, t ) → f (t )g(x) supported in {(x, t ) | |x − t| ≥ δ} for some δ > 0, the right-hand side of (6.33) is equal to (by (6.31) and (6.32) the convergences are dominated)  R  a 1 1 lim K α,r (x, t ) f (t )g(x)dtdxdα dr lim r , R→∞ 2 Log R 1 a→∞ 2a −a R and this is the same as2 3   R  a  1 1 α,r lim lim K (x, t ) f (t )g(x)dtdx dα dr/r, R→∞ 2 Log R R−1 a→∞ 2a −a U V

1 = lim R→∞ 2 Log R V



R R−1

2

1 lim a→∞ 2a U



a

−a

which means that K is the kernel of T .

3 T

α,r

( f ), g dα dr/r = T ( f ), g,

6.3 UMD implies HT

241

Proof that T = 0. Let us first give a picture for hI (t )(hI+ (x) − hI− (x)): x − − + − + −

+ − − + − − I t

We have hI (t )(hI+ (x) − hI− (x)) = 0 √ if and only if the point (x, t ) lies in the preceding square I × I. Its value is ± 2/|I|, where the correct sign is indicated inside the smaller rectangles. Recall that 1 a→∞ 2a

Knr (x, t ) = lim



a −a

Knα,r dα.

(6.34)

Our next goal is to compute it (and to show that it exists) for fixed r > 0 and n ∈ Z and assuming x > t. The picture is the following: x

− − + − − +

− + − − + −

− − + − − +

+ − − − + −

− − + − − +

+ − − − + −

− − + − − +

− + − − + −

t

The exact position of the squares along the diagonal depends on the starting point α. The picture will repeat itself for two values of α that differ by an integer multiple of |I|. We compute (6.34) in (x, t ) by considering the probability that (x, t ) lies in any of the squares. As we already mentioned in Lemma 6.24, this

242

Hilbert transform and UMD Banach spaces

only depends on x − t. Thus we have Knr (x, t ) = Knr (x − t, 0) and hence setting Knr (s) = Knr (s, 0) we may write Knr (x, t ) = Knr (x − t ). r If x − t = 0 then Knr (x, t ) = (1/4 − 1/4 + 1/4 − 1/4) · similarly: √ r if x − t = |I|/4 then Knr (x, t ) = 3/4 · 2/|I|, r if x − t = |I|/2 then Knr (x, t ) = 0, √ r if x − t = 3|I|/4 then Knr (x, t ) = −1/4 · 2/|I|, r if x − t ≥ |I| then Knr (x, t ) = 0.



2/|I| = 0, and

In between the preceding computed values, the function Knr (x, t ) = Knr (x − t ) is piecewise linear in x − t, so we obtain for it the following graph, depending on n and r:

√ 3 2 4r2n



• √ − 2 4r2n





r2n

x−t



Some explanations: The value when x − t = c is given by what happens in the picture when one averages the values of the kernel over the line parallel to the diagonal with equation x − t = c. By looking at the picture one sees that the value will change only at the values indicated, i.e. when c = 0, |I|/4, |I|/2, 3|I|/4 and c → Knr (c) vanishes when c ≥ |I|. Moreover, by elementary plane geometry, one sees that c → Knr (c) is linear on the interval lying between any of two consecutive values among c = 0, |I|/4, |I|/2, 3|I|/4. Consider for instance the first interval: When c ∈ [0, |I|/4]. Let θ = c(|I|/4)−1 . We need to compute the average of a function that√(if we start from the bottom of the square in the first picture) is equal to 2/|I| times successively +1, −1, +1, −1 and finally 0 and then the values repeat with period equal to

6.3 UMD implies HT

243

|I|. These successive values are the same on the diagonal except that, there (for c = θ = 0), the interval with value 0 shrinks to a singleton. Now when c = θ (|I|/4) with 0 < θ < 1 the values to be averaged are now +1, −1, +1, −1, 0 with respective proportions 1/4, (1 − θ )/4, (1 + θ )/4, (1 − θ )/4, θ /4. So we find √ Knr (c) = ( 2/|I|)[1/4 − (1 − θ )/4 + (1 + θ )/4 − (1 − θ )/4 + 0(θ /4)] √ = ( 2/|I|)[3θ /4]. Thus we find as announced that Knr (c) is linear in c between the values c = 0 and c = |I|/4, as indicated in the preceding graph. On the other 3 intervals, the linearity can be checked similarly by looking at the picture. Now we compute K r (x, t ). We will compute K r (x, t ) using Knr (x, t ) for different values of n and summing over n ∈ Z. It suffices to compute K r (x, t ) for the values x − t = 3/4 · r2m and x − t = r2m (m ∈ Z): √ √ √ √     1 2 1 1 2 3 2 9 2 r 3 m 1+ + r2 = − +· · · = K + + , m m m 4 4 r2 16 r2 64 r2 4 16 8r2m (6.35) √ √   1 1 2 3 2 1+ + + ··· = . (6.36) K r (r2m ) = m 16 r2 4 16 4r2m Here again a few more explanations might help. Consider for example (6.35). Let cm = 34 r2m . We have      r 3 m = r2 Knr (cm ) = Kmr (cm ) + Knr (cm ) + K r (cm ). K n∈Z n>m n r2n and hence since Knr is supported on [0, r2n ] the sum over n < m vanishes. A close look at the preceding graph then gives us   r Knr (cm ) = Km+1 (cm ) + K r (cm ) n>m n>m+1 n √ √   1 1 3 2 9 2 1 + + + · · · . = + 16 r2m 64 r2m 4 16 The justification of (6.36) is similar but simpler. We now claim that equations (6.35) and (6.36) imply that √ √ 3 2 2 r ≤ K (x − t ) ≤ ∀r > 0. 32(x − t ) 4(x − t )

(6.37)

Indeed, assume that r2m−1 ≤ x − t ≤ r2m . Note that the function K r restricted either to the interval [ 43 r2m , r2m ] or to [r2m−1 , 34 r2m ] is linear (since it is a sum of

244

Hilbert transform and UMD Banach spaces

the functions {Knr | n ≥ m} that are each linear on this interval). From the values of x → xK r (x) at the end points, as given by (6.35) and (6.36), it follows that K r restricted to the interval [ 34 r2m , r2m ] (resp. [r2m−1 , 34 r2m ]) is of the form K r (x) = ax + b with a, b both positive (resp. both negative), and hence x → xK r (x) is monotone on both intervals. Applying this to both intervals, we obtain the claim for r2m−1 ≤ x − t ≤ r2m and hence for x − t ≥ 0. Since K r is odd this suffices. The expression in Lemma 6.24 is obtained from K r (x − t ) by a limit of averages in r, so it is clear from √ (6.37) that the√constant c = K(1) such that K(x, t ) = c(x − t )−1 is such that 3 2/32 ≤ c ≤ 2/4. In particular c = 0, and since K is the kernel of T , T = 0. The proof of Theorem 6.22 is now complete. Remark 6.26. By (6.9), the equality K(x, t ) = K(1)(x − t )−1 means that T = π K(1)H R . At the cost of a little more effort one can compute the exact value of K(1). (A different way to calculate this same number appears in [379].) Here is an outline: We first compute Kr (1). Let n be such that r2n−1 ≤ 1 < r2n . Equivalently we have 1/r = 2s and n − 1 = [s] where [s] ∈ Z denotes the largest integer N ≤ s in Z. Let 0 ≤ θ < 1 be such that 1 = (1 − θ )r2n−1 + θ r2n . We have to distinguish θ ≤ 1/2 and θ > 1/2, because Kr is linear on both intervals corresponding to θ ≤ 1/2 and θ > 1/2. When θ ≤ 1/2 we have 1 = (1 − 2θ )r2n−1 + 2θ (3/4)r2n , and hence Kr (1) = (1 − 2θ )Kr (r2n−1 ) + 2θ Kr (((3/4)r2n ) so that by (6.35) and (6.36) we find √ Kr (1) = 2(r2n+1 )−1 (1 − 3θ /2). Now if θ ≥ 1/2 we have 1 = 2(1 − θ )(3/4)r2n + (2θ − 1)r2n , and we find similarly √ Kr (1) = 2(r2n+1 )−1 (θ /2). n−1

Note that θ = (1 − r2^{n−1})/(r2^{n−1}) = (1 − 2^{[s]−s})/2^{[s]−s} = 2^{s−[s]} − 1. So if we set x = s − [s], we have θ = 2^x − 1. Then 0 ≤ x < 1 and r = 2^{−s} = 2^{−x−[s]}. Let a = Log_2(3/2). Note that θ ≤ 1/2 iff x ≤ a. Thus we have
$$K^r(1) = \sqrt2\,(1/4)\,\Big[\,1_{\{x\le a\}}\big((5/2)2^{x} - (3/2)2^{2x}\big) + 1_{\{x>a\}}\big((2^{2x} - 2^{x})/2\big)\Big].$$

Note that when R → ∞ and R = 2^Y
$$(2\,\mathrm{Log}\,R)^{-1}\int_{1/R}^{R} f(r)\,\frac{dr}{r} = (2Y)^{-1}\int_{-Y}^{Y} f(2^{y})\,dy = (2Y)^{-1}\int_{-Y}^{Y} f(2^{-y})\,dy;$$
we apply this to f(r) = K^r(1). Note that r = 2^{−s} and f(2^{−s}) depends only on x = s − [s], and hence is periodic with period 1. Thus it is clear that the average (2Y)^{−1}∫_{−Y}^{Y} f(2^{−y}) dy is equal to
$$\int_0^1 f(2^{-y})\,dy.$$

So we conclude that
$$K(1) = \sqrt2\,(1/4)\left[\int_0^a \big((5/2)2^{x} - (3/2)2^{2x}\big)\,dx + \int_a^1 \big((2^{2x} - 2^{x})/2\big)\,dx\right],$$
and we find
$$K(1) = \sqrt2\,(1/8)\,(1/\mathrm{Log}\,2).$$
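As a numerical cross-check of this outline (a minimal Python sketch using the piecewise expression for K^r(1) displayed above), one can average over one period in x and compare with √2/(8 Log 2):

```python
import math

sqrt2 = math.sqrt(2)
a = math.log(1.5, 2)                       # a = Log_2(3/2)

def kr1(x):
    # piecewise expression for K^r(1) as a function of x = s - [s]
    if x <= a:
        return sqrt2 / 4 * ((5/2) * 2**x - (3/2) * 2**(2*x))
    return sqrt2 / 4 * ((2**(2*x) - 2**x) / 2)

# the two branches agree at x = a (continuity of K^r(1) in r)
assert abs(kr1(a - 1e-12) - kr1(a + 1e-12)) < 1e-9

# average over one period (midpoint rule); this is the limit of the averages in r
N = 200_000
average = sum(kr1((k + 0.5) / N) for k in range(N)) / N

assert abs(average - sqrt2 / (8 * math.log(2))) < 1e-6   # K(1) = sqrt(2)/(8 Log 2)
print(average, sqrt2 / (8 * math.log(2)))
```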

Proof of Theorem 6.18. By (6.23), the operators T^{α,r} are obtained from T^{0,1} by conjugation with respect to the group generated by translations and dilations. Since translations and dilations are isometric on L^2(B), the norm on L^2(B) of the single operator T^{0,1} is the same as that of any of the operators T^{α,r}. Thus, by Lemma 6.21, the UMD_2 property of B implies a uniform bound of the norm on L^2(B) for the operators T^{α,r}, and hence, by averaging, for the operator T. By Theorem 6.22 and Proposition 6.7, we conclude that B is HT_2. The proof works just as well for 1 < p ≠ 2 < ∞ once one makes a cosmetic change of the normalization of the dilation D_r so that it remains isometric on L^p and L^p(B): we just reset D_r to be equal to D_r f(x) = f(x/r) r^{−1/p}.

Remark 6.27. If we replace the minus sign in (6.21) by a plus sign, then the resulting operator is even and hence the same reasoning shows that it is a scalar multiple of the identity. The reader in need of calculus exercises can verify that indeed the preceding calculations lead to K(1) = 0, and hence K(x, t) = 0 for all x ≠ t, if the sign is changed.

Remark 6.28. Both proofs of UMD ⇒ HT in this section and the next one produce an estimate of the form C_2^{HT}(B) ≤ c C_2(B)^2 (for some absolute constant c). It seems to be an open problem whether this can be improved to C_2^{HT}(B) ≤ c C_2(B). See [240] and Remark 6.40 for more on this problem.
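The renormalized dilation D_r f(x) = f(x/r) r^{−1/p} used at the end of the proof of Theorem 6.18 is isometric on L^p by a simple change of variables; a minimal numerical sketch (with an invented sample function) illustrating this:

```python
import numpy as np

x, dx = np.linspace(-60.0, 60.0, 400_001, retstep=True)
f = lambda t: np.exp(-t**2)        # sample function, essentially supported well inside the grid

def lp_norm(values, p):
    # Riemann-sum approximation of the L^p norm on the grid
    return (np.sum(np.abs(values) ** p) * dx) ** (1 / p)

for p in (1.5, 2.0, 3.0):
    base = lp_norm(f(x), p)
    for r in (0.5, 2.0, 4.0):
        dilated = f(x / r) * r ** (-1.0 / p)     # D_r f(x) = f(x/r) r^{-1/p}
        assert abs(lp_norm(dilated, p) - base) < 1e-3 * base
print("D_r is (numerically) isometric on L^p for the tested p and r")
```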



6.4 UMD implies HT (with stochastic integrals)*

In this section we briefly outline an alternate proof of the implication UMD ⇒ HT based on the ideas of Burkholder, Gundy and Silverstein in [176]. The adaptation to the Banach space valued case comes from [169, 232]. To describe the main idea, we first need to relate the UMD property with a certain form of 'decoupling' inequality. Let B be an arbitrary Banach space. Consider a B-valued harmonic function u : D → B (we could restrict the discussion to a polynomial). We will use the representation of the martingale M_t = u(W_{t∧T}) given by the Itô formula:
$$u(W_{t\wedge T}) = u(W_0) + \int_0^{t\wedge T} \nabla u(W_s)\cdot dW_s.$$
Writing W_t = W_t^1 + iW_t^2 and ∇u(x + iy) = (∂u/∂x(x + iy), ∂u/∂y(x + iy)) we can rewrite this as
$$M_t = u(W_{t\wedge T}) = u(W_0) + \int_0^{t\wedge T} \Big[\frac{\partial u}{\partial x}(W_s)\,dW_s^1 + \frac{\partial u}{\partial y}(W_s)\,dW_s^2\Big].$$
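As a small illustration of this representation (a minimal simulation sketch with an invented, scalar-valued harmonic polynomial), for u(x + iy) = x² − y² the discretized stochastic integral reproduces u(W_t) − u(W_0), with no drift term since u is harmonic:

```python
import numpy as np

rng = np.random.default_rng(0)

u  = lambda x, y: x**2 - y**2     # harmonic polynomial
ux = lambda x, y: 2 * x           # du/dx
uy = lambda x, y: -2 * y          # du/dy

n_steps, dt = 200_000, 1e-5       # fine partition of [0, 2]
dW = rng.normal(scale=np.sqrt(dt), size=(n_steps, 2))
W = np.vstack([[0.0, 0.0], np.cumsum(dW, axis=0)])      # W_0 = 0

# forward (Ito) Riemann sums of grad u(W_s) . dW_s
X, Y = W[:-1, 0], W[:-1, 1]
ito_sum = np.sum(ux(X, Y) * dW[:, 0] + uy(X, Y) * dW[:, 1])

# since u is harmonic, u(W_t) - u(W_0) should match the stochastic integral
# up to discretization error; the two printed numbers should be close
print(u(W[-1, 0], W[-1, 1]) - u(0.0, 0.0), ito_sum)
```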

Assuming BM defined on (Ω, P) as before, we define the 'decoupled' martingale M̃_t on (Ω × Ω, P × P) by setting, for any (ω, ω′) ∈ Ω × Ω,
$$\tilde M_t(\omega, \omega') = u(W_0(\omega)) + \int_0^{t\wedge T(\omega)} \nabla u(W_s(\omega))\cdot dW_s(\omega'),$$
or equivalently
$$\tilde M_t(\omega, \omega') = u(W_0(\omega)) + \int_0^{t\wedge T(\omega)} \Big[\frac{\partial u}{\partial x}(W_s(\omega))\,dW_s^1(\omega') + \frac{\partial u}{\partial y}(W_s(\omega))\,dW_s^2(\omega')\Big].$$

With this notation, the key fact needed to extend the proof of [176], and specifically (4.38), is the following:

Lemma 6.29. If B is UMD, then for any 1 < p < ∞ there is a constant D_p such that for any f ∈ L^p(T; B) with Poisson integral u (recall u ∈ h̃^p(D; B)) we have
$$D_p^{-1}\,\sup_{t>0}\|\tilde M_t\|_p \le \|u\|_{h^p(D;B)} = \|f\|_{L^p(T;B)} \le D_p\,\sup_{t>0}\|\tilde M_t\|_p. \qquad (6.38)$$

Sketch of proof. Consider a B-valued dyadic martingale (f_n) on the probability space (Ω, P) = (D_∞, dν) as before. We can write d_n = f_n − f_{n−1} = ε_n φ_{n−1}(ε_1, . . . , ε_{n−1}). We define another martingale difference sequence (d̃_n) on (Ω × Ω, P × P) by setting d̃_0 = d_0 and, for any (ε, ε′) ∈ Ω × Ω,
$$\tilde d_n(\varepsilon, \varepsilon') = \varepsilon'_n\,\varphi_{n-1}(\varepsilon_1, \dots, \varepsilon_{n-1}),$$
and we define f̃_n by
$$\tilde f_n = \sum_{0\le k\le n} \tilde d_k.$$
By (5.23) and by Kahane's inequalities (Theorem 5.2) there is a constant D_p such that for any (f_n) we have
$$D_p^{-1}\,\sup_n\|\tilde f_n\|_{L^p(B)} \le \sup_n\|f_n\|_{L^p(B)} \le D_p\,\sup_n\|\tilde f_n\|_{L^p(B)}. \qquad (6.39)$$
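To see the decoupling construction at work in the simplest situation (a minimal sketch with an invented predictable sequence, scalar valued and p = 2, where orthogonality of the differences makes the two sides of (6.39) equal), one can enumerate all sign patterns:

```python
import itertools

n = 4

def phi(k, eps):
    # invented predictable coefficient phi_{k-1}(eps_1, ..., eps_{k-1})
    return 1.0 + sum((j + 1) * e for j, e in enumerate(eps[:k - 1]))

signs = list(itertools.product([-1, 1], repeat=n))

# E|f_n|^2 with f_n = sum_k eps_k phi_{k-1}(eps_1, ..., eps_{k-1})
Ef2 = sum(sum(eps[k - 1] * phi(k, eps) for k in range(1, n + 1)) ** 2
          for eps in signs) / len(signs)

# E|f~_n|^2 with decoupled differences eps'_k phi_{k-1}(eps_1, ..., eps_{k-1})
Eft2 = sum(sum(eps2[k - 1] * phi(k, eps) for k in range(1, n + 1)) ** 2
           for eps in signs for eps2 in signs) / len(signs) ** 2

assert abs(Ef2 - Eft2) < 1e-9     # equal second moments: decoupling is exact for p = 2, B = R
print(Ef2, Eft2)
```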

We now claim that a similar inequality holds if we replace (ε_n) by a Gaussian i.i.d. sequence (g_n). Indeed, consider now Ω = R^{N*} equipped with its canonical Gaussian measure P, denote the coordinates by (g_n) and let (F_n) be a martingale on (Ω, P) associated to the natural filtration as in §1.4, such that the increments dF_n are of the form dF_n = F_n − F_{n−1} = g_n ψ_{n−1}(g_1, . . . , g_{n−1}). We again denote
$$\widetilde{dF}_0 = dF_0, \qquad \widetilde{dF}_n(\omega, \omega') = g_n(\omega')\,\psi_{n-1}(g_1(\omega), \dots, g_{n-1}(\omega)),$$
and
$$\tilde F_n = \sum_{0\le k\le n} \widetilde{dF}_k.$$
By a suitable application of the central limit theorem (CLT in short), we can show that (6.39) 'automatically' implies
$$D_p^{-1}\,\sup_n\|\tilde F_n\|_{L^p(B)} \le \sup_n\|F_n\|_{L^p(B)} \le D_p\,\sup_n\|\tilde F_n\|_{L^p(B)}. \qquad (6.40)$$

Indeed, we can reduce to the case when the ψ_n's are all polynomials, and the CLT gives us the following approximations
$$g_1 \approx G_1 = N(1)^{-1/2}(\varepsilon_1 + \cdots + \varepsilon_{N(1)}), \qquad g_2 \approx G_2 = N(2)^{-1/2}(\varepsilon_{N(1)+1} + \cdots + \varepsilon_{N(1)+N(2)}),$$
and so on. Thus one finds F_n(g_1, . . . , g_n) ≈ F_n(G_1, . . . , G_n) and similarly for F̃_n. Then (6.39) implies for any n
$$D_p^{-1}\Big\|\sum_0^n G_k(\omega')\,\psi_{k-1}(G_1(\omega), \dots, G_{k-1}(\omega))\Big\|_{L^p(P\times P;B)} \le \big\|F_n(G_1, \dots, G_n)\big\|_{L^p(P;B)} \le D_p\Big\|\sum_0^n G_k(\omega')\,\psi_{k-1}(G_1(\omega), \dots, G_{k-1}(\omega))\Big\|_{L^p(P\times P;B)},$$
and in the limit we obtain (6.40). Lastly, one can deduce (6.38) from (6.40) by a suitable discretization of the stochastic integrals, analogous to the one described for Lemma 4.48.

We can now give an alternate proof that UMD implies HT. We will use the 'conjugate' harmonic function ũ. The latter can be defined viewing B as a real Banach space embedded isometrically into its complexification, as we did at the end of §6.1, but invoking the complexification of B is rather unpleasant. When u is a polynomial with coefficients in B of the form
$$u(z) = x_0 + \sum_{n>0} \Re(z^n)\,x_n + \Im(z^n)\,y_n,$$
then u is the Poisson integral of its boundary values given by
$$f(e^{it}) = x_0 + \sum_{n>0} \cos(nt)\,x_n + \sin(nt)\,y_n,$$
and ũ is given explicitly by the simple formula
$$\tilde u(e^{it}) = \sum_{n>0} \sin(nt)\,x_n - \cos(nt)\,y_n.$$

This also shows that if we denote by f̃ the boundary values of ũ (so that ũ is the Poisson integral of f̃), then we have f̃ = H^T f.

Second proof of Theorem 6.18. Assume B has the UMD_p property. Let f, u be as in the preceding lemma. For convenience, let us denote
$$\tilde S(u)(\omega) = \Big(\mathbb{E}_{\omega'}\Big\|\int_0^{T(\omega)} \nabla u(W_s(\omega))\cdot dW_s(\omega')\Big\|^p\Big)^{1/p}.$$
Note that if p = 2 and if B = R (or if B is a Hilbert space) then S̃(u) = S(u), where S(u) is as in (4.38) (and the two quantities are in any case equivalent for p ≠ 2). The key observation (as in (4.38)) is that the Cauchy-Riemann equations (4.37) imply that S̃(u) = S̃(ũ). Therefore, (6.38) yields
$$\|H^T f\|_{L^p(T;B)} = \|\tilde f\|_{L^p(T;B)} \le D_p^2\,\|f\|_{L^p(T;B)},$$
which completes our second proof that UMD ⇒ HT. A variant of the preceding proof can be based on the discretization in Remark 4.47, as was done initially by Burkholder in [169].


Remark. We refer to [52] for numerous complements on stochastic integrals and decoupling inequalities for multiple integrals. See also [59] for a general presentation of stochastic integration in Banach spaces.

6.5 Littlewood-Paley inequalities in UMD spaces

Let x = (x_n)_{n≥0} be a sequence in a Banach space B. We will denote
$$R(x) = \sup_n \Big\|\sum_{k=0}^n \varepsilon_k x_k\Big\|_{L^2(B)},$$

where L^2(B) = L^2(Ω, ν; B) with (Ω, ν) as in (5.13). Let M = (M_n)_{n≥0} be a B-valued martingale with associated difference sequence dM = (dM_n)_{n≥0}. Then R(dM) should be understood as a random variable. In our earlier study of the UMD property, we showed that if B is UMD then, for 1 < p < ∞, all B-valued martingales satisfy
$$a_p\,\|R(dM)\|_p \le \sup_n\|M_n\|_{L^p(B)} \le b_p\,\|R(dM)\|_p,$$

where a_p, b_p are positive constants depending on B. For Fourier series in the scalar case, there is an analogue of this, namely the famous Littlewood-Paley inequalities (see [30] and [85] for background). These say that for any f in L^p(T) the dyadic partial sums
$$\Delta_n^{+}(f) = \sum_{2^n \le k < 2^{n+1}} \hat f(k)\,e^{ikt} \qquad (6.41)$$

+ fˆ(k)eikt (6.41) n (f) = 2n ≤k