Preface
Contents
Chapter 1 Motivation
1.1 From Even and Odd Functions to Group Representations
1.2 Partial Differential Equations and the Laplace Operator
1.2.1 The Heat Equation
1.2.2 The Wave Equation
1.2.3 The Mantegna Fresco
1.3 What is Spectral Theory?
1.4 The Prime Number Theorem
1.5 Further Topics
Chapter 2 Norms and Banach Spaces
2.1 Norms and Semi-Norms
2.1.1 Normed Vector Spaces
2.1.2 Semi-Norms and Quotient Norms
2.1.3 Isometries are Affine
2.1.4 A Comment on Notation
2.2 Banach Spaces
2.2.1 Proofs of Completeness
2.2.2 The Completion of a Normed Vector Space
2.2.3 Non-Compactness of the Unit Ball
2.3 The Space of Continuous Functions
2.3.1 The Arzela–Ascoli Theorem
2.3.2 The Stone–Weierstrass Theorem
2.3.3 Equidistribution of a Sequence
2.3.4 Continuous Functions in L^p Spaces
2.4 Bounded Operators and Functionals
2.4.1 The Norm of Continuous Functionals on C_0(X)
2.4.2 Banach Algebras
2.5 Ordinary Differential Equations
2.5.1 The Volterra Equation
2.5.2 The Sturm–Liouville Equation
2.6 Further Topics
Chapter 3 Hilbert Spaces, Fourier Series, Unitary Representations
3.1 Hilbert Spaces
3.1.1 Definitions and Elementary Properties
3.1.2 Convex Sets in Uniformly Convex Spaces
3.1.3 An Application to Measure Theory
3.2 Orthonormal Bases and Gram–Schmidt
3.2.1 The Non-Separable Case
3.3 Fourier Series on Compact Abelian Groups
3.4 Fourier Series on T^d
3.4.1 Convolution on the Torus
3.4.2 Dirichlet and Fejér Kernels
3.4.3 Differentiability and Fourier Series
3.5 Group Actions and Representations
3.5.1 Group Actions and Unitary Representations
3.5.2 Unitary Representations of Compact Abelian Groups
3.5.3 The Strong (Riemann) Integral
3.5.4 The Weak (Lebesgue) Integral
3.5.5 Proof of the Weight Decomposition
3.5.6 Convolution
3.6 Further Topics
Chapter 4 Uniform Boundedness and the Open Mapping Theorem
4.1 Uniform Boundedness
4.1.1 Uniform Boundedness and Fourier Series
4.2 The Open Mapping and Closed Graph Theorems
4.2.1 Baire Category
4.2.2 Proof of the Open Mapping Theorem
4.2.3 Consequences: Bounded Inverses and Closed Graphs
4.3 Further Topics
Chapter 5 Sobolev Spaces and Dirichlet's Boundary Problem
5.1 Sobolev Spaces and Embedding on the Torus
5.1.1 L^2 Sobolev Spaces on T^d
5.1.2 The Sobolev Embedding Theorem on T^d
5.2 Sobolev Spaces on Open Sets
5.2.1 Examples
5.2.2 Restriction Operators and Traces
5.2.3 Sobolev Embedding in the Interior
5.3 Dirichlet's Boundary Value Problem and Elliptic Regularity
5.3.1 The Semi-Inner Product
5.3.2 Elliptic Regularity for the Laplace Operator
5.3.3 Dirichlet's Boundary Value Problem
5.4 Further Topics
Chapter 6 Compact Self-Adjoint Operators, Laplace Eigenfunctions
6.1 Compact Operators
6.1.1 Integral Operators are Often Compact
6.2 Spectral Theory of Self-Adjoint Compact Operators
6.2.1 The Adjoint Operator
6.2.2 The Spectral Theorem
6.2.3 Proof of the Spectral Theorem
6.2.4 Variational Characterization of Eigenvalues
6.3 Trace-Class Operators
6.4 Eigenfunctions for the Laplace Operator
6.4.1 Right Inverse and Compactness on the Torus
6.4.2 A Self-Adjoint Compact Right Inverse
6.4.3 Eigenfunctions on a Drum
6.4.4 Weyl's Law
6.5 Further Topics
Chapter 7 Dual Spaces
7.1 The Hahn–Banach Theorem and its Consequences
7.1.1 The Hahn–Banach Lemma and Theorem
7.1.2 Consequences of the Hahn–Banach Theorem
7.1.3 The Bidual
7.1.4 An Application of the Spanning Criterion
7.2 Banach Limits, Amenable Groups, Banach–Tarski
7.2.1 Banach Limits
7.2.2 Amenable Groups
7.2.3 The Banach–Tarski Paradox
7.3 The Duals of L^p_µ(X)
7.3.1 The Dual of L^1_µ(X)
7.3.2 The Dual of L^p_µ(X) for p > 1
7.3.3 Riesz–Thorin Interpolation
7.4 Riesz Representation: The Dual of C(X)
7.4.1 Uniqueness
7.4.2 Totally Disconnected Compact Spaces
7.4.3 Compact Spaces
7.4.4 Locally Compact σ-Compact Metric Spaces
7.4.5 Continuous Linear Functionals on C_0(X)
7.5 Further Topics
Chapter 8 Locally Convex Vector Spaces
8.1 Weak Topologies and the Banach–Alaoglu Theorem
8.1.1 Weak* Compactness of the Unit Ball
8.1.2 More Properties of the Weak and Weak* Topologies
8.1.3 Analytic Functions and the Weak Topology
8.2 Applications of Weak* Compactness
8.2.1 Equidistribution
8.2.2 Elliptic Regularity for the Laplace Operator
8.2.3 Elliptic Regularity at the Boundary
8.3 Topologies on the space of bounded operators
8.4 Locally Convex Vector Spaces
8.5 Distributions as Generalized Functions
8.6 Convex Sets
8.6.1 Extreme Points and the Krein–Milman Theorem
8.6.2 Choquet's Theorem
8.7 Further Topics
Chapter 9 Unitary Operators and Flows, Fourier Transform
9.1 Spectral Theory of Unitary Operators
9.1.1 Herglotz's Theorem for Positive-Definite Sequences
9.1.2 Cyclic Representations and the Spectral Theorem
9.1.3 Spectral Measures
9.1.4 Functional Calculus for Unitary Operators
9.1.5 An Application of Spectral Theory to Dynamics
9.2 The Fourier Transform
9.2.1 The Fourier Transform on L^1(R^d)
9.2.2 The Fourier Transform on L^2(R^d)
9.2.3 The Fourier Transform, Smoothness, Schwartz Space
9.2.4 The Uncertainty Principle
9.3 Spectral Theory of Unitary Flows
9.3.1 Positive-Definite Functions; Cyclic Representations
9.3.2 The Case G = R^d
9.3.3 Stone's Theorem
9.4 Further Topics
Chapter 10 Locally Compact Groups, Amenability, Property (T)
10.1 Haar Measure
10.2 Amenable Groups
10.2.1 Definitions and Main Theorem
10.2.2 Proof of Theorem 10.15
10.2.3 A More Uniform Følner Set
10.2.4 Further Equivalences and Properties
10.3 Property (T)
10.3.1 Definitions and First Properties
10.3.2 Main Theorems
10.3.3 Proof of Každan's Property (T), Connected Case
10.3.4 Proof of Každan's Property (T), Discrete Case
10.3.5 Iwasawa Decomposition and Geometry of Numbers
10.4 Highly Connected Networks: Expanders
10.4.1 Constructing an Explicit Expander Family
10.5 Further Topics
Chapter 11 Banach Algebras and the Spectrum
11.1 The Spectrum and Spectral Radius
11.1.1 The Geometric Series and its Consequences
11.1.2 Using Cauchy Integration
11.2 C*-algebras
11.3 Commutative Banach Algebras and their Gelfand Duals
11.3.1 Commutative Unital Banach Algebras
11.3.2 Commutative Banach Algebras without a Unit
11.3.3 The Gelfand Transform
11.3.4 The Gelfand Transform for Commutative C*-algebras
11.4 Locally Compact Abelian Groups
11.4.1 The Pontryagin Dual
11.5 Further Topics
Chapter 12 Spectral Theory and Functional Calculus
12.1 Definitions and Basic Lemmas
12.1.1 Decomposing the Spectrum
12.1.2 The Numerical Range
12.1.3 The Essential Spectrum
12.2 The Spectrum of a Tree
12.2.1 The Correct Upper Bound for the Summing Operator
12.2.2 The Spectrum of S
12.2.3 No Eigenvectors on the Tree
12.3 Main Goals: The Spectral Theorem and Functional Calculus
12.4 Self-Adjoint Operators
12.4.1 Continuous Functional Calculus
12.4.2 Corollaries to the Continuous Functional Calculus
12.4.3 Spectral Measures
12.4.4 The Spectral Theorem for Self-Adjoint Operators
12.4.5 Consequences for Unitary Representations
12.5 Commuting Normal Operators
12.6 Spectral Measures and the Measurable Functional Calculus
12.6.1 Non-Diagonal Spectral Measures
12.6.2 The Measurable Functional Calculus
12.7 Projection-Valued Measures
12.8 Locally Compact Abelian Groups and Pontryagin Duality
12.8.1 The Spectral Theorem for Unitary Representations
12.8.2 Characters Separate Points
12.8.3 The Plancherel Formula
12.8.4 Pontryagin Duality
12.9 Further Topics
Chapter 13 Self-Adjoint and Symmetric Operators
13.1 Examples and Definitions
13.2 Operators of the Form T*T
13.3 Self-Adjoint Operators
13.4 Symmetric Operators
13.4.1 The Friedrichs Extension
13.4.2 Cayley Transform and Deficiency Indices
13.5 Further Topics
Chapter 14 The Prime Number Theorem
14.1 Two Reformulations
14.2 The Selberg Symmetry Formula and Banach Algebra Norm
14.2.1 Dirichlet Convolution and Möbius Inversion
14.2.2 The Selberg Symmetry Formula
14.2.3 Measure-Theoretic Reformulation
14.2.4 A Density Function and the Continuity Bound
14.2.5 Mertens' Theorem
14.2.6 Completing the Proof
14.3 Non-Trivial Spectrum of the Banach Algebra
14.4 Trivial Spectrum of the Banach Algebra
14.5 Primes in Arithmetic Progressions
14.5.1 Non-Vanishing of Dirichlet L-function at 1
Appendix A: Set Theory and Topology
A.1 Set Theory and the Axiom of Choice
A.2 Basic Definitions in Topology
A.3 Inducing Topologies
A.4 Compact Sets and Tychonoff's Theorem
A.5 Normal Spaces
Appendix B: Measure Theory
B.1 Basic Definitions and Measurability
B.2 Properties of the Integral
B.3 The p-Norm
B.4 Near-Continuity of Measurable Functions
B.5 Signed Measures
Hints for Selected Problems
Notes
References
Notation
General Index


Manfred Einsiedler Thomas Ward

Functional Analysis, Spectral Theory, and Applications

Graduate Texts in Mathematics 276

Series Editors:
Sheldon Axler, San Francisco State University, San Francisco, CA, USA
Kenneth Ribet, University of California, Berkeley, CA, USA

Advisory Board:
Alejandro Adem, University of British Columbia
David Eisenbud, University of California, Berkeley & MSRI
Irene M. Gamba, The University of Texas at Austin
J.F. Jardine, University of Western Ontario
Jeffrey C. Lagarias, University of Michigan
Ken Ono, Emory University
Jeremy Quastel, University of Toronto
Fadil Santosa, University of Minnesota
Barry Simon, California Institute of Technology

Graduate Texts in Mathematics bridge the gap between passive study and creative understanding, offering graduate-level introductions to advanced topics in mathematics. The volumes are carefully written as teaching aids and highlight characteristic features of the theory. Although these books are frequently used as textbooks in graduate courses, they are also suitable for individual study.


Manfred Einsiedler ETH Zürich Zürich, Switzerland

Thomas Ward School of Mathematics University of Leeds Leeds, UK

Preface

Believe us, we also asked ourselves what could be the rationale for ‘Yet another book on functional analysis’.(1) Little indeed can justify this beyond our own enjoyment of the beauty and power of the topics introduced here. Functional analysis might be described as a part of mathematics where analysis, topology, measure theory, linear algebra, and algebra come together to create a rich and fascinating theory. The applications of this theory are then equally spread throughout mathematics (and beyond).

We follow some fairly conventional journeys, and have of course been influenced by other books, most notably that of Lax [59]. While developing the theory we include reminders of the various areas that we build on (in the appendices and throughout the text) but we also reach some fairly advanced and diverse applications of the material usually called functional analysis that often do not find their place in a course on that topic.

The assembled material probably cannot be covered in a year-long course, but has grown out of several such introductory courses taught at the Eidgenössische Technische Hochschule Zürich by the first named author, with a slightly different emphasis on each occasion. Both the student and (especially) the lecturer should be brave enough to jump over topics and pick the material of most interest, but we hope that the student will eventually be sufficiently interested to find out what happens in the material that was not covered initially. The motivation for the topics discussed may be found in Chapter 1.

Notation and Conventions

The symbols N = {1, 2, ...}, N_0 = N ∪ {0}, and Z denote the natural numbers, non-negative integers and integers; Q, R, C denote the rational numbers, real numbers and complex numbers. The real and imaginary parts of a complex number are denoted by x = ℜ(x + iy) and y = ℑ(x + iy). For functions f, g defined on a set X we write f = O(g) or f ≪ g if there is a constant A > 0 with ‖f(x)‖ ≤ A‖g(x)‖ for all x ∈ X.
When the implied constant A depends on a set of parameters 𝒜, we write f = O_𝒜(g) or f ≪_𝒜 g (but we may also forget the index if the set of parameters will not vary at all in the discussion). A sequence a_1, a_2, ... in any space will be denoted (a_n) (or (a_n)_n if we wish to emphasize the index variable of the sequence). For two C-valued functions f, g defined on X∖{x_0} for a topological space X containing x_0 we write f = o(g) as x → x_0 if lim_{x→x_0} f(x)/g(x) = 0. This definition includes the case of sequences by letting X = N ∪ {∞} and x_0 = ∞ with the topology of the one-point compactification. Additional specific notation introduced throughout the text is collected in an index of notation on p. 600.

Prerequisites

We will assume throughout that the reader is familiar with linear algebra and quite frequently that she is also familiar with finite-dimensional real analysis and complex analysis in one variable. Further background and conventions in topology and measure theory are collected in two appendices, but let us note that throughout compact and locally compact spaces are implicitly assumed to be Hausdorff.

Organisation

There are 402 exercises in the text, 221 of these with hints in an appendix, all of which contribute to the reader’s understanding of the material. A small number are essential to the development (of the ideas in the section or of later theories); these are denoted ‘Essential Exercise’ to highlight their significance. We indicate the dependencies between the various chapters in the Leitfaden overleaf and in the guide to the chapters that follows it.

Acknowledgements

We are thankful for various discussions with Menny Aka, Uri Bader, Michael Björklund, Marc Burger, Elon Lindenstrauss, Shahar Mozes, René Rühr, Akshay Venkatesh, and Benjamin Weiss on some of the topics presented here. We also thank Emmanuel Kowalski for making available his notes on spectral theory and allowing us to raid them.
We are grateful to several people for their comments on drafts of sections, including Menny Aka, Manuel Cavegn, Rex Cheung, Anthony Flatters, Maxim Gerspach, Tommaso Goldhirsch, Thomas Hille, Guido Lob, Manuel Lüthi, Clemens Macho, Alex Maier, Andrea Riva, René Rühr, Lukas Ruosch, Georg Schildbach, Samuel Stark, Andreas Wieser, Philipp Wirth, and Gao Yunting. Special thanks are due to Roland Prohaska, who proofread the whole volume in four months. Needless to say, despite these many helpful eyes, some typographical and other errors will remain; these are of course solely the responsibility of the authors. The second named author also thanks Grete for her repeated hospitality which significantly aided this book’s completion, and thanks Saskia and Toby for doing their utmost to prevent it.

Manfred Einsiedler, Zürich
Thomas Ward, Leeds
2nd April 2017

[Leitfaden: diagram of chapter dependencies. Nodes: 2 Banach Spaces; 3 Hilbert Spaces & Fourier Series; 4 Completeness; 5 Sobolev Spaces & Dirichlet Problem; 6 Compact Operators & Weyl’s Law; 7 Dual Spaces; 8 Weak* Compactness & Locally Convex Spaces; 9 Unitary Operators & Fourier Transform; 10 Haar Measure, Amenability, & Property (T); 11 Banach Algebras & Pontryagin Duals; 12 Spectral Theorems & Pontryagin Duality; 13 Unbounded Operators; 14 Prime Number Theorem.]

Guide to Chapters

Chapter 1 is mostly motivational in character and can be skipped for the theoretical discussions later. Chapter 4 has a somewhat odd role in this volume. On the one hand it presents quite central theorems for functional analysis that also influence many of the definitions later in the volume, but on the other hand, by chance, the theorems are not crucial for our later discussions. The dotted arrows in the Leitfaden indicate partial dependencies. Chapter 6 consists of two parts; the discussion of compact groups depends only on Chapter 3 while the material on Laplace eigenfunctions also builds on material from Chapter 5. The discussion of the adjoint operator and its properties in Chapter 6 is crucial for the spectral theory in Chapters 11, 12, and 13. Moreover, one section in Chapter 8 builds on and finishes our discussion of Sobolev spaces in Chapters 5 and 6. Finally, some of Chapter 11 needs the discussion of Haar measures in the first section of Chapter 10. With these comments and the Leitfaden it should be easy to design many different courses of different lengths focused around the topic of Functional Analysis.


503 503 507 507 509 514 517 518 520 523 524 526 529

Appendix A: Set Theory and Topology . . . . . . . . . . . . . . . . . . . . . . . A.1 Set Theory and the Axiom of Choice . . . . . . . . . . . . . . . . . . . . . . A.2 Basic Definitions in Topology . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.3 Inducing Topologies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A.4 Compact Sets and Tychonoff’s Theorem . . . . . . . . . . . . . . . . . . . A.5 Normal Spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

537 537 538 541 545 547

Appendix B: Measure Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.1 Basic Definitions and Measurability . . . . . . . . . . . . . . . . . . . . . . . B.2 Properties of the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.3 The p-Norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . B.4 Near-Continuity of Measurable Functions . . . . . . . . . . . . . . . . . . B.5 Signed Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

551 551 554 556 558 561

Hints for Selected Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563 Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 593 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 598 General Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 600

Chapter 1

Motivation

We start by discussing some seemingly disparate topics that are all intimately linked to notions from functional analysis. Some of the topics have been important motivations for the development of the theory that came to be called functional analysis in the first place. We hope that the variety of topics discussed in Sections 1.1–1.4 and those mentioned in Section 1.5 will help to give some insight into the central role of functional analysis in mathematics.

1.1 From Even and Odd Functions to Group Representations

We recall the following elementary notions of symmetry and anti-symmetry for functions. A function f : R → R is said to be even if f(−x) = f(x) for all x ∈ R, and odd if f(−x) = −f(x) for all x ∈ R. Every function f : R → R can be split into an even and an odd component, since

\[
f(x) = \underbrace{\frac{f(x)+f(-x)}{2}}_{\text{the even part}} + \underbrace{\frac{f(x)-f(-x)}{2}}_{\text{the odd part}}. \tag{1.1}
\]

Exercise 1.1. Is the decomposition of a function into odd and even parts in (1.1) unique? That is, if f = e + o with e even and o odd, is e(x) = (f(x) + f(−x))/2?

As one might guess, behind the definition of even and odd functions and the decomposition in (1.1) is the group Z/2Z = {0, 1} acting on R via the map x ↦ (−1)^ℓ x for ℓ ∈ Z/2Z. Here we are using ℓ ∈ Z as a shorthand for the coset ℓ + 2Z ∈ Z/2Z.

† Chapter 1 is atypical for this book. The reader may, and the lecturer should, skip this chapter or return to it later, as convenient.

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_1


In order to generalize this observation, recall that an action of a group G on a set X is a map G × X → X, written (g, x) ↦ g·x, with the properties g·(h·x) = (gh)·x for all g, h ∈ G and x ∈ X, and e·x = x for all x ∈ X, where e ∈ G is the identity element. We will sometimes write G ↻ X for an action of G on X.

Having associated (in an informal way for the moment) the decomposition of a function into odd and even parts with the action of the group Z/2Z on R, the notion of group action suggests many generalizations of the decomposition. For this we note that an action of a group G on a space X gives rise to a linear action on the space of functions on X by the formula (g, f) ↦ f^g where f^g(x) = f(g^{−1}·x) for all x ∈ X, g ∈ G, and functions f on X.



Exercise 1.2. Show that (ℓ1, ℓ2)·(x1, x2) = ((−1)^{ℓ1} x1, (−1)^{ℓ2} x2) for ℓ1, ℓ2 ∈ Z/2Z and (x1, x2) ∈ R² defines an action of (Z/2Z)² on R². Show that every real- or complex-valued function on R² can be decomposed uniquely into a sum of four functions that are even in both variables; even in x1 and odd in x2; odd in x1 and even in x2; and odd in both variables.

In order to define another action on R² we write

\[
k(\phi) = \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix}
\]

for the matrix of anti-clockwise rotation on R² by the angle φ ∈ R. We also let T = R/Z be the one-dimensional circle group or 1-torus, and define the action of φ ∈ T on R² by the rotation by k(2πφ). Once again we will sometimes write t as a shorthand for the coset t + Z ∈ R/Z; in particular the interval [0, 1) may be identified with T using addition modulo 1. We note that we can also make T into a topological group by declaring cosets to be close if their representatives can be chosen to be close in R or equivalently by using the metric d(t1 + Z, t2 + Z) = min_{n∈Z} |t1 − t2 + n| for t1, t2 ∈ R.

In studying any situation with rotational symmetry on R² one is naturally led to this action. What is the corresponding decomposition of functions for this action? Clearly one distinguished class of functions is given by the functions invariant under rotation, that is, functions satisfying f(v) = f(k(2πφ)v) for all φ ∈ T. The graph of such a function is the surface obtained by rotating a graph of a real- or complex-valued function on [0, ∞) about the z-axis.

Let us postpone the answers to the above questions and instead consider a finite analogue to the problem. Fix some integer q > 1 and define the group G = Z/qZ, which acts on R² by letting ℓ + qZ ∈ G act by rotation by the angle 2πℓ/q using k(2πℓ/q). To simplify the notation set K = k(2π/q) and write the action as G × R² ∋ (ℓ + qZ, v) ↦ K^ℓ v (which is well-defined since K^q = I). This simplification in the notation helps to clarify the underlying structure, and reflects one of the themes of functional analysis: thinking of progressively more complicated objects (numbers, vectors, functions, operators) as ‘points’ in a larger space allows the structures to be seen more clearly.


We say that a complex-valued function f on R² has weight n for this action if f(K^ℓ v) = e^{2πinℓ/q} f(v) for every v ∈ R² and ℓ + qZ ∈ G (or equivalently only for ℓ = 1). We now generalize the formula (1.1) to the case of the finite rotation group considered here. In fact, using the shorthand ζ = e^{2πi/q} we define for a complex-valued function f on R² the functions

\[
f_n(v) = \frac{1}{q} \sum_{\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell} v) \tag{1.2}
\]

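for every v ∈ R² and n = 0, …, q − 1. Since (substituting m = ℓ + 1, and using K^q = I and ζ^q = 1 for the final equality)

\[
f_n(Kv) = \frac{1}{q} \sum_{\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell+1} v) = \frac{1}{q} \sum_{m=1}^{q} \zeta^{-n(m-1)} f(K^{m} v) = \zeta^{n} f_n(v)
\]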

for every v ∈ R², we see that f_n has weight n. By the geometric series formula, we see that \(\sum_{n=0}^{q-1} \zeta^{-n\ell}\) equals q for ℓ = 0 and 0 for ℓ = 1, …, q − 1, so

\[
\sum_{n=0}^{q-1} f_n(v) = \frac{1}{q} \sum_{n,\ell=0}^{q-1} \zeta^{-n\ell} f(K^{\ell} v) = \frac{1}{q} \sum_{\ell=0}^{q-1} \Biggl( \sum_{n=0}^{q-1} \zeta^{-n\ell} \Biggr) f(K^{\ell} v) = f(v)
\]

for every v ∈ R². Therefore, f can be written as a finite sum of functions of weight n = 0, …, q − 1.

Let us return now to the case of the group SO₂(R) of all rotations k(2πt) for t ∈ T. To guess what the classes of functions should be, we note that all of the symmetries of functions considered above can be phrased naturally in terms of the possible continuous group homomorphisms of the acting group to the group S¹ = {z ∈ C | |z| = 1}. It is easy to show (see Exercise 1.3 for the third, non-trivial, statement) that

(1) any homomorphism Z/2Z → S¹ has the form ℓ ↦ (±1)^ℓ,
(2) any homomorphism Z/qZ → S¹ has the form ℓ ↦ e^{2πinℓ/q} = ζ^{nℓ} for some n ∈ {0, …, q − 1}, and
(3) any continuous homomorphism T → S¹ has the form φ ↦ χ_n(φ) = e^{2πinφ} for some n ∈ Z, so there are infinitely many such homomorphisms, and they are naturally parameterized by the integers.

For any topological group G, we call the continuous homomorphisms from G to S¹ the (unitary) characters of G.

Exercise 1.3. Show that any character of T has the form claimed in (3) above.
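The finite weight decomposition (1.2) is easy to test numerically. The following is only a minimal sketch, not part of the original text: the test function `f`, the choice q = 5 and the sample point `v` are arbitrary assumptions made for illustration, and the helper names `rotate` and `weight_component` are hypothetical.

```python
import cmath
import math

def rotate(v, angle):
    """Rotate the point v = (x, y) anti-clockwise by `angle` radians."""
    x, y = v
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y)

def weight_component(f, n, q):
    """Return f_n from (1.2): f_n(v) = (1/q) * sum_l zeta^(-n l) f(K^l v)."""
    zeta = cmath.exp(2j * math.pi / q)
    def f_n(v):
        return sum(zeta ** (-n * l) * f(rotate(v, 2 * math.pi * l / q))
                   for l in range(q)) / q
    return f_n

# Hypothetical test function, chosen only for illustration.
def f(v):
    x, y = v
    return x ** 3 + 2 * x * y + math.exp(y)

q = 5
v = (0.7, -0.3)
components = [weight_component(f, n, q) for n in range(q)]

# The components sum back to f ...
total = sum(f_n(v) for f_n in components)
# ... and each f_n has weight n, that is f_n(K v) = zeta^n f_n(v).
zeta = cmath.exp(2j * math.pi / q)
Kv = rotate(v, 2 * math.pi / q)
weight_errors = [abs(components[n](Kv) - zeta ** n * components[n](v))
                 for n in range(q)]
print(abs(total - f(v)), max(weight_errors))
```

Both printed numbers should be at the level of floating-point rounding, reflecting that the decomposition is purely algebraic for a finite group.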

Notice that each of the characters in (1) and (2) corresponds to exactly one type of function in the decompositions of functions discussed above. Generalizing this correspondence, we turn to (3) and say that a complex-valued function f : R² → C has weight n (is of type n) if it satisfies f(k(2πφ)v) = χ_n(φ) f(v) for all φ ∈ T and v ∈ R².


One might now guess (and we will see in Chapter 3 that this is indeed the case) that any reasonable function f : R² → C can be written as a linear combination

\[
f = \sum_{n \in \mathbb{Z}} f_n \tag{1.3}
\]

where f_n has weight n. However, in contrast to (1.1) this is an infinite sum, so we are no longer talking about a purely algebraic phenomenon. The decomposition (1.3), its existence and its properties, lies both in algebra and in analysis. We therefore have to become concerned both with the algebraic structure and with questions of convergence. Depending on the notion of convergence used, the class of reasonable functions turns out to vary. These classes of reasonable functions are in fact important examples of Banach spaces, which will be defined in Chapter 2.

The discussion above on decompositions into sums of functions of different weights will later be part of the treatment of Fourier analysis (see Chapter 3). For this we will initially study the mathematically simpler situation of the action of T on T by translation, (x, y) ↦ x + y for x, y ∈ T. Adjusting the definitions above appropriately, we say that a function f : T → C has weight n ∈ Z if and only if f is a multiple of χ_n itself. We therefore seek, for a reasonable function f : T → C, constants c_n for n ∈ Z with

\[
f = \sum_{n \in \mathbb{Z}} c_n \chi_n. \tag{1.4}
\]

The right-hand side of (1.4) is called the Fourier series of f. We will see later that it is relatively straightforward (at least in the abstract sense) to find the Fourier coefficients c_n via the identity

\[
c_n = \int_{\mathbb{T}} f(x)\,\overline{\chi_n(x)}\,dx
\]

for all n ∈ Z.

Fourier series arise naturally in many day-to-day applications. A string or a wind instrument playing a note is producing a periodic pressure wave with a certain frequency. The tone humans hear usually corresponds to this frequency, which is called the fundamental in music theory. There are also higher frequencies, usually integer multiples of the fundamental frequency, appearing in the wave. These frequencies are called harmonics, and the ratios between the Fourier coefficients of the harmonics and the Fourier coefficient of the fundamental give different instruments (for example, the flute and the clarinet) their distinctive sound when playing the same fundamental note.

Returning to our discussion of symmetries for functions on R² we will show similarly that for a reasonable function f : R² → C the function

\[
f_n(v) = \int_{\mathbb{T}} \overline{\chi_n(\phi)}\, f(k(2\pi\phi)v)\,d\phi
\]


for n ∈ Z has weight n (compare this with (1.2)), and that (1.3) holds.

Exercise 1.4. Show that if the function \(f_n(v) = \int_{\mathbb{T}} \overline{\chi_n(\phi)}\, f(k(2\pi\phi)v)\,d\phi\) is well-defined (say the integral exists for almost every v ∈ R²), then it has weight n.
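To make the Fourier coefficient identity concrete, the sketch below approximates c_n = ∫_T f(x) conj(χ_n(x)) dx by a Riemann sum; the complex conjugate on χ_n is the convention that is consistent with f = Σ c_n χ_n and with the factor ζ^{−nℓ} in (1.2). The trigonometric polynomial `f`, its coefficients and the sample count are assumptions chosen purely for illustration.

```python
import cmath
import math

def chi(n, x):
    """The character chi_n(x) = exp(2 pi i n x) of T = R/Z."""
    return cmath.exp(2j * math.pi * n * x)

def fourier_coefficient(f, n, samples=1024):
    """Approximate c_n = int_T f(x) * conj(chi_n(x)) dx by a Riemann sum."""
    return sum(f(k / samples) * chi(n, k / samples).conjugate()
               for k in range(samples)) / samples

# Hypothetical trigonometric polynomial with known coefficients:
# c_0 = 2, c_1 = 0.5, c_{-3} = -1j, and all other coefficients 0.
def f(x):
    return 2 * chi(0, x) + 0.5 * chi(1, x) - 1j * chi(-3, x)

coefficients = {n: fourier_coefficient(f, n) for n in range(-4, 5)}
for n, c in coefficients.items():
    print(n, round(c.real, 6), round(c.imag, 6))
```

For a trigonometric polynomial of low degree the Riemann sum recovers the coefficients up to rounding; for a general function it is only an approximation, and the sense in which the series (1.4) converges is exactly the analytic question raised in the text.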

To summarize, we will introduce classes of functions (which will be examples of Banach spaces), and determine whether for functions in these classes the Fourier series (1.4) or the weight decomposition (1.3) converges, and in what sense the convergence does or does not happen.

For functions f : R³ → C one can generalize the discussion above in many different ways, by considering the actions of various different groups as follows:
• Z/2Z, giving the familiar generalization of even and odd functions.
• T ≅ SO₂(R) acting by rotations in the x, y-plane about the z-axis. This gives a generalization of our discussion of functions R² → C, and we will be able to treat this case in a similar way to the two-dimensional case.
• SO₃(R), the full group of orientation-preserving rotations of R³.

The last case in this list is more difficult to analyze than any of the cases discussed above. The additional complications arise since SO₃(R) is not abelian. In fact, the group SO₃(R) is simple, and as a result there are no non-trivial continuous homomorphisms SO₃(R) → S¹, so this cannot be used to define classes of functions in the same way. The case of SO₃(R) requires the theory of harmonic analysis and unitary representations of compact groups. We will not reach these important topics here, but will lay the ground for them and refer to the treatment in Folland [32, Ch. 5] or [26].

Although we have used actions of geometric origin to motivate the discussion above, the decompositions described hold more generally for general linear actions of finite abelian groups on vector spaces (often called group representations) and also for unitary representations of compact abelian groups (with T being the main example of a compact abelian group) on Hilbert spaces. Hilbert spaces are Banach spaces that are equipped with an inner product, and will be introduced in Chapter 3 where we will also discuss unitary representations for the first time.

1.2 Partial Differential Equations and the Laplace Operator

There is no need to motivate the study of differential equations, as they are of central importance across all sciences concerned with measurable quantities that change with respect to other variables of the system studied. Even the simplest ordinary differential equations can lead directly to the study of integral operators, which may be analyzed using tools from functional analysis. The reader familiar with the theorem of Picard and Lindelöf on existence and


uniqueness of solutions to certain initial value problems and its proof will not be surprised by this connection. We refer to Section 2.4 for more on this. However, here we would like to discuss two particular partial differential equations. As we will see later, the mathematical background needed for this, most of which comes from functional analysis, is much more interesting (meaning difficult) than that needed for ordinary differential equations. One of the objectives of this book is to make the informal discussion in this section more formal and rigorous. We will cover this topic in Chapters 5 and 6 (apart from a technical point, which we resolve in Section 8.2).

In both of the partial differential equations that we will discuss, we will need to express the difference between the value of a function at a point and its values in a neighbourhood of the point. To make the resulting equations more amenable for study one uses an infinitesimal version of this difference, which brings into the picture(2) the Laplace operator ∆ (also sometimes denoted by ∇²) defined by

\[
\Delta f = \frac{\partial^2 f}{\partial x_1^2} + \cdots + \frac{\partial^2 f}{\partial x_d^2} \tag{1.5}
\]

for a smooth function f : R^d → R because of the following simple observation.

Proposition 1.5 (Laplace and neighbourhood averages). Let U ⊆ R^d be an open set, and suppose that f : U → R is a C² function. Then

\[
\lim_{r \to 0} \frac{1}{r^2 \,\mathrm{vol}(B_r(x))} \int_{B_r(x)} \bigl( f(y) - f(x) \bigr)\,dy = c\,\Delta f(x)
\]

for any x ∈ U, where dy denotes integration with respect to the Lebesgue measure on the r-ball B_r = {y ∈ R^d | ‖y‖ < r} ⊆ R^d and c = 1/(2(d + 2)).

Proof. Suppose for simplicity of notation that x = 0, and apply Taylor approximation to obtain

\[
f(y) = f(0) + f'(0)y + \frac{1}{2} \sum_{i,j=1}^{d} \frac{\partial^2 f}{\partial x_i \partial x_j}(0)\, y_i y_j + o\bigl(\|y\|^2\bigr),
\]

as y → 0, where f′(0) is the total derivative of f at 0, and we use the notation o(·) from p. vi. Now in the integral over the r-ball B_r the linear terms cancel out due to the symmetry of the ball. The same argument applies to the mixed quadratic terms. Thus we are left with

\[
\int_{B_r} f(y)\,dy = \mathrm{vol}(B_r) f(0) + \frac{1}{2} \sum_{i=1}^{d} \frac{\partial^2 f}{\partial x_i^2}(0) \int_{B_r} y_i^2\,dy + \mathrm{vol}(B_r)\, o\bigl(r^2\bigr). \tag{1.6}
\]

Next notice that \(\int_{B_r} y_i^2\,dy = \int_{B_r} y_j^2\,dy\) for all 1 ≤ i, j ≤ d and

\[
\int_{B_r} \|y\|^2\,dy = \int_{B_1} r^2 \|z\|^2 r^d\,dz
\]

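using the substitution y = rz. It follows that

\[
\int_{B_r} y_i^2\,dy = \frac{1}{d} \sum_{j=1}^{d} \int_{B_r} y_j^2\,dy = \frac{1}{d} \int_{B_r} \|y\|^2\,dy = \frac{r^{d+2}}{d} \underbrace{\int_{B_1} \|z\|^2\,dz}_{=:C}.
\]

Combining this with (1.6) gives

\[
\frac{1}{r^2} \frac{1}{\mathrm{vol}(B_r)} \int_{B_r} \bigl( f(y) - f(0) \bigr)\,dy = \frac{1}{r^2} \frac{1}{\mathrm{vol}(B_r)} \frac{1}{2} \Delta f(0) \frac{r^{d+2}}{d} C + o(1) = \underbrace{\frac{C}{2d\,\mathrm{vol}(B_1)}}_{=c} \Delta f(0) + o(1),
\]

where the last equality uses vol(B_r) = r^d vol(B_1).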

For completeness, we calculate the value of c using d-dimensional spherical coordinates. Every point z ∈ R^d is of the form z = rv for some r ≥ 0 and v ∈ S^{d−1} = {w ∈ R^d | ‖w‖ = 1}. Using this substitution we have

\[
\mathrm{vol}(B_1) = \int_{S^{d-1}} \int_0^1 r^{d-1}\,dr\,dv = \frac{1}{d} \mathrm{vol}(S^{d-1}),
\]

where the integration with respect to v uses the (d − 1)-dimensional volume measure on the sphere S^{d−1}. Similarly,

\[
C = \int_{B_1} \|z\|^2\,dz = \int_{S^{d-1}} \int_0^1 r^{d+1}\,dr\,dv = \frac{1}{d+2} \mathrm{vol}(S^{d-1}).
\]

Thus

\[
c = \frac{C}{2d\,\mathrm{vol}(B_1)} = \frac{\frac{1}{d+2} \mathrm{vol}(S^{d-1})}{2d \cdot \frac{1}{d} \mathrm{vol}(S^{d-1})} = \frac{1}{2(d+2)}.
\]
□
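Proposition 1.5 can be sanity-checked numerically. The sketch below works in dimension d = 2, where c = 1/8, and approximates the averaged difference by a midpoint rule in polar coordinates. The test function exp(x + 2y), whose Laplacian at the origin equals 5, and the quadrature resolution are assumptions chosen only for illustration; the helper name `ball_average_quotient` is hypothetical.

```python
import math

def ball_average_quotient(f, r, n_rho=200, n_theta=200):
    """Approximate (1 / (r^2 vol(B_r))) * int_{B_r} (f(y) - f(0)) dy for d = 2.

    Uses a midpoint rule in polar coordinates (rho, theta).
    """
    f0 = f(0.0, 0.0)
    d_rho = r / n_rho
    d_theta = 2 * math.pi / n_theta
    integral = 0.0
    for i in range(n_rho):
        rho = (i + 0.5) * d_rho
        for j in range(n_theta):
            theta = (j + 0.5) * d_theta
            value = f(rho * math.cos(theta), rho * math.sin(theta)) - f0
            integral += value * rho * d_rho * d_theta  # area element
    return integral / (r ** 2 * math.pi * r ** 2)      # vol(B_r) = pi r^2

# Hypothetical test function with Laplacian 5 * f, so (Delta f)(0) = 5.
def f(x, y):
    return math.exp(x + 2 * y)

d = 2
c = 1 / (2 * (d + 2))          # c = 1/8 in dimension 2
predicted = c * 5.0            # c * (Delta f)(0)
for r in (0.4, 0.2, 0.1, 0.05):
    print(r, ball_average_quotient(f, r), predicted)
quotient_small_r = ball_average_quotient(f, 0.05)
```

As r decreases the printed quotient should approach the predicted value c∆f(0) = 0.625, with a discrepancy that shrinks roughly like r², matching the o(1) term in the proof.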

1.2.1 The Heat Equation

The heat equation describes how temperatures in a region U ⊆ R^d (representing a physical medium) evolve given an initial temperature distribution and some prescribed behaviour of the heat at the boundary ∂U. Inside the medium we expect the flow of heat to be proportional to the difference between


the temperature at each point and the temperature in a neighbourhood of the point. If we write u(x, t) for the temperature of the medium at the point x at the time t, then this suggests a relationship

\[
\frac{\partial u}{\partial t} = \underbrace{\text{constant}}_{>0}\, \Delta_x u, \tag{1.7}
\]

where

\[
\Delta_x u = \Delta u = \frac{\partial^2 u}{\partial x_1^2} + \cdots + \frac{\partial^2 u}{\partial x_d^2}
\]

is the Laplace operator with respect to the space variables x1, …, xd only. Equation (1.7) is called the heat equation. If we take the physical interpretation of this equation for granted, then we can use it to give heuristic explanations of some of the mathematical phenomena that arise.

Suppose first that we prescribe a time-independent temperature distribution at the boundary ∂U of the medium U, and then wait until the system has settled into thermal equilibrium. Experience (that is, physical intuition) suggests that in the long run (as time goes to infinity) the temperature distribution inside U will reach a stable (time-independent) configuration. That is, for any prescribed boundary value b : ∂U → R we expect the heat equation on U to have a time-independent solution. More formally, we expect there to be a function u : U → R satisfying

\[
\begin{cases} \Delta u = 0 \\ u|_{\partial U} = b. \end{cases} \tag{1.8}
\]

The boundary value problem (1.8) is the Dirichlet boundary value problem, the partial differential equation ∆u = 0 is called the Laplace equation, and its solutions are called harmonic functions. Proving what the physical intuition suggests, namely that the Dirichlet boundary value problem does indeed have a (smooth) solution, will take us into the theory of Sobolev spaces. We will prove the existence of smooth solutions for the Dirichlet boundary value problem in Chapter 5 (and Section 8.2).

Leaving the Dirichlet problem to one side for now, we continue with the heat equation. Motivated by the methods of linear ordinary differential equations and their initial value problems, we would like to know how we can find other solutions to the partial differential equation while ignoring the boundary values. A simple kind of solution to seek would be those with separated variables, that is solutions of the form u(x, t) = F(x)G(t) with x ∈ U ⊆ R^d and t ∈ R. The heat equation would then imply that

\[
F(x)G'(t) = \frac{\partial u}{\partial t} = c\,(\Delta F(x))\,G(t)
\]


and so (we may as well choose all physical constants to make c = 1) the quotient

\[
\frac{G'(t)}{G(t)} = \frac{\Delta F(x)}{F(x)}
\]

is independent of x and of t, and therefore is a constant (as this is not really a proof, we will not worry about the division by a quantity that may vanish). In summary, u(x, t) = F(x)G(t) solves the equation

\[
\frac{\partial u}{\partial t} = \Delta_x u
\]

if G(t) = e^{λt} and ∆F = λF for some constant λ, which one can quickly check (rigorously).

Ignoring for the moment the values of F on the boundary ∂U, it is easy to find functions with ∆F = λF for any λ ∈ R by using suitable exponential and trigonometric functions. However, these simple-minded solutions turn out not to be particularly useful. Only those special functions F : U → R with

\[
\begin{cases} \Delta F = \lambda F & \text{inside } U \\ F|_{\partial U} = 0 \end{cases}
\]

turn out to be useful in the general case. However, it is not clear that such functions even exist, nor for which values of λ they may exist. Suppose now that the following non-trivial result, the existence of a basis of eigenfunctions (which we will be able to prove in many special cases in Chapter 6), is known for the region U ⊆ R^d.

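Claim. Every sufficiently nice function f : U → R can be decomposed into a sum \(f = \sum_n F_n\) of functions F_n : U → R satisfying

\[
\begin{cases} \Delta F_n = \lambda_n F_n & \text{for some } \lambda_n < 0 \\ F_n|_{\partial U} = 0. \end{cases}
\]

We may then solve the partial differential equation

\[
\frac{\partial u}{\partial t} = \Delta_x u
\]

with boundary values

\[
\begin{cases} u|_{\partial U \times \{t\}} = 0 & \text{for all } t \\ u|_{U \times \{0\}} = f \end{cases}
\]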

using the principle of superposition to obtain the general solution

\[
u(x, t) = \sum_n F_n(x) e^{\lambda_n t}. \tag{1.9}
\]
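As a concrete instance of the superposition formula (1.9), the sketch below takes the simplest case U = (0, 1) ⊆ R, where F_n(x) = b_n sin(nπx) satisfies ∆F_n = −(nπ)² F_n and vanishes at the boundary. The coefficients b_n and the sample points are assumptions for illustration; the check uses finite differences, so it is a numerical plausibility test rather than a proof.

```python
import math

# Assumed setting (for illustration only): U = (0, 1), with eigenfunctions
# F_n(x) = b_n sin(n pi x) and eigenvalues lambda_n = -(n pi)^2.
coefficients = {1: 1.0, 2: -0.5, 5: 0.25}   # hypothetical b_n

def u(x, t):
    """Superposition u(x, t) = sum_n F_n(x) exp(lambda_n t), as in (1.9)."""
    return sum(b * math.sin(n * math.pi * x) * math.exp(-(n * math.pi) ** 2 * t)
               for n, b in coefficients.items())

def heat_residual(x, t, h=1e-4):
    """Central-difference estimate of u_t - u_xx, which should be close to 0."""
    u_t = (u(x, t + h) - u(x, t - h)) / (2 * h)
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_t - u_xx

residual = heat_residual(0.3, 0.05)
boundary = max(abs(u(0.0, 0.05)), abs(u(1.0, 0.05)))
late_time = max(abs(u(k / 10, 2.0)) for k in range(11))
print(residual, boundary, late_time)
```

The three printed values should all be close to zero: the heat equation holds up to discretization error, the boundary values vanish, and the temperature has decayed almost completely by time t = 2.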

Since λ_n < 0 for each n ≥ 1, the series (1.9) converges to 0 as t → ∞ if it is absolutely convergent, in accordance with our physical intuition, since the boundary condition states that the temperature is kept at 0 at the boundary


of the region for all t > 0. We conclude by mentioning that the claim above will follow from the study of the spectral theory of an operator, but the definition of the operator involved will be somewhat indirect.

1.2.2 The Wave Equation

The wave equation describes how an elastic membrane moves. We let u(x, t) be the vertical position of the membrane at time t above the point with coordinate x. As the membrane has mass (and hence inertia) our assumption is that the vertical acceleration (a second derivative of position with respect to time t) of the membrane at time t above x will be proportional to the difference between the position of the membrane at that point and at nearby points. Hence we call

\[
\frac{\partial^2 u}{\partial t^2} = c\,\Delta_x u \tag{1.10}
\]

the wave equation. As in the case of the heat equation, we may as well choose physical units to arrange that c = 1.

Once more we may argue from physical intuition that the Dirichlet boundary problem for the wave equation always has a solution. Consider a wire loop above the boundary ∂U (notice that even at this vague level we are imposing some smoothness: our physical image of a wire loop may be very distorted but will certainly be piecewise smooth) and imagine a soap film whose edge is the wire. Then, after some initial oscillations,† we expect the soap film to stabilize, giving a solution to the Dirichlet boundary value problem in (1.8) defined by the shape of the wire.

† In the real world there would also be a friction term, and the model for this is a modified wave equation (which we will not discuss further).

In this context, what is the meaning of eigenfunctions of the Laplace operator that vanish on the boundary? To see this, imagine a drum whose skin has the shape U so that the vibrating membrane is fixed along the boundary ∂U, which is simply a flat loop. Suppose now that F : U → R satisfies

\[
\begin{cases} \Delta F = \lambda F & \text{in } U \\ F|_{\partial U} = 0 \end{cases}
\]

for some λ < 0. Then we see that \(u(x, t) = F(x)\cos(\sqrt{-\lambda}\,t)\) satisfies

\[
\frac{\partial^2}{\partial t^2} u(x, t) = F(x) \underbrace{\bigl( -(\sqrt{-\lambda})^2 \bigr)}_{=\lambda} \cos(\sqrt{-\lambda}\,t) = \lambda F(x) \cos(\sqrt{-\lambda}\,t) = \Delta_x \bigl( F \cos(\sqrt{-\lambda}\,t) \bigr)
\]

and hence solves the wave equation. In other words, if we start the drum at time t = 0 with the prescribed shape given by the function F, then the drum will produce a pure tone† of frequency \(\sqrt{-\lambda}/(2\pi)\). This also sheds some light on which values of λ appear in the claim on p. 9, namely those that correspond to the pure frequencies that the drum can produce. For a one-dimensional drum (that is, a string) these frequencies are easy to understand (see also Exercise 1.7). However, even for two-dimensional drums the precise eigenvalues remain mysterious. We will nonetheless be able to count these shape-specific eigenvalues asymptotically using the method of Weyl from 1911 (see Section 6.4).

Exercise 1.6. Assume that U satisfies the basis of eigenfunctions claim from p. 9, and that the Dirichlet boundary value problem always has a solution on U.
(a) Combine the above discussions to produce a general procedure to solve the boundary value problem (no rigorous proof is expected, but find the places that lack rigour)

\[
\begin{cases} \dfrac{\partial u}{\partial t} = \Delta u & \text{in } U \times [0, \infty); \\ u|_{\partial U \times \{t\}} = b; \\ u|_{U \times \{0\}} = f. \end{cases}
\]

(b) Repeat (a) for the wave equation.

Exercise 1.7. For a circular string vibrating in one dimension (the wave equation over T) the basis of eigenfunctions claim is precisely the claim that every nice function can be represented by its Fourier series. Assuming that this holds, show the basis of eigenfunctions claim for the domain U = (0, 1) ⊆ R. This relates to the wave equation for the clamped vibrating string on [0, 1], that is, to the boundary conditions y(0) = y(1) = 0. (In fact the eigenfunctions are given by x ↦ sin(πnx) with n = 1, 2, …; no rigorous proof is expected, but explore the connection.)
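For the clamped string on [0, 1] mentioned in Exercise 1.7, the standing-wave computation of Section 1.2.2 can be checked numerically. This is only a sketch under stated assumptions: the mode number n = 3 and the sample point are arbitrary, the helper names are hypothetical, and the second derivatives are estimated by central differences.

```python
import math

def standing_wave(n):
    """u(x, t) = F(x) cos(sqrt(-lambda) t), F(x) = sin(n pi x), lambda = -(n pi)^2."""
    omega = n * math.pi                      # sqrt(-lambda)
    return lambda x, t: math.sin(n * math.pi * x) * math.cos(omega * t)

def wave_residual(u, x, t, h=1e-4):
    """Central-difference estimate of u_tt - u_xx, which should be close to 0."""
    u_tt = (u(x, t + h) - 2 * u(x, t) + u(x, t - h)) / h ** 2
    u_xx = (u(x + h, t) - 2 * u(x, t) + u(x - h, t)) / h ** 2
    return u_tt - u_xx

n = 3
u = standing_wave(n)
residual = wave_residual(u, 0.37, 0.81)
period = 2 * math.pi / (n * math.pi)         # 2 pi / sqrt(-lambda)
periodicity_error = abs(u(0.37, 0.81 + period) - u(0.37, 0.81))
clamped = max(abs(u(0.0, 0.81)), abs(u(1.0, 0.81)))
print(residual, periodicity_error, clamped)
```

The motion repeats after time 2π/√(−λ), so the oscillation has frequency √(−λ)/(2π), which equals n/2 for the n-th mode of this string (in units where c = 1).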

1.2.3 The Mantegna Fresco

An illustration of how some of the ideas discussed above link together will be seen in Section 6.4.3, where we discuss eigenfunctions of the Laplacian on a disk. Here the circular symmetry is exploited as in Section 1.1, and the eigenfunctions may be used to decompose functions. A remarkable application of these ideas was made to the problem of reconstructing a bombed fresco by Andrea Mantegna in a church in Padua. The damage resulted in the fresco being broken into approximately 88,000 small pieces which needed to be reassembled using a black and white photograph; we refer to Fornasier and Toniolo [35] for the detailed description of how circular harmonics were used to render the computation required practicable. The partially reconstructed coloured image was then used to build a coloured image of the entire fresco.

† This preferred frequency for certain physical objects is part of the phenomenon of resonance, and the design of large structures like buildings or bridges tries to prevent resonances that may lead to reinforcement of oscillations by wind, for example.


1.3 What is Spectral Theory?

As we will see later, the topics considered in Sections 1.1 and 1.2 are connected to spectral theory. The goal of spectral theory, at its broadest, might be described as an attempt to ‘classify’ all linear operators. We will restrict our attention to Hilbert spaces, which is natural for two reasons. Firstly, it is much easier than the general case of operators on Banach spaces. Secondly, many of the most important applications belong to this simpler setting of operators on Hilbert spaces.

In finite-dimensional linear algebra the classification problem for linear operators is successfully solved by the theory of eigenvalues, eigenspaces, minimal and characteristic polynomials, which leads to a canonical normal form (the Jordan normal form) for any linear operator C^n → C^n for n ≥ 1. We will not be able to get such a general theory if the Hilbert space H is infinite-dimensional, but it turns out that many operators of great interest have properties which, in the finite-dimensional case, ensure an even simpler description. They may belong to any of the special classes of operators defined on a Hilbert space by means of the adjoint operation T ↦ T∗: self-adjoint operators, unitary operators, or normal operators. For these, if dim H = n and we work over C, then there is an orthonormal basis (e1, …, en) of eigenvectors of T with corresponding eigenvalues (λ1, …, λn) so that

\[
T\Bigl( \sum_{j=1}^{n} \alpha_j e_j \Bigr) = \sum_{j=1}^{n} \alpha_j \lambda_j e_j. \tag{1.11}
\]

In other words, the map \(\phi\bigl(\sum_{j=1}^{n} \alpha_j e_j\bigr) = (\alpha_1, \dots, \alpha_n)\) is an isometry from H to C^n and we may rephrase (1.11) to become

\[
T_1 = \phi \circ T \circ \phi^{-1} \tag{1.12}
\]

where T_1 is the diagonal map defined by T_1 : C^n ∋ (α_i) ↦ (α_i λ_i) ∈ C^n.

This is obvious, but gives a slightly different view of the classification problem. For any finite-dimensional Hilbert space H, and normal operator T, we have found a model space and operator (C^n, T_1), such that (H, T) is equivalent to (C^n, T_1) in the sense of (1.12). The theory we will describe in Chapters 9, 12 and 13 will be a generalization of this type of normal form reduction. This succeeds because the model spaces and operators are indeed simple: they are of the type L²_µ(X) for some measure space (X, µ), and the operators are multiplication operators M_g : f ↦ gf for a suitable function g : X → C.
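The normal form (1.12) can be seen in the smallest non-trivial case. The sketch below uses a hypothetical real symmetric 2 × 2 matrix T, with its orthonormal eigenbasis written down by hand; all names and numbers are assumptions for illustration. Because everything is real here, the inner product needs no complex conjugation.

```python
import math

# Hypothetical self-adjoint operator on C^2 (a real symmetric matrix).
T = [[2.0, 1.0],
     [1.0, 2.0]]

# Orthonormal eigenvectors e_1, e_2 with eigenvalues 3 and 1.
s = 1 / math.sqrt(2)
basis = [(s, s), (s, -s)]
eigenvalues = [3.0, 1.0]

def apply(matrix, v):
    """Matrix-vector product."""
    return tuple(sum(row[k] * v[k] for k in range(len(v))) for row in matrix)

def phi(v):
    """Coordinates of v in the orthonormal basis: alpha_j = <v, e_j>."""
    return tuple(sum(v[k] * e[k] for k in range(len(v))) for e in basis)

def phi_inverse(alpha):
    """Rebuild v = sum_j alpha_j e_j from its coordinates."""
    return tuple(sum(a * e[k] for a, e in zip(alpha, basis)) for k in range(2))

def T1(alpha):
    """The diagonal model operator (alpha_j) -> (alpha_j lambda_j)."""
    return tuple(a * lam for a, lam in zip(alpha, eigenvalues))

v = (0.4, -1.3)
lhs = phi(apply(T, v))      # phi(T v)
rhs = T1(phi(v))            # T_1(phi(v)); lhs == rhs is (1.12) applied to phi(v)
print(lhs, rhs)
```

The two printed tuples should agree up to rounding, and `phi_inverse(phi(v))` should return `v`, which is the finite-dimensional picture that the spectral theorem extends to multiplication operators on L² spaces.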


1.4 The Prime Number Theorem

Part of the inherent beauty of mathematics comes from the interplay between simple problems and the sophisticated theories that are sometimes required to solve these problems. The natural numbers are among the simplest mathematical objects, but number theory tends to use techniques from much of mathematics to study basic properties of N. Additively N is quite simple, but multiplicatively N is much more complex as it is generated by the prime numbers 2, 3, 5, …. Perhaps because of mathematics’ omnipresence across all of the sciences, its absolute (internal) truth, and the pre-eminent role played by the natural numbers, Gauss is alleged to have said “mathematics is the queen of the sciences and number theory is the queen of mathematics”. For modern number theory functional analysis is one of many essential tools. While we will not be able to really justify this statement without devoting a significant proportion of this volume to number theory, we do attempt a partial justification by giving a proof of the prime number theorem in Chapter 14.

Prime numbers have been a source of inspiration for mathematicians certainly since Euclid proved (approx. 300 BCE) that there are infinitely many prime numbers. One of many mysteries concerning the prime numbers is their distribution or location within the natural numbers.

Exercise 1.8. Writing p1, p2, p3, … for the primes 2, 3, 5, … and π(x) = |{n | p_n ≤ x}| for the number of primes less than or equal to x, recall Euclid’s argument using the fact that p1 p2 ⋯ pn + 1 has a prime divisor not in {p1, …, pn} to show that there are infinitely many primes. Use this to show that p_n < 2^{2^n} and deduce that

\[
\log(\log(x)) < \pi(x) \leq x \tag{1.13}
\]

for all large x.

The property of being a prime is in some sense 'deterministic' but, measured appropriately, the appearance of the primes in $\mathbb{N}$ seems to mimic randomness. Leaving finer questions of this nature to one side, one may ask if Euclid's proof that there are infinitely many primes can be improved to a more effective statement concerning the growth rate of $\pi(x)$. That is, can (1.13) be improved, and if so, to what extent?

Attempting to answer this question has been the source of many developments in number theory and other fields. For instance, Euler had studied properties of what we now call the Riemann zeta function $\zeta(z) = \sum_{n \ge 1} \frac{1}{n^z}$ for $z \in \mathbb{R}$, obtaining in 1737 a result that says (in a manner of speaking) $\sum_{n=1}^{x} \frac{1}{p_n}$ is approximately the logarithm of $\sum_{n=1}^{x} \frac{1}{n}$, equivalently that $\sum_{n=1}^{x} \frac{1}{p_n}$ is approximately $\log \log x$, presaging an important result of Mertens from 1874. We will prove a weaker form of Mertens' theorem in Chapter 14 (Theorem 14.15). Euler's paper introduced several seminal ideas into number theory and arguably is the founding work of analytic number theory.

Based on tables of primes Gauss (in 1792 or 1793), Legendre (in 1797 or 1798) and Dirichlet (in 1838) made conjectures about the asymptotic growth of $\pi$; the latter two both suggesting that $\frac{\log x}{x}\pi(x) \to 1$ as $x \to \infty$. Chebyshev was the first to really establish the correct order of growth for $\pi$, proving in 1848 that if the sequence $\bigl(\frac{\log x}{x}\pi(x)\bigr)_{x \ge 1}$ converges at all as $x \to \infty$, then it must converge to $1$, and proving in 1851 that there are constants $A, B > 0$ with
\[ A\,\frac{x}{\log x} \le \pi(x) \le B\,\frac{x}{\log x} \]
for all $x$. Riemann's memoir of 1859 introduced methods of complex analysis into the subject, building on Euler's use of real analysis by allowing complex variables and relating analytic properties of the zeta function and its zeros to arithmetic properties of the primes. Finally, in 1896 Hadamard and de la Vallée-Poussin independently used complex analysis on the Riemann zeta function to prove the prime number theorem,
\[ \frac{\log x}{x}\,\pi(x) \longrightarrow 1 \]
as $x \to \infty$.

We will present a proof of the prime number theorem (Theorem 14.1) in Chapter 14, closely following Tao's blog [103], which builds on a key step of Selberg's elementary proof of the prime number theorem from 1949 in [96] and Mertens' theorem. In combining these ingredients to prove the prime number theorem we will use the language and several tools developed in this volume. Moreover, we will close our discussion with an application of the spectral theory of finite abelian groups (as in the discussion from Section 1.1) proving Dirichlet's theorem from 1837 on primes in arithmetic progressions and the prime number theorem for primes in arithmetic progressions.
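The growth statements about $\pi(x)$ in this section can be explored numerically. The following is a minimal sketch, not part of the text's argument: the helper names `prime_sieve`, `prime_counts` and `pnt_ratio` and the cut-off `LIMIT` are illustrative assumptions. It tabulates $\pi(x)$ with a sieve, checks the weak bounds (1.13) and the estimate $p_n < 2^{2^n}$ on a finite range, and prints the normalized ratio $\frac{\log x}{x}\pi(x)$.

```python
import math


def prime_sieve(limit):
    """Return the list of all primes <= limit (sieve of Eratosthenes)."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for n in range(2, math.isqrt(limit) + 1):
        if is_prime[n]:
            for multiple in range(n * n, limit + 1, n):
                is_prime[multiple] = False
    return [n for n, flag in enumerate(is_prime) if flag]


def prime_counts(limit):
    """Return a list pi with pi[x] = number of primes <= x for 0 <= x <= limit."""
    prime_set = set(prime_sieve(limit))
    pi, count = [0] * (limit + 1), 0
    for x in range(limit + 1):
        if x in prime_set:
            count += 1
        pi[x] = count
    return pi


LIMIT = 10**5  # illustrative cut-off, chosen only to keep the run short
primes = prime_sieve(LIMIT)
pi = prime_counts(LIMIT)

# Weak bounds (1.13), checked on the finite range 2 <= x <= LIMIT only.
weak_bounds_hold = all(
    math.log(math.log(x)) < pi[x] <= x for x in range(2, LIMIT + 1)
)

# The estimate p_n < 2^(2^n) from Exercise 1.8, for the first ten primes.
euclid_bound_holds = all(
    p < 2 ** (2 ** n) for n, p in enumerate(primes[:10], start=1)
)


def pnt_ratio(x):
    """The quantity (log x / x) * pi(x) that tends to 1 by the prime number theorem."""
    return pi[x] * math.log(x) / x


for x in (10**2, 10**3, 10**4, 10**5):
    print(x, pi[x], round(pnt_ratio(x), 4))
```

A finite check of this kind proves nothing about the limit; it only illustrates how slowly the ratio approaches $1$ compared with how crude the bounds in (1.13) are.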

1.5 Further Topics

We list here two more topics that we will be able to discuss.
• Suppose that $f : \mathbb{R}^2 \to \mathbb{R}$ is a continuous function and the partial derivatives $\frac{\partial^k}{\partial x_1^k} f$, $\frac{\partial^k}{\partial x_2^k} f$ exist and are continuous for all $k \ge 1$. Then $f$ is smooth (see Exercises 3.69 and 5.18).
• Another application of functional analysis that we wish to discuss concerns the construction of sparse but highly connected graphs called expanders, which are an important concept in graph theory and computer science, see Section 10.4.

Chapter 2

Norms and Banach Spaces

In this chapter we start the more formal treatment of functional analysis, giving the fundamental definitions and introducing some of the basic examples and their properties. We also discuss some theorems and constructions that may be considered part of topology or measure theory, to put them into the context of the theory developed here.

2.1 Norms and Semi-Norms Throughout this book we will be working with real or complex vector spaces (V, +, ·) (here + is vector addition, and · scalar multiplication). We will call the elements of the field simply scalars if we want to avoid making the distinction between the real and complex case. For instance, in the fundamental definitions to come in this section, we treat the real and complex cases simultaneously. We will assume familiarity with the following concepts from linear algebra: vector spaces, subspaces, quotient spaces, dimension (which may be infinite), linear maps, image and kernel of linear maps. The notion of a basis of a vector space will only be used to distinguish finite-dimensional vector spaces from infinite-dimensional ones. We will not usually try to describe the vector spaces that arise in functional analysis, or the linear maps between them, in terms of bases. An exception will arise in the study of Hilbert spaces (see Section 3.1) and in the study of certain (important but nonetheless special) operators on them (see Section 6.2). Also recall that a subset K ⊆ V of a vector space is said to be convex if for k1 , k2 ∈ K and t ∈ [0, 1] we have that the convex combination (1−t)k1 +tk2 also belongs to K.

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_2


2.1.1 Normed Vector Spaces

Definition 2.1. Let $V$ be a real or complex vector space. A map $\|\cdot\| : V \to \mathbb{R}$ is called a norm if it has the following properties:
• (strict positivity) $\|v\| \ge 0$ for any $v \in V$, and $\|v\| = 0$ if and only if $v = 0$;
• (homogeneity) $\|\alpha v\| = |\alpha|\|v\|$ for all $v \in V$ and scalars $\alpha$; and
• (triangle inequality) $\|v + w\| \le \|v\| + \|w\|$ for all $v, w \in V$.
If $\|\cdot\|$ is a norm on $V$, then $(V, \|\cdot\|)$ is called a normed vector space.

It is easy to give examples of normed vector spaces, and we list a few standard examples here (more will appear throughout the text).

Example 2.2. The following are examples of normed real vector spaces, in which we write $v = (v_1, \dots, v_d)^t$ for elements of $\mathbb{R}^d$.
(1) $\mathbb{R}^d$ with the Euclidean norm $\|v\| = \|v\|_2 = \sqrt{|v_1|^2 + \cdots + |v_d|^2}$.
(2) $\mathbb{R}^d$ with $\|v\| = \|v\|_\infty = \max_{1 \le i \le d} |v_i|$.
(3) $\mathbb{R}^d$ with $\|v\| = \|v\|_1 = |v_1| + \cdots + |v_d|$.
(4) $\mathbb{R}^d$ with the norm defined by $\|v\|_B = \inf\{\alpha > 0 \mid \frac{1}{\alpha} v \in B\}$, where $B$ is a non-empty, open, centrally symmetric (that is, with $B = -B$), convex, bounded (with respect to the Euclidean norm) subset of $\mathbb{R}^d$.
(5) Let $X$ be any topological space (for example, a metric space; see Appendix A), and let
\[ C_b(X) = \{f : X \to \mathbb{R} \mid f \text{ is continuous and bounded}\} \]
with the uniform or supremum norm
\[ \|f\| = \|f\|_\infty = \sup_{x \in X} |f(x)|. \]

Notice that if $X$ is compact, then $C_b(X)$ coincides with $C(X)$, the space of continuous functions $X \to \mathbb{R}$. We note that our definition of compactness (see Definition A.18) contains the assumption that $X$ is Hausdorff.
(6) A special case of (5) makes $C([0, 1])$, and so also the subspace
\[ C^1([0, 1]) = \{f : [0, 1] \to \mathbb{R} \mid f \text{ has a continuous derivative on } [0, 1]\}, \]
into a normed vector space. A different norm on $C^1([0, 1])$ may be obtained by setting $\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$.
(7) Finally, consider the vector space of real polynomials
\[ \mathbb{R}[x] = \Bigl\{ f = \sum_{k=0}^{N} c_f(k) x^k \;\Big|\; N \in \mathbb{N},\ c_f(k) \in \mathbb{R} \Bigr\} \]
on which we can define any of the following norms (thinking of $f \in \mathbb{R}[x]$ really as the finitely supported but infinite vector
\[ (c_f(0), c_f(1), \dots, c_f(N), c_f(N + 1), \dots) \in \mathbb{R}^{\mathbb{N}_0} \]
of its coefficients with $c_f(k) = 0$ for $k > N$):
(a) $\|f\|_1 = \sum_{k=0}^{\infty} |c_f(k)|$,
(b) $\|f\|_2 = \Bigl( \sum_{k=0}^{\infty} |c_f(k)|^2 \Bigr)^{1/2}$, or
(c) $\|f\|_\infty = \max_{k \ge 0} |c_f(k)|$.

We could also think of polynomials as defining continuous functions on $[0, 1]$, thus embedding $\mathbb{R}[x] \subseteq C^1([0, 1]) \subseteq C([0, 1])$, so that the norm $\|\cdot\|_{C^1([0,1])}$ or $\|\cdot\|_\infty$ may also be used.

The examples in Example 2.2 all generalize in the obvious way to form normed complex vector spaces, with the exception of (4), where additional requirements on the set $B$ are needed (see Exercise 2.3).

Exercise 2.3. (a) Verify that Example 2.2(1), (2), (3), (5), (6), and (7) define normed vector spaces over $\mathbb{R}$ or $\mathbb{C}$.
(b) Show that Example 2.2(4) defines a real normed vector space.
(c) Show that for a complex normed vector space $(V, \|\cdot\|)$ the open unit ball $B = B_1^V = \{v \in V \mid \|v\| < 1\}$ has the property that $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$.
(d) Show that if $B \subseteq \mathbb{C}^d$ is non-empty, open, convex, bounded, and satisfies $\alpha B = B$ for any $\alpha \in \mathbb{C}$ with $|\alpha| = 1$, then there exists a norm on $\mathbb{C}^d$ whose open unit ball is $B$.

Throughout the text we will use notions from topology (see Appendix A for a summary).

Lemma 2.4 (Associated metric). Suppose that $(V, \|\cdot\|)$ is a normed vector space. Then for every $v, w \in V$ we have
\[ \bigl| \|v\| - \|w\| \bigr| \le \|v - w\|. \tag{2.1} \]
Moreover, writing $d(v, w) = \|v - w\|$ for $v, w \in V$ defines a metric $d$ on $V$ such that the norm function $\|\cdot\| : V \to \mathbb{R}$ is continuous with respect to the topology induced by the metric $d$.

Proof. For any $v, w \in V$, we have
\[ \|v\| = \|v - w + w\| \le \|v - w\| + \|w\| \]
and similarly
\[ \|w\| = \|w - v + v\| \le \|v - w\| + \|v\|, \]
by Definition 2.1 (the triangle inequality and homogeneity), and the two inequalities together give (2.1). To see that $d$ is a metric we need to check the following defining properties of a metric:
• (strict positivity) that $d(v, w) \ge 0$ for all $v, w \in V$ and $d(v, w) = 0$ if and only if $v = w$ is clear by strict positivity in Definition 2.1.
• (symmetry) $d(v, w) = d(w, v)$ for all $v, w \in V$ follows by applying the homogeneity in Definition 2.1 with the choice $\alpha = -1$.


• (triangle inequality) Finally, we have
\[ d(u, w) = \|u - w\| = \|u - v + v - w\| \le \|u - v\| + \|v - w\| = d(u, v) + d(v, w). \]
The norm is continuous at $v \in V$ if for every $\varepsilon > 0$ there exists some $\delta > 0$ such that $d(u, v) < \delta$ implies $\bigl| \|u\| - \|v\| \bigr| < \varepsilon$. By (2.1), we may choose $\delta = \varepsilon$ to see this. □

Notice that the triangle inequality makes addition continuous. Indeed, if we write
\[ B_\varepsilon^{\|\cdot\|}(v) = \{w \in V \mid \|w - v\| < \varepsilon\} \]
for the ball of radius $\varepsilon$ around $v \in V$, then we have
\[ B_{\varepsilon/2}^{\|\cdot\|}(v_1) + B_{\varepsilon/2}^{\|\cdot\|}(v_2) \subseteq B_\varepsilon^{\|\cdot\|}(v_1 + v_2) \tag{2.2} \]
for every $\varepsilon > 0$. This means that $(v, w) \mapsto v + w$ is continuous at $(v_1, v_2)$ and, since $v_1, v_2 \in V$ were arbitrary, shows that addition is continuous.

Scalar multiplication is also continuous. To see this fix a scalar $\alpha$ and a vector $v \in V$, and notice that
\[ \beta w - \alpha v = (\beta - \alpha) w - \alpha (v - w). \]
So if $\varepsilon \in (0, 1)$, $|\beta - \alpha| < \frac{\varepsilon}{\|v\| + 1}$ for a scalar $\beta$ and $\|w - v\| < \varepsilon$ for a vector $w \in V$, then $\|w\| < \|v\| + 1$ and hence
\[ \|\beta w - \alpha v\| < \varepsilon (1 + |\alpha|). \tag{2.3} \]
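For readers who want the intermediate step behind (2.3) written out, the following display is a sketch of the estimate. It uses only the identity for $\beta w - \alpha v$ stated above, the triangle inequality, homogeneity, and the bound $\|w\| \le \|v\| + \|w - v\| < \|v\| + 1$, which holds because $\varepsilon < 1$.

```latex
\begin{align*}
\|\beta w - \alpha v\|
  &\le |\beta - \alpha|\,\|w\| + |\alpha|\,\|v - w\| \\
  &< \frac{\varepsilon}{\|v\| + 1}\,\bigl(\|v\| + 1\bigr) + |\alpha|\,\varepsilon
   = \varepsilon\,(1 + |\alpha|).
\end{align*}
```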

This gives continuity of scalar multiplication at $(\alpha, v)$.

We now turn to the sense in which the topology induced by a norm determines the norm.

Lemma 2.5 (Equivalence of norms). Two norms $\|\cdot\|$ and $\|\cdot\|'$ on the same vector space induce the same topology if and only if there exists a (Lipschitz) constant $c \ge 1$ such that
\[ \tfrac{1}{c} \|v\|' \le \|v\| \le c \|v\|' \tag{2.4} \]
for all $v \in V$. In this case we call the norms equivalent.

Proof. If (2.4) holds, then the standard neighbourhoods of $v \in V$,
\[ B_\varepsilon^{\|\cdot\|'}(v) = \{w \in V \mid \|w - v\|' < \varepsilon\} \quad\text{and}\quad B_\varepsilon^{\|\cdot\|}(v) = \{w \in V \mid \|w - v\| < \varepsilon\} \]
with respect to the two norms satisfy
\[ B_{\frac{1}{c}\varepsilon}^{\|\cdot\|'}(v) \subseteq B_\varepsilon^{\|\cdot\|}(v) \subseteq B_{c\varepsilon}^{\|\cdot\|'}(v). \]
This implies that the topologies have the same notion of neighbourhood, and so are identical.

Suppose now that the two topologies are the same, so that $B_1^{\|\cdot\|}$ is a neighbourhood of $0$ in this topology. Then there must be some $\varepsilon > 0$ with
\[ B_\varepsilon^{\|\cdot\|'} \subseteq B_1^{\|\cdot\|}. \]
Equivalently, $\|v\|' < \varepsilon$ implies that $\|v\| < 1$. For any $v \in V \setminus \{0\}$, if $w = \frac{\varepsilon}{2\|v\|'} v$ then
\[ \|w\|' = \frac{\varepsilon}{2\|v\|'} \|v\|' < \varepsilon \]
and so
\[ \frac{\varepsilon}{2\|v\|'} \|v\| = \|w\| < 1. \]

This implies that $\|v\| \le \frac{2}{\varepsilon} \|v\|'$ for all $v \in V$, giving the second inequality in (2.4). Reversing the roles of $\|\cdot\|$ and $\|\cdot\|'$ gives the first inequality, and choosing $c$ to be the larger of the two choices produced for $c$ gives the lemma. □

The phenomenon seen in the proof of Lemma 2.5, where a property on all of $V$ is determined by the local behaviour at $0$, is something that will occur frequently. For $\mathbb{R}^d$ the notion of equivalence of norms has the following property.

Proposition 2.6 (Equivalence in finite dimensions). Any two norms on $\mathbb{R}^d$ are equivalent, for any $d \ge 1$.

As we will see in the proof, this is related to the compactness of the closed unit ball in $\mathbb{R}^d$.

Proof of Proposition 2.6. Let $\|\cdot\|_1$ be the norm on $\mathbb{R}^d$ from Example 2.2(3), and let $\|\cdot\|'$ be an arbitrary norm on $\mathbb{R}^d$. It is enough to show that these two norms are equivalent. Write $e_1, \dots, e_d$ for the standard basis of $\mathbb{R}^d$, and let $M = \max_{1 \le i \le d} \|e_i\|'$. Then
\[ \|v\|' = \Bigl\| \sum_{i=1}^{d} v_i e_i \Bigr\|' \le \sum_{i=1}^{d} |v_i| \|e_i\|' \le M \|v\|_1, \tag{2.5} \]
where we have used the triangle inequality generalized by induction to finite sums and homogeneity of the norm. This gives one of the inequalities in (2.4).

Using compactness: To obtain the reverse inequality, notice first that
\[ S_1 = \{v \in \mathbb{R}^d \mid \|v\|_1 = 1\} \]
is closed and bounded in the standard topology of $\mathbb{R}^d$ (which may either be seen as a consequence of the bounds $\frac{1}{d}\|\cdot\|_1 \le \|\cdot\|_2 \le \|\cdot\|_1$ or directly using $\|\cdot\|_1$), and so is compact by the Heine–Borel theorem. Also, (2.5) shows that $v \mapsto \|v\|'$ is a continuous function in the standard topology: (2.1) for $\|\cdot\|'$ and (2.5) together give
\[ \bigl| \|v\|' - \|w\|' \bigr| \le M \|v - w\|_1, \]
giving the continuity claimed just as in the end of the proof of Lemma 2.4 with $\delta = \frac{\varepsilon}{M}$. Together this implies that
\[ m = \min_{v \in S_1} \|v\|' = \|v_0\|' \]
is attained for some $v_0 \in S_1$. By definition of $S_1$ we have $v_0 \ne 0$, so that $m > 0$ by the strict positivity of the norm $\|\cdot\|'$. Therefore, $v \in \mathbb{R}^d \setminus \{0\}$ implies that
\[ \Bigl\| \frac{v}{\|v\|_1} \Bigr\|' \ge m, \]
or $v \in \mathbb{R}^d$ implies that $\|v\|' \ge m \|v\|_1$, as required. □
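The two constants in this proof can be estimated numerically. The sketch below is an illustration under stated assumptions: the second norm is taken to be the Euclidean norm, the function `equivalence_constants` and its parameters are hypothetical names, and the minimum over $S_1$ is approximated by random sampling rather than computed exactly. For the Euclidean norm the exact values are $M = 1$ and $m = 1/\sqrt{d}$ (attained at $v_0 = (\frac1d, \dots, \frac1d)$), so the sampled minimum can only overestimate $m$.

```python
import math
import random


def norm_1(v):
    return sum(abs(x) for x in v)


def norm_2(v):
    return math.sqrt(sum(x * x for x in v))


def equivalence_constants(norm, d, samples=20000, seed=0):
    """Return (m_estimate, M) for the given norm on R^d.

    M is the maximum of the norm over the standard basis, as in (2.5).
    m_estimate is the minimum of the norm over randomly sampled points of
    S_1 = {v : ||v||_1 = 1}; it is an upper bound for the true minimum m.
    """
    rng = random.Random(seed)
    basis = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    big_m = max(norm(e) for e in basis)
    m_estimate = math.inf
    for _ in range(samples):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        s = norm_1(v)
        if s == 0.0:
            continue
        # v / ||v||_1 lies on the sphere S_1
        m_estimate = min(m_estimate, norm([x / s for x in v]))
    return m_estimate, big_m


d = 3
m_estimate, big_m = equivalence_constants(norm_2, d)
m_exact = 1.0 / math.sqrt(d)
print(m_estimate, m_exact, big_m)

# Check m ||v||_1 <= ||v||_2 <= M ||v||_1 on further random vectors.
rng = random.Random(1)
test_vectors = [[rng.uniform(-5.0, 5.0) for _ in range(d)] for _ in range(1000)]
bounds_hold = all(
    m_exact * norm_1(v) - 1e-9 <= norm_2(v) <= big_m * norm_1(v) + 1e-9
    for v in test_vectors
)
```

The compactness of $S_1$ is what guarantees that the true minimum is attained and positive; sampling merely suggests its size.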



This might suggest that the equivalence of norms is a widespread phenomenon. However, once we leave the setting of finite-dimensional normed spaces, we will quickly see that a given normed space may have many inequivalent norms.

Exercise 2.7. Show that the norms $\|f\|_\infty$ and $\|f\|_{C^1([0,1])}$ for $f \in C^1([0, 1])$ from Example 2.2(5)–(6) are not equivalent. Show that the norm $\|f\|_{C^1([0,1])}$ and the norm defined by $\|f\|_0 = |f(0)| + \|f'\|_\infty$ for $f \in C^1([0, 1])$ are equivalent.

Exercise 2.8. Show that no two of the norms on $\mathbb{R}[x]$ from Example 2.2(7) are equivalent. However, some of the pairs of norms do satisfy an inequality of the form $\|f\| \le c \|f\|'$ for some fixed $c > 0$ and any $f \in \mathbb{R}[x]$. Find those that do and identify the smallest relevant constant $c$ in each case.

Exercise 2.9. Let $V, W$ be normed vector spaces. Show that $V \times W$ with its canonical inherited vector space structure can be made into a normed vector space using either of the norms
\[ \|(v, w)\|_p = \bigl( \|v\|_V^p + \|w\|_W^p \bigr)^{1/p} \]
for some $p \in [1, \infty)$, or
\[ \|(v, w)\|_\infty = \max\{\|v\|_V, \|w\|_W\}. \]
Show that all of these norms are equivalent, and that they induce the product topology.
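The failure of equivalence in infinite dimensions can be seen numerically on the coefficient norms of Example 2.2(7). The sketch below is only an illustration, and the choice of test polynomials $f_n = 1 + x + \cdots + x^{n-1}$ and the function names are assumptions made here. It tabulates the three norms, which take the values $n$, $\sqrt{n}$ and $1$, so the ratios between them are unbounded in $n$. This indicates why no constant $c$ can work in some directions, without settling which inequalities in Exercise 2.8 do hold.

```python
import math


def coeff_norm_1(c):
    """Sum of absolute values of the coefficients, Example 2.2(7)(a)."""
    return sum(abs(a) for a in c)


def coeff_norm_2(c):
    """Square root of the sum of squared coefficients, Example 2.2(7)(b)."""
    return math.sqrt(sum(a * a for a in c))


def coeff_norm_inf(c):
    """Largest absolute value of a coefficient, Example 2.2(7)(c)."""
    return max(abs(a) for a in c)


def all_ones(n):
    """Coefficient list of the polynomial 1 + x + ... + x^(n-1)."""
    return [1.0] * n


for n in (1, 4, 16, 64, 256):
    c = all_ones(n)
    print(n, coeff_norm_1(c), coeff_norm_2(c), coeff_norm_inf(c))
```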

The next exercise also shows why we are careful in setting up the theory of normed spaces instead of just declaring that everything is a generalization of the finite-dimensional theory.

Exercise 2.10. We define the space $\ell^1(\mathbb{N}) = \{(x_n) \mid \sum_{n=1}^{\infty} |x_n| < \infty\}$ to be the space of all absolutely summable sequences.
• Show that $\|x\|_1 = \sum_{n=1}^{\infty} |x_n|$ for $x \in \ell^1(\mathbb{N})$ defines a norm, and that the subspace $c_c(\mathbb{N}) = \{x \mid x_n = 0 \text{ for all large enough } n\} \subseteq \ell^1(\mathbb{N})$ is dense.
• Let $V = \ell^1(\mathbb{N}) \times \ell^1(\mathbb{N})$ (with any of the equivalent norms from Exercise 2.9) and define the subspaces $V_1 = \ell^1(\mathbb{N}) \times \{0\}$ and
\[ V_2 = \{(x, y) \in V \mid n y_n = x_n \text{ for all } n \in \mathbb{N}\}. \]
Show that $V_1, V_2$ are closed subspaces of $V$, but that $V_1 + V_2 = \{v_1 + v_2 \mid v_1 \in V_1, v_2 \in V_2\}$ is not closed.

2.1.2 Semi-Norms and Quotient Norms †

The following weakening of Definition 2.1 is often useful.

Definition 2.11. A non-negative function $\|\cdot\| : V \to \mathbb{R}_{\ge 0}$ on a vector space $V$ is called a semi-norm (or a pseudo-norm) if $\|\cdot\|$ satisfies the homogeneity property and the triangle inequality of a norm.

Thus a semi-norm is allowed to have a non-trivial subset (which will be a subspace, see below) on which it vanishes. A semi-norm gives rise to a pseudometric, which in turn gives rise to a topology on $V$. The resulting topology is Hausdorff if and only if the original semi-norm is a norm in the usual sense. Indeed, if $v \in V$ has $\|v\| = 0$, then $v$ will belong to every neighbourhood of $0$ in the topology defined by $\|\cdot\|$.

Example 2.12. Let $(X, \mathcal{B}, \mu)$ be a measure space (see Appendix B), and define
\[ \mathscr{L}^1_\mu(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable and Lebesgue integrable w.r.t. } \mu\}. \]
On this space we can define a semi-norm
\[ \|f\|_1 = \int_X |f| \, d\mu, \]
and this is not a norm (unless the measure space $(X, \mu)$ has special properties; see Exercise 2.13).

Exercise 2.13. Characterize those measure spaces $(X, \mathcal{B}, \mu)$ on which the semi-norm from Example 2.12 on the space $\mathscr{L}^1_\mu(X)$ of Lebesgue integrable functions is a norm.

A reader familiar with measure theory may misread Example 2.12, so we should emphasize that $\mathscr{L}^1_\mu(X)$ denotes the space that contains genuine functions defined at each point of $X$. The usual solution to the problems created by the many functions on which the semi-norm vanishes is to define an equivalence class of a function $f$ to consist of all functions that differ from $f$ on a null set. This generalizes to a construction that allows any semi-norm on a vector space to be modified to give a norm (on a related vector space). To describe this construction, we first prove a simple lemma that starts to connect the algebraic properties of spaces equipped with a semi-norm to their topological properties.

† The construction in this section is satisfying and useful at times but, with the exception of Definition 2.11, is not critical for later developments.


Lemma 2.14 (Kernel of a semi-norm). The kernel
\[ V_0 = \{v \in V \mid \|v\| = 0\} \]
of a semi-norm on a vector space is a closed subspace in the topology induced by the semi-norm.

Proof. For $v, w \in V_0$ and any scalar $\alpha$ we have
\[ 0 \le \|\alpha v + w\| \le |\alpha| \|v\| + \|w\| = 0, \]
so $V_0$ is a subspace. By the argument used in Lemma 2.4, we see that the semi-norm $\|\cdot\|$ is continuous with respect to the induced topology. It follows that the preimage $V_0 = (\|\cdot\|)^{-1}(\{0\})$ is also closed. □

Returning to Example 2.12, recall that for $f \in \mathscr{L}^1_\mu(X)$, $\|f\|_1 = 0$ is equivalent to the statement that $f = 0$ almost everywhere with respect to $\mu$. Thus the usual equivalence class of a function $f$ is precisely the coset $f + V_0$ defined by $f$ with respect to the kernel $V_0 \subseteq \mathscr{L}^1_\mu(X)$ of the semi-norm. We define, as is standard, the quotient space
\[ L^1_\mu(X) = \mathscr{L}^1_\mu(X) / V_0, \]
and note that the semi-norm $\|\cdot\|_1$ on $\mathscr{L}^1_\mu(X)$ gives rise to a norm, also denoted $\|\cdot\|_1$, on $L^1_\mu(X)$. For an introduction to the function spaces $L^p_\mu(X)$ for $p \in [1, \infty)$ we refer to Appendix B.3. Where the measure is clear from the context or has a standard choice (for example, the Lebesgue measure on $[0, 1]$), it is omitted from the notation. This construction is a special case of the following.

Lemma 2.15 (Quotient norm). For any vector space $V$ equipped with a semi-norm $\|\cdot\|$, and any closed subspace $W \subseteq V$, the expression
\[ \|v + W\|_{V/W} = \inf_{w \in W} \|v + w\| \]
for $v \in V$ defines a norm on the quotient space $V/W = \{v + W \mid v \in V\}$. For the kernel $W = V_0$ we have $\|v + V_0\|_{V/V_0} = \|v\|$ for $v \in V$.

Proof. This is simply a matter of chasing the definitions through the statements. Let $v_1, v_2 \in V$ and $\varepsilon > 0$ be given. Then there exist $w_1, w_2 \in W$ with
\[ \|v_i + w_i\| \le \|v_i + W\|_{V/W} + \varepsilon \]
for $i = 1, 2$. Hence
\begin{align*}
\|v_1 + v_2 + W\|_{V/W} &\le \|v_1 + v_2 + w_1 + w_2\| \\
&\le \|v_1 + w_1\| + \|v_2 + w_2\| \\
&\le \|v_1 + W\|_{V/W} + \|v_2 + W\|_{V/W} + 2\varepsilon,
\end{align*}
and so the triangle inequality holds for $\|\cdot\|_{V/W}$. Similarly, for any scalar $\alpha$,
\[ \|\alpha v_1 + \alpha w_1\| = |\alpha| \|v_1 + w_1\| \le |\alpha| \bigl( \|v_1 + W\|_{V/W} + \varepsilon \bigr), \]
which gives
\[ \|\alpha v_1 + W\|_{V/W} \le |\alpha| \|v_1 + W\|_{V/W}. \]
If $\alpha = 0$ then this is clearly an equality, and if $\alpha \ne 0$ then we may apply the above to $\alpha v_1$ and the scalar $\alpha^{-1}$ to give
\[ \|v_1 + W\|_{V/W} \le |\alpha|^{-1} \|\alpha v_1 + W\|_{V/W}. \]
However, this is the remaining half of the homogeneity property.

It remains to check that $\|\cdot\|_{V/W}$ is indeed a norm and not simply a semi-norm. Assume therefore that $\|v + W\|_{V/W} = 0$. Then for every $\varepsilon > 0$ there exists some $w \in W$ with $\|v - w\| < \varepsilon$. However, this shows that $v$ belongs to the closure $\overline{W}$ of $W$. By assumption $W = \overline{W}$ is closed, so that $v \in W$ and $v + W = W$ is the zero element in the quotient space $V/W$.

For $W = V_0$ we have
\[ \|v + w\| = \|v + w\| + \|{-w}\| \ge \|v\| \]
for every $v \in V$ and $w \in V_0$, which gives the final claim. □

Notice that we cannot expect the infimum in Lemma 2.15 to be a minimum in general (see, for example, Exercise 2.16).

Exercise 2.16. Let $(C([-1, 1]), \|\cdot\|_\infty)$ be the normed vector space defined as in Example 2.2(5). Define
\[ W = \Bigl\{ f \in C([-1, 1]) \;\Big|\; \int_{-1}^{0} f(x)\, dx = \int_{0}^{1} f(x)\, dx = 0 \Bigr\}. \]
Show that $W$ is a closed subspace. Now let $f(x) = x$, calculate $\|f\|_{C([-1,1])/W}$, and show that the infimum is not achieved.
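As a finite-dimensional illustration of the quotient norm in Lemma 2.15 (not taken from the text), consider $V = \mathbb{R}^2$ with the norm $\|\cdot\|_\infty$ and the closed subspace $W = \{(t, t) \mid t \in \mathbb{R}\}$. In this assumed example the infimum over $W$ equals $\frac12 |v_1 - v_2|$ and is attained at $t = -\frac12 (v_1 + v_2)$, in contrast with the situation of Exercise 2.16. The sketch approximates the infimum by ternary search, which is justified because $t \mapsto \|v + t(1,1)\|_\infty$ is convex; the function name and the search interval are hypothetical choices.

```python
def sup_norm(v):
    """Maximum norm on R^d."""
    return max(abs(x) for x in v)


def quotient_norm_diagonal(v, lo=-1e3, hi=1e3, iterations=200):
    """Approximate inf over t of ||v + t(1, 1)||_inf for v in R^2.

    Assumes the minimizing t lies in [lo, hi]; uses ternary search on the
    convex function t -> ||v + t(1, 1)||_inf.
    """
    def f(t):
        return sup_norm((v[0] + t, v[1] + t))

    for _ in range(iterations):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if f(a) <= f(b):
            hi = b  # a minimizer lies in [lo, b]
        else:
            lo = a  # a minimizer lies in [a, hi]
    return f((lo + hi) / 2)


for v in [(3.0, -1.0), (5.0, 5.0), (0.0, 7.0)]:
    print(v, quotient_norm_diagonal(v), abs(v[0] - v[1]) / 2)
```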

2.1.3 Isometries are Affine †

The following strengthening of the triangle inequality has interesting consequences.

† The results in Section 2.1.3 are interesting, but will not be needed later.


Definition 2.17. A norm $\|\cdot\|$ on a vector space $V$ is strictly sub-additive if we have strict inequality in the triangle inequality except in the case when the two vectors are real non-negative multiples, more precisely
\[ \|v + w\| < \|v\| + \|w\| \]
for all $v, w \in V$ unless $v, w \in \{t(v + w) \mid t \ge 0\}$.

Exercise 2.18. A normed space $(V, \|\cdot\|)$ is strictly convex if the closed unit ball is a strictly convex set, or equivalently if a line segment with end points $x \ne y$ in the unit sphere $\{v \in V \mid \|v\| = 1\}$ only intersects the unit sphere at its end points. Show that the closed unit ball in a normed linear space is strictly convex if and only if the norm is strictly sub-additive.

A map $f : V \to W$ between normed spaces is an isometry if
\[ \|f(v) - f(v_0)\|_W = \|v - v_0\|_V \]
for all $v, v_0 \in V$.

Exercise 2.19. Show that the supremum norm $\|\cdot\|_\infty$ on $\mathbb{R}^2$ is not strictly convex. Give an example to show that an isometry between normed spaces need not be affine, by considering maps of the form $x \mapsto (x, f(x))$ from $(\mathbb{R}, |\cdot|)$ to $(\mathbb{R}^2, \|\cdot\|_\infty)$ for a suitably chosen function $f$.
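One way to experiment with maps of the form $x \mapsto (x, f(x))$ is sketched below. The choice $f = \sin$ is an assumption made here for illustration and is not prescribed by the text; it works because $|\sin x - \sin y| \le |x - y|$, so the second coordinate never dominates the maximum. The code checks the isometry property on a finite grid and then shows that the midpoint of $0$ and $\pi$ is not mapped to the midpoint of the images, so this map is not affine.

```python
import math


def sup_norm(v):
    """Maximum norm on R^2."""
    return max(abs(x) for x in v)


def embed(x):
    """Candidate isometry from (R, |.|) to (R^2, ||.||_inf): x -> (x, sin x)."""
    return (x, math.sin(x))


def distance(p, q):
    """Distance between two points of R^2 in the maximum norm."""
    return sup_norm((p[0] - q[0], p[1] - q[1]))


points = [k * 0.25 for k in range(-40, 41)]

# Isometry check on the sample grid: ||F(x) - F(y)||_inf == |x - y|.
is_isometry_on_samples = all(
    abs(distance(embed(x), embed(y)) - abs(x - y)) < 1e-12
    for x in points
    for y in points
)

# Midpoints are not preserved, so the map cannot be affine.
x, y = 0.0, math.pi
midpoint_image = embed((x + y) / 2)
average_of_images = tuple((a + b) / 2 for a, b in zip(embed(x), embed(y)))
print(is_isometry_on_samples, midpoint_image, average_of_images)
```

This is consistent with Theorem 2.20: the map is not surjective, and the norm $\|\cdot\|_\infty$ on $\mathbb{R}^2$ is not strictly sub-additive.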

Theorem 2.20 (Mazur–Ulam$^{(3)}$). Let $V$ and $W$ be normed linear spaces over $\mathbb{R}$, and let $M : V \to W$ be a function. Assume that either
• $M$ is a surjective isometry, or
• $M$ is an isometry and the norm on $W$ is strictly sub-additive.
Then $M$ is affine, that is $M(v) = M_{\mathrm{linear}}(v) + M(0)$ where $M_{\mathrm{linear}} : V \to W$ is a linear isometry.

Proof. Clearly the map $v \mapsto M(v) - M(0)$ is an isometry if $M$ is an isometry, so we may assume that $M(0) = 0$ without loss of generality, and need to show in this case that $M$ is linear.

Midpoint-preserving maps: We claim first that if $M$ preserves mid-points in the sense that
\[ M\Bigl( \frac{v_1 + v_2}{2} \Bigr) = \frac{M(v_1) + M(v_2)}{2} \tag{2.6} \]
for all $v_1, v_2 \in V$ and satisfies $M(0) = 0$, then $M$ is linear. To see this, pick $v$ in $V$ and apply (2.6) to the pairs $v$ and $0$, then to $\frac12 v$ and $0$, and inductively to $\frac{1}{2^k} v$ and $0$ to prove that $M(\frac{1}{2^k} v) = \frac{1}{2^k} M(v)$ for all $k \in \mathbb{N}$ and $v \in V$. Next apply (2.6) to $2v$ and $0$, and inductively to $(\ell + 1)v$ and $(\ell - 1)v$ to prove that $M(\ell v) = \ell M(v)$ for all $\ell \in \mathbb{N}$ and $v \in V$. Finally, apply (2.6) to $v$ and $-v$ to see that $M(-v) = -M(v)$ for all $v \in V$. This gives
\[ M\Bigl( \frac{k}{2^n} v \Bigr) = \frac{k}{2^n} M(v) \]
for any $k \in \mathbb{Z}$, $n \in \mathbb{N}$ and $v \in V$, and so by continuity $M(av) = aM(v)$ for all $a \in \mathbb{R}$ and $v \in V$. With (2.6), this also gives
\[ M(v_1 + v_2) = M(v_1) + M(v_2) \]
for all $v_1, v_2 \in V$.

Subadditive norms: The case of the theorem under the sub-additivity hypothesis is now easily obtained. Suppose that $M$ is an isometry and the norm on $W$ is strictly sub-additive. Let $v_1, v_2 \in V$ have mid-point $z = \frac{v_1 + v_2}{2}$, so that
\[ \|v_1 - z\| = \|z - v_2\| = \tfrac12 \|v_1 - v_2\|, \]
and hence (since $M$ is an isometry)
\[ \|M(v_1) - M(z)\| = \|M(z) - M(v_2)\| = \tfrac12 \|M(v_1) - M(v_2)\|. \]
Moreover,
\[ M(v_1) - M(v_2) = \bigl( M(v_1) - M(z) \bigr) + \bigl( M(z) - M(v_2) \bigr). \]
Thus $M(v_1) - M(z)$ and $M(z) - M(v_2)$ must be real non-negative multiples of each other by strict sub-additivity, but as they have the same norm this forces them to be equal. Solving this equation for $M(z)$ gives (2.6).

Surjective isometries: Consider now the case where $M$ is assumed to be a surjective isometry (and hence also bijective). We define for $z \in V$ the reflection in $z$ to be the map $\psi_z : V \to V$ defined by $\psi_z(v) = 2z - v$. It is easy to check that $\psi_z^2$ is the identity, so $\psi_z$ has inverse $\psi_z$ and is a bijective isometry. Note that
\[ \|\psi_z(v) - z\| = \|v - z\|, \tag{2.7} \]
and
\[ \|\psi_z(v) - v\| = 2\|v - z\| \tag{2.8} \]
for all $v \in V$, which also implies that $z$ itself is the only fixed point of $\psi_z$.

Now fix $v_1, v_2 \in V$ and write $z = \frac{v_1 + v_2}{2}$ as before for the midpoint. Let $B$ be the group of all bijective isometries $V \to V$ that fix $v_1$ and $v_2$, and define
\[ \lambda = \sup\{\|g(z) - z\| \mid g \in B\}. \]
Since any $g \in B$ is an isometry satisfying $g(v_1) = v_1$ we have
\[ \|g(z) - z\| \le \|g(z) - g(v_1)\| + \|v_1 - z\| = 2\|v_1 - z\| \]
and hence $\lambda < \infty$. We also have
\begin{align*}
2\|g(z) - z\| &= \|\psi_z g(z) - g(z)\| && \text{(by (2.8) for } v = g(z)\text{)} \\
&= \|g^{-1} \psi_z g(z) - z\| && \text{(since } g \text{ is an isometry)} \\
&= \|\psi_z g^{-1} \psi_z g(z) - z\| && \text{(by (2.7) for } v = g^{-1} \psi_z g(z)\text{)}
\end{align*}
for any $g \in B$. Furthermore, note that $\psi_z(v_1) = v_2$ and $\psi_z(v_2) = v_1$. It follows that $g \in B$ implies $g' = \psi_z g^{-1} \psi_z g \in B$ and so $\|g'(z) - z\| \le \lambda$. Combining this with the above gives
\[ 2\|g(z) - z\| = \|g'(z) - z\| \le \lambda \]
for all $g \in B$, and hence by definition of $\lambda \in [0, \infty)$ also $2\lambda \le \lambda$. This forces $\lambda$ to be $0$, and therefore $g(z) = z$ for all $g \in B$.

Now let $M : V \to W$ be a bijective isometry, and let $z' = \frac{M(v_1) + M(v_2)}{2}$. Then $h = \psi_z M^{-1} \psi_{z'} M \in B$, so $h(z) = z$ and therefore $\psi_{z'} M(z) = M(z)$. On the other hand, the only point fixed by $\psi_{z'}$ is $z'$ itself, so $M(z) = z'$ and $M$ preserves mid-points as required. □

Exercise 2.21. Show that the vertex set of a graph consisting of vertices $v_1, v_2, v_3, v_c$ and three edges connecting one central vertex $v_c$ to the remaining three vertices $v_1, v_2, v_3$, endowed with the combinatorial distance given by $d(v_j, v_k) = 2(1 - \delta_{jk})$ for $j, k \in \{1, 2, 3\}$ and $d(v_j, v_c) = 1$ for $j = 1, 2, 3$, admits no isometric embedding into any Banach space with a strictly sub-additive norm.

2.1.4 A Comment on Notation

On several occasions in this section we considered different norms on the same vector space. This will happen less frequently in the theoretical parts of the text, and most of the time the normed vector space $(V, \|\cdot\|)$ will be equipped with a particular norm. Where we are dealing with a single norm, we will write
\[ B_r = B_r^V = \{w \in V \mid \|w\| < r\} \]
for the open ball of radius $r$ around $0$, and
\[ B_r(v) = B_r^V(v) = \{w \in V \mid \|w - v\| < r\} = B_r^V + v \]
for the open ball of radius $r$ around $v \in V$. We will also frequently write $\|\cdot\|_V$ for the natural norm on $V$. For example, in Example 2.2(6) we may write $\|f\|_{C^1([0,1])}$ for the natural norm of a function $f \in C^1([0, 1])$, but may also write $\|f\|_{C([0,1])} = \|f\|_\infty$ for the supremum norm of $f \in C^1([0, 1])$ thought of as an element of the larger space $C([0, 1])$. At this point it is reasonable to ask what makes a norm be naturally associated to a given space, and this is partially explained in the next section.

2.2 Banach Spaces We start by recalling a basic definition from analysis on metric spaces.


Definition 2.22. A sequence $(x_n)$ in a metric space $(X, d)$ is said to be a Cauchy sequence if for any $\varepsilon > 0$ there is an $N = N(\varepsilon)$ such that $d(x_m, x_n) < \varepsilon$ for any $m, n \ge N$. The metric space is called complete if every Cauchy sequence converges to an element of $X$.

From a purely logical point of view, 'converges to an element of $X$' should be written 'converges', but we wish to emphasize here that the limit of the sequence belongs to $X$ (and not to some strictly larger space that contains $X$). The notion of Cauchy sequences gives rise to one of the fundamental types of normed spaces in functional analysis.

Definition 2.23. A normed vector space $(V, \|\cdot\|)$ is a Banach space if $V$ is complete with respect to (the metric induced by) the norm $\|\cdot\|$.

Once again there are many familiar examples of Banach spaces. As we will see there is often an almost canonical choice of norm $\|\cdot\|_V$ which makes a linear space $V$ into a Banach space $(V, \|\cdot\|_V)$. It is clear that this property of a norm does not define it uniquely. In fact, any equivalent norm would induce the same topology, the same notion of Cauchy sequence, and therefore also make $V$ into a Banach space.

Example 2.24. We start with a small number of examples, and postpone the proof that these are indeed Banach spaces to Section 2.2.1.
(1) The Euclidean space $\mathbb{R}^d$ with any of the norms from Example 2.2(1)–(4) from Section 2.1.1 forms a Banach space.
(2) Let $X$ be any set. Then
\[ B(X) = \{f : X \to \mathbb{R} \mid f \text{ is bounded}\}, \]
equipped with the norm
\[ \|f\|_\infty = \sup_{x \in X} |f(x)|, \]

is a Banach space. Convergence of a sequence of functions in this space is also called uniform convergence. If $X = \mathbb{N}$, then one often writes $\ell^\infty = \ell^\infty(\mathbb{N}) = B(\mathbb{N})$.
(3) Let $X$ be a topological space. Then
\[ C_b(X) = \{f \in B(X) \mid f \text{ is continuous}\} \]
is a closed subspace of $B(X)$ and so is also a Banach space. Notice that if $X$ is compact then $C_b(X) = C(X)$.
(4) Let $X$ be a locally compact topological space (that is, a Hausdorff topological space in which every point has a compact neighbourhood, see also Definition A.21). Then
\[ C_0(X) = \{f \in C_b(X) \mid \lim_{x \to \infty} f(x) = 0\} \]
is a closed subspace of $C_b(X)$ and hence a Banach space. The notion of the limit of $f(x)$ as $x \to \infty$ used here is defined as follows: $\lim_{x \to \infty} f(x) = A$ if and only if for every $\varepsilon > 0$ there exists some compact set $K \subseteq X$ with $|f(x) - A| < \varepsilon$ for all $x \in X \setminus K$. If $X = \mathbb{N}$ (with the discrete topology), one often writes $c_0 = c_0(\mathbb{N}) = C_0(\mathbb{N})$ for this subspace of $\ell^\infty(\mathbb{N})$.
(5) The space $C^1([0, 1])$ of continuously differentiable functions on $[0, 1]$ with the norm $\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$ is a Banach space.
(6) Let $U \subseteq \mathbb{R}^d$ be non-empty and open, and fix $k \ge 1$. Then the space $C_b^k(U)$ of functions $U \to \mathbb{R}$ for which all partial derivatives up to order $k$ exist and are continuous and bounded on $U$, equipped with the norm
\[ \|f\|_{C_b^k(U)} = \max_{\|\alpha\|_1 \le k} \|\partial_\alpha f\|_\infty, \]
is a Banach space, where $\partial_\alpha$ for $\alpha \in \mathbb{N}_0^d$ stands for the partial differential operator defined by
\[ \partial_\alpha f = \frac{\partial^{\|\alpha\|_1}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}} f \]
of degree $\|\alpha\|_1 = \alpha_1 + \cdots + \alpha_d$. In the case $\alpha = e_j$ we will call this the $j$th partial derivative and write $\partial_j f = \partial_{e_j} f$.
(7) Fix $p \in [1, \infty)$ and let $(X, \mathcal{B}, \mu)$ be a measure space. Then
\[ \|f\|_p = \Bigl( \int_X |f|^p \, d\mu \Bigr)^{1/p} \]
defines a semi-norm on the vector space
\[ \mathscr{L}^p_\mu(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable and } \|f\|_p < \infty\}. \]
The associated space of equivalence classes, equal to the quotient
\[ L^p_\mu(X) = \mathscr{L}^p_\mu(X) / V_0 \]
by the kernel $V_0$ of the semi-norm $\|\cdot\|_p$, is a Banach space. We will write $L^p(X, \mu)$ and $L^p(\mu)$ in place of $L^p_\mu(X)$ when the space is clear, and in particular where it is useful to avoid multiple levels of subscript. Important special cases of this construction include the following:
(a) $(X, \mathcal{B}, \mu) = (\Omega, \mathcal{B}, m)$ where $\Omega$ is a Borel subset of $\mathbb{R}^d$, $\mathcal{B}$ is the Borel $\sigma$-algebra, and $m$ is $d$-dimensional Lebesgue measure on $\Omega$.
(b) $(X, \mathcal{B}, \mu) = (\mathbb{N}, \mathcal{P}(\mathbb{N}), \lambda_{\mathrm{count}})$, where $\lambda_{\mathrm{count}}$ denotes the counting measure, which is defined on any subset of $\mathbb{N}$. In this case we will write
\[ \ell^p = \ell^p(\mathbb{N}) = L^p_{\lambda_{\mathrm{count}}}(\mathbb{N}). \]
(8) The analogue of (7) with $p = \infty$ is constructed slightly differently. As before, let $(X, \mathcal{B}, \mu)$ be a measure space. Then
\[ \mathscr{L}^\infty(X) = \{f : X \to \mathbb{R} \mid f \text{ is measurable}, f \in B(X)\} \]
is already a Banach space with respect to $\|f\|_\infty$. However, one also defines
\[ L^\infty_\mu(X) = \mathscr{L}^\infty(X) / W_\mu(X), \]
where
\[ W_\mu(X) = \{f \in \mathscr{L}^\infty(X) \mid f = 0 \ \mu\text{-almost everywhere}\} \]
and $L^\infty_\mu(X)$ is equipped with the essential supremum norm defined by
\[ \|f\|_{\mathrm{esssup}} = \operatorname*{esssup}_{x \in X} |f(x)| = \inf\bigl\{ \alpha \ge 0 \mid \mu(\{x \mid |f(x)| > \alpha\}) = 0 \bigr\}. \tag{2.9} \]
We will generally follow the convention that the essential supremum norm of $f$ is also denoted for simplicity by $\|f\|_\infty$.

All of these $\ell^p$ and $L^p$ spaces also have natural complex-valued analogues.

As is customary, we will quickly stop being too careful about the distinction between an element of $\mathscr{L}^\infty(X)$ and the equivalence class defined by it in $L^\infty_\mu(X)$. For example, $|f|(x) = |f(x)|$ for all $x \in X$ really depends on $f \in \mathscr{L}^\infty(X)$ and not just on the equivalence class, but (as we will see later in the proof of completeness) the norm defined in (2.9) is independent of the representative chosen for a given equivalence class.

Exercise 2.25. Show that a product of two normed vector spaces $V \times W$ is complete with respect to one of the norms from Exercise 2.9 if and only if both $V$ and $W$ are complete with respect to their own norms. Thus the product of two Banach spaces is a Banach space.
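The difference between the supremum norm and the essential supremum norm (2.9) is easiest to see on a finite set carrying a measure with some points of weight zero. The following sketch is a toy model under assumptions made here: the set, the weights and the function names are illustrative. On a finite set the set $\{x \mid |f(x)| > \alpha\}$ is null exactly when each of its points has weight zero, so the infimum in (2.9) reduces to a maximum over the points of positive weight.

```python
def sup_norm(f, points):
    """Supremum norm of f over a finite set of points."""
    return max(abs(f(x)) for x in points)


def ess_sup_norm(f, weights):
    """Essential supremum of |f| for the measure mu({x}) = weights[x].

    Returns the largest value of |f| on a point of positive weight,
    or 0.0 if every point has weight zero.
    """
    values = [abs(f(x)) for x, w in weights.items() if w > 0]
    return max(values, default=0.0)


# A three-point space in which the point 1 is a null set.
weights = {0: 0.5, 1: 0.0, 2: 0.5}
values_f = {0: 1.0, 1: 100.0, 2: -3.0}
values_g = {0: 1.0, 1: -7.0, 2: -3.0}  # differs from f only on the null set {1}


def f(x):
    return values_f[x]


def g(x):
    return values_g[x]


print(sup_norm(f, weights), ess_sup_norm(f, weights), ess_sup_norm(g, weights))
```

Here `f` and `g` represent the same equivalence class, and the essential supremum does not distinguish them, while the supremum norm does.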

2.2.1 Proofs of Completeness

In this subsection we will explain why the examples from Example 2.24 are indeed Banach spaces. Depending on the background of the reader, parts of this section may be skipped. In each case it is proving completeness that really takes up what effort is required. The following principle will be used several times.

Essential Exercise 2.26. Let $(X, d)$ be a metric space.
(a) Show that if $Y$ is a subset of $X$ that is complete (with respect to the restriction of the metric on $X$ to $Y$), then $Y$ is a closed subset of $X$.
(b) If $X$ is complete, show that $Y \subseteq X$ is complete if and only if $Y$ is closed.


Example 2.24(1). If two norms are equivalent then they define the same notion of convergence and of Cauchy sequence. Thus it is enough to consider $\mathbb{R}^d$ with the norm $\|\cdot\|_\infty$ by Proposition 2.6. Now a Cauchy sequence $(v_n)$ in $\mathbb{R}^d$ has the property that each component sequence $(v_n^{(i)})$ for a fixed $i$, where $v_n = (v_n^{(1)}, \dots, v_n^{(d)})^t$ for all $n$, is itself a Cauchy sequence in $\mathbb{R}$. Since $\mathbb{R}$ is complete, there exists a limit $v^{(i)} = \lim_{n\to\infty} v_n^{(i)}$ for each $i$. These limits together define a vector $v = (v^{(1)}, \dots, v^{(d)})^t$ and it is easy to see that $v$ is the limit of $(v_n)$ in $\mathbb{R}^d$. □

Example 2.24(2). Let $X$ be any set and let $(f_n)$ be a Cauchy sequence in $B(X)$ with respect to $\|\cdot\|_\infty$. Then for any fixed $x \in X$ the sequence $(f_n(x))$ is a Cauchy sequence in $\mathbb{R}$, which therefore has a limit $f(x)$. This defines a function $f : X \to \mathbb{R}$. We need to show that $f \in B(X)$ and $f_n \to f$ as $n \to \infty$ with respect to $\|\cdot\|_\infty$. Since $(f_n)$ is Cauchy, for any $\varepsilon > 0$ there is some $N(\varepsilon)$ with $\|f_m - f_n\|_\infty < \varepsilon$ for all $m, n \geq N(\varepsilon)$, and so $|f_m(x) - f_n(x)| < \varepsilon$ for any $x \in X$ and $m, n \geq N(\varepsilon)$. Now let $m \to \infty$ to see that $|f(x) - f_n(x)| \leq \varepsilon$ for all $n \geq N(\varepsilon)$. Setting $\varepsilon = 1$ and $n = N(1)$ gives $|f(x)| \leq 1 + \|f_{N(1)}\|_\infty$ for any $x \in X$, showing that $f \in B(X)$. For any $\varepsilon > 0$, we obtain $\|f - f_n\|_\infty \leq \varepsilon$ for all $n \geq N(\varepsilon)$ and hence that $f = \lim_{n\to\infty} f_n \in B(X)$, as required. If $X$ has cardinality $d$ then this example reduces to the previous one. □

Example 2.24(3). By definition, $C_b(X)$ is a subspace of $B(X)$, and we use the same norm on both spaces. Thus, if $(f_n)$ is a Cauchy sequence in $C_b(X)$ then, by (2), there exists a limit $f = \lim_{n\to\infty} f_n \in B(X)$. It remains to show that $f \in C_b(X)$, that is, to show that $C_b(X)$ is a closed subspace of $B(X)$. This is a familiar argument from real analysis. Given any $\varepsilon > 0$ there exists some $n$ with $\|f_n - f\|_\infty < \varepsilon$. Since $f_n \in C_b(X)$ is continuous at any given $x \in X$, there is a neighbourhood $U \subseteq X$ of $x$ with $|f_n(y) - f_n(x)| < \varepsilon$ for all $y \in U$.
Therefore,
\[ |f(y) - f(x)| \leq \underbrace{|f(y) - f_n(y)|}_{\leq \|f - f_n\|_\infty} + \underbrace{|f_n(y) - f_n(x)|}_{< \varepsilon} + \underbrace{|f_n(x) - f(x)|}_{\leq \|f - f_n\|_\infty} < 3\varepsilon \]
[...] converges, where $s_N = \sum_{n=1}^{N} v_n$ for all $N \geq 1$, and converges absolutely if the real-valued series $\sum_{n=1}^{\infty} \|v_n\|$ converges.

Lemma 2.28 (Absolute convergence). A normed vector space $(V, \|\cdot\|)$ is a Banach space if and only if any absolutely convergent series in $V$ is convergent.

Proof. If $V$ is a Banach space and a series $\sum_{n=1}^{\infty} v_n$ is absolutely convergent, which means that $\sum_{n=1}^{\infty} \|v_n\| < \infty$, then the sequence of partial sums $(s_n)$ defined by $s_n = \sum_{k=1}^{n} v_k$ is a Cauchy sequence, since for $m > n$ we have
\[ \|s_m - s_n\| = \Bigl\| \sum_{k=n+1}^{m} v_k \Bigr\| \leq \sum_{k=n+1}^{m} \|v_k\|, \]
and the last sum can be made arbitrarily small by requiring $n$ to be sufficiently large.

Assume now for the converse that $(V, \|\cdot\|)$ is a normed vector space in which every absolutely convergent series is convergent, and let $(v_n)$ be a Cauchy sequence in $V$. In order to render the Cauchy property more uniform, we choose a subsequence of $(v_n)$ as follows. For each $k \geq 1$ there exists some $N_k$ such that
\[ \|v_m - v_n\| < \frac{1}{2^k} \]
for all $m, n \geq N_k$. Using these numbers we define inductively an increasing sequence $(n_k)$ by $n_1 = N_1$ and $n_k = \max\{n_{k-1} + 1, N_k\}$ for $k \geq 2$. The corresponding subsequence $(v_{n_k})_{k \geq 1}$ satisfies $\|v_{n_{k+1}} - v_{n_k}\| < \frac{1}{2^k}$. Now define $w_k = v_{n_{k+1}} - v_{n_k}$


for all $k \geq 1$, so that $\sum_{k=1}^{\infty} \|w_k\| < \sum_{k=1}^{\infty} \frac{1}{2^k} = 1$ converges, and hence the infinite sum $\sum_{k=1}^{\infty} w_k = w \in V$ converges by our assumption on the normed space $(V, \|\cdot\|)$. For the $\ell$th partial sum of this series we obtain
\[ \sum_{k=1}^{\ell} w_k = v_{n_{\ell+1}} - v_{n_1}, \]
and so the subsequence $(v_{n_k})$ satisfies $v = \lim_{k\to\infty} v_{n_k} = w + v_{n_1}$. Finally, we use the fact that any Cauchy sequence with a convergent subsequence must converge; we quickly recall the argument. For any $\varepsilon > 0$ choose $N$ with $\|v_m - v_n\| < \varepsilon$ for $m, n \geq N$ and choose $K$ with $\|v_{n_k} - v\| < \varepsilon$ for $k \geq K$. Then if $k \geq K$ has $n_k \geq N$ we have
\[ \|v_m - v\| \leq \|v_m - v_{n_k}\| + \|v_{n_k} - v\| < 2\varepsilon \]
for all $m \geq N$, showing that the sequence converges. □

Lemma 2.29 (Quotients of Banach spaces). If $(V, \|\cdot\|)$ is a Banach space and $W \subseteq V$ is a closed subspace then $(V/W, \|\cdot\|_{V/W})$ is a Banach space.

Proof. Assume that $(v_n)$ is a sequence with
\[ \sum_{n=1}^{\infty} \|v_n + W\|_{V/W} < \infty. \]
Since Banach spaces can be characterized by absolute convergence (see Lemma 2.28), it suffices to show that
\[ \sum_{n=1}^{\infty} (v_n + W) \]
exists. For this, choose for each $n \geq 1$ some $w_n \in W$ with
\[ \|v_n + w_n\| \leq \|v_n + W\|_{V/W} + \frac{1}{2^n}. \]
Then $\sum_{n=1}^{\infty} \|v_n + w_n\| < \infty$, so the limit
\[ v = \sum_{n=1}^{\infty} (v_n + w_n) \]
exists in $V$ by Lemma 2.28. Also note that the canonical map $\pi : V \to V/W$ (defined by $\pi(v) = v + W$ for all $v \in V$) is continuous since $\|v - v_0\| < \varepsilon$ implies $\|(v + W) - (v_0 + W)\|_{V/W} < \varepsilon$ for all $v, v_0 \in V$ and $\varepsilon > 0$. This implies that
\[ \sum_{n=1}^{\infty} (v_n + W) = v + W \]


converges. □

We refer to Appendix B for basic properties of $\|\cdot\|_p$ on $L^p_\mu(X)$, and in particular for the triangle inequality. Moreover, in the proof below and in the remainder of the book we will need the monotone convergence and the dominated convergence theorems (Theorems B.7 and B.8, respectively).

Example 2.24(7). Let $(f_n)$ be a sequence in $L^p_\mu(X)$ with
\[ M = \sum_{n=1}^{\infty} \|f_n\|_p < \infty. \]
By Lemma 2.28 it is enough to show that $\sum_{n=1}^{\infty} f_n$ converges in $L^p_\mu(X)$. For this, define a sequence of functions $(g_n)$ by
\[ g_n(x) = \sum_{k=1}^{n} |f_k(x)|. \]
Clearly $g_n(x) \nearrow g(x)$ for some measurable function $g : X \to [0, \infty]$. Note that
\[ \int |g_n|^p \, d\mu = \|g_n\|_p^p \leq \Bigl( \sum_{k=1}^{n} \|f_k\|_p \Bigr)^p \leq M^p \]
by the triangle inequality for $\|\cdot\|_p$. By monotone convergence, this implies that
\[ \|g\|_p^p = \lim_{n\to\infty} \|g_n\|_p^p \leq M^p, \]
and so $g(x) < \infty$ for $\mu$-almost every $x \in X$. Therefore,
\[ f(x) = \sum_{n=1}^{\infty} f_n(x) \]
exists for $\mu$-almost every $x \in X$, and hence defines a measurable function $f : X \to \mathbb{R}$. Strictly speaking we have only defined $f$ on the complement of a null set, but we simplify the notation by ignoring this distinction here. Since we also have $|f(x)| \leq g(x)$ for all $x$, we have $f \in L^p_\mu(X)$. It remains to show that
\[ \Bigl\| \sum_{k=1}^{n} f_k - f \Bigr\|_p \longrightarrow 0 \tag{2.14} \]
as $n \to \infty$. For this, notice first that $\bigl| \sum_{k=1}^{n} f_k - f \bigr|^p \leq (2g)^p$ and by definition of $f$ we also have $\bigl| \sum_{k=1}^{n} f_k - f \bigr|^p \to 0$ as $n \to \infty$ almost everywhere, so that we may apply dominated convergence to the sequence of integrals defined by
\[ \Bigl\| \sum_{k=1}^{n} f_k - f \Bigr\|_p^p = \int_X \Bigl| \sum_{k=1}^{n} f_k - f \Bigr|^p \, d\mu. \]
From this we obtain (2.14), as required.



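Example 2.24(8). Since a pointwise limit (and a fortiori a uniform limit) of a sequence of measurable functions is a measurable function, the subspace $\mathscr{L}^\infty(X) \subseteq B(X)$ is closed. Therefore, by part (2) of the example we see that $\mathscr{L}^\infty(X)$ is a Banach space with respect to the norm $\|\cdot\|_\infty$. Now let
\[ W_\mu = \{f \in \mathscr{L}^\infty \mid f = 0 \ \mu\text{-almost everywhere}\}. \]
Clearly $W_\mu$ is closed, since if $f_n \in W_\mu$ for all $n \geq 1$ and $f_n \to f$ uniformly, then
\[ \{x \in X \mid f(x) \neq 0\} \subseteq \bigcup_{n \geq 1} \{x \in X \mid f_n(x) \neq 0\} \]
is a $\mu$-null set. Therefore
\[ L^\infty_\mu = \mathscr{L}^\infty / W_\mu \]
is a Banach space with respect to the quotient norm $\|\cdot\|_{\mathscr{L}^\infty/W_\mu}$. It remains to show that
\[ \|f\|_{\mathscr{L}^\infty/W_\mu} = \inf_{g \in W_\mu} \|f + g\|_\infty \]
coincides, as claimed, with the essential supremum norm
\[ \|f\|_{\mathrm{esssup}} = \inf\{\alpha > 0 \mid \mu(\{x \in X \mid |f(x)| > \alpha\}) = 0\} \]
as given in Example 2.24(8). For this, assume first that $\alpha > \|f\|_{\mathrm{esssup}}$ so that
\[ N_\alpha = \{x \in X \mid |f(x)| > \alpha\} \]
is a $\mu$-null set, and hence $g_\alpha = -f \mathbb{1}_{N_\alpha} \in W_\mu$. It follows that
\[ \|f\|_{\mathscr{L}^\infty/W_\mu} \leq \|f + g_\alpha\|_\infty \leq \alpha. \]
Since this holds for any $\alpha > \|f\|_{\mathrm{esssup}}$ it follows that $\|f\|_{\mathscr{L}^\infty/W_\mu} \leq \|f\|_{\mathrm{esssup}}$. If, on the other hand, $\alpha > \|f\|_{\mathscr{L}^\infty/W_\mu}$, then there exists some $g \in W_\mu$ with $\|f + g\|_\infty < \alpha$, and so
\[ \{x \in X \mid |f(x)| > \alpha\} \subseteq \{x \in X \mid g(x) \neq 0\} \]
is a null set. Varying $\alpha$ once more, we see that $\|f\|_{\mathrm{esssup}} \leq \|f\|_{\mathscr{L}^\infty/W_\mu}$.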
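The identification of the quotient norm with the essential supremum can be checked by hand on a toy measure space. The following is only a sketch under stated assumptions: the space is a four-point set with point masses `mu` (one of them zero), and the names `sup_norm`, `ess_sup_norm` and the sample values are hypothetical choices made for illustration. On such a space a set is null exactly when it consists of points of mass zero, so the essential supremum is the maximum of $|f|$ over the points of positive mass.

```python
def sup_norm(f):
    """Supremum norm of a function on a finite set, given as a list of values."""
    return max(abs(v) for v in f)

def ess_sup_norm(f, mu):
    """inf{alpha : mu({x : |f(x)| > alpha}) = 0} for point masses mu on a finite set."""
    return max((abs(v) for v, m in zip(f, mu) if m > 0), default=0.0)

# Illustrative data (assumed): the third point carries no mass, so it is a null set.
mu = [1.0, 0.5, 0.0, 2.0]
f = [1.0, -3.0, 10.0, 2.0]

alpha = ess_sup_norm(f, mu)
# g_alpha = -f * 1_{N_alpha}, where N_alpha = {x : |f(x)| > alpha} is a null set.
g_alpha = [-v if abs(v) > alpha else 0.0 for v in f]
is_null = all(m == 0 for v, m in zip(g_alpha, mu) if v != 0)
corrected = [v + w for v, w in zip(f, g_alpha)]

print("sup norm of f:            ", sup_norm(f))
print("essential supremum of f:  ", alpha)
print("sup norm of f + g_alpha:  ", sup_norm(corrected))
print("g_alpha vanishes a.e.:    ", is_null)
```

Here the representative `corrected` of the class of `f` attains the infimum in the quotient norm, in line with the argument above.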



Exercise 2.30. Show that in the definition of $\|\cdot\|_{\mathrm{esssup}}$ and of $\|\cdot\|_{\mathscr{L}^\infty/W_\mu}$ (from the proof that Example 2.24(8) is a Banach space on p. 35) the infima are actually minima and hence that for $f \in L^\infty_\mu(X)$ we have $|f(x)| \leq \|f\|_{\mathrm{esssup}}$ $\mu$-almost everywhere.

Finally, let us recall two facts from real analysis:


• If a series of real numbers is absolutely convergent, then the sum is independent of any rearrangement of the series. The latter property is also called unconditional convergence.
• If a convergent series of real numbers is not absolutely convergent, then the series may be rearranged to obtain any value in $\mathbb{R} \cup \{\pm\infty\}$ for the sum of the rearranged series.

We say that a series is conditionally convergent if its convergence properties depend on the order of its elements. In finite-dimensional spaces unconditional convergence is equivalent to absolute convergence, and in infinite dimensions an absolutely convergent series is unconditionally convergent (see the exercise below). However, any infinite-dimensional Banach space contains an unconditionally convergent series that is not absolutely convergent(4) (see Corollary 3.42 for a particular case).

Exercise 2.31. Show that in a normed vector space any absolutely convergent series is unconditionally convergent.
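The second fact recalled above can be made concrete with the alternating harmonic series $1 - \frac12 + \frac13 - \cdots$. The sketch below uses the standard greedy rearrangement (it is one possible illustration, not taken from the text): positive and negative terms are each used in their original order, and a positive term is added whenever the running sum is at most the target. The function name `rearranged_partial_sum`, the targets and the number of terms are assumptions chosen for illustration.

```python
import math

def rearranged_partial_sum(target, n_terms):
    """Partial sum of a greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ...

    Positive terms 1, 1/3, 1/5, ... and negative terms -1/2, -1/4, ... are each
    taken in their original order; a positive term is added while the running
    sum is at most `target`, and a negative term otherwise.
    """
    total = 0.0
    next_odd = 1    # denominator of the next unused positive term
    next_even = 2   # denominator of the next unused negative term
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total

# In its natural order the series converges to log(2).
natural_order = sum((-1) ** (k + 1) / k for k in range(1, 200001))
print("natural order:", natural_order, " log 2:", math.log(2))

# Rearranged, the same terms approach other targets.
for target in (-1.0, 0.0, 2.5):
    print("target", target, "->", rearranged_partial_sum(target, 200000))
```

Since both the positive and the negative terms have divergent sums, each term is eventually used, so this really is a rearrangement of the original series.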

2.2.2 The Completion of a Normed Vector Space

Even though we have seen several examples of Banach spaces above, there are many natural normed vector spaces that are not Banach spaces. For example, $\mathbb{R}[x]$ is not a Banach space with respect to any of the five norms discussed in Example 2.2(7) (see also Exercise 2.62). As a result it is useful to know that any normed vector space has a completion (whose uniqueness properties we will discuss in Corollary 2.60).

Theorem 2.32 (Existence of a Completion). Let $(V, \|\cdot\|)$ be a normed vector space. Then there exists a Banach space $(B, \|\cdot\|)$ which contains $V$ as a dense subspace, and the indicated norm on $B$ restricts to the original norm on the image of $V$ in $B$.

Proof†. Let
\[ W = \{(v_n) \in V^{\mathbb{N}} \mid (v_n) \text{ is a Cauchy sequence}\}. \]
It is straightforward to check that $W$ is a vector space. We also define the semi-norm
\[ \|(v_n)\|' = \lim_{n\to\infty} \|v_n\|, \]
which is well-defined as $(\|v_n\|)$ is a Cauchy sequence in $\mathbb{R}$, since $(v_n)$ is a Cauchy sequence in $V$ (due to (2.1)). The kernel of this semi-norm is the space
\[ W_0 = \{(v_n) \mid v_n \to 0 \text{ as } n \to \infty\} \]
of null sequences (that is, sequences converging to 0) in $V$. We define
\[ B = W/W_0 \quad\text{and}\quad \|b\|_B = \lim_{n\to\infty} \|v_n\| \]
where $b = (v_n) + W_0$. It follows from our discussion concerning quotient norms (see Lemma 2.15) that $(B, \|\cdot\|_B)$ is a normed vector space. Moreover, $B$ contains an isometric copy of $V$ (that is, there is an isometry $V \to B$), since an element $v \in V$ can be identified with the equivalence class of the constant sequence
\[ \phi(v) = (v, v, \dots) + W_0, \]
with the norm of this coset being
\[ \|\phi(v)\|_B = \lim_{n\to\infty} \|v\| = \|v\| \]
by definition.

† The proof in this section can be skipped, as many natural normed vector spaces are already Banach spaces, and we will be able to give another shorter construction in Chapter 7 on p. 214.

We claim that (the image of) $V$ is dense in $B$. Given an equivalence class $b = (v_1, v_2, \dots) + W_0 \in B$ of a Cauchy sequence $(v_n)$, for every $\varepsilon > 0$ there exists some $N$ with $\|v_m - v_n\| < \varepsilon$ for $m, n \geq N$. Then
\[ \|(v_1, v_2, \dots) + W_0 - \phi(v_N)\|_B = \lim_{n\to\infty} \|v_n - v_N\| \leq \varepsilon. \]

Using this for any $\varepsilon > 0$ shows that the image of $V$ is dense in $B$. It remains to show that $B$ is complete with respect to $\|\cdot\|_B$. For this, assume that $(b_n)_{n \geq 1}$ is a Cauchy sequence in $B$. Since the image of $V$ is dense in $B$ we can find a sequence $(v_n)$ of vectors in $V$ with $\|b_n - \phi(v_n)\|_B < \frac{1}{n}$ for all $n \geq 1$. For every $\varepsilon > 0$ there exists some $N(\varepsilon)$ with $\|b_m - b_n\|_B < \varepsilon$ and $\frac{1}{m}, \frac{1}{n} < \varepsilon$ for $m, n \geq N(\varepsilon)$, so that
\[ \|v_m - v_n\| \leq \underbrace{\|\phi(v_m) - b_m\|_B}_{< \frac{1}{m} < \varepsilon} + \underbrace{\|b_m - b_n\|_B}_{< \varepsilon} + \underbrace{\|b_n - \phi(v_n)\|_B}_{< \frac{1}{n} < \varepsilon} < 3\varepsilon \]
[...] $> 0$. It follows that there exists some $w \in W$ with $\|v + w\| \leq 2d$. Define
\[ v_{k+1} = \frac{1}{\|v + w\|} (v + w), \]
so that $\|v_{k+1}\| = 1$. Also, for $1 \leq n \leq k$, we have $v_n \in W$ and so
\[ \|v_{k+1} - v_n\| \geq \|v_{k+1} + W\|_{V/W} = \frac{1}{\|v + w\|} \|v + W\|_{V/W} \geq \frac{d}{2d} = \frac{1}{2} \]
as required. Thus by induction we obtain a sequence $(v_n)$ with the claimed properties, and hence the proposition. □

Given the negative statement in Proposition 2.35, a natural question to ask is how to characterize compact subsets of a Banach space. This depends on the space concerned (see Exercise 2.36). A vague principle is that one tries to extract topological and geometrical properties of finite subsets of the Banach space, and then compact subsets are sometimes characterized by suitable uniform versions of those properties. We will also illustrate this in the next section, where we will prove the Arzela–Ascoli theorem.

Exercise 2.36. Characterize the compact subsets of the following Banach spaces.
(a) The space $c_0$ of null sequences (that is, sequences $(x_n)$ of scalars with $|x_n| \to 0$ as $n \to \infty$) with the norm $\|(x_n)\|_\infty = \sup_{n \geq 1} |x_n| = \max_{n \geq 1} |x_n|$.
(b) The space $\ell^p$ of $p$-summable sequences of scalars with $p \in [1, \infty)$. That is,
\[ \ell^p = \Bigl\{ (x_n) \Bigm| \sum_{n=1}^{\infty} |x_n|^p < \infty \Bigr\} \]
with the $p$-norm
\[ \|(x_n)\|_p = \Bigl( \sum_{n=1}^{\infty} |x_n|^p \Bigr)^{1/p}. \]
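A standard illustration of the failure of compactness in the spaces $\ell^p$ of Exercise 2.36(b) is given by the unit vectors $e_i$, any two of which lie at distance $2^{1/p}$ from each other, so no subsequence can be Cauchy. The sketch below checks this numerically; it assumes sequences truncated to finitely many coordinates, and the helper names `lp_norm` and `unit_vector` as well as the values of `p` and `dim` are hypothetical choices.

```python
def lp_norm(x, p):
    """p-norm of a finitely supported sequence, given as a list of scalars."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

def unit_vector(i, dim):
    """The sequence with 1 in position i and 0 elsewhere, truncated to `dim` entries."""
    return [1.0 if j == i else 0.0 for j in range(dim)]

p = 3.0      # illustrative exponent in [1, infinity)
dim = 6      # number of coordinates kept (assumed truncation)
vectors = [unit_vector(i, dim) for i in range(dim)]

# Pairwise distances between distinct unit vectors.
distances = {
    (i, j): lp_norm([a - b for a, b in zip(vectors[i], vectors[j])], p)
    for i in range(dim)
    for j in range(i + 1, dim)
}

print("norms:", [lp_norm(v, p) for v in vectors])
print("distinct pairwise distances:", sorted(set(round(d, 12) for d in distances.values())))
print("2 ** (1 / p):", 2 ** (1 / p))
```

Each vector has norm one, yet the mutual distances stay bounded away from zero, which is the same phenomenon that the sequence constructed in the proof above exhibits in a general infinite-dimensional normed space.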

2.3 The Space of Continuous Functions To illustrate the failure of compactness of the closed unit ball in a Banach space, we now discuss the Banach space of continuous functions C(X) on a compact metric space (X, d). A subset K ⊆ C(X) is said to be equicontinuous if for every ε > 0 there is a δ > 0 such that d(x, y) < δ =⇒ |f (x) − f (y)| < ε


for all $x, y \in X$ and $f \in K$. The key uniformity here is that a single $\delta$ may be used for all the functions $f \in K$.

2.3.1 The Arzela–Ascoli Theorem

Essential Exercise 2.37. Recall that a function $f : X \to \mathbb{R}$ on a metric space $(X, d)$ is uniformly continuous if for any $\varepsilon > 0$ there is some $\delta > 0$ for which
\[ d(x, y) < \delta \implies |f(x) - f(y)| < \varepsilon \]
for all $x, y \in X$. Show that any continuous function on a compact metric space is uniformly continuous and hence that any finite set of continuous functions is equicontinuous.

Theorem 2.38 (Arzela–Ascoli). Let $(X, d)$ be a compact metric space, and let $C(X)$ be the Banach space of continuous (real- or complex-valued) functions on $X$ with the supremum norm. A subset $K \subseteq C(X)$ is compact if and only if $K$ is closed, bounded, and equicontinuous.

Proof. Suppose that $K \subseteq C(X)$ is compact, so it is closed and bounded. We will now show that it is also equicontinuous. Fix $\varepsilon > 0$. Then we may find finitely many functions $f_1, \dots, f_n \in K$ such that
\[ K \subseteq \bigcup_{i=1}^{n} B_\varepsilon(f_i) \tag{2.18} \]
by compactness, since $\{B_\varepsilon(f) \mid f \in K\}$ is an open cover of $K$. Each $f_i$ is continuous and, since $X$ is compact, each $f_i$ is also uniformly continuous by Exercise 2.37. Since the family $\{f_i\}$ is finite, we can conclude that there is a $\delta > 0$ with
\[ d(x, y) < \delta \implies |f_i(x) - f_i(y)| < \varepsilon \tag{2.19} \]
for $i = 1, \dots, n$. We now combine (2.18) and (2.19) for the given $\varepsilon > 0$. Fix some $f \in K$. By (2.18), there exists some $i$ with $\|f - f_i\|_\infty < \varepsilon$. If $x, y \in X$ and $d(x, y) < \delta$, then
\[ |f(x) - f(y)| \leq \underbrace{|f(x) - f_i(x)|}_{< \varepsilon} + \underbrace{|f_i(x) - f_i(y)|}_{< \varepsilon} + \underbrace{|f_i(y) - f(y)|}_{< \varepsilon} < 3\varepsilon, \]
[...] $> 0$. Then there exists some $\delta > 0$ with
\[ d(x, y) < \delta \implies |f_{n_k}(x) - f_{n_k}(y)| < \varepsilon \tag{2.21} \]
for all $k \geq 1$ (this is possible by equicontinuity of $K$). Now choose some $m \in \mathbb{N}$ with $\frac{1}{m} \leq \delta$. Since
\[ f_{n_k}(y) \longrightarrow \phi(y) \]
as $k \to \infty$ for $y \in D_m$, each of the sequences $(f_{n_k}(y))_k$ in $I_M$ (with $k$ varying) is a Cauchy sequence. Since $m$ is fixed, there are only finitely many sequences concerned, so there exists some $N(\varepsilon)$ such that $k, \ell \geq N(\varepsilon)$ implies that
\[ |f_{n_k}(y) - f_{n_\ell}(y)| < \varepsilon \tag{2.22} \]
for all $y \in D_m$. Now we combine (2.21) and (2.22) as follows. Given $x \in X$, by (2.20) there is some $y \in D_m$ with $d(x, y) < \frac{1}{m} \leq \delta$. For $k, \ell \geq N(\varepsilon)$ this implies


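\[ |f_{n_k}(x) - f_{n_\ell}(x)| \leq \underbrace{|f_{n_k}(x) - f_{n_k}(y)|}_{< \varepsilon} + \underbrace{|f_{n_k}(y) - f_{n_\ell}(y)|}_{< \varepsilon} + \underbrace{|f_{n_\ell}(y) - f_{n_\ell}(x)|}_{< \varepsilon} < 3\varepsilon \]
[...] $> 0$. Then the function
\[ g_\varepsilon = \frac{1}{M^2 + \varepsilon} (f^2 + \varepsilon) \]
is in $A$ and takes on values in $[\varepsilon/(M^2 + \varepsilon), 1]$, and so
\[ \sum_{n=0}^{\infty} \Bigl| \binom{1/2}{n} \Bigr| \, \|g_\varepsilon - 1\|_\infty^n < \infty, \]
which implies that
\[ \sqrt{g_\varepsilon} = (1 + (g_\varepsilon - 1))^{1/2} = \sum_{n=0}^{\infty} \binom{1/2}{n} (g_\varepsilon - 1)^n \]
converges with respect to $\|\cdot\|_\infty$ by Example 2.24(3) and Lemma 2.28. We deduce that $\sqrt{f^2 + \varepsilon} \in \overline{A}$. Now
\[ 0 \leq \sqrt{f^2 + \varepsilon} - |f| = \sqrt{f^2 + \varepsilon} - \sqrt{f^2} = \frac{f^2 + \varepsilon - f^2}{\sqrt{f^2 + \varepsilon} + \sqrt{f^2}} \leq \frac{\varepsilon}{\sqrt{\varepsilon}} = \sqrt{\varepsilon}. \]
In particular, $\bigl\| |f| - \sqrt{f^2 + \varepsilon} \bigr\|_\infty \leq \sqrt{\varepsilon}$, and so the fact that $\sqrt{f^2 + \varepsilon} \in \overline{A}$ for all $\varepsilon > 0$ implies that $|f| \in \overline{A}$. The identities
\[ \max\{f, g\} = \tfrac{1}{2}(f + g) + \tfrac{1}{2}|f - g| \]
and
\[ \min\{f, g\} = \tfrac{1}{2}(f + g) - \tfrac{1}{2}|f - g| \]
give the other parts of the lemma.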
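The two estimates in this proof, the convergence of the binomial series for $\sqrt{g_\varepsilon}$ and the bound $0 \leq \sqrt{f^2 + \varepsilon} - |f| \leq \sqrt{\varepsilon}$, can be checked numerically. The following is a rough sketch under stated assumptions: we take $f(t) = t$ on $[-1, 1]$ (so $M = 1$), fix $\varepsilon = 0.05$, sample on a finite grid, and truncate the series after 600 terms; the function name `sqrt_series` and all numerical choices are hypothetical.

```python
import math

def sqrt_series(g, terms):
    """Partial sum of sum_n binom(1/2, n) (g - 1)^n, the binomial series for sqrt(g)."""
    total, coeff, power = 0.0, 1.0, 1.0
    for n in range(terms):
        total += coeff * power
        coeff *= (0.5 - n) / (n + 1)   # binom(1/2, n+1) from binom(1/2, n)
        power *= g - 1.0
    return total

eps, M = 0.05, 1.0                            # illustrative choices
grid = [-1.0 + 0.1 * i for i in range(21)]    # sample points for f(t) = t on [-1, 1]

series_error = 0.0   # sup over the grid of |series approximation - sqrt(f^2 + eps)|
gap = 0.0            # sup over the grid of sqrt(f^2 + eps) - |f|
for t in grid:
    g = (t * t + eps) / (M * M + eps)         # g_eps takes values in [eps/(M^2+eps), 1]
    approx = math.sqrt(M * M + eps) * sqrt_series(g, 600)
    series_error = max(series_error, abs(approx - math.sqrt(t * t + eps)))
    gap = max(gap, math.sqrt(t * t + eps) - abs(t))

print("series error on grid:", series_error)
print("sup of sqrt(f^2 + eps) - |f|:", gap, " bound sqrt(eps):", math.sqrt(eps))
```

Each partial sum of the series is a polynomial in $g_\varepsilon$ and hence lies in the algebra, which is why only the closure is needed to capture the limit.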



Proof of Theorem 2.40. We start with the case of an algebra $A \subseteq C_{\mathbb{R}}(X)$. Notice that by Lemma 2.42 the closure $\overline{A}$ is closed under taking finitely many maxima or minima: if $f_1, \dots, f_n \in \overline{A}$ then
\[ \max\{f_1, \dots, f_n\}, \ \min\{f_1, \dots, f_n\} \in \overline{A}. \]
We will use this property for a given $f \in C_{\mathbb{R}}(X)$ and $\varepsilon > 0$ to find a function $f_\varepsilon \in \overline{A}$ with $\|f - f_\varepsilon\|_\infty < \varepsilon$. This then implies that $\overline{A} = C_{\mathbb{R}}(X)$. The construction has three steps.

First step: correct value at two points. Let $x_0, x \in X$ be (not necessarily distinct) points. Then there exists some $h_{x_0,x} \in A$ with
\[ \begin{cases} h_{x_0,x}(x_0) = f(x_0) \\ h_{x_0,x}(x) = f(x). \end{cases} \tag{2.23} \]
Indeed, if $x_0 = x$ then we simply take $h_{x_0,x} = f(x_0)\mathbb{1} \in A$. If $x \in X \setminus \{x_0\}$ we know that $A$ contains a function $\tilde{h}$ with $\tilde{h}(x) \neq \tilde{h}(x_0)$ since the algebra separates points. In this case, we may find a linear combination $h_{x_0,x}$ of $\tilde{h} \in A$ and the constant function $\mathbb{1} \in A$ with the desired property.

Second step: correct value at one point, nowhere much smaller. Let $x_0 \in X$. As our next step we claim that there exists a function $g_{x_0} \in \overline{A}$ with
\[ \begin{cases} g_{x_0}(x_0) = f(x_0) \\ g_{x_0}(y) > f(y) - \varepsilon \end{cases} \tag{2.24} \]
for all $y \in X$. That is, $g_{x_0}$ is chosen to have the correct value at $x_0$ for the objective of approximating $f$, and to be not much smaller than $f$ at every other point, as illustrated in Figure 2.1.

Fig. 2.1: The function $g_{x_0}$ is constructed by finding $x_1, \dots, x_n$ (in this case, $x_0$ and $x$) with the property that $g_{x_0} = \max\{h_{x_0,x_1}, \dots, h_{x_0,x_n}\} > f - \varepsilon$.

We will construct $g_{x_0}$ as a maximum after finding a finite subcover for the following open cover of $X$. For any $x \in X$ (including $x_0$) there exists an open neighbourhood $O_x$ of $x$ with
\[ y \in O_x \implies h_{x_0,x}(y) > f(y) - \varepsilon, \tag{2.25} \]
where $h_{x_0,x} \in A$ is as in (2.23). This defines an open cover $\{O_x \mid x \in X\}$ of $X$. By compactness there exists some finite subcover
\[ X = O_{x_1} \cup \dots \cup O_{x_n}. \tag{2.26} \]
We define $g_{x_0} = \max\{h_{x_0,x_1}, \dots, h_{x_0,x_n}\} \in \overline{A}$, and notice that $g_{x_0}$ satisfies
\[ g_{x_0}(x_0) = \max\{f(x_0), \dots, f(x_0)\} = f(x_0) \]
by (2.23), and by (2.26) for every $y \in X$ there is some $i \in \{1, \dots, n\}$ for which $y \in O_{x_i}$, and hence
\[ g_{x_0}(y) \geq h_{x_0,x_i}(y) > f(y) - \varepsilon \]


by (2.25).

Fig. 2.2: The function $f_\varepsilon = \min\{g_{x_1}, \dots, g_{x_m}\}$ is constructed with $\|f - f_\varepsilon\|_\infty < \varepsilon$.

Third step: nowhere much smaller, nowhere much bigger. The claim (2.24) above takes care of one half of the need to find an approximation to $f$ within $\overline{A}$. For every $x \in X$ we found some $g_x \in \overline{A}$ that is nowhere much smaller than $f$, and is equal to $f$ at $x$. We now vary the point $x$, and essentially repeat the argument to find an $\varepsilon$-approximation to $f$ within $\overline{A}$. Indeed, for every $x \in X$ there is an open neighbourhood $U_x$ for which
\[ y \in U_x \implies g_x(y) < f(y) + \varepsilon. \tag{2.27} \]
By allowing $x \in X$ to vary this gives an open cover $\{U_x \mid x \in X\}$ of $X$, and once again by compactness there is a finite subcover
\[ X = U_{x_1} \cup \dots \cup U_{x_m}. \tag{2.28} \]
We define $f_\varepsilon = \min\{g_{x_1}, \dots, g_{x_m}\} \in \overline{A}$, and claim that $\|f - f_\varepsilon\|_\infty \leq \varepsilon$, as illustrated in Figure 2.2. For every $y \in X$ we have $g_{x_i}(y) > f(y) - \varepsilon$ by the property of $g_{x_i}$ in (2.24), and so $f_\varepsilon(y) > f(y) - \varepsilon$. By (2.28) every $y \in X$ lies in some $U_{x_i}$ and so (2.27) implies that
\[ f_\varepsilon(y) \leq g_{x_i}(y) < f(y) + \varepsilon. \]
Since $f_\varepsilon \in \overline{A}$ and $\varepsilon > 0$ was arbitrary, we deduce that $f \in \overline{A}$.

Complex case. In the case of a complex sub-algebra $A \subseteq C_{\mathbb{C}}(X)$ that is closed under conjugation we may consider $A_{\mathbb{R}} = A \cap C_{\mathbb{R}}(X)$. This is again a sub-algebra that separates points if $A$ separates points. Indeed, if $x, y \in X$ with $x \neq y$ then there is (by the assumption on $A$) some $f \in A$ with $f(x) \neq f(y)$. Let $u = \Re(f)$ and $v = \Im(f)$, so that
\[ u = \frac{f + \overline{f}}{2}, \quad v = \frac{f - \overline{f}}{2i} \in A_{\mathbb{R}} \]
by our assumption on $A$. Thus $A_{\mathbb{R}}$ also contains a function that separates $x$ and $y$. By the real case, $A_{\mathbb{R}}$ is dense in $C_{\mathbb{R}}(X)$, so by splitting an arbitrary function in $C_{\mathbb{C}}(X)$ into real and imaginary parts and approximating each of these with elements of $A_{\mathbb{R}} \subseteq A$ the theorem is proved. □

Exercise 2.43. (a) Let $X$ be as in Theorem 2.40. Show that the second requirement (Constants) on $A \subseteq C(X)$ could also be replaced by the requirement
• (Nowhere vanishing) for every $x \in X$ there is a function $f \in A$ with $f(x) \neq 0$.
(b) Let $X$ be a locally compact space. Extend the Stone–Weierstrass theorem to $C_0(X)$ by considering a sub-algebra $A \subseteq C_0(X)$ that separates points, is closed under conjugation, and vanishes nowhere.

Exercise 2.44. Define, for every infinite compact subset $K \subseteq \mathbb{R}$,
\[ \|p\|_K = \|p|_K\|_{C(K)} = \sup_{x \in K} |p(x)|. \]
Show that $\|\cdot\|_K$ and $\|\cdot\|_L$ are inequivalent norms on $\mathbb{R}[x]$ if $K \neq L$ are two different infinite compact subsets.

Exercise 2.45. Let $X$ and $Y$ be two compact spaces. Prove that the linear hull of all functions of the form $(x, y) \in X \times Y \mapsto f(x)g(y)$ for $f \in C(X)$ and $g \in C(Y)$ is dense in $C(X \times Y)$.

The following is also an easy consequence of the Stone–Weierstrass theorem. Lemma 2.46 (Separability). Let (X, d) be a compact metric space. Then the Banach space C(X) is separable with respect to the topology induced by the supremum norm. Proof. The space X is separable (this may be seen, for example, from the proof of Theorem 2.38) so we may choose a countable dense set {xn | n ∈ N} in X. We now define fn (x) = d(x, xn ) for all x ∈ X and n > 1, and claim


that these functions separate points in $X$. That is, if $x \neq y$ then there exists some $n$ for which $f_n(x) \neq f_n(y)$. To see this, notice that by density there is some $n$ with
\[ d(x, x_n) = f_n(x) < \tfrac{1}{2} d(x, y), \]
which implies that
\[ f_n(y) = d(y, x_n) \geq d(y, x) - d(x, x_n) > \tfrac{1}{2} d(x, y). \]
Now let $A_{\mathbb{Q}} = \mathbb{Q}[f_0 = \mathbb{1}, f_1, f_2, \dots]$ be the $\mathbb{Q}$-algebra generated by the functions $f_1, f_2, \dots$ together with the constant function $f_0 = \mathbb{1}$. Clearly $A_{\mathbb{Q}}$ is countable, and the closure of $A_{\mathbb{Q}}$ contains the algebra $A = \mathbb{R}[f_0, f_1, f_2, \dots]$. Since $A$ is an algebra that separates points, it is dense in $C_{\mathbb{R}}(X)$ (and $A + iA$ is dense in $C_{\mathbb{C}}(X)$) by the Stone–Weierstrass theorem (Theorem 2.40). □

2.3.3 Equidistribution of a Sequence†

† The results of this section will not be needed in this form later, so may be skipped.

As an application of the discussion above, and in particular of the Stone–Weierstrass theorem, we now describe the notion of equidistribution. A sequence $(x_n)_n$ of elements of a metric space $X$ is dense if for every $x \in X$ there is a subsequence $(x_{n_k})_k$ that converges to $x$. A much finer property is given by equidistribution, which roughly speaking corresponds to the sequence spending the right proportion of time in any given part of the space. In this section we will define and discuss this notion carefully for $X = [0, 1]$.

A sequence $(x_n)_{n \geq 1}$ of points in $[0, 1]$ is said to be equidistributed or uniformly distributed if any one of the following equivalent conditions is satisfied:
(1) $\frac{1}{K} |\{k \in \mathbb{N} \mid 1 \leq k \leq K, x_k \in [a, b]\}| \to b - a$ as $K \to \infty$ for $0 \leq a < b \leq 1$.
(2) $\frac{1}{K} \sum_{k=1}^{K} f(x_k) \to \int_0^1 f(x) \, dx$ as $K \to \infty$ for any $f \in C([0, 1])$ (that is, any continuous function).
(3) $\frac{1}{K} \sum_{k=1}^{K} f(x_k) \to \int_0^1 f(x) \, dx$ as $K \to \infty$ for any $f \in R([0, 1])$ (that is, any Riemann-integrable function).
(4) $\frac{1}{K} \sum_{k=1}^{K} \chi_n(x_k) \to \int_0^1 \chi_n(x) \, dx = \begin{cases} 0 & \text{if } n \neq 0 \\ 1 & \text{if } n = 0 \end{cases}$ as $K \to \infty$ for any $n \in \mathbb{Z}$, where $\chi_n(x) = e^{2\pi i n x}$ for all $x \in [0, 1]$.

We will now sketch some of the implications between these equivalent statements (see Exercise 2.48) and will return to the topic of equidistribution in Chapter 8 from a more general point of view.

Almost a proof of (4) $\implies$ (2). Consider the algebra of trigonometric polynomials

\[ A = \Bigl\{ \sum_{n=-N}^{N} c_n \chi_n \Bigm| c_n \in \mathbb{C}, N \in \mathbb{N} \Bigr\}. \]
Using the complex version of the Stone–Weierstrass theorem (Theorem 2.40), it follows that $A$ is dense in $C(\mathbb{T})$ with respect to the uniform metric (see Exercise 2.47 and Proposition 3.65), where $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. This means that given $f \in C(\mathbb{T})$ and $\varepsilon > 0$, there is some $g \in A$ with
\[ \|f - g\|_\infty = \sup_{x \in \mathbb{T}} |f(x) - g(x)| < \varepsilon, \]
which implies that
\[ \Bigl| \int_0^1 f(x) \, dx - \int_0^1 g(x) \, dx \Bigr| < \varepsilon \]
and
\[ \Bigl| \frac{1}{K} \sum_{k=1}^{K} f(x_k) - \frac{1}{K} \sum_{k=1}^{K} g(x_k) \Bigr| < \varepsilon \]
for any $K \geq 1$. If $K$ is sufficiently large then, by assumption,
\[ \Bigl| \frac{1}{K} \sum_{k=1}^{K} g(x_k) - \int_0^1 g(x) \, dx \Bigr| < \varepsilon. \]
It follows that
\[ \Bigl| \frac{1}{K} \sum_{k=1}^{K} f(x_k) - \int_0^1 f(x) \, dx \Bigr| < 3\varepsilon, \]
which is not quite the claim in (2) since $C(\mathbb{T})$ and $C([0, 1])$ differ slightly. Indeed, any function $f : \mathbb{T} \to \mathbb{C}$ gives rise to a function $f : \mathbb{R} \to \mathbb{C}$ via the diagram
\[ \mathbb{R} \longrightarrow \mathbb{T} \xrightarrow{\ f\ } \mathbb{C} \]
(in which the first arrow is the quotient map $t \mapsto t + \mathbb{Z}$ and the composition is again denoted $f$), which we can restrict to $[0, 1]$, defining an element $g \in C([0, 1])$. If $f : \mathbb{T} \to \mathbb{C}$ is continuous then so is $g$, but $g$ will always satisfy $g(0) = g(1)$. On the other hand, if $g \in C([0, 1])$ is a function satisfying $g(0) = g(1)$ then one can define a continuous function $f : \mathbb{T} \to \mathbb{C}$ by $f(t + \mathbb{Z}) = g(t)$ for $t \in [0, 1]$, and obtain the result for such $g$. The extension to general continuous functions on $[0, 1]$ can be handled by the same method as in the proof that (2) implies (1) below, where we will only assume (2) for all $f \in C(\mathbb{T})$. □

Exercise 2.47. Show that $A$ as in the previous proof does indeed satisfy all the assumptions of Theorem 2.40.


Proof of (2) $\implies$ (1). Suppose first that $0 < a < b < 1$ and write $\mathbb{1}_{[a,b]}$ for the characteristic function of the interval $[a, b]$. Fix $\varepsilon > 0$ and choose continuous functions $f_-, f_+ : [0, 1] \to \mathbb{R}$ with
(a) $0 \leq f_-(x) \leq \mathbb{1}_{[a,b]}(x) \leq f_+(x) \leq 1$ for all $x \in [0, 1]$,
(b) $\int_0^1 (f_+ - f_-) \, dx < \varepsilon$, and
(c) $f_+(0) = f_+(1) = f_-(0) = f_-(1) = 0$.

For example, the functions $f_+$ and $f_-$ could be chosen to be piecewise linear, as illustrated in Figure 2.3. In this case the shaded region can easily be chosen to have total area bounded above by $\varepsilon$, as required in (b). By (c), the functions $f_-$ and $f_+$ also define continuous functions on $\mathbb{T}$.

Fig. 2.3: The function $\mathbb{1}_{[a,b]}$ and the approximations $f_-$ (drawn using dots) and $f_+$ (using dashes).

Since
\[ \frac{1}{K} \sum_{k=1}^{K} f_-(x_k) \leq \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}_{[a,b]}(x_k) \leq \frac{1}{K} \sum_{k=1}^{K} f_+(x_k) \]
for all $K \geq 1$, and the left-hand side converges to $\int_0^1 f_-(x) \, dx$ while the right-hand side converges to $\int_0^1 f_+(x) \, dx$ as $K \to \infty$, we obtain
\[ (b - a) - \varepsilon \leq \liminf_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}_{[a,b]}(x_k) \leq \limsup_{K \to \infty} \frac{1}{K} \sum_{k=1}^{K} \mathbb{1}_{[a,b]}(x_k) \leq (b - a) + \varepsilon, \]
which implies the claim in (1) for $0 < a < b < 1$. The formula in (1) holds trivially if $f \equiv 1$, so we also get
\[ \frac{1}{K} \sum_{k=1}^{K} \bigl( \mathbb{1}_{[0,a)}(x_k) + \mathbb{1}_{(b,1]}(x_k) \bigr) \longrightarrow 1 - (b - a) \]


as $K \to \infty$ by taking the difference. Suppose now that $0 = a < b < 1$. Then, for any sufficiently small $\varepsilon > 0$, we have
\[ f_- = \mathbb{1}_{[\varepsilon, b]} \leq \mathbb{1}_{[0,b]} \leq \mathbb{1}_{[0, b+\varepsilon)} + \mathbb{1}_{(1-\varepsilon, 1]} = f_+ \]
and
\[ \int_0^1 (f_+ - f_-) \, dx < 3\varepsilon, \]
and the formula in (1) already holds for $f_-$ and $f_+$. As before, this implies the claim for $\mathbb{1}_{[0,b]}$. The case of $0 < a < b = 1$ is similar. □

Exercise 2.48. Prove the remaining implications to show that the four characterizations of equidistribution at the start of this section are indeed equivalent.

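Example 2.49. A simple example of an equidistributed sequence may be obtained as follows. Fix $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ and define $x_k = \{k\alpha\} \in [0, 1)$ for $k \in \mathbb{N}$, where $\{t\}$ denotes the fractional part of the real number $t$. To see that this defines an equidistributed sequence, the characterization in (4) is the most convenient to use. For $n = 0$, we have $\chi_n \equiv 1$ and so $\frac{1}{K} \sum_{k=1}^{K} \chi_0(x_k) = 1$ for all $K$. If $n \neq 0$, then
\[ \frac{1}{K} \sum_{k=1}^{K} e^{2\pi i n k \alpha} = \frac{1}{K} \sum_{k=1}^{K} \bigl( \underbrace{e^{2\pi i n \alpha}}_{\neq 1} \bigr)^k = \frac{e^{2\pi i n (K+1) \alpha} - e^{2\pi i n \alpha}}{K (e^{2\pi i n \alpha} - 1)} \longrightarrow 0 \]
as $K \to \infty$.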
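The decay of these averages, and the resulting interval frequencies from characterization (1), can be observed numerically. The sketch below is an illustration only: it assumes the particular choice $\alpha = \sqrt{2}$ and works in floating-point arithmetic, and the function names `weyl_average` and `interval_frequency` are hypothetical.

```python
import cmath
import math

alpha = math.sqrt(2.0)   # an irrational number, chosen for illustration

def weyl_average(n, K):
    """(1/K) * sum_{k=1}^{K} exp(2 pi i n k alpha), as in characterization (4)."""
    return sum(cmath.exp(2j * math.pi * n * k * alpha) for k in range(1, K + 1)) / K

def interval_frequency(a, b, K):
    """Proportion of k in {1, ..., K} whose fractional part {k alpha} lies in [a, b]."""
    hits = sum(1 for k in range(1, K + 1) if a <= (k * alpha) % 1.0 <= b)
    return hits / K

for n in (1, 2, 3):
    for K in (100, 10000):
        print(f"n={n}, K={K}: |average| = {abs(weyl_average(n, K)):.2e}")

print("frequency in [0.2, 0.5]:", interval_frequency(0.2, 0.5, 10000), "(length 0.3)")
```

The closed formula in the example bounds the modulus of the average by $2/(K|e^{2\pi i n\alpha} - 1|)$, which is consistent with the decay of order $1/K$ seen in the printed values.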

An amusing consequence of this example is a special case of Benford’s law [7].

Exercise 2.50. Use the equidistribution from Example 2.49 to show the following. Write $\ell_n$ for the leading digit of $2^n$ written in decimal (so the sequence $(\ell_n)$ begins $(2, 4, 8, 1, 3, \dots)$). Then
\[ \frac{1}{K} |\{k \in \mathbb{N} \mid 1 \leq k \leq K, \ell_k = 1\}| \longrightarrow \log_{10} 2 \]
as $K \to \infty$.
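Before attempting the exercise it may be reassuring to see the claimed frequency emerge in a computation. The following sketch simply counts leading digits using exact integer arithmetic; it is an experiment rather than a proof, and the function name `leading_digit` and the cutoff `K = 2000` are arbitrary choices.

```python
import math

def leading_digit(m):
    """Leading decimal digit of a positive integer."""
    return int(str(m)[0])

K = 2000   # number of powers of 2 examined (illustrative cutoff)
digits = [leading_digit(2 ** n) for n in range(1, K + 1)]
count_ones = sum(1 for d in digits if d == 1)

print("first leading digits:", digits[:5])
print("frequency of leading digit 1:", count_ones / K)
print("log10(2):", math.log10(2))
```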

2.3.4 Continuous Functions in $L^p$ Spaces

Another important feature of continuous functions (of compact support) is that they form a dense subset of the $L^p$ spaces. We say that a measure on a topological space $X$ is locally finite if each point of $X$ has an open neighbourhood of finite measure.

Proposition 2.51 (Density of $C_c(X)$ in $L^p_\mu(X)$). Let $X$ be a locally compact σ-compact metric space equipped with a locally finite measure $\mu$ on the Borel σ-algebra $\mathcal{B}(X)$. Then, for any $p \in [1, \infty)$, $C_c(X)$ is dense in $L^p_\mu(X)$.


Proof†. We split the proof of the proposition into several steps.

Compact case. We assume first that $X$ is a compact metric space, that $\mu$ is a finite Borel measure on $X$, and start with some preparatory observations. Fix $p \in [1, \infty)$ and let $f \in \mathscr{L}^p_\mu(X)$. Then
\[ f = \Re f + i \Im f, \]
and it is enough to show that each of $\Re f$ and $\Im f$ can be approximated in $L^p_\mu$ by elements of $C_c(X)$. We may therefore assume that $f$ is real-valued, and by writing $f = f^+ - f^-$ we may also assume that it takes values in $[0, \infty)$. Now notice that such a real-valued, non-negative function $f$ is the pointwise limit of the simple functions
\[ f_n(x) = \min\bigl\{ n, \tfrac{1}{2^n} \lfloor 2^n f(x) \rfloor \bigr\} \nearrow f(x) \]
as $n \to \infty$, which implies that
\[ \|f_n - f\|_p = \Bigl( \int_X |f_n - f|^p \, d\mu \Bigr)^{1/p} \longrightarrow 0 \]
as $n \to \infty$, by dominated convergence.

Thus it is sufficient to show that any simple function $f = \sum_{i=1}^{N} a_i \mathbb{1}_{B_i}$ (where $a_i \in \mathbb{R}$ and $B_i \in \mathcal{B}(X)$ have $\mu(B_i) < \infty$ for $i = 1, \dots, N$) can be approximated by elements of $C(X)$. This in turn will follow if we can show that the characteristic function of any Borel set can be approximated by elements of $C(X)$ in the $\|\cdot\|_p$ norm.
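The monotone approximation by the simple functions $f_n$ can be watched on a concrete example. The sketch below is only an illustration under stated assumptions: it takes $p = 1$, Lebesgue measure on $(0, 1]$ approximated by a midpoint rule, and the unbounded but integrable function $f(x) = x^{-1/2}$; the names `dyadic_approximation` and `l1_error` are hypothetical.

```python
import math

def dyadic_approximation(value, n):
    """f_n = min{n, 2^{-n} floor(2^n f)} evaluated at a single value f(x) >= 0."""
    return min(float(n), math.floor(2 ** n * value) / 2 ** n)

def f(x):
    """An unbounded function in L^1((0, 1]), chosen for illustration."""
    return x ** -0.5

N = 20000
midpoints = [(i + 0.5) / N for i in range(N)]   # midpoint rule on (0, 1]

def l1_error(n):
    """Approximate integral of f - f_n over (0, 1]; the integrand is non-negative."""
    return sum(f(x) - dyadic_approximation(f(x), n) for x in midpoints) / N

errors = [l1_error(n) for n in range(1, 9)]
for n, err in enumerate(errors, start=1):
    print(f"n={n}: approximate L^1 distance between f_n and f = {err:.4f}")
```

The error decreases with $n$ but only slowly here, because the truncation at height $n$ cuts off the singularity of $f$ near $0$; the dyadic rounding itself contributes at most $2^{-n}$.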

Defining a σ-algebra. Having made these initial reductions, we can now turn to the heart of the argument (still assuming that $X$ is compact). We define the family
\[ \mathcal{A} = \bigl\{ B \in \mathcal{B} \bigm| \mathbb{1}_B \in \overline{C(X)}^{\|\cdot\|_p} \bigr\} \]
of all Borel sets whose characteristic function can be approximated by elements of $C(X)$ (the notation indicates that the closure is taken within $L^p_\mu(X)$ and with respect to the norm $\|\cdot\|_p$). We claim that $\mathcal{A} = \mathcal{B}$, and will prove this by showing that
• $\mathcal{A}$ contains any open subset of $X$, and
• $\mathcal{A}$ is a σ-algebra.

Open Subsets. Let $O \subseteq X$ be open. Define the closed set $A = X \setminus O$ and the distance function
\[ d(x, A) = \inf_{y \in A} d(x, y). \tag{2.29} \]

† The reader may be familiar with this result for the Lebesgue measure (for example), and this case is sufficient for much of the material that will follow. Thus she may skip the general proof and return to it at a later stage if needed.


This distance function satisfies
\[ |d(x_1, A) - d(x_2, A)| \leq d(x_1, x_2) \tag{2.30} \]
for all $x_1, x_2 \in X$. Indeed, given $x_1, x_2 \in X$ and $\varepsilon > 0$, there exists some $y \in A$ for which $d(x_2, y) \leq d(x_2, A) + \varepsilon$, and so
\[ d(x_1, A) \leq d(x_1, y) \leq d(x_1, x_2) + d(x_2, y) \leq d(x_1, x_2) + d(x_2, A) + \varepsilon, \]
which implies that $d(x_1, A) \leq d(x_1, x_2) + d(x_2, A)$, and hence (2.30) by the symmetry between $x_1$ and $x_2$. This shows the continuity of $x \mapsto d(x, A)$ and so it follows that the function defined by
\[ f_n(x) = \min\{1, n \, d(x, A)\} \]
lies in $C(X)$. Moreover, if $x \in A = X \setminus O$ then $f_n(x) = 0 = \mathbb{1}_O(x)$, while if $x \in O$ then $d(x, A) > 0$ and $f_n(x) \nearrow 1 = \mathbb{1}_O(x)$. Thus $f_n \nearrow \mathbb{1}_O$ as $n \to \infty$ on $X$, and so
\[ \|f_n - \mathbb{1}_O\|_p = \Bigl( \int_O |f_n - \mathbb{1}_O|^p \, d\mu \Bigr)^{1/p} \longrightarrow 0 \]
as $n \to \infty$ by dominated convergence. This shows that $O \in \mathcal{A}$, by definition.
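The approximation of $\mathbb{1}_O$ by the functions $f_n = \min\{1, n \, d(\cdot, A)\}$ can be made explicit in a simple case. The sketch below assumes $X = [0, 1]$ with Lebesgue measure (approximated by a midpoint rule), the open interval $O = (0.3, 0.7)$ and $p = 2$; all names and numerical choices are hypothetical. In this case a direct calculation gives $\|f_n - \mathbb{1}_O\|_2 = \sqrt{2/(3n)}$ for $n \geq 5$, which the computation should reproduce approximately.

```python
def dist_to_complement(x, a=0.3, b=0.7):
    """d(x, A), where A is the complement of the open interval O = (a, b) in [0, 1]."""
    if a < x < b:
        return min(x - a, b - x)
    return 0.0

def f_n(x, n):
    """The continuous function min{1, n d(x, A)} approximating the indicator of O."""
    return min(1.0, n * dist_to_complement(x))

def lp_distance_to_indicator(n, p=2.0, N=20000):
    """Midpoint-rule approximation of the L^p distance between f_n and 1_O on [0, 1]."""
    total = 0.0
    for i in range(N):
        x = (i + 0.5) / N
        indicator = 1.0 if 0.3 < x < 0.7 else 0.0
        total += abs(f_n(x, n) - indicator) ** p
    return (total / N) ** (1.0 / p)

norms = [lp_distance_to_indicator(n) for n in (10, 100, 1000)]
for n, value in zip((10, 100, 1000), norms):
    print(f"n={n}: distance = {value:.5f}, sqrt(2/(3n)) = {(2 / (3 * n)) ** 0.5:.5f}")
```

The Lipschitz estimate (2.30) is what guarantees that each `f_n` is continuous, even though the limit $\mathbb{1}_O$ is not.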

Complements. Suppose that $A \in \mathcal{A}$. Then there exists a sequence of functions $(f_n)$ in $C(X)$ with $\|f_n - 1_A\|_p \to 0$ as $n \to \infty$. Using the sequence $1 - f_n \in C(X)$, we see that $X \setminus A \in \mathcal{A}$.

Finite Intersections. Suppose that $A, B \in \mathcal{A}$. Then there exist sequences of functions $(f_n)$, $(g_n)$ in $C(X)$ with $\|f_n - 1_A\|_p \to 0$ and $\|g_n - 1_B\|_p \to 0$ as $n \to \infty$. We may assume that $f_n$ and $g_n$ take on values in $[0, 1]$, for if not we can replace $f_n$ by $\tilde{f}_n = \max\{0, \min\{1, f_n\}\}$ (which will approximate $1_A$ equally well or better), and similarly $g_n$ by $\tilde{g}_n$. Then $f_n g_n \in C(X)$ and
$$f_n g_n - 1_{A \cap B} = f_n g_n - 1_A 1_B = (f_n - 1_A)\, g_n + 1_A (g_n - 1_B),$$
which implies
$$\|f_n g_n - 1_{A \cap B}\|_p \le \|f_n - 1_A\|_p \|g_n\|_\infty + \|g_n - 1_B\|_p \longrightarrow 0$$
as $n \to \infty$. This shows that $A \cap B \in \mathcal{A}$ as desired.

Countable unions. Let $A, B \in \mathcal{A}$. Then $X \setminus A, X \setminus B \in \mathcal{A}$ and hence
$$B \cup A = X \setminus \bigl( (X \setminus A) \cap (X \setminus B) \bigr) \in \mathcal{A}$$

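by the two steps above. This extends to finite unions by induction. Now suppose that $A_1, A_2, \dots$ all lie in $\mathcal{A}$, and fix $\varepsilon > 0$. Then there exists an $\ell$ such that
$$\mu\Bigl( \bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k \Bigr) < \varepsilon^p.$$
Thus
$$\Bigl\| 1_{\bigcup_{k=1}^{\infty} A_k} - 1_{\bigcup_{k=1}^{\ell} A_k} \Bigr\|_p = \mu\Bigl( \bigcup_{k=1}^{\infty} A_k \setminus \bigcup_{k=1}^{\ell} A_k \Bigr)^{1/p} < \varepsilon.$$
However, since $\bigcup_{k=1}^{\ell} A_k \in \mathcal{A}$ for any $\ell \ge 1$, we already know that there exists an $f \in C(X)$ with
$$\Bigl\| f - 1_{\bigcup_{k=1}^{\ell} A_k} \Bigr\|_p < \varepsilon,$$
and so
$$\Bigl\| f - 1_{\bigcup_{k=1}^{\infty} A_k} \Bigr\|_p < 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, we deduce that $\bigcup_{k=1}^{\infty} A_k \in \mathcal{A}$.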

Concluding the compact case. By the arguments above, $\mathcal{A}$ is a σ-algebra containing all the open subsets of $X$. By definition, $\mathcal{A} \subseteq \mathcal{B}$ and so $\mathcal{A} = \mathcal{B}$ by definition of the Borel σ-algebra $\mathcal{B}$. As explained above, this implies that every simple function, and so also every function, in $L^p_\mu(X)$ can be approximated by continuous functions.

Extending to the locally compact case. Let us now extend the above to the general case where $X$ is locally compact σ-compact metric and $\mu$ is locally finite. By Lemma A.22 we find a sequence $(X_m)$ of compact subsets of $X$ with $X_m \subseteq X_{m+1}^o$ for all $m \ge 1$, and with $X = \bigcup_{m=1}^{\infty} X_m$. Given some $f \in L^p_\mu(X)$ we first note that the sequence $f_m = f 1_{X_m}$ converges to $f$ with respect to $\|\cdot\|_p$ as $m \to \infty$ (by dominated convergence). Given some $\varepsilon > 0$ we choose $m$ such that $\|f - f_m\|_p < \varepsilon$. Next we apply the compact case above and hence we find some $g \in C(X_m)$ with $\|g - f_m\|_{L^p(X_m, \mu)} < \varepsilon$. Applying Tietze's extension theorem (Proposition A.29) we can extend $g$ to an element $g \in C_c(X_{m+1}^o) \subseteq C_c(X)$. Using again the distance function in (2.29) with $A = X_m$ we define the sequence
$$g_n(x) = \bigl( 1 - \min\{1, n\,d(x, X_m)\} \bigr)\, g(x) \in C_c(X).$$
For $x \in X_m$ we have $g_n(x) = g(x)$ and for $x \notin X_m$ we have $|g_n(x)| \searrow 0$ as $n \to \infty$. Now notice that
$$\|g_n - f_m\|_p^p = \int_{X_{m+1}} |g_n - f_m|^p \, d\mu = \|g - f_m\|_{L^p(X_m, \mu)}^p + \int_{X_{m+1} \setminus X_m} |g_n|^p \, d\mu,$$
where the first expression on the right is less than $\varepsilon^p$ by construction of $g$ and the second expression converges to 0 by dominated convergence as $n \to \infty$. Therefore, there exists some $n \ge 1$ such that $\|g_n - f_m\|_p < 2\varepsilon$. Combining this with the choice of $f_m$ above, we obtain $\|f - g_n\|_p < 3\varepsilon$ and $g_n \in C_c(X)$, as desired. □


2.4 Bounded Operators and Functionals

Just as in linear algebra, linear maps are of fundamental importance in functional analysis. However, in infinite-dimensional normed vector spaces continuity of linear maps is not guaranteed.

Lemma 2.52 (Continuity and boundedness). Let $L : V \to W$ be a linear map between the two normed vector spaces $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$. Then $L$ is continuous if and only if the operator norm
$$\|L\| = \|L\|_{\mathrm{op}} = \sup_{v \in V,\ \|v\|_V \le 1} \|Lv\|_W$$
is finite.

Definition 2.53. A continuous linear map $L : V \to W$ between normed vector spaces is called a bounded linear operator. We denote the space of all bounded operators from $V$ to $W$ by $B(V, W)$. For brevity we write $B(V)$ for $B(V, V)$. If $W = \mathbb{R}$ (or $W = \mathbb{C}$ if the field of scalars is $\mathbb{C}$) then we also write $V^*$ for $B(V, \mathbb{R})$ (respectively $B(V, \mathbb{C})$), the dual space of $V$, and elements of the dual space are called linear functionals.

Lemma 2.54 (Space of operators). Let $(V, \|\cdot\|_V)$ and $(W, \|\cdot\|_W)$ be normed vector spaces. Then the space $B(V, W)$ of bounded linear maps from $V$ to $W$ is also a normed vector space with addition and scalar multiplication defined pointwise as in any space of functions, and with the operator norm from Lemma 2.52. If $W$ is a Banach space, then so is $B(V, W)$, and in particular $V^*$ is always a Banach space.

Proof of Lemma 2.52. The case $L = 0$ is trivial, so we may assume that $L$ is not 0. Suppose that $\|L\|_{\mathrm{op}} < \infty$. Then for any $v_0 \in V$ and any $\varepsilon > 0$ we have
$$L\bigl( v_0 + B^V_{\varepsilon / \|L\|_{\mathrm{op}}} \bigr) \subseteq L(v_0) + B^W_\varepsilon$$
since $v \in B^V_{\varepsilon / \|L\|_{\mathrm{op}}} \setminus \{0\}$ implies that
$$\|Lv\|_W = \underbrace{\|v\|_V}_{< \varepsilon / \|L\|_{\mathrm{op}}} \; \underbrace{\bigl\| L\bigl( \|v\|_V^{-1} v \bigr) \bigr\|_W}_{\le \|L\|_{\mathrm{op}}} < \varepsilon.$$
Hence $L$ is continuous. Conversely, suppose that $L$ is continuous. Then there exists some $\delta > 0$ such that
$$L\bigl( B^V_\delta \bigr) \subseteq B^W_1.$$
In particular, $\|v\|_V \le 1$ implies that $\|L(\tfrac{\delta}{2} v)\|_W \le 1$, and so $\|Lv\|_W \le \tfrac{2}{\delta}$. As this holds for all $v$ with $\|v\|_V \le 1$, we deduce that $\|L\|_{\mathrm{op}} \le \tfrac{2}{\delta} < \infty$. □


As the next exercise shows, the notion of boundedness for an operator makes a clear distinction between integration and differentiation of real-valued functions.

Exercise 2.55. Show that the operator $I : C([0, 1]) \to C([0, 1])$ defined as the integral
$$I(f)(x) = \int_0^x f(t) \, dt$$
is continuous. Use this to shorten the argument in the proof for Example 2.24(5) on p. 30. Show also that the operator $D : C^1([0, 1]) \to C([0, 1])$ defined as the derivative $D(f) = f'$ is not continuous if we use the norm $\|\cdot\|_\infty$ on both spaces.
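Before attempting the exercise, it may help to see the contrast numerically. The sketch below is an illustration, not a proof: it uses the monomials $f_n(x) = x^n$ as one convenient test family (an assumption of this example), approximates the supremum norm by sampling a grid, and uses closed forms for the derivative and the integral.

```python
def sup_norm(func, steps=10000):
    """Approximate the supremum norm on [0, 1] by sampling a uniform grid."""
    return max(abs(func(i / steps)) for i in range(steps + 1))

def monomial(n):
    return lambda x: x ** n

def monomial_derivative(n):
    # D(f_n)(x) = n x^(n-1), computed in closed form.
    return lambda x: n * x ** (n - 1)

def monomial_integral(n):
    # I(f_n)(x) = integral of t^n from 0 to x, computed in closed form.
    return lambda x: x ** (n + 1) / (n + 1)

ratios_D = []  # ||D f_n|| / ||f_n||, grows with n
ratios_I = []  # ||I f_n|| / ||f_n||, stays bounded by 1
for n in (1, 5, 25, 125):
    norm_f = sup_norm(monomial(n))
    ratios_D.append(sup_norm(monomial_derivative(n)) / norm_f)
    ratios_I.append(sup_norm(monomial_integral(n)) / norm_f)
```

The ratios for $D$ grow without bound along this family while those for $I$ stay at most 1, which is the behaviour the exercise asks the reader to establish rigorously.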

Notice that the definition of the operator norm immediately gives the general inequality $\|Lv\|_W \le \|L\|_{\mathrm{op}} \|v\|_V$ for all $v \in V$, and the operator norm may be characterized as being the smallest number $C$ with the property that
$$\|Lv\|_W \le C \|v\|_V \tag{2.31}$$
for all $v \in V$. We will use both these statements frequently in the sequel without comment.

Essential Exercise 2.56. Prove that the operator norm of a bounded operator $L : V \to W$ between two normed vector spaces is the smallest constant $C \ge 0$ such that (2.31) holds for all $v \in V$.

Proof of Lemma 2.54. As indicated in the lemma, for $L_1, L_2 \in B(V, W)$ and a scalar $\alpha$ we define $\alpha L_1 + L_2$ by
$$(\alpha L_1 + L_2)(v) = \alpha L_1(v) + L_2(v)$$
for all $v \in V$. This is clearly another linear map. In order to bound its operator norm, let $v \in V$ with $\|v\|_V \le 1$. Then
$$\|(\alpha L_1 + L_2)(v)\|_W = \|\alpha L_1(v) + L_2(v)\|_W \le |\alpha| \|L_1(v)\|_W + \|L_2(v)\|_W \le |\alpha| \|L_1\|_{\mathrm{op}} + \|L_2\|_{\mathrm{op}},$$
and so $\|\alpha L_1 + L_2\|_{\mathrm{op}} \le |\alpha| \|L_1\|_{\mathrm{op}} + \|L_2\|_{\mathrm{op}}$. That is, the operator norm satisfies the triangle inequality and one half of the homogeneity property. The reverse inequality for homogeneity of the operator norm follows easily by considering the cases $\alpha = 0$ and $\alpha \ne 0$ separately, as in the proof of Lemma 2.15. Strict positivity is clear, so we have shown that $B(V, W)$ is a normed vector space with the operator norm.

Now suppose that $W$ is a Banach space and that $(L_n)$ is a Cauchy sequence in $B(V, W)$. We claim that
$$L(v) = \lim_{n \to \infty} L_n(v)$$


defines an element $L$ of $B(V, W)$ which is the limit of the sequence with respect to the operator norm. To see that $L(v)$ is well-defined it is enough to check that $(L_n(v))$ is a Cauchy sequence, which follows at once from the bound
$$\|L_m(v) - L_n(v)\|_W = \|(L_m - L_n)(v)\|_W \le \|L_m - L_n\|_{\mathrm{op}} \|v\|_V,$$
which (for fixed $v$) may be made as small as we please for $m, n$ large by the Cauchy property for the sequence $(L_n)$. To see that the limit $L$ is a bounded operator one has to show that it is linear (which we leave as an exercise) and that it is bounded. For the latter, assume that $v \in V$ has $\|v\|_V \le 1$ and choose $N(\varepsilon)$ as in the Cauchy property for $(L_n)$ with $\|L_m - L_n\|_{\mathrm{op}} \le \varepsilon$ for $m, n \ge N(\varepsilon)$. Continuity of the norm now gives
$$\|L_n(v) - L(v)\|_W = \lim_{m \to \infty} \|L_n(v) - L_m(v)\|_W \le \varepsilon$$
for $n \ge N(\varepsilon)$. Taking the supremum over $v$ with $\|v\|_V \le 1$, we get $\|L - L_n\|_{\mathrm{op}} \le \varepsilon$, so $L$ is bounded with $\|L\|_{\mathrm{op}} \le \|L_n\|_{\mathrm{op}} + \varepsilon$, and as $\varepsilon > 0$ is arbitrary we also see that $L_n \to L$ as $n \to \infty$ with respect to $\|\cdot\|_{\mathrm{op}}$. □

A word about notation: Where the spaces concerned are clear, or where we wish to emphasize certain aspects of the spaces, we will for brevity often use $\|\cdot\|$ or $\|\cdot\|_X$ to mean the appropriate norm in that situation. Thus, for example, depending on context, the symbols $\|L\|$, $\|L\|_{\mathrm{op}}$, and $\|L\|_{B(V,W)}$ all mean the same thing. A good exercise for the reader is to ensure that they can identify the norms in each case.

Lemma 2.57 (Sub-multiplicativity of operator norms). Let $V, W, Z$ be three normed vector spaces, and let $R : V \to W$ and $S : W \to Z$ be bounded operators. Then $S \circ R : V \to Z$ is also a bounded operator, and $\|S \circ R\| \le \|S\| \|R\|$. In particular, if $L : V \to V$ is a bounded operator then $\|L^n\| \le \|L\|^n$ for all $n \ge 1$.

Proof. We have
$$\|S \circ R(v)\| \le \|S\| \|R(v)\| \le \|S\| \|R\| \|v\| \le \|S\| \|R\|$$
for any $v \in V$ with $\|v\| \le 1$. □

Exercise 2.58. Compute the operator norm of the continuous map $f \mapsto f$ when viewed:
(a) as a map from the Banach space $C^1([0, 1])$ to $C([0, 1])$ (and where the former is equipped with the norm $\|f\|_{C^1([0,1])} = \max\{\|f\|_\infty, \|f'\|_\infty\}$ for $f \in C^1([0, 1])$); and
(b) as a map $C([0, 1]) \to L^1_m([0, 1])$, where $m$ denotes Lebesgue measure on $[0, 1]$.
(c) Compute the operator norm of the composition of the maps from (a) and from (b).
(d) Now restrict the maps in (a), (b) and (c) to the subspace of functions $f$ with $f(0) = 0$, and compute the operator norms again.


The following result is both quite easy and extremely useful for the theory to come.

Proposition 2.59 (Unique extension to completion). Let $V$ be a normed vector space, let $V_0 \subseteq V$ be a dense subspace, and assume that $L_0 : V_0 \to W$ is a bounded operator into a Banach space $W$. Then $L_0$ has a unique bounded extension $L : V \to W$, that is, a bounded linear map $L : V \to W$ which satisfies $L|_{V_0} = L_0$. Moreover, $\|L\|_{B(V,W)} = \|L_0\|_{B(V_0,W)}$.

We implicitly assume here that a subspace $V_0 \subseteq V$ is equipped with the restriction of the norm on $V$ to $V_0$. This is important to remember in applications where the subspace may have other natural norms defined on it.

Proof of Proposition 2.59. For any $v \in V$ there is a sequence $(v_n)$ in $V_0$ with $v_n \to v$ as $n \to \infty$. In particular, this implies that $(v_n)$ is a Cauchy sequence in $V_0$, and since $L_0 : V_0 \to W$ is bounded (and so Lipschitz), it follows that $(L_0(v_n))$ is a Cauchy sequence in $W$. If $(v_n')$ is another sequence in $V_0$ with $v_n' \to v$ as $n \to \infty$ then it is clear that $v_n - v_n' \to 0$ as $n \to \infty$ and so
$$L_0(v_n) - L_0(v_n') \longrightarrow 0$$
as $n \to \infty$ since $L_0$ is bounded (and so continuous at 0). Thus it makes sense to define an operator $L$ on $V$ by
$$L(v) = \lim_{n \to \infty} L_0(v_n) \in W,$$
because $W$ is a Banach space. Notice that by density and the desired continuity of the extension, this is the only possible definition of a bounded operator that extends $L_0$. One can quickly check that $L$ is a linear map from $V$ to $W$. Moreover, if $v \in V$ and $(v_n)$ is a sequence in $V_0$ with $v_n \to v$ as $n \to \infty$, then
$$\|L(v)\| = \lim_{n \to \infty} \|L_0(v_n)\| \le \|L_0\| \underbrace{\lim_{n \to \infty} \|v_n\|}_{= \|v\|},$$
showing that $L$ is bounded, with $\|L\| \le \|L_0\|$. On the other hand $L|_{V_0} = L_0$, so $\|L\| \ge \|L_0\|$. □

Corollary 2.60. Any two completions $B_1$ and $B_2$ of a given normed vector space $V$ are isometrically isomorphic.

Here a completion of a normed vector space $V$ is a Banach space $B$ containing an isometric dense copy of $V$, just as in the construction in Section 2.2.2.

Proof of Corollary 2.60. Suppose that $\phi_1 : V \to B_1$ and $\phi_2 : V \to B_2$ are isometric embeddings associated to the two completions, as illustrated in Figure 2.4.

[Diagram: $\phi_1 : V \to B_1$, $\phi_2 : V \to B_2$, $\psi_1 : B_1 \to B_2$, $\psi_2 : B_2 \to B_1$.]

Fig. 2.4: The two given completions $\phi_1$, $\phi_2$ and the maps $\psi_1$, $\psi_2$ to be constructed.

Since $\phi_1$ and $\phi_2$ are isometries, the map
$$\phi_2 \circ \phi_1^{-1} : \phi_1(V) \longrightarrow \phi_2(V) \subseteq B_2, \qquad \phi_1(v) \longmapsto \phi_2(v)$$
is a well-defined bounded operator defined on a dense subset $\phi_1(V) \subseteq B_1$. By Proposition 2.59 there is an extension $\psi_1 : B_1 \to B_2$ with norm $\|\psi_1\| = \|\phi_2 \circ \phi_1^{-1}\| = 1$. Similarly there exists an extension $\psi_2 : B_2 \to B_1$ which extends $\phi_1 \circ \phi_2^{-1}$ and which also has norm $\|\psi_2\| = 1$. It follows that $\psi_2 \circ \psi_1$ and $\psi_1 \circ \psi_2$ are extensions of the identity map on $\phi_1(V)$ and on $\phi_2(V)$ respectively. By uniqueness of the extension in Proposition 2.59 we must have $\psi_2 \circ \psi_1 = I_{B_1}$ and $\psi_1 \circ \psi_2 = I_{B_2}$. We also see that
$$\|b\| = \|\psi_2(\psi_1(b))\| \le \|\psi_1(b)\| \le \|b\|$$
for any $b \in B_1$, so that $\psi_1$ is an isometry from $B_1$ to $B_2$ with $\psi_2$ its inverse. □

Exercise 2.61. Let $D = \{z \in \mathbb{C} \mid |z| < 1\} \subseteq \mathbb{C}$ be the open unit disk, and parameterize the circle of radius $r \in (0, 1)$ by the map $\gamma_r : [0, 1] \to \mathbb{C}$ defined by $\gamma_r(t) = r e^{2\pi i t}$. Let $V$ be the space of functions $f \in C(\overline{D})$ holomorphic on $D$, and fix $p \in [1, \infty)$.
(a) Equip $V$ with the norm
$$\|f\|_{H^p(D)} = \sup_{r \in (0,1)} \Bigl( \int_0^1 |f(\gamma_r(t))|^p \, dt \Bigr)^{1/p}.$$
Show that the linear map $E_z : f \mapsto f(z)$ is continuous with respect to $\|\cdot\|_{H^p(D)}$ for all $z \in D$. Also show that if $O \subseteq D$ is open with compact closure $\overline{O} \subseteq D$, then $V \ni f \mapsto f|_{\overline{O}} \in C(\overline{O})$ is a bounded operator with respect to $\|\cdot\|_{H^p(D)}$ and $\|\cdot\|_\infty$ on $C(\overline{O})$. In particular, conclude that there exists a canonical injective map from the completion $H^p(D)$ of $V$, known as a Hardy space, into the space of holomorphic functions on $D$.
(b) Equip $V$ with the norm $\|f\|_{A^p(D)} = \|f\|_{L^p(D)}$, and repeat the problems from (a) to obtain the$^{(7)}$ Bergman space $A^p(D)$.


Exercise 2.62. For each of the five norms on R[x] given in Example 2.2(7), find a Banach space containing R[x] for which the induced norm obtained by restriction coincides with the given norm on R[x].

2.4.1 The Norm of Continuous Functionals on $C_0(X)$

Let $X$ be a locally compact metric space, and let $\mu$ be a finite Borel measure. Then
$$\mu : f \longmapsto \int f \, d\mu$$
is a continuous functional on $C_0(X)$. Indeed,
$$\Bigl| \int f \, d\mu \Bigr| \le \int |f| \, d\mu \le \mu(X) \|f\|_\infty$$
shows the continuity by Lemma 2.52. More generally, if $\mu$ is a Borel measure on $X$ and $g \in L^1_\mu(X)$ then
$$g \, d\mu : C_0(X) \ni f \longmapsto \int f g \, d\mu \tag{2.32}$$
is also a continuous functional on $C_0(X)$. Again this is easy to see since
$$\Bigl| \int f g \, d\mu \Bigr| \le \int |f| |g| \, d\mu \le \|f\|_\infty \|g\|_{L^1_\mu}. \tag{2.33}$$

In fact, a more precise statement holds, but this takes a little more work.

Lemma 2.63 (Operator norm of integration). Suppose that $\mu$ is a Borel measure on a locally compact σ-compact metric space $X$ and $g$ is a function in $L^1_\mu(X)$. Then the norm of the functional on $C_0(X)$ defined in (2.32) is precisely $\|g\|_{L^1_\mu}$.

Proof†. Let
$$h(x) = \arg(g(x)) = \begin{cases} \dfrac{g(x)}{|g(x)|} & \text{if } g(x) \ne 0, \\ 0 & \text{if } g(x) = 0. \end{cases}$$
Clearly $h \in L^\infty_\mu(X)$ and
$$\int h g \, d\mu = \int |g| \, d\mu = \|g\|_{L^1_\mu}.$$
We wish to approximate $h$ by continuous functions. Fix $\varepsilon > 0$. By Lusin's theorem (Theorem B.17) applied to the finite measure $\nu$ defined by $d\nu = |g| \, d\mu$ there exists a compact set $K \subseteq X$ such that the restriction $h|_K$ of $h$ to $K$ is continuous, and $\int_{X \setminus K} |g| \, d\mu < \varepsilon$. By Tietze's extension theorem (Proposition A.29) the restriction $h|_K$ can be extended to a continuous function $f_\varepsilon \in C_c(X)$ of compact support. We may assume that $\|f_\varepsilon\|_\infty \le 1$, because if this is not the case we may replace $f_\varepsilon$ by the continuous function
$$x \longmapsto \begin{cases} f_\varepsilon(x) & \text{if } |f_\varepsilon(x)| \le 1, \\ \dfrac{f_\varepsilon(x)}{|f_\varepsilon(x)|} & \text{if } |f_\varepsilon(x)| > 1. \end{cases}$$
Thus
$$\|g \, d\mu\|_{\mathrm{op}} \ge \Bigl| \int_X f_\varepsilon g \, d\mu \Bigr| \ge \Bigl| \int_K f_\varepsilon g \, d\mu \Bigr| - \Bigl| \int_{X \setminus K} f_\varepsilon g \, d\mu \Bigr| \ge \int_K |g| \, d\mu - \int_{X \setminus K} |g| \, d\mu > \|g\|_{L^1_\mu} - 2\varepsilon.$$
Since $\varepsilon > 0$ was arbitrary, this shows that $\|g\|_{L^1_\mu} \le \|g \, d\mu\|_{\mathrm{op}}$, and the reverse inequality follows from (2.33). □

† The result of Section 2.4.1 will initially only be used in more concrete settings. The reader may therefore skip the proof and return to it later if needed.

Exercise 2.64. Let $X$ be a compact metric space, $\mu$ a Borel measure on $X$, and $g$ a function in $L^1_\mu(X)$. Give and prove a precise criterion in terms of properties of $g$ for the existence of a function $f \in C(X)$ with $\|f\|_\infty \le 1$ such that $\bigl| \int f g \, d\mu \bigr| = \|g\|_1$.

2.4.2 Banach Algebras

In many situations it makes sense to multiply elements of a normed vector space with each other. Recall that an algebra is a vector space and simultaneously a ring in such a way that the two structures are compatible: addition in the vector space and addition in the ring are the same, and the scalar multiplication and the ring multiplication satisfy $(\alpha x) y = x (\alpha y) = \alpha (x y)$ for all scalars $\alpha$ and elements $x, y \in A$.

Definition 2.65. Let $A$ be a Banach space, and assume there is a multiplication operation $(x, y) \mapsto xy$ from $A \times A$ to $A$ such that addition and multiplication make $A$ into an algebra, with the sub-multiplicativity property that $\|xy\| \le \|x\| \|y\|$ for all $x, y \in A$. Then $A$ is called a Banach algebra. Elements $a, b$ of an algebra are said to commute if $ab = ba$, and the algebra is said to be commutative if any $a, b \in A$ commute.


Recall that a ring or an algebra does not need to have a unit; if a non-trivial ring $A$ has a unit $1_A$ satisfying $1_A a = a 1_A = a$ for all $a \in A$ then it is called unital.

The additional axiom on the norm makes the product operation continuous by the following argument. Fix $\varepsilon \in (0, 1)$ and $x, y \in A$. Then $\|x' - x\| < \varepsilon < 1$ and $\|y' - y\| < \varepsilon$ together imply that
$$\|x' y' - x y\| \le \|x' (y' - y)\| + \|(x' - x) y\| \le (\|x'\| + \|y\|)\, \varepsilon \le (\|x\| + 1 + \|y\|)\, \varepsilon. \tag{2.34}$$
Since $\varepsilon \in (0, 1)$ was arbitrary, this shows the continuity of the product map at $(x, y) \in A \times A$.

Example 2.66.
(1) The continuous functions $C(X)$ on a compact topological space $X$ with the supremum norm form a Banach algebra with respect to the pointwise multiplication operation $(fg)(x) = f(x) g(x)$ for all $x \in X$. Notice that the constant function 1 is a unit in this ring.
(2) Let $X$ be a non-compact topological space. Then $C_0(X)$ is a Banach algebra with respect to the supremum norm and pointwise multiplication as in (1) above, but it does not have a unit.
(3) If $V$ is any Banach space, then $B(V) = B(V, V)$ is a Banach algebra with respect to composition. The sub-multiplicativity property of the operator norm in Definition 2.65 is precisely the content of Lemma 2.57. The algebra has a unit, namely the identity map $I(v) = v$ for all $v \in V$.
(4) A special case of (3) above is the case $V = \mathbb{R}^n$. By choosing a basis for $\mathbb{R}^n$ we may identify $B(\mathbb{R}^n)$ with the space of $n \times n$ real matrices.

In a Banach algebra with unit, we can apply many well-known functions to its elements and obtain new elements of the Banach algebra. For example, if $a$ is any element of a unital Banach algebra, then we may define
$$\exp a = \sum_{n=0}^{\infty} \frac{a^n}{n!},$$
where $a^0 = 1_A$ is the unit in $A$. The series defines an element of $A$ by Lemma 2.28. We will return to the topic of Banach algebras in Chapter 11.
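The exponential series can be tried out in the matrix algebra of Example 2.66(4). The following is a minimal sketch under stated assumptions: the algebra is the $2 \times 2$ real matrices, the element is the rotation generator with parameter $t = 1$ (an illustrative choice), and the helper names and the number of terms are hypothetical. The comparison uses the known closed form of the exponential of that generator, a rotation by the angle $t$.

```python
import math

def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def mat_add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def mat_scale(c, a):
    return [[c * x for x in row] for row in a]

def exp_series(a, terms=30):
    """Partial sum of a^n / n! over 0 <= n < terms for a square matrix a."""
    size = len(a)
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    total = identity
    term = identity  # holds a^n / n! for the current n
    for n in range(1, terms):
        term = mat_scale(1.0 / n, mat_mul(term, a))
        total = mat_add(total, term)
    return total

t = 1.0
generator = [[0.0, -t], [t, 0.0]]
result = exp_series(generator)
# Known closed form: rotation by the angle t.
expected = [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]
```

Because the terms are bounded in norm by $\|a\|^n / n!$, a few dozen terms already agree with the closed form to machine precision for elements of moderate norm.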

2.5 Ordinary Differential Equations

We want to briefly indicate how even the simplest differential equations can lead directly to the study of integral operators, which may be analyzed using tools introduced above (and in Chapter 6). Consider first the differential equation
$$f''(x) + f(x) = g(x) \tag{2.35}$$
with the initial values
$$f(0) = 1, \quad f'(0) = 0. \tag{2.36}$$
Let us recall briefly the familiar approach to solving such an equation. First one finds all solutions to the homogeneous equation $f''(x) + f(x) = 0$, giving
$$f(x) = A \sin x + B \cos x \tag{2.37}$$
for constants $A$ and $B$. Then one moves on to the problem of finding one particular solution $f_p$ to the equation
$$f_p''(x) + f_p(x) = g(x), \tag{2.38}$$
ignoring the initial values, which may be done by a sophisticated guess if $g$ is sufficiently simple, or by using the method of variation of parameters (that is, treating $A$ and $B$ as functions of $x$ rather than constants). Finally, taking the sum of $f$ from (2.37) and a solution to (2.38), one chooses the constants $A$ and $B$ in the solution to the homogeneous equation to satisfy the initial values.

Rather than going through this in detail, we claim that the function
$$f(x) = \cos(x) + \int_0^x \sin(x - t)\, g(t) \, dt \tag{2.39}$$

is a solution to the initial value problem. This is easily checked by a calculation: $f(0) = 1$ clearly, and
$$f'(x) = -\sin x + \sin(x - x)\, g(x) + \int_0^x \cos(x - t)\, g(t) \, dt,$$
so $f'(0) = 0$. Finally,
$$f''(x) = -\cos x + \cos(x - x)\, g(x) - \int_0^x \sin(x - t)\, g(t) \, dt = -f(x) + g(x),$$
as required. To summarize, we have shown that (2.35)–(2.36) together are equivalent to (2.39).

2.5.1 The Volterra Equation

If the original differential equation in (2.35) is changed slightly, to take the form


$$f''(x) + f(x) = \sigma(x) f(x), \tag{2.40}$$
with the same initial values $f(0) = 1$ and $f'(0) = 0$, then the discussion above does not solve the equation. Nonetheless, the ideas are still useful, since it transforms the equation into the integral equation
$$f(x) = \cos(x) + \int_0^x \sin(x - t) \underbrace{\sigma(t) f(t)}_{g(t)} \, dt. \tag{2.41}$$
Now define $k(x, t) = \sin(x - t)\, \sigma(t)$ so that (2.41) takes the form
$$f = u + K(f), \tag{2.42}$$
where $u(x) = \cos x$ and
$$K(f)(x) = \int_0^x k(x, t) f(t) \, dt.$$
Due to its inventors and its nature $K$ is called a Hilbert–Schmidt integral operator. The function $k$ will be referred to as the kernel of the integral operator.

Solving the perturbed equation (2.40) with initial values turns out to be straightforward at the level of abstraction aimed at in functional analysis. We can rewrite the equation (2.42) as a Volterra equation
$$(I - K) f = u$$
where $I$ is the identity map. The solution $f$ is then given by applying the inverse operator $(I - K)^{-1}$ to $u$, which we may calculate (in this particular case) using an operator form of the geometric series (in this context the geometric series is usually called a von Neumann series),
$$(I - K)^{-1} = \sum_{n=0}^{\infty} K^n,$$
and hence
$$f = \sum_{n=0}^{\infty} K^n u.$$

The heuristic above is made formal in the following lemma.

Lemma 2.67. Suppose that $k \in C([0, 1]^2)$. Then
$$K(f)(x) = \int_0^x k(x, t) f(t) \, dt$$
defines a bounded linear operator $K : C([0, 1]) \to C([0, 1])$ with $\|K\| \le \|k\|_\infty$, and more generally with $\|K^n\| \le \|k\|_\infty^n / n!$ for $n \ge 1$. In particular, the geometric series
$$(I - K)^{-1} = \sum_{n=0}^{\infty} K^n$$
converges in $B\bigl( C([0, 1]) \bigr)$. It follows that the integral equation $(I - K) f = u$ has a unique solution for any $u \in C([0, 1])$. For $u(x) = \cos x$, $\sigma \in C([0, 1])$ and $k(x, t) = \sin(x - t)\, \sigma(t)$ with $x, t \in [0, 1]$, this solution belongs to $C^2([0, 1])$, and solves the initial value problem
$$\begin{cases} f'' + f = \sigma f \\ f(0) = 1, \ f'(0) = 0 \end{cases} \tag{2.43}$$
on $[0, 1]$.

Proof. As $k$ is uniformly continuous, it is easy to check that $K(f) \in C([0, 1])$ for every $f \in C([0, 1])$. Indeed, if $\varepsilon > 0$ then there exists some $\delta > 0$ for which
$$|x_1 - x_2| < \delta \implies |k(x_1, t) - k(x_2, t)| < \varepsilon$$
for all $t \in [0, 1]$. Multiplying by $f(t)$ and integrating from 0 to $x$ shows that
$$|x_1 - x_2| < \delta \implies |K(f)(x_1) - K(f)(x_2)| < \|f\|_\infty\, \varepsilon.$$
Also, $K$ is linear, and
$$\|Kf\|_\infty \le \sup_{x \in [0,1]} \Bigl| \int_0^x k(x, t) f(t) \, dt \Bigr| \le \|k\|_\infty \|f\|_\infty,$$
so $K$ defines a bounded linear operator with $\|K\| \le \|k\|_\infty$. To prove the estimate on $\|K^n\|$ we need to be a bit more careful. For every $x \in [0, 1]$ we have
$$|K(f)(x)| \le \int_0^x |k(x, t) f(t)| \, dt \le x \|k\|_\infty \|f\|_\infty.$$
Suppose we have already shown for $x \in [0, 1]$ that
$$|K^n(f)(x)| \le \frac{x^n}{n!} \|k\|_\infty^n \|f\|_\infty. \tag{2.44}$$
Then

$$\bigl| K^{n+1}(f)(x) \bigr| \le \int_0^x |k(x, t)|\, |K^n(f)(t)| \, dt \le \int_0^x \frac{t^n}{n!} \|k\|_\infty^{n+1} \|f\|_\infty \, dt = \frac{x^{n+1}}{(n+1)!} \|k\|_\infty^{n+1} \|f\|_\infty$$

for all $x \in [0, 1]$. By induction on $n$, it follows that (2.44) holds for all $n \ge 1$. Hence $\|K^n\| \le \|k\|_\infty^n / n!$ for all $n \ge 1$, as claimed.

By Lemma 2.54, $B\bigl( C([0, 1]) \bigr)$ is a Banach space. It follows by Lemma 2.28 that the absolutely convergent series $\sum_{n=0}^{\infty} K^n$ also converges in $B\bigl( C([0, 1]) \bigr)$. However,
$$(I - K) \sum_{n=0}^{\infty} K^n = \Bigl( \sum_{n=0}^{\infty} K^n \Bigr) (I - K) = \sum_{n=0}^{\infty} K^n - \sum_{n=1}^{\infty} K^n = I,$$
so the sum $\sum_{n=0}^{\infty} K^n$ is the inverse of $I - K$ and, for any $u \in C([0, 1])$, the equation $(I - K) f = u$ has the unique solution $f = (I - K)^{-1} u$. In the case $u(x) = \cos x$, $k(x, t) = \sin(x - t)\, \sigma(t)$ for $x, t \in [0, 1]$ and $\sigma \in C([0, 1])$, the calculation after (2.39) shows that the solution $f$ belongs to $C^2([0, 1])$ and solves (2.40) with the initial values $f(0) = 1$ and $f'(0) = 0$. □

2.5.2 The Sturm–Liouville Equation

We now make two more small changes to the initial value problem (2.35) and (2.36). Fix a parameter $\lambda \ge 0$ and consider instead the Sturm–Liouville equation
$$f'' + \lambda^2 f = g, \tag{2.45}$$
with the boundary conditions $f(0) = f(1) = 0$. These boundary conditions (made at the two end points of $[0, 1]$) replace the initial value conditions in (2.36), and that change has a surprisingly deep impact on the resulting equation. As we recall below, the space of functions satisfying (2.45) is, if non-empty, a two-dimensional affine subspace of functions, so that the additional boundary conditions might lead to a unique solution.

We may proceed just as before. The functions of the form
$$f(x) = A \cos(\lambda x) + B \sin(\lambda x)$$
give all solutions to the homogeneous differential equation $f'' + \lambda^2 f = 0$. Next one needs to find a particular solution $f_p$ to
$$f_p'' + \lambda^2 f_p = g$$


(ignoring the boundary conditions). After this, one would use the solutions to the homogeneous differential equation to satisfy the boundary conditions. Explicitly, given $f_p$ we can calculate the vector
$$\begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix} \tag{2.46}$$
and try to express it as a linear combination of the two vectors
$$\begin{pmatrix} \cos(\lambda 0) \\ \cos(\lambda 1) \end{pmatrix} = \begin{pmatrix} 1 \\ \cos \lambda \end{pmatrix} \quad \text{and} \quad \begin{pmatrix} \sin(\lambda 0) \\ \sin(\lambda 1) \end{pmatrix} = \begin{pmatrix} 0 \\ \sin \lambda \end{pmatrix}.$$
If
$$\det \begin{pmatrix} 1 & 0 \\ \cos \lambda & \sin \lambda \end{pmatrix} = \sin \lambda$$
is non-zero, then this is always possible and we find a unique solution to the boundary value problem. However, if $\lambda \in \pi \mathbb{Z}$ then $\sin \lambda = 0$ and we may be unlucky with the value of the vector (2.46): if the vectors
$$\begin{pmatrix} f_p(0) \\ f_p(1) \end{pmatrix}, \quad \begin{pmatrix} 1 \\ \cos \lambda \end{pmatrix}$$
are linearly independent, then there will not be a solution to the boundary value problem.

This obstruction to being able to find a solution to the boundary value problem may be phrased in terms of another integral operator.

Lemma 2.68. Define the continuous Green function on $[0, 1]^2$ by
$$G(s, t) = \begin{cases} s(t - 1) & \text{for } 0 \le s \le t \le 1; \\ t(s - 1) & \text{for } 0 \le t \le s \le 1. \end{cases}$$
Then for $f, h \in C([0, 1])$ the conditions
$$f \in C^2([0, 1]) \quad \text{and} \quad \begin{cases} f(0) = f(1) = 0 \\ f'' = h \end{cases} \tag{2.47}$$
are equivalent to the operator equation $f = Kh$, where $K$ is the operator defined by
$$K(h)(s) = \int_0^1 G(s, t)\, h(t) \, dt. \tag{2.48}$$

Proof. Assume first that $f = Kh$. Then
$$f(0) = \int_0^1 \underbrace{G(0, t)}_{=0} h(t) \, dt = 0,$$
and $f(1) = 0$ for the same reason. Moreover,
$$f(s) = \int_0^s t(s - 1)\, h(t) \, dt + \int_s^1 s(t - 1)\, h(t) \, dt,$$
$$f'(s) = s(s - 1) h(s) + \int_0^s t\, h(t) \, dt - s(s - 1) h(s) + \int_s^1 (t - 1)\, h(t) \, dt,$$
where the two terms $s(s - 1) h(s)$ cancel, and
$$f''(s) = s h(s) - (s - 1) h(s) = h(s),$$
so $f$ is a solution of the boundary value problem (2.47). To see the converse, notice that the boundary value problem has a solution (by the argument above). However, our previous discussion of the boundary value problem associated to the Sturm–Liouville equation (2.45) (which needs to be modified for the case $\lambda = 0$) shows that in this case the solution is unique. Thus the equivalence of (2.47) and $f = Kh$ is established. □

Exercise 2.69. Modify the argument for the Sturm–Liouville equation for the case $\lambda = 0$, and show that the solution is always unique.

In particular, the fact that $s_n(x) = \sin(\pi n x)$ for any $n \in \mathbb{N}$ satisfies
$$\begin{cases} s_n(0) = s_n(1) = 0 \\ s_n'' = -(\pi n)^2 s_n \end{cases}$$
implies by Lemma 2.68 that $s_n = -(\pi n)^2 K(s_n)$. In other words, the values $\mu_n = -(\pi n)^{-2}$ for $n = 1, 2, \dots$ are eigenvalues of the integral operator $K$ (actually these are all the eigenvalues of $K$; see Exercise 2.70). Thus we can rephrase our earlier observation regarding the equivalent formulations
$$\begin{cases} f'' + \lambda^2 f = g \\ f(0) = f(1) = 0 \end{cases} \iff f = K(-\lambda^2 f + g) \iff \bigl( I + \lambda^2 K \bigr) f = K(g)$$
by saying that this differential equation always has a unique solution for any $g$ unless $\lambda = \pi n$ corresponds to one of the eigenvalues $\mu_n = -(\pi n)^{-2} = -\lambda^{-2}$ of $K$.
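The eigenvalue relation $K(s_n) = \mu_n s_n$ can be checked numerically from the explicit Green function of Lemma 2.68. This is only a sketch: the integral in (2.48) is replaced by a midpoint rule, and the grid sizes, sample points and function names are illustrative assumptions.

```python
import math

def green(s, t):
    """Green function of Lemma 2.68 on [0, 1]^2."""
    return s * (t - 1.0) if s <= t else t * (s - 1.0)

def apply_K(h, s, steps=2000):
    """Midpoint-rule approximation of K(h)(s) = int_0^1 G(s, t) h(t) dt."""
    width = 1.0 / steps
    return width * sum(green(s, (j + 0.5) * width) * h((j + 0.5) * width)
                       for j in range(steps))

def eigen_defect(n, samples=50):
    """Largest sampled value of |K(s_n)(s) - mu_n s_n(s)| for s in [0, 1]."""
    mu = -1.0 / (math.pi * n) ** 2
    s_n = lambda x: math.sin(math.pi * n * x)
    return max(abs(apply_K(s_n, i / samples) - mu * s_n(i / samples))
               for i in range(samples + 1))

# Each defect should be close to zero, up to quadrature error.
defects = [eigen_defect(n) for n in (1, 2, 3)]
```

A small defect is consistent with $\mu_n = -(\pi n)^{-2}$ being an eigenvalue with eigenfunction $s_n$; it does not by itself show that these are all the eigenvalues, which is the content of Exercise 2.70.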


In Chapter 6 we will start the discussion of eigenvalues of operators, and as discussed in Section 1.3 it is easier for this to restrict to the case of operators on Hilbert spaces. As we will show in Chapter 6 the operator $K$ also makes sense in $L^2([0, 1])$ and is completely diagonalizable on that space.

Exercise 2.70 (A special case of elliptic regularity). Suppose that the function $f$ in $L^2([0, 1])$ satisfies $Kf = \lambda f$ for some $\lambda$ in $\mathbb{R} \setminus \{0\}$, where $K$ is the operator (2.48) discussed in connection with the Sturm–Liouville problem. Show that $f$ must be smooth on $(0, 1)$, and deduce that $f$ and $\lambda$ must satisfy the conditions found above.

Exercise 2.71. In this exercise we generalize the connection between the Sturm–Liouville boundary value problem and integral operators. Let $a < b$ be real numbers, and assume that $p \in C^1([a, b])$ and $q \in C([a, b])$ are real-valued functions with $p > 0$ and $q > 0$. We define the second-order differential operator
$$L(f) = (p f')' + q f.$$
Also let $\alpha_1, \alpha_2, \beta_1, \beta_2 \in \mathbb{R}$ and define the boundary conditions
$$B_1(f) = \alpha_1 f(a) + \alpha_2 f'(a) = 0, \qquad B_2(f) = \beta_1 f(b) + \beta_2 f'(b) = 0.$$
Assume that $f_1$ and $f_2$ are fundamental solutions† of the differential equation $L(f) = 0$ such that we also have $B_1(f_1) = B_2(f_2) = 0$, but $B_1(f_2) \ne 0$ and $B_2(f_1) \ne 0$. Show that $p (f_1 f_2' - f_1' f_2) = c$ is a constant. Using this, define an associated Green function
$$G(s, t) = \begin{cases} \frac{1}{c} f_1(s) f_2(t) & \text{for } a \le s \le t \le b, \\ \frac{1}{c} f_1(t) f_2(s) & \text{for } a \le t \le s \le b, \end{cases}$$
and show that for $h \in C([a, b])$ the boundary-value problem
$$\begin{cases} B_1(f) = B_2(f) = 0 \\ L(f) = h \end{cases}$$
is equivalent to the equation
$$f(s) = K(h)(s) = \int_a^b G(s, t)\, h(t) \, dt.$$
Calculate $G$ explicitly for the equation $L(f) = f''$, $B_1(f) = f(a)$ and $B_2(f) = f'(b)$.

† That is, the functions $f_1, f_2$ form a basis of the vector space of all solutions.


2.6 Further Topics

The material in this chapter represents the basic language and some of the main examples of functional analysis. Let us mention briefly some directions in which the theory continues.
• In Chapter 3 and the following chapters we will start to see why we insisted on completeness in the definition of Banach spaces.
• We have seen the definition of dual spaces, but have not yet found a description of any dual space. This will be corrected in the next chapter and more generally in Chapter 7, where we will describe the dual spaces of many of the Banach spaces that we discussed here.
• How can one construct a generalized limit notion that assigns to every bounded sequence a limit, and still has many of the expected properties? One such property is linearity (but notice, for example, that lim sup is not a linear function on the space of bounded sequences). Another such property is translation-invariance with respect to the underlying group (for a sequence in the normal sense, this group would be $\mathbb{Z}$). After we construct this so-called Banach limit, we ask which groups have similar notions of generalized limits. We will discuss these topics in Sections 7.2 and 10.2.
• Clearly there is some hidden notion of convergence of measures to the Lebesgue measure in Section 2.3.3. In order to formulate this precisely, we will need to define an appropriate topology on a space of measures. This topology will be called the weak* topology (read as 'weak star' topology; see Chapter 8), and as we will show the space of probability measures on a compact metric space is itself a compact metric space in this topology. This result helps to provide a coherent setting for many equidistribution results.
• Some natural spaces (examples include $C_c(X)$ and $C^\infty([0, 1])$) do not fit into the framework of Banach spaces, but do fit into the more general context of locally convex spaces. These will be introduced in Chapter 8.
• Convexity will also turn out to be fundamental for many discussions in functional analysis. One of the goals in Chapter 8 will be to analyze how the extreme points of a convex compact set determine the set.
• Banach algebras will be discussed in greater detail in Chapter 11, which lays the foundations for the more advanced spectral theory in Chapters 12 and 13.

The reader is advised to continue with the next chapter (or at least the first three or four sections of it), after which she may select parts of the text.

Chapter 3

Hilbert Spaces, Fourier Series, and Unitary Representations

In this chapter we define Hilbert spaces as a special case of Banach spaces, pick up some of the informal claims from Section 1.1, and prove them. In particular, we will introduce Fourier series in two different settings: the first abstract, and the second being the — at least soon to be — familiar setting of the torus. In Section 3.5 we discuss the spectral theory of compact abelian groups as well as two notions of integrals for Banach space-valued functions.

3.1 Hilbert Spaces

The notion of a Hilbert space is a fundamental idea in functional analysis. We will see in this section that a Hilbert space is a Banach space of a special sort, and the additional structure entailed by the extra hypothesis turns out to be highly significant.

3.1.1 Definitions and Elementary Properties

Definition 3.1. An inner product space or a pre-Hilbert space is a vector space over $\mathbb{R}$ (or $\mathbb{C}$) with an inner product $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ (or $\mathbb{C}$) with the following properties:
• (Strict positivity) $\langle v, v \rangle > 0$ for all $v \in V \setminus \{0\}$;
• ((Conjugate-)Symmetry) $\langle v, w \rangle = \overline{\langle w, v \rangle}$ for all $v, w \in V$; and
• (Linearity) for any fixed $w \in V$ the map $v \mapsto \langle v, w \rangle$ is linear.
If the first property of strict positivity is replaced by the weaker axiom of
• (Positivity) $\langle v, v \rangle \ge 0$ for all $v \in V$,
then we call $\langle \cdot, \cdot \rangle$ a semi-inner product.

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_3


Notice that over $\mathbb{R}$ a consequence is linearity of the map $w \mapsto \langle v, w\rangle$ in the second variable for fixed $v$, so that $\langle\cdot,\cdot\rangle$ is bilinear. In the complex case, we have semi-linearity in the second argument, that is
\[ \langle v, \alpha_1 w_1 + \alpha_2 w_2\rangle = \overline{\alpha_1}\langle v, w_1\rangle + \overline{\alpha_2}\langle v, w_2\rangle \]
for any $v, w_1, w_2 \in V$ and $\alpha_1, \alpha_2 \in \mathbb{C}$. A map $L$ from a complex vector space $V$ to $\mathbb{C}$ is semi-linear ($\tfrac12$-linear) if
\[ L(\alpha_1 v_1 + \alpha_2 v_2) = \overline{\alpha_1} L(v_1) + \overline{\alpha_2} L(v_2) \]
for all vectors $v_1, v_2 \in V$ and scalars $\alpha_1, \alpha_2 \in \mathbb{C}$, and a map $B : V \times V \to \mathbb{C}$ is sesqui-linear ($1\tfrac12$-linear) if the map $v \in V \mapsto B(v, w)$ is linear for any $w \in V$ and the map $w \in V \mapsto B(v, w)$ is semi-linear for any $v \in V$. Thus the inner product $\langle\cdot,\cdot\rangle$ is sesqui-linear.

In an inner product space, we will see shortly that defining
\[ \|v\| = \sqrt{\langle v, v\rangle} \tag{3.1} \]
gives a norm on $V$.

Proposition 3.2 (Cauchy–Schwarz). Let $(V, \langle\cdot,\cdot\rangle)$ be an inner product space. Then we have the Cauchy–Schwarz inequality,
\[ |\langle v, w\rangle| \leq \|v\|\|w\| \tag{3.2} \]
for all $v, w \in V$, where equality holds if and only if $v$ and $w$ are linearly dependent. Moreover, the function $\|\cdot\|$ defined in (3.1) is a norm on $V$, so that every inner product space is also a normed vector space. If $\langle\cdot,\cdot\rangle$ is only assumed to be a semi-inner product on $V$, then the induced function $\|v\| = \sqrt{\langle v, v\rangle}$ for $v \in V$ is a semi-norm and the inequality in (3.2) also holds in that case.

Definition 3.3. A Hilbert space is an inner product space $(H, \langle\cdot,\cdot\rangle)$ which is complete with respect to the norm $\|\cdot\|$ induced by the inner product as in (3.1).

Proof of Proposition 3.2. To see that $\|\cdot\| : V \to \mathbb{R}_{\geq 0}$ from (3.1) defines a norm, we need to check the following properties:
• strict positivity of $\|\cdot\|$, which follows at once from the strict positivity property of the inner product, and
• homogeneity of $\|\cdot\|$, which follows from linearity and (conjugate-)symmetry, since $\|\alpha v\|^2 = \langle \alpha v, \alpha v\rangle = |\alpha|^2\|v\|^2$.
For the proof of the triangle inequality we will need the Cauchy–Schwarz inequality (3.2), which we will prove now. We note that the latter holds trivially if $w = 0$. So assume that $w \neq 0$. By definition, we have

\begin{align*}
0 \leq \|v + tw\|^2 &= \langle v + tw, v + tw\rangle \\
&= \langle v, v\rangle + \langle tw, v\rangle + \langle v, tw\rangle + \langle tw, tw\rangle \\
&= \|v\|^2 + t\langle w, v\rangle + \overline{t\langle w, v\rangle} + |t|^2\|w\|^2 \\
&= \|v\|^2 + 2\Re\bigl(t\,\overline{\langle v, w\rangle}\bigr) + |t|^2\|w\|^2 \tag{3.3}
\end{align*}
for any scalar $t$ by linearity and (conjugate-)symmetry of the inner product. We set
\[ t = -\frac{\langle v, w\rangle}{\|w\|^2}. \]
Then the inequality (3.3) becomes
\[ 0 \leq \|v\|^2 - 2\frac{|\langle v, w\rangle|^2}{\|w\|^2} + \frac{|\langle v, w\rangle|^2}{\|w\|^4}\|w\|^2, \]
or $0 \leq \|v\|^2\|w\|^2 - |\langle v, w\rangle|^2$, giving (3.2).

Reading the manipulations above in the reverse direction we see that equality in (3.2) gives $\|v + tw\|^2 = \langle v + tw, v + tw\rangle = 0$, which forces $v + tw = 0$ by the strict positivity property. If, on the other hand, $v$ and $w$ are linearly dependent with $v = \alpha w$ for some scalar $\alpha$, then
\[ |\langle v, w\rangle| = |\langle \alpha w, w\rangle| = |\alpha|\|w\|^2 = \|v\|\|w\| \]
by the homogeneity property of $\|\cdot\|$.

It remains to show the
• triangle inequality, which may be seen as follows:
\begin{align*}
\|v + w\|^2 = \langle v + w, v + w\rangle &= \|v\|^2 + 2\Re\langle v, w\rangle + \|w\|^2 \\
&\leq \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 = (\|v\| + \|w\|)^2
\end{align*}
by the Cauchy–Schwarz inequality.

Analyzing the proof above shows that the strict positivity of $\langle\cdot,\cdot\rangle$ was only needed to obtain strict positivity of the induced norm and in the proof of the equality case of the Cauchy–Schwarz inequality. □

As we have seen, the triangle inequality for the norm needs the Cauchy–Schwarz inequality. Another reason for the importance of this fundamental inequality is that it gives us continuity of the inner product, which we will use frequently.
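The choice of $t$ in the proof of Proposition 3.2 can be checked numerically. The following is a minimal sketch on $\mathbb{C}^3$ with the standard inner product; the helper names `inner` and `norm` and the sample vectors are illustrative assumptions, not notation from the text.

```python
def inner(v, w):
    """Standard inner product on C^d: linear in v, semi-linear in w."""
    return sum(a * b.conjugate() for a, b in zip(v, w))

def norm(v):
    # inner(v, v) is a non-negative real number stored as a complex value
    return abs(inner(v, v)) ** 0.5

# Hypothetical sample vectors, chosen only for illustration.
v = [1 + 2j, -1j, 3 + 0j]
w = [2 - 1j, 1 + 1j, -1 + 0j]

# The scalar used in the proof: t = -<v, w> / ||w||^2.
t = -inner(v, w) / norm(w) ** 2
residual = [a + t * b for a, b in zip(v, w)]

# With this t, (3.3) reads ||v + t w||^2 = ||v||^2 - |<v, w>|^2 / ||w||^2.
gap = norm(v) ** 2 - abs(inner(v, w)) ** 2 / norm(w) ** 2

print(abs(inner(v, w)) <= norm(v) * norm(w))   # Cauchy-Schwarz (3.2)
print(abs(norm(residual) ** 2 - gap) < 1e-9)   # identity behind the proof
```

The vector `residual` is $v + tw$, which is orthogonal to $w$; this is the geometric reason why this value of $t$ minimizes $\|v + tw\|$.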

Essential Exercise 3.4. Show that an inner product on an inner product space is jointly continuous with respect to the induced norm: if $v_n \to v$ and $w_n \to w$ as $n \to \infty$, then $\langle v_n, w_n\rangle \to \langle v, w\rangle$ as $n \to \infty$.

We record a few elementary properties of inner product spaces.
• The parallelogram identity,
\[ \|v + w\|^2 + \|v - w\|^2 = 2\|v\|^2 + 2\|w\|^2 \tag{3.4} \]
for all $v, w \in V$.
• The relationship with linear functionals: for fixed $w \in V$ the map $\phi_w$ defined by $\phi_w(v) = \langle v, w\rangle$ is a linear functional with norm $\|\phi_w\| = \|w\|$.
• The relationship with geometry: for $w \neq 0$ the vector $\frac{\langle v, w\rangle}{\|w\|^2} w$ (appearing as $-tw$ in the proof of Proposition 3.2 above) is the orthogonal projection of $v$ onto the subspace spanned by $w$. Moreover, if $\langle v, w\rangle = 0$ then we recover Pythagoras’ theorem in the form $\|v + w\|^2 = \|v\|^2 + \|w\|^2$.

These are easy to check. For the first, expand the left-hand side to obtain
\begin{align*}
\|v + w\|^2 + \|v - w\|^2 &= \langle v + w, v + w\rangle + \langle v - w, v - w\rangle \\
&= \|v\|^2 + 2\Re\langle v, w\rangle + \|w\|^2 + \|v\|^2 - 2\Re\langle v, w\rangle + \|w\|^2.
\end{align*}
The second claim is a consequence of the linearity of the inner product, the Cauchy–Schwarz inequality and the definition of the operator norm. The two final claims follow by expanding $\bigl\langle v - \frac{\langle v, w\rangle}{\|w\|^2} w, w\bigr\rangle$, respectively the square norm $\|v + w\|^2 = \langle v + w, v + w\rangle$.

Exercise 3.5. (a) Show that any real inner product space satisfies the polarization identity
\[ \langle x, y\rangle = \tfrac14\bigl(\|x + y\|^2 - \|x - y\|^2\bigr), \]
which expresses the inner product in terms of the norm.
(b) Show that the parallelogram identity (3.4) characterizes the real inner product spaces among the real normed spaces in the following sense. If a real normed vector space satisfies the parallelogram identity, then an inner product can be defined in such a way that the norm arises from the inner product.
(c) Generalize the polarization identity to complex inner product spaces, and show the complex analogue of (b).
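As a small numerical illustration of the parallelogram identity (3.4) and the real polarization identity, the sketch below compares the Euclidean norm on $\mathbb{R}^2$ with the supremum norm, which does not come from an inner product. The function names and sample vectors are assumptions made for this example only.

```python
def norm2(v):
    """Euclidean norm, induced by the standard inner product."""
    return sum(a * a for a in v) ** 0.5

def norm_sup(v):
    """Supremum norm, which is not induced by any inner product."""
    return max(abs(a) for a in v)

def parallelogram_defect(norm, v, w):
    """||v+w||^2 + ||v-w||^2 - 2||v||^2 - 2||w||^2 (zero if (3.4) holds)."""
    plus = [a + b for a, b in zip(v, w)]
    minus = [a - b for a, b in zip(v, w)]
    return norm(plus) ** 2 + norm(minus) ** 2 - 2 * norm(v) ** 2 - 2 * norm(w) ** 2

def polarization(norm, x, y):
    """(||x+y||^2 - ||x-y||^2) / 4, the right-hand side of the real identity."""
    plus = [a + b for a, b in zip(x, y)]
    minus = [a - b for a, b in zip(x, y)]
    return (norm(plus) ** 2 - norm(minus) ** 2) / 4

e1, e2 = [1.0, 0.0], [0.0, 1.0]
print(parallelogram_defect(norm2, e1, e2))     # approximately 0
print(parallelogram_defect(norm_sup, e1, e2))  # non-zero: (3.4) fails

x, y = [3.0, 1.0], [2.0, -4.0]
dot = sum(a * b for a, b in zip(x, y))         # the usual dot product
print(abs(polarization(norm2, x, y) - dot) < 1e-9)
```

The non-zero defect for the supremum norm is consistent with Exercise 3.5(b): a norm that violates (3.4) cannot arise from an inner product.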

Example 3.6. We have already seen several Hilbert spaces without making explicit the underlying inner product.
(1) $\mathbb{R}^d$ (or $\mathbb{C}^d$) with $\langle v, w\rangle = \sum_{i=1}^d v_i\overline{w_i}$ (also written $v \cdot w$), giving the 2-norm
\[ \|v\|_2 = \Bigl(\sum_{i=1}^d |v_i|^2\Bigr)^{1/2}. \]
(2) $\ell^2 = \ell^2(\mathbb{N})$, the space of square-summable sequences of scalars, with inner product
\[ \langle v, w\rangle = \sum_{i=1}^\infty v_i\overline{w_i} \]
and the 2-norm
\[ \|v\|_2 = \Bigl(\sum_{i=1}^\infty |v_i|^2\Bigr)^{1/2}. \]
Equivalently $\ell^2(\mathbb{N}) = L^2_{\lambda_{\mathrm{count}}}(\mathbb{N})$, where $\lambda_{\mathrm{count}}$ is the counting measure on $\mathbb{N}$.
(3) $L^2_\mu(X)$ for a measure space $(X, \mathcal{B}, \mu)$ with the inner product
\[ \langle f, g\rangle = \int f\overline{g}\,d\mu, \tag{3.5} \]
giving the 2-norm
\[ \|f\|_2 = \Bigl(\int |f|^2\,d\mu\Bigr)^{1/2}. \]

Notice that in Example 3.6(2) and (3), the spaces are themselves defined as the set of sequences or functions with finite 2-norm. We recall how this implies that the inner product is well-defined and note that (3) contains (2) as a special case.

Lemma 3.7. If $(X, \mathcal{B}, \mu)$ is a measure space and $f, g \in L^2_\mu(X)$, then the right-hand side of (3.5) is well-defined.

Proof. Since $(|f| - |g|)^2 = |f|^2 - 2|fg| + |g|^2 \geq 0$ we have
\[ 2\int |fg|\,d\mu \leq \int |f|^2\,d\mu + \int |g|^2\,d\mu < \infty, \]
which proves that $f\overline{g}$ is integrable. □



Definition 3.8. Let $V, W$ be two Hilbert spaces and $M : V \to W$ a linear map. If $M$ is both an isometry and a bijection, then $M$ is called a unitary operator.

Essential Exercise 3.9. Show that a bijective linear operator $M : V \to W$ is unitary if and only if $\langle M v_1, M v_2\rangle_W = \langle v_1, v_2\rangle_V$ for all $v_1, v_2 \in V$.

3.1.2 Convex Sets in Uniformly Convex Spaces

From the equality case of the Cauchy–Schwarz inequality, which is itself used in the proof of the triangle inequality, it follows quickly that a norm in an inner product space is strictly sub-additive (see Definition 2.17). Thus the Mazur–Ulam theorem concerning isometries (Theorem 2.20) applies in particular to Hilbert spaces. While the emphasis in this section is on Hilbert spaces, we will isolate a more abstract convexity property which is precisely what is needed for several proofs in this section.

Exercise 3.10. (a) Show that the norm in a Hilbert space is strictly sub-additive (see Definition 2.17).
(b) Show that the norm in a uniformly convex vector space (as defined below) is strictly sub-additive.

Definition 3.11. A normed vector space $(V, \|\cdot\|)$ is called uniformly convex if
\[ \|x\|, \|y\| \leq 1 \implies \Bigl\|\frac{x + y}{2}\Bigr\| \leq 1 - \eta(\|x - y\|) \]
for all $x, y \in V$, where $\eta : [0, 2] \to [0, 1]$ is a monotonically increasing function with $\eta(r) > 0$ for all $r > 0$.

[Fig. 3.1: If $x$ and $y$ are not close to each other, then the mid-point $\frac{x+y}{2}$ is uniformly closer to zero (independent of the choice of $x$ and $y$).]

Heuristically, we can think of Definition 3.11 as having the following geometrical meaning, illustrated in Figure 3.1. If vectors $x$ and $y$ have norm (length) one, then their mid-point $\frac{x+y}{2}$ has significantly smaller norm unless $x$ and $y$ are very close together. This accords closely with the geometrical intuition from finite-dimensional spaces with Euclidean distance.

Lemma 3.12. A Hilbert space $(H, \langle\cdot,\cdot\rangle)$ is uniformly convex.

Proof. For $x, y \in H$ with $\|x\|, \|y\| \leq 1$ we have
\begin{align*}
\Bigl\|\frac{x + y}{2}\Bigr\| &= \sqrt{\tfrac12\|x\|^2 + \tfrac12\|y\|^2 - \tfrac14\|x - y\|^2} \\
&\leq \sqrt{1 - \tfrac14\|x - y\|^2} = 1 - \eta(\|x - y\|)
\end{align*}
by the parallelogram identity, with $\eta(r) = 1 - \sqrt{1 - \tfrac14 r^2}$. □
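To make the modulus $\eta$ from Lemma 3.12 concrete, the following sketch evaluates both sides of the inequality in Definition 3.11 for the Euclidean norm on $\mathbb{R}^2$, and shows a pair of unit vectors in the supremum norm whose mid-point still has norm one. All function names and test vectors are assumptions introduced for illustration.

```python
import math

def eta(r):
    """Modulus from Lemma 3.12: eta(r) = 1 - sqrt(1 - r^2 / 4) on [0, 2]."""
    return 1.0 - math.sqrt(1.0 - r * r / 4.0)

def euclid(v):
    return math.sqrt(sum(a * a for a in v))

def sup_norm(v):
    return max(abs(a) for a in v)

def midpoint_norm(norm, x, y):
    return norm([(a + b) / 2 for a, b in zip(x, y)])

def distance(norm, x, y):
    return norm([a - b for a, b in zip(x, y)])

# Euclidean unit vectors: the bound of Definition 3.11 holds, here with equality.
x, y = [1.0, 0.0], [0.0, 1.0]
lhs = midpoint_norm(euclid, x, y)
rhs = 1.0 - eta(distance(euclid, x, y))
print(lhs <= rhs + 1e-12)

# Supremum norm: unit vectors at distance 2 whose mid-point still has norm 1,
# so no admissible function eta can exist for this norm.
p, q = [1.0, 1.0], [1.0, -1.0]
print(distance(sup_norm, p, q), midpoint_norm(sup_norm, p, q))
```

The second computation illustrates why $\mathbb{R}^2$ with $\|\cdot\|_\infty$ is not uniformly convex, even though it is a Banach space.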



The following theorem, whose conclusion is illustrated in Figure 3.2, will have many important consequences for the study of Hilbert spaces.

Theorem 3.13 (Unique approximation within a closed convex set). Let $(V, \|\cdot\|)$ be a Banach space with a uniformly convex norm, let $K \subseteq V$ be a non-empty closed convex subset, and assume that $v_0 \in V$. Then there exists a unique element $w \in K$ that is closest to $v_0$ in the sense that $w$ is the only element of $K$ with
\[ \|w - v_0\| = \inf_{k \in K}\|k - v_0\|. \]

[Fig. 3.2: The unique closest element of $K$ to $v_0$.]

The following exercise shows that the unique existence of the best approximation is by no means guaranteed.

Exercise 3.14. (1) Let $K \subseteq V$ be a non-empty compact subset of a normed vector space $(V, \|\cdot\|)$ or let $V$ be finite-dimensional and $K$ closed. Show the existence of a best approximation of any $v_0 \in V$ within $K$.
(2) Let $V = \mathbb{R}^2$ equipped with the norm $\|\cdot\|_\infty$ and let $K$ be the closed unit ball. Find a point $v_0 \in V$ that has more than one best approximation within $K$. Describe the points that have exactly one best approximation within $K$.
(3) Let $V = \ell^1_{\mathbb{R}}(\mathbb{N})$ equipped with the norm $\|\cdot\|_1$. Let
\[ K = \Bigl\{(x_n) \in \ell^1_{\mathbb{R}}(\mathbb{N}) \Bigm| x_n \geq 0 \text{ and } \sum_{n=1}^\infty a_n x_n = 1\Bigr\}, \]
where $(a_n)$ is a fixed sequence in $(0, 1)$ with $\lim_{n\to\infty} a_n = 1$. Show that $K$ is closed and convex. Let $v_0 = 0$ and show that there is no best approximation of $v_0$ within $K$. Conclude in particular from this that the closed unit ball in $\ell^1_{\mathbb{R}}(\mathbb{N})$ is not compact and that $\ell^1_{\mathbb{R}}(\mathbb{N})$ is not uniformly convex.

Exercise 3.15.(8) Let $(X, \mathcal{B}, \mu)$ be a measure space with $L^p_\mu(X)$, for $p \in [1, \infty]$, the associated function spaces.
(a) Show that $a^p + b^p \leq (a^2 + b^2)^{p/2}$ for any $p \in [2, \infty)$ and $a, b \geq 0$.
(b) Show that
\[ \Bigl\|\frac{f + g}{2}\Bigr\|_p^p + \Bigl\|\frac{f - g}{2}\Bigr\|_p^p \leq \tfrac12\|f\|_p^p + \tfrac12\|g\|_p^p \]
for any $p \in [2, \infty)$ and $f, g$ in $L^p_\mu(X)$.
(c) Deduce from (b) that $L^p_\mu(X)$ is uniformly convex for $p \in [2, \infty)$.
(d) Show that $L^1_\mu(X)$ and $L^\infty_\mu(X)$ are in general not uniformly convex.

Proof of Theorem 3.13. By translating both the set $K$ and the point $v_0$ by $-v_0$, we may assume without loss of generality that $v_0 = 0$. We define
\[ s = \inf_{k \in K}\|k - v_0\| = \inf_{k \in K}\|k\|. \]
If $s = 0$, then we must have $0 \in K$ since $K$ is closed, and the only choice is then $w = v_0 = 0$ (the uniqueness of $w$ is a consequence of the strict positivity of the norm). So assume that $s > 0$. By multiplying by the scalar $\frac1s$ we may also assume without loss of generality that $s = 1$.

Notice that once we have found a point $w \in K$ with norm 1, then its uniqueness is an immediate consequence of the uniform convexity: if $w_1, w_2 \in K$ have $\|w_1\| = \|w_2\| = 1$, then $\frac{w_1 + w_2}{2} \in K$ because $K$ is convex. Also, $\bigl\|\frac{w_1 + w_2}{2}\bigr\| = 1$ by the triangle inequality and since $s = 1$. By uniform convexity this implies that $w_1 = w_2$.

Existence: Turning to the existence, let us first give the idea of the proof. Choose a sequence $(k_n)$ in $K$ with $\|k_n\| \to 1$ as $n \to \infty$. Then the mid-points $\frac{k_n + k_m}{2}$ also lie in $K$, since $K$ is convex, and thus the mid-point must have norm greater than or equal to 1, since $s = 1$. Therefore $k_n$ and $k_m$ must be close together by uniform convexity, so $(k_n)$ is a Cauchy sequence. Since $V$ is complete and $K$ is closed, this will give a point $w \in K$ with $\|w\| = 1 = s$ as required.

To make this more precise, it is easier to apply uniform convexity to the normalized vectors
\[ x_n = \frac{1}{s_n} k_n, \]
where $s_n = \|k_n\|$. The mid-point of $x_n$ and $x_m$ can now be expressed as
\[ \frac{x_m + x_n}{2} = \frac{1}{2s_m} k_m + \frac{1}{2s_n} k_n = \Bigl(\frac{1}{2s_m} + \frac{1}{2s_n}\Bigr)(a k_m + b k_n) \]
with
\[ a = \frac{\frac{1}{2s_m}}{\frac{1}{2s_m} + \frac{1}{2s_n}} > 0, \qquad b = \frac{\frac{1}{2s_n}}{\frac{1}{2s_m} + \frac{1}{2s_n}} > 0, \]
and $a + b = 1$. Therefore $a k_m + b k_n \in K$ by convexity, and so
\[ \Bigl\|\frac{x_m + x_n}{2}\Bigr\| = \Bigl(\frac{1}{2s_m} + \frac{1}{2s_n}\Bigr)\|a k_m + b k_n\| \geq \frac{1}{2s_m} + \frac{1}{2s_n}. \]

Let $\eta$ be as in Definition 3.11 and fix $\varepsilon > 0$. Choose $N = N(\varepsilon)$ large enough to ensure that $m \geq N$ implies that
\[ \frac{1}{s_m} > 1 - \eta(\varepsilon). \]
Then $m, n \geq N$ implies that
\[ \frac{1}{2s_m} + \frac{1}{2s_n} > 1 - \eta(\varepsilon), \]
which together with the definition of uniform convexity gives
\[ 1 - \eta(\|x_m - x_n\|) \geq \Bigl\|\frac{x_m + x_n}{2}\Bigr\| > 1 - \eta(\varepsilon). \]
By monotonicity of the function $\eta$ this implies that $\|x_m - x_n\| < \varepsilon$ for all $m, n \geq N$, showing that $(x_n)$ is a Cauchy sequence. As $V$ is assumed to be complete, we deduce that $(x_n)$ converges to some $x \in V$ with $\|x\| = 1$. Since $k_n = s_n x_n$ and $s_n \to 1$ as $n \to \infty$, it follows that $\lim_{n\to\infty} k_n = x$. As $K$ is closed the limit $x$ belongs to $K$ and by construction is an (hence the unique) element in $K$ closest to $v_0 = 0$. □

Definition 3.16. Let $H$ be a Hilbert space, and $A \subseteq H$ any subset. Then the orthogonal complement of $A$ is defined to be
\[ A^\perp = \{h \in H \mid \langle h, a\rangle = 0 \text{ for all } a \in A\}. \]

Corollary 3.17 (Orthogonal decomposition). Let $H$ be a Hilbert space, and let $Y \subseteq H$ be a closed subspace. Then $Y^\perp$ is a closed subspace with
\[ H = Y \oplus Y^\perp, \]
meaning that every $h \in H$ can be written in the form $h = y + z$ with $y \in Y$ and $z \in Y^\perp$, and $y$ and $z$ are unique with these properties. Moreover, we have $Y = (Y^\perp)^\perp$ and
\[ \|h\|^2 = \|y\|^2 + \|z\|^2 \tag{3.6} \]
if $h = y + z$ with $y \in Y$ and $z \in Y^\perp$.

In a two-dimensional real vector space, (3.6) is familiar as Pythagoras’ theorem.

Proof of Corollary 3.17. As $h \mapsto \langle h, y\rangle$ is a (continuous linear) functional for each $y \in Y$, the set $Y^\perp$ is an intersection of closed subspaces and hence is a closed subspace. Using strict positivity of the inner product, it is easy to see that $Y \cap Y^\perp = \{0\}$; from this the uniqueness of the decomposition $h = y + z$ with $y \in Y$ and $z \in Y^\perp$ follows at once. So it remains to show the existence of this decomposition.

Fix $h \in H$, and apply Theorem 3.13 (which we may do, since a Hilbert space is uniformly convex by Lemma 3.12) with the closed convex set $K = Y$ to find a point $y \in Y$ that is closest to $h$. Let $z = h - y$, so that for any $v \in Y$ and any scalar $t$ we have
\[ \|z\|^2 \leq \|h - \underbrace{(tv + y)}_{\in Y}\|^2 = \|z - tv\|^2 = \|z\|^2 - 2\Re\bigl(t\langle v, z\rangle\bigr) + |t|^2\|v\|^2. \]
In particular, for $t \in \mathbb{R}$ the function $t \mapsto \|h - (tv + y)\|^2$ is a quadratic polynomial with its minimum at $t = 0$. Taking the derivative at $t = 0$ gives $\Re\langle v, z\rangle = 0$ for all $v \in Y$. Similarly, restricting to $t = \mathrm{i}s$ with $s \in \mathbb{R}$ and taking the derivative once more it follows that $\Im\langle v, z\rangle = 0$ for all $v \in Y$. Thus $z \in Y^\perp$, and hence
\[ \|h\|^2 = \langle h, h\rangle = \langle y + z, y + z\rangle = \|y\|^2 + \|z\|^2, \]
showing (3.6).

It is clear from the definitions that $Y \subseteq (Y^\perp)^\perp$. If $v \in (Y^\perp)^\perp$ then we may write $v = y + z$ for some $y \in Y$ and $z \in Y^\perp$ by the first part of the proof. However, $0 = \langle v, z\rangle = \|z\|^2$ implies that $v = y$ and so $Y = (Y^\perp)^\perp$. □

An immediate consequence of Corollary 3.17 is the following.

Corollary 3.18 (Orthogonal projection). For a closed subspace $Y$ of a Hilbert space $H$, the orthogonal projection onto $Y$, defined by
\begin{align*}
P_Y : H &\longrightarrow Y \\
h &\longmapsto y
\end{align*}
where $y$ is the unique element of $Y$ with $h - y \in Y^\perp$, is a bounded linear operator with $\|P_Y\| \leq 1$ (and with $\|P_Y\| = 1$ unless $Y = \{0\}$) satisfying (and characterized by)
\[ \langle h, y\rangle = \langle P_Y h, y\rangle \]
for all $h \in H$ and $y \in Y$. Moreover, if $H = Y \oplus Y^\perp$, then the orthogonal decomposition from Corollary 3.17 is given by $h = P_Y h + P_{Y^\perp} h$.

Recall that we write $V^* = B(V, \mathbb{R})$ or $B(V, \mathbb{C})$ for the dual space of a normed vector space $V$, equipped with the operator norm. The following gives our first classification of a dual space, and is crucial for the further development of the theory as well as of its applications.

Corollary 3.19 (Fréchet–Riesz representation). For a Hilbert space $H$ the map sending $h \in H$ to $\phi(h) \in H^*$ defined by $\phi(h)(x) = \langle x, h\rangle$ is a linear (resp. semi-linear in the complex case) isometric isomorphism between $H$ and its dual space $H^*$.

Proof. By the axioms of the inner product, we know that $\phi$ is (semi-)linear. By the Cauchy–Schwarz inequality and since $\phi(h)(h) = \|h\|^2$ we also know that $\phi$ is isometric. It remains to show that $\phi$ is onto. Let $\ell : H \to \mathbb{R}$ (or $\mathbb{C}$) be a continuous linear functional. Then $Y = \ker(\ell)$ is a closed linear subspace of $H$ (since $\ell$ is continuous). If $Y = H$ then $\ell = 0$ and so $\phi(0) = \ell$. So suppose that $Y \neq H$, in which case we can choose† a non-zero element $z \in Y^\perp$. We claim that
\[ \ell = \phi\Bigl(\frac{\overline{\ell(z)}}{\|z\|^2} z\Bigr). \]
Indeed, if $x \in H$ then $\ell(z)x - \ell(x)z \in \ker(\ell) = Y$ and so
\[ \langle \ell(z)x - \ell(x)z, z\rangle = 0 \]
by choice of $z$. In other words, we have shown that $\ell(z)\langle x, z\rangle = \ell(x)\|z\|^2$, which is equivalent to $\ell(x) = \bigl\langle x, \frac{\overline{\ell(z)}}{\|z\|^2} z\bigr\rangle$ for $x \in H$, as claimed. □
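The construction in the proof of Corollary 3.19 can be traced in finite dimensions. The sketch below takes a hypothetical functional $\ell(x) = \sum_i c_i x_i$ on $\mathbb{C}^3$, picks a non-zero vector $z$ orthogonal to $\ker(\ell)$, and forms the representing vector $\overline{\ell(z)}\,z/\|z\|^2$. The coefficients, the scalar multiple used for `z`, and the helper names are assumptions for illustration only.

```python
def inner(x, y):
    """Standard inner product on C^d: linear in x, semi-linear in y."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

c = [1 + 1j, 2 + 0j, -1j]   # hypothetical coefficients of the functional

def ell(x):
    """The linear functional l(x) = sum_i c_i x_i on C^3."""
    return sum(ci * xi for ci, xi in zip(c, x))

# A non-zero vector orthogonal to ker(l); any non-zero multiple works as well.
z = [(2 - 1j) * ci.conjugate() for ci in c]

# Representing vector from the proof: h = conj(l(z)) * z / ||z||^2.
scale = ell(z).conjugate() / inner(z, z).real
h = [scale * zi for zi in z]

# A vector in the kernel of l, to confirm that z is orthogonal to ker(l).
k = [2 + 0j, -(1 + 1j), 0j]

x = [0.5 - 2j, 3 + 1j, 1 + 1j]
print(abs(ell(k)) < 1e-12, abs(inner(k, z)) < 1e-12)
print(abs(ell(x) - inner(x, h)) < 1e-12)   # l(x) = <x, h>
```

Replacing the factor `(2 - 1j)` by any other non-zero scalar leaves `h` unchanged, in line with the footnote that $z$ is determined only up to a scalar multiple.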



The following exercise shows that completeness is essential in Corollaries 3.17 and 3.19.

Exercise 3.20. Consider the space $\ell^2_c$ of sequences with finite support with the $\ell^2$ inner product, and the subspace $V = \{x \in \ell^2_c \mid \sum_{n \geq 1} \frac{x_n}{n} = 0\}$. Show that $V$ is closed, and that its orthogonal complement in $\ell^2_c$ is trivial. Deduce that the bounded linear functional sending $(x_n) \in \ell^2_c$ to $\sum_{n \geq 1} \frac{x_n}{n} \in \mathbb{C}$ cannot be represented as $x \mapsto \langle x, y\rangle$ for any $y \in \ell^2_c$.

Exercise 3.21. The following is known as the Lax–Milgram lemma.
(a) Suppose that $H$ is a Hilbert space, and suppose that $B : H \times H \to \mathbb{R}$ (or $\mathbb{C}$) is bilinear (or sesqui-linear in the complex case). Finally, assume that $B$ is bounded in the sense that there is some $M > 0$ with $|B(x, y)| \leq M\|x\|\|y\|$ for all $x, y \in H$. Show that there exists a unique linear operator $T : H \to H$ with $B(x, y) = \langle Tx, y\rangle$ for which $\|T\|_{\mathrm{op}} \leq M$.
(b) Assume in addition that $B$ is coercive, meaning that there exists some $c > 0$ such that $|B(x, x)| \geq c\|x\|^2$ for all $x \in H$. Show in this case that the operator $T$ from (a) has an inverse, and that $\|T^{-1}\|_{\mathrm{op}} \leq \frac1c$.

Exercise 3.22. Use Corollary 3.19 to show that if $H$ is a Hilbert space, then $H^*$ is also a Hilbert space, and exhibit a natural isometric isomorphism between $H$ and $H^{**}$.

Essential Exercise 3.23. Show that the completion of an inner product space is a Hilbert space.

Exercise 3.24. Recall the definition of the Hardy space $H^2(D)$ (or the definition of the Bergman space $A^2(D)$) for $D = B_1^{\mathbb{C}}$ from Exercise 2.61. Show that for every $a \in D$ there is a function $k_a$ in $H^2(D)$ (respectively $k_a \in A^2(D)$) with $f(a) = \langle f, k_a\rangle_{H^2(D)}$ or $f(a) = \langle f, k_a\rangle_{A^2(D)}$ respectively. The function $D \times D \ni (a, w) \mapsto k_a(w)$ is called a reproducing kernel. Determine $k_a$ explicitly in both cases.

† The space $Y^\perp$ is one-dimensional, since $\ell|_{Y^\perp}$ has trivial kernel. Hence $z$ is really uniquely determined up to a scalar multiple.

We recall (and extend) the definition of linear hull as follows.

Definition 3.25. Let $(V, \|\cdot\|)$ be a normed vector space, and let $S \subseteq V$ be a subset. The linear hull of $S$, written $\langle S\rangle$, is the smallest subspace of $V$ containing $S$. Thus $\langle S\rangle$ consists of all linear combinations $\sum_{s \in F} c_s s$ for $F \subseteq S$ finite and scalars $c_s$. The closed linear hull $\overline{\langle S\rangle}$ is the smallest closed subspace of $V$ containing $S$; it is the closure of the linear hull.

Corollary 3.26 (Characterization of the closed linear hull). Let $H$ be a Hilbert space and $S \subseteq H$ a subset. Then $\overline{\langle S\rangle} = (S^\perp)^\perp$.

Proof. Let $Y = \overline{\langle S\rangle}$ be the closed linear hull. By orthogonal decomposition of Hilbert spaces (see Corollary 3.17), $Y = (Y^\perp)^\perp$. We claim that $Y^\perp = S^\perp$, which together with the last statement gives the corollary. To see the claim, notice that for any $x \in H$ we have
\[ x \in S^\perp \iff \langle x, y\rangle = 0 \text{ for } y \in \langle S\rangle \iff \langle x, y\rangle = 0 \text{ for } y \in \overline{\langle S\rangle} \]
by (semi-)linearity in the second argument and continuity of the inner product. □

We have seen that a closed subspace $Y$ in a Hilbert space has a closed orthogonal complement $Y^\perp$, allowing the Hilbert space to be written as the direct sum $Y \oplus Y^\perp$. The next two exercises explore the same question for Banach spaces, where no such simple conclusion can be drawn.

Exercise 3.27. Let $(V, \|\cdot\|)$ be a normed space. Two subspaces $V_1, V_2$ of $V$ are said to be algebraically complemented if $V_1 + V_2 = V$ and $V_1 \cap V_2 = \{0\}$. In that case we define linear maps $\pi : V \to V/V_2$ by $v \mapsto v + V_2$, $\phi : V_1 \times V_2 \to V$ by $(v_1, v_2) \mapsto v_1 + v_2$, and $P : V \to V_1$ by $v_1 + v_2 \mapsto v_1$ where $v \in V$, $v_1 \in V_1$ and $v_2 \in V_2$. We may call $P$ the projection of $V$ onto $V_1$ along $V_2$. Show that the following are equivalent:
(1) $V_1$ and $V_2$ are closed subspaces of $V$ and the map $\pi|_{V_1}$ is a homeomorphism (where $V/V_2$ is equipped with the quotient norm from Lemma 2.15);
(2) the map $\phi$ is a homeomorphism (where $V_1 \times V_2$ is equipped with any of the norms from Exercise 2.9);
(3) $P$ is a bounded operator.
If any of these equivalent conditions hold, then the subspaces are called topologically complemented.

A closed subspace $W$ of a normed space $V$ is called complemented if there is a closed subspace $W'$ with the property that $W$ and $W'$ are topologically complemented. The next exercise shows that a closed subspace is not necessarily complemented.(9)

Exercise 3.28. Prove that the closed subspace $c_0 = \{x = (x_n) \in \ell^\infty \mid \lim_{n\to\infty} x_n = 0\}$ is not complemented in $\ell^\infty$ as follows.

(1) Show that $(\ell^\infty)^*$ contains a countable subset $A$ with the property that if $x \in \ell^\infty$ has $a(x) = 0$ for all $a \in A$ then $x = 0$, and deduce that the same holds for any space isomorphic to a closed subspace of $\ell^\infty$.
Using the following steps, show that $V = \ell^\infty/c_0$ does not have the property in (1) and hence that $c_0$ cannot be complemented by Exercise 3.27.
(2) Use an enumeration of $\mathbb{Q}$ to construct for each $i \in I = \mathbb{R} \setminus \mathbb{Q}$ a sequence $x^{(i)} = (x^{(i)}_n) \in \ell^\infty$ with values in $\{0, 1\}$ and with infinite support
\[ \operatorname{Supp}(x^{(i)}) = \{n \in \mathbb{N} \mid x^{(i)}_n = 1\} \]
in such a way that $\operatorname{Supp}(x^{(i)}) \cap \operatorname{Supp}(x^{(j)})$ is finite for all $i \neq j$.
(3) Show that for any finite non-empty subset $J \subseteq I$ and any list of numbers $(b_i)_{i \in J}$ with $|b_i| = 1$ for all $i \in J$ we have
\[ \Bigl\|\sum_{i \in J} b_i x^{(i)}\Bigr\|_{\ell^\infty/c_0} = 1. \]
(4) Deduce from (3) that for any continuous linear functional $f \in V^*$ and $n \in \mathbb{N}$ the set $\{i \in I \mid |f(x^{(i)})| > \frac1n\}$ is finite. Conclude that for any countable subset $A$ of $V^*$ there is some $i \in I$ with the property that $a(x^{(i)}) = 0$ for all $a \in A$.

3.1.3 An Application to Measure Theory

We will show in this section how the results from Section 3.1.2 can be used in measure theory. Before stating the main result, we recall some definitions. A measure $\nu$ is absolutely continuous with respect to another measure $\mu$, written $\nu \ll \mu$, if any measurable set $N$ with $\mu(N) = 0$ must also satisfy $\nu(N) = 0$. Two measures $\mu$ and $\nu$ are singular with respect to each other, written $\nu \perp \mu$, if there exist disjoint measurable sets $X_\mu, X_\nu \subseteq X$ with $X = X_\mu \sqcup X_\nu$ and with $\nu(X_\mu) = 0 = \mu(X_\nu)$. Finally, recall that a measure $\mu$ is σ-finite if there is a decomposition of $X$ into measurable sets,
\[ X = \bigsqcup_{i=1}^\infty X_i, \]
with $\mu(X_i) < \infty$ for all $i \geq 1$.

Proposition 3.29 (Lebesgue decomposition, Radon–Nikodym derivative). Let $\mu$ and $\nu$ be two σ-finite measures on a measurable space $(X, \mathcal{B})$. Then $\nu$ can be decomposed as $\nu = \nu_{\mathrm{abs}} + \nu_{\mathrm{sing}}$ into the sum of two σ-finite measures with $\nu_{\mathrm{abs}} \ll \mu$ and $\nu_{\mathrm{sing}} \perp \mu$. Moreover, there exists a ($\mu$-almost everywhere uniquely determined) measurable function $f \geq 0$, called the Radon–Nikodym derivative and often denoted by $f = \frac{d\nu_{\mathrm{abs}}}{d\mu}$, such that
\[ \nu_{\mathrm{abs}}(B) = \int_B f\,d\mu \]
for any measurable $B \subseteq X$.

Proof. Suppose that $\mu$ and $\nu$ are both finite measures (the general case can be reduced to this case by using the assumption that $\mu$ and $\nu$ are both σ-finite; see Exercise 3.31). We define a new measure $m = \mu + \nu$ and will work with the real Hilbert space $H = L^2_m(X)$. On this Hilbert space we define a linear functional $\phi$ by
\[ \phi(g) = \int g\,d\nu \]
for $g \in H = L^2_m(X)$. Note that equivalence of functions modulo $m$ implies equivalence of functions modulo $\nu$, and moreover that for $g \in H$ we have
\[ \int |g|\,d\nu \leq \int |g|\,dm \leq \|g\|_{L^2_m}\|1\|_{L^2_m}, \]
where we have used the fact that $m = \mu + \nu$, that $\mu$ is a positive measure, and the Cauchy–Schwarz inequality on $H$. Therefore, $\phi(g)$ is well-defined for $g \in H$ and satisfies $|\phi(g)| \leq \|g\|_{L^2_m}\|1\|_{L^2_m}$. By Fréchet–Riesz representation (Corollary 3.19) there is some $k \in H = L^2_m$ such that
\[ \int g\,d\nu = \phi(g) = \int gk\,dm \tag{3.7} \]
for all $g \in L^2_m(X)$. We claim that $k$ takes values in $[0, 1]$ almost surely with respect to $m$. Indeed, for any $B \in \mathcal{B}$ we have $0 \leq \nu(B) \leq m(B)$, so (applying (3.7) with $g = 1_B$)
\[ 0 \leq \int_B k\,dm \leq m(B). \]
Using the choices $B = \{x \in X \mid k(x) < 0\}$ and $B = \{x \in X \mid k(x) > 1\}$ implies the claim that $k$ takes values in $[0, 1]$, $m$-almost surely. Modifying $k$ if necessary we will assume that $k$ only takes values in $[0, 1]$. Since $m = \mu + \nu$, we can reformulate (3.7) as
\[ \int g(1 - k)\,d\nu = \int gk\,d\mu. \tag{3.8} \]
This holds by construction for all simple functions $g$, and hence for all non-negative measurable functions by monotone convergence. Now define
\begin{align*}
X_{\mathrm{sing}} &= \{x \in X \mid k(x) = 1\}, \\
X_\mu &= X \setminus X_{\mathrm{sing}} = \{x \in X \mid k(x) < 1\},
\end{align*}

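and $\nu_{\mathrm{sing}} = \nu|_{X_{\mathrm{sing}}}$. By definition, $\nu_{\mathrm{sing}}(X_\mu) = 0$, and by equation (3.8) applied with $g = 1_{X_{\mathrm{sing}}}$ we also have $\mu(X_{\mathrm{sing}}) = 0$. Therefore $\nu_{\mathrm{sing}} \perp \mu$. We also define $\nu_{\mathrm{abs}} = \nu|_{X_\mu}$ so that $\nu = \nu_{\mathrm{sing}} + \nu_{\mathrm{abs}}$. Finally, define the function
\[ f = \begin{cases} \frac{k}{1 - k} & \text{on } X_\mu, \\ 0 & \text{on } X_{\mathrm{sing}}. \end{cases} \]
For any measurable $g \geq 0$ we may now apply (3.8) to obtain
\[ \int g\,d\nu_{\mathrm{abs}} = \int_{X_\mu} \frac{g}{1 - k}(1 - k)\,d\nu = \int_{X_\mu} \frac{g}{1 - k}k\,d\mu = \int gf\,d\mu, \]
which shows that $f = \frac{d\nu_{\mathrm{abs}}}{d\mu}$ is a Radon–Nikodym derivative and also shows that $\nu_{\mathrm{abs}} \ll \mu$.

If $f_1, f_2$ are both Radon–Nikodym derivatives of $\nu_{\mathrm{abs}}$ with respect to $\mu$, then
\[ \int_B f_1\,d\mu = \nu_{\mathrm{abs}}(B) = \int_B f_2\,d\mu \]
for all measurable $B \subseteq X$, which implies that $f_1 = f_2$ $\mu$-almost surely by considering $B = \{x \in X \mid f_1(x) < f_2(x)\}$ and $B = \{x \in X \mid f_1(x) > f_2(x)\}$. □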
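On a finite set the steps of the proof of Proposition 3.29 become elementary arithmetic. The following sketch, with hypothetical point masses for $\mu$ and $\nu$ on a four-point space, computes the density $k$ of $\nu$ with respect to $m = \mu + \nu$, the sets $X_{\mathrm{sing}}$ and $X_\mu$, and the function $f = k/(1-k)$. The variable names and the chosen weights are assumptions for illustration.

```python
from fractions import Fraction as F

# Hypothetical finite measures on X = {0, 1, 2, 3}, given by point masses.
mu = [F(1, 2), F(1, 4), F(0), F(1)]
nu = [F(1, 3), F(0), F(2), F(3)]
X = range(len(mu))

m = [mu[x] + nu[x] for x in X]                       # m = mu + nu
# Density k of nu with respect to m as in (3.7); set to 0 where m has no mass.
k = [nu[x] / m[x] if m[x] != 0 else F(0) for x in X]

X_sing = [x for x in X if k[x] == 1]                 # where k = 1
X_mu = [x for x in X if k[x] < 1]                    # where k < 1

# Radon-Nikodym derivative: f = k / (1 - k) on X_mu and 0 on X_sing.
f = [k[x] / (1 - k[x]) if k[x] < 1 else F(0) for x in X]

nu_abs = [nu[x] if x in X_mu else F(0) for x in X]   # nu restricted to X_mu
nu_sing = [nu[x] if x in X_sing else F(0) for x in X]

print(X_sing, f)
print(all(f[x] * mu[x] == nu_abs[x] for x in X))     # nu_abs = f dmu
print(sum(mu[x] for x in X_sing) == 0)               # mu(X_sing) = 0
```

Exact rational arithmetic is used here so that the condition $k = 1$ can be tested without rounding issues; with floating-point weights one would need a tolerance.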

Exercise 3.30. For every $n \geq 1$, let $(X_n, \mathcal{B}_n, \mu_n)$ be a measure space.
(a) Define $X = \bigsqcup_{n=1}^\infty X_n$. Show that $\mathcal{B} = \{B \subseteq X \mid B \cap X_n \in \mathcal{B}_n \text{ for every } n \geq 1\}$ defines a σ-algebra on $X$.
(b) For every $B \in \mathcal{B}$ as in (a) define $\mu(B) = \sum_{n=1}^\infty \mu_n(B \cap X_n)$ and show that $\mu$ is a measure on $\mathcal{B}$.

Essential Exercise 3.31. Complete the proof of Proposition 3.29 in the σ-finite case.

Exercise 3.32. Use Lemma 2.63 and Proposition 3.29 to calculate the norm of the functional
\[ C_c(X) \ni f \longmapsto \int f\,d\mu - \int f\,d\nu \]
for two finite Borel measures $\mu$ and $\nu$ on a locally compact metric space $X$ (which may or may not be mutually singular).

Exercise 3.33. Let $(X, \mathcal{B})$ be a measurable space and denote the space of signed measures on $X$ (as defined in Section B.5) by $\mathcal{M}(X)$.
(a) Given a signed measure $d\nu = g\,d\mu$ with a finite measure $\mu$ and $g \in L^1_\mu(X)$, define $\|\nu\|$ to be $\|g\|_{L^1(\mu)}$. Show that this yields a well-defined norm on $\mathcal{M}(X)$.
(b) Show that $\mathcal{M}(X)$ is a Banach space with respect to this norm.

The notion of conditional expectation with respect to a sub-σ-algebra is a powerful tool, and finds wide applications in probability (see Loève [64], [65]) and ergodic theory (see [27], for example). Roughly speaking it is an ‘orthogonal projection’ defined on $L^1$, and we invite the reader to construct it in this manner in the following exercise.

Exercise 3.34. Let $(X, \mathcal{B}, \mu)$ be a probability space, and let $\mathcal{A} \subseteq \mathcal{B}$ be a sub-σ-algebra.
(a) Show that there exists a bounded operator, called the conditional expectation,
\begin{align*}
E_\mu(\,\cdot \mid \mathcal{A}) : L^1_\mu(X, \mathcal{B}) &\longrightarrow L^1_\mu(X, \mathcal{A}) \\
f &\longmapsto E_\mu(f \mid \mathcal{A})
\end{align*}
such that
\[ \int_A f\,d\mu = \int_A E_\mu(f \mid \mathcal{A})\,d\mu \tag{3.9} \]
for all $A \in \mathcal{A}$.
(b) Show that (3.9) uniquely characterizes $E_\mu(f \mid \mathcal{A}) \in L^1_\mu(X, \mathcal{A})$ as an equivalence class.
(c) Show that $f \in L^1_\mu(X, \mathcal{B})$ and $g \in L^\infty_\mu(X, \mathcal{A})$ implies that $E_\mu(fg \mid \mathcal{A}) = gE_\mu(f \mid \mathcal{A})$.
(d) Show that $\|E_\mu(f \mid \mathcal{A})\|_1 \leq \|f\|_1$ for $f \in L^1_\mu(X, \mathcal{B})$.
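For intuition about Exercise 3.34, consider the simplest case of a finite probability space whose sub-σ-algebra is generated by a partition into atoms of positive mass. In that setting the conditional expectation is the weighted average over each atom. The sketch below is only an illustration of property (3.9) and of the contraction property (d) in this special case, not a solution of the exercise; the weights, the partition, and the function names are assumptions.

```python
from fractions import Fraction as F

# Hypothetical probability space X = {0, ..., 5} with point masses mu, and a
# sub-sigma-algebra generated by the partition `atoms` (each of positive mass).
mu = [F(1, 12), F(1, 6), F(1, 4), F(1, 4), F(1, 6), F(1, 12)]
atoms = [[0, 1], [2, 3, 4], [5]]
f = [F(3), F(-1), F(2), F(0), F(6), F(5)]

def integral(g, A):
    """Integral of g over the set A with respect to mu."""
    return sum(g[x] * mu[x] for x in A)

def cond_exp(g):
    """E(g | A): on each atom, the mu-weighted average of g over that atom."""
    out = [F(0)] * len(g)
    for atom in atoms:
        mass = sum(mu[x] for x in atom)
        average = integral(g, atom) / mass
        for x in atom:
            out[x] = average
    return out

Ef = cond_exp(f)
everything = range(len(mu))

# Property (3.9) on the generating atoms, and the contraction property (d).
print(all(integral(f, atom) == integral(Ef, atom) for atom in atoms))
norm1 = lambda g: integral([abs(v) for v in g], everything)
print(norm1(Ef) <= norm1(f))
```

Since every set in the sub-σ-algebra is a union of atoms, checking (3.9) on the atoms is enough in this finite setting.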

3.2 Orthonormal Bases and Gram–Schmidt

Definition 3.35. A finite or countable list $(x_n)$ of vectors in an inner product space $(V, \langle\cdot,\cdot\rangle)$ is called orthonormal if
\[ \langle x_m, x_n\rangle = \delta_{mn} = \begin{cases} 1 & \text{if } m = n, \\ 0 & \text{if } m \neq n \end{cases} \]
for all $m, n \geq 1$.

In other words, we require that all the vectors have length one, and are mutually orthogonal. As one might expect, this notion is fundamental for Hilbert spaces, and gives rise to the following satisfying abstract result, which as we will see lays the ground for Fourier analysis.

Proposition 3.36 (The closed linear hull of an orthonormal list). Let $H$ be a Hilbert space. Then the closed linear hull of an orthonormal list $(x_n)$ is given by
\[ \overline{\langle\{x_n\}\rangle} = \Bigl\{\sum_n a_n x_n \Bigm| \text{the sum converges in } H\Bigr\}, \]
where the sum $v = \sum_n a_n x_n$ converges in $H$ if and only if $\sum_n |a_n|^2 < \infty$. In that case we also have $\|v\| = \bigl(\sum_n |a_n|^2\bigr)^{1/2}$ and $\langle v, x_m\rangle = a_m$ for $m \geq 1$. Hence the linear map $\phi$ that sends the sequence $(a_n)$ with $\sum_n |a_n|^2 < \infty$ to $\sum_n a_n x_n \in \overline{\langle\{x_n\}\rangle}$ is a unitary isomorphism of Hilbert spaces.

We note that the series $\sum_n a_n x_n$ need not be absolutely convergent, since $\ell^2(\mathbb{N}) \supsetneq \ell^1(\mathbb{N})$.

Proof of Proposition 3.36. Suppose first that $(x_1, \ldots, x_N)$ is a finite orthonormal list. Then we may define a map $\phi$ from $\mathbb{K}^N$ (with $\mathbb{K}$ being the

field of scalars $\mathbb{R}$ or $\mathbb{C}$) to $H$ by setting $\phi((a_n)) = \sum_{n=1}^N a_n x_n$. Using the assumption of orthonormality it follows that
\[ \|\phi((a_n))\|^2 = \sum_{m,n}\langle a_m x_m, a_n x_n\rangle = \sum_n |a_n|^2 = \|(a_n)\|_2^2 \tag{3.10} \]
and
\[ \langle \phi((a_n)), x_j\rangle = \sum_n \langle a_n x_n, x_j\rangle = a_j \tag{3.11} \]
for $j = 1, \ldots, N$. This proves the statements in the case of a finite list.

Now suppose that the list is infinite. In this case we define the space
\[ c_c(\mathbb{N}) = \{(a_n) \in \ell^2(\mathbb{N}) \mid a_n = 0 \text{ for all but finitely many } n\} \]
and the linear map $\phi : c_c(\mathbb{N}) \to H$ by
\[ (a_n) \longmapsto \sum_{n=1}^\infty a_n x_n, \]
where the sum is actually finite by definition of the space $c_c(\mathbb{N})$, so that properties (3.10) and (3.11) clearly still hold in this case. We note that $\phi(c_c(\mathbb{N}))$ is the linear hull of the set $\{x_n\}$. Now notice that $c_c(\mathbb{N}) \subseteq \ell^2(\mathbb{N})$ is dense. Indeed, if $a = (a_n) \in \ell^2(\mathbb{N})$ and we define
\[ a^{(N)}_n = \begin{cases} a_n & \text{if } n \leq N; \\ 0 & \text{if } n > N \end{cases} \]
for $N \in \mathbb{N}$, then
\[ \bigl\|(a^{(N)}_n) - (a_n)\bigr\|_2^2 = \sum_{n=N+1}^\infty |a_n|^2 \longrightarrow 0 \]
as $N \to \infty$. By Proposition 2.59 there is therefore a unique extension of $\phi$ to $\ell^2(\mathbb{N})$, defined by
\[ \phi((a_n)) = \lim_{N \to \infty} \phi((a^{(N)}_n)). \tag{3.12} \]
By continuity of the norm and the inner product, properties (3.10) and (3.11) extend to all of $\ell^2(\mathbb{N})$. By (3.10), $\phi$ is an isometry from $\ell^2(\mathbb{N})$ onto its image, so the image is complete and therefore closed in $H$. Since $\phi(c_c(\mathbb{N})) = \langle\{x_n\}\rangle$, it follows that $\phi(\ell^2(\mathbb{N})) = \overline{\langle\{x_n\}\rangle}$ is the closed linear hull. Finally (3.12) shows that the series $\sum_{n=1}^\infty a_n x_n$ converges if $(a_n) \in \ell^2(\mathbb{N})$, and if $\sum_{n=1}^\infty a_n x_n$ converges, then (3.10) (applied to the partial sums) also implies that $\sum_{n=1}^\infty |a_n|^2$ converges. □

The argument in the proof above can also be used for orthogonal subspaces as in the following exercise.

Essential Exercise 3.37. (a) Let $(H_n)$ be a finite or countable list of Hilbert spaces. Then we define the direct Hilbert space sum
\[ \bigoplus_n H_n = \Bigl\{(v_n) \Bigm| v_n \in H_n \text{ and } \sum_n \|v_n\|^2 < \infty\Bigr\} \]
to consist of all ‘square summable sequences’. Show that
\[ \langle (v_n), (w_n)\rangle_\oplus = \sum_n \langle v_n, w_n\rangle \]
defines an inner product on the direct sum, making it into a Hilbert space.
(b) Let $H$ be a Hilbert space and $(H_n)$ a finite or countable list of mutually orthogonal closed subspaces of $H$. Show that there is a canonical isometric isomorphism
\[ \phi : \bigoplus_n H_n \longrightarrow \overline{\Bigl\langle \bigcup_n H_n \Bigr\rangle} \]
analogous to Proposition 3.36, and describe the inverse map of $\phi$ using orthogonal projections.

Definition 3.38. A list of orthonormal vectors in a Hilbert space $H$ is said to be complete (or to be an orthonormal basis) if its closed linear hull is $H$.

We note that, strictly speaking, this notion of a Schauder basis of an infinite-dimensional Hilbert space does not coincide with the notion of a basis in the sense of linear algebra, as we allow (in contrast to the standard definition) infinite converging sums to represent arbitrary vectors as linear combinations of the basis vectors. We invite the reader to compare our discussion here with the proof of the existence of a basis for an infinite-dimensional vector space (relying on the axiom of choice and often called a Hamel basis), and hope that the reader agrees with us that the notion of an orthonormal basis of a Hilbert space is much more natural than the notion of a Hamel basis in our context. More importantly, the notion of an orthonormal basis will prove to be much more useful in the following discussions.

Theorem 3.39 (Gram–Schmidt). Every separable Hilbert space $H$ has an orthonormal basis. If $H$ is $n$-dimensional, then $H$ is isomorphic to $\mathbb{R}^n$ or $\mathbb{C}^n$. If $H$ is not finite-dimensional, then $H$ is isomorphic to $\ell^2(\mathbb{N})$.

Here isomorphic means isomorphic as Hilbert spaces, so there is a linear bijection between the spaces that preserves the inner product. The proof of Theorem 3.39 is simply an interpretation of the familiar Gram–Schmidt orthonormalization procedure.

Proof of Theorem 3.39. Let $\{y_1, y_2, \dots\} \subseteq H$ be a dense countable subset. We are going to use the vectors $\{y_n\}$ to construct an orthonormal list of vectors which has the same linear hull. This is built up from the simple geometrical observation that if a vector $v$ does not lie in the linear span of a finite set of vectors, then something from the linear span may be added to $v$ to produce a non-zero vector orthogonal to the linear span. We may assume that $y_1 \neq 0$ and define $x_1 = \frac{y_1}{\|y_1\|}$. Suppose now that we have already constructed orthonormal vectors $x_1, \dots, x_n$ by using the vectors $y_1, \dots, y_k$ with $k \geq n$ in such a way that
\[ V_n = \langle x_1, \dots, x_n \rangle = \langle y_1, \dots, y_k \rangle. \]
If $y_{k+1} \in V_n$ we simply increase $k$ but not $n$. If $y_{k+1} \notin V_n$ we decompose $y_{k+1}$ into a sum $v + w$ with $v \in V_n$ and $w \in V_n^{\perp}$ using the orthogonal decomposition in Hilbert spaces (see Corollary 3.17). Since $w \neq 0$, we may define $x_{n+1} = \frac{w}{\|w\|}$ and obtain an extended list of orthonormal vectors satisfying
\[ V_{n+1} = \langle x_1, \dots, x_{n+1} \rangle = \langle y_1, \dots, y_{k+1} \rangle. \]
Continuing this construction, we see that either $H$ is $n$-dimensional for some $n \geq 0$ and we have produced an orthonormal basis $\{x_1, \dots, x_n\}$ for $H$, or that $H$ is infinite-dimensional, and we construct an infinite list $x_1, x_2, \dots$ of orthonormal vectors in $H$. In this case the linear hull of $\{x_1, x_2, \dots\}$ contains the original dense set $\{y_1, y_2, \dots\}$ and so the closed linear hull must be all of $H$, showing that $\{x_n\}$ is an orthonormal basis of $H$. The remaining statement that $H$ is isomorphic to $\ell^2(\mathbb{N})$ follows from Proposition 3.36. □

Exercise 3.40. Give a direct proof that the closed unit ball in an infinite-dimensional Hilbert space is not compact by using the material from this section.

Exercise 3.41. Recall the Hardy and Bergman spaces $H^2(\mathbb{D})$ and $A^2(\mathbb{D})$ on the unit disk $\mathbb{D} = B_1^{\mathbb{C}}$ from Exercise 2.61. Describe the spaces $H^2(\mathbb{D})$ and $A^2(\mathbb{D})$ in terms of the sequence of Taylor coefficients $(a_n)$ of the Taylor expansion $f(z) = \sum_{n \geq 0} a_n z^n$ of elements of the space.
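The construction in the proof of Theorem 3.39 can be tried out in finite dimensions. The following is a minimal sketch, assuming real vectors in $\mathbb{R}^3$ with the standard inner product; the helper names `inner` and `gram_schmidt` and the tolerance `tol` (needed only because of floating-point rounding) are illustrative choices and not part of the text.

```python
import math

def inner(u, v):
    """Standard inner product on R^d (real case only)."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(ys, tol=1e-12):
    """Orthonormalize the list ys, skipping vectors already in the span.

    Mirrors the proof: if y_{k+1} lies in V_n we increase k but not n;
    otherwise we remove the component in V_n and normalize the rest w.
    """
    xs = []
    for y in ys:
        w = list(y)
        for x in xs:
            # subtract the component of w along the unit vector x
            c = inner(w, x)
            w = [wi - c * xi for wi, xi in zip(w, x)]
        norm = math.sqrt(inner(w, w))
        if norm > tol:  # w != 0 up to rounding, so y was not in V_n
            xs.append([wi / norm for wi in w])
    return xs

# y_2 is a multiple of y_1, so it does not produce a new basis vector.
ys = [(1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (1.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
xs = gram_schmidt(ys)
gram = [[inner(a, b) for b in xs] for a in xs]
```

Here `gram` is the matrix of inner products $\langle x_i, x_j \rangle$, which should be close to the identity matrix, and `xs` has three elements because the four input vectors span $\mathbb{R}^3$.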

As we have noted above, a convergent sum obtained from an orthonormal list may not always be absolutely convergent. However, as we will show now, it is always unconditionally convergent in the sense that the order of summation is irrelevant.

Corollary 3.42. Let $(x_n)$ be a countable orthonormal list in a Hilbert space $H$ and let $(a_n) \in \ell^2(\mathbb{N})$. Then $\sum_{n=1}^{\infty} a_n x_n$ converges unconditionally, meaning that for any permutation $\sigma : \mathbb{N} \to \mathbb{N}$ we have $\sum_{m=1}^{\infty} a_{\sigma(m)} x_{\sigma(m)} = \sum_{n=1}^{\infty} a_n x_n$. In particular, it makes sense to speak of a countable orthonormal basis even if we do not specify an enumeration of the basis.

Proof. Let $(x_n)$, $(a_n)$, and $\sigma : \mathbb{N} \to \mathbb{N}$ be as in the corollary. By Proposition 3.36 the series $v = \sum_{n=1}^{\infty} a_n x_n$ converges, and since
\[ \sum_{m=1}^{\infty} |a_{\sigma(m)}|^2 = \sum_{n=1}^{\infty} |a_n|^2 \]
the same applies to $w = \sum_{m=1}^{\infty} a_{\sigma(m)} x_{\sigma(m)}$. Also by Proposition 3.36 we have $\langle v, x_n \rangle = a_n$ and $\langle w, x_{\sigma(m)} \rangle = a_{\sigma(m)}$ for all $m, n \in \mathbb{N}$. As $\sigma$ is a permutation we see that $\langle v - w, x_n \rangle = 0$ for all $n \in \mathbb{N}$. As $v - w$ belongs to the closed linear hull of $\{x_n\}$, it follows that $v = w$, as claimed.

Suppose now $B \subseteq H$ is a countable set consisting of mutually orthogonal unit vectors with dense linear hull. Then we may choose an enumeration $B = \{x_n \mid n \in \mathbb{N}\}$ and obtain an orthonormal basis in the sense of Definition 3.38. By the above, the properties of the orthonormal basis and also the coordinate $\langle v, x \rangle$ of $v \in H$ associated to a given element $x \in B$ remain unchanged if a different enumeration is used. □

3.2.1 The Non-Separable Case

While the motivation generated by natural examples and the notational convenience of thinking of countable collections as sequences incline one strongly to the separable case, there is no reason to restrict attention completely to separable Hilbert spaces.

Example 3.43. Let $I$ be a set, equipped with the discrete topology and the counting measure $\lambda_{\mathrm{count}}$ defined on the σ-algebra $\mathcal{P}(I)$ of all subsets of $I$. Then $\ell^2(I) = L^2(I, \mathcal{P}(I), \lambda_{\mathrm{count}})$ is a Hilbert space, and it comprises all functions $a : I \to \mathbb{R}$ (or $\mathbb{C}$) for which the support $\operatorname{Supp}(a) = \{i \in I \mid a_i \neq 0\}$ is finite or countable, and for which $\sum_{i \in I} |a_i|^2 = \sum_{i \in \operatorname{Supp}(a)} |a_i|^2 < \infty$.

Theorem 3.44 (Non-separable Gram–Schmidt). Let $H$ be a Hilbert space that is not separable. Then there is an orthonormal basis consisting of elements $x_i$ for every $i$ in an uncountable index set $I$. Moreover, we have an isomorphism $H \cong \ell^2(I)$, where the isomorphism between $H$ and $\ell^2(I)$ is given by
\[ \ell^2(I) \ni a \longmapsto \sum_{i \in \operatorname{Supp}(a)} a_i x_i \]
and the sum on the right is countable and convergent.

Proof. We will construct a maximal orthonormal set of vectors by using Zorn's lemma (see Appendix A.1). Define a partially ordered† set
\[ \mathcal{F} = \{ (I, x_{\cdot}) \mid \text{the function } x_{\cdot} : I \to H \text{ has orthonormal image} \}, \]
with partial order defined by $(I, x_{\cdot}) \preccurlyeq (J, y_{\cdot})$ if $I \subseteq J$ and $x_{\cdot} = y_{\cdot}|_I$. In this partially ordered set every totally ordered subset (or chain) has an upper bound, which can be found by simply taking the union of the index sets and the natural extension of the partially defined functions to the union.

† In order to ensure that this definition does indeed define a set, we could add the requirement that $I$ is a subset of $H$, and let $x_{\cdot}$ be the identity.


It follows that there exists a maximal element $(I, x_{\cdot})$ of this partially ordered set by Zorn's lemma. Using this, define an isometry $\phi : \ell^2(I) \to H$ by
\[ a \longmapsto \sum_{i \in \operatorname{Supp}(a)} a_i x_i \]
first on the subset of all elements $a \in \ell^2(I)$ with $|\operatorname{Supp}(a)| < \infty$, and then, by applying the automatic extension to the closure (Proposition 2.59), on all of $\ell^2(I)$. This again defines an isomorphism from $\ell^2(I)$ to the complete, and hence closed, subspace $Y = \phi(\ell^2(I)) \subseteq H$. We claim that $Y = H$, for otherwise there would exist some $x \in Y^{\perp}$ of norm one by the orthogonal decomposition of Hilbert spaces (Corollary 3.17), and using this element $x$ we can define a new element of $\mathcal{F}$ which is strictly bigger than the maximal element $(I, x_{\cdot})$ in the partial order. This contradiction shows the claim, and hence proves the theorem. □

3.3 Fourier Series on Compact Abelian Groups

Definition 3.45. A topological group is a group $G$ that carries a topology with respect to which the maps $(g, h) \mapsto gh$ and $g \mapsto g^{-1}$ are continuous as maps $G \times G \to G$ and $G \to G$ respectively. A compact (σ-compact, locally compact, and so on) group is a topological group for which the topological space is compact (σ-compact, locally compact, and so on).

We similarly extend other topological and algebraic properties to topological groups. For example, a metric compact abelian group is a compact metric topological space with an abelian group structure satisfying the continuity conditions above. Below we will be largely concerned with specific metric abelian groups, among which the circle or 1-torus $\mathbb{T} = \mathbb{R}/\mathbb{Z} \cong S^1$ with its metric inherited from the usual metric on $\mathbb{R}$ (see the footnote on p. 2) and the $d$-torus
\[ \mathbb{T}^d = \mathbb{R}^d/\mathbb{Z}^d \cong (S^1)^d \]
are the main examples (which will also be discussed in Section 3.4 from a slightly different, more concrete, point of view). The notation $\mathbb{T}$ will be used for the additive circle and $S^1 = \{z \in \mathbb{C} \mid |z| = 1\}$ for the multiplicative circle.

Here compact and abelian are necessary assumptions for the type of result we prove, and dropping either of these two assumptions changes the theory significantly. However, we assume metrizability mainly for convenience, because it gives separability of $C(G)$ by Lemma 2.45. We will use the following facts about compact abelian groups as 'black boxes' (that is, we will not need to know how they are proved at this stage).


However, we will also see in some examples below that these are often easy to prove if the group is given concretely.

Theorem (Existence of Haar measure†(10)). Every locally compact σ-compact metric group $G$ has a left Haar measure $m_G$, satisfying (and, up to positive multiples, characterized by) the properties:
• $m_G(K) < \infty$ for any compact set $K \subseteq G$;
• $m_G(O) > 0$ for any non-empty open set $O \subseteq G$; and
• $m_G(gB) = m_G(B)$ for all measurable $B \subseteq G$ and $g \in G$.

We will usually be dealing with σ-compact metrizable groups, which simplifies the measure theory needed, but the existence of Haar measure only requires the group to be locally compact. For $G = \mathbb{T}^d$, which as a measurable space can be identified with $[0, 1)^d$, the Haar measure is simply the $d$-dimensional Lebesgue measure restricted to $[0, 1)^d$.

Exercise 3.46. Show that the Lebesgue measure on $[0, 1)^d$ considered as a measure on $\mathbb{T}^d$ satisfies all the properties of the Haar measure.

Knowing the defining third property $m_G(x + B) = m_G(B)$ for all Borel subsets $B \subseteq G$, the formula
\[ \int_G f(x) \, dm(x) = \int_G f(g + x) \, dm(x) \tag{3.13} \]
follows immediately for simple functions and then by monotone convergence also for positive integrable functions, and then by taking differences for all integrable functions. Before stating the next fact, we recall that a (unitary) character on a topological group $G$ is a continuous homomorphism
\[ \chi : G \longrightarrow S^1 = \{z \in \mathbb{C} \mid |z| = 1\}. \tag{3.14} \]

The trivial character is the character defined by $\chi(g) = 1$ for all $g \in G$. A collection $\mathcal{F}$ of functions on $G$ (and, in particular, a collection of characters) is said to separate points if for any $g, g' \in G$ with $g \neq g'$ there is some $f \in \mathcal{F}$ with $f(g) \neq f(g')$.

Theorem (Completeness of characters‡). On every locally compact σ-compact metric abelian group $G$ there are enough characters to separate points.

† This will be proved in Section 10.1.
‡ This will be established in Section 12.8, and holds more generally for locally compact abelian groups.

For $G = \mathbb{T}$ this is trivial, because the single character $\chi$ defined by
\[ \chi(x + \mathbb{Z}) = e^{2\pi i x} \]
for $x \in \mathbb{R}$ already separates points since it is an isomorphism between $\mathbb{T}$ and $S^1$. For $G = \mathbb{T}^d$ the characters $\chi_1, \dots, \chi_d$, where
\[ \chi_j(x + \mathbb{Z}^d) = e^{2\pi i x_j} \]
for $x = (x_1, \dots, x_d)^t \in \mathbb{R}^d$, separate points since if $x \neq y$ we must have some $j \in \{1, \dots, d\}$ with $x_j \neq y_j$, and then $\chi_j(x) \neq \chi_j(y)$.

In some discussions about characters we will parameterize the collection of all characters using some index set. For example, we will see shortly that the characters on $\mathbb{T}^d$ are parameterized by elements $n \in \mathbb{Z}^d$ in a natural way if we define for $n \in \mathbb{Z}^d$ the character $\chi_n$ on $\mathbb{T}^d$ by
\[ \chi_n(x + \mathbb{Z}^d) = e^{2\pi i n \cdot x}, \]
where $n \cdot x$ denotes the usual inner product on $\mathbb{R}^d$. We will write $x \in \mathbb{T}^d$ as a shorthand for the element $x + \mathbb{Z}^d \in \mathbb{T}^d$, and whenever convenient we identify $x \in \mathbb{T}^d$ with $x \in [0, 1)^d$.

Assuming the existence of a Haar measure and the completeness of characters as above for a compact metric abelian group $G$, we will now describe the theory of Fourier series on $G$. This will give a complete description of $L^2(G) = L^2_{m_G}(G)$ where $m_G$ is the Haar measure on $G$. For convenience we normalize $m_G$ to satisfy $m_G(G) = 1$.

Theorem 3.47 (Fourier series). Assume that a metric compact abelian group $G$ has a Haar measure and satisfies completeness of characters. Then the set of characters is finite or countably infinite and forms an orthonormal basis of $L^2(G)$. That is, the set of characters is an orthonormal set and any $f \in L^2(G)$ may be written as
\[ f = \sum_{\chi} a_\chi \chi, \]

where the sum, which runs over all the characters of $G$, is convergent(11) with respect to $\|\cdot\|_2$, the equality is meant as elements of $L^2(G)$, the coefficients are given by $a_\chi = \langle f, \chi \rangle$, and they satisfy
\[ \sum_{\chi} |a_\chi|^2 = \|f\|_2^2. \]

The final equality is a form of Parseval's theorem or Parseval's formula.

Proof of Theorem 3.47. Let $\chi$ be a non-trivial character on $G$, so that there is some element $g \in G$ with $\chi(g) \neq 1$. Since $\chi$ is continuous and by assumption $m(G) = 1$, the function $\chi$ is integrable, and
\[ \int_G \chi(x) \, dm(x) = \int_G \chi(g + x) \, dm(x) = \chi(g) \int_G \chi(x) \, dm(x). \tag{3.15} \]


In (3.15) we used the defining invariance property of the Haar measure extended to integrals as in (3.13) and the fact that a character is in particular a homomorphism. However, we have chosen $g$ with $\chi(g) \neq 1$, so (3.15) gives
\[ \int_G \chi \, dm = 0. \]

Now let $\chi_1, \chi_2$ be any characters, and write $\chi = \chi_1 \overline{\chi_2}$. Then $\chi$ is also a character, and since $\overline{\chi_2(g)} = \chi_2(g)^{-1}$, we see that $\chi$ is trivial if and only if $\chi_1 = \chi_2$. Therefore the calculation above gives
\[ \langle \chi_1, \chi_2 \rangle = \int_G \chi_1 \overline{\chi_2} \, dm = \delta_{\chi_1, \chi_2} = \begin{cases} m(G) = 1 & \text{if } \chi_1 = \chi_2; \\ 0 & \text{if } \chi_1 \neq \chi_2, \end{cases} \]
so the characters form an orthonormal set (and this is a consequence of the properties of the Haar measure).

To show that there are only countably many characters on $G$, notice that by orthonormality of the set of characters, the $L^2$ distance between any two distinct characters is $\sqrt{2}$. By Lemma 2.46, $C(G)$ is separable with respect to the $\|\cdot\|_\infty$ norm. This extends to $L^2(G)$ with respect to $\|\cdot\|_2$ since the bound
\[ \|f\|_2^2 = \int_G \underbrace{|f(g)|^2}_{\leq \|f\|_\infty^2} \, dm(g) \leq \|f\|_\infty^2 \]
shows that the embedding $C(G) \to L^2(G)$, which we know has dense image by Proposition 2.51, is continuous. It follows that there can be only countably many distinct characters, since an uncountable collection would give rise to an uncountable collection of disjoint open balls of radius $\frac{1}{2}\sqrt{2}$, contradicting separability.

In order to show completeness we will use the completeness of characters from p. 92. Define the complex linear hull
\[ \mathcal{A} = \langle \chi \mid \chi \text{ a character on } G \rangle, \]
and notice that $\mathcal{A}$ is an algebra since the product of two characters is another character. Also notice that $\mathcal{A}$ is closed under conjugation, since
\[ \overline{\chi}(g) = \overline{\chi(g)} = \chi(g)^{-1} = \chi(-g) \]
for $g \in G$ defines another character if $\chi$ is a character. Since by completeness of characters the algebra $\mathcal{A}$ separates points in $G$, the Stone–Weierstrass theorem now implies that $\mathcal{A}$ is dense in $C(G)$ with respect to $\|\cdot\|_\infty$. However, by the continuity of the embedding from $C(G)$ to $L^2(G)$ the closed linear hull of $\mathcal{A}$ in $L^2(G)$ contains $C(G)$ and so by Proposition 2.51 must be all of $L^2(G)$. Now the theorem follows from the description of the closed linear hull of an orthonormal list in Proposition 3.36. □
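The orthonormality relation $\langle \chi_1, \chi_2 \rangle = \delta_{\chi_1, \chi_2}$ from the proof can be checked numerically for $G = \mathbb{T}$ and the characters $\chi_n(x + \mathbb{Z}) = e^{2\pi i n x}$. The sketch below is only an illustration: it replaces the Haar integral by an equally spaced Riemann sum (an assumption made here for simplicity), and the names `chi`, `inner` and `gram` are hypothetical helpers.

```python
import cmath

def chi(n, x):
    """Character chi_n(x + Z) = exp(2*pi*i*n*x) on the circle T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * x)

def inner(f, g, samples=64):
    """Approximate <f, g> = int_T f * conj(g) dm by a Riemann sum."""
    pts = [j / samples for j in range(samples)]
    return sum(f(x) * g(x).conjugate() for x in pts) / samples

# Gram matrix of the characters chi_{-3}, ..., chi_3.
gram = {(m, n): inner(lambda x: chi(m, x), lambda x: chi(n, x))
        for m in range(-3, 4) for n in range(-3, 4)}
```

For these low frequencies the Riemann sum reproduces the integral up to rounding, so `gram[(m, n)]` should be close to 1 when `m == n` and close to 0 otherwise.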


Exercise 3.48. Let $G \subseteq \mathbb{T}^d$ be a closed subgroup. Show that any character $\chi$ on $G$ is the restriction of a character of the form $\chi_n$ for some $n \in \mathbb{Z}^d$ (by using the arguments from the proof of Theorem 3.47).

Exercise 3.49. Find all the characters on $G = \mathbb{Z}/q\mathbb{Z}$ and prove Theorem 3.47 directly for this case.

Exercise 3.50 (Uncertainty principle on finite groups). Generalize Exercise 3.49 to a finite abelian group $G$, and write $\widehat{G}$ for the group of characters on $G$.
(a) Show that $\widehat{G}$ is a group under the operation of pointwise multiplication, and that we have $|\widehat{G}| = |G|$.
(b) Define the discrete Fourier transform of $f : G \to \mathbb{C}$ to be the function $\widehat{f} : \widehat{G} \to \mathbb{C}$ defined by $\widehat{f}(\chi) = \frac{1}{|G|} \sum_{g \in G} f(g) \overline{\chi(g)}$. Show that $\sum_{\chi \in \widehat{G}} |\widehat{f}(\chi)|^2 = \frac{1}{|G|} \sum_{g \in G} |f(g)|^2$ (Parseval's formula).
(c) Prove that $\|\widehat{f}\|_\infty = \max_{\chi \in \widehat{G}} |\widehat{f}(\chi)| \leq \frac{1}{|G|} \sum_{g \in G} |f(g)| = \|f\|_1$.
(d) Use the Cauchy–Schwarz inequality, Parseval's formula, and the inequality from (c) to deduce the following uncertainty principle(12): $|\operatorname{Supp} f| \cdot |\operatorname{Supp} \widehat{f}| \geq |G|$ for $f \in L^2(G) \setminus \{0\}$.

Exercise 3.51. (a) Find all the characters on $G = (\mathbb{Z}/N\mathbb{Z})^{\mathbb{N}}$ (endowed with the product topology). Show the existence of a Haar measure and the completeness of characters from p. 92 for this case.
(b) Now set $N = 2$ and notice that $G = (\mathbb{Z}/2\mathbb{Z})^{\mathbb{N}}$ is, as a measure space, isomorphic to $(0, 1)$ with the Lebesgue measure (by using the binary expansion of real numbers). Interpret the characters of $G$ as maps on $(0, 1)$ to obtain the orthonormal basis known as the Walsh system.

Exercise 3.52. Let $p \in \mathbb{N}$ be a prime number. Describe all the characters on the compact group of $p$-adic integers $G = \mathbb{Z}_p$, defined by
\[ G = \varprojlim_{n \to \infty} \mathbb{Z}/(p^n \mathbb{Z}) = \Big\{ (z_n) \in \prod_{n=1}^{\infty} \mathbb{Z}/(p^n \mathbb{Z}) \;\Big|\; z_n \equiv z_{n+1} \bmod p^n \mathbb{Z} \Big\}. \]
Show the existence of a Haar measure and the completeness of characters from p. 92 for this case.

The following exercise shows how large the class of metric compact abelian groups really is.

Exercise 3.53. Let $\Gamma$ be a countable abelian group and use it to define
\[ G = \{ (z_\gamma) \in \mathbb{T}^\Gamma \mid z_{\gamma_1 + \gamma_2} = z_{\gamma_1} + z_{\gamma_2} \text{ for all } \gamma_1, \gamma_2 \in \Gamma \}. \]
(a) Show that $G$ is a metric compact abelian group in the induced topology from the product topology on $\mathbb{T}^\Gamma$.
(b) Use the theorem on completeness of the characters from p. 92 to show that the group of characters on $G$ is isomorphic to $\Gamma$.

3.4 Fourier Series on $\mathbb{T}^d$

The discussion in Section 3.3 applies in particular to the torus $G = \mathbb{T}^d$, giving the basic theory of Fourier series of $L^2$ functions there, but this case is so important that we will treat it in greater detail here. Along the way, we will give a proof for Fourier series on the torus that will be independent of the theorem regarding Fourier series on general groups (Theorem 3.47). For this section, we will define a character on $\mathbb{T}^d$ to be a function of the form
\[ \chi_n(x) = e^{2\pi i n \cdot x} = e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)} \]
for all $x \in \mathbb{T}^d$, for some $n \in \mathbb{Z}^d$. We will see in Corollary 3.67 that these are indeed all the characters on $\mathbb{T}^d$ in the sense of Section 3.3 (also see Exercise 3.48). We note that
\[ \overline{\chi_n(x)} = \chi_{-n}(x) = \chi_n(-x) \]
for $n \in \mathbb{Z}^d$ and $x \in \mathbb{T}^d$. A trigonometric polynomial is a finite linear combination
\[ p = \sum_{n \in F} a_n \chi_n \]

of characters, where $F \subseteq \mathbb{Z}^d$ is a finite set and $a_n \in \mathbb{C}$ for all $n \in F$. As in the proof of Theorem 3.47, one can use the complex version of the Stone–Weierstrass theorem to show that every continuous function can be approximated by a trigonometric polynomial. In this section we will give another proof of this using convolution.

Theorem 3.54 (Fourier series on the torus). The characters $\chi_n$ with $n$ in $\mathbb{Z}^d$ form an orthonormal basis for $L^2(\mathbb{T}^d)$, so that every $f \in L^2(\mathbb{T}^d)$ is given by an $L^2$ convergent Fourier series
\[ f = \sum_{n \in \mathbb{Z}^d} a_n \chi_n, \tag{3.16} \]
where the $a_n$ are the Fourier coefficients defined by
\[ a_n = a_n(f) = \langle f, \chi_n \rangle = \int_{\mathbb{T}^d} f(t) \chi_n(-t) \, dt \]
for $n \in \mathbb{Z}^d$. Moreover,
\[ \|f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n|^2. \tag{3.17} \]

Exercise 3.55. (a) Phrase Theorem 3.47 for $d = 1$ using the function $\chi_0 = 1$ and the functions $x \mapsto \cos(2\pi n x)$ and $x \mapsto \sin(2\pi n x)$ for $n \geq 1$.
(b) For every $n \geq 1$ choose $d_n$ such that $f_n(x) = d_n \sin(\pi n x)$ has norm one in $L^2((0, 1))$. Show that $(f_n)_{n \geq 1}$ forms an orthonormal basis of $L^2((0, 1))$. Notice that each $f_n$ satisfies the boundary conditions $f_n(0) = f_n(1) = 0$, which are called the Dirichlet boundary conditions.
(c) For every $n \geq 0$ choose $d_n$ such that $g_n(x) = d_n \cos(\pi n x)$ has norm one in $L^2((0, 1))$. Show that $(g_n)_{n \geq 0}$ forms an orthonormal basis of $L^2((0, 1))$. Note that every $g_n$ satisfies $g_n'(0) = g_n'(1) = 0$, which are called the Neumann boundary conditions.
(d) Find an orthonormal basis $(h_n)_{n \geq 1}$ of $L^2((0, 1))$ that consists of smooth functions satisfying the mixed boundary conditions $h_n(0) = h_n'(1) = 0$ for all $n \geq 1$.

Exercise 3.56. (a) Rephrase Theorem 3.47 for $\mathbb{T}^d$ for real-valued functions using sine and cosine functions.
(b) Find an orthonormal basis of $L^2((0, 1)^d)$ satisfying the Dirichlet boundary conditions (that is, the basis should consist of smooth functions that vanish on the boundary of $[0, 1]^d$).

The relation (3.17) is Parseval's formula, and it may be viewed as an infinite-dimensional form of Pythagoras' theorem. We will see later (see Theorem 4.9 in Section 4.1.1) that it is too much to ask for the Fourier series of a continuous function to converge uniformly, or even pointwise. However, some additional smoothness assumptions do imply uniform convergence of the Fourier series, and this will be the starting point for our excursion into the theory of Sobolev spaces in Chapter 5 and Section 6.4.

Theorem 3.57 (Differentiability and Fourier series). Suppose that $f$ is a function in $C^k(\mathbb{T}^d)$ for some $k \geq 1$. Let $\alpha = (\alpha_1, \dots, \alpha_d) \in \mathbb{N}_0^d$ be a multi-index with $\|\alpha\|_1 \leq k$. Then the Fourier coefficient $a_n(\partial_\alpha f)$ of $\partial_\alpha f$ is given by
\[ a_n(\partial_\alpha f) = (2\pi i n_1)^{\alpha_1} \cdots (2\pi i n_d)^{\alpha_d} a_n(f). \tag{3.18} \]
If $k > d/2$, then the Fourier series on the right-hand side of (3.16) converges absolutely, and
\[ \|f\|_\infty \leq \sum_{n \in \mathbb{Z}^d} |a_n(f)| \ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2}. \]

The reader may wonder in what sense the absolute convergence is meant, and the answer is in all of them: with respect to $\|\cdot\|_2$, pointwise at every point, and with respect to $\|\cdot\|_\infty$. In order to prove these results (independently from the previous section) we will need to discuss convolution.

3.4.1 Convolution on the Torus

Definition 3.58 (Convolution). Fix $p, q \in [1, \infty]$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$, let $f$ be an element of $L^p(\mathbb{T}^d)$ and $g$ an element of $L^q(\mathbb{T}^d)$. Then the convolution of $f$ and $g$ is the function $f * g$ defined by
\[ f * g(x) = \int_{\mathbb{T}^d} f(t) g(x - t) \, dt. \]

A pair of numbers $p, q$ related as in Definition 3.58 are called Hölder conjugate or conjugate exponents due to Hölder's inequality $\|fg\|_1 \leq \|f\|_p \|g\|_q$ for $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$ (see Theorem B.15). This implies in particular that the integral defining $f * g(x)$ exists for all $x \in \mathbb{T}^d$.

Lemma 3.59. Let $p, q \in [1, \infty]$ be Hölder conjugate numbers, let $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$. Then
(1) $f * g = g * f$;
(2) $f * \chi_n = \big( \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)} \, dt \big) \chi_n$; and
(3) $\langle \chi_m, \chi_n \rangle = \delta_{m,n}$.

Proof. The first formula follows by a simple substitution (see Exercise 3.46):
\[ f * g(x) = \int_{\mathbb{T}^d} f(t) g(\underbrace{x - t}_{=u}) \, dt = \int_{\mathbb{T}^d} f(x - u) g(u) \, du = g * f(x). \]
The second formula follows from the definition, since
\[ f * \chi_n(x) = \int_{\mathbb{T}^d} f(t) \chi_n(x - t) \, dt = \int_{\mathbb{T}^d} f(t) \chi_n(x) \chi_n(-t) \, dt = \Big( \int_{\mathbb{T}^d} f(t) \overline{\chi_n(t)} \, dt \Big) \chi_n(x). \]
For the last identity (which is a general property of characters, as we have seen in Section 3.3), note that
\[ \chi_m(t) \overline{\chi_n(t)} = \chi_{m-n}(t) = e^{2\pi i ((m_1 - n_1) t_1 + \cdots + (m_d - n_d) t_d)}, \]
and integrate this character over $\mathbb{T}^d$ to obtain the result. □

Lemma 3.60 (Continuity). Let $p, q \in [1, \infty]$ be Hölder conjugate numbers. If $f \in L^p(\mathbb{T}^d)$ and $p < \infty$ then the shifted function $f^x \in L^p(\mathbb{T}^d)$ defined by $f^x(t) = f(t - x)$ depends continuously on $x \in \mathbb{T}^d$ in the $\|\cdot\|_p$ norm. Moreover, if $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$, then $f * g \in C(\mathbb{T}^d)$.

Proof. For $1 \leq p < \infty$, $C(\mathbb{T}^d)$ is dense in $L^p(\mathbb{T}^d)$ by Proposition 2.51. Now fix $f \in L^p(\mathbb{T}^d)$, $\varepsilon > 0$ and choose $F \in C(\mathbb{T}^d)$ with $\|f - F\|_p < \varepsilon$. Then by uniform continuity of $F$ there exists some $\delta > 0$ for which
\[ d(x, y) < \delta \implies \|F^x - F^y\|_\infty < \varepsilon \implies \|F^x - F^y\|_p < \varepsilon. \]
Since shifting functions preserves their integrals and their $p$-norms, we deduce that $d(x, y) < \delta$ implies that
\[ \|f^x - f^y\|_p \leq \|f^x - F^x\|_p + \|F^x - F^y\|_p + \|F^y - f^y\|_p < 3\varepsilon, \]
showing the continuity of the map $\mathbb{T}^d \ni x \mapsto f^x \in L^p(\mathbb{T}^d)$.

Now suppose that $f \in L^p(\mathbb{T}^d)$ and $g \in L^q(\mathbb{T}^d)$. By Lemma 3.59(1) we may switch $f$ and $g$ if necessary and assume $q < \infty$. Fix $\varepsilon > 0$ and choose $\delta > 0$ so that $\|g^x - g^y\|_q < \varepsilon$ holds whenever $d(x, y) < \delta$. Using the fact that the function $g$ essentially appears in the shifted form $g^x$ in the definition of $f * g(x)$, we now obtain
\[ |f * g(x) - f * g(y)| = \Big| \int_{\mathbb{T}^d} f(t) \big( g(x - t) - g(y - t) \big) \, dt \Big| \leq \int_{\mathbb{T}^d} |f(t)| \, |g(x - t) - g(y - t)| \, dt \leq \|f\|_p \|g^x - g^y\|_q \leq \varepsilon \|f\|_p \]
by the Hölder inequality, whenever $d(x, y) < \delta$ (strictly speaking the function $(\tilde{g})^x(t) = g(x - t)$ with $\tilde{g}(t) = g(-t)$ appears in the definition of $f * g(x)$, but using $\|\tilde{g}^x - \tilde{g}^y\|_p = \|g^x - g^y\|_p$ this does not make much of a difference). As $\varepsilon > 0$ was arbitrary we see that $f * g$ is continuous. □

3.4.2 Dirichlet and Fejér Kernels

Let us assume first that $d = 1$. By Lemma 3.59(2) the $n$th term in the Fourier series of $f$ is given by $a_n(f) \chi_n = f * \chi_n$ with $a_n(f) = \langle f, \chi_n \rangle$ for every $n \in \mathbb{Z}$. Thus the partial sums of the Fourier series satisfy
\[ \sum_{n=-N}^{N} a_n(f) \chi_n = f * \Big( \sum_{n=-N}^{N} \chi_n \Big). \]

This observation motivates the following definition.

Definition 3.61. The $N$th Dirichlet kernel is the function $D_N \in C(\mathbb{T})$ defined by
\[ D_N = \sum_{n=-N}^{N} \chi_n. \]

The 8th Dirichlet kernel is illustrated in Figure 3.3.

Fig. 3.3: The 8th Dirichlet kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.


Fig. 3.4: The 8th Fejér kernel on the interval $[-\frac{1}{2}, \frac{1}{2}]$.

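Lemma 3.62 (Dirichlet kernel). The Dirichlet kernel $D_N$ is real-valued and can also be expressed in the form
\[ D_N(x) = \begin{cases} 2N + 1 & \text{if } x = 0 \in \mathbb{T}, \\[2pt] \dfrac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} = \dfrac{\sin\big((N + \frac{1}{2}) 2\pi x\big)}{\sin(\pi x)} & \text{if } x \neq 0, \end{cases} \]
and satisfies
\[ \int_{\mathbb{T}} D_N(x) \, dx = 1. \]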

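Proof. The case $x = 0$ and the integral calculation follow immediately from the definitions. To check the formula for $x \neq 0$ we notice that the Dirichlet kernel is a geometric series and use the relation $\sin \varphi = \frac{e^{i\varphi} - e^{-i\varphi}}{2i}$:
\[ \begin{aligned} D_N(x) &= \sum_{n=-N}^{N} \big(e^{2\pi i x}\big)^n = e^{-2\pi i N x} \Big( 1 + \cdots + \big(e^{2\pi i x}\big)^{2N} \Big) \\ &= e^{-2\pi i N x} \, \frac{\big(e^{2\pi i x}\big)^{2N+1} - 1}{e^{2\pi i x} - 1} = \frac{e^{2\pi i (N+1) x} - e^{-2\pi i N x}}{e^{2\pi i x} - 1} \\ &= \frac{e^{2\pi i (N + \frac{1}{2}) x} - e^{-2\pi i (N + \frac{1}{2}) x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{\sin\big((N + \frac{1}{2}) 2\pi x\big)}{\sin(\pi x)}. \end{aligned} \]
The latter formula also implies that $D_N$ is real-valued. □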
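The two expressions for $D_N$ in Lemma 3.62 can be compared numerically. The sketch below is an illustration only: the function names, the sample grid, and the use of a 64-point Riemann sum for the integral are choices made here and do not come from the text.

```python
import cmath
import math

def dirichlet_sum(N, x):
    """D_N(x) as the defining sum of characters chi_n(x), n = -N..N."""
    return sum(cmath.exp(2j * math.pi * n * x) for n in range(-N, N + 1))

def dirichlet_closed(N, x):
    """Closed form from Lemma 3.62, with the value 2N + 1 at x = 0 in T."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-15:
        return 2 * N + 1
    return math.sin((N + 0.5) * 2 * math.pi * x) / s

N = 8
xs = [j / 200 for j in range(-100, 101)]  # grid on [-1/2, 1/2]

# Largest discrepancy between the character sum and the closed form.
max_gap = max(abs(dirichlet_sum(N, x) - dirichlet_closed(N, x)) for x in xs)

# The kernel is real-valued but does take negative values.
min_value = min(dirichlet_closed(N, x) for x in xs)

# Riemann sum for the integral of D_N over T; for a trigonometric
# polynomial of degree N < 64 it matches the integral up to rounding.
integral = sum(dirichlet_closed(N, j / 64) for j in range(64)) / 64
```

One would expect `max_gap` to be at the level of rounding error, `min_value` to be negative, and `integral` to be close to 1.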

By the above, the Dirichlet kernel is real-valued but takes on both positive and negative values. By averaging we obtain another kernel that only takes on non-negative values, which will be crucial later.

Definition 3.63. The $M$th Fejér kernel is the function $F_M \in C(\mathbb{T})$ defined by
\[ F_M = \frac{1}{M} \sum_{m=0}^{M-1} D_m. \]
The 8th Fejér kernel is shown in Figure 3.4.

Lemma 3.64 (Fejér kernel). The $M$th Fejér kernel is given by
\[ F_M(x) = \begin{cases} M & \text{if } x = 0, \\[2pt] \dfrac{1}{M} \Big( \dfrac{\sin(M \pi x)}{\sin(\pi x)} \Big)^2 & \text{if } x \neq 0 \end{cases} \]
and satisfies the following properties:
• $F_M(x) \geq 0$ for all $x \in \mathbb{T}$;
• $\int_{\mathbb{T}} F_M(x) \, dx = 1$;
• $F_M(x) \to 0$ as $M \to \infty$ uniformly on every set of the form $[\delta, 1 - \delta]$ for $\delta > 0$.

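Proof. We first verify the formula claimed for $F_M$. If $x = 0$, then
\[ F_M(0) = \frac{1}{M} \sum_{m=0}^{M-1} (2m + 1) = M. \]
For $x \neq 0$ we use Lemma 3.62 and obtain
\[ \begin{aligned} F_M(x) &= \frac{1}{M} \sum_{m=0}^{M-1} \frac{e^{2\pi i (m+1) x} - e^{-2\pi i m x}}{e^{2\pi i x} - 1} \\ &= \frac{1}{M} \frac{1}{(e^{2\pi i x} - 1)} \Big( e^{2\pi i x} \sum_{m=0}^{M-1} e^{2\pi i m x} - \sum_{m=0}^{M-1} e^{-2\pi i m x} \Big) \\ &= \frac{1}{M (e^{2\pi i x} - 1)} \Big( e^{\pi i x} e^{\pi i x} \frac{e^{2\pi i M x} - 1}{e^{2\pi i x} - 1} - e^{\pi i x} e^{-\pi i x} \frac{e^{-2\pi i M x} - 1}{e^{-2\pi i x} - 1} \Big) \\ &= \frac{1}{M} \frac{e^{\pi i x}}{(e^{2\pi i x} - 1)} \Big( \frac{e^{\pi i x}}{e^{2\pi i x} - 1} \big( e^{2\pi i M x} - 1 \big) + \frac{e^{-\pi i x}}{1 - e^{-2\pi i x}} \big( e^{-2\pi i M x} - 1 \big) \Big) \\ &= \frac{1}{M} \frac{1}{(e^{\pi i x} - e^{-\pi i x})^2} \big( e^{2\pi i M x} - 2 + e^{-2\pi i M x} \big) \\ &= \frac{1}{M} \frac{(e^{\pi i M x} - e^{-\pi i M x})^2}{(e^{\pi i x} - e^{-\pi i x})^2} = \frac{1}{M} \frac{\sin^2(M \pi x)}{\sin^2(\pi x)}. \end{aligned} \]
Now it is clear that $F_M(x) \geq 0$. Since $\int_{\mathbb{T}} D_m(x) \, dx = 1$ for all $m \geq 0$ we also have $\int_{\mathbb{T}} F_M(x) \, dx = 1$. Finally, for $x \in [\delta, 1 - \delta]$ we have $\sin(\pi x) \geq \frac{\pi}{2} \delta$ and hence
\[ F_M(x) \leq \frac{1}{M} \Big( \frac{2}{\pi \delta} \Big)^2 \longrightarrow 0 \]
uniformly for $x \in [\delta, 1 - \delta]$ as $M \to \infty$. □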
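The three properties in Lemma 3.64 and the closed formula can be illustrated numerically. This is a sketch under assumptions made here: the Dirichlet kernel is written as the real cosine sum $1 + 2\sum_{n=1}^{m} \cos(2\pi n x)$ (which equals $\sum_{n=-m}^{m} \chi_n(x)$), the suprema are taken over a finite grid only, and all function names are hypothetical.

```python
import math

def dirichlet(m, x):
    """D_m(x) written as a real cosine sum: 1 + 2 * sum_{n=1}^m cos(2 pi n x)."""
    return 1 + 2 * sum(math.cos(2 * math.pi * n * x) for n in range(1, m + 1))

def fejer_avg(M, x):
    """F_M(x) from Definition 3.63: the average of D_0, ..., D_{M-1}."""
    return sum(dirichlet(m, x) for m in range(M)) / M

def fejer_closed(M, x):
    """Closed form from Lemma 3.64, with the value M at x = 0 in T."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-15:
        return float(M)
    return (math.sin(M * math.pi * x) / s) ** 2 / M

M, delta = 8, 0.1
xs = [j / 200 for j in range(0, 201)]  # grid on [0, 1]

# Agreement of the two formulas and non-negativity on the grid.
max_gap = max(abs(fejer_avg(M, x) - fejer_closed(M, x)) for x in xs)
min_value = min(fejer_closed(M, x) for x in xs)

def sup_away_from_zero(M):
    """Maximum of F_M over the grid points in [delta, 1 - delta]."""
    return max(fejer_closed(M, x) for x in xs if delta <= x <= 1 - delta)

# These maxima should shrink as M grows (third property of the lemma).
sups = [sup_away_from_zero(M) for M in (8, 64, 512)]
```

Each entry of `sups` should stay below the bound $\frac{1}{M}\big(\frac{2}{\pi\delta}\big)^2$ from the proof, which is how the uniform convergence to zero away from $x = 0$ shows up in this experiment.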


Proposition 3.65 (Density of trigonometric polynomials). For a continuous function $f$ on $\mathbb{T}$ we have $f * F_M \to f$ as $M \to \infty$ with respect to $\|\cdot\|_\infty$. In particular, trigonometric polynomials are dense in $C(\mathbb{T})$.

This behaviour of the sequence of functions $(F_M)$ with respect to convolution is also described by saying that the sequence $(F_M)$ is an approximate identity. As we will see in the proof, this property holds for any sequence of functions that satisfy the last three properties of the Fejér kernel in Lemma 3.64.

Proof of Proposition 3.65. Let $f \in C(\mathbb{T})$ and fix $\varepsilon > 0$. Then there exists some $\delta \in (0, \frac{1}{2})$ for which
\[ d(x, y) < \delta \implies |f(x) - f(y)| < \varepsilon. \]
Now we estimate the difference $f * F_M(x) - f(x)$ as follows. By commutativity of convolution and the facts that $F_M \geq 0$ and $\int_{\mathbb{T}} F_M(t) \, dt = 1$ we have
\[ \begin{aligned} |f * F_M(x) - f(x)| &= |F_M * f(x) - f(x)| \\ &= \Big| \int_{\mathbb{T}} F_M(t) f(x - t) \, dt - \int_{\mathbb{T}} F_M(t) f(x) \, dt \Big| \\ &\leq \int_{\mathbb{T}} F_M(t) |f(x - t) - f(x)| \, dt. \end{aligned} \]
Now we split the range of integration into the interval $[\delta, 1 - \delta]$ and its complement:
\[ |f * F_M(x) - f(x)| \leq \int_{\delta}^{1-\delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{\leq 2\|f\|_\infty} \, dt + \int_{-\delta}^{\delta} F_M(t) \underbrace{|f(x - t) - f(x)|}_{< \varepsilon} \, dt. \]
By the uniform convergence $F_M \to 0$ on $[\delta, 1 - \delta]$ from Lemma 3.64, the first integral is less than $\varepsilon$ for all sufficiently large $M$, while the second integral is at most $\varepsilon$ since $F_M \geq 0$ and $\int_{\mathbb{T}} F_M(t) \, dt = 1$. As $\delta$ is independent of $x$, the same is true for $M$, and we see that $f * F_M$ converges uniformly to $f$. To see the final statement of the proposition note that $f * F_M$ is a trigonometric polynomial (by linearity and Lemma 3.59(2)). □

Exercise 3.66. Analyze where the above proof fails if we replace the Fejér kernel by the Dirichlet kernel.
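The uniform convergence $f * F_M \to f$ of Proposition 3.65 can be observed on a simple example. The following sketch makes several assumptions that are not in the text: the test function is a triangle wave, the convolution integral is replaced by a Riemann sum with 1000 sample points, and the supremum norm is only measured on a grid of 50 points.

```python
import math

def fejer(M, t):
    """Closed form of the Fejer kernel F_M from Lemma 3.64."""
    s = math.sin(math.pi * t)
    if abs(s) < 1e-15:
        return float(M)
    return (math.sin(M * math.pi * t) / s) ** 2 / M

def f(x):
    """A continuous, non-smooth test function on T: a triangle wave."""
    return abs((x % 1.0) - 0.5)

def fejer_mean(M, x, samples=1000):
    """Riemann-sum approximation of (f * F_M)(x) = int_T F_M(t) f(x - t) dt."""
    return sum(fejer(M, j / samples) * f(x - j / samples)
               for j in range(samples)) / samples

grid = [k / 50 for k in range(50)]

# Approximate sup-norm distance between f * F_M and f for growing M.
errors = [max(abs(fejer_mean(M, x) - f(x)) for x in grid) for M in (4, 16, 64)]
```

The list `errors` should decrease as $M$ increases, in line with the approximate identity property of $(F_M)$.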

Proof of Theorem 3.54. We start with the case $d = 1$. Proposition 3.65 shows that the linear hull $\mathcal{A}$ of the characters (that is, the space of trigonometric polynomials) is dense in $C(\mathbb{T})$ with respect to $\|\cdot\|_\infty$. Therefore the same holds with respect to $\|\cdot\|_2$ in $L^2(\mathbb{T})$. By Lemma 3.59, the characters on $\mathbb{T}$ form an orthonormal set. Thus the description of the closed linear hull of the characters follows from Proposition 3.36 and proves the theorem for $d = 1$.

The case of $d \geq 2$ is similar once we have shown that the space of trigonometric polynomials is dense in the continuous functions. For this, notice first that
\[ \widetilde{F_M}(x_1, \dots, x_d) = F_M(x_1) \cdots F_M(x_d) \]
is a trigonometric polynomial satisfying

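• $\widetilde{F_M} \geq 0$,
• $\int_{\mathbb{T}^d} \widetilde{F_M}(x) \, dx = 1$, and
• $\int_{[-\delta, \delta]^d} \widetilde{F_M}(x) \, dx = \big( \int_{-\delta}^{\delta} F_M(t) \, dt \big)^d > 1 - \varepsilon$

for $\varepsilon, \delta > 0$ and large enough $M$ (how large depending on $\varepsilon$ and $\delta$). Next notice that $f * \widetilde{F_M}$ is a trigonometric polynomial for any $f \in C(\mathbb{T}^d)$. The argument is now similar to the case $d = 1$: we again show that the sequence $(\widetilde{F_M})$ is an approximate identity. Given $f \in C(\mathbb{T}^d)$ and $\varepsilon > 0$ we can choose $\delta > 0$ such that $|f(x - t) - f(x)| < \varepsilon$ for $x \in \mathbb{T}^d$ and $t \in [-\delta, \delta]^d$. This implies that
\[ \big| f * \widetilde{F_M}(x) - f(x) \big| \leq \int_{\mathbb{T}^d} |f(x - t) - f(x)| \, \widetilde{F_M}(t) \, dt \leq \int_{[-\delta, \delta]^d} \underbrace{|f(x - t) - f(x)|}_{< \varepsilon} \widetilde{F_M}(t) \, dt + \cdots \]

[…] $d > 1$ and $\chi : \mathbb{T}^d \to S^1$ is a continuous homomorphism. Writing $x = (x_1, \dots, x_d)$ we have
\[ \chi(x) = \underbrace{\chi(x_1, 0, 0, \dots, 0)}_{\chi^{(1)}(x_1)} \, \underbrace{\chi(0, x_2, 0, \dots, 0)}_{\chi^{(2)}(x_2)} \cdots \underbrace{\chi(0, 0, \dots, x_d)}_{\chi^{(d)}(x_d)}, \]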

where each $\chi^{(i)} : \mathbb{T} \to S^1$ is a continuous homomorphism for $1 \leq i \leq d$. By the argument above we have $\chi^{(i)} = \chi_{n_i}$ for some $n_i \in \mathbb{Z}$, and so the result follows. □

Exercise 3.68 (Polynomial approximate identities on $\mathbb{R}$). Give another proof of the Weierstrass approximation theorem (Example 2.41) using the Landau kernel defined by $L_n(x) = \ell_n (1 - x^2)^n$ for $x \in \mathbb{R}$, where $\ell_n > 0$ is chosen to make $\int_{-1}^{1} L_n(x) \, dx = 1$.
(a) Given $f \in C([0, 1])$, extend $f$ continuously to $\mathbb{R}$ with support in $[-c, 1 + c]$ for some small $c > 0$ (the choice of which will matter in (b)). Show that
\[ L_n * f(x) = \int_{\mathbb{R}} L_n(t) f(x - t) \, dt \]
is a polynomial.
(b) State, prove, and use an appropriate approximate identity property of the sequence $(L_n)$ to show that $L_n * f(x)$ converges uniformly on $[0, 1]$ to $f$ as $n \to \infty$.

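3.4.3 Differentiability and Fourier Series

We now turn to the interplay between Fourier series and differentiation, with the goal of proving Theorem 3.57. As we will see, this relationship will be a simple but important consequence of integration by parts. Suppose that $f \in C^1(\mathbb{T}^d)$ and $j \in \{1, \dots, d\}$. Notice that
\[ \begin{aligned} \int_{\mathbb{T}} \partial_j f(x) \overline{\chi_n(x)} \, dx_j &= \Big[ f(x) \overline{\chi_n(x)} \Big]_{x_j = 0}^{x_j = 1} - \int_{\mathbb{T}} f(x) \partial_j \overline{\chi_n(x)} \, dx_j \\ &= - \int_{\mathbb{T}} f(x) \partial_j \overline{\chi_n(x)} \, dx_j \\ &= 2\pi i n_j \int_{\mathbb{T}} f(x) \overline{\chi_n(x)} \, dx_j \end{aligned} \tag{3.19} \]
by integration by parts and periodicity, since
\[ (\partial_j \chi_n)(x) = \partial_j e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)} = 2\pi i n_j e^{2\pi i (n_1 x_1 + \cdots + n_d x_d)}, \]
so that $\partial_j \overline{\chi_n} = -2\pi i n_j \overline{\chi_n}$. Integrating over the remaining variables, we see that the Fourier coefficients of $f$ and of the partial derivative $\partial_j f$ satisfy the relation
\[ \begin{aligned} a_n(\partial_j f) = \int_{\mathbb{T}^d} \partial_j f(x) \overline{\chi_n(x)} \, dx &= 2\pi i n_j \int_{\mathbb{T}^d} f(x) \overline{\chi_n(x)} \, dx \quad \text{(by (3.19))} \\ &= 2\pi i n_j a_n(f). \end{aligned} \tag{3.20} \]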
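Relation (3.20) can be checked numerically for $d = 1$. The sketch below is only an illustration under assumptions made here: the test function $f(x) = e^{\cos(2\pi x)}$ and its hand-computed derivative are chosen for convenience, and the Fourier coefficients are approximated by a 256-point Riemann sum (the helper `coeff` is hypothetical).

```python
import cmath
import math

def coeff(g, n, samples=256):
    """Riemann-sum approximation of a_n(g) = int_T g(x) * conj(chi_n(x)) dx."""
    return sum(g(j / samples) * cmath.exp(-2j * math.pi * n * j / samples)
               for j in range(samples)) / samples

def f(x):
    """Smooth 1-periodic test function."""
    return math.exp(math.cos(2 * math.pi * x))

def df(x):
    """Derivative of f, computed by hand via the chain rule."""
    return -2 * math.pi * math.sin(2 * math.pi * x) * f(x)

# Compare a_n(f') with 2*pi*i*n*a_n(f) for a few frequencies n.
gaps = [abs(coeff(df, n) - 2j * math.pi * n * coeff(f, n))
        for n in range(-5, 6)]
```

For a smooth periodic function such as this one, the entries of `gaps` should be at the level of rounding error.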

Proof of Theorem 3.57. The formula (3.18) follows from (3.20) by induction on $k$. To prove the last claim of the theorem, we will show that
\[ \sum_{n \in \mathbb{Z}^d} |a_n(f)| \ll_d \sqrt{\|f\|_2^2 + \|\partial_1^k f\|_2^2 + \cdots + \|\partial_d^k f\|_2^2} \tag{3.21} \]
for $f \in C^k(\mathbb{T}^d)$ and $k > \frac{d}{2}$. Assuming (3.21) for the moment, we see that
\[ \sum_{n \in \mathbb{Z}^d} a_n(f) \chi_n \]
is an absolutely convergent series with respect to $\|\cdot\|_\infty$, and so converges to some limit $F$ in $C(\mathbb{T}^d)$ with respect to $\|\cdot\|_\infty$. However, since $\|\cdot\|_2 \leq \|\cdot\|_\infty$ the same function $F$ is also a limit with respect to $\|\cdot\|_2$. Hence by Fourier series on the torus (Theorem 3.54) we have $F = f$, first as an identity in $L^2(\mathbb{T}^d)$ (and hence almost everywhere), but as both functions are continuous, also in $C(\mathbb{T}^d)$ (and hence everywhere).

To prove (3.21), we start by expressing the right-hand side in terms of the Fourier coefficients $a_n = a_n(f)$. By Parseval's theorem in (3.17) applied to $\partial_{e_j}^k f$ we have
\[ \|\partial_{e_j}^k f\|_2^2 = \sum_{n \in \mathbb{Z}^d} |a_n(\partial_{e_j}^k f)|^2 = \sum_{n \in \mathbb{Z}^d} (2\pi n_j)^{2k} |a_n(f)|^2, \]
where we have used (3.18) in the last step. Therefore we can simplify the sum under the square root in (3.21) to give the estimate
\[ \begin{aligned} \|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2 &= \sum_{n \in \mathbb{Z}^d} \Big( 1 + (2\pi)^{2k} \sum_{j=1}^{d} n_j^{2k} \Big) |a_n(f)|^2 \\ &\gg_d \sum_{n \in \mathbb{Z}^d} \big( 1 + \|n\|_2^{2k} \big) |a_n(f)|^2 \end{aligned} \]
since $\|n\|_2 \leq \sqrt{d} \max_{1 \leq j \leq d} |n_j|$ and hence $\|n\|_2^{2k} \ll_d \sum_{j=1}^{d} |n_j|^{2k}$. We claim that
\[ \Big( \big( 1 + \|n\|_2^{2k} \big)^{-1/2} \Big)_{n \in \mathbb{Z}^d} \in \ell^2(\mathbb{Z}^d) \tag{3.22} \]
for $k > \frac{d}{2}$. From this claim the inequality (3.21) follows quickly by the Cauchy–Schwarz inequality:

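\[
\begin{aligned}
\sum_{n\in\mathbb{Z}^d} |a_n(f)|
&= \sum_{n\in\mathbb{Z}^d} \big(1+\|n\|_2^{2k}\big)^{-1/2} \big(1+\|n\|_2^{2k}\big)^{1/2} |a_n(f)| \\
&\le \Big\| \Big( \big(1+\|n\|_2^{2k}\big)^{-1/2} \Big)_{n\in\mathbb{Z}^d} \Big\|_{\ell^2(\mathbb{Z}^d)} \sqrt{\sum_{n\in\mathbb{Z}^d} \big(1+\|n\|_2^{2k}\big) |a_n(f)|^2} \\
&\ll_d \sqrt{\|f\|_2^2 + \|\partial_{e_1}^k f\|_2^2 + \cdots + \|\partial_{e_d}^k f\|_2^2}.
\end{aligned}
\]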

To verify the claim that

\[
\sum_{n\in\mathbb{Z}^d} \frac{1}{1+\|n\|_2^{2k}} < \infty
\]

for k > d/2, we first note that the summands do not change if the sign of a coordinate of n is changed, so it suffices to consider those n ∈ Z^d with n_1, . . . , n_d ≥ 0. Secondly, using the symmetry of the summands with respect to permutation of the variables, we may restrict the sum to those n ∈ Z^d for which n_2, n_3, . . . , n_d ≤ n_1, and we may also assume that n_1 ≥ 1. Now 1 + ‖n‖_2^{2k} ≥ n_1^{2k}, so

\[
\sum_{n\in\mathbb{Z}^d} \frac{1}{1+\|n\|_2^{2k}}
\ll_d \sum_{n_1=1}^{\infty}\ \sum_{n_2,\ldots,n_d=0}^{n_1} \frac{1}{n_1^{2k}}
= \sum_{n_1=1}^{\infty} \frac{(n_1+1)^{d-1}}{n_1^{2k}}
\ll_d \sum_{n_1=1}^{\infty} \frac{1}{n_1^{2k+1-d}},
\]

and the last sum converges if 2k > d. This implies the claim above, the inequality (3.21), and hence the theorem. □

The above is already sufficient to answer another claim from Section 1.5. We state a special case of the inheritance of smoothness below, and return again to this topic in Chapter 5.

Exercise 3.69. Let f be a real-valued function defined on an open subset U ⊆ R^2. Suppose that f is continuous and that ∂_1^4 f, ∂_2^4 f exist and are continuous. Show that ∂_1 ∂_2 f exists and is continuous.
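The role of the condition 2k > d in the claim (3.22) can be illustrated numerically. The sketch below takes d = 2 and compares partial sums over boxes for k = 2 (where 2k > d) and for the borderline value k = 1 (where 2k = d and the sum diverges logarithmically); the function name and the box radii are illustrative assumptions, not notation from the text.

```python
def box_partial_sum(k, radius):
    """Sum of 1 / (1 + |n|^(2k)) over n = (n1, n2) in Z^2 with |n1|, |n2| <= radius,
    where |n| denotes the Euclidean norm of n."""
    total = 0.0
    for n1 in range(-radius, radius + 1):
        for n2 in range(-radius, radius + 1):
            total += 1.0 / (1.0 + float(n1 * n1 + n2 * n2) ** k)
    return total

def growth(k, radius=50):
    """Increase of the partial sum when the box radius is doubled."""
    return box_partial_sum(k, 2 * radius) - box_partial_sum(k, radius)

for k in (1, 2):
    print(k, box_partial_sum(k, 50), growth(k))
```

For k = 2 doubling the box barely changes the partial sum, while for k = 1 each doubling adds roughly the same amount, which is the numerical signature of logarithmic divergence.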

3.5 Group Actions and Representations

In this section we describe, in particular, how functions on R^2 can be decomposed into functions that have special rotational symmetries, as alluded to in Section 1.1. The same argument also applies to symmetries with respect to rotations about the z-axis in R^3 (but not to all rotations in R^3), and to many other situations. A convenient framework that incorporates all of these examples and much more is the following set-up.


3.5.1 Group Actions and Unitary Representations

Definition 3.70. Let G be a topological group, and let X be a topological space. A continuous group action of G on X is a continuous map

\[
G \times X \longrightarrow X, \qquad (g, x) \longmapsto g.x
\]

with g.(h.x) = (gh).x for g, h ∈ G and x ∈ X, and e.x = x for all x ∈ X, where e ∈ G is the identity element.

In this definition we have used multiplicative notation for the group operation in G, but as usual if G is abelian we will often use additive notation.

Definition 3.71. Let G be a topological group with a continuous action on a topological space X, and let µ be a measure on the Borel sets of X. Then we say that the G-action is measure-preserving (or that the measure is G-invariant) if µ(g.B) = µ(B) for all g ∈ G and any Borel set B ⊆ X.

It is straightforward (see Exercise 3.72) to see that a measure-preserving action also preserves integration with respect to µ in the sense that

\[
\int_X f^g \, d\mu = \int_X f(g^{-1}.x) \, d\mu(x) = \int_X f(x) \, d\mu(x) = \int_X f \, d\mu, \qquad (3.23)
\]

for all integrable functions f and g ∈ G, where we define f^g(x) = f(g^{-1}.x) for all x ∈ X (the inverse in the definition of f^g is only necessary if G is non-abelian). In particular, if f_1, f_2 ∈ L^2_µ(X) then ‖f_1^g‖_2 = ‖f_1‖_2 and, more generally,

\[
\langle f_1^g, f_2^g \rangle = \langle f_1, f_2 \rangle. \qquad (3.24)
\]

We can associate to the action of g on X an operator π_g on L^2_µ(X) defined by π_g f = f^g = f ∘ g^{-1}. Using (3.24) and the relation π_g π_{g^{-1}} = π_{g^{-1}} π_g = I, we see that π_g is unitary (see Definition 3.8 and Exercise 3.9) for all g ∈ G.

Essential Exercise 3.72. Prove that a group action is measure-preserving if and only if (3.23) holds for all integrable f.

Definition 3.73. A unitary representation of a topological group G on a Hilbert space H is a map π : G → B(H), written as π_g (or π(g), g.v, or v^g for v ∈ H), such that

• (Identity) π_e = I, the identity operator on H;
• (Composition) π_{g_1} ∘ π_{g_2} = π_{g_1 g_2} for g_1, g_2 ∈ G;
• (Unitary) π_g : H → H is unitary for every g ∈ G; and
• (Continuity) for any given v ∈ H, the map g ↦ π_g v ∈ H is continuous.

We note that the first three properties together state that π is a homomorphism into the group of unitary operators.

Lemma 3.74 (Continuity). Let G be a locally compact metric group acting continuously on a locally compact σ-compact metric space X. Suppose that the action is measure-preserving with respect to a locally finite† measure µ on the Borel sets of X. Then the induced action of G on H = L^2_µ(X) is a unitary representation of G on H. More generally, for any p ∈ [1, ∞) and f ∈ L^p_µ(X) we have that G ∋ g ↦ f^g ∈ L^p_µ(X) is continuous with respect to ‖·‖_p and satisfies ‖f^g‖_p = ‖f‖_p.

Lemma 3.74 is an important motivation for the study of unitary representations in general. We will return to this topic in several settings.

Proof of Lemma 3.74. By Exercise 3.72 (see the hints on p. 563), π_g defined by π_g f(x) = f(g^{-1}.x) for x ∈ X and f ∈ L^p_µ(X) satisfies ‖π_g f‖_p = ‖f‖_p for every g ∈ G. The first property of a unitary representation holds trivially. For the second, let f be an element of L^p_µ(X) and g_1, g_2 ∈ G. Applying the definition twice we obtain

\[
\pi_{g_1}(\pi_{g_2} f)(x) = \pi_{g_2} f(g_1^{-1}.x) = f\big(g_2^{-1}.(g_1^{-1}.x)\big) = f\big((g_1 g_2)^{-1}.x\big) = \pi_{g_1 g_2} f(x)
\]

for x ∈ X, giving the second defining property of a unitary representation.

For the continuity we essentially have to repeat the argument from the proof of Lemma 3.60 regarding convolutions on the torus. So let p ∈ [1, ∞) and f ∈ L^p_µ(X) and fix ε > 0. By Proposition 2.51, there exists some function F ∈ C_c(X) with

\[
\|f - F\|_p < \varepsilon. \qquad (3.25)
\]

Let U be a compact neighbourhood of e ∈ G, so that

\[
K = U.\operatorname{Supp} F \subseteq X
\]

is compact and hence has finite measure. The map

\[
(g, x) \longmapsto F(g^{-1}.x)
\]

is continuous and so is uniformly continuous on the set U × K. Hence there exists a δ > 0 for which

\[
d(g, e) < \delta \implies g \in U \ \text{ and } \ \big|F(g^{-1}.y) - F(y)\big| < \varepsilon \big/ \sqrt[p]{\mu(K)}
\]

for all y ∈ K. Note also that F(g^{-1}.y) ≠ 0 for y ∈ X and g ∈ U implies g^{-1}.y ∈ Supp F and hence y ∈ K. Thus

\[
\|\pi_g F - F\|_p^p = \int_X \big|F(g^{-1}.x) - F(x)\big|^p \, d\mu(x) = \int_K \big|F(g^{-1}.x) - F(x)\big|^p \, d\mu(x) < \varepsilon^p
\]

for all g ∈ G with d(g, e) < δ. Together with (3.25) and the identity ‖π_g(f − F)‖_p = ‖f − F‖_p this gives

\[
\|\pi_g f - f\|_p \le \|\pi_g (f - F)\|_p + \|\pi_g F - F\|_p + \|F - f\|_p < 3\varepsilon
\]

for all such g. For g_0 ∈ G we also have ‖π_g f − π_{g_0} f‖_p = ‖π_{g_0^{-1} g} f − f‖_p. Since ε > 0 and g_0 ∈ G were arbitrary, continuity of g ↦ π_g f follows. □

† That is, a measure µ with the property that at every point there is an open neighbourhood of the point with finite µ-measure.

We explain now that any continuous group action and invariant measure give rise to a convolution, which will play a critical role in the following discussions.

Lemma 3.75. Let G be a locally compact σ-compact metric group with a left Haar measure m_G, X a locally compact σ-compact metric space with a continuous group action of G on X, and µ a locally finite G-invariant measure on X. Let φ ∈ L^1(G), p ∈ [1, ∞) and f ∈ L^p_µ(X). Then the integral

\[
\varphi * f(x) = \int_G \varphi(g)\, f(g^{-1}.x) \, dm_G(g)
\]

exists for µ-almost every x ∈ X, and ‖φ ∗ f‖_p ≤ ‖φ‖_1 ‖f‖_p.

We note that the inequality also shows that φ ∗ f ∈ L^p_µ(X) only depends on the equivalence classes of f ∈ L^p_µ(X) and φ ∈ L^1(G).

Proof of Lemma 3.75. Let (Y, B, ν) be a probability space. Recall that the map [0, ∞) ∋ t ↦ t^p is convex, which implies that (∫ f dν)^p ≤ ∫ f^p dν for any non-negative simple function f on Y. Using monotone convergence we obtain the same for any non-negative measurable f on Y (that is, a special case of Jensen's inequality).

Consider now G, φ, X, and f as in the lemma, but assume first that φ is a non-negative measurable function with ∫ φ dm_G = 1, and f ≥ 0. We apply Jensen's inequality to the function g ↦ f(g^{-1}.x) and the probability measure φ dm_G on G to obtain

\[
\Big( \int_G f(g^{-1}.x)\, \varphi(g) \, dm_G(g) \Big)^p \le \int_G f(g^{-1}.x)^p\, \varphi(g) \, dm_G(g).
\]

Fubini's theorem shows that φ ∗ f(x) = ∫_G f(g^{-1}.x) φ(g) dm_G(g) depends measurably on x ∈ X. Integrating over X with respect to µ gives

\[
\begin{aligned}
\int_X (\varphi * f)^p \, d\mu
&\le \int_X \int_G f(g^{-1}.x)^p\, \varphi(g) \, dm_G(g) \, d\mu(x) \\
&= \|f\|_p^p \int_G \varphi(g) \, dm_G(g) = \|f\|_p^p,
\end{aligned}
\]

where we also applied Fubini's theorem to the function

\[
G \times X \ni (g, x) \longmapsto f(g^{-1}.x)^p\, \varphi(g)
\]

and used the fact that π_g preserves the p-norm (see Lemma 3.74). Taking the pth root, we obtain ‖φ ∗ f‖_p ≤ ‖f‖_p in the case considered.

If f ∈ L^p_µ(X) and φ ∈ L^1(G), then we can apply the above to \tilde{f} = |f| and \tilde{φ} = ‖φ‖_1^{-1} |φ|. Since

\[
\begin{aligned}
|\varphi * f|(x) &= \Big| \int_G \varphi(g)\, f(g^{-1}.x) \, dm_G(g) \Big| \\
&\le \|\varphi\|_1 \int_G \tilde{\varphi}(g)\, \tilde{f}(g^{-1}.x) \, dm_G(g) = \|\varphi\|_1\, \tilde{\varphi} * \tilde{f}(x),
\end{aligned}
\]

the lemma follows from the previous case. □
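The inequality of Lemma 3.75 can be sanity-checked in a finite setting. The sketch below assumes G = X = Z/N acting on itself by translation, with counting measure playing the role of both m_G and µ; the names `convolve`, `p_norm` and `young_gap`, the group size and the random test vectors are illustrative assumptions only.

```python
import random

def convolve(phi, f):
    """(phi * f)(x) = sum over g of phi[g] * f[x - g] on the cyclic group Z/N,
    with counting measure in the role of both m_G and mu."""
    size = len(f)
    return [sum(phi[g] * f[(x - g) % size] for g in range(size)) for x in range(size)]

def p_norm(values, p):
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def young_gap(phi, f, p):
    """||phi||_1 * ||f||_p - ||phi * f||_p, which the lemma predicts is >= 0."""
    return p_norm(phi, 1) * p_norm(f, p) - p_norm(convolve(phi, f), p)

random.seed(0)
N = 12
phi = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(N)]
for p in (1, 1.5, 2, 3):
    print(p, young_gap(phi, f, p))
```

Taking φ to be the indicator function of the identity element gives φ ∗ f = f, so the gap vanishes in that case and the constant ‖φ‖_1 cannot be improved in general.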



Essential Exercise 3.76. Prove Lemma 3.75 in the case p = ∞.

We will sometimes consider direct sums of representations as in the next exercise.

Exercise 3.77. Let G be a topological group, and let π_n be a unitary representation of G on the Hilbert space H_n for n ≥ 1. Define H_⊕ = ⊕_n H_n as in Exercise 3.37. Show that π_{⊕,g}(v_n) = (π_{n,g} v_n) for g ∈ G and (v_n) ∈ H_⊕ defines a unitary representation of G.
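The remark after (3.23), that the inverse in f^g(x) = f(g^{-1}.x) is only needed for non-abelian G, and the composition property in Definition 3.73 can both be seen in a small example. The following sketch assumes the symmetric group S_3 acting on the three points {0, 1, 2}; all function names are hypothetical and chosen for this illustration.

```python
from itertools import permutations

# Elements of S_3 as tuples g, where g[x] is the image of the point x in {0, 1, 2}.
GROUP = list(permutations(range(3)))

def compose(g, h):
    """The product gh, acting as x -> g.(h.x)."""
    return tuple(g[h[x]] for x in range(3))

def inverse(g):
    inv = [0] * 3
    for x in range(3):
        inv[g[x]] = x
    return tuple(inv)

def pi(g, f):
    """(pi_g f)(x) = f(g^{-1}.x), with f stored as a tuple of values."""
    g_inv = inverse(g)
    return tuple(f[g_inv[x]] for x in range(3))

def rho(g, f):
    """The same recipe without the inverse: (rho_g f)(x) = f(g.x)."""
    return tuple(f[g[x]] for x in range(3))

def is_homomorphism(op, f):
    """Check op_g(op_h f) == op_{gh} f for all g, h in S_3."""
    return all(op(g, op(h, f)) == op(compose(g, h), f) for g in GROUP for h in GROUP)

f = (10, 20, 30)
print(is_homomorphism(pi, f), is_homomorphism(rho, f))  # prints: True False
```

Without the inverse one obtains ρ_g ρ_h = ρ_{hg}, which differs from ρ_{gh} as soon as g and h do not commute.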

3.5.2 Unitary Representations of Compact Abelian Groups

We now describe the decomposition of elements in a complex Hilbert space into elements of special types with respect to a unitary representation of a compact abelian group.

Definition 3.78. Let G be a topological group and let π be a unitary representation of G on the complex Hilbert space H. Let χ : G → S^1 be a character on G. Then v ∈ H is of type χ or has weight χ (type or weight n ∈ Z^d if χ = χ_n in the case G = T^d) if π_g v = χ(g)v for all g ∈ G. We also define the weight space H_χ = {v ∈ H | v has weight χ}.

We note that H_χ = ⋂_{g∈G} ker(π_g − χ(g)I) is in fact just a common eigenspace if one considers all operators π_g for g ∈ G simultaneously, and in particular is closed. Here χ gives us the various eigenvalues χ(g) as g ∈ G (and hence the operator π_g) varies.

Lemma 3.79 (Orthogonality). Let G, π, and H be as in Definition 3.78. Then H_χ ⊥ H_η for any two characters χ ≠ η of G.

Proof. Let v ∈ H_χ, w ∈ H_η and g ∈ G. Then

\[
\chi(g) \langle v, w \rangle = \langle \chi(g) v, w \rangle = \langle \pi_g v, w \rangle = \langle v, \pi_{-g} w \rangle = \langle v, \eta(-g) w \rangle = \eta(g) \langle v, w \rangle.
\]

However, for some g ∈ G we have χ(g) ≠ η(g) and so ⟨v, w⟩ = 0. □



The following result gives as a special case the decomposition of functions on R^2 into components of different weights for SO_2(R).

Theorem 3.80 (Spectral theorem of compact abelian groups). Let G be a compact metric abelian group and let π be a unitary representation on a complex Hilbert space H. Then the weight spaces H_χ are closed, mutually orthogonal, subspaces of H and

\[
H = \bigoplus_{\chi} H_\chi,
\]

where the sum is over all characters χ of G. More concretely, every v ∈ H can be written as a convergent sum Σ_χ v_χ in H, where

\[
v_\chi = \overline{\chi} *_\pi v = \int_G \overline{\chi(g)}\, \pi_g v \, dm_G(g) \qquad (3.26)
\]

has weight χ.

We need to explain the meaning of the convolution in (3.26) by extending the notion of Riemann or Lebesgue integration to functions taking values in a Hilbert space (or even a Banach space). In Lemma 3.75 we have already seen one possible interpretation of the convolution in the case where the Hilbert space is L^2_µ(X) and the unitary representation is induced from a measure-preserving action. Although this is an interesting case, there are other ways to come by a unitary representation. Hence we will discuss in the following two subsections two more general definitions of such integrals.

3.5.3 The Strong (Riemann) Integral

One way to interpret the convolution in (3.26) is to generalize the Riemann integral to this context, giving rise to a special case of the Bochner or strong integral. This approach applies to any unitary representation as in Theorem 3.80 (and more


generally). Recall that for a fixed v ∈ H we have assumed in that theorem that π_g v ∈ H depends continuously on g ∈ G, and since the map g ↦ χ(g) is continuous we also see that the integrand in (3.26) depends continuously on g ∈ G. Extracting the essential assumptions from this application we obtain the assumptions of the next statement.

Proposition 3.81 (Strong integration). Let (X, d) be a compact metric space and let µ be a finite Borel measure on X. Let V be a Banach space, and let f : X → V be a continuous map. Then we can define Riemann sums of the form

\[
R(f, \xi) = \sum_{P \in \xi} \mu(P)\, f(x_P) \in V,
\]

where ξ = {P_1, . . . , P_k} is a partition of X into finitely many non-empty measurable sets and x_P ∈ P is a point chosen arbitrarily in each partition element P ∈ ξ. Suppose that

\[
\lim_{n\to\infty} \max_{P \in \xi_n} \operatorname{diam}(P) = 0 \qquad (3.27)
\]

along a sequence of partitions (ξ_n). Then the sequence of associated Riemann sums (R(f, ξ_n)) forms a Cauchy sequence in V. The limit is a well-defined Riemann integral R-∫_X f dµ ∈ V that is independent of the choice of the partitions and sample points.

Proof. Since f is continuous on a compact metric space, it is also uniformly continuous. Fix some ε > 0 and let δ > 0 be such that d(x, y) < δ implies ‖f(x) − f(y)‖ < ε. Suppose ξ = {P_1, . . . , P_k} and ζ = {Q_1, . . . , Q_ℓ} are two partitions such that diam(P_i) < δ and diam(Q_j) < δ for all i, j. Let

\[
\eta = \xi \vee \zeta = \{P_i \cap Q_j \mid P_i \cap Q_j \neq \emptyset,\ i = 1, \ldots, k \text{ and } j = 1, \ldots, \ell\}.
\]

Fix some choice of sample points for ξ, ζ, and η and note that x_{P_i} ∈ P_i and z_{P_i ∩ Q_j} ∈ P_i ∩ Q_j ⊆ P_i implies d(x_{P_i}, z_{P_i ∩ Q_j}) < δ for every i and j. This gives

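\[
\begin{aligned}
\|R(f, \xi) - R(f, \eta)\|
&= \Big\| \sum_{i=1}^{k} {\sum_{j}}' \big(f(x_{P_i}) - f(z_{P_i \cap Q_j})\big)\, \mu(P_i \cap Q_j) \Big\| \\
&\le \sum_{i=1}^{k} {\sum_{j}}' \big\|f(x_{P_i}) - f(z_{P_i \cap Q_j})\big\|\, \mu(P_i \cap Q_j) \\
&\le \sum_{i=1}^{k} {\sum_{j}}' \varepsilon\, \mu(P_i \cap Q_j) = \varepsilon\, \mu(X),
\end{aligned}
\]

where we write Σ′_j for the sum over those j ∈ {1, . . . , ℓ} with P_i ∩ Q_j ≠ ∅. The same holds for the Riemann sums R(f, ζ) and R(f, η), which implies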


that ‖R(f, ξ) − R(f, ζ)‖ ≤ 2µ(X)ε. This implies the proposition: if (ξ_n) is a sequence satisfying (3.27), then for every ε > 0 there exists some N such that ξ = ξ_m, ζ = ξ_n satisfy the above discussion whenever m, n ≥ N. This implies that (R(f, ξ_n)) is a Cauchy sequence. If (ζ_n) is another such sequence, we may mix the two sequences of partitions into another sequence (by, for example, setting η_{2n−1} = ξ_n and η_{2n} = ζ_n for all n ∈ N) satisfying (3.27). Since the Riemann sums for this sequence also form a Cauchy sequence, we see that the limit is indeed independent of the choice of the sequence of partitions and the choice of the sample points. □

Note that in the context of Theorem 3.80 we may set X = G, the measure µ = m_G, V = H, and take f to be the integrand in (3.26), and use Proposition 3.81 to obtain a definition of the convolution in (3.26).

Essential Exercise 3.82. Using the same notation and assumptions as in Proposition 3.81, show that R-∫_X f dµ depends linearly on f and satisfies

\[
\Big\| \text{R-}\!\!\int_X f \, d\mu \Big\| \le \int_X \|f\| \, d\mu.
\]
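Proposition 3.81 can be illustrated with a small numerical experiment. The sketch below assumes X = [0, 1] with Lebesgue measure, V = R^2, partitions into equal intervals (a half-open versus closed endpoint makes no difference to the sums), and randomly chosen sample points; the map f and all names are illustrative choices that do not come from the text.

```python
import math
import random

def f(t):
    """A continuous map from X = [0, 1] into the Banach space V = R^2."""
    return (math.cos(2 * math.pi * t), t * t)

def riemann_sum(cells, rng):
    """R(f, xi) for the partition of the unit interval into `cells` equal intervals,
    with one sample point chosen at random in each interval."""
    total = [0.0, 0.0]
    for i in range(cells):
        sample = (i + rng.random()) / cells
        value = f(sample)
        total[0] += value[0] / cells
        total[1] += value[1] / cells
    return tuple(total)

rng = random.Random(1)
for cells in (10, 100, 1000):
    print(cells, riemann_sum(cells, rng))
```

Whatever sample points are drawn, the sums approach the vector (0, 1/3), in line with the independence of the limit from the choice of partitions and sample points.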

3.5.4 The Weak (Lebesgue) Integral

In the following we will describe an integral for functions taking values in a Hilbert space, and this is also the approach that requires the least amount of structure for the function. On the other hand, this approach does not allow integration of functions taking values in any Banach space, but works similarly for any dual space (and so also for the class of reflexive Banach spaces to be defined later). This is a special case of the weak, Pettis or Gelfand–Pettis integral. In the general setting the properties of the Pettis integral involve subtle developments in measure theory; we refer to Talagrand [102] for the details.

Proposition 3.83 (Weak integration). Let (X, B, µ) be a measure space and let H be a Hilbert space. Let the function f : X → H have the properties that x ↦ ‖f(x)‖ is measurable and integrable, and that for any v ∈ H the map x ↦ ⟨v, f(x)⟩ is measurable. Then there exists a unique element of H denoted ∫_X f dµ and called the weak integral of f with

\[
\Big\langle v, \int_X f \, d\mu \Big\rangle = \int_X \langle v, f(x) \rangle \, d\mu(x) \qquad (3.28)
\]

for all v ∈ H. Moreover, ∫_X f dµ depends linearly on f and satisfies

\[
\Big\| \int_X f \, d\mu \Big\| \le \int_X \|f\| \, d\mu.
\]

Proof. To see this we only have to show that the right-hand side of (3.28) defines a continuous functional on H, for then the Fréchet–Riesz representation theorem (Corollary 3.19) implies the claimed existence and uniqueness. By the Cauchy–Schwarz inequality we have

\[
\Big| \int_X \langle v, f(x) \rangle \, d\mu(x) \Big| \le \int_X |\langle v, f(x) \rangle| \, d\mu(x) \le \|v\| \int_X \|f(x)\| \, d\mu(x),
\]

so the hypotheses show that the integral converges, and hence the map is well-defined. Moreover, for any scalar α and v, w ∈ H we have by linearity of the inner product that

\[
\int_X \langle \alpha v + w, f(x) \rangle \, d\mu(x) = \alpha \int_X \langle v, f(x) \rangle \, d\mu(x) + \int_X \langle w, f(x) \rangle \, d\mu(x),
\]

showing linearity of the functional. This shows that ∫_X f dµ ∈ H exists, and that ‖∫_X f dµ‖ ≤ ∫_X ‖f‖ dµ. Using uniqueness, one can now check that this definition depends linearly on f. □

Lemma 3.84. Let (X, d) be a compact metric space and let µ be a finite Borel measure on X. Let H be a Hilbert space, and let f : X → H be a continuous map. Then R-∫_X f dµ = ∫_X f dµ, that is, the strong and weak integrals agree.

Proof. Let (ξ_n) be a sequence of partitions as in Proposition 3.81 defining R-∫_X f dµ as the limit of the Riemann sums R(f, ξ_n). Let w ∈ H and notice that

\[
\langle w, R(f, \xi_n) \rangle = \sum_{P \in \xi_n} \langle w, f(x_P) \rangle\, \mu(P) = \int_X F_n \, d\mu,
\]

where F_n is the simple function with values F_n(x) = ⟨w, f(x_P)⟩ for all x ∈ P and P ∈ ξ_n. Note that F_n(x) → ⟨w, f(x)⟩ as n → ∞ by continuity of f and the assumption (3.27) on (ξ_n). Letting n → ∞ we can apply the definition of the strong integral and dominated convergence to obtain

\[
\Big\langle w, \text{R-}\!\!\int_X f \, d\mu \Big\rangle = \int_X \langle w, f(x) \rangle \, d\mu(x).
\]

As this holds for any w ∈ H, the lemma follows from the construction of the weak integral in Proposition 3.83. □

In the context of unitary representations of a group G on a Hilbert space H, we can use the notions of integration above to define convolution with measures or with L^1 functions as follows.

Definition 3.85. Let π be a unitary representation of a topological group G on a Hilbert space H. Let µ be a finite measure on G. Then for v ∈ H we define the convolution operator

\[
\mu *_\pi v = \int_G \pi_g v \, d\mu(g).
\]

If G has a left Haar measure m_G and φ ∈ L^1(G) = L^1_{m_G}(G), then for v ∈ H we define

\[
\varphi *_\pi v = \int_G \varphi(g)\, \pi_g v \, dm_G(g).
\]

Essential Exercise 3.86. (a) Let π, G, H, µ, and φ be as in Definition 3.85. Show that ‖µ ∗_π v‖ ≤ µ(G)‖v‖ and ‖φ ∗_π v‖ ≤ ‖φ‖_1 ‖v‖ for all v ∈ H, so that µ ∗_π (·) and φ ∗_π (·) define two bounded operators on H.
(b) Suppose that the unitary representation π on H = L^2_µ(X) is induced by a measure-preserving action of G as in Lemma 3.75. Let ν be a finite measure on G, φ ∈ L^1(G) and f ∈ H. Generalize Lemma 3.75 to also give a definition of ν ∗ f. Show that the pointwise defined functions ν ∗ f and φ ∗ f satisfy (3.28) (or equivalently that ν ∗ f = ν ∗_π f and φ ∗ f = φ ∗_π f).

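3.5.5 Proof of the Weight Decomposition

We are now ready to prove Theorem 3.80 which, apart from the generalized context, is a simple extension of the theorem regarding Fourier series on compact abelian groups (Theorem 3.47). We will use the assumptions of the theorem in this section without further remark.

Lemma 3.87 (Convolution with χ). Let χ be a character of G. Then \overline{χ} ∗_π v, defined by the integral ∫_G \overline{χ(h)} π_h v dm_G(h), has weight χ for any v in H.

Proof. We need to prove that π_g(\overline{χ} ∗_π v) = χ(g)(\overline{χ} ∗_π v). We have

\[
\begin{aligned}
\langle w, \pi_g(\overline{\chi} *_\pi v) \rangle
&= \langle \pi_{-g} w, \overline{\chi} *_\pi v \rangle \\
&= \int_G \big\langle \pi_{-g} w, \overline{\chi(h)}\, \pi_h v \big\rangle \, dm_G(h) \\
&= \int_G \big\langle w, \overline{\chi(h)}\, \pi_{g+h} v \big\rangle \, dm_G(h) \\
&= \int_G \big\langle w, \overline{\chi(h' - g)}\, \pi_{h'} v \big\rangle \, dm_G(h') \\
&= \int_G \big\langle w, \chi(g)\, \overline{\chi(h')}\, \pi_{h'} v \big\rangle \, dm_G(h') = \big\langle w, \chi(g)\, \overline{\chi} *_\pi v \big\rangle
\end{aligned}
\]

for w ∈ H, which gives the lemma. □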

Lemma 3.88 (Convolution with χ). If χ is a character and v ∈ H_χ then we have \overline{χ} ∗_π v = v. If η ≠ χ are two different characters and v ∈ H_η, then \overline{χ} ∗_π v = 0.

Proof. If v ∈ H_χ then

\[
\overline{\chi} *_\pi v = \int_G \overline{\chi(g)}\, \pi_g v \, dm_G(g) = \int_G \overline{\chi(g)}\, \chi(g)\, v \, dm_G(g) = v,
\]

since we assume that m_G(G) = 1 (strictly speaking, we should argue by taking the inner product with another vector, as in Proposition 3.83; here, and in similar cases below, we keep this implicit when convenient). Also for v ∈ H_η we have

\[
\overline{\chi} *_\pi v = \int_G \overline{\chi(h)}\, \pi_h v \, dm_G = \int_G \overline{\chi(h)}\, \eta(h)\, v \, dm_G = \Big( \int_G \overline{\chi}\, \eta \, dm_G \Big) v = 0.
\]

□

Proof of Theorem 3.80. Let H′ be the closed linear hull of H_χ for all characters χ of G. By Lemma 3.79 the various weight spaces are mutually orthogonal, and (as already noted) closed, which gives us the following description of their closed linear hull

\[
H' = \bigoplus_{\chi} H_\chi = \Big\{ \sum_{\chi} v_\chi \ \Big|\ v_\chi \in H_\chi \text{ and } \sum_{\chi} \|v_\chi\|^2 < \infty \Big\},
\]

see Exercise 3.37. If H′ = H then the theorem follows from Exercise 3.86(a) (which specializes Proposition 3.83) and Lemma 3.88. In fact, (3.26) can then be shown as follows: if v = Σ_η v_η is an element of H′ with v_η ∈ H_η for every character η of G, then continuity of convolution implies

\[
\overline{\chi} *_\pi v = \overline{\chi} *_\pi \Big( \sum_{\eta} v_\eta \Big) = \sum_{\eta} \overline{\chi} *_\pi v_\eta = v_\chi
\]

for any character χ of G.

To see that H′ = H we show that any vector v ∈ H can be approximated by a vector in the linear hull of the spaces H_χ. Fix v and ε > 0. By continuity of the unitary representation there exists some δ > 0 such that ‖π_g v − v‖ < ε for g ∈ B_δ^G. By Urysohn's lemma (Lemma A.27) there exists a function f in C(G) with f(0) > 0, f ≥ 0, and f(g) = 0 for all g ∈ G∖B_δ^G. Replacing f by a multiple of itself we may also suppose ∫_G f dm_G = 1. Using this function together with the last part of the proof of Theorem 3.47 (or the more concrete argument from Section 3.4.2 in the case of the torus) we see that there is a trigonometric polynomial (that is, a finite linear combination of characters) F with the following properties:

• F ≥ 0;
• ∫_G F dm_G = 1; and
• ∫_{B_δ^G} F dm_G > 1 − ε.

The careful reader might notice that the approximation may not be real-valued nor have integral one. To deal with this issue we let F_1 be the first ε′-approximation and define F_2 to be the function ½(F_1 + \overline{F_1}) + ε′, which is a positive 2ε′-approximation since f ≥ 0. Now define F = (∫ F_2 dm_G)^{-1} F_2. Since ∫ f dm_G = 1, we have |∫ F_2 dm_G − 1| < 2ε′ and so F is an ε-approximation if ε′ is sufficiently small.

Then F ∗_π v ∈ H′ is a finite linear combination of elements from weight spaces by Lemma 3.87. However, we also claim that

\[
\|F *_\pi v - v\| \le \varepsilon\, (1 + 2\|v\|). \qquad (3.29)
\]

To see this, let w ∈ H, and then notice that

\[
\begin{aligned}
|\langle w, F *_\pi v - v \rangle|
&= \Big| \Big\langle w, \int_G F(g)\, \pi_g v \, dm_G(g) \Big\rangle - \Big\langle w, \Big( \int_G F \, dm_G \Big) v \Big\rangle \Big| \\
&= \Big| \int_G \langle w, F(g)\, (\pi_g v - v) \rangle \, dm_G(g) \Big| \\
&\le \int_{B_\delta^G} \underbrace{|\langle w, \pi_g v - v \rangle|}_{\le \|w\| \varepsilon} F(g) \, dm_G(g) + \int_{G \smallsetminus B_\delta^G} \underbrace{|\langle w, \pi_g v - v \rangle|}_{\le 2\|w\|\|v\|} F(g) \, dm_G(g) \\
&\le \|w\|\, \varepsilon\, (1 + 2\|v\|),
\end{aligned}
\]

which implies (3.29) since we may apply the inequality with w = F ∗_π v − v. □

For the case of the continuous action of T on R^2 defined by

\[
\varphi.\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = k(2\pi\varphi)\, x = \begin{pmatrix} \cos(2\pi\varphi) & -\sin(2\pi\varphi) \\ \sin(2\pi\varphi) & \cos(2\pi\varphi) \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix},
\]

for φ ∈ T and all x = (x_1, x_2)^t ∈ R^2, the above immediately gives the following corollary (that we hinted at in Section 1.1).

Corollary 3.89. Every function f ∈ L^2(R^2) can be written uniquely as a sum Σ_{n∈Z} f_n that converges with respect to ‖·‖_2, where

\[
f_n(x) = \int_{\mathbb{T}} \chi_n(\varphi)\, f(k(2\pi\varphi)\, x) \, d\varphi
\]

and f_n has weight n with respect to the rotation action of T on R^2 for all integers n ∈ Z.

Exercise 3.90. Let f ∈ C^∞(R^2) ∩ L^2(R^2) (or, more generally, f ∈ C^∞(R^2)). Show that the decomposition of f given by Corollary 3.89 converges uniformly on compact subsets of R^2 to f. How much smoothness is needed to arrive at this uniform convergence?
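The weight decomposition of Theorem 3.80 can be made concrete in the simplest finite setting. The sketch below assumes G = Z/N (a compact abelian group, with normalized counting measure as Haar measure) acting on functions on Z/N by translation, and computes the components v_χ by averaging the complex conjugate of the character against π_g v. The group size, the test vector and all function names are illustrative assumptions.

```python
import cmath
import math

N = 8  # the finite cyclic group Z/N, with normalized counting measure as Haar measure

def chi(k, g):
    """The character chi_k(g) = exp(2*pi*i*k*g/N) of Z/N."""
    return cmath.exp(2j * math.pi * k * g / N)

def shift(g, v):
    """(pi_g v)(x) = v(x - g): the translation representation on functions on Z/N."""
    return [v[(x - g) % N] for x in range(N)]

def weight_component(k, v):
    """v_chi: the average over g of conj(chi_k(g)) * pi_g v."""
    out = [0j] * N
    for g in range(N):
        shifted = shift(g, v)
        c = chi(k, g).conjugate()
        for x in range(N):
            out[x] += c * shifted[x] / N
    return out

def distance(u, w):
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(u, w)))

v = [complex(x * x % 5, x % 3) for x in range(N)]
components = [weight_component(k, v) for k in range(N)]
total = [sum(comp[x] for comp in components) for x in range(N)]
print(distance(total, v))
```

In this setting the decomposition is a form of the discrete Fourier transform: each component satisfies π_g v_χ = χ(g) v_χ, distinct components are orthogonal, and the components sum to v.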

3.5.6 Convolution

We conclude the chapter by generalizing the convolution considered in Section 3.4.1 to a more general context, combining it with the discussion regarding the convolution with respect to a unitary representation.

Proposition 3.91 (Convolutions). Let G be a locally compact σ-compact metric group with a left Haar measure m_G. Define the convolution f_1 ∗ f_2 of f_1, f_2 ∈ L^1(G) by

\[
f_1 * f_2(g) = \int_G f_1(h)\, f_2(h^{-1} g) \, dm_G(h)
\]

for all g ∈ G. The integral defining f_1 ∗ f_2(g) exists for m_G-almost every g in G, and defines an element f_1 ∗ f_2 ∈ L^1(G) with ‖f_1 ∗ f_2‖_1 ≤ ‖f_1‖_1 ‖f_2‖_1. In other words, the convolution makes L^1(G) into a separable Banach algebra. Suppose in addition π is a unitary representation of G on the Hilbert space H. Then

\[
f_1 *_\pi (f_2 *_\pi v) = (f_1 * f_2) *_\pi v,
\]

where ∗_π is defined in Definition 3.85. In other words, H is a module for the Banach algebra L^1(G).

Proof. Let G be as in the proposition. Then G ∋ h ↦ g.h = gh for every g ∈ G defines (by the definition of topological group) a continuous group action of G on X = G, called the left action of G (on G). By the definition of a left Haar measure, m_G is a G-invariant locally finite measure on G for the left action of G. Therefore, we may apply Lemma 3.75 to this action, giving ‖f_1 ∗ f_2‖_1 ≤ ‖f_1‖_1 ‖f_2‖_1 for f_1, f_2 ∈ L^1(G). Clearly the map (f_1, f_2) ↦ f_1 ∗ f_2 is bilinear in f_1, f_2 ∈ L^1(G).

It remains to show associativity for the operation. For this assume that f_1, f_2, f_3 ∈ L^1(G), and then note that

\[
\begin{aligned}
(f_1 * f_2) * f_3(g) &= \int_G (f_1 * f_2)(h)\, f_3(h^{-1} g) \, dm_G(h) \\
&= \int_G \int_G f_1(k)\, f_2(k^{-1} h)\, f_3(h^{-1} g) \, dm_G(k) \, dm_G(h).
\end{aligned}
\]

Using Fubini and the substitution ℓ = k^{-1} h for a fixed k gives

\[
\begin{aligned}
(f_1 * f_2) * f_3(g) &= \int_G f_1(k) \underbrace{\int_G f_2(\ell)\, f_3(\ell^{-1} k^{-1} g) \, dm_G(\ell)}_{(f_2 * f_3)(k^{-1} g)} \, dm_G(k) \\
&= f_1 * (f_2 * f_3)(g),
\end{aligned}
\]

as required.

For the proof of separability of L^1(G) we apply Lemma A.22 to find a sequence (K_n) of compact subsets K_n ⊆ G with K_n ⊆ K_{n+1}^o for all n ≥ 1 and with G = ⋃_{n=1}^∞ K_n. By Lemma 2.46, C(K_n) is separable, which implies the same for the subset C_c(K_n^o), for every n ≥ 1. Since m_G(K_n) < ∞ the inclusion of C_c(K_n^o) into L^1(G) is continuous, and so the image of C_c(K_n^o) in L^1(G) is also separable. By density of C_c(G) = ⋃_{n=1}^∞ C_c(K_n^o) in L^1(G) (Proposition 2.51) this proves separability of L^1(G).

For the second part we suppose π is a unitary representation of G on the Hilbert space H and that v, w ∈ H. Then

\[
\begin{aligned}
\langle w, f_1 *_\pi (f_2 *_\pi v) \rangle
&= \Big\langle w, \int_G f_1(h)\, \pi_h \Big( \int_G f_2(g)\, \pi_g v \, dm_G(g) \Big) dm_G(h) \Big\rangle \\
&= \int_G \overline{f_1(h)}\, \Big\langle \pi_{h^{-1}} w, \int_G f_2(g)\, \pi_g v \, dm_G(g) \Big\rangle \, dm_G(h) \\
&= \iint \overline{f_1(h)}\, \big\langle \pi_{h^{-1}} w, f_2(g)\, \pi_g v \big\rangle \, dm_G(g) \, dm_G(h) \\
&= \iint \big\langle w, f_1(h)\, f_2(g)\, \pi_{hg} v \big\rangle \, dm_G(g) \, dm_G(h).
\end{aligned}
\]

Using the substitution k = hg in the inner integral and exchanging the order of integration we obtain

\[
\begin{aligned}
\langle w, f_1 *_\pi (f_2 *_\pi v) \rangle
&= \iint \big\langle w, f_1(h)\, f_2(h^{-1} k)\, \pi_k v \big\rangle \, dm_G(k) \, dm_G(h) \\
&= \int_G \big\langle w, f_1 * f_2(k)\, \pi_k v \big\rangle \, dm_G(k) = \langle w, (f_1 * f_2) *_\pi v \rangle.
\end{aligned}
\]

By the definition of the weak integral in Proposition 3.83, this implies the proposition. □

Essential Exercise 3.92. Suppose in addition to the assumptions of Proposition 3.91 that G is abelian and that the Haar measure is invariant under† g ↦ −g. Show that f_1 ∗ f_2 = f_2 ∗ f_1 for any f_1, f_2 ∈ L^1(G).

Exercise 3.93. Recall from Exercise 3.33 that the space of signed measures on G is a Banach space. Define a convolution for elements in the space of signed measures on G, and extend Proposition 3.91 to this case.

† The assumption that G is abelian actually implies this by the uniqueness properties of the Haar measure (see Proposition 10.2).

Exercise 3.94. Show that there is a continuous injective algebra homomorphism from the Banach algebra ℓ^1(Z) (which may be thought of as L^1(Z) with respect to the counting measure on Z and convolution) to C(T), where multiplication is pointwise multiplication.
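The algebraic statements of Proposition 3.91 (associativity and the norm inequality) can be sanity-checked on a finite non-abelian group. The sketch below assumes G = S_3 with counting measure as Haar measure, and real-valued functions stored as dictionaries; the names and random test data are illustrative assumptions.

```python
from itertools import permutations
import random

GROUP = list(permutations(range(3)))  # S_3, with counting measure as Haar measure

def compose(g, h):
    """The product gh, where g[x] is the image of x under g."""
    return tuple(g[h[x]] for x in range(3))

def inverse(g):
    inv = [0] * 3
    for x in range(3):
        inv[g[x]] = x
    return tuple(inv)

def convolve(f1, f2):
    """(f1 * f2)(g) = sum over h of f1(h) * f2(h^{-1} g); functions are dicts on GROUP."""
    return {g: sum(f1[h] * f2[compose(inverse(h), g)] for h in GROUP) for g in GROUP}

def norm1(f):
    return sum(abs(value) for value in f.values())

rng = random.Random(2)
f1, f2, f3 = ({g: rng.uniform(-1, 1) for g in GROUP} for _ in range(3))
left = convolve(convolve(f1, f2), f3)
right = convolve(f1, convolve(f2, f3))
print(max(abs(left[g] - right[g]) for g in GROUP))
print(norm1(convolve(f1, f2)) <= norm1(f1) * norm1(f2))
```

Since S_3 is not abelian, one should not expect f_1 ∗ f_2 = f_2 ∗ f_1 here, which is consistent with the extra hypothesis in Exercise 3.92.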

3.6 Further Topics

• Hilbert spaces are at the heart of many developments. We will start to see this in the context of Sobolev spaces and the Laplace differential operator in Chapters 5 and 6.
• Another case where the Hilbert space splits into eigenspaces will be considered in Chapter 6.
• The spectral theory of a single unitary operator (equivalently, a unitary representation of the group G = Z) is actually more delicate than the case considered above (see Exercise 6.1, for example), where we showed that the Hilbert space splits into a sum of generalized eigenspaces (the weight spaces). We will treat the case of a single unitary operator only in Chapter 9 (which will build on the material in Chapters 7 and 8).
• The topic of Fourier series on T^d leads naturally to the study of the Fourier integral on R^d (see Section 9.2). The concepts of Fourier series and Fourier integrals on T^d and R^d, respectively, find a common generalization in the theory of Pontryagin duality (see Section 12.8).
• The case of unitary representations for compact abelian groups considered in this chapter was quite straightforward and is only the beginning of the important theory of unitary representations of locally compact groups. For locally compact abelian groups this is strongly related to Pontryagin duality; see Sections 11.4 and 12.8. For compact groups the main theorem in this direction is the Peter–Weyl theorem [85] (which is covered in Folland [32]). For many other groups that are neither abelian nor compact this topic is also important and can have many interesting surprises.
• One such surprise may be the so-called property (T) that was introduced by Každan in 1967 and has become important in many parts of mathematics since then. Building on the material in Chapter 9 we will study this notion in Section 10.3.
• We have seen in this chapter that the notion of a left Haar measure leads to many interesting concepts. For a concretely given group it is often not difficult to find its left Haar measure. In Chapter 10 (which relies on Chapter 7) we will prove the existence of the left Haar measure in general.

The reader may continue with Chapter 4, 5, 6, or 7 (with some of the material of Chapter 6 building on Chapter 5).

Chapter 4

Uniform Boundedness and the Open Mapping Theorem

In this chapter we present the main consequences of completeness for Banach spaces.

4.1 Uniform Boundedness

Our first result is the principle of uniform boundedness or the Banach–Steinhaus theorem.

Theorem 4.1 (Banach–Steinhaus). Let X be a Banach space and let Y be a normed vector space. Let {T_α | α ∈ A} be a family of bounded linear operators from X to Y. Suppose that for each x ∈ X, the set {T_α x | α ∈ A} is a bounded subset of Y. Then the set {‖T_α‖ | α ∈ A} is bounded.

The reader in a hurry may also first prove the Baire category theorem (Theorem 4.12) and derive Theorem 4.1 relatively quickly from it (see Exercise 4.16). We refrain from doing this here as it might help her to see the argument behind the Baire category theorem once here in the concrete application and once in the general case.

Proof of Theorem 4.1. Assume first that there is an open ball B_ε(x_0) on which {T_α x | α ∈ A} is uniformly bounded: that is, there is a constant K such that

\[
\|x - x_0\| < \varepsilon \implies \|T_\alpha x\| \le K \qquad (4.1)
\]

for all α ∈ A. Then we claim that it is possible to find a bound on the family {‖T_α‖ | α ∈ A} of the norms of the operators. Indeed, for any y ≠ 0 define

\[
z = \frac{\varepsilon}{2\|y\|}\, y + x_0.
\]

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_4

Then z ∈ B_ε(x_0) by construction, so (4.1) implies that ‖T_α z‖ ≤ K. Now by linearity of T_α the triangle inequality shows that

\[
\frac{\varepsilon}{2\|y\|}\, \|T_\alpha y\| - \|T_\alpha x_0\| \le \Big\| \frac{\varepsilon}{2\|y\|}\, T_\alpha y + T_\alpha x_0 \Big\| = \|T_\alpha z\| \le K,
\]

which can be solved for ‖T_α y‖ to give

\[
\|T_\alpha y\| \le 2\, \frac{K + \|T_\alpha x_0\|}{\varepsilon}\, \|y\| \le 2\, \frac{K + K'}{\varepsilon}\, \|y\|,
\]

where K′ = sup_α ‖T_α x_0‖ ≤ K < ∞. It follows that

\[
\|T_\alpha\| \le \frac{4K}{\varepsilon}
\]

for every α ∈ A, as required.

To finish the proof we have to show that there is a ball on which property (4.1) holds. This is proved by contradiction. Assume that there is no ball on which (4.1) holds. Fix an arbitrary open ball B_0. By assumption there is a point x_1 ∈ B_0 such that ‖T_{α_1} x_1‖ > 1 for some index α_1 ∈ A. Since each T_α is continuous, there is a ball B_{ε_1}(x_1) with ‖T_{α_1} y‖ > 1 for all y ∈ B_{ε_1}(x_1). Assume without loss of generality that B_{ε_1}(x_1) ⊆ B_0 and ε_1 < 1. By assumption, in this new ball the family {T_α x | α ∈ A} is not bounded, so there is a point x_2 ∈ B_{ε_1}(x_1) with ‖T_{α_2} x_2‖ > 2 for some index α_2 ∈ A. We continue in the same way. By continuity of T_{α_2} there is a ball B_{ε_2}(x_2) whose closure is contained in B_{ε_1}(x_1) and with ‖T_{α_2} y‖ > 2 for all y ∈ B_{ε_2}(x_2). Assume without loss of generality that ε_2 < 1/2. Repeating this process produces points x_1, x_2, . . . , indices α_1, α_2, . . . , and positive numbers ε_1, ε_2, . . . such that the closure of B_{ε_n}(x_n) is contained in B_{ε_{n−1}}(x_{n−1}), ε_n < 1/n, and ‖T_{α_n} y‖ > n for all y ∈ B_{ε_n}(x_n) for all n ≥ 1.

Now the sequence (x_n) is clearly Cauchy (since x_m ∈ B_{ε_n}(x_n) for all m > n, and so d(x_m, x_n) < ε_n < 1/n), and therefore converges to some z ∈ X. By construction, z lies in the closure of B_{ε_{n+1}}(x_{n+1}) and hence in B_{ε_n}(x_n), so that ‖T_{α_n} z‖ > n for all n ≥ 1, which contradicts the hypothesis that the set {T_α z | α ∈ A} is bounded. □

Corresponding to the operator norm defined in Lemma 2.52 there is of course a notion of convergence in the space B(X, Y) of bounded linear operators from X to Y. A sequence (T_n) in B(X, Y) is uniformly convergent to T ∈ B(X, Y) if ‖T_n − T‖ → 0 as n → ∞ (so uniform convergence of a sequence of operators is simply convergence in the operator norm).


A different (and weaker, despite the name) notion of convergence for a sequence of operators is given by the following definition. We will discuss this and other notions of convergence again in Section 8.3.

Definition 4.2. A sequence $(T_n)$ in $B(X, Y)$ is strongly convergent if for any $x \in X$ the sequence $(T_n x)$ converges in $Y$. If there is a $T \in B(X, Y)$ with $\lim_{n\to\infty} T_n x = T x$ for all $x \in X$, then $(T_n)$ is strongly convergent to $T$.

Corollary 4.3. Let $X$ be a Banach space, and $Y$ any normed vector space. If a sequence $(T_n)$ in $B(X, Y)$ is strongly convergent, then there exists an operator $T \in B(X, Y)$ such that $(T_n)$ is strongly convergent to $T$.

Proof. For each $x \in X$ the sequence $(T_n x)$ is bounded since it is convergent. By the uniform boundedness principle (Theorem 4.1), there is a constant $K$ such that $\|T_n\| \leq K$ for all $n$. Hence

\[ \|T_n x\| \leq K\|x\| \quad \text{for all } x \in X. \tag{4.2} \]

We now define $T : X \to Y$ by $T x = \lim_{n\to\infty} T_n x$ for all $x \in X$. It is clear that $T$ is linear, and (4.2) shows that $\|T x\| \leq K\|x\|$ for all $x \in X$, so $T$ is bounded. The construction of $T$ means that $(T_n)$ converges strongly to $T$. □

We note that the conclusions of Theorem 4.1 and of Corollary 4.3 crucially rely on the assumption that $X$ is a Banach space (see also Exercise 4.6). (For Corollary 4.3 it is also crucial that we have restricted our attention to sequences.)

Exercise 4.4. Prove that uniform convergence implies strong convergence, and find an example of a sequence of bounded operators from a Banach space into a Banach space to show that strong convergence does not imply uniform convergence.

Exercise 4.5. Phrase the definition of a unitary representation of a metric group (Definition 3.73) using the notion of strong convergence for operators.

Exercise 4.6. Let $c_c(\mathbb{N}) \subseteq \ell^\infty(\mathbb{N})$ be the space of sequences with finite support equipped with the supremum norm. Define $T : c_c(\mathbb{N}) \to c_c(\mathbb{N})$ by $T(x_1, x_2, x_3, \ldots) = (x_1, 2x_2, 3x_3, \ldots)$ for all $(x_1, x_2, x_3, \ldots) \in c_c(\mathbb{N})$. Show that $T$ is not bounded. Construct a sequence of bounded linear operators $T_k$ on $c_c(\mathbb{N})$ with $T_k x \to T x$ as $k \to \infty$ for all $x$ in $c_c(\mathbb{N})$.

4.1.1 Uniform Boundedness and Fourier Series

This section gives an application of Theorem 4.1 to classical Fourier analysis on $\mathbb{T}$ (see Section 3.4 for the background). Recall that if $f \in C(\mathbb{T})$ then the Fourier coefficients of $f$ are defined by $a_m = \langle f, \chi_m \rangle$ where $\chi_m(x) = e^{2\pi i m x}$, for $m \in \mathbb{Z}$ and $x \in \mathbb{T}$. The $n$th partial sum of the Fourier series is


\[ s_n(x) = \sum_{m=-n}^{n} a_m e^{2\pi i m x}. \]

Recall that one of the basic goals of Fourier analysis is to clarify the relationship between the sequence of partial sums $(s_n)$ and the function $f$, that is, to understand in what sense the function $s_n$ approximates $f$ for large $n$ (if it does at all). We now ask if the sequence of functions $(s_n)$ converges uniformly or pointwise to $f$ for $f \in C(\mathbb{T})$. Recall from Definition 3.61 that the Dirichlet kernel $D_n$ is defined by

\[ D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = \frac{\sin\bigl((n + \tfrac12) 2\pi x\bigr)}{\sin(\pi x)} \]

for $x \in \mathbb{T}$. By the discussion in Section 3.4.2 we have

\[ s_n(0) = \int_{\mathbb{T}} f(x) D_n(-x)\,dx = \int_{\mathbb{T}} f(x) D_n(x)\,dx. \]
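The two expressions for the Dirichlet kernel can be compared numerically. The following is a minimal sketch, not part of the text's argument; the helper names `dirichlet_sum` and `dirichlet_closed` and the sample points are illustrative choices.

```python
import cmath
import math

def dirichlet_sum(n, x):
    """D_n(x) as the exponential sum over k = -n..n (real for real x)."""
    return sum(cmath.exp(2j * math.pi * k * x) for k in range(-n, n + 1)).real

def dirichlet_closed(n, x):
    """Closed form sin((n + 1/2) 2 pi x) / sin(pi x), for x not an integer."""
    return math.sin((n + 0.5) * 2 * math.pi * x) / math.sin(math.pi * x)

# Largest discrepancy between the two formulas over a few orders and points.
max_gap = max(
    abs(dirichlet_sum(n, x) - dirichlet_closed(n, x))
    for n in (1, 5, 20)
    for x in (0.1, 0.25, 0.37, 0.9)
)
print(max_gap)
```

The printed gap should be at the level of floating-point rounding. The closed form is only evaluated away from integer points, where the quotient is undefined as written.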

Lemma 4.7. The linear functional $T_n : C(\mathbb{T}) \to \mathbb{R}$ defined by

\[ T_n f = \int_{\mathbb{T}} f(x) D_n(x)\,dx \]

is bounded, with

\[ \|T_n\| = \int_{\mathbb{T}} |D_n(x)|\,dx. \]

This is a very special case of the general argument in Lemma 2.63, but we include a proof for the case at hand, as it is easier.

Proof. For any function $f \in C(\mathbb{T})$ we have

\[ |T_n f| \leq \int_{\mathbb{T}} |f(x)||D_n(x)|\,dx \leq \|f\|_\infty \int_{\mathbb{T}} |D_n(x)|\,dx, \]

so

\[ \|T_n\| \leq \int_{\mathbb{T}} |D_n(x)|\,dx. \]

Fix $\delta > 0$. Since $D_n$ is analytic it can only have finitely many sign changes in $[0, 1]$. Therefore, we may find a continuous function $f_n$ (which could even be chosen to be piecewise linear, for example) with $\|f_n\|_\infty \leq 1$ that differs from $\operatorname{sign}(D_n(x))$ only on a finite union of intervals whose total length is less than $\frac{1}{\|D_n\|_\infty}\delta$. The triangle inequality for integrals now gives

\[ \int_{\mathbb{T}} f_n(x) D_n(x)\,dx \geq \int_{\mathbb{T}} |D_n(x)|\,dx - 2\delta, \]

which proves the lemma as $\delta > 0$ was arbitrary. □




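Lemma 4.8. The Dirichlet kernel $D_n$ from Definition 3.61 satisfies

\[ \int_{\mathbb{T}} |D_n(x)|\,dx = \int_{\mathbb{T}} \left| \frac{\sin\bigl((n + \tfrac12) 2\pi x\bigr)}{\sin(\pi x)} \right| dx \longrightarrow \infty \]

as $n \to \infty$.

Proof. Recall that $|\sin t| \leq |t|$ for all $t \in \mathbb{R}$. It follows that

\[ \int_{\mathbb{T}} \left| \frac{\sin\bigl((n + \tfrac12) 2\pi x\bigr)}{\sin(\pi x)} \right| dx \geq \int_0^1 \frac{1}{\pi x}\,|\sin((2n + 1)\pi x)|\,dx. \]

Now $|\sin t| \geq \frac12$ for all $t \in \pi\mathbb{Z} + [\frac{\pi}{6}, \frac{5\pi}{6}]$. In particular, it follows that if

\[ (2n + 1)\pi x \in \bigcup_{k=0}^{2n} \bigl[(k + \tfrac16)\pi, (k + \tfrac56)\pi\bigr] \]

then $|\sin((2n + 1)\pi x)| \geq \frac12$. On the $k$th of the corresponding intervals for $x$ we also have $\frac{1}{\pi x} \geq \frac{1}{\pi (k + \frac56)/(2n+1)}$, and each such interval has length $\frac46 \cdot \frac{1}{2n+1}$. Together this gives

\[ \int_{\mathbb{T}} \left| \frac{\sin\bigl((n + \tfrac12) 2\pi x\bigr)}{\sin(\pi x)} \right| dx \geq \sum_{k=0}^{2n} \int_{(k + \frac16)/(2n+1)}^{(k + \frac56)/(2n+1)} \frac{1}{\pi (k + \frac56)/(2n + 1)} \cdot \frac12\,dx = \frac{1}{2\pi} \sum_{k=0}^{2n} \frac{2n + 1}{k + \frac56} \cdot \frac{\frac46}{2n + 1} = \frac{1}{3\pi} \sum_{k=0}^{2n} \frac{1}{k + \frac56} \longrightarrow \infty \]

as $n \to \infty$. □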
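The divergence in Lemma 4.8 is slow (logarithmic), which a short numerical experiment makes visible. The sketch below approximates $\int_{\mathbb{T}} |D_n(x)|\,dx$ by a midpoint rule and compares it with the harmonic-type lower bound $\frac{1}{3\pi}\sum_{k=0}^{2n} \frac{1}{k + 5/6}$ that the interval argument in the proof yields. The function names and the number of quadrature points are assumptions made for illustration.

```python
import math

def dirichlet(n, x):
    """Closed form of D_n(x); only evaluated at non-integer x here."""
    return math.sin((2 * n + 1) * math.pi * x) / math.sin(math.pi * x)

def lebesgue_constant(n, points=20000):
    """Midpoint-rule approximation of the integral of |D_n| over [0, 1]."""
    h = 1.0 / points
    return h * sum(abs(dirichlet(n, (i + 0.5) * h)) for i in range(points))

def harmonic_lower_bound(n):
    """(1 / (3 pi)) * sum_{k=0}^{2n} 1 / (k + 5/6)."""
    return sum(1.0 / (k + 5.0 / 6.0) for k in range(2 * n + 1)) / (3.0 * math.pi)

orders = [1, 4, 16, 64]
constants = [lebesgue_constant(n) for n in orders]
bounds = [harmonic_lower_bound(n) for n in orders]
for n, c, b in zip(orders, constants, bounds):
    print(n, round(c, 4), round(b, 4))
```

Both columns grow as $n$ increases, with the lower bound well below the approximated integral. A finite table of this kind only illustrates the trend; the divergence itself is what the proof establishes.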



Theorem 4.9. There exists a continuous function $f \in C(\mathbb{T})$ whose Fourier series diverges at $x = 0$.

Proof. As noted before Lemma 4.7, we have $T_n f = s_n(0)$ for all $f \in C(\mathbb{T})$. Moreover, for a fixed $f \in C(\mathbb{T})$, if the Fourier series of $f$ converges at $0$, then the family $\{T_n f \mid n \geq 1\}$ is bounded (since each element is just a partial sum of a convergent series). Thus if the Fourier series of $f$ converges at $0$ for all $f \in C(\mathbb{T})$, then for each $f \in C(\mathbb{T})$ the set $\{T_n f \mid n \geq 1\}$ is bounded. By Theorem 4.1, this implies that the set $\{\|T_n\| \mid n \geq 1\}$ is bounded, which contradicts Lemmas 4.7 and 4.8. It follows that there must be some $f \in C(\mathbb{T})$ whose Fourier series does not converge at $0$ (and in fact the partial sums must be unbounded). □

In principle the proofs of Theorem 4.1 and Theorem 4.9 allow one to construct the function $f$ as in Theorem 4.9 more concretely, at least as the limit of a Cauchy sequence of explicit continuous functions. Comparing Theorem 4.9 with the absolute convergence claim in Theorem 3.57 and the result regarding the Fejér kernel in Proposition 3.65, we see that this limit function is not continuously differentiable and that the Fourier series of $f$ at $0$ is an oscillating function with the property that the Cesàro averages of the diverging sequence $(s_n(0))$ actually converge to $f(0)$.

4.2 The Open Mapping and Closed Graph Theorems

Recall that a continuous map has the property that the pre-image of any open set is open, but in general the image of an open set is not open. We now show that bounded linear maps between Banach spaces, on the other hand, have the following special property.

Theorem 4.10 (Open mapping theorem). Let $X$ and $Y$ be Banach spaces, and let $T$ be a bounded linear map from $X$ onto $Y$. Then $T$ maps open sets in $X$ onto open sets in $Y$.

The assumption that $T$ maps $X$ onto $Y$ is essential. Consider, for example, the projection $(x, y) \mapsto (x, 0)$ from $\mathbb{R}^2$ to $\mathbb{R}^2$ to see this. The proof of Theorem 4.10 uses the Baire category theorem,(13) which states that a complete non-empty metric space cannot be written as a countable union of nowhere dense subsets.

4.2.1 Baire Category

Definition 4.11. A subset $S \subseteq X$ of a metric space $(X, d)$ is said to be nowhere dense if for every point $x \in \overline{S}$, and for every $\varepsilon > 0$, $B_\varepsilon(x) \cap (X \setminus \overline{S})$ is non-empty (equivalently, if $(\overline{S})^o = \emptyset$). A set is called meagre or first category if it is a countable union of nowhere dense sets.

We will think of a nowhere dense set as being small and want to extend this to meagre sets. The next result is needed as a justification of that interpretation.

Theorem 4.12 (Baire category theorem). A complete non-empty metric space cannot be written as a countable union of nowhere dense sets. Indeed, the complement of a countable union of nowhere dense sets is dense.

This is often described by saying that a complete metric space is of second category. As we will see, the method of proof is similar to the proof of Theorem 4.1 (see also Exercise 4.16). We note that in a normed linear space the closed ball $\overline{B}_r(x)$ coincides with the closure $\overline{B_r(x)}$ of the open ball. However, in a metric space they may be entirely different: if any set $X$ is given the discrete metric defined by $d(x, y) = 1$ if $x \neq y$ and $0$ if $x = y$, then $\overline{B}_1(x) = X$ while $\overline{B_1(x)} = \{x\}$ for any $x \in X$.


Proof of Theorem 4.12. Let $X$ be a complete non-empty metric space, and suppose that $(X_j)$ is a sequence of nowhere dense subsets of $X$ (that is, the sets $\overline{X_j}$ all have empty interior for $j = 1, 2, \ldots$). Fix an arbitrary ball $B_\varepsilon(x_0)$ with $\varepsilon > 0$ and $x_0 \in X$. Since $\overline{X_1}$ does not contain $B_\varepsilon(x_0)$, there must be a point $x_1$ in $B_\varepsilon(x_0)$ with $x_1 \notin \overline{X_1}$. It follows that there is some $r_1 > 0$ such that the closed ball

\[ \overline{B}_{r_1}(x_1) = \{y \in X \mid d(y, x_1) \leq r_1\} \]

satisfies $\overline{B}_{r_1}(x_1) \subseteq B_\varepsilon(x_0)$ and $\overline{B}_{r_1}(x_1) \cap X_1 = \emptyset$. Assume without loss of generality that $r_1 < 1$. Similarly, there is some $x_2 \in X$ and $r_2 > 0$ such that $\overline{B}_{r_2}(x_2) \subseteq B_{r_1}(x_1)$ and $\overline{B}_{r_2}(x_2) \cap X_2 = \emptyset$, and without loss of generality $r_2 < \frac12$. Notice that $\overline{B}_{r_2}(x_2) \cap X_1 = \emptyset$ since $\overline{B}_{r_2}(x_2) \subseteq B_{r_1}(x_1)$. Inductively, we construct a sequence of decreasing closed balls $\overline{B}_{r_n}(x_n)$ such that $\overline{B}_{r_n}(x_n) \cap X_j = \emptyset$ for $1 \leq j \leq n$, and $r_n \to 0$ as $n \to \infty$. It follows that $(x_n)$ is a Cauchy sequence, and the limit $z$ lies in the intersection of all the closed balls $\overline{B}_{r_n}(x_n)$, so $z \notin X_j$ for all $j \geq 1$. This implies that

\[ z \in B_\varepsilon(x_0) \setminus \bigcup_{j \geq 1} X_j \neq \emptyset, \]

which gives the result since $\varepsilon > 0$ and $x_0 \in X$ were arbitrary. □



Exercise 4.13. Prove the Baire category theorem for compact topological spaces (that is, without the assumption that the space is metric).

By taking complements we can also phrase the Baire category theorem in terms of $G_\delta$-sets.

Definition 4.14. A countable intersection of open sets in a topological space is called a $G_\delta$-set.

Corollary 4.15 (Baire category theorem). Let $(X, d)$ be a complete metric space, and assume that $G_n \subseteq X$ is a dense $G_\delta$-set for each $n \geq 1$. Then the intersection $\bigcap_{n=1}^\infty G_n$ is also a dense $G_\delta$-set.

Proof. By assumption we can write each $G_n$ in the form $G_n = \bigcap_{k=1}^\infty O_{n,k}$, where each $O_{n,k}$ is open and dense. It follows that

\[ \bigcap_{n=1}^\infty G_n = \bigcap_{n=1}^\infty \bigcap_{k=1}^\infty O_{n,k} \]

is a $G_\delta$-set, and that it is sufficient to consider the case where each $G_n = O_n$ is open and dense. In that case, $X_n = X \setminus G_n$ is closed and so

\[ B_\varepsilon(x) \cap (X \setminus \overline{X_n}) = B_\varepsilon(x) \cap (X \setminus X_n) = B_\varepsilon(x) \cap G_n \neq \emptyset \]

for any open ball $B_\varepsilon(x)$ since $G_n$ is dense. Therefore, $X_n$ is nowhere dense for each $n \geq 1$. By Theorem 4.12 the complement of $\bigcup_{n=1}^\infty X_n$ is dense, and this is precisely $\bigcap_{n=1}^\infty G_n$, by construction. □


Exercise 4.16. Prove the Banach–Steinhaus theorem (Theorem 4.1) using the Baire category theorem (Theorem 4.12).

Let us mention that the notion of a dense $G_\delta$-set is the topological version of being a ‘large’ set, while a set is measure-theoretically ‘large’ if its complement is a null set. Both notions of being large share similar features, and in particular a countable intersection of large sets in either sense is also large.(14) However, these two notions are quite different. Example 4.17 shows how to construct topologically large sets that are measure-theoretically small, and vice versa.

Example 4.17. For every $\varepsilon > 0$ there exists an open set $O_\varepsilon \subseteq \mathbb{R}$ which contains $\mathbb{Q}$ and has Lebesgue measure less than $\varepsilon$. This may be found, for example, by listing the elements of $\mathbb{Q}$ as $\{x_1, x_2, \ldots\}$ and setting

\[ O_\varepsilon = \bigcup_{k \geq 1} B_{\varepsilon/2^{k+2}}(x_k). \]

Then $G = \bigcap_{n \geq 1} O_{1/n}$ is a dense $G_\delta$ and a null set, and its complement $\mathbb{R} \setminus G$ is meagre and of full measure.

The Baire category theorem can be used to show the existence of elements of a complete metric space with certain properties. If the set of elements of a complete space which do not satisfy the property can be obtained as a countable union of nowhere dense sets, there must be elements that satisfy the property (indeed, there exists a dense set of such elements).

Exercise 4.18. (a) Assume that $X$ and $Y$ are metric spaces. Show that for any $f : X \to Y$ the set $\{x \in X \mid f \text{ is continuous at } x\}$ is a $G_\delta$-set.
(b) Show that the map $f : \mathbb{R} \to \mathbb{R}$ defined by

\[ f(x) = \begin{cases} \frac1q & \text{if } x = \frac pq \in \mathbb{Q}; \\ 0 & \text{if } x \in \mathbb{R} \setminus \mathbb{Q} \end{cases} \]

is continuous at each irrational point and discontinuous at each rational point, where we assume that $\frac pq$ is written in lowest terms and has $q \geq 1$.
(c) Use (a) to show that no function could have the reverse properties of the function in (b).



Exercise 4.19. Show that $\bigl\{ f \in L^1((0, 1)) \mid \|f|_{(a,b)}\|_\infty = \infty \text{ whenever } 0 \leq a < b \leq 1 \bigr\}$ contains a dense $G_\delta$-subset of $L^1((0, 1))$.

Exercise 4.20. Show that the set of functions in C([0, 1]) that are nowhere differentiable contains a dense Gδ -set.

4.2.2 Proof of the Open Mapping Theorem

Recall that we write $B_r^X$ and $B_r^Y$ for the open balls of radius $r$ and centre $0$ in $X$ and $Y$, respectively.


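Lemma 4.21. Assume that $X$ is a normed vector space, $Y$ is a Banach space, and $T : X \to Y$ is a bounded, surjective linear operator. For any $\varepsilon > 0$, there is a $\delta > 0$ such that

\[ \overline{T B_\varepsilon^X} \supseteq B_\delta^Y. \tag{4.3} \]

Proof. Since

\[ X = \bigcup_{n=1}^\infty n B_\varepsilon^X, \]

and $T$ is onto, we have $Y = T(X) = \bigcup_{n=1}^\infty n T B_\varepsilon^X$. By the Baire category theorem (Theorem 4.12) applied to $Y$ it follows that, for some $n$, the set $n \overline{T B_\varepsilon^X}$ contains some ball $B_r^Y(z)$ in $Y$. Then, by linearity, $\overline{T B_\varepsilon^X}$ must contain the ball $B_\delta^Y(y)$, where $y = \frac1n z$ and $\delta = \frac1n r$. We note that

\[ B_\delta^Y(y) - B_\delta^Y(y) = \{y_1 - y_2 \mid y_1, y_2 \in B_\delta^Y(y)\} = B_{2\delta}^Y \]

and similarly $B_{2\varepsilon}^X = B_\varepsilon^X - B_\varepsilon^X$. Therefore,

\[ B_{2\delta}^Y \subseteq \overline{T B_\varepsilon^X} - \overline{T B_\varepsilon^X} \subseteq \overline{T B_{2\varepsilon}^X} \]

and (4.3) follows.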



The above lemma only used the fact that $Y$ is a Banach space. Using the hypothesis that $X$ is also a Banach space, we are able to prove the main step towards the theorem in the following lemma.

Lemma 4.22. Let $T : X \to Y$ be as in Theorem 4.10. For any $\varepsilon > 0$ there is a $\delta > 0$ such that

\[ T B_\varepsilon^X \supseteq B_\delta^Y. \tag{4.4} \]

Proof. Choose a sequence $(\varepsilon_n)$ with each $\varepsilon_n > 0$ and with $\sum_{n=1}^\infty \varepsilon_n < \varepsilon$. By Lemma 4.21 there is a sequence $(\delta_n)$ of positive numbers such that

\[ \overline{T B_{\varepsilon_n}^X} \supseteq B_{\delta_n}^Y \tag{4.5} \]

for all $n \geq 1$. Without loss of generality, assume that $\delta_n \to 0$ as $n \to \infty$. (Actually this holds unless $Y$ is very special indeed.) Now let $\delta = \delta_1$. Let $y$ be any point in $B_\delta^Y = B_{\delta_1}^Y$. By (4.5) there is a point $x_1 \in B_{\varepsilon_1}^X$ such that $T x_1$ is as close to $y$ as we wish, say with $\|y - T x_1\| < \delta_2$. Since $y - T x_1 \in B_{\delta_2}^Y$, the inclusion (4.5) with $n = 2$ implies that there exists a point $x_2 \in B_{\varepsilon_2}^X$ such that $\|y - T x_1 - T x_2\| < \delta_3$. Continuing, we obtain a sequence $(x_n)$ in $X$ such that $\|x_n\| < \varepsilon_n$ for all $n$, and

\[ \left\| y - T\left( \sum_{k=1}^{n} x_k \right) \right\| < \delta_{n+1}. \tag{4.6} \]


This argument may be paraphrased as follows. At each stage we approximate the current element $y - T \sum_{k=1}^{n-1} x_k$ in $Y$ up to an error $\delta_{n+1}$ that we know can be dealt with later. This pushes the problem along until it ultimately vanishes in the limit.

Since $\|x_n\| < \varepsilon_n$, the series $\sum_n x_n$ is absolutely convergent, so by Lemma 2.28 it is convergent; write $x = \sum_n x_n$. Then

\[ \|x\| \leq \sum_{n=1}^\infty \|x_n\| \leq \sum_{n=1}^\infty \varepsilon_n < \varepsilon. \]

The map $T$ is continuous, so (4.6) shows that $y = T x$, since $\delta_n \to 0$ as $n \to \infty$. That is, for any $y \in B_\delta^Y$ we have found a point $x \in B_\varepsilon^X$ such that $T x = y$, proving (4.4). □

Proof of Theorem 4.10. Let $O \subseteq X$ be a non-empty open subset and let $x$ be an element of $O$. Then there is a ball $B_\varepsilon^X$ such that $x + B_\varepsilon^X \subseteq O$. By Lemma 4.22, $T B_\varepsilon^X \supseteq B_\delta^Y$ for some $\delta > 0$. Hence

\[ T(O) \supseteq T(x + B_\varepsilon^X) = T x + T(B_\varepsilon^X) \supseteq T x + B_\delta^Y. \]

To summarize, we have shown that for every $x \in O$ the point $T x$ is in the interior of $T(O)$. □

Exercise 4.23. Let $X$ and $Y$ be Banach spaces and $T : X \to Y$ a bounded operator. Show that the following three conditions are equivalent:
(a) $T$ is surjective;
(b) $T$ is open;
(c) there exists a dense subspace $Y' \subseteq Y$ and a constant $c > 0$ such that for any $y \in Y'$ there exists some $x \in X$ with $T x = y$ and $\|x\|_X \leq c\|y\|_Y$.
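The successive approximation in the proof of Lemma 4.22 (approximate the current remainder, add the correction, repeat) has a simple finite-dimensional analogue. The sketch below is only an illustration of that pattern, not the argument of the text: the matrix `T`, the rough right inverse `S` (here the inverse of the diagonal of `T`), and the function names are hypothetical choices, and convergence relies on the assumption that `S` reduces each remainder by a fixed factor.

```python
def matvec(a, v):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]

def successive_approximation(t, s, y, steps=60):
    """Sum corrections x_n = s(remainder) so that t applied to the sum approaches y."""
    x = [0.0] * len(t[0])
    remainder = list(y)
    for _ in range(steps):
        x_n = matvec(s, remainder)  # approximate preimage of the current remainder
        x = [a + b for a, b in zip(x, x_n)]
        remainder = [r - c for r, c in zip(remainder, matvec(t, x_n))]
    return x

T = [[4.0, 1.0], [2.0, 5.0]]   # hypothetical surjective map on R^2
S = [[0.25, 0.0], [0.0, 0.2]]  # rough right inverse: inverse of the diagonal part
y = [1.0, 2.0]
x = successive_approximation(T, S, y)
error = max(abs(a - b) for a, b in zip(matvec(T, x), y))
print(x, error)
```

As in (4.6), each step leaves a smaller remainder, and the sum of the corrections plays the role of $x = \sum_n x_n$. In the lemma itself no explicit approximate inverse is available; its role is played by the density statement (4.5).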

4.2.3 Consequences: Bounded Inverses and Closed Graphs

As an application of Theorem 4.10, we establish a general property of inverse maps. As is standard, $T : X \to Y$ means that $T$ is defined on all of $X$, but sometimes it is convenient (or necessary) to permit an operator $T$ to only be defined on a domain $D_T$ which is then a (possibly proper) subspace of $X$.

Definition 4.24. Let $T : X \to Y$ be an injective linear operator. Define the inverse of $T$, denoted $T^{-1}$, by requiring that $T^{-1} y = x$ if and only if $T x = y$. Then the domain of $T^{-1}$ is the linear subspace $T X \subseteq Y$, and $T^{-1}$ is a linear operator on its domain. Clearly $T^{-1} T x = x$ for all $x \in X$, and $T T^{-1} y = y$ for all $y$ in the domain of $T^{-1}$. We also say that $T^{-1}$ is a left inverse of $T$.


Proposition 4.25 (Bounded inverse). Let $X$ and $Y$ be Banach spaces, and let $T$ be a bijective bounded operator from $X$ to $Y$. Then $T^{-1}$ is also a bounded operator.

Proof. Since $T^{-1}$ is a linear operator, we only need to show it is continuous (which is equivalent to boundedness by Lemma 2.52). By Theorem 4.10, $T$ maps open sets onto open sets. For the map $T^{-1}$ this shows that the preimage $(T^{-1})^{-1}(O) = T(O)$ of an open set $O \subseteq X$ is open in $Y$. Therefore, $T^{-1}$ is continuous. □

Corollary 4.26 (Equivalent norms). If $X$ is a Banach space with respect to two norms $\|\cdot\|_{(1)}$ and $\|\cdot\|_{(2)}$ and there is a constant $K$ such that $\|x\|_{(1)} \leq K\|x\|_{(2)}$ for all $x \in X$, then the two norms are equivalent. That is, there is another constant $K' > 0$ with $\|x\|_{(2)} \leq K'\|x\|_{(1)}$ for all $x \in X$.

Proof. Consider the map $T : x \mapsto x$ from $(X, \|\cdot\|_{(2)})$ to $(X, \|\cdot\|_{(1)})$. By assumption, $T$ is bounded, so by Proposition 4.25, $T^{-1}$ is also bounded, giving the bound in the other direction. □

Definition 4.27. Let $T$ be a linear operator from a normed linear space $X$ into a normed linear space $Y$, with domain a linear subspace $D_T \subseteq X$. The graph of $T$ is the set

\[ G_T = \{(x, T x) \mid x \in D_T\} \subseteq X \times Y. \]

If $G_T$ is a closed subspace of $X \times Y$ then $T$ is a closed operator.

Notice as usual that this notion becomes trivial in finite dimensions in the following sense. If $X$ and $Y$ are finite-dimensional, then the graph of $T$ is simply some linear subspace, which is automatically closed. Also it is easy to see that a continuous operator has a closed graph. The next theorem, the converse, is called the closed graph theorem. Notice that this converse is not a purely topological fact. For instance, the set consisting of the graph of the hyperbola $xy = 1$ and the origin is the closed graph of a discontinuous function $f : \mathbb{R} \to \mathbb{R}$.

Theorem 4.28 (Closed graph theorem). Let $X$ and $Y$ be Banach spaces, and $T : X \to Y$ a linear operator with $D_T = X$. If $T$ is closed, then it is continuous.


Proof. Fix the norm $\|(x, y)\| = \|x\|_X + \|y\|_Y$ on $X \times Y$. The graph $G_T$ is, by hypothesis, a closed subspace of $X \times Y$, so $G_T$ is itself a Banach space. Consider the projection $P : G_T \to X$ defined by $P(x, T x) = x$. Then $P$ is clearly bounded, linear, and bijective. It follows by Proposition 4.25 that $P^{-1}$ is a bounded linear operator from $X$ to $G_T$, so

\[ \|(x, T x)\| = \|P^{-1} x\| \leq K\|x\|_X \]

for all $x \in X$, for some constant $K$. It follows that $\|x\|_X + \|T x\|_Y \leq K\|x\|_X$ for all $x \in X$, so $T$ is bounded, and hence continuous by Lemma 2.52. □

Exercise 4.29 (Hellinger–Toeplitz theorem). Suppose that $A : H \to H$ is a linear operator on a Hilbert space $H$ that is self-adjoint in the sense that $\langle Ax, y \rangle = \langle x, Ay \rangle$ for all $x, y \in H$. Show that this implies $A$ is bounded.

Corollary 4.30. Let $(X, \mu)$ be a $\sigma$-finite measure space and $g : X \to \mathbb{C}$ a measurable function. If $T : f \mapsto gf$ maps $L^2_\mu(X)$ to $L^2_\mu(X)$, then $g \in L^\infty_\mu(X)$ and $\|T\| = \|g\|_\infty$.

Proof. Notice that the hypotheses in the statement do not require that the map is continuous, but simply ask that the range lies in $L^2_\mu(X)$. However, if $(f_n, g f_n)$ has $f_n \to f$ and $g f_n \to \psi$ as $n \to \infty$ in $L^2_\mu(X)$ (that is, a sequence in the graph that converges to $(f, \psi)$), then we can extract a subsequence along which both convergences hold $\mu$-almost everywhere. Along this subsequence $g f_n$ converges almost everywhere to $gf$ and to $\psi$, so that $gf = \psi \in L^2_\mu(X)$, and hence $(f, \psi)$ also lies in the graph of $T$. It follows that $T$ is closed, and hence continuous by Theorem 4.28.

Knowing now that $T$ is bounded, there is a constant $C = \|T\| \geq 0$ such that $\|gf\|_2 \leq C\|f\|_2$ for any $f \in L^2_\mu(X)$. Let $X_C = \{x \in X \mid |g(x)| > C\}$, which we claim is a null set. Assuming the opposite, let $B \subseteq X_C$ be a measurable subset of positive finite measure and let $f = 1_B$ be its characteristic function. Then

\[ C^2 \mu(B) < \int_B |g|^2 |f|^2\,d\mu = \|gf\|_2^2 \leq C^2 \|f\|_2^2 = C^2 \mu(B) \]

gives a contradiction, which implies that $\mu(X_C) = 0$. Hence $|g|$ is almost everywhere less than or equal to $C$, and in particular $g \in L^\infty_\mu(X)$. Moreover, $\|g\|_\infty \leq \|T\|$ and the opposite inequality follows directly from the definition. □


Some important and natural operators are unbounded. An example is the derivative operator $D_0 : f \mapsto f'$ considered on $L^2((0, 1))$, which is originally defined on the dense subset $\{f \in C^1((0, 1)) \mid f, f' \in L^2((0, 1))\}$. It is not continuous, but as we will see in Chapter 5, considering the closure of its graph gives a closed (but still unbounded) weak differential operator defined on a dense subspace of $L^2$. In particular, we see that closed operators provide a suitable framework for studying unbounded operators. The closed graph theorem says that these generalized operators, namely closed operators, are usually only defined on a proper subset of the first Banach space unless they actually are bounded.

4.3 Further Topics

The above consequences of completeness are useful in several different areas. We indicate below a few applications and extensions of these results.

• As already mentioned, in the next two chapters we will study partial differential operators as examples of closed operators. We can do this without developing any theory on unbounded operators, by simply studying the graphs of these operators.
• As we have seen, the Banach–Steinhaus theorem (Theorem 4.1) has interesting consequences for the notion of strong convergence (Corollary 4.3). We will discuss the corresponding topology again in Section 8.3.
• We will encounter multiplication operators as in Corollary 4.30 again in Chapters 9, 12, and 13.
• The notion of closed operators is the starting point of the theory of self-adjoint unbounded operators, which we will study systematically in Chapter 13.

The reader may continue with Chapter 5, 6, or 7 (with some of the material of Chapter 6 building on Chapter 5).

Chapter 5

Sobolev Spaces and Dirichlet’s Boundary Problem

Using the theory of Fourier series developed in Section 3.4, we will now develop the notion of Sobolev spaces and prove the Sobolev embedding theorem. Sobolev spaces combine familiar notions of smoothness (that is, differentiability) with bounds on Lp norms. We will set p = 2 and so will have all the tools of Hilbert spaces at our disposal, but the theory can be extended to all p > 1. The Sobolev embedding theorem and elliptic regularity for the Laplace operator will allow us to prove in Section 5.3 the existence of solutions to the Dirichlet boundary value problem introduced in Section 1.2.

© Springer International Publishing AG 2017. M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_5

5.1 Sobolev Spaces and Embedding on the Torus

In this section we are going to construct Hilbert spaces of functions on $\mathbb{T}^d$ depending on a parameter $k \in \mathbb{N}_0$. Unlike the equivalence classes $f \in L^2(\mathbb{T}^d)$, these functions may be continuous or even differentiable, depending on $k$ (see Section 5.1.2).

5.1.1 $L^2$ Sobolev Spaces on $\mathbb{T}^d$

Definition 5.1. Let $k \geq 0$ be an integer. We (initially) define the ($L^2$) Sobolev space $H^k(\mathbb{T}^d)$ to be the closure of $C^\infty(\mathbb{T}^d)$ inside

\[ V = \bigoplus_{\|\alpha\|_1 \leq k} L^2(\mathbb{T}^d), \]

where the direct sum runs over all multi-indices $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$ and a function $f \in C^\infty(\mathbb{T}^d)$ is identified with the tuple $\phi_k(f) = (\partial_\alpha f)_{\|\alpha\|_1 \leq k} \in V$.

In order to make this definition more palatable, we now describe some special cases.


(1) If $k = 0$, then there is only the multi-index $\alpha = 0$ and so $H^0(\mathbb{T}^d)$ is the closure of $C^\infty(\mathbb{T}^d)$ in $L^2(\mathbb{T}^d)$ with respect to $\|\cdot\|_2$. As we have seen, $C^\infty(\mathbb{T}^d)$ is dense in $C(\mathbb{T}^d)$ with respect to $\|\cdot\|_\infty$ (indeed, the trigonometric polynomials are already dense) and $C(\mathbb{T}^d)$ is dense in $L^2(\mathbb{T}^d)$ with respect to $\|\cdot\|_2$, so we obtain $H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$. We will also write $\|f\|_{H^0} = \|f\|_2$ for $f \in H^0(\mathbb{T}^d)$.

(2) Now let $k = 1$, and in this case there are $d + 1$ multi-indices. Hence

\[ H^1(\mathbb{T}^d) = \overline{\phi_1(C^\infty(\mathbb{T}^d))} \]

is the closure of $\phi_1(C^\infty(\mathbb{T}^d))$ in $\bigl(L^2(\mathbb{T}^d)\bigr)^{d+1}$, where we used the embedding $\phi_1 : f \mapsto (f, \partial_1 f, \ldots, \partial_d f) \in V$. So, by our definition, elements of $H^1(\mathbb{T}^d)$ are $(d + 1)$-tuples of functions on $\mathbb{T}^d$. In order to be able to think of these as single functions on $\mathbb{T}^d$ (which is how we will think of Sobolev spaces), notice that the last $d$ terms of the $(d+1)$-tuple are uniquely determined by the first term. This is clear for $\phi_1(f)$ with $f \in C^\infty(\mathbb{T}^d)$, but also remains true in the closure $H^1(\mathbb{T}^d)$, as we show next.

Lemma 5.2 (Fourier series of weak derivatives). Suppose that the vector $(f, f_1, \ldots, f_d)$ belongs to $H^1(\mathbb{T}^d)$ and the Fourier series of $f$ is given by

\[ f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n. \]

Then

\[ f_j = \sum_{n \in \mathbb{Z}^d} 2\pi i n_j c_n \chi_n. \tag{5.1} \]

Proof. For $f \in L^2(\mathbb{T}^d)$ and $n \in \mathbb{Z}^d$, write $a_n(f)$ for the $n$th Fourier coefficient. We start with the formula

\[ a_n(\partial_j f) = \langle \partial_j f, \chi_n \rangle = 2\pi i n_j \langle f, \chi_n \rangle = 2\pi i n_j a_n(f) \]

for all $n \in \mathbb{Z}^d$ and all $f \in C^\infty(\mathbb{T}^d)$, see (3.18). Using continuity of the inner product and the definition of $H^1(\mathbb{T}^d)$, this formula automatically extends to all $(f, f_1, \ldots, f_d) \in H^1(\mathbb{T}^d)$. Expanding $f_j$ into its Fourier series (see Theorem 3.54) gives the lemma. □
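The coefficient formula used in this proof can be checked numerically for a smooth function in dimension $d = 1$. The following sketch is illustrative only: the test function `f`, its hand-computed derivative `df`, and the helper `fourier_coefficient` are assumptions, as is the convention $a_n(g) = \int_0^1 g(x) e^{-2\pi i n x}\,dx$ for the inner product with $\chi_n$.

```python
import cmath
import math

def f(x):
    """A trigonometric polynomial on the circle."""
    return math.cos(2 * math.pi * 3 * x) + 0.5 * math.sin(2 * math.pi * 5 * x)

def df(x):
    """Classical derivative of f, computed by hand."""
    return (-2 * math.pi * 3 * math.sin(2 * math.pi * 3 * x)
            + 0.5 * 2 * math.pi * 5 * math.cos(2 * math.pi * 5 * x))

def fourier_coefficient(g, n, samples=64):
    """Equispaced sum for the integral of g(x) * exp(-2 pi i n x) over [0, 1)."""
    return sum(
        g(j / samples) * cmath.exp(-2j * math.pi * n * j / samples)
        for j in range(samples)
    ) / samples

# Compare a_n(f') with 2 pi i n a_n(f) for a range of frequencies.
max_error = max(
    abs(fourier_coefficient(df, n) - 2j * math.pi * n * fourier_coefficient(f, n))
    for n in range(-6, 7)
)
print(max_error)
```

For trigonometric polynomials of low degree the equispaced sum reproduces the integral up to rounding, so the printed error should be tiny. For general smooth functions the same comparison would only hold up to quadrature error.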

The lemma now shows in full generality that the first component $f$ of any element $(f, f_1, \ldots, f_d) \in H^1(\mathbb{T}^d)$ determines all the other components. Thus we can identify an element of $H^1(\mathbb{T}^d)$ with the associated element $f \in L^2(\mathbb{T}^d)$, and will write $f \in H^1(\mathbb{T}^d)$ and $\boldsymbol{\partial}_j f = f_j \in L^2(\mathbb{T}^d)$ for $j = 1, \ldots, d$. We will also call the other components $\boldsymbol{\partial}_j f$ weak derivatives (this will be further justified in Section 5.2), as these generalize the notion of partial derivative for smooth functions. However, the norm associated to $f \in H^1(\mathbb{T}^d)$ is

\[ \|f\|_{H^1} = \sqrt{\|f\|_2^2 + \sum_{j=1}^{d} \|\boldsymbol{\partial}_j f\|_2^2}. \]

Summarizing the above discussion for the case $k = 1$, we may interpret $H^1(\mathbb{T}^d)$ as the domain (and, in the original definition, as the graph) of the closed operator

\[ \nabla : H^1(\mathbb{T}^d) \ni f \mapsto (\boldsymbol{\partial}_{x_1} f, \ldots, \boldsymbol{\partial}_{x_d} f) \in \bigl(L^2(\mathbb{T}^d)\bigr)^d. \]

This discussion generalizes to any $k \geq 1$ as follows.

Proposition 5.3 (Forgetting regularity, weak derivative). Fix $k$ and $\ell$ with $0 \leq \ell < k$. Then the identity map on $C^\infty(\mathbb{T}^d)$ uniquely extends to an injective continuous operator

\[ \imath_{k,\ell} : H^k(\mathbb{T}^d) \longrightarrow H^\ell(\mathbb{T}^d) \]

of norm one. Moreover, for every $j \in \{1, \ldots, d\}$ the partial derivative extends uniquely to a continuous operator

\[ \boldsymbol{\partial}_j : H^k(\mathbb{T}^d) \longrightarrow H^{k-1}(\mathbb{T}^d) \]

with norm less than or equal to one. Finally, (3.18) holds similarly for all $f$ in $H^k(\mathbb{T}^d)$ and for all $\alpha \in \mathbb{N}_0^d \setminus \{(0, \ldots, 0)\}$ with $\|\alpha\|_1 \leq k$.

Proof. For the first claim consider the map

\[ \pi_{k,\ell} : \bigoplus_{\|\alpha\|_1 \leq k} L^2(\mathbb{T}^d) \longrightarrow \bigoplus_{\|\alpha\|_1 \leq \ell} L^2(\mathbb{T}^d), \qquad (f_\alpha)_{\|\alpha\|_1 \leq k} \longmapsto (f_\alpha)_{\|\alpha\|_1 \leq \ell}, \]

and notice that $\pi_{k,\ell}(\phi_k(f)) = \phi_\ell(f)$ for all $f \in C^\infty(\mathbb{T}^d)$. Therefore the extended map $\imath_{k,\ell}$ is simply the restriction of this projection to $H^k(\mathbb{T}^d)$, and so has norm less than or equal to one. Using constant functions we see that the norm of $\imath_{k,\ell}$ is equal to one. Injectivity will follow from the last claim of the proposition.

For the second claim, regarding the operator $\boldsymbol{\partial}_j : H^k(\mathbb{T}^d) \to H^{k-1}(\mathbb{T}^d)$, we modify the argument above as follows. Consider the projection map

\[ \pi_j : \bigoplus_{\|\alpha\|_1 \leq k} L^2(\mathbb{T}^d) \longrightarrow \bigoplus_{\|\alpha\|_1 \leq k-1} L^2(\mathbb{T}^d), \qquad (f_\alpha)_{\|\alpha\|_1 \leq k} \longmapsto (f_{\alpha + e_j})_{\|\alpha\|_1 \leq k-1}, \]

which clearly has norm one. Figure 5.1 illustrates the difference between the projection $\pi_{k,\ell}$ and the projection $\pi_j$ in a simple example. For $f \in C^\infty(\mathbb{T}^d)$ we see that $\pi_j(\phi_k(f)) = \phi_{k-1}(\partial_{e_j}(f))$, which (as above) shows that the restriction of $\pi_j$ to $H^k(\mathbb{T}^d)$ is the desired operator $\boldsymbol{\partial}_j : H^k(\mathbb{T}^d) \to H^{k-1}(\mathbb{T}^d)$.

The final claim of the proposition follows from the description of the Fourier series of the weak derivative in Lemma 5.2 for $k = 1$ and induction. □

Fig. 5.1: With $d = 2$, $\ell = 2$, $k = 3$ and $j = 1$ the projection defining $\imath_{k,\ell}$ corresponds to the set of multi-indices $\alpha$ highlighted on the left. The projection defining $\boldsymbol{\partial}_j$ corresponds to the set highlighted on the right.

Now justified by Proposition 5.3, we identify an element $f = (f_\alpha)_{\|\alpha\|_1 \leq k}$ in $H^k(\mathbb{T}^d)$ with its first component $f_0$ in $L^2(\mathbb{T}^d)$. The other components are identified with the weak derivatives, so we write

\[ f = f_0 = \boldsymbol{\partial}_0 f \in H^k(\mathbb{T}^d) \subseteq L^2(\mathbb{T}^d), \qquad \boldsymbol{\partial}_\alpha f = f_\alpha \in L^2(\mathbb{T}^d) \]

for all multi-indices $\alpha$ with $\|\alpha\|_1 \leq k$. In this notation our norm becomes

\[ \|f\|_{H^k} = \sqrt{\sum_{\|\alpha\|_1 \leq k} \|\boldsymbol{\partial}_\alpha f\|_2^2}. \]

5.1.2 The Sobolev Embedding Theorem on $\mathbb{T}^d$

As we have seen in the discussion above, each of the spaces $H^k(\mathbb{T}^d)$ consists of certain $L^2$ functions on $\mathbb{T}^d$. For $k = 0$ we have $H^0(\mathbb{T}^d) = L^2(\mathbb{T}^d)$. A natural question for $k \geq 1$ is to ask which functions in $L^2(\mathbb{T}^d)$ lie in $H^k(\mathbb{T}^d)$. Using Fourier series we can give a formal answer to this in Lemma 5.4, and this will have interesting and important consequences which will be discussed below. Another consequence of this lemma is that it makes it meaningful to define $H^k$ for $k \in \mathbb{R}$ by using the convergence property in the lemma as a definition (we will not pursue this further).

Lemma 5.4 (Characterizing $H^k(\mathbb{T}^d)$ by the Fourier series). Let $k \geq 0$ be an integer and let $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in L^2(\mathbb{T}^d)$. Then $f \in H^k(\mathbb{T}^d)$ if and only if

\[ \sum_{n \in \mathbb{Z}^d} |c_n|^2 \|n\|_2^{2k} < \infty. \tag{5.2} \]

Proof. For $n = (n_1, \ldots, n_d) \in \mathbb{Z}^d$ and $\alpha = (\alpha_1, \ldots, \alpha_d) \in \mathbb{N}_0^d$ we write $n^\alpha$ for $n_1^{\alpha_1} \cdots n_d^{\alpha_d}$. If $f = \sum_{n \in \mathbb{Z}^d} c_n \chi_n \in H^k(\mathbb{T}^d)$ then, by Proposition 5.3 and by Fourier series on the torus (Theorem 3.54),

\[ (n^\alpha c_n)_{n \in \mathbb{Z}^d} \in \ell^2(\mathbb{Z}^d) \]

for all $\alpha$ with $\|\alpha\|_1 \leq k$. We apply this to $\alpha = k e_1, k e_2, \ldots, k e_d$ and see that

\[ \sum_{j=1}^{d} \sum_{n \in \mathbb{Z}^d} n_j^{2k} |c_n|^2 < \infty. \]

Using the bound $\|n\|_2^{2k} \ll \sum_{j=1}^{d} n_j^{2k}$ for all $n \in \mathbb{Z}^d$, we get (5.2) as required.

Conversely, assume (5.2). Then for any $\alpha \in \mathbb{N}_0^d \setminus \{(0, \ldots, 0)\}$ with $\|\alpha\|_1 \leq k$ we have $|n^\alpha| \leq \|n\|_2^k$, and so

\[ \sum_{n \in \mathbb{Z}^d} |(2\pi i n)^\alpha c_n|^2 < \infty. \]

The characterization of convergence for a series involving an orthonormal basis (Proposition 3.36) shows that every component of $\phi_k\bigl(\sum_{\|n\|_2 \leq N} c_n \chi_n\bigr)$ converges as $N \to \infty$, and so we deduce that the images of these partial sums converge in $H^k(\mathbb{T}^d)$. As the first component of the limit vector equals $f$, it follows that $f \in H^k(\mathbb{T}^d)$. □

Exercise 5.5. Show that $H^1(\mathbb{T}^d)$ is a meagre subset of $L^2(\mathbb{T}^d)$.
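The criterion (5.2) is easy to experiment with in dimension $d = 1$. The sketch below takes the hypothetical coefficients $c_n = |n|^{-s}$ for $n \neq 0$ (and $c_0 = 0$), for which the weighted series in (5.2) equals $2\sum_{n \geq 1} n^{2k - 2s}$ and so converges exactly when $2s - 2k > 1$. The function name and the cutoffs are illustrative assumptions.

```python
import math

def weighted_partial_sum(s, k, cutoff):
    """Sum of |c_n|^2 |n|^(2k) over 0 < |n| <= cutoff, for c_n = |n|^(-s)."""
    return 2.0 * sum(float(n) ** (2 * k - 2 * s) for n in range(1, cutoff + 1))

cutoffs = (10**3, 10**4, 10**5)
# s = 2, k = 1: exponent -2, so the partial sums stabilise (limit pi^2 / 3).
in_h1 = [weighted_partial_sum(2, 1, cutoff) for cutoff in cutoffs]
# s = 2, k = 2: exponent 0, so the partial sums grow linearly in the cutoff.
not_in_h2 = [weighted_partial_sum(2, 2, cutoff) for cutoff in cutoffs]
print(in_h1, math.pi ** 2 / 3)
print(not_in_h2)
```

With $s = 2$ the function lies in $H^1(\mathbb{T})$ but not in $H^2(\mathbb{T})$, which the stabilising and growing partial sums reflect. Finite partial sums can only suggest convergence or divergence; the actual statement is the comparison with $\sum n^{2k-2s}$.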

The following theorem shows (in a more constructive manner than the previous exercise) how special the elements of the subset H k (Td ) within L2 (Td ) become once k is sufficiently large. If k > d2 , then any element of H k (Td ) agrees almost surely (and will be identified) with a continuous function. Increasing k further also gives some differentiability of this continuous function. Theorem 5.6 (Sobolev embedding on the torus). Let k and ℓ be nonnegative integers with k > ℓ + d2 . Then the inclusion map from C ∞ (Td ) to C ℓ (Td ) has a continuous extension to H k (Td ). In particular, any function f ∈ H k (Td ) has a uniquely defined continuous representative belonging to C ℓ (Td ) with kf kC ℓ ≪d kf kH k , where we also denote the continuous representative by f . The proof will show that most of the work has already been done. Proof of Theorem 5.6. Let us start with the case ℓ = 0. In this case we already know that q kf k∞ ≪d kf k22 + k∂ek1 f k22 + · · · + k∂ekd f k22

for $f \in C^\infty(\mathbb{T}^d)$ by Theorem 3.57. However, the square root on the right-hand side is bounded above by $\|f\|_{H^k}$, which shows that the inclusion map
\[
\imath : \bigl(C^\infty(\mathbb{T}^d), \|\cdot\|_{H^k}\bigr) \longrightarrow \bigl(C(\mathbb{T}^d), \|\cdot\|_\infty\bigr)
\]
is a bounded operator. Since $C(\mathbb{T}^d)$ is a Banach space, this operator extends to $H^k(\mathbb{T}^d)$ by Proposition 2.59. We still need to argue that this extension really does select a continuous representative. For this, notice that the composition of the inclusion maps
\[
C^\infty(\mathbb{T}^d) \xrightarrow{\ \imath\ } C(\mathbb{T}^d) \longrightarrow L^2(\mathbb{T}^d)
\]
is the inclusion map $\phi_0 : C^\infty(\mathbb{T}^d) \to L^2(\mathbb{T}^d)$, whose unique extension to $H^k(\mathbb{T}^d)$ is $\imath_{k,0}$ (see Proposition 5.3). Hence the composition of the constructed extension $\imath : H^k(\mathbb{T}^d) \to C(\mathbb{T}^d)$ with the inclusion into $L^2(\mathbb{T}^d)$ coincides with $\imath_{k,0}$ and so $\imath(f) \in C(\mathbb{T}^d)$ is a representative of the equivalence class $\imath_{k,0}(f) \in L^2(\mathbb{T}^d)$ associated to $H^k(\mathbb{T}^d)$.

Now let $\ell \ge 1$ satisfy $\ell + \frac{d}{2} < k$. Note (for example, by using the argument from Example 2.24(6)) that $C^\ell(\mathbb{T}^d)$ is a Banach space with the norm
\[
\|f\|_{C^\ell} = \max_{\|\gamma\|_1 \le \ell} \|\partial_\gamma f\|_\infty.
\]

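We apply the above to $\partial_\gamma f$ and obtain from Proposition 5.3 that
\[
\|\partial_\gamma f\|_\infty \ll_d \|\partial_\gamma f\|_{H^{k-\ell}} \le \|f\|_{H^k}
\]
for $f \in C^\infty(\mathbb{T}^d)$ and $\|\gamma\|_1 \le \ell$. Therefore, $\|f\|_{C^\ell} \ll_d \|f\|_{H^k}$ for $f \in C^\infty(\mathbb{T}^d)$, and the inclusion $C^\infty(\mathbb{T}^d) \to C^\ell(\mathbb{T}^d)$ once again gives rise to a bounded operator $\imath_\ell : H^k(\mathbb{T}^d) \to C^\ell(\mathbb{T}^d)$. Composing with the inclusion map from $C^\ell(\mathbb{T}^d)$ to $L^2(\mathbb{T}^d)$ we again see that $f \in H^k(\mathbb{T}^d)$ agrees almost everywhere with $\imath_\ell(f) \in C^\ell(\mathbb{T}^d)$. □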
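The simplest instance of Theorem 5.6 ($d = 1$, $k = 1$, $\ell = 0$) can be checked numerically. The following is a minimal sketch, not part of the proof: it assumes the characters $\chi_n(x) = e^{2\pi i n x}$ and the norm $\|f\|_{H^1}^2 = \sum_n (1 + 4\pi^2 n^2)|c_n|^2$, and uses the Cauchy–Schwarz bound $\sum_n |c_n| \le C_N \|f\|_{H^1}$ with $C_N^2 = \sum_{|n| \le N} (1 + 4\pi^2 n^2)^{-1}$. The cut-off `N`, the random coefficients and all function names are illustrative choices.

```python
import cmath
import math
import random


def sobolev_constant(n_max):
    """sqrt(sum_{|n| <= n_max} 1 / (1 + 4 pi^2 n^2)), the Cauchy-Schwarz constant."""
    return math.sqrt(sum(1.0 / (1.0 + 4.0 * math.pi ** 2 * n * n)
                         for n in range(-n_max, n_max + 1)))


def h1_norm(coeffs):
    """H^1 norm of f = sum c_n e^{2 pi i n x}: sqrt(sum (1 + 4 pi^2 n^2) |c_n|^2)."""
    return math.sqrt(sum((1.0 + 4.0 * math.pi ** 2 * n * n) * abs(c) ** 2
                         for n, c in coeffs.items()))


def sup_norm_on_grid(coeffs, grid_size=2000):
    """Maximum of |f| over an equally spaced grid in [0, 1)."""
    best = 0.0
    for i in range(grid_size):
        x = i / grid_size
        value = sum(c * cmath.exp(2j * math.pi * n * x) for n, c in coeffs.items())
        best = max(best, abs(value))
    return best


random.seed(0)  # fixed seed so the run is reproducible
N = 10          # hypothetical cut-off for the trigonometric polynomial
coeffs = {n: complex(random.gauss(0, 1), random.gauss(0, 1)) / (1 + abs(n))
          for n in range(-N, N + 1)}

sup_f = sup_norm_on_grid(coeffs)
bound = sobolev_constant(N) * h1_norm(coeffs)
print(f"sup |f| on grid = {sup_f:.4f} <= {bound:.4f} = C_N * ||f||_H1")
```

The constant $C_N$ stays bounded as $N \to \infty$ because $\sum_n (1 + 4\pi^2 n^2)^{-1}$ converges, which is the one-dimensional shadow of the condition $k > \frac{d}{2}$.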

5.2 Sobolev Spaces on Open Sets

Much of the discussion in Section 5.1.1 regarding Sobolev spaces on $\mathbb{T}^d$ can be quickly generalized to open subsets of $\mathbb{R}^d$. However, in Section 5.1.2 we frequently made use of Fourier series in the arguments, so we will go through the definitions and elementary properties once again, without appealing to Fourier series. We will define spaces of functions on an open subset $U \subseteq \mathbb{R}^d$ that form Hilbert spaces, and in which the elements are once more continuous or even differentiable depending on a regularity parameter $k$ (and the dimension $d$). We have a choice regarding the behaviour of the functions at the boundary $\partial U$ of $U \subseteq \mathbb{R}^d$, giving rise to two different Sobolev spaces for every $k \ge 1$.


Definition 5.7. Let $d \ge 1$ and $k \ge 0$ be integers, and let $U \subseteq \mathbb{R}^d$ be an open subset. Then the ($L^2$) Sobolev space $H^k(U)$ is defined† to be the closure of
\[
\bigl\{ (\partial_\alpha f)_\alpha \mid f \in C^\infty(U),\ \partial_\alpha f \in L^2(U) \text{ for } \|\alpha\|_1 \le k \bigr\} \tag{5.3}
\]
inside $\bigoplus_{\|\alpha\|_1 \le k} L^2(U)$, where as before we take the direct sum over all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$.

Even though the closure $H^k(U)$ contains many new functions that are not in $C^\infty(U)$, those new elements
\[
(f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U) \tag{5.4}
\]
still have some of the properties of the elements in the subspace (5.3) used to define $H^k(U)$. In fact, as we will show below, $f = f_0$ determines all the other components $f_\alpha$ of the vector (5.4), and these are derivatives of $f$ in the following weaker sense (which, as we will see, turns integration by parts into the definition of a derivative).

Definition 5.8. Suppose that $\alpha \in \mathbb{N}_0^d$ and $f, g \in L^2(U)$. Then $g$ is called a weak $\alpha$-partial derivative (or an $\alpha$-partial derivative in the sense of distributions) of $f$, written $g = \boldsymbol{\partial}_\alpha f$, if
\[
\int_U f\,\partial_\alpha \phi \,dx = (-1)^{\|\alpha\|_1} \int_U g\,\phi \,dx
\]
for all $\phi \in C_c^\infty(U)$. In the case $\alpha = e_j$, we will call this the weak $j$th partial derivative and write $g = \boldsymbol{\partial}_j f$. We view the functions $\phi$ appearing in this definition (and similar instances) as ‘test functions’.

Example 5.9. Let $U = (-1, 1) \subseteq \mathbb{R}$ and define the functions
\[
f(x) = \begin{cases} x & \text{if } x \ge 0, \\ 0 & \text{if } x < 0 \end{cases}
\qquad\text{and}\qquad
g(x) = \begin{cases} 1 & \text{if } x \ge 0, \\ 0 & \text{if } x < 0 \end{cases}
\]
for $x \in (-1, 1)$. Then $f$ has weak $e_1$-partial derivative $g$. In fact, for $\phi$ in $C_c^\infty((-1, 1))$ we have
\[
\int_{-1}^{1} f \phi' \,dx = \int_0^1 x \phi'(x) \,dx = \bigl[ x \phi(x) \bigr]_0^1 - \int_0^1 \phi \,dx = 0 - \int_{-1}^{1} g \phi(x) \,dx,
\]
as required.

† In the literature another notation that is used is $W^{k,2}$. The more general case of $W^{k,p}$ is defined similarly using $L^p(U)$ instead of $L^2(U)$.
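The identity in Example 5.9 can be tested numerically for one concrete test function. The sketch below is only an illustration under stated assumptions: the test function $\phi(x) = (1 + x)e^{-1/(1 - x^2)}$ is a hypothetical choice (any element of $C_c^\infty((-1,1))$ would do), and the integrals are approximated with a composite midpoint rule, split at the kink $x = 0$.

```python
import math


def bump(x):
    """Smooth function supported in (-1, 1)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0


def phi(x):
    """Hypothetical (asymmetric) test function in C_c^infty((-1, 1))."""
    return (1.0 + x) * bump(x)


def dphi(x):
    """Classical derivative of phi, computed by the product and chain rules."""
    if abs(x) >= 1.0:
        return 0.0
    dbump = bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2
    return bump(x) + (1.0 + x) * dbump


def f(x):
    return x if x >= 0 else 0.0


def g(x):
    return 1.0 if x >= 0 else 0.0


def midpoint(func, a, b, n=20000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def integral(func):
    """Integral over (-1, 1), split at 0 where f and g fail to be smooth."""
    return midpoint(func, -1.0, 0.0) + midpoint(func, 0.0, 1.0)


lhs = integral(lambda x: f(x) * dphi(x))   # integral of f * phi'
rhs = -integral(lambda x: g(x) * phi(x))   # (-1)^1 times integral of g * phi
print(f"int f phi' = {lhs:.8f},  -int g phi = {rhs:.8f}")
```

Agreement of the two numbers for one $\phi$ is of course not a proof; the definition requires the identity for every test function.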


Lemma 5.10 (Weak derivatives). Let $U \subseteq \mathbb{R}^d$ be open. A weak $\alpha$-partial derivative of an $L^2(U)$ function $f$ is uniquely determined as an element of $L^2(U)$ if it exists. If $(f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U)$ then $f = f_0$ has $f_\alpha = \boldsymbol{\partial}_\alpha f$ as a weak $\alpha$-partial derivative for $\alpha$ with $\|\alpha\|_1 \le k$. In particular, $f = f_0$ determines all the elements of the vector in $H^k(U)$.

For convenience in this section we restrict attention at several points to real-valued functions. As a $\mathbb{C}$-valued function $f$ on $U$ belongs to $H^k(U)$ (or $H_0^k(U)$) for some $k \ge 0$ if and only if both $\Re(f)$ and $\Im(f)$ belong to that space, this is not a significant restriction.

Proof. If $g$ is a weak $\alpha$-partial derivative of $f$ then the inner product
\[
\langle g, \phi \rangle = \int_U g \phi \,dx = (-1)^{\|\alpha\|_1} \int_U f\,\partial_\alpha \phi \,dx
\]
is determined by $f$ for all $\phi \in C_c^\infty(U)$. As $C_c^\infty(U)$ is dense in $L^2(U)$ (see Exercise 5.11 or Exercise 5.17(e)), we see that $g$ is uniquely determined by $f$.

Let $f \in C^\infty(U)$ with $\partial_{e_j} f \in L^2(U)$ and $\phi \in C_c^\infty(U)$. Then for every $y \in \mathbb{R}^d$ the set $K_y = \operatorname{Supp} \phi \cap (y + \mathbb{R} e_j)$ is a compact subset of $U_y = U \cap (y + \mathbb{R} e_j)$, which is relatively open in $y + \mathbb{R} e_j$ for every $y \in \mathbb{R}^d$. Therefore $U_y$ is a countable union of intervals and finitely many of these are sufficient to cover $K_y$. Using integration by parts for the integral along the $j$th coordinate $x_j$ gives
\[
\int_{U_y} f(x)\,\partial_{e_j} \phi(x) \,dx_j = - \int_{U_y} \partial_{e_j} f(x)\,\phi(x) \,dx_j
\]
for any $y \in U$, where the boundary terms vanish since $\phi \in C_c^\infty(U)$. Integrating over the remaining variables shows that $\partial_{e_j} f$ is indeed also a weak $e_j$-partial derivative. By induction on $\|\alpha\|_1$, this implies that
\[
\langle f_0, \partial_\alpha \phi \rangle = (-1)^{\|\alpha\|_1} \langle f_\alpha, \phi \rangle
\]
first for all $(f_\alpha)_{\|\alpha\|_1 \le k}$ in the subspace (5.3) inside $\bigoplus_{\|\alpha\|_1 \le k} L^2(U)$, and then by continuity of the scalar product for all $(f_\alpha)_{\|\alpha\|_1 \le k} \in H^k(U)$. This implies the lemma. □

Essential Exercise 5.11. Let $U \subseteq \mathbb{R}^d$ be open. Show that $C_c^\infty(U) \subseteq L^p(U)$ is dense for any $p \in [1, \infty)$.

Definition 5.12 (Modifying Definition 5.7). Lemma 5.10 justifies the following notational convention. We identify the elements of $H^k(U)$ with functions $f \in L^2(U)$ and equip the space $H^k(U)$ with the norm
\[
\|f\|_{H^k(U)} = \sqrt{\sum_{\|\alpha\|_1 \le k} \|\boldsymbol{\partial}_\alpha f\|_{L^2(U)}^2}.
\]


Using this identification, the subspace (5.3) will from now on be referred to as $C^\infty(U) \cap H^k(U)$.

Proposition 5.13 (Forgetting regularity and the weak partial derivative). For $k \ge \ell \ge 0$ there is a natural injection $\imath_{k,\ell} : H^k(U) \to H^\ell(U)$ of norm one, extending the identity on $C^\infty(U)$. For any multi-index $\alpha$ with $\|\alpha\|_1 \le k$ there is a natural operator
\[
\boldsymbol{\partial}_\alpha : H^k(U) \longrightarrow H^{k - \|\alpha\|_1}(U), \qquad f \longmapsto \boldsymbol{\partial}_\alpha f
\]
of norm less than or equal to one, which extends $\partial_\alpha$ on $C^\infty(U) \cap H^k(U)$.

This may be proved along the same lines as Proposition 5.3. We may obtain other Sobolev spaces (which will be subspaces of $H^k(U)$) by requiring additional decay properties at the boundary $\partial U$.

Definition 5.14. We define
\[
H_0^k(U) = \overline{C_c^\infty(U)} \subseteq H^k(U)
\]
to be the closure of all smooth compactly supported functions in $H^k(U)$.

We will see later that elements of $H_0^k(U)$ ‘vanish in the square-mean norm sense’ at $\partial U$ if $k \ge 1$.

Let us add the following remark to Definition 5.7. We defined $H^k(U)$ to consist of those $f \in L^2(U)$ that have weak $\alpha$-partial derivatives $\boldsymbol{\partial}_\alpha f \in L^2(U)$ for all $\alpha$ with $\|\alpha\|_1 \le k$ such that the vector
\[
(\boldsymbol{\partial}_\alpha f)_{\|\alpha\|_1 \le k} \in \bigoplus_{\|\alpha\|_1 \le k} L^2(U)
\]
can be approximated by vectors corresponding to elements of $C^\infty(U) \cap H^k(U)$. One may ask whether this approximation statement can be proved instead of assumed.

Exercise 5.15. Show that $f \in L^2(\mathbb{T}^d)$ belongs to $H^k(\mathbb{T}^d)$ if and only if there exists, for every $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$, a weak $\alpha$-partial derivative $\boldsymbol{\partial}_\alpha f \in L^2(\mathbb{T}^d)$. Here the weak partial derivative is defined in terms of smooth test functions $\phi \in C^\infty(\mathbb{T}^d)$.

The analogue of Exercise 5.15 also holds(15) for certain open subsets $U$ of $\mathbb{R}^d$, but we will not use this possible alternative definition of the Sobolev spaces here (and will return to this question in Section 8.2.2). We note that due to the boundary of $U$ this equivalence is a bit harder to prove. For this proof, but more importantly also for the material that follows in this chapter, we need some more background concerning smooth functions on $\mathbb{R}^d$, which we outline in the following series of exercises.


Exercise 5.16 (A smooth function on $\mathbb{R}$). Show that the function
\[
\psi(t) = \begin{cases} e^{1/t} & \text{for } t < 0, \\ 0 & \text{for } t \ge 0 \end{cases}
\]
is smooth on $\mathbb{R}$.

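Essential Exercise 5.17 (Smooth approximate identities on $\mathbb{R}^d$). Define a real function $\jmath$ on $\mathbb{R}^d$ by
\[
\jmath(x) = c\,\psi(\|x\|_2^2 - 1) = \begin{cases} c\,e^{1/(\|x\|_2^2 - 1)} & \text{for } \|x\|_2 < 1, \\ 0 & \text{for } \|x\|_2 \ge 1, \end{cases}
\]
where $c > 0$ is chosen so that $\int_{\mathbb{R}^d} \jmath(x) \,dx = \int_{B_1} \jmath(x) \,dx = 1$. For $\varepsilon > 0$, also define $\jmath_\varepsilon(x) = \frac{1}{\varepsilon^d} \jmath\bigl(\frac{x}{\varepsilon}\bigr)$ for $x \in \mathbb{R}^d$. Show the following.
(a) $\jmath \in C_c^\infty(\mathbb{R}^d)$.
(b) The function $f_\varepsilon$ defined by $f_\varepsilon(x) = f * \jmath_\varepsilon(x) = \int_{\mathbb{R}^d} f(y) \jmath_\varepsilon(x - y) \,dy$ converges uniformly to $f$ as $\varepsilon \to 0$ on any compact subset of $\mathbb{R}^d$ for any $f \in C(\mathbb{R}^d)$.
(c) $f_\varepsilon \in C^\infty(\mathbb{R}^d)$ for any $f \in C(\mathbb{R}^d)$.
(d) $\operatorname{Supp} f_\varepsilon \subseteq \operatorname{Supp} f + B_\varepsilon$.
(e) Generalize the above in the appropriate sense to any $f \in L^p(\mathbb{R}^d)$ and any $p \in [1, \infty)$. Derive the statement in Exercise 5.11 from this.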
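Part (b) of Exercise 5.17 can be explored numerically in dimension $d = 1$. The following is a rough sketch under stated assumptions rather than a solution: the integral $f_\varepsilon(x) = \int f(x - \varepsilon t)\,\jmath(t)\,dt$ (obtained from the definition by the substitution $y = x - \varepsilon t$) is replaced by a midpoint quadrature whose weights are normalised to sum to one, which plays the role of the constant $c$. The sample function $f(x) = |x|$, the grid and the values of $\varepsilon$ are hypothetical choices.

```python
import math


def psi(t):
    """The function of Exercise 5.16: exp(1/t) for t < 0 and 0 for t >= 0."""
    return math.exp(1.0 / t) if t < 0 else 0.0


def make_quadrature(n=2000):
    """Midpoint nodes in (-1, 1) with weights proportional to psi(t^2 - 1).

    Normalising the weights to sum to 1 is the discrete analogue of
    choosing c so that the mollifier has integral 1.
    """
    h = 2.0 / n
    nodes = [-1.0 + (i + 0.5) * h for i in range(n)]
    raw = [psi(t * t - 1.0) for t in nodes]
    total = sum(raw)
    return nodes, [w / total for w in raw]


def mollify(f, x, eps, nodes, weights):
    """Approximate f_eps(x) = integral of f(x - eps * t) * j(t) dt."""
    return sum(w * f(x - eps * t) for t, w in zip(nodes, weights))


nodes, weights = make_quadrature()
f = abs                                         # continuous but not differentiable at 0
xs = [-1.0 + 0.01 * i for i in range(201)]      # sample points in the compact set [-1, 1]

errors = {}
for eps in (0.5, 0.25, 0.125):
    errors[eps] = max(abs(mollify(f, x, eps, nodes, weights) - f(x)) for x in xs)
    print(f"eps = {eps}: sup error on grid = {errors[eps]:.5f}")
```

Since $|f(x - \varepsilon t) - f(x)| \le \varepsilon$ for this particular $f$ and $|t| < 1$, the sup error is at most $\varepsilon$, so the printed errors should shrink as $\varepsilon$ does.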

Exercise 5.18. Generalize and prove the first claim from Section 1.5 for functions defined on $\mathbb{T}^d$ or on an open subset of $\mathbb{R}^d$.

Exercise 5.19. Suppose $U \subseteq \mathbb{R}^d$ is open, bounded, and star-shaped with centre $0$ in the sense that $U \subseteq \lambda U$ for all $\lambda > 1$ (see, for example, Figure 5.2). Let $f, f_1, \dots, f_d$ be in $L^2(U)$ and suppose $f_j$ is the weak $e_j$-partial derivative of $f$ for $j = 1, \dots, d$. Show that $f \in H^1(U)$.

5.2.1 Examples

We illustrate the theory above with some simple examples, which will be justified below.

Example 5.20. Let $d = 1$, $U = (0, 1)$ and $k = 1$. Then every $f \in H^1((0, 1))$ has a continuous representative $\imath(f) \in C([0, 1])$ with
\[
\|\imath(f)\|_\infty \le 2 \|f\|_{H^1}. \tag{5.5}
\]
This is again an instance of the Sobolev embedding theorem, which we will prove for open subsets in Theorem 5.34 below. However, the instance discussed here can be proven quite directly and independently. Also see Exercise 5.22 and Exercise 5.23.


Example 5.21. Let $d = 2$, $U = B_{1/2} \setminus \{0\}$ and $k = 1$. Then the function $f$ defined by $f(x) = \log |\log \|x\||$ lies in $H^1(U)$ and cannot be extended to an element of $C(\overline{U})$. Also see Exercise 5.25 and Exercise 5.26.

Justification of Example 5.20. For $x, y \in U$ and $f \in C^\infty(U) \cap H^1(U)$ we clearly have
\[
f(y) = f(x) + \int_x^y f'(s) \,ds.
\]
Notice that for fixed $x$ and $y$ the integral on the right is a continuous functional on $H^1(U)$, but for the terms $f(y)$ and $f(x)$ this is not clear yet. Now integrate over $x \in (0, 1)$ to get
\[
f(y) = \int_0^1 f(x) \,dx + \int_0^1 \int_x^y f'(s) \,ds \,dx
     = \int_0^1 f(x) \,dx + \int_0^1 \int_0^1 f'(s)\,\sigma(y, x, s) \,ds \,dx,
\]
where
\[
\sigma(y, x, s) = \begin{cases} 1 & \text{if } x < s < y, \\ -1 & \text{if } y < s < x, \text{ and} \\ 0 & \text{if } s \text{ is not between } x \text{ and } y. \end{cases}
\]
Applying Fubini’s theorem we get
\[
f(y) = \int_0^1 f(x) \,dx + \int_0^1 f'(s)\,k(y, s) \,ds, \tag{5.6}
\]
where
\[
k(y, s) = \int_0^1 \sigma(y, x, s) \,dx = \begin{cases} s & \text{if } s < y, \\ s - 1 & \text{if } s > y. \end{cases}
\]
Hence (5.6) expresses the value of $f$ at $y \in U$ as the sum $\langle f, 1_U \rangle + \langle f', k(y, \cdot) \rangle$, which is clearly continuous on $H^1((0, 1))$, and since $\|k(y, \cdot)\|_{L^2} \le 1$ we also have $|f(y)| \le 2 \|f\|_{H^1}$. Moreover, we may use (5.6) for $y = 0$ and $y = 1$ as a definition of $f(0)$ and $f(1)$, and then
\[
|f(y_1) - f(y_2)| = \Bigl| \int_0^1 f'(s) \bigl( k(y_1, s) - k(y_2, s) \bigr) \,ds \Bigr|
\le \|f'\|_2 \,\|k(y_1, \cdot) - k(y_2, \cdot)\|_2 = \|f'\|_2 \sqrt{|y_1 - y_2|} \tag{5.7}
\]
for all $y_1, y_2 \in [0, 1]$. It follows that any $f \in C^\infty(U) \cap H^1(U)$ extends to a continuous function satisfying (5.5). Applying automatic extension to the closure (Proposition 2.59) to the so described map from $C^\infty(U) \cap H^1(U)$ to $C(\overline{U})$, the claims in Example 5.20 follow. □
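The representation (5.6) and the bound (5.5) lend themselves to a quick numerical sanity check. The sketch below is illustrative only: the smooth sample function $f(x) = \cos(3x) + x^2$ is a hypothetical choice, and all integrals are approximated by a composite midpoint rule, with the integral against the kernel split at $s = y$ where $k(y, \cdot)$ jumps.

```python
import math


def f(x):
    """Hypothetical smooth sample function on [0, 1]."""
    return math.cos(3.0 * x) + x * x


def df(x):
    """Its classical derivative."""
    return -3.0 * math.sin(3.0 * x) + 2.0 * x


def midpoint(func, a, b, n=20000):
    """Composite midpoint rule on [a, b]; returns 0 for an empty interval."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))


def kernel(y, s):
    """k(y, s) = s for s < y and s - 1 for s > y, as in (5.6)."""
    return s if s < y else s - 1.0


def reconstruct(y):
    """Right-hand side of (5.6): mean of f plus the pairing of f' with k(y, .)."""
    mean = midpoint(f, 0.0, 1.0)
    left = midpoint(lambda s: df(s) * kernel(y, s), 0.0, y)
    right = midpoint(lambda s: df(s) * kernel(y, s), y, 1.0)
    return mean + left + right


h1_norm = math.sqrt(midpoint(lambda x: f(x) ** 2, 0.0, 1.0)
                    + midpoint(lambda x: df(x) ** 2, 0.0, 1.0))

ys = (0.0, 0.3, 0.7, 1.0)
max_error = max(abs(reconstruct(y) - f(y)) for y in ys)
max_value = max(abs(f(y)) for y in ys)
print(f"max |rhs of (5.6) - f(y)| = {max_error:.2e}")
print(f"max |f(y)| = {max_value:.4f} <= {2.0 * h1_norm:.4f} = 2 ||f||_H1")
```

The endpoints $y = 0$ and $y = 1$ are included on purpose: there (5.6) serves as the definition of the boundary values.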


Exercise 5.22. Let $U = (0, 1)$ and let $f : U \to \mathbb{C}$ be a function continuous on $U$ and continuously differentiable at all but finitely many points of $U$. If we also have $f' \in L^2(U)$, show that $f \in H^1(U)$.

Exercise 5.23. Show that Example 5.20 can be generalized to the statement that any function $f \in H^k((0, 1))$ (continuously extended) belongs to $C^{k-1}([0, 1])$. Show also that $H_0^1((0, 1))$ is mapped under the embedding from Example 5.20 into the space $\{f \in C([0, 1]) \mid f(0) = f(1) = 0\}$.

Exercise 5.24. Show that not every function in $H_0^1((0, 1)) \cap H^2((0, 1))$ is also in $H_0^2((0, 1))$.

Justification of Example 5.21. It is easy to check that $f \in L^2(U)$. Since $f$ is also smooth on $U$, we only have to check that $\partial_j f \in L^2(U)$. By the chain rule we have
\[
\partial_j \bigl( \log |\log \|x\|| \bigr) = \frac{1}{\log \|x\|} \cdot \frac{1}{\|x\|} \cdot \frac{x_j}{\|x\|}.
\]
Taking the square and integrating with respect to $dx\,dy = r \,d\phi \,dr$ we get
\[
\|\partial_j f\|_{L^2(U)} \le \sqrt{\int_0^{1/2} \int_0^{2\pi} \frac{1}{(r \log r)^2}\, r \,d\phi \,dr} \ll \sqrt{\int_0^{1/2} \frac{1}{r (\log r)^2} \,dr} < \infty.
\]
□

Exercise 5.25. Extend Example 5.21 by showing that $f(x) = \log |\log \|x\||$ defines an element of $H^1(B_{1/2})$.

Exercise 5.26. Let $U = B_1^{\mathbb{R}^d}$, and let $f_\alpha(x) = \|x\|^\alpha$ for $x \in U$. For which values of $\alpha$ do we have $f_\alpha \in H^k(U)$?
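The finiteness of the radial integral in the justification of Example 5.21 can be made explicit. As a short worked computation (a supplement, not part of the original argument), note that $-1/\log r$ is an antiderivative of $1/(r (\log r)^2)$ on $(0, 1)$, so that:

```latex
\[
\int_0^{1/2} \frac{dr}{r (\log r)^2}
  = \Bigl[ -\frac{1}{\log r} \Bigr]_{r \to 0^+}^{r = 1/2}
  = \frac{1}{\log 2} - 0
  = \frac{1}{\log 2} < \infty ,
\]
\[
\|\partial_1 f\|_{L^2(U)}^2 + \|\partial_2 f\|_{L^2(U)}^2
  = \int_0^{1/2} \int_0^{2\pi} \frac{1}{(r \log r)^2}\, r \,d\phi \,dr
  = \frac{2\pi}{\log 2} .
\]
```

The second line uses $\sum_j (x_j / \|x\|)^2 = 1$. By contrast, $f(x) = \log |\log \|x\||$ itself tends to $\infty$ as $\|x\| \to 0$, which is why no continuous extension to the origin exists.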

5.2.2 Restriction Operators and Traces

Essential Exercise 5.27 (Open subsets). Let $V \subseteq U \subseteq \mathbb{R}^d$ be open subsets. Let $k \ge 0$.
(a) Show that the restriction $\cdot|_V : H^k(U) \to H^k(V)$ is a bounded operator.
(b) Show that the extension operator sending functions in $H_0^k(V)$ to $H_0^k(U)$, defined by extending the functions to be zero on $U \setminus V$, is a bounded operator.
(c) Show that for $k \ge 1$ in general $H_0^k(U)|_V$ does not belong to $H_0^k(V)$. Show that one cannot define an extension operator from $H^k(V)$ to $H^k(U)$ by simply extending the functions to be zero on $U \setminus V$.

In order to get a better geometric understanding of what it means for a function to belong to $H^1(U)$, we will now show that an element $f \in H^1(U)$, when restricted to any hyperplane, still belongs to $L^2$. Notice that since a hyperplane is a null set, any property claimed for the restriction to a hyperplane cannot be demanded (indeed does not even make sense) for a function $f \in L^2(U)$, as in this case $f$ is really an equivalence class of functions. For notational simplicity we start with the case $U = (0, 1)^d$ and describe how Example 5.20 can be generalized to higher dimensions.


Example 5.28. Let $U = (0, 1)^d$ and $S = (0, 1)^{d-1}$, and write $S_y = S \times \{y\} \subseteq \overline{U}$ for $y \in [0, 1]$. For every $y \in [0, 1]$ there is a natural restriction operator to $L^2(S_y)$, called the trace on $S_y$,
\[
H^1(U) \ni f \longmapsto f|_{S_y} \in L^2(S_y),
\]
which for $y \in (0, 1)$ is the continuous extension of the restriction operator
\[
C^\infty(U) \cap H^1(U) \ni f \longmapsto f|_{S_y}.
\]
Moreover, if we identify the space $L^2(S_y)$ with $L^2(S)$ for all $y \in [0, 1]$ (by simply identifying $S_y$ with $S$ via the projection $S_y \ni (x, y) \mapsto x \in S$), then we also have
\[
\bigl\| f|_{S_{y_1}} - f|_{S_{y_2}} \bigr\|_{L^2(S)} \le \|f\|_{H^1(U)} \sqrt{|y_1 - y_2|}.
\]

Exercise 5.29. Prove the statements of Example 5.28 by the following steps:
(a) Fix some $f \in C^\infty(U) \cap H^1(U)$ and apply Fubini’s theorem to see that the restriction of $f$ to $\{x\} \times (0, 1)$ belongs to $H^1((0, 1))$ for almost every $x \in (0, 1)^{d-1}$. Now apply Example 5.20 (or more precisely (5.6)) to show that
\[
f(x, y) = \int_0^1 f(x, s) \,ds + \int_0^y s\,\partial_2 f(x, s) \,ds + \int_y^1 (s - 1)\,\partial_2 f(x, s) \,ds.
\]
Notice that this also gives a definition for the trace in the cases $y = 0$ and $y = 1$. Use this to estimate the $L^2$ norm of the restriction of $f$ to $S_y$.
(b) For the last statement show that for any $f \in C^\infty(U) \cap H^1(U)$,
\[
|f(x, y_1) - f(x, y_2)| \le \sqrt{|y_1 - y_2|} \cdot \|\partial_2 f\|_{L^2(\{x\} \times (0, 1))}.
\]

Exercise 5.30. Prove an extension of Example 5.28 for any bounded open set $U \subseteq \mathbb{R}^d$ and the image $\phi\bigl([0, 1]^{d-1} \times \{0\}\bigr)$ for a smooth map $\phi : [0, 1]^d \to U$.

We now consider a general open set $U \subseteq \mathbb{R}^d$ and define the trace for elements of $H_0^1(U)$. For the statement that such functions vanish in the square-mean sense at $\partial U$ we want to assume that $U$ has a sufficiently regular boundary in the following sense (this may feel familiar after recalling the implicit function theorem).

Definition 5.31. Let $U \subseteq \mathbb{R}^d$ be an open set and $k \in \mathbb{N}_0 \cup \{\infty\}$. We say that $U$ has a $C^k$-smooth boundary if for every $z^{(0)} \in \partial U$ there exists a neighbourhood $B_\varepsilon(z^{(0)})$, a rotated coordinate system (which we denote by $x_1, \dots, x_{d-1}, y$) so that $z^{(0)}$ corresponds to $(x_1^{(0)}, \dots, x_{d-1}^{(0)}, y^{(0)})$, and a function
\[
\phi \in C^k\bigl( B_\varepsilon(x_1^{(0)}, \dots, x_{d-1}^{(0)}) \bigr)
\]
such that
\[
U \cap B_\varepsilon(z^{(0)}) = \{ (x, y) \in B_\varepsilon(z^{(0)}) \mid y < \phi(x) \},
\]
as illustrated in Figure 5.2. If $k = \infty$ then we simply say that $U$ has smooth boundary.

Fig. 5.2: A set with smooth boundary. (The figure shows the set $U$ and a neighbourhood $B_\varepsilon(z^{(0)})$ of a boundary point.)

This includes examples like $U = B_r(x)$, but excludes $U = (0, 1)^d$ if $k \ge 1$ and $d \ge 2$. Notice that an open set with a $C^k$-smooth boundary need not be connected, simply connected, or bounded. Also note that the rotation within Definition 5.31 does not affect whether a function belongs to $H^k(U)$. In fact, since a rotation $R$ preserves the $H^k$ norm, a convergent sequence $(f_n)$ in $C^\infty(U) \cap H^k(U)$ is mapped to another convergent sequence $(f_n \circ R)$ in $H^k(R^{-1} U)$.

Exercise 5.32. Let $U \subseteq \mathbb{R}^d$ be a bounded open set, and let $\Phi$ be a diffeomorphism (a rotation, for example) defined on a neighbourhood of $\overline{U}$. Let $k \ge 0$ be an integer. Show that $H^k(U) \ni f \mapsto f \circ \Phi \in H^k(\Phi^{-1}(U))$ is an isomorphism (and in the case of a rotation, an isometry) between $H^k(U)$ and $H^k(\Phi^{-1}(U))$.

Proposition 5.33 (Trace on graphs). Let $U \subseteq \mathbb{R}^d$ be a bounded open set. We let $\varepsilon > 0$, denote the variables $(z_1, \dots, z_d) \in \mathbb{R}^d$ by $(x_1, \dots, x_{d-1}, y)$, and assume that $\phi : B_\varepsilon^{\mathbb{R}^{d-1}}(x^{(0)}) \to \mathbb{R}$ is continuous. Then there is a natural restriction operator, called the trace on the graph
\[
\operatorname{Graph}(\phi) = \{ (x, \phi(x)) \mid x \in B_\varepsilon^{\mathbb{R}^{d-1}}(x^{(0)}) \}
\]
of $\phi$,
\[
H_0^1(U) \ni f \longmapsto f|_{\operatorname{Graph}(\phi)} \in L^2(\operatorname{Graph}(\phi)),
\]
which satisfies
\[
\bigl\| f|_{\operatorname{Graph}(\phi)} \bigr\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta_\phi}\, \|\boldsymbol{\partial}_d f\|_{L^2(U)},
\]
where $\delta_\phi$ is chosen to have $(x, \phi(x) + \delta_\phi) \notin U$ for all $x \in B_\varepsilon(x^{(0)})$ (see Figure 5.3) and we use the measure $dx_1 \cdots dx_{d-1}$ on $\operatorname{Graph}(\phi)$.

Fig. 5.3: The trace on the graph of $\phi$. (The figure shows $U$, $\operatorname{Graph}(\phi)$ and the distance $\delta_\phi$.)

Consider now the case of a bounded open set $U \subseteq \mathbb{R}^d$ with $C^0$-smooth boundary and the function $\phi_\delta(x) = \phi(x) - \delta$ with $\phi$ as in Definition 5.31. Then, by Proposition 5.33, the trace of an element $f \in H_0^1(U)$ on this translated portion of the boundary has $L^2$ norm of order $O(\sqrt{\delta}\, \|f\|_{H^1(U)})$. This explains the earlier claim that $f \in H_0^1(U)$ vanishes in the square-mean sense at $\partial U$. We also note that if we set $\phi(x) = y_0$ to be constant we obtain that the $L^2$ norm of the restriction of $f$ to $U \cap (\mathbb{R}^{d-1} \times \{y_0\})$ is bounded by a multiple of $\|\boldsymbol{\partial}_y f\|_2$. Integrating the square of this inequality over $y_0$ we obtain
\[
\|f\|_2 \ll_U \|\boldsymbol{\partial}_y f\|_2 \tag{5.8}
\]
for all $f \in H_0^1(U)$, where the implicit constant depends on the bounded open set $U$.

Proof of Proposition 5.33. Recall that we write $x$ for the first $(d - 1)$ coordinates and $y$ for the last coordinate. For $f \in C_c^\infty(U)$ we have
\[
f(x, \phi(x)) = - \int_{\phi(x)}^{\phi(x) + \delta_\phi} \partial_d f(x, s) \,ds \tag{5.9}
\]
by our assumption on $\delta_\phi$. Taking the square and applying the Cauchy–Schwarz inequality gives
\[
|f(x, \phi(x))|^2 = \Bigl| \int_{\phi(x)}^{\phi(x) + \delta_\phi} \partial_d f(x, s) \,ds \Bigr|^2 \le \delta_\phi \int_{\phi(x)}^{\phi(x) + \delta_\phi} |\partial_d f(x, s)|^2 \,ds. \tag{5.10}
\]
Integrated over $x$ this gives
\[
\bigl\| f|_{\operatorname{Graph}(\phi)} \bigr\|_{L^2(\operatorname{Graph}(\phi))} \le \sqrt{\delta_\phi}\, \|\partial_d f\|_{L^2(U)} \le \sqrt{\delta_\phi}\, \|f\|_{H^1(U)}.
\]
Using automatic extension to the closure (Proposition 2.59), this implies the proposition. □

5.2.3 Sobolev Embedding in the Interior

We now extend the Sobolev embedding theorem (Theorem 5.6) to open subsets $U \subseteq \mathbb{R}^d$ (but, for now, leave open the question regarding the behaviour of $f$ at $\partial U$).


Theorem 5.34 (Sobolev embedding on open subsets of $\mathbb{R}^d$). Let $U$ be an open subset of $\mathbb{R}^d$ and let $\ell \ge 0$ and $k > \frac{d}{2} + \ell$ be integers. Then any function in $H^k(U)$ (has a continuous representative that) also lies in $C^\ell(U)$.

In the following exercise we simplify the geometry again and only look at the unit cube, but go further than in the theorem above.

Exercise 5.35. Let $U = (0, 1)^d$ as in Example 5.28.
(a) Extend Example 5.28 by showing that $f \in H^k(U)$ implies that $f|_{S_y} \in H^{k-1}(S_y)$ for $y \in [0, 1]$.
(b) Use (a) to prove the following weak version of the Sobolev embedding theorem. For any $\ell \ge 0$ there is a natural map from $H^{d+\ell}(U)$ to $C^\ell(U)$ that extends the identity on the functions in $C^\infty(U) \cap H^{d+\ell}(U)$.
(c) Extend the arguments from (b) to the boundary, by showing that there is a natural map from $H^{d+\ell}(U)$ to $C^\ell(\overline{U})$.
(d) Now improve the needed regularity in (c) in the following way: Show that there is a natural map from $H^k(U)$ to $C^\ell(S_0)$ if $k > \ell + 1 + \frac{d-1}{2}$ (by also applying the Sobolev embedding theorem on $S_0$).

The proof of Theorem 5.34 will be (apart from Exercise 5.18) the first example of a technique that we will use frequently: If a given statement is already known to hold on $\mathbb{T}^d$ (where we can use Fourier series to prove it), then one can sometimes obtain the same statement for open subsets of $\mathbb{R}^d$ by moving the functions or the problem to $\mathbb{T}^d$. For this the following lemma and the notation $\mathbb{T}_R^d = \mathbb{R}^d / (2R\mathbb{Z}^d)$ for $R > 0$ will be useful. We also define $H^k(\mathbb{T}_R^d)$ in the same way as we defined $H^k(\mathbb{T}^d)$ except that we will use the fundamental domain $[-R, R)^d$ and the restriction of the Lebesgue measure to it to define the $L^2$ norm and the derived Sobolev norms. Of course the theorems of the previous section also hold in that context (possibly with different multiplicative constants).

Lemma 5.36 (Transferring regularity). Let $U \subseteq \mathbb{R}^d$ be open, $k \ge 1$, and $\chi \in C_c^\infty(V)$ for some open $V \subseteq U$. Then $M_\chi : H^k(U) \to H_0^k(V)$ defined by $M_\chi(f) = \chi f$ is a bounded operator. Let $R > 0$ and assume now that $U \subseteq B_R$. For a function $f$ on $U$ we define $P(f)$ on $\mathbb{T}_R^d$ by first extending $f$ to $[-R, R)^d$ by setting it to be zero outside of $U$ and then identifying $[-R, R)^d$ with $\mathbb{T}_R^d$. Then $P : H_0^k(U) \to H^k(\mathbb{T}_R^d)$ is a linear isometry. Finally, $f \in H^k(U)$, $\chi \in C_c^\infty(U)$ and $P(\chi f) \in H^\ell(\mathbb{T}_R^d)$ for some $\ell > k$ implies that $\chi f \in H_0^\ell(U)$.

Proof. For $f \in C^\infty(U) \cap H^k(U)$ we have $M_\chi(f) = \chi f \in C_c^\infty(V) \subseteq H_0^k(V)$ and the partial derivatives of $\chi f$ are sums of products of partial derivatives of $\chi$ and partial derivatives of $f$ of lower order (the full expansion is given by Leibniz’ rule). Since $\chi \in C_c^\infty(V)$ this gives the estimate
\[
\|\partial_\alpha(\chi f)\|_{L^2(V)} \ll \sup_{\|\beta\|_1 \le \|\alpha\|_1} \|\partial_\beta \chi\|_\infty \sup_{\|\gamma\|_1 \le \|\alpha\|_1} \|\partial_\gamma f\|_{L^2(U)}
\]
for all $\alpha$ with $\|\alpha\|_1 \le k$, which leads to $\|M_\chi(f)\|_{H^k(V)} \ll_\chi \|f\|_{H^k(U)}$. From this it follows that the operator $M_\chi$ is a bounded operator from $H^k(U)$ into $H_0^k(V)$ (and that the weak partial derivatives $\boldsymbol{\partial}_\alpha(\chi f)$ for $f \in H^k(U)$ are obtained by the same Leibniz rule as the partial derivatives $\partial_\alpha(\chi f)$ for $f$ in $C^\infty(U)$).

For the second statement of the lemma notice first that $P(C_c^\infty(U)) \subseteq C^\infty(\mathbb{T}_R^d)$ (which would not be true for $C^\infty(U) \cap H^k(U)$). Since the norm on $H^k(\mathbb{T}_R^d)$ is defined by integration over the Lebesgue measure on $(-R, R)^d$, and since $U \subseteq (-R, R)^d$ we obtain
\[
\|P(f)\|_{H^k(\mathbb{T}_R^d)} = \|f\|_{H^k(U)}
\]
for $f$ in $C_c^\infty(U)$. Hence the automatic extension of $P$ to an operator from $H_0^k(U)$ to $H^k(\mathbb{T}_R^d)$ (Proposition 2.59) exists and is an isometry.

Assume now that $P(\chi f)$ lies in $H^\ell(\mathbb{T}_R^d)$ for some $f \in H^k(U)$, $\ell > k$, and $\chi \in C_c^\infty(U)$. Let $\psi$ in $C_c^\infty(U)$ be chosen with $\psi \equiv 1$ on a neighbourhood of $\operatorname{Supp} \chi$ and with $\operatorname{Supp} \psi \subseteq U$ (see Exercise 5.37). Since any $g \in C^\infty(\mathbb{T}_R^d)$ can be identified with a $(2R\mathbb{Z})^d$-periodic smooth function $g$ on $\mathbb{R}^d$, the map
\[
C^\infty(\mathbb{T}_R^d) \ni g \longmapsto \psi g \in C_c^\infty(U)
\]
is well-defined. Arguing just as in the first part of the proof, we get that multiplication by $\psi$ defines a bounded operator from $H^\ell(\mathbb{T}_R^d)$ to $H_0^\ell(U)$. Applying this map to $g = P(\chi f) \in H^\ell(\mathbb{T}_R^d)$ we get $\psi P(\chi f) = \chi f \in H_0^\ell(U)$. □

The existence of functions in $C_c^\infty(U)$ that are equal to one on large subsets of $U$ (as used in the above proof) will frequently be useful.

Essential Exercise 5.37 (Smooth approximate characteristic functions). Let $K \subseteq U$ be a compact subset of an open subset $U \subseteq \mathbb{R}^d$. Find a smooth function $\psi \in C_c^\infty(U)$ with $\psi|_K \equiv 1$.

Proof of Theorem 5.34. Let $x_0 \in U$ and $\varepsilon > 0$ be such that
\[
V = B_{2\varepsilon}^{\mathbb{R}^d}(x_0) \subseteq U.
\]
Assume $k > \frac{d}{2} + \ell$. We fix some $\chi \in C_c^\infty(V)$ satisfying $\chi|_{B_\varepsilon(x_0)} \equiv 1$. Note that $\chi \in C_c^\infty(V) \subseteq C_c^\infty(B_R)$ for $R = \|x_0\| + 2\varepsilon$. The theorem follows from the existence of the following chain of operators
\[
H^k(U) \xrightarrow{\ P \circ M_\chi\ } H^k(\mathbb{T}_R^d) \xrightarrow{\ \imath\ } C^\ell(\mathbb{T}_R^d) \xrightarrow{\ \cdot|_{B_\varepsilon(x_0)}\ } C_b^\ell(B_\varepsilon(x_0)) \longrightarrow L^2(B_\varepsilon(x_0)),
\]
where $M_\chi : H^k(U) \to H_0^k(V)$ and $P : H_0^k(V) \to H^k(\mathbb{T}_R^d)$ are as in Lemma 5.36, $\imath$ is from the Sobolev embedding theorem for the torus (Theorem 5.6), and $\cdot|_{B_\varepsilon(x_0)}$ denotes the operator sending a function to its restriction to $B_\varepsilon(x_0)$. We also note that the composition of these operators applied to $f \in H^k(U)$ simply gives the restriction of $f$ to $B_\varepsilon(x_0)$ (since this holds initially for $f \in C^\infty(U) \cap H^k(U)$). This gives the theorem (see also Exercise 5.38). □

Exercise 5.38 (Merging lemma for continuous functions). If we wish to be pedantic the above proof is not yet complete since we have only shown that for every point $x$ there exists a neighbourhood $B_\varepsilon(x)$ such that we can find a $C^\ell$-version of the restriction of $f$ to the given neighbourhood. However, we actually claimed that there is a version of $f$ on all of $U$ which is in $C^\ell(U)$. To complete the proof, prove or recall the following statements:
(a) Suppose that $U$ is covered by a family of open subsets $B_\tau$ for $\tau \in T$ and for every $\tau \in T$ we are given some $f_\tau \in C(B_\tau)$. Assume that $f_{\tau_1}|_{B_{\tau_1} \cap B_{\tau_2}} = f_{\tau_2}|_{B_{\tau_1} \cap B_{\tau_2}}$ for every $\tau_1, \tau_2$ in $T$. Then there exists some $f \in C(U)$ with $f_\tau = f|_{B_\tau}$ for all $\tau \in T$.
(b) Use the fact that $U$ is $\sigma$-compact to construct a countable cover $B_n = B_{\varepsilon_n}(x_n)$ of $U$ with $B_{2\varepsilon_n}(x_n) \subseteq U$.
(c) Complete the proof of Theorem 5.34 above, using (a) and (b).

Exercise 5.39. Let $U \subseteq \mathbb{R}^d$ be open and $K \subseteq U$ a compact subset. Show that $\|f\|_{K,\infty} \ll_{K,U} \|f\|_{H^k(U)}$ for $f \in H^k(U)$ and $k > \frac{d}{2}$.

5.3 Dirichlet’s Boundary Value Problem and Elliptic Regularity

In this section we will combine the discussion of Sobolev spaces from Section 5.2, the Fréchet–Riesz representation theorem (Corollary 3.19), and a simple orthogonality relation to solve the Dirichlet boundary value problem
\[
\begin{cases} \Delta u = 0, \\ u|_{\partial U} = b \end{cases} \tag{5.11}
\]
introduced and motivated in Section 1.2.1 for certain domains $U \subseteq \mathbb{R}^d$ and certain functions $b : \partial U \to \mathbb{R}$. Recall that a function $g$ is said to be harmonic if $\Delta g = 0$, where
\[
\Delta g = \frac{\partial^2 g}{\partial x_1^2} + \cdots + \frac{\partial^2 g}{\partial x_d^2} = \partial_1^2 g + \cdots + \partial_d^2 g
\]
is the Laplacian of $g$.

We note that (with the exception of Lemma 5.48 and its proof) we restrict our attention in this section to real-valued functions. In the following we will also write $\langle \cdot, \cdot \rangle_{L^2(U)}$ to denote the inner product on $L^2(U)$, and similarly for other Hilbert spaces, to emphasize the difference between the various inner products used, especially for the semi-inner product $\langle \cdot, \cdot \rangle_1$ as introduced in the next lemma. Recall from Definition 3.1 that a semi-inner product satisfies positivity instead of strict positivity.

Lemma 5.40 (Orthogonality). Let $U \subseteq \mathbb{R}^d$ be open, $\phi \in C_c^\infty(U)$, and assume that $g \in C^2(U) \cap H^1(U)$ is a harmonic function. Then $\phi$ and $g$ are orthogonal with respect to the semi-inner product $\langle \cdot, \cdot \rangle_1$ defined by
\[
\langle u, v \rangle_1 = \sum_{j=1}^d \langle \boldsymbol{\partial}_j u, \boldsymbol{\partial}_j v \rangle_{L^2(U)} \tag{5.12}
\]
for $u, v \in H^1(U)$.

Proof. Fix $j \in \{1, \dots, d\}$ and $\phi \in C_c^\infty(U)$. Then as in Lemma 5.10 we can use integration by parts to obtain
\[
\int_U \partial_j g\, \partial_j \phi \,dx = - \int_U \bigl( \partial_j^2 g \bigr) \phi \,dx
\]
since the boundary terms vanish. Integrating over the remaining variables and summing over all $j = 1, \dots, d$, we get
\[
\langle g, \phi \rangle_1 = \langle -\Delta g, \phi \rangle_{L^2(U)} = 0
\]
by the assumption on $g$. □

Motivated by Lemma 5.40, the approach is to decompose a function $f$ in $C^1(\overline{U})$ as $f = g + v$, where $v \in H_0^1(U)$ and $g$ is ‘orthogonal to’ $H_0^1(U)$ with respect to the semi-inner product $\langle \cdot, \cdot \rangle_1$. As harmonic functions have this orthogonality property by Lemma 5.40, there is some hope that $g$ will be harmonic and indeed it will turn out to be. Moreover, $v$ will vanish at $\partial U$ in the square-mean sense and so $f|_{\partial U} = g|_{\partial U}$ at least in the square-mean sense.

As we wish to use the semi-inner product from (5.12) in the definition of the orthogonal complement, we will have to discuss properties of this semi-inner product. We will then show that $g$ is smooth and harmonic, and it is this step that relies on a general phenomenon called elliptic regularity, the Laplace operator being an example of an elliptic differential operator. We will show in Section 5.3.3, for $d = 2$, that $g$ extends continuously to the boundary $\partial U$ and agrees with $f|_{\partial U}$ there. Finally, we will discuss in Section 8.2.2 the behaviour at a smooth boundary in any dimension.

5.3.1 The Semi-Inner Product

Let $U \subseteq \mathbb{R}^d$ be an open bounded set.

Lemma 5.41 (Semi-inner product). The semi-inner product $\langle \cdot, \cdot \rangle_1$ restricted to $H_0^1(U)$ is an inner product, and the norm defined by this inner product is equivalent to $\|\cdot\|_{H^1(U)}$. The semi-norm $\|\cdot\|_1$ induced by $\langle \cdot, \cdot \rangle_1$ on $C^\infty(U) \cap H^1(U)$ has as its kernel the subspace of all locally constant functions.


Here the kernel is the subspace of all functions $f$ with $\langle f, f \rangle_1 = \|f\|_1^2 = 0$. A function $f$ on $U$ is called locally constant if for every $x \in U$ there is a neighbourhood $V$ of $x$ such that $f|_V$ is constant. If $U$ is connected, then any locally constant function is constant.

Proof of Lemma 5.41. Let $f \in H_0^1(U)$. We have $\|f\|_{L^2(U)} \ll \|\boldsymbol{\partial}_{x_d} f\|_{L^2(U)}$ by (5.8). Thus
\[
\sqrt{\langle f, f \rangle_1} \le \|f\|_{H^1(U)} = \sqrt{\|f\|_{L^2(U)}^2 + \sum_{j=1}^d \|\boldsymbol{\partial}_j f\|_{L^2(U)}^2} \ll \sqrt{\langle f, f \rangle_1}
\]
for $f \in H_0^1(U)$, proving the first statement in the lemma.

If $f \in C^\infty(U) \cap H^1(U)$ is locally constant then it is clear that $\langle f, f \rangle_1 = 0$. On the other hand, if $f \in C^\infty(U) \cap H^1(U)$ has $\langle f, f \rangle_1 = 0$, then $\partial_j f = 0$ almost everywhere and for all $j$, so $f$ is locally constant. □

We are now ready to exhibit the desired orthogonal decomposition.

Proposition 5.42 (Existence of weak solution). Let $U \subseteq \mathbb{R}^d$ be an open bounded set with $C^1$-smooth boundary, and let $f \in C^1(\overline{U})$ (that is, $f$ and all of its partial derivatives are continuous and extend continuously to $\overline{U}$). Then there exists some $v$ in $H_0^1(U)$ such that $g = f - v \in H^1(U)$ is weakly harmonic in the sense that $\langle g, \Delta \phi \rangle_{L^2(U)} = 0$ for all $\phi \in C_c^\infty(U)$.

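As before with $\partial$ and $\boldsymbol{\partial}$, we will think of this statement as giving meaning to ‘$\boldsymbol{\Delta} g = 0$’ by writing
\[
\langle \boldsymbol{\Delta} g, \phi \rangle_{L^2(U)} = \langle g, \Delta \phi \rangle_{L^2(U)} = 0
\]
for all $\phi \in C_c^\infty(U)$. If $\boldsymbol{\Delta} g = 0$, then we say that $g$ is weakly harmonic.

Proof of Proposition 5.42. We equip $C_c^\infty(U)$ with the inner product $\langle \cdot, \cdot \rangle_1$. By Lemma 5.41, $\langle \cdot, \cdot \rangle_1$ defines an inner product on $H_0^1(U)$, which makes $H_0^1(U)$ into a Hilbert space. Let $f \in C^1(\overline{U})$ be as in the statement of the proposition, and notice that
\[
\ell(u) = \langle u, f \rangle_1 = \sum_{j=1}^d \langle \partial_j u, \partial_j f \rangle_{L^2(U)} \tag{5.13}
\]
defines a bounded linear functional on $H_0^1(U)$ since
\[
|\ell(u)| \le \sum_{j=1}^d |\langle \boldsymbol{\partial}_j u, \partial_j f \rangle_{L^2(U)}| \le \sum_{j=1}^d \|\boldsymbol{\partial}_j u\|_{L^2(U)} \|\partial_j f\|_{L^2(U)} \ll_f \|u\|_{H_0^1(U)}
\]
for all $u \in H_0^1(U)$. Applying the Fréchet–Riesz representation theorem (Corollary 3.19) for the Hilbert space $H_0^1(U)$ with the inner product $\langle \cdot, \cdot \rangle_1$ we find some $v \in H_0^1(U)$ with
\[
\ell(\phi) = \langle \phi, v \rangle_1 \tag{5.14}
\]
for all $\phi \in H_0^1(U)$. This implies for every $\phi \in C_c^\infty(U)$ that
\[
\begin{aligned}
\langle \Delta \phi, f - v \rangle_{L^2(U)}
  &= \sum_{j=1}^d \bigl\langle \partial_j^2 \phi, f - v \bigr\rangle_{L^2(U)} && \text{(by definition of } \Delta\text{)} \\
  &= - \sum_{j=1}^d \langle \partial_j \phi, \partial_j f - \boldsymbol{\partial}_j v \rangle_{L^2(U)} && \text{(by Lemma 5.10)} \\
  &= -\ell(\phi) + \ell(\phi) = 0, && \text{(by (5.13) and (5.14))}
\end{aligned}
\]
completing the proof. □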
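In dimension one the construction in Proposition 5.42 can be carried out by hand on a grid, which may help to see what the Fréchet–Riesz step produces. The sketch below is an illustration under stated assumptions, not the method of the text: it takes $U = (0, 1)$, replaces $H_0^1(U)$ by the span of piecewise linear ‘hat’ functions on a uniform grid (a standard finite element discretisation), and solves the discrete version of $\langle \phi, v \rangle_1 = \langle \phi, f \rangle_1$ with a tridiagonal solver. The sample function `f` and the grid size are hypothetical choices. In one dimension harmonic means affine, so $g = f - v$ should be the straight line through the boundary values of $f$.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[0] and upper[-1] are ignored."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def weak_dirichlet_1d(f, n=100):
    """Return grid points and nodal values of g = f - v on [0, 1]."""
    h = 1.0 / n
    xs = [i * h for i in range(n + 1)]
    fv = [f(x) for x in xs]
    m = n - 1  # number of interior nodes, i.e. of hat functions
    # Gram matrix of the hat functions for <.,.>_1: tridiag(-1, 2, -1) / h.
    lower = [-1.0 / h] * m
    diag = [2.0 / h] * m
    upper = [-1.0 / h] * m
    # l(phi_i) = <phi_i, f>_1 = (2 f_i - f_{i-1} - f_{i+1}) / h, exact for C^1 f.
    rhs = [(2.0 * fv[i] - fv[i - 1] - fv[i + 1]) / h for i in range(1, n)]
    v = [0.0] + solve_tridiagonal(lower, diag, upper, rhs) + [0.0]
    g = [fv[i] - v[i] for i in range(n + 1)]
    return xs, g


def f(x):
    """Hypothetical boundary data, given as a function on the closed interval."""
    return x * x + math.sin(3.0 * x)


xs, g = weak_dirichlet_1d(f)
affine = [f(0.0) + (f(1.0) - f(0.0)) * x for x in xs]
deviation = max(abs(a - b) for a, b in zip(g, affine))
print(f"max deviation of g from the affine interpolant: {deviation:.2e}")
```

The function $v$ vanishes at both endpoints by construction, so $g$ and $f$ share their boundary values, mirroring the role of $H_0^1(U)$ in the proposition.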


5.3.2 Elliptic Regularity for the Laplace Operator

In this section we will upgrade the conclusion from the previous section to show that the weakly harmonic function $g$ is actually smooth and harmonic. The principle at work here is much more general, and is called elliptic regularity. We will again rely on Fourier series in the argument, and this will only give the result in the interior of $U$ and not at the boundary $\partial U$. For this reason, it is natural to start with functions that have little structure on $\partial U$, as in the following definition.

Definition 5.43. A measurable function $f$ on $U$ is called locally $L^p$ for some $p \in [1, \infty]$ if $1_K f \in L^p(U)$ for every compact set $K \subseteq U$. In this case we write $f \in L^p_{\mathrm{loc}}(U)$. A measurable function $f$ on $U$ is called locally $H^k$ for some $k \in \mathbb{N}_0$ if $\chi f \in H^k(U)$ for all $\chi \in C_c^\infty(U)$. In this case we write $f \in H^k_{\mathrm{loc}}(U)$.

Notice that the characteristic function $1_K$ localizes $f$ and removes the values of $f$ near the boundary $\partial U$; in the second case $\chi$ has the same effect but is chosen to be $C^\infty$ so as not to disturb any of the smoothness properties of $f$. Clearly $L^p(U) \subseteq L^p_{\mathrm{loc}}(U)$, and by Lemma 5.36 we also have $H^k(U) \subseteq H^k_{\mathrm{loc}}(U)$.

Exercise 5.44. Let $f$ be a measurable function on an open subset $U \subseteq \mathbb{R}^d$. Show that $f$ lies in $H^k_{\mathrm{loc}}(U)$ if and only if $f|_{K^o} \in H^k(K^o)$ for every compact $K \subseteq U$, if and only if $\chi f \in H_0^k(U)$ for all $\chi \in C_c^\infty(U)$ (we have not used this as our definition as we will need the functions $\chi$ as in Definition 5.43 in the proofs anyway).

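Theorem 5.45 (Elliptic regularity for ∆ inside open subsets of $\mathbb{R}^d$). Suppose that $U \subseteq \mathbb{R}^d$ is open and bounded, and $g \in H^1_{\mathrm{loc}}(U)$. Assume that $\Delta g \in H^k_{\mathrm{loc}}(U)$ for $k \in \mathbb{N}_0$, in the sense that there exists some $u \in H^k_{\mathrm{loc}}(U)$ with
\[ \langle \Delta g, \phi\rangle_{L^2(U)} = \langle g, \Delta\phi\rangle_{L^2(U)} = \langle u, \phi\rangle_{L^2(U)} \]
for all $\phi \in C_c^\infty(U)$. Then $g \in H^{k+2}_{\mathrm{loc}}(U)$.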

The assumption of boundedness is not important, but simplifies the discussion slightly and is sufficient for all our applications of the theorem. Similarly, the assumption on the regularity of g could be significantly weakened.


5 Sobolev Spaces and Dirichlet’s Boundary Problem

Roughly speaking, the theorem says that if ∆g exists, then the Sobolev regularity of g must be two more than that of ∆g. In other words, any non-smoothness of g will be visible also in ∆g, or there is no cancellation of singularities when ∆g is calculated from g. This remarkable result has many striking consequences, a few of which we list here.

Corollary 5.46. If $g \in H^1_{\mathrm{loc}}(U)$ has $\Delta g = u \in C^\infty(U)$ (or g is weakly harmonic in the sense that ∆g = 0), then $g \in C^\infty(U)$ satisfies ∆g = u (respectively is harmonic).

Proof. Since $u \in H^k_{\mathrm{loc}}(U)$ for all $k \in \mathbb{N}_0$, Theorem 5.45 implies that
\[ g \in H^{k+2}_{\mathrm{loc}}(U) \]
for all $k \geq 0$. Hence $\chi g \in H^{k+2}(U)$ for all $k \geq 0$ and all functions $\chi \in C_c^\infty(U)$. By the Sobolev embedding theorem for open subsets (Theorem 5.34), this implies that $\chi g \in C^\infty(U)$ for all $\chi \in C_c^\infty(U)$. Choosing $\chi \in C_c^\infty(U)$ equal to 1 on a neighbourhood of a given $x \in U$ shows that $g \in C^\infty(U)$, since it is $C^\infty$ in a neighbourhood of each point. Finally, integration by parts gives
\[ \langle \Delta g, \phi\rangle = \langle g, \Delta\phi\rangle = \langle \Delta g, \phi\rangle = \langle u, \phi\rangle \]
for all $\phi \in C_c^\infty(U)$, where the first ∆g is the weak Laplacian and the third is the Laplacian of the smooth function g in the usual sense. By density of $C_c^\infty(U) \subseteq L^2(U)$ and continuity of ∆g and u we see that ∆g = u. □

The following might be even more surprising and will be important in the next chapter.

Corollary 5.47. If $g \in H^1_{\mathrm{loc}}(U)$ is a weak eigenfunction of ∆ in the sense that there exists a $\lambda \in \mathbb{C}$ with
\[ \langle \Delta g, \phi\rangle_{L^2(U)} = \langle g, \Delta\phi\rangle_{L^2(U)} = \lambda\,\langle g, \phi\rangle_{L^2(U)} \]
for all $\phi \in C_c^\infty(U)$, then $g \in C^\infty(U)$ and ∆g = λg.

Proof. By assumption, $\Delta g = \lambda g \in H^1_{\mathrm{loc}}(U)$ and so by Theorem 5.45 we also have $g \in H^3_{\mathrm{loc}}(U)$. However, this shows that $\Delta g = \lambda g \in H^3_{\mathrm{loc}}(U)$ and Theorem 5.45 may be applied again to see that $g \in H^5_{\mathrm{loc}}(U)$, and so on. It follows that $g \in H^k_{\mathrm{loc}}(U)$ for all $k \geq 0$, and arguing as in the proof of Corollary 5.46 we see that $g \in C^\infty(U)$. □

We will prove Theorem 5.45 in two steps: firstly we deal with the case of functions on $\mathbb{T}^d$ (which turns out to be easy because of Fourier series), and secondly we show how to transfer the theorem from $\mathbb{T}^d$ to open subsets $U \subseteq \mathbb{R}^d$. Morally the second step (the transfer) should be the easy step as we are discussing the ‘Laplace operator’ on both of these spaces. However, some care is necessary as ∆ has different meanings on $\mathbb{T}^d$ and on U since the spaces of allowed test functions in the definition of ∆ are $C^\infty(\mathbb{T}^d)$ and $C_c^\infty(U)$, respectively.


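Lemma 5.48 (Elliptic regularity on $\mathbb{T}^d$). Let $g \in L^2(\mathbb{T}^d)$, and assume that $\Delta g = u \in H^k(\mathbb{T}^d)$, so
\[ \langle \Delta g, \phi\rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta\phi\rangle_{L^2(\mathbb{T}^d)} = \langle u, \phi\rangle_{L^2(\mathbb{T}^d)} \]
for all $\phi \in C^\infty(\mathbb{T}^d)$. Then $g \in H^{k+2}(\mathbb{T}^d)$.

Proof. If ∆g = u as in the lemma, then u is uniquely determined by g. Indeed, if $g = \sum_{n\in\mathbb{Z}^d} c_n\chi_n$ is the Fourier series of g then
\[ u = -\sum_{n\in\mathbb{Z}^d} c_n (2\pi)^2 \|n\|_2^2\, \chi_n \]
is the Fourier series of u. This follows from Fourier series on the torus (Theorem 3.54) since the characters $\chi_n$ are eigenfunctions of the Laplace operator:
\[ \Delta\chi_n = (2\pi \mathrm{i})^2\|n\|_2^2\,\chi_n = -(2\pi)^2\|n\|_2^2\,\chi_n \]
and so
\[ \langle u, \chi_n\rangle_{L^2(\mathbb{T}^d)} = \langle g, \Delta\chi_n\rangle_{L^2(\mathbb{T}^d)} = -(2\pi)^2\|n\|_2^2\,\underbrace{\langle g, \chi_n\rangle_{L^2(\mathbb{T}^d)}}_{=c_n}. \]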

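By assumption $u \in H^k(\mathbb{T}^d)$, which shows that
\[ \sum_{n\in\mathbb{Z}^d} \bigl|c_n (2\pi)^2\|n\|_2^2\bigr|^2\, \|n\|_2^{2k} < \infty \]
[…]

[…] $\ell \geq 1$. Then there exists a weak partial derivative $\boldsymbol{\partial}_j g \in H^{\ell-1}_{\mathrm{loc}}(U) \subseteq L^2_{\mathrm{loc}}(U)$ with
\[ \langle \boldsymbol{\partial}_j g, \phi\rangle_{L^2(U)} = -\langle g, \partial_j\phi\rangle_{L^2(U)} \]
for all $\phi \in C_c^\infty(U)$ and $j = 1,\dots,d$.

Lemma 5.50. Let $U \subseteq \mathbb{R}^d$ be open and bounded. If g lies in $H^\ell_{\mathrm{loc}}(U)$ for some $\ell \geq 1$, $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ with $k \geq 0$, and $\chi \in C_c^\infty(U)$, then
\[ \Delta(\chi g) = \underbrace{\chi u}_{\in H^k} + \underbrace{(\Delta\chi) g}_{\in H^\ell} + 2\underbrace{\sum_{j=1}^{d} (\partial_j\chi)(\boldsymbol{\partial}_j g)}_{\in H^{\ell-1}} \in H^{\min\{k,\ell-1\}}(U). \]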

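Notice that if g happened to be smooth, then the formula in the lemma calculates ∆(χg) since in that case
\[
\begin{aligned}
\Delta(\chi g) = \sum_{j=1}^{d} \partial_j^2(\chi g)
&= \sum_{j=1}^{d} \partial_j\bigl(\chi(\partial_j g) + (\partial_j\chi) g\bigr)\\
&= \sum_{j=1}^{d} \bigl(\chi(\partial_j^2 g) + 2(\partial_j\chi)(\partial_j g) + (\partial_j^2\chi) g\bigr).
\end{aligned}
\]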
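The smooth-case identity above can be checked numerically in one dimension (d = 1), where it reads (χg)'' = χg'' + 2χ'g' + χ''g. The following is only a sketch: the functions `chi` and `g` are hypothetical choices made for illustration (in particular `chi` is a Gaussian, not a compactly supported cut-off), and the left-hand side is approximated by a central finite difference.

```python
import math

# Hypothetical smooth functions chosen for illustration only.
def chi(x):
    return math.exp(-x * x)

def dchi(x):
    return -2.0 * x * math.exp(-x * x)

def d2chi(x):
    return (4.0 * x * x - 2.0) * math.exp(-x * x)

def g(x):
    return math.sin(3.0 * x)

def dg(x):
    return 3.0 * math.cos(3.0 * x)

def d2g(x):
    return -9.0 * math.sin(3.0 * x)

def second_difference(f, x, h=1e-4):
    # Central finite-difference approximation of f''(x).
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def lhs(x):
    # Laplacian (here: second derivative) of the product chi * g.
    return second_difference(lambda t: chi(t) * g(t), x)

def rhs(x):
    # chi * g'' + 2 * chi' * g' + chi'' * g, using exact derivatives.
    return chi(x) * d2g(x) + 2.0 * dchi(x) * dg(x) + d2chi(x) * g(x)

max_gap = max(abs(lhs(x) - rhs(x)) for x in (-1.0, -0.3, 0.0, 0.4, 1.2))
print(max_gap)
```

The printed gap reflects only discretization and rounding error of the finite difference; Lemma 5.50 asserts that the same formula persists weakly when g is merely in $H^\ell_{\mathrm{loc}}(U)$.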

Proof of Theorem 5.45. Let R > 0 and $U \subseteq B_R$ be an open subset of $\mathbb{R}^d$. Suppose $k \in \mathbb{N}_0$ and that $g \in H^1_{\mathrm{loc}}(U)$ has $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ weakly. We wish to show that $g \in H^{k+2}_{\mathrm{loc}}(U)$, and will do so by showing that $g \in H^\ell_{\mathrm{loc}}(U)$ by induction on $\ell \in \{1,\dots,k+2\}$. The case ℓ = 1 is the assumption in the theorem. So suppose that $1 \leq \ell \leq k+1$, $g \in H^\ell_{\mathrm{loc}}(U)$, and fix $\chi \in C_c^\infty(U)$. Then $\min(k,\ell-1) = \ell-1$ and $\Delta(\chi g) = u_1 \in H^{\ell-1}(U)$ weakly by Lemma 5.50. This means that
\[ \langle u_1, \phi\rangle_{L^2(U)} = \langle \chi g, \Delta\phi\rangle_{L^2(U)} \tag{5.15} \]
for all $\phi \in C_c^\infty(U)$. Using the fact that $\chi \in C_c^\infty(U)$ and $U \subseteq B_R$, we can now make a switch in this formula to $\mathbb{T}^d_R$ as follows. For this we will use Lemma 5.36 and its notation. Let $\psi \in C_c^\infty(U)$ be such that $\psi \equiv 1$ on Supp χ. Then ψφ lies in $C_c^\infty(U)$ for any $\phi \in C^\infty(\mathbb{T}^d_R)$, where $\mathbb{T}^d_R = \mathbb{R}^d/(2R\mathbb{Z})^d$ and functions on $\mathbb{T}^d_R$ are identified with $(2R\mathbb{Z})^d$-periodic functions on $\mathbb{R}^d$. Applying (5.15) to ψφ we get
\[ \langle P(\psi u_1), \phi\rangle_{L^2(\mathbb{T}^d_R)} = \langle \psi u_1, \phi\rangle_{L^2(U)} = \langle u_1, \psi\phi\rangle_{L^2(U)} = \langle \chi g, \Delta(\psi\phi)\rangle_{L^2(U)}. \]
Since ψ is one and its derivatives are zero at any point of Supp χ, we can remove ψ on the right-hand side. One may wonder why ψ is introduced in the first place, since it is only brought in so that it can be removed again. The answer lies in the definition of ∆, which depends crucially on a choice of test functions. In particular, ∆ is defined differently on $\mathbb{T}^d$ and on U, and we use ψ to bridge between these two definitions. We now obtain
\[ \langle P(\psi u_1), \phi\rangle_{L^2(\mathbb{T}^d_R)} = \langle \chi g, \Delta\phi\rangle_{L^2(U)} = \langle P(\chi g), \Delta\phi\rangle_{L^2(\mathbb{T}^d_R)} \]
for any $\phi \in C^\infty(\mathbb{T}^d_R)$. In fact, by definition of $H^k_{\mathrm{loc}}(U)$ we have $\chi g \in H^k(U)$. By Lemma 5.36 $\psi\chi g \in H_0^k(U)$, but ψχg = χg by our choice of ψ, so we deduce that $\chi g \in H_0^k(U)$ and $P(\chi g) \in H^k(\mathbb{T}^d_R)$ is defined by Lemma 5.36. In other words, we have shown that $\Delta(P(\chi g)) = P(\psi u_1) \in H^{\ell-1}(\mathbb{T}^d_R)$ weakly by Lemma 5.36 (where elements of $C^\infty(\mathbb{T}^d_R)$ are used as the test functions). By Lemma 5.48 it follows that $P(\chi g) \in H^{\ell+1}(\mathbb{T}^d_R)$, which by the last claim of Lemma 5.36 allows us to pull the statement back to U and deduce that χg lies in $H^{\ell+1}(U)$. Since this holds for all $\chi \in C_c^\infty(U)$ we see that $g \in H^{\ell+1}_{\mathrm{loc}}(U)$. Repeating the argument and increasing ℓ each time, we eventually reach ℓ = k + 1 and then $g \in H^{\ell+1}_{\mathrm{loc}}(U) = H^{k+2}_{\mathrm{loc}}(U)$. □

We now give the proof of the lemmas that were used in the theorem. In the remainder of the chapter we will only consider the inner product on $L^2(U)$ and hence will again simply write $\langle\cdot,\cdot\rangle$.

Proof of Lemma 5.49. Fix some $j \in \{1,\dots,d\}$. Let $V \subseteq U$ be an open subset with $\overline{V} \subseteq U$ compact, and choose $\chi \in C_c^\infty(U)$ with $\chi \equiv 1$ on V. Then $\chi g \in H^\ell(U)$ has a weak partial derivative along $x_j$, which we will denote by $g_{\chi,j} \in H^{\ell-1}(U)$. By definition, we now have for the inner products in $L^2(U)$ that
\[ \langle g_{\chi,j}, \phi\rangle = -\langle \chi g, \partial_j\phi\rangle = -\langle g, \partial_j\phi\rangle \]
for all $\phi \in C_c^\infty(V) \subseteq C_c^\infty(U)$. This shows that $g_{\chi,j}|_V \in H^{\ell-1}(V)$ is the weak partial derivative of $g|_V$ along $x_j$ and by the properties of weak derivatives (Lemma 5.10) is therefore uniquely determined by $g|_V$ (and independent of χ).

Now write $U = \bigcup_{n\geq 1} V_n$ for an increasing sequence of open subsets of U with compact closures within† U, and define $g_j$ to be $g_{\chi_n,j}$ on $V_n$ where $\chi_n$ and $g_{\chi_n,j}$ are the functions as above corresponding to the set $V = V_n$. By Lemma 5.10 the function $g_j$ is well-defined almost everywhere. Let $\phi \in C_c^\infty(U)$, then there exists some n with Supp φ ⊆ $V_n$, which gives
\[ \langle g_j, \phi\rangle = \langle g_{\chi_n,j}, \phi\rangle = -\langle g, \partial_j\phi\rangle. \]
Moreover, $\phi g_j = \phi\chi_n g_{\chi_n,j} \in H^{\ell-1}(U)$ by Lemma 5.36. As these two facts hold for every φ in $C_c^\infty(U)$, we see that $g_j \in H^{\ell-1}_{\mathrm{loc}}(U)$ is a weak partial derivative of g along $x_j$. □

† For example, $V_n = \{x \mid d(x, \mathbb{R}^d \smallsetminus U) > \tfrac{1}{n}\} \cap B_n$.


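Proof of Lemma 5.50. By assumption $u \in H^k_{\mathrm{loc}}(U)$ and so $\chi u \in H^k(U)$ by definition of $H^k_{\mathrm{loc}}(U)$ (Definition 5.43). Similarly, $g \in H^\ell_{\mathrm{loc}}(U)$ and so (∆χ)g lies in $H^\ell(U)$. Finally, by assumption, $g \in H^\ell_{\mathrm{loc}}(U)$ with $\ell \geq 1$ and so $\boldsymbol{\partial}_j g$ lies in $H^{\ell-1}_{\mathrm{loc}}(U)$ by Lemma 5.49, which gives $(\partial_j\chi)(\boldsymbol{\partial}_j g) \in H^{\ell-1}(U)$. Therefore,
\[ \chi u + (\Delta\chi) g + 2\sum_{j=1}^{d} (\partial_j\chi)(\boldsymbol{\partial}_j g) \in H^{\min\{k,\ell-1\}}(U) \]
and it remains to show that this function is equal to ∆(gχ) weakly. For this, recall that ∆g = u, let $\phi \in C_c^\infty(U)$, and calculate
\[
\begin{aligned}
\Bigl\langle \chi u + (\Delta\chi) g + 2\sum_{j=1}^{d} (\partial_j\chi)(\boldsymbol{\partial}_j g), \phi\Bigr\rangle
&= \langle u, \chi\phi\rangle + \langle g, (\Delta\chi)\phi\rangle + 2\sum_{j=1}^{d} \langle \boldsymbol{\partial}_j g, (\partial_j\chi)\phi\rangle\\
&= \langle g, \Delta(\chi\phi)\rangle + \langle g, (\Delta\chi)\phi\rangle - 2\sum_{j=1}^{d} \langle g, \partial_j((\partial_j\chi)\phi)\rangle\\
&= \Bigl\langle g, (\Delta\chi)\phi + 2\sum_{j=1}^{d} (\partial_j\chi)(\partial_j\phi) + \chi\Delta\phi\\
&\qquad\quad + (\Delta\chi)\phi - 2\sum_{j=1}^{d} (\partial_j^2\chi)\phi - 2\sum_{j=1}^{d} (\partial_j\chi)(\partial_j\phi)\Bigr\rangle\\
&= \langle g, \chi\Delta\phi\rangle = \langle \chi g, \Delta\phi\rangle.
\end{aligned}
\]
As $\phi \in C_c^\infty(U)$ was arbitrary we see that
\[ \Delta(\chi g) = \chi u + (\Delta\chi) g + 2\sum_{j=1}^{d} (\partial_j\chi)(\boldsymbol{\partial}_j g) \]
weakly. □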



5.3.3 Dirichlet’s Boundary Value Problem

For $k \geq 0$ and a bounded open set $U \subseteq \mathbb{R}^d$ the function space $C^k(\overline{U})$ consists of all continuous functions f on $\overline{U}$ with $f|_U \in C^k(U)$ such that the partial derivatives extend continuously to the closure $\overline{U}$. If U has $C^k$-smooth boundary, then the function space $C^k(\partial U)$ is defined using the assumption that ∂U has the structure of a manifold; the local charts allow smoothness properties to be transported onto ∂U from a suitable subset of $\mathbb{R}^{d-1}$; an example of how this may be done appears in the proof of Theorem 5.51 below.

Theorem 5.51 (Dirichlet’s boundary value problem). Let $U \subseteq \mathbb{R}^d$ be open and bounded with $C^1$-smooth boundary and let $f \in C^1(\partial U)$. Then there exists a function $g \in H^1(U)$ such that $g|_U \in C^\infty(U)$ is harmonic and $g|_{\partial U}$ is equal to f in the square-mean sense. If d = 2 then g extends uniquely to an element in $C(\overline{U})$, and $g|_{\partial U} = f$.

Proof of the existence of a solution in the square-mean sense. Since U has $C^1$-smooth boundary, we can find an extension of f, again denoted by f, to a function in $C^1(\overline{U})$. We only sketch the argument. In an open neighbourhood of a point $z^{(0)}$ in ∂U as in Definition 5.31, such a function can, for example, be chosen to be independent of the y-coordinate. Using compactness we find finitely many open sets $V_1,\dots,V_k$ covering ∂U and bounded $C^1$-functions $f_j$ defined on $V_j$ with $f_j(x) = f(x)$ for $x \in \partial U \cap V_j$ and $j = 1,\dots,k$. Using a smooth partition $\psi_1,\dots,\psi_k$ of unity (that is, functions $\psi_j \in C_c^\infty(V_j)$ for $j = 1,\dots,k$ with $\sum_{j=1}^{k} \psi_j|_{\partial U} \equiv 1$; see Exercise 5.52) with the cover $V_1,\dots,V_k$, $V_{k+1} = U$ we may define $f = \sum_{j=1}^{k} \psi_j f_j$ (where $\psi_j f_j(x) = 0$ for $x \notin V_j$ and $j = 1,\dots,k$) and restrict it to $\overline{U}$.

Now apply Proposition 5.42 to f, giving functions v in $H_0^1(U)$ and g in $H^1(U)$ such that f = g + v with g weakly harmonic in U. Now Corollary 5.46 implies that $g \in C^\infty(U)$, so that ∆g (in the usual sense) is well-defined. By integration by parts (see Lemma 5.10), it follows that
\[ \langle \Delta g, \phi\rangle = \langle g, \Delta\phi\rangle = \langle \Delta g, \phi\rangle = 0 \]
for all $\phi \in C_c^\infty(U)$, showing that ∆g = 0. That is, g is a harmonic function. Moreover, v vanishes at ∂U in the square-mean sense (Proposition 5.33), so $g|_{\partial U} = f|_{\partial U}$ in $L^2(\partial U)$. □

Essential Exercise 5.52 (Smooth partition of unity). Let $U \subseteq \mathbb{R}^d$ be open and bounded, let $V_1,\dots,V_k$ be an open cover of $\overline{U}$ and let $V_0 = \mathbb{R}^d \smallsetminus \overline{U}$. Show that there exist smooth functions $\psi_j \in C_c^\infty(V_j)$ for $j = 0,\dots,k$ (always extended to $\mathbb{R}^d$ by setting $\psi_j(x) = 0$ for $x \in \mathbb{R}^d \smallsetminus V_j$) such that $\sum_{j=0}^{k} \psi_j = 1$.

Using the averaging property of harmonic functions (Proposition 5.53) and Lemma 5.55 (which only works in two dimensions), we will upgrade the first part to give the second part of the theorem for d = 2.

Proposition 5.53 (Mean value principle). Let $U \subseteq \mathbb{R}^d$ be open, and let $\phi \in C^\infty(U)$ be a harmonic function. Let $x_0 \in U$ and r > 0 be chosen with $B_r(x_0) \subseteq U$. Then the value of the harmonic function at $x_0$ is equal to the average over the sphere of radius r around $x_0$, that is
\[ \phi(x_0) = \frac{1}{\sigma(r\mathbb{S}^{d-1})} \int_{r\mathbb{S}^{d-1}} \phi(x_0 + x)\, d\sigma(x), \]
where σ denotes the natural area measure on the sphere $r\mathbb{S}^{d-1}$.

Proof. Without loss of generality $x_0 = 0$. The proof consists of applying the d-dimensional divergence theorem to the vector field
\[ f(x) = \phi(x)\nabla v(x) - v(x)\nabla\phi(x), \]
where $v : \mathbb{R}^d \smallsetminus \{0\} \to \mathbb{R}$ is an auxiliary function. In fact, v is defined by
\[ v(x) = \begin{cases} \log\|x\|_2 & \text{for } d = 2,\\[2pt] \dfrac{1}{\|x\|_2^{d-2}} & \text{for } d > 2. \end{cases} \]
A direct calculation shows that it satisfies
\[ \nabla v = \begin{cases} \dfrac{1}{\|x\|_2}\,\dfrac{x}{\|x\|_2} & \text{for } x \neq 0 \text{ and } d = 2,\\[6pt] \dfrac{2-d}{\|x\|_2^{d-1}}\,\dfrac{x}{\|x\|_2} & \text{for } x \neq 0 \text{ and } d > 2, \end{cases} \]

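and $\Delta v = \operatorname{div}\nabla v = 0$ for $x \neq 0$. For $\varepsilon \in (0, r)$ the divergence theorem applied to f on the annulus $B_r \smallsetminus B_\varepsilon \subseteq \mathbb{R}^d$ has the form
\[ \int_{\partial B_r} f\cdot n\, d\sigma - \int_{\partial B_\varepsilon} f\cdot n\, d\sigma = \int_{B_r \smallsetminus B_\varepsilon} \operatorname{div} f\, dx_1\cdots dx_d, \]
where $n = \frac{x}{\|x\|_2}$ is the normalized outward normal vector to the sphere of radius $\|x\|_2$ at x and we also write σ for the area measure on $\partial B_\varepsilon$. For the right-hand side, we calculate for f as above
\[ \operatorname{div} f = \operatorname{div}(\phi\nabla v - v\nabla\phi) = \nabla\phi\cdot\nabla v + \phi\Delta v - \nabla v\cdot\nabla\phi - v\Delta\phi = 0. \]
Therefore
\[ \int_{\partial B_r} f\cdot n\, d\sigma = \int_{\partial B_\varepsilon} f\cdot n\, d\sigma. \tag{5.16} \]
For x with $\|x\|_2 = r$ we have
\[
\begin{aligned}
f(x)\cdot n &= \bigl(\phi(x)\nabla v(x) - v(x)\nabla\phi(x)\bigr)\cdot n\\
&= \phi(x)\,\frac{c_1}{\|x\|_2^{d-1}}\,\frac{x}{\|x\|_2}\cdot\frac{x}{\|x\|_2} - v(x)\nabla\phi(x)\cdot n\\
&= \phi(x)\,\frac{c_1}{r^{d-1}} - c_2\,\nabla\phi(x)\cdot n,
\end{aligned}
\]
where $c_1 = 1$ if d = 2 or $c_1 = 2 - d$ if d > 2, and $c_2 = v(x)$, which is a constant for $\|x\|_2 = r$ since v depends only on $\|x\|_2$. Furthermore, we notice that
\[ \int_{\partial B_r} \nabla\phi\cdot n\, d\sigma = \int_{B_r} \underbrace{\operatorname{div}\nabla\phi}_{=\Delta\phi=0}\, dx_1\cdots dx_d = 0. \]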

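Using this and the analogous formula for $\|x\|_2 = \varepsilon$ allows us to write (5.16) as
\[ \frac{1}{r^{d-1}} \int_{r\mathbb{S}^{d-1}} \phi\, d\sigma = \frac{1}{\varepsilon^{d-1}} \int_{\varepsilon\mathbb{S}^{d-1}} \phi(x)\, d\sigma. \]
Now divide by the area of $\mathbb{S}^{d-1}$ and notice that we get
\[ \frac{1}{\sigma(r\mathbb{S}^{d-1})} \int_{r\mathbb{S}^{d-1}} \phi\, d\sigma = \frac{1}{\sigma(\varepsilon\mathbb{S}^{d-1})} \int_{\varepsilon\mathbb{S}^{d-1}} \phi\, d\sigma \longrightarrow \phi(0) \]
as ε → 0, by continuity of φ.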



Exercise 5.54. Use Proposition 5.53 to prove that any bounded harmonic function on Rd is constant.
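The mean value principle of Proposition 5.53 is easy to test numerically for d = 2. The sketch below is an illustration under stated assumptions: the harmonic polynomial `phi`, the centre and the radius are hypothetical choices, and the circle average is approximated by an equally spaced sum (which is exact up to rounding for polynomials of this low degree).

```python
import math

def phi(x, y):
    # A harmonic polynomial: each of x^2 - y^2, 3x and 2xy has Laplacian 0.
    return x * x - y * y + 3.0 * x + 2.0 * x * y

def circle_average(f, x0, y0, r, n=360):
    # Average of f over the circle of radius r around (x0, y0),
    # using n equally spaced sample points.
    total = 0.0
    for k in range(n):
        t = 2.0 * math.pi * k / n
        total += f(x0 + r * math.cos(t), y0 + r * math.sin(t))
    return total / n

center = (0.7, -0.2)
harmonic_gap = abs(circle_average(phi, center[0], center[1], 1.5) - phi(*center))

# Contrast: x^2 + y^2 is not harmonic, and its average over the circle of
# radius 1.5 around the origin is 1.5^2 = 2.25 rather than the value 0 at the centre.
non_harmonic_average = circle_average(lambda x, y: x * x + y * y, 0.0, 0.0, 1.5)

print(harmonic_gap, non_harmonic_average)
```

The second computation shows that harmonicity is genuinely needed: for the non-harmonic function the circle average differs from the value at the centre.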

Lemma 5.55 (Convergence on average). Suppose that $U \subseteq \mathbb{R}^2$ is open and bounded with $C^1$-smooth boundary. Let $v \in H_0^1(U)$, which we extend to a function on $\mathbb{R}^2$ by setting it equal to 0 outside U. Then for any $z^{(0)} \in \partial U$ we have
\[ \frac{1}{\varepsilon^2} \int_{z^{(0)} + B_\varepsilon} |v|\, dx\, dy \longrightarrow 0 \tag{5.17} \]
as ε → 0, uniformly for all $z^{(0)} \in \partial U$.

Proof. To prove (5.17) for $z^{(0)} \in \partial U$ we use the assumption that U has $C^1$-smooth boundary and rotate the coordinate system so that $B_\delta(z^{(0)}) \cap U$ can be described as in Definition 5.31 (see also Exercise 5.32). By a further rotation and by shrinking δ if necessary we may also assume that $|\phi'(x)|$ […]

[…] 1 is the operator sending $f \in L^p([0,1])$ to the function in $C([0,1])$ defined by $x \mapsto \int_0^x f(t)\, dt$ a compact operator?

Many of the compact operators that we will encounter have a similar flavour to the example above. They either map from a space of functions with more regularity properties (in this instance, differentiability) to a space of functions with fewer regularity properties (in this case, continuity), or are integral operators. The next lemma is a useful tool for proving compactness of bounded operators.

Lemma 6.7 (Uniform approximation). Let V be a normed vector space, and let W be a Banach space. Suppose that $(L_n)$ is a sequence of compact operators V → W, and suppose that $L_n \to L \in B(V, W)$ as n → ∞ with respect to the operator norm. Then L is a compact operator as well.

Lemma 6.7 improves the claim from Exercise 6.4 in that the two-sided ideal K(V) in B(V) is even closed for any Banach space V (see also Exercise 6.8).

Proof of Lemma 6.7. Let $M = \overline{L(B_1^V)} \subseteq W$. Since W is assumed to be a Banach space, M is complete. It remains to show that M is totally bounded (see Section A.4 for the notion and for the equivalence to compactness). Let ε > 0 and choose $L_n$ with $\|L_n - L\| < \varepsilon$. Since $L_n$ is compact, we know that $\overline{L_n(B_1^V)}$ is compact and hence is totally bounded. It follows that there exist elements $w_1,\dots,w_m \in \overline{L_n(B_1^V)}$ with
\[ \overline{L_n(B_1^V)} \subseteq \bigcup_{i=1}^{m} B_\varepsilon^W(w_i). \]
For each $w_i$ there exists some $v_i \in B_1^V$ with $\|w_i - L_n(v_i)\| < \varepsilon$. If now $v \in B_1^V$, then for some $i \in \{1,\dots,m\}$ we have


6 Compact Self-Adjoint Operators and Laplace Eigenfunctions

\[ \|L_n(v) - L_n(v_i)\| < 2\varepsilon. \]
Now $\|L_n - L\| < \varepsilon$ and $\|v\|, \|v_i\| < 1$ so
\[ \|L(v) - L(v_i)\| \leq \|L(v) - L_n(v)\| + \|L_n(v) - L_n(v_i)\| + \|L_n(v_i) - L(v_i)\| < 4\varepsilon. \]
It follows that
\[ L(B_1^V) \subseteq \bigcup_{i=1}^{m} B_{4\varepsilon}^W(L(v_i)), \]
which implies that the points $L(v_i)$ for $i = 1,\dots,m$ are 5ε-dense in the set $M = \overline{L(B_1^V)}$. As ε was arbitrary, M is therefore totally bounded, so M is a compact set and hence L is a compact operator. □

Exercise 6.8. Continuing the discussion from Exercise 6.4, show that B(V)/K(V) becomes a Banach algebra, the Calkin algebra, by defining (A + K(V))(B + K(V)) to be AB + K(V) for all A, B ∈ B(V) and using the quotient norm $\|\cdot\|_{B(V)/K(V)}$.

Exercise 6.9. In each of the following, justify your claim.
(a) Is the inclusion map $\imath_{k+1,k} : H^{k+1}(\mathbb{T}^d) \to H^k(\mathbb{T}^d)$ from Proposition 5.3 a compact operator?
(b) Let $U \subseteq \mathbb{R}^d$ be an open set. Show that the inclusion $\imath_{k+1,k} : C_b^{k+1}(U) \to C_b^k(U)$ is a compact operator if $\overline{U} \subseteq \mathbb{R}^d$ is compact. Show that for $U = \mathbb{R}$ and k = 0 (or for any $k \geq 0$), the inclusion map $\imath_{1,0}$ (or $\imath_{k+1,k}$) is not a compact operator.
(c) Is the inclusion map $C(\mathbb{T}^d) \to L^2(\mathbb{T}^d)$ a compact operator?

6.1.1 Integral Operators are Often Compact

We explore here briefly the realm of integral operators and show that many (but not all) are in fact compact operators.

Lemma 6.10 (Integral operators defined by continuous kernels). Assume that $(X, d_X)$ and $(Y, d_Y)$ are compact metric spaces. Let µ be a finite Borel measure on X, and let k be a function in C(X × Y). Then the operator $K : L^2_\mu(X) \to C(Y)$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is a compact operator.

Proof. We first need to show that K is well-defined. To see this, notice that
\[ \int_X |f(x)||k(x, y)|\, d\mu(x) \leq \|f\|_2 \|k(\cdot, y)\|_2 \leq \|f\|_2 \|k\|_\infty \mu(X)^{1/2}, \]
where k(·, y) denotes the function on X obtained by fixing the coordinate y in Y. This shows that the integral defining K(f)(y) is well-defined and that

6.1 Compact Operators


\[ |K(f)(y)| \leq \|k\|_\infty \mu(X)^{1/2} \|f\|_2. \tag{6.1} \]
We now must show that K(f) is continuous, and in doing so we will obtain equicontinuity of the image of the unit ball, which together with (6.1) and the Arzela–Ascoli theorem will give the compactness of K. Since X × Y is compact, k is uniformly continuous, and so for any ε > 0 there is a δ > 0 for which $d_Y(y_1, y_2) < \delta$ implies that $|k(x, y_1) - k(x, y_2)| < \varepsilon$ for all x ∈ X. Therefore
\[ |K(f)(y_1) - K(f)(y_2)| \leq \int_X |f(x)||k(x, y_1) - k(x, y_2)|\, d\mu(x) \leq \varepsilon \mu(X)^{1/2} \|f\|_2 \]
if $d_Y(y_1, y_2) < \delta$, by the same argument as above. Hence K(f) ∈ C(Y) and the image of the unit ball $B_1^{L^2_\mu(X)}$ is an equicontinuous bounded family of functions. By the Arzela–Ascoli theorem (Theorem 2.38) the closure of $K\bigl(B_1^{L^2_\mu(X)}\bigr)$ is a compact subset of C(Y), and so K is a compact operator. □

Proposition 6.11 (Hilbert–Schmidt [46]). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be σ-finite measure spaces. Let $k \in L^2_{\mu\times\nu}(X \times Y)$. Then the Hilbert–Schmidt integral operator $K : L^2_\mu(X) \to L^2_\nu(Y)$ defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
for ν-almost every y ∈ Y defines a compact operator.

Exercise 6.12. Assume in addition that X, Y are compact metric spaces and µ, ν are finite measures on the Borel σ-algebras of X and Y, respectively. Deduce Proposition 6.11 in this case as a corollary of Lemma 6.10.

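Proof of Proposition 6.11. Note first that
\[ \int_X |f(x) k(x, y)|\, d\mu(x) \leq \|f\|_{L^2_\mu} \left(\int_X |k(x, y)|^2\, d\mu(x)\right)^{1/2}. \tag{6.2} \]
Squaring and integrating over Y gives
\[ \int_Y \left(\int_X |f(x) k(x, y)|\, d\mu(x)\right)^2 d\nu(y) \leq \|f\|_{L^2_\mu}^2 \|k\|_{L^2_{\mu\times\nu}}^2 < \infty \]
[…]

[…] $\geq 1$. Then
\[ \mathcal{A} = \{D \in \mathcal{B}_X \otimes \mathcal{B}_Y \mid \text{the claim above holds for } D \cap (X_n \times Y_n) \text{ for all } n \geq 1\} \]
is a σ-algebra containing all rectangles A × B for $A \in \mathcal{B}_X$ and $B \in \mathcal{B}_Y$. It follows that $\mathcal{A} = \mathcal{B}_X \otimes \mathcal{B}_Y$. Finally, if D ⊆ X × Y has finite measure, then
\[ \bigl\|\mathbb{1}_D - \mathbb{1}_{D \cap (X_n \times Y_n)}\bigr\|_{L^2_{\mu\times\nu}} \longrightarrow 0 \]
as n → ∞ by dominated convergence. Therefore the simple functions as in (6.3) are indeed dense, which gives the proposition. □

Exercise 6.13. Prove that the collection $\mathcal{A}$ in the proof of Proposition 6.11 is a σ-algebra.

Exercise 6.14. Let $g \in L^2(\mathbb{T}^d)$. Show that $L^2(\mathbb{T}^d) \ni f \mapsto f * g \in C(\mathbb{T}^d)$ defines a compact operator from $(L^2(\mathbb{T}^d), \|\cdot\|_2)$ to $(C(\mathbb{T}^d), \|\cdot\|_\infty)$.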


Not all integral operators are compact, as shown by the Holmgren operators.

Proposition 6.15 (Holmgren). Let $(X, \mathcal{B}_X, \mu)$ and $(Y, \mathcal{B}_Y, \nu)$ be σ-finite measure spaces. Let $k : X \times Y \to \mathbb{R}$ be measurable on X × Y, with
\[ \sup_{x\in X} \int_Y |k(x, y)|\, d\nu(y) < \infty \quad\text{and}\quad \sup_{y\in Y} \int_X |k(x, y)|\, d\mu(x) < \infty. \]
Then the integral operator K defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \tag{6.4} \]
is a bounded operator $K : L^2_\mu \to L^2_\nu$. Moreover,
\[ \|K\| \leq \left(\sup_{x\in X} \int_Y |k(x, y)|\, d\nu(y)\right)^{1/2} \left(\sup_{y\in Y} \int_X |k(x, y)|\, d\mu(x)\right)^{1/2} < \infty. \]

Proof. The proof that the integral in (6.4) makes sense for ν-almost every y in Y, and defines an element in $L^2_\nu$, is less straightforward than the proof of Proposition 6.11, and uses the Fréchet–Riesz representation theorem (Corollary 3.19). Suppose that $f \in L^2_\mu \smallsetminus \{0\}$ and $g \in L^2_\nu \smallsetminus \{0\}$, and consider the integral
\[ I = \int_{X\times Y} |f(x) k(x, y) g(y)|\, d\mu{\times}\nu(x, y). \]
Notice that for any real numbers $a, b \geq 0$ and c > 0, we always have
\[ ab \leq ab + \left(\sqrt{\tfrac{c}{2}}\, a - \sqrt{\tfrac{1}{2c}}\, b\right)^2 = \frac{c a^2}{2} + \frac{b^2}{2c}. \]
Applying this and Fubini’s theorem to the definition of I with a = |f(x)| and b = |g(y)| gives
\[
\begin{aligned}
I &\leq \iint_{X\times Y} |k(x, y)| \left(\tfrac{c}{2} |f(x)|^2 + \tfrac{1}{2c} |g(y)|^2\right) d\mu(x)\, d\nu(y)\\
&\leq \frac{c}{2} \int_X \int_Y |k(x, y)|\, d\nu(y)\, |f(x)|^2\, d\mu(x) + \frac{1}{2c} \int_Y \int_X |k(x, y)|\, d\mu(x)\, |g(y)|^2\, d\nu(y)\\
&\leq \frac{c}{2} \|f\|_{L^2_\mu}^2 \underbrace{\sup_{x\in X} \int_Y |k(x, y)|\, d\nu(y)}_{s_X} + \frac{1}{2c} \|g\|_{L^2_\nu}^2 \underbrace{\sup_{y\in Y} \int_X |k(x, y)|\, d\mu(x)}_{s_Y}.
\end{aligned}
\]


If $s_X$ or $s_Y$ is 0, then k = 0 µ × ν-almost everywhere and the proposition holds trivially. If not, we optimize the parameter c by setting $c = \sqrt{\frac{s_Y}{s_X}}\, \frac{\|g\|_{L^2_\nu}}{\|f\|_{L^2_\mu}}$, and obtain
\[ \int_{X\times Y} |f(x) k(x, y) g(y)|\, d\mu{\times}\nu(x, y) \leq \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu} \|g\|_{L^2_\nu}. \]
It follows that $(x, y) \mapsto f(x) k(x, y) g(y)$ is µ × ν-integrable on X × Y, and that
\[ \phi : g \longmapsto \int_Y g(y) \int_X f(x) k(x, y)\, d\mu(x)\, d\nu(y) \tag{6.5} \]
is a continuous functional on $L^2_\nu$ with $\|\phi\| \leq \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$. We conclude first that
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is well-defined ν-almost everywhere. Using the Fréchet–Riesz representation theorem (Corollary 3.19) the functional φ can be represented by taking the inner product with a function in $L^2_\nu$, again with norm bounded by $\sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$. Varying the element g in (6.5) we see that K(f) must be this function, and we obtain $\|K(f)\|_{L^2_\nu} \leq \sqrt{s_X s_Y}\, \|f\|_{L^2_\mu}$. □

The main difference between Hilbert–Schmidt integral operators and Holmgren integral operators is that the latter are not automatically compact.

Exercise 6.16. Let $X = Y = \mathbb{R}$ and µ = ν = λ, the Lebesgue measure on $\mathbb{R}$. Define
\[ k(x, y) = \begin{cases} 1 & \text{for } \|x - y\| \leq 1,\\ 0 & \text{otherwise.} \end{cases} \]
Show that the corresponding Holmgren operator K as defined in Proposition 6.15 is not a compact operator on $L^2_\lambda(\mathbb{R})$.
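The bound of Proposition 6.15 has a finite-dimensional analogue that can be tested directly. The sketch below assumes X and Y are finite sets with counting measure, so that the kernel is a matrix `k[x][y]`, the integrals become sums, and $s_X$, $s_Y$ are the largest absolute row and column sums. The matrix size, the random seed and all function names are hypothetical choices made for illustration.

```python
import math
import random

def apply_kernel(k, f):
    # (K f)(y) = sum over x of f(x) * k[x][y], the discrete version of (6.4).
    rows, cols = len(k), len(k[0])
    return [sum(f[x] * k[x][y] for x in range(rows)) for y in range(cols)]

def holmgren_bound(k):
    # s_X: largest sum over y of |k[x][y]|; s_Y: largest sum over x of |k[x][y]|.
    rows, cols = len(k), len(k[0])
    s_x = max(sum(abs(k[x][y]) for y in range(cols)) for x in range(rows))
    s_y = max(sum(abs(k[x][y]) for x in range(rows)) for y in range(cols))
    return math.sqrt(s_x * s_y)

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

rng = random.Random(0)  # fixed seed so the run is reproducible
k = [[rng.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(4)]
bound = holmgren_bound(k)

# Largest observed ratio ||K f|| / ||f|| over a batch of random test vectors.
worst_ratio = 0.0
for _ in range(200):
    f = [rng.uniform(-1.0, 1.0) for _ in range(4)]
    worst_ratio = max(worst_ratio, norm2(apply_kernel(k, f)) / norm2(f))

print(worst_ratio, bound)
```

Random sampling only gives a lower estimate of the operator norm, so this illustrates the inequality rather than testing its sharpness.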

6.2 Spectral Theory of Self-Adjoint Compact Operators

There is a general spectral theory of compact operators L : V → V on Banach spaces. However, as we will discuss later, our applications do not need that level of generality and the statement and proof for the simpler case of self-adjoint operators is significantly easier. For these reasons we will restrict to that case below and refer to Lax [59, Ch. 21] for the general result.


6.2.1 The Adjoint Operator

Let $H_1, H_2$ be Hilbert spaces, and let $A : H_1 \to H_2$ be a bounded operator. For any fixed $v_2 \in H_2$ the map $H_1 \ni v_1 \mapsto \langle A v_1, v_2\rangle_{H_2}$ is linear and bounded since $|\langle A v_1, v_2\rangle| \leq \|A v_1\| \|v_2\| \leq \|A\|_{\mathrm{op}} \|v_2\| \|v_1\|$. Therefore, by the Fréchet–Riesz representation theorem (Corollary 3.19) applied to $H_1$ there exists some uniquely determined element, which will be denoted $A^* v_2 \in H_1$, with the properties that
\[ \langle v_1, A^* v_2\rangle_{H_1} = \langle A v_1, v_2\rangle_{H_2} \tag{6.6} \]
for all $v_1 \in H_1$, and
\[ \|A^* v_2\| \leq \|A\|_{\mathrm{op}} \|v_2\|. \tag{6.7} \]
This defines a bounded operator $A^* : H_2 \to H_1$, called the adjoint of A. This map is indeed linear, since
\[
\begin{aligned}
\langle v_1, A^*(v_2 + \alpha v_2')\rangle &= \langle A v_1, v_2 + \alpha v_2'\rangle = \langle A v_1, v_2\rangle + \overline{\alpha}\,\langle A v_1, v_2'\rangle\\
&= \langle v_1, A^* v_2\rangle + \overline{\alpha}\,\langle v_1, A^* v_2'\rangle = \langle v_1, A^* v_2 + \alpha A^* v_2'\rangle
\end{aligned}
\]
for $v_1 \in H_1$, $v_2, v_2' \in H_2$ and any scalar α. By (6.7) we have $\|A^*\|_{\mathrm{op}} \leq \|A\|_{\mathrm{op}}$, so $A^*$ is bounded. Taking conjugates in (6.6) implies that $A^{**} = A$, so $\|A\|_{\mathrm{op}} = \|A^*\|_{\mathrm{op}}$.

Essential Exercise 6.17. (a) Show that the map $A \mapsto A^*$ is semi-linear. (b) Let $A : H_1 \to H_2$ and $B : H_2 \to H_3$ be bounded operators between Hilbert spaces. Show that $(BA)^* = A^* B^*$.

Exercise 6.18. Show that $\operatorname{im}(T)^\perp = \ker(T^*)$ and $\ker(T)^\perp = \overline{\operatorname{im}(T^*)}$ for a linear operator T between Hilbert spaces.
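In finite dimensions the defining identity (6.6) can be checked by hand: for a complex matrix the adjoint is the conjugate transpose. The following sketch assumes the convention that the inner product is linear in the first argument and conjugate-linear in the second; the matrix `a` and the vectors `v1`, `v2` are hypothetical test data.

```python
def inner(v, w):
    # Assumed convention: linear in v, conjugate-linear in w.
    return sum(x * y.conjugate() for x, y in zip(v, w))

def apply(m, v):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def adjoint(m):
    # Conjugate transpose: entry (j, i) of the result is conj(m[i][j]).
    rows, cols = len(m), len(m[0])
    return [[m[i][j].conjugate() for i in range(rows)] for j in range(cols)]

# A maps C^3 to C^2, so its adjoint maps C^2 to C^3.
a = [[1 + 2j, 0.5j, -1 + 0j],
     [3 + 0j, 2 - 1j, 4j]]
v1 = [1 - 1j, 2 + 0j, 0.5 + 3j]
v2 = [-2j, 1 + 1j]

lhs = inner(apply(a, v1), v2)           # <A v1, v2> in C^2
rhs = inner(v1, apply(adjoint(a), v2))  # <v1, A* v2> in C^3
print(abs(lhs - rhs))
```

Applying `adjoint` twice returns the original matrix, which is the finite-dimensional instance of $A^{**} = A$.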

The adjoint operation allows us to give an alternate definition of unitarity.

Definition 6.19. An operator $U : H_1 \to H_2$ between two Hilbert spaces is unitary if $U^* U = I_{H_1}$ and $U U^* = I_{H_2}$, which we also write as $U^* = U^{-1}$.

Exercise 6.20. (a) Show that an operator $U : H_1 \to H_2$ is unitary in the sense of Definition 6.19 if and only if it is a bijective isometry (that is, a bijection with $\|Uv\|_{H_2} = \|v\|_{H_1}$ for all $v \in H_1$). (b) Suppose that $U : H_1 \to H_2$ is an isometry. Show that $U^* U = I_{H_1}$ and that $U U^*$ is the orthogonal projection $P_{\operatorname{im}(U)}$ from $H_2$ onto the closed subspace $\operatorname{im}(U) \subseteq H_2$.

Exercise 6.21 (Von Neumann’s mean ergodic theorem [78]). Let U : H → H be a unitary operator on a Hilbert space H and let $I = \{v \in H \mid Uv = v\}$ be the subspace of invariant vectors. (a) Show that I is closed and that $\{Uv - v \mid v \in H\}$ is dense in $I^\perp$. (b) Show that $\frac{1}{n} \sum_{j=0}^{n-1} U^j v \to P_I v$ as n → ∞, where $P_I$ is the orthogonal projection onto I.

Definition 6.22. A bounded operator A : H → H on a Hilbert space H is called self-adjoint if A∗ = A.


The next exercise revisits the maps introduced in Exercise 6.1.

Exercise 6.23. (a) Define $U : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ by $U((x_n)_{n\in\mathbb{Z}}) = (x_{n+1})_{n\in\mathbb{Z}}$. Show that the operator U is unitary. (b) Define $S : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ by $S((x_n)_{n\in\mathbb{N}}) = (x_{n+1})_{n\in\mathbb{N}}$. Show that $\|S\|_{\mathrm{op}} = 1$, but that S is not an isometry. (c) Define $T : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ by $T((x_n)) = (0, x_1, x_2, \dots)$, which shifts the sequence to the right and fills in the first entry of the new sequence with a 0. Show that $\|T\|_{\mathrm{op}} = 1$, that $T = S^*$ is an isometry, is not surjective, and has no eigenvectors.

Exercise 6.24 (Decomposition of isometries). Let H be a Hilbert space and U : H → H an isometry. Show that there exists an orthogonal decomposition
\[ H = H_{\mathrm{shift}} \oplus H_{\mathrm{unitary}} \]
into two closed subspaces with the property that $H_{\mathrm{shift}} = \bigoplus_{n\geq 0} U^n V$ for some closed subspace V, and $U|_{H_{\mathrm{unitary}}} : H_{\mathrm{unitary}} \to H_{\mathrm{unitary}}$ is unitary.

The next exercise is not simply another example. It turns out to really be the basis of the powerful spectral theory of normal bounded operators as well as self-adjoint unbounded operators.

Essential Exercise 6.25. Let $(X, \mathcal{B}, \mu)$ be a measure space, $H = L^2_\mu(X)$, let $g : X \to \mathbb{C}$ be a measurable function, and let $M_g$ be the multiplication operator $M_g : f \mapsto gf$ for f ∈ H.
(a) What properties of g ensure that $M_g : H \to H$ is well-defined and bounded? What is $\|M_g\|_{\mathrm{op}}$?
(b) When is $M_g$ a bounded self-adjoint operator? That is, what property of g is equivalent to $\langle M_g f_1, f_2\rangle = \langle f_1, M_g f_2\rangle$ holding for all $f_1, f_2 \in H$? What property of g is equivalent to $M_g$ being unitary?
(c) When does $M_g$ have $\lambda \in \mathbb{C}$ as an eigenvalue?
(d) Suppose that $X = \mathbb{R}$ and let g(x) = x, and assume that µ is an arbitrary finite compactly supported Borel measure on $\mathbb{R}$. Characterize in terms of µ the property that $M_g$ can be diagonalized. That is, characterize the property that H has an orthonormal basis $\{e_n \mid n \in \mathbb{N}\}$ and a sequence of scalars $(\lambda_n)$ such that $M_g\bigl(\sum_{n=1}^{\infty} x_n e_n\bigr) = \sum_{n=1}^{\infty} \lambda_n x_n e_n$ for every $(x_n) \in \ell^2(\mathbb{N})$.

Exercise 6.26. Let $H = \mathbb{C}^n$ be a finite-dimensional Hilbert space with respect to the usual inner product. Show that the linear operator defined by a matrix $A = (a_{i,j})$ is self-adjoint if and only if A is equal to its own conjugate transpose (that is, $a_{i,j} = \overline{a_{j,i}}$ for all i, j). Such matrices are also called Hermitian.

6.2.2 The Spectral Theorem

The spectral theorem presented here generalizes to an infinite-dimensional setting the familiar fact that a Hermitian matrix has real eigenvalues and can be diagonalized using a unitary matrix. We will assume separability of Hilbert spaces in this section in order to make use of an orthonormal basis that consists of a sequence. Properly formulated, the next result holds more generally, and in particular allows the kernel of A to be a non-separable space. Both the finite-dimensional and the inseparable case can easily be extracted from the proof we give.


Theorem 6.27 (Spectral theorem for compact self-adjoint operators). Let H be a separable infinite-dimensional Hilbert space, and let A be a compact self-adjoint operator on H. Then there exists a sequence of real eigenvalues $(\lambda_n)$ with $\lambda_n \to 0$ as n → ∞, and an orthonormal basis $\{v_n\}$ of eigenvectors with $A v_n = \lambda_n v_n$ for all $n \geq 1$.

In other words, a compact self-adjoint operator is diagonalizable, each non-zero eigenvalue has finite multiplicity, and 0 is the only possible accumulation point of the set of eigenvalues. Given these properties, which will turn out to be extremely useful, it is worth asking if there are any such operators. Clearly such operators exist in the following sense. If $\{e_n\}$ is an orthonormal basis of a Hilbert space H, and $(\lambda_n)$ is a sequence of real numbers with $\lambda_n \to 0$ as n → ∞, then we may define an operator A : H → H by
\[ A\left(\sum_{n=1}^{\infty} x_n e_n\right) = \sum_{n=1}^{\infty} \lambda_n x_n e_n \]
for any convergent series $\sum_{n=1}^{\infty} x_n e_n$. It may then be checked that A is compact and self-adjoint. Of course, Theorem 6.27 does not tell us anything we did not already know about such an operator.

A more interesting kind of example is found among the integral operators. Let $H = L^2_\mu(X)$, where $(X, \mathcal{B}, \mu)$ is a σ-finite measure space, and suppose that $k \in L^2_{\mu\times\mu}(X \times X)$ satisfies $k(x, y) = \overline{k(y, x)}$ for µ × µ-almost every point (x, y) ∈ X × X. Then the operator K defined by
\[ K(f)(y) = \int_X f(x) k(x, y)\, d\mu(x) \]
is compact by Proposition 6.11, and is self-adjoint since
\[
\begin{aligned}
\langle f_1, K^*(f_2)\rangle = \langle K(f_1), f_2\rangle &= \int_X \int_X f_1(x) k(x, y)\, d\mu(x)\, \overline{f_2(y)}\, d\mu(y)\\
&= \int_X f_1(x)\, \overline{\int_X f_2(y) \underbrace{\overline{k(x, y)}}_{=k(y,x)}\, d\mu(y)}\, d\mu(x) = \langle f_1, K(f_2)\rangle
\end{aligned}
\]
for all $f_1, f_2 \in L^2_\mu(X)$ by Fubini’s theorem. Hence Theorem 6.27 applies, but in this case it is a priori not at all clear how one could find the eigenvalues or eigenvectors for the operator.

Example 6.28. Notice that the integral operator from Section 2.5.2 defined by the kernel
\[ G(s, t) = \begin{cases} s(t - 1) & \text{for } 0 \leq s \leq t \leq 1;\\ t(s - 1) & \text{for } 0 \leq t \leq s \leq 1 \end{cases} \]
satisfies the conditions above, and so the eigenfunctions found in Section 2.5.2 coincide with the eigenvectors which must exist by Theorem 6.27.


In fact as we saw in Section 3.4 (see Exercise 3.55(b) and its hint on p. 566) the functions $s_1, s_2, \dots$ form an orthonormal basis of $L^2([0, 1])$ which makes K a diagonalizable operator. These notions also explain the argument from Section 2.5.2 quite clearly: If $g = \sum_{n=1}^{\infty} d_n s_n$ and we are looking for $f = \sum_{n=1}^{\infty} c_n s_n$ with $(I + \lambda^2 K) f = g$, then $(1 + \lambda^2 \mu_n) c_n = d_n$ for all $n \in \mathbb{N}$, which can be solved for $c_n$ unless $\lambda^2 = -\mu_n^{-1}$ and $d_n \neq 0$.

Exercise 6.29. Let K be the Hilbert–Schmidt integral operator on $L^2_\mu(X)$ defined by a kernel $k \in L^2_{\mu\times\mu}(X \times X)$ with $k(x, y) = \overline{k(y, x)}$ as above. Prove that the generalized Fredholm integral equation of the second kind
\[ f = \lambda K(f) + \phi \]
has a solution for any function $\phi \in L^2_\mu(X)$ if and only if $\lambda\lambda_n \neq 1$ for all n, where $(\lambda_n)$ is the sequence of eigenvalues of K on $L^2_\mu(X)$.
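The diagonalization in Example 6.28 can be illustrated numerically. The sketch below rests on two assumptions that are not spelled out in this section: that the operator acts by $K(f)(s) = \int_0^1 G(s, t) f(t)\, dt$, and that the eigenfunctions are the sine modes $t \mapsto \sin(m\pi t)$, for which a direct calculation with this kernel gives the eigenvalue $-1/(m\pi)^2$. The integral is approximated by a midpoint rule; the function names and the grid size are hypothetical choices.

```python
import math

def green(s, t):
    # The kernel G(s, t) from Example 6.28.
    return s * (t - 1.0) if s <= t else t * (s - 1.0)

def apply_K(f, s, n=4000):
    # Midpoint-rule approximation of the integral of G(s, t) f(t) over [0, 1].
    h = 1.0 / n
    return h * sum(green(s, (i + 0.5) * h) * f((i + 0.5) * h) for i in range(n))

def sine_mode(m):
    # Assumed eigenfunction t -> sin(m * pi * t).
    return lambda t: math.sin(m * math.pi * t)

worst = 0.0
for m in (1, 2, 3):
    for s in (0.1, 0.25, 0.5, 0.8):
        expected = -sine_mode(m)(s) / (m * math.pi) ** 2
        worst = max(worst, abs(apply_K(sine_mode(m), s) - expected))

print(worst)
```

With these assumptions the eigenvalues $-1/(m\pi)^2$ are real and tend to 0, in line with the conclusion of Theorem 6.27.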

We will see another class of compact self-adjoint operators in Section 6.4.

6.2.3 Proof of the Spectral Theorem

Lemma 6.30 (Invariance of orthogonal complement). Let A : H → H be a bounded operator on a Hilbert space. If V ⊆ H is an A-invariant subspace (that is, a subspace with A(V) ⊆ V), then $V^\perp$ is $A^*$-invariant.

Proof. If $v' \in V^\perp$ and v ∈ V, then $\langle A^* v', v\rangle = \langle v', A v\rangle = 0$. As this holds for all v ∈ V, we must have $A^* v' \in V^\perp$. □

As we will see, Lemma 6.30 reduces the proof of Theorem 6.27 mostly to finding a single eigenvector $e_1$, as we can then apply the lemma to $V = \langle e_1\rangle$ and $A = A^*$ to see that $V^\perp$ is A-invariant.

We now approach the central statement concerning the existence of an eigenvalue. Before doing this, it is useful to recall how one proves the complete diagonalizability of self-adjoint operators on $\mathbb{R}^d$. By compactness, we may choose $e \in \mathbb{S}^{d-1} = \{v \in \mathbb{R}^d \mid \|v\|_2 = 1\}$ such that the quadratic form $\langle Ax, x\rangle$ achieves its maximum at x = e. Using Lagrange multipliers one can then check that e is an eigenvector of A. The vector e is then an eigenvector with eigenvalue $\lambda \in \mathbb{R}$ of absolute value $|\lambda| = \|A\|_{\mathrm{op}}$. This relies in an essential way on the compactness of the unit sphere $\mathbb{S}^{d-1}$, which as we know fails in infinite-dimensional Hilbert spaces, and it is here that the additional assumptions on A will become important.

Lemma 6.31 (The norm and the quadratic form). Let A : H → H be a bounded self-adjoint operator on a Hilbert space. Then
\[ \|A\| = \sup_{\|x\|\leq 1} |\langle Ax, x\rangle|. \tag{6.8} \]
Notice that if A is self-adjoint, then $\langle Ax, x\rangle \in \mathbb{R}$ for all x ∈ H, since
\[ \langle Ax, x\rangle = \overline{\langle x, Ax\rangle} = \overline{\langle A^* x, x\rangle} = \overline{\langle Ax, x\rangle}. \]


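Proof of Lemma 6.31. Let us write
$$s(A) = \sup_{\|x\| \le 1} |\langle Ax, x\rangle|$$
for the right-hand side of (6.8). Then, by the Cauchy–Schwarz inequality,
$$|\langle Ax, x\rangle| \le \|Ax\|\,\|x\| \le \|A\|\,\|x\|^2 \le \|A\|$$
for all $x \in H$ with $\|x\| \le 1$. Hence $s(A) \le \|A\|$. The proof of the opposite inequality is slightly more involved. For $\lambda > 0$, we have
$$\bigl\langle A(\lambda x \pm \tfrac{1}{\lambda} Ax), \lambda x \pm \tfrac{1}{\lambda} Ax\bigr\rangle = \langle A(\lambda x), \lambda x\rangle + \bigl\langle A^2(\tfrac{1}{\lambda} x), A(\tfrac{1}{\lambda} x)\bigr\rangle \pm 2\|Ax\|^2.$$
Taking the difference of the two equations we see that
$$4\|Ax\|^2 = \bigl\langle A(\lambda x + \tfrac{1}{\lambda} Ax), \lambda x + \tfrac{1}{\lambda} Ax\bigr\rangle - \bigl\langle A(\lambda x - \tfrac{1}{\lambda} Ax), \lambda x - \tfrac{1}{\lambda} Ax\bigr\rangle \le s(A)\Bigl(\|\lambda x + \tfrac{1}{\lambda} Ax\|^2 + \|\lambda x - \tfrac{1}{\lambda} Ax\|^2\Bigr)$$
since the two inner products appearing are of the form $\langle Au, u\rangle$ and thus satisfy $|\langle Au, u\rangle| \le s(A)\|u\|^2$. Now we apply the parallelogram identity (3.4) to obtain
$$4\|Ax\|^2 \le 2 s(A)\Bigl(\lambda^2\|x\|^2 + \tfrac{1}{\lambda^2}\|Ax\|^2\Bigr).$$
Assuming that $\|Ax\| \neq 0$, we set $\lambda^2 = \frac{\|Ax\|}{\|x\|}$ and get
$$4\|Ax\|^2 \le 2 s(A)\Bigl(\frac{\|Ax\|}{\|x\|}\|x\|^2 + \frac{\|x\|}{\|Ax\|}\|Ax\|^2\Bigr) = 4 s(A)\|Ax\|\,\|x\|,$$
and so $\|Ax\| \le s(A)\|x\|$ for all $x \in H$. This shows that $\|A\| \le s(A)$. □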
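Lemma 6.31 can be checked numerically in the simplest setting. The sketch below is an illustration only, with assumptions not taken from the text: it works on $\mathbb{R}^2$, approximates both suprema by sampling unit vectors on the circle, and uses the hypothetical helper names `quadratic_form_sup` and `operator_norm`. It also shows that self-adjointness matters, since for the non-symmetric shift matrix the two quantities differ.

```python
import math


def apply(A, x):
    """Matrix-vector product for a 2x2 matrix A and a vector x in R^2."""
    return (A[0][0] * x[0] + A[0][1] * x[1], A[1][0] * x[0] + A[1][1] * x[1])


def unit_vectors(samples):
    """Sample unit vectors on a half circle (x and -x give the same values)."""
    for i in range(samples):
        t = math.pi * i / samples
        yield (math.cos(t), math.sin(t))


def quadratic_form_sup(A, samples=3600):
    """Approximate sup |<Ax, x>| over unit vectors x."""
    best = 0.0
    for x in unit_vectors(samples):
        Ax = apply(A, x)
        best = max(best, abs(Ax[0] * x[0] + Ax[1] * x[1]))
    return best


def operator_norm(A, samples=3600):
    """Approximate sup ||Ax|| over unit vectors x."""
    return max(math.hypot(*apply(A, x)) for x in unit_vectors(samples))


symmetric = [[2.0, 1.0], [1.0, -1.0]]   # self-adjoint example
shift = [[0.0, 1.0], [0.0, 0.0]]        # not self-adjoint

sym_form, sym_norm = quadratic_form_sup(symmetric), operator_norm(symmetric)
shift_form, shift_norm = quadratic_form_sup(shift), operator_norm(shift)
print(sym_form, sym_norm)      # agree up to sampling error
print(shift_form, shift_norm)  # the quadratic form only sees half the norm
```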



We are now ready to prove the existence of an eigenvector.

Lemma 6.32 (Main step: finding the first eigenvector). Let $A$ be a compact self-adjoint operator on a non-trivial Hilbert space. Then either $\|A\|$ or $-\|A\|$ is an eigenvalue of $A$.

Proof. If $\|A\| = 0$ then $A = 0$ and there is nothing to prove, so we may assume that $\|A\| > 0$. By Lemma 6.31 there exists a scalar $\alpha$ with $|\alpha| = \|A\|$ and a sequence $(x_n)$ in $H$ with $\|x_n\| = 1$ for all $n \ge 1$ and with $\langle Ax_n, x_n\rangle \to \alpha$ as $n \to \infty$. As remarked before the proof of Lemma 6.31, $\langle Ax_n, x_n\rangle$ is real and so $\alpha \in \{\|A\|, -\|A\|\}$. Now notice that
$$\begin{aligned}
0 \le \|Ax_n - \alpha x_n\|^2 &= \|Ax_n\|^2 - 2\Re\bigl(\alpha\langle Ax_n, x_n\rangle\bigr) + \alpha^2\|x_n\|^2 \\
&= \|Ax_n\|^2 - 2\alpha\langle Ax_n, x_n\rangle + \alpha^2 \\
&\le 2\|A\|^2 - 2\alpha\langle Ax_n, x_n\rangle \longrightarrow 2\|A\|^2 - 2\|A\|^2 = 0
\end{aligned}$$

as $n \to \infty$. In particular, this shows that $(Ax_n)$ converges if and only if $(\alpha x_n)$ converges, and that the limits agree if this is the case. However, since $\|x_n\| = 1$ and $A$ is a compact operator there exists a subsequence $(x_{n_k})$ for which $Ax_{n_k}$ converges, say
$$Ax_{n_k} \longrightarrow \alpha x \tag{6.9}$$
as $k \to \infty$ for some $x \in H$. Therefore, $\alpha x_{n_k} \to \alpha x$ as $k \to \infty$ as well, and hence $x_{n_k} \to x$ as $k \to \infty$. Since $A$ is continuous, we deduce that $Ax_{n_k} \to Ax$ as $k \to \infty$. Together with (6.9) we have $Ax = \alpha x$, and since $\|x_{n_k}\| = 1$ for all $k \ge 1$ and $x_{n_k} \to x$ as $k \to \infty$ we also have $\|x\| = 1$ and hence $x \neq 0$. □

Now we combine the arguments above to prove the spectral theorem for compact self-adjoint operators.

Proof of Theorem 6.27. By assumption, $H$ is an infinite-dimensional Hilbert space and $A : H \to H$ is a compact self-adjoint operator. By Lemma 6.32 there exists an eigenvector $e_1$ with eigenvalue $\lambda_1 \in \mathbb{R}$, and with $|\lambda_1| = \|A\|$. We may assume without loss of generality that $\|e_1\| = 1$. Suppose now, for the purposes of an induction argument, that we have already found orthonormal eigenvectors $e_1, \ldots, e_n$ with corresponding eigenvalues $\lambda_1, \ldots, \lambda_n$. Let $V_n = \langle e_1, \ldots, e_n\rangle$ be the linear span of these vectors, and notice that $A(V_n) \subseteq V_n$ since they are eigenvectors for $A$. By Lemma 6.30 we have $A^*(V_n^\perp) \subseteq V_n^\perp$, but since $A^* = A$ this means that $A(V_n^\perp) \subseteq V_n^\perp$. Write
$$A_n = A|_{V_n^\perp} : V_n^\perp \longrightarrow V_n^\perp$$
for the restriction of $A$ to $V_n^\perp$. Then $A_n$ is a compact operator because $A$ is compact, and is self-adjoint because $A$ is self-adjoint.† Therefore, we may apply Lemma 6.32 again to the operator $A_n : V_n^\perp \to V_n^\perp$ to find another eigenvector $e_{n+1}$ orthogonal to $e_1, \ldots, e_n$ with eigenvalue $\lambda_{n+1}$ satisfying $|\lambda_{n+1}| = \|A_n\|$, and $\|e_{n+1}\| = 1$.

Repeating the argument, we find an orthonormal sequence $(e_n)$ of eigenvectors with $Ae_n = \lambda_n e_n$ and $\lambda_n \in \mathbb{R}$. We need to show that $\lambda_n \to 0$ as $n \to \infty$. By construction we have
$$|\lambda_{n+1}| = \|A_n\| = \|A|_{V_n^\perp}\| \le \|A|_{V_{n-1}^\perp}\| = \|A_{n-1}\| = |\lambda_n|,$$
so that by induction we have
$$|\lambda_1| \ge |\lambda_2| \ge \cdots. \tag{6.10}$$

If $\lambda_n \not\to 0$ as $n \to \infty$, then there is some $\varepsilon > 0$ such that $|\lambda_n| \ge \varepsilon$ for all $n \ge 1$ by (6.10). This shows that
$$\varepsilon e_n = A\bigl(\tfrac{\varepsilon}{\lambda_n} e_n\bigr) \in A(\overline{B_1})$$
for all $n \ge 1$, and since $e_n \perp e_m$ for $n \neq m$ we must have $\|\varepsilon e_n - \varepsilon e_m\| = \varepsilon\sqrt{2}$ for $n \neq m$. This shows that the sequence $(\varepsilon e_n)$ lies in $\overline{A(B_1)}$ (which is compact because $A$ is a compact operator) but cannot have a convergent subsequence, which is a contradiction.

Also, since $|\lambda_{n+1}| = \|A|_{V_n^\perp}\| \ge \|A|_{V^\perp}\|$ for all $n$, where $V = \langle e_1, e_2, \ldots\rangle$, we see that $A|_{V^\perp} = 0$.

Thus far we have not used the assumption that $H$ is separable, and the statement at the end of the last paragraph is the general result. Assuming now that $H$ is separable, we can choose an orthonormal basis of $V^\perp$ (which might be zero, in which case the theorem is already proved, or might be finite-dimensional). Listing this orthonormal basis of $V^\perp$ together with the basis of $V$ already constructed proves the theorem. □

† If $w_1, w_2 \in V_n^\perp$ then $\langle A_n^* w_1, w_2\rangle = \langle w_1, A_n w_2\rangle = \langle A^* w_1, w_2\rangle = \langle Aw_1, w_2\rangle$ and $Aw_1$ lies in $V_n^\perp$, so we have $A_n^* = A|_{V_n^\perp} = A_n$.

Exercise 6.33. Let $H$ be a separable Hilbert space, and let $A_1, A_2, \ldots$ be a sequence of commuting self-adjoint bounded operators on $H$. Using Theorem 6.27, state and prove a simultaneous spectral theorem for the sequence assuming either of the properties below:
(1) $A_n$ is compact for all $n \ge 1$; or
(2) $A_1$ is compact and $\ker(A_1) = \{0\}$.
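The inductive construction in the proof of Theorem 6.27 (find an eigenvector with $|\lambda| = \|A\|$, then restrict to the orthogonal complement) can be imitated for a small symmetric matrix. The following is a hedged sketch rather than the method of the text: it replaces the compactness argument of Lemma 6.32 by power iteration, which is an assumption that only works when the eigenvalue of largest absolute value on each complement is unique, and all function names are hypothetical.

```python
import math


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def project_out(x, basis):
    """Orthogonal projection of x onto the complement of span(basis)."""
    for e in basis:
        coeff = dot(x, e)
        x = [xi - coeff * ei for xi, ei in zip(x, e)]
    return x


def normalize(x):
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def deflated_eigenpairs(A, start, steps=200):
    """Find eigenpairs one at a time, each on the complement of the previous ones."""
    basis, eigenvalues = [], []
    for _stage in range(len(A)):
        x = normalize(project_out(start, basis))
        for _step in range(steps):
            # Power iteration for the restriction of A to the complement V_n^perp.
            x = normalize(project_out(matvec(A, x), basis))
        eigenvalues.append(dot(matvec(A, x), x))  # Rayleigh quotient <Ax, x>
        basis.append(x)
    return eigenvalues, basis


A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, -0.5]]  # eigenvalues 3, 1, -0.5
eigs, vecs = deflated_eigenpairs(A, start=[1.0, 0.5, 0.25])
print(eigs)
```

As in (6.10), the absolute values of the eigenvalues found in this way are non-increasing.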

6.2.4 Variational Characterization of Eigenvalues †

† The material of this subsection motivates some later arguments in this chapter but is strictly speaking not necessary.

In the following we let $A$ be a compact self-adjoint operator on a separable infinite-dimensional Hilbert space $H$ (or a Hermitian matrix in $\operatorname{Mat}_{n,n}(\mathbb{C})$). Applying Theorem 6.27 we find a (finite or countable) sequence of positive eigenvalues $\varphi_1(A) \ge \varphi_2(A) \ge \cdots > 0$ and a (finite or countable) sequence of negative eigenvalues $\nu_1(A) \le \nu_2(A) \le \cdots < 0$, with corresponding orthonormal eigenvectors $v_1, v_2, \ldots$ and $w_1, w_2, \ldots$, respectively, so that
$$A = \sum_j \varphi_j(A)\, v_j \otimes v_j^* + \sum_j \nu_j(A)\, w_j \otimes w_j^*,$$
where we define $v^*(w) = \langle w, v\rangle$ and $u \otimes v^*(w) = v^*(w)u$ for $u, v, w \in H$. This decomposition is known as the spectral resolution of $A$.

In many situations it is useful to be able to say something about the eigenvalues of $A + B$ (even for Hermitian matrices $A$ and $B$) in terms of the eigenvalues of $A$ and of $B$, a fundamentally non-linear problem. The variational approach to finding eigenvalues dates back to Cauchy’s interlacing theorem [17] (see Exercise 6.36). There are three elementary observations that can be made in this direction.

• Assuming that $H$ is infinite-dimensional the spectral resolution shows that the numerical range $\{\langle Av, v\rangle \mid \|v\| = 1\}$ coincides with the real interval $[\nu_1, \varphi_1]$ unless there are no negative or positive eigenvalues. In the former case we obtain the numerical range $(0, \varphi_1]$ or $[0, \varphi_1]$ (and hence set $\nu_1 = 0$) and in the latter case we obtain $[\nu_1, 0)$ or $[\nu_1, 0]$ (and hence set $\varphi_1 = 0$). In particular,
$$\varphi_1(A) = \sup_{\|v\|=1} \langle Av, v\rangle, \tag{6.11}$$
where the supremum is achieved if $\varphi_1 > 0$, and
$$\nu_1(A) = \inf_{\|v\|=1} \langle Av, v\rangle, \tag{6.12}$$
which is again achieved if $\nu_1 < 0$. Thus $\varphi_1(A + B) \le \varphi_1(A) + \varphi_1(B)$ and $\nu_1(A + B) \ge \nu_1(A) + \nu_1(B)$.
• In the case of a Hermitian matrix $A \in \operatorname{Mat}_{n,n}(\mathbb{C})$, we do not have to distinguish between positive and negative eigenvalues and may simply write $\lambda_1(A) \le \cdots \le \lambda_n(A)$ for its eigenvalues. Viewing the functions $\lambda_j$ on the linear space of Hermitian matrices the above applies (again without giving $0$ a special role) as well, and we see that $\lambda_n$ is a convex function and $\lambda_1$ a concave one.
• Also note that the trace map $\operatorname{tr} : \operatorname{Mat}_{n,n}(\mathbb{C}) \to \mathbb{C}$ is linear. Hence
$$\lambda_1(A + B) + \cdots + \lambda_n(A + B) = \lambda_1(A) + \cdots + \lambda_n(A) + \lambda_1(B) + \cdots + \lambda_n(B) \tag{6.13}$$
for any Hermitian $A, B \in \operatorname{Mat}_{n,n}(\mathbb{C})$.

More detailed assertions about the relationships between the eigenvalues of Hermitian matrices are the subject of Horn’s conjecture.(16) We will not go into the details of this, but state as exercises some special cases which can be proved with elementary methods and which are widely used in other parts of mathematics. The first of these is the min-max principle.

Exercise 6.34. Generalize the identities (6.11) and (6.12) by proving the Courant–Fischer–Weyl theorem or min-max principle for compact self-adjoint operators as follows. Fix some $k \ge 1$ and in the following let $V$ vary over all $k$-dimensional subspaces of an infinite-dimensional separable Hilbert space $H$. Show that
$$\inf_{V} \max_{v \in V,\, \|v\| = 1} \langle Av, v\rangle = \nu_k(A),$$
where we set $\nu_k(A) = 0$ if there are fewer than $k$ negative eigenvalues. Similarly
$$\sup_{V} \min_{v \in V,\, \|v\| = 1} \langle Av, v\rangle = \varphi_k(A), \tag{6.14}$$
where we set $\varphi_k(A) = 0$ if there are fewer than $k$ positive eigenvalues. Formulate and prove the result also for Hermitian matrices.

Exercise 6.35. Deduce from Exercise 6.34 the Weyl monotonicity principle(17) as follows. For compact self-adjoint operators $A$ and $B$ write $A \le B$ if $\langle Av, v\rangle \le \langle Bv, v\rangle$ for all $v$. Show that if $A \le B$ then $\nu_j(A) \le \nu_j(B)$ and $\varphi_j(A) \le \varphi_j(B)$ (where we set $\nu_j = 0$ and $\varphi_j = 0$ if there are not sufficient eigenvalues of the necessary sign) for all $j$. Formulate and prove the result also for Hermitian matrices.


Exercise 6.36. Use Exercise 6.34 to prove Cauchy’s interlacing theorem as follows. Let $A \in \operatorname{Mat}_{n,n}(\mathbb{C})$ be a Hermitian matrix. A matrix $B \in \operatorname{Mat}_{m,m}(\mathbb{C})$ with $m \le n$ is called a compression of $A$ if there is an orthogonal projection $Q$ from $\mathbb{C}^n$ onto an $m$-dimensional subspace with $QAQ^* = B$. Show that $\lambda_j(A) \le \lambda_j(B) \le \lambda_{n-m+j}(A)$ for $1 \le j \le m$ in this case.
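The elementary observations of this subsection (convexity of the largest eigenvalue, concavity of the smallest, additivity of the trace) and the interlacing inequality of Exercise 6.36 can be checked by hand for real symmetric $2 \times 2$ matrices, whose eigenvalues have a closed form. The sketch below is an illustration under that assumption; the sample matrices and the helper names are hypothetical and not taken from the text.

```python
import math


def eigenvalues_sym2(M):
    """Eigenvalues (smallest, largest) of a real symmetric 2x2 matrix."""
    a, b, c = M[0][0], M[0][1], M[1][1]
    mean = (a + c) / 2.0
    radius = math.hypot((a - c) / 2.0, b)
    return mean - radius, mean + radius


def add(M, N):
    return [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]


A = [[2.0, 1.0], [1.0, -1.0]]
B = [[0.0, -2.0], [-2.0, 3.0]]

lo_A, hi_A = eigenvalues_sym2(A)
lo_B, hi_B = eigenvalues_sym2(B)
lo_S, hi_S = eigenvalues_sym2(add(A, B))

# Largest eigenvalue is subadditive, smallest is superadditive.
convex_ok = hi_S <= hi_A + hi_B + 1e-12
concave_ok = lo_S >= lo_A + lo_B - 1e-12

# Identity (6.13): eigenvalue sums add up because the trace is linear.
trace_ok = abs((lo_S + hi_S) - (lo_A + hi_A + lo_B + hi_B)) < 1e-12

# Interlacing for n = 2, m = 1: the compression of A to the span of the
# first basis vector is the 1x1 matrix (A[0][0]).
interlacing_ok = lo_A <= A[0][0] <= hi_A

print(convex_ok, concave_ok, trace_ok, interlacing_ok)
```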

6.3 Trace-Class Operators †

† In this section we present an important class of compact operators. However, it is not needed for the further developments in this volume.

The trace is undoubtedly one of the fundamental functions on the space of matrices. Recall that for any $n \ge 1$ the trace is defined by $\operatorname{tr}(A) = \sum_{k=1}^{n} A_{kk}$ for all $A = (A_{jk}) \in \operatorname{Mat}_{n,n}(\mathbb{C})$ and that it satisfies $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ for $A$ and $B$ in $\operatorname{Mat}_{n,n}(\mathbb{C})$, so that
$$\operatorname{tr}(S^{-1}AS) = \operatorname{tr}(A) \tag{6.15}$$
for any $A \in \operatorname{Mat}_{n,n}(\mathbb{C})$ and $S \in \operatorname{GL}_n(\mathbb{C})$. The identity (6.15) means that the trace is well-defined on the space of linear maps of a finite-dimensional vector space (specifically, independent of the choice of basis). Using a Hilbert space structure on $\mathbb{C}^n$ and fixing an orthonormal basis $v_1, \ldots, v_n$ we note that $\langle Av_j, v_k\rangle$ is the coefficient of $v_k$ when expressing $Av_j$ in terms of the orthonormal basis for $j, k = 1, \ldots, n$. Hence
$$\operatorname{tr}(A) = \sum_{j=1}^{n} \langle Av_j, v_j\rangle.$$

It is desirable to extend the definition of the trace to operators on an infinite-dimensional Hilbert space $H$. However, since $\operatorname{tr}(I_n) = n$ for the identity matrix $I_n \in \operatorname{Mat}_{n,n}(\mathbb{C})$ and all $n \ge 1$, it is clear that the trace cannot have a reasonable definition on all operators on $H$, and in particular not on the identity. The following definition gives the natural domain of the trace functional.

Definition 6.37. Let $H$ be a Hilbert space. A linear operator $A : H \to H$ is called trace-class if its trace-class norm
$$\|A\|_{\mathrm{tc}} = \sup_{(v_n),(w_n)} \sum_{n=1}^{N} |\langle Av_n, w_n\rangle|$$
is finite, where the supremum is taken over all integers $N \ge 0$ and over any two finite lists of orthonormal vectors $(v_1, \ldots, v_N)$ and $(w_1, \ldots, w_N)$ of the same length $N$.


In the following we will assume that $H$ is separable and complex (once again separability is only needed to simplify the notation and some steps in the proofs, but is not crucial for the results, and as every real Hilbert space $H$ has a complexification $H_{\mathbb{C}} = H \otimes_{\mathbb{R}} \mathbb{C} = H \oplus iH$ as in Exercise 6.51, the assumption that $H$ is a complex vector space is also not a significant restriction). We list a few consequences of this definition for trace-class operators $A, B : H \to H$, which in particular justify our calling it a norm.

• If $v \in H$ is a unit vector with $Av \neq 0$ then we may set $w = \frac{1}{\|Av\|} Av$ and apply the definition with $N = 1$, $v$ and $w$ to see that
$$\|Av\| = \langle Av, w\rangle \le \|A\|_{\mathrm{tc}},$$
so $\|A\|_{\mathrm{op}} \le \|A\|_{\mathrm{tc}}$ and hence a trace-class operator is bounded.
• If $\alpha$ is a scalar, then
$$\|\alpha A\|_{\mathrm{tc}} = \sup_{(v_n),(w_n)} \sum_{n=1}^{N} |\langle \alpha Av_n, w_n\rangle| = |\alpha|\,\|A\|_{\mathrm{tc}}.$$
• The triangle inequality follows, since
$$\|A + B\|_{\mathrm{tc}} = \sup_{(v_n),(w_n)} \sum_{n=1}^{N} |\langle (A + B)v_n, w_n\rangle| \le \sup_{(v_n),(w_n)} \sum_{n=1}^{N} \bigl(|\langle Av_n, w_n\rangle| + |\langle Bv_n, w_n\rangle|\bigr) \le \|A\|_{\mathrm{tc}} + \|B\|_{\mathrm{tc}}.$$
Thus the space $\mathrm{TC}(H) = \{A \in B(H) \mid \|A\|_{\mathrm{tc}} < \infty\}$ of trace-class operators is a linear subspace of the space of bounded operators, and $\|\cdot\|_{\mathrm{tc}}$ is a norm on $\mathrm{TC}(H)$.
• Noting that a unitary operator maps an orthonormal list of vectors to an orthonormal list of vectors, we see that $\|AU\|_{\mathrm{tc}} = \|UA\|_{\mathrm{tc}} = \|A\|_{\mathrm{tc}}$ for any unitary $U : H \to H$.
• By writing a bounded operator of $H$ as a linear combination of four unitary operators (see Lemma 6.38 below) we see that if $A \in \mathrm{TC}(H)$ and $B \in B(H)$ then $AB, BA \in \mathrm{TC}(H)$.

Lemma 6.38 (Four unitary operators). Any bounded operator on a separable complex Hilbert space may be written as a linear combination of four unitary operators.

This will be shown using the spectral theory of self-adjoint operators in Section 12.4.2 (after Corollary 12.45) and will be used here as a black box.

Theorem 6.39 (Trace functional). Let $H$ be a separable complex Hilbert space. Then there exists a linear functional $\operatorname{tr} : \mathrm{TC}(H) \to \mathbb{C}$ with the following properties:


(1) $|\operatorname{tr}(A)| \le \|A\|_{\mathrm{tc}}$,
(2) $\operatorname{tr}(A) = \operatorname{tr}(U^{-1}AU)$, and
(3) $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
for all $A \in \mathrm{TC}(H)$, $B \in B(H)$ and unitary $U \in B(H)$. Moreover,
(4) $\operatorname{tr}(A) = \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$
for any $A \in \mathrm{TC}(H)$ and orthonormal basis $(v_n)$ of $H$.

Proof of Theorem 6.39: assuming independence. Using $w_n = v_n$ for all $n \ge 1$ in the definition of $\|A\|_{\mathrm{tc}}$ shows that
$$\sum_{n=1}^{\infty} |\langle Av_n, v_n\rangle| \le \|A\|_{\mathrm{tc}},$$
so the right-hand side of (4) converges absolutely and gives a definition of the linear functional $\operatorname{tr}$ satisfying (1). The difficult part of the theorem is to show that in (4) the right-hand side is independent of the choice of the orthonormal basis. Assuming this for now, property (2) follows at once as the operation sending $A$ to $U^{-1}AU$ corresponds to choosing the orthonormal basis $(Uv_n)$ in place of $(v_n)$. In particular, $\operatorname{tr}(AU) = \operatorname{tr}(UA)$ for all $A \in \mathrm{TC}(H)$, and by applying Lemma 6.38 we can write every bounded operator $B$ as a linear combination of four unitary operators and deduce by linearity of the trace that (3) holds as well. □

For the proof that the right-hand side of (4) is independent of the choice of the orthonormal basis two lemmas are needed.

Lemma 6.40 (Orthonormal approximations). Let $(v_n)$ be an orthonormal basis of a Hilbert space $H$ and let $w_1, \ldots, w_m$ be orthonormal vectors. Then for every $\varepsilon > 0$ there exists some $N \ge 1$ and orthonormal vectors $w_1', \ldots, w_m' \in \langle v_1, \ldots, v_N\rangle$ satisfying $\|w_j - w_j'\| < \varepsilon$ for $j = 1, \ldots, m$.

Proof. Let $\pi_N : H \to \langle v_1, \ldots, v_N\rangle$ be the orthogonal projection. By the properties of an orthonormal basis (Proposition 3.36) we have $\pi_N(w) \to w$ as $N \to \infty$ for any $w \in H$. Applying this projection to $w_1, \ldots, w_m$ gives
$$w_j'' = \pi_N(w_j) \longrightarrow w_j \tag{6.16}$$

as N → ∞. For every N we now apply the Gram–Schmidt procedure to obtain


$$\begin{aligned}
w_1' &= c_1 w_1'' \\
w_2' &= c_2\bigl(w_2'' - \langle w_2'', w_1'\rangle w_1'\bigr) \\
w_3' &= c_3\bigl(w_3'' - \langle w_3'', w_1'\rangle w_1' - \langle w_3'', w_2'\rangle w_2'\bigr) \\
&\;\;\vdots \\
w_m' &= c_m\bigl(w_m'' - \langle w_m'', w_1'\rangle w_1' - \cdots - \langle w_m'', w_{m-1}'\rangle w_{m-1}'\bigr),
\end{aligned}$$
where the constants $c_1, \ldots, c_m > 0$ are chosen to normalize the vectors to have unit length. As $w_1, \ldots, w_m$ are orthogonal and due to (6.16), a simple induction on $j = 1, \ldots, m$ shows that $c_j$ exists for all large enough $N$, that $c_j \to 1$ and also $w_j' \to w_j$ as $N \to \infty$. □

Lemma 6.41 (Tail estimate). Let $H$ be a Hilbert space, let $A \in \mathrm{TC}(H)$ and let $(v_n)$ be an orthonormal basis of $H$. Then for every $\varepsilon > 0$ there exists some $N$ such that for every extension $v_{N+1}', v_{N+2}', \ldots$ of $v_1, \ldots, v_N$ to an orthonormal basis of $H$ we have
$$\sum_{n=N+1}^{\infty} |\langle Av_n', v_n'\rangle| \le \varepsilon.$$

Proof. Fix some $\varepsilon > 0$ and some $N \ge 0$ and suppose the claim in the lemma does not hold for $N$. Then there exist orthonormal vectors $w_1, \ldots, w_m$ in $\langle v_1, \ldots, v_N\rangle^\perp$ such that
$$\sum_{k=1}^{m} |\langle Aw_k, w_k\rangle| > \varepsilon. \tag{6.17}$$
Now apply Lemma 6.40 to the orthonormal vectors $w_1, \ldots, w_m$ and the orthonormal basis $v_{N+1}, v_{N+2}, \ldots$ of the Hilbert space $\langle v_1, \ldots, v_N\rangle^\perp$ to find some $N' > N$ large enough and a very good orthonormal approximation $w_1', \ldots, w_m'$ to $w_1, \ldots, w_m$ inside $\langle v_{N+1}, v_{N+2}, \ldots, v_{N'}\rangle$. In particular, we may suppose that (6.17) also holds for $w_1', \ldots, w_m'$.

We now apply the argument above infinitely often to achieve a contradiction to the hypothesis that $A \in \mathrm{TC}(H)$. Indeed, set $N_0 = 0$ to find some $N_1 > N_0$ and orthonormal vectors $w_{1,1}, \ldots, w_{1,m_1}$ in $\langle v_1, \ldots, v_{N_1}\rangle$ so that (6.17) also holds for $w_{1,1}, \ldots, w_{1,m_1}$. Assuming we have already found $N_0 < N_1 < \cdots < N_\ell$ and orthonormal vectors $w_{j,1}, \ldots, w_{j,m_j}$ in $\langle v_{N_{j-1}+1}, \ldots, v_{N_j}\rangle$ with the same estimate for all $j = 1, \ldots, \ell$, we may apply the same argument to find $w_{\ell+1,1}, \ldots, w_{\ell+1,m_{\ell+1}}$ in $\langle v_{N_\ell+1}, \ldots, v_{N_{\ell+1}}\rangle$ with the same properties. However, the bound $\ell\varepsilon$
0 was arbitrary the claimed independence follows. □

As explained directly after the theorem, Theorem 6.39 follows from the independence and the black box Lemma 6.38.

Proposition 6.42 (Compactness). Every trace-class operator on a complex Hilbert space $H$ is compact.

Before proving this, notice the following property of the trace-class norm. As the supremum is taken over $(v_n)_{n=1,\ldots,N}$ and $(w_n)_{n=1,\ldots,N}$ separately, we could multiply each $w_n$ by an appropriate scalar $\alpha_n$ with $|\alpha_n| = 1$ to ensure that $\langle Av_n, \alpha_n w_n\rangle \ge 0$. Therefore we may also write
$$\|A\|_{\mathrm{tc}} = \sup_{(v_n),(w_n)} \Bigl|\sum_{n=1}^{N} \langle Av_n, w_n\rangle\Bigr|,$$
and if $A \in \mathrm{TC}(H)$ then for every $\varepsilon > 0$ there exist finite orthonormal lists $(v_n)_{n=1,\ldots,N}$ and $(w_n)_{n=1,\ldots,N}$ with
$$\sum_{n=1}^{N} \langle Av_n, w_n\rangle > \|A\|_{\mathrm{tc}} - \varepsilon.$$
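For matrices the supremum in Definition 6.37 can be explored by brute force. The sketch below is an illustration under stated assumptions, not a construction from the text: it takes a real $2 \times 2$ matrix, searches only over real orthonormal bases (parametrized by two angles), and compares the result with the sum of the singular values, which is the value the trace-class norm takes for matrices (a standard fact that is assumed here rather than proved). The helper names are hypothetical.

```python
import math


def nuclear_norm_2x2(A):
    """Sum of singular values: (s1 + s2)^2 = ||A||_F^2 + 2|det A|."""
    frob_sq = sum(entry * entry for row in A for entry in row)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return math.sqrt(frob_sq + 2.0 * abs(det))


def operator_norm_2x2(A):
    """Largest singular value, from s1^2 + s2^2 and (s1*s2)^2."""
    frob_sq = sum(entry * entry for row in A for entry in row)
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return math.sqrt((frob_sq + math.sqrt(frob_sq ** 2 - 4.0 * det ** 2)) / 2.0)


def sup_over_real_bases(A, steps=360):
    """Grid search of sum |<A v_n, w_n>| over pairs of real orthonormal bases."""
    angles = [math.pi * i / steps for i in range(steps)]
    table = [(math.cos(t), math.sin(t)) for t in angles]
    best = 0.0
    for ct, st in table:
        # v1 = (ct, st), v2 = (-st, ct)
        Av1 = (A[0][0] * ct + A[0][1] * st, A[1][0] * ct + A[1][1] * st)
        Av2 = (-A[0][0] * st + A[0][1] * ct, -A[1][0] * st + A[1][1] * ct)
        for cs, ss in table:
            # w1 = (cs, ss), w2 = (-ss, cs)
            total = abs(Av1[0] * cs + Av1[1] * ss) + abs(-Av2[0] * ss + Av2[1] * cs)
            best = max(best, total)
    return best


A = [[3.0, 0.0], [4.0, 5.0]]   # singular values 3*sqrt(5) and sqrt(5)
tc = nuclear_norm_2x2(A)
op = operator_norm_2x2(A)
approx = sup_over_real_bases(A)
print(op, tc, approx)
```

The grid search never exceeds the sum of the singular values and comes close to it, and the operator norm is bounded by the trace-class norm, in line with the first consequence listed after Definition 6.37.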

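In the proof of Proposition 6.42 we will approximate a trace-class operator by operators with finite-dimensional range and then apply Lemma 6.7. In order to do this, it will be useful to understand the behaviour of the trace-class norm for matrices. For this we endow $\mathbb{C}^n$ and $\mathbb{C}^{n+1}$ with the standard inner product, and identify $\mathbb{C}^n$ with $\mathbb{C}^n \times \{0\} \subseteq \mathbb{C}^{n+1}$.

Lemma 6.43 (Trace-class norm for matrices). Let us assume that a matrix $A_0 \in \operatorname{Mat}_{n,n}(\mathbb{C})$, the vectors $b, c \in \mathbb{C}^n$, and the scalar $d \in \mathbb{C}$ together define
$$A_1 = \begin{pmatrix} A_0 & b \\ c^t & d \end{pmatrix} \in \operatorname{Mat}_{n+1,n+1}(\mathbb{C})$$
and satisfy $\|A_0\|_{\mathrm{tc}} \ge (1 - \varepsilon^2)\|A_1\|_{\mathrm{tc}}$ for some $\varepsilon \in (0, 1)$. Then
$$\left\| \begin{pmatrix} b \\ d \end{pmatrix} \right\| \le \sqrt{5}\,\varepsilon\,\|A_1\|_{\mathrm{tc}}.$$

Proof. From the fact that $e_{n+1} \perp \mathbb{C}^n$ (by the identification between $\mathbb{C}^n$ and $\mathbb{C}^n \times \{0\}$) and the definition of the trace-class norms, we have $\|A_0\|_{\mathrm{tc}} + |d| \le \|A_1\|_{\mathrm{tc}}$ and so
$$|d| \le \|A_1\|_{\mathrm{tc}} - \|A_0\|_{\mathrm{tc}} \le \varepsilon^2\|A_1\|_{\mathrm{tc}} \le \varepsilon\|A_1\|_{\mathrm{tc}} \tag{6.18}$$
by the hypotheses. The lemma will follow from Pythagoras’ theorem after we have shown the more delicate estimate
$$\|b\| \le 2\varepsilon\|A_1\|_{\mathrm{tc}} \tag{6.19}$$
for the vector $b$. To highlight the main point in the argument for (6.19) let us treat the case $n = 1$ first. In that case $A_0 = a, b, c, d \in \mathbb{C}$. If $\theta \in \mathbb{C}$ has $|\theta| = 1$ then
$$v = \begin{pmatrix} \sqrt{1 - \varepsilon^2} \\ \varepsilon\theta \end{pmatrix} \quad\text{and}\quad w = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$
are both unit vectors, and we may apply the definition of the trace-class norm $\|A_1\|_{\mathrm{tc}}$ to just these two vectors and obtain
$$|\langle A_1 v, w\rangle| = \bigl|a\sqrt{1 - \varepsilon^2} + b\varepsilon\theta\bigr| \le \|A_1\|_{\mathrm{tc}}.$$
By choosing the argument of $\theta$ correctly this gives
$$|a|\sqrt{1 - \varepsilon^2} + |b|\varepsilon \le \|A_1\|_{\mathrm{tc}}, \tag{6.20}$$
and the assumption of the lemma implies that
$$\|A_0\|_{\mathrm{tc}}\sqrt{1 - \varepsilon^2} \ge (1 - \varepsilon^2)^{3/2}\|A_1\|_{\mathrm{tc}} \ge (1 - \varepsilon^2)^{2}\|A_1\|_{\mathrm{tc}} \ge (1 - 2\varepsilon^2)\|A_1\|_{\mathrm{tc}}. \tag{6.21}$$
Combining (6.20) and (6.21) with $\|A_0\|_{\mathrm{tc}} = |a|$ gives
$$(1 - 2\varepsilon^2)\|A_1\|_{\mathrm{tc}} + |b|\varepsilon \le \|A_1\|_{\mathrm{tc}},$$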

which is equivalent to (6.19). The idea of the proof of (6.19) in the general case is similar. However, since $b$ is a vector for $n \ge 2$, some additional preparations are needed. By compactness of the closed and bounded subset
$$\mathrm{U}_n(\mathbb{R}) = \{A \in \operatorname{Mat}_{n,n}(\mathbb{C}) \mid A^*A = I\} \subseteq \operatorname{Mat}_{n,n}(\mathbb{C}) \cong \mathbb{R}^{2n^2}$$
and the comment after the statement of Proposition 6.42 above, there exist orthonormal bases $v_1, \ldots, v_n \in \mathbb{C}^n$ and $w_1, \ldots, w_n \in \mathbb{C}^n$ with
$$\|A_0\|_{\mathrm{tc}} = \sum_{j=1}^{n} \langle A_0 v_j, w_j\rangle.$$
We define a unitary matrix $U \in \operatorname{Mat}_{n,n}(\mathbb{C})$ by requiring that $U^* v_j = w_j$ for all $j = 1, \ldots, n$, so that
$$\|A_0\|_{\mathrm{tc}} = \sum_{j=1}^{n} \langle A_0 v_j, U^* v_j\rangle = \sum_{j=1}^{n} \langle UA_0 v_j, v_j\rangle = \operatorname{tr}(UA_0). \tag{6.22}$$
Since we have now expressed $\|A_0\|_{\mathrm{tc}}$ as the trace of $UA_0$, independence of the trace on the choice of orthonormal basis implies that (6.22) holds for an arbitrary orthonormal basis $v_1', \ldots, v_n'$ of $\mathbb{C}^n$. In particular, from the equality and the definition of the trace-class norm we deduce that
$$\langle UA_0 v_j', v_j'\rangle \ge 0 \tag{6.23}$$


for any choice of orthonormal basis of $\mathbb{C}^n$ and $j = 1, \ldots, n$. Now let the orthonormal basis $v_1', \ldots, v_n'$ of $\mathbb{C}^n$ be chosen so that $Ub = \|b\| v_n'$. We extend $U$ to a unitary operator on $\mathbb{C}^{n+1}$ by setting $Ue_{n+1} = e_{n+1}$, and note that
$$UA_1 e_{n+1} = U(b + de_{n+1}) = \|b\| v_n' + de_{n+1}$$
by definition of $A_1$ and the choice of orthonormal basis. We now consider the two orthonormal lists
$$v_1', \ldots, v_{n-1}', \sqrt{1 - \varepsilon^2}\, v_n' + \varepsilon e_{n+1}$$
and
$$U^{-1}v_1', \ldots, U^{-1}v_{n-1}', U^{-1}v_n'.$$
Using the definition of the trace-class norm $\|A_1\|_{\mathrm{tc}}$ we get
$$\|A_1\|_{\mathrm{tc}} \ge \sum_{j=1}^{n-1} \langle UA_1 v_j', v_j'\rangle + \sqrt{1 - \varepsilon^2}\,\langle UA_1 v_n', v_n'\rangle + \varepsilon\|b\|\langle v_n', v_n'\rangle \ge \sqrt{1 - \varepsilon^2}\,\|A_0\|_{\mathrm{tc}} + \varepsilon\|b\|,$$
where we have used (6.23) and (6.22) in the last step. This is the analogue to (6.20) with $|a|$ replaced by $\|A_0\|_{\mathrm{tc}}$. Together with (6.21) we obtain (6.19) in the general case. □

Proof of Proposition 6.42. Let $A$ be a trace-class operator on $H$. We will construct, for every $\varepsilon > 0$, an operator $A_\varepsilon : H \to H$ with finite-dimensional range and with
$$\|A - A_\varepsilon\|_{\mathrm{op}} \ll \varepsilon\|A\|_{\mathrm{tc}}. \tag{6.24}$$
This will imply the proposition by Lemma 6.7. To construct $A_\varepsilon$ we choose orthonormal vectors $v_1, \ldots, v_n$ and $w_1, \ldots, w_n$ in $H$ with
$$\sum_{k=1}^{n} \langle Av_k, w_k\rangle > (1 - \varepsilon^2)\|A\|_{\mathrm{tc}}$$
as in the definition of $\|\cdot\|_{\mathrm{tc}}$ and using the comment after Proposition 6.42. We define $A_\varepsilon$ by setting it equal to $A$ on $\langle v_1, \ldots, v_n\rangle$ and to $0$ on $\langle v_1, \ldots, v_n\rangle^\perp$. For any vector $v' \in \langle v_1, \ldots, v_n\rangle$ and $v'' \in \langle v_1, \ldots, v_n\rangle^\perp$ we have $(A - A_\varepsilon)(v' + v'') = Av''$, and we claim that
$$\|Av''\| \le \sqrt{5}\,\varepsilon\,\|A\|_{\mathrm{tc}}\|v''\|, \tag{6.25}$$
which will imply (6.24). To prove (6.25) we may assume that $v'' = v_{n+1} \in \langle v_1, \ldots, v_n\rangle^\perp$ is a unit vector. We write
$$Av_{n+1} = \sum_{j=1}^{n} b_j w_j + dw_{n+1}$$
for $b \in \mathbb{C}^n$, $d \in \mathbb{C}$, and $w_{n+1} \in \langle w_1, \ldots, w_n\rangle^\perp$ a unit vector. We apply Lemma 6.43 to the matrix $A_1 \in \operatorname{Mat}_{n+1,n+1}(\mathbb{C})$ defined by $(A_1)_{jk} = \langle Av_k, w_j\rangle$ for $j, k = 1, \ldots, n + 1$. Then
$$A_1 = \begin{pmatrix} A_0 & b \\ c^t & d \end{pmatrix} \in \operatorname{Mat}_{n+1,n+1}(\mathbb{C})$$
for some matrix $A_0 \in \operatorname{Mat}_{n,n}(\mathbb{C})$ and $c \in \mathbb{C}^n$. By choice of the orthonormal lists $(v_n)$ and $(w_n)$ we have
$$(1 - \varepsilon^2)\|A\|_{\mathrm{tc}} \le \sum_{j=1}^{n} \langle Av_j, w_j\rangle = \operatorname{tr} A_0 \le \|A_0\|_{\mathrm{tc}}. \tag{6.26}$$
Since a different choice of an orthonormal basis of $\mathbb{C}^{n+1}$ corresponds to a different choice of an orthonormal basis of $\langle v_1, \ldots, v_{n+1}\rangle$ or of $\langle w_1, \ldots, w_{n+1}\rangle$, it follows that $\|A_1\|_{\mathrm{tc}} \le \|A\|_{\mathrm{tc}}$. Arguing a bit more carefully we may find, again by compactness, some unitary matrix $U \in \operatorname{Mat}_{n+1,n+1}(\mathbb{C})$ with
$$\|A_1\|_{\mathrm{tc}} = \operatorname{tr}(U^*A_1) = \sum_{j=1}^{n+1} \langle U^*A_1 e_j, e_j\rangle.$$
Since $\langle U^*A_1 e_j, e_j\rangle = \bigl\langle Av_j, \sum_{k=1}^{n+1} u_{kj} w_k\bigr\rangle$ and the vectors $\sum_{k=1}^{n+1} u_{kj} w_k \in H$ are orthonormal for $j = 1, \ldots, n + 1$, the inequality $\|A_1\|_{\mathrm{tc}} \le \|A\|_{\mathrm{tc}}$ follows. Combining the estimate $\|A_1\|_{\mathrm{tc}} \le \|A\|_{\mathrm{tc}}$ with (6.26) and Lemma 6.43, we get (6.25) for any $v'' \in \langle v_1, \ldots, v_n\rangle^\perp$ with $\|v''\| = 1$. It follows that $A$ is the limit of $A_\varepsilon$ defined as above as $\varepsilon \searrow 0$ (with respect to the operator norm), and so Lemma 6.7 implies the proposition. □

The results above regarding the trace and the trace-class are satisfying, but the concepts would not be important without non-trivial examples of trace-class operators. We next discuss the relationship with the class of self-adjoint (compact) operators, which gives us many examples. We say that a self-adjoint operator $A$ on a Hilbert space $H$ is positive if $\langle Av, v\rangle \ge 0$ for all $v \in H$.

Proposition 6.44. Let $H$ be a complex Hilbert space and $A$ a bounded operator on $H$. If $A$ is self-adjoint and positive and $(v_n)$ is an orthonormal basis of $H$, then $\|A\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$.


In particular, $\operatorname{tr}(A) = \|A\|_{\mathrm{tc}}$, where in the case of a positive operator $A$ with $\|A\|_{\mathrm{tc}} = \infty$ this extends our definition of the trace.

Proof of Proposition 6.44. The inequality $\|A\|_{\mathrm{tc}} \ge \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$ follows directly from the definition of the trace-class norm. For the opposite inequality we may suppose that $S = \sum_{n=1}^{\infty} \langle Av_n, v_n\rangle$ is finite and let $(x_k)$ and $(y_k)$ be two orthonormal lists of length $K$ as in the definition of the trace-class norm. We wish to show
$$\sum_{k=1}^{K} |\langle Ax_k, y_k\rangle| \le S. \tag{6.27}$$
Using Lemma 6.40 we can find some $N \ge 1$ and orthonormal approximations of $(x_k)$ and $(y_k)$ within $V = \langle v_1, \ldots, v_N\rangle$. Letting $N \to \infty$ later on, it suffices to show (6.27) for the approximations within $V$ and we will use the same letters to denote the approximations. We extend the orthonormal lists $(x_k)$ and $(y_k)$ to orthonormal bases of $V$. Using the comment after Proposition 6.42 we may adjust the $y_k$ once more and assume without loss of generality that $\langle Ax_k, y_k\rangle \ge 0$ for $k = 1, \ldots, N$ without changing the value of the left-hand side in (6.27). We also define a unitary operator $U : V \to V$ satisfying $U^* x_k = y_k$ for $k = 1, \ldots, N$. In other words, we wish to estimate
$$\sum_{k=1}^{K} \langle Ax_k, y_k\rangle \le \sum_{k=1}^{N} \langle Ax_k, y_k\rangle = \sum_{k=1}^{N} \langle Ax_k, U^* x_k\rangle = \operatorname{tr}(UA_V), \tag{6.28}$$
where we let $A_V$ be the positive self-adjoint operator $v \in V \mapsto \pi_V(Av)$ and $\pi_V : H \to V$ is the orthogonal projection. As a trace (on the finite-dimensional space $V$) can be calculated in any basis we may also calculate $\operatorname{tr}(UA_V)$ using an orthonormal basis $v_1', \ldots, v_N'$ of $V$ consisting of eigenvectors of $A_V$. Let $\lambda_1, \ldots, \lambda_N$ in $\mathbb{R}_{\ge 0}$ be the corresponding eigenvalues. Since we have $\operatorname{tr}(UA_V) \ge 0$ by (6.28), we obtain
$$\operatorname{tr}(UA_V) = \sum_{n=1}^{N} \langle UA_V v_n', v_n'\rangle = \sum_{n=1}^{N} \lambda_n \langle Uv_n', v_n'\rangle \le \sum_{n=1}^{N} \lambda_n = \operatorname{tr}(A_V) = \sum_{n=1}^{N} \langle Av_n, v_n\rangle \le S.$$
This, together with (6.28), implies (6.27), first for the approximations of $(x_k)$ and $(y_k)$ in $V$, and then using Lemma 6.40 and letting $N \to \infty$ as indicated earlier for any two lists of orthonormal vectors in $H$. Hence $\|A\|_{\mathrm{tc}} \le S$ and the proposition follows. □

Exercise 6.45. Let $A$ be a compact operator. Show that $A$ has a polar decomposition of the form $A = QP$ where $\ker(A) = \ker(Q) = \ker(P)$, $Q|_{(\ker(A))^\perp}$ is an isometry, and $P$ is positive, self-adjoint, and compact. Show that $P$ is trace-class if and only if $A$ is.


Corollary 6.46 (Lidskiĭ’s theorem [61]). Let $H$ be a separable complex Hilbert space. A self-adjoint bounded operator $A$ on $H$ is trace-class if and only if it is compact and its eigenvalues $\lambda_n$ (allowing repetitions as in Theorem 6.27) satisfy
$$\sum_{n=1}^{\infty} |\lambda_n| = \|A\|_{\mathrm{tc}} < \infty.$$
If $A$ is indeed trace-class, then
$$\operatorname{tr}(A) = \sum_{n=1}^{\infty} \lambda_n$$
and this sum converges absolutely.

Proof. Assume first that $A$ is self-adjoint and trace-class. Then Proposition 6.42 implies that $A$ is compact. Using the orthonormal basis consisting of eigenvectors $v_n$ with eigenvalues $\lambda_n$ from Theorem 6.27 and the definition of the trace it follows that $\sum_{n=1}^{\infty} |\lambda_n| \le \|A\|_{\mathrm{tc}} < \infty$ and $\operatorname{tr}(A) = \sum_{n=1}^{\infty} \lambda_n$.

Let now $A$ be a compact self-adjoint operator and assume $\sum_{n=1}^{\infty} |\lambda_n| < \infty$, where $\lambda_n$ are the eigenvalues of $A$. Again let $v_n$ be an orthonormal basis of $H$ consisting of eigenvectors for $A$ (with corresponding eigenvalues $\lambda_n$). We let $\alpha_n \in \{\pm 1\}$ be chosen with $\lambda_n\alpha_n = |\lambda_n|$ for all $n \ge 1$. Define the positive self-adjoint operator $P$ on $H$ by setting $Pv_n = |\lambda_n| v_n$ and the unitary operator $U$ on $H$ by setting $Uv_n = \alpha_n v_n$ for all $n \ge 1$ and linearly extending both to $H$, so that $A = UP$. By Proposition 6.44 we have $\|P\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} |\lambda_n|$, and by the initial properties of the trace-class norm this gives
$$\|A\|_{\mathrm{tc}} = \|UP\|_{\mathrm{tc}} = \|P\|_{\mathrm{tc}} = \sum_{n=1}^{\infty} |\lambda_n| < \infty,$$
which implies the corollary. □


Let us indicate how the trace appears frequently in applications, and how to calculate it in these special circumstances.

Exercise 6.47. Let $\ell > d$ and $k \in H^\ell(\mathbb{T}^d \times \mathbb{T}^d)$. Show that the Hilbert–Schmidt integral operator $K(f)(x) = \int_{\mathbb{T}^d} k(x, y) f(y)\,dy$ on $L^2(\mathbb{T}^d)$ is trace-class and that the trace is given by the integral along the diagonal, that is $\operatorname{tr}(K) = \int_{\mathbb{T}^d} k(x, x)\,dx$.

Proposition 6.48. Let $X$ be a compact metric space, let $\mu$ be a finite measure on $X$, and let $k \in C(X \times X)$ be a continuous kernel with the property that the associated Hilbert–Schmidt operator $K$ defined by
$$K(f)(x) = \int_X k(x, y) f(y)\,d\mu(y)$$
is trace-class. Then
$$\operatorname{tr}(K) = \int_X k(x, x)\,d\mu(x).$$


Proof. Let $(\xi_\ell)$ be a sequence of finite measurable partitions of $X$ that become finer in the sense that
$$\max_{P \in \xi_\ell} \operatorname{diam}(P) \longrightarrow 0 \tag{6.29}$$
as $\ell \to \infty$, and assume that the sequence is refining, meaning that each element of $\xi_\ell$ is a union of elements of $\xi_{\ell+1}$ for $\ell \ge 1$. For $P \in \xi_\ell$ we also define the unit vector
$$w_P = \frac{1}{\sqrt{\mu(P)}}\,\mathbb{1}_P$$
and notice that $\{w_P \mid P \in \xi_\ell\}$ is an orthonormal basis of its linear hull $W_\ell$ for all $\ell \ge 1$ (if $\mu(P) = 0$ we do not associate a vector $w_P$ to this partition element). Since $(\xi_\ell)$ is refining, we have $W_\ell \subseteq W_{\ell+1}$ for all $\ell \ge 1$. Now define an orthonormal sequence $(v_n)$ by starting with $w_P$ for $P \in \xi_1$ (in some fixed order) and extending it via Gram–Schmidt first to an orthonormal basis $v_1, \ldots, v_{n(2)}$ of $W_2$, then to an orthonormal basis $v_1, \ldots, v_{n(3)}$ of $W_3$, and so on. It is clear that the assumption (6.29) implies that the characteristic function of every open set belongs to the closure of $\bigcup_{\ell \ge 1} W_\ell$. This in turn implies, as in the proof of Proposition 2.51, that
$$\overline{\bigcup_{\ell \ge 1} W_\ell} = L^2_\mu(X)$$
and hence that $(v_n)$ is an orthonormal basis of $L^2_\mu(X)$. Therefore
$$\operatorname{tr}(K) = \sum_{n=1}^{\infty} \langle Kv_n, v_n\rangle = \lim_{\ell \to \infty} \sum_{n=1}^{n(\ell)} \langle Kv_n, v_n\rangle = \lim_{\ell \to \infty} \sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle$$
since $\sum_{n=1}^{n(\ell)} \langle Kv_n, v_n\rangle = \operatorname{tr}(\pi_{W_\ell} K|_{W_\ell})$, where $\pi_{W_\ell}$ denotes the orthogonal projection onto $W_\ell$, and this trace can also be computed in the orthonormal basis $\{w_P \mid P \in \xi_\ell\}$. Now we may use the definition of $K$ to see that
$$\sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle = \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_P K(\mathbb{1}_P)\,d\mu = \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_{P \times P} k(x, y)\,d\mu{\times}\mu(x, y).$$
Now fix $\varepsilon > 0$ and use uniform continuity of $k$ to find an $\ell$ sufficiently large to ensure that $|k(x, y) - k(x, x)| < \varepsilon$ whenever $x, y \in P$ for some $P \in \xi_\ell$. We may also suppose that $\ell$ is large enough to have
$$\Bigl|\operatorname{tr}(K) - \sum_{P \in \xi_\ell} \langle Kw_P, w_P\rangle\Bigr| < \varepsilon.$$
Together we see that
$$\begin{aligned}
\Bigl|\operatorname{tr}(K) - \int_X k(x, x)\,d\mu(x)\Bigr| &\le \varepsilon + \Bigl|\sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_{P \times P} k(x, y)\,d\mu{\times}\mu(x, y) - \int_X k(x, x)\,d\mu(x)\Bigr| \\
&\le \varepsilon + \sum_{P \in \xi_\ell} \frac{1}{\mu(P)} \int_{P \times P} \underbrace{|k(x, y) - k(x, x)|}_{< \varepsilon}\,d\mu{\times}\mu(x, y) \le (1 + \mu(X))\,\varepsilon.
\end{aligned}$$
As $\varepsilon > 0$ was arbitrary, the proposition follows. □
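The partition sums used in the proof of Proposition 6.48 can be computed explicitly in a simple case. The sketch below is an illustration under assumptions not made in the text: it takes $X = [0, 1]$ with Lebesgue measure, the kernel $k(x, y) = \min(x, y)$ (a standard example whose operator is positive with eigenvalues $((n - \tfrac12)\pi)^{-2}$, which sum to $\tfrac12$), uniform partitions into intervals, and a midpoint quadrature rule on each cell $P \times P$. The helper names are hypothetical.

```python
def partition_trace(k, cells, quad=4):
    """Sum over P of (1/mu(P)) * integral of k over P x P, for a uniform partition.

    Each double integral is approximated by a quad x quad midpoint rule.
    """
    h = 1.0 / cells          # mu(P) for every partition element P
    step = h / quad
    total = 0.0
    for p in range(cells):
        a = p * h
        nodes = [a + (i + 0.5) * step for i in range(quad)]
        cell_integral = sum(k(x, y) for x in nodes for y in nodes) * step * step
        total += cell_integral / h
    return total


def diagonal_integral(k, points=1000):
    """Midpoint approximation of the integral of k(x, x) over [0, 1]."""
    h = 1.0 / points
    return sum(k((i + 0.5) * h, (i + 0.5) * h) for i in range(points)) * h


coarse = partition_trace(min, cells=10)
fine = partition_trace(min, cells=200)
diagonal = diagonal_integral(min)
print(coarse, fine, diagonal)
```

As the partition becomes finer, the partition sums approach the integral of the kernel along the diagonal, which is $\tfrac12$ in this example.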



Exercise 6.49. Generalize Proposition 6.48 by assuming that $X$ is a σ-compact, locally compact, metric space, $\mu$ is locally finite, and $k \in C(X \times X) \cap L^2_{\mu\times\mu}(X \times X)$ defines a trace-class integral operator.

Exercise 6.50. Let $H$ be a (separable) Hilbert space.
(a) Show that $\mathrm{TC}(H)$ is complete (and hence a Banach space) with respect to the trace-class norm.
(b) What is the closure of $\mathrm{TC}(H)$ with respect to the operator norm?

Exercise 6.51. Let $H$ be a real Hilbert space.
(a) Define $H_{\mathbb{C}} = H \otimes_{\mathbb{R}} \mathbb{C} = H \oplus iH$ and define
$$\langle a_1 + ia_2, b_1 + ib_2\rangle_{\mathbb{C}} = \langle a_1, b_1\rangle + \langle a_2, b_2\rangle + i\bigl(\langle a_2, b_1\rangle - \langle a_1, b_2\rangle\bigr).$$
Show that $\langle\cdot,\cdot\rangle_{\mathbb{C}}$ is a complex inner product making $H_{\mathbb{C}}$ into a complex Hilbert space.
(b) We used Lemma 6.38 (concerning complex Hilbert spaces) twice in this section. Use (a) to show that the results of this section also hold in the case of a real Hilbert space.

Exercise 6.52. Let $(T, \mathcal{B}, \mu)$ be a probability space, $H$ a Hilbert space, and $T \ni t \mapsto A_t \in \mathrm{TC}(H)$ a map such that $t \in T \mapsto \langle A_t v, w\rangle$ is measurable for every $v, w \in H$. Also suppose that $\int_T \|A_t\|_{\mathrm{tc}}\,d\mu(t) < \infty$. Show that $A = \int A_t\,d\mu(t)$ (defined via $v \mapsto \int A_t v\,d\mu(t)$ as in Section 3.5.4) is trace-class, and that $\operatorname{tr}(A) = \int \operatorname{tr}(A_t)\,d\mu(t)$.

The next two exercises give a tool for showing that certain Hilbert–Schmidt integral operators (as in Proposition 6.48) are trace-class.

Exercise 6.53. Let $H$ be a Hilbert space with an orthonormal basis $(e_n)$. Define the Hilbert–Schmidt norm
\[
\|A\|_{\mathrm{HS}}^2=\sum_{j,k}|\langle Ae_j,e_k\rangle|^2
\]
and the space of Hilbert–Schmidt operators $\mathrm{HS}(H)=\{A\in B(H)\mid\|A\|_{\mathrm{HS}}<\infty\}$.
(a) Show that $A\in\mathrm{HS}(H)$ if and only if $A^*\in\mathrm{HS}(H)$, and that $\|A^*\|_{\mathrm{HS}}=\|A\|_{\mathrm{HS}}$ for all $A\in\mathrm{HS}(H)$.
(b) Show that the definition of the Hilbert–Schmidt norm is independent of the choice of orthonormal basis.
(c) Show that $\mathrm{HS}(H)$ forms a two-sided ideal in $B(H)$. That is, for any $A\in\mathrm{HS}(H)$ and $B\in B(H)$ we have $AB\in\mathrm{HS}(H)$ and $BA\in\mathrm{HS}(H)$.
(d) Find an inner product on $\mathrm{HS}(H)$ which induces the norm $\|\cdot\|_{\mathrm{HS}}$, and show that $\mathrm{HS}(H)$ is a Hilbert space with this inner product.
(e) Show that $\mathrm{HS}(H)$ is also a Banach algebra, meaning that $\|AB\|_{\mathrm{HS}}\leq\|A\|_{\mathrm{HS}}\|B\|_{\mathrm{HS}}$.
(f) Show that $\mathrm{HS}(H)$ is a closed subspace of $B(H)$ if and only if $H$ is finite-dimensional.
(g) Show that every Hilbert–Schmidt operator is compact.
(h) Assume now that $H=L^2((0,1))$. For every $k\in L^2((0,1)^2)$ we define the associated Hilbert–Schmidt integral operator as in Proposition 6.11. Show that the space of Hilbert–Schmidt integral operators corresponds exactly to $\mathrm{HS}(H)$. In particular, show that for any operator $A\in\mathrm{HS}(H)$ the corresponding kernel $k_A$ is given by
\[
k_A(x,y)=\sum_{i,j}\langle Ae_i,e_j\rangle\,e_i(x)e_j(y).
\]

Exercise 6.54. (a) Show that if $A,B\in\mathrm{HS}(H)$ then $AB$ is trace-class. (b) If $C$ is trace-class then there are operators $A,B\in\mathrm{HS}(H)$ with $C=AB$.

Exercise 6.55.(18) Let $H$ be a Hilbert space with respect to the inner product $\langle\cdot,\cdot\rangle_H$, and write $\|\cdot\|_H$ for the induced norm on $H$. Let $\langle\cdot,\cdot\rangle_0$ be a semi-inner product on $H$, and write $\|\cdot\|_0$ for the induced semi-norm on $H$. Assume that $\|\cdot\|_0\leq\|\cdot\|_H$.
(a) Show that there exists a unique positive bounded self-adjoint operator $A$ such that $\langle v,w\rangle_0=\langle Av,w\rangle_H$. The relative trace of $\|\cdot\|_0$ with respect to $\|\cdot\|_H$ is defined as the trace of $A$ (which might be infinity).
(b) Let $k>\frac{d}{2}$, $H=H^k(U)$ for some open subset $U\subseteq\mathbb{R}^d$, and $\langle f,g\rangle_0=f(x)\overline{g(x)}$ for some fixed $x\in U$. Show that $A$ as in (a) has finite trace (and so $\|\cdot\|_0$ has finite relative trace with respect to $\|\cdot\|_H$).
(c) Let $\mu$ be a compactly supported measure on $U$. Combine (b) with Exercise 6.52 to show that the semi-norm $\|f\|_{L^2(\mu)}=\bigl(\int|f|^2\,d\mu\bigr)^{1/2}$ for $f\in H^k(U)$ has finite relative trace with respect to $\|\cdot\|_H$.

6.4 Eigenfunctions for the Laplace Operator

We will prove in this section the claim from Section 1.2 that for any open bounded subset $U\subseteq\mathbb{R}^d$ there is a basis of $L^2(U)$ consisting of eigenfunctions of the Laplace operator such that these functions also vanish (in the square-mean sense) at the boundary of $U$. In the proof we will first go back to the case of the $d$-dimensional torus, even though (or actually precisely because) we already have an orthonormal basis consisting of eigenfunctions of the Laplacian in this setting, namely the characters. In Section 6.4.2 we will define a right inverse of $\Delta$ defined on $L^2(U)$ for an open subset $U$ of $\mathbb{R}^d$, a setting in which we do not know the eigenfunctions of the Laplacian. Finally, we will ask in Section 6.4.4 about the growth rate of the eigenvalues and prove Weyl’s law for Jordan measurable open domains. We start by stating the main theorem, which will be proved in Section 6.4.2.

Theorem 6.56 (Existence of basis of Laplace eigenfunctions). Let $U$ be an open bounded subset of $\mathbb{R}^d$. Then there exists an orthonormal basis $\{f_n\}$ of $L^2(U)$ of functions in $H_0^1(U)$ which are smooth in $U$ and have $\Delta f_n=\lambda_nf_n$, with $\lambda_n<0$ for all $n\geq1$, and $\lambda_n\to-\infty$ as $n\to\infty$.

6.4.1 Right Inverse and Compactness on the Torus

We already used the fact that the characters on $\mathbb{T}^d$ are eigenfunctions of the Laplace operator on $\mathbb{T}^d$ in the proof of elliptic regularity (Lemma 5.48). Obtaining a compact self-adjoint right inverse to $\Delta$ is quite easy on the torus $\mathbb{T}^d$.

Exercise 6.57. Define $L_0^2(\mathbb{T}^d)=\bigl\{f\in L^2(\mathbb{T}^d)\mid\int_{\mathbb{T}^d}f\,dx=0\bigr\}$, and prove that there exists a compact self-adjoint operator $S:L_0^2(\mathbb{T}^d)\to L_0^2(\mathbb{T}^d)$ with the property that $\Delta Sf=f$ for all $f\in L_0^2(\mathbb{T}^d)$.

For the discussion on an open subset we will need the following lemma.

Lemma 6.58 (Compactness on the torus). The operator $\imath_{1,0}:H^1(\mathbb{T}^d)\to H^0(\mathbb{T}^d)=L^2(\mathbb{T}^d)$ is compact.

Proof. For the proof we define
\[
K=\bigl\{f\in L^2(\mathbb{T}^d)\mid\|f\|_2\leq1\text{ and }\partial_jf\text{ exists with }\|\partial_jf\|_2\leq1\text{ for }j=1,\dots,d\bigr\},
\]
and note that $\imath_{1,0}\bigl(B_1^{H^1(\mathbb{T}^d)}\bigr)\subseteq K$. Hence it suffices to show that $K$ is totally bounded. Let now $f=\sum_{n\in\mathbb{Z}^d}a_n\chi_n\in K$. The definition of $\partial_jf$ implies that $\langle\partial_jf,\chi_n\rangle=-\langle f,\partial_j\chi_n\rangle=2\pi\mathrm{i}n_ja_n$. Using $\|f\|_2\leq1$ and $\|\partial_jf\|_2\leq1$ for $j=1,\dots,d$ we obtain
\[
\sum_{n\in\mathbb{Z}^d}\bigl(1+\|n\|_2^2\bigr)|a_n|^2\ll1.
\]
This implies a uniformity claim for the convergence of the Fourier series of all $f\in K$. Indeed, for any $N\geq1$ we have
\[
\sum_{\|n\|_2>N}|a_n|^2=N^{-2}\sum_{\|n\|_2>N}N^2|a_n|^2\leq N^{-2}\sum_{\|n\|_2>N}\bigl(1+\|n\|_2^2\bigr)|a_n|^2\ll N^{-2},
\]
and so we see that the above tail sum goes to zero uniformly for all $f\in K$ as $N\to\infty$. To see that $K$ is totally bounded we fix some $\varepsilon>0$ and choose $N$ such that the above statement becomes
\[
\sum_{\|n\|_2>N}|a_n|^2<\varepsilon^2/4
\]
for all $f=\sum_na_n\chi_n\in K$. Next take a finite $\varepsilon/2$-dense subset of the finite-dimensional compact set
\[
\Bigl\{f=\sum_{\|n\|_2\leq N}a_n\chi_n\;\Big|\;\|f\|_2\leq1\text{ and }\|\partial_jf\|_2\leq1\text{ for }j=1,\dots,d\Bigr\}.
\]
Combining these statements shows that we have found an $\varepsilon$-dense subset of $K$. As $\varepsilon>0$ was arbitrary, $K$ is totally bounded, which implies that the closure of $\imath_{1,0}\bigl(B_1^{H^1(\mathbb{T}^d)}\bigr)$ is compact. □

Exercise 6.59. Consider the map $\imath_{k,\ell}:H^k(\mathbb{T}^d)\to H^\ell(\mathbb{T}^d)$ for $k\geq\ell\geq0$. (a) Characterize those $k$ and $\ell$ for which the map $\imath_{k,\ell}$ is compact. (b) Characterize those $k$ for which the map $\imath_{k,0}\imath_{k,0}^*$ is Hilbert–Schmidt class. (c) Characterize those $k$ for which the map $\imath_{k,0}\imath_{k,0}^*$ is trace-class.

6.4.2 A Self-Adjoint Compact Right Inverse on Open Subsets

The following provides the link between the Laplace operator and our discussion of compact self-adjoint operators in Theorem 6.27. The compactness claim is a special case of Rellich’s Theorem.

Proposition 6.60 (Self-adjoint compact right inverse). Let $U\subseteq\mathbb{R}^d$ be a bounded and open subset. Using Lemma 5.41 we equip $H_0^1(U)$ with the inner product $\langle\cdot,\cdot\rangle_1$. Then the map
\[
\imath=\imath_{1,0}:H_0^1(U)\longrightarrow H^0(U)=L^2(U)
\]
has the property that $\Delta(\imath\imath^*f)\in L^2(U)$ exists for all $f\in L^2(U)$ and equals $-f$. In other words, $\Delta\circ(-\imath\imath^*)=I$ is the identity on $L^2(U)$. Finally, $S=-\imath\imath^*$ is a compact self-adjoint operator $L^2(U)\to L^2(U)$.

Proof. Recall the map $\imath:H_0^1(U)\to H^0(U)$ sending $f\mapsto f$ from Proposition 5.13. The adjoint is a map $\imath^*:H^0(U)\to H_0^1(U)$, and so the composition $\imath\imath^*$ is indeed a map from $L^2(U)$ to $L^2(U)$. By Exercise 6.17 $\imath\imath^*$ (and hence $S$) is self-adjoint. Now let $\phi\in C_c^\infty(U)$ and $f\in L^2(U)$. Then
\begin{align*}
\langle-\imath\imath^*f,\Delta\phi\rangle_{L^2(U)}&=-\sum_j\bigl\langle\imath\imath^*f,\partial_j^2\phi\bigr\rangle_{L^2(U)}=\sum_j\langle\partial_j\imath\imath^*f,\partial_j\phi\rangle_{L^2(U)}\\
&=\langle\imath^*f,\phi\rangle_1=\langle f,\imath\phi\rangle_{L^2(U)}=\langle f,\phi\rangle_{L^2(U)}
\end{align*}
shows that $\Delta\circ(-\imath\imath^*)=I$, as claimed. It remains to show that $S$ is a compact operator. For this notice that $\imath$ is the composition of the operators
\[
H_0^1(U)\xrightarrow{\;P\;}H^1(\mathbb{T}_R^d)\xrightarrow{\;\imath_{1,0}\;}H^0(\mathbb{T}_R^d)=L^2(\mathbb{T}_R^d)\xrightarrow{\;\cdot|_U\;}L^2(U),
\]


where we choose $R$ such that $U\subseteq B_R$, $P$ is the periodizing operator from Lemma 5.36, $\imath_{1,0}$ is the operator from Proposition 5.3 that simply forgets regularity and is compact by Lemma 6.58, and finally $\cdot|_U$ is the restriction operator to $U$. We also note that we equip $H_0^1(U)$ with the norm derived from the inner product $\langle\cdot,\cdot\rangle_1$ in Lemma 5.40 but $H^1(\mathbb{T}^d)$ with the standard Sobolev norm. By Lemma 5.41 this is not an issue since the norm derived from $\langle\cdot,\cdot\rangle_1$ is equivalent to the standard Sobolev norm on $H_0^1(U)$. As $\imath$ is the composition of bounded operators and a compact operator, Lemma 6.3 applies, and it follows that $\imath$ and also $S$ are compact operators. □

Proof of Theorem 6.56. Let $S=-\imath\imath^*:L^2(U)\to L^2(U)$ be the self-adjoint compact operator from Proposition 6.60. Theorem 6.27 gives an orthonormal basis $(f_n)$ of eigenvectors with $Sf_n=\mu_nf_n$ and $\mu_n\to0$ as $n\to\infty$. By Proposition 6.60 we have $\Delta(Sf_n)=f_n$, so that $\Delta(\mu_nf_n)=f_n$. Note that the eigenvectors $f_n$ all lie in the image of $\imath$, and so belong to $H_0^1$. It follows that $\mu_n\neq0$ and $\Delta f_n=\lambda_nf_n$ with $\lambda_n=\frac{1}{\mu_n}$, and so we have $|\lambda_n|\to\infty$ as $n\to\infty$. Since $S=-\imath\imath^*$ we have
\[
\mu_n=\langle Sf_n,f_n\rangle_{L^2(U)}=-\langle\imath^*f_n,\imath^*f_n\rangle_1\leq0
\]
so that $\lambda_n\to-\infty$ as $n\to\infty$. It remains to show that $f_n$ is smooth in $U$ and $\Delta f_n=\lambda_nf_n$ for all $n\geq1$, and this is precisely the statement of Corollary 5.47. □

There are very few examples of domains $U$ for which one can write down the eigenfunctions of the Laplace operator explicitly. Important exceptions are rectangles $U=(0,a_1)\times(0,a_2)\times\cdots\times(0,a_d)$ and balls $U=B_M^{\mathbb{R}^d}$.

Exercise 6.61. (a) Let $U=(0,a_1)\times\cdots\times(0,a_d)$. Show that the functions $f_n$ arising as eigenfunctions of the Laplace operator as in Theorem 6.56 can be chosen to take the form
\[
f_n(x)=\sin\bigl(\lambda_n^{(1)}x_1\bigr)\cdots\sin\bigl(\lambda_n^{(d)}x_d\bigr).
\]
(b) Let $U=\{(x_1,x_2)\in(0,1)\times(0,1)\mid x_1+x_2<1\}$. Find an orthonormal basis of $L^2(U)$ consisting of eigenfunctions of the Laplace operator and satisfying the Dirichlet boundary value conditions.

Exercise 6.62. Assume that $d\geq2$ (or that $d=2$ for simplicity). Let $U\subseteq\mathbb{R}^d$ be open and $K\subseteq U$ a compact subset. Let $f\in H_0^1(U)$ be an eigenfunction of $\Delta$ (and of $S$ as in the proof of Theorem 6.56) such that $\Delta f=\lambda f$ for some $\lambda<-1$. Show that
\[
\|f\|_{K,\infty}\ll_{K,U}|\lambda|^{\frac{d}{4}+\frac{1}{2}}\|f\|_2.
\]

6.4.3 Eigenfunctions on a Drum(19)

We now describe a concrete case of Theorem 6.56. As mentioned earlier, a concrete description of the Laplace eigenfunctions is generally impossible unless the domain has special features. Thus a natural case beyond the open rectangle considered in Exercise 6.61 is to set $U$ to be the open unit disc $B_1^{\mathbb{R}^2}$. For a given eigenfunction $f\in H_0^1(U)$ of $\Delta$ and some rotation matrix
\[
k(\phi)=\begin{pmatrix}\cos\phi&-\sin\phi\\ \sin\phi&\cos\phi\end{pmatrix}
\]
for $\phi\in[0,2\pi)$ as in Section 1.1 we may consider the function $f^k(x)=f(kx)$. A simple calculation (which may be carried out using Proposition 1.5) shows that $f^k$ is also an eigenfunction of $\Delta$ on $U$ with the same eigenvalue as $f$. Since the eigenspace of $H_0^1$ functions of $\Delta$ for a given eigenvalue is finite-dimensional, it follows that we can find for any given eigenvalue a basis of the eigenspace with the property that every basis vector also has some weight $n\in\mathbb{Z}$ for the action of $K$ on $U$ (cf. Corollary 3.89).

Fixing the weight $n\in\mathbb{Z}$ and the eigenvalue $\lambda<0$, the partial differential equation $\Delta f=\lambda f$ has a convenient reformulation. In fact, a calculation reveals that the Laplace operator $\Delta=\frac{\partial^2}{\partial x^2}+\frac{\partial^2}{\partial y^2}$ has the representation
\[
\Delta=\frac{\partial^2}{\partial r^2}+\frac{1}{r}\frac{\partial}{\partial r}+\frac{1}{r^2}\frac{\partial^2}{\partial\theta^2}
\]
in polar coordinates (we will also write $f$ for the eigenfunction in polar coordinates), and if $f$ has weight $n$ then $f(r,\theta)=F(r)\mathrm{e}^{\mathrm{i}n\theta}$ for a function $F$ on $[0,1]$. Since $f$ is smooth on $U$, $F$ is smooth on $(0,1)$. Since $f$ vanishes on $\partial U$ we have $F(1)=0$. Moreover, if $n\neq0$ we must also have $F(0)=0$ (check this). Finally, the partial differential equation $\Delta f=\lambda f$ now becomes the ordinary differential equation $\frac{d^2F}{dr^2}+\frac{1}{r}\frac{dF}{dr}-\frac{n^2}{r^2}F=\lambda F$, or, equivalently,
\[
r^2\frac{d^2F}{dr^2}+r\frac{dF}{dr}+\bigl(|\lambda|r^2-n^2\bigr)F=0,\tag{6.30}
\]
with the conditions on $F(0)$ and $F(1)$ as explained above. The differential equation
\[
x^2J_n''+xJ_n'+(x^2-n^2)J_n=0\tag{6.31}
\]
on $(0,\infty)$ is known as Bessel’s equation and the solutions are called the Bessel functions, one of a class of special functions introduced by the astronomer Bessel in 1817 in connection with the problem of three bodies moving under mutual gravitational attraction. The two equations (6.30) and (6.31) are essentially equivalent by setting $x=|\lambda|^{1/2}r$ and $J_n(x)=F(r)=F(|\lambda|^{-1/2}x)$. Since (6.31) is a linear second-order differential equation there are two linearly independent real solutions for each $\lambda$ and $n$. The function $J_n$ is characterized up to a scalar multiple by the condition that $\lim_{x\to0}J_n(x)$ exists (see Exercise 6.63(b)). Bessel found the integral representation
\[
J_n(x)=\frac{1}{\pi}\int_0^\pi\cos(x\sin t-nt)\,dt\tag{6.32}
\]
of the function $J_n$ (we refer to Whittaker and Watson [113] for a general treatment of special functions). We will not develop this theory further, but refer to Figures 6.1 and 6.2 for a visualization of the resulting functions; in modelling the behaviour of a drum the time variable is also needed, and so these illustrations may be thought of as snapshots of an oscillating drum skin (as alluded to in Section 1.2.2).

Fig. 6.1: Two eigenfunctions with weight n = 0

Fig. 6.2: Two eigenfunctions with weight n = 1

Exercise 6.63. Make the discussion of this section complete by the following steps.
(a) Prove that $J_n$ as defined in (6.32) satisfies the differential equation (6.31).
(b) Show that the equation (6.31) has a solution $Y_n$ with $Y_n(x)\to-\infty$ as $x\to0$ given by
\[
Y_n(x)=\frac{1}{\pi}\int_0^\pi\sin(x\sin t-nt)\,dt-\frac{1}{\pi}\int_0^\infty\bigl(\mathrm{e}^{nt}+(-1)^n\mathrm{e}^{-nt}\bigr)\mathrm{e}^{-x\sinh t}\,dt.
\]
(The solutions $J_n$ and $Y_n$ are referred to as Bessel functions of the first and second kind, respectively.)
(c) Show that for every $n\in\mathbb{Z}$ there is an eigenfunction of weight $n$.
(d) Show that for every $n\in\mathbb{Z}$ the eigenvalues (and eigenfunctions) of weight $n$ correspond to the zeros of $J_n$.
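The relation between (6.30), (6.31) and (6.32) can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it evaluates $J_n$ from the integral (6.32) with Simpson's rule, estimates the left-hand side of (6.31) by central differences, and locates a zero of $J_n$ by bisection. The function names, step counts and bracketing intervals are choices made for this sketch and do not come from the text. Reading a zero $j$ of $J_n$ as an eigenvalue $\lambda=-j^2$ of weight $n$ on the unit disc assumes the statement of Exercise 6.63(d).

```python
import math


def bessel_j(n, x, steps=2000):
    """Approximate J_n(x) via the integral (6.32) using composite Simpson.

    `steps` must be even; the value 2000 is an arbitrary illustrative choice.
    """
    h = math.pi / steps
    total = 0.0
    for k in range(steps + 1):
        t = k * h
        # Simpson weights 1, 4, 2, 4, ..., 2, 4, 1
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight * math.cos(x * math.sin(t) - n * t)
    return total * h / (3.0 * math.pi)


def bessel_residual(n, x, h=1e-3):
    """Central-difference estimate of x^2 J'' + x J' + (x^2 - n^2) J, cf. (6.31)."""
    j_mid = bessel_j(n, x)
    j_plus = bessel_j(n, x + h)
    j_minus = bessel_j(n, x - h)
    d1 = (j_plus - j_minus) / (2.0 * h)
    d2 = (j_plus - 2.0 * j_mid + j_minus) / (h * h)
    return x * x * d2 + x * d1 + (x * x - n * n) * j_mid


def first_zero(n, lo, hi, tol=1e-10):
    """Bisection for a zero of J_n, assuming J_n changes sign on [lo, hi]."""
    f_lo = bessel_j(n, lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j(n, mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    j01 = first_zero(0, 2.0, 3.0)      # first positive zero of J_0, about 2.4048
    print("zero of J_0:", j01)
    print("eigenvalue of weight 0 (assuming Exercise 6.63(d)):", -j01 * j01)
    print("residual of (6.31) at x = 3:", bessel_residual(0, 3.0))
```

With $F(r)=J_n(|\lambda|^{1/2}r)$ the boundary condition $F(1)=0$ says that $|\lambda|^{1/2}$ is a zero of $J_n$, which is why the sketch reports $-j^2$ as a candidate eigenvalue.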

6.4.4 Weyl’s Law

The eigenfunctions and eigenvalues arising in Theorem 6.56 are mysterious. While their existence and some of their properties are readily proved, it is not usually possible to describe them analytically in closed form unless the open set is very special, so numerical approximations are often all that is available. It is, however, possible to count them asymptotically as expressed in this result of 1911 by Weyl [111].(20) Recall that $U\subseteq\mathbb{R}^d$ is said to be Jordan measurable if it is bounded and its characteristic function is Riemann integrable.

Theorem 6.64 (Weyl’s law). Let $U\subseteq\mathbb{R}^d$ be open and Jordan measurable. Let $N(T)=|\{n\mid|\lambda_n|\leq T\}|$ denote the number of eigenvalues of $\Delta$ on $U$ with eigenfunctions in $H_0^1(U)$ and absolute value bounded by $T$ (with repetitions allowed, just as in Theorem 6.56). Then
\[
\lim_{T\to\infty}\frac{N(T)}{T^{d/2}}=(2\pi)^{-d}\omega_d\,m(U),\tag{6.33}
\]

where $m$ is the Lebesgue measure on $\mathbb{R}^d$ and $\omega_d=m\bigl(B_1^{\mathbb{R}^d}\bigr)$ is the volume of the unit ball in $\mathbb{R}^d$.

In 1966 M. Kac [50] asked ‘Can one hear the shape of a drum?’ As we explained in Section 1.2.2, the eigenvalues of the Laplacian on an open set $U$ relate directly to the frequencies at which a membrane with the shape $U$ would vibrate. Thus the notes one hears from a drum with shape $U$ are precisely related to the eigenvalues of the Laplacian and the question raised by Kac asks whether the list of eigenvalues determines $U$ (up to isometric motions of $\mathbb{R}^d$). One of the consequences of Theorem 6.64 is that the size of the drum certainly can be heard in this sense. Kac’s question was answered in the negative.(21)

Our (by now well-established) approach is to first show the result for the torus, and we will then apply a technique known as Dirichlet–Neumann bracketing to extend the proof to the general case.

Proposition 6.65. Let $R>0$ and $U=\mathbb{T}_R^d=\mathbb{R}^d/(2R\mathbb{Z}^d)$ or $U=(0,R)^d$. Then Weyl’s law holds for the eigenvalues of the Laplacian on $U$.

Proof. In both cases, write $(\lambda_n)$ for the eigenvalues and $(f_n)$ for the associated eigenfunctions in $H^1(\mathbb{T}_R^d)$ resp. $H_0^1(U)$. In the case of $\mathbb{T}_R^d$ we know that the basis of eigenfunctions is given by $(\chi_n)$, where
\[
\chi_n(x)=\mathrm{e}^{2\pi\mathrm{i}(2R)^{-1}(n_1x_1+\cdots+n_dx_d)}
\]
for $x\in\mathbb{T}_R^d$ and $n\in\mathbb{Z}^d$. The Laplace eigenvalue of $\chi_n$ is given by $\lambda_n=-(2\pi)^2(2R)^{-2}\|n\|_2^2$ for all $n\in\mathbb{Z}^d$. Hence
\begin{align*}
N(T)&=\bigl|\{n\in\mathbb{Z}^d\mid(2\pi)^2(2R)^{-2}\|n\|_2^2\leq T\}\bigr|\\
&=\bigl|\{n\in\mathbb{Z}^d\mid\|n\|_2^2\leq T(2R)^2(2\pi)^{-2}\}\bigr|=\Bigl|\mathbb{Z}^d\cap B^{\mathbb{R}^d}_{T^{1/2}2R/(2\pi)}\Bigr|
\end{align*}
is determined by a lattice point counting problem. Using the fundamental domain $F=[-\frac12,\frac12)^d$ for $\mathbb{Z}^d$ in $\mathbb{R}^d$ we have
\[
B^{\mathbb{R}^d}_{S-\sqrt{d}}\subseteq\bigl(\mathbb{Z}^d\cap B^{\mathbb{R}^d}_S\bigr)+F\subseteq B^{\mathbb{R}^d}_{S+\sqrt{d}}
\]
for any $S>\sqrt{d}$ (see Figure 6.3). Taking the Lebesgue measure (satisfying $m(F)=1$) we obtain $\omega_d(S-\sqrt{d})^d\leq\bigl|\mathbb{Z}^d\cap B_S^{\mathbb{R}^d}\bigr|\leq\omega_d(S+\sqrt{d})^d$ and
\[
\lim_{S\to\infty}\frac{\bigl|\mathbb{Z}^d\cap B_S^{\mathbb{R}^d}\bigr|}{S^d}=\omega_d.
\]
Combining the two discussions and setting $S$ to be $T^{1/2}2R/2\pi$ gives
\[
\lim_{T\to\infty}\frac{N(T)}{(T^{1/2}2R/2\pi)^d}=\omega_d,
\]
or equivalently (6.33) for $U=\mathbb{T}_R^d$.

Fig. 6.3: Counting lattice points in a ball for d = 2.
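The lattice point count for the torus can be tried out directly in dimension $d=2$. The Python sketch below is a minimal illustration, with function names and sample values of $T$ and $R$ chosen here rather than taken from the text: it counts the $n\in\mathbb{Z}^2$ with $\|n\|_2^2\leq T(2R)^2(2\pi)^{-2}$ and compares $N(T)/T$ with the constant $(2\pi)^{-2}\omega_2m(\mathbb{T}_R^2)$, using $\omega_2=\pi$ and $m(\mathbb{T}_R^2)=(2R)^2$.

```python
import math


def torus_count_2d(T, R=1.0):
    """N(T) for the torus T^2_R: points n in Z^2 with |n|^2 <= T (2R)^2 / (2 pi)^2."""
    s2 = T * (2.0 * R) ** 2 / (2.0 * math.pi) ** 2  # squared radius S^2
    s = math.isqrt(int(s2))                          # largest integer with s^2 <= S^2
    count = 0
    for n1 in range(-s, s + 1):
        rem = int(s2 - n1 * n1)            # floor of the non-negative remainder
        count += 2 * math.isqrt(rem) + 1   # admissible values of n2 for this n1
    return count


def weyl_constant_torus_2d(R=1.0):
    """(2 pi)^-2 * omega_2 * m(T^2_R), with omega_2 = pi and m(T^2_R) = (2R)^2."""
    return math.pi * (2.0 * R) ** 2 / (2.0 * math.pi) ** 2


if __name__ == "__main__":
    for T in (1e2, 1e4, 1e6):
        print(T, torus_count_2d(T) / T, weyl_constant_torus_2d())
```

For floating-point radii very close to an integer lattice distance the count may be off by a few boundary points, which does not affect the limit.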

We now extend the result to $U=(0,R)^d$. For this let $n\in\mathbb{N}_0^d$ and note that the characters $(\chi_m)$ on $\mathbb{T}_R^d$ for $m=(\pm n_1,\dots,\pm n_d)$ all have the same eigenvalue $-4\pi^2(2R)^{-2}\|n\|_2^2$ for the Laplacian. Taking linear combinations of the characters we obtain the eigenfunctions
\[
f_1(\pi R^{-1}n_1x_1)\cdots f_d(\pi R^{-1}n_dx_d),\tag{6.34}
\]
where each $f_j$ for $j=1,\dots,d$ is either the sine or cosine function. If $n\in\mathbb{N}^d$ this gives $2^d$ linearly independent eigenfunctions, while for $n\in\mathbb{N}_0^d\setminus\mathbb{N}^d$ this gives $2^e$ linearly independent eigenfunctions, where $e$ is the number of non-zero components of $n$. Notice that the linear hull of these eigenfunctions coincides with the linear hull of all characters, and that these are mutually orthogonal. We claim that the functions that only involve sine functions form the orthogonal basis of $L^2(U)$ consisting of eigenfunctions of $\Delta$ in $H_0^1(U)$. Assuming the claim, we obtain precisely one eigenfunction for every $n\in\mathbb{N}^d$, so
\[
N_U(T)=\bigl|\{n\in\mathbb{N}^d\mid4\pi^2(2R)^{-2}\|n\|_2^2\leq T\}\bigr|=\Bigl|\mathbb{N}^d\cap B^{\mathbb{R}^d}_{T^{1/2}2R/2\pi}\Bigr|,
\]
and so
\[
\lim_{T\to\infty}\frac{N_U(T)}{T^{d/2}}=\lim_{T\to\infty}\frac{N_{\mathbb{T}_R^d}(T)}{2^dT^{d/2}}=(2\pi)^{-d}\omega_d\,m(U).
\]
For the proof of the claim we apply the discussion of even and odd functions from Section 1.1 in $d$ dimensions. For this we identify $L^2((0,R)^d)$ with the subspace of functions in $L^2((-R,R)^d)=L^2(\mathbb{T}_R^d)$ that are odd with respect to all coordinates. More precisely, given $f\in L^2((0,R)^d)$ we define $\tilde f|_U=f$ and
\[
\tilde f(\varepsilon_1t_1,\varepsilon_2t_2,\dots,\varepsilon_dt_d)=\varepsilon_1\cdots\varepsilon_d\,\tilde f(t_1,\dots,t_d)
\]
for $\varepsilon_1,\dots,\varepsilon_d\in\{\pm1\}$ and $(t_1,\dots,t_d)\in U$ (and the same formula then holds for all $t\in(-R,R)^d$). Expand $\tilde f$ into eigenfunctions of the form (6.34) for all $n\in\mathbb{N}_0^d$. If $g$ is one of these, then either $g$ is the product only of sine functions or it is even with respect to one or more of the variables; assume that it is even with respect to $x_k$. Using the substitution $x_k\to-x_k$ in the inner product we obtain
\[
\langle\tilde f,g\rangle_{L^2((-R,R)^d)}=\langle-\tilde f,g\rangle_{L^2((-R,R)^d)}
\]
and so $\langle\tilde f,g\rangle_{L^2((-R,R)^d)}=0$. This shows that $\tilde f$ is expressed using products of sine functions only. We also note that for any $f,g\in L^2((0,R)^d)$ we have
\[
\bigl\langle\tilde f,\tilde g\bigr\rangle_{L^2((-R,R)^d)}=2^d\langle f,g\rangle_{L^2((0,R)^d)}
\]

which may be seen by splitting $(-R,R)^d$ into $2^d$ smaller cubes and substituting $y_j=\pm x_j$ for $j=1,\dots,d$ and $x\in(0,R)^d$ on each one of them. It follows that the functions of the form
\[
x\mapsto\sin(\pi R^{-1}n_1x_1)\cdots\sin(\pi R^{-1}n_dx_d)
\]
for $n\in\mathbb{N}^d$ are an orthogonal basis of $L^2(U)$. As these functions also vanish on $\partial U$ it follows that they belong to $H_0^1(U)$ (see Exercise 6.66), proving the proposition.

The cautious reader may notice that we have only found an orthonormal basis of $L^2(U)$ in $H_0^1(U)$ consisting of eigenfunctions of $\Delta$ as in Theorem 6.56. However, as $\Delta$ is not a well-defined operator it is not clear whether this basis is the same as the one in Theorem 6.56. This is resolved in Lemma 6.67(a). □

Essential Exercise 6.66. (a) Show that the function $x\mapsto\sin(\pi R^{-1}nx)$ lies in $H_0^1((0,R))$ for all $n\geq1$. (b) Formulate and show the analogous result for $U=(0,R)^d$.

Lemma 6.67. Let $U\subseteq\mathbb{R}^d$ be open and bounded.
(a) If a function $f\in H_0^1(U)\cap C^\infty(U)$ satisfies $\Delta f=\lambda f$, then
\[
\langle f,g\rangle_1=-\lambda\langle f,g\rangle_{L^2(U)}\tag{6.35}
\]
for all $g\in H_0^1(U)$. If this holds for a non-trivial $f$ then $\lambda<0$ and we have $S(f)=\lambda^{-1}f$ (where $S=-\imath\imath^*$ is as in Proposition 6.60). In particular, the eigenspaces inside $H_0^1(U)$ of $\Delta$ coincide with those of $S$.
(b) If $f_1,f_2,\ldots\in H_0^1(U)\cap C^\infty(U)$ are eigenfunctions of $\Delta$ with eigenvalues $0>\lambda_1\geq\lambda_2\geq\cdots$ that form an orthonormal basis of $L^2(U)$, then
\[
|\lambda_1|^{-\frac12}f_1,\;|\lambda_2|^{-\frac12}f_2,\;\ldots\tag{6.36}
\]
form an orthonormal basis of $H_0^1(U)$ with respect to $\langle\cdot,\cdot\rangle_1$.
(c) Moreover, if $g\in H_0^1(U)$ and $a_n=\langle g,f_n\rangle_{L^2(U)}$ then $g=\sum_{n=1}^\infty a_nf_n$ converges in $L^2(U)$ and in $H_0^1(U)$.

Proof. For the proof of (a), suppose that $f\in H_0^1(U)\cap C^\infty(U)$ satisfies the equation $\Delta f=\lambda f$. Let $\phi\in C_c^\infty(U)$. Then
\[
\langle f,\phi\rangle_1=\sum_j\langle\partial_jf,\partial_j\phi\rangle_{L^2(U)}=-\langle\Delta f,\phi\rangle_{L^2(U)}=-\lambda\langle f,\phi\rangle_{L^2(U)}.
\]
Since $C_c^\infty(U)$ is dense in $H_0^1(U)$, we obtain (6.35) for all $g\in H_0^1(U)$. In particular, we may set $g=f$ to obtain $\langle f,f\rangle_1=-\lambda\langle f,f\rangle_{L^2(U)}$, which implies $\lambda<0$ if we assume that $f$ is non-trivial. Recall that $\imath:H_0^1(U)\to L^2(U)$ is defined by $\imath(f)=f$. With this, (6.35) gives
\[
\langle\imath^*(-\lambda f),g\rangle_1=\langle-\lambda f,\imath(g)\rangle_{L^2(U)}=\langle-\lambda f,g\rangle_{L^2(U)}=\langle f,g\rangle_1
\]
for all $g\in H_0^1(U)$. This implies $\imath^*(-\lambda f)=f$ and $S(f)=\lambda^{-1}f$.

For the proof of (b), let $f_1,f_2,\ldots$ be an orthonormal basis of $L^2(U)$ in $H_0^1(U)\cap C^\infty(U)$ as in the lemma. Then $\langle f_k,f_\ell\rangle_1=-\lambda_k\langle f_k,f_\ell\rangle_{L^2(U)}=0$ for $k,\ell\geq1$ with $k\neq\ell$, and $\langle f_k,f_k\rangle_1=|\lambda_k|\langle f_k,f_k\rangle_{L^2(U)}=|\lambda_k|$ by part (a). This shows that (6.36) forms an orthonormal list of vectors in $H_0^1(U)$ with respect to $\langle\cdot,\cdot\rangle_1$. To see that these form an orthonormal basis of $H_0^1(U)$, suppose that $f\in H_0^1(U)$ is orthogonal to all of these functions with respect to $\langle\cdot,\cdot\rangle_1$. Then $0=\langle f,f_n\rangle_1=-\lambda_n\langle f,f_n\rangle_{L^2(U)}$ by assumption and (a). Hence $\langle f,f_n\rangle_{L^2(U)}=0$ for all $n\geq1$, and so we must have $f=0$ since we assumed that the sequence $f_1,f_2,\ldots$ is an orthonormal basis of $L^2(U)$, and the map $\imath:H_0^1(U)\to L^2(U)$ is injective.

For the final claim, let $g=\sum_{n=1}^\infty b_nf_n$ be the expansion in $H_0^1(U)$, and notice that $\imath:H_0^1(U)\to L^2(U)$ is a bounded operator, and in particular must send a convergent series to a convergent series, so that $b_n=a_n$ for all $n\geq1$. □

Our proof of Weyl’s law (Theorem 6.64) relies crucially on the following reformulation of the counting function.


Lemma 6.68 (Variational characterization). Let $U\subseteq\mathbb{R}^d$ be open and bounded. Let $f_1,f_2,\ldots$, $\lambda_1,\lambda_2,\ldots$, and $N(T)$ for $T>0$ be as in Theorem 6.56. Then
\[
N(T)=\max\bigl\{\dim V\mid V\subseteq H_0^1(U),\ \|f\|_1\leq T^{1/2}\|f\|_{L^2}\text{ for all }f\in V\bigr\}\tag{6.37}
\]
for all $T>0$.

Proof. Fix some $T>0$. Suppose that $|\lambda_1|,\dots,|\lambda_n|\leq T$ and $|\lambda_k|>T$ for all $k>n$ (so that $n=N(T)$). Define $V_0=\langle f_1,\dots,f_n\rangle$. Applying Lemma 6.67(a) to each $f_k$ for $k=1,\dots,n$ and $g=\sum_{\ell=1}^na_\ell f_\ell$ we get
\begin{align*}
\|g\|_1^2=\Bigl\langle\sum_{k=1}^na_kf_k,\sum_{\ell=1}^na_\ell f_\ell\Bigr\rangle_1
&=\sum_{k=1}^na_k|\lambda_k|\Bigl\langle f_k,\sum_{\ell=1}^na_\ell f_\ell\Bigr\rangle_{L^2(U)}\\
&=\sum_{k=1}^n|\lambda_k||a_k|^2\leq T\,\Bigl\|\sum_{k=1}^na_kf_k\Bigr\|_{L^2(U)}^2
\end{align*}
for all $(a_1,\dots,a_n)\in\mathbb{C}^n$. This already gives $N(T)=\dim V_0\leq\max\dim V$ with $V$ as in (6.37).

For the reverse inequality, assume that $V\subseteq H_0^1(U)$ has $\dim V>N(T)$. Then there exists a non-trivial function $f\in V\cap V_0^\perp$ (because, for example, any $f\in V$ induces a linear functional on $V_0$ by taking the inner product, and for dimension reasons the resulting linear map $V\to V_0^*$ cannot be injective). Since $V_0^\perp=\langle f_{n+1},f_{n+2},\ldots\rangle$ we may write $f=\sum_{k>n}a_kf_k$ with the sum converging in $L^2(U)$ and in $H_0^1(U)$ by Lemma 6.67(c). Therefore using Lemma 6.67(a) as above we obtain
\[
\|f\|_1^2=\Bigl\langle\sum_{k>n}a_kf_k,\sum_{\ell>n}a_\ell f_\ell\Bigr\rangle_1=\sum_{k>n}|\lambda_k||a_k|^2>T\sum_{k>n}|a_k|^2=T\|f\|_{L^2}^2,
\]
and we see that $V$ does not satisfy the requirement in (6.37). Hence any subspace $V$ as in (6.37) would satisfy $\dim V\leq\dim V_0=N(T)$ and the lemma follows. □

Proof of Theorem 6.64. Notice first that Lemma 6.68 implies for disjoint open subsets $U_1$ and $U_2$ of a bounded open set $U$ the sub-additivity
\[
N_{U_1}(T)+N_{U_2}(T)\leq N_U(T),\tag{6.38}
\]

where we write $N_{U'}(T)$ for the counting function for an open and bounded domain $U'\subseteq\mathbb{R}^d$. Indeed, on extending functions to be zero outside $U_j$ we may write $H_0^1(U_j)\subseteq H_0^1(U)$ for $j=1,2$ as in Exercise 5.27, and, once embedded, we have $H_0^1(U_1)\perp H_0^1(U_2)$, with respect to both $\langle\cdot,\cdot\rangle_1$ and $\langle\cdot,\cdot\rangle_{L^2(U)}$, since $U_1$ and $U_2$ are disjoint, so that we can take the direct sum of the subspaces realising the maximum for $U_1$ and $U_2$ appearing in Lemma 6.68. We note that, although it is tempting to try, it is not possible to derive the estimate (6.38) directly by extending eigenfunctions on $U_1$ and $U_2$ to eigenfunctions on $U$, as there would be no reason to expect these functions to be smooth on $\partial U_1\cap U$. The variational characterization in Lemma 6.68 avoids this issue.

Now let $U\subseteq(-R,R)^d$ be an open Jordan measurable subset. By Jordan measurability, for any $\varepsilon>0$ we can divide $(-R,R)^d$ into finitely many cubes so that we can approximate $U$ from the inside and from the outside by finite disjoint unions of small cubes, as illustrated in Figure 6.4.

Fig. 6.4: Approximating U by two pixelated versions of U , one from inside and one from outside.

Let $I_1\sqcup\cdots\sqcup I_k\subseteq U$ and $U\subseteq O_1\sqcup\cdots\sqcup O_\ell$ be the approximation from inside and outside respectively, where $I_1,\dots,I_k$ and $O_1,\dots,O_\ell$ are translates of the cube $(0,\frac{R}{n})^d$, and choose $n$ so large and the approximations so good that
\[
m\bigl((O_1\sqcup\cdots\sqcup O_\ell)\setminus(I_1\sqcup\cdots\sqcup I_k)\bigr)<\varepsilon.
\]
By the sub-additivity mentioned above for the various counting functions $N_U(T)$ we have
\begin{align*}
\liminf_{T\to\infty}\frac{N_U(T)}{T^{d/2}}&\geq\lim_{T\to\infty}\frac{N_{I_1}(T)+\cdots+N_{I_k}(T)}{T^{d/2}}\\
&=(2\pi)^{-d}\omega_d\,m(I_1\sqcup\cdots\sqcup I_k)\geq(2\pi)^{-d}\omega_d\,(m(U)-\varepsilon)
\end{align*}
by Proposition 6.65. On the other hand, we may add extra cubes to $O_1\sqcup\cdots\sqcup O_\ell$ to obtain
\[
O_1\sqcup\cdots\sqcup O_\ell\sqcup E_1\sqcup\cdots\sqcup E_n=[-R,R]^d
\]
for some cubes $E_1,\dots,E_n\subseteq[-R,R]^d$. Hence, again by sub-additivity,
\begin{align*}
\limsup_{T\to\infty}\Bigl(\frac{N_U(T)}{T^{d/2}}+\sum_{j=1}^n\frac{N_{E_j}(T)}{T^{d/2}}\Bigr)
&\leq\limsup_{T\to\infty}\frac{N_{(O_1\sqcup\cdots\sqcup O_\ell)^o}(T)+\sum_{j=1}^nN_{E_j}(T)}{T^{d/2}}\\
&\leq\lim_{T\to\infty}\frac{N_{(-R,R)^d}(T)}{T^{d/2}}=(2\pi)^{-d}\omega_d\,m((-R,R)^d).
\end{align*}
Since for each $j$ we have
\[
\lim_{T\to\infty}\frac{N_{E_j}(T)}{T^{d/2}}=(2\pi)^{-d}\omega_d\,m(E_j)
\]
it follows that
\[
\limsup_{T\to\infty}\frac{N_U(T)}{T^{d/2}}\leq(2\pi)^{-d}\omega_d\,m(O_1\sqcup\cdots\sqcup O_\ell)\leq(2\pi)^{-d}\omega_d\,(m(U)+\varepsilon).
\]
Since $\varepsilon>0$ was arbitrary, this and the reverse bound for the limit inferior above prove the theorem. □

Exercise 6.69. Let $U\subseteq\mathbb{R}^d$ be open, bounded, and Jordan measurable. Show that
\[
\lim_{n\to\infty}\frac{|\lambda_n|}{n^{2/d}}=(2\pi)^2\bigl(\omega_d\,m(U)\bigr)^{-2/d},
\]
where $\lambda_1\geq\lambda_2\geq\cdots$ is the ordered list of eigenvalues of $\Delta$ on $U$.

Exercise 6.70. Assume that $d\geq2$ (or, for simplicity, that $d=2$), let $U\subseteq\mathbb{R}^d$ be an open, bounded, Jordan measurable set, and let $f\in C_c^\infty(U)$. Show that the series $\sum_{n=1}^\infty\langle f,f_n\rangle f_n$, with $(f_n)$ as in Theorem 6.56 ordered as in Exercise 6.69, converges pointwise on $U$ and uniformly on any compact subset of $U$.
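The growth rate in Exercise 6.69 can be observed numerically on the unit square $U=(0,1)^2$. The sketch below is an illustration only and assumes the sine eigenfunctions from the proof of Proposition 6.65 with $R=1$, so that the Dirichlet eigenvalues are $-\pi^2(n_1^2+n_2^2)$ for $n\in\mathbb{N}^2$. For $d=2$ and $m(U)=1$ the predicted limit of $|\lambda_n|/n$ is $(2\pi)^2/\omega_2=4\pi$. The function name and the cutoff parameter `M` are hypothetical choices for this sketch.

```python
import math


def weyl_ratio_unit_square(M):
    """Return |lambda_n| / n for the largest n with |lambda_n| <= pi^2 M^2.

    Only pairs (a, b) with a^2 + b^2 <= M^2 are kept, so the sorted list is a
    complete initial segment of the spectrum. Requires M >= 2.
    """
    eigs = sorted(
        math.pi ** 2 * (a * a + b * b)
        for a in range(1, M + 1)
        for b in range(1, M + 1)
        if a * a + b * b <= M * M
    )
    n = len(eigs)
    return eigs[-1] / n


PREDICTED_LIMIT = 4.0 * math.pi  # (2 pi)^2 (omega_2 * m(U))^(-1) for the unit square

if __name__ == "__main__":
    for M in (50, 200, 400):
        print(M, weyl_ratio_unit_square(M), PREDICTED_LIMIT)
```

The convergence is slow because lattice points near the coordinate axes are excluded by the Dirichlet condition, which contributes a boundary term of lower order.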

6.5 Further Topics

• In Section 8.2.2 we return one more time to the topic of Sobolev spaces and study elliptic regularity up to and including the boundary of U. For further reading in that direction, we refer to Evans [30].
• The spectral theory of compact self-adjoint operators proven here is only the starting point. We discuss spectral theory again in Chapter 9 for unitary operators, in Chapter 11 from a general perspective as a preparation for Chapter 12 for bounded normal operators, and in Chapter 13 for unbounded self-adjoint operators.

The reader should continue with Chapter 7 and Chapter 8 as these give important results for the chapters that follow.

Chapter 7

Dual Spaces

Let X be a real (or complex) normed vector space. A bounded linear operator from X into the normed space R (or C) is a (continuous) linear functional on X. Recall that the space of all continuous linear functionals is denoted X ∗ or B(X, R) and it is called the dual or conjugate space of X. Lemma 2.54 shows that X ∗ is a Banach space with respect to the operator norm. In Section 7.1 we prove the Hahn–Banach theorem, a fundamental tool for constructing linear functionals with prescribed properties. We also discuss several further consequences of the Hahn–Banach theorem concerned with the relationship between X and X ∗ . In Section 7.2 we discuss applications of these results. Finally, in Sections 7.3 and 7.4 we will identify the duals of many important Banach spaces, leading to examples and counter-examples to the property of reflexivity.

© Springer International Publishing AG 2017. M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_7

7.1 The Hahn–Banach Theorem and its Consequences

One of the most important questions one may ask of $X^*$ is the following: are there ‘enough’ elements in $X^*$? For example, are there enough elements to separate points? This is answered in great generality using the Hahn–Banach theorem (Theorem 7.3 below); see Corollary 7.4.

7.1.1 The Hahn–Banach Lemma and Theorem

Even though in the main applications of the Hahn–Banach lemma the function $p$ below is simply a norm, we will also see applications of this stronger form of the lemma (with the stated weaker assumptions on the function $p$) in Section 7.4 and Section 8.6.1.

Lemma 7.1 (Hahn–Banach lemma). Let $X$ be a real vector space, and assume that $p:X\to\mathbb{R}$ is a norm-like function with the properties
\[
p(x_1+x_2)\leq p(x_1)+p(x_2)\quad\text{and}\quad p(\lambda x_1)=\lambda p(x_1)
\]
for all $\lambda\geq0$ and $x_1,x_2\in X$. Let $Y$ be a subspace of $X$, and $f:Y\to\mathbb{R}$ a linear function with $f(y)\leq p(y)$ for all $y\in Y$. Then there exists a linear functional $F:X\to\mathbb{R}$ such that $F(y)=f(y)$ for $y\in Y$, and $F(x)\leq p(x)$ for all $x\in X$.

To stress the similarities of the assumptions on $p$ to the definition of a norm from Definition 2.1 (or semi-norm from Definition 2.11) we refer to the function $p$ as a norm-like function.

Proof of Lemma 7.1. Let $\mathcal{K}$ be the set of all pairs $(Y_\alpha,g_\alpha)$ in which $Y_\alpha$ is a linear subspace of $X$ containing $Y$, and $g_\alpha$ is a real linear functional on $Y_\alpha$ with $g_\alpha(y)=f(y)$ for all $y\in Y$, and $g_\alpha(x)\leq p(x)$ for all $x\in Y_\alpha$. We make $\mathcal{K}$ into a partially ordered set by defining $(Y_\alpha,g_\alpha)\preccurlyeq(Y_\beta,g_\beta)$ if $Y_\alpha\subseteq Y_\beta$ and $g_\alpha=g_\beta|_{Y_\alpha}$. It is clear that any totally ordered subset $\{(Y_\lambda,g_\lambda)\mid\lambda\in I\}$ has an upper bound given by the subspace $Y'=\bigcup_\lambda Y_\lambda$ and the functional defined by $g'(y)=g_\lambda(y)$ for $y\in Y_\lambda$ and $\lambda\in I$. That $Y'$ is a subspace and that $g'$ is well-defined both follow since $\{(Y_\lambda,g_\lambda)\mid\lambda\in I\}$ is linearly ordered.

Applying Zorn’s lemma: All of this is to prepare the ground for an application of Zorn’s lemma, which roughly speaking allows us to make a transfinite induction with choices (the heart of the argument follows in the next paragraph). Indeed, by Zorn’s lemma (see Section A.1), there is a maximal element $(Y_0,g_0)$ in $\mathcal{K}$. All that remains is to check that $Y_0$ is all of $X$ (so we may take $F$ to be $g_0$).

Extending by one dimension: So assume for the purposes of a contradiction that $x\in X\setminus Y_0$, and let $Y_1$ be the vector space spanned by $Y_0$ and $x$. Each element $z\in Y_1$ may be expressed uniquely in the form $z=y+\lambda x$ with $y\in Y_0$ and $\lambda\in\mathbb{R}$, because $x$ is assumed not to be in the subspace $Y_0$. Define a linear function $g_1$ on $Y_1$ by setting $g_1(y+\lambda x)=g_0(y)+\lambda c$, where the constant $c$ will be chosen to ensure that $g_1$ is bounded by $p$. Note that if $y_1,y_2\in Y_0$, then
\[
g_0(y_1)-g_0(y_2)=g_0(y_1-y_2)\leq p(y_1-y_2)\leq p(y_1+x)+p(-x-y_2),
\]
so
\[
-p(-x-y_2)-g_0(y_2)\leq p(y_1+x)-g_0(y_1).
\]
It follows that
\[
A=\sup_{y\in Y_0}\{-p(-x-y)-g_0(y)\}\leq\inf_{y\in Y_0}\{p(y+x)-g_0(y)\}=B.
\]
Choose $c$ to be any number in the interval $[A,B]$. Then, by construction of $A$ and $B$,
\[
c\leq p(y+x)-g_0(y),\tag{7.1}
\]
and
\[
-p(-x-y)-g_0(y)\leq c\tag{7.2}
\]
for all $y\in Y_0$. In order to show the required bound on $g_1$, we consider scalars of different sign separately. For $\lambda>0$, multiply (7.1) by $\lambda$ and substitute $\frac{1}{\lambda}y$ for $y$ to obtain
\[
\lambda c\leq p(y+\lambda x)-g_0(y)\tag{7.3}
\]
from the assumed (positive) homogeneity. Similarly, for $\lambda<0$, multiply (7.2) by $\lambda$, and substitute $\frac{1}{\lambda}y$ for $y$ to obtain
\[
\lambda c\leq|\lambda|\,p\bigl(-x-\tfrac{1}{\lambda}y\bigr)-\lambda g_0\bigl(\tfrac{1}{\lambda}y\bigr).
\]
Using the homogeneity assumption on $p$ for $|\lambda|=-\lambda$ we obtain (7.3) again. Since the assumptions on $g_0$ also give (7.3) for $\lambda=0$, we obtain
\[
g_1(y+\lambda x)=g_0(y)+\lambda c\leq p(y+\lambda x)
\]
for all $\lambda\in\mathbb{R}$ and $y\in Y_0$.

A contradiction: Thus we have found $(Y_1,g_1)\in\mathcal{K}$ with $(Y_0,g_0)\preccurlyeq(Y_1,g_1)$ and $Y_0\neq Y_1$. This contradicts the maximality of $(Y_0,g_0)$ and hence $F=g_0$ is defined on all of $X$ and satisfies the conclusion of the lemma. □

Exercise 7.2. Let $X$ be a real vector space and let $K\subseteq X$ be a convex subset. Suppose that $0\in K$ and that for every $x\in X$ there is some $t>0$ with $tx\in K$. Define the gauge function $p_K(x)=\inf\{t>0\mid\frac{1}{t}x\in K\}$. Show that $p_K$ is norm-like in the sense that it is non-negative, homogeneous for positive scalars, and satisfies the triangle inequality (the latter two being assumptions in Lemma 7.1).
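The one-dimensional extension step in the proof of Lemma 7.1 can be made concrete in a small example. The Python sketch below is an illustration under assumptions chosen here, not a construction from the text: $X=\mathbb{R}^2$, $p$ is the $\ell^1$ norm, $Y_0$ is spanned by $(1,0)$ with $g_0(t,0)=t$, and $x=(0,1)$. The supremum $A$ and infimum $B$ are only sampled on a finite grid of rational points (in this example the extreme values happen to be attained on the grid), and exact rational arithmetic is used to avoid rounding issues.

```python
from fractions import Fraction


def p(v):
    """Norm-like function: the l^1 norm on R^2 (an illustrative choice)."""
    return abs(v[0]) + abs(v[1])


def g0(t):
    """Functional on Y0 = span{(1, 0)}, written in the coordinate t of t*(1, 0)."""
    return t


def extension_interval(grid):
    """Sampled values of A and B from the proof, for the new vector x = (0, 1)."""
    x = (Fraction(0), Fraction(1))
    # A = sup over y in Y0 of -p(-x - y) - g0(y), with y = (t, 0)
    A = max(-p((-x[0] - t, -x[1])) - g0(t) for t in grid)
    # B = inf over y in Y0 of p(y + x) - g0(y)
    B = min(p((t + x[0], x[1])) - g0(t) for t in grid)
    return A, B


def dominated(c, grid):
    """Check g1(y + lam*x) = g0(y) + lam*c <= p(y + lam*x) on the sample grid."""
    return all(t + lam * c <= p((t, lam)) for t in grid for lam in grid)


GRID = [Fraction(k, 10) for k in range(-50, 51)]

if __name__ == "__main__":
    A, B = extension_interval(GRID)
    print("admissible interval for c:", A, B)
    print("c = 1/2 is dominated by p:", dominated(Fraction(1, 2), GRID))
    print("c = 2 is dominated by p:", dominated(Fraction(2), GRID))
```

In this example every $c$ in the sampled interval $[A,B]$ gives an extension bounded by $p$ on the grid, while a value outside the interval violates the bound already at $y=0$, $\lambda=1$.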

For real vector spaces, the Hahn–Banach theorem follows at once (for complex spaces a little more work is needed).

Theorem 7.3 (Hahn–Banach theorem). Let $X$ be a real or complex normed space, and $Y$ a linear subspace. Then for any $y^* \in Y^*$ there exists an $x^* \in X^*$ such that $\|x^*\| = \|y^*\|$ and $x^*(y) = y^*(y)$ for all $y \in Y$.

That is, any linear functional defined on a subspace may be extended to a linear functional on the whole space, without increasing the norm.

Proof of Theorem 7.3. Assume first that $X$ is a real normed space. Let $p(x) = \|y^*\| \|x\|$ and $f(x) = y^*(x)$. Apply the Hahn–Banach lemma (Lemma 7.1) to find an extension $x^* = F$ to the whole space. To check that $\|x^*\| \leq \|y^*\|$, write $x^*(x) = \theta |x^*(x)|$ with $\theta \in \{\pm 1\}$. Then
\[ |x^*(x)| = \theta x^*(x) = x^*(\theta x) \leq p(\theta x) = \|y^*\| \|\theta x\| = \|y^*\| \|x\|. \]
The reverse inequality is clear, so $\|x^*\| = \|y^*\|$.

Complex case: Now let $X$ be a complex normed vector space, let $Y \subseteq X$ be a complex linear subspace, let $y^* \in Y^*$, and define a real linear functional $y^*_{\mathbb{R}}$ by $y^*_{\mathbb{R}}(y) = \Re(y^*(y))$ for $y \in Y$. Let $x^*_{\mathbb{R}} : X \to \mathbb{R}$ be an extension of $y^*_{\mathbb{R}}$ with $\|x^*_{\mathbb{R}}\| = \|y^*_{\mathbb{R}}\| \leq \|y^*\|$ (by the real case above). Now define
\[ x^*(x) = x^*_{\mathbb{R}}(x) - i x^*_{\mathbb{R}}(ix), \]
which is once again an $\mathbb{R}$-linear map from $X$ to $\mathbb{C}$. It is also $\mathbb{C}$-linear since
\[ x^*(ix) = x^*_{\mathbb{R}}(ix) - i x^*_{\mathbb{R}}(i^2 x) = i x^*_{\mathbb{R}}(x) - i^2 x^*_{\mathbb{R}}(ix) = i x^*(x). \]
Moreover, for $y \in Y$ we have, by $\mathbb{C}$-linearity of $y^*$,
\[ x^*(y) = x^*_{\mathbb{R}}(y) - i x^*_{\mathbb{R}}(iy) = \Re(y^*(y)) - i \Re(y^*(iy)) = \Re(y^*(y)) + i \Im(y^*(y)) = y^*(y). \]
Finally, $|x^*(x)| = \theta x^*(x)$ for some $\theta \in \mathbb{C}$ with $|\theta| = 1$, and so
\[ |x^*(x)| = \theta x^*(x) = x^*(\theta x) = x^*_{\mathbb{R}}(\theta x) \leq \|y^*\| \|\theta x\| = \|y^*\| \|x\|, \]
which shows that $\|x^*\| \leq \|y^*\|$. As the reverse inequality is again clear, $\|x^*\| = \|y^*\|$, which proves the complex case of the theorem. □

7.1.2 Consequences of the Hahn–Banach Theorem

Many useful results follow from the Hahn–Banach theorem.

Corollary 7.4 (Separation). Let $X$ be a non-trivial normed vector space. Then for any $x \in X$ there is a functional $x^* \in X^*$ with $\|x^*\| = 1$ and with $x^*(x) = \|x\|$. Hence, if $z \neq y \in X$ then there exists an $x^* \in X^*$ such that $x^*(y) \neq x^*(z)$.

Proof. Note that we may assume without loss of generality that $x \neq 0$. Apply Theorem 7.3 with $Y$ being the linear hull of $x$ to find an extension of the linear map $y^*(ax) = a\|x\|$ on $Y$. Since $|y^*(ax)| = |a| \|x\| = \|ax\|$ we have $\|y^*\| = 1$, and so we find an $x^* \in X^*$ with $\|x^*\| = 1$ and $x^*(x) = \|x\|$. For the last part, take $x = y - z$. □
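In finite dimensions the functional of Corollary 7.4 can be written down explicitly, without any appeal to the Hahn–Banach theorem. The sketch below assumes $X = \mathbb{R}^n$ with the $p$-norm for some $1 < p < \infty$, and uses the standard fact (Hölder's inequality) that the functional $z \mapsto \sum_j c_j z_j$ then has operator norm $\|c\|_q$ with $\frac1p + \frac1q = 1$. The helper names are hypothetical.

```python
import math


def p_norm(v, p):
    """The p-norm of a finite real vector."""
    return sum(abs(t) ** p for t in v) ** (1.0 / p)


def norming_functional(x, p):
    """Coefficients c of x*(z) = sum_j c_j z_j with x*(x) = ||x||_p and ||c||_q = 1.

    Assumes x is non-zero and 1 < p < infinity.
    """
    n = p_norm(x, p)
    return [math.copysign(abs(t) ** (p - 1), t) / n ** (p - 1) for t in x]


x = [3.0, -4.0, 1.0]
p = 3.0
q = p / (p - 1.0)                       # conjugate exponent
c = norming_functional(x, p)
value = sum(cj * xj for cj, xj in zip(c, x))   # x*(x)
dual_norm = p_norm(c, q)                       # ||x*|| under the stated assumption
```

Here `value` agrees with `p_norm(x, p)` and `dual_norm` equals one up to rounding, which is exactly the conclusion of the corollary for this choice of space.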

Notice finally that linear functionals allow us to decompose a vector space (see Exercises 3.27 and 3.28): let $X$ be a normed vector space, and $x^* \in X^*$. The null space or kernel of $x^*$ is the closed linear subspace
\[ \ker(x^*) = \{x \in X \mid x^*(x) = 0\}. \]
If $x^* \neq 0$, then there is a point $x_0 \neq 0$ such that $x^*(x_0) = 1$. Any $x \in X$ can then be written as $x = z + \lambda x_0$, with $\lambda = x^*(x)$ and $z = x - \lambda x_0 \in \ker(x^*)$. Thus, $X = \ker(x^*) \oplus Y$, where $Y$ is the one-dimensional space spanned by $x_0$.

Exercise 7.5. Show that every finite-dimensional subspace of a normed vector space has a closed linear complement.
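The decomposition $x = z + \lambda x_0$ is easy to carry out by hand in $\mathbb{R}^n$. The following sketch assumes that $x^*$ is given by a non-zero coefficient vector `c`, and makes the (arbitrary) choice $x_0 = c / \langle c, c \rangle$; any other $x_0$ with $x^*(x_0) = 1$ would serve equally well. The function name `decompose` is hypothetical.

```python
def decompose(c, x):
    """Split x = z + lam * x0 with x*(z) = 0, where x*(y) = sum_j c_j y_j.

    Assumes c is non-zero; x0 = c / <c, c> is one choice with x*(x0) = 1.
    """
    norm_sq = sum(t * t for t in c)
    x0 = [t / norm_sq for t in c]
    lam = sum(cj * xj for cj, xj in zip(c, x))      # lam = x*(x)
    z = [xj - lam * t for xj, t in zip(x, x0)]      # z = x - lam * x0
    return z, lam, x0


c = [1.0, 2.0, -2.0]
x = [3.0, 0.0, 1.0]
z, lam, x0 = decompose(c, x)
functional_on_z = sum(cj * zj for cj, zj in zip(c, z))   # should vanish
```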

The reader should compare the following result for a general normed vector space to the characterization of the closed linear hull in Hilbert spaces (see Corollary 3.26).

Corollary 7.6 (Closed linear hull). Let $S \subseteq X$ be a subset of a normed vector space. Then the closed linear hull of $S$ is precisely the set of all $x \in X$ that satisfy $x^*(x) = 0$ for all $x^* \in X^*$ with $x^*(S) = \{0\}$. Equivalently,
\[ \overline{\langle S \rangle} = \bigcap_{\substack{x^* \in X^* \\ x^*(S) = \{0\}}} \ker(x^*). \]

Proof. The inclusion of the left-hand side in the right-hand side is clear since $\ker(x^*)$ is a closed subspace for any $x^* \in X^*$. Suppose that $x_0 \notin \overline{\langle S \rangle}$, and let $Y = \langle x_0 \rangle + \overline{\langle S \rangle}$. Then the functional $y^*$ defined by $y^*(\alpha x_0 + z) = \alpha$ for $z \in \overline{\langle S \rangle}$ is bounded. For otherwise there would exist, for every $n \geq 1$, some scalar $\alpha_n \neq 0$ and some $z_n \in \overline{\langle S \rangle}$ with $|\alpha_n| > n \|\alpha_n x_0 + z_n\|$, which implies that $\|x_0 + \frac{1}{\alpha_n} z_n\| \leq \frac{1}{n}$, forcing $x_0 \in \overline{\langle S \rangle}$. Therefore, $y^*$ can be extended to a continuous linear functional $x^*$ on $X$ which satisfies $x^*(S) = \{0\}$ but $x^*(x_0) = 1$. This shows that $x_0$ also does not lie in the intersection of the kernels on the right-hand side, and hence proves the other inclusion. □

Exercise 7.7. Let $Y \subseteq X$ be a subspace of a normed linear space $X$. Show that
\[ \max_{\|x^*\| \leq 1,\ x^*(Y) = \{0\}} |x^*(x)| = \inf_{y \in Y} \|x - y\| \]
for all $x \in X$.

Exercise 7.8. (a) Prove that if the dual space $X^*$ of a real normed vector space $X$ is strictly convex (see Definition 2.17), then the Hahn–Banach extension of a continuous functional on a subspace to all of $X$ is unique. (b) Give an explicit example of a situation in which the extension defined by the Hahn–Banach theorem is not unique.

7.1.3 The Bidual

Corollary 7.9 (Isometric embedding into the bidual). Let $X$ be a normed vector space. Then
\[ \|x\| = \max_{x^* \in X^*,\ \|x^*\| \leq 1} |x^*(x)| \tag{7.4} \]
for any $x \in X$. In particular, the natural linear map
\[ \imath : X \longrightarrow X^{**} = (X^*)^*, \qquad x \longmapsto \imath(x) \]
from $X$ into the bidual of $X$ that sends $x \in X$ to the linear functional $\imath(x)$ defined by $\imath(x)(x^*) = x^*(x)$ for $x^* \in X^*$, is an isometric embedding.

Definition 7.10. A Banach space $X$ is called reflexive if the isometry $\imath$ in Corollary 7.9 is a bijection (and hence an isometric isomorphism) from $X$ to $X^{**}$.

As we will see in the next section, some Banach spaces which we have already encountered are reflexive, but some are not.

Proof of Corollary 7.9. By definition, $|x^*(x_0)| \leq \|x^*\| \|x_0\| \leq \|x_0\|$ for all $x^* \in X^*$ with $\|x^*\| \leq 1$ and $x_0 \in X$. Moreover, we may apply Corollary 7.4 to obtain some functional $x^* \in X^*$ of norm one with $x^*(x_0) = \|x_0\|$, which proves (7.4). Now notice that
\[ \sup_{x^* \in X^*,\ \|x^*\| \leq 1} |x^*(x_0)| \]
is, by definition, precisely the operator norm of $\imath(x_0) \in X^{**}$. Hence we have shown that $\imath$ is an isometry, and linearity of $\imath$ is easy to check. □

Exercise 7.11. Let $Y \subseteq X$ be a closed subspace of a normed vector space. (a) Show that $Y^\perp = \{x^* \in X^* \mid x^*(Y) = \{0\}\}$ is a closed subspace. (b) Show that $(X/Y)^* = Y^\perp$ (that is, that there is a natural isometric isomorphism between the two). (c) Show that $Y^* = X^*/Y^\perp$. (d) Conclude that $Y$ is reflexive if $X$ is reflexive.

Exercise 7.12. Let $X$ be a normed vector space and suppose that the dual $X^*$ is separable. Show that $X$ is also separable. In particular, if $X$ is separable but $X^*$ is not, then $X$ cannot be reflexive. Find an example of a Banach space that is not reflexive for that reason.

The results developed above give another approach to the existence of completions.

Shorter proof of Theorem 2.32. Let $X$ be a normed vector space. By Corollary 7.9, $X$ is isometric to $\imath(X) \subseteq X^{**}$. Set $B = \overline{\imath(X)}$, which is a Banach space by Lemma 2.54 and Exercise 2.26(b). □

7.1.4 An Application of the Spanning Criterion †

The description of the closed linear hull in Corollary 7.6 can be used as a spanning criterion: a subset $S$ of a Banach space $X$ spans $X$ (that is, has $X$ as its closed linear hull) if and only if there is no non-zero $x^* \in X^*$ with the property that $S \subseteq \ker(x^*)$. This is a powerful tool, and surprisingly often it can be applied even without a complete description of the dual space. The following result generalizes the Stone–Weierstrass theorem on the unit interval. The full result also shows the converse, so the divergence characterises the density.

† The result of this subsection will not be needed in the remainder of the book.

Theorem 7.13 (Müntz [76]). Suppose that $(n_k)$ is a sequence in $\mathbb{N}$ with $n_1 < n_2 < n_3 < \cdots$ and with $\sum_{k=1}^{\infty} \frac{1}{n_k} = \infty$, and let $p_n(x) = x^n$ for $n \in \mathbb{N}$. Then the linear hull of $\{1, p_{n_1}, p_{n_2}, \dots\}$ is dense in $C([0, 1])$.

Proof. Let $Y$ be the closed linear hull of the set $\{1, p_{n_1}, p_{n_2}, \dots\}$ in $C([0, 1])$. By Corollary 7.6 we have to show that if $\ell \in C([0, 1])^*$ has
\[ \ell(1) = \ell(p_{n_k}) = 0 \tag{7.5} \]
for all $k \geq 1$, then $\ell = 0$. In fact, it is enough to show that if $\ell \in C([0, 1])^*$ has (7.5) for all $k \geq 1$, then $\ell(p_n) = 0$ for all integers $n \geq 1$. This is because Corollary 7.6 then shows that $\mathbb{C}[x] \subseteq Y$, after which the Stone–Weierstrass theorem (Theorem 2.40) may be applied to give $Y = C([0, 1])$.

So assume that $\ell \in C([0, 1])^*$ satisfies (7.5) for all $k \geq 1$, and assume also that there is some $n \in \mathbb{N}$ with $\ell(p_n) \neq 0$. We will show that this implies $\sum_{k=1}^{\infty} \frac{1}{n_k} < \infty$.

For $\zeta \in \mathbb{C}$ with $\Re(\zeta) > 0$, we define $p_\zeta(t) = t^\zeta$ for $t \in [0, 1]$ (with the convention that $0^\zeta = 0$). This defines the function $p_\zeta \in C([0, 1])$ satisfying $\|p_\zeta\| \leq 1$. Moreover, we have
\[ \lim_{\mathbb{C} \ni \delta \to 0} \frac{t^{\zeta + \delta} - t^\zeta}{\delta} = \lim_{\mathbb{C} \ni \delta \to 0} t^\zeta \, \frac{t^\delta - 1}{\delta} = t^\zeta \log t \]
for all $t \in [0, 1]$, and in fact the convergence is with respect to the $\|\cdot\|_\infty$ norm (use the complex version of the mean value theorem to check this claimed uniformity). Now define $f(\zeta) = \ell(p_\zeta)$ for $\zeta \in \mathbb{C}$ with $\Re(\zeta) > 0$, so $|f(\zeta)| \leq \|\ell\|$. Furthermore, $f$ is analytic for $\Re(\zeta) > 0$ since
\[ \lim_{\mathbb{C} \ni \delta \to 0} \frac{f(\zeta + \delta) - f(\zeta)}{\delta} = \ell\bigl(t^\zeta \log t\bigr) \]
exists by the above observation regarding uniform convergence. Finally, we have $f(n_k) = 0$ for $k \geq 1$ by assumption.

Now define the Blaschke product(22)
\[ B_K(\zeta) = \prod_{k=1}^{K} \frac{\zeta - n_k}{\zeta + n_k}, \]
with simple zeros $B_K(n_k) = 0$ for $k = 1, \dots, K$, $B_K(\zeta) \neq 0$ for $\Re(\zeta) > 0$ and $\zeta \notin \{n_1, \dots, n_K\}$, and the asymptotic formula $|B_K(\zeta)| \longrightarrow 1$ as $\Re(\zeta) \to 0$ or as $|\zeta| \to \infty$. Together with $f(n_k) = 0$ these properties show that
\[ g_K(\zeta) = \frac{f(\zeta)}{B_K(\zeta)} \]
is analytic for $\Re(\zeta) > 0$, and satisfies the estimate
\[ |g_K(\zeta)| \leq (1 + \delta) \|\ell\| \tag{7.6} \]
whenever $|\zeta| \geq R(\delta)$ or $\Re(\zeta) \leq \varepsilon(\delta)$, where the positive quantities $R(\delta)$ and $\varepsilon(\delta)$ depend on $\delta$, and $\delta > 0$ is arbitrary. Applying the maximum principle for $g_K$ on the half-disk
\[ \{\zeta \in \mathbb{C} \mid |\zeta - \varepsilon(\delta)| \leq R(\delta),\ \Re(\zeta) \geq \varepsilon(\delta)\} \]
in Figure 7.1, the function $|g_K|$ must attain its maximum on the boundary of the half-disk. As we have (7.6) on that boundary, we obtain $\|g_K\| \leq \|\ell\|(1 + \delta)$ first on the half-disk and, by decreasing $\varepsilon(\delta)$ and increasing $R(\delta)$, on all of the right half-space. As $\delta > 0$ was arbitrary, we obtain $\|g_K\|_\infty \leq \|\ell\|$.

Fig. 7.1: Applying the maximum principle.

Recall that $n$ was chosen so that $f(n) = \ell(p_n) \neq 0$. For $\zeta = n$ this shows that
\[ \prod_{k=1}^{K} \left| 1 + \frac{2n}{n_k - n} \right| = \prod_{k=1}^{K} \left| \frac{n + n_k}{n - n_k} \right| = |B_K(n)|^{-1} \leq \frac{\|\ell\|}{|f(n)|} < \infty, \]
meaning that we have found an upper bound for the product on the left-hand side independent of $K$. Notice that $n_k > n$ for all but finitely many $k \in \mathbb{N}$. Taking the logarithm and using the fact that $x \ll \log(1 + x)$ for all $x \in [0, 1]$, it follows that the sum $\sum_{k=1}^{K} \frac{1}{n_k - n}$ has an upper bound independent of $K$. Multiplying the series term-by-term with $\frac{n_k - n}{n_k}$ (and noticing that $\frac{n_k - n}{n_k} \to 1$ as $k \to \infty$), it follows that $\sum_{k=1}^{\infty} \frac{1}{n_k} < \infty$, as claimed. This contradicts our assumption, and the theorem follows. □
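The role of the Blaschke product in the proof of Theorem 7.13 can be made concrete with a small computation. The sketch below assumes the illustrative sequence $n_k = 2k$ (for which $\sum 1/n_k$ diverges) and the point $n = 1$; in that case the product $\prod_{k \le K} \frac{2k+1}{2k-1}$ telescopes to $2K + 1$, so $|B_K(1)|^{-1}$ is unbounded in $K$ and the bound $|B_K(n)|^{-1} \leq \|\ell\| / |f(n)|$ forces $f(1) = 0$. The function name `blaschke` is hypothetical.

```python
def blaschke(zeta, zeros):
    """Finite Blaschke product prod_k (zeta - n_k) / (zeta + n_k) for the right half-plane."""
    value = complex(1.0)
    for n_k in zeros:
        value *= (zeta - n_k) / (zeta + n_k)
    return value


K = 50
zeros = [2 * k for k in range(1, K + 1)]      # illustrative choice n_k = 2k

on_axis = abs(blaschke(3j, zeros))            # modulus on the imaginary axis
inside = abs(blaschke(1.5 + 2j, zeros))       # modulus at a point with Re > 0
inverse_at_1 = 1 / abs(blaschke(1, zeros))    # |B_K(1)|^{-1}, telescopes to 2K + 1
```

On the imaginary axis each factor has modulus one, while inside the right half-plane each factor has modulus strictly less than one, which is consistent with the asymptotic behaviour used for the estimate (7.6).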

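7.2 Banach Limits, Amenable Groups, and the Banach–Tarski Paradox

In this section we start the discussion of amenability and related topics. Amenability will be one of two (quite different) functional-analytic properties of a group that we will discuss in Chapter 10.

7.2.1 Banach Limits

On the space
\[ c(\mathbb{N}) = \{(x_n)_{n \in \mathbb{N}} \in \ell^\infty(\mathbb{N}) \mid \lim_{n \to \infty} x_n \text{ exists}\} \]
we have the natural linear functional $\lim$ defined by
\[ c(\mathbb{N}) \ni (x_n)_{n \in \mathbb{N}} \longmapsto \lim\bigl((x_n)_{n \in \mathbb{N}}\bigr) = \lim_{n \to \infty} x_n. \]
A natural question is to ask if this rather obvious functional (taking the limit of sequences that do have a limit) might have an extension to all of the much larger space $\ell^\infty(\mathbb{N})$. The Hahn–Banach theorem is built for just such situations, and using it we readily find such a generalized limit in the form of a linear functional.

Corollary 7.14 (Banach limit). There exists a linear functional, which we will denote $\operatorname{LIM} \in (\ell^\infty(\mathbb{N}))^*$, with norm one, which may be thought of as a generalized limit since it satisfies the following properties:
• $\operatorname{LIM}((a_n)) = \lim_{n \to \infty} a_n$ if the latter limit exists;
• $\operatorname{LIM}((a_n)) \in [\liminf_{n \to \infty} a_n, \limsup_{n \to \infty} a_n]$ if $a_n \in \mathbb{R}$ for all $n \geq 1$;
• $\operatorname{LIM}((a_n)) = \operatorname{LIM}((a_{n+1}))$.
The functional $\operatorname{LIM}$ is called a Banach limit.

Proof. We work initially over $\mathbb{R}$. Let $c(\mathbb{N}) \subseteq \ell^\infty(\mathbb{N})$ and $\lim \in c(\mathbb{N})^*$ be as given before the statement of the corollary. Notice that $\|\lim\| = 1$ since
\[ \Bigl| \lim_{n \to \infty} a_n \Bigr| \leq \sup_{n \geq 1} |a_n|. \]
Let $L \in (\ell^\infty(\mathbb{N}))^*$ be an extension as in Theorem 7.3, with $\|L\| = \|\lim\|$. We now define
\[ \operatorname{LIM}((a_n)) = L\Bigl(a_1, \frac{a_1 + a_2}{2}, \frac{a_1 + a_2 + a_3}{3}, \dots\Bigr). \]
Clearly $\operatorname{LIM}$ is linear and extends $\lim$ on the subspace $c(\mathbb{N})$, since the Cesàro averages of a convergent sequence converge to the same limit. This functional also has norm one, since
\[ \Bigl| \frac{a_1 + \cdots + a_n}{n} \Bigr| \leq \|(a_n)\|_\infty \]
for all $n \geq 1$, which implies that
\[ \Bigl| L\Bigl(a_1, \frac{a_1 + a_2}{2}, \dots\Bigr) \Bigr| \leq \|(a_n)\|_\infty. \]
Moreover,
\[ \operatorname{LIM}\bigl((a_n)_n - (a_{n+1})_n\bigr) = L\Bigl(a_1 - a_2, \frac{a_1 - a_3}{2}, \frac{a_1 - a_4}{3}, \dots\Bigr) = 0, \]
because the sequence inside $L$ converges to zero and $L$ extends $\lim$. This implies the last claim in the corollary.

Let $I = \inf_{n \geq 1} a_n$ and $S = \sup_{n \geq 1} a_n$, so that $\bigl| a_n - \frac{I + S}{2} \bigr| \leq \frac{S - I}{2}$ for all $n \geq 1$, and hence
\[ \Bigl| \operatorname{LIM}((a_n)) - \frac{I + S}{2} \Bigr| \leq \frac{S - I}{2}, \]
which implies that $I \leq \operatorname{LIM}((a_n)) \leq S$. Together with the already established translation-invariance, we obtain $\inf_{n \geq k} a_n \leq \operatorname{LIM}((a_n)) \leq \sup_{n \geq k} a_n$ for every $k \geq 1$, and so also
\[ \liminf_{n \to \infty} a_n \leq \operatorname{LIM}((a_n)) \leq \limsup_{n \to \infty} a_n. \]
We may extend $\operatorname{LIM}$ from $\ell^\infty_{\mathbb{R}}(\mathbb{N})$ to $\ell^\infty_{\mathbb{C}}(\mathbb{N})$ by setting
\[ \operatorname{LIM}((a_n)) = \operatorname{LIM}((\Re a_n)) + i \operatorname{LIM}((\Im a_n)) \tag{7.7} \]
for all $(a_n) \in \ell^\infty_{\mathbb{C}}(\mathbb{N})$ (see Exercise 7.15). □

Exercise 7.15. Show that the extension $\operatorname{LIM} \in \bigl(\ell^\infty_{\mathbb{C}}(\mathbb{N})\bigr)^*$ from (7.7) is $\mathbb{C}$-linear and has norm one.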
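Although the functional $L$ coming from the Hahn–Banach theorem is not explicit, the construction of $\operatorname{LIM}$ via Cesàro averages determines its value on many non-convergent sequences: whenever the averages converge, $\operatorname{LIM}$ must equal their limit, because $L$ extends $\lim$. The sketch below (with hypothetical helper names) illustrates this for the sequence $0, 1, 0, 1, \dots$ and also checks the identity behind translation-invariance, namely that the $n$th averages of $(a_n)$ and $(a_{n+1})$ differ by exactly $(a_1 - a_{n+1})/n$.

```python
from itertools import accumulate


def cesaro_means(a):
    """Return the averages (a_1 + ... + a_n) / n for n = 1, ..., len(a)."""
    return [s / n for n, s in enumerate(accumulate(a), start=1)]


N = 10_000
# a[0], ..., a[N] hold a_1, ..., a_{N+1}, with a_n = 1 for even n and 0 for odd n.
a = [1.0 if n % 2 == 0 else 0.0 for n in range(1, N + 2)]

means = cesaro_means(a[:N])               # averages of (a_n)
shifted_means = cesaro_means(a[1:N + 1])  # averages of (a_{n+1})
gap = [m - s for m, s in zip(means, shifted_means)]
```

The averages in `means` tend to $\frac12$, so any Banach limit built as in the proof assigns the value $\frac12$ to this sequence, while the entries of `gap` tend to zero at rate $1/n$.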

By pre-composing with the projection operator from $\ell^\infty(\mathbb{Z})$ to $\ell^\infty(\mathbb{N})$ defined by $(a_n)_{n \in \mathbb{Z}} \mapsto (a_n)_{n \in \mathbb{N}}$, we can also view $\operatorname{LIM}$ as a translation-invariant linear functional on $\ell^\infty(\mathbb{Z})$.

7.2.2 Amenable Groups

A natural and important question to ask is which other groups $G$ have a similar invariant functional defined on all of $\ell^\infty(G)$. We assume here that $G$ is endowed with the discrete topology but note that the notions discussed have natural analogues for locally compact groups (see Section 10.2).

The following concept was introduced by von Neumann in 1929, and called messbar (measurable); the modern terminology was introduced by Day in 1949, perhaps as a pun, as these groups are ‘easy to work with’ and hence a-men-able (US) / a-mean-able (UK), and are groups that ‘admit a mean’, hence a-mean-able.

Definition 7.16. A discrete group $G$ is called amenable if there exists a finitely additive (left-)invariant mean on $G$. That is, a map $m : \mathcal{P}(G) \to [0, 1]$ defined on all subsets of $G$, with the following properties:
• $m(A) \geq 0$ for all $A \subseteq G$ and $m(G) = 1$;
• $m(A_1 \cup A_2) = m(A_1) + m(A_2)$ for disjoint sets $A_1, A_2 \subseteq G$; and
• $m(gA) = m(A)$ for all $g \in G$ and $A \subseteq G$.

One may think of a mean (which is only required to be finitely additive) as a poor substitute for a measure (which is countably additive) when a measure with the desired properties does not exist. This is the case if $G$ is a countably infinite group, as the only translation-invariant measure in that case is the counting measure (or a scalar multiple of the counting measure), which is infinite. However, the invariant mean discussed here takes values in $[0, 1]$.

Example 7.17. Corollary 7.14 together with the next lemma shows that $G = \mathbb{Z}$ is amenable. Moreover, any invariant mean on $\mathbb{Z}$ (there will turn out to be many, see Exercise 7.25) will have some reassuringly natural properties. For example, if $E = \{n \in \mathbb{Z} \mid n \text{ is even}\}$, then for any invariant mean $m$ we must have $m(E) = \frac{1}{2}$ since
\[ 2m(E) = m(E) + m(E + 1) = m(\mathbb{Z}) = 1. \]

Lemma 7.18. A discrete group $G$ is amenable if and only if there exists a positive left-invariant functional $\operatorname{LIM} \in (\ell^\infty(G))^*$ of norm one. Here positivity is the requirement that
• $a \geq 0$ (on all of $G$) implies that $\operatorname{LIM}(a) \geq 0$,
and left-invariance is
• $\operatorname{LIM}(a^h) = \operatorname{LIM}(a)$ for all $h \in G$ and $a \in \ell^\infty(G)$, where $a^h$ is the shifted map $g \mapsto a(h^{-1} g)$.

Sketch of proof. If $\operatorname{LIM}$ is given on $\ell^\infty(G)$ then we can define $m(A)$ to be $\operatorname{LIM}(\mathbb{1}_A)$ for all $A \subseteq G$, and then it is easy to check that $m$ is a left-invariant mean on $G$.
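On the other hand, if a left-invariant mean $m$ is given, then we may obtain every $a \in \ell^\infty(G)$ as the limit of a sequence of finite sums of the form
\[ \sum_{i=1}^{\ell_n} r_i^{(n)} \mathbb{1}_{A_i^{(n)}} \]
as $n \to \infty$, where $r_i^{(n)} \in \mathbb{C}$ and $A_i^{(n)} \subseteq G$ for all $n \geq 1$ and $i$. For example, we may partition the closed disc $B^{\mathbb{C}}_{\|a\|_\infty}$ of radius $\|a\|_\infty$ in $\mathbb{C}$ into finitely many sets $B_1, \dots, B_\ell$ each of diameter less than $\frac{1}{n}$, choose $r_i^{(n)}$ in $B_i$ and then define $A_i^{(n)} = a^{-1}(B_i)$ so that
\[ \Bigl\| a - \sum_{i=1}^{\ell_n} r_i^{(n)} \mathbb{1}_{A_i^{(n)}} \Bigr\|_\infty \leq \frac{1}{n}. \]
Now we can define $\operatorname{LIM}$ by
\[ \operatorname{LIM}(a) = \lim_{n \to \infty} \sum_{i=1}^{\ell_n} r_i^{(n)} m\bigl(A_i^{(n)}\bigr), \]
and check that $\operatorname{LIM}$ is well-defined, linear, bounded of norm one, positive, and left-invariant. □

Exercise 7.19. Provide a detailed proof of Lemma 7.18.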
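The approximation by simple functions in the sketch of proof can be tried out on a toy model. The code below is an illustration only and makes several assumptions not in the text: the function $a$ is real-valued, the group is replaced by the finite cyclic group $\mathbb{Z}/12\mathbb{Z}$ (where the normalized counting measure is an invariant mean), and the cells $B_i$ are the intervals $[i/n, (i+1)/n)$ with $r_i$ chosen as the left endpoint. The helper name `simple_approximation` is hypothetical.

```python
import math
from collections import defaultdict


def simple_approximation(a, n):
    """Group the points of a finite set by the cell [i/n, (i+1)/n) containing a(g).

    Returns a dict {r_i: A_i} with r_i = i/n and A_i = a^{-1}([i/n, (i+1)/n)).
    """
    level_sets = defaultdict(set)
    for g, value in a.items():
        level_sets[math.floor(value * n) / n].add(g)
    return dict(level_sets)


N = 12                                           # toy model: the finite group Z/12Z
a = {g: math.sin(g) for g in range(N)}           # a bounded function on the group
n = 10
parts = simple_approximation(a, n)

# Uniform distance between a and the simple function sum_i r_i 1_{A_i}.
sup_error = max(a[g] - r for r, A in parts.items() for g in A)

# sum_i r_i m(A_i) for the normalized counting measure m, versus the exact average.
approx = sum(r * len(A) / N for r, A in parts.items())
exact = sum(a.values()) / N
```

Refining the partition (increasing `n`) drives `approx` towards `exact`, which mirrors the limit defining $\operatorname{LIM}(a)$ in the sketch of proof.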

Next we show that the class of amenable groups is closed under many natural operations that allow us to give more examples.

Proposition 7.20 (Subgroups and quotients). Let $G$ be a discrete group.
(a) If $G$ is amenable, then every subgroup $H < G$ is also amenable.
(b) Let $H \lhd G$ be a normal subgroup. Then $G$ is amenable if and only if both $H$ and $G/H$ are amenable.

Proof. For (a), let $\operatorname{LIM}_G$ be a left-invariant positive functional on $\ell^\infty(G)$ of norm one, as in Lemma 7.18. Let $H < G$ be a subgroup and $S \subseteq G$ a set of right coset representatives, so that
\[ G = \bigsqcup_{s \in S} Hs. \]
For any $a \in \ell^\infty(H)$ we define $a_G(hs) = a(h)$ for $h \in H$ and $s \in S$ and $\operatorname{LIM}_H(a) = \operatorname{LIM}_G(a_G)$, which is clearly linear. Moreover, $a \geq 0$ implies that $a_G \geq 0$, and therefore $\operatorname{LIM}_H(a) \geq 0$. Moreover,
\[ |\operatorname{LIM}_H(a)| = |\operatorname{LIM}_G(a_G)| \leq \|a_G\|_\infty = \|a\|_\infty. \]
Finally, for $h_0 \in H$ and $s \in S$ we have
\[ (a^{h_0})_G(hs) = a^{h_0}(h) = a(h_0^{-1} h) = a_G(h_0^{-1} hs) = (a_G)^{h_0}(hs), \]
so
\[ \operatorname{LIM}_H(a^{h_0}) = \operatorname{LIM}_G\bigl((a_G)^{h_0}\bigr) = \operatorname{LIM}_G(a_G) = \operatorname{LIM}_H(a), \]
and Lemma 7.18 shows that $H$ is amenable.

For (b) we again use Lemma 7.18, allowing us to work with functionals on the space of bounded functions on the groups involved. Assume first that $G$ is amenable, so that $H$ is amenable by (a). For $a \in \ell^\infty(G/H)$, define an element $a_G \in \ell^\infty(G)$ by setting $a_G(g) = a(gH)$. Just as in (a) we define $\operatorname{LIM}_{G/H}(a) = \operatorname{LIM}_G(a_G)$ to obtain a left-invariant positive functional of norm one on $\ell^\infty(G/H)$, showing that $G/H$ is amenable.

For the converse, assume that $H$ and $G/H$ are both amenable, and write $\operatorname{LIM}_H$ and $\operatorname{LIM}_{G/H}$ for the associated functionals. For $a \in \ell^\infty(G)$ define the bounded function $\tilde{a}$ on $G$ by
\[ \tilde{a}(g) = \operatorname{LIM}_H\bigl(h \mapsto a(gh)\bigr) \]
for $g \in G$. Notice that for $h_0 \in H$
\[ \tilde{a}(g h_0) = \operatorname{LIM}_H\bigl(h \mapsto a(g h_0 h)\bigr) = \operatorname{LIM}_H\bigl(h \mapsto a(gh)\bigr) = \tilde{a}(g), \]
since multiplication by $h_0$ induces a left shift on the function $h \mapsto a(gh)$. Therefore we can think of $\tilde{a}$ as a function on $G/H$ by setting $\tilde{a}(gH) = \tilde{a}(g)$. We can now define
\[ \operatorname{LIM}_G(a) = \operatorname{LIM}_{G/H}(\tilde{a}) = \operatorname{LIM}_{G/H}\bigl(g \mapsto \operatorname{LIM}_H(h \mapsto a(gh))\bigr), \]
and it is once again straightforward to check the required properties of $\operatorname{LIM}_G$ to see that $G$ is amenable. □

The following exercise gives the first indication of the existence of a maximal amenable normal subgroup: the amenable radical (also see Exercise 8.25).

Exercise 7.21. (a) Let $G$ be a discrete group and let $H_1 \lhd G$ and $H_2 < G$ be two amenable subgroups with $H_1$ normal. Show that $H_1 H_2 = \{h_1 h_2 \mid h_1 \in H_1, h_2 \in H_2\}$ is also an amenable subgroup. (b) Let $H_1, \dots, H_n \lhd G$ be amenable normal subgroups for some $n \geq 2$. Show that the product $H_1 H_2 \cdots H_n \lhd G$ is an amenable normal subgroup.

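Even though the above shows that many groups are amenable, it is also easy to give an example of a group that is not amenable.

Example 7.22. The free group $G = F_2$ generated by two elements $\alpha, \beta$ is not amenable. To see this, suppose that $m$ is an invariant mean on $G$. Clearly the singletons $\{e\}, \{\alpha\}, \{\alpha\alpha\}, \dots$ are all disjoint and are left-translates of each other, so we must have $m(\{e\}) = 0$. Now define
\[ S_\alpha = \{g \in G \mid \text{the reduced form of } g \text{ starts on the left with } \alpha\}, \]
and similarly define sets $S_{\alpha^{-1}}$, $S_\beta$, and $S_{\beta^{-1}}$. Since
\[ G = S_\alpha \sqcup S_{\alpha^{-1}} \sqcup S_\beta \sqcup S_{\beta^{-1}} \sqcup \{e\} \]
and $m(\{e\}) = 0$, we must have
\[ 1 = m(S_\alpha) + m(S_{\alpha^{-1}}) + m(S_\beta) + m(S_{\beta^{-1}}). \tag{7.8} \]
However, $\alpha^{-1} S_\alpha = S_\alpha \sqcup S_\beta \sqcup S_{\beta^{-1}} \sqcup \{e\}$, so
\[ m(S_\alpha) = m(S_\alpha) + m(S_\beta) + m(S_{\beta^{-1}}), \]
and hence $m(S_\beta) = m(S_{\beta^{-1}}) = 0$ by positivity. Exchanging the roles of $\alpha$ and $\beta$ also shows that $m(S_\alpha) = m(S_{\alpha^{-1}}) = 0$, which together contradict (7.8). It follows that $F_2$ cannot be amenable.

We note that Proposition 7.20 shows inductively that $G = \mathbb{Z}^d$ is amenable. This may also be seen by noting that a sequence $(F_n)$ of large boxes, for example the sequence defined by $F_n = [-n, n]^d \cap \mathbb{Z}^d$ for $n \geq 1$, is a Følner sequence as discussed now.

Definition 7.23. A sequence $(F_n)_n$ of finite subsets of a countable group $G$ is called a Følner sequence if the elements of the sequence are asymptotically translation invariant in the sense that
\[ \lim_{n \to \infty} \frac{|F_n \triangle h F_n|}{|F_n|} = 0 \tag{7.9} \]
for all $h \in G$.

Lemma 7.24. If a countable group has a Følner sequence, then it is amenable.

Proof. Let $\operatorname{LIM}_{\mathbb{N}}$ be the Banach limit from Corollary 7.14 and let $(F_n)$ be a Følner sequence. Then for any $a \in \ell^\infty(G)$ we can define
\[ \operatorname{LIM}_G(a) = \operatorname{LIM}_{\mathbb{N}}\Bigl( \frac{1}{|F_n|} \sum_{g \in F_n} a(g) \Bigr), \]
which is linear, positive, and of norm one. Then
\[ \operatorname{LIM}_G(a^h) = \operatorname{LIM}_{\mathbb{N}}\Bigl( \frac{1}{|F_n|} \sum_{g \in F_n} a(h^{-1} g) \Bigr) = \operatorname{LIM}_{\mathbb{N}}\Bigl( \frac{1}{|F_n|} \sum_{g \in h^{-1} F_n} a(g) \Bigr) = \operatorname{LIM}_{\mathbb{N}}\Bigl( \frac{1}{|F_n|} \sum_{g \in F_n} a(g) \Bigr) = \operatorname{LIM}_G(a) \quad \text{(by (7.9))}, \]
showing left-invariance. Here the third equality holds because the two averages differ by at most $\|a\|_\infty \frac{|F_n \triangle h^{-1} F_n|}{|F_n|}$, which tends to zero by (7.9), and $\operatorname{LIM}_{\mathbb{N}}$ vanishes on sequences converging to zero.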
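The Følner condition (7.9) can be checked by direct counting for the boxes $F_n = [-n, n]^d \cap \mathbb{Z}^d$ mentioned above. The sketch below (with hypothetical helper names) computes the ratio $|F_n \triangle (h + F_n)| / |F_n|$ for $d = 2$, and also the density of the even integers in $[-n, n]$, which tends to the value $m(E) = \frac12$ forced by invariance in Example 7.17.

```python
from itertools import product


def box(n, d):
    """F_n = [-n, n]^d intersected with Z^d, as a set of integer tuples."""
    return set(product(range(-n, n + 1), repeat=d))


def folner_ratio(F, h):
    """|F symmetric-difference (h + F)| / |F| for a finite subset F of Z^d."""
    shifted = {tuple(x + s for x, s in zip(g, h)) for g in F}
    return len(F ^ shifted) / len(F)


def even_density(n):
    """Proportion of even integers in [-n, n], the average of 1_E over F_n."""
    return sum(1 for k in range(-n, n + 1) if k % 2 == 0) / (2 * n + 1)


ratios = [folner_ratio(box(n, 2), (1, 0)) for n in (5, 10, 20)]
densities = [even_density(n) for n in (10, 100, 1000)]
```

For $d = 2$ and $h = (1, 0)$ the symmetric difference consists of two columns of $2n + 1$ points each, so the ratio is exactly $\frac{2}{2n+1}$, which tends to zero as required.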



Exercise 7.25. Show that the invariant means on $\mathbb{Z}$ constructed by using the Følner sequences defined by $F_n^{(1)} = [0, n]$, $F_n^{(2)} = [-n, 0]$, and $F_n^{(3)} = [n^2, n^2 + n]$ are all different. Can you construct infinitely many different invariant means on $\mathbb{Z}$?

Exercise 7.26. Show that any countable abelian group is amenable.

Exercise 7.27. Prove that the discrete Heisenberg group
\[ H = \left\{ \begin{pmatrix} 1 & k & \ell \\ 0 & 1 & m \\ 0 & 0 & 1 \end{pmatrix} \;\middle|\; k, \ell, m \in \mathbb{Z} \right\} \]
with the usual matrix multiplication is amenable.

Exercise 7.28. Show that $\operatorname{SL}_2(\mathbb{Z})$ is not amenable. You may use the fact that the group $\operatorname{PSL}_2(\mathbb{Z}) = \operatorname{SL}_2(\mathbb{Z})/\{\pm I\}$ is isomorphic to the free product of $\mathbb{Z}/2\mathbb{Z}$ and $\mathbb{Z}/3\mathbb{Z}$.

7.2.3 The Banach–Tarski Paradox

The following surprising consequence of the axiom of choice was one of the original motivations for the study of amenable groups and of measurable sets.

Theorem 7.29 (Banach–Tarski paradox(23)). The closed unit ball $B$ in $\mathbb{R}^3$ can be decomposed into finitely many disjoint sets
\[ B = P_1 \sqcup \cdots \sqcup P_m \]
with the property that there are isometric motions (that is, compositions of rotations and translations) of $\mathbb{R}^3$ sending $P_i$ to $P_i'$ for $1 \leq i \leq m$ such that
\[ B \sqcup \left( B + \begin{pmatrix} 3 \\ 0 \\ 0 \end{pmatrix} \right) = P_1' \sqcup \cdots \sqcup P_m'. \]

Clearly some of the sets appearing in the decomposition are of necessity non-measurable. The same result holds in $\mathbb{R}^d$ for any dimension $d \geq 3$, but we restrict attention to the case $d = 3$ for simplicity. However, in dimension $d = 2$ no such finite paradoxical decomposition can be found.

Notice that while the dimension $d$ plays an essential role, other aspects of the topology of $\mathbb{R}^d$ play no real role here since the sets in the decomposition are not even measurable, and in particular cannot be described using countable unions and intersections of open and closed sets. The real drama takes place in the group of isometries of $\mathbb{R}^d$, and it is differences between the structure of this group when $d = 2$ and when $d \geq 3$ that lie behind the existence of the paradoxical decomposition. For this reason we will endow the group of isometries $\operatorname{Isom}(\mathbb{R}^d) = \operatorname{O}_d(\mathbb{R}) \ltimes \mathbb{R}^d$ with the discrete topology. Doing so helps to illuminate the difference between the cases $d = 2$ and $d \geq 3$.
• For $d = 2$ the group $\operatorname{SO}_2(\mathbb{R})$ is abelian, and hence amenable (we will only be able to prove this a bit later in Exercise 8.23). The translation group $\mathbb{R}^2$ and the factor $\operatorname{O}_2(\mathbb{R})/\operatorname{SO}_2(\mathbb{R})$ are also abelian, so by Proposition 7.20 we see that $\operatorname{Isom}(\mathbb{R}^2)$ is amenable. This helps to prevent paradoxical decompositions as in the Banach–Tarski paradox for the unit ball in 2 dimensions and using isometries (see Exercise 7.30).
• For $d \geq 3$ the group $\operatorname{SO}_d(\mathbb{R})$ is far from abelian, and in fact contains a free subgroup (see Proposition 7.31) and so cannot be amenable by Proposition 7.20 and Example 7.22 (where we use the discrete topology on $\operatorname{SO}_d(\mathbb{R})$).

Exercise 7.30. (a) Let $G$ be a discrete amenable group, and suppose that $G$ acts on a set $X$. Show that there exists a (non-canonical) finitely additive $G$-invariant mean $m_X$ (that is, a function $m_X : \mathcal{P}(X) \to [0, 1]$ satisfying the first two requirements of Definition 7.16 and with $m_X(g \cdot B) = m_X(B)$ for all $B \subseteq X$ and $g \in G$) on the set of subsets $\mathcal{P}(X)$ of $X$. (b) Show that amenability of $\operatorname{Isom}(\mathbb{R}^2)$ implies that a paradoxical decomposition as in Theorem 7.29 cannot exist in 2 dimensions.

Proposition 7.31. The rotations
\[ a = \begin{pmatrix} \frac{3}{5} & -\frac{4}{5} & 0 \\ \frac{4}{5} & \frac{3}{5} & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \frac{3}{5} & -\frac{4}{5} \\ 0 & \frac{4}{5} & \frac{3}{5} \end{pmatrix} \]
in $\operatorname{SO}_3(\mathbb{R})$ generate a free subgroup $H$ of $\operatorname{SO}_3(\mathbb{R})$.

The statement is essentially geometric: $a$ and $b$ are rotations by an irrational multiple of $\pi$ fixing two orthogonal axes, as illustrated in Figure 7.2. The proposition means that any rotation obtained by composing a finite sequence of rotations $a^{\pm 1}$ and $b^{\pm 1}$ (in which a rotation is never followed by its inverse) cannot be obtained by any other such sequence of rotations.

Fig. 7.2: Two rotations generating a free subgroup.

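Proof of Proposition 7.31. Let
\[ \tilde{a}_+ = 5a = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \qquad \tilde{a}_- = 5a^{-1} = \begin{pmatrix} 3 & 4 & 0 \\ -4 & 3 & 0 \\ 0 & 0 & 5 \end{pmatrix}, \]
and similarly define $\tilde{b}_+ = 5b$ and $\tilde{b}_- = 5b^{-1}$. For some part of the proof we will be working over the field $\mathbb{F}_5$ (that is, working modulo 5). The matrices arising can all be viewed as linear transformations of the vector space $\mathbb{Z}^3/5\mathbb{Z}^3 \cong \mathbb{F}_5^3$. We want to study how they act on the vectors
\[ v = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \quad w_\alpha = \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix}, \quad w_{\alpha^{-1}} = \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix}, \quad w_\beta = \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix}, \quad w_{\beta^{-1}} = \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix}. \]
Notice that it is enough to carry out the calculations for $\tilde{a}_+$, since the situation for the other matrices is the same up to a permutation of the basis vectors. Writing $\sim$ for proportionality, we obtain
\[ \tilde{a}_+ v = \begin{pmatrix} -1 \\ 7 \\ 5 \end{pmatrix} \equiv \begin{pmatrix} 4 \\ 2 \\ 0 \end{pmatrix} \sim \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \]
\[ \tilde{a}_+ w_\alpha = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} \equiv \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \]
\[ \tilde{a}_+ w_\beta = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 2 \\ 1 \end{pmatrix} \equiv \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \]
\[ \tilde{a}_+ w_{\beta^{-1}} = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} \equiv \begin{pmatrix} 1 \\ 3 \\ 0 \end{pmatrix} \sim \begin{pmatrix} 2 \\ 1 \\ 0 \end{pmatrix} = w_\alpha, \]
but
\[ \tilde{a}_+ w_{\alpha^{-1}} = \begin{pmatrix} 3 & -4 & 0 \\ 4 & 3 & 0 \\ 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
The same applies to the other matrices, which in summary means that each of the matrices $\tilde{a}_+, \tilde{a}_-, \tilde{b}_+, \tilde{b}_-$ has its own non-zero eigenvector in $\mathbb{F}_5^3$, maps the eigenvector of the matrix with the same symbol but opposite sign to the zero vector, but maps $v$ and the other three to a multiple of its eigenvector.

Suppose now that $\gamma$ is a reduced word of length $n \geq 1$ in $F_2$, the free group with generators $\alpha$ and $\beta$ (that is, a finite string of symbols chosen from $\alpha, \alpha^{-1}, \beta, \beta^{-1}$ with the property that no symbol is immediately followed by its inverse). Define a homomorphism $\varphi : F_2 \to \operatorname{SO}_3(\mathbb{R})$ by defining it on the generators by $\varphi(\alpha) = a$, $\varphi(\beta) = b$ and then extending to $F_2$ using the homomorphism property, and use this to define $\tilde{\varphi}(\gamma) = 5^n \varphi(\gamma)$. Equivalently, $\tilde{\varphi}(\gamma) \in \operatorname{Mat}_{33}(\mathbb{Z})$ may be obtained by multiplying $\tilde{a}_+, \tilde{a}_-, \tilde{b}_+, \tilde{b}_-$ in the order and multiplicities corresponding to the appearance of $\alpha, \alpha^{-1}, \beta, \beta^{-1}$ in the word $\gamma$. As the word $\gamma$ is reduced, we see by induction on $n$ and the calculations above that $\tilde{\varphi}(\gamma) v \in \mathbb{Z}^3$ modulo 5 is a non-zero multiple of $w_\eta$ where $\eta \in \{\alpha, \alpha^{-1}, \beta, \beta^{-1}\}$ is the left-most symbol of $\gamma$. In particular, $\tilde{\varphi}(\gamma)$ is not divisible by 5 and $\varphi(\gamma) \neq I$ because $\tilde{\varphi}(\gamma) \neq 5^n I$. Thus $\varphi$ is injective and so $\operatorname{im} \varphi = \langle a, b \rangle \cong F_2$. □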
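The arithmetic modulo 5 in this proof is easy to verify mechanically. The following sketch encodes $\tilde{a}_+, \tilde{a}_-, \tilde{b}_+, \tilde{b}_-$ under the (hypothetical) symbols `a`, `A`, `b`, `B`, and checks by brute force, for all reduced words up to a chosen length, that $\tilde{\varphi}(\gamma) v$ modulo 5 is a non-zero multiple of $w_\eta$ for the left-most symbol $\eta$ of $\gamma$. It is a finite sanity check of the induction, not a replacement for it.

```python
from itertools import product

# Integer matrices 5a, 5a^{-1}, 5b, 5b^{-1}; upper case denotes the inverse generator.
M = {
    "a": ((3, -4, 0), (4, 3, 0), (0, 0, 5)),
    "A": ((3, 4, 0), (-4, 3, 0), (0, 0, 5)),
    "b": ((5, 0, 0), (0, 3, -4), (0, 4, 3)),
    "B": ((5, 0, 0), (0, 3, 4), (0, -4, 3)),
}
# The vectors w_alpha, w_{alpha^{-1}}, w_beta, w_{beta^{-1}} and v from the proof.
W = {"a": (2, 1, 0), "A": (1, 2, 0), "b": (0, 2, 1), "B": (0, 1, 2)}
V = (1, 1, 1)


def apply_mod5(m, x):
    """Matrix-vector product reduced modulo 5."""
    return tuple(sum(r * t for r, t in zip(row, x)) % 5 for row in m)


def is_reduced(word):
    """No symbol is immediately followed by its inverse."""
    return all(s != t.swapcase() for s, t in zip(word, word[1:]))


def image_of_v(word):
    """The vector phi~(word) v modulo 5; the right-most symbol acts first."""
    x = V
    for symbol in reversed(word):
        x = apply_mod5(M[symbol], x)
    return x


def is_nonzero_multiple(x, w):
    """Whether x = c * w modulo 5 for some c in {1, 2, 3, 4}."""
    return any(x == tuple(c * t % 5 for t in w) for c in range(1, 5))


def check(max_len):
    """Verify the claim of the proof for all reduced words of length <= max_len."""
    for n in range(1, max_len + 1):
        for word in product("aAbB", repeat=n):
            if is_reduced(word) and not is_nonzero_multiple(image_of_v(word), W[word[0]]):
                return False
    return True
```

Calling `check(6)` runs through all reduced words of length at most six; `apply_mod5(M["a"], W["A"])` reproduces the one exceptional case in which the image is the zero vector.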

In the proof of Theorem 7.29 the free subgroup $\langle a, b \rangle < \operatorname{SO}_3(\mathbb{R})$ from Proposition 7.31 will play a critical role. It will be convenient to define two subsets $B_1, B_2$ of $\mathbb{R}^3$ to be equivalent, written $B_1 \sim B_2$, if they can be decomposed as $B_1 = P_1 \sqcup \cdots \sqcup P_n$ and $B_2 = Q_1 \sqcup \cdots \sqcup Q_n$ into finitely many disjoint subsets such that $Q_k$ is the image of $P_k$ under some isometric motion for $k = 1, \dots, n$.

Essential Exercise 7.32. Prove that this defines an equivalence relation.

Proof of Theorem 7.29. Let $B = B_1^{\mathbb{R}^3}$ be the closed unit ball in $\mathbb{R}^3$.

Step 1. We claim that $B \sim B \setminus \{0\}$ by using a ‘Hilbert’s Hotel’ argument.(24) To see this, let $x_0 = (\frac{1}{2}, 0, 0)^t$ and let $\gamma : \mathbb{R}^3 \to \mathbb{R}^3$ be an irrational rotation (meaning that $\gamma^n = I$ for some $n \in \mathbb{Z}$ implies that $n = 0$) about the point $x_0$ in the $x$-$y$ plane extended trivially to a rotation about the line parallel to the $z$-axis through $x_0$, so that the orbit $D = \{\gamma^n(0) \mid n \in \mathbb{N}_0\} \subseteq B$ is infinite. Therefore
\[ B = B \setminus D \sqcup D \sim B \setminus D \sqcup \gamma D = B \setminus \{0\}, \]
proving the claim.

Now let $H < \operatorname{SO}_3(\mathbb{R})$ be the free subgroup constructed in Proposition 7.31. Since $H$ is countable and every non-trivial rotation in $\operatorname{SO}_3(\mathbb{R})$ has a single one-dimensional eigenspace with eigenvalue 1, we can find a countable union $E$ of lines through the origin such that $HE = E$ and with the property that no vector in $B \setminus E$ is fixed by a non-trivial element of $H$.

Step 2. By using countably many Hilbert Hotel arguments at once we claim that $B \setminus \{0\} \sim B \setminus E$. To see this, notice that the set $\mathbb{S}^2 \cap E$ is countable and so the set $P$ of pairs of vectors $v_1, v_2 \in \mathbb{S}^2 \cap E$ with $v_1 \neq v_2$ is also countable. Therefore
\[ W = \{w \in \mathbb{R}^3 \mid w \perp v_1 - v_2 \text{ for some } (v_1, v_2) \in P\} \]
is a countable union of hyperplanes, and so a null set. Fix $x_1 \in \mathbb{R}^3 \setminus (W \cup E)$ and some irrational rotation $\gamma$ about the line $\mathbb{R} x_1$. If now $\gamma^m v_1 = \gamma^n v_2$ for some $m, n \in \mathbb{N}_0$ and $v_1, v_2 \in \mathbb{S}^2 \cap E$, then, since $\gamma$ is an isometry fixing $x_1$,
\[ \langle v_1, x_1 \rangle = \langle \gamma^m v_1, x_1 \rangle = \langle \gamma^n v_2, x_1 \rangle = \langle v_2, x_1 \rangle \]
gives $v_1 = v_2$ by our choice of $x_1$. Since $v_1 \notin \mathbb{R} x_1$ and $\gamma$ is an irrational rotation about the line, we also see that $m = n$. Therefore the various sets $\gamma^n \mathbb{R} v_1 \setminus \{0\}$, where we vary $n \geq 0$ and the subspace $\mathbb{R} v_1 \subseteq E$, are all disjoint. Thus
\[ B \setminus \{0\} = \Bigl( B \setminus \bigsqcup_{n \geq 0} \gamma^n E \Bigr) \sqcup \Bigl( B \cap \bigsqcup_{n \geq 0} \gamma^n E \setminus \{0\} \Bigr) \sim \Bigl( B \setminus \bigsqcup_{n \geq 0} \gamma^n E \Bigr) \sqcup \Bigl( B \cap \bigsqcup_{n \geq 1} \gamma^n E \setminus \{0\} \Bigr) = B \setminus E, \]
as claimed.

Step 3. We claim that $B \setminus E \sim B \setminus E \sqcup (B \setminus E + (3, 0, 0)^t)$. This is clearly the main step in the argument, and it is here that we will use the fact that $H$ is a free group, and in particular the resulting decomposition
\[ H = \{e\} \sqcup S_a \sqcup S_{a^{-1}} \sqcup S_b \sqcup S_{b^{-1}} \]
obtained by taking the image of the decomposition of $F_2$ in Example 7.22. In order for the decomposition of $H$ to be useful, we have to find a cross-section $C$ of $B \setminus E$ satisfying
\[ B \setminus E = \bigsqcup_{c \in C} Hc, \]
which can be found by a direct application of the axiom of choice (and will not be measurable). We now decompose $B \setminus E$ into the four disjoint sets
\[ B_1 = S_a C \sqcup \bigsqcup_{n \geq 0} a^{-n} C, \qquad B_2 = S_{a^{-1}} C \setminus \bigsqcup_{n \geq 1} a^{-n} C, \qquad B_3 = S_b C, \]
and $B_4 = S_{b^{-1}} C$. Applying $a$ to $B_2$ we obtain
\[ a B_2 = \bigl( S_{a^{-1}} C \sqcup C \sqcup S_b C \sqcup S_{b^{-1}} C \bigr) \setminus \bigsqcup_{n \geq 0} a^{-n} C = B_2 \sqcup B_3 \sqcup B_4 \sim B_2 \sqcup \bigl( (B_3 \sqcup B_4) + (3, 0, 0)^t \bigr), \]
and applying $b$ to $B_4$ we obtain
\[ b B_4 = \bigl( S_{b^{-1}} C \sqcup C \sqcup S_a C \sqcup S_{a^{-1}} C \bigr) = B_4 \sqcup B_1 \sqcup B_2 \sim B_4 \sqcup \bigl( (B_1 \sqcup B_2) + (3, 0, 0)^t \bigr). \]
Leaving $B_1$ and $B_3$ untouched and taking the union this proves the claim in Step 3. Applying Step 2 and Step 1 (twice each) backwards, the theorem follows. □
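The combinatorial heart of Step 3 (and of Example 7.22) is the identity $a^{-1} S_a = S_a \sqcup S_b \sqcup S_{b^{-1}} \sqcup \{e\}$ in a free group on two generators. The sketch below checks a truncated form of it on reduced words of bounded length, writing `A` and `B` for the inverse generators; the encoding of group elements as strings and the helper names are assumptions made for illustration only.

```python
INVERSE = {"a": "A", "A": "a", "b": "B", "B": "b"}


def reduced_words(max_len):
    """All reduced words of length <= max_len, with "" as the identity e."""
    words = {""}
    frontier = {""}
    for _ in range(max_len):
        # Prepend a symbol unless it cancels against the current first symbol.
        frontier = {s + w for w in frontier for s in "aAbB"
                    if not w or INVERSE[s] != w[0]}
        words |= frontier
    return words


def left_multiply(s, w):
    """Reduced form of the product s * w for a generator s and a reduced word w."""
    return w[1:] if w and w[0] == INVERSE[s] else s + w


def starting_with(x, max_len):
    """The part of S_x consisting of reduced words of length <= max_len."""
    return {w for w in reduced_words(max_len) if w.startswith(x)}


n = 5
lhs = {left_multiply("A", w) for w in starting_with("a", n + 1)}   # a^{-1} S_a
rhs = {w for w in reduced_words(n) if not w.startswith("A")}       # S_a, S_b, S_{b^{-1}}, e
```

The equality of `lhs` and `rhs` says that translating the words starting with $a$ by $a^{-1}$ produces every word except those starting with $a^{-1}$; this is what allows one piece of the ball to be moved onto three pieces at once.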

7.3 The Duals of $L^p_\mu(X)$

In this section we will present descriptions of dual spaces using a bilinear pairing. If $X$ and $Y$ are vector spaces and each $y \in Y$ induces a linear functional on $X$, then we often write $\langle x, y \rangle$ for the value of the functional associated to $y \in Y$ evaluated at $x \in X$. We always assume that the linear functional depends linearly on $y \in Y$, and so $\langle \cdot, \cdot \rangle$ is a bilinear functional on $X \times Y$. We use the word pairing here to signify that, for example, the map in Exercise 7.33 may be thought of in two ways. It defines on the one hand a family of functionals parameterized by elements of $\ell^1(\mathbb{N})$ defined on sequences in $c_0(\mathbb{N})$ and on the other a family of functionals parameterized by elements of $c_0(\mathbb{N})$ defined on sequences in $\ell^1(\mathbb{N})$. Even though we will prove this in many cases (indeed, it will often be the key step in an argument), when we use this notation and terminology we do not assume that $Y$ is indeed the whole dual to $X$ or vice versa.

The reader may start with the following as a warm-up exercise on how dual spaces may be found.

Exercise 7.33. (a) Recall that $c_0(\mathbb{N}) = \{(a_n) \mid \lim_{n\to\infty} a_n = 0\} \subseteq \ell^\infty(\mathbb{N})$ is a Banach space with respect to the supremum norm $\|\cdot\|_\infty$. Show that there is an isometric isomorphism $(c_0(\mathbb{N}))^* \cong \ell^1(\mathbb{N})$, where the dual pairing is given by
\[ \langle (a_n), (b_n) \rangle = \sum_{n=1}^{\infty} a_n b_n \]
for $(a_n) \in c_0(\mathbb{N})$ and $(b_n) \in \ell^1(\mathbb{N})$.
(b) Show that $(\ell^1(\mathbb{N}))^* \cong \ell^\infty(\mathbb{N})$, with the same formula for the pairing.
(c) Show that the Banach limit $\mathrm{LIM} \in (\ell^\infty(\mathbb{N}))^*$ as in Corollary 7.14 is not in the canonical image of $\ell^1(\mathbb{N})$ in $(\ell^\infty(\mathbb{N}))^*$.
(d) Conclude that neither $c_0(\mathbb{N})$ nor $\ell^1(\mathbb{N})$ is reflexive.
(e) Now let $X$ be any infinite discrete set, and define $c_0(X)$ (and $\ell^1(X)$) to be the space of all ($\mathbb{R}$-valued or $\mathbb{C}$-valued) functions $a$ on $X$ for which there exists a sequence $(x_n)$ in $X$ with $\operatorname{Supp} a \subseteq \{x_1, x_2, \ldots\}$ and such that $(a(x_n))_n$ belongs to $c_0(\mathbb{N})$ (resp. $\ell^1(\mathbb{N})$). Generalize (a) and (b) to this context.
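One half of part (a) of Exercise 7.33 rests on a simple observation: a finitely supported sign pattern of $(b_n)$ lies in $c_0(\mathbb{N})$, has supremum norm one, and pairs with $(b_n)$ to give a partial sum of $\|(b_n)\|_1$. The following sketch illustrates this numerically for real sequences truncated to finitely many terms. The particular sequence and all function names are illustrative assumptions, not taken from the text.

```python
def pairing(a, b):
    """<a, b> = sum of a_n * b_n for finitely supported real sequences."""
    return sum(x * y for x, y in zip(a, b))

def sup_norm(a):
    return max((abs(x) for x in a), default=0.0)

def l1_norm(b):
    return sum(abs(y) for y in b)

def sign(y):
    return (y > 0) - (y < 0)

# b_n = (-1)^n / n^2, truncated to N terms (an arbitrary illustrative choice).
N = 1000
b = [(-1) ** n / n ** 2 for n in range(1, N + 1)]

# The sign pattern of b is finitely supported, hence in c_0, with sup norm 1.
a = [sign(y) for y in b]

upper = sup_norm(a) * l1_norm(b)   # general bound |<a, b>| <= ||a||_inf ||b||_1
attained = pairing(a, b)           # for this a the bound is attained
print(attained, upper)
```

Letting the truncation length grow shows that the operator norm of the functional defined by $(b_n)$ is at least $\|(b_n)\|_1$; the reverse inequality is the general bound computed in `upper`.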

7.3.1 The Dual of $L^1_\mu(X)$

We start by generalizing the second part of Exercise 7.33.

Proposition 7.34. Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space. Then
\[ (L^1_\mu(X))^* \cong L^\infty_\mu(X) \]
under the pairing $\langle f, g \rangle = \int_X f g \, d\mu$ for $f \in L^1_\mu(X)$ and $g \in L^\infty_\mu(X)$. The operator norm of the functional defined by $g \in L^\infty_\mu(X)$ is the essential supremum norm $\|g\|_\infty$ (defined on p. 29).

Proof. As indicated, we associate to every $g \in L^\infty_\mu(X)$ the functional
\[ \phi(g) : f \longmapsto \int_X f g \, d\mu \]
which satisfies
\[ \|\phi(g)\|_{\mathrm{op}} = \sup_{\|f\|_1 \leq 1} \left| \int_X f g \, d\mu \right| \leq \sup_{\|f\|_1 \leq 1} \int_X |f| |g| \, d\mu \leq \|g\|_\infty. \]
For the converse we assume that $g \neq 0$, let $\varepsilon \in (0, \|g\|_\infty)$ and choose a measurable set
\[ A \subseteq \{x \in X \mid |g(x)| > \|g\|_\infty - \varepsilon\} \]
with $\mu(A) > 0$ (which is possible by definition of the essential supremum) and with $\mu(A) < \infty$ (which is possible since $\mu$ is σ-finite). Now define
\[ f = \frac{1}{\mu(A)} \mathbb{1}_A \frac{|g(x)|}{g(x)}, \]
and notice that $\|f\|_1 = 1$ and
\[ \phi(g)(f) = \int_X f g \, d\mu = \frac{1}{\mu(A)} \int_A |g| \, d\mu > \|g\|_\infty - \varepsilon. \]
This shows that $\|\phi(g)\|_{\mathrm{op}} = \|g\|_\infty$, so $\phi : L^\infty_\mu(X) \to (L^1_\mu(X))^*$ is an isometric embedding.

It remains to show that $\phi$ is onto (this is often the most interesting part of the identification of a dual space). For this, assume first that $\mu$ is finite. Then $L^2_\mu(X) \subseteq L^1_\mu(X)$ and
\[ \|f\|_1 \leq \mu(X)^{1/2} \|f\|_2 \]
for all $f \in L^2_\mu(X)$ by the Cauchy–Schwarz inequality. Therefore, a functional $\ell \in (L^1_\mu(X))^*$ also induces a functional $\ell' \in (L^2_\mu(X))^*$. Since $L^2_\mu(X)$ is a Hilbert space, the Fréchet–Riesz representation theorem (Corollary 3.19) now shows that there exists some $g \in L^2_\mu(X)$ with
\[ \ell'(f) = \int_X f g \, d\mu \]
for all $f \in L^2_\mu(X)$. We now show that $g \in L^\infty_\mu(X)$. Let
\[ A = \{x \in X \mid |g(x)| > \|\ell\|_{\mathrm{op}}\}, \]
so that $f = \mathbb{1}_A \frac{|g|}{g} \in L^2_\mu(X) \subseteq L^1_\mu(X)$ and $\|f\|_1 = \mu(A)$. If $\mu(A) > 0$ then

kℓkop µ(A)
Exercise 7.35. Let $p, q \in (1, \infty)$ with $\frac{1}{p} + \frac{1}{q} = 1$. Show that $(\ell^p(\mathbb{N}))^* \cong \ell^q(\mathbb{N})$.

The following provides us with many examples of reflexive spaces.

Proposition 7.36. Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space, and assume that $p \in (1, \infty)$ has Hölder conjugate $q$. Then $(L^p_\mu(X))^* \cong L^q_\mu(X)$ via the pairing
\[ \langle f, g \rangle = \int_X f g \, d\mu \]
for $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$. The operator norm of the functional determined by $g$ is precisely $\|g\|_q$.

Proof. For $f \in L^p_\mu(X)$ and $g \in L^q_\mu(X)$ with $\frac{1}{p} + \frac{1}{q} = 1$ we have
\[ |\langle f, g \rangle| \leq \|f\|_p \|g\|_q \]
by the Hölder inequality (Theorem B.15). It follows that the linear functional defined by $g$ on $L^p_\mu(X)$ is bounded, with norm less than or equal to $\|g\|_q$. If we set
\[ f = \begin{cases} \dfrac{|g|}{g} |g|^{q/p} & \text{if } g \neq 0, \\ 0 & \text{if } g = 0 \end{cases} \]
then
\[ \|f\|_p = \left( \int_X |g|^q \, d\mu \right)^{1/p} = \|g\|_q^{q/p} \]
and
\[ \langle f, g \rangle = \int_X |g|^{1 + \frac{q}{p}} \, d\mu = \int_X |g|^q \, d\mu \]
[…] $0\}$. Thus for a general $\ell$ we would like to show that $\nu^+ \ll \mu$ is an absolutely continuous measure on $X$ (which then will give us $g^+$ as a Radon–Nikodym derivative). Clearly $\nu^+(B_2) \geq \nu^+(B_1) \geq 0$ for measurable $B_1 \subseteq B_2 \subseteq X$. For measurable disjoint $B_1, B_2 \subseteq X$ and $A_1 \subseteq B_1$, $A_2 \subseteq B_2$ as in the definition of $\nu^+$, we have

\[ \ell(\mathbb{1}_{A_1}) + \ell(\mathbb{1}_{A_2}) = \ell(\mathbb{1}_{A_1 \cup A_2}) \leq \nu^+(B_1 \cup B_2), \]
and taking the supremum over $A_1$ and $A_2$ gives $\nu^+(B_1) + \nu^+(B_2) \leq \nu^+(B_1 \cup B_2)$. If, on the other hand, $A \subseteq B_1 \cup B_2$ has finite measure we define $A_i = A \cap B_i$ for $i = 1, 2$ and see that
\[ \ell(\mathbb{1}_A) = \ell(\mathbb{1}_{A_1}) + \ell(\mathbb{1}_{A_2}) \leq \nu^+(B_1) + \nu^+(B_2). \]
Hence on taking the supremum over $A$ we get $\nu^+(B_1) + \nu^+(B_2) = \nu^+(B_1 \cup B_2)$.

Suppose now that
\[ B = \bigsqcup_{n=1}^{\infty} B_n. \]
Then $\sum_{n=1}^{N} \nu^+(B_n) = \nu^+(B_1 \cup \cdots \cup B_N) \leq \nu^+(B)$ for all $N \geq 1$ by the above finite additivity and monotonicity, and so
\[ \sum_{n=1}^{\infty} \nu^+(B_n) \leq \nu^+(B). \]
To see the converse, let $A \subseteq B$ be measurable with finite measure, and define $A_n = A \cap B_n$. Clearly $\mathbb{1}_A = \sum_{n=1}^{\infty} \mathbb{1}_{A_n}$ pointwise, but by the dominated convergence theorem this convergence also holds with respect to $\|\cdot\|_p$ (since $p < \infty$). Since $\ell$ is continuous, it follows that
\[ \ell(\mathbb{1}_A) = \sum_{n=1}^{\infty} \ell(\mathbb{1}_{A_n}) \leq \sum_{n=1}^{\infty} \nu^+(B_n). \]
As this holds for all $A \subseteq B$ with finite measure we get
\[ \nu^+\Bigl( \bigsqcup_{n=1}^{\infty} B_n \Bigr) = \sum_{n=1}^{\infty} \nu^+(B_n), \]

and so $\nu^+$ is a measure on $X$. Finally, if $B \subseteq X$ has finite $\mu$-measure, then
\[ \ell(\mathbb{1}_A) \leq \|\ell\|_{\mathrm{op}} \|\mathbb{1}_A\|_p = \|\ell\|_{\mathrm{op}} \mu(A)^{1/p} \leq \|\ell\|_{\mathrm{op}} \mu(B)^{1/p} \]
for all $A \subseteq B$, which shows that $\nu^+(B) \leq \|\ell\|_{\mathrm{op}} \mu(B)^{1/p}$. It follows that $\nu^+ \ll \mu$ is absolutely continuous and is σ-finite since $\mu$ is assumed to be σ-finite. By the Radon–Nikodym theorem (see Proposition 3.29) we have $d\nu^+ = g^+ \, d\mu$ for some measurable function $g^+ \geq 0$. We claim that $g^+ \in L^q_\mu(X)$ is the positive part of the element $g \in L^q_\mu(X)$ we are looking for.

For this we first have to check that $g^+ \in L^q_\mu(X)$, which we do by estimating the $L^q_\mu$ norms of
\[ g_n^+ = \min\{n, g^+ \mathbb{1}_{X_n}\}, \]
where $(X_n)$, with $X_n \subseteq X_{n+1}$ for all $n \geq 1$, is a sequence of sets with finite measure and $X = \bigcup_{n=1}^{\infty} X_n$. Notice that $g_n^+ \nearrow g^+$ as $n \to \infty$. Now let $h \geq 0$ be a simple function of the form $h = \sum_{j=1}^{m} \beta_j \mathbb{1}_{B_j}$, where $\beta_j \geq 0$ for all $j$ and with the sets $B_j$ measurable and pairwise disjoint. Then
\[ \int h g_n^+ \, d\mu \leq \int h g^+ \, d\mu = \sum_{j=1}^{m} \beta_j \sup\bigl\{ \ell(\mathbb{1}_{A_j}) \mid A_j \subseteq B_j \bigr\} = \sup\Bigl\{ \ell\Bigl( \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \Bigr) \Bigm| A_j \subseteq B_j \Bigr\}, \]
but the expressions inside the last supremum we may estimate by
\[ \ell\Bigl( \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \Bigr) \leq \|\ell\|_{\mathrm{op}} \Bigl\| \sum_{j=1}^{m} \beta_j \mathbb{1}_{A_j} \Bigr\|_p \leq \|\ell\|_{\mathrm{op}} \|h\|_p \]
and so
\[ \int h g_n^+ \, d\mu \leq \|\ell\|_{\mathrm{op}} \|h\|_p. \]
Using monotone convergence this estimate extends to all positive $h \in L^p_\mu(X)$. Applying the argument (for $g_n^+ \in L^q_\mu(X)$) from the beginning of the proof this shows that $\|g_n^+\|_q \leq \|\ell\|_{\mathrm{op}}$ and letting $n \to \infty$ also shows that $\|g^+\|_q \leq \|\ell\|_{\mathrm{op}}$ by monotone convergence.

Now define
\[ \ell^- = \phi(g^+) - \ell \in (L^p_\mu(X))^* \]
where $\phi(g^+)$ is the functional determined by $g^+ \in L^q_\mu(X)$. Notice that for all $B \subseteq X$ measurable with $\mu(B) < \infty$ we have

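\[ \ell^-(\mathbb{1}_B) = \int \mathbb{1}_B g^+ \, d\mu - \ell(\mathbb{1}_B) = \sup\{ \ell(\mathbb{1}_A - \mathbb{1}_B) \mid A \text{ measurable}, A \subseteq B \} = \sup\{ -\ell(\mathbb{1}_C) \mid C \text{ measurable}, C \subseteq B \}, \tag{7.10} \]
where we used the identity $\mathbb{1}_A - \mathbb{1}_B = -\mathbb{1}_C$ for $C = B \setminus A$ and $A \subseteq B$. Using $-\ell$ in the construction above we also obtain a measure $d\nu^- = g^- \, d\mu$ for some $g^- \in L^q_\mu(X)$. For a measurable set $B \subseteq X$ with $\mu(B) < \infty$ we now obtain from (7.10) that
\[ \int \mathbb{1}_B g^+ \, d\mu - \ell(\mathbb{1}_B) = \nu^-(B) = \int \mathbb{1}_B g^- \, d\mu \]
or equivalently
\[ \ell(\mathbb{1}_B) = \int \mathbb{1}_B (g^+ - g^-) \, d\mu = \int \mathbb{1}_B g \, d\mu \]
for $g = g^+ - g^- \in L^q_\mu(X)$. By the density of simple functions in $L^p_\mu(X)$ we conclude that $\ell = \phi(g)$, as required.

7.3.3 Riesz–Thorin Interpolation †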

† The results of this subsection conclude our discussion of $L^p$-spaces but will not be needed in the remainder of the book.

The Riesz–Thorin interpolation theorem (also called the Riesz–Thorin convexity theorem) bounds the norms of linear maps between $L^p$ spaces. This can be useful because certain $L^p$ spaces have special properties making it easier to understand properties of operators on them; this particularly applies to the cases $p = 1$, $2$, and $\infty$.

Proposition 7.37. Let $(X, \mathcal{B}, \mu)$ be a measure space, and assume that $1 \leq q_0 < q < q_1 \leq \infty$. Then
\[ L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X) \subseteq L^q_\mu(X) \]
and $\|f\|_q \leq \|f\|_{q_0}^{1-t} \|f\|_{q_1}^{t}$ for all $f \in L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X)$, where $t \in (0, 1)$ is determined by the relation $\frac{1}{q} = \frac{1-t}{q_0} + \frac{t}{q_1}$.

Proof. We first assume that $q_1 = \infty$, and note that $|f|^q \leq |f|^{q_0} \|f\|_\infty^{q - q_0}$ almost everywhere for $f \in L^{q_0}_\mu(X) \cap L^\infty_\mu(X)$. Integrating over $X$ and taking the $q$th root gives $\|f\|_q \leq \|f\|_{q_0}^{q_0/q} \|f\|_\infty^{(q - q_0)/q}$. Moreover, since $q_1 = \infty$ we have $\frac{1}{q} = \frac{1-t}{q_0}$, giving the proposition in this case.

Now suppose that $q_1 < \infty$. In this case the numbers $\frac{q_0}{(1-t)q}$ and $\frac{q_1}{tq}$ are Hölder conjugate by definition of $t$. Let $f \in L^{q_0}_\mu(X) \cap L^{q_1}_\mu(X)$. Applying Hölder's inequality (Theorem B.15) gives
\[ \int |f|^q \, d\mu = \int |f|^{(1-t)q} |f|^{tq} \, d\mu \leq \bigl\| |f|^{(1-t)q} \bigr\|_{q_0/((1-t)q)} \bigl\| |f|^{tq} \bigr\|_{q_1/(tq)} = \|f\|_{q_0}^{(1-t)q} \|f\|_{q_1}^{tq}. \]
Taking the $q$th root gives the proposition.
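The inequality of Proposition 7.37 is easy to test numerically when $\mu$ is the counting measure on a finite set, so that functions are just finite lists. The following is a minimal sketch under that assumption; the function names and the sample exponents are hypothetical choices made for illustration.

```python
import math
import random

def p_norm(f, p):
    """||f||_p for the counting measure on a finite set; p may be math.inf."""
    if p == math.inf:
        return max(abs(x) for x in f)
    return sum(abs(x) ** p for x in f) ** (1.0 / p)

def interpolation_parameter(q0, q, q1):
    """The t in (0, 1) with 1/q = (1 - t)/q0 + t/q1, reading 1/inf as 0."""
    def inv(r):
        return 0.0 if r == math.inf else 1.0 / r
    return (inv(q0) - inv(q)) / (inv(q0) - inv(q1))

def check(f, q0, q, q1):
    """Return both sides of ||f||_q <= ||f||_{q0}^(1-t) * ||f||_{q1}^t."""
    t = interpolation_parameter(q0, q, q1)
    lhs = p_norm(f, q)
    rhs = p_norm(f, q0) ** (1 - t) * p_norm(f, q1) ** t
    return lhs, rhs

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(50)]
results = [check(f, 1, 2, 4), check(f, 1.5, 3, math.inf), check(f, 2, 5, 7)]
for lhs, rhs in results:
    print(lhs <= rhs, lhs, rhs)
```

The second triple exercises the case $q_1 = \infty$ treated first in the proof, the other two the case $q_1 < \infty$.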



We now consider a linear map that is defined not just on one $L^p$ space but on several, taking values in possibly different $L^q$ spaces.

Theorem 7.38 (Riesz–Thorin interpolation). Let $(X, \mathcal{B}, \mu)$ and $(Y, \mathcal{C}, \nu)$ be two σ-finite measure spaces and let $p_0, p_1, q_0, q_1 \in [1, \infty]$. Let
\[ T : L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X) \longrightarrow L^{q_0}_\nu(Y) \cap L^{q_1}_\nu(Y) \]
be a linear map such that
\[ \|Tf\|_{q_0} \leq M_0 \|f\|_{p_0} \quad \text{and} \quad \|Tf\|_{q_1} \leq M_1 \|f\|_{p_1} \]
for all $f \in L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X)$ and some constants $M_0, M_1 > 0$. Then $T$ has a linear extension to a linear space $D$ of (equivalence classes of) functions on $X$ into the space $L^{q_0}_\nu(Y) + L^{q_1}_\nu(Y)$ with the following properties. If we define $p_t$ and $q_t$ for any $t \in (0, 1)$ by
\[ \frac{1}{p_t} = \frac{1-t}{p_0} + \frac{t}{p_1} \quad \text{and} \quad \frac{1}{q_t} = \frac{1-t}{q_0} + \frac{t}{q_1} \]
then we have $L^{p_t}_\mu(X) \subseteq D$ and
\[ \|Tf\|_{q_t} \leq M_0^{1-t} M_1^{t} \|f\|_{p_t} \]
for all $f \in L^{p_t}_\mu(X)$. The conclusion also holds for $t = 0$ if $p_0 < \infty$ and for $t = 1$ if $p_1 < \infty$.

Example 7.39. An interesting example to keep in mind is an application of the theorem known as the Hausdorff–Young inequality. For this, consider the map from Theorem 3.47 sending a function $f \in L^2(G)$ for a compact abelian group $G$ to the map $a^{(f)}$ on the set $\widehat{G}$ of characters defined by
\[ a^{(f)}(\chi) = a^{(f)}_\chi = \langle f, \chi \rangle \]
for every $\chi \in \widehat{G}$. For $f \in L^2(G)$ we have $a^{(f)} \in \ell^2(\widehat{G})$ and $\|a^{(f)}\|_2 = \|f\|_2$; for $f \in L^1(G)$ we have $a^{(f)} \in \ell^\infty(\widehat{G})$ with $\|a^{(f)}\|_\infty \leq \|f\|_1$. Formally, we have $p_0 = 2 = q_0$, $p_1 = 1$, $q_1 = \infty$, and $M_0 = M_1 = 1$. The above interpolation theorem now implies that the map is also defined for functions $f \in L^p(G)$ with $p \in [1, 2]$, taking values in a certain $\ell^q(\widehat{G})$. A short calculation reveals that in this case $q \in [2, \infty]$ is the Hölder conjugate of $p \in [1, 2]$.

Proof of Theorem 7.38 in the case $p_0 = p_1$. Set $D = L^{p_0}_\mu(X)$, and notice that for $f \in L^{p_0}_\mu(X)$ we have $\|Tf\|_{q_0} \leq M_0 \|f\|_{p_0}$ and $\|Tf\|_{q_1} \leq M_1 \|f\|_{p_0}$. Applying Proposition 7.37 gives the theorem in this case.

For the general case we will need the following result from complex analysis.

Lemma 7.40 (Hadamard's three-lines theorem). Let $\phi$ be a continuous bounded function on the strip $S = \{z \in \mathbb{C} \mid 0 \leq \Re(z) \leq 1\}$ that is holomorphic on $S^o$. If $|\phi(z)| \leq M_0$ when $\Re(z) = 0$ and $|\phi(z)| \leq M_1$ when $\Re(z) = 1$, then $|\phi(t + iu)| \leq M_0^{1-t} M_1^{t}$ for all $u \in \mathbb{R}$ and $t \in [0, 1]$.

Proof. Assume first that $M_0, M_1 > 0$. For $\varepsilon > 0$, define
\[ \phi_\varepsilon(z) = \phi(z) M_0^{z-1} M_1^{-z} e^{\varepsilon z^2 - \varepsilon} \]
and note that
\[ \lim_{\varepsilon \to 0} \phi_\varepsilon(z) = \phi(z) M_0^{z-1} M_1^{-z} \tag{7.11} \]
for all $z \in S$. Moreover, $\phi_\varepsilon$ is continuous on $S$ and holomorphic on $S^o$. Setting $z = t + iu \in S$ we also have $z^2 = t^2 - u^2 + 2itu$ and so
\[ |\phi_\varepsilon(z)| = |\phi(z)| M_0^{t-1} M_1^{-t} e^{\varepsilon t^2 - \varepsilon u^2 - \varepsilon} \leq |\phi(z)| M_0^{t-1} M_1^{-t} e^{-\varepsilon u^2}, \]
since $t \in [0, 1]$. For $t = \Re(z) = 0$ or $t = \Re(z) = 1$ this gives $|\phi_\varepsilon(z)| \leq 1$ by assumption on $\phi$. Moreover,
\[ \lim_{|u| \to \infty} |\phi_\varepsilon(z)| = 0, \]
where $u = \Im(z)$. Applying the maximum modulus theorem on
\[ \{z \in \mathbb{C} \mid 0 \leq \Re(z) \leq 1, \ -N_\varepsilon \leq \Im(z) \leq N_\varepsilon\} \]
for sufficiently large $N_\varepsilon$ we see that $|\phi_\varepsilon(z)| \leq 1$ for all $z \in S$. By (7.11) it follows that
\[ |\phi(z)| \leq |M_0^{1-z} M_1^{z}| = M_0^{1-t} M_1^{t} \]
for $z \in S$ with $t = \Re(z)$. If $M_0 = 0$ (or $M_1 = 0$) we may apply the argument above with $M_0$ (or $M_1$) replaced by any $\delta > 0$ and obtain the lemma by letting $\delta \to 0$.

Proof of Theorem 7.38 in the case $p_0 \neq p_1$. Our first goal is the inequality
\[ \|Tf\|_{q_t} \leq M_0^{1-t} M_1^{t} \|f\|_{p_t} \tag{7.12} \]
for a fixed $t \in (0, 1)$ and all $f \in \Sigma_X$. Here $\Sigma_X$ denotes the space of simple integrable functions on $X$ (and $\Sigma_Y$ is defined similarly). Then $\Sigma_X \subseteq L^p_\mu(X)$ for all $p \in [1, \infty]$ and in particular $T$ is defined on $\Sigma_X$ and satisfies
\[ T(\Sigma_X) \subseteq L^{q_0}_\nu(Y) \cap L^{q_1}_\nu(Y) \subseteq L^{q_t}_\nu(Y) \]
by the assumption in the theorem and Proposition 7.37. Assume for the moment that $q_t \in (1, \infty]$. Then the Hölder conjugate $q_t'$ of $q_t$ belongs to $[1, \infty)$ and $\Sigma_Y$ is dense in $L^{q_t'}_\nu(Y)$. Fix some $f \in \Sigma_X$ and assume that

\[ \left| \int (Tf) g \, d\nu \right| \leq M_0^{1-t} M_1^{t} \|f\|_{p_t} \|g\|_{q_t'} \tag{7.13} \]
for all $g \in \Sigma_Y$. Then the above and Propositions 7.34 and 7.36 imply (7.12).

The case $q_t = 1$ with $q_t' = \infty$ is only slightly different. Assume again (7.13) and fix some measurable set $B \subseteq Y$ with $\nu(B) < \infty$. Then
\[ \{g \in \Sigma_Y \mid g(y) = 0 \text{ for } y \in Y \setminus B\} \]
is dense in $L^\infty_\nu(B)$ and as before (see also Corollary 7.9) we obtain $\|(Tf)|_B\|_1 \leq M_0^{1-t} M_1^{t}$ independent of $B$. Using the fact that $\nu$ is σ-finite, this again implies (7.12).

For the proof of (7.13) it suffices to fix some $t \in (0, 1)$, some $f \in \Sigma_X$ with $\|f\|_{p_t} = 1$, and some $g \in \Sigma_Y$ with $\|g\|_{q_t'} = 1$. By definition, we may express $f$ and $g$ as finite sums
\[ f = \sum_{j=1}^{m} c_j \mathbb{1}_{E_j}, \]
with $c_j \in \mathbb{C}$, $E_j \in \mathcal{B}$, $\mu(E_j) < \infty$ for $1 \leq j \leq m$, and
\[ g = \sum_{k=1}^{n} d_k \mathbb{1}_{F_k}, \]
with $d_k \in \mathbb{C}$, $F_k \in \mathcal{C}$, $\nu(F_k) < \infty$ for $1 \leq k \leq n$. We may also assume that the sets $E_j$ are pairwise disjoint, and similarly for the sets $F_k$.

A holomorphic function on the strip. Next define
\[ \alpha(z) = (1 - z) p_0^{-1} + z p_1^{-1}, \qquad \beta(z) = (1 - z) q_0^{-1} + z q_1^{-1} \]
for $z \in \mathbb{C}$ and observe that $\alpha(t) = p_t^{-1}$ and $\beta(t) = q_t^{-1}$ for the fixed $t \in (0, 1)$. Notice that $\alpha(t) > 0$ (since otherwise $p_0 = p_1 = \infty$, a case already considered) and that $\beta(t) = 1$ implies that $q_0 = q_1 = q_t = 1$, $q_0' = q_1' = q_t' = \infty$, and also $\beta(z) = 1$ for all $z \in \mathbb{C}$. For any $z \in \mathbb{C}$ we now define
\[ f_z = \sum_{j=1}^{m} |c_j|^{\alpha(z)/\alpha(t)} \arg(c_j) \mathbb{1}_{E_j}, \]
\[ g_z = \begin{cases} \displaystyle\sum_{k=1}^{n} |d_k|^{(1 - \beta(z))/(1 - \beta(t))} \arg(d_k) \mathbb{1}_{F_k} & \text{if } \beta(t) < 1; \\ g & \text{if } \beta(t) = 1. \end{cases} \]

Notice that $f_z \in \Sigma_X$, $g_z \in \Sigma_Y$ for all $z \in \mathbb{C}$, $f_t = f$, and $g_t = g$. Also define
\[ \phi(z) = \int (T f_z) g_z \, d\nu = \sum_{j,k} A_{jk} |c_j|^{\alpha(z)/\alpha(t)} |d_k|^{(1 - \beta(z))/(1 - \beta(t))}, \]
where the coefficient
\[ A_{jk} = \arg(c_j d_k) \int (T \mathbb{1}_{E_j}) \mathbb{1}_{F_k} \, d\nu \in \mathbb{C} \]
is independent of $z$ for all $j$ and $k$. It follows that $\phi$ is just a finite linear combination of exponential functions of the form $z \mapsto a^z$ for some $a > 0$, so that $\phi$ is entire and bounded on the strip $S = \{z \in \mathbb{C} \mid 0 \leq \Re(z) \leq 1\}$. Since $f_t = f$ and $g_t = g$ we also have
\[ \phi(t) = \int (Tf) g \, d\nu, \]
which is the quantity that we wish to estimate. The desired estimate will follow from Lemma 7.40 once we establish its remaining assumptions.

Boundary estimate: Consider therefore $z = iu$ with $\Re(z) = 0$ and notice that $\Re(\alpha(iu)) = p_0^{-1}$ and $\Re(1 - \beta(iu)) = 1 - q_0^{-1} = (q_0')^{-1}$. Since the sets $E_1, \ldots, E_m$, respectively $F_1, \ldots, F_n$, are disjoint, this gives
\[ |f_{iu}| = |f|^{\Re(\alpha(iu)/\alpha(t))} = |f|^{p_t/p_0} \]
and
\[ |g_{iu}| = \begin{cases} |g|^{\Re((1 - \beta(iu))/(1 - \beta(t)))} = |g|^{q_t'/q_0'} & \text{if } \beta(t) < 1; \\ |g| & \text{if } \beta(t) = 1. \end{cases} \]
Using the assumption on $T$ this gives
\[ |\phi(iu)| \leq \|T f_{iu}\|_{q_0} \|g_{iu}\|_{q_0'} \leq M_0 \|f_{iu}\|_{p_0} \|g_{iu}\|_{q_0'} = \begin{cases} M_0 \|f\|_{p_t}^{p_t/p_0} \|g\|_{q_t'}^{q_t'/q_0'} = M_0 & \text{if } \beta(t) < 1; \\ M_0 \|f\|_{p_t}^{p_t/p_0} \|g\|_\infty = M_0 & \text{if } \beta(t) = 1 \end{cases} \]
since $\|f\|_{p_t} = \|g\|_{q_t'} = 1$. The case of $z = 1 + iu$ with $\Re(z) = 1$ works similarly after noticing that $\Re(\alpha(1 + iu)) = p_1^{-1}$ and $\Re(\beta(1 + iu)) = q_1^{-1}$, which leads to the bound $|\phi(1 + iu)| \leq M_1$.

The estimate: By Lemma 7.40 we obtain
\[ |\phi(t)| = \left| \int (Tf) g \, d\nu \right| \leq M_0^{1-t} M_1^{t}. \]

Since $g \in \Sigma_Y$ was arbitrary subject to $\|g\|_{q_t'} = 1$, this implies (as explained after (7.13)) that
\[ \|Tf\|_{q_t} \leq M_0^{1-t} M_1^{t} \]
for any $f \in \Sigma_X$ with $\|f\|_{p_t} = 1$. By the density of $\Sigma_X$ in $L^{p_t}_\mu(X)$ and Proposition 2.59 it follows that $T$ has a unique extension $T_{p_t}$ to all of $L^{p_t}_\mu(X)$ with values in $L^{q_t}_\nu(Y)$ such that $\|T_{p_t} f\|_{q_t} \leq M_0^{1-t} M_1^{t} \|f\|_{p_t}$ for $f \in L^{p_t}_\mu(X)$.

One linear map: It remains to find one common domain $D$ that contains $L^{p_t}_\mu(X)$ for all $t$ in $(0, 1)$ and a linear map on $D$ that extends the extension above. For this we let $D_0 = L^{p_0}_\mu(X) \cap L^{p_1}_\mu(X)$ and
\[ D = \overline{D_0}^{\,p_0} + \overline{D_0}^{\,p_1} \subseteq L^{p_0}_\mu(X) + L^{p_1}_\mu(X), \]
where $\overline{\,\cdot\,}^{\,p_t}$ denotes the closure with respect to $\|\cdot\|_{p_t}$ for $t \in [0, 1]$, and we have
\[ D = L^{p_0}_\mu(X) + L^{p_1}_\mu(X) \]
unless $\infty \in \{p_0, p_1\}$. Applying the assumption of the theorem and Proposition 2.59 we find extensions of $T$ (initially only defined on $D_0$): $T_{p_0}$ extending $T$ to $\overline{D_0}^{\,p_0}$ and $T_{p_1}$ extending $T$ to $\overline{D_0}^{\,p_1}$. We now define
\[ T(f_0 + f_1) = T_{p_0}(f_0) + T_{p_1}(f_1) \]
for $f_0 \in \overline{D_0}^{\,p_0}$ and $f_1 \in \overline{D_0}^{\,p_1}$. We claim that this gives a well-defined extension of $T$ to $D$ with values in $L^{q_0}_\nu(Y) + L^{q_1}_\nu(Y)$. Indeed, if $f_0 + f_1 = f_0' + f_1' \in D$ with $f_0, f_0' \in \overline{D_0}^{\,p_0}$ and $f_1, f_1' \in \overline{D_0}^{\,p_1}$ then $f_0 - f_0' = f_1' - f_1 \in D_0$ and
\[ T_{p_0}(f_0) - T_{p_0}(f_0') = T(f_0 - f_0') = T(f_1' - f_1) = T_{p_1}(f_1') - T_{p_1}(f_1). \]
Rearranging the terms, the claim follows.

The map on $L^{p_t}_\mu(X)$: Now let $f \in L^{p_t}_\mu(X)$ and assume without loss of generality that $p_0 < p_1$. We define the set $B = \{x \in X \mid |f(x)| \leq 1\}$ and use it to split $f = f^s + f^l$ into a component $f^s$ taking on small values and a component $f^l$ taking on large values, where
\[ f^s = f \mathbb{1}_B \in L^{p_t}_\mu(X) \cap L^{p_1}_\mu(X) \quad \text{and} \quad f^l = f \mathbb{1}_{X \setminus B} \in L^{p_t}_\mu(X) \cap L^{p_0}_\mu(X). \]
If we now choose a sequence $(f_n)$ in $\Sigma_X$ with $|f_n| \leq |f|$ for all $n \geq 1$ and with $f_n \to f$ pointwise as $n \to \infty$ then $f_n^s = f_n \mathbb{1}_B \to f^s$ in $L^{p_t}_\mu(X)$ and in $L^{p_1}_\mu(X)$ by dominated convergence if $p_1 < \infty$. If however $p_1 = \infty$ then we can choose the sequence $(f_n)$ of simple functions to also have $f_n^s \to f^s$ with respect to $\|\cdot\|_\infty$ as $n \to \infty$. Similarly, $f_n^l = f_n \mathbb{1}_{X \setminus B} \to f^l$ in $L^{p_t}_\mu(X)$ and in $L^{p_0}_\mu(X)$. Therefore, $T(f_n^s) \to T_{p_t}(f^s)$ in $L^{q_t}_\nu(Y)$ and $T(f_n^s) \to T_{p_1}(f^s)$ in $L^{q_1}_\nu(Y)$ as $n \to \infty$. Choosing a subsequence if necessary, the convergence

also holds pointwise almost everywhere, which gives $T_{p_t}(f^s) = T_{p_1}(f^s)$. The same argument gives $T_{p_t}(f^l) = T_{p_0}(f^l)$, and $T_{p_t}(f) = T(f)$ follows.

Exercise 7.41. Show that $t \mapsto \log \|T_{p_t}\|$ is convex for $t \in (0, 1)$, where $T_{p_t} : L^{p_t}_\mu(X) \to L^{q_t}_\nu(Y)$ is as in the proof above.

Exercise 7.42. Let $G$ be a locally compact, σ-compact, metrizable, abelian group. Fix some $p \in [1, \infty)$ with Hölder conjugate $q$ and some $F \in L^p(G)$. Show (or recall) that
\[ f * F(x) = \int f(t) F(x - t) \, dm_G(t) \]
is well-defined almost everywhere for $f \in L^1(G)$ with $\|f * F\|_p \leq \|f\|_1 \|F\|_p$ and also for $f \in L^q(G)$ with $\|f * F\|_\infty \leq \|f\|_q \|F\|_p$. Apply the Riesz–Thorin interpolation theorem to obtain a conclusion for all $f \in L^r(G)$ with $r \in [1, q]$.
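Returning to Example 7.39, the "short calculation" there amounts to $\frac{1}{p_t} = \frac{1-t}{2} + t$ and $\frac{1}{q_t} = \frac{1-t}{2}$, which sum to $1$. The resulting Hausdorff–Young inequality $\|a^{(f)}\|_q \leq \|f\|_p$ can be checked numerically on the finite group $G = \mathbb{Z}/N\mathbb{Z}$. The sketch below assumes normalized counting measure on $G$, counting measure on the dual group, and the characters $\chi_k(x) = e^{2\pi i k x / N}$; the function names are hypothetical.

```python
import cmath
import math
import random

def haar_norm(f, p):
    """L^p norm on Z/N with normalized counting (Haar) measure."""
    n = len(f)
    return (sum(abs(x) ** p for x in f) / n) ** (1.0 / p)

def coeff_norm(a, q):
    """l^q norm on the dual group with counting measure; q may be math.inf."""
    if q == math.inf:
        return max(abs(x) for x in a)
    return sum(abs(x) ** q for x in a) ** (1.0 / q)

def fourier_coefficients(f):
    """a_k = <f, chi_k> with chi_k(x) = exp(2 pi i k x / N)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * math.pi * k * x / n) for x in range(n)) / n
            for k in range(n)]

random.seed(1)
N = 16
f = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(N)]
a = fourier_coefficients(f)

checks = []
for p in (1.0, 1.25, 1.5, 2.0):
    q = math.inf if p == 1.0 else p / (p - 1.0)  # Hölder conjugate of p
    checks.append((p, coeff_norm(a, q), haar_norm(f, p)))

for p, lhs, rhs in checks:
    print(p, lhs <= rhs + 1e-9)
```

For $p = 2$ the two sides agree up to rounding (Parseval), and for $p = 1$ the inequality is the trivial bound $\|a^{(f)}\|_\infty \leq \|f\|_1$; the intermediate exponents are the ones supplied by interpolation.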

7.4 Riesz Representation: The Dual of $C(X)$

The next result is useful in many ways. It will allow us to completely describe $C(X)^*$ in Section 7.4.5, but it is more often used directly in the form presented here.

Definition 7.43. Let $F(X)$ be a space of real- or complex-valued functions on some space $X$. Then a positive linear functional on $F(X)$ is a linear functional $\Lambda$ defined on $F(X)$ with the property that any real-valued function $f \in F(X)$ with $f \geq 0$ is mapped to $\Lambda(f) \geq 0$.

Theorem 7.44 (Riesz representation). Let $X$ be a locally compact, σ-compact metric space, and suppose that $\Lambda$ is a positive linear functional defined on $C_c(X)$. Then there exists a uniquely determined locally finite (positive) Borel measure $\mu$ such that
\[ \Lambda(f) = \int_X f \, d\mu \]
for all $f \in C_c(X)$.

Recall that a measure $\mu$ is locally finite if every point has a neighbourhood of finite measure, or equivalently if every compact subset of $X$ has finite measure. A locally finite Borel measure is also often called a Radon measure. As a real-valued function in $C_c(X)$ is the difference of two non-negative functions in $C_c(X)$, a positive linear functional maps a real-valued function to a real number. Hence we may and will restrict in the proof below to the real case as this case implies the complex case as well.

Exercise 7.45. Let $X$ be a σ-compact, locally compact metric space. Let $\mu$ be a locally finite measure on $X$. Show that $\mu$ is regular, meaning that
\[ \mu(B) = \sup\{\mu(K) \mid K \subseteq B \text{ is compact}\} = \inf\{\mu(O) \mid O \supseteq B \text{ is open}\} \]
for any Borel set $B \subseteq X$.

We will prove Theorem 7.44 in several steps, first showing the claimed uniqueness of the measure, then showing existence in the totally disconnected compact case, then the compact case and finally the general case.

7.4.1 Uniqueness

We will prove uniqueness without assuming the measure to be locally finite, but this is automatic for a measure representing $\Lambda$.

Proof of uniqueness in Theorem 7.44. Let $\Lambda$ be a positive linear functional on $C_c(X)$ and suppose $\mu$ and $\nu$ are two positive measures with
\[ \int_X f \, d\mu = \Lambda(f) = \int_X f \, d\nu \]
for all $f \in C_c(X)$. This implies that $\mu$ and $\nu$ are locally finite, since for every compact set $K \subseteq X$ there exists some function $f \in C_c(X)$ with $f \geq \mathbb{1}_K$ by Urysohn's lemma (Lemma A.27), which shows that $\mu(K), \nu(K) \leq \Lambda(f) < \infty$. Define $m = \mu + \nu$, so that $\mu \ll m$ and $\nu \ll m$. By Proposition 3.29 there exist Radon–Nikodym derivatives $f_\mu, f_\nu \geq 0$ with
\[ d\mu = f_\mu \, dm, \qquad d\nu = f_\nu \, dm, \]
and $f_\mu + f_\nu = 1$ $m$-almost everywhere. Therefore
\[ \int_X g f_\mu \, dm = \int_X g \, d\mu = \Lambda(g) = \int_X g \, d\nu = \int_X g f_\nu \, dm \]
for all $g \in C_c(X)$. Since $C_c(X) \subseteq L^1_m(X)$ is dense by Proposition 2.51, the functions $f_\mu$ and $f_\nu$ in $L^\infty_m(X)$ determine the same functional on $L^1_m(X)$. By Proposition 7.34 this implies $f_\mu = f_\nu$ almost everywhere with respect to $m$. However, this shows that $\mu = \nu$.

7.4.2 Totally Disconnected Compact Spaces

As our first step towards the existence of the measure representing a positive linear functional we consider the following kind of spaces, where the proof is quite simple.

Definition 7.46. Let $X$ be a topological space. A set $C \subseteq X$ is called clopen if it is both open and closed in $X$. The space $X$ is called totally disconnected

if every open set in $X$ is a union of clopen sets, so the topology has a basis consisting of clopen sets.

Example 7.47. Before we give the proof, let us give examples of compact metric totally disconnected spaces.
(1) $X = \{1, \ldots, a\}^{\mathbb{N}}$ is a compact metrizable space with respect to the product topology using the discrete topology on $\{1, \ldots, a\}$. It is also totally disconnected, since for any finite collection $F_1, \ldots, F_n \subseteq \{1, \ldots, a\}$ the set $\pi_1^{-1}(F_1) \cap \cdots \cap \pi_n^{-1}(F_n)$ is both open and closed (here $\pi_j$ is the projection $X \to \{1, \ldots, a\}$ onto the $j$th coordinate).
(2) More generally, we can also take the product $X = \prod_{n=1}^{\infty} A_n$, where each $A_n$ is a finite set equipped with the discrete topology. Note that any closed subset $Y \subseteq X$ is again totally disconnected and compact.
One way to define a metric on $X$ as in (1) or (2) and hence also on $Y$ as in (2) is to set
\[ d(x, y) = \begin{cases} 0 & \text{if } x = y, \text{ and} \\ \frac{1}{n} & \text{if } x_1 = y_1, \ldots, x_{n-1} = y_{n-1}, \text{ but } x_n \neq y_n \end{cases} \]
for all points $x, y$ (see also Lemma A.17). In this metric the open ball of radius $\frac{1}{n}$ and centre $y$ is given by
\[ B_{1/n}(y) = \{x \mid x_1 = y_1, \ldots, x_n = y_n\} = \pi_1^{-1}(\{y_1\}) \cap \cdots \cap \pi_n^{-1}(\{y_n\}). \]
Also note $B_r(y) = B_{1/n}(y)$ if $\frac{1}{n+1} < r \leq \frac{1}{n}$. It follows that there are only countably many balls and that these are all clopen. As every open set $O \subseteq X$ is a union of balls it follows that every open set is actually a countable union of clopen sets. In particular, the clopen sets generate the Borel σ-algebra.
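The metric of Example 7.47 and the description of its balls as cylinder sets can be checked on finite truncations of sequences. The following is a small sketch, assuming sequences over the alphabet $\{1, 2\}$ cut off after `L` coordinates (so each tuple stands for a cylinder set); the helper names are hypothetical.

```python
from itertools import product

def first_difference(x, y):
    """Smallest n >= 1 with x_n != y_n, or None if the tuples agree."""
    for n, (s, t) in enumerate(zip(x, y), start=1):
        if s != t:
            return n
    return None

def d(x, y):
    """d(x, y) = 1/n where n is the first coordinate in which x and y differ."""
    n = first_difference(x, y)
    return 0.0 if n is None else 1.0 / n

L = 5
points = list(product((1, 2), repeat=L))

# The metric satisfies the strong triangle inequality, which is why
# every ball is both open and closed.
ultrametric = all(d(x, z) <= max(d(x, y), d(y, z))
                  for x in points for y in points for z in points)

# The open ball of radius 1/n is the cylinder set fixing the first n coordinates.
y, n = points[17], 2
ball = {x for x in points if d(x, y) < 1.0 / n}
cylinder = {x for x in points if x[:n] == y[:n]}
print(ultrametric, ball == cylinder)
```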

Lemma 7.48. Let $X$ be a totally disconnected compact metric space. Then the Borel σ-algebra is generated by the clopen sets.

As we have already obtained a proof of the lemma in the setting of Example 7.47 and since these cases will be sufficient for the proof of Theorem 7.44 we leave the proof as an exercise.

Exercise 7.49. (a) Prove Lemma 7.48 in general by showing that in a compact totally disconnected metric space, there are only countably many clopen sets.
(b) Show that every compact totally disconnected metric space is homeomorphic to a metric space $Y$ as in Example 7.47(2).

Proof of Theorem 7.44 for totally disconnected compact metric spaces as in Example 7.47. Let $X$ be a totally disconnected compact metric space so that the algebra
\[ \mathcal{C} = \{C \subseteq X \mid C \text{ is open and closed}\} \]
generates the Borel σ-algebra of $X$. Let $\Lambda : C(X) \to \mathbb{R}$ be a positive linear functional. Using $\Lambda$ we can already define a content $\mu_{\mathcal{C}}$ on the algebra $\mathcal{C}$. In fact, for $C \in \mathcal{C}$ we define
\[ \mu_{\mathcal{C}}(C) = \Lambda(\mathbb{1}_C). \]
This is possible since $\mathbb{1}_C \in C(X)$ as $C$ is both open and closed. It follows that
• $\mu_{\mathcal{C}}(C) \geq 0$ for $C \in \mathcal{C}$ (Positivity);
• $\mu_{\mathcal{C}}(C_1 \sqcup C_2) = \mu_{\mathcal{C}}(C_1) + \mu_{\mathcal{C}}(C_2)$ for disjoint $C_1, C_2 \in \mathcal{C}$ (Finite additivity).
By Carathéodory's extension theorem (see Theorem B.4) we can extend $\mu_{\mathcal{C}}$ to a measure on the Borel σ-algebra $\mathcal{B}$ of $X$ if
\[ \mu_{\mathcal{C}}\Bigl( \bigsqcup_{n=1}^{\infty} C_n \Bigr) = \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n) \]
for any disjoint sets $C_1, C_2, \ldots$ in $\mathcal{C}$ with $\bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. In the totally disconnected compact setting this is quite easy to check. Suppose that $C_n \in \mathcal{C}$ are disjoint for $n \geq 1$ and $C = \bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$. Then $C$ is compact since $C \in \mathcal{C}$ gives that it is a closed subset of $X$ and $X$ is compact. On the other hand the sets $C_n \in \mathcal{C}$ are open, so $C = \bigsqcup_{n=1}^{\infty} C_n \in \mathcal{C}$ is an open cover of a compact set. It follows that $C = \bigsqcup_{n=1}^{N} C_n$ for some $N \geq 1$, and hence $C_n = \emptyset$ for $n > N$. Hence finite additivity gives
\[ \mu_{\mathcal{C}}(C) = \sum_{n=1}^{N} \mu_{\mathcal{C}}(C_n) = \sum_{n=1}^{\infty} \mu_{\mathcal{C}}(C_n), \]
as required.

Therefore, $\mu_{\mathcal{C}}$ can be extended to a measure $\mu$, defined on the Borel σ-algebra $\mathcal{B}$ of $X$. By construction
\[ \int_X \mathbb{1}_C \, d\mu = \Lambda(\mathbb{1}_C) \]
for $C \in \mathcal{C}$. We wish to extend this formula to all continuous functions. For this we note that this formula extends trivially to all linear combinations of characteristic functions of clopen sets. Now note that the linear hull $\mathcal{A}$ of the characteristic functions of clopen sets is an algebra, contains the constant function, and separates points. Hence it is dense in $C(X)$ by the Stone–Weierstrass theorem (Theorem 2.40; this is also easy to see directly but the given argument is shorter). It follows that for every $f \in C(X)$ and $\varepsilon > 0$ there exists some $g \in \mathcal{A}$ (already satisfying $\Lambda(g) = \int_X g \, d\mu$) such that
\[ g - \varepsilon < f < g + \varepsilon. \]
Hence we may apply $\Lambda$ and $\int \cdot \, d\mu$ and obtain from the positivity of both these functionals the bounds
\[ \Lambda(f), \int f \, d\mu \in [\Lambda(g) - \varepsilon \Lambda(\mathbb{1}), \Lambda(g) + \varepsilon \Lambda(\mathbb{1})] \]
and so
\[ \left| \int f \, d\mu - \Lambda(f) \right| \leq 2 \varepsilon \Lambda(\mathbb{1}). \]
As this holds for all $\varepsilon > 0$ and all $f \in C(X)$, the theorem follows.



7.4.3 Compact Spaces

We now upgrade the result from Section 7.4.2 to the case of a general compact metric space. For this we are going to use the Hahn–Banach lemma (Lemma 7.1) and the following lemma.

Lemma 7.50 (Symbolic cover). Let $X$ be a compact metric space. Then there exists a totally disconnected compact metric space $Y$ and a continuous surjective map $\phi : Y \to X$. In fact, we can choose $Y$ as in Example 7.47(2).

Example 7.51. A few cases of this lemma do not need a proof, and should help explain why one can think of $Y$ as a symbolic cover.
• If $X = [0, 1]$ then we may take $Y = \{0, 1\}^{\mathbb{N}}$ to be the space of all binary sequences with the map $\phi((a_n)) = \sum_{n=1}^{\infty} a_n 2^{-n}$ sending the binary sequence to the real number with that binary expansion.
• Let $X \subseteq [-M, M]^d$ be a compact subset of $\mathbb{R}^d$. By composing with an affine map, we can assume without loss of generality that $X \subseteq [0, 1]^d = X'$. Define $Y' = (\{0, 1\}^{\mathbb{N}})^d$ and a continuous surjective map just as above
\[ \phi' : Y' \longrightarrow X' = [0, 1]^d, \qquad \bigl( (a_n^{(1)}), \ldots, (a_n^{(d)}) \bigr) \longmapsto \Bigl( \sum_{n=1}^{\infty} a_n^{(1)} 2^{-n}, \ldots, \sum_{n=1}^{\infty} a_n^{(d)} 2^{-n} \Bigr) \]
and finally $Y = \{y' \in Y' \mid \phi'(y') \in X\}$ with $\phi = \phi'|_Y$. Then $Y \subseteq Y'$ is closed and so again is a totally disconnected compact metric space, and $\phi : Y \to X$ is continuous and surjective.

Exercise 7.52. Suppose that $X$ is a compact $d$-dimensional manifold. Construct $Y$ and $\phi$ as in Lemma 7.50.
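For the first case of Example 7.51, surjectivity of the binary expansion map and its failure to be injective can be made concrete with exact rational arithmetic. The sketch below works with finite prefixes of binary sequences only, which is an assumption made for computability; `phi` and `greedy_digits` are hypothetical helper names.

```python
from fractions import Fraction

def phi(bits):
    """phi((a_n)) = sum of a_n * 2^{-n} for a finite prefix of a binary sequence."""
    return sum(Fraction(a, 2 ** n) for n, a in enumerate(bits, start=1))

def greedy_digits(x, length):
    """A binary prefix of the given length with 0 <= x - phi(prefix) <= 2^{-length}."""
    bits, rest = [], Fraction(x)
    for n in range(1, length + 1):
        a = 1 if rest >= Fraction(1, 2 ** n) else 0
        bits.append(a)
        rest -= Fraction(a, 2 ** n)
    return bits

# Surjectivity: every x in [0, 1] is approximated to within 2^{-length}.
x = Fraction(1, 3)
bits = greedy_digits(x, 20)
error = x - phi(bits)

# Non-injectivity: 1000... and 0111... both represent 1/2 in the limit.
gap = phi([1] + [0] * 19) - phi([0] + [1] * 19)
print(bits, error, gap)
```

The two prefixes in the last step differ in every coordinate, yet their images differ only by $2^{-20}$, which illustrates why $\phi$ identifies distinct points of $Y$ and is a cover rather than a homeomorphism.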

We postpone the proof of the lemma until after we have seen why it is useful for the problem at hand.

Proof of Theorem 7.44 for compact metric spaces. Let $X$ be a compact metric space, and let $Y$ and $\phi : Y \to X$ be as in Lemma 7.50. Let $\Lambda : C(X) \to \mathbb{R}$ be a positive linear functional. For $f \in C(X)$ we have
\[ \Bigl( \sup_{x \in X} f(x) \Bigr) \mathbb{1}_X - f \geq 0, \]
so
\[ \Lambda\Bigl( \Bigl( \sup_{x \in X} f(x) \Bigr) \mathbb{1}_X - f \Bigr) \geq 0 \]
by positivity, or equivalently
\[ \Lambda(f) \leq \Lambda(\mathbb{1}_X) \sup_{x \in X} f(x). \]
Now let
\[ V = \{f \circ \phi \mid f \in C(X)\} \subseteq C(Y), \]
where we used the continuity of $\phi$, and notice that if $f_1 \circ \phi = f_2 \circ \phi$ for $f_1, f_2 \in C(X)$ then $f_1 = f_2$ since $\phi$ is surjective. Thus we may define $\Lambda_V(f \circ \phi) = \Lambda(f)$, which is linear and satisfies
\[ \Lambda_V(f \circ \phi) = \Lambda(f) \leq \Lambda(\mathbb{1}_X) \sup_{x \in X} f(x) = p(f \circ \phi) \]
for $p : C(Y) \to \mathbb{R}$ defined by
\[ p(F) = \Lambda(\mathbb{1}_X) \sup_{y \in Y} F(y). \]
Note that $p$ satisfies $p(F_1 + F_2) \leq p(F_1) + p(F_2)$ and $p(\alpha F_1) = \alpha p(F_1)$ for $F_1, F_2 \in C(Y)$ and $\alpha \geq 0$. These are precisely the hypotheses for the Hahn–Banach lemma (Lemma 7.1), so we conclude that $\Lambda_V$ can be extended to a functional $\Lambda_Y : C(Y) \to \mathbb{R}$ which still satisfies
\[ \Lambda_Y(F) \leq \Lambda(\mathbb{1}_X) \sup_{y \in Y} F(y). \]
If $F \geq 0$ then $-F \leq 0$ and so
\[ \Lambda_Y(-F) \leq \Lambda(\mathbb{1}_X) \sup_{y \in Y} (-F(y)) \leq 0, \]
or $\Lambda_Y(F) \geq 0$. Hence $\Lambda_Y$ is a positive linear functional on $C(Y)$. By the totally disconnected compact case in Section 7.4.2 we conclude that there is a measure $\mu_Y$ on $Y$ with
\[ \Lambda_Y(F) = \int_Y F \, d\mu_Y \]
for all $F \in C(Y)$. Applying this to $F = f \circ \phi$, we see that
\[ \Lambda(f) = \Lambda_Y(f \circ \phi) = \int_Y f \circ \phi \, d\mu_Y. \]
We now define $\mu = \phi_* \mu_Y$ by the formula $\mu(B) = \mu_Y(\phi^{-1}(B))$ for a Borel set $B \subseteq X$. Note that by this definition we have
\[ \int_X \mathbb{1}_B \, d\mu = \mu(B) = \mu_Y(\phi^{-1}(B)) = \int_Y \mathbb{1}_B \circ \phi \, d\mu_Y, \]
which extends by linearity to all simple functions, then by monotone convergence to all positive measurable functions, and then to all integrable functions. In particular,
\[ \Lambda(f) = \int_Y f \circ \phi \, d\mu_Y = \int_X f \, d\mu \]
for all $f \in C(X)$, proving the theorem for a compact metric space $X$.
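The change-of-variables identity $\int_X f \, d\mu = \int_Y f \circ \phi \, d\mu_Y$ for the push-forward $\mu = \phi_* \mu_Y$ is the computational heart of the last step. The following toy sketch verifies it on finite sets with exact arithmetic. The sets, the weights and the map $y \mapsto y \bmod 3$ are arbitrary illustrative choices, and the function names are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(mu_Y, phi):
    """mu = phi_* mu_Y on a finite set: mu({x}) = mu_Y(phi^{-1}({x}))."""
    mu = defaultdict(Fraction)
    for y, mass in mu_Y.items():
        mu[phi(y)] += mass
    return dict(mu)

def integrate(f, mu):
    """Integral of f against a measure given by point masses."""
    return sum(f(x) * mass for x, mass in mu.items())

# A probability measure on Y = {0, ..., 5} with masses 1/21, ..., 6/21.
mu_Y = {y: Fraction(y + 1, 21) for y in range(6)}

def phi(y):
    """A surjective (non-injective) map from Y onto X = {0, 1, 2}."""
    return y % 3

def f(x):
    return x * x - 1

mu = pushforward(mu_Y, phi)
lhs = integrate(f, mu)                          # integral of f over X
rhs = integrate(lambda y: f(phi(y)), mu_Y)      # integral of f o phi over Y
print(mu, lhs, rhs)
```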



We note that the argument above actually proves the following abstract principle. If $\phi : Y \to X$ is a continuous surjective map between two compact spaces, and the Riesz representation theorem holds for $Y$, then it also holds for $X$. It remains to construct the totally disconnected symbolic cover.

Proof of Lemma 7.50. Recall that since $X$ is a compact metric space, it is also totally bounded, so for every $m \geq 1$ there exist finitely many points $x_1^{(m)}, \ldots, x_{n(m)}^{(m)} \in X$ with
\[ X = \bigcup_{i=1}^{n(m)} B_{1/m}\bigl( x_i^{(m)} \bigr). \tag{7.14} \]
We define
\[ Z = \prod_{m=1}^{\infty} \{1, \ldots, n(m)\} \]
with the product topology from the discrete topologies on each of the spaces $\{1, \ldots, n(m)\}$. Then $Z$ is a compact metric space (see Sections A.3 and A.4). We will define $Y$ as a closed subset of $Z$, and will define $\phi : Y \to X$ by
\[ \phi(y) = \lim_{m \to \infty} x_{y(m)}^{(m)}, \]
where $y(m) \in \{1, \ldots, n(m)\}$ is the $m$th coordinate of $y$ and $x_{y(m)}^{(m)}$ is the corresponding centre of the $y(m)$-th ball in the cover (7.14). Our definition of $Y$ will ensure that $\phi$ is well-defined (that is, the limit defining $\phi$ exists), continuous, and surjective.

The closed set $Y$. Define
\[ Y = \Bigl\{ y \in Z \Bigm| B_{1/1}\bigl( x_{y(1)}^{(1)} \bigr) \cap \cdots \cap B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr) \neq \emptyset \text{ for all } m \geq 1 \Bigr\}. \]
We will show that $Y$ is closed by proving that its complement $Z \setminus Y$ is open. So suppose that $z \in Z \setminus Y$, so that
\[ B_{1/1}\bigl( x_{z(1)}^{(1)} \bigr) \cap \cdots \cap B_{1/m}\bigl( x_{z(m)}^{(m)} \bigr) = \emptyset \]
for some $m \geq 1$. However, this means that all other sequences with the same first $m$ coordinates also lie in $Z \setminus Y$. That is,
\[ \pi_1^{-1}(\{z(1)\}) \cap \cdots \cap \pi_m^{-1}(\{z(m)\}) \subseteq Z \setminus Y, \]
and the set on the left is an open neighbourhood of $z$ by definition, so $Z \setminus Y$ is open.

The limit defining $\phi$ exists. Let $y \in Y$ and $m \geq \ell$, then there exists a point
\[ x \in B_{1/\ell}\bigl( x_{y(\ell)}^{(\ell)} \bigr) \cap B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr) \]
and so
\[ d\bigl( x_{y(\ell)}^{(\ell)}, x_{y(m)}^{(m)} \bigr) \leq d\bigl( x_{y(\ell)}^{(\ell)}, x \bigr) + d\bigl( x, x_{y(m)}^{(m)} \bigr) \]
[…] 0. Choose $\ell$ with $\frac{4}{\ell} < \varepsilon$. Suppose that $z \in Y$ belongs to the neighbourhood $\pi_\ell^{-1}(\{y(\ell)\})$ defined by the $\ell$th coordinate of $y$. Letting $m \to \infty$ in (7.15) we see that
\[ d\bigl( x_{y(\ell)}^{(\ell)}, \phi(y) \bigr) \leq \frac{2}{\ell} \]
and similarly
\[ d\bigl( x_{z(\ell)}^{(\ell)}, \phi(z) \bigr) \leq \frac{2}{\ell}. \]
However, by the choice of $z$ we have $y(\ell) = z(\ell)$ and so
\[ d\bigl( \phi(z), \phi(y) \bigr) \leq \frac{4}{\ell} < \varepsilon. \]
This shows the continuity of $\phi$.

Surjectivity. Let $x \in X$, and choose for every $m \geq 1$ an index $y(m)$ in $\{1, \ldots, n(m)\}$ with
\[ x \in B_{1/m}\bigl( x_{y(m)}^{(m)} \bigr), \]
which is possible by (7.14). It follows directly from the definitions that $y \in Y$ and that $\phi(y) = x$.

7.4.4 Locally Compact σ-Compact Metric Spaces

Knowing Theorem 7.44 for compact metric spaces, we now extend it to σ-compact locally compact metric spaces using suitable patchworking.

Proof of Theorem 7.44. Let $X$ be a locally compact σ-compact metric space, and let $\Lambda : C_c(X) \to \mathbb{R}$ be a positive linear functional. By Lemma A.22 there exists a sequence of compact sets $(K_n)$ with $K_n \subseteq K_{n+1}^o$ for all $n \geq 1$ and with $X = \bigcup_{n=1}^{\infty} K_n$.

7.4 Riesz Representation: The Dual of C(X)

247

By Urysohn’s lemma (Lemma A.27) there exists a function fn ∈ Cc (X) with 1Kn 6 fn 6 1 for each n > 1. If f ∈ Cc (Kno ) then   sup f (x) fn − f > 0 o x∈Kn

and hence Λ(f ) 6 Λ(fn ) sup f (x). o x∈Kn

We now consider Cc (Kno ) as a subspace of the space of continuous functions C(Kn ) on Kn . The norm-like function pn (f ) = Λ(fn ) sup f (x) o x∈Kn

for f ∈ C(Kn ) has all the properties needed to apply Lemma 7.1, so Λ|Cc (Kno ) extends to some Λn defined on C(Kn ) and is again positive (use the argument from Section 7.4.3 to check this), and can be represented by a finite measure µn defined on the Borel sets in Kn . Restricting this measure µn to Kno , we obtain a measure µn = µn |Kno on Kno with Z Λ(f ) = f dµn o Kn

for all $f \in C_c(K_n^o)$. We claim that these measures can be patched together to define a locally finite measure µ on X with the desired properties. For this, notice that $\mu_{n+1}$ is a measure on $K_{n+1}^o$ which satisfies
\[ \Lambda(f) = \int_{K_{n+1}^o} f \, d\mu_{n+1} = \int_{K_n^o} f \, d\mu_{n+1} \]
for all $f \in C_c(K_n^o) \subseteq C_c(K_{n+1}^o)$. By the uniqueness of the measure in Theorem 7.44 (see Section 7.4.1) this shows that $\mu_{n+1}|_{K_n^o} = \mu_n$ for all n ≥ 1. Using this compatibility property we may define
\[ \mu(B) = \lim_{n \to \infty} \mu_n(B \cap K_n^o) = \mu_1(B \cap K_1^o) + \sum_{n=2}^{\infty} \mu_n\bigl(B \cap K_n^o \setminus K_{n-1}^o\bigr) \]
for any measurable B ⊆ X. Alternatively we may also write $\mu = \sum_{n=1}^{\infty} \mu_n'$, where $\mu_1' = \mu_1$, $X_1 = K_1^o$, and $\mu_n' = \mu_n|_{X_n}$ with $X_n = K_n^o \setminus K_{n-1}^o$ for n ≥ 2. By Exercise 3.30 this shows that µ is indeed a measure. Note that $\mu|_{K_n^o} = \mu_n$ for n ≥ 1.

By construction $\{K_n^o \mid n \in \mathbb{N}\}$ is an open cover of X. Hence for a given compact subset K ⊆ X there is a finite subcover of K, so K is contained in some $K_n^o$, and we have $\mu(K) \le \mu_n(K_n^o) < \infty$, so µ is locally finite. By the same argument any $f \in C_c(X)$ belongs to some $C_c(K_n^o)$ and hence


\[ \Lambda(f) = \int_{K_n^o} f \, d\mu_n = \int_{K_n^o} f \, d\mu = \int_X f \, d\mu \]
as required.
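As a sanity check on the patching formula for µ(B), the following sketch works out the simplest concrete case. It assumes (this choice is ours and not from the text) that X = ℝ, K_n = [−n, n] and Λ(f) = ∫ f(x) dx, so that µ_n is Lebesgue measure on K_n^o = (−n, n). All function names are hypothetical. Summing the contributions of the shells K_n^o ∖ K_{n−1}^o should recover the length of an interval B = [a, b].

```python
def overlap(lo, hi, a, b):
    """Length of [a, b] intersected with an interval with endpoints lo < hi."""
    return max(0.0, min(b, hi) - max(a, lo))


def mu_n(a, b, n):
    """mu_n(B ∩ K_n°) for B = [a, b], where K_n° = (-n, n) (assumed example)."""
    return overlap(-n, n, a, b)


def shell_mass(a, b, n):
    """mu_n(B ∩ (K_n° minus K_{n-1}°)): the shell has a left and a right piece.

    For n = 1 the two pieces (-1, 0] and [0, 1) together make up K_1° = (-1, 1).
    """
    return overlap(-n, -(n - 1), a, b) + overlap(n - 1, n, a, b)


def patched_measure(a, b, shells=50):
    """mu(B) as the sum over shells; 'shells' must exceed max(|a|, |b|)."""
    return sum(shell_mass(a, b, n) for n in range(1, shells + 1))


if __name__ == "__main__":
    a, b = -2.5, 3.25
    print(patched_measure(a, b), mu_n(a, b, 10), b - a)
```

In this toy case both expressions in the definition of µ(B) agree with b − a once n is large enough that B ⊆ K_n^o, which is the compatibility property at work.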



Exercise 7.53. Let X be a σ-compact locally compact metric space, and let Λ be a positive linear functional $C_0(X) \to \mathbb{R}$ (where we do not assume that Λ is bounded). Show that $\Lambda(f) = \int f \, d\mu$ for all $f \in C_0(X)$ for a finite measure µ on X.

7.4.5 Continuous Linear Functionals on $C_0(X)$

In the remainder of this section we again treat the real and the complex case simultaneously. The following result describes the dual of $C_0(X)$.

Theorem 7.54 (Riesz representation on $C_0(X)$). Let X be a locally compact σ-compact metric space, and let $\Lambda \in (C_0(X))^*$ be a continuous linear functional on the space $C_0(X)$ of continuous functions on X that vanish at infinity. Then there exists a uniquely determined signed measure µ representing Λ. That is, there exists a positive finite measure |µ| and some measurable g with $\|g\|_\infty = 1$ such that dµ = g d|µ| defines a signed measure with
\[ \Lambda(f) = \int_X f \, d\mu = \int_X f g \, d|\mu| \]
for all $f \in C_0(X)$. The operator norm of Λ is equal to $\|g\|_{L^1_{|\mu|}(X)}$, which shows that $C_0(X)^* \cong \mathcal{M}(X)$ under the pairing
\[ \langle f, \mu \rangle = \int f \, d\mu \]
for $f \in C_0(X)$ and $\mu \in \mathcal{M}(X)$, where $\mathcal{M}(X)$ is equipped with the norm in Exercise 3.33.

We note that in a sense Theorem 7.54 also gives a polar decomposition for complex signed measures (see Exercises 7.55 and 7.56). In the proof below we first construct from the linear functional Λ a positive linear functional |Λ| (which may be called the positive version of Λ) which will give rise to the positive finite measure |µ|. The existence of g will then follow from Proposition 7.34. At first sight the construction of |Λ| is surprising: we will force positivity, and then linearity is a minor miracle. Comparing this construction to our discussion of the operator norm of integration in Lemma 2.63 and its proof should make this less surprising.

Proof of Theorem 7.54. Let Λ be a continuous linear functional on $C_0(X)$.

Uniqueness: To see the uniqueness claim in the theorem, suppose that Λ is represented by $d\mu_1 = g_1 \, d|\mu_1|$ and also by $d\mu_2 = g_2 \, d|\mu_2|$. Define


$\mu = |\mu_1| + |\mu_2|$ and notice that $|\mu_1|, |\mu_2| \ll \mu$. By Proposition 3.29 this implies that there is a measurable function $h_j \ge 0$ with $d|\mu_j| = h_j \, d\mu$ for j = 1, 2. This shows that Λ is represented by $d\mu_j = g_j h_j \, d\mu$ for j = 1, 2, so $(g_1 h_1 - g_2 h_2) \, d\mu$ represents the zero functional on $C_0(X)$. By Lemma 2.63 this implies that $\|g_1 h_1 - g_2 h_2\|_{L^1_\mu} = 0$, which in turn implies that $\mu_1 = \mu_2$.

Defining the positive version |Λ|: To prove the existence, define the positive version of Λ by
\[ |\Lambda|(f) = \sup \{ \Re(\Lambda(g)) \mid g \in C_0(X), \ |g| \le f \} \]
for any non-negative and continuous $f \in C_{0,\mathbb{R}}(X)$. Clearly $|\Lambda(g)| \le \|\Lambda\|_{\mathrm{op}} \|f\|_\infty$ for all g as in the definition of |Λ|(f) and so
\[ 0 \le |\Lambda|(f) \le \|\Lambda\|_{\mathrm{op}} \|f\|_\infty. \tag{7.16} \]
Moreover, the definition readily implies |Λ|(αf) = α|Λ|(f) for α > 0. In order to extend |Λ| to an ℝ-linear functional on $C_{0,\mathbb{R}}(X)$ we first consider functions $f_1, f_2 \in C_{0,\mathbb{R}}(X)$ with $f_1 \ge 0$ and $f_2 \ge 0$, and claim that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2). \tag{7.17} \]
One inequality is quite easy. If $g_i \in C_0(X)$ satisfies $|g_i| \le f_i$ for i = 1, 2, then
\[ |g_1 + g_2| \le |g_1| + |g_2| \le f_1 + f_2 \]
and so
\[ \Re\bigl(\Lambda(g_1)\bigr) + \Re\bigl(\Lambda(g_2)\bigr) = \Re\bigl(\Lambda(g_1 + g_2)\bigr) \le |\Lambda|(f_1 + f_2), \]
which shows that $|\Lambda|(f_1) + |\Lambda|(f_2) \le |\Lambda|(f_1 + f_2)$.

To show the reverse inequality, we need to take some function $g \in C_0(X)$ with $|g| \le f_1 + f_2$ and split it into continuous functions $g = g_1 + g_2$ with the property that $|g_1| \le f_1$ and $|g_2| \le f_2$. We define
\[ g_1(x) = \begin{cases} g(x) & \text{if } |g(x)| \le f_1(x), \\ \dfrac{g(x)}{|g(x)|} f_1(x) & \text{if } |g(x)| \ge f_1(x) \text{ and } g(x) \ne 0, \end{cases} \]
which we claim is a continuous function satisfying $|g_1| \le f_1$. First consider the restriction h of $g_1$ to the set
\[ D = \{ x \in X \mid |g(x)| \ge f_1(x) \}. \]
Clearly h is continuous wherever g(x) ≠ 0. For the points


$x \in D_0 = \{ x \in D \mid g(x) = 0 \}$ we have h(x) = 0 by definition of $g_1$, and $f_1(x) = 0$ by definition of D. It follows that $|h| = |f_1|$ on D and, since $f_1$ is continuous, we see that for $x_0 \in D_0$ we have h(x) → 0 as $x \to x_0$ inside D, which proves that h is also continuous at points in $D_0$. Now notice that $g_1$ is continuous on the two closed sets D and $\{ x \in X \mid |g(x)| \le f_1(x) \}$, the union of which is X. It follows that $g_1$ is continuous and satisfies $|g_1| \le f_1$ on X. Since $f_1 \in C_0(X)$ we also have $g_1 \in C_0(X)$. We also define $g_2 = g - g_1 \in C_0(X)$ and notice that
\[ |g_2(x)| = \begin{cases} 0 & \text{if } |g(x)| \le f_1(x), \\ |g(x)| - f_1(x) & \text{if } |g(x)| > f_1(x), \end{cases} \]
so that $|g_2| \le f_2$ by the assumption on g. Hence
\[ \Re(\Lambda(g)) = \Re(\Lambda(g_1)) + \Re(\Lambda(g_2)) \le |\Lambda|(f_1) + |\Lambda|(f_2), \]
which proves that $|\Lambda|(f_1 + f_2) \le |\Lambda|(f_1) + |\Lambda|(f_2)$ since $g \in C_0(X)$ was arbitrary satisfying $|g| \le f_1 + f_2$. In particular, we now obtain (7.17).

Linearity of |Λ|: Now let f be any real-valued function in $C_{0,\mathbb{R}}(X)$. We extend the definition of |Λ| by the formula
\[ |\Lambda|(f) = |\Lambda|(f^+) - |\Lambda|(f^-), \tag{7.18} \]
where $f^+ = \max\{f, 0\}$ and $f^- = \max\{-f, 0\}$ are non-negative continuous functions. We now have |Λ|(αf) = α|Λ|(f) for all α ∈ ℝ and $f \in C_{0,\mathbb{R}}(X)$. We note that (7.16) extends to all $f \in C_{0,\mathbb{R}}(X)$ since $|\Lambda|(f^+), |\Lambda|(f^-) \in [0, \|\Lambda\|_{\mathrm{op}} \|f\|_\infty]$. For linearity it remains to show that
\[ |\Lambda|(f_1 + f_2) = |\Lambda|(f_1) + |\Lambda|(f_2) \tag{7.19} \]
for $f_1, f_2 \in C_{0,\mathbb{R}}(X)$. To see this, notice first that
\[ (f_1 + f_2)^+ - (f_1 + f_2)^- = f_1 + f_2 = f_1^+ - f_1^- + f_2^+ - f_2^- \]
and so
\[ (f_1 + f_2)^+ + f_1^- + f_2^- = (f_1 + f_2)^- + f_1^+ + f_2^+. \]
We may apply |Λ| to the latter equation and use the non-negative linearity in (7.17) to get
\[ |\Lambda|\bigl((f_1 + f_2)^+\bigr) + |\Lambda|(f_1^-) + |\Lambda|(f_2^-) = |\Lambda|\bigl((f_1 + f_2)^-\bigr) + |\Lambda|(f_1^+) + |\Lambda|(f_2^+). \]
Rearranging the terms again we get


\[ |\Lambda|\bigl((f_1 + f_2)^+\bigr) - |\Lambda|\bigl((f_1 + f_2)^-\bigr) = |\Lambda|(f_1^+) - |\Lambda|(f_1^-) + |\Lambda|(f_2^+) - |\Lambda|(f_2^-), \]
which is precisely (7.19) by definition of |Λ| in (7.18).

Applying Riesz representation: We have thus shown that |Λ| is a bounded positive linear functional on $C_{0,\mathbb{R}}(X)$. Restricting it to $C_{c,\mathbb{R}}(X)$ we may apply Theorem 7.44 and find a positive measure |µ| with
\[ |\Lambda|(f) = \int_X f \, d|\mu| \tag{7.20} \]
for $f \in C_{c,\mathbb{R}}(X)$. Now use local compactness, σ-compactness, and Urysohn’s lemma (see Lemma A.22 and Lemma A.27) to find non-negative functions $f_n \in C_{c,\mathbb{R}}(X)$ with $f_n \nearrow 1$ as n → ∞ and apply monotone convergence to obtain
\[ |\mu|(X) = \lim_{n \to \infty} \int f_n \, d|\mu| = \lim_{n \to \infty} |\Lambda|(f_n) \le \|\Lambda\|_{\mathrm{op}} \tag{7.21} \]
by (7.16). Note that (7.20) now extends to all $f \in C_{0,\mathbb{R}}(X)$ by applying (7.20) to the sequence $(f_n f)$ in $C_{c,\mathbb{R}}(X)$ together with continuity of |Λ| and dominated convergence.

Description of Λ: We now return to the study of the original functional Λ. For any $f \in C_0(X)$ we may apply the definition of |Λ| and |µ| to obtain
\[ |\Lambda(f)| = \alpha \Lambda(f) = \Re(\Lambda(\alpha f)) \le |\Lambda|(|\alpha f|) = \int_X |f| \, d|\mu| \tag{7.22} \]
for some α ∈ ℂ with |α| = 1. However, (7.22) shows that Λ is continuous with respect to $\|\cdot\|_{L^1_{|\mu|}(X)}$, hence by density of $C_c(X)$ extends to $L^1_{|\mu|}(X)$, and so by Proposition 7.34 must be of the form
\[ \Lambda(f) = \int_X f g \, d|\mu| \]
for some $g \in L^\infty_{|\mu|}(X)$. Moreover, $\|\Lambda\|_{\mathrm{op}} = \|g\|_{L^1_{|\mu|}(X)}$ by Lemma 2.63, and together with (7.21) we obtain
\[ \|\Lambda\|_{\mathrm{op}} = \|g\|_{L^1_{|\mu|}(X)} \le \|g\|_\infty \, |\mu|(X) \le \|g\|_\infty \|\Lambda\|_{\mathrm{op}}, \]
and so $\|g\|_\infty = 1$ follows unless $\|\Lambda\|_{\mathrm{op}} = 0$. In the trivial case Λ = 0 we have |µ| = 0 and may also set g ≡ 1.

Exercise 7.55. In the notation of Theorem 7.54 (and of its proof) show that |g| = 1 for |µ|-almost every x ∈ X.

Exercise 7.56. (a) Recall that $\mu = \nu_1 - \nu_2$ defines a real signed measure µ on X if $\nu_1, \nu_2$ are two finite measures on a measurable space X. Show that for every real signed measure µ there exist uniquely determined positive measures $\mu^+ \perp \mu^-$ with $\mu = \mu^+ - \mu^-$. Also show that $\mu^+(B) = \sup\{\mu(A) \mid A \subseteq B \text{ measurable}\}$ and similarly for $\mu^-$. Show the existence of $\mu^+, \mu^-$ as a corollary of Proposition 3.29 and also of Theorem 7.54 if X is a locally compact σ-compact metric space.
(b) Suppose that µ is a complex signed measure defined by dµ = h dν for some finite positive measure ν on X and some $h \in L^1_\nu(X)$. Show that µ also has a representation in


the form dµ = g d|µ| with |g| = 1 everywhere and |µ| being a positive finite measure. Show that |µ| is uniquely determined, as is g, |µ|-almost everywhere.

Exercise 7.57. Let X be a locally compact σ-compact metric space, and let Λ be a linear functional on $C_c(X)$ with the property that for any compact K ⊆ X there is a constant $C_K > 0$ such that $|\Lambda(f)| \le C_K \|f\|_\infty$ for any $f \in C_c(X)$ with Supp f ⊆ K. Show that Λ can be represented by a signed Radon measure on X, meaning that there exists a Radon measure µ on X and a locally integrable (that is, integrable on any compact subset) function g on X such that $\Lambda(f) = \int f g \, d\mu$ for all $f \in C_c(X)$.

Exercise 7.58. Let X = [0, 1] ⊆ ℝ (though the reader will notice that the same conclusions hold on most compact metric spaces).
(a) Notice that every finite signed measure µ on X defines a linear functional on the space $\mathscr{L}^\infty(X) = \{ f : X \to \mathbb{R} \mid \|f\|_\infty < \infty, \ f \text{ measurable} \}$ but that $\mathscr{L}^\infty(X)^*$ contains other functionals as well.
(b) Notice that every function $f \in \mathscr{L}^\infty(X)$ defines a linear functional on the space of finite signed measures $\mathcal{M}(X) \cong C(X)^*$. Deduce that C(X) is not reflexive. Show that $\mathcal{M}(X)^*$ contains more functionals than those arising from $\mathscr{L}^\infty(X)$.

Exercise 7.59. Find a description of the dual of $C^n([0, 1])$ for all n ∈ ℕ.
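The splitting g = g_1 + g_2 used in the proof of Theorem 7.54 to obtain the additivity (7.17) can be checked pointwise. The sketch below is only an illustration under assumptions of our own: X = [0, 1], f_1(x) = x, f_2(x) = 1 − x, and a sample complex-valued g with |g| ≤ f_1 + f_2. The function names are hypothetical. It verifies numerically that |g_1| ≤ f_1, |g_2| ≤ f_2 and g_1 + g_2 = g at every sample point.

```python
import cmath


def split(g, f1):
    """Split the value g into (g1, g2) with |g1| <= f1 and |g2| = max(|g| - f1, 0).

    Mirrors the case distinction in the proof: keep g where |g| <= f1,
    otherwise rescale g to have modulus f1.
    """
    if abs(g) <= f1:
        g1 = g
    else:
        # Here |g| > f1 >= 0, so g != 0 and the division is safe.
        g1 = g / abs(g) * f1
    return g1, g - g1


def check_split(samples=201, tol=1e-12):
    """Check the three defining properties on a grid in [0, 1] (assumed data)."""
    for k in range(samples):
        x = k / (samples - 1)
        f1, f2 = x, 1.0 - x
        g = 0.9 * (f1 + f2) * cmath.exp(2j * cmath.pi * 3 * x)
        g1, g2 = split(g, f1)
        if abs(g1) > f1 + tol or abs(g2) > f2 + tol or abs(g1 + g2 - g) > tol:
            return False
    return True


if __name__ == "__main__":
    print(check_split())
    print(split(3 + 4j, 2.0))
```

The grid includes x = 0, where f_1 vanishes and the whole of g is pushed into g_2; this is the situation covered by the set D_0 in the continuity argument.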

7.5 Further Topics

• As we will see in Section 8.6, the Hahn–Banach lemma (Lemma 7.1) is very useful in the study of closed convex sets (even in the more general setting of locally convex vector spaces introduced in Section 8.4).
• The explicit description of the dual spaces in this chapter will give us concrete cases of the weak and the weak* topology in Chapter 8.
• The more general Marcinkiewicz interpolation theorem gives a result similar to the Riesz–Thorin interpolation theorem for certain non-linear operators, see Folland [33, Sec. 6.5].
• The Riesz representation theorem (Theorem 7.44) has numerous applications. It plays a crucial role in obtaining a point in a convex set as a generalized convex combination of extreme points of the convex set (see Section 8.6.1), in the spectral theory of unitary and self-adjoint operators on Hilbert spaces (see Chapters 9 and 12–13), in the spectral theory of unitary representations of locally compact abelian groups (see Herglotz’s Theorem 9.6), and also in the construction of the Haar measure of a locally compact group (see Section 10.1).

Chapter 8

Locally Convex Vector Spaces

In this chapter we introduce the important weak and weak* topologies on Banach spaces and their duals, prove an important compactness result, introduce two more topologies on B(V, W ), and put these into the general context of locally convex vector spaces. Finally, we also discuss convex sets of locally convex vector spaces.

8.1 Weak Topologies and the Banach–Alaoglu Theorem

As we have seen in Proposition 2.35, the unit ball in an infinite-dimensional normed vector space is not compact in the topology induced by the norm (which is often called the norm or strong topology). Given the central importance of compactness in much of analysis, this is a significant problem. In general this is simply something that must be lived with as a price to pay for the additional power of doing analysis in infinite-dimensional spaces, but we can also improve the chance of finding compactness by studying weaker topologies than the norm topology. We also refer to Appendix A.3 since many definitions of topologies in this chapter are special cases of more general constructions discussed there.

As usual, for a given normed vector space X the space X* consists of the linear functionals that are continuous with respect to the norm topology on X.

Definition 8.1. Let X be a normed vector space with dual space X*. The weak topology on X is the weakest (coarsest) topology on X for which all the elements of X* (which are functions on X) are continuous.

Exercise 8.2. Show that the weak and norm topologies coincide for a finite-dimensional normed vector space.

© Springer International Publishing AG 2017. M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_8

By the properties of the initial topology (Definition A.15) a neighbourhood in the weak topology of $x_0 \in X$ is a set containing a set of the form


\[ N_{\ell_1, \dots, \ell_n; \varepsilon}(x_0) = \bigcap_{i=1}^{n} \bigl\{ x \in X \mid |\ell_i(x) - \ell_i(x_0)| < \varepsilon \bigr\} \]
for some ε > 0 and functionals $\ell_1, \dots, \ell_n \in X^*$. Note that a sequence $(x_n)$ in X converges in the weak topology to x ∈ X if and only if $\ell(x_n) \to \ell(x)$ for every ℓ ∈ X*. However, sequences alone are in general not sufficient to describe a topology (see Exercise 8.15); one needs to consider filters (or nets) instead. We therefore generalize this comment to that setting in the next exercise.

Exercise 8.3. Given a normed vector space X, show that a filter $\mathcal{F} \subseteq \mathcal{P}(X)$ converges in the weak topology to x ∈ X if and only if $\lim_{\mathcal{F}} \ell = \ell(x)$ for all ℓ ∈ X* (see Appendix A.2).
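The basic neighbourhoods N_{ℓ_1,…,ℓ_n;ε}(x_0) only constrain finitely many coordinates of x. The following sketch is a finite-dimensional toy model chosen by us (X = ℝ³ with two coordinate functionals, all names hypothetical) and shows that such a set contains every point of a whole line through x_0. In ℝ³ this can be repaired by adding a third functional, in line with Exercise 8.2; in an infinite-dimensional space no finite list of functionals has trivial common kernel.

```python
def dot(l, x):
    """Evaluate the functional represented by the vector l at the point x."""
    return sum(li * xi for li, xi in zip(l, x))


def in_weak_nbhd(x, x0, functionals, eps):
    """Membership test for N_{l_1,...,l_n;eps}(x0)."""
    return all(abs(dot(l, x) - dot(l, x0)) < eps for l in functionals)


# Assumed toy data: two coordinate functionals on R^3 and x0 = 0.
FUNCTIONALS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
X0 = (0.0, 0.0, 0.0)
EPS = 0.5

if __name__ == "__main__":
    # Points on the common kernel of both functionals stay inside, however large.
    for t in (1.0, 1e3, 1e9):
        print(t, in_weak_nbhd((0.0, 0.0, t), X0, FUNCTIONALS, EPS))
    # A point that one of the functionals can see is excluded.
    print(in_weak_nbhd((1.0, 0.0, 0.0), X0, FUNCTIONALS, EPS))
```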

If X is infinite-dimensional, then in contrast to Exercise 8.2 the weak topology and the norm topology are different. To see this notice that
\[ \bigcap_{i=1}^{n} \ker \ell_i \subseteq N_{\ell_1, \dots, \ell_n; \varepsilon}(0), \]
which implies that no neighbourhood of 0 in the weak topology can be bounded with respect to the norm of X.

Definition 8.4. Let X be a normed vector space with dual space X*. The weak* topology (read as ‘weak star’ topology) is the weakest (or coarsest) topology on X* for which all the evaluation maps $x^* \mapsto x^*(x)$ corresponding to x ∈ X are continuous.

Once again we can describe the weak* topology by saying that a neighbourhood of $x_0^* \in X^*$ is a set containing a set of the form
\[ N_{x_1, \dots, x_n; \varepsilon}(x_0^*) = \bigcap_{i=1}^{n} \bigl\{ x^* \in X^* \mid |x^*(x_i) - x_0^*(x_i)| < \varepsilon \bigr\} \]
for some ε > 0 and $x_1, \dots, x_n \in X$. As before, we can show that the weak* topology and the norm topology on X* are different if X (and hence if X*) is infinite-dimensional.

Example 8.5. (a) For a Hilbert space H, the weak and weak* topologies are identical. The same holds for any reflexive Banach space. However, in general there is no definition of a weak* topology on a given Banach space X, as there may not exist a pre-dual of X, meaning a Banach space Y with X = Y* (see Example 8.81).
(b) Let X = [0, 1] and consider the sequence of measures $(\mu_n)$ where
\[ \mu_n = \frac{1}{n} \bigl( \delta_{1/n} + \delta_{2/n} + \cdots + \delta_1 \bigr), \]


viewed (via integration) as functionals on C(X) (see Theorem 7.54; here $\delta_t$ denotes the point measure defined by $\delta_t(A) = 1$ if t ∈ A and 0 if not). Then the sequence of measures $(\mu_n)$ converges in the weak* topology to the Lebesgue measure λ, which we also identify with the functional it induces. Notice that this statement is equivalent to the beginning of the theory of the Riemann integral for continuous functions, which should help to explain why the weak* topology is a quite natural notion. Notice however that $(\mu_n)$ does not converge in the weak topology, nor a fortiori in the norm topology. To see the former, notice that every function in $\mathscr{L}^\infty([0, 1])$ induces a linear functional on the space $\mathcal{M}([0, 1])$ of finite signed measures on [0, 1], and that for $f = \mathbf{1}_{\mathbb{Q} \cap [0,1]}$ we have $\int f \, d\mu_n = 1$ for all n, while $\int f \, d\lambda = 0$. Thus the weak and weak* topologies on $\mathcal{M}([0, 1])$ are different.

We note that we have already seen other interesting examples of weak* convergence. In fact, Proposition 3.65 can be interpreted as saying that for every x ∈ 𝕋 the measures defined by $t \mapsto F_M(x - t) \, dt$ on 𝕋 converge in the weak* topology to the Dirac measure $\delta_x$ corresponding to the unit point mass at x. The reader can analyze the proof of Proposition 3.65 and the material of this section to prove the following theorem due to Toeplitz.

Exercise 8.6 (Toeplitz). Suppose that $(k_n)$ is a sequence of integrable complex-valued functions defined on [0, 1], and let x be a point in [0, 1]. Then the measures defined by $k_n(t) \, dt$ converge in the weak* topology to $\delta_x$ as n → ∞ if and only if all of the following conditions hold:
(1) $\|k_n\|_1 \le C$ for some constant C independent of n;
(2) $\int_0^1 k_n(t) \, dt \longrightarrow 1$ as n → ∞; and
(3) $\int_0^1 k_n(t) g(t) \, dt \longrightarrow 0$ as n → ∞ for all $g \in C^\infty([0, 1])$ with x ∉ Supp(g).
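Returning to Example 8.5(b), the weak* convergence µ_n → λ is just the convergence of Riemann sums, and the failure of weak convergence can be seen on the indicator of the rationals. The sketch below is an illustration with hypothetical helper names; it uses exact rational arithmetic so that the atoms k/n are genuinely rational. The test function t ↦ t² is our choice of a continuous function, and the value ∫ 1_ℚ dλ = 0 is quoted from the text rather than computed.

```python
from fractions import Fraction


def integrate_mu_n(f, n):
    """Integral of f against mu_n = (1/n) * (delta_{1/n} + ... + delta_1)."""
    return sum(f(Fraction(k, n)) for k in range(1, n + 1)) / n


def square(t):
    """A continuous test function; its Lebesgue integral over [0, 1] is 1/3."""
    return t * t


def indicator_of_rationals(t):
    """Indicator of Q ∩ [0, 1]; every atom of mu_n is an exact Fraction."""
    return 1 if isinstance(t, Fraction) else 0


if __name__ == "__main__":
    for n in (10, 100, 1000):
        riemann = float(integrate_mu_n(square, n))
        rationals = integrate_mu_n(indicator_of_rationals, n)
        print(n, riemann, rationals)
```

The first column of values approaches 1/3, while the second stays equal to 1 for every n, whereas the Lebesgue integral of the indicator is 0.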

Lemma 8.7. For a Banach space X the weak topology on X and the weak* topology on X* are Hausdorff.

Proof. For the weak topology this follows from Corollary 7.4: if y ≠ z in X there exists some ℓ ∈ X* with ℓ(y) ≠ ℓ(z), so that $N_{\ell; \varepsilon}(y) \cap N_{\ell; \varepsilon}(z) = \varnothing$ for $\varepsilon = \frac{|\ell(z) - \ell(y)|}{2}$. The proof for the weak* topology is similar, using the fact that for $x_1^* \ne x_2^*$ there exists some x ∈ X with $x_1^*(x) \ne x_2^*(x)$.

Exercise 8.8. Let X be a Banach space and let $(x_n)$ be a sequence converging to x ∈ X in the weak topology. Show that $\sup_{n \ge 1} \|x_n\| < \infty$. In other words, show that weakly convergent sequences in Banach spaces are bounded.

The following exercise shows that the weak and the weak* topologies have natural compatibility properties with respect to bounded operators.


Exercise 8.9. Let A : X → Y be a bounded operator between two Banach spaces X and Y.
(a) Show that A is also a continuous operator if we equip both X and Y with the respective weak topologies.
(b) Consider the dual operator A* : Y* → X* defined by $A^*(y^*) = y^* \circ A \in X^*$ for all y* ∈ Y*. Show that A* is continuous if we endow both dual spaces with the weak* topologies.
(c) Suppose now that A is a compact operator. Show that $A x_n \to A x$ as n → ∞ in the norm topology on Y whenever $x_n \to x$ as n → ∞ in the weak topology on X.

8.1.1 Weak* Compactness of the Unit Ball

The importance of the weak* topology comes from the following theorem, which was alluded to in the introduction to the chapter.

Theorem 8.10 (Banach–Alaoglu). The closed unit ball
\[ B_1^{X^*} = \{ \ell \in X^* \mid \|\ell\|_{\mathrm{op}} \le 1 \} \]
in the dual X* of a normed vector space X is compact in the weak* topology.

Proof. Let B(r) be the closed (and hence compact) ball of radius r > 0 in ℝ or ℂ depending on the field of scalars. By Tychonoff’s theorem (see Theorem A.20) the space
\[ Y = \prod_{x \in X} B(\|x\|) \]
is compact with respect to the product topology (see Definition A.16). Now define the embedding
\[ \varphi : B_1^{X^*} \longrightarrow Y, \qquad \ell \longmapsto (\ell(x))_{x \in X} \in Y. \]
Let $\pi_x : Y \to B(\|x\|)$ be the projection operator (or evaluation map) corresponding to x ∈ X defined by $Y \ni y \mapsto \pi_x(y) = y(x)$. Then the neighbourhoods of some $y = \varphi(\ell_0)$ with $\ell_0 \in B_1^{X^*}$ in the product topology are sets containing sets of the form
\[ N = \bigcap_{i=1}^{n} \pi_{x_i}^{-1} \bigl( B_\varepsilon(\ell_0(x_i)) \bigr). \]
Now notice that the pre-image of N under φ takes the form
\[ \varphi^{-1}(N) = \bigcap_{i=1}^{n} \bigl\{ \ell \in B_1^{X^*} \mid |\ell(x_i) - \ell_0(x_i)| < \varepsilon \bigr\} = N_{x_1, \dots, x_n; \varepsilon}(\ell_0), \]


which is precisely one of the neighbourhoods of $\ell_0 \in B_1^{X^*}$ defining the weak* topology on $B_1^{X^*}$. Therefore, φ is a homeomorphism from $B_1^{X^*}$ (with the restriction of the weak* topology) to a subset of Y (with the product topology). We claim that
\[ \varphi\bigl(B_1^{X^*}\bigr) \subseteq Y \]
is closed, which then implies the theorem, because any closed subset of the compact space Y is compact.

To see the claim, notice first that $\varphi(B_1^{X^*})$ consists of all linear maps in Y. This is because any element y ∈ Y is a scalar-valued function on X with $y(x) \in B(\|x\|)$ for all x ∈ X, and so if y is linear then $\|y\| \le 1$. The claim now follows easily since linearity is defined by equations and so is a closed condition, as we will now show. In fact for any scalars $\alpha_1, \alpha_2$ the set
\[ D_{\alpha_1, \alpha_2} = \{ (\lambda_1, \lambda_2, \lambda_3) \mid \lambda_3 = \alpha_1 \lambda_1 + \alpha_2 \lambda_2 \} \]
is closed, and the joint evaluation map
\[ \pi_{x_1, x_2, \alpha_1 x_1 + \alpha_2 x_2}(y) = \bigl( y(x_1), y(x_2), y(\alpha_1 x_1 + \alpha_2 x_2) \bigr) \]
is continuous by definition of the product topology on Y. Hence the set of all linear maps in Y is given by
\[ \bigcap_{x_1, x_2, \alpha_1, \alpha_2} \pi_{x_1, x_2, \alpha_1 x_1 + \alpha_2 x_2}^{-1}\bigl(D_{\alpha_1, \alpha_2}\bigr) \]
and so is closed, and the theorem follows.



8.1.2 More Properties of the Weak and Weak* Topologies

The weak and weak* topologies are never metrizable for infinite-dimensional Banach spaces (see Exercise 8.12), but when restricted to the unit ball the situation is better.

Proposition 8.11. Let D ⊆ X be a dense subset of a normed vector space. Then the weak* topology restricted to $B_1^{X^*}$ is the weakest topology on $B_1^{X^*}$ for which the evaluation maps ℓ ↦ ℓ(x) are continuous for all x ∈ D. In particular, if X is separable, then the weak* topology restricted to $B_1^{X^*}$ is metrizable.

Proof. Suppose that D ⊆ X is dense, and suppose that
\[ N_{x; \varepsilon}(\ell_0) = \{ \ell \in X^* \mid |\ell(x) - \ell_0(x)| < \varepsilon \} \]


is a neighbourhood of $\ell_0 \in B_1^{X^*}$ defined by ε > 0 and some arbitrary x ∈ X. Choose some x′ ∈ D with $\|x - x'\| < \frac{\varepsilon}{3}$, and notice that for all $\ell \in B_1^{X^*}$ we have $|\ell(x) - \ell(x')| < \frac{\varepsilon}{3}$ and so
\[ N_{x'; \varepsilon/3}(\ell_0) \cap B_1^{X^*} \subseteq N_{x; \varepsilon}(\ell_0) \cap B_1^{X^*} \]
by a simple application of the triangle inequality (check this). Thus the topologies defined on $B_1^{X^*}$ using the evaluation maps for x ∈ D or for x ∈ X (the latter being the weak* topology by definition) agree.

For the last claim of the proposition, notice that if X is separable, then by definition there exists a countable dense set $D = \{x_1, x_2, \dots\} \subseteq X$. For every $x_n \in D$ the weakest topology for which $\ell \mapsto \ell(x_n)$ is continuous is the topology induced by the semi-norm $\|\ell\|_{x_n} = |\ell(x_n)|$, and so the weak* topology is the weakest topology on $B_1^{X^*}$ that is stronger than all the topologies induced by the semi-norms $\|\cdot\|_{x_n}$ for n ∈ ℕ. By Lemma A.17 and the Hausdorff property of the weak* topology from Lemma 8.7, this topology is metrizable.

Exercise 8.12. Let X be an infinite-dimensional Banach space.
(a) Show that X is not the span of countably many elements of X. That is, show that for any $x_1, x_2, \dots \in X$ we have $X \ne \langle x_n \mid n \in \mathbb{N} \rangle$. Of course we may have $X = \overline{\langle x_n \mid n \in \mathbb{N} \rangle}$.
(b) Use part (a) to show that the weak* topology does not have a countable basis of neighbourhoods of 0. Conclude that the weak* topology on X* cannot be metrizable.
(c) Generalize (b) to the weak topology on X.

Let us finish with the following lemma, which answers both of the following questions for a Banach space affirmatively:

• Does X* as a vector space with the weak* topology characterize X?
• If the weak and weak* topologies on X* agree, does it follow that X is reflexive?

Lemma 8.13. Let X be a Banach space. A functional on X* is continuous with respect to the weak* topology if and only if it is an evaluation map, that is a map of the form $f : x^* \mapsto x^*(x)$ for some x ∈ X.

Proof. Suppose f is a functional on X* continuous with respect to the weak* topology. Then $f^{-1}(B_1^{\mathbb{C}})$ is a neighbourhood of 0 in X*, and so there exist $x_1, \dots, x_n \in X$ and ε > 0 with
\[ N_{x_1, \dots, x_n; \varepsilon}(0) \subseteq f^{-1}\bigl(B_1^{\mathbb{C}}\bigr). \]
If now x* ∈ X* satisfies $x^*(x_1) = \cdots = x^*(x_n) = 0$ then any multiple of x* belongs to $N_{x_1, \dots, x_n; \varepsilon}(0)$ and therefore $|f(M x^*)| < 1$ for all scalars M. This implies that f(x*) = 0, or in other words that f induces a functional on

\[ Y = X^* \Big/ \bigcap_{i=1}^{n} \ker \varphi_{x_i} \cong \operatorname{im} \begin{pmatrix} \varphi_{x_1} \\ \varphi_{x_2} \\ \vdots \\ \varphi_{x_n} \end{pmatrix} = V, \]

where $\varphi_x(x^*) = x^*(x)$ for x* ∈ X* and x ∈ X. However, the dual of V is generated by the restrictions of the coordinate functions to V, and these correspond to the functionals $\varphi_{x_1}, \dots, \varphi_{x_n}$ on Y, so that f must be a linear combination of the form
\[ f = \sum_{i=1}^{n} \alpha_i \varphi_{x_i} = \varphi_{\sum_{i=1}^{n} \alpha_i x_i} \]
for some scalars $\alpha_1, \dots, \alpha_n$, as claimed.



Lemma 8.13 in particular determines the vector space X (and its weak topology) as the space of continuous functionals on X* with respect to the weak* topology, which answers the first question above affirmatively. The second question can also be answered using the lemma. Suppose the weak and weak* topologies on X* agree and ℓ ∈ X** is a linear functional on X*. By definition ℓ is also continuous with respect to the weak topology, and so also with respect to the weak* topology by assumption. Lemma 8.13 shows that ℓ(x*) = x*(x) for all x* ∈ X* for some x ∈ X, which shows that X is reflexive since ℓ ∈ X** was arbitrary.

Exercise 8.14. Let X be a reflexive Banach space. Let $(x_n)$ in X be a bounded sequence. Show that $(x_n)$ has a weakly convergent subsequence. Notice that this follows immediately from Theorem 8.10 and Proposition 8.11 if X* is separable; show it in general.

Exercise 8.15. We know that the weak topology and the norm topology on infinite-dimensional Banach spaces are different. In contrast to this, show that a sequence in $\ell^1(\mathbb{N})$ converges in the weak topology if and only if it converges in the norm topology.

Exercise 8.16. Fix p ∈ (1, ∞).
(a) Prove that a sequence $(f_n)$ in $\ell^p(\mathbb{N})$ converges weakly to $f \in \ell^p(\mathbb{N})$ if and only if there is some M with $\|f_n\|_p \le M$ for all n ≥ 1 and $f_n(k) \to f(k)$ as n → ∞ for each k ∈ ℕ.
(b) Find a sequence in $\ell^p(\mathbb{N})$ that converges weakly but not in norm.

Exercise 8.17. Let X, Y be normed vector spaces, and let T : X → Y be linear. Show that T is a bounded operator if and only if T is sequentially continuous with respect to the weak topology, that is, $x_n \to x$ weakly in X as n → ∞ implies that $T x_n \to T x$ weakly in Y as n → ∞.

Exercise 8.18. Let X be an infinite-dimensional normed vector space. Show that the weak closure of the unit sphere $S = \{x \in X \mid \|x\| = 1\}$ is the closed unit ball $B_1^X = \{x \in X \mid \|x\| \le 1\}$.


8.1.3 Analytic Functions and the Weak Topology †

† This subsection will not be needed in the remainder of the book.

As we have seen, weak convergence and norm convergence are in general quite different. There are, however, situations in which weak convergence can be upgraded to norm convergence. Analytic functions taking values in a Banach space provide one setting where this phenomenon is seen.

Definition 8.19. Let G ⊆ ℂ be an open set, and let X be a complex Banach space. A function f : G → X is called (strongly) analytic if for every ζ ∈ G the limit
\[ f'(\zeta) = \lim_{h \to 0} \frac{f(\zeta + h) - f(\zeta)}{h} \]
exists in the norm topology. Also f is called weakly analytic if for every ℓ ∈ X* and ζ ∈ G the limit
\[ (\ell \circ f)'(\zeta) = \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h} \]
exists.

Notice that in the definition of weak analyticity we do not see immediately whether we can associate to f and ζ a weak limit of the difference quotient defining f′(ζ). What we can associate to f and ζ ∈ G in terms of a derivative is a weak* limit in X**,
\[ X^* \ni \ell \longmapsto \lim_{h \to 0} \frac{\ell(f(\zeta + h)) - \ell(f(\zeta))}{h} \in \mathbb{C}, \]
which is bounded by a corollary of the Banach–Steinhaus theorem (Corollary 4.3). However, much more is true.

Theorem 8.20 (Dunford). Let G ⊆ ℂ be an open set and let X be a Banach space. Then every weakly analytic function f : G → X is analytic.

Proof. Let ℓ ∈ X*, so that by assumption ℓ ∘ f : G → ℂ is analytic. Choose ζ ∈ G, ε > 0 sufficiently small, and $h \in B_\varepsilon^{\mathbb{C}}$. Then
\[ \ell \circ f(\zeta + h) = \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \frac{\ell \circ f(z)}{z - (\zeta + h)} \, dz \]
by the Cauchy integral formula, where the integral is a contour integral over a circular path with positive orientation winding once around ζ with radius ε. Therefore we have
\[ \begin{aligned} \ell \circ f(\zeta + h) - \ell \circ f(\zeta) &= \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \ell \circ f(z) \left( \frac{1}{z - (\zeta + h)} - \frac{1}{z - \zeta} \right) dz \\ &= \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \ell \circ f(z) \, \frac{h}{(z - (\zeta + h))(z - \zeta)} \, dz. \end{aligned} \tag{8.1} \]

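For h ≠ h′ in $B_\varepsilon^{\mathbb{C}} \setminus \{0\}$ we write
\[ x(h, h') = \frac{1}{h - h'} \left( \frac{f(\zeta + h) - f(\zeta)}{h} - \frac{f(\zeta + h') - f(\zeta)}{h'} \right) \]
for the second-order difference quotient. We claim that x(h, h′) is uniformly bounded for h ≠ h′ in $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$. This will give the theorem, since it implies that
\[ h \longmapsto \frac{f(\zeta + h) - f(\zeta)}{h} \]
is a Lipschitz function on $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$ and so has a limit as h → 0. To prove the claim let ℓ ∈ X* and use (8.1) to calculate that
\[ \begin{aligned} \ell(x(h, h')) &= \frac{1}{2\pi i (h - h')} \oint_{|z - \zeta| = \varepsilon} \left[ \frac{\ell \circ f(z)}{(z - (\zeta + h))(z - \zeta)} - \frac{\ell \circ f(z)}{(z - (\zeta + h'))(z - \zeta)} \right] dz \\ &= \frac{1}{2\pi i} \oint_{|z - \zeta| = \varepsilon} \frac{\ell \circ f(z)}{(z - \zeta)(z - (\zeta + h))(z - (\zeta + h'))} \, dz, \end{aligned} \tag{8.2} \]
where the factor (h − h′) produced by combining the two fractions has been cancelled.

Notice that the denominator in the integral on the right-hand side of (8.2) is uniformly bounded away from zero, and the numerator is bounded above by M‖ℓ‖ for some constant M depending only on f, ζ, and ε. It follows that
\[ |\ell(x(h, h'))| \le M' \|\ell\| \]
for some constant M′ not depending on h ≠ h′ in $B_{\varepsilon/2}^{\mathbb{C}} \setminus \{0\}$. By Corollary 7.9 we see that ‖x(h, h′)‖ ≤ M′, which proves the claim and hence the theorem.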
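The identity (8.2) can be checked numerically in the scalar case. The sketch below assumes X = ℂ, ℓ the identity, and f = exp, none of which comes from the text; the helper names are hypothetical. It approximates the contour integral by the trapezoid rule on the circle |z − ζ| = ε and compares the result with the second-order difference quotient x(h, h′).

```python
import cmath


def contour_integral(F, zeta, eps, points=400):
    """Approximate (1 / (2*pi*i)) * integral of F over the circle |z - zeta| = eps.

    With z = zeta + w and w = eps * exp(i*theta) we have dz = i * w * dtheta,
    so the normalized integral is the average of F(z) * w over the circle.
    """
    total = 0j
    for k in range(points):
        w = eps * cmath.exp(2j * cmath.pi * k / points)
        total += F(zeta + w) * w
    return total / points


def second_difference(f, zeta, h, hp):
    """The second-order difference quotient x(h, h') for a scalar function f."""
    return ((f(zeta + h) - f(zeta)) / h - (f(zeta + hp) - f(zeta)) / hp) / (h - hp)


def rhs_of_8_2(f, zeta, h, hp, eps):
    """Right-hand side of (8.2) with the functional taken to be the identity."""
    def integrand(z):
        return f(z) / ((z - zeta) * (z - (zeta + h)) * (z - (zeta + hp)))
    return contour_integral(integrand, zeta, eps)


if __name__ == "__main__":
    zeta, eps = 0.3 + 0.2j, 1.0
    h, hp = 0.2 + 0.1j, -0.1 + 0.3j  # both inside the ball of radius eps / 2
    lhs = second_difference(cmath.exp, zeta, h, hp)
    rhs = rhs_of_8_2(cmath.exp, zeta, h, hp, eps)
    print(abs(lhs - rhs))
```

Keeping h and h′ inside the ball of radius ε/2 keeps the poles of the integrand away from the contour, which is the same reason the denominator in (8.2) is bounded away from zero.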

Another instance where weak convergence can be upgraded to strong convergence arises in the proof of a version of the mean ergodic theorem for a measure-preserving group action. We refer to [27, Sec. 8.7], where a simple version of an argument due to Greschonig and Schmidt [42] is presented.

8.2 Applications of Weak* Compactness

The Banach–Alaoglu theorem (Theorem 8.10) is quite helpful for the construction of the Haar measure on compact abelian groups and invariant means (see Section 3.3, Section 7.2 and Section 10.2).


Exercise 8.21. Let G be a compact metric abelian group. Show that there exists a G-invariant positive functional $\Lambda : C_{\mathbb{R}}(G) \to \mathbb{R}$ with Λ(1) = 1, and deduce the existence of a Haar measure on G.

Exercise 8.22. Let H be a separable Hilbert space, and suppose that A ∈ B(H) is a compact operator on H. Show that $A(B_1^H)$ is compact and that A* is also a compact operator.

Exercise 8.23 (Discrete abelian groups are amenable). Let G be any abelian discrete group. Define for any finitely generated subgroup H < G the set $S_H$ to be the set of all positive functionals $L \in (\ell^\infty(G))^*$ which have norm one and are left-invariant under elements of H. Show that the intersection $\bigcap_{H \text{ f.g.}} S_H$ taken over all finitely generated subgroups H in G is non-empty, and deduce that G is amenable by applying Lemma 7.18.

Exercise 8.24. Use Exercise 8.23 and the Riesz representation theorem to give a different proof of the existence of Haar measure on a compact abelian group.

The next exercise generalizes Exercise 7.21(b) and shows the existence of a maximal amenable normal subgroup called the amenable radical of G.

Exercise 8.25 (Amenable radical). Let G be a discrete group. Let A be a set and suppose that $H_\alpha \lhd G$ is an amenable normal subgroup for any α ∈ A. Show that the subgroup $\langle H_\alpha \mid \alpha \in A \rangle$ generated by these subgroups is an amenable normal subgroup.

The following exercise gives an analogue to the existence claim in Theorem 3.13.

Exercise 8.26. Let X be a Banach space and K ⊆ X* a non-empty weak* closed subset. Show that for any $x_0^* \in X^*$ we have $\|x_0^* - k_0\| = \min_{k \in K} \|x_0^* - k\|$ for some $k_0 \in K$.

8.2.1 Equidistribution

The combination of the Riesz representation theorem for functionals on C(X) (Theorem 7.54) and the compactness of the unit ball in the weak* topology in the Banach–Alaoglu theorem (Theorem 8.10) provides the basic tools for studying sequences of probability measures.(25)

Proposition 8.27. Let X be a compact metric space. Then the space $\mathcal{P}(X)$ of probability measures defined on the Borel σ-algebra of X forms a compact metric space in the weak* topology. The same applies to
\[ \mathcal{M}_{\le T}(X) = \{ \mu \text{ is a positive measure on } X \text{ with } \mu(X) \le T \} \]
for all T > 0.

Proof. By the Riesz representation theorem (Theorem 7.54) we have
\[ C(X)^* \cong \mathcal{M}(X) \supseteq \mathcal{P}(X), \]

8.2 Applications of Weak* Compactness


where $\mathcal{M}(X)$ is the space of finite signed measures defined on the Borel σ-algebra of $X$. By Theorem 7.44 the set of probability measures is given by
\[ \mathcal{P}(X) = \Bigl\{ \mu \in \mathcal{M}(X) \Bigm| \int 1_X \, d\mu = 1 \Bigr\} \cap \bigcap_{f \ge 0} \Bigl\{ \mu \in \mathcal{M}(X) \Bigm| \int f \, d\mu \ge 0 \Bigr\}, \]
where the intersection is taken over all $f \in C(X)$ with $f \ge 0$. Since each of the sets in the intersection is closed in the weak* topology, we see that $\mathcal{P}(X)$ is closed as well. By the Banach–Alaoglu theorem (Theorem 8.10), and since
\[ \mathcal{P}(X) \subseteq \overline{B_1^{C(X)^*}}, \]
this implies that $\mathcal{P}(X)$ is compact in the weak* topology. By Lemma 2.46 we know that $C(X)$ is separable, so by Proposition 8.11 the weak* topology on $\mathcal{P}(X)$ is metrizable. The same argument applies to $\mathcal{M}_{\le T}(X)$. □

Exercise 8.28. Let $X$ be a locally compact σ-compact metric space. Show for any $T > 0$ that $\mathcal{M}_{\le T}(X)$ (defined as in Proposition 8.27) is compact with respect to the weak* topology defined by $C_0(X)$ and the identification between $C_0(X)^*$ and $\mathcal{M}(X)$ in the Riesz representation (Theorem 7.54). Also show that the space of probability measures $\mathcal{P}(X)$ is not compact if $X$ is not compact.

Definition 8.29. Let $X$ be a compact metric space, and let $(\mu_n)$ be a sequence of probability measures in $\mathcal{P}(X)$. We say that $(\mu_n)$ equidistributes with respect to a probability measure $m \in \mathcal{P}(X)$ if $\mu_n \to m$ as $n \to \infty$ in the weak* topology; that is, if
\[ \int_X f \, d\mu_n \longrightarrow \int_X f \, dm \]
as $n \to \infty$ for all $f \in C(X)$. A sequence $(x_n)$ equidistributes with respect to $m \in \mathcal{P}(X)$ if the averages $\mu_n = \frac{1}{n}(\delta_{x_1} + \cdots + \delta_{x_n})$ of the Dirac measures at $x_1, \dots, x_n$ equidistribute with respect to $m$.

One is often interested in equidistribution with respect to a natural given measure like the Lebesgue measure on $\mathbb{T}$. In that case the natural measure is often not mentioned, and we simply talk about a sequence of measures being equidistributed. For the case of the Lebesgue measure on $\mathbb{T}^d$ the following provides a characterization of equidistribution.

Lemma 8.30. A sequence $(\mu_n)$ of probability measures on $\mathbb{T}^d$ equidistributes if and only if
\[ \int_{\mathbb{T}^d} \chi_k \, d\mu_n \longrightarrow \int_{\mathbb{T}^d} \chi_k \, dx = 0 \]
for all $k \in \mathbb{Z}^d \setminus \{0\}$.
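The criterion in Lemma 8.30 lends itself to a quick numerical experiment on $\mathbb{T}$. The following is a minimal sketch only: the helper names `weyl_sum` and `rotation_orbit`, the sample size, and the choices $\alpha = \sqrt{2}$ and $\alpha = 1/4$ are illustrative assumptions and do not come from the text. It evaluates the averages $\frac{1}{N}\sum_{j=1}^{N} \chi_k(j\alpha)$ for an irrational and a rational rotation number.

```python
import cmath
import math


def weyl_sum(points, k):
    """Return (1/N) * sum of exp(2*pi*i*k*x) over the given points of T = R/Z."""
    n = len(points)
    return sum(cmath.exp(2j * math.pi * k * x) for x in points) / n


def rotation_orbit(alpha, n):
    """First n points of the sequence (j * alpha mod 1) for j = 1, ..., n."""
    return [(j * alpha) % 1.0 for j in range(1, n + 1)]


# Illustrative parameters (assumptions, not taken from the text).
N = 5000
irrational = rotation_orbit(math.sqrt(2), N)  # irrational rotation number
rational = rotation_orbit(0.25, N)            # periodic orbit of period 4

for k in (1, 2, 3, 4):
    # Print |average of chi_k| for both sequences.
    print(k, abs(weyl_sum(irrational, k)), abs(weyl_sum(rational, k)))
```

For the irrational rotation all of the printed averages should be small, while for the rational rotation the character $\chi_4$ averages to a value of modulus close to one, so the criterion of the lemma fails there.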


8 Locally Convex Vector Spaces

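Essential Exercise 8.31. Prove Lemma 8.30 using the density of the trigonometric polynomials in $C(\mathbb{T}^d)$.

Exercise 8.32. Assume that $1, \alpha_1, \dots, \alpha_d \in \mathbb{R}$ are linearly independent over $\mathbb{Q}$. Show that
\[ \frac{1}{N} \sum_{n=0}^{N-1} f\bigl( n(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \longrightarrow \int_{\mathbb{T}^d} f(x) \, dx \]
as $N \to \infty$ for any $f \in C(\mathbb{T}^d)$. Use this to generalize Exercise 2.50 to a statement about powers of 2 and 3 with the same exponent.

Exercise 8.33. Assume that $\alpha_1, \dots, \alpha_d \in \mathbb{R}$ are linearly independent over $\mathbb{Q}$. Show that
\[ \frac{1}{T} \int_0^T f\bigl( t(\alpha_1, \dots, \alpha_d) \ (\mathrm{mod}\ \mathbb{Z}^d) \bigr) \, dt \longrightarrow \int_{\mathbb{T}^d} f(x) \, dx \]
as $T \to \infty$, for any $f \in C(\mathbb{T}^d)$.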

Lemma 8.30 already gives some examples of equidistributed sequences, generalizing Example 2.49 (see also Exercises 8.32 and 8.33). Equidistribution results like this are a starting point for more general results obtained by Weyl [112]. We will only discuss a special case, and outline a proof along the lines of a slightly more recent approach due to Furstenberg [36].

Proposition 8.34. If $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, then the sequence $(x_n)$ defined by $x_n = n^2 \alpha$ modulo $\mathbb{Z}$ for all $n \ge 1$ is equidistributed in $\mathbb{T}$.

The approach of Furstenberg is to study not just $n^2 \alpha$ modulo $\mathbb{Z}$ but in fact orbits of points $(x, y) \in \mathbb{T}^2$ under the map $T : \mathbb{T}^2 \to \mathbb{T}^2$ defined by
\[ T(x, y) = (x + \alpha, y + 2x + \alpha). \tag{8.3} \]
Notice that
\[ T(0, 0) = (\alpha, \alpha), \quad T^2(0, 0) = (2\alpha, 4\alpha), \quad \dots, \quad T^n(0, 0) = (n\alpha, n^2\alpha), \]
so that Proposition 8.34 will certainly follow from the stronger result that the orbit $\{T^n(0, 0) \mid n \ge 0\}$ is equidistributed in $\mathbb{T}^2$. Dynamical questions of this sort, concerning equidistribution of an orbit under iteration of a map, are part of ergodic theory. We will briefly outline how one can use the Banach–Alaoglu theorem (Theorem 8.10) to prove Proposition 8.34 using ideas from ergodic theory without developing this theory further, and refer to [27] for a more thorough treatment.
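The orbit formula for the map in (8.3) can be checked mechanically, and the second coordinates of the orbit can be tested against the character $\chi_k$ on $\mathbb{T}$. The sketch below is an illustration under stated assumptions: the function names, the rational test value $\alpha = 3/17$ (used only because exact arithmetic makes the identity $T^n(0,0) = (n\alpha, n^2\alpha)$ checkable, and the identity itself holds for every $\alpha$), and the choice $\alpha = \sqrt{2}$ with $N = 20000$ are all hypothetical and not taken from the text.

```python
from fractions import Fraction
import cmath
import math


def skew_step(point, alpha):
    """One application of T(x, y) = (x + alpha, y + 2x + alpha) on the torus."""
    x, y = point
    return ((x + alpha) % 1, (y + 2 * x + alpha) % 1)


def orbit_of_origin(alpha, n):
    """Return the list [T^1(0,0), ..., T^n(0,0)]."""
    zero = alpha * 0  # zero of the same numeric type as alpha
    point = (zero, zero)
    orbit = []
    for _ in range(n):
        point = skew_step(point, alpha)
        orbit.append(point)
    return orbit


def second_coordinate_weyl_sum(alpha, n, k):
    """Average of exp(2*pi*i*k*y) over the second coordinates of the orbit."""
    ys = [y for (_, y) in orbit_of_origin(alpha, n)]
    return sum(cmath.exp(2j * math.pi * k * y) for y in ys) / n


# Exact check of T^n(0,0) = (n*alpha, n^2*alpha) mod 1 with rational arithmetic.
exact_alpha = Fraction(3, 17)
identity_holds = all(
    point == ((n * exact_alpha) % 1, (n * n * exact_alpha) % 1)
    for n, point in enumerate(orbit_of_origin(exact_alpha, 50), start=1)
)

# Floating-point experiment for an irrational alpha (illustrative only).
average = abs(second_coordinate_weyl_sum(math.sqrt(2), 20000, 1))
print(identity_holds, average)
```

A small value of `average` is consistent with Proposition 8.34 but of course proves nothing; the point of the sketch is only to make the reduction from $n^2\alpha$ to the orbit of $T$ concrete.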


Definition 8.35. Let $X$ be a compact metric space, and let $T : X \to X$ be a continuous transformation. A Borel probability measure $\mu$ on $X$ is said to be $T$-invariant ($T$ is called measure-preserving with respect to $\mu$) if $\mu(T^{-1}B) = \mu(B)$ for Borel measurable sets $B \subseteq X$. The triple $(X, T, \mu)$ is called a measure-preserving system. A $T$-invariant probability measure $\mu$ is said to be ergodic if any Borel measurable set $B \subseteq X$ with $\mu(T^{-1}B \triangle B) = 0$ has $\mu(B) \in \{0, 1\}$.

Ergodicity is the natural notion of indecomposability in ergodic theory (which includes the study of measure-preserving systems). To see this, notice that if $B \subseteq X$ is measurable with $\mu(T^{-1}B \triangle B) = 0$ and $\mu(B) \in (0, 1)$, then we can decompose the measure into a convex combination
\[ \mu = \mu(B) \Bigl( \tfrac{1}{\mu(B)} \mu|_B \Bigr) + \mu(X \setminus B) \Bigl( \tfrac{1}{\mu(X \setminus B)} \mu|_{X \setminus B} \Bigr), \]
where one can quickly check that $\frac{1}{\mu(B)} \mu|_B$ and $\frac{1}{\mu(X \setminus B)} \mu|_{X \setminus B}$ are two different $T$-invariant probability measures. Thus a non-ergodic measure-preserving system can be decomposed into two disjoint measure-preserving systems (where in both systems we still consider the map $T : X \to X$). Pursuing the idea that a non-ergodic measure is one that can be decomposed in this way leads to the following alternate characterization of ergodicity.

Proposition 8.36. Let $X$ be a compact metric space, and let $T : X \to X$ be a continuous transformation. Then the space
\[ \mathcal{P}^T(X) = \{ \mu \in \mathcal{M}(X) \mid \mu \text{ is a } T\text{-invariant probability measure} \} \]
is a weak* compact convex subset of $\mathcal{M}(X)$. The extreme points of $\mathcal{P}^T(X)$ are precisely the ergodic measures in $\mathcal{P}^T(X)$.

We recall that $\mu \in \mathcal{P}^T(X)$ is extremal if it cannot be expressed as a proper convex combination $\mu = s\nu_1 + (1 - s)\nu_2$ with $s \in (0, 1)$ and $\nu_1, \nu_2$ distinct elements of $\mathcal{P}^T(X)$. We will discuss extreme points from an abstract point of view in Section 8.6.1. The characterization in Proposition 8.36 is interesting because it relates an intrinsic property of a $T$-invariant probability measure (ergodicity, as in Definition 8.35) with a property regarding the relative position of this measure in the space of all $T$-invariant probability measures.

For the proof we will make use of the following construction. Given a measurable map $\theta : (X, \mathcal{B}) \to (Y, \mathcal{C})$ between two measurable spaces, the push-forward of a measure $\mu$ on $(X, \mathcal{B})$ is the measure $\theta_* \mu$ on $(Y, \mathcal{C})$ defined by $\theta_* \mu(A) = \mu(\theta^{-1} A)$ for all $A \in \mathcal{C}$. Notice that $\int_Y f \, d\theta_* \mu = \int_X f \circ \theta \, d\mu$ for all integrable functions $f$ on $(Y, \mathcal{C})$, which follows for simple functions directly from the definition of $\theta_* \mu$ and then for positive measurable functions by monotone convergence.


If $X$ and $T : X \to X$ are as in the proposition, then the uniqueness part of Riesz representation (Theorem 7.44) implies that $\mu$ is $T$-invariant if and only if
\[ \int_X f \, dT_* \mu = \int_X f \circ T \, d\mu = \int_X f \, d\mu \tag{8.4} \]
for all $f \in C(X)$.

Sketch of Proof of Proposition 8.36. By Proposition 8.27, $\mathcal{P}(X)$ is compact in the weak* topology. By the characterization in (8.4), the subset $\mathcal{P}^T(X) \subseteq \mathcal{P}(X)$ is therefore weak* closed and so also compact. It is easy to see that $\mathcal{P}^T(X)$ is convex, and the discussion before the proposition shows that a non-ergodic invariant measure is not extreme.

Suppose now that $\mu$ is not extreme, and write $\mu = s\nu_1 + (1 - s)\nu_2$ with some $s \in (0, 1)$ and $\nu_1, \nu_2$ distinct measures in $\mathcal{P}^T(X)$. Clearly $\nu_1 \ll \mu$ since $s$ lies in $(0, 1)$, so there is a measurable function $f_1 \ge 0$ with $d\nu_1 = f_1 \, d\mu$. We claim that $f_1$ is $T$-invariant in the sense that $f_1 \circ T = f_1$ almost everywhere with respect to $\mu$. To see this let $B \subseteq X$ be a measurable set and note that by $T$-invariance of $\mu$ we have
\[ \nu_1(B) = \int_X 1_B f_1 \, d\mu = \int_X (1_B f_1) \circ T \, d\mu = \int_{T^{-1}B} f_1 \circ T \, d\mu, \]
and, by $T$-invariance of $\nu_1$,
\[ \nu_1(B) = \nu_1(T^{-1}B) = \int_{T^{-1}B} f_1 \, d\mu. \]
Let us assume† now that $T$ has a continuous inverse (which is the case for the map on $\mathbb{T}^2$ considered above to which this result will be applied). Then the above implies that $f_1 = f_1 \circ T$ almost everywhere with respect to $\mu$, since $T^{-1}(TB) = B$ shows that all measurable sets are pre-images. Since $\nu_1 \ne \mu$ the function $f_1$ is not equal to 1 almost everywhere with respect to $\mu$, and has
\[ \int_X f_1 \, d\mu = \nu_1(X) = 1. \]
Therefore, $B = f_1^{-1}([0, 1))$ satisfies $\mu(B \triangle T^{-1}B) = 0$ and has $\mu(B) \in (0, 1)$, so $\mu$ is not ergodic. □

† This is not necessary; we refer to [27] for the general case.

The compactness of $\mathcal{P}(X)$ can be used to obtain elements of $\mathcal{P}^T(X)$ from sequences of approximately invariant measures.

Essential Exercise 8.37. Let $X$ be a non-trivial compact metric space and $T : X \to X$ be continuous. For any sequence $(\nu_n)$ in $\mathcal{P}(X)$, define a sequence $(\mu_n)$ by
\[ \mu_n = \frac{1}{n} \sum_{j=0}^{n-1} T_*^j \nu_n \]
for all $n \ge 1$. Show that any weak* limit $\mu$ of a subsequence of $(\mu_n)$ is $T$-invariant, and deduce that $\mathcal{P}^T(X)$ is non-empty.(26)

With these general facts about continuous transformations on compact metric spaces at our disposal, we will return to a consideration of the transformation (8.3). We start by explaining Example 2.49 in this language. Clearly the map $R_\alpha : \mathbb{T} \to \mathbb{T}$ defined by $R_\alpha(x) = x + \alpha$ preserves the Lebesgue measure $\lambda_{\mathbb{T}}$, so $(\mathbb{T}, R_\alpha, \lambda_{\mathbb{T}})$ is a measure-preserving system.

Lemma 8.38. If $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ then $\lambda_{\mathbb{T}}$ is ergodic for $R_\alpha$.

Proof. Suppose that $B \subseteq \mathbb{T}$ is a measurable set with $\lambda_{\mathbb{T}}(B \triangle R_\alpha^{-1} B) = 0$. Then the characteristic function $1_B$ satisfies $1_B \circ R_\alpha = 1_{R_\alpha^{-1} B} = 1_B$ as elements of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$. Thus for the Fourier series expansion

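\[ 1_B = \sum_{m \in \mathbb{Z}} c_m \chi_m, \]
which converges in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, we have
\[ 1_B \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m \circ R_\alpha = \sum_{m \in \mathbb{Z}} c_m \chi_m, \]
where we have used the fact that $U_{R_\alpha} : f \mapsto f \circ R_\alpha$ is an isometry of $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$ and hence maps a convergent series to a convergent series. Notice that
\[ \chi_m \circ R_\alpha(x) = e^{2\pi i m (x + \alpha)} = e^{2\pi i m \alpha} \chi_m(x), \]
so that (by uniqueness of Fourier coefficients) we must have $c_m e^{2\pi i m \alpha} = c_m$ for all $m \in \mathbb{Z}$. Since $\alpha \in \mathbb{R} \setminus \mathbb{Q}$ this implies that $c_m = 0$ for $m \in \mathbb{Z} \setminus \{0\}$, so $1_B = c_0$ in $L^2_{\lambda_{\mathbb{T}}}(\mathbb{T})$, which implies that $\lambda_{\mathbb{T}}(B) \in \{0, 1\}$ as required. □

In fact, $R_\alpha$ has a stronger property, called unique ergodicity: $\lambda_{\mathbb{T}}$ is the only measure invariant under $R_\alpha$ if $\alpha \in \mathbb{R} \setminus \mathbb{Q}$. This also implies Lemma 8.38 by Proposition 8.36. To see this stronger result, let $\mu$ be any $R_\alpha$-invariant probability measure, and calculate
\[ \int_{\mathbb{T}} \chi_m \, d\mu = \int_{\mathbb{T}} \chi_m \circ R_\alpha \, d\mu = e^{2\pi i m \alpha} \int_{\mathbb{T}} \chi_m \, d\mu, \]
which implies that
\[ \int_{\mathbb{T}} \chi_m \, d\mu = \begin{cases} 1 & \text{for } m = 0; \\ 0 & \text{for } m \ne 0. \end{cases} \]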


Since this is a property shared by $\lambda_{\mathbb{T}}$, and the trigonometric polynomials are dense in $C(\mathbb{T})$ by Proposition 3.65, we deduce that $\mu = \lambda_{\mathbb{T}}$.

Using Exercise 8.37 together with the exercise below gives an alternative approach to Example 2.49. Of course this approach is more complicated, but it can also be used in situations where a direct calculation of the sort used in Example 2.49 is not feasible.

Essential Exercise 8.39. (a) Let $Z$ be a topological space, let $(z_n)$ be a sequence in $Z$, and let $z \in Z$. Show that the following are equivalent:
• $\lim_{n \to \infty} z_n = z$.
• For every subsequence $(z_{n_k})$ there is a subsequence $(z_{n_{k_\ell}})$ such that $\lim_{\ell \to \infty} z_{n_{k_\ell}} = z$.
(b) Assume in addition that $Z$ is a compact metric space, and show that the following gives another equivalent condition:
• For every convergent subsequence $(z_{n_k})$ we have $\lim_{k \to \infty} z_{n_k} = z$.
(c) Assume now that $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, and use this, together with the fact that $\mathcal{P}^{R_\alpha}(\mathbb{T})$ only contains the measure $\lambda_{\mathbb{T}}$ and Exercise 8.37, to show the equidistribution of $(n\alpha)$ in $\mathbb{T}$.

We now describe the procedure for obtaining equidistribution of the orbits of the map $T$ defined by $T(x, y) = (x + \alpha, y + 2x + \alpha)$ on $\mathbb{T}^2$ discussed earlier, leaving some of the steps as exercises.

Essential Exercise 8.40. Show that the Lebesgue measure $\lambda_{\mathbb{T}^2}$ is $T$-invariant and ergodic.

Sketch of Proof of Proposition 8.34. As discussed just after the statement of the proposition, it is enough to show that every $T$-orbit
\[ (x, y), \ T(x, y), \ T^2(x, y), \ \dots \]
is equidistributed with respect to $\lambda_{\mathbb{T}^2}$. Notice that the first coordinates of the points in the orbit are precisely the points in the orbit
\[ x, \ R_\alpha(x), \ R_\alpha^2(x), \ \dots \]

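in $\mathbb{T}$ for the transformation $R_\alpha$. We already know that this sequence is equidistributed with respect to $\lambda_{\mathbb{T}}$. Write $\delta_x$ for the Dirac measure at $x \in \mathbb{T}$, so the equidistribution of the $R_\alpha$-orbit of $x$ is equivalent to the statement
\[ \frac{1}{n} \sum_{j=0}^{n-1} T_*^j (\delta_x \times \lambda_{\mathbb{T}}) \longrightarrow \lambda_{\mathbb{T}^2} \tag{8.5} \]
as $n \to \infty$ (check this). Fix some $\rho \in (0, \frac{1}{2})$ and write $\lambda_{y,\rho} = \lambda|_{B_\rho(y)}$ for the Lebesgue measure restricted to the $\rho$-ball $B_\rho(y) = (y - \rho, y + \rho) \subseteq \mathbb{T}$ around $y \in \mathbb{T}$, and consider the average
\[ \frac{1}{2\rho n} \sum_{j=0}^{n-1} T_*^j \bigl( \delta_x \times \lambda_{y,\rho} \bigr). \tag{8.6} \]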

We want to show that these averages converge to $\lambda_{\mathbb{T}^2}$ in the weak* topology. Proposition 8.27 and Exercise 8.39 imply that for this it is enough to show that any convergent subsequence has $\lambda_{\mathbb{T}^2}$ as its limit. So assume $(n_k)$ is the index sequence of a convergent subsequence, and denote the limit by $\mu_1$. Using the convergence in (8.5), we see that
\[ \frac{1}{(1 - 2\rho) n_k} \sum_{j=0}^{n_k - 1} T_*^j \bigl( \delta_x \times (\lambda_{\mathbb{T}} - \lambda_{y,\rho}) \bigr) \longrightarrow \mu_2 \]
also converges as $k \to \infty$, to some measure $\mu_2$. By Exercise 8.37 we have $\mu_1, \mu_2 \in \mathcal{P}^T(\mathbb{T}^2)$. We also have $\lambda_{\mathbb{T}^2} = 2\rho \mu_1 + (1 - 2\rho) \mu_2$ by (8.5). Together with Exercise 8.40 and Proposition 8.36 this implies that $\lambda_{\mathbb{T}^2} = \mu_1 = \mu_2$. Using Exercise 8.39(b) this shows that the average in (8.6) converges to $\lambda_{\mathbb{T}^2}$ as $n \to \infty$.

Using the structure of the map $T$ it is now not too difficult to upgrade the above to the statement in the proposition. Fix some function $f \in C(\mathbb{T}^2)$ and some $\varepsilon > 0$. By uniform continuity of $f$ there is some $\rho \in (0, \frac{1}{2})$ such that
\[ d\bigl( (x_1, y_1), (x_2, y_2) \bigr) < \rho \implies |f(x_1, y_1) - f(x_2, y_2)| < \varepsilon \]
(where $d$ denotes the usual metric on $\mathbb{T}^2$). With this choice of $\rho$ we have
\[ \biggl| \frac{1}{n} \sum_{j=0}^{n-1} f\bigl( T^j(x, y) \bigr) - \frac{1}{2\rho n} \sum_{j=0}^{n-1} \int_{-\rho}^{\rho} f\bigl( T^j(x, y + z) \bigr) \, dz \biggr| < \varepsilon \]
since $T^j(x, y + z) = T^j(x, y) + (0, z)$ has distance less than $\rho$ from $T^j(x, y)$ for all $z \in (-\rho, \rho)$. Using the convergence of (8.6) to $\lambda_{\mathbb{T}^2}$, it follows that
\[ \limsup_{n \to \infty} \biggl| \int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} - \frac{1}{n} \sum_{j=0}^{n-1} f\bigl( T^j(x, y) \bigr) \biggr| \le \limsup_{n \to \infty} \biggl| \int_{\mathbb{T}^2} f \, d\lambda_{\mathbb{T}^2} - \frac{1}{2\rho n} \sum_{j=0}^{n-1} \int f \, dT_*^j (\delta_x \times \lambda_{y,\rho}) \biggr| + \varepsilon = \varepsilon. \]
Since $f \in C(\mathbb{T}^2)$ and $\varepsilon > 0$ were arbitrary, the proposition follows. □



8.2.2 Elliptic Regularity for the Laplace Operator †

† This and the next subsection finish our discussion of Sobolev spaces and the Laplace operator. In particular, this material will not be needed in the remainder of the book.

We show in this and the next subsection how the Banach–Alaoglu theorem can help to prove elliptic regularity for weak solutions to equations of the form $\Delta g = u$ with $g$ in $H_0^1(U)$, $u$ in $L^2(U)$, and $U \subseteq \mathbb{R}^d$ open and bounded. In this section we essentially reprove Theorem 5.45 using different methods. In the next subsection we will assume that $U$ has smooth boundary and will show the regularity (unlike in Section 5.3.2) up to and including the boundary. For convenience we will consider only $\mathbb{R}$-valued functions.

Definition 8.41 (Difference quotients). Let $U \subseteq \mathbb{R}^d$ and $V \subseteq U$ be open subsets. For any $f \in L^2(U)$, $j = 1, \dots, d$ and $h \in \mathbb{R}$ such that $V + he_j$ is contained in $U$ we define the difference quotient $D_j^h f \in L^2(V)$ by
\[ D_j^h f(x) = \frac{f(x + he_j) - f(x)}{h} \]
for almost every $x \in V$.

As one might expect, the difference quotient and the weak partial derivative are related. The first connection below is a direct application of our definition of the Sobolev spaces.

Lemma 8.42 (Bounding the difference quotient). Let $V \subseteq U \subseteq \mathbb{R}^d$ be open subsets and $s > 0$ such that $V + [-s, s]e_j \subseteq U$ for some $j \in \{1, \dots, d\}$. Then, for any function $f \in H^1(U)$,
\[ \|D_j^h f\|_{L^2(V)} \le \|\boldsymbol{\partial}_j f\|_{L^2(U)} \]
for $0 < |h| \le s$.

Proof. If $f \in C^\infty(U) \cap H^1(U)$ then
\[ D_j^h f(x) = \frac{f(x + he_j) - f(x)}{h} = \int_0^1 \partial_j f(x + the_j) \, dt \]
for all $x \in V$ and $0 < |h| \le s$. By integrating the square of this equation, applying Cauchy–Schwarz, translation invariance of the Lebesgue measure, and Fubini's theorem we obtain

\[ \int_V |D_j^h f(x)|^2 \, dx = \int_V \biggl| \int_0^1 \partial_j f(x + the_j) \, dt \biggr|^2 dx \le \int_V \int_0^1 |\partial_j f(x + the_j)|^2 \, dt \, dx \le \int_U |\partial_j f(x)|^2 \, dx \]
whenever $0 < |h| \le s$. Approximating $f \in H^1(U)$ by elements of the intersection $H^1(U) \cap C^\infty(U)$ then gives the lemma. □

The above lemma will be useful but it will be more powerful when combined with its partial converse, which is a corollary of the Banach–Alaoglu theorem.

Corollary 8.43 (Existence of weak derivative). Let $V \subseteq U \subseteq \mathbb{R}^d$ be open subsets and $s > 0$ satisfying $V + [-s, s]e_j \subseteq U$ for some $j \in \{1, \dots, d\}$. Suppose that $f \in L^2(U)$ and $C > 0$ satisfy
\[ \|D_j^h f\|_{L^2(V)} \le C \]

(8.7)

for all $0 < |h| \le s$. Then $f|_V$ has a weak partial derivative $\boldsymbol{\partial}_j f$ on $V$ satisfying $\|\boldsymbol{\partial}_j f\|_{L^2(V)} \le C$.

Proof. Let $\varphi \in C_c^\infty(V)$ and note that this implies that $D_j^h \varphi$ converges uniformly to $\partial_j \varphi$ as $h \to 0$. Indeed, we have
\[ D_j^h \varphi(x) = \frac{\varphi(x + he_j) - \varphi(x)}{h} = \partial_j \varphi(x + \xi_h he_j) \longrightarrow \partial_j \varphi(x) \]
as $h \to 0$ for some $\xi_h \in (0, 1)$ by the mean value theorem and continuity of $\partial_j \varphi$. Since the convergence is uniform, for $f \in L^2(U)$ we have
\[ \langle f, D_j^h \varphi \rangle_{L^2(V)} \longrightarrow \langle f, \partial_j \varphi \rangle_{L^2(V)} \tag{8.8} \]
as $h \to 0$. On the other hand, since $\varphi$ has compact support within $V$ we may also shift integration to obtain a discrete analogue of the formula defining the weak derivative in Definition 5.8,
\[ \begin{aligned} \bigl\langle f, D_j^h \varphi \bigr\rangle_{L^2(V)} &= \frac{1}{h} \int_V f(x) \bigl( \varphi(x + he_j) - \varphi(x) \bigr) \, dx \\ &= \frac{1}{h} \int_V f(y - he_j) \varphi(y) \, dy - \frac{1}{h} \int_V f(x) \varphi(x) \, dx \\ &= -\bigl\langle D_j^{-h} f, \varphi \bigr\rangle_{L^2(V)} \end{aligned} \tag{8.9} \]
for all sufficiently small $h$.

By the assumption (8.7) the functions $D_j^{-1/n} f|_V$ for $n \in \mathbb{N}$ with $\frac{1}{n} \le s$ form a bounded sequence of elements of the Hilbert space $L^2(V)$ satisfying $\|D_j^{-1/n} f\|_{L^2(V)} \le C$. By the Banach–Alaoglu theorem (Theorem 8.10 and Proposition 8.11) there exists a subsequence $(n_k)$ with the property that $D_j^{-1/n_k} f|_V$ converges in the weak* topology to some function $v \in L^2(V)$ as $k \to \infty$ with $\|v\|_{L^2(V)} \le C$. We claim that $v$ is the weak partial derivative sought in the corollary. In fact, by (8.9) we now have for any $\varphi \in C_c^\infty(V)$
\[ \langle f, D_j^{1/n_k} \varphi \rangle_{L^2(V)} = -\langle D_j^{-1/n_k} f, \varphi \rangle_{L^2(V)} \longrightarrow -\langle v, \varphi \rangle_{L^2(V)} \]
as $k \to \infty$. Together with (8.8) this gives
\[ \langle f, \partial_j \varphi \rangle_{L^2(V)} = -\langle v, \varphi \rangle_{L^2(V)} \]
for any function $\varphi \in C_c^\infty(V)$, which proves the corollary. □

Exercise 8.44. Using the same assumptions as in Corollary 8.43 show that the difference quotients $D_j^h f$ converge weakly in $L^2(V)$ to $\boldsymbol{\partial}_j f$ as $h \to 0$. Do they also converge strongly?
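The bound of Lemma 8.42, which provides the uniform estimate needed as input for Corollary 8.43, can be sanity-checked numerically in one dimension. The following is a rough sketch under assumptions that are not part of the text: $d = 1$, $U = (0, 1)$, $V = (0.2, 0.8)$, $s = 0.2$, the smooth test function $f(x) = \sin(2\pi x) + x^2$, and a midpoint-rule approximation of the $L^2$ norms; the helper names are hypothetical.

```python
import math


def l2_norm(func, a, b, samples=20000):
    """Approximate the L^2 norm of func on (a, b) by the midpoint rule."""
    width = (b - a) / samples
    total = sum(func(a + (i + 0.5) * width) ** 2 for i in range(samples))
    return math.sqrt(total * width)


def difference_quotient(func, h):
    """Return the function x -> (func(x + h) - func(x)) / h."""
    return lambda x: (func(x + h) - func(x)) / h


# Illustrative test function on U = (0, 1) and its classical derivative.
def f(x):
    return math.sin(2 * math.pi * x) + x * x


def f_prime(x):
    return 2 * math.pi * math.cos(2 * math.pi * x) + 2 * x


# V = (0.2, 0.8) and s = 0.2, so that V + [-s, s] is contained in U.
derivative_norm = l2_norm(f_prime, 0.0, 1.0)
quotient_norms = {
    h: l2_norm(difference_quotient(f, h), 0.2, 0.8)
    for h in (0.2, 0.1, 0.01, -0.01, -0.1, -0.2)
}

for h, norm in quotient_norms.items():
    # Each difference-quotient norm on V should not exceed the norm of f' on U.
    print(h, norm, norm <= derivative_norm)
```

The experiment only illustrates the inequality for one smooth function; the content of Corollary 8.43 is the converse direction, where a uniform bound on the difference quotients produces the weak derivative.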

In order for Corollary 8.43 to be useful we wish to upgrade the conclusion and obtain functions in $H^1$. For this we need some more preparations, which partially already featured in some exercises. For any open subset $U \subseteq \mathbb{R}^d$ we will identify elements $f \in L^2(U)$ with their trivial extension to all of $\mathbb{R}^d$ (by setting the extension to be equal to zero outside of $U$). By a slight abuse of terminology we will also say that $f \in L^2(U)$ has compact support in $\mathbb{R}^d$ if (for some representative of $f$) the set $\{x \in U \mid f(x) \ne 0\}$ has compact closure in $\mathbb{R}^d$.

Proposition 8.45 (Convolution and derivatives). Let $U \subseteq \mathbb{R}^d$ be open. Let $f \in L^2(U)$ have compact support in $\mathbb{R}^d$ and let $\chi \in C_c^\infty(\mathbb{R}^d)$. Then the convolution product $f * \chi$ defined by
\[ f * \chi(x) = \int f(y) \chi(x - y) \, dy = \int f(x - z) \chi(z) \, dz \]
for $x \in \mathbb{R}^d$ is smooth with compact support, and its derivatives are given by $\partial_\alpha (f * \chi) = f * \partial_\alpha \chi$ for all $\alpha \in \mathbb{N}_0^d$. If $f \in L^2(U)$ has a weak $\alpha$-partial derivative $f_\alpha \in L^2(U)$ for some $\alpha \in \mathbb{N}_0^d$ and the open subset $V \subseteq U$ has the property that $V - \operatorname{Supp} \chi \subseteq U$, then we also have $\partial_\alpha (f * \chi)|_V = (f_\alpha * \chi)|_V$. Similarly, if $g \in L^2(U)$ has compact support and satisfies the equation $\Delta g = u$ then $\Delta (g * \chi)|_V = (u * \chi)|_V$.

Proof. Let $f$ be as in the proposition and note that $f \in L^1(\mathbb{R}^d)$. Since $\chi$ has compact support, the set $\{x \in U \mid f(x) \ne 0\} + \operatorname{Supp} \chi$ has compact closure and it is easy to see that $f * \chi$ vanishes outside of this set. Note that dominated convergence implies that $f * \chi$ is continuous. Moreover, for any $j \in \{1, \dots, d\}$ we have
\[ \partial_j (f * \chi)(x) = \lim_{h \to 0} \frac{1}{h} \bigl( f * \chi(x + he_j) - f * \chi(x) \bigr) = \lim_{h \to 0} \int f(y) \, \frac{\chi(x + he_j - y) - \chi(x - y)}{h} \, dy = f * \partial_j \chi(x) \]
for all $x \in \mathbb{R}^d$ by the mean value theorem and dominated convergence. Induction now shows that $f * \chi \in C_c^\infty(\mathbb{R}^d)$ and $\partial_\alpha (f * \chi) = f * \partial_\alpha \chi$ for all $\alpha \in \mathbb{N}_0^d$, as claimed.

Assume next that $f \in L^2(U)$ has the weak $\alpha$-partial derivative $f_\alpha \in L^2(U)$ for some $\alpha \in \mathbb{N}_0^d$. Note that $f_\alpha$ also has compact support, as it vanishes by Lemma 5.10 almost everywhere on every open subset on which $f$ vanishes almost everywhere. Also suppose for $\chi \in C_c^\infty(\mathbb{R}^d)$ that the open subset $V \subseteq U$ satisfies $V - \operatorname{Supp} \chi \subseteq U$. Now let $\varphi \in C_c^\infty(V)$ and consider
\[ \langle f * \chi, \partial_\alpha \varphi \rangle_{L^2(V)} = \int_V \int_{\operatorname{Supp} \chi} f(x - y) \chi(y) \, dy \, \partial_\alpha \varphi(x) \, dx = \int_{\operatorname{Supp} \chi} \chi(y) \, \langle f, \partial_\alpha (\lambda_{-y} \varphi) \rangle_{L^2(U)} \, dy, \]

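where $\lambda_{-y} \varphi(x) = \varphi(x + y)$ has support $\operatorname{Supp} \varphi - y$ and defines for all $y$ in $\operatorname{Supp} \chi$ an element of $C_c^\infty(U)$ since $V - \operatorname{Supp} \chi \subseteq U$. Using the fact that $f_\alpha$ is the weak partial derivative of $f$ we obtain
\[ \begin{aligned} \langle f * \chi, \partial_\alpha \varphi \rangle_{L^2(V)} &= (-1)^{\|\alpha\|_1} \int_{\operatorname{Supp} \chi} \chi(y) \, \langle f_\alpha, \lambda_{-y} \varphi \rangle_{L^2(U)} \, dy \\ &= (-1)^{\|\alpha\|_1} \int_{\operatorname{Supp} \chi} \chi(y) \int_V f_\alpha(x - y) \varphi(x) \, dx \, dy \\ &= (-1)^{\|\alpha\|_1} \langle f_\alpha * \chi, \varphi \rangle_{L^2(V)} \end{aligned} \]
for any $\varphi \in C_c^\infty(V)$. By uniqueness of the weak derivative (Lemma 5.10) and continuity we now obtain $\partial_\alpha (f * \chi)|_V = (f_\alpha * \chi)|_V$ (pointwise) as required.

This argument also gives the claim in the proposition for $g \in L^2(U)$ with $\Delta g = u \in L^2(U)$. Indeed, with the same arguments concerning the support of $\lambda_{-y} \varphi$ with $y \in \operatorname{Supp} \chi$ we obtain
\[ \begin{aligned} \langle g * \chi, \Delta \varphi \rangle_{L^2(V)} &= \int_V \int_{\operatorname{Supp} \chi} g(x - y) \chi(y) \, dy \, \Delta \varphi(x) \, dx \\ &= \int_{\operatorname{Supp} \chi} \chi(y) \, \langle g, \Delta (\lambda_{-y} \varphi) \rangle_{L^2(U)} \, dy \\ &= \int_{\operatorname{Supp} \chi} \chi(y) \, \langle u, \lambda_{-y} \varphi \rangle_{L^2(U)} \, dy \\ &= \int_{\operatorname{Supp} \chi} \chi(y) \int_V u(x - y) \varphi(x) \, dx \, dy = \langle u * \chi, \varphi \rangle_{L^2(V)}, \end{aligned} \]
as required. □

We will now use a non-negative function $\jmath \in C_c^\infty(\mathbb{R}^d)$ as in Exercise 5.17, so that $\operatorname{Supp} \jmath$ is the closed unit ball and $\int \jmath \, dx = 1$, and define the scaled function $\jmath_\varepsilon(x) = \varepsilon^{-d} \jmath(\frac{x}{\varepsilon})$ for all $x \in \mathbb{R}^d$ and $\varepsilon > 0$.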


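Lemma 8.46 (Approximate identity in $L^2(\mathbb{R}^d)$). Let $U \subseteq \mathbb{R}^d$ be open and let $f \in L^2(U)$. Then $f * \jmath_\varepsilon(x) = \int f(x - y) \jmath_\varepsilon(y) \, dy$ converges to $f$ in $L^2(\mathbb{R}^d)$ as $\varepsilon \to 0$.

Proof. By Lemma 3.75, $f_\varepsilon = f * \jmath_\varepsilon$ again belongs to $L^2(\mathbb{R}^d)$ for any $\varepsilon > 0$. Next, notice that
\[ f_\varepsilon(x) = \int \varepsilon^{-d} \jmath(\tfrac{y}{\varepsilon}) f(x - y) \, dy = \int \jmath(z) f(x - \varepsilon z) \, dz \]
for $x \in \mathbb{R}^d$. Therefore
\[ \begin{aligned} \|f_\varepsilon - f\|_{L^2(\mathbb{R}^d)}^2 &= \int \biggl| \int \bigl( f(x - \varepsilon z) - f(x) \bigr) \jmath(z) \, dz \biggr|^2 dx \\ &\le \iint \bigl| f(x - \varepsilon z) - f(x) \bigr|^2 \jmath(z) \, dz \, dx \\ &= \int \bigl\| \lambda_{\varepsilon z} f - f \bigr\|_{L^2(\mathbb{R}^d)}^2 \, \jmath(z) \, dz \end{aligned} \]
by Jensen's inequality (see the first paragraph of the proof of Lemma 3.75) and Fubini's theorem. By Lemma 3.74 and dominated convergence the latter converges to zero as $\varepsilon \to 0$. □

We note that the following corollary to Proposition 8.45 will be combined with Corollary 8.43.

Corollary 8.47 (Weak derivatives and $H^k$). Let $U \subseteq \mathbb{R}^d$ be open and let $k \ge 1$ be an integer. Suppose that $f \in L^2_{\mathrm{loc}}(U)$ has, for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$, a weak partial derivative $\boldsymbol{\partial}_\alpha f \in L^2_{\mathrm{loc}}(U)$. Then $f$ lies in $H^k_{\mathrm{loc}}(U)$. In fact, we have $\chi f \in H^k(\mathbb{R}^d)$ for all $\chi \in C_c^\infty(U)$.

Proof. Let $\chi \in C_c^\infty(U)$. Extending the product rule for differentiation we have that $\boldsymbol{\partial}_j (\chi f) = (\partial_j \chi) f + \chi (\boldsymbol{\partial}_j f) \in L^2(U)$ for $j = 1, \dots, d$ (check this), which generalizes inductively to the Leibniz rule for weak differentiation $\boldsymbol{\partial}_\alpha$ with $\|\alpha\|_1 \le k$. In particular, $\chi f$ has a weak $\alpha$-partial derivative on $U$.

We now show that $\chi f$ has weak partial derivatives $\boldsymbol{\partial}_\alpha (\chi f)$ on all of $\mathbb{R}^d$, where we extend these functions trivially from $U$ to all of $\mathbb{R}^d$. Since $\operatorname{Supp} \chi \subseteq U$ is compact, there exists a function $\psi \in C_c^\infty(U)$ with $\psi \equiv 1$ on $\operatorname{Supp} \chi$ (see Exercise 5.37). If now $\varphi \in C_c^\infty(\mathbb{R}^d)$, then $\psi \varphi \in C_c^\infty(U)$. Using in addition that $\operatorname{Supp}(\chi f)$ and $\operatorname{Supp} \boldsymbol{\partial}_\alpha (\chi f)$ are contained in $\operatorname{Supp} \chi \subseteq U$, we obtain
\[ \langle \chi f, \partial_\alpha \varphi \rangle_{L^2(\mathbb{R}^d)} = \langle \chi f, \partial_\alpha (\psi \varphi) \rangle_{L^2(U)} = (-1)^{\|\alpha\|_1} \langle \boldsymbol{\partial}_\alpha (\chi f), \psi \varphi \rangle_{L^2(U)} = (-1)^{\|\alpha\|_1} \langle \boldsymbol{\partial}_\alpha (\chi f), \varphi \rangle_{L^2(\mathbb{R}^d)} \]
for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$, where the sign $(-1)^{\|\alpha\|_1}$ is the one from the definition of the weak derivative.

Applying Proposition 8.45 and Lemma 8.46 we see that $(\chi f) * \jmath_\varepsilon$ is in $C_c^\infty(\mathbb{R}^d)$ and that
\[ \partial_\alpha \bigl( (\chi f) * \jmath_\varepsilon \bigr) = \boldsymbol{\partial}_\alpha (\chi f) * \jmath_\varepsilon \]
approximates $\boldsymbol{\partial}_\alpha (\chi f)$ as $\varepsilon \to 0$ for any $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \le k$. From this it follows that $\chi f \in H^k(\mathbb{R}^d)$ and with the identification of functions we also see that $\chi f \in H^k(U)$ (see Definition 5.7). Since $\chi \in C_c^\infty(U)$ was arbitrary, this proves $f \in H^k_{\mathrm{loc}}(U)$, and hence the corollary. □

The argument that we will present here gives an alternate proof of Theorem 5.45 (avoiding Fourier series).

Theorem 8.48 (Elliptic regularity). Let $U \subseteq \mathbb{R}^d$ be open, $g \in H^1_{\mathrm{loc}}(U)$, let $k \ge 0$ and suppose that $\Delta g = u \in H^k_{\mathrm{loc}}(U)$. Then $g \in H^{k+2}_{\mathrm{loc}}(U)$.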

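Second proof of elliptic regularity on open sets. By definition (Definition 5.43) we have to show that $\chi g \in H^{k+2}(U)$ for all $\chi \in C_c^\infty(U)$. We initially assume that $k = 0$, fix some $\chi \in C_c^\infty(U)$, and consider $f = \chi g$, which is an element of $H^1(U)$. By Lemma 5.50 we have $\Delta f = v \in H^0(U) = L^2(U)$, and we also know that $v$ vanishes on $U \setminus \operatorname{Supp} \chi$. We have to show that $f \in H^2(U)$, for which, using Corollary 8.47, we need to show that $\partial_i \partial_j f$ exists in $L^2(U)$ for all $i, j = 1, \dots, d$. It will, however, be more convenient to work on $\mathbb{R}^d$.

Extending to all of $\mathbb{R}^d$. We claim that, after extending $f$ and $v$ trivially from $U$ to all of $\mathbb{R}^d$, we have $f \in H^1(\mathbb{R}^d)$ and the relation $\Delta f = v$ actually holds on $\mathbb{R}^d$. In fact, the first claim follows from the assumption on $g$ and Corollary 8.47. For the second we again apply the same argument using a function $\psi \in C_c^\infty(U)$ satisfying $\psi \equiv 1$ on $\operatorname{Supp} \chi$. Indeed, for $\varphi \in C_c^\infty(\mathbb{R}^d)$ we have $\psi \varphi \in C_c^\infty(U)$ and
\[ \langle f, \Delta \varphi \rangle_{L^2(\mathbb{R}^d)} = \langle f, \Delta (\psi \varphi) \rangle_{L^2(U)} = \langle v, \psi \varphi \rangle_{L^2(U)} = \langle v, \varphi \rangle_{L^2(\mathbb{R}^d)}. \]
Thus $f$ satisfies $\Delta f = v$ on $\mathbb{R}^d$ as $\varphi \in C_c^\infty(\mathbb{R}^d)$ was arbitrary.

Convolution. Next we let $\varepsilon > 0$ and define $f_\varepsilon = f * \jmath_\varepsilon$ and $v_\varepsilon = v * \jmath_\varepsilon$. By Proposition 8.45 we have that $f_\varepsilon, v_\varepsilon \in C_c^\infty(\mathbb{R}^d)$ satisfy $\Delta f_\varepsilon = v_\varepsilon$. We now use integration by parts for smooth functions of compact support and obtain
\[ \|\partial_i \partial_j f_\varepsilon\|_2^2 = \int (\partial_i \partial_j f_\varepsilon)(\partial_i \partial_j f_\varepsilon) \, dx = -\int (\partial_i^2 \partial_j f_\varepsilon)(\partial_j f_\varepsilon) \, dx = \langle \partial_i^2 f_\varepsilon, \partial_j^2 f_\varepsilon \rangle \]
for all $i, j \in \{1, \dots, d\}$, which gives
\[ \sum_{i,j=1}^d \|\partial_i \partial_j f_\varepsilon\|_2^2 = \sum_{i,j=1}^d \langle \partial_i^2 f_\varepsilon, \partial_j^2 f_\varepsilon \rangle = \int (\Delta f_\varepsilon)^2 \, dx = \|v_\varepsilon\|_2^2. \tag{8.10} \]

A uniform estimate implies regularity. Using Lemma 8.42 we see that
\[ \|D_i^h \partial_j f_\varepsilon\|_2 \le \|\partial_i \partial_j f_\varepsilon\|_2 \le \|v_\varepsilon\|_2 \]
for all $i, j \in \{1, \dots, d\}$ and real numbers $h$ with $0 < |h| \le 1$. We now let $\varepsilon \to 0$ and obtain from (8.10), Proposition 8.45, and Lemma 8.46 that
\[ \|D_i^h \boldsymbol{\partial}_j f\|_2 \le \|v\|_2 \]
for all $h$ with $0 < |h| \le 1$. Applying Corollary 8.43, this implies that $\boldsymbol{\partial}_i \boldsymbol{\partial}_j f$ exists in $L^2(\mathbb{R}^d)$ and by Corollary 8.47 it follows that $f \in H^2_{\mathrm{loc}}(\mathbb{R}^d)$. Using the same function $\psi \in C_c^\infty(U)$ as above this implies
\[ f = \psi f \in H^2(\mathbb{R}^d) \cap H^2(U), \]
so $g \in H^2_{\mathrm{loc}}(U)$ since $f = \chi g$ and $\chi \in C_c^\infty(U)$ was arbitrary.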

Induction on $k$. The theorem now follows by induction on $k \ge 0$. The case $k = 0$ is proven above. So assume now that $\Delta g = u \in H^k_{\mathrm{loc}}(U)$ for some $k \ge 1$. Since we then also have $u \in H^{k-1}_{\mathrm{loc}}(U)$ we obtain from the inductive hypothesis that in fact $g \in H^{k+1}_{\mathrm{loc}}(U)$. Let $\chi \in C_c^\infty(U)$. Lemma 5.50 then gives for $f = \chi g$ that $\Delta f = v \in H^k(U)$. If $\alpha \in \mathbb{N}_0^d$ satisfies $\|\alpha\|_1 \le k$, then $\boldsymbol{\partial}_\alpha f \in L^2(U)$ satisfies $\Delta \boldsymbol{\partial}_\alpha f = \boldsymbol{\partial}_\alpha v$ since
\[ \langle \boldsymbol{\partial}_\alpha f, \Delta \varphi \rangle = (-1)^{\|\alpha\|_1} \langle f, \partial_\alpha \Delta \varphi \rangle = (-1)^{\|\alpha\|_1} \langle f, \Delta \partial_\alpha \varphi \rangle = (-1)^{\|\alpha\|_1} \langle v, \partial_\alpha \varphi \rangle = \langle \boldsymbol{\partial}_\alpha v, \varphi \rangle \]
for all $\varphi \in C_c^\infty(U)$. Hence the argument above for $k = 0$ applies to $\boldsymbol{\partial}_\alpha f$ and shows that $\boldsymbol{\partial}_\alpha f \in H^2_{\mathrm{loc}}(U)$. As $\alpha \in \mathbb{N}_0^d$ is arbitrary with $\|\alpha\|_1 \le k$, it follows that $f$ satisfies the assumption of Corollary 8.47 for the integer $k + 2$ and hence $f = \psi f \in H^{k+2}(U)$, or equivalently that $g \in H^{k+2}_{\mathrm{loc}}(U)$ since $f = \chi g$ and $\chi \in C_c^\infty(U)$ was arbitrary. This concludes the induction and the proof of the theorem. □

The above proof of elliptic regularity, and in particular the step in (8.10), was tailored very closely to the Laplace operator on open subsets in $\mathbb{R}^d$. In order to also obtain the regularity at the boundary we start by giving a different argument which will be more amenable to generalizations (even though it will be a bit more involved). For this we will use the following inequality, which is also known as the Cauchy inequality with an $\varepsilon$. For any measure space $(X, \mathcal{B}, \mu)$, functions $u, v \in L^2_\mu(X)$, and $\varepsilon > 0$ we have
\[ \langle u, v \rangle_{L^2(X, \mu)} \le \int_X |uv| \, d\mu \le \varepsilon \|u\|_2^2 + \frac{1}{4\varepsilon} \|v\|_2^2. \tag{8.11} \]
The first inequality is the triangle inequality and the second follows from integrating the inequality
\[ 0 \le \Bigl( \sqrt{\varepsilon} \, |u| - \frac{|v|}{2\sqrt{\varepsilon}} \Bigr)^2 = \varepsilon |u|^2 + \frac{|v|^2}{4\varepsilon} - |u||v| \]
over $X$.
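The Cauchy inequality with an $\varepsilon$ in (8.11) is easy to test on a finite measure space. The sketch below is only an illustration: it assumes the counting measure on 50 points, randomly generated vectors with a fixed seed, and hypothetical helper names. It also checks the standard observation that the choice $\varepsilon = \|v\|_2 / (2\|u\|_2)$ turns the right-hand side of (8.11) into $\|u\|_2 \|v\|_2$, so that the Cauchy–Schwarz bound is recovered as the best case.

```python
import math
import random


def inner(u, v):
    """Inner product of two real vectors (L^2 of the counting measure)."""
    return sum(a * b for a, b in zip(u, v))


def abs_inner(u, v):
    """Integral of |u v| with respect to the counting measure."""
    return sum(abs(a * b) for a, b in zip(u, v))


def cauchy_epsilon_bound(u, v, eps):
    """Right-hand side eps * ||u||^2 + ||v||^2 / (4 * eps) of (8.11)."""
    return eps * inner(u, u) + inner(v, v) / (4 * eps)


# Illustrative data (an assumption, not from the text).
rng = random.Random(0)
u = [rng.uniform(-1, 1) for _ in range(50)]
v = [rng.uniform(-1, 1) for _ in range(50)]

for eps in (0.01, 0.5, 1.0, 10.0):
    # Both inequalities of (8.11) should hold for every eps > 0.
    print(eps, inner(u, v) <= abs_inner(u, v) <= cauchy_epsilon_bound(u, v, eps))

# The minimizing eps gives back the Cauchy-Schwarz bound ||u|| * ||v||.
norm_u = math.sqrt(inner(u, u))
norm_v = math.sqrt(inner(v, v))
best_eps = norm_v / (2 * norm_u)
print(cauchy_epsilon_bound(u, v, best_eps), norm_u * norm_v)
```

The freedom in $\varepsilon$ is what matters in the argument that follows: a small $\varepsilon$ lets one absorb the quadratic term into the left-hand side at the cost of a larger constant in front of $\|v\|_2^2$.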

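Third proof of elliptic regularity on open sets. As in the second proof of elliptic regularity above, we multiply $g$ by some $\chi \in C_c^\infty(U)$ and apply Lemma 5.50. This shows that it suffices to consider the case $U = \mathbb{R}^d$ and a function $g \in H_0^1(\mathbb{R}^d)$ with compact support satisfying $\Delta g = u \in H^k(\mathbb{R}^d)$. We also initially set $k = 0$ and will use Corollary 8.43 after bounding
\[ \bigl\| D_\ell^h g_j \bigr\|_2 \]
for $\ell, j = 1, \dots, d$, where $g_j = \boldsymbol{\partial}_j g$ and $h \in \mathbb{R}$ with $0 < |h| \le 1$. Recall that for any $\varphi \in C_c^\infty(\mathbb{R}^d)$ we have
\[ \langle g, \varphi \rangle_1 = \sum_{j=1}^d \langle \boldsymbol{\partial}_j g, \partial_j \varphi \rangle = -\langle g, \Delta \varphi \rangle = -\langle u, \varphi \rangle, \]
see also Lemma 5.41. Approximating any $v \in H_0^1(\mathbb{R}^d)$ by smooth functions with compact support this formula extends to $\varphi = v$. We set
\[ v = -D_\ell^{-h} (D_\ell^h g) \in H_0^1(\mathbb{R}^d) \tag{8.12} \]
for some fixed $\ell \in \{1, \dots, d\}$ and $h \in \mathbb{R}$ satisfying $0 < |h| \le 1$. Therefore
\[ \underbrace{\bigl\langle g, -D_\ell^{-h} (D_\ell^h g) \bigr\rangle_1}_{L} = \underbrace{\bigl\langle u, D_\ell^{-h} (D_\ell^h g) \bigr\rangle}_{R}, \tag{8.13} \]
where $L$ denotes the left-hand side and $R$ denotes the right-hand side.

Studying the left-hand side. By definition of $\langle \cdot, \cdot \rangle_1$, we have
\[ L = -\sum_{j=1}^d \bigl\langle g_j, \boldsymbol{\partial}_j D_\ell^{-h} (D_\ell^h g) \bigr\rangle = -\sum_{j=1}^d \bigl\langle g_j, D_\ell^{-h} (D_\ell^h g_j) \bigr\rangle, \]
where we used the fact that $\boldsymbol{\partial}_j$ and $D_\ell^h$ commute (check this). Finally, we apply the same argument as in (8.9), which gives our main term
\[ M = L = \sum_{j=1}^d \bigl\langle D_\ell^h g_j, D_\ell^h g_j \bigr\rangle = \sum_{j=1}^d \bigl\| D_\ell^h g_j \bigr\|_2^2. \]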

This is precisely what we wish to estimate, and it is the only term that is quadratic in the difference quotient of the weak partial derivatives of $g$.

Bounding the right-hand side. We are aiming to convert (8.13) into an estimate on $M$ that is uniform with respect to $h$. For this, we need to bound the right-hand side $R$ of (8.13). In fact, we have for any $\varepsilon > 0$ that
\[ |R| = \biggl| \int u \, D_\ell^{-h} (D_\ell^h g) \, dx \biggr| \le \int |D_\ell^{-h} (D_\ell^h g)| \, |u| \, dx \le \varepsilon \bigl\| D_\ell^{-h} (D_\ell^h g) \bigr\|_2^2 + \frac{1}{4\varepsilon} \|u\|_2^2 \]
by (8.11) (Cauchy's inequality with an $\varepsilon$). In the first expression of the bound on the right we use the fact that $D_\ell^h g \in H_0^1(\mathbb{R}^d)$ and the bound on the difference quotient by the weak partial derivative in Lemma 8.42 to obtain
\[ \bigl\| D_\ell^{-h} (D_\ell^h g) \bigr\|_2^2 \le \bigl\| \boldsymbol{\partial}_\ell (D_\ell^h g) \bigr\|_2^2 = \bigl\| D_\ell^h g_\ell \bigr\|_2^2. \]
This gives
\[ |R| \le \varepsilon \bigl\| D_\ell^h g_\ell \bigr\|_2^2 + \frac{1}{4\varepsilon} \|u\|_2^2 \le \varepsilon M + \frac{1}{4\varepsilon} \|u\|_2^2, \tag{8.14} \]
which on setting $\varepsilon = \frac{1}{2}$ gives $|R| \le \frac{1}{2} M + \frac{1}{2} \|u\|_2^2$.

Putting the estimates together. Using (8.13) and the estimate for $R$ in (8.14) we finally see that $M \le \frac{1}{2} M + \frac{1}{2} \|u\|_2^2$ and so
\[ \sum_{j=1}^d \|D_\ell^h g_j\|_2^2 = M \le \|u\|_2^2. \]

Note that this upper bound is independent of $h$ and holds for all $\ell$ in $\{1, \dots, d\}$. Applying Corollary 8.43 we see that $\boldsymbol{\partial}_\ell g_j$ exists for all $\ell, j$ in $\{1, \dots, d\}$. In other words, all degree two weak partial derivatives of $g$ exist, and so $g$ lies in $H^2(\mathbb{R}^d)$ by Corollary 8.47.

Induction on $k$. The theorem again follows by induction on $k$ as in the second proof of elliptic regularity above. □

8.2.3 Elliptic Regularity at the Boundary

In this subsection we will use the Banach–Alaoglu theorem (much as in the third proof of elliptic regularity in Section 8.2.2) to prove elliptic regularity up to the boundary. For this we need to assume that $U$ has smooth boundary as in Definition 5.31.

Theorem 8.49 (Elliptic regularity up to the boundary). Let $U \subseteq \mathbb{R}^d$ be bounded and open with smooth boundary. Let $g \in H_0^1(U)$, $k \ge 0$, and suppose that $\Delta g = u \in H^k(U)$. Then $g \in H^{k+2}(U)$.

We define
\[ C^\infty(\overline{U}) = \bigcap_{k \ge 0} C^k(\overline{U}) \]
to consist of all smooth functions on $U$ with the property that the function and all partial derivatives can be extended continuously to $\overline{U}$ (see also the first paragraph of Section 5.3.3). We consider $C^\infty(\overline{U})$ as a subspace of $C(\overline{U})$.

Proposition 8.50 (Sobolev embedding up to the boundary). Let $U$ be a bounded and open subset of $\mathbb{R}^d$ with smooth boundary. Then
\[ \bigcap_{k \ge 0} H^k(U) = C^\infty(\overline{U}). \tag{8.15} \]

The above theorem and proposition together allow us to complete our discussion of the Dirichlet boundary value problem and the eigenfunctions of the Laplace operator, which previously had only weaker than desired conclusions regarding the behaviour of the functions near the boundary.


Corollary 8.51 (Smooth solutions). Let $U \subseteq \mathbb{R}^d$ be bounded and open with smooth boundary. Then
• for any $f \in C^\infty(\partial U)$ the weak solution to the Dirichlet boundary value problem $\Delta g = 0$, $g|_{\partial U} = f$ from Theorem 5.51 belongs to $C^\infty(\overline{U})$ and satisfies the boundary value condition pointwise, and
• the Laplace eigenfunctions $f \in H_0^1(U)$ with $\Delta f = \lambda f$ from Theorem 6.56 also belong to $C^\infty(\overline{U})$ and vanish at the boundary.

We first assume Theorem 8.49 and Proposition 8.50 and show how these imply the corollary.

Proof of Corollary 8.51. For the Dirichlet boundary value problem we recall from the proof of Theorem 5.51 that we first extended $f \in C^\infty(\partial U)$ to all of $\overline{U}$, which under our assumptions leads to a function $f \in C^\infty(\overline{U})$. Proposition 5.42 then gives a function $v \in H_0^1(U)$ with $g = f - v$ satisfying $\Delta g = 0$. In other words, $\Delta v = \Delta f \in H^k(U)$ for all $k \geq 0$, which by Theorem 8.49 and Proposition 8.50 gives $v, g \in C^\infty(\overline{U})$. Proposition 5.33 now implies that $v$ vanishes at $\partial U$ pointwise.

Similarly, suppose $f \in H_0^1(U)$ is an eigenfunction of $\Delta$ from Theorem 6.56. In this case, $\Delta f = \lambda f$ implies $f \in H^3(U)$ by Theorem 8.49, then $f \in H^5(U)$ and so on. Together with Proposition 8.50 this gives $f \in C^\infty(\overline{U})$. □

Proof of Proposition 8.50. The claim in (8.15) is (unlike the corollary) a purely local statement. In fact, we claim that it is enough to show that every point $z^{(0)} \in \overline{U}$ has an open neighbourhood $V \subseteq \mathbb{R}^d$ with the following properties. Every function

\[ f \in \bigcap_{k \geq 0} H^k(U \cap V) \]

with support in $\overline{U} \cap V$ is continuous and can be continuously extended to $\overline{U}$ so that the extension vanishes outside of $V$.

Assuming this property for now we find by compactness a finite cover $V_1, \dots, V_n$ of $\overline{U}$ comprising such neighbourhoods. By Exercise 5.52 we can find a corresponding smooth partition of unity $\psi_1, \dots, \psi_n$, which we use to localize $f$ to the functions

\[ f_j = f\psi_j \in \bigcap_{k \geq 0} H^k(U \cap V_j) \]

with $\operatorname{Supp} f_j \subseteq \overline{U} \cap \operatorname{Supp}\psi_j \subseteq \overline{U} \cap V_j$ for $j = 1, \dots, n$. The local statement then implies that $f_j$ can be continuously extended to all of $\overline{U}$ for $j = 1, \dots, n$ so that the extension vanishes outside of $V_j$. This implies that $f = f_1 + \cdots + f_n$ has an extension to $\overline{U}$ too. Applying this argument to all partial derivatives then gives (8.15).

Interior points. We now prove the local statement starting with the case $z^{(0)} \in U$. Here we may take $V = U$ and apply Sobolev embedding


on open sets (Theorem 5.34) to conclude that $f \in C(U)$, which together with $\operatorname{Supp} f \subseteq U$ proves that $f$ can be extended continuously to $\overline{U}$ by setting it equal to $0$ on $\partial U$.

Boundary points, flattening the boundary. So consider now some element $z^{(0)} \in \partial U$. We may translate and rotate the coordinate system so that $z^{(0)} = 0$ and use Definition 5.31 to find some $\varepsilon > 0$ so that

\[ U \cap B_\varepsilon(0) = \{(x, y) \in B_\varepsilon(0) \mid y < \phi(x)\} \]

for some $\phi \in C^\infty(B_\varepsilon^{\mathbb{R}^{d-1}}(0))$. To simplify the discussion we define the new open sets $U' = (-\delta, \delta)^{d-1} \times (-\delta, 0)$, $V' = (-\delta, \delta)^d$, and the map $\Phi$ by

\[ \Phi(x_1, \dots, x_d) = (x_1, \dots, x_{d-1}, x_d + \phi(x_1, \dots, x_{d-1})), \]

where $\delta \in (0, \varepsilon/d)$ is chosen so that $\Phi(V') \subseteq B_\varepsilon(0)$. In particular, $\Phi$ and all the partial derivatives of $\Phi$ will be bounded on $V'$ and $\Phi(U') \subseteq U \cap B_\varepsilon(0)$. We note that $\Phi$ maps the Lebesgue measure on $V' \subseteq \mathbb{R}^d$ to the Lebesgue measure on the open set $V = \Phi(V') \subseteq B_\varepsilon(0)$. This implies that every $f \in C^\infty(U \cap V)$ is of the form $f = g \circ \Phi^{-1}$ for $g = f \circ \Phi \in C^\infty(U' \cap V')$. Moreover, by the multi-dimensional chain rule and induction we have

\[ \|f\|_{H^k(U \cap V)} \ll_k \|f \circ \Phi\|_{H^k(U' \cap V')} \ll_k \|f\|_{H^k(U \cap V)} \]

for all $k \geq 0$ and $f \in C^\infty(U \cap V)$. Therefore, the completions $H^k(U \cap V)$ and $H^k(U' \cap V')$ are also isomorphic under the map

\[ H^k(U \cap V) \ni f \longmapsto f \circ \Phi \in H^k(U' \cap V') \]

for all $k \geq 0$. In particular, it is enough to prove the desired local statement for $U'$ and $V'$ instead of $U$ and $V$. Simplifying the notation further we apply a linear map, set $\delta = 1$, and may suppose in the following that $V = (-1, 1)^d$ and $U = (-1, 1)^{d-1} \times (0, 1)$.

Boundary points, trace operators on a box. We define $S = (-1, 1)^{d-1}$ and will need the trace operator on the hyperplanes $S_y = S \times \{y\}$ inside $U$ for all possible values of the height parameter $y \in (0, 1)$. The trace operators for $y \searrow 0$ will allow us to extend functions from $U$ to $\overline{U} \cap V = U \cup S_0$ with $S_0 = S \times \{0\}$. We note that these trace operators already featured in Section 5.2.2 but (except for Exercise 5.29 and Exercise 5.35) not in the generality needed here. For completeness we quickly go through the construction of these operators once more.

To define the trace operators we note that for any $y_1, y_2 \in (0, 1)$, any function $f \in C^\infty(U)$, and $x \in S$ we have

\[ f(x, y_2) = f(x, y_1) + \int_{y_1}^{y_2} \partial_d f(x, t)\,dt. \tag{8.16} \]


If $f \in C^\infty(U) \cap H^1(U)$ we may integrate over $y_1 \in (0, 1)$ to obtain

\[ f(x, y_2) = \int_0^1 f(x, y)\,dy + \int_0^1 \partial_d f(x, t)\,k(y_2, t)\,dt \tag{8.17} \]

for some bounded measurable function $k$ (see (5.6)), almost every $x \in S$, and $y_2 \in (0, 1)$. Clearly for a fixed height $y \in (0, 1)$ the trace map

\[ \cdot|_{S_y} : C^\infty(U) \cap H^1(U) \longrightarrow C^\infty(S), \qquad f \longmapsto \bigl(S \ni x \longmapsto f(x, y)\bigr) \]

is linear. Using (8.17) and Cauchy–Schwarz for the integration over $t \in (0, 1)$ it is also easy to see that the trace map is bounded with respect to $\|\cdot\|_{H^1(U)}$ and $\|\cdot\|_{L^2(S)}$. Therefore the trace map is defined on $H^1(U)$ and takes values in $L^2(S)$. For $y_1, y_2 \in (0, 1)$ we may also use (8.16) to obtain

\[ \int_S |f(x, y_2) - f(x, y_1)|^2\,dx = \int_S \Bigl|\int_{y_1}^{y_2} \partial_d f(x, t)\,dt\Bigr|^2 dx \leq \int_S \int_0^1 |\partial_d f(x, t)|^2\,dt\,dx\;|y_2 - y_1| \]

by Cauchy–Schwarz for $f \in C^\infty(U) \cap H^1(U)$, which gives

\[ \|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{L^2(S)} \leq \|\partial_d f\|_{L^2(U)} \sqrt{|y_2 - y_1|} \tag{8.18} \]

first for $f \in C^\infty(U) \cap H^1(U)$ and then for all $f \in H^1(U)$. In particular, the map $(0, 1) \ni y \mapsto f|_{S_y} \in L^2(S)$ is uniformly continuous for any $f \in H^1(U)$.

We wish to combine the above with the Sobolev embedding theorem on $S$ to obtain similar conclusions with respect to the supremum norm. Clearly if we have $f \in C^\infty(U)$ then $f|_{S_y} \in C^\infty(S)$ for all $y \in (0, 1)$. Applying the trace operator to the partial derivatives of a function in $H^k(U)$ along the various directions in $S$ now shows that $\cdot|_{S_y} : H^k(U) \to H^{k-1}(S)$ is a bounded operator for all $k \geq 1$ and that (8.18) implies

\[ \|f|_{S_{y_1}} - f|_{S_{y_2}}\|_{H^{k-1}(S)} \leq \|f\|_{H^k(U)} \sqrt{|y_2 - y_1|} \tag{8.19} \]

for $f \in H^k(U)$, $k \geq 1$, and $y_1, y_2 \in (0, 1)$.

Let now $k > 1 + \frac{d-1}{2}$. Using the Sobolev embedding theorem (Theorem 5.34) on $S$ we obtain that $H^k(U)|_{S_y}$ belongs to $C(S)$. For $\kappa \in (0, 1)$ the proof of Theorem 5.34 (see Exercise 5.39) also shows that on the compact subset $K = [-1 + \kappa, 1 - \kappa]^{d-1} \subseteq S$ we have $\|g\|_{K,\infty} \ll \|g\|_{H^{k-1}(S)}$ for all $g \in H^{k-1}(S)$. Together with (8.19) we obtain

\[ \sup_{x \in K} |f(x, y_1) - f(x, y_2)| \ll_\kappa \|f\|_{H^k(U)} \sqrt{|y_2 - y_1|} \]

for $f \in H^k(U)$ and $y_1, y_2 \in (0, 1)$. This also shows that for any sequence $(y_n)$ with $y_n \to 0$ as $n \to \infty$ the sequence of functions defined by $K \ni x \mapsto f(x, y_n)$ for $n \geq 1$ is a Cauchy sequence with respect to $\|\cdot\|_{K,\infty}$.

Boundary points, conclusion. Thus if $k > 1 + \frac{d-1}{2}$ and $f \in H^k(U \cap V)$ has $\operatorname{Supp} f \subseteq \overline{U} \cap V$, then we can find $\kappa > 0$ so that $\operatorname{Supp} f \subseteq [-1 + \kappa, 1 - \kappa]^{d-1} \times [0, 1 - \kappa]$ and apply the above to see that $f$ can be continuously extended to $\overline{U}$. This proves the remaining local statement. As mentioned before, this discussion also applies to all partial derivatives of $f$ and hence completes the proof of the proposition. □

Much as in the proof of Proposition 8.50, the statement in Theorem 8.49 can be reduced to a purely local statement. Indeed, suppose the following holds.

(Local statement) For any $z^{(0)} \in \overline{U}$ there exists a neighbourhood $V$ of $z^{(0)}$ in $\mathbb{R}^d$ so that if $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$ for some $k \geq 0$ and $\operatorname{Supp} g \subseteq \overline{U} \cap V$, then $g \in H^{k+2}(U)$.

Then we may find a finite cover of $\overline{U}$ consisting of such neighbourhoods and an associated smooth partition of unity. Together with Lemma 5.50 this reduces the proof to the local statements (check this). Moreover, for any interior point $z^{(0)} \in U$ we may take $V = U$, apply elliptic regularity on open subsets (Theorem 5.45 or Theorem 8.48), and multiply $g$ by a smooth $\psi \in C_c^\infty(U)$ with $\psi \equiv 1$ on $\operatorname{Supp} g$ to obtain the local statement. We have therefore reduced the proof of Theorem 8.49 to the following:

(Local statement at boundary points) For any $z^{(0)} \in \partial U$ there exists a neighbourhood $V \subseteq \mathbb{R}^d$ so that $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$ for some $k \geq 0$ and $\operatorname{Supp} g \subseteq \overline{U} \cap V$ implies that $g \in H^{k+2}(U)$.

Proof of Theorem 8.49, Flattening of the boundary. As in the proof of Proposition 8.50 we wish to flatten out the boundary by a diffeomorphism. Indeed, the assumption that $U$ has smooth boundary implies for any boundary point $z^{(0)} \in \partial U$ that there exists some $\delta > 0$ and a diffeomorphism $\Phi$ defined on $V' = (-\delta, \delta)^d$ such that $\Phi(0) = z^{(0)}$ and $\Phi(U') = U \cap V$, where $U' = (-\delta, \delta)^{d-1} \times (0, \delta)$ and $V = \Phi(V')$. The proof of Proposition 8.50 also shows that we may assume that $\Phi$ extends to a diffeomorphism on a neighbourhood of $\overline{V'}$, that $\Phi$ maps the Lebesgue measure on $V'$ to the Lebesgue measure on $V$, and induces an isomorphism between $H^k(U')$ and $H^k(U \cap V)$ for all $k \geq 0$.

Assume now $g \in H_0^1(U)$ with $\Delta g = u \in H^k(U)$ for some $k \geq 0$ and $\operatorname{Supp} g \subseteq \overline{U} \cap V$. Then $\langle g, \Delta\phi\rangle_{L^2(U)} = \langle u, \phi\rangle_{L^2(U)}$ for any $\phi \in C_c^\infty(U \cap V)$.


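We define $g' = g \circ \Phi \in H_0^1(U')$, $u' = u \circ \Phi \in H^k(U')$, and obtain

\[ \langle g', \Delta(\phi) \circ \Phi\rangle_{L^2(U')} = \langle u', \phi \circ \Phi\rangle_{L^2(U')}. \tag{8.20} \]

Here $\phi \circ \Phi \in C_c^\infty(U')$ is again an arbitrary smooth function with compact support in $U'$, which we will denote by $\varphi \in C_c^\infty(U')$. We will use the shorthand $\Psi = \Phi^{-1}$ for the inverse diffeomorphism and wish to express $\Delta(\phi) \circ \Phi = (\Delta(\varphi \circ \Psi)) \circ \Phi$ in terms of partial derivatives of $\varphi$. By the chain rule

\[ \partial_\ell(\varphi \circ \Psi) = \sum_{i=1}^d ((\partial_i\varphi) \circ \Psi)(\partial_\ell\Psi_i) \]

and the product rule

\[ \partial_\ell^2(\varphi \circ \Psi) = \sum_{i,j=1}^d ((\partial_i\partial_j\varphi) \circ \Psi)(\partial_\ell\Psi_i)(\partial_\ell\Psi_j) + \sum_{i=1}^d ((\partial_i\varphi) \circ \Psi)(\partial_\ell^2\Psi_i) \]

for all $\ell \in \{1, \dots, d\}$, and hence

\[ \Delta(\phi) \circ \Phi = \Delta(\varphi \circ \Psi) \circ \Phi = \sum_{i,j=1}^d a_{i,j}\,\partial_i\partial_j\varphi + \sum_{i=1}^d b_i\,\partial_i\varphi, \]

where $a_{i,j} = \sum_{\ell=1}^d \bigl((\partial_\ell\Psi_i)(\partial_\ell\Psi_j)\bigr) \circ \Phi$ and $b_i = (\Delta\Psi_i) \circ \Phi$ belong to $C^\infty(\overline{U'})$ for all $i, j \in \{1, \dots, d\}$. Hence the relationship in (8.20) between $g'$ and $u'$ can also be expressed as

\[ \langle g', P\varphi\rangle_{L^2(U')} = \langle u', \varphi\rangle_{L^2(U')} \]

for all $\varphi \in C_c^\infty(U')$, where $P$ is the degree two partial differential operator

\[ P(\varphi) = \sum_{i,j=1}^d a_{i,j}\,\partial_i\partial_j\varphi + \sum_{i=1}^d b_i\,\partial_i\varphi \]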

with smooth coefficients $a_{i,j} \in C^\infty(\overline{U'})$ and $b_i \in C^\infty(\overline{U'})$ for $i, j \in \{1, \dots, d\}$. We remark that the coefficients $a_{i,j}$ form a symmetric positive-definite matrix. More precisely, for any $x' \in U'$ we have $a_{i,j}(x') = a_{j,i}(x')$ for all $i, j$ and on setting $x = \Phi(x')$ we also have that

\[ \sum_{i,j=1}^d a_{i,j}(x')v_iv_j = \sum_{\ell=1}^d \Bigl(\sum_{i=1}^d (\partial_\ell\Psi_i)(x)v_i\Bigr)\Bigl(\sum_{j=1}^d (\partial_\ell\Psi_j)(x)v_j\Bigr) = \|D\Psi|_x v\|^2 \geq \theta\|v\|^2 \]

for all $v \in \mathbb{R}^d$ and some constant $\theta$. Here $D\Psi|_x$ is the total derivative of $\Psi$ at $x = \Phi(x')$ and by invertibility of $D\Psi|_x$ and compactness of $\overline{U'}$ the constant $\theta > 0$ can be chosen uniformly for all $x' \in U'$. This makes $P$ into a uniformly elliptic operator of degree two (as defined below), and we can apply the argument presented below. □

Let $U \subseteq \mathbb{R}^d$ be open and bounded. We use functions $a_{i,j}, b_i, c \in C^\infty(\overline{U})$ for $i, j = 1, \dots, d$ to define a partial differential operator $P$ by

\[ P(\phi) = \sum_{i,j=1}^d a_{i,j}\,\partial_i\partial_j\phi + \sum_{i=1}^d b_i\,\partial_i\phi + c\phi \]

for $\phi \in C_c^\infty(U)$. The operator $P$ is called a uniformly elliptic operator of degree two if $a_{i,j} = a_{j,i}$ for all $i, j \in \{1, \dots, d\}$ and there exists some uniform constant $\theta > 0$ with

\[ \sum_{i,j=1}^d a_{i,j}(x)v_iv_j \geq \theta\|v\|^2 = \theta\sum_{i=1}^d v_i^2 \]

for all $v \in \mathbb{R}^d$ and $x \in U$. These uniformly elliptic operators satisfy the following (by now familiar) result.

Theorem 8.52 (Elliptic regularity for P). Let $U \subseteq \mathbb{R}^d$ be bounded and open with smooth boundary, and let $P$ be a uniformly elliptic operator of degree two. Let $g \in H_0^1(U)$, $k \geq 0$, and $u \in H^k(U)$. Suppose that $\langle g, P\phi\rangle_{L^2(U)} = \langle u, \phi\rangle_{L^2(U)}$ for all $\phi \in C_c^\infty(U)$. Then $g \in H^{k+2}(U)$.

We will not prove this theorem in detail (see Exercise 8.53), but instead consider only the 'local version' needed to complete the proof of elliptic regularity of the Laplace operator in Theorem 8.49.

(Local statement at boundary of a box) Let $V = (-1, 1)^d$ and $U = (-1, 1)^{d-1} \times (0, 1)$, and let $k \geq 0$. Assume that $g \in H_0^1(U)$ with

\[ \operatorname{Supp} g \subseteq \overline{U} \cap V = (-1, 1)^{d-1} \times [0, 1), \]

and $u \in H^k(U)$ with

\[ \langle g, P\phi\rangle_{L^2(U)} = \langle u, \phi\rangle_{L^2(U)} \tag{8.21} \]

for all $\phi \in C_c^\infty(U)$. Then

\[ g \in H^{k+2}(U). \tag{8.22} \]
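The uniform ellipticity condition defined before Theorem 8.52 can be made concrete in a small case. The Python sketch below is only an illustration under stated assumptions: it takes $d = 2$ and a graph-type flattening map with inverse $\Psi(x, y) = (x, y - \phi(x))$, so that the formula $a_{i,j} = \sum_\ell (\partial_\ell\Psi_i)(\partial_\ell\Psi_j)$ gives $a_{1,1} = 1$, $a_{1,2} = a_{2,1} = -\phi'$, and $a_{2,2} = 1 + (\phi')^2$. The function names and the slope bound $|\phi'| \leq 1$ are hypothetical choices, not taken from the text.

```python
import math


def flattened_coefficients(slope):
    """Entries (a_11, a_12, a_22) for Psi(x, y) = (x, y - phi(x)) with phi'(x) = slope."""
    return 1.0, -slope, 1.0 + slope * slope


def ellipticity_constant(slope):
    """Smallest eigenvalue of the symmetric 2x2 matrix from flattened_coefficients."""
    a11, a12, a22 = flattened_coefficients(slope)
    trace = a11 + a22
    det = a11 * a22 - a12 * a12
    return (trace - math.sqrt(trace * trace - 4.0 * det)) / 2.0


def quadratic_form(slope, v1, v2):
    """Value of sum_{i,j} a_ij v_i v_j for the coefficients above."""
    a11, a12, a22 = flattened_coefficients(slope)
    return a11 * v1 * v1 + 2.0 * a12 * v1 * v2 + a22 * v2 * v2


# A uniform theta for a boundary graph whose slope satisfies |phi'| <= 1,
# sampled at slopes -1.0, -0.9, ..., 1.0.
theta = min(ellipticity_constant(s / 10.0) for s in range(-10, 11))
```

In this example the determinant of the coefficient matrix is always 1, so the smallest eigenvalue stays positive but decreases as the slope grows. This is why a bound on the derivatives of the flattening map is needed to obtain a uniform $\theta$.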


Proof of the local statement for Theorem 8.52 at a flat boundary. We first assume that $k = 0$ and let $\phi$ be a function in $C_c^\infty(U)$. By assumption,

\[ \int g\Bigl(\sum_{i,j=1}^d a_{i,j}(\partial_i\partial_j\phi) + \sum_{i=1}^d b_i(\partial_i\phi) + c\phi\Bigr)dx = \int u\phi\,dx. \tag{8.23} \]

By integration by parts and the product rule we also have

\begin{align*}
\int g\,a_{i,j}(\partial_i\partial_j\phi)\,dx &= -\int \partial_i(g\,a_{i,j})(\partial_j\phi)\,dx \\
&= -\int a_{i,j}(\partial_i g)(\partial_j\phi)\,dx - \int g(\partial_i a_{i,j})(\partial_j\phi)\,dx \\
&= -\int a_{i,j}(\partial_i g)(\partial_j\phi)\,dx + \int \partial_j(g\,\partial_i a_{i,j})\phi\,dx
\end{align*}

and

\[ \int g\,b_i\,\partial_i\phi\,dx = -\int \partial_i(g\,b_i)\phi\,dx \]

for all $i, j \in \{1, \dots, d\}$. Using this we can rewrite (8.23) in the form

\[ -\sum_{i,j=1}^d \int a_{i,j}(\partial_i g)(\partial_j\phi)\,dx = \int \tilde{u}\phi\,dx \tag{8.24} \]

for all $\phi \in C_c^\infty(U)$, where

\[ \tilde{u} = u - \sum_{i,j=1}^d \partial_j(g\,\partial_i a_{i,j}) + \sum_{i=1}^d \partial_i(g\,b_i) - cg \in L^2(U). \tag{8.25} \]

This allows us to ignore the first and zero order terms in the definition of $P$. Since (8.24) only involves the first derivatives of $\phi \in C_c^\infty(U)$, it also holds by continuity for any $\phi = v \in H_0^1(U)$.

The choice of v. We fix some $\ell \in \{1, \dots, d - 1\}$. Then $v = D_\ell^{-h}D_\ell^h g \in H_0^1(U)$ for small enough $h \in \mathbb{R}$ by the assumptions $g \in H_0^1(U)$, $\operatorname{Supp} g \subseteq \overline{U} \cap V$, and $\ell \neq d$. With this choice, (8.24) becomes

\[ \underbrace{-\sum_{i,j=1}^d \int a_{i,j}(\partial_i g)(\partial_j D_\ell^{-h}D_\ell^h g)\,dx}_{L} = \underbrace{\int \tilde{u}\,D_\ell^{-h}D_\ell^h g\,dx}_{R}. \]


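Studying the left-hand side. Using the assumption $\operatorname{Supp} g \subseteq \overline{U} \cap V$ and the abbreviations $g_i = \partial_i g$ and $a_{i,j}^h(x) = a_{i,j}(x + he_\ell)$, we see that

\[ L = \sum_{i,j=1}^d \int D_\ell^h(a_{i,j}g_i)\,D_\ell^h g_j\,dx = \underbrace{\sum_{i,j=1}^d \int a_{i,j}^h\bigl(D_\ell^h g_i\bigr)\bigl(D_\ell^h g_j\bigr)dx}_{M} + \underbrace{\sum_{i,j=1}^d \int \bigl(D_\ell^h a_{i,j}\bigr)g_i\,D_\ell^h g_j\,dx}_{E} \]

since the product rule for $D_\ell^h$ has the form

\begin{align*}
D_\ell^h(a_{i,j}g_i)(x) &= \frac{1}{h}\bigl(a_{i,j}(x + he_\ell)g_i(x + he_\ell) - a_{i,j}(x)g_i(x)\bigr) \\
&= \frac{1}{h}\Bigl(a_{i,j}(x + he_\ell)\bigl(g_i(x + he_\ell) - g_i(x)\bigr) + \bigl(a_{i,j}(x + he_\ell) - a_{i,j}(x)\bigr)g_i(x)\Bigr) \\
&= a_{i,j}^h(x)\,D_\ell^h g_i(x) + \bigl(D_\ell^h a_{i,j}\bigr)(x)\,g_i(x)
\end{align*}

for $i, j \in \{1, \dots, d\}$ and $x \in U$. For $x \in U$ close to $\partial U \setminus V$ we have $g_i(x) = 0$. Extending $g_i$ and $a_{i,j}$ trivially, this also holds for all $x \in \mathbb{R}^d$. The term $M$ is our main term since the uniform ellipticity assumption on $P$ implies that

\[ M \geq \theta\sum_{i=1}^d \|D_\ell^h g_i\|_2^2. \tag{8.26} \]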
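The product rule for the difference quotient used above is an exact algebraic identity, not an approximation, and this is easy to confirm numerically. The following Python sketch works in one variable for simplicity; the functions `coefficient` and `function` are arbitrary stand-ins for $a_{i,j}$ and $g_i$, chosen only for illustration.

```python
import math


def diff_quotient(f, h):
    """Forward difference quotient (D^h f)(x) = (f(x + h) - f(x)) / h."""
    return lambda x: (f(x + h) - f(x)) / h


def product_rule_defect(a, g, h, x):
    """D^h(a g)(x) minus [a(x + h) D^h g(x) + (D^h a)(x) g(x)]."""
    lhs = diff_quotient(lambda t: a(t) * g(t), h)(x)
    rhs = a(x + h) * diff_quotient(g, h)(x) + diff_quotient(a, h)(x) * g(x)
    return lhs - rhs


def coefficient(t):
    """Stand-in for a smooth coefficient a_{i,j} (illustrative choice)."""
    return 2.0 + math.cos(t)


def function(t):
    """Stand-in for g_i (illustrative choice)."""
    return t ** 3 - t


# The defect vanishes up to rounding error, for positive and negative h alike.
max_defect = max(
    abs(product_rule_defect(coefficient, function, h, x))
    for h in (0.5, 0.1, -0.01)
    for x in (-1.0, 0.0, 0.3, 2.0)
)
```

Note that the shifted coefficient $a(x + h)$, not $a(x)$, multiplies the difference quotient of $g$. This is the reason the main term $M$ involves $a_{i,j}^h$ instead of $a_{i,j}$.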

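The extra term $E$ may be bounded using the Cauchy inequality with an ε as in (8.11) to obtain

\[ |E| \leq \sum_{i,j=1}^d \Bigl(\varepsilon\|D_\ell^h g_j\|_2^2 + \frac{1}{4\varepsilon}\bigl\|\bigl(D_\ell^h a_{i,j}\bigr)g_i\bigr\|_2^2\Bigr) \leq \varepsilon\theta^{-1}dM + \frac{\kappa d}{4\varepsilon}\sum_{i=1}^d \|g_i\|_2^2, \tag{8.27} \]

where we also used (8.26) and bounded the supremum norm of $D_\ell^h a_{i,j}$ on $\operatorname{Supp} g_j$ by some constant $\kappa > 0$.

Bounding the right-hand side. Since our right-hand side has the same shape as the right-hand side of (8.13) the argument on p. 277 applies to give

\[ |R| \leq \varepsilon\|D_\ell^h g_\ell\|_2^2 + \frac{1}{4\varepsilon}\|\tilde{u}\|_2^2 \leq \varepsilon\theta^{-1}M + \frac{1}{4\varepsilon}\|\tilde{u}\|_2^2. \tag{8.28} \]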


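Putting the estimates together. Using $L = M + E = R$ together with the estimates in (8.26)–(8.28), we obtain

\[ |M| = |R - E| \leq |R| + |E| \leq \varepsilon\theta^{-1}(d + 1)M + \tfrac{1}{\varepsilon}C, \]

where $C > 0$ depends only on $(a_{i,j})_{i,j}$, $(g_i)_i$ and $\tilde{u}$ but not on $h$. We choose $\varepsilon$ to be $\frac{\theta}{2(d+1)}$, which allows us to obtain the estimate $|M| \leq \frac{2}{\varepsilon}C$ and hence

\[ \sum_{i=1}^d \|D_\ell^h g_i\|_2^2 \leq \theta^{-1}|M| \leq \frac{2}{\theta\varepsilon}C \tag{8.29} \]

for all sufficiently small $h \in \mathbb{R} \setminus \{0\}$ and $\ell \in \{1, \dots, d - 1\}$.

Existence of second weak partial derivatives. The uniform estimate in (8.29) and Corollary 8.43 applied to the subset $W_s = (-1 + s, 1 - s)^{d-1} \times (0, 1) \subseteq U$ for some $s > 0$ implies that $\partial_\ell(g_i|_{W_s})$ exists for any $i \in \{1, \dots, d\}$ and for any $\ell \in \{1, \dots, d - 1\}$. We choose $s > 0$ such that $\operatorname{Supp} g \subseteq (-1 + 2s, 1 - 2s)^{d-1} \times [0, 1)$. Extending $\partial_\ell(g_i|_{W_s})$ trivially we obtain the existence of $\partial_\ell g_i = \partial_\ell\partial_i g$ on $U$ for $i \in \{1, \dots, d\}$ and $\ell \in \{1, \dots, d - 1\}$ (check this).

To show the existence of the partial derivative $\partial_d\partial_d g$ we use the argument above and the operator $P$ more directly. In fact, we note that $a_{d,d} \geq \theta$ on $U$ by the uniform ellipticity assumption on $P$. Let $\phi \in C_c^\infty(U)$. Using $\frac{1}{a_{d,d}}\phi$ in (8.24) we obtain

\[ \int a_{d,d}\,\partial_d g\,\partial_d\Bigl(\frac{1}{a_{d,d}}\phi\Bigr)dx = -\sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \int a_{i,\ell}\,\partial_i g\,\partial_\ell\Bigl(\frac{1}{a_{d,d}}\phi\Bigr)dx - \int \tilde{u}\,\frac{1}{a_{d,d}}\phi\,dx \]

and hence also

\[ \int \partial_d g\,\partial_d\phi\,dx = -\int a_{d,d}\,\partial_d g\,\partial_d\Bigl(\frac{1}{a_{d,d}}\Bigr)\phi\,dx + \sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \int \partial_\ell\bigl(a_{i,\ell}\,\partial_i g\bigr)\frac{1}{a_{d,d}}\phi\,dx - \int \tilde{u}\,\frac{1}{a_{d,d}}\phi\,dx, \]

which shows the existence of

\[ \partial_d\partial_d g = a_{d,d}\,\partial_d g\,\partial_d\Bigl(\frac{1}{a_{d,d}}\Bigr) - \sum_{\substack{1 \leq i,\ell \leq d,\\ (i,\ell) \neq (d,d)}} \partial_\ell\bigl(a_{i,\ell}\,\partial_i g\bigr)\frac{1}{a_{d,d}} + \tilde{u}\,\frac{1}{a_{d,d}} \in L^2(U). \tag{8.30} \]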


Since $\operatorname{Supp} g \subseteq \overline{U} \cap V$ by assumption, we may extend $g$ and its partial derivatives trivially from $(-1, 1)^{d-1} \times (0, 1)$ to the half-space $\mathbb{R}^{d-1} \times (0, \infty)$ and again obtain a function $g$ together with all its degree one and two partial derivatives.

Extending Corollary 8.47. From the existence of all second-order partial derivatives of $g$ on $U$ we would like to conclude that $g \in H^2(U)$. For this we again combine Proposition 8.45 and Lemma 8.46, much as in the argument in the proof of Corollary 8.47. By continuity of the regular representation in Lemma 3.74 we may find, for every $\varepsilon > 0$, some $s > 0$ so that the function $g_s$ defined by $g_s(x) = g(x + se_d)$ for $x \in \mathbb{R}^{d-1} \times (-s, \infty)$ and extended trivially outside that set satisfies

\[ \|\partial_\alpha g - \partial_\alpha g_s\|_2 < \varepsilon \]

for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$. By Proposition 8.45 and Lemma 8.46 the function $g^\delta = g_s * \jmath_\delta$ is smooth, and satisfies $\|g^\delta - g\|_2 < 2\varepsilon$ for $\delta \in (0, s)$ sufficiently small. Also by Proposition 8.45 and our shift of the functions the derivatives $\partial_\alpha g^\delta$ of $g^\delta$ for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$ can be expressed on $U$ by convolution of $\partial_\alpha g_s$ with $\jmath_\delta$ and so

\[ \|\partial_\alpha g^\delta - \partial_\alpha g\|_{L^2(U)} < 2\varepsilon \]

for sufficiently small $\delta \in (0, s)$ and $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq 2$. Therefore $g \in H^2(U)$, which concludes the case $k = 0$.

Induction on $k > 0$. The argument above gives the base of the induction on $k$. Suppose we already know (8.22) for $k - 1 \geq 0$ and assume again that $u \in H^k(U)$. By the inductive hypothesis we already know $g \in H^{k+1}(U)$. We again fix some $\ell \in \{1, \dots, d - 1\}$ and claim that $\partial_\ell g \in H_0^1(U)$ and that there exists some $u_\ell \in H^{k-1}(U)$ with

\[ \langle \partial_\ell g, P\phi\rangle = \langle u_\ell, \phi\rangle \tag{8.31} \]

for all $\phi \in C_c^\infty(U)$. The inductive hypothesis then implies $\partial_\ell g \in H^{k+1}(U)$. By varying $\ell \in \{1, \dots, d - 1\}$ this shows the existence of all partial derivatives $\partial_\alpha g$ for $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k + 2$ except for $\alpha = (k + 2)e_d$. To obtain this partial derivative we take the $k$th partial derivative of $\partial_d^2 g$ as in (8.30) with respect to the $d$th coordinate and obtain the existence of $\partial_{(k+2)e_d} g$ by using the other partial derivatives of degree at most $k + 2$. In fact, if we have $u \in H^k(U)$ and $g \in H^{k+1}(U)$ then $\tilde{u} \in H^k(U)$ by (8.25). Using $\partial_\ell g \in H^{k+1}(U)$ for each $\ell \in \{1, \dots, d - 1\}$ and (8.30) we also have $\partial_d^2 g \in H^k(U)$. As in the base case, this implies that $g \in H^{k+2}(U)$.

For the proof of the first part of the claim we note that $f \in H^1(U)$ and $\operatorname{Supp}(f) \subseteq \overline{U} \cap V$ implies

\[ \bigl(D_\ell^h f\bigr)(x) = \frac{1}{h}\int_0^h \partial_\ell f(x + se_\ell)\,ds \tag{8.32} \]


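for sufficiently small $h \in \mathbb{R} \setminus \{0\}$ and almost every $x \in U$. Indeed, this holds for any $f \in H^1(U) \cap C^\infty(U)$ and $x, x + he_\ell \in U$, extends by continuity to any $f \in H^1(U)$ and almost every $x \in U$ with $x + he_\ell \in U$, and then holds for $f$ with $\operatorname{Supp} f \subseteq \overline{U} \cap V$ for almost every $x \in U$ and sufficiently small $h \neq 0$. By the continuity of the regular representation in Lemma 3.74 the identity (8.32) implies that

\[ \partial_\ell f = \lim_{h \to 0} D_\ell^h f. \]

Using this for $g \in H^2(U)$ and its partial derivatives $\partial_i g$ for $i = 1, \dots, d$ we find for a given $\varepsilon > 0$ some $h > 0$ such that

\[ \|\partial_\ell g - D_\ell^h g\|_2 < \varepsilon \quad\text{and}\quad \|\partial_\ell\partial_i g - D_\ell^h\partial_i g\|_2 < \varepsilon \]

for $i = 1, \dots, d$. Since $g \in H_0^1(U)$, $\operatorname{Supp}(g) \subseteq \overline{U} \cap V$ and $\ell \in \{1, \dots, d - 1\}$ we have $D_\ell^h g \in H_0^1(U)$ for all sufficiently small $h$, and since $\varepsilon > 0$ was arbitrary this implies $\partial_\ell g \in H_0^1(U)$, as claimed.

For the second part (8.31) of the claim we let $\phi \in C_c^\infty(U)$ and calculate

\begin{align*}
\langle \partial_\ell g, P\phi\rangle &= -\langle g, \partial_\ell(P\phi)\rangle \\
&= -\Bigl\langle g, \partial_\ell\Bigl(\sum_{i,j=1}^d a_{i,j}\,\partial_i\partial_j\phi + \sum_{i=1}^d b_i\,\partial_i\phi + c\phi\Bigr)\Bigr\rangle \\
&= -\Bigl\langle g, P\partial_\ell\phi + \sum_{i,j=1}^d \bigl(\partial_\ell a_{i,j}\bigr)\partial_i\partial_j\phi + \sum_{i=1}^d \bigl(\partial_\ell b_i\bigr)\partial_i\phi + \bigl(\partial_\ell c\bigr)\phi\Bigr\rangle \\
&= -\langle u, \partial_\ell\phi\rangle - \sum_{i,j=1}^d \langle g\,\partial_\ell a_{i,j}, \partial_i\partial_j\phi\rangle - \sum_{i=1}^d \langle g\,\partial_\ell b_i, \partial_i\phi\rangle - \langle g\,\partial_\ell c, \phi\rangle \\
&= \Bigl\langle \partial_\ell u - \sum_{i,j=1}^d \partial_i\partial_j(g\,\partial_\ell a_{i,j}) + \sum_{i=1}^d \partial_i(g\,\partial_\ell b_i) - g\,\partial_\ell c, \phi\Bigr\rangle \\
&= \langle u_\ell, \phi\rangle,
\end{align*}

where

\[ u_\ell = \underbrace{\partial_\ell u}_{\in H^{k-1}(U)} - \underbrace{\sum_{i,j=1}^d \partial_i\partial_j(g\,\partial_\ell a_{i,j})}_{\in H^{k-1}(U)} + \underbrace{\sum_{i=1}^d \partial_i(g\,\partial_\ell b_i)}_{\in H^k(U)} - \underbrace{g\,\partial_\ell c}_{\in H^{k+1}(U)} \]

belongs to $H^{k-1}(U)$, as claimed. This concludes the induction and hence the proof of Theorem 8.49. □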


Exercise 8.53. Complete the proof of Theorem 8.52 following the steps below. (a) State and give a detailed proof of the extension of Corollary 8.47 that was used in the above proof. (b) Generalize Lemma 5.50 to allow the uniformly elliptic operator P instead of just ∆. (c) Use the assumption that U has smooth boundary and a smooth partition of unity to localize the situation. Apply the above proof on each of the local statements.

8.3 Topologies on the space of bounded operators

Let $X$ and $Y$ be Banach spaces. Then we have seen that the space $B(X, Y)$ of bounded linear operators from $X$ to $Y$ together with the operator norm is again a Banach space.

Definition 8.54. Let $X$ and $Y$ be Banach spaces. The topology on $B(X, Y)$ induced by the operator norm is called the uniform operator topology.

Since any Banach space has a weak topology, there is of course also a weak topology on $B(X, Y)$. There are, however, further topologies that make special use of the fact that $B(X, Y)$ is a space of maps.

Definition 8.55. Let $X$ and $Y$ be Banach spaces. The strong operator topology on $B(X, Y)$ is the weakest topology for which the evaluation maps $B(X, Y) \ni L \longmapsto Lx \in Y$ are continuous for every $x \in X$, where we use the norm topology on $Y$.

In other words, a neighbourhood of $L_0 \in B(X, Y)$ in the strong operator topology is a set containing a set of the form

\[ N_{x_1,\dots,x_n;\varepsilon}(L_0) = \bigcap_{i=1}^n \bigl\{L \in B(X, Y) \mid \|Lx_i - L_0x_i\| < \varepsilon\bigr\} \]

for some $x_1, \dots, x_n \in X$ and $\varepsilon > 0$. Equivalently, we could define the strong operator topology by using all neighbourhoods defined by the semi-norms

\[ \|L\|_{x_1,\dots,x_n} = \max\{\|Lx_1\|, \dots, \|Lx_n\|\}. \]

The strong operator topology is in many situations more natural than the uniform topology, and the study of unitary representations (see Definition 3.73 for the general definition) is an example.

Example 8.56. Let $H = L^2(\mathbb{R})$ and define for $x \in \mathbb{R}$ the unitary map $\rho_x : H \to H$ by $\rho_x f(t) = f(t - x)$ for $t \in \mathbb{R}$ and $f \in H$. We claim that

\[ \|\rho_x - \rho_y\| = 2(1 - \delta_{xy}) = \begin{cases} 2 & \text{if } x \neq y; \\ 0 & \text{if } x = y. \end{cases} \]

In fact, if $x < y$ and $M > 0$ then we can define a function $f \in H$ (illustrated in Figure 8.1) by

\[ f(t) = \begin{cases} 0 & \text{for } t < 0; \\ (-1)^m & \text{for } t \in [m(y - x), (m + 1)(y - x)) \text{ with } 0 \leq m < M; \\ 0 & \text{for } t \geq M(y - x). \end{cases} \]

Fig. 8.1: The function f in Example 8.56.

Then $\rho_{y-x}f$ satisfies $|(f - \rho_{y-x}f)(t)| = |2f(t)| = 2$ for almost every $t \in (y - x, M(y - x))$, so that

\[ \|f - \rho_{y-x}f\|_2 \geq 2\sqrt{(M - 1)(y - x)} \]

while $\|f\|_2 = \sqrt{M(y - x)}$. As $M \geq 1$ was arbitrary, this shows that

\[ \|\rho_x - \rho_y\| = \|I - \rho_{y-x}\| = 2 \]

since $\|\rho_x\| = \|\rho_{-x}\| = 1$. The claim implies that the map $\mathbb{R} \ni x \mapsto \rho_x \in B(H, H)$ is not continuous with respect to the uniform operator topology. However, it is continuous with respect to the strong operator topology, for if $f_1, \dots, f_n \in H$ and $\varepsilon > 0$ are given, then for $y$ sufficiently close to $x$ we have

\[ \|\rho_y f_i - \rho_x f_i\|_2 < \varepsilon \]


for $i = 1, \dots, n$ by the continuity property from Lemma 3.74. Thus, for $y$ sufficiently close to $x$ we have $\rho_y \in N_{f_1,\dots,f_n;\varepsilon}(\rho_x)$, as required.

Exercise 8.57. Let $X$ and $Y$ be Banach spaces. Show that the strong operator topology on $B(X, Y)$ has the following properties:
(1) it is Hausdorff;
(2) it is weaker than the uniform operator topology (defined by the operator norm);
(3) a sequence $(T_n)$ in $B(X, Y)$ converges to $T_0 \in B(X, Y)$ as $n \to \infty$ in the strong operator topology if and only if $T_n(v) \to T_0(v)$ as $n \to \infty$ for all $v \in X$; and
(4) a filter $\mathcal{F}$ on $B(X, Y)$ converges to $T_0 \in B(X, Y)$ if the filter generated by $\bigl\{\{Tv \mid T \in F\} \mid F \in \mathcal{F}\bigr\}$ converges to $T_0v$ for all $v \in X$.
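The two halves of Example 8.56 can be seen in a short computation. The Python sketch below is an illustration only, with hypothetical function names. It evaluates $\|f - \rho_{y-x}f\|_2 / \|f\|_2$ for the step function $f$ of the example by working cell by cell (the cell width $y - x$ cancels in the ratio), and records the closed form $\|\rho_s f - f\|_2 = \sqrt{2s}$ for the indicator function of $[0, 1]$, which is a standard example of strong continuity chosen here as an assumption.

```python
import math


def translation_ratio(M):
    """Return ||f - rho_h f||_2 / ||f||_2 for the step function f of Example 8.56.

    f takes the value (-1)^m on the m-th cell [m h, (m+1) h) for 0 <= m < M and
    vanishes elsewhere; h = y - x > 0 cancels in the ratio, so it is set to 1.
    """
    f = [(-1.0) ** m for m in range(M)] + [0.0]   # values on cells 0, ..., M
    shifted = [0.0] + f[:-1]                      # values of rho_h f on the same cells
    diff_sq = sum((u - v) ** 2 for u, v in zip(f, shifted))
    norm_sq = sum(u ** 2 for u in f)
    return math.sqrt(diff_sq / norm_sq)


def indicator_shift_distance(s):
    """||rho_s f - f||_2 for f the indicator function of [0, 1] and 0 <= s <= 1."""
    return math.sqrt(2.0 * s)


# The ratio equals sqrt((4M - 2) / M), which increases to 2 as M grows.
ratios = [translation_ratio(M) for M in (1, 10, 100, 10000)]
```

The ratios approach 2 however small $y - x$ is, which reflects the failure of continuity in the uniform operator topology, while for a fixed function the distance $\sqrt{2s}$ tends to 0 as $s \to 0$.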

Another topology on $B(X, Y)$ is built up using functionals on $Y$.

Definition 8.58. Let $X$ and $Y$ be Banach spaces. The weak operator topology on $B(X, Y)$ is the weakest topology with respect to which the maps $B(X, Y) \ni L \longmapsto y^*(Lx)$ are continuous for all $x \in X$ and $y^* \in Y^*$.

Equivalently, the weak operator topology can be defined using the neighbourhoods defined by the semi-norms

\[ \|L\|_{x_1,y_1^*;\,x_2,y_2^*;\,\dots;\,x_n,y_n^*} = \max\{|y_1^*(Lx_1)|, \dots, |y_n^*(Lx_n)|\}. \]

Exercise 8.59. Assume that $X$ and $Y$ are infinite-dimensional Banach spaces. Show that the uniform topology, the weak topology, the strong operator topology, and the weak operator topology are all different Hausdorff topologies on $B(X, Y)$.

8.4 Locally Convex Vector Spaces

Even if we were initially only interested in Banach spaces, the last few sections should have left no doubt that the next definition is natural and unavoidable. It gives a class of topological vector spaces generalizing normed vector spaces.

Definition 8.60. Let $X$ be a vector space (over $\mathbb{R}$ or $\mathbb{C}$) and suppose that $\{\|\cdot\|_\alpha \mid \alpha \in A\}$ is a family of semi-norms on $X$ with the property that for every $x \in X \setminus \{0\}$ there is some $\alpha \in A$ with $\|x\|_\alpha > 0$. Then the locally convex topology on $X$ induced by the semi-norms is the topology for which a neighbourhood of the point $x_0 \in X$ is a set containing a set of the form

\[ N_{\alpha_1,\dots,\alpha_n;\varepsilon}(x_0) = \bigcap_{i=1}^n B_\varepsilon^{\|\cdot\|_{\alpha_i}}(x_0) = \Bigl\{x \in X \mid \max_{i=1,\dots,n} \|x - x_0\|_{\alpha_i} < \varepsilon\Bigr\}. \]

The vector space $X$ together with this topology is called a locally convex vector space.

Equivalently, a locally convex topology is the weakest topology that is stronger than those defined by a collection of semi-norms. Enlarging the collection of semi-norms if necessary, we may assume that for $\alpha_1, \dots, \alpha_n \in A$ the semi-norm

\[ \|x\|' = \max_{i=1,\dots,n} \|x\|_{\alpha_i} \]

also belongs to the collection (that is, coincides with $\|\cdot\|_\alpha$ for some $\alpha \in A$). If this is the case, then the neighbourhoods of $x_0 \in X$ are sets containing a ball of the form

\[ B_\varepsilon^{\|\cdot\|_\alpha}(x_0) = \{x \in X \mid \|x - x_0\|_\alpha < \varepsilon\} \]

for some $\alpha \in A$ and $\varepsilon > 0$.

An equivalent definition of locally convex vector spaces is obtained by requiring that the topology on the vector space $X$ is Hausdorff, makes addition and scalar multiplication continuous, and has a basis of neighbourhoods of the point $0 \in X$ consisting of absorbent balanced convex sets. Here a convex set $C \subseteq X$ is balanced if for any $x \in C$ and scalar $\rho$ with $|\rho| \leq 1$ we also have $\rho x \in C$, and is absorbent if for any $x \in X$ there exists some $\alpha > 0$ with $\alpha x \in C$. We refer to Exercise 8.72 and Conway [19, Sec. IV.1] for the equivalence, as we will not need it in this form.

Essential Exercise 8.61. Show that a locally convex vector space (as in Definition 8.60) has the property that addition and scalar multiplication are continuous, and that $0 \in X$ has a basis consisting of absorbent balanced convex sets.

As the next exercise shows, even if a locally convex vector space topology cannot be described using a norm, the locally convex structure is enough to obtain results similar to those obtained as corollaries of the Hahn–Banach theorem (Theorem 7.3).

Exercise 8.62. Let $X$ be a locally convex vector space. Show that the space $X^*$ of continuous linear functionals on $X$ separates points.

We have seen many examples of locally convex vector spaces. These include normed vector spaces with their norm or weak topology, duals of Banach spaces with the weak* topology, and the space B(X, Y ) of operators between two Banach spaces with any of the topologies discussed in Section 8.3. However, there are further spaces that we have neglected so far because they do not fit well (or at all) into the framework of normed spaces.


Example 8.63. (1) The space $C^\infty([0, 1])$ is a locally convex vector space with the semi-norms

\[ \|f\|_{C^n([0,1])} = \max_{j=0,\dots,n} \|f^{(j)}\|_\infty \]

for $n \in \mathbb{N}$. Notice that even though each of these semi-norms is already a norm, we still have to use all of them to define the locally convex topology on $C^\infty([0, 1])$ we are interested in, namely the topology of uniform convergence of all derivatives. Notice that differentiation

\[ D : C^\infty([0, 1]) \ni f \longmapsto f' \in C^\infty([0, 1]) \]

is a continuous operator on $C^\infty([0, 1])$.

(2) Let $U \subseteq \mathbb{R}^d$ be an open set. Then

\[ C_b^\infty(U) = \bigcap_{k \geq 0} C_b^k(U), \]

with $C_b^k(U)$ defined as in Example 2.24(6), is another example of a locally convex vector space if we use all of the norms $\|\cdot\|_{C_b^k(U)}$ for $k \geq 1$.

(3) Let $U \subseteq \mathbb{R}^d$ be an open set. Another important notion of convergence in analysis for functions on $U$ is the notion of uniform convergence on compact subsets. For example, on the space $C(U)$ this notion is captured if we use the collection of semi-norms $\{\|\cdot\|_{K,\infty} \mid K \subseteq U \text{ compact}\}$, where

\[ \|f\|_{K,\infty} = \sup_{x \in K} |f(x)| \]

for $f \in C(U)$ is the supremum norm of the restriction to $K$.

(4) Let $U \subseteq \mathbb{R}^d$ be an open set. We can also make $C_c(U)$ into a locally convex space in a natural way by endowing it with the collection of semi-norms $\{\|\cdot\|_F \mid F \in C(U)\}$, where $\|f\|_F = \|fF\|_\infty$ for $f \in C_c(U)$ is the supremum norm taken after multiplication by $F \in C(U)$. The corresponding notion of convergence is less familiar but is natural for elements of $C_c(U)$ (see Exercise 8.64 below). The convergence is uniform across $U$, and this remains true after multiplication with any continuous function on $U$, however rapidly it might increase towards $\partial U$.

Exercise 8.64. We use the notation from Example 8.63(4) in this exercise.
(a) Show that $f \in C(U)$ belongs to $C_c(U)$ if and only if $\|f\|_F < \infty$ for all $F \in C(U)$.
(b) Suppose that $(f_n)$ is a sequence of functions in $C_c(U)$ that converges in $C_c(U)$ to some $f \in C_c(U)$. Show that there exists a compact set $K \subseteq U$ such that $\operatorname{Supp}(f_n), \operatorname{Supp}(f) \subseteq K$ for all $n \geq 1$.


In general, the topology of a locally convex vector space is not metrizable. One important situation in which it is metrizable is when it is sufficient to use countably many semi-norms. This is the case in Example 8.63(1), (2) and (3) (for the latter recall that an open subset of $\mathbb{R}^d$ is σ-compact), but is not for Example 8.63(4). If the locally convex topology on $X$ is given by the semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$, then we can define a metric on $X$ as in Lemma A.17, leading to the following definition.

Definition 8.65. A Fréchet space is a locally convex vector space $X$ whose topology is defined by countably many semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$, such that $X$ is complete with respect to the metric

\[ d(x, y) = \sum_{n=1}^\infty \frac{1}{2^n}\,\frac{\|x - y\|_n}{1 + \|x - y\|_n}. \tag{8.33} \]
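To get a feeling for the metric (8.33), the following Python sketch evaluates a partial sum of the series from a finite list of semi-norm values. It is a hedged illustration: the function name, the truncation level `N`, and the two sample lists of semi-norm values are assumptions made for the example and do not come from the text.

```python
def frechet_distance(seminorm_values):
    """Partial sum of (8.33): sum over n >= 1 of 2^(-n) * s_n / (1 + s_n).

    seminorm_values lists s_n = ||x - y||_n for n = 1, ..., N; the omitted tail
    of the series is at most 2^(-N).
    """
    return sum(
        s / (1.0 + s) / 2.0 ** n
        for n, s in enumerate(seminorm_values, start=1)
    )


N = 60
# All semi-norms equal to 3: the full series would sum to 3 / (1 + 3) = 0.75.
d_constant = frechet_distance([3.0] * N)
# Semi-norms growing like 1e-6 * 10^n: the distance stays small even though
# ||x - y||_n is huge for large n, because each summand is capped by 2^(-n).
d_growing = frechet_distance([1e-6 * 10.0 ** n for n in range(1, N + 1)])
```

Every summand is smaller than $2^{-n}$, so $d(x, y) < 1$ always, and the distance is small exactly when the first few semi-norms of $x - y$ are small. This is the sense in which the metric encodes convergence with respect to each semi-norm separately.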

Exercise 8.66. Suppose that the topology of a locally convex vector space $X$ is induced by countably many semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$.
(a) Show that a sequence $(x_n)$ in $X$ is a Cauchy sequence with respect to the metric in (8.33) if and only if $(x_n)$ is a Cauchy sequence with respect to all of the semi-norms $\|\cdot\|_n$ for $n \in \mathbb{N}$.
(b) Show that if two families of semi-norms $\{\|\cdot\|_n\}$ and $\{\|\cdot\|'_n\}$ make $X$ into a locally convex vector space with the same topology, then $X$ is complete with respect to $d$ if and only if $X$ is complete with respect to $d'$, where $d'$ is defined using $\{\|\cdot\|'_n\}$ (just as in (8.33)).
(c) Show that the spaces from Example 8.63(1), (2) and (3) are Fréchet spaces.

The following exercise indicates why we restricted attention to the study of locally convex vector spaces instead of considering the larger class of topological vector spaces.

Exercise 8.67 (A topological vector space with trivial dual). Let MF([0, 1]) denote the space of all (equivalence classes of) complex-valued measurable functions on [0, 1], where functions are equivalent if they agree almost everywhere with respect to Lebesgue measure $m$ on [0, 1]. Given $f_0 \in$ MF([0, 1]) and $\varepsilon > 0$ we define the ε-neighbourhood of $f_0$ by

\[ U_\varepsilon(f_0) = \bigl\{f \in \mathrm{MF}([0, 1]) \mid m\bigl(\{x \in [0, 1] \mid |f(x) - f_0(x)| > \varepsilon\}\bigr) < \varepsilon\bigr\}. \]

(a) Show that the above system of neighbourhoods defines a basis of neighbourhoods with respect to a topology on MF([0, 1]). We note that the corresponding notion of convergence is called convergence in measure. (b) Show that the vector space operations in MF([0, 1]) are continuous, making MF([0, 1]) into a so-called topological vector space. (c) Show that the dual space MF([0, 1])∗ = {ℓ : MF([0, 1]) → C | ℓ is linear and continuous} is given by MF([0, 1])∗ = {0}. (d) Show that if a sequence (fn ) in MF([0, 1]) converges almost everywhere to f in MF([0, 1]), then it also converges in measure to f . Give an example of a sequence that converges in measure but not almost everywhere. (e) Show that a sequence (fn ) in MF([0, 1]) converges in measure to f in MF([0, 1]) if every subsequence of (fn ) has a subsequence that converges almost everywhere to f .


8.5 Distributions as Generalized Functions

Both in applications and within mathematics it is often useful to have a generalized notion of function to allow, for example, a function $F$ on $\mathbb{R}$ with the property that

$$\int_{\mathbb{R}} \varphi(x) F(x)\, dx = \varphi(0) \tag{8.34}$$

for any 'nice' function $\varphi : \mathbb{R} \to \mathbb{R}$. Such an $F$ might represent a point mass (a dimensionless object of mass 1 located at 0), or be a mathematical representation of an impulse in physics. Since $F$ is certainly not a function, one needs to develop a new theory that includes such objects.(27) The theory of distributions allows for such generalized functions, and permits them to be differentiated, multiplied by smooth functions, and so on. Of course if we were only interested in expressions of the form in (8.34) then we could simply study measures, since (8.34) is simply the integral against the Dirac measure $\delta_0$ at the origin. However, within the space of measures it does not normally make sense to take derivatives (and this is the case for $\delta_0$), while it is possible to define a derivative map in the space of distributions.

The most direct approach to distributions superficially seems to be a cheat: We declare a distribution to be a linear continuous functional (that is, a linear continuous map to the base field $\mathbb{R}$ or $\mathbb{C}$) on a space of nice test functions $\{\varphi\}$. Here the definition of 'nice' may vary, to give different classes of distributions. For example, we could fix an open subset $U \subseteq \mathbb{R}^d$ and use all $\varphi \in C_c^\infty(U)$ as test functions. Requiring continuity of the linear functional is natural but needs a topology on $C_c^\infty(U)$. We define the topology on $C_c^\infty(U)$ by introducing the following system of semi-norms, which makes $C_c^\infty(U)$ into a locally convex vector space. In fact, for every $\alpha \in \mathbb{N}_0^d$ and $F \in C(U)$ we define the semi-norm

$$\|f\|_{\alpha,F} = \|(\partial_\alpha f) F\|_\infty$$

for $f \in C_c^\infty(U)$. Using $F = 1$ and $\alpha = 0$ shows that these include $\|\cdot\|_\infty$, so that the topology is indeed Hausdorff. We define the space $D(U)$ of distributions on $U$ to be the space of continuous linear functionals on the locally convex vector space $C_c^\infty(U)$.

This definition of a distribution is a cheat because we have finessed the problem that no function $F$ satisfies (8.34) by simply declaring $F$ to be the distribution (that is, continuous linear functional) which sends the test function $\varphi$ to $\varphi(0)$ without giving a more direct generalization of functions on $\mathbb{R}$. We may write this formally as $\langle F, \varphi\rangle = \varphi(0)$, where we write $\langle F, \varphi\rangle$ for the action of the functional $F$ on the test function $\varphi$. One sometimes also writes $\int F\varphi$ for $\langle F, \varphi\rangle$, especially if we continue to think of $F$ as a generalized function, but whenever one wants to prove something about $F$ one has to go back to the formal definition of $F$ as a functional on $C_c^\infty(U)$. Even though this may look dubious at first sight, the intuition provided by the viewpoint that $F$ is a generalized function is often useful, and will stay consistent with the formal treatment of $F$ as a linear functional.

Our discussion of the Dirichlet boundary value problem (in Sections 5.2–5.3) and the eigenfunctions of the Laplace operator (in Section 6.4) has already made use of the viewpoint provided by distribution theory. However, we will not develop the theory here, referring to the monograph of Schwartz [94, 95] for a thorough treatment.

Exercise 8.68. Show that any integrable function on an open subset $U \subseteq \mathbb{R}^d$ gives rise to a distribution. That is, any $f$ in $L^1(U)$ defines a linear functional $F_f$ on the space $C_c^\infty(U)$ of smooth compactly supported functions via

$$\langle F_f, \varphi\rangle = \int_U f(x)\varphi(x)\, dx.$$

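Prove that the resulting map $f \mapsto F_f$ is linear and injective. Actually it is sufficient to assume that $f \in L^1_{\mathrm{loc}}(U)$, the space of locally integrable functions, that is, measurable functions that are integrable on any compact set.

Exercise 8.69. Show that no measurable and locally integrable function $f : \mathbb{R} \to \mathbb{R}$ has the property (8.34) for all $\varphi \in C_c^\infty(\mathbb{R})$.

Exercise 8.70. Let $U \subseteq \mathbb{R}^d$ be open and $\alpha \in \mathbb{N}_0^d$.
(a) Show that the linear map

$$\partial_\alpha : C_c^\infty(U) \longrightarrow C_c^\infty(U), \qquad f \longmapsto \partial_\alpha f$$

is continuous with respect to the locally convex topology on $C_c^\infty(U)$.
(b) Define $\boldsymbol{\partial}_\alpha = -\partial_\alpha^* : D(U) \to D(U)$ by $\boldsymbol{\partial}_\alpha F = -F \circ \partial_\alpha$ for all $F \in D(U)$. Show that we have $\boldsymbol{\partial}_\alpha \psi = \partial_\alpha \psi$ for $\psi \in C^\infty(U)$ if we identify $\psi$ with the distribution

$$F_\psi : C_c^\infty(U) \ni f \longmapsto \int f\psi\, dm$$

as in Exercise 8.68. In other words, the operator $\boldsymbol{\partial}_\alpha$ extends differentiation on $C^\infty(U)$ to $D(U)$.
(c) Show that for $\psi \in C^\infty(U)$ and $F \in D(U)$ we can define a new distribution

$$\psi \cdot F : C_c^\infty(U) \ni \varphi \longmapsto \langle \psi \cdot F, \varphi\rangle = \langle F, \psi\varphi\rangle$$

which depends linearly on $\psi \in C^\infty(U)$ and linearly on $F \in D(U)$. Prove also the product rule

$$\boldsymbol{\partial}_j(\psi \cdot F) = (\partial_j \psi) \cdot F + \psi \cdot (\boldsymbol{\partial}_j F).$$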
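To see the definition of the distributional derivative at work in one dimension, the following sketch checks numerically that the derivative of the Heaviside step function $H$ acts on a test function $\varphi$ like the point mass in (8.34), that is, $-\langle F_H, \varphi'\rangle = \varphi(0)$. The bump function, the midpoint rule and all function names are assumptions chosen for this illustration; they do not come from the text.

```python
import math


def bump(x):
    """A smooth test function supported in [-1, 1]."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))


def bump_derivative(x):
    """Classical derivative of bump, computed by the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2


def heaviside(x):
    """A locally integrable function that has no classical derivative at 0."""
    return 1.0 if x >= 0.0 else 0.0


def pairing(f, phi, a=-1.0, b=1.0, steps=20000):
    """Midpoint-rule approximation of <F_f, phi>, the integral of f * phi over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * phi(a + (i + 0.5) * h) for i in range(steps))


# Derivative of the distribution F_H applied to the test function:
# <dF_H, phi> = -<F_H, phi'>.
derivative_pairing = -pairing(heaviside, bump_derivative)
```

Up to discretization error, `derivative_pairing` agrees with `bump(0.0)`, which is the value the Dirac distribution assigns to this test function.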


8.6 Convex Sets

A set $K \subseteq X$ in a vector space is called absorbent if for any $x \in X$ there exists some $\alpha > 0$ with $\alpha x \in K$. We note that for a convex set $K \subseteq X$ with $0 \in K$ and some given $x \in X$ the set $\{t > 0 \mid \tfrac{1}{t}x \in K\}$ is either empty, an open interval $(p_K(x), \infty)$, or a closed interval $[p_K(x), \infty)$ for some $p_K(x) \geq 0$. Moreover, if $K = B_1^{\|\cdot\|}$ for some semi-norm $\|\cdot\|$ on $X$, then $K$ is an absorbent convex set and $p_K(x) = \|x\|$ for any $x \in X$. A partial converse is given by the following result, which gives a solution to Exercise 7.2.

Lemma 8.71. Let $K \subseteq X$ be an absorbent convex set in a vector space. Define the gauge function $p_K : X \to \mathbb{R}_{\geq 0}$ by

$$p_K(x) = \inf\{t > 0 \mid \tfrac{1}{t}x \in K\}.$$

Then $p_K(\alpha x) = \alpha p_K(x)$ and $p_K(x + y) \leq p_K(x) + p_K(y)$ for all $\alpha \geq 0$ and $x, y \in X$.

Proof. The positive homogeneity follows directly from the definition. Suppose now that $x, y \in X$ and $t_x, t_y > 0$ have

$$\frac{1}{t_x}x,\ \frac{1}{t_y}y \in K. \tag{8.35}$$

Then

$$\frac{1}{t_x + t_y}(x + y) = \frac{t_x}{t_x + t_y}\Bigl(\frac{1}{t_x}x\Bigr) + \frac{t_y}{t_x + t_y}\Bigl(\frac{1}{t_y}y\Bigr)$$

also lies in $K$, since $K$ is convex. Thus $p_K(x + y) \leq t_x + t_y$, and since this holds for all $t_x, t_y$ with (8.35), the triangle inequality follows. □

Exercise 8.72. Use Lemma 8.71 to prove the converse to Exercise 8.61. More precisely, let $X$ be a vector space endowed with a Hausdorff topology. Assume that addition and scalar multiplication are continuous and that $0 \in X$ has a basis of neighbourhoods consisting of absorbent balanced convex sets. Show that $X$ is a locally convex space in the sense of Definition 8.60.
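The gauge function of Lemma 8.71 can be explored numerically. The sketch below approximates $p_K$ by bisection for a convex absorbent set $K \subseteq \mathbb{R}^2$ given by a membership test; the ellipse, the helper names `gauge` and `in_ellipse`, and the search bound `t_hi` are assumptions made for this illustration. Bisection is justified because, for convex $K$ containing 0, the set of admissible $t$ is an interval that is unbounded to the right.

```python
import math


def gauge(x, contains, t_hi=1e6, tol=1e-10):
    """Approximate p_K(x) = inf{t > 0 : x/t in K} by bisection.

    Assumptions: K is convex, absorbent, contains 0, and x/t_hi lies in K.
    """
    lo, hi = 0.0, t_hi                      # invariant: x/hi lies in K
    while hi - lo > tol * max(1.0, hi):
        mid = 0.5 * (lo + hi)
        if contains((x[0] / mid, x[1] / mid)):
            hi = mid
        else:
            lo = mid
    return hi


def in_ellipse(p):
    """Membership test for the closed unit ball of the norm sqrt((u/2)^2 + v^2)."""
    return (p[0] / 2.0) ** 2 + p[1] ** 2 <= 1.0


x, y = (1.0, 1.0), (2.0, -0.5)
p_x = gauge(x, in_ellipse)
p_y = gauge(y, in_ellipse)
p_sum = gauge((x[0] + y[0], x[1] + y[1]), in_ellipse)
```

Since the ellipse is the unit ball of a norm, the computed gauge should agree with that norm, and `p_sum` should not exceed `p_x + p_y`, in line with the triangle inequality of the lemma.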

In the following discussion concerning convex sets we will frequently restrict to real locally convex vector spaces. This is not a severe restriction as every complex locally convex vector space $X$ can also be considered a real vector space, and every continuous linear functional $\ell_{\mathbb{R}}$ on $X$ has the form $\ell_{\mathbb{R}} = \Re \ell_{\mathbb{C}}$ for a continuous linear functional $\ell_{\mathbb{C}}$ on $X$ (see the proof of the complex case in Theorem 7.3). The next result strengthens Corollary 7.4 and Exercise 8.62, and is readily explained by Figure 8.2.

Fig. 8.2: Separation of $z \notin K$ from the convex set $K$ by a closed hyperplane $\ell(x) = c$.

Theorem 8.73 (Separation from convex sets). Let $X$ be a locally convex vector space. Let $K \subseteq X$ be a closed convex set, and suppose that $z \in X \setminus K$.

Then there exists a continuous linear functional $\ell \in X^*$ and a constant $c \in \mathbb{R}$ such that $\ell(y) \leq c < \ell(z)$ for all $y \in K$.

Proof. Since $z \notin K$ and $K$ is closed, $X \setminus K$ is a neighbourhood of $z$, and in particular $N_{\alpha_1,\dots,\alpha_n;\varepsilon}(z) \subseteq X \setminus K$ for some $\alpha_1, \dots, \alpha_n \in A$ and $\varepsilon > 0$ (see Definition 8.60 for the notation). We define $U = N_{\alpha_1,\dots,\alpha_n;\varepsilon/2}(0)$, so that $z + 2U \subseteq X \setminus K$. Without loss of generality we may assume that $0 \in K$ (for otherwise we can just translate both $K$ and $z$ by the negative of an element of $K$). Define

$$M = K + U = \{y + u \mid y \in K,\ u \in U\}$$

and notice that $M$ is convex because both $K$ and $U$ are (check this) and that $M$ is absorbent as it contains $U$. We now apply Lemma 8.71 to obtain the norm-like function $p_M$. By definition, we have

$$p_M(\cdot) \leq \frac{2}{\varepsilon} \max\{\|\cdot\|_{\alpha_1}, \dots, \|\cdot\|_{\alpha_n}\} \tag{8.36}$$

since $U \subseteq M$. We claim that $p_M(z) > 1$. For otherwise there exists a sequence $(\lambda_n)$ with $\lambda_n \to 1$ as $n \to \infty$ and with

$$\frac{1}{\lambda_n} z = k_n + u_n \in M = K + U$$

for all $n \geq 1$. Clearly $\frac{1}{\lambda_n} z$ and $u_n$ are bounded in the semi-norms $\|\cdot\|_{\alpha_1}, \dots, \|\cdot\|_{\alpha_n}$, so the same holds also for $k_n$. Now rewrite the above equation as

$$z = k_n + (\lambda_n - 1)k_n + \lambda_n u_n$$

and notice that for large enough $n$ we have

$$(\lambda_n - 1)k_n + \lambda_n u_n \in \tfrac{1}{2}U + \tfrac{3}{2}U = 2U,$$

since $\lambda_n \to 1$ as $n \to \infty$. However, this contradicts $z + 2U \subseteq X \setminus K$. Therefore we must have $p_M(z) > 1$.


Now define $Z = \mathbb{R}z$ and a functional $\ell \in Z^*$ by $\ell(\alpha z) = \alpha p_M(z)$. Observe that $\ell(\alpha z) \leq p_M(\alpha z)$ for all $\alpha \in \mathbb{R}$ (for $\alpha \geq 0$ we have equality and for $\alpha < 0$ it follows from $\ell(\alpha z) < 0 \leq p_M(\alpha z)$) and $\ell(z) > 1$. By the Hahn–Banach lemma (Lemma 7.1) $\ell$ extends to all of $X$ with $\ell(x) \leq p_M(x)$ for all $x \in X$. This implies that $\ell(y) \leq 1$ for $y \in K \subseteq M$. Moreover, this estimate also gives continuity of $\ell$ since (8.36) implies that

$$\ell(x) \leq \frac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \dots, \|x\|_{\alpha_n}\},$$

and hence

$$|\ell(x)| \leq \frac{2}{\varepsilon} \max\{\|x\|_{\alpha_1}, \dots, \|x\|_{\alpha_n}\}$$

by linearity of $\ell$ and since the right-hand side is a semi-norm. This gives the theorem (for $c = 1$). □

Since the weak topology is, for infinite-dimensional vector spaces, strictly coarser than the norm topology, there is no reason why a set that is closed in the norm topology should be closed in the weak topology. However, for convex sets the situation is better.

Corollary 8.74. Suppose that $K$ is a convex set in a real Banach space $X$. Then the norm and weak closure of $K$ agree. That is, $\overline{K}^{\mathrm{norm}} = \overline{K}^{\mathrm{weak}}$.

Proof. Suppose that $z \notin \overline{K}^{\mathrm{norm}}$ and apply Theorem 8.73 to find a continuous linear functional $\ell \in X^*$ with $\ell(y) \leq c < \ell(z)$ for all $y \in \overline{K}^{\mathrm{norm}}$ and some $c \in \mathbb{R}$. Therefore, $\ell^{-1}((c, \infty)) \subseteq X \setminus K$ is a neighbourhood of $z$ in the weak topology and $z \notin \overline{K}^{\mathrm{weak}}$ follows. Thus $\overline{K}^{\mathrm{weak}} \subseteq \overline{K}^{\mathrm{norm}}$. The reverse inclusion $\overline{K}^{\mathrm{norm}} \subseteq \overline{K}^{\mathrm{weak}}$ is clear as the norm topology is stronger than the weak topology. □

Exercise 8.75. Generalize Corollary 8.74 to complex Banach spaces.

Exercise 8.76. Suppose that $K, L \subseteq X$ are disjoint convex sets in a locally convex vector space $X$ over $\mathbb{R}$. Suppose one of them has non-empty interior. Show that there exists a non-trivial continuous linear functional $\ell$ and a constant $c \in \mathbb{R}$ such that $\ell(x) \leq c \leq \ell(y)$ for all $x \in K$ and $y \in L$.

Exercise 8.77. Let $X$ be a normed vector space over $\mathbb{R}$, and let $K \subseteq X$ be a non-empty closed and convex subset. Show that

$$\inf_{x \in K} \|z - x\| = \sup_{\substack{\ell \in X^* \\ \|\ell\| = 1}} \Bigl(\ell(z) - \sup_{x \in K} \ell(x)\Bigr)$$

for any $z \in X \setminus K$.

Exercise 8.78. Let $X$ be a real Banach space and let $\imath : X \to X^{**}$ be the embedding of $X$ into its bidual as in Corollary 7.9. Show that $\imath\bigl(B_1^X\bigr)$ is dense in $B_1^{X^{**}}$ when $X^{**}$ is equipped with the weak* topology.


8.6.1 Extreme Points and the Krein–Milman Theorem

An important concept for convex sets, both abstractly and for many concrete applications (see, for example, Proposition 8.36), is the notion of extreme points.

Definition 8.79. Let $X$ be a locally convex space and let $K \subseteq X$ be a convex subset. An element $x \in K$ is an extreme point of $K$ if $x$ cannot be expressed as a proper convex combination of points of $K$ (that is, if $x = sy + (1 - s)z$ with $y, z \in K$ and $s \in (0, 1)$ then we must have $x = y = z$).

As illustrated in Figure 8.3, the set of extreme points of a convex set will not be closed in general, even in a finite-dimensional setting. In infinite-dimensional spaces, the situation is more complex still and the extreme points may even be dense (see Exercise 8.84). The smallest closed convex subset of a locally convex space $X$ that contains $A \subseteq X$ is called the closed convex hull of $A$ and is the intersection of all closed convex sets containing $A$, or equivalently the closure of the convex hull of $A$.

Fig. 8.3: The set of extreme points need not be closed: Here two cone-like objects are glued together at their base so that a single straight line connects the two cone points in the resulting convex set. The extreme points are the two ends together with all but one point of the central circle.

Theorem 8.80 (Krein–Milman). Let $X$ be a locally convex space over $\mathbb{R}$, and let $K \subseteq X$ be a compact convex subset. Then $K$ is the closed convex hull of its extreme points. In particular, if $K$ is non-empty then $K$ has some extreme points.

For the proof the following extension of the definition of extreme points will be useful. A subset $E \subseteq K$ of a convex set is called an extremal subset of $K$ if $E$ is convex, non-empty, and if $x = sy + (1 - s)z$ for $x \in E$, $y, z \in K$ and $s \in (0, 1)$ forces $y, z \in E$. To better understand this notion, it may be helpful to the reader to find all extremal subsets of a polygon in $\mathbb{R}^2$ or of a polytope in $\mathbb{R}^3$.

Proof of Theorem 8.80. We assume without loss of generality that $K \neq \emptyset$. The proof uses Zorn's lemma, applied to the set

$$\mathcal{F} = \{E \subseteq K \mid E \text{ is an extremal closed subset of } K\}$$

with the partial order defined by $E_1 < E_2$ if $E_1 \subseteq E_2$. Note in particular that $K \in \mathcal{F}$, so that $\mathcal{F}$ is non-empty. We need to show that for any linearly ordered subset $\{E_\alpha \mid \alpha \in I\}$ there exists an element $E \in \mathcal{F}$ with $E < E_\alpha$ for every $\alpha \in I$. We claim that

$$E = \bigcap_{\alpha \in I} E_\alpha$$

is such an element. For this we only need to show that $E \in \mathcal{F}$, as the fact that $E < E_\alpha$ for every $\alpha \in I$ then follows directly from the definition of $<$.

[…]

We complete the argument by showing that there cannot be such an extreme point of the unit ball. By definition, $|a_n| \leq 1$ for all $n \geq 1$ and $\lim_{n \to \infty} a_n = 0$. Therefore, there exists some $n_0$ with $|a_{n_0}| < \frac{1}{2}$ and then the sequences $(b_n)$ and $(c_n)$ defined by

$$b_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ a_n + \frac{1}{2} & \text{for } n = n_0 \end{cases}
\qquad\text{and}\qquad
c_n = \begin{cases} a_n & \text{for } n \neq n_0, \\ a_n - \frac{1}{2} & \text{for } n = n_0 \end{cases}$$

are different, both belong to the unit ball by construction, and we have $(a_n) = \frac{1}{2}(b_n) + \frac{1}{2}(c_n)$, which shows that $(a_n)$ is not an extreme point.

Exercise 8.82. In Example 8.81 we showed that $V^*$ is never isometrically isomorphic to $c_0(\mathbb{N})$. Generalize the result in two ways as follows. Show that there is no Banach space $V$ with the property that $V^*$ is isomorphic to $C_0(X)$, where $X$ is a σ-compact, locally compact, non-compact space and the isomorphism is only assumed to be a bounded operator with a bounded inverse.

Exercise 8.83. Let $X$ be a σ-compact, locally compact metric space.
(a) Find the extreme points of the closed unit ball $B_1^{M(X)}$ in the space of signed measures on $X$, and the extreme points of the convex set $P(X)$ of all probability measures on $X$.
(b) Assume in addition that $X$ is compact and infinite. Show that the assumptions of the Krein–Milman theorem (Theorem 8.80) hold, but that $P(X)$ is not the convex hull of its extreme points. In other words, taking the closure of the convex hull is important in infinite dimensions.
(c) Assume now instead that $X$ is non-compact. Show that the conclusion of the Krein–Milman theorem (Theorem 8.80) holds for $P(X)$ (despite the fact that the assumptions do not).

† This assumes we are working over $\mathbb{R}$, but as mentioned before, the arguments extend easily to $\mathbb{C}$.

In many applications where convex subsets of Banach spaces or locally convex spaces appear the extreme points play a special role. One instance of this arose in our brief excursion into ergodic theory (see Section 8.2.1), where the ergodic measures (which may now be seen to exist in great generality due to the Krein–Milman theorem) are precisely the extreme points of the convex set of invariant probability measures. The following example (or rather part (c) of it) shows how badly intuition can fail for convex sets in infinite dimensions.

Exercise 8.84. Let $K = \{f \in C_{\mathbb{R}}([0, 1]) \mid f(0) = 0,\ f \text{ is 1-Lipschitz}\}$.
(a) Show that $K$ is convex and compact in the norm topology.
(b) Show that any function $f \in K$ which is piecewise linear and has slope $\pm 1$ wherever $f$ is differentiable is an extreme point.
(c) Show that the extreme points in $K$ are dense in $K$.
(d) Describe all the extreme points of $K$.

8.6.2 Choquet's Theorem

We now further refine the Krein–Milman theorem by showing that every point of a compact convex set $K$ can be obtained as a 'generalized convex combination' of extreme points of $K$. However, even after taking account of convergence questions, convex combinations alone will not be sufficiently general, as the next example shows.

Exercise 8.85. Let $X$ and $P(X) \subseteq C(X)^*$ be as in Exercise 8.83. Describe the elements of $P(X)$ that can be written as a convergent (in norm, or equivalently in the weak* topology) convex combination $\sum_{n=1}^{\infty} c_n \nu_n$ of extreme points $\nu_n \in P(X)$ with $c_n \geq 0$ for all $n \geq 1$ and $\sum_{n=1}^{\infty} c_n = 1$. Now let $X = [0, 1]$ and give examples of Borel probability measures that cannot be obtained as such limits.

Definition 8.86. Let $K \subseteq X$ be a compact convex subset of a locally convex vector space, and suppose that the induced topology on $K$ is metrizable. Let $\mu$ be a Borel probability measure on $K$ and $x \in K$. We say that $\mu$ represents $x$ (or that $x$ is the barycentre of $\mu$) if

$$\ell(x) = \int_K \ell\, d\mu$$

for every $\ell \in X^*$.

Notice that each ℓ ∈ X ∗ is continuous on K and hence is integrable with respect to any µ on K as in the definition above. Essential Exercise 8.87. Show that the barycentre of a Borel probability measure µ on a metrizable compact convex subset is uniquely determined by µ.


Throughout the discussion of this subsection we will assume that the induced topology on $K \subseteq X$ is metrizable, writing simply (as above) that $K$ is a metrizable subset of $X$. With Proposition 8.11 and Exercise 8.12 in mind, it should be clear why we do not wish to assume that $X$ itself is metrizable.

Lemma 8.88. Let $K$ be a metrizable compact convex subset of a locally convex vector space $X$ over $\mathbb{R}$. Let $\mu$ be a probability measure on $K$. Then $\mu$ has a barycentre in $K$.

Proof. For every $\ell \in X^*$ we define the closed hyperplane

$$H_\ell = \Bigl\{x \in X \mid \ell(x) = \int_K \ell\, d\mu\Bigr\}.$$

Notice that for a fixed $\ell \in X^*$, this hyperplane is not empty since $\ell$ is linear. The lemma is equivalent to the statement that

$$K \cap \bigcap_{\ell \in X^*} H_\ell \neq \emptyset.$$

However, since $K$ is compact and the sets $H_\ell$ are closed, it is sufficient to show that

$$K \cap H_{\ell_1} \cap \dots \cap H_{\ell_n} \neq \emptyset \tag{8.37}$$

for any $\ell_1, \dots, \ell_n \in X^*$ and $n \geq 1$. For this, consider the continuous linear map

$$L : X \longrightarrow \mathbb{R}^n, \qquad x \longmapsto (\ell_1(x), \dots, \ell_n(x))$$

and the compact convex set $L(K) \subseteq \mathbb{R}^n$. We claim that

$$b = \Bigl(\int_K \ell_1\, d\mu, \dots, \int_K \ell_n\, d\mu\Bigr) \in L(K), \tag{8.38}$$

and note that this is equivalent to (8.37). Hence the claim implies the lemma. Suppose therefore that (8.38) does not hold. Then by Theorem 8.73 (applied to $L(K) \subseteq \mathbb{R}^n$) there exists a functional $\varphi$ defined by $\varphi(t) = \sum_{j=1}^{n} a_j t_j$ for $t \in \mathbb{R}^n$ and some row vector $a \in \mathbb{R}^n$ such that

$$\varphi(b) > \sup_{x \in K} \varphi(L(x)).$$

Defining $\ell^* = \sum_{j=1}^{n} a_j \ell_j = \varphi \circ L \in X^*$ we now obtain

$$\int_K \ell^*\, d\mu = \sum_{j=1}^{n} a_j \int_K \ell_j\, d\mu = \varphi(b) > \sup_{x \in K} \varphi(L(x)) = \sup_{x \in K} \ell^*(x),$$

which gives a contradiction since $\mu$ is assumed to be a probability measure. □

Exercise 8.89. Let $K \subseteq X$ be a metrizable compact convex subset of a locally convex vector space over $\mathbb{R}$, and let $M \subseteq K$ be a closed subset. Prove that the closure of the convex hull of $M$ is precisely the set of all barycentres of all Borel probability measures $\mu$ with $\operatorname{Supp} \mu \subseteq M$.
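For a finitely supported probability measure in $\mathbb{R}^2$ the barycentre of Definition 8.86 is simply the weighted average of the support points, and the defining identity $\ell(x) = \int_K \ell\, d\mu$ can be checked directly. The following sketch does this in exact rational arithmetic; the sample points, weights and function names are assumptions made for illustration only.

```python
from fractions import Fraction


def barycentre(points, weights):
    """Barycentre of the measure sum_i weights[i] * delta_{points[i]}."""
    assert sum(weights) == 1, "the weights must define a probability measure"
    dim = len(points[0])
    return tuple(sum(w * p[i] for p, w in zip(points, weights)) for i in range(dim))


def integrate_functional(coeffs, points, weights):
    """Integral of the linear functional l(x) = sum_i coeffs[i] * x[i] against the measure."""
    return sum(w * sum(c * pi for c, pi in zip(coeffs, p)) for p, w in zip(points, weights))


# A measure supported on the corners of the unit square (a compact convex set).
corners = [(0, 0), (1, 0), (0, 1), (1, 1)]
masses = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
centre = barycentre(corners, masses)

# l(x) = 2 x_1 - 3 x_2 evaluated at the barycentre and integrated against the measure.
ell = (2, -3)
value_at_centre = sum(c * xi for c, xi in zip(ell, centre))
integral = integrate_functional(ell, corners, masses)
```

Here `value_at_centre` and `integral` coincide, which is exactly the statement that the measure represents `centre`.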

Theorem 8.90 (Choquet's theorem). Let $K \subseteq X$ be a metrizable compact convex subset of a locally convex space $X$ over $\mathbb{R}$. Then the set of extreme points $\operatorname{ext}(K)$ of $K$ is Borel measurable. Moreover, for any $x_0 \in K$ there exists a Borel probability measure $\mu$ on $K$ with $\mu\bigl(\operatorname{ext}(K)\bigr) = 1$ that represents $x_0$.

While the issues arising in the proof are functional-analytic, the intuition behind the statement is essentially geometric, which is more visible in a finite-dimensional version illustrated in the next exercise.

Exercise 8.91 (Carathéodory's form of Minkowski's theorem). Let $K \subseteq \mathbb{R}^n$ be a compact convex subset. Show that any point $x_0 \in K$ is a convex combination of $(n + 1)$ extreme points of $K$.
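The planar case of the finite-dimensional statement can be made concrete. The sketch below writes a point of a convex polygon as a convex combination of at most three vertices by splitting the polygon into a fan of triangles and computing barycentric coordinates. It assumes the vertices are listed in counter-clockwise order with no three of them collinear; the function names are hypothetical, and this is only one of several ways to find such a combination.

```python
from fractions import Fraction


def barycentric(p, a, b, c):
    """Coordinates (wa, wb, wc) with p = wa*a + wb*b + wc*c and wa + wb + wc = 1."""
    det = (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])
    wb = Fraction((p[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (p[1] - a[1]), det)
    wc = Fraction((b[0] - a[0]) * (p[1] - a[1]) - (p[0] - a[0]) * (b[1] - a[1]), det)
    return (1 - wb - wc, wb, wc)


def caratheodory_2d(p, polygon):
    """Convex combination of three polygon vertices equal to p, or None if p is outside."""
    a = polygon[0]
    for b, c in zip(polygon[1:], polygon[2:]):   # fan of triangles based at a
        weights = barycentric(p, a, b, c)
        if all(w >= 0 for w in weights):
            return list(zip((a, b, c), weights))
    return None


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
point = (1, Fraction(3, 2))
combination = caratheodory_2d(point, square)
```

The returned list pairs each chosen vertex with its weight; the weights are non-negative, sum to one, and reproduce `point`, matching the bound of $n + 1 = 3$ extreme points in dimension $n = 2$.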

For the proof of Theorem 8.90 we will need the following lemma and some notation. We write $A$ for the space of affine functions on $X$, that is, functions of the form $a(x) = \ell(x) + c$ for some $\ell \in X^*$ and $c \in \mathbb{R}$. Moreover, recall that $\|f\|_{K,\infty}$ denotes the supremum norm of a function $f$ restricted to some subset $K \subseteq X$.

Lemma 8.92 (Upper envelope). Let $K \subseteq X$ be as in Theorem 8.90. Given a bounded function $f : K \to \mathbb{R}$ we define the upper envelope of $f$ by

$$\overline{f}(x) = \inf\{a(x) \mid a \in A \text{ and } a \geq f \text{ on } K\}$$

for all $x \in K$. Then
(1) $f \leq \overline{f} \leq \|f\|_{K,\infty}$.
(2) $\overline{f}$ is concave (that is, $\overline{f}(\lambda x + (1 - \lambda)y) \geq \lambda \overline{f}(x) + (1 - \lambda)\overline{f}(y)$ for all $x, y$ in $K$ and $\lambda$ with $0 \leq \lambda \leq 1$).
(3) $\overline{f}$ is upper-semicontinuous (that is, the pre-image $\overline{f}^{-1}((-\infty, c))$ is open for every $c \in \mathbb{R}$), and so in particular Borel measurable.
(4) If $f$ is concave and upper-semicontinuous then $\overline{f} = f$.
(5) Given $r \geq 0$, $a \in A$, and another bounded function $g$ on $K$, we have:
(a) $\overline{rf} = r\overline{f}$,
(b) $\overline{f + a} = \overline{f} + a$,
(c) $\overline{f + g} \leq \overline{f} + \overline{g}$, and
(d) $|\overline{f} - \overline{g}| \leq \|f - g\|_{K,\infty}$.
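On an interval the upper envelope is the least concave majorant, which can be approximated on a grid by an upper convex hull computation. The sketch below does this for sampled values in exact rational arithmetic; it is a discretized illustration only (it computes the envelope of the sampled points, which for a continuous function approximates the envelope on the interval), and the name `upper_envelope` and the sample functions are assumptions.

```python
from fractions import Fraction


def upper_envelope(xs, fs):
    """Least concave majorant of the points (xs[i], fs[i]), evaluated at xs.

    Assumes xs is strictly increasing; uses an upper convex hull scan.
    """
    env = list(fs)
    hull = []                                   # indices of hull vertices, left to right
    for i in range(len(xs)):
        while len(hull) >= 2:
            i0, i1 = hull[-2], hull[-1]
            cross = ((xs[i1] - xs[i0]) * (fs[i] - fs[i0])
                     - (fs[i1] - fs[i0]) * (xs[i] - xs[i0]))
            if cross >= 0:                      # vertex i1 lies on or below the chord i0 -> i
                hull.pop()
            else:
                break
        hull.append(i)
    for j0, j1 in zip(hull, hull[1:]):          # interpolate linearly between hull vertices
        for i in range(j0, j1 + 1):
            t = (xs[i] - xs[j0]) / (xs[j1] - xs[j0])
            env[i] = (1 - t) * fs[j0] + t * fs[j1]
    return env


grid = [Fraction(i, 50) for i in range(-50, 51)]      # sample points in K = [-1, 1]
square = [x * x for x in grid]                        # a strictly convex function
cubic = [x ** 3 - x for x in grid]                    # neither convex nor concave
affine = [2 * x + 1 for x in grid]

env_square = upper_envelope(grid, square)
env_cubic = upper_envelope(grid, cubic)
env_shifted = upper_envelope(grid, [f + a for f, a in zip(cubic, affine)])
```

For the strictly convex sample `square` the envelope is the constant 1 and touches the function only at the two endpoints of the interval, while `env_shifted` equals `env_cubic` plus the affine function, in line with property (5b).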


Proof. Since the constant function $\|f\|_{K,\infty}$ belongs to $A$, (1) follows at once from the definition of $\overline{f}$. Given $x, y \in K$, $\lambda \in [0, 1]$ and $a \in A$ with $a \geq f$ on $K$ we see that

$$a\bigl(\lambda x + (1 - \lambda)y\bigr) = \lambda a(x) + (1 - \lambda)a(y) \geq \lambda \overline{f}(x) + (1 - \lambda)\overline{f}(y)$$

by definition of $\overline{f}(x)$ and $\overline{f}(y)$. The claim in (2) follows by taking the infimum over $a$.

For (3), let $c \in \mathbb{R}$ and $x_0 \in K$ such that $\overline{f}(x_0) < c$. Then there exists some $a \in A$ with $a \geq f$ and $a(x_0) < c$. Clearly

$$\{x \in K \mid a(x) < c\} \subseteq \{x \in K \mid \overline{f}(x) < c\}$$

and, since $a$ is continuous, the former is an open neighbourhood of $x_0$. Since $x_0$ was an arbitrary element of $\{x \in K \mid \overline{f}(x) < c\}$ we obtain (3).

For (4), assume that $f$ is concave and upper-semicontinuous. We claim that this implies that the set

$$M = \{(x, c) \in K \times \mathbb{R} \mid c \leq f(x)\}$$

of points on or underneath the graph of $f$ is convex and closed. Here we consider $M$ as a subset of $X' = X \times \mathbb{R}$, which is the locally convex vector space with the collection of semi-norms defined by $\|(x, c)\|'_\alpha = \max\{\|x\|_\alpha, |c|\}$ for $(x, c) \in X'$ and every semi-norm $\|\cdot\|_\alpha$ of the locally convex vector space $X$. To see that $M$ is convex, fix $(x_1, c_1), (x_2, c_2) \in M$ and $\lambda \in [0, 1]$. Then we have $c_1 \leq f(x_1)$, $c_2 \leq f(x_2)$, and

$$\lambda c_1 + (1 - \lambda)c_2 \leq \lambda f(x_1) + (1 - \lambda)f(x_2) \leq f(\lambda x_1 + (1 - \lambda)x_2),$$

which shows that $\lambda(x_1, c_1) + (1 - \lambda)(x_2, c_2) \in M$. Notice that in order to show that $M$ is closed in $X'$ it is enough to show that $M$ is closed in $K \times \mathbb{R}$, since the latter is closed in $X'$. So suppose that $(x_0, c_0) \in (K \times \mathbb{R}) \setminus M$. By definition, this means that $f(x_0) < c_0$, so we can choose $c \in (f(x_0), c_0)$, and apply the definition of upper-semicontinuity to see that $f^{-1}((-\infty, c)) \times (c, \infty)$ is open in $K \times \mathbb{R}$, contains $(x_0, c_0)$, and does not intersect $M$. This means that $M$ is closed in $K \times \mathbb{R}$ and hence in $X'$.

Now let $x_0 \in K$ and $f(x_0) < c_0$, so that $z = (x_0, c_0) \notin M$. Applying Theorem 8.73 we find some functional $\ell' \in (X')^*$ with

$$\ell'((x, c)) < \ell'((x_0, c_0))$$

for all $(x, c) \in M$. Clearly $\ell'((x, c)) = \ell(x) + \alpha c$ for some $\ell \in X^*$ and $\alpha \in \mathbb{R}$. Since for $x_0$ we have $(x_0, f(x_0)) \in M$, $f(x_0) < c_0$ and

$$\ell(x_0) + \alpha f(x_0) < \ell(x_0) + \alpha c_0,$$

we see that $\alpha > 0$. Dividing $\ell'$ by $\alpha$ we may thus assume that $\alpha = 1$. Using the points $(x, f(x)) \in M$ for $x \in K$ we obtain

$$\ell(x) + f(x) < \ell(x_0) + c_0$$

for all $x \in K$. Then $a(x) = -\ell(x) + c_0 + \ell(x_0)$ defines a function $a \in A$ with $f(x) < a(x)$ for all $x \in K$, and so $\overline{f}(x_0) \leq a(x_0) = c_0$. Since $x_0 \in K$ and $c_0 > f(x_0)$ were arbitrary we deduce that $\overline{f} = f$ as required.

Now let $r \geq 0$, $a \in A$ and $g$ be as in (5). For $r = 0$ the statement is clear. For $r > 0$ and a function $a \in A$ on $X$ we have $a \geq f$ if and only if $ra \geq rf$. Therefore (a) follows from standard properties of the infimum. Next notice that $a_f \geq f$, $a_g \geq g$ and $a_f, a_g \in A$ implies that $A \ni a_f + a_g \geq f + g$ as in the definition of $\overline{f + g}$. Hence $a_f + a_g \geq \overline{f + g}$, and taking the infimum over $a_f$ and $a_g$ separately gives $\overline{f} + \overline{g} \geq \overline{f + g}$, which is (c). Applying this for $g = a \in A$ and using $\overline{a} = a$ and $\overline{-a} = -a$, we see that

$$\overline{f + a} \leq \overline{f} + \overline{a} = \overline{f} + a = \overline{f + a - a} + a \leq \overline{f + a} + \overline{-a} + a = \overline{f + a},$$

and so $\overline{f + a} = \overline{f} + a$, giving (b). The proof of (d) is similar: In fact

$$\overline{f} = \overline{f - g + g} \leq \overline{f - g} + \overline{g}$$

and thus $\overline{f} - \overline{g} \leq \overline{f - g} \leq \|f - g\|_{K,\infty}$ by (1). Reversing the roles of $f$ and $g$ gives $|\overline{f} - \overline{g}| \leq \|f - g\|_{K,\infty}$ and hence the lemma. □

Proof of Theorem 8.90. By Lemma 2.46, $C(K)$ is separable and so the same applies to the subspace $A \subseteq C(K)$ of affine functions (when restricted to $K$ and using the supremum norm $\|\cdot\|_{K,\infty}$ on $K$). Hence we may choose a dense countable subset $D = \{a_n \mid n \in \mathbb{N}\} \subseteq A$ with $\overline{D} = \overline{A} \subseteq C(K)$. Notice that $D$ is still dense if we remove from it any (potentially contained) $a_n$ with $a_n|_K = 0$, so we may assume that $\|a_n\|_{K,\infty} \neq 0$ for all $n \in \mathbb{N}$. Moreover, since $X^*$ separates points by Theorem 8.73, we know that $D$ separates points in $K$. Now define the function

$$F = \sum_{n=1}^{\infty} \frac{1}{2^n \|a_n\|_{K,\infty}^2}\, a_n^2,$$

whose properties are crucial for the proof of the theorem. Since the series converges uniformly on $K$, we see that $F \in C(K)$. We claim that $F$ is strictly convex, meaning that

$$F(\lambda x + (1 - \lambda)y) < \lambda F(x) + (1 - \lambda)F(y) \tag{8.39}$$

for all $x \neq y \in K$ and $\lambda \in (0, 1)$. To see this, note that

$$a_n(\lambda x + (1 - \lambda)y) = \lambda a_n(x) + (1 - \lambda)a_n(y)$$

for any $n \in \mathbb{N}$, $x \neq y \in K$ and $\lambda \in (0, 1)$ so that

$$a_n(\lambda x + (1 - \lambda)y)^2 \leq \lambda a_n(x)^2 + (1 - \lambda)a_n(y)^2 \tag{8.40}$$

by convexity of the map $t \mapsto t^2$. Also, since $D$ separates points we have $a_{n_0}(x) \neq a_{n_0}(y)$ for some $n_0 \in \mathbb{N}$ and hence a strict inequality in (8.40) for this choice of $n = n_0$ (by strict convexity of $t \mapsto t^2$). Summing over $n$ gives (8.39).

We now fix $x_0 \in K$, which we wish to represent. Using $x_0$ we define the subspace

$$V = A + \mathbb{R}F \subseteq C(K),$$

the linear functional

$$\Lambda(a + cF) = a(x_0) + c\overline{F}(x_0)$$

for any $a + cF \in V$, and the function $p$ defined by $p(f) = \overline{f}(x_0)$ for all $f$ in $C(K)$. By Lemma 8.92(5), the function $p$ is norm-like, as required in the Hahn–Banach lemma (Lemma 7.1). Also, by Lemma 8.92(5) we have

$$\Lambda(a + cF) = a(x_0) + c\overline{F}(x_0) = \overline{a + cF}(x_0) = p(a + cF)$$

whenever $c \geq 0$. For $c < 0$ the function $cF$ is concave and continuous, so that $\overline{cF} = cF$ by Lemma 8.92(4). Using also the inequality in Lemma 8.92(1) (multiplied by $c < 0$) and Lemma 8.92(5b) we now see that

$$\Lambda(a + cF) = a(x_0) + c\overline{F}(x_0) \leq a(x_0) + cF(x_0) = a(x_0) + \overline{cF}(x_0) = \overline{a + cF}(x_0) = p(a + cF)$$

also for $c < 0$. Hence all the assumptions of the Hahn–Banach lemma are satisfied and so there is an extension of $\Lambda$ (which we again denote by $\Lambda$) to all of $C(K)$ satisfying

$$\Lambda(f) \leq p(f) = \overline{f}(x_0) \leq \|f\|_{K,\infty}$$

for all $f \in C(K)$ (where the last inequality is given by Lemma 8.92(1)). For a non-negative function $f \in C(K)$ we also have $\overline{-f} \leq 0$ and so $\Lambda(f) \geq 0$. Since $1 \in A \subseteq V$ we have $\Lambda(1) = 1(x_0) = 1$ by definition of $\Lambda$. Therefore we may apply the Riesz representation theorem (Theorem 7.44) to obtain a Borel probability measure $\mu$ on $K$ with

$$\Lambda(f) = \int_K f\, d\mu \tag{8.41}$$

for all $f \in C(K)$. Applying $\Lambda$ to linear functionals $\ell \in X^* \subseteq A$ we see that

$$\int_K \ell\, d\mu = \Lambda(\ell) = \ell(x_0),$$

so $x_0$ is the barycentre of the probability measure $\mu$. It remains to see that $\operatorname{ext}(K)$ is measurable and that $\mu\bigl(\operatorname{ext}(K)\bigr) = 1$. For this, we claim that

$$\int_K F\, d\mu = \overline{F}(x_0) = \int_K \overline{F}\, d\mu. \tag{8.42}$$

The first of these equations follows directly by applying the functional to $F$ since $\mu$ satisfies (8.41) and $\Lambda(F)$ was defined as $\overline{F}(x_0)$. The proof of the second equality is less direct: By Lemma 8.92(1) we have $F \leq \overline{F}$ and hence

$$\int_K F\, d\mu \leq \int_K \overline{F}\, d\mu.$$

On the other hand $a \in A$ and $a \geq F$ implies that $a \geq \overline{F}$ and therefore

$$\int_K \overline{F}\, d\mu \leq \int_K a\, d\mu = \Lambda(a) = a(x_0).$$

Taking the infimum over these $a \in A$ we get

$$\int_K \overline{F}\, d\mu \leq \overline{F}(x_0)$$

by definition of $\overline{F}$. However, since the first equality in (8.42) is already proven these inequalities together prove (8.42). In particular, we see from (8.42) and the inequality $F \leq \overline{F}$ that

$$\mu\bigl(\{x \in K \mid F(x) < \overline{F}(x)\}\bigr) = 0. \tag{8.43}$$

We claim that

$$\operatorname{ext}(K) \supseteq \{x \in K \mid F(x) = \overline{F}(x)\}. \tag{8.44}$$

To see this, let $z = \lambda x + (1 - \lambda)y \in K$ be a non-extreme point (that is, with $x \neq y \in K$ and $\lambda \in (0, 1)$). Since $F$ is strictly convex, this gives

$$F(z) < \lambda F(x) + (1 - \lambda)F(y) \leq \lambda \overline{F}(x) + (1 - \lambda)\overline{F}(y) \leq \overline{F}(z)$$

by Lemma 8.92(1) and the concavity of $\overline{F}$ from Lemma 8.92(2). Hence $F(z) < \overline{F}(z)$, and the claim follows. Note that once we have shown the measurability of $\operatorname{ext}(K)$, this claim implies that $\mu\bigl(K \setminus \operatorname{ext}(K)\bigr) = 0$ by (8.43).

To see that $\operatorname{ext}(K)$ is measurable, let $d$ denote a metric on $K$ that induces the topology on $K$ and notice that

$$K \setminus \operatorname{ext}(K) = \{\lambda x + (1 - \lambda)y \mid \lambda \in (0, 1),\ x \neq y \in K\} = \bigcup_{n \in \mathbb{N}} F_n,$$

where

$$F_n = \bigl\{\lambda x + (1 - \lambda)y \mid \lambda \in [\tfrac{1}{n}, 1 - \tfrac{1}{n}];\ x, y \in K \text{ with } d(x, y) \geq \tfrac{1}{n}\bigr\},$$

is a countable union of sets $F_n$ each of which is compact since $F_n$ is a continuous image of the compact set

$$[\tfrac{1}{n}, 1 - \tfrac{1}{n}] \times \{(x, y) \in K^2 \mid d(x, y) \geq \tfrac{1}{n}\}$$

for each $n \geq 1$. Therefore, $\mu(K \setminus \operatorname{ext}(K)) = 0$ by (8.43) and (8.44) and $\mu$ represents $x_0$ by the argument after (8.41). This proves the theorem. □

Exercise 8.93. Let $K$ be a metrizable compact convex subset of a locally convex vector space over $\mathbb{R}$, and let $x_0 \in K$. Show that $x_0$ is an extreme point if and only if $\mu = \delta_{x_0}$ is the only Borel probability measure on $K$ that represents $x_0$.
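The objects in the proof can be written down explicitly in the simplest case $K = [-1, 1] \subseteq \mathbb{R}$, where $\operatorname{ext}(K) = \{-1, 1\}$. The sketch below uses the simplified strictly convex function $F(x) = x^2$ (an assumption made for illustration, in place of the series used in the proof), whose upper envelope on $K$ is the constant 1, and checks that the two-point measure with weights $(1 - x_0)/2$ and $(1 + x_0)/2$ represents $x_0$ and satisfies the analogue of (8.42). All names are hypothetical.

```python
from fractions import Fraction


def representing_measure(x0):
    """Weights of the probability measure on ext([-1, 1]) = {-1, 1} with barycentre x0."""
    x0 = Fraction(x0)
    return {-1: (1 - x0) / 2, 1: (1 + x0) / 2}


def integrate(g, measure):
    """Integral of g against a finitely supported measure given as {point: mass}."""
    return sum(mass * g(point) for point, mass in measure.items())


x0 = Fraction(1, 3)
mu = representing_measure(x0)

mean = integrate(lambda x: x, mu)                  # barycentre of mu
affine_integral = integrate(lambda x: 5 * x - 2, mu)
F_integral = integrate(lambda x: x * x, mu)        # integral of F(x) = x^2
```

Here `mean` equals `x0`, affine functions integrate to their value at `x0`, and `F_integral` equals 1, the value of the upper envelope of $x^2$ at every point of $K$, while $F(x_0) < 1$ for the non-extreme point $x_0$.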

The material above follows the monograph of Phelps [86] loosely. We refer to those notes for many interesting applications of Choquet’s theorem as well as the generalization of this result to the case of general compact convex sets in locally convex vector spaces (without the metrizability assumption) in the form of the Choquet–Bishop–de Leeuw theorem.

8.7 Further Topics

Many proofs and theories depend on weak* compactness, the notion of locally convex vector spaces, or the study of extreme points of convex subsets. We only mention a few samples and give further references.

• Decay of Matrix Coefficients for Simple Lie Groups (the Howe–Moore Theorem): If a simple non-compact Lie group $G$ acts unitarily on a Hilbert space $H$ without non-zero $G$-fixed vectors, then the matrix coefficients $\langle \pi_g v, w\rangle$ decay to zero as $g \to \infty$ in $G$, for any $v, w \in H$. In the language of ergodic theory, this means that every measure-preserving ergodic $G$-action is mixing. This may sound complicated but the proof for $\mathrm{SL}_d(\mathbb{R})$ only needs as inputs the equality case of the Cauchy–Schwarz inequality, the Banach–Alaoglu theorem, and matrix multiplication. We refer to [27, Sec. 11.4] for a discussion of the easier case $G = \mathrm{SL}_2(\mathbb{R})$, and to [25] for the general case. The weak* compactness is here used on the Hilbert space $H$.
• In the study of von Neumann algebras two more topologies on $B(X, Y)$ are used (particularly in the case where $X = Y$ is a Hilbert space): the ultra-strong operator topology and the ultra-weak operator topology. We refer to von Neumann [79] for the original formulation of the ultra-strong topology, and to the monograph of Takesaki [101, Ch. II] for a full treatment.
• As discussed in Section 8.5 the locally convex vector space $C_c^\infty(U)$, also written $D(U)$, is the space of test functions for distributions on $U \subseteq \mathbb{R}^d$ in the sense that one can define distributions as continuous linear functions on $D(U)$. We refer to Folland [33, Ch. 9] for a treatment of distributions.
• The Fréchet space $\mathcal{S}(\mathbb{R}^d)$ of Schwartz functions on $\mathbb{R}^d$ has important connections to Fourier transforms, and is the space of test functions for tempered distributions on $\mathbb{R}^d$ (see Section 9.2.3 and Folland [33, Ch. 9]).
• There are further general classes of locally convex vector spaces. Among them are the nuclear spaces ($C^\infty([0, 1])$, $C_b^\infty(U)$ and $\mathcal{S}(\mathbb{R}^d)$ are examples) and the LF-spaces (these are strict inductive limits of Fréchet spaces; examples include $C_c(U)$ and $C_c^\infty(U)$). We refer to Bourbaki [13] or Trèves [106] for more details.
• For a topological group $G$ we will define in Section 9.3.1 the notion of a positive-definite function. If $G$ is locally compact and abelian, the extreme points of the set of positive-definite functions (properly normalized) will be the set of characters of $G$. For more general groups the extreme points give rise to the irreducible unitary representations of the given group $G$ (see Exercises 9.55 and 12.59).

Chapter 9

Unitary Operators and Flows, Fourier Transform

In this chapter we return to the topic of spectral theory by considering unitary operators. Moreover, we generalize our discussion of Fourier series by considering the Fourier transform for functions on Rd and use it to obtain the spectral theory of unitary flows (unitary representations of R or Rd ). As is natural in discussions concerning spectral theory (which will generalize eigenvalues and eigenvectors) we will only consider separable complex Hilbert spaces in this chapter.

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_9

9.1 Spectral Theory of Unitary Operators

Let $H_1, H_2$ be Hilbert spaces. Recall that a linear operator $U : H_1 \to H_2$ is said to be unitary if $U$ is surjective and $\|Uv\|_{H_2} = \|v\|_{H_1}$ for all $v \in H_1$ (or, equivalently, if $U^* = U^{-1}$). Operators $V_1 : H_1 \to H_1$ and $V_2 : H_2 \to H_2$ on Hilbert spaces are said to be unitarily isomorphic if there is a unitary operator $U : H_1 \to H_2$ with $U V_1 = V_2 U$. In contrast to the spectral theory of compact self-adjoint operators, it is not in general true that unitary operators on a Hilbert space are diagonalizable (that is, unitarily isomorphic to a diagonal operator). This may be seen in the next model example.

Example 9.1. Let $\mu$ be a finite measure on $\mathbb{T}$, let $H = L^2_\mu(\mathbb{T}) = L^2(\mathbb{T}, \mu)$ and write again $\chi_1(x) = e^{2\pi i x}$ for $x \in \mathbb{T}$. Define the unitary multiplication operator
$$U = M_{\chi_1} : H \ni f \longmapsto M_{\chi_1}(f) = \chi_1 f \in H,$$
which gives a special case of Exercise 6.25(b). Note that any eigenvalue of a unitary operator like $U$ must have absolute value one. Moreover, this unitary operator $U$ has $\lambda = e^{2\pi i x_0}$ as an eigenvalue if and only if $f = \mathbb{1}_{\{x_0\}}$ is non-zero as an element of $H = L^2_\mu(\mathbb{T})$. That is, $\lambda = e^{2\pi i x_0}$ is an eigenvalue if and only if $x_0$ is an atom of $\mu$, meaning that $\mu(\{x_0\}) > 0$. Moreover, $U$ is diagonalizable if and only if $\mu$ is atomic, meaning that $\mu = \sum_{k=1}^{\infty} c_k \delta_{x_k}$ for some $c_k > 0$ and $x_k \in \mathbb{T}$.

Operators of the type seen in Example 9.1 are not difficult to deal with even though they are usually not diagonalizable. Having abandoned the false hope that all unitary operators will be diagonalizable (that is, describable ultimately in terms of only countably many scalar multiplications on the ground field), the next best hope one might have is that they can be fully described in terms of multiplication by characters as in Example 9.1 (at the expense of allowing the underlying measure $\mu$ to vary). That this is in fact true is the content of the spectral theory of unitary operators.

Theorem 9.2 (Spectral theory of unitary operators). Let $H$ be a separable complex Hilbert space and let $U : H \to H$ be a unitary operator. Then $H$ can be split into a countable direct sum
$$H = \bigoplus_{n \geq 1} H_n$$
of closed mutually orthogonal subspaces $H_n$, invariant under $U$ and $U^*$, such that for each $n \geq 1$ the unitary operator $U_n = U|_{H_n} : H_n \to H_n$ is unitarily isomorphic to the multiplication operator $M_{\chi_1} : L^2(\mathbb{T}, \mu_n) \to L^2(\mathbb{T}, \mu_n)$ for some finite measure $\mu_n$ on $\mathbb{T}$.

We will add some more information on the emerging sequence of measures in Section 9.1.3.

9.1.1 Herglotz’s Theorem for Positive-Definite Sequences

Although this is not immediately apparent, a useful concept for the proof of Theorem 9.2 is the notion of a positive-definite sequence.

Definition 9.3. A sequence $(p_n)_{n \in \mathbb{Z}}$ of complex numbers is called positive-definite if for any finite sequence $(c_n)_{n \in \mathbb{Z}} \in c_c(\mathbb{Z})$ of complex numbers we have
$$\sum_{m,n \in \mathbb{Z}} c_m \overline{c_n}\, p_{m-n} \geq 0,$$

meaning that the sum is real and non-negative. It is not obvious that non-trivial positive-definite sequences exist at all. There are two ways to construct examples.

Example 9.4 (First basic construction). Let $\mu$ be a finite measure on $\mathbb{T}$. Then the Fourier coefficients $p_n(\mu)$ of $\mu$ defined by
$$p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu$$
for $n \in \mathbb{Z}$ form a positive-definite sequence.

Example 9.5 (Second basic construction). Let $U : H \to H$ be a unitary operator on a Hilbert space, and fix some $v \in H$. Then the inner products $p_n(v) = \langle U^n v, v\rangle_H$ for $n \in \mathbb{Z}$ form a positive-definite sequence.

Both these claims require justification.

Proof of statement in Examples 9.4 and 9.5. Notice first that Example 9.4 is a special case of Example 9.5. Indeed, if $H = L^2(\mathbb{T}, \mu)$, $U = M_{\chi_1}$, and $v = \mathbb{1}_{\mathbb{T}} = 1$ is the constant function, then $U^n v = \chi_n$ and so
$$p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu = \langle U^n v, v\rangle_{L^2(\mathbb{T},\mu)} = p_n(v)$$

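for all $n \in \mathbb{Z}$. Thus it is enough to consider the sequence $(p_n(v))$ from Example 9.5. Let $(c_n)$ be a finite complex sequence as in Definition 9.3. Then
$$\sum_{m,n \in \mathbb{Z}} c_m \overline{c_n}\, p_{m-n}(v) = \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \langle U^m v, U^n v\rangle_H = \Bigl\langle \sum_{m \in \mathbb{Z}} c_m U^m v, \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\rangle_H \geq 0,$$
since the inner product is positive-definite.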
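The claim of Example 9.4 can be sanity-checked numerically in the simplest case of an atomic measure. The following is only a sketch under stated assumptions: the atoms, the weights, the helper names `fourier_coefficient` and `quadratic_form`, and the random test sequences are all hypothetical choices made for illustration and do not come from the text.

```python
import cmath
import random

def fourier_coefficient(n, atoms):
    """p_n(mu) = integral of chi_n d(mu) for the atomic measure mu = sum_k c_k * delta_{x_k}."""
    return sum(c * cmath.exp(2j * cmath.pi * n * x) for x, c in atoms)

def quadratic_form(coeffs, p):
    """Evaluate sum_{m,n} c_m * conj(c_n) * p(m - n) for a finitely supported sequence.

    `coeffs` maps an integer index to a complex coefficient.
    """
    return sum(cm * cn.conjugate() * p(m - n)
               for m, cm in coeffs.items()
               for n, cn in coeffs.items())

# Hypothetical example data: three atoms (position in [0, 1), positive weight).
atoms = [(0.1, 0.5), (0.35, 1.25), (0.8, 2.0)]
p = lambda n: fourier_coefficient(n, atoms)

# Evaluate the quadratic form on 20 random finite sequences supported on -3..3.
rng = random.Random(0)
values = []
for _ in range(20):
    coeffs = {n: complex(rng.uniform(-1, 1), rng.uniform(-1, 1)) for n in range(-3, 4)}
    values.append(quadratic_form(coeffs, p))
```

Up to rounding, every entry of `values` should be real and non-negative, in line with Definition 9.3. This is a numerical check for one measure, not a substitute for the proof.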



The main step towards the proof of Theorem 9.2 is the following description of all positive-definite sequences.

Theorem 9.6 (Herglotz’s theorem). Let $(p_n)_{n \in \mathbb{Z}}$ be a positive-definite sequence. Then there exists a uniquely determined finite measure $\mu$ on $\mathbb{T}$ for which
$$p_n = p_n(\mu) = \int_{\mathbb{T}} \chi_n \, d\mu$$
for all $n \in \mathbb{Z}$.

Note that Herglotz’s theorem shows in particular that Examples 9.4 and 9.5 both give rise to all positive-definite sequences.

Proof of Theorem 9.6. Let $(p_n)$ be a positive-definite sequence. Fix some $N \geq 1$. For any $\theta \in \mathbb{T}$ we may use the defining property for a positive-definite sequence with the finite sequence
$$c_n = \begin{cases} \chi_{-n}(\theta) & \text{if } 1 \leq n \leq N, \\ 0 & \text{otherwise,} \end{cases}$$
of coefficients to obtain that the function
$$F_N(\theta) = \frac{1}{N} \sum_{m,n=1}^{N} \chi_{-m+n}(\theta)\, p_{m-n}$$

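is non-negative. Therefore, we may define a positive measure $\mu_N$ by the formula $d\mu_N(\theta) = F_N(\theta)\, d\theta$, that is, $\mu_N(B) = \int_B F_N(\theta)\, d\theta$ for any measurable $B \subseteq \mathbb{T}$. Notice that
$$\mu_N(\mathbb{T}) = \int_{\mathbb{T}} F_N(\theta)\, d\theta = \frac{1}{N} \sum_{m=1}^{N} p_0 = p_0.$$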
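The two properties of $F_N$ used so far (non-negativity and total mass $p_0$) can be observed numerically. The sketch below assumes a hypothetical positive-definite sequence, namely the Fourier coefficients of a two-atom measure with total mass 2; the names `chi`, `fejer_density`, and the grid size are illustrative choices, not notation from the text.

```python
import cmath

def chi(n, theta):
    """Character chi_n(theta) = exp(2 pi i n theta) on the circle T = R/Z."""
    return cmath.exp(2j * cmath.pi * n * theta)

def fejer_density(N, theta, p):
    """F_N(theta) = (1/N) * sum_{m,n=1}^{N} chi_{-m+n}(theta) * p(m - n)."""
    return sum(chi(n - m, theta) * p(m - n)
               for m in range(1, N + 1)
               for n in range(1, N + 1)) / N

# Hypothetical positive-definite sequence: Fourier coefficients of the atomic
# measure 0.5 * delta_{0.2} + 1.5 * delta_{0.7}, so that p(0) = 2.
def p(n):
    return 0.5 * chi(n, 0.2) + 1.5 * chi(n, 0.7)

N = 8
grid = [j / 64 for j in range(64)]          # equally spaced sample points in T
samples = [fejer_density(N, theta, p) for theta in grid]

# F_N is a trigonometric polynomial with frequencies of absolute value < N <= 64,
# so the Riemann sum over 64 equally spaced points equals the integral over T
# (up to floating-point rounding).
total_mass = sum(samples).real / len(grid)
```

Here `total_mass` should agree with $p_0 = 2$ up to rounding, and each sample of $F_N$ should be real and non-negative.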

By the Riesz representation theorem (Theorem 7.44) and the Banach–Alaoglu theorem (Theorem 8.10) in the combined form of Proposition 8.27 the sequence $(\mu_N)$ has a subsequence $(\mu_{N_k})$ converging in the weak* topology to a positive measure $\mu$. Since $\mu_{N_k}(\mathbb{T}) = p_0$ for all $k$ we also have $\mu(\mathbb{T}) = p_0$. Now let $\ell \in \mathbb{Z}$ and calculate
$$\int_{\mathbb{T}} \chi_\ell \, d\mu = \lim_{k \to \infty} \int_{\mathbb{T}} \chi_\ell F_{N_k}(\theta)\, d\theta = \lim_{k \to \infty} \frac{1}{N_k} \sum_{m,n=1}^{N_k} p_{m-n} \int_{\mathbb{T}} \chi_{\ell-m+n}(\theta)\, d\theta = \lim_{k \to \infty} \frac{N_k - |\ell|}{N_k}\, p_\ell = p_\ell,$$
where we used the fact that $\int_{\mathbb{T}} \chi_{\ell-m+n}\, d\theta = \delta_{m,n+\ell}$ (which is 1 if $m = n + \ell$ and 0 otherwise) and $|[1, N] \cap ([1, N] + \ell) \cap \mathbb{Z}| = N - |\ell|$ (for $N \geq |\ell|$). This proves the existence of the finite measure $\mu$ as in the theorem. For uniqueness we note that the algebra of characters is dense in $C(\mathbb{T})$ and refer to the uniqueness statement in Theorem 7.44.

9.1.2 Cyclic Representations and the Spectral Theorem

Recall the notion of a unitary representation from Definition 3.73 and notice that a unitary operator $U$ defines (and is defined by) an associated unitary representation $\pi$ of the group $\mathbb{Z}$ given by $\pi_n = U^n$ for $n \in \mathbb{Z}$.

Definition 9.7. A unitary representation $\pi$ of a group $G$ on a Hilbert space $H$ is called cyclic if
$$H = H_v = \overline{\langle \pi_g v \mid g \in G \rangle}$$
for some $v \in H$. We will call $v$ a generator of the cyclic representation. A closed subspace $H' \subseteq H$ is $\pi$-invariant if $\pi_g H' \subseteq H'$ for all $g \in G$. The cyclic subspace $H_v = \overline{\langle \pi_g v \mid g \in G \rangle}$ generated by $v \in H$ is the minimal $\pi$-invariant closed subspace containing $v$. If $G = \mathbb{Z}$ then we also refer to a cyclic representation of $\mathbb{Z}$ as a cyclic Hilbert space with respect to the unitary operator $\pi_1 : H \to H$.


Cyclic representations can be thought of as the building blocks of more general unitary representations, as the following construction shows. If $\pi$ is a unitary representation of $G$ on a separable Hilbert space $H$ (or $\pi : G \curvearrowright H$ is a unitary representation), then we can write $H$ as a direct sum
$$H = \bigoplus_{n \geq 1} H_n,$$
of pairwise orthogonal closed $\pi$-invariant subspaces $H_n$, each of which is a cyclic representation. Indeed, if $w_1, w_2, \ldots$ is an orthonormal basis of $H$ as in Theorem 3.39 and $H_1 = H_{w_1} = \overline{\langle \pi_g w_1 \mid g \in G \rangle}$, then $H_1$ is $\pi$-invariant. Together with the fact that $\pi_g^* = \pi_{g^{-1}}$ for all $g \in G$ and Lemma 6.30, this implies that $H_1^\perp$ is also $\pi$-invariant. Define $H_2 = H_{w_2^\perp}$, where $w_2^\perp \in H_1^\perp$ is the orthogonal projection of $w_2$ onto $H_1^\perp$. Again the spaces $H_1 \oplus H_2$ and $(H_1 \oplus H_2)^\perp$ are $\pi$-invariant, and we can continue the process by defining $H_3 = H_{w_3^\perp}$, where $w_3^\perp \in (H_1 \oplus H_2)^\perp$ is the orthogonal projection of $w_3$ onto $(H_1 \oplus H_2)^\perp$. Clearly $w_1 \in H_1$, $w_2 \in H_1 \oplus H_2$, and $w_3 \in H_1 \oplus H_2 \oplus H_3$. Repeating this construction inductively gives a sequence of pairwise orthogonal, closed, $\pi$-invariant, cyclic subspaces $(H_n)$ with $H = \bigoplus_{n \geq 1} H_n$ (see Exercise 3.37). Therefore the following corollary to Theorem 9.6 will also be the main step towards the proof of Theorem 9.2.

Corollary 9.8 (Cyclic spaces). Let $U : H \to H$ be a unitary operator on a complex Hilbert space such that $H$ is cyclic with respect to $U$, with $H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle}$ for some $v \in H$. Then there is a uniquely determined finite measure $\mu_v$ on $\mathbb{T}$, called the spectral measure of $v$ with respect to $U$, such that $U$ is unitarily isomorphic to the multiplication operator $M_{\chi_1}$ on $L^2(\mathbb{T}, \mu_v)$ with the vector $v \in H$ corresponding to $1 \in L^2(\mathbb{T}, \mu_v)$.

If $\phi : H \to L^2(\mathbb{T}, \mu_v)$ is the unitary isomorphism as in the diagram then ‘corresponding’ here means that $\phi(v) = 1$. In other words, for cyclic Hilbert spaces we have a commutative diagram

$$\begin{array}{ccc}
H & \xrightarrow{\;\;U\;\;} & H \\
\phi \big\downarrow & & \big\downarrow \phi \\
L^2(\mathbb{T}, \mu_v) & \xrightarrow{\;M_{\chi_1}\;} & L^2(\mathbb{T}, \mu_v)
\end{array}$$

of unitary maps. Together with the discussion before Theorem 9.2, we see that spectral measures should be thought of as a replacement for, or a generalization of, eigenvalues. As we will see during the proof, the spectral measure $\mu_v$ stores precisely the values of the inner products $\langle U^n v, v\rangle$ for a given unitary operator $U$ on a Hilbert space $H$ and vector $v \in H$. In the case of an eigenvector with eigenvalue $\lambda_0 = e^{2\pi i x_0}$, we obtain the Dirac measure $\|v\|^2 \delta_{x_0}$. At the opposite extreme, it could be that the vectors $\ldots, U^{-1}v, v, Uv, \ldots$ are all mutually orthogonal, in which case $\mu_v$ is the multiple $\|v\|^2 m_{\mathbb{T}}$ of the Lebesgue measure.

Proof of Corollary 9.8. Let $v \in H$ be a generator of $H = H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle}$. By Example 9.5, we know that if $p_n(v) = \langle U^n v, v\rangle_H$ for $n \in \mathbb{Z}$ then $(p_n)$ is a positive-definite sequence. By Theorem 9.6 there exists a uniquely determined finite measure $\mu_v$ on $\mathbb{T}$ with
$$p_n(v) = p_n(\mu_v) = \int \chi_n \, d\mu_v$$
for all $n \in \mathbb{Z}$. We wish to define a unitary map
$$\phi : H_v = \overline{\langle U^n v \mid n \in \mathbb{Z} \rangle} \longrightarrow L^2(\mathbb{T}, \mu_v)$$
to be the unique extension of the map that sends any finite linear combination of the vectors $U^n v$ to the corresponding trigonometric polynomial,
$$\phi : H_v \longrightarrow L^2(\mathbb{T}, \mu_v), \qquad \sum_{|n| \leq N} c_n U^n v \longmapsto \sum_{|n| \leq N} c_n \chi_n.$$

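While this is a natural attempt at defining the map $\phi$, it is not clear whether it produces a well-defined map. Curiously (at first encounter), this will follow from the map being an isometry: for any finite complex sequence $(c_n)$, we have
$$\begin{aligned}
\Bigl\| \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr\|_H^2
&= \sum_{m,n \in \mathbb{Z}} \langle c_m U^m v, c_n U^n v\rangle_H
= \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n}\, \underbrace{\langle U^{m-n} v, v\rangle_H}_{p_{m-n}(v) = p_{m-n}(\mu_v)} \\
&= \sum_{m,n \in \mathbb{Z}} c_m \overline{c_n} \int \chi_{m-n} \, d\mu_v \\
&= \int \sum_{m \in \mathbb{Z}} c_m \chi_m \, \overline{\sum_{n \in \mathbb{Z}} c_n \chi_n} \, d\mu_v
= \Bigl\| \sum_{n \in \mathbb{Z}} c_n \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)}^2 .
\end{aligned}$$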

We now show that this implies that $\phi$ is well-defined on the set of finite linear combinations by the following argument. If $\sum_{n \in \mathbb{Z}} c_n U^n v = \sum_{n \in \mathbb{Z}} c_n' U^n v$ for finite complex sequences $(c_n)$ and $(c_n')$, then
$$0 = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c_n') U^n v \Bigr\|_H = \Bigl\| \sum_{n \in \mathbb{Z}} (c_n - c_n') \chi_n \Bigr\|_{L^2(\mathbb{T}, \mu_v)}$$
and so $\sum_{n \in \mathbb{Z}} c_n \chi_n = \sum_{n \in \mathbb{Z}} c_n' \chi_n$ in $L^2(\mathbb{T}, \mu_v)$. Clearly the map $\phi$ so defined is now an isometry on a dense subspace of $H_v$, and so extends by the automatic extension to the closure (Proposition 2.59) to an isometry from $H_v$ into $L^2(\mathbb{T}, \mu_v)$. Furthermore, the image of $\phi$ contains all trigonometric polynomials on $\mathbb{T}$, which form a dense subset of $C(\mathbb{T})$ by Proposition 3.65 (and thus also of $L^2(\mathbb{T}, \mu_v)$). Since $H_v = H$ is a Hilbert space and $\phi$ is an isometry, we see that $\phi(H_v) \subseteq L^2(\mathbb{T}, \mu_v)$ is complete and dense, and hence equal to $L^2(\mathbb{T}, \mu_v)$. It remains to check that $\phi \circ U = M_{\chi_1} \circ \phi$. Let $(c_n)$ be a finite complex sequence. Then
$$U \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n U^{n+1} v,$$

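and so
$$\phi \Bigl( U \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr) = \sum_{n \in \mathbb{Z}} c_n \chi_{n+1} = \chi_1 \sum_{n \in \mathbb{Z}} c_n \chi_n = M_{\chi_1} \phi \Bigl( \sum_{n \in \mathbb{Z}} c_n U^n v \Bigr).$$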

That is, the desired formula holds on a dense subset of $H_v$ and so by continuity on all of $H_v$. This proves the corollary.

Proof of Theorem 9.2. Let $U$ be a unitary operator on a complex separable Hilbert space $H$. By the discussion after Definition 9.7, $H$ can be written as a direct sum of closed mutually orthogonal cyclic subspaces. Applying Corollary 9.8 to each of them proves the theorem.

Exercise 9.9. (a) Let $H_v \cong L^2(\mathbb{T}, \mu_v)$ be as in Corollary 9.8. Let $w \in H_v$ and suppose that $f \in L^2(\mathbb{T}, \mu_v)$ corresponds to $w$. Characterize the property $H_v = H_w$ in terms of $f$.
(b) Apply (a) to the unitary operator defined by $U((a_n)) = (a_{n-1})$ on $H = \ell^2(\mathbb{Z})$ and the vector $(v_n)$ with $v_n = 0$ for $n \neq 0$ and $v_0 = 1$.
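As a finite-dimensional illustration of Corollary 9.8, one may look at the cyclic shift on $\mathbb{C}^N$, where the spectral measure of a basis vector can be guessed and then checked against the inner products $\langle U^n v, v\rangle$. This is a sketch under assumptions: the dimension `N`, the choice of `v`, and the candidate measure (equal mass $1/N$ at the points $k/N$) are hypothetical example choices, and only non-negative $n$ are tested since $p_{-n}(v) = \overline{p_n(v)}$.

```python
import cmath

N = 6  # dimension of the (hypothetical) example space C^N

def shift(vec):
    """Cyclic shift U on C^N: (U a)_j = a_{j-1}, with indices modulo N."""
    return [vec[(j - 1) % len(vec)] for j in range(len(vec))]

def inner(a, b):
    """Inner product, linear in the first and conjugate-linear in the second slot."""
    return sum(x * complex(y).conjugate() for x, y in zip(a, b))

v = [1] + [0] * (N - 1)          # v = e_0 generates C^N under the shift

def p(n):
    """p_n(v) = <U^n v, v> for n >= 0."""
    w = v
    for _ in range(n):
        w = shift(w)
    return inner(w, v)

def integral_chi(n):
    """Integral of chi_n against the candidate measure with mass 1/N at each k/N."""
    return sum(cmath.exp(2j * cmath.pi * n * k / N) for k in range(N)) / N

# Compare the two sides of p_n(v) = integral of chi_n d(mu_v) for several n.
moments_match = all(abs(p(n) - integral_chi(n)) < 1e-12 for n in range(0, 3 * N))
```

In this example the spectral measure is atomic, with atoms at the arguments of the $N$-th roots of unity, which are the eigenvalues of the shift.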


The spectral theory of unitary operators has the spectral theory of self-adjoint operators as a consequence, as the next exercise shows. However, we will also give an independent and much more detailed treatment of this theory in Chapter 12.

Exercise 9.10. (a) For any bounded operator $A : V \to V$ on a Banach space $V$ and any power series $f(z) = \sum_{k=0}^{\infty} c_k z^k$ whose radius of convergence is bigger than $\|A\|$, show that the natural definition of $f(A)$ as the limit of the sequence of operators obtained as partial sums makes sense. Show that if $g(z) = \sum_{n=0}^{\infty} d_n (z - c_0)^n$ is the inverse function to $f$ defined in a neighbourhood of $f(0) = c_0$ (represented by another power series) and
$$\sum_{n=0}^{\infty} |d_n| \Bigl( \sum_{k=0}^{\infty} |c_k| \|A\|^k \Bigr)^n < \infty,$$
then we have $g(f(A)) = A$.
(b) Let $A : H \to H$ be a non-zero self-adjoint operator. Replacing $A$ by $\frac{1}{2\|A\|} A$ we may assume that $\|A\| = \frac{1}{2}$. Apply part (a) to $A$ and the power series corresponding to $e^{iz}$ to obtain a unitary operator $U : H \to H$. Show that $\|U - I\| \leq e^{1/2} - 1 < 1$, and that $A$ can be recovered from $U$ via the power series representing $\frac{1}{i} \log(z)$ in a neighbourhood of 1.
(c) Apply Theorem 9.2 to $U$ and show that one can describe $A$ on $H$ by a direct sum of multiplication operators as in Exercise 6.25(b). In fact, for each of the direct summands the measure space can be chosen to be a copy of $\mathbb{R}$ together with a measure supported in $[-\|A\|, \|A\|]$ and the multiplication operator can be chosen to be $M_I(f)(x) = x f(x)$.

For simplicity we have been working in this section with a single unitary operator (or the group $\mathbb{Z}$) but the approach can be generalized to several commuting unitary operators (the group $\mathbb{Z}^d$) as outlined in the following exercise.

Exercise 9.11. (a) Define positive-definite functions on $\mathbb{Z}^d$ (so that the sequence case corresponds to $d = 1$), and generalize Herglotz’s theorem to this context.
(b) State and prove a corollary to part (a) regarding the spectral theory of $d$ commuting unitary operators, so that Theorem 9.2 corresponds to the case $d = 1$.

9.1.3 Spectral Measures

In this subsection we will strengthen the spectral theorem for unitary operators by studying the sequence of spectral measures appearing in it more carefully. We start with a few immediate consequences of the definition of the spectral measures.

Lemma 9.12 (Behaviour of spectral measures). Let $U : H \to H$ be a unitary operator on a separable complex Hilbert space $H$.

(a) For any $v \in H$ we have $\mu_v(\mathbb{T}) = \|v\|^2$.
(b) If $v, w \in H$ satisfy $H_v \perp H_w$, then $\mu_{v+w} = \mu_v + \mu_w$.
(c) If $w \in H_v$ then $\mu_w \ll \mu_v$.
(d) If $v_1, v_2, \ldots \in H$ are such that the cyclic spaces that they generate are orthogonal, $w = \sum_k w_k \in \bigoplus_{k=1}^{\infty} H_{v_k}$, and $w_k$ corresponds to the function $f_k \in L^2(\mathbb{T}, \mu_{v_k})$ for all $k \geq 1$, then $d\mu_w = \sum_{k=1}^{\infty} |f_k|^2 \, d\mu_{v_k}$.
(e) If $\mu_v \perp \mu_w$ then $H_v \perp H_w$.


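Proof. (a) By definition we have
$$\mu_v(\mathbb{T}) = \int \chi_0 \, d\mu_v = \langle U^0 v, v\rangle = \|v\|^2.$$
(b) Assume that $H_v \perp H_w$. Then
$$\int \chi_n \, d\mu_{v+w} = \langle U^n (v + w), v + w\rangle_H = \langle U^n v, v\rangle_H + \langle U^n w, w\rangle_H = \int \chi_n \, d\mu_v + \int \chi_n \, d\mu_w$$
for all $n \in \mathbb{Z}$, which implies that $\mu_{v+w} = \mu_v + \mu_w$ by uniqueness of the spectral measures (Corollary 9.8).
(c) Suppose that $w \in H_v$ and recall that $H_v \cong L^2(\mathbb{T}, \mu_v)$ with $U$ corresponding to $M_{\chi_1}$ on $L^2(\mathbb{T}, \mu_v)$. If now $f \in L^2(\mathbb{T}, \mu_v)$ is the image of $w$ under the unitary isomorphism, then
$$\langle U^n w, w\rangle_H = \langle M_{\chi_1}^n f, f\rangle_{L^2(\mathbb{T}, \mu_v)} = \int \chi_n |f|^2 \, d\mu_v$$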

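for all $n \in \mathbb{Z}$ implies that $d\mu_w = |f|^2 \, d\mu_v$.
(d) Let $w = \sum_{k=1}^{\infty} w_k$ with $w_k \in H_{v_k}$ for all $k \geq 1$, so that $\sum_{k=1}^{\infty} \|w_k\|^2 < \infty$. Then $\mu = \sum_{k=1}^{\infty} \mu_{w_k}$ is finite by (a). For any $k \geq 1$, let $f_k \in L^2(\mathbb{T}, \mu_{v_k})$ be the function corresponding to $w_k \in H_{v_k} \cong L^2(\mathbb{T}, \mu_{v_k})$, so $d\mu_{w_k} = |f_k|^2 \, d\mu_{v_k}$ by the argument in (c). We now have
$$\langle U^n w, w\rangle_H = \sum_{k=1}^{\infty} \langle U^n w_k, w_k\rangle_H = \sum_{k=1}^{\infty} \int \chi_n |f_k|^2 \, d\mu_{v_k} = \int \chi_n \, d\mu$$
for all $n \in \mathbb{Z}$, which implies (d).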

(e) Assume that $w = x + y$ with $x \in H_v$ and $y \in H_v^\perp$. Then we have $H_x \subseteq H_v$ and $H_y \subseteq H_v^\perp$ so that $\mu_w = \mu_x + \mu_y$ by (b). However, (c) states that $\mu_x \ll \mu_v$ which implies that $\mu_x = 0$ since $\mu_w \perp \mu_v$. Thus $x = 0$ by (a) and $w = y \in H_v^\perp$.

It is easy to check that the relation $\nu \ll \mu \ll \nu$ of mutual absolute continuity for measures $\mu$ and $\nu$ is an equivalence relation on the space of measures of a given measurable space, giving rise to the measure equivalence class of a given measure.

Corollary 9.13 (Maximal spectral type). Let $H$ and $U$ be as in the spectral theorem (Theorem 9.2). The sequence of spectral measures in the spectral theorem can be chosen so that $\mu_1 \gg \mu_2 \gg \cdots$. In this case $\mu_1$ is the maximal spectral measure in the sense that $\mu_v \ll \mu_1$ for any $v \in H$, and this property uniquely characterizes the measure equivalence class of $\mu_1$, which is called the maximal spectral type of $U$.


Proof†. In the proof of Theorem 9.2 (more precisely, in the argument after Definition 9.7), we found a sequence of vectors $w_1, w_2, \ldots$ in $H$ such that
$$H = \bigoplus_{n \geq 1} H_{w_n}$$
(where the sum is possibly finite). Each of the vectors has a spectral measure $\nu_n = \mu_{w_n}$ for $n \geq 1$. Applying the Lebesgue decomposition theorem (from Proposition 3.29) to $\nu_1$ and $\nu_n$ for $n \geq 2$ we define
$$\nu_n = \nu_n^{\mathrm{ac}} + \nu_n^\perp \qquad (9.1)$$
with $\nu_n^{\mathrm{ac}} \ll \nu_1$ and $\nu_n^\perp \perp \nu_1$. From $\nu_n^\perp \perp \nu_1$ it follows that there exists some measurable $B_n \subseteq \mathbb{T}$ such that $\nu_n^\perp(B_n) = 0$ and $\nu_1(\mathbb{T} \setminus B_n) = 0$, which implies with (9.1) that $\nu_n^{\mathrm{ac}} = \nu_n|_{B_n}$ and $\nu_n^\perp = \nu_n|_{\mathbb{T} \setminus B_n}$. We will use the set $B_n$ to decompose $w_n$ into two components. In fact, under the unitary isomorphism between $H_{w_n}$ and $L^2(\mathbb{T}, \nu_n)$, let $w_n^\perp \in H_{w_n}$ be the vector corresponding to $c_n \mathbb{1}_{\mathbb{T} \setminus B_n}$ where $c_n > 0$ is chosen so that $\|w_n^\perp\| \leq \frac{1}{2^n}$. Let $w = w_1 + \sum_{n \geq 2} w_n^\perp$, which converges absolutely. Using Lemma 9.12(d) we now find that
$$d\mu_w = d\nu_1 + \sum_{n \geq 2} c_n^2 \mathbb{1}_{\mathbb{T} \setminus B_n} \, d\nu_n,$$
or equivalently
$$\mu_w = \nu_1 + \sum_{n \geq 2} c_n^2 \nu_n^\perp.$$

From this we see that $\nu_n = \nu_n^{\mathrm{ac}} + \nu_n^\perp \ll \mu_w$ for all $n \geq 2$, and for $n = 1$ this holds trivially. In particular, Lemma 9.12(d) now shows that $\mu_v \ll \mu_w$ for any $v \in H = \bigoplus_{n \geq 1} H_{w_n}$, as claimed in the corollary. It is easy to see that this property uniquely characterizes the measure class of $\mu_w$.

We claim now that $H_{w_1} \subseteq H_w$. To see this, notice that by construction $\nu_1 \perp \sum_{n \geq 2} c_n^2 \nu_n^\perp$ and let $B \subseteq \mathbb{T}$ have $\nu_1(\mathbb{T} \setminus B) = 0 = \nu_n^\perp(B)$ for all $n \geq 2$. Then by Corollary 9.8 and Lemma 9.12(d), $H_w$ contains an element $x$ corresponding to $\mathbb{1}_B \in L^2(\mathbb{T}, \mu_w)$ with spectral measure $\mu_x = \mu_w|_B = \nu_1$ and an element $y = w - x$ corresponding to $\mathbb{1}_{\mathbb{T} \setminus B}$ with spectral measure $\mu_y = \sum_{n \geq 2} c_n^2 \nu_n^\perp$. Since $\mu_y \perp \nu_1 = \mu_{w_1}$ we obtain from Lemma 9.12(e) that $y \in H_{w_1}^\perp$. The same argument shows that $x \in H_{w_1}$ even though it requires some additional thought: Recall from our construction above that $w_n^\perp \in H_{w_n}$ satisfies $\mu_{w_n^\perp} \perp \mu_{w_1} = \nu_1$. Applying Lemma 9.12(c) and (d), this gives $\mu_z \perp \nu_1$ for all $z \in \bigoplus_{n \geq 2} H_{w_n^\perp}$. Since $\mu_x = \nu_1$ we may apply Lemma 9.12(e) again to see that $x \perp \bigoplus_{n \geq 2} H_{w_n^\perp}$. Moreover, we have $x \in H_w \subseteq H_{w_1} \oplus \bigoplus_{n \geq 2} H_{w_n^\perp}$, which implies that $x \in H_{w_1}$. Therefore, we have
$$\begin{aligned}
w &= w_1 + \sum_{n \geq 2} w_n^\perp && \text{(by construction of } w\text{)} \\
&= x + y && \text{(by construction of } x, y\text{)}
\end{aligned}$$
with $x \in H_{w_1}$ and $y \in H_{w_1}^\perp$, so that $w_1 = x \in H_w$ and hence also $H_{w_1} \subseteq H_w$, as claimed.

The corollary follows by repeating this argument inductively. For this, we note that we can describe the space $H_w^\perp$ in the next step as a sum of cyclic spaces such that the first space is generated by the orthogonal projection $w_2'$ of $w_2$ to $H_w^\perp$. Repeating now the argument above constructs a new vector $w'$ such that $\mu_v \ll \mu_{w'}$ for all $v \in H_w^\perp$ and its cyclic subspace contains the vector $w_2'$. It follows that $w_2$ belongs to the sum $H_w \oplus H_{w'}$. Continuing in this way, we can make sure that the direct sum of $H_w, H_{w'}, \ldots$ still contains $w_1, w_2, \ldots$ and so coincides with $H$. By construction, the spectral measures of $w, w', w'', \ldots$ have the claimed absolute continuity property.

† The corollary will not be needed in the remainder of the book, and the reader may skip its proof.

The next two exercises extend the discussion above, and we invite the reader to consider in particular the case in which the measures arising are atomic, so that we are simply discussing eigenvalues and their multiplicities.

Exercise 9.14. Given a unitary operator $U$ on a separable complex Hilbert space $H$ and a finite Borel measure $\rho$ on $\mathbb{T}$, write $H_\rho = \{v \in H \mid \mu_v \ll \rho\}$. Show that $H_\rho$ is a closed subspace of $H$ that is invariant under $U$ and $U^*$, and that we have $\mu_w \perp \rho$ for any vector $w \in (H_\rho)^\perp$.

Exercise 9.15. (a) Given finite measures $\mu$ and $\nu$ on $\mathbb{T}$ with $\mu \ll \nu \ll \mu$, show that the corresponding multiplication operators from Example 9.1 are unitarily isomorphic.
(b) Reformulate Corollary 9.13 to show the existence of a sequence of finite measures $(\nu_n)$ and a measure $\nu_\infty$ with $\nu_m \perp \nu_n$ for all $m \neq n$ with $m, n \in \mathbb{N} \cup \{\infty\}$ with the property that
$$H \cong \bigoplus_{n \geq 1} L^2(\mathbb{T}, \nu_n)^n \oplus L^2(\mathbb{T}, \nu_\infty)^{\mathbb{N}}$$
and the isomorphism carries $U$ to the sum of the associated multiplication operators.
(c) Show that the unitary isomorphism from (b) takes
$$H^{(1)} = \{v \in H \mid \mu_v \perp \mu_w \ \forall\, w \in H_v^\perp\}$$
to $L^2(\mathbb{T}, \nu_1)$, and hence in particular deduce that $H^{(1)}$ is a closed subspace.
(d) Show that the unitary isomorphism in (b) takes the closed subspace
$$H^{(2)} = \bigl\{ v \in H \mid \exists\, v_2 \text{ with } H_v \perp H_{v_2},\ \mu_v = \mu_{v_2} \text{ and } \mu_w \perp \mu_v \ \forall\, w \in (H_v \oplus H_{v_2})^\perp \bigr\}$$
to $L^2(\mathbb{T}, \nu_2)^2$.
(e) Generalize (d) to higher multiplicities and conclude that the sequence of the measure classes of $\nu_1, \nu_2, \ldots, \nu_\infty$ and subspaces in $H$ corresponding to $L^2(\mathbb{T}, \nu_n)^n$ resp. $L^2(\mathbb{T}, \nu_\infty)^{\mathbb{N}}$ are uniquely determined by $U$.

9.1.4 Functional Calculus for Unitary Operators

As in Exercise 9.10 it is relatively straightforward to obtain a definition of $h(A)$ for an analytic function $h$ (defined by a power series) and bounded operator $A$ (whose norm is less than the radius of convergence of the power series). For a multiplication operator $M_g$ as in Exercise 6.25 one can go much further. For example, one could define $h(M_g)$ by setting it equal to $M_{h \circ g}$ for any bounded measurable function $h$. The reader should verify at this point that this definition does generalize the prior definition for analytic functions to measurable functions. Since Theorem 9.2 and Exercise 9.10 describe arbitrary unitary or self-adjoint operators in terms of multiplication operators, this allows one to also define the operators obtained by applying $h$ to these. However, from this definition it is not clear whether the result is independent of the choices made to describe the operator on $H$ as a sum of multiplication operators. As it turns out, this is the case, and we will discuss this ‘functional calculus’ in greater detail and in a more general setting in Chapter 12. Here we aim to give a first taste of this theory, by discussing simpler instances of the results for a single unitary operator.

For the discussion in this subsection it is convenient to use $\chi_1$ as an isomorphism from $\mathbb{T}$ to $\mathbb{S}^1 = \{z \in \mathbb{C} \mid |z| = 1\}$, and transport the spectral measures $\mu_v$ on $\mathbb{T}$ provided by Corollary 9.8 to $\mathbb{S}^1$. We will still use the same symbol for these measures so that their characterizing property becomes
$$\langle U^n v, v\rangle_H = \int_{\mathbb{S}^1} z^n \, d\mu_v(z)$$

for $v \in H$ and $n \in \mathbb{Z}$. Note that the multiplication operator in Corollary 9.8 then has the form $M_I(f)(z) = z f(z)$ for all $z \in \mathbb{S}^1$ and $f \in L^2(\mathbb{S}^1, \mu_v)$, where $I(z) = z$ denotes the identity map on $\mathbb{C}$. To obtain a good definition of $h(U)$ the trick is to define more general spectral measures.

Definition 9.16. Let $U : H \to H$ be a unitary operator. A complex-valued measure $\mu_{v,w}$ on $\mathbb{S}^1$ is the spectral measure of $v, w \in H$ with respect to $U$ if
$$\int_{\mathbb{S}^1} z^n \, d\mu_{v,w}(z) = \langle U^n v, w\rangle_H \qquad (9.2)$$
for all $n \in \mathbb{Z}$.

Proposition 9.17 (Non-diagonal spectral measures). Let $U : H \to H$ be a unitary operator on a separable complex Hilbert space $H$. For every $v, w$ in $H$ the spectral measure $\mu_{v,w}$ exists and is uniquely determined by $v$ and $w$. Moreover, the spectral measure depends linearly on $v$ and semi-linearly on $w$. For every $h \in \mathcal{L}^\infty(\mathbb{S}^1)$ there exists a bounded operator $h(U) : H \to H$ that is characterized by the property
$$\langle h(U) v, w\rangle_H = \int_{\mathbb{S}^1} h \, d\mu_{v,w}$$
for all $v, w \in H$. If $H \cong \bigoplus_{n \geq 1} L^2(\mathbb{S}^1, \mu_n)$ as in Theorem 9.2, then $h(U)$ corresponds to the direct sum of the multiplication operators $M_h$ on $L^2(\mathbb{S}^1, \mu_n)$ for $n \geq 1$.


The idea behind the proof of this corollary to Bochner’s theorem is simple, and relies on the polarization identity
$$\langle U^n v, w\rangle_H = \frac{1}{4} \sum_{\ell=0}^{3} i^\ell \bigl\langle U^n (v + i^\ell w), v + i^\ell w \bigr\rangle_H \qquad (9.3)$$
which is easily checked and gives the existence, by setting
$$\mu_{v,w} = \frac{1}{4} \sum_{\ell=0}^{3} i^\ell \mu_{v + i^\ell w}. \qquad (9.4)$$
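The polarization identity (9.3) can be tested numerically in finite dimensions. The following sketch assumes a hypothetical diagonal unitary on $\mathbb{C}^3$ and two arbitrary vectors; the names `apply_power` and `polarized` are illustrative only.

```python
import cmath

def inner(a, b):
    """Inner product on C^d, linear in the first and conjugate-linear in the second slot."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

# Hypothetical example: a diagonal unitary U on C^3 with eigenvalues on the unit circle.
eigenvalues = [cmath.exp(2j * cmath.pi * t) for t in (0.1, 0.45, 0.8)]

def apply_power(n, vec):
    """U^n applied to vec for the diagonal unitary U (n may be negative)."""
    return [z ** n * x for z, x in zip(eigenvalues, vec)]

v = [1 + 2j, -0.5j, 3 + 0j]
w = [2 - 1j, 1 + 1j, -1 + 0.5j]

def polarized(n):
    """Right-hand side of (9.3): (1/4) * sum_{l=0}^{3} i^l <U^n (v + i^l w), v + i^l w>."""
    total = 0
    for l in range(4):
        u = [x + (1j ** l) * y for x, y in zip(v, w)]
        total += (1j ** l) * inner(apply_power(n, u), u)
    return total / 4

# Difference between the two sides of (9.3) for n = -5, ..., 5.
errors = [abs(inner(apply_power(n, v), w) - polarized(n)) for n in range(-5, 6)]
```

All entries of `errors` should vanish up to rounding. The identity depends only on sesqui-linearity, so the diagonal form of the example operator is a convenience, not a requirement.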

An important case is to consider the characteristic functions $h = \mathbb{1}_B$ for a measurable set $B \subseteq \mathbb{T}$ (see the proof of Corollary 9.13 and Exercise 9.21). In particular, for every Borel subset $B \subseteq \mathbb{S}^1$ there exists an orthogonal projection operator $E(B) = \mathbb{1}_B(U)$.

Definition 9.18. The function $E : \mathcal{B}(\mathbb{S}^1) \to B(H)$ is called a projection-valued measure.

For a given measurable $B \subseteq \mathbb{S}^1$ one should think of $E(B)$ as the projection operator that projects onto the closed subspace on which all ‘generalized eigenvalues belong to $B$’.

Proof of Proposition 9.17. Since the spectral measures $\mu_{v + i^\ell w}$ for the vectors $v + i^\ell w \in H$ exist by Corollary 9.8, equation (9.3) shows that $\mu_{v,w}$ as defined in (9.4) satisfies the desired relationship. Since trigonometric polynomials (which on $\mathbb{S}^1$ are linear combinations of $z^n$ for $n \in \mathbb{Z}$) are dense in $C(\mathbb{S}^1)$ by Proposition 3.65, and complex-valued measures are naturally identified with linear functionals on $C(\mathbb{S}^1)$ (by Theorem 7.54), it follows that the spectral measure $\mu_{v,w}$ is uniquely determined by (9.2). Since the inner product is sesqui-linear, this also implies the claimed sesqui-linearity. We claim that
$$\|\mu_{v,w}\| \leq 4 \|v\| \|w\| \qquad (9.5)$$
(see Exercise 3.33 and Theorem 7.54 for the norm of $\mu_{v,w}$, and also Exercise 9.19 for the correct upper bound). First note that by sesqui-linearity $\mu_{tv,sw}$ is equal to $t \overline{s} \mu_{v,w}$ for all $t, s \in \mathbb{C}$, and in particular $\mu_{v,w} = 0$ if one of the vectors is zero. This allows us to assume that $v, w \in H$ are non-zero. Multiplying $v$ by $\sqrt{\|w\| / \|v\|}$ and the vector $w$ by the inverse of this scalar changes neither $\mu_{v,w}$ nor the product $\|v\| \|w\|$, so we may assume without loss of generality that $\|v\| = \|w\|$. Using now the definition of $\mu_{v,w}$ in (9.4) with these new vectors and the formula $\mu_{v + i^\ell w}(\mathbb{S}^1) = \|v + i^\ell w\|^2 \leq 4 \|v\|^2$ for $0 \leq \ell \leq 3$, we obtain (9.5). Now let $h \in \mathcal{L}^\infty(\mathbb{S}^1)$ so that
$$\Bigl| \int_{\mathbb{S}^1} h \, d\mu_{v,w} \Bigr| \leq 4 \|h\|_\infty \|v\| \|w\|.$$

Fix $v \in H$ and consider the function $\ell_{h,v}$ defined by
$$\ell_{h,v}(w) = \int_{\mathbb{S}^1} h(z) \, d\mu_{v,w}(z)$$
for $w \in H$. Then $w \mapsto \ell_{h,v}(w)$ is a bounded linear functional on $H$, and so by the Fréchet–Riesz representation theorem (Corollary 3.19) there exists some $v_h \in H$ with $\ell_{h,v}(w) = \langle v_h, w\rangle$ for all $w \in H$. Moreover, $\|v_h\| = \|\ell_{h,v}\| \leq 4 \|h\|_\infty \|v\|$ and $v_h$ depends linearly on $v$. Hence $h(U)v = v_h$ defines a bounded linear operator with $\|h(U)\|_{\mathrm{op}} \leq 4 \|h\|_\infty$.

We note that $v' \in H_v$ and $w \in H_v^\perp$ implies $\langle U^n v', w\rangle_H = 0$ for all $n \in \mathbb{Z}$, hence $\mu_{v',w} = 0$ and so $\langle h(U)v', w\rangle_H = 0$ for all $h \in \mathcal{L}^\infty(\mathbb{S}^1)$. As this holds for all $v' \in H_v$ and $w \in H_v^\perp$, we see that $h(U) H_v \subseteq H_v$. Thus for the proof of the description of $h(U)$ via the spectral theorem it suffices to describe the restriction of $h(U)$ to a cyclic representation, or equivalently describe $h(M_I)$ on $L^2(\mathbb{S}^1, \mu)$ for some finite measure $\mu$ on $\mathbb{S}^1$. If $f, g \in L^2(\mathbb{S}^1, \mu)$, then
$$\langle M_I^n f, g\rangle_{L^2(\mathbb{S}^1, \mu)} = \int_{\mathbb{S}^1} z^n f \overline{g} \, d\mu$$
for all $n \in \mathbb{Z}$. Hence $d\mu_{f,g} = f \overline{g} \, d\mu$ is the spectral measure of $f, g$ with respect to $M_I$, giving
$$\langle h(M_I) f, g\rangle_{L^2(\mathbb{S}^1, \mu)} = \int_{\mathbb{S}^1} h f \overline{g} \, d\mu = \langle M_h f, g\rangle_{L^2(\mathbb{S}^1, \mu)}.$$
This implies the corollary.



Exercise 9.19. Improve the estimate in (9.5) to the inequality $\|\mu_{v,w}\| \leq \|v\| \|w\|$ for all $v, w \in H$.

Exercise 9.20. Let $U$ be a unitary operator on a separable complex Hilbert space and $h$ a function in $\mathcal{L}^\infty(\mathbb{S}^1)$. When is $h(U)$ unitary or self-adjoint? What is the norm $\|h(U)\|_{\mathrm{op}}$?

Exercise 9.21. Let $U$ be a unitary operator on a separable complex Hilbert space $H$. Suppose that we are given a decomposition $\mathbb{S}^1 = \bigsqcup_k B_k$ for some (finite or countable) list of measurable sets $B_k$. Let $v \in H$. Show that $v = \sum_k E(B_k) v$. Describe the spectral measure $\mu_{E(B_k)v}$ for all $k$ in terms of the spectral measure $\mu_v$ and explain the meaning of $E(B_k)v$ in the case when $\mu_v$ is atomic.
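For a diagonal unitary operator in finite dimensions the objects of Proposition 9.17 and Definition 9.18 can be written down explicitly, which gives a quick consistency check. This is a sketch under assumptions: the operator, the vectors, the set $B$ (the open upper half circle), and all function names are hypothetical example choices.

```python
import cmath

# Hypothetical example: diagonal unitary U on C^4 with eigenvalues z_k on the unit circle.
eigenvalues = [cmath.exp(2j * cmath.pi * t) for t in (0.0, 0.25, 0.25, 0.6)]
v = [1 + 1j, 2 + 0j, -1j, 0.5 + 0j]
w = [1 + 0j, 1j, 1 - 1j, 2 + 0j]

def inner(a, b):
    """Inner product, linear in the first and conjugate-linear in the second slot."""
    return sum(x * y.conjugate() for x, y in zip(a, b))

def spectral_measure(v, w):
    """mu_{v,w} as a list of (atom, complex weight): weight v_k * conj(w_k) at z_k."""
    return [(z, x * y.conjugate()) for z, x, y in zip(eigenvalues, v, w)]

def integrate(h, measure):
    """Integral of h against an atomic complex measure."""
    return sum(h(z) * weight for z, weight in measure)

def functional_calculus(h, vec):
    """h(U) vec for the diagonal unitary U: multiply coordinate k by h(z_k)."""
    return [h(z) * x for z, x in zip(eigenvalues, vec)]

# A bounded, discontinuous function h: the indicator of the open upper half circle B.
indicator_B = lambda z: 1.0 if z.imag > 1e-12 else 0.0

lhs = inner(functional_calculus(indicator_B, v), w)      # <h(U) v, w>
rhs = integrate(indicator_B, spectral_measure(v, w))     # integral of h d(mu_{v,w})

# E(B) = 1_B(U) should be a projection: applying it twice changes nothing.
once = functional_calculus(indicator_B, v)
twice = functional_calculus(indicator_B, once)
```

Here `lhs` and `rhs` should agree, and `once` should equal `twice`. In this atomic situation $E(B)$ simply keeps the coordinates whose eigenvalue lies in $B$.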

9.1.5 An Application of Spectral Theory to Dynamics

In this subsection we will use the spectral theory of unitary operators in the context of measure-preserving systems (see Definition 8.35). In fact, we will assume here that $X$ is a compact metric space, $\mu$ is a Borel probability measure on $X$, and that $T : X \to X$ is measure-preserving, continuous, and invertible, so that $T^{-1} : X \to X$ is also continuous. This guarantees that the operator $U_T : L^2(X, \mu) \to L^2(X, \mu)$ defined by $U_T f = f \circ T$ is unitary, since
$$\langle U_T f, g\rangle = \int_X (f \circ T \, \overline{g}) \circ T^{-1} \, d\mu = \int_X f \, \overline{(g \circ T^{-1})} \, d\mu = \langle f, U_{T^{-1}} g\rangle$$

for all $f, g \in L^2(X, \mu)$.

Definition 9.22. Let $(X, \mu, T)$ be an invertible measure-preserving system as above. Then we say that $T$ has purely discrete spectrum if $U_T$ is diagonalizable (equivalently, if for any $f \in L^2(X, \mu)$ the spectral measure $\mu_f$ with respect to $U_T$ is atomic). We say that $T$ has Lebesgue spectrum if for any
$$f \in L^2_0(X, \mu) = \Bigl\{ f \in L^2(X, \mu) \;\Big|\; \int f \, d\mu = 0 \Bigr\}$$
the spectral measure $\mu_f$ with respect to $U_T$ is absolutely continuous with respect to the Lebesgue measure on $\mathbb{T}$.

Essential Exercise 9.23. (a) Prove that the rotation $R_\alpha : x \mapsto x + \alpha$ for $x, \alpha \in \mathbb{T}$ has purely discrete spectrum with respect to the Lebesgue measure on $\mathbb{T}$.
(b) Prove that the map
$$A : \begin{pmatrix} x \\ y \end{pmatrix} \longmapsto \begin{pmatrix} 0 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} y \\ x + y \end{pmatrix}$$
preserves Lebesgue measure on $\mathbb{T}^2$ and has Lebesgue spectrum.

We recall that if $(Z, \mathcal{B})$ and $(Z', \mathcal{B}')$ are measurable spaces, $\mu$ is a measure on $Z$ and $\phi : Z \to Z'$ is a measurable map, then we can define the push-forward measure $\phi_* \mu$ on $Z'$ by the formula $\phi_* \mu(B') = \mu(\phi^{-1}(B'))$ for all sets $B' \in \mathcal{B}'$. Using this notion, invariance of a measure $\mu$ under a map $T : X \to X$ is precisely the condition that the push-forward $T_* \mu$ coincides with the original measure $\mu$.

Definition 9.24 (Furstenberg [37]). Let $(X, \nu_X, T)$ and $(Y, \nu_Y, S)$ be two measure-preserving systems. A joining of these systems is a measure $\rho$ on the product $X \times Y$ with $(T \times S)_* \rho = \rho$, $(\pi_X)_* \rho = \nu_X$, and $(\pi_Y)_* \rho = \nu_Y$ where $\pi_X, \pi_Y$ denote the natural projections from $X \times Y$ onto $X, Y$ respectively. The two systems are called disjoint, written $T \perp S$, if $\rho = \nu_X \times \nu_Y$ is the only possible joining.

Furstenberg introduced the notion of joinings and also gave the first classes of disjoint systems.

Theorem 9.25 (Disjointness for spectral reasons). Suppose $(X, \nu_X, T)$ and $(Y, \nu_Y, S)$ are two measure-preserving systems. Suppose moreover that for any $f$ in $L^2_0(X, \nu_X)$ and any $g$ in $L^2_0(Y, \nu_Y)$ the spectral measures $\mu_f$ (with respect to $U_T$ on $L^2(X, \nu_X)$) and $\mu_g$ (with respect to $U_S$ on $L^2(Y, \nu_Y)$) are mutually singular. Then the two systems are disjoint.

Proof. Let $\rho$ be a joining of the two systems. For every $f \in L^2(X, \nu_X)$ we have $f \circ \pi_X \in L^2(X \times Y, \rho)$ with $\|f \circ \pi_X\|_{L^2(X \times Y, \rho)} = \|f\|_{L^2(X, \nu_X)}$ and $U_{T \times S}(f \circ \pi_X) = U_T(f) \circ \pi_X$. Moreover,
$$\bigl\langle U_{T \times S}^n (f \circ \pi_X), f \circ \pi_X \bigr\rangle_{L^2(X \times Y, \rho)} = \langle U_T^n f, f\rangle_{L^2(X, \nu_X)}$$

for all n ∈ Z, which implies that the spectral measures µf ◦πX defined using the unitary operator UT ×S : L2 (X × Y, ρ) → L2 (X × Y, ρ) agrees with µf . A similar statement holds for g ∈ L2 (Y, νY ). Applying this to f ∈ L20 (X, νX ) and g ∈ L20 (Y, νY ), using the assumption in the theorem and Lemma 9.12(e) we see that f ◦ πX ⊥ g ◦ πY . Now let A ⊆ X and B ⊆ Y be measurable sets and define f = 1A − νX (A) ∈ L20 (X, νX ) and g = 1B − νY (B) ∈ L20 (Y, νY ) to obtain 0 = hf ◦ πX , g ◦ πY i = h1A×Y − νX (A), 1X×B − νY (B)i

= h1A×Y , 1X×B i − νX (A) h1, 1X×B i − νY (B) h1A×Y , 1i + νX (A)νY (B) = ρ(A × B) − νX (A)νY (B),

where all the inner products are taken in L2 (X × Y, ρ). As this holds for all measurable A ⊆ X and B ⊆ Y , we deduce that ρ = νX × νY .  We wrap up the discussion of disjointness, and our excursion into ergodic theory, by discussing a consequence of disjointness for the dynamics of individual points. Let X be a compact metric space, T : X → X a continuous map, and µ a T -invariant and ergodic probability measure on X. A consequence of the pointwise ergodic theorem (one of the fundamental results in ergodic theory, see [27, Ch. 2, Sec. 4.4.2]) is that µ-almost every point x ∈ X satisfies Z N −1 1 X n f (T x) −→ f dµ N n=0 X as N → ∞ for all f ∈ C(X), in which case x is called µ-generic. Exercise 8.39 and the map in the proof of Proposition 8.34 give examples of systems in which every point is generic for the (Lebesgue) measure on the space. However, for the map A : T2 → T2 in Exercise 9.23(b) it is easy to see that rational points in Q2 /Z2 are not generic. There are also irrational non-generic points, but as the Lebesgue measure m on T2 is invariant and ergodic for A, one obtains from the ergodic theorem that m-almost every


point in $\mathbb{T}^2$ is $m$-generic. With these examples in mind, we can now give a pointwise corollary of the discussion of spectral measures in the following exercise.

Essential Exercise 9.26. Let $X, Y$ be compact metric spaces, $T : X \to X$ and $S : Y \to Y$ continuous maps, and $\nu_X \in \mathcal{P}^T(X)$, $\nu_Y \in \mathcal{P}^S(Y)$ be invariant and ergodic probability measures. Suppose that the two systems are disjoint.
(a) Let $(\rho_n)$ be a sequence of probability measures on $X \times Y$ with the property that $(\pi_X)_* \rho_n \to \nu_X$ and $(\pi_Y)_* \rho_n \to \nu_Y$ as $n \to \infty$ with respect to the weak* topology. Suppose that any limit $\lim_{k \to \infty} \rho_{n_k}$ of a weak* convergent subsequence is $T \times S$-invariant. Show that $\lim_{n \to \infty} \rho_n = \nu_X \times \nu_Y$.
(b) Let $x \in X$ be $\nu_X$-generic for $T$ and let $y \in Y$ be $\nu_Y$-generic for $S$. Use (a) to show that $(x, y)$ is $\nu_X \times \nu_Y$-generic for $T \times S$.
(c) Show that the result of (b) applies in particular to the case of the rotation $T = R_\alpha : \mathbb{T} \to \mathbb{T}$ and the automorphism $S = A : \mathbb{T}^2 \to \mathbb{T}^2$ from Exercise 9.23.
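The notion of a generic point discussed before the exercise can be explored numerically. The following Python sketch is an illustration only and not part of the text; the helper names `birkhoff_average`, `rotation`, and `cat_map`, the rotation angle $\sqrt{2}$, and the test function $\cos(2\pi x)$ (whose integral is $0$) are choices made here. It suggests that the ergodic averages for the rotation tend to $0$, whereas the rational point $(1/2, 0)$ has an orbit of period 3 under $A$ and its averages settle at $-1/3$.

```python
import math
from fractions import Fraction

def birkhoff_average(step, observable, start, n_terms):
    """Return (1/N) * sum_{n<N} observable(T^n start) for the map T = step."""
    point, total = start, 0.0
    for _ in range(n_terms):
        total += observable(point)
        point = step(point)
    return total / n_terms

def rotation(alpha):
    """Rotation x -> x + alpha on the circle R/Z (floating point)."""
    return lambda x: (x + alpha) % 1.0

def cat_map(p):
    """The map A(x, y) = (y, x + y) on the torus, exact on rational points."""
    x, y = p
    return (y, (x + y) % 1)

# Irrational rotation: averages of cos(2*pi*x) approach the integral 0.
rot_avg = birkhoff_average(rotation(math.sqrt(2)),
                           lambda x: math.cos(2 * math.pi * x), 0.1, 30000)

# Rational point for A: the orbit is periodic, so the averages see only 3 points.
start = (Fraction(1, 2), Fraction(0))
cat_avg = birkhoff_average(cat_map,
                           lambda p: math.cos(2 * math.pi * float(p[0])),
                           start, 30000)

print(rot_avg)  # small in absolute value
print(cat_avg)  # close to -1/3, not to the integral 0
```

Exact rational arithmetic is used for $A$ because floating-point iteration of an expanding map quickly leaves the periodic orbit.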

9.2 The Fourier Transform

The Fourier transform generalizes the (important and satisfying) theory of Fourier series on $\mathbb{T}^d$ to an (equally important and satisfying) theory for functions on $\mathbb{R}^d$, which will in particular lead to a generalization of the spectral theory of unitary operators (Theorem 9.2) to unitary flows (Theorem 9.58). The analogue of the Fourier coefficients of a function on $\mathbb{T}^d$ will be the Fourier transform $\widehat{f}$ of a function $f$ on $\mathbb{R}^d$, defined by
\[
\widehat{f}(t) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx, \tag{9.6}
\]
where $x, t \in \mathbb{R}^d$ and $x \cdot t = x_1 t_1 + \cdots + x_d t_d$ is the usual inner product. The analogue of the Fourier series will be the Fourier back transform (or reverse transform) $\check{h}$ of a function $h$ on $\mathbb{R}^d$, defined by
\[
\check{h}(x) = \int_{\mathbb{R}^d} h(t) e^{2\pi i x \cdot t} \, dt. \tag{9.7}
\]

The analogue of the fact that the Fourier series represents the original function (where this is true) is a Fourier inversion formula $f = (\widehat{f})^{\vee}$. However, the way in which the optimistic identity $f = (\widehat{f})^{\vee}$ needs to be interpreted as a mathematical theorem is more involved. For example, if $f \in L^2(\mathbb{R}^d)$ then there is no reason to expect the integral defining the Fourier transform in (9.6) to exist. However, we will still be able to obtain a sensible definition of the Fourier transform as an extension of a densely defined bounded operator.


We also note that we will think of $x \in \mathbb{R}^d$ as the space variable and of $t \in \mathbb{R}^d$ as the frequency variable. In fact, any $t \in \mathbb{R}^d$ defines the wave function $x \mapsto e^{2\pi i x \cdot t}$ for which $t$ gives the frequency and the direction of the wave. In that sense, $\widehat{f}(t)$ should be interpreted as the correlation of the function $f$ and the wave with frequency $t$. The formula $f = (\widehat{f})^{\vee}$ then means that one can reconstruct the original function $f$ as a suitable superposition of the waves with frequency $t$ and amplitude $\widehat{f}(t)$. We start with a concrete example.

Example 9.27 (Gaussian distribution). Let $f(x) = e^{-\pi \|x\|^2}$ for $x \in \mathbb{R}^d$. Then $\widehat{f}(t) = e^{-\pi \|t\|^2}$.

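Proof. Suppose first that $d = 1$, and start by calculating $\widehat{f}(0)$. By definition,
\[
\widehat{f}(0) = \int_{\mathbb{R}} e^{-\pi x^2} \, dx.
\]
Thus
\begin{align*}
\widehat{f}(0)^2 &= \int_{\mathbb{R}^2} e^{-\pi x^2} e^{-\pi y^2} \, dx \, dy && \text{(by Fubini)} \\
&= \int_0^\infty \int_0^{2\pi} e^{-\pi r^2} r \, d\theta \, dr && \text{(in polar coordinates)} \\
&= 2\pi \int_0^\infty e^{-\pi r^2} r \, dr = \int_0^\infty e^{-s} \, ds = 1 && \text{(where $\pi r^2 = s$)}
\end{align*}
and as $\widehat{f}(0) > 0$ we get $\widehat{f}(0) = 1$. To verify the claimed formula for a general $t$ in $\mathbb{R}$ we will use the Cauchy integral formula for complex path integrals applied to the holomorphic function $\mathbb{C} \ni z \mapsto e^{-\pi z^2}$. We integrate over a rectangular path $\gamma$ with corners at $\pm M$ and $\pm M + it$ as illustrated in Figure 9.1.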

[Fig. 9.1: The contour $\gamma$, a rectangle with corners $-M$, $M$, $M + it$, and $-M + it$.]


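Then by Cauchy's formula $\oint_\gamma e^{-\pi z^2} \, dz = 0$. Bringing the integrals over the third and the fourth piece of the path to the other side we obtain
\[
\int_{-M}^{M} e^{-\pi x^2} \, dx + i \int_0^t e^{-\pi (M + is)^2} \, ds = i \int_0^t e^{-\pi (-M + is)^2} \, ds + \int_{-M}^{M} e^{-\pi (x + it)^2} \, dx. \tag{9.8}
\]
Now notice that
\[
e^{-\pi (x + is)^2} = e^{-\pi (x^2 - s^2 + 2isx)},
\]
which implies that for fixed $t \in \mathbb{R}$ and $|s| \leq |t|$ we have
\[
\bigl| e^{-\pi (\pm M + is)^2} \bigr| \leq e^{-\pi (M^2 - t^2)},
\]
which implies that $e^{-\pi (\pm M + is)^2} \to 0$ uniformly on $[-t, t]$ as $M \to \infty$. Thus letting $M \to \infty$ in (9.8) we see that
\[
1 = \int_{-\infty}^{\infty} e^{-\pi x^2} \, dx = \underbrace{\int_{-\infty}^{\infty} e^{-\pi x^2} e^{-2\pi i t x} \, dx}_{\widehat{f}(t)} \; e^{\pi t^2},
\]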

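which gives $\widehat{f}(t) = e^{-\pi t^2}$ for all $t \in \mathbb{R}$. For $d > 1$ notice that
\[
f(x) = e^{-\pi \|x\|^2} = e^{-\pi x_1^2} \cdots e^{-\pi x_d^2} = f_1(x_1) \cdots f_1(x_d)
\]
is a product of $d$ copies of the function $f_1(x) = e^{-\pi x^2}$ discussed above, so
\begin{align*}
\widehat{f}(t) &= \int_{\mathbb{R}^d} f_1(x_1) \cdots f_1(x_d) e^{-2\pi i (x_1 t_1 + \cdots + x_d t_d)} \, dx \\
&= \underbrace{\int_{\mathbb{R}} f_1(x_1) e^{-2\pi i x_1 t_1} \, dx_1}_{= \widehat{f_1}(t_1)} \cdots \underbrace{\int_{\mathbb{R}} f_1(x_d) e^{-2\pi i x_d t_d} \, dx_d}_{= \widehat{f_1}(t_d)} \\
&= e^{-\pi t_1^2} \cdots e^{-\pi t_d^2} = e^{-\pi \|t\|^2}
\end{align*}
by Fubini's theorem and the case $d = 1$. □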
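As a sanity check of Example 9.27 in the case $d = 1$, the integral in (9.6) can be approximated numerically. The sketch below is an illustration and not part of the text: the helper `fourier_transform`, the truncation interval, and the number of quadrature nodes are assumptions made here, and a midpoint rule replaces the Lebesgue integral.

```python
import cmath
import math

def fourier_transform(f, t, half_width=8.0, steps=1600):
    """Midpoint-rule approximation of the integral in (9.6) for d = 1,
    truncated to the interval [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h

def gaussian(x):
    return math.exp(-math.pi * x * x)

errors = []
for t in (0.0, 0.5, 1.0, 2.0):
    approx = fourier_transform(gaussian, t)
    claimed = math.exp(-math.pi * t * t)
    errors.append(abs(approx - claimed))
    print(f"t = {t}: numerical {approx.real:.12f}, claimed {claimed:.12f}")

print(max(errors))  # size of the largest deviation from e^{-pi t^2}
```

The midpoint rule is a natural choice here because it is very accurate for smooth, rapidly decaying integrands such as the Gaussian.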



9.2.1 The Fourier Transform on $L^1(\mathbb{R}^d)$

Lemma 9.28 (Basic inequality). The Fourier transform in (9.6) is defined for every $f \in L^1(\mathbb{R}^d)$ and $t \in \mathbb{R}^d$ and satisfies $\|\widehat{f}\|_\infty \leq \|f\|_1$.

Proof. For $f \in L^1(\mathbb{R}^d)$ and $t \in \mathbb{R}^d$ we have
\[
|\widehat{f}(t)| = \Bigl| \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx \Bigr| \leq \int_{\mathbb{R}^d} |f(x)| \, dx = \|f\|_1,
\]
which proves the lemma. □



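The next result is the first of many duality principles involving Fourier transforms.

Proposition 9.29 (Duality between shift and phase shift). For $x_0$ and $t_0 \in \mathbb{R}^d$ we define the shift operator $\lambda_{x_0}$ and the multiplication operator $M_{\chi(t_0)}$ on $L^1(\mathbb{R}^d)$ by
\[
\lambda_{x_0}(f) : x \longmapsto f(x - x_0) \quad \text{and} \quad M_{\chi(t_0)}(f) : x \longmapsto e^{2\pi i x \cdot t_0} f(x).
\]
Then $\widehat{\lambda_{x_0}(f)} = M_{\chi(-x_0)}(\widehat{f})$ and $\widehat{M_{\chi(t_0)}(f)} = \lambda_{t_0}(\widehat{f})$.

We note that by a phase shift we mean multiplication by a character, the reason being this proposition.

Proof of Proposition 9.29. By definition,
\begin{align*}
\widehat{\lambda_{x_0}(f)}(t) &= \int_{\mathbb{R}^d} f(x - x_0) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f(y) e^{-2\pi i (y + x_0) \cdot t} \, dy = e^{-2\pi i x_0 \cdot t} \widehat{f}(t) = M_{\chi(-x_0)}(\widehat{f})(t)
\end{align*}
for all $x_0, t \in \mathbb{R}^d$ and
\begin{align*}
\widehat{M_{\chi(t_0)}(f)}(t) &= \int_{\mathbb{R}^d} e^{2\pi i x \cdot t_0} f(x) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot (t - t_0)} \, dx = \widehat{f}(t - t_0) = \lambda_{t_0}(\widehat{f})(t)
\end{align*}
for all $t_0, t \in \mathbb{R}^d$. □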
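Both identities of Proposition 9.29 can be checked numerically for $d = 1$. In the sketch below (an illustration, not part of the text) the test function, the values of `x0` and `t0`, and the midpoint-rule helper `fourier_transform` are assumptions chosen here.

```python
import cmath
import math

def fourier_transform(f, t, half_width=8.0, steps=1600):
    """Midpoint-rule approximation of (9.6) for d = 1 on a truncated interval."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h

def f(x):
    # A non-symmetric, rapidly decaying test function.
    return (1.0 + x) * math.exp(-math.pi * x * x)

x0, t0 = 0.7, 1.3

def shifted(x):        # lambda_{x0}(f)
    return f(x - x0)

def modulated(x):      # M_{chi(t0)}(f)
    return cmath.exp(2j * math.pi * x * t0) * f(x)

shift_error = 0.0
phase_error = 0.0
for t in (-1.0, 0.0, 0.4, 2.0):
    # Shift in space corresponds to multiplication by a character.
    lhs = fourier_transform(shifted, t)
    rhs = cmath.exp(-2j * math.pi * x0 * t) * fourier_transform(f, t)
    shift_error = max(shift_error, abs(lhs - rhs))
    # Multiplication by a character corresponds to a shift in frequency.
    lhs = fourier_transform(modulated, t)
    rhs = fourier_transform(f, t - t0)
    phase_error = max(phase_error, abs(lhs - rhs))

print(shift_error, phase_error)  # both should be at the level of rounding error
```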



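Proposition 9.30 (Duality for linear transformations). Let $f \in L^1(\mathbb{R}^d)$ and let $A \in \mathrm{GL}_d(\mathbb{R})$ be an invertible matrix. Then $f \circ A \in L^1(\mathbb{R}^d)$, and
\[
\widehat{f \circ A} = \frac{1}{|\det A|} \, \widehat{f} \circ (A^t)^{-1}.
\]

Proof. We use the definition and a substitution to get
\begin{align*}
\widehat{f \circ A}(t) &= \int_{\mathbb{R}^d} f(Ax) e^{-2\pi i x \cdot t} \, dx \\
&= \frac{1}{|\det A|} \int_{\mathbb{R}^d} f(Ax) \, |\det A| \, e^{-2\pi i (Ax) \cdot ((A^t)^{-1} t)} \, dx \\
&= \frac{1}{|\det A|} \, \widehat{f}\bigl( (A^t)^{-1} t \bigr)
\end{align*}
for all $t \in \mathbb{R}^d$. □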
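In dimension $d = 1$ the matrix $A$ is a nonzero scalar $a$, and Proposition 9.30 reads $\widehat{f(a\,\cdot)}(t) = |a|^{-1} \widehat{f}(t/a)$. The following sketch (an illustration, not part of the text; the test function, the value `a = -2.0`, and the quadrature helper are assumptions chosen here) compares both sides numerically.

```python
import cmath
import math

def fourier_transform(f, t, half_width=8.0, steps=1600):
    """Midpoint-rule approximation of (9.6) for d = 1 on a truncated interval."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h

def f(x):
    # Off-centre Gaussian, so that the transform is genuinely complex-valued.
    return math.exp(-math.pi * (x - 0.3) ** 2)

a = -2.0  # the 1x1 "matrix" A; a negative value shows why |det A| appears

def f_scaled(x):
    return f(a * x)

scaling_error = 0.0
for t in (-1.5, 0.0, 0.25, 1.0):
    lhs = fourier_transform(f_scaled, t)
    rhs = fourier_transform(f, t / a) / abs(a)
    scaling_error = max(scaling_error, abs(lhs - rhs))

print(scaling_error)  # should be at the level of rounding error
```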



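Proposition 9.31 (Duality of convolution and multiplication (I)). For $f_1, f_2 \in L^1(\mathbb{R}^d)$ recall that the convolution $f_1 * f_2 \in L^1(\mathbb{R}^d)$ defined by
\[
f_1 * f_2(x) = \int_{\mathbb{R}^d} f_1(y) f_2(x - y) \, dy
\]
satisfies $\|f_1 * f_2\|_1 \leq \|f_1\|_1 \|f_2\|_1$ and $f_1 * f_2 = f_2 * f_1$ (so that $L^1(\mathbb{R}^d)$ is a commutative Banach algebra). The Fourier transform of $f_1 * f_2$ is given by $\widehat{f_1 * f_2} = \widehat{f_1} \, \widehat{f_2}$.

Proof. Applying Fubini's theorem and a substitution we see that
\begin{align*}
\int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dy \, dx &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dx \, dy \\
&= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y)| |f_2(z)| \, dz \, dy = \|f_1\|_1 \|f_2\|_1.
\end{align*}
Thus the integral defining $f_1 * f_2(x)$ exists for almost every $x \in \mathbb{R}^d$, and
\[
\|f_1 * f_2\|_1 \leq \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} |f_1(y) f_2(x - y)| \, dy \, dx = \|f_1\|_1 \|f_2\|_1.
\]
For commutativity we see that
\[
f_1 * f_2(x) = \int f_1(y) f_2(x - y) \, dy = \int f_1(x - z) f_2(z) \, dz = f_2 * f_1(x)
\]
by using the substitution $z = x - y$ for any fixed $x \in \mathbb{R}^d$. Now let $t \in \mathbb{R}^d$ and apply Fubini's theorem to the definition of $\widehat{f_1 * f_2}(t)$ to see that
\begin{align*}
\widehat{f_1 * f_2}(t) &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f_1(y) f_2(x - y) \, dy \, e^{-2\pi i x \cdot t} \, dx \\
&= \int_{\mathbb{R}^d} f_1(y) \underbrace{\int_{\mathbb{R}^d} f_2(x - y) e^{-2\pi i (x - y) \cdot t} \, dx}_{\widehat{f_2}(t)} \, e^{-2\pi i y \cdot t} \, dy = \widehat{f_1}(t) \widehat{f_2}(t).
\end{align*}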
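The identity $\widehat{f_1 * f_2} = \widehat{f_1}\,\widehat{f_2}$ can be observed numerically for $d = 1$. The sketch below is an illustration and not part of the text: the two Gaussian-type test functions, the helper names, and the quadrature parameters are assumptions chosen here, and the convolution is itself computed by quadrature, so the check costs a double sum.

```python
import cmath
import math

def integrate(g, half_width=6.0, steps=240):
    """Midpoint-rule approximation of the integral of g over the real line,
    truncated to [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return sum(g(-half_width + (k + 0.5) * h) for k in range(steps)) * h

def f1(x):
    return math.exp(-math.pi * x * x)

def f2(x):
    return math.exp(-2.0 * math.pi * (x - 0.5) ** 2)

def convolution(x):
    """(f1 * f2)(x) as in Proposition 9.31, by quadrature in y."""
    return integrate(lambda y: f1(y) * f2(x - y))

def fourier_transform(g, t):
    return integrate(lambda x: g(x) * cmath.exp(-2j * math.pi * x * t))

convolution_error = 0.0
for t in (0.0, 0.3, 1.0):
    lhs = fourier_transform(convolution, t)
    rhs = fourier_transform(f1, t) * fourier_transform(f2, t)
    convolution_error = max(convolution_error, abs(lhs - rhs))

print(convolution_error)  # should be at the level of rounding error
```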


 Exercise 9.32. Show that the convolution in Proposition 9.31 is associative.

The impatient reader may use the propositions above together with Example 9.27 to show that the Fourier transform extends to an isometry from $L^2(\mathbb{R}^d)$ to $L^2(\mathbb{R}^d)$ via the steps of the following exercise.

Exercise 9.33. (a) Show that
\[
\mathcal{A} = \Bigl\{ x \longmapsto \sum_{\text{finite}} c_i e^{-\pi a_i \|x - x_i\|^2 + 2\pi i x \cdot t_i} \Bigm| c_i \in \mathbb{C}, \ a_i > 0, \ x_i, t_i \in \mathbb{R}^d \Bigr\}
\]
is a sub-algebra of $C_0(\mathbb{R}^d)$ that separates points and is closed under conjugation.
(b) Show that $\widehat{\mathcal{A}} = \mathcal{A}$ and that $f = (\widehat{f})^{\vee} = (\check{f})^{\wedge}$ for all $f \in \mathcal{A}$.
(c) Show that $\mathcal{A} \subseteq C_0(\mathbb{R}^d)$ is dense with respect to $\| \cdot \|_\infty$.
(d) Show that $\mathcal{A} \subseteq L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in both $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$ with respect to the norms $\| \cdot \|_1$ and $\| \cdot \|_2$ respectively (which is not an immediate consequence of (c) since $\mathbb{R}^d$ has infinite Lebesgue measure). Show that if $F \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ and $\varepsilon > 0$ then there exists a single function $f \in \mathcal{A}$ with $\|f - F\|_1 < \varepsilon$ and $\|f - F\|_2 < \varepsilon$.
(e) Show that $\|\widehat{f}\|_2 = \|f\|_2$ for all $f \in \mathcal{A}$ so that the Fourier transform extends to a unitary map on $L^2(\mathbb{R}^d)$ with inverse given by the Fourier back transform. Moreover, the extension agrees with the Fourier transform defined by the Lebesgue integral on $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$.

Proposition 9.34 (Riemann–Lebesgue lemma). The Fourier transform maps $L^1(\mathbb{R}^d)$ into $C_0(\mathbb{R}^d)$.

Proof. If $t_n \to t$ in $\mathbb{R}^d$ as $n \to \infty$, then also $f(x) e^{-2\pi i x \cdot t_n} \to f(x) e^{-2\pi i x \cdot t}$ for almost every $x \in \mathbb{R}^d$ and so
\[
\widehat{f}(t_n) = \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t_n} \, dx \longrightarrow \int_{\mathbb{R}^d} f(x) e^{-2\pi i x \cdot t} \, dx = \widehat{f}(t)
\]
as $n \to \infty$ by the dominated convergence theorem. Therefore $\widehat{f}$ is a bounded continuous function on $\mathbb{R}^d$ by Lemma 9.28. It remains to show that $\widehat{f}$ lies in $C_0(\mathbb{R}^d)$, which we will do by an approximation argument. Suppose first that $f = 1_{[a_1, b_1] \times \cdots \times [a_d, b_d]}$ is the characteristic function of a rectangle. Then, by Fubini's theorem,
\begin{align*}
\widehat{f}(t) &= \int_{\mathbb{R}^d} 1_{[a_1, b_1]}(x_1) \cdots 1_{[a_d, b_d]}(x_d) e^{-2\pi i x \cdot t} \, dx \\
&= \int_{a_1}^{b_1} e^{-2\pi i x_1 t_1} \, dx_1 \cdots \int_{a_d}^{b_d} e^{-2\pi i x_d t_d} \, dx_d.
\end{align*}
Each factor can be calculated explicitly, in fact
\[
\int_a^b e^{-2\pi i x t} \, dx =
\begin{cases}
\dfrac{e^{-2\pi i b t} - e^{-2\pi i a t}}{-2\pi i t} & \text{for } t \neq 0, \text{ and} \\[1ex]
b - a & \text{for } t = 0,
\end{cases}
\tag{9.9}
\]
so each factor lies in $C_0(\mathbb{R})$. It follows that $\widehat{f} \in C_0(\mathbb{R}^d)$ if $f$ is the characteristic function of a rectangle. By linearity the same holds for finite linear combinations of such functions. Since $C_0(\mathbb{R}^d)$ is complete with respect to $\| \cdot \|_\infty$ and the Fourier transform is continuous from $L^1(\mathbb{R}^d)$ to $C_b(\mathbb{R}^d)$, the same holds for any element $f \in L^1(\mathbb{R}^d)$ that can be approximated in $L^1(\mathbb{R}^d)$ by such finite linear combinations, which is all of $L^1(\mathbb{R}^d)$ (see the argument on p. 172). □

Exercise 9.35. Show that the Fourier transform calculated in (9.9) does not belong to $L^1(\mathbb{R})$.
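The decay expressed by formula (9.9) is easy to tabulate. In the sketch below (an illustration, not part of the text; the function name `indicator_hat` and the interval $[0, 1.5]$ are assumptions chosen here) the absolute value of (9.9) is compared with the elementary bound $\min(b - a, 1/(\pi |t|))$, which follows from Lemma 9.28 and from $|e^{i\theta_1} - e^{i\theta_2}| \leq 2$.

```python
import cmath
import math

def indicator_hat(a, b, t):
    """Formula (9.9): the Fourier transform of the indicator of [a, b] at t."""
    if t == 0:
        return complex(b - a)
    numerator = cmath.exp(-2j * math.pi * b * t) - cmath.exp(-2j * math.pi * a * t)
    return numerator / (-2j * math.pi * t)

a, b = 0.0, 1.5
rows = []
for t in (0.0, 0.3, 3.0, 30.0, 300.0, 3000.0):
    value = abs(indicator_hat(a, b, t))
    bound = (b - a) if t == 0 else min(b - a, 1.0 / (math.pi * abs(t)))
    rows.append((t, value, bound))
    print(f"t = {t:7.1f}   |f^(t)| = {value:.6f}   bound = {bound:.6f}")
```

The bound decays only like $1/|t|$, which is consistent with the claim of Exercise 9.35 but of course does not prove it.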

As mentioned above, we will show that the Fourier back transform is the inverse of the Fourier transform. However, as we will see, this requires additional assumptions on the function, since the hypothesis $f \in L^1(\mathbb{R}^d)$ does not imply that $\widehat{f} \in L^1(\mathbb{R}^d)$ (as seen in Exercise 9.35), so there is no reason to expect that the Fourier back transform will be defined on $\widehat{f}$.

Theorem 9.36 (Fourier inversion). If $f \in L^1(\mathbb{R}^d)$ has $\widehat{f} \in L^1(\mathbb{R}^d)$, then $f$ agrees almost everywhere with the continuous function $(\widehat{f})^{\vee} \in C_0(\mathbb{R}^d)$.

Despite the additional assumption, this theorem already implies that any $L^1$ function is uniquely determined by its Fourier transform. However, if the latter is not integrable, it is unclear how to recover $f$ from $\widehat{f}$.

Corollary 9.37 (Injectivity). If $f_1, f_2 \in L^1(\mathbb{R}^d)$ have $\widehat{f_1} = \widehat{f_2}$, then $f_1 = f_2$.

Proof. Given $f_1, f_2 \in L^1(\mathbb{R}^d)$ as in the corollary, the function $f = f_1 - f_2$ satisfies $\widehat{f} = 0 \in L^1(\mathbb{R}^d)$. Applying Theorem 9.36 this implies that $f = 0$ and proves the corollary. □

In order to prove Theorem 9.36 we need a preparatory lemma.

Lemma 9.38. If $f, g \in L^1(\mathbb{R}^d)$ then
\[
\int_{\mathbb{R}^d} \widehat{f} g \, dx = \int_{\mathbb{R}^d} f \, \widehat{g} \, dy.
\]

Proof. Once again, this is a simple application of Fubini's theorem, as
\begin{align*}
\int_{\mathbb{R}^d} \widehat{f}(x) g(x) \, dx &= \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(y) e^{-2\pi i y \cdot x} \, dy \, g(x) \, dx \\
&= \int_{\mathbb{R}^d} f(y) \int_{\mathbb{R}^d} g(x) e^{-2\pi i y \cdot x} \, dx \, dy = \int_{\mathbb{R}^d} f(y) \widehat{g}(y) \, dy.
\end{align*}
□

Proof of Theorem 9.36. Let $f \in L^1(\mathbb{R}^d)$ such that $\widehat{f} \in L^1(\mathbb{R}^d)$. By Proposition 9.34 we have $(\widehat{f})^{\vee} \in C_0(\mathbb{R}^d)$. We need to show that $(\widehat{f})^{\vee}$ agrees with $f$ almost everywhere. To achieve this, we will use Lemma 9.38 for $f$ and the phase-shifted stretched Gaussian distribution


\[
\varphi_{r, x_0}(t) = e^{2\pi i t \cdot x_0} \varphi_r(t) \quad \text{where} \quad \varphi_r(t) = e^{-\pi \|rt\|^2}
\]
for $x_0 \in \mathbb{R}^d$ and $r > 0$. Using Lemma 9.38 we can define the function $f_r$ in two equivalent ways by
\[
f_r(x_0) = \int_{\mathbb{R}^d} \widehat{f}(t) \varphi_{r, x_0}(t) \, dt = \int_{\mathbb{R}^d} f(x) \widehat{\varphi_{r, x_0}}(x) \, dx \tag{9.10}
\]
for all $x_0 \in \mathbb{R}^d$. We will use the two sides of this formula to show that $f_r$ converges as $r \to 0$ both to $(\widehat{f})^{\vee}$ and to $f$ (in two different ways).

Pointwise convergence. We first show that $f_r \to (\widehat{f})^{\vee}$ pointwise as $r \to 0$ (where we will use the left-hand integral in (9.10)). Since
\[
f_r(x_0) = \int_{\mathbb{R}^d} \widehat{f}(t) e^{2\pi i t \cdot x_0} e^{-\pi \|rt\|^2} \, dt
\]
and $e^{-\pi \|rt\|^2} \to 1$ as $r \to 0$, we obtain
\[
f_r(x_0) \longrightarrow \int_{\mathbb{R}^d} \widehat{f}(t) e^{2\pi i t \cdot x_0} \, dt = (\widehat{f})^{\vee}(x_0)
\]
by the dominated convergence theorem, for any $x_0 \in \mathbb{R}^d$.

Convergence in $L^1$. We next show that $f_r \to f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$ (which will use the right-hand integral in (9.10)). The proof of this step is a bit more involved and relies on the interpretation of the right-hand integral as a convolution with the approximate identity $\widehat{\varphi_r}$ (it is easy to see, from the proof that follows, for example, that $\widehat{\varphi_r}$ has properties similar to the Fejér kernel in Section 3.4.2 and the function used in Exercise 5.17; see also Exercise 8.6). By Example 9.27 and Proposition 9.30 we have
\[
\widehat{\varphi_r}(x) = r^{-d} e^{-\pi \|x/r\|^2},
\]
and by Proposition 9.29 we also have
\[
\widehat{\varphi_{r, x_0}}(x) = r^{-d} e^{-\pi \|(x - x_0)/r\|^2}.
\]
This gives
\[
f_r(x_0) = \int_{\mathbb{R}^d} f(x) r^{-d} e^{-\pi \|(x_0 - x)/r\|^2} \, dx = f * \widehat{\varphi_r}(x_0)
\]
and $f * \widehat{\varphi_r} \in L^1(\mathbb{R}^d)$ by Proposition 9.31. We may bring the difference of $f * \widehat{\varphi_r}$ and $f$ at $x_0$ into the form


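\begin{align*}
f * \widehat{\varphi_r}(x_0) - f(x_0) &= \int f(x_0 - x) r^{-d} e^{-\pi \|x/r\|^2} \, dx - f(x_0) \\
&= \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi \|z\|^2} \, dz
\end{align*}
by using the substitution $z = x/r$ (and recalling that $\int_{\mathbb{R}^d} e^{-\pi \|z\|^2} \, dz = 1$). On taking the norm we obtain
\begin{align*}
\|f * \widehat{\varphi_r} - f\|_1 &= \int \Bigl| \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi \|z\|^2} \, dz \Bigr| \, dx_0 \\
&\leq \iint |f(x_0 - rz) - f(x_0)| e^{-\pi \|z\|^2} \, dz \, dx_0 \\
&\leq \int \|\lambda_{rz}(f) - f\|_1 e^{-\pi \|z\|^2} \, dz
\end{align*}
by Fubini's theorem. Now notice that $\|\lambda_{rz}(f) - f\|_1 \to 0$ as $r \to 0$ for $z \in \mathbb{R}^d$ by Lemma 3.74. Therefore
\[
\|f * \widehat{\varphi_r} - f\|_1 \longrightarrow 0
\]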

as $r \to 0$ by dominated convergence, which proves the claimed convergence in $L^1$.

The limits coincide. Since every sequence that converges in $L^1(\mathbb{R}^d)$ has a subsequence that converges almost everywhere (see, for example, the proof of completeness of $L^p$ spaces on p. 34), the two statements regarding the convergence properties of $f_r$ as $r \to 0$ together prove the theorem. □

9.2.2 The Fourier Transform on $L^2(\mathbb{R}^d)$

As mentioned before, the Fourier transform behaves quite well on $L^2(\mathbb{R}^d)$ (after overcoming the minor obstacle that the defining integral does not make sense).

Theorem 9.39 (Plancherel formula). If $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, then $\widehat{f}$ lies in $L^2(\mathbb{R}^d)$ with $\|\widehat{f}\|_2 = \|f\|_2$ and the map $f \mapsto \widehat{f}$ extends continuously to a unitary operator on $L^2(\mathbb{R}^d)$ (whose inverse is the continuous extension of $f \mapsto \check{f}$).

Proof: A dense subspace. We define the space of functions
\[
V = \{ f \in L^1(\mathbb{R}^d) \mid \widehat{f} \in L^1(\mathbb{R}^d) \}.
\]
By Theorem 9.36 we have $f = (\widehat{f})^{\vee} \in C_0(\mathbb{R}^d)$ almost everywhere for $f \in V$. Hence $V \subseteq L^\infty(\mathbb{R}^d)$ consists of essentially bounded functions and so


$|f|^2 = f \overline{f} \in L^1(\mathbb{R}^d)$ for all $f \in V$ so $V \subseteq L^2(\mathbb{R}^d)$. We claim that $V$ is actually dense in $L^2(\mathbb{R}^d)$. For this, notice first that $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ is dense in $L^2(\mathbb{R}^d)$ (since it contains simple functions), so that it is enough to approximate a given function in the intersection $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ by an element of $V$ with respect to $\| \cdot \|_2$. Using the same notation as in the proof of Theorem 9.36, we already know that $f * \widehat{\varphi_r}$ converges to $f$ in $L^1(\mathbb{R}^d)$ as $r \to 0$. By Proposition 9.31 we see that
\[
\widehat{f * \widehat{\varphi_r}} = \widehat{f} \varphi_r
\]
belongs to $L^1(\mathbb{R}^d)$ (since it is a product of an element in $C_0(\mathbb{R}^d)$ with an element of $L^1(\mathbb{R}^d)$), which shows that $f * \widehat{\varphi_r} \in V$. Analyzing the argument, we see that only a slight modification is necessary to also obtain $L^2$ convergence of $f * \widehat{\varphi_r} \to f$ as $r \to 0$. Indeed, using Jensen's inequality (see the first paragraph of the proof of Lemma 3.75) with the probability measure $e^{-\pi |z|^2} \, dz$ on $\mathbb{R}^d$ we get
\begin{align*}
\|f * \widehat{\varphi_r} - f\|_2^2 &= \int \Bigl| \int \bigl( f(x_0 - rz) - f(x_0) \bigr) e^{-\pi |z|^2} \, dz \Bigr|^2 \, dx_0 \\
&\leq \iint \bigl| \lambda_{rz} f(x_0) - f(x_0) \bigr|^2 e^{-\pi |z|^2} \, dz \, dx_0 \\
&= \int \|\lambda_{rz} f - f\|_2^2 \, e^{-\pi |z|^2} \, dz \longrightarrow 0
\end{align*}
as $r \to 0$, again by dominated convergence. This gives the claimed density.

Unitarity. We will now show that the Fourier transform preserves the inner product. For this, let $f, g \in V$ and define $h = \overline{\widehat{g}}$. Then
\[
\widehat{h}(x) = \int_{\mathbb{R}^d} \overline{\widehat{g}(t)} e^{-2\pi i x \cdot t} \, dt = \overline{\int_{\mathbb{R}^d} \widehat{g}(t) e^{2\pi i x \cdot t} \, dt} = \overline{(\widehat{g})^{\vee}(x)} = \overline{g(x)}
\]
almost everywhere by Theorem 9.36. Applying Lemma 9.38 we see that
\[
\langle f, g \rangle_{L^2(\mathbb{R}^d)} = \int_{\mathbb{R}^d} f \overline{g} \, dx = \int_{\mathbb{R}^d} f \, \widehat{h} \, dx = \int_{\mathbb{R}^d} \widehat{f} h \, dt = \int_{\mathbb{R}^d} \widehat{f} \, \overline{\widehat{g}} \, dt = \langle \widehat{f}, \widehat{g} \rangle_{L^2(\mathbb{R}^d)}.
\]
In other words, we have shown that the Fourier transform preserves the inner product for elements in $V$. It follows that the Fourier transform extends to an isometry from $L^2(\mathbb{R}^d)$ to itself, which we again denote by
\[
L^2(\mathbb{R}^d) \ni f \longmapsto \widehat{f} \in L^2(\mathbb{R}^d).
\]
Since $\widehat{V} = V$ (which follows directly from Theorem 9.36) is dense in $L^2(\mathbb{R}^d)$ (by the above), the extension is surjective. Clearly the same discussion applies to the Fourier back transform. Moreover, since $(\widehat{f})^{\vee} = f$ for $f \in V$ by Theorem 9.36, the same holds for all $f \in L^2(\mathbb{R}^d)$.

As we will use the same symbol $\widehat{f}$ for the Fourier transform of $f$ defined for $f \in L^1(\mathbb{R}^d)$ by (9.6) and defined for $f \in L^2(\mathbb{R}^d)$ by the unique continuous extension from $V \subseteq L^2(\mathbb{R}^d)$, we still need to check that for a function $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ these definitions agree. Fortunately, most of the required work has already been done. Let $f \in L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$, so that $V \ni f * \widehat{\varphi_r} \to f$ as $r \to 0$ both in $L^1(\mathbb{R}^d)$ and in $L^2(\mathbb{R}^d)$. As both notions of Fourier transforms are continuous we obtain $\widehat{f * \widehat{\varphi_r}} \to \widehat{f}^{L^1}$ in $C_0(\mathbb{R}^d)$ and $\widehat{f * \widehat{\varphi_r}} \to \widehat{f}^{L^2}$ in $L^2(\mathbb{R}^d)$ as $r \to 0$, where we write $\widehat{f}^{L^1}$ for the Fourier transform defined using (9.6) and $\widehat{f}^{L^2}$ for the Fourier transform obtained by the above continuous extension. Taking a sequence $(r_n)$ with $r_n \to 0$ as $n \to \infty$ such that $\widehat{f * \widehat{\varphi_{r_n}}} \to \widehat{f}^{L^2}$ almost everywhere as $n \to \infty$ we deduce that $\widehat{f}^{L^2} = \widehat{f}^{L^1}$ almost everywhere, as desired. □

Using the Plancherel formula we can give the reverse direction of the duality we first encountered in Proposition 9.31.

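Corollary 9.40 (Duality of convolution and multiplication (II)). For functions $f, g \in L^2(\mathbb{R}^d)$ the pointwise product $fg \in L^1(\mathbb{R}^d)$ has Fourier transform $\widehat{fg} = \widehat{f} * \widehat{g}$.

Proof. Note that since $\widehat{f}, \widehat{g} \in L^2(\mathbb{R}^d)$, the integral in
\[
\widehat{f} * \widehat{g}(t) = \int_{\mathbb{R}^d} \widehat{f}(s) \widehat{g}(t - s) \, ds
\]
exists for every $t \in \mathbb{R}^d$ by the Cauchy–Schwarz inequality. Also note that
\[
\widehat{\overline{g}}(t) = \int_{\mathbb{R}^d} \overline{g(x)} e^{-2\pi i x \cdot t} \, dx = \overline{\int_{\mathbb{R}^d} g(x) e^{2\pi i x \cdot t} \, dx} = \overline{\widehat{g}(-t)}.
\]
Since the map $f \mapsto \widehat{f}$ is unitary on $L^2(\mathbb{R}^d)$, this gives
\[
\widehat{f} * \widehat{g}(0) = \int_{\mathbb{R}^d} \widehat{f}(t) \widehat{g}(-t) \, dt = \bigl\langle \widehat{f}, \widehat{\overline{g}} \bigr\rangle_{L^2(\mathbb{R}^d)} = \langle f, \overline{g} \rangle_{L^2(\mathbb{R}^d)} = \int_{\mathbb{R}^d} f g \, dx = \widehat{fg}(0).
\]
Using Proposition 9.29 and $\lambda_{-t_0}(\widehat{g})(-t) = \widehat{g}(t_0 - t)$ for all $t \in \mathbb{R}^d$ we can extend this to
\begin{align*}
\widehat{f} * \widehat{g}(t_0) &= \int_{\mathbb{R}^d} \widehat{f}(t) \widehat{g}(t_0 - t) \, dt = \widehat{f} * \lambda_{-t_0}(\widehat{g})(0) \\
&= \widehat{f} * \widehat{M_{\chi(-t_0)}(g)}(0) = \int_{\mathbb{R}^d} f \, M_{\chi(-t_0)}(g) \, dx = \widehat{fg}(t_0)
\end{align*}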


for all t0 ∈ Rd .
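The norm identity $\|\widehat{f}\|_2 = \|f\|_2$ of Theorem 9.39, on which the proof above relies, can also be observed numerically. In the following sketch (an illustration, not part of the text) the test function $(1 + x) e^{-\pi x^2}$, the helper names, and the quadrature parameters are assumptions chosen here; the transform is evaluated by quadrature at every node of the outer integral.

```python
import cmath
import math

def integrate(g, half_width=6.0, steps=240):
    """Midpoint-rule approximation of the integral of g over the real line,
    truncated to [-half_width, half_width]."""
    h = 2.0 * half_width / steps
    return sum(g(-half_width + (k + 0.5) * h) for k in range(steps)) * h

def f(x):
    return (1.0 + x) * math.exp(-math.pi * x * x)

def f_hat(t):
    """Numerical version of (9.6) for d = 1."""
    return integrate(lambda x: f(x) * cmath.exp(-2j * math.pi * x * t))

norm_f_squared = integrate(lambda x: abs(f(x)) ** 2)
norm_f_hat_squared = integrate(lambda t: abs(f_hat(t)) ** 2)

print(norm_f_squared, norm_f_hat_squared)  # the two values should agree
```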



Exercise 9.41. Show that the unitary operator $L^2(\mathbb{R}^d) \ni f \mapsto \widehat{f} \in L^2(\mathbb{R}^d)$ is completely diagonalizable and has only four eigenvalues.

Exercise 9.42. Use the Riesz–Thorin interpolation theorem to prove the Hausdorff–Young inequality. Fix $p \in (1, 2)$. Show that the Fourier transform on $L^1(\mathbb{R}^d) \cap L^2(\mathbb{R}^d)$ can be extended to all $f \in L^p(\mathbb{R}^d)$ so that $\|\widehat{f}\|_q \leq \|f\|_p$ where $\frac{1}{p} + \frac{1}{q} = 1$.

9.2.3 The Fourier Transform, Smoothness, Schwartz Space

As with Fourier series in Section 3.4, smoothness and decay properties of the Fourier transform are closely related. For $x \in \mathbb{R}^d$ and $\alpha \in \mathbb{N}_0^d$ we write $x^\alpha$ for $x_1^{\alpha_1} \cdots x_d^{\alpha_d}$ and define $M_{(cI)^\alpha} f(x) = (cx)^\alpha f(x)$ for any function $f$ on $\mathbb{R}^d$ and scalar $c$.

Proposition 9.43 (Duality between differentiation and multiplication by monomials). If $x \mapsto x^\alpha f(x)$ lies in $L^1(\mathbb{R}^d)$ for all $\alpha \in \mathbb{N}_0^d$ with $\|\alpha\|_1 \leq k$, then $\widehat{f} \in C^k(\mathbb{R}^d)$, and
\[
\partial_\alpha \widehat{f} = \widehat{M_{(-2\pi i I)^\alpha}(f)}
\]
for all $\alpha$ with $\|\alpha\|_1 \leq k$. If $f \in C^k(\mathbb{R}^d)$ and $\partial_\alpha f \in L^1(\mathbb{R}^d)$ for $\|\alpha\|_1 \leq k$ and $\partial_\alpha f \in C_0(\mathbb{R}^d)$ for $\|\alpha\|_1 \leq k - 1$, then
\[
\widehat{\partial_\alpha f} = M_{(2\pi i I)^\alpha} \widehat{f}
\]
for all $\alpha$ with $\|\alpha\|_1 \leq k$.

Proof. Suppose that $f$ and $x \mapsto x_j f(x)$ lie in $L^1(\mathbb{R}^d)$. Then
\begin{align*}
\partial_j \widehat{f}(t) &= \lim_{h \to 0} \frac{\widehat{f}(t + h e_j) - \widehat{f}(t)}{h} \\
&= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i x \cdot (t + h e_j)} - e^{-2\pi i x \cdot t}}{h} \, dx \\
&= \lim_{h \to 0} \int_{\mathbb{R}^d} f(x) \frac{e^{-2\pi i h x_j} - 1}{h} e^{-2\pi i x \cdot t} \, dx
\end{align*}
if the limit exists. Now notice that
\[
\frac{e^{-2\pi i h x_j} - 1}{h} \longrightarrow -2\pi i x_j
\]
as $h \to 0$ is bounded in absolute value by $2\pi |x_j|$ by the two-dimensional mean value theorem for differentiation. Applying the dominated convergence theorem we deduce that the above limit exists and is equal to
\[
\partial_j \widehat{f}(t) = \int_{\mathbb{R}^d} f(x) (-2\pi i x_j) e^{-2\pi i x \cdot t} \, dx = \widehat{M_{(-2\pi i I)^{e_j}} f}(t)
\]
and so the first part of the proposition now follows by induction on $k$. Now suppose $f \in C^1(\mathbb{R}^d)$, $f \in C_0(\mathbb{R}^d)$ and $f, \partial_j f \in L^1(\mathbb{R}^d)$. Then
\[
\widehat{\partial_j f}(t) = \int_{\mathbb{R}^d} \frac{\partial f}{\partial x_j}(x) e^{-2\pi i x \cdot t} \, dx.
\]
By Fubini's theorem we may evaluate this integral by first integrating over $x_j$. Assuming as we may that this one-dimensional integral is finite, we apply integration by parts to obtain
\begin{align*}
\int_{\mathbb{R}} \frac{\partial f}{\partial x_j}(x) e^{-2\pi i x \cdot t} \, dx_j &= \lim_{M \to \infty} \Bigl[ f(x) e^{-2\pi i x \cdot t} \Bigr]_{x_j = -M}^{x_j = M} - \lim_{M \to \infty} \int_{-M}^{M} f(x) (-2\pi i t_j) e^{-2\pi i x \cdot t} \, dx_j \\
&= 2\pi i t_j \int_{\mathbb{R}} f(x) e^{-2\pi i x \cdot t} \, dx_j
\end{align*}
by all of our assumptions on $f$. Integrating over the remaining variables this gives the second formula in the proposition in the case $\alpha = e_j$ and the general case now follows by induction. □

Proposition 9.43 says in particular that a smooth function in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ whose derivatives also lie in $C_0(\mathbb{R}^d) \cap L^1(\mathbb{R}^d)$ (for example, any element of $C_c^\infty(\mathbb{R}^d)$) has a Fourier transform with superpolynomial decay. That is, the Fourier transform $\widehat{f}$ multiplied by any polynomial is bounded (and still decays). Similarly, a function that has superpolynomial decay has a smooth Fourier transform. Given these observations, the next definition describes a natural class of functions invariant under differentiation and under Fourier transforms.

Definition 9.44. The Schwartz space on $\mathbb{R}^d$ is defined by
\[
\mathcal{S}(\mathbb{R}^d) = \bigl\{ f : \mathbb{R}^d \to \mathbb{C} \mid f \text{ is smooth and } \|x^\alpha \partial_\beta f\|_\infty < \infty \text{ for all } \alpha, \beta \in \mathbb{N}_0^d \bigr\}.
\]
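Both formulas of Proposition 9.43 can be tested numerically on the simplest element of the Schwartz space, the Gaussian in dimension $d = 1$. The sketch below is an illustration and not part of the text; the helper `fourier_transform`, the step `delta` of the central difference, and the sample frequencies are assumptions chosen here.

```python
import cmath
import math

def fourier_transform(f, t, half_width=8.0, steps=1600):
    """Midpoint-rule approximation of (9.6) for d = 1 on a truncated interval."""
    h = 2.0 * half_width / steps
    total = 0j
    for k in range(steps):
        x = -half_width + (k + 0.5) * h
        total += f(x) * cmath.exp(-2j * math.pi * x * t)
    return total * h

def f(x):
    return math.exp(-math.pi * x * x)

def f_prime(x):
    return -2.0 * math.pi * x * math.exp(-math.pi * x * x)

delta = 1e-4  # step of the central difference approximating d/dt
first_error = 0.0
second_error = 0.0
for t in (-0.8, 0.0, 0.4, 1.2):
    # First formula: the derivative of f^ is the transform of (-2 pi i x) f(x).
    derivative = (fourier_transform(f, t + delta)
                  - fourier_transform(f, t - delta)) / (2.0 * delta)
    transform = fourier_transform(lambda x: -2j * math.pi * x * f(x), t)
    first_error = max(first_error, abs(derivative - transform))
    # Second formula: the transform of f' is (2 pi i t) f^(t).
    lhs = fourier_transform(f_prime, t)
    rhs = 2j * math.pi * t * fourier_transform(f, t)
    second_error = max(second_error, abs(lhs - rhs))

print(first_error, second_error)
```

The first error is limited by the finite-difference approximation rather than by the quadrature, so it is expected to be noticeably larger than the second.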

The following exercises describe the main properties of $\mathcal{S}(\mathbb{R}^d)$ and of the Fourier transform on $\mathcal{S}(\mathbb{R}^d)$.

Essential Exercise 9.45. (a) Show that $\mathcal{S}(\mathbb{R}^d)$ is a Fréchet space (see Definition 8.65) with the seminorms $\|f\|_{\alpha, \beta} = \|x^\alpha \partial_\beta f\|_\infty$ for $f \in \mathcal{S}(\mathbb{R}^d)$ and $\alpha, \beta \in \mathbb{N}_0^d$.
(b) If the seminorms $\|f\|'_{\alpha, \beta} = \|\partial_\beta (x^\alpha f(x))\|_\infty$ are used instead, do you get the same Fréchet space?
(c) What happens if we replace the supremum norms by 1-norms or 2-norms?
(d) Show that $\mathcal{S}(\mathbb{R}^d) \subseteq L^p(\mathbb{R}^d)$ for all $p \in [1, \infty]$.


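Essential Exercise 9.46. Show that the Fourier transform $f \mapsto \widehat{f}$ maps $\mathcal{S}(\mathbb{R}^d)$ to itself, is a continuous operator, and has the Fourier back transform $f \mapsto \check{f}$ as its continuous inverse.

Exercise 9.47. Prove the Poisson summation formula:
\[
\sum_{n \in \mathbb{Z}^d} f(n) = \sum_{n \in \mathbb{Z}^d} \widehat{f}(n)
\]
for $f \in \mathcal{S}(\mathbb{R}^d)$.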
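Before attempting the proof, the Poisson summation formula can be tested on a family where both sides are explicit. For $f(x) = e^{-\pi a x^2}$ with $a > 0$ and $d = 1$, Example 9.27 and Proposition 9.30 give $\widehat{f}(t) = a^{-1/2} e^{-\pi t^2 / a}$. The sketch below is a numerical illustration only, not part of the text; the function names, the cutoff, and the sample values of `a` are assumptions chosen here.

```python
import math

def sum_of_f(a, cutoff=60):
    """Sum of f(n) = exp(-pi a n^2) over |n| <= cutoff."""
    return sum(math.exp(-math.pi * a * n * n) for n in range(-cutoff, cutoff + 1))

def sum_of_f_hat(a, cutoff=60):
    """Sum of f^(n) = a^(-1/2) exp(-pi n^2 / a) over |n| <= cutoff."""
    return sum(math.exp(-math.pi * n * n / a) / math.sqrt(a)
               for n in range(-cutoff, cutoff + 1))

differences = []
for a in (0.3, 1.0, 2.5):
    left, right = sum_of_f(a), sum_of_f_hat(a)
    differences.append(abs(left - right))
    print(f"a = {a}: sum f(n) = {left:.12f}, sum f^(n) = {right:.12f}")
```

The terms beyond the cutoff underflow to zero in floating point, so truncating the sums does not affect the comparison.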

We will use the following in Chapters 11 and 14.

Essential Exercise 9.48. Show that $\widehat{C_c^\infty(\mathbb{R})}$ is a dense subspace of $L^p(\mathbb{R})$ for any $p \in [1, \infty)$.

9.2.4 The Uncertainty Principle

The Fourier transform viewed as a homeomorphism $\mathcal{S}(\mathbb{R}) \to \mathcal{S}(\mathbb{R})$ (see Exercise 9.46) has multiple physical interpretations:
• A function $f \in \mathcal{S}(\mathbb{R})$ with $\|f\|_2 = 1$ may describe the probability distribution of the position of a particle, so that the probability of the particle being in $B \subseteq \mathbb{R}$ is $\int_B |f(x)|^2 \, dx$. In this case $\widehat{f}$ gives a probability distribution of the momentum of the particle. Here Heisenberg's uncertainty principle from quantum mechanics states that if the position distribution $f$ is strongly localized at one position, then the momentum distribution $\widehat{f}$ is forced to be spread out over a big range.
• A function $f \in \mathcal{S}(\mathbb{R})$ may describe a sound, in which case $\widehat{f}$ describes the frequencies present. Sampling $f$ for a short time may be thought of as localizing $f$ by multiplying it by a function in $\mathcal{S}(\mathbb{R})$ supported on a short interval. It is intuitively clear that if the sampling interval is very short, then we cannot get too much information about the frequencies present: $f$ and $\widehat{f}$ cannot both be strongly localized.

It turns out that both of these observations are a consequence of the Cauchy–Schwarz inequality, and they have a convenient precise formulation as follows.(28) As before, we write $M_I(f)(x) = x f(x)$ for $x \in \mathbb{R}^d$ and $f \in L^2(\mathbb{R}^d)$. Note that if we think of $f \in L^2(\mathbb{R})$ with $\|f\|_2 = 1$ as giving us a probability distribution for the position of a particle, then $\|M_I(f)\|_2^2 = \int_{\mathbb{R}} |x|^2 |f(x)|^2 \, dx$ gives the expectation of the squared distance to the origin. If that expectation were to be small, then $f$ would certainly be quite localized near the origin. Similarly, $\|M_I(\widehat{f})\|_2$ measures how much $\widehat{f}$ is localized at zero momentum (see also Exercise 9.51 for other positions and momenta).

Theorem 9.49 (Uncertainty principle). For $f \in \mathcal{S}(\mathbb{R})$ we have
\[
\|M_I(f)\|_2 \, \|M_I(\widehat{f})\|_2 \geq \frac{1}{4\pi} \|f\|_2^2. \tag{9.11}
\]


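Proof. First notice that for $z, w \in \mathbb{C}$ we have
\[
z \overline{w} + \overline{z} w = 2 \Re(z \overline{w}). \tag{9.12}
\]
Then
\begin{align*}
\|f\|_2^2 = \int_{\mathbb{R}} |f(x)|^2 \, dx &= \int_{\mathbb{R}} f(x) \overline{f(x)} \, dx \\
&= \underbrace{\Bigl[ x |f(x)|^2 \Bigr]_{-\infty}^{\infty}}_{= 0 \text{ as } f \in \mathcal{S}(\mathbb{R})} - \int_{\mathbb{R}} x \bigl( f'(x) \overline{f(x)} + f(x) \overline{f'(x)} \bigr) \, dx \\
&= -2 \int_{\mathbb{R}} x \, \Re\bigl( f(x) \overline{f'(x)} \bigr) \, dx. && \text{(by (9.12))}
\end{align*}
Hence
\[
\|f\|_2^2 = -2 \int_{\mathbb{R}} \Re\bigl( x f(x) \overline{f'(x)} \bigr) \, dx \leq 2 \|M_I(f)\|_2 \|f'\|_2
\]
by the Cauchy–Schwarz inequality. Finally we may apply (a very special case of) Proposition 9.43 to obtain
\begin{align*}
\|f'\|_2 &= \|\widehat{f'}\|_2 && \text{(by Theorem 9.39)} \\
&= 2\pi \|M_I(\widehat{f})\|_2, && \text{(by Prop. 9.43)}
\end{align*}
so that
\[
\|f\|_2^2 \leq 4\pi \|M_I(f)\|_2 \|M_I(\widehat{f})\|_2,
\]
as claimed in the theorem. □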
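The inequality (9.11) can be explored numerically without computing $\widehat{f}$, since the proof shows $\|M_I(\widehat{f})\|_2 = \|f'\|_2 / (2\pi)$. The sketch below is an illustration and not part of the text: the helper names, the two test functions (with derivatives entered by hand), and the quadrature parameters are assumptions chosen here. It computes the quotient of the two sides of (9.11), which should be $1$ for the Gaussian and larger than $1$ otherwise.

```python
import math

def l2_norm(g, half_width=8.0, steps=1600):
    """Midpoint-rule approximation of the L^2 norm of g on the real line."""
    h = 2.0 * half_width / steps
    total = sum(abs(g(-half_width + (k + 0.5) * h)) ** 2 for k in range(steps))
    return math.sqrt(total * h)

def uncertainty_ratio(f, f_prime):
    """Left-hand side of (9.11) divided by its right-hand side."""
    position = l2_norm(lambda x: x * f(x))          # ||M_I(f)||_2
    momentum = l2_norm(f_prime) / (2.0 * math.pi)   # ||M_I(f^)||_2, as in the proof
    return position * momentum / (l2_norm(f) ** 2 / (4.0 * math.pi))

def gauss(x):
    return math.exp(-math.pi * x * x)

def gauss_prime(x):
    return -2.0 * math.pi * x * math.exp(-math.pi * x * x)

def odd(x):
    return x * math.exp(-math.pi * x * x)

def odd_prime(x):
    return (1.0 - 2.0 * math.pi * x * x) * math.exp(-math.pi * x * x)

gauss_ratio = uncertainty_ratio(gauss, gauss_prime)
odd_ratio = uncertainty_ratio(odd, odd_prime)
print(gauss_ratio)  # approximately 1: equality in (9.11)
print(odd_ratio)    # approximately 3: strict inequality
```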



Exercise 9.50. Show that if for some $f \in \mathcal{S}(\mathbb{R})$ we have equality in (9.11) then $f$ has the form $f(x) = A e^{-B^2 x^2}$ for constants $A \in \mathbb{C}$ and $B > 0$.

Exercise 9.51. Extend Theorem 9.49 by showing that for any $f \in \mathcal{S}(\mathbb{R})$ and $x_0, t_0$ in $\mathbb{R}$ we have
\[
\|M_{I - x_0}(f)\|_2 \, \|M_{I - t_0}(\widehat{f})\|_2 \geq \frac{1}{4\pi} \|f\|_2^2,
\]
and that equality holds only if $f(x) = A e^{2\pi i x t_0} e^{-B^2 (x - x_0)^2}$ for constants $A \in \mathbb{C}$ and $B > 0$.

We note that in Exercise 3.50 we saw another instance of an uncertainty principle for finite abelian groups.


9.3 Spectral Theory of Unitary Flows

Recall from Definition 3.73 the definition of a unitary representation of a topological group $G$. In the case $G = \mathbb{R}$, or more generally $G = \mathbb{R}^d$, we will also refer to it as a unitary flow. The important notion of a positive-definite sequence generalizes easily to this setting.

9.3.1 Positive-Definite Functions and Cyclic Representations

Definition 9.52. Let $G$ be a topological group. A continuous function $p : G \to \mathbb{C}$ is called a continuous positive-definite function if for any choice of constants $c_1, \ldots, c_\ell \in \mathbb{C}$ and $g_1, \ldots, g_\ell \in G$ we have
\[
\sum_{m, n = 1}^{\ell} c_m \overline{c_n} \, p(g_n^{-1} g_m) \geq 0.
\]

Just as in Section 9.1, this notion is intimately connected to cyclic representations of the group (Definition 9.7).

Lemma 9.53 (Matrix coefficients). Let $G$ be a topological group and assume that $\pi : G \curvearrowright \mathcal{H}$ is a unitary representation of $G$ on $\mathcal{H}$. For any $v \in \mathcal{H}$ the function $p_{\pi, v} : G \to \mathbb{C}$ defined by $p_{\pi, v}(g) = \langle \pi_g v, v \rangle$ for $g \in G$, also known as the principal matrix coefficient of $v$, is a continuous positive-definite function. Moreover, the function $p_{\pi, v}$ uniquely characterizes the cyclic representation $\mathcal{H}_v$ generated by the element $v$. More precisely, if $\pi' : G \curvearrowright \mathcal{H}'$ is another unitary representation and there is some $v' \in \mathcal{H}'$ with $p_{\pi, v} = p_{\pi', v'}$ then there is a unitary isomorphism $\Psi : \mathcal{H}_v \to \mathcal{H}'_{v'}$ with $\Psi(v) = v'$ and $\Psi \circ \pi_g = \pi'_g \circ \Psi$ for all $g \in G$.

If there is a unitary isomorphism with the properties of $\Psi$ in the lemma, then we say that the representations $\pi$ and $\pi'$ are unitarily isomorphic.

Proof of Lemma 9.53. The argument is essentially the same as that used in the sequence case (see the justification for Example 9.5 and the proof of Corollary 9.8), so we will be brief. Let $c_1, \ldots, c_\ell \in \mathbb{C}$ and $g_1, \ldots, g_\ell \in G$. Then
\begin{align*}
\sum_{m, n = 1}^{\ell} c_m \overline{c_n} \, p_{\pi, v}(g_n^{-1} g_m) &= \sum_{m, n = 1}^{\ell} c_m \overline{c_n} \langle \pi_{g_m} v, \pi_{g_n} v \rangle \\
&= \Bigl\langle \sum_{m = 1}^{\ell} c_m \pi_{g_m} v, \sum_{n = 1}^{\ell} c_n \pi_{g_n} v \Bigr\rangle \geq 0,
\end{align*}
which gives the first claim of the lemma.


Now suppose that $\pi'$, $\mathcal{H}'$, and $v'$ have the properties stated in the lemma, namely $p_{\pi, v} = p_{\pi', v'}$. Then the elements
\[
\sum_{m = 1}^{\ell} c_m \pi_{g_m} v \in \mathcal{H}_v \quad \text{and} \quad \sum_{m = 1}^{\ell} c_m \pi'_{g_m} v' \in \mathcal{H}'_{v'}
\]
have the same norms in their respective Hilbert spaces as both norms can be expressed as above in terms of the positive-definite function $p$. We can define $\Psi$ on a dense subset of $\mathcal{H}_v$ by setting
\[
\Psi\Bigl( \sum_{m = 1}^{\ell} c_m \pi_{g_m} v \Bigr) = \sum_{m = 1}^{\ell} c_m \pi'_{g_m} v',
\]
and this is a well-defined isometry mapping from a dense subset of $\mathcal{H}_v$ onto a dense subset of $\mathcal{H}'_{v'}$. The lemma follows by taking the automatic continuous extension from Proposition 2.59. □

Using only the definition we can prove the following elementary properties of positive-definite functions.

Lemma 9.54 (Properties of positive-definite functions). Let $G$ be a topological group and let $p : G \to \mathbb{C}$ be a continuous positive-definite function on $G$. Then $p(g^{-1}) = \overline{p(g)}$ for all $g \in G$ and $\|p\|_\infty = p(e)$.

Proof. Applying the defining property for a positive-definite function with the choices $g_1 = e$, $c_1 = 1$, $g_2 = g \in G$, and $c_2 = \alpha \in \mathbb{C}$, we obtain
\[
p(e) + |\alpha|^2 p(e) + \alpha p(g) + \overline{\alpha} p(g^{-1}) \geq 0.
\]
As this holds for all $\alpha \in \mathbb{C}$, we may set $\alpha = 0$ and see that $p(e) \geq 0$. Now use both $\alpha = 1$ and $\alpha = i$ to see that $p(g^{-1}) = \overline{p(g)}$. Finally, if $p(g) \neq 0$, we may set $\alpha = -|p(g)| / p(g)$ to see that $2p(e) - 2|p(g)| \geq 0$. It follows that $|p(g)| \leq p(e)$ (which also holds if $p(g) = 0$), with equality for $g = e$, giving the lemma. □

Exercise 9.55. Let $G$ be a topological group.
(a) Show that any positive-definite function $p$ on $G$ has the form $p = p_{\pi, v}$ for some cyclic unitary representation $\pi$ on a Hilbert space $\mathcal{H}$ and generator $v \in \mathcal{H}$.
(b) Show that $\mathcal{P}_1(G) = \{ p \in C_b(G) \mid p \text{ is positive-definite and } p(e) = 1 \}$ is convex, and show that if $p \in \mathcal{P}_1(G)$ is extreme then the associated unitary representation in (a) is irreducible (that is, has no non-trivial proper $\pi$-invariant closed subspaces).

The converse of the statement in Exercise 9.55(b) also holds, and will be shown later (see Exercise 12.59).


9.3.2 The Case $G = \mathbb{R}^d$

The following describes all positive-definite functions for $\mathbb{R}^d$ and hence once again all cyclic representations of $\mathbb{R}^d$.

Theorem 9.56 (Bochner's theorem for $\mathbb{R}^d$). Let $d \geq 1$ and suppose that $p\colon \mathbb{R}^d \to \mathbb{C}$ is a continuous positive-definite function. Then there exists a uniquely determined finite measure $\mu_p$ on $\mathbb{R}^d$ satisfying
\[ p(x) = \int_{\mathbb{R}^d} e^{2\pi i x\cdot t}\, d\mu_p(t) \]
for all $x \in \mathbb{R}^d$.

We note that this means that $p(x) = \langle M_{\chi(x)} \mathbb{1}, \mathbb{1} \rangle_{L^2(\mathbb{R}^d,\mu_p)}$, where $M_{\chi(x)}$ is defined by $M_{\chi(x)}(f)(t) = e^{2\pi i x\cdot t} f(t)$ for all $f \in L^2(\mathbb{R}^d, \mu_p)$ and $t \in \mathbb{R}^d$.
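For a measure with finitely many atoms the easy direction of Bochner's theorem can be written out explicitly: if $\mu = \sum_k w_k \delta_{t_k}$ with $w_k > 0$, then $\sum_{i,j} c_i \overline{c_j}\, p(x_i - x_j) = \sum_k w_k \bigl|\sum_i c_i e^{2\pi i x_i t_k}\bigr|^2 = \bigl\|\sum_i c_i M_{\chi(x_i)} \mathbb{1}\bigr\|^2_{L^2(\mu)}$. The sketch below checks this identity numerically for d = 1; the atomic measure, the function names, and the convention $c_i \overline{c_j}\, p(x_i - x_j)$ for the quadratic form are assumptions made for illustration.

```python
import cmath
import math


def bochner_p(atoms):
    """p(x) = sum_k w_k * exp(2*pi*i*x*t_k) for atoms given as (t_k, w_k)."""
    def p(x):
        return sum(w * cmath.exp(2j * math.pi * x * t) for t, w in atoms)
    return p


def quadratic_form(p, xs, cs):
    """Return sum_{i,j} c_i * conj(c_j) * p(x_i - x_j)."""
    return sum(ci * cj.conjugate() * p(xi - xj)
               for xi, ci in zip(xs, cs)
               for xj, cj in zip(xs, cs))


def spectral_side(atoms, xs, cs):
    """Squared L^2(mu)-norm of the function t -> sum_i c_i exp(2*pi*i*x_i*t)."""
    return sum(
        w * abs(sum(c * cmath.exp(2j * math.pi * x * t) for x, c in zip(xs, cs))) ** 2
        for t, w in atoms
    )


# Hypothetical example data: three atoms with total mass 3.
ATOMS = [(-1.0, 0.5), (0.25, 1.5), (2.0, 1.0)]
XS = [0.0, 0.4, -1.3, 2.2]
CS = [1 + 0j, -0.5 + 2j, 0.3 - 0.1j, 1j]
```

The right-hand side is manifestly non-negative, so the identity shows that every such p is positive-definite; the substance of Theorem 9.56 is the converse.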

Essential Exercise 9.57. Let µ be a finite measure on $\mathbb{R}^d$ with $d \geq 1$. Show that $\mathbb{R}^d \ni x \mapsto M_{\chi(x)}$ defines a unitary representation of $\mathbb{R}^d$ on $L^2(\mathbb{R}^d, \mu)$ (and, in particular, also satisfies the continuity requirement of a unitary representation in Definition 3.73).

We postpone the proof of Bochner's theorem and first discuss one of its corollaries, the spectral theorem.

Theorem 9.58 (Spectral theorem for $\mathbb{R}^d$). Let $d \geq 1$ and suppose that $\pi\colon \mathbb{R}^d \circlearrowright \mathcal{H}$ is a unitary representation on a separable complex Hilbert space $\mathcal{H}$. Then there is a decomposition $\mathcal{H} = \bigoplus_{n \geq 1} \mathcal{H}_{v_n}$ for some sequence $(v_n)$ in $\mathcal{H}$. Moreover, for every $v \in \mathcal{H}$ the unitary representation $\pi\colon \mathbb{R}^d \circlearrowright \mathcal{H}_v$ is unitarily isomorphic to the unitary representation $M_{\chi(x)}$ on $L^2(\mathbb{R}^d, \mu_v)$, where $\mu_v$ is the spectral measure of $v \in \mathcal{H}$ (obtained from $p_{\pi,v}$ and Theorem 9.56) and $M_{\chi(x)}$ is the unitary multiplication operator on $L^2(\mathbb{R}^d, \mu_v)$, as above.

Proof. The argument after Definition 9.7 shows that $\mathcal{H}$ can be written as an orthogonal direct sum of cyclic representations $\mathcal{H}_{v_n}$ for some vectors $v_1, v_2, \ldots \in \mathcal{H}$. We apply Bochner's theorem (Theorem 9.56) to a cyclic representation, say $\mathcal{H}_v$ for $v \in \mathcal{H}$, to find the spectral measure $\mu = \mu_v$. Lemma 9.53, the comment after Theorem 9.56, and Exercise 9.57 show that the cyclic representation is isomorphic to the cyclic representation generated by $\mathbb{1}$ inside $L^2(\mathbb{R}^d, \mu)$. It remains to show that this representation is all of the space $\mathcal{H}' = L^2(\mathbb{R}^d, \mu)$. Suppose therefore that $f \in L^2(\mathbb{R}^d, \mu)$ belongs to the orthogonal complement of $\mathcal{H}'_{\mathbb{1}}$, so that
\[ \int_{\mathbb{R}^d} f(t) e^{2\pi i x\cdot t}\, d\mu(t) = \langle f, M_{\chi(-x)} \mathbb{1} \rangle = 0 \]
for all $x \in \mathbb{R}^d$. Let $g \in \mathscr{S}(\mathbb{R}^d)$. Since $(\hat{g})^{\vee} = g$ and $\hat{g} \in L^1(\mathbb{R}^d)$, we obtain
\begin{align*}
\langle f, \overline{g} \rangle_{L^2(\mu)} = \int_{\mathbb{R}^d} f g \, d\mu &= \int_{\mathbb{R}^d} f\, (\hat{g})^{\vee} \, d\mu \\
&= \int_{\mathbb{R}^d} f(t) \int_{\mathbb{R}^d} \hat{g}(x) e^{2\pi i x\cdot t} \, dx \, d\mu(t) \\
&= \int_{\mathbb{R}^d} \hat{g}(x) \int_{\mathbb{R}^d} f(t) e^{2\pi i x\cdot t} \, d\mu(t) \, dx = 0
\end{align*}
by Fubini's theorem. Recalling that $\mathscr{S}(\mathbb{R}^d) \supseteq C_c^\infty(\mathbb{R}^d)$ is dense in $L^2_\mu(\mathbb{R}^d)$, we see that $f = 0$. Since this holds for all $f \in (\mathcal{H}'_{\mathbb{1}})^{\perp}$ it follows that $\mathcal{H}'_{\mathbb{1}} = \mathcal{H}' = L^2(\mathbb{R}^d, \mu)$, as required. □

For the proof of Bochner's theorem it will be convenient to reformulate the defining property of positive-definite functions in terms of convolution as in the next lemma.

Lemma 9.59 (Positive-definite functions and convolutions). Assume that $d \geq 1$ and suppose $p\colon \mathbb{R}^d \to \mathbb{C}$ is a continuous positive-definite function on $\mathbb{R}^d$. For $f \in L^1(\mathbb{R}^d)$, define $\tilde{f} \in L^1(\mathbb{R}^d)$ by $\tilde{f}(x) = \overline{f(-x)}$ for $x \in \mathbb{R}^d$. Then
\[ \int_{\mathbb{R}^d} \bigl(f * \tilde{f}\bigr)\, p \, dx \geq 0. \]

Proof. We first suppose that $f \in C_c(\mathbb{R}^d)$ has $\operatorname{Supp}(f) \subseteq [-M, M]^d$ for some $M > 0$. In the following we write $\{P_1, \ldots, P_n\}$ for a partition of $[-M, M]^d$ into squares and assume $x_i \in P_i$ for $i = 1, \ldots, n$. Then
\begin{align*}
\int_{\mathbb{R}^d} \bigl(f * \tilde{f}\bigr)\, p \, dx &= \int_{\mathbb{R}^d} \int_{[-M,M]^d} f(y) \overline{f(\underbrace{y - x}_{=z})}\, p(x) \, dy \, dx \\
&= \int_{[-M,M]^d} \int_{[-M,M]^d} f(y) \overline{f(z)}\, p(y - z) \, dy \, dz \\
&= \lim \sum_{i,j=1}^{n} f(x_i) m(P_i)\, \overline{f(x_j)} m(P_j)\, p(x_i - x_j)
\end{align*}
is a limit of Riemann sums of the form in Definition 9.52, so $\int \bigl(f * \tilde{f}\bigr) p \, dx \geq 0$ whenever $f \in C_c(\mathbb{R}^d)$. Approximating an arbitrary function $f \in L^1(\mathbb{R}^d)$ by such functions (using the continuity of a product in a Banach algebra from Proposition 9.31) gives the result. □

Proof of Theorem 9.56. Let $p\colon \mathbb{R}^d \to \mathbb{C}$ be a continuous positive-definite function. Let us first prove the uniqueness and assume $\mu_p$ is as in the theorem. For $f \in \mathscr{S}(\mathbb{R}^d)$ we then have

\[ \int_{\mathbb{R}^d} f \, d\mu_p = \int_{\mathbb{R}^d} (\hat{f})^{\vee} \, d\mu_p = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} \hat{f}(x) e^{2\pi i x\cdot t} \, dx \, d\mu_p(t) = \int_{\mathbb{R}^d} \hat{f}(x) p(x) \, dx \]
by Fubini's theorem. In particular, we see that p determines $\int_{\mathbb{R}^d} f \, d\mu_p$ uniquely for every $f \in C_c^\infty(\mathbb{R}^d) \subseteq \mathscr{S}(\mathbb{R}^d)$. Since $C_c^\infty(\mathbb{R}^d)$ is dense in $C_0(\mathbb{R}^d)$ (which contains $C_c(\mathbb{R}^d)$) it follows that p determines $\mu_p$ uniquely by the Riesz representation theorem (Theorem 7.44).

We now turn to the existence. We will obtain the measure $\mu = \mu_p$ by defining a positive functional on $C_0(\mathbb{R}^d)$, whose properties will be proved in several steps.

First step: Definition on the Schwartz space. We initially define the functional on $\mathscr{S}(\mathbb{R}^d)$ (which only contains rapidly-decaying smooth functions). For f in $\mathscr{S}(\mathbb{R}^d)$ let
\[ \Lambda(f) = \int_{\mathbb{R}^d} \hat{f} p \, dx = \int_{\mathbb{R}^d} \int_{\mathbb{R}^d} f(t) e^{-2\pi i x\cdot t} \, dt \; p(x) \, dx, \]
which is well-defined since $\hat{f} \in L^1(\mathbb{R}^d)$ and $\|p\|_\infty = p(0) < \infty$.

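Second step: Positivity. The assumption on p immediately gives a certain amount of positivity. Suppose $f \in \mathscr{S}(\mathbb{R}^d)$ and note that
\[ \hat{\overline{f}}(x) = \int_{\mathbb{R}^d} \overline{f(t)}\, e^{-2\pi i x\cdot t} \, dt = \overline{\int_{\mathbb{R}^d} f(t) e^{2\pi i x\cdot t} \, dt} = \widetilde{\hat{f}}(x) \]
(see Lemma 9.59 for the definition of $\tilde{f}$) and that $|f|^2 \in \mathscr{S}(\mathbb{R}^d)$. By the duality of multiplication and convolution (Corollary 9.40) we obtain
\[ \widehat{|f|^2} = \hat{f} * \hat{\overline{f}} = \hat{f} * \widetilde{\hat{f}}, \]
and so
\[ \Lambda(|f|^2) = \int_{\mathbb{R}^d} \bigl(\hat{f} * \widetilde{\hat{f}}\bigr)\, p \, dx \geq 0 \]
by Lemma 9.59. We wish to upgrade the above positivity statement to say that $f \geq 0$ and $f \in C_c^\infty(\mathbb{R}^d)$ implies $\Lambda(f) \geq 0$. So let $f \in C_c^\infty(\mathbb{R}^d)$ be non-negative, and define
\[ h_\varepsilon(t) = \sqrt{f(t) + \varepsilon e^{-\pi \|t\|^2}}. \]
Notice that $h_\varepsilon \in \mathscr{S}(\mathbb{R}^d)$ so that
\[ \Lambda(f) + \varepsilon \Lambda\bigl(e^{-\pi \|t\|^2}\bigr) = \Lambda\bigl(|h_\varepsilon|^2\bigr) \geq 0 \]
for all $\varepsilon > 0$, and hence $\Lambda(f) \geq 0$.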

Third step: Boundedness with respect to $\|\cdot\|_\infty$. Next we wish to obtain an estimate for $\Lambda(f)$ for any $f \in C_c^\infty(\mathbb{R}^d)$ that only depends on $\|f\|_\infty$.

We assume first that $f \in C_c^\infty(\mathbb{R}^d)$ is real-valued, and fix $\varepsilon > 0$. Then, for sufficiently large $a > 0$, we have $f(t) < (1 + \varepsilon)\|f\|_\infty e^{-\pi \|t/a\|^2}$ for all $t \in \mathbb{R}^d$, and similarly for $-f$. Therefore, we have $(1 + \varepsilon)\|f\|_\infty e^{-\pi \|t/a\|^2} - f(t) = |h|^2$, where
\[ h(t) = \sqrt{(1 + \varepsilon)\|f\|_\infty e^{-\pi \|t/a\|^2} - f(t)}, \]
and $h \in \mathscr{S}(\mathbb{R}^d)$, which as above gives the inequality
\[ (1 + \varepsilon)\|f\|_\infty \Lambda\bigl(e^{-\pi \|t/a\|^2}\bigr) - \Lambda(f) \geq 0. \]
Note that $e^{-\pi \|t/a\|^2}$ also has a square root inside $\mathscr{S}(\mathbb{R}^d)$, which gives
\[ \Lambda\bigl(e^{-\pi \|t/a\|^2}\bigr) \geq 0 \]
and so $\Lambda(f) \in \mathbb{R}$ satisfies $\Lambda(f) \leq (1 + \varepsilon)\|f\|_\infty \Lambda\bigl(e^{-\pi \|t/a\|^2}\bigr)$. The same estimate also holds for $\Lambda(-f)$. However,
\[ \Lambda\bigl(e^{-\pi \|t/a\|^2}\bigr) = \int_{\mathbb{R}^d} \underbrace{a^d e^{-\pi \|ax\|^2}}_{\|\cdot\|_1 = 1}\, p(x) \, dx \leq \|p\|_\infty = p(0) \]
can be bounded independently of $a > 0$. It follows that
\[ |\Lambda(f)| \leq \|f\|_\infty\, p(0) \tag{9.13} \]
for any $\mathbb{R}$-valued $f \in C_c^\infty(\mathbb{R}^d)$.

If $f \in C_c^\infty(\mathbb{R}^d)$ is $\mathbb{C}$-valued let $\alpha \in \mathbb{C}$ have absolute value one and satisfy $|\Lambda(f)| = \alpha \Lambda(f)$. Now note $|\Lambda(f)| = \Lambda(\alpha f) = \Lambda(\Re(\alpha f))$ (since Λ maps $\mathbb{R}$-valued functions into $\mathbb{R}$) and apply (9.13) for $\Re(\alpha f)$ to obtain
\[ |\Lambda(f)| = |\Lambda(\Re(\alpha f))| \leq \|f\|_\infty\, p(0), \]
showing that (9.13) holds for all $f \in C_c^\infty(\mathbb{R}^d)$.

Last step: Existence and properties of the measure. By the previous step the functional $\Lambda|_{C_c^\infty(\mathbb{R}^d)}$ extends continuously to a functional $\Lambda_0$ on $C_0(\mathbb{R}^d)$. Furthermore, every non-negative function in $C_0(\mathbb{R}^d)$ can be approximated with respect to the supremum norm by non-negative functions in $C_c^\infty(\mathbb{R}^d)$. Hence the extension $\Lambda_0$ is a positive continuous linear functional on $C_0(\mathbb{R}^d)$ and so defines, by the Riesz representation theorem (Theorem 7.44 and Theorem 7.54), a finite positive measure µ on $\mathbb{R}^d$ satisfying
\[ \int_{\mathbb{R}^d} \hat{f} p \, dx = \Lambda(f) = \int_{\mathbb{R}^d} f \, d\mu \tag{9.14} \]
for all $f \in C_c^\infty(\mathbb{R}^d)$.

Fix some non-negative $h \in C_c^\infty(\mathbb{R}^d)$ with $\int_{\mathbb{R}^d} h \, dx = 1$ and notice that

\[ h_{r,x_0}(x) = r^{-d} h\bigl((x - x_0)/r\bigr) \]
approximates the δ-measure at $x_0$ as $r \to 0$. Concretely, since p is continuous we may apply dominated convergence and the substitution $y = (x - x_0)/r$ to see that
\begin{align*}
p(x_0) &= \lim_{r \to 0} \int_{\mathbb{R}^d} h(y) p(x_0 + ry) \, dy \\
&= \lim_{r \to 0} \int_{\mathbb{R}^d} r^{-d} h\bigl((x - x_0)/r\bigr) p(x) \, dx = \lim_{r \to 0} \int_{\mathbb{R}^d} h_{r,x_0}(x) p(x) \, dx. \tag{9.15}
\end{align*}
In order to be able to combine this with (9.14) we calculate the Fourier back transform of $h_{r,x_0}$,
\begin{align*}
f_{r,x_0}(t) = (h_{r,x_0})^{\vee}(t) &= \int_{\mathbb{R}^d} r^{-d} h\bigl((x - x_0)/r\bigr) e^{2\pi i x\cdot t} \, dx \\
&= \int_{\mathbb{R}^d} h(y) e^{2\pi i (x_0 + ry)\cdot t} \, dy = e^{2\pi i x_0\cdot t}\, \check{h}(rt),
\end{align*}
where we again used the substitution $y = (x - x_0)/r$. By Fourier inversion (Theorem 9.36) we also have $h_{r,x_0} = \widehat{f_{r,x_0}}$. With (9.14) and (9.15) we now obtain
\[ p(x_0) = \lim_{r \to 0} \Lambda(f_{r,x_0}) = \lim_{r \to 0} \int_{\mathbb{R}^d} e^{2\pi i x_0\cdot t}\, \check{h}(rt) \, d\mu(t) = \int_{\mathbb{R}^d} e^{2\pi i x_0\cdot t} \, d\mu(t) \]
by dominated convergence and since $\check{h}(0) = \int_{\mathbb{R}^d} h \, dx = 1$. □

We remark that the properties of spectral measures discussed in Section 9.1.3 hold in the setting of unitary flows for essentially the same reasons, but we will not pursue this here.

9.3.3 Stone's Theorem

We already saw a connection between self-adjoint operators and unitary operators in Exercise 9.10. The spectral theorem for unitary flows allows us to expand this connection, leading to a complete description of a unitary flow in terms of a 'potentially unbounded self-adjoint' operator. For simplicity we restrict to the case of a one-parameter unitary flow (that is, a unitary representation of $G = \mathbb{R}$).

Theorem 9.60 (Stone's theorem). Suppose that $\pi\colon \mathbb{R} \circlearrowright \mathcal{H}$ is a unitary representation of $\mathbb{R}$ on a separable complex Hilbert space $\mathcal{H}$. Then the subspace
\[ D = \Bigl\{ v \in \mathcal{H} \;\Big|\; \lim_{x \to 0} \frac{\pi_x v - v}{x} \text{ exists in } \mathcal{H} \Bigr\} \]
of differentiable vectors is dense in $\mathcal{H}$, and D is the natural domain of the closed operator
\[ D \ni v \longmapsto A(v) = \frac{1}{2\pi i} \lim_{x \to 0} \frac{\pi_x v - v}{x}. \]
Moreover, there exists an increasing sequence of closed, A-invariant, π-invariant subspaces $D_1 \subseteq D_2 \subseteq \cdots \subseteq D$ such that $\bigcup_{\ell \geq 1} D_\ell$ is dense in $\mathcal{H}$, $A|_{D_\ell}\colon D_\ell \to D_\ell$ is self-adjoint, $\|A|_{D_\ell}\| \leq \ell$, and $\pi_x|_{D_\ell} = \exp(2\pi i x A|_{D_\ell})$ is defined by a convergent power series for all $x \in \mathbb{R}$ and $\ell \geq 1$.

Proof. We will apply the spectral theorem (Theorem 9.2) for unitary flows to describe the unitary representation in terms of multiplication operators by scalars. As a slight simplification, we assume that $\mathcal{H} = \mathcal{H}_v$ is cyclic for some $v \in \mathcal{H}$ and refer to Exercise 9.62 for the general case. Then by the spectral theorem we have $\mathcal{H}_v \cong L^2_{\mu_v}(\mathbb{R})$, where v corresponds to $\mathbb{1}$ and $\pi_x$ corresponds to $M_{\chi(x)}$ for all $x \in \mathbb{R}$. As the isomorphism between $\mathcal{H}_v$ and $L^2_{\mu_v}(\mathbb{R})$ is unitary it maps convergent sequences to convergent sequences and hence also differentiable vectors (as in the definition of D) for π precisely to the differentiable vectors for $M_{\chi(\cdot)}$. In other words, it suffices to prove the theorem in the case where $\pi = M_{\chi(\cdot)}$ and $\mathcal{H} = L^2_\mu(\mathbb{R})$ for a finite measure µ on $\mathbb{R}$. We note that the spectral theorem provides a finite measure, but the proof stays the same if µ is only locally finite.

In this case, we claim that D is given by
\[ D = \{ f \in L^2_\mu(\mathbb{R}) \mid M_I(f) \in L^2_\mu(\mathbb{R}) \}. \]
Indeed, for $f \in L^2_\mu(\mathbb{R})$ we have
\[ \lim_{x \to 0} \Bigl( \frac{\pi_x f - f}{x} \Bigr)(t) = \lim_{x \to 0} \frac{e^{2\pi i x t} - 1}{x} f(t) = 2\pi i t f(t) \]
pointwise wherever $f(t)$ is defined. Therefore $f \in D$ forces $M_I(f) \in L^2_\mu(\mathbb{R})$, where $M_I(f)(t) = t f(t)$ for $t \in \mathbb{R}$.

For the other inclusion, note that $\bigl| \frac{e^{2\pi i x t} - 1}{x} \bigr| \leq 2\pi |t|$ by the mean value theorem for vector-valued differentiable functions. If now $f, M_I(f) \in L^2_\mu(\mathbb{R})$ then
\[ \Bigl\| \frac{\pi_x f - f}{x} - 2\pi i M_I(f) \Bigr\|_2^2 = \int \underbrace{\Bigl| \frac{e^{2\pi i x t} - 1}{x} f(t) - 2\pi i t f(t) \Bigr|^2}_{\leq (4\pi |t f(t)|)^2} \, d\mu(t) \]
converges to zero as $x \to 0$ by dominated convergence, which proves the claimed description of D and that $A = M_I$.

We now show that $M_I$ is a closed operator in the sense of Definition 4.27. So suppose that $(f_n, M_I(f_n)) \in \operatorname{Graph}(M_I)$ converges to $(f, g)$. Then (after taking a subsequence) we may assume that $f_n(t) \to f(t)$ almost everywhere and $t f_n(t) = M_I(f_n)(t) \to g(t)$ almost everywhere, and hence $g \in L^2_\mu(\mathbb{R})$ where $t f(t) = g(t)$ almost everywhere and so $f \in D$ and $M_I(f) = g$, as required.

The increasing sequence of closed subspaces can be defined by setting
\[ D_\ell = L^2_{\mu|_{[-\ell,\ell]}}([-\ell, \ell]) \subseteq L^2_\mu(\mathbb{R}) \]
for $\ell \geq 1$, where a function defined on $[-\ell, \ell]$ is extended to be defined on $\mathbb{R}$ by setting it equal to zero outside $[-\ell, \ell]$. These subspaces are clearly $M_{\chi(x)}$-invariant and $M_I$-invariant for all $x \in \mathbb{R}$. Moreover, $\bigcup_{\ell \geq 1} D_\ell$ contains all continuous functions of compact support and therefore is dense in $L^2_\mu(\mathbb{R})$ by Proposition 2.51. Finally, $M_I|_{D_\ell}\colon D_\ell \to D_\ell$ is self-adjoint, $\|M_I|_{D_\ell}\| \leq \ell$, and
\[ \exp(2\pi i x M_I|_{D_\ell}) = M_{\exp(2\pi i x I)}|_{D_\ell} = M_{\chi(x)}|_{D_\ell} \]
for all $x \in \mathbb{R}$ and $\ell \geq 1$. □

Exercise 9.61. Let $\mathcal{H}$ be a separable complex Hilbert space, let $D_1 \subseteq D_2 \subseteq \cdots$ be a sequence of closed subspaces such that $\bigcup_{\ell \geq 1} D_\ell$ is dense in $\mathcal{H}$, and let $A\colon \bigcup_{\ell \geq 1} D_\ell \to \mathcal{H}$ be a linear map such that $A(D_\ell) \subseteq D_\ell$, $\|A|_{D_\ell}\| \leq \ell$, and $A|_{D_\ell}\colon D_\ell \to D_\ell$ is self-adjoint for all $\ell \geq 1$. Show that there exists a uniquely defined unitary representation π of $\mathbb{R}$ on $\mathcal{H}$ such that $D_\ell$ is π-invariant and $\pi_x|_{D_\ell} = \exp(2\pi i x A|_{D_\ell})$ for all $x \in \mathbb{R}$ and $\ell \geq 1$.

Exercise 9.62. Prove Theorem 9.60 in the general case where $\mathcal{H} = \bigoplus_{n \geq 1} \mathcal{H}_{v_n}$ for some sequence of vectors $(v_n)$.

Exercise 9.63. Apply the results above to the unitary flow $(\rho_x f)(y) = f(y + x)$ for $x, y \in \mathbb{R}$ and $f \in L^2(\mathbb{R})$.
(a) Use the Fourier transform and the proof of Theorem 9.60 to show that
\[ D = \{ f \in L^2(\mathbb{R}) \mid t \longmapsto t \hat{f}(t) \text{ lies in } L^2(\mathbb{R}) \}. \]
(b) Show that $C_c^\infty(\mathbb{R}) \subseteq D$ and that $C_c^\infty(\mathbb{R})$ is dense in D when D is endowed with the norm in $\operatorname{Graph}(A)$, where A is defined as in Theorem 9.60.
(c) Show that $\operatorname{Graph}(A) = H_0^1(\mathbb{R})$ and that $A = \frac{1}{2\pi i} \partial_x$.
(d) Show moreover that $H^1(\mathbb{R}) = H_0^1(\mathbb{R})$.

Exercise 9.64. Generalize Theorem 9.60 to unitary representations of $\mathbb{R}^d$ by studying the space D of vectors for which all partial derivatives $\lim_{t \to 0} \frac{\pi_{t e_j} v - v}{t}$ exist.
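In the model case used in the proof of Theorem 9.60 everything can be computed by hand when µ has finitely many atoms, since then $L^2_\mu(\mathbb{R})$ is finite-dimensional and every vector is differentiable. The sketch below is such a toy model and not part of the text: the atoms `TS`, weights `WS`, vector `F`, and all function names are hypothetical. It checks that the difference quotient $(\pi_x f - f)/(2\pi i x)$ approaches $M_I f$, with the elementary error bound $\pi |x|\, \|t^2 f\|_2$ coming from $|e^{i\theta} - 1 - i\theta| \leq \theta^2/2$.

```python
import cmath
import math


def flow(ts, x, f):
    """(pi_x f)(t) = exp(2*pi*i*x*t) * f(t), evaluated on the atoms ts."""
    return [cmath.exp(2j * math.pi * x * t) * v for t, v in zip(ts, f)]


def generator(ts, f):
    """(M_I f)(t) = t * f(t)."""
    return [t * v for t, v in zip(ts, f)]


def difference_quotient(ts, x, f):
    """(pi_x f - f) / (2*pi*i*x)."""
    return [(a - b) / (2j * math.pi * x) for a, b in zip(flow(ts, x, f), f)]


def norm(ws, f):
    """L^2 norm with respect to the atomic measure with weights ws."""
    return math.sqrt(sum(w * abs(v) ** 2 for w, v in zip(ws, f)))


def generator_error(ts, ws, x, f):
    """|| (pi_x f - f)/(2*pi*i*x) - M_I f ||_2."""
    dq = difference_quotient(ts, x, f)
    af = generator(ts, f)
    return norm(ws, [a - b for a, b in zip(dq, af)])


# Hypothetical example: three atoms, so the Hilbert space is C^3.
TS = [-1.0, 0.5, 2.0]
WS = [0.2, 0.3, 0.5]
F = [1 + 0j, 1j, -2 + 0j]
```

Because all atoms lie in [-2, 2], this toy space coincides with $D_2$ in the notation of the theorem, and the generator is bounded by 2 there.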

9.4 Further Topics

• We will study unitary representations of more general topological groups in the next chapter (which in part will need the results regarding unitary flows of this chapter) and in Chapter 12.
• The theory of the Fourier transform and the Schwartz space (together with the results of Chapter 11) will be used in Chapter 14 to prove the prime number theorem.
• Our introduction to spectral theory will continue in Chapters 11, 12, and 13.

Chapter 10

Locally Compact Groups, Amenability, Property (T)

In this chapter we turn our attention to topological groups. We have seen the importance of the Haar measure for the theory of Fourier series already in Chapter 3. After proving the existence of the Haar measure(29) we study two different classes of groups. The notion of amenable group was already discussed in Chapter 7 but here we will drop the assumption of discreteness and discuss the property in greater detail. After this we will study groups with property (T); these are in some sense (see Exercise 10.35) opposite to amenable groups. Finally, we will link property (T) to the topic of expander graphs in Section 10.4.

10.1 Haar Measure

Using the Riesz representation theorem (Theorem 7.44) we now prove a version of the existence of Haar measures from p. 92. Throughout this section we will be working with real-valued functions.

Theorem 10.1 (Existence of Haar measure). Let G be a locally compact, σ-compact, metrizable group. Then there exists a (left) Haar measure $m_G$ on G: that is, there is a locally finite Borel measure $m_G$ (that is, a Radon measure) that is positive on non-empty open sets and satisfies $m_G(gB) = m_G(B)$ for every Borel measurable set $B \subseteq G$ and all $g \in G$.

Write $\lambda_g$ for the left regular representation of G on functions (or equivalence classes of functions) on G defined by $\lambda_g(f)(h) = f(g^{-1}h)$ for $g, h \in G$. This indeed defines a representation since
\[ \lambda_{g_1}(\lambda_{g_2}(f))(h) = \lambda_{g_2}(f)(g_1^{-1}h) = f(g_2^{-1}g_1^{-1}h) = \lambda_{g_1 g_2}(f)(h) \]
for all $g_1, g_2, h \in G$ and functions f on G.

© Springer International Publishing AG 2017 M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_10


Proof of Theorem 10.1. To motivate the following argument, fix some function $\varphi \in C_c^+(G) = \{f \in C_c(G) \mid f \geq 0\} \setminus \{0\}$ and think of it as a gauge† function. If now another non-trivial function f in $C_c^+(G)$ can be approximated (in a suitable sense) by sums of the form
\[ \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi, \]
then we would expect that the (yet to be defined) integral
\[ \int_G f \, dm_G \]
will be approximated by
\[ \sum_{j=1}^{n} c_j \int_G \varphi \, dm_G. \]
In particular, this would express an approximation to the integral of a general function in terms of the integral of a single chosen function. In order to follow this through it is clear that the gauge function will need to be allowed to vary. Roughly speaking, the more localized φ is, the more functions can be approximated in this way. As an extreme example, if G is compact then φ could be a constant and only the constant functions could be approximated (also see Figure 10.1). For that reason, we will fix throughout another function $f_0 \in C_c^+(G)$ and normalize all expressions so that the (yet to be constructed) Haar measure $m_G$ will satisfy $\int_G f_0 \, dm_G = 1$.

Hence we start the formal argument by defining for functions $\varphi, f \in C_c^+(G)$ the expression
\[ M(f : \varphi) = \inf \Bigl\{ \sum_{j=1}^{n} c_j \;\Big|\; f \leq \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi \text{ for some } c_1, \ldots, c_n > 0,\ g_1, \ldots, g_n \in G \Bigr\} \]
and the normalized quantity
\[ \Lambda_\varphi(f) = \frac{M(f : \varphi)}{M(f_0 : \varphi)}. \]

We may think of $\sum_{j=1}^{n} c_j \lambda_{g_j} \varphi$ as a φ-cover of f and of $\sum_{j=1}^{n} c_j$ as the total weight of the φ-cover. Notice that $\{g \in G \mid \varphi(g) > \frac{1}{2}\|\varphi\|_\infty\}$ is a non-empty open subset of G, and since $f \in C_c(G)$ has compact support it is easy to see that a cover of f as in the definition of $M(f : \varphi)$ exists, and so $M(f : \varphi)$ is a well-defined non-negative real number. Moreover, if $f_0 \leq \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi$ then $\|f_0\|_\infty \leq \sum_{j=1}^{n} c_j \|\varphi\|_\infty$ and so $M(f_0 : \varphi) \geq \|f_0\|_\infty \|\varphi\|_\infty^{-1} > 0$, which implies that $\Lambda_\varphi(f) \in \mathbb{R}_{\geq 0}$ is well-defined.

† The word 'gauge' means a fixed standard of measure like a ruler.

We collect a few immediate properties of $\Lambda_\varphi$ for a scalar $\alpha > 0$ and functions $f, f_1, f_2 \in C_c^+(G)$:

• (left-invariance) $\Lambda_\varphi(\lambda_g f) = \Lambda_\varphi(f)$;
• (positive homogeneity) $\Lambda_\varphi(\alpha f_1) = \alpha \Lambda_\varphi(f_1)$;
• (monotonicity) $\Lambda_\varphi(f_1) \leq \Lambda_\varphi(f_2)$ whenever $f_1 \leq f_2$; and
• (sub-additivity) $\Lambda_\varphi(f_1 + f_2) \leq \Lambda_\varphi(f_1) + \Lambda_\varphi(f_2)$.

These properties are immediate consequences of the definition of $M(f : \varphi)$ and standard properties of the infimum. For instance, if $f_1 \leq \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi$ and $f_2 \leq \sum_{k=1}^{m} d_k \lambda_{h_k} \varphi$ for some scalars $c_1, \ldots, c_n, d_1, \ldots, d_m > 0$ and group elements $g_1, \ldots, g_n, h_1, \ldots, h_m \in G$, then we obtain a φ-cover of $f_1 + f_2$ in the form $f_1 + f_2 \leq \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi + \sum_{k=1}^{m} d_k \lambda_{h_k} \varphi$ and so $M(f_1 + f_2 : \varphi)$ is bounded above by $\sum_{j=1}^{n} c_j + \sum_{k=1}^{m} d_k$. Since the φ-covers of $f_1$ and $f_2$ were arbitrary this implies that $M(f_1 + f_2 : \varphi) \leq M(f_1 : \varphi) + M(f_2 : \varphi)$, and the claimed sub-additivity follows after dividing by $M(f_0 : \varphi)$.

The main step in the argument is to upgrade the sub-additivity of $\Lambda_\varphi$ to an 'approximate additivity' property. For this we have to study not one gauge function but many (see Figure 10.1). To prepare for this, we first show that
\[ \begin{cases} M(f : \varphi) \leq M(f : f_0)\, M(f_0 : \varphi), \\ \Lambda_\varphi(f) \leq M(f : f_0) \end{cases} \tag{10.1} \]
whenever $\varphi, f \in C_c^+(G)$. Note that the second line follows from the first on dividing by $M(f_0 : \varphi)$. For the proof of the first line in (10.1), suppose that
\[ f \leq \sum_{j=1}^{n} c_j \lambda_{g_j} f_0 \quad \text{and} \quad f_0 \leq \sum_{k=1}^{m} d_k \lambda_{h_k} \varphi \]
are an $f_0$-cover and a φ-cover of f and of $f_0$, respectively. We then have
\[ \lambda_{g_j} f_0 \leq \sum_{k=1}^{m} d_k \lambda_{g_j h_k} \varphi \]
for all j and we obtain the φ-cover
\[ f \leq \sum_{j=1}^{n} \sum_{k=1}^{m} c_j d_k \lambda_{g_j h_k} \varphi, \]
which gives
\[ M(f : \varphi) \leq \Bigl( \sum_{j=1}^{n} c_j \Bigr) \Bigl( \sum_{k=1}^{m} d_k \Bigr). \]
Since the $f_0$-cover of f and the φ-cover of $f_0$ were arbitrary, this implies (10.1).

Fig. 10.1: Using $\varphi, f_1, f_2 \in C_c(\mathbb{R})$ as shown, it is clear that $\Lambda_\varphi$ is not additive in general. Here this failure of additivity happens because the gauge function is not sufficiently localized to measure the functions $f_1$ and $f_2$.
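The effect described in the caption can be made concrete on a finite cyclic group Z/n, which is an assumption of this sketch (the figure itself concerns functions on R). Two extreme gauge functions admit closed forms there. For the point gauge $\varphi = \mathbb{1}_{\{0\}}$ every cover must put total weight at least $f(h)$ on each point h, so $M(f : \varphi) = \sum_h f(h)$. For the constant gauge $\varphi \equiv 1$ the total weight only has to dominate every value of f, so $M(f : \varphi) = \max f$. The helper names below are hypothetical.

```python
def translate(phi, g):
    """Left translate on Z/n: (lambda_g phi)(h) = phi(h - g)."""
    n = len(phi)
    return [phi[(h - g) % n] for h in range(n)]


def is_cover(f, phi, cover):
    """Check f <= sum_j c_j * lambda_{g_j} phi pointwise.

    `cover` is a list of pairs (c_j, g_j)."""
    n = len(f)
    total = [0.0] * n
    for c, g in cover:
        shifted = translate(phi, g)
        for h in range(n):
            total[h] += c * shifted[h]
    return all(f[h] <= total[h] + 1e-12 for h in range(n))


def weight(cover):
    """Total weight sum_j c_j of a cover."""
    return sum(c for c, _ in cover)


def optimal_point_cover(f):
    """Optimal cover by translates of the indicator of {0}; its weight is sum(f)."""
    return [(value, h) for h, value in enumerate(f) if value > 0]


def optimal_constant_cover(f):
    """Optimal cover by the constant function 1; its weight is max(f)."""
    return [(max(f), 0)]


# Hypothetical example on Z/8: two bumps with disjoint supports.
N = 8
POINT = [1.0] + [0.0] * (N - 1)
CONSTANT = [1.0] * N
F1 = [2.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
F2 = [0.0, 0.0, 0.0, 0.0, 3.0, 1.0, 0.0, 0.0]
F12 = [a + b for a, b in zip(F1, F2)]
```

With the localized gauge the weights add up (3 + 4 = 7), whereas with the constant gauge they do not (max of the sum is 3, but 2 + 3 = 5), which is the failure of additivity that the proof removes by shrinking the support of φ.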

We now come to the approximate additivity property mentioned above. For any two functions $f_1, f_2 \in C_c^+(G)$ and $\varepsilon > 0$ we claim that there exists a neighbourhood U of $e \in G$ with the property that
\[ \Lambda_\varphi(f_1 + f_2) \leq \Lambda_\varphi(f_1) + \Lambda_\varphi(f_2) \leq \Lambda_\varphi(f_1 + f_2) + \varepsilon \tag{10.2} \]
for any non-zero function $\varphi \in C_c^+(G)$ with support contained in U. Notice that the first inequality in (10.2) is the sub-additivity shown above, and the second inequality requires an argument that splits a φ-cover of $f_1 + f_2$ into two separate φ-covers of $f_1$ and $f_2$ without too much loss of precision. We will do this using an approximate partition of unity as follows. By Urysohn's lemma (Lemma A.27) we may find a function $F \in C_c^+(G)$ such that $F \equiv 1$ on $\operatorname{Supp}(f_1 + f_2)$. Using it we define
\[ \delta = \min\Bigl\{ 1, \frac{\varepsilon}{3 M(f_1 + f_2 + F : f_0)} \Bigr\} \tag{10.3} \]
and the functions $p_1$ and $p_2$ by
\[ p_k(g) = \begin{cases} \dfrac{f_k(g)}{f_1(g) + f_2(g) + \delta F(g)} & \text{for } g \in \operatorname{Supp} f_k; \\ 0 & \text{for } g \notin \operatorname{Supp} f_k \end{cases} \]
for $k = 1, 2$; notice that $p_1, p_2 \in C_c^+(G)$. By uniform continuity of $p_1$ and $p_2$ there exists some neighbourhood U of $e \in G$ such that $u \in U$ and $g \in \operatorname{Supp} p_k$ implies that
\[ \bigl| p_k(g u^{-1}) - p_k(g) \bigr| < \delta \]

for $k = 1, 2$. Suppose now that $\varphi \in C_c^+(U)$ (which we could also write as $C_c^+(G) \cap C_c(U)$ with the usual convention that functions on U may be extended to functions on G by setting them to be zero on $G \setminus U$) and
\[ f_1 + f_2 + \delta F \leq \sum_{j=1}^{n} c_j \lambda_{g_j} \varphi \tag{10.4} \]
is a φ-cover of $f_1 + f_2 + \delta F$. Multiplying this inequality by $p_k$ gives
\[ f_k(g) = (f_1 + f_2 + \delta F)(g)\, p_k(g) \leq \sum_{j=1}^{n} c_j p_k(g) \varphi(g_j^{-1} g) \]
for all $g \in G$. Fixing $g \in G$ and one j in the sum, we see that either $p_k(g) \varphi(g_j^{-1} g) = 0$ or $g \in \operatorname{Supp} p_k$ and $g_j^{-1} g = u \in \operatorname{Supp} \varphi \subseteq U$, which implies that $g_j = g u^{-1}$ and
\[ p_k(g) \varphi(g_j^{-1} g) \leq (p_k(g_j) + \delta)\, \varphi(g_j^{-1} g) \]
in either case. Taking the sum over j gives the φ-cover
\[ f_k \leq \sum_{j=1}^{n} c_j (p_k(g_j) + \delta)\, \lambda_{g_j} \varphi \]
and so
\[ M(f_k : \varphi) \leq \sum_{j=1}^{n} c_j (p_k(g_j) + \delta) \]
for $k = 1, 2$. Taking the sum over k and using the bound $p_1 + p_2 \leq 1$ we obtain
\[ M(f_1 : \varphi) + M(f_2 : \varphi) \leq \sum_{j=1}^{n} c_j (1 + 2\delta). \]
Taking the infimum over all φ-covers of $f_1 + f_2 + \delta F$ in (10.4) and dividing by $M(f_0 : \varphi)$ shows that
\begin{align*}
\Lambda_\varphi(f_1) + \Lambda_\varphi(f_2) &\leq (1 + 2\delta) \Lambda_\varphi(f_1 + f_2 + \delta F) \\
&\leq \Lambda_\varphi(f_1 + f_2) + \delta \Lambda_\varphi(F) + 2\delta \Lambda_\varphi(f_1 + f_2 + \delta F).
\end{align*}
Applying (10.1) for φ, $f_0$ and $f = f_1 + f_2 + F$ we arrive at
\[ \Lambda_\varphi(f_1) + \Lambda_\varphi(f_2) \leq \Lambda_\varphi(f_1 + f_2) + 3\delta M(f_1 + f_2 + F : f_0) \leq \Lambda_\varphi(f_1 + f_2) + \varepsilon, \]
by our choice of δ in (10.3). This proves the second inequality in (10.2).

It remains to apply a compactness argument and the Riesz representation theorem (Theorem 7.44). For this, notice that
\[ \Omega = \prod_{f \in C_c^+(G)} [0, M(f : f_0)] \]
is a compact topological space, and that by (10.1) any non-zero function φ in $C_c^+(G)$ defines an element $\Lambda_\varphi \in \Omega$ (by thinking of the product space Ω as a space of real-valued functions on $C_c^+(G)$). For a neighbourhood U of e we can then define the closed set
\[ \Omega(U) = \overline{\{ \Lambda_\varphi \mid \varphi \in C_c^+(G) \text{ and } \operatorname{Supp} \varphi \subseteq U \}}. \]
It is clear that $U_1 \subseteq U_2$ implies that $\Omega(U_1) \subseteq \Omega(U_2)$, which in turn shows that $\Omega(U_1) \cap \cdots \cap \Omega(U_n)$ is non-empty for any finite collection of neighbourhoods $U_1, \ldots, U_n$ of e. It follows that the intersection
\[ \bigcap_{U} \Omega(U) \]
taken over all neighbourhoods U of $e \in G$ is non-empty by compactness of Ω. Let Λ be an element in this intersection. We note that left-invariance, positive homogeneity, and monotonicity of all functions $\Lambda_\varphi$ imply the same properties for Λ (check this). Moreover, Λ is in addition additive. In fact, given functions $f_1, f_2 \in C_c^+(G)$ and $\varepsilon > 0$ there exists a neighbourhood U satisfying (10.2). Since $\Lambda \in \Omega(U)$ there exists a $\varphi \in C_c^+(G)$ with $\operatorname{Supp} \varphi \subseteq U$ and with
\[ |\Lambda_\varphi(f_1) - \Lambda(f_1)| < \varepsilon, \quad |\Lambda_\varphi(f_2) - \Lambda(f_2)| < \varepsilon, \quad \text{and} \quad |\Lambda_\varphi(f_1 + f_2) - \Lambda(f_1 + f_2)| < \varepsilon. \]
Using (10.2) for $\Lambda_\varphi$ we see that
\[ \Lambda(f_1 + f_2) - 3\varepsilon \leq \Lambda(f_1) + \Lambda(f_2) \leq \Lambda(f_1 + f_2) + 4\varepsilon. \]
As $\varepsilon > 0$ was arbitrary, this shows that Λ is additive in the sense that
\[ \Lambda(f_1 + f_2) = \Lambda(f_1) + \Lambda(f_2) \]
for all $f_1, f_2 \in C_c^+(G)$. Now extend Λ to all of $C_c(G)$ by setting $\Lambda(0) = 0$ and
\[ \Lambda(f) = \Lambda(f^+) - \Lambda(f^-), \tag{10.5} \]

where $f^+ = \max\{0, f\}$ and $f^- = \max\{0, -f\}$. Applying the argument used immediately after (7.19), we see that Λ is now a positive linear functional on $C_c(G)$, so by the Riesz representation theorem (Theorem 7.44) there exists a unique locally finite measure m with
\[ \Lambda(f) = \int_G f \, dm \]
for all $f \in C_c(G)$. Since
\[ \int_G \lambda_g(f) \, dm = \Lambda(\lambda_g(f)) = \Lambda(f) = \int_G f \, dm \]
for any $g \in G$ and $f \in C_c(G)$ it follows that m is left-invariant. Also note that $\Lambda(f_0) = 1 = \int f_0 \, dm$. If $O \subseteq G$ is a non-empty open subset, then every compact set can be covered by finitely many left translates of O, showing that $m(O) = 0$ would imply that $m(K) = 0$ for every compact set, and in particular that $\int_G f_0 \, dm = 0$, a contradiction. It follows that m is positive on non-empty open subsets, which completes the proof that $m = m_G$ is a left Haar measure on G. □

Proposition 10.2 (Uniqueness of the Haar measure). Let G be a locally compact, σ-compact metrizable group. Then the left Haar measure is unique up to a positive scalar multiple.

For the proof of uniqueness, the following will be useful.

Lemma 10.3 (Positive overlaps). Let G be as above, and suppose that m is a left Haar measure. If $B_1, B_2 \subseteq G$ are Borel measurable sets with $m(B_1)$ and $m(B_2)$ positive, then $\{g \in G \mid m(gB_1 \cap B_2) > 0\}$ is non-empty. Moreover, $m(B_1^{-1}) > 0$, where $B_1^{-1} = \{g^{-1} \mid g \in B_1\}$.

Proof. Let $B_1, B_2 \subseteq G$ be as in the lemma. Then
\[ m(gB_1 \cap B_2) = \int \mathbb{1}_{gB_1}(h) \mathbb{1}_{B_2}(h) \, dm(h) = \int \mathbb{1}_{hB_1^{-1}}(g) \mathbb{1}_{B_2}(h) \, dm(h), \]
so by Fubini's theorem we have
\begin{align*}
\int m(gB_1 \cap B_2) \, dm(g) &= \int \mathbb{1}_{B_2}(h) \underbrace{\int \mathbb{1}_{hB_1^{-1}}(g) \, dm(g)}_{= m(hB_1^{-1}) = m(B_1^{-1})} \, dm(h) \\
&= m(B_1^{-1})\, m(B_2).
\end{align*}
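On a finite group with counting measure the identity just obtained reads $\sum_{g \in G} |gB_1 \cap B_2| = |B_1^{-1}|\,|B_2|$, and it can be checked by brute force. The sketch below does this on the symmetric group $S_3$, a hypothetical choice made here to have a small non-abelian example; permutations are stored as tuples and all names are illustrative.

```python
from itertools import permutations


def compose(p, q):
    """Product of permutations: (p * q)(i) = p(q(i))."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Inverse permutation."""
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return tuple(inv)


# The symmetric group S_3 as a list of tuples.
G = list(permutations(range(3)))


def overlap_sum(b1, b2):
    """Sum over g in G of |g*B1 intersected with B2| (counting measure)."""
    b2 = set(b2)
    return sum(len({compose(g, b) for b in b1} & b2) for g in G)


def inverse_set(b):
    """B^{-1} = {g^{-1} : g in B}."""
    return {inverse(g) for g in b}
```

For instance, with `B1 = G[:2]` and `B2 = G[1:5]` the sum equals 2 · 4 = 8, since each pair $(b_1, b_2)$ is matched by exactly one g with $g b_1 = b_2$.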

Setting $B_2$ briefly equal to G, we obtain $m(B_1) m(G) = m(B_1^{-1}) m(G)$ and see that $m(B_1^{-1}) > 0$ (since in measure theory $0 \cdot \infty = 0$). With this we now obtain the lemma since we see that the set $\{g \in G \mid m(gB_1 \cap B_2) > 0\}$ must have positive measure with respect to m. □

Proof of Proposition 10.2. Suppose that $m_1, m_2$ are left Haar measures on G. Define $m = m_1 + m_2$, so that m is a left Haar measure and $m_1, m_2 \ll m$. By the Radon–Nikodym theorem (Proposition 3.29) there exist measurable functions $f_1, f_2 \geq 0$ with $dm_i = f_i \, dm$ for $i = 1, 2$. We claim that $f_1$ is constant m-almost everywhere (and so $f_2$ is also). This then implies that $m_1 = c_1 m$ and $m_2 = c_2 m$ for some constants $c_1, c_2 > 0$, and so the proposition follows.

Assume now that the claim does not hold. Then there exist sets $B_1, B_2 \subseteq G$ of positive m-measure such that $f_1(x) < f_1(y)$ for all $x \in B_1$ and $y \in B_2$. We can find these sets, for example, as pre-images of two distinct intervals $[\frac{k}{n}, \frac{k+1}{n})$ and $[\frac{\ell}{n}, \frac{\ell+1}{n})$ for some integers $k < \ell$ and $n \geq 1$, for otherwise $f_1$ is constant m-almost everywhere. By Lemma 10.3 there exists a $g \in G$ with $m(gB_1 \cap B_2) > 0$. For any $E \subseteq G$ we also have
\[ \int_E f_1(x) \, dm(x) = m_1(E) = m_1(g^{-1}E) = \int_{g^{-1}E} f_1 \, dm = \int_E f_1(g^{-1}x) \, dm(x) \]
by the left-invariance of $m_1$ and m. If we now take $E \subseteq gB_1 \cap B_2$ with positive and finite measure, then $y \in E$ implies $y \in B_2$ and $g^{-1}y \in B_1$, hence $f_1(y) > f_1(g^{-1}y)$ and so
\[ \int_E f_1(y) \, dm(y) > \int_E f_1(g^{-1}y) \, dm(y). \]
This contradiction proves the claim that $f_1$ must be constant almost everywhere, and hence the proposition. □

Essential Exercise 10.4. Let G be a locally compact, σ-compact, metrizable group and let f be a measurable complex-valued function on G with the property that $m_G\bigl(\{g \in G \mid f(k^{-1}g) \neq f(g)\}\bigr) = 0$ for every $k \in G$. Show that $f = c$ almost everywhere for some constant $c \in \mathbb{C}$.

Exercise 10.5. Let G be a locally compact, σ-compact metrizable group with left Haar measure $m_G$. Prove the following assertions.
(a) For any continuous automorphism θ of G there exists a positive number $\operatorname{mod}_G(\theta)$ with $m_G(\theta^{-1}(B)) = \operatorname{mod}_G(\theta)\, m_G(B)$ for all Borel subsets $B \subseteq G$.
(b) Applying (a) to inner automorphisms $\theta_g$ defined by $\theta_g(h) = ghg^{-1}$ for all g, h in G defines a map $\Delta_G\colon G \to \mathbb{R}_{>0}$ by $\Delta_G(g) = \operatorname{mod}_G(\theta_g)$. Show that $\Delta_G$, known as the modular character, is a continuous group homomorphism with respect to multiplication on $\mathbb{R}_{>0}$. Use this to give a formula for $\int f(gh^{-1}) \, dm(g)$.
(c) Show that
\[ m_G^{(\mathrm{right})}(B) = \int_B \Delta_G(g)^{-1} \, dm_G(g) \]
defines a right-invariant Haar measure on G and show that $m_G^{(\mathrm{right})}(B) = m_G(B^{-1})$ for all Borel subsets $B \subseteq G$.
(d) Show that on a compact metrizable group a left Haar measure is also a right Haar measure.

A group G is called unimodular if any left Haar measure $m_G$ is also a right Haar measure. Thus, for example, Exercise 10.5(d) says that compact groups are unimodular.

Essential Exercise 10.6 (Approximate identity). Let G be a locally compact, σ-compact, metrizable group and let $(U_n)$ be a sequence of open neighbourhoods of the identity $e \in G$ with $\operatorname{diam}(U_n) \to 0$ as $n \to \infty$. Let $(\psi_n)$ be a sequence of non-negative functions in $L^1(G)$ with the property that $\psi_n$ vanishes outside of $U_n$ and satisfies $\int_G \psi_n \, dm_G = 1$ for all $n \geq 1$ (for example, we may set $\psi_n = \frac{1}{m_G(U_n)} \mathbb{1}_{U_n}$). Show that
\[ \lim_{n \to \infty} \psi_n * f = \lim_{n \to \infty} f * \psi_n = f \]
for all $f \in L^1(G)$.

Exercise 10.7. Show that if a locally compact σ-compact metrizable group G has the property that $m_G(G)$ is finite, then G is compact.

Exercise 10.8. A Haar measure on the additive reals $(\mathbb{R}, +)$ is (up to a scalar multiple) the Lebesgue measure dx. Show that a Haar measure on the multiplicative reals $(\mathbb{R} \setminus \{0\}, \cdot)$ is given by $\frac{dx}{|x|}$.

Exercise 10.9. Let G be the group of affine transformations $x \mapsto ax + b$ with $a \neq 0$ and $a, b \in \mathbb{R}$, which may also be thought of as the matrix group
\[ G = \Bigl\{ \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \;\Big|\; a, b \in \mathbb{R},\ a \neq 0 \Bigr\} \]
under matrix multiplication. Show that $dm_G = \frac{da \, db}{a^2}$ defines a left Haar measure on G and $dm_G^{(\mathrm{right})} = \frac{da \, db}{|a|}$ defines a right Haar measure on G. Compute the modular character on G (as defined in Exercise 10.5).

Exercise 10.10. Show that $dm_{\mathrm{GL}_d(\mathbb{R})}(g) = \frac{dg}{|\det g|^d}$ defines a left and right Haar measure on $\mathrm{GL}_d(\mathbb{R})$, where dg denotes Lebesgue measure on the space of real $d \times d$ matrices.
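A numerical sanity check can accompany Exercise 10.9 without replacing its proof. A measure $\rho(a, b)\, da\, db$ is invariant under a smooth bijection T exactly when $\rho(T(g))\, |\det DT(g)| = \rho(g)$ for all g, by the change of variables formula. The sketch below evaluates the difference of the two sides for left and right translations in the affine group, with the Jacobian approximated by central differences. Elements are stored as pairs (a, b), and the function names and step size are assumptions of this illustration.

```python
def multiply(g, h):
    """Matrix product [[a, b], [0, 1]] * [[c, d], [0, 1]], stored as pairs."""
    a, b = g
    c, d = h
    return (a * c, a * d + b)


def jacobian_det(transform, g, step=1e-6):
    """Determinant of the Jacobian of `transform` at g via central differences."""
    a, b = g
    ap, am = transform((a + step, b)), transform((a - step, b))
    bp, bm = transform((a, b + step)), transform((a, b - step))
    d11 = (ap[0] - am[0]) / (2 * step)
    d21 = (ap[1] - am[1]) / (2 * step)
    d12 = (bp[0] - bm[0]) / (2 * step)
    d22 = (bp[1] - bm[1]) / (2 * step)
    return d11 * d22 - d12 * d21


def left_density(g):
    """Density of da db / a^2 with respect to Lebesgue measure."""
    return 1.0 / (g[0] ** 2)


def right_density(g):
    """Density of da db / |a| with respect to Lebesgue measure."""
    return 1.0 / abs(g[0])


def invariance_defect(density, transform, g):
    """density(T(g)) * |det DT(g)| - density(g); zero means invariance at g."""
    return density(transform(g)) * abs(jacobian_det(transform, g)) - density(g)


def left_translation(g0):
    return lambda g: multiply(g0, g)


def right_translation(g0):
    return lambda g: multiply(g, g0)
```

Evaluating `invariance_defect(left_density, right_translation(g0), g)` for a $g_0$ whose first coordinate has absolute value different from 1 gives a non-zero value, so this group is not unimodular.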

10.2 Amenable Groups†

Using the material of Chapter 8 we continue the discussion from Section 7.2.2, where the concept of amenability was introduced for discrete groups.

† Apart from Exercise 10.35 in Section 10.3, this section will not be used later.

10.2.1 Definitions and Main Theorem

In this section we will always assume that either

• G is a discrete (but not necessarily countable) group and $m = m_G$ denotes the counting measure defined by $m(A) = |A|$ for all $A \subseteq G$; or
• G is a locally compact σ-compact metrizable group and $m = m_G$ denotes a left Haar measure defined on the Borel σ-algebra of G.

We recall that in either case the dual space to $L^1(G)$ is precisely $L^\infty(G)$ by Proposition 7.34 resp. Exercise 7.33(e). Let us also introduce the convex set
\[ \mathcal{P}(G) = \Bigl\{ f \in L^1(G) \;\Big|\; f \geq 0 \text{ a.e. and } \int_G f \, dm_G = 1 \Bigr\} \]
of probability distributions, and the convex set $\mathcal{M}(G)$ of means on G defined by
\[ \mathcal{M}(G) = \bigl\{ M \in L^\infty(G)^* \mid M \text{ is positive and } M(\mathbb{1}) = 1 \bigr\}, \]
where $M \in L^\infty(G)^*$ is called positive if $\Phi \in L^\infty(G)$ with $\Phi \geq 0$ almost everywhere implies $M(\Phi) \geq 0$. For a function f on G we write $\lambda_g f(h) = f(g^{-1}h)$ for $g, h \in G$. Notice that this definition extends to equivalence classes of functions, so $\lambda_g$ is an operator on any function space $L^p(G)$ with $p \in [1, \infty]$. We can now extend the notion of amenability via a suitable form of the characterization in Lemma 7.18.

Definition 10.11. We say that G is amenable if there exists a left-invariant mean M on $L^\infty(G)$, meaning a mean $M \in \mathcal{M}(G)$ satisfying in addition the left-invariance property $M(\Phi) = M(\lambda_g \Phi)$ for any $\Phi \in L^\infty(G)$ and $g \in G$.

The link between amenability and geometric properties of a group seen in Lemma 7.24 also extends to this setting (see also Proposition 10.19 for a strengthening).

Definition 10.12. A group G admits Følner sets if for any compact subset K of G and $\varepsilon > 0$ there exists a measurable set $F \subseteq G$ of positive and finite m-measure with $m(kF \triangle F) < \varepsilon\, m(F)$ for all $k \in K$.

[…] $\varepsilon > 0$ there is a finite set F such that the number of edges in Γ(G, K) leaving F is at most $\varepsilon |F|$. This stands in stark contrast to the property of being an expander graph (see Section 10.4).

It should be clear that the two notions above (amenability and admitting Følner sets) are related. In fact, our main goal in this section is to prove Lemma 7.24 and its converse in this more general setting. For the more difficult part of the equivalence one more definition will be useful.

Definition 10.14 (Reiter's condition). A group G fulfills the Reiter condition in $L^1$ if for any compact set $K \subseteq G$ and $\varepsilon > 0$ there exists some $f \in \mathcal{P}(G)$ with
\[ \|\lambda_k f - f\|_1 < \varepsilon \]
for all $k \in K$. We say that $L^2(G)$ has almost invariant vectors (or that G fulfills the Reiter condition in $L^2$) if for any compact set $K \subseteq G$ and $\varepsilon > 0$ there exists some $f \in L^2(G)$ with $\|f\|_2 = 1$ and with $\|\lambda_k f - f\|_2 < \varepsilon$ for all $k \in K$.

Theorem 10.15. Let G be a discrete group or a locally compact σ-compact metrizable group. Then the following are equivalent:
(1) G is amenable;
(2) G admits Følner sets;
(3) G fulfills the Reiter condition in $L^1$; and
(4) $L^2(G)$ has almost invariant vectors.
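The simplest illustration of conditions (2) and (3) is the discrete group Z with counting measure, where intervals serve as Følner sets. The sketch below is an example chosen here, not taken from the text: for $F = \{0, \ldots, N-1\}$ and $|k| \leq N$ one has $|(k + F) \triangle F| = 2|k|$, so both the Følner ratio and the $L^1$ distance in Reiter's condition equal $2|k|/N$. The function names are hypothetical, and exact rational arithmetic is used so the values can be compared directly.

```python
from fractions import Fraction


def folner_ratio(size, k):
    """m(kF symmetric-difference F) / m(F) for F = {0, ..., size-1} in Z."""
    f_set = set(range(size))
    shifted = {k + g for g in f_set}
    return Fraction(len(shifted ^ f_set), len(f_set))


def reiter_l1(size, k):
    """|| lambda_k f - f ||_1 for f = indicator of F divided by |F|.

    In additive notation (lambda_k f)(g) = f(g - k)."""
    f = {g: Fraction(1, size) for g in range(size)}
    support = set(f) | {g + k for g in f}
    return sum(abs(f.get(g - k, 0) - f.get(g, 0)) for g in support)


def smallest_interval(shifts, eps):
    """Smallest N such that F = {0, ..., N-1} is a Folner set for (shifts, eps)."""
    size = 1
    while any(folner_ratio(size, k) >= eps for k in shifts):
        size += 1
    return size
```

For the compact set $K = \{-3, \ldots, 3\}$ and $\varepsilon = 1/10$ this search returns N = 61, the first N with 6/N < 1/10.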

10.2.2 Proof of Theorem 10.15

We will restrict ourselves in the following to R-valued functions, but it is not hard to see that with a bit more work this can be avoided.

Proof that (2)⇐⇒(3)⇐⇒(4). Suppose (2) holds and F is a Følner set for a compact subset K ⊆ G and ε > 0 as in Definition 10.12. Then $f = \frac{1}{m(F)} 1_F$ lies in P(G), $\lambda_k 1_F(g) = 1_F(k^{-1}g) = 1_{kF}(g)$, and so
\[ \|\lambda_k f - f\|_1 = \frac{1}{m(F)} \int |1_{kF} - 1_F| \, dm = \frac{1}{m(F)}\, m(kF \triangle F) < \varepsilon \]

for all k ∈ K. Similarly, if we set $f_2 = \frac{1}{\sqrt{m(F)}} 1_F$ we see that
\[ \|\lambda_k f_2 - f_2\|_2 = \left( \int_G \frac{1}{m(F)} \bigl( 1_{kF}(g) - 1_F(g) \bigr)^2 \, dm(g) \right)^{1/2} = \sqrt{\frac{m(kF \triangle F)}{m(F)}} < \sqrt{\varepsilon}. \]

Since ε > 0 and K ⊆ G were arbitrary, we see that (2) implies (3) and (4).

Assuming that $L^2(G)$ has almost invariant vectors. If G satisfies (4) and $f_2 \in L^2(G)$ satisfies $\|f_2\|_2 = 1$ and $\|\lambda_k f_2 - f_2\|_2 < \varepsilon$ for all k in the compact set K ⊆ G, then we define $f(g) = f_2(g)^2$ for all g ∈ G and see immediately that f ≥ 0 and $\|f\|_1 = \|f_2\|_2^2 = 1$. Moreover, for k ∈ K we also have
\[ \begin{aligned}
\|\lambda_k f - f\|_1 &= \int_G \bigl| f_2(k^{-1}g)^2 - f_2(g)^2 \bigr| \, dm(g) \\
&= \int_G \bigl| f_2(k^{-1}g) - f_2(g) \bigr| \, \bigl| f_2(k^{-1}g) + f_2(g) \bigr| \, dm(g) \\
&= \bigl\langle |\lambda_k f_2 - f_2|, |\lambda_k f_2 + f_2| \bigr\rangle_{L^2(G)} \\
&\leq \|\lambda_k f_2 - f_2\|_2 \, \|\lambda_k f_2 + f_2\|_2 \leq 2\varepsilon.
\end{aligned} \]
Therefore (4) implies (3).

Assuming the Reiter condition in $L^1(G)$. Assume now that (3) holds. We wish to find a Følner set as in Definition 10.12. Therefore let K ⊆ G be compact and assume without loss of generality that $m_G(K) > 0$. Further fix ε > 0 and let f ∈ P(G) be as in Reiter’s condition in Definition 10.14. For every α > 0 we define the measurable set $F_\alpha = \{g \in G \mid f(g) > \alpha\}$, which will be a Følner set if we choose α carefully. By Fubini’s theorem we have
\[ \begin{aligned}
\int_0^\infty m(F_\alpha) \, d\alpha &= \int_0^\infty \int_G 1_{F_\alpha}(g) \, dm(g) \, d\alpha \\
&= \int_G \int_0^\infty 1_{F_\alpha}(g) \, d\alpha \, dm(g) = \int_G f(g) \, dm(g) = \|f\|_1 = 1.
\end{aligned} \]

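Moreover, for any k ∈ K we also have
\[ \begin{aligned}
\int_0^\infty m(kF_\alpha \triangle F_\alpha) \, d\alpha &= \int_0^\infty \int_G \bigl| 1_{kF_\alpha}(g) - 1_{F_\alpha}(g) \bigr| \, dm(g) \, d\alpha \\
&= \int_G \int_0^\infty \bigl| 1_{F_\alpha}(k^{-1}g) - 1_{F_\alpha}(g) \bigr| \, d\alpha \, dm(g) \\
&= \int_G \bigl| f(k^{-1}g) - f(g) \bigr| \, dm(g) = \|\lambda_k f - f\|_1 < \varepsilon.
\end{aligned} \]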

Integrating this over K we obtain
\[ \int_0^\infty \int_K m(kF_\alpha \triangle F_\alpha) \, dm(k) \, d\alpha < \varepsilon m(K) = \int_0^\infty \varepsilon m(K) m(F_\alpha) \, d\alpha. \]

Therefore there must exist some α ∈ (0, ∞) such that
\[ \int_K m(kF_\alpha \triangle F_\alpha) \, dm(k) < \varepsilon m(K) m(F_\alpha). \tag{10.6} \]

In the case when G is discrete, K is finite, and this gives $|kF_\alpha \triangle F_\alpha| < \varepsilon |K| |F_\alpha|$ for all k ∈ K. Since ε > 0 was arbitrary this proves (2) in the discrete case.

In the non-discrete case, the statement in (10.6) is an averaged form of the inequality we are seeking, and as a result seems to be weaker than what we need. For the upgrade we use the fact that ε > 0 was arbitrary: we have shown that for any ε > 0 and δ > 0 there exists a measurable set $F = F_\alpha$ such that
\[ \int_K m(kF \triangle F) \, dm(k) < \varepsilon \delta m(F) < \infty. \]

In particular, we must have m(N) < δ if $N = \{k \in K \mid m(kF \triangle F) \geq \varepsilon m(F)\}$. Summarising, we have shown for any compact set K, any ε > 0, and any δ > 0 that there exists a measurable set F with finite measure and a subset N ⊆ K with m(N) < δ such that
\[ m(kF \triangle F) < \varepsilon m(F) \tag{10.7} \]

for all k ∈ K∖N. We now use the group structure to upgrade this and deduce the existence of Følner sets. Define $K_1 = K \cup K^2$ and $\delta = \frac{1}{2} m(K)$. Now apply the argument above to $K_1$, an arbitrary ε > 0, and this choice of δ. This gives a measurable set F ⊆ G of finite measure satisfying (10.7) for all $k_1 \in K_1 \setminus N$ and some exceptional set $N \subseteq K_1$ of measure $m(N) < \frac{1}{2} m(K)$. Now fix some k ∈ K. We see that (10.7) holds for $k_1 \in K \setminus N$ where $m(K \setminus N) > \frac{1}{2} m(K)$, and also that (10.7) holds for all $kk_1 \in (kK) \setminus N$ where $m((kK) \setminus N) > \frac{1}{2} m(K)$. Since left translation by k preserves the measure m, it follows that there exists some $k_1 \in K$ such that (10.7) holds both for $k_1$ and for $kk_1$. Therefore
\[ m(kF \triangle F) \leq m\bigl( (kF \triangle kk_1F) \cup (kk_1F \triangle F) \bigr) \leq m(F \triangle k_1F) + \varepsilon m(F) < 2\varepsilon m(F) \]
for any k ∈ K, proving (2).
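The level-set argument for (3)⟹(2) can be checked by hand in the discrete group Z. The sketch below is an illustration under stated assumptions: the group is Z with counting measure, f is a hypothetical finitely supported "tent" function in P(Z), and all helper names are invented here. It verifies the two Fubini identities exactly and then picks the level set with the smallest averaged boundary ratio.

```python
from fractions import Fraction

def level_sets(f):
    """Yield (length, F_alpha) for each maximal alpha-interval on which
    F_alpha = {g : f(g) > alpha} is constant and non-empty."""
    lower = Fraction(0)
    for value in sorted({v for v in f.values() if v > 0}):
        # For alpha in [lower, value) we have F_alpha = {g : f(g) >= value}.
        yield value - lower, {g for g, w in f.items() if w >= value}
        lower = value

def shift(F, k):
    """Translate a subset of Z by k."""
    return {k + g for g in F}

def l1_shift_distance(f, k):
    """||lambda_k f - f||_1 with (lambda_k f)(g) = f(g - k)."""
    support = set(f) | shift(set(f), k)
    return sum(abs(f.get(g - k, 0) - f.get(g, 0)) for g in support)

def layer_cake_mass(f):
    """Integral over alpha of |F_alpha|; this equals ||f||_1."""
    return sum(length * len(F) for length, F in level_sets(f))

def layer_cake_boundary(f, k):
    """Integral over alpha of |kF_alpha symmetric-difference F_alpha|."""
    return sum(length * len(shift(F, k) ^ F) for length, F in level_sets(f))

def boundary_ratio(F, K):
    """sum over k in K of |kF symmetric-difference F|, divided by |F|."""
    return Fraction(sum(len(shift(F, k) ^ F) for k in K), len(F))

def best_level_set(f, K):
    """Level set of f with the smallest averaged boundary ratio."""
    return min((F for _, F in level_sets(f)), key=lambda F: boundary_ratio(F, K))

# Hypothetical example: a normalized tent function on {-n, ..., n}.
n = 20
weights = {g: n + 1 - abs(g) for g in range(-n, n + 1)}
total = sum(weights.values())
f = {g: Fraction(w, total) for g, w in weights.items()}
K = [1, 2, 3]

assert layer_cake_mass(f) == 1
assert all(layer_cake_boundary(f, k) == l1_shift_distance(f, k) for k in K)
F = best_level_set(f, K)
# Averaging argument: some level set does at least as well as the integrated bound.
assert boundary_ratio(F, K) <= sum(l1_shift_distance(f, k) for k in K)
print(len(F), boundary_ratio(F, K))
```

Exact rational arithmetic is used so that the identities hold with equality rather than up to rounding.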



To summarize, we have shown that (2), (3) and (4) are equivalent. We now turn to the equivalence between (1) and (3), which is where functional analysis will play an important role. In the following we will frequently use


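the left-invariance of m in the form
\[ \langle \lambda_k f, \Phi \rangle = \int f(k^{-1}g) \Phi(g) \, dm(g) = \int f(g') \Phi(kg') \, dm(g') = \langle f, \lambda_{k^{-1}} \Phi \rangle \]
for $f \in L^1_m(G)$, $\Phi \in L^\infty(G)$ and k ∈ G.

Proof that (3)=⇒(1) in Theorem 10.15. Assume that G fulfills Reiter’s condition. This shows that for a given ε > 0 and compact K ⊆ G the function f as in Definition 10.14 satisfies
\[ |\langle f, \lambda_{k^{-1}} \Phi - \Phi \rangle| = |\langle f, \lambda_{k^{-1}} \Phi \rangle - \langle f, \Phi \rangle| = |\langle \lambda_k f - f, \Phi \rangle| \leq \varepsilon \|\Phi\|_\infty \]
for k ∈ K and $\Phi \in L^\infty(G)$. Taking the image of such functions under the embedding map ı into the dual of $L^\infty(G)$ we see that
\[ A(\varepsilon, \Phi, k) = \bigl\{ M \in \mathscr{M}(G) \mid |M(\Phi_i - \lambda_{k_j} \Phi_i)| \leq \varepsilon \|\Phi_i\|_\infty \text{ for all } i, j \bigr\} \]
is non-empty for any choice of ε > 0, $\Phi = (\Phi_1, \ldots, \Phi_\ell) \in (L^\infty(G))^\ell$, $k = (k_1, \ldots, k_n) \in G^n$, and any ℓ, n ∈ N. By definition
\[ A(\varepsilon, \Phi, k) \subseteq \mathscr{M}(G) \]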

is weak* closed and contained in the closed unit ball of L∞(G)∗ (check this). Since any finite intersection of such sets will contain another such set we see that the collection of sets of the form $A(\varepsilon, \Phi, k)$ has the finite intersection property. By the Banach–Alaoglu theorem (Theorem 8.10) it follows that the intersection over these sets is non-empty. By definition, this intersection consists of all left-invariant means on L∞(G).

For the converse, which is perhaps the most surprising part of the whole proof, we will need the following lemma.

Lemma 10.16. Let G be as above, and let $\imath : L^1(G) \to L^1(G)^{**} = L^\infty(G)^*$ be the natural embedding into the bidual of $L^1(G)$. Then the weak* closure of the image of P(G) under ı in L∞(G)∗ is M(G).

Proof. Assume for the purpose of a contradiction that there is some mean M ∈ M(G) that is not in the weak* closure K of ı(P(G)). Applying Theorem 8.73 to X = L∞(G)∗ equipped with the weak* topology, the closed set K, and M ∉ K gives a continuous linear functional on X separating M from K. By Lemma 8.13 this functional is an evaluation map at


some Φ ∈ L∞(G). Hence the conclusion of Theorem 8.73 is precisely that there is some c ∈ R with
\[ \int_G f \Phi \, dm = \langle \Phi, \imath(f) \rangle \leq c < M(\Phi) \]

for all f ∈ P(G). This implies that Φ ≤ c almost everywhere, since otherwise we could find a measurable set B ⊆ G of finite positive measure with Φ(g) > c for g ∈ B, and then setting $f = \frac{1}{m(B)} 1_B \in \mathcal{P}(G)$ leads to a contradiction. However, Φ ≤ c almost everywhere also implies that M(Φ) ≤ c by the properties of M ∈ M(G). This contradiction proves the lemma.

We start with the discrete case as it is significantly easier.

Proof of (1)=⇒(3) in Theorem 10.15 for discrete G. Assume that there exists a left-invariant mean M. Using M we wish to find, for any ε > 0 and finite K ⊆ G, a function f ∈ P(G) such that $\|\lambda_k f - f\|_1 < \varepsilon$ for all k ∈ K. Define the bounded linear operator
\[ D : \ell^1(G) \longrightarrow \ell^1(G)^K, \qquad f \longmapsto (\lambda_k f - f)_{k \in K}. \]
Note that D(P(G)) is convex, and we wish to show that $0 \in \overline{D(\mathcal{P}(G))}^{\,\mathrm{norm}}$. By Corollary 8.74 we know that $\overline{D(\mathcal{P}(G))}^{\,\mathrm{norm}}$ is also closed in the weak topology. Therefore it is enough to show that
\[ 0 \in \overline{D(\mathcal{P}(G))}^{\,\mathrm{weak}} = \overline{D(\mathcal{P}(G))}^{\,\mathrm{norm}}. \tag{10.8} \]
The dual of $\ell^1(G)^K$ is given by $(\ell^\infty(G))^K$ and it suffices to find, for every $\Phi_1, \ldots, \Phi_n \in \ell^\infty(G)$ and ε > 0, some f ∈ P(G) with
\[ |\langle \lambda_k f - f, \Phi_j \rangle| < \varepsilon \tag{10.9} \]
for all k ∈ K and j = 1, . . . , n. The left-hand side of (10.9) may be rewritten as
\[ |\langle \lambda_k f - f, \Phi_j \rangle| = |\langle f, \lambda_{k^{-1}} \Phi_j - \Phi_j \rangle| = |\langle \lambda_{k^{-1}} \Phi_j - \Phi_j, \imath(f) \rangle|. \tag{10.10} \]
Note that $\langle \lambda_{k^{-1}} \Phi_j - \Phi_j, M \rangle = M(\lambda_{k^{-1}} \Phi_j - \Phi_j) = 0$ for the invariant mean M. By Lemma 10.16 we know that ı(P(G)) is dense in M(G) with respect to the weak* topology, so there must exist an element f of P(G) for which (10.10) is less than ε for all k ∈ K and j = 1, . . . , n, which proves (10.9), (10.8), and hence that G fulfills the Reiter condition in $L^1$.


In the non-discrete case another ingredient is needed. Lemma 10.17 (Topological left-invariant mean). Let G be a σ-compact, locally compact, metrizable amenable group. Then there also exists a ‘topologically left-invariant mean’ on L∞ (G), that is, a mean Mtop on L∞ (G) such that Mtop (f ∗ Φ) = Mtop (Φ) for any f ∈ P(G) and Φ ∈ L∞ (G).

Proof. We start by noting that the definition of f ∗ Φ in Lemma 3.75 and Exercise 3.76 makes sense at every g ∈ G and easily implies that
\[ \|f * \Phi\|_\infty \leq \|f\|_1 \|\Phi\|_\infty \tag{10.11} \]
for any $f \in L^1(G)$ and $\Phi \in L^\infty(G)$. Given $f_0, f_1 \in \mathcal{P}(G)$ and $\Phi \in L^\infty(G)$ we define $\Phi_0 = f_0 * \Phi$ and claim that
\[ M(f_1 * \Phi_0) = M(\Phi_0) \tag{10.12} \]
if M is a left-invariant mean on L∞(G). For this we first recall from Lemma 3.74 that for a given $f_0$ and ε > 0 there exists a neighbourhood U of e ∈ G such that $\|\lambda_k f_0 - f_0\|_1 < \varepsilon$ for all k ∈ U. Using the left-invariance of the Haar measure we obtain
\[ \begin{aligned}
\lambda_k \Phi_0(g) = (f_0 * \Phi)(k^{-1}g) &= \int_G f_0(h) \Phi(h^{-1}k^{-1}g) \, dm(h) \\
&= \int_G f_0(k^{-1}h_1) \Phi(h_1^{-1}g) \, dm(h_1) \qquad (\text{with } h_1 = kh) \\
&= \bigl( (\lambda_k f_0) * \Phi \bigr)(g).
\end{aligned} \tag{10.13} \]
Together with (10.11) we deduce that
\[ \|\lambda_k \Phi_0 - \Phi_0\|_\infty = \|(\lambda_k f_0 - f_0) * \Phi\|_\infty \leq \|\lambda_k f_0 - f_0\|_1 \|\Phi\|_\infty \leq \varepsilon \|\Phi\|_\infty \]
for all k ∈ U. In other words, for $\Phi_0$ the left regular representation satisfies the continuity claim appearing in Lemma 3.74, but with respect to the $\|\cdot\|_\infty$ norm. This property of $\Phi_0$ is called left uniform continuity of $\Phi_0$. As this is precisely the assumption for the strong integral $\mathrm{R}\text{-}\!\int$ discussed in Proposition 3.81, it follows that for $f_1 \in C_c(G)$ the integral $\mathrm{R}\text{-}\!\int f_1(g) \lambda_g \Phi_0 \, dm_G(g)$ can be obtained as a limit with respect to $\|\cdot\|_\infty$ of Riemann sums of the form


\[ \sum_{P \in \xi} f_1(g_P) \, \lambda_{g_P} \Phi_0 \, m_G(P), \]

where ξ is a finite partition of Supp($f_1$) and $g_P \in P$ for each P ∈ ξ. As convergence with respect to $\|\cdot\|_\infty$ implies pointwise convergence, we see that $\mathrm{R}\text{-}\!\int f_1(g) \lambda_g \Phi_0 \, dm_G(g) = f_1 * \Phi_0$. Applying the continuous functional M we see that
\[ M(f_1 * \Phi_0) = \lim_{\xi} \sum_{P \in \xi} f_1(g_P) \, m(P) \, M(\lambda_{g_P} \Phi_0) = M(\Phi_0) \int_G f_1 \, dm \]

since $M(\lambda_g \Phi_0) = M(\Phi_0)$ for any g ∈ G. Using the estimate (10.11) again and the density of $C_c(G)$ in $L^1(G)$ this extends to all $f_1 \in L^1(G)$. Restricting to functions in P(G) the claim in (10.12) follows.

We now make the definition $M_{\mathrm{top}}(\Phi) = M(\Phi_0) = M(f_0 * \Phi)$ for some $f_0 \in \mathcal{P}(G)$. Note that $M_{\mathrm{top}}(1) = M(1) = 1$ and that Φ ≥ 0 almost everywhere implies $f_0 * \Phi \geq 0$ and $M_{\mathrm{top}}(\Phi) \geq 0$. We also claim that this definition is independent of $f_0$. Using this independence we see that $M_{\mathrm{top}}(f_1 * \Phi) = M(f_0 * f_1 * \Phi) = M_{\mathrm{top}}(\Phi)$ for any $f_1 \in \mathcal{P}(G)$ by associativity of convolution (cf. the proof of Proposition 3.91), and the lemma follows.

To see the independence let $(\psi_n)_n$ be an approximate identity in $L^1(G)$ (see Exercise 10.6) so that
\[ \lim_{n \to \infty} \|f_0 * \psi_n - f_0\|_1 = 0, \]
and so by (10.11) and continuity of M also
\[ \lim_{n \to \infty} M(f_0 * \psi_n * \Phi) = M(f_0 * \Phi). \]
Combining this with (10.12) and $\psi_n \in \mathcal{P}(G)$ we see that
\[ \lim_{n \to \infty} M(\psi_n * \Phi) = M(f_0 * \Phi), \]
which gives the claim and the lemma.



Proof of (1) =⇒ (3) in Theorem 10.15 without discreteness. Fix ψ1 , . . . , ψn ∈ P(G). Applying the same argument as in the discrete case but to the map


\[ D : L^1(G) \longrightarrow L^1(G)^n, \qquad f \longmapsto (\psi_j * f - f)_j \]

and using the existence of a topological left-invariant mean as in Lemma 10.17, we conclude that it is possible to find, for every ε > 0, some f ∈ P(G) such that
\[ \|\psi_j * f - f\|_1 < \varepsilon \tag{10.14} \]
for j = 1, . . . , n. Now fix some ψ ∈ P(G) and some dense countable subset $\{g_1, g_2, \ldots\} \subseteq G$ with $g_1 = e$. Define $\psi_j = \lambda_{g_j} \psi$ and apply the argument above to $\psi_1, \ldots, \psi_n$ and $\varepsilon = \frac{1}{n}$. This shows that there exists a sequence $(f_n)$ in P(G) with
\[ \|\lambda_{g_j} \psi * f_n - f_n\|_1 \longrightarrow 0 \]
as n → ∞ for every j. Now let K ⊆ G be a compact subset and fix ε > 0. By Lemma 3.74 there exists some neighbourhood U of e ∈ G such that $\|\lambda_u \psi - \psi\|_1 < \varepsilon$ for u ∈ U. By density of $\{g_1, g_2, \ldots\}$ we have $G = \bigcup_{j=1}^{\infty} g_j U$. By compactness of K there exists some ℓ such that
\[ K \subseteq \bigcup_{j=1}^{\ell} g_j U. \tag{10.15} \]

By construction we can choose n large enough to ensure that $\|\lambda_{g_j} \psi * f_n - f_n\|_1 < \varepsilon$ for j = 1, . . . , ℓ. For k ∈ K there exists by (10.15) some j ≤ ℓ with $k = g_j u$ for some u ∈ U, which shows that $\|\lambda_k \psi - \lambda_{g_j} \psi\|_1 = \|\lambda_{g_j}(\lambda_u \psi - \psi)\|_1 < \varepsilon$. Thus we deduce (after recalling that $g_1 = e$ and $f_n \in \mathcal{P}(G)$) that
\[ \|\lambda_k \psi * f_n - \psi * f_n\|_1 \leq \|(\lambda_k \psi - \lambda_{g_j} \psi) * f_n\|_1 + \|\lambda_{g_j} \psi * f_n - f_n\|_1 + \|f_n - \psi * f_n\|_1 < 3\varepsilon. \]

Finally, notice that
\[ \begin{aligned}
\bigl( (\lambda_k \psi) * f_n \bigr)(g) &= \int \lambda_k \psi(h) f_n(h^{-1}g) \, dm(h) \\
&= \int \psi(k^{-1}h) f_n(h^{-1}g) \, dm(h) \\
&= \int \psi(h_1) f_n(h_1^{-1}k^{-1}g) \, dm(h_1) = \lambda_k(\psi * f_n)(g)
\end{aligned} \]

for all g ∈ G and k ∈ K. Hence the function ψ ∗ fn ∈ P(G) satisfies Reiter’s condition for K ⊆ G and 3ε.  Exercise 10.18. Fill in the details of the argument leading to (10.14).

10.2.3 A More Uniform Følner Set

For subsets A, B ⊆ G of a group we define AB = {ab | a ∈ A, b ∈ B}.

Proposition 10.19. Let G be an amenable group as in Theorem 10.15. Then for every non-empty compact K ⊆ G and every ε > 0 there exists a measurable Følner set F ⊆ G with finite measure such that m((KF)△F) < εm(F).

Proof. Let $U = U^{-1}$ be a compact neighbourhood of the identity e ∈ G. Given a Følner set F for (U, ε) we define a function f : G → R by
\[ f(g) = \frac{1}{m(U)} \int_U 1_F(ug) \, dm(u) = \frac{1}{m(Ug)} \int_{Ug} 1_F \, dm, \]
where we will think of f(g) as the proportion of positive answers in the neighbourhood Ug of g to the question of whether g should belong to an improved version of F. In case G is not unimodular, we multiplied the integral and the denominator in the first expression by $\Delta_G(g)$, used $m(Ug) = \Delta_G(g) m(U)$ in the denominator and the substitution h = ug ∈ Ug for u ∈ U in the integral (at first reading it may be helpful to assume that G is unimodular as this simplifies some of the expressions arising). Given any majority parameter α ∈ (0, 1) we also define the set $F_\alpha = F_\alpha(F, U)$ by
\[ F_\alpha = \{g \in G \mid f(g) > \alpha\} = \{g \in G \mid m(F \cap Ug) > \alpha m(Ug)\}, \]
which will be a more well-rounded version of F. The defining property of F and the definition of f together with Fubini’s theorem imply that
\[ \|f - 1_F\|_1 \leq \frac{1}{m(U)} \int_U \|1_{u^{-1}F} - 1_F\|_1 \, dm(u) < \varepsilon m(F). \]
This gives
\[ \beta \, m\bigl( \{g \in G \mid |f - 1_F|(g) > \beta\} \bigr) < \varepsilon m(F) \]


for all β ∈ (0, 1). Setting β = min{α, 1 − α} we obtain m (Fα △F )
1. Suppose now that F is a Følner set for (K ∪ U 2 , nε ). Set α = 12 and define the associated set F ′ = F1/2 (F, U ) as above, so that m (F ′ △F ) < by (10.16). Assuming ε
1 we see, from the Følner property of F for $k_1 \in K$ and (10.17), that
\[ m(F' \setminus (KF')) \leq m(F' \setminus (k_1F')) \ll \varepsilon m(F'). \]
For the second inequality we first claim that
\[ UF' \subseteq F_\alpha = F_\alpha(F, U^2) \tag{10.18} \]
for the parameter
\[ \alpha = \frac{m(U)}{2 m(U^2) \max_{u \in U} \Delta(u)}, \]
which only depends on our choice of the neighbourhood U. In fact, for u ∈ U and $g \in F' = F_{1/2}(F, U)$ we have
\[ m(F \cap U^2ug) \geq m(F \cap Ug) > \tfrac{1}{2} m(Ug) = \frac{m(U)}{2m(U^2)} m(U^2g) \geq \alpha \, m(U^2ug), \]
which implies (10.18). Since F is a Følner set for $(U^2, \frac{\varepsilon}{n})$ we obtain from (10.18) and (10.16) that
\[ m(UF' \setminus F) \leq m(F_\alpha \setminus F) \ll \frac{\varepsilon}{n} m(F) \ll \frac{\varepsilon}{n} m(F'), \]


where the implicit constant only depends on the choice of U. Using the fact that F is a Følner set for $(\{k_1, \ldots, k_n\}, \frac{\varepsilon}{n})$ and (10.17), this gives
\[ m(k_jUF' \setminus F') \leq m(k_jUF' \setminus k_jF) + m(k_jF \setminus F) + m(F \setminus F') \ll \frac{\varepsilon}{n} m(F') \]
for j = 1, . . . , n. Taking the union and recalling that $K \subseteq \bigcup_{j=1}^{n} k_jU$, we obtain
\[ m(KF' \setminus F') \leq m\Bigl( \bigcup_{j=1}^{n} k_jUF' \setminus F' \Bigr) \ll \varepsilon m(F'). \]

Since ε was arbitrary, this concludes the proof.
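The "majority vote" construction of F′ = F_{1/2}(F, U) in the proof above can be tried out in G = Z, where the modular function is trivial. The following is only an illustrative sketch with hypothetical names (`vote`, `majority_set`, `thickened_excess`) and a hand-picked ragged set F. It shows how the vote fills small holes and discards stray points, so that KF′ exceeds F′ by less than KF exceeds F.

```python
from fractions import Fraction

def vote(F, U, g):
    """f(g) = |F ∩ (U + g)| / |U|: the proportion of the window U + g lying in F."""
    return Fraction(sum(1 for u in U if u + g in F), len(U))

def majority_set(F, U, alpha=Fraction(1, 2)):
    """F_alpha(F, U) = {g : f(g) > alpha}; only points of F - U can have f(g) > 0."""
    candidates = {x - u for x in F for u in U}
    return {g for g in candidates if vote(F, U, g) > alpha}

def thickened_excess(F, K):
    """|KF minus F| for finite subsets of Z (written additively)."""
    return len({k + g for k in K for g in F} - F)

U = set(range(-2, 3))                       # symmetric window, U = -U
F = (set(range(0, 30)) - {10, 20}) | {40}   # interval with two holes and a stray point
K = {0, 1}

F_half = majority_set(F, U)
print(sorted(F_half) == list(range(0, 30)))
print(thickened_excess(F, K), thickened_excess(F_half, K))
```

Here F and F′ differ in only three points, while the number of points of KF′ outside F′ is smaller than the corresponding number for F.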



10.2.4 Further Equivalences and Properties We conclude the discussion of amenability with a number of exercises that extend the treatment above and generalize various earlier topics to the level of generality of this section. Exercise 10.20. Let G be a discrete group. Show that G is amenable if and only if every finitely generated subgroup of G is amenable. Exercise 10.21. Let G be a σ-compact, locally compact, metric group. (1) Show that Definition 10.12 and Definition 10.14 could equivalently be formulated by using only finite subsets K ⊆ G. (2) Show that if G is amenable, then there exists a mean that is left-invariant and topologically left-invariant.

Unless otherwise noted G will be, as in Theorem 10.15, either a discrete group or a locally compact σ-compact metrizable group. Exercise 10.22. Let G be an amenable group. (1) Show that there exists a bi-invariant mean on L∞ (G), that is, one which is left-invariant and right-invariant (defined in the same way). (2) Assume that G is in addition unimodular. Show that G admits bi-invariant Følner sets, in the sense that they are almost invariant under left and right translation by a given compact subset K ⊆ G.

An action of a topological group G on a locally convex vector space V is affine if every g ∈ G acts via a map $V \ni v \mapsto \pi_g^{\mathrm{aff}}(v) = \pi_g^{\mathrm{lin}}(v) + w_g$, where $w_g$ depends continuously on g and $\pi_g^{\mathrm{lin}}(v)$ depends continuously on (g, v) ∈ G × V, and linearly on v.

Exercise 10.23. Show that the following properties are equivalent. (1) G is amenable. (2) If G acts continuously on a compact metric space X (see Definition 3.70), then there exists a G-invariant Borel probability measure on X. (3) If G acts continuously by affine maps on a locally convex space V and K ⊆ V is compact, convex, and G-invariant, then there exists a point $x_0 \in K$ that is fixed under all elements of G.

374

10 Locally Compact Groups, Amenability, Property (T)

Exercise 10.24. Generalize Proposition 7.20 to the groups considered here: (1) Show that if G is amenable and H < G is a closed subgroup then H is amenable. (2) Show that if H ⊳ G is a closed normal subgroup with the property that both H and G/H are amenable, then G is also amenable. Exercise 10.25. Let H < G be a closed subgroup with the property that X = G/H supports a finite G-invariant Borel measure. Show that G is amenable if and only if H is.

In the remainder of the section we assume that G is discrete and generated by a finite symmetric set S, where a subset S of a group G is symmetric if s ∈ S implies that $s^{-1} \in S$. The associated length function assigns to each element g ∈ G the number $\ell_S(g) \in \mathbb{N}_0$ defined by the length of the shortest representation of g as a product of elements of S, and the associated growth function is defined by
\[ \gamma_S(n) = |\{g \in G \mid \ell_S(g) \leq n\}| \]
for $n \in \mathbb{N}_0$. In order to define a growth property intrinsic to the group G rather than the pair (G, S), write γ ∼ γ′ for functions γ, γ′ : N → N if there exist positive constants $C_1, C_2, \kappa_1, \kappa_2$ such that
\[ C_1 \gamma'(\kappa_1 n) \leq \gamma(n) \leq C_2 \gamma'(\kappa_2 n) \]
for all n ≥ 1.

Exercise 10.26. Let G be generated by a symmetric set S. Show that setting $d(g, h) = \ell_S(gh^{-1})$ defines a metric on G.

Exercise 10.27. Show that the equivalence class $[\gamma_S]_\sim$ of the growth function of a finitely generated group is well-defined (meaning that it is independent of the choice of symmetric generating set), allowing us to write γ(G) for any representative of the equivalence class.

As a result we may make the following definition. A finitely generated infinite group G has
• polynomial growth if γ(G) ∼ $p_a$ for some a > 0, where $p_a(n) = n^a$ for all n ≥ 1;
• exponential growth if γ(G) ∼ exp, where $\exp(n) = e^n$ for n ≥ 1;
• sub-exponential growth if $\limsup_{n \to \infty} \gamma(G)(n)^{1/n} \leq 1$; and
• intermediate growth if it is of neither polynomial nor exponential growth.

Exercise 10.28. (1) Show that a group of sub-exponential growth is amenable. (2) Show that the Heisenberg group in Exercise 7.27 has polynomial growth. (3) Show that the group
\[ \left\{ \begin{pmatrix} a & b \\ 0 & 1 \end{pmatrix} \;\middle|\; a \in 2^{\mathbb{Z}},\ b \in \mathbb{Z}[\tfrac{1}{2}] \right\} \]
is finitely generated, amenable, and has exponential growth.
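The growth function γ_S can be computed for small radii by breadth-first search in the Cayley graph. The sketch below is an illustration only: the helper `growth` and the encodings of Z² (pairs of integers) and of the free group on two generators (reduced words as tuples, with inverse generators as negated integers) are choices made here, not constructions from the text.

```python
def growth(identity, generators, multiply, radius):
    """Return [gamma_S(0), ..., gamma_S(radius)] by breadth-first search
    in the Cayley graph of the group with respect to the symmetric set S."""
    seen = {identity}
    frontier = [identity]
    counts = [1]
    for _ in range(radius):
        next_frontier = []
        for g in frontier:
            for s in generators:
                h = multiply(g, s)
                if h not in seen:
                    seen.add(h)
                    next_frontier.append(h)
        frontier = next_frontier
        counts.append(len(seen))
    return counts

# Z^2 with S = {±e1, ±e2}: polynomial growth.
def add(g, s):
    return (g[0] + s[0], g[1] + s[1])

Z2_GENERATORS = [(1, 0), (-1, 0), (0, 1), (0, -1)]

# Free group on a, b with S = {a, a^-1, b, b^-1}: exponential growth.
# A reduced word is a tuple of entries in {1, -1, 2, -2}; -s is the inverse of s.
def free_multiply(word, s):
    if word and word[-1] == -s:
        return word[:-1]        # cancel s against its inverse
    return word + (s,)

F2_GENERATORS = [1, -1, 2, -2]

z2 = growth((0, 0), Z2_GENERATORS, add, 6)
f2 = growth((), F2_GENERATORS, free_multiply, 6)
assert z2 == [2 * n * n + 2 * n + 1 for n in range(7)]
assert f2 == [2 * 3 ** n - 1 for n in range(7)]
print(z2)
print(f2)
```

The closed formulas 2n² + 2n + 1 and 2·3ⁿ − 1 used in the assertions are the standard ball sizes for these two generating sets.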


10.3 Property (T) †

In this section we will connect the spectral theory of unitary flows in Section 9.3 to the discussion of expanders in Section 10.4. As we will see, the connection will be via another property that topological groups may have.

10.3.1 Definitions and First Properties

Let us start with some fundamental definitions where we will assume that G is a topological group and π is a unitary representation of G on a complex Hilbert space H.

Definition 10.29 (Almost-invariant vectors). Given ε > 0 and a subset Q ⊆ G, we say that a unit vector v ∈ H is (Q, ε)-almost invariant if
\[ \sup_{g \in Q} \|\pi_g v - v\| \leq \varepsilon. \]

We also say that the unitary representation π has almost-invariant vectors if it has (Q, ε)-almost invariant unit vectors for any ε > 0 and compact Q ⊆ G. The case of (G, 0)-almost invariant vectors corresponds trivially to invariant vectors. Also note that every unit vector is trivially (G, 2)-almost invariant. Another elementary but less immediate observation is contained in the following exercise which relies on the geometry of Hilbert spaces.

Exercise 10.30. Suppose ε ∈ (0, 1). Assume that v ∈ H is a (G, ε)-almost invariant unit vector. Show that there exists a non-zero vector that is invariant under all of G.

Definition 10.31 (Spectral gap). We say that π has spectral gap if π restricted to $(\mathcal{H}^G)^\perp$ does not have almost-invariant vectors, where
\[ \mathcal{H}^G = \{v \in \mathcal{H} \mid \pi_g v = v \text{ for all } g \in G\} \]
is the subspace of G-invariant vectors. Equivalently, we have spectral gap if there exists a compact subset Q ⊆ G and some ε > 0 such that every unit vector $v \in (\mathcal{H}^G)^\perp$ is moved at least by ε by some g ∈ Q; more precisely if $\sup_{g \in Q} \|\pi_g v - v\| \geq \varepsilon$.

Definition 10.32 (Property (T)). Let H < G be a closed subgroup of a topological group G. We say that (G, H) has relative property (T) if whenever a unitary representation π of G has almost-invariant vectors, then it has a non-zero vector fixed by H. We say that G has property (T) if (G, G) has relative property (T).

† This section will not be used later in the book except for Section 10.4. Amenability and property (T) will be (almost) exclusive. We note that apart from Exercise 10.35 the following will be independent of Section 10.2.


We note that the letter ‘T’ in property (T) stands for the trivial representation and that the parentheses indicate a neighbourhood of the trivial representation. In fact, there is a definition of a topology on the family of irreducible unitary representations of a topological group G, the Fell topology, such that property (T) is equivalent to the trivial representation being isolated in that topology. Finding groups without property (T) is quite easy.

Example 10.33. Let G = Z or G = R. Then G does not have property (T).

Justification of Example 10.33. Let $\mathcal{H} = L^2(G)$ and use the regular representation (λ, H) defined by $\lambda_x f(y) = f(y - x)$ for all x, y ∈ G. Let $m_G$ denote the Haar measure on G (that is, counting or Lebesgue measure). Let $F_n = [-n, n]$ in G for all n ≥ 1. Then $f_n = m_G(F_n)^{-1/2} 1_{F_n}$ has norm 1 and is almost-invariant in the sense that
\[ \|\lambda_x f_n - f_n\| = \frac{m_G\bigl( (F_n + x) \triangle F_n \bigr)^{1/2}}{m_G(F_n)^{1/2}} \longrightarrow 0 \]
as n → ∞, uniformly on compact sets. If G did have property (T), then $L^2(G)$ would have to contain a non-zero G-invariant function. However, a G-invariant function in $L^2(G)$ would have to be constant (see Exercise 10.4). Since $m_G(G) = \infty$, no non-zero constant function can lie in $L^2(G)$. Therefore, G does not have property (T).

Exercise 10.34. Show that if G is a topological group with property (T), and φ is a continuous homomorphism from G to G′ with dense image, then G′ also has property (T). Conclude that the free group F (with at least one generator) does not have property (T).

Exercise 10.35. Let G be a discrete or locally compact σ-compact metrizable group. Show that G is compact if and only if G is amenable and has property (T).
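For G = Z the almost-invariance in Example 10.33 can be made quantitative. The sketch below (function names are illustrative, and floating-point arithmetic is assumed to be accurate enough for the comparison) evaluates ‖λ_x f_n − f_n‖₂ through the symmetric difference of F_n and its translate, and tests whether f_n is (Q, ε)-almost invariant in the sense of Definition 10.29 for a finite set Q.

```python
import math

def shift_defect(n, x):
    """||lambda_x f_n - f_n||_2 for f_n = 1_{F_n} / sqrt(|F_n|), F_n = {-n, ..., n} in Z.

    The difference takes the values +-1/sqrt(|F_n|) exactly on (F_n + x) symdiff F_n,
    so the squared norm is |(F_n + x) symdiff F_n| / |F_n|.
    """
    F = set(range(-n, n + 1))
    shifted = {g + x for g in F}
    return math.sqrt(len(F ^ shifted) / len(F))

def is_almost_invariant(n, Q, eps):
    """Check whether f_n is (Q, eps)-almost invariant for the regular representation."""
    return max(shift_defect(n, x) for x in Q) <= eps

Q = range(-5, 6)
for n in (10, 1000, 10_000):
    print(n, max(shift_defect(n, x) for x in Q), is_almost_invariant(n, Q, 0.05))
```

For |x| ≤ 2n + 1 the defect equals √(2|x|/(2n + 1)), so for any fixed finite Q and ε > 0 the vector f_n qualifies once n is large enough, even though no non-zero invariant vector exists in ℓ²(Z).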

Comparing Definitions 10.31 and 10.32, we see that a unitary representation of a group with property (T) always has a spectral gap. The next lemma shows that more is true.

Lemma 10.36 (Uniform spectral gap). Suppose G is a locally compact σ-compact metrizable group. Then it suffices to consider only separable Hilbert spaces in the definition of property (T). Moreover, assuming that G has property (T) all unitary representations of G have uniform spectral gap in the sense that there exists some ε > 0 and Q ⊆ G compact such that for any unitary representation π on a Hilbert space H and any unit vector $v \in (\mathcal{H}^G)^\perp$, there is some g ∈ Q with $\|\pi_g v - v\| \geq \varepsilon$.


Proof. Note that by Lemma A.22, G can be written as $\bigcup_{n=1}^{\infty} Q_n$ for some compact subsets $Q_n \subseteq G$ with $Q_n \subseteq Q_{n+1}^{o}$ for all n ≥ 1. Suppose that G does not satisfy the uniform spectral gap property in the lemma. Then for every n ≥ 1 there is a unitary representation $(\pi_n, \mathcal{H}_n)$ without fixed vectors such that there exists a vector $v_n \in \mathcal{H}_n$ that is $(Q_n, \frac{1}{n})$-almost invariant. Since the unitary representation is continuous (and G is separable), it follows that $S_n = \{\pi_{n,g} v_n \mid g \in G\} \subseteq \mathcal{H}_n$ is separable. It follows that the closed linear hull $\mathcal{H}_n' = (\mathcal{H}_n)_{v_n}$ of $S_n$ is a separable G-invariant subspace of $\mathcal{H}_n$. Now define $\mathcal{H} = \bigoplus_n \mathcal{H}_n'$ with the natural unitary representation $\pi = \bigoplus_n \pi_n|_{\mathcal{H}_n'}$ of G on H (see Exercise 3.77) and notice that (π, H) has no non-zero G-invariant vectors. Moreover, it has almost-invariant vectors since for every ε > 0 and compact K ⊆ G there exists some n such that $K \subseteq Q_n$, $\frac{1}{n} \leq \varepsilon$, and hence $v_n \in \mathcal{H}_n' \subseteq \mathcal{H}$ is (K, ε)-almost invariant. It follows that the failure of the uniform spectral gap property implies the existence of a unitary representation on a separable Hilbert space without spectral gap. This proves both statements of the lemma.

Exercise 10.37. Show that a discrete group with property (T) is finitely generated.

10.3.2 Main Theorems

In the following we will consider the groups $\mathrm{SL}_d(\mathbb{R})$ endowed with the topology induced by the inclusion $\mathrm{SL}_d(\mathbb{R}) \subseteq \mathrm{Mat}_{d,d}(\mathbb{R}) \cong \mathbb{R}^{d^2}$. Každan gave the definition of property (T) in 1967 and also gave the first examples of such groups.

Theorem 10.38 (Každan). $\mathrm{SL}_3(\mathbb{R})$ has property (T).

We note that $G = \mathrm{SL}_2(\mathbb{R})$ does not have property (T), but despite this, many of its natural (and all of its irreducible) unitary representations have spectral gap; we refer to [26] for references and a detailed discussion. The main tool for proving the above theorem is the following relative version.

Theorem 10.39 (Každan). $(\mathrm{ASL}_2(\mathbb{R}), \mathbb{R}^2)$ has relative property (T), where
\[ \mathrm{ASL}_2(\mathbb{R}) = \mathrm{SL}_2(\mathbb{R}) \ltimes \mathbb{R}^2 = \left\{ \begin{pmatrix} A & x \\ 0 & 1 \end{pmatrix} \;\middle|\; A \in \mathrm{SL}_2(\mathbb{R}),\ x \in \mathbb{R}^2 \right\}. \]

As we will see there is a way to push property (T) from the group $\mathrm{SL}_3(\mathbb{R})$ to its discrete counterpart $\mathrm{SL}_3(\mathbb{Z})$.

Corollary 10.40 (Každan). $\mathrm{SL}_3(\mathbb{Z})$ has property (T).

As Margulis showed in 1988 discrete groups with property (T) quickly give rise to expander families, which we will introduce in the next section.


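10.3.3 Proof of Každan’s Property (T), Connected Case

For the proof of Theorem 10.38 we need the following property of unitary representations of $G = \mathrm{SL}_d(\mathbb{R})$ for d = 2, 3 (due to Mautner [69] and Moore [75]). For this we define the subgroup
\[ U_{12} = \left\{ u_x = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \;\middle|\; x \in \mathbb{R} \right\} \]
of $\mathrm{SL}_2(\mathbb{R})$. Identifying $\mathbb{R}^2$ with the subspace $\mathbb{R}^2 \times \{0\}^{d-2}$ of $\mathbb{R}^d$, we obtain an embedding
\[ g \longmapsto \begin{pmatrix} g & \\ & I_{d-2} \end{pmatrix} \]
for $g \in \mathrm{SL}_2(\mathbb{R})$ of $\mathrm{SL}_2(\mathbb{R})$ into $\mathrm{SL}_d(\mathbb{R})$ for d ≥ 3 and may think of $U_{12}$ also as a subgroup of $\mathrm{SL}_d(\mathbb{R})$. Conjugating $U_{12}$ with permutation matrices we obtain other subgroups of $\mathrm{SL}_d(\mathbb{R})$, which we will refer to as elementary unipotent subgroups.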

Proposition 10.41 (Mautner phenomenon). Let $\pi : \mathrm{SL}_d(\mathbb{R}) \curvearrowright \mathcal{H}$ for some d ≥ 2 be a unitary representation. Suppose that v ∈ H satisfies either
• $\pi_a v = v$ for a non-trivial positive diagonal matrix $a \in \mathrm{SL}_d(\mathbb{R})$, or
• $\pi_u v = v$ for all elements u ∈ U of an elementary unipotent subgroup U.
Then v is an invariant vector, meaning that $\pi_g v = v$ for all $g \in \mathrm{SL}_d(\mathbb{R})$.

For the proof we will use the following algebraic fact for K = R. For any field K the group $\mathrm{SL}_d(K)$ is generated by the elementary unipotent subgroups (defined as above but with x ∈ K). This may be seen using a modified Gauss elimination algorithm: given any $g \in \mathrm{SL}_d(K)$ it is clear that the first column is non-zero. Multiplying g on the left by elements of $U_{12}$ (or another elementary unipotent subgroup) corresponds to the row operation of adding a multiple of the second row to the first row (or the same with any two other rows). For example, for d = 2 we have
\[ \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a + xc & b + xd \\ c & d \end{pmatrix} \]
for all x, a, b, c, d ∈ R. Using such operations we can obtain matrices g′, g″ and $\tilde{g}$ that satisfy increasingly stronger properties:
• $g'_{21} \neq 0$,
• $g''_{11} = 1$,
• $\tilde{g}_{11} = 1$, and $\tilde{g}_{21} = \tilde{g}_{31} = \cdots = \tilde{g}_{d1} = 0$.

Multiplying on the right by the same type of matrices corresponds to column operations which allows us to find now a matrix $\hat{g} \in HgH$ satisfying
• $\hat{g}_{11} = 1$, $\hat{g}_{1k} = \hat{g}_{k1} = 0$ for k ≥ 2,
where H is the subgroup of $\mathrm{SL}_d(K)$ generated by the elementary unipotent subgroups. Using induction on the number of variables we see that $\hat{g} \in H$, which implies that g ∈ H and thus $H = \mathrm{SL}_d(K)$.

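Proof of Proposition 10.41. As we will see we only have to multiply matrices and use continuity of the unitary representation. Let us first consider the case d = 2 and a diagonal positive matrix
\[ a = \begin{pmatrix} t & 0 \\ 0 & t^{-1} \end{pmatrix}, \]
where we may assume without loss of generality that t > 1. Let
\[ u = \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \in U_{12} \]
for some x ∈ R and notice that
\[ \lim_{n \to \infty} a^{-n} \begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} a^{n} = \lim_{n \to \infty} \begin{pmatrix} 1 & t^{-2n}x \\ 0 & 1 \end{pmatrix} = I. \]
If $\pi_a v = v$ then continuity of the unitary representation implies that
\[ \|\pi_u v - v\| = \|\pi_u \pi_{a^n} v - \pi_{a^n} v\| = \|\pi_{a^{-n}ua^{n}} v - v\| \longrightarrow 0 \]
as n → ∞. Therefore $\pi_u v = v$ for all $u \in U_{12}$. Using the same argument with $\pi_{a^{-1}} v = v$ and the relation
\[ \lim_{n \to \infty} a^{n} \begin{pmatrix} 1 & 0 \\ x & 1 \end{pmatrix} a^{-n} = I, \]
we see that v is fixed by both elementary unipotent subgroups, and hence by all of $\mathrm{SL}_2(\mathbb{R})$.

Staying with the case d = 2, suppose now that v is fixed by the subgroup $U_{12}$. Define
\[ g_n = \begin{pmatrix} 1 & 0 \\ \frac{1}{n} & 1 \end{pmatrix} \]
and calculate
\[ u_n g_n u_{-n/2} = \begin{pmatrix} 1 & n \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ \frac{1}{n} & 1 \end{pmatrix} \begin{pmatrix} 1 & -\frac{n}{2} \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ \frac{1}{n} & \frac{1}{2} \end{pmatrix}, \]
which shows that
\[ \lim_{n \to \infty} u_n g_n u_{-n/2} = \begin{pmatrix} 2 & 0 \\ 0 & \frac{1}{2} \end{pmatrix} = a_2. \]
Using continuity of the unitary representation again we see that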


\[ \|\pi_{a_2} v - v\| = \lim_{n \to \infty} \|\pi_{u_n} \pi_{g_n} \pi_{u_{-n/2}} v - v\| = \lim_{n \to \infty} \|\pi_{g_n} v - \pi_{u_n}^{-1} v\| = \lim_{n \to \infty} \|\pi_{g_n} v - v\| = 0 \]

since $g_n \to I$ as n → ∞. Therefore, v is also invariant under $a_2$ and the first part of the proof shows that v is fixed by all of $\mathrm{SL}_2(\mathbb{R})$. The case of the other elementary unipotent subgroup follows by the same argument.

Let us note that the first argument above also applies for a non-trivial diagonal matrix $a \in \mathrm{SL}_d(\mathbb{R})$ with positive eigenvalues $a_1, \ldots, a_d > 0$ in the following way: If $\pi_a v = v$ and, for example, $a_1 \neq a_2$, then v is also fixed by the subgroup obtained by embedding $\mathrm{SL}_2(\mathbb{R})$ into the upper left 2-by-2 block in $\mathrm{SL}_d(\mathbb{R})$.

Suppose now that d = 3 and v is fixed by a non-trivial positive diagonal matrix a with eigenvalues $a_1, a_2, a_3$. Assume that $a_1 \neq a_2$ (the other cases are similar, or can be reduced to this one by using permutation matrices). In this case v is fixed by the subgroup H obtained by embedding $\mathrm{SL}_2(\mathbb{R})$ into the upper left 2-by-2 block in $\mathrm{SL}_3(\mathbb{R})$ and in particular by
\[ a' = \begin{pmatrix} 2 & 0 & 0 \\ 0 & \frac{1}{2} & 0 \\ 0 & 0 & 1 \end{pmatrix} \in H. \]

Since the eigenvalues of a′ satisfy $a'_1 \neq a'_3$ and $a'_2 \neq a'_3$ we may repeat the argument for $\mathrm{SL}_2(\mathbb{R})$ twice more and see that v is fixed by all elementary unipotent subgroups, which implies that v is fixed by all of $\mathrm{SL}_3(\mathbb{R})$.

Remaining with the case d = 3, suppose that v is fixed by an elementary unipotent subgroup U. Since U is again contained in a subgroup $H \cong \mathrm{SL}_2(\mathbb{R})$ we see that v is invariant under a non-trivial positive diagonal element to which we may apply the arguments above. The case d > 3 follows similarly by induction and will not be needed later, so we leave this part of the proof to the reader (see Exercise 10.42(a)).

Exercise 10.42. (a) Confirm that the case d > 3 in Proposition 10.41 may be seen using the same argument. (b) Suppose that $u \in \mathrm{SL}_d(\mathbb{R})$ is a non-trivial unipotent element (that is, u ≠ I and all eigenvalues of u are equal to 1). Show that for any unitary representation $\pi : \mathrm{SL}_d(\mathbb{R}) \curvearrowright \mathcal{H}$ any v ∈ H with $\pi_u v = v$ is invariant under all of $\mathrm{SL}_d(\mathbb{R})$.
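The two matrix identities that drive the proof of Proposition 10.41 for d = 2 can be checked with exact arithmetic. This is a small verification sketch with hypothetical helper names, restricted to rational t and x: it computes a⁻ⁿ u_x aⁿ for a = diag(t, 1/t) and the product u_n g_n u_{−n/2}.

```python
from fractions import Fraction

def mat(a, b, c, d):
    """2-by-2 matrix with exact rational entries."""
    return ((Fraction(a), Fraction(b)), (Fraction(c), Fraction(d)))

def mul(A, B):
    return tuple(tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
                 for i in range(2))

def conjugate_by_diagonal(t, x, n):
    """a^{-n} u_x a^{n} for a = diag(t, 1/t); the result is u_{t^{-2n} x}."""
    t = Fraction(t)
    a_n = mat(t ** n, 0, 0, t ** (-n))
    a_minus_n = mat(t ** (-n), 0, 0, t ** n)
    return mul(mul(a_minus_n, mat(1, x, 0, 1)), a_n)

def unipotent_sandwich(n):
    """u_n g_n u_{-n/2} with g_n the lower unipotent matrix with entry 1/n."""
    u_n = mat(1, n, 0, 1)
    g_n = mat(1, 0, Fraction(1, n), 1)
    u_half = mat(1, Fraction(-n, 2), 0, 1)
    return mul(mul(u_n, g_n), u_half)

for n in (1, 5, 50):
    # The off-diagonal entry t^{-2n} x tends to 0 for t > 1.
    assert conjugate_by_diagonal(2, 7, n) == mat(1, Fraction(7, 4 ** n), 0, 1)
    # Only the lower-left entry 1/n depends on n; the limit is diag(2, 1/2).
    assert unipotent_sandwich(n) == mat(2, 0, Fraction(1, n), Fraction(1, 2))
```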

Proof of Theorem 10.38 assuming Theorem 10.39. Note that
\[ \mathrm{ASL}_2(\mathbb{R}) = \mathrm{SL}_2(\mathbb{R}) \ltimes \mathbb{R}^2 = \left\{ \begin{pmatrix} A & x \\ 0 & 1 \end{pmatrix} \;\middle|\; A \in \mathrm{SL}_2(\mathbb{R}),\ x \in \mathbb{R}^2 \right\} \]
is a closed subgroup of $\mathrm{SL}_3(\mathbb{R})$. Suppose π is a unitary representation of $\mathrm{SL}_3(\mathbb{R})$ that has almost-invariant vectors. Restricting π to $\mathrm{ASL}_2(\mathbb{R})$ we obtain a representation of $\mathrm{ASL}_2(\mathbb{R})$ that has almost-invariant vectors. Then by Theorem 10.39, H contains a non-zero vector v that is fixed by
\[ \begin{pmatrix} I & x \\ 0 & 1 \end{pmatrix} \]
for all $x \in \mathbb{R}^2$. By the Mautner phenomenon (Proposition 10.41) this implies that v is fixed by all of $\mathrm{SL}_3(\mathbb{R})$. It follows that $\mathrm{SL}_3(\mathbb{R})$ has property (T).

For the proof of Theorem 10.39 we will use the spectral measures from Bochner’s theorem for unitary flows (Theorem 9.56).

Lemma 10.43 (Normalizer and push-forward). Let $G = \mathrm{ASL}_2(\mathbb{R})$ and suppose π is a unitary representation of G on a complex Hilbert space H. Let $\mu_v$ denote the spectral measure of v ∈ H with respect to the restriction of π to $\mathbb{R}^2 \lhd G$. For every $A \in \mathrm{SL}_2(\mathbb{R})$ and v ∈ H we then have $\mu_{\pi_A v} = (A^t)^{-1}_* \mu_v$ where $\pi_A$ is the unitary operator obtained from the matrix A, thought of as an element of $\mathrm{ASL}_2(\mathbb{R})$.

Proof. Recall that for any v ∈ H the spectral measure $\mu_v$ is uniquely determined by the property
\[ \langle \pi_{u(x)} v, v \rangle = \int_{\mathbb{R}^2} e^{2\pi i x \cdot t} \, d\mu_v(t) \]

for all x ∈ R2 , where we use the injective homomorphism u : R2 → ASL2 (R) defined by   Ix u(x) = 01 for x ∈ R2 . Applying this to πA v, we have Z

e2πix·t dµπA v (t) = πu(x) πA v, πA v = πu(A−1 x) v, v R2 Z Z t −1 2πiA−1 x·t = e dµv (t) = e2πix·(A ) t dµv (t) 2 R2 ZR t 2πix·s −1 = e d(A )∗ µv (s), R2

t

where we used A−1 u(x)A = u(A−1 x) for all x ∈ R2 . Hence µπA v = (A )−1 ∗ µv by uniqueness of the spectral measure.  Lemma 10.44 (Continuity of spectral measures). Let π be a unitary representation of Rd for some d > 1 on a complex Hilbert space H. If v and w in H have norm one, then the difference of their spectral measures satisfies kµv − µw k 6 4kv − wk. Proof. First decompose w = w1 + w2 with w1 ∈ Hv and w2 ∈ Hv⊥ . By the properties of the orthogonal decomposition we have kv − w1 k 6 kv − wk and kw2 k 6 kv − wk 6 2. Just as in Lemma 9.12(b) it is easy to see that

µw = µw1 + µw2 , which gives
\[ \|\mu_v - \mu_w\| = \|\mu_v - \mu_{w_1} - \mu_{w_2}\| \le \|\mu_v - \mu_{w_1}\| + \|\mu_{w_2}\| \le \|\mu_v - \mu_{w_1}\| + 2\|v - w\| \]
since ‖µw2‖ = ‖w2‖² . It remains to bound ‖µv − µw1‖. First recall that in the spectral theorem (Theorem 9.58) the generator v corresponds to $1 \in L^2_{\mu_v}(\mathbb{R}^d)$ and w1 corresponds to some function $f \in L^2_{\mu_v}(\mathbb{R}^d)$. Also note that dµw1 = |f|² dµv by the same argument as in the proof of Lemma 9.12(c). Therefore,
\[ \|\mu_v - \mu_{w_1}\| = \int_{\mathbb{R}^d} \bigl|1 - |f|^2\bigr| \, d\mu_v = \int_{\mathbb{R}^d} h \bigl(1 - |f|^2\bigr) \, d\mu_v = \langle h1, 1 \rangle_{L^2_{\mu_v}} - \langle hf, f \rangle_{L^2_{\mu_v}}, \]
where h(t) = sign(1 − |f(t)|²). By the Cauchy–Schwarz inequality we deduce that
\begin{align*}
\|\mu_v - \mu_{w_1}\| &= \langle h1 - hf, 1 \rangle_{L^2_{\mu_v}} + \langle hf, 1 - f \rangle_{L^2_{\mu_v}} \\
&\le \|v - w_1\| \|v\| + \|w_1\| \|v - w_1\| \le 2\|v - w_1\| \le 2\|v - w\|
\end{align*}
since $1 \in L^2_{\mu_v}(\mathbb{R}^d)$ corresponds to v and $f \in L^2_{\mu_v}(\mathbb{R}^d)$ to w1 ∈ Hv . Together with the above this gives the lemma.

The last preparatory step for the proof of Theorem 10.39 is the following negative result.

Lemma 10.45 (No invariant measures). The natural action of SL2 (R) on the projective line P1 (R) = (R2 ∖ {0})/∼ has no invariant probability measures.

The natural action here is given by
\[ \mathrm{SL}_2(\mathbb{R}) \ni \begin{pmatrix} a & b \\ c & d \end{pmatrix} : \begin{bmatrix} x \\ y \end{bmatrix} \longmapsto \begin{bmatrix} ax + by \\ cx + dy \end{bmatrix}, \]
where we write $\begin{bmatrix} x \\ y \end{bmatrix}$ for the equivalence class (with respect to proportionality) of a vector $\begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2 \setminus \{0\}$.

Proof of Lemma 10.45. Notice that SO2 (R) ⊆ SL2 (R) acts transitively on P1 (R), the kernel M of the action consists of ±I, and that SO2 (R)/M acts simply transitively on P1 (R). Fixing an element of P1 (R), say the element corresponding to the x-axis, to correspond to the identity of SO2 (R)/M , we may identify P1 (R) with SO2 (R)/M so that the action corresponds to translation on the group. By uniqueness of the Haar measure (Proposition 10.2) there is only one SO2 (R)-invariant probability measure on P1 (R). However, other elements of SL2 (R) do not preserve that measure. For example, the action of
\[ \begin{pmatrix} e & \\ & e^{-1} \end{pmatrix} \]
does not preserve that probability measure (check this).

Proof of Theorem 10.39. Let π be a unitary representation of ASL2 (R) on a Hilbert space H, and suppose it has almost invariant vectors. We note that we may assume that H is a complex Hilbert space, for if H is a real Hilbert space we may complexify it (see Exercise 6.51 and Exercise 10.46) and can extend the given representation to a unitary representation on a complex Hilbert space that will also have almost invariant vectors. Let (Qn ) be a sequence of compact subsets in ASL2 (R) with Qn ⊆ Qn+1 for which
\[ \bigcup_{n \ge 1} Q_n = \mathrm{ASL}_2(\mathbb{R}). \]

Then for every n ≥ 1 there exists some (Qn , 1/n)-invariant vector vn ∈ H with ‖vn‖ = 1. Let µvn be the spectral measure of vn with respect to R2 ⊲ ASL2 (R) for each n ≥ 1. If, for some n ≥ 1, we have µvn ({0}) > 0, then by the spectral theorem (Theorem 9.58) $H_{v_n} \cong L^2_{\mu_{v_n}}(\mathbb{R}^2) \ni 1_{\{0\}}$ contains a non-zero vector that is invariant under R2 . This is precisely the statement that we want to prove. So suppose that µvn ({0}) = 0 for all n ≥ 1, and project µvn to a measure νn = p∗ µvn on P1 (R), where p : R2 ∖ {0} → P1 (R) denotes the natural projection map p(v) = [v] for all v ∈ R2 ∖ {0}. Since P1 (R) is compact we may apply Proposition 8.27 and choose a subsequence (νnk ) such that νnk → ν in the weak* topology as k → ∞ for some probability measure ν on P1 (R). We claim that ν is invariant under the action of SL2 (R) on P1 (R). To show this, let f ∈ C(P1 (R)), consider the function $F = f \circ p \in \mathcal{L}^\infty(\mathbb{R}^2 \setminus \{0\})$, and extend it by, for example, setting F (0) = 0. For A ∈ Qn ∩ SL2 (R) the vector vn satisfies ‖πA vn − vn‖ ≤ 1/n and so we have

\begin{align*}
\int_{P^1(\mathbb{R})} f \circ (A^t)^{-1} \, d\nu_n &= \int_{\mathbb{R}^2} F \circ (A^t)^{-1} \, d\mu_{v_n} = \int_{\mathbb{R}^2} F \, d(A^t)^{-1}_* \mu_{v_n} \\
&= \int_{\mathbb{R}^2} F \, d\mu_{v_n} + O_f\bigl(\tfrac{1}{n}\bigr) = \int_{P^1(\mathbb{R})} f \, d\nu_n + O_f\bigl(\tfrac{1}{n}\bigr)
\end{align*}
by Lemmas 10.43 and 10.44. Now let n = nk and take k → ∞ to see that
\[ \int_{P^1(\mathbb{R})} f \circ (A^t)^{-1} \, d\nu = \int_{P^1(\mathbb{R})} f \, d\nu \]
for all f ∈ C(P1 (R)) and A ∈ SL2 (R). However, this shows that ν is SL2 (R)-invariant, which contradicts Lemma 10.45.

Exercise 10.46. Let π be a unitary representation of a topological group on a real Hilbert space H. Let HC be the complexification of H as in Exercise 6.51. Show that π can be extended to a unitary representation on HC , which has almost-invariant (or invariant) vectors if and only if the original representation has almost invariant (or invariant) vectors.

Exercise 10.47. Show that SLd (R) has property (T) for all d ≥ 3.
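The "(check this)" at the end of the proof of Lemma 10.45 can be explored numerically. The following is a minimal sketch, assuming that P1 (R) is parametrized by the angle θ ∈ [0, π) of a line through the origin, so that the unique SO2 (R)-invariant probability measure is normalized arc length dθ/π. The function names are illustrative only and are not taken from the text.

```python
import math

def act_diag(theta, s=math.e):
    """Angle (mod pi) of the line spanned by diag(s, 1/s) applied to (cos theta, sin theta)."""
    x, y = s * math.cos(theta), math.sin(theta) / s
    return math.atan2(y, x) % math.pi

def arc_measure(lo, hi):
    """Rotation-invariant probability measure of the arc of angles [lo, hi) in P^1(R)."""
    return (hi - lo) / math.pi

# The arc of lines with slope in [0, 1) is mapped to the lines with slope in [0, e^-2).
arc = (0.0, math.pi / 4)
image = (act_diag(arc[0]), act_diag(arc[1]))
before = arc_measure(*arc)     # measure of the arc
after = arc_measure(*image)    # measure of its image under diag(e, 1/e)
print(before, after)
```

The two printed values differ, so the image of the arc has a different measure than the arc itself, which is what the failure of invariance means for this particular element.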

10.3.4 Proof of Každan’s Property (T), Discrete Case

The connection between SL3 (R) and its discrete subgroup SL3 (Z) is largely controlled by the fact that SL3 (Z) is a lattice in SL3 (R). We will not discuss the important notion of lattices in detail, but instead will work with the following form of the result, which will be proved after its significance is established.

Theorem 10.48 (SL3 (Z) is a lattice). There exists a Borel subset F of the group G = SL3 (R), called a fundamental domain for SL3 (Z) in SL3 (R), such that mG (F ) < ∞ and
\[ G = \bigsqcup_{\gamma \in \mathrm{SL}_3(\mathbb{Z})} F\gamma. \]

Apart from this result, we will also need a simple form of induction of unitary representations which will allow us to lift a unitary representation of SL3 (Z) to a unitary representation of SL3 (R). To explain this more generally, we let Γ < G be a discrete subgroup of a locally compact, σ-compact, metrizable, unimodular group, and let F ⊆ G be a fundamental domain for Γ in G, that is, a Borel subset such that
\[ G = \bigsqcup_{\gamma \in \Gamma} F\gamma. \tag{10.19} \]

Simple examples include Γ = Zd < G = Rd with F = [0, 1)d for any d ≥ 1; we refer to [25] for more details on the properties of fundamental domains. Furthermore, let πΓ be a unitary representation of Γ on a separable Hilbert space HΓ .
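For the example Γ = Zd < G = Rd with F = [0, 1)d , the disjoint decomposition (10.19) says that every g ∈ Rd can be written in exactly one way as g = f + γ with f ∈ F and γ ∈ Zd (the group operation here is addition). A minimal sketch of this splitting, with a hypothetical helper name and exact only up to floating-point rounding:

```python
import math

def split(g):
    """Write g in R^d as f + gamma with f in [0,1)^d and gamma in Z^d."""
    gamma = [math.floor(x) for x in g]          # integer part, coordinate by coordinate
    f = [x - n for x, n in zip(g, gamma)]       # remaining part lies in [0, 1)
    return f, gamma

f, gamma = split([2.75, -0.5, 3.0])
print(f, gamma)
```

Note that negative coordinates are rounded down, not towards zero, which is what keeps the representative inside [0, 1)d .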


Using these objects, we now define a new Hilbert space HG equipped with a unitary representation of G called the induced representation. It is possible to give this definition abstractly and in a coordinate-free way, but using a Hilbert space isomorphism between HΓ and ℓ2 (N) we can make the definition of HG more explicit. We implicitly assume here that HΓ is infinite-dimensional. In the finite-dimensional case the construction is slightly easier and adapting the notation to this case is straightforward. In the description afforded by this isomorphism we can therefore assume that HΓ = ℓ2 (N) and that Γ acts unitarily on ℓ2 (N). Using the fundamental domain F ⊆ G for Γ in G we give the initial definition
\[ H_G = \bigoplus_{n \in \mathbb{N}} L^2(F) = \Bigl\{ (f_1, f_2, \dots) \in L^2(F)^{\mathbb{N}} \Bigm| \sum_{n=1}^{\infty} \|f_n\|_2^2 < \infty \Bigr\} \]

[…] mG (F ), then there exists some g ∈ B and γ ∈ Γ ∖ {e} with gγ ∈ B.


Proof. By assumption on F in (10.19) we have
\[ F' = \bigsqcup_{\gamma \in \Gamma} F' \cap (F\gamma^{-1}). \tag{10.23} \]
Multiplying F′ ∩ (F γ−1 ) on the right by γ, we obtain the sets (F′ γ) ∩ F . These are again disjoint by assumption on F′ and their union equals F . Therefore,
\[ m_G(F') = \sum_{\gamma \in \Gamma} m_G\bigl(F' \cap (F\gamma^{-1})\bigr) = \sum_{\gamma \in \Gamma} m_G\bigl((F'\gamma) \cap F\bigr) = m_G(F) \]
by unimodularity of G.

Suppose now that g0 ∈ G and F is a measurable fundamental domain. Then F′ = g0 F is another fundamental domain and (10.23) defines a map φ by
\[ F \ni g \longmapsto g_0 g \in F' \longmapsto g_0 g \gamma_g \in F \]
(with γg ∈ Γ being uniquely determined by the condition g0 gγg ∈ F ). It is clear that the inverse to this map is given by the same procedure but using $g_0^{-1}$. To see that these maps are measure-preserving we consider the function φ above. Now let B ⊆ F be measurable, and note that φ(B) is defined by piecewise right translation of the set g0 B ⊆ F′ = g0 F back to F . In other words, we use (10.23) and apply the same cut-and-translate procedure to obtain the desired equality
\begin{align*}
m_G(B) = m_G(g_0 B) &= \sum_{\gamma \in \Gamma} m_G\bigl((g_0 B) \cap (F\gamma^{-1})\bigr) \\
&= \sum_{\gamma \in \Gamma} m_G\bigl((g_0 B\gamma) \cap F\bigr) = m_G\Bigl( \bigsqcup_{\gamma \in \Gamma} (g_0 B\gamma) \cap F \Bigr) = m_G(\varphi(B)).
\end{align*}

Strictly speaking we should prove that mG (φ−1 (B)) = mG (B) as in the definition of a measure-preserving map (Definition 8.35), but since φ is invertible this distinction is not important.

Suppose now that B ⊆ G is measurable with mG (B) > mG (F ). Applying (10.19) we see that
\[ m_G(F) < m_G(B) = \sum_{\gamma \in \Gamma} m_G\bigl(B \cap (F\gamma^{-1})\bigr) = \sum_{\gamma \in \Gamma} m_G\bigl((B\gamma) \cap F\bigr), \]
which implies the existence of γ1 ≠ γ2 ∈ Γ and g1 , g2 ∈ B with g1 γ1 = g2 γ2 as required.

Proof of Lemma 10.50. Let F and F′ again denote two fundamental domains so that $F = \bigsqcup_{\gamma \in \Gamma} F \cap (F'\gamma^{-1})$. Applying the first equality in (10.22) (which is the definition of the norm on HG using F ) we obtain

\[ \|f\|_{H_G}^2 = \sum_{\gamma \in \Gamma} \int_{F \cap (F'\gamma^{-1})} \|f(g)\|_2^2 \, dm_G(g) = \sum_{\gamma \in \Gamma} \int_{(F\gamma) \cap F'} \|f(h\gamma^{-1})\|_2^2 \, dm_G(h) \]
by using the substitution g = hγ−1 for g ∈ F ∩ F′ γ−1 and h ∈ F γ ∩ F′ and unimodularity of G. Using the defining formula f (h) = f (gγ) = πΓ (γ)−1 f (g) we see that ‖f (h)‖2 = ‖f (g)‖2 = ‖f (hγ−1 )‖2 and so
\[ \|f\|_{H_G}^2 = \sum_{\gamma \in \Gamma} \int_{(F\gamma) \cap F'} \|f(h)\|_2^2 \, dm_G(h) = \int_{F'} \|f(h)\|_2^2 \, dm_G(h), \]
where we used the consequence $F' = \bigsqcup_{\gamma \in \Gamma} (F\gamma) \cap F'$ of (10.19).

We are now ready to prove the main properties of the unitary induction (which in a sense combines the unitary representation πΓ and the measure-preserving maps discussed in Lemma 10.51).

Proposition 10.52. Let G be a locally compact σ-compact metrizable unimodular group and Γ < G a lattice (so that there exists a fundamental domain F as in (10.19) with mG (F ) < ∞). Given a unitary representation πΓ of Γ on a separable Hilbert space HΓ , the Hilbert space HG constructed above admits a unitary representation πG of G defined by
\[ (\pi_{G,g_0} f)(g) = f(g_0^{-1} g) \]
for g0 , g ∈ G and f ∈ HG . Moreover, HΓ has a non-trivial Γ -fixed vector if and only if HG has a non-trivial G-fixed vector, and HG has almost invariant vectors if HΓ has almost invariant vectors.

Note that the formula defining πG,g0 is the same formula as for the left regular representation on the space of functions on G, but that the space and the norm are different.

Exercise 10.53. Let Γ < G be a discrete subgroup of a locally compact, σ-compact, metrizable, unimodular group G. Let πΓ be the left regular representation of Γ on ℓ2 (Γ ) defined by $\pi_{\Gamma,\gamma_0} f(\gamma) = f(\gamma_0^{-1}\gamma)$ for all f ∈ ℓ2 (Γ ) and γ0 , γ ∈ Γ . Show that the induced representation πG is then unitarily isomorphic to the left regular representation of G.

Proof of Proposition 10.52. Let g0 ∈ G and f ∈ HG . Then
\[ \|\pi_{G,g_0}(f)\|_{H_G}^2 = \int_F \|f(g_0^{-1} g)\|_2^2 \, dm_G(g) = \int_{g_0^{-1} F} \|f(h)\|_2^2 \, dm_G(h) \]
by left-invariance of the Haar measure mG . However, by Lemma 10.50 the latter is equal to $\|f\|_{H_G}^2$ since $g_0^{-1} F = F'$ is also a fundamental domain. Hence πG,g0 is a unitary operator on HG for any g0 ∈ G. That πG is a homomorphism from G into the group of unitary operators of HG follows by the same argument as for the regular representation on p. 353.

Lifting invariant vectors. Suppose now that v ∈ HΓ is an invariant unit vector. Then we can define $f(g) = \frac{1}{\sqrt{m_G(F)}} v$ for all g ∈ F , and the extension of f to G will be $f(g) = \frac{1}{\sqrt{m_G(F)}} v$ for all g ∈ G. Notice that f ∈ HG

since mG (F ) < ∞. We therefore obtain a unit vector of HG that is invariant with respect to G.

Pushing invariant vectors back to HΓ . Suppose for the opposite direction that HG has a G-invariant unit vector f . Since G acts transitively on itself, this implies that f (g) = v for some non-zero v in HΓ and almost every g ∈ G (by using Exercise 10.4 for each component fj of f = (f1 , f2 , . . .)). Since we also have f (gγ) = πΓ (γ)−1 f (g) for almost every g ∈ G and all γ ∈ Γ we see that v ∈ HΓ is a non-zero Γ -invariant vector.

Lifting almost invariant vectors. Suppose next that HΓ has almost invariant vectors, let K ⊆ G be a compact subset and let ε > 0. Recall that mG (F ) < ∞ since Γ < G is a lattice. By regularity of mG there exists a compact subset L ⊆ F such that mG (F ∖ L) < εmG (F ). Since L−1 KL ⊆ G is compact and Γ < G is discrete, the set Q = Γ ∩ (L−1 KL) is a finite subset of Γ . Suppose now that v ∈ HΓ is a (Q, ε)-almost invariant unit vector. Much as in the discussion of invariant vectors, we define f ∈ HG by setting
\[ f(g) = \begin{cases} v & \text{for } g \in L, \\ 0 & \text{for } g \in F \setminus L \end{cases} \]
and use the formula f (gγ) = πΓ (γ)−1 f (g) for all g ∈ F and γ ∈ Γ to extend f to a function f ∈ HG . Now let k ∈ K and g ∈ F . Then πG,k f (g) = f (k−1 g) = πΓ (γ)f (k−1 gγ) for all γ ∈ Γ . Choose γg ∈ Γ such that k−1 gγg = g′ ∈ F . Using this notation we can then write
\[ \|\pi_{G,k} f - f\|_{H_G}^2 = \int_F \bigl\| \pi_\Gamma(\gamma_g) f(\underbrace{k^{-1} g \gamma_g}_{= g' \in F}) - f(g) \bigr\|_2^2 \, dm_G(g). \]

Next we decompose the integral into the subsets:

• g ∈ F ∖ L (with f (g) = 0 and ‖f (g′ )‖2 ≤ ‖v‖2 = 1);
• g ∈ L and g′ = k−1 gγg ∈ F ∖ L (with f (g) = v and f (g′ ) = 0); and
• g, g′ ∈ L (with f (g) = f (g′ ) = v).

In the first case we will use mG (F ∖ L) < εmG (F ), in the second case
\[ m_G(\{g \in L \mid g' \in F \setminus L\}) \le m_G(F \setminus L) < \varepsilon\, m_G(F) \]
(which follows since F ∋ g ↦ g′ ∈ F is measure-preserving by Lemma 10.51), and in the third case we see that
\[ \gamma_g = g^{-1} k g' \in L^{-1} K L \cap \Gamma = Q \]
and so ‖πΓ (γg )v − v‖2 < ε. Together this gives
\[ \|\pi_{G,k} f - f\|_{H_G}^2 < 2\varepsilon\, m_G(F) + \varepsilon^2 m_G(F). \]
We may assume that ε < 1/2, so that
\[ \|f\|_2 = \sqrt{m_G(L)} > \sqrt{\frac{m_G(F)}{2}}. \]

Since ε > 0 was arbitrary, this shows that HG has almost invariant vectors.

Continuity. The alert reader will have noticed that the arguments above do not finish the proof, because it remains to be shown that HG is indeed a unitary representation, and in particular satisfies the continuity requirement. We will use Lemma 3.74 for this, but only indirectly. Let us begin by noting that it is enough to prove continuity at the identity e ∈ G for the following reason. Suppose that for any sequence (kn ) in G with kn → e as n → ∞ we have πG,kn f → f as n → ∞ for any f ∈ HG . It follows that if (gn ) is a sequence in G with gn → g as n → ∞ then πG,gn f − πG,g f = πG,g (πG,kn f − f ) → 0 since kn = g−1 gn → e as n → ∞ and πG,g is continuous.

In the following, we identify $\bigoplus_{n \in \mathbb{N}} L^2(F)$ with $L^2(F \times \mathbb{N}) \subseteq L^2(G \times \mathbb{N})$ and endow the latter with the unitary representation defined by λg (f )(h, n) = f (g−1 h, n) for (h, n) ∈ G × N, g ∈ G, and f ∈ L2 (G × N). Given any f ∈ HG , we consider $f|_F \in \bigoplus_{n \in \mathbb{N}} L^2(F)$ as an element of L2 (G × N) satisfying f |F (g, n) = fn (g) for g ∈ F and f |F (g, n) = 0 for g ∈ G ∖ F , and n ∈ N. Applying Lemma 3.74 to f |F ∈ L2 (G × N), we see that there exists, for every ε > 0, a neighbourhood V of e such that
\[ \|\lambda_g f|_F - f|_F\|_2 < \varepsilon \tag{10.24} \]
whenever g ∈ V . Given one such g ∈ V we write
\[ F = \underbrace{\bigl(F \cap (g^{-1}F) \cap (gF)\bigr)}_{=F_{\mathrm{in}}} \sqcup \Bigl( \underbrace{F \setminus (gF)}_{=B_+} \cup \underbrace{F \setminus (g^{-1}F)}_{=B_-} \Bigr) = F_{\mathrm{in}} \sqcup B, \]
that is, we decompose F into the part Fin ⊆ F that stays inside F (under the action of g and of g−1 ) and its relative complement B = F ∖ (gF ) ∪ F ∖ (g−1 F ) = F ∖ Fin (see Figure 10.2). It may help to think of B as the bad set on which λg and πG,g are quite different. We need to estimate its significance.

Fig. 10.2: The circle depicts F , and the action of g translates the circle to the right, giving rise to the decomposition F = Fin ⊔ B. (Regions labelled in the figure: B+ , Fin , B− , g−1 F , F , gF .)

Using these sets, and recalling that HG ∋ f′ ↦ f′ |F ∈ L2 (F × N) is a unitary isomorphism, we also decompose f ∈ HG into f = fin + fB with fin , fB ∈ HG satisfying fin |F = f |F 1Fin and fB |F = f |F 1B . Using the identity λg 1B− = 1gB− this implies that λg (f |F 1B− ) vanishes outside of gB− and hence in particular it vanishes on F . Similarly λg (1F ) = 1gF shows that λg (f |F ) vanishes on B+ . Combining these with (10.24), it follows that
\begin{align*}
\|f|_F 1_{B_-}\|_2 = \|\lambda_g(f|_F 1_{B_-})\|_2 &\le \|\lambda_g f|_F - f|_F\|_2 < \varepsilon, \\
\|f|_F 1_{B_+}\|_2 &\le \|\lambda_g f|_F - f|_F\|_2 < \varepsilon,
\end{align*}
and so
\[ \|f_B\|_{H_G} < 2\varepsilon. \tag{10.25} \]
For fin we claim that
\[ \bigl(\pi_{G,g} f_{\mathrm{in}}\bigr)\big|_F = \lambda_g\bigl(f_{\mathrm{in}}|_F\bigr). \]
Indeed, if h ∈ Fin ∪ B− ∖ B+ = F ∩ gF , then g−1 h ∈ F and
\[ \bigl(\pi_{G,g} f_{\mathrm{in}}\bigr)(h) = f_{\mathrm{in}}(g^{-1}h) = \lambda_g\bigl(f_{\mathrm{in}}|_F\bigr)(h) \tag{10.26} \]
as required. On the other hand, if h ∈ B+ , then h ∈ F but g−1 h ∉ F . Let γ ∈ Γ ∖ {e} be such that g−1 hγ ∈ F , which implies g−1 hγ ∈ B− and hence by Lemma 10.49
\[ \bigl(\pi_{G,g} f_{\mathrm{in}}\bigr)(h) = \pi_\Gamma(\gamma)\bigl(f_{\mathrm{in}}(g^{-1}h\gamma)\bigr) = 0 = f_{\mathrm{in}}|_F(g^{-1}h) = \lambda_g\bigl(f_{\mathrm{in}}|_F\bigr)(h), \]
as claimed. Combining (10.25)–(10.26) with (10.24) we can now obtain

\begin{align*}
\|\pi_{G,g} f - f\|_{H_G} &\le \|\pi_{G,g} f_{\mathrm{in}} - f_{\mathrm{in}}\|_{H_G} + 4\varepsilon \\
&= \|\lambda_g(f_{\mathrm{in}}|_F) - f_{\mathrm{in}}|_F\|_2 + 4\varepsilon \\
&< \|\lambda_g f|_F - f|_F\|_2 + 8\varepsilon < 9\varepsilon.
\end{align*}
Since this holds for any g ∈ V , we obtain the continuity of the unitary representation and hence the proposition.

Proposition 10.54. Let G be a locally compact σ-compact metrizable unimodular group and let Γ < G be a lattice in G. If G has property (T) then Γ also has property (T).

Proof. Let πΓ be a unitary representation of Γ on a separable Hilbert space HΓ that has almost invariant vectors. Applying Proposition 10.52 we find the unitary representation πG of G on HG , also with almost invariant vectors. By assumption G has property (T) so that HG has a non-trivial G-invariant vector. By Proposition 10.52, this implies that HΓ has a non-trivial Γ -invariant vector. It follows that Γ has property (T).

It should now be clear how to combine the arguments to obtain a proof of Corollary 10.40: By Theorem 10.38, the topological group SL3 (R) has property (T). In Theorem 10.48 we claimed that the discrete subgroup SL3 (Z) is a lattice in SL3 (R). Hence Proposition 10.54 shows that SL3 (Z) also has property (T). For the proof of the lattice property in Theorem 10.48 we have to make a short excursion into the ‘geometry of numbers’.

10.3.5 Iwasawa Decomposition, Geometry of Numbers, and Reduction Theory

Semi-simple groups have distinguished subgroups, some of which permit the group to be decomposed. The Iwasawa or KAN decomposition of SL3 (R) concerns the following subgroups:

• the compact special orthogonal group
\[ K = \mathrm{SO}_3(\mathbb{R}) = \bigl\{ g \in \mathrm{SL}_3(\mathbb{R}) \mid g g^t = I \bigr\} \]
(where we write I for the identity matrix), which is also sometimes written SO(3, R);
• the positive diagonal subgroup
\[ A = \left\{ \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix} \,\middle|\, a_1, a_2, a_3 > 0 \text{ and } a_1 a_2 a_3 = 1 \right\}; \]
• and the unipotent subgroup
\[ N = \left\{ \begin{pmatrix} 1 & x & z \\ 0 & 1 & y \\ 0 & 0 & 1 \end{pmatrix} \,\middle|\, x, y, z \in \mathbb{R} \right\}. \]

Lemma 10.55 (Iwasawa or KAN decomposition). Any element of SL3 (R) can be written uniquely in the form kan with k ∈ K, a ∈ A and n ∈ N .

Proof. As we will see, this is simply a reformulation of the familiar Gram–Schmidt procedure in R3 (see the proof of Theorem 3.39). Writing the matrix g = (w1 , w2 , w3 ) ∈ SL3 (R) in terms of its column vectors w1 , w2 , w3 ∈ R3 we may apply the Gram–Schmidt orthonormalization procedure to obtain
\begin{align*}
v_1 &= a_1^{-1} w_1 \quad \text{with } a_1 = \|w_1\| > 0, \\
v_2 &= a_2^{-1} \bigl(w_2 - \langle w_2, v_1 \rangle v_1\bigr) = a_2^{-1} (w_2 - n_{12} a_1 v_1) \quad \text{for some } a_2 > 0,\ n_{12} \in \mathbb{R}, \\
v_3 &= a_3^{-1} (w_3 - n_{13} a_1 v_1 - n_{23} a_2 v_2) \quad \text{for some } a_3 > 0 \text{ and } n_{13}, n_{23} \in \mathbb{R},
\end{align*}
with the property that {v1 , v2 , v3 } is an orthonormal basis of R3 . This gives
\[ \bigl(v_1, v_2, v_3\bigr) \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix} \begin{pmatrix} 1 & n_{12} & n_{13} \\ 0 & 1 & n_{23} \\ 0 & 0 & 1 \end{pmatrix} = \bigl(w_1,\ n_{12} a_1 v_1 + a_2 v_2,\ n_{13} a_1 v_1 + n_{23} a_2 v_2 + a_3 v_3\bigr) = g. \]
We define k, a, n to be the first, second, and third matrix in the equation above. Clearly det n = 1 and det a > 0. Since k has orthonormal column vectors we have det k = ±1 (since k^t k = I and so (det k)² = 1). Since we know that det g = 1, this implies that det k = det a = 1, so we have established the existence of the decomposition. If kan = g = k′ a′ n′ are both decompositions of this form, then an(a′ n′ )−1 = k−1 k′ ∈ K ∩ AN since K and AN are both subgroups. Since all elements of K are diagonalizable over C with eigenvalues of absolute value one, we see that K ∩ AN = {I}, which implies k = k′ and an = a′ n′ . Similarly, since A ∩ N = {I} we now see in the same way that a = a′ and n = n′ .

A lattice in Rd is a subgroup of the form Λ = gZd for some g ∈ GLd (R). Recall from Lemma 10.51 that the co-volume of Λ is defined by the Lebesgue measure of any fundamental domain F ⊆ Rd for Λ. Using F = g[0, 1)d we see that the co-volume is given by |det g|. The next result is part of a theory from 1896 due to Minkowski, who also invented the descriptive name ‘geometry of numbers’ for it (see [74] for a reprint and the monograph of Lekkerkerker [60] for more material in this direction).
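The existence part of the proof of Lemma 10.55 is constructive, and the Gram–Schmidt steps can be followed in a few lines of code. The following is a minimal sketch using classical Gram–Schmidt on the columns; the function name iwasawa and the representation of matrices as lists of rows are choices made here for illustration, and no attempt is made at numerical robustness.

```python
import math

def iwasawa(g):
    """Return (k, a, n) with g = k * diag(a) * n for an invertible 3x3 matrix g (list of rows).

    k has orthonormal columns v1, v2, v3; a is the list [a1, a2, a3] of positive
    diagonal entries; n is upper triangular with ones on the diagonal.
    """
    cols = [[g[i][j] for i in range(3)] for j in range(3)]      # columns w1, w2, w3
    vs, a = [], []
    n = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for j, w in enumerate(cols):
        r = list(w)
        for i, v in enumerate(vs):
            c = sum(x * y for x, y in zip(w, v))    # <w_j, v_i> = n_ij * a_i
            n[i][j] = c / a[i]
            r = [x - c * y for x, y in zip(r, v)]   # remove the component along v_i
        norm = math.sqrt(sum(x * x for x in r))
        a.append(norm)                              # a_j > 0 is the remaining length
        vs.append([x / norm for x in r])
    k = [[vs[j][i] for j in range(3)] for i in range(3)]        # matrix with columns v_j
    return k, a, n

g = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0]]         # det g = 1
k, a, n = iwasawa(g)
print(a, n)
```

For a matrix of determinant one the product a1 a2 a3 returned by this sketch equals one up to rounding, in line with the argument det k = det a = 1 in the proof.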

Proposition 10.56 (Choice of basis). Let Λ < R3 be a lattice of covolume 1. Then there exists some g ∈ SL3 (R) with Λ = gZ3 with the property that the matrices a ∈ A, n ∈ N , and k ∈ K in the Iwasawa decomposition g = kan satisfy a1 ≪ a2 ≪ a3 and n12 , n13 , n23 ∈ [−1/2, 1/2).

Proof. For the proof of the proposition we will also use a version of the conclusion in two dimensions. Let us assume first that d ≥ 2 and Λ ⊆ Rd is a discrete subgroup. Let w1 ∈ Λ be a shortest non-zero vector of Λ, let V be the space (Rw1 )⊥ and let p : Rd → V denote the orthogonal projection. We claim that any non-zero vector p(w) ∈ p(Λ) has
\[ \|p(w)\| \ge \tfrac{\sqrt{3}}{2} \|w_1\|. \tag{10.27} \]

To prove this, suppose that w ∈ Λ satisfies 0 < ‖v‖ […] 0. To summarize, we have obtained a3 ≫ a2 ≫ a1 . Moreover, for any w ∈ Λ there exist ℓ2 , ℓ3 ∈ Z with p(w) = ℓ2 v2 + ℓ3 v3 . Considering w − ℓ2 w2 − ℓ3 w3 ∈ Rw1 and the choice of w1 , we also find ℓ1 ∈ Z with w = ℓ1 w1 + ℓ2 w2 + ℓ3 w3 , which shows that Λ = (w1 , w2 , w3 )Z3 = Zw1 + Zw2 + Zw3 . Multiplying (w1 , w2 , w3 ) on the right by
\[ \begin{pmatrix} 1 & \ell_1 & 0 \\ 0 & 1 & \ell_2 \\ 0 & 0 & 1 \end{pmatrix} \]
with ℓ1 , ℓ2 ∈ Z allows us to modify n12 , n23 by integers while preserving the lattice. Thus we may assume that n12 , n23 ∈ [−1/2, 1/2). Multiplying (w1 , w2 , w3 ) after this on the right by
\[ \begin{pmatrix} 1 & 0 & \ell \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
for some ℓ ∈ Z finally allows us to also obtain n13 ∈ [−1/2, 1/2).
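The two ingredients of this proof (start with a shortest vector, then shift the remaining basis vectors by integer multiples) have a classical two-dimensional counterpart, the Lagrange–Gauss reduction of a lattice basis in R2 . The sketch below uses that swapped-in algorithm, not the argument of the proof itself, to illustrate the conclusion: on termination w1 is no longer than w2 and |n12 | ≤ 1/2, which gives the two-dimensional form of the bound (10.27). The function name and the use of integer tuples are assumptions made for the example.

```python
def gauss_reduce(w1, w2):
    """Lagrange-Gauss reduction of a basis (w1, w2) of a lattice in R^2."""
    dot = lambda u, v: u[0] * v[0] + u[1] * v[1]
    while True:
        if dot(w2, w2) < dot(w1, w1):
            w1, w2 = w2, w1                        # keep w1 the shorter vector
        m = round(dot(w1, w2) / dot(w1, w1))       # an integer nearest to the coefficient n12
        if m == 0:
            return w1, w2                          # now |<w1, w2>| <= |w1|^2 / 2
        w2 = (w2[0] - m * w1[0], w2[1] - m * w1[1])  # shift w2 by an integer multiple of w1

r1, r2 = gauss_reduce((5, 3), (8, 5))              # a basis of Z^2 (determinant 1)
print(r1, r2)
```

Since each step replaces the basis by another basis of the same lattice, the output still generates the lattice spanned by the input.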



Lemma 10.57. The group SL3 (R) is unimodular, and the Haar measure on SL3 (R) decomposes with respect to the Iwasawa decomposition into the product of the Haar measure mK on K and the right Haar measure $m^{(r)}_{AN}$ on AN .

Proof. Notice first that SL3 (R) ⊆ Mat33 (R) = R9 is defined by a single equation and hence is a hypersurface. We will define the Haar measure mSL3 (R) on SL3 (R) using the Lebesgue measure mR9 by the following trick. Define a measure µ on SL3 (R) by
\[ \mu(B) = m_{\mathbb{R}^9}\bigl(\{tg \mid t \in [0, 1],\ g \in B\}\bigr) \]
for any Borel measurable set B ⊆ SL3 (R). To see that the set on the right-hand side is measurable note that U = {m ∈ Mat33 (R) | det m ∈ (0, 1)} is open since the determinant map is continuous, and on U the map
\[ \varphi : U \longrightarrow (0, 1) \times \mathrm{SL}_3(\mathbb{R}), \qquad m \longmapsto \Bigl( \det m,\ \tfrac{1}{\sqrt[3]{\det m}}\, m \Bigr) \]
is a homeomorphism. It follows that {tg | t ∈ [0, 1], g ∈ B} is also given by {0} ∪ φ−1 ((0, 1) × B) ∪ B and so is measurable. If now B = K is compact, then so is {tg | t ∈ [0, 1], g ∈ B} ⊆ R9 , which gives µ(B) < ∞. Also, if B = O is non-empty and open, then {tg | t ∈ [0, 1], g ∈ B} contains the non-empty open set {tg | t ∈ (0, 1), g ∈ B} and so in particular µ(B) > 0. Now let B ⊆ SL3 (R) be measurable and g0 ∈ SL3 (R). Then
\begin{align*}
\mu(g_0 B) &= m_{\mathbb{R}^9}\bigl(\{t g_0 g \mid t \in [0, 1],\ g \in B\}\bigr) \\
&= m_{\mathbb{R}^9}\bigl(g_0 \{tg \mid t \in [0, 1],\ g \in B\}\bigr) = m_{\mathbb{R}^9}\bigl(\{tg \mid t \in [0, 1],\ g \in B\}\bigr) = \mu(B),
\end{align*}
since left multiplication by g0 scales Lebesgue measure on Mat33 (R) by the Jacobian of the map m ↦ g0 m which is |det g0 |³ = 1, and so preserves the Lebesgue measure. This shows that µ is a left Haar measure, so we may write µ = mSL3 (R) . On the other hand exactly the same argument applies to right multiplication, so µ is also a right Haar measure, and hence SL3 (R) is unimodular.

For the second claim define a map
\[ \psi : K \times AN \longrightarrow \mathrm{SL}_3(\mathbb{R}), \qquad (k, an) \longmapsto k(an)^{-1}, \]
and note that the Gram–Schmidt procedure in the proof of Lemma 10.55 shows that ψ is a homeomorphism. Define a measure ν on K × AN by ν(B) = mSL3 (R) (ψ(B)) for any measurable set B ⊆ K × AN . Since ψ is a homeomorphism, ν is finite on compact sets and positive on non-empty open sets. Given some k ∈ K and an ∈ AN we also have
\begin{align*}
\nu\bigl((k, an)B\bigr) &= m_{\mathrm{SL}_3(\mathbb{R})}\bigl(\psi((k, an)B)\bigr) \\
&= m_{\mathrm{SL}_3(\mathbb{R})}\bigl(k \psi(B) (an)^{-1}\bigr) = m_{\mathrm{SL}_3(\mathbb{R})}\bigl(\psi(B)\bigr) = \nu(B).
\end{align*}
Thus ν is a left Haar measure on K × AN , which by uniqueness of Haar measure means that ν is a scalar multiple of mK × mAN . Recalling that the inverse map AN → AN sending an to (an)−1 maps the left Haar measure to the right Haar measure, the result follows.

For the calculation coming up we also need to know the right Haar measure on the subgroup AN < SL3 (R) explicitly in terms of coordinates.

Lemma 10.58. Using the coordinates $(a_1, a_2, n_{12}, n_{13}, n_{23}) \in \mathbb{R}^2_{>0} \times \mathbb{R}^3$ corresponding to
\[ \begin{pmatrix} a_1 & 0 & 0 \\ 0 & a_2 & 0 \\ 0 & 0 & a_3 \end{pmatrix} \begin{pmatrix} 1 & n_{12} & n_{13} \\ 0 & 1 & n_{23} \\ 0 & 0 & 1 \end{pmatrix} \in AN \tag{10.29} \]
with a3 = (a1 a2 )−1 , the right Haar measure $m^{(r)}_{AN}$ is given by
\[ \frac{a_1}{a_2} \frac{a_1}{a_3} \frac{a_2}{a_3} \, \frac{da_1 \, da_2}{a_1 a_2} \, dn_{12} \, dn_{13} \, dn_{23}. \tag{10.30} \]

Proof. Multiplying the matrix in (10.29) on the right by
\[ \begin{pmatrix} 1 & m_{12} & m_{13} \\ 0 & 1 & m_{23} \\ 0 & 0 & 1 \end{pmatrix} \]
gives the map (in our chosen coordinates)
\[ (a_1, a_2, n_{12}, n_{13}, n_{23}) \longmapsto (a_1, a_2, n_{12} + m_{12}, m_{13} + n_{13} + n_{12} m_{23}, n_{23} + m_{23}), \]
and it is easy to see that this preserves the measure defined by (10.30). Multiplying on the right by

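\[ \begin{pmatrix} b_1 & 0 & 0 \\ 0 & b_2 & 0 \\ 0 & 0 & b_3 \end{pmatrix} \]
with b1 , b2 > 0 and b3 = (b1 b2 )−1 we obtain the map
\[ (a_1, a_2, n_{12}, n_{13}, n_{23}) \longmapsto \Bigl( a_1 b_1,\ a_2 b_2,\ \tfrac{b_2}{b_1} n_{12},\ \tfrac{b_3}{b_1} n_{13},\ \tfrac{b_3}{b_2} n_{23} \Bigr). \]
Let f be a positive measurable function on $H = \mathbb{R}^2_{>0} \times \mathbb{R}^3$. Then in
\[ \int_H f\Bigl( a_1 b_1, a_2 b_2, \tfrac{b_2}{b_1} n_{12}, \tfrac{b_3}{b_1} n_{13}, \tfrac{b_3}{b_2} n_{23} \Bigr) \frac{a_1}{a_2} \frac{a_1}{a_3} \frac{a_2}{a_3} \, \frac{da_1 \, da_2}{a_1 a_2} \, dn_{12} \, dn_{13} \, dn_{23} \]
we may substitute $m_{12} = \tfrac{b_2}{b_1} n_{12}$, $m_{13} = \tfrac{b_3}{b_1} n_{13}$, and $m_{23} = \tfrac{b_3}{b_2} n_{23}$ to obtain
\[ \int_H f(a_1 b_1, a_2 b_2, m_{12}, m_{13}, m_{23}) \, \frac{a_1}{a_2} \frac{a_1}{a_3} \frac{a_2}{a_3} \frac{b_1}{b_2} \frac{b_1}{b_3} \frac{b_2}{b_3} \, \frac{da_1 \, da_2}{a_1 a_2} \, dm_{12} \, dm_{13} \, dm_{23}. \]
Using the substitution c1 = a1 b1 and c2 = a2 b2 (and setting c3 = a3 b3 ) we obtain
\[ \int_H f(c_1, c_2, m_{12}, m_{13}, m_{23}) \, \frac{c_1}{c_2} \frac{c_1}{c_3} \frac{c_2}{c_3} \, \frac{dc_1 \, dc_2}{c_1 c_2} \, dm_{12} \, dm_{13} \, dm_{23}. \]
As AN is generated by these two types of elements, this proves the lemma.

The proof of Theorem 10.48 is now (essentially) reduced to a calculation.

Proof of Theorem 10.48. Since SL3 (R) acts on lattices Λ = gZ3 by left multiplication, and the stabilizer of Z3 under this action is precisely SL3 (Z), we have the identification
\[ \mathrm{SL}_3(\mathbb{R}) / \mathrm{SL}_3(\mathbb{Z}) \cong \bigl\{ g\mathbb{Z}^3 \mid g \in \mathrm{SL}_3(\mathbb{R}) \bigr\}. \]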
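As a numerical sanity check of the diagonal case in the proof of Lemma 10.58, one can compare the density of (10.30) before and after right multiplication by a diagonal matrix. The sketch below assumes the coordinates (a1 , a2 , n12 , n13 , n23 ) of (10.29); the function names are illustrative. Right invariance amounts to the identity density(T(x)) · |Jacobian of T| = density(x), where T is the coordinate map induced by the multiplication.

```python
def density(a1, a2):
    """Density of (10.30) with respect to da1 da2 dn12 dn13 dn23, with a3 = 1/(a1*a2)."""
    a3 = 1.0 / (a1 * a2)
    return (a1 / a2) * (a1 / a3) * (a2 / a3) / (a1 * a2)

def right_mult_diag(point, b1, b2):
    """Coordinates of (a n) * diag(b1, b2, b3) with b3 = 1/(b1*b2)."""
    a1, a2, n12, n13, n23 = point
    b3 = 1.0 / (b1 * b2)
    return (a1 * b1, a2 * b2, b2 / b1 * n12, b3 / b1 * n13, b3 / b2 * n23)

def jacobian_diag(b1, b2):
    """Jacobian determinant of right_mult_diag; the map rescales each coordinate separately."""
    b3 = 1.0 / (b1 * b2)
    return b1 * b2 * (b2 / b1) * (b3 / b1) * (b3 / b2)

p = (0.7, 1.9, 0.3, -1.2, 2.5)       # an arbitrary point of R^2_{>0} x R^3
b1, b2 = 1.5, 0.4                    # an arbitrary positive diagonal element
q = right_mult_diag(p, b1, b2)
lhs = density(q[0], q[1]) * jacobian_diag(b1, b2)
rhs = density(p[0], p[1])
print(lhs, rhs)
```

The two printed numbers agree up to rounding. The unipotent case needs no such check, since the corresponding coordinate map leaves a1 , a2 fixed and is a shear in the remaining coordinates with Jacobian one.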

Applying Proposition 10.56 we see that there exists some c > 0 such that SL3 (R) = B SL3 (Z), where B = KD is called a Siegel set, K = SO3 (R) and the Borel measurable set D consists of all matrices in AN as in (10.29) satisfying the conditions
\[ 0 < a_1 \le c a_2 \le c^2 a_3, \qquad a_3 = \frac{1}{a_1 a_2}, \qquad \text{and } n_{12}, n_{13}, n_{23} \in \bigl[-\tfrac{1}{2}, \tfrac{1}{2}\bigr). \]
By Lemma 10.57 we can calculate mSL3 (R) (B) by calculating $m^{(r)}_{AN}(D)$. Note that the conditions on the diagonal entries a1 , a2 , a3 imply that a1 ∈ (0, c1 ] and $a_2 \in [c_2 a_1, c_3 a_1^{-1/2}]$ for some constants c1 , c2 , c3 > 0. By Lemma 10.58
\begin{align*}
m^{(r)}_{AN}(D) &\le \int_0^{c_1} \int_{c_2 a_1}^{c_3 a_1^{-1/2}} \frac{a_1}{a_2} \, \frac{a_1}{(a_1 a_2)^{-1}} \, \frac{a_2}{(a_1 a_2)^{-1}} \, \frac{da_2 \, da_1}{a_2 a_1} \\
&= \int_0^{c_1} \int_{c_2 a_1}^{c_3 a_1^{-1/2}} a_1^3 a_2 \, da_2 \, da_1 \\
&= \frac{1}{2} \int_0^{c_1} \Bigl[ a_1^3 a_2^2 \Bigr]_{c_2 a_1}^{c_3 a_1^{-1/2}} da_1 = \frac{1}{2} \int_0^{c_1} a_1^3 \bigl( c_3^2 a_1^{-1} - c_2^2 a_1^2 \bigr) \, da_1 < \infty.
\end{align*}

To prove the theorem we have to show that there exists a fundamental domain contained in B, that is, a Borel measurable subset F ⊆ B = KD such that $\mathrm{SL}_3(\mathbb{R}) = \bigsqcup_{\gamma \in \mathrm{SL}_3(\mathbb{Z})} F\gamma$. For this we first prove the following injectivity claim: for any g0 ∈ SL3 (R) there exists an open neighbourhood U of g0 such that h, h′ ∈ U and hγ = h′ for some γ ∈ SL3 (Z) implies γ = I. Indeed, if this were not true, we could find two sequences (hn ), (h′n ) converging to g0 as n → ∞ such that
\[ h_n^{-1} h_n' = \gamma_n \in \mathrm{SL}_3(\mathbb{Z}) \setminus \{I\}. \]
However, this contradicts the fact that SL3 (Z) is a discrete subgroup of SL3 (R). Next write $\mathrm{SL}_3(\mathbb{R}) = \bigcup_n K_n$ as a countable union of compact subsets (for example, define Kn to be the intersection of SL3 (R) with closed balls in R9 of radius n ≥ 1 around 0). For each n choose a finite cover Un,1 , . . . , Un,mn of Kn such that the above injectivity claim holds on each of these sets. To simplify the notation, let us summarize the above by saying that we have found a countable list of open sets U1 , U2 , . . . satisfying the injectivity claim and covering all of SL3 (R). We now define F1 = B ∩ U1 and F2 = (B ∩ U2 ) ∖ (F1 SL3 (Z)). Assuming that F1 , . . . , Fn−1 are already defined, we define
\[ F_n = (B \cap U_n) \setminus \bigl( (F_1 \cup \dots \cup F_{n-1}) \, \mathrm{SL}_3(\mathbb{Z}) \bigr). \]
We claim that the set $F = \bigcup_{n=1}^{\infty} F_n$ is the desired fundamental domain. First note that by construction we have F ⊆ B. Moreover, for a given g ∈ SL3 (R) we know that (g SL3 (Z)) ∩ B ≠ ∅ and hence there exists a minimal n ∈ N such that g SL3 (Z) ∩ B ∩ Un ≠ ∅. By the injectivity property on Un this intersection then consists of a single element gγ for some γ ∈ SL3 (Z). By minimality of n we see that
\[ g\gamma \notin (F_1 \cup \dots \cup F_{n-1}) \, \mathrm{SL}_3(\mathbb{Z}), \]
so that we have {gγ} = g SL3 (Z) ∩ Fn . Finally, by construction it also follows that g SL3 (Z) ∩ Fm = ∅ for all m ≠ n, giving {gγ} = g SL3 (Z) ∩ F . Hence F is a fundamental domain and the theorem follows.
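The finiteness of the bound for the right Haar measure of D in the proof above can also be made explicit, since the last integral there is elementary. With the same constants c1 , c2 , c3 > 0 as in the proof, expanding the integrand gives
\[ \frac{1}{2} \int_0^{c_1} a_1^3 \bigl( c_3^2 a_1^{-1} - c_2^2 a_1^2 \bigr) \, da_1 = \frac{1}{2} \int_0^{c_1} \bigl( c_3^2 a_1^2 - c_2^2 a_1^5 \bigr) \, da_1 = \frac{c_3^2 c_1^3}{6} - \frac{c_2^2 c_1^6}{12}. \]
The point of the computation is that the density a1³ a2 coming from (10.30) vanishes fast enough as a1 → 0 to compensate for the unbounded range of a2 .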

10.4 Highly Connected Networks: Expanders

In designing large connected networks (for example, connecting many computers and servers) one is often confronted with two competing constraints:

• (High connectivity) Starting from any vertex, it should be easy to reach any other vertex quickly (that is, in few steps).
• (Sparsity) The network should be economical, meaning that there should not be an unnecessarily large number of edges in the network.

Clearly it is easy to achieve the first at the expense of the second by using a complete graph (in which every pair of vertices has an edge joining them), and it is easy to achieve the second at the expense of the first (by arranging the edges so that the vertices are strung along a single line, so as to achieve connectivity at the lowest possible cost).

Exercise 10.59. Analyze the number of edges as a function of the number of vertices in the two extreme constructions of connected networks from above.

Of course there is another option of creating a centre vertex with a direct connection to each of the existing vertices (something that might be called a hub for an airline), but the centre vertex created in this way would be very costly (or even technically impossible because of limits to the number of edges at a single vertex) and would defeat the objective of achieving sparsity. The notion of expander graphs is an attempt to achieve a balance between the two constraints.

In order to describe expanders, we will need some basic notation from graph theory. A graph G = (V, E) is a set of vertices V (the nodes of the network) and edges E ⊆ V × V giving the list of direct connections between vertices. We will always assume that the graph is undirected, so each edge goes both ways and the set E is symmetric. In particular, a pair of vertices is connected by at most one edge. We will also assume that the graph is simple, meaning that there is never an edge from a vertex to itself. Formally, the set of edges is a subset of (V × V) ∖ {(a, a) | a ∈ V} with the property that (a, b) ∈ E if and only if (b, a) ∈ E. We identify (a, b) ∈ E with (b, a) so that |E| equals the total number of edges in the graph, each of which is viewed as a two-way connection.

The requirement of sparsity is achieved by requiring that the graph G be k-regular for a fixed k. A graph G = (V, E) is said to be k-regular if, for any vertex v ∈ V, there are exactly k edges from v to other vertices in V. We will fix k and look for k-regular graphs with a large number of vertices (it is easy to see that a k-regular graph on n vertices exists if and only if n ≥ k + 1 and nk is even). Notice that this will impose a sparsity condition on the graph, since the number of edges |E| will be a linear function of the number of vertices |V| (in contrast to the case of a complete graph, for which |E| = ½ |V| (|V| − 1)).

In order to define the notion of high connectivity, we will need some preparations. A graph G = (V, E) is called connected if for any two v, w ∈ V there exists a path from v to w in that there is a list v = v0 , v1 , v2 , . . . , vn = w of vertices in V with (vi , vi+1 ) ∈ E for i = 0, . . . , n − 1. Such a path may consist of a single vertex, so each vertex is connected to itself by a path of length zero. Notice that there is a natural metric on any connected graph: we may define d(v, w) to be the minimal length of a path from v to w (that is, the minimal number of edges in a path joining v to w; see Figure 10.3 and Exercise 10.60). In this metric the diameter of a connected graph G is the minimal N ∈ N with the property that for any two vertices v and w there is a path of length no more than N connecting v to w.

Exercise 10.60. Verify that the notion of distance on a graph defines a metric on the set of vertices of a connected graph.

Fig. 10.3: Two points v, w at distance 3 in a connected graph.
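The graph metric and the diameter can be computed by breadth-first search. The following is a minimal sketch in Python, not taken from the text: the helper names (`path_graph`, `cycle_graph`, `complete_graph`, `distances_from`, `diameter`) are illustrative assumptions, and graphs are stored as adjacency lists on the vertex set {0, ..., n-1}.

```python
from collections import deque


def path_graph(n):
    """Adjacency lists for n vertices strung along a single line."""
    return {v: [w for w in (v - 1, v + 1) if 0 <= w < n] for v in range(n)}


def cycle_graph(n):
    """Adjacency lists for the 2-regular graph on n >= 3 vertices (a circle)."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}


def complete_graph(n):
    """Adjacency lists for the complete graph on n vertices."""
    return {v: [w for w in range(n) if w != v] for v in range(n)}


def distances_from(graph, v):
    """Breadth-first search: d(v, w) for every vertex w reachable from v."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def diameter(graph):
    """Largest distance between two vertices (the graph is assumed connected)."""
    return max(max(distances_from(graph, v).values()) for v in graph)


if __name__ == "__main__":
    for name, build in (("line", path_graph), ("circle", cycle_graph),
                        ("complete", complete_graph)):
        print(name, diameter(build(10)))
```

The three constructions are the extreme cases discussed in the surrounding text; the sketch only illustrates the definitions and makes no attempt at efficiency (it runs one search per vertex).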

The smaller the diameter is in comparison with $|V|$, the better the connectivity of the graph is. The worst case with the vertices strung out on a line (or if we seek a 2-regular graph, arranged around a circle) has diameter $|V| - 1$ (or $\lfloor |V|/2 \rfloor$). The other extreme case of a complete graph has diameter 1. In the case of expander graphs we will see that such families may be found with diameter $N \ll \log |V|$. The implied constant will depend on $k$ and on $\xi$ (as in Definition 10.61) but is not allowed to depend on the particular graph $G$. Considering the growth rate of the logarithm, it should be clear that this is a formulation of high connectivity.

Definition 10.61 (Expanders). A sequence of finite $k$-regular graphs $(G_i = (V_i, E_i))_{i \geq 1}$ is an expander family if there exists a constant $\xi \in (0, 1)$ (independent of $i$) with
\[ |\partial S| \geq \xi \min\{|S|, |V_i \setminus S|\} \]
for any subset $S \subseteq V_i$ and any $i \geq 1$, where
\[ \partial S = \{v \in S \mid \text{there exists a } w \in V_i \setminus S \text{ with } (v, w) \in E_i\} \cup \{v \in V_i \setminus S \mid \text{there exists a } w \in S \text{ with } (v, w) \in E_i\} \]
is called the boundary of $S$.

A few comments are in order. We first note that the above definition of the boundary of a subset of the vertex set of a graph does not coincide with the boundary of $S$ considered in the metric space $V_i$ (the latter is empty since $V_i$ is discrete). Any finite collection of finite $k$-regular connected graphs (formally, a sequence as in Definition 10.61 that repeats these) is an expander family. As this is not at all interesting, and in particular does not achieve the real benefit of the slower growth rate from the logarithmic bound on the diameter, one usually requires in addition that $|V_i| \to \infty$ as $i \to \infty$. Notice that we must also have $k \geq 3$, because $k = 2$ corresponds to a sequence of regular polygons, which we quickly see cannot be an expander family. An expander family consists of connected graphs, but as already mentioned much more is true.

Proposition 10.62 (Small diameter). For an expander family $(G_i)_i$, we have $\operatorname{diam} G_i \ll \log |V_i|$.

Proof. Given some vertex $v \in V_i$ we claim that the metric ball $B_a(v) = \{w \in V_i \mid d(v, w) \leq a\}$ has more than $|V_i|/2$ elements if the integer $a$ satisfies
\[ a \geq D = \frac{\log(|V_i|/2)}{\log\bigl(1 + \xi/(k + 1)\bigr)}. \]
Assuming the claim, suppose that $v, w \in V_i$ are any pair of vertices and set $a$ equal to $\lceil D \rceil$. Then, by the claim, each of $|B_a(v)|$ and $|B_a(w)|$ is greater than $|V_i|/2$, so that these two balls must have non-empty intersection. By the triangle inequality, it follows that
\[ d(v, w) \leq 2(D + 1) \ll_{\xi,k} \log |V_i|, \]
giving the proposition.

To prove the claim, let $n \geq 0$, set $S = B_n(v)$ and assume $|S| \leq |V_i|/2$. We then have $B_{n+1}(v) \setminus B_n(v) = \partial S \setminus S$.

Note that every element of $\partial S \cap S$ must connect to one element of $\partial S \setminus S$ and at most $k$ elements of $\partial S \cap S$ can connect to the same element of $\partial S \setminus S$. We can use this to define a map from $\partial S \cap S$ to $\partial S \setminus S$ that is at most $k$-to-1, showing that $|\partial S \cap S| \leq k |\partial S \setminus S|$. This, together with $|\partial S| = |\partial S \cap S| + |\partial S \setminus S|$, gives
\[ |\partial S \setminus S| \geq \frac{1}{k+1} |\partial S|. \]
Together with the defining property of expander graphs (where the minimum equals $|B_n(v)|$ because $|B_n(v)| \leq |V_i|/2$), and assuming as we may that $\xi \in (0, 1)$, we deduce that
\[ |\partial B_n(v) \setminus B_n(v)| \geq \frac{\xi}{k+1} |B_n(v)|. \]
By induction we now prove $|B_0(v)| = 1$, $|B_1(v)| = k + 1 > 1 + \frac{\xi}{k+1}$, and
\[ |B_{n+1}(v)| = |B_n(v)| + |\partial B_n(v) \setminus B_n(v)| \geq \Bigl(1 + \frac{\xi}{k+1}\Bigr) |B_n(v)| \geq \Bigl(1 + \frac{\xi}{k+1}\Bigr)^{n+1} \]
for all $n$ with $|B_n(v)| \leq |V_i|/2$. Since for $n = a \geq D$ the lower bound is greater than or equal to $|V_i|/2$, this proves the claim. □

Thus expander families achieve a balance between the two constraints of high connectivity (with logarithmic growth of the diameter) and sparsity of the graph (with only linear growth of the number of edges and a fixed number of edges at every vertex). However, several questions remain, the most pressing of which are the following.

• Do expander families exist?
• What is their connection to functional analysis?

The first examples of expander families were found by Pinsker [87] (translated in [88]) using a non-constructive probabilistic argument. The same year Margulis [67] (translation in [68]) was able to give an explicit construction(30) using Každan's Property (T) for the group $\mathrm{SL}_3(\mathbb{Z})$. Towards the proof of this, we now exhibit a connection between the expander property and properties of eigenvalues of linear maps associated to the graphs.

Let $G = (V, E)$ be a finite graph and identify $V$ with the set $\{1, 2, \dots, |V|\}$. The adjacency matrix $A_G$ of the graph $G$ is the matrix with $|V|$ rows and $|V|$ columns and with entries in $\{0, 1\}$ so that $(A_G)_{i,j} = 1$ if and only if there is an edge from vertex $i$ to vertex $j$. A simple graph $G$ with adjacency matrix
\[ A_G = \begin{pmatrix} 0 & 1 & 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 1 \\ 0 & 1 & 1 & 0 & 1 & 0 \end{pmatrix} \]
is shown in Figure 10.4.

Fig. 10.4: A connected 3-regular graph on 6 vertices.
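To make the boundary of Definition 10.61 concrete, the following sketch computes it for the graph of Figure 10.4 and finds, by brute force over all subsets, the smallest value of |∂S| / min(|S|, |V∖S|). The function names are illustrative assumptions and the vertices 1, ..., 6 are stored as indices 0, ..., 5. A single finite graph is of course not an expander family, so this only illustrates the quantities involved.

```python
from itertools import combinations

# Adjacency matrix A_G of the graph in Figure 10.4 (vertex i is index i - 1).
A = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]


def is_simple_undirected(A):
    """Symmetric matrix with zero diagonal: undirected and without loops."""
    n = len(A)
    return (all(A[i][i] == 0 for i in range(n))
            and all(A[i][j] == A[j][i] for i in range(n) for j in range(n)))


def degrees(A):
    """Number of edges at each vertex (the row sums of A)."""
    return [sum(row) for row in A]


def boundary(A, S):
    """Vertices that have an edge crossing between S and its complement."""
    n = len(A)
    S = set(S)
    return {v for v in range(n)
            if any(A[v][w] == 1 and ((v in S) != (w in S)) for w in range(n))}


def expansion_constant(A):
    """Minimum of |dS| / min(|S|, |V minus S|) over non-empty proper subsets S."""
    n = len(A)
    best = None
    for size in range(1, n):
        for S in combinations(range(n), size):
            ratio = len(boundary(A, S)) / min(size, n - size)
            if best is None or ratio < best:
                best = ratio
    return best


if __name__ == "__main__":
    print(is_simple_undirected(A), degrees(A))
    print(sorted(boundary(A, {0, 3})))
    print(expansion_constant(A))
```

The brute-force search visits all 2^|V| subsets and is therefore only feasible for very small graphs; for this graph the minimum is attained on subsets of size three and equals 2.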

Several properties of the graph are reflected in the properties of the adjacency matrix. The matrix $A_G$ is symmetric by our standing assumption on the graph $G = (V, E)$. We also define $M_G = \frac{1}{k} A_G$, which is an averaging operator in the following sense. A vector $x \in \mathbb{R}^{|V|}$ may be thought of as a function on the set of vertices, and applying $M_G$ to $x$ gives a new function which at the vertex $i$ is equal to the mean of the values of the function $x$ at all the neighbours of $i$. By analogy with the discussion in Section 1.2, one also studies the graph Laplace operator $\Delta_G = I - M_G$. Since $M_G$ is symmetric, it is diagonalizable and has only real eigenvalues. Moreover,
\[ \sum_i |(M_G x)_i| = \sum_i \Bigl| \sum_j (M_G)_{i,j} x_j \Bigr| \leq \sum_{i,j} (M_G)_{i,j} |x_j| = \sum_j |x_j|, \]
since
\[ \sum_i (M_G)_{i,j} = 1 \tag{10.31} \]
for all $j$ by construction. Therefore, any eigenvalue $\lambda$ of $M_G$ has $|\lambda| \leq 1$ and by (10.31) we see that $\lambda_1 = 1$ is an eigenvalue (with the constant vectors as eigenvectors). The relationship between the eigenvalues and connectivity is illustrated by the following elementary lemma.

Lemma 10.63 (Connectivity). A $k$-regular graph $G$ is connected if and only if 1 is a simple eigenvalue of $M_G$.

Essential Exercise 10.64. Prove Lemma 10.63.

What we need next is a quantitative version of this relationship, which is given by the following proposition.

Proposition 10.65 (Eigenvalues and expanders). Let $(G_i = (V_i, E_i))_{i \geq 1}$ be a sequence of graphs. For each $i$, let $M_i = M_{G_i}$ be the averaging operator for $G_i$, and order its eigenvalues
\[ \lambda_1(M_i) = 1 \geq \lambda_2(M_i) \geq \cdots \geq \lambda_{|V_i|}(M_i). \]
Suppose that there exists some $\varepsilon > 0$ with
\[ \lambda_2(M_i) \leq 1 - \varepsilon \tag{10.32} \]
for all $i \geq 1$. Then the sequence of graphs is an expander family.

The uniform estimate in (10.32) is called a spectral gap for the sequence of graphs. The converse of Proposition 10.65 also holds, but we will not need this direction (we refer to Lubotzky [66] for the proof).

Proof of Proposition 10.65. Let $\varepsilon > 0$ be as in the statement of the proposition. Let $G = G_i$ and $M = M_i$ for some fixed $i$, so that $\lambda_2(M) \leq 1 - \varepsilon$. Also let $S \subseteq V$ be any subset with $|S| \leq |V|/2$. We again think of vectors in $\mathbb{R}^{|V|}$ as functions on $V$, and notice that $M(1_S) = 1_S + f_{\partial S}$, where $f_{\partial S}$ is a vector that vanishes outside of $\partial S$ and has absolute value less than or equal to 1 on the elements of $\partial S$. We will estimate $\|f_{\partial S}\|_2$ from above and below, and the resulting estimate will prove the claim. In fact, it follows that
\[ \|M(1_S) - 1_S\|_2 = \|f_{\partial S}\|_2 \leq \sqrt{|\partial S|}. \]

On the other hand, $M$ is diagonalizable and so we may expand $1_S$ into a sum of eigenvectors
\[ 1_S = \sum_j v_j, \tag{10.33} \]
corresponding to the eigenvalues $\lambda_1 = 1 \geq \lambda_2 \geq \cdots \geq \lambda_{|V|}$. (If $\lambda_j = \lambda_{j+1}$ for some $j \in \{2, \dots, |V| - 1\}$ we may and will assume $v_{j+1} = 0$.) This then gives
\[ M(1_S) = \sum_j \lambda_j v_j, \]
and finally
\[ M(1_S) - 1_S = \sum_{j=2}^{|V|} (\lambda_j - 1) v_j. \]
Furthermore, since $M$ is symmetric we can assume that the vectors $v_j$ are orthogonal to each other, so
\[ \|M(1_S) - 1_S\|_2 \geq \min_{2 \leq j \leq |V|} |\lambda_j - 1| \sqrt{\sum_{j=2}^{|V|} \|v_j\|_2^2} \geq \varepsilon \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2. \]
Thus we need to relate the last norm to the size of $S$. To this end, notice that the constant vector $1$ is an eigenvector for the eigenvalue $\lambda_1 = 1$, $\|1\|_2 = \sqrt{|V|}$, and $\langle 1_S, 1 \rangle = |S|$, so the orthogonal projection of $1_S$ onto $1$ is $\frac{|S|}{|V|} 1$. Therefore, as in (10.33), we may subtract from $1_S$ the vector $v_1 = \frac{|S|}{|V|} 1$ and obtain
\begin{align*}
\Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2 = \Bigl\| 1_S - \frac{|S|}{|V|} 1 \Bigr\|_2 &\geq \Bigl(1 - \frac{|S|}{|V|}\Bigr) \|1_S\|_2 && \text{(by restricting the sum to $S$)} \\
&\geq \tfrac{1}{2} \sqrt{|S|} && \text{(since $\tfrac{|S|}{|V|} \leq \tfrac{1}{2}$)}
\end{align*}
and putting these inequalities together gives
\[ \sqrt{|\partial S|} \geq \|M(1_S) - 1_S\|_2 \geq \varepsilon \Bigl\| \sum_{j=2}^{|V|} v_j \Bigr\|_2 \geq \frac{\varepsilon}{2} \sqrt{|S|}. \]
As this holds for all subsets $S \subseteq V$ with $|S| \leq |V|/2$ (a larger subset has the same boundary as its complement, to which the estimate applies) and all graphs in the sequence $(G_i)_{i \geq 1}$, we see that this is an expander family with $\xi = \frac{\varepsilon^2}{4}$. □
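The estimate of Proposition 10.65 can be checked numerically on the graph of Figure 10.4. The sketch below estimates the second eigenvalue of the averaging operator by a shifted power iteration restricted to the orthogonal complement of the constant vectors, and then verifies |∂S| ≥ (ε²/4)|S| for every S with |S| ≤ |V|/2. The power iteration is a standard numerical technique chosen here for illustration, not a method from the text, and all function names are assumptions.

```python
from itertools import combinations
from math import sqrt

# Adjacency matrix of the 3-regular graph in Figure 10.4.
A = [
    [0, 1, 0, 1, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 1, 0, 1, 0, 1],
    [1, 0, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
]
K = 3


def averaging(A, k, x):
    """Apply the averaging operator M_G = (1/k) A_G to the vector x."""
    return [sum(a * xj for a, xj in zip(row, x)) / k for row in A]


def project_and_normalize(x):
    """Remove the component along the constant vector and rescale to norm 1."""
    mean = sum(x) / len(x)
    y = [xi - mean for xi in x]
    norm = sqrt(sum(yi * yi for yi in y))
    return [yi / norm for yi in y]


def second_eigenvalue(A, k, steps=500):
    """Estimate lambda_2(M_G) by power iteration on (M_G + I)/2.

    The shift moves all eigenvalues into [0, 1], so that the iteration on the
    complement of the constants converges towards the largest remaining
    eigenvalue rather than the one of largest absolute value.
    """
    x = project_and_normalize([float(i + 1) for i in range(len(A))])
    for _ in range(steps):
        Mx = averaging(A, k, x)
        x = project_and_normalize([(m + xi) / 2 for m, xi in zip(Mx, x)])
    Mx = averaging(A, k, x)
    return sum(m * xi for m, xi in zip(Mx, x))  # Rayleigh quotient, |x| = 1


def boundary(A, S):
    """Vertices that have an edge crossing between S and its complement."""
    n = len(A)
    S = set(S)
    return {v for v in range(n)
            if any(A[v][w] == 1 and ((v in S) != (w in S)) for w in range(n))}


def satisfies_expander_bound(A, xi):
    """Check |dS| >= xi * |S| for all non-empty S with |S| <= |V| / 2."""
    n = len(A)
    return all(len(boundary(A, S)) >= xi * size
               for size in range(1, n // 2 + 1)
               for S in combinations(range(n), size))


if __name__ == "__main__":
    lam2 = second_eigenvalue(A, K)
    eps = 1 - lam2
    print(lam2, eps ** 2 / 4, satisfies_expander_bound(A, eps ** 2 / 4))
```

For this graph the estimate comes out close to 1/3, so ε is about 2/3 and the proposition guarantees ξ = ε²/4 ≈ 1/9, which is far from optimal for such a small graph. The start vector is arbitrary; the iteration would stall only if it happened to be orthogonal to the relevant eigenvector.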

10.4.1 Constructing an Explicit Expander Family

Corollary 10.66 (Explicit expander family). Let $S = S^{-1}$ be a finite symmetric set of generators (not containing the identity) of $\Gamma = \mathrm{SL}_3(\mathbb{Z})$. Let $(V_n)$ be a sequence of finite sets on each of which $\Gamma$ acts transitively, with the property that the elements of $S^2 \setminus \{I\}$ have no fixed points for the action. We define the sequence of graphs $(G_n)$ by $G_n = (V_n, E_n)$ for all $n \geq 1$, where vertices $v, w$ in $V_n$ are connected in $G_n$ if $s \cdot v = w$ for some $s \in S$. Then $(G_n)$ is a sequence of $|S|$-regular graphs that form an expander family.

We note that the generating set $S$, for example, could be taken to comprise the 12 matrices given by
\[ \begin{pmatrix} 1 & \pm 1 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]
together with all their conjugates by permutation matrices (a permutation matrix is one obtained by permuting the rows or the columns of the identity matrix). Furthermore, the sequence of sets with the transitive actions could, for example, be $V_n = \mathrm{SL}_3(\mathbb{Z}/p_n\mathbb{Z})$, where $p_n$ denotes the $n$th odd prime number.

In this section we will prove Corollary 10.66. By Corollary 10.40 we know that $\Gamma = \mathrm{SL}_3(\mathbb{Z})$ has property (T). In the following argument we let $S = S^{-1}$ be a finite symmetric set of generators of $\Gamma$ (not containing the identity). Such a set exists by Exercise 10.37 (or Exercise 10.67 below).

Essential Exercise 10.67. Prove that $\mathrm{SL}_3(\mathbb{Z})$ is generated by the 12 elements given just after Corollary 10.66. (Note, however, that the argument after Proposition 10.41 does not apply directly since $\mathbb{Z}$ is not a field.)
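The example data for Corollary 10.66 can be explored by computer for the smallest odd prime. The sketch below lists the 12 matrices I ± E_ij with i ≠ j (the conjugates by permutation matrices of the displayed matrix), checks that they form a symmetric set of determinant-one matrices, and runs a breadth-first search in SL_3(Z/3Z) under left multiplication by their reductions modulo 3. The function names are illustrative assumptions, and the code verifies nothing about generation of SL_3(Z) itself, which is the content of Exercise 10.67.

```python
from collections import deque

IDENTITY = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def elementary_generators():
    """The 12 matrices I + E_ij and I - E_ij for i != j, as tuples of rows."""
    gens = []
    for i in range(3):
        for j in range(3):
            if i == j:
                continue
            for sign in (1, -1):
                m = [[1 if r == c else 0 for c in range(3)] for r in range(3)]
                m[i][j] = sign
                gens.append(tuple(tuple(row) for row in m))
    return gens


def mat_mul(a, b, p=None):
    """Product of 3x3 integer matrices, reduced modulo p if p is given."""
    prod = [[sum(a[r][t] * b[t][c] for t in range(3)) for c in range(3)]
            for r in range(3)]
    if p is not None:
        prod = [[x % p for x in row] for row in prod]
    return tuple(tuple(row) for row in prod)


def det(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def orbit_of_identity(gens, p):
    """Matrices mod p reachable from I by left multiplication with generators."""
    seen = {IDENTITY}
    queue = deque([IDENTITY])
    while queue:
        v = queue.popleft()
        for s in gens:
            w = mat_mul(s, v, p)
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen


if __name__ == "__main__":
    S = elementary_generators()
    print(len(set(S)), all(det(s) == 1 for s in S))
    # |SL_3(F_3)| = 3^3 (3^2 - 1)(3^3 - 1) = 5616, so a full orbit is transitive.
    print(len(orbit_of_identity(S, 3)))
```

Since the action is by left multiplication on a group, an element of Γ fixes a vertex exactly when it reduces to the identity modulo p, which makes the fixed-point condition on S² ∖ {I} easy to test for a given prime.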

Proof of Corollary 10.66. By Lemma 10.36 all unitary representations of $\Gamma$ have uniform spectral gap, meaning that there exist a finite $Q_0 \subseteq \Gamma$ and $\varepsilon_0 > 0$ such that for any unitary representation $\pi : \Gamma \circlearrowright H$ there are no $(Q_0, \varepsilon_0)$-almost invariant vectors in $(H^\Gamma)^\perp$. Notice that $Q_0 \subseteq S^k$ for some $k \geq 1$ and that an $(S, \frac{\varepsilon_0}{k})$-almost invariant vector is also $(Q_0, \varepsilon_0)$-almost invariant. In other words, we may assume that $Q_0 = S$ and that for any unitary representation $\pi : \Gamma \circlearrowright H$ there are no $(S, \varepsilon_0)$-almost invariant vectors in $(H^\Gamma)^\perp$.

Suppose now that $(V_n)$ is a sequence of finite sets with the property that for every $n$ the group $\Gamma$ acts transitively on $V_n$ and there are no elements of $V_n$ that are fixed by an element of $S^2 \setminus \{I\}$. We note that $V = \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ for any odd prime $p$ satisfies these assumptions. In fact, $V$ is itself a group, and we may use the reduction modulo $p$ map $\varphi : \Gamma = \mathrm{SL}_3(\mathbb{Z}) \to \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ to define the action of $\Gamma$ by $\gamma \cdot v = \varphi(\gamma) v$ for all $v \in V = \mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$. Transitivity of this action follows since the subgroup $\varphi(\mathrm{SL}_3(\mathbb{Z}))$ contains all elementary unipotent subgroups of $\mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ and since these generate $\mathrm{SL}_3(\mathbb{Z}/p\mathbb{Z})$ (by the argument after the statement of Proposition 10.41 applied to the field $K = \mathbb{F}_p = \mathbb{Z}/p\mathbb{Z}$).

We return to the general case and fix some $n \geq 1$. Since $\Gamma$ acts on the finite set $V_n$ we also obtain the unitary representation
\[ \pi_n : \Gamma \circlearrowright H_n = \ell^2(V_n) = \mathbb{C}^{V_n} \]
defined by $\pi_{n,\gamma} f(v) = f(\gamma^{-1} \cdot v)$ for all $\gamma \in \Gamma$ and $f \in H_n$. By transitivity of $\Gamma$ a $\Gamma$-invariant function in $H_n$ must be constant, that is $H_n^\Gamma = \mathbb{C} 1$. Suppose now that $f \in (H_n^\Gamma)^\perp$ is a unit vector. By the uniform spectral gap property above, there exists some $\gamma \in S$ such that
\[ \|\pi_{n,\gamma} f - f\| > \varepsilon_0. \tag{10.34} \]
We now show that this uniform claim implies that the sequence of graphs $(G_n)$ defined by $G_n = (V_n, E_n)$ as in Corollary 10.66 is an expander family by using Proposition 10.65. For this, let $\varepsilon > 0$ and suppose in addition that $f \in (H_n^\Gamma)^\perp$ is an eigenvector for the averaging operator $M_n = M_{G_n}$ associated to the graph $G_n$ and eigenvalue $\lambda_2(M_n) \geq 1 - \varepsilon$. By definition of the graph structure and the averaging operator we have
\[ M_n(f) = \frac{1}{|S|} \sum_{\gamma \in S} \pi_{n,\gamma} f, \]
and so
\[ 1 - \varepsilon \leq \lambda_2(M_n) \langle f, f \rangle = \Re \langle M_n(f), f \rangle = \frac{1}{|S|} \sum_{\gamma \in S} \Re \langle \pi_{n,\gamma} f, f \rangle. \]
Fix some $\gamma \in S$. By using in addition that $\Re \langle \pi_{n,\gamma'} f, f \rangle \leq \|f\|^2 = 1$ for all $\gamma' \in S \setminus \{\gamma\}$, we deduce from this that
\[ 1 - \varepsilon \leq \frac{1}{|S|} \bigl( |S| - 1 + \Re \langle \pi_{n,\gamma} f, f \rangle \bigr) \]
and therefore
\[ 1 - |S| \varepsilon \leq \Re \langle \pi_{n,\gamma} f, f \rangle. \]
However, this shows that
\[ \|\pi_{n,\gamma} f - f\|^2 = \|\pi_{n,\gamma} f\|^2 - 2 \Re \langle \pi_{n,\gamma}(f), f \rangle + \|f\|^2 \leq 2 - 2(1 - |S| \varepsilon) = 2 |S| \varepsilon \]
for all $\gamma \in S$. Setting $\varepsilon = \frac{\varepsilon_0^2}{2|S|}$, this contradicts (10.34). In other words, using this $\varepsilon$ we have $\lambda_2(M_n) < 1 - \varepsilon$ for every $n$, which shows that the assumptions in Proposition 10.65 are satisfied, and so Corollary 10.66 follows. □

10.5 Further Topics

• The above concludes our main discussion of non-abelian topological groups. Abelian groups will appear in Section 11.4 and Section 12.8, which together give, among other things, a complete classification of unitary representations for locally compact abelian groups. We refer to Folland [32] and [26] for more on the general theory of unitary representations of locally compact groups.
• Homogeneous spaces are quotients of the form $G/\Gamma$, where $\mathrm{SL}_3(\mathbb{R})/\mathrm{SL}_3(\mathbb{Z})$ is a very important example. These spaces have many interesting and important connections to geometry (see Helgason [44] and Ratcliffe [90]), ergodic theory and dynamical systems [27, Ch. 9–11] and [25], algebraic groups (see, for example, the monograph of Witte Morris [116]), and number theory (an introduction to this large area of interaction may be found in the monographs of Diamond and Shurman [22] and Serre [97]).
• Amenable groups play an important role in ergodic theory, see [27, Ch. 8].
• For further reading on property (T) we refer the reader to the monograph of Bekka, de la Harpe and Valette [6], to [26], and for an account of the special role played by property (T) in ergodic theory to the work of Gorodnik and Nevo [41].

Chapter 11

Banach Algebras and the Spectrum

In this chapter we will study Banach algebras as introduced in Section 2.4.2. For most of the discussion we will work over $\mathbb{C}$ and assume that the Banach algebra $A$ is unital, meaning that there is a multiplicative unit $1_A$. A multiplicative unit $1_A \in A$ is an element with $1_A a = a 1_A = a$ for all $a \in A$. We assume that $1_A \neq 0$, or equivalently that $A \neq \{0\}$ (in the literature one sometimes also sees the assumption $\|1_A\| = 1$, which will hold in all the examples that we will consider). At first sight it seems as if the assumption that $A$ is unital excludes the important example $(L^1(\mathbb{R}^d), +, *)$, but this may be overcome by the simple construction in the following exercise.

Essential Exercise 11.1 (Adding a unit). Let $A$ be a complex Banach algebra. Define the algebra $A_1 = A \oplus \mathbb{C}$ with the convention that we write the elements of $A_1$ in the form $a + \lambda I$ with $a \in A$ and $\lambda \in \mathbb{C}$, use the norm $\|a + \lambda I\| = \|a\|_A + |\lambda|$, the obvious linear structure as a vector space over $\mathbb{C}$, and the multiplication $(a + \lambda I)(b + \mu I) = (ab + \lambda b + \mu a) + \lambda \mu I$. Show that with these definitions $A_1$ is a unital Banach algebra with $1_{A_1} = I$ being its multiplicative unit.

11.1 The Spectrum and Spectral Radius

We say that an element $a$ of a unital Banach algebra $A$ is invertible if there exists some $b \in A$, called the inverse of $a$, with $ab = ba = 1_A$.

Definition 11.2. Let $A$ be a unital Banach algebra over $\mathbb{C}$. The spectrum of an element $a \in A$ is the set $\sigma(a) = \{\lambda \in \mathbb{C} \mid a - \lambda 1_A \text{ is not invertible}\}$. The resolvent set is its complement $\rho(a) = \{\lambda \in \mathbb{C} \mid a - \lambda 1_A \text{ is invertible}\}$.

Let us note that the above generalizes the notion of an eigenvalue in the following way: If $A$ is the algebra of linear maps on $\mathbb{C}^d$, then the spectrum of an element $T \in A$ equals the set of eigenvalues of $T$.

© Springer International Publishing AG 2017. M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_11

Exercise 11.3. Let X be a compact topological space, and let A = C(X). Find the spectrum of f ∈ C(X) as an element of the Banach algebra C(X).

Essential Exercise 11.4. In this exercise we describe the spectrum of multiplication operators $M_g : H \to H$ as in Exercise 6.25.
(a) Let $\mu$ be a compactly supported finite (or $\sigma$-finite) measure on $\mathbb{C}$, and write $M_I$ for the multiplication operator corresponding to the identity map, so $(M_I(f))(z) = z f(z)$ for $f \in L^2_\mu(\mathbb{C})$. Show that the spectrum $\sigma(M_I)$ within the algebra of bounded operators $B(L^2_\mu(\mathbb{C}))$ equals the support of $\mu$.
(b) Let $(X, \mathcal{B}, \mu)$ be a $\sigma$-finite measure space, $H = L^2_\mu(X)$ and $g : X \to \mathbb{C}$ a bounded measurable function. Show that the spectrum of the multiplication operator $M_g$ within the algebra of bounded operators coincides with the essential range of $g$, which is defined to consist of all $\lambda \in \mathbb{C}$ with the property that $\mu(g^{-1}(U)) > 0$ for all neighbourhoods $U$ of $\lambda$.

The following theorem will show that the spectrum is always non-empty, and so provides us with generalized eigenvalues. Since even in finite dimensions eigenvalues may be complex, we will only consider Banach algebras over $\mathbb{C}$.

Definition 11.5. For an element $a$ of a complex unital Banach algebra the spectral radius is $\max_{\lambda \in \sigma(a)} |\lambda|$.

Theorem 11.6 (Spectrum and spectral radius formula). Let $A$ be a complex unital Banach algebra. Then for every $a \in A$ the spectrum $\sigma(a)$ is a non-empty compact subset of $\mathbb{C}$. Moreover, the spectral radius satisfies
\[ \max_{\lambda \in \sigma(a)} |\lambda| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \tag{11.1} \]

This theorem is the first of many that relate the algebraic to the topological structure in Banach algebras. The spectrum and the spectral radius of an element are defined in purely algebraic terms, whereas the limit is defined in terms of the norm. One surprising consequence is the following observation: If $A$ is a unital Banach algebra contained in a larger Banach algebra $B$ (with compatible structures), then it is possible for an element $a \in A$ to be non-invertible in $A$ but to be invertible in $B$. Thus the spectrum of an element depends on the algebra it is viewed in, and $\sigma_B(a) \subseteq \sigma_A(a)$ with strict containment being a possibility (see Exercise 11.7). Despite this, the spectral radius of $a \in A$ is not changed when it is viewed as an element of $B$, since Theorem 11.6 expresses it in terms of the norms of powers of $a$, which are not affected by the switch from $A$ to $B$ (by the implicit compatibility assumption).

Exercise 11.7. Let $U : \ell^2(\mathbb{Z}) \to \ell^2(\mathbb{Z})$ be the unitary shift operator from Exercises 6.1 and 6.23(a), so $U((x_n)) = (x_{n+1})$.
(a) Show that the spectrum of $U$ considered within the algebra $B$ of all bounded operators on $\ell^2(\mathbb{Z})$ is given by $\mathbb{S}^1 = \{\lambda \in \mathbb{C} \mid |\lambda| = 1\}$.
(b) Now consider the Banach algebra $A$ generated by $U$ (obtained by taking the closed linear hull of $U^0 = I, U, U^2, \dots$). Show that the spectrum of $U$ within $A$ is $\{\lambda \in \mathbb{C} \mid |\lambda| \leq 1\}$.

The above mentioned dependence of the spectrum of an element on the ambient algebra will not cause any confusion: In the abstract setting considered here we will only work with one algebra at a time, and in the application of these results in the context of operators on a Hilbert space H we will always consider the algebra B(H) of all bounded linear operators on H. Exercise 11.8 (There are no interesting Banach fields). Use Theorem 11.6 to show that C is the only Banach algebra over C that is also a field.

Finally, let us comment on the precise shape of the spectral radius formula (11.1). It will be relatively straightforward to show that scalars $\lambda \in \mathbb{C}$ with $|\lambda| > \|a\|$ cannot belong to the spectrum of $a \in A$. However, it is also clear that in general the norm may be much larger than the spectral radius. Self-adjoint and, more generally, normal operators on Hilbert spaces will form a nice exception to this. In fact, even in the elementary case of the algebra of two-by-two matrices (equipped with the operator norm) the norm of the matrix
\[ a = \begin{pmatrix} 1 & C \\ 0 & 1 \end{pmatrix} \]
can be made arbitrarily large by increasing the value of $C$, but the spectrum always consists simply of $1 \in \mathbb{C}$. The right-hand side of the spectral radius formula (11.1) essentially ignores the original size of the matrix $a$ and instead looks at the exponential growth rate of the norm of $a^n$. In the case at hand the norm of $a^n$ grows linearly, which makes the right-hand side equal to one (and thus equal to the left-hand side).

Exercise 11.9. Let $k \in C([0, 1]^2)$ be a continuous function, so that
\[ K(f)(x) = \int_0^x k(x, t) f(t) \, dt \]
for $f \in C([0, 1])$ defines an operator $K : C([0, 1]) \to C([0, 1])$. Determine $\sigma(K)$.
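Returning to the two-by-two matrix discussed before Exercise 11.9, the behaviour of the right-hand side of (11.1) can be observed numerically. The sketch below assumes that the operator norm is taken with respect to the Euclidean norm on C², and uses the elementary identity that the nth power of the matrix with rows (1, C) and (0, 1) has rows (1, nC) and (0, 1). The function names are illustrative.

```python
from math import sqrt


def norm_of_unipotent(t):
    """Operator norm of the matrix [[1, t], [0, 1]] for the Euclidean norm.

    The norm is the square root of the largest eigenvalue of a^T a, which is
    [[1, t], [t, 1 + t^2]] with trace 2 + t^2 and determinant 1.
    """
    trace = 2.0 + t * t
    largest = (trace + sqrt(trace * trace - 4.0)) / 2.0
    return sqrt(largest)


def root_of_power_norm(C, n):
    """The nth root of the norm of a^n for a = [[1, C], [0, 1]]."""
    return norm_of_unipotent(n * C) ** (1.0 / n)


if __name__ == "__main__":
    C = 100.0
    print("norm of a:", norm_of_unipotent(C))
    for n in (1, 10, 100, 1000, 10000):
        print(n, root_of_power_norm(C, n))
```

The norm of the matrix itself exceeds C, while the nth roots decrease towards 1, the only element of the spectrum, at the slow rate expected from linear growth of the norms of the powers.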

For the proof of Theorem 11.6 we will use Cauchy integration on the complex plane and convergent geometric series in the unital Banach algebra.

11.1.1 The Geometric Series and its Consequences

Given a unital Banach algebra $A$ we will set $a^0 = 1_A$ for any $a \in A$ as is customary. Also recall that the inverse of an invertible element $a \in A$ is uniquely determined by $a$.

Proposition 11.10. The set $U$ of invertible elements of a unital Banach algebra $A$ is open. Moreover, for any $a \in A$ the resolvent set $\rho(a)$ is open in $\mathbb{C}$, so the spectrum is a closed set.

Proof. If $a \in A$ and $\|a\| < 1$, then the inverse of $1_A - a$ is given by the geometric series
\[ (1_A - a)^{-1} = \sum_{n=0}^{\infty} a^n. \tag{11.2} \]
Indeed, since the right-hand side converges absolutely we may take the product and obtain
\[ (1_A - a) \Bigl( \sum_{n=0}^{\infty} a^n \Bigr) = \Bigl( \sum_{n=0}^{\infty} a^n \Bigr) (1_A - a) = \sum_{n=0}^{\infty} a^n - \sum_{n=1}^{\infty} a^n = 1_A \]
as desired. This shows that $B_1(1_A) \subseteq U$.

Now let $a_0 \in U$ be any invertible element and let $a \in A$ satisfy $\|a - a_0\| < \|a_0^{-1}\|^{-1}$. Then we claim that $a$ is also invertible, which will then show that $U \subseteq A$ is open. To prove the claim, notice that
\[ a = a_0 + (a - a_0) = a_0 \bigl( \underbrace{1_A + a_0^{-1}(a - a_0)}_{\in B_1(1_A)} \bigr) \]
is a product of two elements of $U$ and so lies in $U$.

Finally, for any $a \in A$ the resolvent set $\rho(a) = \{\lambda \in \mathbb{C} \mid a - \lambda 1_A \in U\}$ is the pre-image of an open set under a continuous mapping, and so is open. Therefore the spectrum $\sigma(a) = \mathbb{C} \setminus \rho(a)$ is closed. □
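The geometric series (11.2) can be tried out in the Banach algebra of two-by-two real matrices. The following sketch sums finitely many terms of the series and checks that the result is close to an inverse of the identity minus a; the matrix used is an arbitrary example whose norm is less than 1, and the function names are illustrative assumptions.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_add(a, b):
    """Entrywise sum of two matrices of the same shape."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_partial_sum(a, terms=200):
    """Partial sum a^0 + a^1 + ... + a^(terms - 1) of the geometric series."""
    total = identity(len(a))
    power = identity(len(a))
    for _ in range(terms - 1):
        power = mat_mul(power, a)
        total = mat_add(total, power)
    return total


def residual(a, approx_inverse):
    """Largest entry of (1 - a) * approx_inverse - 1 in absolute value."""
    n = len(a)
    one_minus_a = [[(1.0 if i == j else 0.0) - a[i][j] for j in range(n)]
                   for i in range(n)]
    prod = mat_mul(one_minus_a, approx_inverse)
    return max(abs(prod[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    a = [[0.2, 0.3], [0.1, 0.4]]   # row sums 0.5, so the norm is below 1
    s = neumann_partial_sum(a)
    print(s, residual(a, s))
```

Convergence is geometric with ratio at most the norm of a, so the number of terms needed grows as the norm approaches 1; for an element of norm at least 1 the series need not converge at all.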

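Proposition 11.11. Let $A$ be a unital Banach algebra over $\mathbb{C}$, and let $a \in A$. If $\lambda \in \mathbb{C}$ has $|\lambda| > \sqrt[m]{\|a^m\|}$ for some $m \geq 1$, then $\lambda \in \rho(a)$. In particular, the spectrum $\sigma(a)$ is a closed subset of the closed ball $\overline{B^{\mathbb{C}}_{\|a\|}}$ and so is compact.

Proof. Let $a \in A$ and $\lambda \in \mathbb{C}$ be as in the proposition. Then we claim that
\[ (\lambda 1_A - a)^{-1} = \lambda^{-1} \bigl( 1_A + \lambda^{-1} a + \cdots + \lambda^{-(m-1)} a^{m-1} \bigr) \sum_{n=0}^{\infty} \lambda^{-mn} a^{mn} \]
(here and below we will sometimes study $\lambda 1_A - a$ instead of $a - \lambda 1_A$, which clearly will not make any difference). For this notice first that by assumption
\[ \sum_{n=0}^{\infty} \|\lambda^{-mn} a^{mn}\| \leq \sum_{n=0}^{\infty} \bigl( \underbrace{|\lambda|^{-m} \|a^m\|}_{< 1} \bigr)^n \]
[…]

[…] $\geq 1$ and with $\lim_{n \to \infty} a_n = a$, $\lim_{n \to \infty} b_n = b$. Show that $ab = ba$.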

11.1.2 Using Cauchy Integration

We have shown that $\sigma(a) \subseteq \mathbb{C}$ is compact for any element $a \in A$, but have yet to show that $\sigma(a)$ is non-empty. This existence theorem uses Cauchy integration, and to prepare for this we need the following lemma concerning the resolvent.

Lemma 11.13 (Resolvent function). Let $a$ be an element of a unital Banach algebra $A$ over $\mathbb{C}$. Then the resolvent function $R : \rho(a) \to A$ defined by $R(\lambda) = (\lambda 1_A - a)^{-1}$ is an analytic function in the sense that for any $\lambda_0 \in \rho(a)$ there is an open neighbourhood of $\lambda_0$ on which $R$ is given by an absolutely convergent power series
\[ R(\lambda) = \sum_{n=0}^{\infty} b_n (\lambda - \lambda_0)^n \]
with coefficients $b_n \in A$.

Proof. We use essentially the same formulas as those that arise in the proof of Proposition 11.10. Let $a \in A$ and $\lambda_0 \in \rho(a)$ be as in the lemma. Suppose that $\lambda \in \mathbb{C}$ satisfies $|\lambda - \lambda_0| < \|(\lambda_0 1_A - a)^{-1}\|^{-1}$. Then
\[ \lambda 1_A - a = (\lambda_0 1_A - a) - (\lambda_0 - \lambda) 1_A = (\lambda_0 1_A - a) \bigl( 1_A - (\lambda_0 - \lambda)(\lambda_0 1_A - a)^{-1} \bigr), \]
which shows that
\[ R(\lambda) = (\lambda_0 1_A - a)^{-1} \sum_{n=0}^{\infty} \bigl( (\lambda_0 1_A - a)^{-1} \bigr)^n (-1)^n (\lambda - \lambda_0)^n \]
is, for $|\lambda - \lambda_0| < \|(\lambda_0 1_A - a)^{-1}\|^{-1}$, an absolutely convergent power series, as claimed. □

With this analyticity we are ready to prove the first part of Theorem 11.6.

Proof that $\sigma(a)$ is non-empty. Let $a$ be an element of a unital Banach algebra, and suppose that $\sigma(a)$ is empty. We first sketch an argument that produces a contradiction from this assumption, and then fill in the details.

An entire function. Since $\sigma(a)$ is empty, the resolvent function $R(\lambda) = (\lambda 1_A - a)^{-1}$ is an entire function (that is, is an analytic function defined on all of $\mathbb{C}$). It follows by Cauchy's integral formula that
\[ \oint_\gamma R(z) \, dz = 0 \]
for any closed piecewise differentiable path $\gamma$ in $\mathbb{C}$. The alert reader may notice that this usage of Cauchy integration is a bit unorthodox, but should read on, as this will be resolved below. In particular, if $\gamma$ is the closed positively oriented path with centre 0 and radius $\|a\| + 1$, then
\[ R(z) = (z 1_A - a)^{-1} = \tfrac{1}{z} \bigl( 1_A - \tfrac{1}{z} a \bigr)^{-1} = \sum_{n=0}^{\infty} z^{-n-1} a^n \tag{11.3} \]
for any $z$ on the path $\gamma$, and the sum is absolutely convergent. Therefore,
\begin{align*}
0 = \oint_\gamma R(z) \, dz &= \oint_{|z| = \|a\| + 1} \sum_{n=0}^{\infty} z^{-n-1} a^n \, dz \\
&= \sum_{n=0}^{\infty} a^n \oint_{|z| = \|a\| + 1} z^{-n-1} \, dz = 2 \pi i 1_A, \tag{11.4}
\end{align*}
since
\[ \oint_{|z| = \|a\| + 1} z^{-n-1} \, dz = \begin{cases} 2 \pi i & \text{if } n = 0, \\ 0 & \text{if } n \neq 0. \end{cases} \]
Now $1_A \neq 0$, and so (11.4) shows that the assumption $\sigma(a) = \emptyset$ leads to a contradiction.

Using the standard Cauchy integral formula. The difficulty with the argument sketched above is that most of the integrals are integrals of $A$-valued functions. Even though it is possible to make sense of integration for $A$-valued functions (see Proposition 3.81), we do not need to extend the Cauchy integral formula for $A$-valued functions because of the following argument (which could be used to prove such an extension). Let $\ell \in A^*$ be a linear functional with $\ell(1_A) \neq 0$ (such a functional is guaranteed to exist by Theorem 7.3) and consider
\[ \ell \circ R : \rho(a) = \mathbb{C} \longrightarrow \mathbb{C}. \]
By Lemma 11.13, $R(z)$ can locally be represented as a power series. By continuity of $\ell$, the same holds for $\ell \circ R$. It follows that $\ell \circ R : \mathbb{C} \to \mathbb{C}$ is an entire function (in the usual sense of complex analysis). Using this entire function in the calculation in (11.4) we see that
\[ 0 = \oint_{|z| = \|a\| + 1} \ell \circ R(z) \, dz = \sum_{n=0}^{\infty} \ell(a^n) \oint_{|z| = \|a\| + 1} z^{-n-1} \, dz = 2 \pi i \ell(1_A) \neq 0. \]

It follows that $\ell \circ R$ cannot be defined on all of $\mathbb{C}$, and so $\sigma(a)$ is non-empty. □

For the spectral radius formula in Theorem 11.6 we also need the following elementary property of sub-additive and sub-multiplicative real-valued sequences.

Definition 11.14. A real sequence $(\alpha_n)$ is sub-additive if $\alpha_{m+n} \leq \alpha_m + \alpha_n$ for all $m, n \geq 1$, and is sub-multiplicative if $\alpha_n \geq 0$ and $\alpha_{m+n} \leq \alpha_m \alpha_n$ for all $m, n \geq 1$.

Lemma 11.15 (Fekete's lemma). Let $(\alpha_n)$ be a real sequence.
(1) If $(\alpha_n)$ is sub-additive then
\[ \lim_{n \to \infty} \frac{\alpha_n}{n} = \inf_{n \geq 1} \frac{\alpha_n}{n}. \]
(2) If $(\alpha_n)$ is non-negative and sub-multiplicative then
\[ \lim_{n \to \infty} \sqrt[n]{\alpha_n} = \inf_{n \geq 1} \sqrt[n]{\alpha_n}. \]

Proof. Suppose first $\alpha_n \geq 0$ is sub-multiplicative. If $\alpha_{n_0} = 0$ for some $n_0$, then $\alpha_n = 0$ for all $n \geq n_0$ and in this case the claim is trivial. On the other hand, if all $\alpha_n$ are strictly positive the statement follows from (1) applied to the sequence $(\log \alpha_n)$.

So consider now a real-valued sub-additive sequence $(\alpha_n)$ and let
\[ \alpha = \inf_{n \in \mathbb{N}} \frac{\alpha_n}{n}, \]
so that $\frac{\alpha_n}{n} \geq \alpha$ for all $n \geq 1$. Let $\beta > \alpha$ be arbitrary and pick $k \geq 1$ such that $\frac{\alpha_k}{k} < \beta$. For any $n \geq k$ we apply division with remainder to get $n = mk + j$ for some $j \in \{0, \dots, k - 1\}$ and $m \geq 1$. By the sub-additivity property we then have
\[ \frac{\alpha_n}{n} \leq \frac{\alpha_{mk}}{n} + \frac{\alpha_j}{n} \leq \frac{m \alpha_k}{n} + \frac{j \alpha_1}{n} = \frac{mk}{n} \frac{\alpha_k}{k} + \frac{j \alpha_1}{n}. \]
If now $n = mk + j$ is large enough we see that the right-hand side is less than $\beta$, which proves the statement. □

Proof of Theorem 11.6. Notice first that the sequence $(\alpha_n)$ defined by

$\alpha_n = \|a^n\|$ for $n \geq 1$ is sub-multiplicative, since
\[ \alpha_{m+n} = \|a^{m+n}\| \leq \|a^m\| \|a^n\| = \alpha_m \alpha_n \]
for all $m, n \geq 1$. Thus $\sqrt[m]{\|a^m\|}$ converges to $\inf_{m \geq 1} \sqrt[m]{\|a^m\|}$ by Lemma 11.15. Proposition 11.11 shows that, if $\lambda \in \mathbb{C}$ satisfies
\[ |\lambda| > \inf_{m \geq 1} \sqrt[m]{\|a^m\|}, \]
then $\lambda \in \rho(a)$. Thus
\[ \max_{\lambda \in \sigma(a)} |\lambda| \leq \inf_{m \geq 1} \sqrt[m]{\|a^m\|} = \lim_{m \to \infty} \sqrt[m]{\|a^m\|}. \]

Using the Cauchy integral formula. The reverse inequality is more involved. It involves a refinement of the proof that $\sigma(a)$ is non-empty, and will use the Cauchy integral formula again. Let $s = \max_{\lambda \in \sigma(a)} |\lambda|$, so that the resolvent function $R(z) = (z 1_A - a)^{-1}$ is analytic on $\{z \in \mathbb{C} \mid |z| > s\} \subseteq \rho(a)$. Pick $\ell \in A^*$ with $\|\ell\| \leq 1$ and fix $\varepsilon > 0$. Using the positively oriented closed path with centre 0 and radius $s + \varepsilon$, we see that
\[ \Bigl| \oint_{|z| = s + \varepsilon} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz \Bigr| \ll_{a,\varepsilon} (s + \varepsilon)^m \]
for all $m \geq 0$, where $|z^m| = (s + \varepsilon)^m$ and the implicit constant only depends on the restriction of $R(z) = (z 1_A - a)^{-1}$ to $\{z \in \mathbb{C} \mid |z| = s + \varepsilon\}$, and in particular does not depend on $m$ and $\ell$. Expanding the circle to the radius $\|a\| + 1$ does not change the integral, so that we may use (11.3) again to see that
\begin{align*}
\oint_{|z| = s + \varepsilon} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz &= \oint_{|z| = \|a\| + 1} \ell \bigl( (z 1_A - a)^{-1} \bigr) z^m \, dz \\
&= \sum_{n=0}^{\infty} \ell(a^n) \oint_{|z| = \|a\| + 1} z^{-n+m-1} \, dz = 2 \pi i \ell(a^m).
\end{align*}
Together this gives
\[ |\ell(a^m)| \ll_{a,\varepsilon} (s + \varepsilon)^m \]
for all $m \geq 1$ and $\ell \in A^*$ with $\|\ell\| \leq 1$. Using Corollary 7.4 we deduce that $\|a^m\| \ll_{a,\varepsilon} (s + \varepsilon)^m$. Taking the $m$th root and the limit we see that the implicit constant disappears, and we get
\[ \lim_{m \to \infty} \sqrt[m]{\|a^m\|} \leq s + \varepsilon = \max_{\lambda \in \sigma(a)} |\lambda| + \varepsilon. \]
Since this holds for any $\varepsilon > 0$ the theorem follows. □

The results of this and the following section will be used in Chapter 12 to derive the spectral theory of bounded self-adjoint operators and their functional calculus.
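Fekete's lemma (Lemma 11.15), which entered the proof of the spectral radius formula, can be illustrated on a simple sequence. The sequence used below, α_n = n + √n, is a hypothetical example chosen for this sketch and does not appear in the text; it is sub-additive because √(m+n) ≤ √m + √n, and α_n/n = 1 + 1/√n decreases to the infimum 1 without attaining it.

```python
from math import sqrt


def alpha(n):
    """Hypothetical sub-additive sequence alpha_n = n + sqrt(n)."""
    return n + sqrt(n)


def is_subadditive(seq, limit, tolerance=1e-12):
    """Check seq(m + n) <= seq(m) + seq(n) for all 1 <= m, n < limit."""
    return all(seq(m + n) <= seq(m) + seq(n) + tolerance
               for m in range(1, limit) for n in range(1, limit))


def ratios(seq, indices):
    """The quotients seq(n) / n whose limit and infimum the lemma compares."""
    return [seq(n) / n for n in indices]


if __name__ == "__main__":
    print(is_subadditive(alpha, 60))
    for n, r in zip((1, 10, 100, 10 ** 4, 10 ** 6),
                    ratios(alpha, (1, 10, 100, 10 ** 4, 10 ** 6))):
        print(n, r)
```

A finite check of this kind can of course only support, not prove, sub-additivity; here the inequality is elementary, and the printed quotients approach 1 as the lemma predicts.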

11.2 C*-algebras

Definition 11.16. A Banach algebra $A$ over $\mathbb{C}$ is a C*-algebra if it has a star operator $* : A \to A$ with the following properties:

• $*$ is semi-linear;
• $(ab)^* = b^* a^*$ for $a, b \in A$;
• $(a^*)^* = a$ for $a \in A$; and
• $\|a^* a\| = \|a\|^2$ for $a \in A$ (the C*-property of the norm).

Example 11.17. (a) The algebra of bounded operators $B(H)$ on a Hilbert space $H$ has a star operator, namely the map that sends $A \in B(H)$ to its adjoint $A^* \in B(H)$ (introduced in Section 6.2.1). For this star operator we already know all the desired properties with the exception of the last (critical) property. To see this last property, let $A \in B(H)$, and notice that $A^* A$ is self-adjoint since $(A^* A)^* = A^* (A^*)^* = A^* A$. Hence, by Lemma 6.31,
\[ \|A^* A\| = \sup_{\|x\| \leq 1} |\langle A^* A x, x \rangle| = \sup_{\|x\| \leq 1} \langle A x, A x \rangle = \|A\|^2, \]
as required. Therefore $B(H)$ is a C*-algebra.
(b) The space of bounded functions $B(X)$, of continuous bounded functions $C_b(X)$, of measurable bounded functions $L^\infty(X)$, and of measurable essentially bounded functions $L^\infty_\mu(X)$ are all commutative unital C*-algebras. For these multiplication is defined pointwise, and the star operator is pointwise complex conjugation.

Essential Exercise 11.18. Show that $\|a^*\| = \|a\|$ for $a \in A$ if $A$ is a C*-algebra.

Definition 11.19. Let $A$ be a C*-algebra. Then an element $a \in A$ is called self-adjoint if $a^* = a$, and is called normal if $a^* a = a a^*$.

Essential Exercise 11.20. Let $A$ be a unital C*-algebra. Show that the unit $1_A$ is self-adjoint and has $\|1_A\| = 1$.

For normal elements in a C*-algebra the spectral radius formula simplifies.


Proposition 11.21 (Spectral radius formula for normal elements). Let $A$ be a unital $C^*$-algebra, and let $a \in A$ be a normal element. Then the spectral radius satisfies

\[ \max_{\lambda \in \sigma(a)} |\lambda| = \|a\|. \]

Proof. We will prove by induction on $n$ that

\[ \|a^{2^n}\| = \|a\|^{2^n}. \tag{11.5} \]

The case $n = 0$ is trivial. For $n = 1$ we have

\[ \|a^2\|^2 = \|(a^2)^* a^2\| = \|(a^*a)^*(a^*a)\| = \|a^*a\|^2 = \|a\|^4, \]

where we used the $C^*$-property of the norm for $a^2$, normality of $a$, and the $C^*$-property of the norm for $a^*a$ and for $a$. Now suppose that (11.5) holds for a given $n \geq 1$ and set $b = a^{2^n}$. Then

\[ \|a^{2^{n+1}}\| = \|b^2\| = \|b\|^2 = \|a^{2^n}\|^2 = \|a\|^{2^{n+1}}, \]

where we used the definition of $b$, the case $n = 1$ for the normal element $b$, and the inductive hypothesis. This concludes the induction, proving (11.5) for all $n \geq 0$. Applying Theorem 11.6 now gives the proposition. □

Starting in Section 12.5, we will use the results of this section and their refinements in Section 11.3.4 to obtain the spectral theory of commutative $C^*$-subalgebras of bounded operators on a Hilbert space.
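The identity (11.5) can be checked numerically in the $C^*$-algebra of $2 \times 2$ matrices. The following is a minimal sketch, not part of the text: the helper names `matmul`, `op_norm` and `norm_of_dyadic_power` and the two sample matrices are assumptions chosen for illustration. It compares $\|a^{2^n}\|$ with $\|a\|^{2^n}$ for a symmetric (hence normal) matrix, and shows that the identity fails for a nilpotent matrix, which is not normal.

```python
import math

def matmul(a, b):
    """Product of two 2x2 real matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def op_norm(a):
    """Operator norm for the Euclidean norm: sqrt of the largest eigenvalue of a^T a."""
    a_t = [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]
    s = matmul(a_t, a)                      # symmetric and positive semi-definite
    half_trace = (s[0][0] + s[1][1]) / 2
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    disc = max(half_trace ** 2 - det, 0.0)  # guard against rounding below zero
    return math.sqrt(half_trace + math.sqrt(disc))

def norm_of_dyadic_power(a, n):
    """Return ||a^(2^n)||, computed by squaring the matrix n times."""
    p = a
    for _ in range(n):
        p = matmul(p, p)
    return op_norm(p)

normal = [[2.0, 1.0], [1.0, 2.0]]     # symmetric, hence normal; eigenvalues 1 and 3
nilpotent = [[0.0, 1.0], [0.0, 0.0]]  # not normal; its square is zero

for n in range(4):
    # For the normal matrix both columns agree, as (11.5) predicts.
    print(n, norm_of_dyadic_power(normal, n), op_norm(normal) ** (2 ** n))

# For the nilpotent matrix ||a|| = 1 but ||a^2|| = 0, so (11.5) fails.
print(op_norm(nilpotent), norm_of_dyadic_power(nilpotent, 1))
```

For the nilpotent matrix the spectral radius is $0$ while the norm is $1$, so the normality hypothesis in Proposition 11.21 cannot simply be dropped.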

11.3 Commutative Banach Algebras and their Gelfand Duals

Recall that the dual space $A^*$ of a Banach algebra $A$ consists of all bounded linear functionals $A \to \mathbb{C}$. If $A$ is in addition commutative (with $ab = ba$ for all $a, b \in A$) then it is useful to study algebra homomorphisms. The trivial map $\chi$ defined by $\chi(a) = 0$ for all $a \in A$ may also be considered an algebra homomorphism, but we will exclude this trivial map in the discussion below.

Definition 11.22. Let $A$ be a commutative Banach algebra over $\mathbb{C}$. Then the Gelfand dual $\sigma(A)$ is the set of all non-trivial (equivalently, surjective) continuous algebra homomorphisms $\chi : A \to \mathbb{C}$ (which are also called characters). That is,

\[ \sigma(A) = \{\chi \in A^* \setminus \{0\} \mid \chi(ab) = \chi(a)\chi(b) \text{ for all } a, b \in A\}. \]


11.3.1 Commutative Unital Banach Algebras

If the Banach algebra that we consider also has a unit, then we can link the notion of algebra homomorphisms to the spectrum of the elements of the algebra. The following result establishes this link and a great deal more.

Theorem 11.23 (Properties of the Gelfand dual). Let $A$ be a commutative unital Banach algebra over $\mathbb{C}$. Then $\sigma(A) \subseteq \overline{B_1^{A^*}}$ is non-empty and weak* compact, and

\[ \sigma(a) = \{\chi(a) \mid \chi \in \sigma(A)\} \]

for every $a \in A$.

We start the proof of Theorem 11.23 by showing that any algebra homomorphism $\chi : A \to \mathbb{C}$ is continuous(31) (and so strictly speaking the continuity hypothesis in Definition 11.22 could be dropped).

Lemma 11.24. Let $A$ be a commutative Banach algebra, and let $\chi : A \to \mathbb{C}$ be an algebra homomorphism. Then $\chi$ is continuous and $\|\chi\| \leq 1$.

Proof. Suppose there is an element $a \in A$ with $\|a\| < 1$ and with $|\chi(a)| > 1$. Replacing $a$ by $a/\chi(a)$ we may assume that $\|a\| < 1$ and $\chi(a) = 1$. Then the series $b = \sum_{n=1}^{\infty} a^n$ converges and satisfies $a + ab = b$, so that

\[ 1 + \chi(b) = \chi(a) + \chi(a)\chi(b) = \chi(b), \]

a contradiction. This implies that $\chi$ is continuous and has $\|\chi\| \leq 1$. □



For the next steps we will need to use some more terminology from basic algebra. Recall that an ideal $J$ of a commutative algebra $A$ is a subspace such that $AJ = \{ab \mid a \in A, b \in J\} \subseteq J$, and that for any ideal $J$ the quotient $A/J$ is also a commutative algebra with multiplication given by $(a + J)(b + J) = ab + J$ for all $a, b \in A$. An ideal $J \subseteq A$ is proper if $J \neq A$. A maximal ideal $M$ in a unital commutative algebra $A$ is a proper ideal such that if $J$ is an ideal with $M \subseteq J \subseteq A$ then $J = M$ or $J = A$. The quotient of a unital algebra by a maximal ideal $M$ is always a field, for if $a + M \in A/M \setminus \{0\}$ then $J = Aa + M$ is an ideal strictly bigger than $M$, and so must be $A$. Since $A$ has a unit $1_A$, we have $ba + m = 1_A$ for some $b \in A$ and $m \in M$, so every non-zero element of $A/M$ has a multiplicative inverse. The next lemma examines these general notions for Banach algebras.

Lemma 11.25. Let $A$ be a commutative unital Banach algebra. The closure of any ideal in $A$ is an ideal, and any maximal ideal is closed.

Proof. The first claim is an easy consequence of the fact that the multiplication map $A \times A \to A$ is continuous by the discussion in Section 2.4.2. For the second claim, notice that a proper ideal $J \subseteq A$ cannot contain $1_A$, nor indeed any invertible element. By Proposition 11.10 this implies that $1_A$ is not an element of $\overline{J}$. Since a maximal ideal $M$ is proper, and its closure $\overline{M}$ is also a proper ideal, we see that $M = \overline{M}$ is closed. □

We note that Lemma 11.25 gives a second proof that any algebra homomorphism $\chi : A \to \mathbb{C}$ on a commutative unital Banach algebra is continuous: Given a non-trivial algebra homomorphism $\chi : A \to \mathbb{C}$ its kernel $M = \ker \chi$ is a maximal ideal, and so is closed. Then $\chi$ equals the composition of the continuous projection $A \to A/M$ and the isomorphism $A/M \to \mathbb{C}$ induced by $\chi$ (which is continuous by finite-dimensionality), hence we see that $\chi : A \to A/M \to \mathbb{C}$ is a continuous map.

For the proof of Theorem 11.23 we need one more algebraic result.

Lemma 11.26. Let $R$ be a commutative ring with a unit, and let $J_0 \subseteq R$ be a proper ideal. Then there exists a maximal ideal $M \subseteq R$ containing $J_0$.

Proof. This is a direct application of Zorn's lemma. Define a set

\[ S = \{J \subseteq R \mid J \text{ is an ideal and } J_0 \subseteq J \subsetneq R\} \]

with the partial order defined by inclusion. If $\{J_\alpha \mid \alpha \in I\}$ is a linearly ordered subset (a chain) in $S$, then

\[ J = \bigcup_{\alpha \in I} J_\alpha \]

is again an ideal. Moreover, since each $J_\alpha$ is proper, we have $1_R \notin J_\alpha$ for all $\alpha$ in $I$, and so $1_R \notin J$, showing that $J$ is also proper. By Zorn's lemma, it follows that the set $S$ contains a maximal element, which by construction is a maximal ideal containing $J_0$. □

Proof of Theorem 11.23. Note that an algebra homomorphism $\chi : A \to \mathbb{C}$ is non-trivial if and only if $\chi(1_A) = 1$. Indeed, if $\chi(a) \neq 0$ for some $a \in A$ then $\chi(a) = \chi(1_A a) = \chi(1_A)\chi(a)$ shows that $\chi(1_A) = 1$. By the definition and Lemma 11.24 we have

\[ \begin{aligned} \sigma(A) &= \bigl\{\chi \in \overline{B_1^{A^*}} \mid \chi(ab) = \chi(a)\chi(b) \text{ for all } a, b \in A \text{ and } \chi(1_A) = 1\bigr\} \\ &= \overline{B_1^{A^*}} \cap \bigcap_{a, b \in A} \{\chi \in A^* \mid \chi(ab) = \chi(a)\chi(b)\} \cap \{\chi \in A^* \mid \chi(1_A) = 1\}. \end{aligned} \]

Since the sets $\{\chi \in A^* \mid \chi(1_A) = 1\}$ and $\{\chi \in A^* \mid \chi(ab) = \chi(a)\chi(b)\}$ are closed in the weak* topology for every $a, b \in A$, we see that $\sigma(A)$ is weak* compact by the Banach–Alaoglu theorem (Theorem 8.10).

Now let $a_0 \in A$ be non-invertible, so that $J = Aa_0$ is a proper ideal. By Lemma 11.26 there is a maximal ideal $M \subseteq A$ containing $J$. By Lemma 11.25, $M$ is closed. We claim that $B = A/M$ is also a Banach algebra. To see this, we equip $B$ with the quotient norm (from Section 2.1.2)


which makes $B$ into a Banach space by Lemma 2.29. Since $M$ is an ideal, multiplication is well-defined on $A/M$. Finally,

\[ \|ab + M\|_{A/M} \leq \|(a + m_1)(b + m_2)\|_A \leq \|a + m_1\|_A \|b + m_2\|_A \]

for all $a, b \in A$ and all $m_1, m_2 \in M$, which implies that

\[ \|(a + M)(b + M)\|_{A/M} \leq \|a + M\|_{A/M} \|b + M\|_{A/M} \]

by taking the infimum over $m_1$ and $m_2 \in M$, as required. Thus $A/M$ is a Banach algebra and a field (since $M$ is maximal). We claim this implies that $A/M = \mathbb{C}(1_A + M) \cong \mathbb{C}$. Indeed (solving Exercise 11.8), if $a + M \in A/M$, then $\sigma(a + M) \neq \emptyset$ by Theorem 11.6, and so $a - \lambda 1_A + M$ is non-invertible for some $\lambda \in \mathbb{C}$. However, since $A/M$ is a field this implies that $a + M = \lambda 1_A + M$ and hence the claim.

Together we have shown that if $a_0 \in A$ is non-invertible, then there exists a non-trivial algebra homomorphism $\chi : A \to A/M \cong \mathbb{C}$ with $\chi(a_0) = 0$. Applying this for any $a \in A$ to $a - \lambda 1_A$ for $\lambda \in \sigma(a)$, we see that for any such $\lambda$ there is some $\chi \in \sigma(A)$ with $\chi(a) = \lambda$. On the other hand, if $\chi(a) = \lambda$ for some $a \in A$, $\lambda \in \mathbb{C}$ and $\chi \in \sigma(A)$, then $\chi(a - \lambda 1_A) = 0$ and hence $a - \lambda 1_A$ cannot be invertible (since $\chi \neq 0$). Together we have shown the theorem. □

Example 11.27 (Stone–Čech compactification). Let $A = \ell^\infty(\mathbb{N})$, which is a Banach algebra with respect to the pointwise product. Clearly for any $n_0 \in \mathbb{N}$ the map defined by $\chi_{n_0}((a_n)) = a_{n_0}$ is an algebra homomorphism, and $\mathbb{N} \ni n_0 \mapsto \chi_{n_0}$ defines a map from $\mathbb{N}$ to $\sigma(A)$. The compact (but non-metrizable) topological space $\sigma(A)$ is called the Stone–Čech compactification of $\mathbb{N}$ and is denoted $\beta\mathbb{N}$.

Exercise 11.28. (a) Show that the image of $\mathbb{N}$ is dense in $\beta\mathbb{N}$.
(b) Show that $\ell^\infty(\mathbb{N})$ can be canonically identified with $C(\beta\mathbb{N})$.
(c) Show that $\beta\mathbb{N}$ is non-metrizable.
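A finite-dimensional toy version of Example 11.27 may help to fix ideas. The sketch below replaces $\ell^\infty(\mathbb{N})$ by $\mathbb{C}^3$ with the pointwise product (an assumption made purely for illustration, as are the names `multiply`, `character` and `spectrum`). In this small algebra the coordinate evaluations are the only characters, and the statement $\sigma(a) = \{\chi(a) \mid \chi \in \sigma(A)\}$ of Theorem 11.23 can be read off directly.

```python
# A = C^3 with the pointwise product; the unit is (1, 1, 1).

def multiply(a, b):
    """Pointwise product in C^3."""
    return tuple(x * y for x, y in zip(a, b))

def character(j):
    """Coordinate evaluation chi_j(a) = a[j], the analogue of chi_{n_0} in Example 11.27."""
    return lambda a: a[j]

def spectrum(a):
    """a - lam * 1 has no pointwise inverse exactly when some coordinate of a equals lam."""
    return set(a)

a = (2 + 1j, -1, 2 + 1j)
b = (0.5, 3j, 4)
characters = [character(j) for j in range(3)]

# Each chi_j is multiplicative and sends the unit to 1.
for chi in characters:
    print(chi(multiply(a, b)) == chi(a) * chi(b), chi((1, 1, 1)) == 1)

# The values of the characters at a are exactly the spectrum of a.
print({chi(a) for chi in characters} == spectrum(a))
```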

11.3.2 Commutative Banach Algebras without a Unit

While the notions of invertibility and spectrum are linked to the existence of a unit, the definition of the Gelfand dual is not. However, the topological properties of $\sigma(A)$ are changed by the absence of a unit.


Corollary 11.29 (Properties of the Gelfand dual). Let $A$ be a commutative Banach algebra over $\mathbb{C}$. Then $\sigma(A) \subseteq \overline{B_1^{A^*}}$ is locally compact (and $\sigma(A) \cup \{0\}$ is compact) in the weak* topology on $A^*$. For any $a \in A$ we also have

\[ \max_{\chi \in \sigma(A) \cup \{0\}} |\chi(a)| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \]

Recall that if $X$ is a compact space and $x_0 \in X$ is any point, then the space $Y = X \setminus \{x_0\}$ is in general only locally compact. Moreover, in the case when $Y$ is not compact, the one-point compactification of $Y$ is homeomorphic to (and so can be identified with) $X = Y \cup \{x_0\}$, where the point $x_0$ takes the role of $\infty$.

Exercise 11.30. Recall the definition of the one-point compactification of a locally compact space $Y$ and show the above claim.

Proof of Corollary 11.29. The proof that $\sigma(A) \cup \{0\}$ is compact in the weak* topology is the same as in the proof of Theorem 11.23 since Lemma 11.24 implies that

\[ \sigma(A) \cup \{0\} = \bigl\{\chi \in \overline{B_1^{A^*}} \mid \chi(ab) = \chi(a)\chi(b) \text{ for all } a, b \in A\bigr\}, \]

which is easily seen to be a closed subset of $\overline{B_1^{A^*}}$ in the weak* topology.

For the last claim of the corollary note that if $A$ has a unit, then Theorem 11.23 applies and gives the statement. So assume that $A$ does not have a unit, and consider the algebra $A_1 = A \oplus \mathbb{C}$ with the multiplication and norm as in Exercise 11.1. As argued in the beginning of the proof of Theorem 11.23, $\chi_1(1_A) = 1$ for any $\chi_1 \in \sigma(A_1)$ so that $\chi_1$ is uniquely determined by $\chi = \chi_1|_A \in \sigma(A) \cup \{0\}$. Moreover, any $\chi \in \sigma(A) \cup \{0\}$ can be extended to a character $\chi_1 \in \sigma(A_1)$ by setting $\chi_1(a + \lambda 1_A) = \chi(a) + \lambda$ for any $a + \lambda 1_A \in A_1$, which allows us to identify $\sigma(A_1)$ with $\sigma(A) \cup \{0\}$. Applying Theorems 11.23 and 11.6 to $A_1$ now gives

\[ \max_{\chi \in \sigma(A) \cup \{0\}} |\chi(a)| = \max_{\chi \in \sigma(A_1)} |\chi(a)| = \lim_{n \to \infty} \sqrt[n]{\|a^n\|}. \]

□



Exercise 11.31. Show that every character χ ∈ σ(A) can be extended to a character χ1 , as claimed in the proof of Corollary 11.29.

11.3.3 The Gelfand Transform

Definition 11.32. Let $A$ be a commutative Banach algebra with Gelfand dual $\sigma(A)$. Then the map $(\cdot)^o : A \to C(\sigma(A))$ defined by

\[ f^o(\chi) = \chi(f) \]

for $f \in A$ and $\chi \in \sigma(A)$ is called the Gelfand transform.

Just as in Theorem 11.23 we will always use the weak* topology on $\sigma(A)$.

Proposition 11.33. Let $A$ be a commutative Banach algebra. The Gelfand transform is an algebra homomorphism from $A$ into $C_0(\sigma(A))$ (or $C(\sigma(A))$ if $A$ has a unit) so that $(f_1 f_2)^o = f_1^o f_2^o$ for all $f_1, f_2 \in A$. Moreover, it satisfies $\|f^o\|_\infty \leq \|f\|$ for all $f \in A$.

Proof. By definition of the weak* topology, $f^o(\chi) = \chi(f)$ depends continuously on $\chi \in \sigma(A)$ for each $f \in A$. By Lemma 11.24,

\[ |f^o(\chi)| = |\chi(f)| \leq \|\chi\| \|f\| \leq \|f\| \]

for all $\chi \in \sigma(A)$, and so $\|f^o\|_\infty \leq \|f\|$. Moreover, $0 \in A^*$ plays the role of infinity in the one-point compactification of $\sigma(A)$ in Corollary 11.29, and $\chi \mapsto \chi(f)$ extends continuously to $\sigma(A) \cup \{0\}$ with value $0$ at the point $0$. This gives $f^o \in C_0(\sigma(A))$, as required. Finally, $f_1, f_2 \in A$ and $\chi \in \sigma(A)$ implies that

\[ (f_1 f_2)^o(\chi) = \chi(f_1 f_2) = \chi(f_1)\chi(f_2) = f_1^o(\chi) f_2^o(\chi), \]

which shows that the Gelfand transform is an algebra homomorphism from $A$ into $C_0(\sigma(A))$. □

11.3.4 The Gelfand Transform for Commutative C*-algebras

The Gelfand transform has good additional properties for $C^*$-algebras.

Corollary 11.34. Let $A$ be a commutative unital $C^*$-algebra. Then the Gelfand transform is an isometric algebra isomorphism from $A$ onto $C(\sigma(A))$ satisfying $(a^*)^o = \overline{a^o}$ for all $a \in A$.

Proof. From Proposition 11.33 we know that the Gelfand transform is an algebra homomorphism. For $a \in A$ the norm $\|a^o\|_\infty$ of the Gelfand transform equals the spectral radius of $a$ (see Theorem 11.23). By Proposition 11.21 we get $\|a^o\|_\infty = \|a\|$, since in a commutative $C^*$-algebra every element $a \in A$ is normal. This shows that

\[ (\cdot)^o : A \longrightarrow C(\sigma(A)) \]

is an isometric algebra homomorphism between $A$ and a complete sub-algebra of $C(\sigma(A))$ which contains the unit since $1_A^o = 1$, and separates points since $\chi_1 \neq \chi_2 \in \sigma(A)$ implies that there exists some $a \in A$ with

\[ a^o(\chi_1) = \chi_1(a) \neq \chi_2(a) = a^o(\chi_2). \]

Since $\sigma(A)$ is a compact space we can apply the Stone–Weierstrass theorem (Theorem 2.40) provided that we can also show that the image is closed under conjugation. Once this is done, we can conclude that the image of the Gelfand transform is both dense in $C(\sigma(A))$ and complete, and therefore must be all of $C(\sigma(A))$.


To show closure under conjugation, it suffices to prove that $\chi(a^*) = \overline{\chi(a)}$ for all $a \in A$ and $\chi \in \sigma(A)$, which also implies that $(a^*)^o = \overline{a^o}$ for all $a \in A$, as claimed in the corollary. This in turn follows if we know that $a = a^* \in A$ implies that $\chi(a) \in \mathbb{R}$ for every $\chi \in \sigma(A)$. Indeed, any $a \in A$ can be written as

\[ a = \frac{a + a^*}{2} + i\,\frac{a - a^*}{2i} = a_\Re + i a_\Im, \]

where both $a_\Re = \frac{a + a^*}{2}$ and $a_\Im = \frac{a - a^*}{2i}$ are self-adjoint. Assuming that $\chi(a_\Re)$ and $\chi(a_\Im)$ are real, we deduce that

\[ \chi(a^*) = \chi(a_\Re) - i\chi(a_\Im) = \overline{\chi(a_\Re) + i\chi(a_\Im)} = \overline{\chi(a_\Re + i a_\Im)} = \overline{\chi(a)}. \]

The following lemma then finishes the proof of the corollary. □



Lemma 11.35. Let $a = a^* \in A$ be a self-adjoint element of a unital $C^*$-algebra. Then $\sigma(a) \subseteq \mathbb{R}$.

As we will see in the course of the proof, this can be deduced from Proposition 11.11. This might be a little confusing initially. How can a property like $\max_{\lambda \in \sigma(a)} |\lambda| \leq \|a\|$ imply that $\sigma(a)$ is real? One way of viewing the situation is to apply a vertical translation to the set $\sigma(a)$, as illustrated in Figure 11.1.

Fig. 11.1: Many possible $\lambda \in \mathbb{C}$ that satisfy the constraint $|\lambda| \leq \|a\|$ might not satisfy the constraint $|\lambda - iy| \leq \|a - iy1_A\|$ if the norm of $a - iy1_A$ for $y \in \mathbb{R}$ is $\sqrt{\|a\|^2 + |y|^2}$ as Figure 11.1 suggests. Taking $y \to \infty$ and $y \to -\infty$ shows that the spectrum $\sigma(a)$ is a subset of $\mathbb{R}$.


Proof of Lemma 11.35. By Proposition 11.11 we know that the spectral radius of $a - iy1_A$ is at most $\|a - iy1_A\|$. Let $\lambda \in \sigma(a)$ and $y \in \mathbb{R}$. Then $\lambda - iy \in \sigma(a - iy1_A)$, and

\[ \begin{aligned} |\lambda - iy|^2 \leq \|a - iy1_A\|^2 &= \|(a - iy1_A)^*(a - iy1_A)\| = \|(a + iy1_A)(a - iy1_A)\| \\ &= \|a^2 + y^2 1_A\| \leq \|a^2\| + y^2\|1_A\| = \|a\|^2 + y^2, \end{aligned} \]

where we have used the $C^*$-property, the fact that $a$ and $1_A$ are self-adjoint, and the fact that $\|1_A\| = 1$ (see Exercise 11.20 and its hint on p. 581). Writing $\lambda = x_0 + iy_0 \in \mathbb{C}$ with $x_0, y_0 \in \mathbb{R}$, the calculation above gives

\[ x_0^2 + y^2 - 2yy_0 + y_0^2 = x_0^2 + (y - y_0)^2 \leq \|a\|^2 + y^2 \]

for all $y \in \mathbb{R}$. Cancelling $y^2$ leaves $x_0^2 + y_0^2 - 2yy_0 \leq \|a\|^2$ for all $y \in \mathbb{R}$, and the left-hand side is unbounded in $y$ unless $y_0 = 0$. This shows that $y_0 = 0$, and so $\sigma(a) \subseteq \mathbb{R}$, as claimed. □

Exercise 11.36. Let $A$ be a commutative $C^*$-algebra.
(a) Show that the Gelfand transform is an isometry onto $C_0(\sigma(A))$.
(b) Show that $\sigma(A)$ is compact if and only if $A$ is unital.
(c) Assume now that $A$ is not unital. Show that it is possible to define a norm on $A_I = A \oplus \mathbb{C}$ so that $A_I$ is again a $C^*$-algebra. (The norm from Exercise 11.1 may not do this.)
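The final step in the proof of Lemma 11.35 can be made concrete: if $y_0 \neq 0$, one can write down a real $y$ that violates $x_0^2 + (y - y_0)^2 \leq \|a\|^2 + y^2$. The following sketch does this; the function name `violating_y` and the sample values are hypothetical and serve only as an illustration.

```python
def violating_y(x0, y0, norm_a):
    """Return a real y with x0^2 + (y - y0)^2 > norm_a^2 + y^2, or None if y0 == 0.

    After cancelling y^2 the inequality reads x0^2 + y0^2 - 2*y*y0 <= norm_a^2.
    Choosing y with sign opposite to y0 makes -2*y*y0 as large as we like.
    """
    if y0 == 0:
        return None
    # With this choice, -2*y*y0 equals norm_a^2 + 1, which already exceeds norm_a^2.
    return -(norm_a ** 2 + 1) / (2 * y0)

x0, y0, norm_a = 0.3, 0.4, 1.0
y = violating_y(x0, y0, norm_a)
print(y, x0 ** 2 + (y - y0) ** 2, norm_a ** 2 + y ** 2)
```

The same construction works for negative $y_0$, which corresponds to taking $y \to +\infty$ instead of $y \to -\infty$ in Figure 11.1.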

11.4 Locally Compact Abelian Groups

An important special case of the Gelfand transform is given by the following proposition, but we first need a definition. The reader may make this more familiar by assuming that $G = \mathbb{R}^d$ or $G = \mathbb{T}^d$ (cf. Exercise 1.3, which easily generalizes to $\mathbb{T}^d$ and $\mathbb{R}^d$).

Definition 11.37. Let $G$ be a locally compact metrizable abelian group. The dual group, character group, or Pontryagin dual of $G$ is the abelian group

\[ \widehat{G} = \operatorname{Hom}(G, \mathbb{S}^1) = \{\chi : G \to \mathbb{S}^1 \mid \chi \text{ is a continuous homomorphism}\}, \]

where $\mathbb{S}^1$ is the multiplicative unit circle and the group operation is pointwise multiplication.

Proposition 11.38 (Algebra homomorphisms on $L^1(G)$). Let $G$ be a σ-compact locally compact metrizable abelian group, which we equip with a Haar measure $m = m_G$. Then $L^1(G)$ is a separable commutative Banach algebra with respect to the convolution defined by

\[ f_1 * f_2(g) = \int_G f_1(h) f_2(g - h) \, dm(h) \]


for $f_1, f_2 \in L^1(G)$. The Gelfand dual $\sigma(L^1(G))$ of all non-trivial algebra homomorphisms from $L^1(G)$ onto $\mathbb{C}$ is a locally compact σ-compact metrizable space which can be identified with the Pontryagin dual $\widehat{G}$. The Gelfand transform can be identified with the Fourier back transform.

We now explain the two identifications in more detail. If $\chi_G \in \widehat{G}$ is a continuous group homomorphism

\[ \chi_G : G \longrightarrow \mathbb{S}^1 = \{z \in \mathbb{C} \mid |z| = 1\}, \]

then it gives rise to an algebra homomorphism $\chi_A$ on $A = L^1(G)$ defined by

\[ \chi_A(f) = \int_G f(g) \chi_G(g) \, dm(g), \]

which is well-defined since $f \in L^1(G)$ and $\chi_G \in L^\infty(G)$. The first identification claimed is the statement that every algebra homomorphism on $L^1(G)$ has this shape. This also explains the second identification as follows. The Fourier back transform of an element $f \in L^1(G)$ is the function $\check{f}$ on $\widehat{G}$ defined by

\[ \check{f}(\chi_G) = \int_G f(g) \chi_G(g) \, dm(g) \]

for $\chi_G \in \widehat{G}$. Since we identify the Pontryagin dual $\widehat{G}$ with the Gelfand dual $\sigma(L^1(G))$, we see that every character $\chi_G \in \widehat{G}$ corresponds precisely to one $\chi_A \in \sigma(A)$ and vice versa, and that

\[ \check{f}(\chi_G) = \chi_A(f) = f^o(\chi_A) \]

is the Fourier back transform and is at the same time also the Gelfand transform.

Proof of Proposition 11.38. By Proposition 3.91, $L^1(G)$ is a separable Banach algebra. By Exercise 3.92 (see also Lemma 3.59(1)), $L^1(G)$ is commutative. The proof that every continuous group homomorphism $\chi_G : G \to \mathbb{S}^1$ gives rise to an algebra homomorphism

\[ \chi_A : L^1(G) \ni f \longmapsto \int_G f \chi_G \, dm \]

is very similar to the proof for the case $G = \mathbb{R}^d$ in Proposition 9.31 and is therefore left to the reader. Corollary 11.29 shows that $\sigma(L^1(G))$ is locally compact in the weak* topology. The claimed metrizability follows from Proposition 8.11 since $L^1(G)$ is separable. Finally, the σ-compactness follows since $\sigma(L^1(G)) \cup \{0\}$ is also compact and metrizable by Corollary 11.29 and Proposition 8.11.

The main claim of the proposition is therefore that every non-trivial algebra homomorphism $\chi_A : L^1(G) \to \mathbb{C}$ arises from some continuous group homomorphism $\chi_G : G \to \mathbb{S}^1$. So let $\chi_A \in \sigma(L^1(G))$. Then by Lemma 11.24 we have $\|\chi_A\| \leq 1$. By Proposition 7.34 there is an element $\chi \in L^\infty(G)$ with $\|\chi\|_\infty \leq 1$ such that

\[ \chi_A(f) = \int_G f \chi \, dm \]

for all $f \in L^1(G)$. We have to show that $\chi$ can be chosen in $C_b(G)$ and with the property that $\chi(g + h) = \chi(g)\chi(h)$ for all $g, h \in G$ (which will also imply that $\chi(g) \in \mathbb{S}^1$ for all $g \in G$). For this proof we apply the algebra homomorphism property of the map $\chi_A \in \sigma(L^1(G))$ for $f, f_0 \in L^1(G)$ and obtain together with Fubini's theorem that

\[ \begin{aligned} \int_G f(h) \bigl(\chi(h)\chi_A(f_0)\bigr) \, dm(h) &= \chi_A(f)\chi_A(f_0) = \chi_A(f * f_0) \\ &= \int_G \int_G f(h) f_0(g - h) \, dm(h) \, \chi(g) \, dm(g) \\ &= \int_G f(h) \int_G f_0(g - h) \chi(g) \, dm(g) \, dm(h). \end{aligned} \]

As this holds for any fixed $f_0$ and for all $f \in L^1(G)$, the uniqueness property in Proposition 7.34 implies that

\[ \chi(h)\chi_A(f_0) = \int_G f_0(g - h)\chi(g) \, dm(g) = \chi_A(f_0^h) \tag{11.6} \]

for almost every $h \in G$, where we write $f_0^h(g) = f_0(g - h)$ for $g, h \in G$ as usual. We now fix some $f_1 \in L^1(G)$ such that $\chi_A(f_1) \neq 0$ and define

\[ \chi_G(h) = \chi_A(f_1)^{-1} \chi_A(f_1^h) \]

for $h \in G$, so that $\chi_G = \chi$ almost everywhere and thus $\chi_A(f) = \int_G f \chi_G \, dm$ for all $f \in L^1(G)$. Now note that $\chi_A(f_0^h)$ depends continuously on $h \in G$ for any $f_0 \in L^1(G)$ by the continuity claim in Lemma 3.74. Therefore $\chi_G$ is continuous and we may replace $\chi$ by its continuous representative $\chi_G$, which implies in turn that (11.6) holds for $\chi_G$ in fact for all $h \in G$ (as both sides of the equation are now continuous with respect to $h \in G$). Applying the definition of $\chi_G$ and this version of (11.6) for $f_0 = f_1^{g_2}$ and $h = g_1$ we obtain

\[ \chi_G(g_1 + g_2) = \chi_A(f_1)^{-1} \chi_A(f_1^{g_1 + g_2}) = \chi_G(g_1) \chi_A(f_1)^{-1} \chi_A(f_1^{g_2}) = \chi_G(g_1)\chi_G(g_2) \]

for $g_1, g_2 \in G$. In other words, $\chi_G : G \to \mathbb{C}$ is a continuous homomorphism to the multiplicative structure of $\mathbb{C}$. Since $\chi_G$ is bounded and not identically zero, it follows that $\chi_G$ is non-zero everywhere, and that $\chi_G$ takes values in $\mathbb{S}^1$. This shows that $\chi_A$ is defined by $\chi_G \in \widehat{G}$. The identification between the Fourier back transform and the Gelfand transform follows from this, as explained before the proof. □

The next example shows that the Fourier transform (or in general the Gelfand transform) is not an isometry.

Example 11.39. Let $G = \mathbb{R}$ and $f_1 = 1_{[0,1]} \in L^1(\mathbb{R})$. Then

\[ \widehat{f_1}(t) = \begin{cases} \dfrac{e^{-2\pi i t} - 1}{-2\pi i t} = e^{-\pi i t}\, \dfrac{e^{\pi i t} - e^{-\pi i t}}{2\pi i t} = e^{-\pi i t}\, \dfrac{\sin(\pi t)}{\pi t} & \text{for } t \neq 0, \\ 1 & \text{for } t = 0 \end{cases} \]

so $\|\widehat{f_1}\|_\infty = 1 = \|f_1\|_1$, but the maximum value of $|\widehat{f_1}(t)|$ is attained precisely at the point $t = 0$. Now consider

\[ f_2(x) = 1_{[0,1]}(x) - 1_{[-1,0]}(x) = f_1(x) - f_1(-x) \]

with

\[ \widehat{f_2}(t) = \widehat{f_1}(t) - \widehat{f_1}(-t) \]

for $t \in \mathbb{R}$ and $\widehat{f_2}(0) = 0$. Hence $|\widehat{f_2}(t)|$ achieves its maximum for some $t_0 \neq 0$, so that

\[ \|\widehat{f_2}\|_\infty = |\widehat{f_2}(t_0)| \leq |\widehat{f_1}(t_0)| + |\widehat{f_1}(-t_0)| < 2\|f_1\|_1 = \|f_2\|_1, \]

showing that the Fourier transform (and hence a Gelfand transform) need not be an isometry.

Exercise 11.40. Let $G$ be as in Proposition 11.38. When does $L^1(G)$ have a unit with respect to convolution?
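The strict inequality in Example 11.39 can be seen numerically. The sketch below evaluates the closed form for $\widehat{f_1}$ on a grid and compares the resulting suprema with $\|f_1\|_1 = 1$ and $\|f_2\|_1 = 2$. The grid, its range $[-20, 20]$ and the function names are assumptions made for illustration; outside this range $|\widehat{f_2}(t)| \leq 2/(\pi|t|)$ is already small, so the grid maximum is a reasonable proxy for the supremum.

```python
import cmath
import math

def f1_hat(t):
    """Closed form from Example 11.39 for the Fourier transform of the indicator of [0, 1]."""
    if t == 0:
        return 1.0
    return (cmath.exp(-2j * math.pi * t) - 1) / (-2j * math.pi * t)

def f2_hat(t):
    """Transform of f2(x) = f1(x) - f1(-x)."""
    return f1_hat(t) - f1_hat(-t)

grid = [k / 1000 for k in range(-20000, 20001)]
sup_f1 = max(abs(f1_hat(t)) for t in grid)
sup_f2 = max(abs(f2_hat(t)) for t in grid)

print(sup_f1)  # equals ||f1||_1 = 1, attained at t = 0
print(sup_f2)  # strictly smaller than ||f2||_1 = 2
```

Simplifying $\widehat{f_2}(t) = -2i \sin^2(\pi t)/(\pi t)$ shows why: the factor $\sin^2(\pi t)$ vanishes at $t = 0$, where $1/(\pi t)$ would be largest.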

As the following exercise shows, the theory developed above is quite powerful. In fact the original proof of the Wiener lemma [114] was complicated and the Gelfand theory allows for a clean simple proof.

Exercise 11.41 (Wiener lemma for $C(\mathbb{T}^d)$). Let $f \in C(\mathbb{T}^d)$ be the limit of an absolutely convergent Fourier series with $f(x) \neq 0$ for all $x \in \mathbb{T}^d$. Show that $\frac{1}{f}$ is also the limit of an absolutely convergent Fourier series.

Exercise 11.42. (a) (Wiener theorem for $L^1(\mathbb{T}^d)$) Let $f \in L^1(\mathbb{T}^d)$. Show that the span $\langle \lambda_y f \mid y \in \mathbb{Z}^d \rangle$ is dense in $L^1(\mathbb{T}^d)$ if and only if $\widehat{f}(n) = \int f(x)\chi_n(x) \, dx \neq 0$ for all $n \in \mathbb{Z}^d$.
(b) (Wiener theorem for $L^1(\mathbb{R}^d)$) Let $f \in L^1(\mathbb{R}^d)$. Show that $\langle \lambda_y f \mid y \in \mathbb{R}^d \rangle$ is dense in $L^1(\mathbb{R}^d)$ if and only if $\widehat{f}(t) \neq 0$ for all $t \in \mathbb{R}^d$.
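A finite analogue shows the mechanism behind Proposition 11.38 and Theorem 11.23 without any analysis. The sketch below assumes $G = \mathbb{Z}/N$ with counting measure as Haar measure, so that $L^1(G)$ is $\mathbb{C}^N$ with cyclic convolution and unit $\delta_0$. The characters are $\chi_k(g) = e^{2\pi i k g/N}$, and an element is invertible under convolution exactly when no character vanishes on it. The names `convolve`, `character` and `convolution_inverse` and the choice $N = 6$ are illustrative assumptions.

```python
import cmath

N = 6  # order of the cyclic group Z/N; an arbitrary choice for illustration

def convolve(f1, f2):
    """Cyclic convolution (f1 * f2)(g) = sum_h f1(h) f2(g - h) on Z/N."""
    return [sum(f1[h] * f2[(g - h) % N] for h in range(N)) for g in range(N)]

def character(k, f):
    """chi_k(f) = sum_g f(g) exp(2 pi i k g / N), the Fourier back transform at chi_k."""
    return sum(f[g] * cmath.exp(2j * cmath.pi * k * g / N) for g in range(N))

def convolution_inverse(f, tol=1e-12):
    """Return u with f * u = delta_0, or None if some character vanishes on f."""
    values = [character(k, f) for k in range(N)]
    if any(abs(z) < tol for z in values):
        return None
    # Invert the transform of k -> 1 / chi_k(f).
    return [sum(cmath.exp(-2j * cmath.pi * k * g / N) / values[k] for k in range(N)) / N
            for g in range(N)]

f = [2, 1, 0, 0, 0, 0]  # chi_k(f) = 2 + exp(2 pi i k / 6) never vanishes
u = convolution_inverse(f)
print([round(abs(z), 9) for z in convolve(f, u)])  # delta_0 up to rounding

g = [1, 1, 0, 0, 0, 0]  # chi_3(g) = 1 + exp(pi i) = 0
print(convolution_inverse(g))  # None: g is not invertible
```

In this finite setting the characters of the algebra are exactly the group characters, and $\sigma(f) = \{\chi_k(f) \mid 0 \leq k < N\}$.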

11.4.1 The Pontryagin Dual

We now show that $\widehat{G}$ is again a topological group.

Proposition 11.43 ($\widehat{G}$ is a topological group). Let $G$ be a σ-compact locally compact metrizable abelian group and let $\widehat{G}$ be the dual group equipped with the weak* topology from the identification with $\sigma(L^1(G))$ in Proposition 11.38. Then $\widehat{G}$ is also a locally compact σ-compact metrizable abelian group. The weak* topology on $\widehat{G}$ is the topology of uniform convergence on compact sets, that is equivalent to the topology defined by the neighbourhoods

\[ U_{K,\varepsilon}(\chi_0) = \{\chi \in \widehat{G} \mid \|\chi - \chi_0\|_{K,\infty} < \varepsilon\} \]

of $\chi_0 \in \widehat{G}$ for compact sets $K \subseteq G$ and $\varepsilon > 0$.

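Proof. By Proposition 2.51, $C_c(G)$ is dense in $L^1(G)$ which, together with Proposition 8.11, shows that the weak* topology on

\[ \widehat{G} = \sigma(L^1(G)) \subseteq \overline{B_1^{L^1(G)^*}} \]

can be defined by functions in $C_c(G)$. So let $N_{f_1,\dots,f_n;\varepsilon}(\chi_0)$ be a neighbourhood of $\chi_0 \in \widehat{G}$ defined by some functions $f_1, \dots, f_n \in C_c(G) \setminus \{0\}$ and $\varepsilon > 0$. Let $M = \max_{j=1,\dots,n} \|f_j\|_1$ and $K = \bigcup_{j=1}^{n} \operatorname{Supp}(f_j)$. If now $\chi \in U_{K,\varepsilon/M}(\chi_0)$, then

\[ \left| \int_G f_j(\chi - \chi_0) \, dm \right| \leq \int_K |f_j| |\chi - \chi_0| \, dm < \|f_j\|_1 \frac{\varepsilon}{M} \leq \varepsilon \]

for $j = 1, \dots, n$. This shows that $U_{K,\varepsilon/M}(\chi_0) \subseteq N_{f_1,\dots,f_n;\varepsilon}(\chi_0)$, which gives one direction for the equivalence of the two topologies.

For the reverse direction, we fix some $\chi_0 \in \widehat{G}$, a compact subset $K \subseteq G$, and $\varepsilon > 0$. We need to find some $f_0, f_1, \dots, f_n \in L^1(G)$ and $\delta > 0$ so that

\[ N_{f_0,f_1,\dots,f_n;\delta}(\chi_0) \subseteq U_{K,\varepsilon}(\chi_0). \tag{11.7} \]

We start by defining $f_0 = \frac{1}{m(B_0)} 1_{B_0}$ for some compact neighbourhood $B_0$ of $0 \in G$ such that

\[ \left| \frac{1}{m(B_0)} \int_{B_0} \chi_0 \, dm - 1 \right| \leq \frac{1}{3}. \]

We will now use a similar argument as in the proof of Proposition 11.38. In fact, if $\chi \in N_{f_0;\frac{1}{3}}(\chi_0)$ then $|\check{f_0}(\chi) - 1| \leq \frac{2}{3}$ and so $|\check{f_0}(\chi)| \geq \frac{1}{3}$. Moreover, using the relation

\[ (f_0^h)^{\vee}(\chi) = \int_G f_0(g - h)\chi(g) \, dm(g) = \chi(h)\check{f_0}(\chi), \]

we see that

\[ \chi(h) = \bigl(\check{f_0}(\chi)\bigr)^{-1} \int_G f_0^h \chi \, dm \]

for every $h \in G$. This also gives

\[ |\chi(h) - \chi(0)| \leq 3\|f_0^h - f_0\|_1 \tag{11.8} \]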


for all $\chi \in N_{f_0;\frac{1}{3}}(\chi_0)$ and $h \in G$. In other words, the equi-continuity properties of all elements $\chi \in N_{f_0;\frac{1}{3}}(\chi_0)$ at $0$ are controlled by the continuity of the map $G \ni h \mapsto f_0^h \in L^1(G)$. We set $\delta = \min\{\frac{\varepsilon}{5}, \frac{1}{3}\}$. Using (11.8) and Lemma 3.74 we find some open neighbourhood $B \subseteq G$ of $0 \in G$ with compact closure such that $h \in B$, $g_0 \in G$, and $\chi \in N_{f_0;\frac{1}{3}}(\chi_0)$ implies

\[ |\chi(g_0 + h) - \chi(g_0)| = |\chi(h) - \chi(0)| < \delta, \tag{11.9} \]

where we used the fact that $\chi \in \widehat{G}$ is a character. Since $K \subseteq G$ is compact, there exists a finite collection $g_1, \dots, g_n \in K$ such that

\[ K \subseteq \bigcup_{j=1}^{n} (g_j + B). \tag{11.10} \]

We define $f_j = \frac{1}{m(B)} 1_{g_j + B}$ for $j = 1, \dots, n$ and claim that $N_{f_0,f_1,\dots,f_n;\delta}(\chi_0)$ is the neighbourhood we were looking for. Indeed, let

\[ \chi \in N_{f_0,f_1,\dots,f_n;\delta}(\chi_0) \subseteq N_{f_0;\frac{1}{3}}(\chi_0) \]

and fix some $j \in \{1, \dots, n\}$. Using (11.9) for $g_0 = g_j$ and $g = g_j + h \in g_j + B$ we obtain $|\chi(g) - \chi(g_j)| < \delta$ and hence also

\[ \bigl| \check{f_j}(\chi) - \chi(g_j) \bigr| = \left| \int f_j \chi \, dm - \chi(g_j) \right| < \delta. \tag{11.11} \]

For any $g \in g_j + B$ we can now combine (11.9) and (11.11) for $\chi$ and $\chi_0$, and the assumption $\chi \in N_{f_j;\delta}(\chi_0)$ to obtain

\[ \begin{aligned} |\chi(g) - \chi_0(g)| &\leq \delta + |\chi(g_j) - \chi_0(g_j)| + \delta \\ &\leq \bigl|\chi(g_j) - \check{f_j}(\chi)\bigr| + \bigl|\check{f_j}(\chi) - \check{f_j}(\chi_0)\bigr| + \bigl|\check{f_j}(\chi_0) - \chi_0(g_j)\bigr| + 2\delta \\ &\leq \bigl|\check{f_j}(\chi) - \check{f_j}(\chi_0)\bigr| + 4\delta < 5\delta \leq \varepsilon \end{aligned} \]

for all $g \in g_j + B$. Varying $j$ and using (11.10) this implies (11.7). Thus, the neighbourhoods of $\chi_0 \in \widehat{G}$ in the weak* topology are precisely the neighbourhoods of $\chi_0$ with respect to the topology of uniform convergence on compact subsets of $G$.

With this identification of the topology on $\widehat{G}$, it is straightforward to check that the group operations are continuous. Indeed,

\[ \overline{U_{K,\varepsilon}(\chi_0)} = U_{K,\varepsilon}(\overline{\chi_0}) \]

for all compact $K \subseteq G$, all $\varepsilon > 0$, and $\chi_0 \in \widehat{G}$ shows that the map

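\[ \widehat{G} \ni \chi \longmapsto \chi^{-1} = \overline{\chi} \]

is continuous. Similarly, for $\chi_0, \eta_0 \in \widehat{G}$ we therefore know that $\chi \in U_{K,\varepsilon/2}(\chi_0)$ and $\eta \in U_{K,\varepsilon/2}(\eta_0)$ imply

\[ \|\chi\eta - \chi_0\eta_0\|_{K,\infty} \leq \|\chi\eta - \chi_0\eta\|_{K,\infty} + \|\chi_0\eta - \chi_0\eta_0\|_{K,\infty} \]

[…]

[…] elements, meaning that for any $\chi \in \widehat{G}$ […] 1 implies that $\chi = 1$ is the trivial character.
(b) Suppose that $G$ is compact and not connected as a topological space. Show that $\widehat{G}$ has torsion elements: there is some $\chi \in \widehat{G} \setminus \{1\}$ and some $n \geq 1$ such that $\chi^n = 1$.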

11.5 Further Topics

• The study of $L^1(G)$ for a locally compact abelian group can lead to a vast generalization of the theory of Fourier series and the Fourier transform to all such groups. This is known as Pontryagin duality or harmonic analysis on locally compact abelian groups and will be discussed further in Section 12.8.
• Another important class of Banach algebras with additional structure are the von Neumann algebras. These are special $C^*$-sub-algebras of $B(H)$ for a Hilbert space $H$. We refer to Blackadar [11] for an overview.

Chapter 12

Spectral Theory and Functional Calculus

In this chapter we use results from Chapter 11 to prove the spectral theorem and develop the functional calculus for single self-adjoint operators and for certain commutative $C^*$-algebras (arising, for example, from unitary representations of locally compact abelian groups). As an example of a self-adjoint operator we discuss the Laplace operator on a regular tree.

© Springer International Publishing AG 2017
M. Einsiedler, T. Ward, Functional Analysis, Spectral Theory, and Applications, Graduate Texts in Mathematics 276, DOI 10.1007/978-3-319-58540-6_12

12.1 Definitions and Basic Lemmas

In this section we will study the spectrum (as defined for abstract algebras in Section 11.1) in the context of bounded operators on a Hilbert space. More precisely, we fix a complex Hilbert space $H$, let $A = B(H)$ be the Banach algebra of bounded operators, and study the spectrum of some $T \in A$.

12.1.1 Decomposing the Spectrum

Since an operator with non-trivial kernel cannot be invertible, it is clear that any eigenvalue of $T \in B(H)$ belongs to the spectrum of $T$. It is usual to call the set of eigenvalues the discrete or point spectrum.

Definition 12.1 (Discrete spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the discrete spectrum of $T \in B(H)$, and write $\lambda \in \sigma_{\mathrm{disc}}(T)$, if $\ker(T - \lambda I) \neq \{0\}$.

As we have already seen in Exercise 6.25 and Example 9.1, the discrete spectrum may well be empty for a given bounded operator. For the operators in these examples the notion of eigenvector has to be generalized to a sequence of approximate eigenvectors in the following sense.

Definition 12.2 (Approximate point spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the approximate point spectrum of $T \in B(H)$, and write $\lambda \in \sigma_{\mathrm{appt}}(T)$, if there is a sequence of approximate eigenvectors $(v_n)$ in $H$ with $\|v_n\| = 1$ for all $n \geq 1$, and with $\|(T - \lambda I)v_n\| \to 0$ as $n \to \infty$.

For normal operators we will see that the approximate point spectrum coincides with the whole spectrum. We now try to describe the non-discrete part of the spectrum further.

Definition 12.3 (Approximate spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the approximate spectrum of $T \in B(H)$, and write $\lambda \in \sigma_{\mathrm{approx}}(T)$, if there is a sequence of approximate eigenvectors $(v_n)$ with $v_n \in (\ker(T - \lambda I))^\perp$ and $\|v_n\| = 1$ for all $n \geq 1$, and with $\|(T - \lambda I)v_n\| \to 0$ as $n \to \infty$.

Definition 12.4 (Continuous spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the continuous spectrum of $T \in B(H)$ if $\lambda \in \sigma_{\mathrm{cont}}(T) = \sigma_{\mathrm{appt}}(T) \setminus \sigma_{\mathrm{disc}}(T)$.

We note that the notion of continuous spectrum is quite standard, that of approximate spectrum less so. These two parts of the spectrum are similar, and should both be thought of as a 'complement' to $\sigma_{\mathrm{disc}}(T)$ inside $\sigma_{\mathrm{appt}}(T)$. The advantage of $\sigma_{\mathrm{approx}}(T)$ over $\sigma_{\mathrm{cont}}(T)$ is discussed in the exercises below. Note also that

\[ \sigma_{\mathrm{appt}}(T) = \sigma_{\mathrm{disc}}(T) \sqcup \sigma_{\mathrm{cont}}(T) = \sigma_{\mathrm{disc}}(T) \cup \sigma_{\mathrm{approx}}(T). \]

In general the approximate point spectrum may not yet describe the whole spectrum, which motivates the next definition.

Definition 12.5 (Residual spectrum). We say that $\lambda \in \mathbb{C}$ belongs to the residual spectrum of $T \in B(H)$, and write $\lambda \in \sigma_{\mathrm{resid}}(T)$, if $\lambda \notin \sigma_{\mathrm{disc}}(T)$ and $\overline{\operatorname{im}(T - \lambda I)} \neq H$.

Exercise 12.6. (a) Show that $\sigma_{\mathrm{appt}}(T)$ is a closed subset of $\mathbb{C}$ for any $T \in B(H)$, and that $\sigma_{\mathrm{approx}}(T)$ is a closed subset of $\mathbb{C}$ for any normal operator $T \in B(H)$.
(b) Let $H = L^2([0,1])^2$ and define $T \in B(H)$ by $T(f, g) = (M_I f, f)$ for all $(f, g) \in H$ (so that $T(f, g)(x) = (xf(x), f(x))$ for $x \in (0,1)$). Show that $\sigma_{\mathrm{approx}}(T)$ is equal to $(0,1]$, and in particular is not closed.
(c) Find an example of an operator $T \in B(H)$ for which $\sigma_{\mathrm{disc}}(T)$ and $\sigma_{\mathrm{cont}}(T)$ are not closed subsets of $\mathbb{C}$.
More specifically, find an example of a self-adjoint operator for which $\sigma_{\mathrm{disc}}(T)$ is countable and dense in $\sigma_{\mathrm{approx}}(T) = \sigma_{\mathrm{appt}}(T) = [0,1]$.

Exercise 12.7. Suppose $T_j \in B(H_j)$ for $j = 1, 2$ are bounded operators on two Hilbert spaces $H_1, H_2$. Let $T = T_1 \times T_2 \in B(H_1 \times H_2)$. Show that $\sigma_{\mathrm{disc}}(T) = \sigma_{\mathrm{disc}}(T_1) \cup \sigma_{\mathrm{disc}}(T_2)$, and similarly for $\sigma_{\mathrm{appt}}$ and $\sigma_{\mathrm{approx}}$. Find an example of a pair of self-adjoint operators showing that the corresponding statement does not hold for the continuous spectrum.
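The difference between an eigenvalue (Definition 12.1) and an approximate eigenvalue (Definition 12.2) can be illustrated with a diagonal operator. The sketch below assumes the operator $D$ on $\ell^2(\mathbb{N})$ given by $(Dv)_n = v_n/n$, which is not taken from the text, and works with finitely supported sequences stored as Python lists. Since $D$ is injective, $0$ is not an eigenvalue, yet the unit vectors $e_n$ satisfy $\|De_n\| = 1/n \to 0$, so they form a sequence of approximate eigenvectors for $\lambda = 0$.

```python
import math

def apply_diagonal(v):
    """(D v)_n = v_n / n for n = 1, 2, ... on a finitely supported sequence v."""
    return [x / n for n, x in enumerate(v, start=1)]

def norm(v):
    """Euclidean (l^2) norm of a finitely supported sequence."""
    return math.sqrt(sum(abs(x) ** 2 for x in v))

def basis_vector(n):
    """The unit vector e_n, stored as a list of length n (trailing zeros omitted)."""
    return [0.0] * (n - 1) + [1.0]

# ||(D - 0*I) e_n|| = 1/n tends to 0 although ||e_n|| = 1 for every n.
residuals = [norm(apply_diagonal(basis_vector(n))) for n in (1, 10, 100, 1000)]
print(residuals)
```

In the terminology above, $0$ belongs to $\sigma_{\mathrm{appt}}(D)$ but not to $\sigma_{\mathrm{disc}}(D)$, so it lies in the continuous spectrum of this particular operator.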

Roughly speaking, for multiplication operators the discrete spectrum corresponds to atoms, and we would expect the continuous spectrum to correspond to the continuous part of the measure, as discussed in the following refinement of Exercise 11.4.


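Exercise 12.8. (a) Let $\mu$ be a compactly supported σ-finite measure on $\mathbb{C}$, and let $(M_I(v))(z) = zv(z)$ for $v \in L^2_\mu(\mathbb{C})$ be the multiplication operator corresponding to the identity map on $\mathbb{C}$. Show that

\[ \begin{aligned} \sigma_{\mathrm{disc}}(M_I) &= \{\lambda \in \mathbb{C} \mid \mu(\{\lambda\}) > 0\}, \\ \sigma_{\mathrm{appt}}(M_I) &= \sigma(M_I) = \operatorname{Supp}(\mu), \\ \sigma_{\mathrm{approx}}(M_I) &= \{\lambda \in \mathbb{C} \mid \mu(U \setminus \{\lambda\}) > 0 \text{ for every neighbourhood } U \text{ of } \lambda\}, \\ \sigma_{\mathrm{cont}}(M_I) &\supseteq \operatorname{Supp}(\mu_{\mathrm{cont}}), \end{aligned} \]

and that $\sigma_{\mathrm{resid}}(M_I)$ is empty (here $\mu_{\mathrm{cont}}$ is the measure determined by the decomposition $\mu = \mu_{\mathrm{cont}} + \mu_{\mathrm{disc}}$, where $\mu_{\mathrm{cont}}$ has no atoms and $\mu_{\mathrm{disc}}$ is purely atomic).
(b) Let $(X, \mathcal{B}, \mu)$ be a σ-finite measure space, and let $g : X \to \mathbb{C}$ be a bounded measurable function. Generalize (a) to the multiplication operator $M_g$ on $L^2_\mu(X)$.
(c) Let $X = [0,1] \subseteq \mathbb{R}$, and let $\lambda_{\mathrm{count}}$ be the counting measure on $\mathbb{Q} \cap [0,1]$ considered as a σ-finite measure on $X$. Let $M_I$ be as in part (a). Describe each of the parts of the spectrum of $M_I$.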

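Example 12.9. Let $T : \ell^2(\mathbb{N}) \to \ell^2(\mathbb{N})$ be the operator from Exercise 6.23(c) defined by $T(v_n) = (0, v_1, v_2, \dots)$. Then $\|Tv\| = \|v\|$ for any $v \in \ell^2(\mathbb{N})$, and so

\[ 0 \notin \sigma_{\mathrm{appt}}(T) = \sigma_{\mathrm{disc}}(T) \cup \sigma_{\mathrm{approx}}(T). \]

However, the image of $T$ is the proper closed subspace $\{v \in \ell^2(\mathbb{N}) \mid v_1 = 0\}$, so $0 \in \sigma_{\mathrm{resid}}(T)$.

Exercise 12.10. For the operator $T$ from Example 12.9, show that $\sigma_{\mathrm{disc}}(T) = \emptyset$, $\sigma_{\mathrm{approx}}(T) = \sigma_{\mathrm{cont}}(T) = \mathbb{S}^1 = \{\lambda \in \mathbb{C} \mid |\lambda| = 1\}$, and $\sigma_{\mathrm{resid}}(T) = B_1^{\mathbb{C}} = \{\lambda \in \mathbb{C} \mid |\lambda| < 1\}$.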

The next lemma gives the main relationship between the parts of the spectrum from this section and the spectrum in the sense of Definition 11.2 for $A = B(H)$.

Lemma 12.11 (Decomposition of spectrum). Let $H$ be a complex Hilbert space, and let $T \in B(H)$. Then
\[
\sigma(T) = \sigma_{\mathrm{appt}}(T) \cup \sigma_{\mathrm{resid}}(T) = \sigma_{\mathrm{disc}}(T) \cup \sigma_{\mathrm{approx}}(T) \cup \sigma_{\mathrm{resid}}(T).
\]
Moreover, the residual spectrum is empty if $T$ is normal, so in this case the spectrum coincides with the approximate point spectrum.

Proof. If $\lambda \notin \sigma(T)$ then, by definition, $(T - \lambda I)^{-1} \in B(H)$ and so $v_n \in H$ with $\|v_n\| = 1$ for all $n \geq 1$ implies
\[
1 = \|v_n\| = \|(T - \lambda I)^{-1}(T - \lambda I)v_n\| \leq \|(T - \lambda I)^{-1}\| \, \|(T - \lambda I)v_n\|.
\]

This shows that $(v_n)$ cannot be a sequence of approximate eigenvectors, and hence $\lambda \notin \sigma_{\mathrm{appt}}(T)$. Finally, if $\lambda \notin \sigma(T)$ then $T - \lambda I$ is an onto map, and so $\lambda \notin \sigma_{\mathrm{resid}}(T)$.

The reverse inclusion can be shown almost as directly. Suppose that $\lambda \notin \sigma_{\mathrm{appt}}(T) \cup \sigma_{\mathrm{resid}}(T)$. Then $T - \lambda I$ is injective, since in particular $\lambda \notin \sigma_{\mathrm{disc}}(T)$, and there exists some $\varepsilon > 0$ with
\[
\varepsilon \|v\| \leq \|(T - \lambda I)v\| \tag{12.1}
\]
for all $v \in H$ since $\lambda \notin \sigma_{\mathrm{approx}}(T)$. Therefore $(T - \lambda I) : H \to \operatorname{im}(T - \lambda I)$ is bijective and has an inverse $(T - \lambda I)^{-1} : \operatorname{im}(T - \lambda I) \to H$ that is continuous by (12.1). This implies that $\operatorname{im}(T - \lambda I)$ is complete (check this), and so is a closed subspace of $H$. Since $\lambda \notin \sigma_{\mathrm{resid}}(T)$, it follows that $\operatorname{im}(T - \lambda I) = H$ and so $T - \lambda I$ is invertible, so that $\lambda \notin \sigma(T)$.

Suppose now that $T : H \to H$ is a normal operator, and that $\lambda \in \mathbb{C}$ has $V = \overline{\operatorname{im}(T - \lambda I)} \neq H$. By normality,
\[
T^*(T - \lambda I)v = (T - \lambda I)(T^* v)
\]
for $v \in H$, which implies in particular that $V$ is $T^*$-invariant. By Lemma 6.30 we deduce that $V^{\perp}$ is $T$-invariant. Now let $v \in V^{\perp} \setminus \{0\}$. Then $(T - \lambda I)v \in V^{\perp}$ by $T$-invariance, and $(T - \lambda I)v \in V$ by definition. This implies that $(T - \lambda I)v = 0$, and so $\lambda \in \sigma_{\mathrm{disc}}(T)$. It follows that $\sigma_{\mathrm{resid}}(T) = \emptyset$. □

12.1.2 The Numerical Range

The following definition is useful because it gives an 'upper bound' for the spectrum.

Definition 12.12. The numerical range of $T \in B(H)$ is the set
\[
N(T) = \{\langle Tv, v\rangle \mid v \in H, \|v\| = 1\}.
\]

Lemma 12.13. The spectrum of $T \in B(H)$ is contained in the closure of the numerical range of $T$.

Proof. We have to show that $\lambda \in \mathbb{C} \setminus \overline{N(T)}$ implies that $\lambda \notin \sigma(T)$. By assumption,
\[
|\langle (T - \lambda I)v, v\rangle| = |\langle Tv, v\rangle - \lambda| \geq \varepsilon
\]
for some fixed $\varepsilon > 0$ and all $v \in H$ with $\|v\| = 1$. This shows that $\lambda \notin \sigma_{\mathrm{approx}}(T)$, and that any vector $v \in H$ with $\|v\| = 1$ is not orthogonal to $\operatorname{im}(T - \lambda I)$, so $\lambda \notin \sigma_{\mathrm{resid}}(T)$. By Lemma 12.11, we deduce that $\lambda \notin \sigma(T)$. □

Exercise 12.14. Show that $N(T)$ is really only an upper bound for the spectrum of an operator $T \in B(H)$ by showing that $N(T)$ is the convex hull of the eigenvalues of $T$ if $T$ is diagonalizable, that is, if $H$ admits an orthonormal basis consisting of eigenvectors of $T$.
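A small numerical illustration of Lemma 12.13 may be helpful at this point. The following Python sketch is not part of the text: the matrix `A`, the helper names, and the sampling grid are illustrative assumptions. It samples $\langle Av, v\rangle$ over unit vectors $v \in \mathbb{C}^2$ for the nilpotent matrix with rows $(0, 1)$ and $(0, 0)$. Its only eigenvalue is $0$, while the sampled values reach absolute value $\tfrac12$, so $\overline{N(A)}$ can be strictly larger than $\sigma(A)$.

```python
import cmath
import math

def inner(x, y):
    """Inner product on C^2, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def apply(matrix, v):
    """Matrix-vector product for a 2x2 matrix."""
    return [sum(matrix[i][j] * v[j] for j in range(2)) for i in range(2)]

def sample_numerical_range(matrix, steps=60):
    """Values <Av, v> over a grid of unit vectors v = (cos t, e^{is} sin t)."""
    values = []
    for k in range(steps + 1):
        t = (math.pi / 2) * k / steps
        for l in range(steps):
            s = 2 * math.pi * l / steps
            v = [complex(math.cos(t)), cmath.exp(1j * s) * math.sin(t)]
            values.append(inner(apply(matrix, v), v))
    return values

A = [[0, 1], [0, 0]]                   # nilpotent, so sigma(A) = {0}
values = sample_numerical_range(A)
largest = max(abs(z) for z in values)  # close to 1/2, attained at t = pi/4
```

The spectral value $0$ does appear among the samples (take $v = (1, 0)$), in line with the inclusion $\sigma(A) \subseteq \overline{N(A)}$.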

The following is a direct consequence of Lemma 12.13 (giving an easy alternative to the argument used in Lemma 11.35 in the setting of bounded operators on a Hilbert space).

Lemma 12.15. If $T \in B(H)$ is self-adjoint then $\sigma(T) \subseteq \mathbb{R}$.

Proof. For any $v \in H$ we have
\[
\langle Tv, v\rangle = \overline{\langle v, Tv\rangle} = \overline{\langle T^* v, v\rangle} = \overline{\langle Tv, v\rangle}
\]
so that $\langle Tv, v\rangle \in \mathbb{R}$ and hence $\sigma(T) \subseteq \overline{N(T)} \subseteq \mathbb{R}$ by Lemma 12.13. □

12.1.3 The Essential Spectrum

In this section we describe another notion of spectrum through a series of exercises. Recall the definition of the Calkin algebra $B(H)/K(H)$ from Exercise 6.8. The spectrum of an operator $T \in B(H)$ when considered in the Calkin algebra is the essential spectrum of $T$, denoted $\sigma_{\mathrm{ess}}(T)$, and the spectral radius of $T$ in this algebra is the essential radius.

Definition 12.16. A bounded operator $T \in B(H)$ on a separable Hilbert space $H$ is called a Fredholm operator if $T$ is almost injective in the sense that $\dim(\ker(T)) < \infty$, and almost surjective in the sense that $T(H)$ is closed and $\dim(H/T(H)) < \infty$.

Clearly invertible operators are Fredholm, as is the operator in Example 12.9.

Exercise 12.17. Show that $I - A$ is Fredholm for any compact operator $A \in K(H)$ on a separable Hilbert space $H$.

Exercise 12.18 (Atkinson's theorem). Let $T \in B(H)$ be a bounded operator on a separable Hilbert space $H$. Prove that $T$ is Fredholm if and only if there exists some operator $S \in B(H)$ such that $ST - I$ and $TS - I$ are both compact.

Exercise 12.19. Let $(X, \mu)$ be a σ-finite measure space and $g \in L^\infty_\mu(X)$. Show that the essential spectrum $\sigma_{\mathrm{ess}}(M_g)$ of the normal operator $M_g$ is given by
\[
\sigma_{\mathrm{ess}}(M_g) = \sigma_{\mathrm{approx}}(M_g) \cup \{\lambda \in \sigma_{\mathrm{disc}}(M_g) \mid \dim(\ker(M_g - \lambda I)) = \infty\}.
\]

12.2 The Spectrum of a Tree

In this section we want to study the spectrum of the Laplace operator on a $(p+1)$-regular tree (which has strong connections to the properties of the random walk on the tree). Let us recall that a graph is a set of vertices $V$ together with a set of edges $E \subseteq V \times V$. We will assume that the graph is undirected, meaning that $(v, w) \in E$ if and only if $(w, v) \in E$ for all $v, w \in V$. We write $v \sim w$ if there is an edge $(v, w) \in E$ joining $v$ to $w$.

More concretely, we fix an integer $p \geq 2$ (the case $p = 1$ is quite different and much easier, see Exercise 12.21) and suppose that $(V, E)$ is a $(p+1)$-regular tree. This means that $V$ is countably infinite, connected, every vertex $v \in V$ is connected to exactly $(p+1)$ further vertices by edges in $E$, and there are no loops (see Figure 12.1).

Fig. 12.1: The 3-regular tree is illustrated here by showing all vertices of distance no more than 4 from a given initial vertex $v_0$ (also called the root). Of course the pattern repeats indefinitely, from $w$ and from all the other vertices at distance 4 from our chosen root $v_0$.

At first sight there are three natural operators that we can define on $\ell^2(V)$ using the tree structure (and our discussion will also involve a fourth). In the following we fix $p \geq 2$ and a $(p+1)$-regular tree $(V, E)$.

Definition 12.20. The averaging operator on $\ell^2(V)$ is defined by
\[
T(f)(v) = \frac{1}{p+1} \sum_{w \sim v} f(w),
\]
for $f \in \ell^2(V)$. It replaces the value of a function at a vertex $v$ by the average $T(f)(v)$ of all values $f(w)$ at the direct neighbours $w \sim v$ in the tree. The summing operator is defined by $S = (p+1)T$, simply summing the values at the immediate neighbours. Finally, the Laplace operator $\Delta = I - T$ compares the value at each vertex with the average over all its immediate neighbours.

Clearly $T$, $S$, and $\Delta$ are essentially equivalent. If one is understood well, then the same applies to the other two.

Exercise 12.21. Set $p = 1$, so that we may think of the $(p+1)$-regular graph as having vertex set $V = \mathbb{Z}$ and edge set $E = \{(n, n \pm 1) \mid n \in \mathbb{Z}\}$. Show that the summing operator $S$ is self-adjoint, and describe its spectrum.

Exercise 12.22. Show that the summing operator $S : \ell^2(V) \to \ell^2(V)$ on a $(p+1)$-regular tree is a self-adjoint bounded operator with $\|S\| \leq p + 1$. Show that there is no eigenvalue $\lambda \in \sigma_{\mathrm{disc}}(S)$ of absolute value $|\lambda| = p + 1$.

12.2.1 The Correct Upper Bound for the Summing Operator

While it is not difficult to see that $\|S\| \leq p + 1$, one might also guess that this upper bound is not the real value of $\|S\|$. Indeed, the proof of the last statement of Exercise 12.22 already hints at this. Due to the very rapid growth in the number of vertices in balls $B_n^{V}(v_0)$ (measured with respect to the natural path length on the tree), elements of $\ell^2(V)$ must decay rather rapidly. We start by calculating $\|S\|$ and go on to discuss the spectrum of $S$ on $\ell^2(V)$.

Theorem 12.23.(32) Let $p \geq 2$ and let $(V, E)$ be a $(p+1)$-regular tree. The summing operator $S : \ell^2(V) \to \ell^2(V)$ satisfies $\|S\| = 2\sqrt{p} < p + 1$.

Proof of the lower bound in Theorem 12.23. We first show $\|S\| \geq 2\sqrt{p}$ by considering the function $f = f_N$ defined by
\[
f(v) = \begin{cases} p^{-\frac{1}{2} d(v, v_0)} & \text{if } d(v, v_0) \leq N; \\ 0 & \text{if } d(v, v_0) > N, \end{cases}
\]
where $d(\cdot, \cdot)$ denotes the distance function on $V$ (cf. p. 401), $v_0 \in V$ is a fixed initial vertex of $V$, and $N \geq 1$ is an arbitrary integer. First note that
\[
\|f\|_2^2 = 1 + \sum_{v \sim v_0} p^{-1} + \cdots + \sum_{d(v, v_0) = N} p^{-N}
= 1 + (p+1)p^{-1} + \cdots + (p+1)p^{N-1}p^{-N} = 1 + N\left(1 + \tfrac{1}{p}\right). \tag{12.2}
\]
Now calculate
\[
S(f)(v) = p^{-(n-1)/2} + \sum_{\substack{w \sim v \\ d(w, v_0) = n+1}} p^{-(n+1)/2} = \sqrt{p}\, p^{-n/2} + p\, p^{-(n+1)/2} = 2\sqrt{p}\, f(v)
\]
whenever $1 \leq n = d(v, v_0) < N$. This gives
\[
\|S(f)\|_2^2 \geq \sum_{n=1}^{N-1} (2\sqrt{p})^2 \sum_{d(v, v_0) = n} |f(v)|^2 \geq (2\sqrt{p})^2 (N-1)\left(1 + \tfrac{1}{p}\right)
\]
by using the same calculation as for $\|f\|_2^2$ again. On dividing this lower bound by (12.2) and letting $N \to \infty$ we deduce that $\|S\| \geq 2\sqrt{p}$. □
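The convergence of the quotient $\|S f_N\|_2 / \|f_N\|_2$ to $2\sqrt{p}$ can also be observed numerically. The sketch below is an illustration only, under the following assumptions: a radial function (one depending only on the distance $n$ to the root $v_0$) is stored as a list `a` with `a[n]` its value at distance `n`, and all function names are made up for this example. It uses that the sphere of radius $n \geq 1$ has $(p+1)p^{n-1}$ vertices, and that each non-root vertex has one neighbour closer to the root and $p$ neighbours further away.

```python
import math

def sphere_size(p, n):
    """Number of vertices at distance n from the root of the (p+1)-regular tree."""
    return 1 if n == 0 else (p + 1) * p ** (n - 1)

def radial_norm(p, a):
    """l^2 norm of the radial function with value a[n] at distance n."""
    return math.sqrt(sum(sphere_size(p, n) * abs(x) ** 2 for n, x in enumerate(a)))

def summing_operator_radial(p, a):
    """Apply S to a radial function a[0..M] (zero beyond M); returns values at 0..M+1."""
    a = list(a) + [0.0, 0.0]                 # pad so that a[n + 1] always exists
    out = [(p + 1) * a[1]]                   # the root has p + 1 neighbours at distance 1
    for n in range(1, len(a) - 1):
        out.append(a[n - 1] + p * a[n + 1])  # one parent and p children
    return out

def rayleigh_ratio(p, N):
    """The quotient ||S f_N|| / ||f_N|| for f_N(v) = p^(-d(v, v0)/2), d(v, v0) <= N."""
    f = [p ** (-n / 2) for n in range(N + 1)]
    return radial_norm(p, summing_operator_radial(p, f)) / radial_norm(p, f)
```

For instance, `rayleigh_ratio(2, 200)` lies just below $2\sqrt{2}$, and the values increase with `N`; moderate values of `N` are used so that $p^{N}$ stays within floating-point range.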

Exercise 12.24. Show that the sequence $\left(\frac{1}{\|f_N\|} f_N\right)_N$ in the previous proof is a sequence of approximate eigenvectors of $S$ in the sense of Definition 12.2.

For the proof of the upper bound we use an argument that goes back to work of Gabber and Galil [38], which can also be used for other graphs.

Proof of the upper bound in Theorem 12.23. Let $G = (V, E)$ be an undirected graph with the property that every vertex $v \in V$ has at most $N$ neighbours for some fixed $N \in \mathbb{N}$. The summing operator $S$ is again defined by
\[
S(f)(v) = \sum_{w \sim v} f(w)
\]
for $v \in V$. Notice that
\[
\|S(f)\|_2^2 = \sum_{v \in V} \Bigl| \sum_{w \sim v} f(w) \Bigr|^2 \leq \sum_{v \in V} N^2 \max_{w \sim v} |f(w)|^2 \leq N^3 \sum_{w \in V} |f(w)|^2
\]
for all $f \in \ell^2(V)$, so that $\|S\| < \infty$. Given $f_1, f_2 \in \ell^2(V)$ we have
\[
\langle f_1, S f_2 \rangle = \sum_{v \in V} f_1(v) \overline{\sum_{w \sim v} f_2(w)}
= \sum_{v, w :\, w \sim v} f_1(v) \overline{f_2(w)}
= \sum_{w \in V} \sum_{v \sim w} f_1(v) \overline{f_2(w)} = \langle S f_1, f_2 \rangle,
\]
which shows that $S = S^*$ is self-adjoint.

We let $\vec{E}$ denote the set of edges in the directed graph $\vec{G} = (V, \vec{E})$, where we replace each edge in $G$ by two edges going in either direction, so formally $\vec{E} = \{(v, w) \in V \times V \mid v \sim w\}$. Suppose that $\lambda : \vec{E} \to \mathbb{R}_{>0}$ is a function satisfying $\lambda(w, v) = \lambda(v, w)^{-1}$ for each pair of neighbours $(v, w) \in E$, and suppose that
\[
\rho = \sup_{v \in V} \sum_{w \sim v} \lambda(v, w) < \infty.
\]
We claim that this implies $\|S\| \leq \rho$. To prove the claim fix some $f \in \ell^2(V)$. Then for any two neighbours $v \sim w$ in $G$ we have
\[
|f(v)|^2 \lambda(v, w) + |f(w)|^2 \lambda(w, v) \mp \bigl( f(v)\overline{f(w)} + f(w)\overline{f(v)} \bigr)
= \Bigl| f(v)\sqrt{\lambda(v, w)} \mp f(w)\sqrt{\lambda(w, v)} \Bigr|^2 \geq 0
\]
or equivalently
\[
\pm 2 \Re\bigl( f(v)\overline{f(w)} \bigr) \leq |f(v)|^2 \lambda(v, w) + |f(w)|^2 \lambda(w, v).
\]
Summing over all neighbouring vertices we obtain from this
\[
\begin{aligned}
\pm 2 \Re \langle f, S f \rangle &= \pm 2 \Re \sum_{v \in V} f(v) \overline{\sum_{w :\, w \sim v} f(w)} = \pm 2 \Re \sum_{v, w :\, v \sim w} f(v)\overline{f(w)} \\
&\leq \sum_{v} |f(v)|^2 \sum_{w :\, w \sim v} \lambda(v, w) + \sum_{w} |f(w)|^2 \sum_{v :\, v \sim w} \lambda(w, v) \\
&\leq 2\rho \|f\|_2^2.
\end{aligned}
\]

Since $S$ is self-adjoint we have $\langle f, Sf \rangle = \langle Sf, f \rangle \in \mathbb{R}$ and therefore $|\langle Sf, f \rangle| \leq \rho \|f\|_2^2$ for any $f \in \ell^2(V)$. Using Lemma 6.31 this implies that $\|S\| \leq \rho$.

It remains to define $\lambda(v, w)$ in the context of the $(p+1)$-regular tree so that $\rho = 2\sqrt{p}$. We again use a root $v_0 \in V$ and define
\[
\lambda(v, w) = \begin{cases} p^{-1/2} & \text{if } d(w, v_0) = d(v, v_0) + 1; \\ p^{1/2} & \text{if } d(w, v_0) = d(v, v_0) - 1. \end{cases}
\]
With this we obtain
\[
\sum_{w \sim v_0} \lambda(v_0, w) = (p+1)p^{-1/2} = p^{1/2} + p^{-1/2} < 2\sqrt{p}
\]
in the case $v = v_0$ and
\[
\sum_{w \sim v} \lambda(v, w) = p^{1/2} + p\, p^{-1/2} = 2\sqrt{p}
\]
in the case $d(v, v_0) \geq 1$. □
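Since the claim $\|S\| \leq \rho$ was proved for any graph of bounded degree, it also applies to a finite ball in the tree, where the same weights $\lambda(v, w)$ still give $\rho \leq 2\sqrt{p}$ (a boundary vertex only has its parent as neighbour). The sketch below checks this on a small example. It is an illustration under assumptions, not part of the argument: the helper names are made up, and the norm of the symmetric matrix is estimated by a plain power iteration, which only ever produces a lower estimate $\|Sf\|$ with $\|f\| = 1$.

```python
import math

def truncated_tree(p, depth):
    """Adjacency lists of the ball of radius `depth` around the root (vertex 0)."""
    adj = [[]]
    frontier = [0]
    for _level in range(depth):
        new_frontier = []
        for v in frontier:
            n_children = p + 1 if v == 0 else p
            for _child in range(n_children):
                w = len(adj)
                adj.append([v])      # the new vertex w has parent v
                adj[v].append(w)
                new_frontier.append(w)
        frontier = new_frontier
    return adj

def apply_S(adj, f):
    """Summing operator: (Sf)(v) is the sum of f over the neighbours of v."""
    return [sum(f[w] for w in adj[v]) for v in range(len(adj))]

def norm(f):
    return math.sqrt(sum(x * x for x in f))

def norm_estimate(adj, iterations=200):
    """Power iteration; returns ||S f|| for a unit vector f, a lower estimate of ||S||."""
    f = [1.0] * len(adj)
    for _ in range(iterations):
        g = apply_S(adj, f)
        scale = norm(g)
        f = [x / scale for x in g]
    return norm(apply_S(adj, f))

adj = truncated_tree(2, 6)           # ball of radius 6 in the 3-regular tree
estimate = norm_estimate(adj)        # stays below 2 * sqrt(2)
```

The estimate stays below $2\sqrt{2}$, as the bound predicts, while clearly exceeding the value one would get from functions supported on a single edge.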



12.2.2 The Spectrum of S

We outline in this section how to obtain the complete description of the spectrum of $S$.

Proposition 12.25. The spectrum of the summing operator $S$ on a $(p+1)$-regular tree is the interval $[-2\sqrt{p}, 2\sqrt{p}]$.

We will leave the details of the proof as an exercise, explaining just the crucial ideas. Since $S$ is self-adjoint, $\sigma(S) \subseteq \mathbb{R}$ by Lemma 12.15. By Theorem 12.23 we know that $\|S\| = 2\sqrt{p}$, so $\sigma(S) \subseteq [-2\sqrt{p}, 2\sqrt{p}]$. For the reverse inclusion we generalize Exercise 12.24 and give for each $\theta \in [0, \pi]$ a sequence of approximate eigenvectors $(f_N)$ for $\lambda = 2\sqrt{p}\cos\theta$. So we again fix a root vertex $v_0$ and define $f_N$ by
\[
f_N(v) = \begin{cases} \bigl( e^{i\theta} p^{-1/2} \bigr)^{d(v, v_0)} & \text{if } d(v, v_0) \leq N; \\ 0 & \text{otherwise.} \end{cases}
\]

Exercise 12.26. Calculate $S f_N$, and show that $\left(\frac{1}{\|f_N\|} f_N\right)$ is a sequence of approximate eigenvectors of $S$ for $\lambda = 2\sqrt{p}\cos\theta$.

12.2.3 No Eigenvectors on the Tree

We outline in this subsection, via a series of exercises, a proof of the fact that the summing operator $S$ on the $(p+1)$-regular tree has no discrete spectrum. For this proof we will use yet another normalization of the averaging and summing operators. We refer to this as the unitarily normalized summation,
\[
U_1 = \tfrac{1}{\sqrt{p}} S.
\]
In fact we will also need the operators $U_n$ for $n \geq 0$ as defined in the next exercise.

Exercise 12.27. For any $n \geq 0$, let $U_n$ be the operator that maps any function $f$ on a $(p+1)$-regular tree to the function $U_n(f)$ defined by
\[
U_n(f)(v) = \frac{1}{p^{n/2}} \sum_{\substack{k \leq n, \\ k \equiv n \ (\mathrm{mod}\ 2)}} \ \sum_{w \sim_k v} f(w),
\]
where $w \sim_k v$ means that $w$ and $v$ have distance $k$ in the $(p+1)$-regular tree. Then the sequence of operators $(U_n)$ satisfies $U_0 = I$, $U_1 = \frac{1}{\sqrt{p}} S$, and
\[
U_{n+1} = U_1 \circ U_n - U_{n-1}
\]
for $n \geq 1$.

The recurrence relation above is classical.(33)

Definition 12.28. The Chebyshev polynomials of the second kind are the polynomials $U_n \in \mathbb{Z}[x]$ defined recursively by $U_0(x) = 1$, $U_1(x) = 2x$, and
\[
U_{n+1}(x) = 2x U_n(x) - U_{n-1}(x)
\]
for $n \geq 1$.

This sequence of polynomials has the following concrete connection to trigonometric functions.

Exercise 12.29. Let $x = \cos\theta$ for some $\theta \in (0, \pi)$. Show that
\[
U_n(x) = \frac{\sin\bigl((n+1)\theta\bigr)}{\sin\theta} \tag{12.3}
\]
for all $n \geq 1$.

Exercise 12.30. Suppose that $f \in \ell^2(V)$ is an eigenfunction for $U_1$ with corresponding eigenvalue $\lambda \in [-2, 2]$, and derive a contradiction as follows.
(a) Show that
\[
\sum_{w \sim_{2n} v} |f(w)|^2 \geq \frac{1}{2} \Bigl( U_{2n}(\cos\theta) - \frac{1}{p} U_{2n-2}(\cos\theta) \Bigr)^2 |f(v)|^2
\]

for any $n \geq 1$.
(b) Show that it is enough to consider the case $\lambda \geq 0$ so that we may write $\lambda = 2\cos\theta$ for some $\theta \in [0, \frac{\pi}{2}]$.
(c) Assuming $f(v) \neq 0$, show that
\[
\sum_{d(w, v) \leq n} |f(w)|^2 \longrightarrow \infty
\]
as $n \to \infty$, and conclude that $U_1$ (or, equivalently, $S$) has no eigenfunctions in $\ell^2(V)$.
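The recursion in Definition 12.28 and the closed form (12.3) are easy to compare numerically. The following sketch is only a sanity check under illustrative assumptions (the function names and the sample angle are not from the text); it evaluates $U_n(\cos\theta)$ by the recursion and measures the deviation from $\sin((n+1)\theta)/\sin\theta$.

```python
import math

def chebyshev_u(n, x):
    """U_n(x) from U_0 = 1, U_1 = 2x and U_{k+1} = 2x U_k - U_{k-1}."""
    previous, current = 1.0, 2.0 * x
    if n == 0:
        return previous
    for _ in range(n - 1):
        previous, current = current, 2.0 * x * current - previous
    return current

def max_deviation(theta, n_max):
    """Largest gap between the recursion and sin((n+1) theta) / sin(theta), n <= n_max."""
    x = math.cos(theta)
    return max(
        abs(chebyshev_u(n, x) - math.sin((n + 1) * theta) / math.sin(theta))
        for n in range(n_max + 1)
    )
```

With `theta = 0.7` the deviation for `n` up to 20 is at the level of rounding error, consistent with (12.3).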

12.3 Main Goals: The Spectral Theorem and Functional Calculus

The main goal of this chapter is to establish two related theorems about normal operators, the first of which gives a complete classification of normal operators in terms of operators as in the next example (which featured in other forms before).

Example 12.31. Let $H = L^2(X, \mu)$ for a σ-finite measure space $(X, \mu)$, and let $g : X \to \mathbb{C}$ be a bounded measurable function. The multiplication operator $M_g$ is then normal on $H$. We claim that the spectrum $\sigma(M_g)$ is the essential range of $g$, consisting of all $z \in \mathbb{C}$ with the property that $\mu(g^{-1}(U)) > 0$ for any neighbourhood $U$ of $z$.

Note first that we have $M_g - \lambda I = M_{g - \lambda}$. If $X_\lambda = \{x \in X \mid g(x) = \lambda\}$ has positive measure (which clearly implies that $\lambda$ belongs to the essential range), then $\lambda$ lies in $\sigma_{\mathrm{disc}}(M_g)$ since, for example, $\mathbb{1}_B \in \ker(M_g - \lambda I) \setminus \{0\}$ for any measurable $B \subseteq X_\lambda$ of positive finite measure. If on the other hand $\lambda$ lies in $\sigma_{\mathrm{disc}}(M_g)$ and $v \in \ker(M_g - \lambda I) \setminus \{0\}$, then $\mu(\{x \in X \mid v(x) \neq 0\} \setminus X_\lambda) = 0$, so that $\mu(X_\lambda) > 0$.

So suppose now $g(x) \neq \lambda$ almost everywhere. Then we can solve the equation $(M_g - \lambda I)u = v$, formally, for any $v \in L^2(X, \mu)$, by putting $u = (g - \lambda)^{-1} v$, and this is in fact the only solution as a set-theoretic function on $X$. It follows that $\lambda \notin \sigma(M_g)$ if and only if the operator
\[
v \longmapsto (g - \lambda)^{-1} v
\]
is a bounded linear map on $L^2(X, \mu)$. By Corollary 4.30, we know this is equivalent to asking that $(g - \lambda)^{-1}$ be an $L^\infty_\mu$ function on $X$. This translates to the condition that there exists some $C > 0$ such that
\[
\mu(\{x \in X \mid |(g(x) - \lambda)^{-1}| > C\}) = 0,
\]

or equivalently that
\[
\mu(\{x \in X \mid |g(x) - \lambda| < \tfrac{1}{C}\}) = 0,
\]
which means that $\lambda$ is not in the essential range of $g$. This chain of equivalences gives the claim.

It is convenient to observe that the essential range coincides with the support of the push-forward measure $\nu = g_*\mu$ on $\mathbb{C}$ so that the above can be reformulated as
\[
\sigma(M_g) = \operatorname{Supp}(g_*\mu). \tag{12.4}
\]
In particular, if $X$ is a bounded subset of $\mathbb{C}$ and $g(z) = I(z) = z$, then the spectrum of $M_I$ is the support of $\mu$.

Exercise 12.32. Let $(X, \mu)$ be a σ-finite measure space. Show that there exists a finite measure $\nu$ on $X$ with $\nu \ll \mu \ll \nu$ and a unitary isomorphism $\Phi : L^2(X, \mu) \to L^2(X, \nu)$ such that $\Phi \circ M_g = M_g \circ \Phi$ whenever $g : X \to \mathbb{C}$ is measurable and bounded and $M_g$ acts on the spaces $L^2(X, \mu)$ and $L^2(X, \nu)$, respectively.

Theorem 12.33 (Spectral theorem for normal operators). Let $H$ be a separable complex Hilbert space, and let $T \in B(H)$ be a normal operator on $H$. Then there exists a finite measure space $(X, \mu)$, a bounded measurable function $g : X \to \mathbb{C}$, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that
\[
\begin{array}{ccc}
H & \xrightarrow{\ T\ } & H \\
{\scriptstyle \phi} \big\downarrow & & \big\downarrow {\scriptstyle \phi} \\
L^2_\mu(X) & \xrightarrow{\ M_g\ } & L^2_\mu(X)
\end{array}
\]
commutes.

As we will see, we can always choose $X = \sigma(T) \times \mathbb{N}$, which we will identify with the countable disjoint union $\bigsqcup_{n \in \mathbb{N}} \sigma(T)$. Moreover, the measure $\mu$ on $X$ will be obtained from countably many spectral measures which we will define using the continuous functional calculus. Finally, $g$ will be the bounded continuous map $g(z, n) = z$ on $X$.

The second goal is to establish the measurable functional calculus, which allows us to obtain normal operators $f(T)$ from any normal $T \in B(H)$ and any bounded measurable $f \in \mathscr{L}^\infty(\sigma(T))$. Notice that the function $f$ in this formulation lies in the space $\mathscr{L}^\infty(\sigma(T))$ defined in Example 2.24(8) rather than $L^\infty_\mu(\sigma(T))$ for some measure $\mu$, because we do not have, at this stage, any preferred measure (see Exercise 12.71 for more on this). For a given normal $T \in B(H)$ this assignment
\[
\mathscr{L}^\infty(\sigma(T)) \ni f \longmapsto f(T) \in B(H) \tag{12.5}
\]
has many natural functorial properties:

(FC1) (Polynomials) If $f(z) = \sum_{j=0}^{n} a_j z^j$ for all $z \in \sigma(T)$, then $f(T) = \sum_{j=0}^{n} a_j T^j$.

(FC2) (Continuity) The map in (12.5) is continuous, with $\|f(T)\|_{\mathrm{op}} \leq \|f\|_\infty$ for $f \in \mathscr{L}^\infty(\sigma(T))$, and is an isometry on $C(\sigma(T))$, meaning that
\[
\|f(T)\|_{\mathrm{op}} = \|f\|_\infty \tag{12.6}
\]
for $f \in C(\sigma(T))$.

(FC3) (Algebra) The map in (12.5) is an algebra homomorphism. In particular, $f_1(T)$ commutes with $f_2(T)$ for $f_1, f_2 \in \mathscr{L}^\infty(\sigma(T))$. Moreover, $(f(T))^* = f(T^*)$ for $f \in \mathscr{L}^\infty(\sigma(T))$.

(FC4) (Multiplication operators) If $H$ is unitarily isomorphic to $L^2_\mu(X)$ and $T$ in $B(H)$ corresponds (via $\phi$ and the commutative diagram) to $M_g$ on $L^2_\mu(X)$ as in Theorem 12.33, then $f \circ g$ is defined almost everywhere and $f(T)$ corresponds to $M_{f \circ g}$ for any $f \in \mathscr{L}^\infty(\sigma(T))$.

(FC5) (Commuting operators) If $V \subseteq H$ is a closed subspace that is invariant under both $T$ and $T^*$, then $V$ is also $f(T)$-invariant for all $f \in \mathscr{L}^\infty(\sigma(T))$. Moreover, if $S \in B(H)$ commutes with the normal operator $T \in B(H)$ and its adjoint, then $S$ also commutes with $f(T)$ for all $f \in \mathscr{L}^\infty(\sigma(T))$.

(FC6) (Iteration) If $f \in \mathscr{L}^\infty(\sigma(T))$ and $h \in \mathscr{L}^\infty\bigl(f(\sigma(T))\bigr)$, then $h(f(T)) = (h \circ f)(T)$.

Theorem 12.34 (Measurable functional calculus for normal operators). Let $H$ be a complex Hilbert space, and $T \in B(H)$ a normal operator. Then there exists a functional calculus for $T$, that is, a uniquely determined map as in (12.5) with the properties (FC1)–(FC6).

Example 12.35. (a) Suppose $T \in B(H)$ is a normal operator on a complex Hilbert space $H$, and suppose that $f(z) = \sum_{n \geq 0} a_n z^n$ is a power series with radius of convergence $R$ such that $\sigma(T) \subseteq B_R^{\mathbb{C}}$. Then we may restrict the function $f$ to $\sigma(T)$ so that the power series converges uniformly and absolutely. By combining (FC1) with (FC2), it follows that $f(T) = \sum_{n \geq 0} a_n T^n$ is also defined by the absolutely converging power series.
(b) Let $(X, \mu)$ be a finite (or σ-finite) measure space, $g : X \to \mathbb{C}$ a bounded measurable function, and let $M_g : L^2_\mu(X) \to L^2_\mu(X)$ be the multiplication operator as in Example 12.31. For $f \in \mathbb{C}[z]$ it follows from (FC1) that $f(M_g) = M_{f \circ g}$. Hence it is reasonable to expect that this holds more generally for $f \in C(\sigma(M_g))$ or even $f \in \mathscr{L}^\infty(\sigma(M_g))$ as in (FC4). Note that the composition $f \circ g$ is well-defined in $L^\infty_\mu(X)$, although the image of $g$ might not lie entirely in $\sigma(M_g)$ (and so in the domain of the continuous or measurable function). In fact, the description of the spectrum $\sigma(M_g)$ in (12.4) shows that
\[
\mu(\{x \in X \mid g(x) \notin \sigma(M_g)\}) = g_*\mu(\mathbb{C} \setminus \sigma(M_g)) = 0,
\]
(the complement of the support being the largest open set with measure 0) so that, for almost every $x \in X$, $g(x)$ lies in $\sigma(M_g)$ and therefore $f(g(x))$ is defined for almost every $x$ (we can set the value of the function $f(g(x))$ on the zero-measure subset where $g(x) \notin \sigma(M_g)$ to be 0). This shows that all the expressions in property (FC4) make sense.

It is tempting to say that in view of (FC4) the existence of the map in (12.5) is simply a consequence of Theorem 12.33. However, if we really used Theorem 12.33 and (FC4) as the definition of the functional calculus then we would not know whether it is canonical, that is, independent of the isomorphism $\phi$ in Theorem 12.33. The fact that we will define the functional calculus independently of the isomorphism $\phi$, but nonetheless obtain (FC4) as one of its properties, demonstrates that there is only one reasonable way to define $f(T)$ for $f \in \mathscr{L}^\infty(\sigma(T))$. In particular, we obtain the uniqueness claimed in Theorem 12.34.

Exercise 12.36. Show that Theorem 12.33, (FC4), and (FC5) together imply the uniqueness claim in Theorem 12.34 (despite the difference in the assumptions on $H$).

For simplicity we will start with the case of self-adjoint operators, which only needs the material from Section 11.1 and Section 11.2. In Section 12.5 we will start the discussion of commutative $C^*$-sub-algebras of $B(H)$, which includes the case of finitely many commuting normal operators and builds on Section 11.3.

Before continuing, let us note a slightly confusing point in the notation for the functional calculus. As usual $I$ denotes the identity map $x \mapsto x$ and $\mathbb{1}$ denotes the constant function $x \mapsto 1$. Thus (FC1) states in particular that $\mathbb{1}(T) = I$ and $I(T) = T$ for any normal operator $T \in B(H)$. The connection to multiplication operators should help to explain why this makes sense.

12.4 Self-Adjoint Operators

The goal of this section is to show how to define an operator $f(T)$ where the operator $T \in B(H)$ is a self-adjoint operator and $f \in C(\sigma(T))$ and to use this functional calculus to clarify the relationship between the spectrum of an operator and its action on vectors. This will imply Theorem 12.33 for these operators.

12.4.1 Continuous Functional Calculus

For certain functions $f$ the definition of $f(T)$ for an operator $T$ on a Hilbert space $H$ is clear. For example, if
\[
p(z) = \sum_{j=0}^{d} a_j z^j
\]
is a polynomial with coefficients in $\mathbb{C}$ restricted to $\sigma(T)$, then the only reasonable definition for $p(T)$ is
\[
p(T) = \sum_{j=0}^{d} a_j T^j \in B(H),
\]
where $T^0$ is defined to be the identity $I$ on $H$. In fact, this polynomial definition makes sense for any $T \in B(H)$, not only for $T$ normal, but there is a technical point which explains why only normal operators are really suitable here. If $\sigma(T)$ is finite, then a polynomial of unknown degree is not uniquely determined by its restriction to $\sigma(T)$. Thus in this case the definition above a priori only gives a map $\mathbb{C}[T] \to B(H)$, not one defined on $C(\sigma(T))$. We cannot hope to have a functional calculus only depending on the spectrum if this dependency is real, and simple examples show that it sometimes is. Consider, for example, the operator $A \in B(\mathbb{C}^2)$ given by the matrix
\[
A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.
\]
Clearly $\sigma(A) = \{0\}$, so the polynomials $p_1(z) = z$ and $p_2(z) = z^2$ coincide when restricted to $\sigma(A)$, but $p_1(A) = A \neq 0 = p_2(A)$.

However, if we assume that $T$ is normal, this problem does not arise, because then (as we will show) for every $p \in \mathbb{C}[z]$ we have
\[
\|p(T)\|_{\mathrm{op}} = \|p\|_{\infty, \sigma(T)},
\]
as claimed in (12.6). This suggests that we should attempt to define, for any function $f \in C(\sigma(T))$, the functional calculus for $T$ applied to $f$ by
\[
\mathrm{FC}_T(f) = \lim_{n \to \infty} p_n(T), \tag{12.7}
\]
where $(p_n)$ is a sequence of polynomials with $\|f - p_n\|_{\infty, \sigma(T)} \to 0$ as $n \to \infty$, which will then allow us to define $f(T)$ to be $\mathrm{FC}_T(f)$.

This definition is indeed sensible and possible, and the basic properties of this construction are given in the following theorem. Roughly speaking, any operation on (or property of) the function $f$ which is reasonable corresponds to an analogous operation on (or property of) $f(T)$.

Theorem 12.37 (Continuous functional calculus). Let $H$ be a complex Hilbert space and $T \in B(H)$ a self-adjoint bounded operator. Then there exists a unique linear map
\[
\mathrm{FC} = \mathrm{FC}_T : C(\sigma(T)) \longrightarrow B(H),
\]
which we will also denote by $f \mapsto f(T) = \mathrm{FC}(f)$, with the following properties:

(1) For any polynomial $p(z) = \sum_{j=0}^{d} a_j z^j \in \mathbb{C}[z]$ we have
\[
\mathrm{FC}(p) = p(T) = \sum_{j=0}^{d} a_j T^j
\]
(that is, $\mathrm{FC}(f)$ is just $f(T)$ when $f$ is a polynomial, extending the definition above).

(2) For any $f \in C(\sigma(T))$ we have
\[
\|\mathrm{FC}(f)\|_{\mathrm{op}} = \|f\|_{\infty, \sigma(T)}. \tag{12.8}
\]

(3) The map $\mathrm{FC}$ is a Banach algebra homomorphism, meaning that
\[
\mathrm{FC}(f_1 f_2) = \mathrm{FC}(f_1)\mathrm{FC}(f_2)
\]
for $f_1, f_2 \in C(\sigma(T))$ and $\mathrm{FC}(\mathbb{1}) = I$. For any $f \in C(\sigma(T))$, we have $\mathrm{FC}(f)^* = \mathrm{FC}(\bar{f})$ (that is, $f(T)^* = \bar{f}(T)$), and in particular $f(T)$ is normal.

(4) If $\lambda \in \sigma_{\mathrm{disc}}(T)$ is in the point spectrum then
\[
\ker(T - \lambda I) \subseteq \ker(f(T) - f(\lambda)I).
\]
If $T = M_g$ for some bounded measurable $g : X \to \mathbb{C}$ and a finite measure space $(X, \mu)$, then $f(M_g) = M_{f \circ g}$ for all $f \in C(\sigma(M_g))$.

As already observed, the essence of the proof of existence of $\mathrm{FC}$ is to show that (12.7) is a valid definition. After having established its existence we will again simply write $f(T) = \mathrm{FC}_T(f)$ for all $f \in C(\sigma(T))$.

Lemma 12.38. Let $H$ be a complex Hilbert space.

(1) For $T \in B(H)$ and a polynomial $p \in \mathbb{C}[z]$, define $p(T) \in B(H)$ as before. Then
\[
\sigma(p(T)) = p(\sigma(T)). \tag{12.9}
\]
(2) Let $T \in B(H)$ be normal and let $p \in \mathbb{C}[z]$ be a polynomial. Then
\[
\|p(T)\|_{\mathrm{op}} = \|p\|_{\infty, \sigma(T)}. \tag{12.10}
\]

Proof. For (1), observe first that the statement is trivially true if $p$ is a constant. If $p$ has degree at least one, then fix $\lambda \in \mathbb{C}$ and factor the polynomial $p(z) - \lambda$ in $\mathbb{C}[z]$ to give
\[
p(z) - \lambda = \alpha \prod_{1 \leq i \leq d} (z - \lambda_i),
\]
for some $\alpha \in \mathbb{C} \setminus \{0\}$ and complex numbers $\lambda_1, \dots, \lambda_d \in \mathbb{C}$ (not necessarily distinct). Since $p \mapsto p(T)$ is an algebra homomorphism, it follows that
\[
p(T) - \lambda I = \alpha \prod_{1 \leq i \leq d} (T - \lambda_i I).
\]
If $\lambda \notin p(\sigma(T))$, then the solutions $\lambda_i$ to the equation $p(z) = \lambda$ are not in $\sigma(T)$, so each factor $T - \lambda_i I$ is invertible, and hence $p(T) - \lambda I$ is invertible. It follows that $\sigma(p(T)) \subseteq p(\sigma(T))$.

Conversely, if $\lambda \in p(\sigma(T))$, then one of the $\lambda_i$ must lie in $\sigma(T)$. Because the factors commute, we can assume without loss of generality that either $i = 1$ if $T - \lambda_i I$ is not surjective (in which case $p(T) - \lambda I$ is not surjective either), or $i = d$ if $T - \lambda_i I$ is not injective (in which case neither is $p(T) - \lambda I$). In all situations, $\lambda \in \sigma(p(T))$, proving the reverse inclusion by Proposition 4.25. The use of Proposition 4.25 here can be avoided by using the fact that $\sigma(T) = \sigma_{\mathrm{appt}}(T) \cup \sigma_{\mathrm{resid}}(T)$ in Lemma 12.11 and arguing for $\lambda_i \in \sigma_{\mathrm{appt}}(T)$ as in the case where $T - \lambda_i I$ is not injective.

For (2), we note first that $\mathrm{FC}(p) = p(T)$ is normal if $T$ is. By the improved spectral radius formula (Proposition 11.21), we have
\[
\|p(T)\|_{\mathrm{op}} = \max_{\lambda \in \sigma(p(T))} |\lambda|,
\]
and by (12.9), we get
\[
\|p(T)\|_{\mathrm{op}} = \max_{\lambda \in p(\sigma(T))} |\lambda| = \max_{\lambda \in \sigma(T)} |p(\lambda)| = \|p\|_{\infty, \sigma(T)},
\]
as desired. □
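Both parts of Lemma 12.38 can be checked by hand on $2 \times 2$ matrices, and the following sketch does this numerically. It is an illustration under assumptions: the matrices `T`, `D`, `A`, the polynomial, and all helper names are chosen for this example only. For the upper triangular `T` the eigenvalues are the diagonal entries, so (12.9) predicts that $p(T)$ has diagonal $p(1), p(2)$. For the normal matrix `D` the norm identity (12.10) holds, while for the nilpotent matrix `A` from the discussion above it fails, which shows that normality is needed in part (2).

```python
import math

def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def poly_of_matrix(coeffs, t):
    """p(T) = sum_j coeffs[j] T^j via Horner's rule (coeffs[0] is the constant term)."""
    result = [[0.0, 0.0], [0.0, 0.0]]
    for c in reversed(coeffs):
        result = matmul(result, t)
        result[0][0] += c
        result[1][1] += c
    return result

def operator_norm(a):
    """Operator norm of a real 2x2 matrix: sqrt of the largest eigenvalue of A^T A."""
    at_a = matmul([[a[0][0], a[1][0]], [a[0][1], a[1][1]]], a)
    trace = at_a[0][0] + at_a[1][1]
    det = at_a[0][0] * at_a[1][1] - at_a[0][1] * at_a[1][0]
    return math.sqrt((trace + math.sqrt(max(trace * trace - 4 * det, 0.0))) / 2)

p = [1.0, -3.0, 2.0]                  # p(z) = 1 - 3z + 2z^2

def p_scalar(z):
    return sum(c * z ** j for j, c in enumerate(p))

T = [[1.0, 1.0], [0.0, 2.0]]          # upper triangular, sigma(T) = {1, 2}
pT = poly_of_matrix(p, T)             # upper triangular with diagonal p(1), p(2)

D = [[-1.0, 0.0], [0.0, 2.0]]         # normal, sigma(D) = {-1, 2}
norm_pD = operator_norm(poly_of_matrix(p, D))     # equals max(|p(-1)|, |p(2)|)

A = [[0.0, 1.0], [0.0, 0.0]]          # not normal, sigma(A) = {0}
norm_A = operator_norm(poly_of_matrix([0.0, 1.0], A))   # ||p_1(A)|| with p_1(z) = z
```

Here `norm_A` is $1$ although $p_1$ vanishes on $\sigma(A) = \{0\}$.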



Proof of Theorem 12.37. Let $T \in B(H)$ be self-adjoint so that $\sigma(T)$ is a compact subset of $\mathbb{R}$. By Lemma 12.38, we deduce that the map
\[
\mathrm{FC} : (\mathbb{C}[z], \|\cdot\|_{\infty, \sigma(T)}) \longrightarrow B(H)
\]
sending $p$ to $p(T)$ is linear and continuous (indeed, is an isometry). Hence it extends uniquely, using the Stone–Weierstrass theorem (Theorem 2.40) and the automatic extension to the closure (Proposition 2.59), to a map defined on $C(\sigma(T))$, and the extension remains isometric, as claimed in (2).

By continuity, the properties
\[
\mathrm{FC}(f_1 f_2) = \mathrm{FC}(f_1)\mathrm{FC}(f_2) \quad \text{and} \quad \mathrm{FC}(f)^* = \mathrm{FC}(\bar{f}),
\]
which are valid for polynomials (using $T = T^*$ for the latter), pass to the limit and are true for all $f \in C(\sigma(T))$. It follows that $\mathrm{FC}(C(\sigma(T)))$ is commutative and closed under taking adjoints, and in particular $f(T)$ is normal for all functions $f \in C(\sigma(T))$. This proves (1), (2), and (3).

To prove (4) we choose for $f \in C(\sigma(T))$ a sequence $(p_n)$ in $\mathbb{C}[z]$ such that $\|p_n - f\|_{\infty, \sigma(T)} \to 0$ as $n \to \infty$. If now $v \in \ker(T - \lambda I)$ then $Tv = \lambda v$, and by induction and linearity
\[
p_n(T)v = p_n(\lambda)v,
\]
for all $n \geq 1$, and we deduce that $f(T)(v) = f(\lambda)v$.

Finally, assume that $T = M_g$. Then by (1) we have $p(M_g) = M_{p \circ g}$ for all $p \in \mathbb{C}[x]$ and by (2), $M_{p_n \circ g} = p_n(M_g) \to f(M_g)$ as $n \to \infty$. However, by Corollary 4.30 and the discussion in Example 12.35(b) we also have
\[
\|M_{p_n \circ g} - M_{f \circ g}\| = \|M_{p_n \circ g - f \circ g}\| = \|p_n \circ g - f \circ g\|_{\mathrm{esssup}} = \|p_n - f\|_{\infty, \sigma(T)} \longrightarrow 0
\]
as $n \to \infty$. Together these imply that $f(M_g) = M_{f \circ g}$. □



Exercise 12.39. Analyze the proof of Theorem 12.37 above and find out where the argument fails for a normal operator that is not self-adjoint.

The following definition will help us introduce spectral measures in the next section.

Definition 12.40. Let $T \in B(H)$ be a bounded operator on a complex Hilbert space. We say that $T$ is a positive operator, written $T \geq 0$, if it is self-adjoint and has $\langle Tv, v \rangle \geq 0$ for all $v \in H$.

The requirement that $T$ is self-adjoint is redundant, as the next exercise shows.

Exercise 12.41. Let $H$ be a complex Hilbert space. Show that any $T \in B(H)$ with the property that $\langle Tv, v \rangle \in \mathbb{R}$ for all $v \in H$ is self-adjoint.

Corollary 12.42. Let H be a complex Hilbert space, and let T ∈ B(H) be self-adjoint. If f ∈ C(σ(T )) is non-negative, then f (T ) is a positive operator.

Proof. If $f \in C(\sigma(T))$ satisfies $f \geq 0$, then we can write $f = (\sqrt{f})^2 = g^2$ where $g = \sqrt{f} \geq 0$ is also continuous on $\sigma(T)$. Then $g(T)$ is well-defined by the continuous functional calculus in Theorem 12.37, is self-adjoint (by Theorem 12.37(3) since $g$ is real-valued), and
\[
\langle f(T)v, v \rangle = \langle g(T)^2 v, v \rangle = \langle g(T)v, g(T)v \rangle \geq 0,
\]
for all $v \in H$, which shows that $f(T) \geq 0$. □



In the case of a self-adjoint compact operator the continuous functional calculus discussed above is quite straightforward.

Example 12.43. Let $H$ be a separable complex Hilbert space, and let $T$ in $K(H)$ be a compact self-adjoint operator. Applying Theorem 6.27 we have
\[
Tv = \sum_{n \geq 1} \lambda_n \langle v, e_n \rangle e_n,
\]
where $(\lambda_n)$ is the sequence of (real) eigenvalues of $T$ with $(e_n)$ the sequence of corresponding eigenvectors. If $\dim H < \infty$ then the spectrum simply consists of the eigenvalues, and if $\dim H = \infty$ then $\sigma(T) = \{0\} \cup \{\lambda_1, \lambda_2, \dots\}$ and for $f \in C(\sigma(T))$ we have by Theorem 12.37(4)
\[
f(T)v = \sum_{\lambda \in \sigma(T)} f(\lambda) P_\lambda v,
\]
where $P_\lambda \in B(H)$ is the orthogonal projection onto $\ker(T - \lambda I)$.

12.4.2 Corollaries to the Continuous Functional Calculus †

The following exercise generalizes Lemma 12.38(1) to any continuous function. Exercise 12.44 (Spectral mapping theorem). Let H be a complex Hilbert space, and let T ∈ B(H) be a self-adjoint operator. Show that σ(f (T )) = f (σ(T )) for any f ∈ C(σ(T )).

Corollary 12.45 (Positive roots). Let T ∈ B(H) be a positive operator. For any n > 1, there exists a positive operator, denoted T 1/n , with the property that (T 1/n )n = T . We note that such an operator is unique, but we will only prove this a little later (see Exercise 12.72). Proof. Since T > 0, we have σ(T ) ⊆ [0, ∞), so the function f : x 7→ x1/n is defined and continuous on σ(T ). Since f (x)n = x for all x > 0, the functional calculus implies that f (T )n = T . Moreover f > 0, and hence f (T ) > 0 by Corollary 12.42.  The case n = 2 is sufficient to prove Lemma 6.38, which claimed that any bounded operator on a separable complex Hilbert space may be written as a sum of four unitary operators and was used in our discussion of trace-class operators. Proof of Lemma 6.38. Let B be a bounded operator on the complex Hilbert space H. Then † The results of this subsection help to explain the functional calculus and how it can be used further, but will not be needed later.

452

\[ B = \tfrac{1}{2}(B + B^*) + i\,\tfrac{1}{2i}(B - B^*) \]

and both $\tfrac{1}{2}(B + B^*)$ and $\tfrac{1}{2i}(B - B^*)$ are self-adjoint. Thus it remains to show that every self-adjoint operator can be written as a linear combination of two unitary operators. So let $A$ be a self-adjoint operator on $H$ and assume without loss of generality that $\|A\|_{\mathrm{op}} \leq 1$. Then $I - A^2$ is positive since it is clearly self-adjoint and

\[ \langle (I - A^2)v, v \rangle = \|v\|^2 - \|Av\|^2 \geq 0 \]

for all $v \in H$. By Corollary 12.45 we find an operator $U = A + i(I - A^2)^{1/2}$ satisfying

\[ UU^* = U^*U = A^2 + (I - A^2) = I \]

and $\tfrac{1}{2}(U + U^*) = A$, which shows that $A$ is a linear combination of two unitary operators. □

The next corollary, which will be generalized later, starts to show how the functional calculus can be used to provide detailed information about the spectrum.

Corollary 12.46 (Isolated points). Let $H$ be a complex Hilbert space and let $T \in B(H)$ be a bounded self-adjoint operator. Let $\lambda \in \sigma(T)$ be an isolated point, meaning that there is some $\varepsilon > 0$ for which $\sigma(T) \cap (\lambda - \varepsilon, \lambda + \varepsilon) = \{\lambda\}$. Then $\lambda \in \sigma_{\mathrm{disc}}(T)$.

Proof. The fact that $\lambda$ is isolated implies that the function $f = \mathbb{1}_{\{\lambda\}} : \sigma(T) \to \mathbb{C}$, which maps $\lambda$ to $1$ and $\sigma(T) \setminus \{\lambda\}$ to $0$, is a continuous function on $\sigma(T)$. Hence we can define an operator $P = f(T) \in B(H)$. We claim that $P$ is non-zero, and is a projection to $\ker(T - \lambda I)$. This will show that $\lambda$ is in the discrete spectrum.

Firstly, $P \neq 0$ because $\|P\|_{\mathrm{op}} = \|f\|_{\infty,\sigma(T)} = 1$ by the functional calculus. Clearly $f = f^2$ in $C(\sigma(T))$, so $P = f(T) = f(T)^2 = P^2$, and $P = f(T) = \overline{f}(T) = P^*$ since $f$ is real-valued, which shows that $P$ is an orthogonal projection. Moreover, we have an identity of continuous functions

\[ [(I - \lambda \mathbb{1})f](z) = (z - \lambda)f(z) = 0 \]

for all $z \in \sigma(T)$, so by the functional calculus we get $(T - \lambda I)P = 0$, which shows that $\{0\} \neq \operatorname{im}(P) \subseteq \ker(T - \lambda I)$. □
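The construction in the proof of Lemma 6.38 above can be checked numerically in the simplest non-trivial case. The following is only a sketch under stated assumptions: the matrix `A` is an arbitrary illustrative choice, all function names are hypothetical, and the square root of $I - A^2$ is computed with a closed formula for $2 \times 2$ positive semi-definite matrices (derived from the Cayley–Hamilton identity) rather than with the functional calculus itself.

```python
import math

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def adjoint(X):
    """Conjugate transpose of a 2x2 matrix."""
    return [[complex(X[j][i]).conjugate() for j in range(2)] for i in range(2)]

def psd_sqrt_2x2(M):
    """Positive square root R of a real positive semi-definite 2x2 matrix M.

    Cayley-Hamilton for R gives R = (M + det(R) I) / tr(R), where
    det(R) = sqrt(det M) and tr(R)^2 = tr(M) + 2 sqrt(det M).
    """
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    s = math.sqrt(max(det, 0.0))            # det(R); clamp rounding noise
    t = math.sqrt(max(tr + 2.0 * s, 0.0))   # tr(R)
    if t == 0.0:                            # only possible when M = 0
        return [[0.0, 0.0], [0.0, 0.0]]
    return [[(M[i][j] + (s if i == j else 0.0)) / t for j in range(2)]
            for i in range(2)]

def unitary_from_self_adjoint(A):
    """U = A + i (I - A^2)^{1/2} for a real symmetric A with norm <= 1."""
    A2 = matmul(A, A)
    M = [[(1.0 if i == j else 0.0) - A2[i][j] for j in range(2)]
         for i in range(2)]
    R = psd_sqrt_2x2(M)
    return [[complex(A[i][j], R[i][j]) for j in range(2)] for i in range(2)]

# Illustrative self-adjoint matrix (an assumption); its norm is about 0.64.
A = [[0.6, 0.2], [0.2, -0.3]]
U = unitary_from_self_adjoint(A)
U_star = adjoint(U)
UU_star = matmul(U, U_star)   # should be the identity up to rounding
half_sum = [[(U[i][j] + U_star[i][j]) / 2 for j in range(2)] for i in range(2)]
```

Here `UU_star` agrees with the identity matrix and `half_sum` agrees with `A` up to rounding error, matching $UU^* = I$ and $\tfrac{1}{2}(U + U^*) = A$.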


Exercise 12.47. Extend Corollary 12.46 by showing that $\lambda \notin \sigma_{\mathrm{approx}}(T)$.

Exercise 12.48 (Polar decomposition). Let $H_1, H_2$ be complex Hilbert spaces and suppose that $T : H_1 \to H_2$ is a bounded operator. Show that there exist a positive self-adjoint operator $A \in B(H_1)$ and a bounded operator $U : H_1 \to H_2$ with the property that $U|_{\ker(T)} = 0$, $A|_{\ker(T)} = 0$, $U|_{(\ker(T))^\perp} : (\ker(T))^\perp \to H_2$ is an isometry, and $T = UA$.

12.4.3 Spectral Measures

Using the functional calculus, we can now clarify how the spectrum represents an operator $T$ and its action on vectors $v \in H$.

Proposition 12.49 (Spectral measure). Let $T \in B(H)$ be a self-adjoint operator on a complex Hilbert space $H$. Then for any $v \in H$ there exists a uniquely determined measure $\mu_v$ on $\sigma(T)$, depending on $T$ and on $v$, such that

\[ \int_{\sigma(T)} f(x) \, d\mu_v(x) = \langle f(T)v, v \rangle \]

for all $f \in C(\sigma(T))$. In particular, we have

\[ \mu_v(\sigma(T)) = \|v\|^2, \tag{12.11} \]

so $\mu_v$ is a finite measure. This measure is called the spectral measure associated to $v$ (with respect to $T$).

Proof. This is a direct application of the Riesz representation theorem (Theorem 7.44). Indeed, the linear functional

\[ \ell : C(\sigma(T)) \longrightarrow \mathbb{C}, \qquad f \longmapsto \langle f(T)v, v \rangle \]

is well-defined and positive, since if $f \geq 0$ we have $f(T) \geq 0$ by Corollary 12.42, and so $\langle f(T)v, v \rangle \geq 0$ by definition. Hence there exists a uniquely determined positive locally finite measure $\mu_v$ on $\sigma(T)$ such that

\[ \langle f(T)v, v \rangle = \ell(f) = \int_{\sigma(T)} f(x) \, d\mu_v(x) \]

for all $f \in C(\sigma(T))$. Moreover, taking $f = 1$, we obtain (12.11) (which also implies that $\|\ell\| = \|v\|^2$). □

Example 12.50. Let $T : H \to H$ be as in Example 12.43. Then we have

\[ \int_{\sigma(T)} f(x) \, d\mu_v(x) = f(0)\|P_0(v)\|^2 + \sum_{n \geq 1} f(\lambda_n)\|P_{\lambda_n}(v)\|^2 \]

for all continuous functions $f$ on $\sigma(T)$. Therefore, as a measure on $\sigma(T)$, $\mu_v$ is a series of Dirac measures at the eigenvalues $\lambda_n$ (including $0$ if $0$ is an eigenvalue) with $\mu_v(\{\lambda_n\})$ equal to $\|P_{\lambda_n}(v)\|^2$.

This example indicates how, roughly speaking, one can think of $\mu_v$ in general. The spectral measure indicates how the vector $v$ is spread out across the spectrum; in general, any individual point $\lambda \in \sigma(T)$ carries a vanishing proportion of the vector, because $\mu_v(\{\lambda\})$ is often zero. However, $\mu_v(U) > 0$ for a subset $U \subseteq \sigma(T)$ indicates that a positive proportion of the vector belongs to the ‘generalized eigenspace’ corresponding to that part of the spectrum. We will discuss this interpretation of the spectrum again in Section 12.7.

Essential Exercise 12.51. Let $(X, \mu)$ be a finite measure space and let $T$ be the multiplication operator $M_g$ for some bounded measurable $g : X \to \mathbb{R}$. Describe the spectral measure of $v \in L^2_\mu(X)$.

12.4.4 The Spectral Theorem for Self-Adjoint Operators

Using spectral measures, we can now give a complete description of a self-adjoint operator in $B(H)$ (essentially by adapting the arguments from Section 9.1.2). To see how this works, consider first some $v \in H$ and the associated spectral measure $\mu_v$, so that

\[ \langle f(T)v, v \rangle = \int_{\sigma(T)} f(x) \, d\mu_v(x) \]

for all continuous functions $f$ defined on the spectrum of $T$. In particular, if we apply this to $|f|^2 = f\overline{f}$ and use the properties of the continuous functional calculus in Theorem 12.37, we get

\[ \|f(T)v\|^2 = \langle f(T)v, f(T)v \rangle = \langle (\overline{f}f)(T)v, v \rangle = \int_{\sigma(T)} |f(x)|^2 \, d\mu_v(x) = \|f\|^2_{L^2(\sigma(T),\mu_v)}. \]

In other words, the map $\phi$ defined by

\[ \phi : \{f(T)v \mid f \in C(\sigma(T))\} \longrightarrow L^2(\sigma(T), \mu_v), \qquad f(T)v \longmapsto f \]

is an isometry. We note that the above also implies that $\phi$ is well-defined, since $f_1(T)v = f_2(T)v$ for $f_1, f_2 \in C(\sigma(T))$ implies

\[ 0 = \|(f_1 - f_2)(T)v\| = \|f_1 - f_2\|_{L^2(\sigma(T),\mu_v)} \]

and so $f_1 = f_2$ in $L^2(\sigma(T), \mu_v)$. Using the automatic extension to the closure (Proposition 2.59) we can extend the above map to an isometry, again denoted by $\phi$, from the closed subspace

\[ H_v = \overline{\{f(T)v \mid f \in C(\sigma(T))\}} \]

into $L^2(\sigma(T), \mu_v)$. Since $\mu_v$ is a finite positive measure, continuous functions are dense in the Hilbert space $L^2_{\mu_v}(\sigma(T))$ (by Proposition 2.51), which implies that $\phi$ is onto (since the image is complete due to the isometry property).

Next we show that the subspace $H_v$ is invariant under $T$. Indeed, to see that $T(H_v) \subseteq H_v$, it is enough to show that $T(f(T)v) \in H_v$ for $f \in C(\sigma(T))$. For this let us again write $I$ for the function $\sigma(T) \ni x \mapsto x$. In this notation the functional calculus in Theorem 12.37 gives $I(T) = T$ and $(If)(T) = Tf(T)$. Applying this operator to $v$ gives $Tf(T)v = (If)(T)(v)$ and hence

\[ \phi \circ T(f(T)v) = If = M_I \phi(f(T)v) \]

for all $f \in C(\sigma(T))$. By the density of the vectors $f(T)v \in H_v$ for functions $f \in C(\sigma(T))$ we obtain

\[ \phi \circ T = M_I \circ \phi. \tag{12.12} \]

Thus the above discussion proves a special case of Theorem 12.33, namely the case where $T$ is self-adjoint and there exists some vector $v$ with $H_v = H$.

It is important in this reasoning to keep track of the measure $\mu_v$, which depends on the vector $v$, and to remember that elements of $L^2$ are actually equivalence classes of functions. Indeed, it could well be that $\mu_v$ has support which is much smaller than the spectrum, and then the values of a continuous function $f$ outside the support are irrelevant in viewing $f$ as an element of $L^2_{\mu_v}$. In particular, the map $C(\sigma(T)) \to L^2_{\mu_v}(\sigma(T))$ is not necessarily injective.

Definition 12.52. Let $H$ be a Hilbert space and $T \in B(H)$. The cyclic subspace generated by a vector $v \in H$ (which is then called a cyclic vector for $H_v$) equals the closure

\[ H_v = \overline{\{f(T)v \mid f \in C(\sigma(T))\}} = \overline{\langle T^n v \mid n \in \mathbb{N}_0 \rangle}. \]

For a unital sub-algebra $A \subseteq B(H)$ the cyclic subspace generated by $v$ is defined by $H_v = \overline{Av}$.

The equivalence of the two definitions of $H_v$ follows from the density of the subspace of polynomials in $C(\sigma(T))$ and Theorem 12.37. We also note that $H_v$ is separable.
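To make the spectral measure and the isometry $f(T)v \mapsto f$ concrete, the following sketch treats the finite-dimensional diagonal case, where $\mu_v$ is a finite sum of point masses as in Example 12.50. The eigenvalues, the vector, the test functions and all names are illustrative assumptions and are not taken from the text.

```python
def spectral_measure(eigs, v):
    """Point masses of mu_v for T = diag(eigs): mass |v_k|^2 at eigs[k],
    added up over repeated eigenvalues (this is ||P_lambda v||^2)."""
    mu = {}
    for lam, vk in zip(eigs, v):
        mu[lam] = mu.get(lam, 0.0) + abs(vk) ** 2
    return mu

def integrate(f, mu):
    """Integral of f against a measure given by point masses."""
    return sum(f(lam) * mass for lam, mass in mu.items())

def apply_f(f, eigs, v):
    """f(T)v for the diagonal operator T = diag(eigs)."""
    return [f(lam) * vk for lam, vk in zip(eigs, v)]

def inner(u, w):
    """Inner product, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(u, w))

eigs = [0.0, 0.5, 0.5, 1.0]      # sigma(T) = {0, 0.5, 1}
v = [1.0, 2.0, 2.0j, 0.0]        # v has no component at the eigenvalue 1
mu_v = spectral_measure(eigs, v)

f = lambda x: x * x + 1.0
pairing = inner(apply_f(f, eigs, v), v)                   # <f(T)v, v>
integral = integrate(f, mu_v)                             # int f dmu_v
norm_sq = inner(apply_f(f, eigs, v), apply_f(f, eigs, v)).real
l2_norm_sq = integrate(lambda x: abs(f(x)) ** 2, mu_v)    # ||f||^2 in L^2(mu_v)

# f1 and f2 differ at the eigenvalue 1, which carries no mass of mu_v.
f1 = lambda x: x
f2 = lambda x: x + x * (x - 0.5)
same_vector = apply_f(f1, eigs, v) == apply_f(f2, eigs, v)
```

In this example $\mu_v$ gives no mass to the eigenvalue $1$, so its support is smaller than the spectrum, and `same_vector` shows two different continuous functions on $\sigma(T)$ producing the same vector $f(T)v$, in line with the remark that $C(\sigma(T)) \to L^2_{\mu_v}(\sigma(T))$ need not be injective.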


It is not always the case that $T$ admits a cyclic vector for all of $H$. However, we have the following lemma which allows us to reduce many questions to the cyclic case.

Lemma 12.53. Let $H$ be a Hilbert space, and let $T \in B(H)$ be a self-adjoint operator. Then there exists a family $(H_i)_{i \in I}$ of non-zero, pairwise orthogonal, closed subspaces of $H$ such that $H = \bigoplus_{i \in I} H_i$ is the orthogonal direct sum of the $H_i$, $T(H_i) \subseteq H_i$ for all $i$, and $T$ restricted to $H_i$ is, for all $i$, a self-adjoint bounded operator in $B(H_i)$ with a cyclic vector.

Essential Exercise 12.54. Prove Lemma 12.53.

Notice that if $H$ is separable, the index set in the above result is either finite or countable, since each $H_i$ is non-zero. We can now prove Theorem 12.33 for a single self-adjoint operator.

Theorem 12.55 (Spectral theorem for self-adjoint operators). Let $H$ be a separable complex Hilbert space and $T \in B(H)$ a continuous self-adjoint operator. Then there exists a finite measure space $(X, \mu)$, a unitary isomorphism $\phi : H \to L^2_\mu(X)$ and a bounded measurable function $g : X \to \mathbb{R}$, such that

\[ M_g \circ \phi = \phi \circ T. \]

In fact, we can set $X = \sigma(T) \times \mathbb{N}$ and $g(z, n) = z$ for $z \in \sigma(T)$ and $n \in \mathbb{N}$.

Proof. Consider a (possibly finite) family $(H_n)_{n \geq 1}$ of pairwise orthogonal non-zero closed subspaces of $H$, spanning $H$, for which $T(H_n) \subseteq H_n$ and $T$ has a cyclic vector $v_n \neq 0$ on $H_n$ as in Lemma 12.53. By replacing $v_n$ with $n^{-1}\|v_n\|^{-1}v_n$, we can assume that $\|v_n\|^2 = n^{-2}$ (without changing $H_n$). Let $\mu_n = \mu_{v_n}$ be the spectral measure associated to $v_n$ (and $T$), so that

\[ \mu_n(\sigma(T)) = \|v_n\|^2 = n^{-2} \]

for all $n \geq 1$. If the list of subspaces is finite, $H_1, \ldots, H_{n_0}$ say, then we set $H_n = \{0\}$ for $n > n_0$ and still work with the index set $\mathbb{N}$. By the argument at the beginning of this section, we have unitary maps

\[ \phi_n : H_n \to L^2_{\mu_n}(\sigma(T)), \]

such that $\phi_n \circ T = M_I \circ \phi_n$ and $M_I$ is the multiplication operator corresponding to the function $I$ defined by $I(z) = z$ for $z \in \sigma(T)$.
Now define $X = \sigma(T) \times \mathbb{N}$ with the product topology, and define the locally finite positive measure $\mu$ by $\mu(A \times \{n\}) = \mu_n(A)$ for $n \geq 1$ and measurable $A \subseteq \sigma(T)$. It is easily checked (see Exercise 3.30) that this indeed defines a measure on $X$. Moreover, in this context measurable functions on $X$ correspond one-to-one with sequences of measurable functions $(f_n)$ on $\sigma(T)$ by mapping $f$ to $(f_n)$ with $f_n(z) = f(z, n)$ for all $(z, n) \in \sigma(T) \times \mathbb{N}$, and

\[ \int_X f(x) \, d\mu(x) = \sum_{n \geq 1} \int_{\sigma(T)} f_n(z) \, d\mu_n(z) \]

whenever this makes sense (for example, if $f \geq 0$, equivalently $f_n \geq 0$ for all $n$, or if $f$ is integrable, which is equivalent to $f_n$ being $\mu_n$-integrable for all $n$ and the sum of the integrals of $|f_n|$ being convergent). In particular,

\[ \mu(X) = \sum_{n \geq 1} \mu_n(\sigma(T)) = \sum_{n \geq 1} n^{-2} < \infty, \]

so that $(X, \mu)$ is a finite measure space. Moreover, the map

\[ \bigoplus_n L^2(\sigma(T), \mu_n) \longrightarrow L^2(X, \mu), \qquad (f_n) \longmapsto f \]

is a unitary isomorphism (cf. Exercise 3.37) which we will use implicitly in the following.

Now recall that $H = \bigoplus_n H_n$ so that we can construct $\phi$ by defining

\[ \phi\Bigl(\sum_n w_n\Bigr) = \bigl(\phi_n(w_n)\bigr)_n \]

for all $\sum_n w_n \in \bigoplus_n H_n = H$. Since $\|\phi_n(w_n)\|_{L^2(\sigma(T),\mu_n)} = \|w_n\|_H$, this defines a unitary map with inverse given by

\[ \phi^{-1}(f) = \sum_{n \geq 1} \phi_n^{-1}(f_n) \in \bigoplus_n H_n = H \]

for $f = (f_n)_n \in L^2(X, \mu)$.

Now consider the map $g : X \to \mathbb{C}$ sending $(z, n)$ to $z$, which is bounded and measurable. Then by (12.12) we have

\[ \phi(T(w_n)) = \phi_n(T(w_n)) = M_I(\phi_n(w_n)) = M_g(\phi(w_n)) \]

for all $w_n \in H_n$. As this holds for all $n \geq 1$ we see that $\phi \circ T = M_g \circ \phi$.
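The construction of $X = \sigma(T) \times \mathbb{N}$ can be followed by hand in a small case. The sketch below takes $T = \operatorname{diag}(1, 1, 2)$ on $\mathbb{C}^3$, which has no cyclic vector because the eigenvalue $1$ has multiplicity two. As an illustrative assumption (one choice among many) we use the cyclic vectors $v_1 = (e_1 + e_3)/\sqrt{2}$ and $v_2 = e_2/2$, so that $\|v_n\|^2 = n^{-2}$, $H_1 = \operatorname{span}\{e_1, e_3\}$ and $H_2 = \operatorname{span}\{e_2\}$. Writing $w = f_1(T)v_1 + f_2(T)v_2$ then gives the explicit formula for $\phi$ used in the code; all names are hypothetical.

```python
import math

# T = diag(1, 1, 2) acting on C^3.
EIGS = (1.0, 1.0, 2.0)

# Point masses of mu on X = sigma(T) x N, keyed by (z, n):
# mu_1 = (1/2) delta_1 + (1/2) delta_2 comes from v_1 = (e_1 + e_3)/sqrt(2),
# mu_2 = (1/4) delta_1 comes from v_2 = e_2/2.
MU = {(1.0, 1): 0.5, (2.0, 1): 0.5, (1.0, 2): 0.25}

def g(point):
    """The function g(z, n) = z on X."""
    z, _n = point
    return z

def phi(w):
    """The unitary map H -> L^2(X, mu) for the choice of v_1, v_2 above.

    For w = (a, b, c) we have (a, 0, c) = f_1(T)v_1 with f_1(1) = sqrt(2) a,
    f_1(2) = sqrt(2) c, and (0, b, 0) = f_2(T)v_2 with f_2(1) = 2 b.
    """
    a, b, c = w
    root2 = math.sqrt(2.0)
    return {(1.0, 1): root2 * a, (2.0, 1): root2 * c, (1.0, 2): 2.0 * b}

def apply_T(w):
    """T w for the diagonal operator T."""
    return tuple(lam * wk for lam, wk in zip(EIGS, w))

def l2_norm_sq(F):
    """Squared norm of a function F on X in L^2(X, mu)."""
    return sum(abs(F[x]) ** 2 * MU[x] for x in MU)

w = (1 + 2j, -3.0, 0.5j)                            # an arbitrary test vector
total_mass = sum(MU.values())                       # mu(X) = 1 + 1/4
norm_in_H = sum(abs(c) ** 2 for c in w)
norm_in_L2 = l2_norm_sq(phi(w))
lhs = phi(apply_T(w))                               # phi(T w)
rhs = {x: g(x) * phi(w)[x] for x in MU}             # M_g phi(w)
```

Up to rounding, `norm_in_H` equals `norm_in_L2` and `lhs` equals `rhs`, which are the finite-dimensional instances of $\phi$ being unitary and of $\phi \circ T = M_g \circ \phi$.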



This spectral theorem is extremely useful. It immediately implies a number of results which could also be proved directly from the continuous functional calculus, but less transparently so. Note that the method of proof (treating first the case of cyclic operators, and then extending the result to direct sums) may also be a shorter approach to some of the other corollaries, since in the cyclic case one knows that the multiplication operator $M_I$ can be taken to correspond to the identity function $I : \sigma(T) \ni x \mapsto x$ on the spectrum.

Corollary 12.56 (Positivity). Let $H$ be a separable complex Hilbert space and let $T \in B(H)$ be a self-adjoint operator. Then for any $f \in C(\sigma(T))$ we have $f(T) \geq 0$ if and only if $f \geq 0$.

Proof. Because of Corollary 12.42, we only need to check that $f(T) \geq 0$ implies that $f \geq 0$. Now two unitarily equivalent operators are simultaneously either positive or not, so it suffices to consider an operator of the form $T = M_g$ acting on $L^2_\mu(X)$ for a finite measure space $(X, \mu)$. Without loss of generality we may assume that $X = \sigma(T) \times \mathbb{N}$ and $g(z, n) = z$ as in Theorem 12.55. Recall that $f(M_g) = M_{f \circ g}$ for any function $f \in C(\sigma(T))$ by Theorem 12.37(4). Now set $v = \mathbb{1}_{\{(z,n) \mid f(z) < 0\}}$ and note that

\[ \langle f(M_g)v, v \rangle = \int_{\{(z,n) \mid f(z) < 0\}} f(z) \, d\mu(z, n) \geq 0 \]

forces $\mu(\{(z,n) \mid f(z) < 0\}) = 0$, and hence $f \geq 0$, since $\sigma(T) = \sigma(M_g) = \operatorname{Supp}(g_*\mu)$ by Example 12.31. □

12.4.5 Consequences for Unitary Representations

As the following exercises show, the material above is also useful for the study of unitary representations.

Exercise 12.57. Let $G$ be a topological group, $H_1$ and $H_2$ complex Hilbert spaces, and let $\pi_1 : G \circlearrowright H_1$ and $\pi_2 : G \circlearrowright H_2$ be unitary representations of $G$. Suppose that $\pi_1$ and $\pi_2$ are isomorphic in the sense that there exists a bijective bounded operator $T$ from $H_1$ to $H_2$ with $T\pi_1(g) = \pi_2(g)T$ for all $g \in G$. Show that this implies that $\pi_1$ and $\pi_2$ are also unitarily isomorphic, meaning that $T$ can be chosen to be in addition a unitary isomorphism $T : H_1 \to H_2$.

Exercise 12.58 (Schur’s lemma). (a) Let $\pi_1 : G \circlearrowright H_1$ and $\pi_2 : G \circlearrowright H_2$ be unitary representations of a topological group $G$, and let $B : H_1 \to H_2$ be a bounded operator with $B\pi_1(g) = \pi_2(g)B$ for all $g \in G$. Show that if $\pi_1$ is irreducible (that is, there are no closed $\pi_1$-invariant subspaces in $H_1$ other than $\{0\}$ and $H_1$) then $B^*B = \lambda I_{H_1}$ for some $\lambda \geq 0$ and if $\pi_2$ is also irreducible then $BB^* = \lambda I_{H_2}$.
(b) Suppose now that $\pi_1 = \pi_2$ is irreducible and deduce that $B = \lambda I_{H_1}$ for some $\lambda \in \mathbb{C}$.

Exercise 12.59. Let $G$ be a topological group, and recall the set $P_1(G)$ of normalized positive-definite functions in $C_b(G)$ from Exercise 9.55. Show that $p \in P_1(G)$ is extreme in $P_1(G)$ if and only if the associated unitary representation from Exercise 9.55(a) is irreducible.


12.5 Commuting Normal Operators

The following is a natural generalization of the spectral theorem for normal operators (Theorem 12.33), showing that any commutative $C^*$-sub-algebra of $B(H)$ is unitarily equivalent to a $C^*$-sub-algebra of multiplication operators.

Theorem 12.60 (Spectral theorem for commuting normal operators). Let $H$ be a separable complex Hilbert space, and let $A \subseteq B(H)$ be a separable commutative unital $C^*$-sub-algebra of $B(H)$. Then there exists a finite measure space $(X, \mu)$, a unitary isomorphism $\phi : H \to L^2_\mu(X)$ and for every $a \in A$ a bounded function $g_a \in L^\infty_\mu(X)$ such that

\[
\begin{array}{ccc}
H & \xrightarrow{\ a\ } & H \\
{\scriptstyle\phi}\Big\downarrow & & \Big\downarrow{\scriptstyle\phi} \\
L^2_\mu(X) & \xrightarrow{\ M_{g_a}\ } & L^2_\mu(X)
\end{array}
\]

commutes. In fact, we can choose $X = \sigma(A) \times \mathbb{N}$ and $g_a = a^o$, where we identify the function $a^o$ with the function defined by $a^o(x, n) = a^o(x)$ for $(x, n) \in \sigma(A) \times \mathbb{N}$ and all $a \in A$, and the map that sends $a \in A$ to $g_a \in L^\infty_\mu(X)$ is a $C^*$-isomorphism preserving products, the adjoint operation, and norms.

Clearly Theorem 12.33 is a special case of Theorem 12.60 (cf. Exercise 12.61). Using Section 12.4 the following proof will be much shorter than the proof of the case of a single self-adjoint operator above. By Corollary 11.34 the Gelfand transform

\[ A \ni a \longmapsto a^o \in C(\sigma(A)) \]

is an isometry and an algebra isomorphism satisfying

\[ (a^*)^o = \overline{a^o} \tag{12.13} \]

for all $a \in A$. (We note that Lemma 11.35 in the proof of Corollary 11.34 can here be replaced by Lemma 12.15.) We recall that $\sigma(A)$ is the generalization of the spectrum of a single operator and note that in the following the inverse map

\[ C(\sigma(A)) \ni f = a^o \longmapsto a \in A \]

should be thought of as a generalized continuous functional calculus.

Proof of Theorem 12.60. Fix $v \in H$ and define a linear functional $\Lambda : C(\sigma(A)) \to \mathbb{C}$ by


\[ \Lambda(a^o) = \langle av, v \rangle \]

for every $a \in A$. We claim that $\Lambda$ is a positive functional on $C(\sigma(A))$. Suppose that $a \in A$ with $a^o \geq 0$. Then there exists some $b = b^* \in A$ (defined using the Gelfand transform by $b^o = \sqrt{a^o}$) with $b^2 = a$. The claimed positivity now follows, since

\[ \Lambda(a^o) = \langle av, v \rangle = \langle bv, bv \rangle \geq 0. \]

By the Riesz representation theorem (Theorem 7.44) there exists a positive finite measure $\mu_v$ on $\sigma(A)$ such that

\[ \langle av, v \rangle = \int_{\sigma(A)} a^o \, d\mu_v \]

and it follows that

\[ \|av\|^2 = \langle av, av \rangle = \langle a^*av, v \rangle = \int (a^*a)^o \, d\mu_v = \int |a^o|^2 \, d\mu_v \]

for all $a \in A$ by (12.13). Just as in Sections 9.1.2 and 12.4.4, this induces a unitary isomorphism $\phi$ between the cyclic subspace $H_v = \overline{Av}$ and $L^2_{\mu_v}(\sigma(A))$ which sends $av \in Av$ to $a^o \in C(\sigma(A))$. In particular, for $a, b \in A$ we have

\[ \phi(abv) = (ab)^o = a^o b^o = a^o \phi(bv). \]

Fixing $a \in A$, this extends by continuity to the statement $\phi(aw) = M_{a^o}\phi(w)$ for all $w \in H_v$.

As in Sections 9.1.2 and 12.4.4 this extends to a proof of Theorem 12.60 as follows. If $w_1, w_2, \ldots$ is an orthonormal basis of $H$ then we define $H_1 = H_{w_1}$, $H_2 = H_{w_2^\perp}$ where $w_2^\perp \in H_1^\perp$ is the orthogonal projection of $w_2$ to $H_1^\perp$, similarly $H_3 = H_{w_3^\perp}$ with $w_3^\perp \in (H_1 \oplus H_2)^\perp$, and so on. Replacing $w_n$ by a scalar multiple for each $n \geq 1$ we may assume that $\sum_{n \geq 1} \|w_n^\perp\|^2 < \infty$. Define

\[ X = \sigma(A) \times \mathbb{N} \cong \bigsqcup_{n \in \mathbb{N}} \sigma(A) \]

with measure

\[ \mu = \bigsqcup_{n \in \mathbb{N}} \mu_{w_n^\perp}, \]

where the disjoint union notation indicates that we consider $\mu_{w_n^\perp}$ as a measure on $\sigma(A) \times \{n\}$ and then take the sum to obtain the measure $\mu$ on $X$. With this

\[ L^2_\mu(X) \cong \bigoplus_{n \in \mathbb{N}} L^2(X, \mu_{w_n^\perp}) \cong \bigoplus_{n \in \mathbb{N}} H_{w_n^\perp} = H, \tag{12.14} \]


and application of $a \in A$ leaves each subspace $H_{w_n^\perp}$ invariant and corresponds to multiplication by $a^o \in C(\sigma(A))$ on $L^2(\sigma(A), \mu_{w_n^\perp}) \subseteq L^2(X, \mu)$. As this holds for all $n \in \mathbb{N}$ the map in (12.14) gives the unitary isomorphism $\phi : H \to L^2_\mu(X)$, with the required properties.



Exercise 12.61. (a) Suppose that $A$ is the unital commutative $C^*$-algebra generated by $T$, a normal operator on a complex Hilbert space $H$, in the sense that

\[ A = \overline{\langle T^m (T^*)^n \mid m, n \geq 0 \rangle} \]

where $T^0 = (T^*)^0 = I$. Show that $\imath : \sigma(A) \ni \phi \mapsto \phi(T) \in \sigma(T)$ defines a homeomorphism between compact metric spaces and use this to deduce Theorem 12.33 from Theorem 12.60.
(b) Now consider the algebra

\[ A = \overline{\langle I, T_1, T_1^*, T_2, T_2^* \rangle} \]

generated by two commuting normal operators $T_1, T_2 \in B(H)$ on a complex Hilbert space. Show that

\[ \imath : \sigma(A) \ni \phi \longmapsto (\phi(T_1), \phi(T_2)) \in \sigma(T_1) \times \sigma(T_2) \]

is continuous and injective. Give a concrete example to show that the image of $\imath$ may not be all of $\sigma(T_1) \times \sigma(T_2)$.

Exercise 12.62. State and prove a spectral theorem for normal compact operators as a corollary of Theorem 12.60.

12.6 Spectral Measures and the Measurable Functional Calculus

For the proof of Theorem 12.34 we now discuss some more general spectral measures. As it makes little difference whether we consider a single (self-adjoint or normal) operator or a commutative Banach algebra (as in the previous section), we will do the latter. The reader only interested in the case of a single self-adjoint operator $T$ may replace the use of Theorem 12.60 below by Theorem 12.55, set $\sigma(A) = \sigma(T)$ as in Exercise 12.61(a) and replace the operation $a^o \mapsto a \in A$ by the continuous functional calculus $C(\sigma(T)) \ni f \mapsto f(T)$.

12.6.1 Non-Diagonal Spectral Measures

Definition 12.63. Let $H$ be a complex Hilbert space and let $A \subseteq B(H)$ be a separable commutative unital $C^*$-sub-algebra. For $v, w \in H$ a non-diagonal spectral measure is a finite complex-valued measure $\mu_{v,w}$ on $\sigma(A)$ with

\[ \int_{\sigma(A)} a^o \, d\mu_{v,w} = \langle av, w \rangle \tag{12.15} \]

for all $a^o \in C(\sigma(A))$.

Proposition 12.64. Let $H$ be a complex Hilbert space, and assume that $A$ is a separable commutative unital $C^*$-sub-algebra of $B(H)$. Then for every pair $v, w \in H$ there exists a uniquely determined finite complex-valued spectral measure $\mu_{v,w}$ on $\sigma(A)$ satisfying (12.15). Moreover, the measure $\mu_{v,w}$ depends sesqui-linearly on $v, w \in H$ and satisfies $\|\mu_{v,w}\| \leq \|v\|\|w\|$.

Proof. We may assume without loss of generality that $H$ is separable, for otherwise we may replace $H$ by the closure of $Av + Aw$, which is separable by the assumption on $A$. Recall from Corollary 11.34 that $C(\sigma(A)) = \{a^o \mid a \in A\}$. Recall that linear functionals on $C(\sigma(A))$ can be uniquely identified with complex-valued measures by Theorem 7.54. We now apply this to the linear functional

\[ \Lambda_{v,w} : C(\sigma(A)) \cong A \ni a^o \longmapsto \langle av, w \rangle \]

which satisfies

\[ |\langle av, w \rangle| \leq \|av\|\|w\| \leq \|a\|\|v\|\|w\| = \|a^o\|_\infty \|v\|\|w\| \]

by Cauchy–Schwarz, the definition of the operator norm, and the isometry claim in Corollary 11.34. This shows that $\Lambda_{v,w} \in C(\sigma(A))^*$ is a bounded functional and we obtain the uniquely defined complex-valued measure $\mu_{v,w}$ on $\sigma(A)$ satisfying (12.15) and $\|\mu_{v,w}\| \leq \|v\|\|w\|$. Since uniqueness and existence are now shown, the sesqui-linearity follows easily from the sesqui-linearity of the inner product on $H$. □

The following exercise clarifies how the more general non-diagonal spectral measures can be constructed from the diagonal spectral measures $\mu_v = \mu_{v,v}$ appearing in the spectral theorem.

Exercise 12.65. Let $H$ and $A$ be as in Proposition 12.64. Let $v, w \in H$ and decompose $w$ into $w = w_0 + w^\perp$ with $w_0 \in H_v$ and $w^\perp \in H_v^\perp$. Use the spectral theorem to express $\mu_{v,w}$ in terms of $\mu_v$ and the vector $f_0 \in L^2_{\mu_v}(\sigma(A))$ corresponding to $w_0$.

12.6.2 The Measurable Functional Calculus

Using the spectral measures from above, we can now define for every measurable function $f \in \mathscr{L}^\infty(\sigma(A))$ a corresponding operator $f_H \in B(H)$. In the case of $A$ being generated by $I$ and a normal operator $T$ this gives a definition of $f(T)$ for a function $f \in \mathscr{L}^\infty(\sigma(T))$.


Proposition 12.66. Let $H$ be a complex Hilbert space and let $A \subseteq B(H)$ be a separable commutative unital $C^*$-sub-algebra. For any $f \in \mathscr{L}^\infty(\sigma(A))$ there exists a bounded operator $f_H$ which is uniquely characterized by the property

\[ \langle f_H v, w \rangle = \int_{\sigma(A)} f \, d\mu_{v,w} \tag{12.16} \]

for all $v, w \in H$. Moreover, the operator norm of $f_H$ satisfies $\|f_H\| \leq \|f\|_\infty$.

Proof. Since $\mu_{v,w}$ is a finite complex-valued measure, and $f \in \mathscr{L}^\infty(\sigma(A))$ is bounded, the integral $\int_{\sigma(A)} f \, d\mu_{v,w}$ exists. Moreover,

\[ \Bigl| \int_{\sigma(A)} f \, d\mu_{v,w} \Bigr| \leq \|f\|_\infty \|\mu_{v,w}\| \leq \|f\|_\infty \|v\|\|w\| \]

by Proposition 12.64. Thus for a fixed $v \in H$ the map

\[ w \longmapsto \overline{\int_{\sigma(A)} f \, d\mu_{v,w}} \]

is linear and bounded with operator norm bounded by $\|f\|_\infty \|v\|$. Therefore, by Fréchet–Riesz representation (Corollary 3.19) there exists some uniquely determined $v_f$ with

\[ \|v_f\| \leq \|f\|_\infty \|v\| \tag{12.17} \]

for which

\[ \langle v_f, w \rangle = \overline{\langle w, v_f \rangle} = \int_{\sigma(A)} f \, d\mu_{v,w} \]

for all $w \in H$. By linearity of $v \mapsto \mu_{v,w}$ and the bound (12.17), we see that $v \mapsto v_f = f_H v$ defines a bounded operator $f_H$ with (12.16), and $\|f_H\| \leq \|f\|_\infty$.



Proposition 12.66 defines the measurable functional calculus. We now discuss its main properties, which will also give the proof of the existence claim in Theorem 12.34 (recall that uniqueness was the content of Exercise 12.36).

Proposition 12.67. Let $H$ be a complex Hilbert space, and let $A \subseteq B(H)$ be a separable commutative unital $C^*$-sub-algebra. The measurable functional calculus

\[ \mathscr{L}^\infty(\sigma(A)) \ni f \longmapsto f_H \in B(H) \]

has the following properties:
(FC1) If $f = a^o \in C(\sigma(A))$, then $f_H = a \in A$.
(FC2) $\|f_H\| = \|f\|_\infty$ for any $f$ in $C(\sigma(A))$ and $\|f_H\| \leq \|f\|_\infty$ for any $f$ in $\mathscr{L}^\infty(\sigma(A))$.
(FC3) $(f_H)^* = (\overline{f})_H$ and $(f_1 f_2)_H = (f_1)_H (f_2)_H$ for $f_1, f_2, f \in \mathscr{L}^\infty(\sigma(A))$.
In particular, properties (FC1)–(FC3) in Theorem 12.34 hold.

Proof. Recall that Corollary 11.34 gives the existence of a $C^*$-algebra isomorphism

\[ C(\sigma(A)) \ni f = a^o \longmapsto f_H = a \in A \]

(see Theorem 12.37 in the case of a single self-adjoint operator). Also recall that in Proposition 12.64 we derived the existence of the family of finite complex-valued measures $\{\mu_{v,w}\}$ on $\sigma(A)$ with

\[ \langle f_H v, w \rangle = \langle av, w \rangle = \int_{\sigma(A)} a^o \, d\mu_{v,w} = \int_{\sigma(A)} f \, d\mu_{v,w} \tag{12.18} \]

for all $f = a^o \in C(\sigma(A))$, which in Proposition 12.66 we turned around to use (12.18) as the definition of $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. Hence this definition of the measurable functional calculus extends the definition of the continuous functional calculus (that is, of the map $C(\sigma(A)) \ni a^o \mapsto a \in A$), and hence satisfies (FC1). By Corollary 11.34 and Proposition 12.66 above we also have (FC2).

To prove (FC3) we argue in the following way. First, by Corollary 11.34 we already know (FC3) for continuous functions. We will use this to encode the properties in (FC3) into properties of the non-diagonal spectral measures $\mu_{v,w}$, which in turn will give the same properties for measurable functions.

Let us start with $f_H^* = (\overline{f})_H$, which we know by Corollary 11.34 for $f = a^o \in C(\sigma(A))$. We claim this implies that $\mu_{v,w} = \overline{\mu_{w,v}}$. To see this, let $a \in A$ and notice that

\[ \int \overline{a^o} \, d\mu_{v,w} = \langle a^*v, w \rangle = \langle v, aw \rangle = \overline{\langle aw, v \rangle} = \overline{\int a^o \, d\mu_{w,v}} = \int \overline{a^o} \, d\overline{\mu_{w,v}}, \]

for any $v, w \in H$, and since this holds for all $f = a^o \in C(\sigma(A))$ the claim follows. Now we use essentially the same identity (in a slightly different order, and with a different logic) to deduce that $f_H^* = (\overline{f})_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. So let $f \in \mathscr{L}^\infty(\sigma(A))$. Then

\[ \langle f_H^* v, w \rangle = \langle v, f_H w \rangle = \overline{\langle f_H w, v \rangle} = \overline{\int f \, d\mu_{w,v}} = \int \overline{f} \, d\mu_{v,w} = \langle (\overline{f})_H v, w \rangle \]

for all $v, w \in H$, as required.

We now show that

\[ (f_1 f_2)_H = (f_1)_H (f_2)_H \tag{12.19} \]

for $f_1, f_2 \in \mathscr{L}^\infty(\sigma(A))$. Again we know this property for $f_1, f_2 \in C(\sigma(A))$. We claim that this implies

\[ d\mu_{(f_2)_H v, w} = f_2 \, d\mu_{v,w} \tag{12.20} \]

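for $f_2 \in \mathscr{L}^\infty(\sigma(A))$ and $v, w \in H$. For $f_1, f_2 \in C(\sigma(A))$ we have

\[ \int_{\sigma(A)} f_1 \, d\mu_{(f_2)_H v, w} = \langle (f_1)_H (f_2)_H v, w \rangle = \langle (f_1 f_2)_H v, w \rangle = \int_{\sigma(A)} f_1 f_2 \, d\mu_{v,w}. \]

As this holds for all $f_1 \in C(\sigma(A))$ we obtain (12.20) for $f_2 \in C(\sigma(A))$ and for all $v, w \in H$. Using $\mu_{v,w} = \overline{\mu_{w,v}}$ this also shows that

\[ d\mu_{v,(f_1)_H w} = d\overline{\mu_{(f_1)_H w, v}} = \overline{f_1} \, d\overline{\mu_{w,v}} = \overline{f_1} \, d\mu_{v,w} \]

for all $f_1 \in C(\sigma(A))$. For general $f_2 \in \mathscr{L}^\infty(\sigma(A))$ we now see that

\[ \int_{\sigma(A)} f_1 \, d\mu_{(f_2)_H v, w} = \langle (f_1)_H (f_2)_H v, w \rangle = \langle (f_2)_H v, (\overline{f_1})_H w \rangle = \int_{\sigma(A)} f_2 \, d\mu_{v,(\overline{f_1})_H w} = \int_{\sigma(A)} f_1 f_2 \, d\mu_{v,w} \]

for all $f_1 \in C(\sigma(A))$ and $v, w \in H$, which implies the claim in (12.20).

We now derive (12.19) from (12.20) for $f_1, f_2 \in \mathscr{L}^\infty(\sigma(A))$. Indeed, applying (12.20) we see that

\[ \langle (f_1)_H (f_2)_H v, w \rangle = \int_{\sigma(A)} f_1 \, d\mu_{(f_2)_H v, w} = \int_{\sigma(A)} f_1 f_2 \, d\mu_{v,w} = \langle (f_1 f_2)_H v, w \rangle. \]

As $v, w \in H$ were arbitrary, we derive (12.19) and conclude (FC3).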
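The algebraic properties (FC1) and (FC3) can be checked by hand in a small example. The sketch below uses the symmetric matrix $T$ with rows $(2, 1)$ and $(1, 2)$, whose eigenvalues are $1$ and $3$, and defines $f_H = f(1)P_1 + f(3)P_3$ with the two spectral projections. Two caveats, both assumptions of this illustration: the matrix and all names are hypothetical choices, and since the spectrum is finite every function on it is continuous, so the example cannot show any difference between the continuous and the measurable calculus. It only illustrates the identities with a function (an indicator) that would fail to be continuous on an interval.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

# Orthogonal projections onto the eigenspaces of T = [[2, 1], [1, 2]]
# for the eigenvalues 1 (eigenvector (1, -1)) and 3 (eigenvector (1, 1)).
P = {
    1: [[HALF, -HALF], [-HALF, HALF]],
    3: [[HALF, HALF], [HALF, HALF]],
}

def calculus(f):
    """f_H = sum over the spectrum of f(lambda) * P_lambda."""
    return [[sum(f(lam) * P[lam][i][j] for lam in P) for j in range(2)]
            for i in range(2)]

def matmul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def indicator_above_two(x):
    """Indicator function of the set {x > 2}."""
    return 1 if x > 2 else 0

def square_plus_one(x):
    return x * x + 1

T = calculus(lambda x: x)     # (FC1): the identity function gives T back
product_first = calculus(lambda x: indicator_above_two(x) * square_plus_one(x))
product_after = matmul(calculus(indicator_above_two), calculus(square_plus_one))
```

Exact rational arithmetic is used so that `product_first` and `product_after` can be compared for equality; both equal $10 P_3$, which is the instance of (12.19) for these two functions.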



Proposition 12.68. Under the same hypotheses as in Proposition 12.64, the measurable functional calculus has the following properties:
(FC4) Suppose $\mu$ is a finite measure on $X = \sigma(A) \times \mathbb{N}$ so that the operators $a$ in $A$ correspond to multiplication operators $M_{a^o}$ via a unitary isomorphism $\phi : H \to L^2_\mu(X)$ as in Theorem 12.60. Then for any $f$ in $\mathscr{L}^\infty(\sigma(A))$ the operator $f_H$ corresponds to $M_f$ (with $f(x, n) = f(x)$ for $(x, n) \in \sigma(A) \times \mathbb{N}$).
(FC5) If $V \subseteq H$ is a closed subspace such that $aV \subseteq V$ for all $a \in A$, then $V$ is also invariant under $f_H$ for all $f \in \mathscr{L}^\infty(\sigma(A))$. Moreover, if $S \in B(H)$ commutes with all $a \in A$, then $S$ commutes with $f_H$ for all $f \in \mathscr{L}^\infty(\sigma(A))$.
In the case of a normal operator $T \in B(H)$ the measurable functional calculus satisfies (FC4)–(FC6) in Theorem 12.34.

Proof. We first prove (FC5). Suppose that $S \in B(H)$ commutes with all elements $a \in A$. We extend this again to $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$ using the spectral measures. For these, we have that $\mu_{Sv,w} = \mu_{v,S^*w}$ since

\[ \int_{\sigma(A)} a^o \, d\mu_{Sv,w} = \langle aSv, w \rangle = \langle av, S^*w \rangle = \int_{\sigma(A)} a^o \, d\mu_{v,S^*w} \]

for all $a \in A$. Now let $f \in \mathscr{L}^\infty(\sigma(A))$, and notice that

\[ \langle f_H Sv, w \rangle = \int_{\sigma(A)} f \, d\mu_{Sv,w} = \int_{\sigma(A)} f \, d\mu_{v,S^*w} = \langle S f_H v, w \rangle \]

for all $v, w \in H$, showing that $f_H S = S f_H$.

To complete the proof of (FC5), we still have to consider an invariant subspace $V \subseteq H$. By Lemma 6.30 the closed subspace $V^\perp$ is $a^*$-invariant for all $a \in A$. This implies that every $a \in A$ commutes with the orthogonal projection $P_V : H \to H$ onto $V$, since for $v \in V$ we have $av \in V$ and

\[ aP_V(v) = av = P_V(av), \]

and since for $w \in V^\perp$ we also have $aw \in V^\perp$ and

\[ aP_V(w) = 0 = P_V(aw). \]

By what we have already proved, $P_V$ commutes with $f_H$ for $f \in \mathscr{L}^\infty(\sigma(A))$. If now $v \in V$ then

\[ f_H v = f_H \circ P_V(v) = P_V \circ f_H(v) \in V \]

shows the remaining claim in (FC5).

For the proof of (FC4) it suffices, because of (FC5), to prove the cyclic case. That is, to prove (FC4) in the case where $X = \sigma(A)$ and $\mu$ is the spectral measure on $\sigma(A)$ corresponding to the generator of $H$. For $v, w \in H$ we then have $d\mu_{v,w} = \phi(v)\overline{\phi(w)} \, d\mu$ since

\[ \langle av, w \rangle = \int a^o \phi(v) \overline{\phi(w)} \, d\mu \]

for all $a \in A$ by the spectral theorem (Theorem 12.60). Therefore

\[ \langle f_H v, w \rangle = \int_{\sigma(A)} f \, d\mu_{v,w} = \int_{\sigma(A)} f \phi(v) \overline{\phi(w)} \, d\mu = \langle M_f \phi(v), \phi(w) \rangle_{L^2(\sigma(A),\mu)} \]

for any $v, w \in H$ and for $f \in \mathscr{L}^\infty(\sigma(A))$. This proves (FC4) as in the proposition.

It remains to prove (FC4) and (FC6) in Theorem 12.34 (since (FC4) above differs slightly from (FC4) in Theorem 12.34).


So suppose that $T$ is unitarily isomorphic to the multiplication operator $M_g : L^2_\mu(X) \to L^2_\mu(X)$ for some bounded measurable $g : X \to \mathbb{C}$, and let $f \in \mathscr{L}^\infty(\sigma(M_g))$. Then $f \circ g$ is defined almost everywhere (specifically, on $g^{-1}(\sigma(M_g))$; see Examples 12.31 and 12.35(b)). For $v, w \in L^2_\mu(X)$ we see that $d\mu_{v,w}$ is the push-forward of $v\overline{w} \, d\mu$ under $g$ (by solving Exercise 12.51) since

\[ \langle f(M_g)v, w \rangle = \int_X (f \circ g) v \overline{w} \, d\mu = \int_{\sigma(M_g)} f \, d\mu_{v,w} \]

first for $f \in \mathbb{C}[z]$ and then for all $f \in C(\sigma(M_g))$ by the properties of the functional calculus in Theorem 12.37. Therefore,

\[ \langle f(M_g)v, w \rangle = \int_{\sigma(M_g)} f \, d\mu_{v,w} = \int_X (f \circ g) v \overline{w} \, d\mu = \langle M_{f \circ g} v, w \rangle \]

for all $v, w \in L^2_\mu(X)$ and $f \in \mathscr{L}^\infty(\sigma(M_g))$, which proves (FC4).

For the proof of (FC6) we assume first that $H$ is cyclic. By Theorem 12.55 there is a finite measure space $(X, \mu)$, some bounded measurable $g : X \to \mathbb{C}$, and a unitary isomorphism $\phi : H \to L^2_\mu(X)$ such that $\phi \circ T = M_g \circ \phi$. By (FC4) we have

\[ \phi \circ f(T) = f(M_g) \circ \phi = M_{f \circ g} \circ \phi \]

for all $f \in \mathscr{L}^\infty(\sigma(T))$. Next note that by Example 12.31 (specifically, by (12.4)) we have $\sigma(M_{f \circ g}) \subseteq \overline{f(\sigma(T))}$. Let $h \in \mathscr{L}^\infty(\overline{f(\sigma(T))})$ and apply (FC4) twice more to see that

\[ \phi \circ h(f(T)) = h(M_{f \circ g}) \circ \phi = M_{h \circ f \circ g} \circ \phi = (h \circ f)(M_g) \circ \phi = \phi \circ (h \circ f)(T), \]

which gives $h(f(T)) = (h \circ f)(T)$, as claimed in (FC6). If $H$ is not cyclic, then we decompose $H$ into a direct sum of cyclic subspaces and apply (FC5) and the above case. □

Exercise 12.69. Suppose that $H$ is a separable complex Hilbert space and that $A \subseteq A' \subseteq B(H)$ are two separable commutative unital $C^*$-sub-algebras.
(a) Suppose that $H$ and the action of $A$ on $H$ is described as in Theorem 12.60. Generalize (12.4) from Example 12.31 to this context by showing that

\[ \sigma(A) = \operatorname{Supp}\bigl((\pi_{\sigma(A)})_*\mu\bigr), \]

where $\pi_{\sigma(A)} : X = \sigma(A) \times \mathbb{N} \to \sigma(A)$ is the projection to the first factor.
(b) Show that the restriction map $\pi : \sigma(A') \ni \phi' \mapsto \phi'|_A \in \sigma(A)$ is continuous.
(c) Let $\mu_{v,w}$ be the spectral measure for $A$ on $\sigma(A)$ and let $\mu'_{v,w}$ be the spectral measure for $A'$ on $\sigma(A')$ for $v, w \in H$. Show that $\pi_*\mu'_{v,w} = \mu_{v,w}$.


(d) Show that the two notions of measurable calculus are compatible in the sense that any $f \in \mathscr{L}^\infty(\sigma(A))$ defines some $f' \in \mathscr{L}^\infty(\sigma(A'))$ by $f' = f \circ \pi$ which satisfies $f_H = f'_H$.
(e) Show that $\pi$ is surjective.

Exercise 12.70. Generalize the results of Section 9.1.3 to the context of a single normal operator $T \in B(H)$ or a separable commutative unital $C^*$-sub-algebra of $B(H)$.

Exercise 12.71. In the notation of Theorem 12.34, fix a normal operator $T$, suppose it has a description as a multiplication operator $M_g$ on some measure space $L^2_\mu(X)$, and let $\nu = g_*\mu$ be the push-forward measure on $\mathbb{C}$. Show that $f(T)$ is now well-defined for $f \in L^\infty_\nu(\sigma(T))$ by proving that if $f_1, f_2 \in \mathscr{L}^\infty(\sigma(T))$ agree $\nu$-almost everywhere, then $f_1(T) = f_2(T)$.

Exercise 12.72. Let $T \in B(H)$ be a positive self-adjoint bounded operator on a complex separable Hilbert space. Show that for every $n \geq 1$ there is only one positive operator $S$ in $B(H)$ with $S^n = T$.

12.7 Projection-Valued Measures

In this section, we describe another version of the spectral theorem, which is essentially equivalent but sometimes more convenient. Moreover, it allows us to examine some concepts from Section 9.1.4 in greater detail. The idea is to generalize the following interpretation of the spectral theorem (Theorem 6.27) for a compact self-adjoint operator $T \in K(H)$. If we denote by $P_\lambda$ the orthogonal projection onto $\ker(T - \lambda I)$ for $\lambda \in \mathbb{R}$ as in Example 12.43, then we have

\[ v = \sum_{\lambda \in \mathbb{R}} P_\lambda(v), \qquad T(v) = \sum_{\lambda \in \mathbb{R}} \lambda P_\lambda(v), \qquad f(T)(v) = \sum_{\lambda \in \mathbb{R}} f(\lambda) P_\lambda(v) \]

for all $v \in H$ and $f \in C(\sigma(T))$, where the series are well-defined because $P_\lambda$ is $0$ for $\lambda \notin \sigma(T)$.

To generalize this, it is natural to expect that one must replace the summations with appropriate integrals. Thus some form of integration for functions taking values in $B(H)$ is needed. Moreover, $\ker(T - \lambda I)$ may be zero for all $\lambda$, and so the projections must be generalized. We start by considering these two questions abstractly.

Definition 12.73 (Projection-valued measure). Let $H$ be a complex Hilbert space and let $P(H)$ denote the set of orthogonal projections onto closed subspaces in $B(H)$. A (finite) projection-valued measure $\Pi$ on $H$ is a map

\[ \mathcal{B} \longrightarrow P(H), \qquad B \longmapsto \Pi_B \]

defined on the Borel $\sigma$-algebra $\mathcal{B}$ of a compact metric space $X$ and taking values in the set of projections, such that the following hold:
(1) $\Pi_\emptyset = 0$ and $\Pi_X = I$.
(2) If $(B_n)$ is a sequence (or finite list) of pairwise disjoint Borel subsets of $X$, and

\[ B = \bigsqcup_{n \geq 1} B_n, \]

then

\[ \Pi_B = \sum_{n \geq 1} \Pi_{B_n} \tag{12.21} \]

where the series converges in the strong operator topology (see Section 8.3).

We note that (12.21) simply means that $\Pi_B(v) = \sum_{n \geq 1} \Pi_{B_n}(v)$ for $v \in H$ (see Exercise 8.57). In the study of a single normal operator $T$ on $H$ we will set $X = \sigma(T) \subseteq \mathbb{C}$, and more generally $X = \sigma(A)$ in the study of a commutative separable unital $C^*$-sub-algebra $A \subseteq B(H)$. Also notice that Definition 12.73 resembles in some ways the definition of a (finite) Borel measure on $X$. The discussion below will reveal further parallels to Lebesgue integration.

Lemma 12.74. Let $H$ be a complex Hilbert space and $\Pi$ a projection-valued measure on $H$ defined on the $\sigma$-algebra $\mathcal{B}$ of Borel subsets of a compact metric space $X$. If $X = \bigsqcup_{j=1}^n B_j$ is a disjoint decomposition of $X$ into measurable subsets $B_1, \ldots, B_n \in \mathcal{B}$, then $H = \bigoplus_{j=1}^n \operatorname{im} \Pi_{B_j}$ is an orthogonal direct sum of the closed subspaces $\operatorname{im} \Pi_{B_1}, \ldots, \operatorname{im} \Pi_{B_n} \subseteq H$.

Proof. By the defining properties of projection-valued measures we have

\[ v = \sum_{j=1}^n \Pi_{B_j} v \]

with $\Pi_{B_j} v \in H_j = \operatorname{im} \Pi_{B_j}$ and it remains to show that $H_j \perp H_k$ for $1 \leq j \neq k \leq n$. So suppose w =